1. Introduction
Fossil fuels can no longer meet the demands of sustainable development. Accelerating the energy transition and building a new power system centered on renewable energy have become key measures for achieving the strategic goals of “Carbon Peaking” and “Carbon Neutrality”. Among these, solar energy, with its abundant resources, environmental friendliness, and technological maturity, has emerged as one of the primary sources of renewable energy production globally [
1,
2]. China’s PV installed capacity has experienced rapid growth in recent years. China’s photovoltaic installed capacity and power generation costs over the past decade are shown in
Figure 1. By 2024, the national PV installed capacity reached approximately 889 GW and the photovoltaic power generation reached 838.3 TWh. Consequently, PV power generation is a vital component of clean energy supply. Meanwhile, the cost of electricity generation from PV units has decreased year by year, enhancing the economic competitiveness of clean energy. This trend is driving energy structure transformation, reducing electricity consumption costs, and accelerating progress toward Carbon Neutrality goals.
However, PV power generation relies heavily on meteorological conditions such as solar radiation intensity, temperature, and humidity, all of which are inherently unstable [3]. Consequently, its output power exhibits pronounced intermittency and poor dispatchability [4]. During peak electricity demand periods, the unpredictability of PV generation may cause imbalances between power supply and demand, potentially disrupting normal grid operation. This increases the complexity of power system dispatch and poses challenges to grid stability and the reliability of electricity supply. As new energy sources take a growing share of electricity market transactions, the uncertainty of PV power generation will lead to more frequent price fluctuations, posing a severe challenge to the future nationwide unified spot electricity market.
Accurate short-term PV power prediction has become a critical research focus to address these challenges. It acts as a key tool for power system operators to anticipate PV generation fluctuations, thereby optimizing power dispatch. Integrating improved PV power forecasting with other balancing resources including smart grids, energy storage systems such as batteries and pumped hydro, and demand-side management significantly facilitates supply–demand balancing and ensures stable grid operation. Concurrently, precise PV generation prediction results empower PV power generation enterprises to refine their bidding strategies in electricity markets. By accurately estimating future generation, they can formulate more rational clearing/bidding plans in day-ahead and intraday markets, avoiding risk and deviation penalties while maximizing profits. The core of PV power prediction lies in predicting generation output over a specific period based on meteorological data and historical power records. Traditional PV prediction methods primarily fall into three categories: physical models [
5], statistical models [
6], and machine learning models [
7,
8].
Physical models calculate PV power generation by modeling solar radiation propagation patterns using meteorological data and physical laws. However, due to the complexity of PV systems and variable meteorological conditions, these models typically require numerous parameters, demand high-quality input data, and involve significant computational complexity, presenting challenges in practical applications. Statistical models predict PV power generation by identifying patterns in historical data. Common statistical methods include Autoregressive Integrated Moving Average (ARIMA) models [
9], Support Vector Machines (SVMs) [
10], and neural networks [
11]. While statistical models can provide relatively accurate predictions under certain conditions, their reliance on linear relationships and strong assumptions limits their predictive accuracy when confronted with nonlinear, non-stationary, and highly variable meteorological conditions.
In recent years, with the rapid advancement of machine learning and deep learning technologies, data-driven PV power prediction methods [
12] have gradually become mainstream. In particular, deep learning models such as convolutional neural networks (CNNs) [
13], recurrent neural networks (RNNs) [
14], and long short-term memory (LSTM) networks [
15] have been extensively applied in PV power prediction due to their advantages in processing large-scale data and capturing temporal features. LSTM has emerged as one of the most prevalent deep learning models in PV power forecasting due to its ability to effectively address the vanishing gradient and exploding gradient issues encountered by traditional neural networks when processing long-term sequence data. Numerous researchers have employed LSTM for short-term PV power forecasting with promising results. For instance, Gao et al. [
16] compared the errors of three LSTM-based power forecasting methods to evaluate the role of operational weather forecasts in PV power forecasting. However, LSTM can still struggle to retain dependencies over very long sequences, and its computational efficiency is low on large-scale data. Consequently, increasing attention has shifted toward applying the gated recurrent unit (GRU) model in PV power forecasting. As a variant of LSTM [1], GRU simplifies the architecture while retaining strong performance [17]. In addition, to better account for the complex spatio-temporal correlations in PV power output and to quantify the probability and fluctuation range of the prediction results, further research has introduced the attention mechanism [
18] and adopted advanced architectures such as the improved CNN [
19]. Literature [
20] points out that the Informer model, through the ProbSparse attention mechanism and multi-scale feature fusion, can significantly reduce the computational complexity of long sequences while effectively capturing the key temporal dependencies and multi-scale fluctuation features in photovoltaic power sequences, achieving a very significant improvement in prediction accuracy. At the same time, many researchers have noticed that the temporal convolutional network can not only efficiently extract the temporal features of the input sequence, providing more effective representation information for subsequent prediction models [
21], but also, through its parallel computing capability and dilated causal convolution structure, can effectively capture the long-term dependencies in photovoltaic power sequences, enhancing the stability and accuracy of the prediction [
22].
While GRU exhibits lower computational costs during training, PV power forecasting involves complex meteorological conditions and nonlinear relationships. A single GRU model may struggle to fully capture these intricate features. To overcome this limitation, researchers have combined optimization algorithms with GRU to enhance prediction accuracy. For instance, Wang et al. [
23] adopted a Red-tailed Hawk (RTH) algorithm-optimized GRU for PV power forecasting under different weather conditions and demonstrated its feasibility in PV power generation by comparing it with common models. Chen Qingming et al. [
24] developed a GRU-based PV power forecasting model using the Grey Wolf Optimization (GWO) algorithm, demonstrating superior performance compared to LSTM. Li Sheng et al. [
25] optimized GRU parameters using the Pelican Optimization Algorithm (POA) for PV power forecasting, with results similarly demonstrating POA-GRU’s superior power prediction capability for PV stations. Comparative analysis of relevant literature is presented in
Table 1.
Although models like POA and GWO effectively enhance prediction accuracy when combined with GRU, they exhibit limitations in handling complex nonlinear relationships. These models struggle to accurately capture long-term dependencies and dynamic features within PV power sequences and possess weak generalization capabilities. The introduction of the Kolmogorov–Arnold Network (KAN) [
26] presents a promising solution. The KAN architecture demonstrates that multidimensional dynamic systems can be precisely represented using a finite set of functions, granting this model inherent advantages in approximating complex nonlinear functions [
27]. Combining the GRU model with KAN’s nonlinear approximation capability enables more effective capture of complex features and long-term dependencies in time series, reducing computational burden and enhancing prediction accuracy. For instance, Su et al. [
28] employed GRU-KAN for predicting high-speed aircraft trajectories, while Ma Feihu et al. [
29] employed GRU-KAN for predicting shared bicycle usage around subway stations. Both cases demonstrated significant improvements in prediction accuracy after incorporating KAN enhancements. However, these models suffer from deficiencies due to inadequate factor screening, which increases training costs.
In addition, probabilistic forecasting has become a research hotspot in PV power prediction, as it captures the uncertainty of PV power better than point forecasts do. For instance, Niu proposed a probabilistic forecasting model that applies kernel density estimation (KDE) to point forecast results, providing support for the secure dispatch of power systems [30].
In summary, to address these research gaps, this paper proposes a hybrid PV power prediction model integrating the Crested Porcupine Optimizer–Random Forest (CPO-RF) and KAN-GRU. First, the Density Peak Clustering Algorithm (DPCA) classifies weather conditions. Then, the CPO-RF model performs feature screening on PV power time series data. Subsequently, a hybrid GRU model incorporating a KAN layer is employed to forecast PV power generation, with SHapley Additive exPlanations (SHAP) values evaluating feature importance and the impact of causal features. Finally, KDE is adopted to evaluate the interval forecasting performance of the KAN-GRU model.
The rest of this article is organized as follows. Section 2 introduces the principles of the component models (DPCA, CPO, RF, KAN, GRU, SHAP, and KDE), presents the research framework, and proposes a combined PV power prediction model based on DPCA-CPO-RF-KAN-GRU. Section 3 conducts a case analysis and compares the proposed model with common prediction models to verify its superiority. Finally, the research conclusions are drawn in Section 4.
2. Methodology
This section first introduces the basic principles of the Density Peak Clustering Algorithm, the Random Forest model and its application steps, and the Crested Porcupine Optimizer. It then presents the structure of the KAN layer and the theoretical basis for prediction with the GRU model. Finally, the SHAP interpretation method, KDE-based interval prediction, and the overall research framework are described.
2.1. DPCA Model
The Density Peak Clustering Algorithm [
31,
32,
33] is a new clustering algorithm proposed in recent years, which identifies non-spherical clusters based on the distances between data points and can automatically determine cluster centers and the number of clusters. In short-term PV power forecasting, historical power data and corresponding meteorological variables exhibit distinct clustering characteristics due to different weather conditions. Directly inputting mixed weather-condition data into a prediction model increases the complexity of nonlinear fitting and reduces forecasting accuracy. Consequently, this study employs DPCA to cluster historical PV and meteorological data: each cluster center corresponds to a typical weather pattern, and data under similar weather conditions are grouped into the same cluster. Subsequent prediction models can then learn the mapping between features and power output separately for each cluster, thereby mitigating interference from heterogeneous weather data and improving the robustness of predictions.
The algorithm rests on two assumptions about a cluster center: (1) its local density is high; (2) its distance to any data point with higher local density is relatively large. The model is introduced as follows:
Let the dataset to be clustered be $S=\{x_i\}_{i=1}^{N}$ with the corresponding index set $I_S=\{1,2,\ldots,N\}$, and let $d_{ij}$ denote the distance between data points $x_i$ and $x_j$. For each data point $x_i$, two quantities must be calculated: the local density $\rho_i$ and the distance $\delta_i$. The local density $\rho_i$ is given by Equation (1).
$$\rho_i=\sum_{j\in I_S\setminus\{i\}}\chi\left(d_{ij}-d_c\right),\qquad \chi(x)=\begin{cases}1, & x<0\\ 0, & x\ge 0\end{cases}\tag{1}$$
In Equation (1), $d_{ij}$ represents the distance between data points $x_i$ and $x_j$, and $d_c$ stands for the truncation distance. According to Equation (1), the local density $\rho_i$ is the number of points in dataset $S$ whose distance from data point $x_i$ is less than the truncation distance. For large datasets, the algorithm is robust to the choice of truncation distance.
When a Gaussian kernel is used to compute the local density, $\rho_i$ takes continuous values, so the probability that different data points share the same local density is small. Let $\{q_i\}_{i=1}^{N}$ denote the indices of the local density set $\{\rho_i\}_{i=1}^{N}$ sorted in descending order, which satisfies Equation (2).
$$\rho_{q_1}\ge\rho_{q_2}\ge\cdots\ge\rho_{q_N}\tag{2}$$
The distance $\delta_{q_i}$ is defined as Equation (3).
$$\delta_{q_i}=\begin{cases}\min\limits_{j<i} d_{q_i q_j}, & i\ge 2\\ \max\limits_{j\ge 2}\delta_{q_j}, & i=1\end{cases}\tag{3}$$
With Equation (3), the pair $(\rho_i,\delta_i)$ can be computed for every data point $x_i$. All data points are then drawn in a two-dimensional coordinate plane with $\rho$ and $\delta$ as the axes, which is the decision graph. A data point with both a large $\rho$ value and a large $\delta$ value is taken as a cluster center.
After the cluster centers are determined, each remaining data point is assigned to the cluster of its nearest neighbor with higher density.
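The clustering steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: it assumes Euclidean distance and the cut-off kernel, and it picks centers automatically as the points with the largest product of density and distance, whereas the paper reads them off the decision graph. The function names and the toy points are hypothetical.

```python
import math

def density_peaks(points, d_c):
    """Return local density rho, distance delta, nearest denser point, and density order."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Cut-off kernel: count the other points closer than the truncation distance d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Indices in descending order of density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    parent = [-1] * n
    for rank in range(1, n):
        i = order[rank]
        # Nearest point among those ranked before i in the density order.
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j
    # The densest point receives the largest delta found among the other points.
    delta[order[0]] = max(delta[i] for i in order[1:])
    return rho, delta, parent, order

def cluster(points, d_c, n_clusters):
    """Assumed selection rule: centers are the points with the largest rho * delta."""
    rho, delta, parent, order = density_peaks(points, d_c)
    rank = {i: r for r, i in enumerate(order)}
    centers = sorted(range(len(points)),
                     key=lambda i: (-rho[i] * delta[i], rank[i]))[:n_clusters]
    labels = {c: k for k, c in enumerate(centers)}
    # Visit points from dense to sparse so each parent is labelled before its children.
    for i in order:
        if i not in labels:
            labels[i] = labels[parent[i]]
    return centers, labels

# Toy data (made up): two well-separated groups of five points each.
TOY_POINTS = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
              (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
toy_centers, toy_labels = cluster(TOY_POINTS, d_c=0.9, n_clusters=2)
```

In practice the points would be the normalized meteorological and power vectors, and the number of clusters would be confirmed on the decision graph rather than fixed in advance.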
2.2. RF Model
The Random Forest (RF) algorithm is an ensemble learning technique based on decision trees [
34,
35,
36]. It enhances generalization ability and mitigates overfitting by introducing randomness during training. RF has the ability to evaluate feature importance. This study employs the RF model to identify critical factors affecting PV power sequences from meteorological and environmental variables. Selecting factors with higher importance as inputs can enhance the accuracy of predictions and reduce interference from irrelevant features. During the factor screening process, the Gini coefficient is employed to evaluate the influence of each meteorological indicator on PV power.
The procedure for calculating the influence degree for a PV dataset containing $D$ features is as follows:
- (1)
This paper employs bootstrap sampling to randomly draw $n$ sample sets from the initial dataset, generating $n$ regression trees. During each random sampling process, the unselected samples form the out-of-bag dataset.
- (2)
When the dataset contains $D$ features, at each node, $d$ ($d \le D$) features are randomly selected as the candidate feature set. Based on this candidate set, the optimal feature is chosen to split the node.
- (3)
Feature importance is assessed using the Gini score, calculated as follows:
$$GI_m=\sum_{k=1}^{K}p_{mk}\left(1-p_{mk}\right)=1-\sum_{k=1}^{K}p_{mk}^{2}\tag{4}$$
In Equation (4), $GI_m$ represents the Gini coefficient at node $m$, $p_{mk}$ represents the proportion of category $k$ at node $m$, and $K$ represents the total number of categories.
The importance formula for feature $x_j$ at node $m$ is shown in Equation (5):
$$VIM_{jm}=GI_m-GI_l-GI_r\tag{5}$$
In Equation (5), $VIM_{jm}$ represents the Gini importance of feature $x_j$ at node $m$, and $GI_l$ and $GI_r$ represent the Gini coefficients at the two child nodes $l$ and $r$ created by splitting node $m$.
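As a small worked example of the Gini-based importance, the sketch below computes the impurity of a node and the impurity decrease produced by one split. It is only illustrative: the labels are made up, the function names are hypothetical, and the optional `weighted` flag reflects the convention in many Random Forest libraries of weighting each child by its share of samples, which is an assumption rather than something stated in this paper.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(parent, left, right, weighted=False):
    """Impurity decrease from splitting `parent` into `left` and `right`.

    weighted=False: parent impurity minus the two child impurities.
    weighted=True:  each child impurity is scaled by its share of the parent samples.
    """
    if weighted:
        n = len(parent)
        return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)
    return gini(parent) - gini(left) - gini(right)

# Toy node (made-up labels): a split that separates the two classes perfectly.
toy_parent = ["clear", "clear", "clear", "cloudy", "cloudy", "cloudy"]
toy_left, toy_right = toy_parent[:3], toy_parent[3:]
toy_decrease = gini_decrease(toy_parent, toy_left, toy_right)
```

Summing such decreases over every node where a feature is used, and averaging over all trees, gives that feature's importance score.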
2.3. CPO Model
The CPO algorithm is a novel metaheuristic algorithm inspired by the defensive behaviors of crested porcupines [
37,
38,
39]. The CPO algorithm's parameter optimization consists of two phases: global exploration and local development (exploitation). The exploration phase mimics the visual quill display and sound warnings to search the solution space broadly, while the exploitation phase refines solutions by emulating odor diffusion and physical attacks. Since PV power prediction is subject to multiple nonlinear factors and the accuracy of the hybrid model is highly sensitive to its key parameters, arbitrary parameter selection may lead to local optima or overfitting. Therefore, this study employs CPO to optimize the key parameters of the RF model with the objective of minimizing PV power prediction error.
2.3.1. Global Exploration
During the global exploration phase, the Crested Porcupine maintains a distance from predators, employing both visual and auditory defense strategies to deter them.
First Defense Strategy: The Crested Porcupine raises and fans its quills to warn predators. The behavior model is as follows:
In Equation (6), indicates the position of the individual during iteration; represents the position of the individual in the next iteration; is a random number following a normal distribution; is a random number within the interval [0, 1]; is the optimal solution of function ; indicates the position of the predator at iteration time .
Second Defense Strategy: The Crested Porcupine generates noise and further threatens predators. The behavior model is as follows:
In Equation (7), is a binary vector composed of 0 and 1; is a random number between [0, 1]; , are two random integers within [1, N].
2.3.2. Local Development
During the local development phase, predators have already approached the Crested Porcupine at close range. The Crested Porcupine will employ scent and physical attacks to counter the predators.
Third Defense Strategy: The Crested Porcupine secretes a foul odor that spreads throughout the surrounding area, deterring predators from further approach. The behavior model is as follows:
In Equation (8), is a random integer within [1, N], is the defensive factor, is the odor diffusion factor, and is the parameter that controls the search direction.
Fourth Defense Strategy: When all previous strategies fail, the Crested Porcupine will launch a physical attack against the predator. The behavior model is as follows:
In Equation (9), is the convergence factor, , are two random integers within [1, N], and is the inelastic collision force generated when an individual physically attacks a predator.
2.3.3. The Process of the CPO-RF Collaborative Optimization Algorithm
In this study, the CPO algorithm is employed to optimize two key hyperparameters of the RF model: the number of trees and the maximum depth of trees. The detailed procedure is outlined as follows:
- (1)
Population initialization: The population is initialized with each individual’s solution randomly generated within the predefined parameter ranges, providing a diversified starting point for the search process.
- (2)
Fitness evaluation: For each individual, an RF model is constructed based on its current parameter set. The fitness value is then computed to assess the model’s performance on the training data.
- (3)
CPO parameter update: The parameters of each individual are updated according to the CPO exploration and exploitation rules. After the update, the fitness of the corresponding RF model is recalculated to explore potentially better parameter combinations.
- (4)
Iterative search: Steps (2) and (3) are repeated iteratively. The process continues until the maximum number of iterations is reached.
- (5)
Termination and output: Upon meeting the termination criterion, the optimal combination of tree count and tree depth is extracted. This configuration is adopted as the final hyperparameter setting for the RF model, which is subsequently used for feature selection to enhance overall model performance.
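The five steps above follow the usual skeleton of a population-based hyperparameter search, which can be sketched as follows. This is a hedged illustration only: the update rule is a simple "move toward the best individual plus noise" placeholder and does not reproduce CPO's four defense strategies, and `toy_fitness` is a hypothetical stand-in for training an RF model and measuring its error. The population size, iteration count, and search ranges are the ones reported in the case study.

```python
import random

TREE_RANGE = (50, 100)    # search range for the number of trees
DEPTH_RANGE = (10, 30)    # search range for the maximum tree depth

def toy_fitness(n_trees, max_depth):
    """Hypothetical error surface; a real run would train an RF and return its error."""
    return (n_trees - 80) ** 2 / 100.0 + (max_depth - 20) ** 2 / 10.0

def clip_round(value, low, high):
    """Round to the nearest integer and keep the result inside [low, high]."""
    return int(min(max(round(value), low), high))

def search(pop_size=30, max_iter=20, seed=0):
    rng = random.Random(seed)
    # Step (1): random initialization inside the predefined ranges.
    pop = [(rng.randint(*TREE_RANGE), rng.randint(*DEPTH_RANGE)) for _ in range(pop_size)]
    # Step (2): fitness evaluation of every individual.
    fit = [toy_fitness(*ind) for ind in pop]
    best_index = min(range(pop_size), key=fit.__getitem__)
    best, best_fit = pop[best_index], fit[best_index]
    # Step (4): repeat the update until the iteration budget is spent.
    for _ in range(max_iter):
        for i, (n_trees, depth) in enumerate(pop):
            # Step (3): placeholder update, NOT the CPO rules.
            cand = (
                clip_round(n_trees + rng.random() * (best[0] - n_trees) + rng.gauss(0, 2),
                           *TREE_RANGE),
                clip_round(depth + rng.random() * (best[1] - depth) + rng.gauss(0, 1),
                           *DEPTH_RANGE),
            )
            cand_fit = toy_fitness(*cand)
            if cand_fit < fit[i]:          # keep the candidate only if it improves
                pop[i], fit[i] = cand, cand_fit
                if cand_fit < best_fit:
                    best, best_fit = cand, cand_fit
    # Step (5): return the best (tree count, tree depth) pair found.
    return best, best_fit
```

Replacing the placeholder update with the exploration and exploitation rules of Section 2.3, and `toy_fitness` with the RF prediction error, would give the CPO-RF loop described in the text.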
2.4. KAN Model
KAN is a novel neural network architecture based on the Kolmogorov–Arnold Representation Theorem. Compared to traditional Multi-Layer Perceptron (MLP), its key difference lies in the configuration and operational mechanism of its activation functions. The KAN model places learned activation functions at the network edges; these activation functions are not fixed but adaptively adjust during training. The KAN model achieves effective fitting of complex network structures with fewer parameters while maintaining flexibility, offering novel approaches to address issues such as parameter redundancy and limited adaptability encountered by traditional MLPs in handling complex tasks. Given the strong nonlinearity between weather features and power output, this study employs KAN as a feature mapping layer to transform features selected by Random Forest into high-dimensional representations that are more suitable for GRU modeling.
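The KAN model is composed of external functions and internal functions. Its mathematical expression is
$$f(\mathbf{x})=f(x_1,\ldots,x_n)=\sum_{q=1}^{2n+1}\Phi_q\left(\sum_{p=1}^{n}\varphi_{q,p}(x_p)\right)\tag{10}$$
In Equation (10), $\mathbf{x}=(x_1,\ldots,x_n)$ represents an n-dimensional input vector. The domain of the internal function $\varphi_{q,p}$ is typically [0, 1], with its range being the set of real numbers $\mathbb{R}$. The external function $\Phi_q$ is often defined with both domain and range as $\mathbb{R}$. The activation function equation is
$$\varphi(x)=w_b\,b(x)+w_s\,\mathrm{spline}(x)\tag{11}$$
$$b(x)=\mathrm{silu}(x)=\frac{x}{1+e^{-x}}\tag{12}$$
$$\mathrm{spline}(x)=\sum_{i}c_i B_i(x)\tag{13}$$
In Equations (11)–(13), $w_b$ and $w_s$ are both learnable parameters, and $b(x)$ is the basis function; in the original KAN paper, this function is initialized as the SiLU activation function. $\mathrm{spline}(x)$ is a linear combination of one-dimensional functions, and $B_i(x)$ is a predefined set of B-spline basis functions. During training, the spline coefficients $c_i$ are continuously optimized to adjust the spline shape, thereby fitting the training data.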
2.5. GRU Model
As a type of recurrent neural network (RNN), the GRU model can effectively handle long-term dependencies in sequential data through its unique gating mechanism. Compared with traditional deep learning models, the GRU model demonstrates superior performance in many sequence tasks and can accurately capture the dynamic temporal characteristics of photovoltaic power output that fluctuate with meteorological factors, making it suitable for photovoltaic power generation forecasting.
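The GRU primarily consists of an update gate and a reset gate. The update gate determines how much is learned from the previous hidden state $h_{t-1}$ and the input information $x_t$ through the weight coefficients $W_z$ and $U_z$. The update gate $z_t$ is calculated as follows:
$$z_t=\sigma\left(W_z x_t+U_z h_{t-1}\right)\tag{14}$$
In Equation (14), $\sigma$ is the sigmoid activation function. The reset gate of the GRU determines how much is learned from the hidden state $h_{t-1}$ and the input information $x_t$ through the weight coefficients $W_r$ and $U_r$. The reset gate $r_t$ is calculated as follows:
$$r_t=\sigma\left(W_r x_t+U_r h_{t-1}\right)\tag{15}$$
The GRU also includes a candidate hidden state $\tilde{h}_t$, whose calculation formula is as follows:
$$\tilde{h}_t=\tanh\left(W_h x_t+U_h\left(r_t\odot h_{t-1}\right)\right)\tag{16}$$
Finally, the update formula for the hidden state $h_t$ is as follows:
$$h_t=\left(1-z_t\right)\odot h_{t-1}+z_t\odot\tilde{h}_t\tag{17}$$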
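To make the gating mechanism concrete, the following sketch performs a single GRU time step in plain Python. It is an illustration under stated assumptions: bias terms are omitted, the hidden state is blended as "(1 - update gate) times old state plus update gate times candidate" (some references swap the two weights), and the parameter names in the dictionary are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def gru_step(x_t, h_prev, params):
    """One GRU step: update gate, reset gate, candidate state, new hidden state."""
    # Update gate: how much of the candidate state enters the new hidden state.
    z = [sigmoid(a + b) for a, b in
         zip(matvec(params["W_z"], x_t), matvec(params["U_z"], h_prev))]
    # Reset gate: how much of the previous hidden state feeds the candidate.
    r = [sigmoid(a + b) for a, b in
         zip(matvec(params["W_r"], x_t), matvec(params["U_r"], h_prev))]
    reset_h = [r_i * h_i for r_i, h_i in zip(r, h_prev)]
    # Candidate hidden state built from the input and the reset-scaled history.
    h_cand = [math.tanh(a + b) for a, b in
              zip(matvec(params["W_h"], x_t), matvec(params["U_h"], reset_h))]
    # Blend the previous state and the candidate (assumed convention, see lead-in).
    return [(1.0 - z_i) * h_i + z_i * c_i for z_i, h_i, c_i in zip(z, h_prev, h_cand)]

def zero_params(input_dim, hidden_dim):
    """All-zero weights, handy for checking the arithmetic by hand."""
    w = [[0.0] * input_dim for _ in range(hidden_dim)]
    u = [[0.0] * hidden_dim for _ in range(hidden_dim)]
    return {"W_z": w, "U_z": u, "W_r": w, "U_r": u, "W_h": w, "U_h": u}
```

With all-zero weights both gates equal 0.5 and the candidate state is 0, so one step simply halves the previous hidden state, which gives a quick sanity check of the update formula.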
2.6. SHAP Model
Generally speaking, the technology of machine learning is regarded as an opaque tool, which lacks the ability to provide clear and understandable interpretation of results [
40,
41]. Therefore, the SHAP framework was employed to clarify the specific influence of each feature on the prediction results, thereby facilitating the interpretation of deep learning models. The main goal of this framework is to quantify the contribution of each feature to the model output, which is known as the SHAP value. It reflects whether a specific feature significantly increases or decreases the final output. The SHAP value is calculated as follows:
$$\phi_i=\sum_{S\subseteq F\setminus\{i\}}\frac{|S|!\,\left(M-|S|-1\right)!}{M!}\left[f_{S\cup\{i\}}\left(x_{S\cup\{i\}}\right)-f_S\left(x_S\right)\right]\tag{18}$$
In Equation (18), $\phi_i$ represents the contribution of feature $i$; $S$ represents a feature subset; $F$ represents the feature set; $M$ represents the total number of input features; $f_{S\cup\{i\}}(x_{S\cup\{i\}})$ is the power prediction value when the sample includes the feature values in $S$ together with feature $i$; and $f_S(x_S)$ is the power prediction value when the sample only includes the feature values in $S$.
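The Shapley weighting can be checked on a tiny example by enumerating every feature subset, as in the sketch below. It is an exact brute-force calculation that is practical only for a handful of features; SHAP libraries use approximations for real models. The feature names and the toy value function are made up for illustration and are not the study's model.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Exact Shapley values: weighted average of marginal contributions over subsets."""
    m = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(m):
            # Weight of a subset of this size: |S|! (M - |S| - 1)! / M!
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi[i] = total
    return phi

def toy_value(subset):
    """Hypothetical model output when only the features in `subset` are known."""
    rad = 1.0 if "radiation" in subset else 0.0
    temp = 1.0 if "module_temp" in subset else 0.0
    # Humidity is deliberately given no effect in this toy function.
    return 3.0 * rad + 1.0 * temp + 2.0 * rad * temp

TOY_FEATURES = ["radiation", "module_temp", "humidity"]
toy_phi = shapley_values(TOY_FEATURES, toy_value)
```

In this toy case the interaction term of 2 is shared equally between the two interacting features, and the contributions add up to the difference between the full-feature output and the empty-set output, which is the additivity property that makes SHAP values easy to read.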
2.7. Interval Prediction Based on the KDE Model
KDE is a non-parametric method for estimating a probability density function. It places a kernel function on each data point and sums the smoothed contributions to estimate the continuous probability distribution of the error, without assuming that the error follows a specific distribution. The procedure is as follows. The trained KAN-GRU model makes predictions on the validation set to obtain the point prediction sequence, and the error sequence between the predictions and the true values is calculated. KDE is then applied to the error sequence to estimate its probability density function, with the bandwidth $h$ determined through Silverman’s rule or cross-validation to balance the smoothness and bias of the estimate. Finally, the prediction interval is constructed. Given a confidence level of $1-\alpha$, the lower quantile $e_{\alpha/2}$ and the upper quantile $e_{1-\alpha/2}$ are found on the estimated error density function such that $P\left(e_{\alpha/2}\le e\le e_{1-\alpha/2}\right)=1-\alpha$. Then, the power prediction interval ($L_t$, $U_t$) for the future moment $t$ can be obtained through the following formula:
$$L_t=\hat{y}_t+e_{\alpha/2},\qquad U_t=\hat{y}_t+e_{1-\alpha/2}\tag{19}$$
In Equation (19), $\hat{y}_t$ represents the point prediction value of the KAN-GRU model, and the error is taken as the true value minus the predicted value.
The prediction intervals generated by the KDE model can visually represent the possible fluctuation range of photovoltaic power, providing a crucial quantitative basis for the power system to cope with the uncertainty of photovoltaic output and formulate robust scheduling plans.
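The interval construction can be sketched with the standard library alone. The example below is a minimal illustration under several assumptions that the paper does not specify: a Gaussian kernel, Silverman's rule-of-thumb bandwidth, errors defined as actual minus predicted, and quantiles found by bisection on the KDE cumulative distribution. The function names and the error values are hypothetical.

```python
import math
import statistics

def silverman_bandwidth(errors):
    """Silverman's rule of thumb: 0.9 * min(std, IQR / 1.34) * n^(-1/5)."""
    n = len(errors)
    std = statistics.stdev(errors)
    q1, _, q3 = statistics.quantiles(errors, n=4)
    iqr = q3 - q1
    spread = min(std, iqr / 1.34) if iqr > 0 else std
    return 0.9 * spread * n ** (-0.2)

def kde_cdf(x, errors, h):
    """CDF of a Gaussian KDE: the average of normal CDFs centred on each error."""
    return sum(0.5 * (1.0 + math.erf((x - e) / (h * math.sqrt(2.0))))
               for e in errors) / len(errors)

def kde_quantile(p, errors, h, tol=1e-9):
    """Invert the KDE CDF by bisection."""
    low, high = min(errors) - 10.0 * h, max(errors) + 10.0 * h
    while high - low > tol:
        mid = (low + high) / 2.0
        if kde_cdf(mid, errors, h) < p:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0

def prediction_interval(point_forecast, errors, alpha=0.1):
    """Shift the point forecast by the lower and upper error quantiles."""
    h = silverman_bandwidth(errors)
    e_low = kde_quantile(alpha / 2.0, errors, h)
    e_high = kde_quantile(1.0 - alpha / 2.0, errors, h)
    return point_forecast + e_low, point_forecast + e_high

# Made-up validation errors (actual minus predicted), in arbitrary power units.
TOY_ERRORS = [-2.0, -1.2, -1.0, -0.5, 0.0, 0.2, 0.5, 1.0, 1.5, 2.0]
toy_lower, toy_upper = prediction_interval(10.0, TOY_ERRORS, alpha=0.1)
```

Because the same error distribution is reused for every time step, the interval width in this sketch is constant; conditioning the errors on weather type or forecast level would be one way to make it adaptive.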
2.8. Research Framework
Based on PV power forecasting, the proposed CPO-RF hybrid model in this study comprises the following two steps:
- (1)
Feature Selection. The CPO model is employed to optimize the number of decision trees and the maximum tree depth of the RF model. Based on the optimized results, feature selection is performed on the PV sequence, with the selected features serving as input variables for prediction.
- (2)
Power Forecasting. The GRU model is enhanced by incorporating a KAN layer to optimize it for PV power forecasting. The KAN-GRU hybrid model is illustrated in
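Figure 2. SHAP analysis and interval prediction are conducted to further demonstrate the rationality of the proposed model.
This study employs Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Square Error (MSE), and the coefficient of determination (R-Square, $R^2$) to evaluate model performance. The error calculation formulas for the predicted value $\hat{y}_i$ and the actual value $y_i$ are as follows:
$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i-y_i\right)^2}$$
$$\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i-y_i\right|$$
$$\mathrm{MSE}=\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i-y_i\right)^2$$
$$R^2=1-\frac{\sum_{i=1}^{N}\left(\hat{y}_i-y_i\right)^2}{\sum_{i=1}^{N}\left(y_i-\bar{y}\right)^2}$$
where $N$ is the number of samples and $\bar{y}$ is the mean of the actual values.
For interval estimation, this study adopts Prediction Interval Coverage Probability (PICP) to evaluate the performance of the model. PICP reflects the probability that the actual power falls within the predicted fluctuation range and can be used to evaluate the reliability of the prediction model. It can be expressed as
$$\mathrm{PICP}=\frac{1}{N}\sum_{i=1}^{N}c_i,\qquad c_i=\begin{cases}1, & y_i\in\left[L_i,U_i\right]\\ 0, & \text{otherwise}\end{cases}$$
where $L_i$ and $U_i$ are the lower and upper bounds of the prediction interval for sample $i$.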
The PV power forecasting process based on the DPCA-CPO-RF-KAN-GRU is illustrated in
Figure 3.
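The evaluation metrics used in this section are simple to compute directly, as the following sketch shows. It uses the standard definitions of RMSE, MAE, MSE, R-Square, and PICP; the function names and the small numeric example are illustrative and are not results from the study.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MSE and R^2 for paired actual and predicted values."""
    n = len(y_true)
    errors = [pred - actual for actual, pred in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)            # sum of squared errors
    mse = sse / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((actual - mean_true) ** 2 for actual in y_true)
    return {"RMSE": math.sqrt(mse), "MAE": mae, "MSE": mse, "R2": 1.0 - sse / ss_tot}

def picp(y_true, lower, upper):
    """Share of actual values that fall inside their prediction interval."""
    hits = sum(1 for actual, lo, hi in zip(y_true, lower, upper) if lo <= actual <= hi)
    return hits / len(y_true)

# Made-up numbers, only to exercise the functions.
toy_metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
toy_picp = picp([1.0, 2.0, 3.0, 4.0], [0.0] * 4, [2.0] * 4)
```

A PICP close to the nominal confidence level indicates reliable intervals; a value far below it means the intervals are too narrow.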
3. Case Analysis
This section verifies the validity of the proposed prediction model through the actual operation data of a photovoltaic power station in a certain area in northwest China. The screening of photovoltaic power prediction factors is carried out. Then, model construction and analysis are carried out to verify the superiority of this prediction model over conventional prediction models.
3.1. Data Sources and Processing
The experimental data were collected from a 50 MW PV power station located in the eastern Xinjiang Uygur Autonomous Region, China, within a general geographic coordinate range of 42–43° N latitude and 94–95° E longitude. The data span the period from 1 January 2022 to 31 December 2022, recorded at 15 min intervals. The region is characterized by a typical temperate continental arid climate, featuring abundant sunshine hours, intense solar radiation, low annual precipitation, and significant diurnal temperature fluctuations. However, it is also occasionally subject to dust events and rapid weather changes. These conditions create an environment with high photovoltaic potential alongside notable meteorological variability, presenting a suitable scenario for validating the short-term forecasting robustness of the proposed model.
Data preprocessing consists of three steps:
- (1)
Data cleaning: Abnormal data such as missing values, duplicate records, and outliers were eliminated. Given the continuity of meteorological data over short periods, missing and abnormal values were filled with the average of the adjacent values before and after them. A total of 35,040 valid data points were obtained.
- (2)
Feature normalization: To eliminate scale discrepancies among features and expedite model convergence, all input features were normalized to the interval [0, 1] via the Min-Max scaling method.
- (3)
Dataset splitting: To ensure a realistic assessment of temporal prediction performance, the dataset was split in chronological order. Specifically, the first 70% of the sequential data is assigned to the training set, the subsequent 10% serves as the validation set for hyperparameter optimization, and the final 20% is used as the test set to evaluate the final model performance.
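The normalization and splitting steps can be sketched as follows. The split ratios and the sample count come from the text; taking the scaling bounds from the training portion only is an assumption made here to avoid leaking information from later data, since the paper does not say which portion the bounds were computed on. The function names are hypothetical.

```python
def chronological_split(n_samples, train_pct=70, val_pct=10):
    """Return (train, validation, test) index ranges in time order, without shuffling."""
    # Integer arithmetic avoids floating-point rounding on the boundaries.
    n_train = n_samples * train_pct // 100
    n_val = n_samples * val_pct // 100
    return (range(0, n_train),
            range(n_train, n_train + n_val),
            range(n_train + n_val, n_samples))

def min_max_scale(values, low, high):
    """Map values to [0, 1] using the given bounds (bounds assumed to come from training data)."""
    span = high - low
    if span == 0:
        return [0.0 for _ in values]
    return [(v - low) / span for v in values]

# One year of 15 min samples, as in the case study: 365 days x 96 points per day.
train_idx, val_idx, test_idx = chronological_split(35040)

# Made-up feature column, scaled with bounds taken from its first three values.
toy_column = [2.0, 4.0, 6.0, 5.0]
toy_scaled = min_max_scale(toy_column, low=min(toy_column[:3]), high=max(toy_column[:3]))
```

With bounds taken from the training portion, values in the validation or test sets can fall slightly outside [0, 1], which is harmless for the models considered here.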
3.2. Density Peak Clustering Algorithm Analysis
In this study, variables including module temperature, ambient temperature, pressure, humidity, total solar radiation, direct solar radiation, diffuse solar radiation, and PV power were employed as inputs to the DPCA model for the purpose of weather pattern clustering analysis. According to the method described above, the clustering results are shown in
Figure 4. In
Figure 4, the horizontal axis represents the density value and the vertical axis represents the relative distance. As shown in
Figure 4, there are three points with relatively large density and relative-distance values, which can be taken as the cluster centers, so the number of clusters is 3. Therefore, weather conditions can be classified into three categories. Among them, the red dots are Weather Type 0, the green dots are Weather Type 1, and the blue dots are Weather Type 2.
The power generation curves of each weather category over four consecutive days and the average vectors of weather features after clustering are shown in
Figure 5 and
Table 2, respectively. Each time unit in
Figure 5 corresponds to 15 min, resulting in 384 total time points.
Figure 5 presents the blue, green, and red curves denoting Weather 0, Weather 1, and Weather 2, respectively. Among the indicators in
Table 2, total solar radiation is the total solar radiant power incident per unit area on a horizontal surface. Direct solar radiation measures the solar radiant power per unit area received on a surface oriented normal to the direction of the sun’s direct beam. Diffuse solar radiation quantifies the solar radiant power per unit area incident on a horizontal surface from the entire sky hemisphere, excluding the direct beam from the sun. The results show that Weather Type 2 exhibits a relatively high module temperature and the greatest total irradiance. This is accompanied by high direct and diffuse radiation, resulting in the maximum average power output. It represents ideal clear-sky conditions with minimal cloud cover. In contrast, Weather Type 1 has the lowest average power generation and solar radiation. Its temperature and radiation values are low, while humidity is high, making it unfavorable for PV system operation. Weather Type 0 is also favorable and shows a high average power output. It is characterized by relatively long sunshine duration, high temperature, and strong radiation.
3.3. Feature Selection Based on CPO-RF
In this study, data from the training set are used for feature selection. The dataset contains seven potential features influencing PV power: module temperature, temperature, pressure, humidity, total solar radiation, direct solar radiation, and diffuse solar radiation. This study employs the CPO algorithm to optimize the number of decision trees and tree depth within the Random Forest model. Initial parameters are set as follows: population size = 30, maximum iterations = 20, tree count search range = [50, 100], and tree depth search range = [10, 30]. After iterative optimization, the optimal parameters were determined to be 86 trees and a depth of 24. These parameters were then applied to screen the features. The optimization process of the CPO-RF hybrid model is illustrated in
Figure 6.
During the feature screening stage, a small number of representative key features were selected based on the importance assessment of each feature.
The importance levels of the seven potential influencing factors are shown in
Table 3. To reduce input redundancy in the model and enhance the predictive efficiency and effectiveness of the hybrid model, this study selects meteorological features with importance levels exceeding 4 as input variables for the photovoltaic forecasting model.
The key findings of this analysis are summarized as follows. Total solar radiation, the sum of the direct and diffuse components, is the core energy source for PV generation: direct radiation sharply boosts power on clear days, while diffuse radiation sustains a baseline output on overcast days. Module temperature has a conditional correlation with PV power: moderate temperatures support stable output under sufficient radiation, whereas temperatures above a critical value slightly reduce module efficiency and cause a drop in power.
To validate the effectiveness of the feature selection, this study conducted predictive experiments using the GRU model with different input feature sets, constructed from different thresholds of feature importance. Type 1 comprises the variables whose absolute importance is greater than 0, Type 2 those greater than 2, Type 3 those greater than 4, and Type 4 those greater than 8. The combination mode and prediction effect of the predictors are shown in
Table 4 and
Figure 7.
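The construction of the four input sets can be sketched as a simple threshold filter on absolute importance. The importance scores below are placeholders invented for illustration, not the values in Table 3; they are chosen only so that the threshold of 4 reproduces the four inputs used later (module temperature and the three radiation features). The membership of the other sets is therefore hypothetical.

```python
def select_features(importance, threshold):
    """Return the features whose absolute importance exceeds the threshold."""
    return [name for name, score in importance.items() if abs(score) > threshold]


# Placeholder scores for illustration only (NOT the values reported in Table 3).
importance = {
    "module_temperature": 5.0,
    "temperature": 3.0,
    "pressure": 0.5,
    "humidity": 2.5,
    "total_solar_radiation": 9.0,
    "direct_solar_radiation": 8.5,
    "diffuse_solar_radiation": 4.5,
}

# Thresholds for the four input combinations, as given in the text.
THRESHOLDS = {"Type 1": 0, "Type 2": 2, "Type 3": 4, "Type 4": 8}

feature_sets = {label: select_features(importance, t)
                for label, t in THRESHOLDS.items()}
```

Raising the threshold shrinks the input set monotonically, which is why Type 4 is always a subset of Type 3.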
In
Figure 7, the PV power curves predicted with the different input combinations are shown in different colors. The Type 3 combination selected by the CPO-RF model gives the highest accuracy, indicating that CPO-RF feature selection helps the prediction model. The accuracy obtained with the Type 1 and Type 2 inputs is limited: too many weakly relevant variables introduce redundant information that interferes with training and reduces accuracy. The Type 4 combination also fails to achieve high accuracy, because the few retained variables carry too little information for the model.
In addition,
Table 5 presents the prediction accuracy when different influencing factors are used as inputs. The results show that the model achieves an RMSE of 3.70, an MAE of 3.05, and an R² of 95.70% for Weather 0; an RMSE of 3.55, an MAE of 2.89, and an R² of 94.77% for Weather 1; and an RMSE of 3.57, an MAE of 2.92, and an R² of 96.31% for Weather 2. These indices reflect higher prediction accuracy than the other types, confirming the effectiveness of the CPO-RF model screening.
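For reference, the three error indices quoted above can be computed as follows. This is a minimal sketch assuming the standard textbook definitions; the sample values are invented and unrelated to the paper's data.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Invented sample values for illustration.
actual = [0.0, 10.0, 20.0, 30.0]
predicted = [1.0, 9.0, 21.0, 29.0]

scores = {
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "R2_percent": 100.0 * r2(actual, predicted),  # the text reports R² as a percentage
}
```

RMSE penalizes large deviations more heavily than MAE, so reporting both gives a rough indication of whether the errors are dominated by a few large misses.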
3.4. Model Construction and Analysis
To better evaluate the performance of the proposed hybrid PV power forecasting model, this study compares the KAN-GRU model with commonly used PV power forecasting models on the test set. The input features are module temperature X1, total solar radiation X5, direct solar radiation X6, and diffuse solar radiation X7. The comparison models are GRU, KAN-TCN, TCN, BiLSTM, and KAN-BiLSTM, all configured with two layers and a hidden-layer dimension of eight. The combined prediction model described in this study is implemented in a Python 3.7 (Python Software Foundation, Wilmington, DE, USA) environment. This paper analyzes the prediction results for three consecutive days in the test set.
Figure 8,
Figure 9 and
Figure 10 show the prediction results under different weather types.
Table 5 shows the prediction error results of each model under different weather types.
As shown in
Table 5, the KAN-GRU model achieved the best RMSE, MSE, MAE, and R² among the listed models for PV power forecasting, while the GRU, KAN-TCN, TCN, BiLSTM, and KAN-BiLSTM models performed comparatively worse. This advantage primarily stems from the flexibility and robust nonlinear function approximation capabilities of the KAN network, coupled with the GRU structure's efficiency in processing sequential data. By integrating the learnable activation functions of the KAN network with the gating mechanism of GRU, the KAN-GRU model effectively captures the complex dynamic characteristics in PV power data, thereby achieving a significant improvement in prediction accuracy.
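The gating mechanism referred to above can be illustrated with a single GRU update for a scalar input and a scalar hidden state. This sketch assumes one common textbook formulation; the paper's own notation may differ (some references swap the roles of z and 1 − z). The parameter names and values are arbitrary, and the KAN part of the hybrid model is not covered here.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, params):
    """One GRU update for a scalar input x and scalar hidden state h_prev."""
    # Update gate: how much of the candidate state replaces the old state.
    z = sigmoid(params["wz"] * x + params["uz"] * h_prev + params["bz"])
    # Reset gate: how much of the old state feeds the candidate state.
    r = sigmoid(params["wr"] * x + params["ur"] * h_prev + params["br"])
    # Candidate state computed from the input and the reset-scaled old state.
    h_cand = math.tanh(params["wh"] * x + params["uh"] * (r * h_prev) + params["bh"])
    # Blend the old and candidate states.
    return (1.0 - z) * h_prev + z * h_cand


# Arbitrary illustrative parameters (not trained values).
params = {k: 0.5 for k in ("wz", "uz", "bz", "wr", "ur", "br", "wh", "uh", "bh")}

h = 0.0
for x in [0.1, 0.4, 0.9]:  # a short, made-up input sequence
    h = gru_step(x, h, params)
```

Because the new state is a convex combination of the old state and a tanh output, the hidden state stays within (−1, 1) when it starts there.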
3.5. Model Explainability
To investigate the influence of input features on PV power prediction models, this study employs the SHAP method to interpret the DPCA-CPO-RF-KAN-GRU model.
Figure 11 shows the effect of each variable on the predicted value of PV power, where (i)~(k) reflect the degree of influence of the variables and (l)~(n) reflect the positive or negative correlation.
Subfigures (a)~(c) intuitively quantify the influence intensity of each feature via mean absolute SHAP values, with higher values indicating greater impacts on predictions. Total solar radiation and direct solar radiation have relatively large numerical impacts on final predictions, with their combined average SHAP value contribution exceeding 70%. In contrast, diffuse solar radiation and module temperature exert smaller influences, with their combined average contribution below 20%.
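The contribution shares discussed above can be obtained by normalizing the mean absolute SHAP value of each feature by the total over all features. A minimal sketch follows; the per-sample SHAP values are invented for illustration and the function name is an assumption.

```python
def mean_abs_shap_shares(shap_values):
    """Map each feature to its share of the total mean |SHAP| (shares sum to 1)."""
    mean_abs = {name: sum(abs(v) for v in vals) / len(vals)
                for name, vals in shap_values.items()}
    total = sum(mean_abs.values())
    return {name: m / total for name, m in mean_abs.items()}


# Invented per-sample SHAP values (three samples per feature).
shap_values = {
    "total_solar_radiation": [6.0, -2.0, 4.0],
    "direct_solar_radiation": [3.0, -1.0, 5.0],
    "diffuse_solar_radiation": [0.5, 0.5, 1.0],
    "module_temperature": [-0.5, 1.0, 0.5],
}

shares = mean_abs_shap_shares(shap_values)
combined_radiation_share = (shares["total_solar_radiation"]
                            + shares["direct_solar_radiation"])
```

Taking absolute values before averaging prevents positive and negative contributions from cancelling, so the share measures the strength of influence rather than its direction.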
Subfigures (d)~(f) clearly reveal the coupling patterns between each feature and power output, with feature values distinguished by color—red for high values and blue for low values. As core energy inputs, total solar radiation and direct solar radiation show data points mostly concentrated on the right side of the horizontal axis, with positive SHAP values. The higher the feature value (darker red), the larger the SHAP value. These two features also exhibit significant SHAP value dispersion, ranging from −5 to 20, indicating that small changes in radiation intensity can trigger obvious fluctuations in PV power. All data points of diffuse solar radiation cluster in a weak positive region near SHAP value 0, with minimal fluctuations across different feature values. This reflects its stabilizing role as an energy supplement in low-radiation scenarios, maintaining the baseline power output despite mild influence. Module temperature data points fluctuate slightly around SHAP value 0, with no obvious bias between high-value (red) and low-value (blue) points. This indicates that PV modules operate stably within the optimal temperature range, and temperature changes have a mild, nonlinear impact on power output.
3.6. Interval Prediction Analysis
To further illustrate the interval prediction performance of the KAN-GRU model, this paper employs KDE for interval prediction and visualization analysis of PV power data. KDE is performed on the first 192 values of the forecast set, generating error interval results. Interval predictions are then extended to the subsequent 96 data points covering the following day. The KDE estimation results under three weather types are shown in
Figure 12. Concurrently, interval estimates were generated at different confidence levels (80%, 90%, 95%). The upper and lower bounds of the error intervals, along with their widths, are presented in
Table 6.
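The KDE-based interval construction can be sketched as follows: fit a Gaussian-kernel density to historical forecast errors, invert its cumulative distribution at the tail probabilities, and shift the point forecast by the resulting error quantiles. The bandwidth rule (normal-reference rule of thumb), the error sign convention (actual minus predicted), and the sample residuals are assumptions of this sketch, not details taken from the paper.

```python
import math
import statistics


def kde_cdf(x, errors, bandwidth):
    """CDF of a Gaussian-kernel density estimate, evaluated at x."""
    scale = bandwidth * math.sqrt(2.0)
    return sum(0.5 * (1.0 + math.erf((x - e) / scale)) for e in errors) / len(errors)


def kde_quantile(q, errors, bandwidth, tol=1e-8):
    """Invert the KDE CDF by bisection."""
    lo = min(errors) - 10.0 * bandwidth
    hi = max(errors) + 10.0 * bandwidth
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, errors, bandwidth) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rule_of_thumb_bandwidth(errors):
    # Normal-reference rule of thumb (an assumption of this sketch).
    return 1.06 * statistics.stdev(errors) * len(errors) ** (-1 / 5)


def error_interval(errors, confidence):
    """Lower and upper error quantiles for a central interval."""
    bandwidth = rule_of_thumb_bandwidth(errors)
    alpha = 1.0 - confidence
    return (kde_quantile(alpha / 2, errors, bandwidth),
            kde_quantile(1 - alpha / 2, errors, bandwidth))


# Hypothetical residuals (actual minus predicted) from a fitting window.
errors = [-3.0, -1.5, -0.5, 0.0, 0.2, 0.8, 1.1, 2.4]

lower, upper = error_interval(errors, 0.95)
point_forecast = 40.0  # hypothetical point prediction
interval = (point_forecast + lower, point_forecast + upper)
```

A higher confidence level pushes both quantiles further into the tails, which is why the interval widths grow from the 80% to the 95% level.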
Figure 13 clearly displays the 95% confidence interval prediction results of KAN-GRU across 96 time points throughout the day under three weather types. The figure reveals that the predicted power values closely align with actual measurements. Particularly during critical phases of PV power output ramp-up and ramp-down, the model demonstrates exceptional fitting capability.
Table 7 presents the interval prediction coverage results at different confidence levels across the three weather types. It is evident that the proposed model achieves over 90% interval coverage at the 95% confidence level.
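Interval coverage and width can be computed directly from the actual values and the interval bounds. The sketch below assumes coverage is the fraction of actual values that fall inside their intervals and width is the mean distance between bounds; the numbers are invented.

```python
def coverage_and_width(actual, lower, upper):
    """Return (coverage fraction, mean interval width)."""
    hits = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(actual)
    return hits / len(actual), mean_width


# Invented values: the third observation falls outside its interval.
actual = [10.0, 20.0, 30.0, 25.0]
lower = [8.0, 18.0, 31.0, 20.0]
upper = [12.0, 22.0, 35.0, 28.0]

coverage, width = coverage_and_width(actual, lower, upper)
```

The two quantities should be read together: very wide intervals achieve high coverage trivially, so a good interval forecast combines high coverage with a narrow width.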
As shown in
Figure 13 and
Table 7, Weather 0 exhibits a narrower interval, with actual coverage exceeding 90% at all three confidence levels. This indicates that the proposed model predicts robustly under normal weather conditions, maintaining high accuracy even within a narrow range. In contrast, the interval widths for Weather 1 and Weather 2 increase. This reflects the model's ability to adapt to weather fluctuations by adjusting the upper and lower bounds to accommodate complex meteorological conditions. Under the highly volatile Weather 2 scenario, the model achieves 95.58% coverage at the 95% confidence level. This demonstrates the robust nonlinear feature extraction capability of the proposed model, which ensures that the prediction interval covers power fluctuations even under variable weather conditions.
In summary, the proposed model excels in point forecasting, and also demonstrates high robustness in interval prediction. It provides prediction intervals with high coverage across different weather types, successfully quantifying the uncertainty of PV output. The model’s interval prediction offers robust technical support for ensuring grid stability and enabling precise decision-making in power markets.
4. Conclusions
To address the intermittent and uncertain fluctuations inherent in PV power generation, this paper proposes a hybrid forecasting model based on DPCA-CPO-RF-KAN-GRU. The DPCA is first used to classify the weather conditions. Then, the CPO-RF model is employed for data dimensionality reduction, followed by the KAN-GRU for PV power forecasting. Key findings include the following:
- (1)
The DPCA effectively classifies weather conditions into three distinct types using meteorological and PV power data. Verified by clustered feature vectors, the three weather types differ significantly in meteorological factors and photovoltaic power. This confirms the rationality of the clustering and provides a basis for forecasting specific weather types.
- (2)
The CPO-RF optimizes hyperparameters to screen key features. It selects the features most closely associated with PV output as input variables, thereby reducing prediction complexity and improving forecasting accuracy.
- (3)
Compared with other deep learning models, the CPO-RF-KAN-GRU hybrid forecasting model exhibits notable performance advantages in PV power prediction. KAN has strong nonlinear approximation capabilities, and GRU is highly efficient in processing time-series data. Therefore, the combined model can effectively capture the complex dynamic characteristics of PV power data. The model demonstrates a strong ability to reduce prediction errors, with an average R² as high as 97%. Case studies demonstrate that the proposed model substantially outperforms the comparative models in prediction accuracy, confirming its effectiveness and superiority for PV power forecasting. Analysis of the SHAP values indicates that total solar radiation and direct solar radiation together account for more than 70% of the average SHAP contribution.
- (4)
In terms of interval prediction, the proposed model achieves over 90% interval coverage under all three weather conditions, demonstrating strong prediction robustness. Stable interval prediction results enable power grid companies to effectively mitigate the uncertainty risks associated with photovoltaic output and enhance operational efficiency.
In the future, with advancements in artificial intelligence technology, AI-based PV power forecasting models will continue to evolve. In data processing, AI can select features more relevant to PV power output based on data characteristics, intelligently handle anomalous data, and enhance forecasting efficiency. Meanwhile, AI can learn and adapt to geographical and climatic features within the data, thereby optimizing model parameter settings to achieve superior forecasting performance. This will significantly improve the reusability and generalization capabilities of PV power forecasting models. These advancements will enhance the accuracy, robustness, and real-time performance of renewable energy power prediction, while providing intelligent tools for optimizing renewable energy trading, strengthening the flexibility and stability of power systems, and facilitating the high-penetration integration of renewable energy.