Article

Short-Term Photovoltaic Power Prediction Using a DPCA–CPO–RF–KAN–GRU Hybrid Model

1 School of Economics and Management, Lanzhou University of Technology, Lanzhou 730050, China
2 China Electric Power Research Institute Co., Ltd., Beijing 100089, China
3 State Grid Jiangsu Marketing Service Center (Metrology Center), Nanjing 210026, China
* Author to whom correspondence should be addressed.
Processes 2026, 14(2), 252; https://doi.org/10.3390/pr14020252
Submission received: 3 November 2025 / Revised: 25 December 2025 / Accepted: 26 December 2025 / Published: 11 January 2026
(This article belongs to the Section Energy Systems)

Abstract

In photovoltaic (PV) power generation, the intermittency and uncertainty caused by meteorological factors pose challenges to grid operation. Accurate PV power prediction is crucial for optimizing power dispatch and balancing supply and demand. This paper proposes a PV power prediction model that combines the Density Peak Clustering Algorithm (DPCA), the Crested Porcupine Optimizer (CPO), Random Forest (RF), the Gated Recurrent Unit (GRU), and the Kolmogorov–Arnold Network (KAN). First, DPCA classifies weather conditions according to meteorological data such as solar radiation, temperature, and humidity. Then, CPO optimizes the hyperparameters of the RF model, which screens the characteristic input variables. Subsequently, a hybrid GRU model with a KAN layer is introduced for short-term PV power prediction. Shapley Additive Explanations (SHAP) values are used to evaluate feature importance and the impact of causal features. Compared with benchmark models, the DPCA-CPO-RF-KAN-GRU model yields lower prediction errors under all three weather types, with an average coefficient of determination (R2) of 97%. SHAP analysis indicates that total solar radiation and direct solar radiation together account for more than 70% of the mean SHAP value. Finally, Kernel Density Estimation (KDE) is used to verify that the KAN-GRU model is highly robust in interval prediction, providing strong technical support for grid stability and precise decision-making in the electricity market.

1. Introduction

Fossil fuels can no longer meet the demands of sustainable development. Accelerating the energy transition and building a new power system centered on renewable energy have become key measures for achieving the strategic goals of “Carbon Peaking” and “Carbon Neutrality”. Among these, solar energy, with its abundant resources, environmental friendliness, and technological maturity, has emerged as one of the primary sources of renewable energy production globally [1,2]. China’s PV installed capacity has experienced rapid growth in recent years. China’s photovoltaic installed capacity and power generation costs over the past decade are shown in Figure 1. By 2024, the national PV installed capacity reached approximately 889 GW and the photovoltaic power generation reached 838.3 TWh. Consequently, PV power generation is a vital component of clean energy supply. Meanwhile, the cost of electricity generation from PV units has decreased year by year, enhancing the economic competitiveness of clean energy. This trend is driving energy structure transformation, reducing electricity consumption costs, and accelerating progress toward Carbon Neutrality goals.
However, PV power generation depends heavily on meteorological conditions such as solar radiation intensity, temperature, and humidity, which are inherently unstable [3]. Consequently, its output power exhibits pronounced intermittency and poor dispatchability [4]. During peak demand periods, the unpredictability of PV generation may cause imbalances between power supply and demand, potentially disrupting normal grid operation. This increases the complexity of power system dispatch and poses challenges to grid stability and the reliability of electricity supply. As renewable energy sources take an increasingly large share of electricity market transactions, the uncertainty of PV power generation will lead to more frequent price fluctuations, posing a severe challenge to the future nationwide unified spot electricity market.
Accurate short-term PV power prediction has become a critical research focus to address these challenges. It acts as a key tool for power system operators to anticipate PV generation fluctuations, thereby optimizing power dispatch. Integrating improved PV power forecasting with other balancing resources including smart grids, energy storage systems such as batteries and pumped hydro, and demand-side management significantly facilitates supply–demand balancing and ensures stable grid operation. Concurrently, precise PV generation prediction results empower PV power generation enterprises to refine their bidding strategies in electricity markets. By accurately estimating future generation, they can formulate more rational clearing/bidding plans in day-ahead and intraday markets, avoiding risk and deviation penalties while maximizing profits. The core of PV power prediction lies in predicting generation output over a specific period based on meteorological data and historical power records. Traditional PV prediction methods primarily fall into three categories: physical models [5], statistical models [6], and machine learning models [7,8].
Physical models calculate PV power generation by modeling solar radiation propagation patterns using meteorological data and physical laws. However, due to the complexity of PV systems and variable meteorological conditions, these models typically require numerous parameters, demand high-quality input data, and involve significant computational complexity, presenting challenges in practical applications. Statistical models predict PV power generation by identifying patterns in historical data. Common statistical methods include Autoregressive Integrated Moving Average (ARIMA) models [9], Support Vector Machines (SVMs) [10], and neural networks [11]. While statistical models can provide relatively accurate predictions under certain conditions, their reliance on linear relationships and strong assumptions limits their predictive accuracy when confronted with nonlinear, non-stationary, and highly variable meteorological conditions.
In recent years, with the rapid advancement of machine learning and deep learning, data-driven PV power prediction methods [12] have gradually become mainstream. In particular, deep learning models such as convolutional neural networks (CNNs) [13], recurrent neural networks (RNNs) [14], and long short-term memory (LSTM) networks [15] have been extensively applied in PV power prediction because they handle large-scale data and capture temporal features well. LSTM has become one of the most prevalent deep learning models in PV power forecasting because it mitigates the vanishing and exploding gradient problems that traditional recurrent networks encounter on long sequences. Numerous researchers have employed LSTM for short-term PV power forecasting with promising results. For instance, Gao et al. [16] compared the errors of three LSTM-based power forecasting methods to evaluate the role of operational weather forecasts in PV power forecasting. However, LSTM may still struggle to capture dependencies in very long sequences and is computationally inefficient on large-scale data. Consequently, increasing attention has shifted toward applying the GRU model to PV power forecasting. As a variant of LSTM [1], GRU simplifies the architecture while retaining strong performance [17]. In addition, to better account for the complex spatio-temporal correlations in PV power output and to estimate the probability and fluctuation range of the prediction results, further research has introduced the attention mechanism [18] and adopted advanced architectures such as improved CNNs [19].
Ref. [20] shows that the Informer model, through its ProbSparse attention mechanism and multi-scale feature fusion, significantly reduces the computational complexity of long sequences while capturing the key temporal dependencies and multi-scale fluctuation features of PV power sequences, substantially improving prediction accuracy. Other researchers have noted that the temporal convolutional network not only extracts the temporal features of the input sequence efficiently, providing more effective representations for subsequent prediction models [21], but also, through its parallel computation and dilated causal convolutions, captures long-term dependencies in PV power sequences, improving prediction stability and accuracy [22].
While GRU exhibits lower computational costs during training, PV power forecasting involves complex meteorological conditions and nonlinear relationships. A single GRU model may struggle to fully capture these intricate features. To overcome this limitation, researchers have combined optimization algorithms with GRU to enhance prediction accuracy. For instance, Wang et al. [23] adopted a Red-tailed Hawk (RTH) algorithm-optimized GRU for PV power forecasting under different weather conditions and demonstrated its feasibility in PV power generation by comparing it with common models. Chen Qingming et al. [24] developed a GRU-based PV power forecasting model using the Grey Wolf Optimization (GWO) algorithm, demonstrating superior performance compared to LSTM. Li Sheng et al. [25] optimized GRU parameters using the Pelican Optimization Algorithm (POA) for PV power forecasting, with results similarly demonstrating POA-GRU’s superior power prediction capability for PV stations. Comparative analysis of relevant literature is presented in Table 1.
Although optimization algorithms such as POA and GWO effectively enhance prediction accuracy when combined with GRU, the resulting models have limitations in handling complex nonlinear relationships: they struggle to capture long-term dependencies and dynamic features within PV power sequences and generalize poorly. The Kolmogorov–Arnold Network (KAN) [26] offers a promising solution. KAN is built on the Kolmogorov–Arnold representation theorem, which states that any continuous multivariate function on a bounded domain can be represented exactly by a finite composition of continuous univariate functions and addition; this gives the model inherent advantages in approximating complex nonlinear functions [27]. Combining the GRU model with KAN's nonlinear approximation capability enables more effective capture of complex features and long-term dependencies in time series, reducing computational burden and enhancing prediction accuracy. For instance, Su et al. [28] employed GRU-KAN to predict high-speed aircraft trajectories, and Ma Feihu et al. [29] employed GRU-KAN to predict shared-bicycle usage around subway stations; both studies reported significant accuracy improvements after incorporating KAN. However, these models lack adequate factor screening, which increases training costs.
In addition, probabilistic forecasting has become a research hotspot in PV power prediction, as it better captures the uncertainty of PV power compared to point forecasts. For instance, Niu proposed a probabilistic forecasting model based on KDE using point forecast results, providing support for the secure dispatch of power systems [30].
In summary, to address these research gaps, this paper proposes a hybrid PV power prediction model integrating the Crested Porcupine Optimizer–Random Forest (CPO-RF) and GRU-KAN. First, DPCA classifies weather conditions. Then, the CPO-RF model performs feature screening on the PV power time series data. Subsequently, a hybrid GRU model incorporating a KAN layer is employed to forecast PV power generation, and SHAP values are used to evaluate feature importance and the impact of causal features. Finally, KDE is adopted to evaluate the interval forecasting performance of the KAN-GRU model.
The rest of this article is organized as follows. Section 2 introduces the principles of the components of the combined prediction model, namely DPCA, CPO, RF, KAN, GRU, SHAP, and KDE, discusses the research framework, and proposes a combined photovoltaic power prediction model based on DPCA-CPO-RF-KAN-GRU. Section 3 presents a case study and compares the proposed model with common prediction models to verify its superiority. Finally, conclusions are drawn in Section 4.

2. Methodology

This section first introduces the basic principles of the Density Peak Clustering Algorithm. Next, it introduces the Random Forest model and the Crested Porcupine Optimizer and clarifies how CPO is used to tune the RF model. Then, it describes the structure of the KAN layer and the GRU model, which provides the theoretical basis for GRU-based prediction. Finally, it introduces the SHAP and KDE methods and the overall research framework.

2.1. DPCA Model

The Density Peak Clustering Algorithm [31,32,33] is a new clustering algorithm proposed in recent years, which identifies non-spherical clusters based on the distances between data points and can automatically determine cluster centers and the number of clusters. In short-term PV power forecasting, historical power data and corresponding meteorological variables exhibit distinct clustering characteristics due to different weather conditions. Directly inputting mixed weather-condition data into a prediction model increases the complexity of nonlinear fitting and reduces forecasting accuracy. Consequently, this study employs DPCA to cluster historical PV and meteorological data: each cluster center corresponds to a typical weather pattern, and data under similar weather conditions are grouped into the same cluster. Subsequent prediction models can then learn the mapping between features and power output separately for each cluster, thereby mitigating interference from heterogeneous weather data and improving the robustness of predictions.
The algorithm rests on two assumptions about cluster centers: (1) a cluster center has a high local density; (2) a cluster center is relatively far from any data point with a higher local density. The model is introduced as follows:
Let the dataset to be clustered be $X = \{x_1, x_2, \ldots, x_N\}$ with index set $I_X = \{1, 2, \ldots, N\}$, and let $d_{ij}$ denote the distance between data points. For each data point $x_i \in X$, two quantities must be calculated: the local density $\rho_i$ and the distance $\delta_i$. The local density $\rho_i$ is given by Equation (1).
$$\rho_i = \sum_{j \in I_X \setminus \{i\}} \exp\left[-\left(\frac{d_{ij}}{d_c}\right)^2\right] \tag{1}$$
In Equation (1), $d_{ij}$ is the distance between data points $x_i$ and $x_j$, and $d_c$ is the cutoff (truncation) distance. Because Equation (1) uses a Gaussian kernel, $\rho_i$ is a smoothly weighted count of the points around $x_i$: points much closer than $d_c$ contribute nearly 1, whereas points much farther away contribute nearly 0, so $\rho_i$ approximates the number of points in $X$ lying within $d_c$ of $x_i$. For large datasets, the algorithm is robust to the choice of the cutoff distance.
Because the Gaussian kernel yields continuous values, different data points are unlikely to share the same local density. Let $\{q_i\}_{i=1}^{N}$ denote the indices that sort the local densities $\{\rho_i\}_{i=1}^{N}$ in descending order, so that
$$\rho_{q_1} \ge \rho_{q_2} \ge \cdots \ge \rho_{q_N} \tag{2}$$
The distance $\delta_i$ is defined by Equation (3).
$$\delta_{q_i} = \begin{cases} \min\limits_{q_j,\ j<i} d_{q_i q_j}, & i \ge 2 \\ \max\limits_{j \ge 2} \delta_{q_j}, & i = 1 \end{cases} \tag{3}$$
That is, the $\delta$ of each point other than the densest is its distance to the nearest point of higher density, while the densest point is assigned the largest $\delta$ among all other points. Using Equations (1)–(3), the pair $(\rho_i, \delta_i)$ can be computed for every $x_i \in X$. Plotting all data points in the $\rho$–$\delta$ plane yields the decision graph, in which points with both large $\rho$ and large $\delta$ are identified as cluster centers.
After the cluster centers are determined, each remaining data point is assigned to the same cluster as its nearest neighbor of higher density.
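To make Equations (1)–(3) and the assignment rule concrete, a minimal pure-Python sketch of density peak clustering follows. It is an illustration under stated assumptions, not the authors' implementation: Euclidean distance, the cutoff `d_c`, and the rule of taking the `n_centers` points with the largest product $\rho \cdot \delta$ as centers (instead of picking them visually from the decision graph) are all assumptions.

```python
import math


def density_peaks(points, d_c, n_centers):
    """Density peak clustering with a Gaussian-kernel density, Eqs. (1)-(3)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Eq. (1): local density of each point
    rho = [sum(math.exp(-(dist[i][j] / d_c) ** 2) for j in range(n) if j != i) for i in range(n)]
    # Eq. (2): indices ordered by descending density
    order = sorted(range(n), key=lambda i: rho[i], reverse=True)
    delta, nearest_higher = [0.0] * n, [None] * n
    for pos in range(1, n):
        i = order[pos]
        # Eq. (3), i >= 2: distance to the closest point ranked denser
        j = min(order[:pos], key=lambda k: dist[i][k])
        delta[i], nearest_higher[i] = dist[i][j], j
    # Eq. (3), i = 1: the densest point takes the largest delta of the others
    delta[order[0]] = max((delta[k] for k in order[1:]), default=0.0)
    # Decision graph (assumed rule): the n_centers largest rho * delta are centers
    centers = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)[:n_centers]
    if order[0] not in centers:  # the densest point must lead its own cluster
        centers[-1] = order[0]
    labels = [None] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Assign the rest, densest first, to the cluster of their nearest denser point
    for i in order:
        if labels[i] is None:
            labels[i] = labels[nearest_higher[i]]
    return rho, delta, centers, labels


points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1)]
rho, delta, centers, labels = density_peaks(points, d_c=0.5, n_centers=2)
```

With two well-separated groups, the sketch should return one center per group. The full distance matrix makes this form O(N²) in memory, so it only suits small samples; a year of 15 min records would need a vectorized or approximate variant.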

2.2. RF Model

The Random Forest (RF) algorithm is an ensemble learning technique based on decision trees [34,35,36]. It improves generalization and mitigates overfitting by introducing randomness during training, and it can also evaluate feature importance. This study employs the RF model to identify the critical factors affecting PV power sequences among meteorological and environmental variables. Selecting the factors with higher importance as inputs can improve prediction accuracy and reduce interference from irrelevant features. During factor screening, the Gini index is used to evaluate the influence of each meteorological indicator on PV power.
The procedure for calculating the importance of each feature in a PV dataset with features $(x_1, x_2, \ldots, x_n)$ is as follows:
(1)
Bootstrap resampling is used to draw K sample sets from the initial dataset, and one regression tree is grown on each, giving K trees. The samples not drawn in each round form the corresponding out-of-bag (OOB) dataset, so K OOB datasets are obtained.
(2)
When the dataset contains n features, m (m < n) features are randomly selected at each node as the candidate feature set, and the best feature in this candidate set is chosen to split the node.
(3)
Feature importance is assessed using the Gini index, calculated as follows:
$$I_{Gini,m} = 1 - \sum_{k=1}^{|K|} p_{mk}^2 \tag{4}$$
In Equation (4), $I_{Gini,m}$ is the Gini index at node $m$, $p_{mk}$ is the proportion of category $k$ at node $m$, and $|K|$ is the total number of categories.
The importance of feature $x_j$ at node $m$ is given by Equation (5):
$$VIM_{jm}^{(Gini)} = I_{Gini,m} - I_{Gini,l} - I_{Gini,r} \tag{5}$$
In Equation (5), $VIM_{jm}^{(Gini)}$ is the decrease in the Gini index when node $m$ is split on feature $x_j$, and $I_{Gini,l}$ and $I_{Gini,r}$ are the Gini indices of the two child nodes $l$ and $r$ produced by the split. Summing this decrease over all nodes split on $x_j$ in all trees gives the overall Gini importance of $x_j$.
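The bootstrap/OOB step (1) and Equations (4) and (5) can be sketched in plain Python as follows. The function names are hypothetical. Equation (5) is implemented as written (unweighted); the optional `weighted` flag, which scales each child's Gini index by its share of the parent's samples as many RF libraries do, is an added assumption for comparison. Equation (4) operates on class labels; for a regression forest predicting PV power, the analogous impurity is usually the variance.

```python
import random
from collections import Counter


def bootstrap_with_oob(n_samples, k_trees, seed=0):
    """Step (1): K bootstrap index sets and their out-of-bag (OOB) complements."""
    rng = random.Random(seed)
    draws = []
    for _ in range(k_trees):
        in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]  # sampling with replacement
        oob = sorted(set(range(n_samples)) - set(in_bag))  # samples never drawn
        draws.append((in_bag, oob))
    return draws


def gini(labels):
    """Eq. (4): Gini index of a node, given the class labels of its samples."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right, weighted=False):
    """Eq. (5): Gini decrease when node m is split into child nodes l and r."""
    if weighted:  # assumption: sample-weighted variant used by many RF libraries
        n = len(parent)
        return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)
    return gini(parent) - gini(left) - gini(right)


draws = bootstrap_with_oob(n_samples=10, k_trees=3)
perfect = gini_decrease([0, 0, 1, 1], [0, 0], [1, 1])  # split into pure children
mixed = gini_decrease([0, 0, 1, 1], [0, 1], [0, 1], weighted=True)  # uninformative split
```

A split into pure children removes all impurity (0.5 here), while a split that leaves both children as mixed as the parent removes none under the weighted variant; the unweighted form of Equation (5) would score that same split as negative.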

2.3. CPO Model

The CPO algorithm is a novel metaheuristic inspired by the defensive behaviors of crested porcupines [37,38,39]. Its search consists of two phases: global exploration and local exploitation. The exploration phase mimics the visual warning (raising the quills) and the acoustic warning (making noise) to search the solution space broadly, while the exploitation phase refines solutions by emulating odor diffusion and physical attack. Because PV power prediction is subject to multiple nonlinear factors and the accuracy of the hybrid model is highly sensitive to its key parameters, arbitrary parameter selection may lead to local optima or overfitting. Therefore, this study employs CPO to optimize the key parameters of the RF model with the objective of minimizing PV power prediction error.

2.3.1. Global Exploration

During the global exploration phase, the Crested Porcupine maintains a distance from predators, employing both visual and auditory defense strategies to deter them.
First Defense Strategy: The Crested Porcupine raises and fans its quills to warn predators. The behavior model is as follows:
$$x_i^{t+1} = x_i^t + \tau_1 \times \left|2 \times \tau_2 \times x_{CP}^t - y_i^t\right| \tag{6}$$
In Equation (6), $x_i^t$ is the position of individual $i$ at iteration $t$; $x_i^{t+1}$ is the position of individual $i$ at iteration $t+1$; $\tau_1$ is a random number drawn from a normal distribution; $\tau_2$ is a random number in $[0, 1]$; $x_{CP}^t$ is the best solution found so far; and $y_i^t$ is the position of the predator at iteration $t$.
Second Defense Strategy: The Crested Porcupine generates noise and further threatens predators. The behavior model is as follows:
$$x_i^{t+1} = (1 - U_1) \times x_i^t + U_1 \times \left(y_i^t + \tau_3 \times (x_{r_1}^t - x_{r_2}^t)\right) \tag{7}$$
In Equation (7), $U_1$ is a binary vector of 0s and 1s; $\tau_3$ is a random number in $[0, 1]$; and $r_1$ and $r_2$ are two random integers in $[1, N]$.

2.3.2. Local Exploitation

During the local exploitation phase, predators have already approached the crested porcupine at close range, and the porcupine counters them with scent and physical attacks.
Third Defense Strategy: The Crested Porcupine secretes a foul odor that spreads throughout the surrounding area, deterring predators from further approach. The behavior model is as follows:
$$x_i^{t+1} = (1 - U_1) \times x_i^t + U_1 \times \left(x_{r_1}^t + S_i^t \times (x_{r_2}^t - x_{r_3}^t) - \tau_3 \times \delta \times \gamma_t \times S_i^t\right) \tag{8}$$
In Equation (8), $r_3$ is a random integer in $[1, N]$, $\tau_3$ is a random number in $[0, 1]$, $\gamma_t$ is the defensive factor, $S_i^t$ is the odor diffusion factor, and $\delta$ is the parameter that controls the search direction.
Fourth Defense Strategy: When all previous strategies fail, the Crested Porcupine will launch a physical attack against the predator. The behavior model is as follows:
$$x_i^{t+1} = x_{CP}^t + \left(\alpha(1 - \tau_4) + \tau_4\right) \times (\delta \times x_{CP}^t - x_i^t) - \tau_5 \times \delta \times \gamma_t \times F_i^t \tag{9}$$
In Equation (9), $\alpha$ is the convergence factor, $\tau_4$ and $\tau_5$ are random numbers in $[0, 1]$, and $F_i^t$ is the inelastic collision force generated when an individual physically attacks a predator.

2.3.3. The Process of the CPO-RF Collaborative Optimization Algorithm

In this study, the CPO algorithm is employed to optimize two key hyperparameters of the RF model: the number of trees and the maximum depth of trees. The detailed procedure is outlined as follows:
(1)
Population initialization: The population is initialized with each individual’s solution randomly generated within the predefined parameter ranges, providing a diversified starting point for the search process.
(2)
Fitness evaluation: For each individual, an RF model is constructed based on its current parameter set. The fitness value is then computed to assess the model’s performance on the training data.
(3)
CPO parameter update: The parameters of each individual are updated according to the CPO exploration and exploitation rules. After the update, the fitness of the corresponding RF model is recalculated to explore potentially better parameter combinations.
(4)
Iterative search: Steps (2) and (3) are repeated iteratively. The process continues until the maximum number of iterations is reached.
(5)
Termination and output: Upon meeting the termination criterion, the optimal combination of tree count and tree depth is extracted. This configuration is adopted as the final hyperparameter setting for the RF model, which is subsequently used for feature selection to enhance overall model performance.
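The five steps above can be sketched as a compact CPO loop that tunes the pair (number of trees, maximum depth). This is a simplified illustration, not the authors' code: it omits the cyclic population reduction of the original CPO, and the forms used for the predator position $y_i^t$, the defensive factor $\gamma_t$, the odor factor $S_i^t$, the force $F_i^t$, and the values `tf=0.8` and `alpha=0.2` are assumptions. `rf_validation_error` is a hypothetical placeholder for training an RF with the decoded hyperparameters and returning its validation error; the odor factor as written assumes non-negative fitness values.

```python
import math
import random


def cpo_minimize(fitness, lower, upper, pop_size=10, max_iter=30, tf=0.8, alpha=0.2, seed=0):
    """Simplified CPO (Eqs. (6)-(9)) minimizing fitness over a box; no population reduction."""
    rng = random.Random(seed)
    dim = len(lower)

    def clip(x):
        return [min(max(v, lo), hi) for v, lo, hi in zip(x, lower, upper)]

    # Step (1): random initialization inside the predefined parameter ranges
    pop = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)] for _ in range(pop_size)]
    fit = [fitness(x) for x in pop]  # Step (2): fitness of each individual
    i_best = min(range(pop_size), key=lambda i: fit[i])
    best, best_fit = pop[i_best][:], fit[i_best]

    for t in range(1, max_iter + 1):  # Step (4): iterate up to max_iter
        gamma = 2 * rng.random() * (1 - t / max_iter) ** (t / max_iter)  # gamma_t (assumed form)
        total = sum(fit) + 1e-12
        for i in range(pop_size):
            x = pop[i]
            r1, r2, r3 = rng.sample(range(pop_size), 3)
            u1 = [1 if rng.random() > rng.random() else 0 for _ in range(dim)]  # binary vector U1
            delta = 1 if rng.random() <= 0.5 else -1  # search direction
            s = math.exp(fit[i] / total)  # odor diffusion factor S_i (assumed form)
            if rng.random() < rng.random():  # Step (3), global exploration
                y = [(a + c) / 2 for a, c in zip(x, pop[r1])]  # predator position (assumed form)
                if rng.random() < rng.random():  # Eq. (6): first defense
                    new = [xi + rng.gauss(0, 1) * abs(2 * rng.random() * bi - yi)
                           for xi, bi, yi in zip(x, best, y)]
                else:  # Eq. (7): second defense
                    tau3 = rng.random()
                    new = [(1 - u) * xi + u * (yi + tau3 * (a - c))
                           for u, xi, yi, a, c in zip(u1, x, y, pop[r1], pop[r2])]
            elif rng.random() < tf:  # Step (3), local exploitation, Eq. (8): third defense
                tau3 = rng.random()
                new = [(1 - u) * xi + u * (a + s * (c - d) - tau3 * delta * gamma * s)
                       for u, xi, a, c, d in zip(u1, x, pop[r1], pop[r2], pop[r3])]
            else:  # Eq. (9): fourth defense
                tau4, tau5 = rng.random(), rng.random()
                force = [rng.random() * s * (a - xi) for a, xi in zip(pop[r1], x)]  # F_i (assumed form)
                new = [bi + (alpha * (1 - tau4) + tau4) * (delta * bi - xi) - tau5 * delta * gamma * f
                       for bi, xi, f in zip(best, x, force)]
            new = clip(new)
            new_fit = fitness(new)  # re-evaluate the updated individual
            if new_fit < fit[i]:  # greedy selection: keep only improving moves
                pop[i], fit[i] = new, new_fit
                if new_fit < best_fit:
                    best, best_fit = new[:], new_fit
    return best, best_fit  # Step (5): best position found


def decode(position):
    """Round a continuous position to integer (n_trees, max_depth)."""
    return int(round(position[0])), int(round(position[1]))


def rf_validation_error(n_trees, max_depth):
    """Hypothetical stand-in for the validation RMSE of an RF with these settings."""
    return (n_trees - 120) ** 2 / 1e4 + (max_depth - 8) ** 2 / 10


best_pos, best_err = cpo_minimize(lambda p: rf_validation_error(*decode(p)),
                                  lower=[10, 2], upper=[300, 20])
n_trees, max_depth = decode(best_pos)
```

Because selection is greedy, the returned error never exceeds the best error in the initial population. Positions are clipped to the parameter ranges and rounded to integers before each RF evaluation, so every candidate corresponds to a valid forest configuration.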

2.4. KAN Model

KAN is a novel neural network architecture based on the Kolmogorov–Arnold representation theorem. Its key difference from a traditional Multi-Layer Perceptron (MLP) lies in the placement and operation of its activation functions: an MLP applies fixed activation functions at its nodes, whereas KAN places learnable activation functions on the network edges, which adapt during training. This allows KAN to fit complex functions with fewer parameters while remaining flexible, addressing issues such as parameter redundancy and limited adaptability that traditional MLPs face in complex tasks. Given the strong nonlinearity between weather features and power output, this study employs KAN as a feature mapping layer that transforms the features selected by the Random Forest into high-dimensional representations better suited to GRU modeling.
The KAN model is composed of outer and inner functions, and its mathematical expression is
$$f(\mathbf{x}) = f(x_1, \ldots, x_n) = \sum_{q=1}^{2n+1} \phi_q\left(\sum_{p=1}^{n} \varphi_{q,p}(x_p)\right) \tag{10}$$
In Equation (10), $\mathbf{x}$ is an n-dimensional input vector. The inner functions $\varphi_{q,p}$ typically have domain $[0, 1]$ and range $\mathbb{R}$, and the outer functions $\phi_q$ typically have both domain and range $\mathbb{R}$. Each activation function is defined as
$$\varphi(x) = w\left(b(x) + \mathrm{spline}(x)\right) \tag{11}$$
$$b(x) = \mathrm{silu}(x) = \frac{x}{1 + e^{-x}} \tag{12}$$
$$\mathrm{spline}(x) = \sum_i c_i B_i(x) \tag{13}$$
In Equations (11)–(13), $w$ and $c_i$ are learnable parameters, and $b(x)$ is the basis function, which acts like a residual connection; in the original KAN paper it is set to the $\mathrm{silu}(\cdot)$ activation. $\mathrm{spline}(x)$ is a linear combination of the predefined B-spline basis functions $B_i(x)$. During training, the spline coefficients $c_i$ are continuously optimized to adjust the spline shape and thereby fit the training data.
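A single learnable KAN edge activation following Equations (11)–(13) can be sketched as below, with the B-spline basis evaluated by the Cox–de Boor recursion. The cubic degree, the uniform knot grid on [-1, 1] extended by three knots on each side, and the coefficient values are illustrative assumptions; in a trained KAN, `w` and `coeffs` would be learned by gradient descent.

```python
import math


def silu(x):
    """Eq. (12): b(x) = x / (1 + e^(-x))."""
    return x / (1.0 + math.exp(-x))


def bspline_basis(i, k, x, knots):
    """Cox-de Boor recursion: value of the i-th B-spline basis of degree k at x."""
    if k == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + k] != knots[i]:
        left = (x - knots[i]) / (knots[i + k] - knots[i]) * bspline_basis(i, k - 1, x, knots)
    if knots[i + k + 1] != knots[i + 1]:
        right = ((knots[i + k + 1] - x) / (knots[i + k + 1] - knots[i + 1])
                 * bspline_basis(i + 1, k - 1, x, knots))
    return left + right


def kan_activation(x, w, coeffs, knots, degree=3):
    """Eqs. (11) and (13): phi(x) = w * (silu(x) + sum_i c_i * B_i(x))."""
    spline = sum(c * bspline_basis(i, degree, x, knots) for i, c in enumerate(coeffs))
    return w * (silu(x) + spline)


# Assumed grid: 5 intervals on [-1, 1], extended by 3 knots per side -> 12 knots, 8 basis functions
knots = [-1.0 + 0.4 * (j - 3) for j in range(12)]
coeffs = [0.1, -0.2, 0.3, 0.0, 0.5, -0.1, 0.2, 0.4]
y = kan_activation(0.3, w=1.5, coeffs=coeffs, knots=knots)
```

Inside the grid, the cubic basis functions sum to 1, so setting every coefficient to 1 simply shifts the activation by `w`; training moves the coefficients away from such uniform values to shape the curve locally.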

2.5. GRU Model

As a type of recurrent neural network (RNN), the GRU model can effectively handle long-term dependencies in sequential data through its unique gating mechanism. Compared with traditional deep learning models, the GRU model demonstrates superior performance in many sequence tasks and can accurately capture the dynamic temporal characteristics of photovoltaic power output that fluctuate with meteorological factors, making it suitable for photovoltaic power generation forecasting.
The GRU consists primarily of an update gate and a reset gate. The update gate $z_t$ uses the weight matrices $W_z$ and $U_z$ to determine how much of the previous hidden state $h_{t-1}$ is carried forward relative to the new input $x_t$. It is calculated as follows:
$$z_t = \sigma(W_z x_t + U_z h_{t-1}) \tag{14}$$
The reset gate $r_t$ uses the weight matrices $W_r$ and $U_r$ to determine how much of the previous hidden state $h_{t-1}$ is combined with the input $x_t$ when forming the candidate state. It is calculated as follows:
$$r_t = \sigma(W_r x_t + U_r h_{t-1}) \tag{15}$$
where $\sigma(\cdot)$ is the sigmoid function. The GRU also computes a candidate hidden state $\tilde{h}_t$:
$$\tilde{h}_t = \tanh\left(W x_t + r_t \odot U h_{t-1}\right) \tag{16}$$
where $\odot$ denotes element-wise multiplication. Finally, the hidden state $h_t$ is updated as follows:
$$h_t = (1 - z_t) \odot \tilde{h}_t + z_t \odot h_{t-1} \tag{17}$$
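A single GRU time step, written directly from Equations (14)–(17), is sketched below in plain Python. It follows the equations as printed, with no bias terms and with the reset gate applied to $U h_{t-1}$; library implementations differ in such details (for example, they add biases), so this is an illustration rather than a drop-in layer. The parameter dictionary keys are hypothetical names.

```python
import math


def matvec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gru_step(x_t, h_prev, p):
    """One GRU step per Eqs. (14)-(17); p holds W_z, U_z, W_r, U_r, W, U."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["W_z"], x_t), matvec(p["U_z"], h_prev))]  # Eq. (14)
    r = [sigmoid(a + b) for a, b in zip(matvec(p["W_r"], x_t), matvec(p["U_r"], h_prev))]  # Eq. (15)
    u_h = matvec(p["U"], h_prev)
    h_cand = [math.tanh(a + ri * b) for a, ri, b in zip(matvec(p["W"], x_t), r, u_h)]  # Eq. (16)
    return [(1 - zi) * hc + zi * hp for zi, hc, hp in zip(z, h_cand, h_prev)]  # Eq. (17)


# Hidden size 2, input size 1; all-zero weights make every gate output 0.5
zeros = {"W_z": [[0.0]] * 2, "U_z": [[0.0, 0.0]] * 2, "W_r": [[0.0]] * 2,
         "U_r": [[0.0, 0.0]] * 2, "W": [[0.0]] * 2, "U": [[0.0, 0.0]] * 2}
h = [0.8, -0.4]
for x in ([1.0], [0.5], [0.2]):  # unroll over a short input sequence
    h = gru_step(x, h, zeros)
```

With all-zero weights, the update gate is 0.5 and the candidate state is 0, so Equation (17) halves the hidden state at every step; this makes the interpolation role of $z_t$ easy to check by hand.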

2.6. SHAP Model

Machine learning models are generally regarded as opaque tools that cannot provide clear, understandable interpretations of their results [40,41]. Therefore, the SHAP framework was employed to clarify the specific influence of each feature on the prediction results and thereby facilitate the interpretation of the deep learning model. SHAP quantifies the contribution of each feature to the model output; this contribution is known as the SHAP value and indicates whether, and by how much, a specific feature increases or decreases the prediction. The SHAP value of feature $k$ is the Shapley value
$$\phi_k = \sum_{L \subseteq N \setminus \{k\}} \frac{|L|!\,(K - |L| - 1)!}{K!}\left[v(L \cup \{k\}) - v(L)\right] \tag{18}$$
In Equation (18), $\phi_k$ is the contribution of the $k$th feature; $N$ is the set of all input features and $L$ is a subset of $N \setminus \{k\}$; $K$ is the total number of input features; $v(L \cup \{k\})$ is the predicted output when only the features in $L \cup \{k\}$ are used; and $v(L)$ is the predicted output when only the features in $L$ are used.
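Equation (18) can be evaluated exactly by enumerating feature subsets when the number of features is small. The sketch below does this for an arbitrary value function `value(subset)`. How absent features are handled (here, a hypothetical linear model that simply drops them) is an assumption, and practical SHAP tooling relies on approximations such as KernelSHAP because the enumeration grows as 2^(K-1) subsets per feature.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values per Eq. (18); value(S) is the model output using only features in S."""
    K = n_features
    phi = []
    for k in range(K):
        others = [j for j in range(K) if j != k]
        total = 0.0
        for size in range(K):  # |L| ranges from 0 to K - 1
            weight = factorial(size) * factorial(K - size - 1) / factorial(K)
            for L in combinations(others, size):
                total += weight * (value(set(L) | {k}) - value(set(L)))
        phi.append(total)
    return phi


# Hypothetical toy linear model: absent features contribute nothing
x = [3.0, 1.0, 2.0]
w = [2.0, -1.0, 0.5]


def toy_value(subset):
    return 10.0 + sum(w[j] * x[j] for j in subset)


phi = shapley_values(toy_value, 3)
```

For a linear model with this value function, each SHAP value reduces to $w_k x_k$, and the values sum to the difference between the full prediction and the baseline, which is the additivity property that makes SHAP contributions comparable across features.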

2.7. Interval Prediction Based on the KDE Model

KDE is a non-parametric method for estimating a probability density function. It places a kernel function on each data point and sums the smoothed contributions to estimate a continuous error distribution, without assuming that the error follows any specific distribution. The procedure is as follows. The trained KAN-GRU model is first used to predict the validation set, giving a point prediction sequence, and the error sequence $e = y - \hat{y}$ between the true values and the predictions is calculated. KDE is then applied to the error sequence to estimate its probability density function, with the bandwidth chosen by Silverman's rule of thumb or by cross-validation to balance the smoothness and bias of the estimate. Finally, the prediction interval is constructed. Given a confidence level $1 - \alpha$, the lower quantile $q_{\alpha/2}$ and the upper quantile $q_{1-\alpha/2}$ of the estimated error density are found such that $P(q_{\alpha/2} \le e \le q_{1-\alpha/2}) = 1 - \alpha$. The power prediction interval $(L_{t+k}, U_{t+k})$ for the future time $t+k$ is then obtained as follows:
$$L_{t+k} = \hat{y}_{t+k} + q_{\alpha/2}, \qquad U_{t+k} = \hat{y}_{t+k} + q_{1-\alpha/2} \tag{19}$$
In Equation (19), $\hat{y}_{t+k}$ is the point prediction of the KAN-GRU model.
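A minimal sketch of the KDE interval construction follows. It assumes a Gaussian kernel, errors defined as $e = y - \hat{y}$, Silverman's rule of thumb for the bandwidth, and quantiles found by bisection on the KDE's cumulative distribution. The function names and the example error values are hypothetical.

```python
import math
import statistics


def silverman_bandwidth(errors):
    """Silverman's rule of thumb: h = 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    q1, _, q3 = statistics.quantiles(errors, n=4)
    spread = min(statistics.stdev(errors), (q3 - q1) / 1.34)
    return 0.9 * spread * len(errors) ** (-0.2)


def kde_cdf(x, errors, h):
    """CDF of a Gaussian KDE: the average of normal CDFs centered on each error."""
    return sum(0.5 * (1 + math.erf((x - e) / (h * math.sqrt(2)))) for e in errors) / len(errors)


def kde_quantile(p, errors, h, tol=1e-9):
    """Invert the KDE CDF by bisection."""
    lo, hi = min(errors) - 10 * h, max(errors) + 10 * h
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kde_cdf(mid, errors, h) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def kde_interval(y_hat, errors, alpha=0.1):
    """Eq. (19): interval [y_hat + q_{alpha/2}, y_hat + q_{1-alpha/2}] with errors e = y - y_hat."""
    h = silverman_bandwidth(errors)
    q_lo, q_hi = kde_quantile(alpha / 2, errors, h), kde_quantile(1 - alpha / 2, errors, h)
    return [(y + q_lo, y + q_hi) for y in y_hat]


# Hypothetical validation errors (actual minus predicted, in MW) and point forecasts
errors = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
intervals = kde_interval([10.0, 12.5], errors, alpha=0.1)
```

Because the error quantiles are estimated once and added to every point forecast, all intervals share the same width; capturing weather-dependent uncertainty would require fitting a separate error density per weather type or forecast horizon.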
The prediction intervals generated by the KDE model can visually represent the possible fluctuation range of photovoltaic power, providing a crucial quantitative basis for the power system to cope with the uncertainty of photovoltaic output and formulate robust scheduling plans.

2.8. Research Framework

For PV power forecasting, the proposed hybrid model comprises the following two steps:
(1)
Feature Selection. The CPO algorithm is employed to optimize the number of trees and the maximum tree depth of the RF model. Based on the optimized RF, feature selection is performed on the PV sequence, and the selected features serve as input variables for prediction.
(2)
Power Forecasting. The GRU model is enhanced with a KAN layer for PV power forecasting. The KAN-GRU hybrid model is illustrated in Figure 2. SHAP analysis and interval prediction are then conducted to further demonstrate the rationality of the proposed model.
This study employs the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Square Error (MSE), and the coefficient of determination (R2) to evaluate model performance. For the predicted values $\hat{y} = \{\hat{y}_1, \ldots, \hat{y}_n\}$ and the actual values $y = \{y_1, \ldots, y_n\}$, the metrics are calculated as follows:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}$$
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$$
$$MSE = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}\left(y_i - \frac{1}{n}\sum_{j=1}^{n} y_j\right)^2}$$
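The four point-forecast metrics can be computed as in the following plain-Python sketch; the function name and the example values are hypothetical.

```python
import math


def regression_metrics(y_pred, y_true):
    """RMSE, MAE, MSE and R^2 for predictions y_pred against actual values y_true."""
    n = len(y_true)
    residuals = [p - y for p, y in zip(y_pred, y_true)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    mean_y = sum(y_true) / n
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)  # total sum of squares
    r2 = 1 - sum(r * r for r in residuals) / ss_tot
    return {"RMSE": math.sqrt(mse), "MAE": mae, "MSE": mse, "R2": r2}


scores = regression_metrics([1.0, 2.0, 3.0, 5.0], [1.0, 2.0, 3.0, 4.0])
```

RMSE and MSE weight large deviations more heavily than MAE, so a single badly missed ramp (here, one error of 1.0 out of four points) shows up more strongly in them.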
For interval estimation, this study adopts Prediction Interval Coverage Probability (PICP) to evaluate the performance of the model. PICP reflects the probability that the actual power falls within the predicted fluctuation range and can be used to evaluate the reliability of the prediction model. It can be expressed as
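$$PICP = \frac{1}{N}\sum_{t=1}^{N} \mathbb{I}\{y_t \in [L_t, U_t]\}$$
where $\mathbb{I}\{\cdot\}$ is the indicator function, which equals 1 when the actual power $y_t$ lies within the interval $[L_t, U_t]$ and 0 otherwise, and N is the number of samples evaluated.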
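PICP reduces to counting hits, as in this short sketch; the function name and example values are hypothetical.

```python
def picp(y_true, lower, upper):
    """Share of actual values that fall inside their prediction intervals [lower, upper]."""
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return hits / len(y_true)


coverage = picp([1.0, 2.0, 3.0, 4.0], [0.0, 2.5, 2.0, 5.0], [2.0, 3.0, 4.0, 6.0])
```

For an interval built at confidence level $1 - \alpha$, a well-calibrated model should give a PICP close to $1 - \alpha$ on the test set.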
The PV power forecasting process based on the DPCA-CPO-RF-KAN-GRU is illustrated in Figure 3.

3. Case Analysis

This section verifies the validity of the proposed prediction model using actual operating data from a photovoltaic power station in northwest China. The photovoltaic power prediction factors are first screened; the models are then constructed and analyzed to verify the superiority of the proposed model over conventional prediction models.

3.1. Data Sources and Processing

The experimental data were collected from a 50 MW-class PV power station located in the eastern Xinjiang Uygur Autonomous Region, China, within a geographic range of roughly 42–43° N latitude and 94–95° E longitude. The data span the period from 1 January 2022 to 31 December 2022 and were recorded at 15 min intervals. The region has a typical temperate continental arid climate, with abundant sunshine hours, intense solar radiation, low annual precipitation, and large diurnal temperature fluctuations. However, it is also occasionally subject to dust events and rapid weather changes. These conditions combine high photovoltaic potential with notable meteorological variability, making the site a suitable scenario for validating the short-term forecasting robustness of the proposed model.
Data preprocessing consists of three steps:
(1)
Data cleaning: Missing values, duplicate records, and outliers were identified, and duplicate records were removed. Given the short-term continuity of meteorological data, each missing or abnormal value was replaced by the average of the values at the adjacent time steps before and after it. A total of 35,040 valid data points were obtained.
(2)
Feature normalization: To eliminate scale discrepancies among features and expedite model convergence, all input features were normalized to the interval [0, 1] via the Min-Max scaling method.
(3)
Dataset splitting: To ensure a realistic assessment of temporal prediction performance, the dataset was split in chronological order. Specifically, the first 70% of the sequential data is assigned to the training set, the subsequent 10% serves as the validation set for hyperparameter optimization, and the final 20% is used as the test set to evaluate the final model performance.
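The three preprocessing steps can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code: missing and abnormal values are assumed to have already been flagged as NaN (the outlier rule is not specified), and the Min-Max scaler is fitted on the training split only to avoid leaking test information, so validation and test values may fall slightly outside [0, 1]. All function names are hypothetical.

```python
import numpy as np


def fill_with_neighbor_mean(x):
    """Replace NaNs (missing or flagged abnormal values) with the mean of the
    nearest valid values before and after; edge gaps use the single neighbor."""
    x = np.asarray(x, dtype=float).copy()
    valid = np.flatnonzero(~np.isnan(x))
    if valid.size == 0:
        raise ValueError("series has no valid values")
    for i in np.flatnonzero(np.isnan(x)):
        before = valid[valid < i][-1:].tolist()  # nearest valid index before i
        after = valid[valid > i][:1].tolist()    # nearest valid index after i
        x[i] = np.mean([x[j] for j in before + after])
    return x


def chronological_split(data, train=0.7, val=0.1):
    """Split in time order: first 70% train, next 10% validation, last 20% test."""
    n = len(data)
    i_train = round(n * train)
    i_val = round(n * (train + val))  # round() avoids a float-truncation off-by-one
    return data[:i_train], data[i_train:i_val], data[i_val:]


def min_max_scale(train, *others):
    """Min-Max scaling with per-feature min/max fitted on `train` only."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant features
    return [(a - lo) / span for a in (train, *others)]


# Hypothetical usage on a synthetic year of 15 min records (365 * 96 = 35,040).
rng = np.random.default_rng(0)
X = rng.random((35_040, 7))
X[[5, 6, 100], 2] = np.nan  # pretend these values were missing or abnormal
X = np.column_stack([fill_with_neighbor_mean(col) for col in X.T])
train, val, test = chronological_split(X)
train_s, val_s, test_s = min_max_scale(train, val, test)
print(len(train), len(val), len(test))
```

Rounding matters here: in floating point, (0.7 + 0.1) × 35,040 evaluates to slightly less than 28,032, so truncating with `int()` would shift one record from the validation set into the test set.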

3.2. Density Peak Clustering Algorithm Analysis

In this study, module temperature, ambient temperature, pressure, humidity, total solar radiation, direct solar radiation, diffuse solar radiation, and PV power were used as inputs to the DPCA for weather pattern clustering. The clustering results obtained with the method described above are shown in Figure 4, where the horizontal axis represents the local density and the vertical axis represents the relative distance. As Figure 4 shows, three points have relatively large values on both axes; these serve as the cluster centers, so the number of classes is 3 and the weather conditions are classified into three categories. The red dots denote Weather Type 0, the green dots Weather Type 1, and the blue dots Weather Type 2.
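The decision-graph logic behind Figure 4 can be sketched as follows. The sketch follows the density peak idea of Rodriguez and Laio (local density ρ, distance δ to the nearest point of higher density, centers with both large ρ and large δ), but several details are assumptions rather than the authors' settings: a Gaussian kernel for ρ, a cutoff distance d_c at the 2nd percentile of pairwise distances, and automatic selection of the k points with the largest γ = ρ·δ instead of reading the centers off the plot. Inputs should be standardized first. The pairwise distance matrix needs O(n²) memory, so for all 35,040 records one would subsample or cluster aggregated (for example, daily) vectors; the text does not state which unit was clustered.

```python
import numpy as np


def density_peak_clustering(X, n_clusters, dc_percent=2.0):
    """Density peak clustering sketch: returns labels, rho, delta and center indices."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))  # pairwise distances
    dc = np.percentile(d[np.triu_indices(n, k=1)], dc_percent)        # cutoff distance d_c
    rho = np.exp(-((d / dc) ** 2)).sum(axis=1) - 1.0  # Gaussian local density, self term removed

    order = np.argsort(-rho)             # indices by decreasing density
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    delta[order[0]] = d[order[0]].max()  # densest point: largest distance by convention
    for k in range(1, n):
        i = order[k]
        higher = order[:k]               # all points denser than i
        j = higher[np.argmin(d[i, higher])]
        delta[i], nearest_higher[i] = d[i, j], j

    gamma = rho * delta                  # decision value: large rho and large delta
    # The densest point is always a center; the others are the next-largest gamma values.
    ranked = [i for i in np.argsort(-gamma) if i != order[0]]
    centers = np.array([order[0]] + ranked[: n_clusters - 1])

    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:                      # denser neighbors are labeled first
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, rho, delta, centers


# Hypothetical usage: three well-separated synthetic blobs of 50 points each.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in ((0, 0), (5, 5), (0, 5))])
labels, rho, delta, centers = density_peak_clustering(X, n_clusters=3)
print(np.bincount(labels))
```

Plotting `rho` against `delta` reproduces a decision graph of the kind shown in Figure 4, with the centers standing out in the upper-right region.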
The power generation curves of each weather category over four consecutive days and the average feature vectors of each cluster are shown in Figure 5 and Table 2, respectively. Each time unit in Figure 5 corresponds to 15 min, giving 384 time points in total (4 days × 96 points). In Figure 5, the blue, green, and red curves denote Weather 0, Weather 1, and Weather 2, respectively. Among the indicators in Table 2, total solar radiation is the total solar radiant power incident per unit area on a horizontal surface. Direct solar radiation is the solar radiant power per unit area received on a surface oriented normal to the sun’s direct beam. Diffuse solar radiation is the solar radiant power per unit area incident on a horizontal surface from the entire sky hemisphere, excluding the direct beam from the sun. The results show that Weather Type 2 has the greatest total irradiance (312.27 W/m²), together with the highest direct and diffuse radiation and a relatively high module temperature, resulting in the highest average power output (11.32 MW). It represents clear-sky conditions with minimal cloud cover. In contrast, Weather Type 1 has the lowest average power generation and solar radiation; its temperatures and radiation values are low while its humidity is high, making it unfavorable for PV system operation. Weather Type 0 is also favorable, with a high average power output (11.14 MW); it is characterized by relatively long sunshine duration, the highest temperatures, and strong radiation.
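Table 2 is a per-cluster average of the clustering inputs. A minimal sketch of that aggregation, with a hypothetical helper name and toy numbers rather than the station's data:

```python
import numpy as np


def cluster_mean_vectors(X, labels):
    """Average feature vector of each cluster (rows of X are samples)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    return {int(k): X[labels == k].mean(axis=0) for k in np.unique(labels)}


# Toy data: columns could be module temperature (°C) and total radiation (W/m²).
X = np.array([[30.0, 250.0], [32.0, 260.0], [8.0, 120.0], [9.0, 118.0]])
labels = np.array([0, 0, 1, 1])
print(cluster_mean_vectors(X, labels))
```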

3.3. Feature Selection Based on CPO-RF

Feature selection is performed on the training set. The dataset contains seven potential features influencing PV power: module temperature, ambient temperature, pressure, humidity, total solar radiation, direct solar radiation, and diffuse solar radiation. The CPO algorithm is used to optimize the number of decision trees and the tree depth of the Random Forest model, with a population size of 30, a maximum of 20 iterations, a tree-count search range of [50, 100], and a tree-depth search range of [10, 30]. After iterative optimization, the optimal parameters were 86 trees and a depth of 24, and these parameters were then used to screen the features. The optimization process of the CPO-RF hybrid model is illustrated in Figure 6.
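The CPO update rules are not reproduced here. As a hedged stand-in, the sketch below uses plain random search over the same integer ranges and the same budget (30 candidates per iteration, 20 iterations) to show the structure of the hyperparameter search: each candidate (trees, depth) is scored by the validation RMSE of a Random Forest, and the best-so-far score per iteration gives a convergence curve like the one in Figure 6. Using validation RMSE as the fitness is an assumption, and the function names are hypothetical. scikit-learn's `RandomForestRegressor` is imported lazily so the search loop can be exercised with any objective.

```python
import numpy as np


def rf_validation_rmse(n_trees, depth, X_train, y_train, X_val, y_val, seed=0):
    """Assumed fitness of one candidate: validation RMSE of a Random Forest."""
    from sklearn.ensemble import RandomForestRegressor  # lazy import

    model = RandomForestRegressor(n_estimators=n_trees, max_depth=depth, random_state=seed)
    model.fit(X_train, y_train)
    residual = model.predict(X_val) - np.asarray(y_val, dtype=float)
    return float(np.sqrt(np.mean(residual ** 2)))


def random_search(objective, tree_range=(50, 100), depth_range=(10, 30),
                  pop_size=30, max_iter=20, seed=0):
    """Random-search stand-in for CPO (not the CPO algorithm): each iteration
    draws pop_size integer candidates and keeps the best one seen so far."""
    rng = np.random.default_rng(seed)
    best, best_score, curve = None, np.inf, []
    for _ in range(max_iter):
        trees = rng.integers(tree_range[0], tree_range[1] + 1, size=pop_size)
        depths = rng.integers(depth_range[0], depth_range[1] + 1, size=pop_size)
        for t, d in zip(trees.tolist(), depths.tolist()):
            score = objective(t, d)
            if score < best_score:
                best, best_score = (t, d), score
        curve.append(best_score)  # best-so-far fitness after this iteration
    return best, best_score, curve


# Hypothetical usage with training/validation arrays X_tr, y_tr, X_va, y_va:
# fitness = lambda t, d: rf_validation_rmse(t, d, X_tr, y_tr, X_va, y_va)
# (n_trees, depth), rmse, curve = random_search(fitness)
```

A forest refitted with the chosen parameters exposes `feature_importances_`, which scikit-learn normalizes to sum to 1. The values in Table 3 are on a different scale, so the importance measure used in the paper may differ from this one.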
During the feature screening stage, a small number of representative key features were selected based on the importance assessment of each feature.
The importance levels of the seven potential influencing factors are shown in Table 3. To reduce input redundancy in the model and enhance the predictive efficiency and effectiveness of the hybrid model, this study selects meteorological features with importance levels exceeding 4 as input variables for the photovoltaic forecasting model.
The analysis yields the following key findings on the drivers of PV power. Total solar radiation, which combines the direct and diffuse components, is the core energy source for PV generation: direct radiation sharply boosts power on clear days, whereas diffuse radiation sustains a baseline output on overcast days. Module temperature has a conditional correlation with PV power: moderate temperatures support stable output under sufficient radiation, whereas temperatures above a critical level slightly reduce module efficiency and thus lower power.
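The temperature effect described above is often approximated with a linear temperature coefficient. The sketch below is an illustration under explicit assumptions, not the paper's model: output scales with irradiance relative to 1000 W/m² (standard test conditions), the plant rating is taken as the nominal 50 MW, losses such as inverter efficiency are ignored, and the coefficient of −0.4%/°C is a typical value for crystalline-silicon modules rather than one reported for this station.

```python
def pv_power_mw(irradiance_wm2, module_temp_c, rated_mw=50.0, gamma_per_c=-0.004):
    """Linear temperature-coefficient approximation (assumed, not the paper's model).

    Output scales with irradiance relative to 1000 W/m^2 and changes by
    gamma_per_c per degree C of module temperature relative to 25 degC;
    a negative gamma lowers output on hot modules.
    """
    return rated_mw * (irradiance_wm2 / 1000.0) * (1.0 + gamma_per_c * (module_temp_c - 25.0))


for temp in (25.0, 45.0, 65.0):
    print(temp, round(pv_power_mw(1000.0, temp), 2))
```

In this linear form the output also rises slightly below 25 °C, so the "critical temperature" behavior described in the text is a simplification of a gradual effect rather than a sharp threshold.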
To validate the effectiveness of the feature selection, predictive experiments were conducted with the GRU model using different input feature sets constructed from different feature-importance thresholds. Type 1 includes the variables with an absolute importance greater than 0; Type 2, greater than 2; Type 3, greater than 4; and Type 4, greater than 8. The resulting input combinations and their prediction results are shown in Table 4 and Figure 7.
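The four input sets in Table 4 follow mechanically from the importance values in Table 3 and the thresholds above. The snippet below reproduces that mapping; the importance values are transcribed from Table 3, and the helper name is hypothetical.

```python
# Importance values transcribed from Table 3 (X1 ... X7).
importance = {"X1": 5.96, "X2": 0.71, "X3": 2.13, "X4": 0.36,
              "X5": 8.83, "X6": 6.02, "X7": 4.83}


def select_features(importance, threshold):
    """Features whose absolute importance exceeds the threshold, in X1..X7 order."""
    return [name for name, value in importance.items() if abs(value) > threshold]


input_types = {f"Type {k}": select_features(importance, thr)
               for k, thr in enumerate((0, 2, 4, 8), start=1)}
for name, features in input_types.items():
    print(name, features)
```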
In Figure 7, the PV power curves predicted with the different input combinations are shown in different colors. The factor combination Type 3, selected with the CPO-RF model, yields the highest accuracy, indicating that CPO-RF-based feature selection helps the prediction model achieve higher accuracy. The accuracy obtained with Type 1 and Type 2 inputs is relatively limited: too many weakly relevant variables introduce redundant information that interferes with training and reduces accuracy. Using only the Type 4 variable as input also fails to achieve high accuracy, because a single variable cannot provide the model with enough information.
In addition, Table 5 presents the prediction accuracy obtained with the different input feature sets. With Type 3 inputs, the model achieves an RMSE of 3.70, an MAE of 3.05, and an R2 of 95.70% under Weather 0; an RMSE of 3.55, an MAE of 2.89, and an R2 of 94.77% under Weather 1; and an RMSE of 3.57, an MAE of 2.92, and an R2 of 96.31% under Weather 2. These values are better than those of the other input types under every weather type, confirming the effectiveness of CPO-RF feature screening.
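For reference, the four error indices in Table 5 can be computed as below. This is a minimal sketch; R2 uses the usual 1 − SS_res/SS_tot definition, which is assumed to match the paper's. Because MSE = RMSE², the reported pairs can be sanity-checked: for Type 3 under Weather 0, 3.70² ≈ 13.69, consistent with the reported MSE of 13.73 up to rounding of the RMSE.

```python
import numpy as np


def error_metrics(y_true, y_pred):
    """RMSE, MSE, MAE and R2 (coefficient of determination) of a forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))
    ss_res = float(np.sum(err ** 2))                       # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    return {"RMSE": mse ** 0.5, "MSE": mse,
            "MAE": float(np.mean(np.abs(err))), "R2": 1.0 - ss_res / ss_tot}


print(error_metrics([10, 20, 30, 40], [12, 18, 33, 39]))
```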

3.4. Model Construction and Analysis

To evaluate the proposed hybrid PV power forecasting model, this study compares the KAN-GRU model with commonly used PV power forecasting models on the test set. The input features are module temperature (X1), total solar radiation (X5), direct solar radiation (X6), and diffuse solar radiation (X7). The comparison models are GRU, KAN-TCN, TCN, BiLSTM, and KAN-BiLSTM, each configured with two layers and a hidden dimension of eight. The combined prediction model was implemented in Python 3.7 (Python Software Foundation, Wilmington, DE, USA). The prediction results for three consecutive days in the test set are analyzed: Figure 8, Figure 9 and Figure 10 show the predictions under the different weather types, and Table 5 lists the prediction errors of each model under each weather type.
As shown in Table 5, the KAN-GRU model achieves the best RMSE, MSE, MAE, and R2 among the listed models under all three weather types, while the GRU, KAN-TCN, TCN, BiLSTM, and KAN-BiLSTM models perform comparatively worse. This advantage is attributed mainly to the flexible nonlinear function approximation of the KAN network combined with the efficiency of the GRU structure in processing sequential data. By combining the learnable activation functions of the KAN network with the gating mechanism of the GRU, the KAN-GRU model captures the complex dynamics of PV power data and achieves a clear improvement in prediction accuracy.
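To make the combination concrete, the following NumPy sketch shows a forward pass of one plausible KAN-GRU arrangement: two stacked GRU layers with hidden size 8 (the configuration stated for the comparison models, assumed here for KAN-GRU as well) followed by a KAN layer that maps the last hidden state to a power estimate. Each KAN edge carries its own learnable univariate function; here that function is a SiLU base term plus Gaussian radial basis functions, a common simplification of the B-spline parameterization in the original KAN. Weights are random and training is omitted; the class names and the layer placement are assumptions, not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRULayer:
    """One GRU layer, forward pass only (update gate z, reset gate r, candidate n)."""

    def __init__(self, n_in, n_hidden, rng):
        s = 1.0 / np.sqrt(n_hidden)
        self.W = rng.uniform(-s, s, (3, n_hidden, n_in))      # input weights for z, r, n
        self.U = rng.uniform(-s, s, (3, n_hidden, n_hidden))  # recurrent weights
        self.b = np.zeros((3, n_hidden))
        self.n_hidden = n_hidden

    def __call__(self, seq):  # seq: (T, n_in) -> (T, n_hidden)
        h = np.zeros(self.n_hidden)
        states = []
        for x in seq:
            z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
            r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
            n = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
            h = (1.0 - z) * n + z * h
            states.append(h)
        return np.stack(states)


class KANLayer:
    """Each input->output edge has its own 1-D function
    phi(x) = w_base * silu(x) + sum_k c_k * exp(-((x - g_k) / width) ** 2);
    each output sums phi over the inputs. Gaussian RBFs replace KAN's B-splines."""

    def __init__(self, n_in, n_out, rng, n_basis=8, x_range=(-2.0, 2.0)):
        self.grid = np.linspace(x_range[0], x_range[1], n_basis)
        self.width = (x_range[1] - x_range[0]) / (n_basis - 1)
        self.coef = rng.normal(0.0, 0.1, (n_out, n_in, n_basis))
        self.w_base = rng.normal(0.0, 0.1, (n_out, n_in))

    def __call__(self, x):  # x: (n_in,) -> (n_out,)
        basis = np.exp(-(((x[:, None] - self.grid) / self.width) ** 2))  # (n_in, n_basis)
        silu = x * sigmoid(x)
        phi = np.einsum("oik,ik->oi", self.coef, basis) + self.w_base * silu
        return phi.sum(axis=1)


class KANGRU:
    """Hypothetical KAN-GRU: stacked GRU layers, then a KAN head on the last state."""

    def __init__(self, n_features, n_hidden=8, n_layers=2, seed=0):
        rng = np.random.default_rng(seed)
        sizes = [n_features] + [n_hidden] * n_layers
        self.grus = [GRULayer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
        self.head = KANLayer(n_hidden, 1, rng)

    def predict(self, seq):
        for gru in self.grus:
            seq = gru(seq)
        return float(self.head(seq[-1])[0])  # one-step-ahead power (untrained)


model = KANGRU(n_features=4)  # e.g. X1, X5, X6, X7 as inputs
window = np.random.default_rng(1).random((96, 4))  # synthetic normalized inputs
print(model.predict(window))
```

The RBF grid spans [−2, 2], which comfortably covers the GRU hidden state because each state is a convex combination of tanh outputs and therefore stays in (−1, 1).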

3.5. Model Explainability

To investigate how the input features influence the PV power predictions, this study uses the SHAP method to interpret the DPCA-CPO-RF-KAN-GRU model. Figure 11 shows the effect of each variable on the predicted PV power: subfigures (a)~(c) reflect the degree of influence of each variable, and subfigures (d)~(f) reflect the direction (positive or negative) of its correlation.
Subfigures (a)~(c) intuitively quantify the influence intensity of each feature via mean absolute SHAP values, with higher values indicating greater impacts on predictions. Total solar radiation and direct solar radiation have relatively large numerical impacts on final predictions, with their combined average SHAP value contribution exceeding 70%. In contrast, diffuse solar radiation and module temperature exert smaller influences, with their combined average contribution below 20%.
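The contribution shares quoted above come from normalizing mean absolute SHAP values. A minimal sketch, assuming a SHAP matrix of shape (samples, features) such as the one a `shap` explainer returns for a single-output model; the numbers below are made up for illustration and are not the paper's values.

```python
import numpy as np


def mean_abs_shap_share(shap_values, feature_names):
    """Mean |SHAP| per feature, expressed as a percentage of the total."""
    mean_abs = np.abs(np.asarray(shap_values, dtype=float)).mean(axis=0)
    return dict(zip(feature_names, 100.0 * mean_abs / mean_abs.sum()))


names = ["total_radiation", "direct_radiation", "diffuse_radiation", "module_temperature"]
# Made-up SHAP matrix: rows are samples, columns follow `names`.
shap_values = np.array([[6.0, 4.0, 1.0, -1.0],
                        [-2.0, 3.0, 0.5, 0.5],
                        [10.0, 5.0, 0.5, -0.5]])
shares = mean_abs_shap_share(shap_values, names)
combined = shares["total_radiation"] + shares["direct_radiation"]
print({k: round(float(v), 1) for k, v in shares.items()}, round(float(combined), 1))
```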
Subfigures (d)~(f) reveal how each feature is coupled with power output, with feature values distinguished by color: red for high values and blue for low values. As the core energy inputs, total solar radiation and direct solar radiation have data points concentrated mostly on the right side of the horizontal axis, with positive SHAP values; the higher the feature value (darker red), the larger the SHAP value. These two features also show a wide spread of SHAP values, from −5 to 20, indicating that small changes in radiation intensity can cause noticeable fluctuations in PV power. The data points of diffuse solar radiation all cluster in a weakly positive region near a SHAP value of 0, with minimal variation across feature values. This reflects its stabilizing role as a supplementary energy source in low-radiation scenarios, maintaining the baseline power output despite its mild influence. Module temperature data points fluctuate slightly around a SHAP value of 0, with no obvious separation between high-value (red) and low-value (blue) points. This suggests that the PV modules mostly operate within a favorable temperature range, so temperature changes have only a mild, nonlinear effect on power output.

3.6. Interval Prediction Analysis

To further illustrate the interval prediction performance of the KAN-GRU model, this paper uses KDE for interval prediction and visualization of the PV power data. KDE is applied to the prediction errors of the first 192 points (two days) of the forecast set to estimate the error distribution and derive error intervals, which are then applied to the subsequent 96 points covering the following day. The KDE results under the three weather types are shown in Figure 12. Interval estimates were generated at confidence levels of 80%, 90%, and 95%; the lower and upper bounds of the error intervals and their widths are listed in Table 6.
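The KDE step can be sketched as follows, under explicit assumptions: errors are defined as actual minus predicted power, a Gaussian kernel with Silverman's rule-of-thumb bandwidth is used, and the interval bounds are the (1 − c)/2 and (1 + c)/2 quantiles of the estimated error distribution for confidence level c. The paper does not state its kernel, bandwidth, or quantile rule, and the function name is hypothetical. Synthetic data stand in for the first 192 forecast points (used to fit the KDE) and the following 96 points (where the interval is applied).

```python
import numpy as np


def kde_error_interval(errors, confidence, grid_size=2001):
    """Error interval [lo, hi] from a Gaussian KDE of forecast errors."""
    e = np.asarray(errors, dtype=float)
    sd = e.std(ddof=1)
    iqr = np.subtract(*np.percentile(e, [75, 25]))
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    h = 0.9 * spread * e.size ** (-0.2)  # Silverman's rule of thumb
    grid = np.linspace(e.min() - 4 * h, e.max() + 4 * h, grid_size)
    # Gaussian KDE on the grid; the normalizing constant cancels in the CDF below.
    pdf = np.exp(-0.5 * ((grid[:, None] - e[None, :]) / h) ** 2).sum(axis=1)
    cdf = np.cumsum(pdf)
    cdf /= cdf[-1]
    alpha = 1.0 - confidence
    lo_idx = min(np.searchsorted(cdf, alpha / 2), grid_size - 1)
    hi_idx = min(np.searchsorted(cdf, 1 - alpha / 2), grid_size - 1)
    return float(grid[lo_idx]), float(grid[hi_idx])


rng = np.random.default_rng(0)
y_hat = 20 + 10 * np.sin(np.linspace(0, np.pi, 288))  # synthetic point forecasts
y = y_hat + rng.normal(0, 1.5, 288)                   # synthetic actual power
errors = y[:192] - y_hat[:192]                         # first two days: fit the KDE
for c in (0.80, 0.90, 0.95):
    lo, hi = kde_error_interval(errors, c)
    lower, upper = y_hat[192:] + lo, y_hat[192:] + hi  # following day: 96 points
    coverage = np.mean((y[192:] >= lower) & (y[192:] <= upper))
    print(f"{c:.0%}: [{lo:.2f}, {hi:.2f}] width={hi - lo:.2f} PICP={coverage:.2%}")
```

Because the bounds come from an empirical error distribution, they need not be symmetric about zero, which matches the asymmetric error intervals reported in Table 6.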
Figure 13 shows the 95% confidence interval predictions of KAN-GRU at 96 time points over one day under the three weather types. The predicted power values closely follow the actual measurements, particularly during the critical ramp-up and ramp-down phases of PV output. Table 7 presents the interval coverage at different confidence levels for the three weather types; at the 95% confidence level, the proposed model achieves over 90% coverage in every case.
As shown in Figure 13 and Table 7, Weather 0 has the narrowest intervals, yet its actual coverage exceeds 90% at all three confidence levels. This indicates robust prediction performance under normal weather conditions, with high accuracy maintained even within a narrow range. In contrast, the interval widths for Weather 1 and Weather 2 are larger, reflecting the model’s adaptation to weather fluctuations: the upper and lower bounds widen to accommodate more complex meteorological conditions. Under Weather 2, the model achieves 95.58% coverage at the 95% confidence level, suggesting that its nonlinear feature extraction allows the intervals to cover power fluctuations even under variable weather conditions.
In summary, the proposed model not only performs well in point forecasting but also shows high robustness in interval prediction. It provides prediction intervals with high coverage across different weather types, quantifying the uncertainty of PV output. These interval predictions offer technical support for ensuring grid stability and enabling precise decision-making in power markets.

4. Conclusions

To address the intermittent and uncertain fluctuations inherent in PV power generation, this paper proposes a hybrid forecasting model based on DPCA-CPO-RF-KAN-GRU. The DPCA is first used to classify the weather conditions. Then, the CPO-RF model is employed for data dimensionality reduction, followed by the KAN-GRU for PV power forecasting. Key findings include the following:
(1)
The DPCA effectively classifies weather conditions into three distinct types using meteorological and PV power data. The average feature vectors of the clusters show that the three weather types differ significantly in meteorological factors and photovoltaic power, which supports the rationality of the clustering and provides a basis for weather-type-specific forecasting.
(2)
The CPO-RF optimizes the Random Forest hyperparameters and screens key features, selecting those closely associated with PV output as input variables, thereby reducing prediction complexity and improving forecasting accuracy.
(3)
Compared with other deep learning models, the CPO-RF-KAN-GRU hybrid forecasting model shows clear performance advantages in PV power prediction. KAN has strong nonlinear approximation capabilities and GRU processes time-series data efficiently, so the combined model can effectively capture the complex dynamic characteristics of PV power data. The model markedly reduces prediction errors, achieving an average R2 of about 97% across the three weather types, and outperforms the comparative models in prediction accuracy, confirming its effectiveness for PV power forecasting. SHAP analysis indicates that total solar radiation and direct solar radiation together account for more than 70% of the average SHAP value.
(4)
In terms of interval prediction, the proposed model achieves over 90% interval coverage at the 95% confidence level under all three weather types, demonstrating strong prediction robustness. Stable interval predictions help power grid companies mitigate the uncertainty risks associated with photovoltaic output and improve operational efficiency.
In the future, with advancements in artificial intelligence technology, AI-based PV power forecasting models will continue to evolve. In data processing, AI can select features more relevant to PV power output based on data characteristics, intelligently handle anomalous data, and enhance forecasting efficiency. Meanwhile, AI can learn and adapt to geographical and climatic features within the data, thereby optimizing model parameter settings to achieve superior forecasting performance. This will significantly improve the reusability and generalization capabilities of PV power forecasting models. These advancements will enhance the accuracy, robustness, and real-time performance of renewable energy power prediction, while providing intelligent tools for optimizing renewable energy trading, strengthening the flexibility and stability of power systems, and facilitating the high-penetration integration of renewable energy.

Author Contributions

M.L.: writing—original draft, Y.Z.: writing—review and editing; Y.W.: writing—original draft preparation; W.Z.: writing—original draft preparation; M.Q.: writing—original draft; X.B.: writing—review and editing; Z.D.: writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Science and Technology Project of the Headquarters of State Grid Corporation of China (No. 5400-202455314A-1-3-ZB).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Authors Ying Zhou, Yusi Wei, Weibo Zhao, Min Qu and Xue Bai were employed by the China Electric Power Research Institute Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Zhu, C.; Tu, Y.; Wei, Q.; Zang, Y.; Zhou, P.; Yang, L.; Wang, J.; Yu, Y.; Lv, R.; Du, J.; et al. Short-term power prediction of photovoltaic power station based on LSTM-XGBoost model. Sol. Energy 2025, 300, 113819.
2. Sun, F.; Li, L.; Bian, D.; Bian, W.; Wang, Q.; Wang, S. Photovoltaic power prediction based on multi-scale photovoltaic power fluctuation characteristics and multi-channel LSTM prediction models. Renew. Energy 2025, 246, 122866.
3. Zhang, Y.; Jing, C.; Wang, H.; Zhang, J.; Zhang, X. Medium- and long-term photovoltaic power prediction based on periodic attention mechanism. Acta Energiae Solaris Sin. 2024, 45, 298–308.
4. Li, L.; Gao, G.; Tao, P.; Zhang, C.; Zhao, S.; Chen, W. Ultra-short-term PV power dynamic prediction method based on SFLA and MSISSA-ANFIS. Acta Energiae Solaris Sin. 2024, 45, 326–335.
5. Delussu, F.; Manzione, D.; Meo, R.; Ottino, G.; Asare, M. Experiments and Comparison of Digital Twinning of Photovoltaic Panels by Machine Learning Models and a Cyber-Physical Model in Modelica. IEEE Trans. Ind. Inform. 2022, 18, 4018–4028.
6. Pretto, S.; Ogliari, E.; Niccolai, A.; Nespoli, A. A New Probabilistic Ensemble Method for an Enhanced Day-Ahead PV Power Forecast. IEEE J. Photovolt. 2022, 12, 581–588.
7. Cheng, J.; Yang, Y.; Deng, M. Factors influence analysis of China’s photovoltaic industry development based on machine learning method. Water Resour. Power 2024, 42, 1–5.
8. Cui, S.; Lyu, S.; Ma, Y.; Wang, K. Improved informer PV power short-term prediction model based on weather typing and AHA-VMD-MPE. Energy 2024, 307, 132766.
9. Zhao, B.; Wang, Y.; Wang, B.; Xuan, W.; Lei, Z. Photovoltaic power prediction in distribution network based on ARIMA model time series. Renew. Energy Resour. 2019, 37, 820–823.
10. He, Y.; Shi, C.; Guo, X.; He, W.; Han, T. Photovoltaic power prediction algorithm based on parameter optimization of multi-kernel SVM. Acta Energiae Solaris Sin. 2024, 45, 394–404.
11. Li, Y.; Huang, W.; Lou, K.; Zhang, X.; Wan, Q. Short-term PV power prediction based on meteorological similarity days and SSA-BiLSTM. Syst. Soft Comput. 2024, 6, 200084.
12. Liu, X.; Chen, C.; Wang, H.; Chen, H. Power prediction of mechanism-data hybrid drive photovoltaic power plant based on TOPSIS-GRNN. Renew. Energy Resour. 2024, 42, 471–478.
13. Guo, F.; Yang, C.; Xia, D.; Xu, J. Short-Term Prediction of Photovoltaic Power Based on Improved CNN-LSTM and Cascading Learning. Energy Eng. 2025, 122, 1975–1999.
14. Wang, C.; Zhang, L.; Liu, Z.; Tan, J.; Xv, S. Feature mining based indRNN photovoltaic power generation prediction. In Proceedings of the 2024 8th International Conference on Electrical, Mechanical and Computer Engineering (ICEMCE), Xi’an, China, 25–27 October 2024.
15. Song, S.; Li, B. Short-term forecasting method of photovoltaic power based on LSTM. Renew. Energy Resour. 2021, 39, 594–602.
16. Gao, H.; Yuan, Z.; Zhang, S.; Wang, X.; Zhang, H.; Geng, H. Short-term photovoltaic power prediction based on LSTM model. Acta Energiae Solaris Sin. 2024, 45, 376–381.
17. Yang, S.; Luo, Y. Short-term photovoltaic power prediction based on RF-SGMD-GWO-BiLSTM hybrid models. Energy 2025, 316, 134545.
18. Wu, Y.; Han, X.; Niu, Z.; Yan, B.; Zhao, Z.; Yang, J. Interpretable Interval Prediction of Photovoltaic Power Integrating Multi-Attention Deep Neural Network. Power Syst. Technol. 2024, 48, 2928–2939.
19. Li, Q. Probabilistic Forecasting of Photovoltaic Power Based on Attention Temporal Convolutional Neural Network. Acta Energiae Solaris Sin. 2025, 46, 326–332.
20. Cao, Y.; Liu, G.; Luo, D.; Bavirisetti, D.; Xiao, G. Multi-timescale photovoltaic power forecasting using an improved Stacking ensemble algorithm based LSTM-Informer model. Energy 2023, 283, 128669.
21. Zhao, X.; Luo, J.; Jiang, A.; Hu, H.; Ma, Y. Short-term Photovoltaic Power Prediction Based on Improved Phase Space Reconstruction and Multi-channel Convolution-FPN Network. Acta Energiae Solaris Sin. 2025, 46, 299–307.
22. Wu, W.; Mi, C.; Li, L. A Short-term Power Forecasting Model for Distributed Photovoltaics Based on Multi-source Meteorological Information and Improved Combined Neural Network. Acta Energiae Solaris Sin. 2025, 46, 181–192.
23. Wang, H.; Yan, X.; Xia, Q.; Liu, J.; Wang, X. Short-term photovoltaic power prediction based on OVMD-KPCA-RTH-GRU. Water Power 2024, 50, 98–103.
24. Chen, Q.; Liao, H.; Sun, Y.; Zeng, Y. Photovoltaic power prediction model based on GWO-GRU. Acta Energiae Solaris Sin. 2024, 45, 438–444.
25. Li, S.; Yu, H. Research on photovoltaic power generation power prediction based on POA-GRU Model. Power Electron. 2025, 59, 74–79+87.
26. Han, X.; Jiang, F.; Wen, S.; Tian, T. Kolmogorov-Arnold network-based enhanced fusion transformer for hyperspectral image classification. Inf. Sci. 2025, 717, 122323.
27. Liu, M.; Jin, S.; Liu, Z.; Liu, Z.; Yan, Z.; Liu, H.; Xu, M. Learning-based NLOS imaging with Kolmogorov-Arnold network-enhanced transformer. Opt. Laser Technol. 2025, 192, 113463.
28. Su, Y.; Zhang, L.; Zhao, G.; Guo, Z.; Zhang, K. Long-term trajectory prediction of hypersonic aircraft based on GRU-KAN. Aero Weapon. 2024, 31, 44–49.
29. Ma, F.; Xiao, T.; Wang, X.; Sun, C.; Li, M. Demand forecasting of shared bicycles around subway stations based on GRU-KAN modeling. Transp. Res. 2025, 11, 93–104.
30. Niu, D.; Sun, L.; Yu, M.; Wang, K. Point and interval forecasting of ultra-short-term wind power based on a data-driven method and hybrid deep learning model. Energy 2022, 254, 124384.
31. Shi, F.; Du, W.; Xu, X.; Tian, H.; Yan, R.; Chao, L. An improved density peaks clustering algorithm based on natural neighbor with a merging strategy. Inf. Sci. 2023, 624, 252–276.
32. Yi, Z.; Wei, P.; Jing, C. An improved density peak clustering algorithm guided by pseudo labels. Knowl.-Based Syst. 2022, 252, 109374.
33. Ding, S.; Li, C.; Xu, X.; Ding, L.; Zhang, J.; Guo, L.; Shi, T. A Sampling-Based Density Peaks Clustering Algorithm for Large-Scale Data. Pattern Recognit. 2023, 136, 109238.
34. Liu, X.; Yan, C. Based on the combined RF-ANFIS-PSO ultra-short-term photovoltaic power prediction model. Jiangxi Electr. Power 2023, 47, 25–28+50.
35. Zhang, J.; Ge, Y.; Wang, Y.; Tao, J.; Li, Z.; Fu, S.; Wang, X.; Zhong, Y.; Yan, B.; Chen, G. Photovoltaic power plants in mountainous area: Environmental impacts analysis based on random forest algorithm. Renew. Energy 2025, 254, 123670.
36. Dai, Y.; Wang, Y.; Leng, M.; Yang, X.; Zhou, Q. LOWESS smoothing and Random Forest based GRU model: A short-term photovoltaic power generation forecasting method. Energy 2022, 256, 124661.
37. Zhou, J.; Qu, Z.; Zhan, Z. Traveling wave fault location method of transmission line based on CPO-VMD. J. Electr. Power Sci. Technol. 2024, 40, 30–41.
38. Yang, Z.; Xu, K.; Zhao, H.; Su, B. Joint application of Crested Porcupine Optimizer and hybrid models in short-term wind power load forecasting. Electr. Power Syst. Res. 2025, 247, 111814.
39. Li, M.; Wang, X. Short-term electricity price forecasting via CPO-enhanced dual decomposition and NRBO-optimized deep learning. Digit. Signal Process. 2026, 168, 105520.
40. Ullah, I.; Liu, K.; Yamamoto, T. Prediction of electric vehicle charging duration time using ensemble machine learning algorithm and shapley additive explanations. Int. J. Energy Res. 2022, 46, 15211–15230.
41. Sun, Y.; Ma, H. Interpretable analysis of transformer winding vibration characteristics: SHAP and multi-classification feature optimization. Int. J. Electr. Power Energy Syst. 2025, 166, 110585.
Figure 1. Photovoltaic installed capacity and power generation costs.
Figure 2. KAN-GRU hybrid model structure.
Figure 3. Flowchart of the DPCA-CPO-RF-KAN-GRU hybrid model prediction.
Figure 4. DPCA classification effect.
Figure 5. Daily power generation curve of each weather type.
Figure 6. Iteration curve.
Figure 7. Comparison of prediction effect of different input types.
Figure 8. Comparison of different models of forecasting results in Weather 0.
Figure 9. Comparison of different models of forecasting results in Weather 1.
Figure 10. Comparison of different models of forecasting results in Weather 2.
Figure 11. Configuration perspective SHAP value global and local feature importance.
Figure 12. Error interval estimation results.
Figure 13. Interval forecasting results.
Table 1. Literature analysis and comparison.
| Author | Method | Dataset size | Key influencing factors | Differences from this study |
|---|---|---|---|---|
| Wang et al. [23] | OVMD-KPCA-RTH-GRU | 1 month, 15 min data (2019.08, 1623 samples) | Atmospheric temperature, module temperature, relative humidity, solar irradiance | RTH optimizes GRU and KPCA is used for dimensionality reduction; no weather classification. |
| Chen et al. [24] | GWO-GRU | 2 years, 15 min data (2018–2019, 70,080 samples) | Direct solar radiation, diffuse solar radiation, module temperature, pressure, humidity | GWO optimizes GRU; Pearson-based feature selection; no weather classification or KAN layer. |
| Li et al. [25] | POA-GRU | 1 month, 15 min data (2022.07, 1488 samples) | Total solar radiation, temperature, pressure, humidity, direct solar radiation | POA optimizes GRU and PCA is used for dimensionality reduction; no weather classification or KAN enhancement. |
| This study | DPCA-CPO-RF-KAN-GRU | 1 year, 15 min data (2022, 35,040 samples) | Module temperature, total solar radiation, direct solar radiation, diffuse solar radiation | Integrates DPCA, CPO-RF, and KAN-GRU; the synergy of multiple modules addresses the limitations of the comparative models. |
Table 2. Three types of weather meteorological characteristics.
| Average | Weather 0 | Weather 1 | Weather 2 |
|---|---|---|---|
| Module temperature (°C) | 33.87 | 8.32 | 22.79 |
| Temperature (°C) | 25.06 | 3.67 | 8.77 |
| Pressure (hPa) | 925.39 | 934.72 | 925.03 |
| Humidity (%) | 25.08 | 43.07 | 32.39 |
| Total solar radiation (W/m²) | 250.47 | 121.35 | 312.27 |
| Direct solar radiation (W/m²) | 209.37 | 87.07 | 281.81 |
| Diffuse solar radiation (W/m²) | 96.96 | 74.84 | 139.12 |
| PV power (MW) | 11.14 | 9.25 | 11.32 |
Table 3. Importance of influencing factors in photovoltaic prediction models.
| Feature | Number | Importance |
|---|---|---|
| Module temperature (°C) | X1 | 5.96 |
| Temperature (°C) | X2 | 0.71 |
| Pressure (hPa) | X3 | 2.13 |
| Humidity (%) | X4 | 0.36 |
| Total solar radiation (W/m²) | X5 | 8.83 |
| Direct solar radiation (W/m²) | X6 | 6.02 |
| Diffuse solar radiation (W/m²) | X7 | 4.83 |
Table 4. Study of different types.
| Input type | Importance threshold | Indicators |
|---|---|---|
| Type 1 | >0 | X1, X2, X3, X4, X5, X6, X7 |
| Type 2 | >2 | X1, X3, X5, X6, X7 |
| Type 3 | >4 | X1, X5, X6, X7 |
| Type 4 | >8 | X5 |
Table 5. Error comparison of different prediction modes.
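| Comparison mode | Weather type | Input set / model | RMSE | MSE | MAE | R2 |
|---|---|---|---|---|---|---|
| Different input feature sets | Weather 0 | Type 1 | 4.72 | 22.28 | 3.94 | 93.03% |
| Different input feature sets | Weather 0 | Type 2 | 4.06 | 16.46 | 3.29 | 94.85% |
| Different input feature sets | Weather 0 | Type 3 | 3.70 | 13.73 | 3.05 | 95.70% |
| Different input feature sets | Weather 0 | Type 4 | 4.00 | 16.02 | 3.31 | 94.98% |
| Different input feature sets | Weather 1 | Type 1 | 4.26 | 18.16 | 3.37 | 92.45% |
| Different input feature sets | Weather 1 | Type 2 | 3.62 | 13.11 | 2.97 | 94.55% |
| Different input feature sets | Weather 1 | Type 3 | 3.55 | 12.59 | 2.89 | 94.77% |
| Different input feature sets | Weather 1 | Type 4 | 4.08 | 16.64 | 3.40 | 93.08% |
| Different input feature sets | Weather 2 | Type 1 | 4.43 | 19.65 | 3.64 | 94.32% |
| Different input feature sets | Weather 2 | Type 2 | 3.77 | 14.21 | 3.12 | 95.89% |
| Different input feature sets | Weather 2 | Type 3 | 3.57 | 12.77 | 2.92 | 96.31% |
| Different input feature sets | Weather 2 | Type 4 | 3.91 | 15.32 | 3.24 | 95.57% |
| Different prediction models | Weather 0 | KAN-GRU | 2.94 | 8.67 | 2.44 | 97.25% |
| Different prediction models | Weather 0 | GRU | 3.55 | 12.59 | 2.88 | 96.01% |
| Different prediction models | Weather 0 | KAN-TCN | 3.24 | 10.47 | 2.63 | 96.68% |
| Different prediction models | Weather 0 | TCN | 3.71 | 13.77 | 3.07 | 95.63% |
| Different prediction models | Weather 0 | KAN-BiLSTM | 3.41 | 11.63 | 2.8 | 96.31% |
| Different prediction models | Weather 0 | BiLSTM | 3.47 | 12.05 | 2.84 | 96.18% |
| Different prediction models | Weather 1 | KAN-GRU | 2.95 | 8.68 | 2.38 | 96.37% |
| Different prediction models | Weather 1 | GRU | 3.28 | 10.79 | 2.7 | 95.49% |
| Different prediction models | Weather 1 | KAN-TCN | 3.61 | 13.04 | 2.96 | 94.55% |
| Different prediction models | Weather 1 | TCN | 3.9 | 15.19 | 3.22 | 93.65% |
| Different prediction models | Weather 1 | KAN-BiLSTM | 3.15 | 9.95 | 2.57 | 95.84% |
| Different prediction models | Weather 1 | BiLSTM | 3.58 | 12.81 | 2.95 | 94.65% |
| Different prediction models | Weather 2 | KAN-GRU | 2.9 | 8.43 | 2.31 | 97.55% |
| Different prediction models | Weather 2 | GRU | 3.3 | 10.88 | 2.66 | 96.84% |
| Different prediction models | Weather 2 | KAN-TCN | 3.1 | 9.61 | 2.51 | 97.21% |
| Different prediction models | Weather 2 | TCN | 3.89 | 15.12 | 3.16 | 95.61% |
| Different prediction models | Weather 2 | KAN-BiLSTM | 3.43 | 11.78 | 2.86 | 96.58% |
| Different prediction models | Weather 2 | BiLSTM | 3.65 | 13.34 | 2.96 | 96.13% |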
Table 6. Error intervals under different weather types.
| Weather type | Metric | 80% confidence level | 90% confidence level | 95% confidence level |
|---|---|---|---|---|
| Weather 0 | Error interval | [−2.05, 1.21] | [−2.33, 1.47] | [−2.53, 1.65] |
| Weather 0 | Interval width | 3.26 | 3.81 | 4.19 |
| Weather 1 | Error interval | [−1.69, 2.17] | [−1.93, 3.42] | [−2.18, 4.00] |
| Weather 1 | Interval width | 3.86 | 5.35 | 6.19 |
| Weather 2 | Error interval | [−2.24, 2.15] | [−2.72, 2.85] | [−2.92, 3.05] |
| Weather 2 | Interval width | 4.39 | 5.57 | 5.98 |
Table 7. Interval coverage results.
| PICP by weather type | 80% confidence level | 90% confidence level | 95% confidence level |
|---|---|---|---|
| Weather 0 | 92.83% | 94.92% | 96.92% |
| Weather 1 | 83.78% | 86.11% | 93.67% |
| Weather 2 | 84.44% | 88.89% | 95.58% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, M.; Zhou, Y.; Wei, Y.; Zhao, W.; Qu, M.; Bai, X.; Ding, Z. Short-Term Photovoltaic Power Prediction Using a DPCA–CPO–RF–KAN–GRU Hybrid Model. Processes 2026, 14, 252. https://doi.org/10.3390/pr14020252

AMA Style

Liu M, Zhou Y, Wei Y, Zhao W, Qu M, Bai X, Ding Z. Short-Term Photovoltaic Power Prediction Using a DPCA–CPO–RF–KAN–GRU Hybrid Model. Processes. 2026; 14(2):252. https://doi.org/10.3390/pr14020252

Chicago/Turabian Style

Liu, Mingguang, Ying Zhou, Yusi Wei, Weibo Zhao, Min Qu, Xue Bai, and Zecheng Ding. 2026. "Short-Term Photovoltaic Power Prediction Using a DPCA–CPO–RF–KAN–GRU Hybrid Model" Processes 14, no. 2: 252. https://doi.org/10.3390/pr14020252

APA Style

Liu, M., Zhou, Y., Wei, Y., Zhao, W., Qu, M., Bai, X., & Ding, Z. (2026). Short-Term Photovoltaic Power Prediction Using a DPCA–CPO–RF–KAN–GRU Hybrid Model. Processes, 14(2), 252. https://doi.org/10.3390/pr14020252

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
