1. Introduction
With the expansion of the global economy and the accompanying growth in energy consumption, energy shortages have become increasingly prominent. Meanwhile, the large-scale exploitation and utilization of fossil fuels have triggered a series of environmental and ecological crises, including intensified greenhouse effects and frequent extreme weather events, directly threatening human survival and development. Thus, energy security and sustainability have emerged as core issues in the global response to climate change. Against this backdrop, the global energy mix is undergoing a far-reaching transformation. The shift toward clean and low-carbon energy is becoming an urgent task for the international community in realizing sustainable development. Renewable energy, with its inherent advantages of cleanliness and sustainability, has become a core strategy of the energy transformation [1,2]. Over the past decade, global electricity generation has grown at an average annual rate of approximately 2.6%. In 2024, global electricity generation from all energy sources increased, with the exception of oil. Wind and solar power accounted for 53% of the global growth in electricity generation. Renewable power generation (including hydropower) contributed 32% to global electricity supply [3]. However, renewable power is affected by policy regulation and geographical environment and presents significant volatility and uncertainty [4].
Accurate prediction of renewable power generation provides scientific data support for formulating energy structure transformation strategies. Therefore, research on renewable power generation prediction is of significant theoretical value and practical importance. Notably, this field is confronted with multiple inherent challenges that complicate prediction tasks. On the one hand, high-quality data for renewable power generation prediction are often scarce, and this small-sample constraint restricts model performance. On the other hand, the raw data exhibit significant nonlinear dynamic characteristics driven by factors such as intermittent weather conditions or policy adjustments. These features make it difficult to accurately characterize the underlying patterns of renewable power generation through traditional linear models. Together, these challenges underscore the complexity and necessity of research in forecasting renewable power generation. Against this backdrop, renewable power generation prediction has been intensively studied, and existing methods can be broadly categorized into three types based on their core principles: statistical analysis models, machine learning models, and grey models. The first category comprises statistical analysis models, which utilize statistical principles to capture data patterns. Foundational approaches include the Autoregressive Integrated Moving Average (ARIMA) [5,6,7], exponential smoothing techniques [8], and hybrid models derived from these methods [9,10]. These models have been extensively applied across various domains of energy forecasting. For instance, ARIMA has been employed to predict electricity consumption [11] and demand [12]. Furthermore, other statistical techniques have been applied to forecast renewable energy generation. Mesa-Jiménez et al. [13] introduced a long-term wind-solar generation forecasting method based on Markov Chain Monte Carlo, specifically designed for power purchase agreements. Çakır [14] proposed a hybrid approach that integrates intuitionistic fuzzy time series with intuitionistic fuzzy c-means clustering to forecast renewable energy generation in Turkey. However, the ability of these models to adapt to time series characterized by high intermittency and pronounced nonlinearity remains limited.
To address the challenges of forecasting renewable energy generation characterized by strong dependence on variable environmental factors and the need to capture complex nonlinear patterns, scholars have increasingly turned to machine learning approaches, which excel at learning intricate data-driven relationships without relying on strict statistical assumptions. Firstly, shallow learning methods such as ANN [
15,
16], and SVR [
17,
18] have been applied to model the nonlinear patterns. In recent years, deep learning models [
19,
20] have gained prominence for their superior ability to capture long-term temporal dependencies in sequential energy data. Hybrid models, which integrate the complementary strengths of different methods, have further pushed the boundaries of prediction performance [21,22,23]. Many researchers have developed hybrid or optimized methods to improve prediction ability. Alshammari et al. [
24] proposed the mayfly-optimized deep recurrent neural network (RNN) integrated with stacked CNN and ReLU activation, which delivered excellent results for wind power forecasting. Liu et al. [
25] proposed the hybrid ICEEMDAN-CIAPO-ELM system, which combines sequence decomposition and hyperparameter optimization to enhance wind power prediction accuracy. Bashir et al. [
26] established two models, CNN-ABiLSTM and CNN-Transformer-MLP, trained on 15-minute real-time data, which outperformed standalone baselines in day-/week-/month-ahead solar and wind power forecasting. For photovoltaic (PV) power specifically, Ibrahim et al. [
27] proposed a CNN-LSTM hybrid model for short-term forecasting, while Lim et al. [
28] developed a weather-adaptive CNN-LSTM framework to improve prediction robustness under variable conditions. Beyond model design, Abu-Salih et al. [
29] highlighted the value of high-resolution data, developing an optimized LSTM model using 5 s smart meter data that outperformed classical machine learning baselines. Liu et al. [
30] explored AI’s role in multidimensional renewable energy data processing—enhancing accuracy, quantifying uncertainty, and advancing the sector’s intelligent development. Despite these advancements, existing hybrid machine learning models face notable limitations. They are highly dependent on the volume and quality of training data, tend to overfit in small-sample scenarios, and often lack interpretability. These gaps underscore the need for further research to refine their practical applicability in renewable energy forecasting.
Compared with machine learning methods, grey models do not need a large amount of data and have a simple modeling process. Owing to the outstanding performance of grey models, they have been widely adopted for prediction tasks in the energy field. For example, grey models have been successfully employed to forecast wind power [
31,
32] and solar energy consumption [
33,
34]. Meanwhile, many scholars have constructed different grey models to handle the renewable power generation prediction. He et al. [
35] established a novel structure-adaptive new information priority discrete grey prediction model for forecasting small-sample mid-to-long-term renewable energy generation. Pandey et al. [
36] innovatively adopted the optimized discrete grey model DGM(1,1,
) to forecast India’s renewable energy production. Wang et al. [
37] proposed a structure-adaptive discrete grey Bernoulli model to address the nonlinearity and poor information of renewable energy data. Qian et al. [
38] proposed a structural adaptive discrete grey prediction model, which introduces nonlinear and periodic terms to capture the linear and nonlinear mixed trend and periodicity of renewable energy generation time series. Xia et al. [
39] constructed a dynamic structure-adaptive multivariate grey model to improve the accuracy of solar power generation prediction in China. Xia et al. [
40] proposed a fractional order discrete grey model with dynamic delay function (DTDFF-DGM(1,1)) to accurately predict the trend of clean energy generation in China. Li et al. [
41] constructed a discrete multivariate grey model with seasonal delay effects to adapt the prediction of clean energy generation with different fluctuation characteristics. Wang et al. [
42] proposed a structure-adaptive seasonal grey model based on data recombination to improve the accuracy of solar power generation prediction. Ding et al. [
43] proposed a flexible nonlinear multivariate discrete grey prediction model, which introduces power exponent terms and adjustable time power terms to accurately predict China’s hydropower and total renewable energy generation. Ren et al. [
44] proposed the multi-variable grey model with a dual adaptive factor information accumulation mechanism to predict the growth trend of renewable energy generation in China. Chen et al. [
45] constructed a grey oscillation model that integrates the smoothing operator and fractional order accumulation operator to achieve accurate prediction for the EU’s hydropower generation. Wang et al. [
46] proposed the time-lag discrete grey Euler model (ATDGEM(1,1)) for predicting regional renewable energy generation. Hybrid models based on grey theory are significant methodologies for forecasting renewable energy generation. Ran et al. [
47] developed a grey combination model based on EMD to achieve long-term wind power prediction and provide a basis for policy formulation. Gu et al. [
48] proposed a grey adaptive ensemble model which integrates FGM(1,1) and AGMC(1,n) after feature selection and dynamically assigns weights through a Gaussian kernel function to balance the fitting and generalization ability of renewable energy generation prediction.
It can be seen that grey prediction models are widely applied in renewable power generation prediction because of their strong adaptability to small-sample data. However, traditional grey models mainly rely on sequence operators such as the First-order Accumulation Generation Operator (1-AGO) and the Fractional-order Accumulation Generation Operator (FAGO). These operators accumulate information from the original sequence with fixed or static weights based on mathematical rules, without distinguishing between the “valid component” and the “invalid component” of grey information. As a result, invalid information is accumulated as well, which exacerbates prediction errors. In particular, the model’s accuracy and stability are significantly insufficient when nonlinear renewable energy data are analyzed. Although existing studies have improved model performance by optimizing the grey accumulation operators, they still fail to break through the limitation of “modeling based on full-volume information” and cannot fundamentally solve the problem of extracting valid grey information. Against this background, constructing a new grey model that can accurately identify valid grey information has become a key to improving the prediction accuracy of renewable power generation. The Probabilistic Accumulation Generation Operator (PAGO) simulates the extraction of valid and invalid information through a Bernoulli distribution and uses the Sigmoid function to quantify the probability of valid information extraction. In this way, the valid grey information in the original sequence is effectively mined, and the interference from invalid information is mitigated.
Based on this, aiming at the complex characteristics of renewable power generation and the information processing defects of traditional grey models, this study relies on the PAGO operator to achieve accurate extraction of valid grey information and optimize the information foundation for building a grey model. It designs a new grey prediction model based on the PAGO operator to improve the traditional model. Meanwhile, the heuristic algorithm is introduced to solve the problem of determining complex parameters of the model. Finally, the model is applied to forecast the renewable power generation for providing a high-precision prediction tool and decision support for energy planning and energy policy formulation. In this study, the principal contributions are as follows:
- (1) The probabilistic accumulation non-homogeneous grey model (PNGM) is proposed by integrating the Probabilistic Accumulation Generation Operator (PAGO) into the traditional non-homogeneous grey model. The newly proposed model utilizes the PAGO operator to effectively mine valid grey information to solve the problem of “excessive information smoothing and insufficient valid grey information mining” in the traditional models. It provides a new model architecture to enhance the adaptability of the grey model to complex sequences and the accuracy of information extraction.
- (2) Given that the core parameters of PAGO have a critical impact on the model’s ability to fit data patterns, the Whale Optimization Algorithm (WOA) is employed to intelligently optimize the core parameters of the operator to improve the model’s fitting performance. At the same time, numerical experiments have been conducted to verify that the WOA algorithm has the advantages of fast convergence speed and high time efficiency. It ensures the effectiveness of parameter optimization and the ability of mining valid grey information by the PAGO operator. Thus, a dual advantage of model innovation and intelligent optimization is formed.
- (3) The proposed PNGM model is applied to forecast renewable power generation in different countries, such as the United States, India, and Russia, covering the predicted demand under different regional climate conditions and energy structures. It demonstrates strong regional adaptability and data compatibility. The uncertainty interval for each prediction value is estimated by the Bootstrap resampling method. The results can provide a reliable data reference for the planning and decision-making of energy management departments and new energy enterprises.
The remainder of this paper is structured as follows: In Section 2, the theory of the NGM model, the PAGO operator, and the PNGM model are introduced. In Section 3, the proposed PNGM model is optimized by using the heuristic algorithm. The application of the probabilistic accumulation non-homogeneous grey model is presented in Section 4. Finally, we conclude this study and introduce the future work in Section 5.
2. Related Work
Since the introduction of grey system theory [
49], its advantages in scenarios with small samples and poor information have gradually become prominent. To obtain better performance, scholars have improved the grey model by optimizing initial value and background values, constructing a new grey accumulation operator, etc. As one of the key parameters in grey prediction models, the background value directly influences the model’s parameter estimation and predictive precision. Usually, the dynamic background values are introduced and optimized by optimization algorithms to enhance the models’ adaptability [
50,
51,
52,
53]. Optimizing the initial value of the time response function in the grey model can also effectively enhance its performance [
54]. Meanwhile, comprehensive optimization of the background value and time response function also represents another reliable approach. Ding et al. [
55] proposed an adaptive optimized grey model in which the initial value, background values reconstructed using Simpson’s formula, and the constant term of the time response function are simultaneously optimized. The hybrid forecasting method based on the grey model also constitutes an important research direction. Yuan et al. [
56] developed the hybrid forecasting model GM-ARIMA and applied it to predicting China’s primary energy consumption. Zhao et al. [
57] developed the residual GM-LSSVM model to forecast the electricity consumption intensity in Beijing. Song et al. [
58] established the FIG-GARM model and applied it to forecast the traffic speed of urban expressways. In addition, developing new Accumulation Generation Operators or refining existing generation operators constitutes another important strategy for improving grey model performance. Wu et al. [
59] pioneered the extension of the Accumulation Generation Operator from integer-order to fractional-order. Ma et al. [
60] designed the Conformable Fractional Accumulation Generation Operator (CFAGO) to extend the theory of grey fractional order operators. Xu et al. [
61] extended the CFAGO operator to boost its robustness and effectiveness. Wang et al. [
62] constructed a novel Caputo FAGO operator and established an adaptive grey model based on this operator. Zeng et al. [
63] established a new intelligent buffering operator to tackle the system prediction trap caused by dataset perturbations. These operators rely on fixed weights or static weights derived from mathematical rules. Given that the grey information of original sequences comprises both valid and invalid components, using only the valid components to build grey models helps improve model efficiency. In response to this need, Zhang et al. [
64] developed the theory of probabilistic accumulation operators to extract valid information for constructing a grey model. Through the optimization methods outlined above, the theory of grey models and their practical applicability have been substantially expanded and improved. The evolution of grey models with the mainstream accumulation operators is shown in Table 1. In the next section, we extend the grey prediction model by using the PAGO operator.
Table 1.
The grey models and their variants.
Name | Model | Characteristics | Ref. |
---|---|---|---|
GM(1,1) | | Basic grey model with 1-AGO. | [49] |
NGM(1,1,k) | | Non-homogeneous grey model with 1-AGO and linear grey action quantity. | [65] |
NGM(1,1,k,c) | | Non-homogeneous grey model with 1-AGO and linear grey action quantity. | [66] |
DAGM(1,1) | | Grey model with Damping Accumulation Generation Operator (DAGO), of which order is . | [67] |
FGM(1,1) | | Grey model with Fractional-order Accumulation Generation Operator (FAGO), of which the order is . | [59] |
FANGM(1,1,k,c) | | Non-homogeneous grey model with the FAGO operator and linear grey action quantity. | [68] |
FHGM(1,1) | | Grey model with Fractional Hausdorff Accumulation Generation Operator (FHAGO), of which the order is h. | [69] |
CFGM(1,1) | | Grey model with Conformable Fractional-order Accumulation Generation Operator (CFAGO), of which the order is . | [60] |
CFNGM(1,1,k,c) | | Non-homogeneous grey model with CFAGO and linear grey action quantity. | [68] |
PGM(1,1) | | Grey model with Probabilistic Accumulation Generating Operator (PAGO). | [64] |
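As a point of reference for the fixed-weight operators listed in Table 1, the following minimal Python sketch shows how 1-AGO and FAGO accumulate a sequence and how a baseline GM(1,1) is fitted on the 1-AGO sequence. It assumes the commonly used textbook forms: FAGO weights of the form Γ(r+j)/(Γ(j+1)Γ(r)) for a value lying j steps in the past, trapezoidal background values, and least-squares parameter estimation. The function names `fago` and `gm11_fit_predict` are illustrative and are not taken from the cited works.

```python
import numpy as np


def fago(x0, r):
    """Fractional-order accumulation of order r (r = 1 reduces to 1-AGO)."""
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    # Weight for a value j steps in the past:
    # w_j = Gamma(r + j) / (Gamma(j + 1) * Gamma(r)), built by recurrence.
    w = np.ones(n)
    for j in range(1, n):
        w[j] = w[j - 1] * (r + j - 1) / j
    # x_r[k] = sum over i <= k of w[k - i] * x0[i]; the weights depend only on r.
    return np.convolve(x0, w)[:n]


def gm11_fit_predict(x0, horizon=0):
    """Baseline GM(1,1): fitted values followed by `horizon` forecasts."""
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    x1 = np.cumsum(x0)                          # 1-AGO sequence
    z1 = 0.5 * (x1[1:] + x1[:-1])               # background values
    B = np.column_stack([-z1, np.ones(n - 1)])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]
    k = np.arange(n + horizon)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a   # time response function
    # Inverse accumulation restores the original scale.
    return np.concatenate([[x0[0]], np.diff(x1_hat)])
```

In both operators every observation enters the accumulated sequence with a weight fixed by the order alone, which is the behaviour that the probabilistic operator of [64] is designed to relax.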
4. Optimization of Probabilistic Non-Homogeneous Grey Model
4.1. Optimization Problem for the Parameters of the PNGM Model
In the theory of the probabilistic accumulation grey model, the two parameters of the PAGO directly affect the performance of the model. Only appropriate parameter values can effectively extract the valid components from the grey information in the raw data, and using such valid information enhances the capability of the grey model. In the PNGM model, the two PAGO parameters must be given before the accumulation operation is performed on the original sequence. The PAGO operator is then employed to compute the corresponding probabilistic accumulation generated sequence, and the model parameters are subsequently calculated via Theorem 1. However, it is difficult to determine the optimal values of the PAGO parameters directly through theoretical deduction. To address this issue, an optimization problem is formulated in which the objective function minimizes the fitting error of the model. Specifically, the PAGO parameter values that yield the smallest fitting error are designated as the optimal parameters. This nonlinear optimization problem is shown below.
4.2. Comparison of Optimization Algorithms
Since intelligent optimization algorithms differ significantly in their ability to solve nonlinear optimization problems, their performance must be verified experimentally. To identify an algorithm suitable for parameter optimization of the proposed model, this study analyzes the performance of three intelligent optimization algorithms commonly used in grey models: Grey Wolf Optimization (GWO), Particle Swarm Optimization (PSO), and the Whale Optimization Algorithm (WOA) [70]. Numerical experiments are designed to systematically evaluate the efficiency and performance of these algorithms and to provide an objective basis for algorithm selection. A unified parameter configuration is adopted: for all three algorithms, the maximum number of iterations is set to 300 and the population size is set to 80. All experiments are run on the same computer, which is equipped with a 12th Gen Intel(R) Core(TM) i9-12950HX processor (2.30 GHz) and 32 GB of RAM, and the algorithms are implemented and executed in MATLAB R2009a. In each experimental case, each algorithm is run independently 100 times to reduce the influence of randomness. The three datasets used in the experiments are all taken from Section 5. The renewable energy data from the United States, the Russian Federation, and India are used in Case 1, Case 2, and Case 3, respectively, to verify the adaptability of the algorithms in renewable energy-related optimization problems. The experiments record two core performance metrics. The first is the average computation time for each algorithm to complete a single optimization task. The second is the mean fitness at key iteration points (i.e., the 30th, 60th, 100th, 200th, and 300th iterations). The detailed results are presented in Table 2. WOA achieves the shortest average optimization time among the three algorithms. The trend of average fitness across the iteration stages shows that WOA also converges fastest. Therefore, the WOA algorithm is adopted to solve the optimization problem (39).
4.3. Procedure for Optimizing PNGM Based on WOA
Notably, the optimization problem (39) is a typical nonlinear optimization problem for which an analytical solution is difficult to obtain directly. Based on the validation in Section 4.2, WOA can effectively deal with the optimization problem (39). Meanwhile, WOA has been widely validated for such optimization scenarios in grey system modeling [39,71,72]. Consequently, this study employs WOA to solve the optimization problem (39) and derive the model’s optimal parameters. The detailed procedure for optimizing the parameters of PNGM is as follows:
Step 1: Input the raw sequence and initialize the population and parameters. First, collect and input the raw sequence. Second, randomly generate N initial individuals, each of which represents a candidate solution in the two-dimensional solution space, so that each individual’s position is a pair of candidate values for the PAGO parameters. Last, set the algorithm parameters: the population size N and the maximum number of iterations.
Step 2: Calculate the fitness and determine the optimal solution. For the i-th individual, its position is used as the two parameters of PAGO. According to Definition 1, the probabilistic accumulation sequence is generated from the raw data. The PNGM model is established according to Definition 4, and its parameters are obtained based on Theorem 1. The fitting values are computed according to Theorem 2. Then, the fitness is calculated based on the objective function (39). The position of the individual with the lowest fitness is the global optimal solution.
Step 3: The position of each whale is updated according to the rules. Update the position of each individual as follows: When
and
, the position of each whale is updated based on
When
and
, the position of each whale is updated based on
When
, the position of each whale is updated based on
In these equations,
,
and
r are random in
.
is randomly chosen from the population.
Step 4: Update the optimal solution and terminate the iteration. After each iteration, recalculate the fitness of all individuals. If the i-th individual’s fitness is better than that of the current optimal solution, the optimal solution is updated to that individual’s position. Repeat Step 2 and Step 3 until the maximum number of iterations is reached, and finally output the optimal solution.
Figure 2 shows the processing steps of searching for the best parameters of the PAGO operator.
The time complexity of optimizing PNGM based on WOA mainly comes from model construction and optimization iterations. The time complexity of the computational cost for generating the probabilistic accumulation sequence is , where n is the number of data samples for building the PNGM model. The time complexity of parameter estimation by the least square method is , where p is the number of parameters to be estimated. The time complexity of WOA is , is the maximum number of iterations of the algorithm, N is the population size, and is the time complexity of fitness function calculation. Therefore, the total time complexity of optimizing PNGM based on WOA is .
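The search loop in Steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration that assumes the standard WOA update rules from the WOA literature (encircling the best solution, random search, and the spiral update), with scalar coefficients per whale and a spiral constant of 1; it is not the MATLAB implementation used in the experiments. The name `woa_minimize` and the quadratic `toy_objective` are hypothetical stand-ins: in the PNGM setting the objective would be the fitting error in (39) evaluated at a candidate pair of PAGO parameters.

```python
import numpy as np


def woa_minimize(objective, lower, upper, n_whales=30, max_iter=200,
                 spiral_b=1.0, seed=0):
    """Minimise `objective` over a box with a basic Whale Optimization Algorithm."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Step 1: random initial population inside the search box.
    pos = rng.uniform(lower, upper, size=(n_whales, lower.size))
    # Step 2: fitness of every individual and the current best solution.
    fit = np.array([objective(p) for p in pos])
    best = int(np.argmin(fit))
    best_pos, best_fit = pos[best].copy(), float(fit[best])
    for t in range(max_iter):
        a = 2.0 * (1.0 - t / max_iter)          # shrinks linearly from 2 towards 0
        for i in range(n_whales):
            # Step 3: position update, one of three rules per whale.
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            p = rng.random()
            if p < 0.5 and abs(A) < 1.0:
                # Encircle the best solution found so far.
                new = best_pos - A * np.abs(C * best_pos - pos[i])
            elif p < 0.5:
                # Explore around a randomly chosen whale.
                rand_pos = pos[rng.integers(n_whales)]
                new = rand_pos - A * np.abs(C * rand_pos - pos[i])
            else:
                # Spiral towards the best solution.
                l = rng.uniform(-1.0, 1.0)
                dist = np.abs(best_pos - pos[i])
                new = dist * np.exp(spiral_b * l) * np.cos(2.0 * np.pi * l) + best_pos
            pos[i] = np.clip(new, lower, upper)   # keep whales inside the box
        # Step 4: re-evaluate the population and keep the best solution seen.
        fit = np.array([objective(p) for p in pos])
        best = int(np.argmin(fit))
        if fit[best] < best_fit:
            best_pos, best_fit = pos[best].copy(), float(fit[best])
    return best_pos, best_fit


# Toy stand-in for the fitting-error objective: a quadratic bowl in two parameters.
target = np.array([0.3, 0.6])


def toy_objective(v):
    return float(np.sum((v - target) ** 2))


best_pos, best_fit = woa_minimize(toy_objective, [0.0, 0.0], [1.0, 1.0])
```

Each fitness evaluation in the real procedure rebuilds the accumulated sequence and re-estimates the model, so the cost of one run scales with the number of iterations times the population size times the cost of one model fit.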
4.4. Numerical Validation
4.4.1. Evaluation Indicators and Comparative Models
In time series prediction, a wide range of evaluation metrics are employed to assess model performance. In our study, the performance of the models is evaluated using Relative Error (RE), Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), and Root Mean Square Percentage Error (RMSPE). The mathematical formula of RE is described as
MAPE is described as
where
n is the number of data samples. A lower MAPE indicates better model performance. The formula of MAE is
The RMSPE is described as
To validate the model’s performance, nine grey models, namely GM(1,1), DGM, NGM(1,1,k,c), FGM(1,1), FANGM(1,1,k,c), CFGM(1,1), CFNGM(1,1,k,c), FHGM(1,1), and PGM(1,1), are used as comparison models for the PNGM model. These models’ formulas are presented in Table 1.
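The four indicators can be computed as in the short Python sketch below. It assumes the conventional definitions, with RE taken as the absolute pointwise percentage error and all percentage metrics scaled by 100; the function names are illustrative.

```python
import numpy as np


def relative_error(actual, predicted):
    """Pointwise absolute relative error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.abs(predicted - actual) / np.abs(actual) * 100.0


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return float(np.mean(relative_error(actual, predicted)))


def mae(actual, predicted):
    """Mean absolute error, in the units of the data."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - actual)))


def rmspe(actual, predicted):
    """Root mean square percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(((predicted - actual) / actual) ** 2)) * 100.0)
```

MAPE and RMSPE are scale-free and therefore comparable across countries with different generation levels, whereas MAE is expressed in the units of the data (terawatt-hours in the applications that follow).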
4.4.2. Numerical Simulation on Synthetic Datasets
To evaluate the PNGM model in complex data contexts, several numerical experiments were conducted using the datasets generated by non-homogeneous exponential functions, which are described as
. The datasets generated by
are listed in
Table 3 when
and
. In the numerical experiments, values No. 1 to No. 8 were used as the training set for building the models, and value No. 9 was used as the test set to validate each model’s predictive accuracy. The fractional orders of FGM(1,1) and FANGM were both optimized via the WOA algorithm, while the parameters
of the PGM and PNGM models were also searched and determined using the same algorithm.
The fitting and prediction MAPEs of the models are presented in
Table 4. The calculation results demonstrate that the PNGM model exhibits stronger fitting and predictive abilities on all synthetic datasets, with much lower MAPEs than other comparison models. It can also be found that the combination of non-homogeneous grey model theory and the PAGO operator can notably improve model performance. Essentially, valid information extracted from grey information through the PAGO operators can effectively enhance modeling capabilities. For example, the information extraction parameters of the PNGM model are (0.0776, 0.6099) in sequence S1. The valid information sequence extracted from the original sequence for modeling is [11.2755, 13.2510, 13.8563, 14.1533, 14.7160, 15.9813, 18.4626, 22.9537]. The comparison between the grey information and the valid information extracted from it is shown in
Figure 3. By utilizing the valid information of S1, the performance of the PNGM model was significantly improved, with fitting and predictive MAPEs of 0.69% and 0.67%, respectively. The PNGM model exhibits excellent capability of mining valid information from the complex sequences.
4.5. Estimation Method of Uncertainty Interval for PNGM Model
To accurately quantify the uncertainty of prediction results from the PNGM model and provide comprehensive references for risks and feasibility in decisions related to energy planning and power grid dispatching, this study refines the uncertainty interval estimation method for the PNGM model by integrating residual analysis and the Bootstrap resampling approach. The detailed steps of uncertainty interval estimation are as follows:
Step 1: Construct the Original PNGM Model and Calculate the Residual Sequence
Based on the original observation sequence
, the probabilistic accumulation generating sequence
is computed by using the PAGO operator, of which the parameters
are optimized by the WOA algorithm. Then, the original PNGM model is constructed based on the modeling steps in
Section 3.3 to obtain the model’s fitted values
(
). The residual sequence can be computed as
Step 2: Generate Bootstrap Samples
Perform sampling with replacement on the residual sequence
obtained in Step 1, with its length equal to the number of samples. The
b-th (
) Bootstrap residual sequence is described as
Based on the fitted values
of the original PNGM model and the Bootstrap residuals
, the new observation sequence is reconstructed as
Step 3: Construct Bootstrap-PNGM Models and Conduct k-Step Predictions
For the b-th reconstructed observation sequence , the probabilistic accumulation generating sequence is firstly recalculated using PAGO. Then, the PNGM modeling process is repeated to obtain the b-th Bootstrap-PNGM model. The next k time points (denoted as ) are used to obtain the k-step predicted value set corresponding to the b-th sampling
Step 4: Repeat Sampling and Determine Uncertainty Intervals
Repeat Steps 2–3 a total of
B times to generate prediction sets in batches. For each future prediction time
(
),
B prediction values are collected to form the prediction value set
. Then, the set
for each prediction time is sorted in ascending numerical order to obtain the ordered set
(
). The corresponding quantiles based on the confidence level
are calculated. The lower quantile
is the
-th ordered value. The upper quantile
is the
-th ordered value. The uncertainty interval for that time is obtained as
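Steps 1 to 4 can be sketched in Python as follows. The sketch is deliberately model-agnostic: `linear_trend_forecast` is a simple least-squares line used only as a stand-in for the PNGM fit (including its WOA-tuned PAGO step), and any function with the same signature could be passed instead. The residual is assumed to be the observed value minus the fitted value, and the quantile positions are assumed to follow a ceiling rule on B·α/2 and B·(1−α/2); all names are hypothetical.

```python
import numpy as np


def linear_trend_forecast(series, k_steps):
    """Stand-in model: least-squares line. Returns (fitted values, k-step forecasts)."""
    t = np.arange(series.size)
    slope, intercept = np.polyfit(t, series, deg=1)
    future_t = np.arange(series.size, series.size + k_steps)
    return slope * t + intercept, slope * future_t + intercept


def bootstrap_intervals(series, k_steps, fit_predict, n_boot=500, alpha=0.05, seed=0):
    """Residual-bootstrap uncertainty intervals for k-step-ahead forecasts."""
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)
    # Step 1: fit the original model and compute the residual sequence.
    fitted, _ = fit_predict(series, k_steps)
    residuals = series - fitted
    forecasts = np.empty((n_boot, k_steps))
    for b in range(n_boot):
        # Step 2: resample residuals with replacement and rebuild the observations.
        resampled = rng.choice(residuals, size=residuals.size, replace=True)
        pseudo_series = fitted + resampled
        # Step 3: refit on the reconstructed sequence and forecast k steps ahead.
        _, forecasts[b] = fit_predict(pseudo_series, k_steps)
    # Step 4: sort the B forecasts per horizon and read off the two order statistics.
    ordered = np.sort(forecasts, axis=0)
    lo_idx = max(int(np.ceil(n_boot * alpha / 2)) - 1, 0)
    hi_idx = min(int(np.ceil(n_boot * (1 - alpha / 2))) - 1, n_boot - 1)
    return ordered[lo_idx], ordered[hi_idx]


# United States renewable generation, 2012-2022 (TWh), as listed in Section 5.1.1.
us_train = np.array([502.35, 532.79, 552.53, 562.26, 631.19, 714.54,
                     741.14, 769.13, 830.49, 870.98, 968.61])
lower, upper = bootstrap_intervals(us_train, 2, linear_trend_forecast)
```

Because the model is refitted on every reconstructed sequence, the interval reflects both the residual noise and the sensitivity of the estimated parameters to that noise.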
5. Application of the PNGM Model
Climate change and environmental pollution stemming from the over-consumption of fossil fuels have become increasingly severe. The whole world is expediting the transition toward low-carbon energy systems. With the promotion of “dual carbon” goals, renewable power (such as solar, wind, biomass energy, and so on) has seen a steady rise in its share of the energy structure. Nevertheless, renewable power generation is subject to multiple influencing factors, including natural conditions, policy regulations, and technological advancements. It exhibits pronounced nonlinear and volatile characteristics. These characteristics of renewable power pose substantial challenges to power system planning, dispatching, and the formulation of energy policies. Accurate prediction of renewable power generation constitutes a fundamental prerequisite for its efficient utilization, as it can mitigate risks in energy systems and enhance the efficiency of energy allocation.
Due to the strong ability of the model to handle the nonlinear data sequence with small samples, we use the PNGM model to investigate the prediction problem of the renewable power generation in this study. The datasets, which are derived from the 2025 Energy Institute Statistical Review of World Energy [
3], include the renewable power generation of the United States, the Russian Federation, and India from 2012 to 2024. Meanwhile, we select the nine comparative models mentioned in Section 4.4.1 because of their universality and efficiency.
5.1. Modeling of Renewable Energy Generation Prediction
5.1.1. PNGM Model of Renewable Power Generation in the United States
In this subsection, we study the renewable power generation of the United States by using grey models. The sequence of renewable power generation in the United States during 2012–2024 is [502.35, 532.79, 552.53, 562.26, 631.19, 714.54, 741.14, 769.13, 830.49, 870.98, 968.61, 975.09, 1068.67] (unit: terawatt-hours), collected from the 2025 EISRWE. Firstly, we use the renewable power generation from 2012 to 2022 to build the grey models. The WOA is employed to determine the orders of FGM(1,1) and FANGM by minimizing the fitting MAPE. The PGM and PNGM models adopt the same approach to derive the optimal parameters of the PAGO operator. Table 5 lists the parameters of all models. Then, we calculated the fitted values for 2012–2022 and predicted the values for 2023 and 2024.
Table A1 displays the fitted values and predicted values of each model. The errors of all models are shown in
Table 6. The MAPEs of the models are compared in
Figure 4. It is evident that the fitting MAPEs of PGM and PNGM are lower than those of the other comparative models, and the fitting MAPE of PNGM is only 0.06 percentage points higher than that of the PGM model. This indicates that the PAGO operator has stronger information extraction capabilities than the other operators. In terms of predictive performance, the PNGM model achieves the lowest prediction MAPE, which is roughly 2.1 percentage points lower than that of the PGM model (1.80% versus 3.93%). This indicates that its generalization ability is stronger than that of the other models. For the entire original sequence, its total error is also the lowest, indicating that the PNGM model is the best suited of the compared models for predicting the United States’ renewable power generation.
5.1.2. PNGM Model of Renewable Power Generation in the Russian Federation
In this subsection, the prediction of renewable power generation in the Russian Federation is studied using grey prediction models. The original data are sourced from the 2025 EISRWE and cover the annual renewable power generation of the Russian Federation from 2012 to 2024. The values of the dataset are [163.97, 181.65, 174.21, 168.96, 185.67, 186.37, 192.00, 196.21, 216.25, 220.25, 205.11, 209.10, 218.86] (unit: terawatt-hours). The generation from 2012 to 2022 is used to build the grey models, and the generation from 2023 to 2024 is used to validate their performance. For FGM(1,1) and FANGM, the orders of the FAGO operator used for data preprocessing are obtained by searching with the heuristic WOA. For the probabilistic accumulation grey models PGM and PNGM, the parameters of the PAGO operator are also obtained via the WOA. The optimal parameters are listed in
Table 7. Next, we substitute the optimal parameters into these models to obtain the calculated values, which are listed in
Table A2. By analyzing the errors between the calculated and actual values, we obtained the fitting MAPEs, prediction MAPEs, and total MAPEs of all models for the annual renewable power generation of the Russian Federation, which are listed in Table 8. These errors are compared in Figure 5, which shows that the proposed PNGM model has the lowest fitting MAPE and prediction MAPE, at 2.20% and 2.16%, respectively. This result indicates that the PNGM model has stronger information mining capability and prediction performance on the renewable power generation dataset of the Russian Federation.
5.1.3. PNGM Model of Renewable Power Generation in India
In this subsection, grey models are developed based on India’s renewable power generation. The data sequence covers India’s renewable power generation between 2012 and 2024, collected from the 2025 EISRWE. The data sequence is [161.82, 183.00, 198.51, 191.67, 201.41, 225.96, 256.09, 292.24, 304.70, 320.05, 365.88, 370.67, 397.08] (unit: terawatt-hours). The data from the first 11 years of this time series are used to develop the grey models, while the data from the last 2 years are used to assess the models’ performance. The optimal values of the undetermined parameters for the FAGO and PAGO operators are searched using the heuristic WOA. Table 9 lists these optimal values. Then, we used the optimized models to obtain the fitted values and predicted values, which are shown in
Table A3. By comparing with real data for training and testing, we calculated the fitting errors and prediction errors of the models, which are shown in
Table 10.
Figure 6 shows the comparison of the errors of all models. The fitting MAPE and prediction MAPE of the PNGM model are 1.32% and 1.28%, respectively, which are the lowest among all models for India’s renewable power generation sequence. This reflects the good modeling ability and predictive performance of the PNGM model.
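As a quick check on how the fitting MAPE is obtained, the following minimal Python sketch recomputes it for the PNGM model from the 2012–2022 observations listed above and the PNGM fitting residuals reported in Section 5.2. It assumes the usual definition of MAPE as the mean of |residual| / actual value over all fitted points, including the first point, whose residual is zero. The function name `fitting_mape` and the `DATA` layout are illustrative choices, not part of the original study.

```python
# Observed renewable power generation for 2012-2022 (TWh), as listed in Section 5.1,
# paired with the PNGM fitting residuals listed in Section 5.2.
DATA = {
    "United States": (
        [502.35, 532.79, 552.53, 562.26, 631.19, 714.54,
         741.14, 769.13, 830.49, 870.98, 968.61],
        [0, 0.0002, 4.9526, -22.3604, -0.0003, 33.6061,
         8.9648, -15.4270, -7.5337, -21.6020, 20.3585],
    ),
    "Russia": (
        [163.97, 181.65, 174.21, 168.96, 185.67, 186.37,
         192.00, 196.21, 216.25, 220.25, 205.11],
        [0, -0.0007, -0.0014, -4.1086, 6.1420, -1.4980,
         -3.6175, -5.9060, 8.8849, 8.7034, -9.7512],
    ),
    "India": (
        [161.82, 183.00, 198.51, 191.67, 201.41, 225.96,
         256.09, 292.24, 304.70, 320.05, 365.88],
        [0, 0.0053, -1.3275, -0.0335, -2.2766, -1.0738,
         1.6041, 9.9019, -4.2912, -13.8317, 8.9854],
    ),
}


def fitting_mape(actual, residuals):
    """Mean absolute percentage error (in %) over the fitted points."""
    ratios = [abs(e) / x for x, e in zip(actual, residuals)]
    return 100.0 * sum(ratios) / len(ratios)


for country, (actual, residuals) in DATA.items():
    print(f"{country}: fitting MAPE = {fitting_mape(actual, residuals):.2f}%")
```

Under this definition, the three values agree, to rounding, with the PNGM fitting MAPEs quoted in the text (1.66%, 2.20%, and 1.32%).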
5.2. Residual Diagnosis of the Models
To verify the reliability of the PNGM model in fitting renewable energy generation, residual diagnostic analysis is conducted on the models for the United States, Russia, and India. The residual sequences of the PNGM models for these three countries can be directly derived from the model fitting process. The residual sequence for the United States is [0, 0.0002, 4.9526, −22.3604, −0.0003, 33.6061, 8.9648, −15.4270, −7.5337, −21.6020, 20.3585], for Russia is [0, −0.0007, −0.0014, −4.1086, 6.1420, −1.4980, −3.6175, −5.9060, 8.8849, 8.7034, −9.7512], and for India is [0, 0.0053, −1.3275, −0.0335, −2.2766, −1.0738, 1.6041, 9.9019, −4.2912, −13.8317, 8.9854]. Residual diagnosis is carried out along four dimensions, namely descriptive statistics, independence, variance stability, and structural consistency. To test the temporal independence of the residual sequence, the Ljung–Box test is employed. Considering the constraints of the small sample size (n = 11), lag orders of 1 and 2 are selected, following the criterion that lag orders should be reasonably small to avoid excessive loss of degrees of freedom. To verify the stability of the residual variance, the Breusch–Pagan test is adopted. Specifically, an auxiliary regression model is constructed in which the observation order (i.e., the time index $t = 1, 2, \ldots, n$, with $n$ as the total sample size) serves as the explanatory variable, and the squared residual (a direct proxy for residual variance) serves as the explained variable. This setup enables the detection of potential heteroscedasticity that varies with time. To verify the structural consistency of the residual sequences, the Chow test is employed. To ensure a sufficient sample size for parameter estimation in each segment, the 5th period of the residual sequence is selected as the preset breakpoint. Detailed test statistics for the three countries are presented in
Table 11.
According to the descriptive statistical results in Table 11, the mean values of the residual sequences for the United States, Russia, and India are all close to 0. This indicates that there is no significant systematic deviation between the fitted values of the models and the actual values of renewable power generation, so the models reliably capture the overall direction of the three countries’ generation trends. Specifically, the standard deviation of the residual sequence for the United States is 17.0588, which reflects a certain degree of local random fluctuation in its residuals. However, judging from the temporal distribution of the residuals, this fluctuation does not exhibit trend amplification or clustering and can still be regarded as random error. The standard deviations of the residual sequences for Russia and India are 5.9537 and 6.3141, respectively, which are relatively small. This indicates that the overall fluctuation amplitude of the residuals is within a statistically reasonable range. Overall, the residual sequences of the three countries do not show a significant trend drift, such as a linear increase/decrease or periodic shift. Their numerical distribution preliminarily satisfies the core statistical characteristics of a random error term, laying the foundation for the subsequent residual independence, variance stability, and structural consistency tests.
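The descriptive statistics discussed here can be recomputed directly from the residual sequences listed in this section. The sketch below uses Python's standard `statistics` module and assumes that the reported standard deviations are sample standard deviations (divisor n − 1); the variable name `RESIDUALS` is an illustrative choice.

```python
import statistics

# PNGM fitting residuals (TWh) for 2012-2022, as listed in Section 5.2.
RESIDUALS = {
    "United States": [0, 0.0002, 4.9526, -22.3604, -0.0003, 33.6061,
                      8.9648, -15.4270, -7.5337, -21.6020, 20.3585],
    "Russia": [0, -0.0007, -0.0014, -4.1086, 6.1420, -1.4980,
               -3.6175, -5.9060, 8.8849, 8.7034, -9.7512],
    "India": [0, 0.0053, -1.3275, -0.0335, -2.2766, -1.0738,
              1.6041, 9.9019, -4.2912, -13.8317, 8.9854],
}

for country, res in RESIDUALS.items():
    mean = statistics.mean(res)
    std = statistics.stdev(res)  # sample standard deviation (n - 1 divisor)
    print(f"{country}: mean = {mean:.4f}, std = {std:.4f}")
```

With the n − 1 divisor, the computed standard deviations match the values quoted above (17.0588, 5.9537, and 6.3141) to within rounding of the published residuals.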
From the Ljung–Box test results in Table 11, it is evident that for the residual sequences of the PNGM models for the United States, Russia, and India, the p-values corresponding to the Ljung–Box statistics at the 1st- and 2nd-order lags are all greater than the preset significance level $\alpha = 0.05$. In line with hypothesis testing criteria, this indicates that the null hypothesis that “residual sequences are free of autocorrelation” cannot be rejected, meaning the residual sequences of the three countries exhibit no significant temporal correlation. This result suggests that the PNGM model has effectively extracted the dynamic evolution characteristics of the original renewable energy generation via the PAGO operator and the non-homogeneous grey action quantity. The model fitting process does not appear to omit key temporal information, and no significant unexplained lag-related structure remains in the residuals at these lags.
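For readers who wish to reproduce the independence check, the following minimal sketch computes Ljung–Box statistics at lags 1 and 2 for the United States residual sequence using only the standard library. It assumes the textbook form Q = n(n + 2) Σ ρ_k² / (n − k) with mean-centred residuals and no degrees-of-freedom adjustment for estimated parameters; the study may have used a different convention. The helper `chi2_sf` is a hypothetical name and only covers the 1 and 2 degrees of freedom needed here.

```python
import math

# PNGM fitting residuals for the United States, as listed in Section 5.2.
US_RESIDUALS = [0, 0.0002, 4.9526, -22.3604, -0.0003, 33.6061,
                8.9648, -15.4270, -7.5337, -21.6020, 20.3585]


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def ljung_box(residuals, max_lag):
    """Return a list of (lag, Q statistic, p-value) for lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)
    running = 0.0
    results = []
    for k in range(1, max_lag + 1):
        # Sample autocorrelation of the residuals at lag k.
        rho = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        running += rho * rho / (n - k)
        q = n * (n + 2) * running
        results.append((k, q, chi2_sf(q, k)))
    return results


for lag, q, p in ljung_box(US_RESIDUALS, 2):
    print(f"lag {lag}: Q = {q:.4f}, p = {p:.4f}")
```

Under these assumptions, both p-values for the United States sequence exceed 0.05, which is consistent with the conclusion drawn from Table 11.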
To diagnose the heteroscedasticity of the residual sequences from the PNGM models for renewable energy generation prediction in the United States, Russia, and India, the Breusch–Pagan test was conducted. For the U.S. residual sequence, the LM statistic is 1.0645 (df = 1) with a corresponding p-value of 0.3022, which is greater than the significance level $\alpha = 0.05$, so the null hypothesis of “residual homoscedasticity” cannot be rejected. This indicates stable residual variance over the full sample period, reliable standard errors of the parameter estimates, and no interference of heteroscedasticity with the model’s statistical inference. For Russia’s residual sequence, the LM statistic is 8.3107 (df = 1) with a p-value of 0.0039 (less than 0.05), and for India’s residual sequence, the LM statistic is 5.1668 (df = 1) with a p-value of 0.0230 (also less than 0.05). Both reject the homoscedasticity null hypothesis, confirming significant heteroscedasticity. Although the residual sequences of the PNGM models for Russia and India exhibit significant heteroscedasticity, it is important to clarify the scope of its impact. In the least squares framework underlying the PNGM model’s parameter estimation, heteroscedasticity distorts only the standard errors of the parameter estimates and thereby undermines the reliability of statistical inference. It does not compromise the unbiasedness and consistency of the parameter estimates themselves. Since the point prediction results of the PNGM model rely primarily on these unbiased parameter estimates, the models’ point predictions for renewable energy generation in Russia and India can still reflect the underlying temporal trends. Therefore, it is reasonable to refrain from modifying the PNGM model to address heteroscedasticity when the objective is limited to generating point predictions. The PNGM model for the United States requires no additional adjustment because it satisfies the homoscedasticity assumption.
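The auxiliary regression described in this section can be sketched in a few lines. The example below regresses the squared residuals on the time index t = 1, …, n and forms the statistic LM = n·R², compared against a chi-square distribution with one degree of freedom. Using the n·R² (studentized) form is an assumption on our part, and the function name `breusch_pagan_time` is hypothetical.

```python
import math

# PNGM fitting residuals (TWh), as listed in Section 5.2.
RESIDUALS = {
    "United States": [0, 0.0002, 4.9526, -22.3604, -0.0003, 33.6061,
                      8.9648, -15.4270, -7.5337, -21.6020, 20.3585],
    "Russia": [0, -0.0007, -0.0014, -4.1086, 6.1420, -1.4980,
               -3.6175, -5.9060, 8.8849, 8.7034, -9.7512],
    "India": [0, 0.0053, -1.3275, -0.0335, -2.2766, -1.0738,
              1.6041, 9.9019, -4.2912, -13.8317, 8.9854],
}


def breusch_pagan_time(residuals):
    """LM = n * R^2 from regressing squared residuals on the time index."""
    n = len(residuals)
    t = list(range(1, n + 1))
    y = [e * e for e in residuals]
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    r_squared = sxy * sxy / (sxx * syy)  # R^2 of a one-regressor OLS fit
    lm = n * r_squared
    p_value = math.erfc(math.sqrt(lm / 2.0))  # chi-square upper tail, df = 1
    return lm, p_value


for country, res in RESIDUALS.items():
    lm, p = breusch_pagan_time(res)
    print(f"{country}: LM = {lm:.4f}, p = {p:.4f}")
```

With this form, the statistics computed from the published residuals agree with the reported values to about two decimal places: homoscedasticity is not rejected for the United States and is rejected at the 5% level for Russia and India.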
From the Chow test results presented in Table 11, it is evident that for the residual sequences of the PNGM models corresponding to the United States, Russia, and India, the p-values associated with the F-statistics are all greater than the preset significance level $\alpha = 0.05$. Accordingly, the null hypothesis of “no structural break in residuals” cannot be rejected. This result indicates that the fitting performance of the PNGM model remains consistent throughout the entire sample period, and there is no need to adjust the model parameters in stages.
5.3. Prediction of Renewable Power Generation
In the comparative experiments between the PNGM model and the nine traditional and improved grey models, with MAPE, MAE, and RMSPE as evaluation metrics, the PNGM model achieves the lowest prediction MAPE in all three cases and the lowest fitting MAPE in all but the United States case, where the PGM model fits marginally better. In the renewable energy generation forecasting scenarios for the United States, Russia, and India, the PNGM model can more accurately capture the variation patterns of renewable energy generation. Further, a systematic residual diagnosis of the PNGM models for the three countries was conducted using the Ljung–Box test, Breusch–Pagan test, and Chow test. The results indicate that the residuals of the three models have no significant autocorrelation or structural break, and the residual sequence of the U.S. model has no significant heteroscedasticity. Based on these performance advantages and the reliability verification, the remainder of this subsection applies the PNGM model to forecasting renewable energy generation in the United States, Russia, and India. Based on the parameters listed in
Table 5,
Table 7, and
Table 9, the PNGM models for renewable power generation of the United States, the Russian Federation, and India can be established. The PNGM model for renewable power generation in the United States can be expressed as
The PNGM model for renewable power generation of the Russian Federation can be expressed as
The PNGM model for renewable power generation of India can be expressed as
From these models, the renewable power generation of India, Russia, and the United States from 2025 to 2030 can be readily obtained. India’s renewable power generation during 2025–2030 is projected to be [415.48, 431.94, 447.05, 460.94, 473.70, 485.42] terawatt-hours. The renewable power generation of the Russian Federation from 2025 to 2030 is projected to be [212.20, 222.49, 223.52, 224.33, 224.97, 225.47] terawatt-hours. The United States’ renewable power generation during 2025–2030 is projected to be [1122.15, 1182.49, 1244.06, 1306.89, 1370.99, 1436.40] terawatt-hours. The predicted values are shown in
Figure 7.
5.4. Uncertainty Interval Estimation in Renewable Power Generation Prediction
To evaluate the predictive performance and uncertainty of the PNGM model in forecasting renewable power generation across the United States, the Russian Federation, and India, we employed a Bootstrap resampling approach with 1000 iterations on model residuals to derive uncertainty intervals (UIs) at 50%, 80%, and 95% confidence levels, as presented in
Table 12,
Table 13 and
Table 14.
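The quantile-extraction step of such a Bootstrap procedure can be illustrated with a short sketch. Two simplifications are assumptions of this example rather than the study's procedure: each simulated prediction is formed by adding one resampled residual to the point forecast (no model refitting, so the intervals here do not widen with the horizon), and the interval bounds are taken as the ⌊Bα/2⌋-th and ⌈B(1 − α/2)⌉-th ordered values. The function names are hypothetical.

```python
import random

# PNGM fitting residuals for Russia, as listed in Section 5.2.
RUSSIA_RESIDUALS = [0, -0.0007, -0.0014, -4.1086, 6.1420, -1.4980,
                    -3.6175, -5.9060, 8.8849, 8.7034, -9.7512]


def simulate_predictions(point_forecast, residuals, b=1000, seed=0):
    """Simplified residual bootstrap: point forecast plus a resampled residual."""
    rng = random.Random(seed)
    return [point_forecast + rng.choice(residuals) for _ in range(b)]


def percentile_interval(samples, level_pct):
    """Percentile interval from the B simulated predictions of one future time.

    level_pct is an integer confidence level in percent (e.g. 95), so the
    ranks are computed with exact integer arithmetic.
    """
    ordered = sorted(samples)
    b = len(ordered)
    # 1-based ranks: floor(B * alpha / 2) and ceil(B * (1 - alpha / 2)).
    lower_rank = max(1, (b * (100 - level_pct)) // 200)
    upper_rank = min(b, -(-(b * (100 + level_pct)) // 200))
    return ordered[lower_rank - 1], ordered[upper_rank - 1]


# 217.48 TWh is the 2023 point prediction for Russia quoted in Section 5.4.
simulated = simulate_predictions(217.48, RUSSIA_RESIDUALS)
for level in (50, 80, 95):
    low, high = percentile_interval(simulated, level)
    print(f"{level}% UI: [{low:.2f}, {high:.2f}]")
```

Because of the simplifications, the printed intervals are not expected to reproduce Tables 12–14; the sketch only shows how nested intervals at several confidence levels are read off one sorted set of simulated predictions.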
For the United States, the predicted renewable power generation exhibits a consistent upward trend, rising from 1005.05 terawatt-hours in 2023 to 1436.40 terawatt-hours in 2030. The 50% UI lies closely around the predicted value, e.g., [982.77, 1013.28] in 2023, reflecting high precision for short-term prediction. In contrast, the 95% UIs broaden substantially over time, expanding from [957.69, 1048.87] in 2023 to [1083.21, 1870.17] in 2030. This widening is expected: stochastic uncertainties accumulate in long-term sequential prediction, which is consistent with the complexity of large-scale energy systems.
In the Russian Federation, the predicted values rise modestly from 217.48 terawatt-hours in 2023 to 225.47 terawatt-hours in 2030. The UIs here are narrower in absolute terms than those for the United States, which is attributable to the relatively more stable context of renewable energy development in Russia. For instance, the 95% UI spans [207.80, 232.03] in 2023 and extends to [152.18, 319.14] by 2030, remaining narrower than the corresponding U.S. intervals. This indicates lower absolute predictive uncertainty, but also smaller absolute growth in generation.
For India, the predicted renewable power generation increases from 378.09 terawatt-hours in 2023 to 485.42 terawatt-hours in 2030, with UIs that balance precision and coverage. The 50% UI highlights near-term precision, e.g., [375.75, 384.77] in 2023. However, the 95% UIs, which widen from [364.74, 395.87] in 2023 to [436.86, 610.60] in 2030, capture the escalating uncertainty inherent in India’s rapid and dynamic renewable energy transition.
5.5. Further Analysis and Discussion
In order to validate the performance of the PNGM model, we have established a series of grey models using the renewable power generation of the United States, the Russian Federation, and India. The PGM(1,1) model and the comparative models GM(1,1), FGM(1,1), CFGM(1,1), and FHGM(1,1) all have constant grey action quantities. From Table 6, Table 8, and Table 10, it can be found that the fitting error of the PGM(1,1) model established using renewable energy generation from the United States, Russia, and India is lower than that of GM(1,1), FGM(1,1), CFGM(1,1), and FHGM(1,1). Meanwhile, it can also be observed that the fitting performance of the PNGM model is superior to that of NGM(1,1,k,c), FANGM(1,1,k,c), and CFNGM(1,1,k,c), which have the same linear grey action quantity. For example, for forecasting the renewable power generation of India, the fitting MAPE of the PGM model is 2.45%, while the fitting MAPEs of GM(1,1), FGM(1,1), CFGM(1,1), and FHGM(1,1) are 3.82%, 2.71%, 3.82%, and 2.73%, respectively. The fitting MAPE of PNGM is 1.32%, while the fitting MAPEs of NGM(1,1,k,c), FANGM(1,1,k,c), and CFNGM(1,1,k,c) are 3.27%, 2.58%, and 2.35%, respectively. These results suggest that the PAGO operator has a stronger ability to extract useful information from the original sequence than the 1-AGO, FAGO, CFAGO, and HFAGO operators. Meanwhile, the PNGM model has the lowest prediction error in forecasting the renewable power generation of the United States, Russia, and India, which indicates that it has better predictive performance.
Notably, in the specific case of renewable power generation forecasting in the United States, the PGM model exhibits better fitting performance than the proposed PNGM model: its fitting MAPE (1.60%) and MAE are lower than those of the PNGM model (fitting MAPE of 1.66%). However, the prediction MAPE of the PNGM model (1.80%) is substantially lower than that of the PGM model (3.93%). The main reason is that the structures of the PNGM and PGM models are different. The grey action quantity of the PGM model is a constant, which means it assumes that the external factors of renewable energy generation remain unchanged over time. In contrast, the PNGM model adopts a time-varying linear grey action quantity, enabling it to adapt dynamically to subtle and gradual changes in the external factors of renewable power generation. This structural difference directly leads to different performance in fitting and prediction. The minor advantage of the PGM model in fitting MAPE may stem from overfitting to historical data: it achieves a slightly higher degree of consistency with past observations but generalizes less well to unseen future data.
Moreover, residual diagnostics are conducted for the PNGM models of the United States, Russia, and India. The results are shown in Table 11. For all three countries, the p-values of the Ljung–Box statistics at lag 1 and lag 2 are all greater than 0.05, indicating that there is no significant autocorrelation in the residual sequences. Meanwhile, the p-values of the Chow F-statistics of these countries are all greater than 0.05. Thus, there is no significant structural break in the residual sequences across the full sample period. The Breusch–Pagan test shows that the PNGM model for the United States exhibits no significant heteroscedasticity, while those for Russia and India exhibit significant heteroscedasticity. Because heteroscedasticity does not bias the least squares parameter estimates on which the point predictions primarily rely, the models’ point predictions for renewable energy generation in Russia and India can still reflect the underlying temporal trends.
Based on the result shown in Figure 7, the scale of renewable power generation of the United States is projected to increase from 1068.67 terawatt-hours in 2024 to 1436.40 terawatt-hours in 2030, a cumulative expansion of 34.4%. The annual growth rate fluctuates around 5%, and the absolute annual increment rises slightly year by year, indicating a stable growth trend. Renewable power generation of the United States is thus expected to exhibit stable and sustained growth in the coming years. In India, renewable power generation is projected to increase from 397.08 terawatt-hours in 2024 to 485.42 terawatt-hours in 2030, maintaining an overall growth trend. However, the annual growth rate gradually declines from 4.63% to 2.47%, indicating a weakening expansion pace. This phenomenon may align with the “natural growth slowdown following base expansion” pattern inherent to its developmental stage. Overall, India’s renewable energy generation from 2024 to 2030 is characterized by a “steady growth with a sustained deceleration in the growth rate” pattern. The scale of Russia’s renewable power generation is projected to change only slightly, from 218.86 terawatt-hours in 2024 to 225.47 terawatt-hours in 2030, a very small overall expansion. This trend indicates that the renewable power generation of Russia may have entered a “plateau phase”, characterized by weak growth momentum and overall stability.
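The growth figures quoted above follow from simple arithmetic on the 2024 observations and the 2025–2030 projections listed in Section 5.3. The short sketch below shows the calculation for the United States and India; the function names are illustrative.

```python
# 2024 observed value followed by the 2025-2030 PNGM projections (TWh), from Section 5.3.
US = [1068.67, 1122.15, 1182.49, 1244.06, 1306.89, 1370.99, 1436.40]
INDIA = [397.08, 415.48, 431.94, 447.05, 460.94, 473.70, 485.42]


def annual_growth_rates(series):
    """Year-over-year growth rates in percent."""
    return [100.0 * (curr / prev - 1.0) for prev, curr in zip(series, series[1:])]


def cumulative_growth(series):
    """Total growth from the first to the last value, in percent."""
    return 100.0 * (series[-1] / series[0] - 1.0)


def annual_increments(series):
    """Absolute year-over-year changes, in the units of the series."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


print("US cumulative growth 2024-2030: %.1f%%" % cumulative_growth(US))
print("US annual increments (TWh):", [round(d, 2) for d in annual_increments(US)])
print("India annual growth rates (%):", [round(r, 2) for r in annual_growth_rates(INDIA)])
```

The calculation confirms the 34.4% cumulative expansion for the United States, its slowly rising absolute annual increments, and the decline of India's annual growth rate from 4.63% to 2.47%.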
Meanwhile, the uncertainty interval of each point prediction is estimated by the Bootstrap method. Across the three countries, two consistent patterns emerge. First, UIs widen with increasing forecast horizon, reflecting the universal challenge of growing uncertainty in long-term predictions. Second, the width and absolute range of the UIs track the scale and volatility of each country’s renewable energy development. The wider intervals for the U.S. reflect the large scale and fast growth of its renewable power generation. For India, the development of renewable power generation is dynamic, and its growth is also rapid. The narrower intervals for Russia indicate that its renewable power generation develops stably and grows more slowly. This multi-country analysis not only supports the PNGM model’s applicability in small-sample and information-poor scenarios but also underscores its capability to capture country-specific uncertainty characteristics, thereby providing decision-makers with risk-calibrated reference ranges for renewable energy planning.
6. Conclusions
In this study, a novel Probabilistic Non-homogeneous Grey Model (PNGM) is proposed and applied to predict the renewable energy generation of the United States, the Russian Federation, and India. The proposed model leverages the capability of the Probability Accumulation Operator (PAGO) to extract useful information from raw sequences. The optimal parameters of the PNGM model are determined by the heuristic WOA, which significantly enhances the performance of the traditional non-homogeneous grey model. The PNGM models for the United States, the Russian Federation, and India are established by using historical data on renewable energy generation. Compared with the nine comparison models, the comprehensive performance of the PNGM model is superior. Meanwhile, the residual diagnosis results show that the PNGM models established for these three countries have a strong ability to capture patterns. The PNGM models are applied to forecast renewable power generation in the United States, Russia, and India from 2025 to 2030. Specifically, by 2030, the projected figures are 1436.40 terawatt-hours for the United States, 485.42 terawatt-hours for India, and 225.47 terawatt-hours for Russia. The United States’ average annual growth rate is around 5%, India’s is around 3%, and Russia exhibits almost no growth. The uncertainty interval of each predicted value is estimated by the Bootstrap method. These intervals can provide more reliable data support to relevant departments and enterprises. Based on the projection results, we offer the following suggestions. The United States should continue to strengthen technological innovation to maintain steady and rapid growth in renewable power generation. Russia should increase investment in renewable power generation to drive a steady rise in total generation. India can stem the slowdown in the growth rate of renewable energy generation through deeper international cooperation. Overall, countries around the world should continue to increase investment in renewable power and technological innovation, giving full play to the key role of renewable energy in the global power transition.
Future research can be extended in three respects. Firstly, the potential application of the PNGM model in forecasting solar power, wind power, or hydropower can be explored, and efforts should be made to integrate it with other forecasting technologies to enhance prediction accuracy and generalization. Secondly, because the PNGM model is a univariate model, it can be extended to multivariate grey models that incorporate more influencing factors and thereby broaden its applicability. Thirdly, the characteristics of other energy sources can be explored to construct new predictive models, so that more comprehensive technical support can be provided for energy structure adjustment and transformation.