Article

Recursively Updated Probabilistic Model for Renewable Generation

1 Electric Power Research Institute, State Grid Anhui Electric Power Co., Ltd., Hefei 230022, China
2 State Grid Anhui Electric Power Co., Ltd., Hefei 230022, China
3 Anhui Provincial Key Laboratory of New Energy Utilization and Energy Saving, Hefei University of Technology, Hefei 230009, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(19), 10546; https://doi.org/10.3390/app151910546
Submission received: 28 August 2025 / Revised: 22 September 2025 / Accepted: 25 September 2025 / Published: 29 September 2025

Abstract

The Gaussian Mixture Model (GMM) is commonly used to formulate the probabilistic model for quantifying uncertainties in renewable generation. However, traditional static probabilistic models may not efficiently adapt and learn from newly forecasted and measured data. In this paper, we propose a recursively updated probabilistic model that leverages a recursive estimation method to update the parameters of the GMM based on continuously arriving data of renewable generation. This recursive modeling approach effectively incorporates new observations while discarding outdated samples, enabling the tracking of time-varying uncertainties in renewable generation in an incremental manner. Furthermore, we introduce an extra calibration stage to enhance the long-term accuracy of the probabilistic model after a large number of incremental updates. The main contribution is to address the potential degradation of performance caused by suboptimal incremental updates accumulated over time. Numerical tests demonstrate that the proposed model achieves 5–10% higher log likelihood in characterizing renewable generation uncertainties compared to purely recursive models, while reducing computational time by three to four orders of magnitude (1000 to 10,000 times) relative to conventional EM. These results highlight the proposed model’s suitability for real-time probabilistic modeling of renewable generation, with potential applications in system operation.

1. Introduction

The accelerating development of renewable generation has brought new challenges to the secure and economical operation of power systems [1,2], because its power output depends on fluctuating meteorological conditions such as wind speed and solar radiation [2,3,4].
In order to capture the uncertainties of renewable generation, it is crucial to establish the probabilistic model of actual power output beyond deterministic forecast values [5]. As a powerful probability density estimator, the Gaussian Mixture Model (GMM) is characterized by its outstanding flexibility to fit arbitrary multidimensional correlated random variables precisely [6]. Thus, GMM has been successfully applied to modeling spatial-temporal probabilistic characteristics of uncertain renewable generation [7,8].
Due to its favorable properties for chance-constrained programming, the GMM-based probabilistic model of renewable generation is also widely employed in the uncertainty-aware operation of power systems [9,10]. It serves as a robust tool for decision-making under uncertainty across a range of applications. For example, in unit commitment, the GMM helps optimize the day-ahead scheduling of generating units by probabilistically characterizing wind and solar power output [11]. Similarly, it is instrumental in economic dispatch [12,13,14,15], where it enables a more precise trade-off between operational costs and risks. Specifically, some studies use it to capture forecast errors [12] or embed it within a chance-constrained framework to enhance security [13], while others apply GMM to multi-area problems to efficiently handle uncertainty across interconnected grids [14,15]. Beyond this, the GMM proves crucial for the aggregation of distributed energy resources [16,17], as it provides a compact and accurate probabilistic representation of the collective output from numerous small generators, which is essential for effective grid management. The model also provides a powerful tool for the allocation of flexible ramping capacity [18,19] by quantifying the required ramping capability based on the probabilistic fluctuations of high-penetration renewable generation. By accurately modeling these needs, this approach helps optimize flexible resources like fast-ramping generators or energy storage systems to maintain grid balance and stability.
As a parametric probabilistic model, the GMM requires its parameters to be estimated from empirical data in order to characterize the spatial-temporal uncertainties of renewable generation precisely [20,21]. The well-known Expectation Maximization (EM) algorithm is the most widely adopted approach for estimating the parameters of a GMM from historical samples [22,23]. In the context of probabilistic modeling of renewable generation, the typical EM algorithm requires an offline training set composed of historical forecasts and corresponding actual generation to tune the parameters of the GMM; the tuned GMM with static parameters is then deployed online to predict the uncertainties of renewable generation in the future [24].
In real-world power systems, numerous renewable energy sources continuously generate a large volume of forecast and measurement data [25]. The probabilistic characteristics of renewable generation represented by these emerging samples are clearly time-varying due to meteorological fluctuations [26,27,28]. A static GMM trained on offline data cannot capture the non-stationary uncertainties of renewable generation in practice. Meanwhile, because the EM algorithm needs to traverse the entire dataset in every iteration, re-estimating the parameters of the GMM is a time-consuming and computationally intensive task given the size of the historical sample set [29,30]. Thus, adjusting the parameters of the GMM efficiently in response to continuously emerging data is a critical issue if we aim to track the time-varying uncertainties of renewable generation [31]. The ability to rapidly integrate new data streams into existing probabilistic models opens avenues for more responsive and adaptive decision-making in managing renewable energy resources.
In the field of machine learning and data mining, incremental or recursive estimation of GMM parameters has garnered remarkable attention [32,33,34,35]. This approach is particularly advantageous because it addresses the significant computational burden associated with updating a probabilistic model dynamically [36,37]. Instead of relying on the entire dataset, the incremental update strategy modifies GMM parameters recursively using only information from new data samples. This recursive process allows for efficient, on-the-fly adjustment of the model's parameters as new data stream in, which is crucial for applications that require continuous model adaptation. The principle has been successfully extended to various domains, including the probabilistic modeling of renewable energy. For instance, the authors in [38] proposed a distributed variant of the incremental GMM update algorithm and applied it to the probabilistic modeling of wind power forecast errors, demonstrating the practical utility of incremental GMM in handling the large-scale dynamic data typical of modern power systems.
In this paper, we propose a comprehensive framework for updating the parameters of the GMM using continuously arriving samples. We focus on exploring the application of this framework as a recursively updated probabilistic model for renewable generation. This recursive update method addresses two main issues, which are often overlooked in existing research on incremental GMM.
Firstly, we introduce the calibration of GMM parameters to enhance the long-term performance of the probabilistic model after a large number of incremental updates. While previously proposed recursive update algorithms are computationally efficient, they do not guarantee the optimality of parameters. The inaccuracies accumulated after recursive updates may compromise the precision of the probabilistic model in characterizing the uncertainties of renewable generation. To mitigate this, we combine an efficient recursive update step with an auxiliary calibration step, periodically invoked to compensate for suboptimality introduced by the former and to prevent degradation of precision over time.
Secondly, our recursively updated probabilistic model allows for a bidirectional update of the training dataset by incorporating new observations of renewable generation while concurrently discarding outdated samples. This stands in contrast to previous research where the total size of the training dataset monotonically increases due to the continuous integration of new samples. As the archive of historical renewable generation grows, the decreasing ratio of upcoming new samples against existing samples weakens the influence of incremental updates. In contrast, our bidirectional update strategy mitigates the inflation of the training dataset by gradually replacing the oldest samples with new ones in an incremental manner. This ensures that the most recent samples are always emphasized, enabling the model to capture the latest trend of uncertainties more effectively.
The remainder of this paper is organized as follows: The probabilistic model of renewable generation with the GMM and the conventional parameter estimation algorithm are introduced in Section 2 and Section 3 as a prelude. The incremental update framework for the GMM using streaming samples of renewable generation is proposed in Section 4. Section 5 presents the results of the case study. Conclusions are drawn in Section 6.

2. Probabilistic Model of Renewable Generation with Gaussian Mixture Model

Suppose we have already obtained the deterministic forecast curves of $N_R$ renewable energy sources (RESs) with $T$ lookahead points. These forecasts can be denoted as an $N_R \times T$-dimensional vector (1), where $\mathrm{vec}$ represents the operator that reshapes a matrix into a vector. The corresponding actual power generated by the $N_R$ RESs, revealed after the forecast, is denoted as $P^a$ in (2), with the same size as $P^f$. The forecast error $\Delta P$ is defined in (3) as the difference between $P^a$ and $P^f$.
$$P^{f} = \mathrm{vec}\begin{pmatrix} P^{f}_{1,1} & \cdots & P^{f}_{1,T} \\ \vdots & \ddots & \vdots \\ P^{f}_{N_R,1} & \cdots & P^{f}_{N_R,T} \end{pmatrix} \tag{1}$$

$$P^{a} = \mathrm{vec}\begin{pmatrix} P^{a}_{1,1} & \cdots & P^{a}_{1,T} \\ \vdots & \ddots & \vdots \\ P^{a}_{N_R,1} & \cdots & P^{a}_{N_R,T} \end{pmatrix} \tag{2}$$

$$\Delta P = P^{a} - P^{f} \tag{3}$$
If $P^f$ and $P^a$ are concatenated into a new $2 \times N_R \times T$-dimensional vector $P^{ob}$ in (4) to include the observed pairs of forecast and actual generation, the joint probability distribution of $P^{ob}$ can be described by the Gaussian Mixture Model (GMM) in (5), where $\omega_k$, $\mu_k$, and $\Sigma_k$ denote the weight, expectation, and covariance matrix of each Gaussian component, respectively.
$$P^{ob} = \begin{pmatrix} P^{f} \\ P^{a} \end{pmatrix} \tag{4}$$

$$\mathrm{PDF}_{P^{ob}}(x) = \sum_{k=1}^{K} \omega_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k) \tag{5}$$

$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{\exp\!\left(-\tfrac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right)}{\sqrt{(2\pi)^{\dim(x)}\det(\Sigma)}} \tag{6}$$

$$\sum_{k=1}^{K} \omega_k = 1, \qquad \omega_k > 0 \tag{7}$$
The mean vector and covariance matrix of each component can be partitioned according to the forecast and actual parts of $P^{ob}$:
$$\mu_k = \begin{pmatrix} \mu_f^k \\ \mu_a^k \end{pmatrix} \tag{8}$$

$$\Sigma_k = \begin{pmatrix} \Sigma_{ff}^k & \Sigma_{fa}^k \\ \Sigma_{af}^k & \Sigma_{aa}^k \end{pmatrix} \tag{9}$$
Once the joint probability model of $P^{ob}$ is obtained, we are more interested in the conditional distribution of the actual generation $P^a$ with respect to the deterministic forecast value $p^{fcst}$ known in advance. As pointed out by [2], the conditional distribution of $P^a$ is still an $N_R \times T$-dimensional GMM whose parameters can be inferred analytically from the joint probability model of $P^{ob}$, as given in (10)–(13).
$$\mathrm{PDF}_{P^{a}}\!\left(x \mid P^{f} = p^{fcst}\right) = \sum_{k=1}^{K} \bar{\omega}_k\, \mathcal{N}(x \mid \bar{\mu}_k, \bar{\Sigma}_k) \tag{10}$$

$$\bar{\omega}_k = \frac{\omega_k\, \mathcal{N}\!\left(p^{fcst} \mid \mu_f^k, \Sigma_{ff}^k\right)}{\sum_{k'=1}^{K} \omega_{k'}\, \mathcal{N}\!\left(p^{fcst} \mid \mu_f^{k'}, \Sigma_{ff}^{k'}\right)} \tag{11}$$

$$\bar{\mu}_k = \mu_a^k + \Sigma_{af}^k \left(\Sigma_{ff}^k\right)^{-1}\!\left(p^{fcst} - \mu_f^k\right) \tag{12}$$

$$\bar{\Sigma}_k = \Sigma_{aa}^k - \Sigma_{af}^k \left(\Sigma_{ff}^k\right)^{-1} \Sigma_{fa}^k \tag{13}$$
The conditional distribution of the forecast error $\Delta P$ is described as a GMM as well.

$$\mathrm{PDF}_{\Delta P}\!\left(x \mid P^{f} = p^{fcst}\right) = \mathrm{PDF}_{P^{a}}\!\left(x + p^{fcst} \mid P^{f} = p^{fcst}\right) = \sum_{k=1}^{K} \bar{\omega}_k\, \mathcal{N}\!\left(x + p^{fcst} \mid \bar{\mu}_k, \bar{\Sigma}_k\right) \tag{14}$$
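To make the conditioning step concrete, the sketch below implements (11)–(13) in Julia, the language used for the numerical tests in Section 5. It is a minimal illustration under the assumption that the joint GMM parameters are stored as plain arrays; the names log_gauss and condition_gmm are illustrative and not taken from any existing package.

```julia
using LinearAlgebra

# log N(x | μ, Σ), evaluated through the Cholesky factor of Σ (cf. (32) in Section 4)
function log_gauss(x, μ, Σ)
    C = cholesky(Symmetric(Σ))
    z = C.L \ (x .- μ)                       # solve L y = x - μ
    return -0.5 * (length(x) * log(2π) + logdet(C) + dot(z, z))
end

# Condition the joint GMM of P_ob = [P_f; P_a] on P_f = p_fcst, following (11)-(13).
# w, μ, Σ are length-K vectors of weights, mean vectors, and covariance matrices;
# d_f = N_R * T is the dimension of the forecast block.
function condition_gmm(w, μ, Σ, p_fcst, d_f)
    K = length(w)
    logw  = zeros(K)
    mubar = Vector{Vector{Float64}}(undef, K)
    Sbar  = Vector{Matrix{Float64}}(undef, K)
    for k in 1:K
        μf, μa = μ[k][1:d_f], μ[k][d_f+1:end]
        Σff = Σ[k][1:d_f, 1:d_f]
        Σfa = Σ[k][1:d_f, d_f+1:end]
        Σaf = Σ[k][d_f+1:end, 1:d_f]
        Σaa = Σ[k][d_f+1:end, d_f+1:end]
        G = Σaf / Σff                        # Σ_af (Σ_ff)^{-1}
        mubar[k] = μa .+ G * (p_fcst .- μf)  # (12)
        Sbar[k]  = Σaa .- G * Σfa            # (13)
        logw[k]  = log(w[k]) + log_gauss(p_fcst, μf, Σff)   # numerator of (11)
    end
    wbar = exp.(logw .- maximum(logw))       # normalize (11) in a numerically stable way
    wbar ./= sum(wbar)
    return wbar, mubar, Sbar
end
```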

3. Parameter Estimation Algorithm for Probabilistic Model of Renewable Generation

As mentioned in Section 2, the key to an accurate probabilistic model of renewable generation is the joint probability distribution of $P^{ob}$ in (5). The parameters $\Theta$ of the GMM in (5) (including the weight, expectation, and covariance matrix of each Gaussian component) can be efficiently estimated by the Expectation Maximization (EM) algorithm. We outline the EM algorithm below as a prerequisite for its incremental variant.
$$\Theta = \left\{\, \omega_k, \mu_k, \Sigma_k \mid k = 1, 2, \ldots, K \,\right\} \tag{15}$$
Step 1: $N$ samples of $P^{ob}$, composed of historical forecast and actual generation data, $X_1, X_2, \ldots, X_N$, are collected to form the training set.
Step 2: With a given number of components $K$ in the GMM, the parameters $\Theta$ are initialized to bootstrap the EM algorithm. A commonly used approach is to execute the k-means algorithm with $K$ clusters on the training set; the parameters of the $k$-th component are then calculated from the training samples in the $k$-th cluster $\Omega_k$, where $N_k$ denotes the number of training samples belonging to $\Omega_k$.
$$\omega_k = \frac{N_k}{N} \tag{16}$$

$$\mu_k = \frac{1}{N_k} \sum_{i \in \Omega_k} X_i \tag{17}$$

$$\Sigma_k = \frac{1}{N_k} \sum_{i \in \Omega_k} \left(X_i - \mu_k\right)\left(X_i - \mu_k\right)^{T} \tag{18}$$
Step 3 (Expectation step, E-step): The responsibility of each training sample with respect to each Gaussian component is calculated based on the current parameters of the GMM.
$$r_{ik} = \frac{\omega_k\, \mathcal{N}(X_i \mid \mu_k, \Sigma_k)}{\sum_{k'=1}^{K} \omega_{k'}\, \mathcal{N}(X_i \mid \mu_{k'}, \Sigma_{k'})}, \quad i = 1, 2, \ldots, N \tag{19}$$
Step 4 (Maximization step, M-step): The parameters of the GMM are adjusted to maximize the total likelihood of the training samples.
$$\omega_k = \frac{1}{N} \sum_{i=1}^{N} r_{ik} \tag{20}$$

$$\mu_k = \frac{1}{\sum_{i=1}^{N} r_{ik}} \sum_{i=1}^{N} r_{ik} X_i \tag{21}$$

$$\Sigma_k = \frac{1}{\sum_{i=1}^{N} r_{ik}} \sum_{i=1}^{N} r_{ik} \left(X_i - \mu_k\right)\left(X_i - \mu_k\right)^{T} \tag{22}$$
Step 5: The log likelihood of the training samples is calculated and compared with the result from the previous iteration. If the difference is smaller than the convergence tolerance, the EM algorithm has converged; otherwise, steps three and four are repeated to further improve the parameters of the GMM.
$$l = \log \prod_{i=1}^{N} \sum_{k=1}^{K} \omega_k\, \mathcal{N}(X_i \mid \mu_k, \Sigma_k) = \sum_{i=1}^{N} \log \sum_{k=1}^{K} \omega_k\, \mathcal{N}(X_i \mid \mu_k, \Sigma_k) \tag{23}$$
By following steps one to five, we can effectively estimate the parameters of the GMM with the goal of maximizing the log likelihood of the whole training set under the GMM.
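As a reference implementation, the sketch below performs one full E-step/M-step sweep following (19)–(23). It assumes the training samples are stored column-wise in a $D \times N$ matrix, reuses the log_gauss helper from the earlier sketch, and uses the illustrative name em_sweep!; it is a plain restatement of the algorithm above, not an optimized implementation.

```julia
using LinearAlgebra

# One EM sweep over the full training set X (D × N), following (19)-(23).
# w :: K weights, μ :: K mean vectors, Σ :: K covariance matrices (modified in place).
function em_sweep!(w, μ, Σ, X)
    D, N = size(X)
    K = length(w)
    # weighted log densities log(w_k) + log N(X_i | μ_k, Σ_k)
    logr = [log(w[k]) + log_gauss(X[:, i], μ[k], Σ[k]) for i in 1:N, k in 1:K]
    m = maximum(logr, dims=2)
    # E-step (19): responsibilities r[i, k], normalized per sample
    r = exp.(logr .- m)
    r ./= sum(r, dims=2)
    # M-step (20)-(22): re-estimate weights, means, and covariances
    for k in 1:K
        Nk = sum(r[:, k])
        w[k] = Nk / N
        μ[k] = (X * r[:, k]) ./ Nk
        Xc = X .- μ[k]
        Σ[k] = (Xc * Diagonal(r[:, k]) * Xc') ./ Nk
    end
    # log likelihood (23) for the convergence check in Step 5
    return sum(m .+ log.(sum(exp.(logr .- m), dims=2)))
end
```

The returned log likelihood can be compared between consecutive sweeps to implement the convergence test of Step 5.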

4. Recursive Parameter Update for Streaming Samples of Renewable Generation

After the parameters of the joint probability distribution of $P^{ob}$ in (5) have been estimated, the model can be employed to infer the conditional distribution of actual renewable generation based on the latest forecast values. During the operation of power systems, the forecasts of renewable generation and their corresponding actual values are continuously updated and accumulated in a streaming fashion. It is important to consolidate the recent observations of renewable generation with the pre-trained probabilistic model to improve its accuracy. Although we could abandon the existing model completely and estimate the parameters of the GMM from scratch on the extended training set including the new samples, this naïve approach is not scalable and is highly inefficient for online application. Thus, we use the following three procedures to recursively update the parameters of the joint probability distribution of $P^{ob}$ by keeping track of the newly collected data samples with moderate computational overhead. First, learning new samples ensures that the probabilistic model incorporates the most recent forecast and measurement data of renewable generation, thereby capturing the latest time-varying uncertainty patterns. Second, removing old samples prevents outdated information from dominating the training set and ensures that the model emphasizes current conditions instead of being diluted by obsolete data. Third, calibration of the parameters is introduced to correct the suboptimality accumulated after multiple incremental updates.

4.1. Procedure 1 (Learning New Samples)

When $M$ new samples of $P^{ob}$ are collected, the E-step and M-step of the original EM algorithm are adapted to update the parameters without traversing the existing training samples.
In the E-step, we calculate the responsibilities of the new samples with the existing parameters of the GMM:
$$r_{ik} = \frac{\omega_k\, \mathcal{N}(X_i \mid \mu_k, \Sigma_k)}{\sum_{k'=1}^{K} \omega_{k'}\, \mathcal{N}(X_i \mid \mu_{k'}, \Sigma_{k'})}, \quad i = N+1, \ldots, N+M \tag{24}$$
In the M-step, the parameters should be updated using the extended training set with N + M samples.
$$\omega_k^{new} = \frac{1}{N+M} \sum_{i=1}^{N+M} r_{ik} \tag{25}$$

$$\mu_k^{new} = \frac{1}{\sum_{i=1}^{N+M} r_{ik}} \sum_{i=1}^{N+M} r_{ik} X_i \tag{26}$$

$$\Sigma_k^{new} = \frac{1}{\sum_{i=1}^{N+M} r_{ik}} \sum_{i=1}^{N+M} r_{ik} \left(X_i - \mu_k^{new}\right)\left(X_i - \mu_k^{new}\right)^{T} \tag{27}$$
By comparing (25)–(27) with (20)–(22), the parameter update formulas above can be rewritten in the equivalent recursive forms (28)–(30) to avoid duplicate computation on the old training samples. The update formula of the covariance matrix (30) relies on the equality (31).
$$\omega_k^{new} = \frac{1}{N+M}\left(N\omega_k + \sum_{i=N+1}^{N+M} r_{ik}\right) \tag{28}$$

$$\mu_k^{new} = \frac{1}{\sum_{i=1}^{N+M} r_{ik}}\left(\mu_k \sum_{i=1}^{N} r_{ik} + \sum_{i=N+1}^{N+M} r_{ik} X_i\right) = \mu_k + \frac{1}{\sum_{i=1}^{N+M} r_{ik}} \sum_{i=N+1}^{N+M} r_{ik}\left(X_i - \mu_k\right) = \mu_k + \frac{1}{(N+M)\,\omega_k^{new}} \sum_{i=N+1}^{N+M} r_{ik}\left(X_i - \mu_k\right) \tag{29}$$

$$\Sigma_k^{new} = \frac{1}{\sum_{i=1}^{N+M} r_{ik}}\left[\sum_{i=1}^{N} r_{ik}\left(X_i - \mu_k^{new}\right)\left(X_i - \mu_k^{new}\right)^{T} + \sum_{i=N+1}^{N+M} r_{ik}\left(X_i - \mu_k^{new}\right)\left(X_i - \mu_k^{new}\right)^{T}\right] = \frac{N\omega_k}{(N+M)\,\omega_k^{new}}\,\Sigma_k + \frac{N\omega_k}{(N+M)\,\omega_k^{new}}\left(\mu_k - \mu_k^{new}\right)\left(\mu_k - \mu_k^{new}\right)^{T} + \frac{1}{(N+M)\,\omega_k^{new}} \sum_{i=N+1}^{N+M} r_{ik}\left(X_i - \mu_k^{new}\right)\left(X_i - \mu_k^{new}\right)^{T} \tag{30}$$

$$\sum_{i=1}^{N} r_{ik}\left(X_i - \mu_k^{new}\right)\left(X_i - \mu_k^{new}\right)^{T} = \sum_{i=1}^{N} r_{ik}\left[\left(X_i - \mu_k\right) + \left(\mu_k - \mu_k^{new}\right)\right]\left[\left(X_i - \mu_k\right) + \left(\mu_k - \mu_k^{new}\right)\right]^{T} = \left(\sum_{i=1}^{N} r_{ik}\right)\left[\Sigma_k + \left(\mu_k - \mu_k^{new}\right)\left(\mu_k - \mu_k^{new}\right)^{T}\right] \tag{31}$$

where the cross terms in (31) vanish because $\sum_{i=1}^{N} r_{ik}\left(X_i - \mu_k\right) = 0$ by (21).
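A direct translation of (24) and (28)–(30) into code is sketched below (Julia; the name learn_new_samples! is illustrative, and the log_gauss helper and LinearAlgebra import from the earlier sketches are assumed to be in scope). For clarity it updates dense covariance matrices; the Cholesky-based variant discussed next avoids re-factorizing them.

```julia
# Learn M new samples Xnew (D × M) into an existing GMM that has absorbed N samples,
# following the recursive forms (28)-(30). Returns the new effective sample count.
function learn_new_samples!(w, μ, Σ, Xnew, N)
    D, M = size(Xnew)
    K = length(w)
    # E-step (24): responsibilities of the new samples under the current parameters
    logr = [log(w[k]) + log_gauss(Xnew[:, i], μ[k], Σ[k]) for i in 1:M, k in 1:K]
    r = exp.(logr .- maximum(logr, dims=2))
    r ./= sum(r, dims=2)
    for k in 1:K
        sk = sum(r[:, k])
        w_new = (N * w[k] + sk) / (N + M)                               # (28)
        μ_new = μ[k] .+ (Xnew .- μ[k]) * r[:, k] ./ ((N + M) * w_new)   # (29)
        c = (N * w[k]) / ((N + M) * w_new)                              # scaling coefficient in (30)
        Σ_new = c .* Σ[k] .+ c .* (μ[k] .- μ_new) * (μ[k] .- μ_new)'
        Xc = Xnew .- μ_new
        Σ_new .+= (Xc * Diagonal(r[:, k]) * Xc') ./ ((N + M) * w_new)   # last term of (30)
        w[k], μ[k], Σ[k] = w_new, μ_new, Σ_new
    end
    return N + M
end
```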
Because the PDF of the GMM involves the inverse of the covariance matrix, the Cholesky decomposition (32) of each covariance matrix, where the Cholesky factor $L$ is a lower triangular matrix, is computed and stored to avoid forming the inverse explicitly. The term $\Sigma^{-1}(x - \mu)$ can then be obtained by solving the triangular systems $Ly = x - \mu$ and $L^{T}z = y$.
$$\Sigma = L L^{T} \tag{32}$$
Calculating the determinant of the covariance matrix is also straightforward with Cholesky decomposition.
$$\det(\Sigma) = \prod_{i=1}^{\dim(X)} L_{ii}^{2} \tag{33}$$
The computational complexity of computing the Cholesky decomposition of an $n \times n$ matrix is $O(n^3)$. However, the Cholesky factor can be updated recursively with only $O(n^2)$ computations for two basic operations:
1. Scaling operation: If the matrix $\Sigma$ is multiplied by a positive constant coefficient $c$, its Cholesky factor is multiplied by $\sqrt{c}$.

$$\tilde{\Sigma} = c\,\Sigma = \tilde{L}\tilde{L}^{T} \tag{34}$$

$$\tilde{L} = \sqrt{c}\, L \tag{35}$$
2. Rank-1 update operation: If the matrix $\Sigma$ is updated by a rank-1 matrix as in (36), where $v$ is a vector, its Cholesky decomposition can be updated with $O(n^2)$ complexity. The detailed algorithm can be found in Section 6.5.4 of [39] and is omitted here. Several implementations exist in well-known numerical software, such as cholupdate in MATLAB (R2024a) and lowrankupdate in Julia (1.10.2).

$$\tilde{\Sigma} = \Sigma + v v^{T} \tag{36}$$
The recursive update formula of the covariance matrix (30) is composed of one scaling operation and $M + 1$ rank-1 update operations. Thus, the Cholesky decomposition of the covariance matrix can also be updated recursively, reusing the existing Cholesky factor.
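Since $c\,\Sigma + v v^{T} = c\left(\Sigma + (v/\sqrt{c})(v/\sqrt{c})^{T}\right)$, the two basic operations can be chained directly on the stored factorization. A minimal sketch in Julia using the standard-library lowrankupdate mentioned above; the numerical values of Σ, c, and v are purely illustrative:

```julia
using LinearAlgebra

Σ = [4.0 1.0; 1.0 3.0]
C = cholesky(Σ)                  # stored factorization; C.L is the lower factor in (32)

c = 0.9                          # hypothetical scaling coefficient, e.g., Nω_k/((N+M)ω_k^new) in (30)
v = [0.5, -0.2]                  # hypothetical rank-1 direction from one new-sample term in (30)

# One O(n^2) rank-1 update (36) on the stored factorization, followed by the
# scaling (34)-(35) of its factor, yields the factor of c*Σ + v*v'.
Cup  = lowrankupdate(C, v ./ sqrt(c))   # standard-library routine mentioned in the text
Lnew = sqrt(c) .* Cup.L                 # (35): multiply the factor by sqrt(c)

# Sanity check against re-factorizing from scratch, which would cost O(n^3)
@assert Lnew ≈ cholesky(c .* Σ + v * v').L
```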

4.2. Procedure 2 (Removing Old Samples)

In parallel with learning new samples, we also want to remove some old samples from the training set to restrict its size and accommodate more recently updated samples. Here, we assume the $M$ oldest samples of $P^{ob}$ (from No. 1 to No. $M$) are selected for removal. The adapted versions of the E-step and M-step for removing old samples are as follows:
In the E-step, the responsibilities of the old samples, $r_{ik},\ i = 1, \ldots, M$, are retrieved directly from historical computational traces.
In the M-step, the parameters should be updated using the reduced training set with $N - M$ samples.
$$\omega_k^{new} = \frac{1}{N-M} \sum_{i=M+1}^{N} r_{ik} \tag{37}$$

$$\mu_k^{new} = \frac{1}{\sum_{i=M+1}^{N} r_{ik}} \sum_{i=M+1}^{N} r_{ik} X_i \tag{38}$$

$$\Sigma_k^{new} = \frac{1}{\sum_{i=M+1}^{N} r_{ik}} \sum_{i=M+1}^{N} r_{ik}\left(X_i - \mu_k^{new}\right)\left(X_i - \mu_k^{new}\right)^{T} \tag{39}$$
As with learning new samples, the parameter update formulas can be rewritten in equivalent recursive forms to avoid computation on the unaffected training samples.
$$\omega_k^{new} = \frac{1}{N-M}\left(N\omega_k - \sum_{i=1}^{M} r_{ik}\right) \tag{40}$$

$$\mu_k^{new} = \frac{1}{\sum_{i=M+1}^{N} r_{ik}}\left(\mu_k \sum_{i=1}^{N} r_{ik} - \sum_{i=1}^{M} r_{ik} X_i\right) = \mu_k - \frac{1}{\sum_{i=M+1}^{N} r_{ik}} \sum_{i=1}^{M} r_{ik}\left(X_i - \mu_k\right) = \mu_k - \frac{1}{(N-M)\,\omega_k^{new}} \sum_{i=1}^{M} r_{ik}\left(X_i - \mu_k\right) \tag{41}$$

$$\Sigma_k^{new} = \frac{1}{\sum_{i=M+1}^{N} r_{ik}}\left[\sum_{i=1}^{N} r_{ik}\left(X_i - \mu_k^{new}\right)\left(X_i - \mu_k^{new}\right)^{T} - \sum_{i=1}^{M} r_{ik}\left(X_i - \mu_k^{new}\right)\left(X_i - \mu_k^{new}\right)^{T}\right] = \frac{N\omega_k}{(N-M)\,\omega_k^{new}}\,\Sigma_k + \frac{N\omega_k}{(N-M)\,\omega_k^{new}}\left(\mu_k - \mu_k^{new}\right)\left(\mu_k - \mu_k^{new}\right)^{T} - \frac{1}{(N-M)\,\omega_k^{new}} \sum_{i=1}^{M} r_{ik}\left(X_i - \mu_k^{new}\right)\left(X_i - \mu_k^{new}\right)^{T} \tag{42}$$
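The removal step mirrors the learning step; a corresponding sketch is given below (Julia; remove_old_samples! is an illustrative name, the stored responsibilities of the removed samples are passed in as r_old consistently with the E-step description above, and the LinearAlgebra import from the earlier sketches is assumed to be in scope).

```julia
# Remove the M oldest samples Xold (D × M), whose responsibilities r_old (M × K) were
# stored when they were learned, following the recursive forms (40)-(42).
function remove_old_samples!(w, μ, Σ, Xold, r_old, N)
    D, M = size(Xold)
    K = length(w)
    for k in 1:K
        sk = sum(r_old[:, k])
        w_new = (N * w[k] - sk) / (N - M)                                    # (40)
        μ_new = μ[k] .- (Xold .- μ[k]) * r_old[:, k] ./ ((N - M) * w_new)    # (41)
        c = (N * w[k]) / ((N - M) * w_new)
        Σ_new = c .* Σ[k] .+ c .* (μ[k] .- μ_new) * (μ[k] .- μ_new)'
        Xc = Xold .- μ_new
        Σ_new .-= (Xc * Diagonal(r_old[:, k]) * Xc') ./ ((N - M) * w_new)    # last term of (42)
        w[k], μ[k], Σ[k] = w_new, μ_new, Σ_new
    end
    return N - M
end
```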

4.3. Procedure 3 (Calibration of GMM)

Learning new samples and removing old samples can each be regarded as a single adapted E-step and M-step that updates the parameters of the GMM for renewable generation, without iterating the parameter estimation to convergence and thus without guaranteeing optimality. Consequently, suboptimality may accumulate after these incremental update procedures have been executed many times, undermining the performance and accuracy of the GMM as a probabilistic model of renewable generation.
Thus, a full-scale E-step and M-step should be executed on the whole training set (including both newly added samples and the initial samples that have not been removed) for several iterations to calibrate the parameters of the GMM. This algorithm has been described in Section 3.

4.4. General Framework for Recursive Probabilistic Model

As illustrated in the flow chart in Figure 1, we propose the following framework to maintain a recursive probabilistic model of renewable generation based on continuously updated samples by composing the three procedures discussed above.
  • When new observations of renewable generation, including forecast and actual generation, are collected, procedure 1 is invoked to extend the training set and update the parameters of the probabilistic model.
  • The size of the training set is used as the criterion to determine whether some old samples should be removed to keep the set compact. If the size exceeds the predetermined threshold, the oldest samples are discarded from the training set and procedure 2 is invoked to update the parameters of the probabilistic model accordingly.
  • When the probabilistic model of forecast errors has experienced multiple rounds of recursive updates, including learning new samples and removing old samples, its parameters should be calibrated periodically by procedure 3. The change in the training set is regarded as the triggering criterion for calibration; for example, the calibration procedure can be invoked once 5% of the samples in the training set have been replaced by new samples since the last calibration.
Remark (Analysis of computational complexity): For procedure 1 (learning new samples) and procedure 2 (removing old samples), the computational complexity of the parameter update formulas (28)–(30) and (40)–(42) is $O(MKD^2)$, where $D$ is the dimension of the data samples ($D = 2 N_R T$ in our case). The cost of the recursive update procedures depends only on the number of altered data samples and is not affected by the size of the whole training set. Thus, they can be invoked frequently to deal with streaming samples without much computational effort.
By comparison, a full-scale update of the parameters via (20)–(22) requires $O(NKD^2)$ operations. Because the altered samples represent only a small fraction of the overall training set ($M \ll N$), calibrating the model parameters is a more computationally expensive operation and should therefore be performed at a lower frequency.
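Putting the three procedures together, the framework of Figure 1 can be orchestrated as in the sketch below. It composes the illustrative helpers from the earlier sketches (learn_new_samples!, remove_old_samples!, em_sweep!, log_gauss); the numeric defaults for the size threshold follow the settings used later in Section 5.1, the 5% replacement trigger follows the description above, and the small responsibilities helper stands in for the cached computational traces mentioned in procedure 2.

```julia
# Responsibilities (19) of a batch of samples under the current parameters
# (in the paper these are retrieved from stored traces rather than recomputed).
function responsibilities(w, μ, Σ, X)
    logr = [log(w[k]) + log_gauss(X[:, i], μ[k], Σ[k]) for i in 1:size(X, 2), k in 1:length(w)]
    r = exp.(logr .- maximum(logr, dims=2))
    return r ./ sum(r, dims=2)
end

# Maintain the recursive probabilistic model on a stream of sample batches.
# N_max / N_app control procedure 2; calib_ratio triggers procedure 3.
function run_recursive_model!(w, μ, Σ, trainset, stream; N_max=20_000, N_app=16_000, calib_ratio=0.05)
    N = size(trainset, 2)
    replaced = 0                                   # samples added since the last calibration
    for Xnew in stream
        # Procedure 1: learn the new samples and append them to the training set
        N = learn_new_samples!(w, μ, Σ, Xnew, N)
        trainset = hcat(trainset, Xnew)
        replaced += size(Xnew, 2)
        # Procedure 2: discard the oldest samples once the set grows beyond N_max
        if N > N_max
            M = N - N_app
            Xold  = trainset[:, 1:M]
            r_old = responsibilities(w, μ, Σ, Xold)
            N = remove_old_samples!(w, μ, Σ, Xold, r_old, N)
            trainset = trainset[:, M+1:end]
        end
        # Procedure 3: periodic calibration once enough of the set has been replaced
        if replaced >= calib_ratio * N
            for _ in 1:5                           # a few full E/M sweeps (Section 3)
                em_sweep!(w, μ, Σ, trainset)
            end
            replaced = 0
        end
    end
    return w, μ, Σ, trainset
end
```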

5. Numerical Tests

The techno-economic WIND toolkit [40] from NREL serves as the data source for renewable generation in this section. The whole dataset contains aligned forecast and actual wind power generation data for seven years at 120,000 sites in the US. The forecast data are available at a 1 h resolution for 1 h, 4 h, 6 h, and 24 h forecast horizons, and the actual generation is available at a 5 min resolution estimated from meteorological data. The 6 h ahead forecast and actual wind power generation of 40 sites (No. 205–244) are chosen as examples to establish the complete sample set of $P^{ob}$ ($N_R = 40$, $T = 6$) with 1 h resolution over 7 years, where the actual generation is downsampled to align with the forecast data. The dimension of $P^{ob}$ is $2 \times N_R \times T = 480$. All numerical tests are implemented in Julia and executed on a laptop with an Intel i7-1360P CPU and 32 GB of RAM.
The numerical tests are separated into two parts with different emphasis on precision and computational efficiency of the proposed recursively updated probabilistic model.

5.1. Comparing Precision to Characterize Uncertainties of Renewable Generation

We designed the following steps to emulate the continuous process of collecting new samples of $P^{ob}$, updating the probabilistic model, and predicting the probability distribution of actual generation on the fly.
Step 1: The first 8760 samples of $P^{ob}$ (one entire year) are selected as the initial training set, and the EM algorithm is employed to obtain the parameters of the GMM for $P^{ob}$.
Step 2: The next $N_{test}$ observations of $P^{ob}$ are used to evaluate the performance of the probabilistic model of $P^{ob}$. For each observed forecast $P^f = p^{fcst}$, we obtain the conditional distribution of the actual generation $P^a$ from the GMM of $P^{ob}$ according to (10). The log likelihood of the corresponding actual generation $p^{actl}$ under this conditional distribution, given in (43), is calculated as the performance index (a code sketch follows the parameter settings below), because a higher likelihood means that the GMM of $P^{ob}$ characterizes the randomness of wind power more precisely when predicting actual generation from the forecast value.
$$L = \log \mathrm{PDF}_{P^{a}}\!\left(p^{actl} \mid P^{f} = p^{fcst}\right) = \log \sum_{k=1}^{K} \bar{\omega}_k\, \mathcal{N}\!\left(p^{actl} \mid \bar{\mu}_k, \bar{\Sigma}_k\right) \tag{43}$$
Step 3: The $N_{test}$ observations of $P^{ob}$ from step two are absorbed into the training set. If the total number of training samples $N$ exceeds the predetermined upper bound $N_{max}$, the oldest $N - N_{app}$ samples are discarded to reduce the size of the training set to an appropriate level $N_{app}$.
Step 4: The parameters of the GMM are updated according to the change in training samples in step three, and then we go back to step two and evaluate the performance of the probabilistic model of P o b on new observations.
Steps two to four above are executed for $N_{iter}$ iterations, and all the parameters in our numerical tests are set as follows:

$$N_{iter} = 3000, \quad N_{test} = 3, \quad N_{max} = 20{,}000, \quad N_{app} = 16{,}000$$
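Under the assumptions of the earlier sketches, the performance index (43) of step two can be evaluated as below; conditional_loglik is an illustrative name that combines condition_gmm from Section 2 with a log-sum-exp over the mixture components.

```julia
# Performance index (43): log likelihood of the realized generation p_actl under the
# conditional GMM inferred from the forecast p_fcst (reuses condition_gmm and log_gauss).
function conditional_loglik(w, μ, Σ, p_fcst, p_actl, d_f)
    wbar, mubar, Sbar = condition_gmm(w, μ, Σ, p_fcst, d_f)
    logterms = [log(wbar[k]) + log_gauss(p_actl, mubar[k], Sbar[k]) for k in eachindex(wbar)]
    m = maximum(logterms)
    return m + log(sum(exp.(logterms .- m)))     # log-sum-exp over mixture components
end
```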
The following three strategies are used to update the parameters of GMM in step four and compare their influence on the performance of the probabilistic model of P o b :
Strategy 1 (Initial parameters): This strategy does not update the parameters of the GMM and always uses the initial parameters obtained in step one.
Strategy 2 (Recursive update without calibration): This strategy updates the parameters of GMM with the proposed procedure 1 to learn new samples and procedure 2 to remove old samples but never invokes the calibration procedure.
Strategy 3 (Recursive update): Compared with the previous strategy, this strategy automatically invokes the proposed procedure 3 to calibrate the parameters of the GMM once steps two and three have been executed 50 times since the previous calibration.
We compare the accuracy of the probabilistic models via the log likelihood of the actual generation under the GMM of $P^{ob}$ calculated in step two. Each strategy collects $N_{iter} \times N_{test} = 9000$ log-likelihood results. We calculate the cumulated average log likelihood (44) for comparison. The results of the three strategies are shown in Figure 2.
$$L^{cum}_{avg}(n) = \frac{1}{n} \sum_{i=1}^{n} L_i \tag{44}$$
As shown in Figure 2, the proposed strategy 3 exhibits the highest cumulated average log likelihood among all probabilistic models. This demonstrates that the combination of recursive updates and calibration steps can effectively update the probabilistic model of renewable generation to capture its time-varying characteristics. The performance of strategy 2 is approximately equal to that of strategy 3 in the initial stages owing to the effectiveness of the recursive update, but the gap between strategies 2 and 3 grows and then stabilizes as more and more new observations of renewable generation are collected. The performance degradation of strategy 2 can be explained by its lack of a calibration step, which is essential to compensate for the suboptimality accumulated during recursive updates. Serving as a control group, strategy 1 shows substantially lower performance than the other two strategies, and its performance degrades severely as more samples of renewable generation are observed, which highlights the necessity of updating the probabilistic model based on new data samples.
Based on these comparisons of prediction precision, the proposed recursive probabilistic model has the following advantages over the other two approaches.
  • Compared with the static probabilistic model with fixed parameters, the recursive probabilistic model can adapt to new observations of renewable generation and update internal parameters of GMM dynamically. Thus, the predicted probability distribution of actual generation is more accurate and does not degrade due to outdated historical samples.
  • Compared with the recursive probabilistic model without calibration in previous studies, our model demonstrates higher long-term accuracy in the prediction of renewable generation. The improvement is mainly attributed to periodic calibration procedures to overcome the suboptimality of parameters accumulated during recursive updates.

5.2. Comparing Computational Time to Acquire Parameters of Probabilistic Model of Renewable Generation

In this section, we conduct two experiments to demonstrate the computational efficiency of the recursive probabilistic model in accommodating new observations of renewable generation, compared with a GMM tuned by the conventional full-scale EM algorithm.
In general, we assume that there are $N$ historical samples of renewable generation $P^{ob}$ and that the probabilistic model in the form of a GMM has been tuned via the EM algorithm. The task is to update the parameters of the probabilistic model given $N_{extra}$ new observations of $P^{ob}$. The conventional full-scale EM algorithm re-estimates the parameters of the GMM on the whole training set, including $N + N_{extra}$ observations, by executing the E-step and M-step in Section 3 iteratively. Meanwhile, the proposed recursive probabilistic model updates the parameters of the GMM by traversing the $N_{extra}$ new observations once, without iteration.
The first experiment fixes $N_{extra}$ at 10 and enlarges $N$ from 8760 (12 months of samples) to 20,440 (28 months of samples) in intervals of 4 months. The performance of the two approaches is shown in Table 1. The recursive probabilistic model shows only minor fluctuations in computational time, remaining below 0.13 s throughout. Notably, this efficiency persists regardless of the increasing number of existing samples $N$.
In contrast to the recursive probabilistic model, the full-scale EM algorithm exhibits a rapid escalation in total computational time as the sample size increases. This increase is influenced by two main factors: the number of iterations the EM algorithm undergoes and the duration of each iteration. The number of iterations is particularly sensitive to the initial parameters, which are often derived from clustering methods like k-means, and to the convergence criteria. Meanwhile, the time spent on each iteration is approximately proportional to the total size of the training set, because each iteration of the EM algorithm must process all the samples, making the process more time-consuming as the sample size grows.
Empirical evidence from Table 1 supports these observations. It shows that the average time spent per iteration increases about fourfold, from 4.0 s to 15.8 s, as the sample size expands from 8760 to 20,440. When the sample size reaches 20,440, the total computational time for the full EM algorithm soars to 649 s. This duration is nearly 12,000 times longer than that of the recursive probabilistic model, underscoring the significant difference in scalability and efficiency between the two methodologies.
The second experiment was conducted with a fixed $N = 11{,}680$ while varying the number of new samples $N_{extra}$ between 10 and 90. This setup was designed to elucidate the relationship between the number of new samples and the computational efficiency of the two methods. The results of the experiment are presented in Table 2.
The time consumed by the recursive update process is positively related to the quantity of processed new samples N e x t r a and grows at an approximately linear rate. This empirical finding is in harmony with our prior theoretical analysis regarding computational complexity, underscoring the predictability of the recursive update’s performance. In contrast, the iteration time for the full EM algorithm displayed a tendency to oscillate around the 10 s mark. This pattern can be attributed to the algorithm’s computational complexity being more acutely affected by minor variations in the total number of samples N + N e x t r a (ranging from 11,690 to 11,770), rather than the number of new samples.
The recursive update procedure adeptly integrates up to 90 new observations in a mere 0.34 s, demonstrating remarkable computational efficiency. This performance is especially noteworthy when juxtaposed with the full-scale EM algorithm, where the recursive update method requires less than 0.1% of the time needed by its counterpart (417 s).
The experiments conducted in this section demonstrate a significant reduction in computational time when employing the proposed recursive update algorithm, as opposed to the traditional full-scale EM algorithm. A key observation is that the degree of acceleration is influenced by the ratio of existing samples to the newly added samples, which leads to a remarkable 1000- to 10,000-fold decrease in computational time as evidenced in our experiments.
In scenarios where the number of renewable energy sources expands, the computational demands of the full EM algorithm for updating a new batch of observations could become impractically burdensome, potentially extending to several hours. In contrast, the recursive probabilistic model demonstrated the capability to accomplish similar tasks within a matter of seconds. The efficiency and agility of the recursive update algorithm highlight its potential for online probabilistic modeling, particularly in processing real-time streaming data. This aspect is especially pertinent in practical applications involving renewable energy generation, where timely and efficient data processing is paramount.

5.3. Discussion

To highlight the relative merits of different GMM updating strategies, as detailed in Table 3, we compare the proposed recursive GMM with sample elimination and calibration against two representative approaches from recent studies, as well as the conventional EM-based static GMM. The comparison focuses on the updating mechanism, treatment of historical data, calibration strategy, computational efficiency, and accuracy performance.
The conventional EM algorithm achieves high accuracy in static settings but lacks adaptability to non-stationary environments, and its computational burden increases significantly with data size. The recursive GMM with a forgetting factor [34] effectively emphasizes recent data and enables efficient online updating. However, it relies heavily on the proper selection of the forgetting parameter and does not address the long-term accuracy degradation problem. Similarly, the online GMM with a forgetting factor [38] provides rapid adaptation for renewable energy forecasting, but the absence of a calibration mechanism may lead to accumulated bias over time.
In contrast, the proposed method integrates recursive incremental updating, sample elimination, and periodic calibration, thereby maintaining computational efficiency while ensuring long-term stability. By discarding outdated samples, the model captures the most recent stochastic behavior, and the calibration step prevents cumulative errors in recursive updates. Numerical experiments confirm that this approach achieves a favorable balance between real-time adaptability and long-term accuracy, making it particularly suitable for renewable energy scenarios with significant variability.

6. Conclusions

We propose a recursively updated probabilistic model based on the GMM for renewable generation, aimed at continuously tracking time-varying uncertainties using emerging forecasted and measured data. Our approach leverages an efficient incremental learning algorithm, allowing the parameters of the probabilistic model to be autonomously adjusted by learning from new observations and discarding old ones simultaneously, with slight computational burden. Furthermore, a periodic calibration step is introduced to maintain the long-term performance of the probabilistic model after a large number of recursive updates.
In our numerical experiments conducted on empirical data of wind power, our recursive model demonstrates higher precision in prediction compared to existing recursive update strategies, which improves the average log likelihood of predictions by approximately 5–10% over long-term updates. It also shows a significant reduction in computational time when compared to conventional full-scale EM algorithms. For instance, updating parameters with 20,000 training samples and 10 new observations required only 0.13 s, whereas the conventional EM algorithm took 649 s, nearly 5000 times slower.
In practice, the significance of this study lies in its ability to cope with the inherently time-varying characteristics of renewable generation. Unlike static offline models that quickly become outdated, the proposed recursive method provides a computationally efficient and accurate tool to dynamically update probabilistic models in real time, thereby offering strong potential for application in power system operation.
The proposed method has one limitation. Each step of the incremental update is computationally efficient but cannot guarantee parameter optimality; while the periodic full calibration ensures accuracy through multiple iterations, it is computationally expensive. How to coordinate these two procedures to achieve a better trade-off between efficiency and accuracy remains an open question for future research.

Author Contributions

Conceptualization, W.L. and Y.Y.; methodology, W.L.; software, Y.Y.; validation, W.L. and C.Z.; formal analysis, H.Z. and Y.Y.; investigation, S.F. and Z.Q.; resources, W.L.; data curation, Y.Y.; writing—original draft preparation, W.L.; writing—review and editing, W.L., H.Z. and Y.Y.; visualization, H.Z.; supervision, C.Z.; project administration, S.F. and Z.Q.; funding acquisition, W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Science and Technology Project of State Grid Anhui Electric Power (No. B3120524003K).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors want to thank the editor and anonymous reviewers for their valuable suggestions for improving this paper.

Conflicts of Interest

Authors Wei Lou, Cheng Zhao, Shen Fan and Zhenbiao Qi were employed by the company State Grid Anhui Electric Power Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
GMM   Gaussian Mixture Model
EM    Expectation Maximization

References

  1. Mahdavi, M.; Jurado, F.; Schmitt, K.; Chamana, M. Electricity Generation from Cow Manure Compared to Wind and Photovoltaic Electric Power Considering Load Uncertainty and Renewable Generation Variability. IEEE Trans. Ind. Appl. 2024, 60, 3543–3553. [Google Scholar] [CrossRef]
  2. Mai, T.; Hand, M.M.; Baldwin, S.F.; Wiser, R.H.; Brinkman, G.L.; Denholm, P.; Arent, D.J.; Porro, G.; Sandor, D.; Hostick, D.J.; et al. Renewable Electricity Futures for the United States. IEEE Trans. Sustain. Energy 2014, 5, 372–378. [Google Scholar] [CrossRef]
  3. Islam, M.K.; Hassan, N.M.S.; Rasul, M.G.; Emami, K.; Chowdhury, A.A. Forecasting of Solar and Wind Resources for Power Generation. Energies 2023, 16, 6247. [Google Scholar] [CrossRef]
  4. Xiong, B.R.; Lou, L.; Meng, X.Y.; Wang, X.; Ma, H.; Wang, Z.X. Short-term wind power forecasting based on Attention Mechanism and Deep Learning. Electr. Power Syst. Res. 2022, 206, 107776. [Google Scholar] [CrossRef]
  5. Sharifzadeh, M.; Sikinioti-Lock, A.; Shah, N. Machine-learning methods for integrated renewable power generation: A comparative study of artificial neural networks, support vector regression, and Gaussian Process Regression. Renew. Sustain. Energy Rev. 2019, 108, 513–538. [Google Scholar] [CrossRef]
  6. Zhang, J.H.; Yan, J.; Infield, D.; Liu, Y.Q.; Lien, F.S. Short-term forecasting and uncertainty analysis of wind turbine power based on long short-term memory network and Gaussian mixture model. Appl. Energy 2019, 241, 229–244. [Google Scholar] [CrossRef]
  7. Singh, R.; Pal, B.C.; Jabr, R.A. Statistical Representation of Distribution System Loads Using Gaussian Mixture Model. IEEE Trans. Power Syst. 2010, 25, 29–37. [Google Scholar] [CrossRef]
  8. Wang, Z.W.; Shen, C.; Liu, F. A conditional model of wind power forecast errors and its application in scenario generation. Appl. Energy 2018, 212, 771–785. [Google Scholar] [CrossRef]
  9. Xiao, R.; Wan, C.; He, Z.; Ju, P. Probabilistic Small-Signal Stability Analysis of Power Systems with Renewable Energy: A Truncated GMM-Based Analytical Method. IEEE Trans. Power Syst. 2025, 1–14. [Google Scholar] [CrossRef]
  10. Chen, D.; Chen, Z.; Fan, H.; Xu, X.; Liu, M.; Chen, Y. Probability Evaluation Method of Available Transfer Capability Considering Source-Load Side Uncertainty. In Proceedings of the 2022 4th International Conference on Power and Energy Technology (ICPET), Qinghai, China, 28–31 July 2022; pp. 47–52. [Google Scholar]
  11. Yang, Y.; Wu, W.C.; Wang, B.; Li, M.J. Analytical Reformulation for Stochastic Unit Commitment Considering Wind Power Uncertainty with Gaussian Mixture Model. IEEE Trans. Power Syst. 2020, 35, 2769–2782. [Google Scholar] [CrossRef]
  12. Wang, Z.; Shen, C.; Liu, F.; Wu, X.; Liu, C.C.; Gao, F. Chance-Constrained Economic Dispatch with Non-Gaussian Correlated Wind Power Uncertainty. IEEE Trans. Power Syst. 2017, 32, 4880–4893. [Google Scholar] [CrossRef]
  13. Yang, Y.; Wu, W.; Wang, B.; Li, M. Chance-Constrained Economic Dispatch Considering Curtailment Strategy of Renewable Energy. IEEE Trans. Power Syst. 2021, 36, 5792–5802. [Google Scholar] [CrossRef]
  14. Xu, S.; Wu, W. Tractable Reformulation of Two-Side Chance-Constrained Economic Dispatch. IEEE Trans. Power Syst. 2022, 37, 796–799. [Google Scholar] [CrossRef]
  15. Xu, S.; Wu, W.; Yang, Y.; Lin, C.; Liu, Y. Chance-Constrained Joint Dispatch of Generation and Wind Curtailment-Load Shedding Schemes with Large-Scale Wind Power Integration. IEEE Trans. Sustain. Energy 2023, 14, 2220–2233. [Google Scholar] [CrossRef]
  16. Wang, S.; Wu, W. Aggregate Flexibility of Virtual Power Plants with Temporal Coupling Constraints. IEEE Trans. Smart Grid 2021, 12, 5043–5051. [Google Scholar] [CrossRef]
  17. Wang, S.; Wu, W.; Chen, Q.; Yu, J.; Wang, P. Stochastic Flexibility Evaluation for Virtual Power Plants by Aggregating Distributed Energy Resources. CSEE J. Power Energy Syst. 2024, 10, 988–999. [Google Scholar] [CrossRef]
  18. Wang, Z.; Shen, C.; Liu, F.; Wang, J.; Wu, X. An Adjustable Chance-Constrained Approach for Flexible Ramping Capacity Allocation. IEEE Trans. Sustain. Energy 2018, 9, 1798–1811. [Google Scholar] [CrossRef]
  19. Chang, N.; Cui, H.; Xie, W. Optimal Dispatching of Power System Considering Source-load Uncertainty and Pricing of Flexible Ramping Products. In Proceedings of the 2023 9th International Conference on Systems and Informatics (ICSAI), Changsha, China, 16–18 December 2023; pp. 1–6. [Google Scholar]
  20. Lin, Y.; Yang, M.; Wan, C.; Wang, J.; Song, Y. A Multi-Model Combination Approach for Probabilistic Wind Power Forecasting. IEEE Trans. Sustain. Energy 2019, 10, 226–237. [Google Scholar] [CrossRef]
  21. Jia, M.; Shen, C.; Wang, Z. A Distributed Probabilistic Modeling Algorithm for the Aggregated Power Forecast Error of Multiple Newly Built Wind Farms. IEEE Trans. Sustain. Energy 2019, 10, 1857–1866. [Google Scholar] [CrossRef]
  22. Przyborowski, M.; Ślęzak, D. Approximation of the expectation-maximization algorithm for Gaussian mixture models on big data. In Proceedings of the 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 17–20 December 2022; pp. 6256–6260. [Google Scholar]
  23. Liu, J.H.; Li, Z.H. Robust Expectation-Maximization-Based Secondary Voltage Control Scheme Considering Stochastic Measurement Error. IEEE Trans. Power Syst. 2023, 38, 2958–2961. [Google Scholar] [CrossRef]
  24. Watanabe, H.; Muramatsu, S.; Kikuchi, H. Interval calculation of EM algorithm for GMM parameter estimation. In Proceedings of the 2010 IEEE International Symposium on Circuits and Systems, Paris, France, 30 May–2 June 2010; pp. 2686–2689. [Google Scholar]
  25. Zeng, B.; Zhang, J.; Yang, X.; Wang, J.; Dong, J.; Zhang, Y. Integrated Planning for Transition to Low-Carbon Distribution System with Renewable Energy Generation and Demand Response. IEEE Trans. Power Syst. 2014, 29, 1153–1165. [Google Scholar] [CrossRef]
  26. Cardell, J.B.; Connors, S.R. Wind power in New England: Modeling and analysis of nondispatchable renewable energy technologies. IEEE Trans. Power Syst. 1998, 13, 710–715. [Google Scholar] [CrossRef]
  27. Shi, T.; Zhao, F.; Chen, Z.; Shi, T.; Chen, B.; Qi, C. Research on Probabilistic Production Simulation of Grid Connected Renewable Energy Generation with Flexibility Evaluation. In Proceedings of the 2023 China Automation Congress (CAC), Chongqing, China, 17–19 November 2023; pp. 7485–7490. [Google Scholar]
  28. Chaiamarit, K.; Nuchprayoon, S. Impact assessment of renewable generation on electricity demand characteristics. Renew. Sustain. Energy Rev. 2014, 39, 995–1004. [Google Scholar] [CrossRef]
  29. Jovanović, A.; Perić, Z.; Nikolić, J.; Aleksić, D. The Effect of Uniform Data Quantization on GMM-based Clustering by Means of EM Algorithm. In Proceedings of the 2021 20th International Symposium INFOTEH-JAHORINA (INFOTEH), East Sarajevo, Bosnia and Herzegovina, 17–19 March 2021; pp. 1–5. [Google Scholar]
  30. Takahiro, S.; Tatsuya, K. GMM and HMM training by aggregated EM algorithm with increased ensemble sizes for robust parameter estimation. In Proceedings of the 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, Las Vegas, NV, USA, 31 March–4 April 2008; pp. 4405–4408. [Google Scholar]
  31. Al Lawati, Y.; Kelly, J.; Stowell, D. Short-term prediction of photovoltaic power generation using Gaussian process regression. arXiv 2020, arXiv:2010.02275. [Google Scholar] [CrossRef]
  32. Pinto, R.C.; Engel, P.M. A Fast Incremental Gaussian Mixture Model. PLoS ONE 2015, 10, e0139931. [Google Scholar] [CrossRef]
  33. Engel, P.M.; Heinen, M.R. Incremental learning of multivariate gaussian mixture models. In Proceedings of the Brazilian Symposium on Artificial Intelligence, São Bernardo do Campo, Brazil, 23–28 October 2010; pp. 82–91. [Google Scholar]
  34. Zheng, J.; Wen, Q.; Song, Z. Recursive Gaussian mixture models for adaptive process monitoring. Ind. Eng. Chem. Res. 2019, 58, 6551–6561. [Google Scholar] [CrossRef]
  35. Zivkovic, Z.; van der Heijden, F. Recursive unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 2004, 26, 651–656. [Google Scholar] [CrossRef] [PubMed]
  36. Awwad Shiekh Hasan, B.; Gan, J.Q. Sequential EM for unsupervised adaptive Gaussian mixture model based classifier. In Proceedings of the International Workshop on Machine Learning and Data Mining in Pattern Recognition, Miami, FL, USA, 6 December 2009; pp. 96–106. [Google Scholar]
  37. Dai, Q.; Zhao, C. Incremental Gaussian mixture model for time-varying process monitoring. In Proceedings of the 2020 IEEE 9th Data Driven Control and Learning Systems Conference (DDCLS), Liuzhou, China, 20–22 November 2020; pp. 1305–1311. [Google Scholar] [CrossRef]
  38. Jia, M.; Shen, C.; Wang, Z. A distributed incremental update scheme for probability distribution of wind power forecast error. Int. J. Electr. Power Energy Syst. 2020, 121, 106151. [Google Scholar] [CrossRef]
  39. Golub, G.H.; Van Loan, C.F. Matrix Computations; JHU Press: Baltimore, MD, USA, 2013. [Google Scholar]
  40. Draxl, C.; Clifton, A.; Hodge, B.-M.; McCaa, J. The wind integration national dataset (wind) toolkit. Appl. Energy 2015, 151, 355–366. [Google Scholar] [CrossRef]
Figure 1. Flow chart of the proposed framework to recursively update the probabilistic model of renewable generation.
Figure 2. The cumulated average log likelihood of different strategies.
Table 1. Comparison of computational time to update parameters of GMM in Experiment 1.

| $N$ | 8760 | 11,680 | 14,600 | 17,520 | 20,440 |
| Recursive update | 0.129 s | 0.063 s | 0.129 s | 0.113 s | 0.053 s |
| EM: total time | 124 s | 176 s | 229 s | 589 s | 649 s |
| EM: iterations | 31 | 19 | 22 | 39 | 41 |
| EM: time per iteration | 4.0 s | 9.3 s | 10.4 s | 15.1 s | 15.8 s |
Table 2. Comparison of computational time to update parameters of GMM in Experiment 2.

| $N_{extra}$ | 10 | 30 | 50 | 70 | 90 |
| Recursive update | 0.051 s | 0.113 s | 0.205 s | 0.238 s | 0.340 s |
| EM: total time | 191 s | 192 s | 345 s | 374 s | 417 s |
| EM: iterations | 19 | 20 | 31 | 39 | 32 |
| EM: time per iteration | 10.1 s | 9.6 s | 11.1 s | 9.6 s | 13.0 s |
Table 3. Comparison of GMM updating methods.

| Method | Updating Mechanism | Treatment of Old Samples | Calibration | Computational Efficiency | Accuracy |
| Proposed method | Recursive incremental update with calibration | Discarded | Yes | Very high; 1000 to 10,000 times faster than full EM | High long-term accuracy |
| Recursive GMM with forgetting factor [34] | Recursive update with fixed forgetting factor λ | Down-weighted | No | High; optimized matrix inversion and determinant updates | Good accuracy, but depends on λ |
| Online GMM with forgetting [38] | Recursive Bayesian EM update | Down-weighted | No | High; incremental updates only | Good short-term accuracy, but bias accumulates |
| Traditional EM | Batch EM re-estimation | Retained | No | Low | Accurate for static data |
