Article

Sample Selection Generative Adversarial Networks for Intelligent Frequency Regulation of Microgrids

1 State Grid Sichuan Electric Power Company, Chengdu 610041, China
2 State Grid Sichuan Electric Power Research Institute, Chengdu 610041, China
3 Power System Security and Operation Key Laboratory of Sichuan, Chengdu 610041, China
4 College of Electrical and Information Engineering, Hunan University, Changsha 410082, China
* Author to whom correspondence should be addressed.
Processes 2026, 14(12), 1872; https://doi.org/10.3390/pr14121872
Submission received: 22 April 2026 / Revised: 27 May 2026 / Accepted: 8 June 2026 / Published: 9 June 2026

Abstract

Large-scale renewable energy integration introduces random power fluctuations into microgrids, increasing the difficulty of frequency regulation. To improve regulation stability and training efficiency, this article proposes sample selection generative adversarial networks (SSGANs) based on sample selection networks (SSNs), conditional generative adversarial networks (CGANs), and the actor–critic framework. First, the SSNs are trained to evaluate sample information values and prioritize informative samples for model training. Second, the CGANs learn the conditional mapping between microgrid operating states and control actions, and the pretrained generator is transferred into the actor–critic framework as the actor. Third, the actor–critic framework further optimizes the control policy online to generate real-time frequency regulation commands. The proposed method is tested on a standard two-area system and further validated on a complex four-area system. Case studies show that SSGANs achieve faster convergence and better frequency regulation performance than typical control algorithms.

1. Introduction

Large-scale renewable energy, such as wind and solar power, has been integrated into microgrids to reduce environmental pollution [1]. However, its random fluctuations increase the difficulty of frequency regulation. Meanwhile, massive and complex microgrid operational data make it difficult for intelligent control methods to efficiently select useful samples [2]. Therefore, this study aims to design an intelligent frequency regulation strategy for informative sample selection and accurate control action generation.
Automatic generation control (AGC) is widely used to maintain power balance and suppress frequency deviations in microgrids [3]. Generally, AGC consists of two main processes [4]: first, the total generation command is obtained using a control strategy; second, this command is dispatched to individual generation units [5]. However, conventional AGC strategies usually depend on fixed control structures, which makes it difficult for them to adapt to renewable energy fluctuations and changing operating conditions [6]. Therefore, reinforcement-learning-based smart generation control (SGC) has been introduced to improve the adaptability of microgrid frequency regulation [7]. For instance, Q-learning adjusts control policies through continuous interaction with uncertain operating environments, which helps enhance microgrid stability and adaptability [8]. Nevertheless, conventional reinforcement learning suffers from the curse of dimensionality as microgrid complexity increases [9].
To enhance frequency regulation under renewable energy fluctuations, particle swarm optimization has been used to train an artificial neural network for load frequency control with vehicle-to-grid integration [10]. Deep reinforcement learning methods have also been introduced into microgrid frequency regulation [11]. Deep Q-learning has been used to reduce frequency deviation through value-function approximation [12]. The deep deterministic policy gradient (DDPG) improves continuous control capability through deterministic policy learning [13]. Proximal policy optimization (PPO) enhances policy update stability by constraining policy changes [14]. The twin delayed deep deterministic policy gradient (TD3) reduces value overestimation through twin critics and delayed policy updates [15]. Soft actor–critic (SAC) improves exploration efficiency through entropy-regularized policy optimization [16]. However, these methods still rely on sufficient exploration and high-quality samples, which may reduce sample efficiency and slow convergence in complex microgrids.
To reduce the dependence on labeled samples and improve action generation, generative adversarial networks (GANs) have been combined with reinforcement learning for learning-based control [17]. GAN-based reinforcement learning can use adversarial training to enhance policy learning and decision generation [18]. Sequence generative adversarial networks (SGANs) further extend GANs to sequential decision-making and support online policy updates [19]. However, policy gradient methods used in SGANs may suffer from unstable updates in complex control environments [20]. To improve training stability, GANs have been connected with actor–critic frameworks [21]. In this framework, the generator can be used as the actor to improve action generation and strategy exploration [22]. Nevertheless, existing GAN-based actor–critic methods still have three gaps: insufficient use of informative samples from massive operational data [23], weak state-conditioned action prediction, and limited value-guided online optimization of generated actions [24].
To address the above deficiencies, this paper proposes sample selection generative adversarial networks (SSGANs) based on conditional generative adversarial networks (CGANs) [25], sample selection networks (SSNs), and the actor–critic framework [26]. Specifically, CGANs are introduced to learn the conditional mapping between microgrid operating states and control actions, thereby improving state-dependent action prediction [27]. Meanwhile, SSNs are constructed based on bidirectional long short-term memory (BiLSTM) [28] to capture temporal correlations in electricity data and select samples with high information values, which improves training efficiency. Then, the pretrained generator of CGANs is integrated into the actor–critic framework as the actor for online policy optimization. The key contributions of this article are as follows.
(1) The SSGANs introduce SSNs to evaluate sample information values and prioritize informative samples, thereby improving training efficiency.
(2) The SSGANs use CGANs to learn the state-conditioned mapping between microgrid operating states and control actions, thereby improving action generation quality.
(3) The SSGANs integrate the pretrained generator into the actor–critic framework as the actor, enabling online policy optimization for intelligent frequency regulation of microgrids.
The rest of the paper is arranged as follows. Section 2 introduces the SSGANs. Section 3 describes the simulation setup. Section 4 presents the case studies and results. Section 5 concludes the paper.

2. Principle of Sample Selection Generative Adversarial Networks

2.1. Sample Selection Networks

As more renewable energy is connected to microgrids, the power data become increasingly complex [29]. Samples with high reward values provide more useful information for training, whereas low-value samples dominate the dataset and increase the training cost. Therefore, BiLSTM-based sample selection networks (SSNs) are developed to identify high-information-value samples in the training dataset and improve training efficiency. Figure 1 shows the structure of the SSNs.
Similar to prioritized experience replay [30], the SSNs perform reward-based sample prioritization to select important samples from the training sample set $S = \{(x_i, y_i)\}$, which contains microgrid samples $x_i$ and their corresponding labels $y_i$. Meanwhile, a validation set is defined as $E = \{(x_i, y_i)\}$, where $y_i$ denotes the known label of validation sample $x_i$. The predicted validation reward is defined as

$$R(E, S, s_k) = \sum_{(x_i, y_i) \in E} \log p\left(y_i \mid x_i, s_k, S_e\right)$$
where $x_i$ denotes the input sample, $y_i$ denotes the known validation label, $s_k$ denotes the state of the SSNs after $k$ labels have been queried, and $S_e$ denotes the current labeled training subset.
During SSN training, the label of the selected input sample is queried, and the SSNs update their state from $s_{k-1}$ to $s_k$. The prediction of the SSNs is therefore determined by the selected samples and their queried labels, which enter the reward through $S_e$ and $y_i$, respectively.
The ideal training objective of the SSNs is

$$\max_{\pi}\; \mathbb{E}_{(S, E) \sim F}\left[\mathbb{E}_{\pi(S, T)}\left[\sum_{i=1}^{T} R(E, S, s_i)\right]\right]$$

where $T$ denotes the maximum number of queried labels; $(S, E) \sim F$ indicates that the training set and validation set are sampled from the distribution $F$; and $\pi(S, T)$ denotes the sequential sample selection policy over $T$ steps.
For an unlabeled candidate sample $x_j$, the true label is unavailable before annotation. Therefore, the current SSNs first predict a pseudo-label $\hat{y}_j$ for $x_j$. The pair $(x_j, \hat{y}_j)$ is then virtually added to the current labeled subset $S_e$ to estimate its potential contribution. Accordingly, the information value of the unlabeled candidate is defined as the predicted increase in the validation reward after this hypothetical update, i.e.,

$$I_j = R\left(E,\; S_e \cup \{(x_j, \hat{y}_j)\},\; s_{k+1}\right) - R\left(E,\; S_e,\; s_k\right)$$

where $I_j$ denotes the information value of candidate sample $x_j$, and $s_{k+1}$ denotes the updated SSN state after virtually including $(x_j, \hat{y}_j)$.
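The pseudo-label step can be illustrated with a toy model. The Python sketch below is a minimal illustration under explicit assumptions, not the SSN implementation: a one-dimensional binary classifier whose decision threshold is the midpoint of the class means in $S_e$ stands in for the BiLSTM, refitting that threshold plays the role of the state update $s_k \to s_{k+1}$, and the names `fit_threshold`, `validation_reward`, `information_value` and the constant `SLOPE` are hypothetical. The validation reward is a sum of log-likelihoods, and $I_j$ is the reward change after virtually adding $(x_j, \hat{y}_j)$.

```python
import math

SLOPE = 4.0  # hypothetical sharpness of the toy classifier


def fit_threshold(labeled):
    """Toy stand-in for the SSN state: midpoint of the two class means in S_e."""
    xs0 = [x for x, y in labeled if y == 0]
    xs1 = [x for x, y in labeled if y == 1]
    return 0.5 * (sum(xs0) / len(xs0) + sum(xs1) / len(xs1))


def prob_one(x, threshold):
    """p(y = 1 | x) under the toy logistic model."""
    return 1.0 / (1.0 + math.exp(-SLOPE * (x - threshold)))


def validation_reward(validation, labeled):
    """R: sum of log-likelihoods of the known validation labels."""
    threshold = fit_threshold(labeled)
    total = 0.0
    for x, y in validation:
        p1 = prob_one(x, threshold)
        total += math.log(p1 if y == 1 else 1.0 - p1)
    return total


def information_value(candidate_x, validation, labeled):
    """I_j: reward change after virtually adding (x_j, pseudo-label) to S_e."""
    threshold = fit_threshold(labeled)
    pseudo_label = 1 if prob_one(candidate_x, threshold) >= 0.5 else 0
    before = validation_reward(validation, labeled)
    after = validation_reward(validation, labeled + [(candidate_x, pseudo_label)])
    return after - before


# Hypothetical toy data: (x, y) pairs with binary labels
labeled_subset = [(0.0, 0), (0.2, 0), (1.0, 1), (1.6, 1)]
validation_set = [(0.1, 0), (0.3, 0), (0.9, 1), (1.2, 1)]
value_at_1_0 = information_value(1.0, validation_set, labeled_subset)
value_at_0_4 = information_value(0.4, validation_set, labeled_subset)
```

With these toy data, virtually adding the pseudo-labeled candidate at 1.0 moves the threshold toward the boundary preferred by the validation set, so its information value is positive, whereas the candidate at 0.4 moves the threshold away and receives a negative value. The true label of the candidate is never used.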
For a candidate batch with $N$ samples, the mean information value is calculated as

$$\bar{I} = \frac{1}{N}\sum_{j=1}^{N} I_j$$

where $N$ denotes the number of candidate samples in the batch.
The mean value $\bar{I}$ is used as the adaptive threshold for sample classification. If $I_j > \bar{I}$, candidate sample $x_j$ is regarded as a high-information-value sample; otherwise, it is regarded as a low-information-value sample.
In this way, the importance of each unlabeled candidate can be estimated before its true label is queried. Samples with larger $I_j$ are prioritized for subsequent training, thereby improving training efficiency.
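A minimal sketch of the adaptive-threshold rule, assuming the information values $I_j$ of one candidate batch are already available; the helper names and the example values are hypothetical.

```python
def split_by_mean(values):
    """Return the batch mean and the indices above / not above it."""
    mean_value = sum(values) / len(values)
    high = [j for j, v in enumerate(values) if v > mean_value]   # I_j > mean
    low = [j for j, v in enumerate(values) if v <= mean_value]   # otherwise
    return mean_value, high, low


def priority_order(values):
    """Candidate indices sorted from the largest to the smallest I_j."""
    return sorted(range(len(values)), key=lambda j: values[j], reverse=True)


info_values = [0.8, -0.2, 0.1, 0.5]   # hypothetical I_j for a batch of N = 4
mean_value, high_idx, low_idx = split_by_mean(info_values)
order = priority_order(info_values)
```

Because the threshold is recomputed from each batch, the split adapts to the scale of $I_j$ in that batch instead of relying on a fixed cut-off.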

2.2. Conditional Generative Adversarial Networks

The actor–critic framework obtains actions through policy learning. However, in complex microgrid environments, policy exploration may become inefficient and costly, making it difficult to find effective control actions [31]. To address this issue, CGANs are introduced to provide state-conditioned action prediction so that action generation is guided by the current operating state rather than relying only on exploratory policy updates.
GANs consist of a generator $G$ and a discriminator $D$, which are trained in an adversarial manner. The generator converts a latent noise vector $z$ sampled from a prior distribution into synthetic data, while the discriminator estimates whether an input sample comes from the real dataset or from $G$. During training, $D$ improves its ability to distinguish real and generated samples, whereas $G$ learns to produce samples that are difficult to identify as fake. This adversarial learning process can be described by the following minimax optimization problem:

$$\min_{G}\max_{D} V(D, G) = \mathbb{E}_{x \sim \rho_{\mathrm{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim \rho_{\mathrm{noise}}}\left[\log\left(1 - D(G(z))\right)\right]$$
The CGANs, as an extension of GANs, introduce conditional information d into both the generator and discriminator. In this study, the CGANs introduce the current microgrid state s t as the conditional input. Unlike a standard GAN that learns the marginal distribution of control actions, the CGANs learn the conditional action distribution under a given operating state. The value function of CGANs is
$$\min_{G}\max_{D} V(D, G \mid s_t) = \mathbb{E}_{a_{t+1} \sim p_{\mathrm{data}}(a \mid s_t)}\left[\log D\left(a_{t+1}, s_t\right)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D\left(G(z, s_t), s_t\right)\right)\right]$$

where $s_t$ denotes the current microgrid state, $a_{t+1}$ denotes the real historical control action, $z$ denotes the noise vector, $G(z, s_t)$ denotes the generated control action, and $D(a_{t+1}, s_t)$ denotes the probability that the state–action pair comes from the real dataset.
By conditioning both the generator and discriminator on s t , the CGANs can learn the state-dependent mapping from operating states to control actions. Therefore, compared with a standard GAN that only generates actions from noise, the CGANs can generate control actions more consistent with the current operating condition, thereby improving action forecasting accuracy. The structure of CGANs is shown in Figure 2.
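The CGAN value function can be estimated from a mini-batch of discriminator outputs. The sketch below is a minimal illustration rather than the authors' training code: it assumes that the probabilities $D(a_{t+1}, s_t)$ for real pairs and $D(G(z, s_t), s_t)$ for generated pairs have already been computed for the same batch of states (this is where the condition $s_t$ enters), and the function names and the clipping constant `EPS` are hypothetical.

```python
import math

EPS = 1e-12  # hypothetical clipping constant that keeps log() finite


def cgan_value(d_real, d_fake):
    """Mini-batch estimate of V(D, G | s_t).

    d_real: D(a_{t+1}, s_t) for real state-action pairs
    d_fake: D(G(z, s_t), s_t) for generated actions under the same states
    """
    real_term = sum(math.log(max(p, EPS)) for p in d_real) / len(d_real)
    fake_term = sum(math.log(max(1.0 - p, EPS)) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def discriminator_loss(d_real, d_fake):
    """D maximizes V, i.e. minimizes -V."""
    return -cgan_value(d_real, d_fake)


def generator_loss(d_fake):
    """G minimizes the generated-sample term of V (original minimax form).

    Many implementations use the non-saturating loss -log D(G(z, s_t), s_t)
    instead; which form is used here is not stated in the text.
    """
    return sum(math.log(max(1.0 - p, EPS)) for p in d_fake) / len(d_fake)


# At the theoretical equilibrium D outputs 0.5 everywhere and V = -2 log 2
equilibrium_value = cgan_value([0.5] * 4, [0.5] * 4)
```

A discriminator that separates real and generated pairs well obtains a lower `discriminator_loss` than one stuck at 0.5, while the generator's loss falls as its state-conditioned actions start to fool the discriminator.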

2.3. Sample Selection Generative Adversarial Networks

The SSGANs consist of three main components: SSNs, CGANs, and the actor–critic framework. The SSNs evaluate the information value of candidate samples and divide them into high-information-value samples and low-information-value samples. The selected high-information-value samples are used for CGAN pretraining. The CGANs learn the state-conditioned mapping from microgrid operating states to control actions through adversarial learning, where the generator predicts control actions and the discriminator distinguishes real and generated state–action samples. Finally, the pretrained generator is transferred to initialize the actor network, and the actor–critic framework further updates the control policy through value evaluation. The structure of the SSGANs is shown in Figure 3.
For sample selection, this paper draws on the idea of reward-based prioritization [32]. The information value of each candidate sample is predicted by the SSNs. Samples with information values higher than the batch mean are regarded as high-information-value samples in experience pool 1; otherwise, they are regarded as low-information-value samples in experience pool 2. In this way, SSGANs can prioritize informative samples for CGAN pretraining and reduce the influence of low-information-value samples on policy learning.
After the high-information-value sample set $c = \{c_0, \ldots, c_t\}$ is obtained by the SSNs, the pretrained generator is transferred to initialize the actor network. In the online stage, $G$ denotes the actor network and $C$ denotes the critic network. The discriminator used in the offline CGAN pretraining stage is not involved in the online control process. The actor $G(c_t \mid \theta_G)$ predicts the next action $a_{t+1}$ according to the current condition $c_t$, while the critic $C(c_t, a_t \mid \theta_C)$ estimates the corresponding action–value function. Therefore, the online control process follows a standard actor–critic framework.
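The critic target value is calculated as

$$y_i = r_i + \lambda\, C'\left(c_{i+1}, G'\left(c_{i+1} \mid \theta_G'\right) \mid \theta_C'\right)$$

where $\lambda$ is the discount factor, and $C'$ and $G'$ denote the target critic and target actor with parameters $\theta_C'$ and $\theta_G'$.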
The critic network is trained by minimizing the mean squared Bellman error

$$L(\theta_C) = \frac{1}{I}\sum_{i=1}^{I}\left(y_i - C\left(c_i, a_i \mid \theta_C\right)\right)^{2}$$
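Once the target networks have been evaluated, the critic target and loss reduce to simple arithmetic. The sketch below assumes that `next_q_values` already holds the target critic evaluated at the target actor's action for each next condition $c_{i+1}$; the function names are hypothetical, and the default discount of 0.99 is the value reported in Section 3.2.

```python
def critic_targets(rewards, next_q_values, discount=0.99):
    """y_i = r_i + lambda * C'(c_{i+1}, G'(c_{i+1})) for each mini-batch sample."""
    return [r + discount * q for r, q in zip(rewards, next_q_values)]


def mean_squared_bellman_error(targets, q_values):
    """L(theta_C) = (1/I) * sum_i (y_i - C(c_i, a_i))^2 over a mini-batch of size I."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / len(targets)


# Hypothetical two-sample mini-batch with a discount of 0.5 for easy arithmetic
targets = critic_targets([1.0, -0.5], [2.0, 0.0], discount=0.5)
loss = mean_squared_bellman_error(targets, [1.0, -0.5])
```

Here the targets are 2.0 and -0.5, and only the first critic estimate (1.0) misses its target, giving a loss of 0.5.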
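The actor network is optimized using the deterministic policy gradient

$$\nabla_{\theta_G} J \approx \frac{1}{I}\sum_{i=1}^{I} \nabla_a C\left(c, a \mid \theta_C\right)\Big|_{c = c_i,\, a = G(c_i)}\, \nabla_{\theta_G} G\left(c \mid \theta_G\right)\Big|_{c_i}$$

where $i$ is the mini-batch sample index and $I$ is the mini-batch size.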
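To show how the chain rule in the deterministic policy gradient is applied, the sketch below uses a linear actor $G(c \mid \theta_G) = \theta_G^{\top} c$ and a critic that is quadratic in the action, $C(c, a) = w^{\top} c + v a + u a^2$. Both models, the helper names, and the plain gradient-ascent step are hypothetical simplifications (Section 3.2 reports that Adam is used); only the structure $\nabla_a C \cdot \nabla_{\theta_G} G$ mirrors the gradient above.

```python
def actor(theta, c):
    """Hypothetical linear actor: a = G(c | theta_G) = theta_G . c."""
    return sum(t * x for t, x in zip(theta, c))


def dq_da(a, v, u):
    """dC/da for the hypothetical critic C(c, a) = w.c + v*a + u*a^2 (w drops out)."""
    return v + 2.0 * u * a


def policy_gradient(theta, batch, v, u):
    """(1/I) * sum_i dC/da at a = G(c_i), times grad_theta G(c_i) = c_i for a linear actor."""
    grad = [0.0] * len(theta)
    for c in batch:
        slope = dq_da(actor(theta, c), v, u)
        for k, x in enumerate(c):
            grad[k] += slope * x / len(batch)
    return grad


def actor_step(theta, batch, v, u, lr=0.005):
    """One plain gradient-ascent step on J (a stand-in for the Adam update)."""
    grad = policy_gradient(theta, batch, v, u)
    return [t + lr * g for t, g in zip(theta, grad)]


# With v = 1 and u = -0.5 the critic prefers a = 1, so the actor output drifts toward 1
theta = [0.0]
for _ in range(200):
    theta = actor_step(theta, [[1.0]], v=1.0, u=-0.5, lr=0.1)
```

The actor never sees a target action; it only follows the critic's slope with respect to the action, which is the essential difference from the supervised CGAN pretraining stage.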
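Finally, the target networks are updated through soft updating

$$\theta_C' \leftarrow \gamma\,\theta_C + (1 - \gamma)\,\theta_C', \qquad \theta_G' \leftarrow \gamma\,\theta_G + (1 - \gamma)\,\theta_G'$$

where $\gamma$ is the soft update coefficient, and $\theta_C'$ and $\theta_G'$ are the parameters of the target critic and target actor.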
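Element-wise, the soft update is a convex blend of online and target parameters. A minimal sketch with parameters stored as flat lists (a hypothetical representation), using $\gamma = 0.005$ from Section 3.2:

```python
def soft_update(target_params, online_params, gamma=0.005):
    """theta' <- gamma * theta + (1 - gamma) * theta', applied to every parameter."""
    return [gamma * w + (1.0 - gamma) * w_t
            for w, w_t in zip(online_params, target_params)]


# A target parameter tracking a fixed online value of 1.0
target = [0.0]
for _ in range(1000):
    target = soft_update(target, [1.0])
```

With $\gamma = 0.005$, each step closes only 0.5% of the remaining gap, so after 1000 updates about 99.3% of a constant gap has been closed; this slow tracking of the online networks is the usual motivation for soft target updates.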

2.4. Sample Selection Generative Adversarial Networks for SGC

The proposed SSGANs are applied to SGC to reduce the frequency deviation $\Delta f$ and the area control error (ACE) of microgrids. To clarify the decision-making formulation, the SGC problem is formulated as a Markov decision process $(\mathcal{S}, \mathcal{A}, \mathcal{P}, r)$. At time $t$, the state $s_t \in \mathcal{S}$ includes the measurable operating variables of the microgrid, such as frequency deviation, ACE, load disturbance, and renewable power fluctuation. The action $a_t \in \mathcal{A}$ denotes the continuous generation control command generated by the actor. The environment is the microgrid frequency response model under load and renewable energy disturbances. After receiving $a_t$, the environment returns the next state $s_{t+1}$ and the immediate reward $r_t$.
In the pretraining stage, historical operating data are collected as input samples. The SSNs are first trained to evaluate sample information values and select high-information-value samples for CGAN pretraining. The CGANs then learn the state-conditioned mapping between microgrid operating states and control actions, and the pretrained generator is transferred to initialize the actor network. After pretraining, the parameters of the SSNs are fixed and are not jointly updated with actor–critic networks.
Although offline training is more computationally stable, it cannot exploit the online learning capability of the actor–critic framework. To improve adaptability to real-time complex disturbances, an online training stage is further introduced. The pretrained generator is transferred to initialize the actor, which is then updated under the actor–critic framework. Meanwhile, the trained SSNs with fixed parameters are used to select informative samples from the real-time experience replay buffer to support policy learning and optimization.
After training, the SSGANs are deployed for online frequency control. At each control step, the current microgrid state is fed directly into the trained actor network, and the corresponding control command is generated in real time. In this scheme, the CGANs learn the state-conditioned action distribution $p(a_{t+1} \mid s_t)$, while the actor–critic framework further optimizes the generated actions through critic evaluation. The proposed method uses sample selection to improve training efficiency and adversarial pretraining to enhance action prediction, thereby supporting real-time frequency regulation of microgrids (Figure 4 and Algorithm 1).
Algorithm 1. Pseudo-code of SSGANs for SGC
1: Initialize parameters
2: for each training sample do
3: Estimate the validation reward using Equation (1)
4: Update the SSN state
5: Calculate the sample information value using Equation (3)
6: If the information value exceeds the batch mean, save the sample to experience pool 1; otherwise, save it to experience pool 2
7: end for
8: Pre-train the CGANs on experience pool 1 by Equation (6).
9: Transfer the pre-trained generator to initialize the online actor network
10: for t = 1…T do
11: Generate action $a_t$ with the actor
12: Store the transition $(s_t, a_t, r_t, s_{t+1})$ in the replay buffer
13: Calculate the target value by Equation (7)
14: Update the critic and actor using Equation (8) and Equation (9)
15: Soft-update the target networks using Equation (10)
16: end for
17: Deploy the trained actor network
18: Input the current microgrid state into the trained actor
19: Generate the control command for real-time frequency regulation

3. Simulation Setup

3.1. Evaluation and Reward Function

To evaluate the frequency regulation performance of SSGANs, Δf, ACE, integral squared error (ISE) [33], integral absolute error (IAE) [34], and integral of time-weighted absolute error (ITAE) [35] are adopted as evaluation indices. Smaller values indicate better performance.
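$$\mathrm{ISE} = \int_{0}^{\infty} \Delta f^{2}(t)\,\mathrm{d}t$$

$$\mathrm{IAE} = \int_{0}^{\infty} \left|\Delta f(t)\right|\,\mathrm{d}t$$

$$\mathrm{ITAE} = \int_{0}^{\infty} t\left|\Delta f(t)\right|\,\mathrm{d}t$$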
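In simulation, these integrals are evaluated on a sampled $\Delta f$ trace over a finite horizon. The sketch below assumes uniform sampling with step `dt` and uses the trapezoidal rule; both the quadrature rule and the helper names are assumptions rather than details given in the paper.

```python
def trapezoid(values, dt):
    """Trapezoidal integral of uniformly sampled values with step dt."""
    if len(values) < 2:
        return 0.0
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def error_indices(delta_f, dt):
    """Finite-horizon ISE, IAE and ITAE of a sampled frequency deviation."""
    times = [k * dt for k in range(len(delta_f))]
    ise = trapezoid([e * e for e in delta_f], dt)
    iae = trapezoid([abs(e) for e in delta_f], dt)
    itae = trapezoid([t * abs(e) for t, e in zip(times, delta_f)], dt)
    return ise, iae, itae


# Hypothetical trace: a constant 0.1 Hz deviation sampled every 0.5 s for 2 s
ise, iae, itae = error_indices([0.1] * 5, dt=0.5)
```

Because ITAE multiplies the error by time, a deviation that persists late in the record is penalized more heavily than the same deviation shortly after the disturbance, so ITAE emphasizes slow recovery.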
To guide the controller toward stable frequency regulation, the reward function used by the critic network is constructed from Δf and ACE only. Weighted squared terms penalize frequency fluctuation and tie-line power imbalance:
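$$r = -\frac{1}{1000}\sum_{k=1}^{K}\left[\eta_k\left(\Delta f_k\right)^{2} + \left(1 - \eta_k\right)\mathrm{ACE}_k^{2}\right]$$

where $\eta_k$ and $1 - \eta_k$ are the weight coefficients of $\Delta f_k$ and $\mathrm{ACE}_k$ in the $k$-th area, respectively, and $K$ is the number of areas. $\eta_k$ is set to 0.5 for all areas to balance $\Delta f_k$ and $\mathrm{ACE}_k$.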
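A minimal sketch of this reward, assuming one $\Delta f_k$ and one $\mathrm{ACE}_k$ value per area at each step and reading the factor 1/1000 as scaling the whole weighted sum; the function name and the example values are hypothetical.

```python
def reward(delta_f, ace, eta=None):
    """r = -(1/1000) * sum_k [eta_k * df_k^2 + (1 - eta_k) * ACE_k^2], one entry per area."""
    if eta is None:
        eta = [0.5] * len(delta_f)  # eta_k = 0.5 for all areas, as in the text
    penalty = sum(e * df ** 2 + (1.0 - e) * a ** 2
                  for e, df, a in zip(eta, delta_f, ace))
    return -penalty / 1000.0


# Hypothetical two-area sample: only area 1 is disturbed
two_area_reward = reward(delta_f=[2.0, 0.0], ace=[4.0, 0.0])
```

The reward is zero only when every area has zero frequency deviation and zero ACE, and it becomes more negative quadratically as either grows, so maximizing the return drives both toward zero.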
In addition, the precision, loss, and root mean square error (RMSE) are applied to evaluate the training process of SSNs and CGANs.
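$$\mathrm{Precision} = \frac{N_{\mathrm{correct}}}{N_{\mathrm{total}}}$$

$$\mathrm{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + \left(1 - y_i\right)\log\left(1 - \hat{y}_i\right)\right]$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(z_i - \hat{z}_i\right)^{2}}$$

where $N_{\mathrm{correct}}$ is the number of correctly classified samples; $N_{\mathrm{total}}$ is the total number of samples; $N$ is the number of evaluated samples; $y_i$ is the true label; $\hat{y}_i$ is the predicted probability; $z_i$ is the real value; and $\hat{z}_i$ is the predicted value.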
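These training metrics can be computed directly from model outputs. The sketch below is a minimal, framework-free version with hypothetical function names; clipping the probabilities before taking logarithms is a common safeguard that is not mentioned in the paper.

```python
import math


def precision(n_correct, n_total):
    """Share of correctly classified samples (the ratio usually called accuracy)."""
    return n_correct / n_total


def bce_loss(labels, probs, eps=1e-12):
    """Binary cross-entropy; probabilities are clipped to keep log() finite."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


def rmse(actual, predicted):
    """Root mean square error between real and predicted values."""
    squared = [(z - z_hat) ** 2 for z, z_hat in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

For example, a classifier that outputs 0.5 for every sample has a cross-entropy of $\ln 2 \approx 0.693$ regardless of the labels, which is a useful reference level when reading the SSN loss curve.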

3.2. Parameter Setting

Table 1 lists the parameters of the SSGANs. These hyperparameters were selected based on empirical tuning and manual adjustment to balance prediction accuracy, training stability, and computational efficiency.
For the SSNs, the BiLSTM consists of two LSTM layers with dimensions of 4-90-90-2. The hidden size of 90 is selected to capture temporal correlations in microgrid operating data without excessive computational cost. The dropout factor is set to 0.5 to avoid overfitting, and the learning rate is set to 0.01 to accelerate sample-selection training.
For the CGANs, the generator and discriminator are DNNs with dimensions of 4-90-90-4 and 4-90-90-2, respectively. The generator output dimension corresponds to continuous control actions, and the discriminator output dimension corresponds to real/fake sample discrimination. Tanh is used in the generator output layer to bound the generated control actions, while Sigmoid is used in the discriminator output layer. ReLU is used in the hidden layers, and batch normalization [36] is introduced to alleviate undesirable initialization and improve adversarial training stability.
For the actor–critic framework, the pretrained generator is transferred as the actor, and the critic is a DNN with dimensions of 4-90-90-1 to estimate the action value. The learning rate of both the actor and critic is set to 0.005 to maintain balanced updates. Adam is used as the optimizer, the discount factor is set to 0.99, the soft update coefficient is set to 0.005, the replay buffer size is set to $1 \times 10^{6}$, the batch size is set to 128, and the total training steps are set to 10,000. These settings are kept consistent with the comparison algorithms to ensure a fair evaluation.
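For reproduction, the settings listed in this section can be collected in a single configuration object. The dataclass below is a hypothetical container whose values are transcribed from the text; the helper `dense_parameter_count` assumes fully connected layers with biases and ignores batch-normalization parameters, so it applies to the generator, discriminator, and critic but not to the BiLSTM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSGANConfig:
    """Hypothetical container for the settings reported in this section."""
    # SSNs (BiLSTM)
    ssn_layers: tuple = (4, 90, 90, 2)
    ssn_dropout: float = 0.5
    ssn_learning_rate: float = 0.01
    # CGANs (fully connected)
    generator_layers: tuple = (4, 90, 90, 4)
    discriminator_layers: tuple = (4, 90, 90, 2)
    # Actor-critic (the actor reuses the generator layout)
    critic_layers: tuple = (4, 90, 90, 1)
    actor_learning_rate: float = 0.005
    critic_learning_rate: float = 0.005
    discount: float = 0.99          # lambda in the critic target
    soft_update: float = 0.005      # gamma in the target-network update
    replay_buffer_size: int = 1_000_000
    batch_size: int = 128
    total_steps: int = 10_000


def dense_parameter_count(layers):
    """Weights plus biases of a fully connected stack (batch-norm terms ignored)."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))


config = SSGANConfig()
critic_size = dense_parameter_count(config.critic_layers)
```

Under these assumptions, the critic has 8731 trainable weights and biases and the generator has 9004, which illustrates why inference stays cheap relative to training.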
Five representative methods of modern deep reinforcement learning are introduced as the comparison algorithms: DDPG, TD3, SAC, PPO, and actor–critic GAN [18]. Consistent with the original experimental setup, all algorithms were optimized by a genetic algorithm with the same population size of 30 and the same number of iterations (50). In addition, all methods were evaluated under the same computational budget and the same architecture family to ensure a fair comparison. The detailed parameter settings of the compared algorithms are provided in Table 2 and Table 3.

4. Case Studies

All algorithms are evaluated in the IEEE two-area system (Case I) [37] and the China Southern Power Grid (Case II) [38]. Case I serves as the benchmark for testing the SSGANs, and Case II is used to further validate their performance.

4.1. Case I

Case I is based on a standard two-area frequency control system. Each area contains conventional generation units represented by governor and turbine dynamics, as shown in Figure 5. To simulate renewable energy integration, wind and solar power units are also incorporated into the system. The main parameters of Case I are listed in Table 4.

4.1.1. Pretraining of SSNs and CGANs

In the pretraining stage, historical state–action data from Case I, generated by a conventional PID controller, are used as training samples. The SSNs evaluate sample information values based on historical rewards and select high-information-value samples for CGAN pretraining. Then, the CGANs learn the state-conditioned mapping from operating states to generation control actions, and the pretrained generator provides initialization for the online actor network.
In the SSN training process, Figure 6a shows that the precision rises quickly and becomes stable after approximately $4 \times 10^{4}$ iterations, while the loss decreases to a low level. This indicates that the SSNs can effectively learn the sample selection rule. Figure 7a presents the reward distribution of the training samples. The reward values calculated by Equation (13) are shifted by subtracting the mean value and normalized into [−1, 1]. Samples with normalized rewards greater than 0 are regarded as high-information-value samples for CGAN pretraining. For the CGANs, Figure 6b shows that the RMSE decreases rapidly and converges after approximately $4 \times 10^{4}$ iterations, indicating stable prediction performance. As shown in Figure 7b, the predicted generation power closely follows the observed power. Therefore, the pretrained generator can learn the mapping from microgrid operating states to generation control actions and provide initialization for the online actor network.
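The shift-and-normalize step can be sketched as follows. How the centered rewards are mapped into [−1, 1] is not specified; dividing by the largest absolute deviation, as done here, is an assumption, and the helper names and reward values are hypothetical.

```python
def normalize_rewards(rewards):
    """Subtract the batch mean, then divide by the largest absolute deviation."""
    mean_reward = sum(rewards) / len(rewards)
    centered = [r - mean_reward for r in rewards]
    scale = max(abs(c) for c in centered) or 1.0  # all-equal rewards stay at 0
    return [c / scale for c in centered]


def high_value_indices(rewards):
    """Indices whose normalized reward is greater than 0."""
    return [i for i, r in enumerate(normalize_rewards(rewards)) if r > 0]


batch_rewards = [-0.4, -0.1, -0.3, -0.2]   # hypothetical (negative) rewards
normalized = normalize_rewards(batch_rewards)
selected = high_value_indices(batch_rewards)
```

Since any positive scaling preserves sign, a normalized reward is greater than 0 exactly when the raw reward is above the batch mean; the scaling only affects how the distribution is plotted, not which samples are selected.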

4.1.2. Online Training of SSGANs

Online training data are collected from state–action interactions in Case I. The CGAN generator initializes the actor, while the critic estimates action values. To test dynamic learning, a sawtooth load disturbance with 50 dB Gaussian white noise is added to Case I, as shown in Figure 8.
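A sketch of this kind of test disturbance is shown below. It assumes that "50 dB" is the signal-to-noise ratio of the added noise relative to the measured power of the sawtooth (similar to MATLAB's `awgn` with the `'measured'` option); the period, amplitude, and helper names are hypothetical.

```python
import math
import random


def sawtooth_load(n_steps, period, amplitude):
    """Hypothetical sawtooth that ramps from 0 toward `amplitude` every `period` steps."""
    return [amplitude * (k % period) / period for k in range(n_steps)]


def add_white_noise(signal, snr_db, seed=0):
    """Add zero-mean Gaussian noise so that P_signal / P_noise = 10^(snr_db / 10)."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_std = math.sqrt(signal_power / 10 ** (snr_db / 10))
    return [x + rng.gauss(0.0, noise_std) for x in signal]


clean = sawtooth_load(n_steps=2000, period=200, amplitude=0.1)
noisy = add_white_noise(clean, snr_db=50)
```

At 50 dB the noise power is $10^{-5}$ of the signal power, so the disturbance keeps its sawtooth shape while still exposing the controller to small random fluctuations.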
Figure 9a shows the frequency deviation responses of different algorithms. DDPG has the largest fluctuation and slowest recovery, while PPO improves the response but still shows relatively large deviations. TD3 and SAC achieve similar and better control effects. The actor–critic GAN further reduces frequency deviations through adversarial pretraining. In comparison, SSGANs show the smallest deviation and fastest recovery, indicating better online learning and regulation ability. This is because the SSNs prioritize high-information-value samples to improve training sample quality, the CGANs provide state-conditioned action prediction to further accelerate policy exploration, and the actor–critic framework updates the control policy in real time, enabling SSGANs to generate more effective control commands for frequency regulation. Figure 9b further shows that SSGANs have the most concentrated frequency deviation distribution around zero, indicating the best regulation stability and online control performance.

4.1.3. Online Operation

To evaluate the online operation performance, a residential load profile is introduced as the load disturbance, as shown in Figure 10a. Meanwhile, wind and solar power fluctuations are considered as stochastic renewable energy disturbances, as shown in Figure 10b.
Compared with the other algorithms, SSGANs can prioritize high-information-value samples and provide better training data for policy learning. Therefore, their frequency deviation and ACE curves are smoother and converge faster, as shown in Figure 11. This indicates that the sample selection mechanism can reduce the influence of low-information samples, while CGAN-based action prediction improves the initial control quality. Table 5 and Figure 12 further show that SSGANs reduce the average frequency deviation by 44.64–71.84% and the average ACE by 14.52–49.93%. Moreover, SSGANs achieve lower ISE, IAE, and ITAE than the compared algorithms, with minimum reductions of 39.82%, 27.73%, and 27.50%, respectively. These results demonstrate that SSGANs can effectively improve frequency quality, reduce accumulated control errors, and achieve better online operation performance.

4.2. Case II

Case II is a four-area interconnected power system. Compared with Case I, Case II has a larger system scale, stronger inter-area coupling, and more complex frequency regulation requirements. Therefore, it is used to further evaluate the reliability and adaptability of SSGANs in complex multi-area scenarios, as shown in Figure 13. The system parameters of Case II are listed in Table 6.
Compared with other algorithms, SSGANs show smaller frequency deviation and ACE fluctuations in Area 1, as shown in Figure 14. The frequency deviation converges to a narrower range around zero, and the ACE response is smoother, indicating better suppression of frequency oscillation and inter-area power imbalance. This improvement is mainly attributed to the high-information-value sample selection of SSNs, the state-conditioned action prediction of CGANs, and the online policy update of the actor–critic framework. Table 7 and Figure 15 further show that SSGANs outperform the compared algorithms in most evaluation indices. In Area 1, SSGANs reduce the average frequency deviation by 25.455–40.580% and the average ACE by 9.207–22.469%, while ISE, IAE, and ITAE are reduced by at least 11.152%, 10.146%, and 7.556%, respectively. Similar improvements are observed in Areas 2–4, indicating that although SSGANs do not achieve the best result on a few individual indices, they still outperform the compared methods on most evaluation indices across the multi-area system.
The above results indicate that (i) the SSGANs can stably control more generator units and renewable energy units; and (ii) the SSGANs are highly adaptive and robust in complicated multi-area systems.

4.3. Discussion

To further evaluate the effectiveness of the proposed SSGANs, this section presents additional analyses, including ablation studies, dynamic response performance, communication delay analysis, statistical significance analysis, and a discussion of limitations. All experiments in this section are conducted in Area 1 of Case II.

4.3.1. Ablation Studies

The complete SSGANs achieve the best performance across all evaluation indices (Table 8). Removing SSNs increases Δf by 24.4% and ACE by 9.9%, which demonstrates that the sample selection mechanism effectively prioritizes high-information-value samples and improves training efficiency. Removing CGAN pretraining leads to a larger degradation, with Δf increasing by 31.7% and ACE by 19.1%, indicating that the pretrained generator provides a superior initialization for the actor network compared to random initialization. The standard actor–critic shows the worst performance across all indices, confirming that both components contribute meaningfully to the overall performance. Regarding computational cost, SSGANs require the longest training time (2.86 h) due to the additional SSN training and CGAN pretraining stages. However, the inference time of all methods remains comparable (approximately 0.5 s), which satisfies the real-time requirement of microgrid frequency regulation.

4.3.2. Dynamic Response Performance

To further evaluate the dynamic response performance, settling time and frequency overshoot are introduced as supplementary indices, where settling time reflects the recovery speed after disturbance, and frequency overshoot reflects the maximum frequency deviation. As shown in Table 9, SSGANs achieve the best performance among all compared algorithms, with the shortest settling time of 486 s and the smallest frequency overshoot of 0.084 Hz. Compared with DDPG and SAC, the settling time of SSGANs is lower by 49.64% and 31.36%, respectively, and the frequency overshoot is lower by 42.47% and 25.66%, respectively. These results indicate that SSGANs can suppress the maximum frequency deviation more effectively and restore the system frequency to the steady-state range more rapidly after disturbance, demonstrating better dynamic regulation capability than standard DRL methods.
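Both indices can be read off a sampled $\Delta f$ trace. In the sketch below, the tolerance band that defines "settled" is a hypothetical parameter because its value is not reported, and the overshoot is taken, as in the text, to be the maximum absolute frequency deviation; the trace values are illustrative only.

```python
def frequency_overshoot(delta_f):
    """Maximum absolute frequency deviation, as defined in the text."""
    return max(abs(e) for e in delta_f)


def settling_time(delta_f, dt, band):
    """Time after which |delta_f| stays within +/- band until the end of the record.

    Returns 0.0 if the trace never leaves the band and len(delta_f) * dt if
    the last sample is still outside it (not settled within the record).
    """
    last_outside = -1
    for k, e in enumerate(delta_f):
        if abs(e) > band:
            last_outside = k
    return (last_outside + 1) * dt


trace = [0.0, 0.08, -0.05, 0.02, 0.01, 0.005]   # hypothetical delta_f samples in Hz
overshoot = frequency_overshoot(trace)
t_settle = settling_time(trace, dt=1.0, band=0.03)
```

Scanning for the last excursion outside the band, rather than the first entry into it, ensures that a trace which re-enters and later leaves the band again is not reported as settled too early.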

4.3.3. Communication Delay Analysis

In practical interconnected microgrids, communication delay affects the transmission of measured states and control commands, thereby weakening the timeliness of frequency regulation. As shown in Table 10, the control performance of all algorithms degrades as the delay increases from 10 ms to 30 ms. For SSGANs, the average frequency deviation and average area control error increase by 40.82% and 34.57%, respectively. However, SSGANs still maintain the lowest values under each delay condition. Under the 30 ms delay, compared with DDPG and SAC, SSGANs reduce the average frequency deviation by 38.39% and 27.37%, respectively, indicating that SSGANs are less affected by communication delay.
Frequency regulation mileage [8] is introduced as an auxiliary economic indicator, which is calculated by accumulating the absolute variations in adjacent control commands during the regulation process. A smaller regulation mileage indicates smoother control actions, lower regulation burden, and lower potential execution cost. As shown in Table 10, SSGANs achieve the lowest regulation mileage under different delay conditions. Under the 30 ms delay, the regulation mileage of SSGANs is 40.59% lower than DDPG and 26.99% lower than SAC, showing better regulation economy under delayed communication.
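A minimal sketch of the mileage calculation and of the relative reduction used to compare methods; the function names and command sequences are hypothetical.

```python
def regulation_mileage(commands):
    """Accumulated absolute change between adjacent control commands."""
    return sum(abs(curr - prev) for prev, curr in zip(commands, commands[1:]))


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return (baseline - proposed) / baseline * 100.0


smooth = regulation_mileage([0.0, 0.1, 0.2, 0.3])   # hypothetical command sequences
jumpy = regulation_mileage([0.0, 0.3, 0.0, 0.3])
```

Both hypothetical sequences end at the same command, but the oscillating one accumulates three times the mileage, which is why mileage reflects the regulation burden rather than the net change in output.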

4.3.4. Statistical Significance Analysis

To further verify whether the performance gains of SSGANs are statistically significant rather than caused by random initialization, paired significance analysis was conducted on the two main control indices (i.e., Δf and ACE). Specifically, all compared methods were independently run 20 times under different random seeds, and the results obtained under the same seed were paired for comparison. For each method, the performance difference relative to SSGANs was evaluated by paired t-tests and Wilcoxon signed-rank tests. In addition, 95% bootstrap confidence intervals of the mean differences were calculated to further quantify the uncertainty of the observed improvement. The statistical results are summarized in Table 11.
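The paired procedure can be sketched with the Python standard library alone. This is a minimal sketch under assumptions. Differences are taken as baseline minus SSGANs for each seed, so positive values favor SSGANs, matching the sign convention of Table 11. The interval is a simple percentile bootstrap, and the resample count and seed are hypothetical choices. The p-values themselves would normally come from `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon`, which are not reimplemented here.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def paired_differences(baseline: Sequence[float], proposed: Sequence[float]) -> List[float]:
    """Per-seed differences baseline - proposed; positive means proposed is lower."""
    if len(baseline) != len(proposed):
        raise ValueError("runs must be paired one-to-one by random seed")
    return [b - p for b, p in zip(baseline, proposed)]


def paired_t_statistic(diffs: Sequence[float]) -> float:
    """t = mean(d) / (s_d / sqrt(n)); compare with a t distribution with n - 1 dof."""
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / n ** 0.5)


def bootstrap_ci(
    diffs: Sequence[float], level: float = 0.95, n_resamples: int = 10_000, seed: int = 0
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the mean paired difference."""
    rng = random.Random(seed)
    data = list(diffs)
    n = len(data)
    # Resample the paired differences with replacement and record each resample mean.
    means = sorted(statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples))
    alpha = (1.0 - level) / 2.0
    lower = means[int(round(alpha * n_resamples))]
    upper = means[int(round((1.0 - alpha) * n_resamples)) - 1]
    return lower, upper


def excludes_zero(ci: Tuple[float, float]) -> bool:
    """True when the interval lies entirely on one side of zero."""
    return ci[0] > 0.0 or ci[1] < 0.0
```

Resampling the per-seed differences, rather than each method's runs separately, keeps the pairing by seed that the tests rely on.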
As shown in Table 11, the mean differences of all compared methods relative to SSGANs are positive for both Δf and ACE, which indicates that SSGANs achieve lower frequency deviation and lower area control error on average. Meanwhile, the p-values of both the paired t-test and the Wilcoxon signed-rank test are below 0.05 for all comparisons, and the corresponding 95% bootstrap confidence intervals do not include zero. These results indicate that the superiority of SSGANs over DDPG, TD3, SAC, PPO, and actor–critic GAN is statistically significant on the two main control indices. In particular, although actor–critic GAN already benefits from adversarial pretraining, SSGANs still achieve significant improvements over it. This supports the conclusion that the SSN-based sample selection mechanism provides performance gains beyond adversarial actor initialization alone.

4.3.5. Discussion of Limitations

Although the proposed SSGANs demonstrate superior performance in the case studies, several limitations should be acknowledged.
(1)
The training time of SSGANs is longer than that of standard deep RL methods due to the additional SSN training and CGAN pretraining stages. In applications where rapid deployment is required, this additional training overhead may become a constraint.
(2)
The current validation is based on simulation models, and experimental verification on physical microgrid platforms is required to further confirm the practical applicability.
(3)
The pretraining data is generated by a conventional PID controller, which may limit the diversity and quality of the initial training samples. Exploring more diverse data sources for pretraining could further improve the performance of SSGANs.
(4)
The current work mainly focuses on simulation-based control performance, and a theoretical stability analysis of the closed-loop system is not provided.

5. Conclusions

This paper proposes SSGANs for intelligent frequency regulation of microgrids by combining SSNs, CGANs, and the actor–critic framework. The main conclusions are as follows.
(1)
SSNs can evaluate sample information values and select informative samples, thereby improving sample utilization and training efficiency.
(2)
CGANs can learn the conditional mapping from operating states to control actions, which improves action generation quality and reduces inefficient exploration.
(3)
By transferring the pretrained CGAN generator into the actor–critic framework, SSGANs achieve online policy optimization. Case studies show that SSGANs obtain smaller frequency deviation, lower ACE, and better dynamic response performance than the compared algorithms.
In future work, the business model and operating cost of microgrids can be analyzed jointly with SSGANs to obtain an improved control scheme. In addition, the network architecture of SSGANs could be refined to improve control performance, and more advanced GAN training stabilization techniques, such as Wasserstein loss and gradient penalty, could be explored. More detailed economic cost modeling, including operating cost, device degradation, and actuator wear, can also be considered. Finally, a theoretical stability analysis of the closed-loop system remains to be carried out.

Author Contributions

Conceptualization, X.Y., X.O. and R.C.; methodology, X.O. and R.C.; software, K.Y. and X.O.; validation, K.Y., X.W. and T.Z.; formal analysis, K.Y. and X.O.; investigation, X.W. and T.Z.; resources, X.Y. and K.Y.; data curation, X.W. and T.Z.; writing—original draft preparation, X.O. and R.C.; writing—review and editing, X.Y., K.Y. and B.C.; visualization, X.O. and T.Z.; supervision, X.Y., K.Y. and B.C.; project administration, X.Y., K.Y. and B.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the Science and Technology Project of State Grid Sichuan Electric Power Company (Project Name: Research on Key Technologies for Dynamic Assessment and Enhancement of Frequency Support Strength of Hydro-Wind-Solar Coupled System; Project No. 52199723003H).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Authors Xi Ye, Xuetong Ouyang, Baorui Chen, Xi Wang, and Tong Zhu were employed by the company State Grid Sichuan Electric Power Company. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ACE: Area control error
AGC: Automatic generation control
BiLSTM: Bidirectional long short-term memory
BN: Batch normalization
CGANs: Conditional generative adversarial networks
DDPG: Deep deterministic policy gradient
DNNs: Deep neural networks
GANs: Generative adversarial networks
IAE: Integral absolute error
ISE: Integral squared error
ITAE: Integral of time-multiplied absolute error
PPO: Proximal policy optimization
RMSE: Root mean square error
SAC: Soft actor–critic
SGC: Smart generation control
SSGANs: Sample selection generative adversarial networks
SSNs: Sample selection networks
TD3: Twin delayed deep deterministic policy gradient

References

  1. Zhang, Y.; Lu, T.; Zhang, L.; Mei, Y.; Guo, Y.; Wu, S.; Wu, Q.; Xu, Y. Low-carbon optimal dispatching of rural multi-energy microgrid system based on multi-energy conversion and agricultural load demands. Energy 2026, 344, 140024.
  2. Zheng, J.; Zhai, L.; Tao, M.; Tang, W.; Li, Z. Low-Carbon Economic Dispatch in Integrated Energy Systems: A Set-Based Interval Optimization with Decision Support Under Uncertainties. Prot. Control Mod. Power Syst. 2025, 11, 68–87.
  3. Li, Z.; Cheng, Z.; Wang, Y.; Sui, Q. Distributed event-triggered fixed-time SMC-based AGC for power systems with heterogeneous frequency regulation units. IEEE Trans. Ind. Inform. 2024, 20, 8031–8043.
  4. Muduli, R.; Jena, D.; Moger, T. Application of reinforcement learning-based adaptive PID controller for automatic generation control of multi-area power system. IEEE Trans. Autom. Sci. Eng. 2025, 22, 1057–1068.
  5. Çetin, G.; Özkaraca, O.; Keçebaş, A. Development of PID based control strategy in maximum exergy efficiency of a geothermal power plant. Renew. Sustain. Energy Rev. 2021, 137, 110623.
  6. Wang, N.; Hao, F. Event-triggered sliding mode control with adaptive neural networks for uncertain nonlinear systems. Neurocomputing 2021, 436, 184–197.
  7. Xi, L.; Chen, J.F.; Huang, Y.H.; Xu, Y.C.; Liu, L.; Zhou, Y.M.; Li, Y. Smart generation control based on multi-agent reinforcement learning with the idea of the time tunnel. Energy 2018, 153, 977–987.
  8. Yin, L.; He, X. Artificial emotional deep Q learning for real-time smart voltage control of cyber-physical social power systems. Energy 2023, 273, 127232.
  9. Perera, A.T.D.; Kamalaruban, P. Applications of reinforcement learning in energy systems. Renew. Sustain. Energy Rev. 2021, 137, 110618.
  10. Irfan, M.; Deilami, S.; Huang, S.J.; Tahir, T.; Veettil, B.P. Optimizing load frequency control in microgrid with vehicle-to-grid integration in Australia: Based on an enhanced control approach. Appl. Energy 2024, 366, 123317.
  11. Tao, X.; Hafid, A.S. DeepSensing: A novel mobile crowdsensing framework with double deep Q-network and prioritized experience replay. IEEE Internet Things J. 2020, 7, 11547–11558.
  12. Yin, L.F.; Luo, S.K.; Ma, C.X. Expandable depth and width adaptive dynamic programming for economic smart generation control of smart grids. Energy 2021, 232, 120964.
  13. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
  14. Fujimoto, S.; van Hoof, H.; Meger, D. Addressing Function Approximation Error in Actor-Critic Methods. In Proceedings of the 35th International Conference on Machine Learning, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 1587–1596.
  15. Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the 35th International Conference on Machine Learning, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 1861–1870.
  16. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
  17. Fu, X.; Zhang, C.; Zhang, X.; Sun, H. A novel GAN architecture reconstructed using Bi-LSTM and style transfer for PV temporal dynamics simulation. IEEE Trans. Sustain. Energy 2024, 15, 2826–2829.
  18. Han, K.; Yang, K.; Yin, L. Lightweight actor-critic generative adversarial networks for real-time smart generation control of microgrids. Appl. Energy 2022, 317, 119163.
  19. Han, C.; Gim, G. Time-series-based anomaly detection in industrial control systems using generative adversarial networks. Processes 2025, 13, 2885.
  20. Li, H.; Misra, S. Reinforcement learning based automated history matching for improved hydrocarbon production forecast. Appl. Energy 2020, 284, 116311.
  21. Pfau, D.; Vinyals, O. Connecting generative adversarial networks and actor-critic methods. arXiv 2016, arXiv:1610.01945.
  22. Peng, B.; Li, X.; Gao, J.; Liu, J.; Wong, K.F. Adversarial advantage actor-critic model for task-completion dialogue policy learning. arXiv 2017, arXiv:1710.11277.
  23. Ye, Y.; Qiu, D.; Sun, M.; Papadaskalopoulos, D.; Strbac, G. Deep reinforcement learning for strategic bidding in electricity markets. IEEE Trans. Smart Grid 2020, 11, 1343–1355.
  24. Kumar, R.; De, M. Advancement in power system resilience through deep reinforcement learning: A comprehensive review. Renew. Sustain. Energy Rev. 2025, 222, 115951.
  25. Sadoughi, N.; Busso, C. Speech-driven expressive talking lips with conditional sequential generative adversarial networks. IEEE Trans. Affect. Comput. 2021, 12, 1031–1044.
  26. Dhawas, P.V.; Bedekar, P.; Nandankar, P.V.; Vaidya, M.G. Localization of HIFS in primary distribution networks using voltage and current sequence components. Expert Syst. Appl. 2024, 242, 122428.
  27. Vinci, C. Predicting auction price of vehicle license plate with deep recurrent neural network. Expert Syst. Appl. 2020, 142, 113008.
  28. Michael, N.E.; Bansal, R.C.; Ismail, A.A.A.; Elnady, A.; Hasan, S. A cohesive structure of bi-directional long-short-term memory (BiLSTM)-GRU for predicting hourly solar radiation. Renew. Energy 2024, 222, 119943.
  29. Ding, Y.F.; Chen, Z.J.; Zhang, H.W.; Wang, X.; Guo, Y. A short-term wind power prediction model based on CEEMD and WOA-KELM. Renew. Energy 2022, 189, 188–198.
  30. Schaul, T.; Quan, J.; Antonoglou, I.; Silver, D. Prioritized experience replay. arXiv 2016.
  31. Théate, T.; Ernst, D. An application of deep reinforcement learning to algorithmic trading. Expert Syst. Appl. 2021, 173, 114632.
  32. Xi, L.; Yu, L.; Xu, Y.; Wang, S.X.; Chen, X. A novel multi-agent DDQN-AD method-based distributed strategy for automatic generation control of integrated energy systems. IEEE Trans. Sustain. Energy 2020, 11, 2417–2426.
  33. Li, S.; Gu, C.; Zhao, P.; Cheng, S. A novel hybrid propulsion system configuration and power distribution strategy for light electric aircraft. Energy Convers. Manag. 2021, 238, 114171.
  34. Khokhar, B.; Dahiya, S.; Parmar, S. Load frequency control of a microgrid employing a 2D Sine Logistic map based chaotic sine cosine algorithm. Appl. Soft Comput. 2021, 109, 107564.
  35. Jalali, N.; Razmi, H.; Doagou-Mojarrad, H. Optimized fuzzy self-tuning PID controller design based on Tribe-DE optimization algorithm and rule weight adjustment method for load frequency control of interconnected multi-area power systems. Appl. Soft Comput. 2020, 93, 106424.
  36. Chen, Y.; Xie, Z.; Zhong, J.; Chen, P.; Xiao, J. SQKformer: Spiking sparse QKformer with adaptive batch normalization for membrane potential. Neurocomputing 2026, 671, 132666.
  37. Yin, L.; Wang, T.; Wang, S.; Zheng, B. Interchange objective value method for distributed multi-objective optimization: Theory, application, implementation. Appl. Energy 2019, 239, 1066–1076.
  38. Zhang, J.; Cheng, C.; Yu, S.; Wu, H.; Gao, M. Sharing hydropower flexibility in interconnected power systems: A case study for the China Southern power grid. Appl. Energy 2021, 288, 116645.
Figure 1. Structure of SSNs.
Figure 2. Structure of CGANs.
Figure 3. Structure of SSGANs.
Figure 4. Framework of SSGANs for SGC.
Figure 5. IEEE two-area system.
Figure 6. Training curves of SSNs and CGANs: (a) SSNs; (b) CGANs.
Figure 7. Performance results of SSNs and CGANs: (a) SSNs; (b) CGANs.
Figure 8. Triangular load with Gaussian noise.
Figure 9. Result of online training: (a) curves of ∆f; (b) box diagram of ∆f.
Figure 10. Disturbance curve: (a) resident load; (b) renewable energy.
Figure 11. Dynamic responses of different algorithms in Area A: (a) ∆f; (b) ACE.
Figure 12. Online operation results in Case I: (a) Area 1; (b) Area 2.
Figure 13. China’s southern power grid.
Figure 14. Dynamic responses of different algorithms in Area 1: (a) ∆f; (b) ACE.
Figure 15. Online operation results in Case II: (a) Area 1; (b) Area 2; (c) Area 3; (d) Area 4.
Table 1. Parameters of SSGANs.
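| Mode | Layer | Hidden Units | Activation Function | Batch Normalization Size |
|---|---|---|---|---|
| Generator–Actor | 1 | 4 | ReLU | 64 |
| Generator–Actor | 2 | 90 | ReLU | 64 |
| Generator–Actor | 3 | 90 | ReLU | 128 |
| Generator–Actor | 4 | 4 | Tanh | - |
| Discriminator | 1 | 4 | ReLU | 32 |
| Discriminator | 2 | 90 | ReLU | 32 |
| Discriminator | 3 | 90 | ReLU | 64 |
| Discriminator | 4 | 2 | Sigmoid | - |
| Critic | 1 | 4 | ReLU | 32 |
| Critic | 2 | 90 | ReLU | 32 |
| Critic | 3 | 90 | ReLU | 64 |
| Critic | 4 | 1 | Sigmoid | - |
| BiLSTM | 1 | 4 | ReLU | 64 |
| BiLSTM | 2 | 90 | ReLU | 64 |
| BiLSTM | 3 | 90 | ReLU | 128 |
| BiLSTM | 4 | 2 | Sigmoid | - |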
Table 2. Main parameters of the comparison algorithms.
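| Parameter | DDPG | TD3 | SAC | PPO | Actor–Critic GAN |
|---|---|---|---|---|---|
| Network type | MLP | MLP | MLP | MLP | CGANs-Actor + Critic |
| Hidden layers | 2 | 2 | 2 | 2 | 2 |
| Hidden units | [128, 128] | [128, 128] | [128, 128] | [128, 128] | [128, 128] |
| Activation function | ReLU | ReLU | ReLU | ReLU | ReLU |
| Optimizer | Adam | Adam | Adam | Adam | Adam |
| Learning rate | 1 × 10⁻⁴/1 × 10⁻³ | 1 × 10⁻⁴/1 × 10⁻³ | 3 × 10⁻⁴/3 × 10⁻⁴ | 3 × 10⁻⁴/1 × 10⁻³ | 1 × 10⁻⁴/1 × 10⁻³ |
| Discount factor | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 |
| Batch size | 128 | 128 | 128 | 128 | 128 |
| Training steps | 10,000 | 10,000 | 10,000 | 10,000 | 10,000 |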
Table 3. Specific parameter settings of the comparison algorithms.
| Algorithm | Specific Settings |
|---|---|
| DDPG | Replay buffer size = 1 × 10⁶; soft update coefficient = 0.005; Gaussian exploration noise linearly decayed from 0.20 to 0.05 |
| TD3 | Replay buffer size = 1 × 10⁶; soft update coefficient = 0.005; policy delay = 2; target policy smoothing noise = 0.20 |
| SAC | Replay buffer size = 1 × 10⁶; soft update coefficient = 0.005; entropy coefficient automatically tuned |
| PPO | Rollout length = 2048; clip ratio = 0.20; GAE parameter = 0.95; update epochs per batch = 10 |
| actor–critic GAN | Adversarial pretraining; discriminator hidden units = [128, 128]; replay buffer size = 1 × 10⁶ |
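Two of the settings in Table 3 are easy to misread, so a short sketch may help. The soft update coefficient is the Polyak averaging weight τ used to track the target networks. The DDPG exploration noise is Gaussian, with a standard deviation that decays linearly from 0.20 to 0.05. Several details here are assumptions for illustration:

- The decay horizon is taken as the 10,000 training steps of Table 2.
- The action bounds of [−1, 1] are chosen to match the Tanh output layer in Table 1.
- Flat parameter lists stand in for network weights.

```python
import random
from typing import List


def soft_update(target: List[float], online: List[float], tau: float = 0.005) -> List[float]:
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


def noise_std(step: int, total_steps: int = 10_000,
              start: float = 0.20, end: float = 0.05) -> float:
    """Exploration noise std decayed linearly from `start` to `end`, then held at `end`."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


def noisy_action(action: float, step: int, rng: random.Random,
                 low: float = -1.0, high: float = 1.0) -> float:
    """Add decayed Gaussian noise to a deterministic action and clip it to the bounds."""
    return min(max(action + rng.gauss(0.0, noise_std(step)), low), high)


rng = random.Random(0)
print(noise_std(0), noise_std(5_000), noise_std(10_000))
print(noisy_action(0.5, 0, rng))
print(soft_update([0.0, 1.0], [1.0, 1.0]))
```

With τ = 0.005, each target parameter moves only 0.5% of the way toward its online counterpart per update. This is the usual reason such small values stabilize critic targets.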
Table 4. Parameters of Case I.
| Symbol | Parameter | Value |
|---|---|---|
| TgA, TgB | Governor time constant | 0.08 s |
| TtA, TtB | Turbine time constant | 0.3 s |
| TpA, TpB | Frequency response time constant | 20 s |
| BA, BB | Primary frequency bias coefficient | 4166 Hz/p.u. |
| KA, KB | Frequency response coefficient | 0.00012 Hz/p.u. |
| RA, RB | Secondary frequency deviation coefficient | 0.0047 |
| TAB | Time constant of the tie-line | 3.42 s |
Table 5. Evaluation indices of different algorithms in Case I.
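| Area | Algorithm | Avg. Δf (Hz) | Avg. ACE (MW) | ISE | IAE | ITAE (×10⁷) |
|---|---|---|---|---|---|---|
| Area 1 | DDPG | 0.0106 | 84.7183 | 14.3926 | 884.6251 | 7.4368 |
| Area 1 | PPO | 0.0094 | 76.2854 | 12.8567 | 803.4927 | 6.7815 |
| Area 1 | TD3 | 0.0078 | 63.9142 | 10.3275 | 674.3186 | 5.6247 |
| Area 1 | SAC | 0.0076 | 62.4879 | 10.0843 | 661.7352 | 5.4819 |
| Area 1 | actor–critic GAN | 0.0056 | 51.2685 | 6.7429 | 503.6148 | 4.2176 |
| Area 1 | SSGANs | 0.0031 | 43.8267 | 4.0185 | 356.2943 | 3.0268 |
| Area 2 | DDPG | 0.0103 | 82.9641 | 13.8754 | 852.7365 | 7.1642 |
| Area 2 | PPO | 0.0091 | 74.6385 | 12.3948 | 781.4693 | 6.5147 |
| Area 2 | TD3 | 0.0075 | 61.8257 | 9.8756 | 642.5871 | 5.3195 |
| Area 2 | SAC | 0.0074 | 60.7462 | 9.6423 | 631.4268 | 5.2064 |
| Area 2 | actor–critic GAN | 0.0053 | 48.9254 | 6.2851 | 472.8639 | 3.9257 |
| Area 2 | SSGANs | 0.0029 | 41.5376 | 3.7824 | 341.7592 | 2.8463 |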
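The integral indices in Table 5 (ISE, IAE, and ITAE, as listed in the Abbreviations) can be approximated from sampled responses with the rectangle rule, as in the sketch below. This section does not state which error signal the article integrates (Δf, ACE, or a combination) or what sampling interval it uses, so both are assumptions. The ×10⁷ scaling of ITAE in the table header is left to the caller.

```python
from typing import Sequence, Tuple


def error_integrals(t: Sequence[float], e: Sequence[float]) -> Tuple[float, float, float]:
    """Rectangle-rule ISE, IAE and ITAE for error samples e at uniformly spaced times t."""
    if len(t) != len(e) or len(t) < 2:
        raise ValueError("need at least two paired time/error samples")
    dt = t[1] - t[0]
    ise = sum(x * x for x in e) * dt                     # integral of e^2
    iae = sum(abs(x) for x in e) * dt                    # integral of |e|
    itae = sum(ti * abs(x) for ti, x in zip(t, e)) * dt  # integral of t * |e|
    return ise, iae, itae


print(error_integrals([0.0, 1.0, 2.0], [1.0, -2.0, 0.0]))  # (5.0, 3.0, 2.0)
```

Because ITAE weights errors by elapsed time, it penalizes slow recovery more heavily than IAE. This is consistent with the settling-time comparison in Table 9.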
Table 6. Parameters of Case II.
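| Area | Tg (s) | Tt (s) | Tp (s) | B (Hz/p.u.) | K (Hz/p.u.) | R |
|---|---|---|---|---|---|---|
| Area 1 | 0.08 | 0.3 | 20 | 4166 | 0.00012 | 0.0047 |
| Area 2 | 0.08 | 0.3 | 20 | 3850 | 0.00012 | 0.0050 |
| Area 3 | 0.08 | 0.3 | 20 | 3500 | 0.00012 | 0.0052 |
| Area 4 | 0.08 | 0.3 | 20 | 3700 | 0.00012 | 0.0048 |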
Table 7. Evaluation indices of different algorithms in Case II.
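| Area | Algorithm | Avg. Δf (Hz) | Avg. ACE (MW) | ISE | IAE | ITAE (×10⁷) |
|---|---|---|---|---|---|---|
| Area 1 | DDPG | 0.0069 | 141.3285 | 7.9352 | 574.2168 | 4.9826 |
| Area 1 | PPO | 0.0066 | 136.5942 | 7.3627 | 552.4873 | 4.7765 |
| Area 1 | TD3 | 0.0062 | 129.4736 | 6.8241 | 524.9586 | 4.5487 |
| Area 1 | SAC | 0.0061 | 127.8365 | 6.6758 | 517.3924 | 4.4823 |
| Area 1 | actor–critic GAN | 0.0055 | 120.6847 | 5.9826 | 487.2645 | 4.1568 |
| Area 1 | SSGANs | 0.0041 | 109.5738 | 5.3154 | 437.8256 | 3.8427 |
| Area 2 | DDPG | 0.0071 | 145.8264 | 8.2865 | 591.7438 | 5.1254 |
| Area 2 | PPO | 0.0068 | 140.9573 | 7.6942 | 568.3157 | 4.9086 |
| Area 2 | TD3 | 0.0064 | 133.6285 | 7.1056 | 540.8624 | 4.6635 |
| Area 2 | SAC | 0.0063 | 132.0746 | 6.9584 | 533.7426 | 4.5982 |
| Area 2 | actor–critic GAN | 0.0057 | 124.8639 | 6.2418 | 496.5284 | 4.2786 |
| Area 2 | SSGANs | 0.0042 | 113.4927 | 5.5863 | 501.2468 | 3.9724 |
| Area 3 | DDPG | 0.0072 | 148.3657 | 8.5243 | 604.3852 | 5.2847 |
| Area 3 | PPO | 0.0069 | 143.2185 | 7.9286 | 581.7463 | 5.0625 |
| Area 3 | TD3 | 0.0065 | 136.4728 | 7.3462 | 554.3286 | 4.8264 |
| Area 3 | SAC | 0.0064 | 134.8651 | 7.1845 | 547.1369 | 4.7583 |
| Area 3 | actor–critic GAN | 0.0058 | 127.5264 | 6.4867 | 513.7942 | 4.4128 |
| Area 3 | SSGANs | 0.0043 | 116.3846 | 5.7825 | 462.8165 | 4.4657 |
| Area 4 | DDPG | 0.0070 | 143.7926 | 8.1568 | 584.9275 | 5.0642 |
| Area 4 | PPO | 0.0067 | 138.5264 | 7.5483 | 560.3841 | 4.8427 |
| Area 4 | TD3 | 0.0063 | 131.8472 | 6.9765 | 532.9476 | 4.6128 |
| Area 4 | SAC | 0.0062 | 130.2857 | 6.8247 | 525.6183 | 4.5461 |
| Area 4 | actor–critic GAN | 0.0056 | 122.9465 | 6.1564 | 492.3857 | 4.2184 |
| Area 4 | SSGANs | 0.0041 | 111.8365 | 6.2489 | 444.7268 | 3.9146 |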
Table 8. Evaluation indices of ablation studies.
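| Algorithm | Avg. Δf (Hz) | Avg. ACE (MW) | ISE | IAE | ITAE (×10⁷) | Training Time (h) | Computing Time (s) |
|---|---|---|---|---|---|---|---|
| SSGANs | 0.0041 | 109.5738 | 5.3154 | 437.8256 | 3.8427 | 2.86 | 0.52 |
| SSGANs without SSNs | 0.0051 | 120.4486 | 5.7134 | 467.1862 | 4.2806 | 2.53 | 0.51 |
| SSGANs without CGANs | 0.0054 | 130.4773 | 6.1776 | 492.4875 | 4.5163 | 1.74 | 0.53 |
| Actor–Critic | 0.0059 | 141.3985 | 6.6608 | 526.7964 | 4.7757 | 1.35 | 0.5 |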
Table 9. Quantitative comparison of dynamic response performance.
| Algorithm | Settling Time (s) | Frequency Overshoot (Hz) |
|---|---|---|
| DDPG | 965 | 0.146 |
| PPO | 842 | 0.132 |
| TD3 | 735 | 0.119 |
| SAC | 708 | 0.113 |
| actor–critic GAN | 624 | 0.101 |
| SSGANs | 486 | 0.084 |
Table 10. Communication delay analysis of different algorithms.
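| Delay | Algorithm | Avg. Δf (Hz) | Avg. ACE (MW) | ISE | IAE | ITAE (×10⁷) | Regulation Mileage (MW) |
|---|---|---|---|---|---|---|---|
| 10 ms | DDPG | 0.0081 | 163.8724 | 9.1046 | 655.4382 | 5.7341 | 2117.86 |
| 10 ms | PPO | 0.0077 | 156.4826 | 8.3983 | 628.9164 | 5.4718 | 1964.27 |
| 10 ms | TD3 | 0.0072 | 146.3958 | 7.7395 | 592.8731 | 5.1784 | 1786.39 |
| 10 ms | SAC | 0.0071 | 144.6287 | 7.5624 | 584.9365 | 5.0867 | 1735.42 |
| 10 ms | actor–critic GAN | 0.0064 | 135.9276 | 6.8137 | 551.4628 | 4.7825 | 1548.73 |
| 10 ms | SSGANs | 0.0049 | 123.6845 | 5.9276 | 476.3184 | 4.2865 | 1328.54 |
| 20 ms | DDPG | 0.0094 | 188.5367 | 10.5864 | 756.8291 | 6.6237 | 2468.15 |
| 20 ms | PPO | 0.0089 | 179.4825 | 9.7246 | 718.5372 | 6.2984 | 2296.43 |
| 20 ms | TD3 | 0.0082 | 165.7934 | 8.9185 | 666.4829 | 5.8925 | 2075.86 |
| 20 ms | SAC | 0.0081 | 163.2846 | 8.7348 | 657.2396 | 5.7862 | 2014.38 |
| 20 ms | actor–critic GAN | 0.0073 | 153.8462 | 7.8463 | 613.9584 | 5.4176 | 1796.27 |
| 20 ms | SSGANs | 0.0058 | 141.9273 | 6.7824 | 538.6427 | 4.9738 | 1517.69 |
| 30 ms | DDPG | 0.0112 | 221.6845 | 12.6482 | 897.4163 | 7.8654 | 2942.68 |
| 30 ms | PPO | 0.0105 | 209.7538 | 11.5237 | 846.5284 | 7.4326 | 2734.15 |
| 30 ms | TD3 | 0.0097 | 191.4827 | 10.4526 | 778.3495 | 6.9148 | 2468.72 |
| 30 ms | SAC | 0.0095 | 188.9364 | 10.2185 | 765.2841 | 6.7845 | 2394.56 |
| 30 ms | actor–critic GAN | 0.0086 | 176.5482 | 9.1487 | 701.6283 | 6.2857 | 2142.83 |
| 30 ms | SSGANs | 0.0069 | 166.4386 | 7.9465 | 627.8153 | 5.8462 | 1748.35 |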
Table 11. Paired significance analysis of SSGANs on the two main control indices.
| Algorithm | Metric | Mean Difference | Paired t-Test p-Value | Wilcoxon p-Value | 95% Bootstrap CI |
|---|---|---|---|---|---|
| DDPG | Avg. Δf (Hz) | 0.0013 | <0.001 | <0.001 | [0.0010, 0.0016] |
| DDPG | Avg. ACE (MW) | 22.7099 | <0.001 | <0.001 | [18.72, 26.91] |
| TD3 | Avg. Δf (Hz) | 0.0009 | 0.001 | 0.001 | [0.0006, 0.0011] |
| TD3 | Avg. ACE (MW) | 15.0188 | 0.001 | 0.001 | [11.03, 18.76] |
| SAC | Avg. Δf (Hz) | 0.0007 | 0.002 | 0.002 | [0.0004, 0.0009] |
| SAC | Avg. ACE (MW) | 9.7874 | 0.002 | 0.002 | [6.21, 12.94] |
| PPO | Avg. Δf (Hz) | 0.0011 | <0.001 | <0.001 | [0.0008, 0.0014] |
| PPO | Avg. ACE (MW) | 18.1640 | <0.001 | 0.001 | [13.47, 21.82] |
| actor–critic GAN | Avg. Δf (Hz) | 0.0004 | 0.018 | 0.015 | [0.0001, 0.0006] |
| actor–critic GAN | Avg. ACE (MW) | 4.5995 | 0.014 | 0.012 | [1.72, 7.33] |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ye, X.; Ouyang, X.; Chen, B.; Wang, X.; Zhu, T.; Yang, K.; Chen, R. Sample Selection Generative Adversarial Networks for Intelligent Frequency Regulation of Microgrids. Processes 2026, 14, 1872. https://doi.org/10.3390/pr14121872

AMA Style

Ye X, Ouyang X, Chen B, Wang X, Zhu T, Yang K, Chen R. Sample Selection Generative Adversarial Networks for Intelligent Frequency Regulation of Microgrids. Processes. 2026; 14(12):1872. https://doi.org/10.3390/pr14121872

Chicago/Turabian Style

Ye, Xi, Xuetong Ouyang, Baorui Chen, Xi Wang, Tong Zhu, Kai Yang, and Runzhi Chen. 2026. "Sample Selection Generative Adversarial Networks for Intelligent Frequency Regulation of Microgrids" Processes 14, no. 12: 1872. https://doi.org/10.3390/pr14121872

APA Style

Ye, X., Ouyang, X., Chen, B., Wang, X., Zhu, T., Yang, K., & Chen, R. (2026). Sample Selection Generative Adversarial Networks for Intelligent Frequency Regulation of Microgrids. Processes, 14(12), 1872. https://doi.org/10.3390/pr14121872

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
