Sample Selection Generative Adversarial Networks for Intelligent Frequency Regulation of Microgrids

Ye, Xi; Ouyang, Xuetong; Chen, Baorui; Wang, Xi; Zhu, Tong; Yang, Kai; Chen, Runzhi

doi:10.3390/pr14121872


by Xi Ye ¹, Xuetong Ouyang ¹, Baorui Chen ²,³, Xi Wang ²,³, Tong Zhu ¹, Kai Yang ⁴,* and Runzhi Chen ⁴

¹ State Grid Sichuan Electric Power Company, Chengdu 610041, China

² State Grid Sichuan Electric Power Research Institute, Chengdu 610041, China

³ Power System Security and Operation Key Laboratory of Sichuan, Chengdu 610041, China

⁴ College of Electrical and Information Engineering, Hunan University, Changsha 410082, China

* Author to whom correspondence should be addressed.

Processes 2026, 14(12), 1872; https://doi.org/10.3390/pr14121872

Submission received: 22 April 2026 / Revised: 27 May 2026 / Accepted: 8 June 2026 / Published: 9 June 2026

(This article belongs to the Special Issue Power-Electronics-Based New-Type Power Systems: Data Analysis, Planning, Operation, and Stability Control)


Abstract

Large-scale renewable energy integration introduces random power fluctuations into microgrids, increasing the difficulty of frequency regulation. To improve regulation stability and training efficiency, this article proposes sample selection generative adversarial networks (SSGANs) based on sample selection networks (SSNs), conditional generative adversarial networks (CGANs), and the actor–critic framework. First, the SSNs are trained to evaluate sample information values and prioritize informative samples for model training. Second, the CGANs learn the conditional mapping between microgrid operating states and control actions, and the pretrained generator is transferred into the actor–critic framework as the actor. Third, the actor–critic framework further optimizes the control policy online to generate real-time frequency regulation commands. The proposed method is tested on a standard two-area system and further validated on a complex four-area system. Case studies show that SSGANs achieve faster convergence and better frequency regulation performance than typical control algorithms.

Keywords:

generative adversarial networks; sample selection networks; actor–critic framework; smart generation control

1. Introduction

Large-scale renewable energy, such as wind and solar power, has been integrated into microgrids to reduce environmental pollution [1]. However, its random fluctuations increase the difficulty of frequency regulation. Meanwhile, massive and complex microgrid operational data make it difficult for intelligent control methods to efficiently select useful samples [2]. Therefore, this study aims to design an intelligent frequency regulation strategy for informative sample selection and accurate control action generation.

Automatic generation control (AGC) is widely used to maintain power balance and suppress frequency deviations in microgrids [3]. AGC generally includes two main processes [4]: first, the total generation command is obtained using a control strategy; second, this command is dispatched to individual generation units [5]. However, conventional AGC strategies usually depend on fixed control structures, which makes it difficult for them to adapt to renewable energy fluctuations and changing operating conditions [6]. Therefore, reinforcement-learning-based smart generation control (SGC) has been introduced to improve the adaptability of microgrid frequency regulation [7]. For instance, Q-learning adjusts control policies through continuous interaction with uncertain operating environments, which helps enhance microgrid stability and adaptability [8]. Nevertheless, conventional reinforcement learning suffers from the curse of dimensionality as microgrid complexity increases [9].

To enhance frequency regulation under renewable energy fluctuations, particle swarm optimization has been used to train an artificial neural network for load frequency control with vehicle-to-grid integration [10]. Deep reinforcement learning methods have also been introduced into microgrid frequency regulation [11]. Deep Q-learning has been used to reduce frequency deviation through value-function approximation [12]. The deep deterministic policy gradient (DDPG) improves continuous control capability through deterministic policy learning [13]. Proximal policy optimization (PPO) enhances policy update stability by constraining policy changes [14]. The twin delayed deep deterministic policy gradient (TD3) reduces value overestimation through twin critics and delayed policy updates [15]. Soft actor–critic (SAC) improves exploration efficiency through entropy-regularized policy optimization [16]. However, these methods still rely on sufficient exploration and high-quality samples, which may reduce sample efficiency and slow convergence in complex microgrids.

To reduce the dependence on labeled samples and improve action generation, generative adversarial networks (GANs) have been combined with reinforcement learning for learning-based control [17]. GAN-based reinforcement learning can use adversarial training to enhance policy learning and decision generation [18]. Sequence generative adversarial networks (SGANs) further extend GANs to sequential decision-making and support online policy updates [19]. However, policy gradient methods used in SGANs may suffer from unstable updates in complex control environments [20]. To improve training stability, GANs have been connected with actor–critic frameworks [21]. In this framework, the generator can be used as the actor to improve action generation and strategy exploration [22]. Nevertheless, existing GAN-based actor–critic methods still have three gaps: insufficient use of informative samples from massive operational data [23], weak state-conditioned action prediction, and limited value-guided online optimization of generated actions [24].

To address the above deficiencies, this paper proposes sample selection generative adversarial networks (SSGANs) based on conditional generative adversarial networks (CGANs) [25], sample selection networks (SSNs), and the actor–critic framework [26]. Specifically, CGANs are introduced to learn the conditional mapping between microgrid operating states and control actions, thereby improving state-dependent action prediction [27]. Meanwhile, SSNs are constructed based on bidirectional long short-term memory (BiLSTM) [28] to capture temporal correlations in electricity data and select samples with high information values, which improves training efficiency. Then, the pretrained generator of CGANs is integrated into the actor–critic framework as the actor for online policy optimization. The key contributions of this article are as follows.

(1): The SSGANs introduce SSNs to evaluate sample information values and prioritize informative samples, thereby improving training efficiency.
(2): The SSGANs use CGANs to learn the state-conditioned mapping between microgrid operating states and control actions, thereby improving action generation quality.
(3): The SSGANs integrate the pretrained generator into the actor–critic framework as the actor, enabling online policy optimization for intelligent frequency regulation of microgrids.

The rest of the paper is arranged as follows. Section 2 introduces the SSGANs. Section 3 describes the simulation setup. Section 4 presents the case studies and results. Section 5 concludes the paper.

2. Principle of Sample Selection Generative Adversarial Networks

2.1. Sample Selection Networks

As more renewable energy is connected to microgrids, power data become increasingly complex [29]. Samples with high reward values provide more useful information for training, whereas low-value samples dominate the dataset and increase training cost. Therefore, BiLSTM-based sample selection networks (SSNs) are developed to identify high-information-value samples in the training dataset and improve training efficiency. Figure 1 shows the structure of the SSNs.

Similarly to prioritized experience replay [30], the SSNs perform reward-based sample prioritization to select important samples from the training sample set $S = \{(x_{i}, y_{i})\}$, which includes microgrid samples $x_{i}$ and their corresponding labels $y_{i}$. Meanwhile, a validation set is defined as $E = \{(x_{i}, y_{i})\}$, where $y_{i}$ denotes the known label of validation sample $x_{i}$. The predicted validation reward is defined as

R(E, S, s_{k}) = \sum_{(x_{i}, y_{i}) \in E} \log p(y_{i} \mid x_{i}, s_{k}, S_{e})

(1)

where $x_{i}$ denotes the input sample, $y_{i}$ denotes the known validation label, $s_{k}$ denotes the control state of the SSNs after querying $k$ labels, and $S_{e}$ denotes the current labeled training subset.
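As a rough illustration of Equation (1), the validation reward is the log-likelihood of the known validation labels under the current predictor. The Python sketch below assumes that the SSN prediction for each validation sample is already available as a vector of class probabilities; the function name `validation_reward` and the toy numbers are hypothetical and are not taken from the paper.

```python
import numpy as np


def validation_reward(probs: np.ndarray, labels: np.ndarray, eps: float = 1e-12) -> float:
    """Equation (1): sum of log p(y_i | x_i, s_k, S_e) over the validation set E.

    probs:  (n_val, n_classes) class probabilities predicted for the validation
            samples by the current SSN state (assumed to be given).
    labels: (n_val,) integer indices of the known validation labels y_i.
    """
    # Probability that the predictor assigns to each known label.
    p_true = probs[np.arange(len(labels)), labels]
    # Clip before taking the log so that a zero probability does not give -inf.
    return float(np.sum(np.log(np.clip(p_true, eps, 1.0))))


# Toy validation set with three samples and two classes (illustrative numbers only).
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
labels = np.array([0, 1, 0])
print(round(validation_reward(probs, labels), 4))  # -1.0217
```

A larger (less negative) reward means the predictor assigns higher probability to the correct validation labels, which is what Equation (3) uses to score candidate samples.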

During SSN training, the label of the selected input sample is queried, and the SSNs update their state from $s_{k-1}$ to $s_{k}$. The prediction of the SSNs is determined by the selected sample and its queried label, which are related to $S_{e}$ and $y_{i}$, respectively.

The ideal training objective of SSNs is

\max_{\pi} \mathbb{E}_{(S, E) \sim F}\left[\mathbb{E}_{\pi(S, T)}\left[\sum_{i=1}^{T} R(E, S, s_{i})\right]\right]

(2)

where $T$ denotes the maximum number of queried labels; $(S, E) \sim F$ indicates that the training set and validation set are sampled from distribution $F$; and $\pi(S, T)$ denotes the sequential sample selection policy over $T$ steps.

For an unlabeled candidate sample $x_{j}$, its true label is unavailable before annotation. Therefore, the current SSNs first predict a pseudo-label $\hat{y}_{j}$ for $x_{j}$. The pair $(x_{j}, \hat{y}_{j})$ is then hypothetically added to the current labeled subset $S_{e}$ to estimate its potential contribution. Accordingly, the information value of the unlabeled candidate is defined as the predicted increase in the validation reward after this hypothetical update, i.e.,

I_{j} = R(E, S_{e} \cup \{(x_{j}, \hat{y}_{j})\}, s_{k+1}) - R(E, S_{e}, s_{k})

(3)

where $I_{j}$ denotes the information value of candidate sample $x_{j}$, and $s_{k+1}$ denotes the updated SSN state after hypothetically including $(x_{j}, \hat{y}_{j})$.

For a candidate batch with $N$ samples, the mean information value is calculated as

\bar{I} = \frac{1}{N} \sum_{j=1}^{N} I_{j}

(4)

where $N$ denotes the number of candidate samples in the batch. $\bar{I}$ is used as the adaptive threshold for sample classification: if $I_{j} > \bar{I}$, candidate sample $x_{j}$ is regarded as a high-information-value sample; otherwise, it is regarded as a low-information-value sample.

In this way, the importance of each unlabeled candidate can be estimated before its true label is queried. Samples with larger $I_{j}$ are prioritized for subsequent training, thereby improving training efficiency.
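The selection rule of Equations (3) and (4) can be sketched as follows. This is a minimal illustration under stated assumptions: `reward_fn` is a hypothetical stand-in for the validation reward R, the SSN state update from $s_{k}$ to $s_{k+1}$ is assumed to happen inside it, and the pseudo-labels are taken as given rather than predicted by a trained BiLSTM.

```python
from typing import Callable, List, Tuple

import numpy as np

# Hypothetical sample type: (feature value x_j, pseudo-label y_hat_j).
Sample = Tuple[float, int]
RewardFn = Callable[[List[Sample]], float]


def information_values(candidates: List[Sample], labeled: List[Sample],
                       reward_fn: RewardFn) -> np.ndarray:
    """Equation (3): I_j = R(E, S_e U {(x_j, y_hat_j)}) - R(E, S_e).

    reward_fn stands in for the validation reward R; the SSN state update
    from s_k to s_{k+1} is assumed to happen inside reward_fn.
    """
    base = reward_fn(labeled)
    return np.array([reward_fn(labeled + [c]) - base for c in candidates])


def split_by_mean(candidates: List[Sample],
                  values: np.ndarray) -> Tuple[List[Sample], List[Sample], float]:
    """Equation (4): use the batch mean as an adaptive threshold."""
    threshold = float(values.mean())
    pool1 = [c for c, v in zip(candidates, values) if v > threshold]   # high information value
    pool2 = [c for c, v in zip(candidates, values) if v <= threshold]  # low information value
    return pool1, pool2, threshold


# Toy reward (purely illustrative): the closer the total is to 3.0, the higher the reward.
def toy_reward(subset: List[Sample]) -> float:
    return -abs(sum(x for x, _ in subset) - 3.0)


labeled = [(1.0, 0)]
candidates = [(2.0, 1), (0.5, 0), (-1.0, 1), (1.5, 0)]
values = information_values(candidates, labeled, toy_reward)
pool1, pool2, threshold = split_by_mean(candidates, values)
print(values, threshold, pool1)
```

Using the batch mean as the threshold keeps the split adaptive: the same rule works whether the rewards of a batch are large or small in absolute terms.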

2.2. Conditional Generative Adversarial Networks

The actor–critic framework obtains actions through policy learning. However, in complex microgrid environments, policy exploration may become inefficient and costly, making it difficult to find effective control actions [31]. To address this issue, CGANs are introduced to provide state-conditioned action prediction so that action generation is guided by the current operating state rather than relying only on exploratory policy updates.

GANs consist of a generator $G$ and a discriminator $D$, which are trained in an adversarial manner. The generator converts a latent noise vector $z$ sampled from a prior distribution into synthetic data, while the discriminator estimates whether an input sample comes from the real dataset or from $G$. During training, $D$ improves its ability to distinguish real and generated samples, whereas $G$ learns to produce samples that are difficult to identify as fake. This adversarial learning process can be described by the following minimax optimization problem:

\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_{z}}[\log(1 - D(G(z)))]

(5)

The CGANs, as an extension of GANs, introduce conditional information into both the generator and the discriminator. In this study, the CGANs use the current microgrid state $s_{t}$ as the conditional input. Unlike a standard GAN that learns the marginal distribution of control actions, the CGANs learn the conditional action distribution under a given operating state. The value function of CGANs is

\min_{G} \max_{D} V(D, G \mid s_{t}) = \mathbb{E}_{a_{t+1} \sim p_{data}(a \mid s_{t})}[\log D(a_{t+1}, s_{t})] + \mathbb{E}_{z \sim p_{z}}[\log(1 - D(G(z, s_{t}), s_{t}))]

(6)

where $s_{t}$ denotes the current microgrid state, $a_{t+1}$ denotes the real historical control action, $z$ denotes the noise vector, $G(z, s_{t})$ denotes the generated control action, and $D(a_{t+1}, s_{t})$ denotes the probability that the state–action pair comes from the real dataset.

By conditioning both the generator and the discriminator on $s_{t}$, the CGANs can learn the state-dependent mapping from operating states to control actions. Therefore, compared with a standard GAN that generates actions from noise alone, the CGANs can generate control actions that are more consistent with the current operating condition, thereby improving action forecasting accuracy. The structure of CGANs is shown in Figure 2.
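A Monte Carlo estimate of the value function in Equation (6) only needs the discriminator outputs on real and generated state–action pairs. The sketch below assumes those outputs are already computed (the networks themselves are omitted), and the helper names are hypothetical. It also shows the standard sanity check that $V = -\log 4$ when the discriminator outputs 0.5 for every pair. Many GAN implementations train the generator with the non-saturating loss $-\log D(G(z, s_{t}), s_{t})$ instead; the sketch keeps the minimax form of Equation (6).

```python
import numpy as np


def cgan_value(d_real: np.ndarray, d_fake: np.ndarray, eps: float = 1e-12) -> float:
    """Monte Carlo estimate of V(D, G | s_t) in Equation (6).

    d_real: discriminator outputs D(a_{t+1}, s_t) on real state-action pairs.
    d_fake: discriminator outputs D(G(z, s_t), s_t) on generated pairs.
    """
    d_real = np.clip(d_real, eps, 1.0 - eps)
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))


def discriminator_loss(d_real: np.ndarray, d_fake: np.ndarray) -> float:
    """D maximizes V, so it minimizes -V."""
    return -cgan_value(d_real, d_fake)


def generator_loss(d_fake: np.ndarray, eps: float = 1e-12) -> float:
    """G minimizes the second term of V (minimax form of Equation (6))."""
    return float(np.mean(np.log(1.0 - np.clip(d_fake, eps, 1.0 - eps))))


# When D cannot tell real from generated pairs (D = 0.5 everywhere), V = -log 4.
half = np.full(16, 0.5)
print(cgan_value(half, half), -np.log(4.0))
```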

2.3. Sample Selection Generative Adversarial Networks

The SSGANs consist of three main components: SSNs, CGANs, and the actor–critic framework. The SSNs evaluate the information value of candidate samples and divide them into high-information-value samples and low-information-value samples. The selected high-information-value samples are used for CGAN pretraining. The CGANs learn the state-conditioned mapping from microgrid operating states to control actions through adversarial learning, where the generator predicts control actions and the discriminator distinguishes real and generated state–action samples. Finally, the pretrained generator is transferred to initialize the actor network, and the actor–critic framework further updates the control policy through value evaluation. The structure of the SSGANs is shown in Figure 3.

For sample selection, this paper draws on the idea of reward-based prioritization [32]. The information value of each candidate sample is predicted by the SSNs. Samples with information values higher than the batch mean are regarded as high-information-value samples in experience pool 1; otherwise, they are regarded as low-information-value samples in experience pool 2. In this way, SSGANs can prioritize informative samples for CGAN pretraining and reduce the influence of low-information-value samples on policy learning.

After the high-information-value sample set $c = \{c_{0}, \dots, c_{t}\}$ is obtained by the SSNs, the pretrained generator is transferred to initialize the actor network. In the online stage, $G$ denotes the actor network and $C$ denotes the critic network. The discriminator used in the offline CGAN pretraining stage is not involved in the online control process. Then, the actor $G(c_{t} \mid \theta^{G})$ predicts the next action $a_{t+1}$ according to the current condition $c_{t}$, while the critic $C(c_{t}, a_{t} \mid \theta^{C})$ estimates the corresponding action–value function. Therefore, the online control process follows a standard actor–critic framework.

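The critic target value is calculated as

y_{i} = r_{i} + \lambda C'\left(c_{i+1}, G'(c_{i+1} \mid \theta^{G'}) \mid \theta^{C'}\right)

(7)

where $\lambda$ is the discount factor, and $C'$ and $G'$ denote the target critic and target actor networks with parameters $\theta^{C'}$ and $\theta^{G'}$, respectively.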

The critic network is trained by minimizing the mean squared Bellman error

L (θ^{C}) = \frac{1}{I} \sum_{i = 1}^{I} {[y_{i} - C (c_{i}, a_{i} ∣ θ^{C})]}^{2}

(8)
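Equations (7) and (8) reduce to simple array arithmetic once the target critic's values at the next condition are known. The following sketch assumes those values are supplied as an array; it uses the discount factor of 0.99 listed in Section 3.2 and, like Equation (7), applies no terminal-state mask. The function names are illustrative.

```python
import numpy as np


def critic_targets(rewards, next_q_target, discount=0.99):
    """Equation (7): y_i = r_i + lambda * C'(c_{i+1}, G'(c_{i+1})).

    next_q_target holds the target critic's value of the target actor's action
    at the next condition; no terminal-state mask is applied, matching Eq. (7).
    """
    return np.asarray(rewards, float) + discount * np.asarray(next_q_target, float)


def critic_loss(targets, q_values):
    """Equation (8): mean squared Bellman error over a mini-batch of size I."""
    return float(np.mean((np.asarray(targets, float) - np.asarray(q_values, float)) ** 2))


y = critic_targets([-1.0, -0.5], [-10.0, -4.0])  # approximately [-10.9, -4.46]
loss = critic_loss(y, [-10.0, -5.0])             # (0.81 + 0.2916) / 2, approximately 0.5508
print(y, loss)
```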

The actor network is optimized using the deterministic policy gradient

\nabla_{θ^{G}} J \approx \frac{1}{I} \sum_{i = 1}^{I} [\nabla_{a} C (c, a ∣ θ^{C}) ∣_{c = c_{i}, a = G (c_{i})} \cdot \nabla_{θ^{G}} G (c ∣ θ^{G}) ∣_{c_{i}}]

(9)

where $i$ is the mini-batch sample index and $I$ is the mini-batch size.
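To show what Equation (9) computes, the sketch below replaces the neural networks with a hypothetical linear actor $G(c \mid W) = Wc$ and a hypothetical quadratic critic $C(c, a) = -\lVert a - Kc \rVert^{2}$, for which $\nabla_{a} C$ has a closed form. Under these assumptions, the chain rule in Equation (9) becomes an averaged outer product, and repeated gradient ascent drives $W$ toward $K$. A real implementation would obtain both gradients by automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim, batch = 4, 2, 128
# Hypothetical "best" linear policy that the toy critic rewards.
K = rng.normal(size=(action_dim, state_dim))


def critic_grad_a(c: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Gradient w.r.t. a of the toy critic C(c, a) = -||a - K c||^2."""
    return -2.0 * (a - c @ K.T)


def dpg_step(W: np.ndarray, c_batch: np.ndarray, lr: float = 0.05) -> np.ndarray:
    """One deterministic policy gradient ascent step, in the spirit of Eq. (9)."""
    a = c_batch @ W.T                  # actor output G(c_i | W) = W c_i
    g_a = critic_grad_a(c_batch, a)    # grad_a C evaluated at a = G(c_i)
    # For a linear actor, the chain rule reduces to an outer product with c_i;
    # averaging over the mini-batch gives the 1/I sum in Eq. (9).
    grad_W = g_a.T @ c_batch / len(c_batch)
    return W + lr * grad_W             # ascend J, i.e., increase the critic's value


W = np.zeros((action_dim, state_dim))
for _ in range(500):
    W = dpg_step(W, rng.normal(size=(batch, state_dim)))
print(np.abs(W - K).max())
```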

Finally, the target networks are updated through soft updating

\begin{array}{l} θ^{C^{'}} = γ θ^{C} + (1 - γ) θ^{C^{'}} \\ θ^{G^{'}} = γ θ^{G} + (1 - γ) θ^{G^{'}} \end{array}

(10)

where $\gamma$ is the soft update coefficient.
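The soft update in Equation (10) is a per-parameter exponential moving average. The sketch below stores parameters in a plain dictionary of NumPy arrays (an assumption for illustration) and uses the coefficient 0.005 reported in Section 3.2; `tau` here corresponds to $\gamma$ in Equation (10).

```python
import numpy as np


def soft_update(target: dict, online: dict, tau: float = 0.005) -> dict:
    """Equation (10): theta' <- tau * theta + (1 - tau) * theta', per parameter array.

    tau plays the role of the soft update coefficient (gamma in Eq. (10)).
    """
    return {name: tau * online[name] + (1.0 - tau) * target[name] for name in target}


online = {"w": np.ones(3), "b": np.array([2.0])}
target = {"w": np.zeros(3), "b": np.zeros(1)}
for _ in range(200):
    target = soft_update(target, online)
# After n updates toward fixed online weights, target = online * (1 - (1 - tau) ** n).
print(target["w"], 1.0 - 0.995 ** 200)
```

A small coefficient makes the target networks change slowly, which keeps the targets in Equation (7) from moving as fast as the networks being trained.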

2.4. Sample Selection Generative Adversarial Networks for SGC

The proposed SSGANs are applied to SGC to reduce the frequency deviation $\Delta f$ and area control error (ACE) of microgrids. To clarify the decision-making formulation, the SGC problem is formulated as a Markov decision process $(S, A, P, r)$ with state space $S$, action space $A$, transition probability $P$, and reward $r$. At time $t$, the state $s_{t} \in S$ includes the measurable operating variables of the microgrid, such as frequency deviation, ACE, load disturbance, and renewable power fluctuation. The action $a_{t} \in A$ denotes the continuous generation control command generated by the actor. The environment is the microgrid frequency response model under load and renewable energy disturbances. After receiving $a_{t}$, the environment returns the next state $s_{t+1}$ and the immediate reward $r_{t}$.

In the pretraining stage, historical operating data are collected as input samples. The SSNs are first trained to evaluate sample information values and select high-information-value samples for CGAN pretraining. The CGANs then learn the state-conditioned mapping between microgrid operating states and control actions, and the pretrained generator is transferred to initialize the actor network. After pretraining, the parameters of the SSNs are fixed and are not jointly updated with actor–critic networks.

Although offline training is more computationally stable, it cannot exploit the online learning capability of the actor–critic framework. To improve adaptability to real-time complex disturbances, an online training stage is further introduced. The pretrained generator is transferred to initialize the actor, which is then updated under the actor–critic framework. Meanwhile, the trained SSNs with fixed parameters are used to select informative samples from the real-time experience replay buffer to support policy learning and optimization.

After training, the SSGANs are deployed for online frequency control. At each control step, the current microgrid state is fed directly into the trained actor network, and the corresponding control command is generated in real time. In this design, the CGANs learn the state-conditioned action distribution $p(a_{t+1} \mid s_{t})$, while the actor–critic framework further optimizes the generated actions through critic evaluation. The proposed method uses sample selection to improve training efficiency and adversarial pretraining to enhance action prediction, thereby supporting real-time frequency regulation of microgrids (Figure 4 and Algorithm 1).

Algorithm 1. Pseudo-code of SSGANs for SGC

1: Initialize parameters
2: for each training sample do
3: Estimate the validation reward using Equation (1)
4: Update the SSN state
5: Calculate the sample information value using Equation (3)
6: If the information value exceeds the batch mean in Equation (4), save the sample to experience pool 1; otherwise, save it to experience pool 2
7: end for
8: Pretrain the CGANs on experience pool 1 using Equation (6)
9: Transfer the pretrained generator to initialize the online actor network
10: for t = 1, …, T do
11: Generate action $a_{t}$ with the actor
12: Store the transition $(s_{t}, a_{t}, r_{t}, s_{t+1})$ in the experience replay buffer
13: Calculate the target value using Equation (7)
14: Update the critic and actor using Equations (8) and (9)
15: Soft-update the target networks using Equation (10)
16: end for
17: Deploy the trained actor network
18: Input the current microgrid state into the trained actor
19: Generate the control command for real-time frequency regulation

3. Simulation Setup

3.1. Evaluation and Reward Function

To evaluate the frequency regulation performance of SSGANs, Δf, ACE, integral squared error (ISE) [33], integral absolute error (IAE) [34], and integral of time-weighted absolute error (ITAE) [35] are adopted as evaluation indices. Smaller values indicate better performance.

ISE = \int_{0}^{\infty} Δ f^{2} (t) d t

(11)

IAE = \int_{0}^{\infty} |Δ f (t)| d t

(12)

ITAE = \int_{0}^{\infty} t |Δ f (t)| d t

(13)
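In simulation, $\Delta f(t)$ is available only at discrete time steps and over a finite horizon, so Equations (11) to (13) are typically approximated numerically. The sketch below assumes a finite simulation horizon in place of the infinite upper limit and uses the trapezoidal rule; the function names are hypothetical.

```python
import numpy as np


def _trapezoid(y: np.ndarray, t: np.ndarray) -> float:
    """Trapezoidal rule, written out to avoid depending on a specific NumPy version."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))


def error_integrals(t, delta_f) -> dict:
    """Finite-horizon approximations of Equations (11) to (13) from sampled data."""
    t = np.asarray(t, float)
    df = np.asarray(delta_f, float)
    return {
        "ISE": _trapezoid(df ** 2, t),
        "IAE": _trapezoid(np.abs(df), t),
        "ITAE": _trapezoid(t * np.abs(df), t),
    }


t = np.linspace(0.0, 10.0, 1001)
res = error_integrals(t, np.full_like(t, 0.1))
print(res)  # ISE 0.1, IAE 1.0, ITAE 5.0 (up to floating-point rounding)
```

The time weighting in ITAE penalizes deviations that persist late in the record more heavily than an early transient of the same size, which is why it is sensitive to slow recovery.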

To guide the controller toward stable frequency regulation, the reward function is constructed from Δf and ACE. Weighted squared terms penalize frequency fluctuation and tie-line power imbalance. The reward used to train the critic network considers only frequency deviation and ACE and is defined as

r = -\sum_{k=1}^{K} \left[\eta_{k} (\Delta f_{k})^{2} + (1 - \eta_{k}) (\mathrm{ACE}_{k})^{2} / 1000\right]

(14)

where $\eta_{k}$ and $(1 - \eta_{k})$ are the weight coefficients of $\Delta f_{k}$ and $\mathrm{ACE}_{k}$ in the $k$-th area, respectively, and $K$ is the number of control areas. $\eta_{k}$ is set to 0.5 for all areas to balance $\Delta f_{k}$ and $\mathrm{ACE}_{k}$.
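For concreteness, Equation (14) can be evaluated for $K$ areas as below. The sketch follows the operator precedence of Equation (14) as written, so the factor 1/1000 scales only the ACE term; the input values are illustrative and not taken from the paper's cases.

```python
import numpy as np


def sgc_reward(delta_f, ace, eta=0.5) -> float:
    """Equation (14) for K areas; delta_f and ace are length-K arrays.

    The 1/1000 factor is applied to the ACE term only, following the operator
    precedence of Eq. (14) as written.
    """
    df = np.asarray(delta_f, float)
    ace = np.asarray(ace, float)
    eta = np.broadcast_to(np.asarray(eta, float), df.shape)  # eta_k = 0.5 by default
    return float(-np.sum(eta * df ** 2 + (1.0 - eta) * ace ** 2 / 1000.0))


# Two areas with illustrative deviations (not values from the paper).
r = sgc_reward([0.02, -0.01], [10.0, -20.0])
print(r)  # approximately -0.25025
```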

In addition, precision, loss, and root mean square error (RMSE) are used to evaluate the training process of the SSNs and CGANs.

Precision = \frac{N_{correct}}{N_{total}}

(15)

Loss = - \frac{1}{N} \sum_{i = 1}^{N} [y_{i} \log ({\hat{y}}_{i}) + (1 - y_{i}) \log (1 - {\hat{y}}_{i})]

(16)

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(z_{i} - {\hat{z}}_{i})}^{2}}

(17)

where $N_{correct}$ is the number of correctly classified samples; $N_{total}$ is the total number of samples; $y_{i}$ is the true label; $\hat{y}_{i}$ is the predicted probability; $z_{i}$ is the real value; and $\hat{z}_{i}$ is the predicted value.
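The three training metrics in Equations (15) to (17) can be computed directly. Note that Equation (15), as defined, is the fraction of correctly classified samples, which is often called accuracy in the classification literature. The sketch below is a minimal NumPy version with hypothetical function names.

```python
import numpy as np


def precision(y_true, y_pred) -> float:
    """Equation (15): N_correct / N_total."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def bce_loss(y_true, p_pred, eps: float = 1e-12) -> float:
    """Equation (16): binary cross-entropy with clipped probabilities."""
    y = np.asarray(y_true, float)
    p = np.clip(np.asarray(p_pred, float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def rmse(z, z_hat) -> float:
    """Equation (17): root mean square error between real and predicted values."""
    z = np.asarray(z, float)
    z_hat = np.asarray(z_hat, float)
    return float(np.sqrt(np.mean((z - z_hat) ** 2)))


print(precision([1, 0, 1, 1], [1, 0, 0, 1]),      # 0.75
      bce_loss([1, 0], [0.8, 0.2]),               # -log(0.8)
      rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))     # sqrt(4/3)
```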

3.2. Parameter Setting

Table 1 lists the parameters of the SSGANs. These hyperparameters were selected based on empirical tuning and manual adjustment to balance prediction accuracy, training stability, and computational efficiency.

For the SSNs, the BiLSTM consists of two LSTM layers with dimensions of 4-90-90-2. The hidden size of 90 is selected to capture temporal correlations in microgrid operating data without excessive computational cost. The dropout factor is set to 0.5 to avoid overfitting, and the learning rate is set to 0.01 to accelerate sample-selection training.

For the CGANs, the generator and discriminator are DNNs with dimensions of 4-90-90-4 and 4-90-90-2, respectively. The generator output dimension corresponds to continuous control actions, and the discriminator output dimension corresponds to real/fake sample discrimination. Tanh is used in the generator output layer to bound the generated control actions, while Sigmoid is used in the discriminator output layer. ReLU is used in the hidden layers, and batch normalization [36] is introduced to alleviate undesirable initialization and improve adversarial training stability.
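The layer sizes above can be sketched as plain fully connected networks. This is an illustrative forward pass only: batch normalization, training, and the way the noise $z$ and state $s_{t}$ are combined into the 4-dimensional input are not specified in the text, so the sketch simply feeds a placeholder 4-dimensional input; the He-style initialization is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(42)


def init_mlp(sizes):
    """Random (He-style) weights for a fully connected net such as 4-90-90-4."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def forward(params, x, out_act):
    """ReLU hidden layers followed by the given output activation."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    return out_act(h @ W + b)


generator = init_mlp([4, 90, 90, 4])       # Tanh output bounds the actions to [-1, 1]
discriminator = init_mlp([4, 90, 90, 2])   # Sigmoid output gives scores in [0, 1]
x = rng.normal(size=(128, 4))              # placeholder 4-dimensional inputs, batch of 128
actions = forward(generator, x, np.tanh)
scores = forward(discriminator, x, sigmoid)
print(actions.shape, scores.shape)
```

Bounding the generator output with Tanh means the actions must be rescaled to the physical command range before they are sent to the generation units; that scaling is not described in the text.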

For the actor–critic framework, the pretrained generator is transferred as the actor, and the critic is a DNN with dimensions of 4-90-90-1 to estimate the action value. The learning rate of both the actor and critic is set to 0.005 to maintain balanced updates. Adam is used as the optimizer, the discount factor is set to 0.99, the soft update coefficient is set to 0.005, the replay buffer size is set to $1 \times 10^{6}$, the batch size is set to 128, and the total training steps are set to 10,000. These settings are kept consistent with the comparison algorithms to ensure a fair evaluation.

Five representative methods of modern deep reinforcement learning are introduced as the comparison algorithms: DDPG, TD3, SAC, PPO, and actor–critic GAN [18]. Consistent with the original experimental setup, all algorithms were optimized by the genetic algorithm with the same population size of 30 and the same iteration number of 50. In addition, all methods were evaluated under the same computational budget and the same architecture family to ensure a fair comparison. The detailed parameter settings of the compared algorithms are provided in Table 2 and Table 3.

4. Case Studies

All algorithms are evaluated on the IEEE two-area system (Case I) [37] and the China Southern Power Grid (Case II) [38]. Case I serves as the benchmark for testing the SSGANs, and Case II is used to validate their performance.

4.1. Case I

Case I is based on a standard two-area frequency control system. Each area contains conventional generation units represented by governor and turbine dynamics, as shown in Figure 5. To simulate renewable energy integration, wind and solar power units are also incorporated into the system. The main parameters of Case I are listed in Table 4.

4.1.1. Pretraining of SSNs and CGANs

In the pretraining stage, historical state–action data from Case I, generated by a conventional PID controller, are used as training samples. The SSNs evaluate sample information values based on historical rewards and select high-information-value samples for CGAN pretraining. Then, the CGANs learn the state-conditioned mapping from operating states to generation control actions, and the pretrained generator provides initialization for the online actor network.

In the SSNs training process, Figure 6a shows that the precision rises quickly and becomes stable after approximately 4 × 10⁴ iterations, while the loss decreases to a low level. This indicates that the SSNs can effectively learn the sample selection rule. Figure 7a presents the reward distribution of training samples. The reward values calculated by Equation (14) are shifted by subtracting the mean value and normalized into [−1, 1]. Samples with normalized rewards greater than 0 are regarded as high-information-value samples for CGAN pretraining. For the CGANs, Figure 6b shows that the RMSE decreases rapidly and converges after approximately 4 × 10⁴ iterations, indicating stable prediction performance. As shown in Figure 7b, the predicted generation power closely follows the observed power. Therefore, the pretrained generator can learn the mapping from microgrid operating states to generation control actions and provide initialization for the online actor network.
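The reward normalization and threshold described above can be sketched as follows. The text does not state how the mean-shifted rewards are scaled into [−1, 1]; dividing by the largest absolute deviation is assumed here. Because the rewards are mean-centered, the rule "normalized reward greater than 0" selects the same samples as "above the mean". The reward values are made up for illustration.

```python
import numpy as np


def normalize_rewards(rewards) -> np.ndarray:
    """Subtract the mean, then scale into [-1, 1] by the largest absolute deviation.

    The scaling rule is an assumption; the text only states the target range.
    """
    centered = np.asarray(rewards, float) - np.mean(rewards)
    scale = np.max(np.abs(centered))
    return centered / scale if scale > 0 else centered


rewards = np.array([-0.8, -0.2, -0.6, -0.1, -0.3])  # illustrative rewards from Equation (14)
norm = normalize_rewards(rewards)
high_info = norm > 0.0                                # samples kept for CGAN pretraining
print(np.round(norm, 3), high_info)
```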

4.1.2. Online Training of SSGANs

Online training data are collected from state–action interactions in Case I. The CGAN generator initializes the actor, while the critic estimates action values. To test dynamic learning, a sawtooth load disturbance with 50 dB Gaussian white noise is added to Case I, as shown in Figure 8.
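A disturbance of this kind can be generated as below. The text does not give the sawtooth period, amplitude, or sampling interval, and "50 dB" is interpreted here as a signal-to-noise ratio relative to the sawtooth's mean power; all of these are assumptions, and the numbers in the sketch are hypothetical.

```python
import numpy as np


def sawtooth(t: np.ndarray, period: float, amplitude: float) -> np.ndarray:
    """Ramp from 0 up to `amplitude`, resetting every `period` seconds."""
    return amplitude * ((t / period) % 1.0)


def add_awgn(signal: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Add white Gaussian noise so that P_signal / P_noise = 10 ** (snr_db / 10)."""
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)


rng = np.random.default_rng(1)
t = np.arange(0.0, 2000.0, 1.0)                     # hypothetical: 1 s steps over 2000 s
clean = sawtooth(t, period=400.0, amplitude=0.05)   # hypothetical: 400 s period, 0.05 p.u.
load = add_awgn(clean, snr_db=50.0, rng=rng)
print(load[:5])
```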

Figure 9a shows the frequency deviation responses of different algorithms. DDPG has the largest fluctuation and slowest recovery, while PPO improves the response but still shows relatively large deviations. TD3 and SAC achieve similar control performance, better than that of DDPG and PPO. The actor–critic GAN further reduces frequency deviations through adversarial pretraining. In comparison, SSGANs show the smallest deviation and fastest recovery, indicating better online learning and regulation ability. This is because the SSNs prioritize high-information-value samples to improve training sample quality, the CGANs provide state-conditioned action prediction to further accelerate policy exploration, and the actor–critic framework updates the control policy in real time, enabling SSGANs to generate more effective control commands for frequency regulation. Figure 9b further shows that SSGANs have the most concentrated frequency deviation distribution around zero, indicating the best regulation stability and online control performance.

4.1.3. Online Operation

To evaluate the online operation performance, a residential load profile is introduced as the load disturbance, as shown in Figure 10a. Meanwhile, wind and solar power fluctuations are considered as stochastic renewable energy disturbances, as shown in Figure 10b.

Compared with the other algorithms, SSGANs can prioritize high-information-value samples and provide better training data for policy learning. Therefore, their frequency deviation and ACE curves are smoother and converge faster, as shown in Figure 11. This indicates that the sample selection mechanism can reduce the influence of low-information samples, while CGAN-based action prediction improves the initial control quality. Table 5 and Figure 12 further show that SSGANs reduce the average frequency deviation by 44.64–71.84% and the average ACE by 14.52–49.93%. Moreover, SSGANs achieve lower ISE, IAE, and ITAE than the compared algorithms, with minimum reductions of 39.82%, 27.73%, and 27.50%, respectively. These results demonstrate that SSGANs can effectively improve frequency quality, reduce accumulated control errors, and achieve better online operation performance.

4.2. Case II

Case II is a four-area interconnected power system. Compared with Case I, Case II has a larger system scale, stronger inter-area coupling, and more complex frequency regulation requirements. Therefore, it is used to further evaluate the reliability and adaptability of SSGANs in complex multi-area scenarios, as shown in Figure 13. The system parameters of Case II are listed in Table 6.

Compared with other algorithms, SSGANs show smaller frequency deviation and ACE fluctuations in Area 1, as shown in Figure 14. The frequency deviation converges to a narrower range around zero, and the ACE response is smoother, indicating better suppression of frequency oscillation and inter-area power imbalance. This improvement is mainly attributed to the high-information-value sample selection of SSNs, the state-conditioned action prediction of CGANs, and the online policy update of the actor–critic framework. Table 7 and Figure 15 further show that SSGANs outperform the compared algorithms in most evaluation indices. In Area 1, SSGANs reduce the average frequency deviation by 25.455–40.580% and the average ACE by 9.207–22.469%, while ISE, IAE, and ITAE are reduced by at least 11.152%, 10.146%, and 7.556%, respectively. Similar improvements are observed in Areas 2–4. Although SSGANs do not achieve the best result on a few individual indices, they outperform the compared methods on most evaluation indices across the multi-area system.

The above results indicate that (i) the SSGANs can stably control a larger number of generator units and renewable energy units; and (ii) the SSGANs are highly adaptive and robust in complex multi-area systems.

4.3. Discussion

To further evaluate the effectiveness of the proposed SSGANs, this section presents additional analyses, including ablation studies, dynamic response performance, communication delay analysis, statistical significance analysis, and a discussion of limitations. All experiments in this section are conducted in Area 1 of Case II.

4.3.1. Ablation Studies

The complete SSGANs achieve the best performance across all evaluation indices (Table 8). Removing SSNs increases Δf by 24.4% and ACE by 9.9%, which demonstrates that the sample selection mechanism effectively prioritizes high-information-value samples and improves training efficiency. Removing CGAN pretraining leads to a larger degradation, with Δf increasing by 31.7% and ACE by 19.1%, indicating that the pretrained generator provides a superior initialization for the actor network compared to random initialization. The standard actor–critic shows the worst performance across all indices, confirming that both components contribute meaningfully to the overall performance. Regarding computational cost, SSGANs require the longest training time (2.86 h) due to the additional SSN training and CGAN pretraining stages. However, the inference time of all methods remains comparable (approximately 0.5 s), which satisfies the real-time requirement of microgrid frequency regulation.

4.3.2. Dynamic Response Performance

To further evaluate the dynamic response performance, settling time and frequency overshoot are introduced as supplementary indices, where settling time reflects the recovery speed after disturbance, and frequency overshoot reflects the maximum frequency deviation. As shown in Table 9, SSGANs achieve the best performance among all compared algorithms, with the shortest settling time of 486 s and the smallest frequency overshoot of 0.084 Hz. Compared with DDPG and SAC, the settling time of SSGANs is lower by 49.64% and 31.36%, respectively, and the frequency overshoot is lower by 42.47% and 25.66%, respectively. These results indicate that SSGANs can suppress the maximum frequency deviation more effectively and restore the system frequency to the steady-state range more rapidly after disturbance, demonstrating better dynamic regulation capability than standard deep reinforcement learning methods.
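The two supplementary indices can be computed from a sampled frequency deviation trace. The sketch assumes that settling time is measured from the start of the record as the first time after which $|\Delta f|$ stays within a tolerance band, and that overshoot is the maximum $|\Delta f|$; the band width is a hypothetical parameter because the text does not state the one used.

```python
import numpy as np


def overshoot(delta_f) -> float:
    """Maximum absolute frequency deviation over the record."""
    return float(np.max(np.abs(delta_f)))


def settling_time(t, delta_f, band: float) -> float:
    """First time after which |delta_f| stays within +/- band until the end.

    Returns nan if the trace is still outside the band at the last sample.
    """
    t = np.asarray(t, float)
    outside = np.flatnonzero(np.abs(np.asarray(delta_f, float)) > band)
    if outside.size == 0:
        return float(t[0])          # never left the band
    last = outside[-1]
    if last == t.size - 1:
        return float("nan")         # did not settle within the record
    return float(t[last + 1])


t = np.arange(5.0)
df = np.array([0.0, 0.08, -0.03, 0.004, 0.001])
print(settling_time(t, df, band=0.01), overshoot(df))  # 3.0 0.08
```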

4.3.3. Communication Delay Analysis

In practical interconnected microgrids, communication delay affects the transmission of measured states and control commands, thereby weakening the timeliness of frequency regulation. As shown in Table 10, the control performance of all algorithms degrades as the delay increases from 10 ms to 30 ms. For SSGANs, the average frequency deviation and the average ACE increase by 40.82% and 34.57%, respectively. Nevertheless, SSGANs maintain the lowest values under every delay condition. Under the 30 ms delay, SSGANs reduce the average frequency deviation by 38.39% and 27.37% compared with DDPG and SAC, respectively. In absolute terms, SSGANs are also less affected by communication delay: from 10 ms to 30 ms, their average frequency deviation increases by 0.0020 Hz, versus 0.0031 Hz for DDPG and 0.0024 Hz for SAC.
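In discrete-time simulation, a constant communication delay is commonly emulated by holding each transmitted value in a first-in, first-out buffer for a fixed number of sampling periods. The Python sketch below is a minimal version of that idea. The class `DelayLine`, the helper `steps_for_delay`, and the 10 ms sampling period in the usage lines are assumptions for illustration; the source does not describe how the delay was implemented.

```python
import math
from collections import deque
from typing import Deque


class DelayLine:
    """Hypothetical fixed communication delay of `delay_steps` sampling periods.

    Values leave the buffer in the order they entered (FIFO); until the buffer
    has filled, the receiver sees `initial` (e.g., a zero command).
    """

    def __init__(self, delay_steps: int, initial: float = 0.0) -> None:
        if delay_steps < 0:
            raise ValueError("delay_steps must be non-negative")
        self._buffer: Deque[float] = deque([initial] * delay_steps)

    def step(self, value: float) -> float:
        """Send `value` now and return the value sent `delay_steps` periods ago."""
        self._buffer.append(value)
        return self._buffer.popleft()


def steps_for_delay(delay_ms: float, sample_time_ms: float) -> int:
    """Round a delay up to a whole number of sampling periods."""
    return math.ceil(delay_ms / sample_time_ms)


# Usage with an assumed 10 ms sampling period: a 30 ms delay becomes 3 steps.
command_link = DelayLine(steps_for_delay(30.0, 10.0))
received = [command_link.step(u) for u in [1.0, 2.0, 3.0, 4.0, 5.0]]
```

To reflect delay on both measured states and control commands, one such buffer could sit on the measurement path and another on the command path of the closed-loop simulation.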

Frequency regulation mileage [8] is introduced as an auxiliary economic indicator; it is calculated by accumulating the absolute differences between consecutive control commands during the regulation process. A smaller regulation mileage indicates smoother control actions, a lower regulation burden, and potentially lower execution cost. As shown in Table 10, SSGANs achieve the lowest regulation mileage under all delay conditions. Under the 30 ms delay, the regulation mileage of SSGANs is 40.59% lower than that of DDPG and 26.99% lower than that of SAC, showing better regulation economy under delayed communication.
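Under the definition above, the mileage is the sum of absolute differences between consecutive commands. A minimal Python sketch, assuming the commands are sampled at a fixed control period and that the first command is not compared with any earlier setpoint; both function names are illustrative.

```python
from typing import Sequence


def regulation_mileage(commands: Sequence[float]) -> float:
    """Sum of |u_k - u_(k-1)| over consecutive control commands (MW)."""
    return sum(abs(curr - prev) for prev, curr in zip(commands, commands[1:]))


def percent_reduction(value: float, reference: float) -> float:
    """How much lower `value` is than `reference`, in percent."""
    return 100.0 * (reference - value) / reference
```

For example, `percent_reduction(1748.35, 2942.68)` reproduces the 40.59% reduction relative to DDPG under the 30 ms delay in Table 10.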

4.3.4. Statistical Significance Analysis

To further verify whether the performance gains of SSGANs are statistically significant rather than caused by random initialization, paired significance analysis was conducted on the two main control indices (i.e., Δf and ACE). Specifically, all compared methods were independently run 20 times under different random seeds, and the results obtained under the same seed were paired for comparison. For each method, the performance difference relative to SSGANs was evaluated by paired t-tests and Wilcoxon signed-rank tests. In addition, 95% bootstrap confidence intervals of the mean differences were calculated to further quantify the uncertainty of the observed improvement. The statistical results are summarized in Table 11.
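The paired procedure described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: it computes the per-seed differences, the paired t statistic, the Wilcoxon signed-rank statistic W+, and a percentile bootstrap interval for the mean difference. The resample count (10,000), the random seed, and the percentile (rather than, say, BCa) bootstrap are assumptions. p-values are left to a statistics library; SciPy, for example, provides `scipy.stats.ttest_rel`, `scipy.stats.wilcoxon`, and `scipy.stats.bootstrap`.

```python
import math
import random
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def paired_differences(baseline: Sequence[float], proposed: Sequence[float]) -> List[float]:
    """Per-seed differences (baseline - proposed); positive favours the proposed method."""
    if len(baseline) != len(proposed):
        raise ValueError("runs must be paired seed by seed")
    return [b - p for b, p in zip(baseline, proposed)]


def paired_t_statistic(diffs: Sequence[float]) -> float:
    """Paired t statistic mean(d) / (s_d / sqrt(n)), with n - 1 degrees of freedom."""
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def wilcoxon_w_plus(diffs: Sequence[float]) -> float:
    """Wilcoxon signed-rank statistic W+: sum of ranks of |d| over positive d.

    Zero differences are dropped; tied |d| values receive their average rank.
    """
    nonzero = [d for d in diffs if d != 0.0]
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute values.
        while j + 1 < len(order) and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return sum(r for r, d in zip(ranks, nonzero) if d > 0)


def bootstrap_ci(
    diffs: Sequence[float], n_resamples: int = 10_000, level: float = 0.95, seed: int = 0
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean difference."""
    rng = random.Random(seed)
    data = list(diffs)
    n = len(data)
    # Resample the paired differences with replacement and record each mean.
    means = sorted(sum(rng.choices(data, k=n)) / n for _ in range(n_resamples))
    alpha = (1.0 - level) / 2.0
    lower = means[int(round(alpha * n_resamples))]
    upper = means[int(round((1.0 - alpha) * n_resamples)) - 1]
    return lower, upper
```

Given the 20 per-seed results of a baseline and of SSGANs for one index, `bootstrap_ci(paired_differences(baseline, ssgans))` returns an interval analogous to the last column of Table 11; a lower bound above zero corresponds to the criterion that the interval does not include zero.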

As shown in Table 11, the mean differences of all compared methods relative to SSGANs are positive on both Δf and ACE, which indicates that SSGANs consistently achieve lower frequency deviation and lower area control error. Meanwhile, the p-values of both the paired t-test and the Wilcoxon signed-rank test are below 0.05 for all comparisons, and none of the corresponding 95% bootstrap confidence intervals includes zero. These results indicate that the superiority of SSGANs over DDPG, TD3, SAC, PPO, and actor–critic GAN is statistically significant at the 0.05 level on the two main control indices. In particular, although actor–critic GAN already benefits from adversarial pretraining, SSGANs still achieve significant improvements over it, which supports the conclusion that the SSN-based sample selection mechanism provides additional performance gains beyond adversarial actor initialization alone.

4.3.5. Discussion of Limitations

Although the proposed SSGANs demonstrate superior performance in the case studies, several limitations should be acknowledged.

(1): The training time of SSGANs is longer than that of standard deep RL methods (e.g., 2.86 h versus 1.35 h for the standard actor–critic in Table 8) because of the additional SSN training and CGAN pretraining stages. In applications that require rapid deployment, this additional training overhead may become a constraint.
(2): The current validation is based on simulation models, and experimental verification on physical microgrid platforms is required to further confirm the practical applicability.
(3): The pretraining data is generated by a conventional PID controller, which may limit the diversity and quality of the initial training samples. Exploring more diverse data sources for pretraining could further improve the performance of SSGANs.
(4): The current work focuses mainly on simulation-based control performance; a theoretical stability analysis of the closed-loop system is not provided.

5. Conclusions

This paper proposes SSGANs for intelligent frequency regulation of microgrids by combining SSNs, CGANs, and the actor–critic framework. The main conclusions are as follows.

(1): SSNs can evaluate sample information values and select informative samples, thereby improving sample utilization and training efficiency.
(2): CGANs can learn the state-conditioned mapping between operating states and control actions, which improves action generation quality and reduces inefficient exploration.
(3): By transferring the pretrained CGAN generator into the actor–critic framework, SSGANs achieve online policy optimization. Case studies show that SSGANs obtain smaller frequency deviation, lower ACE, and better dynamic response performance than the compared algorithms.

In future work, a joint analysis of the business model and operating cost of microgrids could be incorporated into SSGANs to obtain an improved control scheme. The network architecture of SSGANs could also be refined to improve the control effect, and more advanced GAN training stabilization techniques, such as the Wasserstein loss with gradient penalty, could be explored. More detailed economic cost modeling, including operating cost, device degradation, and actuator wear, is a further direction. Finally, a theoretical stability analysis of the closed-loop system remains to be carried out.

Author Contributions

Conceptualization, X.Y., X.O. and R.C.; methodology, X.O. and R.C.; software, K.Y. and X.O.; validation, K.Y., X.W. and T.Z.; formal analysis, K.Y. and X.O.; investigation, X.W. and T.Z.; resources, X.Y. and K.Y.; data curation, X.W. and T.Z.; writing—original draft preparation, X.O. and R.C.; writing—review and editing, X.Y., K.Y. and B.C.; visualization, X.O. and T.Z.; supervision, X.Y., K.Y. and B.C.; project administration, X.Y., K.Y. and B.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the Science and Technology Project of State Grid Sichuan Electric Power Company (Project Name: Research on Key Technologies for Dynamic Assessment and Enhancement of Frequency Support Strength of Hydro-Wind-Solar Coupled System; Project No. 52199723003H).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Authors Xi Ye, Xuetong Ouyang, Baorui Chen, Xi Wang, and Tong Zhu were employed by the company State Grid Sichuan Electric Power Company. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ACE	Area control error
AGC	Automatic generation control
BiLSTM	Bidirectional long short-term memory
BN	Batch normalization
CGANs	Conditional generative adversarial networks
DDPG	Deep deterministic policy gradient
DNNs	Deep neural networks
GANs	Generative adversarial networks
IAE	Integral absolute error
ISE	Integral squared error
ITAE	Integral of time-multiplied absolute error
PPO	Proximal policy optimization
RMSE	Root mean square error
SAC	Soft actor–critic
SGC	Smart generation control
SSGANs	Sample selection generative adversarial networks
SSNs	Sample selection networks
TD3	Twin delayed deep deterministic policy gradient

References

Zhang, Y.; Lu, T.; Zhang, L.; Mei, Y.; Guo, Y.; Wu, S.; Wu, Q.; Xu, Y. Low-carbon optimal dispatching of rural multi-energy microgrid system based on multi-energy conversion and agricultural load demands. Energy 2026, 344, 140024.
Zheng, J.; Zhai, L.; Tao, M.; Tang, W.; Li, Z. Low-Carbon Economic Dispatch in Integrated Energy Systems: A Set-Based Interval Optimization with Decision Support Under Uncertainties. Prot. Control Mod. Power Syst. 2025, 11, 68–87.
Li, Z.; Cheng, Z.; Wang, Y.; Sui, Q. Distributed event-triggered fixed-time SMC-based AGC for power systems with heterogeneous frequency regulation units. IEEE Trans. Ind. Inform. 2024, 20, 8031–8043.
Muduli, R.; Jena, D.; Moger, T. Application of reinforcement learning-based adaptive PID controller for automatic generation control of multi-area power system. IEEE Trans. Autom. Sci. Eng. 2025, 22, 1057–1068.
Çetin, G.; Özkaraca, O.; Keçebaş, A. Development of PID based control strategy in maximum exergy efficiency of a geothermal power plant. Renew. Sustain. Energy Rev. 2021, 137, 110623.
Wang, N.; Hao, F. Event-triggered sliding mode control with adaptive neural networks for uncertain nonlinear systems. Neurocomputing 2021, 436, 184–197.
Xi, L.; Chen, J.F.; Huang, Y.H.; Xu, Y.C.; Liu, L.; Zhou, Y.M.; Li, Y. Smart generation control based on multi-agent reinforcement learning with the idea of the time tunnel. Energy 2018, 153, 977–987.
Yin, L.; He, X. Artificial emotional deep Q learning for real-time smart voltage control of cyber-physical social power systems. Energy 2023, 273, 127232.
Perera, A.T.D.; Kamalaruban, P. Applications of reinforcement learning in energy systems. Renew. Sustain. Energy Rev. 2021, 137, 110618.
Irfan, M.; Deilami, S.; Huang, S.J.; Tahir, T.; Veettil, B.P. Optimizing load frequency control in microgrid with vehicle-to-grid integration in Australia: Based on an enhanced control approach. Appl. Energy 2024, 366, 123317.
Tao, X.; Hafid, A.S. DeepSensing: A novel mobile crowdsensing framework with double deep Q-network and prioritized experience replay. IEEE Internet Things J. 2020, 7, 11547–11558.
Yin, L.F.; Luo, S.K.; Ma, C.X. Expandable depth and width adaptive dynamic programming for economic smart generation control of smart grids. Energy 2021, 232, 120964.
Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
Fujimoto, S.; van Hoof, H.; Meger, D. Addressing Function Approximation Error in Actor-Critic Methods. In Proceedings of the 35th International Conference on Machine Learning, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 1587–1596.
Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the 35th International Conference on Machine Learning, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 1861–1870.
Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
Fu, X.; Zhang, C.; Zhang, X.; Sun, H. A novel GAN architecture reconstructed using Bi-LSTM and style transfer for PV temporal dynamics simulation. IEEE Trans. Sustain. Energy 2024, 15, 2826–2829.
Han, K.; Yang, K.; Yin, L. Lightweight actor-critic generative adversarial networks for real-time smart generation control of microgrids. Appl. Energy 2022, 317, 119163.
Han, C.; Gim, G. Time-series-based anomaly detection in industrial control systems using generative adversarial networks. Processes 2025, 13, 2885.
Li, H.; Misra, S. Reinforcement learning based automated history matching for improved hydrocarbon production forecast. Appl. Energy 2020, 284, 116311.
Pfau, D.; Vinyals, O. Connecting generative adversarial networks and actor-critic methods. arXiv 2016, arXiv:1610.01945.
Peng, B.; Li, X.; Gao, J.; Liu, J.; Wong, K.F. Adversarial advantage actor-critic model for task-completion dialogue policy learning. arXiv 2017, arXiv:1710.11277.
Ye, Y.; Qiu, D.; Sun, M.; Papadaskalopoulos, D.; Strbac, G. Deep reinforcement learning for strategic bidding in electricity markets. IEEE Trans. Smart Grid 2020, 11, 1343–1355.
Kumar, R.; De, M. Advancement in power system resilience through deep reinforcement learning: A comprehensive review. Renew. Sustain. Energy Rev. 2025, 222, 115951.
Sadoughi, N.; Busso, C. Speech-driven expressive talking lips with conditional sequential generative adversarial networks. IEEE Trans. Affect. Comput. 2021, 12, 1031–1044.
Dhawas, P.V.; Bedekar, P.; Nandankar, P.V.; Vaidya, M.G. Localization of HIFS in primary distribution networks using voltage and current sequence components. Expert Syst. Appl. 2024, 242, 122428.
Chow, V. Predicting auction price of vehicle license plate with deep recurrent neural network. Expert Syst. Appl. 2020, 142, 113008.
Michael, N.E.; Bansal, R.C.; Ismail, A.A.A.; Elnady, A.; Hasan, S. A cohesive structure of bi-directional long-short-term memory (BiLSTM)-GRU for predicting hourly solar radiation. Renew. Energy 2024, 222, 119943.
Ding, Y.F.; Chen, Z.J.; Zhang, H.W.; Wang, X.; Guo, Y. A short-term wind power prediction model based on CEEMD and WOA-KELM. Renew. Energy 2022, 189, 188–198.
Schaul, T.; Quan, J.; Antonoglou, I.; Silver, D. Prioritized experience replay. arXiv 2016.
Théate, T.; Ernst, D. An application of deep reinforcement learning to algorithmic trading. Expert Syst. Appl. 2021, 173, 114632.
Xi, L.; Yu, L.; Xu, Y.; Wang, S.X.; Chen, X. A novel multi-agent DDQN-AD method-based distributed strategy for automatic generation control of integrated energy systems. IEEE Trans. Sustain. Energy 2020, 11, 2417–2426.
Li, S.; Gu, C.; Zhao, P.; Cheng, S. A novel hybrid propulsion system configuration and power distribution strategy for light electric aircraft. Energy Convers. Manag. 2021, 238, 114171.
Khokhar, B.; Dahiya, S.; Parmar, S. Load frequency control of a microgrid employing a 2D Sine Logistic map based chaotic sine cosine algorithm. Appl. Soft Comput. 2021, 109, 107564.
Jalali, N.; Razmi, H.; Doagou-Mojarrad, H. Optimized fuzzy self-tuning PID controller design based on Tribe-DE optimization algorithm and rule weight adjustment method for load frequency control of interconnected multi-area power systems. Appl. Soft Comput. 2020, 93, 106424.
Chen, Y.; Xie, Z.; Zhong, J.; Chen, P.; Xiao, J. SQKformer: Spiking sparse QKformer with adaptive batch normalization for membrane potential. Neurocomputing 2026, 671, 132666.
Yin, L.; Wang, T.; Wang, S.; Zheng, B. Interchange objective value method for distributed multi-objective optimization: Theory, application, implementation. Appl. Energy 2019, 239, 1066–1076.
Zhang, J.; Cheng, C.; Yu, S.; Wu, H.; Gao, M. Sharing hydropower flexibility in interconnected power systems: A case study for the China Southern power grid. Appl. Energy 2021, 288, 116645.

Figure 1. Structure of SSNs.

Figure 2. Structure of CGANs.

Figure 3. Structure of SSGANs.

Figure 4. Framework of SSGANs for SGC.

Figure 5. IEEE two-area system.

Figure 6. Training curves of SSNs and CGANs: (a) SSNs; (b) CGANs.

Figure 7. Performance results of SSNs and CGANs: (a) SSNs; (b) CGANs.

Figure 8. Triangular load with Gaussian noise.

Figure 9. Result of online training: (a) curves of ∆f; (b) box diagram of ∆f.

Figure 10. Disturbance curve: (a) resident load; (b) renewable energy.

Figure 11. Dynamic responses of different algorithms in Area A: (a) ∆f; (b) ACE.

Figure 12. Online operation results in Case I: (a) Area 1; (b) Area 2.

Figure 13. China’s southern power grid.

Figure 14. Dynamic responses of different algorithms in Area 1: (a) ∆f; (b) ACE.

Figure 15. Online operation results in Case II: (a) Area 1; (b) Area 2; (c) Area 3; (d) Area 4.

Table 1. Parameters of SSGANs.

Module	Layer	Hidden Units	Activation Function	Batch Normalization Size
Generator–Actor	1	4	ReLU	64
Generator–Actor	2	90	ReLU	64
Generator–Actor	3	90	ReLU	128
Generator–Actor	4	4	Tanh	-
Discriminator	1	4	ReLU	32
Discriminator	2	90	ReLU	32
Discriminator	3	90	ReLU	64
Discriminator	4	2	Sigmoid	-
Critic	1	4	ReLU	32
Critic	2	90	ReLU	32
Critic	3	90	ReLU	64
Critic	4	1	Sigmoid	-
BiLSTM	1	4	ReLU	64
BiLSTM	2	90	ReLU	64
BiLSTM	3	90	ReLU	128
BiLSTM	4	2	Sigmoid	-

Table 2. Main parameters of the comparison algorithms.

Parameter	DDPG	TD3	SAC	PPO	Actor–Critic GAN
Network type	MLP	MLP	MLP	MLP	CGANs-Actor + Critic
Hidden layers	2	2	2	2	2
Hidden units	[128, 128]	[128, 128]	[128, 128]	[128, 128]	[128, 128]
Activation function	ReLU	ReLU	ReLU	ReLU	ReLU
Optimizer	Adam	Adam	Adam	Adam	Adam
Learning rate	1×10⁻⁴/1×10⁻³	1×10⁻⁴/1×10⁻³	3×10⁻⁴/3×10⁻⁴	3×10⁻⁴/1×10⁻³	1×10⁻⁴/1×10⁻³
Discount factor	0.99	0.99	0.99	0.99	0.99
Batch size	128	128	128	128	128
Training steps	10,000	10,000	10,000	10,000	10,000

Table 3. Specific parameter settings of the comparison algorithms.

Algorithms	Specific Settings
DDPG	Replay buffer size = 1×10⁶; soft update coefficient = 0.005; Gaussian exploration noise linearly decayed from 0.20 to 0.05
TD3	Replay buffer size = 1×10⁶; soft update coefficient = 0.005; policy delay = 2; target policy smoothing noise = 0.20
SAC	Replay buffer size = 1×10⁶; soft update coefficient = 0.005; entropy coefficient automatically tuned
PPO	Rollout length = 2048; clip ratio = 0.20; GAE parameter = 0.95; update epochs per batch = 10
actor–critic GAN	Adversarial pretraining; discriminator hidden units = [128, 128]; replay buffer size = 1×10⁶

Table 4. Parameters of Case I.

Symbol	Parameter	Value
T_gA, T_gB	Governor time constant	0.08 s
T_tA, T_tB	Turbine time constant	0.3 s
T_pA, T_pB	Frequency response time constant	20 s
B_A, B_B	Primary frequency bias coefficient	4166 Hz/p.u.
K_A, K_B	Frequency response coefficient	0.00012 Hz/p.u.
R_A, R_B	Secondary frequency deviation coefficient	0.0047
T_AB	Time constant of the tie-line	3.42 s

Table 5. Evaluation indices of different algorithms in Case I.

Area	Algorithm	$\overline{|\Delta f|}$ (Hz)	$\overline{|\mathrm{ACE}|}$ (MW)	ISE	IAE	ITAE (×10⁷)
Area 1	DDPG	0.0106	84.7183	14.3926	884.6251	7.4368
	PPO	0.0094	76.2854	12.8567	803.4927	6.7815
	TD3	0.0078	63.9142	10.3275	674.3186	5.6247
	SAC	0.0076	62.4879	10.0843	661.7352	5.4819
	actor–critic GAN	0.0056	51.2685	6.7429	503.6148	4.2176
	SSGANs	0.0031	43.8267	4.0185	356.2943	3.0268
Area 2	DDPG	0.0103	82.9641	13.8754	852.7365	7.1642
	PPO	0.0091	74.6385	12.3948	781.4693	6.5147
	TD3	0.0075	61.8257	9.8756	642.5871	5.3195
	SAC	0.0074	60.7462	9.6423	631.4268	5.2064
	actor–critic GAN	0.0053	48.9254	6.2851	472.8639	3.9257
	SSGANs	0.0029	41.5376	3.7824	341.7592	2.8463
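The ISE, IAE, and ITAE columns in these tables are integral error indices. The sketch below uses the textbook definitions ISE = ∫e² dt, IAE = ∫|e| dt, and ITAE = ∫t|e| dt with a rectangle-rule approximation. Which signal e denotes (Δf or ACE), the sampling period, and the ×10⁷ scaling of ITAE are set by the paper's own evaluation setup, which is not reproduced here, so treat this as an illustrative assumption rather than the authors' code.

```python
from typing import Sequence, Tuple


def error_integrals(error: Sequence[float], dt: float) -> Tuple[float, float, float]:
    """Rectangle-rule approximations of ISE, IAE and ITAE.

    error[k] is the sampled error e(t_k) at t_k = k * dt, with time measured
    from the start of the evaluation window.
    """
    ise = sum(e * e for e in error) * dt                           # ∫ e(t)^2 dt
    iae = sum(abs(e) for e in error) * dt                          # ∫ |e(t)| dt
    itae = sum(k * dt * abs(e) for k, e in enumerate(error)) * dt  # ∫ t |e(t)| dt
    return ise, iae, itae
```

Because ITAE weights each error by its time of occurrence, it penalizes slow recovery much more than ISE or IAE, which is why it is often reported alongside them.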

Table 6. Parameters of Case II.

Area	T_g (s)	T_t (s)	T_p (s)	B (Hz/p.u.)	K (Hz/p.u.)	R
Area 1	0.08	0.3	20	4166	0.00012	0.0047
Area 2	0.08	0.3	20	3850	0.00012	0.0050
Area 3	0.08	0.3	20	3500	0.00012	0.0052
Area 4	0.08	0.3	20	3700	0.00012	0.0048

Table 7. Evaluation indices of different algorithms in Case II.

Area	Algorithm	$\overline{|\Delta f|}$ (Hz)	$\overline{|\mathrm{ACE}|}$ (MW)	ISE	IAE	ITAE (×10⁷)
Area 1	DDPG	0.0069	141.3285	7.9352	574.2168	4.9826
	PPO	0.0066	136.5942	7.3627	552.4873	4.7765
	TD3	0.0062	129.4736	6.8241	524.9586	4.5487
	SAC	0.0061	127.8365	6.6758	517.3924	4.4823
	actor–critic GAN	0.0055	120.6847	5.9826	487.2645	4.1568
	SSGANs	0.0041	109.5738	5.3154	437.8256	3.8427
Area 2	DDPG	0.0071	145.8264	8.2865	591.7438	5.1254
	PPO	0.0068	140.9573	7.6942	568.3157	4.9086
	TD3	0.0064	133.6285	7.1056	540.8624	4.6635
	SAC	0.0063	132.0746	6.9584	533.7426	4.5982
	actor–critic GAN	0.0057	124.8639	6.2418	496.5284	4.2786
Area 3	SSGANs	0.0042	113.4927	5.5863	501.2468	3.9724
	DDPG	0.0072	148.3657	8.5243	604.3852	5.2847
	PPO	0.0069	143.2185	7.9286	581.7463	5.0625
	TD3	0.0065	136.4728	7.3462	554.3286	4.8264
	SAC	0.0064	134.8651	7.1845	547.1369	4.7583
	actor–critic GAN	0.0058	127.5264	6.4867	513.7942	4.4128
Area 4	SSGANs	0.0043	116.3846	5.7825	462.8165	4.4657
	DDPG	0.0070	143.7926	8.1568	584.9275	5.0642
	PPO	0.0067	138.5264	7.5483	560.3841	4.8427
	TD3	0.0063	131.8472	6.9765	532.9476	4.6128
	SAC	0.0062	130.2857	6.8247	525.6183	4.5461
	actor–critic GAN	0.0056	122.9465	6.1564	492.3857	4.2184
	SSGANs	0.0041	111.8365	6.2489	444.7268	3.9146

Table 8. Evaluation indices of ablation studies.

Algorithm	$\overline{|\Delta f|}$ (Hz)	$\overline{|\mathrm{ACE}|}$ (MW)	ISE	IAE	ITAE (×10⁷)	Training Time (h)	Computing Time (s)
SSGANs	0.0041	109.5738	5.3154	437.8256	3.8427	2.86	0.52
SSGANs without SSNs	0.0051	120.4486	5.7134	467.1862	4.2806	2.53	0.51
SSGANs without CGANs	0.0054	130.4773	6.1776	492.4875	4.5163	1.74	0.53
Actor–Critic	0.0059	141.3985	6.6608	526.7964	4.7757	1.35	0.5

Table 9. Quantitative comparison of dynamic response performance.

Algorithm	Settling Time (s)	Frequency Overshoot (Hz)
DDPG	965	0.146
PPO	842	0.132
TD3	735	0.119
SAC	708	0.113
actor–critic GAN	624	0.101
SSGANs	486	0.084

Table 10. Communication delay analysis of different algorithms.

Delay	Algorithm	$\overline{|\Delta f|}$ (Hz)	$\overline{|\mathrm{ACE}|}$ (MW)	ISE	IAE	ITAE (×10⁷)	Regulation Mileage (MW)
10 ms	DDPG	0.0081	163.8724	9.1046	655.4382	5.7341	2117.86
	PPO	0.0077	156.4826	8.3983	628.9164	5.4718	1964.27
	TD3	0.0072	146.3958	7.7395	592.8731	5.1784	1786.39
	SAC	0.0071	144.6287	7.5624	584.9365	5.0867	1735.42
	actor–critic GAN	0.0064	135.9276	6.8137	551.4628	4.7825	1548.73
	SSGANs	0.0049	123.6845	5.9276	476.3184	4.2865	1328.54
20 ms	DDPG	0.0094	188.5367	10.5864	756.8291	6.6237	2468.15
	PPO	0.0089	179.4825	9.7246	718.5372	6.2984	2296.43
	TD3	0.0082	165.7934	8.9185	666.4829	5.8925	2075.86
	SAC	0.0081	163.2846	8.7348	657.2396	5.7862	2014.38
	actor–critic GAN	0.0073	153.8462	7.8463	613.9584	5.4176	1796.27
	SSGANs	0.0058	141.9273	6.7824	538.6427	4.9738	1517.69
30 ms	DDPG	0.0112	221.6845	12.6482	897.4163	7.8654	2942.68
	PPO	0.0105	209.7538	11.5237	846.5284	7.4326	2734.15
	TD3	0.0097	191.4827	10.4526	778.3495	6.9148	2468.72
	SAC	0.0095	188.9364	10.2185	765.2841	6.7845	2394.56
	actor–critic GAN	0.0086	176.5482	9.1487	701.6283	6.2857	2142.83
	SSGANs	0.0069	166.4386	7.9465	627.8153	5.8462	1748.35

Table 11. Paired significance analysis of SSGANs on the two main control indices.

Algorithm	Metric	Mean Difference	Paired t-Test p-Value	Wilcoxon p-Value	95% Bootstrap CI
DDPG	$\overline{|\Delta f|}$ (Hz)	0.0013	<0.001	<0.001	[0.0010, 0.0016]
DDPG	$\overline{|\mathrm{ACE}|}$ (MW)	22.7099	<0.001	<0.001	[18.72, 26.91]
TD3	$\overline{|\Delta f|}$ (Hz)	0.0009	0.001	0.001	[0.0006, 0.0011]
TD3	$\overline{|\mathrm{ACE}|}$ (MW)	15.0188	0.001	0.001	[11.03, 18.76]
SAC	$\overline{|\Delta f|}$ (Hz)	0.0007	0.002	0.002	[0.0004, 0.0009]
SAC	$\overline{|\mathrm{ACE}|}$ (MW)	9.7874	0.002	0.002	[6.21, 12.94]
PPO	$\overline{|\Delta f|}$ (Hz)	0.0011	<0.001	<0.001	[0.0008, 0.0014]
PPO	$\overline{|\mathrm{ACE}|}$ (MW)	18.1640	<0.001	0.001	[13.47, 21.82]
actor–critic GAN	$\overline{|\Delta f|}$ (Hz)	0.0004	0.018	0.015	[0.0001, 0.0006]
actor–critic GAN	$\overline{|\mathrm{ACE}|}$ (MW)	4.5995	0.014	0.012	[1.72, 7.33]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ye, X.; Ouyang, X.; Chen, B.; Wang, X.; Zhu, T.; Yang, K.; Chen, R. Sample Selection Generative Adversarial Networks for Intelligent Frequency Regulation of Microgrids. Processes 2026, 14, 1872. https://doi.org/10.3390/pr14121872

