1. Introduction
In the context of the “dual-high” era, characterized by a high proportion of renewable energy and highly volatile loads, distributed energy sources such as wind and solar power have been widely deployed [1,2,3]. Because these sources are typically integrated into the grid through power electronic converters, their large-scale grid connection leads to low-inertia and weak-damping characteristics, which pose severe challenges to the frequency stability and dynamic regulation capability of the power system [4]. To enhance the system’s dynamic support capability, researchers have proposed Virtual Synchronous Generator (VSG) control, which emulates the inertia, electromagnetic, and damping characteristics of a synchronous generator (SG) within an inverter. This gives the system adjustable virtual inertia and damping, thereby improving frequency stability and dynamic response quality under high renewable penetration [5]. However, while providing inertia support, VSGs also inherit the low-frequency oscillation risks of SGs. Unlike the physical parameters of traditional SGs, VSG parameters are adjustable, and the dynamic response can be improved by tuning the virtual inertia J, the damping coefficient D, and the active power droop coefficient [6]. If these parameters are chosen inappropriately, both frequency and power oscillations may be amplified. Therefore, achieving adaptive optimal parameter tuning under complex disturbances is of great research significance [7].
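To illustrate how J, D, and the droop coefficient shape the transient response, the following is a minimal Python sketch of a textbook VSG swing equation with active power droop, connected to a stiff grid through a lossless inductive link. This model is an assumption for illustration only and is not the VSG model of Section 2; the function name `simulate_vsg_step` and all numerical values are hypothetical.

```python
import math


def simulate_vsg_step(J, D, k_w, p_step=5000.0, p_max=20000.0,
                      w0=2 * math.pi * 50, dt=1e-4, t_end=2.0):
    """Forward-Euler simulation of an assumed textbook VSG swing model.

    J * d(dw)/dt = (P_m - P_e) / w0 - D * dw
    P_m = p_step - k_w * dw     (active power droop on the frequency deviation)
    P_e = p_max * sin(delta)    (power transfer to a stiff grid)
    d(delta)/dt = dw            (grid frequency fixed at w0)
    Returns the trace of the angular frequency deviation dw in rad/s.
    """
    delta, dw = 0.0, 0.0
    trace = []
    for _ in range(round(t_end / dt)):
        p_m = p_step - k_w * dw           # droop reduces input power when dw > 0
        p_e = p_max * math.sin(delta)     # electrical output power
        ddw = ((p_m - p_e) / w0 - D * dw) / J
        delta += dw * dt                  # explicit Euler update of the power angle
        dw += ddw * dt                    # explicit Euler update of the deviation
        trace.append(dw)
    return trace


low_damping = simulate_vsg_step(J=0.2, D=2.0, k_w=500.0)
high_damping = simulate_vsg_step(J=0.2, D=20.0, k_w=500.0)
print(max(abs(x) for x in low_damping), max(abs(x) for x in high_damping))
```

In this toy model, raising D from 2 to 20 lowers the peak frequency deviation after a 5 kW power-reference step, which is the kind of parameter sensitivity that motivates adaptive tuning.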
Existing research on adaptive control of VSG parameters can be grouped into three categories. The first category includes mechanism-model-based adaptive adjustment methods, which rely on accurate mathematical models of the system and construct adjustment laws through small-signal analysis, linear control techniques, or model predictive control [8,9]. Reference [10] adopts current model predictive control for adaptive tuning of VSG parameters, resolving the contradiction between dynamic response and steady-state accuracy in traditional control. References [11,12] apply a linear quadratic regulator (LQR) to design adaptive tuning rules for virtual inertia and damping. Although these methods are theoretically rigorous, they require high model accuracy and accessible system parameters, which makes them difficult to apply in high-renewable systems with inherent uncertainties and nonlinearities [13].
The second category consists of experience-based parameter adjustment strategies that do not require complex system modeling. Reference [14] proposes a bang–bang adaptive control strategy that adaptively selects virtual inertia and damping based on the frequency deviation and its rate of change. In [15], a controller combining fuzzy logic and a differential algorithm is used to adjust virtual inertia and damping, improving frequency stability in low-inertia microgrids with high renewable penetration. Reference [16] eliminates the need for precise mathematical modeling through expert knowledge and system measurement data, enabling the controller to adjust the virtual inertia parameter of the VSG. However, such methods rely heavily on thresholds and rule design, making it difficult to achieve robust performance and fast dynamics under complex disturbances.
The third category comprises intelligent optimization and data-driven strategy-learning approaches. References [17,18,19] use intelligent optimization algorithms such as particle swarm optimization and genetic algorithms for VSG parameter tuning, but they still depend on accurate modeling. In recent years, data-driven reinforcement learning (RL) methods have been introduced into VSG parameter coordination control and have demonstrated strong performance in model-free scenarios. For example, reference [20] applies Q-learning to VSG parameter tuning but outputs only a one-dimensional action, the virtual inertia J, and its reward function considers only the frequency deviation. Although this improves the dynamic frequency and active power response to some extent, algorithm efficiency deteriorates significantly as the state and action dimensions increase. To address this, reference [21] proposes a DQN algorithm that replaces the Q-table of Q-learning with a neural network to handle continuous states, with the virtual inertia J and damping D as elements of a discrete action space. Reference [22] adopts the DDPG algorithm to adjust J and D simultaneously in a continuous action space, achieving better dynamic frequency performance. Reference [23] applies VSG control to modular multilevel converter (MMC) structures and uses the TD3 algorithm to adjust the virtual inertia J, damping D, and voltage reference E.
Reference [24] further verifies the superiority of the Soft Actor–Critic (SAC) algorithm over other algorithms in VSG parameter adaptive control, demonstrating its effectiveness in suppressing power and frequency oscillations and shortening the stabilization time. However, references [20,21,22,23,24] typically limit their action space to the virtual inertia J and damping coefficient D and do not incorporate the active power droop coefficient, which also significantly influences system dynamic characteristics (see the analysis in Section 2.2), into a framework for real-time cooperative optimization. This limitation leaves insufficient flexibility for parameter adjustment under complex disturbances and restricts further improvement of the system’s transient performance.
Building on the aforementioned issues, this paper adopts the Soft Actor–Critic (SAC) reinforcement learning algorithm as the foundational framework for investigating optimal VSG parameter regulation. SAC encourages policy exploration through an entropy regularization term and is known for good sample efficiency and training stability in stochastic environments [25].
The main contributions of this paper are as follows.
A parameter adaptive control framework combining fuzzy logic and the Soft Actor–Critic (SAC) algorithm is proposed. By dividing the VSG operation process into different transient regions and constructing a state-cognition vector based on fuzzy membership, the controller can perform more targeted policy optimization for different transient characteristics.
A reward-design mechanism that integrates transient-region guidance and autonomous exploration is developed. This method applies expert-experience-based action limiting and guidance according to different transient regions of the VSG system, improving SAC-VSG training stability while ensuring system safety.
Beyond traditional SAC-VSG strategies that adjust only the virtual inertia and damping, this work further investigates how perturbations of the active power droop coefficient affect VSG transient performance, and proposes an improved SAC-based multi-parameter coordinated control strategy (ISAC-VSG) to enhance transient performance and robustness.
The rest of this paper is organized as follows:
Section 2 introduces the basic VSG model, system performance analysis, and parameter tuning.
Section 3 presents the improved SAC-VSG multi-parameter coordinated control strategy.
Section 4 verifies the effectiveness of the proposed method using MATLAB/Simulink simulations.
Section 5 provides conclusions and perspectives for future work.
3. VSG Multi-Parameter Collaborative Control Strategy Based on the Improved SAC-VSG
The grid-connected system employing the proposed improved SAC-VSG multi-parameter cooperative control strategy is illustrated in Figure 3. The system comprises a DC power supply, an LC filter (filter inductance and filter capacitance), and a grid-side inductance. The fundamental control layer consists of the active-power control loop and the reactive-power control loop. The improvement lies in the multi-parameter cooperative control strategy based on the improved SAC-VSG. In this strategy, the normalized angular frequency deviation and its derivative, the active-power deviation and its derivative, and the transient membership vector of the VSG are fed into the agent. The agent then outputs an optimal set of the virtual inertia J, damping coefficient D, and active-power droop coefficient. These parameters are returned to the basic control layer to obtain the inverter output voltage magnitude E and the impedance angle of the equivalent VSG impedance. After passing through the dual voltage–current control loops and SPWM modulation, the resulting signals drive the VSG main circuit, completing the overall closed-loop control.
3.1. Construction of the Fuzzy Five-Dimensional Membership Vector
3.1.1. Regional Division and Physical Meaning
When the power system is subjected to disturbances, the frequency and active-power output of the VSG go through a complete transient process. This process can generally be divided into several typical stages, each with distinct dynamic characteristics and physical significance. To enable adaptive parameter adjustment and efficient agent learning, the transient process is divided into five stages based on the frequency deviation and its variation trend, and these stages are represented continuously through fuzzy membership functions.
As shown in Figure 4, when the VSG operates at the steady-state point o, a disturbance requires the power angle to change from its initial value to a new value, moving the VSG operating point from o to a. Because of the inertia and damping dynamics, however, the trajectory does not move directly from o to a but instead follows an oscillatory path.
In the stable region, the system remains steady, with both the frequency deviation and its rate of change close to zero. In the two regions covering the initial disturbance and acceleration stages, the frequency deviation is still growing in magnitude, so the system is in an acceleration phase. Therefore, J should be increased to enhance inertia, while D and the droop coefficient are enlarged to suppress the frequency drop. In the two subsequent regions, the frequency deviation begins to recover while the power angle is still increasing, and the system enters a deceleration phase. Here, J and the droop coefficient should be reduced to accelerate frequency recovery; because this weakens damping against oscillations, D must be increased to suppress overshoot.
The above analysis shows that real-time coordinated adjustment of J, D, and the droop coefficient is required to effectively mitigate frequency oscillations and enhance the system’s disturbance rejection. The adjustment directions of these parameters according to the frequency deviation and its rate of change are summarized in Table 2.
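The rule logic above can be sketched as a hard partition of the (frequency deviation, rate of change) plane. This is a minimal Python sketch under stated assumptions: the region labels, the split of the two acceleration and two deceleration regions by the sign of the deviation, the stable-region thresholds, and the all-zero adjustment for the stable region are hypothetical; only the increase/decrease directions for J, D, and the droop coefficient follow the prose above.

```python
# Parameter-adjustment directions (J, D, droop) from the prose of Section 3.1.1.
RULES = {
    "stable": (0, 0, 0),    # keep all parameters unchanged (assumed)
    "accel": (1, 1, 1),     # increase J, D, and the droop coefficient
    "decel": (-1, 1, -1),   # decrease J and the droop coefficient, increase D
}


def classify_region(dw, dw_rate, dw_tol=1e-3, rate_tol=1e-3):
    """Return one of five hypothetical region labels for a VSG operating point."""
    if abs(dw) < dw_tol and abs(dw_rate) < rate_tol:
        return "stable"
    # Deviation and its rate share a sign: the deviation is still growing.
    phase = "accel" if dw * dw_rate >= 0 else "decel"
    side = "+" if (dw > 0 or (dw == 0 and dw_rate > 0)) else "-"
    return phase + side


def rule_for(region):
    """Look up the (J, D, droop) adjustment direction for a region label."""
    return RULES[region.rstrip("+-")]


print(classify_region(0.5, -2.0), rule_for(classify_region(0.5, -2.0)))
```

Such a hard partition switches abruptly at region boundaries, which is the discontinuity that the fuzzy membership vector of Section 3.1.2 is designed to avoid.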
3.1.2. Design of the Membership Function
To avoid discontinuities caused by hard partitioning of the transient regions, a five-dimensional membership vector construction method based on fuzzy partitioning is proposed. The core idea is to map the frequency deviation and its rate of change into a two-dimensional feature space and apply Gaussian basis functions for soft division of the five transient regions. This yields a smooth five-dimensional membership vector that serves as an auxiliary observation for the agent. The method enables smooth transitions across region boundaries, thereby improving the stability and robustness of the control strategy. The overall process is shown in Figure 5.
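To ensure that different physical quantities are handled on the same scale, the frequency deviation and its rate of change are first normalized, and a coupling term e based on their product is introduced. Three scaling factors are used for this purpose, one each for the frequency deviation, its rate of change, and the coupling term. For each stage, an unnormalized Gaussian weight is then defined, in which the fuzzy widths set the spread of each region along the two normalized axes, a coupling coefficient weights the contribution of the coupling term, and a region-specific sign indicates the desired sign of e in that region. The weights are then normalized across the five regions; a small positive constant in the denominator prevents division by zero, so that each membership degree lies in [0, 1] and the five degrees sum to one. To suppress high-frequency jitter, a first-order low-pass filter, characterized by the sampling period and a time constant in seconds, is applied to the normalized membership vector.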
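A minimal Python sketch of this soft partition, assuming one plausible form of the Gaussian weight: each region is given a hypothetical prototype point in the normalized plane and a desired sign of the coupling term, the exponent adds the coupling term scaled by the coupling coefficient, and the normalized inputs are clipped to [−1, 1]. The prototype points, clipping, default widths, and function names are assumptions, not the paper's equations; the low-pass filter is the standard backward-Euler discretization of a first-order lag.

```python
import math

# Hypothetical region prototypes in the normalized (x, y) plane:
# (label, x centre, y centre, desired sign of the coupling term e)
REGIONS = [
    ("stable", 0.0, 0.0, 0.0),
    ("accel+", 1.0, 1.0, 1.0),
    ("decel+", 1.0, -1.0, -1.0),
    ("accel-", -1.0, -1.0, 1.0),
    ("decel-", -1.0, 1.0, -1.0),
]


def _clip(v, lo=-1.0, hi=1.0):
    return max(lo, min(hi, v))


def membership(dw, dw_rate, s_w, s_r, s_e,
               sigma_x=0.5, sigma_y=0.5, kappa=1.0, eps=1e-9):
    """Soft five-region membership vector built from Gaussian basis functions."""
    x = _clip(dw / s_w)              # normalized frequency deviation
    y = _clip(dw_rate / s_r)         # normalized rate of change
    e = _clip(dw * dw_rate / s_e)    # normalized coupling (product) term
    weights = []
    for _, cx, cy, sign in REGIONS:
        expo = (-((x - cx) ** 2) / (2 * sigma_x ** 2)
                - ((y - cy) ** 2) / (2 * sigma_y ** 2)
                + kappa * sign * e)
        weights.append(math.exp(expo))
    total = sum(weights) + eps       # eps guards against division by zero
    return [w / total for w in weights]


def low_pass(mu_prev, mu_new, ts, tau):
    """First-order low-pass filter, backward-Euler form: y += a * (x - y)."""
    alpha = ts / (tau + ts)
    return [p + alpha * (n - p) for p, n in zip(mu_prev, mu_new)]


mu = membership(0.5, 2.0, s_w=0.5, s_r=2.0, s_e=1.0)
print([round(m, 3) for m in mu])
```

Because every region receives a nonzero weight, the membership vector changes continuously as the operating point crosses a region boundary.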
The fuzzy parameters are designed according to the following principles. The three scaling factors are set according to the system’s permissible maximum frequency deviation, its maximum rate of change of frequency, and the typical magnitude of their product, respectively. The fuzzy widths control the smoothness of regional transitions, and their values ensure a reasonable overlap between adjacent transient regions. Specific parameters are initially determined through offline analysis of typical disturbance scenarios, as detailed in Table 3. It should be emphasized that the fuzzy parameter design in this paper aims to align the region division with the physical transient characteristics of the VSG, rather than to find a single optimal solution. Within the reinforcement learning framework, the agent can learn to adapt to different parameter configurations; therefore, as long as the parameters remain within a reasonable physical range, the proposed control strategy can maintain stable performance.
3.2. Establishment of a Markov Decision Model Based on SAC with Fuzzy Membership Vector
The frequency and power oscillation problem of the VSG under disturbances is modeled as a multi-objective optimization problem. Unlike conventional single-objective methods, the proposed framework incorporates partitioned optimization objectives corresponding to different transient stages, ensuring both frequency and power dynamic performance. The parameter adjustment of the VSG is formulated as a Markov Decision Process (MDP) comprising a state space, an action space, and a reward function [25].
3.2.1. State Space
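To suppress frequency and power oscillations under disturbances while applying stage-wise control strategies that recognize different transient responses, the VSG state is defined as a vector consisting of the normalized angular frequency deviation and its derivative, the normalized active-power deviation and its derivative, and the five-dimensional fuzzy membership vector. The active-power and angular-frequency deviations are normalized by their respective normalization coefficients.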
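The state definition can be sketched as follows. This is a minimal Python sketch assuming that the derivatives are obtained by backward differences over one sampling period and that the components are ordered as listed above; the function name `build_state` and its arguments are hypothetical.

```python
def build_state(dw, dw_prev, dp, dp_prev, mu, k_w, k_p, ts):
    """Assemble a hypothetical 9-dimensional observation for the SAC agent.

    dw, dp: current angular frequency and active-power deviations
    dw_prev, dp_prev: the same quantities one sampling period earlier
    mu: five-dimensional fuzzy membership vector
    k_w, k_p: normalization coefficients; ts: sampling period in seconds
    """
    x_w = dw / k_w                           # normalized angular frequency deviation
    x_p = dp / k_p                           # normalized active-power deviation
    x_w_rate = (dw - dw_prev) / (k_w * ts)   # backward-difference derivative
    x_p_rate = (dp - dp_prev) / (k_p * ts)
    return [x_w, x_w_rate, x_p, x_p_rate, *mu]


state = build_state(0.2, 0.1, 500.0, 0.0, [1.0, 0.0, 0.0, 0.0, 0.0],
                    k_w=2.0, k_p=1000.0, ts=0.01)
print(len(state))
```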
3.2.2. Action Space
The action vector consists of the control variables directly adjusted by the agent, namely the changes in the virtual inertia, damping coefficient, and droop coefficient. Accordingly, the actual virtual inertia, damping coefficient, and droop coefficient applied to the VSG are obtained by adjusting their respective initial values according to these changes.
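One way to realize this mapping is sketched below in Python. It assumes an additive form in which each normalized action in [−1, 1] is scaled by a maximum step and added to the initial value, followed by amplitude limiting to a feasible range; the additive form, the step sizes, the bounds, and the function name `apply_action` are all hypothetical.

```python
def apply_action(action, initial, max_step, lower, upper):
    """Map a normalized action in [-1, 1]^3 to (J, D, droop) values.

    Assumed form: value = clip(initial + a * max_step, lower, upper).
    """
    out = []
    for a, x0, dx, lo, hi in zip(action, initial, max_step, lower, upper):
        a = max(-1.0, min(1.0, a))                 # guard the normalized action
        out.append(min(hi, max(lo, x0 + a * dx)))  # amplitude limiting
    return tuple(out)


J, D, droop = apply_action(
    action=(1.0, -0.5, 0.0),
    initial=(0.2, 10.0, 500.0),    # hypothetical initial J, D, droop
    max_step=(0.1, 5.0, 200.0),    # hypothetical maximum adjustment per step
    lower=(0.05, 1.0, 100.0),
    upper=(0.25, 30.0, 1000.0),
)
print(J, D, droop)
```

Limiting the output range in this way keeps the learned parameters inside a physically permissible region even when the policy saturates.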
3.2.3. Reward Function
The reward function is designed to not only optimize the power and frequency performance but also account for the stage-specific transient behavior of the VSG. By introducing a fuzzy membership vector corresponding to the transient stages, the agent can learn differentiated optimization policies that achieve more targeted control in each phase.
During disturbances, smaller frequency deviations and shorter oscillation durations are preferred, and the frequency penalty is defined accordingly. Similarly, the active power is expected to fluctuate minimally around its reference and return quickly to steady state, which is reflected in a corresponding active-power penalty.
To integrate the expert adjustment rules from Table 2 into the learning process, an expected direction matrix E is constructed, encoding the expected variation trend of each parameter across the five transient regions. Each entry of E represents the expected directional tendency of one parameter (J, D, or the droop coefficient) in one stage: ‘1’ indicates that the parameter should be increased, ‘−1’ indicates that it should be decreased, and ‘0’ indicates that it should remain unchanged. The entries of matrix E are fully consistent with the rules presented in Table 2.
Given the fuzzy membership vector at the current time step, the comprehensive expected direction vector for the three parameters is computed as the membership-weighted sum of the rows of E, so that each of its components represents the desired direction of variation for one parameter. The directional reward is then given by the cosine similarity between the actual action vector and this expected direction vector.
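The expected direction and the directional reward can be sketched in Python as follows. The region ordering of the rows and the all-zero row for the stable region are assumptions; the other rows follow the increase/decrease rules stated in Section 3.1.1. The small constant in the denominator, which makes the reward zero when the expected direction vanishes, is also an assumption.

```python
import math

# Rows: stable, accel+, decel+, accel-, decel- (hypothetical ordering).
# Columns: J, D, droop coefficient.
E = [
    [0, 0, 0],
    [1, 1, 1],
    [-1, 1, -1],
    [1, 1, 1],
    [-1, 1, -1],
]


def expected_direction(mu, e=E):
    """Membership-weighted sum of the rows of E (equivalently E^T mu)."""
    return [sum(m * row[j] for m, row in zip(mu, e)) for j in range(3)]


def direction_reward(action, d, eps=1e-8):
    """Cosine similarity between the action vector and the expected direction."""
    dot = sum(a * b for a, b in zip(action, d))
    norm_a = math.sqrt(sum(a * a for a in action))
    norm_d = math.sqrt(sum(b * b for b in d))
    return dot / (norm_a * norm_d + eps)


d = expected_direction([0.0, 0.5, 0.5, 0.0, 0.0])
print(d)  # at an accel/decel boundary only the D component survives
```

Near a boundary between an acceleration and a deceleration region, the opposing J and droop rules cancel while the common rule for D remains, so the guidance fades smoothly instead of switching.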
In addition, an action smoothness penalty on large differences between actions at adjacent time steps constrains the rate of change of the virtual inertia, damping, and droop coefficients.
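Finally, the total reward is formulated as a weighted combination of the frequency penalty, the active-power penalty, the directional consistency reward, and the action smoothness penalty.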
The weight of each term in the reward function is determined according to the principles of multi-objective optimization, with priority given to system safety and stable operation. First, frequency stability is the primary objective, so the frequency deviation penalty receives the largest weight. Active power tracking is a fundamental function of the VSG, so its penalty receives the second-highest weight. The directional consistency reward incorporates expert knowledge, and its weight is calibrated so that its magnitude is comparable to that of the main penalty terms during the early stages of training. However, this weight should not be excessively large, because the agent must remain free to explore appropriate parameter values within a reasonable range. The action smoothness penalty weight is set relatively small, so that it does not overly restrict the dynamic adjustment capability while still suppressing detrimental high-frequency oscillations.
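The reward structure can be sketched as below. This Python sketch assumes quadratic penalties on the normalized deviations and their rates, a squared-difference smoothness penalty, and hypothetical weight values; only the ordering (frequency weight largest, power weight second, smoothness weight small) follows the text. All function names are hypothetical.

```python
def frequency_penalty(x_w, x_w_rate, beta=0.1):
    """Assumed quadratic penalty on the normalized frequency deviation and its rate."""
    return x_w ** 2 + beta * x_w_rate ** 2


def power_penalty(x_p, x_p_rate, beta=0.1):
    """Assumed quadratic penalty on the normalized active-power deviation and its rate."""
    return x_p ** 2 + beta * x_p_rate ** 2


def smoothness_penalty(action, prev_action):
    """Squared difference between actions at adjacent time steps."""
    return sum((a - b) ** 2 for a, b in zip(action, prev_action))


def total_reward(p_f, p_p, r_dir, p_s, w_f=1.0, w_p=0.5, w_d=0.3, w_s=0.05):
    """Weighted sum: penalties are subtracted, the directional reward is added."""
    return -w_f * p_f - w_p * p_p + w_d * r_dir - w_s * p_s


r = total_reward(
    p_f=frequency_penalty(0.2, 1.0),
    p_p=power_penalty(0.1, 0.5),
    r_dir=0.8,
    p_s=smoothness_penalty([0.2, 0.1, 0.0], [0.1, 0.1, 0.0]),
)
print(r)
```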
3.3. Improved SAC-VSG Algorithm
The Soft Actor–Critic (SAC) algorithm is adopted to solve the MDP of VSG frequency optimization. Compared with conventional RL algorithms, SAC handles continuous action spaces more effectively and has been reported to converge faster and train more stably [28].
Figure 6 illustrates the overall architecture of the improved SAC-VSG algorithm. Compared with the traditional SAC-VSG, this framework incorporates an expert knowledge guidance module. This module first fuzzifies the real-time VSG measurements (the frequency deviation and its rate of change) into a membership degree vector. Then, by combining this vector with the expert rule matrix E derived from Table 2, it synthesizes an expected direction vector. The expected direction vector serves as a soft guidance signal for the SAC agent, encouraging it, through the directional reward term (Equation (23)), to output actions consistent with expert experience. Concurrently, the membership degree vector is also provided as part of the state input to the SAC agent, enabling transient-region-aware perception. This design lets the algorithm use expert domain knowledge to enhance interpretability while retaining the ability of reinforcement learning to explore optimal control policies autonomously.
3.3.1. Actor Network
In the improved SAC-VSG framework, the Actor network outputs the adjustments of J, D, and the droop coefficient, which are essential for the stable grid-connected operation of the VSG. The policy network adopts a stochastic Gaussian policy: it outputs the mean and log standard deviation of the action distribution, from which a pre-squashing action is sampled. To keep sampling differentiable, the reparameterization trick is applied, expressing the sample as the mean plus the standard deviation multiplied by standard Gaussian noise. A hyperbolic tangent squashing function is then applied, which guarantees that the final action lies within the normalized range [−1, 1].
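The sampling step can be sketched numerically in plain Python. This is a sketch of the standard tanh-squashed Gaussian used in SAC, not the paper's network code: the log-standard-deviation bounds (−20, 2) and the 1e-6 stabilizer are common conventions assumed here, and in a real agent the same arithmetic runs inside an automatic-differentiation framework so that gradients flow through the mean and standard deviation.

```python
import math
import random


def sample_squashed_action(mean, log_std, rng, log_std_min=-20.0, log_std_max=2.0):
    """Reparameterized sample from a tanh-squashed diagonal Gaussian policy.

    Returns the squashed action (each entry in (-1, 1)) and its log-probability,
    including the tanh change-of-variables correction.
    """
    actions, log_prob = [], 0.0
    for m, ls in zip(mean, log_std):
        ls = max(log_std_min, min(log_std_max, ls))  # keep the std in a sane range
        std = math.exp(ls)
        xi = rng.gauss(0.0, 1.0)   # noise drawn independently of the network outputs
        u = m + std * xi           # reparameterization: u = mean + std * noise
        a = math.tanh(u)           # squash into (-1, 1)
        # Gaussian log-density of u, then the tanh Jacobian correction
        log_prob += -0.5 * xi * xi - ls - 0.5 * math.log(2.0 * math.pi)
        log_prob -= math.log(1.0 - a * a + 1e-6)
        actions.append(a)
    return actions, log_prob


rng = random.Random(0)
action, logp = sample_squashed_action([0.0, 0.3, -0.2], [-1.0, -1.0, -1.0], rng)
print(action, logp)
```

The Jacobian term matters because the entropy bonus in SAC is computed from the log-probability of the squashed action, not of the raw Gaussian sample.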
3.3.2. Critic Network
The Critic network evaluates state–action pairs by estimating their expected return (Q-value), thus providing gradient information for updating the Actor. A twin Q-network architecture is employed to suppress overestimation bias, and the minimum of the two Q-values is used in the target computation. The soft Q-value target follows the soft Bellman equation, which adds an entropy bonus to the clipped double-Q estimate at the next state. The target network parameters are updated by Polyak averaging, in which a small smoothing coefficient determines the update rate and helps maintain training stability.
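The critic target and the target-network update can be sketched as scalar Python functions. This follows the standard SAC formulation (clipped double-Q minus the entropy term, discounted and masked at episode end); the default discount, temperature, and smoothing values are common choices assumed here, and the parameter vectors are represented as plain lists for illustration.

```python
def soft_q_target(reward, done, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2):
    """Soft Bellman target: r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi)."""
    soft_value = min(q1_next, q2_next) - alpha * logp_next
    return reward + gamma * (1.0 - done) * soft_value


def polyak_update(target, online, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * wt for wt, w in zip(target, online)]


y = soft_q_target(reward=1.0, done=0.0, q1_next=2.0, q2_next=3.0, logp_next=-1.0,
                  gamma=0.5, alpha=0.2)
print(y)
print(polyak_update([0.0, 1.0], [1.0, 1.0], tau=0.1))
```

Taking the minimum of the two target critics counteracts the upward bias of a single learned Q-function, while a small smoothing coefficient makes the target drift slowly toward the online critic.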
3.3.3. Network Architecture
As shown in Figure 7, the Actor network and the Critic networks are all fully connected feedforward neural networks, whose specific configurations are given in that figure.
3.3.4. Overall Process of Parameter Adjustment Control Based on the Improved SAC-VSG Strategy
Figure 8 illustrates the overall flow of parameter adjustment control based on the improved SAC-VSG strategy. First, initial values of the virtual inertia J, damping coefficient D, and active power droop coefficient are selected within their permissible ranges. These initial values, together with the model of the grid-connected VSG system, define the environment whose states are fed to the agent for training. At every step the agent receives a reward, and the rewards are accumulated over each episode; training continues until the cumulative reward is maximized. The values of J, D, and the droop coefficient produced by the agent’s policy are then amplitude-limited and optimized to obtain the parameter values for the next time step, completing the closed-loop control.
5. Conclusions
The improved SAC-based VSG multi-parameter coordinated control strategy (ISAC-3P) integrates the advantages of fuzzy logic and reinforcement learning, achieving a state–action dual-layer optimization under the Markov decision framework. This provides a robust and disturbance-resilient control solution for renewable-energy-based power systems.
By introducing a five-dimensional fuzzy membership vector and designing a reward function that combines stage-based guidance with autonomous exploration, the controller can perform differentiated optimization according to the VSG’s transient stages, effectively improving transient response speed and oscillation suppression. Meanwhile, a multi-parameter coordinated control mechanism with feasible-region constraints was developed, enabling adaptive coordination of virtual inertia, damping, and active power droop coefficients. This achieves a balanced trade-off between power dynamics and frequency stability, further enhancing system transient performance and robustness.
The ISAC-VSG strategy demonstrates superior transient performance under multiple disturbance conditions, validating its effectiveness. However, this study has certain limitations. The validation is based primarily on simulation models of a single grid-connected system, and its coordination performance in complex multi-machine interconnected networks requires further investigation. Future research will focus on the following directions: (1) extending the proposed architecture to multi-machine parallel VSG systems and investigating their coordination and communication interaction mechanisms; (2) extending the proposed framework to other advanced DRL algorithms, such as TD3.