1. Introduction
The penetration rate of renewable energy sources such as wind power and photovoltaic power in power systems keeps increasing. A large number of power electronic converters have replaced traditional synchronous generator sets, which has led to a significant reduction in the equivalent rotational inertia and continuous deterioration of the damping characteristics of power grids. Consequently, the system is prone to rapid frequency drop, power oscillation and other issues under disturbances, seriously endangering the safe and stable operation of power grids [1]. By simulating the inertia, damping, frequency regulation and voltage regulation characteristics of synchronous generators, virtual synchronous generator (VSG) control can provide effective inertia support as well as frequency and voltage support for low-inertia power systems and has become a mainstream control scheme for new energy grid integration, microgrids and grid-forming converters. Nevertheless, under practical operating conditions including a weak grid, high resistance-to-reactance ratio and large power angle, strong cross-coupling exists between the active and reactive power control loops of VSGs [2]. This not only causes steady-state errors and distorted dynamic responses of reactive power but also exacerbates frequency and voltage oscillations, restricts power transmission capacity, and even triggers system instability. Recent studies have indicated that coupling interactions may become a dominant source of instability in power-electronic-based energy conversion systems under weak grid conditions. In particular, an analysis of coupling dissipativity has revealed that severe coupling dynamics rather than local subsystem dynamics can trigger oscillatory instability, highlighting the importance of coupling suppression for system stability enhancement [3]. Therefore, realizing the efficient power decoupling of VSGs under complex operating conditions and collaboratively improving inertia–damping support and dynamic stability have become critical problems to be solved urgently.
To address the problems of the power oscillation and frequency stability of VSGs, scholars have conducted extensive studies on adaptive virtual inertia and damping control. Ref. [4] proposes a distributed adaptive virtual inertia control strategy for multiple grid-connected VSGs, which adjusts inertia adaptively according to the deviation between local frequency and regional average frequency to suppress power oscillations and optimize frequency dynamic responses, yet it ignores the influence of power coupling on control performance. Ref. [5] applies adaptive virtual inertia to the VSG control of AC microgrids, increasing inertia during frequency deviation and reducing it during frequency recovery to balance frequency support and rapid convergence, but it lacks an active–reactive power decoupling design. In [6], a data-driven optimal virtual inertia control strategy for VSGs is designed to better suppress the frequency drop and the rate of change of frequency, but the cross-power coupling issue remains unresolved. Ref. [7] presents a multi-objective adaptive inertia–droop control strategy for wind turbines, which realizes coordinated frequency support and speed recovery and is suitable for scenarios with high wind power penetration, yet it cannot be directly applied to VSG power decoupling. Ref. [8] establishes an adaptive virtual inertia–damping system based on model predictive control (MPC) to improve the frequency stability of low-inertia microgrids, without a dedicated decoupling design for power coupling. Ref. [9] combines virtual inertia with additional damping controllers to suppress the oscillations of wind power grid-connected systems, which also fails to eliminate coupling among power loops. Ref. [10] proposes fuzzy adaptive virtual inertia for doubly fed induction generators to improve frequency regulation capability, without involving VSG power decoupling and coordinated control.
To solve the power coupling problem of VSGs, existing studies have focused on in-depth investigations of power decoupling control methods. Ref. [11] proposes a full-state feedback power decoupling strategy for VSGs, where the state feedback matrix is designed to eliminate active–reactive power coupling while retaining inertia response capability. Disturbance injection is further introduced to reduce dependence on line parameters and improve system stability. However, the method mainly relies on state feedback decoupling design and does not incorporate adaptive inertia–damping optimization under varying operating conditions. Ref. [12] proposes a frequency and amplitude feedforward decoupling strategy for grid-forming converters to realize decoupling between active and reactive power control loops, which features a simple principle and easy implementation but fails to consider operating conditions with large power angles and weak grids. Ref. [13] presents a VSG power decoupling method based on online adaptive feedforward compensation (OAFC) for weak grids, which can adapt to time-varying line impedance and variable operating points, yet it is not coordinated with inertia and damping control. Ref. [14] comprehensively reviews and classifies active power decoupling (APD) control strategies, providing theoretical references for power ripple suppression in single-phase systems. Ref. [15] puts forward a dynamic power decoupling method for grid-forming converters under strong grids to restrain dynamic interaction between active and reactive power, whereas its applicable scenarios are limited. Ref. [16] establishes hierarchical adaptive decoupling control (HADC) for wind farms connected to weak grids, which suppresses the power coupling caused by the phase-locked loop and grid strength and improves grid-friendliness. In the field of power decoupling, Refs. [17,18] respectively propose common-ground and reduced-switch active power decoupling topologies for photovoltaic inverters to reduce leakage current and second-order power ripple. Refs. [19,20] achieve the decoupled power control of photovoltaic systems via model predictive control to accelerate dynamic response. Ref. [21] realizes second-order harmonic active decoupling for differential boost inverters. Refs. [22,23] propose dual synchronous coordinate decoupling and phase-locked loop-free decoupling strategies for multi-source inverters and marine propulsion systems respectively. Although the above methods have achieved favorable effects in power decoupling and harmonic suppression, most of them focus on topology design, single-phase systems or specific types of converters and cannot be directly applied to satisfy the coordinated control requirements of inertia, damping and decoupling for VSGs under the conditions of a weak grid and large power angle [24]. In general, existing decoupling strategies are either applicable only to small-power-angle scenarios or operate independently from inertia and damping control, making it difficult to simultaneously achieve decoupling, frequency stabilization and oscillation suppression under complex operating conditions.
To this end, this paper proposes a VSG control strategy based on reactive power feedforward decoupling and coordinated adaptive damping–inertia regulation. First, a quantitative coupling analysis based on small-signal modeling and a dynamic relative gain array (DRGA) is conducted to reveal the influence of large-power-angle and high-R/X conditions on active–reactive power coupling. Based on this analysis, a reactive-power-oriented feedforward decoupling strategy is designed to suppress Q–θ coupling while preserving the intrinsic inertia support characteristics of the active power loop. Furthermore, a deep deterministic policy gradient (DDPG)-based adaptive damping–inertia control method is developed, where frequency, voltage, power, and coupling degree are incorporated into the state space to achieve the online coordinated optimization of virtual inertia and damping. Compared with existing decoupling or adaptive control methods, the proposed approach simultaneously achieves effective power decoupling, enhanced inertia support, improved oscillation suppression, and strong robustness against parameter uncertainty, thereby significantly improving the dynamic performance and grid-connected stability of VSGs under weak grid and high-R/X operating conditions.
3. Power Feedforward Decoupling
Considering that the active power loop is closely linked with the VSG swing equation, its dynamic characteristics directly determine the inertial support capability and frequency stability of the system. Strong decoupling compensation applied to the active loop can suppress power coupling, yet it may weaken the inherent synchronous generator characteristics, reduce system damping and dynamic stability, and even trigger frequency oscillations under weak grid and large-power-angle operating conditions. For this reason, this paper retains the original inertial response mechanism of the active power loop and only adopts directional feedforward decoupling for the reactive power loop to weaken the cross-coupling between reactive power and power angle so as to optimize the dynamic performance of reactive power and grid voltage. Meanwhile, to improve the poor dynamic adaptability of the active power loop, the adaptive damping and inertia control based on the DDPG is further introduced to realize the coordinated optimization of system stability and dynamic response performance.
The power deviation caused by the power angle is corrected using
,
. Combined with the equation
, the reactive power feedforward link can be expressed as follows:
The reactive power control loop with power feedforward decoupling is shown in Figure 7.
When the VSG operates at steady state, the output reactive power of the VSG is as follows:
The power angle of the VSG becomes
after system disturbance, where
, and the corresponding output reactive power is as follows:
This expression contains steady-state terms and transient deviations. To eliminate the reactive power deviation, the power control term Qv is added to the reactive power loop.
To quantitatively assess the effect of different decoupling strategies on system stability, eigenvalue migration analysis is carried out on the established small-signal model. The closed-loop eigenvalues are tracked as the virtual inertia J varies from 1 to 40, and the results are depicted in Figure 8. At J = 10, the dominant poles of the reactive power decoupling scheme are −1.79 ± j4.46, while those of the full PQ decoupling scheme are −1.24 ± j6.51. The corresponding damping ratios are 0.372 for the proposed reactive power decoupling strategy and 0.187 for the full PQ decoupling method, so the reactive-power-only strategy is markedly better damped. All eigenvalues under both control schemes lie in the left half of the complex plane, so the system remains stable over the whole range of J. As the virtual inertia increases, the dominant complex-conjugate poles gradually shift toward the imaginary axis, indicating reduced oscillation frequency and degraded system damping. Compared with the full active–reactive power decoupling strategy, the dominant poles of the reactive-power-only decoupling strategy always stay further left, with more negative real parts and higher damping, which confirms the larger stability margin of the proposed scheme.
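The quoted damping ratios can be checked directly from the pole locations. The short sketch below assumes the usual second-order definition, ζ = −Re(p)/|p|; the function and variable names are illustrative only.

```python
def damping_ratio(pole: complex) -> float:
    """Damping ratio of a complex pole: zeta = -Re(p) / |p|."""
    return -pole.real / abs(pole)

# Dominant poles at J = 10, as quoted in the text
q_only_pole = complex(-1.79, 4.46)   # reactive-power-only decoupling
full_pq_pole = complex(-1.24, 6.51)  # full PQ decoupling

zeta_q_only = damping_ratio(q_only_pole)
zeta_full_pq = damping_ratio(full_pq_pole)

print(f"{zeta_q_only:.3f} {zeta_full_pq:.3f}")  # prints "0.372 0.187"
```

Both values agree with the figures reported above, and the ratio between them shows that the reactive-power-only scheme is roughly twice as well damped at this operating point.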
4. Adaptive Damping and Inertia Control
To enhance the adaptive capability of the VSG against variations in operating conditions, this paper introduces the deep reinforcement learning algorithm DDPG to construct a cooperative adaptive damping and inertia control method. By perceiving the real-time dynamic state of the system, the proposed method autonomously adjusts the virtual inertia and damping parameters, thereby optimizing the system’s dynamic performance and suppressing oscillations.
4.1. Principle of DDPG Algorithm
The DDPG is a deep reinforcement learning algorithm based on the Actor–Critic framework, which is applicable to control problems with continuous action spaces. Its core idea is that the agent continuously interacts with the environment, outputs control actions according to the system’s operating state, and updates the control strategy in real time based on the reward function to obtain the optimal control effect.
The DDPG mainly consists of four components: an online Actor network, online Critic network, target Actor network and target Critic network. Specifically, the Actor network outputs control actions according to the current system state; the Critic network evaluates the quality of the executed actions; the target networks are adopted to improve training stability; and the experience replay mechanism is used to reduce sample correlation.
The basic control process is shown in Figure 9.
First, the agent collects the environmental state information $s_t$ at the current time step $t$ and feeds it into the online Actor network $\mu$. The online Actor network then obtains the current action $a_t$ based on its current network parameters $\theta^{\mu}$ and random noise $N_t$. The corresponding formula is as follows:
$$a_t = \mu(s_t \mid \theta^{\mu}) + N_t$$
where $\mu(s_t \mid \theta^{\mu})$ denotes the action output by the Actor network; $N_t$ is the random exploration noise.
After the environment executes the action $a_t$ in state $s_t$, it returns the reward $r_t$ and transitions to the next state $s_{t+1}$. To make full use of past experience and accelerate network training, the experience tuple $(s_t, a_t, r_t, s_{t+1})$ is stored in the experience replay buffer.
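A minimal sketch of such a replay buffer is given below, using only the Python standard library. The class and method names are hypothetical and not taken from the paper; the only behavior assumed is a fixed capacity with oldest-first eviction and uniform random sampling.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (s_t, a_t, r_t, s_{t+1}) tuples."""

    def __init__(self, capacity):
        # deque with maxlen silently drops the oldest entry when full
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks temporal correlation
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.store(step, 0.0, -1.0, step + 1)
batch = buffer.sample(2)
```

After the loop, only the three most recent transitions remain, which mirrors how a full buffer discards stale experience during long training runs.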
The Critic network evaluates the benefit of the current action through the action–value function $Q(s, a \mid \theta^{Q})$, where $\theta^{Q}$ denotes the parameters of the Critic network.
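Assume that the number of experiences sampled each time is $M$. The training of the Critic network is similar to supervised learning, and the loss function to be minimized is as follows:
$$L(\theta^{Q}) = \frac{1}{M}\sum_{i=1}^{M}\left(y_i - Q(s_i, a_i \mid \theta^{Q})\right)^2$$
The target value $y_i$ is given by the following:
$$y_i = r_i + \gamma\, Q'\!\left(s_{i+1},\ \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\right)$$
where $\gamma$ is the discount factor; $Q'$ is the target Critic network; and $\mu'$ is the target Actor network.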
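The Actor network is updated via the policy gradient:
$$\nabla_{\theta^{\mu}} J(\theta^{\mu}) \approx \frac{1}{N}\sum_{i=1}^{N} \nabla_{a} Q(s, a \mid \theta^{Q})\Big|_{s=s_i,\ a=\mu(s_i)}\ \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\Big|_{s=s_i}$$
where $J(\theta^{\mu})$ is the expected return (not to be confused with the virtual inertia J) and $N$ is the total number of time steps in one episode.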
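Finally, the agent updates the target Actor and target Critic networks with a small soft-update rate $\tau$. The corresponding update formulas are as follows:
$$\theta^{Q'} \leftarrow \tau\,\theta^{Q} + (1-\tau)\,\theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \tau\,\theta^{\mu} + (1-\tau)\,\theta^{\mu'}$$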
Through continuous iterative training, the DDPG agent can learn the optimal adjustment strategy of damping and inertia parameters under different operating conditions.
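The three scalar operations at the heart of this loop (target value, Critic loss, soft target update) can be sketched in plain Python as follows. This is an illustration of the standard DDPG update rules under the hyperparameters reported later (γ = 0.99, τ = 0.005), not the authors' implementation; the function names are hypothetical and network parameters are represented as flat lists of floats for simplicity.

```python
def td_target(reward, next_q, gamma=0.99):
    """y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))."""
    return reward + gamma * next_q


def critic_loss(targets, q_values):
    """Mean squared error between targets y_i and Critic outputs Q(s_i, a_i)."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / len(targets)


def soft_update(target_params, online_params, tau=0.005):
    """theta' <- tau * theta + (1 - tau) * theta' for every parameter."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


y = td_target(reward=1.0, next_q=2.0)           # 1.0 + 0.99 * 2.0
loss = critic_loss([1.0, 3.0], [0.0, 1.0])      # (1 + 4) / 2
new_target = soft_update([0.0, 0.0], [1.0, 2.0])
```

With τ this small, the target networks move only 0.5% of the way toward the online networks per update, which is what keeps the bootstrapped targets slowly varying.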
4.2. Implementation of Adaptive Damping and Inertia Control
To enable the agent to accurately perceive the dynamic characteristics of the system, this paper selects variables such as frequency, voltage, power and coupling state to construct the state space. The system state variables are defined as follows:
where Δf denotes frequency deviation; df/dt represents the rate of change of frequency (RoCoF); ΔP is active power deviation; ΔQ is reactive power deviation; ΔU stands for voltage deviation; θ is power angle variation; and the final entry is the power coupling index.
The power coupling index is used to characterize the degree of dynamic coupling in the system. As system coupling increases, this index increases significantly. Therefore, introducing it into the state space enables the agent to perceive the coupling status, thereby improving its dynamic adaptability under complex operating conditions.
The action output by the DDPG agent consists of the virtual inertia and damping parameters of the VSG, i.e.,
$$a_t = [J,\ D]$$
where J denotes the virtual inertia; D represents the virtual damping coefficient.
To ensure the operational stability of the system, constraint ranges are set for the action space:
When the inertia and damping coefficients satisfy the aforementioned constraint ranges, the real parts of the dominant closed-loop eigenvalues remain negative, guaranteeing the asymptotic stability of the closed-loop system. Accordingly, the aforementioned stability region is adopted in this paper as the action space constraint for the DDPG algorithm, which restricts the online optimization of the agent strictly within the stability domain and thus prevents the generation of unstable control parameters.
By dynamically adjusting J and D online, the system can autonomously change its inertia support capability and damping level according to different operating conditions, thus achieving the optimization of dynamic performance.
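One common way to enforce such box constraints, assuming the Actor ends in a tanh layer as configured in Section 4.3, is to map each output in [−1, 1] affinely onto its admissible interval. The sketch below illustrates this; the numerical bounds are placeholders chosen for the example and are not the stability region used in the paper.

```python
# Hypothetical bounds for illustration only (not the paper's values)
J_MIN, J_MAX = 1.0, 40.0
D_MIN, D_MAX = 5.0, 50.0


def scale(u, low, high):
    """Map u in [-1, 1] affinely onto [low, high], clamping out-of-range input."""
    u = max(-1.0, min(1.0, u))
    return low + 0.5 * (u + 1.0) * (high - low)


def to_vsg_parameters(actor_output):
    """Convert a raw (tanh-bounded) action into physical J and D values."""
    u_j, u_d = actor_output
    return scale(u_j, J_MIN, J_MAX), scale(u_d, D_MIN, D_MAX)


inertia, damping = to_vsg_parameters((0.0, 0.0))  # midpoint of both ranges
```

Because the clamp is applied after exploration noise is added, even a noisy action cannot leave the admissible region.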
The reward function determines the optimization objective of reinforcement learning, and its design directly affects the training effect of the agent. To comprehensively improve the system’s frequency stability, dynamic response capability, and power coupling suppression capability, this paper constructs the following reward function:
where w1 to w6 are the weight coefficients of each performance index and Ts denotes the dynamic regulation time of the system. The weighting coefficients are set as w1 = 2, w2 = 1, w3 = 1.5, w4 = 1, w5 = 2, and w6 = 1. The larger values of w1 and w5 are adopted to enhance frequency stability and the suppression capability of power coupling; w3 is configured to damp active power oscillations; w2 and w4 are used to balance the dynamic performance of voltage and reactive power; and w6 is designed to shorten the dynamic settling time of the system.
The design of the multi-objective reward function realizes the collaborative optimization of frequency stability, voltage stability and coupling suppression.
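To make the role of the weights concrete, the sketch below shows one plausible shape for such a reward: a negative weighted sum of absolute deviations. The weights are those listed above, but the functional form, the pairing of w2 with voltage and w4 with reactive power, and all names are assumptions made for illustration.

```python
# Weights w1..w6 as reported in the text
WEIGHTS = (2.0, 1.0, 1.5, 1.0, 2.0, 1.0)


def reward(df, du, dp, dq, coupling, ts, weights=WEIGHTS):
    """Assumed form: r = -(w1|df| + w2|dU| + w3|dP| + w4|dQ| + w5|c| + w6*Ts)."""
    w1, w2, w3, w4, w5, w6 = weights
    penalty = (w1 * abs(df) + w2 * abs(du) + w3 * abs(dp)
               + w4 * abs(dq) + w5 * abs(coupling) + w6 * ts)
    return -penalty


r_ideal = reward(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)   # no deviation, no penalty
r_unit = reward(1.0, 1.0, 1.0, 1.0, 1.0, 1.0)    # minus the sum of all weights
```

Under this form, a unit frequency deviation or a unit coupling index costs twice as much as a unit voltage deviation, which reflects the stated priority on frequency stability and coupling suppression.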
4.3. Training Environment and Parameter Settings of DDPG
To improve the repeatability and generalization capability of the proposed adaptive damping and inertia control strategy, the training environment and network hyperparameters of the DDPG agent are uniformly configured in this paper. Both the Actor and Critic networks adopt fully connected feedforward neural networks. The Actor network consists of two hidden layers with 128 and 64 neurons respectively; the rectified linear unit (ReLU) is selected as the activation function, and the tanh function is used at the output layer to constrain the output range of control actions. The Critic network takes the concatenated state–action pair as its input, with two hidden layers also configured with 128 and 64 neurons.
During training, the capacity of the experience replay buffer is set to 1 × 10^6, and the mini-batch size M is fixed at 256. The discount factor γ is specified as 0.99 to emphasize long-term dynamic performance, and the soft-update coefficient τ for target networks is set to 0.005 to enhance training stability. The learning rates of the Actor and Critic networks are 1 × 10^−4 and 5 × 10^−4, respectively. Ornstein–Uhlenbeck noise is adopted as the exploration strategy, with an initial standard deviation of 0.2 that decays gradually over the training episodes.
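A minimal sketch of decaying Ornstein–Uhlenbeck exploration noise is shown below. Only the initial standard deviation of 0.2 comes from the text; the mean-reversion rate `theta`, the time step `dt`, the per-episode `decay` factor and the class name are assumed values for illustration.

```python
import math
import random


class OUNoise:
    """Discretized OU process: x += theta*(0 - x)*dt + sigma*sqrt(dt)*N(0, 1)."""

    def __init__(self, size=2, sigma=0.2, theta=0.15, dt=1e-2,
                 decay=0.999, seed=None):
        self.size, self.sigma, self.theta = size, sigma, theta
        self.dt, self.decay = dt, decay
        self.rng = random.Random(seed)
        self.state = [0.0] * size

    def reset(self):
        self.state = [0.0] * self.size

    def sample(self):
        self.state = [
            x - self.theta * x * self.dt
            + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)

    def end_episode(self):
        # Shrink the noise scale so exploration fades as training proceeds
        self.sigma *= self.decay
        self.reset()


noise = OUNoise(seed=0)
n_t = noise.sample()      # one noise vector for the (J, D) action
noise.end_episode()
```

The temporal correlation of OU noise produces smoother parameter trajectories than independent Gaussian noise, which suits a plant with inertia.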
To strengthen the agent’s adaptability to complicated operating conditions, grid parameters and disturbance scenarios are randomly varied throughout the training process. Specifically, the line resistance-to-reactance ratio R/X ranges from 0.1 to 1, the short-circuit ratio (SCR) varies between 2 and 10, the steady-state power angle is limited to 10°–45°, the amplitude of the frequency disturbance is constrained to ±0.5 Hz, and the active power step disturbance is set from 0.2 to 0.5 p.u. Each training episode contains 1000 control time steps, and the total number of training episodes is 3000. The training is regarded as converged if the variation rate of the average reward over 200 consecutive episodes is less than 5%.
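The randomization described above can be sketched as a per-episode scenario sampler. The ranges are those stated in the text; drawing each quantity independently from a uniform distribution, and the dictionary keys themselves, are assumptions made for this example.

```python
import random

# Ranges as stated in the text; key names are illustrative
SCENARIO_RANGES = {
    "r_over_x": (0.1, 1.0),
    "scr": (2.0, 10.0),
    "power_angle_deg": (10.0, 45.0),
    "freq_disturbance_hz": (-0.5, 0.5),
    "power_step_pu": (0.2, 0.5),
}


def sample_scenario(rng):
    """Draw one training scenario (uniform sampling is an assumption)."""
    return {name: rng.uniform(low, high)
            for name, (low, high) in SCENARIO_RANGES.items()}


rng = random.Random(42)
scenarios = [sample_scenario(rng) for _ in range(3000)]  # one per episode
```

Sampling a fresh scenario at every episode reset exposes the agent to both weak (SCR near 2) and relatively strong (SCR near 10) grids within the same training run.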
Figure 10 illustrates the variation in the average reward of the DDPG agent against training episodes. To reduce the influence of randomness, a moving average is applied to smooth the training reward curve. At the early training stage, the reward fluctuates drastically because the agent is in the random exploration phase. As training proceeds, the agent gradually learns an inertia–damping regulation strategy adaptable to diverse operating conditions, leading to a continuous rise in the average reward and a marked reduction in fluctuation amplitude. After approximately 1500 episodes, the reward settles to a steady value, indicating that training has converged and that the agent has acquired a stable control policy.
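The smoothing and the convergence test can be sketched as below. The 200-episode window and 5% tolerance come from the text, but the criterion itself is stated only loosely there; comparing the mean reward of the last window with that of the preceding window is one possible interpretation, and the function names are hypothetical.

```python
def moving_average(values, window):
    """Simple moving average over a sliding window."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def has_converged(rewards, window=200, tol=0.05):
    """True if the mean reward changed by less than tol between the last
    two consecutive windows (an assumed reading of the criterion)."""
    if len(rewards) < 2 * window:
        return False
    previous = sum(rewards[-2 * window:-window]) / window
    latest = sum(rewards[-window:]) / window
    if previous == 0:
        return latest == 0
    return abs(latest - previous) / abs(previous) < tol


smoothed = moving_average([1.0, 2.0, 3.0, 4.0], window=2)
flat_run = has_converged([-10.0] * 400)
ramp_run = has_converged([-float(i) for i in range(1, 401)])
```

A flat reward history passes the test, while a history that is still changing steadily does not.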
Benefiting from randomized training under multiple operating scenarios, the DDPG agent can learn the optimal coordinated inertia–damping regulation strategy corresponding to various grid strengths and operating states, improving the robustness and generalization capability of the resultant control strategy.
This paper introduces the virtual inertia and damping parameters output by the DDPG into the VSG swing equation.
where J(t) and D(t) are the dynamic parameters output in real time by the DDPG.
When the system is disturbed, the agent increases virtual inertia to enhance inertia support capability if the rate of frequency change is large. If system oscillation intensifies, it raises the damping coefficient to strengthen oscillation suppression ability. After the system restores stability, damping and inertia are automatically reduced to improve dynamic response speed.
Therefore, this method can realize the online collaborative optimization of damping and inertia parameters according to the system’s dynamic operating state and improve the dynamic stability performance of the VSG under weak grid conditions.
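The effect of time-varying parameters on the frequency dynamics can be illustrated with a single forward-Euler step of a swing equation. The per-unit form J(t)·dω/dt = P_ref − P_e − D(t)·(ω − ω0) used here is a commonly used simplification and is assumed for this sketch; it may differ from the exact expression adopted in the paper, and the numerical values are arbitrary.

```python
def swing_step(omega, p_ref, p_e, inertia, damping, omega0=1.0, dt=1e-3):
    """One Euler step of J*domega/dt = P_ref - P_e - D*(omega - omega0)."""
    domega_dt = (p_ref - p_e - damping * (omega - omega0)) / inertia
    return omega + dt * domega_dt


# Same power imbalance, two inertia settings chosen by a hypothetical agent
low_inertia = swing_step(1.0, p_ref=1.0, p_e=0.5, inertia=5.0, damping=20.0)
high_inertia = swing_step(1.0, p_ref=1.0, p_e=0.5, inertia=20.0, damping=20.0)
```

For the same imbalance, the higher inertia yields a smaller frequency excursion per step, which is the mechanism the agent exploits when the rate of change of frequency is large.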
The overall control block diagram is shown in Figure 11.
5. Experimental Test Results
5.1. The Hardware-in-the-Loop Test Platform
A hardware-in-the-loop test platform composed of a real-time simulation machine and a physical DSP controller is built, as shown in Figure 12. The main circuit of the power system is modeled and simulated on the real-time simulation device, while the proposed control strategy is programmed and executed in the physical DSP controller.
To verify the effectiveness of the proposed VSG control strategy, which combines reactive power feedforward decoupling with adaptive inertia–damping control, and to clarify its dynamic performance under complex operating conditions such as a weak grid, large power angle and high resistance-to-reactance ratio, experiments are carried out on the hardware-in-the-loop test platform. The tests cover the power decoupling effect, frequency and voltage support, dynamic response speed and disturbance rejection stability. The steady-state and dynamic output characteristics of the proposed strategy are compared with those of traditional VSG control, conventional adaptive inertia control and independent power decoupling control. The improvement effects of reactive power feedforward decoupling and DDPG-based adaptive inertia–damping control on system stability, power transmission capacity and dynamic performance are quantitatively analyzed.
Taking single-machine simulation as an example, the specific parameters are as follows (Table 1):
5.2. Experimental Results Under Active Power Command Variation
As shown in Figure 13 and Figure 14, the active power command steps up from 0.5 p.u. to 1 p.u. at t = 1 s. All four control strategies can realize active power tracking, but their dynamic performances differ significantly.
It can be seen from Figure 13 that both traditional VSG control and conventional adaptive inertia control suffer from large overshoot and oscillation, with overshoots of about 0.16 p.u. and 0.154 p.u. respectively. Although independent power decoupling control weakens partial coupling effects, it features slow response speed and residual fluctuations in the later stage. In contrast, the proposed strategy has an overshoot of merely 0.04 p.u. and returns to steady state more rapidly, which verifies its superior dynamic response and oscillation suppression capability.
As illustrated in Figure 14, active power disturbances cause reactive power fluctuations under all strategies, revealing obvious active–reactive power coupling in the system. Traditional VSG control produces the maximum reactive power fluctuation of around 0.18 p.u., while the proposed strategy achieves the minimum fluctuation and fastest convergence. This indicates that the adopted reactive power feedforward decoupling effectively attenuates the disturbance transmission from the active power loop to the reactive power loop and reduces the system’s coupling degree.
In summary, the proposed strategy can improve active power dynamic performance while effectively mitigating active–reactive power coupling, thus enhancing overall system stability.
5.3. Comparison of Control Strategies Under Frequency Disturbance
As shown in Figure 15 and Figure 16, the grid frequency drops by 0.5 Hz during t = 1 s to 1.5 s to verify the dynamic support capability and stability of different control strategies under frequency disturbance.
It can be observed from Figure 15 that all strategies adjust active power to provide inertial support after frequency disturbance occurs. Severe power oscillations exist in traditional VSG control and conventional adaptive inertia control with obvious fluctuations during dynamic recovery. Independent power decoupling control suppresses oscillations yet suffers from slow response speed. By contrast, the proposed strategy rapidly raises active power output with the maximum active support up to about 2.6 p.u., far exceeding other methods, which demonstrates stronger transient inertial support and frequency regulation capability. Meanwhile, it presents lower oscillation amplitude and faster steady-state recovery.
From Figure 16, traditional VSG control leads to the largest frequency drop and longest recovery time during grid frequency sag. Conventional adaptive inertia control and independent power decoupling control improve frequency dynamic performance but still have noticeable overshoot and oscillation. The proposed strategy maintains a higher minimum frequency with a recovery time of approximately 1.2 s, achieving the best performance among the four strategies. This indicates that the presented adaptive inertia–damping control can adjust inertia and damping parameters in real time according to the system’s operating conditions, effectively strengthening system frequency support and damping characteristics.
5.4. Comparison of Strategies Under Different Short-Circuit Ratios
To verify the adaptability of the proposed strategy under different grid strengths, tests are carried out under SCR = 2 and SCR = 10, and the results are presented in Figure 17 and Figure 18.
Figure 17 and Figure 18 show the active power response curves of different control strategies under transient disturbances with short-circuit ratios of 2 and 10 respectively. The comparison indicates that the traditional VSG control has the largest active power fluctuation, obvious overshoot and power drop, as well as longer dynamic recovery time. Conventional adaptive inertia control and independent power decoupling control can suppress power oscillations to a certain extent, but they still have prominent transient deviations and slow convergence. In comparison, the proposed strategy achieves the smallest overshoot and power drop and converges to steady state rapidly with almost no visible oscillation during transients. The results show that the proposed method can effectively restrain active power fluctuations and improve system transient stability and dynamic response performance under both weak and relatively strong grid conditions, which supports the robustness of the combined adaptive inertia–damping control and reactive power feedforward decoupling scheme in complex grid environments.
5.5. Sensitivity Analysis of Impedance Estimation Error
To evaluate the robustness of the proposed feedforward decoupling control strategy against grid impedance uncertainty, this section conducts a sensitivity analysis on the estimation error of total grid impedance Zsum. In practical power systems, Zsum cannot be measured accurately. Restricted by changes in network topologies and estimation methods, deviations will exist between the estimated value adopted in the controller and the true value.
Define the estimated impedance used in the controller as follows:
$$\hat{Z}_{\mathrm{sum}} = k\,Z_{\mathrm{sum}}$$
Here, k denotes the scaling coefficient that represents the estimation error. Five typical operating conditions with k = 0.5, 0.8, 1.0, 1.2, 1.5 are selected in this paper, corresponding to impedance estimation errors of −50%, −20%, 0%, +20% and +50% respectively.
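The mapping between the scaling coefficient and the percentage error is simply (k − 1) × 100%. The short sketch below reproduces the five test conditions; the function names and the example impedance value are illustrative and not taken from the paper.

```python
def estimation_error_percent(k):
    """Relative impedance estimation error implied by scaling factor k."""
    return (k - 1.0) * 100.0


def estimated_impedance(z_true, k):
    """Impedance value seen by the controller when the true value is z_true."""
    return k * z_true


k_values = (0.5, 0.8, 1.0, 1.2, 1.5)
errors = [round(estimation_error_percent(k)) for k in k_values]
print(errors)  # prints [-50, -20, 0, 20, 50]

# Example with an arbitrary true impedance of 0.1 + j0.2 p.u.
z_hat = estimated_impedance(complex(0.1, 0.2), 0.5)
```

Because k scales the complex impedance as a whole, both the resistive and reactive parts are mis-estimated by the same factor and the assumed R/X ratio is unchanged.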
The key performance indicators of the system are compared under different error conditions, as shown in Figure 19 and Figure 20.
Figure 19 and Figure 20 present the active and reactive power responses with grid impedance estimation errors of ±50% and ±20%. Under all operating conditions, active power is tracked stably, and the fluctuation amplitude of reactive power stays within 0.11 p.u. with rapid convergence. The results indicate that the proposed strategy is robust against impedance uncertainty over the tested error range.
5.6. Composite Disturbance Conditions
Combined with the actual operating characteristics of weak grids, a composite disturbance scenario consisting of active power step change and grid frequency drop is established. At t = 1 s, the active power reference steps up from 0.5 p.u. to 1.0 p.u., and grid frequency drops by 0.2 Hz simultaneously. This setup simulates the typical complex operating conditions where the on-site converter is subjected to sudden power variation and frequency disturbance concurrently.
Figure 21 shows the active power response curve under the composite disturbance of active power step change and grid frequency drop. The comparison results indicate that the overshoot of active power under traditional VSG control reaches 15.6%, while that of the proposed strategy is only 7%. The proposed strategy features smaller dynamic fluctuations and faster recovery speed, which effectively suppresses the impact of composite disturbance on active power.