Article

Adaptive Transient Power Angle Control for Virtual Synchronous Generators via Physics-Embedded Reinforcement Learning

1 School of Electrical Engineering and Automation, Wuhan University, Wuhan 430000, China
2 China Electric Power Research Institute, Beijing 100192, China
3 Power Systems Engineering Center, National Renewable Energy Laboratory, Golden, CO 80401, USA
4 State Grid Hubei Electric Power Research Institute, Wuhan 430074, China
5 Electrical and Computer Engineering Department, University of Denver, Denver, CO 80280, USA
* Author to whom correspondence should be addressed.
Electronics 2025, 14(17), 3503; https://doi.org/10.3390/electronics14173503
Submission received: 21 July 2025 / Revised: 22 August 2025 / Accepted: 26 August 2025 / Published: 1 September 2025

Abstract

With the increasing integration of renewable energy sources and power electronic converters, Grid-Forming (GFM) technologies such as Virtual Synchronous Generators (VSGs) have emerged as key enablers of future power systems. However, conventional VSG control strategies with fixed parameters often fail to maintain transient stability under dynamic grid conditions. This paper proposes a novel adaptive GFM control framework based on physics-informed reinforcement learning, targeting transient power angle stability in systems with high renewable penetration. An adaptive controller, termed the 3N-D controller, is developed to periodically update the virtual inertia and damping coefficients of VSGs based on real-time system observations, enabling anticipatory adjustments to evolving operating conditions. The controller leverages a reinforcement learning architecture embedded with physical priors, which captures the high-order differential relationships between rotor angle dynamics and control variables. This approach enhances generalization, reduces data dependency, and mitigates the risk of local optima. Comprehensive simulations on the IEEE-39 bus system with varying VSG penetration levels validate the proposed method’s effectiveness in improving system stability and control flexibility. The results demonstrate that the physics-embedded GFM strategy can significantly enhance the transient stability and adaptability of future power grids.

1. Introduction

With the rapid increase in renewable energy penetration driven by global carbon neutrality goals, the modern power grid is undergoing a fundamental transformation. A large number of power electronic converters are replacing conventional synchronous generators (SGs), significantly reducing system inertia. This transition increases the system’s vulnerability to disturbances and poses serious challenges to transient stability [1,2,3]. Traditional stability analysis and control methods, which rely on synchronous machine-based assumptions, are increasingly inadequate for this new paradigm. As a result, there is a growing demand for advanced control frameworks that can ensure system stability in a low-inertia, converter-dominated grid environment [4,5,6].
To address these challenges, grid-forming (GFM) control technologies have gained significant attention. Among them, VSG control is particularly prominent due to its ability to mimic the inertial and damping characteristics of conventional synchronous machines [7,8,9,10]. VSG-equipped inverters contribute to system frequency support and enhance transient stability by emulating generator dynamics. However, conventional VSG implementations typically rely on fixed control parameters, such as virtual inertia and damping, which limits their adaptability to varying operating conditions and network configurations. In practical systems, grid conditions fluctuate due to varying loads, faults, and renewable output. Fixed-parameter VSG controllers may lead to suboptimal performance or even system instability under such dynamic environments. This lack of adaptability restricts the capability of GFM-based systems to maintain rotor angle synchronization and overall system stability, especially in scenarios with high penetration of renewables and weak grid conditions.
Several methods have been proposed to tune VSG parameters, including exhaustive search [11], equal-area criteria [12], and worst-case scenario analysis frameworks [13]. While these methods can provide reasonable performance in well-modeled systems, they often require precise knowledge of system parameters, network topology, and operating conditions. Such requirements are difficult to meet in practice, especially in complex or time-varying networks. Consequently, these approaches lack generalization capability and are often unsuitable for online or real-time applications [5,14].
To overcome the limitations of model-based tuning approaches, reinforcement learning (RL) has emerged as a promising data-driven alternative [15,16,17,18,19]. RL enables online adaptation of control parameters by learning directly from system observations, without requiring detailed physical models. It offers fast decision-making and has demonstrated strong potential for real-time applications in power system control. However, despite these advantages, RL-based methods face significant challenges that hinder their deployment in practical systems [20,21].
(1) Scarcity of high-quality scenario samples: In data-driven power system methods, sample quality and quantity are critical for model accuracy. Reference [22] introduces a dataset generation method using infeasibility certificates to reduce unsafe regions. However, insufficient initial sampling or poorly selected infeasible points can leave unsafe areas uncovered and limit safe samples in the dataset. Similarly, ref. [23] highlights that insufficient or noisy historical data causes models to poorly represent real operating conditions, especially in systems with high renewable energy penetration. Samples often focus on high-probability regions, ignoring critical low-probability states. Moreover, ref. [24] stresses that insufficient high-quality samples in large-scale systems or environments with many uncertainties result in inaccurate predictions for complex or rapidly changing conditions.
(2) Low learning efficiency and unreliable strategies of RL agents: The complexity of power systems poses challenges for learning efficiency and strategy reliability in data-driven models. While ref. [25] shows that linearization simplifies power flow equations, it fails to capture nonlinear system behavior under extreme conditions, leading to unreliable strategies. Reference [26] points out that linear regression-based distributed algorithms improve efficiency but lack physical mechanism support, making them inadequate for complex networks. Similarly, ref. [27] highlights that high-dimensional observables increase computational complexity, reducing learning efficiency and reliability, particularly for discrete events or power electronic dynamics. Finally, ref. [28] underscores that linearization improves computation but overlooks critical nonlinear characteristics, compromising accuracy.
To address these limitations, physics-informed learning (PIL) has been proposed as a hybrid approach that embeds physical laws into data-driven models. One common method involves using physical information as a separate component in the neural network for predictions. For example, ref. [29] proposes a real-time voltage control method that embeds physical knowledge into reinforcement learning (RL) through a Trainable Action Mask (TAM), improving training efficiency and control performance. However, the TAM can increase model complexity and require more computational resources. It may also lead to suboptimal or unstable control strategies if training data is insufficient or if the system state changes frequently. In [30], physical knowledge is directly embedded into the Actor design. The physics-informed Actor uses power flow equations to ensure that the generated actions satisfy the equality constraints of the Optimal Power Flow problem. Another approach incorporates physical knowledge as a constraint in the loss function during RL. Ref. [31] introduces a voltage ratio constraint in the loss function to guide Multi-Agent Deep RL (MADRL) for solving distributed voltage control problems. Additionally, ref. [32] proposes an analytical physical model for wind power output during extreme events, which is used in the Pinball error evaluation function during training. However, despite these efforts to integrate physical knowledge into RL, most methods fail to address transient processes in power systems, particularly the underlying dynamic mechanisms.
Physics-Informed Neural Networks (PINNs) are a widely used approach within the field of physics-informed learning (PIL) [33,34,35]. By integrating data-driven approaches with fundamental physical laws, PINNs overcome the limitations of traditional deep neural networks (DNNs) [36,37,38,39], enabling them to effectively solve ordinary differential equations (ODEs), identify parameters, and tackle complex problems such as partial differential equations (PDEs). PINNs not only enhance robustness and reduce reliance on large datasets but are also particularly well-suited for modeling transient processes, where accurate and reliable predictions are critical. This makes PINNs highly deployable in the New Power System, where reliability and limited scenario samples are key considerations.
Model-predictive control (MPC) and robust optimization offer an alternative route to adaptive tuning, but classical implementations must solve a constrained optimization problem at every control step, and their runtime grows with the model dimension, prediction horizon, and constraint set [40]. Robust variants such as tube or min–max MPC further enlarge the problem size and introduce conservatism, which aggravates real-time feasibility issues [41]. In power-electronic and renewable-integration applications, long horizons or detailed multi-machine dynamics often render conventional MPC too computationally heavy for sub-100 ms deadlines [42]. Even recent work that accelerates MPC by embedding machine-learning surrogates to alleviate its online latency underscores the difficulty of deploying plain MPC in real-time grids [43]. Moreover, grid-forming inverter control requires an almost instantaneous response to disturbances, which exacerbates the challenge [44]. In contrast, physics-informed reinforcement learning (PIRL) executes with (near) constant online latency and leverages embedded physical constraints to enforce consistency and generalization even under model mismatch [45]. This motivates the development of a PIRL-based adaptive control framework that prioritizes low latency, scalable online computation, and physical fidelity in uncertain, time-varying power-system environments.
In this paper, we propose an adaptive grid-forming control framework for VSGs based on PIRL. The proposed method leverages the expressiveness of RL for predictive adaptation based on periodically updated system knowledge and the physical consistency of PINNs to enhance learning robustness. An adaptive controller, referred to as 3N-D, is developed to periodically adjust virtual inertia and damping parameters in anticipation of potential transient events. The key contributions of this study are summarized as follows:
(1) An adaptive VSG control strategy based on differential observations is proposed, which periodically updates the virtual inertia and damping parameters to achieve predictive adjustment under evolving system conditions, without relying on explicit multi-machine models.
(2) A physics-informed neural network is integrated into the reinforcement learning framework to embed system dynamic characteristics, which guides policy optimization with physical consistency, reduces the dependence on large-scale datasets, and enhances learning efficiency and generalization in uncertain environments.
The remainder of this paper is organized as follows: Section 2 introduces the mathematical model of the VSG. Section 3 proposes a physics-informed reinforcement learning method for adaptive VSG parameter tuning. Section 4 rigorously tests the method on the modified IEEE-39 bus system. Finally, Section 5 discusses the conclusions and future research directions.

2. System Model and Preliminaries

2.1. GFM-VSG Model

Grid-forming (GFM) control technologies have emerged as essential strategies for maintaining system stability in converter-dominated grids. Among them, the VSG is a widely adopted GFM implementation, aiming to replicate the dynamic behavior of conventional SGs, including their inertial and damping characteristics. By emulating synchronous generator dynamics, VSG-equipped inverters can provide frequency support and contribute to system robustness during disturbances as shown in Figure 1a. Figure 1b further details the internal loop, which manages frequency and power to maintain system equilibrium.
The system measures the power deviation $\Delta P$ between the reference power $P_{ref}$ and the actual power $P_e$, as well as the frequency deviation $\Delta\omega$ between the current frequency $\omega$ and the system's nominal frequency $\omega_0$. The damping coefficient D regulates the system's response to frequency variations. $\Delta P$ and $\Delta\omega$ jointly determine the rate of frequency change, preventing excessive fluctuations. The frequency signal $\omega$ is integrated and fed back for dynamic adjustment; further integration provides the phase angle $\delta$, ensuring synchronization. This closed-loop control of frequency and phase angle allows the VSG to maintain system stability during load changes, simulating SG behavior.
To further enhance adaptability, this paper introduces a nonlinear decision module named 3N-D into the VSG control loop. The 3N-D module periodically adjusts the virtual inertia J and damping coefficient D based on system state observations. Unlike conventional fixed-parameter designs, this approach enables predictive control by tuning the VSG parameters in advance of potential disturbances, thereby improving both dynamic performance and robustness.
In summary, the control equation of the VSG can be represented by the following rotor motion equation:
$J\dot{\omega} = P_{ref} - P_e - D(\omega - \omega_0), \qquad \dot{\delta} = \omega$
Converting $\dot{\omega}$ in Equation (1) to $\ddot{\delta}$ gives the traditional swing equation:
$J\ddot{\delta} = P_{ref} - P_e - D(\omega - \omega_0)$
This method enables renewable energy sources (RESs) to emulate the dynamic characteristics of SGs during grid connection by leveraging inverter-based control. The integrated 3N-D module utilizes current system state observations to periodically determine updated values of the virtual inertia J and virtual damping coefficient D, thereby enhancing both dynamic performance and overall system robustness. This control strategy effectively enhances the system's robustness and reliability, reduces frequency fluctuations, and ensures efficient operation of the power system.
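To make the joint role of J and D concrete, the following minimal Python sketch integrates the rotor motion equation above with forward-Euler steps under a constant power imbalance; the per-unit values and step size are illustrative assumptions only, not the settings used in the later case study.

import numpy as np

def simulate_vsg(J, D, p_ref=1.0, p_e=0.8, omega0=1.0, dt=1e-3, T=5.0):
    """Forward-Euler integration of J*d(omega)/dt = P_ref - P_e - D*(omega - omega0),
    d(delta)/dt = omega (all quantities in per unit)."""
    n = int(T / dt)
    omega, delta = omega0, 0.0
    trace = np.empty((n, 2))
    for k in range(n):
        domega = (p_ref - p_e - D * (omega - omega0)) / J   # rotor motion equation
        omega += domega * dt                                # update virtual rotor speed
        delta += omega * dt                                 # integrate speed into angle
        trace[k] = omega, delta
    return trace

# Larger J slows the frequency excursion; larger D pulls omega back toward omega0.
for J, D in [(2.0, 5.0), (4.0, 5.0), (4.0, 20.0)]:
    peak = abs(simulate_vsg(J, D)[:, 0] - 1.0).max()
    print(f"J={J}, D={D}: peak |omega - omega0| = {peak:.4f}")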
Although different RESs, such as wind and solar, exhibit diverse physical characteristics at the source level, their dynamic interaction with the power grid is ultimately shaped by the inverter-based interface control strategy. In this study, all RESs are interfaced through a GFM control mode based on the VSG approach, which standardizes their grid-facing behavior via tunable inertia and damping parameters. This abstraction allows the proposed adaptive tuning strategy to be generally applicable across various RES types.

2.2. Analysis of VSG Parameters for Stability

In the power angle stability issue discussed in this paper, integrating RESs does not change the fundamental definition of rotor angle stability. However, it affects rotor angle stability by altering power flow on major interconnections, replacing large SGs, influencing the damping torque of nearby SGs, and substituting SGs with critical power system stabilizers (PSSs). On the other hand, as the proportion of RESs in the grid increases, the inertia of the power systems decreases. When the grid experiences a large disturbance, it must provide power support quickly, often leading to larger and faster rotor swings. Unlike traditional generators, the selection of virtual inertia and damping coefficients under VSG is more flexible. Therefore, to enhance the power system’s ability to cope with large disturbances, these two controllable parameters can be dynamically adjusted, leading to the following transformation of Equation (1):
$\dot{\omega} = \frac{\Delta P}{J} - \frac{D}{J}\Delta\omega$
where $\Delta\omega = (\omega - \omega_0)$ and $\Delta P = (P_{ref} - P_e)$. Setting $1/J = k_1$ and $D/J = k_2$, the equation becomes the following:
$\dot{\omega} = k_1 \Delta P - k_2 \Delta\omega$
From the above equations, it is evident that k 1 and k 2 are determined by J and D, representing the system’s sensitivity to power error and angular velocity error, respectively. We can analyze the impact of J and D on system stability and select appropriate parameters to optimize system performance.
(1) Influence of Virtual Inertia J: A larger J results in a smaller k 1 and k 2 , increasing the system’s virtual inertia. This helps suppress rapid frequency fluctuations and improves system stability by slowing down the rate of frequency changes. However, the downside is that the system’s response becomes more sluggish, making it less suitable for applications requiring rapid adaptation to dynamic changes.
A smaller J results in a larger k 1 and k 2 , allowing the system to respond more quickly to changes in power or frequency deviation. This improves dynamic performance and responsiveness. However, reduced inertia may lead to larger frequency fluctuations and compromise system stability, as the system becomes more sensitive to disturbances.
(2) Influence of Damping Coefficient D: A larger D increases k 2 , providing stronger damping effects. This helps reduce system oscillations and enhances stability by effectively counteracting frequency deviations. However, if D is too large, the system may become overly sluggish, reducing the speed of dynamic adjustments in response to frequency changes.
A smaller D results in a smaller k 2 , which can improve the system’s dynamic response speed. However, insufficient damping leads to increased oscillations and potential instability, as the system lacks sufficient resistance to rapid frequency deviations.
In summary, a larger J and D provide better stability but slower response. A smaller J and D provide faster dynamic response but may lead to system instability. While a larger D improves damping, if J is small, the effect of damping might not be sufficient to stabilize fast disturbances. Therefore, selecting these parameters requires balancing system stability and dynamic performance to meet specific application needs. Additionally, the impact of increasing renewable energy penetration on transient rotor angle stability depends on factors such as grid layout and the location and control of renewable energy generators, adding significant complexity to physical modeling.
From the above analysis, it is evident that appropriate tuning of J and D is essential for VSGs to emulate the dynamic characteristics of SGs. The virtual inertia J allows VSGs to replicate the inertial response of SGs during frequency disturbances, while the damping coefficient D mimics the damping torque provided by SG mechanical dynamics and power system stabilizers (PSSs). By properly adjusting these parameters, VSGs can provide fast and coordinated frequency and angle support, thus contributing to overall system stability under high renewable penetration.

3. Adaptive Transient Rotor Angle Control Strategy Based on PIRL

3.1. VSG Adaptive Transient Power Angle Control Framework

When VSG uses fixed control parameters J and D, it may not achieve optimal control effects for the following three main reasons: First, fixed control parameters cannot adapt to grid conditions, operating conditions, and various disturbances, leading to poor control performance. Second, due to the complexity and nonlinearity of the system, fixed parameters may fail at certain operating points, making it impossible to ensure global optimization. Third, the lack of adaptive capability and the ability to address the uncertainties of renewable energy means that fixed parameters cannot be adjusted promptly to respond to changes in system state and environment, affecting system stability and response effectiveness. Therefore, this paper proposes a novel VSG adaptive transient power angle control method, with its basic framework shown in Figure 2.
The proposed adaptive transient power angle control method periodically updates the virtual inertia J and damping coefficient D to achieve online coordinated control of multiple VSGs. Considering the time-varying nature of the system model and operating conditions, optimal control parameters $J_1, J_2, \ldots, J_N$ and $D_1, D_2, \ldots, D_N$ are calculated online at fixed intervals $\Delta t$ based on the latest grid model and operating state information (such as data collected from the EMS) and deployed to the controllers of each VSG.
To implement the periodic updating of J and D, the proposed framework follows a two-stage mechanism within each update interval Δ t : (1) system state estimation and prediction, and (2) parameter computation and deployment. Specifically, at the beginning of each interval, system-level measurements (e.g., bus voltages, power angles, and load data) are collected from the EMS and fed into the trained physics-informed reinforcement learning (PIRL) agent. The PIRL agent embeds a physics-informed neural network (PINN) to predict the rotor angle trend δ ^ ( t + 1 ) , which is then combined with the current state vector to form an augmented state input for the Actor network. The Actor network outputs optimized J i and D i values for all VSGs through centralized policy inference. These values are then distributed to the corresponding local VSG controllers via the 3N-D module. This inference–deployment cycle is repeated periodically every Δ t (e.g., 5–15 min), allowing the system to maintain an updated and anticipatory control strategy.
It is important to note that this method does not focus on post-fault adjustments but rather on proactively optimizing system parameters to enhance adaptability and stability in the presence of potential disturbances. Unlike traditional event-triggered control strategies, this approach periodically adjusts VSG parameters so that the system remains in a more favorable operating state when disturbances occur, thereby reducing the impact of sudden events on system stability.
The parameter update interval Δ t is typically set to several minutes (5 to 15 min) to capture long-term dynamic changes in the system. Although short-term disturbances are not directly managed, the input data in the control process already reflects the real-time state of the system, including the effects of short-term variations. This ensures that the system remains stable most of the time, and by proactively adjusting parameters, we avoid the delays of handling disturbances only after they arise. This approach improves system robustness while avoiding the computational overhead and noise of frequent updates.
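The two-stage cycle can be summarized in a short Python sketch; collect_ems_snapshot, pinn, actor, and dispatch_to_vsgs are hypothetical callables standing in for the EMS interface, the trained PINN, the Actor network, and the 3N-D distribution step, and the interval value is one point inside the suggested 5–15 min range.

import time

UPDATE_INTERVAL_S = 10 * 60  # periodic update interval, within the 5-15 min range

def control_loop(collect_ems_snapshot, pinn, actor, dispatch_to_vsgs):
    """One inference-deployment cycle per interval: (1) collect and predict
    the system state, (2) compute and deploy (J_i, D_i) for every VSG."""
    while True:
        s_t = collect_ems_snapshot()          # bus voltages, angles, loads, ...
        delta_hat = pinn(s_t)                 # PINN rotor-angle trend prediction
        params = actor((s_t, delta_hat))      # centralized inference -> [(J_i, D_i), ...]
        dispatch_to_vsgs(params)              # 3N-D module distributes the parameters
        time.sleep(UPDATE_INTERVAL_S)         # wait for the next update interval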
To further improve performance during transient events, this paper proposes a physics-informed reinforcement learning (PIRL)-based VSG parameter optimization method, which integrates reinforcement learning with physical system constraints to ensure efficient and timely online computation of optimal parameters.

3.2. MDP Setting

Reinforcement learning is a machine learning method that learns strategies by interacting with the environment. An intelligent agent chooses an action in a given state and adjusts its strategy based on the rewards fed back from the environment. In RL training, the target is to find the optimal policy $\pi^*$ that maximizes the cumulative reward. The problem is commonly modeled as a Markov decision process described by a five-tuple.
For the transient power angle problem concerned in this paper, the following settings are made to PIRL:
(1) State Space
In this paper, the selected observations are the bus voltage $V_k$, bus voltage phase angle $\theta_k$, bus loads $P_{d_k}$ and $Q_{d_k}$, generator power angle $\delta_i$, frequency $\omega_i$, output power $P_{e_i}$, and the time series t (time-series data for 5 observation points within 1 s), specifically integrated into the following equation:
$s_t = \left[ P_{e_i}, \delta_i, \omega_i, V_k, \theta_k, P_{d_k}, Q_{d_k}, t \right]$
(2) Action Space
The action space consists of all possible actions that the agent can select at each time step. In this paper, the controlled variables are the J and D parameters of the VSG, as described earlier, specifically represented as follows:
$a_t = \left[ J_t, D_t \right]$
That is, two coefficients are output for each VSG. Because the action space is continuous, the absence of model-based guidance would leave millions of candidate actions to evaluate in each event.
(3) Reward Function
This paper primarily focuses on studying the transient stability of the system and the variation in active power output. The reward $r_t$ is designed based on the system's immediate response in terms of the VSG frequency $\omega_{VSG}$, electrical power output $P_e$, and the virtual power angle $\delta_{VSG}$ after a disturbance.
For evaluating transient stability, we use the transient stability index (TSI) of the rotor angle, defined as follows:
$TSI = \frac{360 - \Delta\delta_{\max}}{360 + \Delta\delta_{\max}}$
where $\Delta\delta_{\max}$ represents the maximum rotor-angle difference between any two generators during the simulation. When the index is greater than 0, the system is considered transiently stable; conversely, if it is less than or equal to 0, the system is at risk of instability.
Consequently, the system’s reward at time t is formulated as R t , which is the weighted sum of individual components for frequency, power angle, and power deviations:
$R_t = \begin{cases} \gamma(-c_1), & \text{if the action is infeasible or the system is unstable}, \\ \gamma\left( r_t(\omega) + r_t(\delta) + r_t(P) + c_2 \right), & \text{otherwise}. \end{cases}$
where
$r_t(\omega) = -\gamma_\omega \left| \omega - \omega_0 \right|, \quad r_t(\delta) = \gamma_\delta \cdot TSI, \quad r_t(P) = -\gamma_P \left| P_e - P_{ref} \right|$
where $\gamma_\omega$, $\gamma_\delta$, and $\gamma_P$ are separate penalty coefficients that weight the effects of frequency, rotor angle, and active power in the transient state, respectively. To keep the overall reward well scaled, $\gamma$ is set as a scaling factor for the total reward. It should be noted that the reward function is designed to balance the trade-off between system stability and dynamic response. The coefficients $\gamma_\omega$, $\gamma_\delta$, and $\gamma_P$ are carefully chosen to prioritize frequency stability $\omega$, rotor angle stability $\delta$, and power output accuracy $P_e$. Importantly, when an action is infeasible or leads to system instability, the three data-dependent terms $r_t(\omega)$, $r_t(\delta)$, and $r_t(P)$ are not computed; instead, a constant penalty $c_1 = 200$ is directly applied. Likewise, when the action enables the system to recover stability (i.e., $TSI > 0$), a constant reward $c_2 = 100$ is added. We set $\gamma = 0.1$ as the global outer scaling factor applied to both penalties and bonuses. By adjusting these weights, the reinforcement learning agent is encouraged to maintain a stable power angle while minimizing frequency fluctuations and ensuring that the power output remains close to the setpoint. This ensures that both J and D are dynamically adjusted to maintain an optimal balance between stability and response speed.
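A direct Python transcription of this reward design is sketched below; $c_1 = 200$, $c_2 = 100$, and $\gamma = 0.1$ are taken from the text, while the weights g_w, g_d, and g_p are placeholder values for the unspecified $\gamma_\omega$, $\gamma_\delta$, and $\gamma_P$.

def tsi(delta_max_deg):
    """Transient stability index from the maximum inter-generator rotor-angle
    difference (in degrees); TSI > 0 indicates transient stability."""
    return (360.0 - delta_max_deg) / (360.0 + delta_max_deg)

def reward(omega, omega0, p_e, p_ref, delta_max_deg, feasible,
           g=0.1, g_w=1.0, g_d=1.0, g_p=1.0, c1=200.0, c2=100.0):
    """Piecewise reward: a flat penalty on infeasibility/instability, otherwise
    the weighted sum of frequency, rotor-angle, and power terms plus a bonus."""
    if not feasible or tsi(delta_max_deg) <= 0:    # skip data-dependent terms
        return g * (-c1)
    r_w = -g_w * abs(omega - omega0)               # frequency deviation penalty
    r_d = g_d * tsi(delta_max_deg)                 # rotor-angle stability reward
    r_p = -g_p * abs(p_e - p_ref)                  # active-power deviation penalty
    return g * (r_w + r_d + r_p + c2)              # c2 bonus when stability is kept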

3.3. DDPG Algorithm for Adaptive Control

In this work, the Deep Deterministic Policy Gradient (DDPG) is selected instead of alternatives like PPO or TD3 mainly due to its suitability for continuous and high-dimensional action spaces with direct physical interpretations. PPO relies on stochastic policy optimization, which may introduce unnecessary variance when precise parameter outputs are required for real-time grid control. TD3 improves upon DDPG but at the expense of additional critics and delayed policy updates, leading to higher computational burden and longer training time, which is less favorable under strict real-time constraints. By contrast, DDPG directly learns deterministic mappings from system states to continuous control parameters ( J , D ), achieving fast convergence and low-latency decision-making. Furthermore, the stability concerns of DDPG are mitigated in our framework by embedding physical constraints via PINNs and employing target networks and experience replay, which enhance robustness during training.
DDPG is designed for continuous action spaces, utilizing an Actor–Critic architecture where the Actor generates actions and the Critic evaluates their quality. In this paper, we apply the DDPG algorithm to optimize the control parameters of VSG, ensuring that the system can respond adaptively to grid disturbances while maintaining stability. The novelty of our approach lies in how we integrate DDPG into the VSG control framework, customizing the algorithm to suit the specific requirements of power system dynamics.

3.3.1. Actor-Network

In the proposed PIRL framework, the Actor-network is designed to output the complete set of virtual inertia coefficients J t and damping coefficients D t for all VSGs based on the current system state s t . These parameters are critical for VSGs to emulate the behavior of SGs and provide stability support to the grid. Unlike traditional DDPG applications with generic action spaces, the action space in this study directly corresponds to the physical control parameters of multiple VSGs, making the policy learned by the Actor directly applicable to real-world grid control.
A centralized Actor structure is adopted, as shown in Figure 3. The network takes the complete system state s t as input and simultaneously generates control parameters for N VSGs through a deep neural network. This design enables coordinated control across VSGs by leveraging global state information.
As depicted in Figure 3, the Actor is a multilayer neural network consisting of an input layer, hidden layers, and an output layer. The input layer receives the system state $s_t$ and the output of the PINNs (discussed in the next subsection), which includes the operating states of all VSGs. After multiple layers of nonlinear transformations, the output layer produces N sets of parameters $(J_1, D_1), \ldots, (J_N, D_N)$. This structure allows the Actor to learn the coupling relationships among multiple VSGs, particularly the strong nonlinearities exhibited during large grid disturbances. By sharing network parameters, the system effectively captures dynamic interactions between VSGs, enhancing transient regulation performance.
At the execution level, these parameters are transmitted to a single 3N-D module that serves as the parameter execution interface. The 3N-D module then distributes the appropriate parameter sets to each local VSG controller. This design maintains the characteristics of a distributed execution architecture, where each VSG responds quickly based on local information, while ensuring control strategy consistency through centralized decision-making.
The innovation of this architecture lies in its centralized–distributed hybrid control paradigm: the Actor-network updates its parameters based on the gradient information from the Critic-network to maximize the Q-value, achieving global optimization at the decision-making level; the execution layer performs periodic parameter updates through 3N-D modules based on system state snapshots. By periodically adjusting the parameter sets for all VSGs, the system can respond more sensitively to power angle fluctuations. The parameter update timing mechanism is adaptive, allowing the system to dynamically adjust control cycles based on grid conditions, balancing control accuracy and communication load.
The Actor-network updates its parameters based on the gradient information of the Critic-network to maximize the Q value of the Critic-network output:
$\nabla_{\theta^\mu} J(\theta^\mu) = \mathbb{E}_{s \sim d^\beta}\left[ \nabla_{\theta^\mu} \log \pi_{\theta^\mu}(a \mid s)\, Q^\pi(s, a) \right]$
Here, the policy $\pi_{\theta^\mu}$ is determined by the parameters $\theta^\mu$.

3.3.2. Critic-Network

To ensure that the control actions output by the Actor network are optimal, the Critic network provides feedback by evaluating the quality of the Actor's actions. In the DDPG algorithm, the Critic network measures the quality of each state–action pair with a Q-value function. The Q-value function is continuously updated via the following Bellman equation:
$Q(s_t, a_t) = r_t + \gamma Q(s_{t+1}, a_{t+1})$
The task of the Critic network is to minimize the prediction error of the Q-value, allowing the Q-value function to gradually approximate the true cumulative reward.
In practice, the Critic-network is updated by minimizing the following loss function:
$L(\theta^Q) = \mathbb{E}\left[ \left( r_t + \gamma Q\left(s_{t+1}, a_{t+1} \mid \theta^{Q'}\right) - Q\left(s_t, a_t \mid \theta^Q\right) \right)^2 \right]$
In the application scenario of this paper, the reward function of the Critic network is closely related to the transient power angle stability of the power system. Specifically, the Critic network evaluates the effectiveness of the Actor network’s control actions based on the transient response of the power grid. If the actions generated by the Actor can effectively suppress excessive oscillations of the transient power angle and restore the system to a stable state quickly after a disturbance, the Critic will assign a high Q-value to those actions. Conversely, if the system experiences large power angle oscillations or significant frequency deviations, the Critic will assign a lower Q-value, guiding the Actor to adjust its strategy accordingly.
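The following PyTorch sketch shows one joint update step consistent with these equations; the module signatures, the discount factor, and the soft-update rate tau are illustrative assumptions rather than the paper's exact training code.

import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, actor_t, critic_t, batch,
                actor_opt, critic_opt, gamma=0.99, tau=0.005):
    """One DDPG step: the Critic minimizes the Bellman error; the Actor ascends
    the Critic's Q-value; target networks are soft-updated toward the main ones."""
    s, a, r, s_next = batch                                       # tensors from replay
    with torch.no_grad():
        q_target = r + gamma * critic_t(s_next, actor_t(s_next))  # r_t + gamma * Q'(s', a')
    critic_loss = F.mse_loss(critic(s, a), q_target)              # Bellman-error loss
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -critic(s, actor(s)).mean()                      # maximize Q => minimize -Q
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    for net, tgt in ((actor, actor_t), (critic, critic_t)):       # soft target updates
        for p, p_t in zip(net.parameters(), tgt.parameters()):
            p_t.data.mul_(1.0 - tau).add_(tau * p.data)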

3.4. Physics-Informed Learning Mechanism for Multi-VSG Systems

This paper employs physics-informed neural networks (PINNs) to embed physical laws and generator identity information into a unified framework for the dynamic prediction and control of multi-VSG systems. This is accomplished by considering a given partial differential equation:
$\mathcal{N}(u) = 0$
where $\mathcal{N}$ is a differential operator, and u is the function to be solved. PINNs approximate the solution u by constructing a neural network.
Given the previously detailed dynamic equations, we define u ( t , x ) as the predicted rotor angle δ ( t + 1 ) at the next time step. Thus, the PINNs can be expressed as follows:
(1) u ( t , x ) represents the neural network’s predicted rotor angle δ at the next moment;
(2) f ( t , x ) represents the physical equation for rotor angular acceleration according to the traditional swing equation.
Therefore, the whole definition of PINNs is as follows:
$u(t, x) = \hat{\delta}(t+1), \qquad f(t, x) = J \frac{d^2 u(t, x)}{dt^2} - \left( \Delta P(t) - D \Delta\omega \right)$
The specific physical constraint enforced here is the classical swing equation of synchronous generators. It ensures that the predicted rotor angle trajectory not only fits the data but also satisfies the physical relationship between inertia J, damping D, power imbalance Δ P , and frequency deviation Δ ω . In practice, this constraint is embedded as a soft penalty in the loss function, guiding the network to produce physically consistent outputs while improving its generalization under limited or noisy samples.
As the power system comprises multiple VSGs operating in parallel, a generator identity encoding method is adopted within the PINNs framework to differentiate the dynamic characteristics of various generators. Assuming each generator’s state is represented by a vector containing d features, we assign each generator a unique identifier using one-hot encoding. Specifically, for each generator i, its identity encoding can be represented as an N-dimensional vector e i , as follows:
$e_i = [0, \ldots, 1, \ldots, 0]$
where the i-th position is 1, and all others are 0. This encoding ensures that each generator carries a unique identifier in the input, allowing the PINNs framework to accurately distinguish between different generators.
Next, to integrate the generator’s identity information with its state features, we concatenate each generator’s state vector s t with its corresponding identity encoding e i , forming a new input vector:
$v_t = [e_i, s_t]$
where v t is the input vector containing both state information and identity information. By clearly representing both state dynamics and generator identities, the neural network can more effectively learn the individualized dynamics of each generator and improve prediction accuracy.
The unified total loss function comprises two parts: data loss and physics loss, assuming the same set of time points is used for both data fitting and enforcing physical constraints:
(1) Data loss term L d a t a : calculates the mean squared error (MSE) between the network’s predicted rotor angles and actual observations, ensuring consistency with observed data.
(2) Physics loss term L p h y s i c s : calculates the MSE between the predicted rotor angular acceleration and the real angular acceleration derived from the traditional swing equation, ensuring predictions adhere to physical laws.
The loss functions are defined explicitly as follows:
$L_{data} = \frac{1}{N} \sum_{i=1}^{N} \left| u(t_i, x) - \delta_i \right|^2, \qquad L_{physics} = \frac{1}{N} \sum_{i=1}^{N} \left| f(t_i, x) \right|^2$
It should be noted that the weights of data loss L d a t a and physics loss L p h y s i c s are assumed equal by default in this paper. However, in practical applications, it may be necessary to adjust these weights according to specific tasks to balance the strengths of data fitting and physical constraints appropriately. Moreover, since rotor acceleration fluctuations are expected during disturbances in practical power systems, strictly minimizing the physics loss to zero might not be reasonable. Instead, the physics loss should be controlled within a reasonable range to avoid over-constraining the network’s predictive performance. The whole PINNs architecture is shown in Figure 4.
By integrating data loss and physics loss, the approach effectively balances data-driven learning with adherence to physical laws, enabling accurate fitting of historical data while strictly following the dynamic characteristics of the power system.
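A compact PyTorch sketch of this combined objective is given below, with the second time derivative of the predicted angle obtained by automatic differentiation; the equal weighting of the two terms follows the paper's default assumption, and the network interface is hypothetical.

import torch

def pinn_losses(net, t, x, delta_obs, J, D, dP, domega):
    """Data loss fits observed rotor angles; physics loss penalizes the residual
    of the swing equation f = J * d2u/dt2 - (dP - D * domega)."""
    t = t.requires_grad_(True)                  # enable derivatives w.r.t. time
    u = net(torch.cat([t, x], dim=-1))          # x holds state features and identity e_i
    du = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), t, create_graph=True)[0]
    f = J * d2u - (dP - D * domega)             # swing-equation residual
    loss_data = torch.mean((u - delta_obs) ** 2)
    loss_physics = torch.mean(f ** 2)
    return loss_data + loss_physics, loss_data, loss_physics  # equal weights by default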
The input vector is fed into the PINNs model, which outputs the predicted rotor angle $\hat{\delta}(t+1)$ by learning the physical laws and constraints. Subsequently, this predicted rotor angle serves as a new state input for the Actor network within the RL framework, enabling the generation of corresponding control actions based on both the predicted rotor angles and the current system states. We refer to this integrated PINNs and Actor network approach as the 3N-D model, which ensures accurate prediction and effective control of generator rotor angles, optimizing overall power system stability and responsiveness. The pseudocode of the proposed method is given in Algorithm 1.
Algorithm 1 Physics-Informed Deep Deterministic Policy Gradient
1: Randomly initialize the physics-informed neural network $\phi$
2: Initialize policy network parameters $\theta^\mu$ and value network parameters $\theta^Q$
3: Initialize target policy network parameters $\theta^{\mu'}$ and target value network parameters $\theta^{Q'}$
4: Initialize the replay buffer $\mathcal{D}$
5: for episode = 1, M do
6:   Initialize the action exploration noise $\mathcal{N}$
7:   Get the initial state $s_t$
8:   for t = 1, T do
9:     Predict the next rotor angle trend with the PINNs: $\hat{\delta}(t+1) \leftarrow \mathrm{PINNs}(s_t)$
10:    Combine $s_t$ and $\hat{\delta}(t+1)$; select an action based on the online policy network and exploration noise: $a_t \leftarrow \pi_\phi(a_t \mid (s_t, \hat{\delta}(t+1))) + \mathcal{N}$
11:    Execute action $a_t$; obtain the reward $r_t$ and the next state $s_{t+1}$ from the environment
12:    Save the experience to the replay buffer: $\mathcal{D} \leftarrow (s_t, a_t, r(s_t, a_t), s_{t+1})$
13:  end for
14:  for k = 1, K do
15:    Generate the generator identity information with state features by Equation (15)
16:    Calculate the PINNs value by Equation (13)
17:    Update the PINNs parameters by minimizing the loss function in Equation (16)
18:    Calculate the target value by Equation (9)
19:    Update the Critic-network by minimizing the loss function in Equation (11)
20:    Update the Actor-network and the target networks
21:  end for
22: end for
The physics-informed DDPG implements three training techniques to prevent instability in DDPG training and uses physical information to reduce training complexity while avoiding local optima issues.
(1) Using Target Networks: Both the Actor- and Critic-networks have target networks to generate stable target values. The parameters of the target networks gradually approach the main network parameters through soft updates rather than being fully copied each time. This method reduces the changes in target values for the target networks, preventing drastic updates to the main network parameters and thus improving training stability.
(2) Experience Replay: Past experience samples are stored, and small batches are randomly sampled for training. This breaks temporal correlations, improving the independence and identically distributed nature of the samples; a minimal buffer sketch follows this list.
(3) Physics-Informed Action Mapping: Although DDPG performs well in handling continuous action spaces, it can still fall into local optima, especially in complex and multi-peak policy spaces. The embedding of physical information guides the training of the agent while avoiding local optima.
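As a minimal illustration of the replay mechanism, the sketch below implements a fixed-size buffer with uniform sampling; the capacity is an arbitrary placeholder, and the batch size of 64 matches the configuration reported in Section 4.2.

import random
from collections import deque

class ReplayBuffer:
    """Fixed-size experience replay: uniform random mini-batches break the
    temporal correlation between consecutive transitions."""
    def __init__(self, capacity=100_000):
        self.buf = deque(maxlen=capacity)     # oldest transitions are evicted first
    def push(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))    # store one transition tuple
    def sample(self, batch_size=64):
        return random.sample(self.buf, batch_size)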

4. Case Study

4.1. Analysis of Parameter J and D for VSG

In Section 2.2, we discussed the influence of the virtual inertia J and damping coefficient D. Larger values of J and D enhance system stability but result in slower response times, while smaller values improve dynamic response but may compromise system stability. To further investigate these effects, we first conduct simulations based on a Single Machine Infinite Bus (SMIB) system, incorporating both a VSG and an SG. In the simulations where J was varied, D was fixed at 5; similarly, in the tests of D, J was fixed at 5. By fixing one parameter while adjusting the other, we can more clearly observe the effects of virtual inertia and virtual damping on the system's dynamic response, ensuring that the simulation results are both targeted and comparable. All simulations are conducted using the open-source power system simulation software ANDES (version 1.9.2) [46], programmed in Python 3.7. This platform is selected for its strong capabilities in multi-device dynamic modeling, support for modular controller integration, and symbolic formulation of differential-algebraic equations (DAEs), which make it well-suited for implementing custom VSG control strategies. Additionally, its open-source nature ensures transparency and reproducibility, which aligns with the research objectives of this work.
The left side of Figure 5 illustrates the impact of different virtual inertia values on system response. A smaller virtual inertia (e.g., J = 2.0 ) allows the system to respond more quickly, with angular velocity ω reaching a steady state rapidly. However, due to the lack of sufficient inertia, the system exhibits larger oscillations, with significant frequency deviations and notable power oscillations in the output power P e . As the virtual inertia increases (e.g., J = 4.0 ), the system’s oscillations decrease, and both angular velocity and power fluctuations are significantly reduced. This indicates that increasing the virtual inertia improves system stability. However, a larger inertia also results in a slower system response, prolonging the duration of oscillations.
The right side of Figure 5 shows the effect of different damping coefficients D on the system’s dynamic characteristics. In the absence of damping (i.e., D = 0.0 ), the system exhibits pronounced oscillations, with large amplitude fluctuations in both angular velocity ω and output power P e , making the system difficult to stabilize quickly. As the damping coefficient increases (e.g., D = 20.0 ), the system’s oscillations are effectively damped, and the fluctuations in angular velocity and power gradually diminish, leading to faster stabilization. Larger damping provides stronger suppression of frequency deviations, reducing system oscillations. However, excessive damping may slow down the system’s response.
In summary, the virtual inertia J and damping coefficient D play complementary roles in balancing the trade-off between response speed and system stability. Increasing virtual inertia enhances system stability by reducing oscillations but slows down the dynamic response. Similarly, increasing damping D mitigates system oscillations and improves stability, but excessive damping can lead to slower system response. Therefore, selecting appropriate values for J and D is crucial to achieving an optimal balance between rapid response and stability in various applications.

4.2. Test System: Modified IEEE-39 Bus System

This section validates the 3N-D controller using the PI-DDPG algorithm on a modified IEEE 39-bus system. Random three-phase ground faults are introduced at buses 2, 9, 13, 17, 23, and 25, with fault occurrence at $t_f = 1.0$ s and fault clearing at $t_c = 1.2$–$1.3$ s. To emulate the observation-to-action latency of a practical EMS, the control action is applied with a delay of 0.2 s after the fault clearing time, while the RL agent only uses state data up to the fault clearing instant for decision-making. This setting effectively incorporates data latency into the simulation design. Three cases are simulated, with load levels in all cases set to 120–126% of the original load level.
As shown in Figure 6, Case 1 and Case 2 represent scenarios with relatively extreme proportions of renewable energy units (20% and 80%, respectively) to evaluate the effectiveness of the method under different renewable energy ratios. In Case 3, the proportion of renewable energy units is 50%, and the method's reliability is tested under high load by increasing the operational load. Case 1: seven SGs (at buses 30, 31, 33, 34, 35, 36, and 37) and three VSGs (at buses 32, 38, and 39). Case 2: two SGs (at buses 30 and 33) and eight VSGs (at buses 31, 32, 34, 35, 36, 37, 38, and 39). Case 3: five SGs (at buses 30, 33, 34, 36, and 37) and five VSGs (at buses 31, 32, 35, 38, and 39).
In the simulation experiments of this paper, the PINNs include two fully connected layers, each containing 128 neurons; the Actor- and Critic-networks each have three fully connected layers, with 1024, 128, and 64 neurons per layer, respectively. Additionally, the activation function is the rectified linear unit (ReLU), the batch size for sampling is 64, the optimizer used is Adam, and the learning rate is 0.001.
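The stated architecture can be reproduced with a few lines of PyTorch; the input and output dimensions below are placeholders (they depend on the bus count and the number of VSGs), while the layer widths, activation, optimizer, and learning rate follow the configuration just described.

import torch.nn as nn
import torch.optim as optim

def mlp(sizes):
    """Fully connected stack with ReLU between layers (none after the output)."""
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

STATE_DIM, N_VSG = 64, 8                 # placeholders: state size, number of VSGs
ACT_DIM = 2 * N_VSG                      # one (J, D) pair per VSG
pinn = mlp([STATE_DIM, 128, 128, 1])     # two 128-neuron hidden layers
actor = mlp([STATE_DIM + 1, 1024, 128, 64, ACT_DIM])       # +1 for the predicted angle
critic = mlp([STATE_DIM + 1 + ACT_DIM, 1024, 128, 64, 1])  # Q(s, delta_hat, a)
optimizers = [optim.Adam(m.parameters(), lr=0.001) for m in (pinn, actor, critic)]
# Mini-batches of 64 samples are drawn from the replay buffer during training.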
Table 1 summarizes the key parameters of the VSGs in the 39-bus system, including the inertia constant J, damping coefficient D, and other control settings. Since there is no universal standard for selecting VSG parameters, a wide range of values can be applied. In this study, we selected the parameters based on simulations reported in [17,20,21,47], combined with the default settings of the simulation platform, to ensure reliable results and provide a basis for further optimization. Although detailed hardware constraints are not explicitly modeled in this work, the chosen values of J and D fall within the feasible ranges reported in the related literature, ensuring that the parameter settings remain physically meaningful and practically relevant. The table lists the configurations adopted in the case study.
  • The apparent power ratings $S_n$ are identical to those of the original conventional SGs.
  • The parameters $k_\omega = 0$ and $k_v = 0$ are used for all VSGs.
  • $K_{pv}$ and $K_{iv}$ represent the voltage controller's proportional and integral gains, while $K_{pI}$ and $K_{iI}$ represent the current controller's proportional and integral gains, respectively.

4.3. Training Efficiency and Deployment Performance

Figure 7 illustrates the convergence of PI-DDPG. The reward curve reflects the overall performance of the policy. As shown in Figure 7a, the rewards for all cases increase rapidly during the early training phase (first 3000 epochs), indicating that the policy network quickly improves its adaptability to the environment. In Case 1, the reward value rises sharply from nearly 0 to over 10 within the first 3000 epochs and then gradually stabilizes throughout the training process, eventually converging around 15. In Case 2, the reward increases rapidly to approximately 14 in the early training phase and remains stable after 3000 epochs, ultimately converging around 15. In Case 3, the reward quickly rises to around 15 and remains stable throughout the training process.
Figure 7b depicts the training process of the value loss function, where the Critic loss for all cases exhibits good convergence. In Case 1, the loss value drops sharply from a high level within the first 3000 epochs and continues to decrease gradually during training, stabilizing after 9000 epochs with reduced fluctuations. In Case 2, the loss value rapidly decreases to around 100 in the early training phase and then remains stable after 3000 epochs, with minimal fluctuation, indicating a highly stable training process. In Case 3, the loss value decreases rapidly to around 80 within the first 3000 epochs and then remains stable throughout the training process, showing excellent convergence.
Figure 7c presents the declining trend of policy loss during training for all cases, demonstrating the optimization effectiveness of the policy network. In the early training stage (first 3000 epochs), the policy loss drops rapidly, followed by a steady decrease between 3000 and 9000 epochs. In the later training stage (after 9000 epochs), the loss value gradually stabilizes at a low level, and the loss curves of all cases converge smoothly, indicating the stability and convergence of the policy network.
In Figure 7d, the physics loss exhibits excellent convergence across all cases, maintaining a stable state throughout training with almost no fluctuations. This performance indicates that the physical constraints are fully satisfied, ensuring that the network optimization consistently follows physical laws, resulting in a highly stable optimization process.
On the other hand, Figure 8 demonstrates the application of the PI-DDPG-based 3N-D controller, particularly highlighting the adaptive adjustment of parameters over time to maintain stability. In Figure 8a, without adaptive tuning, 'VSG-10' in Cases 1 and 2 experiences significant rotor angle increases from around 8 s until the end of the simulation, resulting in power angle instability. However, with PI-DDPG adaptive tuning (Figure 8b), the rotor angles of all generators remain stable throughout the simulation. In Case 3 (right side of Figure 8), the system is more vulnerable to large disturbances under high load. The rotor angles of SGs and VSGs deviate at around 2.5 s, maintaining three clusters of power angles until the end of the simulation. After a large disturbance, PI-DDPG adjusts the VSG parameters, transitioning the system rotor angles to synchronous reduction, achieving stability according to current standards. The PI-DDPG algorithm demonstrates effective and timely parameter adjustments under severe conditions, significantly enhancing power system stability through adaptive tuning of VSG parameters J and D.
The dynamic response of the system in terms of rotor speed ω and electrical power P e under different scenarios is presented in Figure 9. In all three cases without control, the system exhibits significant oscillations during the fault, with noticeable fluctuations in both ω and P e and a delayed return to stability after the fault. Particularly in Case 2 and Case 3, the system experiences larger deviations and slower recovery, indicating its inability to maintain stability effectively without proper control.
However, with PI-DDPG control, the system’s oscillations are significantly reduced and both ω and P e stabilize more quickly across all cases. Even in Case 2 and Case 3, where the uncontrolled system experiences larger disturbances, PI-DDPG ensures a rapid recovery with minimal deviations from the steady-state values.
The virtual inertia J and virtual damping coefficient D are crucial for stabilizing the rotor speed $\omega$, power angle $\delta$, and electrical power output $P_e$. Without control, the system experiences instability, particularly in $\delta$, due to insufficient inertia and damping, which causes significant power angle deviations. For example, in 'VSG-10' with $J = 10$ and $D = 65$, large power angle shifts occur during disturbances, leading to instability.
Therefore, the adaptive tuning of J and D is critical for balancing stability and dynamic performance. Larger values of J reduce frequency fluctuations, while larger values of D enhance damping to suppress oscillations. These parameter adjustments under the PI-DDPG framework ensure system stability across different scenarios. These results confirm that the PI-DDPG controller maintains robust performance even when decision-making is based on delayed observations, which demonstrates its adaptability to latency commonly encountered in EMS.

4.4. Validation of the Effectiveness of PINNs

To validate the feasibility and adaptability of the proposed PI-DDPG-based 3N-D controller, this section compares its performance with that of the Soft Actor–Critic (SAC) algorithm, which is based on stochastic policy optimization.
Figure 10 illustrates the reward curves and Critic loss of different algorithms during the training process. In all cases, DDPG and PI-DDPG converge significantly faster than SAC. Due to the guidance of PINNs, PI-DDPG obtains higher rewards in the early stage, which helps guide policy optimization into the correct range, thereby accelerating convergence. Specifically, the reward functions of PI-DDPG and DDPG converge at approximately 200 epochs across all cases, whereas SAC shows signs of convergence around 8000 epochs in Case 1 and Case 2. Notably, in Case 3, SAC does not converge until 10,000 epochs.
On the other hand, regarding the convergence of the Critic loss function, PI-DDPG and DDPG converge at around 7500 epochs, while SAC continues until the end of the training process. Furthermore, compared to the other two algorithms, PI-DDPG exhibits significantly smaller fluctuations, indicating that the inclusion of PINNs improves the convergence of the reinforcement learning algorithm.
According to the algorithm data in Table 2, the PI-DDPG algorithm quickly achieves and maintains high reward values and success rates across all cases and training epochs, indicating its superior ability to optimize the objective function. In this study, the “success rate” is defined as the proportion of fault scenarios in which the system maintains transient rotor angle stability after adaptive tuning of VSG parameters. A trial is considered successful if the rotor angles of all generators remain synchronized without loss of stability during the post-fault period. This definition directly corresponds to practical transient stability criteria used in grid operation.
In Case 1, where SGs dominate the system, the system has high inertia and damping characteristics. Due to the unbalanced proportion of VSGs, the adaptive control of VSG parameters has certain limitations in improving power angle stability. Under this scenario, PI-DDPG demonstrates a relatively high reward value (−0.03) and success rate (55.43%) even in the early stage (1000 epochs), significantly outperforming DDPG and SAC. This suggests that PI-DDPG effectively enhances system stability by adjusting the limited number of VSG parameters. In the later stage (15,000 epochs), PI-DDPG maintains stable performance with a reward value of 1.91 and a success rate of 60.41%. The relatively lower success rate in Case 1 is attributed to the dominance of SGs, which already provide strong inertia and damping, thus limiting the contribution of VSG-based adaptive control. Since only three VSGs are included in this case, the ability of the proposed controller to further improve system stability is inherently constrained. Nevertheless, PI-DDPG still significantly outperforms DDPG and SAC under the same conditions, validating the effectiveness of the proposed approach.
In Case 2, the system has a high penetration of renewable energy, leading to significantly weakened inertia and damping characteristics, making power angle stability highly dependent on the dynamic adjustment of VSGs and increasing control difficulty. PI-DDPG demonstrates exceptional adaptability from the early stage (1000 epochs), achieving a reward value of 15.00 and a success rate of 96.88%, significantly outperforming DDPG and SAC. This indicates that PI-DDPG can quickly optimize the inertia and damping characteristics of VSGs, effectively compensating for the system’s inertia deficiency and enhancing stability. In the later stage (15,000 epochs), PI-DDPG reaches a near-theoretical optimal performance with a reward value of 15.06 and a success rate of 97.50%, significantly surpassing DDPG (−2.79 reward value; 48.13% success rate). This highlights PI-DDPG’s superior performance and rapid convergence capability in complex scenarios.
In Case 3, the system’s inertia and damping characteristics fall between those of Case 1 and Case 2. PI-DDPG quickly identifies a coordinated control strategy for SGs and VSGs in the early stage (1000 epochs), achieving a reward value of 15.78 and an impressive success rate of 99.38%, far exceeding DDPG (−2.74 reward value; 47.83% success rate) and SAC (−1.65 reward value; 50.93% success rate). In the later stage (15,000 epochs), PI-DDPG’s reward value and success rate remain at the theoretical optimum (15.80 reward value; 99.38% success rate), closely matching SAC’s performance but with significantly faster convergence. This demonstrates PI-DDPG’s efficiency and stability in control.
The PI-DDPG algorithm outperforms others in terms of both performance and learning efficiency, quickly converging to optimal solutions and maximizing rewards across various test scenarios. In contrast, SAC converges more slowly due to its broader strategy search range, as it balances maximizing expected rewards with policy entropy for exploration. While this entropy-based exploration enhances robustness in some environments, it results in slower convergence, particularly in tasks where control actions are more deterministic, as in our setup. DDPG, on the other hand, focuses on deterministic policy optimization, allowing for faster convergence in environments with well-defined control action spaces. By optimizing the policy directly without entropy-based exploration, DDPG efficiently finds the optimal solution in our test cases, where system dynamics require less exploration.
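The simplified PyTorch fragment below contrasts the two actor objectives; the toy network sizes, the fixed exploration standard deviation, and the entropy temperature are illustrative assumptions (a full SAC implementation additionally uses squashed Gaussian policies and twin critics).

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 4, 2  # placeholder dimensions for illustration

critic = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
actor = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, ACT_DIM), nn.Tanh())

s = torch.randn(32, OBS_DIM)  # a batch of observed system states

# DDPG: deterministic policy gradient -- ascend Q(s, mu(s)) directly,
# which converges quickly when the optimal control action is deterministic.
a_det = actor(s)
ddpg_actor_loss = -critic(torch.cat([s, a_det], dim=-1)).mean()

# SAC: entropy-regularized objective -- trade Q against policy entropy,
# which widens exploration but slows convergence in near-deterministic tasks.
alpha = 0.2                                     # entropy temperature (illustrative)
dist = torch.distributions.Normal(actor(s), 0.3)
a_stoch = dist.rsample()                        # reparameterized sample
log_pi = dist.log_prob(a_stoch).sum(-1, keepdim=True)
sac_actor_loss = (alpha * log_pi - critic(torch.cat([s, a_stoch], dim=-1))).mean()
```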
As shown in Figure 7d, the Physics Loss in all three cases remains at a low level of around 0.1 throughout the training process and gradually stabilizes without noticeable oscillations or divergence. Case 1 exhibits slightly lower values, while Case 2 and Case 3 show marginally higher losses due to the increased complexity of system dynamics under high renewable penetration and heavy-load conditions. It should be noted that the Physics Loss is not required to converge to zero; maintaining a stable and low level is sufficient to ensure a balance between physical consistency and data-fitting capability. This trend confirms that the embedded physical constraints are effectively learned by the network, ensuring that the PINNs capture the swing equation dynamics while avoiding overfitting.

The convergence of the PINNs in this study is declared when the total loss (composed of data loss and physics loss) remains stable within a predefined threshold over consecutive training epochs. Unlike methods that force the physics loss down to a very small value, we adopt this criterion because the physics loss may retain residual errors arising from numerical precision limits in power system simulation, discretization of the differential equations, and unavoidable approximation errors in the neural network representation. Forcing the physics loss to diminish excessively may over-constrain the model and degrade prediction accuracy; recognizing convergence once the total loss stabilizes ensures both training feasibility and consistency with physical laws.

To enhance generalization, the PINNs are trained on diverse scenarios covering multiple fault locations, clearing times, and renewable penetration levels. Moreover, the embedded physical constraints derived from the swing equation act as an inherent regularization, keeping the learned solutions consistent with physical laws even under unseen operating conditions.
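The stopping rule described above can be sketched as follows; `patience` and `tol` stand in for the unspecified “consecutive epochs” window and “predefined threshold”, so the values here are assumptions for illustration.

```python
from collections import deque

class ConvergenceMonitor:
    """Declare PINN convergence when the total loss (data + physics) stays
    within a band of width `tol` over the last `patience` epochs, instead of
    requiring the physics residual itself to vanish."""

    def __init__(self, patience: int = 200, tol: float = 1e-3):
        self.history = deque(maxlen=patience)
        self.tol = tol

    def update(self, data_loss: float, physics_loss: float) -> bool:
        self.history.append(data_loss + physics_loss)
        if len(self.history) < self.history.maxlen:
            return False  # not enough epochs observed yet
        return max(self.history) - min(self.history) < self.tol
```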

4.5. Sensitivity Analysis of Reward Penalty Coefficients

To further evaluate the robustness of the proposed physics-informed reinforcement learning (PIRL) framework, we conduct a sensitivity analysis of the reward weight coefficients. The rotor-angle-based transient stability index (TSI) is adopted as the dominant term of the reward, while penalty terms associated with frequency deviation ($\gamma_\omega$) and active power deviation ($\gamma_P$) serve as auxiliary constraints. To quantify the influence of the penalty coefficients ($\gamma_\omega$, $\gamma_\delta$, $\gamma_P$) on the performance and convergence of the control policy, three representative settings are compared:
  • Group A (default): $\gamma_\omega = 1$, $\gamma_\delta = 100$, $\gamma_P = 1$;
  • Group B (weakened penalties): $\gamma_\omega = 0.1$, $\gamma_\delta = 100$, $\gamma_P = 0.1$;
  • Group C (strengthened penalties): $\gamma_\omega = 10$, $\gamma_\delta = 100$, $\gamma_P = 10$.
Each configuration was trained for 15,000 epochs under otherwise identical conditions. The evaluation metrics include cumulative reward, success rate, TSI, and the tracking accuracy of active power and frequency. The results are shown in Figure 11.
The results indicate that TSI consistently dominates the reward shaping across all configurations. Under the default setting, TSI reaches 0.727, which is significantly larger than the contribution from active power deviation (0.056). This confirms that the controller prioritizes rotor angle stability. When the penalties are weakened (Group B), the weight of TSI slightly decreases to 0.668, while the contribution of active power deviation increases markedly to 1.741. This suggests that a looser penalty on power mismatch allows the agent to relax its tracking requirement, thereby allocating more learning capacity to stability optimization. Conversely, when penalties are strengthened (Group C), active power deviation is effectively suppressed to 0.158, while TSI (0.723) and frequency deviation (≈10) remain nearly unchanged compared to the default case. These results demonstrate that the TSI-dominant reward structure guarantees rotor angle stability even under different penalty settings, while power tracking can be flexibly adjusted.
The underlying mechanism behind these observations can be explained by the design of the reward function. The dominance of TSI originates from the positive offset applied in its formulation and the large negative penalty incurred upon instability, which ensures that maintaining synchronism is always prioritized. In contrast, the frequency deviation term shows little variation because the VSG dynamics inherently damp frequency oscillations, and the reward function only considers terminal states, thereby limiting the impact of $\gamma_\omega$. The active power deviation term is the most sensitive component, as it is penalized linearly without any offset, making it directly proportional to $\gamma_P$. Reducing $\gamma_P$ lowers the cost of power mismatch, resulting in larger deviations, while increasing $\gamma_P$ effectively suppresses the mismatch.
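A minimal sketch consistent with this description is shown below. The conventional TSI formula, the offset value of 1.0, and the reuse of $\gamma_\delta$ as the instability penalty magnitude are our assumptions for illustration; the paper's exact reward formulation is not reproduced here.

```python
def tsi(delta_max_deg: float) -> float:
    # Conventional transient stability index, where delta_max_deg is the
    # largest post-fault rotor angle separation (assumed definition).
    return (360.0 - delta_max_deg) / (360.0 + delta_max_deg)

def reward(delta_max_deg, freq_dev_hz, p_dev_pu, stable,
           gamma_omega=1.0, gamma_delta=100.0, gamma_p=1.0):
    """Terminal-state reward mirroring the qualitative structure above:
    a TSI term with a positive offset dominates, instability incurs a large
    negative penalty, and power mismatch is penalized linearly (no offset),
    so its contribution scales directly with gamma_p."""
    if not stable:
        return -gamma_delta                      # heavy penalty upon loss of synchronism
    r_stability = tsi(delta_max_deg) + 1.0       # positive offset keeps this term dominant
    r_freq = -gamma_omega * abs(freq_dev_hz)     # evaluated at the terminal state only
    r_power = -gamma_p * abs(p_dev_pu)           # linear, no offset
    return r_stability + r_freq + r_power

# Penalty settings used in the sensitivity study:
GROUPS = {
    "A (default)":      dict(gamma_omega=1.0,  gamma_delta=100.0, gamma_p=1.0),
    "B (weakened)":     dict(gamma_omega=0.1,  gamma_delta=100.0, gamma_p=0.1),
    "C (strengthened)": dict(gamma_omega=10.0, gamma_delta=100.0, gamma_p=10.0),
}

print(reward(120.0, 0.02, 0.05, stable=True, **GROUPS["A (default)"]))
```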
In summary, the sensitivity analysis confirms that the proposed reward design is robust against variations in the penalty coefficients. TSI consistently dominates the learning process and ensures rotor angle stability, while active power deviation is the most sensitive term, influenced mainly by $\gamma_P$. The default configuration thus provides a balanced trade-off between system stability and power tracking performance.

4.6. Complex System Validation

To validate the capability and generalization of the proposed PI-DDPG method on complex systems, simulations were conducted on the WECC-179 bus case with a high proportion of renewable generation, as illustrated in Figure 12. This case comprises 15 SGs and 14 VSGs, with the VSGs accounting for approximately 50% of total generation. Faults were introduced at buses 16, 47, 53, 75, 85, 110, and 119, with the fault occurrence time $t_f$ and clearing time $t_c$ set as before. The load levels were consistent with the base case of the WECC-179 system.
Figure 13a shows that PI-DDPG again performs consistently well, with DDPG slightly lagging behind, whereas SAC consistently fails to achieve satisfactory results. This matches the conclusions drawn from the experiments on the modified IEEE-39 bus system discussed earlier. The same trends appear in the Critic loss (Figure 13b), confirming that PI-DDPG converges the fastest.
Figure 13c,d depict the deployment performance of PI-DDPG on the modified WECC-179 bus system. In Figure 13c, the time-domain simulation without adaptive VSG control is shown: the system exhibits angular fluctuations from 2 s to 7 s, instability tendencies emerge around 8 s, and the oscillations grow continuously until the end of the simulation. In Figure 13d, with the control decisions made by PI-DDPG, the angular fluctuations are delayed until 4 s and exhibit lower amplitudes and frequencies; the fluctuation amplitude decreases continuously throughout the simulation, indicating that the system ultimately regains rotor angle stability.
In summary, PI-DDPG's ability to significantly reduce training time while achieving high-quality results highlights its practical effectiveness. It demonstrates robustness and scalability across systems of differing complexity, rapidly converging to near-optimal control policies and making it an efficient decision-making tool for real-world applications.

5. Conclusions

This paper presents an adaptive control framework to enhance the transient stability of renewable-dominated power systems under uncertain and time-varying conditions. Focusing on VSGs, we introduce a nonlinear decision module that periodically tunes the virtual inertia and damping coefficients. By embedding physical constraints via PINNs into the reinforcement learning process, the tuning task is formulated as a physics-guided optimization problem and solved with a DDPG-based learning algorithm. The proposed method improves training efficiency and policy reliability, addressing the limitations of purely data-driven approaches. Simulation results on high-load and high-renewable scenarios demonstrate that this hybrid strategy not only enhances control adaptability but also significantly improves transient rotor angle stability. This work offers a promising direction for robust and intelligent grid-forming control in future power systems.
Future work will extend this method to larger-scale systems with multi-area coordination and integrate cybersecurity considerations such as partial observability, communication delays, and data spoofing into the control framework. We will also investigate more stable and sample-efficient reinforcement learning algorithms, such as offline RL and safe RL, and incorporate uncertainty quantification to enhance the robustness and interpretability of the control policy. Moreover, although this study models renewable energy sources through a unified grid-forming interface based on the VSG approach, the proposed control strategy could be adapted to heterogeneous inverter-interfaced RES configurations that account for the specific dynamic characteristics of wind, solar, and other emerging resources. Finally, realistic measurement noise models and filtering techniques will be incorporated, together with the PINNs-based physical constraints, to evaluate the controller's resilience under noisy and uncertain EMS conditions. These directions will further strengthen the applicability of the proposed framework in real-world power systems.

Author Contributions

Conceptualization, J.G. and J.J.Z.; methodology, J.G., S.C. and D.K.; software, J.G. and S.C.; validation, J.G., S.C. and D.K.; formal analysis, J.G.; investigation, J.G.; resources, S.F. and K.J.; data curation, J.G.; writing—original draft preparation, J.G.; writing—review and editing, S.C., D.K., H.J. and D.W.G.; visualization, J.G.; supervision, H.J. and D.W.G.; project administration, J.J.Z.; funding acquisition, S.F. and K.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Science and Technology Foundation of State Grid Corporation of China, grant number 5100-202355764A-3-5-YS.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Shu, Y.B.; Zhang, Z.L.; Tang, Y.; Zhang, F.; Zhong, W. Fundamental Issues of New-type Power System Construction. CSEE J. Power Energy Syst. 2024, 44, 8327–8340. [Google Scholar] [CrossRef]
  2. Xiang, Y.; Wang, T.; Wang, Z. Framework for Preventive Control of Power Systems to Defend Against Extreme Events. CSEE J. Power Energy Syst. 2024, 10, 856–870. [Google Scholar] [CrossRef]
  3. Zhu, S.; Liu, K.; Qin, L.; Li, G.; Hu, X.; Liu, D. Analysis of Transient Stability of Power Electronics Dominated Power System: An Overview. Proc. CSEE 2017, 37, 3948–3962. [Google Scholar]
  4. Khan, A.; Hosseinzadehtaher, M.; Shadmand, M.B.; Bayhan, S.; Abu-Rub, H. On the stability of the power electronics-dominated grid: A renewable energy paradigm. IEEE Ind. Electron. Mag. 2020, 14, 65–78. [Google Scholar] [CrossRef]
  5. Holttinen, H.; Kiviluoma, J.; Flynn, D.; Smith, J.C.; Orths, A.; Eriksen, P.B.; Cutululis, N.; Söder, L.; Korpås, M.; Estanqueiro, A.; et al. System impact studies for near 100% renewable energy systems dominated by inverter based variable generation. IEEE Trans. Power Syst. 2020, 37, 3249–3258. [Google Scholar] [CrossRef]
  6. Tang, Z.; Yang, Y.; Blaabjerg, F. Power electronics: The enabling technology for renewable energy integration. CSEE J. Power Energy Syst. 2022, 8, 39–52. [Google Scholar] [CrossRef]
  7. Geng, H.; He, C.; Liu, Y.; He, X.; Li, M. Overview on Transient Synchronization Stability of Renewable-rich Power Systems. High Volt. Eng. 2022, 48, 3367–3383. [Google Scholar] [CrossRef]
  8. Wu, M.; Lü, Z.; Qin, L.; Song, Z.H.; Sun, L.J.; Zhao, T.; Gao, J. Robust Control Parameter Design for Virtual Synchronous Generator Under Variable Operation Conditions of Grid. Power Syst. Technol. 2019, 43, 3743–3753. [Google Scholar] [CrossRef]
  9. Lu, Y.H. Research on Control Strategy of Microgrid Based on Virtual Synchronous Generator. Master’s Thesis, Jiangnan University, Wuxi, China, 2023. [Google Scholar] [CrossRef]
  10. Chen, J.; Gong, Q.; Zhang, Y.; Fawad, M.; Wang, S.; Li, C.; Liang, J. Comprehensive Assessment of Transient Stability for Grid-Forming Converters Considering Current Limitations, Inertia and Damping Effects. CSEE J. Power Energy Syst. 2025, 11, 1–12. [Google Scholar] [CrossRef]
  11. Xu, H. Study on Transient Voltage Stability and Its Control Measures of Guangdong Power Grid. Doctoral Dissertation, South China University of Technology, Guangzhou, China, 2025. [Google Scholar]
  12. Yang, G. Research on Power Angle Strategy of Virtual Synchronous Generator Based on Flexible Inertia Adjustment. Doctoral Dissertation, Xi'an University of Technology, Xi'an, China, 2021. [Google Scholar] [CrossRef]
  13. Ortiz-Villalba, D.; Rahmann, C.; Alvarez, R.; Canizares, C.A.; Strunck, C. Practical framework for frequency stability studies in power systems with renewable energy sources. IEEE Access 2020, 8, 202286–202297. [Google Scholar] [CrossRef]
  14. Huo, X.L. Research on Power System Transient Instability Recognition and Control Strategy. Doctoral Dissertation, University of Electronic Science and Technology of China, Chengdu, China, 2021. [Google Scholar]
  15. Zhang, H.; Xiang, W.; Lin, W.; Lu, J.; Wang, P.; Guerrero, J. Grid forming converters in renewable energy sources dominated power grid: Control strategy, stability, application, and challenges. J. Mod. Power Syst. Clean Energy 2021, 9, 1239–1256. [Google Scholar] [CrossRef]
  16. Shi, R.; Zhang, X.; Hu, C.; Xu, H.; Gu, J.; Cao, W. Improvement of frequency regulation in VSG-based AC microgrid via adaptive virtual inertia. IEEE Trans. Power Electron. 2019, 35, 1589–1602. [Google Scholar] [CrossRef]
  17. Shi, R.; Zhang, X.; Hu, C.; Xu, H.; Gu, J.; Cao, W. Self-tuning virtual synchronous generator control for improving frequency stability in autonomous photovoltaic-diesel microgrids. J. Mod. Power Syst. Clean Energy 2018, 6, 482–494. [Google Scholar] [CrossRef]
  18. Khodayar, M.; Liu, G.; Wang, J.; Khodayar, M.E. Deep learning in power systems research: A review. CSEE J. Power Energy Syst. 2021, 7, 209–220. [Google Scholar] [CrossRef]
  19. de la Cruz, J.; Wu, Y.; Candelo-Becerra, J.E.; Vásquez, J.C.; Guerrero, J.M. Review of Networked Microgrid Protection: Architectures, Challenges, Solutions, and Future Trends. CSEE J. Power Energy Syst. 2024, 10, 448–467. [Google Scholar] [CrossRef]
  20. Ghodsi, M.R.; Tavakoli, A.; Samanfar, A. Microgrid Stability Improvement Using a Deep Neural Network Controller Based VSG. Int. Trans. Electr. Energy Syst. 2022, 2022, 7539173. [Google Scholar] [CrossRef]
  21. Yang, M.; Wu, X.; Loveth, M.C. A Deep Reinforcement Learning Design for Virtual Synchronous Generators Accommodating Modular Multilevel Converters. Appl. Sci. 2023, 13, 5879. [Google Scholar] [CrossRef]
  22. Venzke, Y.; Chatzivasileiadis, M. Efficient Creation of Datasets for Data-Driven Power System Applications. IEEE Trans. Power Syst. 2021, 36, 5326–5336. [Google Scholar] [CrossRef]
  23. Huo, Y.; Wu, Z.; Dai, J.; Duan, W.; Jiang, J. Lightweight Data-Driven Planning Method for Hybrid Energy Storage Systems. In Proceedings of the 2023 IEEE/IAS Industrial and Commercial Power System Asia (I&CPS Asia), Chongqing, China, 8–11 July 2023; pp. 1831–1836. [Google Scholar] [CrossRef]
  24. Wang, G.; Xin, H.; Wu, D.; Ju, P. Data-Driven Probabilistic Small Signal Stability Analysis for Grid-Connected PV Systems. Int. J. Electr. Power Energy Syst. 2019, 113, 824–831. [Google Scholar] [CrossRef]
  25. Zamzam, A.S.; Sidiropoulos, N.D. Data-Driven Power Flow Linearization: A Regression Approach. IEEE Trans. Power Syst. 2019, 34, 1785–1795. [Google Scholar]
  26. Yang, Y.; Hug, G. Distributed Optimal Power Flow with Data-Driven Sensitivity Computation. IEEE Trans. Smart Grid 2019, 10, 6799–6810. [Google Scholar]
  27. Xu, Y.; Wang, Q.; Mili, L.; Zheng, Z.; Gu, W.; Lu, S.; Wu, Z. A Data-Driven Koopman Approach for Power System Nonlinear Dynamic Observability Analysis. IEEE Trans. Power Syst. 2024, 39, 4090–4102. [Google Scholar] [CrossRef]
  28. Fu, Y.; Zhang, X.; Chen, L.; Tian, Z.; Hou, K.; Wang, H. Analytical Representation of Data-driven Transient Stability Constraint and Its Application in Preventive Control. J. Mod. Power Syst. Clean Energy 2022, 10, 1085–1096. [Google Scholar] [CrossRef]
  29. Zhou, Y.; Wang, X.; Liu, Y.; Zhang, Y. Physics-informed Evolutionary Strategy based Control for Mitigating Delayed Voltage Recovery. Electr. Power Syst. Res. 2023, 223, 109551. [Google Scholar]
  30. Wu, Z.; Zhang, M.; Gao, S.; Wu, Z.G.; Guan, X. Physics-Informed Reinforcement Learning for Real-Time Optimal Power Flow with Renewable Energy Resources. Electr. Power Syst. Res. 2023, 223, 109552. [Google Scholar] [CrossRef]
  31. Zhang, P.; Zhang, Y.; Liu, W.; Hu, J. Physics-Informed Multi-Agent Deep Reinforcement Learning Enabled Distributed Voltage Control for Active Distribution Network Using PV Inverters. IEEE Trans. Smart Grid 2021, 12, 2345–2357. [Google Scholar] [CrossRef]
  32. Li, L.; Chen, X.; Wang, Z.; Sun, K. Physics-Informed Reinforcement Learning for Probabilistic Wind Power Forecasting Under Extreme Events. Renew. Energy 2021, 174, 1063–1073. [Google Scholar] [CrossRef]
  33. Dissanayake, M.; Phan-Thien, N. Neural-network-based approximations for solving partial differential equations. Commun. Numer. Methods Eng. 1994, 10, 195–201. [Google Scholar] [CrossRef]
  34. Lagaris, I.E.; Likas, A.C.; Papageorgiou, D.G. Neural-network methods for boundary value problems with irregular boundaries. IEEE Trans. Neural Netw. 2000, 11, 1041–1049. [Google Scholar] [CrossRef]
  35. Lagaris, I.E.; Likas, A.; Fotiadis, D.I. Artificial neural networks for solving ordinary and partial differential equations. IEEE Trans. Neural Netw. 1998, 9, 987–1000. [Google Scholar] [CrossRef]
  36. Jin, X.; Cai, S.; Li, H.; Karniadakis, G.E. NSFnets (Navier-Stokes flow nets): Physics-informed neural networks for the incompressible Navier-Stokes equations. J. Comput. Phys. 2021, 426, 109951. [Google Scholar] [CrossRef]
  37. Hou, L.; Zhu, B.; Zhang, W. A turbulent channel flow modeling method based on PINN for low Reynolds number. In Proceedings of the 31th National Conference on Hydrodynamics, Chicago, IL, USA, 22–24 November 2020; pp. 1037–1044. [Google Scholar]
  38. Cai, S.; Wang, Z.; Wang, S.; Perdikaris, P.; Karniadakis, G. Physics-informed neural networks for heat transfer problems. J. Heat Transf. 2021, 143, 060801. [Google Scholar] [CrossRef]
  39. Cai, S.; Wang, Z.; Fuest, F.; Jeon, Y.J.; Gray, C.; Karniadakis, G.E. Flow over an espresso cup: Inferring 3-D velocity and pressure fields from tomographic background oriented Schlieren via physics-informed neural networks. J. Fluid Mech. 2021, 915, A102. [Google Scholar] [CrossRef]
  40. Schwenzer, M.; Ay, M.; Bergs, T.; Abel, D. Review on Model Predictive Control: An Engineering Perspective. Int. J. Adv. Manuf. Technol. 2021, 117, 1327–1349. [Google Scholar] [CrossRef]
  41. Saltık, M.B.; Özkan, L.; Ludlage, J.H.; Weiland, S.; Van Den Hof, P.M. An Outlook on Robust Model Predictive Control Algorithms: Reflections on Performance and Computational Aspects. J. Process Control 2018, 61, 77–102. [Google Scholar] [CrossRef]
  42. Younesi, A.; Tohidi, S.; Feyzi, M.R. An Improved Long-horizon Model Predictive Control for DFIG in WECS with Variable Sampling-time. IET Renew. Power Gener. 2022, 16, 517–531. [Google Scholar] [CrossRef]
  43. Hossain, R.R.; Kumar, R. Machine Learning Accelerated Real-Time Model Predictive Control for Power Systems. IEEE/CAA J. Autom. Sin. 2023, 10, 916–930. [Google Scholar] [CrossRef]
  44. Bahrani, B.; Ravanji, M.H.; Kroposki, B.; Ramasubramanian, D.; Guillaud, X.; Prevost, T.; Cutululis, N.A. Grid-Forming Inverter-Based Resource Research Landscape: Understanding the Key Assets for Renewable-Rich Power Systems. IEEE Power Energy Mag. 2024, 22, 18–29. [Google Scholar] [CrossRef]
  45. Banerjee, C.; Nguyen, K.; Fookes, C.; Raissi, M. A Survey on Physics Informed Reinforcement Learning: Review and Open Problems. Expert Syst. Appl. 2023, 287, 128166. [Google Scholar] [CrossRef]
  46. Cui, H.; Li, F.; Tomsovic, K. Hybrid symbolic-numeric framework for power system modeling and analysis. IEEE Trans. Power Syst. 2020, 36, 1373–1384. [Google Scholar] [CrossRef]
  47. Hou, X.; Sun, Y.; Zhang, X.; Lu, J.; Wang, P.; Guerrero, J. Improvement of Frequency Regulation in VSG-Based AC Microgrid Via Adaptive Virtual Inertia. IEEE Trans. Power Electron. 2020, 35, 1589–1602. [Google Scholar] [CrossRef]
Figure 1. VSG process. (a) GFM-VSG components; (b) VSG block.
Figure 2. Overall structure. (a) Timing logic diagram for adaptive transient power angle control; (b) schematic diagram of the principle of adaptive transient power angle control.
Figure 3. Actor architecture with 3N-D module.
Figure 4. PINNs Architecture.
Figure 5. Comparison of the effects of J and D on system response. (a) Impact of J on $\omega$; (b) impact of D on $\omega$; (c) impact of J on $P_e$; (d) impact of D on $P_e$.
Figure 6. Modified IEEE-39 bus system topology.
Figure 7. PI-DDPG training efficiencies. (a) rewards, (b) critic loss, (c) policy loss, and (d) physics loss.
Figure 8. PI-DDPG-based 3N-D controller deployment performance. (a) Without adaptive tuning; (b) PI-DDPG-based adaptive tuning of 3N-D controllers.
Figure 9. System response of $\omega$ and $P_e$ for different cases with/without control. (a) Case 1, (b) Case 2, and (c) Case 3.
Figure 10. Comparison of different algorithms in training. (a) Case 1, (b) Case 2, and (c) Case 3.
Figure 11. Sensitivity analysis of reward weight coefficients.
Figure 12. Modified WECC-179 bus system.
Figure 13. Training performance and deployment of different algorithms on the modified WECC-179 bus system. (a) Rewards, (b) Critic loss, (c) without control, and (d) PI-DDPG-based adaptive control.
Table 1. VSG Parameters.

| Parameter | Value  | Parameter | Value     | Range    |
|-----------|--------|-----------|-----------|----------|
| $T_c$     | 0.01 s | $K_{pv}$  | 0.5 p.u.  | -        |
| $J$       | 10     | $K_{iv}$  | 0.02 p.u. | [4, 20]  |
| $D$       | 65     | $K_{pI}$  | 0.2 p.u.  | [30, 80] |
| $f_n$     | 60 Hz  | $K_{iI}$  | 0.01 p.u. | -        |
Table 2. Comparison of training performance of different algorithms.

| Case | Epochs | PI-DDPG Reward | PI-DDPG Success Rate | DDPG Reward | DDPG Success Rate | SAC Reward | SAC Success Rate |
|------|--------|----------------|----------------------|-------------|-------------------|------------|------------------|
| 1 | 1000   | −0.03 | 55.43% | −9.47 | 29.03% | −8.88 | 30.79% |
| 1 | 5000   | 1.91  | 60.41% | −5.97 | 38.71% | −4.12 | 43.99% |
| 1 | 10,000 | 1.91  | 60.41% | −5.97 | 38.71% | 1.43  | 59.24% |
| 1 | 15,000 | 1.91  | 60.41% | −5.97 | 38.71% | 3.17  | 64.22% |
| 2 | 1000   | 15.00 | 96.88% | −2.86 | 48.13% | −2.68 | 47.50% |
| 2 | 5000   | 15.03 | 97.50% | −2.90 | 48.13% | −0.02 | 55.00% |
| 2 | 10,000 | 15.05 | 97.50% | −2.35 | 48.13% | 12.10 | 88.75% |
| 2 | 15,000 | 15.06 | 97.50% | −2.79 | 48.13% | 14.45 | 96.25% |
| 3 | 1000   | 15.78 | 99.38% | −2.74 | 47.83% | −1.65 | 50.93% |
| 3 | 5000   | 15.80 | 99.38% | −2.74 | 47.83% | 3.04  | 63.98% |
| 3 | 10,000 | 15.80 | 99.38% | −2.74 | 47.83% | 13.83 | 93.79% |
| 3 | 15,000 | 15.80 | 99.38% | −2.74 | 47.83% | 15.88 | 99.38% |