1. Introduction
Electric vehicles (EVs) are gradually replacing traditional fuel-powered vehicles and have emerged as a crucial mode of transportation [1,2,3]. Moreover, the rapid development of vehicle-to-grid (V2G) charging piles enables EVs to function as a critical load-side resource that provides frequency response (FR) [4,5].
Meanwhile, with the large-scale integration of renewable energy sources into sustainable energy systems [6], the inertia of the new power system is declining. As a result, the system frequency changes more rapidly in response to disturbances, which poses significant safety risks [7]. Virtual synchronous generators (VSGs) have therefore attracted considerable attention: a VSG can provide inertia support to the new power system, effectively slowing the rate of change of frequency [8,9].
Given the above context, applying inertia control strategies within charging stations (CSs) to enhance FR performance has emerged as a critical research direction in the energy field [10,11]. The traditional VSG control strategy with constant parameters can provide inertia to the system nodes but loses part of the dynamic regulation ability [12]. Therefore, many scholars have studied the adaptive adjustment of VSG inertia. Ref. [13] analyzed the correlation between frequency change and the inertia coefficient and proposed a VSG control strategy with adaptive virtual inertia. Refs. [14,15] applied a fuzzy controller to the VSG control strategy to realize adaptive virtual inertia. However, these control strategies focus only on improving the frequency dynamics of a single VSG, and the FR capability of an individual CS is insufficient to improve the overall FR effect. Additionally, for CSs participating in FR, the inertia adjustments in the aforementioned strategies do not consider how FR capabilities vary over time and differ between CSs.
The dynamic characteristics of multiple VSGs interact with each other, causing oscillations in the frequency and the CS response power at each node [16,17], which seriously shortens EV battery service life. Recent research on multi-VSG control can be divided into two types: physics-driven and data-driven.
For physics-driven methods, some scholars modified the VSG control strategy with adaptive control, fuzzy control, and optimization algorithms. Ref. [18] used a particle swarm optimization algorithm to optimize VSGs’ control parameters. However, the optimized parameters cannot be changed in real time. Ref. [19] proposed a cooperative virtual inertia control strategy for multiple energy storage stations (ESSs) to coordinate state of charge (SOC) balance and inertia enhancement. Ref. [20] analyzed the relationship between frequency and inertia and proposed a multi-CS control strategy based on adaptive virtual inertia. However, the adaptive inertia in the aforementioned purely physics-driven methods relies solely on fixed equations and constant coordination coefficients for CS VSGs. This reduces the accuracy of inertia adjustments and degrades frequency control performance and adaptability under various FR scenes, particularly when communication is interrupted.
Therefore, for the design of an adaptive VSG control strategy, the deep reinforcement learning (DRL) algorithm, which does not depend on a specific model, has received widespread attention in recent years [21,22]. Through interaction with the environment, DRL accumulates data effectively and continuously improves its control performance during training. DRL methods can be divided into two categories. The first is the value-based method, such as the deep Q-network [23]. However, the action space of the deep Q-network is discrete: when it is applied to inertia control, the change in virtual inertia is discretized, which degrades the dynamic FR performance of the system. The second is the policy gradient method [24], which is suitable for continuous, high-dimensional action spaces. Ref. [25] took the active power output, the angular frequency, and its derivatives as observations, and transformed the optimal VSG adaptive control problem into a deep deterministic policy gradient problem. However, as a deterministic policy algorithm, deep deterministic policy gradient is limited in its action selection [26]. Ref. [27] proposed a load frequency control method based on proximal policy optimization to balance generation cost and frequency stability in an isolated microgrid. However, as an on-policy algorithm, proximal policy optimization depends heavily on the number and quality of samples.
The soft actor-critic (SAC) algorithm introduces action entropy so that action outputs remain dispersed and useful actions are not missed during training, and it also uses an experience replay pool to improve sample utilization. Therefore, Ref. [28] introduced SAC to train an agent that takes the active power and frequency as state inputs and outputs virtual inertia adjustment actions for optimal FR. However, the above studies use only a single agent with one VSG as the single control object. When several CSs are connected to the power system, the training time and dimension under a centralized control architecture increase exponentially [29]. The multiagent DRL (MA-DRL) method scales well and can effectively handle multiple control objects through decentralized execution. Ref. [30] proposed a distributed dynamic inertial droop control strategy based on MA-DRL for multiple ESSs. However, this purely data-driven method focuses on directly adjusting inertia for ESSs and is not suitable for CSs. Firstly, because EV users’ charging demands vary, CSs’ FR capabilities change more frequently than those of ESSs. Additionally, compared with ESS batteries participating in FR, the service life of EV batteries is more susceptible to degradation from frequent charge and discharge cycles. Consequently, the diversity of CS FR scenes complicates the training environment, which reduces the learning efficiency of purely data-driven methods that lack a priori knowledge.
Previous studies either rely solely on fixed equations under a purely physics-driven method, which diminishes the accuracy of inertia adjustments, or ignore the complexity of FR scenes involving CS participation, which impairs the learning efficiency of a purely data-driven method. In contrast, this paper proposes a physics-data fusion enhanced VSG control strategy for active FR with multiple CSs. The main contributions of this paper are as follows:
- (1) This paper builds a power grid frequency control framework with the participation of multiple CS VSGs based on physics-data fusion. Through MA communication within this framework, an application basis for the reinforcement learning method in CS VSG control is established, which accounts for the variations and differences in CS FR capabilities.
- (2) The relationship between the angular velocity changes of multiple CS VSGs and their inertia is integrated into the data-driven method as prior knowledge. Under the proposed framework, this enhances the agents’ learning efficiency, the interpretability of the FR improvement, and the generalization to unknown scenes.
- (3) The proposed control strategy uses the MA-SAC algorithm to dynamically adjust the coordination coefficients between CS VSGs. MA-SAC realizes the cooperative control idea of “centralized training and decentralized execution” to improve adaptability under various FR scenes, including situations with communication interruptions.
2. Power Grid Frequency Control Architecture with Distributed CSs Participation
When a power disturbance occurs, the energy control centers (ECCs) in the CSs can change the charging power of EVs through V2G charging piles. To achieve this, multiple CSs must be incorporated into a power grid frequency control framework.
2.1. Frequency Control Architecture of Power Grid with the CS Participation
In recent years, many scholars have studied CS FR capability evaluation methods [10,31] and V2G response models [19,32], making CSs a vital frequency regulation resource in the power grid frequency control architecture. Based on EV users’ charging demands and EV information, the CS FR capability evaluation method aggregates the boundary of the CS response power, which comprises the CS equivalent power generation capability and the CS equivalent load capability. Furthermore, based on the frequency situation, the V2G response model distributes the CS response power among the EVs in the CS.
To further improve the power grid frequency control architecture with CS participation based on the above research, this paper focuses on adaptive inertia design for CS VSGs. The frequency response control architecture with CS participation used in this paper is shown in Figure 1, where each CS is equipped with an ECC, a CS VSG controller, and a CS VSG agent. The ECC evaluates each EV’s FR capability based on the EV users’ charging demands received from the charging piles, and then aggregates the CS FR capability, which comprises the CS equivalent power generation capability and the equivalent load capability. Based on these two capabilities, the rotor angular velocity ω, the CS VSG electromagnetic power received from the corresponding node, and the original charging power plan, each CS VSG outputs the CS response power to the ECC and the PWM signal to the CS grid-connected inverter to provide inertia for the power system. Based on the CS response power, the ECCs control the V2G charging piles to adjust the EVs’ charging plans.
2.2. Information Interaction Modeling for Distributed CSs
Based on graph theory, the communication topology of an MA system composed of n agents can be represented by a directed graph G = (V, E, A), where V = {V_1, V_2, …, V_n} is the node set; E ⊆ V × V is the edge set; and A = [b_ij]_{n×n} is the adjacency matrix of the communication topology, whose diagonal elements are b_ii = 0, i = 1, 2, …, n. When the directed edge E_ij = (V_i, V_j) ∈ E, b_ij = 1; otherwise, b_ij = 0.
In the context of graph theory, a CS equipped with a VSG agent can be modeled as a node within a network [20,30]. Under 5G communication technology, the communication delay can be ignored when the communication distance between CSs is within a certain range [20]. This technological advantage enables CS VSG agents to exchange information.
Assuming that CS_i VSG agent i can receive information from CS_j VSG agent j, CS_j VSG agent j is defined as an adjacent agent (AAG) of CS_i VSG agent i, and CS_i VSG agent i is defined as the leading agent (LAG) of CS_j VSG agent j. If this communication is two-way, CS_i VSG agent i and CS_j VSG agent j are each an AAG and a LAG of the other, and b_ij = b_ji = 1. When the communication channel between CS_i VSG agent i and CS_j VSG agent j is suddenly interrupted, b_ij and b_ji change from 1 to 0.
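The adjacency-matrix bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the 0-based indexing, the four-agent ring topology (which is not the topology of Table 1), and the reading of b_ij = 1 as "agent i receives information from agent j" are all assumptions made for the example.

```python
import numpy as np

def build_adjacency(n, edges):
    """Return the n x n adjacency matrix A = [b_ij] for directed edges (i, j)."""
    A = np.zeros((n, n), dtype=int)
    for i, j in edges:
        if i != j:  # diagonal elements b_ii stay 0
            A[i, j] = 1
    return A

def interrupt_link(A, i, j):
    """Return a copy of A with the two-way channel between agents i and j cut."""
    B = A.copy()
    B[i, j] = 0
    B[j, i] = 0
    return B

def adjacent_agents(A, i):
    """Indices j with b_ij = 1, read here as the AAGs of agent i (assumed convention)."""
    return [int(j) for j in np.flatnonzero(A[i])]

# Hypothetical 4-agent ring with two-way links; NOT the topology of Table 1.
ring = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2), (3, 0), (0, 3)]
A = build_adjacency(4, ring)
A_cut = interrupt_link(A, 1, 2)  # channel between agents 1 and 2 fails
```

After the interruption, agent 1 in this toy topology only hears from agent 0, so any inertia rule that weights neighbours equally loses one of its two inputs.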
Based on the above background, the process by which a LAG adjusts the inertia of its local CS VSG is illustrated in Figure 2. In Figure 2, CS VSG agents exchange state information according to the communication topology A. Then, using the state information received from the local CS VSG and its AAGs, the LAG adjusts the local CS VSG’s inertia to enhance the dynamic adjustment ability of the power system. The following sections specify the relationship between the inertia adjustment and the state information.
4. CS VSGs Control Strategy Based on Physics-Data Fusion Model
The adaptive inertia equation established in Section 3 is incomplete: it treats the influence of every AAG on the LAG’s inertia as equal. Therefore, it is necessary to introduce a data-driven method to adjust the coordination coefficients, enhancing adaptability and reducing frequency oscillation and CS response power oscillation.
4.1. Physics-Data Fusion Modeling
The physics-driven adaptive inertia is rigorous because it is grounded in mechanism analysis, but its model construction is overly idealized. It cannot distinguish how strongly different AAGs influence a LAG, which reduces inertia adjustment accuracy and makes it difficult to reduce frequency oscillation and CS response power oscillation. The data-driven model is flexible because it learns from experience replay, which suits complex scenes. However, when CSs participate in FR, the diversity of CS FR capabilities, communication conditions, and power disturbances leads to low training efficiency in the absence of a priori analysis, so that the actions adjusting virtual inertia become erratic.
Given the respective limitations of the physics-driven and data-driven methods, a physics-data fusion model that integrates mechanistic rules with experience replay can be applied more effectively to the frequency response control architecture with CS participation. By applying reinforcement learning to optimize the coordination coefficients within the physics-driven adaptive CS VSG inertia, this approach not only enhances the accuracy of inertia adjustments by distinguishing the different impacts of AAGs on LAGs but also improves MA training efficiency by incorporating a priori knowledge.
4.2. SAC Basic Principles
This paper uses MA-SAC to realize the cooperative control idea of “centralized training and decentralized execution”, adjusting the coordination coefficients between CS VSGs. The SAC discounted cumulative reward function M is as follows:
where π_β is the agent’s action strategy, reflecting the probability distribution of the agent’s action selection; s_t and a_t are respectively the state and the action taken by the agent at step t; E[·] is the mathematical expectation; (s_t, a_t) ~ ρ_π denotes the distribution of trajectories (s_t, a_t) under the strategy π; r(s_t, a_t) is the reward value based on a_t and s_t; α is the entropy regularization parameter, which determines the importance of the entropy term and how exploratory the strategy is; H(π_β(·|s_t)) is the action entropy of the strategy π under state s_t; and β is the network parameter of π_β.
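For orientation, the standard maximum-entropy objective from the SAC literature can be written with the symbols defined above. This is an assumed textbook form, offered as a sketch rather than as the exact expression used in this paper; in particular, any discount weighting inside the sum is omitted here.

```latex
M(\pi_\beta) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \left[ r(s_t, a_t) + \alpha \, H\!\left( \pi_\beta(\cdot \mid s_t) \right) \right]
```

In this form, a larger α rewards more dispersed action distributions, which matches the role of α as the parameter that sets the importance of the entropy term.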
The SAC actor network models the action strategy π_β, and the critic network evaluates the strategy through the Q-value function Q_θ(s_t, a_t), defined as follows:
where Q_θ(s_t, a_t) is the sum of the expected discounted reward and the expected action entropy from the state-action pair (s_t, a_t) to the end; e is the reward discount factor; and E_{s_{t+1}~ρ_π} denotes the expectation over all states s_{t+1} under the trajectory distribution ρ_π.
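A recursion consistent with this description is the soft Bellman equation commonly used for SAC, shown here with the symbols above (e as the discount factor). It is the textbook form and is assumed for illustration; the exact expression used in this paper may be arranged differently.

```latex
Q_\theta(s_t, a_t) = r(s_t, a_t)
  + e \, \mathbb{E}_{s_{t+1} \sim \rho_\pi}
    \left[ \mathbb{E}_{a_{t+1} \sim \pi_\beta}
      \left[ Q_\theta(s_{t+1}, a_{t+1})
        - \alpha \log \pi_\beta(a_{t+1} \mid s_{t+1}) \right] \right]
```

The inner term −α log π_β is the per-step entropy contribution, so unrolling the recursion yields the sum of discounted rewards and action entropies from (s_t, a_t) onward.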
The SAC algorithm consists of three neural networks: the actor network, the critic network, and the target critic network. Their parameters are respectively updated as follows:
where M_π(β) and M_Q(θ) are respectively the loss functions with the parameters β and θ as independent variables; D is the experience replay pool; and τ is the update coefficient.
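The role of the update coefficient τ can be illustrated with a short Python sketch. It assumes the common soft (Polyak) update rule for the target critic, in which τ weights the online critic parameters; the function name `soft_update` and the toy parameter arrays are hypothetical, and the gradient steps on M_π(β) and M_Q(θ) are deliberately left out.

```python
import numpy as np

def soft_update(target_params, online_params, tau):
    """Assumed rule: theta_target <- tau * theta + (1 - tau) * theta_target."""
    return [tau * p + (1.0 - tau) * tp
            for tp, p in zip(target_params, online_params)]

# Toy critic parameters (hypothetical shapes and values).
theta = [np.array([1.0, 2.0]), np.array([[0.5]])]
theta_target = [np.zeros(2), np.zeros((1, 1))]

# One soft update with a small tau moves the target only slightly.
theta_target = soft_update(theta_target, theta, tau=0.01)
```

A small τ keeps the target critic changing slowly, which stabilizes the bootstrapped Q-value targets during training.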
4.3. Adaptive Virtual Inertia Modified by Physics-Data Fusion
The power grid frequency control framework with multiple CS participation based on physics-data fusion is shown in Figure 5. It includes the environment and the CS VSG agents. The environment comprises the power grid frequency control architecture with CS participation and the CSs’ communication topology. The agents interact with the environment and obtain information from their AAGs based on the communication topology. The state s_{i,t}, action a_{i,t}, and reward r_{i,t} are respectively defined below:
- (1) state:
where s_{i,t} is the state information of CS_i VSG agent i at the t-th step. It consists of the CS_i VSG active power instruction, the FR capabilities of CS_i VSG and its AAGs, and the angular velocities of CS_i VSG and its AAGs. The FR capabilities comprise the equivalent generation power and equivalent load power of CS_i VSG and those of its j-th AAG. The angular velocity information comprises Δω_{i,t} and dω_{i,t}/dt, which are respectively the CS_i VSG angular velocity and its derivative, and Δω_{j,t} and dω_{j,t}/dt, which are respectively the angular velocity and its derivative received from its j-th AAG.
- (2) action:
where a_{i,t} is the action of actor network i at the t-th step, and k_{i,j,t} is the coordination coefficient between CS_i VSG and CS_j VSG at the t-th step. The proposed method can therefore adjust the CS VSGs’ coordination coefficients to enhance inertia adjustment accuracy.
- (3) reward:
where r_{i,t} is the reward value of CS_i VSG agent i at the t-th step, reflecting the negative sum of squared angular velocity differences between CS_i VSG and its AAGs; q is the reward weight [30], set to 4 × 10^7 in this paper; r_all is the total reward summed over all agents and control steps; T_s and T_f are respectively the control time of each step and the simulation time per episode; and N is the number of control steps.
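The reward described in item (3) can be sketched as follows. The exact expression is not reproduced here, so the form r_{i,t} = −q Σ_j b_ij (Δω_{i,t} − Δω_{j,t})², the relation N = T_f / T_s, the function names, and the sample values are all assumptions chosen to match the verbal description; only the weight q = 4 × 10^7 comes from the text.

```python
import numpy as np

Q_WEIGHT = 4e7  # reward weight q reported in the text

def agent_reward(i, d_omega, A, q=Q_WEIGHT):
    """Assumed form: r_i = -q * sum_j b_ij * (d_omega_i - d_omega_j)**2."""
    diff = d_omega[i] - d_omega
    return float(-q * np.sum(A[i] * diff ** 2))

def total_reward(d_omega_traj, A, q=Q_WEIGHT):
    """Sum of all agents' rewards over all control steps (rows of d_omega_traj)."""
    n = A.shape[0]
    return sum(agent_reward(i, step, A, q)
               for step in d_omega_traj for i in range(n))

def num_control_steps(T_f, T_s):
    """Assumed relation N = T_f / T_s between episode length and step time."""
    return int(round(T_f / T_s))

# Hypothetical two-agent example with a two-way link.
A2 = np.array([[0, 1], [1, 0]])
d_omega = np.array([1e-4, 0.0])   # illustrative angular velocity offsets
r0 = agent_reward(0, d_omega, A2)
r_all = total_reward(np.array([d_omega]), A2)
```

Because the reward is a negative squared difference, it is maximized (at zero) only when neighbouring CS VSGs share the same angular velocity offset, which is what drives the agents to suppress inter-node oscillation.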
During training, through interaction with the environment, the agents gradually learn to adjust their coordination coefficients reasonably, taking CS FR capabilities into account, to improve inertia adjustment accuracy. As the rewards are maximized, the frequency oscillation between different nodes is restrained. Restraining the frequency oscillation also reduces the response power oscillation, which prevents EVs in the CS from switching states frequently and thus protects EV battery service life.
4.4. Assessment Index
To compare the control effect with that of other control strategies, the following assessment indexes are designed.
where R_i is the sum of squared frequency offsets between the CS_i VSG node and the other CS VSG nodes; V_i is the sum of the absolute values of the CS_i response power oscillation; R_all and V_all are the sums of R_i and V_i, respectively, over i = 1, 2, …, n; Δf_i(t) is the frequency offset of the CS_i VSG node relative to the rated frequency at time t; t_0 is the simulation start time; t_1 is the simulation end time; and Δt is the time interval over which the CS_i VSG response power at time t is compared. When different control strategies are applied in CSs, the smaller R_i or R_all is, the smaller the frequency offsets between different nodes are; the smaller V_i or V_all is, the less frequently the EV status is switched and the lower the EV battery loss is.
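One way to compute indexes of this kind from sampled simulation output is sketched below. The index formulas themselves are not reproduced here, so this sketch assumes R_i is the time integral of Σ_j (Δf_i − Δf_j)² and V_i is the time integral of |P_i(t) − P_i(t − Δt)|, both approximated by Riemann sums on a uniform grid; these assumed forms were chosen to match the verbal definitions and the reported units (Hz²·s and MW·s). Function names and sample data are hypothetical.

```python
import numpy as np

def frequency_index(delta_f, i, dt):
    """Assumed R_i: integral over time of sum_j (df_i - df_j)**2 (Riemann sum)."""
    diff = delta_f[i][None, :] - delta_f   # shape (n, T); row i is all zeros
    return float(np.sum(diff ** 2) * dt)

def power_oscillation_index(p_resp, dt, lag_steps=1):
    """Assumed V_i: integral over time of |P(t) - P(t - lag)| (Riemann sum)."""
    change = np.abs(p_resp[lag_steps:] - p_resp[:-lag_steps])
    return float(np.sum(change) * dt)

# Hypothetical samples: 2 nodes, 3 time points, dt = 0.5 s.
delta_f = np.array([[0.0, 0.1, 0.2],
                    [0.0, 0.0, 0.0]])          # Hz
p_resp = np.array([0.0, 1.0, 0.0, 1.0])         # MW
R_0 = frequency_index(delta_f, 0, dt=0.5)
V_0 = power_oscillation_index(p_resp, dt=0.5)
R_all = sum(frequency_index(delta_f, i, dt=0.5) for i in range(2))
```

Under these assumptions, a perfectly synchronized set of nodes gives R_i = 0 and a constant response power gives V_i = 0, which is consistent with smaller values indicating better control.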
5. Results and Discussion
The effectiveness of the proposed strategy is verified by the following simulations. Section 5.1 gives the simulation scene; Section 5.2 presents the MA training process; Section 5.3 compares the proposed control strategy with the traditional CS VSGs and adaptive CS VSGs control strategies.
5.1. Simulation Scene
To verify the proposed strategy, four CS VSGs are used to replace the SGs at nodes 31, 33, 36, and 39 in the IEEE 39-node system. The four CS VSGs’ initial inertia information and communication topology are shown in Table 1. The IEEE 39-node system after the CS VSG replacement is shown in Figure 6, where 1–39 denote the network nodes.
In addition, load disturbances, the communication topology, and the CSs’ FR capabilities are considered to establish random scenes: (1) Load disturbance: a load disturbance ranging from −1 MW to 1 MW is randomly generated at the load nodes; (2) Communication topology: the communication between the CS_2 VSG agent and the CS_3 VSG agent is weak, with a 10% probability of interruption, in which case b_{2,3} and b_{3,2} change from 1 to 0; (3) CSs’ FR capabilities: 24 situations are set to simulate the differences and variability of the four CSs’ FR capabilities for every hour of one day. Then 200 groups of scenes are randomly generated and divided into a training set and a test set at a ratio of 3:1.
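The random scene construction above can be sketched in Python as follows. The uniform distribution of the disturbance, the fixed seed, the dictionary field names, and the sequential 3:1 split are assumptions made for illustration; the choice of load node and the hourly FR capability values themselves are omitted because they are not specified here.

```python
import random

def generate_scenes(n_scenes=200, seed=0):
    """Draw random FR scenes: disturbance size, link state, and hour of day."""
    rng = random.Random(seed)
    scenes = []
    for _ in range(n_scenes):
        scenes.append({
            # Load disturbance in MW (uniform sampling is an assumption).
            "load_disturbance_mw": rng.uniform(-1.0, 1.0),
            # CS2-CS3 channel fails with 10% probability (b_23 = b_32 = 0).
            "link_2_3_intact": rng.random() >= 0.10,
            # Index of one of the 24 hourly FR capability situations.
            "hour": rng.randrange(24),
        })
    return scenes

def split_scenes(scenes, train_ratio=0.75):
    """Split scenes 3:1 into training and test sets (sequential split assumed)."""
    k = int(len(scenes) * train_ratio)
    return scenes[:k], scenes[k:]

scenes = generate_scenes()
train_set, test_set = split_scenes(scenes)
```

With 200 scenes, the 3:1 split yields the 150 training scenes and 50 testing scenes used in the later comparison.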
Additionally, the hyperparameter settings are the same for every CS VSG agent, as shown in Table 2. The simulations are run in MATLAB/Simulink 2023a on a computer with a 13th Gen Intel(R) Core(TM) i9-13900F CPU and 32 GB of memory.
5.2. The MA Training Process
The value range of each parameter in the action a_{i,t} is set to [0, 6 × 10^5], and the training set is cycled through to train the four CS VSG agents. The four agents’ reward training curves under the proposed control strategy are shown in Figure 7. The dark green curves represent the average cumulative reward over the most recent 150 training episodes, and the light green curves represent the cumulative reward of each episode. In addition, to verify that the proposed control strategy with prior knowledge learns more efficiently, the four agents’ reward training curves under a purely data-driven method are shown in Figure 8. The SAC parameters of the proposed strategy and the purely data-driven method are the same.
In Figure 7, for the proposed control strategy, the CS VSGs’ coordination coefficient adjustments are disordered during the first 500 training episodes, which leads to disordered inertia adjustment and a low reward value. As MA-SAC training proceeds, the coordination coefficient adjustments gradually become reasonable, so the reward value increases gradually and stabilizes after 1200 training episodes. In Figure 8, for the purely data-driven method, the inertia adjustments are still disordered during the first 750 training episodes, and the reward value has not stabilized after 1500 training episodes.
These training results show that, under the proposed control strategy, the CS VSG agents not only learn effectively but also learn more efficiently, because the prior knowledge makes the data-driven evolution interpretable. Meanwhile, the power grid frequency control framework with the participation of multiple CS VSGs based on physics-data fusion proposed in this paper provides an application basis for the reinforcement learning method in CS VSG control.
5.3. Control Effect of the Proposed Strategy
The control performance of the proposed strategy is compared with that of the traditional CS VSGs and adaptive CS VSGs control strategies. The CSs’ initial inertia and communication topology for these control strategies are shown in Table 1. The inertia of the traditional CS VSGs control strategy is constant. The adaptive CS VSGs control strategy is established based on Section 3.3, with all coordination coefficients set to the same value of 2 × 10^5.
Firstly, 150 training scenes and 50 testing scenes are applied to the above three control strategies, and the cumulative reward of each scene is compared in Figure 9. In Figure 9a, for the training set, the average r_all of the traditional CS VSGs control strategy, the adaptive CS VSGs control strategy, and the proposed strategy is respectively −292.67, −236.04, and −154.26. In Figure 9b, for the testing set, the average r_all of the three strategies is respectively −298.43, −239.08, and −168.00. The average r_all of the proposed strategy is thus only 65.353% and 70.269% of that of the adaptive CS VSGs control strategy for the training set and the testing set, respectively, that is, a smaller penalty in both cases. This means that the proposed strategy has better adaptability under various FR scenes and better generalization to unknown scenes, because it uses MA-SAC to adjust the coordination coefficients with consideration of CS FR capability and thereby achieves more accurate inertia adjustment.
To show specifically how the proposed strategy improves the system FR effect and adapts to different FR scenes, this paper constructs five cases. A load of 100 MW is suddenly added at node 25, and the FR capabilities of CS_1, CS_2, CS_3, and CS_4 are respectively 3.33 MW, 5.38 MW, 1.90 MW, and 5.41 MW. The control strategy applied in the CS grid-connected inverters and the communication topology for the five cases are as follows:
Case 1 applies traditional CS VSGs, and the communication is intact as shown in Table 1; Case 2 applies adaptive CS VSGs, and the communication is intact as shown in Table 1; Case 3 applies the proposed control strategy, and the communication is intact as shown in Table 1; Case 4 applies adaptive CS VSGs, and the communication is as shown in Table 1 except that the communication between CS_2 VSG agent 2 and CS_3 VSG agent 3 is interrupted; Case 5 applies the proposed control strategy, and the communication is as shown in Table 1 except that the communication between CS_2 VSG agent 2 and CS_3 VSG agent 3 is interrupted.
5.3.1. Intact Communication
To verify that the proposed control strategy can reduce the frequency differences between nodes and the CS response power oscillation, the frequency, inertia, and CS response power for Case 1, Case 2, and Case 3 are shown in Figure 10.
Firstly, the two red boxes in Figure 10a,d show that the frequency offset between different nodes under the adaptive CS VSGs control strategy is smaller than that under the traditional CS VSGs control strategy. This is because adaptive CS VSGs adjust their inertia according to the angular velocity offset and its rate of change, both of the local CS VSG and of its AAGs, to suppress the angular velocity difference, as can be seen in Figure 10e.
Moreover, the two red boxes in Figure 10d,g show that, by further adjusting the CS VSGs’ coordination coefficients, the proposed control strategy yields a smaller frequency offset between different nodes than the adaptive CS VSGs control strategy. Meanwhile, in Figure 10c,f,i, the CS response power oscillation of the proposed strategy is the weakest, which prevents EVs in the CSs from switching charge state frequently and protects EV battery service life.
5.3.2. Damaged Communication
To verify the proposed control strategy’s adaptability under a communication interruption, this paper considers Case 4 and Case 5, in which the communication between CS_2 VSG agent 2 and CS_3 VSG agent 3 is interrupted. The frequency, inertia, and CS response power for Case 4 and Case 5 are shown in Figure 11.
Firstly, comparing Case 2 in Figure 10d with Case 4 in Figure 11b, which both apply adaptive CS VSGs, shows that the communication interruption reduces the inertia adjustment of CS_2 VSG and CS_3 VSG in Case 4. Comparing Case 3 in Figure 10h with Case 5 in Figure 11e, which both apply the proposed control strategy, shows that the influence of the communication interruption still exists but is reduced. This is because the proposed control strategy can adjust the coordination coefficients k_{2,1} and k_{3,4} to compensate.
5.3.3. Assessment Index Comparison Under Different Cases
To further explain the proposed control strategy’s adaptability during a communication interruption, Table 3 shows the assessment index results for Cases 1–5.
Firstly, for traditional CS VSGs, the control effect is the same whether the communication is interrupted or intact, because their inertia is constant. This leads to weak dynamic adjustment: R_all and V_all are 49.322 Hz²·s and 35.503 MW·s, respectively, both relatively large.
In contrast, the adaptive CS VSGs control strategy is seriously affected by communication interruption. R_all under Case 4 increases markedly, by 8.996%, to 41.763 Hz²·s, compared with 38.316 Hz²·s under Case 2. V_all of the adaptive CS VSGs control strategy under Case 4 even increases by 1.023% to 35.870 MW·s, compared with 35.503 MW·s for traditional CS VSGs under Case 1. This is because the adaptive CS VSGs control strategy, as a purely physics-driven method, relies solely on fixed equations with constant coordination coefficients for CS VSGs, which diminishes the accuracy of inertia adjustments and the adaptability under different FR scenes.
Meanwhile, the proposed control strategy is also affected by communication interruption, but not seriously. R_all under Case 5 increases only slightly, by 6.866%, to 20.421 Hz²·s, compared with 19.109 Hz²·s under Case 3. V_all of the proposed control strategy under Case 5 is 31.111 MW·s, only 87.629% of V_all under Case 1. This is because the physics-data fusion modeling of the proposed control strategy adjusts the coordination coefficients, which alleviates the influence of the communication interruption.
Finally, when the communication is intact, R_all and V_all under Case 3 are respectively 49.872% and 79.542% of those under Case 2; when the communication between CS_2 VSG agent 2 and CS_3 VSG agent 3 is interrupted, R_all and V_all under Case 5 are still respectively 48.897% and 86.733% of those under Case 4. These results show that the proposed control strategy has strong adaptability under various FR scenes, including situations with communication interruptions, and can effectively improve the system’s dynamic FR characteristics by reducing both the CS response power oscillation and the frequency offset between different nodes.
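The percentages quoted in this subsection can be reproduced from the index values stated in the text, which also makes the pairing of cases explicit: the intact-communication comparison sets the proposed strategy (Case 3) against the adaptive strategy (Case 2), and the interrupted-communication comparison sets Case 5 against Case 4. The sketch below uses only values quoted above; the helper names are hypothetical, and V_all for Cases 2 and 3 is left out because it is not quoted in the text.

```python
# Index values quoted in the text (R_all in Hz^2*s, V_all in MW*s).
R_ALL = {1: 49.322, 2: 38.316, 3: 19.109, 4: 41.763, 5: 20.421}
V_ALL = {1: 35.503, 4: 35.870, 5: 31.111}

def percent_of(value, reference):
    """Express value as a percentage of reference."""
    return 100.0 * value / reference

def percent_increase(value, reference):
    """Relative increase of value over reference, in percent."""
    return 100.0 * (value - reference) / reference

r_intact = percent_of(R_ALL[3], R_ALL[2])             # proposed vs adaptive, intact
r_interrupted = percent_of(R_ALL[5], R_ALL[4])        # proposed vs adaptive, interrupted
v_interrupted = percent_of(V_ALL[5], V_ALL[4])
v_vs_traditional = percent_of(V_ALL[5], V_ALL[1])
r_adaptive_rise = percent_increase(R_ALL[4], R_ALL[2])  # adaptive: Case 2 -> Case 4
r_proposed_rise = percent_increase(R_ALL[5], R_ALL[3])  # proposed: Case 3 -> Case 5
```

In both communication conditions the proposed strategy keeps R_all at roughly half of the adaptive strategy's value, which is the basis for the adaptability claim.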