Spatially Heterogeneous Resilient V2G-Enabled Grid Frequency Control via an Adversarially Trained Structural Switching Framework

Xiong, Xiong; Li, Shengyao; Xia, Kaiyi; Zheng, Hao; Huang, Zicheng; Zhu, Tong; Wang, Zijie; Kang, Qi

doi:10.3390/sym18050843


by Xiong Xiong ¹, Shengyao Li ¹, Kaiyi Xia ¹, Hao Zheng ¹, Zicheng Huang ², Tong Zhu ¹, Zijie Wang ³ and Qi Kang ⁴,*

¹ School of Electrical Engineering and Telecommunications, University of New South Wales, Sydney, NSW 2052, Australia

² College of Automation, Chongqing University, Chongqing 400044, China

³ Hebei Key Laboratory of Man-Machine Environmental Thermal Control Technology and Equipment, Hebei Vocational University of Technology and Engineering, Xingtai 054000, China

⁴ Department of Control Science and Engineering, College of Electronic and Information Engineering, Tongji University, Shanghai 201804, China

* Author to whom correspondence should be addressed.

Symmetry 2026, 18(5), 843; https://doi.org/10.3390/sym18050843 (registering DOI)

Submission received: 25 March 2026 / Revised: 28 April 2026 / Accepted: 12 May 2026 / Published: 14 May 2026

(This article belongs to the Special Issue Symmetry in Sensing, Computing and Intelligence for Cyber–Physical Systems)


Abstract

With the increasing penetration of renewable energy, power systems require fast and reliable frequency regulation resources. Vehicle-to-grid (V2G) aggregation can provide fast response capability. However, it relies heavily on communication networks and is vulnerable to communication degradation and false data injection attacks (FDIAs). To address this challenge, this paper proposes a detection-free resilient control method for V2G-based frequency regulation. Rather than relying on explicit attack detection or compensation, the proposed method achieves decision-level adaptation from closed-loop system feedback through dynamic selection and switching of aggregator subsets. In this way, unreliable or compromised aggregators are implicitly avoided, improving system robustness under uncertain communication and cyber conditions. To further enhance robustness, a diffusion-based adversarial reinforcement learning framework is developed. A conditional diffusion model is used to generate diverse capacity scenarios with spatial heterogeneity. Adversarial training formulates the interaction between the attacker and the defender as a zero-sum game. This enables the learning of robust selection–switching policies under worst-case disturbances. Simulation results on the IEEE 39-bus system show that the proposed method improves frequency regulation performance under communication degradation and FDIA. The RMS frequency deviation is reduced from 0.13426 Hz to 0.09174 Hz compared with the no-defense case.

Keywords:

vehicle-to-grid; grid frequency control; spatial heterogeneity; cyber security; adversarial reinforcement learning; detection-free control

1. Introduction

The large-scale integration of renewable energy sources such as wind and photovoltaic power has increased net load variability and forecasting uncertainty, leading to more frequent power imbalances and larger frequency deviations, which complicates frequency regulation [1,2]. Meanwhile, the decreasing share of synchronous generators reduces system inertia and weakens fast frequency support. Under low-inertia conditions, frequency dynamics become faster, with higher rates of change, deeper nadirs, and shorter regulation time windows, requiring more responsive and reliable control strategies. Electric vehicles (EVs), as distributed flexible resources, support bidirectional charging/discharging and provide fast power response through power electronic interfaces. Through vehicle-to-grid (V2G) interaction, EVs can participate in frequency regulation and provide ancillary services [3,4]. With the rapid growth of EVs and the expansion of charging infrastructure, aggregated V2G resources have become a viable option for grid frequency regulation.

However, large-scale V2G-based frequency control relies heavily on communication networks. Aggregators must report available capacity to the control center, and dispatch commands must be delivered to EV resources through communication links. Therefore, the regulation process inherently forms a cyber–physical system. In practice, communication delays, packet losses, and false data injection attacks can degrade the accuracy and reliability of information exchange, thereby affecting regulation performance. In addition, variations in communication conditions and security levels across regions introduce spatial heterogeneity. Under these conditions, maintaining reliable and stable frequency regulation with large-scale V2G participation remains a key challenge.

1.1. Literature Review

Existing studies on V2G-based frequency regulation usually adopt an aggregator-based framework, in which geographically distributed EV resources are coordinated to participate in grid frequency regulation. In large-scale V2G systems, both capacity reporting and command dispatch rely on communication networks. As a result, system performance depends not only on the regulation capability of EV resources, but also on communication degradation and cyber attacks. As the system scale increases, the impact of these two factors on frequency control becomes more pronounced.

For communication degradation, existing studies typically model communication delays, packet losses, and denial-of-service attacks as disturbances in networked control systems, and design control strategies accordingly. For instance, observer-based distributed load frequency control has been used to mitigate the effects of degraded network quality and communication delays on regulation performance [5]. Robust control approaches have also been developed to maintain closed-loop stability under denial-of-service attacks [6]. In addition, reinforcement learning has been applied to frequency regulation under communication constraints and extended to nonlinear cyber–physical systems [7,8]. These studies mainly focus on performance degradation caused by communication issues, but generally assume that the transmitted information is reliable and pay less attention to malicious data manipulation.

For false data injection attacks, most existing work follows a detect–estimate–compensate framework. In this approach, abnormal measurements or control signals are first identified, the impact of the attack is then estimated, and the control input is adjusted accordingly. Along this direction, state-estimation-based methods have been proposed for detecting abnormal data in load frequency control systems [9]. Reinforcement learning has also been used to enhance the resilience of microgrid frequency control under attack conditions [10]. Other approaches include observer-based resilient control, adaptive residual observers, and recurrent-neural-network-based detection methods for anomaly identification and attack mitigation [11,12,13,14]. While these methods improve the ability of frequency control systems to cope with FDIA, they rely heavily on accurate attack detection and state estimation, making them sensitive to detection delays, false alarms, and missed detections.

Overall, existing studies mainly address communication degradation and FDIA along two separate lines, and have developed corresponding methods for network disturbances and abnormal data, respectively. However, in networked V2G frequency regulation systems, communication degradation and malicious attacks often do not occur independently, and communication conditions and attack risks may also vary significantly across regions. Existing methods still pay limited attention to such coupled uncertainties and their spatial heterogeneity.

1.2. Challenges and Motivations

The main challenge in wide-area V2G frequency control lies in the coupled effects of communication degradation and false data injection attacks, together with the spatial heterogeneity caused by regional differences. Communication degradation and FDIA may simultaneously affect state perception and control execution, while regional variations in communication conditions and cybersecurity protection further lead to heterogeneous delays, packet losses, and attack risks. Under such conditions, the execution reliability of aggregators may vary across regions and operating states, making system behavior difficult to characterize accurately using a unified model or fixed parameters. Although learning-based and resilient control methods have been introduced to improve adaptability under complex operating conditions [14,15], many existing studies still rely on detection, estimation, and compensation mechanisms. In fast-timescale frequency regulation, such mechanisms may introduce additional communication and computational overhead, while detection delay can further reduce the already limited response window. Moreover, when communication conditions and attack behaviors vary across regions, false alarms may weaken available regulation resources, whereas missed detections may allow abnormal execution to persist and further deteriorate frequency stability. As a result, maintaining stable and reliable performance across scenarios remains difficult.

Motivated by the above limitations, this work does not further strengthen explicit attack detection, but instead reduces the dependence of control decisions on detection results. For fast frequency regulation, it is often more important to determine which aggregators can reliably execute dispatch commands under current conditions than to identify the exact location and magnitude of an attack. Therefore, the focus is shifted from attack identification to execution reliability management. Accordingly, execution-subset selection and online switching are used to dynamically retain or replace aggregators based on closed-loop system responses. The goal is not to recover the true value of corrupted signals, but to maintain regulation capability and operational stability without relying on precise attack localization and estimation.

1.3. Contributions

This paper proposes a detection-free decision-level resilient control method, termed Diffusion-Based Offline Adversarial Reinforcement Learning Switching (DOARL-S). Rather than using explicit attack detection or compensation, the method adjusts the active aggregator subset through dynamic selection and switching driven by system-level closed-loop feedback. As a result, potentially unreliable resources can be avoided without sacrificing frequency regulation performance. To strengthen robustness in spatially heterogeneous environments, the framework further incorporates a diffusion-based adversarial reinforcement learning scheme. A conditional diffusion model is introduced to generate heterogeneous capacity scenarios, while offline adversarial training is used to learn selection-switching policies that remain effective under worst-case disturbances. The method is also built on a unified cyber-physical coupled model that captures the combined effects of communication degradation and bidirectional false data injection attacks under spatial heterogeneity. Most existing resilient frequency regulation approaches rely on first detecting attacks and then applying compensation or mitigation, including reinforcement-learning-based regulation under attack conditions [10], abnormal-data detection methods [16], and redundant-communication or observer-based schemes [17,18]. Different from these methods, the proposed approach improves resilience through feedback-driven adaptive resource selection and switching, rather than through an explicit attack detection stage. The main contributions of this paper are as follows:

A unified model is established for wide-area V2G frequency regulation under communication degradation and bidirectional false data injection attacks. Regional defense strength and attack feasibility parameters are introduced to characterize spatial variations in attack risk and attack cost.
A closed-loop-feedback-based strategy is developed to select the execution subset of aggregators and switch their participation online. Without relying on explicit attack detection, the proposed method adaptively selects reliable aggregators to reduce execution deviations while limiting switching costs and maintaining frequency regulation performance.
A diffusion-based adversarial offline reinforcement learning framework is developed by formulating the control problem as a zero-sum game between the attacker and the defender. Through adversarial training under a unified attack budget, the framework learns selection and switching policies against worst-case disturbances.

2. System Model: Spatially Heterogeneous V2G Aggregation in Grid Frequency Control

This section presents the system model of the V2G-enabled grid frequency control system under spatially heterogeneous network environments. The grid frequency dynamic model with aggregated V2G power injection is introduced to describe the physical-layer frequency regulation process. Then, the multi-area V2G aggregation architecture is presented, including the interaction among the transmission system operator, aggregators, and electric vehicles through the networked dispatch mechanism. Finally, the models of communication degradation and bidirectional FDIA in the networked dispatch process are developed, where spatial heterogeneity is represented through region-dependent communication conditions and defense strength parameters.

2.1. Grid Frequency Dynamics with Networked V2G Injection

In the frequency regulation framework shown in Figure 1, conventional generators and aggregated electric vehicle resources jointly participate in grid frequency control. System frequency deviation is caused by the power imbalance between generation and load, and the regulation signal is generated through the area control error. Conventional generators provide frequency regulation through physical control channels, while V2G aggregated resources deliver power support through a networked dispatch channel. Different from traditional frequency regulation models, the V2G regulation channel is influenced not only by physical dynamics but also by capacity reporting information, communication link conditions, and the execution selection mechanism [19,20,21]. Therefore, both the physical-layer frequency dynamics and the network-layer scheduling process need to be considered in the system modeling. In this work, the focus is on system-level frequency regulation dynamics, which are governed by the active power balance and low-frequency behavior of the power system. The frequency evolution is modeled based on the classical swing-equation framework, where the dominant dynamics lie in the time scale of seconds and are driven by aggregate power imbalance. In practical systems, converter-induced harmonics are typically mitigated at the device and distribution levels through filtering, modulation strategies, and power quality standards. These mechanisms effectively decouple high-frequency disturbances from system-level frequency dynamics. Based on this modeling abstraction, the proposed framework focuses on the cyber–physical interaction affecting frequency regulation without incorporating electromagnetic transient effects.

Equation (1) describes the discrete-time closed-loop frequency regulation dynamics of the V2G-enabled power system [1,2]. The system frequency is updated according to the swing equation using the total generator regulation power, aggregated EV power injection, renewable power deviation, and load disturbance. The generator-side mechanical power dynamics are represented by a first-order model driven by both primary droop regulation and AGC-based secondary control, while the aggregated EV response is modeled by a first-order tracking process with respect to the dispatched command.

Here, $k$ denotes the discrete-time control step and $t_k = k T_s$, where $T_s$ is the sampling interval. $\Delta f_s[k]$ is the system frequency deviation. $\Delta P_L[k]$ and $\Delta P_{\mathrm{RES}}[k]$ denote the load disturbance and renewable power deviation, respectively. $\Delta P_{M,i}[k]$ denotes the mechanical power output of generator $i$, and $\Delta P_M[k]$ is the total generator regulation power obtained by summing over all generators. $\Delta P_{\mathrm{EV}}[k]$ and $\Delta P_{\mathrm{EV}}^{\mathrm{CMD}}[k]$ are the actual and commanded aggregated EV power, respectively. $H$, $D$, $K_i$, $T_i$, and $T_e$ denote the equivalent inertia constant, load damping coefficient, generator gain, generator time constant, and EV time constant, respectively. $\mathrm{ACE}[k]$ is the area control error, $\mathrm{ACE}[k] = B\,\Delta f_s[k] + \Delta P_{\mathrm{tie}}[k]$, where $\Delta P_{\mathrm{tie}}[k]$ denotes the tie-line power deviation and $B$ is the frequency bias factor. The secondary control signal is generated by the AGC mapping, $v[k] = g(\mathrm{ACE}[k])$. $\alpha_i$ is the participation factor and $R_i$ is the droop coefficient of generator $i$. $\Delta P_{r,i}[k] = -\frac{1}{R_i}\Delta f_s[k]$ and $\Delta P_{c,i}[k] = \alpha_i v[k]$ denote the primary response and secondary control input of generator $i$, respectively.

Based on the physical-layer frequency dynamic model, V2G aggregated resources participate in frequency control through a networked dispatch channel. At each fast-timescale control step $k$, the transmission system operator (TSO) generates the total frequency regulation command $u[k]$ and selects $Q$ aggregators for execution from the candidate set $\mathcal{A}$, the set of all aggregators.

$$
\begin{cases}
\Delta f_s[k+1] = \Delta f_s[k] + \dfrac{T_s}{H}\left(-D\,\Delta f_s[k] + \Delta P_M[k] + \Delta P_{\mathrm{EV}}[k] + \Delta P_{\mathrm{RES}}[k] - \Delta P_L[k]\right) \\[4pt]
\Delta P_{M,i}[k+1] = \left(1 - \dfrac{T_s}{T_i}\right)\Delta P_{M,i}[k] + \dfrac{T_s K_i}{T_i}\left(\Delta P_{c,i}[k] - \Delta P_{r,i}[k]\right) \\[4pt]
\Delta P_{\mathrm{EV}}[k+1] = \left(1 - \dfrac{T_s}{T_e}\right)\Delta P_{\mathrm{EV}}[k] + \dfrac{T_s}{T_e}\,\Delta P_{\mathrm{EV}}^{\mathrm{CMD}}[k]
\end{cases}
\tag{1}
$$
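As an illustration only, one update of Equation (1) can be sketched in Python as below. The function name `frequency_step`, the `Generator` container, and all numerical values in the usage note are assumptions made for this sketch and do not come from the paper. The sign of the generator input term follows Equation (1) as printed.

```python
from dataclasses import dataclass


@dataclass
class Generator:
    K: float      # generator gain K_i
    T: float      # generator time constant T_i
    R: float      # droop coefficient R_i
    alpha: float  # AGC participation factor alpha_i


def frequency_step(df, p_m, p_ev, p_ev_cmd, p_res, p_load, v, gens, H, D, Te, Ts):
    """Advance Equation (1) by one step; returns (df_next, p_m_next, p_ev_next)."""
    # Swing-equation update driven by the aggregate power imbalance.
    imbalance = -D * df + sum(p_m) + p_ev + p_res - p_load
    df_next = df + (Ts / H) * imbalance

    # First-order mechanical power dynamics of each generator.
    p_m_next = []
    for g, pm in zip(gens, p_m):
        p_r = -df / g.R      # primary (droop) response
        p_c = g.alpha * v    # secondary (AGC) input
        p_m_next.append((1 - Ts / g.T) * pm + (Ts * g.K / g.T) * (p_c - p_r))

    # First-order tracking of the dispatched EV command.
    p_ev_next = (1 - Ts / Te) * p_ev + (Ts / Te) * p_ev_cmd
    return df_next, p_m_next, p_ev_next
```

Starting from rest with a hypothetical load step of 0.1 and `H = 5`, `Ts = 0.1`, the first call moves the frequency deviation by `(Ts / H) * (-0.1)`, while the EV power moves a fraction `Ts / Te` of the way toward its command.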

Each aggregator $a$ updates its frequency regulation capacity at the slow timescale, indexed by $m$. Each reporting interval contains $L$ fast control steps, $k \in \{mL, \dots, (m+1)L - 1\}$. Let $\bar{S}_a[k]$ denote the nominal capacity available at fast control step $k$, which is held constant within each reporting interval: $\bar{S}_a[k] = S_a[m]$ for $k \in \{mL, \dots, (m+1)L - 1\}$.

Let the binary variable $w_a[k] \in \{0,1\}$ indicate whether aggregator $a$ is selected to participate in execution at step $k$. For the selected aggregators, the TSO allocates the regulation command proportionally according to their nominal capacities $\bar{S}_a[k]$. The corresponding capacity weight is defined as follows:

$$
\beta_a[k] = \frac{w_a[k]\,\bar{S}_a[k]}{\sum_{i \in \mathcal{A}} w_i[k]\,\bar{S}_i[k] + \varepsilon}
\tag{2}
$$

Here, $\varepsilon > 0$ is a small constant introduced to prevent the denominator from becoming zero. Accordingly, the normalized frequency regulation command allocated to aggregator $a$ is $u_a[k] = \beta_a[k]\,u[k]$. The equivalent power command entering the V2G power tracking stage from the network layer can therefore be expressed as $\Delta P_{\mathrm{EV}}^{\mathrm{CMD}}[k] = \sum_{a \in \mathcal{A}} u_a[k]$. This signal serves as the input to the first-order V2G power tracking dynamics in Equation (1), thereby forming a coupled closed-loop interaction between network-layer scheduling and physical-layer frequency response.
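The proportional allocation in Equation (2) can be sketched as follows. The function name `allocate` and the example numbers are hypothetical; the only logic used is the capacity weighting and the command split described above.

```python
def allocate(u, capacity, selected, eps=1e-9):
    """Split the total command u among selected aggregators by capacity share.

    capacity[a] is the nominal capacity of aggregator a, selected[a] is the
    binary selection variable w_a. Returns (beta, commands).
    """
    denom = sum(w * s for w, s in zip(selected, capacity)) + eps
    beta = [w * s / denom for w, s in zip(selected, capacity)]
    commands = [b * u for b in beta]
    return beta, commands
```

With capacities `[2, 1, 1]`, selection `[1, 0, 1]`, and `u = 3`, the weights are approximately `[2/3, 0, 1/3]`, so the unselected aggregator receives no command and the per-aggregator commands sum to approximately `u`.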

2.2. Multi-Area V2G Aggregation Architecture and Interaction Mechanism

As illustrated in Figure 2, this paper considers a V2G aggregation-based frequency control framework coordinated by the transmission system operator and deployed across $Y$ geographical regions. Let the set of regions be indexed by $y \in \{1, 2, \dots, Y\}$. The number and scale of aggregators may differ from one region to another. Aggregators communicate bidirectionally with the TSO through communication networks under a fast control update and slow capacity reporting mechanism. Let $\mathcal{A}_y$ denote the set of aggregators located in region $y$, and let $r(a) \in \{1, \dots, Y\}$ denote the regional index of aggregator $a$.

At the fast timescale, the TSO generates and dispatches the regulation command $u[k]$ at each control step $k$ with sampling period $T_s$, based on frequency measurements and regulation signals such as $\mathrm{ACE}$. Aggregators coordinate their EV fleets to provide real-time power responses. These responses are injected into the grid and influence the system frequency dynamics. At the slow reporting timescale, the available regulation capacities of aggregators are updated over a rolling window $m$ with reporting period $T_m = L T_s$. They therefore define the capacity limits and operational constraints for the fast-timescale control. At the beginning of each slow-timescale window, aggregators report their available regulation capacity to the TSO. The TSO then forms a candidate resource set according to the current regulation demand and system constraints.

This paper adopts a report, select, and execute mechanism instead of assuming that all reported resources automatically participate in control. Within a slow-timescale window, the regulation power required at the fast timescale usually accounts for only a portion of the total available capacity. Allowing all aggregators to participate simultaneously would introduce unnecessary communication and coordination overhead. It may also increase tracking errors due to network congestion, communication delays, or compromised resources. Therefore, even if $N$ aggregators report their capacities, the TSO selects only $Q$ aggregators, where $Q < N$, to form the active participation set. These aggregators execute the regulation command and provide power support. The remaining aggregators do not execute the command during that window but remain as standby resources for later updates or emergency activation.

Within this framework, the spatial heterogeneity considered in this paper appears in three aspects: resource availability, communication conditions, and cyber security risks. Differences in EV penetration levels and infrastructure development lead to regional variations in available regulation capacity. Aggregators located in different areas therefore provide different amounts of regulation resources. Communication conditions also vary across regions due to differences in terrain, network coverage, and historical communication infrastructure. These differences result in heterogeneous delays, packet loss rates, and communication reachability in both uplink and downlink channels. As a result, the transmission and execution of control commands at the fast timescale may vary across regions. Finally, variations in regional defense strength and security mechanism deployment affect the probability of FDIA, the feasibility of attack implementation, and the attack energy budgets in both capacity reporting channels and control command channels. Because of these factors, even under the same dispatch requirement, the information reliability and command executability of aggregators may differ across locations. This motivates the need for resource selection and online switching within a unified decision framework.

2.3. Communication Degradation and Bidirectional FDIA in Networked V2G Dispatch

Based on the nominal closed-loop frequency regulation model described in (1)–(2), we further consider that the networked V2G dispatch communication links may be simultaneously affected by communication degradation and FDIA. Communication degradation mainly manifests as link delays and packet losses. Taking the downlink communication channel as an example, these effects cause deviations between the control signal available at the aggregator side and the command $u_a[k]$. Let $\tau_a^{\mathrm{down}}[k] \in \mathbb{Z}_{\geq 0}$ denote the equivalent downlink communication delay experienced by aggregator $a$ at the fast-timescale step $k$. In addition, introduce an effective reception coefficient $\gamma_a^{\mathrm{down}}[k] \in [0,1]$, which represents the proportion of the command that effectively arrives and can be executed at step $k$. Under this formulation, the actual control signal received by the aggregator can be modeled as follows:

$$
\tilde{u}_a[k] = \gamma_a^{\mathrm{down}}[k]\; u_a\!\left[k - \tau_a^{\mathrm{down}}[k]\right]
\tag{3}
$$

The variables $\gamma_a^{\mathrm{down}}[k]$ and $\tau_a^{\mathrm{down}}[k]$ follow region-dependent statistical characteristics, which can be expressed as $\gamma_a^{\mathrm{down}}[k] \sim \mathcal{D}_{r(a)}^{\gamma_{\mathrm{down}}}$, $\tau_a^{\mathrm{down}}[k] \sim \mathcal{D}_{r(a)}^{\tau_{\mathrm{down}}}$. Considering the selection–execution mechanism, aggregators that are not selected do not participate in real-time power tracking. Therefore, under communication degradation, the equivalent network-layer power command entering the V2G tracking stage is updated as $\Delta P_{\mathrm{EV}}^{\mathrm{CMD}}[k] = \sum_{a \in \mathcal{A}} \tilde{u}_a[k]$.
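A minimal sketch of the downlink degradation model in Equation (3) is given below. The function names are hypothetical, and the treatment of commands requested before the first step (returned as zero) is an assumption of this sketch, since the text does not specify an initial condition.

```python
def received_command(history, k, delay, gamma):
    """Return gamma * u_a[k - delay], following Equation (3).

    history[j] holds the command u_a[j] allocated to one aggregator.
    Assumption: commands before step 0 are treated as zero.
    """
    j = k - delay
    if j < 0:
        return 0.0
    return gamma * history[j]


def ev_command(histories, k, delays, gammas):
    """Sum the degraded commands over all aggregators at step k."""
    return sum(
        received_command(h, k, d, g)
        for h, d, g in zip(histories, delays, gammas)
    )
```

For example, an aggregator with command history `[1.0, 2.0, 3.0]`, a one-step delay, and a reception coefficient of 0.5 receives `0.5 * 2.0 = 1.0` at step `k = 2` instead of the intended 3.0.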

In addition to communication degradation, FDIA may tamper with both the uplink capacity-reporting channel and the downlink control-command channel, thereby affecting the capacity allocation weights and the command execution results, respectively.

For the uplink channel, let the nominal capacity reported by aggregator $a$ in the slow-timescale window $m$ be $S_a[m]$. The capacity information actually received by the TSO can be modeled as follows:

$$
\hat{S}_a[m] = \gamma_a^{\mathrm{up}}[m]\; S_a\!\left[m - \tau_a^{\mathrm{up}}[m]\right] + \delta_a[m]
\tag{4}
$$

Here, $\delta_a[m]$ denotes the capacity tampering component. The effective reception coefficient $\gamma_a^{\mathrm{up}}[m]$ and the communication delay $\tau_a^{\mathrm{up}}[m]$ follow region-dependent statistical characteristics, which can be expressed as $\gamma_a^{\mathrm{up}}[m] \sim \mathcal{D}_{r(a)}^{\gamma_{\mathrm{up}}}$, $\tau_a^{\mathrm{up}}[m] \sim \mathcal{D}_{r(a)}^{\tau_{\mathrm{up}}}$. The tampered capacity $\hat{S}_a[m]$ is then held constant within the corresponding reporting interval and mapped to the fast-timescale available capacity $\hat{\tilde{S}}_a[k]$, which the TSO uses to compute the capacity allocation weights in the fast-timescale dispatch stage:

$$
\hat{\beta}_a[k] = \frac{w_a[k]\,\hat{\tilde{S}}_a[k]}{\sum_{i \in \mathcal{A}} w_i[k]\,\hat{\tilde{S}}_i[k] + \varepsilon}
\tag{5}
$$

Accordingly, the command allocation is updated as $u_a[k] = \hat{\beta}_a[k]\,u[k]$, thereby altering the distribution structure of the frequency regulation command among aggregators.
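To make the effect of Equations (4) and (5) concrete, the sketch below computes the allocation weights from tampered capacity reports. The function name `tampered_weights` and the numbers are hypothetical, and the uplink delay is taken as zero to keep the example short.

```python
def tampered_weights(true_capacity, gamma_up, delta, selected, eps=1e-9):
    """Allocation weights computed from received (possibly tampered) reports.

    Assumption: zero uplink delay, so the received capacity is
    gamma_up * true_capacity + delta for each aggregator.
    """
    s_hat = [g * s + d for g, s, d in zip(gamma_up, true_capacity, delta)]
    denom = sum(w * s for w, s in zip(selected, s_hat)) + eps
    return [w * s / denom for w, s in zip(selected, s_hat)]
```

With two selected aggregators of equal true capacity, an injection of `delta = [1, 0]` shifts the weights from roughly `[0.5, 0.5]` to roughly `[2/3, 1/3]`, so the attacked aggregator is assigned a larger share of the command than its real capacity justifies.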

On the downlink channel, attackers may further tamper with the signals received at the aggregator side. For the signals affected by communication degradation, an additive tampering model is adopted:

$$
\bar{u}_a[k] = \tilde{u}_a[k] + \eta_a[k]
\tag{6}
$$

Here, $\eta_a[k]$ denotes the downlink command tampering component. By jointly considering the selection–execution mechanism, communication degradation, and downlink FDIA, the equivalent network-layer power command entering the V2G tracking stage can finally be expressed as $\Delta P_{\mathrm{EV}}^{\mathrm{CMD}}[k] = \sum_{a \in \mathcal{A}} \bar{u}_a[k]$.

The uplink capacity information tampering $\delta_a[m]$ and the downlink control command tampering $\eta_a[k]$ are implemented by the same attacker, and they share limited attack resources within each slow-timescale window. The set of fast-timescale steps contained in the $m$-th slow window is defined as $\mathcal{K}_m \triangleq \{k \mid mL \leq k \leq (m+1)L - 1\}$. The total energy consumption of the attacker on both the uplink and downlink communication channels satisfies a unified budget constraint. Let $\mathcal{A} = \{1, 2, \dots, N\}$ denote the candidate aggregator set, where $N = |\mathcal{A}|$ is the total number of candidate aggregators.

$$
E_{\mathrm{amp}}[m] = \sum_{a \in \mathcal{A}} \omega_{r(a)}^{\mathrm{up}} \,\lVert \delta_a[m] \rVert_2^2 + \sum_{k \in \mathcal{K}_m} \sum_{a \in \mathcal{A}} \omega_{r(a)}^{\mathrm{down}} \,\lVert \eta_a[k] \rVert_2^2
\tag{7}
$$

Here, $\omega_{r(a)}^{\mathrm{up}}$ and $\omega_{r(a)}^{\mathrm{down}}$ denote the region-dependent energy weights, which characterize the impact of regional defense strength differences on the attack cost. In addition, a dispersion cost is introduced.

Define the indicator variables that represent whether an injection attack is applied to aggregator $a$: $z_a^{\mathrm{up}}[m] \triangleq \mathbb{I}\{\lVert \delta_a[m] \rVert_2 > 0\}$, $z_a^{\mathrm{down}}[m] \triangleq \mathbb{I}\{\exists\, k \in \mathcal{K}_m : \lVert \eta_a[k] \rVert_2 > 0\}$.

Then, within the slow-timescale window $m$, the number of attacked aggregators within region $y$ is $n_y^{\mathrm{up}}[m] = \sum_{a \in \mathcal{A}_y} z_a^{\mathrm{up}}[m]$, $n_y^{\mathrm{down}}[m] = \sum_{a \in \mathcal{A}_y} z_a^{\mathrm{down}}[m]$, and the number of attacked regions is $N_{\mathrm{reg}}^{\mathrm{up}}[m] = \sum_{y=1}^{Y} \mathbb{I}\{n_y^{\mathrm{up}}[m] > 0\}$, $N_{\mathrm{reg}}^{\mathrm{down}}[m] = \sum_{y=1}^{Y} \mathbb{I}\{n_y^{\mathrm{down}}[m] > 0\}$. Accordingly, different additional cost coefficients are introduced to penalize two attack patterns: attacking multiple aggregators within the same region, and launching attacks simultaneously across multiple regions.

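$$
\begin{aligned}
E_{\mathrm{disp}}[m] ={}& \lambda_{\mathrm{in}}^{\mathrm{up}} \sum_{y=1}^{Y} \left(n_y^{\mathrm{up}}[m] - 1\right)_{+} + \lambda_{\mathrm{out}}^{\mathrm{up}} \left(N_{\mathrm{reg}}^{\mathrm{up}}[m] - 1\right)_{+} \\
&+ \lambda_{\mathrm{in}}^{\mathrm{down}} \sum_{y=1}^{Y} \left(n_y^{\mathrm{down}}[m] - 1\right)_{+} + \lambda_{\mathrm{out}}^{\mathrm{down}} \left(N_{\mathrm{reg}}^{\mathrm{down}}[m] - 1\right)_{+}
\end{aligned}
\tag{8}
$$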

Here, $(x)_{+} \triangleq \max(x, 0)$. The coefficients $\lambda_{\mathrm{in}}^{\mathrm{up}}$ and $\lambda_{\mathrm{in}}^{\mathrm{down}}$ denote the additional coordination cost when attacks are dispersed among multiple aggregators within the same region. The coefficients $\lambda_{\mathrm{out}}^{\mathrm{up}}$ and $\lambda_{\mathrm{out}}^{\mathrm{down}}$ denote the additional cost associated with cross-region dispersion, that is, attacks launched across multiple regions. Typically, $\lambda_{\mathrm{out}}^{\mathrm{up}} \geq \lambda_{\mathrm{in}}^{\mathrm{up}}$ and $\lambda_{\mathrm{out}}^{\mathrm{down}} \geq \lambda_{\mathrm{in}}^{\mathrm{down}}$. Within each slow-timescale window $m$, the total resource consumption of the attacker in both the uplink and downlink channels is limited by a unified attack budget.

$$
E_{\mathrm{amp}}[m] + E_{\mathrm{disp}}[m] \leq \Gamma
\tag{9}
$$

Here, $\Gamma$ denotes the upper bound of the total attack resource budget within each slow-timescale window. Equations (3)–(6) describe how communication degradation and bidirectional FDIA enter the networked V2G dispatch communication links and affect the execution of regulation commands [5,9]. Equations (7)–(9) further model the spatial differences in attack capability and attack cost [14,15].
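The budget check in Equations (7)–(9) can be sketched as below for one slow-timescale window. This is an illustrative reading rather than the authors' implementation: the function names are hypothetical, injections are taken as scalars so that the squared 2-norm reduces to a square, and regions are indexed from zero.

```python
def attack_cost(delta, eta, region, w_up, w_down, lam_in, lam_out):
    """Return (E_amp, E_disp) for one slow window.

    delta[a]  : scalar uplink injection on aggregator a
    eta[a]    : list of scalar downlink injections on a over the window
    region[a] : 0-based region index r(a)
    w_up, w_down    : per-region energy weights (lists indexed by region)
    lam_in, lam_out : dicts with keys "up" and "down"
    """
    def pos(x):
        return max(x, 0)

    n_agg = len(region)
    # Amplitude cost, Equation (7).
    e_amp = sum(w_up[region[a]] * delta[a] ** 2 for a in range(n_agg))
    e_amp += sum(w_down[region[a]] * e ** 2 for a in range(n_agg) for e in eta[a])

    # Count attacked aggregators per region in each direction.
    n_up = [0] * len(w_up)
    n_down = [0] * len(w_down)
    for a in range(n_agg):
        n_up[region[a]] += int(delta[a] != 0)
        n_down[region[a]] += int(any(e != 0 for e in eta[a]))

    # Dispersion cost, Equation (8).
    e_disp = 0.0
    for key, counts in (("up", n_up), ("down", n_down)):
        attacked_regions = sum(1 for n in counts if n > 0)
        e_disp += lam_in[key] * sum(pos(n - 1) for n in counts)
        e_disp += lam_out[key] * pos(attacked_regions - 1)
    return e_amp, e_disp


def within_budget(e_amp, e_disp, budget):
    """Unified budget constraint, Equation (9)."""
    return e_amp + e_disp <= budget
```

In this reading, attacking a second aggregator in the same region adds `lam_in` to the cost, while spreading the attack to a second region adds `lam_out`, which is how regional defense differences make dispersed attacks more expensive.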

2.4. Modeling Assumptions

To improve clarity, the main modeling assumptions adopted in this study are summarized as follows.

2.4.1. Generator-Side Modeling

Conventional generators are represented by a reduced-order frequency regulation model based on the classical swing equation, primary droop response, and AGC-based secondary control. The generator-side mechanical power dynamics are modeled by a first-order process. Detailed electromagnetic transients and higher-order generator subsystems are not explicitly considered, since the focus of this work is on system-level frequency dynamics and regulation behavior at the electromechanical time scale.

2.4.2. EV Aggregation Modeling

Electric vehicles are not modeled individually. Instead, they are aggregated at the aggregator level and represented as flexible resources with limited regulation capacity. The aggregated EV response is described by a first-order power-tracking process with respect to the dispatched command. This abstraction captures the dominant active-power response relevant to frequency regulation, while avoiding detailed modeling of individual vehicle states, charger-level dynamics, and internal converter behavior.

3. Detection-Free Select-Switch for Robust V2G Frequency Control with Coupled Heterogeneous Uncertainties

Section 2 establishes the cyber–physical model of V2G frequency regulation under communication degradation and bidirectional FDIA. These disturbances affect the closed-loop process of capacity reporting, dispatch decisions, and command execution. Under wide-area deployment, both communication conditions and attack risks vary across regions and over time. As a result, control approaches that rely on explicit anomaly detection and compensation are difficult to apply reliably. This section proposes a Diffusion-Based Offline Adversarial Reinforcement Learning Switching framework. The proposed method does not depend on attack detection. Instead, it performs execution subset selection and online switching based on closed-loop system feedback. The framework combines diffusion-based modeling of regulation capacity reporting under spatial heterogeneity with adversarial offline reinforcement learning.

Section 3.1 introduces the diffusion-based modeling of regulation capacity reporting under spatial heterogeneity. Section 3.2 describes the execution subset selection and switching mechanism used in real-time control. Section 3.3 formulates the adversarial training problem between the attacker and the defender. Section 3.4 presents the overall workflow of the DOARL-S framework.

3.1. Scenario Generator Based on Diffusion

In wide-area V2G aggregation-based frequency regulation, each aggregator $a \in A$ reports its available regulation capacity during the scheduling window $m$, denoted by $S_{a}[m]$. This value represents the EV capacity that can participate in frequency regulation. The aggregated report vector is written as $S[m] \triangleq [S_{1}[m], \dots, S_{|A|}[m]]^{\top}$. The reported capacities exhibit significant spatial and temporal uncertainty. Regional traffic conditions, charging demand, vehicle state-of-charge, and communication quality all influence the reported values. Under wide-area deployment, differences in EV penetration, charging infrastructure, and user behavior lead to capacity reporting distributions that are often non-Gaussian, multimodal, and correlated across regions [22]. In addition, historical data are usually unevenly distributed across regions and operating conditions. As a result, offline training scenarios constructed directly from historical samples may not sufficiently cover possible capacity reporting patterns, which can limit the generalization capability of the learned policy.

In practical implementation, not all aggregators are eligible to participate in real-time power tracking. An aggregator can be selected into the execution subset only if it satisfies the following technical conditions. First, the aggregator has non-zero available regulation capacity within the current slow-timescale reporting window. Second, the aggregator can receive and execute dispatch commands under the current communication condition, characterized by acceptable effective reception and delay in the downlink channel. Third, the inclusion of the aggregator does not violate the switching cost and spatial coordination constraints defined in the scheduling framework. Therefore, the selection problem is not determined solely by available capacity, but reflects a trade-off among regulation capability, communication reliability, and switching overhead.

To address the above issue, this paper employs a conditional diffusion generative model to learn the nominal probability distribution of capacity reporting under a conditional vector $c[m]$, denoted as $p_{\theta}(S \mid c[m])$. Diffusion models can capture the complex probability structure of high-dimensional data without relying on explicit distribution assumptions [23]. The trained model can therefore generate capacity reporting samples that preserve spatial heterogeneity and multimodal characteristics, providing diverse operating scenarios for the subsequent adversarial policy training.

Specifically, this paper adopts the standard diffusion training framework, where distribution learning is achieved through forward noise injection and noise prediction. In the forward process, at diffusion step $n$, noise is added to the real sample $S_{0}$ to obtain

$$S_{n} = \alpha_{n} S_{0} + \sqrt{1 - \alpha_{n}}\, \epsilon, \qquad \epsilon \sim N(0, I).$$

A noise prediction network $\epsilon_{\theta}(S_{n}, n, c)$ is then trained by minimizing the following loss function:

$$\min_{\theta} L_{diff}(\theta) = E_{S_{0}, c, n, \epsilon}\left[\left\| \epsilon - \epsilon_{\theta}(S_{n}, n, c) \right\|_{2}^{2}\right].$$

After training, the parameters $\theta$ are fixed to obtain a conditional nominal sampler, which generates nominal capacity reporting scenarios according to $S_{base}[m] \sim p_{\theta}(S \mid c[m])$.

During offline training, the generated $S_{base}[m]$ is used as the baseline uplink capacity reporting input for each rollout. Communication degradation and bidirectional FDIA are then introduced on top of this baseline to construct the adversarial training environment.
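The noise-injection and noise-prediction steps can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the common variance-preserving parameterization, in which the signal coefficient is the square root of a cumulative product `alpha_bars`, and this convention may differ from the coefficient notation used in the text. The linear noise schedule, the dimensions, and the placeholder predictor `zero_model` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_noise(s0, alpha_bar_n, eps):
    """Noise a clean capacity sample s0 to diffusion step n (variance-preserving form)."""
    return np.sqrt(alpha_bar_n) * s0 + np.sqrt(1.0 - alpha_bar_n) * eps

def noise_prediction_loss(eps_model, s0_batch, c_batch, alpha_bars):
    """Monte Carlo estimate of E||eps - eps_theta(S_n, n, c)||^2 over one batch."""
    batch = s0_batch.shape[0]
    n = rng.integers(0, len(alpha_bars), size=batch)   # random diffusion step per sample
    eps = rng.standard_normal(s0_batch.shape)          # eps ~ N(0, I)
    s_n = forward_noise(s0_batch, alpha_bars[n][:, None], eps)
    eps_hat = eps_model(s_n, n, c_batch)
    return float(np.mean(np.sum((eps - eps_hat) ** 2, axis=1)))

# Hypothetical setup: 10 aggregators, 4 conditioning features, 100 diffusion steps.
betas = np.linspace(1e-4, 2e-2, 100)
alpha_bars = np.cumprod(1.0 - betas)
s0 = rng.uniform(0.0, 5.0, size=(32, 10))
c = rng.standard_normal((32, 4))

def zero_model(s_n, n, c):
    # Placeholder predictor that always outputs zero noise; a trained network goes here.
    return np.zeros_like(s_n)

loss = noise_prediction_loss(zero_model, s0, c, alpha_bars)
print(f"loss of the untrained placeholder: {loss:.3f}")
```

A trained network would replace `zero_model`, and minimizing this loss over the network parameters corresponds to the objective described above.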

Training Note and Justification of the Diffusion Scenario Generator

The diffusion model is used only in the offline stage to generate representative nominal capacity scenarios, while the online stage executes only the trained defender policy. Let the diffusion training dataset be $D = \{(S^{(i)}, c^{(i)})\}_{i=1}^{N_{tr}}$, where $S^{(i)}$ denotes an aggregator-level capacity-reporting sample and $c^{(i)}$ denotes its associated conditioning information. The diffusion model learns the conditional distribution $p_{\theta}(S \mid c)$, so that, for a given condition $c[m]$, it generates a nominal capacity scenario $S_{base}[m] \sim p_{\theta}(S \mid c[m])$. The generated $S_{base}[m]$ is then used as the baseline input to the offline adversarial training environment, on top of which communication degradation and bidirectional FDIA are imposed.

The main reason for using diffusion is that offline policy learning depends strongly on the diversity and representativeness of training scenarios. If scenarios are generated only by replaying limited historical samples or by sampling from a simple unimodal distribution, the learned policy may overfit a narrow subset of reporting patterns. By contrast, the diffusion model is used to better preserve multimodality, spatial heterogeneity, and cross-regional correlation in the capacity-reporting process. The effectiveness of the diffusion generator is evaluated by comparing generated samples with reference data in terms of the total-capacity distribution, aggregator-level mean and standard deviation, and correlation characteristics.

3.2. Selection-and-Switching Defense Mechanism

At each fast-timescale step $k$, the defender selects a fixed-size execution subset $S[k] \subseteq A$ with cardinality $Q$ from the candidate aggregator set $A = \{1, 2, \dots, N\}$. Here $S[k]$ denotes a set of aggregators and should not be confused with the capacity report vector $S[m]$ of Section 3.1. For each selected aggregator $a \in S[k]$, a discrete execution mode $q_{a}[k] \in M$ is assigned, where $M$ denotes the set of admissible execution modes. The defense action at step $k$ is therefore jointly determined by the subset-selection decision and the mode-assignment decision, $a_{D}[k] = (S[k], \{q_{a}[k]\}_{a \in S[k]})$.

Communication degradation and downlink tampering may cause deviations between the issued commands and the actions executed by aggregators. Uplink tampering may also distort the reported capacity information. As a result, the dispatch center cannot assume access to reliable execution information at the aggregator or regional level during either training or deployment. Consequently, decision-making is driven only by system-level closed-loop feedback, where the state is constructed from $\Delta f[k]$, $ACE[k]$, and their historical observation window, together with the previous execution subset and mode information to capture dispatch inertia.
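One way to assemble such a system-level state is sketched below. The encoding is an assumption made for illustration: the previous subset is represented as a 0/1 mask, the modes as shifted integers, and the ordering of the stacked blocks is arbitrary. The helper `build_observation` and its arguments are hypothetical.

```python
import numpy as np

def build_observation(df_hist, ace_hist, prev_subset, prev_modes, n_agg, horizon):
    """Stack system-level feedback and the previous dispatch into one state vector.

    df_hist, ace_hist: sequences whose last entry is the current sample.
    prev_subset: indices of the aggregators active at step k-1.
    prev_modes: dict {aggregator index: integer mode} for the previous subset.
    """
    df = np.asarray(df_hist, dtype=float)[-(horizon + 1):]
    ace = np.asarray(ace_hist, dtype=float)[-(horizon + 1):]
    if df.size < horizon + 1 or ace.size < horizon + 1:
        raise ValueError("history shorter than horizon + 1 samples")
    mask = np.zeros(n_agg)
    mask[list(prev_subset)] = 1.0      # 1 marks aggregators in the previous subset
    modes = np.zeros(n_agg)
    for a, q in prev_modes.items():
        modes[a] = q + 1               # 0 is reserved for "not selected"
    return np.concatenate([df, ace, mask, modes])

# Hypothetical usage with 5 aggregators and a 3-step history window.
obs = build_observation(
    df_hist=[0.0, 0.1, -0.05, 0.02],
    ace_hist=[0.0, 0.3, -0.1, 0.05],
    prev_subset={1, 3},
    prev_modes={1: 0, 3: 1},
    n_agg=5,
    horizon=3,
)
print(obs.shape)
```

No per-aggregator execution measurement enters the vector, which matches the requirement that the policy relies only on system-level feedback and its own previous decision.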

The policy input therefore consists solely of system-level observable closed-loop variables, without including modules for attack detection, localization, or tampering estimation. When the slow-timescale window switches from $m$ to $m + 1$, the execution subset and mode at the end of the previous window are adopted as the initial configuration for the new window, ensuring scheduling continuity. Online decision-making is executed in a rolling manner over the fast-timescale steps $k \in K_{m}$. The reconstruction of the execution subset is not subject to explicit hard constraints; instead, the subset reconstruction cost $L_{w}[k]$ and the spatial switching cost $L_{geo}[k]$ are incorporated into the instantaneous loss. In this way, the adversarial training process learns a self-consistent trade-off between regulation performance and switching overhead.

Specifically, let the newly added subset be defined as $A^{+}[k] = S[k] \setminus S[k - 1]$. Let $\mathrm{pos}_{a}$ denote the spatial position (or equivalent network coordinate) of aggregator $a$, and $r(a)$ denote its regional index. For any newly added aggregator $a \in A^{+}[k]$, the spatial coordination cost associated with joining the current execution set is characterized by its minimum spatial distance to the previously active subset, $\min_{b \in S[k - 1]} \| \mathrm{pos}_{a} - \mathrm{pos}_{b} \|$. Meanwhile, if the nearest reference node $b^{*} = \arg\min_{b \in S[k - 1]} \| \mathrm{pos}_{a} - \mathrm{pos}_{b} \|$ belongs to a different region than $a$, an additional cross-region penalty is introduced to reflect the boundary coordination and communication overhead. The spatial switching cost is defined as:

$$L_{geo}[k] = \sum_{a \in A^{+}[k]} \left( \rho_{d} \min_{b \in S[k - 1]} \| \mathrm{pos}_{a} - \mathrm{pos}_{b} \| + \rho_{r}\, \mathbb{I}\big(r(a) \neq r(b^{*})\big) \right) \qquad (10)$$

Here, $\rho_{d} > 0$ controls the distance-driven switching penalty intensity, while $\rho_{r} > 0$ controls the additional cross-regional penalty intensity. Equation (10) accumulates the costs item by item over the newly added subset, meaning that when a batch reconstruction involves more aggregators, larger spatial spans, or higher cross-regional components, the corresponding switching cost becomes larger. In this way, spatial heterogeneity is embedded into the selection–switching mechanism in an optimizable manner, providing clear structural constraints for policy learning.
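Equation (10) can be evaluated directly, as in the following sketch. It assumes Euclidean distance between planar coordinates and a non-empty previous subset; the coordinates, region labels, and penalty weights are hypothetical.

```python
import math

def spatial_switching_cost(curr, prev, pos, region, rho_d, rho_r):
    """Eq. (10): distance plus cross-region penalty over newly added aggregators.

    Assumes prev is non-empty so that a nearest reference node always exists.
    """
    added = set(curr) - set(prev)
    cost = 0.0
    for a in added:
        # Nearest previously active aggregator b* and its distance to a.
        b_star = min(prev, key=lambda b: math.dist(pos[a], pos[b]))
        d_min = math.dist(pos[a], pos[b_star])
        cross = 1.0 if region[a] != region[b_star] else 0.0
        cost += rho_d * d_min + rho_r * cross
    return cost

# Hypothetical layout: two regions, four aggregators on a plane.
pos = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (4.0, 0.0), 3: (4.0, 3.0)}
region = {0: "A", 1: "A", 2: "B", 3: "B"}

same_region = spatial_switching_cost({1, 2}, {0, 2}, pos, region, rho_d=0.5, rho_r=2.0)
cross_region = spatial_switching_cost({0, 2}, {0, 1}, pos, region, rho_d=0.5, rho_r=2.0)
print(same_region, cross_region)
```

In the second call the newly added aggregator 2 is closest to aggregator 1, which lies in another region, so the cross-region penalty is added on top of the distance term.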

On this basis, this paper adopts a unified instantaneous loss function to jointly measure frequency regulation performance and switching overhead. The regulation performance is represented by the frequency deviation term $(\Delta f[k])^{2}$. Frequent reconstruction of the execution subset at the fast timescale may introduce additional dispatching and coordination overhead. This includes control channel reconfiguration, resource coordination, and concurrent switching on the execution side. It may also cause policy oscillations. To explicitly incorporate the magnitude and frequency of subset reconstruction into the policy learning process, the subset reconstruction scale cost is defined as follows:

$$L_{w}[k] = \left| S[k] \,\triangle\, S[k - 1] \right| \qquad (11)$$

Here, $\triangle$ denotes the symmetric difference between sets, so $L_{w}[k]$ counts the aggregators that enter or leave the execution subset at the current step. Because the subset size is fixed at $Q$, this equals twice the number of aggregators replaced. This term and the spatial switching cost $L_{geo}[k]$ penalize, respectively, how many aggregators are replaced and how often, and where the switching occurs and whether it involves cross-regional transitions. In this way, without imposing hard constraints on the switching scale at each step, the policy is guided during adversarial training to adaptively form a subset evolution rhythm that aligns with real operational environments.

Combining the above components, the instantaneous loss function of the defender at the fast-timescale step $k$ is given as follows:

$$L_{D}[k] = \alpha_{f} (\Delta f[k])^{2} + \alpha_{w} L_{w}[k] + \alpha_{g} L_{geo}[k] \qquad (12)$$

Here, $\alpha_{f}, \alpha_{w}, \alpha_{g} > 0$ are weighting coefficients. Specifically, $\alpha_{f}$ determines the importance of the frequency-regulation performance term, $\alpha_{w}$ controls the penalty on the subset reconstruction scale, and $\alpha_{g}$ controls the penalty on spatial switching overhead. Equation (12) defines the trade-off between improving frequency regulation performance and limiting the magnitude, frequency, and spatial range of batch switching. Under this formulation, the learned policy tends to follow a gradual subset evolution path that respects spatial constraints. During offline training, the attacker and defender share the same performance metric $L_{D}$, which forms a zero-sum adversarial game. Under the communication degradation model and the unified attack budget constraint in (9), the attacker selects uplink and downlink tampering actions to degrade frequency regulation performance. The defender counteracts these disturbances through the subset selection and switching strategy. Equations (10)–(12) therefore describe the switching mechanism and the optimization signals under spatial heterogeneity constraints. These equations define the objective used in the adversarial training problem introduced in the next section.
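The reconstruction cost in Equation (11) and the combined loss in Equation (12) can be sketched as follows. The weights and the value passed for the spatial term are hypothetical, and the spatial cost is taken as a precomputed input to keep the example short.

```python
def reconstruction_cost(curr, prev):
    """Eq. (11): size of the symmetric difference between consecutive subsets."""
    return len(set(curr) ^ set(prev))

def defender_loss(delta_f, curr, prev, l_geo, alpha_f, alpha_w, alpha_g):
    """Eq. (12): weighted sum of squared frequency deviation and switching costs."""
    l_w = reconstruction_cost(curr, prev)
    return alpha_f * delta_f ** 2 + alpha_w * l_w + alpha_g * l_geo

# One swap in a fixed-size subset: aggregator 4 leaves and aggregator 7 joins.
prev, curr = {1, 2, 3, 4, 5}, {1, 2, 3, 5, 7}
l_w = reconstruction_cost(curr, prev)
loss = defender_loss(delta_f=0.1, curr=curr, prev=prev, l_geo=1.5,
                     alpha_f=10.0, alpha_w=0.05, alpha_g=0.02)
print(l_w, loss)
```

For a fixed-size subset, a single swap contributes two elements to the symmetric difference (one leaving, one joining), which is worth keeping in mind when choosing the weight on the reconstruction term.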

3.3. Solving the Adversarial Optimization Problem via Reinforcement Learning

Based on the instantaneous loss $L_{D}$ defined in Section 3.2 (Equation (12)), together with the communication degradation and bidirectional FDIA injection models introduced in Section 2 (Equations (3)–(6)) and the unified attack budget constraint (Equation (9)), the attack–defense interaction can be formulated as a zero-sum min–max adversarial optimization problem over the long-term discounted loss [24]. Let the defender policy be denoted by $\pi$. At each fast-timescale step $k$, the defender outputs a selection–switching action $a_{D}[k] = (S[k], \{q_{a}[k]\}_{a \in S[k]})$ based on the observable state

$$x[k] \triangleq \left[ \Delta f[k],\ ACE[k],\ \Delta f_{k - H : k - 1},\ ACE_{k - H : k - 1},\ S[k - 1],\ q[k - 1] \right].$$

Let the attacker policy be denoted by $\rho$. At each slow-timescale window $m$, the attacker generates an uplink capacity tampering vector $\delta[m] = [\delta_{a}[m]]_{a \in A}$ and, for each fast-timescale step $k \in K_{m}$ within that window, outputs a downlink command tampering vector $\eta[k] = [\eta_{a}[k]]_{a \in A}$. Accordingly, starting from the initial time $k = 0$, the adversarial objective is defined as follows [24]:

$$(\pi^{*}, \rho^{*}) = \arg\min_{\pi}\ \arg\max_{\rho \in U(\Gamma)} E\left[ \sum_{k = 0}^{\infty} \gamma^{k} L_{D}[k] \right] \qquad (13)$$

Here, the discount factor $\gamma \in (0, 1)$ is used to attenuate long-term effects and ensure that the infinite-horizon loss remains bounded. The set $U(\Gamma)$ denotes the feasible strategy set of the attacker, whose actions must satisfy the unified budget constraint within each slow-timescale window $m$, meaning that the sum of the attack magnitude energy and the spatial dispersion cost does not exceed $\Gamma$. Equation (13) corresponds to a zero-sum dynamic game. The attacker seeks to maximize the degradation of frequency regulation performance under the attack budget constraint, while the defender minimizes the combined cost of regulation degradation and switching overhead through the selection–switching decision mechanism. This min–max formulation creates a symmetric interaction between attack and defense. Furthermore, to characterize the adversarial cost starting from any fast-timescale step $k$ and system state $x[k]$, the optimal adversarial value function is defined as:

$$J[k, x[k]] = \min_{\pi} \max_{\rho \in U(\Gamma)} E\left[ \sum_{l = k}^{\infty} \gamma^{l - k} L_{D}[l] \,\Big|\, x[k] \right] \qquad (14)$$

In (14), $l$ is the future fast-timescale step index, introduced to distinguish the summation variable from the current step $k$. The problem simultaneously involves system dynamics and the stochasticity introduced by communication degradation, the timescale coupling and budget constraints of uplink and downlink attacks, and the combinatorial action space of the defender (subset selection and mode allocation). As a result, obtaining analytical solutions for $J(\cdot)$ and the saddle-point strategies is generally intractable. To address this challenge, this paper adopts adversarial reinforcement learning to approximately solve Equation (13) within the policy space [25].

To solve the adversarial scheduling problem formulated in (13), this paper adopts a gradient descent–ascent based alternating optimization framework. The defender and attacker policies are trained through repeated adversarial interactions within a simulated cyber–physical environment. Let $\pi_{\theta}$ denote the defender policy parameterized by $\theta$, and $\rho_{\psi}$ denote the attacker policy parameterized by $\psi$. The defender aims to minimize the cumulative system loss, while the attacker attempts to maximize it by injecting malicious perturbations into the communication channels. The policy updates follow a min–max optimization scheme:

$$\theta \leftarrow \theta - \alpha_{D} \nabla_{\theta} \Upsilon(\theta, \psi), \qquad \psi \leftarrow \psi + \alpha_{A} \nabla_{\psi} \Upsilon(\theta, \psi) \qquad (15)$$

where $\Upsilon(\theta, \psi)$ denotes the parameterized objective corresponding to (13), and $\alpha_{D}$ and $\alpha_{A}$ denote the learning rates of the defender and attacker policies, respectively. In practice, the gradients in (15) are approximated using policy-gradient based actor–critic updates. Specifically, the defender policy generates a score vector for all candidate aggregators, from which a fixed-size execution subset is selected. The policy gradient is estimated using the advantage function computed from the critic network. Similarly, the attacker policy outputs a probability distribution over predefined attack actions, and its parameters are updated according to the corresponding policy gradient. Since attacker actions must satisfy the unified attack budget constraint defined in (9), the attacker network first produces unconstrained injection vectors $(\delta, \eta)$. These outputs are then projected onto the feasible attack set using a projection operator:

$$(\delta, \eta) \leftarrow \Pi_{U(\Gamma)}(\delta, \eta) \qquad (16)$$

where $U(\Gamma)$ represents the feasible set defined by the attack budget $\Gamma$. In implementation, this projection is achieved by normalizing the attack vector whenever the total attack cost exceeds the budget limit. This guarantees that both the magnitude and spatial dispersion of attack actions remain within the allowable resource constraints in each slow-timescale window. Through this adversarial learning process, the attacker gradually learns to generate disruptive attack strategies under the budget constraint, while the defender learns a robust scheduling policy capable of maintaining frequency regulation performance even under worst-case cyber disturbances.
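The update mechanics of (15) and (16) can be illustrated on a toy problem. The sketch below is not the actor–critic procedure of the paper: it applies exact gradients to a hypothetical quadratic saddle objective, treats the attack vectors as the attacker's parameters, and assumes a budget cost equal to the attack energy only, omitting the spatial dispersion term because its form in (9) is not reproduced here.

```python
import numpy as np

def project_to_budget(delta, eta, gamma):
    """Rescale (delta, eta) so the assumed cost ||delta||^2 + ||eta||^2 stays <= gamma."""
    cost = float(delta @ delta + eta @ eta)
    if cost <= gamma:
        return delta, eta
    scale = np.sqrt(gamma / cost)      # uniform normalization back onto the budget
    return scale * delta, scale * eta

def gda_step(theta, delta, eta, lr_d, lr_a, gamma):
    """One descent step for the defender and one projected ascent step for the attacker.

    Toy objective (hypothetical):
        Y = 0.5*||theta||^2 + theta.(delta + eta) - 0.5*(||delta||^2 + ||eta||^2)
    """
    g_theta = theta + delta + eta      # dY/dtheta
    g_delta = theta - delta            # dY/ddelta
    g_eta = theta - eta                # dY/deta
    theta_new = theta - lr_d * g_theta
    delta_new, eta_new = project_to_budget(
        delta + lr_a * g_delta, eta + lr_a * g_eta, gamma
    )
    return theta_new, delta_new, eta_new

theta = np.array([1.0, -2.0, 0.5])
delta = np.array([3.0, 0.0, -1.0])
eta = np.array([0.0, 2.0, 2.0])
delta, eta = project_to_budget(delta, eta, gamma=1.0)   # start from a feasible attack
for _ in range(300):
    theta, delta, eta = gda_step(theta, delta, eta, lr_d=0.1, lr_a=0.1, gamma=1.0)
print(np.linalg.norm(theta), np.linalg.norm(delta), np.linalg.norm(eta))
```

On this convex–concave toy objective the iterates approach the saddle point at the origin. The real problem is neither convex nor concave in the policy parameters, so no such guarantee carries over, and convergence there is assessed empirically.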

After training converges, only the optimized defender policy $\pi^{*}$ is retained for online deployment, while the attacker policy is used solely during the offline adversarial training stage.

3.4. DOARL-S Training and Deployment Procedure

The proposed DOARL-S framework contains an offline training stage and an online operation stage, as shown in Figure 3. The solid arrows denote explicit signal or parameter flows among modules, while the dashed arrows denote joint training and learning interaction in the adversarial RL process. During offline training, the conditional diffusion model in Section 3.1 learns the distribution of aggregator capacity reports from historical data. After training, the model generates capacity-reporting scenarios that reflect spatial heterogeneity. These scenarios are used to construct training environments for policy learning. At the beginning of each training episode, a nominal capacity-report sequence is sampled from the diffusion model to initialize the system state. The attacker and defender then interact within the simulation environment. At the start of each slow-timescale window, the attacker generates actions that manipulate uplink capacity reports and downlink control commands. The defender responds by selecting an execution subset of aggregators and adjusting the scheduling decision based on the observed system state. The system state evolves according to the frequency dynamics in Section 2 together with the communication degradation and FDIA models. The instantaneous loss defined in (12) is evaluated at each control step, and the attacker and defender policies are updated through repeated adversarial interactions. Through this process, the defender learns a scheduling policy that remains effective under worst-case attack conditions.

This paper adopts a dual-agent actor–critic reinforcement learning structure, with independent actor and critic networks for the attacker and the defender. The attacker’s actor outputs a probability distribution over attack actions based on the system state, which is used to select the attack area, attack link, and attack intensity; the attacker’s critic estimates the state value under the current attack strategy. The defender’s actor scores the 10 EV aggregators based on the frequency state, ACE history, aggregator available capacity, and the selection result from the previous step, and the 5 highest-scoring aggregators are selected to participate in frequency regulation; the defender’s critic estimates the state value under the defense strategy. All four networks are two-layer fully connected MLPs with 128 neurons in each hidden layer. The output dimension of the defender’s actor is 10, one selection score per EV aggregator. The output dimension of the attacker’s actor is 64, corresponding to the discrete attack action library formed by combining the uplink attack area selection, the downlink attack area selection, and the four attack intensity levels.
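The score-then-select step of the defender's actor can be sketched as follows. The scores are hypothetical placeholders for the actor output, and the source does not specify tie-breaking or exploration during training, so neither is modeled here.

```python
import numpy as np

def select_top_q(scores, q):
    """Return the sorted indices of the q highest-scoring aggregators."""
    scores = np.asarray(scores, dtype=float)
    if not 0 < q <= scores.size:
        raise ValueError("q must be between 1 and the number of aggregators")
    # argpartition places the q largest scores (smallest negated scores) first.
    top = np.argpartition(-scores, q - 1)[:q]
    return np.sort(top)

# Hypothetical actor output for 10 aggregators; 5 are selected for execution.
scores = [0.10, 0.90, 0.30, 0.80, 0.20, 0.70, 0.05, 0.60, 0.40, 0.50]
subset = select_top_q(scores, q=5)
print(subset.tolist())
```

Because only the ranking of the scores matters, the selected subset is unchanged by any strictly increasing rescaling of the actor output.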

During online operation, the trained defender policy is deployed as the scheduling strategy. At each fast-timescale control step, the dispatch center constructs the system state from system-level feedback such as frequency deviation and regulation response. The policy network then selects a fixed number of aggregators from the candidate set to form the execution subset. The selected aggregators perform real-time power tracking, while the remaining aggregators remain in standby mode. If certain aggregators exhibit degraded execution performance due to communication problems or potential attacks, the policy adjusts the execution subset according to the observed system response. Most computation is completed during offline training. Online operation only requires forward inference of the policy network with lightweight policy updates.

4. Simulation and Results Analysis

This section evaluates the performance of the proposed method through simulation studies in a wide-area V2G frequency regulation scenario. EV aggregators are distributed across three regions: a scenic area, a residential area, and a commercial area, containing 2, 3, and 5 aggregators, respectively. A transmission system operator (TSO) coordinates these aggregators through centralized control. Each aggregator periodically reports its available regulation capacity to the TSO. Based on the system frequency feedback, the TSO selects a subset of aggregators to participate in frequency regulation and dynamically switches them according to the system state. At the V2G aggregation layer, the number of candidate aggregators is N = 10, and in each control step Q = 5 of them are selected for real-time execution. The nominal grid frequency is set to 50 Hz, with an inertia constant of H = 5.5 and a damping coefficient of D = 1.8. The system frequency control is updated every 1 s, while the aggregators report their available capacity every 10 s. The aggregators are connected to buses 4, 8, 15, 16, 18, 21, 23, 25, 26, and 28 of the IEEE 39-bus system to represent spatial heterogeneity. The available capacity S of each aggregator is determined by the number of available EVs and the power of each vehicle, with the maximum adjustable power of each vehicle set to 0.01 MW. The V2G power tracking adopts a first-order dynamic model with time constant $T_{e} = 0.05$. The FDIA decision interval is also 10 s, although attacks do not necessarily occur at every interval.

Due to differences in EV population, travel patterns, and charging behavior across regions, the available capacity of aggregators varies across locations. To generate representative capacity data for policy training, a diffusion model is trained to learn the statistical characteristics of the reported capacity data.

Figure 4 further compares the correlation patterns among aggregators. The generated samples reproduce the regional correlation observed in the real data, indicating that the diffusion model captures the spatial characteristics of the capacity variations.

These results show that the diffusion model can generate capacity data that are consistent with the characteristics of the simulated system, providing representative scenarios for subsequent control policy training.

4.1. Ablation Study of the DOA Mechanism in DOARL-S

To validate the effectiveness of the proposed method in a more representative power-grid test scenario, simulations are conducted on the MATPOWER IEEE 39-bus system. Figure 5 shows the total power disturbance formed by load disturbances and renewable generation deviations. Figure 6 shows the trend of the round-level system loss $L_{D}$ during the offline adversarial training process. In the early stage of training, the loss fluctuates significantly because the attacker and defender strategies are updated alternately. As training progresses, the loss decreases overall and eventually settles at a relatively stable low level, which indicates that the adversarial training converges in practice. Figure 7 and Figure 8 show the system responses under three strategies in the presence of FDIA and communication degradation. Table 1 summarizes the corresponding performance metrics.

As shown in Figure 7, the no-defense case exhibits the largest frequency fluctuation, with a frequency nadir of −0.36573 Hz and a positive peak of 0.46115 Hz. This indicates that attacks and communication degradation affect both aggregator capacity perception and regulation execution, thereby weakening the system’s frequency regulation capability. The RLS policy reduces the RMS frequency deviation only slightly, from 0.13426 Hz to 0.12246 Hz, and it performs poorly under extreme conditions, even producing a deeper frequency nadir of −0.39399 Hz. This suggests that an RLS policy trained solely on mixed datasets struggles to counter precision attacks effectively.

This mainly results from the characteristics of the training data. The RLS policy is trained using historical operational data, which contains both normal samples and samples affected by attacks. Since these samples are mixed together, it is difficult to identify which data correspond to attack conditions. As a result, the learned policy mainly reflects average operating conditions rather than being specifically optimized for attack scenarios. By contrast, DOARL-S achieves better frequency regulation performance. The frequency nadir further improves to −0.22777 Hz, and the positive peak decreases to 0.25411 Hz. Meanwhile, the RMS frequency deviation is reduced to 0.09174 Hz, corresponding to a 31.67% improvement compared with the no-defense case. This improvement mainly comes from the offline diffusion adversarial training stage. During this process, the attacker policy is continuously updated and gradually learns more effective attack strategies, which introduce stronger disturbances to the system. The defense policy is updated simultaneously, allowing it to adjust to these attack behaviors. After training, the policy can maintain more stable regulation performance when facing attacks.

Figure 8 shows the EV command mismatch, defined as the difference between the EV regulation command and the actual EV output. In the no-defense case, the mismatch exhibits several large peaks, indicating that FDIAs and communication degradation lead to large deviations between the issued command and the actual EV response. By contrast, under DOARL-S, the peaks of the EV regulation mismatch are clearly reduced, and large mismatch events occur less frequently. Although DOARL-S also does not perform attack detection during online operation, its policy is obtained through offline adversarial training. During this process, the attacker policy is continuously updated and searches for attack strategies that can amplify system disturbances.

As a result, aggregators with unstable execution responses are more likely to be exploited by attacks. After adversarial training, the policy tends to select aggregators with more stable execution responses. Table 1 also reflects this result. Among the three strategies, DOARL-S achieves the lowest average EV command mismatch, which is 2.412 MW.

Overall, without DOA, the RLS strategy can partially mitigate the impact of attacks, but the improvement remains limited. After introducing DOA, both frequency stability and regulation execution consistency are significantly improved. These results highlight the contribution of the DOA mechanism in improving the robustness of the RL-based aggregator selection strategy.

4.2. Validation of DOARL-S in an Expanded Scenario

To further validate the applicability of the proposed method in a larger-scale power system, this section presents an extended study on the MATPOWER IEEE 57-bus test system, where the number of aggregators is increased from 10 to 20. The aggregators are distributed across three regions, with 3, 7, and 10 aggregators, respectively. In each control period, 10 aggregators are selected from 20 candidates for frequency regulation. In addition, more attack periods are included in this case to test the method under a more demanding setting. The corresponding performance metrics are listed in Table 2. Figure 9 presents the power disturbance in this scenario, resulting from the combined effects of load disturbances and renewable generation deviations. Figure 10 shows the comparison of frequency responses under different strategies. Figure 11 further illustrates the EV command mismatch performance in the expanded 20-aggregator scenario.

As shown in Figure 10 and Table 2, the no-defense case exhibits relatively large frequency fluctuations, with a frequency nadir of −0.41323 Hz and a positive peak of 0.40058 Hz. After applying DOARL-S, the magnitude of the frequency fluctuations is reduced and the extreme deviations are suppressed.

Figure 11 shows the EV regulation mismatch. In the no-defense case, pronounced mismatch peaks appear during attack periods. Under the DOARL-S strategy, these large mismatch peaks are alleviated to some extent, although small fluctuations still remain. This trend is also reflected in Table 2, where the average EV mismatch decreases from 5.6283 MW to 3.7476 MW.

Overall, when the system size is expanded to the 57-bus system and the attack periods become more frequent, the proposed method can still maintain a stable frequency response and reduce the EV regulation mismatch to some extent. This indicates that the method remains applicable in a larger-scale scenario.

4.3. Comparison of DOARL-S with Different Defense Methods

To further evaluate the performance of the proposed method on the MATPOWER IEEE 39-bus system, DOARL-S is compared with several representative defense strategies: Detection Compensation Selection (DCS), Observer-Smoothed Switching (OSS), Robust Worst-Case Switching (RWS), and Kalman-Trust Switching (KTS). The case without defense (No defense) is also provided as a reference. DCS judges whether an aggregator is abnormal from the error between its reported capacity and its EV power; if an abnormality is detected, it conservatively corrects the available capacity [26,27,28]. OSS does not use the currently reported capacity value directly, but combines it with historical estimates to update the available capacity smoothly, which reduces the impact of short-term abnormal data on the scheduling decision [29,30]. RWS constructs a risk score from factors such as capacity fluctuations, communication delay, packet loss rate, power tracking error, and switching risk, and prioritizes the aggregators with lower comprehensive risk [31,32]. KTS updates the trust value of the aggregator based on the capacity residual and EV, and uses the trust-weighted available capacity as the selection criterion [33,34,35]. Figure 12 and Figure 13 present the system frequency responses and EV regulation mismatches, respectively, while Table 3 summarizes the corresponding performance metrics.

Figure 12 and Table 3 show the frequency responses under different methods. Without any defense, the system exhibits the largest frequency fluctuation, with the minimum frequency deviation reaching −0.36573 Hz and an RMS frequency deviation of 0.13426 Hz. After introducing defense strategies, the frequency stability is improved to different extents. Among them, DOARL-S achieves the lowest RMS frequency deviation of 0.09174 Hz, corresponding to a 31.67% reduction compared with the no-defense case. KTS and OSS also provide effective frequency regulation, with RMS deviations of 0.11802 Hz and 0.10844 Hz, respectively, while DCS results in 0.09661 Hz. In contrast, RWS shows limited improvement, with the RMS frequency deviation remaining at 0.11898 Hz.

One important reason for the limited improvement of RWS lies in the spatial heterogeneity of EV aggregators across different regions. In such scenarios, the available regulation capacity and response capability vary significantly among regions. However, RWS adopts a conservative switching strategy based on worst-case risk, which does not fully exploit the regulation differences among regions. As a result, during certain disturbance periods the system cannot effectively utilize aggregators with higher regulation capability, which limits the overall improvement in frequency regulation performance.
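The reduction percentages can be checked directly from the RMS values in Table 3. The following minimal sketch assumes the improvement is defined as the relative decrease with respect to the no-defense RMS; the helper names `rms` and `rms_improvement_pct` are illustrative and not taken from the paper.

```python
import math
from typing import Dict, Sequence


def rms(deviations_hz: Sequence[float]) -> float:
    """Root-mean-square of a frequency deviation series (Hz)."""
    return math.sqrt(sum(d * d for d in deviations_hz) / len(deviations_hz))


def rms_improvement_pct(baseline_rms: float, method_rms: float) -> float:
    """Relative RMS reduction with respect to the baseline, in percent."""
    return 100.0 * (baseline_rms - method_rms) / baseline_rms


# RMS frequency deviations (Hz) as listed in Table 3.
RMS_HZ: Dict[str, float] = {
    "No defense": 0.13426,
    "DCS": 0.09661,
    "OSS": 0.10844,
    "DOARL-S": 0.09174,
    "RWS": 0.11898,
    "KTS": 0.11802,
}

improvements = {
    name: round(rms_improvement_pct(RMS_HZ["No defense"], value), 2)
    for name, value in RMS_HZ.items()
}
```

Because the table entries are themselves rounded, a recomputed percentage may differ from the tabulated one in the last digit.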

From the perspective of extreme frequency deviations, different methods also exhibit different suppression characteristics. For example, DCS improves the frequency nadir to −0.26498 Hz, and its positive peak reaches 0.25464 Hz. DOARL-S achieves the best performance in both the frequency nadir (−0.22777 Hz) and peak (0.25411 Hz) metrics, while also attaining the lowest overall level of frequency fluctuation. It can also be observed that DOARL-S suppresses frequency deviations to a similar extent on the positive and negative sides. This balanced regulation characteristic helps the system maintain a more stable frequency response under attacks and communication disturbances in either direction.

Figure 13 further presents the EV regulation mismatch under different methods. As shown in Table 3, the performance of different methods varies significantly. DCS achieves the smallest mean EV mismatch of 1.0011 MW, followed by DOARL-S with 2.412 MW and OSS with 2.5171 MW, while KTS reaches 3.1483 MW. In contrast, RWS shows a significantly larger mismatch of 5.4774 MW.

The relatively small mismatch of DCS is mainly due to its mechanism for correcting capacity information or execution responses during decision-making, which helps reduce the discrepancy between regulation commands and actual outputs. It should be noted that DOARL-S is not the best in terms of EV mismatch. This is because its objective focuses more on suppressing frequency deviations than on minimizing the mismatch itself. When the frequency deviation is already small and the regulation demand is limited, DOARL-S may tolerate a larger mismatch, even if more data have been tampered with, as long as the frequency response is not significantly affected.
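To make the mismatch metric concrete, the sketch below computes a mean EV mismatch as the time-averaged absolute difference between the regulation command and the actual EV output. This definition is an assumption chosen for illustration, and the function name `mean_ev_mismatch` and the sample values are hypothetical; the exact formula behind the tabulated values is not restated in this section.

```python
from typing import Sequence


def mean_ev_mismatch(commands_mw: Sequence[float], outputs_mw: Sequence[float]) -> float:
    """Average absolute gap between commanded and delivered EV power (MW)."""
    if len(commands_mw) != len(outputs_mw):
        raise ValueError("command and output series must have the same length")
    if not commands_mw:
        raise ValueError("series must not be empty")
    gaps = [abs(c - o) for c, o in zip(commands_mw, outputs_mw)]
    return sum(gaps) / len(gaps)


# Made-up three-step example: the second step is tracked exactly.
example_mismatch = mean_ev_mismatch([10.0, -5.0, 0.0], [8.0, -5.0, 3.0])
```

Under this definition a controller can show a nonzero mismatch while the frequency is still well regulated, which is the situation described above for DOARL-S.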

Overall, different methods exhibit different strengths in frequency stability and execution consistency. DCS and OSS perform well in reducing EV regulation mismatches, while DOARL-S achieves better overall frequency regulation performance. As shown in Table 3, DOARL-S obtains the lowest RMS frequency deviation while maintaining a relatively low EV mismatch. Under the coexistence of attacks and communication degradation, the proposed method maintains a stable frequency response and demonstrates more balanced regulation behavior, leading to superior overall performance.
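One way to read this trade-off is to ask which methods in Table 3 are not beaten on both metrics at once. The sketch below applies a standard Pareto-dominance check to the tabulated RMS deviation and mean EV mismatch; this framing is an illustration added here, not an analysis performed in the paper, and `pareto_front` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# (RMS frequency deviation in Hz, mean EV mismatch in MW) from Table 3.
TABLE3: Dict[str, Tuple[float, float]] = {
    "No defense": (0.13426, 6.662),
    "DCS": (0.09661, 1.0011),
    "OSS": (0.10844, 2.5171),
    "DOARL-S": (0.09174, 2.412),
    "RWS": (0.11898, 5.4774),
    "KTS": (0.11802, 3.1483),
}


def pareto_front(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the methods not dominated when both metrics are minimized."""
    front: List[str] = []
    for name, (rms_hz, mismatch_mw) in points.items():
        dominated = any(
            other_rms <= rms_hz
            and other_mis <= mismatch_mw
            and (other_rms < rms_hz or other_mis < mismatch_mw)
            for other, (other_rms, other_mis) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


non_dominated = pareto_front(TABLE3)
```

On these values only DCS and DOARL-S remain: DCS has the lowest mismatch and DOARL-S the lowest RMS deviation, which matches the complementary strengths noted above.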

5. Conclusions

This paper presented a resilient switching framework for V2G-enabled frequency regulation in the presence of communication degradation and bidirectional false data injection attacks. By combining diffusion-based scenario generation with offline adversarial reinforcement learning, the proposed method supports detection-free subset switching using closed-loop system feedback and reduces the effect of unreliable or compromised aggregators. The effectiveness of the framework was validated through ablation, scalability, and comparative studies. DOARL-S reduced the RMS frequency deviation from 0.13426 Hz in the no-defense case to 0.09174 Hz, achieving a 31.67% improvement, while maintaining a relatively low EV regulation mismatch. It also outperformed DCS, OSS, RWS, and KTS in overall frequency regulation performance. Overall, the results indicate that this framework can improve the resilience of wide-area V2G frequency control under spatially heterogeneous cyber-physical uncertainties. Future work will consider more realistic network dynamics, coordinated multi-channel attack models, and deployment in large-scale interconnected power systems.

Author Contributions

Conceptualization, X.X. and Q.K.; methodology, X.X. and Z.W.; software, X.X. and T.Z.; validation, X.X.; formal analysis, X.X. and K.X.; investigation, X.X. and H.Z.; resources, X.X. and S.L.; data curation, X.X. and Z.H.; writing—original draft preparation, X.X.; writing—review and editing, X.X. and Q.K.; visualization, X.X.; supervision, Q.K.; project administration, X.X. and Q.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Sun, J.; Tan, S.; Zheng, H.; Qi, G.; Tan, S.; Peng, D.; Guerrero, J.M. A DoS Attack-Resilient Grid Frequency Regulation Scheme via Adaptive V2G Capacity-Based Integral Sliding Mode Control. IEEE Trans. Smart Grid 2023, 14, 3046–3057.
2. Ajami, O.M.; Alkhusaibi, M.S.; Tan, R.H.; Jamaludin, F.A.; Nadarajah, M. The impact of increased renewable energy penetration and reduced inertia on the frequency nadir in a multi-area interconnected network based on the peninsula Malaysia national grid. Glob. Energy Interconnect. 2026, 9, 372–385.
3. Alamgir, S.; Hassan, S.J.U.; Mehdi, A.; Abdelmaksoud, A.; Haider, Z.; Shin, G.S.; Kim, C.H. A comprehensive review of vehicle–to–grid technology as an ancillary services provider. Results Eng. 2025, 27, 106813.
4. Kaur, K.; Singh, M.; Kumar, N. Multiobjective Optimization for Frequency Support Using Electric Vehicles: An Aggregator-Based Hierarchical Control Mechanism. IEEE Syst. J. 2019, 13, 771–782.
5. Panda, D.K.; Halder, K.; Das, S.; Townley, S. Observer based decentralized load frequency control with false data injection attack for specified network quality and delay. Chaos Solitons Fractals 2024, 186, 115323.
6. Feng, S.; Tesi, P. Resilient control under Denial-of-Service: Robust design. In Proceedings of the 2016 American Control Conference (ACC), Boston, MA, USA, 6–8 July 2016; pp. 4737–4742.
7. Gao, W.; Deng, C.; Jiang, Y.; Jiang, Z.-P. Resilient reinforcement learning and robust output regulation under denial-of-service attacks. Automatica 2022, 142, 110366.
8. Mazare, M. Reinforcement learning-based fixed-time resilient control of nonlinear cyber physical systems under false data injection attacks and mismatch disturbances. J. Frankl. Inst. 2023, 360, 14926–14938.
9. Ye, J.; Yu, X. Detection and Estimation of False Data Injection Attacks for Load Frequency Control Systems. J. Mod. Power Syst. Clean Energy 2022, 10, 861–870.
10. Abouzeid, S.I.; Chen, Y.; Zaery, M.; Abido, M.A.; Raza, A.; Abdelhameed, E.H. Load frequency control based on reinforcement learning for microgrids under false data attacks. Comput. Electr. Eng. 2025, 123, 110093.
11. Mohan, A.M.; Meskin, N. Observer-based false data injection attack resilient event-triggered control of microgrid load frequency control system. ISA Trans. 2025, 164, 46–60.
12. Yu, X.; Gao, C.; Du, Y.; Gao, B.; Tian, D.; Hou, T. Adaptive residual observer-based detection and isolation framework against false data injection attack in large-scale power systems. Sci. Rep. 2025, 15, 41070.
13. Kim, K.; Sasahara, H.; Imura, J.I. Adaptive false data injection attack detection in load frequency control using recurrent neural networks. SICE J. Control Meas. Syst. Integr. 2025, 18, 2597566.
14. Zheng, F.; Li, W.; Li, H.; Yang, L.; Sun, Z. Research on load frequency control system attack detection method based on multi-model fusion. Energy Inform. 2025, 8, 72.
15. Zhao, X.; Ma, Z.; Shi, X.; Zou, S. Attack Detection and Mitigation Scheme of Load Frequency Control Systems Against False Data Injection Attacks. IEEE Trans. Ind. Inform. 2024, 20, 9952–9962.
16. Hong, C.; Liang, Z.; Yang, Y.; Li, P.; Chen, L.; Bi, L.; Zhang, Y. Research on detection and defense methods for false data injection attacks in power systems based on state-space decomposition. Discov. Appl. Sci. 2025, 7, 760.
17. Huang, C.; Deng, S.; Ge, H. Resilient load frequency control of cyber–physical power systems with off-the-shelf redundant communication channels under FDI attacks. Meas. Energy 2025, 7, 100053.
18. Khandani, K.; Hafezi, A. Observer-Based Exponential H_∞ Event-Triggered Load Frequency Control Approach for Multi-Area Power Systems. Int. J. Control 2025, 1–20.
19. Uribe-Pérez, N.; Gonzalez-Garrido, A.; Gallarreta, A.; Justel, D.; González-Pérez, M.; González-Ramos, J.; Arrizabalaga, A.; Asensio, F.J.; Bidaguren, P. Communications and Data Science for the Success of Vehicle-to-Grid Technologies: Current State and Future Trends. Electronics 2024, 13, 1940.
20. Shi, R.; Peng, S.; Chang, T.; Lee, K.Y. Annotated Survey on the Research Progress within Vehicle-to-Grid Techniques Based on CiteSpace Statistical Result. World Electr. Veh. J. 2023, 14, 303.
21. Zeng, F.; Wei, Z.; Sun, G.; Wang, M.; Han, H. Frequency Regulation of Electric Vehicle Aggregator Considering User Requirements with Limited Data Collection. Energies 2023, 16, 848.
22. Shariatzadeh, M.; Lopes, M.A.R.; Antunes, C.H. Electric vehicle users’ charging behavior: A review of influential factors, methods and modeling approaches. Appl. Energy 2025, 396, 126167.
23. Zhao, W.; Shao, Z.; Yang, S.; Lu, X. A novel conditional diffusion model for joint source-load scenario generation considering both diversity and controllability. Appl. Energy 2025, 377, 124555.
24. Moos, J.; Hansel, K.; Abdulsamad, H.; Stark, S.; Clever, D.; Peters, J. Robust Reinforcement Learning: A Review of Foundations and Recent Advances. Mach. Learn. Knowl. Extr. 2022, 4, 276–315.
25. Xi, A.; Cai, Y. Deep Reinforcement Learning-Based Differential Game Guidance Law against Maneuvering Evaders. Aerospace 2024, 11, 558.
26. Abbaspour, A.; Sargolzaei, A.; Forouzannezhad, P.; Yen, K.K.; Sarwat, A.I. Resilient Control Design for Load Frequency Control System Under False Data Injection Attacks. IEEE Trans. Ind. Electron. 2020, 67, 7951–7962.
27. Li, Y.; Huang, R.; Ma, L. False Data Injection Attack and Defense Method on Load Frequency Control. IEEE Internet Things J. 2021, 8, 2910–2919.
28. Sargolzaei, A.; Yazdani, K.; Abbaspour, A.; Crane, C.D., III; Dixon, W.E. Detection and Mitigation of False Data Injection Attacks in Networked Control Systems. IEEE Trans. Ind. Inform. 2020, 16, 4281–4292.
29. Zhang, M.; Dong, S.; Shi, P.; Chen, G.; Guan, X. Distributed Observer-Based Event-Triggered Load Frequency Control of Multiarea Power Systems Under Cyber Attacks. IEEE Trans. Autom. Sci. Eng. 2023, 20, 2435–2444.
30. Wu, L.; Liu, F.; Wang, Y.; Liu, C.; Liu, Q.; Xu, Y. Event-triggered observer-based load frequency control for cyber–physical power systems with electric vehicles under hybrid attacks. Control Eng. Pract. 2025, 165, 106600.
31. Singh, R.; Kumar, R.; Raj, U.; Shankar, R. Robust Load Frequency Control in Cyber-Vulnerable Smart Grids with Renewable Integration. Energies 2025, 18, 2899.
32. Shangguan, X.-C.; He, Y.; Zhang, C.K.; Yao, W.; Zhao, Y.; Jiang, L.; Wu, M. Resilient Load Frequency Control of Power Systems to Compensate Random Time Delays and Time-Delay Attacks. IEEE Trans. Ind. Electron. 2023, 70, 5115–5128.
33. Zhang, G.; Gao, W.; Li, Y.; Guo, X.; Hu, P.; Zhu, J. Detection of False Data Injection Attacks in a Smart Grid Based on WLS and an Adaptive Interpolation Extended Kalman Filter. Energies 2023, 16, 7203.
34. Li, H.; Lai, L.; Djouadi, S.M. Combating False Reports for Secure Networked Control in Smart Grid via Trustiness Evaluation. In Proceedings of the 2011 IEEE International Conference on Communications (ICC), Kyoto, Japan, 5–9 June 2011; pp. 1–5.
35. Liang, C.; Wen, F.; Wang, Z. Trust-based distributed Kalman filtering for target tracking under malicious cyber attacks. Inf. Fusion 2019, 46, 44–50.

Figure 1. Closed-loop grid frequency regulation with networked V2G aggregation.

Figure 2. Multi-area V2G aggregation architecture for frequency regulation.

Figure 3. The offline training framework of the proposed scheme.

Figure 4. Correlation comparison of real and diffusion-generated reported capacities.

Figure 5. The sum of power disturbances from load and renewable resources.

Figure 6. Convergence curve of the round-level system loss during offline adversarial training.

Figure 7. Comparison of frequency deviation curves with and without DOA.

Figure 8. Comparison of EV command mismatch with and without DOA.

Figure 9. The sum of power disturbances from load and renewable resources in the 57-bus system.

Figure 10. Comparison of frequency deviation curves in the 57-bus system.

Figure 11. Comparison of EV command mismatch in the 57-bus system.

Figure 12. Comparison of frequency deviation curves.

Figure 13. Comparison of EV command mismatch.

Table 1. Comparison of frequency response and EV command mismatch with and without DOA.

Method	Frequency Nadir (Hz)	Frequency Zenith (Hz)	RMS Frequency Deviation (Hz)	RMS Improvement (%)	Mean EV Command Mismatch (MW)
No defense	−0.36573	0.46115	0.13426	0	6.662
RLS w/o DOA	−0.39399	0.3824	0.12246	8.79	5.896
DOARL-S	−0.22777	0.25411	0.09174	31.67	2.412

Table 2. Frequency response and EV command mismatch comparison in the 57-bus system.

Method	Frequency Nadir (Hz)	Frequency Zenith (Hz)	RMS Frequency Deviation (Hz)	RMS Improvement (%)	Mean EV Mismatch (MW)
No defense	−0.41323	0.40058	0.1139	0	5.6283
DOARL-S	−0.21567	0.21695	0.084052	26.21	3.7476

Table 3. Frequency deviation and EV command mismatch comparison.

Method	Frequency Nadir (Hz)	Frequency Zenith (Hz)	RMS Frequency Deviation (Hz)	RMS Improvement (%)	Mean EV Mismatch (MW)
No defense	−0.36573	0.46115	0.13426	0	6.662
DCS	−0.26498	0.25464	0.09661	28.05	1.0011
OSS	−0.36075	0.27178	0.10844	19.23	2.5171
DOARL-S	−0.22777	0.25411	0.09174	31.67	2.412
RWS	−0.37459	0.3179	0.11898	11.38	5.4774
KTS	−0.46706	0.36035	0.11802	12.10	3.1483


Share and Cite

MDPI and ACS Style

Xiong, X.; Li, S.; Xia, K.; Zheng, H.; Huang, Z.; Zhu, T.; Wang, Z.; Kang, Q. Spatially Heterogeneous Resilient V2G-Enabled Grid Frequency Control via an Adversarially Trained Structural Switching Framework. Symmetry 2026, 18, 843. https://doi.org/10.3390/sym18050843

