Distributed V2G Grid Frequency Regulation Considering EV Owner Participation via Cooperative Integral Reinforcement Learning

Liang, Canhang

doi:10.3390/sym18050824

by

Canhang Liang

School of Electric Power Engineering, South China University of Technology, Guangzhou 510641, China

Symmetry 2026, 18(5), 824; https://doi.org/10.3390/sym18050824 (registering DOI)

Submission received: 25 March 2026 / Revised: 20 April 2026 / Accepted: 29 April 2026 / Published: 11 May 2026

(This article belongs to the Section Engineering and Materials)

Abstract

With the increasing penetration of renewable energy, power systems are facing stronger frequency fluctuations, which make fast and flexible frequency support increasingly important. Although vehicle-to-grid (V2G) technology provides a promising source of distributed regulation capacity, many existing studies do not explicitly consider EV owners’ participation, which may lead to a mismatch between theoretical regulation potential and practically available V2G support. To address this issue, this paper proposes a distributed Grid–Aggregator–EV frequency-regulation (FR) framework that incorporates EV participation factor into the control design. A three-layer architecture and a dynamic participation-aware model are established to describe the coordination of distributed V2G resources, and a Hamiltonian-based robust control law is developed under V2G power constraints. An integral reinforcement learning scheme is then adopted to realize the optimal regulation policy online, where the controller does not require explicit online knowledge of the system drift matrix, while preserving the physical control structure. In this way, the proposed method explicitly links the EV participation factor, dispatchable V2G regulation capacity, and coordinated FR, thereby improving robustness, adaptability, and practical relevance. Simulation studies on the IEEE 14-bus and IEEE 39-bus systems, together with an evening-period, time-varying participation case, demonstrate that the proposed method provides more effective frequency-deviation suppression, better overall regulation performance, and stable operation under dynamic EV participation.

Keywords:

GRID FR; V2G; reinforcement learning

1. Introduction

With the continuous increase in renewable-energy penetration, power systems are facing stronger frequency fluctuations due to the intermittency and uncertainty of wind and solar generation. Although conventional generating units remain the main source of frequency support, their response speed and regulation flexibility are often insufficient under highly dynamic operating conditions. Large-scale battery energy storage can provide fast support, but its deployment cost is still considerable. In this context, vehicle-to-grid (V2G) technology has attracted growing attention because electric vehicles (EVs) can act as distributed and fast-response energy resources for ancillary services, including frequency regulation. Recent reviews have further shown that modern FR/LFC research is evolving from classical control toward intelligent, data-driven, and resilient regulation frameworks for renewable-rich power systems [1,2,3].

Existing FR strategies mainly include classical feedback control and optimization-based methods such as proportional–integral–derivative (PID) control and model predictive control (MPC). These methods have provided useful baselines for improving system frequency quality and coordinating power support from controllable resources [4,5]. However, their effectiveness in large-scale V2G-integrated power systems is often limited by model dependence, online computational burden, and insufficient adaptability to strong nonlinearities and uncertain disturbances. Recent state-of-the-art reviews on load FR have emphasized that these limitations become more pronounced in interconnected and renewable-dominated systems, where modern intelligent and data-driven approaches are increasingly required [1,2].

To enhance adaptability and control performance under uncertainty, a variety of intelligent control methods have been introduced into FR problems, including integral sliding mode control, hierarchical adaptive control, adaptive dynamic programming, and reinforcement learning [6,7,8,9,10]. Among them, reinforcement-learning-based and approximate dynamic programming methods are particularly attractive because they can approximate optimal policies online without explicitly solving the Hamilton–Jacobi–Bellman equation. Recent surveys have also shown that reinforcement learning has become an important direction for both load FR and V2G-oriented scheduling under uncertainty [1,11]. Nevertheless, most existing studies either focus on controller adaptability without explicitly modeling EV participation factor constraints, or consider V2G scheduling without sufficiently addressing robust closed-loop FR in distributed multi-layer architectures. Representative recent studies have further shown that model-free or adaptive-dynamic-programming-based methods can be directly applied to FR problems, for example, in islanded microgrids under uncertain operating conditions, thereby reinforcing the relevance of learning-based regulation strategies in frequency-control applications [12].

In parallel, V2G has been widely recognized as a promising resource for ancillary services because EV batteries provide rapid bidirectional power support. Recent review studies have further emphasized that V2G technology can serve as an ancillary-services provider for renewable-rich power systems, offering services such as FR, voltage support, spinning reserve, and peak-load support, which strengthens the practical motivation for participation-aware V2G FR [13]. However, practical V2G participation is fundamentally constrained by user-side factors, including travel demand, charging flexibility, battery degradation concerns, and economic incentives. Recent studies have shown that users’ willingness to participate in V2G is strongly influenced by financial returns, perceived loss of flexibility, and battery-health concerns [14,15]. In addition, collaborative V2G market studies have highlighted that heterogeneous driving schedules and incentive design significantly affect the dispatchable V2G capacity available for FR [16]. Therefore, treating EV participation factor as a fixed or preset parameter may lead to a mismatch between theoretical regulation capacity and practically available V2G support.

From a control-theoretic perspective, Hamiltonian- and energy-based formulations provide a rigorous framework for deriving optimal control laws in constrained power and microgrid systems [17,18]. Meanwhile, recent deep-reinforcement-learning-based V2G studies have demonstrated the potential of learning-based strategies for handling operational uncertainty and dynamic charging/discharging decisions [19]. These developments indicate that combining Hamiltonian-based optimality with online learning is a promising direction for participation-aware V2G FR. However, the integration of EV-owner participation dynamics, distributed Grid–Aggregator–EV coordination, and online optimal robust FR remains insufficiently explored in the current literature.

Despite the above progress, several research gaps remain. First, most existing V2G frequency-regulation studies still simplify EV participation factor as a fixed or exogenous parameter, without dynamically characterizing the impacts of user willingness and practical availability on dispatchable regulation capacity [14,15,16]. Second, current FR strategies rarely unify participation-aware V2G capacity constraints, conventional-generator coordination, and disturbance-rejection requirements within a single optimal control framework [1,2]. Third, although reinforcement-learning-based methods improve adaptability, the combination of Hamiltonian-based optimality analysis and online critic learning for distributed V2G FR is still limited [11,16,19].

To address the aforementioned challenges, this paper proposes a participation-aware V2G-based FR method that explicitly incorporates EV owners’ participation factor into the control design. First, a power-grid FR model incorporating the EV participation factor is established to characterize the constrained relationship between the participation factor and dispatchable V2G regulation capacity. Second, a Hamiltonian function is constructed to derive the optimal control law. Finally, a critic-neural-network-based online learning algorithm is designed to update the network weights and implement the control policy online. The proposed method ultimately achieves fast and stable frequency regulation while accounting for EV participation and enhancing system robustness. The main contributions of this paper can be summarized in the following two aspects:

A distributed multi-layer V2G FR architecture, including the power-grid side, aggregator side, and EV side, is established to achieve collaborative FR among multiple controllers. Meanwhile, the EV participation factor is quantified in the developed power-grid model to characterize the dispatchable V2G regulation capacity, thereby improving V2G FR performance.

This paper presents a collaborative integral reinforcement learning control scheme that is model-free with respect to the drift dynamics in the online implementation. By simultaneously incorporating V2G FR signals and power grid control signals, the proposed utility function enables optimal collaborative FR across various types of FR signals, which in turn further optimizes the FR performance.

To make the objective of this study more explicit, the research hypothesis of this paper is stated as follows:

H1.

Explicit modeling of EV owners’ participation factor improves the practical dispatchability of V2G resources and leads to better frequency-regulation performance than participation-agnostic control.

H2.

A Hamiltonian-based optimal control framework combined with integral reinforcement learning can provide a robust and adaptive regulation policy for distributed V2G FR under uncertain disturbances.

The subsequent arrangement of this paper is as follows: Section 2 elaborates on the establishment of the power grid FR model and problem description. Section 3 presents the FR controller, learning algorithm, and stability analysis based on EV owners’ V2G participation factor. Section 4 verifies the effectiveness, superiority, and practical applicability of the proposed method through simulation results in single-area, multi-area, and time-varying participation scenarios. Section 5 summarizes the research findings of this paper.

2. Model Construction and Problem Description

2.1. Power System Model

This section establishes the participation-aware dynamic model from the physical Grid–Aggregator–EV architecture and provides the basis for the subsequent control design.

As shown in Figure 1, a three-layer FR architecture for distributed V2G is established, consisting of the power grid side, aggregation side, and EV side. The power grid side serves as the initiation and regulation level for FR demands, leading the overall planning of global FR resources. The aggregation side comprises multiple EV Aggregators (EVAs), which aggregate regional FR resources through electrical and physical connections, and complete cross-regional information interaction and collaborative decision-making via the EV Communication Network (EVCN). The EV side takes Distributed EV Communities (DEVCs) as units; within each community, EVs form an autonomous entity through local communication networks, and while receiving instructions from the aggregation side, they feed back their own FR capabilities and status information. Through bidirectional interaction between electrical-physical connections and data-information connections, the three-layer architecture realizes multi-level FR resource scheduling and collaboration from the power grid to individual EVs, providing architectural support for the efficiency and flexibility of power system FR in distributed V2G scenarios.

Figure 2 shows the structure of the proposed frequency-regulation system with V2G. In this framework, the conventional-generator channel and the aggregated EV channel collaboratively provide regulation support, while EV owners’ participation factor constrains the dispatchable V2G capacity. Therefore, the system explicitly links participation-aware V2G availability with coordinated frequency-control action.

Based on the above physical architecture, the corresponding mathematical model is established as follows. The power-grid layer determines the global frequency-regulation task and provides the system frequency deviation to be suppressed. The aggregator layer collects participation-related information and available regulation capacity from distributed EV communities and converts them into aggregated regulation actions. The EV layer provides practical charging/discharging flexibility subject to the user-side participation factor and operational availability. Accordingly, the system frequency deviation, turbine-governor dynamics, aggregated EV regulation power, integral regulation state, and participation factor are selected as the main state variables of the mathematical model. In this study, the primary controlled variable is the system frequency deviation $Δf$, while the conventional-generator regulation signal $u_{pp}$ and the EV charge/discharge regulation signal $u_{EV}$ are treated as the two manipulated inputs. External load and renewable-energy fluctuations are modeled as lumped disturbances.

For the main controller, the frequency dynamics of the equivalent regulator, steam turbine, and power system, as well as the aggregated V2G power output and EV owners’ participation factor, are described by the following differential equations:

\left\{\begin{array}{l} H_{s} Δ\dot{f} = - D Δf + ΔP_{m} + ΔP_{EV} - ΔP_{L} + ΔP_{R} \\ T_{d} Δ\dot{P}_{m} = - ΔP_{m} + ΔP_{g} \\ T_{g} Δ\dot{P}_{g} = - ΔP_{g} - Δf / R_{d} + u_{pp} \\ T_{E} Δ\dot{P}_{EV} = - ΔP_{EV} + α u_{EV} \\ \dot{U}_{I} = Δf \\ T_{α} \dot{α} = - α + α_{ref} \end{array}\right.

(1)

In Equation (1), $Δf$, $ΔP_{m}$, $ΔP_{EV}$, $ΔP_{g}$, $U_{I}$, and $α$ denote the frequency deviation, the mechanical power deviation of the turbine, the regulation power deviation of the aggregated EVs, the governor position deviation, the integral of the area control error (ACE), and the V2G participation factor of EV owners, respectively. $ΔP_{L}$ and $ΔP_{R}$ denote the disturbance power originating from loads and renewable energy sources, respectively, and $α_{ref}$ is the reference value toward which the participation factor relaxes. Furthermore, $H_{s}$, $D$, $T_{d}$, $T_{g}$, $T_{E}$, $R_{d}$, and $T_{α}$ represent the grid inertia constant, grid damping coefficient, turbine time constant, governor time constant, EV controller time constant, governor droop coefficient, and user-behavior inertia time constant, respectively. Additionally, $u_{pp}$ and $u_{EV}$ denote the primary FR signal from conventional generators and the EV charge/discharge control signal, respectively.

Define the system state vector as:

x (t) = {[Δ f, Δ P_{m}, Δ P_{g}, Δ P_{E V}, U_{I}, α]}^{T}

(2)

Accordingly, the overall system dynamics can be rewritten in the following state-space form:

\dot{x} = A x + B_{1} u_{p p} + B_{2} u_{E V} + E v

(3)

Here, $A$ denotes the system state matrix, $B_{1}$ and $B_{2}$ denote the input matrices associated with the two control channels, $E$ denotes the external disturbance matrix, and $v = [ΔP_{L}, ΔP_{R}]^{T}$ collects the load and renewable-energy disturbances.

In this study, the primary controlled variable is the system frequency deviation $Δf$, while the integral of the area control error and the aggregated EV regulation power are included as auxiliary regulation-related states to improve dynamic frequency-recovery performance. The two manipulated inputs are the conventional-generator control signal $u_{pp}$ and the EV charge/discharge control signal $u_{EV}$.

A = \begin{bmatrix} - \frac{D}{H_{s}} & \frac{1}{H_{s}} & 0 & \frac{1}{H_{s}} & 0 & 0 \\ 0 & - \frac{1}{T_{d}} & \frac{1}{T_{d}} & 0 & 0 & 0 \\ - \frac{1}{R_{d} T_{g}} & 0 & - \frac{1}{T_{g}} & 0 & 0 & 0 \\ 0 & 0 & 0 & - \frac{1}{T_{E}} & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & - \frac{1}{T_{α}} \end{bmatrix},

B_{1} = \begin{bmatrix} 0 \\ 0 \\ \frac{1}{T_{g}} \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad B_{2} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \frac{α}{T_{E}} \\ 0 \\ 0 \end{bmatrix}, \quad E = \begin{bmatrix} - \frac{1}{H_{s}} & \frac{1}{H_{s}} \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}

(4)
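To make the mapping from Equation (1) to the state-space form concrete, the following is a minimal sketch that assembles the matrices row by row from Equation (1), using the state ordering of Equation (2). All numerical parameter values are illustrative assumptions and are not taken from the paper. The function names `build_model`, `x_dot`, and `euler_step` are hypothetical. Because $α$ is itself a state, $B_{2}$ is rebuilt from the current state, and the $α_{ref}$ forcing term of Equation (1) is added separately since it is not part of $E v$.

```python
# Illustrative parameter values (assumptions for this sketch, not from the paper).
H_s, D = 0.1667, 0.0083                       # inertia constant, damping coefficient
T_d, T_g, T_E, T_alpha = 0.3, 0.08, 0.1, 5.0  # turbine, governor, EV, user-behavior lags
R_d = 2.4                                     # governor droop coefficient


def build_model(alpha):
    """Return (A, B1, B2, E) for x = [df, dP_m, dP_g, dP_EV, U_I, alpha]."""
    A = [
        [-D / H_s, 1 / H_s, 0.0, 1 / H_s, 0.0, 0.0],     # frequency deviation
        [0.0, -1 / T_d, 1 / T_d, 0.0, 0.0, 0.0],         # turbine
        [-1 / (R_d * T_g), 0.0, -1 / T_g, 0.0, 0.0, 0.0],  # governor with droop
        [0.0, 0.0, 0.0, -1 / T_E, 0.0, 0.0],             # aggregated EV power
        [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],                  # integral state
        [0.0, 0.0, 0.0, 0.0, 0.0, -1 / T_alpha],         # participation factor
    ]
    B1 = [0.0, 0.0, 1 / T_g, 0.0, 0.0, 0.0]
    B2 = [0.0, 0.0, 0.0, alpha / T_E, 0.0, 0.0]   # EV gain scales with alpha
    E = [[-1 / H_s, 1 / H_s]] + [[0.0, 0.0] for _ in range(5)]
    return A, B1, B2, E


def x_dot(x, u_pp, u_ev, v, alpha_ref):
    """State derivative; v = (dP_L, dP_R) is the disturbance pair."""
    A, B1, B2, E = build_model(alpha=x[5])
    dx = []
    for i in range(6):
        val = sum(A[i][j] * x[j] for j in range(6))
        val += B1[i] * u_pp + B2[i] * u_ev
        val += E[i][0] * v[0] + E[i][1] * v[1]
        dx.append(val)
    dx[5] += alpha_ref / T_alpha   # alpha_ref forcing term from Equation (1)
    return dx


def euler_step(x, u_pp, u_ev, v, alpha_ref, dt):
    """One explicit-Euler integration step of the model."""
    dx = x_dot(x, u_pp, u_ev, v, alpha_ref)
    return [xi + dt * dxi for xi, dxi in zip(x, dx)]
```

Calling `euler_step` repeatedly with a constant load disturbance, for example `v = (0.05, 0.0)`, gives a simple open-loop step response against which a controller can later be compared.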

The communication links shown in Figure 1 introduce finite delays in the transmission of regulation commands and participation-related information between the grid layer, aggregator layer, and EV layer. In this work, these delays are assumed to be bounded and relatively small compared with the dominant frequency-regulation time scale. Therefore, their aggregated effect is incorporated into the regulation framework through the existing actuator/response dynamics of the aggregated EV channel and the participation-update process, rather than being treated as an independent large-delay networked control problem. Under this assumption, the proposed model captures the practical latency effect in a simplified but implementation-oriented manner.

Here, the EV-side time constant $T_{E}$ is used to capture the aggregated response lag of the V2G regulation channel, including the effect of local communication, command execution, and power-response delay at the aggregator–EV interface.

The first-order dynamic equation for EV owners’ V2G participation factor is calibrated based on 320 scenario-based questionnaires from first-tier cities and Monte Carlo simulations of 300,000 EVs in [20], which aligns with the behavioral characteristics of vehicle owners such as SOC thresholds and travel patterns. The V2G participation factor quantifies the willingness of EV owners to participate in grid FR. A higher $α$ indicates a larger energy range available for V2G to support grid FR, whereas a lower $α$ indicates a smaller available energy range, making the system more reliant on conventional generators.
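As a small illustration of the participation dynamics in Equation (1), $T_{α} \dot{α} = - α + α_{ref}$, the sketch below evaluates the closed-form first-order response for a constant $α_{ref}$ and cross-checks it against explicit Euler integration. The numerical values and the function names are assumptions chosen only for illustration.

```python
import math


def alpha_response(t, alpha0, alpha_ref, T_alpha):
    """Closed-form solution of T_alpha * d(alpha)/dt = -alpha + alpha_ref."""
    return alpha_ref + (alpha0 - alpha_ref) * math.exp(-t / T_alpha)


def alpha_euler(t_end, alpha0, alpha_ref, T_alpha, dt=1e-3):
    """Explicit-Euler integration of the same first-order lag."""
    alpha = alpha0
    for _ in range(int(round(t_end / dt))):
        alpha += dt * (alpha_ref - alpha) / T_alpha
    return alpha
```

After one time constant, the gap between $α$ and $α_{ref}$ has shrunk to about 37% of its initial value, so a larger $T_{α}$ corresponds to slower changes in owner participation.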

Therefore, the physical frequency-regulation problem is transformed into a participation-aware state-space control problem with dual control channels, namely conventional generator regulation and aggregated EV regulation.

2.2. Problem Description

With the growing integration of renewable energy sources into power grids, frequency variations resulting from their intermittent characteristics pose a significant threat to grid stability, and in extreme cases, may trigger the collapse of the power system. Traditional solutions rely on energy storage devices such as batteries for reverse power supply, but large-scale deployment entails substantial costs. Meanwhile, with the development of new energy technologies, an increasing number of EVs have entered daily life. Thanks to the rapid advancement of V2G technology, EVs are regarded as distributed energy storage devices. Beyond meeting travel needs, EV batteries can fulfill energy storage functions through V2G technology. Moreover, since the charging and discharging process of EVs involves electromagnetic and chemical reactions rather than mechanical processes, EVs respond faster than power plants during FR. In summary, compared with traditional energy storage devices, EV batteries offer the advantages of low cost, convenient use, and fast response speed. However, this framework faces three core challenges. First, EVs are primarily used to satisfy travel demand, and frequent charging and discharging may accelerate battery degradation, leading to low and uncertain EV participation. Therefore, improving and characterizing the V2G participation factor becomes a key issue. Second, EVs are widely distributed, which makes centralized power dispatch difficult; thus, the coordinated scheduling of large-scale distributed resources must be addressed. Third, multiple control strategies coexist in the power grid, and their power allocation must be coordinated to achieve efficient FR.

3. Collaborative FR Based on Integral Reinforcement Learning

The overall methodological workflow of the proposed participation-aware V2G frequency-regulation framework is summarized in Figure 3.

The proposed methodology is organized into four main stages. First, a participation-aware dynamic model is established from the physical Grid–Aggregator–EV architecture, in which the EV participation factor is incorporated into the state-space representation of FR. Second, a constrained optimal control problem is formulated so that the regulation objective remains consistent with frequency-quality requirements and practical V2G availability. Third, a Hamiltonian-based regulation law is derived and implemented online through critic-network-based integral reinforcement learning. Fourth, the resulting closed-loop properties and the research hypotheses are examined through stability analysis and simulation-based validation in both single-area and multi-area test systems.

3.1. Optimal Control Objective

This subsection formulates the constrained optimal control problem so that FR remains consistent with practical V2G availability and participation-aware resource limits.

To support Hypotheses H1 and H2, the control objective is formulated to jointly penalize frequency deviation, control effort, and the limitation of dispatchable V2G capacity induced by the EV participation factor.

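The available V2G FR capacity imposes a saturation on each control signal $u_{i}$, which is defined as follows:

u_{i} = \mathrm{sat}(\hat{u}_{i}, U_{m}) = \begin{cases} \min(\hat{u}_{i}, U_{m}^{+}), & \hat{u}_{i} \geq 0 \\ \max(\hat{u}_{i}, U_{m}^{-}), & \hat{u}_{i} < 0 \end{cases}

(5)

where $\hat{u}_{i}$ is the unsaturated command and $U_{m}^{+}$ and $U_{m}^{-}$ are the upper and lower limits of the available capacity.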
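The saturation in Equation (5) can be sketched in a few lines. The function name `sat` and the argument names are hypothetical, and the limits are assumed to satisfy $U_{m}^{-} \leq 0 \leq U_{m}^{+}$.

```python
def sat(u_hat, u_max_pos, u_max_neg):
    """Clip the unsaturated command u_hat to [u_max_neg, u_max_pos].

    Assumes u_max_neg <= 0 <= u_max_pos, matching Equation (5).
    """
    if u_hat >= 0:
        return min(u_hat, u_max_pos)   # discharging side limited by U_m^+
    return max(u_hat, u_max_neg)       # charging side limited by U_m^-
```

For example, with limits of 0.3 and -0.2, a command of 0.5 is clipped to 0.3 and a command of -0.5 is clipped to -0.2, while a command of 0.1 passes through unchanged.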

The objective function is designed to reflect the physical regulation task shown in Figure 1. Specifically, it penalizes frequency-related state deviations to ensure frequency quality at the grid layer, penalizes control effort to avoid excessive regulation burden on conventional generators and EV resources, and incorporates the limitation of dispatchable V2G capacity so that the optimization remains consistent with the practical participation-aware availability provided by the aggregator and EV layers.

The optimal control minimizes the cost function at the saddle point, as shown in Equation (6).

J (x) = \int_{t}^{\infty} r (x (τ), u_{p p} (τ), u_{E V} (τ)) d τ

(6)

Here, $r(x(τ), u_{pp}(τ), u_{EV}(τ))$ is the utility function, defined as:

r (x, u_{p p}, u_{E V}) = x^{T} Q x + u_{p p}^{T} R_{u} u_{p p} + σ (u_{E V})

(7)

Here, $Q$ and $R_{u}$ are given symmetric positive definite matrices, and $σ(u_{EV}) \in ℝ$ is a positive semi-definite function that encodes the available V2G FR capacity $U_{m}$ as a constraint on $u_{EV}$:

σ (u_{E V}) = 2 U_{m} \int_{0}^{u_{E V}} ζ^{- 1} (ξ / U_{m}) d ξ

(8)

Here, $ζ(\cdot)$ is a bounded, monotonic, odd function with $ζ(0) = 0$, such as the hyperbolic tangent function.
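For the specific choice $ζ = \tanh$, the integral in Equation (8) has the closed form $σ(u_{EV}) = 2 U_{m} u_{EV} \tanh^{-1}(u_{EV} / U_{m}) + U_{m}^{2} \ln(1 - u_{EV}^{2} / U_{m}^{2})$ for $|u_{EV}| < U_{m}$. The sketch below, with hypothetical function names and illustrative values, evaluates this expression and checks it against a midpoint-rule quadrature of Equation (8).

```python
import math


def sigma_closed_form(u, U_m):
    """Closed form of Equation (8) for zeta = tanh, valid for |u| < U_m."""
    if abs(u) >= U_m:
        raise ValueError("closed form requires |u| < U_m")
    s = u / U_m
    return 2.0 * U_m * u * math.atanh(s) + U_m ** 2 * math.log(1.0 - s * s)


def sigma_quadrature(u, U_m, n=20000):
    """Midpoint-rule approximation of 2*U_m * integral_0^u atanh(xi/U_m) d(xi)."""
    h = u / n
    total = 0.0
    for k in range(n):
        xi = (k + 0.5) * h
        total += math.atanh(xi / U_m)
    return 2.0 * U_m * total * h
```

The penalty is zero at $u_{EV} = 0$, is even in $u_{EV}$, and grows steeply as $|u_{EV}|$ approaches $U_{m}$, which is how the cost discourages commands near the capacity limit.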

The Hamiltonian function is constructed as follows:

H (x, u_{p p}, u_{E V}, J_{x}) = J_{x}^{T} \dot{x} + r

(9)

Here, $J_{x} = \partial J / \partial x$. Let $J^{*}$ denote the optimal value function; then

0 = \min_{u \in U} \{H (x, u_{p p}, u_{E V}, J_{x}^{*})\}

(10)

u^{*} = \arg \min_{u \in U} \{H (x, u_{p p}, u_{E V}, J_{x}^{*})\}

(11)

Taking the partial derivative of the Hamiltonian $H$ with respect to $u_{pp}$ and setting it to zero gives:

\frac{\partial H}{\partial u_{p p}} = B_{1}^{T} J_{x} + 2 R_{u} u_{p p} = 0 \Rightarrow u_{p p}^{*} = - \frac{1}{2} R_{u}^{- 1} B_{1}^{T} J_{x}^{*}

(12)

Similarly, taking the partial derivative of the Hamiltonian $H$ with respect to $u_{EV}$ and setting it to zero yields:

u_{E V}^{*} = - U_{m} \tanh (\frac{1}{2 U_{m}} B_{2}^{T} J_{x}^{*})

(13)
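The intermediate steps behind Equation (13) can be written out as follows. This is a sketch under the assumption $ζ = \tanh$, using the derivative of $σ$ from Equation (8), the state equation (3), and the oddness of $ζ$:

```latex
\begin{aligned}
\frac{\partial H}{\partial u_{EV}}
  &= B_{2}^{T} J_{x} + \frac{d\sigma}{d u_{EV}}
   = B_{2}^{T} J_{x} + 2 U_{m}\,\zeta^{-1}\!\left(\frac{u_{EV}}{U_{m}}\right) = 0 \\
\Rightarrow\quad \zeta^{-1}\!\left(\frac{u_{EV}^{*}}{U_{m}}\right)
  &= -\frac{1}{2 U_{m}} B_{2}^{T} J_{x}^{*} \\
\Rightarrow\quad u_{EV}^{*}
  &= U_{m}\,\zeta\!\left(-\frac{1}{2 U_{m}} B_{2}^{T} J_{x}^{*}\right)
   = -U_{m} \tanh\!\left(\frac{1}{2 U_{m}} B_{2}^{T} J_{x}^{*}\right)
\end{aligned}
```

Because $|\tanh(\cdot)| < 1$, the resulting command always satisfies $|u_{EV}^{*}| < U_{m}$, so the capacity limit is respected by construction.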

Therefore, the optimal control follows from the stationarity conditions of Equation (10), where the superscript * denotes the optimal solution.

The optimal control laws are:

u_{p p}^{*} = - \frac{1}{2} R_{u}^{- 1} B_{1}^{T} J_{x}^{*}

(14)

u_{E V}^{*} = - U_{m} \tanh (\frac{1}{2 U_{m}} B_{2}^{T} J_{x}^{*})

(15)

Following the definition in [21], “model-free” means that the online learning and control implementation do not require explicit knowledge of the system dynamics. In this paper, $A$ denotes the drift dynamics matrix of the participation-aware frequency-regulation system, while $B_{1}$ and $B_{2}$ denote the input-channel matrices. Since the final IRL-based control laws do not explicitly depend on $A$, the proposed controller can be regarded as model-free with respect to the drift dynamics.

Based on the above state-space model and constrained regulation objective, a Hamiltonian function is constructed to derive the optimal regulation law. The main contribution here is that the practical V2G participation limitation is embedded into the optimal control formulation, rather than treated as an exogenous fixed coefficient. This enables the resulting controller to better reflect the physically available V2G support in real operation.

3.2. V2G Optimal FR and Online Learning Algorithm

This part implements the Hamiltonian-based control law online through critic-network-based integral reinforcement learning.

Since the optimal value function cannot be solved analytically, the Hamiltonian-based control law is implemented online through a critic neural network within an integral reinforcement learning framework. In the proposed implementation, the online policy update is carried out without explicit online dependence on the system drift matrix $A$. Therefore, the learning module serves not only as a computational realization of the optimal control design, but also as the key mechanism that enables a model-free online implementation with respect to the drift dynamics.

Since the bounded communication-induced latency is reflected in the aggregated EV-channel dynamics, the optimization and online learning process are carried out on the participation-aware delayed-response system rather than on an ideal instantaneous-response V2G model.

The calculation of $u_{pp}^{*}$ and $u_{EV}^{*}$ depends on $J_{x}^{*}$, as shown in Equations (14) and (15). However, $J_{x}^{*}$ cannot be computed explicitly because it depends on future values of the utility function. Accordingly, an adaptive critic scheme is employed to solve for $u_{pp}^{*}$ and $u_{EV}^{*}$, in which one neural network approximates $J^{*}$ and, through its gradient, $J_{x}^{*}$ [10]. With the ideal weights $W_{c}$, the critic neural network can represent $J^{*}$ as follows:

J^{*} (x) = W_{c}^{T} φ_{c} (x) + ε_{c} (x)

(16)

Here, $W_{c} \in ℝ^{N}$ is the ideal weight vector and $N$ denotes the number of neurons in the hidden layer. The system state $x$ serves as the input to the critic neural network, $φ_{c}(x) \in ℝ^{N}$ denotes the activation-function vector, and $ε_{c}(x)$ is the bounded neural-network approximation error. By the universal approximation property, this error can be made arbitrarily small on a compact set by using sufficiently many neurons. $J_{x}^{*}$ then follows from Equation (16):

J_{x}^{*} (x) = \nabla φ_{c}^{T} W_{c} + \nabla ε_{c}

(17)

Here, $\nabla ε_{c} = \partial ε_{c} / \partial x$, and $\nabla φ_{c} = \partial φ_{c} / \partial x \in ℝ^{N \times n}$ denotes the gradient of the activation-function vector with respect to the state vector $x \in ℝ^{n}$. Through Equations (16) and (17), the value function is parameterized by a critic neural network that is required to satisfy the HJB equation. Consequently, an optimized robust controller can be obtained from the critic neural network by learning $W_{c}$. Because the ideal weight $W_{c}$ is unknown, an estimate $\hat{W}_{c}$ is learned online, and the value function is approximated as follows:

\hat{J} (x) = {\hat{W}}_{c}^{T} φ_{c}

(18)

During online learning, $u_{pp} = \hat{u}_{pp} + ε_{1}$ and $u_{EV} = \hat{u}_{EV} + ε_{2}$ are deployed as FR signals, where $ε_{1}$ and $ε_{2}$ are two small random noises for exploration. Even though the approximated worst-case disturbance $\hat{v}$ may not equal the actual disturbance $v$, $\hat{v}$ is used in online learning; thus, $\hat{u}_{pp}$ and $\hat{u}_{EV}$ are optimized under the worst-case disturbance, which enhances the robustness of V2G-based FR.

Accordingly, the gradient $J_{x}^{*}$ is estimated by $\hat{J}_{x}$ as follows:

{\hat{J}}_{x} (x) = \nabla φ_{c}^{T} {\hat{W}}_{c}

(19)

Substituting Equation (19) into Equations (14) and (15), $u_{pp}^{*}$ and $u_{EV}^{*}$ can be approximated as follows:

{\hat{u}}_{p p}^{*} = - \frac{1}{2} R_{u}^{- 1} B_{1}^{T} \nabla φ_{c}^{T} {\hat{W}}_{c}

(20)

{\hat{u}}_{E V}^{*} = - U_{m} \tanh (\frac{1}{2 U_{m}} B_{2}^{T} \nabla φ_{c}^{T} {\hat{W}}_{c})

(21)
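The approximate control laws in Equations (20) and (21) can be sketched as below. The quadratic activation basis $φ_{c}(x) = [x_{i} x_{j}]_{i \leq j}$, the scalar weight `R_u`, and all function names are assumptions made for this illustration; the paper does not specify the basis in this section.

```python
import math


def grad_phi_quadratic(x):
    """Jacobian (N x n) of the assumed basis phi_c(x) = [x_i * x_j for i <= j]."""
    n = len(x)
    rows = []
    for i in range(n):
        for j in range(i, n):
            row = [0.0] * n
            row[i] += x[j]   # d(x_i x_j)/dx_i
            row[j] += x[i]   # d(x_i x_j)/dx_j (gives 2*x_i when i == j)
            rows.append(row)
    return rows


def value_gradient(x, W_hat):
    """Estimated gradient J_hat_x = grad_phi^T W_hat, as in Equation (19)."""
    G = grad_phi_quadratic(x)
    return [sum(G[k][i] * W_hat[k] for k in range(len(W_hat)))
            for i in range(len(x))]


def controls(x, W_hat, B1, B2, R_u, U_m):
    """Approximate controls of Equations (20) and (21) for scalar inputs."""
    Jx = value_gradient(x, W_hat)
    b1_Jx = sum(b * g for b, g in zip(B1, Jx))
    b2_Jx = sum(b * g for b, g in zip(B2, Jx))
    u_pp = -0.5 * b1_Jx / R_u                      # Equation (20)
    u_ev = -U_m * math.tanh(b2_Jx / (2.0 * U_m))   # Equation (21)
    return u_pp, u_ev
```

Neither expression uses the drift matrix $A$; only the input vectors $B_{1}$ and $B_{2}$ and the critic weights enter, and the `tanh` keeps the EV command strictly inside the capacity limit.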

The HJB residual error is defined as follows:

e_{H} = \hat{J} + H (x, {\hat{u}}_{p p}, {\hat{u}}_{E V}, {\hat{J}}_{x})

(22)

The objective function $E_{c} = e_{H}^{T} e_{H} / 2$ is introduced to quantify the distance to optimality. The learning algorithm performs normalized gradient descent on $E_{c}$ with respect to $\hat{W}_{c}$, which gives the following learning dynamics for $\hat{W}_{c}$:

\dot{\hat{W}}_{c} = - \frac{η ρ}{{(1 + ρ^{T} ρ)}^{2}} e_{H}

(23)

Here, $ρ = \nabla φ_{c} (A x + B_{1} \hat{u}_{pp} + B_{2} \hat{u}_{EV} + E v)$ is the regressor vector, $η > 0$ is the learning rate, and $(1 + ρ^{T} ρ)^{-2}$ is the normalization term. Once $\hat{W}_{c}$ converges, it yields the optimized V2G FR controller given by Equations (20) and (21). Algorithm 1 summarizes the optimal FR and online learning procedure for vehicle-grid interaction.

Algorithm 1 V2G Optimal FR and Online Learning Algorithm

Initialize the critic neural network weight $\hat{W}_{c}$ and the learning rate $η$
for each sampling time $t$ do
    Obtain the state $x$ from SCADA and the available V2G FR capacity $U_{m}$ from the EV aggregator;
    Calculate $\hat{J}_{x}$ by Equation (19) according to $x$;
    Generate the optimal robust control outputs $\hat{u}_{pp}$ and $\hat{u}_{EV}$ by Equations (20) and (21), respectively;
end for
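The critic-weight update that accompanies Algorithm 1 can be sketched as a discrete-time Euler step of the learning dynamics in Equation (23). As a simplifying assumption of this sketch, the residual is taken as the Hamiltonian evaluated at the current estimate, $e_{H} = \hat{W}_{c}^{T} ρ + r$, rather than as a transcription of Equation (22). The function name `critic_step` and the step size `dt` are hypothetical.

```python
def critic_step(W_hat, rho, r, eta, dt):
    """One Euler step of the normalized gradient-descent critic update.

    W_hat: current weight estimate (list of floats)
    rho:   regressor vector, grad_phi_c(x) times the state derivative
    r:     instantaneous utility, Equation (7)
    Returns the updated weights and the residual before the update.
    """
    e_H = sum(w * p for w, p in zip(W_hat, rho)) + r   # assumed residual form
    norm = (1.0 + sum(p * p for p in rho)) ** 2        # normalization term
    W_new = [w - dt * eta * p / norm * e_H for w, p in zip(W_hat, rho)]
    return W_new, e_H
```

With a fixed regressor and utility, which is an artificial setting used only to show the mechanics, each step multiplies the residual by $1 - η\,dt\,ρ^{T}ρ / (1 + ρ^{T}ρ)^{2}$, so the residual decays geometrically for small enough $η\,dt$. In closed-loop operation the regressor changes at every sample, and convergence additionally relies on sufficient excitation, which is the role of the exploration noise.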

3.3. Stability Analysis

The purpose of this subsection is to establish the boundedness of the closed-loop state trajectory and critic-learning error under the conditions required by the Lyapunov-based analysis, rather than to claim strict asymptotic stability under arbitrary operating conditions.

This subsection analyzes the stability of the proposed V2G FR system based on the standard Lyapunov extension theorem. As is common in neural-network-based approximate optimal control and integral reinforcement learning analysis, several boundedness assumptions are introduced to facilitate the Lyapunov-based derivation of the closed-loop stability properties. These assumptions are used to establish the boundedness of the closed-loop states and critic-learning dynamics under the considered operating conditions. To mathematically derive the corresponding stability result, the following assumptions are made:

Assumption 1.

The ideal critic-network weight vector $W_{c}$ is bounded, i.e., $‖W_{c}‖ \leq W_{M}$, where $W_{M}$ is a positive constant. This assumption is standard in critic-network-based approximation analysis and reflects the existence of a compact operating region in which the ideal value-function representation is well defined. This condition is required to ensure that the critic-network representation of the value function remains well posed in the compact operating region considered in the Lyapunov analysis.

Assumption 2.

The neural-network approximation error $ε_{c}(x)$ is bounded, i.e., $|ε_{c}(x)| \leq ε_{M}$, where $ε_{M}$ is a positive constant. Since the state trajectory is considered within a compact operating region and the critic neural network uses continuous activation functions, this boundedness assumption is consistent with the universal approximation property. This condition is necessary to bound the residual terms introduced by neural-network approximation in the derivative of the Lyapunov function.

Assumption 3.

The gradient of the activation-function vector is bounded, i.e., $‖\nabla φ_{c} (x)‖ \leq φ_{M}$, where $φ_{M}$ is a positive constant. The boundedness of $\nabla φ_{c} (x)$ follows from the use of smooth bounded activation functions over the compact operating domain considered in this work. This condition guarantees that the gradient-related terms appearing in the critic-learning dynamics remain bounded and can be handled in the Lyapunov derivative estimate.

Under the above assumptions, the following Lyapunov function is constructed to establish the boundedness of the closed-loop system and critic-learning dynamics:

L (t) = J^{*} (x (t)) + \frac{1}{2} {\tilde{W}}_{c}^{T} (t) {\tilde{W}}_{c} (t)

(24)

where $L_{1} = J^{*} (x (t))$ is the optimal value function of the system, $L_{2} = \frac{1}{2} \tilde{W}_{c}^{T} (t) \tilde{W}_{c} (t)$ is the weight-error term of the critic neural network, and $\tilde{W}_{c} = W_{c} - \hat{W}_{c}$ is the critic weight estimation error.

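Taking the time derivatives of $L_{1}$ and $L_{2}$ along the closed-loop trajectories yields:

{\dot{L}}_{1} = J_{x}^{* T} \dot{x}

(25)

{\dot{L}}_{2} = {\tilde{W}}_{c}^{T} {\dot{\tilde{W}}}_{c} = - {\tilde{W}}_{c}^{T} {\dot{\hat{W}}}_{c}

(26)

where the second equality in (26) holds because the ideal weight vector is constant.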

These derivatives can be bounded as:

{\dot{L}}_{1} \leq - λ_{\min} (Q) {‖x‖}^{2} + C_{0}

(27)

{\dot{L}}_{2} \leq - (η - 0.5) λ_{\min} (ϕ_{1} ϕ_{1}^{T}) {‖{\tilde{W}}_{c}‖}^{2} + \frac{1}{2} η^{2} b_{e}^{2}

(28)

where $λ_{\min} (Q)$ is the smallest eigenvalue of the positive definite matrix $Q$, and $C_{0}$ is a positive constant.

It should be emphasized that the present analysis establishes uniform ultimate boundedness of the closed-loop states and critic weight estimation error under the stated assumptions, rather than asymptotic convergence in the strict sense. For the considered EV-integrated multi-area power system, under the proposed control laws, critic learning dynamics, and the stated boundedness assumptions, the closed-loop state $x$ and the critic weight estimation error $\tilde{W}_{c}$ are uniformly ultimately bounded, with ultimate bounds

B_{x} = \sqrt{\frac{C}{a}}

(29)

B_{W_{c}} = \sqrt{\frac{C}{b}}

(30)

where $a = λ_{\min} (Q)$, $b = (η - 0.5) λ_{\min} (ϕ_{1} ϕ_{1}^{T})$, and $C = C_{0} + \frac{1}{2} η^{2} b_{e}^{2}$.
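The step from the derivative bounds (27) and (28) to the ultimate bounds can be written out as a short derivation. It uses only the symbols $a$, $b$, and $C$ defined in the text and the standard uniform-ultimate-boundedness argument; it is a sketch of the reasoning rather than a reproduction of the author's full proof.

```latex
\begin{aligned}
\dot{L} &= \dot{L}_{1} + \dot{L}_{2} \\
        &\leq -\lambda_{\min}(Q)\,\|x\|^{2} + C_{0}
              - (\eta - 0.5)\,\lambda_{\min}(\phi_{1}\phi_{1}^{T})\,\|\tilde{W}_{c}\|^{2}
              + \tfrac{1}{2}\eta^{2} b_{e}^{2} \\
        &= -a\,\|x\|^{2} - b\,\|\tilde{W}_{c}\|^{2} + C .
\end{aligned}
% With a > 0 and b > 0, \dot{L} < 0 whenever
%   \|x\| > \sqrt{C/a}   or   \|\tilde{W}_{c}\| > \sqrt{C/b},
% so the trajectories enter and remain in the set bounded by these two radii.
```

The argument requires both $a > 0$ and $b > 0$; the latter holds only when the factor $(η - 0.5)$ is positive.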

In practical implementation, these boundedness conditions are supported by several design choices, including bounded activation functions, finite learning rates, bounded operating ranges of frequency deviation and V2G power, and physically constrained EV participation factor dynamics. Therefore, the assumptions used in the analysis are not arbitrary, but are consistent with the practical operating limits of the considered power-system FR problem.

A precise analytical characterization of the closed-loop stability region for large-scale multi-area V2G systems with stronger inter-area coupling is beyond the scope of the current study and will be investigated in future work. In this paper, the theoretical analysis mainly provides boundedness guarantees for the aggregated closed-loop system, while the effectiveness in multi-area scenarios is further supported by simulation studies on the IEEE 39-bus system.

It should be noted that the present work considers bounded small communication delays through an aggregated dynamic representation, rather than a full large-delay networked control formulation. The rigorous stability characterization under larger communication delays, packet loss, or asynchronous updates remains an important topic for future research.

4. Simulation and Results

The simulation studies are designed to verify the two research hypotheses from complementary aspects. The IEEE 14-bus case compares the proposed participation-aware controller with the participation-agnostic IRL baseline, mainly to examine whether explicit modeling of EV participation factor improves regulation performance (H1). The IEEE 39-bus case further compares the proposed method with MPC, ADHDP [8], and DDPG under multi-area disturbances, in order to evaluate the robustness and adaptability of the Hamiltonian-IRL framework in more complex scenarios (H2).

In this section, the IEEE 14-bus test system and IEEE 39-bus test system are used to verify the effectiveness and superiority of the proposed power system FR model incorporating EV owners’ participation factor. The parameters of the IEEE 14-bus system and IEEE 39-bus test system are listed in Table 1 and Table 2, respectively. Both systems have a base capacity of 100 MVA. The simulation duration is 180 s, with a sampling period of 0.01 s. All simulations are conducted on an ASUS laptop running Windows 10, equipped with an Intel Core i7-12700H CPU @ 2.30 GHz and 8 GB of RAM, using MATLAB 2018b.

This work performs two comparative case studies to evaluate the proposed control strategy. In the first case, the participation-agnostic IRL baseline is compared with the proposed V2G FR control (which accounts for EV owners’ participation factor) to verify the effectiveness of the latter. In the second case, the proposed V2G FR control is compared with three existing methods, namely model predictive control (MPC), action-dependent heuristic dynamic programming (ADHDP), and deep deterministic policy gradient (DDPG) [22], to highlight the advantages of the proposed scheme.

For the simulation, we assume that each area contains 2000 EVs that can engage in V2G operations, where the charging/discharging power per EV is $\pm 7$ kW [23]. The FR capacity $U_{m i}$ is set to 0.2. Because the simulation time span is short, $U_{m i}$ can be treated as a constant during the entire simulation. The aggregate power disturbance generated by loads and renewable energy sources is illustrated in Figure 4. The disturbance ranges from −15 MW to +15 MW (with a variance of 45 MW), consisting of $\pm 2$ MW from load-related disturbances and $\pm 16$ MW from renewable energy-related disturbances. In this study, load and renewable-power fluctuations are represented as lumped disturbances for controller-performance evaluation. This simplified treatment is intended for benchmark validation rather than detailed stochastic modeling of specific renewable-resource distributions.
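As a worked illustration of the fleet figures quoted above, the following sketch converts the per-EV rating into a nominal aggregate rating in per unit on the 100 MVA base. The function names are hypothetical, and the linear scaling by the participation factor in `dispatchable_capacity_pu` is an assumption made for illustration. The sketch only reproduces the nominal rating from the quoted fleet size and per-EV power; the value of $U_{m i}$ used in the simulations is the one stated in the text.

```python
N_EV = 2000          # EVs per area (from the text)
P_EV_KW = 7.0        # charging/discharging power per EV in kW (from the text)
S_BASE_MVA = 100.0   # system base capacity (from the text)

def nominal_capacity_pu(n_ev, p_ev_kw, s_base_mva):
    """Aggregate nameplate V2G power of the fleet, in per unit of the system base."""
    total_mw = n_ev * p_ev_kw / 1000.0
    return total_mw / s_base_mva

def dispatchable_capacity_pu(alpha, capacity_pu):
    """Assumption: the dispatchable share scales linearly with the participation factor."""
    return alpha * capacity_pu

nominal = nominal_capacity_pu(N_EV, P_EV_KW, S_BASE_MVA)
dispatchable = dispatchable_capacity_pu(0.60, nominal)  # 0.60 is alpha_ref in Table 1
print(f"nominal: {nominal:.3f} pu, dispatchable: {dispatchable:.3f} pu")
```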

4.1. Validation of the Effectiveness of V2G FR Control

This section conducts simulation experiments on the proposed participation-aware V2G FR control in the IEEE 14-bus system, as shown in Figure 5, to validate the effectiveness of the proposed scheme in mitigating FR performance degradation caused by various disturbances.

The critic neural network adopts a three-layer structure, including seven input neurons, eight hidden neurons, and one output neuron. With the learning rate $η$ set to 0.2 and $Q = \mathrm{diag} (10, 4, 4, 1, 4, 8)$, four training rounds of simulations were conducted to construct a V2G FR controller that accounts for EV owners’ participation. During each learning iteration, the controller optimized performance under various disturbances using system data. As shown in Figure 6, the single-region learning error for the IEEE 14-bus system converged to near zero after 90 s in the final training round, demonstrating the convergence of the critic neural network’s learning process.
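The 7-8-1 critic structure described above can be sketched as a small NumPy class. This is an illustrative sketch only: the tanh hidden activation, the fixed random input-to-hidden weights `v`, and the zero initialization of the output weights are assumptions, and the text does not list the seven inputs here, so `z` is treated as a generic input vector.

```python
import numpy as np

class CriticNetwork:
    """Three-layer critic: n_in inputs, n_hidden tanh units, one linear output."""

    def __init__(self, n_in=7, n_hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: input-to-hidden weights are fixed; only w_hat is learned.
        self.v = rng.uniform(-1.0, 1.0, size=(n_hidden, n_in))
        self.w_hat = np.zeros(n_hidden)

    def phi(self, z):
        """Hidden-layer activation vector phi_c(z)."""
        return np.tanh(self.v @ z)

    def value(self, z):
        """Approximate value: w_hat^T phi_c(z)."""
        return float(self.w_hat @ self.phi(z))

    def grad_phi(self, z):
        """Jacobian of phi_c with respect to z: diag(1 - tanh^2) @ v."""
        h = self.phi(z)
        return (1.0 - h ** 2)[:, None] * self.v

critic = CriticNetwork()
z = np.full(7, 0.1)
g = critic.grad_phi(z)  # shape (8, 7)
```

Because the tanh derivative lies in [0, 1], every row of `g` is no longer than the corresponding row of `v`, which is one concrete way a bounded-gradient condition such as Assumption 3 can be met.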

As shown in Figure 7, the blue line represents the grid FR model without considering EV owners’ V2G participation, denoted as IRL. The red line represents the grid FR model incorporating EV owners’ V2G participation, denoted as α-IRL. In IRL, frequency deviation exhibits significant oscillations throughout the timeframe, fluctuating widely between −0.6 Hz and 0.6 Hz. The maximum frequency deviation reaches 0.5985 Hz, exceeding the safety threshold. This demonstrates that the IRL method struggles to effectively suppress frequency fluctuations when addressing frequency stability issues. In contrast, the α-IRL model exhibits significantly reduced frequency deviation fluctuations compared to IRL. Its overall curve is smoother and closer to the steady-state value, with a maximum deviation of 0.1955 Hz—remaining within the safe range. This demonstrates that the approach incorporating EV owners’ V2G participation achieves superior FR.

Table 3 presents FR performance through three metrics: integral of squared error (ISE), integral of absolute error (IAE), and weighted error consistency ultimate bound (UBB). Generally, smaller values of performance metrics indicate better performance. Compared to the traditional IRL-based FR scheme, the proposed FR scheme incorporating V2G participation from EV owners improved these three metrics by 89.1%, 66.8%, and 66.2%, respectively. These results validate the effectiveness of the proposed FR scheme in mitigating FR performance degradation caused by various disturbances.
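For reference, ISE and IAE can be computed from a sampled frequency-deviation trace as in the sketch below, using the 0.01 s sampling period of the simulations. The rectangle-rule integration and the synthetic signal `delta_f` are assumptions for illustration; the definition of UBB is not given in this section, so it is omitted.

```python
import numpy as np

def ise(error, dt):
    """Integral of squared error, approximated by a rectangle-rule sum."""
    return float(np.sum(np.square(error)) * dt)

def iae(error, dt):
    """Integral of absolute error, approximated by a rectangle-rule sum."""
    return float(np.sum(np.abs(error)) * dt)

def improvement_percent(baseline, proposed):
    """Relative reduction of a smaller-is-better metric, in percent."""
    return 100.0 * (baseline - proposed) / baseline

DT = 0.01                    # sampling period in seconds (from the text)
t = np.arange(18000) * DT    # 180 s horizon
# Synthetic decaying oscillation standing in for a recorded frequency deviation.
delta_f = 0.2 * np.exp(-t / 30.0) * np.sin(0.5 * t)

print(ise(delta_f, DT), iae(delta_f, DT))
```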

Thus, the proposed method incorporating EV owner V2G participation demonstrates significant advantages over traditional IRL in suppressing frequency deviation and maintaining system frequency stability, effectively proving its efficacy.

4.2. Comparison of the Proposed V2G FR Control Method with Other Methods

This case study compares the proposed V2G FR control based on EV owner participation factor with model predictive control (MPC), action-dependent heuristic dynamic programming (ADHDP), and deep deterministic policy gradient (DDPG). The comparison is intended to evaluate the relative regulation performance of the proposed approach under various disturbance conditions.

For consistency with the single-area study, the critic neural network used in the IEEE 39-bus multi-area case adopts the same three-layer structure as that used for the IEEE 14-bus system. Specifically, the critic neural network consists of an input layer, a hidden layer, and an output layer, while the input dimension is selected according to the corresponding multi-area state vector.

As shown in Figure 8, during the training process of the IEEE 39-bus system, the learning errors corresponding to Area 1, Area 2, and Area 3 all transition from initial fluctuations to gradual convergence and ultimately stabilize at relatively low error levels, indicating that critic training converges in each area.

As shown in Figure 9, the frequency deviations of MPC, ADHDP, DDPG, and the proposed method all exhibit dynamic variations over time. In subfigures (a–c), the curves corresponding to MPC, ADHDP, and DDPG exhibit significant fluctuations, with frequency deviations oscillating noticeably between −0.4 Hz and 0.4 Hz. These oscillations persist strongly, indicating that these methods demonstrate poor FR performance when encountering disturbances. In contrast, the red curve representing the proposed method exhibits significantly smaller overall fluctuations than the other three approaches. Its frequency deviation stays closer to zero, and the curve is smoother. This demonstrates that under various disturbance scenarios, the proposed method can more effectively suppress fluctuations in frequency deviation and maintain system frequency stability.

Moreover, as shown in Table 4, the proposed α-IRL scheme achieves better frequency-regulation performance than MPC, ADHDP, and DDPG under the considered multi-area simulation scenario.

In summary, the proposed method shows better regulation performance than MPC, ADHDP, and DDPG under the considered disturbance scenario, leading to more effective frequency-stability support.

4.3. Time-Varying Participation Case Study

In Section 4.1 and Section 4.2, the simulation horizon was limited to 180 s. Over such a short time scale, EV owners’ connection status and behavioral willingness are unlikely to change significantly, and therefore the participation factor $α$ can be reasonably approximated as a constant. Over a longer operating horizon, however, this assumption becomes less appropriate because the practically dispatchable V2G capacity is affected by both EV plug-in availability and user-side willingness, both of which vary with travel routine and behavioral preference.

In this subsection, the simulation window is selected as 17:00–20:00, corresponding to a typical weekday evening residential plug-in period. Prior studies have shown that, in residential scenarios, EV arrival times commonly concentrate in this evening period and can often be approximated by a normal distribution with mean values around 17:00–20:00 [24]. Therefore, this time window is adopted to characterize the influence of time-varying participation on dispatchable V2G regulation capacity.

It should be emphasized that this paper does not assume that $α$ itself follows a single fixed probability distribution. Instead, $α (t)$ is modeled as a bounded time-varying factor jointly determined by two behavior-related components, namely plug-in availability and participation willingness. This treatment is more physically meaningful than directly imposing a prescribed distribution on $α$, because practical V2G participation requires both that the EV is connected to the grid and that the owner is willing to provide ancillary support. Recent bottom-up flexibility studies have also shown that EV flexibility is more appropriately characterized using travel and plugging patterns together with heterogeneous user archetypes, rather than by a fixed scalar participation coefficient [25].

Accordingly, the reference participation trajectory is constructed as

α_{r e f} (t) = A (t) W (t)

(31)

where $A (t)$ denotes the plug-in availability factor and $W (t)$ denotes the participation-willingness factor. The availability component represents the time-varying access of EVs to the charging interface during the evening commuting period, while the willingness component represents the user-side preference for participating in V2G and evolves more smoothly. This interpretation is consistent with recent V2G adoption studies showing that participation willingness is strongly influenced by financial incentives, perceived loss of flexibility, and battery-degradation concerns [13].

In this study, $A (t)$ is designed according to the regularity of evening residential arrival and connection behavior, while $W (t)$ is modeled as a slower-varying bounded behavioral factor. The actual participation state $α (t)$ is then generated through a first-order dynamic update law, so that the participation factor evolves gradually rather than changing instantaneously. This setting is consistent with the physical interpretation that aggregated user participation exhibits inertia over time.
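The construction of the reference trajectory and the first-order update can be sketched as follows. This is a hedged illustration: the shapes of `availability` and `willingness` are hypothetical placeholders, the lag form $T_{α} \dot{α} = α_{ref} - α$ with forward-Euler integration is an assumed reading of the "first-order dynamic update law", and only the time constant of 60 s is taken from Table 1.

```python
import numpy as np

T_ALPHA = 60.0  # user behavior inertia time constant in seconds (Table 1)

def availability(hour):
    """Hypothetical plug-in availability A(t): logistic ramp centred at 18:00."""
    return 0.3 + 0.6 / (1.0 + np.exp(-(hour - 18.0) / 0.3))

def willingness(hour):
    """Hypothetical slowly varying, bounded willingness factor W(t)."""
    return 0.7 + 0.05 * np.sin(2.0 * np.pi * (hour - 17.0) / 6.0)

def simulate_alpha(alpha_ref, alpha0, dt, t_alpha=T_ALPHA):
    """Forward-Euler integration of t_alpha * d(alpha)/dt = alpha_ref - alpha."""
    alpha = np.empty_like(alpha_ref)
    a = alpha0
    for k, ref in enumerate(alpha_ref):
        alpha[k] = a
        a = a + dt * (ref - a) / t_alpha
        a = min(max(a, 0.0), 1.0)  # keep the participation factor in [0, 1]
    return alpha

dt = 1.0                                           # step size in seconds
hours = 17.0 + np.arange(3 * 3600) * dt / 3600.0   # 17:00 to 20:00
alpha_ref = availability(hours) * willingness(hours)
alpha = simulate_alpha(alpha_ref, alpha0=alpha_ref[0], dt=dt)
```

With a step of 1 s and a time constant of 60 s, the Euler update is well within its stability range, and `alpha` tracks `alpha_ref` with a lag of roughly one minute.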

To illustrate the effect of time-varying participation, two cases are considered: a constant-participation baseline, in which the participation factor remains fixed throughout the 3 h interval, and a dynamic participation-aware case, in which the participation factor varies with time and the controller updates the EV regulation limit according to the real-time dispatchable V2G capacity.

Figure 10 shows the trajectory of the participation factor $α$. It can be observed that the dynamic participation factor is lower than the constant baseline in the early evening period and then gradually increases as more EVs become available. This result indicates that a fixed participation assumption may either overestimate or underestimate the actual V2G regulation capability depending on the operating time.

Figure 11 presents the corresponding frequency-deviation responses. The results show that the proposed framework remains stable over the entire 3 h interval under time-varying participation. Compared with the constant-$α$ baseline, the dynamic-$α$-aware case exhibits a distinguishable but still stable regulation trajectory. These results indicate that the variation of EV plug-in availability and participation willingness is translated into a time-varying participation factor, which in turn affects the closed-loop regulation process. Therefore, the dynamic-$α$-aware formulation provides a more realistic representation of V2G-supported FR over the evening operating window.

Overall, the above results show that the proposed participation-aware framework can accommodate behavior-driven variation of EV participation factor over a practically meaningful evening operating period while maintaining stable frequency-regulation performance.

5. Discussion

The simulation results support the two research hypotheses from complementary perspectives. In the IEEE 14-bus system, the comparison between the conventional IRL method and the proposed participation-aware α-IRL method shows that explicitly incorporating EV owners’ participation factor significantly improves frequency-deviation suppression and overall regulation performance. This result supports H1 by indicating that participation-aware modeling can better align theoretical V2G regulation capability with practically dispatchable regulation resources. In the IEEE 39-bus multi-area system, the comparisons with MPC, ADHDP, and DDPG show that the proposed method achieves better overall dynamic regulation performance under more complex disturbance conditions. This result supports H2 by demonstrating that the Hamiltonian-based integral reinforcement learning framework provides a robust and adaptive solution for distributed V2G FR under uncertainty. The time-varying participation study further shows that the proposed framework can maintain stable frequency-regulation performance over a practical evening operating window, while more realistically reflecting the effect of changing EV plug-in availability and user willingness on dispatchable V2G capacity.

Compared with MPC, the proposed method avoids repeated online optimization and therefore shows stronger adaptability under uncertain disturbances and changing participation-aware V2G availability. Compared with ADHDP and DDPG, the proposed method remains more closely connected to the physical control objective through the Hamiltonian-based formulation, which improves interpretability and facilitates the incorporation of participation-aware constraints. In addition, the proposed controller is implemented in a model-free online form with respect to the drift dynamics, meaning that explicit online knowledge of the system drift matrix is not required for policy realization. This feature improves practical adaptability when accurate drift dynamics are difficult to identify, while preserving the structured physical meaning of the control design.

From an application perspective, the proposed framework is relevant to future renewable-rich power systems in which V2G resources are expected to provide distributed ancillary support. By explicitly considering EV owners’ participation factor, the method improves the practical realism of dispatchable V2G regulation capacity and reduces the gap between idealized regulation assumptions and actual user-constrained resource availability. In addition, the distributed Grid–Aggregator–EV structure is compatible with hierarchical implementation in practical aggregation-based V2G services, where regulation commands, participation information, and EV response must be coordinated across multiple layers.

Several important directions remain for future study. First, more realistic user-side uncertainty should be incorporated, including dynamic travel schedules, state-of-charge distributions, charging/discharging preferences, and battery-degradation costs. Second, the present work adopts a simplified bounded-delay representation; therefore, larger communication delays, packet loss, asynchronous updates, and more realistic cyber-physical constraints should be explicitly investigated in future networked-control formulations. Third, broader large-scale validation under more practical operating scenarios and richer benchmark comparisons would further strengthen the engineering applicability of the proposed framework.

6. Conclusions

This paper proposes a participation-aware V2G frequency-regulation framework for renewable-rich power systems by integrating the EV participation factor into the control design. A three-layer Grid–Aggregator–EV architecture and a corresponding participation-aware dynamic model were established to describe the coordination of distributed V2G resources in FR. On this basis, a Hamiltonian-based optimal robust regulation law was derived under practical V2G power constraints. To realize the control policy online, an integral reinforcement learning scheme was further developed so that the controller can be implemented without explicit online dependence on the system drift matrix $A$. Therefore, the proposed method preserves a model-free characteristic with respect to the drift dynamics while maintaining the physical structure of the frequency-regulation problem.

The simulation results on the IEEE 14-bus and IEEE 39-bus systems verified the effectiveness of the proposed framework from complementary aspects. In the IEEE 14-bus case, explicitly modeling EV participation factor improved the practical dispatchability of V2G resources and achieved better frequency-regulation performance than the participation-agnostic baseline. In the IEEE 39-bus case, the proposed method further demonstrated stronger robustness and adaptability than the compared methods under more complex multi-area disturbances. Overall, the proposed participation-aware Hamiltonian-IRL framework provides an effective and physically interpretable solution for coordinated V2G-based FR. In addition, the time-varying participation case showed that the proposed framework can accommodate behavior-driven variation of EV participation factor over a practical evening operating period while maintaining stable frequency-regulation performance.

Future work will focus on incorporating more realistic EV operational factors, such as travel schedules, state-of-charge distributions, battery degradation, and larger communication uncertainties, so as to further improve the practical applicability of the proposed method in large-scale V2G frequency-regulation scenarios.

Funding

The author did not receive support from any organization for the submitted work.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the author.

Conflicts of Interest

The author declares no conflicts of interest.

Nomenclature

Nomenclature of major symbols

Symbol	Definition	Remark
Δf	Frequency deviation	State variable
ΔP_m	Mechanical power deviation of the hydro turbine	State variable
ΔP_EV	Control power deviation of EVs in FR	State variable
ΔP_g	Governor position deviation of the hydro turbine	State variable
U_I	Integral of the area control error (ACE)	State variable
α	EV owners’ V2G participation factor	State variable
x(t)	System state vector	x(t) = [Δf, ΔP_m, ΔP_g, ΔP_EV, U_I, α]^T
u_EV	EV charge/discharge control signal	Control input
u_pp	Primary FR signal from conventional generators	Control input
$v$	External disturbance input	Lumped disturbance
ΔP_L	Disturbance power from loads	Disturbance term
ΔP_R	Disturbance power from renewable energy sources	Disturbance term
A	System state matrix	State-space model
B₁	Input matrix associated with one control channel	State-space model
B₂	Input matrix associated with another control channel	State-space model
E	External disturbance matrix	State-space model
H_s	Grid inertia constant	System parameter
D	Grid damping coefficient	System parameter
T_d	Hydro turbine time constant	System parameter
T_g	Governor time constant	System parameter
T_E	EV controller time constant	System parameter
T_α	User behavior inertia time constant	System parameter
R_d	Droop coefficient	System parameter
α_ref	Reference value of EV owners’ V2G participation	System parameter
J	Cost function/performance index	Optimization objective
U(⋅)	Utility function	Used in the cost function
$H$	Hamiltonian function	Optimal control formulation
J*(x)	Optimal value function	Optimal control variable
${\hat{J}}^{*} (x)$	Approximated value function	Critic-network approximation
Q	State weighting matrix in the cost function	Symmetric positive definite
R_u	Control weighting matrix in the cost function	Symmetric positive definite
W	Ideal critic-network weight vector	Neural-network parameter
$\hat{W}$	Estimated critic-network weight vector	Neural-network parameter
$\tilde{W}$	Critic weight estimation error	$\tilde{W} = W - \hat{W}$
$φ_{c} (x)$	Activation-function vector	Critic network
$\nabla φ_{c} (x)$	Gradient of the activation-function vector with respect to the state vector	Neural-network derivative term
$ε_{c} (x)$	Approximation error	Bounded neural-network approximation error
e_H	HJB residual error	Learning error term
$η$	Learning rate	Online learning parameter
$δ$	Regression term/residual-related term	Learning law parameter
N	Number of neurons in the hidden layer	Neural-network structure
$λ_{\min} (\cdot)$	Minimum eigenvalue of a positive definite matrix	Stability analysis
ISE	Integral of squared error	Performance index
IAE	Integral of absolute error	Performance index
UBB	Ultimate boundedness bound/weighted error consistency ultimate bound	Performance index

Abbreviations

The following abbreviations are used in this manuscript:

Abbreviation	Full term
V2G	Vehicle-to-grid
EV	Electric vehicle
FR	Frequency regulation
LFC	Load frequency control
IRL	Integral reinforcement learning
ADP	Adaptive dynamic programming
HJB	Hamilton–Jacobi–Bellman
ACE	Area control error
EVA	Electric vehicle aggregator
DEVC	Distributed EV community
EVCN	EV communication network
MPC	Model predictive control
ADHDP	Action-dependent heuristic dynamic programming
DDPG	Deep deterministic policy gradient

References

1. Muduli, R.; Jena, D.; Moger, T. A survey on load frequency control using reinforcement learning-based data-driven controller. Appl. Soft Comput. 2024, 166, 112203.
2. Gulzar, M.M.; Sibtain, D.; Alqahtani, M.; Alismail, F.; Khalid, M. Load frequency control progress: A comprehensive review on recent development and challenges of modern power systems. Ain Shams Eng. J. 2025, 16, 103168.
3. Wadi, M.; Shobole, A.; Elmasry, W.; Kucuk, I. Load frequency control in smart grids: A review of recent developments. Renew. Sustain. Energy Rev. 2024, 189, 114013.
4. Rahman, M.; Sarker, S.K.; Das, S.K.; Ali, M.F. Model predictive control framework design for frequency regulation of PHEVs participating in interconnected smart grid. In 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE); IEEE: New York, NY, USA, 2019; pp. 1–6.
5. Nguyen, H.T.; Choi, D.H. Distributionally robust model predictive control for smart electric vehicle charging station with V2G/V2V capability. IEEE Trans. Smart Grid 2023, 14, 4621–4634.
6. Sun, J.; Tan, S.; Zheng, H.; Qi, G.; Tan, S.; Peng, D.; Guerrero, J.M. A DoS attack-resilient grid frequency regulation scheme via adaptive V2G capacity-based integral sliding mode control. IEEE Trans. Smart Grid 2023, 14, 3046–3057.
7. Mu, C.; Liu, W.; Xu, W. Hierarchically adaptive frequency control for an EV-integrated smart grid with renewable energy. IEEE Trans. Ind. Inform. 2018, 14, 4254–4265.
8. Kumar, N.; Tyagi, B.; Kumar, V. Approximate dynamic programming based controller design for interconnected AGC scheme. In 2015 IEEE Region 10 Conference (TENCON); IEEE: New York, NY, USA, 2015; pp. 1–6.
9. Mu, C.; Wang, K.; Ma, S.; Chong, Z.; Ni, Z. Adaptive composite frequency control of power systems using reinforcement learning. CAAI Trans. Intell. Technol. 2025, 7, 671–684.
10. Song, R.; Lewis, F.L.; Wei, Q.; Zhang, H. Off-policy actor-critic structure for optimal control of unknown systems with disturbances. IEEE Trans. Cybern. 2016, 46, 1041–1053.
11. Xie, H.; Song, G.; Shi, Z.; Zhang, J.; Lin, Z.; Yu, Q.; Fu, H.; Song, X.; Zhang, H. Reinforcement learning for vehicle-to-grid: A review. Sustain. Futures 2025, 8, 100369.
12. Shi, J.; Peng, C.; Zhang, J.; Xie, X. Model-free frequency regulation in islanded microgrids: An event-triggered adaptive dynamic programming approach. Int. J. Electr. Power Energy Syst. 2024, 155, 109635.
13. Alamgir, S.; Hassan, S.J.U.; Mehdi, A.; Abdelmaksoud, A.; Haider, Z.; Shin, G.-S.; Kim, C.-H. A comprehensive review of vehicle-to-grid (V2G) technology as an ancillary services provider. Results Eng. 2025, 27, 106813.
14. Bakhuis, J.; Barbour, N.; Chappin, É.J.L. Exploring user willingness to adopt vehicle-to-grid (V2G): A statistical analysis of stated intentions. Energy Policy 2025, 203, 114619.
15. Chen, G.; Zhang, Z. Control strategies, economic benefits, and challenges of vehicle-to-grid applications: Recent trends research. World Electr. Veh. J. 2024, 15, 190.
16. Tang, R.; Mak, H.-Y.; Rong, Y. Collaborative vehicle-to-grid operations in frequency regulation markets. Manuf. Serv. Oper. Manag. 2024, 26, 814–833.
17. Avila-Becerril, S.; Espinosa-Pérez, G.; Machado, J.E. A Hamiltonian control approach for electric microgrids with dynamic power flow solution. Automatica 2022, 139, 110192.
18. Tõnso, M.; Kaparin, V.; Belikov, J. Port-Hamiltonian framework in power systems domain. Energy Rep. 2023, 10, 2918–2930.
19. Jang, M.-J.; Oh, E. Deep-reinforcement-learning-based vehicle-to-grid operation strategies for managing solar power generation forecast errors. Sustainability 2024, 16, 3851.
20. Li, T.; Tao, S.; He, K.; Lu, M.; Xie, B.; Yang, B.; Sun, Y. V2G Multi-Objective Dispatching Optimization Strategy Based on User Behavior Model. Front. Energy Res. 2021, 9, 739527.
21. Abouheaf, M.; Gueaieb, W. Model-free adaptive control approach using integral reinforcement learning. In 2019 IEEE International Instrumentation and Measurement Technology Conference (I2MTC); IEEE: New York, NY, USA, 2019; pp. 1–6.
22. Alfaverh, F.; Denaï, M.; Sun, Y. Optimal vehicle-to-grid control for supplementary frequency regulation using deep reinforcement learning. Electr. Power Syst. Res. 2023, 214, 108949.
23. Igarashi, K.; Takami, S.; Hayashi, Y. Development and Analysis of an Integrated Optimization Model for Variable Renewable Energy and Vehicle-to-Grid in Remote Islands: A Case Study of Tanegashima, Japan. Energies 2025, 18, 5933.
24. Xue, L.; Xia, J. Simulator to Quantify and Manage Electric Vehicle Load Impacts on Low-Voltage Distribution Grids; WRI China Technical Note; World Resources Institute: Washington, DC, USA, 2021.
25. Gan, W.; Zhou, Y.; Wu, J. Quantifying grid flexibility provision of virtual vehicle-to-vehicle energy sharing using statistically similar networks. Appl. Energy 2025, 390, 125818.

Figure 1. Three-layer Grid–Aggregator–EV architecture for distributed V2G FR.

Figure 2. Overall structure of the proposed participation-aware V2G frequency-regulation system.

Figure 3. Overall workflow of the proposed participation-aware V2G frequency-regulation methodology.

Figure 4. The summation of the power disturbances from loads and renewable resources.

Figure 5. Single-area IEEE 14-bus test system used for validation of the proposed method.

Figure 6. Convergence of critic-network learning error in the IEEE 14-bus system.

Figure 7. Comparison of frequency deviation responses under IRL and participation-aware α-IRL in the IEEE 14-bus system.

Figure 8. Evolution of critic-network learning errors in the IEEE 39-bus multi-area system: (a) Area 1; (b) Area 2; (c) Area 3.

Figure 9. Multi-area frequency deviation responses under different control methods: (a) Area 1; (b) Area 2; (c) Area 3.

Figure 10. Time-varying participation factor α over 17:00–20:00.

Figure 11. Frequency deviation under constant-α and dynamic-α-aware cases.

Table 1. Parameter configuration for single-area IEEE 14-bus system.

Symbol	Definition	Value
H_s	Grid inertia constant (pu/Hz)	11
R_d	Droop coefficient (Hz/pu)	0.05
D	Grid damping coefficient (pu/Hz)	1.4
T_g	Governor time constant (s)	0.15
T_d	Hydro turbine time constant (s)	0.30
T_E	EV controller time constant (s)	0.02
T_α	User behavior inertia time constant (s)	60
α_ref	Reference value of EV owners’ V2G participation	0.60

Table 2. Parameter configuration for three-area IEEE 39-bus system.

Symbol	Definition	Area 1	Area 2	Area 3
H_s	Grid inertia constant (pu/Hz)	10	10	12
R_d	Droop coefficient (Hz/pu)	0.05	0.05	0.05
D	Grid damping coefficient (pu/Hz)	1.0	1.5	1.8
T_g	Governor time constant (s)	0.10	0.17	0.20
T_d	Hydro turbine time constant (s)	0.30	0.40	0.35
T_E	EV controller time constant (s)	0.02	0.02	0.02
T_α	User behavior inertia time constant (s)	60	70	80
α_ref	Reference value of EV owners’ V2G participation	0.60	0.55	0.65

Table 3. Performance comparison for the IEEE 14-bus system.

Method	ISE	IAE	UBB
IRL	15.402	43.305	0.601
α-IRL	1.678	14.370	0.203
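
The ISE and IAE columns above are conventionally the integral of the squared error and the integral of the absolute error over the simulation horizon. A minimal sketch of how such indices could be computed from a sampled frequency-deviation trace is given below. It assumes these standard definitions and trapezoidal integration; the function names and the synthetic decaying signal are hypothetical and are not taken from the paper, and UBB is not covered.

```python
import math


def trapezoid(t, y):
    """Trapezoidal-rule integral of samples y taken at times t."""
    return sum(0.5 * (y[i] + y[i + 1]) * (t[i + 1] - t[i])
               for i in range(len(t) - 1))


def ise_iae(t, e):
    """Return (ISE, IAE) for error samples e at times t.

    ISE = integral of e(t)^2 dt, IAE = integral of |e(t)| dt
    (standard definitions, assumed here).
    """
    ise = trapezoid(t, [x * x for x in e])
    iae = trapezoid(t, [abs(x) for x in e])
    return ise, iae


# Synthetic frequency deviation (illustrative only, not simulation data):
# a damped oscillation sampled every 10 ms for 20 s.
t = [0.01 * k for k in range(2001)]
e = [0.2 * math.exp(-0.5 * tk) * math.cos(math.pi * tk) for tk in t]

ise, iae = ise_iae(t, e)
print(f"ISE = {ise:.4f}, IAE = {iae:.4f}")
```

Both indices shrink as the deviation is damped faster, and ISE penalizes large excursions more heavily than IAE because the error is squared.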

Table 4. Performance comparison for the IEEE 39-bus system.

Method	ISE	IAE	UBB
MPC	0.863	9.829	0.183
ADHDP	0.974	10.586	0.263
DDPG	1.067	11.124	0.256
α-IRL	0.077	3.064	0.064
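
To make the gaps in Tables 3 and 4 easier to read, the relative reduction of each index can be computed directly from the tabulated values. The sketch below does only that arithmetic; the dictionary layout and the helper name `reduction_pct` are illustrative choices, and the key `alpha-IRL` stands for the α-IRL rows.

```python
# Values copied from Tables 3 and 4: (ISE, IAE, UBB) per method.
ieee14 = {
    "IRL": (15.402, 43.305, 0.601),
    "alpha-IRL": (1.678, 14.370, 0.203),
}
ieee39 = {
    "MPC": (0.863, 9.829, 0.183),
    "ADHDP": (0.974, 10.586, 0.263),
    "DDPG": (1.067, 11.124, 0.256),
    "alpha-IRL": (0.077, 3.064, 0.064),
}
METRICS = ("ISE", "IAE", "UBB")


def reduction_pct(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


def compare(table, proposed="alpha-IRL"):
    """Map each baseline method to its per-metric reduction achieved by `proposed`."""
    ref = table[proposed]
    return {
        name: {m: reduction_pct(b, p) for m, b, p in zip(METRICS, vals, ref)}
        for name, vals in table.items()
        if name != proposed
    }


for label, table in (("IEEE 14-bus", ieee14), ("IEEE 39-bus", ieee39)):
    for method, red in compare(table).items():
        summary = ", ".join(f"{m} -{v:.1f}%" for m, v in red.items())
        print(f"{label}: alpha-IRL vs {method}: {summary}")
```

For example, the ISE in the IEEE 14-bus case drops from 15.402 to 1.678, a reduction of roughly 89%.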


