Article

Distributed V2G Grid Frequency Regulation Considering EV Owner Participation via Cooperative Integral Reinforcement Learning

School of Electric Power Engineering, South China University of Technology, Guangzhou 510641, China
Symmetry 2026, 18(5), 824; https://doi.org/10.3390/sym18050824 (registering DOI)
Submission received: 25 March 2026 / Revised: 20 April 2026 / Accepted: 29 April 2026 / Published: 11 May 2026
(This article belongs to the Section Engineering and Materials)

Abstract

With the increasing penetration of renewable energy, power systems are facing stronger frequency fluctuations, which make fast and flexible frequency support increasingly important. Although vehicle-to-grid (V2G) technology provides a promising source of distributed regulation capacity, many existing studies do not explicitly consider EV owners’ participation, which may lead to a mismatch between theoretical regulation potential and practically available V2G support. To address this issue, this paper proposes a distributed Grid–Aggregator–EV frequency-regulation (FR) framework that incorporates EV participation factor into the control design. A three-layer architecture and a dynamic participation-aware model are established to describe the coordination of distributed V2G resources, and a Hamiltonian-based robust control law is developed under V2G power constraints. An integral reinforcement learning scheme is then adopted to realize the optimal regulation policy online, where the controller does not require explicit online knowledge of the system drift matrix, while preserving the physical control structure. In this way, the proposed method explicitly links the EV participation factor, dispatchable V2G regulation capacity, and coordinated FR, thereby improving robustness, adaptability, and practical relevance. Simulation studies on the IEEE 14-bus and IEEE 39-bus systems, together with an evening-period, time-varying participation case, demonstrate that the proposed method provides more effective frequency-deviation suppression, better overall regulation performance, and stable operation under dynamic EV participation.

1. Introduction

With the continuous increase in renewable-energy penetration, power systems are facing stronger frequency fluctuations due to the intermittency and uncertainty of wind and solar generation. Although conventional generating units remain the main source of frequency support, their response speed and regulation flexibility are often insufficient under highly dynamic operating conditions. Large-scale battery energy storage can provide fast support, but its deployment cost is still considerable. In this context, vehicle-to-grid (V2G) technology has attracted growing attention because electric vehicles (EVs) can act as distributed and fast-response energy resources for ancillary services, including frequency regulation. Recent reviews have further shown that modern FR/LFC research is evolving from classical control toward intelligent, data-driven, and resilient regulation frameworks for renewable-rich power systems [1,2,3].
Existing FR strategies mainly include classical feedback control and optimization-based methods such as proportional–integral–derivative (PID) control and model predictive control (MPC). These methods have provided useful baselines for improving system frequency quality and coordinating power support from controllable resources [4,5]. However, their effectiveness in large-scale V2G-integrated power systems is often limited by model dependence, online computational burden, and insufficient adaptability to strong nonlinearities and uncertain disturbances. Recent state-of-the-art reviews on load FR have emphasized that these limitations become more pronounced in interconnected and renewable-dominated systems, where modern intelligent and data-driven approaches are increasingly required [1,2].
To enhance adaptability and control performance under uncertainty, a variety of intelligent control methods have been introduced into FR problems, including integral sliding mode control, hierarchical adaptive control, adaptive dynamic programming, and reinforcement learning [6,7,8,9,10]. Among them, reinforcement-learning-based and approximate dynamic programming methods are particularly attractive because they can approximate optimal policies online without explicitly solving the Hamilton–Jacobi–Bellman equation. Recent surveys have also shown that reinforcement learning has become an important direction for both load FR and V2G-oriented scheduling under uncertainty [1,11]. Nevertheless, most existing studies either focus on controller adaptability without explicitly modeling EV participation factor constraints, or consider V2G scheduling without sufficiently addressing robust closed-loop FR in distributed multi-layer architectures. Representative recent studies have further shown that model-free or adaptive-dynamic-programming-based methods can be directly applied to FR problems, for example, in islanded microgrids under uncertain operating conditions, thereby reinforcing the relevance of learning-based regulation strategies in frequency-control applications [12].
In parallel, V2G has been widely recognized as a promising resource for ancillary services because EV batteries provide rapid bidirectional power support. Recent review studies have further emphasized that V2G technology can serve as an ancillary-services provider for renewable-rich power systems, offering services such as FR, voltage support, spinning reserve, and peak-load support, which strengthens the practical motivation for participation-aware V2G FR [13]. However, practical V2G participation is fundamentally constrained by user-side factors, including travel demand, charging flexibility, battery degradation concerns, and economic incentives. Recent studies have shown that users’ willingness to participate in V2G is strongly influenced by financial returns, perceived loss of flexibility, and battery-health concerns [14,15]. In addition, collaborative V2G market studies have highlighted that heterogeneous driving schedules and incentive design significantly affect the dispatchable V2G capacity available for FR [16]. Therefore, treating EV participation factor as a fixed or preset parameter may lead to a mismatch between theoretical regulation capacity and practically available V2G support.
From a control-theoretic perspective, Hamiltonian- and energy-based formulations provide a rigorous framework for deriving optimal control laws in constrained power and microgrid systems [17,18]. Meanwhile, recent deep-reinforcement-learning-based V2G studies have demonstrated the potential of learning-based strategies for handling operational uncertainty and dynamic charging/discharging decisions [19]. These developments indicate that combining Hamiltonian-based optimality with online learning is a promising direction for participation-aware V2G FR. However, the integration of EV-owner participation dynamics, distributed Grid–Aggregator–EV coordination, and online optimal robust FR remains insufficiently explored in the current literature.
Despite the above progress, several research gaps remain. First, most existing V2G frequency-regulation studies still simplify the EV participation factor to a fixed or exogenous parameter, without dynamically characterizing the impacts of user willingness and practical availability on dispatchable regulation capacity [14,15,16]. Second, current FR strategies rarely unify participation-aware V2G capacity constraints, conventional-generator coordination, and disturbance-rejection requirements within a single optimal control framework [1,2]. Third, although reinforcement-learning-based methods improve adaptability, work combining Hamiltonian-based optimality analysis with online critic learning for distributed V2G FR is still limited [11,16,19].
To address the aforementioned challenges, this paper proposes a participation-aware V2G-based FR method that explicitly incorporates EV owners’ participation factor into the control design. First, a power-grid FR model incorporating the EV participation factor is established to characterize the constrained relationship between the participation factor and dispatchable V2G regulation capacity. Second, a Hamiltonian function is constructed to derive the optimal control law. Finally, a critic-neural-network-based online learning algorithm is designed to update the network weights and implement the control policy online. The proposed method ultimately achieves fast and stable frequency regulation while accounting for EV participation and enhancing system robustness. The main contributions of this paper can be summarized in the following two aspects:
(1) A distributed multi-layer V2G FR architecture, including the power-grid side, aggregator side, and EV side, is established to achieve collaborative FR among multiple controllers. Meanwhile, the EV participation factor is quantified in the developed power-grid model to characterize the dispatchable V2G regulation capacity, thereby improving V2G FR performance.
(2) A collaborative integral reinforcement learning control scheme is presented that is model-free with respect to the drift dynamics in the online implementation. By simultaneously incorporating V2G FR signals and power-grid control signals, the proposed utility function enables optimal collaborative FR across the different types of FR signals, which in turn further improves the FR performance.
To make the objective of this study more explicit, the research hypothesis of this paper is stated as follows:
H1. Explicit modeling of EV owners’ participation factor improves the practical dispatchability of V2G resources and leads to better frequency-regulation performance than participation-agnostic control.
H2. A Hamiltonian-based optimal control framework combined with integral reinforcement learning can provide a robust and adaptive regulation policy for distributed V2G FR under uncertain disturbances.
The subsequent arrangement of this paper is as follows: Section 2 elaborates on the establishment of the power grid FR model and problem description. Section 3 presents the FR controller, learning algorithm, and stability analysis based on EV owners’ V2G participation factor. Section 4 verifies the effectiveness, superiority, and practical applicability of the proposed method through simulation results in single-area, multi-area, and time-varying participation scenarios. Section 5 summarizes the research findings of this paper.

2. Model Construction and Problem Description

2.1. Power System Model

This section establishes the participation-aware dynamic model from the physical Grid–Aggregator–EV architecture and provides the basis for the subsequent control design.
As shown in Figure 1, a three-layer FR architecture for distributed V2G is established, consisting of the power grid side, aggregation side, and EV side. The power grid side serves as the initiation and regulation level for FR demands, leading the overall planning of global FR resources. The aggregation side comprises multiple EV Aggregators (EVAs), which aggregate regional FR resources through electrical and physical connections, and complete cross-regional information interaction and collaborative decision-making via the EV Communication Network (EVCN). The EV side takes Distributed EV Communities (DEVCs) as units; within each community, EVs form an autonomous entity through local communication networks, and while receiving instructions from the aggregation side, they feed back their own FR capabilities and status information. Through bidirectional interaction between electrical-physical connections and data-information connections, the three-layer architecture realizes multi-level FR resource scheduling and collaboration from the power grid to individual EVs, providing architectural support for the efficiency and flexibility of power system FR in distributed V2G scenarios.
Figure 2 shows the structure of the proposed frequency-regulation system with V2G. In this framework, the conventional-generator channel and the aggregated EV channel collaboratively provide regulation support, while EV owners’ participation factor constrains the dispatchable V2G capacity. Therefore, the system explicitly links participation-aware V2G availability with coordinated frequency-control action.
Based on the above physical architecture, the corresponding mathematical model is established as follows. The power-grid layer determines the global frequency-regulation task and provides the system frequency deviation to be suppressed. The aggregator layer collects participation-related information and available regulation capacity from distributed EV communities and converts them into aggregated regulation actions. The EV layer provides practical charging/discharging flexibility subject to the user-side participation factor and operational availability. Accordingly, the system frequency deviation, turbine-governor dynamics, aggregated EV regulation power, integral regulation state, and participation factor are selected as the main state variables of the mathematical model. In this study, the primary controlled variable is the system frequency deviation $\Delta f$, while the conventional-generator regulation signal $u_{pp}$ and the EV charge/discharge regulation signal $u_{EV}$ are treated as the two manipulated inputs. External load and renewable-energy fluctuations are modeled as lumped disturbances.
For the main controller, the frequency dynamics of the equivalent regulator, steam turbine, and power system, as well as the aggregated V2G power output and EV owners’ participation factor, are described by the following differential equations:
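$$
\begin{aligned}
H_s \Delta\dot{f} &= -D\,\Delta f + \Delta P_m + \Delta P_{EV} - \Delta P_L + \Delta P_R \\
T_d \Delta\dot{P}_m &= -\Delta P_m + \Delta P_g \\
T_g \Delta\dot{P}_g &= -\Delta P_g - \Delta f / R_d + u_{pp} \\
T_E \Delta\dot{P}_{EV} &= -\Delta P_{EV} + \alpha\, u_{EV} \\
\dot{U}_I &= \Delta f \\
T_\alpha \dot{\alpha} &= -\alpha + \alpha_{ref}
\end{aligned} \tag{1}
$$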
In the above equations, $\Delta f$, $\Delta P_m$, $\Delta P_{EV}$, $\Delta P_g$, $U_I$, and $\alpha$ denote the frequency deviation, the mechanical power deviation of the turbine, the control power deviation of the EVs in FR, the governor position deviation, the integral of the area control error (ACE), and the V2G participation factor of EV owners, respectively. $\Delta P_L$ and $\Delta P_R$ denote the disturbance power originating from loads and renewable energy sources, respectively. Furthermore, $H_s$, $D$, $T_d$, $T_g$, $T_E$, $R_d$, and $T_\alpha$ represent the grid inertia constant, grid damping coefficient, turbine time constant, governor time constant, EV controller time constant, governor control input deviation coefficient, and user-behavior inertia time constant, respectively. Additionally, $u_{pp}$ and $u_{EV}$ denote the primary FR signal from conventional generators and the EV charge/discharge control signal, respectively.
Define the system state vector as:
$$
x(t) = \left[\Delta f,\ \Delta P_m,\ \Delta P_g,\ \Delta P_{EV},\ U_I,\ \alpha\right]^T \tag{2}
$$
Accordingly, the overall system dynamics can be rewritten in the following state-space form:
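$$
\dot{x} = A x + B_1 u_{pp} + B_2 u_{EV} + E v \tag{3}
$$
Here, $A$ denotes the system state matrix, $B_1$ and $B_2$ denote the input matrices associated with the two control channels, $E$ denotes the external disturbance matrix, and $v = [\Delta P_L,\ \Delta P_R]^T$ collects the load and renewable-energy disturbances.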
In addition to the primary controlled variable $\Delta f$, the integral of the area control error and the aggregated EV regulation power are included as auxiliary regulation-related states to improve dynamic frequency-recovery performance. The system matrices are given by:
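$$
A = \begin{bmatrix}
-\frac{D}{H_s} & \frac{1}{H_s} & 0 & \frac{1}{H_s} & 0 & 0 \\
0 & -\frac{1}{T_d} & \frac{1}{T_d} & 0 & 0 & 0 \\
-\frac{1}{R_d T_g} & 0 & -\frac{1}{T_g} & 0 & 0 & 0 \\
0 & 0 & 0 & -\frac{1}{T_E} & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & -\frac{1}{T_\alpha}
\end{bmatrix},\quad
B_1 = \begin{bmatrix} 0 \\ 0 \\ \frac{1}{T_g} \\ 0 \\ 0 \\ 0 \end{bmatrix},\quad
B_2 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \frac{\alpha}{T_E} \\ 0 \\ 0 \end{bmatrix},\quad
E = \begin{bmatrix}
-\frac{1}{H_s} & \frac{1}{H_s} \\
0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0
\end{bmatrix} \tag{4}
$$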
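As a rough illustration of how Equations (1)–(4) fit together, the following Python sketch assembles the matrices and evaluates the state derivative for a load step. All numerical parameter values are hypothetical placeholders (they are not the values of Table 1), the function names `build_matrices` and `state_derivative` are illustrative, and the disturbance vector is assumed to be ordered as $[\Delta P_L, \Delta P_R]$. The $\alpha_{ref}$ forcing term of Equation (1) is added separately, since it does not appear in the state-space matrices.

```python
import numpy as np

# Hypothetical per-unit parameters for illustration only (not taken from Table 1).
PARAMS = dict(H_s=10.0, D=1.0, T_d=0.3, T_g=0.1, T_E=0.05, R_d=0.05, T_alpha=60.0)


def build_matrices(p, alpha):
    """Return A, B1, B2, E for x = [df, dP_m, dP_g, dP_EV, U_I, alpha]."""
    A = np.zeros((6, 6))
    A[0, 0] = -p["D"] / p["H_s"]
    A[0, 1] = 1.0 / p["H_s"]
    A[0, 3] = 1.0 / p["H_s"]
    A[1, 1] = -1.0 / p["T_d"]
    A[1, 2] = 1.0 / p["T_d"]
    A[2, 0] = -1.0 / (p["R_d"] * p["T_g"])
    A[2, 2] = -1.0 / p["T_g"]
    A[3, 3] = -1.0 / p["T_E"]
    A[4, 0] = 1.0
    A[5, 5] = -1.0 / p["T_alpha"]
    B1 = np.array([0.0, 0.0, 1.0 / p["T_g"], 0.0, 0.0, 0.0])
    # B2 depends on the current participation factor alpha.
    B2 = np.array([0.0, 0.0, 0.0, alpha / p["T_E"], 0.0, 0.0])
    E = np.zeros((6, 2))
    E[0] = [-1.0 / p["H_s"], 1.0 / p["H_s"]]  # columns: load, renewable
    return A, B1, B2, E


def state_derivative(x, u_pp, u_ev, v, alpha_ref, p=PARAMS):
    """Evaluate Equation (3), plus the alpha_ref forcing term of Equation (1)."""
    A, B1, B2, E = build_matrices(p, alpha=x[5])
    x_dot = A @ x + B1 * u_pp + B2 * u_ev + E @ v
    x_dot[5] += alpha_ref / p["T_alpha"]
    return x_dot


# A 0.05 p.u. load step at rest, with alpha already at its reference value.
x0 = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.6])
x_dot = state_derivative(x0, u_pp=0.0, u_ev=0.0, v=np.array([0.05, 0.0]), alpha_ref=0.6)
print(x_dot)
```

In this sketch only the frequency state reacts at the first instant: the load step makes its derivative negative, while the remaining states respond through the coupling terms in subsequent steps.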
The communication links shown in Figure 1 introduce finite delays in the transmission of regulation commands and participation-related information between the grid layer, aggregator layer, and EV layer. In this work, these delays are assumed to be bounded and relatively small compared with the dominant frequency-regulation time scale. Therefore, their aggregated effect is incorporated into the regulation framework through the existing actuator/response dynamics of the aggregated EV channel and the participation-update process, rather than being treated as an independent large-delay networked control problem. Under this assumption, the proposed model captures the practical latency effect in a simplified but implementation-oriented manner.
Here, the EV-side time constant $T_E$ is used to capture the aggregated response lag of the V2G regulation channel, including the effect of local communication, command execution, and power-response delay at the aggregator–EV interface.
The first-order dynamic equation for EV owners’ V2G participation factor is calibrated based on 320 scenario-based questionnaires from first-tier cities and Monte Carlo simulations of 300,000 EVs in [20], which aligns with the behavioral characteristics of vehicle owners such as SOC thresholds and travel patterns. The V2G participation factor $\alpha$ quantifies the willingness of EV owners to participate in grid FR. A higher $\alpha$ indicates a larger energy range available for V2G to support grid FR, whereas a lower $\alpha$ indicates a smaller available energy range, making the system more reliant on conventional generators.
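To make the role of the user-behavior time constant more concrete, the participation equation in (1) can be solved in closed form for the special case of a constant reference $\alpha_{ref}$ (an assumption made here only for illustration; in general $\alpha_{ref}$ may vary with time):

```latex
T_\alpha \dot{\alpha} = -\alpha + \alpha_{ref}
\quad\Longrightarrow\quad
\alpha(t) = \alpha_{ref} + \bigl(\alpha(0) - \alpha_{ref}\bigr)\, e^{-t/T_\alpha}
```

Under this assumption, the gap between $\alpha$ and $\alpha_{ref}$ shrinks by about 63% after one time constant $T_\alpha$ and by about 95% after $3T_\alpha$, so a larger $T_\alpha$ corresponds to slower changes in the dispatchable V2G capacity.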
Therefore, the physical frequency-regulation problem is transformed into a participation-aware state-space control problem with dual control channels, namely conventional generator regulation and aggregated EV regulation.

2.2. Problem Description

With the growing integration of renewable energy sources into power grids, frequency variations resulting from their intermittent characteristics pose a significant threat to grid stability, and in extreme cases, may trigger the collapse of the power system. Traditional solutions rely on energy storage devices such as batteries for reverse power supply, but large-scale deployment entails substantial costs. Meanwhile, with the development of new energy technologies, an increasing number of EVs have entered daily life. Thanks to the rapid advancement of V2G technology, EVs are regarded as distributed energy storage devices. Beyond meeting travel needs, EV batteries can fulfill energy storage functions through V2G technology. Moreover, since the charging and discharging process of EVs involves electromagnetic and chemical reactions rather than mechanical processes, EVs respond faster than power plants during FR. In summary, compared with traditional energy storage devices, EV batteries offer the advantages of low cost, convenient use, and fast response speed. However, this framework faces three core challenges. First, EVs are primarily used to satisfy travel demand, and frequent charging and discharging may accelerate battery degradation, leading to low and uncertain EV participation. Therefore, improving and characterizing the V2G participation factor becomes a key issue. Second, EVs are widely distributed, which makes centralized power dispatch difficult; thus, the coordinated scheduling of large-scale distributed resources must be addressed. Third, multiple control strategies coexist in the power grid, and their power allocation must be coordinated to achieve efficient FR.

3. Collaborative FR Based on Integral Reinforcement Learning

The overall methodological workflow of the proposed participation-aware V2G frequency-regulation framework is summarized in Figure 3.
The proposed methodology is organized into four main stages. First, a participation-aware dynamic model is established from the physical Grid–Aggregator–EV architecture, in which the EV participation factor is incorporated into the state-space representation of FR. Second, a constrained optimal control problem is formulated so that the regulation objective remains consistent with frequency-quality requirements and practical V2G availability. Third, a Hamiltonian-based regulation law is derived and implemented online through critic-network-based integral reinforcement learning. Fourth, the resulting closed-loop properties and the research hypotheses are examined through stability analysis and simulation-based validation in both single-area and multi-area test systems.

3.1. Optimal Control Objective

This subsection formulates the constrained optimal control problem so that FR remains consistent with practical V2G availability and participation-aware resource limits.
To support Hypotheses H1 and H2, the control objective is formulated to jointly penalize frequency deviation, control effort, and the limitation of dispatchable V2G capacity induced by the EV participation factor.
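The control input $u_i$ saturated by the available V2G FR capacity is defined as follows:
$$
u_i = \mathrm{sat}(\hat{u}_i, U_m) =
\begin{cases}
\min\left(\hat{u}_i,\ U_m^{+}\right), & \hat{u}_i \ge 0 \\
\max\left(\hat{u}_i,\ U_m^{-}\right), & \hat{u}_i < 0
\end{cases} \tag{5}
$$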
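A minimal Python sketch of the saturation in Equation (5) is given below. It assumes that the upper and lower limits satisfy $U_m^{+} \ge 0 \ge U_m^{-}$; the function name `sat` and the numerical limits are illustrative only.

```python
import numpy as np


def sat(u_hat, u_max, u_min):
    """Asymmetric saturation of Equation (5); assumes u_max >= 0 >= u_min."""
    u_hat = np.asarray(u_hat, dtype=float)
    return np.where(u_hat >= 0.0,
                    np.minimum(u_hat, u_max),   # discharging side limited by U_m^+
                    np.maximum(u_hat, u_min))   # charging side limited by U_m^-


# Hypothetical limits: 0.1 p.u. in one direction, 0.2 p.u. in the other.
print(sat([-0.3, -0.05, 0.0, 0.08, 0.5], u_max=0.1, u_min=-0.2))
```

Commands inside the limits pass through unchanged, while commands outside are clipped to the nearest limit.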
The objective function is designed to reflect the physical regulation task shown in Figure 1. Specifically, it penalizes frequency-related state deviations to ensure frequency quality at the grid layer, penalizes control effort to avoid excessive regulation burden on conventional generators and EV resources, and incorporates the limitation of dispatchable V2G capacity so that the optimization remains consistent with the practical participation-aware availability provided by the aggregator and EV layers.
The optimal control minimizes the cost function at the saddle point, as shown in Equation (6).
$$
J(x) = \int_t^{\infty} r\bigl(x(\tau), u_{pp}(\tau), u_{EV}(\tau)\bigr)\, d\tau \tag{6}
$$
where $r\bigl(x(\tau), u_{pp}(\tau), u_{EV}(\tau)\bigr)$ is the utility function, defined as:
$$
r(x, u_{pp}, u_{EV}) = x^T Q x + u_{pp}^T R_u u_{pp} + \sigma(u_{EV}) \tag{7}
$$
Here, $Q$ and $R_u$ are given symmetric positive definite matrices.
$\sigma(u_{EV}) \in \mathbb{R}$ is a positive semi-definite function that keeps $u_{EV}$ within the available V2G FR capacity $U_m$:
$$
\sigma(u_{EV}) = 2 U_m \int_0^{u_{EV}} \zeta^{-1}(\xi / U_m)\, d\xi \tag{8}
$$
where $\zeta$ belongs to the class of bounded, monotonic, odd functions with $\zeta(0) = 0$, such as the hyperbolic tangent function.
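For the hyperbolic tangent choice $\zeta = \tanh$, the integral in Equation (8) has the closed form $\sigma(u_{EV}) = 2 U_m u_{EV} \tanh^{-1}(u_{EV}/U_m) + U_m^2 \ln\bigl(1 - u_{EV}^2/U_m^2\bigr)$ for $|u_{EV}| < U_m$. The Python sketch below checks this expression against a simple trapezoidal quadrature and evaluates the utility of Equation (7). It assumes a scalar $u_{pp}$ with scalar weight $R_u$; the function names and numerical values are illustrative.

```python
import numpy as np


def sigma(u_ev, U_m):
    """Closed form of Equation (8) for zeta = tanh; valid for |u_ev| < U_m."""
    s = u_ev / U_m
    return 2.0 * U_m * u_ev * np.arctanh(s) + U_m**2 * np.log1p(-s**2)


def sigma_numeric(u_ev, U_m, n=20001):
    """Trapezoidal evaluation of the integral in Equation (8) (check only)."""
    xi = np.linspace(0.0, u_ev, n)
    f = np.arctanh(xi / U_m)
    h = xi[1] - xi[0]
    return 2.0 * U_m * h * (f.sum() - 0.5 * (f[0] + f[-1]))


def utility(x, u_pp, u_ev, Q, R_u, U_m):
    """Utility of Equation (7) for scalar u_pp and scalar weight R_u (assumed)."""
    return float(x @ Q @ x + R_u * u_pp**2 + sigma(u_ev, U_m))


U_m = 0.1  # hypothetical available V2G capacity in p.u.
print(sigma(0.05, U_m), sigma_numeric(0.05, U_m))
print(utility(np.array([0.01, 0, 0, 0, 0, 0.6]), 0.02, 0.05, np.eye(6), 1.0, U_m))
```

The penalty is zero at $u_{EV} = 0$, symmetric in the sign of $u_{EV}$, and grows steeply as $|u_{EV}|$ approaches $U_m$, which is what discourages commands near the capacity limit.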
The Hamiltonian function is constructed as follows:
$$
H(x, u_{pp}, u_{EV}, J_x) = J_x^T \dot{x} + r \tag{9}
$$
where $J_x = \partial J / \partial x$.
Let $J^*$ be the optimal performance index function; then
$$
0 = \min_{u \in U} H(x, u_{pp}, u_{EV}, J_x^*) \tag{10}
$$
$$
u^* = \arg\min_{u \in U} H(x, u_{pp}, u_{EV}, J_x^*) \tag{11}
$$
Taking the partial derivative of the Hamiltonian function $H$ with respect to $u_{pp}$ and setting it equal to zero yields:
$$
\frac{\partial H}{\partial u_{pp}} = B_1^T J_x + 2 R_u u_{pp} = 0
\;\Longrightarrow\;
u_{pp}^* = -\frac{1}{2} R_u^{-1} B_1^T J_x^* \tag{12}
$$
Similarly, taking the partial derivative of $H$ with respect to $u_{EV}$ and setting it equal to zero yields:
$$
u_{EV}^* = -U_m \tanh\!\left(\frac{1}{2 U_m} B_2^T J_x^*\right) \tag{13}
$$
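The intermediate steps behind the EV-channel control law can be written out as follows, assuming the hyperbolic tangent choice $\zeta = \tanh$ in the penalty function $\sigma$:

```latex
\frac{\partial \sigma}{\partial u_{EV}} = 2 U_m \tanh^{-1}\!\left(\frac{u_{EV}}{U_m}\right),
\qquad
\frac{\partial H}{\partial u_{EV}} = B_2^T J_x + 2 U_m \tanh^{-1}\!\left(\frac{u_{EV}}{U_m}\right) = 0

\Longrightarrow\quad
\tanh^{-1}\!\left(\frac{u_{EV}^*}{U_m}\right) = -\frac{1}{2 U_m} B_2^T J_x^*
\quad\Longrightarrow\quad
u_{EV}^* = -U_m \tanh\!\left(\frac{1}{2 U_m} B_2^T J_x^*\right)
```

Because $|\tanh(\cdot)| < 1$, the resulting command always satisfies $|u_{EV}^*| < U_m$, so the capacity constraint is enforced by the structure of the control law rather than by clipping after the fact.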
Therefore, the optimal control follows from (10), where the superscript * denotes the optimal solution, and may be formulated as:
$$
u_{pp}^* = -\frac{1}{2} R_u^{-1} B_1^T J_x^* \tag{14}
$$
$$
u_{EV}^* = -U_m \tanh\!\left(\frac{1}{2 U_m} B_2^T J_x^*\right) \tag{15}
$$
Following the definition in [21], “model-free” means that the online learning and control implementation do not require explicit knowledge of the system dynamics. In this paper, $A$ denotes the drift dynamics matrix of the participation-aware frequency-regulation system, while $B_1$ and $B_2$ denote the input-channel matrices. Since the final IRL-based control laws do not explicitly depend on $A$, the proposed controller can be regarded as model-free with respect to the drift dynamics.
Based on the above state-space model and constrained regulation objective, a Hamiltonian function is constructed to derive the optimal regulation law. The main contribution here is that the practical V2G participation limitation is embedded into the optimal control formulation, rather than treated as an exogenous fixed coefficient. This enables the resulting controller to better reflect the physically available V2G support in real operation.

3.2. V2G Optimal FR and Online Learning Algorithm

This part implements the Hamiltonian-based control law online through critic-network-based integral reinforcement learning.
Since the optimal value function cannot be solved analytically, the Hamiltonian-based control law is implemented online through a critic neural network within an integral reinforcement learning framework. In the proposed implementation, the online policy update is carried out without explicit online dependence on the system drift matrix $A$. Therefore, the learning module serves not only as a computational realization of the optimal control design, but also as the key mechanism that enables a model-free online implementation with respect to the drift dynamics.
Since the bounded communication-induced latency is reflected in the aggregated EV-channel dynamics, the optimization and online learning process are carried out on the participation-aware delayed-response system rather than on an ideal instantaneous-response V2G model.
The calculation of $u_{pp}^*$ and $u_{EV}^*$ depends on $J_x^*$, as shown in Equations (14) and (15). However, $J_x^*$ cannot be computed explicitly, because it requires future values of the utility function. Accordingly, an adaptive neural-network-based critic scheme is employed to solve for $u_{pp}^*$ and $u_{EV}^*$, where one neural network approximates $J^*$ and hence $J_x^*$ [10]. With the ideal weights $W_c$, the critic neural network reconstructs the optimal value function as follows:
$$
J^*(x) = W_c^T \varphi_c(x) + \varepsilon_c(x) \tag{16}
$$
Here, $W_c \in \mathbb{R}^N$, where $N$ denotes the number of neurons in the hidden layer, and $x$ represents the system state, which serves as the input to the critic neural network. $\varphi_c(x) \in \mathbb{R}^N$ denotes the activation-function vector, and $\varepsilon_c(x)$ is the bounded neural-network approximation error. By the universal approximation property, this error can be made arbitrarily small on a compact set with a sufficient number of neurons. $J_x^*$ can then be deduced from Equation (16):
$$
J_x^*(x) = \nabla\varphi_c^T W_c + \nabla\varepsilon_c \tag{17}
$$
where $\nabla\varphi_c = \partial\varphi_c / \partial x \in \mathbb{R}^{N \times n}$ denotes the gradient of the activation-function vector with respect to the state vector $x \in \mathbb{R}^n$, and $\nabla\varepsilon_c = \partial\varepsilon_c / \partial x$. As defined in Equation (17), the critic neural network represents the gradient of the value function that satisfies the HJB equation. Consequently, an optimized robust controller can be obtained from the critic neural network by learning $W_c$. Since the ideal weight $W_c$ is unknown, an estimated weight $\hat{W}_c$ is learned to approximate it, and the value function is approximated as follows:
$$
\hat{J}(x) = \hat{W}_c^T \varphi_c \tag{18}
$$
During online learning, $u_{pp} = \hat{u}_{pp} + \varepsilon_1$ and $u_{EV} = \hat{u}_{EV} + \varepsilon_2$ are deployed as FR signals, where $\varepsilon_1$ and $\varepsilon_2$ are two small random noises for exploration. Even though the approximated worst disturbance $\hat{v}$ may not equal the actual disturbance $v$, $\hat{v}$ is used in online learning; thus, $\hat{u}_{pp}$ and $\hat{u}_{EV}$ are optimized under the worst disturbance, which enhances the robustness of V2G-based FR.
Therefore, $J_x^*$ is estimated by $\hat{J}_x$ as follows:
$$
\hat{J}_x(x) = \nabla\varphi_c^T \hat{W}_c \tag{19}
$$
Substituting Equation (19) into Equations (14) and (15), $u_{pp}^*$ and $u_{EV}^*$ can be approximated as follows:
$$
\hat{u}_{pp}^* = -\frac{1}{2} R_u^{-1} B_1^T \nabla\varphi_c^T \hat{W}_c \tag{20}
$$
$$
\hat{u}_{EV}^* = -U_m \tanh\!\left(\frac{1}{2 U_m} B_2^T \nabla\varphi_c^T \hat{W}_c\right) \tag{21}
$$
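The critic-based control laws in Equations (19)–(21) can be sketched in Python as follows. The quadratic activation vector used here is only one possible choice and is an assumption of this sketch (the activation functions actually used are not specified in this section); the parameter values, the scalar weight `R_u`, and the function names are likewise illustrative.

```python
import numpy as np
from itertools import combinations_with_replacement


def phi(x):
    """Hypothetical quadratic features x_i * x_j for i <= j."""
    idx = combinations_with_replacement(range(len(x)), 2)
    return np.array([x[i] * x[j] for i, j in idx])


def grad_phi(x):
    """Jacobian of phi with respect to x, shape (N, n)."""
    n = len(x)
    rows = []
    for i, j in combinations_with_replacement(range(n), 2):
        g = np.zeros(n)
        g[i] += x[j]
        g[j] += x[i]  # for i == j this accumulates to 2 * x_i
        rows.append(g)
    return np.array(rows)


def controls(x, W_hat, B1, B2, R_u, U_m):
    """Approximate control laws of Equations (20) and (21)."""
    J_x = grad_phi(x).T @ W_hat                        # Equation (19)
    u_pp = -0.5 / R_u * (B1 @ J_x)                     # Equation (20), scalar R_u
    u_ev = -U_m * np.tanh((B2 @ J_x) / (2.0 * U_m))    # Equation (21)
    return u_pp, u_ev


# Hypothetical input vectors (T_g = 0.1, T_E = 0.05, alpha = 0.6).
B1 = np.array([0.0, 0.0, 10.0, 0.0, 0.0, 0.0])
B2 = np.array([0.0, 0.0, 0.0, 12.0, 0.0, 0.0])
x = np.array([-0.02, 0.01, 0.0, 0.03, -0.001, 0.6])
W_hat = 0.1 * np.ones(21)  # 21 quadratic features for a 6-dimensional state
print(controls(x, W_hat, B1, B2, R_u=1.0, U_m=0.1))
```

Whatever the critic weights are, the EV command stays within $\pm U_m$, whereas the generator command is linear in the estimated value gradient.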
The HJB residual error is as follows:
$$
e_H = \hat{J} + H(x, \hat{u}_{pp}, \hat{u}_{EV}, \hat{J}_x) \tag{22}
$$
The objective function $E_c = e_H^T e_H / 2$ is introduced to quantify the distance to optimality. The learning algorithm conducts gradient descent along $E_c$ with respect to $\hat{W}_c$, leading to the following learning dynamics of $\hat{W}_c$:
$$
\dot{\hat{W}}_c = -\eta \frac{\rho}{\left(1 + \rho^T \rho\right)^2}\, e_H \tag{23}
$$
where $\rho = \nabla\varphi_c \left(A x + B_1 \hat{u}_{pp} + B_2 \hat{u}_{EV} + E v\right)$, $\eta > 0$ is the learning rate, and $\left(1 + \rho^T \rho\right)^2$ is the normalization term. Once $\hat{W}_c$ converges, it yields the optimized V2G FR controller given in Equations (20) and (21). Algorithm 1 summarizes the optimal FR and online learning procedure for V2G.
Algorithm 1 V2G Optimal FR and Online Learning Algorithm
Initialize the critic neural network weight $\hat{W}_c$ and learning rate $\eta$
for each sampling time $t$ do
      Obtain state $x$ from SCADA and the available V2G FR capacity (FRC) $U_m$ from the EV aggregator;
      Calculate $\hat{J}_x$ by Equation (19) according to $x$;
      Generate the optimal robust control outputs $\hat{u}_{pp}$ and $\hat{u}_{EV}$ by Equations (20) and (21), respectively;
end for
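The critic update that accompanies each pass of the loop can be sketched in Python as below. This is a minimal illustration under several assumptions that are not taken from the paper: the residual is taken in the common Hamiltonian form $e_H = \hat{W}_c^T \rho + r$, the continuous-time learning law is discretized with a forward-Euler step `dt`, and the state derivative is supplied directly (measured or estimated) so that $\rho = \nabla\varphi_c\, \dot{x}$ is available. The function name `critic_step` and all numbers are hypothetical.

```python
import numpy as np


def critic_step(W_hat, grad_phi_x, x_dot, r, eta, dt):
    """One forward-Euler step of the normalized gradient-descent critic update.

    grad_phi_x: Jacobian of the activation vector at x, shape (N, n)
    x_dot:      state derivative at x, shape (n,)
    r:          utility value at the current state and controls
    """
    rho = grad_phi_x @ x_dot                      # regressor
    e_H = float(W_hat @ rho + r)                  # assumed Hamiltonian residual
    W_dot = -eta * rho / (1.0 + rho @ rho) ** 2 * e_H
    return W_hat + dt * W_dot, e_H


# Toy check on a single fixed sample: the residual should shrink step by step.
grad_phi_x = np.eye(3)
x_dot = np.array([0.5, -0.2, 0.1])
r = 0.3
W_hat = np.zeros(3)
residuals = []
for _ in range(2000):
    W_hat, e_H = critic_step(W_hat, grad_phi_x, x_dot, r, eta=5.0, dt=0.01)
    residuals.append(abs(e_H))
print(residuals[0], residuals[-1])
```

On a single fixed sample the residual contracts geometrically; in closed loop the regressor changes at every sampling instant, so convergence additionally depends on sufficient excitation, which is the role of the exploration noises $\varepsilon_1$ and $\varepsilon_2$.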

3.3. Stability Analysis

The purpose of this subsection is to establish the boundedness of the closed-loop state trajectory and critic-learning error under the conditions required by the Lyapunov-based analysis, rather than to claim strict asymptotic stability under arbitrary operating conditions.
This subsection analyzes the stability of the proposed V2G FR system based on the standard Lyapunov extension theorem. As is common in neural-network-based approximate optimal control and integral reinforcement learning analysis, several boundedness assumptions are introduced to facilitate the Lyapunov-based derivation of the closed-loop stability properties. These assumptions are used to establish the boundedness of the closed-loop states and critic-learning dynamics under the considered operating conditions. To mathematically derive the corresponding stability result, the following assumptions are made:
Assumption 1.
The ideal critic-network weight vector $W_c$ is bounded, i.e., $\|W_c\| \le W_M$, where $W_M$ is a positive constant. This assumption is standard in critic-network-based approximation analysis and ensures that the critic-network representation of the value function remains well posed in the compact operating region considered in the Lyapunov analysis.
Assumption 2.
The neural-network approximation error $\varepsilon_c(x)$ is bounded, i.e., $\|\varepsilon_c(x)\| \le \varepsilon_M$, where $\varepsilon_M$ is a positive constant. Since the state trajectory is considered within a compact operating region and the critic neural network uses continuous activation functions, this boundedness assumption is consistent with the universal approximation property. This condition is necessary to bound the residual terms introduced by neural-network approximation in the derivative of the Lyapunov function.
Assumption 3.
The gradient of the activation-function vector is bounded, i.e., $\|\nabla\varphi_c(x)\| \le \varphi_M$, where $\varphi_M$ is a positive constant. The boundedness of $\nabla\varphi_c(x)$ follows from the use of smooth bounded activation functions over the compact operating domain considered in this work. This condition guarantees that the gradient-related terms appearing in the critic-learning dynamics remain bounded and can be handled in the Lyapunov derivative estimate.
Under the above assumptions, a Lyapunov function is constructed to establish the boundedness of the closed-loop system and critic-learning dynamics.
$$
L(t) = J^*\bigl(x(t)\bigr) + \frac{1}{2} \tilde{W}_c^T(t)\, \tilde{W}_c(t) \tag{24}
$$
where $L_1 = J^*\bigl(x(t)\bigr)$ is the system’s optimal value function, $L_2 = \frac{1}{2} \tilde{W}_c^T(t)\, \tilde{W}_c(t)$ is the weight error term of the critic neural network, and $\tilde{W}_c = W_c - \hat{W}_c$ is the critic weight estimation error.
Taking the time derivatives of $L_1$ and $L_2$ along the closed-loop trajectories gives:
$$
\dot{L}_1 = J_x^{*T} \dot{x} \tag{25}
$$
$$
\dot{L}_2 = \tilde{W}_c^T \dot{\tilde{W}}_c = -\tilde{W}_c^T \dot{\hat{W}}_c \tag{26}
$$
$\dot{L}_1$ and $\dot{L}_2$ can be bounded as follows:
$$
\dot{L}_1 \le -\lambda_{\min}(Q)\, \|x\|^2 + C_0 \tag{27}
$$
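$$
\dot{L}_2 \le -(\eta - 0.5)\, \lambda_{\min}\!\left(\phi_1 \phi_1^T\right) \|\tilde{W}_c\|^2 + \frac{1}{2} \eta^2 b_e^2 \tag{28}
$$
where $\lambda_{\min}(Q)$ is the smallest eigenvalue of the positive definite matrix $Q$, and $C_0$ is a positive constant.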
It should be emphasized that the present analysis establishes uniform ultimate boundedness of the closed-loop states and critic weight estimation error under the stated assumptions, rather than asymptotic convergence in the strict sense. For the considered EV-integrated multi-area power system, under the proposed control laws, critic learning dynamics, and the stated boundedness assumptions, the closed-loop state $x$ and critic weight estimation error $\tilde{W}_c$ are uniformly ultimately bounded, with the following ultimate bounds:
B x = C a
B W c = C a
Specifically, a = λ min Q , b = η 0.5 λ min ϕ 1 ϕ 1 T , C = C 0 + 1 2 η 2 b e 2 .
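The ultimate bounds follow from one intermediate step that is worth writing out. The following is a sketch of that step, assuming $a > 0$ and $b > 0$ and using $a$, $b$, and $C$ as defined above:

```latex
\begin{aligned}
\dot{L} &= \dot{L}_1 + \dot{L}_2 \le -a\,\|x\|^2 - b\,\|\tilde{W}_c\|^2 + C, \\
\dot{L} &< 0 \quad \text{whenever} \quad \|x\| > \sqrt{C/a} \quad \text{or} \quad \|\tilde{W}_c\| > \sqrt{C/b}.
\end{aligned}
```

Outside these two radii the Lyapunov function decreases, so the trajectories are eventually confined to a neighborhood of the origin. That neighborhood shrinks as $C$ decreases or as $a$ and $b$ grow.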
In practical implementation, these boundedness conditions are supported by several design choices, including bounded activation functions, finite learning rates, bounded operating ranges of frequency deviation and V2G power, and physically constrained EV participation factor dynamics. Therefore, the assumptions used in the analysis are not arbitrary, but are consistent with the practical operating limits of the considered power-system FR problem.
A precise analytical characterization of the closed-loop stability region for large-scale multi-area V2G systems with stronger inter-area coupling is beyond the scope of the current study and will be investigated in future work. In this paper, the theoretical analysis mainly provides boundedness guarantees for the aggregated closed-loop system, while the effectiveness in multi-area scenarios is further supported by simulation studies on the IEEE 39-bus system.
It should be noted that the present work considers bounded small communication delays through an aggregated dynamic representation, rather than a full large-delay networked control formulation. The rigorous stability characterization under larger communication delays, packet loss, or asynchronous updates remains an important topic for future research.

4. Simulation and Results

The simulation studies are designed to verify the two research hypotheses from complementary aspects. The IEEE 14-bus case compares the proposed participation-aware controller with the participation-agnostic IRL baseline, mainly to examine whether explicit modeling of EV participation factor improves regulation performance (H1). The IEEE 39-bus case further compares the proposed method with MPC, ADHDP [8], and DDPG under multi-area disturbances, in order to evaluate the robustness and adaptability of the Hamiltonian-IRL framework in more complex scenarios (H2).
In this section, the IEEE 14-bus test system and IEEE 39-bus test system are used to verify the effectiveness and superiority of the proposed power system FR model incorporating EV owners’ participation factor. The parameters of the IEEE 14-bus system and IEEE 39-bus test system are listed in Table 1 and Table 2, respectively. Both systems have a base capacity of 100 MVA. The simulation duration is 180 s, with a sampling period of 0.01 s. All simulations are conducted on an ASUS laptop running Windows 10, equipped with an Intel Core i7-12700H CPU @ 2.30 GHz and 8 GB of RAM, using MATLAB 2018b.
This work performs two case studies to evaluate the proposed control strategy. In the first case, the participation-agnostic IRL baseline is compared with the proposed V2G FR control, which accounts for EV owners’ participation factor, to verify the effectiveness of the proposed control. In the second case, the proposed V2G FR control is compared with three existing methods, namely model predictive control (MPC), action-dependent heuristic dynamic programming (ADHDP), and deep deterministic policy gradient (DDPG) [22], to highlight the advantages of the proposed scheme.
For the simulation, we assume that each area contains 2000 EVs that can engage in V2G operation, where the charging/discharging power per EV is ±7 kW [23]. Thus, the FR capacity $U_{mi}$ is set to 0.2. Because the simulation horizon is short, $U_{mi}$ is treated as a constant throughout the simulation. The combined power disturbance generated by loads and renewable energy sources is illustrated in Figure 4. The disturbance ranges from −15 MW to +15 MW (with a variance of 45 MW), consisting of ±2 MW from load-related disturbances and ±16 MW from renewable energy-related disturbances. In this study, load and renewable-power fluctuations are represented as lumped disturbances for controller-performance evaluation. This simplified treatment is intended for benchmark validation rather than detailed stochastic modeling of specific renewable-resource distributions.

4.1. Validation of the Effectiveness of V2G FR Control

This section reports simulation experiments on the proposed V2G FR control, which accounts for EV owner participation, in the IEEE 14-bus system shown in Figure 5. The aim is to validate the effectiveness of the proposed scheme in mitigating FR performance degradation caused by various disturbances.
The critic neural network adopts a three-layer structure with seven input neurons, eight hidden neurons, and one output neuron. With the learning rate $\eta$ set to 0.2 and $Q = \mathrm{diag}(10, 4, 4, 1, 4, 8)$, four training rounds of simulations were conducted to construct a V2G FR controller that accounts for EV owners’ participation. During each learning iteration, the controller optimized performance under various disturbances using system data. As shown in Figure 6, the single-area learning error for the IEEE 14-bus system converged to near zero after 90 s in the final training round, demonstrating the convergence of the critic neural network’s learning process.
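To make the critic structure concrete, the following is a minimal sketch of a 7-8-1 network together with the quadratic state term weighted by $Q$. The tanh activation, the random initial weights, the helper names, and the example state values are assumptions made for illustration. The text does not say what the seventh input is, so the sketch simply appends a constant to the six states.

```python
import numpy as np

# Dimensions and weights taken from the text: 7 inputs, 8 hidden neurons,
# 1 output, and Q = diag(10, 4, 4, 1, 4, 8) over the six-component state.
N_IN, N_HIDDEN = 7, 8
Q = np.diag([10.0, 4.0, 4.0, 1.0, 4.0, 8.0])

rng = np.random.default_rng(0)
# Hypothetical initial weights; the actual initialization is not given.
V_in = rng.normal(scale=0.1, size=(N_HIDDEN, N_IN))  # input -> hidden
w_out = rng.normal(scale=0.1, size=N_HIDDEN)         # hidden -> output


def critic_value(z: np.ndarray) -> float:
    """Approximate value w_out^T tanh(V_in z); tanh is an assumed activation."""
    hidden = np.tanh(V_in @ z)
    return float(w_out @ hidden)


def state_cost(x: np.ndarray) -> float:
    """Quadratic state-weighting term x^T Q x of the cost function."""
    return float(x @ Q @ x)


# Example with made-up values: six states plus one extra critic input.
x = np.array([0.05, 0.01, 0.02, -0.03, 0.0, 0.6])
z = np.append(x, 1.0)  # assumed seventh input (here a constant bias term)
print(critic_value(z), state_cost(x))
```

Keeping the hidden layer this small keeps the number of critic weights to be learned online low, which fits the short 180 s training rounds described above.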
In Figure 7, the blue line represents the grid FR model that does not consider EV owners’ V2G participation, denoted as IRL, and the red line represents the grid FR model that incorporates EV owners’ V2G participation, denoted as α-IRL. Under IRL, the frequency deviation oscillates throughout the simulation, fluctuating between −0.6 Hz and 0.6 Hz, and its maximum reaches 0.5985 Hz, exceeding the safety threshold. The IRL method therefore struggles to suppress frequency fluctuations under these disturbances. In contrast, the α-IRL model exhibits much smaller frequency-deviation fluctuations. Its curve is smoother and closer to the steady-state value, with a maximum deviation of 0.1955 Hz, which remains within the safe range. This indicates that the approach incorporating EV owners’ V2G participation achieves better FR performance.
Table 3 presents FR performance through three metrics: integral of squared error (ISE), integral of absolute error (IAE), and weighted error consistency ultimate bound (UBB). Generally, smaller values of performance metrics indicate better performance. Compared to the traditional IRL-based FR scheme, the proposed FR scheme incorporating V2G participation from EV owners improved these three metrics by 89.1%, 66.8%, and 66.2%, respectively. These results validate the effectiveness of the proposed FR scheme in mitigating FR performance degradation caused by various disturbances.
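The reported improvements can be reproduced directly from the Table 3 entries. The sketch below also shows one common way to compute ISE and IAE from a sampled error signal. The rectangle-rule discretization, the function names, and the demo signal are assumptions, since the text does not state how the integrals were evaluated.

```python
import numpy as np


def ise(err: np.ndarray, dt: float) -> float:
    """Integral of squared error, approximated by a rectangle-rule sum."""
    return float(np.sum(err ** 2) * dt)


def iae(err: np.ndarray, dt: float) -> float:
    """Integral of absolute error, approximated by a rectangle-rule sum."""
    return float(np.sum(np.abs(err)) * dt)


def improvement_pct(baseline: float, proposed: float) -> float:
    """Relative reduction of a smaller-is-better metric, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Table 3 values (IEEE 14-bus case).
table3 = {
    "IRL": {"ISE": 15.402, "IAE": 43.305, "UBB": 0.601},
    "alpha-IRL": {"ISE": 1.678, "IAE": 14.370, "UBB": 0.203},
}
gains = {
    m: round(improvement_pct(table3["IRL"][m], table3["alpha-IRL"][m]), 1)
    for m in ("ISE", "IAE", "UBB")
}
print(gains)  # {'ISE': 89.1, 'IAE': 66.8, 'UBB': 66.2}

# Hypothetical decaying oscillation, sampled as in the simulations
# (180 s horizon, 0.01 s step); not data from the paper.
dt = 0.01
t = np.arange(0.0, 180.0, dt)
demo_err = 0.2 * np.exp(-0.05 * t) * np.sin(0.5 * t)
print(ise(demo_err, dt), iae(demo_err, dt))
```

ISE penalizes large excursions more heavily than IAE, which is consistent with the ISE improvement being the largest of the three.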
Thus, the proposed method incorporating EV owner V2G participation shows clear advantages over the participation-agnostic IRL baseline in suppressing frequency deviation and maintaining system frequency stability in the IEEE 14-bus case.

4.2. Comparison of the Proposed V2G FR Control Method with Other Methods

This case study compares the proposed V2G FR control based on EV owner participation factor with model predictive control (MPC), action-dependent heuristic dynamic programming (ADHDP), and deep deterministic policy gradient (DDPG). The comparison is intended to evaluate the relative regulation performance of the proposed approach under various disturbance conditions.
For consistency with the single-area study, the critic neural network used in the IEEE 39-bus multi-area case adopts the same three-layer structure as that used for the IEEE 14-bus system. Specifically, the critic neural network consists of an input layer, a hidden layer, and an output layer, while the input dimension is selected according to the corresponding multi-area state vector.
As shown in the learning-error curves of Figure 8, during training on the IEEE 39-bus system, the learning errors of Area 1, Area 2, and Area 3 all pass from initial fluctuations to gradual convergence and ultimately stabilize at relatively low levels, indicating that the critic training converges in each area of this system.
As shown in the frequency deviation curves of Figure 9, the frequency deviations under MPC, ADHDP, DDPG, and the proposed method all vary dynamically over time. In subfigures (a–c), the curves corresponding to MPC, ADHDP, and DDPG fluctuate significantly, with frequency deviations oscillating between −0.4 Hz and 0.4 Hz. These oscillations persist throughout the simulation, indicating that these methods deliver poor FR performance under the considered disturbances. In contrast, the red curve representing the proposed method exhibits much smaller overall fluctuations than the other three approaches. Its frequency deviation stays closer to zero, and the curve is smoother. This indicates that, under the considered disturbance scenarios, the proposed method suppresses frequency-deviation fluctuations more effectively and better maintains system frequency stability.
Moreover, as shown in Table 4, the proposed α-IRL scheme attains the lowest ISE (0.077), IAE (3.064), and UBB (0.064) among the four methods under the considered multi-area simulation scenario.
In summary, the proposed method shows better regulation performance than MPC, ADHDP, and DDPG under the considered disturbance scenario, leading to more effective frequency-stability support.
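The size of the gap to each baseline can be read off Table 4 with a few lines of arithmetic. The sketch below computes the percentage reduction of each metric relative to each compared method. The dictionary layout and function name are illustrative choices, and the percentages are derived here rather than reported in the paper.

```python
# Table 4 values (IEEE 39-bus case); smaller is better for every metric.
table4 = {
    "MPC": {"ISE": 0.863, "IAE": 9.829, "UBB": 0.183},
    "ADHDP": {"ISE": 0.974, "IAE": 10.586, "UBB": 0.263},
    "DDPG": {"ISE": 1.067, "IAE": 11.124, "UBB": 0.256},
    "alpha-IRL": {"ISE": 0.077, "IAE": 3.064, "UBB": 0.064},
}


def reductions_vs(baseline: str, proposed: str = "alpha-IRL") -> dict:
    """Percentage reduction of each metric when moving from baseline to proposed."""
    base, prop = table4[baseline], table4[proposed]
    return {m: 100.0 * (base[m] - prop[m]) / base[m] for m in base}


for name in ("MPC", "ADHDP", "DDPG"):
    row = reductions_vs(name)
    print(name, {m: f"{v:.1f}%" for m, v in row.items()})

# Best method per metric, to confirm the ranking read off the table.
best = {m: min(table4, key=lambda k: table4[k][m]) for m in ("ISE", "IAE", "UBB")}
print(best)
```

Among the three baselines, MPC has the lowest values on all three metrics, so the reductions relative to MPC are the most conservative way to state the advantage.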

4.3. Time-Varying Participation Case Study

In Section 4.1 and Section 4.2, the simulation horizon was limited to 180 s. Over such a short time scale, EV owners’ connection status and behavioral willingness are unlikely to change significantly, and therefore the participation factor α can be reasonably approximated as a constant. Over a longer operating horizon, however, this assumption becomes less appropriate because the practically dispatchable V2G capacity is affected by both EV plug-in availability and user-side willingness, both of which vary with travel routine and behavioral preference.
In this subsection, the simulation window is selected as 17:00–20:00, corresponding to a typical weekday evening residential plug-in period. Prior studies have shown that, in residential scenarios, EV arrival times commonly concentrate in this evening period and can often be approximated by a normal distribution with mean values around 17:00–20:00 [24]. Therefore, this time window is adopted to characterize the influence of time-varying participation on dispatchable V2G regulation capacity.
It should be emphasized that this paper does not assume that $\alpha$ itself follows a single fixed probability distribution. Instead, $\alpha(t)$ is modeled as a bounded time-varying factor jointly determined by two behavior-related components, namely plug-in availability and participation willingness. This treatment is more physically meaningful than directly imposing a prescribed distribution on $\alpha$, because practical V2G participation requires both that the EV is connected to the grid and that the owner is willing to provide ancillary support. Recent bottom-up flexibility studies have also shown that EV flexibility is more appropriately characterized using travel and plugging patterns together with heterogeneous user archetypes, rather than by a fixed scalar participation coefficient [25].
Accordingly, the reference participation trajectory is constructed as
$$\alpha_{ref}(t) = A(t)\,W(t)$$
where $A(t)$ denotes the plug-in availability factor and $W(t)$ denotes the participation-willingness factor. The availability component represents the time-varying access of EVs to the charging interface during the evening commuting period, while the willingness component represents the user-side preference for participating in V2G and evolves more smoothly. This interpretation is consistent with recent V2G adoption studies showing that participation willingness is strongly influenced by financial incentives, perceived loss of flexibility, and battery-degradation concerns [13].
In this study, $A(t)$ is designed according to the regularity of evening residential arrival and connection behavior, while $W(t)$ is modeled as a slower-varying bounded behavioral factor. The actual participation state $\alpha(t)$ is then generated through a first-order dynamic update law, so that the participation factor evolves gradually rather than changing instantaneously. This setting is consistent with the physical interpretation that aggregated user participation exhibits inertia over time.
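The construction above can be sketched numerically. In the example below, the first-order lag form, the normal-CDF availability profile, the sinusoidal willingness profile, and all numeric shape parameters other than the 60 s inertia time constant from Table 1 are assumptions chosen for illustration; the paper does not give these functions explicitly.

```python
import math
import numpy as np

T_ALPHA = 60.0        # user-behavior inertia time constant (s), Table 1
DT = 1.0              # integration step (s), chosen for this sketch
HORIZON_S = 3 * 3600  # 17:00-20:00 window, in seconds after 17:00


def availability(t_s: float) -> float:
    """Assumed plug-in availability A(t): normal CDF of arrival time.

    Mean 18:30 (5400 s after 17:00) and a 45 min standard deviation
    are illustrative guesses, not values from the paper.
    """
    mu, sigma = 5400.0, 2700.0
    cdf = 0.5 * (1.0 + math.erf((t_s - mu) / (sigma * math.sqrt(2.0))))
    return 0.3 + 0.7 * cdf  # assumed share of EVs already plugged in at 17:00


def willingness(t_s: float) -> float:
    """Assumed slowly varying willingness W(t), bounded in [0.70, 0.80]."""
    return 0.75 + 0.05 * math.sin(2.0 * math.pi * t_s / HORIZON_S)


def simulate_alpha(alpha0: float = 0.3) -> np.ndarray:
    """Forward-Euler integration of d(alpha)/dt = (alpha_ref - alpha) / T_ALPHA."""
    n = int(HORIZON_S / DT) + 1
    alpha = np.empty(n)
    alpha[0] = alpha0
    for k in range(n - 1):
        alpha_ref = availability(k * DT) * willingness(k * DT)
        alpha[k + 1] = alpha[k] + DT * (alpha_ref - alpha[k]) / T_ALPHA
    return alpha


alpha = simulate_alpha()
print(alpha.min(), alpha.max(), alpha[-1])
```

Because both assumed components lie in [0, 1], their product and the filtered state also stay in [0, 1], which matches the bounded time-varying factor described in the text.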
To illustrate the effect of time-varying participation, two cases are considered: a constant-participation baseline, in which the participation factor remains fixed throughout the 3 h interval, and a dynamic participation-aware case, in which the participation factor varies with time and the controller updates the EV regulation limit according to the real-time dispatchable V2G capacity.
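One simple way to picture the difference between the two cases is a saturation limit that scales with the participation factor. The scaling rule `alpha * U_M`, the function names, and the example command values below are assumptions for illustration only; the paper's own constraint model may differ.

```python
import numpy as np

U_M = 0.2  # FR capacity per area, as set in the simulation setup


def ev_limit(alpha: float, u_max: float = U_M) -> float:
    """Assumed dispatchable limit: nominal capacity scaled by participation."""
    return alpha * u_max


def saturate(u_cmd: float, alpha: float) -> float:
    """Clip the EV control command to the participation-dependent limit."""
    lim = ev_limit(alpha)
    return float(np.clip(u_cmd, -lim, lim))


# Same command under two participation levels (values are illustrative).
print(saturate(0.15, alpha=0.60))  # constant-participation baseline
print(saturate(0.15, alpha=0.35))  # lower early-evening participation
```

Under this assumed rule, a command that is only mildly clipped with a constant participation factor is clipped much harder when the real-time participation is low, which is the effect the dynamic case is meant to capture.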
Figure 10 shows the trajectory of the participation factor $\alpha$. It can be observed that the dynamic participation factor is lower than the constant baseline in the early evening period and then gradually increases as more EVs become available. This result indicates that a fixed participation assumption may either overestimate or underestimate the actual V2G regulation capability depending on the operating time.
Figure 11 presents the corresponding frequency-deviation responses. The results show that the proposed framework remains stable over the entire 3 h interval under time-varying participation. Compared with the constant-$\alpha$ baseline, the dynamic-$\alpha$-aware case exhibits a distinguishable but still stable regulation trajectory. These results indicate that the variation of EV plug-in availability and participation willingness is translated into a time-varying participation factor, which in turn affects the closed-loop regulation process. Therefore, the dynamic-$\alpha$-aware formulation provides a more realistic representation of V2G-supported FR over the evening operating window.
Overall, the above results show that the proposed participation-aware framework can accommodate behavior-driven variation of EV participation factor over a practically meaningful evening operating period while maintaining stable frequency-regulation performance.

5. Discussion

The simulation results support the two research hypotheses from complementary perspectives. In the IEEE 14-bus system, the comparison between the conventional IRL method and the proposed participation-aware α-IRL method shows that explicitly incorporating EV owners’ participation factor significantly improves frequency-deviation suppression and overall regulation performance. This result supports H1 by indicating that participation-aware modeling can better align theoretical V2G regulation capability with practically dispatchable regulation resources. In the IEEE 39-bus multi-area system, the comparisons with MPC, ADHDP, and DDPG show that the proposed method achieves better overall dynamic regulation performance under more complex disturbance conditions. This result supports H2 by demonstrating that the Hamiltonian-based integral reinforcement learning framework provides a robust and adaptive solution for distributed V2G FR under uncertainty. The time-varying participation study further shows that the proposed framework can maintain stable frequency-regulation performance over a practical evening operating window, while more realistically reflecting the effect of changing EV plug-in availability and user willingness on dispatchable V2G capacity.
Compared with MPC, the proposed method avoids repeated online optimization and therefore shows stronger adaptability under uncertain disturbances and changing participation-aware V2G availability. Compared with ADHDP and DDPG, the proposed method remains more closely connected to the physical control objective through the Hamiltonian-based formulation, which improves interpretability and facilitates the incorporation of participation-aware constraints. In addition, the proposed controller is implemented in a model-free online form with respect to the drift dynamics, meaning that explicit online knowledge of the system drift matrix is not required for policy realization. This feature improves practical adaptability when accurate drift dynamics are difficult to identify, while preserving the structured physical meaning of the control design.
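The reason the drift matrix does not appear in the learning step can be seen from the interval form of the Bellman equation used in integral reinforcement learning. The following is the standard textbook form, written with the symbols $J$ and $U(\cdot)$ from the nomenclature. The reinforcement interval $T$ is notation introduced here, and the paper's own equation may differ in detail, for example in how the disturbance enters the utility.

```latex
J\big(x(t)\big) = \int_{t}^{t+T} U\big(x(\tau), u(\tau)\big)\, d\tau + J\big(x(t+T)\big)
```

Both sides can be evaluated from measured state and input trajectories over $[t, t+T]$, so the critic can be trained without evaluating the drift term of the state equation explicitly.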
From an application perspective, the proposed framework is relevant to future renewable-rich power systems in which V2G resources are expected to provide distributed ancillary support. By explicitly considering EV owners’ participation factor, the method improves the practical realism of dispatchable V2G regulation capacity and reduces the gap between idealized regulation assumptions and actual user-constrained resource availability. In addition, the distributed Grid–Aggregator–EV structure is compatible with hierarchical implementation in practical aggregation-based V2G services, where regulation commands, participation information, and EV response must be coordinated across multiple layers.
Several important directions remain for future study. First, more realistic user-side uncertainty should be incorporated, including dynamic travel schedules, state-of-charge distributions, charging/discharging preferences, and battery-degradation costs. Second, the present work adopts a simplified bounded-delay representation; therefore, larger communication delays, packet loss, asynchronous updates, and more realistic cyber-physical constraints should be explicitly investigated in future networked-control formulations. Third, broader large-scale validation under more practical operating scenarios and richer benchmark comparisons would further strengthen the engineering applicability of the proposed framework.

6. Conclusions

This paper proposes a participation-aware V2G frequency-regulation framework for renewable-rich power systems by integrating the EV participation factor into the control design. A three-layer Grid–Aggregator–EV architecture and a corresponding participation-aware dynamic model were established to describe the coordination of distributed V2G resources in FR. On this basis, a Hamiltonian-based optimal robust regulation law was derived under practical V2G power constraints. To realize the control policy online, an integral reinforcement learning scheme was further developed so that the controller can be implemented without explicit online dependence on the system drift matrix $A$. Therefore, the proposed method preserves a model-free characteristic with respect to the drift dynamics while maintaining the physical structure of the frequency-regulation problem.
The simulation results on the IEEE 14-bus and IEEE 39-bus systems verified the effectiveness of the proposed framework from complementary aspects. In the IEEE 14-bus case, explicitly modeling EV participation factor improved the practical dispatchability of V2G resources and achieved better frequency-regulation performance than the participation-agnostic baseline. In the IEEE 39-bus case, the proposed method further demonstrated stronger robustness and adaptability than the compared methods under more complex multi-area disturbances. Overall, the proposed participation-aware Hamiltonian-IRL framework provides an effective and physically interpretable solution for coordinated V2G-based FR. In addition, the time-varying participation case showed that the proposed framework can accommodate behavior-driven variation of EV participation factor over a practical evening operating period while maintaining stable frequency-regulation performance.
Future work will focus on incorporating more realistic EV operational factors, such as travel schedules, state-of-charge distributions, battery degradation, and larger communication uncertainties, so as to further improve the practical applicability of the proposed method in large-scale V2G frequency-regulation scenarios.

Funding

The author did not receive support from any organization for the submitted work.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the author.

Conflicts of Interest

The author declares no conflicts of interest.

Nomenclature

Nomenclature of major symbols
| Symbol | Definition | Remark |
|---|---|---|
| $\Delta f$ | Frequency deviation | State variable |
| $\Delta P_m$ | Mechanical power deviation of the hydro turbine | State variable |
| $\Delta P_{EV}$ | Control power deviation of EVs in FR | State variable |
| $\Delta P_g$ | Governor position deviation of the hydro turbine | State variable |
| $U_I$ | Integral of the area control error (ACE) | State variable |
| $\alpha$ | EV owners’ V2G participation factor | State variable |
| $x(t)$ | System state vector | $x(t) = [\Delta f, \Delta P_m, \Delta P_g, \Delta P_{EV}, U_I, \alpha]^T$ |
| $u_{EV}$ | EV charge/discharge control signal | Control input |
| $u_{pp}$ | Primary FR signal from conventional generators | Control input |
| $v$ | External disturbance input | Lumped disturbance |
| $\Delta P_L$ | Disturbance power from loads | Disturbance term |
| $\Delta P_R$ | Disturbance power from renewable energy sources | Disturbance term |
| $A$ | System state matrix | State-space model |
| $B_1$ | Input matrix associated with one control channel | State-space model |
| $B_2$ | Input matrix associated with another control channel | State-space model |
| $E$ | External disturbance matrix | State-space model |
| $H_s$ | Grid inertia constant | System parameter |
| $D$ | Grid damping coefficient | System parameter |
| $T_d$ | Hydro turbine time constant | System parameter |
| $T_g$ | Governor time constant | System parameter |
| $T_E$ | EV controller time constant | System parameter |
| $T_\alpha$ | User behavior inertia time constant | System parameter |
| $R_d$ | Droop coefficient | System parameter |
| $\alpha_{ref}$ | Reference value of EV owners’ V2G participation | System parameter |
| $J$ | Cost function/performance index | Optimization objective |
| $U(\cdot)$ | Utility function | Used in the cost function |
| $H$ | Hamiltonian function | Optimal control formulation |
| $J^*(x)$ | Optimal value function | Optimal control variable |
| $\hat{J}^*(x)$ | Approximated value function | Critic-network approximation |
| $Q$ | State weighting matrix in the cost function | Symmetric positive definite |
| $R_u$ | Control weighting matrix in the cost function | Symmetric positive definite |
| $W$ | Ideal critic-network weight vector | Neural-network parameter |
| $\hat{W}$ | Estimated critic-network weight vector | Neural-network parameter |
| $\tilde{W}$ | Critic weight estimation error | $\tilde{W} = W - \hat{W}$ |
| $\varphi_c(x)$ | Activation-function vector | Critic network |
| $\nabla\varphi_c(x)$ | Gradient of the activation-function vector with respect to the state vector | Neural-network derivative term |
| $\varepsilon_c(x)$ | Approximation error | Bounded neural-network approximation error |
| $e_H$ | HJB residual error | Learning error term |
| $\eta$ | Learning rate | Online learning parameter |
| $\delta$ | Regression term/residual-related term | Learning law parameter |
| $N$ | Number of neurons in the hidden layer | Neural-network structure |
| $\lambda_{\min}$ | Minimum eigenvalue of a positive definite matrix | Stability analysis |
| ISE | Integral of squared error | Performance index |
| IAE | Integral of absolute error | Performance index |
| UBB | Ultimate boundedness bound/weighted error consistency ultimate bound | Performance index |

Abbreviations

The following abbreviations are used in this manuscript:
| Abbreviation | Full term |
|---|---|
| V2G | Vehicle-to-grid |
| EV | Electric vehicle |
| FR | Frequency regulation |
| LFC | Load frequency control |
| IRL | Integral reinforcement learning |
| ADP | Adaptive dynamic programming |
| HJB | Hamilton–Jacobi–Bellman |
| ACE | Area control error |
| EVA | Electric vehicle aggregator |
| DEVC | Distributed EV community |
| EVCN | EV communication network |
| MPC | Model predictive control |
| ADHDP | Action-dependent heuristic dynamic programming |
| DDPG | Deep deterministic policy gradient |

References

1. Muduli, R.; Jena, D.; Moger, T. A survey on load frequency control using reinforcement learning-based data-driven controller. Appl. Soft Comput. 2024, 166, 112203.
2. Gulzar, M.M.; Sibtain, D.; Alqahtani, M.; Alismail, F.; Khalid, M. Load frequency control progress: A comprehensive review on recent development and challenges of modern power systems. Ain Shams Eng. J. 2025, 16, 103168.
3. Wadi, M.; Shobole, A.; Elmasry, W.; Kucuk, I. Load frequency control in smart grids: A review of recent developments. Renew. Sustain. Energy Rev. 2024, 189, 114013.
4. Rahman, M.; Sarker, S.K.; Das, S.K.; Ali, M.F. Model predictive control framework design for frequency regulation of PHEVs participating in interconnected smart grid. In 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE); IEEE: New York, NY, USA, 2019; pp. 1–6.
5. Nguyen, H.T.; Choi, D.H. Distributionally robust model predictive control for smart electric vehicle charging station with V2G/V2V capability. IEEE Trans. Smart Grid 2023, 14, 4621–4634.
6. Sun, J.; Tan, S.; Zheng, H.; Qi, G.; Tan, S.; Peng, D.; Guerrero, J.M. A DoS attack-resilient grid frequency regulation scheme via adaptive V2G capacity-based integral sliding mode control. IEEE Trans. Smart Grid 2023, 14, 3046–3057.
7. Mu, C.; Liu, W.; Xu, W. Hierarchically adaptive frequency control for an EV-integrated smart grid with renewable energy. IEEE Trans. Ind. Inform. 2018, 14, 4254–4265.
8. Kumar, N.; Tyagi, B.; Kumar, V. Approximate dynamic programming based controller design for interconnected AGC scheme. In 2015 IEEE Region 10 Conference (TENCON); IEEE: New York, NY, USA, 2015; pp. 1–6.
9. Mu, C.; Wang, K.; Ma, S.; Chong, Z.; Ni, Z. Adaptive composite frequency control of power systems using reinforcement learning. CAAI Trans. Intell. Technol. 2025, 7, 671–684.
10. Song, R.; Lewis, F.L.; Wei, Q.; Zhang, H. Off-policy actor-critic structure for optimal control of unknown systems with disturbances. IEEE Trans. Cybern. 2016, 46, 1041–1053.
11. Xie, H.; Song, G.; Shi, Z.; Zhang, J.; Lin, Z.; Yu, Q.; Fu, H.; Song, X.; Zhang, H. Reinforcement learning for vehicle-to-grid: A review. Sustain. Futures 2025, 8, 100369.
12. Shi, J.; Peng, C.; Zhang, J.; Xie, X. Model-free frequency regulation in islanded microgrids: An event-triggered adaptive dynamic programming approach. Int. J. Electr. Power Energy Syst. 2024, 155, 109635.
13. Alamgir, S.; Hassan, S.J.U.; Mehdi, A.; Abdelmaksoud, A.; Haider, Z.; Shin, G.-S.; Kim, C.-H. A comprehensive review of vehicle-to-grid (V2G) technology as an ancillary services provider. Results Eng. 2025, 27, 106813.
14. Bakhuis, J.; Barbour, N.; Chappin, É.J.L. Exploring user willingness to adopt vehicle-to-grid (V2G): A statistical analysis of stated intentions. Energy Policy 2025, 203, 114619.
15. Chen, G.; Zhang, Z. Control strategies, economic benefits, and challenges of vehicle-to-grid applications: Recent trends research. World Electr. Veh. J. 2024, 15, 190.
16. Tang, R.; Mak, H.-Y.; Rong, Y. Collaborative vehicle-to-grid operations in frequency regulation markets. Manuf. Serv. Oper. Manag. 2024, 26, 814–833.
17. Avila-Becerril, S.; Espinosa-Pérez, G.; Machado, J.E. A Hamiltonian control approach for electric microgrids with dynamic power flow solution. Automatica 2022, 139, 110192.
18. Tõnso, M.; Kaparin, V.; Belikov, J. Port-Hamiltonian framework in power systems domain. Energy Rep. 2023, 10, 2918–2930.
19. Jang, M.-J.; Oh, E. Deep-reinforcement-learning-based vehicle-to-grid operation strategies for managing solar power generation forecast errors. Sustainability 2024, 16, 3851.
20. Li, T.; Tao, S.; He, K.; Lu, M.; Xie, B.; Yang, B.; Sun, Y. V2G Multi-Objective Dispatching Optimization Strategy Based on User Behavior Model. Front. Energy Res. 2021, 9, 739527.
21. Abouheaf, M.; Gueaieb, W. Model-free adaptive control approach using integral reinforcement learning. In 2019 IEEE International Instrumentation and Measurement Technology Conference (I2MTC); IEEE: New York, NY, USA, 2019; pp. 1–6.
22. Alfaverh, F.; Denaï, M.; Sun, Y. Optimal vehicle-to-grid control for supplementary frequency regulation using deep reinforcement learning. Electr. Power Syst. Res. 2023, 214, 108949.
23. Igarashi, K.; Takami, S.; Hayashi, Y. Development and Analysis of an Integrated Optimization Model for Variable Renewable Energy and Vehicle-to-Grid in Remote Islands: A Case Study of Tanegashima, Japan. Energies 2025, 18, 5933.
24. Xue, L.; Xia, J. Simulator to Quantify and Manage Electric Vehicle Load Impacts on Low-Voltage Distribution Grids; WRI China Technical Note; World Resources Institute: Washington, DC, USA, 2021.
25. Gan, W.; Zhou, Y.; Wu, J. Quantifying grid flexibility provision of virtual vehicle-to-vehicle energy sharing using statistically similar networks. Appl. Energy 2025, 390, 125818.
Figure 1. Three-layer Grid–Aggregator–EV architecture for distributed V2G FR.
Figure 2. Overall structure of the proposed participation-aware V2G frequency-regulation system.
Figure 3. Overall workflow of the proposed participation-aware V2G frequency-regulation methodology.
Figure 4. The summation of the power disturbances from loads and renewable resources.
Figure 5. Single-area IEEE 14-bus test system used for validation of the proposed method.
Figure 6. Convergence of critic-network learning error in the IEEE 14-bus system.
Figure 7. Comparison of frequency deviation responses under IRL and participation-aware α-IRL in the IEEE 14-bus system.
Figure 8. Evolution of critic-network learning errors in the IEEE 39-bus multi-area system: (a) Area 1; (b) Area 2; (c) Area 3.
Figure 9. Multi-area frequency deviation responses under different control methods: (a) Area 1; (b) Area 2; (c) Area 3.
Figure 10. Time-varying participation factor α over 17:00–20:00.
Figure 11. Frequency deviation under constant-α and dynamic-α-aware cases.
Table 1. Parameter configuration for the single-area IEEE 14-bus system.
| Symbol | Definition | Value |
|---|---|---|
| $H_s$ | Grid inertia constant (pu/Hz) | 11 |
| $R_d$ | Droop coefficient (Hz/pu) | 0.05 |
| $D$ | Grid damping coefficient (pu/Hz) | 1.4 |
| $T_g$ | Governor time constant (s) | 0.15 |
| $T_d$ | Hydro turbine time constant (s) | 0.30 |
| $T_E$ | EV controller time constant (s) | 0.02 |
| $T_\alpha$ | User behavior inertia time constant (s) | 60 |
| $\alpha_{ref}$ | Reference value of EV owners’ V2G participation | 0.60 |
Table 2. Parameter configuration for the three-area IEEE 39-bus system.
| Symbol | Definition | Area 1 | Area 2 | Area 3 |
|---|---|---|---|---|
| $H_s$ | Grid inertia constant (pu/Hz) | 10 | 10 | 12 |
| $R_d$ | Droop coefficient (Hz/pu) | 0.05 | 0.05 | 0.05 |
| $D$ | Grid damping coefficient (pu/Hz) | 1.0 | 1.5 | 1.8 |
| $T_g$ | Governor time constant (s) | 0.10 | 0.17 | 0.20 |
| $T_d$ | Hydro turbine time constant (s) | 0.30 | 0.40 | 0.35 |
| $T_E$ | EV controller time constant (s) | 0.02 | 0.02 | 0.02 |
| $T_\alpha$ | User behavior inertia time constant (s) | 60 | 70 | 80 |
| $\alpha_{ref}$ | Reference value of EV owners’ V2G participation | 0.60 | 0.55 | 0.65 |
Table 3. Performance Comparison of IEEE-14.
Groups | ISE | IAE | UBB
IRL | 15.402 | 43.305 | 0.601
α-IRL | 1.678 | 14.370 | 0.203
Table 4. Performance Comparison of IEEE-39.
Groups | ISE | IAE | UBB
MPC | 0.863 | 9.829 | 0.183
ADHDP | 0.974 | 10.586 | 0.263
DDPG | 1.067 | 11.124 | 0.256
α-IRL | 0.077 | 3.064 | 0.064
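The ISE and IAE columns in Tables 3 and 4 are conventionally read as the integral of squared error and the integral of absolute error of the frequency deviation. The sketch below shows how such indices could be computed from a uniformly sampled deviation trace, and how a relative reduction between two methods follows from the tabulated values. It assumes the standard textbook definitions and a trapezoidal approximation; the test signal, the sampling step, and all function names are hypothetical and are not taken from the paper. UBB is not defined in this section, so it is left out.

```python
import math


def trapezoid(values, dt):
    """Composite trapezoidal rule for uniformly spaced samples."""
    if len(values) < 2:
        return 0.0
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def ise(errors, dt):
    """Integral of squared error: integral of e(t)^2 dt (assumed definition)."""
    return trapezoid([e * e for e in errors], dt)


def iae(errors, dt):
    """Integral of absolute error: integral of |e(t)| dt (assumed definition)."""
    return trapezoid([abs(e) for e in errors], dt)


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Hypothetical deviation trace: delta_f(t) = 0.1 * exp(-t) on [0, 10] s.
# This is an illustrative signal, not data from the simulations.
dt = 0.001
samples = [0.1 * math.exp(-k * dt) for k in range(10001)]

ise_value = ise(samples, dt)
iae_value = iae(samples, dt)

# Relative ISE reduction of alpha-IRL against MPC, using the Table 4 entries.
ise_gain_vs_mpc = relative_reduction(0.863, 0.077)

print(f"ISE = {ise_value:.6f}, IAE = {iae_value:.6f}")
print(f"ISE reduction vs. MPC = {ise_gain_vs_mpc:.1f}%")
```

For the exponential test signal the two indices can be checked against their closed forms, which is a convenient sanity test before applying the same functions to simulated frequency traces.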

Liang, C. Distributed V2G Grid Frequency Regulation Considering EV Owner Participation via Cooperative Integral Reinforcement Learning. Symmetry 2026, 18, 824. https://doi.org/10.3390/sym18050824