1. Introduction
With the continuous increase in renewable-energy penetration, power systems are facing stronger frequency fluctuations due to the intermittency and uncertainty of wind and solar generation. Although conventional generating units remain the main source of frequency support, their response speed and regulation flexibility are often insufficient under highly dynamic operating conditions. Large-scale battery energy storage can provide fast support, but its deployment cost is still considerable. In this context, vehicle-to-grid (V2G) technology has attracted growing attention because electric vehicles (EVs) can act as distributed and fast-response energy resources for ancillary services, including frequency regulation (FR). Recent reviews have further shown that modern FR and load frequency control (LFC) research is evolving from classical control toward intelligent, data-driven, and resilient regulation frameworks for renewable-rich power systems [1,2,3].
Existing FR strategies mainly include classical feedback control, such as proportional–integral–derivative (PID) control, and optimization-based methods, such as model predictive control (MPC). These methods have provided useful baselines for improving system frequency quality and coordinating power support from controllable resources [4,5]. However, their effectiveness in large-scale V2G-integrated power systems is often limited by model dependence, online computational burden, and insufficient adaptability to strong nonlinearities and uncertain disturbances. Recent state-of-the-art reviews on load frequency regulation have emphasized that these limitations become more pronounced in interconnected and renewable-dominated systems, where modern intelligent and data-driven approaches are increasingly required [1,2].
To enhance adaptability and control performance under uncertainty, a variety of intelligent control methods have been introduced into FR problems, including integral sliding mode control, hierarchical adaptive control, adaptive dynamic programming, and reinforcement learning [6,7,8,9,10]. Among them, reinforcement-learning-based and approximate dynamic programming methods are particularly attractive because they can approximate optimal policies online without explicitly solving the Hamilton–Jacobi–Bellman (HJB) equation. Recent surveys have also shown that reinforcement learning has become an important direction for both load frequency regulation and V2G-oriented scheduling under uncertainty [1,11]. Nevertheless, most existing studies either focus on controller adaptability without explicitly modeling constraints imposed by the EV participation factor, or consider V2G scheduling without sufficiently addressing robust closed-loop FR in distributed multi-layer architectures. Representative recent studies have further shown that model-free or adaptive-dynamic-programming-based methods can be applied directly to FR problems, for example in islanded microgrids under uncertain operating conditions, which reinforces the relevance of learning-based regulation strategies in frequency-control applications [12].
In parallel, V2G has been widely recognized as a promising resource for ancillary services because EV batteries provide rapid bidirectional power support. Recent review studies have further emphasized that V2G technology can serve as an ancillary-services provider for renewable-rich power systems, offering services such as FR, voltage support, spinning reserve, and peak-load support, which strengthens the practical motivation for participation-aware V2G FR [13]. However, practical V2G participation is fundamentally constrained by user-side factors, including travel demand, charging flexibility, battery degradation concerns, and economic incentives. Recent studies have shown that users’ willingness to participate in V2G is strongly influenced by financial returns, perceived loss of flexibility, and battery-health concerns [14,15]. In addition, collaborative V2G market studies have highlighted that heterogeneous driving schedules and incentive design significantly affect the dispatchable V2G capacity available for FR [16]. Therefore, treating the EV participation factor as a fixed or preset parameter may lead to a mismatch between theoretical regulation capacity and practically available V2G support.
From a control-theoretic perspective, Hamiltonian- and energy-based formulations provide a rigorous framework for deriving optimal control laws in constrained power and microgrid systems [17,18]. Meanwhile, recent deep-reinforcement-learning-based V2G studies have demonstrated the potential of learning-based strategies for handling operational uncertainty and dynamic charging/discharging decisions [19]. These developments indicate that combining Hamiltonian-based optimality with online learning is a promising direction for participation-aware V2G FR. However, the integration of EV-owner participation dynamics, distributed Grid–Aggregator–EV coordination, and online optimal robust FR remains insufficiently explored in the current literature.
Despite the above progress, several research gaps remain. First, most existing V2G frequency-regulation studies still simplify the EV participation factor as a fixed or exogenous parameter, without dynamically characterizing the impacts of user willingness and practical availability on dispatchable regulation capacity [14,15,16]. Second, current FR strategies rarely unify participation-aware V2G capacity constraints, conventional-generator coordination, and disturbance-rejection requirements within a single optimal control framework [1,2]. Third, although reinforcement-learning-based methods improve adaptability, the combination of Hamiltonian-based optimality analysis and online critic learning for distributed V2G FR is still limited [11,16,19].
To address the aforementioned challenges, this paper proposes a participation-aware V2G-based FR method that explicitly incorporates EV owners’ participation factor into the control design. First, a power-grid FR model incorporating the EV participation factor is established to characterize the constrained relationship between the participation factor and dispatchable V2G regulation capacity. Second, a Hamiltonian function is constructed to derive the optimal control law. Finally, a critic-neural-network-based online learning algorithm is designed to update the network weights and implement the control policy online. The proposed method ultimately achieves fast and stable frequency regulation while accounting for EV participation and enhancing system robustness. The main contributions of this paper can be summarized in the following two aspects:
A distributed multi-layer V2G FR architecture, including the power-grid side, aggregator side, and EV side, is established to achieve collaborative FR among multiple controllers. Meanwhile, the EV participation factor is quantified in the developed power-grid model to characterize the dispatchable V2G regulation capacity, thereby improving V2G FR performance.
This paper presents a collaborative integral reinforcement learning control scheme that is model-free with respect to the drift dynamics in the online implementation. By simultaneously incorporating V2G FR signals and power grid control signals, the proposed utility function enables optimal collaborative FR across various types of FR signals, which in turn further optimizes the FR performance.
To make the objective of this study more explicit, the research hypotheses of this paper are stated as follows:
H1. Explicit modeling of EV owners’ participation factor improves the practical dispatchability of V2G resources and leads to better frequency-regulation performance than participation-agnostic control.
H2. A Hamiltonian-based optimal control framework combined with integral reinforcement learning can provide a robust and adaptive regulation policy for distributed V2G FR under uncertain disturbances.
The remainder of this paper is organized as follows:
Section 2 elaborates on the establishment of the power grid FR model and problem description.
Section 3 presents the FR controller, learning algorithm, and stability analysis based on EV owners’ V2G participation factor.
Section 4 verifies the effectiveness, superiority, and practical applicability of the proposed method through simulation results in single-area, multi-area, and time-varying participation scenarios.
Section 5 summarizes the research findings of this paper.
2. Model Construction and Problem Description
2.1. Power System Model
This section establishes the participation-aware dynamic model from the physical Grid–Aggregator–EV architecture and provides the basis for the subsequent control design.
As shown in Figure 1, a three-layer FR architecture for distributed V2G is established, consisting of the power grid side, aggregation side, and EV side. The power grid side serves as the initiation and regulation level for FR demands, leading the overall planning of global FR resources. The aggregation side comprises multiple EV Aggregators (EVAs), which aggregate regional FR resources through electrical and physical connections, and complete cross-regional information interaction and collaborative decision-making via the EV Communication Network (EVCN). The EV side takes Distributed EV Communities (DEVCs) as units; within each community, EVs form an autonomous entity through local communication networks, and while receiving instructions from the aggregation side, they feed back their own FR capabilities and status information. Through bidirectional interaction between electrical-physical connections and data-information connections, the three-layer architecture realizes multi-level FR resource scheduling and collaboration from the power grid to individual EVs, providing architectural support for the efficiency and flexibility of power system FR in distributed V2G scenarios.
Figure 2 shows the structure of the proposed frequency-regulation system with V2G. In this framework, the conventional-generator channel and the aggregated EV channel collaboratively provide regulation support, while EV owners’ participation factor constrains the dispatchable V2G capacity. Therefore, the system explicitly links participation-aware V2G availability with coordinated frequency-control action.
Based on the above physical architecture, the corresponding mathematical model is established as follows. The power-grid layer determines the global frequency-regulation task and provides the system frequency deviation to be suppressed. The aggregator layer collects participation-related information and available regulation capacity from distributed EV communities and converts them into aggregated regulation actions. The EV layer provides practical charging/discharging flexibility subject to the user-side participation factor and operational availability. Accordingly, the system frequency deviation, turbine-governor dynamics, aggregated EV regulation power, integral regulation state, and participation factor are selected as the main state variables of the mathematical model. In this study, the primary controlled variable is the system frequency deviation, while the conventional-generator regulation signal and the EV charge/discharge regulation signal are treated as the two manipulated inputs. External load and renewable-energy fluctuations are modeled as lumped disturbances.
For the main controller, the frequency dynamics of the equivalent regulator, steam turbine, and power system, as well as the aggregated V2G power output and EV owners’ participation factor, are described by the following differential equations:
The state variables in the above equations are, in order, the frequency deviation, the mechanical power deviation of the hydro turbine, the control power deviation of the EVs participating in FR, the governor position deviation of the hydro turbine, the integral of the area control error (ACE), and the V2G participation factor of EV owners. The two disturbance terms denote the disturbance power originating from loads and from renewable energy sources, respectively. The model parameters are the grid inertia constant, grid damping coefficient, hydro turbine time constant, governor time constant, EV controller time constant, governor control input deviation coefficient, and user behavior inertia time constant. The two control inputs are the EV charge/discharge control signal and the primary FR signal from conventional generators.
Define the system state vector as:
Accordingly, the overall system dynamics can be rewritten in the following state-space form:
Here, the state-space form comprises the system state matrix, the two input matrices associated with the two control channels, and the external disturbance matrix.
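As a rough illustration of such a state-space form, the following sketch assembles a single-area model with a conventional-generator channel and an aggregated EV channel. It is not the paper's exact model: the five-state ordering, the droop constant `R`, the ACE gain `beta`, every numeric value, and the choice to treat the participation factor `alpha` as a fixed gain on the EV input channel (instead of a dynamic state) are assumptions made only for this example. The step size of 0.01 s matches the sampling period used in the simulations.

```python
import numpy as np

def build_model(M=0.2, D=0.02, T_t=0.4, T_g=0.1, T_e=0.05,
                R=2.5, beta=0.4, alpha=0.6):
    """Return (A, B_g, B_e, E) for x = [df, dP_m, dX_g, dP_e, dI].

    All parameter values are illustrative placeholders, not the paper's data.
    """
    A = np.array([
        [-D / M, 1 / M, 0.0, 1 / M, 0.0],           # frequency deviation
        [0.0, -1 / T_t, 1 / T_t, 0.0, 0.0],         # turbine mechanical power
        [-1 / (R * T_g), 0.0, -1 / T_g, 0.0, 0.0],  # governor position
        [0.0, 0.0, 0.0, -1 / T_e, 0.0],             # aggregated EV power
        [beta, 0.0, 0.0, 0.0, 0.0],                 # integral of ACE
    ])
    B_g = np.array([0.0, 0.0, 1 / T_g, 0.0, 0.0])      # generator input channel
    B_e = np.array([0.0, 0.0, 0.0, alpha / T_e, 0.0])  # EV channel, scaled by alpha
    E = np.array([1 / M, 0.0, 0.0, 0.0, 0.0])          # lumped net-power disturbance
    return A, B_g, B_e, E

def euler_step(x, u_g, u_e, w, model, dt=0.01):
    """One forward-Euler step of x' = A x + B_g u_g + B_e u_e + E w."""
    A, B_g, B_e, E = model
    return x + dt * (A @ x + B_g * u_g + B_e * u_e + E * w)

model = build_model()
x0 = np.zeros(5)
# A positive w is a net power surplus, so the frequency deviation rises first.
x1 = euler_step(x0, u_g=0.0, u_e=0.0, w=0.1, model=model)
```

In this simplified picture, lowering `alpha` shrinks the gain of the EV input channel, so the same EV command produces less regulation power and more of the burden falls on the generator channel.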
In this study, the primary controlled variable is the system frequency deviation, while the integral of the area control error and the aggregated EV regulation power are included as auxiliary regulation-related states to improve dynamic frequency-recovery performance. The two manipulated inputs are the conventional-generator control signal and the EV charge/discharge control signal.
The communication links shown in Figure 1 introduce finite delays in the transmission of regulation commands and participation-related information between the grid layer, aggregator layer, and EV layer. In this work, these delays are assumed to be bounded and relatively small compared with the dominant frequency-regulation time scale. Therefore, their aggregated effect is incorporated into the regulation framework through the existing actuator/response dynamics of the aggregated EV channel and the participation-update process, rather than being treated as an independent large-delay networked control problem. Under this assumption, the proposed model captures the practical latency effect in a simplified but implementation-oriented manner.
Here, the EV-side time constant is used to capture the aggregated response lag of the V2G regulation channel, including the effect of local communication, command execution, and power-response delay at the aggregator–EV interface.
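The reasoning behind absorbing a small delay into a first-order lag can be sketched as follows. The symbols are assumptions for illustration: $T_e$ stands for the EV-side time constant and $\tau$ for the lumped communication and execution delay. The approximation holds to first order in $s$, that is, for frequencies at which $|\tau s| \ll 1$.

```latex
\frac{e^{-\tau s}}{1 + T_e s}
\;\approx\; \frac{1 - \tau s}{1 + T_e s}
\;\approx\; \frac{1}{1 + (T_e + \tau)\, s},
\qquad |\tau s| \ll 1 .
```

Under this reading, a bounded small delay simply lengthens the effective time constant of the aggregated EV channel, which is consistent with treating it as part of the channel's response lag.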
The first-order dynamic equation for EV owners’ V2G participation factor is calibrated based on 320 scenario-based questionnaires from first-tier cities and Monte Carlo simulations of 300,000 EVs in [20], which aligns with behavioral characteristics of vehicle owners such as SOC thresholds and travel patterns. The V2G participation factor α quantifies the willingness of EV owners to participate in grid FR. A higher α indicates a larger energy range available for V2G to support grid FR. A lower α indicates a smaller energy range available for V2G to support grid FR, making the system more reliant on conventional generators.
Therefore, the physical frequency-regulation problem is transformed into a participation-aware state-space control problem with dual control channels, namely conventional generator regulation and aggregated EV regulation.
2.2. Problem Description
With the growing integration of renewable energy sources into power grids, frequency variations resulting from their intermittent characteristics pose a significant threat to grid stability, and in extreme cases, may trigger the collapse of the power system. Traditional solutions rely on energy storage devices such as batteries for reverse power supply, but large-scale deployment entails substantial costs. Meanwhile, with the development of new energy technologies, an increasing number of EVs have entered daily life. Thanks to the rapid advancement of V2G technology, EVs are regarded as distributed energy storage devices. Beyond meeting travel needs, EV batteries can fulfill energy storage functions through V2G technology. Moreover, since the charging and discharging process of EVs involves electromagnetic and chemical reactions rather than mechanical processes, EVs respond faster than power plants during FR. In summary, compared with traditional energy storage devices, EV batteries offer the advantages of low cost, convenient use, and fast response speed. However, this framework faces three core challenges. First, EVs are primarily used to satisfy travel demand, and frequent charging and discharging may accelerate battery degradation, leading to low and uncertain EV participation. Therefore, improving and characterizing the V2G participation factor becomes a key issue. Second, EVs are widely distributed, which makes centralized power dispatch difficult; thus, the coordinated scheduling of large-scale distributed resources must be addressed. Third, multiple control strategies coexist in the power grid, and their power allocation must be coordinated to achieve efficient FR.
3. Collaborative FR Based on Integral Reinforcement Learning
The overall methodological workflow of the proposed participation-aware V2G frequency-regulation framework is summarized in Figure 3.
The proposed methodology is organized into four main stages. First, a participation-aware dynamic model is established from the physical Grid–Aggregator–EV architecture, in which the EV participation factor is incorporated into the state-space representation of FR. Second, a constrained optimal control problem is formulated so that the regulation objective remains consistent with frequency-quality requirements and practical V2G availability. Third, a Hamiltonian-based regulation law is derived and implemented online through critic-network-based integral reinforcement learning. Fourth, the resulting closed-loop properties and the research hypotheses are examined through stability analysis and simulation-based validation in both single-area and multi-area test systems.
3.1. Optimal Control Objective
This subsection formulates the constrained optimal control problem so that FR remains consistent with practical V2G availability and participation-aware resource limits.
To support Hypotheses H1 and H2, the control objective is formulated to jointly penalize frequency deviation, control effort, and the limitation of dispatchable V2G capacity induced by the EV participation factor.
The saturation of the available V2G FR capacity is defined as follows:
The objective function is designed to reflect the physical regulation task shown in Figure 1. Specifically, it penalizes frequency-related state deviations to ensure frequency quality at the grid layer, penalizes control effort to avoid excessive regulation burden on conventional generators and EV resources, and incorporates the limitation of dispatchable V2G capacity so that the optimization remains consistent with the practical participation-aware availability provided by the aggregator and EV layers.
The optimal control minimizes the cost function at the saddle point, as shown in Equation (6).
The utility function is defined as follows:
Here, the two weighting matrices are given symmetric positive definite matrices. The remaining term is a positive semi-definite function that penalizes the V2G FR signal so that it respects the following constraint on the available V2G FR capacity:
Here, the saturation mapping belongs to the class of bounded, monotonic, odd functions, such as the hyperbolic tangent function.
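One widely used way to encode an input bound through a hyperbolic-tangent saturation is the nonquadratic penalty from the constrained-input optimal control literature. The sketch below evaluates that penalty for a scalar EV signal, once in closed form and once by numerical integration. Whether the paper uses exactly this expression is an assumption, and the names `lam` (capacity bound) and `r` (control weight) are illustrative.

```python
import math

def penalty_closed_form(u, lam, r=1.0):
    """W(u) = 2*lam*r * integral_0^u atanh(v/lam) dv, valid for |u| < lam."""
    z = u / lam
    return 2.0 * lam * r * u * math.atanh(z) + lam**2 * r * math.log(1.0 - z * z)

def penalty_numeric(u, lam, r=1.0, n=20000):
    """The same integral evaluated with the composite trapezoidal rule."""
    h = u / n
    total = 0.5 * (math.atanh(0.0) + math.atanh(u / lam))
    for k in range(1, n):
        total += math.atanh(k * h / lam)
    return 2.0 * lam * r * h * total

lam = 0.2  # capacity bound in per unit, the value used in the simulation section
w_small = penalty_closed_form(0.05, lam)  # well inside the bound: close to r*u**2
w_large = penalty_closed_form(0.19, lam)  # near the bound: grows much faster
```

The penalty behaves like a quadratic cost for small signals and grows steeply as the signal approaches the bound, which is what keeps the resulting control law inside the available capacity.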
The Hamiltonian function is constructed as follows:
Let the optimal value function be defined as the optimal performance index; then the following HJB equation holds:
Taking the partial derivative of the Hamiltonian function with respect to one control input and setting it equal to zero yields:
Similarly, taking the partial derivative of the Hamiltonian function with respect to the other control input and setting it equal to zero yields:
Therefore, the optimal control can be obtained from (10), where the superscript * denotes the optimal solution.
The optimal control laws may be formulated as:
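For orientation, in zero-sum (saddle-point) formulations with one unconstrained input and one tanh-saturated input, the Hamiltonian and its stationarity conditions typically take the form below. This is a sketch under assumed notation, not a reproduction of the paper's equations: $Q$, $R_g$, and $R_e$ are weighting matrices, $\gamma$ is a disturbance-attenuation level, $\lambda$ is the V2G capacity bound, $B_g$, $B_e$, and $E$ are the generator-input, EV-input, and disturbance matrices, and $W(u_e)$ is a nonquadratic penalty on the EV signal.

```latex
\begin{aligned}
H &= x^{\top} Q x + u_g^{\top} R_g u_g + W(u_e) - \gamma^{2} w^{\top} w
     + \nabla V^{\top}\!\left(A x + B_g u_g + B_e u_e + E w\right), \\
W(u_e) &= 2\lambda \int_{0}^{u_e} \tanh^{-1}\!\left(v/\lambda\right)^{\top} R_e \, dv, \\
\frac{\partial H}{\partial u_g} = 0 \;&\Rightarrow\; u_g^{*} = -\tfrac{1}{2} R_g^{-1} B_g^{\top} \nabla V^{*}, \\
\frac{\partial H}{\partial u_e} = 0 \;&\Rightarrow\; u_e^{*} = -\lambda \tanh\!\left(\tfrac{1}{2\lambda} R_e^{-1} B_e^{\top} \nabla V^{*}\right), \\
\frac{\partial H}{\partial w} = 0 \;&\Rightarrow\; w^{*} = \tfrac{1}{2\gamma^{2}} E^{\top} \nabla V^{*}.
\end{aligned}
```

In this form the drift matrix $A$ appears in the Hamiltonian but not in any of the three stationarity solutions, which is the sense in which such control laws can be implemented without explicit knowledge of the drift dynamics.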
Following the definition in [21], “model-free” means that the online learning and control implementation do not require explicit knowledge of the system dynamics. In this paper, the drift dynamics matrix of the participation-aware frequency-regulation system is distinguished from the two input-channel matrices. Since the final IRL-based control laws do not explicitly depend on the drift dynamics matrix, the proposed controller can be regarded as model-free with respect to the drift dynamics.
Based on the above state-space model and constrained regulation objective, a Hamiltonian function is constructed to derive the optimal regulation law. The main contribution here is that the practical V2G participation limitation is embedded into the optimal control formulation, rather than treated as an exogenous fixed coefficient. This enables the resulting controller to better reflect the physically available V2G support in real operation.
3.2. V2G Optimal FR and Online Learning Algorithm
This part implements the Hamiltonian-based control law online through critic-network-based integral reinforcement learning.
Since the optimal value function cannot be solved analytically, the Hamiltonian-based control law is implemented online through a critic neural network within an integral reinforcement learning framework. In the proposed implementation, the online policy update is carried out without explicit online dependence on the system drift matrix. Therefore, the learning module serves not only as a computational realization of the optimal control design, but also as the key mechanism that enables a model-free online implementation with respect to the drift dynamics.
Since the bounded communication-induced latency is reflected in the aggregated EV-channel dynamics, the optimization and online learning process are carried out on the participation-aware delayed-response system rather than on an ideal instantaneous-response V2G model.
The calculation of the optimal control laws depends on the gradient of the optimal value function, as shown in Equations (14) and (15). This gradient cannot be computed explicitly, because doing so would require future values of the utility function. Accordingly, an adaptive neural-network-based critic scheme is employed to solve for the two control laws, where one neural network approximates the value function [10]. The critic neural network can reconstruct the value function using ideal neural weights, as follows:
Here, the dimension of the ideal weight vector equals the number of neurons in the hidden layer, the system state serves as the input to the critic neural network, the hidden layer is described by the activation-function vector, and the residual term is the bounded neural-network approximation error. Owing to the approximation capability of neural networks, this error can be made small over a compact operating region. The gradient of the value function can be deduced from Equation (16) as follows:
Here, the gradients of the activation-function vector and of the approximation error are taken with respect to the state vector. As defined in Equation (17), the critic neural network parameterizes the value function so that it satisfies the HJB equation at optimality. Consequently, an optimized robust controller can be obtained from the critic neural network by learning the ideal weights. Because the ideal weights are unknown, an estimated weight vector is constructed to approximate them, and the value function can be approximated as follows:
During online learning, the two approximated control laws, each perturbed by a small random exploration noise, are deployed as FR signals. Even though the approximated worst-case disturbance may not equal the actual disturbance, the approximated worst-case disturbance is used in online learning; thus, both control laws are optimized under the worst-case disturbance, which enhances the robustness of V2G-based FR.
Therefore, the derivative of the approximated value function with respect to the state vector is estimated as follows:
Substituting Formula (19) into Formulas (14) and (15), the two control laws can be approximated as follows:
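A minimal sketch of how control laws of this kind can be evaluated from critic weights is given below. It assumes a quadratic activation vector, scalar input channels, and the standard unconstrained and tanh-saturated forms; the function names, the weights `r_g` and `r_e`, the bound `lam`, the attenuation level `gamma`, and the toy channel vectors are all hypothetical and are not taken from the paper.

```python
import numpy as np

def quad_basis_grad(x):
    """Jacobian of phi(x) = [x_i * x_j for i <= j], shape (n_basis, n)."""
    n = x.size
    rows = []
    for i in range(n):
        for j in range(i, n):
            g = np.zeros(n)
            g[i] += x[j]
            g[j] += x[i]  # for i == j this gives 2 * x_i in total
            rows.append(g)
    return np.array(rows)

def approx_policies(x, w_hat, B_g, B_e, E, r_g=1.0, r_e=1.0, lam=0.2, gamma=5.0):
    """Evaluate generator law, saturated EV law, and worst-case disturbance."""
    grad_v = quad_basis_grad(x).T @ w_hat  # estimated gradient of the value function
    u_g = -0.5 / r_g * (B_g @ grad_v)
    u_e = -lam * np.tanh(0.5 / (lam * r_e) * (B_e @ grad_v))
    w = 0.5 / gamma**2 * (E @ grad_v)
    return u_g, u_e, w

# Toy three-state example with hypothetical channel vectors.
B_g = np.array([0.0, 1.0, 0.0])
B_e = np.array([0.0, 0.0, 1.0])
E = np.array([1.0, 0.0, 0.0])
x = np.array([0.5, -0.2, 0.1])
w_hat = 50.0 * np.ones(6)  # deliberately large weights to push the EV law into saturation
u_g, u_e, w = approx_policies(x, w_hat, B_g, B_e, E)
```

Even with exaggerated critic weights, the EV command stays within the bound `lam`, while the generator command is unconstrained. None of the three expressions uses the drift matrix.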
The HJB residual error is as follows:
An objective function is introduced to quantify the distance to optimality. The learning algorithm performs gradient descent on this objective function with respect to the estimated critic weights, leading to the following learning dynamics:
Here, the two gain parameters are learning rates, and the remaining term is the regression term. Once the estimated weights converge, they yield an optimized V2G FR controller, as given in Equations (20) and (21). Algorithm 1 summarizes the optimal V2G FR and online learning procedure.
| Algorithm 1 V2G Optimal FR and Online Learning Algorithm |
| Initialize the critic neural network weight and learning rate |
| for each sampling time t do |
| Obtain the system state from the EV aggregator; |
| Update the critic weight estimate by Equation (19) according to the measured data; |
| Compute the two FR control signals by Equations (20) and (21), respectively; |
| end for |
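The weight-update step inside such a loop can be sketched with a normalized gradient-descent rule on the squared integral-reinforcement residual. This is a common simplified form, not the paper's learning law, which involves two learning rates and a regression term; the function name and the sample numbers are hypothetical, and only the rate of 0.2 is taken from the simulation settings.

```python
import numpy as np

def critic_update(w_hat, phi_now, phi_prev, reward_integral, rate=0.2):
    """One normalized gradient step on the squared IRL residual.

    residual e = (utility integrated over [t - T, t])
                 + w_hat . (phi(x(t)) - phi(x(t - T)))
    Returns the updated weights and the residual before the update.
    """
    sigma = phi_now - phi_prev
    e = reward_integral + w_hat @ sigma
    norm = (1.0 + sigma @ sigma) ** 2
    return w_hat - rate * sigma * e / norm, e

# One illustrative sample: activation values at two sampling instants and
# the utility accumulated between them.
w0 = np.zeros(3)
phi_prev = np.array([1.0, 0.5, 0.25])
phi_now = np.array([0.64, 0.32, 0.16])
w1, e_before = critic_update(w0, phi_now, phi_prev, reward_integral=0.3)
e_after = 0.3 + w1 @ (phi_now - phi_prev)  # residual re-evaluated on the same sample
```

The update only needs measured states and the accumulated utility over one sampling interval, which is why no drift matrix appears. Each step shrinks the residual on the sample it was computed from, and the normalization keeps the step bounded when the regressor is large.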
3.3. Stability Analysis
The purpose of this subsection is to establish the boundedness of the closed-loop state trajectory and critic-learning error under the conditions required by the Lyapunov-based analysis, rather than to claim strict asymptotic stability under arbitrary operating conditions.
This subsection analyzes the stability of the proposed V2G FR system based on the standard Lyapunov extension theorem. As is common in neural-network-based approximate optimal control and integral reinforcement learning analysis, several boundedness assumptions are introduced to facilitate the Lyapunov-based derivation of the closed-loop stability properties. These assumptions are used to establish the boundedness of the closed-loop states and critic-learning dynamics under the considered operating conditions. To mathematically derive the corresponding stability result, the following assumptions are made:
Assumption 1. The ideal critic-network weight vector is bounded in norm by a positive constant. This assumption is standard in critic-network-based approximation analysis and ensures that the critic-network representation of the value function remains well posed in the compact operating region considered in the Lyapunov analysis.
Assumption 2. The neural-network approximation error is bounded in norm by a positive constant. Since the state trajectory is considered within a compact operating region and the critic neural network uses continuous activation functions, this boundedness assumption is consistent with the universal approximation property. This condition is necessary to bound the residual terms introduced by neural-network approximation in the derivative of the Lyapunov function.
Assumption 3. The gradient of the activation-function vector is bounded in norm by a positive constant. This boundedness follows from the use of smooth bounded activation functions over the compact operating domain considered in this work. This condition guarantees that the gradient-related terms appearing in the critic-learning dynamics remain bounded and can be handled in the Lyapunov derivative estimate.
Under the above assumptions, a Lyapunov function is constructed to establish the boundedness of the closed-loop system and critic-learning dynamics.
Here, the first term is the system’s optimal value function, and the second term is the weight error term of the critic neural network, which is defined in terms of the critic weight estimation error.
Taking the time derivative of the two terms yields:
The state and the critic weight estimation error are then constrained such that:
where the bound involves the smallest eigenvalue of the positive definite state-weighting matrix and a positive constant.
It should be emphasized that the present analysis establishes uniform ultimate boundedness of the closed-loop states and the critic weight estimation error under the stated assumptions, rather than asymptotic convergence in the strict sense. For the considered EV-integrated multi-area power system, under the proposed control laws, critic learning dynamics, and the stated boundedness assumptions, the closed-loop state and the critic weight estimation error are uniformly ultimately bounded.
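A generic template for this kind of uniform-ultimate-boundedness argument is sketched below. It is an assumed form for illustration, not the paper's derivation: $a$ denotes a critic learning rate, $\tilde{W}$ the critic weight estimation error, and $c_1, c_2 > 0$ are constants that collect the bounds from Assumptions 1 to 3.

```latex
\begin{aligned}
L &= L_1 + L_2, \qquad L_1 = V^{*}(x), \qquad L_2 = \tfrac{1}{2a}\,\tilde{W}^{\top}\tilde{W}, \\
\dot{L} &\le -\lambda_{\min}(Q)\,\lVert x \rVert^{2} - c_1 \lVert \tilde{W} \rVert^{2} + c_2, \\
\dot{L} &< 0 \quad \text{whenever} \quad
\lVert x \rVert > \sqrt{c_2/\lambda_{\min}(Q)} \;\; \text{or} \;\;
\lVert \tilde{W} \rVert > \sqrt{c_2/c_1}.
\end{aligned}
```

Under such an inequality the trajectories enter and remain in a neighborhood of the origin whose size is set by $c_2$, which is the residual produced by approximation error and bounded disturbances. The neighborhood shrinks as that residual shrinks, but it does not vanish, hence boundedness and not asymptotic convergence.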
In practical implementation, these boundedness conditions are supported by several design choices, including bounded activation functions, finite learning rates, bounded operating ranges of frequency deviation and V2G power, and physically constrained EV participation factor dynamics. Therefore, the assumptions used in the analysis are not arbitrary, but are consistent with the practical operating limits of the considered power-system FR problem.
A precise analytical characterization of the closed-loop stability region for large-scale multi-area V2G systems with stronger inter-area coupling is beyond the scope of the current study and will be investigated in future work. In this paper, the theoretical analysis mainly provides boundedness guarantees for the aggregated closed-loop system, while the effectiveness in multi-area scenarios is further supported by simulation studies on the IEEE 39-bus system.
It should be noted that the present work considers bounded small communication delays through an aggregated dynamic representation, rather than a full large-delay networked control formulation. The rigorous stability characterization under larger communication delays, packet loss, or asynchronous updates remains an important topic for future research.
4. Simulation and Results
The simulation studies are designed to verify the two research hypotheses from complementary aspects. The IEEE 14-bus case compares the proposed participation-aware controller with the participation-agnostic IRL baseline, mainly to examine whether explicit modeling of the EV participation factor improves regulation performance (H1). The IEEE 39-bus case further compares the proposed method with MPC, ADHDP [8], and DDPG under multi-area disturbances, in order to evaluate the robustness and adaptability of the Hamiltonian-IRL framework in more complex scenarios (H2).
In this section, the IEEE 14-bus and IEEE 39-bus test systems are used to verify the effectiveness and superiority of the proposed power system FR model incorporating EV owners’ participation factor. The parameters of the IEEE 14-bus and IEEE 39-bus test systems are listed in Table 1 and Table 2, respectively. Both systems have a base capacity of 100 MVA. The simulation duration is 180 s, with a sampling period of 0.01 s. All simulations are conducted on an ASUS laptop running Windows 10, equipped with an Intel Core i7-12700H CPU @ 2.30 GHz and 8 GB of RAM, using MATLAB R2018b.
This work performs two case studies to evaluate the proposed control strategy. In the first case, a comparative analysis is carried out between the conventional participation-agnostic IRL-based control and the proposed V2G FR control (which accounts for EV owners’ participation factor), aiming to verify the effectiveness of the proposed V2G FR control. The second case compares the proposed V2G FR control with three existing methods, namely model predictive control (MPC), action-dependent heuristic dynamic programming (ADHDP), and deep deterministic policy gradient (DDPG) [22], to highlight the advantages of the proposed scheme.
For the simulation, we assume that each area contains 2000 EVs that can engage in V2G operations, with the per-EV charging/discharging power taken from [23]. Thus, the FR capacity is set to 0.2. Due to the simulation’s short time span, it can be treated as a constant during the entire simulation process. The comprehensive scenario of power disturbances generated by loads and renewable energy sources is illustrated in Figure 4. The disturbance ranges from −15 MW to +15 MW (with a variance of 45 MW) and comprises both load-related and renewable-energy-related components. In this study, load and renewable-power fluctuations are represented as lumped disturbances for controller-performance evaluation. This simplified treatment is intended for benchmark validation rather than detailed stochastic modeling of specific renewable-resource distributions.
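The relation between the fleet size, the system base, and the per-unit FR capacity can be checked with a few lines of arithmetic. The sketch below assumes that 0.2 is a per-unit value on the 100 MVA base and that MVA can be read as MW (unity power factor); the per-EV figure it produces is therefore implied by these assumptions and not quoted from the paper. The proportional scaling by `alpha` in `dispatchable_pu` is likewise a hypothetical illustration of how a participation factor could limit dispatchable capacity.

```python
BASE_MVA = 100.0    # base capacity stated for both test systems
N_EVS = 2000        # EVs per area assumed in the simulation
CAPACITY_PU = 0.2   # aggregated V2G FR capacity, read here as per unit

capacity_mw = CAPACITY_PU * BASE_MVA      # aggregated capacity in MW
per_ev_kw = capacity_mw * 1000.0 / N_EVS  # implied per-EV power in kW

def dispatchable_pu(alpha, capacity_pu=CAPACITY_PU):
    """Capacity left for FR if only a fraction alpha of the fleet capacity participates."""
    return alpha * capacity_pu
```

On these assumptions the aggregated capacity is 20 MW, which is of the same order as the ±15 MW disturbance range, so the EV channel can absorb a meaningful share of the disturbance only when participation is high.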
4.1. Validation of the Effectiveness of V2G FR Control
This section conducts simulation experiments on the proposed participation-aware V2G FR control in the IEEE 14-bus system, as shown in Figure 5, to validate the effectiveness of the proposed scheme in mitigating FR performance degradation caused by various disturbances.
The critic neural network adopts a three-layer structure, including seven input neurons, eight hidden neurons, and one output neuron. With the learning rate
set to 0.2 and
, four training rounds of simulations were conducted to construct a V2G FR controller that accounts for EV owners’ participation. During each learning iteration, the controller optimized performance under various disturbances using system data. As shown in
Figure 6, the single-region learning error for the IEEE 14-bus system converged to near zero after 90 s in the final training round, demonstrating the convergence of the critic neural network’s learning process.
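The three-layer critic structure described above (seven inputs, eight hidden neurons, one output) can be sketched as follows. This is a minimal illustration rather than the controller used in the paper: the tanh activation, the random weight initialisation, the output-layer-only gradient step, and the generic scalar `target` (standing in for the integral Bellman residual of the IRL scheme) are all assumptions made here. Only the layer sizes and the learning rate of 0.2 come from the text.

```python
import math
import random


def make_critic(n_in=7, n_hidden=8, seed=0):
    """Build random weights for a 7-8-1 critic (initialisation is assumed)."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return w_hidden, w_out


def critic_forward(x, w_hidden, w_out):
    """Return the scalar critic output and the hidden activations."""
    # tanh is an assumed activation; the text does not specify one.
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)))
              for row in w_hidden]
    value = sum(w * h for w, h in zip(w_out, hidden))
    return value, hidden


def critic_update(x, target, w_hidden, w_out, lr=0.2):
    """One gradient step on the output weights for loss 0.5 * (V - target)^2."""
    value, hidden = critic_forward(x, w_hidden, w_out)
    error = value - target
    new_w_out = [w - lr * error * h for w, h in zip(w_out, hidden)]
    return new_w_out, error
```

Calling `critic_update` repeatedly on the same sample shrinks the error on that sample, which loosely mirrors the decaying learning error reported for the final training round.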
As shown in
Figure 7, the blue line represents the grid FR model without EV owners’ V2G participation, denoted IRL, and the red line represents the model that incorporates it, denoted α-IRL. Under IRL, the frequency deviation oscillates throughout the simulation between roughly −0.6 Hz and 0.6 Hz, and its maximum of 0.5985 Hz exceeds the safety threshold, so IRL alone does not adequately suppress the frequency fluctuations. Under α-IRL, the curve is smoother and stays closer to the steady-state value, with a maximum deviation of 0.1955 Hz that remains within the safe range. Incorporating EV owners’ V2G participation therefore yields better FR performance.
Table 3 presents FR performance through three metrics: integral of squared error (ISE), integral of absolute error (IAE), and weighted error consistency ultimate bound (UBB). Generally, smaller values of performance metrics indicate better performance. Compared to the traditional IRL-based FR scheme, the proposed FR scheme incorporating V2G participation from EV owners improved these three metrics by 89.1%, 66.8%, and 66.2%, respectively. These results validate the effectiveness of the proposed FR scheme in mitigating FR performance degradation caused by various disturbances.
Thus, the proposed method incorporating EV owner V2G participation outperforms traditional IRL in suppressing frequency deviation and maintaining system frequency stability.
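For readers who wish to reproduce the ISE and IAE metrics reported in Table 3 from a sampled frequency-deviation trace, a minimal sketch is given below. It assumes uniformly sampled data and a simple rectangular integration rule; the function names and the sampling step `dt` are illustrative and not taken from the paper. The third metric is not reproduced because its definition is not given in this section.

```python
def ise(samples, dt):
    """Integral of squared error, approximated by a rectangular sum."""
    return sum(e * e for e in samples) * dt


def iae(samples, dt):
    """Integral of absolute error, approximated by a rectangular sum."""
    return sum(abs(e) for e in samples) * dt


def improvement_percent(baseline, proposed):
    """Relative reduction of a smaller-is-better metric, in percent."""
    return 100.0 * (baseline - proposed) / baseline
```

For example, `improvement_percent(ise_irl, ise_alpha_irl)` would give the kind of percentage quoted for the comparison between IRL and α-IRL, where `ise_irl` and `ise_alpha_irl` are the ISE values computed from the two traces.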
4.2. Comparison of the Proposed V2G FR Control Method with Other Methods
This case study compares the proposed V2G FR control, which is based on the EV owner participation factor, with MPC, ADHDP, and DDPG. The comparison is intended to evaluate the relative regulation performance of the proposed approach under various disturbance conditions.
For consistency with the single-area study, the critic neural network used in the IEEE 39-bus multi-area case adopts the same three-layer structure as that used for the IEEE 14-bus system. Specifically, the critic neural network consists of an input layer, a hidden layer, and an output layer, while the input dimension is selected according to the corresponding multi-area state vector.
As shown in
Figure 8, during the training process on the IEEE 39-bus system, the learning errors of Area 1, Area 2, and Area 3 all moved from initial fluctuations to gradual convergence and ultimately stabilized at relatively low levels, indicating that the training procedure also converges in the multi-area case.
As shown in the frequency deviation curves of
Figure 9, the frequency deviations of MPC, ADHDP, DDPG, and the proposed method all vary over time. In subfigures (a–c), the MPC, ADHDP, and DDPG curves fluctuate significantly, oscillating between roughly −0.4 Hz and 0.4 Hz, and the oscillations persist, indicating weaker FR performance under disturbances. In contrast, the red curve of the proposed method shows markedly smaller fluctuations, stays closer to zero, and is smoother. This indicates that, under the considered disturbance scenarios, the proposed method suppresses frequency-deviation fluctuations more effectively and better maintains system frequency stability.
Moreover, as shown in
Table 4, the proposed α-IRL scheme achieves better frequency-regulation performance than MPC, ADHDP, and DDPG under the considered multi-area simulation scenario.
In summary, the proposed method shows better regulation performance than MPC, ADHDP, and DDPG under the considered disturbance scenario, leading to more effective frequency-stability support.
4.3. Time-Varying Participation Case Study
In
Section 4.1 and
Section 4.2, the simulation horizon was limited to 180 s. Over such a short time scale, EV owners’ connection status and behavioral willingness are unlikely to change significantly, and therefore the participation factor
can be reasonably approximated as a constant. Over a longer operating horizon, however, this assumption becomes less appropriate because the practically dispatchable V2G capacity is affected by both EV plug-in availability and user-side willingness, both of which vary with travel routine and behavioral preference.
In this subsection, the simulation window is selected as 17:00–20:00, corresponding to a typical weekday evening residential plug-in period. Prior studies have shown that, in residential scenarios, EV arrival times commonly concentrate in this evening period and can often be approximated by a normal distribution with mean values around 17:00–20:00 [
24]. Therefore, this time window is adopted to characterize the influence of time-varying participation on dispatchable V2G regulation capacity.
It should be emphasized that this paper does not assume that
itself follows a single fixed probability distribution. Instead,
is modeled as a bounded time-varying factor jointly determined by two behavior-related components, namely plug-in availability and participation willingness. This treatment is more physically meaningful than directly imposing a prescribed distribution on
, because practical V2G participation requires both that the EV is connected to the grid and that the owner is willing to provide ancillary support. Recent bottom-up flexibility studies have also shown that EV flexibility is more appropriately characterized using travel and plugging patterns together with heterogeneous user archetypes, rather than by a fixed scalar participation coefficient [
25].
Accordingly, the reference participation trajectory is constructed as
where
denotes the plug-in availability factor and
denotes the participation-willingness factor. The availability component represents the time-varying access of EVs to the charging interface during the evening commuting period, while the willingness component represents the user-side preference for participating in V2G and evolves more smoothly. This interpretation is consistent with recent V2G adoption studies showing that participation willingness is strongly influenced by financial incentives, perceived loss of flexibility, and battery-degradation concerns [
13].
In this study, the availability factor is designed according to the regularity of evening residential arrival and connection behavior, while the willingness factor is modeled as a slower-varying bounded behavioral factor. The actual participation state is then generated through a first-order dynamic update law, so that the participation factor evolves gradually rather than changing instantaneously. This setting is consistent with the physical interpretation that aggregated user participation exhibits inertia over time.
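The first-order update described above can be sketched as follows. Every numerical choice here is an assumption made for illustration: the product form of the reference (availability times willingness), the normal-CDF shape of the availability curve, the slow sinusoidal willingness profile, the time constant `tau`, and the initial value `alpha0` are hypothetical and are not the paper's settings. Only the 17:00–20:00 window and the first-order, bounded character of the update come from the text.

```python
import math


def availability(t_hours, mean=18.0, std=1.0):
    """Assumed share of EVs plugged in by time t (normal CDF of arrival time)."""
    return 0.5 * (1.0 + math.erf((t_hours - mean) / (std * math.sqrt(2.0))))


def willingness(t_hours, base=0.6, swing=0.1, period=6.0):
    """Assumed slowly varying, bounded willingness profile."""
    return base + swing * math.sin(2.0 * math.pi * (t_hours - 17.0) / period)


def simulate_participation(t0=17.0, t1=20.0, dt=1.0 / 60.0, tau=0.25, alpha0=0.1):
    """Track the reference with a first-order lag; returns (time, alpha) pairs."""
    alpha = alpha0
    trajectory = []
    n_steps = int(round((t1 - t0) / dt))
    for k in range(n_steps + 1):
        t = t0 + k * dt
        trajectory.append((t, alpha))
        # Assumed reference: product of the two components, clipped to [0, 1].
        ref = min(1.0, max(0.0, availability(t) * willingness(t)))
        # Forward-Euler step of d(alpha)/dt = (ref - alpha) / tau.
        alpha += (dt / tau) * (ref - alpha)
    return trajectory
```

Because `dt / tau` is below one, each step is a convex combination of the current state and the reference, so the simulated factor stays within [0, 1] and cannot jump instantaneously.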
To illustrate the effect of time-varying participation, two cases are considered: a constant-participation baseline, in which the participation factor remains fixed throughout the 3 h interval, and a dynamic participation-aware case, in which the participation factor varies with time and the controller updates the EV regulation limit according to the real-time dispatchable V2G capacity.
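The participation-dependent regulation limit used in the dynamic case can be illustrated with a short sketch. The fleet size of 2000 EVs per area follows the simulation setup; the per-EV power `p_ev_kw` below is a hypothetical placeholder, and the symmetric clipping rule is an assumed simplification of how the controller might enforce the limit.

```python
def dispatchable_limit_mw(alpha, n_evs=2000, p_ev_kw=7.0):
    """Dispatchable V2G power in MW for participation factor alpha.

    p_ev_kw is a hypothetical per-EV rating used only for illustration.
    """
    return alpha * n_evs * p_ev_kw / 1000.0


def saturate_command(u_mw, alpha, n_evs=2000, p_ev_kw=7.0):
    """Clip a regulation command to the current dispatchable V2G capacity."""
    limit = dispatchable_limit_mw(alpha, n_evs, p_ev_kw)
    return max(-limit, min(limit, u_mw))
```

In the constant-participation baseline, `alpha` would be held fixed for the whole interval, whereas in the dynamic case it would be refreshed at each control step so that the clipping bound follows the real-time participation factor.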
Figure 10 shows the trajectory of the participation factor
. It can be observed that the dynamic participation factor is lower than the constant baseline in the early evening period and then gradually increases as more EVs become available. This result indicates that a fixed participation assumption may either overestimate or underestimate the actual V2G regulation capability depending on the operating time.
Figure 11 presents the corresponding frequency-deviation responses. The results show that the proposed framework remains stable over the entire 3 h interval under time-varying participation. Compared with the constant-
baseline, the dynamic-
-aware case exhibits a distinguishable but still stable regulation trajectory. These results indicate that the variation of EV plug-in availability and participation willingness is translated into a time-varying participation factor, which in turn affects the closed-loop regulation process. Therefore, the dynamic-
-aware formulation provides a more realistic representation of V2G-supported FR over the evening operating window.
Overall, the above results show that the proposed participation-aware framework can accommodate behavior-driven variation of the EV participation factor over a practically meaningful evening operating period while maintaining stable frequency-regulation performance.
5. Discussion
The simulation results support the two research hypotheses from complementary perspectives. In the IEEE 14-bus system, the comparison between the conventional IRL method and the proposed participation-aware α-IRL method shows that explicitly incorporating EV owners’ participation factor significantly improves frequency-deviation suppression and overall regulation performance. This result supports H1 by indicating that participation-aware modeling can better align theoretical V2G regulation capability with practically dispatchable regulation resources. In the IEEE 39-bus multi-area system, the comparisons with MPC, ADHDP, and DDPG show that the proposed method achieves better overall dynamic regulation performance under more complex disturbance conditions. This result supports H2 by demonstrating that the Hamiltonian-based integral reinforcement learning framework provides a robust and adaptive solution for distributed V2G FR under uncertainty. The time-varying participation study further shows that the proposed framework can maintain stable frequency-regulation performance over a practical evening operating window, while more realistically reflecting the effect of changing EV plug-in availability and user willingness on dispatchable V2G capacity.
Compared with MPC, the proposed method avoids repeated online optimization and therefore shows stronger adaptability under uncertain disturbances and changing participation-aware V2G availability. Compared with ADHDP and DDPG, the proposed method remains more closely connected to the physical control objective through the Hamiltonian-based formulation, which improves interpretability and facilitates the incorporation of participation-aware constraints. In addition, the proposed controller is implemented in a model-free online form with respect to the drift dynamics, meaning that explicit online knowledge of the system drift matrix is not required for policy realization. This feature improves practical adaptability when accurate drift dynamics are difficult to identify, while preserving the structured physical meaning of the control design.
From an application perspective, the proposed framework is relevant to future renewable-rich power systems in which V2G resources are expected to provide distributed ancillary support. By explicitly considering EV owners’ participation factor, the method improves the practical realism of dispatchable V2G regulation capacity and reduces the gap between idealized regulation assumptions and actual user-constrained resource availability. In addition, the distributed Grid–Aggregator–EV structure is compatible with hierarchical implementation in practical aggregation-based V2G services, where regulation commands, participation information, and EV response must be coordinated across multiple layers.
Several important directions remain for future study. First, more realistic user-side uncertainty should be incorporated, including dynamic travel schedules, state-of-charge distributions, charging/discharging preferences, and battery-degradation costs. Second, the present work adopts a simplified bounded-delay representation; therefore, larger communication delays, packet loss, asynchronous updates, and more realistic cyber-physical constraints should be explicitly investigated in future networked-control formulations. Third, broader large-scale validation under more practical operating scenarios and richer benchmark comparisons would further strengthen the engineering applicability of the proposed framework.
6. Conclusions
This paper proposes a participation-aware V2G frequency-regulation framework for renewable-rich power systems by integrating the EV participation factor into the control design. A three-layer Grid–Aggregator–EV architecture and a corresponding participation-aware dynamic model were established to describe the coordination of distributed V2G resources in FR. On this basis, a Hamiltonian-based optimal robust regulation law was derived under practical V2G power constraints. To realize the control policy online, an integral reinforcement learning scheme was further developed so that the controller can be implemented without explicit online dependence on the system drift matrix. Therefore, the proposed method preserves a model-free characteristic with respect to the drift dynamics while maintaining the physical structure of the frequency-regulation problem.
The simulation results on the IEEE 14-bus and IEEE 39-bus systems verified the effectiveness of the proposed framework from complementary aspects. In the IEEE 14-bus case, explicitly modeling the EV participation factor improved the practical dispatchability of V2G resources and achieved better frequency-regulation performance than the participation-agnostic baseline. In the IEEE 39-bus case, the proposed method further demonstrated stronger robustness and adaptability than the compared methods under more complex multi-area disturbances. In addition, the time-varying participation case showed that the proposed framework can accommodate behavior-driven variation of the EV participation factor over a practical evening operating period while maintaining stable frequency-regulation performance. Overall, the proposed participation-aware Hamiltonian-IRL framework provides an effective and physically interpretable solution for coordinated V2G-based FR.
Future work will focus on incorporating more realistic EV operational factors, such as travel schedules, state-of-charge distributions, battery degradation, and larger communication uncertainties, so as to further improve the practical applicability of the proposed method in large-scale V2G frequency-regulation scenarios.