1. Introduction
The global push toward cleaner, decentralized power systems has led to the widespread adoption of inverter-based renewable energy sources, particularly photovoltaic (PV) systems. Unlike synchronous machines, PV inverters do not naturally provide rotational inertia, which traditionally played a critical role in stabilizing system frequency during disturbances. This growing reduction in system inertia has created new challenges in maintaining frequency stability and dynamic resilience, especially in isolated microgrids and low-inertia environments where even small power imbalances can lead to significant frequency deviations.
To mitigate these risks, virtual inertia control strategies have emerged as promising tools that enable inverter-based resources to emulate the inertial response of conventional generators. These methods enhance the system’s ability to resist sudden frequency deviations by injecting synthetic inertia through a fast inverter response. However, conventional virtual inertia schemes are typically implemented using fixed-parameter controllers. While these static methods are simple and easy to deploy, they lack the adaptability needed for modern, variable renewable energy environments [1].
To improve performance, researchers have introduced optimization techniques such as Particle Swarm Optimization (PSO) to tune the virtual inertia and damping coefficients for improved frequency response and transient stability [2,3]. Although PSO offers a strong global search capability and helps determine optimal controller settings, it is often applied as an offline process, which limits its effectiveness under rapidly changing grid conditions. Once set, the parameters remain fixed and may not be optimal in the face of load variations, changes in irradiance, or other disturbances.
Meanwhile, Reinforcement Learning (RL) has gained attention as a dynamic control method capable of real-time adaptation. RL algorithms allow controllers to learn optimal actions from continuous system interactions, adapting to varying grid conditions without requiring an explicit model [4]. Despite this advantage, RL approaches alone may suffer from lengthy training times, poor initial performance, and unstable convergence in complex or non-linear environments.
Recently, hybrid strategies that combine the strengths of both methods have gained interest. These hybrid control frameworks use metaheuristic optimizers like PSO to initialize controller parameters and then apply RL to refine these parameters dynamically during operation. This approach combines the global search advantage of PSO with the online learning and adaptation capabilities of RL. Such strategies have been shown to enhance the performance of frequency regulation schemes in PV-based microgrids, particularly in improving resilience and reducing steady-state error [5,6,7]. For instance, recent studies [6,7] have introduced hybrid inertia control methods that adaptively regulate frequency support in islanded or low-inertia systems under realistic dynamic conditions. The authors of [8] present ST-CALNet, a hybrid deep learning framework that combines a concentrated Long Short-Term Memory (LSTM) structure with convolutional neural networks (CNNs) to improve forecasting in smart grids with renewable energy integration. The LSTM module models temporal correlations in past load patterns, while the CNN component captures spatial relationships among heterogeneous inputs, including generation data and meteorological variables.
The frequency of an islanded microgrid was controlled using a fractional-order PID (FOPID) controller in [9]. In [10], an FOPID controller was used to enhance virtual inertia control (VIC) performance in an islanded microgrid, with its settings adjusted by a neural network. To improve frequency stability in an islanded microgrid, the FOPID controller was used in [11], with its settings tuned using the SWA; in [12], with its parameters optimized using the SCA; and in [13], with its settings improved using the HSA. The results of these methods [9,10,11,12,13] show that the FOPID controller enhances the frequency stability of the islanded microgrid. However, this kind of controller has drawbacks as well. Its fractional-order parameters add complexity and can directly affect the islanded microgrid’s frequency stability, so it is crucial to set them correctly. Compared to single PID and FOPID controllers, cascaded controllers that combine PID and FOPID components offer several benefits: they perform well, react quickly to system changes, and can control complex systems such as islanded microgrids [14,15,16]. Cascaded controllers have therefore been used to enhance frequency control in power systems and islanded microgrids [14,15,16]. In [16], a FOPI-FOPD cascaded controller, with its parameters refined using the DSA, was used to improve frequency management in power systems. In [15], a PI-TID cascaded controller, with its settings adjusted using the chaotic BOT, was employed to enhance frequency management in islanded microgrids. In [16], a PI-FOPID cascaded controller, with its parameters optimized using the GTO, was used to improve frequency regulation in islanded microgrids. The results of these control techniques [14,15,16] show that the cascaded controller can maintain the islanded microgrid’s frequency stability, is highly resilient to disturbances, and retains this stability in the presence of uncertainties.
This paper proposes a Hybrid PSO–Reinforcement Learning-Based Adaptive Virtual Inertia Control (AVIC) strategy designed for PV-based islanded microgrids. The proposed approach utilizes PSO to determine initial virtual inertia and damping coefficients, providing a well-tuned starting point. A Reinforcement Learning agent then adapts these parameters in real time in response to frequency deviations, allowing the controller to respond dynamically to operational conditions. This hybrid structure enables a robust and adaptive solution that outperforms conventional fixed-gain or standalone optimization-based approaches.
Simulation studies are conducted in MATLAB R2024a/Simulink to evaluate the effectiveness of the proposed hybrid control scheme. The performance is assessed under typical operating scenarios using metrics such as frequency deviation, settling time, and rate of change of frequency (RoCoF). Results demonstrate that the hybrid PSO–RL controller significantly enhances frequency stability and dynamic response compared to non-adaptive and single-layer control schemes.
Compared to previous work, this research offers the following significant advances:
- Integrated Hybrid Control Framework: This is one of the first studies to integrate PSO and RL into a hybrid control framework for virtual inertia, enabling both global optimization and local real-time adaptation.
- Enhanced Robustness and Adaptivity: By combining the strengths of both techniques, this controller adapts more effectively to rapidly changing grid dynamics, ensuring an enhanced frequency stability even under severe disturbances or high renewable penetration scenarios.
- Practical Implementation: The proposed method is implemented and validated in MATLAB through real-time gate signal generation for inverter switches, demonstrating practical viability for embedded system applications in smart grid environments.
By bridging optimization and adaptive learning, this method provides a comprehensive and scalable solution to frequency stabilization challenges in modern low-inertia power systems. It addresses the shortcomings of existing approaches by offering both high performance and adaptability, making a valuable contribution to the field of intelligent energy management and smart grid control systems. The conceptual structure of the proposed hybrid control system is illustrated in Figure 1, which outlines the coordination between the PSO for global optimization and RL for real-time adaptive control.
What distinguishes this hybrid PSO-RL implementation in energy systems is its real-time Reinforcement Learning (RL) adaptation mechanism built upon PSO-initialized parameters. Unlike previous studies that apply PSO and RL in isolation or in offline stages, our approach uses Particle Swarm Optimization to pre-optimize the initial policy network weights, learning rates, and exploration parameters of the RL agent. This ensures that the Reinforcement Learning process begins from a near-optimal configuration, drastically reducing convergence time and avoiding inefficient exploration during early operation.
In the dynamic environment of energy systems, where load profiles, renewable generation, and grid constraints fluctuate in real-time, this integrated framework allows the RL agent to continuously adapt operational strategies, such as energy dispatch, load shifting, or demand response, with high responsiveness and precision. The hybrid approach combines PSO’s global search capabilities (for initial parameter tuning) with the adaptive, model-free control of RL, achieving both faster learning and higher resilience to non-stationary operating conditions. This results in a more reliable and economically optimized energy system performance compared to conventional PSO-only, RL-only, or sequential PSO-RL strategies.
2. Methodology
The model is designed to improve frequency stability in a distributed PV-based microgrid environment through intelligent virtual inertia control. The system consists of four individual PV arrays, each connected to its own DC/DC converter. These converters play a crucial role in extracting the maximum available power from the solar panels by implementing Maximum Power Point Tracking (MPPT) algorithms. MPPT dynamically adjusts the duty cycle of the converter to ensure optimal power extraction under varying environmental conditions such as solar irradiance and temperature [17]. The output of the PV modules is unregulated DC, which is then stabilized through the converter to provide a steady DC voltage required by downstream systems.
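The duty-cycle adjustment performed by MPPT can be illustrated with a short sketch. The text does not state which MPPT algorithm is used, so the perturb-and-observe rule below is an assumption, and the power curve `pv_power`, the step size, and the starting duty cycle are hypothetical values chosen only for illustration.

```python
def pv_power(duty):
    """Hypothetical concave PV power curve (kW) with its maximum at duty = 0.6."""
    return 100.0 - 400.0 * (duty - 0.6) ** 2


def perturb_and_observe(duty=0.3, step=0.01, iterations=100):
    """Perturb the duty cycle each iteration and reverse direction when power drops."""
    direction = 1.0
    prev_power = pv_power(duty)
    for _ in range(iterations):
        # Apply the perturbation and keep the duty cycle inside [0, 1].
        duty = min(max(duty + direction * step, 0.0), 1.0)
        power = pv_power(duty)
        if power < prev_power:
            direction = -direction  # last perturbation reduced power: reverse
        prev_power = power
    return duty


final_duty = perturb_and_observe()
print(f"duty cycle after tracking: {final_duty:.2f}")
```

Under these assumptions the duty cycle climbs toward the maximum power point and then oscillates within one step of it, which is the characteristic steady-state behavior of perturb-and-observe tracking.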
Following the DC/DC conversion, the output is passed through a DC-link capacitor, which serves as an energy buffer and voltage stabilizer. This capacitor reduces ripple in the voltage and ensures a consistent DC supply to the IGBTs and diodes [18]. The DC-link voltages (Vdc, Vdc1, Vdc2, and Vdc3) are continuously monitored and fed into the central controller. These voltages provide insight into the power availability and converter health, and they are used as part of the control logic in the inertia emulation algorithm.
Each PV unit is connected to an IGBT/Diodes module, which is responsible for converting DC power into three-phase AC and injecting it into the local grid [19]. The IGBT/Diodes work by switching the DC voltage to an AC waveform while managing both active and reactive power flow [20]. They also regulate the voltage at the Point of Common Coupling (PCC) and support frequency control. Gate signals from the Hybrid PSO–Reinforcement Learning (RL) Controller modulate the IGBT/Diodes operation to implement dynamic power control and inertia emulation.
At the heart of the system lies the Hybrid PSO–Reinforcement Learning (RL) Controller, which dynamically provides synthetic inertia by emulating the inertial response of traditional synchronous generators. The controller takes multiple inputs, including real-time DC-link voltages, measured grid frequency, and power parameters. It uses Particle Swarm Optimization (PSO) to optimize the initial controller parameters and employs Reinforcement Learning to adapt to changes in grid dynamics during real-time operation. The output of this controller consists of four gate signals (gate1 to gate4) that drive the IGBT/Diodes to inject or absorb power in response to frequency deviations, thereby enhancing system stability.
The AC output from each IGBT/Diodes is interfaced with the grid using a distribution transformer rated at 400 kVA and 260 V/25 kV. This voltage is stepped up again via a 25 kV/132 kV transformer for integration into the transmission grid. A 100 kW load is modeled at the 25 kV level to simulate real-time consumption and test the control system’s ability to maintain frequency under varying load conditions. This subsystem also includes passive elements such as filters and impedance lines to emulate realistic grid interfacing [21]. The detailed architecture of the proposed Hybrid PSO–Reinforcement Learning (RL) Controller is presented in Figure 2. The framework begins with system inputs, including nominal parameters, simulated voltages, and fixed currents, which are used to compute instantaneous power and the corresponding frequency deviation. Based on this deviation, the inertia and damping coefficients are dynamically adjusted through the Hybrid PSO–RL mechanism, where PSO performs global parameter optimization while RL continuously updates the control policy in real-time. The adapted control signals are processed to generate PWM reference signals, which are then compared with a carrier waveform to produce logical gate pulses. These pulses are then translated into complete gate signals for the IGBT modules of the voltage source converters (VSCs). This closed-loop structure ensures effective parameter adaptation, robust dynamic response, and reliable frequency stability under varying load and renewable energy conditions.
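The comparison of a PWM reference with a carrier waveform can be sketched as follows. This is a minimal single-phase illustration of sine-triangle modulation, not the Simulink implementation; the modulation index (0.8) and carrier frequency (5 kHz) are assumed values that do not come from the text.

```python
import math


def triangle_carrier(t, f_carrier):
    """Symmetric triangular carrier in [-1, 1] at frequency f_carrier (Hz)."""
    phase = (t * f_carrier) % 1.0
    return 4.0 * abs(phase - 0.5) - 1.0


def gate_pulse(t, modulation_index=0.8, f_ref=50.0, f_carrier=5000.0):
    """Logical gate pulse: True while the sinusoidal reference exceeds the carrier."""
    reference = modulation_index * math.sin(2.0 * math.pi * f_ref * t)
    return reference > triangle_carrier(t, f_carrier)


# Fraction of one 50 Hz cycle (0.02 s) during which the upper switch is gated on.
samples = 20000
dt = 1e-6
high_fraction = sum(gate_pulse(i * dt) for i in range(samples)) / samples
print(f"average on-time over one cycle: {high_fraction:.3f}")
```

Over a full fundamental cycle the average on-time is close to one half, while the local duty ratio follows the reference; a controller that scales or shifts the reference therefore changes the power exchanged with the grid.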
To validate the proposed controller, two simulation cases are considered. Case A evaluates independent PV–IGBT operation with decentralized inertia emulation, while Case B analyzes a shared IGBT configuration with centralized control. These scenarios enable a comprehensive assessment of the controller’s performance under different interfacing schemes, providing insights into distributed versus centralized inertia emulation strategies within PV-based multi-microgrid systems.
2.1. Case A—Independent PV-IGBT Modules Connected to the Same Node
In this configuration, each PV module is equipped with a dedicated IGBT-based converter, and all four modules are connected directly to a common grid node. The Hybrid PSO–RL controller independently modulates each IGBT, enabling decentralized operation and distributed inertia emulation. This setup allows each PV source to contribute inertia individually, enhancing modularity and resilience.
2.2. Case B—Four PV Modules Sharing One IGBT Before Node Connection
In the alternative configuration, all four PV modules are aggregated at the DC bus and fed into a single IGBT converter connected to the grid node. Here, the Hybrid PSO–RL controller performs centralized modulation of the combined PV output, reducing hardware requirements while providing coordinated inertia emulation.
The overall Simulink model of the PV-based multi-microgrid system is illustrated in Figure 3. It integrates PV arrays, energy storage, MPPT, and the proposed Hybrid PSO–RL Controller in a grid-connected environment. Two configurations are supported: Case A, where each PV module is interfaced with an independent IGBT for decentralized inertia emulation, and Case B, where all PV modules share a single IGBT for centralized control. These two cases enable a comprehensive comparison between decentralized and centralized inertia emulation strategies, providing insights into trade-offs between scalability, hardware efficiency, and control flexibility.
3. Mathematical Modeling
The frequency regulation in an inverter-dominated system relies on virtual inertia and damping parameters that adjust in response to power fluctuations or disturbances [22].
3.1. Frequency Deviation Model
The frequency deviation $\Delta f(t)$ is a key indicator of system stability. It is defined as the difference between the instantaneous frequency $f(t)$ and the nominal frequency $f_0$:
$$\Delta f(t) = f(t) - f_0$$
where
- $f(t)$ is the instantaneous grid frequency.
- $f_0$ is the nominal grid frequency (typically 50 Hz or 60 Hz).
When power generation and demand are imbalanced, the grid frequency will deviate from the nominal value.
3.2. Power Generation and Frequency Relationship
The instantaneous frequency deviation is related to the change in power output $\Delta P$ through the following inertia equation:
$$\frac{d\,\Delta f}{dt} = \frac{f_0}{2H}\,\Delta P$$
where
- $\Delta P$ is the change in power output (difference between generated power and load demand);
- $H$ is the system inertia constant (in seconds);
- $f_0$ is the nominal frequency.
The system’s inertia constant represents the capability of the system to resist changes in the frequency. Lower inertia can result in faster frequency fluctuations.
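The effect of the inertia constant on the initial rate of change of frequency can be checked numerically. The sketch below assumes the standard per-unit swing relation dΔf/dt = f0·ΔP/(2H); the 10% load step matches the disturbance used in Section 4, while the two inertia constants (5 s and 1 s) are illustrative values not taken from the text.

```python
def rocof(delta_p_pu, inertia_h, f_nominal=50.0):
    """Initial rate of change of frequency (Hz/s) for a per-unit power imbalance."""
    return f_nominal * delta_p_pu / (2.0 * inertia_h)


# A 10 % load increase appears as a power deficit of -0.10 pu.
rocof_high_inertia = rocof(-0.10, inertia_h=5.0)
rocof_low_inertia = rocof(-0.10, inertia_h=1.0)
print(rocof_high_inertia, rocof_low_inertia)
```

With these assumed values the same imbalance produces an initial frequency slope five times steeper when the inertia constant falls from 5 s to 1 s, which is why low-inertia microgrids need additional synthetic inertia.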
3.3. Virtual Inertia Control
To stabilize the frequency and mimic the inertial response of traditional synchronous generators, virtual inertia is introduced through the control strategy. This synthetic inertia $H_v$ adapts in real time based on the system’s frequency deviation:
$$H_v = H_0 + k_H\,|\Delta f|$$
where
- $H_0$ is the base inertia constant;
- $k_H$ is the adaptation gain;
- $\Delta f$ is the frequency deviation.
Virtual inertia provides resistance to rapid frequency changes by injecting or absorbing power in proportion to the rate of change of frequency.
3.4. Damping Control
In addition to virtual inertia, damping is crucial for stabilizing the system by counteracting any oscillations. The damping term $D_v$ is similarly adjusted based on the frequency deviation $\Delta f$:
$$D_v = D_0 + k_D\,|\Delta f|$$
where
- $D_0$ is the base damping coefficient;
- $k_D$ is the adaptation gain.
Damping works by reducing the amplitude of oscillations, helping to restore the system to equilibrium.
3.5. Power Injection Based on Frequency Deviation
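To emulate the inertial response of synchronous generators, the active power $P_{inj}$ injected into the grid is adjusted based on the frequency deviation $\Delta f$:
$$P_{inj} = -H_v\,\frac{d\,\Delta f}{dt} - D_v\,\Delta f$$
where $H_v$ is the virtual inertia of Section 3.3 and $D_v$ is the damping term of Section 3.4. This equation ensures that the power injection is proportional to the rate of change of frequency ($d\,\Delta f/dt$) and the instantaneous frequency deviation. The controller adjusts dynamically to stabilize the grid frequency.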
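The adaptive inertia, damping, and power-injection relations of Sections 3.3 to 3.5 can be combined in a few lines. The sketch assumes that both terms grow linearly with |Δf| and that the injected power opposes both the frequency slope and the deviation; the base values and gains (`h_base`, `k_h`, `d_base`, `k_d`) are hypothetical and are not the tuned parameters of the study.

```python
def adaptive_gains(delta_f, h_base, k_h, d_base, k_d):
    """Virtual inertia and damping that increase with the magnitude of the deviation."""
    h_virtual = h_base + k_h * abs(delta_f)
    d_virtual = d_base + k_d * abs(delta_f)
    return h_virtual, d_virtual


def injected_power(delta_f, dfdt, h_base=4.0, k_h=2.0, d_base=1.0, k_d=0.5):
    """Active power command opposing the frequency slope and the deviation."""
    h_virtual, d_virtual = adaptive_gains(delta_f, h_base, k_h, d_base, k_d)
    return -h_virtual * dfdt - d_virtual * delta_f


# Under-frequency event: frequency is 0.2 Hz low and still falling at 1 Hz/s.
p_support = injected_power(delta_f=-0.2, dfdt=-1.0)
print(p_support)
```

In this example the command is positive (power is injected) because the frequency is both below nominal and falling; an over-frequency event with a rising frequency would yield a negative command, that is, power absorption.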
3.6. Reinforcement Learning Adjustment
To optimize the adaptation parameters $k_H$ and $k_D$ of the virtual inertia and damping terms, a Reinforcement Learning (RL) algorithm can be employed. The RL algorithm updates the adaptation gain based on real-time feedback:
- State $s_t = \Delta f$, the current frequency deviation;
- Action $a_t = \Delta k$, the change in adaptation gain;
- Reward $r_t = -|\Delta f|$, where the reward is more negative for larger deviations from the nominal frequency.
The RL agent updates the control policy using the following:
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)\right]$$
where
- $Q(s, a)$ is the action value function;
- $\alpha$ is the learning rate;
- $\gamma$ is the discount factor.
The RL process allows the system to adapt to changing grid conditions and optimize performance over time.
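A tabular form of the action-value update can be sketched as follows. This is an illustrative Q-learning step under stated assumptions, not the agent used in the simulations: the state is the frequency deviation discretized to 0.05 Hz, the actions are three hypothetical gain increments, the reward is taken as the negative absolute deviation, and the learning rate and discount factor are placeholder values.

```python
from collections import defaultdict

ACTIONS = (-0.1, 0.0, 0.1)  # hypothetical increments of the adaptation gain


def discretize(delta_f, resolution=0.05):
    """Map a frequency deviation (Hz) to an integer state index."""
    return round(delta_f / resolution)


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward the temporal-difference target."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]


q_table = defaultdict(float)          # all action values start at zero
state = discretize(-0.20)             # deviation before the action
next_state = discretize(-0.10)        # deviation after raising the gain
updated_value = q_update(q_table, state, 0.1,
                         reward=-abs(-0.10), next_state=next_state)
print(updated_value)
```

Repeating this update over many interactions lets the table rank the gain increments for each deviation level; a practical agent would also need an exploration rule such as epsilon-greedy selection, which is omitted here for brevity.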
3.7. Hybrid Interaction
- PSO: Performs offline tuning of initial controller gains by minimizing a cost function such as the integral of frequency error (IFE).
- RL: Performs online adjustment in real-time to adapt to changing system conditions by maximizing the cumulative reward.
The hybrid controller thus enables both global optimality (via PSO) and local adaptability (via RL), ensuring a robust performance in dynamic microgrid environments.
The interaction between Particle Swarm Optimization (PSO) and Reinforcement Learning (RL) in a hybrid control system combines the strengths of both offline and online learning paradigms for enhanced system performance and adaptability. PSO serves as an offline optimization technique, where optimal or near-optimal parameters (e.g., controller gains, inertia constants) are pre-computed by simulating multiple scenarios and evaluating a fitness function. This reduces the search space and provides a solid starting point for real-time operations. On the other hand, RL operates online, learning from real-time system interactions and environmental feedback. While PSO provides an optimized initialization, the RL adapts these parameters dynamically as the system encounters unforeseen disturbances or time-varying conditions. The RL agent continuously refines control actions based on a reward signal, ensuring the system remains stable and responsive under non-linear and stochastic scenarios.
This interaction creates a synergistic loop: PSO accelerates RL convergence by supplying good initial policies, and the RL ensures robustness by adapting beyond the PSO’s fixed optimization boundaries. In frequency stability control of PV-based microgrids, for example, PSO can tune inertia gains offline, while RL adjusts them online in response to load fluctuations or renewable intermittency, ensuring a seamless performance and reduced frequency deviation in real-time operations.
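The offline PSO stage can be illustrated with a compact constriction-factor swarm that tunes two gains by minimizing the integral of the absolute frequency error. Everything specific in this sketch is an assumption: the first-order frequency model in `integral_frequency_error`, the gain bounds, the swarm size, and the iteration count are placeholders rather than the settings of Table 2 or the Simulink plant.

```python
import math
import random

F_NOMINAL = 50.0   # Hz
DELTA_P = -0.10    # per-unit imbalance for a 10 % load increase


def integral_frequency_error(gains, dt=0.001, horizon=1.0):
    """Integral of |delta_f| for an assumed first-order frequency model."""
    h_virtual, d_virtual = gains
    delta_f, cost = 0.0, 0.0
    for _ in range(int(horizon / dt)):
        dfdt = F_NOMINAL * (DELTA_P - d_virtual * delta_f) / (2.0 * h_virtual)
        delta_f += dfdt * dt          # forward-Euler integration step
        cost += abs(delta_f) * dt
    return cost


def pso(cost, bounds, particles=12, iterations=30, seed=0):
    """Constriction-factor PSO; returns best position, best cost, cost history."""
    rng = random.Random(seed)
    phi1 = phi2 = 2.05
    phi = phi1 + phi2
    chi = 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))  # about 0.7298
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(particles)]
    vel = [[0.0] * len(bounds) for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    history = [gbest_cost]
    for _ in range(iterations):
        for i in range(particles):
            for d, (lo, hi) in enumerate(bounds):
                vel[i][d] = chi * (vel[i][d]
                                   + phi1 * rng.random() * (pbest[i][d] - pos[i][d])
                                   + phi2 * rng.random() * (gbest[d] - pos[i][d]))
                # Clip to the feasible box so candidates stay admissible.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
        history.append(gbest_cost)
    return gbest, gbest_cost, history


BOUNDS = [(1.0, 10.0), (0.1, 2.0)]   # assumed ranges for inertia and damping gains
best_gains, best_cost, history = pso(integral_frequency_error, BOUNDS)
print(best_gains, best_cost)
```

The resulting `best_gains` would serve as the initial point for the online RL stage. For this simplified model the cost decreases monotonically with both gains, so the swarm is driven toward the upper bounds; a cost that also penalizes control effort would give an interior optimum.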
However, the effectiveness of PSO-RL control is highly dependent on the physical configuration of the PV generation units and their interface with the grid.
This study investigates and compares two configurations:
- (i) Four PV-IGBT modules connected individually to the same node.
- (ii) Four PV modules connected to a single shared IGBT, which is then interfaced with the node.
The analysis focuses on how each configuration affects control dynamics, power quality, and response to disturbances under a hybrid PSO-RL controller.
As shown in Table 1, the PV-based islanded microgrid parameters were selected to emulate realistic operating conditions of a medium-scale renewable energy system, ensuring the model captures essential dynamics for frequency stability analysis.
The PSO tuning parameters in Table 2 were optimized to achieve a rapid convergence while avoiding premature stagnation, thereby providing robust initial controller settings before online adaptation.
As presented in Table 3, the RL training parameters were chosen to balance adaptability with stability, enabling the effective real-time adjustment of controller gains without compromising closed-loop performance.
3.8. Theoretical Convergence and Stability of the PSO–RL AVIC
The closed-loop frequency dynamics of the islanded microgrid with the proposed Adaptive Virtual Inertia Control (AVIC) can be represented in state-space form as
$$\dot{x} = A(\theta)\,x + B\,w$$
where
$x = [\Delta f,\ \Delta\dot{f}]^{T}$ is the state vector, with $\Delta f$ denoting the frequency deviation and $\Delta\dot{f}$ its time derivative.
$\theta$ is the parameter vector containing the virtual inertia gain and virtual damping gain of the AVIC.
$w$ is the bounded disturbance input, representing load variations or renewable generation fluctuations.
$A(\theta)$ is the closed-loop state matrix determined by $\theta$, which is Hurwitz in the stabilizing set Θ.
$B$ is the disturbance input matrix mapping $w$ to the state derivatives.
If $A(\theta)$ is Hurwitz, then for any symmetric positive-definite matrix $Q$ there exists a symmetric positive-definite matrix $P$ satisfying the Lyapunov equation [23]:
$$A(\theta)^{T}P + P\,A(\theta) = -Q$$
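The Hurwitz condition and the Lyapunov equation AᵀP + PA = −Q can be verified by hand for a two-state model. The sketch below assumes a companion-form matrix A = [[0, 1], [−a0, −a1]] with Q equal to the identity; the coefficients `a0` and `a1` are hypothetical stand-ins for the closed-loop dynamics, whose exact dependence on the AVIC gains is not reproduced here.

```python
def is_hurwitz_2x2(a):
    """A real 2x2 matrix is Hurwitz iff its trace is negative and its determinant positive."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return trace < 0.0 and det > 0.0


def lyapunov_solution(a0, a1):
    """Closed-form P solving A^T P + P A = -I for A = [[0, 1], [-a0, -a1]]."""
    p12 = 1.0 / (2.0 * a0)
    p22 = (1.0 + a0) / (2.0 * a0 * a1)
    p11 = a1 * p12 + a0 * p22
    return [[p11, p12], [p12, p22]]


def lyapunov_residual(a, p):
    """Return A^T P + P A + I, which should be the zero matrix."""
    residual = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            at_p = sum(a[k][i] * p[k][j] for k in range(2))
            p_a = sum(p[i][k] * a[k][j] for k in range(2))
            residual[i][j] = at_p + p_a + (1.0 if i == j else 0.0)
    return residual


a0, a1 = 4.0, 3.0                      # illustrative stiffness and damping terms
A = [[0.0, 1.0], [-a0, -a1]]
P = lyapunov_solution(a0, a1)
residual = lyapunov_residual(A, P)
p_is_positive_definite = P[0][0] > 0.0 and P[0][0] * P[1][1] - P[0][1] ** 2 > 0.0
print(is_hurwitz_2x2(A), p_is_positive_definite)
```

For any positive `a0` and `a1` the matrix is Hurwitz and the computed `P` is positive definite, which is the property that V(x) = xᵀPx relies on to serve as a Lyapunov function for the nominal dynamics.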
3.8.1. PSO Stage (Offline)
The Particle Swarm Optimization (PSO) stage searches for optimal parameters $\theta_{\mathrm{PSO}}$ within the compact stabilizing set Θ, ensuring $A(\theta)$ remains Hurwitz for all $\theta \in \Theta$. The use of the constriction factor PSO guarantees bounded particle velocities and positions, with convergence to a stationary point in the feasible space [24]. The optimized $\theta_{\mathrm{PSO}}$ serves as a stabilizing initial point for the online RL stage.
3.8.2. RL Stage (Online)
The Reinforcement Learning (RL) stage updates the AVIC parameters online according to
$$\theta_{k+1} = \Pi_{\Theta}\!\left(\theta_k + \eta_k\,\hat{g}_k\right)$$
where $\hat{g}_k$ is the estimated policy gradient, $\eta_k > 0$ is the step size, and $\Pi_{\Theta}(\cdot)$ is the projection operator onto the stabilizing set Θ. Projection ensures that the parameters remain within the stabilizing set, a standard adaptive control technique used to maintain closed-loop stability [25].
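The projected update can be sketched for the simplest case in which the stabilizing set Θ is a box of admissible gain ranges, so that projection reduces to element-wise clipping. The box bounds, step size, and gradient estimate below are hypothetical values used only to show the mechanics.

```python
def project(theta, bounds):
    """Euclidean projection onto a box: clip each parameter to its admissible range."""
    return [min(max(value, lo), hi) for value, (lo, hi) in zip(theta, bounds)]


def projected_step(theta, gradient, step_size, bounds):
    """Gradient-ascent step on the expected reward followed by projection."""
    candidate = [t + step_size * g for t, g in zip(theta, gradient)]
    return project(candidate, bounds)


THETA_BOUNDS = [(1.0, 10.0), (0.1, 2.0)]   # assumed ranges for inertia and damping gains
theta_k = [9.8, 1.0]
gradient_estimate = [5.0, -0.5]
theta_next = projected_step(theta_k, gradient_estimate, 0.1, THETA_BOUNDS)
print(theta_next)
```

Here the unconstrained step would push the inertia gain to 10.3, outside the assumed admissible range, so the projection returns it to the boundary value 10.0 while the damping gain moves freely to 0.95.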
3.8.3. Lyapunov–ISS Stability Analysis
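Consider the Lyapunov function
$$V(x, \tilde{\theta}) = x^{T}P\,x + \tfrac{1}{2}\,\tilde{\theta}^{T}\tilde{\theta}$$
where $\tilde{\theta} = \theta - \theta^{*}$ is the parameter error and $\theta^{*}$ is the equilibrium parameter vector. Its time derivative satisfies
$$\dot{V} \le -c_1\,\|x\|^{2} - c_2\,\|\tilde{\theta}\|^{2} + c_3\,\|w\|^{2}$$
for some constants $c_1 > 0$, $c_2 > 0$, $c_3 > 0$.
From this inequality, the system is input-to-state stable (ISS) [26]:
If w ≡ 0, the equilibrium is globally exponentially stable.
If w is bounded, the state remains bounded, with an ultimate bound proportional to $\|w\|_{\infty}$.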
3.8.4. Convergence Remarks
PSO Convergence: Guaranteed under constriction factor theory for bounded feasible sets [26].
RL Convergence: Ensured under diminishing step sizes satisfying $\sum_{k} \eta_k = \infty$ and $\sum_{k} \eta_k^{2} < \infty$, and with bounded and asymptotically unbiased gradient estimates [27,28].
4. Results and Discussion
To assess the dynamic response of the proposed hybrid controller under realistic grid conditions, a disturbance scenario involving a 10% step increase in load was applied at t = 0.2 s. This sudden change was designed to test the system’s ability to preserve the voltage and frequency stability in the face of rapid demand fluctuations. The controller’s performance was benchmarked against a standard PSO-based controller and a non-adaptive controller over a 1 s simulation period, highlighting the comparative advantages of the hybrid PSO–RL scheme in terms of transient handling and steady-state regulation.
Figure 4 illustrates the performance of the proposed Hybrid PSO–Reinforcement Learning-Based Adaptive Virtual Inertia Control system in a multi-microgrid PV setup. The top subplot presents the three-phase voltage waveforms, which stabilize rapidly and maintain a consistent sinusoidal profile after the initial transient period, indicating successful voltage regulation and synchronization with the grid. The middle subplot shows the corresponding three-phase current waveforms, which exhibit a well-balanced sinusoidal form with minimal harmonic distortion after startup, confirming effective current injection and load sharing among the inverters. The bottom subplot displays the power quality in terms of total apparent power (PQ) in kVA. The system demonstrates a sharp initial response followed by smooth settling behavior, with PQ stabilizing around a constant value. These results confirm that the hybrid controller effectively mitigates transient disturbances and maintains high power quality, thereby contributing to frequency stability and dynamic system performance.
In Figure 5, the DC-link voltage waveforms (Vdc, Vdc1, Vdc2, and Vdc3) correspond to multiple inverter modules in the multi-microgrid PV system. Initially, all DC voltages exhibit a brief transient response with small oscillations due to startup dynamics and power balancing. However, by approximately 0.2 s, each voltage stabilizes near the desired operating point of 500 V. The observed steady-state ripple is minimal, indicating the effectiveness of the MPPT-controlled DC-DC converters and the coordinated control strategy. The uniformity across all four voltage traces demonstrates the balanced operation of the distributed PV units and consistent energy transfer through the IGBT-based inverter interfaces. This stable DC voltage regulation is critical for ensuring high-quality AC output and seamless grid synchronization.
In Figure 6, two key performance characteristics of the photovoltaic (PV) system are illustrated: irradiance and mean power output over time. The top subplot shows a nearly constant irradiance level of approximately 1000 W/m², indicating stable solar input throughout the observation period. This stability ensures continuous exposure of the PV modules to consistent sunlight, providing a reliable source of energy. In contrast, the bottom subplot displays the mean power output, which initially exhibits a sharp rise as the system stabilizes in response to the irradiance. After this transient phase, the output converges smoothly to a steady value of about 95–100 kW, with minor variations reflecting the comparative performance of the controllers. These results confirm the system’s ability to adapt and maintain steady power generation under stable irradiance, demonstrating both efficiency and robustness of the proposed control mechanism.
Figure 7 shows the frequency over time, with the frequency consistently remaining at 50 Hz throughout the observation period. This indicates that the system is well-regulated and stable, maintaining the nominal grid frequency without any significant fluctuations or disturbances. The absence of frequency variations suggests that the frequency control mechanism is effectively compensating for any potential deviations, ensuring a stable operation of the system, which is crucial for grid synchronization.
This stable behavior could be due to virtual inertia control or real-time frequency adjustment mechanisms within the power system, allowing it to respond to changes in power generation or demand without deviating from the nominal frequency.
Figure 8 illustrates the dynamic frequency response of the three control strategies under the load disturbance. The non-adaptive controller (red dashed line) shows significant oscillations and slow damping, highlighting poor transient stability and weak adaptability to dynamic system changes. The standard PSO controller (blue dash–dot line) improves upon this by reducing oscillations and achieving a faster convergence but still exhibits noticeable frequency fluctuations in the early transient phase.
In contrast, the Hybrid PSO-RL controller (solid green line) maintains a remarkably stable response throughout the simulation, effectively holding the frequency at the nominal value of 50 Hz with negligible deviation. This demonstrates superior adaptability and learning capability, as the hybrid controller dynamically tunes itself in real time based on system feedback. Overall, the Hybrid PSO-RL approach outperforms both traditional control schemes in terms of response speed, overshoot minimization, and frequency regulation precision, and it significantly improves frequency stability under load variations.
Table 4 provides a comparative evaluation of the frequency response performance for the three controllers. The non-adaptive controller shows the weakest performance, with a high peak overshoot (54.6 Hz), a long settling time (0.75 s), a steep RoCoF (76.7 Hz/s), and a steady-state error of ±0.4 Hz. The standard PSO controller improves frequency regulation with a reduced overshoot (51.7 Hz), a faster settling time (0.40 s), a moderate RoCoF (21.3 Hz/s), and a smaller steady-state error of ±0.1 Hz. The hybrid PSO–RL controller demonstrates a superior performance, maintaining a frequency close to nominal (50.02 Hz), with the fastest settling time (0.10 s), minimal RoCoF (0.2 Hz/s), and effectively zero steady-state error. These results confirm the hybrid controller’s enhanced stability and adaptability under transient conditions in multi-microgrid PV systems.
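The metrics reported in Table 4 can be extracted from a sampled frequency trace as sketched below. The definitions are assumptions, since the text does not state them: settling time is taken as the last instant at which the deviation exceeds a ±0.05 Hz band, RoCoF as the largest finite-difference slope, and steady-state error as the deviation at the final sample. The exponential test trace is synthetic and does not reproduce any simulated result.

```python
import math


def frequency_metrics(time, freq, f_nominal=50.0, band=0.05):
    """Peak frequency, maximum RoCoF, settling time, and steady-state error of a trace."""
    deviations = [f - f_nominal for f in freq]
    peak_index = max(range(len(freq)), key=lambda i: abs(deviations[i]))
    rocof = max(abs((freq[i + 1] - freq[i]) / (time[i + 1] - time[i]))
                for i in range(len(freq) - 1))
    outside_band = [t for t, d in zip(time, deviations) if abs(d) > band]
    settling_time = outside_band[-1] - time[0] if outside_band else 0.0
    return {
        "peak": freq[peak_index],
        "rocof": rocof,
        "settling_time": settling_time,
        "steady_state_error": abs(deviations[-1]),
    }


# Synthetic trace: 0.5 Hz excursion decaying with a 0.1 s time constant.
time = [i * 0.001 for i in range(1001)]
freq = [50.0 + 0.5 * math.exp(-t / 0.1) for t in time]
metrics = frequency_metrics(time, freq)
print(metrics)
```

Applying the same function to the three controller traces would allow the table entries to be recomputed consistently, provided the same tolerance band and sampling interval are used for every controller.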
Figure 9 shows the convergence curve of the frequency response.
The simulation results in Figure 10 demonstrate the effectiveness of the hybrid PSO–RL control strategy in maintaining system frequency stability under varying disturbances, renewable variability, and fault conditions. The PV power profile shows realistic fluctuations, while load disturbances and fault injections introduce dynamic challenges. Despite these, the RL agent adapts the control parameters in real time, correcting deviations that the offline PSO alone cannot handle. The frequency response plot shows quick recovery and minimal overshoot, indicating robustness and adaptability. This hybrid approach leverages the global search ability of PSO and the real-time learning capacity of RL, ensuring optimal and stable microgrid performance.
The superior performance of the hybrid PSO–RL controller is primarily attributed to its ability to combine the global optimization strength of Particle Swarm Optimization (PSO) with the real-time adaptability of Reinforcement Learning (RL). In contrast to traditional fixed-parameter controllers or separately tuned PSO/RL schemes, this hybrid framework enables the RL agent to start from an optimal parameter space initialized by PSO, such as learning rates, exploration parameters, and action weightings, thus eliminating inefficient exploration during the early stages of control. More importantly, the real-time RL adaptation proves critical under dynamic disturbances, such as sudden load changes or fluctuations in PV output due to cloud coverage. When frequency deviation occurs, the RL agent, guided by its continuous feedback learning mechanism, quickly adjusts the virtual inertia in response to the rate and direction of change. For instance, during a sharp frequency drop, the RL component increases the virtual inertia to inject synthetic power, emulating the response of synchronous inertia. Conversely, during over-frequency events, the agent reduces inertia and absorbs excess energy, thereby damping oscillations. This context-aware decision-making, enabled by online state-action evaluations and updated policies, allows the system to stabilize more rapidly and robustly than conventional PID or PSO-tuned static inertia controllers. Additionally, the hybrid approach prevents overfitting to specific operating conditions and generalizes better across varying grid events, making it well-suited for highly dynamic, renewable-dominated microgrid environments.
The novelty of this research lies in the first integration of a Particle Swarm Optimization (PSO)-tuned Proportional-Integral-Derivative (PID) controller with an online Reinforcement Learning (RL) adaptation scheme for Adaptive Virtual Inertia Control (AVIC) in PV-based islanded microgrids. Unlike existing AVIC approaches [29,30,31,32,33,34,35,36,37], which either rely on fixed parameters or purely offline optimization, the proposed method enables the dynamic real-time adjustment of controller parameters while preserving closed-loop stability via Lyapunov–ISS design. This combined offline–online optimization strategy achieves superior performance in terms of settling time, controller error, and overshoot compared to all benchmarked methods in Table 5, where the proposed method attains a settling time of 0.10 s, zero steady-state error, and an overshoot of only 0.02, outperforming Fuzzy, ANN–PID, ANFIS, GA–PID, and other optimization-based controllers. To the best of our knowledge, no prior work has applied a PSO–RL hybrid to AVIC in PV-based islanded microgrids while demonstrating both theoretical stability and comparative superiority across all three performance metrics.
Comparative Analysis of Case A and Case B Configurations
A quantitative performance comparison was conducted between the two PV–IGBT configurations under the same 10% load step disturbance at t = 0.2 s, using identical Hybrid PSO–RL Controller parameters. Table 6 and Figure 11 present the key performance indicators: settling time, overshoot, rate of change of frequency (RoCoF), steady-state (SS) error, and maximum frequency deviation.
As shown in Table 6 and Figure 11, Case B consistently outperforms Case A across all evaluated metrics. Specifically, Case B achieves a 37% reduction in settling time, 52% lower overshoot, 29% improvement in RoCoF, and complete elimination of steady-state error. This demonstrates that centralized control via a single IGBT allows the Hybrid PSO–RL controller to operate more efficiently, delivering faster damping, improved real-time inertia emulation, and enhanced frequency stability.
Based on these findings, all subsequent disturbance and dynamic performance analyses in this paper are carried out using the Case B configuration, unless otherwise stated.