Article

Dynamic Leader Election and Model-Free Reinforcement Learning for Coordinated Voltage and Reactive Power Containment Control in Offshore Island AC Microgrids

Xiaolu Ye 1, Zhanshan Wang 1, Qiufu Wang 1 and Shuran Wang 2
1 College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
2 State Grid Jilin Electric Power Co., Ltd. Changchun Power Supply Company, Changchun 130000, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2025, 13(8), 1432; https://doi.org/10.3390/jmse13081432
Submission received: 29 May 2025 / Revised: 22 July 2025 / Accepted: 26 July 2025 / Published: 27 July 2025
(This article belongs to the Section Ocean Engineering)

Abstract

Island microgrids are essential for the exploitation and utilization of offshore renewable energy resources. However, voltage regulation and accurate reactive power sharing remain significant technical challenges that need to be addressed. To tackle these issues, this paper proposes an algorithm that integrates a dynamic leader election (DLE) mechanism and model-free reinforcement learning (RL). The algorithm aims to address the issue of fixed leaders restricting reactive power flow between buses during heavy load variations in island microgrids, while also overcoming the challenge of obtaining model parameters such as resistance and inductance in practical microgrids. First, we establish a voltage containment control and reactive power error model for island alternating current (AC) microgrids and construct a corresponding value function based on this error model. Second, a dynamic leader election algorithm is designed to address the issue of fixed leaders restricting reactive power flow between buses due to preset voltage limits under unknown or heavy load conditions. The algorithm adaptively selects leaders based on bus load, allowing the voltage limits to adjust accordingly and regulating reactive power flow. Then, to address the difficulty of accurately acquiring parameters such as resistance and inductance in microgrid lines, a model-free reinforcement learning method is introduced. This method relies on real-time measurements of voltage and reactive power data, without requiring specific model parameters. Ultimately, simulation experiments on offshore island microgrids are conducted to validate the effectiveness of the proposed algorithm.

1. Introduction

Renewable generation is playing an increasingly important role in the development and utilization of marine resources [1,2]. Specifically, the proportion of renewable generation units in offshore island microgrids is higher than that in land-based microgrids. These renewable generation units are integrated into the grid via inverters that convert direct current (DC) into AC [3]. Voltage serves as a crucial parameter in island AC microgrid operation [4]. Proper reactive power allocation affects not only power balance but also voltage regulation and line loss optimization. Since island microgrids are not connected to the main grid, voltage and reactive power regulation depend entirely on internal coordinated control [5]. Additionally, island microgrids operate under harsh marine climatic conditions, which pose significant challenges to their operational security. Given their important role in promoting the development of marine resources, voltage regulation and reactive power sharing have become key issues that require extensive research [6].
Voltage regulation and reactive power sharing have been extensively investigated in previous research. The average voltage regulation method in [7] ensures accurate reactive power sharing but may violate voltage safety constraints under load fluctuations. The weighted coefficient method in [8] maintains voltage within safe limits but relies on empirical tuning, limiting scalability. The optimization-based approach in [9] achieves precise control but is computationally intensive and hard to implement in real time. These limitations hinder the practical deployment of these methods in island microgrids.
To address the conflict between voltage regulation and reactive power sharing in island AC microgrids, containment control has gradually attracted increasing attention. In [10], the use of containment control was first proposed to balance voltage regulation and reactive power sharing in microgrids, laying a theoretical foundation for subsequent research in this field. The application of containment control was further extended in [11] to address voltage regulation issues among multiple interconnected microgrids, which significantly improved the performance and stability of coordinated multi-microgrid operation. In [12], the discussion focuses on a containment control-based strategy for voltage regulation in microgrids facing communication and sensor failures. However, in the above methods, the leaders for containment control are typically predetermined. When the bus associated with the upper-bound leader experiences a heavy load, its voltage may drop significantly, which contradicts the assumption that the upper leader should always maintain the highest voltage. This situation leads to ineffective reactive power transfer among buses. The fixed leader configuration lacks flexibility and reduces the adaptability of the island microgrid under dynamic operating conditions.
In island microgrids that have already been put into operation, it is difficult to accurately obtain model parameters such as resistance and inductance due to the limited precision of measurement devices and the influence of operating conditions (such as temperature variations). In contrast, data-driven approaches that do not rely on explicit model parameters can directly utilize operational data from the microgrid for controller design. In [13], a deep learning-based secondary controller for microgrids is designed using historical data. In [14], Koopman’s operator theory enables voltage control based on input–output data. In [15], a data-driven distributed predictive control method achieves voltage restoration and current sharing via an incremental linear model. In [16], least squares and Gaussian process regression are used to learn system sensitivity and estimate modeling errors, ensuring optimal and safe microgrid control. However, the method in [13] requires processing a large quantity of offline data, while the approach in [15] has high computational complexity and slow convergence of control performance, making it difficult to satisfy the real-time requirements of microgrid control. The methods in [14,16] rely heavily on prior knowledge and historical data. These issues limit the widespread application of data-driven methods in practical island microgrids.
Based on the above discussion, this paper proposes an island microgrid voltage regulation and reactive-power-sharing control strategy that combines a dynamic leader election algorithm with a model-free reinforcement learning algorithm. The proposed strategy utilizes a leader election algorithm to dynamically adjust the leader roles according to the load conditions of each bus in the island microgrid: the distributed generation (DG) corresponding to the bus with a higher load is set as the lower-bound leader, while the DG associated with the bus with a lighter load is set as the upper-bound leader. In this way, flexible reactive power flow among buses is promoted, enabling precise reactive power sharing. Meanwhile, by designing value functions for voltage and reactive power errors and employing a model-free reinforcement learning algorithm, the controller is designed based solely on island microgrid operational data without requiring any model information. Furthermore, this paper theoretically proves the convergence of the leader election algorithm and the optimality of the policy iteration algorithm in model-free reinforcement learning. Lastly, the proposed strategy is validated through a series of simulation experiments. In the experiments, the effectiveness of the proposed method is validated in three distinct case studies, which confirm that it restores the voltage of island AC microgrids to the reference range set by containment control and accomplishes accurate reactive power sharing. The main contributions of this research are summarized below:
  • To address the limitations of containment control methods based on fixed leaders, as proposed in [10,11], which struggle with complex scenarios like sudden large load changes, this paper introduces a novel DLE algorithm. Unlike the static nature of fixed-leader approaches, our DLE mechanism is based on bus voltage estimation, allowing each DG to dynamically select the leader according to the relative magnitude of the estimated voltages. This adaptive capability enables accurate reactive power sharing even under sudden load changes or large load fluctuations, significantly enhancing the microgrid’s flexibility.
  • To overcome the challenges of model-based controller design, as highlighted in [17,18], where obtaining practical parameters like resistance and inductance is difficult, this paper proposes a data-driven online reinforcement learning approach. In contrast to model-based methods that are sensitive to parameter uncertainties and measurement errors, our algorithm does not require extensive offline data processing. The control policy is iteratively optimized online by minimizing a value function, enabling accurate reactive power sharing and effective voltage control in the microgrid without relying on a precise system model.
The structure of this paper is as follows: The necessary background and preliminaries are outlined in Section 2. Section 3 proves the convergence of policy iteration, proposes a dynamic leader election algorithm, and designs a model-free reinforcement learning algorithm. Section 4 validates the effectiveness of the proposed methods through numerical experiments. Finally, Section 5 summarizes the main contributions of this paper.

2. Preliminaries and Problem Formulation

This section presents the graph theory, island microgrid modeling, analysis of reactive power and voltage coupling, and the formulation of performance indices for containment control.

2.1. Graph Theory

This paper considers an island AC microgrid modeled as a multi-agent system (MAS), comprising N follower agents and M virtual leader agents. The MAS is represented by an undirected graph $\Psi = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \{v_1, \ldots, v_N\}$ denotes the node set and $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ corresponds to the collection of edges. Two nodes $v_i$ and $v_j$ are considered neighbors if $(v_i, v_j) \in \mathcal{E}$. For the graph $\Psi$, $\mathcal{A} = [a_{ij}] \in \mathbb{R}^{N \times N}$ is the adjacency matrix, where $a_{ij} = 1$ if $(v_i, v_j) \in \mathcal{E}$; otherwise, $a_{ij} = 0$. The degree matrix is $\mathcal{D} = \mathrm{diag}\{d_i\}$, where $d_i = \sum_{j \in N_i} a_{ij}$ is the degree of the ith node in the graph, and $N_i$ represents the set of neighbors of node $v_i$. The Laplacian matrix is then defined by $\mathcal{L} = \mathcal{D} - \mathcal{A}$. The matrix $S^r = \mathrm{diag}\{s_1^r, \ldots, s_i^r, \ldots, s_N^r\} \in \mathbb{R}^{N \times N}$ ($i = 1, \ldots, N$, $r = 1, \ldots, M$) is the pinning gain matrix associated with the rth virtual leader, where $s_i^r = 1$ if the rth virtual leader can communicate with the ith follower; otherwise, $s_i^r = 0$.
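As a concrete illustration of these definitions, the sketch below builds $\mathcal{A}$, $\mathcal{D}$, $\mathcal{L}$, and the pinning matrices in Python for a hypothetical four-DG line topology; the topology and pinning assignments are assumptions for illustration, not the communication graph of Section 4.

```python
import numpy as np

# Illustrative 4-DG line topology (an assumption): DG1-DG2-DG3-DG4.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))  # degree matrix, d_i = sum_j a_ij
L = D - A                   # Laplacian matrix L = D - A

# Pinning gains: suppose the upper-bound leader (r = 1) pins DG1 and the
# lower-bound leader (r = 2) pins DG4 -- a hypothetical assignment.
S1 = np.diag([1.0, 0.0, 0.0, 0.0])
S2 = np.diag([0.0, 0.0, 0.0, 1.0])
```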

2.2. Model Descriptions of Island AC Microgrids

Assume that all DGs are interfaced with the island microgrid through voltage source inverters (VSIs), each equipped with an output LC filter, as illustrated in Figure 1. By applying feedback linearization [17], the following relationship is established:
$\ddot{\tilde z}_i = L_{F_i}^2 h_i + L_{g_i} L_{F_i} h_i \tilde V_i + \ddot v = \upsilon_i + \ddot v,$ (1)
where $F_i(\tilde x_i) = f_i(\tilde x_i) + k_i(\tilde x_i) D_i$, and $L_{F_i}^2 h_i = L_{F_i}(L_{F_i} h_i) = \frac{\partial (L_{F_i} h_i)}{\partial \tilde x_i} F_i$. For the sake of brevity, while the complete derivation is detailed in [11], the key terms are briefly defined here. $\tilde x_i$ is the state vector of the ith DG, which includes filter currents and capacitor voltages in the dq-frame. $h_i$ is the output function, defined as the bus voltage $v_{bdi}$. $f_i$ and $g_i$ represent the drift and input vector fields of the nonlinear system, respectively. $L_{F_i} h_i$ and $L_{F_i}^2 h_i$ are the first- and second-order Lie derivatives of the output function with respect to the system dynamics. $\upsilon_i$ is the new, linearized control input. According to [11], $\Xi_i$ is used to denote the bus voltage state. Since the control of microgrids is typically implemented within a digital control framework, it is necessary to discretize the aforementioned continuous-time equations. After discretization, the variable $\Xi_i$ in Equation (1) can be reformulated as follows:
$\Xi_i(k+1) = A_d \Xi_i(k) + B_d u_i(k), \quad i = 1, \ldots, N,$ (2)
where the new control input is $u_i = \upsilon_i + \ddot v$. The state variable is defined by $\Xi_i = [v_{bdi}, \dot v_{bdi}]^T$. The discrete-time system matrices $A_d$ and $B_d$ are derived from $A$ and $B$ (as described in [11]) using the zero-order hold method with sampling interval $T_s$, where $A_d = e^{A T_s}$ and $B_d = \int_0^{T_s} e^{A\tau} B \, d\tau$ [19]. Thus, $A_d \approx \begin{bmatrix} 1 & T_s \\ 0 & 1 \end{bmatrix}$ and $B_d \approx \begin{bmatrix} T_s^2/2 \\ T_s \end{bmatrix}$.
The linearized system uses the state vector $\Xi_i$ to describe the voltage dynamics, capturing $v_{bdi}$ and its derivative $\dot v_{bdi}$.
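The zero-order-hold discretization above can be reproduced numerically; a minimal sketch, assuming an illustrative sampling interval, recovers $A_d$ and $B_d$ via the standard augmented-matrix exponential.

```python
import numpy as np
from scipy.linalg import expm

Ts = 1e-3  # sampling interval (assumed value for illustration)

# Continuous-time double-integrator voltage dynamics implied by (1):
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])

# ZOH: A_d = e^{A Ts}, B_d = integral_0^Ts e^{A tau} B dtau.
# Standard trick: exponentiate the augmented matrix [[A, B], [0, 0]].
M = np.zeros((3, 3))
M[:2, :2] = A
M[:2, 2:] = B
Md = expm(M * Ts)
Ad, Bd = Md[:2, :2], Md[:2, 2:]
# Ad = [[1, Ts], [0, 1]] and Bd = [[Ts**2 / 2], [Ts]], matching the text.
```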

2.3. Reactive Power Sharing and Voltage Regulation

Under islanded conditions, each DG applies the conventional $Q$–$V$ droop control [20], i.e., $E_i = \tilde V_i - n_i Q_i$, where $E_i$ and $\tilde V_i$ are the voltage reference and voltage magnitude reference, $n_i$ is the droop coefficient, and $Q_i$ is the reactive power output of the ith DG. The objective of reactive power sharing is [21]: $n_1 Q_1 = n_2 Q_2 = \cdots = n_N Q_N$.
Following the conventional Kron reduction based on steady-state parameters [22], the reduced bus admittance matrix Y is obtained. The reactive power at bus i is given by [9]:
$Q_i = v_{bdi} \sum_{j \in N_i} v_{bdj} \big( G_{ij} \sin\theta_{ij} - B_{ij} \cos\theta_{ij} \big)$ (3)
where $G_{ij}$ and $B_{ij}$ are the real and imaginary parts of $Y$ between buses i and j ($Y = G + jB$), $\theta_{ij} = \theta_i - \theta_j$ is the phase angle difference, and $N_i$ is the set of buses connected to bus i (including i itself). Assuming small power angles ($\sin\theta_{ij} \approx \theta_{ij}$, $\cos\theta_{ij} \approx 1$) [23] and predominantly inductive feeder impedance ($R/X \ll 1$) [24], (3) simplifies to $Q_i = -\sum_{j=1}^{n} B_{ij} v_{bdj}$. Letting $\tilde Q = [Q, \dot Q]^T$ and combining with (2), the discrete-time dynamics of $\tilde Q_i$ are:
$\tilde Q_i(k+1) = A_d \tilde Q_i(k) - B_d \sum_{j=1}^{n} B_{ij} u_j$ (4)
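To make the simplified coupling concrete, the sketch below evaluates $Q_i = -\sum_j B_{ij} v_{bdj}$ for all buses at once; the susceptance matrix and voltage values are hypothetical placeholders, not parameters from Table 1.

```python
import numpy as np

# Hypothetical reduced susceptance matrix B (imaginary part of the
# Kron-reduced admittance Y = G + jB); values are illustrative only.
Bmat = np.array([[-8.0,  3.0,  0.0,  0.0],
                 [ 3.0, -9.0,  4.0,  0.0],
                 [ 0.0,  4.0, -10.0, 3.5],
                 [ 0.0,  0.0,  3.5, -7.5]])

v_bd = np.array([311.0, 310.2, 309.8, 310.5])  # bus voltages in V (assumed)

# Simplified linearized reactive power: Q_i = -sum_j B_ij * v_bdj
Q = -Bmat @ v_bd
```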
Accurate reactive power sharing is difficult due to the coupling between voltage regulation and reactive power [10]. Tight voltage control limits reactive power exchange and leads to sharing imbalance. To overcome this, a containment control strategy is used to maintain voltage within set bounds. The boundary dynamics are given by:
$\Xi_0^r(k+1) = A_d \Xi_0^r(k), \quad r = 1, 2,$ (5)
where $\Xi_0^r = [V_r, 0]^T$, $r = 1, 2$. Here, $V_1$ and $V_2$ denote the upper and lower voltage reference limits. According to MAS theory, the neighborhood containment error $\varrho_i^v(k)$ is given by:
$\varrho_i^v(k) = \sum_{j \in N_i} a_{ij} \big( \Xi_j(k) - \Xi_i(k) \big) + \sum_{r=1}^{2} s_i^r \big( \Xi_0^r(k) - \Xi_i(k) \big), \quad i = 1, \ldots, N.$ (6)
Based on (2), (5), and (6), the dynamic equation satisfied by the containment voltage error can be derived as follows:
$\varrho_i^v(k+1) = A_d \varrho_i^v(k) - \Big( d_i + \sum_{r=1}^{2} s_i^r \Big) B_d u_i(k) + \sum_{j \in N_i} a_{ij} B_d u_j(k)$ (7)
The reactive power sharing error ϱ i q ( k ) quantifies deviations in the reactive power contribution of DGs. It is defined by:
$\varrho_i^q(k) = \sum_{j \in N_i} a_{ij} \big( n_i \tilde Q_i(k) - n_j \tilde Q_j(k) \big)$ (8)
According to (4), the dynamic equation of the reactive-power-sharing error is given by:
$\varrho_i^q(k+1) = A_d \varrho_i^q(k) - B_d \sum_{j \in N_i} a_{ij} \Big( n_i \sum_{m=1}^{n} B_{im} u_m - n_j \sum_{w=1}^{n} B_{jw} u_w \Big)$ (9)
By minimizing these errors, the proposed control strategy achieves both containment-based voltage regulation and accurate reactive power sharing.
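A minimal sketch of how the errors (6) and (8) would be evaluated from neighbor data is given below; the array shapes and function signatures are assumptions for illustration.

```python
import numpy as np

def containment_voltage_error(i, Xi, Xi0, A, S):
    """Neighborhood containment error (6) for DG i.
    Xi:  (N, 2) follower states; Xi0: (2, 2) leader states;
    A:   adjacency matrix; S: (2, N) pinning gains s_i^r."""
    err = np.zeros(2)
    for j in range(A.shape[0]):
        err += A[i, j] * (Xi[j] - Xi[i])
    for r in range(2):
        err += S[r, i] * (Xi0[r] - Xi[i])
    return err

def sharing_error(i, Qt, n, A):
    """Reactive-power-sharing error (8): sum_j a_ij (n_i Q_i - n_j Q_j).
    Qt: (N, 2) stacked [Q, dQ/dt]; n: droop coefficients."""
    return sum(A[i, j] * (n[i] * Qt[i] - n[j] * Qt[j])
               for j in range(A.shape[0]))
```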
Assumption A1.
For any virtual leader, one or more paths exist that connect its dynamic behavior to every follower DG in the network.

2.4. Optimal Performance Metrics

Each DG i optimizes its cost via a game, using a local performance index as in [25] to ensure proper power sharing, low energy consumption, and voltage security:
$J_i\big( \varrho_i^v(k), u_i(k), \varrho_i^q(k) \big) = \sum_{t=k}^{\infty} \beta^{t-k} M_i\big( \varrho_i^v(t), u_i(t), \varrho_i^q(t) \big)$ (10)
where $M_i\big( \varrho_i^v(k), u_i(k), \varrho_i^q(k) \big) = \varrho_i^{vT}(k) \Theta_{1ii} \varrho_i^v(k) + u_i^T(k) \Theta_{2ii} u_i(k) + \varrho_i^{qT}(k) \Theta_{3ii} \varrho_i^q(k)$, and $\Theta_{1ii} > 0$, $\Theta_{2ii} > 0$, $\Theta_{3ii} > 0$ are all positive definite weighting matrices. $\beta$ denotes the discount factor, satisfying $0 < \beta \le 1$. Each DG optimizes its control strategy locally through communication with neighbors to achieve the system objectives.
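The stage cost $M_i$ and the discounted return (10) translate directly into code; the following sketch truncates the horizon for illustration and leaves matrix sizes generic. The discount value 0.98 anticipates the setting used in Section 4.

```python
import numpy as np

beta = 0.98  # discount factor (value used in Section 4)

def stage_cost(rv, u, rq, T1, T2, T3):
    """Stage cost M_i in (10): quadratic penalties on the containment
    voltage error, the control effort, and the sharing error."""
    return rv @ T1 @ rv + u @ T2 @ u + rq @ T3 @ rq

def discounted_return(costs, beta=beta):
    """Finite-horizon approximation of J_i = sum_t beta^{t-k} M_i(t),
    with k taken as 0 for the list of stage costs supplied."""
    return sum((beta ** t) * m for t, m in enumerate(costs))
```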
Definition 1
([25]). The control action $u_i(k)$ is considered admissible if it stabilizes Equations (7) and (9) and ensures that $J_i$ remains bounded.
For any admissible control policy $u_i(k)$, the local performance function $V_i\big( \varrho_i^v(k), \varrho_i^q(k) \big)$ of the ith DG can be expressed as $V_i\big( \varrho_i^v(k), \varrho_i^q(k) \big) = M_i\big( \varrho_i^v(k), u_i(k), \varrho_i^q(k) \big) + \beta V_i\big( \varrho_i^v(k+1), \varrho_i^q(k+1) \big)$ by applying the Bellman optimality principle. Specifically, the optimal local performance function is given by:
$V_i^*\big( \varrho_i^v(k), \varrho_i^q(k) \big) = \min_{u_i(k)} \Big[ M_i\big( \varrho_i^v(k), u_i(k), \varrho_i^q(k) \big) + \beta V_i^*\big( \varrho_i^v(k+1), \varrho_i^q(k+1) \big) \Big]$ (11)
where $V_i^*$ denotes the optimal value function, subject to the boundary condition $V_i^*(0, 0) = 0$. Equation (11) represents the HJB equation. Accordingly, $u_i^*(k)$ represents the local containment control input that achieves optimality for the microgrid, and its derivation is provided below:
$u_i^*(k) = \frac{\beta}{2} \Theta_{2ii}^{-1} \left[ \Big( d_i + \sum_{r=1}^{2} s_i^r \Big) B_d^T \frac{\partial V_i^*}{\partial \varrho_i^v(k+1)} + B_d^T \sum_{j \in N_i} a_{ij} \big( n_i B_{ii} - n_j B_{ji} \big) \frac{\partial V_i^*}{\partial \varrho_i^q(k+1)} \right]$ (12)
Remark 1.
While conventional model-based methods can theoretically compute the optimal control u i * ( k ) by solving the HJB Equation (12), their practical application is hindered by a significant limitation: the reliance on precise microgrid parameters that are often unavailable or uncertain in real-world scenarios. To address this fundamental challenge, this paper proposes a model-free reinforcement learning approach. Instead of requiring an explicit system model, this method approximates the HJB solution and derives the optimal policy directly from input–output data using an actor–critic framework. This data-driven nature represents a key advantage, enhancing the controller’s robustness and practical value compared to conventional methods that depend on an idealized and often inaccurate system model.

3. Coordinated Voltage and Reactive Power Control Scheme Design Based on DLE and RL Algorithms

This section describes the coordinated control scheme for voltage and reactive power based on DLE and RL algorithms. Figure 2 shows the overall control procedure.

3.1. Convergence Analysis of Policy Iteration

The iterative learning algorithm is applied to the containment controller as an optimization method using historical data. Each DG exchanges voltage and reactive power information with others via the communication network. A time sequence $\{t_1, t_2, \ldots\}$ is defined, with s denoting the iteration index over this sequence. In policy iteration, the performance function is evaluated for a feasible policy and, as s increases, both the performance function $V_i^s\big( \varrho_i^v(k), \varrho_i^q(k) \big)$ and the control policy $u_i^s(k)$ are iteratively updated as follows:
Step 1: Initialize $u_i^0(k)$ and set $V_i^0\big( \varrho_i^v(k), \varrho_i^q(k) \big) = 0$;
Step 2: Update the performance function $V_i^s\big( \varrho_i^v(k), \varrho_i^q(k) \big)$;
Step 3: Update the control action $u_i^{s+1}\big( \varrho_i^v(k), \varrho_i^q(k) \big)$;
Step 4: If $\big\| V_i^{s+1}\big( \varrho_i^v(k), \varrho_i^q(k) \big) - V_i^s\big( \varrho_i^v(k), \varrho_i^q(k) \big) \big\| \le \iota$, where $\iota$ is a predefined positive constant, the algorithm terminates; otherwise, the iteration index s is updated to $s + 1$ and the process returns to Step 2. A minimal sketch of this loop is given below.
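The four steps map onto a generic policy iteration loop. The sketch below assumes the value function is represented by a parameter vector (e.g., critic weights) and that problem-specific `evaluate` and `improve` routines are supplied; both interfaces are assumptions for illustration.

```python
import numpy as np

def policy_iteration(evaluate, improve, theta0, policy0,
                     iota=1e-4, max_iter=200):
    """Sketch of Steps 1-4. `evaluate(policy)` returns the parameter
    vector representing V_i^s for that policy (Step 2); `improve(theta)`
    returns the greedy policy for a value estimate (Step 3)."""
    theta, policy = theta0, policy0          # Step 1: initialization
    for s in range(max_iter):
        theta_next = evaluate(policy)        # Step 2: policy evaluation
        policy = improve(theta_next)         # Step 3: policy improvement
        if np.linalg.norm(theta_next - theta) <= iota:  # Step 4: stop test
            break
        theta = theta_next
    return theta, policy
```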
The objective is to guarantee convergence of both the control strategy and the local performance function to their respective optimal values. To establish $u_i^s(k) \to u_i^*(k)$ and $V_i^s\big( \varrho_i^v(k), \varrho_i^q(k) \big) \to V_i^*\big( \varrho_i^v(k), \varrho_i^q(k) \big)$ as $s \to \infty$, an essential lemma is presented below.
Lemma 1
([26]). Starting from any initial admissible control policy $u_i^0(k)$, $V_i^s\big( \varrho_i^v(k), \varrho_i^q(k) \big)$ and $u_i^s(k)$ are updated iteratively via Steps 2 and 3. It can be shown that $V_i^s\big( \varrho_i^v(k), \varrho_i^q(k) \big)$ is monotonically nonincreasing, i.e., $V_i^{s+1}\big( \varrho_i^v(k), \varrho_i^q(k) \big) \le V_i^s\big( \varrho_i^v(k), \varrho_i^q(k) \big)$.
Theorem 1.
Let $V_i^s\big( \varrho_i^v(k), \varrho_i^q(k) \big)$ and $u_i^s(k)$ be generated by Step 2 and Step 3. As $s \to \infty$, $V_i^s\big( \varrho_i^v(k), \varrho_i^q(k) \big)$ converges to the optimal value $V_i^*\big( \varrho_i^v(k), \varrho_i^q(k) \big)$, and $u_i^s(k)$ converges to the optimum $u_i^*(k)$, i.e., $\lim_{s \to \infty} V_i^s\big( \varrho_i^v(k), \varrho_i^q(k) \big) = V_i^*\big( \varrho_i^v(k), \varrho_i^q(k) \big)$ and $\lim_{s \to \infty} u_i^s(k) = u_i^*(k)$.
Proof. 
Let $V_i^s\big( \varrho_i^v(k), \varrho_i^q(k) \big)$ denote the value function at iteration s, and define its pointwise limit as $V_i\big( \varrho_i^v(k), \varrho_i^q(k) \big) = \lim_{s \to \infty} V_i^s\big( \varrho_i^v(k), \varrho_i^q(k) \big)$. By Step 2 and Step 3, for all s, the following recursion holds:
$V_i^s\big( \varrho_i^v(k), \varrho_i^q(k) \big) = \min_{u_i(k)} \Big[ M_i\big( \varrho_i^v(k), u_i(k), \varrho_i^q(k) \big) + \beta V_i^s\big( \varrho_i^v(k+1), \varrho_i^q(k+1) \big) \Big]$ (13)
First, we note that for any $\epsilon > 0$, there exists an integer $s_0$ such that for all $s \ge s_0$,
$\big| V_i^s\big( \varrho_i^v(k+1), \varrho_i^q(k+1) \big) - V_i\big( \varrho_i^v(k+1), \varrho_i^q(k+1) \big) \big| < \epsilon.$ (14)
For any admissible $u_i(k)$ and all $s \ge s_0$, it follows that
$V_i\big( \varrho_i^v(k), \varrho_i^q(k) \big) \le \min_{u_i(k)} \Big[ M_i\big( \varrho_i^v(k), u_i(k), \varrho_i^q(k) \big) + \beta V_i\big( \varrho_i^v(k+1), \varrho_i^q(k+1) \big) \Big] + 2\epsilon$ (15)
Similarly, for $s \ge s_0$, we have
$V_i\big( \varrho_i^v(k), \varrho_i^q(k) \big) \ge \min_{u_i(k)} \Big[ M_i\big( \varrho_i^v(k), u_i(k), \varrho_i^q(k) \big) + \beta V_i\big( \varrho_i^v(k+1), \varrho_i^q(k+1) \big) \Big] - \epsilon$ (16)
Combining inequalities (15) and (16), for any $\epsilon > 0$, we have
$\min_{u_i(k)} \Big[ M_i\big( \varrho_i^v(k), u_i(k), \varrho_i^q(k) \big) + \beta V_i\big( \varrho_i^v(k+1), \varrho_i^q(k+1) \big) \Big] - \epsilon \le V_i\big( \varrho_i^v(k), \varrho_i^q(k) \big) \le \min_{u_i(k)} \Big[ M_i\big( \varrho_i^v(k), u_i(k), \varrho_i^q(k) \big) + \beta V_i\big( \varrho_i^v(k+1), \varrho_i^q(k+1) \big) \Big] + 2\epsilon$ (17)
Since $\epsilon > 0$ is arbitrary, by letting $\epsilon \to 0$, we conclude that
$V_i\big( \varrho_i^v(k), \varrho_i^q(k) \big) = \min_{u_i(k)} \Big[ M_i\big( \varrho_i^v(k), u_i(k), \varrho_i^q(k) \big) + \beta V_i\big( \varrho_i^v(k+1), \varrho_i^q(k+1) \big) \Big]$ (18)
For any admissible control $u_i(k)$, a new performance index can be used to equivalently express the problem:
$\Psi_i\big( \varrho_i^v(k), \varrho_i^q(k) \big) = M_i\big( \varrho_i^v(k), u_i(k), \varrho_i^q(k) \big) + \beta \Psi_i\big( \varrho_i^v(k+1), \varrho_i^q(k+1) \big)$ (19)
Furthermore, assume there exists a state $\big( \bar\varrho_i^v, \bar\varrho_i^q \big)$ such that $\Psi_i\big( \bar\varrho_i^v, \bar\varrho_i^q \big) < V_i\big( \bar\varrho_i^v, \bar\varrho_i^q \big)$. By recursively unfolding the definition of $\Psi_i$ over a finite horizon N (and noting that the terminal cost vanishes as $\big( \varrho_i^v(N), \varrho_i^q(N) \big) \to 0$), we obtain
$\Psi_i\big( \varrho_i^v(0), \varrho_i^q(0) \big) = \sum_{t=0}^{N-1} \beta^t M_i\big( \varrho_i^v(t), u_i(t), \varrho_i^q(t) \big).$ (20)
According to the definition of $V_i\big( \varrho_i^v(0), \varrho_i^q(0) \big)$, we have
$V_i\big( \varrho_i^v(0), \varrho_i^q(0) \big) = \min_{u_i(0), \ldots, u_i(N-1)} \sum_{t=0}^{N-1} \beta^t M_i\big( \varrho_i^v(t), u_i(t), \varrho_i^q(t) \big).$ (21)
By the principle of optimality, $V_i\big( \varrho_i^v(0), \varrho_i^q(0) \big)$ is the minimal cost. Therefore,
$V_i\big( \varrho_i^v(0), \varrho_i^q(0) \big) \le \sum_{t=0}^{N-1} \beta^t M_i\big( \varrho_i^v(t), u_i(t), \varrho_i^q(t) \big) = \Psi_i\big( \varrho_i^v(0), \varrho_i^q(0) \big)$ (22)
This contradicts our previous assumption. Thus, it must hold that $\Psi_i\big( \varrho_i^v(k), \varrho_i^q(k) \big) \ge V_i\big( \varrho_i^v(k), \varrho_i^q(k) \big)$ for all k. Hence, $V_i\big( \varrho_i^v(k), \varrho_i^q(k) \big)$ serves as a global lower bound on the cost of any admissible policy, with equality attained under the optimal policy, i.e.,
$V_i\big( \varrho_i^v(k), \varrho_i^q(k) \big) = \Psi_i^*\big( \varrho_i^v(k), \varrho_i^q(k) \big) = V_i^*\big( \varrho_i^v(k), \varrho_i^q(k) \big).$ (23)
Similarly, it holds that $V_i^s\big( \varrho_i^v(k), \varrho_i^q(k) \big) \ge V_i^*\big( \varrho_i^v(k), \varrho_i^q(k) \big)$ for any iteration s. Taking the limit as $s \to \infty$, we have $V_i\big( \varrho_i^v(k), \varrho_i^q(k) \big) \ge V_i^*\big( \varrho_i^v(k), \varrho_i^q(k) \big)$. On the other hand, (23) shows that $V_i\big( \varrho_i^v(k), \varrho_i^q(k) \big)$ coincides with the minimal cost achievable by any admissible policy, so it cannot exceed $V_i^*\big( \varrho_i^v(k), \varrho_i^q(k) \big)$. Therefore, the following equality holds: $\lim_{s \to \infty} V_i^s\big( \varrho_i^v(k), \varrho_i^q(k) \big) = V_i^*\big( \varrho_i^v(k), \varrho_i^q(k) \big)$.
This completes the proof. □
This algorithm ensures voltage convergence to the optimal values under containment control and achieves accurate reactive power sharing.

3.2. Stability Analysis of Coordinated Voltage and Reactive Power Control

Section 3.1 proved that the policy iteration algorithm converges to the optimal control policy $u_i^*(k)$. This section demonstrates that applying this optimal policy ensures the asymptotic stability of the closed-loop system. Specifically, we prove that the containment voltage error $\varrho_i^v(k)$ and the reactive-power-sharing error $\varrho_i^q(k)$ converge to zero.
To analyze the stability, we employ a Lyapunov-based approach. The optimal value function V i * ( ϱ i v ( k ) , ϱ i q ( k ) ) derived from the Bellman equation serves as a natural candidate for a Lyapunov function for the closed-loop error dynamics of the ith DG.
Theorem 2.
For the error dynamics described by (7) and (9), if the control policy $u_i^*(k)$ is obtained from the converged policy iteration algorithm described in Section 3.1, then the closed-loop system is asymptotically stable at the origin, i.e., $\lim_{k \to \infty} \varrho_i^v(k) = 0$ and $\lim_{k \to \infty} \varrho_i^q(k) = 0$.
Proof. 
The optimal value function $V_i^*\big( \varrho_i^v(k), \varrho_i^q(k) \big)$ satisfies the Bellman optimality equation for the optimal policy $u_i^*(k)$:
$V_i^*\big( \varrho_i^v(k), \varrho_i^q(k) \big) = M_i\big( \varrho_i^v(k), u_i^*(k), \varrho_i^q(k) \big) + \beta V_i^*\big( \varrho_i^v(k+1), \varrho_i^q(k+1) \big)$ (24)
where $M_i\big( \varrho_i^v(k), u_i^*(k), \varrho_i^q(k) \big) = \varrho_i^{vT}(k) \Theta_{1ii} \varrho_i^v(k) + u_i^{*T}(k) \Theta_{2ii} u_i^*(k) + \varrho_i^{qT}(k) \Theta_{3ii} \varrho_i^q(k)$.
According to Definition 1, for any admissible control policy, the performance index $J_i$ must be bounded. The optimal policy $u_i^*(k)$ is, by definition, an admissible policy. Therefore, the optimal value function $V_i^*\big( \varrho_i^v(0), \varrho_i^q(0) \big)$, which is the minimum possible value of $J_i$, must be finite:
$V_i^*\big( \varrho_i^v(0), \varrho_i^q(0) \big) = J_i^* = \sum_{t=0}^{\infty} \beta^t M_i\big( \varrho_i^v(t), u_i^*(t), \varrho_i^q(t) \big) < \infty$ (25)
From the definition of $M_i$, since the weighting matrices $\Theta_{1ii}$, $\Theta_{2ii}$, and $\Theta_{3ii}$ are all positive definite, $M_i\big( \varrho_i^v(k), u_i^*(k), \varrho_i^q(k) \big) \ge 0$ for all k. The equality $M_i = 0$ holds if and only if $\varrho_i^v(k) = 0$, $\varrho_i^q(k) = 0$, and $u_i^*(k) = 0$.
For the infinite series in (25) to converge to a finite value with a discount factor $0 < \beta \le 1$, it is a necessary condition that the terms of the series approach zero, that is:
$\lim_{k \to \infty} \beta^k M_i\big( \varrho_i^v(k), u_i^*(k), \varrho_i^q(k) \big) = 0$ (26)
Since $\beta$ is a constant, this implies:
$\lim_{k \to \infty} M_i\big( \varrho_i^v(k), u_i^*(k), \varrho_i^q(k) \big) = 0$
Given that $M_i$ is a sum of non-negative terms, for their sum to be zero, each individual term must be zero. Therefore, we must have:
$\lim_{k \to \infty} \varrho_i^{vT}(k) \Theta_{1ii} \varrho_i^v(k) = 0, \quad \lim_{k \to \infty} u_i^{*T}(k) \Theta_{2ii} u_i^*(k) = 0, \quad \lim_{k \to \infty} \varrho_i^{qT}(k) \Theta_{3ii} \varrho_i^q(k) = 0$
Since $\Theta_{1ii}$ and $\Theta_{3ii}$ are positive definite matrices, this directly leads to the conclusion that the error states converge to zero:
$\lim_{k \to \infty} \varrho_i^v(k) = 0, \quad \lim_{k \to \infty} \varrho_i^q(k) = 0$
This demonstrates that the origin of the error system is asymptotically stable under the optimal control policy $u_i^*(k)$. □

3.3. Dynamic Leader Election Algorithm

Containment control maintains voltage safety in microgrids by enforcing upper and lower bounds. However, conventional approaches usually predefine the upper-bound leader. If this leader experiences a heavy load, its voltage may decrease, violating the highest-voltage assumption and impairing effective reactive power sharing.
To dynamically adjust the containment control leader based on bus voltage, each DG must access the voltage of all DG-connected buses. However, due to the distributed communication architecture in microgrids, non-adjacent DGs cannot directly share information. Thus, a bus voltage estimation algorithm is required to enable indirect acquisition of voltage data among non-adjacent buses.
Let $\hat\chi_i = [\hat\chi_{i1}, \hat\chi_{i2}, \ldots, \hat\chi_{iN}]^T \in \mathbb{R}^N$ denote the vector of bus voltage estimates held by the ith DG, where $\hat\chi_{ij}$ represents the ith DG's estimate of the voltage at the bus to which the jth DG is connected; furthermore, $\chi_i = \Xi_{i,1}$. Then, the update rule for the estimated value $\hat\chi_{ij}$, $i, j \in \{1, \ldots, N\}$, takes the form
$\dot{\hat\chi}_{ij} = -\sum_{k=1}^{N} a_{ik} \big( \hat\chi_{ij} - \hat\chi_{kj} \big) - a_{ij} \big( \hat\chi_{ij} - \chi_j \big)$ (27)
where $a_{ij}$ specifies the $(i, j)$ entry of the adjacency matrix. The first term, $a_{ik}\big( \hat\chi_{ij} - \hat\chi_{kj} \big)$, represents the difference between the ith DG's estimate and its neighboring kth DG's estimate of the bus voltage at the jth DG. The second term, $a_{ij}\big( \hat\chi_{ij} - \chi_j \big)$, captures the error between the ith DG's estimate and the actual bus voltage at the jth DG.
During each iteration of the microgrid controller, each DG estimates the voltages at all buses according to (27). Based on the estimated vector $\hat\chi_i = [\hat\chi_{i1}, \hat\chi_{i2}, \ldots, \hat\chi_{iN}]^T$, the ith DG determines whether it is elected as an upper- or lower-bound leader. The specific rules are given as follows:
  • If $\hat\chi_{ii}$ is the maximum value in $\hat\chi_i$, then node i is selected as the upper-bound leader.
  • If $\hat\chi_{ii}$ is the minimum value in $\hat\chi_i$, then node i is selected as the lower-bound leader.
  • If $\hat\chi_{ii}$ is neither the maximum nor the minimum value in $\hat\chi_i$, node i is not selected as a leader.
  • If multiple nodes share the value $\hat\chi_{ii}$ and this value is the maximum or minimum in $\hat\chi_i$, node i is elected as the upper- or lower-bound leader only if its index i is the smallest among them; otherwise, it is not selected as a leader.
Other nodes follow the same procedure to determine their leadership status. A minimal sketch of the estimator update and the election rules is given below, after which we prove the convergence of the bus voltage estimation algorithm.
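The following Python sketch implements one explicit-Euler step of the estimator (27) and the four election rules; the array conventions, step size, and tie-break tolerance are assumptions for illustration.

```python
import numpy as np

def estimator_step(chi_hat, chi, A, dt):
    """One Euler step of (27). chi_hat[i, j] is DG i's estimate of the
    voltage at bus j; chi[j] is the actual bus voltage at DG j."""
    N = A.shape[0]
    deriv = np.zeros_like(chi_hat)
    for i in range(N):
        for j in range(N):
            neighbor_term = sum(A[i, k] * (chi_hat[i, j] - chi_hat[k, j])
                                for k in range(N))
            local_term = A[i, j] * (chi_hat[i, j] - chi[j])
            deriv[i, j] = -neighbor_term - local_term
    return chi_hat + dt * deriv

def elect_leaders(chi_hat_i, i, tol=1e-6):
    """Election rules for DG i from its own estimate vector, with ties
    broken by the smallest index, as described in the text."""
    est = np.asarray(chi_hat_i)
    upper_set = np.flatnonzero(est >= est.max() - tol)  # holders of the max
    lower_set = np.flatnonzero(est <= est.min() + tol)  # holders of the min
    return i == upper_set[0], i == lower_set[0]
```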
Theorem 3.
Consider the estimator dynamics for bus voltage in a microgrid, given by (27). Under Assumption A1, it holds that $\lim_{t \to \infty} \hat\chi_{ij}(t) = \chi_j$ for all $i, j = 1, \ldots, N$.
Proof. 
Introduce the error variable $\varpi_{ij}(t)$, defined by $\varpi_{ij}(t) = \hat\chi_{ij}(t) - \chi_j$, where $\varpi_{ij}(t)$ represents the estimation error of the bus voltage at the jth DG, as estimated by the ith DG. Since $\chi_j$ is constant or varies very slowly in steady state, it follows that $\dot\varpi_{ij}(t) = \dot{\hat\chi}_{ij}(t)$.
Substituting the system dynamics yields:
$\dot\varpi_{ij} = -\sum_{k=1}^{N} a_{ik} \big( \hat\chi_{ij} - \hat\chi_{kj} \big) - a_{ij} \big( \hat\chi_{ij} - \chi_j \big).$ (28)
By substituting the relation ϖ k j = χ ^ k j χ j into the above equation, the error dynamics can be expressed as
$\dot\varpi_{ij} = -\sum_{k=1}^{N} a_{ik} \big( \varpi_{ij} - \varpi_{kj} \big) - a_{ij} \varpi_{ij} = -\Big( \sum_{k=1}^{N} a_{ik} + a_{ij} \Big) \varpi_{ij} + \sum_{k=1}^{N} a_{ik} \varpi_{kj}$ (29)
Next, define the Lyapunov function as
$V(t) = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \varpi_{ij}(t)^2,$ (30)
which satisfies $V(t) \ge 0$ for all t, with $V(t) = 0$ only when $\varpi_{ij}(t) = 0$ for all $i, j$.
By differentiating (30) with respect to time, we obtain
$\dot V(t) = \sum_{i=1}^{N} \sum_{j=1}^{N} \varpi_{ij} \dot\varpi_{ij} = -\sum_{i=1}^{N} \sum_{j=1}^{N} \varpi_{ij} \Big[ \sum_{k=1}^{N} a_{ik} \big( \varpi_{ij} - \varpi_{kj} \big) + a_{ij} \varpi_{ij} \Big]$ (31)
To facilitate further analysis, we separate the right-hand side of (31) into two components and define
$V_1(t) = -\sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} a_{ik} \varpi_{ij} \big( \varpi_{ij} - \varpi_{kj} \big) \quad \text{and} \quad V_2(t) = -\sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij} \varpi_{ij}^2.$
Noting that $a_{ik} = a_{ki}$, we exchange the indices i and k in $V_1(t)$ to obtain an equivalent form. Summing the two expressions yields
$2 V_1(t) = -\sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} a_{ik} \big( \varpi_{ij} - \varpi_{kj} \big)^2,$ (32)
which implies that
$V_1(t) = -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} a_{ik} \big( \varpi_{ij} - \varpi_{kj} \big)^2 \le 0.$ (33)
Since $a_{ij} \ge 0$ and every $\varpi_{ij}^2$ is nonnegative, we also have $V_2(t) \le 0$. Therefore, $\dot V(t) = V_1(t) + V_2(t) \le 0$, which, by LaSalle's invariance principle together with the connectivity of the communication graph implied by Assumption A1, shows that $\varpi_{ij} \to 0$ as $t \to \infty$, and consequently, $\hat\chi_{ij}(t) \to \chi_j$. Therefore, the ith DG's estimation error for the jth DG's bus voltage gradually converges to zero.
This completes the proof. □
Remark 2.
In direct contrast to conventional containment control methods based on fixed leaders, as proposed in [10,11], this work addresses their well-known limitation in handling complex scenarios like sudden large load changes. The fixed-leader approach often fails to ensure accurate reactive power sharing under such conditions. To overcome this specific flaw, this paper introduces a novel DLE algorithm. This mechanism, based on bus voltage estimation, allows each DG to dynamically select the leader according to real-time operating conditions. By doing so, it enables accurate reactive power sharing precisely where the conventional method falters, providing a clear, practical demonstration of its superiority over the static, fixed-leader approach.

3.4. RL-Based Containment Control Implementation

To ensure voltage containment and accurate reactive power sharing, this section proposes a control method based on actor–critic reinforcement learning. The actor network generates the control policy, while the critic network evaluates and guides its optimization. Through online iteration, the algorithm converges to the optimal control. The implementation structure is shown in Figure 2.

3.4.1. Critic Network

The critic network is designed to approximate the optimal value function, expressed as $V_i^s(k) = V_i^s\big( \varrho_i^v(k), u_i(k), \varrho_i^q(k) \big)$. This network adopts a three-layer back-propagation neural architecture. Define the input vector of the critic as $\upsilon_i(k) = [\varrho_i^v(k), \varrho_i^q(k)]^T$, let $N_c$ denote the number of neurons in the hidden layer, and let $\omega_{c1}$ and $\omega_{c2}$ represent the weight matrices of the hidden and output layers, respectively. Accordingly, the hidden layer is supplied with $\varsigma_{c1}(k) = \omega_{c1} [\upsilon_i(k), u_i(k)]^T$ as its input. A hyperbolic tangent activation function $\psi$ is employed in the hidden layer to capture smooth nonlinear relationships, with $\psi\big( \varsigma_{c1}(k) \big) = \frac{1 - \exp(-\varsigma_{c1}(k))}{1 + \exp(-\varsigma_{c1}(k))}$. The corresponding hidden layer output is $\Lambda_c(k) = \psi\big( \varsigma_{c1}(k) \big)$. Ultimately, the output of the critic network at time k is $V_i^s(k) = \omega_{c2} \Lambda_c(k)$.
The error term for the critic network is defined by $e_c(k) = V_i^{s+1}(k) - \big[ M_i(k) + \beta V_i^s(k+1) \big]$. To train the critic network, gradient descent is employed to minimize this error, resulting in the following objective function: $\min_{\omega_{c1}, \omega_{c2}} E_c(k) = \min_{\omega_{c1}, \omega_{c2}} \frac{1}{2} [e_c(k)]^2$.
The iterative update laws for the weights $\omega_{c1}$ and $\omega_{c2}$ are given by
$\omega_{c1}^{l+1}(k) = \omega_{c1}^{l}(k) + \Delta\omega_{c1}^{l}(k), \quad \Delta\omega_{c1}^{l}(k) = -\tau \frac{\partial E_c^l(k)}{\partial \omega_{c1}^l(k)}$ (34)
$\frac{\partial E_c^l(k)}{\partial \omega_{c1}^l(k)} = \frac{\partial E_c^l(k)}{\partial e_c^l(k)} \frac{\partial e_c^l(k)}{\partial V_{i,l}^{s+1}(k)} \frac{\partial V_{i,l}^{s+1}(k)}{\partial \Lambda_c(k)} \frac{\partial \Lambda_c(k)}{\partial \varsigma_{c1}(k)} \frac{\partial \varsigma_{c1}(k)}{\partial \omega_{c1}^l(k)}$ (35)
where l is the neural network iteration index, and τ represents the learning rate. For the output layer weights:
$\omega_{c2}^{l+1}(k) = \omega_{c2}^{l}(k) + \Delta\omega_{c2}^{l}(k), \quad \Delta\omega_{c2}^{l}(k) = -\tau \frac{\partial E_c^l(k)}{\partial \omega_{c2}^l(k)}$ (36)
$\frac{\partial E_c^l(k)}{\partial \omega_{c2}^l(k)} = \frac{\partial E_c^l(k)}{\partial e_c^l(k)} \frac{\partial e_c^l(k)}{\partial V_{i,l}^{s+1}(k)} \frac{\partial V_{i,l}^{s+1}(k)}{\partial \omega_{c2}^l(k)}$ (37)
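Concretely, updates (34)–(37) amount to backpropagation through the two layers. The numpy sketch below is illustrative only: layer sizes, the learning rate, and initialization are assumptions, and `target` stands for the Bellman target $M_i(k) + \beta V_i^s(k+1)$ computed by the caller.

```python
import numpy as np

# Activation used in the text: psi(s) = (1 - e^{-s}) / (1 + e^{-s}),
# which equals tanh(s / 2); its derivative is psi'(s) = (1 - psi(s)^2) / 2.
psi = lambda s: (1.0 - np.exp(-s)) / (1.0 + np.exp(-s))

class Critic:
    """Three-layer critic sketch: V(k) = w_c2 @ psi(w_c1 @ [rho_v, rho_q, u])."""
    def __init__(self, n_in, n_hidden=5, tau=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.wc1 = 0.1 * rng.standard_normal((n_hidden, n_in))
        self.wc2 = 0.1 * rng.standard_normal((1, n_hidden))
        self.tau = tau

    def value(self, x):
        return float(self.wc2 @ psi(self.wc1 @ x))

    def update(self, x, target):
        """One gradient step on E_c = 0.5 * e_c^2."""
        h = psi(self.wc1 @ x)
        e_c = float(self.wc2 @ h) - target
        grad_wc2 = e_c * h[None, :]                                        # cf. (37)
        grad_wc1 = e_c * np.outer(self.wc2.ravel() * 0.5 * (1 - h**2), x)  # cf. (35)
        self.wc2 -= self.tau * grad_wc2
        self.wc1 -= self.tau * grad_wc1
```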

3.4.2. Actor Network

The actor network is constructed to approximate the optimal control policy $u_i^*$. It is implemented as a three-layer neural network, where the hidden layer consists of $N_a$ neurons and employs the hyperbolic tangent activation function. Denote $\omega_{a1}$ and $\omega_{a2}$ as the weight matrices of the hidden and output layers, respectively. The hidden layer output can be expressed as $\Lambda_a(k) = \psi\big( \varsigma_{a1}(k) \big) = \frac{1 - \exp(-\varsigma_{a1}(k))}{1 + \exp(-\varsigma_{a1}(k))}$, where $\varsigma_{a1}(k) = \omega_{a1} \upsilon_i(k)$. The final output of the network is given by $u_i^s(k) = \frac{1 - \exp(-\vartheta_{ia}(k))}{1 + \exp(-\vartheta_{ia}(k))}$, with $\vartheta_{ia}(k) = \omega_{a2} \Lambda_a(k)$.
By continuously adjusting the parameters $\omega_{a1}$ and $\omega_{a2}$, the network aims to derive the optimal control input based on $\upsilon_i(k)$. The parameter update is guided by minimizing the objective function $\min_{\omega_{a1}, \omega_{a2}} E_a(k) = \min_{\omega_{a1}, \omega_{a2}} \frac{1}{2} [e_a(k)]^2$, where the error term is defined by $e_a(k) = \eta V_i^{s+1}(k+1) + \gamma \| u_i^{s+1}(k) \|^2$, with $\eta$ and $\gamma$ being positive weighting coefficients.
The purpose of the optimization problem is to ensure that the actor network produces an optimal control action that minimizes the value function. Similar to the critic network, the weights $\omega_{a1}$ and $\omega_{a2}$ are updated using gradient descent:
$\omega_{a1}^{l+1}(k) = \omega_{a1}^{l}(k) + \Delta\omega_{a1}^{l}(k), \quad \Delta\omega_{a1}^{l}(k) = -\tau \frac{\partial E_a^l(k)}{\partial \omega_{a1}^l(k)}$ (38)
$\frac{\partial E_a^l(k)}{\partial \omega_{a1}^l(k)} = \frac{\partial E_a^l(k)}{\partial e_a^l(k)} \frac{\partial e_a^l(k)}{\partial u_i^{s+1}(k)} \frac{\partial u_i^{s+1}(k)}{\partial \omega_{a1}^l(k)} + \frac{\partial E_a^l(k)}{\partial e_a^l(k)} \frac{\partial e_a^l(k)}{\partial V_{i,l}^{s+1}(k+1)} \frac{\partial V_{i,l}^{s+1}(k+1)}{\partial u_i^{s+1}(k)} \frac{\partial u_i^{s+1}(k)}{\partial \omega_{a1}^l(k)}$ (39)
Similarly, the weights of the output layer are updated by
$\omega_{a2}^{l+1}(k) = \omega_{a2}^{l}(k) + \Delta\omega_{a2}^{l}(k), \quad \Delta\omega_{a2}^{l}(k) = -\tau \frac{\partial E_a^l(k)}{\partial \omega_{a2}^l(k)}$ (40)
$\frac{\partial E_a^l(k)}{\partial \omega_{a2}^l(k)} = \frac{\partial E_a^l(k)}{\partial e_a^l(k)} \frac{\partial e_a^l(k)}{\partial u_i^{s+1}(k)} \frac{\partial u_i^{s+1}(k)}{\partial \omega_{a2}^l(k)} + \frac{\partial E_a^l(k)}{\partial e_a^l(k)} \frac{\partial e_a^l(k)}{\partial V_{i,l}^{s+1}(k+1)} \frac{\partial V_{i,l}^{s+1}(k+1)}{\partial \omega_{a2}^l(k)}$ (41)
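A corresponding sketch of the actor updates (38)–(41) follows; the critic is assumed to supply $V_i^{s+1}(k+1)$ and its sensitivity to the control (passed in as `V_next` and `dV_du`, an interface assumption), and $\eta$, $\gamma$, sizes, and rates take illustrative values.

```python
import numpy as np

psi = lambda s: (1.0 - np.exp(-s)) / (1.0 + np.exp(-s))  # as in the critic sketch

class Actor:
    """Three-layer actor sketch: u(k) = psi(w_a2 @ psi(w_a1 @ x))."""
    def __init__(self, n_in, n_hidden=5, tau=0.05, eta=1.0, gamma=0.1, seed=1):
        rng = np.random.default_rng(seed)
        self.wa1 = 0.1 * rng.standard_normal((n_hidden, n_in))
        self.wa2 = 0.1 * rng.standard_normal((1, n_hidden))
        self.tau, self.eta, self.gamma = tau, eta, gamma

    def act(self, x):
        return float(psi(self.wa2 @ psi(self.wa1 @ x)))

    def update(self, x, V_next, dV_du):
        """One gradient step on E_a = 0.5 * e_a^2 with
        e_a = eta * V(k+1) + gamma * u^2, per the definition above."""
        h = psi(self.wa1 @ x)
        u = float(psi(self.wa2 @ h))
        e_a = self.eta * V_next + self.gamma * u**2
        dE_du = e_a * (self.eta * dV_du + 2.0 * self.gamma * u)  # two paths of (39)
        du_dwa2 = 0.5 * (1 - u**2) * h[None, :]
        du_dwa1 = 0.5 * (1 - u**2) * np.outer(
            self.wa2.ravel() * 0.5 * (1 - h**2), x)
        self.wa2 -= self.tau * dE_du * du_dwa2
        self.wa1 -= self.tau * dE_du * du_dwa1
```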
Remark 3.
As a typical nonlinear system, island microgrids present challenges for model-based controller design due to difficulties in obtaining practical parameters such as resistance and inductance [17,18]. These methods are also sensitive to measurement errors, further complicating controller design. In contrast to [18], this paper proposes a data-driven online reinforcement learning approach that does not require extensive offline data processing. The control policy is iteratively optimized by minimizing the value function, enabling accurate reactive power sharing and effective voltage control in the microgrid.
Remark 4.
From a practical standpoint, the proposed DLE and model-free RL framework is designed to address key operational challenges in real-world offshore microgrids. First, the DLE algorithm provides crucial operational flexibility and resilience. It enables the microgrid to autonomously adapt to the harsh and dynamic conditions of offshore environments (e.g., sudden load changes, volatile renewables), overcoming the rigidity of fixed-leader schemes to ensure stability without manual intervention. Second, the model-free RL controller eliminates the reliance on an accurate system model, which is a significant practical challenge because obtaining line parameters is often both difficult and costly. By learning directly from measurement data, our approach simplifies deployment, reduces commissioning costs, and enhances robustness against parameter uncertainties and system aging. Collectively, these features make the proposed framework not only technically effective but also practical, cost-efficient, and resilient for real-world offshore applications.

4. Simulation Results

As shown in Figure 3, the offshore island AC microgrid system consisted of four renewable generation units. The algorithm proposed in this paper was validated on a simulation model built on the Simulink platform. The validation strategy across the following cases was deliberately designed to highlight the practical value of the proposed model-free approach. Instead of a direct numerical comparison against a model-based controller, which can be misleading (as its performance is entirely dependent on an idealized, perfectly accurate model that is unavailable in reality), our validation focused on two key aspects. First, in Case 4.1, we conducted a head-to-head comparison with a conventional fixed-leader method [10] to demonstrate that our DLE algorithm solved a fundamental operational flaw. Second, in Cases 4.2 and 4.3, we verified that our model-free controller robustly achieved all control objectives under challenging conditions (load changes and plug-and-play), thereby proving its effectiveness and practical viability on its own terms. Following the approach in [10], the allowable voltage deviation was set to ±1%, and the rated bus voltage $V_{bus}$ was selected as 311 V, which served as the system design objective. Specific simulation parameters are provided in Table 1, and other related parameters were as follows: the discount factor $\beta$ was selected as 0.98. The performance index employed the following weighting parameters: both $\Theta_{1ii}$ and $\Theta_{3ii}$ were diagonal matrices with diagonal elements equal to 1, and $\Theta_{2ii} = 0.1$. Both the actor and critic networks employed five hidden neurons.

4.1. Dynamic Leader Election

This case was designed to validate the effectiveness of the proposed dynamic leader election (DLE) algorithm. To achieve this, we first established a benchmark scenario from t = 0 s to t = 10 s by implementing the fixed-leader containment control described in [10]. The purpose of this benchmark was to replicate a well-known limitation of conventional methods. As shown in Figure 4, during this benchmark period, while the system successfully maintained voltage containment (i.e., all bus voltages remained within the safe range), the reactive-power-sharing ratio failed to achieve the desired 2 : 1 : 2 : 1 . This outcome was an expected consequence of the fixed-leader topology: since DG1 was designated as the upper-limit leader, its bus voltage was consistently maintained at the highest level, which inherently restricted reactive power flow and prevented equitable sharing among the DGs. This scenario effectively highlighted the specific problem that the data-driven model-free approach was designed to overcome without relying on pre-configured roles or precise system parameters. Subsequently, at t = 10 s , the load distribution was changed by transferring the load on Bus 3 to Bus 2. With the leaders still fixed as DG1 and DG4, it can be observed from Figure 4 that although the microgrid remained effective in voltage containment control, reactive power sharing was still not fully achieved.
At t = 20 s , the proposed dynamic leader election algorithm was enabled. As shown in Figure 4, the dynamic leader election algorithm allowed the upper-limit leader to be automatically elected as DG4 and the lower-limit leader to be automatically elected as DG1. Under the effect of the dynamic leader election algorithm, the microgrid not only achieved voltage containment control but also brought the reactive-power-sharing ratio to 2 : 1 : 2 : 1 , successfully achieving precise reactive power sharing. At t = 30 s , the load on Bus 2 was transferred back to Bus 3. From Figure 4, it can be observed that after a brief transient process, the microgrid once again achieved voltage containment control, and the reactive-power-sharing ratio was restored to 2 : 1 : 2 : 1 , ensuring precise reactive power sharing. The results confirm that the proposed algorithm adaptively selects leaders based on bus voltage magnitude.

4.2. Load Variation

Through simulation and comparative experiments, this study verified the effectiveness of the proposed dynamic leader election algorithm and model-free reinforcement learning algorithm. The proposed approach achieved both voltage recovery and accurate reactive power sharing. First, from t = 0 s to t = 5 s , the microgrid employed only the conventional PI control strategy described in [7], as shown in Figure 5. During that phase, the load on Bus 1 of the microgrid was 8 kW , and the load on Bus 3 was 10 kW . It can be seen that the microgrid voltage was not restored to the safe level, nor was the reactive-power-sharing ratio precisely maintained at 2 : 1 : 2 : 1 .
At t = 5 s , the proposed dynamic leader election and model-free reinforcement learning algorithms were enabled. As shown in Figure 5, the voltage quickly recovered to within the safe constraint range, and the reactive-power-sharing ratio also reached 2 : 1 : 2 : 1 , achieving accurate reactive power sharing. To further validate the robustness of the algorithms under load variation, at t = 10 s , the load on Bus 3 (DG3) was increased by 3 kW . From Figure 5, it can be observed that after experiencing a brief transient process, the microgrid voltage returned to the steady state and remained within the safe range. Simultaneously, the reactive-power-sharing ratio again reached 2 : 1 : 2 : 1 , achieving precise reactive power sharing. Figure 6 shows the evolution of actor–critic neural network weights for DG1 during the simulation. All weights converged to stable values, as illustrated.

4.3. Plug-And-Play Capability

The plug-and-play capability of microgrids enables the rapid integration or removal of DGs, allowing the system to adapt to load changes and equipment failures, thereby improving overall flexibility and scalability. To comprehensively and realistically validate the plug-and-play performance of the proposed algorithm, this section designs an experiment that includes a “plug-out” event and a “plug-in” process that mimics real-world engineering scenarios.
The simulation results are shown in Figure 7. During the time period from t = 0 s to t = 10 s , the microgrid operated stably with all four DGs. The proposed algorithms achieved voltage containment control and accurate reactive power sharing with a ratio of 2 : 1 : 2 : 1 . At t = 10 s , DG4 was disconnected (plug-out) to simulate its removal from operation. The simulation results show that after DG4 was removed, the power deficit was automatically compensated by the remaining DGs. The system voltage, after a brief transient, quickly stabilized and remained within the safe constraint range. Meanwhile, reactive power was redistributed among the three remaining DGs, reaching a new stable sharing ratio of 2 : 1 : 2 . To simulate the reconnection process of a DG, at t = 20 s , DG4 initiated the synchronization process with the microgrid. During that phase, DG4 adjusted its output voltage frequency, phase, and amplitude to match the microgrid’s parameters in preparation for grid connection. At t = 25 s , upon successful synchronization, DG4 was physically connected to the microgrid, and its controller was activated. As observed in Figure 7, the system seamlessly reintegrated DG4. The voltage remained stable, and the reactive-power-sharing ratio, after a short dynamic adjustment, was accurately restored to the initial 2 : 1 : 2 : 1 state.
This complete test, encompassing both a plug-out event and a realistic plug-in process, robustly demonstrates that the proposed dynamic leader election and model-free reinforcement learning algorithms provide the microgrid with plug-and-play capability, ensuring safe and stable operation under dynamic topological changes.

5. Conclusions

This paper developed a secondary control method for offshore island microgrids based on a model-free reinforcement learning algorithm and a dynamic leader election mechanism. First, by combining the microgrid’s voltage containment error and reactive power sharing error, a value function for policy iteration was constructed. Then, a dynamic leader election algorithm was designed, enabling different DGs to be dynamically elected as leaders to facilitate accurate reactive power allocation. Subsequently, a model-free reinforcement learning algorithm was developed, which relied solely on real-time measurements of voltage and reactive power without requiring a complex system model.
However, it is important to acknowledge that this study was conducted under the assumption of an ideal island microgrid model, where factors such as communication delays, external disturbances, and potential cyber-attacks were not considered. Communication delays, which are inherent in distributed control systems, could introduce time lags in the information exchange among DGs. This might affect the timeliness of the dynamic leader election process and degrade the performance of the reinforcement learning algorithm, potentially leading to oscillations or even instability. Similarly, other disturbances, such as measurement noise and unmodeled dynamics, could impact the accuracy of the data-driven RL algorithm, which is highly dependent on the quality of measurement data. Addressing these practical challenges is crucial for real-world implementation. Therefore, these aspects will be the focus of our future work. We plan to investigate and develop more robust control strategies that can tolerate communication delays and are resilient to various disturbances. This may involve integrating predictive control mechanisms or designing delay-compensation techniques within the RL framework. To validate the effectiveness and robustness of the enhanced methods, we intend to conduct more comprehensive hardware-in-the-loop simulations or tests on a physical experimental platform.

Author Contributions

Formal analysis, X.Y.; Funding acquisition, Z.W.; Investigation, S.W.; Project administration, Z.W.; Supervision, Z.W.; Validation, X.Y.; Visualization, Q.W.; Writing—original draft, X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under No. 62373089.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Shuran Wang was employed by the State Grid Jilin Electric Power Co., Ltd. Changchun Power Supply Company. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Wang, S.; Wang, Z.; Liu, X.; Ye, X. An SoC-based bidirectional virtual DC machine control for energy storage systems in offshore isolated island DC microgrids. J. Mar. Sci. Eng. 2023, 11, 1502.
  2. Wang, F.; Teng, F.; Xiao, G.; He, Y.; Feng, Q. Resilient distributed secondary control strategy for polymorphic seaport microgrid against estimation-dependent FDI attacks. J. Mar. Sci. Eng. 2022, 10, 1668.
  3. Zhou, J.; Weng, Z.; Li, J.; Song, X. Reliability evaluation, planning, and economic analysis of microgrid with access to renewable energy and electric vehicles. Electr. Power Syst. Res. 2024, 230, 110252.
  4. Nasirian, V.; Shafiee, Q.; Guerrero, J.M.; Lewis, F.L.; Davoudi, A. Droop-free distributed control for AC microgrids. IEEE Trans. Power Electron. 2015, 31, 1600–1617.
  5. Li, W.; Zhao, H.; Zhu, J.; Yang, T. A novel reactive power sharing control strategy for shipboard microgrids based on deep reinforcement learning. J. Mar. Sci. Eng. 2025, 13, 718.
  6. Ahmed, K.; Seyedmahmoudian, M.; Mekhilef, S.; Mubarak, N.; Stojcevski, A. A review on primary and secondary controls of inverter-interfaced microgrid. J. Mod. Power Syst. Clean Energy 2020, 9, 969–985.
  7. Shafiee, Q.; Guerrero, J.M.; Vasquez, J.C. Distributed secondary control for islanded microgrids—A novel approach. IEEE Trans. Power Electron. 2013, 29, 1018–1031.
  8. Xiao, H.; Liu, G.; Huang, J.; Hou, S.; Zhu, L. Parameterized and centralized secondary voltage control for autonomous microgrids. Int. J. Electr. Power Energy Syst. 2022, 135, 107531.
  9. Mohiuddin, S.M.; Qi, J. Optimal distributed control of AC microgrids with coordinated voltage regulation and reactive power sharing. IEEE Trans. Smart Grid 2022, 13, 1789–1800.
  10. Han, R.; Meng, L.; Ferrari-Trecate, G.; Coelho, E.A.A.; Vasquez, J.C.; Guerrero, J.M. Containment and consensus-based distributed coordination control to achieve bounded voltage and precise reactive power sharing in islanded AC microgrids. IEEE Trans. Ind. Appl. 2017, 53, 5187–5199.
  11. Zhai, M.-N.; Sun, J. Distributed critical bus voltage regulation control for multimicrogrids with positive minimum interevent times. IEEE Trans. Ind. Inform. 2023, 20, 5774–5783.
  12. Zhai, M.; Sun, Q.; Wang, R.; Zhang, H. Containment-based multiple PCC voltage regulation strategy for communication link and sensor faults. IEEE/CAA J. Autom. Sin. 2023, 10, 2045–2055.
  13. Xia, Y.; Xu, Y.; Wang, Y.; Mondal, S.; Dasgupta, S.; Gupta, A.K. Optimal secondary control of islanded AC microgrids with communication time-delay based on multi-agent deep reinforcement learning. CSEE J. Power Energy Syst. 2022, 9, 1301–1311.
  14. Toro, V.; Tellez-Castro, D.; Mojica-Nava, E.; Rakoto-Ravalontsalama, N. Data-driven distributed voltage control for microgrids: A Koopman-based approach. Int. J. Electr. Power Energy Syst. 2023, 145, 108636.
  15. Huang, Y.; Liu, G.-P.; Yu, Y.; Hu, W. Data-driven distributed predictive control for voltage regulation and current sharing in DC microgrids with communication constraints. IEEE Trans. Cybern. 2024, 54, 4998–5011.
  16. Zholbaryssov, M.; Dominguez-Garcia, A.D. Safe data-driven secondary control of distributed energy resources. IEEE Trans. Power Syst. 2021, 36, 5933–5943.
  17. Bidram, A.; Davoudi, A.; Lewis, F.L.; Guerrero, J.M. Distributed cooperative secondary control of microgrids using feedback linearization. IEEE Trans. Power Syst. 2013, 28, 3462–3470.
  18. Gu, W.; Lou, G.; Tan, W.; Yuan, X. A nonlinear state estimator-based decentralized secondary voltage control scheme for autonomous microgrids. IEEE Trans. Power Syst. 2017, 32, 4794–4804.
  19. Lin, S.-W.; Chu, C.-C. Distributed Q-learning-based voltage restoration algorithm in isolated AC microgrids subject to input saturation. IEEE Trans. Ind. Appl. 2024, 60, 5447–5459.
  20. Han, Y.; Li, H.; Shen, P.; Coelho, E.A.A.; Guerrero, J.M. Review of active and reactive power sharing strategies in hierarchical controlled microgrids. IEEE Trans. Power Electron. 2016, 32, 2427–2451.
  21. An, R.; Liu, Z.; Liu, J. Successive-approximation-based virtual impedance tuning method for accurate reactive power sharing in islanded microgrids. IEEE Trans. Power Electron. 2020, 36, 87–102.
  22. Kersting, W.H. Distribution System Modeling and Analysis, 2nd ed.; CRC Press: Boca Raton, FL, USA, 2007.
  23. Yang, J.; Zhang, N.; Kang, C.; Xia, Q. A state-independent linear power flow model with accurate estimation of voltage magnitude. IEEE Trans. Power Syst. 2016, 32, 3607–3617.
  24. Rocabert, J.; Luna, A.; Blaabjerg, F.; Rodriguez, P. Control of power converters in AC microgrids. IEEE Trans. Power Electron. 2012, 27, 4734–4749.
  25. Wang, R.; Ma, D.; Li, M.-J.; Sun, Q.; Zhang, H.; Wang, P. Accurate current sharing and voltage regulation in hybrid wind/solar systems: An adaptive dynamic programming approach. IEEE Trans. Consum. Electron. 2022, 68, 261–272.
  26. Li, T.; Bai, W.; Liu, Q.; Long, Y.; Chen, C.P. Distributed fault-tolerant containment control protocols for the discrete-time multiagent systems via reinforcement learning method. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 3979–3991.
Figure 1. Control block diagram of DGs.
Figure 2. DLE and RL algorithms for island microgrid control framework.
Figure 3. Offshore island microgrid with four DGs.
Figure 4. Comparative analysis of the microgrid’s fixed and dynamic leaders. (a) Bus voltage. (b) Reactive power sharing.
Figure 5. Load variation. (a) Bus voltage. (b) Reactive power sharing.
Figure 6. Variation in weights for the first DG. (a) Weight of the actor network output layer. (b) Weight of the critic network output layer. (c) Weights of the actor network hidden layer. (d) Weights of the critic network hidden layer.
Figure 7. Plug-and-play capability. (a) Output voltage. (b) Reactive power sharing.
Table 1. System parameters.

| Symbol | Parameter | Value |
| --- | --- | --- |
| $V_{bus}$ | Rated bus voltage | 311 V |
| $Z_{line,1}$ | Line impedance of DG1 | 0.03 Ω + 0.56 mH |
| $Z_{line,2}$ | Line impedance of DG2 | 0.06 Ω + 0.8 mH |
| $Z_{line,3}$ | Line impedance of DG3 | 0.03 Ω + 0.56 mH |
| $Z_{line,4}$ | Line impedance of DG4 | 0.06 Ω + 0.8 mH |
| $Z_{line,12}$ | Line impedance of DG1,2 | 0.6 Ω + 3.2 mH |
| $Z_{line,23}$ | Line impedance of DG2,3 | 0.4 Ω + 2.4 mH |
| $Z_{line,34}$ | Line impedance of DG3,4 | 0.5 Ω + 2.8 mH |
| $S_{rate,1}$ | Rated power of DG1 | 25 kW, 20 kVar |
| $S_{rate,2}$ | Rated power of DG2 | 20 kW, 10 kVar |
| $S_{rate,3}$ | Rated power of DG3 | 25 kW, 20 kVar |
| $S_{rate,4}$ | Rated power of DG4 | 20 kW, 10 kVar |
| $L_f$ | DG filter inductance | 1 mH |
| $C_f$ | DG filter capacitor | 100 μF |
| $load_1$ | Capacity of load 1 | 12.5 kW, 8 kVar |
| $load_2$ | Capacity of load 2 | 15.8 kW, 10 kVar |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
