Optimal Voltage Recovery Learning Control for Microgrids with N-Distributed Generations via Hybrid Iteration Algorithm

Abstract: Because the nonlinearity and uncertainty of the microgrid model complicate the derivation and design of an optimal controller, an adaptive dynamic programming (ADP) algorithm is designed to solve the model-free non-zero-sum game. By combining the advantages of policy iteration and value iteration, an optimal learning control scheme based on hybrid iteration is constructed to provide stringent real power sharing for nonlinear, coupled microgrid systems with N-distributed generations. First, using a non-zero-sum differential game strategy, a novel distributed secondary voltage recovery consensus optimal control protocol is built with the hybrid iteration method to realize the voltage recovery of microgrids. Then, system state and input data are gathered along a dynamic system trajectory, and a data-driven optimal controller learns the game solution without microgrid physics information, enhancing convenience and efficiency in practical applications. Furthermore, a detailed convergence analysis proves that the control protocol converges to the optimal solution, which ensures the stability of the voltage recovery of the microgrid system. Simulation results validate the feasibility and effectiveness of the proposed scheme.


Introduction
In recent years, the ecological and environmental foundation of the world has been deteriorating and the demand for energy has become increasingly urgent. The utilization of renewable clean energy and the improvement of energy efficiency have become crucial for environmental protection, yet the structure and operational mode of traditional power distribution systems have proven inadequate for large-scale renewable power integration [1][2][3]. To meet this challenge, it is imperative to transform and upgrade the traditional power distribution system. Furthermore, the advancement of smart technology has led to the gradual personalization of energy demand and electricity consumption patterns, accelerating the transition of the power system towards a smart grid. This transition has rendered the traditional large grid increasingly incapable of satisfying diversified power supply needs [4][5][6]. To address these challenges, the microgrid has emerged as a viable solution, playing a pivotal role in maintaining the smooth operation of power grids. A smart grid, as a new power system model, aims to achieve energy conversion and efficient use through advanced communication technology and control means. The goal is to improve the safety, reliability, economy and efficiency of power system operation so as to meet the growing demand for energy while reducing the environmental impact of energy consumption. As an integral component of the smart grid, the microgrid represents a small-scale power system that effectively integrates distributed generators [7,8]. Traditional microgrid voltage restoration methods often rely on fixed control strategies and parameter settings, which are difficult to adapt to different microgrid structures and operating conditions. These methods often show poor adaptability and robustness in complex and changeable power system environments. Therefore, the development of an appropriate frequency and voltage control strategy is crucial for ensuring the stable operation of the microgrid, and research on this topic holds considerable practical significance. This paper aims to address these challenges by exploring innovative control strategies tailored to the unique characteristics of the microgrid. In doing so, it seeks to contribute to the advancement of smart grid technology and to the overall performance and reliability of power systems.
Over the past few years, numerous scholars have begun to explore innovative control strategies for microgrids, aiming to achieve stable operation and efficient utilization by precisely controlling energy conversion, the charging and discharging of energy storage devices and load management [9][10][11]. Shuai et al. conducted a comprehensive review of research on microgrid stability [12]. Specifically, ref. [13] highlighted the increasing replacement of distributed generators with diesel generators in standalone microgrids. To address this, a decoupled controller for frequency and voltage was introduced, enabling the maintenance of grid frequency and voltage magnitude stability. In ref. [14], the high-order full drive model of a distributed generator microgrid is explored by using the large-signal model of a microgrid, and the complex relationship and interaction between voltage and current are effectively captured. In ref. [15], a moving target defense mechanism was proposed, which can effectively limit network attacks on smart microgrid systems. Furthermore, ref. [16] employed a robust passive control strategy to stabilize and regulate the DC bus voltage. These results represent significant contributions to the field, providing valuable insights into voltage/frequency recovery techniques for generation systems. However, despite these advancements, there remains a need for further research and development in this area, particularly in addressing the unique challenges posed by microgrids and their integration into smart grid systems. This paper aims to contribute to this ongoing effort by exploring innovative control strategies tailored to the specific requirements of microgrids, with a focus on enhancing their stability and performance.
The implementation of a nonlinear disturbance observer has been proposed to enhance the stability of the system and mitigate steady-state errors resulting from constant power load (CPL) variations. In ref. [17], mixed iterative adaptive dynamic programming is explored, providing a solution to the optimal control problem associated with battery energy management in smart residential microgrid systems. This method comprises two iterative processes, P-iteration and V-iteration, both grounded in reinforcement learning techniques. The P-iteration process follows a policy iteration approach, with iterative updates to the value function based on the sequence of iterative control laws. Conversely, the V-iteration process adheres to value iteration, enabling the derivation of iterative control laws in each iterative cycle. It is noteworthy that traditional centralized control strategies rely heavily on extensive communication systems to gather and process substantial amounts of information from distributed generators. These strategies primarily concentrate on data generation [18], energy management [19], state estimation [20], optimal scheduling [21] and reliability evaluation [22]. The nonlinear disturbance observer offers an innovative approach to enhancing system stability and eliminating steady-state errors, presenting a promising alternative to traditional centralized control strategies. This advancement holds significant potential for improving the performance and reliability of smart residential microgrid systems.
Inspired by the distributed and consistent control methods for nonlinear multi-agent systems, ref. [23] introduced the concept of distributed cooperative control and employed a hybrid iteration technique to design a nonlinear model controller for microgrids. This approach effectively bridges the performance gap between the well-known policy iteration and value iteration methods. Policy iteration, which is derived from the Newton-Raphson method, converges to the optimal value at a quadratic rate [24]. Value iteration, in contrast, does not require the initial admissible control policy that policy iteration needs [25]. Hybrid iteration combines the strengths of both methods: it eliminates the need for the initial stabilizing control policy required in policy iteration while achieving a faster convergence rate than traditional value iteration. In recent years, the application of hybrid iterative technology in modern microgrids has gradually highlighted its importance and potential. In ref. [26], an optimal output regulation method for islanded modern microgrids was presented. This method demonstrates that the hybrid iterative approach significantly reduces the CPU time to convergence, minimizes the number of learning iteration cycles and eliminates the need for an initial stabilizing control policy. However, although the application of hybrid iteration in modern microgrids has made remarkable progress, there remains a lack of research on the acquisition and analysis of optimal feedback control protocols for nonlinear N-distributed generations in microgrids. This research gap not only limits the further application of hybrid iteration in the field of microgrids, but also hinders further improvement of the overall performance of microgrids. By analyzing the nonlinear N-distributed generations in microgrids and exploiting the characteristics of hybrid iteration, this study introduces a novel feedback control protocol that not only deals effectively with the nonlinearity but also realizes precise adjustment of the output voltage of a microgrid, thereby enhancing operational efficiency and stability.
With the emergence of novel optimization algorithms, the intelligent control optimization of microgrid systems has seen significant progress. The inherent randomness, dynamics and nonlinearity of microgrid operations pose challenges to traditional model-driven control strategies, limiting their accuracy and efficiency. Adaptive dynamic programming (ADP), an intelligent control approach, exhibits strong self-learning and adaptive capabilities, independent of prior system knowledge or models. ADP leverages iteration to derive control strategies, encompassing policy iteration and value iteration. Utilizing neural network approximations, ADP employs critic and action networks to approximate optimal performance indices and control strategies, respectively. Through continuous information feedback and transmission, the critic network evaluates and updates strategies, enabling the system to converge to an optimal control strategy and its corresponding value function [27]. Game theory, which analyzes behavior prediction among competitive individuals and optimizes strategies for multi-agent decision-making, provides a theoretical foundation for the microgrid system's behavior trajectory. Key elements in game theory include players, strategies and cost functions. Ref. [28] transformed the microgrid system with external interference into a zero-sum differential game model to design a robust control scheme. Ref. [29] simplified the distributed secondary voltage recovery consistency control to a general distributed zero-sum differential game and then solved it. Players aim to minimize their cost functions through strategic choices, aligning with the objectives of this study.
This study explores the voltage recovery control problem of microgrid systems with N-distributed generations and proposes an innovative method for this purpose. The main contributions are as follows: (1) Voltage recovery control in microgrid systems involves the interaction between multiple distributed power sources and loads, so this paper transforms it into a non-zero-sum game problem to deal effectively with the cooperative relationship between multiple participants. (2) An ADP method is adopted to obtain the optimal control scheme by iteratively solving the Hamilton-Jacobi (HJ) equation. The proposed hybrid iteration combines the advantages of value iteration and policy iteration and adaptively adjusts the step size to ensure the convergence and stability of the algorithm. A rigorous convergence proof is given to establish the effectiveness of the proposed method. (3) In practical applications, the mathematical models of microgrid systems are often uncertain and complex, making accurate modeling difficult. To meet this challenge, the algorithm proposed in this paper does not rely on mathematical models and gradually approaches the optimal solution through iteration and learning.
This paper addresses the optimal control problem for nonlinear N-distributed generations, introducing a novel hybrid iteration technique to derive an optimal policy without relying on the system model. Initially, the optimal control strategy and cost function value are defined, providing a foundation for subsequent analysis. Subsequently, the effectiveness of the proposed hybrid iteration algorithm is demonstrated in learning optimal control laws for N-distributed generations. Notably, this strategy does not require a system model, relying solely on online state and input data. This approach is particularly useful for nonlinear microgrid systems, where establishing a precise dynamics model can be challenging. By leveraging the trajectories of these systems, the method offers a practical solution for real-time optimal control.
The remainder of this paper is organized as follows. Section 2 introduces the optimal control problem statement and formulation for microgrids with N-distributed generations. Section 3 presents both model-based and data-driven hybrid iteration methods. Section 4 provides a rigorous convergence proof for the proposed methods. Section 5 presents simulation results for a microgrid system with two distributed generations, along with a technical discussion. Finally, Section 6 concludes the paper, summarizing the key findings and contributions.

Problem Formulation
The N-generation distributed microgrid system refers to a multi-level complex power grid system formed by multiple microgrids. Each generation of microgrids has a certain degree of autonomy and independence. Consider the microgrid system with N-distributed generations, and let the state vector of the ith generation be $s_i = [s_{i,1}^T, s_{i,2}^T]^T \in \mathbb{R}^{n_1+n_2}$, which contains the key information of that generation. Here, $s_{i,1} \in \mathbb{R}^{n_1}$ is the magnitude of $v_{o,i}$, the output voltage of the ith generation, and $s_{i,2} \in \mathbb{R}^{n_2}$ is the derivative of the magnitude of $v_{o,i}$.
To capture the continuous voltage changes and to understand the dynamic behavior of the microgrids during voltage recovery more deeply, the state-space model of the voltage recovery layer of microgrids is formulated in the differential form (1), where $g_{i,1} \in \mathbb{R}^{n_1 \times n_1}$ is a nonlinear smooth function, $w_i \in \mathbb{R}^{q}$ is the control input, and $f_{i,2} \in \mathbb{R}^{n_2}$ and $g_{i,2} \in \mathbb{R}^{n_2 \times q}$ are nonlinear uncertain smooth functions. For each generation, the utility function is designed as in (2), and the cost function of the ith generation is then given by (3). The optimal control problem addressed in this paper is to determine the control strategy that minimizes the performance index of the microgrid system. Based on (3), the optimal cost function $P_i^*$ is defined in (4).
The optimization problem of the microgrid system, for which a model is difficult to obtain, is transformed in this paper into the solution of a non-zero-sum game problem. Considering the interaction of the various components in the system, stable operation is achieved by optimizing the overall performance. The ADP algorithm is used to adapt to the uncertainties in the system, adjust the control strategy flexibly and improve the operating efficiency and reliability of the system.
Based on the stationarity conditions and the optimal cost function (4), the optimal feedback control protocol $w_i^*$ can be formulated as in (5) [24]. This control law aims to achieve the Nash equilibrium solution of the game model, targeting the equilibrium of interests among multiple entities within the microgrid system. This is not only an important guarantee for system stability and reliability, but also the key to the efficient and sustainable operation of microgrid systems. Additionally, it is evident from the expression that the control law output of each player in the microgrid system is unaffected by other players, which aligns with the concept of a non-zero-sum game. Each player attains concordant interests through mutual promotion and coordination, thereby minimizing the designed value function.
In order to solve (4) and (5), the Hamiltonian function is defined as in (6), and optimal control theory yields the infimum condition (7). From (4) and (5), the coupled HJ equation can be expressed as (8), with the initial condition $P_i^*(0) = 0$. In the next section, this coupled HJ equation is approximated using the ADP algorithm in a hybrid iterative manner. From the above, it is evident that the coupled HJ equation (8) of each generation depends on the control laws of the other generations. Consequently, the coupled generations are not entirely autonomous. Therefore, this paper aims to establish an adaptive and distributed consistent voltage recovery control strategy for the N-distributed game system (1). To tackle this problem, there have been studies on designing near-optimal control methods, including policy iteration and value iteration, which are both online learning techniques.
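For orientation, the quantities referred to in (4), (5) and (8) can be written in a form that is common in the non-zero-sum differential game literature. The block below is a sketch under stated assumptions and is not necessarily the notation of this paper: the affine dynamics $\dot{s} = f(s) + \sum_j g_j(s) w_j$, the weighting matrices $Q_i \succeq 0$ and $R_{ij} \succ 0$, and the quadratic utility are all assumed for illustration.

```latex
% Assumed quadratic utility and infinite-horizon cost of player i
r_i(s, w_1, \dots, w_N) = s^{T} Q_i s + \sum_{j=1}^{N} w_j^{T} R_{ij} w_j,
\qquad
P_i(s(t)) = \int_{t}^{\infty} r_i\bigl(s(\tau), w_1(\tau), \dots, w_N(\tau)\bigr)\, d\tau .

% Hamiltonian of player i
H_i(s, \nabla P_i, w_1, \dots, w_N)
  = r_i(s, w_1, \dots, w_N)
  + \nabla P_i^{T} \Bigl( f(s) + \sum_{j=1}^{N} g_j(s)\, w_j \Bigr).

% Stationarity, \partial H_i / \partial w_i = 0, gives the feedback law
w_i^{*} = -\tfrac{1}{2} R_{ii}^{-1} g_i^{T}(s)\, \nabla P_i^{*}.

% Coupled HJ equations, one per player
0 = r_i(s, w_1^{*}, \dots, w_N^{*})
  + \nabla P_i^{*T} \Bigl( f(s) + \sum_{j=1}^{N} g_j(s)\, w_j^{*} \Bigr),
\qquad P_i^{*}(0) = 0 .
```

In this assumed form, each feedback law depends only on the player's own value gradient, while each HJ equation contains the policies of all players, which is the coupling described in the text.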
The implementation of policy iteration requires certain initial conditions. Specifically, the system must commence iteration with an admissible control strategy. Subsequently, the iterative performance index is approximated, the control law of the current iteration step is evaluated and then the strategy is updated. These evaluation and update steps are repeated until a termination condition, similar to the inequality in (12), is satisfied.
The advantage of policy iteration lies in its high computational efficiency: starting from an admissible initial control law, the system can quickly obtain the optimal control law. In contrast, the initial value function in value iteration can be arbitrary, although value iteration typically converges more slowly. To overcome the limitations of both methods, a hybrid iterative algorithm is used to solve for the optimal control strategy. This algorithm combines the advantages of policy iteration and value iteration: it does not require an initial admissible control strategy, and it converges quickly.

Hybrid Iteration for the Microgrid Systems
In this section, an adaptive distributed voltage recovery control protocol is constructed by applying the hybrid iteration strategy and the reinforcement learning method. First, a hybrid iteration approach based on the system model is proposed and a convergence analysis is provided. Then, the implementation of the data-based hybrid iteration approach is presented, along with a proof of convergence.
The hybrid iteration builds on the execution of policy iteration, and the algorithm is presented as follows. First, choose a positive semidefinite initial value function. The control policy is then updated iteratively by (9), and the value function is renewed by (10), where $\varepsilon_i^{[n]}$ is a deterministic sequence satisfying (11). The cyclic search (9) and (10) for an admissible control policy continues until the stopping condition (12) holds. Then, starting from the admissible control law obtained from (9), the optimal control policy is explored through the following two iterative equations. The iterative cost function $P_i^{[n]}$ is solved from (13), and the control policy is updated according to (14). Through these two steps of policy evaluation and policy improvement, the policy sequence and its corresponding value function converge to the optimal ones.
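The two-phase structure of (9)-(14) can be illustrated on a deliberately simple case. The sketch below is not the paper's algorithm for the N-player nonlinear game. It assumes a single-player scalar linear-quadratic problem $\dot{x} = ax + bu$ with cost $\int (q x^2 + r u^2)\,dt$, a value function $P(x) = p x^2$, and a harmonic step size $\varepsilon^{[n]} = \varepsilon_0/(n+1)$. The function name `hybrid_iteration` and all parameter names are hypothetical.

```python
import math


def hybrid_iteration(a, b, q, r, eps0=0.5, tol=1e-12, max_iter=10_000):
    """Hybrid iteration for the scalar LQ problem x' = a*x + b*u.

    Returns (p, k, n_vi, n_pi): value weight, feedback gain (u = -k*x),
    and the number of value-iteration and policy-iteration steps used.
    """
    p = 0.0  # positive semidefinite initial value weight
    n_vi = 0

    # Phase 1: value-iteration steps until the greedy gain is stabilizing.
    while n_vi < max_iter:
        k = b * p / r                      # greedy policy for the current p
        if a - b * k < 0.0:                # closed loop stable: admissible
            break
        residual = 2.0 * a * p - (b * p) ** 2 / r + q   # Riccati residual
        p += eps0 / (n_vi + 1) * residual  # diminishing step size (assumed)
        n_vi += 1
    else:
        raise RuntimeError("no admissible policy found in phase 1")

    # Phase 2: policy iteration (Newton-type) from the admissible gain k.
    n_pi = 0
    while n_pi < max_iter:
        # Policy evaluation: solve 2*(a - b*k)*p + q + r*k**2 = 0 for p.
        p_new = (q + r * k * k) / (-2.0 * (a - b * k))
        k = b * p_new / r                  # policy improvement
        n_pi += 1
        if abs(p_new - p) <= tol:
            p = p_new
            break
        p = p_new
    return p, k, n_vi, n_pi


if __name__ == "__main__":
    p, k, n_vi, n_pi = hybrid_iteration(a=1.0, b=1.0, q=1.0, r=1.0)
    p_star = 1.0 + math.sqrt(2.0)          # closed-form Riccati solution
    print(f"p = {p:.6f} (exact {p_star:.6f}), VI steps = {n_vi}, PI steps = {n_pi}")
```

The open-loop system in the usage example is unstable, so the zero initial policy is not admissible. Phase 1 supplies an admissible gain without any prior stabilizing controller, and phase 2 then converges in a handful of Newton-type steps.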
Note that the hybrid iteration (9)-(14) still faces a strict requirement, since it needs precise knowledge of the system dynamics. For many complex practical systems, it is not easy to acquire an accurate dynamics model. Even when some knowledge of the system dynamics can be obtained, it is a challenge to apply this knowledge effectively to hybrid iteration because, in practice, the model needs to be simplified or approximated before effective iterative calculations can be performed. Such simplifications or approximations may introduce errors that affect the accuracy of the iterative results. Hence, to avoid dependence on the system model, the hybrid iteration (9)-(14) is improved, and a hybrid iteration based on state and input information is developed.
Taking the derivative of the cost function yields (15), and the function $H(s_i, \nabla P_i, w_i)$ is defined in (16). Since no dynamical model of the system is available, system data are used to construct the approximating functions. Without loss of generality, taking the ith generation as an example, the approximating functions $\hat{V}_i(s_i)$ and $\hat{H}(s_i, \nabla P_i, w_i)$ are given by (17) and (18), where $\phi_j(s_i)$, $\varphi_j(s_i)$ and $\psi_j(s_i)$ are linearly independent basis functions and $\hat{W}_j$, $\hat{\eta}_j$ and $\hat{\gamma}_j$ are weight matrices. The weight matrices are updated by the differential equations (19). During this procedure, an essentially bounded input such as exploration noise is applied while the weights are updated over a time interval. The iterative process (17)-(19) is run until an admissible control is obtained. Based on the admissible control, the system (1) is rewritten as (20). Then, along the solutions of (20) and based on (13) and (14), one obtains (21). Integrating both sides of (21) over any time interval $[t_k, t_{k+1}]$ gives (22). This expression shows that the ADP-based hybrid iteration scheme relies on data and does not require accurate parameters of the microgrid model to drive the microgrid system to the Nash equilibrium.
Merging (22) with (23) yields (24), where $e_{i,k}^{[n]}$ is the error generated in the approximation process. The weights $\hat{w}_j^{[n]}$ and $\hat{\gamma}_j$ are obtained from (24) in the least-squares sense, that is, by minimizing $\sum_{j=1}^{l} e_{i,k}^{2}$. For microgrids with N-distributed generations, voltage recovery control is very important. In order to maintain the stability of the power system, it is necessary to adopt appropriate control algorithms to ensure that the voltage reaches its best condition. The ADP-based hybrid iteration algorithm proposed in this paper is an efficient learning control strategy for voltage recovery. It gradually adjusts the power output of the microgrid in an iterative manner to achieve optimal voltage recovery. Next, the learning process of the algorithm is summarized in Algorithm 1.

Algorithm 1. The hybrid iteration algorithm for microgrids with N-distributed generations
Initialization: Select an initial system state $s_i(0)$; select a positive semidefinite initial value function.
1: while the convergence condition is not met do
2:   Update the control policy
3:   Update the cost function in Equation (17)
4: end while
5: return $w_i^*$ and $P_i^*$

At the beginning, the hybrid iteration algorithm is initialized and the relevant parameters are set. The iterative process is then entered and, in each iteration, the algorithm calculates the power output scheme of each microgrid according to the current parameters and the real-time operating state of the microgrid. These schemes are improved by the proposed optimization algorithm to ensure the effect of voltage recovery. As the number of iterations increases, Algorithm 1 continuously adjusts the power output scheme of the microgrid to achieve voltage recovery. At the same time, Algorithm 1 dynamically adjusts parameters and strategies according to the real-time operating state of the microgrid and changes in the external environment. When the algorithm meets the convergence condition, the final voltage recovery scheme is output and applied to the microgrid system to achieve optimal voltage recovery.
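The least-squares step behind the data-driven iteration can be illustrated with a single basis function. The sketch below assumes a scalar linear system under a fixed gain $u = -kx$ and a value function $p x^2$. It estimates $p$ from sampled states and the running cost only, using the integral relation $p\,(x(t_k)^2 - x(t_{k+1})^2) = \int_{t_k}^{t_{k+1}} (q x^2 + r u^2)\,d\tau$. The names `simulate_closed_loop` and `estimate_value_weight` and all numerical values are hypothetical and are not taken from the paper.

```python
import math


def simulate_closed_loop(a, b, k, x0, dt, steps):
    """Return samples x(0), x(dt), ... of x' = (a - b*k) * x (data source only)."""
    c = a - b * k
    return [x0 * math.exp(c * dt * j) for j in range(steps + 1)]


def estimate_value_weight(xs, k, q, r, dt, window):
    """Least-squares estimate of p in V(x) = p*x**2 from sampled data.

    Uses only the samples xs and the applied gain k, never a or b.
    """
    num = 0.0
    den = 0.0
    for start in range(0, len(xs) - window, window):
        seg = xs[start:start + window + 1]
        cost = [(q + r * k * k) * x * x for x in seg]   # running cost with u = -k*x
        # Trapezoidal rule for the integral of the running cost over the window.
        integral = sum(0.5 * (cost[j] + cost[j + 1]) * dt for j in range(window))
        theta = seg[0] ** 2 - seg[-1] ** 2              # regressor phi(x_k) - phi(x_{k+1})
        num += theta * integral
        den += theta * theta
    return num / den    # minimizer of sum_k (theta_k * p - integral_k)**2


if __name__ == "__main__":
    xs = simulate_closed_loop(a=1.0, b=1.0, k=2.0, x0=1.0, dt=1e-3, steps=2000)
    p_hat = estimate_value_weight(xs, k=2.0, q=1.0, r=1.0, dt=1e-3, window=100)
    print(f"estimated p = {p_hat:.6f}")
```

The simulator uses the model parameters only to generate data, whereas the estimator sees nothing but states and costs. With several basis functions, the scalar division becomes a linear least-squares solve over stacked regressors.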

Convergence of the Hybrid Iteration
In this section, the convergence of (9)-(14) is analyzed first. Then, the convergence of the hybrid iteration that requires no information about the system physics is rigorously proved.
Theorem 1. For the state-space model (1), if the value function $P_i^{[n]}$ and the control policy $w_i^{[n+1]}$ are given as in (13) and (14), then $P_i^{[n]}$ converges to $P_i^*$ at a quadratic convergence rate and $w_i^{[n]}$ converges to $w_i^*$.
Proof. In [23] and [30], it has been proved that $\sup_{s_i} \|P_i^{[n]} - P_i^*\| \le E$ as $n \to \infty$. Note that the conditions of (12) are sufficient for the two inequalities (25) and (26) to hold, where (25) reads $P_i^{[n]} > 0$. Therefore, there exists $n_e$ such that an admissible control law $w_i^{[n_e]}$ is obtained from (9)-(12) for the microgrid system (1) with N-distributed generations. Let the system (1) be driven by the control input $w_i^{[n_e]}$; then, according to (3), integrating (26) with respect to time yields (27), which means that $w_i^{[n_e]}$ is an admissible control. Then, based on [31], $\{w_i^{[n]}\}_{n=n_e}^{\infty}$ converges pointwise to $w_i^*$ and $\{P_i^{[n]}\}_{n=n_e}^{\infty}$ converges to $P_i^*$ as $n \to \infty$. Furthermore, according to (13) and (14), (28) holds. It can be seen that (28) is a Newton-Raphson iteration whose target is to obtain $\nabla P_i$ from the nonlinear equation given by the infimum condition. This explains why $P_i^{[n]}$ converges to $P_i^*$ at a quadratic rate.
Theorem 2. For any $\varepsilon_e > 0$, there exist $N^*$, $M_0^*$ and $M_1^*$ such that, if $n^* > N^*$, $M_0 > M_0^*$ and $M_1 > M_1^*$, then the control policy obtained by (23) satisfies $\|w_i^{n^*} - w_i^*\| \le \varepsilon_e$ and the cost function in (17) satisfies $\|P_i^{n^*} - P_i^*\| \le \varepsilon_e$.
Proof. It has been proved in [23] that a sufficiently large $t_f$ can be selected such that (19) has a unique solution. Since $\hat{w}_i^{[n_e]}$ is an admissible control law, and on the basis of [31], the optimal control policy and the corresponding value function can be obtained approximately as $n$, $M_0$ and $M_1$ go to infinity.
The non-zero-sum hybrid iteration scheme devised in this paper is independent of the system model: it does not need the system dynamics parameters at all and uses the collected data for training and learning to describe the optimal behavior trajectory under the interaction between individuals. Based on the rigorous analysis of the two theorems, it can be concluded that the control strategy attained through the model-free hybrid iteration scheme is convergent and stabilizes the microgrid system.

Simulation Results
In this section, a microgrid system with two distributed generations is used to show the effectiveness of the proposed control method. The dynamics of the system are given as in [32], with $f_i(s_i) = [\cos(s_{i,1})\, s_{i,2},\ 0.8 \sin(s_{i,1})]^T$. In the simulation experiment, the initial states of the system are $s_1(0) = [0.6, -1.1]^T$ and $s_2(0) = [-1, 1]^T$. In the networks, the activation functions are selected as $\phi_j = \varphi_j = \psi_j = [s_{i,1}^2,\ s_{i,1} s_{i,2},\ s_{i,2}^2]^T$. To indicate the effectiveness and feasibility of the presented hybrid iteration method, the optimal control protocols obtained by value iteration and policy iteration are also considered. The number of learning iterations and the average CPU time are then summarized for hybrid iteration, value iteration and policy iteration in Table 1. According to Table 1, hybrid iteration needs less CPU time and fewer iterations to obtain the same simulation result. Therefore, hybrid iteration is the more powerful tool: it requires fewer learning iterations for convergence and less average CPU time until convergence. Furthermore, no prior admissible control law is needed to initiate hybrid iteration.
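To make the role of the activation functions concrete, the sketch below evaluates the quadratic basis $[s_{i,1}^2, s_{i,1}s_{i,2}, s_{i,2}^2]^T$ chosen above, a weighted value approximation, and its gradient with respect to the state. The weight vector in the usage example is an arbitrary placeholder, not a learned result from the paper, and the function names are hypothetical.

```python
def basis(s):
    """Quadratic basis [s1^2, s1*s2, s2^2] for a two-dimensional state."""
    s1, s2 = s
    return [s1 * s1, s1 * s2, s2 * s2]


def basis_jacobian(s):
    """Rows are the gradients of each basis function with respect to (s1, s2)."""
    s1, s2 = s
    return [[2.0 * s1, 0.0],
            [s2, s1],
            [0.0, 2.0 * s2]]


def value(weights, s):
    """Approximate value: weighted sum of the basis functions."""
    return sum(w * f for w, f in zip(weights, basis(s)))


def value_gradient(weights, s):
    """Gradient of the approximate value with respect to the state."""
    jac = basis_jacobian(s)
    return [sum(w * row[d] for w, row in zip(weights, jac)) for d in range(2)]


if __name__ == "__main__":
    s0 = [0.6, -1.1]                 # initial state of the first generation
    weights = [1.0, 0.0, 1.0]        # placeholder weights (assumption)
    print(value(weights, s0), value_gradient(weights, s0))
```

The gradient is the quantity a feedback law of the form discussed in the problem formulation would consume, which is why a basis with a simple closed-form Jacobian is convenient.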
The state trajectories of each generation are given in Figures 1 and 2. The system state shown in Figure 1 represents the output voltage of the microgrid system, whose stability is directly related to the normal operation of each device in the microgrid system and to the user's power consumption experience. The system state shown in Figure 2 is the derivative of the amplitude, reflecting the rate of change of the system state, which is of great significance for analyzing the dynamic performance and stability of the system. It can be seen that, under the algorithm designed in this paper, the system state converges to zero quickly and finally reaches a steady state. The results also show that the mathematical model established in this paper is very effective for the microgrid system. Based on the control strategy obtained through hybrid iteration, the model can guarantee the stability and reliability of the microgrid system at relatively low computational cost. The input trajectories of the voltage recovery control protocol obtained by hybrid iteration are displayed in Figure 3. The designed controller generates a continuous control input to the microgrid system, avoids chattering in the system state and makes the system run stably. It is worth noting that the value function obtained by applying the convergent optimal control strategy to the microgrid system is also optimal. Combining Figures 1-3, it can be found that the proposed method saves many of the learning iterations required for convergence. Figures 4 and 5 depict the value function surfaces under the hybrid iteration-based control law designed in this paper. These surfaces not only reflect the dynamic characteristics of the control law, but also show the evolution of the performance index function during the control process. The performance index function is affected by the punishment or reward generated by the controlled object in different stages and updates the parameters based on the principle of optimality. The critic network guides the action network to approximate the optimal control strategy; in other words, the control law is approximated on the basis of the estimated performance index function, so the dynamic change of the value function is very important. It can be seen from these two 3D graphs that both sets of value functions are monotonically nonincreasing, which means that after a finite number of the hybrid iterations and policy updates designed in this paper, the value function gradually converges to the optimum. The statistics and simulation results in this section show that the ADP algorithm based on hybrid iteration enables the microgrid system to reach the control target with fewer iterations and has the advantages of fast computation and low implementation cost. In general, the proposed optimal voltage recovery learning scheme can be applied to microgrid systems and the control effect is satisfactory.

Conclusions
In view of the complexity of the actual microgrid system structure and its sensitivity to external environmental interference, the uncertainty contained in it is often difficult to predict. This paper solves the optimal control problem of microgrid systems with N-distributed generations and unknown dynamics by means of a non-zero-sum differential game, and a novel optimal control law is put forward for the dynamic optimization and adaptive regulation of microgrid systems by introducing an adaptive hybrid iteration algorithm. It is worth noting that the control strategy in this paper does not rely on the unknown physical characteristics of the system, but is obtained in real time from the system state and input data.

Table 1. Comparison results of hybrid iteration, value iteration and policy iteration.