Comparative Performance Analysis of the DC-AC Converter Control System Based on Linear Robust or Nonlinear PCH Controllers and Reinforcement Learning Agent

Starting from the general topology and the main elements that connect a microgrid, represented by a DC power source, to the main grid, this article presents the performance of the control system of a DC-AC converter. The main elements of this topology are the voltage source inverter, represented by a DC-AC converter, and the network filters. The active Insulated Gate Bipolar Transistor (IGBT) or Metal–Oxide–Semiconductor Field-Effect Transistor (MOSFET) elements of the DC-AC converter are controlled by robust linear or nonlinear Port Controlled Hamiltonian (PCH) controllers. The outputs of these controllers are modulation indices, which are inputs to a Pulse-Width Modulation (PWM) system that provides the switching signals for the active elements of the DC-AC converter. The purpose of the DC-AC converter control system is to maintain the ud and uq voltages at the prescribed reference values when the three-phase load varies; the load may be balanced, unbalanced, or nonlinear. The controllers are classic PI, robust, or nonlinear PCH, and their performance is improved by a properly trained Reinforcement Learning-Twin Delayed Deep Deterministic Policy Gradient (RL-TD3) agent. The DC-AC converter control systems are compared using performance indices such as the steady-state error, the error ripple, and the Total Harmonic Distortion (THD) of the current. Numerical simulations performed in Matlab/Simulink indicate the superior performance of the nonlinear PCH controller and show that a properly trained RL-TD3 agent, which provides correction signals, improves the performance of each controller presented.


Introduction
Although there are various topologies and connection schemes for connecting microgrids to the main grid, the central element is generally a voltage source inverter, represented by a DC-AC converter that connects a DC power source to the main grid. Among the other important elements of the system, a special role is played by the connection filters, which perform primary filtering of the disturbances caused by load fluctuations or parametric variation. In a top-down approach to the general issues found in a microgrid, we can start with the issues of optimization and forecasting from an economic point of view [1,2] and then analyze the control elements of the main subassemblies of the microgrid, i.e., the DC-DC converter [3,4], the DC-AC converter [5][6][7], the battery energy storage system (BESS) [8,9], and, last but not least, the specific connection elements for electric vehicles connected to the microgrid [10].
When the purpose of such a system is to maintain certain quality quantities (e.g., the ud and uq voltages described in the d-q reference frame) at prescribed values with minimal fluctuations when the load and system parameter values may vary, it is necessary to use suitable control structures. The main contributions of this article are as follows:
• Presentation, synthesis, and implementation of the robust control algorithm for DC-AC converter control;
• Presentation, synthesis, and implementation of the PCH control algorithm based on the passivity theory for DC-AC converter control;
• Presentation, synthesis, and implementation of an RL-TD3 agent, covering the stages of creation, training, testing and validation for each of the PI, robust and PCH controllers;
• Implementation in Matlab/Simulink of the software applications that calculate the performance indicators (the steady-state error and the error ripple of the ud voltage, and the THD of the phase a current) of the microgrid-to-main-grid connection system using a DC-AC converter, for the comparative analysis of the PI, robust and PCH control systems with or without the RL-TD3 agent.
The rest of the paper is structured as follows: Section 2 presents the robust control of the DC-AC converter and the Matlab/Simulink implementation of the robust controller, while the PCH-type control and the Matlab/Simulink implementation of the PCH-type controller are presented in Section 3. Section 4 presents the numerical simulations, and future works are presented in the final section.

Robust Control of the DC-AC Converter
In general, the coupling of a microgrid (considered as a DC power source in the structure discussed below) to the grid is achieved by means of a voltage source inverter (DC-AC converter). Assuming that the DC power source is capable of supplying a constant current to power the DC-AC converter, Figure 1 shows the block diagram for the DC-AC converter control system using a robust controller.
The elements in the block diagram are shown in the d-q frame, and to synchronize the voltage at the output of the DC-AC converter with the voltage supplied by the grid, the references idref and iqref are initially set to 0, while the breaker is set to the closed position. The grid voltages are filtered by a low-pass filter to reduce harmonics and then supply a feed-forward to the robust controller outputs to obtain PWM modulation pulses for the DC-AC converter control.
The grid-characteristic currents ia, ib, ic are dictated by the consumers connected to the grid and represent the input quantities for the robust controller, which will be synthesized using robust systems theory. This controller supplies the control signals to a PWM generator, and by driving the active MOSFET or IGBT elements in the DC-AC converter, the ud voltage is kept constant, which is the main objective of the control system for the presented benchmark. The microgrid topology shown in Figure 1 contains no BESS, precisely in order to follow the benchmark presented. The absence or presence of a BESS influences neither the synthesis of the controllers proposed in this article nor the performance of the control systems. This is because the currents ia, ib, ic, which are inputs to the controller, already contain the fluctuations caused by consumers and by any BESS, in both the stationary and the dynamic regime, as a result of their connection or disconnection. Moreover, Ref. [8] presents the control of the main phenomena occurring when a BESS is present, namely its charging or discharging according to criteria imposed by the connection to the microgrid: the BESS is charged or discharged when the voltage at its terminals is lower or higher, respectively, by a set percentage than the voltage that is to be kept constant in the microgrid. These goals are achieved with classical PI-type cascade controllers, where the charging/discharging current of the BESS is regulated in the inner loop and the voltage at the BESS terminals is regulated in the outer loop.
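The d-q quantities used throughout this section can be illustrated with a short numerical sketch. The snippet below is only an illustration under stated assumptions: it uses the amplitude-invariant, cosine-based Park transformation with the d axis aligned to phase a, and a 50 Hz grid frequency; neither convention is specified in the text, and the function names are hypothetical. It shows that a balanced three-phase voltage of 310 V amplitude maps to approximately ud = 310 V and uq = 0 V.

```python
import math

TWO_PI_OVER_3 = 2.0 * math.pi / 3.0


def park_transform(a, b, c, theta):
    """Amplitude-invariant abc -> dq transform (assumed convention)."""
    k = 2.0 / 3.0
    d = k * (a * math.cos(theta)
             + b * math.cos(theta - TWO_PI_OVER_3)
             + c * math.cos(theta + TWO_PI_OVER_3))
    q = -k * (a * math.sin(theta)
              + b * math.sin(theta - TWO_PI_OVER_3)
              + c * math.sin(theta + TWO_PI_OVER_3))
    return d, q


def balanced_three_phase(amplitude, theta):
    """Balanced three-phase set with phase a at angle theta."""
    return (amplitude * math.cos(theta),
            amplitude * math.cos(theta - TWO_PI_OVER_3),
            amplitude * math.cos(theta + TWO_PI_OVER_3))


# Arbitrary time instant on an assumed 50 Hz grid.
theta = 2.0 * math.pi * 50.0 * 0.0123
ua, ub, uc = balanced_three_phase(310.0, theta)
ud, uq = park_transform(ua, ub, uc, theta)
print(round(ud, 6), round(uq, 6))
```

Any unbalance or harmonic content in the phase quantities appears as an oscillating component superimposed on these constant d-q values, which is why the regulation targets are expressed in this frame.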

Mathematical Description of the Robust Control for DC-AC Converter
In the d-q frame, for Figure 1, the quality quantities ud and uq voltages are defined in the sense that the purpose of the DC-AC converter control is to maintain the constant values of ud = 310 V and uq = 0 V. To use the concepts of the robust control systems theory, plant G is presented, starting from the single phase representation in Figure 2, where the notations are the usual ones.

Thus, the mathematical description takes the form given by Equations (1) and (2), where $x = [i_1 \;\; i_2 \;\; u_c]^T$ represents the state, $w = [u_G \;\; i_{ref}]^T$ represents the external input, and the control input is represented by u. It can be noted that the quantities u, uG, and iref are three-dimensional vectors consisting of the components for each phase a, b, and c. The remaining matrices are expressed as in [13,16,17].
An output can then be chosen, and the corresponding transfer function, usually denoted as G, is represented as (4). According to the theory of robust systems, G can be rewritten as (5).
These can be represented schematically as in Figure 3. The role of the robust control is to find a controller K(s) capable of minimizing the H∞ norm of the transfer function $T_{zw} = F_l(P, K)$ from the external inputs $[v \;\; w]^T$ to the quality quantities $z = [z_1 \;\; z_2]^T$. ξ, µ, and W(s) represent the weighting parameters, which will be specified in the robust controller synthesis algorithm.
The equations of the extended system can be written as in Equation (6), where the extended plant is denoted by P and K is the controller to be designed. The extended plant P contains, as in Figure 3, the weights ξ and µ and the low-pass filter W(s). Based on these specifications, Equation (6) is extended in the form of Equations (7) and (8) [13,16,17].
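To make the notion of the H∞ norm concrete, the following sketch estimates it for a simple single-input, single-output transfer function as the peak gain over frequency. It is only an illustration: the second-order system, its parameters, and the function names are hypothetical and unrelated to the plant G or the extended plant P, and for multivariable systems the gain would be replaced by the largest singular value of the frequency response.

```python
import math


def second_order_gain(omega, wn, zeta):
    """|G(j*omega)| for G(s) = wn^2 / (s^2 + 2*zeta*wn*s + wn^2)."""
    s = complex(0.0, omega)
    return abs(wn ** 2 / (s ** 2 + 2.0 * zeta * wn * s + wn ** 2))


def hinf_norm_estimate(gain, w_min=1e-2, w_max=1e4, points=20000):
    """Approximate the supremum of the gain on a logarithmic frequency grid."""
    ratio = (w_max / w_min) ** (1.0 / (points - 1))
    best = 0.0
    w = w_min
    for _ in range(points):
        best = max(best, gain(w))
        w *= ratio
    return best


wn, zeta = 100.0, 0.1  # illustrative values only
estimate = hinf_norm_estimate(lambda w: second_order_gain(w, wn, zeta))
# Closed-form resonance peak of a lightly damped second-order system.
exact = 1.0 / (2.0 * zeta * math.sqrt(1.0 - zeta ** 2))
print(estimate, exact)
```

A grid search of this kind only gives a lower bound on the true norm; dedicated synthesis tools compute it with bisection-type algorithms instead.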

Matlab/Simulink Implementation of the Robust Control for DC-AC Converter
Using the notations in Section 2.1, the extended plant P takes the form given in [13,16,17]. By using the hinfsyn() command from the Robust Control Toolbox of Matlab, the robust controller K(s) can be obtained [16,17]. The transfer functions of the low-pass filters used on each phase to filter the grid voltages from Figure 1 are chosen in the form expressed in relation (11), and the weights are chosen as ξ = 100 and µ = 0.26.
The synthesized controller, weights and low-pass filters are implemented in a Simulink-type scheme as in Figure 4. The transfer function of the robust controller K(s) is shown in relation (12).
The values of the nominal parameters of the DC-AC converter circuit elements are given in Table 1.
Table 1. DC-AC converter circuit elements-nominal parameters [16,17,22,23].

Parameter Value Unit
Filter inductance L f 150·10

Improvement of the Robust Control for DC-AC Converter Using RL-TD3 Agent
A combined control of the DC-AC converter system based on a robust controller and RL-TD3 agent can be proposed to improve the performance of the DC-AC converter control system. Among machine learning-based controls, the most suitable variant for industrial process control is provided by RL [24][25][26][27].
The main stages of creating, training, validating and using an RL agent are illustrated in Figure 5. By analogy with the control of an industrial process, it can be noted that, based on observations collected from the Environment (similar to reading analog/digital inputs from an industrial process), the RL-TD3 agent provides actions (similar to providing analog/digital outputs to an industrial process) based on the optimization of a reward calculated according to the proposed objectives (similar to the optimization of an integral criterion in industrial process control).

For the improvement of the proposed control system, an RL-TD3 agent algorithm is chosen. After completing the training, testing and validation stages, the RL-TD3 agent will provide correction signals to the robust controller commands to improve the performance of the control system for the DC-AC converter shown in Figure 6. The details of the Matlab/Simulink implementation of the RL-TD3 agent for the correction of the uaref, ubref, and ucref command signals are presented in Figure 7.
With the values of the circuit elements presented in Table 1, the robust controller and the filters presented in Section 2.2, and for idref = 5 A, iqref = 0 A, udref = 310 V, and uqref = 0 V, Figure 8 shows the reward evolution in the training stage of the implemented RL-TD3 algorithm.
The training stage of the implemented RL-TD3 agent for the correction of the robust controller command signals takes 2 h, 11 min, and 5 s. The sampling time of the RL-TD3 algorithm is $10^{-4}$ s, and the training stage comprises 200 epochs. In the RL-TD3 agent training stage, the optimization criterion (13) is used, with the usual notations, where $u^{j}_{t-1}$ includes the actions in the previous step.
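A reward that depends on the tracking error and on the actions applied in the previous step can be sketched as follows. This is a minimal illustration under explicit assumptions: the quadratic form, the weights `q_weight` and `r_weight`, and the function names are hypothetical and are not the criterion (13) itself.

```python
def reward(error, previous_actions, q_weight=1.0, r_weight=0.01):
    """Assumed quadratic reward: penalize the tracking error and the
    magnitude of the correction actions applied in the previous step."""
    action_penalty = sum(u * u for u in previous_actions)
    return -(q_weight * error * error + r_weight * action_penalty)


def episode_return(errors, actions_per_step):
    """Undiscounted sum of rewards over an episode (illustrative only)."""
    previous = [0.0] * len(actions_per_step[0])  # no action before the first step
    total = 0.0
    for error, actions in zip(errors, actions_per_step):
        total += reward(error, previous)
        previous = actions
    return total


# Two steps with three correction signals each (hypothetical numbers).
total = episode_return([1.0, 0.5], [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
print(total)
```

Penalizing the previous actions discourages large correction signals, so the agent is pushed to adjust the base controller commands rather than replace them.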

PCH Control of the DC-AC Converter
Similar to the description in Figure 1, Figure 9 shows the block diagram of the control system for the DC-AC converter based on a PCH-type controller. The main components are as follows: DC voltage source; three-phase voltage source inverter (DC-AC converter); LC filter; load; and the control system for the DC-AC converter. Usually, the controller is implemented with a PI control law, but this section presents, based on the PCH theory, the synthesis of a PCH controller, which provides the modulation indices for the control of the active elements in the DC-AC converter.
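For reference, the classic PI control law mentioned above can be sketched in discrete time as below. This is a generic illustration, not the controller used in the paper: the first-order plant, the gains, and the sampling time of 1e-4 s are assumptions chosen only to produce a stable closed loop.

```python
class PIController:
    """Discrete-time PI law: u = kp * e + ki * integral(e)."""

    def __init__(self, kp, ki, dt):
        self.kp = kp
        self.ki = ki
        self.dt = dt
        self.integral = 0.0

    def update(self, reference, measurement):
        error = reference - measurement
        self.integral += error * self.dt  # rectangular integration
        return self.kp * error + self.ki * self.integral


def simulate(reference=310.0, tau=0.01, dt=1e-4, steps=5000):
    """Regulate a hypothetical first-order plant tau*dy/dt = -y + u."""
    controller = PIController(kp=2.0, ki=200.0, dt=dt)
    y = 0.0
    for _ in range(steps):
        u = controller.update(reference, y)
        y += dt * (-y + u) / tau  # forward-Euler plant update
    return y


final_output = simulate()
print(final_output)
```

The integral term is what removes the steady-state error; the PCH controller derived in this section has a comparable proportional-plus-integral structure acting on the passive output.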

Mathematical Description of the PCH Control
Whereas in the previous section the equations describing the controlled system are linearized to obtain a robust controller, in this section the PCH theory is used to obtain a nonlinear controller, which will have superior performance. Figure 10 shows the schematic single-phase representation of the controlled system.
Based on the PCH theory and the d-q reference frame representation, the synthesis functions of the modulation indices md and mq are obtained, and then, by means of a PWM block, the switching signals S1…S6 are obtained for the control of the IGBT active elements of the DC-AC converter. Starting from the diagram in Figure 10, where the notations are the usual ones in the d-q reference frame for the modulation indices, angular frequency, currents and voltages, the system equations can be written as Equation (14).
System (14) can be written as a Port Hamiltonian model in which the state vector is denoted by x, the interconnection matrix and the damping matrix by J and R, the energy stored by the system by H(x), the input matrix by g, the control input vector by u, and the external input by ζ. The Port Hamiltonian model of the DC-AC converter is then obtained as Equation (15) [22,23], with $J = -J^T$ and $R = R^T \geq 0$. The energy stored in the elements Lf and Cf is denoted by H(x). An admissible state vector is defined based on passivity from control theory [22,23], and, based on it, the equations expressed in (15) are rewritten in a form in which $u^*$ is bounded.
By denoting the variable quantities $\tilde{x} = x - x_{ref}$ and $\tilde{u} = u - u^*$, system (20) is rewritten in terms of these variables. By denoting the gradient of the energy function, (18) can be rewritten using the gradient of the variable energy function. With these, the equation given in (21) can be rewritten, and from Equation (24) the dynamic regime is obtained. After the output signal of the system is defined, and using the energy function expressed in (27), a series of calculations shows that the system given in (25) is passive, because the corresponding passivity inequality holds.
With these, the PCH controller is obtained. Its form is the analogue of a PI controller with constants kP and kI, where the output signal is given by the equation expressed in (29). Based on these, from Equation (24), the currents idref and iqref are obtained as Equation (30) and the modulation indices mdref and mqref as Equation (31).
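For orientation, the generic port-controlled Hamiltonian structure and the passivity argument described in this section can be summarized as follows. These are the standard textbook relations written with the symbols named in the text (x, J, R, H, g, u, ζ); they are given as a sketch under that assumption and are not a reconstruction of the numbered equations of the paper.

```latex
% Generic PCH model with interconnection J, damping R and energy H(x)
\dot{x} = \left[ J(x) - R(x) \right] \frac{\partial H(x)}{\partial x} + g(x)\,u + \zeta,
\qquad J = -J^{T}, \quad R = R^{T} \ge 0 .

% Passive output and energy balance for \zeta = 0
y = g^{T}(x)\,\frac{\partial H(x)}{\partial x},
\qquad
\frac{dH}{dt}
  = -\left(\frac{\partial H}{\partial x}\right)^{T} R \,\frac{\partial H}{\partial x} + y^{T}u
  \;\le\; y^{T}u .
```

The skew-symmetry of J makes the interconnection term vanish in the energy balance, so the stored energy can grow no faster than the power supplied through the port (u, y), which is the passivity property the controller synthesis relies on.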

Matlab/Simulink Implementation of the PCH Control Combined with RL-TD3 Agent for Command Signals Correction
Similar to Section 2.2, the main purpose of this section is to present a method for improving the performance of the control system for the DC-AC converter by using an RL-TD3 agent, where the basic controller is, in turn, the classic PI-type controller and the PCH-type controller.
Based on the classic PI control structure, Figure 11 shows the block diagram of the Matlab/Simulink model implementation of the control system for the DC-AC converter based on a PI controller and an RL-TD3 agent. Figure 12 shows the detailed implementation of the RL-TD3 agent for the correction of the idref and iqref signals, which is represented by the Reinforcement Learning subsystem shown in Figure 11.
With the values of the circuit elements presented in Table 1, the PI controllers and the RL-TD3 agent for control of the DC-AC converter, and for idref = 5 A, iqref = 0 A, udref = 310 V, and uqref = 0 V, Figure 13 presents the reward evolution of the RL-TD3 algorithm.
The training stage of the implemented RL-TD3 agent for the correction of the PI controller command signals takes 1 h, 42 min, and 11 s.
The sampling time of the RL-TD3 agent algorithm is 0.0001 s, and the training stage comprises 200 epochs.
The optimization criterion (the reward) used in the training stage of the control system for the DC-AC converter based on PI controllers and the RL-TD3 agent is presented in Equation (32).
Figure 14 shows the block diagram of the Matlab/Simulink model implementation of the control system for the DC-AC converter based on the PCH controller and an RL-TD3 agent. The Simulink implementation of Equations (30) and (31) can be noted in the structure of the PCH-type controller. The detailed implementation of the RL-TD3 agent for the correction of the edref, eqref, idref, and iqref command signals, which is represented by the Reinforcement Learning subsystem shown in Figure 14, is presented in Figure 15.
With the values of the circuit elements presented in Table 1, the PCH-type controller and the RL-TD3 agent for control of the DC-AC converter, and for idref = 5 A, iqref = 0 A, udref = 310 V, and uqref = 0 V, Figure 16 presents the reward evolution of the RL-TD3 algorithm.
The training stage of the implemented RL-TD3 agent for the correction of the PCH-type controller command signals takes 1 h, 58 min, and 56 s. The sampling time of the RL-TD3 agent algorithm is 0.0001 s, and the training stage comprises 200 epochs.
The optimization criterion (the reward) used in the training stage of the control system for the DC-AC converter based on the PCH controller and the RL-TD3 agent is presented in Equation (33).
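The training stages reported above rely on the TD3 update rule. A minimal sketch of its two characteristic ingredients, the clipped double-Q target with target policy smoothing and the soft update of the target networks, is given below for scalar states and actions. The hyperparameter values are common illustrative defaults, not the settings used in the paper, and the actor and critic callables are hypothetical stand-ins for the neural networks.

```python
import random


def clip(value, low, high):
    return max(low, min(high, value))


def td3_target(reward, next_state, done, target_actor,
               target_critic_1, target_critic_2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_limit=1.0, rng=random):
    """Clipped double-Q target with target policy smoothing."""
    # Smoothing: add clipped Gaussian noise to the target action.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise,
                       -action_limit, action_limit)
    # Twin critics: use the smaller estimate to limit overestimation.
    q_next = min(target_critic_1(next_state, next_action),
                 target_critic_2(next_state, next_action))
    return reward + gamma * (0.0 if done else 1.0) * q_next


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging of target-network parameters."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]


# Toy stand-ins for the networks (assumptions for illustration).
actor = lambda s: 0.5 * s
critic_1 = lambda s, a: -(s - a) ** 2
critic_2 = lambda s, a: -(s - a) ** 2 - 0.1

y = td3_target(1.0, 1.0, False, actor, critic_1, critic_2, noise_std=0.0)
print(y)
```

The "delayed" part of TD3 refers to updating the actor and the target networks less frequently than the critics, which in a training loop amounts to calling the actor update and `soft_update` only every few critic updates.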
The control law for the DC-AC converter output is given by the modulation indices md and mq, and by means of an inverse Park transformation (d-q→abc), the real modulation indices ma, mb, and mc are obtained. These modulation indices provide the input signals for a PWM block whose outputs are the switching signals S1…S6, which drive the active elements of the DC-AC converter.
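The chain from d-q modulation indices to switching signals can be sketched as follows. This is an illustration under assumptions that the text does not specify: a cosine-based inverse Park transformation, sine-triangle PWM with a 10 kHz carrier, complementary upper and lower switches without dead time, and the ordering (S1, S2) for leg a, (S3, S4) for leg b, and (S5, S6) for leg c.

```python
import math

TWO_PI_OVER_3 = 2.0 * math.pi / 3.0


def inverse_park(d, q, theta):
    """dq -> abc transform (assumed cosine-based convention)."""
    a = d * math.cos(theta) - q * math.sin(theta)
    b = (d * math.cos(theta - TWO_PI_OVER_3)
         - q * math.sin(theta - TWO_PI_OVER_3))
    c = (d * math.cos(theta + TWO_PI_OVER_3)
         - q * math.sin(theta + TWO_PI_OVER_3))
    return a, b, c


def triangle_carrier(t, carrier_freq):
    """Symmetric triangular carrier in [-1, 1], starting at -1."""
    phase = (t * carrier_freq) % 1.0
    return 4.0 * phase - 1.0 if phase < 0.5 else 3.0 - 4.0 * phase


def switching_signals(md, mq, theta, t, carrier_freq=10e3):
    """Return (S1, ..., S6) by comparing each phase index to the carrier."""
    carrier = triangle_carrier(t, carrier_freq)
    signals = []
    for m in inverse_park(md, mq, theta):
        upper = 1 if m > carrier else 0
        signals.extend([upper, 1 - upper])  # lower switch is complementary
    return tuple(signals)


ma, mb, mc = inverse_park(0.8, 0.0, 0.0)
gates_at_carrier_valley = switching_signals(0.8, 0.0, 0.0, t=0.0)
print(ma, mb, mc, gates_at_carrier_valley)
```

In a real inverter a dead time is inserted between the complementary signals of each leg to prevent shoot-through; that detail is omitted here.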

Numerical Simulations
Starting from Figures 1, 2, 9 and 10, which show the block diagrams of the DC-AC converter control system using a robust controller and a PCH-type controller, respectively, Figure 17 summarizes the Matlab/Simulink implementation of the proposed DC-AC converter control system based on PI, robust or PCH-type controllers and RL-TD3 agents for command signal correction. The numerical values of the circuit elements are given in Table 1 in Section 2. The controlled quantities, the voltages ud and uq defined in the d-q frame, are held at the constant reference values ud = 310 V and uq = 0 V.
The controllers used are the classic PI controller, the robust controller and the nonlinear PCH controller. Each of the three is paired with an appropriately trained RL-TD3 agent in order to improve the performance of the corresponding control system. The performance indices considered are the steady-state error, the error ripple, and the current THD. To reflect realistic operating conditions, each controller is simulated with three types of load: balanced, unbalanced, and nonlinear. For the balanced load, the resistance on each phase is 5 Ω. For the unbalanced load, the resistance on phase b is set to a very high value compared with phases a and c, which keep a resistance of 5 Ω. For the nonlinear load, the resistances on the three phases are identical but are described by voltage-current pairs u(k) and i(k), where the discretization variable k covers the simulation period.
Figures 18-20 show the time evolution of the ud and uq voltages for the DC-AC converter control system based on the PI controller when the load is balanced, unbalanced or nonlinear. Figures 21-23 show the same quantities, for the same load types, when the PI controller is improved by an RL-TD3 agent. Figures 24-26 and 27-29 show the corresponding results for the robust controller without and with the RL-TD3 agent, and Figures 30-32 and 33-35 show them for the PCH-type controller without and with the RL-TD3 agent. For each of the three controllers, a substantial improvement in control system performance can be observed when the controller is combined with an RL-TD3 agent.
In terms of the steady-state error, Table 2 shows that the performance of each control system, whether based on the PI, robust, or PCH-type controller, improves when a properly trained RL-TD3 agent is used. Among the three basic controllers, the robust controller performs better than the classic PI controller, and the nonlinear PCH-type controller performs best.
It can also be noted that the steady-state error in robust and PCH controllers, with or without RL-TD3 agent, is two to five times lower than the steady-state error when using a classic PI controller. It is also worth noting that the use of an RL-TD3 agent in tandem with the robust controller provides superior performance compared to a nonlinear PCH controller without an RL-TD3 agent.
In general, the analysis in Table 2 shows that the steady-state errors with respect to the basic balanced load regime are 50% higher in the case of nonlinear load and up to five times higher in the case of unbalanced load for each of the controllers used.
Another important indicator of the performance of the DC-AC converter control system is the ripple of the error signal of the ud voltage, calculated according to Equation (34), where N represents the number of samples, ud the voltage and udref the reference voltage. The results in Table 2 show that the ranking of the controllers is the same for this indicator as for the steady-state error.
The nonlinear PCH controller is therefore also superior in terms of error ripple, and the use of an RL-TD3 agent again brings clear improvements. Table 2 also shows that, relative to the balanced load case, the error ripple is up to 20% higher for the nonlinear load and about four times higher for the unbalanced load.
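The two voltage indicators discussed above can be computed from sampled simulation data along the following lines. This is a sketch under stated assumptions: the ripple is taken here as the root-mean-square of the error ud − udref over the N samples, and the steady-state error as the magnitude of the mean error over the last 20% of the record. Both definitions, the function names, and the synthetic test signal are assumptions and may differ from the article's Equation (34) and from the procedure behind Table 2.

```python
import numpy as np

def steady_state_error(u_d, u_dref, tail_fraction=0.2):
    """Magnitude of the mean error over the final part of the record (assumed definition)."""
    err = np.asarray(u_d, dtype=float) - np.asarray(u_dref, dtype=float)
    n_tail = max(1, int(len(err) * tail_fraction))
    return float(np.abs(np.mean(err[-n_tail:])))

def error_ripple(u_d, u_dref):
    """RMS of the error over all N samples (assumed form of the ripple indicator)."""
    err = np.asarray(u_d, dtype=float) - np.asarray(u_dref, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

# Synthetic example: 0.5 V offset plus a 2 V, 100 Hz oscillation around 310 V,
# sampled at 0.0001 s (the agent sampling time quoted in the text).
t = np.arange(1000) * 1e-4
u_dref = np.full_like(t, 310.0)
u_d = u_dref + 0.5 + 2.0 * np.sin(2.0 * np.pi * 100.0 * t)

e_ss = steady_state_error(u_d, u_dref)   # close to 0.5 V
ripple = error_ripple(u_d, u_dref)       # close to 1.5 V
```

For this synthetic signal the offset and the oscillation contribute separately, so the RMS error is sqrt(0.5² + 2²/2) = 1.5 V.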
Another important indicator of the DC-AC converter control system is the THD of the current, which is computed from IN, the RMS value of harmonic N, and IRMS, the RMS value of the fundamental of the signal. Figures 36-41 show the FFT analysis and THD of the phase-a current of the DC-AC converter for the controller types and load variations presented above. Figures 36 and 37 show the FFT analysis and THD of the phase-a current for the PI-type controller without and with the RL-TD3 agent, for balanced, unbalanced and nonlinear loads. Figures 38 and 39 show the same analysis for the robust-type controller, and Figures 40 and 41 for the PCH-type controller.
Since the controlled system is a DC-AC converter, the THD of the phase-a current is a very important indicator, especially as it must remain below the value required by power quality standards (IEC and IEEE standards [28] usually recommend a current THD of less than 12% for a number of harmonics N = 50).
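As an illustration of how a current THD value of this kind can be obtained from sampled data, the sketch below uses the standard definition, the root-sum-square of the harmonic RMS values divided by the RMS value of the fundamental, evaluated through an FFT. The 50 Hz fundamental, the 10 kHz sampling rate, the synthetic current, and the function name are assumptions, and the Simulink FFT analysis tool used for Figures 36-41 may apply different windowing or record lengths.

```python
import numpy as np

def current_thd(i_a, fs, f1=50.0, n_harmonics=50):
    """THD in percent from an FFT of the phase current.

    The record is assumed to span an integer number of fundamental periods,
    so that every harmonic falls exactly on an FFT bin.
    """
    i_a = np.asarray(i_a, dtype=float)
    n = len(i_a)
    # Single-sided RMS spectrum (scaling valid for bins other than DC and Nyquist).
    rms = 2.0 * np.abs(np.fft.rfft(i_a)) / n / np.sqrt(2.0)
    df = fs / n  # frequency resolution in Hz

    def harmonic_bin(h):
        return int(round(h * f1 / df))

    i_fund = rms[harmonic_bin(1)]
    harmonics = [rms[harmonic_bin(h)] for h in range(2, n_harmonics + 1)
                 if harmonic_bin(h) < len(rms)]
    return 100.0 * float(np.sqrt(np.sum(np.square(harmonics))) / i_fund)

# Synthetic phase current: 10 A fundamental with 5th and 7th harmonics.
fs = 10_000.0
t = np.arange(2000) / fs  # 0.2 s, i.e. ten 50 Hz periods
i_a = (10.0 * np.sin(2 * np.pi * 50 * t)
       + 0.5 * np.sin(2 * np.pi * 250 * t)
       + 0.3 * np.sin(2 * np.pi * 350 * t))
thd_percent = current_thd(i_a, fs)  # sqrt(0.5**2 + 0.3**2) / 10, about 5.83 %
```

Limiting the sum to the first 50 harmonics mirrors the N = 50 mentioned in connection with the power quality standards.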
Table 2 shows the THD values for the currents on phase a for all three types of controllers presented with or without RL-TD3 agent for the three types of load presented.
As with the steady-state error and the ripple of the error of the ud voltage, the ranking of the controllers is preserved for the phase-a current THD: the nonlinear PCH controller performs best, and each controller improves when an RL-TD3 agent is used.
It can be noted, however, that due to the way the nonlinear resistance is defined, the phase-a current THD values are twice as high for the unbalanced load and up to three times as high for the nonlinear load compared with the balanced load case.

Conclusions
This article presented the performance of the control system of a DC-AC converter, considering the main elements through which a microgrid represented by a DC power source is connected to the main grid. The main element is a voltage source inverter, represented by a DC-AC converter whose IGBT active elements are controlled by robust linear or nonlinear PCH controllers. The outputs of these controllers are the modulation indices md and mq in the d-q reference frame. An inverse Park transformation converts them into the actual modulation indices ma, mb, and mc, which pass through a PWM system to provide the switching signals S1…S6 for the active elements of the DC-AC converter. The purpose of the DC-AC converter control system is to maintain the ud and uq voltages at the reference values ud = 310 V and uq = 0 V under load variation. The article presents the block structures of the overall microgrid-to-grid connection system, and the three-phase load is assumed to be balanced, unbalanced or nonlinear. The controllers are of classic PI, robust or nonlinear PCH type, and their performance is improved by means of a properly trained RL-TD3 agent. The control systems are compared using the steady-state error and the error ripple of the ud voltage, and the phase-a current THD of the microgrid-to-main-grid connection system. The numerical simulations, performed in Matlab/Simulink, reveal the superior performance of the nonlinear PCH controller and the improvement of each controller when it is combined with a properly trained RL-TD3 agent, which provides correction signals for the control signals of the corresponding controller.
In future work, the software used in the numerical simulations will be implemented in real time, allowing the transition from the Software-in-the-Loop stage to the Hardware-in-the-Loop stage using dedicated platforms such as Speedgoat or OPAL-RT.