On Stability of Perturbed Nonlinear Switched Systems with Adaptive Reinforcement Learning

In this paper, a tracking control approach is developed based on an adaptive reinforcement learning algorithm with a bounded cost function for perturbed nonlinear switched systems, which provide a useful framework for modelling power electronic converters such as DC-DC converters and multilevel converters. An optimal control method is derived for the nominal systems to solve the tracking control problem, which leads to solving a Hamilton-Jacobi-Bellman (HJB) equation. It is shown that the optimal controller obtained by solving the HJB equation stabilizes the perturbed nonlinear switched systems. To approximate the solution of the translated HJB equation, a critic neural network is trained to minimize the square of the Bellman residual error derived from the Hamiltonian. Theoretical analysis shows that all closed-loop signals are uniformly ultimately bounded (UUB) and that the proposed controller converges to the optimal control law. Simulation results for two situations demonstrate the effectiveness of the proposed controller.


Introduction
Power electronic converters play a remarkable role in industrial applications such as electrical drives and renewable energy systems [1][2][3][4][5]. Modelling of power electronic converters is usually carried out through average small-signal analysis. This method enables the design of voltage and current controllers, as well as the implementation of space vector modulation. However, the average small-signal model is disadvantageous for complicated applications that require tracking control. Switched systems are known as a special class of hybrid systems describing an active application area in power electronics, such as DC-DC converters and power sources [6][7][8][9][10][11][12][13][14][15]. Employing a switched-system representation in control design has proven an efficient way to model many practical systems composed of multiple subsystems. Basic issues in the control of switched systems are to find the control input under an arbitrary switching signal and/or to find an appropriate switching signal achieving tracking and stability of the closed-loop system. Additionally, identification of the active mode based on estimating the delay was implemented for the analysis of switched systems in [6]. The authors in [7] proposed a novel Lyapunov function combined with linear matrix inequalities (LMIs) for switched linear systems in the presence of exogenous disturbances. The control design problem with stochastic stability on a complete probability space under a general random switching signal was investigated in [8]. A stabilizing switching signal was designed using average dwell time with proposed existence conditions in [9]. Moreover, several conditions characterizing mode-dependent dwell times were given in [10].
A switching event-triggered, backstepping-based tracking controller achieved boundedness of all signals and of the tracking error in the closed-loop system [11]. However, optimal control, which is effective in dealing with constraints, has not yet been addressed for switched systems in [6][7][8][9][10][11]. Optimal control design requires solving a Riccati equation for linear systems or a Hamilton-Jacobi-Bellman (HJB) equation for nonlinear systems [16]. In the general case, however, it is hard to solve the HJB equation to find the optimal controller. In [16][17][18][19], thanks to the approximation capability of neural networks (NNs), the weights of the actor and critic were updated simultaneously within the optimization problem for continuous/discrete-time systems. The convergence of the actor/critic weights is guaranteed by the persistent excitation (PE) condition. It is worth emphasizing that, to deal with the uncertainties of nonlinear continuous systems, an identifier was inserted into the control structure in [19], and an off-policy technique separating the actor/critic policy from the control input in the first stage was presented in [20]. Moreover, input/output constraints were addressed in the adaptive/approximate dynamic programming (ADP) technique [17,18,21] by using an appropriate cost function and the dynamic programming principle. The ADP framework has been utilized in control schemes for surface vessel systems [22], a spring-mass-damper system [23], two mass-spring systems [20], and wheeled mobile robotic systems [24]. Based on approximating the modified critic term in each control loop, the stability of the whole cascade system was guaranteed for surface vessel systems by considering the derivative of the Lyapunov candidate function along the closed-loop system [22,25].
However, all the ARL-based controllers presented in [20,22-24] are designed for nonlinear systems without considering switching signals. The model-free problem, as well as the connection between continuous-time systems and appropriate discrete-time systems, enables the development of off-policy algorithms [26,27]. To deal with complete dynamic uncertainties, a modified cost function and critic make it possible to obtain an approximately optimal control law [28][29][30][31]. Additionally, generalized policy iteration (GPI) for linear discrete-time systems [32] achieves simultaneous computation and appropriate data collection.
To the best of the authors' knowledge, adaptive reinforcement learning (ARL) for uncertain switched systems is still an open problem. Therefore, this paper attempts to implement an ARL algorithm for uncertain continuous-time nonlinear switched systems under arbitrary switching signals, based on an optimization problem in NN training. Inspired by the above works, this paper studies the ARL-based optimal control problem for a class of perturbed switched nonlinear continuous systems. First, the optimal control algorithm is established for switched systems. Then, a neural network is employed to approximate the critic part of the policy. Based on the optimization principle, a training law is proposed to develop the optimal control strategy. The main contributions of this paper are as follows: (1) In comparison with the previous papers [12][13][14][15],[18,22,29,30,33], an optimal control obtained from the nominal system is proposed for perturbed switched nonlinear continuous systems based on the dynamic programming principle. (2) A neural network training law based on the optimization principle is developed to achieve the ARL-based optimal control strategy. (3) A rigorous proof of the UUB stability of the closed-loop system and of the convergence of the controller to the optimal control input is given based on Lyapunov stability theory and the reinforcement learning scheme.
The rest of this article is organized as follows. The preliminaries and problem statements are presented in Section 2. The main results are given in Section 3. Two simulation cases are presented in Section 4 to illustrate the effectiveness of the proposed solution. Finally, the conclusions are given in Section 5.

Problem Statement and Preliminaries
In [15], it can be seen that switched systems represent a useful framework for modelling general power electronic converters. Therefore, in this section, we consider the perturbed continuous-time nonlinear switched system described by

ẋ(t) = f_σ(t)(x(t)) + g_σ(t)(x(t)) ( u(t) + d(x(t)) ),  (1)

where x(t) ∈ Ω_x ⊂ Rⁿ and u(t) ∈ Ω_u ⊂ Rᵐ are the state vector and the control input vector, respectively, and d(x) denotes the matched uncertainty. The function σ : [0, +∞) → Ω = {1, 2, ..., n} is an unknown switching signal and n is the number of subsystems. The f_i(x) (∀i ∈ Ω) are unknown smooth vector functions satisfying f_i(0) = 0, and the g_i(x) (∀i ∈ Ω) are known smooth vector functions such that G_min ≤ ‖g_i(x)‖ ≤ G_max.
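To make the model class concrete, the sketch below simulates a perturbed switched system of this form under an arbitrary (here periodic) switching signal. The subsystem dynamics f_i, g_i, the disturbance d, and the simple damping input are illustrative assumptions, not the paper's examples.

```python
import numpy as np

# Hypothetical two-subsystem instance of x_dot = f_i(x) + g_i(x)*(u + d(x)).
def f(i, x):
    if i == 0:
        return np.array([x[1], -x[0] - 0.5 * x[1]])
    return np.array([x[1], -2.0 * x[0] - x[1] + 0.1 * np.sin(x[0])])

def g(i, x):
    # known input vectors, bounded between G_min and G_max
    return np.array([0.0, 1.0]) if i == 0 else np.array([0.0, 1.5])

def d(x):
    # matched disturbance, bounded by rho(x) = 0.2 * ||x||
    return 0.2 * np.sin(x[0]) * np.linalg.norm(x)

def step(x, u, i, dt=1e-3):
    """One Euler integration step of the active subsystem i."""
    return x + dt * (f(i, x) + g(i, x) * (u + d(x)))

x = np.array([1.0, 0.0])
for k in range(1000):
    i = (k // 250) % 2            # arbitrary periodic switching signal
    x = step(x, u=-1.0 * x[1], i=i)
```

The switching index i changes independently of the state, which is exactly the "arbitrary switching" setting the controller must tolerate.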

Assumption 1.
There exists a known function ρ(x) such that the uncertainty d(x) in system (1) satisfies ‖d(x)‖ ≤ ρ(x) for all x ∈ Ω_x.

Regarding the perturbed continuous-time nonlinear switched system (1), we introduce the cost function formulated as

J(x(t), u(t)) = ∫_t^∞ r(x(τ), u(τ)) dτ,  r(x, u) = xᵀQx + uᵀRu,  (2)

where Q and R are positive definite weighting matrices.

Control Objective: This article aims at designing the optimal guaranteed cost control scheme u* = arg min_{u∈Ω_u} K(x, u) despite the arbitrary switching law, where the feedback control law u and a finite upper-bound function K(x, u) are such that not only is the closed-loop system (1) robustly stable, but the cost function (2) also satisfies J(x(t), u(t)) ≤ K(x, u).

Definition 1. The function K(x, u) is called a guaranteed cost function. The control law u* = arg min_{u∈Ω_u} K(x, u) is then called the optimal guaranteed cost control law.

Remark 1.
It is worth emphasizing that the main objective of this work is to find the optimal control for the equivalent nominal system with a modified performance index. Compared with the control objective in [33], this work investigates tracking control for switched systems, for which it is hard to develop an optimal control algorithm. Additionally, it clearly differs from the existing work on robust ADP in [31], since the proposed optimal control design is based on the upper-bound function K(x, u).

Adaptive Reinforcement Learning-Based Control Design
In this section, we investigate the ARL-based optimal control for perturbed nonlinear switched systems. Since the optimal control scheme cannot be implemented directly for perturbed nonlinear systems, the strategy proceeds in three steps. First, based on dynamic programming, we obtain the optimal control design for the corresponding nominal switched system, obtained by eliminating the uncertainties. Then, the ARL algorithm is developed for this nominal system using a neural network technique. Finally, we carry out a stability analysis of the closed-loop system consisting of the perturbed switched system and the proposed ARL controller.

From the perturbed system (1), the nominal system is obtained by eliminating the uncertainty term:

ẋ(t) = f_σ(t)(x(t)) + g_σ(t)(x(t)) u(t).  (3)

Because the purpose is to address the tracking problem of the optimal control law under the influence of the uncertainties, the cost function is modified so that J(x(t), u(t)) ≤ J₁(x(t), u(t)). The cost function associated with (3) is therefore represented as

J₁(x(t), u(t)) = ∫_t^∞ [ r(x(τ), u(τ)) + λ (ρ(x(τ)))² ] dτ.  (4)

It should be noted that J₁(x(t), u(t)) with λ ≥ ‖R‖ is a guaranteed cost function associated with system (1); according to (2) and (4), it can be seen that J(x(t), u(t)) ≤ J₁(x(t), u(t)).

Based on the dynamic programming principle, the Bellman (optimal value) function is established as V*(x(t)) = min_{u∈Ω_u} ∫_t^∞ [ r(x(τ), u(τ)) + λ(ρ(x(τ)))² ] dτ. Letting Δt → 0⁺, we derive the HJB equation

0 = min_{u∈Ω_u} [ r(x, u) + λ(ρ(x))² + (∇V*)ᵀ ( f_i(x) + g_i(x)u ) ],  (12)

where ∇V* = ∂V*/∂x. Consider the Hamiltonian obtained from the nominal system and the performance index (4):

H(x, u, ∇V*) = r(x, u) + λ(ρ(x))² + (∇V*)ᵀ ( f_i(x) + g_i(x)u ).  (11)

The control input is computed by minimizing this function over u ∈ Ω_u; setting ∂H/∂u = 2Ru + (g_i(x))ᵀ∇V* = 0 gives

u*(x) = −(1/2) R⁻¹ (g_σ(x))ᵀ ∇V*.  (13)

Substituting (13) into (12) yields

0 = xᵀQx + λ(ρ(x))² + (∇V*)ᵀ f_i(x) − (1/4)(∇V*)ᵀ g_i(x) R⁻¹ (g_i(x))ᵀ ∇V*.  (14)

We develop this control algorithm (13) for the nonlinear switched system (1) and obtain the following result.

Theorem 1. Consider system (1) with the feedback control law u*(x) = −(1/2) R⁻¹ (g_σ(x))ᵀ ∇V*.
Then the cost function V*(t) = ∫_t^∞ [ r(x*(τ), u*(τ)) + λ (ρ(x*(τ)))² ] dτ is a Lyapunov function candidate, and with λ ≥ ‖R‖ it guarantees that system (1) is stable.
Proof. Consider the derivative of V*(t) along the solutions of Equation (1):

V̇* = (∇V*)ᵀ [ f_σ(x) + g_σ(x) ( u*(x) + d(x) ) ].

It follows from u*(x) = −(1/2) R⁻¹ (g_σ(x))ᵀ ∇V* and from (14), (11), and (12) that

V̇* = −xᵀQx − (u*)ᵀRu* − λ(ρ(x))² + (∇V*)ᵀ g_σ(x) d(x) = −xᵀQx − (u*)ᵀRu* − λ(ρ(x))² − 2(u*)ᵀR d(x).

Since −2(u*)ᵀR d ≤ (u*)ᵀRu* + dᵀRd, it follows from Assumption 1 and λ ≥ ‖R‖ that

V̇* ≤ −xᵀQx − (λ − ‖R‖)(ρ(x))² ≤ 0.

Therefore, system (1) is stable under the optimal control designed for the equivalent nominal system.
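As a numerical sanity check of the mechanism behind Theorem 1 (on an assumed scalar example, not one from the paper), the HJB equation for a scalar linear system with the modified cost can be solved in closed form via the scalar Riccati equation. With λ ≥ R, the value function decreases along the perturbed trajectory even under a worst-case matched disturbance d = c·x:

```python
import numpy as np

# Scalar toy system x_dot = a*x + u + d, |d| <= rho(x) = c*|x| (assumed
# example). With cost q*x^2 + r*u^2 + lam*rho(x)^2 and V*(x) = p*x^2,
# the HJB reduces to the scalar Riccati equation 0 = qp + 2*a*p - p^2/r.
a, q, r, c, lam = 0.3, 1.0, 1.0, 0.5, 2.0   # lam >= r, as Theorem 1 requires
qp = q + lam * c**2                          # effective state weight
p = r * (a + np.sqrt(a**2 + qp / r))         # positive Riccati root

def u_star(x):
    # u* = -(1/2) R^{-1} g^T grad V* with g = 1 and grad V* = 2*p*x
    return -p * x / r

x, dt = 1.0, 1e-3
V_prev = p * x**2
decreasing = True
for _ in range(5000):
    d_worst = c * x                          # worst-case matched disturbance
    x += dt * (a * x + u_star(x) + d_worst)
    V = p * x**2
    decreasing &= (V <= V_prev + 1e-12)      # V* never increases
    V_prev = V
```

The simulated V* is monotonically nonincreasing and the state decays, matching the bound V̇* ≤ −xᵀQx − (λ − ‖R‖)ρ(x)² from the proof.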
It is noteworthy that Theorem 1 extends the existing result for nonlinear systems in [33] to switched systems.
However, it is impossible to find the Bellman function V*(x) by solving the nonlinear HJB Equation (12) analytically. Hence, to solve it, we construct a critic network under the framework of adaptive critic learning. Using the approximation property of neural networks described in [28], the critic associated with system (3) can be described as

V*(x) = wᵀ σ(x) + ε(x),  (20)

where σ(x) : Rⁿ → R^N with σ(0) = 0 is the vector of activation functions consisting of N linearly independent elements, N is the number of neurons in the hidden layer of the Radial Basis Function (RBF) network [28] (and thus the dimension of σ(x)), ε(x) is the function reconstruction error, which plays a role in deriving the training law in the next steps, and w ∈ R^N is the ideal weight vector, which is generally unavailable. As N → ∞, we obtain ε(x) → 0 and ∇ε(x) → 0. The following assumption is considered for each fixed N.
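A minimal sketch of such an RBF critic is given below; the Gaussian centers and width are illustrative assumptions, with the activations shifted by a constant so that σ(0) = 0 as required above:

```python
import numpy as np

# RBF critic V_hat(x) = w^T sigma(x) with N = 3 Gaussian neurons (sketch;
# centers and width are illustrative choices, not the paper's).
centers = np.array([[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]])
width = 1.0

def sigma(x):
    # Gaussian activations, shifted so that sigma(0) = 0
    phi = np.exp(-np.sum((x - centers) ** 2, axis=1) / (2 * width ** 2))
    phi0 = np.exp(-np.sum(centers ** 2, axis=1) / (2 * width ** 2))
    return phi - phi0

def grad_sigma(x):
    # Jacobian d sigma / d x, shape (N, n); the constant shift drops out
    phi = np.exp(-np.sum((x - centers) ** 2, axis=1) / (2 * width ** 2))
    return -(x - centers) / width ** 2 * phi[:, None]

w_hat = np.zeros(3)                 # estimated weights, to be trained
V_hat = lambda x: w_hat @ sigma(x)  # approximate Bellman function
```

The gradient ∇σ(x) returned by `grad_sigma` is exactly the quantity needed later to form ∇V̂ = (∇σ(x))ᵀŵ in the control law and the training law.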

Assumption 2.
The NN terms satisfy ‖ε(x)‖ ≤ ε_max; ‖∇ε(x)‖ ≤ ∇ε_max; ∇σ_min ≤ ‖∇σ(x)‖ ≤ ∇σ_max; ‖w‖ ≤ w_max.

According to (12) and (13), substituting the critic representation (20) into the HJB equation yields the NN-based HJB equation

xᵀQx + (u*)ᵀRu* + λ(ρ(x))² + wᵀ∇σ(x) ( f_i(x) + g_i(x)u* ) + e_NN = 0,

where the residual error e_NN is generated by the function approximation error. It should be noted that, as N goes to infinity, e_NN converges uniformly to zero; hence, the residual error e_NN is bounded for each fixed N. Under the framework of ADP-based approximate optimal control design, a critic neural network is established with an estimated weight vector ŵ:

V̂(x) = ŵᵀ σ(x),  (26)

and the approximate optimal control law is

û(x) = −(1/2) R⁻¹ (g_σ(x))ᵀ (∇σ(x))ᵀ ŵ.  (27)

The approximation error of the critic network (the Bellman residual) can be formulated as

e_HJB = xᵀQx + ûᵀRû + λ(ρ(x))² + ŵᵀ∇σ(x) ( f_σ(x) + g_σ(x)û ).  (29)

The weight vector is determined by a steepest-descent algorithm minimizing the quadratic function E = (1/2) e_HJBᵀ e_HJB:

dŵ/dt = −α ∂E/∂ŵ = −α e_HJB ∇σ(x) ( f_σ(x) + g_σ(x)û ).  (30)
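The steepest-descent critic update can be sketched as follows; the dynamics f and g, the quadratic activations, and the parameter values are illustrative stand-ins, and the code follows the common ADP simplification of holding û fixed when differentiating E, so the regressor is ∇σ(x)(f + gû):

```python
import numpy as np

# Sketch of one critic update step minimizing E = 0.5 * e_HJB^2.
# f, g, rho, and grad_sigma are assumed placeholders, not the paper's systems.
alpha, lam = 0.1, 5.0
Q = np.diag([1.0, 3.0])
R = np.array([[2.0]])

def f(x):   return np.array([x[1], -x[0] - x[1]])
def g(x):   return np.array([[0.0], [1.0]])
def rho(x): return np.linalg.norm(x)

def grad_sigma(x):
    # gradient of quadratic activations sigma(x) = [x1^2, x1*x2, x2^2]
    return np.array([[2 * x[0], 0.0], [x[1], x[0]], [0.0, 2 * x[1]]])

def critic_step(w_hat, x, dt=1e-3):
    Gs = grad_sigma(x)                                      # (N, n)
    u_hat = -0.5 * np.linalg.inv(R) @ g(x).T @ Gs.T @ w_hat # control law (27)
    xdot = f(x) + g(x) @ u_hat
    e_hjb = (x @ Q @ x + u_hat @ R @ u_hat                  # Bellman residual
             + lam * rho(x) ** 2 + w_hat @ Gs @ xdot)
    w_dot = -alpha * e_hjb * (Gs @ xdot)                    # -alpha * dE/dw
    return w_hat + dt * w_dot, e_hjb
```

Each call moves ŵ a small step against the gradient of the squared residual, so repeated calls along a sufficiently exciting trajectory drive e_HJB toward zero.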

Remark 2. The weight vector ŵ is designed as in (30).
Theorem 2. The proposed optimal control law (27) with the critic-network weight update (30) ensures that system (1) is uniformly ultimately bounded (UUB).
Proof. Define w̃ = w − ŵ, so that dw̃/dt = −dŵ/dt. Consider a Lyapunov function of the form V(t) = V₁(t) + V₂(t), where V₁(t) depends on the weight estimation error w̃ and V₂(t) depends on the state.

For the term V₁(t), differentiating and using (27) and (13), together with (26) and (29), and the fact that u* = −(1/2)R⁻¹(g_i(x))ᵀ[(∇σ(x))ᵀw + ∇ε(x)], we obtain from (35) a bound of the form given in (48). Defining the terms A, B, and D accordingly, and according to (49) and (50), the inequality (A + 4B)² − 16B² + 2D² ≥ π₁ can be obtained. One can verify that π₁ > 0 for sufficiently large ‖w̃‖, since the highest-order coefficient (G_min)² λ_min(R⁻¹)(∇σ_min)²/2 is positive. Therefore, we can determine a positive number ϑ₁ such that, for all ‖w̃‖ > ϑ₁, (A + 4B)² − 16B² + 2D² ≥ π₁ holds, and from (48) we obtain V̇₁(t) ≤ −π₁.

Regarding the term V₂(t), we compute its derivative from (20). Taking ρ(x) = ‖x‖ and according to (51), together with Assumptions 1 and 2, it follows that (λ_min(Q) + λ)‖x‖² − θ² ≥ π₂ with π₂ > 0, which is a quadratic polynomial inequality in ‖x‖ whose highest-order coefficient (λ_min(Q) + λ) is positive. Hence, we can find a positive number ϑ₂ such that, for all ‖x‖ > ϑ₂, (λ_min(Q) + λ)‖x‖² − θ² ≥ π₂ holds, and from (52) we obtain V̇₂(t) ≤ −π₂.

Remark 3. The numbers ϑ₁ and ϑ₂ can be adjusted through the design of the neural network approximating the optimal cost function. Additionally, for any switching index, the state x and the weight error w̃ converge to the specified domains. The approximate optimal control law û in (27) converges to a neighbourhood of the optimal control u*. Unlike the controller in [33], the proposed adaptive optimal control handles a switched system with an unknown switching signal.

Simulation Results
In this section, we verify the effectiveness and performance of the proposed controller. The ARL algorithm solves the optimal tracking problem via RBF-network-based ADP, where a single critic neural network approximates the Bellman function. The ARL control law and the critic weights are established as in (27) and (30) with an appropriate learning rate α and coefficient λ. Moreover, to carry out the ARL algorithm (27) and (30), we need the model term g_i(x), the cost-function terms Q and R, and an appropriate activation function σ(x) for the neural network. To verify the proposed algorithm, two different situations are simulated, with all parameters and functions listed in each case as follows:

The Second-Order Switched Nonlinear Systems
In this simulation experiment, we consider a switched system consisting of N = 2 subsystems, described by Equations (56) and (57). The initial value of the state vector is selected, and the parameters and matrices in the cost function are chosen as follows: R = diag(2, 2); Q = diag(1, 3); α = 0.1; λ = 5. Additionally, the activation function σ(x) of the RBF neural network is chosen accordingly. Under the arbitrary switching law (Figure 1), the results obtained with the proposed ARL law are shown in Figures 2-6: the responses of the state variables are described in Figures 2 and 3 under the control input (Figure 5), and the weights are trained as in Figure 6. It is obvious that the closed-loop system is stable, since the state variables converge to zero (Figures 2 and 3). The convergence of the critic weights is also evident (Figure 6).
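The full closed loop for a second-order example of this kind can be sketched as below; the subsystem dynamics are assumed placeholders for Equations (56) and (57) (which are not reproduced here), the input matrix g and the quadratic activations are illustrative choices, and only Q, R, α, and λ are taken from the setup above:

```python
import numpy as np

# Closed-loop ARL sketch: arbitrary switching + control law (27) +
# critic weight update (30), with placeholder subsystem dynamics.
Q, R = np.diag([1.0, 3.0]), np.diag([2.0, 2.0])
alpha, lam, dt = 0.1, 5.0, 1e-3
g = np.eye(2)                                  # known input matrix (assumed)

def f(i, x):
    return (np.array([x[1], -x[0] - 1.5 * x[1]]) if i == 0
            else np.array([x[1], -2.0 * x[0] - x[1]]))

def grad_sigma(x):
    # gradient of quadratic activations sigma(x) = [x1^2, x1*x2, x2^2]
    return np.array([[2 * x[0], 0.0], [x[1], x[0]], [0.0, 2 * x[1]]])

x, w = np.array([1.0, -1.0]), np.zeros(3)
Rinv = np.linalg.inv(R)
for k in range(4000):
    i = (k // 500) % 2                         # arbitrary periodic switching
    Gs = grad_sigma(x)
    u = -0.5 * Rinv @ g.T @ Gs.T @ w           # control law (27)
    xdot = f(i, x) + g @ u
    e = x @ Q @ x + u @ R @ u + lam * x @ x + w @ Gs @ xdot
    w = w - dt * alpha * e * (Gs @ xdot)       # weight update (30)
    x = x + dt * xdot
```

Running the loop shows the qualitative behaviour reported above: the state decays toward the origin while the critic weights settle.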

Remark 4.
The second-order switched nonlinear system was also studied in [14] using a nonlinear control law after an equivalent transformation (Figures 1-6). However, it is noteworthy that the control input response of the proposed method has better transient and steady-state performance than the nonlinear controller in [14], owing to the effectiveness of handling the constraint in optimal control (Figures 4 and 5).

The Third-Order Switched Nonlinear Systems
Here, we continue to investigate a switched system consisting of N = 2 third-order subsystems. In this case, under the arbitrary switching law in Figure 7, the results are shown in Figures 8-13: the responses of the state variables in Figures 8-10, the control inputs in Figures 11 and 12, and the trained weights in Figure 13. It can be seen that the closed-loop system is stable (Figures 8-10) and that the weights of the critic part converge (Figure 13).

Remark 5.
It is worth noting that the third-order switched nonlinear system was also considered in [15], based on an adaptive backstepping output-feedback nonlinear control scheme combined with a transformation, with the responses of the state variables and control input shown in Figures 8-11 and the adaptation law in Figure 14. However, thanks to the optimality property of the proposed controller, its control input response has a better transient than the adaptive backstepping nonlinear controller in [15] (Figures 11 and 12).

Conclusions
This paper investigated the optimal control design for perturbed switched nonlinear systems based on the adaptive dynamic programming technique. The optimal control is first designed for the nominal system. Then, the ADP technique is developed using neural networks: owing to the nonlinear dynamics and the unknown switching index, a neural network approximates the critic part of the iterative algorithm. Moreover, the UUB stability of the closed-loop system and the convergence of the weight training are guaranteed by this solution. Finally, two simulation examples are given to verify the effectiveness of the presented ARL algorithm.