On Stability of Perturbed Nonlinear Switched Systems with Adaptive Reinforcement Learning

Dao, Phuong Nam; Nguyen, Hong Quang; Ngo, Minh-Duc; Ahn, Seon-Ju

doi:10.3390/en13195069

Open AccessArticle

On Stability of Perturbed Nonlinear Switched Systems with Adaptive Reinforcement Learning

¹

Department of Automatic Control, Hanoi University of Science and Technology, 1 Dai Co Viet Road, Hanoi 100000, Vietnam

²

Department of Automation, Thai Nguyen University of Technology, 666, 3/2 Street, Tich Luong Ward, Thai Nguyen City 251750, Vietnam

³

Department of Electrical Engineering, Chonnam National University, Gwangju 61186, Korea

^*

Authors to whom correspondence should be addressed.

Energies 2020, 13(19), 5069; https://doi.org/10.3390/en13195069

Submission received: 19 August 2020 / Revised: 21 September 2020 / Accepted: 24 September 2020 / Published: 27 September 2020

(This article belongs to the Section F: Electrical Engineering)

Download

Browse Figures

Versions Notes

Abstract

:

In this paper, a tracking control approach is developed based on an adaptive reinforcement learning algorithm with a bounded cost function for perturbed nonlinear switched systems, which represent a useful framework for modelling these converters, such as DC–DC converter, multi-level converter, etc. An optimal control method is derived for nominal systems to solve the tracking control problem, which results in solving a Hamilton–Jacobi–Bellman (HJB) equation. It is shown that the optimal controller obtained by solving the HJB equation can stabilize the perturbed nonlinear switched systems. To develop a solution to the translated HJB equation, the proposed neural networks consider the training technique obtaining the minimization of square of Bellman residual error in critic term due to the description of Hamilton function. Theoretical analysis shows that all the closed-loop system signals are uniformly ultimately bounded (UUB) and the proposed controller converges to optimal control law. The simulation results of two situations demonstrate the effectiveness of the proposed controller.

Keywords:

adaptive dynamic programming (ADP); adaptive reinforcement learning (ARL); switched systems; HJB equation; uniformly ultimately bounded (UUB); Lyapunov stability theory

1. Introduction

Many power electronic converters play a remarkable role in industrial applications, such as electrical drives, renewable energy systems, etc. [1,2,3,4,5]. The modelling technique in power electronic converters is usually implemented by considering the average small-signal analysis. This method enables us to design the voltage, current controller, as well as implement the space vector modulation technique. However, the average small-signal model method is disadavantageous for complicated applications, with the requirement of tracking control systems. Switched systems have been known as a special class of hybrid systems describing an active application area in the field of power electronics, such as DC–DC converter, power source, etc. [6,7,8,9,10,11,12,13,14,15]. The approach of employing switched systems representation in control design has been the efficient method to model many practical systems based on multi-subsystem. Basic issues in the control of switched systems are to find the control input under arbitrary switching signal and/or to find the appropriate switching signal obtaining the tracking, stability of closed-loop systems. Additionally, the identification of the active mode based on estimating the delay to analyze the switched systems was implemented in [6]. The authors in [7] proposed a novel Lyapunov function combining linear matrix inequalities (LMIs) for switched linear systems in the presence of exogenous disturbance. The control design problem with stochastic stability based on a complete probability space under the general random switching signal was investigated in [8]. The stabilizing switching signal was designed using average dwell time with proposed existence conditions [9]. Moreover, several conditions to characterize the mode-dependent dwell-times have been given in [10]. The switching event triggered and back stepping technique-based tracking controller achieved the bound of all signals and tracking error in the closed-loop system [11]. However, the optimal control with effectiveness in dealing with constraint has not been mentioned in switched systems yet [6,7,8,9,10,11]. For the optimal control design, it is necessary to find the result of a Ricatti equation for linear systems or a Hamilton–Jacobi–Bellman (HJB) equation for nonlinear systems [16]. However, in the general case, it is hard to solve the HJB (Hamilton–Jacobi–Bellman) equation for finding the optimal controller. In [16,17,18,19], thanks to the approximation description of Neural Networks (NNs), the weight of the actor and critic were updated simultaneously under the optimization problem for continuous/discrete time systems. The convergence effectiveness of weights in actor/critic policy is guaranteed by the persistence excitation (PE) condition. It is worth emphasizing that to deal with the uncertainties of nonlinear continuous systems, the identifier was inserted into the control structure [19] and the off-policy technique under the separation between actor/critic policy and control input in the first stage is presented [20]. Moreover, the constraint in input/output was mentioned in the adaptive/approximate dynamic programming (ADP) technique [17,18,21] by using the appropriate cost function and the dynamic programming principle. The framework of ADP was utilized in control scheme for surface vessel systems [22], a spring-Mass-Damper [23], two mass-spring systems [20], Wheeled mobile robotic systems [24]. Based on approximating the modified critic term in each control loop, the stability of whole cascade systems was guaranteed in surface vessel systems due to the consideration of the derivative of Lyapunov candidate function along the closed system [22,25]. However, all the ARL-based controllers presented in [20,22,23,24] are implemented under the consideration of nonlinear systems, which are not mentioned in the switching signals. The model-free problem, as well as the connection between continuous time systems and appropriate discrete time systems, ensure the development of off-policy algorithm [26,27]. To deal with the complete dynamic uncertainties, the modified cost function and critic enable us to obtain the approximately optimal control law [28,29,30,31]. Additionally, the general policy iteration (GPI) in linear discrete time systems [32] achieves the simultaneous computation and appropriate data collection.

To the best knowledge of the authors, adaptive reinforcement learning (ARL) for uncertain switched systems is still an open problem. Therefore, this paper attempts to implement an ARL algorithm for uncertain continuous-time nonlinear switched systems under arbitrary switching signals based on an optimization problem in NNs training. Inspired by the above works, this paper studies the ARL-based optimal control problem for a class of perturbed switched nonlinear continuous systems. First, the optimal control algorithm is established for switched systems. Then, the neural network is employed to approximate the Critic part in policy. Based on optimization principle, the training law is proposed to develop the optimal control strategy. The main contributions of this paper are as follows:

(1): In comparison with the previous papers [12,13,14,15,18,22,29,30,33], an optimal control obtained from the nominal system is proposed for perturbed switched nonlinear continuous systems based on dynamic programming principle.
(2): The Neural Network training law with optimization principle is developed to achieve the ARL-based optimal control strategy.
(3): The strict proof concerning UUB stability of the closed system and the convergence of the controller to optimal control input are given based on Lyapunov stability theory and reinforcement learning scheme.

The rest of this article is organized as follows. The preliminaries and problem statements are presented in Section 2. The main results are given in Section 3. Two simulation cases are presented in Section 4 to illustrate the effectiveness of the proposed solution. Finally, the conclusions are exhibited in Section 5.

2. Problem Statement and Preliminaries

In [15], it can be seen that switched systems can represent a useful framework for modelling general power electronics converters. Therefore, in this section, we consider the perturbed continuous-time nonlinear switched system described by:

\dot{x} = f_{σ} (x) + g_{σ} (x) (u + Δ (x, t))

(1)

where

x (t) \in Ω_{x} \subset R^{n}

and

u (t) \in Ω_{u} \subset R^{m}

are the vector of state variables and the control inputs vector, respectively. The function

σ : [0, + \infty) \to Ω = \{1, 2, . . ., n\}

is an unknown switching signal and n is the number of subsystems.

f_{i} (x)

(\forall i \in Ω)

are unknown smooth vector functions satisfying

f_{i} (0) = 0

and

g_{i} (x)

(\forall i \in Ω)

are known smooth vector functions such that

G_{min} \leq ∥g_{i} (x)∥ \leq G_{max}

.

Assumption 1.

There exists a known function

ρ (x)

satisfying that:

∥Δ (x, t)∥ \leq ρ (x)

.

Regarding the perturbed continuous-time nonlinear switched system (1), we introduce the cost function formulated as:

J (x (t), u (t)) = \int_{t}^{\infty} r (x (τ), u (τ)) d τ

(2)

where

r (x (τ), u (τ)) = x^{T} Q x + u^{T} R u

and

Q = Q^{T} \geq 0;

R = R^{T} > 0

Control Objective: This article aims at designing the optimal guaranteed cost control scheme

u^{*} = arg min_{u \in Ω_{u}} K (x, u)

despite arbitrary switching law, in which the feedback control law u and a finite upper bound function

K (x, u)

are satisfied, not only is the closed-loop system (1) robustly stable but the cost function (2) also satisfies the condition:

J (x, u) \leq K (x, u) \leq M

.

Definition 1.

The function

K (x, u)

can be known as the guaranteed cost function. Therefore, the control law

u^{*}

with

u^{*} = arg min_{u \in Ω_{u}} K (x, u)

is known as the optimal guaranteed cost control law.

Remark 1.

It is worth emphasizing that the main objective of this work is to find the optimal control for the equivalent nominal system with the performance index being modified. Compared with the control objective in [33], this work investigates the tracking control for switched systems, for which it is hard to develop the optimal control algorithm. Additionally, it is obviously different from the existing work of robust ADP in [31], since the proposed optimal control design is implemented based on the upper bound function

K (x, u)

.

3. Adaptive Reinforcement Learning-Based Control Design

In this section, we investigate the ARL-based optimal control for perturbed nonlinear switched systems. Due to the difficulties in implementing directly the optimal control scheme for perturbed nonlinear systems, the strategy is proposed by three steps as follows. First, based on Dynamic Programming, we obtain the optimal control design for the corresponding nominal switched systems obtained by eliminating the uncertainties. Then, the ARL algorithm is developed by Neural Network technique for this nominal system. Finally, we carry out a stability analysis of the closed-loop system involving the perturbed switched system and the proposed ARL controller.

According to perturbed system (1), the nominal system is obtained by eliminating the term of uncertainties as:

\dot{x} = f_{σ} (x) + g_{σ} (x) u

(3)

Because the purpose is to consider the tracking problem of optimal control law and the influence of uncertainties, the cost function is modified to satisfy that

J (x (t), u (t)) \leq J_{1} (x (t), u (t))

. Therefore, the cost function associated with (3) can be represented as:

J_{1} (x (t), u (t)) = \int_{t}^{\infty} [r (x (τ), u (τ)) + λ {(ρ (x))}^{2}] d τ

(4)

It should be noted that

J_{1} (x (t), u (t))

with

λ \geq ∥R∥

is one of the guaranteed cost function associated with system (1). According to (2) and (4), it can be seen that:

J (x (t), u (t)) \leq J_{1} (x (t), u (t))

(5)

Based on the Dynamic programming principle, establishing the Bellman function as:

V^{*} (t) = min_{u \in Ω_{u}} J_{1} (x (t), u (t))

, we have the cost function formulated as:

V^{*} (t) = min_{u \in Ω_{u}} \int_{t}^{\infty} [r (x (τ), u (τ)) + λ {(ρ (x))}^{2}] d τ

(6)

V^{*} (t) = min_{u \in Ω_{u}} \int_{t}^{t + Δ t} [r (x (τ), u (τ)) + λ {(ρ (x))}^{2}] d τ + min_{u \in Ω_{u}} \int_{t + Δ t}^{\infty} [r (x (τ), u (τ)) + λ {(ρ (x))}^{2}] d τ

(7)

V^{*} (x) = min_{u \in Ω_{u}} \int_{t}^{t + Δ t} [r (x (τ), u (τ)) + λ {(ρ (x))}^{2}] d τ + V^{*} (t + Δ t)

(8)

min_{u \in Ω_{u}} \{\frac{1}{Δ t} \int_{t}^{t + Δ t} [r (x (τ), u (τ)) + λ {(ρ (x))}^{2}] d τ + \frac{V^{*} (t + Δ t) - V^{*} (t)}{Δ t}\} = 0

(9)

As the convergence of

Δ t \to 0^{+}

, we can derive that:

min_{u \in Ω_{u}} [r (x (t), u (t)) + λ ρ^{2} (x) + {(\nabla V^{*})}^{T} (f_{σ} (x) + g_{σ} (x) u)] = 0

(10)

where

\nabla V^{*} = \frac{\partial V^{*}}{\partial x}

.

Consider the Hamilton function obtaining from the nominal system and performance index (4):

H (x, u, V^{*}) = r (x (t), u (t)) + λ ρ^{2} (x) + {(\nabla V^{*})}^{T} (f_{σ} (x) + g_{σ} (x) u)

(11)

The control input is computed by minimizing this function with

u \in Ω_{u}

as

H (x, u^{*}, V^{*}) = min_{u \in Ω_{u}} H (x, u, V^{*}) = 0

(12)

Therefore, we find that (13) yields

{\frac{\partial H (x, u, V^{*})}{\partial u}|}_{u = u^{*}} = 0 \Rightarrow u^{*} = - \frac{1}{2} R^{- 1} {(g_{σ} (x))}^{T} \nabla V^{*}

(13)

We implement to develop this control algorithm (13) for nonlinear switched system (1) and achieve the following result:

Theorem 1.

Take system (1) with the feedback control law

u^{*} (x) = - \frac{1}{2} R^{- 1} {(g_{σ} (x))}^{T} \nabla V^{*}

into consideration. Then, the cost function

V^{*} (t) = \int_{t}^{\infty} [r (x^{*} (τ), u^{*} (τ)) + λ {(ρ (x^{*}))}^{2}] d τ

is the Lyapunov function candidate with

λ \geq ∥R∥

can guarantee that the system (1) is stable.

Proof.

Consider the derivative of

V^{*} (t)

along the solution of Equation (1), we achieve the result formulated as:

{\dot{V}}^{*} (t) = {(\nabla V^{*})}^{T} (f_{i} (x^{*}) + g_{i} (x^{*}) (u^{*} + Δ (x^{*}, t)))

(14)

It follows from

u^{*} (x) = - \frac{1}{2} R^{- 1} {(g_{i} (x))}^{T} \nabla V^{*}

and according to (14), (11) and (12), we have:

\begin{matrix} 8 {\dot{V}}^{*} (t) & = - r (x^{*} (t), u^{*} (t)) - λ ρ^{2} (x) + {(g_{i} (x^{*}))}^{T} \nabla V^{*} Δ (x^{*}, t) \\ = - x^{* T} Q x^{*} - u^{*} T R u^{*} - λ ρ^{2} (x^{*}) - 2 u^{* T} R Δ (x^{*}, t) \end{matrix}

(15)

= - x^{* T} Q x^{*} - λ ρ^{2} (x^{*}) - (u^{* T} + Δ {(x^{*}, t)}^{T}) R (u^{*} + Δ (x, t)) + Δ {(x^{*}, t)}^{T} R Δ (x^{*}, t)

(16)

= - x^{* T} Q x^{*} - (λ ρ^{2} (x^{*}) - Δ {(x^{*}, t)}^{T} R Δ (x^{*}, t)) - {(u^{*} + Δ (x^{*}, t))}^{T} R (u^{*} + Δ (x, t))

(17)

According to Assumption 1 and

λ \geq ∥R∥

, it can be seen that:

{\dot{V}}^{*} (t) \leq - x^{* T} Q x^{*} - (λ ρ^{2} (x^{*}) - ∥R∥ {∥Δ (x^{*}, t)∥}^{2}) \leq - x^{* T} Q x^{*} - (λ - ∥R∥) ρ^{2} (x^{*})

(18)

\dot{V} (t) \leq - x^{* T} Q x^{*}

(19)

Therefore, the system (1) is stable under the optimal control for equivalent nominal system. □

It is noteworthy that Theorem 1 has a new extension in switched systems compared with nonlinear systems in the existing result [33].

However, it is impossible to find the Bellman function

V^{*} (x)

based on solving analytically the HJB nonlinear Equation (12). Hence, to solve it, we construct a critic network under the framework of adaptive critic learning. Using the approximation property of neural networks described in [28], the critic associated with system (3) can be described as:

V^{*} (x) = w^{T} σ (x) + ε (x)

(20)

where

σ (x) : R^{n} \to R^{N}; σ (0) = 0

is the vector activation function including N linearly independent elements, N is the number of neurons in the hidden layer of Radial Basis Function (RBF) network [28] and it is also the number of dimensions of

σ (x) : R^{n} \to R^{N}

, and

ε (x)

is the function reconstruction error playing the role in finding the training law the in next steps,

w \in R^{N}

is the idea weight vector and is generally unavailable. As

N \to \infty

converges to infinity, we obtain that

ε (x) \to 0

and

\nabla ε (x) \to 0

. The following assumption is considered for each fixed N.

Assumption 2.

The estimation of terms in NNs can be described as:

∥ε (x)∥ \leq ε_{max}; ∥\nabla ε (x)∥ \leq \nabla ε_{max}; \nabla σ_{min} \leq ∥\nabla σ (x)∥ \leq \nabla σ_{max}; ∥w∥ \leq w_{max}

According two Equations (12) and (13), it holds that:

H (x, u^{*}, V^{*}) = r (x (t), u^{*} (t)) + λ ρ^{2} (x) + {(\nabla V^{*})}^{T} (f_{i} (x) + g_{i} (x) u^{*}) = 0

(21)

= x^{T} Q x + λ ρ^{2} (x) + {(\nabla V^{*})}^{T} f_{i} (x) - \frac{1}{4} {(\nabla V^{*})}^{T} g_{i} (x) R^{- 1} g_{i} {(x)}^{T} (\nabla V^{*}) = 0

(22)

It follows from (20) to:

\nabla V^{*} = {(\nabla σ (x))}^{T} w + \nabla ε (x)

(23)

The NN-based HJB equation can be represented as follows:

e_{N N} = x^{T} Q x + λ ρ^{2} (x) + {(\nabla σ {(x)}^{T} w)}^{T} f_{i} (x) - \frac{1}{4} {(\nabla σ {(x)}^{T} w)}^{T} g_{i} (x) R^{- 1} g_{i} {(x)}^{T} (\nabla σ {(x)}^{T} w)

(24)

e_{N N} = x^{T} Q x + λ ρ^{2} (x) + w^{T} \nabla σ (x) f_{i} (x) - \frac{1}{4} w^{T} \nabla σ (x) g_{i} (x) R^{- 1} g_{i} {(x)}^{T} \nabla σ {(x)}^{T} w

(25)

Thus, the residual error is formulated by the function approximation error:

e_{N N} = - \nabla ε {(x)}^{T} (f_{i} (x) + g_{i} (x) u^{*}) + \frac{1}{4} \nabla ε {(x)}^{T} g_{i} (x) R^{- 1} g_{i} {(x)}^{T} \nabla ε (x)

(26)

It should be noted that, as N goes to infinity,

e_{N N}

converges uniformly to zero. Hence, the residual error

e_{N N}

is bounded for each fixed N. Under the framework of ADP-based approximate optimal control design, a critic neural network is established by an estimated weight vector

\hat{w}

given as:

\hat{V} = {\hat{w}}^{T} σ (x) = σ {(x)}^{T} \hat{w}; \hat{u} = - \frac{1}{2} R^{- 1} {(g_{i} (x))}^{T} \nabla \hat{V}

(27)

The approximate error of the critic network can be formulated as:

e_{H J B} = r (x (t), \hat{u} (t)) + λ ρ^{2} (x) + {(\nabla σ {(x)}^{T} \hat{w})}^{T} (f_{i} (x) + g_{i} (x) \hat{u})

(28)

e_{H J B} = x^{T} Q x + λ ρ^{2} (x) + {\hat{w}}^{T} \nabla σ (x) f_{i} (x) - \frac{1}{4} {\hat{w}}^{T} \nabla σ (x) g_{i} (x) R^{- 1} g_{i} {(x)}^{T} \nabla σ {(x)}^{T} \hat{w}

(29)

The weight vector is determined by a steepest descent algorithm:

\frac{d}{d t} \hat{w} = - α \frac{\partial E}{\partial \hat{w}}

(30)

with

E = \frac{1}{2} e_{H J B}^{T} e_{H J B}

Remark 2.

The weight vector

\hat{w}

is designed as in (30) to minimize the quadratic function

E = \frac{1}{2} e_{H J B}^{T} e_{H J B}

. The fact is that the following inequality:

\dot{E} = \frac{\partial E}{\partial t} = \frac{\partial E}{\partial \hat{w}} . \frac{\partial \hat{w}}{\partial t} = \frac{\partial E}{\partial \hat{w}} . (\frac{d}{d t} \hat{w}) = - α {(\frac{\partial E}{\partial \hat{w}})}^{2} \leq 0

(31)

Theorem 2.

The proposed optimal control (27) with the weight vector of the critic network (30) can force the system (1) to be uniform ultimate bounded (UUB) stability.

Proof.

Define:

\tilde{w} = w - \hat{w} \Rightarrow \frac{d}{d t} \tilde{w} = - \frac{d}{d t} \hat{w}

. Consider the Lyapunov function:

V (t) = V_{1} (t) + V_{2} (t), where : V_{1} (t) = \frac{1}{2 α} \tilde{w} {(t)}^{T} \tilde{w} (t), V_{2} (t) = V^{*}

For deriving the term

V_{1} (t)

, we obtain that:

\frac{d}{d t} V_{1} (t) = \frac{1}{α} \tilde{w} {(t)}^{T} (\frac{d}{d t} \tilde{w} (t)) = - \frac{1}{α} \tilde{w} {(t)}^{T} (\frac{d}{d t} \hat{w} (t)) = \tilde{w} {(t)}^{T} \frac{\partial E}{\partial \hat{w}}

{\dot{V}}_{1} = {\tilde{w}}^{T} e_{H J B} \nabla σ (x) (f_{i} (x) + g_{i} (x) \hat{u})

According to (27) and (13), it follows that:

\hat{u} - u^{*} = - \frac{1}{2} R^{- 1} {(g_{i} (x))}^{T} (\nabla \hat{V} - \nabla V^{*})

(32)

= - \frac{1}{2} R^{- 1} {(g_{i} (x))}^{T} (\nabla σ {(x)}^{T} \hat{w} - {(\nabla σ (x))}^{T} w - \nabla ε (x))

(33)

= \frac{1}{2} R^{- 1} {(g_{i} (x))}^{T} ({(\nabla σ (x))}^{T} \tilde{w} + \nabla ε (x))

(34)

Moreover, we have:

\nabla σ (x) (f_{i} (x) + g_{i} (x) \hat{u}) = \nabla σ (x) (f_{i} (x) + g_{i} (x) u^{*}) + \nabla σ (x) g_{i} (x) (\hat{u} - u^{*})

(35)

\nabla σ (x) (f_{i} (x) + g_{i} (x) \hat{u}) = \nabla σ (x) (f_{i} (x) + g_{i} (x) u^{*} + \frac{1}{2} g_{i} (x) R^{- 1} g_{i} {(x)}^{T} (\nabla σ {(x)}^{T} \tilde{w} + \nabla ε (x)))

(36)

From (26) and (29), it follows:

\begin{matrix} e_{H J B} - e_{N N} = ({(\nabla σ {(x)}^{T} \hat{w})}^{T} f_{i} (x) - {(\nabla σ {(x)}^{T} w)}^{T} f_{i} (x)) \\ + \frac{1}{4} (w^{T} \nabla σ (x) g_{i} (x) R^{- 1} g_{i} {(x)}^{T} \nabla σ {(x)}^{T} w - {\hat{w}}^{T} \nabla σ (x) g_{i} (x) R^{- 1} g_{i} {(x)}^{T} \nabla σ {(x)}^{T} \hat{w}) \\ + \frac{1}{2} {(\nabla σ {(x)}^{T} \hat{w})}^{T} g_{i} (x) R^{- 1} {(g_{i} (x))}^{T} ({(\nabla σ (x))}^{T} \tilde{w} + \nabla ε (x)) \end{matrix}

(37)

\begin{matrix} e_{H J B} - e_{N N} = - {(\nabla σ {(x)}^{T} \tilde{w})}^{T} f_{i} (x) \\ + \frac{1}{4} ((w^{T} - {\hat{w}}^{T}) \nabla σ (x) g_{i} (x) R^{- 1} g_{i} {(x)}^{T} \nabla σ {(x)}^{T} w + {\hat{w}}^{T} \nabla σ (x) g_{i} (x) R^{- 1} g_{i} {(x)}^{T} \nabla σ {(x)}^{T} (w - \hat{w})) \end{matrix}

(38)

\begin{matrix} e_{H J B} - e_{N N} = - {\tilde{w}}^{T} \nabla σ (x) f_{i} (x) \\ + \frac{1}{4} ({\tilde{w}}^{T} \nabla σ (x) g_{i} (x) R^{- 1} g_{i} {(x)}^{T} \nabla σ {(x)}^{T} w + {\hat{w}}^{T} \nabla σ (x) g_{i} (x) R^{- 1} g_{i} {(x)}^{T} \nabla σ {(x)}^{T} \tilde{w}) \end{matrix}

(39)

\begin{matrix} e_{H J B} - e_{N N} = - {\tilde{w}}^{T} \nabla σ (x) f_{i} (x) \\ + \frac{1}{4} ({\tilde{w}}^{T} \nabla σ (x) g_{i} (x) R^{- 1} g_{i} {(x)}^{T} \nabla σ {(x)}^{T} w + {(w - \tilde{w})}^{T} \nabla σ (x) g_{i} (x) R^{- 1} g_{i} {(x)}^{T} \nabla σ {(x)}^{T} \tilde{w}) \end{matrix}

(40)

\begin{matrix} e_{H J B} - e_{N N} = - {\tilde{w}}^{T} \nabla σ (x) f_{i} (x) \\ + \frac{1}{2} {\tilde{w}}^{T} \nabla σ (x) g_{i} (x) R^{- 1} g_{i} {(x)}^{T} \nabla σ {(x)}^{T} w - \frac{1}{4} {\tilde{w}}^{T} \nabla σ (x) g_{i} (x) R^{- 1} g_{i} {(x)}^{T} \nabla σ {(x)}^{T} \tilde{w} \end{matrix}

(41)

Because

u^{*} = - \frac{1}{2} R^{- 1} {(g_{i} (x))}^{T} ({(\nabla σ (x))}^{T} w + \nabla ε (x))

, we have:

\begin{matrix} e_{H J B} - e_{N N} = - {\tilde{w}}^{T} \nabla σ (x) f_{i} (x) + {\tilde{w}}^{T} \nabla σ (x) g_{i} (x) (- u^{*} - \frac{1}{2} R^{- 1} g_{i} {(x)}^{T} \nabla ε (x)) \\ - \frac{1}{4} {\tilde{w}}^{T} \nabla σ (x) g_{i} (x) R^{- 1} g_{i} {(x)}^{T} \nabla σ {(x)}^{T} \tilde{w} \end{matrix}

(42)

\begin{matrix} e_{H J B} = e_{N N} - {\tilde{w}}^{T} \nabla σ (x) (f_{i} (x) + g_{i} u^{*}) \\ - \frac{1}{2} {\tilde{w}}^{T} \nabla σ (x) g_{i} (x) R^{- 1} g_{i} {(x)}^{T} \nabla ε (x) - \frac{1}{4} {\tilde{w}}^{T} \nabla σ (x) g_{i} (x) R^{- 1} g_{i} {(x)}^{T} \nabla σ {(x)}^{T} \tilde{w} \end{matrix}

(43)

Assumption 3.

∥f_{i} (x) + g_{i} (x) u^{*}∥ \leq μ_{max}

Define:

μ_{i} = f_{i} (x) + g_{i} (x) u^{*}; G_{i} = g_{i} (x) R^{- 1} g_{i} {(x)}^{T}; \nabla σ = \nabla σ (x); \nabla ε = \nabla ε (x)

From (35) we obtain:

{\dot{V}}_{1} (t) = - {\tilde{w}}^{T} (- e_{N N} + {\tilde{w}}^{T} \nabla σ μ_{i} + \frac{1}{2} {\tilde{w}}^{T} \nabla σ G_{i} \nabla ε + \frac{1}{4} {\tilde{w}}^{T} \nabla σ G_{i} \nabla σ^{T} \tilde{w}) \nabla σ (x) (μ_{i} + \frac{1}{2} G_{i} (\nabla σ^{T} \tilde{w} + \nabla ε))

(44)

\begin{matrix} {\dot{V}}_{1} (t) = - \frac{1}{8} {\tilde{w}}^{T} ({\tilde{w}}^{T} \nabla σ G_{i} \nabla σ^{T} \tilde{w}) \nabla σ (G_{i} \nabla σ^{T} \tilde{w}) \\ - (\frac{1}{2} {\tilde{w}}^{T} ({\tilde{w}}^{T} \nabla σ μ_{i}) \nabla σ (G_{i} \nabla σ^{T} \tilde{w}) + \frac{1}{4} {\tilde{w}}^{T} ({\tilde{w}}^{T} \nabla σ G_{i} \nabla ε) \nabla σ (G_{i} \nabla σ^{T} \tilde{w}) + \frac{1}{4} {\tilde{w}}^{T} ({\tilde{w}}^{T} \nabla σ G_{i} \nabla σ^{T} \tilde{w}) \nabla σ μ_{i}) \\ - {\tilde{w}}^{T} ({\tilde{w}}^{T} \nabla σ μ_{i} + \frac{1}{2} {\tilde{w}}^{T} \nabla σ G_{i} \nabla ε) \nabla σ μ_{i} + \frac{1}{2} {\tilde{w}}^{T} e_{N N} \nabla σ (G_{i} \nabla σ^{T} \tilde{w}) + {\tilde{w}}^{T} e_{N N} \nabla σ μ_{i} \end{matrix}

(45)

\begin{matrix} {\dot{V}}_{1} (t) = - \frac{1}{8} {({\tilde{w}}^{T} \nabla σ G_{i} \nabla σ^{T} \tilde{w})}^{2} \\ - [\frac{1}{2} ({\tilde{w}}^{T} \nabla σ μ_{i}) + \frac{1}{4} ({\tilde{w}}^{T} \nabla σ G_{i} \nabla ε) + \frac{1}{4} ({\tilde{w}}^{T} \nabla σ μ_{i}) + \frac{1}{2} e_{N N}] ({\tilde{w}}^{T} \nabla σ G_{i} \nabla σ^{T} \tilde{w}) \\ - {({\tilde{w}}^{T} \nabla σ μ_{i})}^{2} + [e_{N N} - \frac{1}{2} ({\tilde{w}}^{T} \nabla σ G_{i} \nabla ε)] ({\tilde{w}}^{T} \nabla σ μ_{i}) \end{matrix}

(46)

\begin{matrix} {\dot{V}}_{1} (t) = - \frac{1}{8} {({\tilde{w}}^{T} \nabla σ G_{i} \nabla σ^{T} \tilde{w})}^{2} \\ - [\frac{3}{4} ({\tilde{w}}^{T} \nabla σ μ_{i}) + \frac{1}{4} ({\tilde{w}}^{T} \nabla σ G_{i} \nabla ε) + \frac{1}{2} e_{N N}] ({\tilde{w}}^{T} \nabla σ G_{i} \nabla σ^{T} \tilde{w}) - {({\tilde{w}}^{T} \nabla σ μ_{i})}^{2} \\ + [e_{N N} - \frac{1}{2} ({\tilde{w}}^{T} \nabla σ G_{i} \nabla ε)] ({\tilde{w}}^{T} \nabla σ μ_{i}) \end{matrix}

(47)

Define:

\begin{matrix} A = {\tilde{w}}^{T} \nabla σ G_{i} \nabla σ^{T} \tilde{w}; B = \frac{3}{4} ({\tilde{w}}^{T} \nabla σ μ_{i}) + \frac{1}{4} ({\tilde{w}}^{T} \nabla σ G_{i} \nabla ε) + \frac{1}{2} e_{N N}; \\ C = {\tilde{w}}^{T} \nabla σ μ_{i}; D = e_{N N} - \frac{1}{2} ({\tilde{w}}^{T} \nabla σ G_{i} \nabla ε) \end{matrix}

It is obvious that:

\begin{matrix} {\dot{V}}_{1} (t) = - \frac{1}{8} A^{2} - B A - C^{2} - D C = - \frac{1}{8} {(A + 4 B)}^{2} + 2 B^{2} - {(C + \frac{D}{2})}^{2} + \frac{D^{2}}{4} \\ - \frac{1}{8} [{(A + 4 B)}^{2} - (16 B^{2} + 2 D^{2})] \end{matrix}

(48)

We have:

A + 4 B \geq {∥\tilde{w}∥}^{2} {(G_{min})}^{2} λ_{min} (R^{- 1}) {(\nabla σ_{min})}^{2} - ∥\tilde{w}∥ (3 \nabla σ_{max} μ_{max} + \nabla σ_{max} {(G_{max})}^{2} λ_{max} (R^{- 1}) \nabla ε_{max}) - 2 e_{max}

(49)

\begin{matrix} 16 B^{2} + 2 D^{2} \leq {(\begin{matrix} ∥\tilde{w}∥ (3 \nabla σ_{max} μ_{max} + \nabla σ_{max} {(G_{max})}^{2} λ_{max} (R^{- 1}) \nabla ε_{max}) \\ + 2 e_{max} \end{matrix})}^{2} \\ + 2 {(e_{max} + \frac{1}{2} ∥\tilde{w}∥ \nabla σ_{max} \nabla σ_{max} {(G_{max})}^{2} λ_{max} (R^{- 1}) \nabla ε_{max})}^{2} \end{matrix}

(50)

According to (49) and (50), the inequality can be obtained

{(A + 4 B)}^{2} - (16 B^{2} + 2 D^{2}) \geq π_{1}

It can be obtained that

π_{1} > 0

with a big enough variable

∥\tilde{w}∥

and the highest order coefficient

{({(G_{min})}^{2} λ_{min} (R^{- 1}) {(\nabla σ_{min})}^{2})}^{2} > 0

. Therefore, we can determine the positive number

ϑ_{1}

such that:

\forall ∥\tilde{w}∥ > ϑ_{1}

, we imply that

{(A + 4 B)}^{2} - (16 B^{2} + 2 D^{2}) \geq π_{1}

, and from (48) we obtain:

{\dot{V}}_{1} (t) \leq - π_{1}

.

Regarding the term of

V_{2} (t)

, from (20) we compute the derivative of them:

\begin{array}{l} {\dot{V}}_{2} = {(\nabla V^{*})}^{T} (f_{i} + g_{i} (\hat{u} + Δ)) \\ = - (x^{T} Q x + λ ρ^{2} (x)) + \frac{1}{4} {(\nabla V^{*})}^{T} g_{i} (x) R^{- 1} g_{i} {(x)}^{T} (\nabla V^{*}) + {(\nabla V^{*})}^{T} g_{i} (- \frac{1}{2} R^{- 1} {(g_{i} (x))}^{T} \nabla σ {(x)}^{T} \hat{w} + Δ) \\ = - (x^{T} Q x + λ ρ^{2} (x)) - \frac{1}{4} {(\nabla V^{*})}^{T} g_{i} (x) R^{- 1} g_{i} {(x)}^{T} (\nabla V^{*}) \\ + \frac{1}{2} {(\nabla V^{*})}^{T} g_{i} R^{- 1} {(g_{i} (x))}^{T} (\nabla V^{*} - \nabla σ {(x)}^{T} \hat{w}) + {(\nabla V^{*})}^{T} g_{i} Δ \\ = - (x^{T} Q x + λ ρ^{2} (x)) - \frac{1}{4} {(\nabla V^{*})}^{T} g_{i} (x) R^{- 1} g_{i} {(x)}^{T} (\nabla V^{*}) \\ + \frac{1}{2} {(\nabla V^{*})}^{T} g_{i} R^{- 1} {(g_{i} (x))}^{T} (\nabla σ {(x)}^{T} \tilde{w} + \nabla ε (x)) + {(\nabla V^{*})}^{T} g_{i} Δ \\ = - (x^{T} Q x + λ ρ^{2} (x)) - \frac{1}{4} {(\nabla V^{*})}^{T} g_{i} R^{- 1} {g_{i}}^{T} (\nabla V^{*}) + \frac{1}{2} {(\nabla V^{*})}^{T} g_{i} R^{- 1} {g_{i}}^{T} (\nabla σ {(x)}^{T} \tilde{w} + \nabla ε (x)) \\ + {(\nabla V^{*})}^{T} g_{i} Δ \end{array}

(51)

It can be considered that

ρ (x) = ϖ ∥x∥

. According to (51), we obtain:

{\dot{V}}_{2} \leq - (λ_{min} (Q) + λ ϖ) {∥x∥}^{2} + θ^{2}

(52)

with

θ^{2} = - \frac{1}{4} {(\nabla V^{*})}^{T} g_{i} R^{- 1} g_{i}^{T} (\nabla V^{*}) + \frac{1}{2} {(\nabla V^{*})}^{T} g_{i} R^{- 1} g_{i}^{T} (\nabla σ {(x)}^{T} \tilde{w} + \nabla ε (x)) + {(\nabla V^{*})}^{T} g_{i} Δ

(53)

According to the assumption 1 and assumption 2, it follows that:

\begin{matrix} θ^{2} \leq \frac{1}{4} {(w_{max} \nabla σ_{max} + \nabla ε_{max})}^{2} g_{max}^{2} λ_{max} (R^{- 1}) + \frac{1}{2} {(ϑ \nabla σ_{max} + \nabla ε_{max})}^{2} g_{max}^{2} λ_{max} (R^{- 1}) \\ + (w_{max} \nabla σ_{max} + \nabla ε_{max}) g_{max} ϖ ∥x∥ \end{matrix}

(54)

It can be seen that

(λ_{min} (Q) + λ ϖ) {∥x∥}^{2} - θ^{2} \geq π_{2}

with

π_{2} > 0

also leads to other inequalities, including the polynomial quadratic form with the variable

∥x∥

and the highest order coefficient

(λ_{min} (Q) + λ ϖ) > 0

. Hence, we can find the positive number

ϑ_{2}

such that:

\forall ∥x∥ > ϑ_{2}

we get

(λ_{min} (Q) + λ ϖ) {∥x∥}^{2} - θ^{2} \geq π_{2}

, and from (52) we obtain

{\dot{V}}_{2} (t) \leq - π_{2} .

□

Remark 3.

The numbers

ϑ_{1}; ϑ_{2}

can be changed by establishing the neural network of the optimal cost function. Additionally, for any switching index, the variable

∥x∥

and

∥\tilde{w}∥

converge to the specified domains. The approximate optimal control law

\hat{u}

is considered in the equation as (27), which converges to the neighbourhood of the optimal control as

u^{*}

. Different from the controller in [33], the switched system with unknown switching signal is mentioned in this work by proposed adaptive optimal control.

Proof.

It should be noted that according to (34), we imply that:

∥\hat{u} - u^{*}∥ = \frac{1}{2} ∥R^{- 1} {(g_{i} (x))}^{T} ({(\nabla σ (x))}^{T} \tilde{w} + \nabla ε (x))∥ \leq \frac{1}{2} λ_{max} (R^{- 1}) . G_{max} . (\nabla σ_{max} . υ_{1} + \nabla ε_{max}) = ϑ_{3}

(55)

□

4. Simulation Results

In this section, we verify the effectiveness and performance of the proposed controller. The ARL algorithm is developed for solving the optimal and tracking problem by RBF network-based ADP, where a single critic Neural Network is used to approximate the Bellman function. The ARL control law and the weight of the Critic part are established as in (27) and (30) with the appropriate learning rate

α

and the coefficient

λ

. Moreover, to carry out the ARL algorithm (27) and (30), we need to employ the term of model

g_{i} (x)

, the term of cost function

Q, R

and the appropriate function

σ (x)

in Neural Network. To verify the proposed method of our algorithm, two different situations are implemented to do simulations with all the parameters and functions being listed in each case as follows:

4.1. The Second-Order Switched Nonlinear Systems

In this simulation experiment, we consider the case of switched systems including

N = 2

subsystems to be described as Equations (56) and (57):

\{\begin{matrix} {\dot{x}}_{1} = - x_{1}^{3} - 2 x_{2} + (u + Δ_{1} (x, t)) \\ {\dot{x}}_{2} = x_{1} + 0.5 cos (x_{1}^{2}) sin (x_{2}^{3}) - (u + Δ_{1} (x, t)) \end{matrix}

(56)

\{\begin{matrix} {\dot{x}}_{1} = - x_{1}^{5} sin (x_{2}) + (u + Δ_{2} (x, t)) \\ {\dot{x}}_{2} = \frac{1}{2} x_{1} - cos (x_{1}) cos (x_{2}^{3}) - (u + Δ_{2} (x, t)) \end{matrix}

(57)

The initial value of state vector is selected as

x (0) = {[\begin{matrix} 5 & - 5 \end{matrix}]}^{T}

and choosing the parameters and matrices in cost function as follows:

R = [\begin{matrix} 2 & 0 \\ 0 & 2 \end{matrix}]; Q = [\begin{matrix} 1 & 0 \\ 0 & 3 \end{matrix}]; α = 0.1; λ = 5

. Additionally, the function

σ (x)

in RBF Neural Network can be chosen as

σ (x) = {[x_{1}^{2}, x_{1} x_{2}, x_{2}^{2}]}^{T}

.

Under the arbitrary switching law (Figure 1), the results are shown in Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6 by using proposed ARL law, in which the responses of state variables are described in Figure 2 and Figure 3 under the control input (Figure 5) and the weights are trained as in Figure 6. It is obvious that the closed system is stable due to the convergence of state variables to zero (Figure 2 and Figure 3). On the other hand, the convergence is also shown in the weights of Critic part (Figure 6).

Remark 4.

The second-order switched nonlinear system was also mentioned in [14] by using nonlinear control law after the equivalent transform operation (Figure 1, Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6). However, it is noteworthy that the response of control input of proposed method has a better transient and steady performance compared with the nonlinear controller in [14] because of the effectiveness of handling the constraint in optimal control (Figure 4 and Figure 5).

4.2. The Third-Order Switched Nonlinear Systems

Here, we continue to investigate the case of switched systems including

N = 2

subsystems as follows:

\{\begin{matrix} \dot{x} = [\begin{matrix} - x_{1} + x_{2} \\ - 0.2 x_{2} - sin x_{1} cos x_{3} \\ x_{1} x_{2} - sin x_{3} \end{matrix}] + [\begin{matrix} 1 \\ - 1 \\ - 2 \end{matrix}] u + Δ_{1} (x, t) \\ \dot{x} = [\begin{matrix} - x_{1} + x_{2}^{2} x_{3} \\ - x_{2} - sin (x_{1} cos x_{3}) \\ sin x_{2} - x_{3} \end{matrix}] + [\begin{matrix} 1 \\ - 1 \\ - 2 \end{matrix}] u + Δ_{2} (x, t) \end{matrix}

(58)

where

x_{0} = {[5; - 3; 2]}^{T}; Δ_{i} (x, t) = \frac{1}{10} [\begin{matrix} d_{1} x_{1} \\ d_{2} x_{2} \\ d_{3} x_{3} \end{matrix}]; ∥Δ_{i}∥ \leq \frac{1}{10} ∥x∥

with

d_{1}, d_{2}, d_{3}

are the random functions. The parameters of proposed control design can be chosen as

R = [\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}]; α = 0.01; Q = [\begin{matrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{matrix}]; λ = 5

. Moreover, the function

σ (x)

in RBF Neural Network can be chosen as

σ (x) = {[x_{1}^{2}, x_{2}^{2}, x_{3}^{2}, x_{1} x_{2}, x_{2} x_{3}, x_{1} x_{3}]}^{T}

. In this case, under the arbitrary switching law in Figure 7, the results are obtained in Figure 8, Figure 9, Figure 10, Figure 11, Figure 12 and Figure 13 with the responses of state variables are shown in Figure 8, Figure 9 and Figure 10 based on the control input (Figure 11 and Figure 12), and the training weights are shown in Figure 13. It can be seen that the closed system is stable (Figure 8, Figure 9 and Figure 10) and the convergence is also shown in the weights of Critic part (Figure 13).

Remark 5.

It is worth noting that the third-order switched nonlinear system was also considered in [15] based on the adaptive backstepping output feedback nonlinear control scheme combined with the transform operation obtaining the response of state variables, control input in Figure 8, Figure 9, Figure 10 and Figure 11 and adaptation law (Figure 14). However, thanks to the property of optimal control, the control input response of the proposed controller has a better transient compared with the adaptive backstepping nonlinear controller in [15] (Figure 11 and Figure 12).

5. Conclusions

This paper investigated the optimal control design for perturbed switched nonlinear systems based on the adaptive dynamic programming technique. The optimal control is first designed under the consideration of nominal systems. Then, the ADP technique is developed by using the Neural Networks. Due to the description of nonlinear systems, as well as unknown switch index, the neural network is considered to approximate the critic part of the iterative algorithm. Moreover, the UUB stability problem of the closed-loop system and the convergence of weight training are guaranteed under this solution. Finally, two simulation examples are given to verify the effectiveness of the presented ARL algorithm.

Author Contributions

P.N.D. was originally responsible for conceptualization, methodology, simulation and prepared the original draft of the article. H.Q.N., M.-D.N. verified the simulation data, data curation, investigation, and draft writing. S.-J.A. contribute a formal analysis of review, editing, and supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Korea Electric Power Corporation (Grant number: R18XA04).

Acknowledgments

The authors gratefully acknowledge Thai Nguyen University of Technology for supporting this work.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ARL	Adaptive Reinforcement Learning
ADP	Adaptive Dynamic Programming
UUB	Uniformly Ultimately Bounded
NNs	Neural Networks
HJB	Hamilton–Jacobi–Bellman

References

Loschi, H.; Smolenski, R.; Lezynski, P.; Nascimento, D.; Demidova, G. Aggregated Conducted Electromagnetic Interference Generated by DC/DC Converters with Deterministic and Random Modulation. Energies 2020, 13, 3698. [Google Scholar] [CrossRef]
Bakeer, A.; Chub, A.; Vinnikov, D. Step-Up Series Resonant DC–DC Converter with Bidirectional-Switch- Based Boost Rectifier for Wide Input Voltage Range Photovoltaic Applications. Energies 2020, 13, 3747. [Google Scholar] [CrossRef]
Premkumar, M.; Subramaniam, U.; Haes Alhelou, H.; Siano, P. Design and Development of Non-Isolated Modified SEPIC DC-DC Converter Topology for High-Step-Up Applications: Investigation and Hardware Implementation. Energies 2020, 13, 3960. [Google Scholar] [CrossRef]
Korkh, O.; Blinov, A.; Vinnikov, D.; Chub, A. Review of Isolated Matrix Inverters: Topologies, Modulation Methods and Applications. Energies 2020, 13, 2394. [Google Scholar] [CrossRef]
Chen, B.-Y.; Shangguan, X.-C.; Jin, L.; Li, D.-Y. An Improved Stability Criterion for Load Frequency Control of Power Systems with Time-Varying Delays. Energies 2020, 13, 2101. [Google Scholar] [CrossRef] [Green Version]
Zhang, L.; Xiang, W. Mode-identifying time estimation and switching-delay tolerant control for switched systems: An elementary time unit approach. Automatica 2016, 64, 174–181. [Google Scholar] [CrossRef]
Yuan, S.; Zhang, L.; De Schutter, B.; Baldi, S. A novel Lyapunov function for a non-weighted L2 gain of asynchronously switched linear systems. Automatica 2018, 87, 310–317. [Google Scholar] [CrossRef] [Green Version]
Xiang, W.; Lam, J.; Li, P. On stability and H_∞ control of switched systems with random switching signals. Automatica 2018, 95, 419–425. [Google Scholar] [CrossRef]
Lin, J.; Zhao, X.; Xiao, M.; Shen, J. Stabilization of discrete-time switched singular systems with state, output and switching delays. J. Frankl. Inst. 2019, 356, 2060–2089. [Google Scholar] [CrossRef]
Briat, C. Convex conditions for robust stabilization of uncertain switched systems with guaranteed minimum and mode-dependent dwell-time. Syst. Control. Lett. 2015, 78, 63–72. [Google Scholar] [CrossRef]
Lian, J.; Li, C. Event-triggered control for a class of switched uncertain nonlinear systems. Syst. Control. Lett. 2020, 135, 104592. [Google Scholar] [CrossRef]
Vu, T.A.; Nam, D.P.; Huong, P.T.V. Analysis and control design of transformerless high gain, high efficient buck-boost DC-DC converters. In Proceedings of the 2016 IEEE International Conference on Sustainable Energy Technologies (ICSET), Hanoi, Vietnam, 14–16 November 2016; pp. 72–77. [Google Scholar]
Nam, D.P.; Thang, B.M.; Thanh, N.T. Adaptive Tracking Control for a Boost DC–DC Converter: A Switched Systems Approach. In Proceedings of the 2018 4th International Conference on Green Technology and Sustainable Development (GTSD), Ho Chi Minh City, Vietnam, 23–24 November 2018; pp. 702–705. [Google Scholar]
Thanh, N.T.; Sam, P.N.; Nam, D.P. An Adaptive Backstepping Control for Switched Systems in presence of Control Input Constraint. In Proceedings of the 2019 International Conference on System Science and Engineering (ICSSE), Quang Binh, Vietnam, 19–21 July 2019; pp. 196–200. [Google Scholar]
Chiang, M.-L.; Fu, L.-C. Adaptive stabilization of a class of uncertain switched nonlinear systems with backstepping control. Automatica 2014, 50, 2128–2135. [Google Scholar] [CrossRef]
Vamvoudakis, K.G.; Vrabie, D.; Lewis, F.L. Online adaptive algorithm for optimal control with integral reinforcement learning. Int. J. Robust Nonlinear Control. 2014, 24, 2686–2710. [Google Scholar] [CrossRef]
Bai, W.; Zhou, Q.; Li, T.; Li, H. Adaptive reinforcement learning neural network control for uncertain nonlinear system with input saturation. IEEE Trans. Cybern. 2019, 50, 3433–3443. [Google Scholar] [CrossRef] [PubMed]
Zhang, T.; Xu, H. Adaptive optimal dynamic surface control of strict-feedback nonlinear systems with output constraints. J. Frankl. Inst. 2020, 30, 2059–2078. [Google Scholar] [CrossRef]
Lv, Y.; Na, J.; Yang, Q.; Wu, X.; Guo, Y. Online adaptive optimal control for continuous-time nonlinear systems with completely unknown dynamics. Int. J. Control 2016, 89, 99–112. [Google Scholar] [CrossRef] [Green Version]
Chen, C.; Modares, H.; Xie, K.; Lewis, F.L.; Wan, Y.; Xie, S. Reinforcement learning-based adaptive optimal exponential tracking control of linear systems with unknown dynamics. IEEE Trans. Autom. Control 2019, 64, 4423–4438. [Google Scholar] [CrossRef]
Yang, X.; Wei, Q. Adaptive Critic Learning for Constrained Optimal Event-Triggered Control with Discounted Cost. IEEE Trans. Neural Netw. Learn. Syst. 2020. [Google Scholar] [CrossRef]
Wen, G.; Ge, S.S.; Chen, C.L.P.; Tu, F.; Wang, S. Adaptive tracking control of surface vessel using optimized backstepping technique. IEEE Trans. Cybern. 2018, 49, 3420–3431. [Google Scholar] [CrossRef]
Wang, D.; Mu, C. Adaptive-critic-based robust trajectory tracking of uncertain dynamics and its application to a spring–mass–damper system. IEEE Trans. Ind. Electron. 2017, 65, 654–663. [Google Scholar] [CrossRef]
Li, S.; Ding, L.; Gao, H.; Liu, Y.-J.; Huang, L.; Deng, Z. ADP-based online tracking control of partially uncertain time-delayed nonlinear system and application to wheeled mobile robots. IEEE Trans. Cybern. 2019, 50, 3182–3194. [Google Scholar] [CrossRef]
Wen, G.; Chen, C.L.P.; Ge, S.S.; Yang, H.; Liu, X. Optimized adaptive nonlinear tracking control using actor–critic reinforcement learning strategy. IEEE Trans. Ind. Inform. 2019, 15, 4969–4977. [Google Scholar] [CrossRef]
Vamvoudakis, K.G.; Ferraz, H. Model-free event-triggered control algorithm for continuous-time linear systems with optimal performance. Automatica 2018, 87, 412–420. [Google Scholar] [CrossRef]
Gao, W.; Jiang, Y.; Jiang, Z.-P.; Chai, T. Output-feedback adaptive optimal control of interconnected systems based on robust adaptive dynamic programming. Automatica 2016, 72, 37–45. [Google Scholar] [CrossRef] [Green Version]
Mu, C.; Zhang, Y.; Gao, Z.; Sun, C. ADP-based robust tracking control for a class of nonlinear systems with unmatched uncertainties. IEEE Trans. Syst. Man Cybern. Syst. 2019. [Google Scholar] [CrossRef]
Huang, Y. Optimal guaranteed cost control of uncertain non-linear systems using adaptive dynamic programming with concurrent learning. IET Control Theory Appl. 2018, 12, 1025–1035. [Google Scholar] [CrossRef]
Tang, D.; Chen, L.; Tian, Z.F.; Hu, E. Modified value-function-approximation for synchronous policy iteration with single-critic configuration for nonlinear optimal control. Int. J. Control 2019, 1–13. [Google Scholar] [CrossRef]
Fan, B.; Yang, Q.; Tang, X.; Sun, Y. Robust ADP design for continuous-time nonlinear systems with output constraints. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 2127–2138. [Google Scholar] [CrossRef]
Chun, T.Y.; Lee, J.Y.; Park, J.B.; Choi, Y.H. Adaptive dynamic programming for discrete-time linear quadratic regulation based on multirate generalised policy iteration. Int. J. Control 2018, 91, 1223–1240. [Google Scholar] [CrossRef]
Mu, C.; Wang, D. Neural-network-based adaptive guaranteed cost control of nonlinear dynamical systems with matched uncertainties. Neurocomputing 2017, 245, 46–54. [Google Scholar] [CrossRef]

Figure 1. The Switching Law of the second-order Switched System.

Figure 2. The response of state variable 1.

Figure 3. The response of state variable 2.

Figure 4. The response of Control Input.

Figure 5. The response of Control Input using Adaptive Reinforcement Learning.

Figure 6. The Training Parameters of adaptive dynamic programming (ADP).

Figure 7. The Switching Law of the third-order Switched System.

Figure 8. The response of state variable 1.

Figure 9. The response of state variable 2.

Figure 10. The response of state variable 2.

Figure 11. The response of Control Input.

Figure 12. The response of Control Input using Adaptive Reinforcement Learning.

Figure 13. The Training Parameters of ADP.

Figure 14. The Adaptation Law of Nonlinear Control in [15].

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Dao, P.N.; Nguyen, H.Q.; Ngo, M.-D.; Ahn, S.-J. On Stability of Perturbed Nonlinear Switched Systems with Adaptive Reinforcement Learning. Energies 2020, 13, 5069. https://doi.org/10.3390/en13195069

AMA Style

Dao PN, Nguyen HQ, Ngo M-D, Ahn S-J. On Stability of Perturbed Nonlinear Switched Systems with Adaptive Reinforcement Learning. Energies. 2020; 13(19):5069. https://doi.org/10.3390/en13195069

Chicago/Turabian Style

Dao, Phuong Nam, Hong Quang Nguyen, Minh-Duc Ngo, and Seon-Ju Ahn. 2020. "On Stability of Perturbed Nonlinear Switched Systems with Adaptive Reinforcement Learning" Energies 13, no. 19: 5069. https://doi.org/10.3390/en13195069

APA Style

Dao, P. N., Nguyen, H. Q., Ngo, M.-D., & Ahn, S.-J. (2020). On Stability of Perturbed Nonlinear Switched Systems with Adaptive Reinforcement Learning. Energies, 13(19), 5069. https://doi.org/10.3390/en13195069

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

On Stability of Perturbed Nonlinear Switched Systems with Adaptive Reinforcement Learning

Abstract

1. Introduction

2. Problem Statement and Preliminaries

3. Adaptive Reinforcement Learning-Based Control Design

4. Simulation Results

4.1. The Second-Order Switched Nonlinear Systems

4.2. The Third-Order Switched Nonlinear Systems

5. Conclusions

Author Contributions

Funding

Acknowledgments

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI