Reinforcement-Learning-Based Fixed-Time Prescribed Performance Consensus Control for Stochastic Nonlinear MASs with Sensor Faults

Wang, Zhenyou; Cai, Xiaoquan; Luo, Ao; Ma, Hui; Xu, Shengbing

doi:10.3390/s24247906

Open AccessArticle

Reinforcement-Learning-Based Fixed-Time Prescribed Performance Consensus Control for Stochastic Nonlinear MASs with Sensor Faults

by

Zhenyou Wang

¹

,

Xiaoquan Cai

¹

,

Ao Luo

^2,*

,

Hui Ma

¹

and

Shengbing Xu

¹

School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou 510006, China

²

School of Automation, Guangdong-Hong Kong Joint Laboratory for Intelligent Decision and Cooperative Control, Guangdong University of Technology, Guangzhou 510006, China

^*

Author to whom correspondence should be addressed.

Sensors 2024, 24(24), 7906; https://doi.org/10.3390/s24247906

Submission received: 22 October 2024 / Revised: 1 December 2024 / Accepted: 5 December 2024 / Published: 11 December 2024

(This article belongs to the Section Fault Diagnosis & Sensors)

Download

Browse Figures

Versions Notes

Abstract

This paper proposes the fixed-time prescribed performance optimal consensus control method for stochastic nonlinear multi-agent systems with sensor faults. The consensus error converges to the prescribed performance bounds in fixed-time by an improved performance function and coordinate transformation. Due to the unknown faults in sensors, the system states cannot be gained correctly; therefore, an adaptive compensation strategy is constructed based on the approximation capabilities of neural networks to solve the negative impact of sensor failures. The reinforcement-learning-based backstepping method is proposed to realize the optimal control of the system. Utilizing Lyapunov stability theory, it is shown that the designed controller enables the consensus error to converge to the prescribed performance bounds in fixed time and that all signals in the closed-loop system are bounded in probability. Finally, the simulation results prove the effectiveness of the proposed method.

Keywords:

fixed-time prescribed performance; optimal consensus control; reinforcement learning; sensor faults; stochastic nonlinear multi-agent systems

1. Introduction

At the present time, multi-agent systems (MASs) have attracted widespread attention for their immense contributions to distributed coordination [1,2,3,4,5]. For example, a novel differential privacy bipartite consensus algorithm was proposed by Ma et al. [6], enabling consensus control in cooperative–competitive MASs. Wang et al. [7] designed the fixed-time formation controller for uncertain nonlinear MASs with time-varying actuator faults and random perturbations. To handle denial-of-service attacks and actuator faults in heterogeneous linear MASs, Zhang et al. [8] developed a resilient practical leader–follower consensus controller. Compared with the linear systems or ordinary nonlinear systems discussed earlier, stochastic nonlinear systems can be more widely applied in many practical engineering designs. Therefore, investigating stochastic nonlinear systems has greater practical significance and application value [9,10,11]. For instance, Zhu et al. [12] utilized the Nussbaum technique to resolve the issue of asymptotic consensus control in SNMASs. A compensator-based distributed controller for consensus control in SNMASs was developed by Li et al. [13]. However, all of the above studies ignored the effect of sensor faults.

In practical application, sensors often have unknown faults, which may lead to the failure of traditional control methods [14,15,16,17]. Therefore, compensating for sensor faults in MASs is crucial, as it enhances their security and reliability. For instance, Wu et al. [18] tackled sensor faults by designing resilient adaptive updates and state observers. To address sensor faults, Yu et al. [19] proposed an algorithm based on weighted average consensus and the unscented information filter.

As one of the important indicators to measure the performance of controllers, the problem of convergence time has been widely studied in recent years [20,21,22,23]. Although finite-time controls allow systems to converge quickly, their settling time depends on the initial value [24]. However, in many real-world scenarios, the initial values cannot be obtained in advance, so there is no control over the convergence time. Fixed-time control (FTC) methods have been proposed to solve the problem, where the convergence time can be predetermined and remains unaffected by the initial value [25,26,27]. To achieve the desired performance of the consensus error, the fixed-time prescribed performance control (FTPPC) method was designed [28,29,30,31]. To illustrate, Long et al. [32] adopted the performance strategy specified in a fixed-time frame to construct the control method, and realized the consensus control for nonlinear MASs under full state constraints.

To alleviate the challenge of directly solving the Hamilton–Jacobi–Bellman (HJB) equation when designing a consensus controller for MASs, a backstepping control design method utilizing reinforcement learning (RL) was ntroduced [33,34,35]. Wang et al. [36] used an actor–critic NN-based N-step backstepping structure to achieve consensus control for MASs. In [37,38], the idea of RL-based optimal backstepping control design was applied to tackle tracking control issues in ship systems and nonlinear systems with unknown dynamics, respectively. The above research has greatly pushed forward the development of nonlinear MAS consensus control.

Based on the above results, an improved FTPPC framework is proposed. To address the effects of sensor faults, an adaptive compensator based on neural networks (NNs) is constructed. Furthermore, the actor–critic structure within the reinforcement learning framework is utilized to develop an optimal backstepping method, ensuring that the consensus error remains within the FTPPC requirements, thereby achieving robust and reliable control.

(1): This paper presents an improved fixed-time prescribed performance framework. By constructing coordinate transformations, the consensus error can converge to a prescribed performance boundary in fixed time. Moreover, this framework overcomes the defect in finite-time control [20], where convergence time depends on the initial state.
(2): Considering the potential impact of sensor faults in real-world systems, which leads to a poor control performance, this paper utilizes neural networks to construct a sensor fault compensation mechanism. This enables the consensus error to efficiently converge to the prescribed performance boundary, even in the presence of unknown sensor faults.
(3): Compared with traditional backstepping methods [39], which do not account for system resource consumption, this paper uses RL to design an optimal control strategy, reducing the resource consumption associated with backstepping. Furthermore, compared with existing RL strategies [40], our approach uses a simpler adaptive laws form, ensuring that the RL network can be trained sufficiently and efficiently.

2. Preliminaries and Description

2.1. Graph Theory

Consider the digraph

G = (V, E, A)

, representing the communication topology of MASs, where

V = {v_{1}, v_{2}, \dots, v_{N}}

is the set of nodes consisting of N agents.

E \subseteq V \times V

,

A = [a_{i, k}] \subseteq R^{N \times N}

is the edges set and weighted adjacency matrix of the graph

G

, respectively. When

v_{i}

can obtain information from

v_{k}

, denoted by

(v_{i}, v_{k}) \in E

, the weight

a_{i, k} \neq 0

[41]. Otherwise, if there is no information exchange between

v_{i}

and

v_{k}

, then

a_{i, k} = 0

. In a directed graph

G

, the set of neighbors of

v_{i}

, denoted by

N_{i}

, is defined as all nodes connected to

v_{i}

. Specifically,

N_{i} = {j | (v_{i}, v_{j}) \in E}

. The adjacency matrix A describes the connectivity between nodes in the graph, where

a_{i, j}

represents the weight of the edge from

v_{j}

to

v_{i}

. The degree matrix

D

is the sum of the weights of the edges adjacent to

v_{i}

with

d_{i} = \sum a_{i, j}

. The Laplacian matrix

L = D - A

.

2.2. Neural Networks (NNs)

NNs have excellent function approximation and adaptive learning capabilities [42]. Based on the approximation properties of NNs, any continuous function

f (x) : R^{n} \to R^{m}

can be approximated by NNs as follows:

f (x) = Y *^{T} ϕ (x) + ε (x),

(1)

where

ϕ (x) = {[ϕ_{1} (x), \dots, ϕ_{p} (x)]}^{T} \in R^{p}

with

ϕ_{i} (x) = \exp (- {(x - μ_{i})}^{T} (x - μ_{i}) / σ_{i}^{2}) \in R

, where

σ_{i} \in R

and

μ_{i} = {[μ_{i 1}, \dots, μ_{i n}]}^{T} \in R^{n}

are the width and center of the Gaussian function, respectively. The approximation error

ε (x) \in R^{m}

satisfies

∥ ε (x) ∥ \leq ε^{*}

with a positive constant

ε^{*}

.

Y^{*} \in R^{p}

is the ideal weight vector, which satisfies the following:

Y^{*} : = arg min_{Y \in R^{p}} \{sup_{x \in Ω_{x}} ∥f (x) - Y^{T} ϕ (x)∥\} .

(2)

2.3. System Description

Consider the following SNMASs with sensor faults:

\{\begin{matrix} d x_{i, j} = & [f_{i, j} ({\bar{x}}_{i, j}) + x_{i, j + 1}] d t + g_{i, j} ({\bar{x}}_{i, j}) d ω \\ d x_{i, n} = & [f_{i, n} ({\bar{x}}_{i, n}) + u_{i}] d t + g_{i, n} ({\bar{x}}_{i, n}) d ω \\ y_{i}^{f} = & ϱ_{i} x_{i, 1} + ρ_{i}, 1 \leq i \leq N, 1 \leq j \leq n - 1, \end{matrix}

(3)

where

f_{i, j} (\cdot) : R^{n} \to R^{n}

and

g_{i, j} (\cdot) : R^{n} \to R^{n \times r}

,

ω

is r-dimensional independent standard wiener process,

x_{i, j}

is state,

{\bar{x}}_{i, j} = {[x_{i, 1}, \dots, x_{i, j}]}^{T} \in R^{j}

is the system state vector,

u_{i} \in R

is the control input, and

y_{i}^{f} \in R

is the untrue state measured by the sensor.

ϱ_{i}

and

ρ_{i}

are the sensor fault parameters, which will be defined later.

Definition 1

(See [43]). A sensor fault occurs if the output

y_{i}^{f} (t)

of the sensor measuring the system output signal

x_{i, 1} (t) \in R

satisfies

y_{i}^{f} (t) = ϱ_{i} x_{i, 1} (t) + ρ_{i}

, where the unknown parameters

ϱ_{i}

and

ρ_{i}

satisfy

{\bar{ϱ}}_{i, min} \leq ϱ_{i} \leq 1

and

- {\bar{ρ}}_{i} \leq ρ_{i} \leq {\bar{ρ}}_{i}

, respectively.

{\bar{ϱ}}_{i, min}

and

{\bar{ρ}}_{i}

are constants.

Definition 2

(See [21]). Consider the following stochastic nonlinear system:

d x = f (x) d t + g (x) d ω,

(4)

For the system (4) and any twice-differentiable function

V (x)

, based on Itô’s Lemma and the stochastic differentials properties

{(d ω)}^{2} = d t

, define the differential operator

L

and derivative

d

as follows:

L V = \frac{\partial V}{\partial x} f (x) + \frac{1}{2} T r \{g^{T} (x) \frac{\partial^{2} V}{\partial x^{2}} g (x)\},

(5)

d V = L V d t + \frac{\partial V}{\partial x} g d ω,

(6)

where

T r {A}

is the trace of matrix A.

Assumption 1.

At least one follower can directly receive information from the leader.

Lemma 1

(See [44]). For any

m, n, λ > 0, κ, ν \in R

and

(m - 1) (n - 1) = 1

, we have the following:

κ^{T} ν \leq \frac{λ^{m}}{m} {∥ κ ∥}^{m} + \frac{1}{n λ^{n}} {∥ ν ∥}^{n} .

(7)

For clarity, a list of key variables and a list of abbreviations used in this paper are provided in Appendix A.

3. Adaptive Optimal Consensus Controller Design and Stability Analysis

The backstepping technique is employed for controller design.

3.1. Adaptive Optimal Consensus Controller Design

For agent i in SNMASs (3), let

f_{s i} = (ϱ_{i} - 1) x_{i, 1} + ρ_{i}

, we have

y_{i}^{f} (t) = x_{i, 1} + f_{s i} .

(8)

The consensus error is defined as

s_{i} (t) = \sum_{k \in N_{i}} a_{i, k} (y_{i}^{f} - y_{k}^{f}) + b_{i} (y_{i}^{f} - y_{r} (t)),

(9)

where

y_{r} (t)

is a reference signal.

b_{i} = 1

means that the leader information can be received by agent i; otherwise,

b_{i} = 0

.

To achieve the FTPPC requirement for

s_{i} (t)

, this paper chooses the fixed-time performance function (FTPF) as follows:

h (t) = \{\begin{matrix} (h_{0} - & h_{\tilde{T}}) {(1 - \frac{λ t}{ι})}^{ι} + h_{\tilde{T}} & , & 0 \leq t < \tilde{T} \\ h_{\tilde{T}} & , & t \geq \tilde{T}, \end{matrix}

(10)

where

ι \geq n

and

λ > 0

are the design parameters,

h_{0} = h (0) > h_{\tilde{T}} = l i m_{t \to \tilde{T}} h (t) > 0

, and the settling time

\tilde{T} = ι / λ < \infty

.

Achieving the asymmetric FTPPC requires ensuring that the following inequalities hold:

- δ_{m} h (t) < s_{i} (t) < δ_{M} h (t), t > 0 .

(11)

where

δ_{m}

and

δ_{M}

are positive asymmetric design parameters.

Then, we make

\begin{matrix} η_{i} (μ_{i}) & = (δ_{M} e^{μ_{i}} - δ_{m} e^{- μ_{i}}) / (e^{μ_{i}} + e^{- μ_{i}}), \\ s_{i} (t) & = h (t) η_{i} (μ_{i}) . \end{matrix}

(12)

Then, one has

μ_{i} (t) = η_{i}^{- 1} (\frac{s_{i} (t)}{h (t)}) = \frac{1}{2} ln \frac{η_{i} + δ_{m}}{δ_{M} - η_{i}} .

For convenience of description, we let

\begin{matrix} φ_{i} = & (1 / 2 h) ([1 / (η_{i} + δ_{m})] - [1 / (η_{i} - δ_{M})]), \\ ψ_{i} = & - (1 / 2 h) ([1 / (η_{i} + δ_{m})] + [1 / (η_{i} - δ_{M})]) . \end{matrix}

Then, avoiding the zero equilibrium point issue, we propose the following state coordinate transformation for the n-step backstepping method.

\{\begin{matrix} z_{i, 1} & = μ_{i} - \frac{1}{2} ln \frac{δ_{m}}{δ_{M}} \\ z_{i, j} & = x_{i, j} - {\hat{α}}_{i, j - 1}, j = 2, \dots, n, \end{matrix}

(13)

where

{\hat{α}}_{i, j - 1}

is the approximate ideal virtual control.

Step 1: From (13), the derivative of

z_{i, 1}

is obtained as

\begin{matrix} d z_{i, 1} = & φ_{i} ((d_{i} + b_{i}) x_{i, 2} + F_{s i} + F_{p i} \\ - β_{i, 1} + Γ_{i, 1}) d t + ℧_{i, 1} d ω, \end{matrix}

(14)

where

F_{s i} = (d_{i} + b_{i}) f_{i, 1} (x_{i, 1}) - \sum_{k \in N_{i}} a_{i, k} f_{k, 1} (x_{k, 1}), Γ_{i, 1} = {(d_{i} + b_{i})}^{2} ϱ_{i}^{2} ψ_{i} g_{i, 1}^{T} g_{i, 1} + d_{k}^{2} ϱ_{k}^{2} ψ_{i} g_{k, 1}^{T} g_{k, 1}

- \sum_{k \in N_{i}} a_{i, k} x_{k, 2}, F_{p i} = (d_{i} + b_{i}) f_{p i} - \sum_{k \in N_{i}} a_{i, k} f_{p k}, β_{i, 1} = \frac{\dot{h} s_{i} (t)}{h} + b_{i} \dot{y_{r}}

and

℧_{i, 1} = φ_{i} (d_{i} + b_{i}) ϱ_{i} g_{i, 1} + φ_{i} b_{k} ϱ_{k} g_{k, 1}

,

f_{p i} = {\dot{f}}_{s i}, f_{p k} = {\dot{f}}_{s k}

.

The unknown smooth functions

F_{s i}

and

F_{p i}

are approximated by NNs as

\begin{matrix} F_{p i} & = Y_{p i}^{* T} ϕ_{p i} ({\bar{x}}_{i, 2}, {\bar{x}}_{k, 2}) + ε_{p i} ({\bar{x}}_{i, 2}, {\bar{x}}_{k, 2}), \\ F_{s i} & = Y_{s i}^{* T} ϕ_{s i} ({\bar{x}}_{i, 2}) + ε_{s i} ({\bar{x}}_{i, 2}), \end{matrix}

(15)

where

{\bar{x}}_{k, 2} = {[x_{k, 1}, x_{k, 2}]}^{T}

,

ε_{s i} ({\bar{x}}_{i, 2})

and

ε_{p i} ({\bar{x}}_{i, 2}, {\bar{x}}_{k, 2})

are defined as the approximation errors.

The value function is designed as

J_{i, 1} (z_{i, 1} (t)) = lim_{t_{f} \to \infty} \frac{1}{t_{f}} \int_{t}^{t_{f}} W_{i, 1} d τ,

(16)

where

W_{i, 1} = z_{i, 1}^{4} + α_{i, 1}^{2} + \sum_{k \in N_{i}} α_{k, 1}^{2}

,

t_{f}

is the control end time.

Then, define an optimal value function as

\begin{matrix} J_{i, 1}^{*} (z_{i, 1}) & = min_{α_{i, 1} \in ψ_{i} (Ω_{i, 1})} [lim_{t_{f} \to \infty} \frac{1}{t_{f}} \int_{t}^{t_{f}} W_{i, 1} d τ] = lim_{t_{f} \to \infty} \frac{1}{t_{f}} \int_{t}^{t_{f}} W_{i, 1}^{*} d τ, \end{matrix}

(17)

where

Ω_{i, 1}

is the compact set.

W_{i, 1}^{*} = z_{i, 1}^{4} + α_{i, 1}^{* 2} + \sum_{k \in N_{i}} α_{k, 1}^{* 2}

.

ψ_{i} (Ω_{i, 1})

is the available control set.

α_{i, 1}^{*}

is the ideal virtual control.

Define

V_{i, 1} (z_{i, 1} (t)) = \int_{t}^{t_{f}} W_{i, 1} d τ

, thus,

V_{i, 1}^{*} (z_{i, 1} (t)) = \int_{t}^{t_{f}} W_{i, 1}^{*} d τ

, one has

\begin{matrix} J_{i, 1}^{*} (z_{i, 1}) = & lim_{t_{f} \to \infty} \frac{1}{t_{f}} \int_{t}^{t + Δ t} W_{i, 1}^{*} d τ + lim_{t_{f} \to \infty} \frac{1}{t_{f}} V_{i, 1}^{*} (z_{i, 1} (t + Δ t)) . \end{matrix}

(18)

Then, one has

\begin{matrix} lim_{t_{f} \to \infty} \frac{1}{t_{f}} [ & lim_{Δ t \to 0} \frac{1}{Δ t} \int_{t}^{t + Δ t} W_{i, 1}^{*} d τ + lim_{Δ t \to 0} \frac{V_{i, 1}^{*} (z_{i, 1} (t + Δ t)) - V_{i, 1}^{*} (z_{i, 1} (t))}{Δ t}] = 0 . \end{matrix}

(19)

It means that

\begin{matrix} lim_{t_{f} \to \infty} \frac{1}{t_{f}} [W_{i, 1}^{*} + \frac{\partial V_{i, 1}^{*} (z_{i, 1})}{\partial t}] = 0 . \end{matrix}

(20)

Based on (5) and (6) and

x_{i, 2}

is regarded as

α_{i, 1}^{*}

, one has

\begin{matrix} \frac{\partial V_{i, 1}^{*} (z_{i, 1})}{\partial t} = & \frac{\partial V_{i, 1}^{*} (z_{i, 1})}{\partial z_{i, 1}} φ_{i} ((d_{i} + b_{i}) α_{i, 1}^{*} + F_{s i} + F_{p i} - β_{i, 1} + Γ_{i, 1}) + \frac{1}{2} \frac{\partial^{2} V_{i, 1}^{*} (z_{i, 1})}{\partial z_{i, 1}^{2}} {∥ ℧_{i, 1} ∥}^{2} . \end{matrix}

(21)

Now, we have the HJB equation as follows:

\begin{matrix} H_{i, 1} (z_{i, 1}, α_{i, 1}^{*}, α_{k, 1}^{*}, \frac{\partial V_{i, 1}^{*} (z_{i, 1})}{\partial z_{i, 1}}) = W_{i, 1}^{*} + \frac{\partial V_{i, 1}^{*} (z_{i, 1})}{\partial t} = 0 . \end{matrix}

(22)

By solving the equation

\partial H_{i, 1} / \partial α_{i, 1}^{*} = 0

, one has

\begin{matrix} α_{i, 1}^{*} = - \frac{φ_{i} (d_{i} + b_{i})}{2} \frac{\partial V_{i, 1}^{*} (z_{i, 1})}{\partial z_{i, 1}} . \end{matrix}

(23)

Then, we let

\begin{matrix} \frac{\partial V_{i, 1}^{*} (z_{i, 1})}{\partial z_{i, 1}} = & \frac{1}{φ_{i} {(d_{i} + b_{i})}^{2}} {2 η_{i, 1} φ_{i} z_{i, 1}^{3} + \frac{2 {\bar{η}}_{i, 1}}{φ_{i}} z_{i, 1} + \frac{ϕ_{s i}^{T} ϕ_{s i}}{π^{2}} z_{i, 1}^{3} φ_{i} Θ_{s i, 1}^{*} \\ - 2 β_{i, 1} + V_{i, 1}^{c} (z_{i, 1}) + \frac{ϕ_{p i}^{T} ϕ_{p i}}{π^{2}} z_{i, 1}^{3} φ_{i} Θ_{p i, 1}^{*} + \frac{ϕ_{f i, 1}^{T} ϕ_{f i, 1}}{π^{2}} z_{i, 1}^{3} φ_{i} Θ_{i, 1}^{*}}, \end{matrix}

(24)

where

η_{i, 1}, {\bar{η}}_{i, 1}

and

π

are design positive constants.

Θ_{i, 1}^{*}, Θ_{s i, 1}^{*}, Θ_{p i, 1}^{*}

are the ideal weights, which will be defined later.

V_{i, 1}^{c} (z_{i, 1}) = - 2 η_{i, 1} φ_{i} z_{i, 1}^{3} - \frac{ϕ_{f i, 1}^{T} ϕ_{f i, 1}}{2 π^{2}} z_{i, 1}^{3} φ_{i} Θ_{i, 1}^{*} - \frac{2 {\bar{η}}_{i, 1}}{φ_{i}} z_{i, 1} + \frac{2 \dot{h} s_{i} (t)}{h} - \frac{ϕ_{s i}^{T} ϕ_{s i}}{2 π^{2}} z_{i, 1}^{3} φ_{i} Θ_{s i, 1}^{*} - \frac{ϕ_{p i}^{T} ϕ_{p i}}{2 π^{2}} z_{i, 1}^{3} φ_{i} Θ_{p i, 1}^{*} + φ_{i} {(d_{i} + b_{i})}^{2} \frac{\partial V_{i, 1}^{*} (z_{i, 1})}{\partial z_{i, 1}}

.

From (23) and (24), rewrite (23) as

\begin{matrix} α_{i, 1}^{*} = & - \frac{1}{d_{i} + b_{i}} {η_{i, 1} φ_{i} z_{i, 1}^{3} + \frac{{\bar{η}}_{i, 1}}{φ_{i}} z_{i, 1} + \frac{ϕ_{f i, 1}^{T} ϕ_{f i, 1}}{2 π^{2}} z_{i, 1}^{3} φ_{i} Θ_{i, 1}^{*} \\ + \frac{1}{2} V_{i, 1}^{c} (z_{i, 1}) + \frac{ϕ_{s i}^{T} ϕ_{s i}}{2 π^{2}} z_{i, 1}^{3} φ_{i} Θ_{s i, 1}^{*} + \frac{ϕ_{p i}^{T} ϕ_{p i}}{2 π^{2}} z_{i, 1}^{3} φ_{i} Θ_{p i, 1}^{*} - β_{i, 1}} . \end{matrix}

(25)

V_{i, 1}^{c} (z_{i, 1})

is approximated by NNs as

\begin{matrix} V_{i, 1}^{c} (z_{i, 1}) = Y_{i, 1}^{*^{T}} ϕ_{i, 1} (z_{i, 1}) + ε_{i, 1} (z_{i, 1}), \end{matrix}

(26)

where

∥ ε_{i, 1} (z_{i, 1}) ∥ \leq ε_{i, 1}^{*}

with constant

ε_{i, 1}^{*} > 0

.

Thus, from (25) and (26), as ideal weights

Θ_{i, 1}^{*}, Θ_{s i, 1}^{*}

,

Θ_{p i, 1}^{*}

and

Y_{i, 1}^{*}

are unknown parameters, thus, using

{\hat{Θ}}_{i, 1}, {\hat{Θ}}_{s i, 1}

,

{\hat{Θ}}_{p i, 1}, {\hat{Y}}_{i, 1}

to estimate

Θ_{i, 1}^{*}, Θ_{s i, 1}^{*}, Θ_{p i, 1}^{*}, Y_{i, 1}^{*}

, respectively.

To optimize the control performance, an RL method with an actor–critic structure is proposed as follows:

\begin{matrix} \frac{\partial {\hat{V}}_{i, 1} (z_{i, 1})}{\partial z_{i, 1}} = & \frac{1}{φ_{i} {(d_{i} + b_{i})}^{2}} {2 η_{i, 1} φ_{i} z_{i, 1}^{3} + \frac{2 {\bar{η}}_{i, 1}}{φ_{i}} z_{i, 1} - 2 β_{i, 1} + \frac{ϕ_{f i, 1}^{T} ϕ_{f i, 1}}{π^{2}} z_{i, 1}^{3} φ_{i} {\hat{Θ}}_{i, 1} \\ + {\hat{Y}}_{i, c 1} ϕ_{i, 1} (z_{i, 1}) + \frac{ϕ_{s i}^{T} ϕ_{s i}}{π^{2}} z_{i, 1}^{3} φ_{i} {\hat{Θ}}_{s i, 1} + \frac{ϕ_{p i}^{T} ϕ_{p i}}{π^{2}} z_{i, 1}^{3} φ_{i} {\hat{Θ}}_{p i, 1}}, \end{matrix}

(27)

\begin{matrix} {\hat{α}}_{i, 1} = - \frac{1}{d_{i} + b_{i}} & {η_{i, 1} φ_{i} z_{i, 1}^{3} + \frac{ϕ_{f i, 1}^{T} ϕ_{f i, 1}}{2 π^{2}} z_{i, 1}^{3} φ_{i} {\hat{Θ}}_{i, 1} + \frac{ϕ_{s i}^{T} ϕ_{s i}}{2 π^{2}} z_{i, 1}^{3} φ_{i} {\hat{Θ}}_{s i, 1} \\ + \frac{{\bar{η}}_{i, 1}}{φ_{i}} z_{i, 1} - β_{i, 1} + \frac{ϕ_{p i}^{T} ϕ_{p i}}{2 π^{2}} z_{i, 1}^{3} φ_{i} {\hat{Θ}}_{p i, 1} + \frac{1}{2} {\hat{Y}}_{i, a 1} ϕ_{i, 1} (z_{i, 1})}, \end{matrix}

(28)

where

{\hat{Y}}_{i, a 1}

and

{\hat{Y}}_{i, c 1}

estimate

Y_{i, 1}^{*}

,

\frac{\partial {\hat{V}}_{i, 1} (z_{i, 1})}{\partial z_{i, 1}}

estimates

\frac{\partial V_{i, 1}^{*} (z_{i, 1})}{\partial z_{i, 1}}

.

Remark 1.

Solving the HJB equation analytically is challenging [45], so we employ a reinforcement learning approach to approximate its solution. Specifically, we construct an actor-critic network architecture: the actor interacts with the environment to optimize the policy based on the feedback, while the critic evaluates the current policy to improve the value function. Through the interaction between the actor and critic networks, an approximate solution to the HJB equation is obtained, thereby achieving optimal control.

Thus, from (22), (27) and (28), the approximation HJB function is

\begin{matrix} H_{i, 1} (z_{i, 1}, {\hat{α}}_{i, 1}, {\hat{α}}_{k, 1}, \frac{\partial {\hat{V}}_{i, 1} (z_{i, 1})}{\partial z_{i, 1}}) = & z_{i, 1}^{4} + {\hat{α}}_{i, 1}^{2} + \sum_{k \in N_{i}} {\hat{α}}_{k, 1}^{2} + \frac{\partial {\hat{V}}_{i, 1} (z_{i, 1})}{\partial z_{i, 1}} φ_{i} ((d_{i} + b_{i}) {\hat{α}}_{i, 1} \\ + F_{s i} + F_{p i} - β_{i, 1} + Γ_{i, 1}) + \frac{1}{2} \frac{\partial^{2} {\hat{V}}_{i, 1} (z_{i, 1})}{\partial z_{i, 1}^{2}} {∥ ℧_{i, 1} ∥}^{2} . \end{matrix}

(29)

Define the Bellman error as

\begin{matrix} ϵ_{i, 1} = & H_{i, 1} (z_{i, 1}, {\hat{α}}_{i, 1}, {\hat{α}}_{k, 1}, \frac{\partial {\hat{V}}_{i, 1} (z_{i, 1})}{\partial z_{i, 1}}) - H_{i, 1} (z_{i, 1}, α_{i, 1}^{*}, α_{k, 1}^{*}, \frac{\partial V_{i, 1}^{*} (z_{i, 1})}{\partial z_{i, 1}}) \\ = & H_{i, 1} (z_{i, 1}, {\hat{α}}_{i, 1}, {\hat{α}}_{k, 1}, \frac{\partial {\hat{V}}_{i, 1} (z_{i, 1})}{\partial z_{i, 1}}) . \end{matrix}

(30)

{\hat{α}}_{i, 1}

is expected as a unique solution to make

ϵ_{i, 1} \to 0

. If

ϵ_{i, 1} = 0

is held and has unique solution, it is equivalent to the following:

\begin{matrix} \frac{\partial ϵ_{i, 1}}{\partial {\hat{Y}}_{i, a 1}} = \frac{ϕ_{i, 1} (z_{i, 1}) ϕ_{i, 1}^{T} (z_{i, 1})}{2 {(d_{i} + b_{i})}^{2}} ({\hat{Y}}_{i, a 1} - {\hat{Y}}_{i, c 1}) = 0_{N \times 1} . \end{matrix}

(31)

To formulate the actor updating law for training the weight as given in (31), we design a positive definite function

P (t)

as follows:

\begin{matrix} P (t) = & \frac{1}{2 {(d_{i} + b_{i})}^{2}} {({\hat{Y}}_{i, a 1} - {\hat{Y}}_{i, c 1})}^{T} ϖ_{i, 1} ({\hat{Y}}_{i, a 1} - {\hat{Y}}_{i, c 1}), \end{matrix}

(32)

where

ϖ_{i, 1} = ϕ_{i, 1} (z_{i, 1}) ϕ_{i, 1}^{T} (z_{i, 1}) + j_{i, 1} I

,

j_{i, 1} > 0

is the design constant and

I \in R^{p \times p}

is a identity matrix.

Obviously,

P (t) = 0

is synonymous with (31). Consequently, the adaptive updating law

{\dot{\hat{Y}}}_{i, a 1}

is derived from the negative gradient of

P (t)

.

\begin{matrix} {\dot{\hat{Y}}}_{i, a 1} & = - γ_{i, a 1} \frac{\partial P (t)}{\partial {\hat{Y}}_{i, a 1}} = - \frac{γ_{i, a 1}}{{(d_{i} + b_{i})}^{2}} ϖ_{i, 1} ({\hat{Y}}_{i, a 1} - {\hat{Y}}_{i, c 1}) . \end{matrix}

(33)

Remark 2.

The actor updating law in [33] was constructed as

{\dot{\hat{Y}}}_{i, a 1} = - γ_{i, a 1} ϕ_{i, 1} ϕ_{i, 1}^{T} ({\hat{Y}}_{i, a 1} - {\hat{Y}}_{i, c 1})

. When

{\hat{Y}}_{i, a 1} - {\hat{Y}}_{i, c 1}

falls on the zero feature vector of

ϕ_{i, 1} ϕ_{i, 1}^{T}

, the training will be terminated prematurely. Therefore,

j_{i, 1} I

is introduced in (33) for sufficient training.

Design adaptive law

{\dot{\hat{W}}}_{i, c 1}

of critic NNs, and adaptive law

{\dot{\hat{Θ}}}_{i, 1}, {\dot{\hat{Θ}}}_{s i, 1}

and

{\dot{\hat{Θ}}}_{p i, 1}

as

\begin{matrix} {\dot{\hat{Y}}}_{i, c 1} = & - \frac{γ_{i, c 1}}{{(d_{i} + b_{i})}^{2}} ϖ_{i, 1} {\hat{Y}}_{i, c 1} - \frac{φ_{i}}{2} ϕ_{i, 1} (z_{i, 1}) z_{i, 1}^{3}, \end{matrix}

(34)

\begin{matrix} {\dot{\hat{Θ}}}_{i, 1} = z_{i, 1}^{6} φ_{i}^{2} \frac{γ_{i, 1}}{2 π^{2}} ϕ_{f i, 1}^{T} ϕ_{f i, 1} - σ_{i, 1} {\hat{Θ}}_{i, 1}, \end{matrix}

(35)

\begin{matrix} {\dot{\hat{Θ}}}_{s i, 1} = z_{i, 1}^{6} φ_{i}^{2} \frac{{\bar{γ}}_{i, 1}}{2 π^{2}} ϕ_{s i}^{T} ϕ_{s i} - σ_{s i, 1} {\hat{Θ}}_{s i, 1}, \end{matrix}

(36)

\begin{matrix} {\dot{\hat{Θ}}}_{p i, 1} = z_{i, 1}^{6} φ_{i}^{2} \frac{{\underset{̲}{γ}}_{i, 1}}{2 π^{2}} ϕ_{p i}^{T} ϕ_{p i} - σ_{p i, 1} {\hat{Θ}}_{p i, 1}, \end{matrix}

(37)

where

γ_{i, a 1} > 0

and

γ_{i, c 1} > 0

are learning rates, and

γ_{i, a 1} > γ_{i, c 1}

.

γ_{i, 1}, {\bar{γ}}_{i, 1}, {\underset{̲}{γ}}_{i, 1}, σ_{i, 1}, σ_{s i, 1}

and

σ_{p i, 1}

are positive constants.

Step $j$ ( $2 \leq j \leq n - 1$ ): From (3) and (13), similar to (14) one has

\begin{matrix} d z_{i, j} = & (x_{i, j + 1} - β_{i, j} - Γ_{i, j} - Φ_{i, j}) d t + ℧_{i, j} d ω, \end{matrix}

(38)

where

β_{i, j} = \sum_{m = 1}^{j - 1} \frac{\partial {\hat{α}}_{i, j - 1}}{\partial {\hat{Y}}_{i, a m}} {\dot{\hat{Y}}}_{i, a m} + \sum_{m = 1}^{j - 1} \frac{\partial {\hat{α}}_{i, j - 1}}{\partial {\hat{Θ}}_{i, m}} {\dot{\hat{Θ}}}_{i, m} + \sum_{m = 1}^{j - 1} \frac{\partial {\hat{α}}_{i, j - 1}}{\partial {\hat{Θ}}_{s i, m}} {\dot{\hat{Θ}}}_{s i, m} + \sum_{m = 1}^{j - 1} \frac{\partial {\hat{α}}_{i, j - 1}}{\partial {\hat{Θ}}_{p i, m}} {\dot{\hat{Θ}}}_{p i, m} - \frac{\partial {\hat{α}}_{i, j - 1}}{\partial z_{i, 1}} φ_{i} β_{i, 1}, Γ_{i, j} = \sum_{m = 1}^{j - 1} \frac{\partial {\hat{α}}_{i, j - 1}}{\partial x_{i, m}} x_{i, m + 1} - f_{i, j} + \frac{1}{2} (\frac{\partial^{2} {\hat{α}}_{i, j - 1}}{\partial z_{i, 1}^{2}} φ_{i}^{2} b_{i}^{2} ϱ_{i}^{2} g_{i, 1}^{T} g_{i, 1} + \sum_{p, q = 1}^{j - 1} \frac{\partial^{2} {\hat{α}}_{i, j - 1}}{\partial x_{i, p} \partial x_{i, q}} g_{i, p}^{T} g_{i, q} + 2 \sum_{m = 1}^{j - 1} \frac{\partial^{2} {\hat{α}}_{i, j - 1}}{\partial z_{i, 1} \partial x_{i, m}} φ_{i} b_{i} ϱ_{i} g_{i, m}^{T} g_{i, 1}) + \frac{\partial {\hat{α}}_{i, j - 1}}{\partial z_{i, 1}} φ_{i} Γ_{i, 1} + \frac{\partial {\hat{α}}_{i, j - 1}}{\partial z_{i, 1}} φ_{i} F_{s i} + \frac{\partial {\hat{α}}_{i, j - 1}}{\partial z_{i, 1}} φ_{i} F_{p i} + \sum_{m = 1}^{j - 1} \frac{\partial {\hat{α}}_{i, j - 1}}{\partial x_{i, m}} f_{i, m},

Φ_{i, j} = \frac{\partial {\hat{α}}_{i, j - 1}}{\partial z_{i, 1}} φ_{i} (d_{i} + b_{i}) x_{i, j}

and

℧_{i, j} = g_{i, j} - \sum_{m = 1}^{j - 1} \frac{\partial {\hat{α}}_{i, j - 1}}{\partial x_{i, m}} g_{i, m} - \frac{\partial {\hat{α}}_{i, j - 1}}{\partial z_{i, 1}} ℧_{i, 1}

.

Similar to (16) and (17), define an optimal value function as follows:

\begin{matrix} J_{i, j}^{*} (z_{i, j}) & = lim_{t_{f} \to \infty} \frac{1}{t_{f}} \int_{t}^{t_{f}} W_{i, j}^{*} d τ, \end{matrix}

(39)

where

W_{i, j}^{*} = z_{i, j}^{4} + α_{i, j}^{* 2}

.

Then, similar to (18)–(24), one has

\begin{matrix} \frac{\partial V_{i, j}^{*} (z_{i, j})}{\partial z_{i, j}} = & 2 (η_{i, j} + \frac{1}{2} {(\frac{\partial {\hat{α}}_{i, j - 1}}{\partial z_{i, 1}})}^{2} φ_{i}^{2}) z_{i, j}^{3} + 2 ({\bar{η}}_{i, j} + δ^{2} + \frac{3}{4} {\bar{ε}}_{i, j}^{\frac{4}{3}} {\bar{Φ}}_{i, j}) z_{i, j} \\ + V_{i, j}^{c} - 2 β_{i, j} + \frac{ϕ_{f i, j}^{T} ϕ_{f i, j}}{π^{2}} z_{i, j}^{3} Θ_{i, j}^{*}, \end{matrix}

(40)

\begin{matrix} α_{i, j}^{*} = & - {(η_{i, j} + \frac{1}{2} {(\frac{\partial {\hat{α}}_{i, j - 1}}{\partial z_{i, 1}})}^{2} φ_{i}^{2}) z_{i, j}^{3} + ({\bar{η}}_{i, j} + δ^{2} + \frac{3}{4} {\bar{ε}}_{i, j}^{\frac{4}{3}} {\bar{Φ}}_{i, j}) z_{i, j} \\ + \frac{1}{2} V_{i, j}^{c} - β_{i, j} + \frac{ϕ_{f i, j}^{T} ϕ_{f i, j}}{2 π^{2}} z_{i, j}^{3} Θ_{i, j}^{*}}, \end{matrix}

(41)

where

η_{i, j} > 0, {\bar{η}}_{i, j} > 0

,

Θ_{i, j}^{*}

is the ideal weight, which will be defined later.

V_{i, j}^{c} (z_{i, j}) = \frac{\partial V_{i, j}^{*} (z_{i, j})}{\partial z_{i, j}} + 2 β_{i, j} - 2 (η_{i, j} + \frac{1}{2} {(\frac{\partial {\hat{α}}_{i, j - 1}}{\partial z_{i, 1}})}^{2} φ_{i}^{2}) z_{i, j}^{3} - 2 ({\bar{η}}_{i, j} + δ^{2} + \frac{3}{4} {\bar{ε}}_{i, j}^{\frac{4}{3}} {\bar{Φ}}_{i, j}) z_{i, j} - \frac{ϕ_{f i, j}^{T} ϕ_{f i, j}}{π^{2}} z_{i, j}^{3} Θ_{i, j}^{*}

.

Similar to (26)–(28), one has

\begin{matrix} \frac{\partial {\hat{V}}_{i, j} (z_{i, j})}{\partial z_{i, j}} = & 2 (η_{i, j} + \frac{1}{2} {(\frac{\partial {\hat{α}}_{i, j - 1}}{\partial z_{i, 1}})}^{2} φ_{i}^{2}) z_{i, j}^{3} + 2 ({\bar{η}}_{i, j} + δ^{2} + \frac{3}{4} {\bar{ε}}_{i, j}^{\frac{4}{3}} {\bar{Φ}}_{i, j}) z_{i, j} \\ + {\hat{Y}}_{i, c j}^{T} ϕ_{i, j} (z_{i, j}) + \frac{ϕ_{f i, j}^{T} ϕ_{f i, j}}{π^{2}} z_{i, j}^{3} {\hat{Θ}}_{i, j} - 2 β_{i, j}, \end{matrix}

(42)

\begin{matrix} {\hat{α}}_{i, j} = & - {(η_{i, j} + \frac{1}{2} {(\frac{\partial {\hat{α}}_{i, j - 1}}{\partial z_{i, 1}})}^{2} φ_{i}^{2}) z_{i, j}^{3} + ({\bar{η}}_{i, j} + \frac{3}{4} {\bar{ε}}_{i, j}^{\frac{4}{3}} {\bar{Φ}}_{i, j} \\ + δ^{2}) z_{i, j} + \frac{1}{2} {\hat{Y}}_{i, a j}^{T} ϕ_{i, j} (z_{i, j}) + \frac{ϕ_{f i, j}^{T} ϕ_{f i, j}}{2 π^{2}} z_{i, j}^{3} {\hat{Θ}}_{i, j} - β_{i, j}} . \end{matrix}

(43)

Similar to (29)–(37), design the adaptive laws

{\dot{\hat{Θ}}}_{i, j}

,

{\dot{\hat{Y}}}_{i, c j}

and

{\dot{\hat{Y}}}_{i, a j}

as

\begin{matrix} {\dot{\hat{Θ}}}_{i, j} = z_{i, j}^{6} \frac{γ_{i, j}}{2 π^{2}} ϕ_{f i, j}^{T} ϕ_{f i, j} - σ_{i, j} {\hat{Θ}}_{i, j}, \end{matrix}

(44)

\begin{matrix} {\dot{\hat{Y}}}_{i, a j} = - γ_{i, a j} ϖ_{i, j} ({\hat{Y}}_{i, a j} - {\hat{Y}}_{i, c j}), \end{matrix}

(45)

\begin{matrix} {\dot{\hat{Y}}}_{i, c j} = - γ_{i, c j} ϖ_{i, j} {\hat{Y}}_{i, c j} - \frac{1}{2} ϕ_{i, j} (z_{i, j}) z_{i, j}^{3}, \end{matrix}

(46)

where

ϖ_{i, j} = ϕ_{i, j} (z_{i, j}) ϕ_{i, j}^{T} (z_{i, j}) + j_{i, j} I

,

j_{i, j}

,

γ_{i, j}

and

σ_{i, j}

are the design positive constants,

γ_{i, a j} > 0

and

γ_{i, c j} > 0

are learning rates, and

γ_{i, a j} > γ_{i, c j}

.

Step $n$ : From (3) and (13), similar to (38), one has

\begin{matrix} d z_{i, n} = & ({\hat{u}}_{i} - β_{i, n} - Γ_{i, n} - Φ_{i, n}) d t + ℧_{i, n} d ω, \end{matrix}

(47)

where

β_{i, n} = \sum_{m = 1}^{n - 1} \frac{\partial {\hat{α}}_{i, n - 1}}{\partial {\hat{Y}}_{i, a m}} {\dot{\hat{Y}}}_{i, a m} + \sum_{m = 1}^{n - 1} \frac{\partial {\hat{α}}_{i, n - 1}}{\partial {\hat{Θ}}_{i, m}} {\dot{\hat{Θ}}}_{i, m} + \sum_{m = 1}^{n - 1} \frac{\partial {\hat{α}}_{i, n - 1}}{\partial {\hat{Θ}}_{s i, m}} {\dot{\hat{Θ}}}_{s i, m} + \sum_{m = 1}^{n - 1} \frac{\partial {\hat{α}}_{i, n - 1}}{\partial {\hat{Θ}}_{p i, m}} {\dot{\hat{Θ}}}_{p i, m} - \frac{\partial {\hat{α}}_{i, n - 1}}{\partial z_{i, 1}} φ_{i} β_{i, 1}, Γ_{i, n} = \sum_{m = 1}^{n - 1} \frac{\partial {\hat{α}}_{i, n - 1}}{\partial x_{i, m}} x_{i, m + 1} - f_{i, n} + \frac{1}{2} (\frac{\partial^{2} {\hat{α}}_{i, n - 1}}{\partial z_{i, 1}^{2}} φ_{i}^{2} b_{i}^{2} ϱ_{i}^{2} g_{i, 1}^{T} g_{i, 1} + \sum_{p, q = 1}^{n - 1} \frac{\partial^{2} {\hat{α}}_{i, n - 1}}{\partial x_{i, p} \partial x_{i, q}} g_{i, p}^{T} g_{i, q}

+ 2 \sum_{m = 1}^{n - 1} \frac{\partial^{2} {\hat{α}}_{i, n - 1}}{\partial z_{i, 1} \partial x_{i, m}} φ_{i} b_{i} ϱ_{i} g_{i, m}^{T} g_{i, 1}) + \frac{\partial {\hat{α}}_{i, n - 1}}{\partial z_{i, 1}} φ_{i} Γ_{i, 1} + \frac{\partial {\hat{α}}_{i, n - 1}}{\partial z_{i, 1}} φ_{i} F_{s i} + \frac{\partial {\hat{α}}_{i, n - 1}}{\partial z_{i, 1}} φ_{i} F_{p i} + \sum_{m = 1}^{n - 1} \frac{\partial {\hat{α}}_{i, n - 1}}{\partial x_{i, m}} f_{i, m}

,

Φ_{i, n} = \frac{\partial {\hat{α}}_{i, n - 1}}{\partial z_{i, 1}} φ_{i} (d_{i} + b_{i}) x_{i, n}

and

℧_{i, n} = g_{i, n} - \sum_{m = 1}^{n - 1} \frac{\partial {\hat{α}}_{i, n - 1}}{\partial x_{i, m}} g_{i, m} - \frac{\partial {\hat{α}}_{i, n - 1}}{\partial z_{i, 1}} ℧_{i, 1}

.

Similar to (16) and (17), define an optimal value function as follows:

\begin{matrix} J_{i, n}^{*} (z_{i, n}) & = lim_{t_{f} \to \infty} \frac{1}{t_{f}} \int_{t}^{t_{f}} W_{i, n}^{*} d τ, \end{matrix}

(48)

where

W_{i, n}^{*} = z_{i, n}^{4} + u_{i}^{* 2}

.

Similar to (18)–(24), one has

\begin{matrix} \frac{\partial V_{i, n}^{*} (z_{i, n})}{\partial z_{i, n}} = & 2 (η_{i, n} + \frac{1}{2} {(\frac{\partial {\hat{α}}_{i, n - 1}}{\partial z_{i, 1}})}^{2} φ_{i}^{2}) z_{i, n}^{3} + 2 ({\bar{η}}_{i, n} + δ^{2} + \frac{3}{4} {\bar{ε}}_{i, n}^{\frac{4}{3}} {\bar{Φ}}_{i, n}) z_{i, n} \\ + \frac{ϕ_{f i, n}^{T} ϕ_{f i, n}}{π^{2}} z_{i, n}^{3} Θ_{i, n}^{*} - 2 β_{i, n} + V_{i, n}^{c}, \end{matrix}

(49)

\begin{matrix} u_{i}^{*} = & - {(η_{i, n} + \frac{1}{2} {(\frac{\partial {\hat{α}}_{i, n - 1}}{\partial z_{i, 1}})}^{2} φ_{i}^{2}) z_{i, n}^{3} + ({\bar{η}}_{i, n} + \frac{3}{4} {\bar{ε}}_{i, n}^{\frac{4}{3}} {\bar{Φ}}_{i, n} \\ + δ^{2}) z_{i, n} + \frac{1}{2} V_{i, n}^{c} + \frac{ϕ_{f i, n}^{T} ϕ_{f i, n}}{2 π^{2}} z_{i, n}^{3} Θ_{i, n}^{*} - β_{i, n}}, \end{matrix}

(50)

where

η_{i, n} > 0, {\bar{η}}_{i, n} > 0

,

Θ_{i, n}^{*}

is the ideal weight, which will be defined later.

V_{i, n}^{c} (z_{i, n}) = \frac{\partial V_{i, n}^{*} (z_{i, n})}{\partial z_{i, n}} - 2 (η_{i, n} + \frac{1}{2} {(\frac{\partial {\hat{α}}_{i, n - 1}}{\partial z_{i, 1}})}^{2} φ_{i}^{2}) z_{i, n}^{3} - 2 ({\bar{η}}_{i, n} + δ^{2} + \frac{3}{4} {\bar{ε}}_{i, n}^{\frac{4}{3}} {\bar{Φ}}_{i, n}) z_{i, n} - \frac{ϕ_{f i, n}^{T} ϕ_{f i, n}}{π^{2}} z_{i, n}^{3} Θ_{i, n}^{*} + 2 β_{i, n}

.

Similar to (26)–(28), one has

\begin{matrix} \frac{\partial {\hat{V}}_{i, n} (z_{i, n})}{\partial z_{i, n}} = & 2 (η_{i, n} + \frac{1}{2} {(\frac{\partial {\hat{α}}_{i, n - 1}}{\partial z_{i, 1}})}^{2} φ_{i}^{2}) z_{i, n}^{3} - 2 β_{i, n} + {\hat{Y}}_{i, c n}^{T} ϕ_{i, n} (z_{i, n}) \\ + \frac{ϕ_{f i, n}^{T} ϕ_{f i, n}}{π^{2}} z_{i, n}^{3} {\hat{Θ}}_{i, n} + 2 ({\bar{η}}_{i, n} + δ^{2} + \frac{3}{4} {\bar{ε}}_{i, n}^{\frac{4}{3}} {\bar{Φ}}_{i, n}) z_{i, n}, \end{matrix}

(51)

\begin{matrix} {\hat{u}}_{i} = & - {(η_{i, n} + \frac{1}{2} {(\frac{\partial {\hat{α}}_{i, n - 1}}{\partial z_{i, 1}})}^{2} φ_{i}^{2}) z_{i, n}^{3} + ({\bar{η}}_{i, n} + \frac{3}{4} {\bar{ε}}_{i, n}^{\frac{4}{3}} {\bar{Φ}}_{i, n} \\ + δ^{2}) z_{i, n} + \frac{1}{2} {\hat{Y}}_{i, a n}^{T} ϕ_{i, n} (z_{i, n}) + \frac{ϕ_{f i, n}^{T} ϕ_{f i, n}}{2 π^{2}} z_{i, n}^{3} {\hat{Θ}}_{i, n} - β_{i, n}} . \end{matrix}

(52)

Similar to (29)–(37), design the adaptive laws

{\dot{\hat{Θ}}}_{i, n}

,

{\dot{\hat{Y}}}_{i, a n}

and

{\dot{\hat{Y}}}_{i, c n}

as

\begin{matrix} {\dot{\hat{Θ}}}_{i, n} = z_{i, n}^{6} \frac{γ_{i, n}}{2 π^{2}} ϕ_{f i, n}^{T} ϕ_{f i, n} - σ_{i, n} {\hat{Θ}}_{i, n}, \end{matrix}

(53)

\begin{matrix} {\dot{\hat{Y}}}_{i, a n} = - γ_{i, a n} ϖ_{i, n} ({\hat{Y}}_{i, a n} - {\hat{Y}}_{i, c n}), \end{matrix}

(54)

\begin{matrix} {\dot{\hat{Y}}}_{i, c n} = & - γ_{i, c n} ϖ_{i, n} {\hat{Y}}_{i, c n} - \frac{1}{2} ϕ_{i, n} (z_{i, n}) z_{i, n}^{3}, \end{matrix}

(55)

where

ϖ_{i, n} = ϕ_{i, n} (z_{i, n}) ϕ_{i, n}^{T} (z_{i, n}) + j_{i, n} I

,

j_{i, n}

,

γ_{i, n}

and

σ_{i, n}

are the design positive constants,

γ_{i, a n} > 0

and

γ_{i, c n} > 0

are the learning rates, and

γ_{i, a n} > γ_{i, c n}

.

To clearly demonstrate our ideas and process, a block diagram is provided using Figure 1 and a pseudocode as Algorithm 1.

Algorithm 1: The Fixed-time prescribed performance optimization consensus control algorithm

3.2. Stability Analysis

Theorem 1.

Consider the SNMASs (3), which satisfies Assumption A1, by designing the local ideal control laws (52), local ideal virtual control law (28) and (43), adaptive laws (33)–(37), (44)–(46), and (53)–(55), the consensus error satisfies the FTPPC requirement, while other signals remain probabilistically bounded.□

Proof.

Define the Lyapunov function for SNMASs (3) as

\begin{matrix} V = & \sum_{i = 1}^{N} \sum_{m = 1}^{n} V_{i, m}, \end{matrix}

(56)

where

V_{i, 1} = \frac{1}{4} z_{i, 1}^{4} + \frac{1}{2 γ_{i, 1}} {\tilde{Θ}}_{i, 1}^{2} + \frac{1}{2 {\bar{γ}}_{i, 1}} {\tilde{Θ}}_{s i, 1}^{2} + \frac{1}{2 {\underset{̲}{γ}}_{i, 1}} {\tilde{Θ}}_{p i, 1}^{2} + \frac{1}{2} {\tilde{Y}}_{i, a 1}^{T} {\tilde{Y}}_{i, a 1} + \frac{1}{2} {\tilde{Y}}_{i, c 1}^{T} {\tilde{Y}}_{i, c 1}

,

V_{i, j} = \frac{1}{4} z_{i, j}^{4} + \frac{1}{2 γ_{i, j}} {\tilde{Θ}}_{i, j}^{2} + \frac{1}{2} {\tilde{Y}}_{i, a j}^{T} {\tilde{Y}}_{i, a j} + \frac{1}{2} {\tilde{Y}}_{i, c j}^{T} {\tilde{Y}}_{i, c j}, j = 2, \dots, n

,

{\tilde{Y}}_{i, a m} = {\hat{Y}}_{i, a m} - Y_{i, m}^{*}

and

{\tilde{Y}}_{i, c m} = {\hat{Y}}_{i, c m} - Y_{i, m}^{*}

.

{\tilde{Θ}}_{i, m} = {\hat{Θ}}_{i, m} - Θ_{i, m}^{*}

with

Θ_{i, m}^{*} = {∥ Y_{i, m}^{*} ∥}^{2}

,

m = 1, \dots, n

,

{\tilde{Θ}}_{s i, 1} = {\hat{Θ}}_{s i, 1} - Θ_{s i, 1}^{*}

with

Θ_{s i, 1}^{*} = {∥ Y_{s i, 1}^{*} ∥}^{2}

, and

{\tilde{Θ}}_{p i, 1} = {\hat{Θ}}_{p i, 1} - Θ_{p i, 1}^{*}

with

Θ_{p i, 1}^{*} = {∥ Y_{p i}^{*} ∥}^{2}

.

From (5), (13), (33) and (34), one has

L V_{i, 1}

is

\begin{matrix} L V_{i, 1} = & z_{i, 1}^{3} φ_{i} [(d_{i} + b_{i}) (z_{i, 2} + {\hat{α}}_{i, 1}) + F_{s i} + F_{p i} - β_{i, 1} + Γ_{i, 1}] - \frac{γ_{i, a 1}}{{(d_{i} + b_{i})}^{2}} {\tilde{Y}}_{i, a 1}^{T} ϖ_{i, 1} ({\hat{Y}}_{i, a 1} \\ - {\hat{Y}}_{i, c 1}) + \frac{3 z_{i, 1}^{2}}{2} {∥ ℧_{i, 1} ∥}^{2} + \frac{{\tilde{Θ}}_{i, 1}}{γ_{i, 1}} {\dot{\hat{Θ}}}_{i, 1} + \frac{{\tilde{Θ}}_{s i, 1}}{γ_{s i, 1}} {\dot{\hat{Θ}}}_{s i, 1} + \frac{{\tilde{Θ}}_{p i, 1}}{γ_{p i, 1}} {\dot{\hat{Θ}}}_{p i, 1} \\ - \frac{φ_{i}}{2} {\tilde{Y}}_{i, c 1}^{T} ϕ_{i, 1} (z_{i, 1}) z_{i, 1}^{3} - \frac{γ_{i, c 1}}{{(d_{i} + b_{i})}^{2}} {\tilde{Y}}_{i, c 1}^{T} ϖ_{i, 1} {\hat{Y}}_{i, c 1} . \end{matrix}

(57)

By applying (7), one has

\begin{matrix} \frac{3 z_{i, 1}^{2} φ_{i}^{2}}{2} ∥ ℧_{i, 1} ∥^{2} \leq \frac{9}{ς^{3}} + \frac{ς^{3 / 2} z_{i, 1}^{3} φ_{i}^{3}}{3 \sqrt{2}} {∥ b_{i} ϱ_{i} g_{i, 1} ∥}^{3}, \end{matrix}

(58)

where

ς > 0

is a constant.

Define

F_{i, 1} (Z_{i, 1}) = \frac{ς^{3 / 2} φ_{i}^{2}}{3 \sqrt{2}} {∥ b_{i} ϱ_{i} g_{i, 1} ∥}^{3} + Γ_{i, 1}

with

Z_{i, 1} = {[z_{i, 1}, x_{i, 1}, {\bar{x}}_{k, 2}]}^{T}

, similar to (26), we can express

F_{i, 1} (Z_{i, 1})

as follows

\begin{matrix} F_{i, 1} (Z_{i, 1}) = Y_{f i, 1}^{* T} ϕ_{f i, 1} (Z_{i, 1}) + ε_{f i, 1} (Z_{i, 1}), \end{matrix}

(59)

where

∥ ε_{i, 1} (Z_{i, 1}) ∥ \leq ε_{i, 1}^{*}

with constant

ε_{i, 1}^{*} > 0

.

By applying (7), one has

\begin{matrix} z_{i, 1}^{3} φ_{i} Y_{m}^{* T} ϕ_{m} & \leq z_{i, 1}^{6} φ_{i}^{2} \frac{Θ_{m, 1}^{*}}{2 π^{2}} ϕ_{m}^{T} ϕ_{m} + \frac{π^{2}}{2}, m = s i, p i, (f i, 1), \end{matrix}

(60)

\begin{matrix} z_{i, 1}^{3} φ_{i} ε_{m} \leq \frac{1}{4} z_{i, 1}^{6} φ_{i}^{2} + ε_{m}^{* 2}, m = s i, p i, (f i, 1), \end{matrix}

(61)

\begin{matrix} z_{i, 1}^{3} φ_{i} (d_{i} + b_{i}) z_{i, 2} \leq \frac{1}{4} z_{i, 1}^{6} φ_{i}^{2} + {(d_{i} + b_{i})}^{2} z_{i, 2}^{2}, \end{matrix}

(62)

\begin{matrix} {(d_{i} + b_{i})}^{2} z_{i, 2}^{2} \leq \frac{1}{4 δ_{i, 1}^{2}} {(d_{i} + b_{i})}^{4} + δ_{i, 1}^{2} z_{i, 2}^{4}, \end{matrix}

(63)

\begin{matrix} {\tilde{Y}}_{i, a 1}^{T} ϖ_{i, 1} {\tilde{Y}}_{i, c 1} \leq \frac{1}{2} {\tilde{Y}}_{i, a 1}^{T} ϖ_{i, 1} {\tilde{Y}}_{i, a 1} + \frac{1}{2} {\tilde{Y}}_{i, c 1}^{T} ϖ_{i, 1} {\tilde{Y}}_{i, c 1}, \end{matrix}

(64)

\begin{matrix} - \frac{1}{2} Y^{T} ϕ_{i, 1} φ_{i} z_{i, 1}^{3} \leq \frac{1}{4} Y_{i, a 1}^{T} ϖ_{i, 1} Y + \frac{1}{4} φ_{i}^{2} z_{i, 1}^{6}, Y = {\tilde{Y}}_{i, a 1}, {\hat{Y}}_{i, c 1}, \end{matrix}

(65)

\begin{matrix} - \frac{σ_{m, 1}}{γ_{m, 1}} {\tilde{Θ}}_{m, 1} {\hat{Θ}}_{m, 1} \leq & \frac{σ_{m, 1}}{2 γ_{m, 1}} Θ_{m, 1}^{* 2} - \frac{σ_{m, 1}}{2 γ_{m, 1}} {\tilde{Θ}}_{m, 1}^{2}, m = i, s i, p i . \end{matrix}

(66)

Then, the following equations hold:

\begin{matrix} {\hat{Y}}_{i, a 1} - {\hat{Y}}_{i, c 1} = {\tilde{Y}}_{i, a 1} - {\tilde{Y}}_{i, c 1}, \end{matrix}

(67)

\begin{matrix} - \frac{φ_{i}}{2} z_{i, 1}^{3} {\hat{Y}}_{i, a 1}^{T} ϕ_{i, 1} - \frac{φ_{i}}{2} z_{i, 1}^{3} {\tilde{Y}}_{i, c 1}^{T} ϕ_{i, 1} = - \frac{φ_{i}}{2} z_{i, 1}^{3} {\tilde{Y}}_{i, a 1}^{T} ϕ_{i, 1} - \frac{φ_{i}}{2} z_{i, 1}^{3} {\hat{Y}}_{i, c 1}^{T} ϕ_{i, 1}, \end{matrix}

(68)

\begin{matrix} {\tilde{Y}}_{i, c 1}^{T} ϖ_{i, 1} {\hat{Y}}_{i, c 1} = & \frac{1}{2} {\tilde{Y}}_{i, c 1}^{T} ϖ_{i, 1} {\tilde{Y}}_{i, c 1} + \frac{1}{2} {\hat{Y}}_{i, c 1}^{T} ϖ_{i, 1} {\hat{Y}}_{i, c 1} - \frac{1}{2} Y_{f i, 1}^{* T} ϖ_{i, 1} Y_{f i, 1}^{*} . \end{matrix}

(69)

By invoking (35)–(37) and (57)–(69) yields

\begin{matrix} L V_{i, 1} \leq & - (η_{i, 1} - \frac{3}{2}) φ_{i}^{2} z_{i, 1}^{6} - \frac{γ_{i, c 1} - γ_{i, a 1}}{2 {(d_{i} + b_{i})}^{2}} {\tilde{Y}}_{i, c 1}^{T} ϖ_{i, 1} {\tilde{Y}}_{i, c 1} - {\bar{η}}_{i, 1} z_{i, 1}^{4} - \frac{σ_{i, 1}}{2 γ_{i, 1}} {\tilde{Θ}}_{i, 1}^{2} \\ - \frac{σ_{s i, 1}}{2 γ_{s i, 1}} {\tilde{Θ}}_{s i, 1}^{2} - \frac{σ_{p i, 1}}{2 γ_{p i, 1}} {\tilde{Θ}}_{p i, 1}^{2} - (\frac{γ_{i, c 1}}{2 {(d_{i} + b_{i})}^{2}} - \frac{1}{4}) {\hat{Y}}_{i, c 1}^{T} ϖ_{i, 1} {\hat{Y}}_{i, c 1} + Δ_{i, 1} \\ + δ_{i, 1}^{2} z_{i, 2}^{4} - (\frac{γ_{i, a 1}}{2 {(d_{i} + b_{i})}^{2}} - \frac{1}{4}) {\tilde{Y}}_{i, a 1}^{T} ϖ_{i, 1} {\tilde{Y}}_{i, a 1}, \end{matrix}

(70)

where

Δ_{i, 1} = ε_{f i, 1}^{* 2} + ε_{s i}^{* 2} + ε_{p i}^{* 2} + \frac{π^{2}}{2} + \frac{π^{2}}{2} + \frac{π^{2}}{2} + \frac{σ_{i, 1}}{2 γ_{i, 1}} Θ_{i, 1}^{* 2} + \frac{σ_{s i, 1}}{2 γ_{s i, 1}} Θ_{s i, 1}^{* 2} + \frac{σ_{p i, 1}}{2 γ_{p i, 1}} Θ_{p i, 1}^{* 2} + \frac{9}{ς^{3}} + \frac{1}{4 δ^{2}} {(d_{i} + b_{i})}^{4} + \frac{γ_{i, c 1}}{2 {(d_{i} + b_{i})}^{2}} Y_{f i, 1}^{* T} ϖ_{i, 1} Y_{f i, 1}^{*}

.

Let

η_{i, 1} > \frac{3}{2}

,

γ_{i, a 1} > \frac{{(d_{i} + b_{i})}^{2}}{2}

,

γ_{i, c 1} > γ_{i, a 1}

,

γ_{i, 1}^{*} = min {\frac{γ_{i, c 1} - γ_{i, a 1}}{{(d_{i} + b_{i})}^{2}}, \frac{γ_{i, a 1}}{{(d_{i} + b_{i})}^{2}} - \frac{1}{2}}

,

c_{i, 1} = min {4 {\bar{η}}_{i, 1}, σ_{i, 1}, σ_{s i}, σ_{p i}, \frac{γ_{i, 1}^{*} λ_{min (i, 1)}}{{(d_{i} + b_{i})}^{2}}}

,

λ_{min (i, 1)}

is the minimal characteristic value of

ϖ_{i, 1}

, rewrite (70) as

\begin{matrix} L V_{i, 1} \leq - c_{i, 1} V_{i, 1} + Δ_{i, 1} + δ_{i, 1}^{2} z_{i, 2}^{4} . \end{matrix}

(71)

By applying (7), one has

\begin{matrix} z_{i, j}^{3} ε_{i, j} \leq \frac{1}{2} z_{i, j}^{6} + \frac{1}{2} ε_{f i, j}^{* 2}, \end{matrix}

(72)

\begin{matrix} z_{i, j}^{3} z_{i, j + 1} \leq \frac{1}{2} z_{i, j}^{6} + \frac{1}{2} z_{i, j + 1}^{2}, \end{matrix}

(73)

\begin{matrix} \frac{1}{2} z_{i, j + 1}^{2} \leq δ^{2} z_{i, j + 1}^{4} + \frac{1}{16 δ^{2}}, \end{matrix}

(74)

\begin{matrix} - z_{i, j}^{3} Φ_{i, j} \leq \frac{3}{4} {\bar{ε}}_{i, j}^{\frac{4}{3}} z_{i, j}^{4} Φ_{i, j}^{\frac{4}{3}} + \frac{1}{4 {\bar{ε}}_{i, j}^{4}} . \end{matrix}

(75)

Similar to

L V_{i, 1}

, one has

\begin{matrix} L V_{i, j} \leq & - (η_{i, j} - \frac{3}{2}) z_{i, j}^{6} + δ^{2} (z_{i, j + 1}^{4} - z_{i, j}^{4}) - \frac{σ_{i, j}}{2 γ_{i, j}} {\tilde{Θ}}_{i, j}^{2} - \frac{γ_{i, c j} - γ_{i, a j}}{2} {\tilde{Y}}_{i, c j}^{T} ϖ_{i, j} {\tilde{Y}}_{i, c j} \\ - {\bar{η}}_{i, j} z_{i, j}^{4} - (\frac{γ_{i, c j}}{2} - \frac{1}{4}) {\hat{Y}}_{i, c j}^{T} ϖ_{i, j} {\hat{Y}}_{i, c j} - (\frac{γ_{i, a j}}{2} - \frac{1}{4}) {\tilde{Y}}_{i, a j}^{T} ϖ_{i, j} {\tilde{Y}}_{i, a j} + Δ_{i, j}, \end{matrix}

(76)

where

Δ_{i, j} = \frac{σ_{i, j}}{2 γ_{i, j}} Θ_{i, j}^{* 2} + \frac{9}{ς^{3}} + \frac{π^{2}}{2} + \frac{1}{2} ε_{f i, j}^{* 2} + \frac{1}{4 {\bar{ε}}_{i, j}^{4}} + \frac{1}{16 δ^{2}} + \frac{γ_{i, c j}}{2} Y_{f i, j}^{* T} ϖ_{i, j} Y_{f i, j}^{*}

.

Let

η_{i, j} > \frac{3}{2}, γ_{i, a j} > \frac{1}{2}

,

γ_{i, c j} > γ_{i, a j}

,

γ_{i, j}^{*} = min {γ_{i, c j} - γ_{i, a j}, γ_{i, a j} - \frac{1}{2}}

,

c_{i, j} = min {4 {\bar{η}}_{i, j},

σ_{i, j}, σ_{s i}, σ_{p i}, γ_{i, j}^{*} λ_{min (i, j)}}

,

λ_{min (i, j)}

is the minimal characteristic value of

ϖ_{i, j}

, rewrite (76) as

\begin{matrix} L V_{i, j} \leq & - c_{i, j} V_{i, j} + Δ_{i, j} + δ^{2} (z_{i, j + 1}^{4} - z_{i, j}^{4}) . \end{matrix}

(77)

Similar to

L V_{i, j}

, one has

\begin{matrix} L V_{i, n} \leq & - (η_{i, n} - \frac{3}{4}) z_{i, n}^{6} - {\bar{η}}_{i, n} z_{i, n}^{4} - δ^{2} z_{i, n}^{4} - \frac{σ_{i, n}}{2 γ_{i, n}} {\tilde{Θ}}_{i, n}^{2} - \frac{γ_{i, c n} - γ_{i, a n}}{2} {\tilde{Y}}_{i, c n}^{T} ϖ_{i, n} {\tilde{Y}}_{i, c n} \\ + Δ_{i, n} - (\frac{γ_{i, c n}}{2} - \frac{1}{4}) {\hat{Y}}_{i, c n}^{T} ϖ_{i, n} {\hat{Y}}_{i, c n} - (\frac{γ_{i, a n}}{2} - \frac{1}{4}) {\tilde{Y}}_{i, a n}^{T} ϖ_{i, n} {\tilde{Y}}_{i, a n}, \end{matrix}

(78)

where

Δ_{i, n} = \frac{σ_{i, n}}{2 γ_{i, n}} Θ_{i, n}^{* 2} + \frac{9}{ς^{3}} + \frac{π^{2}}{2} + \frac{γ_{i, c j}}{2} Y_{f i, n}^{* T} ϖ_{f i, n} Y_{i, n}^{*} + \frac{1}{2} ε_{f i, n}^{* 2} + \frac{1}{4 {\bar{ε}}_{i, n}^{4}}

.

Let

η_{i, n} > \frac{3}{4}, γ_{i, a n} > \frac{1}{2}

and

γ_{i, c n} > γ_{i, a n}

,

γ_{i, n}^{*} = min {γ_{i, c n} - γ_{i, a n}, γ_{i, a n} - \frac{1}{2}}

,

λ_{min (i, n)}

is the minimal characteristic value of

ϖ_{i, n}

,

c_{i, n} = min {4 {\bar{η}}_{i, n}, σ_{i, n}, σ_{s i}, σ_{p i}, γ_{i, n}^{*} λ_{min (i, n)}}

, rewrite (78) as

\begin{matrix} L V_{i, n} \leq & - c_{i, n} V_{i, n} + Δ_{i, n} - δ^{2} z_{i, n}^{4} . \end{matrix}

(79)

From (56), (71), (77), and (79), one has

L V \leq - C V + Δ,

(80)

where

C = min_{1 \leq i \leq N, 1 \leq m \leq n, 2 \leq j \leq n} {4 {\bar{η}}_{i, m}, σ_{i, m}, σ_{s i, m}, σ_{p i, m},

\frac{γ_{i, 1}^{*} λ_{min (i, 1)}}{{(d_{i} + b_{i})}^{2}}, γ_{i, j}^{*} λ_{min (i, j)}}

,

Δ = \sum_{i = 1}^{N} \sum_{m = 1}^{n} Δ_{i, m}

.

Based on

I t \hat{o}

Lemma and (80), one has

E [V (t)] \leq V (t_{0}) e^{- C (t - t_{0})} + \frac{Δ}{C} .

(81)

Therefore, it can be concluded that Theorem 1 holds true based on the above analysis.

4. Simulation Example

Consider the following SNMASs, with the directed communication topology graph Figure 2.

\{\begin{matrix} d x_{i, 1} & = (x_{i, 2} + cos (x_{i, 1})) d t + sin (6 x_{i, 1}) d ω \\ d x_{i, 2} & = (u_{i} + cos (x_{i, 1}) sin (5 x_{i, 2})) d t + sin (6 x_{i, 1} x_{i, 2}) d ω \\ y_{i}^{f} & = ϱ_{i} x_{i, 1} + ρ_{i}, 1 \leq i \leq 4, \end{matrix}

(82)

The desired trajectory is

y_{r} = \frac{1}{2} (sin (5 t) + 1)

. From Figure 2, we have

A = [\begin{matrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{matrix}], L = [\begin{matrix} 0 & 0 & 0 & 0 \\ - 1 & 2 & - 1 & 0 \\ 0 & - 1 & 1 & 0 \\ - 1 & 0 & 0 & 1 \end{matrix}] .

Then, the FTPF is chosen as

\begin{matrix} h (t) = \{\begin{matrix} 4.9 {(1 - 4 t)}^{2} + 0.1, & 0 \leq t < 0.25 \\ 0.1, & t \geq 0.25, \end{matrix} \end{matrix}

from which we can obtain

h_{0} = 5, h_{\tilde{T}} = 0.1, ι = 2, λ = 8

, and

\tilde{T} = 0.25

.

Based on stability analysis, we take the design parameters as

η_{i, 1} = η_{i, 2} = 1.6

,

{\bar{η}}_{i, 1} = {\bar{η}}_{i, 2} = 20

,

μ_{i, 1} = {[- 2, - 1, 0, 1, 2]}^{T}

,

μ_{i, 2} = {[- 3, - 2, - 1, 0, 1, 2, 3]}^{T}

,

σ_{i} = 1

,

σ_{i, 1} = σ_{s i, 1} = σ_{p i, 1} = γ_{i, c 1} = γ_{i, c 2} = γ_{i, 1} = γ_{s i, 1} = γ_{p i, 1} = 15

,

σ_{i, 2} = σ_{s i, 2} = σ_{p i, 2} = γ_{i, 2} = γ_{s i, 2} = γ_{p i, 2} = 10

,

γ_{i, a 1} = γ_{i, a 2} = 5

,

δ_{m} = 0.5

,

δ_{M} = 1

. The initial values are given by

x_{i, 1} (0) = 5

,

x_{i, 2} (0) = 2

,

Y_{i, c 1} (0) = 0.5

,

Y_{i, a 1} (0) = 0.55

,

Y_{i, c 2} (0) = 0.8

,

Y_{i, a 2} (0) = 0.85

,

Θ_{i, 1} (0) = Θ_{i, 2} (0) = Θ_{s i, 1} (0) = Θ_{s i, 2} (0) = Θ_{p i, 1} (0) = Θ_{p i, 2} (0) = 1

. Consider the sensor faults: drift fault (

ϱ_{1} = 0.8, ρ_{1} = 0

), time-varying bias fault (

ϱ_{2} = 1, ρ_{2} = - exp (- x_{2, 1})

), time-varying drift fault (

ϱ_{3} = 1 / (1 + exp (- t)), ρ_{3} = 0

), and bias fault (

ϱ_{4} = 1, ρ_{4} = 0.5

). Simulation results are depicted in Figure 3, Figure 4, Figure 5 and Figure 6.

Figure 3 shows that the consensus error is driven to a target neighborhood that satisfies the FTPPC requirement. Meanwhile, by comparing different control methods, the effectiveness of the implemented control method is proven. Compared with the FTC in [44], the proposed method ensures that the consensus error converges to a prescribed range. In contrast to the PPC in [46], this method is able to converge to the prescribed range in fixed time, resulting in an average time saving of about 88.1657%. Figure 4 shows that under the controller (52), the followers can effectively follow the leader’s trajectory. The controller curves

u_{i}

are displayed in Figure 6. Figure 4, Figure 5 and Figure 6 show that

x_{i, 2}

,

y_{i}^{f}

and

u_{i}

are bounded in probability. Undoubtedly, the designed controller has been validated as effective through simulation results.

5. Conclusions

For the SNMASs with sensor faults, the fixed-time prescribed performance optimal consensus control issues have been addressed. A control protocol based on inaccurate information has been proposed to address the issue where the existing feedback control law is not applicable under sensor faults. By utilizing RL and backstepping, we realized the FTPPC for the consensus error. Compared with other consensus control methods, the proposed method is able to converge to a prescribed range, and the convergence time is saved by an average of about 88.1657%. Future work will investigate consensus control for MASs with time delays [47] based on the “Data-Driven ToMFIR” technique [48,49].

Author Contributions

Conceptualization, Z.W., X.C., H.M. and A.L.; Data curation, X.C. and S.X.; Formal analysis, X.C., H.M. and A.L.; Technology assessment, H.M. and A.L.; Funding acquisition, Z.W. and H.M.; Investigation, X.C. and H.M.; Methodology, X.C. and A.L.; Supervision, H.M. and Z.W.; Validation, X.C.; Visualisation, X.C.; Writing—original draft, X.C., A.L. and H.M.; Writing—review and editing, all. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Natural Science Foundation of China grant number 62203119 and 62373113, and Natural Science Foundation of Guangdong Province grant number 2023A1515012891 and 2023A1515011527.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Tables of key variables and abbreviations are provided to help ensure that readers can clearly understand our paper.

Table A1. The key variable of this paper.

Variable	Definition
$x_{i, \cdot}$	System state
$y_{i}^{f}$	System output state with sensor fault
t	Time
$s_{i}$	Consensus error
$α_{i}, \cdot$	Virtual controller
$u_{i}$	Actual controller
$Y_{i, \cdot}$	Network weight
${(\cdot)}^{*}$	Optimal parameters
$\hat{(\cdot)}$	Approximate optimal parameters

Table A2. The abbreviation of this paper.

Abbreviation	Full Spelling
MASs	Multi-agent systems
SNMASs	Stochastic Nonlinear MASs
FTC	Fixed-time control
FTPPC	Fixed-time prescribed performance control
PPC	Prescribed performance control
HJB	Hamilton–Jacobi–Bellman
RL	Reinforcement learning
NNs	Neural networks
FTPF	Fixed-time performance function

References

Yang, S.; Liang, H.; Pan, Y.; Li, T. Security control for air-sea heterogeneous multiagent systems with cooperative-antagonistic interactions: An intermittent privacy preservation mechanism. Sci. China-Technol. Sci. 2024. Available online: https://www.sciengine.com/SCTS/doi/10.1007/s11431-024-2758-6 (accessed on 21 October 2024). [CrossRef]
Ren, H.; Liu, Z.; Liang, H.; Li, H. Pinning-based neural control for multiagent systems with self-regulation intermediate event-triggered method. IEEE Trans. Neural Netw. Learn. Syst. 2024. early access. [Google Scholar] [CrossRef] [PubMed]
Zhang, W.; Huang, Q.; Wang, X.; Li, H.; Li, H. Bipartite consensus for quantization communication multi-agents systems with event-triggered random delayed impulse control. IEEE Trans. Circuits Syst. I-Regul. Pap. 2024. early access. [Google Scholar] [CrossRef]
Ren, H.; Zhang, C.; Ma, H.; Li, H. Cloud-based distributed group asynchronous consensus for switched nonlinear cyber-physical systems. IEEE Trans. Ind. Inform. 2024. early access. [Google Scholar] [CrossRef]
Li, H.; Luo, J.; Ma, H.; Zhou, Q. Observer-based event-triggered iterative learning consensus for locally Lipschitz nonlinear MASs. IEEE Trans. Cogn. Dev. Syst. 2024, 16, 46–56. [Google Scholar] [CrossRef]
Ma, J.; Hu, J. Safe consensus control of cooperative-competitive multi-agent systems via differential privacy. Kybernetika 2023, 58, 426–439. [Google Scholar] [CrossRef]
Wang, J.; Li, Y.; Wu, Y.; Liu, Z.; Chen, K.; Chen, C.L.P. Fixed-time formation control for uncertain nonlinear multiagent systems with time-varying actuator failures. IEEE Trans. Fuzzy Syst. 2024, 32, 1965–1977. [Google Scholar] [CrossRef]
Zhang, J.; Yang, D.; Li, W.; Zhang, H.; Li, G.; Gu, P. Resilient output control of multiagent systems with DoS attacks and actuator faults: Fully distributed event-triggered approach. IEEE Trans. Cybern. 2024, 54, 7681–7690. [Google Scholar] [CrossRef]
Wang, F.; Chen, B.; Sun, Y.; Gao, Y.; Lin, C. Finite-time fuzzy control of stochastic nonlinear systems. IEEE Trans. Cybern. 2020, 50, 2617–2626. [Google Scholar] [CrossRef]
Hua, C.; Li, K.; Guan, X. Decentralized event-triggered control for interconnected time-delay stochastic nonlinear systems using neural networks. Neurocomputing 2018, 272, 270–278. [Google Scholar] [CrossRef]
Ren, C.E.; Zhang, J.; Guan, Y. Prescribed performance bipartite consensus control for stochastic nonlinear multiagent systems under event-triggered strategy. IEEE Trans. Cybern. 2023, 53, 468–482. [Google Scholar] [CrossRef]
Zhu, Y.; Niu, B.; Shang, Z.; Wang, Z.; Wang, H. Distributed adaptive asymptotic consensus tracking control for stochastic nonlinear MASs with unknown control gains and output constraints. IEEE Trans. Autom. Sci. Eng. 2024. early access. [Google Scholar] [CrossRef]
Li, K.; Hua, C.; You, X.; Ahn, C.K. Leader-following consensus control for uncertain feedforward stochastic nonlinear multiagent systems. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 1049–1057. [Google Scholar] [CrossRef] [PubMed]
Ren, H.; Cheng, Z.; Qin, J.; Lu, R. Deception attacks on event-triggered distributed consensus estimation for nonlinear systems. Automatica 2023, 154, 111100. [Google Scholar] [CrossRef]
Huang, C.; Xie, S.; Liu, Z.; Chen, C.L.P.; Zhang, Y. Adaptive inverse optimal consensus control for uncertain high-order multiagent systems with actuator and sensor failures. Inf. Sci. 2022, 605, 119–135. [Google Scholar] [CrossRef]
Guo, X.G.; Wang, B.Q.; Wang, J.L.; Wu, Z.G.; Guo, L. Adaptive event-triggered PIO-based anti-disturbance fault-tolerant control for MASs with process and sensor faults. IEEE Trans. Neural Netw. Learn. Syst. 2024, 11, 77–88. [Google Scholar] [CrossRef]
Zhou, Q.; Ren, Q.; Ma, H.; Chen, G.; Li, H. Model-free adaptive control for nonlinear systems under dynamic sparse attacks and measurement disturbances. IEEE Trans. Circuits Syst. I-Regul. Pap. 2024, 71, 4731–4741. [Google Scholar] [CrossRef]
Wu, Y.; Liu, J.; Wang, Z.; Ju, Z. Distributed resilient tracking of multiagent systems under actuator and sensor faults. IEEE Trans. Cybern. 2023, 53, 4653–4664. [Google Scholar] [CrossRef]
Yu, Y.; Peng, S.; Dong, X.; Li, Q.; Ren, Z. UIF-based cooperative tracking method for multi-agent systems with sensor faults. Sci. China-Inf. Sci. 2018, 62, 10202. [Google Scholar] [CrossRef]
Jiang, M.; Xie, X.; Zhang, K. Finite-time stabilization of stochastic high-order nonlinear systems with FT-SISS inverse dynamics. IEEE Trans. Autom. Control 2019, 64, 313–320. [Google Scholar] [CrossRef]
Sui, S.; Chen, C.L.P.; Tong, S. Fuzzy adaptive finite-time control design for nontriangular stochastic nonlinear systems. IEEE Trans. Fuzzy Syst. 2019, 27, 172–184. [Google Scholar] [CrossRef]
Yan, Z.; Zhang, W.; Zhang, G. Finite-time stability and stabilization of Itô stochastic systems with Markovian switching: Mode-dependent parameter approach. IEEE Trans. Autom. Control 2015, 60, 2428–2433. [Google Scholar] [CrossRef]
Ren, H.; Ma, H.; Li, H.; Wang, Z. Adaptive fixed-time control of nonlinear MASs with actuator faults. IEEE-CAA J. Autom. Sin. 2023, 10, 1252–1262. [Google Scholar] [CrossRef]
Shi, S.; Xu, S.; Liu, W.; Zhang, B. Global fixed-time consensus tracking of nonlinear uncertain multiagent systems with high-order dynamics. IEEE Trans. Cybern. 2020, 50, 1530–1540. [Google Scholar] [CrossRef] [PubMed]
Lu, R.; Wu, J.; Zhan, X.; Yan, H. Practical finite-time and fixed-time containment for second-order nonlinear multi-agent systems with IDAs and Markov switching topology. Neurocomputing 2023, 573, 127180. [Google Scholar] [CrossRef]
Yang, B.; Yan, Z.; Luo, M.; Hu, M. Fixed-time partial component consensus for nonlinear multi-agent systems with/without external disturbances. Commun. Nonlinear Sci. Numer. Simul. 2024, 130, 107732. [Google Scholar] [CrossRef]
Pan, Y.; Chen, Y.; Liang, H. Event-triggered predefined-time control for full-state constrained nonlinear systems: A novel command filtering error compensation method. Sci. China-Technol. Sci. 2024, 67, 2867–2880. [Google Scholar] [CrossRef]
Xin, B.; Cheng, S.; Wang, Q.; Chen, J.; Deng, F. Fixed-time prescribed performance consensus control for multiagent systems with nonaffine faults. IEEE Trans. Fuzzy Syst. 2023, 31, 3433–3446. [Google Scholar] [CrossRef]
Cheng, W.; Zhang, K.; Jiang, B. Fixed-time fault-tolerant formation control for a cooperative heterogeneous multiagent system with prescribed performance. IEEE Trans. Syst. Man Cybern. Syst. 2023, 53, 462–474. [Google Scholar] [CrossRef]
Ke, J.; Huang, W.; Wang, J.; Zeng, J. Fixed-time consensus control for multi-agent systems with prescribed performance under matched and mismatched disturbances. ISA Trans. 2021, 119, 135–151. [Google Scholar] [CrossRef]
Zheng, S.; Ma, H.; Ren, H.; Li, H. Practically fixed-time adaptive consensus control for multiagent systems with prescribed performance. Sci. China-Technol. Sci. 2024, 67, 3867–3876. [Google Scholar] [CrossRef]
Long, S.; Huang, W.; Wang, J.; Liu, J.; Gu, Y.; Wang, Z. A fixed-time consensus control with prescribed performance for multi-agent systems under full-state constraints. IEEE Trans. Autom. Sci. Eng. 2024. early access. [Google Scholar] [CrossRef]
Wen, G.; Chen, C.L.P. Optimized backstepping consensus control using reinforcement learning for a class of nonlinear strict-feedback-dynamic multi-agent systems. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 1524–1536. [Google Scholar] [CrossRef] [PubMed]
Luo, A.; Ma, H.; Ren, H.; Li, H. Estimator-based reinforcement learning consensus control for multiagent systems with discontinuous constraints. IEEE Trans. Neural Netw. Learn. Syst. 2024. early access. [Google Scholar] [CrossRef] [PubMed]
Luo, A.; Zhou, Q.; Ma, H.; Li, H. Observer-based consensus control for MASs with prescribed constraints via reinforcement learning algorithm. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 17281–17291. [Google Scholar] [CrossRef]
Wang, Z.; Wang, X.; Zhao, C. Event-triggered containment control for nonlinear multiagent systems via reinforcement learning. IEEE Trans. Circuits Syst. II-Express Briefs 2023, 70, 2904–2908. [Google Scholar] [CrossRef]
Wen, G.; Ge, S.S.; Chen, C.L.P.; Tu, F.; Wang, S. Adaptive tracking control of surface vessel using optimized backstepping technique. IEEE Trans. Cybern. 2019, 49, 3420–3431. [Google Scholar] [CrossRef]
Wen, G.; Chen, C.L.P.; Ge, S.S. Simplified optimized backstepping control for a class of nonlinear strict-feedback systems with unknown dynamic functions. IEEE Trans. Cybern. 2020, 51, 4567–4580. [Google Scholar] [CrossRef]
Wen, G.; Ge, S.S.; Tu, F. Optimized backstepping for tracking control of strict-feedback systems. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 3850–3862. [Google Scholar] [CrossRef]
Wen, G.; Chen, C.L.P.; Li, B. Optimized formation control using simplified reinforcement learning for a class of multiagent systems with unknown dynamics. IEEE Trans. Ind. Electron. 2020, 67, 7879–7888. [Google Scholar] [CrossRef]
Sun, J.; Ming, Z. Cooperative differential game-based distributed optimal synchronization control of heterogeneous nonlinear multiagent systems. IEEE Trans. Cybern. 2023, 53, 7933–7942. [Google Scholar] [CrossRef]
Cao, L.; Pan, Y.; Liang, H.; Ahn, C.K. Event-based adaptive neural network control for large-scale systems with nonconstant control gains and unknown measurement sensitivity. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 7027–7038. [Google Scholar] [CrossRef]
Cao, L.; Li, H.; Dong, G.; Lu, R. Event-triggered control for multiagent systems with sensor faults and input saturation. IEEE Trans. Syst. Man Cybern. Syst. 2019, 51, 3855–3866. [Google Scholar] [CrossRef]
Li, K.; Li, Y. Fuzzy adaptive optimal consensus fault-tolerant control for stochastic nonlinear multiagent systems. IEEE Trans. Fuzzy Syst. 2022, 30, 2870–2885. [Google Scholar] [CrossRef]
Sun, J.; Zhang, J.; Zhang, H.; Liu, Y. Adaptive virotherapy strategy for organism with constrained input using medicine dosage regulation mechanism. IEEE Trans. Cybern. 2024, 54, 2505–2514. [Google Scholar] [CrossRef] [PubMed]
Liu, D.; Mao, Z.; Jiang, B.; Yan, X.G. Prescribed performance fault-tolerant control for synchronization of heterogeneous nonlinear MASs using reinforcement learning. IEEE Trans. Cybern. 2024, 54, 5451–5462. [Google Scholar] [CrossRef]
Ji, L.; Lin, Z.; Zhang, C.; Yang, S.; Li, J.; Li, H. Data-based optimal consensus control for multiagent systems with time delays: Using prioritized experience replay. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 3244–3256. [Google Scholar] [CrossRef]
Wu, Y.; Su, Y.; Wang, Y.L.; Shi, P. T-S fuzzy data-driven tomfir with application to incipient fault detection and isolation for high-speed rail vehicle suspension systems. IEEE Trans. Intell. Transp. Syst. 2024, 25, 7921–7932. [Google Scholar] [CrossRef]
Wu, Y.; Su, Y.; Shi, P. Data-driven ToMFIR-based incipient fault detection and estimation for high-speed rail vehicle suspension systems. IEEE Trans. Ind. Inform. 2024. early access. [Google Scholar] [CrossRef]

Figure 1. Block diagram of the overall control system.

Figure 2. Directed communication topology graph.

Figure 3. Schematic diagram of

s_{i} (t)

curves [44,46].

Figure 3. Schematic diagram of

s_{i} (t)

curves [44,46].

Figure 4. Schematic diagram of

y_{i}^{f}

and

y_{r}

curves.

Figure 4. Schematic diagram of

y_{i}^{f}

and

y_{r}

curves.

Figure 5. Schematic diagram of

x_{i, 2}

curvies.

Figure 5. Schematic diagram of

x_{i, 2}

curvies.

Figure 6. Schematic diagram of

u_{i} (t)

curvies.

Figure 6. Schematic diagram of

u_{i} (t)

curvies.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, Z.; Cai, X.; Luo, A.; Ma, H.; Xu, S. Reinforcement-Learning-Based Fixed-Time Prescribed Performance Consensus Control for Stochastic Nonlinear MASs with Sensor Faults. Sensors 2024, 24, 7906. https://doi.org/10.3390/s24247906

AMA Style

Wang Z, Cai X, Luo A, Ma H, Xu S. Reinforcement-Learning-Based Fixed-Time Prescribed Performance Consensus Control for Stochastic Nonlinear MASs with Sensor Faults. Sensors. 2024; 24(24):7906. https://doi.org/10.3390/s24247906

Chicago/Turabian Style

Wang, Zhenyou, Xiaoquan Cai, Ao Luo, Hui Ma, and Shengbing Xu. 2024. "Reinforcement-Learning-Based Fixed-Time Prescribed Performance Consensus Control for Stochastic Nonlinear MASs with Sensor Faults" Sensors 24, no. 24: 7906. https://doi.org/10.3390/s24247906

APA Style

Wang, Z., Cai, X., Luo, A., Ma, H., & Xu, S. (2024). Reinforcement-Learning-Based Fixed-Time Prescribed Performance Consensus Control for Stochastic Nonlinear MASs with Sensor Faults. Sensors, 24(24), 7906. https://doi.org/10.3390/s24247906

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Reinforcement-Learning-Based Fixed-Time Prescribed Performance Consensus Control for Stochastic Nonlinear MASs with Sensor Faults

Abstract

1. Introduction

2. Preliminaries and Description

2.1. Graph Theory

2.2. Neural Networks (NNs)

2.3. System Description

3. Adaptive Optimal Consensus Controller Design and Stability Analysis

3.1. Adaptive Optimal Consensus Controller Design

3.2. Stability Analysis

4. Simulation Example

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Appendix A

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI