Integral Reinforcement Learning-Based Stochastic Guaranteed Cost Control for Time-Varying Systems with Asymmetric Saturation Actuators
Abstract
1. Introduction
- 1. This study develops a novel dynamic event-triggered control (DETC)-based guaranteed cost control (GCC) approach for stochastic systems by combining integral reinforcement learning (IRL) algorithms with the multivariate probabilistic collocation method (MPCM). The approach ensures that the system performance index remains below a prescribed upper bound.
- 2. By designing a modified long-term performance cost function and solving the resulting improved Hamilton–Jacobi–Isaacs (HJI) equation, the control inputs under asymmetric saturation actuators (ASAs) are obtained via an actor–critic–disturbance neural network (NN) structure.
- 3. By introducing dynamic parameters into the triggering condition of the DETC mechanism, the control-input update rule is adjusted dynamically, which reduces the sampling and computational burden; a minimal illustrative sketch of such a dynamic triggering rule follows this list.
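The precise triggering condition is developed in Section 5.1; as a rough illustration of the idea only, the sketch below implements a generic dynamic event-triggering rule in which an internal variable eta relaxes a static threshold on the gap between the current state and the last sampled state. The gains alpha, beta, lam, and sigma, and the threshold form itself, are hypothetical placeholders rather than the paper's condition.

```python
import numpy as np

def simulate_dynamic_trigger(x_traj, dt, alpha=0.5, beta=1.0, lam=2.0, sigma=0.1):
    """Generic dynamic event-triggering rule (illustrative only).

    A new control update is triggered when the sampling error e = x - x_hat
    violates a threshold relaxed by a dynamic variable eta:
        trigger if  beta*||e||^2 > sigma*||x||^2 + eta/lam,
    with eta evolving as  d(eta)/dt = -alpha*eta + sigma*||x||^2 - beta*||e||^2.
    All gains here are placeholders, not the values used in the paper.
    """
    eta = 1.0                          # dynamic variable, initialized positive
    x_hat = x_traj[0].copy()           # last sampled (held) state
    trigger_times = [0]
    for k, x in enumerate(x_traj[1:], start=1):
        e = x - x_hat
        err, mag = beta * (e @ e), sigma * (x @ x)
        if err > mag + eta / lam:      # dynamic condition violated -> sample now
            x_hat = x.copy()
            trigger_times.append(k)
            err = 0.0                  # sampling error is reset at a trigger
        eta += dt * (-alpha * eta + mag - err)   # internal dynamic-variable update
        eta = max(eta, 0.0)            # keep the relaxation term nonnegative
    return trigger_times
```

In this generic form, driving eta to zero recovers a static event-triggered condition, so the dynamic variable can only enlarge the inter-event times and hence reduce the number of control updates.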
2. Problem Statement
3. Stochastic Optimal GCC Design
4. Stochastic GCC Method Design via MPCM and IRL Algorithm
4.1. On-Policy GCC Design
| Algorithm 1 Model-based GCC algorithm for stochastic uncertain system (1). |
| Initialization: Set the initial admissible control and disturbance policies.
Step 1: Select a set of sampling points for the uncertain variable on the basis of the MPCM ([40], Section II), and compute the value of the cost function at each sampling point.
Step 2: Compute the expected value by taking the mean over the sampling points.
Step 3: Update the value function by solving the HJI equation.
Step 4: Design the control pair from the updated value function, and update the iteration index s to s + 1. If the change between two successive iterations is smaller than a small positive threshold, stop at Step 4; otherwise, return to Step 1. |
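The equations of Algorithm 1 are given in the paper and are not reproduced here; the Python skeleton below only mirrors the computational flow of the steps: evaluate the cost at the MPCM sampling points of the uncertain variable, average the samples to approximate the expectation, update the value function, improve the control pair, and repeat until successive iterations agree within a small tolerance. The callables eval_cost, update_value, and update_policy are hypothetical placeholders for the paper's HJI-based updates.

```python
import numpy as np

def mpcm_policy_iteration(eval_cost, update_value, update_policy,
                          mpcm_points, mpcm_weights=None,
                          policy0=None, tol=1e-4, max_iter=100):
    """Computational pattern of Algorithm 1 (illustrative skeleton only).

    eval_cost(policy, q)  -> value sample at MPCM sampling point q        (Step 1)
    update_value(v_mean)  -> value-function update from the averaged data (Step 3)
    update_policy(value)  -> improved control/disturbance pair            (Step 4)
    mpcm_points / mpcm_weights are the sampling points (and optional weights)
    produced by the MPCM; the three callables stand in for the paper's updates.
    """
    policy, value_prev = policy0, None
    for _ in range(max_iter):
        # Steps 1-2: evaluate at each sampling point, then average
        samples = np.array([eval_cost(policy, q) for q in mpcm_points])
        v_mean = np.average(samples, axis=0, weights=mpcm_weights)
        # Step 3: value-function update from the averaged evaluation
        value = update_value(v_mean)
        # Step 4: policy improvement and convergence test
        policy = update_policy(value)
        if value_prev is not None and np.max(np.abs(value - value_prev)) < tol:
            break
        value_prev = value
    return policy, value
```

Because the MPCM selects a small number of strategically placed sampling points that reproduce the statistics of the uncertain variable, the averaging in Steps 1–2 is far cheaper than exhaustive Monte Carlo evaluation.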
4.2. IRL-Based GCC Design with Asymmetric Constrained Inputs
| Algorithm 2 IRL-based GCC algorithm for system (1) with asymmetric constrained control. |
| Initialization: Set the initial admissible control and disturbance policies.
Step 1: Choose a set of sampling points for the uncertain variable based on the MPCM ([40], Section II), and compute the corresponding value at each sampling point.
Step 2: Compute the expected value by taking the mean over the sampling points.
Step 3: Update the value function and the control and disturbance policies by solving the HJI equation, and update s to s + 1. If the change between two successive iterations is smaller than a chosen positive threshold, terminate at Step 3; otherwise, return to Step 1. |
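The modified cost function and the resulting constrained controls are specified in Section 4.2; as orientation only, the sketch below shows a common way of encoding asymmetric saturation actuators in constrained-input IRL/ADP designs: the actor output is passed through a shifted, scaled tanh so that it stays in [u_min, u_max], and a matching non-quadratic integrand penalizes inputs near the bounds. The function names, the scipy dependency, and the specific mapping are assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.integrate import quad

def asymmetric_saturation(v, u_min, u_max):
    """Map an unconstrained actor output v into the asymmetric interval
    [u_min, u_max] via a shifted, scaled tanh (a standard construction,
    not necessarily the exact mapping adopted in the paper)."""
    half_range = (u_max - u_min) / 2.0
    center = (u_max + u_min) / 2.0
    return half_range * np.tanh((v - center) / half_range) + center

def nonquadratic_input_cost(u, u_min, u_max, r=1.0):
    """Illustrative non-quadratic input penalty 2*r*Int_0^u sat^{-1}(s) ds,
    the usual device that keeps the HJI-derived control inside its bounds.
    Assumes 0 lies strictly inside (u_min, u_max); the clip is a numerical
    guard near the saturation limits."""
    half_range = (u_max - u_min) / 2.0
    center = (u_max + u_min) / 2.0

    def inv_sat(s):
        # inverse of the shifted/scaled tanh saturation above
        z = np.clip((s - center) / half_range, -0.999999, 0.999999)
        return half_range * np.arctanh(z) + center

    val, _ = quad(inv_sat, 0.0, u)
    return 2.0 * r * val
```

For hypothetical bounds such as u_min = -0.8 and u_max = 1, the penalty grows steeply as u approaches either limit, which is what discourages the learned policy from driving the actuator into saturation.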
5. Event-Triggered Construction of Optimal GCC
5.1. Event-Triggered GCC Design
5.2. NN-Based Control Design
5.3. Stability Analysis
6. Simulation
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Xie, M.; Shakoor, A.; Wu, Z.; Jiang, B. Optical manipulation of biological cells with a robot-tweezers system: A stochastic control approach. IEEE Trans. Circuits Syst. II Express Briefs 2020, 67, 3232–3236. [Google Scholar] [CrossRef]
- Bazmohammadi, N.; Tahsiri, A.; Anvari-Moghaddam, A.; Guerrero, J.M. Stochastic predictive control of multi-microgrid systems. IEEE Trans. Ind. Appl. 2019, 55, 5311–5319. [Google Scholar] [CrossRef]
- Dai, M.; Wu, C.; Wen, J. Vehicle longitudinal stochastic control for connected and automated vehicle platooning in highway systems. IEEE Trans. Intell. Transp. Syst. 2025, 26, 9563–9578. [Google Scholar] [CrossRef]
- Liu, J.; Xu, J.; Zhang, H.; Fu, M. Stochastic LQ optimal control with initial and terminal constraints. IEEE Trans. Autom. Control 2024, 69, 6261–6268. [Google Scholar]
- Sun, H.Y.; Mu, H.R.; Fu, S.J.; Han, H.G. Data-driven model predictive control for unknown nonlinear NCSs with stochastic sampling intervals and successive packet dropouts. IEEE Trans. Cybern. 2025, 55, 2899–2909. [Google Scholar] [CrossRef]
- Xu, J.; Xie, L.; Zhang, H. Solution to discrete-time linear FBSDEs with application to stochastic control problem. IEEE Trans. Autom. Control 2017, 62, 6602–6607. [Google Scholar]
- Li, Y.; Voos, H.; Darouach, M.; Hua, C. An application of linear algebra theory in networked control systems: Stochastic cyber-attacks detection approach. IMA J. Math. Control Inf. 2016, 33, 1081–1102. [Google Scholar] [CrossRef]
- Cetinkaya, A.; Kishida, M. Instabilizability conditions for continuous-time stochastic systems under control input constraints. IEEE Control Syst. Lett. 2021, 6, 1430–1435. [Google Scholar] [CrossRef]
- Chatterjee, D.; Hokayem, P.; Lygeros, J. Stochastic receding horizon control with bounded control inputs: A vector space approach. IEEE Trans. Autom. Control 2011, 56, 2704–2710. [Google Scholar]
- Nguyen, X.P.; Dang, X.K.; Do, V.D.; Corchado, J.M.; Truong, H.N. Robust adaptive fuzzy-free fault-tolerant path planning control for a semi-submersible platform dynamic positioning system with actuator constraints. IEEE Trans. Intell. Transp. Syst. 2023, 24, 12701–12715. [Google Scholar]
- Wang, F.; Xie, X.; Zhou, C. Locally expanded constraint-boundary-based adaptive composite control of a constrained nonlinear system with time-varying actuator fault. IEEE Trans. Fuzzy Syst. 2023, 31, 4121–4136. [Google Scholar] [CrossRef]
- Sun, W.; Diao, S.; Su, S.F.; Sun, Z.Y. Fixed-time adaptive neural network control for nonlinear systems with input saturation. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 1911–1920. [Google Scholar] [CrossRef]
- Zhang, F.; Song, M.; Huang, B.; Huang, P. Adaptive tracking control for tethered aircraft systems with actuator nonlinearities and output constraints. IEEE Trans. Aerosp. Electron. Syst. 2024, 60, 3582–3597. [Google Scholar] [CrossRef]
- Gao, Z.; Zhang, Y.; Guo, G. Adaptive fixed-time sliding mode control of vehicular platoons with asymmetric actuator saturation. IEEE Trans. Veh. Technol. 2023, 72, 8409–8423. [Google Scholar] [CrossRef]
- Guo, G.; Zhang, P. Asymptotic stabilization of USVs with actuator dead-zones and yaw constraints based on fixed-time disturbance observer. IEEE Trans. Veh. Technol. 2019, 69, 302–316. [Google Scholar] [CrossRef]
- Wang, D.; Gao, N.; Liu, D.; Li, J.; Lewis, F.L. Recent progress in reinforcement learning and adaptive dynamic programming for advanced control applications. IEEE/CAA J. Autom. Sin. 2023, 11, 18–36. [Google Scholar] [CrossRef]
- Zhang, Y.; Wang, Y.; Cai, Y. Value iteration-based distributed adaptive dynamic programming for multi-player differential game with incomplete information. IEEE/CAA J. Autom. Sin. 2025, 12, 436–447. [Google Scholar] [CrossRef]
- Wei, Q.; Yang, Z.; Su, H.; Wang, L. Online adaptive dynamic programming for optimal self-learning control of VTOL aircraft systems with disturbances. IEEE Trans. Autom. Sci. Eng. 2022, 21, 343–352. [Google Scholar] [CrossRef]
- Wei, Q.; Chen, W.; Tan, X.; Xiao, J.; Dong, Q. Observer-based optimal backstepping security control for nonlinear systems using reinforcement learning strategy. IEEE Trans. Cybern. 2024, 54, 7011–7023. [Google Scholar] [CrossRef]
- Ming, Z.; Zhang, H.; Li, W.; Luo, Y. Neurodynamic programming and tracking control for nonlinear stochastic systems by PI algorithm. IEEE Trans. Circuits Syst. II Express Briefs 2022, 69, 2892–2896. [Google Scholar] [CrossRef]
- Li, J.; Yang, M.; Lewis, F.L.; Zheng, M. Compensator-based self-learning: Optimal operational control for two-time-scale systems with input constraints. IEEE Trans. Ind. Inform. 2024, 20, 9465–9475. [Google Scholar] [CrossRef]
- Shi, H.; Gao, W.; Jiang, X.; Su, C.; Li, P. Two-dimensional model-free Q-learning-based output feedback fault-tolerant control for batch processes. Comput. Chem. Eng. 2024, 182, 108583. [Google Scholar] [CrossRef]
- Pang, B.; Jiang, Z.P. Reinforcement learning for adaptive optimal stationary control of linear stochastic systems. IEEE Trans. Autom. Control 2022, 68, 2383–2390. [Google Scholar] [CrossRef]
- Zhang, K.; Peng, Y. Model-free tracking control for linear stochastic systems via integral reinforcement learning. IEEE Trans. Autom. Sci. Eng. 2025, 22, 10835–10844. [Google Scholar] [CrossRef]
- Zhang, H.; Qu, Q.; Xiao, G.; Cui, Y. Optimal guaranteed cost sliding mode control for constrained-input nonlinear systems with matched and unmatched disturbances. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 2112–2126. [Google Scholar] [CrossRef]
- Liu, T.; Jiang, Z.P. Event-based control of nonlinear systems with partial state and output feedback. Automatica 2015, 53, 10–22. [Google Scholar] [CrossRef]
- Lu, J.; Han, L.; Wei, Q.; Wang, X.; Dai, X.; Wang, F.Y. Event-triggered deep reinforcement learning using parallel control: A case study in autonomous driving. IEEE Trans. Intell. Veh. 2023, 8, 2821–2831. [Google Scholar] [CrossRef]
- Zhang, G.; Zhu, Q. Event-triggered optimized control for nonlinear delayed stochastic systems. IEEE Trans. Circuits Syst. I Regul. Pap. 2021, 68, 3808–3821. [Google Scholar] [CrossRef]
- Zhang, G.; Liang, C.; Zhu, Q. Adaptive fuzzy event-triggered optimized consensus control for delayed unknown stochastic nonlinear multi-agent systems using simplified ADP. IEEE Trans. Autom. Sci. Eng. 2025, 22, 11780–11793. [Google Scholar] [CrossRef]
- Xue, S.; Zhang, W.; Luo, B.; Liu, D. Integral reinforcement learning-based dynamic event-triggered nonzero-sum games of USVs. IEEE Trans. Cybern. 2025, 55, 1706–1716. [Google Scholar] [CrossRef]
- Ming, Z.; Zhang, H.; Tong, X.; Yan, Y. Mixed H2/H∞ control with dynamic event-triggered mechanism for partially unknown nonlinear stochastic systems. IEEE Trans. Autom. Sci. Eng. 2022, 20, 1934–1944. [Google Scholar]
- Tong, X.; Ma, D.; Wang, R.; Xie, X.; Zhang, H. Dynamic event-triggered-based integral reinforcement learning algorithm for frequency control of microgrid with stochastic uncertainty. IEEE Trans. Consum. Electron. 2023, 69, 321–330. [Google Scholar]
- Zhu, H.Y.; Li, Y.X.; Tong, S. Dynamic event-triggered reinforcement learning control of stochastic nonlinear systems. IEEE Trans. Fuzzy Syst. 2023, 31, 2917–2928. [Google Scholar] [CrossRef]
- Liu, M.; Wan, Y.; Lewis, F.L. Adaptive optimal decision in multi-agent random switching systems. IEEE Control Syst. Lett. 2019, 4, 265–270. [Google Scholar]
- Liu, T.; Qin, Z.; Hong, Y.; Jiang, Z.P. Distributed optimization of nonlinear multiagent systems: A small-gain approach. IEEE Trans. Autom. Control 2021, 67, 676–691. [Google Scholar]
- Liang, Y.; Zhang, H.; Zhang, J.; Ming, Z. Event-triggered guarantee cost control for partially unknown stochastic systems via explorized integral reinforcement learning strategy. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 7830–7844. [Google Scholar]
- Yuan, R.; Ma, J.; Su, P.; Dong, Y.; Cheng, J. Monte-Carlo integration models for multiple scattering based optical wireless communication. IEEE Trans. Commun. 2019, 68, 334–348. [Google Scholar] [CrossRef]
- Wang, J.; Gao, X.; Cao, R.; Sun, Z. A multilevel Monte Carlo method for performing time-variant reliability analysis. IEEE Access 2021, 9, 31773–31781. [Google Scholar] [CrossRef]
- Xie, J.; Wan, Y.; Mills, K.; Filliben, J.J.; Lewis, F.L. A scalable sampling method to high-dimensional uncertainties for optimal and reinforcement learning-based controls. IEEE Control Syst. Lett. 2017, 1, 98–103. [Google Scholar]
- Zhou, Y.; Wan, Y.; Roy, S.; Taylor, C.; Wanke, C.; Ramamurthy, D.; Xie, J. Multivariate probabilistic collocation method for effective uncertainty evaluation with application to air traffic flow management. IEEE Trans. Syst. Man Cybern. Syst. 2014, 44, 1347–1363. [Google Scholar]
- Liu, M.; Wan, Y.; Lewis, F.L.; Lopez, V.G. Adaptive optimal control for stochastic multiplayer differential games using on-policy and off-policy reinforcement learning. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 5522–5533. [Google Scholar] [CrossRef]
- Jin, Z. Global asymptotic stability analysis for autonomous optimization. IEEE Trans. Autom. Control 2025, 70, 6953–6960. [Google Scholar] [CrossRef]
- Jin, Z.; Li, H.; Qin, Z.; Wang, Z. Gradient-free cooperative source-seeking of quadrotor under disturbances and communication constraints. IEEE Trans. Ind. Electron. 2024, 72, 1969–1979. [Google Scholar] [CrossRef]
- Shi, K.; Tang, Y.; Zhong, S.; Yin, C.; Huang, X.; Wang, W. Nonfragile asynchronous control for uncertain chaotic Lurie network systems with Bernoulli stochastic process. Int. J. Robust Nonlinear Control 2018, 28, 1693–1714. [Google Scholar] [CrossRef]
- Cui, X.; Zhang, H.; Luo, Y.; Jiang, H. Adaptive dynamic programming for H∞ tracking design of uncertain nonlinear systems with disturbances and input constraints. Int. J. Adapt. Control Signal Process. 2017, 31, 1567–1583. [Google Scholar] [CrossRef]
- Zhang, H.; Cui, X.; Luo, Y.; Jiang, H. Finite-horizon H∞ tracking control for unknown nonlinear systems with saturating actuators. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 1200–1212. [Google Scholar] [PubMed]
- Sahoo, A.; Jagannathan, S. Stochastic optimal regulation of nonlinear networked control systems by using event-driven adaptive dynamic programming. IEEE Trans. Cybern. 2016, 47, 425–438. [Google Scholar] [CrossRef]
- Yasini, S.; Naghibi Sitani, M.B.; Kirampor, A. Reinforcement learning and neural networks for multi-agent nonzero-sum games of nonlinear constrained-input systems. Int. J. Mach. Learn. Cybern. 2016, 7, 967–980. [Google Scholar] [CrossRef]
| Parameter | R | T | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Value | 1 | 0.005 | 5 | 0.5 | −0.8 | 1 | 0.5 | 10 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
