Approximate Optimal Curve Path Tracking Control for Nonlinear Systems with Asymmetric Input Constraints

Wang, Yajing; Wang, Xiangke; Shen, Lincheng

doi:10.3390/drones6110319

Open AccessArticle

Approximate Optimal Curve Path Tracking Control for Nonlinear Systems with Asymmetric Input Constraints

by

Yajing Wang

,

Xiangke Wang

^* and

Lincheng Shen

College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China

^*

Author to whom correspondence should be addressed.

Drones 2022, 6(11), 319; https://doi.org/10.3390/drones6110319

Submission received: 5 September 2022 / Revised: 20 October 2022 / Accepted: 21 October 2022 / Published: 26 October 2022

(This article belongs to the Section Drone Design and Development)

Download

Browse Figures

Versions Notes

Abstract

This paper proposes an approximate optimal curve-path-tracking control algorithm for partially unknown nonlinear systems subject to asymmetric control input constraints. Firstly, the problem is simplified by introducing a feedforward control law, and a dedicated design for optimal control with asymmetric input constraints is provided by redesigning the control cost function in a non-quadratic form. Then, the optimality and stability of the derived optimal control policy is demonstrated. To solve the underlying tracking Hamilton–Jacobi–Bellman (HJB) equation in consideration of partially unknown systems, an integral reinforcement learning (IRL) algorithm is utilized using the neural network (NN)-based value function approximation. Finally, the effectiveness and generalization of the proposed method is verified by experiments carried out on a high-fidelity hardware-in-the-loop (HIL) simulation system for fixed-wing unmanned aerial vehicles (UAVs) in comparison with three other typical path-tracking control algorithms.

Keywords:

optimal tracking control; asymmetric input constraints; integral reinforcement learning; fixed-wing UAVs

1. Introduction

The optimal tracking control problem (OTCP) is of major importance in a variety of applications for robotic systems such as wheeled vehicles, unmanned ground vehicles (UGVs), unmanned aerial vehicles (UAVs), etc. The aim is to find a control policy to drive the specified system, given a particular reference path to follow in an optimal manner [1,2,3,4,5,6]. The reference paths are generally generated by a separate mission planner according to specific tasks, and optimization is usually achieved by minimizing an objective function regarding energy cost, tracking error cost, and/or the traveling time cost.

With the rapid development of unmanned systems, algorithms to solve OTCPs have been widely studied in the literature. Addressing the OTCPs involves solving the underlying Hamilton–Jacobi–Bellman (HJB) equation. For linear systems, the HJB equation is replaced by the Riccati equation, and the numerical solution is generally available. However, for nonlinear robotic systems subject to asymmetric input constraints, such as fixed-wing UAVs and autonomous underwater vehicles (AUVs) [7,8,9], it is still a challenging issue. To deal with this difficulty while guaranteeing tracking performance for nonlinear systems, various methods have been developed to find approximate optimal control efficacy. One idea is to simplify or transform the objective function to be optimized to obtain a solution to an approximate or equivalent optimal control problem. For instance, nonlinear model predictive control (MPC) is used to obtain a near optimal path-following control law for UAVs by truncating the time horizon and minimizing a finite-horizon tracking objective function in [7,8]. Another idea aims to solve the approximate solution directly. An offline policy iteration (PI) strategy is utilized to obtain the near optimal solution by solving a sequence of Bellman equation iteratively [10]. However, in the abovmentioned methods, the complete dynamics of the system are generally required and the curse of dimensionality might occur. To deal with this issue, an approximate dynamic programming (ADP) scheme was developed and has received increasing interest in the optimal control area [11,12,13].

ADP, which combines the concept of reinforcement learning (RL) and Bellman’s principle of optimality, was first introduced in [11] to handle the curse of dimensionality that might occur in the classical dynamic programming (DP) scheme for solving optimal control problems. The main idea is to approximate the solution to the HJB equation using some parametric function approximation techniques, for which aneural network (NN) is the most commonly used scheme, such as a single-NN based value function approximation and the actor–critic dual-NN structure [14]. For continuous-time nonlinear systems, Ref. [15] proposed a data-based ADP algorithm to relax the dependence on the internal dynamics of the control system, which is also called integral reinforcement learning (IRL), to learn the solution to the HJB equation using only partial knowledge about the system dynamics. After that, the IRL scheme became widely used in various nonlinear optimization control problems, including optimal tracking control, control with input constraints, control with unknown or partially unknown systems, etc. [7,14,15,16].

The IRL-based methods are powerful tools used to solve nonlinear optimal control problems. However, the OTCP for nonlinear systems with partially unknown dynamics and asymmetric input constraints, especially for curve path tracking is still open to study. Firstly, the stability of the IRL-based methods for nonlinear constrained systems are generally hard to prove. Moreover, the changing curvature in the curve-path-tracking control problem makes it more difficult to stabilize the tracking error compared to the widely studied regulation control or circular path-tracking control problems. Moreover, the asymmetric input constraints are more difficult to deal with than commonly discussed symmetric constraints.

Motivated by the desire to solve the OTCP with the curve path for partially unknown nonlinear systems with asymmetric input constraints, this paper introduces a feedforward control law to simplify the problem and redesigns the non-quadratic form control input cost function and utilizes an NN-based IRL scheme to solve an approximate optimal control policy. The three main contributions are:

An approximate optimal curve-path-tracking control policy is developed for nonlinear systems with a feedforward control law, which handles the time-varying dynamics of the reference states caused by the curvature variation, and a data-driven IRL algorithm is developed to solve the approximate optimal control policy, in which a single-NN structure for value function approximation is utilized, reducing the computation burden and simplifying the algorithm structure.
The non-quadratic control cost function is redesigned via a constraint transformation with the introduced feedforward control law, which solves the challenge of asymmetric control input constraints that traditional methods cannot handle directly, and satisfactory input constraints are guaranteed with proof.
The proposed approximate optimal path-tracking control algorithm is validated via hardware-in-the-loop (HIL) simulations for fixed-wing UAVs in comparison with three other typical path-tracking algorithms. The result shows that the proposed algorithm not only has much less fluctuation and smaller root mean squared error (RMSE) of the tracking error but also naturally meets the control input constraints.

2. Problem Formulation

This section briefly formulates the OTCP of nonlinear systems subject to asymmetric control input constraints.

Consider the following affine nonlinear kinematic systems:

{\dot{x}}_{k} (t) = f_{k} (x_{k} (t)) + g_{k} (x_{k} (t)) u (t),

(1)

where

x_{k} \in R^{n_{1}}

is the vector of system motion states that we focus on,

f_{k} (\cdot) : R^{n_{1}} \to R^{n_{1}}

is the internal kinematic dynamics,

g_{k} (\cdot) : R^{n_{1}} \to R^{n_{1} \times m}

is the control input dynamics of system, and

u \in R^{m}

is the control input, which is constrained by

λ_{j}^{m i n} \leq u^{j} \leq λ_{j}^{m a x}, j = 1, \dots, m

(2)

where

λ_{j}^{m i n}

and

λ_{j}^{m a x}

are the minimum and the maximum thresholds of control input

u^{j}

, which are decided by characteristics of the actuator, and not always satisfying

λ_{j}^{m i n} = - λ_{j}^{m a x}

.

Remark 1.

The asymmetric control input constraint (2) is widespread in practical systems, such as fixed-wing UAVs and autonomous underwater vehicles (AUVs) [1,7,8,9,17]. For these systems, existing control algorithms that consider only symmetric input constraints cannot be utilized directly.

This paper studies the OTCP with curve paths for system (1) with input constraint (2). Thus, we focus on the tracking performance of the above motion states

x_{k}

with reference to the reference motion states

x_{k_{d}}

specified by the corresponding virtual target point (VTP)

p_{d}

on the reference path. Then, the considered tracking control system is described as

\{\begin{matrix} {\dot{x}}_{e} & = f_{e} (x_{e}, x_{d}) + g_{e} (x_{e}, x_{d}) u \\ {\dot{x}}_{d} & = f_{d} (x_{d}) \end{matrix},

(3)

where

x_{e} = x_{k} - x_{k_{d}}

describes the tracking error state,

x_{d} = {[x_{k_{d}}^{⊤}, x_{c_{d}}^{⊤}]}^{⊤} \in R^{n_{2}}

represents the bounded state vector related to the reference motion states, not subject to human control,

x_{k_{d}} \in R^{n_{1}}

is the reference motion states, and

x_{c_{d}} \in R^{n_{2} - n_{1}}

describes some other related system variables,

n_{2} - n_{1} \geq 0

. The continuous-time functions,

f_{e} (\cdot)

and

g_{e} (\cdot)

, are internal dynamics and control input dynamics of the tracking error system,

f_{d} (x_{d})

is the dynamics of the reference states and is decided by the task setting. Obviously, the specific form of

f_{e} (\cdot)

and

g_{e} (\cdot)

is closely related to the specific

f_{d} (\cdot)

. For the tracking control problem of system (3), the complete system state is denoted as

x = {[x_{e}^{⊤}, x_{d}^{⊤}]}^{⊤}

. Then there is

x \in R^{n}, n = n_{1} + n_{2}

.

Remark 2.

Suppose that the reference path is generated by a separate mission planner, and

x_{c_{d}}

describes system dynamic parameters determined by the task setting, such as the moving speed of the VTP along the reference path. Then, it is reasonable to suppose that

f_{d} (\cdot)

is known, which describes the shape of the reference path as well as the motion dynamics of the reference point along the path.

Then, in the problem of curve-path-tracking control, given the reference motion state

x_{k_{d}}

corresponding to

p_{d}

, denote the curvature of the reference path at this point as

κ_{d}

, and the speed of the point moving along the path as

v_{d}

. The dynamics of the reference states can be more specifically described as

{\dot{x}}_{d} = f_{d} (x_{d}) = [\begin{matrix} {\dot{x}}_{k_{d}} \\ {\dot{v}}_{d} \\ {\dot{κ}}_{d} \end{matrix}] = [\begin{matrix} f_{k_{d}} (x_{k_{d}}, κ_{d}, v_{d}) \\ f_{v_{d}} (x_{k_{d}}) \\ f_{κ_{d}} (x_{k_{d}}) \end{matrix}] .

(4)

Then the control objective is to find an optimal control policy

u^{*}

that consumes at the least cost to drive the tracking error

x_{e}

to converge to

0

. To this end, take the objective function as

J (x (0), u) = \int_{0}^{\infty} [E (x) + U (u)] d τ, x (0) \in X,

(5)

where

X \subseteq R^{n}

is a compact set containing the origin of the tracking error,

E (x) = x_{e}^{T} (t) Q x_{e} (t)

is the quadratic tracking error cost with the positive definite diagonal matrix

Q

, and

U (u)

is the positive semi-definite control cost to be designed.

Now, referring to the concept in optimal control theory in [18], we define the admissible control for OTCP as follows.

Definition 1.

A control policy

u (t) = μ (x (t))

is said to be admissible, denoted as

u (t) \in U

, with respect to objective function (5) for the tracking control system (3), if

μ (x (t))

is continuous on

X

and satisfies constraints (2), and the corresponding state trajectory

x (t)

makes

J (x (0)) < \infty, \forall x (0) \in X

.

Then, the main objective of this paper is to find the optimal control policy

u^{*} \in U

that minimizes the objective function (5), and before we illustrate the design of solving

u^{*}

, the following assumption is made in this paper.

Assumption 1.

For any initial state

x (0) \in X

, given the dynamic function

f_{d} (\cdot)

of the reference state, there exists an admissible control

u^{(0)} \in U

, i.e.,

u^{(0)}

, which satisfies constraints (2), is continuous to

x

on set

X

, and stabilizes the tracking error in (3).

3. Optimal Control Design for Curve Path Tracking with Asymmetric Control Input Constraints

To find the optimal curve-path-tracking control policy

u^{*}

for system (3), this section first introduces a feedforward control law which helps to deal with the variation of the reference state dynamics. Then a dedicated design for a control cost function, which enables natural satisfactory of the asymmetric input constraint, is proposed (2).

Note that the main difficulty of curve-path-tracking control compared with that of regulation or straight/circular path tracking, is that the dynamics of the reference motion states

x_{k_{d}}

is time-varying because of the varying curvature of the reference path. To drive the tracking error to converge to

0

, when

x_{e} = 0

, it needs

{\dot{x}}_{e} = 0

. The point is, different from regulation control problem, there needs to be a non-zero steady-state control law (denoted as

\bar{u}

) because of the varying dynamics of

x_{k_{d}}

, such that

{\dot{x}}_{e} {(t) |}_{x_{e} = 0} = f_{e} (0, x_{d} (t)) + g_{e} (0, x_{d} (t)) \bar{u} (0, x_{d} (t)) = 0 .

(6)

It is easy to know that this non-zero steady-state control input

\bar{u}

mainly depends on the dynamics of reference states. Therefore, we rewrite the dynamic function of the reference motion state in (4) in the following form:

{\dot{x}}_{k_{d}} = f_{k_{d}} (x_{k_{d}}, κ_{d}, v_{d}) = f_{k} (x_{k_{d}}) + g_{k} (x_{k_{d}}) u_{d} (x_{k_{d}}, κ_{d}, v_{d}) .

(7)

Importing (7) and (1) into (6), we can obtain

\bar{u} (0, x_{d}) = u_{d} (x_{k_{d}}, κ_{d}, v_{d}) .

Then for

x_{e} \neq 0

, we extend the above result to define the feedforward control

\bar{u} (x)

as

\bar{u} (x | x_{d}) = u_{d} (x_{k_{d}}, κ_{d}, v_{d}) .

(8)

Remark 3.

The rewriting of (7) is reasonable for practical robotic systems since the reference state as well as the associated constraint conditions are well-concerned by the separate mission planner and will be illustrated by examples in later experiments.
The feedforward control law $\bar{u}$ here is not an admissible control policy, which cannot drive a non-zero tracking error to $0$ , but is to be taken as a part of the control policy for the tracking control system.

Now, this paper explains how to solve the desired optimal tracking control strategy

u^{*}

that satisfies the asymmetric control input constraint (2) in a simplified way by using

\bar{u}

.

Given the dynamic function

f_{d} (\cdot)

of reference states,

\bar{u}

can be obtained in real time according to (8). Then, the complete tracking control policy can be described as

u ≜ \bar{u} + \tilde{u},

where

\tilde{u}

is the feedback control to be solved. Importing

\tilde{u}

into the tracking error state equation in (3) generates

\begin{matrix} {\dot{x}}_{e} & = f_{e} (x_{e}, x_{d}) + g_{e} (x_{e}, x_{d}) (\bar{u} + \tilde{u}) \\ = {\bar{f}}_{e} (x_{e}, x_{d}) + g_{e} (x_{e}, x_{d}) \tilde{u}, \end{matrix}

(9)

where

{\bar{f}}_{e} (x_{e}, x_{d}) = f_{e} (x_{e}, x_{d}) + g_{e} (x_{e}, x_{d}) \bar{u} .

Thus it holds that

{\bar{f}}_{e} (0, x_{d}) = 0

. Then, to solve the optimal control policy

u^{*}

is actually equivalent to solve the optimal feedback control

{\tilde{u}}^{*} ≜ u^{*} - \bar{u}

.

Therefore, in the consideration of control input constraint (2), referring to [10,16,19], the control cost in (5) is designed as

U (u) = \tilde{U} (\tilde{u}) = 2 \sum_{j = 1}^{m} \int_{0}^{{\tilde{u}}^{j}} {\tilde{λ}}_{j} r_{j} {tanh}^{- 1} (s / {\tilde{λ}}_{j}) d s,

(10)

which is a semi-positive definite function, and the greater the absolute value of the control input component

{\tilde{u}}^{j}

, the greater the function value. So, being a part of the objective function, it can help to find an energy-optimal solution, and

r_{j} > 0

is the weight coefficient with reference to component j. The main difference of (10) compared with that in [10,16,19] is that the threshold parameter in the integrand, i.e.,

{\tilde{λ}}_{j} \geq 0

, is not a constant directly obtained from a symmetric control constraint but redefined for the asymmetric constraint (2) with the introduced feedforward control law as

{\tilde{λ}}_{j} = \{\begin{matrix} - (λ_{j}^{m i n} - {\bar{u}}^{j}), & if {\tilde{u}}^{j} < 0 . \\ λ_{j}^{m a x} - {\bar{u}}^{j}, & if {\tilde{u}}^{j} \geq 0 . \end{matrix}

(11)

This design allows for the natural satisfaction of the asymmetric control input constraint 2, which will be illustrated later in Lemma 1.

Then for tracking control system (3) subject to asymmetric control input constraint (2), given an initial state

x (0) \in X

and the objective fuction (5) with (10), we define the optimal value function

V^{*} (x) \in C^{1}

as

V^{*} (x (t)) = min_{u \in U} J (x (t), u) = min_{\tilde{u} | \bar{u} + \tilde{u} \in U} J (x (t), \tilde{u}) .

(12)

Correspondingly, the Hamiltonian is constructed as

\begin{matrix} H (x, \tilde{u}, V^{*}) & = E (x) + \tilde{U} (\tilde{u}) + {(\nabla V^{*})}^{⊤} \dot{x} \\ = E (x) + \tilde{U} (\tilde{u}) + {(\nabla_{x_{e}} V^{*})}^{⊤} [{\bar{f}}_{e} (x_{e}, x_{d}) + g_{e} (x_{e}, x_{d}) \tilde{u}], \end{matrix}

where

\nabla V^{*} = \frac{\partial V^{*}}{\partial x}, \nabla_{x_{e}} V^{*} = \frac{\partial V^{*}}{\partial x_{e}}

. Then, according to the principle of optimality,

{\tilde{u}}^{*}

satisfies

\begin{matrix} H (x, {\tilde{u}}^{*}, V^{*}) & = E (x) + \tilde{U} ({\tilde{u}}^{*}) + {(\nabla_{x_{e}} V^{*})}^{⊤} [{\bar{f}}_{e} (x_{e}, x_{d}) + g_{e} (x_{e}, x_{d}) {\tilde{u}}^{*}] \\ = 0 . \end{matrix}

(13)

Then using the stationary condition, the optimal feedback control

{\tilde{u}}^{*}

can be obtained as

{\tilde{u}}^{*} = - Λ tanh [\frac{1}{2} {(Λ R)}^{- 1} g_{e}^{⊤} (x) \nabla_{x_{e}} V^{*}],

(14)

where

Λ, R \in R^{m \times m}

are diagonal matrices constructed by

{\tilde{λ}}_{j}

and

r_{j}, (j \in {1, \dots, m})

, respectively, i.e.,

Λ = [\begin{matrix} {\tilde{λ}}_{1} & \dots & 0 \\ 0 & ⋱ & 0 \\ 0 & \dots & {\tilde{λ}}_{m} \end{matrix}],

R = [\begin{matrix} r_{1} & \dots & 0 \\ 0 & ⋱ & 0 \\ 0 & \dots & r_{m} \end{matrix}] .

Then the optimal tracking control policy

u^{*} ≜ \bar{u} + {\tilde{u}}^{*}

is

u^{*} = \bar{u} - Λ tanh [\frac{1}{2} {(Λ R)}^{- 1} g_{e}^{⊤} (x) \nabla_{x_{e}} V^{*}] .

(15)

Importing

{\tilde{u}}^{*}

into (10), we can obtain the optimal control cost

\begin{matrix} \tilde{U} ({\tilde{u}}^{*}) & = 2 \sum_{j = 1}^{m} \int_{0}^{{\tilde{u}}^{j^{*}}} {\tilde{λ}}_{j} r_{j} {({tanh}^{- 1} (s / {\tilde{λ}}_{j}))}^{T} d s \\ = {(\nabla_{x_{e}} V^{*})}^{⊤} g_{e} (x) Λ tanh (D^{*}) + {diag}^{⊤} (Λ R Λ) ln (1_{2} - {tanh}^{2} (D^{*})), \end{matrix}

(16)

where

D^{*} = 1 / 2 {(Λ R)}^{- 1} g_{e}^{⊤} (x) \nabla_{x_{e}} V^{*}

,

{diag}^{⊤} (\cdot)

represents the vector constructed by the matrix main diagonal elements,

1_{2} = {(1, 1)}^{⊤}

.

Further, importing (16) into (13), the tracking HJB equation turns into

E (x) + {(\nabla_{x_{e}} V^{*})}^{⊤} {\bar{f}}_{e} (x_{e}, x_{d}) + {diag}^{⊤} (Λ R Λ) ln (1_{2} - {tanh}^{2} (D^{*})) = 0 .

(17)

Then if one can obtain the solution

V^{*}

by solving (17), (15) would provide the desired optimal tracking control policy.

Now we propose the following lemma.

Lemma 1.

With the non-quadratic control cost function (10), the optimal control policy

u^{*}

in (15) satisfies the asymmetric constraint (2) naturally.

Proof.

Under Assumption 1, there exists an admissible control

u^{(0)} \in U

such that

λ_{j}^{m i n} \leq u^{j^{(0)}} \leq λ_{j}^{m a x} .

(18)

Denote

u^{(0)}

as

u^{(0)} = {\tilde{u}}^{(0)} + \bar{u} .

Since

u^{(0)}

is an admissible control law, according to Definition 1, there must be

\begin{matrix} {\dot{x}}_{e} {(t) |}_{x_{e} = 0} & = f_{e} (0, x_{d} (t)) + g_{e} (0, x_{d} (t)) u^{(0)} (t) \\ = f_{e} (0, x_{d} (t)) + g_{e} (0, x_{d} (t)) [{\tilde{u}}^{(0)} + \bar{u}] \\ = 0 + g_{e} (0, x_{d} (t)) {\tilde{u}}^{(0)} \\ = 0 . \end{matrix}

(19)

Thus we have

{\tilde{u}}^{(0)} (0, x_{d}) = 0 .

Putting

{\tilde{u}}^{(0)}

into (18) generates

λ_{j}^{m i n} \leq {\bar{u}}^{j} (0, x_{d}) \leq λ_{j}^{m a x} .

Then according to definition of

{\tilde{λ}}_{j}

in (11) and the extended feedforward control defined in (8), we have

{\tilde{λ}}_{j} \geq 0 .

Since

- 1 \leq tanh (\cdot) \leq 1

, according to (14) and (11), the feedback control

{\tilde{u}}^{*}

satisfies

\{\begin{matrix} {\tilde{u}}^{j^{*}} \geq - {\tilde{λ}}_{j} = λ_{j}^{m i n} - {\bar{u}}^{j}, & when {\tilde{u}}^{j^{*}} < 0 . \\ {\tilde{u}}^{j^{*}} \leq {\tilde{λ}}_{j} = λ_{j}^{m a x} - {\bar{u}}^{j}, & when {\tilde{u}}^{j^{*}} \geq 0 . \end{matrix}

(20)

Then combining (20) with (15), we have

λ_{j}^{m i n} \leq u^{j^{*}} \leq λ_{j}^{m a x} .

This completes the proof. □

Next, the following theorem provides the optimality and stability analysis of

u^{*}

.

Theorem 1.

For tracking control system (3), given the dynamics function

f_{d} (\cdot)

of the reference state, initial state

x (0) \in X

and the objective function (5) with (10), assume

V^{*}

is a smooth positive definite solution to (17), then the optimal control policy given by (15) has the following properties:

$\forall u \in U$ , $u^{*}$ minimizes objective function $J (x (0), u)$ ;
$u^{*}$ stabilizes the tracking error $x_{e}$ gradually.

Proof.

First, we prove that

u^{*}

minimizes the objective function J.

Given the initial state

x (0)

and the solution of HJB equation (17) as

V^{*}

, it holds that

\int_{0}^{\infty} {\dot{V}}^{*} (x (t)) d t = - V^{*} (x (0)) .

(21)

Thus for any admissible control

u = \bar{u} + \tilde{u}

, the corresponding objective function (5) can be represented as

J (x (0), u) = \int_{0}^{\infty} [E (x) + \tilde{U} (\tilde{u})] d t + \int_{0}^{\infty} {\dot{V}}^{*} (x (t)) d t + V^{*} (x (0)) .

(22)

Deriving

V^{*}

alone the state trajectory corresponding to

u_{i}

, we have

{\dot{V}}^{*} (x (t)) = \frac{\partial V^{*}}{\partial x} \dot{x} = {(\nabla_{x_{e}} V^{*})}^{⊤} [{\bar{f}}_{e} (x_{e}, x_{d}) + g_{e} (x_{e}, x_{d}) \tilde{u}],

and

\begin{matrix} J (x (0), u) = & \int_{0}^{\infty} [E (x) + \tilde{U} (\tilde{u})] d τ \\ + \int_{0}^{\infty} {(\nabla_{x_{e}} V^{*})}^{⊤} [{\bar{f}}_{e} (x_{e}, x_{d}) + g_{e} (x_{e}, x_{d}) \tilde{u}] d τ + V^{*} (x (0)) . \end{matrix}

(23)

By adding and subtracting

\int_{0}^{\infty} {(\nabla_{x_{e}} V^{*})}^{⊤} g_{e} (x) {\tilde{u}}^{*} d τ

and

\int_{0}^{\infty} \tilde{U} ({\tilde{u}}^{*}) d τ

to the right side of the equation, it generates

\begin{matrix} J (x (0), u) \\ = & \int_{0}^{\infty} [E (x) + \tilde{U} ({\tilde{u}}^{*})] d τ + \int_{0}^{\infty} {(\nabla_{x_{e}} V^{*})}^{⊤} [{\bar{f}}_{e} (x_{e}, x_{d}) + g_{e} (x_{e}, x_{d}) {\tilde{u}}^{*}] d τ + V^{*} (x (0)) \\ + \int_{0}^{\infty} {(\nabla_{x_{e}} V^{*})}^{⊤} g_{e} (x_{e}, x_{d}) (\tilde{u} - {\tilde{u}}^{*}) d τ + \int_{0}^{\infty} \tilde{U} (\tilde{u}) d τ - \int_{0}^{\infty} \tilde{U} ({\tilde{u}}^{*}) d τ . \end{matrix}

(24)

Then combine (24) with HJB equation (13), we further obtain

\begin{matrix} J (x (0), u) \\ = & \int_{0}^{\infty} H (x, {\tilde{u}}^{*}, V^{*}) d τ + V^{*} (x (0)) + \int_{0}^{\infty} {(\nabla_{x_{e}} V^{*})}^{⊤} g_{e} (x_{e}, x_{d}) (\tilde{u} - {\tilde{u}}^{*}) d τ \\ + \int_{0}^{\infty} \tilde{U} (\tilde{u}) d τ - \int_{0}^{\infty} \tilde{U} ({\tilde{u}}^{*}) d τ \\ = & V^{*} (x (0)) + \int_{0}^{\infty} [{(\nabla_{x_{e}} V^{*})}^{⊤} g_{e} (x) (\tilde{u} - {\tilde{u}}^{*}) + 2 \sum_{j = 1}^{m} \int_{{\tilde{u}}^{j^{*}}}^{{\tilde{u}}^{j}} {\tilde{λ}}_{j} r_{j} {tanh}^{- 1} (s / {\tilde{λ}}_{j}) d s] d τ . \end{matrix}

(25)

Denote that

M = {(\nabla_{x_{e}} V^{*})}^{⊤} g_{e} (x) (\tilde{u} - {\tilde{u}}^{*}) + 2 \sum_{j = 1}^{m} \int_{{\tilde{u}}^{j^{*}}}^{{\tilde{u}}^{j}} {\tilde{λ}}_{j} r_{j} {tanh}^{- 1} (s / {\tilde{λ}}_{j}) d s .

(26)

Then to prove that

u^{*}

minimizes J, one needs to prove that

M > 0

for all admissible control

u \neq u^{*}

, and that

M = 0

if and only if

u = u^{*}

.

Based on (14), there is

{(\nabla_{x_{e}} V^{*})}^{⊤} g_{e} (x) = - 2 Λ R {tanh}^{- 1} ({(Λ^{- 1} {\tilde{u}}^{*})}^{⊤}) .

(27)

Then importing (27) to M, we obtain

\begin{matrix} M & = 2 {[Λ R {tanh}^{- 1} (Λ^{- 1} {\tilde{u}}^{*})]}^{⊤} ({\tilde{u}}^{*} - \tilde{u}) + 2 \sum_{j = 1}^{m} \int_{{\tilde{u}}^{j^{*}}}^{{\tilde{u}}^{j}} {\tilde{λ}}_{j} r_{j} {tanh}^{- 1} (s / {\tilde{λ}}_{j}) d s \\ = 2 \sum_{j = 1}^{m} r_{j} [{\tilde{λ}}_{j} {tanh}^{- 1} ({\tilde{u}}^{j^{*}} / {\tilde{λ}}_{j}) ({\tilde{u}}^{j^{*}} - {\tilde{u}}^{j}) + \int_{{\tilde{u}}^{j^{*}}}^{{\tilde{u}}^{j}} {\tilde{λ}}_{j} {tanh}^{- 1} (s / {\tilde{λ}}_{j}) d s] . \end{matrix}

(28)

To help to analyze, define a function

ς_{a} (x_{1}, x_{2})

as

ς_{a} (x_{1}, x_{2}) = a {tanh}^{- 1} (x_{1} / a) (x_{1} - x_{2}) + \int_{x_{1}}^{x_{2}} a {tanh}^{- 1} (s / a) d s,

where

a > 0

,

- a \leq x_{1}, x_{2} \leq a

. Since

{tanh}^{- 1} (\cdot)

increases monotonically, when

x_{1} < x_{2}

, there must be a

\hat{x} \in (x_{1}, x_{2})

, such that

a {tanh}^{- 1} (\hat{x} / a) (x_{2} - x_{1}) = \int_{x_{1}}^{x_{2}} a {tanh}^{- 1} (s / a) d s,

and that

{tanh}^{- 1} (\hat{x} / a) > {tanh}^{- 1} (x_{1} / a)

. Then importing

{tanh}^{- 1} (\hat{x} / a)

into

ς_{a} (x_{1}, x_{2})

generates

\begin{matrix} ς_{a} (x_{1}, x_{2}) & = a {tanh}^{- 1} (x_{1} / a) (x_{1} - x_{2}) + a {tanh}^{- 1} (\hat{x} / a) (x_{2} - x_{1}) \\ = a ({tanh}^{- 1} (\hat{x} / a) - {tanh}^{- 1} (x_{1} / a)) (x_{2} - x_{1}) \\ > 0 . \end{matrix}

(29)

Likewise, when

x_{1} > x_{2}

, there must also be a

\hat{x} \in (x_{2}, x_{1})

, such that

a {tanh}^{- 1} (\hat{x} / a) (x_{1} - x_{2}) = - \int_{x_{1}}^{x_{2}} a {tanh}^{- 1} (s / a) d s,

and that

{tanh}^{- 1} (x_{1} / a) > {tanh}^{- 1} (\hat{x} / a)

. Then importing

{tanh}^{- 1} (\hat{x} / a)

into

ς_{a} (x_{1}, x_{2})

we have

\begin{matrix} ς_{a} (x_{1}, x_{2}) & = a {tanh}^{- 1} (x_{1} / a) (x_{1} - x_{2}) - a {tanh}^{- 1} (\hat{x} / a) (x_{1} - x_{2}) \\ = a ({tanh}^{- 1} (x_{1} / a) - {tanh}^{- 1} (\hat{x} / a)) (x_{1} - x_{2}) \\ > 0 . \end{matrix}

(30)

Further, when

x_{1} = x_{2}

, it holds that

ς_{a} (x_{1}, x_{2}) = 0

. That is,

ς_{a} (x_{1}, x_{2}) = 0

only when

x_{1} = x_{2}

, and

ς_{a} (x_{1}, x_{2}) > 0

, when

x_{1} \neq x_{2}

.

Combining the above conclusion with (28), M can be represented as

M = 2 \sum_{j = 1}^{m} r_{j} ς_{{\tilde{λ}}_{j}} ({\tilde{u}}^{j^{*}}, {\tilde{u}}^{j}) .

(31)

Then, there is

\{\begin{matrix} M > 0 & if \exists j \in {1, \dots, m}, s . t . {\tilde{u}}^{j} \neq {\tilde{u}}^{j^{*}} . \\ M = 0 & if \forall j \in {1, \dots, m}, s . t . {\tilde{u}}^{j} = {\tilde{u}}^{j^{*}} . \end{matrix}

(32)

Therefore,

J (x (0), u) \geq V^{*} (x (0))

holds for all

u \in U

, in which the = holds only when

u = u^{*} ≜ \bar{u} + {\tilde{u}}^{*}

.

Next, we prove that the tracking error

x_{e}

is gradually stabilized with

u^{*}

.

Note that

V^{*} (x)

is a positive semi-definite function. Take

V^{*} (x)

as the Lyapunov function of the tracking control system (3), then there is

{\dot{V}}^{*} (x (t)) = - x_{e}^{⊤} Q x_{e} - \tilde{U} ({\tilde{u}}^{*}) \leq 0 .

(33)

It is known from the proof of Lemma 1 that

{\tilde{u}}^{*} (0, x_{d}) = 0

, then the “=’’ in (33) holds only if

x_{e} = 0

. Thus,

u^{*}

gradually stabilizes

x_{e}

.

This completes the proof. □

4. IRL-Based Approximate Optimal Solution

The last section provides the design of the optimal tracking control policy

u^{*}

. However, to solve

u^{*}

involves solving the HJB Equation (17), which is highly nonlinear to

V^{*}

. In consideration of the difficulty in solving (17), this section provides an NN-based IRL algorithm to obtain an approximate optimal solution.

With the optimal value function denoted as

V^{*}

, the following integral form of the value function is taken according to the idea of IRL:

V^{*} (x (t)) = \int_{t}^{t + T} [x_{e}^{⊤} Q x_{e} + \tilde{U} ({\tilde{u}}^{*})] d τ + V^{*} (x (t + T)),

(34)

where the integral reinforcement interval

T > 0

.

Then the IRL-based PI Algorithm 1 is presented as follows.

Algorithm 1 IRL-based optimal path-tracking algorithm

1:: Policy evaluation: NN weights update

$V^{(k)} (x (t)) = \int_{t}^{t + T} [x_{e}^{⊤} Q x_{e} + \tilde{U} ({\tilde{u}}^{(k)})] d τ + V^{(k)} (x (t + T)) .$

(35)
2:: Policy improvement:

$u^{(k + 1)} = - Λ tanh (D^{(k)}) + \bar{u},$

(36)

where $D^{(k)} = 1 / 2 {(Λ R)}^{- 1} g_{e}^{⊤} (x) \nabla_{x_{e}} V^{(k)}$ .

Remark 4.

Equation (34) is equivalent to the HJB equation (17) in the way that (34) and (17) have the same positive definite solution

V^{*}

, and according to the result of traditional PI algorithm, given an initial admissible control

u^{(0)}

, then for all

k \geq 0

, iteratively solving (35) for

V^{(k)}

, there always exists an admissible control

u^{(k + 1)}

with (36), and when

k \to \infty

,

u^{(k)}

and

V^{(k)}

uniformly converge to

u^{*}

and

V^{*}

[10,20].

To implement Algorithm 1, this paper introduces a single-layer NN with p neurons to approximate the value function:

V^{(k)} (x) = {(W_{c}^{(k)})}^{⊤} σ (x) + ε (x),

(37)

and

\nabla V^{(k)} (x) = {(\nabla σ (x))}^{⊤} W_{c}^{(k)} + \nabla ε (x),

(38)

where

W_{c}^{(k)} \in R^{p}

is the optimal weight vector to approximate

V^{(k)}

,

σ (\cdot) : R^{n} \to R^{p}

is the vector of continuously differentiable bounded basis functions, and

ε

is the approximation error. Then, according to work in [10], when the number of neurons

p \to \infty

, the fitting error

ε

would be close to 0, and [21] points out that, even when the number of neurons is limited, the fitting error is still bounded. Therefore,

ε

and

\nabla ε

are bounded over the compact set

X

, i.e., there exist constants

b_{ε} > 0

and

b_{ε}^{^{'}} > 0

such that

| ε (x) | \leq b_{ε}

,

∥ \nabla ε (x) ∥ \leq b_{ε}^{^{'}}

.

Putting (37) into (35), we obtain the tracking Bellman error as

ε_{c}^{(k)} (t) = \int_{t}^{t + T} [x_{e}^{⊤} Q x_{e} + \tilde{U} ({\tilde{u}}^{(k)})] d τ + {(W_{c}^{(k)})}^{⊤} Δ σ (x (t)),

(39)

where

Δ σ (x (t)) = σ (x (t + T)) - σ (x (t))

. Then, there exists a positive constant

ε_{m a x}

such that

| ε_{c}^{(k)} (t) | \leq ε_{m a x}, \forall t \geq 0

.

Since the optimal weight vector

W_{c}^{(k)}

in (37) is unknown, the value function is approximated in the iteration as

{\hat{V}}^{(k)} (x) = {({\hat{W}}_{c}^{(k)})}^{⊤} σ (x),

(40)

where

{\hat{W}}_{c}^{(k)}

is the estimation of

W_{c}^{(k)}

. Then, the estimation of

ε_{c}^{(k)} (t)

is

{\hat{e}}_{c}^{(k)} (t) = \int_{t}^{t + T} [x_{e}^{⊤} Q x_{e} + \tilde{U} ({\tilde{u}}^{(k)})] d τ + {({\hat{W}}_{c}^{(k)})}^{⊤} Δ σ (x (t)) .

(41)

To find the best weight vector

W_{c}^{(k)}

of

V^{(k)}

, the tuning law of the weight estimation

{\hat{W}}_{c}^{(k)}

should minimize the estimated Bellman error

{\hat{e}}_{c}^{(k)}

. Utilizing the gradient decent scheme and considering the objective function

E_{c} = \frac{1}{2} {({\hat{e}}_{c}^{(k)})}^{2}

, we take the tuning law for the weight vector as

{\dot{\hat{W}}}_{c}^{(k)} = - α_{c} δ \frac{\partial E_{c}}{\partial {\hat{W}}_{c}^{(k)}} = - α_{c} \frac{Δ σ (x)}{{(Δ σ^{T} (x) Δ σ (x) + 1)}^{2}} {\hat{e}}_{c}^{(k)},

(42)

where

α_{c} > 0

is the learning rate, and

δ = \frac{1}{{(Δ σ^{T} (x) Δ σ (x) + 1)}^{2}}

is used for normalization [16]. Then, taking the sampling period as equal to the integral reinforcement interval T, after each N sampling period, the NN weights of online IRL-based PI for the approximate tracking control policy after

k^{t h}

iterations is updated by

{\hat{W}}_{c}^{(k + 1)} = {\hat{W}}_{c}^{(k)} - α_{c} \frac{1}{N} \sum_{j = 0}^{N - 1} \frac{Δ σ_{j} (x)}{{(Δ σ_{j}^{⊤} (x) Δ σ_{j} (x) + 1)}^{2}} {\hat{e}}_{c_{j}}^{(k)} .

Importing

{\hat{W}}_{c}^{(k + 1)}

into (36), we obtain the improved control policy

{\hat{u}}^{(k + 1)} = - Λ tanh ({\hat{D}}^{(k + 1)}) + \bar{u},

(43)

where

{\hat{D}}^{(k + 1)} = 1 / 2 {(Λ R)}^{- 1} g_{e}^{⊤} (x) {(\nabla σ (x))}^{⊤} {\hat{W}}_{c}^{(k + 1)}

. Then, given an initial approximated weight

{\hat{W}}_{c}^{(0)}

corresponding to an admissible initial control

u^{(0)}

, the online IRL-based PI can be performed as in Figure 1.

Remark 5.

Let

u^{(0)}

be any admissible bounded control policy in the algorithm in Figure 1, and take (42) as the tuning law of the critic NN weights. If

Δ \bar{σ} = Δ σ (x) / (Δ σ^{T} (x) Δ σ (x) + 1)

is persistently exciting (PE), i.e., if there exist

γ_{1} > 0

and

γ_{2} > 0

such that

\forall t > 0

γ_{1} I \leq \int_{t}^{t + T} Δ \bar{σ} Δ {\bar{σ}}^{T} d τ \leq γ_{2} I,

(44)

where

I

is the unit matrix, then for the bounded reconstruction error

ε_{c}^{(k)}

in (41), the critic weight estimation error

{\tilde{W}}_{c}^{(k)} = W_{c}^{(k)} - {\hat{W}}_{c}^{(k)}

converges exponentially fast to a residual set [13,14,15].

5. Application to Fixed-Wing UAVs

This section verifies the proposed method on the OTCP for fixed-wing UAVs curve path tracking in HIL simulations in comparison with three other typical path tracking algorithms.

5.1. Problem Formulation

The system state of fixed-wing UAVs denoted by

x_{k} = {(x, y, ψ)}^{⊤}

includes the position of the UAV in the inertial system

p = {(x, y)}^{⊤}

and the heading angle

ψ

. The control input

u

comprises the airspeed

u_{v}

and heading rate

u_{ω}

, which are constrained by

\begin{matrix} v_{s t a l l} \leq & u_{v} \leq v_{m a x}, \\ - ω_{m a x} \leq & u_{ω} \leq ω_{m a x}, \end{matrix}

(45)

where

v_{s t a l l} > 0

is the minimum stall speed,

v_{m a x}

, and

ω_{m a x}

are the maximum speed and heading rate, respectively, determined by executor features.

Given the VTP

p_{d} (t)

at time t, the corresponding reference motion state

x_{k_{d}} (t) = {(x_{d}, y_{d}, ψ_{d})}^{⊤}

is then designated. Let the VTP move at a constant speed

v_{d}

along the reference path. Then, denote the curve length of one point with reference to the start point along the path as l. Given the parameterized function

Q (l)

of the reference path, the curvature

κ_{d}

at

p_{d} (t)

can be calculated. Then, the reference state dynamics are obtained:

{\dot{x}}_{d} = [\begin{matrix} {\dot{x}}_{d} \\ {\dot{y}}_{d} \\ {\dot{ψ}}_{d} \\ {\dot{v}}_{d} \\ {\dot{κ}}_{d} \end{matrix}] = [\begin{matrix} 0 \\ 0 \\ 0 \\ 0 \\ f_{κ_{d}} (x_{k_{d}} | Q (l)) \end{matrix}] + [\begin{matrix} cos ψ_{d} & 0 \\ sin ψ_{d} & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{matrix}] [\begin{matrix} v_{d} \\ v_{d} κ_{d} \end{matrix}] .

(46)

Then, the feedforward control law is

\bar{u} = [\begin{matrix} v_{d} \\ v_{d} κ_{d} \end{matrix}] .

Define the tracking error in a local Frenet–Serret coordinate system

{F}

as

x_{e} = {(x_{e}, y_{e}, ψ_{e})}^{⊤}

[22,23]. Then, let

Q = I_{3 \times 3}

and

R = I_{2 \times 2}

. The goal is to solve the optimal control

u^{*}

that minimizes the objective function (5) with (10).

5.2. Approximate Optimal Control Policy Learning

This subsection utilizes the proposed method to find an approximate optimal policy for OTCP of fixed-wing UAVs formulated in the last subsection.

The learning process is carried out on Matlab 2018. Table 1 presents the parameter settings, and the nonlinear kinematics of fixed-wing UAVs is modeled by

{\dot{x}}_{k} = [\begin{matrix} \dot{x} \\ \dot{y} \\ \dot{ψ} \end{matrix}] = [\begin{matrix} cos ψ & 0 \\ sin ψ & 0 \\ 0 & 1 \end{matrix}] [\begin{matrix} u_{v} \\ u_{ω} \end{matrix}] .

(47)

Given coordinates of five waypoints, the reference curve path is generated using the third order B-spline curve algorithm (See Figure 2a). Given the reference state of the start point on the reference path, the initial state of the UAV is randomly chosen within

x_{e} (0), y_{e} (0) \in [- 50, 50], ψ_{e} (0) \in [- π, π]

. The basis for the value function approximation is selected as

σ = {[\begin{matrix} x_{e} & y_{e} & ψ_{e} & x_{e} y_{e} & x_{e} ψ_{e} & y_{e} ψ_{e} & x_{e}^{2} & y_{e}^{2} & ψ_{e}^{2} \end{matrix}]}^{⊤} .

(48)

The value function NN weights are initialized as

{\hat{W}}_{c}^{(0)} = {[\begin{matrix} 0.1 & 0.1 & - 0.5 & 0.1 & 0.1 & 0.1 & 0.1 & 0.1 & 0.5 \end{matrix}]}^{⊤},

(49)

which corresponds to an admissible but non-optimal control policy

u^{(0)}

. Given the initial NN weights and the corresponding admissible initial control policy, the tracking data are collected online, and the NN weights are updated once a batch of a specific amount of data is collected according to the flow in Figure 1.

The iterative process of critic NN weight estimates are provided in Figure 2b, which converges to a steady value in 23 steps, and the final NN weights are

{\hat{W}}_{c}^{(23)} = {[\begin{matrix} 0.329 & 0.002 & 0.574 & 0.102 & 0.074 & 0.120 & 0.100 & 0.103 & 3.204 \end{matrix}]}^{⊤},

(50)

which provides an approximate optimal path-tracking control policy for fixed-wing UAVs. In the process of policy training, we found that

w_{3}

and

w_{9}

demonstrate stronger oscillation compared with other NN weights, which is also presented in Figure 2b. This is because both of the corresponding activation functions are one-variable functions of the heading angle error

ψ_{e}

, which is set to be within

[- π, π]

during the training, and the value ranges of

x_{e}

and

y_{e}

are set to be

[- 300, 300]

. Thus, there is no unified metric for the three components. As a result, the weight of the activation function would be much more sensitive to variation of the approximated function value.

5.3. HIL Simulation Test and Result Analysis

To fully validate the effectiveness of the proposed method on OTCP of fixed-wing UAVs, the learned control policy was tested on a high-fidelity HIL simulation system in comparison with three other typical path-tracking algorithms [5]: the pure pursuit and line of sight algorithm (PLOS), the nonlinear Lyapunov guidance method (NLGL), and the backstepping control method (BS). The HIL simulation system consists of a swarm control station, the host computer, a Pixhawk autopilot, a QGround Control, and an X-Plane aircraft simulator. Specifically, the swarm control station, which is used to give task instructions and displays the current status of the system, was developed by the authors’ team. The host computer was used to simulate the onboard computer of the physical aircraft, which receives and processes task instructions from the control station and state information from onboard sensors and generates and sends control commands to Pixhawk. The Pixhawk autopilot is a widely-used open-source autopilot, and it processes and generates control commands for the underlying actuators and collects and sends back the sensor data. The X-Plane aircraft simulator, which is a high-fidelity aircraft simulator, provides the physical engine and dynamics simulation of the UAV, and the QGround Control performs as an information transfer station between X-Plane and Pixhawk (see Figure 3 for the flow of control commands and sate information).

Note that:

The reference path in HIL simulations shown in Figure 4a is generated by QGround Control with eight waypoints (provided in Table 2) on an experimental airport, which is different from that used for policy learning and has larger curvature changes.
Speed constraints in the aircraft simulator during the test were $10 \leq u_{v} \leq 18$ , different from settings in policy learning (which is the same as a practical UAV platform).

In spite of the abovementioned differences between settings of policy learning process and HIL simulation, the learned control policy provided satisfying tracking performance in the comparison HIL simulation. The path-tracking trajectories are presented in Figure 4b, which shows that all of the four algorithms can stably track the reference curve path. Figure 5 and Figure 6 further show the heading and the cross-tracking errors of the four algorithms. From these two figures, we can see that the learned control policy using the proposed method leads to a smooth curve-path-tracking trajectory with a small lateral steady-state tracking error and near zero heading and forward steady-state tracking errors. Moreover, heading tracking errors of BS, PLOS, and NLGL, the forward tracking error of BS, and the lateral tracking error of PLOS and NLGL, show significant fluctuations compared with the proposed method, especially when the UAV moves up to the corners of the reference path. This is because the heading tracking error and the information of curvature variation of the reference path are not considered in the three algorithms. Therefore, the three algorithms cannot achieve a satisfactory curve-path-tracking control performance as in the straight-line and circular path-tracking control problems, and the proposed method can provide more stable and smooth tracking performance. Figure 6 also shows that both PLOS and NLGL algorithms have a significant steady-state forward error. The main reason is that, the tracking performance of the two algorithms is very dependent on the update rule of the VTP, which is required to be updated ahead of a distance before the UAV’s arrival, and the algorithms would fail to track the path if this distance is not large enough (such as, smaller than about 20 m). Finally, Figure 7 provides the control input using the provided method, which verifies that the input constraints are naturally satisfied instead of being forcibly cut down during the whole path-tracking period.

6. Conclusions

This paper developed an approximate optimal control scheme for OTCP of nonlinear systems with asymmetric input constraints. Especially, the difficulty brought by the varying curvature of the curve reference path is handled by introducing a feedforward control law. The effectiveness was verified in a high-fidelity HIL system for fixed-wing UAVs. The result confirmed the effectiveness and generalization of the learned control policy and indicates the capability of ADP theory in complicated nonlinear systems. Future work will study the robust control of such control systems under external disturbance.

Author Contributions

Conceptualization, Y.W. and X.W.; methodology, Y.W.; software, Y.W.; validation, Y.W. and X.W.; formal analysis, Y.W.; investigation, Y.W.; resources, X.W. and L.S.; data curation, Y.W.; writing—original draft preparation, Y.W.; writing—review and editing, Y.W. and X.W.; visualization, Y.W.; supervision, L.S. and X.W.; project administration, X.W. and L.S.; funding acquisition, X.W. and Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Natural Science Foundation of China grant number 61973309; Natural Science Foundation of Hunan Province grant number 2021JJ10053 and Hunan Provincial Innovation Foundation for Postgraduate grant number CX20210009.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

Yang, J.; Liu, C.; Coombes, M.; Yan, Y.; Chen, W.H. Optimal Path Following for Small Fixed-Wing UAVs under Wind Disturbances. IEEE Trans. Control Syst. Technol. 2021, 29, 996–1008. [Google Scholar] [CrossRef]
Kang, J.G.; Kim, T.; Kwon, L.; Kim, H.D.; Park, J.S. Design and Implementation of a UUV Tracking Algorithm for a USV. Drones 2022, 6, 66. [Google Scholar] [CrossRef]
Ratnoo, A.; Sujit, P.B.; Kothari, M. Adaptive Optimal Path Following for High Wind Flights. IFAC Proc. Vol. 2011, 44, 12985–12990. [Google Scholar] [CrossRef]
Lin, F.; Chen, Y.; Zhao, Y.; Wang, S. Path Tracking of Autonomous Vehicle Based on Adaptive Model Predictive Control. Int. J. Adv. Robot. Syst. 2019, 16, 1–12. [Google Scholar] [CrossRef]
Sujit, P.B.; Saripalli, S.; Sousa, J.B. Unmanned Aerial Vehicle Path Following: A Survey and Analysis of Algorithms for Fixed-Wing Unmanned Aerial Vehicles. IEEE Control Syst. Mag. 2014, 34, 42–59. [Google Scholar]
Chen, S.; Chen, H.; Negrut, D. Implementation of MPC-Based Path Tracking for Autonomous Vehicles Considering Three Vehicle Dynamics Models with Different Fidelities. Automot. Innov. 2020, 3, 386–399. [Google Scholar] [CrossRef]
Rucco, A.; Aguiar, A.P.; Pereira, F.L.; de Sousa, J.B. A Predictive Path-Following Approach for Fixed-Wing Unmanned Aerial Vehicles in Presence of Wind Disturbances. Adv. Intell. Syst. Comput. 2016, 417, 623–634. [Google Scholar] [CrossRef]
Alessandretti, A.; Aguiar, A.P. A Planar Path-Following Model Predictive Controller for Fixed-Wing Unmanned Aerial Vehicles. In Proceedings of the 11th International Workshop on Robot Motion and Control (RoMoCo), Wasowo, Poland, 3–5 July 2017; pp. 59–64. [Google Scholar] [CrossRef]
Chen, H.; Cong, Y.; Wang, X.; Xu, X.; Shen, L. Coordinated Path-Following Control of Fixed-Wing Unmanned Aerial Vehicles. IEEE Trans. Syst. Man Cybern. Syst. 2021, 52, 2540–2554. [Google Scholar] [CrossRef]
Abu-Khalaf, M.; Lewis, F.L. Nearly Optimal Control Laws for Nonlinear Systems with Saturating Actuators Using a Neural Network HJB Approach. Automatica 2005, 41, 779–791. [Google Scholar] [CrossRef]
Powell, W.B. Approximate Dynamic Programming; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2007. [Google Scholar] [CrossRef]
Yang, X.; He, H.; Liu, D.; Zhu, Y. Adaptive Dynamic Programming for Robust Neural Control of Unknown Continuous-Time Non-Linear Systems. IET Control Theory Appl. 2017, 11, 2307–2316. [Google Scholar] [CrossRef]
Jiang, H.; Zhang, H.; Luo, Y.; Han, J. Neural-Network-Based Robust Control Schemes for Nonlinear Multiplayer Systems with Uncertainties via Adaptive Dynamic Programming. IEEE Trans. Syst. Man Cybern. Syst. 2019, 49, 579–588. [Google Scholar] [CrossRef]
Vamvoudakis, K.G.; Lewis, F.L. Online Actor-Critic Algorithm to Solve the Continuous-Time Infinite Horizon Optimal Control Problem. Automatica 2010, 46, 878–888. [Google Scholar] [CrossRef]
Vrabie, D.; Lewis, F. Neural Network Approach to Continuous-Time Direct Adaptive Optimal Control for Partially Unknown Nonlinear Systems. Neural Netw. 2009, 22, 237–246. [Google Scholar] [CrossRef] [PubMed]
Modares, H.; Lewis, F.L. Optimal Tracking Control of Nonlinear Partially-Unknown Constrained-Input Systems Using Integral Reinforcement Learning. Automatica 2014, 50, 1780–1792. [Google Scholar] [CrossRef]
Yan, J.; Yu, Y.; Wang, X. Distance-Based Formation Control for Fixed-Wing UAVs with Input Constraints: A Low Gain Method. Drones 2022, 6, 159. [Google Scholar] [CrossRef]
Lewis, F.L.; Vrabie, D.L.; Syrmos, V.L. Optimal Control, 3rd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2012. [Google Scholar] [CrossRef]
Adhyaru, D.M.; Kar, I.N.; Gopal, M. Bounded Robust Control of Nonlinear Systems Using Neural Network–Based HJB Solution. Neural Comput. Appl. 2010, 20, 91–103. [Google Scholar] [CrossRef]
Liu, D.; Yang, X.; Li, H. Adaptive Optimal Control for a Class of Continuous-Time Affine Nonlinear Systems with Unknown Internal Dynamics. Neural Comput. Appl. 2012, 237, 2012, 23, 1843–1850. [Google Scholar] [CrossRef]
Hornik, K.; Stinchcombe, M.; White, H. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Netw. 1990, 3, 551–560. [Google Scholar] [CrossRef]
Aguiar, A.P.; Hespanha, J.P.; Kokotović, P.V. Performance Limitations in Reference Tracking and Path Following for Nonlinear Systems. Automatica 2008, 44, 598–610. [Google Scholar] [CrossRef]
Wang, Y.; Wang, X.; Zhao, S.; Shen, L. Vector Field Based Sliding Mode Control of Curved Path Following for Miniature Unmanned Aerial Vehicles in Winds. J. Syst. Sci. Complex. 2018, 31, 302–324. [Google Scholar] [CrossRef]

Figure 1. The flowchart of the online integral reinforcement learning (IRL)-based policy iteration algorithm for approximate optimal tracking control policy.

Figure 2. The reference path for policy learning and the neural network (NN) weights iteration.

Figure 3. The high-fidelity hardware-in-the-loop (HIL) simulation system.

Figure 4. The reference path and tracking trajectories in HIL simulation tests: (a) reference path; (b) tracking trajectory.

Figure 5. The heading error comparison.

Figure 6. The cross-tracking error and the root mean squared error comparison.

Figure 7. The control input using the proposed method.

Table 1. Parameter settings for optimal policy learning.

Symbol	Value	Meaning
$v_{d}$ (m/s)	19	the cruising speed of the UAV
$v_{s t a l l}$ (m/s)	14	the minimum stall speed of the UAV
$v_{m a x}$ (m/s)	24	the maximum speed of the UAV
$ω_{m a x}$ (rad/s)	0.6	the maximum heading rate

Table 2. Waypoints of the reference path in HIL simulation tests.

	WP 1	WP 2	WP 3	WP 4
Latitude	34.0245	34.0288	34.0240	34.0198
Longitude	113.7068	113.7032	113.7012	113.7040
	WP 5	WP 6	WP 7	WP 8
Latitude	34.0282	34.0247	34.0212	34.0245
Longitude	113.7096	113.7114	113.7106	113.7068

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, Y.; Wang, X.; Shen, L. Approximate Optimal Curve Path Tracking Control for Nonlinear Systems with Asymmetric Input Constraints. Drones 2022, 6, 319. https://doi.org/10.3390/drones6110319

AMA Style

Wang Y, Wang X, Shen L. Approximate Optimal Curve Path Tracking Control for Nonlinear Systems with Asymmetric Input Constraints. Drones. 2022; 6(11):319. https://doi.org/10.3390/drones6110319

Chicago/Turabian Style

Wang, Yajing, Xiangke Wang, and Lincheng Shen. 2022. "Approximate Optimal Curve Path Tracking Control for Nonlinear Systems with Asymmetric Input Constraints" Drones 6, no. 11: 319. https://doi.org/10.3390/drones6110319

APA Style

Wang, Y., Wang, X., & Shen, L. (2022). Approximate Optimal Curve Path Tracking Control for Nonlinear Systems with Asymmetric Input Constraints. Drones, 6(11), 319. https://doi.org/10.3390/drones6110319

Article Menu

Approximate Optimal Curve Path Tracking Control for Nonlinear Systems with Asymmetric Input Constraints

Abstract

1. Introduction

2. Problem Formulation

3. Optimal Control Design for Curve Path Tracking with Asymmetric Control Input Constraints

4. IRL-Based Approximate Optimal Solution

5. Application to Fixed-Wing UAVs

5.1. Problem Formulation

5.2. Approximate Optimal Control Policy Learning

5.3. HIL Simulation Test and Result Analysis

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI