Article

Inducing Optimality in Prescribed Performance Control for Uncertain Euler–Lagrange Systems

by
Christos Vlachos
1,
Ioanna Malli
2,
Charalampos P. Bechlioulis
1,* and
Kostas J. Kyriakopoulos
2
1
Department of Electrical and Computer Engineering, University of Patras, Rio, 26504 Patras, Greece
2
School of Mechanical Engineering, National Technical University of Athens, 15772 Athens, Greece
*
Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(21), 11923; https://doi.org/10.3390/app132111923
Submission received: 20 September 2023 / Revised: 29 October 2023 / Accepted: 29 October 2023 / Published: 31 October 2023
(This article belongs to the Section Mechanical Engineering)

Abstract
The goal of this paper is to find a stabilizing and optimal control policy for a class of systems dictated by Euler–Lagrange dynamics, that also satisfies predetermined response criteria. The proposed methodology builds upon two stages. Initially, a neural network is trained online via an iterative process to capture the system dynamics, which are assumed to be unknown. Subsequently, a successive approximation algorithm is applied, employing the acquired dynamics from the previous step, to find a near-optimal control law that takes into consideration prescribed performance specifications, such as convergence speed and steady-state error. In addition, we concurrently guarantee that the system evolves exclusively within the compact set for which sufficient approximation capabilities have been acquired. Finally, we validate our claims through various simulated studies that confirm the success of both the identification process and the minimization of the cost function.

1. Introduction

Finding an optimal control solution for Euler–Lagrange systems can be challenging, owing to the nonlinear nature of the Hamilton–Jacobi–Bellman (HJB) equation. This complication is further aggravated by the absence of accurate knowledge of the system dynamics. Towards this direction, multiple methods have been proposed in the recent related literature. Adaptive dynamic programming (ADP) [1] is a prominent class of such algorithms, which evolved from the field of dynamic programming [2] and shares many common tools with the area of reinforcement learning [3]. Owing to their data-driven nature, ADP algorithms that are based on actor–critic schemes and the broad use of neural networks are free from both the “curse of dimensionality” and the “curse of modeling”, which plague traditional dynamic programming techniques. The ADP algorithm that was first introduced by Werbos in [4] for the adaptive approximation of the Bellman equation has evolved beyond its original application to discrete-time systems [5,6], to include algorithms for continuous time. The ever-increasing prominence of neural network programming has further contributed to the success of this particular field of optimal control, exploiting neural networks’ ability to function as universal approximators [7]. The existing literature on the subject is rich, tackling problems such as stabilization [8,9], regulation [10,11] and tracking [12,13,14,15,16].
The regulation problem is examined in [10], aiming to find a near-optimal online tracker using a policy iteration technique. The solutions to the regulation and HJB equations are successively approximated via neural networks. In a more recent work on the regulation problem [11], a value iteration algorithm is proposed, which ensures the convergence rate of the tracking error through the selection of an appropriate gain, free from the hurdle of finding an appropriate initial admissible policy. Regarding ADP methods for tracking problems, in [13], an optimal tracking controller with guaranteed ultimately bounded tracking error is proposed; however, the employed method requires exact knowledge of the system dynamics, a condition that is often hard to satisfy in practice. Optimal tracking control with completely unknown dynamics is proposed in [14], where the ADP-based controller is used along with a steady-state controller to shape the tracking error both in the transient and the steady state. Similarly, in [15], an online adaptive tracking controller is designed for completely unknown dynamics by merging a steady-state and an optimal controller. Another approach to the nonlinear optimal control problem, which differs from the conventional actor–critic networks, introduces an identifier module that is tasked with learning the system dynamics. In [17], this structure is utilized for a partially unknown system, whereas in [15] an algorithm with an identifier–critic structure is applied to a system with completely unknown dynamics. In this technique, the use of the actor is rendered obsolete, reducing the computational burden.

2. State of the Art and Contributions

A common factor in the aforementioned works is that none of them take into consideration performance specifications, such as convergence speed and steady-state error. One way to achieve this is through the incorporation of the prescribed performance control (PPC) technique [18]. The PPC scheme has been utilized for Euler–Lagrange systems with unknown dynamics in several works [19,20,21]. In [19], a sliding-mode control technique is employed along with a novel prescribed performance function to guarantee finite-time stability. In [20], the saturated PPC problem is addressed by using a controller that utilizes an adaptive multilayer neural network of adjustable weights to compensate for state-dependent uncertainties of the system. Furthermore, ref. [21] presents a reinforcement learning-based control scheme that is able to optimize the trade-off between performance and control cost. Nevertheless, note that optimality is only addressed in the latter.
In this paper, we examine the optimal regulation problem and propose an algorithm that approximates the solution to the HJB equation without assuming any a priori knowledge of the system dynamics. In contrast to the vast majority of previous related works, the corresponding control policy is not only optimal with respect to the chosen cost function, but also satisfies prescribed performance criteria that have been set by the designer beforehand through the appropriate selection of certain control parameters. This is achieved by incorporating the PPC formulation within the definition of the optimal control problem. In addition, a significant part of our contribution lies in the provable guarantee that the system’s trajectories evolve strictly within the set for which the approximation capabilities of the adopted neural network structures are sufficient. This is necessary to ensure the stability of the derived optimal control policy, which is the primary requirement in control system design [22].
The proposed method follows a two-phase strategy. First, via a system identification phase, a neural network is utilized to learn the unknown open-loop system dynamics, even though they might be unstable (i.e., probing signals that are common in the related identification literature are not sufficient to excite the system dynamics, as they may lead to instability). The novelty of our approach lies in the fact that it can be employed to learn the unknown dynamics even in the case of an open-loop unstable system. Moreover, the neural network is fitted over a series of trajectories that cover the desired subset of the state space in its entirety to ensure that the extracted knowledge is valid over the entire set and not only around the neighborhood of a single trajectory, as is common in related works [10]. It should be noted that knowledge of the system dynamics is not required to achieve prescribed performance [23]. However, to achieve optimality, the system dynamics are necessary since they determine the input–output mapping involved in the adopted cost function, as will be made clear in the sequel. Despite the approximation error between the real and the acquired dynamics, the induction of optimality with prescribed performance in the derived control strategy provides robustness when dealing with such uncertainty. In the second phase, the optimal cost function and policy are approximated through an iterative technique that converges uniformly to their actual values. A constrained least-squares problem is solved at each iteration, and upon convergence, we obtain the optimal controller’s parameters, which also ensure that the system’s trajectories evolve strictly within the set for which the approximation of the cost function is valid (an issue that is ignored in the related literature).
Our key contributions in this work can be summarized as follows:
  • An identification framework that is able to retrieve the unknown system dynamics even in the case of open-loop instability.
  • A successive approximation algorithm that aims to obtain a near-optimal control law, while incorporating prescribed performance specifications.
  • A method that guarantees the evolution of the system’s trajectories strictly within the set for which the approximation capabilities of the identification structure are sufficient.
The overall methodology and its effectiveness are demonstrated through extensive simulation studies on a pendulum and a two-degree-of-freedom robotic manipulator.

3. Problem Formulation and Preliminaries

Consider an n-degree-of-freedom Euler–Lagrange system that obeys the following dynamic model:
M(q)\,\ddot{q} + C(q,\dot{q})\,\dot{q} + G(q) = \tau
where q, q̇ ∈ ℝ^n denote the generalized state vector (position and velocity), M(q) is a positive definite inertia matrix, C(q, q̇) is the matrix that describes the Coriolis and centrifugal phenomena, G(q) is the vector describing the influence of gravity, and τ is the torque that acts as the system’s input.
Assumption 1. 
The states q (position) and q̇ (velocity) are available for measurement, but q̈ (acceleration) is not.
We also define the Lipschitz continuous functions f(q, q̇) ∈ ℝ^n and g(q) ∈ ℝ^{n×n} to describe the system drift dynamics f(q, q̇) = −M^{−1}(q)(C(q, q̇)q̇ + G(q)) and the input vector field g(q) = M^{−1}(q), respectively. Thus, system (1) may be reformulated as:
\ddot{q} = f(q,\dot{q}) + g(q)\,\tau
Assumption 2. 
The system (2) is robustly stabilizable [24], that is, there exists a continuous control law τ(q, q̇) and a compact set Ω ⊂ ℝ^n such that, for any initial condition q̃(0) = [q(0), q̇(0)]^T, the solutions of the closed-loop system starting from q̃(0) exist for all t ≥ 0 and satisfy ‖q̃(t)‖_Ω ≤ β(‖q̃(0)‖_Ω, t), where β is a class KL function and ‖·‖_Ω denotes the distance to Ω, i.e., ‖x‖_Ω = inf{‖x − ζ‖ : ζ ∈ Ω}.
Assumption 3. 
No a priori knowledge regarding f(q, q̇) and g(q) is available besides the Lipschitz continuity.
Our objective is to design an optimal (with respect to a state and input integral cost) control strategy within a compact workspace Ω_q × Ω_q̇ ⊂ ℝ^{2n} that drives the system towards a fixed configuration q̄_d ∈ Ω_q with predefined transient and steady-state performance (i.e., minimum convergence rate and maximum steady-state error). Before we proceed with the formulation of the optimal control problem, we first give a brief presentation of the PPC technique.

3.1. Prescribed Performance Control (PPC)

The PPC strategy [18] enables the tracking of a reference trajectory with the system’s response fulfilling predefined performance criteria during the transient and the steady state, without requiring any knowledge of the system dynamics. For the case of a generic scalar error σ ( t ) , the prescribed performance is achieved if the error remains bounded within a predefined region that is formed by decaying functions of time, as illustrated in Figure 1:
-\rho(t) < \sigma(t) < \rho(t), \quad \forall t \ge 0
The function ρ(t) is a smooth, bounded, strictly positive and decreasing function of time, called the performance function, and is chosen as ρ(t) = (ρ_0 − ρ_∞)e^{−lt} + ρ_∞, with ρ_0, ρ_∞ and l being positive gains that are chosen to satisfy the designer’s specifications. Specifically, ρ_∞ = lim_{t→∞} ρ(t) is selected according to the maximum allowable tracking error at the steady state; l determines a lower bound on the speed of convergence; and ρ_0 affects the maximum overshoot and is selected such that ρ_0 > |σ(0)|.
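As a concrete illustration of the exponentially decaying performance envelope, the short Python sketch below evaluates ρ(t) and checks the bound (3) for a sample error signal; the gains ρ_0 = π/3, ρ_∞ = π/180 and l = 1.5 match the values used later in the simulation studies, while the test signal itself is an arbitrary example and not part of the original material.

```python
import numpy as np

def perf_fun(t, rho0=np.pi/3, rho_inf=np.pi/180, l=1.5):
    """Performance function rho(t) = (rho0 - rho_inf) * exp(-l t) + rho_inf."""
    return (rho0 - rho_inf) * np.exp(-l * t) + rho_inf

def within_bounds(sigma, t):
    """Check the prescribed-performance condition -rho(t) < sigma(t) < rho(t)."""
    return np.all(np.abs(sigma) < perf_fun(t))

# Example: a signal that decays faster than rho(t) satisfies the bounds at all times.
t = np.linspace(0.0, 5.0, 200)
sigma = 0.5 * np.exp(-2.0 * t)
print(within_bounds(sigma, t))  # True
```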
In our case, we define the scalar tracking errors σ_i(e_i(t)) = λ̃^T e_i(t), where λ̃^T = [λ, 1] with λ being a positive constant and e_i(t) = [q_i − q_{d_i}, q̇_i − q̇_{d_i}]^T, with q_{d_i}(t), i = 1, …, n being the reference trajectories. Notice that the performance specifications imposed on the errors σ_i are easily translated into equivalent performance specifications on the errors e_i, as stated in Lemma 1 in [23]. The intrinsic property behind PPC lies in a mapping of the tracking error that transforms the constrained behavior defined in (3) into a significantly relaxed unconstrained problem. More specifically, we define:
\epsilon_i(t) = T(\xi_i(t)), \quad i = 1, \dots, n
where ε_i(t) is the transformed error, T : (−1, 1) → (−∞, ∞) is a strictly increasing, symmetric and bijective mapping, e.g., T(⋆) = (1/2) ln((1 + ⋆)/(1 − ⋆)), and ξ_i(t) = σ_i(e_i(t))/ρ_i(t). To achieve the prescribed performance, the following control signal [23] is used:
\tau_i = -k\, T'(\xi_i)\, \rho_i^{-1}(t)\, T(\xi_i), \quad k > 0, \quad i = 1, \dots, n
where k is a positive gain and T′(ξ) denotes the derivative of T(ξ). The following proposition guarantees that, no matter how large the upper bound of the transformed error ε_i(t) is (which is affected by the model uncertainty), the performance specifications encapsulated in the corresponding performance function ρ_i(t) are met.
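The following minimal sketch (not part of the original text) implements the error transformation (4) with the logarithmic map given above and the control law (5) for a single error channel, under the reconstructed sign convention; the gain value is a placeholder.

```python
import numpy as np

def T(xi):
    """Transformation T(xi) = 0.5 * ln((1 + xi) / (1 - xi)), mapping (-1, 1) onto R."""
    return 0.5 * np.log((1.0 + xi) / (1.0 - xi))

def dT(xi):
    """Derivative T'(xi) = 1 / (1 - xi^2)."""
    return 1.0 / (1.0 - xi**2)

def ppc_torque(sigma, rho, k=2.0):
    """Control law (5): tau_i = -k * T'(xi_i) * rho_i^{-1} * T(xi_i), with xi_i = sigma_i / rho_i."""
    xi = sigma / rho
    return -k * dT(xi) * T(xi) / rho

# Example: an error at 80% of the performance bound produces a strong corrective torque.
print(ppc_torque(sigma=0.8, rho=1.0))
```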
Proposition 1 
([25]). The prescribed-performance control problem, as defined by (3), admits a solution if and only if the transformed error signals (4) can be kept bounded.
Proof. 
Given any finite initial condition [q_i(0), q̇_i(0)]^T, we can always select ρ_i(0) to satisfy (3) at t = 0, hence ensuring that ξ_i(0) ∈ (−1, 1), and thus, the transformed system is initially well defined. Furthermore, if the transformed system is robustly stabilizable, there exists a continuous control law that guarantees that [q_i(t), q̇_i(t)]^T, ε_i(t) ∈ L_∞ and, as a consequence, there exist unknown constants ε_i^{lower}, ε_i^{upper} such that ε_i^{lower} ≤ ε_i(t) ≤ ε_i^{upper}, i = 1, …, n, ∀t ≥ 0. Since T is a smooth, strictly increasing function, its inverse exists, and thus, T^{−1}(ε_i^{lower}) ≤ ξ_i(t) ≤ T^{−1}(ε_i^{upper}) ⇒ T^{−1}(ε_i^{lower})ρ_i(t) ≤ σ_i(t) ≤ T^{−1}(ε_i^{upper})ρ_i(t), ∀t ≥ 0. Since system (2) belongs to the general class of nonlinear affine MIMO systems that are feedback linearizable, it can be easily verified that its transformed system is robustly stabilizable [25]. Therefore, the PPC problem admits a solution.    □

3.2. Optimal Control with Prescribed Performance

To induce optimality along with prescribed performance, we formulate the cost function of the optimal control problem as follows:
J = \int_0^{\infty} \left( \alpha \,\|\epsilon(t)\|^2 + \beta \,\|\tau(t) - G(\bar{q}_d)\|^2 \right) dt
where ε = [ε_1, …, ε_n]^T denotes the vector of the transformed errors defined via (4) as ε_i(t) = T((q̇_i + λ(q_i − q̄_{d_i}))/ρ_i(t)), i = 1, …, n. The constants α and β are positive and regulate the trade-off between the state convergence and the energy of the input signal. Additionally, for an admissible control policy (i.e., a control policy that exhibits a finite cost value from any initial condition in the workspace), the transformed errors remain bounded since ε_i(t) ∈ L_2, and thus the predefined performance specifications are met according to Proposition 1. Finally, notice that the term G(q̄_d) is necessary in the input-related term in (6) so that q̄_d becomes an equilibrium and the integral cost is well defined (otherwise, the integral of the second term in the cost function (6) would grow unbounded, as the input signal required to keep the system at the desired position q̄_d would not vanish).
To find an admissible policy that minimizes (6) and guarantees the predefined transient and steady-state performance specifications, first notice that (6) can be transformed into its differential form J̇ = −α‖ε‖² − β‖τ − G(q̄_d)‖². Thus, the equivalent HJB equation can be written as:
HJB(z) = \nabla_q J^T \dot{q} + \nabla_\rho J^T \dot{\rho} + \nabla_\epsilon J^T \dot{\epsilon} + \alpha \|\epsilon\|^2 + \beta \|\tau - G(\bar{q}_d)\|^2 = 0
for a stacked state-space vector z = [q^T, ρ^T, ε^T]^T, where ρ = [ρ_1, …, ρ_n]^T denotes the vector of the performance functions, and ∇_• denotes the gradient with respect to each argument •. Applying the stationary condition in (7) after substituting the system dynamics (1) in ε̇_i, the optimal control policy is calculated as:
\tau(q, \rho, \epsilon) = -\frac{1}{2\beta}\, M^{-T}(q)\, T'\, \operatorname{diag}(\rho)^{-1} \nabla_\epsilon J + G(\bar{q}_d),
where T′ denotes the diagonal matrix containing the derivatives dT(⋆)/d⋆ of the transformation function for each error, respectively, which arise from the expression of ε̇.
Now, we have to deal with two major obstacles. The first concerns the lack of knowledge of the system dynamics, which are heavily involved both in the HJB Equation (7), via ε̇, and in the optimal control policy (8). Furthermore, notice that in order to implement (8) we need the gradient of the cost function. However, the HJB Equation (7) is a nontrivial partial differential equation that is difficult to solve numerically (i.e., we cannot easily calculate ∇_ε J and apply it in the optimal control policy (8)). In the next section, both issues are rigorously addressed.
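To make the structure of (8) concrete, a minimal sketch is given below that evaluates the policy for given values of the identified inverse inertia, the transformation derivatives and the gradient of the cost approximation; all numerical values are hypothetical placeholders, since these quantities are only obtained through the identification and approximation stages described in the next section.

```python
import numpy as np

def optimal_policy(M_inv, dT_diag, rho, grad_eps_J, G_qd, beta=0.5):
    """Policy (8): tau = -(1/(2*beta)) * M^{-T} * T' * diag(rho)^{-1} * grad_eps_J + G(q_d_bar).

    M_inv      : identified inverse inertia matrix M^{-1}(q)        (n x n)
    dT_diag    : diagonal matrix of transformation derivatives T'   (n x n)
    rho        : current values of the performance functions        (n,)
    grad_eps_J : gradient of the approximate cost w.r.t. epsilon    (n,)
    G_qd       : gravity vector at the target configuration         (n,)
    """
    return (-1.0 / (2.0 * beta)) * M_inv.T @ dT_diag @ np.diag(1.0 / rho) @ grad_eps_J + G_qd

# Example with hypothetical 2-DOF values.
M_inv = np.array([[0.5, 0.1], [0.1, 0.8]])
dT_diag = np.diag([1.2, 1.1])
rho = np.array([0.3, 0.3])
grad_eps_J = np.array([0.4, -0.2])
print(optimal_policy(M_inv, dT_diag, rho, grad_eps_J, G_qd=np.zeros(2)))
```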

4. Methodology

To overcome the aforementioned hurdles, the first phase of our method is devoted to creating a sufficiently accurate approximation of the underlying open-loop dynamics by employing an artificial neural network. Subsequently, building upon the extracted knowledge of the system dynamics, we utilize a successive approximation strategy to solve the HJB equation.

4.1. Identification of System Dynamics

We adopt an iterative process, which aims at creating a progressively improving approximation of the unknown functions f(q, q̇), g(q), denoted as f̂(q, q̇) and ĝ(q), respectively, at each iteration, until convergence is achieved, i.e., |f(q, q̇) + g(q)τ − f̂(q, q̇) − ĝ(q)τ| < ε̄, with ε̄ being an arbitrarily small positive number. To acquire the data needed for the neural network estimation at each iteration, and in order to guarantee that this data set is representative of the compact workspace Ω_q × Ω_q̇ ⊂ ℝ^{2n}, we form a reference trajectory by linking together multiple points located all over Ω_q × Ω_q̇. To elaborate further, first, a set of N points X = [X_1, …, X_N]^T is selected such that it covers Ω_q × Ω_q̇. Then, we need to devise a closed path that traverses these points. For that purpose, we connect any pair of consecutive points X_i, X_j with a trajectory of minimum acceleration (see Chapter 3 in [26]), as follows:
\begin{bmatrix} q_d(t) \\ \dot{q}_d(t) \end{bmatrix} = \left( \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix} \otimes I_n \right) X_i + \left( \begin{bmatrix} \frac{t^3}{T^3} & -\frac{t^2(T-t)}{T^2} \\ \frac{6t(T-t)}{T^3} & -\frac{t(2T-3t)}{T^2} \end{bmatrix} \otimes I_n \right) \left( X_j - \left( \begin{bmatrix} 1 & T \\ 0 & 1 \end{bmatrix} \otimes I_n \right) X_i \right)
for all t ∈ [0, T], where T denotes the transition time from X_i to X_j. The path in every iteration is then created by a different random sequence of the X_i, i = 1, …, N (i.e., a permutation of X) to ensure that the final result is free of bias along a specific trajectory.
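A minimal sketch of this construction is given below: workspace points are sampled, shuffled into a random order, and consecutive states are joined by cubic (minimum-acceleration) segments. The Hermite form used here is assumed to coincide with (9); the number of points, the segment duration and the random seed are illustrative only.

```python
import numpy as np

def cubic_segment(p0, v0, p1, v1, T, num=50):
    """Minimum-acceleration (cubic Hermite) segment between states (p0, v0) and (p1, v1) over [0, T]."""
    s = np.linspace(0.0, 1.0, num)[:, None]          # normalized time t / T
    h00, h10 = 2*s**3 - 3*s**2 + 1, s**3 - 2*s**2 + s
    h01, h11 = -2*s**3 + 3*s**2, s**3 - s**2
    q = h00*p0 + h10*T*v0 + h01*p1 + h11*T*v1
    qd = (6*s**2 - 6*s)/T*p0 + (3*s**2 - 4*s + 1)*v0 + (6*s - 6*s**2)/T*p1 + (3*s**2 - 2*s)*v1
    return q, qd

# Sample N points over the workspace, shuffle them, and stitch the segments together.
rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(51, 2))        # columns: position, velocity (n = 1)
order = rng.permutation(len(points))                  # a new permutation is drawn every iteration
segments = [cubic_segment(points[i][0], points[i][1],
                          points[j][0], points[j][1], T=2.0)
            for i, j in zip(order[:-1], order[1:])]
```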
Subsequently, the aforementioned reference trajectory that covers the whole domain space will be tracked with predefined transient and steady-state performance using the PPC technique, as described in Section 3.1, with the position q and velocity q̇ being measurable, according to Assumption 1. Hence, based on the collected data over the aforementioned series of reference trajectories, we shall approximate the unknown dynamics using a neural network structure NN(q, q̇, τ) that will fit the system dynamics as f(q, q̇) + g(q)τ ≈ NN(q, q̇, τ). In particular, we are interested in learning f(q, q̇) and g(q) separately, which correspond to the terms −M^{−1}(q)(C(q, q̇)q̇ + G(q)) and M^{−1}(q), respectively. This can be straightforwardly accomplished by setting τ = 0_n, where 0_n denotes an n-dimensional vector of zeros, as follows:
NN(q, \dot{q}, 0_n) \approx f(q, \dot{q})
Then, by setting τ = c·1_n^{(i)}, i = 1, …, n, where c ≠ 0 is a constant number and 1_n^{(i)} denotes an n-dimensional vector with one in the ith element and zeros everywhere else, we obtain:
\frac{1}{c} \left[ NN(q, \dot{q}, c\,1_n^{(i)}) - NN(q, \dot{q}, 0_n) \right]_{i = 1, \dots, n} \approx g(q)
Finally, the gravity vector of the dynamic model that is employed in (8) can be easily accessed through the aforementioned structure following the property G(q) = −g^{−1}(q) f(q, 0).
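The probing logic of (10)–(12) and the gravity identity can be summarized in a few lines; in the sketch below, nn is a placeholder for any fitted regressor with the signature NN(q, q̇, τ), and the toy linear model is used only to check that the reconstruction mechanics work.

```python
import numpy as np

def extract_dynamics(nn, q, qdot, n, c=1.0):
    """Recover f_hat(q, qdot), g_hat(q) and G_hat(q) from a fitted model nn(q, qdot, tau).

    nn : callable returning the predicted acceleration (n,) for inputs (q, qdot, tau).
    """
    f_hat = nn(q, qdot, np.zeros(n))                        # (10): zero torque isolates the drift term
    g_hat = np.column_stack([
        (nn(q, qdot, c * np.eye(n)[i]) - f_hat) / c          # (11)-(12): torque on joint i gives column i
        for i in range(n)
    ])
    G_hat = -np.linalg.solve(g_hat, nn(q, np.zeros(n), np.zeros(n)))   # G(q) = -g^{-1}(q) f(q, 0)
    return f_hat, g_hat, G_hat

# Toy "network": a known linear model, used only to verify the extraction mechanics.
A = np.array([[0.5, 0.1], [0.1, 0.8]])
toy_nn = lambda q, qd, tau: -0.3 * q - 0.2 * qd + A @ tau
print(extract_dynamics(toy_nn, q=np.ones(2), qdot=np.zeros(2), n=2))
```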
Nevertheless, under Assumption 1, we do not have access to the acceleration q ¨ in (2), and thus, the target values to train the neural network N N ( q , q ˙ , τ ) are not available. To remedy this issue, the following tracking differentiator [27] is employed:
\dot{z}_1 = z_2
\dot{z}_2 = -k_{z_1} R^2 (z_1 - q) - k_{z_2} R (z_2 - \dot{q}) + NN(q, \dot{q}, \tau)
with positive gains k_{z1}, k_{z2}, R. When R → ∞, then, based on [27], z_1 → q and z_2 → q̇, and consequently ż_2 → q̈, from which we may reconstruct the acceleration signal that will be employed for the neural network training. Notice that during the first round over the closed reference trajectory that traverses all points in the workspace Ω_q × Ω_q̇, the NN structure is null (i.e., we have not initiated training since we have not collected the required data yet). Thus, it is activated after the first training stage (i.e., the first pass over all points in X). This means that in every new round (i.e., a new permutation of X), the extracted neural network originates from the previous iteration. After each round of training, we employ the knowledge we acquire about the dynamics not only to improve the acceleration estimation, but also to provide improved initial weights for the consecutive iterations. Consequently, after enough iterations (e.g., when the weights do not change above a threshold), an accurate model of the system dynamics is eventually acquired.
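A minimal sketch of the tracking differentiator, integrated with a simple forward-Euler loop, is given below; nn_accel stands for the current network prediction NN(q, q̇, τ) along the recorded trajectory (zero during the first pass), and the gains follow the values reported later in the simulations.

```python
import numpy as np

def tracking_differentiator(q_traj, qdot_traj, nn_accel, dt, kz1=0.2, kz2=0.2, R=100.0):
    """Estimate the acceleration along a recorded trajectory via the tracking differentiator.

    q_traj, qdot_traj : measured positions/velocities, arrays of shape (steps, n)
    nn_accel          : current NN prediction of the acceleration at each step, shape (steps, n)
                        (zero during the first pass, before any training has taken place)
    """
    z1, z2 = q_traj[0].copy(), qdot_traj[0].copy()
    accel_est = np.zeros_like(q_traj)
    for k in range(len(q_traj)):
        z2_dot = (-kz1 * R**2 * (z1 - q_traj[k])
                  - kz2 * R * (z2 - qdot_traj[k])
                  + nn_accel[k])
        accel_est[k] = z2_dot                 # z2_dot approaches the true acceleration as R grows
        z1, z2 = z1 + dt * z2, z2 + dt * z2_dot
    return accel_est
```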

4.2. Solving the Hamilton–Jacobi–Bellman Equation

This subsection is dedicated to finding an admissible policy (8) that minimizes (6) and guarantees predefined transient and steady-state performance specifications. Towards this direction, a successive approximation strategy is adopted similarly to [21], where the approximation of the cost function converges uniformly to the optimal cost function.
In this respect, the solution to the HJB equation (i.e., the unknown cost function J ( q , ρ , ϵ ) ) is expanded over a set of basis functions with adjustable weights. The approximate solution assumed here is:
J_L(q, \rho, \epsilon) = \sum_{j=1}^{L} w_j\, \sigma_j(q, \rho, \epsilon) = w_L^T \sigma(q, \rho, \epsilon)
where J_L(q, ρ, ε) denotes the approximation of the unknown cost function J(q, ρ, ε), σ = [σ_1, …, σ_L]^T includes polynomial regressor terms, and w_L = [w_1, …, w_L]^T denotes the weight vector to be successively adjusted so that the residual error is minimized, i.e., the best-fitted solution J_L(q, ρ, ε) to the HJB equation is extracted. Hence, if we substitute the trial solution into the HJB equation, the residual is formed as:
e_L(q, \rho, \epsilon) = w_L^T \left( \sum_{i=1}^{n} \nabla_{q_i}\sigma\, \dot{q}_i + \nabla_{\rho_i}\sigma\, \dot{\rho}_i + \nabla_{\epsilon_i}\sigma\, \dot{\epsilon}_i \right) + \alpha \|\epsilon\|^2 + \beta \|\tau(q, \rho, \epsilon) - G(\bar{q}_d)\|^2
where
\dot{q} = \operatorname{diag}(\rho)\, T^{-1}(\epsilon) - \lambda (q - \bar{q}_d)
\dot{\rho} = -l \rho + l \rho_\infty
\dot{\epsilon} = \operatorname{diag}(\rho)^{-1} \operatorname{diag}\!\left( (T^{-1})'(\epsilon) \right)^{-1} \left( f(q, \dot{q}) + g(q)\tau + \lambda \dot{q} - T^{-1}(\epsilon)\,(-l\rho + l\rho_\infty) \right)
In order to calculate the unknown weights w L , the inner product of the residual and its derivative with respect to the weights is set to zero as follows:
\left\langle \frac{d e_L}{d w_L},\, e_L \right\rangle = 0
w_L^T \left\langle \sum_{i=1}^{n} \nabla_{q_i}\sigma\, \dot{q}_i + \nabla_{\rho_i}\sigma\, \dot{\rho}_i + \nabla_{\epsilon_i}\sigma\, \dot{\epsilon}_i,\;\; \sum_{i=1}^{n} \nabla_{q_i}\sigma\, \dot{q}_i + \nabla_{\rho_i}\sigma\, \dot{\rho}_i + \nabla_{\epsilon_i}\sigma\, \dot{\epsilon}_i \right\rangle + \left\langle \alpha\|\epsilon\|^2 + \beta\|\tau(q, \rho, \epsilon) - G(\bar{q}_d)\|^2,\;\; \sum_{i=1}^{n} \nabla_{q_i}\sigma\, \dot{q}_i + \nabla_{\rho_i}\sigma\, \dot{\rho}_i + \nabla_{\epsilon_i}\sigma\, \dot{\epsilon}_i \right\rangle = 0
where the inner product between two vectors u and v is given by ⟨u, v⟩ = ∫_V u v dV over a domain V. To solve the aforementioned problem with respect to w_L, a discretization of P points (q_i, ρ_i, ε_i), i = 1, …, P is applied over a compact set Ω_z ⊂ ℝ^{3n} in order to obtain the following terms:
X = \begin{bmatrix} \left( \sum_{i=1}^{n} \nabla_{q_i}\sigma\, \dot{q}_i + \nabla_{\rho_i}\sigma\, \dot{\rho}_i + \nabla_{\epsilon_i}\sigma\, \dot{\epsilon}_i \right)^T \Big|_{(q_1, \rho_1, \epsilon_1)} \\ \vdots \\ \left( \sum_{i=1}^{n} \nabla_{q_i}\sigma\, \dot{q}_i + \nabla_{\rho_i}\sigma\, \dot{\rho}_i + \nabla_{\epsilon_i}\sigma\, \dot{\epsilon}_i \right)^T \Big|_{(q_P, \rho_P, \epsilon_P)} \end{bmatrix}
Y = \begin{bmatrix} \left( \alpha\|\epsilon\|^2 + \beta\|\tau(q, \rho, \epsilon) - G(\bar{q}_d)\|^2 \right) \Big|_{(q_1, \rho_1, \epsilon_1)} \\ \vdots \\ \left( \alpha\|\epsilon\|^2 + \beta\|\tau(q, \rho, \epsilon) - G(\bar{q}_d)\|^2 \right) \Big|_{(q_P, \rho_P, \epsilon_P)} \end{bmatrix}
Consequently, the weights w L may be calculated using the least-squares method as follows:
w_L = -(X^T X)^{-1} X^T Y
and then be employed to update the optimal control policy in (8) by calculating the gradient of J_L(q, ρ, ε) along ε. However, owing to the fact that the approximation capabilities of polynomials hold locally over a compact set, when employing the optimal control policy (8) we need to ensure that the trajectories of q, ρ, ε evolve strictly within the compact set where the approximation of the cost function is valid. Owing to the decreasing property of the performance function, we have that ρ(t) ≤ ρ_0, ∀t ≥ 0. Consequently, if ρ_0 belongs to the compact set, so does the trajectory ρ(t). Here, Ω_z is chosen to be symmetric about zero in each variable ε_i, i.e., ε_i ∈ [−c, c]. In addition, from the proof of Proposition 1, if q(0) ∈ Ω_z, then q stays trapped inside Ω_z as well. Therefore, what remains is to guarantee that the transformed error stays inside Ω_z, which is achieved by imposing constraints on the weights w_L. Consider the n-dimensional cube:
\mathcal{M} = \left\{ \epsilon : V(\epsilon_1, \epsilon_2, \dots, \epsilon_n) = \max\{ |\epsilon_1|, |\epsilon_2|, \dots, |\epsilon_n| \} \le c \right\}
The transformed error stays trapped inside M if, for the inner product of ε̇^T = [ε̇_1(t), ε̇_2(t), …, ε̇_n(t)] and ∇V(ε), it holds that ε̇^T ∇V(ε) ≤ 0 on the boundary of M. Essentially, this means that on each of the 2n facets of the n-dimensional cube we require ε̇_i ≤ 0 if ε_i = c, and ε̇_i ≥ 0 if ε_i = −c. Differentiating ε_i, we obtain:
\dot{\epsilon}_i = \frac{T'(\xi_i)}{\rho_i}\, \ddot{q}_i + \frac{T'(\xi_i)\, \lambda\, \dot{q}_i}{\rho_i} - \frac{T'(\xi_i) \left[ \dot{q}_i + \lambda (q_i - \bar{q}_{d_i}) \right] \dot{\rho}_i}{\rho_i^2}
where q̈_i = f_i + g_i^T τ*, with f_i denoting the ith element of f and g_i^T the ith row of g. Consequently, ε̇_i can be written as ε̇_i = A_i^T w_L + B_i, where:
A_i^T = -\frac{1}{2\beta}\, \frac{T'(\xi_i)}{\rho_i}\, g_i^T g^T T' \operatorname{diag}(\rho)^{-1} \nabla_\epsilon \sigma^T
B_i = \frac{T'(\xi_i)}{\rho_i} \left( f_i + g_i^T G(\bar{q}_d) \right) + \frac{T'(\xi_i)\, \lambda\, \dot{q}_i}{\rho_i} - \frac{T'(\xi_i) \left[ \dot{q}_i + \lambda (q_i - \bar{q}_{d_i}) \right] \dot{\rho}_i}{\rho_i^2}
To guarantee safety [28], a discretization of M points is applied over each facet of the cube, for which:
\tilde{A}_i^+ = \begin{bmatrix} A_i^T(q_1, \rho_1, \epsilon_1^+) \\ \vdots \\ A_i^T(q_M, \rho_M, \epsilon_M^+) \end{bmatrix}, \quad \tilde{A}_i^- = \begin{bmatrix} A_i^T(q_1, \rho_1, \epsilon_1^-) \\ \vdots \\ A_i^T(q_M, \rho_M, \epsilon_M^-) \end{bmatrix}, \quad \tilde{B}_i^+ = \begin{bmatrix} B_i(q_1, \rho_1, \epsilon_1^+) \\ \vdots \\ B_i(q_M, \rho_M, \epsilon_M^+) \end{bmatrix}, \quad \tilde{B}_i^- = \begin{bmatrix} B_i(q_1, \rho_1, \epsilon_1^-) \\ \vdots \\ B_i(q_M, \rho_M, \epsilon_M^-) \end{bmatrix}
where ε_j^− (ε_j^+) denotes the jth sample point on the facet where ε_i is fixed at ε_i = −c (ε_i = c, respectively). Summarizing, for each ε_i, the following inequality constraints must be satisfied:
\begin{bmatrix} \tilde{A}_i^+ \\ -\tilde{A}_i^- \end{bmatrix} w_L \le \begin{bmatrix} -\tilde{B}_i^+ \\ \tilde{B}_i^- \end{bmatrix}
Therefore, the weights w_L may be obtained by solving the constrained least-squares problem:
\min_{w_L} \frac{1}{2} \left\| X w_L + Y \right\|^2, \quad \text{s.t.} \quad \tilde{A}\, w_L \le \tilde{B}
where
\tilde{A} = \begin{bmatrix} \tilde{A}_1^+ \\ -\tilde{A}_1^- \\ \vdots \\ \tilde{A}_n^+ \\ -\tilde{A}_n^- \end{bmatrix}, \quad \tilde{B} = \begin{bmatrix} -\tilde{B}_1^+ \\ \tilde{B}_1^- \\ \vdots \\ -\tilde{B}_n^+ \\ \tilde{B}_n^- \end{bmatrix}
It should be noted that (23) can be solved as a quadratic optimization problem with linear constraints; thus, plenty of robust and computationally efficient methods to tackle it exist [29]. The aforementioned procedure that is applied iteratively to find an approximation of the optimal solution to the HJB equation is presented in Algorithm 1 and the results are summarized in the following Theorem.
Algorithm 1: Cost function approximation algorithm
1: Initialize the control policy based on the PPC technique: τ(q, ρ, ε) = −k T′(ξ) diag(ρ)^{−1} ε + G(q̄_d), with ξ = (q̇ + λ(q − q̄_d))/ρ (element-wise).
2: Select the points (q_i, ρ_i, ε_i), i = 1, …, P over which the HJB will be fitted.
3: repeat
4:     Calculate the terms X and Y from (15) and (16) over the data (q_i, ρ_i, ε_i), i = 1, …, P.
5:     Calculate the terms Ã and B̃ from (21) over the data (q_j, ρ_j, ε_j^+), j = 1, …, M and (q_j, ρ_j, ε_j^−), j = 1, …, M.
6:     Find w_L by solving the constrained least-squares problem (23).
7:     Update the control policy according to (8), employing the parameter vector w_L in the approximation of ∇_ε J.
8: until w_L converges.
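Step 6 of Algorithm 1 is a quadratic program with linear inequality constraints. A minimal sketch using SciPy’s general-purpose SLSQP solver is shown below; the randomly generated X, Y, Ã and B̃ merely stand in for the quantities of (15), (16) and (21), and a dedicated QP solver could be substituted for efficiency.

```python
import numpy as np
from scipy.optimize import minimize

def solve_constrained_ls(X, Y, A_tilde, B_tilde):
    """Solve min_w 0.5 * ||X w + Y||^2  subject to  A_tilde w <= B_tilde  (problem (23))."""
    w_unc = -np.linalg.lstsq(X, Y, rcond=None)[0]          # unconstrained solution (17), used as a start
    res = minimize(
        fun=lambda w: 0.5 * np.sum((X @ w + Y) ** 2),
        x0=w_unc,
        jac=lambda w: X.T @ (X @ w + Y),
        constraints=[{"type": "ineq", "fun": lambda w: B_tilde - A_tilde @ w,
                      "jac": lambda w: -A_tilde}],
        method="SLSQP",
    )
    return res.x

# Toy dimensions only; in practice P collocation points and M facet samples define X, Y, A_tilde, B_tilde.
rng = np.random.default_rng(1)
X, Y = rng.normal(size=(200, 10)), rng.normal(size=200)
A_tilde, B_tilde = rng.normal(size=(40, 10)), np.ones(40)
w_L = solve_constrained_ls(X, Y, A_tilde, B_tilde)
```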
Theorem 1. 
Consider the Euler–Lagrange system (1) under Assumptions 1–3, as well as a compact set Ω_{q,q̇,τ} for which a sufficiently accurate neural network approximation NN(q, q̇, τ) of the system dynamics has been acquired. Then, for a compact set Ω_z ⊂ ℝ^{3n}, where z ≜ [q^T, ρ^T, ε^T]^T, and a linearly independent polynomial basis {σ_j}_{j=1}^{L}, Algorithm 1 converges to a near-optimal estimate J_L(q, ρ, ε) = w_L^T σ(q, ρ, ε) of the minimum cost function, i.e., for any arbitrarily small constant ε̃ > 0, there exists L_0 such that for any L ≥ L_0:
\sup_{z \in \Omega_z} \left| J(z) - w_L^T \sigma(z) \right| < \tilde{\varepsilon}
Proof. 
For a sufficiently accurate neural network approximation and by utilizing the admissible policy u^{(0)} in the initialization of Algorithm 1, an initial, unique, least-squares solution J_L^{(0)} for (8) can be acquired through the optimization problem (23). This solution is employed in the calculation of the control policy u^{(1)}, and by iterating the process, we obtain the successive approximation algorithm, which, along with the imposed inequality constraints, guarantees, following [30], that each new policy provided by the cost function of the previous one is stabilizing and better than the previous one with respect to the metric (6), while also satisfying the prescribed performance specifications, since ε ∈ L_∞ and consequently remains bounded. Moreover, every trajectory z(t) lies strictly within the set where the approximation capabilities of the polynomial approximation structure hold. Therefore, starting from an admissible policy, we always improve it until u^{(i)} and J_L^{(i)} converge to their optimal values. Finally, to show (24), notice that from Weierstrass’s approximation theorem, since J(z) is C^1, it can be uniformly approximated as accurately as desired by a polynomial function within a compact set. As a consequence, no matter how small ε̃ is chosen, there exists a polynomial basis {σ_j}_{j=1}^{L} with L ≥ L_0(ε̃) such that (24) holds. □

5. Simulation Results

In this section, we demonstrate the effectiveness of the proposed scheme via two simulated scenarios: a pendulum, illustrating the full capabilities of our method by performing a comparative study with another successive approximation strategy, as well as a two-degree-of-freedom robotic manipulator.

5.1. Case A: Pendulum

Consider a pendulum that obeys the following Euler–Lagrange dynamics
\ddot{q} = -\frac{g}{l} \sin(q) - \frac{k}{m} \dot{q} + \frac{1}{m l^2} \tau
with q and q̇ denoting the angular position and velocity, respectively, and τ denoting the applied torque that acts as the system’s input. In addition, m denotes the mass and l the length of the rod, with k being the friction coefficient and g the acceleration of gravity. The values adopted for the simulation were m = 5.2 kg, l = 0.9 m, k = 1 kg/s, and g = 9.81 m/s². Despite its simplicity, several systems are modeled by equations similar to (25), rendering it of great practical importance.
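For reference, the simulated pendulum (25) with the stated parameter values can be coded as follows; the sign convention follows the reconstructed equation above, and the open-loop Euler integration is for illustration only.

```python
import numpy as np

m, l, k, g = 5.2, 0.9, 1.0, 9.81      # mass [kg], rod length [m], friction [kg/s], gravity [m/s^2]

def pendulum_accel(q, qdot, tau):
    """Pendulum dynamics (25): qddot = -(g/l) sin(q) - (k/m) qdot + tau / (m l^2)."""
    return -(g / l) * np.sin(q) - (k / m) * qdot + tau / (m * l**2)

# Open-loop simulation from a small initial angle with zero torque.
dt, q, qdot = 1e-3, 0.3, 0.0
for _ in range(3000):                  # 3 s horizon, matching the simulation length used in the text
    qddot = pendulum_accel(q, qdot, tau=0.0)
    q, qdot = q + dt * qdot, qdot + dt * qddot
print(q, qdot)
```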
Initially, a reference trajectory was designed so that it traversed (in random order) 51 points scattered over the domain set Ω_q × Ω_q̇ = [−1, 1] × [−1, 1]. The gains of the PPC controller used to track it were k = 2 and λ = 7, with performance specifications dictated by ρ_0 = π/3, ρ_∞ = π/180, and l = 1.5. Finally, the gains of the tracking differentiator were chosen as k_{z1} = k_{z2} = 0.2 and R = 100. We also employed a neural network with a hidden layer containing eight neurons to extract the system dynamics via the learning process.
The neural network’s training was carried out for 20 iterations (i.e., 20 permutations of the overall 51 points in the workspace), so that the approximation improved sufficiently. The training of the adopted neural network structure over the collected data during each iteration (i.e., after each one of the 20 reference trajectories) was conducted using the MATLAB toolbox employing the default Levenberg–Marquardt algorithm. The results of the approximation problem can be observed in Figure 2 for the corresponding terms of f ( q , q ˙ ) and g ( q ) . It can be easily observed that through this process, we succeeded in obtaining a highly accurate (with an average relative approximation error less than 1 % ) estimate of the system dynamics, without relying on any prior knowledge.
In the second phase, we set a conventional PPC stabilizing controller, parametrized by k = 2, λ = 2, ρ_0 = π/3, ρ_∞ = π/180 and l = 1.5, as the initial admissible control policy. Our aim was to drive the system towards the fixed configuration q̄_d = 0.5, which is not a zero-input equilibrium of the system. The points used for the regression process were sampled from the set Ω_{q,ρ,ε} = [0, 1] × [ρ_∞, 10ρ_∞] × [−0.45, 0.45]. Please note that the values chosen for the PPC controller, as well as the set adopted for the regression process, should be specifically picked to take into consideration the desired prescribed performance specifications and the system’s region of operation, respectively. For the integral cost, the weights α = 0.5 and β = 0.5 were chosen to specify an equal trade-off between the error convergence and the requested input energy. Moreover, P = 2500 points were chosen within Ω_{q,ρ,ε}, and the number of constraints was set to M = 1250. The cost function was approximated by 32 polynomial basis functions, chosen as σ(q, ρ, ε) = {T_{i_1}(q)·T_{i_2}(ε)·T_{i_3}(ρ)}, i_1, i_2 ∈ {1, …, 4} and i_3 ∈ {1, 2}, where T_i(·) denotes the Chebyshev polynomial of degree i. An important property of these polynomials is that they are orthogonal with respect to the inner product, rendering them ideal for performing polynomial regression.
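The 32-term regressor can be assembled directly from NumPy’s Chebyshev polynomials, as in the sketch below; the rescaling of the arguments onto [−1, 1] (where the polynomials are orthogonal) is omitted for brevity, and the evaluation point is arbitrary.

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

def build_basis():
    """Build the 32 products T_i1(q) * T_i2(eps) * T_i3(rho), with i1, i2 in {1..4} and i3 in {1, 2}."""
    basis = []
    for i1 in range(1, 5):
        for i2 in range(1, 5):
            for i3 in range(1, 3):
                Tq, Te, Tr = (Chebyshev.basis(d) for d in (i1, i2, i3))
                basis.append(lambda q, eps, rho, Tq=Tq, Te=Te, Tr=Tr: Tq(q) * Te(eps) * Tr(rho))
    return basis

def sigma(q, eps, rho, basis):
    """Evaluate the regressor vector sigma(q, rho, eps) at a single sample."""
    return np.array([phi(q, eps, rho) for phi in basis])

basis = build_basis()
print(len(basis), sigma(0.2, 0.1, 0.05, basis).shape)   # 32 (32,)
```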
The response of the system under the optimal policy was simulated for 3 s for five different initial conditions and is illustrated in Figure 3, where the ability of the algorithm to stabilize the system under strict performance specifications is demonstrated. The initial conditions for these simulations are included in Table 1, along with a comparison between the costs of the initial admissible PPC policy, the final one, and an optimal control policy obtained through a successive approximation strategy similar to [30]. The response of the system under the latter optimal policy was simulated for 3 s for the same initial conditions and is illustrated in Figure 4, along with the control effort it requests. While it is clear that this policy results in significantly less control effort and thus a decrease in cost, note that this comes at the expense of sacrificing prescribed performance, as dictated by the parameters of the adopted performance function. This becomes evident in Figure 4, where it can be seen that for the initial condition x_0^1, the tracking error σ_1(t) in our method stays within the region defined by the performance function, while the tracking error σ_2(t) escapes the region, indicating the other policy’s trade-off of prescribed performance for optimality.
Regarding our method, it can be easily observed that the cost was successfully decreased, owing mainly to the reduction in the required control effort and despite the fact that we confined the system evolution within the set where the approximation capabilities of the polynomial structure hold. Finally, Figure 5 shows the evolution of the transformed error. Notice that even though all initial conditions lie within Ω_z, if we obtain the weights w_L through the solution of the unconstrained least-squares problem (17), the trajectories of ε escape the set Ω_z, and τ* grows unbounded. However, by imposing a set of inequalities on w_L, as described in Section 4.2, the transformed error evolves strictly within Ω_z, where the polynomial approximation holds, and thus optimality along with the prescribed performance specifications are met.

5.2. Case B: 2-DOF Robotic Manipulator

In this section, we present simulation results that demonstrate the effectiveness of our methodology in deriving a near-optimal control law, while taking into consideration prescribed performance specifications, for a two-degree-of-freedom robotic manipulator, as illustrated in Figure 6. The robotic manipulator obeys the following dynamic model
M(q)\,\ddot{q} + C(q,\dot{q})\,\dot{q} + G(q) = \tau,
where q = [q_1, q_2]^T and q̇ = [q̇_1, q̇_2]^T denote the joint angular positions and velocities, respectively, M(q) is a positive definite inertia matrix, C(q, q̇) is the matrix that describes the Coriolis and centrifugal phenomena, G(q) is the vector describing the influence of gravity, and τ is the torque that acts as the system’s input.
More specifically, the inertia matrix is formulated as:
M = \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix}
where
M_{11} = I_{Z_1} + I_{Z_2} + \frac{m_1 l_1^2}{4} + m_2 \left( l_1^2 + \frac{l_2^2}{4} + l_1 l_2 c_2 \right), \quad M_{12} = M_{21} = I_{Z_2} + m_2 \left( \frac{l_2^2}{4} + \frac{1}{2} l_1 l_2 c_2 \right), \quad M_{22} = I_{Z_2} + m_2 \frac{l_2^2}{4}
In addition, the vector containing the Coriolis and centrifugal torques is defined as follows:
C(q,\dot{q})\,\dot{q} = \begin{bmatrix} -c\,\dot{q}_2 + k_1 & -c\,(\dot{q}_1 + \dot{q}_2) \\ c\,\dot{q}_1 & k_2 \end{bmatrix} \begin{bmatrix} \dot{q}_1 \\ \dot{q}_2 \end{bmatrix}
with c being c = (1/2) m_1 g l_1 l_2 s_2. Additionally, the gravity vector is given by
G(q) = \begin{bmatrix} \frac{1}{2} m_1 g l_1 c_1 + m_2 g \left( l_1 c_1 + \frac{1}{2} l_2 c_{12} \right) \\ \frac{1}{2} m_2 l_2 g\, c_{12} \end{bmatrix}
and the terms c_1, c_2, s_2 and c_{12} correspond to cos(q_1), cos(q_2), sin(q_2) and cos(q_1 + q_2), respectively. The values adopted for the simulation are depicted in Table 2, with m_i, I_{Z_i} and l_i denoting the mass, the moment of inertia and the length of link i, respectively, k_i being the joint friction coefficient and g the acceleration of gravity.
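For completeness, the manipulator model can be coded directly from the expressions above and the parameter values of Table 2, as in the following sketch; the sign placements follow the reconstructed equations, and the sample configuration is arbitrary.

```python
import numpy as np

# Parameter values from Table 2.
m1, l1, Iz1, k1 = 3.2, 0.5, 0.96, 1.0
m2, l2, Iz2, k2 = 2.0, 0.4, 0.81, 1.0
g = 9.81

def manipulator_dynamics(q, qdot):
    """Inertia matrix M(q), Coriolis/friction torques C(q, qdot) qdot and gravity G(q) of the 2-DOF arm."""
    q1, q2 = q
    qd1, qd2 = qdot
    c1, c2, s2, c12 = np.cos(q1), np.cos(q2), np.sin(q2), np.cos(q1 + q2)

    M = np.array([
        [Iz1 + Iz2 + m1*l1**2/4 + m2*(l1**2 + l2**2/4 + l1*l2*c2),
         Iz2 + m2*(l2**2/4 + 0.5*l1*l2*c2)],
        [Iz2 + m2*(l2**2/4 + 0.5*l1*l2*c2),
         Iz2 + m2*l2**2/4],
    ])
    c = 0.5 * m1 * g * l1 * l2 * s2                      # coefficient c as given in the text
    Cqd = np.array([
        [-c*qd2 + k1, -c*(qd1 + qd2)],
        [ c*qd1,       k2],
    ]) @ qdot
    G = np.array([
        0.5*m1*g*l1*c1 + m2*g*(l1*c1 + 0.5*l2*c12),
        0.5*m2*l2*g*c12,
    ])
    return M, Cqd, G

# Forward dynamics at a sample configuration: qddot = M^{-1} (tau - C qdot - G).
M, Cqd, G = manipulator_dynamics(q=np.array([0.3, -0.2]), qdot=np.array([0.1, 0.0]))
print(np.linalg.solve(M, np.zeros(2) - Cqd - G))
```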
In order to retrieve the manipulator’s dynamics, a reference trajectory was designed so that it traversed (in random order) 71 points scattered over the compact set Ω_q × Ω_q̇ = [−1, 1] × [−1, 1] × [−1, 1] × [−1, 1]. In order to track it, a PPC controller with gains k = 2 and λ = 7 was used, with performance specifications dictated by ρ_0^1 = π/3, ρ_∞^1 = π/180, l_1 = 1.5, ρ_0^2 = π/3, ρ_∞^2 = π/180 and l_2 = 1.5. In addition, the gains of the tracking differentiator were chosen as k_{z1} = k_{z2} = 0.2 and R = 10^5. For the system identification phase, a shallow neural network with one hidden layer containing 12 neurons was utilized to extract the system dynamics via the learning process, and the simulation was carried out for 20 iterations so that a sufficiently accurate approximation was obtained. The results of the identification problem are depicted in Figure 7 for the corresponding terms of f(q, q̇) = −M^{−1}(q)(C(q, q̇)q̇ + G(q)) and in Figure 8 for the corresponding terms of g(q) = M^{−1}(q).
In the second phase, we set a conventional PPC stabilizing controller, parametrized by k = 2, λ = 2, ρ_0^1 = ρ_0^2 = π/3, ρ_∞^1 = ρ_∞^2 = π/180 and l_1 = l_2 = 1.5, as the initial admissible control policy. Our aim was to drive the system towards the fixed configuration [q̄_{d_1}, q̄_{d_2}]^T = [π/6, π/6]^T. The points used for the regression process were sampled from the set:
\Omega_{q,\rho,\epsilon} = [-0.5, 0.5] \times [-0.45, 0.45] \times [\rho_\infty^1, 10\rho_\infty^1] \times [-0.5, 0.5] \times [-0.45, 0.45] \times [\rho_\infty^2, 10\rho_\infty^2]
For the integral cost, the weights α = 0.5 and β = 0.5 were chosen to specify an equal trade-off between the error convergence and the requested input energy. Moreover, P = 8000 points were chosen within Ω_{q,ρ,ε}, and the number of constraints was set to M = 1250. The cost function was approximated by 64 polynomial basis functions, chosen as σ(q, ρ, ε) = {T_{i_1}(q)·T_{i_2}(ε)·T_{i_3}(ρ)}, i_1, i_2, i_3 ∈ {1, 2}, where T_i(·) denotes the Chebyshev polynomial of degree i. The response of the system under the optimal policy was simulated for 3 s for four different initial conditions and is illustrated in Figure 9 and Figure 10, where the ability of the algorithm to stabilize the system under strict performance specifications is demonstrated. The initial conditions for these simulations are included in Table 3, along with a comparison between the costs of the initial admissible policy and the final optimal one. It is evident that the cost was successfully decreased, owing mainly to the reduction in the required control effort.

6. Conclusions

A near-optimal control policy with a predefined transient and steady-state response for uncertain Euler–Lagrange dynamics was proposed. The method consists of a neural network identifier, capable of accurately extracting the unknown dynamics, as well as an iterative process for solving the HJB equation in order to produce an optimal stabilizing control policy with prescribed performance. The aforementioned approach was demonstrated through extensive simulation studies on a pendulum and a two-degree-of-freedom robotic manipulator. Future work will focus on expanding this method so that it is applicable to other classes of systems, while taking into consideration input and state constraints.

Author Contributions

Conceptualization, C.P.B. and K.J.K.; methodology, I.M. and C.P.B.; software, I.M. and C.V.; validation, I.M. and C.V.; formal analysis, I.M. and C.V.; investigation, all; resources, C.P.B.; data curation, I.M. and C.V.; writing—original draft preparation, I.M. and C.V.; writing—review and editing, C.P.B. and K.J.K.; visualization, C.V.; supervision, C.P.B. and K.J.K.; project administration, C.P.B.; funding acquisition, C.P.B. All authors have read and agreed to the published version of the manuscript.

Funding

The work of C.V. and C.P.B. was funded by the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the second call for research projects to support postdoctoral researchers (HFRI-PD19-370).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data for this study are available from the corresponding author on request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Powell, W.B.; Ryzhov, I.O. Optimal Learning and Approximate Dynamic Programming. In Reinforcement Learning and Approximate Dynamic Programming for Feedback Control; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2012; Chapter 18; pp. 410–431.
  2. Kirk, D. Optimal Control Theory: An Introduction; Dover Books on Electrical Engineering Series; Dover Publications: Mineola, NY, USA, 2004.
  3. Lewis, F.L.; Vrabie, D. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst. Mag. 2009, 9, 32–50.
  4. Werbos, P. Elements of intelligence. Cybernetica 1968, 11, 131.
  5. Al-Tamimi, A.; Lewis, F.L.; Abu-Khalaf, M. Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2008, 38, 943–949.
  6. Wang, F.Y.; Jin, N.; Liu, D.; Wei, Q. Adaptive Dynamic Programming for Finite-Horizon Optimal Control of Discrete-Time Nonlinear Systems with ε-Error Bound. IEEE Trans. Neural Netw. 2011, 22, 24–36.
  7. Hornik, K.; Stinchcombe, M.; White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989, 2, 359–366.
  8. Jiang, Y.; Jiang, Z.P. Robust Adaptive Dynamic Programming and Feedback Stabilization of Nonlinear Systems. IEEE Trans. Neural Netw. Learn. Syst. 2014, 25, 882–893.
  9. Zhao, B.; Liu, D.; Luo, C. Reinforcement Learning-Based Optimal Stabilization for Unknown Nonlinear Systems Subject to Inputs with Uncertain Constraints. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 4330–4340.
  10. Gao, W.; Jiang, Z.P. Learning-Based Adaptive Optimal Tracking Control of Strict-Feedback Nonlinear Systems. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 2614–2624.
  11. Jiang, Y.; Gao, W.; Na, J.; Zhang, D.; Hämäläinen, T.T.; Stojanovic, V.; Lewis, F.L. Value iteration and adaptive optimal output regulation with assured convergence rate. Control Eng. Pract. 2022, 121, 105042.
  12. Chen, C.; Modares, H.; Xie, K.; Lewis, F.L.; Wan, Y.; Xie, S. Reinforcement Learning-Based Adaptive Optimal Exponential Tracking Control of Linear Systems with Unknown Dynamics. IEEE Trans. Autom. Control 2019, 64, 4423–4438.
  13. Kamalapurkar, R.; Dinh, H.; Bhasin, S.; Dixon, W.E. Approximate optimal trajectory tracking for continuous-time nonlinear systems. Automatica 2015, 51, 40–48.
  14. Na, J.; Lv, Y.; Wu, X.; Guo, Y.; Chen, Q. Approximate optimal tracking control for continuous-time unknown nonlinear systems. In Proceedings of the 33rd Chinese Control Conference, Nanjing, China, 28–30 July 2014; pp. 8990–8995.
  15. Na, J.; Lv, Y.; Zhang, K.; Zhao, J. Adaptive Identifier-Critic-Based Optimal Tracking Control for Nonlinear Systems with Experimental Validation. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 459–472.
  16. Zhao, K.; Song, Y.; Ma, T.; He, L. Prescribed Performance Control of Uncertain Euler–Lagrange Systems Subject to Full-State Constraints. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 3478–3489.
  17. Bhasin, S.; Kamalapurkar, R.; Johnson, M.; Vamvoudakis, K.; Lewis, F.; Dixon, W. A novel actor–critic–identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica 2013, 49, 82–92.
  18. Bechlioulis, C.P.; Rovithakis, G.A. Robust Adaptive Control of Feedback Linearizable MIMO Nonlinear Systems with Prescribed Performance. IEEE Trans. Autom. Control 2008, 53, 2090–2099.
  19. Yin, Z.; Luo, J.; Wei, C. Robust prescribed performance control for Euler–Lagrange systems with practically finite-time stability. Eur. J. Control 2020, 52, 1–10.
  20. Jabbari Asl, H.; Narikiyo, T.; Kawanishi, M. Bounded-input prescribed performance control of uncertain Euler–Lagrange systems. IET Control Theory Appl. 2019, 13, 17–26.
  21. Dong, H.; Zhao, X.; Luo, B. Optimal Tracking Control for Uncertain Nonlinear Systems with Prescribed Performance via Critic-Only ADP. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 561–573.
  22. Fortuna, L.; Frasca, M. Optimal and Robust Control: Advanced Topics with MATLAB®; CRC Press: Boca Raton, FL, USA, 2012.
  23. Dimanidis, I.S.; Bechlioulis, C.P.; Rovithakis, G.A. Output Feedback Approximation-Free Prescribed Performance Tracking Control for Uncertain MIMO Nonlinear Systems. IEEE Trans. Autom. Control 2020, 65, 5058–5069.
  24. Kosmatopoulos, E.B.; Ioannou, P.A. Robust switching adaptive control of multi-input nonlinear systems. IEEE Trans. Autom. Control 2002, 47, 610–624.
  25. Bechlioulis, C.P.; Rovithakis, G.A. Prescribed Performance Adaptive Control for Multi-Input Multi-Output Affine in the Control Nonlinear Systems. IEEE Trans. Autom. Control 2010, 55, 1220–1226.
  26. Lewis, F.L.; Vrabie, D.L.; Syrmos, V.L. Optimal Control; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2012.
  27. Guo, B.Z.; Zhao, Z. On convergence of tracking differentiator. Int. J. Control 2011, 84, 693–701.
  28. Rousseas, P.; Bechlioulis, C.; Kyriakopoulos, K.J. Harmonic-Based Optimal Motion Planning in Constrained Workspaces Using Reinforcement Learning. IEEE Robot. Autom. Lett. 2021, 6, 2005–2011.
  29. Potra, F.A.; Wright, S.J. Interior-point methods. J. Comput. Appl. Math. 2000, 124, 281–302.
  30. Abu-Khalaf, M.; Lewis, F.L. Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica 2005, 41, 779–791.
Figure 1. Graphical representation of (3).
Figure 2. Comparison between the actual values and their estimates for f(q, q̇) (top) and g(q) (bottom) for 300 random samples within Ω_q × Ω_q̇.
Figure 3. The state response of the calculated optimal PPC scheme (top). Comparison between the control effort τ − G(q̄_d) requested by the initial and the optimal PPC policies (bottom).
Figure 4. The state response of the calculated optimal scheme (top). Control effort τ − G(q̄_d) requested by the optimal policy (bottom left). Comparison of the tracking error’s evolution for the initial condition x_0^1 between our method (σ_1) and successive approximation (σ_2) (bottom right).
Figure 5. Evolution of the transformed error ε by utilizing w_L obtained from the constrained (left) and the unconstrained (right) least-squares problem. The dashed lines represent the values of ε on the boundary of Ω_z.
Figure 6. Two-DOF robotic manipulator.
Figure 7. Comparison between the actual values and their estimates for f_1(q, q̇) (bottom) and f_2(q, q̇) (top) for 200 random samples within Ω_q × Ω_q̇.
Figure 8. Comparison between the actual values and their estimates for g_11(q), g_12(q), g_21(q) and g_22(q) for 200 random samples within Ω_q × Ω_q̇.
Figure 9. The state response of the calculated optimal PPC scheme for various initial conditions.
Figure 10. Comparison between the control effort τ − G(q̄_d) requested by the initial and the optimal PPC policies.
Table 1. Comparison between the initial PPC policy and the final optimal one.

| Initial Condition [q(0), ρ(0), ε(0)] | Initial Policy’s Cost | Final Policy’s Cost | Final Policy’s Cost (Without Prescribed Performance) |
|---|---|---|---|
| [0.73, 0.148, 0.45] | 22.60 | 16.79 | 0.47 |
| [0.2, 0.165, 0.25] | 45.54 | 34.83 | 1.01 |
| [0.56, 0.174, 0.37] | 3.21 | 1.59 | 0.05 |
| [0.37, 0.173, 0.37] | 6.63 | 1.4 | 0.21 |
| [0.43, 0.11, 0.13] | 3.90 | 1.80 | 0.05 |
Table 2. Two-DOF robotic manipulator parameter values.

| m_1 (kg) | l_1 (m) | I_Z1 (kg·m²) | k_1 (kg/s) | m_2 (kg) | l_2 (m) | I_Z2 (kg·m²) | k_2 (kg/s) | g (m/s²) |
|---|---|---|---|---|---|---|---|---|
| 3.2 | 0.5 | 0.96 | 1 | 2.0 | 0.4 | 0.81 | 1 | 9.81 |
Table 3. Comparison between the initial PPC policy and the final optimal one.

| Initial Condition [q_1(0), ρ_1(0), ε_1(0)], [q_2(0), ρ_2(0), ε_2(0)] | Initial Policy’s Cost | Final Policy’s Cost |
|---|---|---|
| [0.84, 0.17, 0.159], [0.12, 0.108, 0.11] | 5.24 | 1.18 |
| [0.05, 0.145, 0.29], [1.02, 0.106, 0.134] | 7.24 | 4.16 |
| [0.69, 0.171, 0.425], [0.94, 0.101, 0.09] | 5.69 | 1.01 |
| [1.01, 0.118, 0.42], [0.165, 0.149, 0.27] | 5.66 | 1.39 |