Model-Free Optimized Tracking Control Heuristic

Abstract: Many tracking control solutions proposed in the literature rely on various forms of tracking error signals at the expense of possibly overlooking other dynamic criteria, such as the control effort, overshoot, and settling time. In this article, a model-free control architectural framework is presented to track reference signals while optimizing other criteria as per the designer's preference. The control architecture is model-free in the sense that the plant's dynamics do not have to be known in advance. To this end, we propose and compare four tracking control algorithms which synergistically integrate a few machine learning tools to compromise between tracking a reference signal and optimizing a user-defined dynamic cost function. This is accomplished via two orchestrated control loops, one for tracking and one for optimization. Two control algorithms are designed and compared for the tracking loop. The first is based on reinforcement learning, while the second is based on a nonlinear threshold accepting (NLTA) technique. The optimization control loop is implemented using an artificial neural network. Each controller is trained offline before being integrated into the aggregate control system. Simulation results of three scenarios of various complexities demonstrate the effectiveness of the proposed control schemes in forcing the tracking error to converge while minimizing a pre-defined system-wide objective function.

A neural network is employed to approximate the tracking strategy for a time-varying dynamical environment with coupling uncertainties.
The NLTA heuristic relies on a nonlinear accepting transfer function such as the one used in low-pass filters. It was developed to solve NP-hard problems in [8,31]. In [8], NLTA is adopted to tune the PID control gains of an interactive multi-area power system network to solve a combined voltage and frequency regulation problem. The results outperformed other analytical and heuristic solutions in terms of the closed-loop time response characteristics. In addition, the NLTA algorithm is applied to solve redundancy allocation problems by optimizing the reliability of the underlying systems in [32]. In [? ], the algorithm is employed to solve a non-convex economic dispatch problem using various non-convex objective functions.
Inspired by biological neurons, artificial neural networks are composed of multiple layers of interconnected processing elements (known as neurons or nodes) [34]. The network is organized into layers, where the first and last layers are known as the input and output layers, respectively, while the layers in between are referred to as the hidden layers. Nodes from different layers are connected through weights reflecting how strong or weak the connection between the nodes is [35]. In a supervised learning approach, the training process of a neural network is accomplished by iteratively adjusting the connection weights between the different layers [36]. Neural networks have been widely applied as artificial intelligence tools to solve many complex problems [37][38][39][40][41]. For instance, they were applied to forecast the electric loads in power systems, where they outperformed other standard regression methods [42]. A class of adaptive control problems is solved using neural networks in [43]. An adaptation algorithm that employs an online neural network is developed for nonlinear flight control problems in [44]. A multi-dimensional signal processing problem is solved using an extended quaternionic feedforward neural network scheme in [45]. Algorithms based on Levenberg-Marquardt optimization schemes are used for neural network training in [46,47]. An optimization method based on a Bayesian approach is employed to learn the dynamics of a process in an active learning framework in order to maximize the control performance [48]. It does not depend on prior knowledge of the dynamic model of the process. Instead, the dynamics are learned by observing the behavior of the physical system. This is combined with an optimal control structure to develop the full controller.
A robust control approach is developed for a nonlinear system with unmatched uncertainties, where a critic neural network is adopted to solve the underlying Hamilton-Jacobi-Bellman equation [49].
Tracking control processes differ in various ways. Some employ complicated forms of control laws based on sliding mode approaches, which may be inconvenient to implement in digital control environments [18,19,23,50]. Furthermore, the control laws may involve higher-order tracking error terms in addition to the nonlinearities embedded in the system. Some sliding mode techniques used for multi-agent tracking systems involve several coupled layers of control laws [51]. Several tracking techniques use adaptive Jacobian approaches, which encounter difficulties in multi-task tracking control operations [22]. On the other hand, optimal tracking algorithms often depend on prior knowledge of the system dynamics. This is required by optimal control laws, which are typically computed by solving several coupled differential equations [52]. This becomes more complicated as the order of the dynamic model increases [19]. Tracking approaches may also resort to sophisticated performance indices, system transformations, and multi-stage control methods in order to solve the underlying tracking problems for some classes of nonlinear systems [21,50].
Herein, artificial intelligence (AI) tools are conditioned to solve tracking control problems while addressing some of the above-mentioned challenges. The idea is based on designing separate, flexible, easy-to-implement tracking and optimization units whose complexity does not grow with the plant's complexity. To do so, the overall optimized tracking process is divided into two sub-tasks. The first minimizes the tracking error, while the second minimizes other dynamic signals within a global, flexible, user-defined cost function. Using one overall objective function can bias the quality of the tracking control law in addition to increasing the state and action space dimensions of a technique such as Q-Learning. Splitting the control objective into two sub-tasks and introducing simply structured performance indices allows the problem to be solved without applying any system transformations or over-estimating the nonlinearity of the control laws. This technique can be very useful in complex cases, such as cooperative control or multi-agent systems, where multiple layers can be employed to solve the tracking problem among the agents themselves and the desired reference model, in addition to optimizing other relevant indices.
As such, the paper's main contribution is the development of a modular model-free optimized control architecture to track reference signals while optimizing a set of designer-specific criteria within a global objective function. The proposed architecture is model-free where no prior information is required about the system dynamics. The technique is modular in the sense that it can be applied to a large class of linear and nonlinear systems, such as single and multi-agent systems, with virtually the same architecture.
The rest of this paper is organized as follows: Section 2 presents the overall system architecture. Section 3 details the two proposed tracking controllers, which are based on RL and NLTA. An optimization scheme based on a feedforward neural network is introduced in Section 4 to optimize the total cost function. The tracking and optimization units are integrated into an aggregate control system in Section 5. Section 6 validates the performance of the proposed control schemes using different simulation scenarios. Finally, a few concluding remarks are made in Section 7.

Control Architecture
This section lays out the architecture of the developed optimized tracking mechanisms for a class of dynamic systems. The objective of the tracking control problem is to find the best control strategies by optimizing the tracking error. Two methods are proposed for this purpose. The first is based on RL while the second is based on NLTA. Both controllers are supplemented with a neural network to optimize an overall cost function.

Tracking Control
RL provides a decision-making mechanism where the agent learns its best strategy u in a dynamic environment in order to transition from one state E_s to a new state E_{s+1} while maximizing a reward R_{s+1}, as shown in Figure 1. The RL-based controller is founded on a Markov decision process employed to decide on the tracking control signals, as shown in Figure 2. Given a vector X_k ∈ R^n of measurable states at a discrete time index k, the tracking process compares the output r_k^actual ∈ R, which is also one of the states (i.e., r_k^actual ∈ X_k{j}, j ∈ {1, ..., n}), with its desired value (i.e., the reference signal) r_k^desired ∈ R and computes the resulting tracking error e_k = r_k^desired − r_k^actual ∈ R. The respective tracking control signal u_k^RL ∈ R is decided by u_k^RL = Q(E_s), where Q(·) is a mapping from an evaluation state E_s = {e_{k−3}, e_{k−2}, e_{k−1}, e_k} to the associated best tracking control strategy u^RL, chosen from a range of feasible discrete values using an optimized Q-Table learned from an RL process, and s is the state index.

The NLTA-based controller is similar in architecture to its RL-based counterpart. However, instead of RL, it uses the NLTA algorithm [8] to automatically tune the gains (K_p, K_i, K_d) of a PID controller, as shown in Figure 3. The resulting control signal u^NLTA ∈ R takes the following form:

u^NLTA(t) = K_p e(t) + K_i ∫_0^t e(τ) dτ + K_d de(t)/dt,

where e(t) is the continuous-time tracking error at time t. The training of the NLTA-based controller is conducted in continuous time because NLTA is inherently a continuous-time algorithm. However, the controller is discretized after the training phase so that it can be integrated into the discrete-time aggregate control structure, as shall be explained later. That way, the RL- and NLTA-based controllers can be benchmarked on a fair basis.
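For concreteness, the PID law above can be sketched in simulation form. This is an illustrative sketch only: the gain values approximate the initial Matlab PID Tuner values reported later in the simulation section (not the NLTA-optimized gains), and the forward-Euler integration of the error is an assumption.

```python
# A minimal sketch of the PID tracking law u = Kp*e + Ki*int(e) + Kd*de/dt.
# Gains and the forward-Euler integration scheme are illustrative assumptions.

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        # accumulate the integral term and approximate the derivative
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (self.kp * error + self.ki * self.integral
                + self.kd * derivative)

pid = PID(kp=9.93, ki=4.68, kd=3.56, dt=0.01)
u0 = pid.step(0.5)  # control output for a 0.5 rad tracking error
```

In a closed loop, `step` would be called once per sample with the current tracking error, and its output applied to the plant.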

Cost Function Optimization
Standalone tracking processes usually overlook the optimization characteristics of the overall dynamic performance alongside the tracking process. Therefore, it is useful to adjust the dynamic tracking control signal in order to improve the overall optimization features of the underlying control system. In this work, this is achieved through an auxiliary control signal u_k^NN ∈ R, as shown in Figure 4. To this end, a neural network is trained using an optimized performance criterion. Unlike the tracking control schemes of Figures 2 and 3, the neural network takes the full state feedback measurements X_k as input in order to advise the supporting control signal u_k^NN, such that u_k^NN = f(X_k), where f is the input-output mapping of the neural network.

Figure 4. Neural network-based optimization control scheme.

Aggregate Control System
Each of the above tracking control schemes is augmented with the optimization control system of Figure 4. As such, the overall control laws of the RL- and NLTA-based controllers are defined as u_k^T = u_k^RL + u_k^NN and u_k^T = u_k^NLTA + u_k^NN, respectively. The dynamical system (i.e., the plant) may be defined in the time domain by either a linearized continuous- or discrete-time state-space model,

Ẋ(t) = A^c X(t) + B^c u^T(t)   or   X_{k+1} = A^d X_k + B^d u_k^T,

where the superscripts "c" and "d" denote the continuous- and discrete-time matrices, respectively. For the purpose of this study, which is conducted in the discrete-time domain, the plant model needs to be discretized if presented in the continuous-time domain. In the following, we will detail the learning paradigms of each of the control schemes.
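The continuous-to-discrete conversion can be sketched as follows. The first-order (forward Euler) approximation below is a simple stand-in, since the text does not state which discretization is used (a zero-order hold with A_d = e^{A_c T_s} would be the exact alternative); the matrices are a toy system, not the aircraft model.

```python
import numpy as np

# Converting dX/dt = A_c X + B_c u into X_{k+1} = A_d X_k + B_d u_k via a
# forward-Euler approximation (an assumption; ZOH would be exact).

def discretize_euler(A_c, B_c, Ts):
    n = A_c.shape[0]
    A_d = np.eye(n) + Ts * A_c       # A_d ~ I + Ts*A_c
    B_d = Ts * B_c                   # B_d ~ Ts*B_c
    return A_d, B_d

A_c = np.array([[0.0, 1.0], [-2.0, -3.0]])   # toy system, not the aircraft
B_c = np.array([[0.0], [1.0]])
A_d, B_d = discretize_euler(A_c, B_c, Ts=0.01)
```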

RL-Based Tracking Control Algorithm
The RL process trains the tracking controller to opt for the optimal policies that minimize the tracking error. This is done through an iterative training process which continuously updates a Q-Table by penalizing or rewarding the taken actions. The table keeps track of the maximum expected future rewards for each feasible state-action pair (E_s, u^RL), where E_s = [e_{k−3} e_{k−2} e_{k−1} e_k]^T. An ε-greedy algorithm is applied to trade off between the exploitation and exploration of the training process. The update of the Q-Table entries follows the Q-Learning pattern, defined by [7]

Q(E_s, u^RL) ← Q(E_s, u^RL) + α [ R(E_s, u^RL) + γ max_u Q(E_{s+1}, u) − Q(E_s, u^RL) ],   (3)

where s is the agent state index, α is a learning rate, γ is a discount factor, and R(E_s, u^RL) is the training reward for taking action u^RL at state E_s. The process is detailed in Algorithm 1. Note that the accuracy of the obtained control signals u^RL depends on the discretization steps of the states and the tracking errors: the finer the resolution, the longer the learning may take.
Algorithm 1 Reinforcement Learning: Offline Computation of the Q-Table
Input: Minimum and maximum bounds X_min, X_max, u^RL_min, u^RL_max, E_min, and E_max of the variables X, u^RL, and e, respectively; discretization steps ∆X, ∆E, and ∆u^RL of the variables X, e, and the control signal u^RL, respectively.
Output: Q-Table
1: Initialize the Q-Table randomly
2: repeat
3:   Update the entries of the Q-Table using (3)
4: until a convergence criterion is met
5: return Q-Table
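Algorithm 1 can be sketched as follows on a toy scalar plant (x_{k+1} = x_k + 0.1 u_k, an assumption made purely for illustration; the exploration probability and episode count are likewise illustrative). The Q-Table maps discretized tracking errors to discrete actions and is updated with the rule in (3):

```python
import random

# Offline Q-Table computation in the spirit of Algorithm 1, on a toy scalar
# plant x_{k+1} = x_k + 0.1*u_k (an illustrative assumption; the aircraft
# model is not reproduced here). States are discretized tracking errors.

random.seed(0)
ACTIONS = [-1.0, 0.0, 1.0]          # feasible discrete control values
N_STATES = 21                       # error bins covering [-1, 1]
ALPHA, GAMMA = 0.5, 0.9             # learning rate and discount, as in the text
EXPLORE = 0.2                       # epsilon-greedy exploration (illustrative)

def bin_error(e):
    e = max(-1.0, min(1.0, e))
    return int(round((e + 1.0) * (N_STATES - 1) / 2.0))

Q = [[random.random() for _ in ACTIONS] for _ in range(N_STATES)]

for episode in range(2000):
    x, ref = random.uniform(-1.0, 1.0), 0.0
    for _ in range(50):
        s = bin_error(ref - x)
        if random.random() < EXPLORE:
            a = random.randrange(len(ACTIONS))                   # explore
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q[s][i])  # exploit
        x = x + 0.1 * ACTIONS[a]                                 # toy plant
        s_next = bin_error(ref - x)
        reward = -abs(ref - x)                                   # penalize error
        # Q-Learning update, eq. (3)
        Q[s][a] += ALPHA * (reward + GAMMA * max(Q[s_next]) - Q[s][a])

# With error +0.8 (output below the reference) the greedy policy should
# push the output upward.
best = max(range(len(ACTIONS)), key=lambda i: Q[bin_error(0.8)][i])
```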

NLTA-Based Tracking Control Algorithm
With the NLTA-based tracking method, a PID controller is applied where the PID gains are tuned using an NLTA algorithm [8]. The NLTA technique adopts the concept of a low-pass filter, whose transfer function is H(s) = 1/(1 + s/ω_0) for some nominal frequency ω_0. In particular, it bases its heuristics on the magnitude of the transfer function in the frequency domain, |H(jω)| = 1/√(1 + (ω/ω_0)²), for some frequency ω. The magnitude function controls the convergence speed of the search process within the domains of the optimized variables. The NLTA algorithm optimizes the control gains by minimizing a tracking error-based objective cost function. In this work, the objective function is chosen to be the Integrated Squared Error (ISE), defined as

ISE = ∫ e²(τ) dτ,   (5)

evaluated over the simulation horizon, with e(τ) being the tracking error at continuous time τ. Full details of how the NLTA is applied to optimize the gains of the PID controller are provided in Algorithm 2.

Algorithm 2 NLTA: Offline Tuning of the PID Gains
Output: Optimized PID gains (K_p, K_i, K_d).
1: for p = 1 to N_EPISODES do  ▷ Beginning of an episode
2:   Randomly initialize the PID gains K_p, K_i, K_d to some stable values  ▷ Stability may be verified by simulating the system in Figure 3
3:   Simulate the system in Figure 3 and calculate the ISE using (5)
4:   Smallest_ISE_p ← ISE  ▷ Smallest ISE in the episode
6:   ▷ Beginning of an iteration within an episode
8:   Randomly select one PID gain candidate K_p, K_i, or K_d from its respective range
9:   Simulate the system in Figure 3 with the PID gain candidate selected in line 8, and calculate the ISE using (5)
10:  ▷ The second condition allows for exploration to avoid local minima
12:  Replace the PID gain with its candidate value  ▷ The one selected in line 8
13:  Smallest_ISE_p ← ISE
14:  if (ω − ∆ω) > 0 then
15:    ω ← ω − ∆ω  ▷ To control the convergence speed of the search process
16:  end if
17: end if
18: end for
19:
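The spirit of the NLTA tuning loop can be sketched as follows. This is a loose sketch under several assumptions: a toy first-order plant replaces the closed loop of Figure 3, and an acceptance margin shaped by the low-pass magnitude |H(jω)| stands in for the exact NLTA accepting condition. Only the overall structure (single-gain perturbations, occasional acceptance of worse candidates for exploration, and a decreasing ω after improvements) follows Algorithm 2.

```python
import math
import random

# Loose sketch of NLTA-style PID tuning. The plant, the perturbation sizes,
# and the acceptance-margin formula are all illustrative assumptions.

random.seed(1)

def simulate_ise(kp, ki, kd, dt=0.01, steps=500):
    # Track a unit step with a PID on the toy plant dx/dt = -x + u,
    # returning the ISE of eq. (5).
    x, integ, prev_e, total = 0.0, 0.0, 1.0, 0.0
    for _ in range(steps):
        if abs(x) > 1e6:                  # guard against unstable gains
            return float("inf")
        e = 1.0 - x
        integ += e * dt
        u = kp * e + ki * integ + kd * (e - prev_e) / dt
        prev_e = e
        x += dt * (-x + u)
        total += e * e * dt
    return total

w0, w, dw = 1.0, 50.0, 0.5                # illustrative NLTA parameters
gains = [1.0, 0.5, 0.01]
best_gains, best_ise = list(gains), simulate_ise(*gains)
cur_ise = best_ise
for _ in range(300):
    cand = list(gains)
    i = random.randrange(3)               # perturb one gain at a time
    cand[i] = max(0.0, cand[i] + random.uniform(-0.5, 0.5))
    ise_c = simulate_ise(*cand)
    margin = 1.0 / math.sqrt(1.0 + (w / w0) ** 2)   # |H(jw)| of the filter
    if ise_c < cur_ise * (1.0 + margin):  # also accepts slightly worse moves
        gains, cur_ise = cand, ise_c
        if ise_c < best_ise:
            best_gains, best_ise = list(cand), ise_c
            if w - dw > 0:
                w -= dw                   # tighten the search after improvement
```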

Neural Network Optimization Algorithm
A nonlinear state feedback control law u^NN is developed to optimize the dynamic performance of the control scheme. It is added to the tracking control effort to form the aggregate control signal. Although the Q-Table in its simplest form can be used to control the dynamic system, it results in non-smooth discrete actions and, consequently, an undesired scattered dynamic performance [53]. Therefore, a neural network trained using this Q-Table is added to generate a continuous (non-discrete) control signal, which smooths out the system's behavior.
To this end, two feedforward multi-layer perceptrons are trained offline for comparison purposes. Only one of them is eventually integrated into the closed-loop control system to generate an optimized value of u^NN, as shall be detailed later. The training data for each neural network is prepared using a separate Q-Table with a similar structure to the one employed in the RL-based tracking controller, except that this time the full state vector is considered instead of the tracking error combination.
The reason behind trying two neural networks is the ability to test and compare the following two objective cost functions:

F_1(k) = X_k^T S X_k + Z u_k²,   (6a)

F_2(k) = X_{k+1}^T S X_{k+1} + Z u_k²,   (6b)

where S ∈ R^{n×n} and Z ∈ R are weighting factors reflecting the designer's preference of how the state and the control effort are prioritized. Notice how the second performance index F_2 gives no importance to the current state, unlike F_1; instead, it is driven by the future state. In the following, we will denote the neural networks trained using objective functions F_1 and F_2 by NN_1 and NN_2, respectively, while their outputs are denoted by u^{NN_1} and u^{NN_2}.
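As a sketch, the two cost functions can be evaluated as follows, under the assumption that F_1 penalizes the current state and F_2 the successor state (the form consistent with the total energy cost reported in the performance analysis), with illustrative weights S and Z:

```python
import numpy as np

# Sketch of (6a) and (6b). The assumed forms: F1 weighs the current state
# X_k, F2 the successor state X_{k+1}; both add a control-effort term.

def F1(x_k, u_k, S, Z):
    return float(x_k @ S @ x_k + Z * u_k ** 2)

def F2(x_next, u_k, S, Z):
    return float(x_next @ S @ x_next + Z * u_k ** 2)

S = np.eye(2)          # illustrative weights, not the paper's values
Z = 0.1
x_k = np.array([0.5, -0.2])
x_next = np.array([0.4, -0.1])
u = 0.3
f1, f2 = F1(x_k, u, S, Z), F2(x_next, u, S, Z)
```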
It is worth mentioning that the cost function can also include other terms that may be of interest to the design. For instance, it can include terms related to the transient response of the system, such as the overshoot, settling time, rise time, etc. This is an appealing feature of the proposed architecture as it gives the flexibility of optimizing the terms of choice without increasing the controller's complexity.
The training samples are arranged in two steps. The first associates the states to all possible control actions, while the second applies an optimization criterion to decide on the control action to be applied at any given state [53]. The complete training process of the neural networks is presented in Algorithm 3.

Algorithm 3 Offline Training of the Neural Network Optimizers
Input: Minimum and maximum bounds X_j^min, X_j^max of every state X_j ∈ X, j = 1, 2, ..., n, and its discretization step ∆X_j; minimum and maximum bounds u^NN_min and u^NN_max of u^NN, and its discretization step ∆u^NN.
Output: Trained neural networks NN_1 and NN_2
1: Form the N_X discrete state combinations using the state bounds and the steps ∆X_j
2: Form the N_u discrete actions using the bounds of u^NN and the step ∆u^NN
3: Create Q-Tables Q_1(1...N_u, 1...N_X) and Q_2(1...N_u, 1...N_X), whose rows and columns correspond to the possible discrete actions and states formed in lines 1 and 2, respectively
4: Populate Q_1 and Q_2 using cost functions (6a) and (6b), respectively
5: Create and initialize two feedforward multi-layer perceptrons, NN_1 and NN_2, with n inputs (corresponding to samples of states X_1, X_2, ..., X_n) and one output (corresponding to the action u)
6: Train NN_1 and NN_2 using the data in Q_1 and Q_2, respectively
7: return NN_1 and NN_2

It is worth mentioning that, unlike the RL algorithm, the discretization steps ∆X_j, j = 1, ..., n, and ∆u^NN for the measured states and the control signals, respectively, can be refined as needed without running into massive calculation overhead [53]. This is because the optimization process associated with each state follows a different optimization and approximation procedure: it is accomplished in one pass, unlike the RL approach, which employs successive search episodes, as explained earlier.
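The data-preparation steps of Algorithm 3 can be sketched as follows on a toy two-state system. The dynamics, weights, and grid sizes are illustrative assumptions, and a linear least-squares fit stands in for the multi-layer perceptron to keep the sketch short:

```python
import numpy as np

# Algorithm 3 in miniature: enumerate a state grid and a discrete action
# set, score every pair with a cost playing the role of (6a), keep the best
# action per state, and fit an approximator to the resulting pairs.

states = [np.array([x1, x2])
          for x1 in np.linspace(-1.0, 1.0, 20)
          for x2 in np.linspace(-1.0, 1.0, 20)]
actions = np.linspace(-2.0, 2.0, 20)
S, Z = np.eye(2), 0.1

def cost(x, u):
    x_next = 0.9 * x + np.array([0.0, 0.1]) * u   # assumed toy dynamics
    return float(x_next @ S @ x_next + Z * u ** 2)

# One pass over the grid: best action per state (no episodic search needed)
X = np.array(states)
y = np.array([actions[np.argmin([cost(x, u) for u in actions])]
              for x in states])

# Fit u_NN ~ f(X); a linear map suffices for this toy problem
W, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)
u_pred = np.array([0.0, 1.0, 1.0]) @ W  # predicted action at state (0, 1)
```

The single-pass structure is what makes refining the grids cheap, in contrast with the episodic RL search.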

Aggregate Control Scheme
After the offline training of the RL- and NLTA-based trackers along with the NN optimizer, they are integrated into two feedback systems of the same architecture, as shown in Figure 5. The first system uses the RL-based tracker u^RL, while the second employs the NLTA-based tracker u^NLTA. One of the goals of this work is to compare the performances of both systems. The aggregate control signal u_k^T applied to the dynamic system under consideration is either u_k^T = u_k^RL + u_k^NN or u_k^T = u_k^NLTA + u_k^NN, depending on the type of tracking controller used. With a slight abuse of notation, u^NN is used in the rest of the paper to denote either u^{NN_1} or u^{NN_2}, depending on the NN optimizer used. The two systems operate in discrete time. To adapt the NLTA-based tracker to this architecture, it needed to be discretized, since it is a continuous-time controller in nature. In other words, we needed to discretize the continuous-time PID controller whose gains were optimized using the NLTA algorithm.

The transfer function of a discrete-time PID controller in the z-domain is

C(z) = K_p + K_i T_s z/(z − 1) + (K_d/T_s)(z − 1)/z,

with T_s being the sampling period. This yields

U(z)(1 − z^{-1}) = E(z)[K_p(1 − z^{-1}) + K_i T_s + (K_d/T_s)(1 − z^{-1})²].

Converting to a difference equation leads to the following discrete-time control expression:

u_k = u_{k−1} + K_p(e_k − e_{k−1}) + K_i T_s e_k + (K_d/T_s)(e_k − 2e_{k−1} + e_{k−2}).

The proposed machine learning processes rely on heuristic tools and neural network approximations. Once the offline training phases are complete, the resulting units are applied in the closed-loop aggregate structure. The NLTA relies on a low-pass filter-like accepting function to search for the best feasible combination of control gains. On the other hand, the reinforcement learning maximizes a cumulative reward in order to transition from one state to another while converging to an equilibrium in the process of minimizing the tracking error.
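The discretized PID difference equation is straightforward to implement. The sketch below assumes the standard backward-difference (velocity) form; only u_{k−1}, e_{k−1}, and e_{k−2} need to be stored between samples:

```python
# Discrete-time PID in velocity form (assumed backward-difference
# discretization): u_k = u_{k-1} + Kp*(e_k - e_{k-1}) + Ki*Ts*e_k
#                        + (Kd/Ts)*(e_k - 2*e_{k-1} + e_{k-2})

class DiscretePID:
    def __init__(self, kp, ki, kd, ts):
        self.kp, self.ki, self.kd, self.ts = kp, ki, kd, ts
        self.u_prev = 0.0
        self.e_prev = [0.0, 0.0]     # e_{k-1}, e_{k-2}

    def step(self, e_k):
        e1, e2 = self.e_prev
        u_k = (self.u_prev
               + self.kp * (e_k - e1)
               + self.ki * self.ts * e_k
               + self.kd / self.ts * (e_k - 2.0 * e1 + e2))
        self.u_prev, self.e_prev = u_k, [e_k, e1]
        return u_k

pid = DiscretePID(kp=1.0, ki=1.0, kd=0.0, ts=0.1)
u1 = pid.step(1.0)   # proportional + integral contribution
u2 = pid.step(1.0)   # constant error: the integral term keeps accumulating
```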

Simulation Setup
The proposed control architecture is applied to control the navigation of an autonomous flexible wing aircraft. This type of aircraft is described as a two-body system which is composed of a wing and a pilot/fuselage connected by a hang strap [54]. In a manned system, the pilot controls the aircraft by pushing, pulling, and rolling the control-bar which adjusts the pilot's center relative to that of the wing [55]. The unmanned control process of flexible wing aircraft possesses many challenges. This is mainly because the system is extremely difficult to model due to its continuously varying aerodynamics [54][55][56].
The motion of an unmanned flexible wing aircraft can be decoupled into longitudinal and lateral motion frames. Herein, only the lateral motion control is tackled to validate the performance of the proposed tracking schemes. A vector of measurable states X = [v φ̇ ϕ̇ φ ϕ]^T is considered, where v, φ, and ϕ are the lateral velocity, roll attitude, and yaw attitude, respectively, and φ̇ and ϕ̇ are the corresponding roll and yaw rates.
The aircraft is required to follow a desired rolling maneuver (i.e., r_k^desired) and to undergo continuous opposite banking turns over the simulation horizon, with the following initial conditions: X_0 = [10 m/s 0.5 rad/s 0.5 rad/s 0 0]^T, where T is the total number of simulation iterations. Note that T = T_d/T_s, with T_d = 200 s and T_s = 0.01 s being the total duration of the simulation and the sampling time, respectively.
The reward function R is determined by a scalar convex cost criterion δ_k = e_k² + 0.9 e_{k−1}² + 0.8 e_{k−2}² + 0.7 e_{k−3}², such that a positive reward is assigned whenever the dynamic cost δ_{k+1} is less than δ_k, with the highest reward achieved at equilibrium. Although this is a model-free control architectural heuristic, it does depend on an a priori known subset of a stable region for the gains to be optimized, which is explored as a search space by the optimization algorithms. In general, this region can be obtained through an empirical study and/or using some already known facts about the system whenever available. For instance, with PID control gains, we already know that they ought to be positive. In addition, in this study, the search range of the PID gains is selected around an initial value (K_p = 9.9345, K_i = 4.6778, K_d = 3.5585) determined by Matlab's PID Tuner so as to minimize the settling time. This tool can search for control gains that compromise between having a larger closed-loop bandwidth (i.e., a faster response) and having enough gain and phase margins to ensure the robustness of the tuning outcome. The NLTA parameter ∆ω should be selected small relative to the resonance frequency ω_0 in order to control the speed of the search process. The NLTA algorithm parameters and the resultant optimized PID gains are listed in Tables 1 and 2. To show the effect of some parameters on the NLTA algorithm, the control system is simulated using the PID gains obtained by the NLTA algorithm executed with different parameter values. The ISE of these runs is plotted in Figure 6. It is clear that running the controller with the PID gains optimized by the NLTA algorithm leads to a far better ISE than with the initial PID gains.
Table 1. "Initial PID Gains" refers to the case where the PID gains were not tuned by the NLTA.
The other two cases are related to the PID gains obtained by running the NLTA algorithm with the parameters in Table 1, except the ones specified by the legend labels. In Q-Learning, the states and the control actions are discretized within their feasible limits, where the discretization steps decide and control the overall computational time taken by the learning process. In this work, the learning parameter α is set to 0.5 in order to trade off between the previous knowledge and the new reward or learning experience. The exploration rate is set according to an ε-greedy algorithm to avoid falling into local maxima. The other learning parameter values are taken as ε = 0.9 and γ = 0.9. The rest of the parameters are listed in Table 3. Figure 7 shows the ISE plots corresponding to RL-optimized PID gains obtained using different RL learning parameters. The results justify the parameter choices specified in Table 3.
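The convex cost δ_k defined above, together with an assumed ±1 reward rule (only the sign logic, a positive reward when the cost decreases, is specified), can be sketched as:

```python
# delta_k exactly as defined above; the +/-1 reward values are an
# assumption standing in for the exact reward expression.

def delta(e, k):
    # e is a list of tracking errors; requires k >= 3
    return e[k]**2 + 0.9 * e[k-1]**2 + 0.8 * e[k-2]**2 + 0.7 * e[k-3]**2

def reward(e, k):
    # positive reward when the dynamic cost decreases from step k to k+1
    return 1.0 if delta(e, k + 1) < delta(e, k) else -1.0

errors = [1.0, 0.8, 0.6, 0.5, 0.4]   # a decaying error history
r = reward(errors, 3)                # cost is decreasing here
```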
To assess the performance of cost functions (6a) and (6b), a neural network optimizer is assigned to each of them, namely NN_1 and NN_2, respectively. The structure and training parameters of the neural network optimizers are summarized in Table 4. The weighting matrices S and Z are selected to weigh the importance of the different dynamic signals in each of the cost functions. At first, both neural network optimizers are compared based on their accumulated cost ∑_{k=1}^T F_i(k), i ∈ {1, 2}, to decide on the one to adopt in the optimization loop of the aggregate control system (Figure 5). The results are illustrated in Figure 8. They show that NN_2 yields a lower cumulative dynamic cost compared to that of NN_1. As such, NN_2 is chosen as the control optimizer in the subsequent simulations. To demonstrate the neural network's sensitivity to some of its parameters and the appropriateness of the values adopted in Table 4, the control system is run with NN_2 after it is trained with different numbers of hidden neurons (NOH) and discretization steps (NOS) of the state and action spaces. The latter is specified as 20 in Table 4. The results are displayed in Figure 9. The figure reveals the importance of having a finer NOS; however, too small a value may make the learning time excessively long. The number of hidden neurons is chosen to be just enough to capture the system's complexity without risking overtraining the network.
Herein, we opted to focus more on the structure of the AI mechanisms after testing sufficient sets of initial learning environments and choosing the best combinations that compromise among our objectives. In the following, we will refer to the RL-and NLTA-based trackers with and without the neural network optimizer as RL, RL+NN, NLTA, and NLTA+NN.

Performance Analysis
Three simulation scenarios are considered. The first adopts the dynamic model of the aircraft at the nominal trim speed and flight condition. In the second scenario, a sudden variation in the dynamic model is applied at time t_s = 20 s (corresponding to k_s = t_s/T_s = 2000). It aims at assessing the controller's performance in the face of a sudden drop in the aircraft's payload, for example. Finally, a more aggressive simulation scenario is considered, where the dynamics of the aircraft are allowed to vary around their nominal values (A_1^d and B_1^d) at each evolution step k. The variation of each state (shown in Figure 10) is drawn from a normal distribution of zero mean and a variance of 0.5. The dynamics of the three scenarios are summarized in Table 5, where each element (i, j) of the matrices A_k^d and B_k^d is perturbed around its nominal value by random factors σ_A, σ_B ∼ N(0, 0.5), respectively. The discrete-time state-space matrices are adopted from [55]. The aircraft banking simulation results of the first scenario are shown in Figures 11 to 13. The RL- and NLTA-based trackers, with and without the neural network optimizer, are able to asymptotically stabilize the aircraft around the desired banking trajectory φ_k^desired. The tracking errors and the control signals of RL and RL+NN are characterized by a chattering behavior within the envelopes [−0.005 rad, 0] and [−0.45 rad, 0.45 rad], respectively. Nevertheless, the tracking errors generated by NLTA and NLTA+NN are asymptotically convergent. It is also worth noticing that the control signals of the latter controllers are significantly smoother than their counterparts dispatched by RL and RL+NN. Figure 12 clearly demonstrates that the neural network optimizer contributes to reducing the cumulative tracking error, however small this contribution might be. The total energy cost ∑_{k=1}^T (X_{k+1}^T S X_{k+1} + Z (u_k^T)²) of the four controllers across the three scenarios is summarized in Table 6.
From this measure, we can deduce that the controllers share similar performances due to the insignificant differences in their total costs. It is interesting to notice that the NLTA+NN variant of the controller achieved the lowest total energy cost. The same remarks are confirmed by the simulation of the second scenario. The results are depicted in Figures 14 to 16. The proposed control structures initially took a few seconds to converge to the reference signal from the initial condition, as shown in Figure 14. However, once on track, it is interesting to notice that they were virtually insensitive to the abrupt change in the system dynamics at time t = 20 s. The system's robustness is clearly revealed again in Figures 14 and 15. The influence of the neural network optimizer in improving the performances of the standalone tracking units is demonstrated in Figure 15b and Table 6. Furthermore, the resulting control signals generated by the tracking controllers are shown in Figure 16, where NLTA and NLTA+NN exhibited smooth behavior compared to that of RL and RL+NN. The results of the third scenario are shown in Figures 17 and 18, where the controllers maintain tracking despite the randomly varying dynamics; this is clearly illustrated by examining the roll angle rate φ̇ indicator in Figure 20c. This scenario articulates the advantage of adding the neural network optimizer, as shown in Figure 18b and Table 6. It is in such a case that the difference between the trackers with and without the neural network optimizer is most significant. The generated control signals and the associated system states, reacting to the disturbance in the dynamics, are depicted in Figures 19 and 20, respectively. The resulting control signals show instantaneous counteractions made by the tracking controllers in response to the imposed disturbance, leading to the performance demonstrated in Figures 17 and 20.

Conclusions
The article presents a synergistic integration of machine learning tools to provide a model-free solution for a class of tracking control problems. The proposed family of controllers solves the underlying problem using two control loops: one for tracking and one for optimizing an overall performance measure. The controllers are tested on a navigation mechanism of a flexible wing aircraft with uncertain/varying dynamics. After an offline training process, they demonstrated a high ability to simultaneously minimize the tracking error and the total energy cost in different test cases of varying complexities. They also succeeded in maintaining the system's stability in the face of excessive disturbances. The neural network optimizer proved to complement the trackers well, further suppressing the cumulative tracking error and reducing the total energy cost. Although the four controllers showed comparable performance in terms of cumulative cost, NLTA and NLTA+NN tend to generate smoother control signals and asymptotically convergent tracking errors.

Conflicts of Interest:
The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.