Optimal Neural Tracking Control with Metaheuristic Parameter Identification for Uncertain Nonlinear Systems with Disturbances

In this paper, we propose an inverse optimal neural control strategy for uncertain nonlinear systems subject to external disturbances. This control strategy is developed based on a neural observer for the estimation of unmeasured states and on inverse optimal control theory for trajectory tracking. The stabilization of states along the desired trajectory is ensured via a control Lyapunov function. The optimal parameters of the control law are identified by different nature-inspired metaheuristic algorithms, namely: Ant Lion Optimizer, Grey Wolf Optimizer, Harris Hawks Optimization, and Whale Optimization Algorithm. Finally, a highly nonlinear biological system subject to parameter uncertainties and external disturbances (Activated Sludge Model) is proposed to validate the control strategy. Simulation results demonstrate that the control law with the Ant Lion Optimizer outperforms the other optimization methods in terms of trajectory tracking in the presence of disturbances. The control law with Harris Hawks Optimization shows better convergence of the neural states in the presence of parameter uncertainty.


Introduction
Researchers solving problems in engineering, computer science, and the biological and physical sciences are developing advanced mathematical methods for control purposes. Technological advances have had a significant impact on the implementation of new analytical methods for solving nonlinear problems [1]. In most cases involving nonlinear control systems, states may not be completely measurable, which makes it difficult to solve complex control engineering problems. Other major difficulties in the area of control design are the use of a range of different models and concepts, a lack of standardization in the parameters, a lack of adequate control techniques, external disturbances, and the high level of nonlinearity of the equations that govern processes. A further issue is a lack of knowledge of important variables, since the system states directly reflect the nature of the process, and knowledge of them allows high control performance to be achieved [2]. It can therefore be stated that advanced estimation, control, and optimization techniques are necessary to guarantee optimal nonlinear system performance. Knowledge of the system is important to understand its control requirements; however, the nonlinearities are often so complex that designing a controller for adequate system performance is difficult [3][4][5][6]. Over time, new control strategies have emerged to ensure optimal system performance, and the estimation of states is vitally important for the development of feedback laws that allow for the avoidance of interruptions, stops, or process failures. For these reasons, it is important to develop devices called observers to estimate the missing states and, at the same time, to address the lack of online sensor measurements, the high cost of sensors, and the long laboratory analysis times [7,8]. Artificial

Neural Observer Based on RHONN
A discrete-time nonlinear system is considered as follows:

z_{k+1} = f(z_k) + g_s(z_k)u_k + d_k,    y_k = h(z_k)    (1)

where z_k ∈ R^n is the state variable of the system at time k ∈ N+, u_k ∈ R^m is the control input, d_k ∈ R^n is the disturbance vector representing modeling errors and uncertain parameters, and f(z_k) ∈ R^n and g_s(z_k) ∈ R^{n×m} are smooth mapping functions with f(0) = 0 and rank g_s(z_k) = m, for all z_k ≠ 0. y_k is the output vector and h(z_k) ∈ R^{p×n} is a nonlinear function of the system states.

ẑ_k is the estimated state vector, w_{i,k} ∈ R^{L_i} are the online adapted weight vectors, w_i^T is the fixed weight vector associated with the control inputs that ensures controllability of the observer, φ_i denotes a function of ẑ_k or u_k corresponding to the plant structure (1) or to external inputs to the network, respectively, g_i ∈ R^p are the Luenberger gains, e_k is the output error, and θ_i is an L_p-dimensional vector in which n_{i_t} are nonnegative integers, L_p is the number of high-order connections, and I_{L_p} is a collection of nonordered subsets of {1, 2, . . . , n + m}. S_i is a sigmoid function vector. The output error is defined by e_k = y_k − ŷ_k.
The output error can be minimized by an appropriate selection of L_p. For analytical purposes, w_i* and w* are assumed to be ideal weight matrices; these are constant but unknown. The optimal unknown weights w*_{i,k} are approximated by the online adjustable versions w_{i,k}. For online learning of the RHONN weights, an Extended Kalman Filter (EKF) is used [30]. EKF training of recurrent neural networks has proven its applicability to a wide range of difficult modeling and control problems. The sequential nature of the EKF provides a stochastic component that allows for a more effective search of the weight space, and EKF methods are easily implemented in software and hardware. Perhaps the most significant limitation of EKF training is that it applies only to cost functions that minimize a sum of squared errors; the proposed neural network training satisfies this criterion. The trained weights of the neural network yield the estimated states. The neural network training to find the optimal weight values is as follows, where w_{i,k} ∈ R^{L_i} is the weight vector, η_i is the learning rate, K_{i,k} ∈ R^{L_i×p} is the Kalman gain matrix, P_{i,k} ∈ R^{L_i×L_i} is the prediction-error covariance matrix, Q_{i,k} ∈ R^{L_i×L_i} is the state-noise covariance matrix, M_{i,k} ∈ R^{p×p} is a global scaling matrix, R_{i,k} ∈ R^{p×p} is the measurement-noise covariance matrix, and H_{i,k} ∈ R^{L_i×p} is the matrix of derivatives of the network outputs with respect to all trainable neural network weights. L_i is the total number of neural network weights. P_i, Q_i, and R_i are initialized as diagonal matrices. During training, K_{i,k}, P_{i,k}, and H_{i,k} are ensured to be bounded [31].
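One EKF training step for a single weight vector can be sketched as follows. This is a minimal illustration, assuming the common recursion K = P H M, w ← w + η K e, P ← P − K Hᵀ P + Q with M = (R + Hᵀ P H)⁻¹; the exact update equations used in the paper are those of [30] and are not reproduced here.

```python
import numpy as np

def ekf_weight_update(w, P, H, e, eta, Q, R):
    """One EKF training step for one RHONN weight vector (sketch).

    w : (L,)   current weight vector
    P : (L, L) prediction-error covariance matrix
    H : (L, p) derivatives of the network outputs w.r.t. the weights
    e : (p,)   output error y_k - y_hat_k
    eta : learning rate; Q : (L, L) state-noise covariance;
    R : (p, p) measurement-noise covariance
    """
    # Global scaling matrix M and Kalman gain K
    M = np.linalg.inv(R + H.T @ P @ H)
    K = P @ H @ M
    # Weight update driven by the output error
    w_new = w + eta * (K @ e)
    # Covariance propagation
    P_new = P - K @ H.T @ P + Q
    return w_new, P_new
```

With a zero output error, the weights stay unchanged while the covariance still propagates, which matches the role of e_k as the sole learning signal.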

Discrete-Time Inverse Optimal Control
Consider the following cost function related to trajectory tracking of the nonlinear system in (1), where δ_k = z_k − z_{r,k}, z_{r,k} is the desired trajectory for z_k, J(δ_k) : R^n → R+, l(δ_k) : R^n → R+ is a positive semidefinite function, and R_z is a real, symmetric, positive definite weighting matrix for the control efforts. The cost function in (8) is a performance measurement. Availability of the system states is necessary for the state feedback control design. Using the optimal value function J*(δ_k) for (8) as a Lyapunov function V_L(δ_k), the cost function can be rewritten as in [32]. From Bellman's principle of optimality [33], for the case of infinite-horizon optimization, the value function V_L*(δ_k) becomes time invariant and satisfies the discrete-time Bellman equation. The boundary condition V_L(0) = 0 must be satisfied so that V_L(δ_k) becomes a Lyapunov function. The discrete-time Hamiltonian is then defined as in [32]. A necessary condition for the optimal control law u_k is ∂H(δ_k, u_k)/∂u_k = 0, from which the optimal control law to achieve trajectory tracking is formulated as (12). By substituting (12) into (10), the Hamilton-Jacobi-Bellman (HJB) equation is obtained. Solving this equation is a challenging task; to address this problem, inverse optimal control based on Lyapunov stability is proposed [32].

Definition: Tracking Inverse Optimal Control
For the control law in (12) and tracking error δ_k = z_k − z_{r,k}, inverse optimal stabilization along the desired trajectory is guaranteed if: (i) (global) asymptotic stability of z_k along the reference z_{r,k} is achieved; (ii) V_L(δ_k) is a (radially unbounded) positive definite function that satisfies the inequality in (13). The inverse optimal control law for trajectory tracking is based on knowledge of V_L(δ_k), and a Control Lyapunov Function (CLF) V_L(δ_k) is proposed to ensure that requirements (i) and (ii) are satisfied. Hence, instead of solving the HJB equation for V_L(δ_k), a CLF is proposed as in (14). By substituting (14) into (12), the control law (15) is obtained, where P_L and R_z are positive definite symmetric matrices that guarantee the existence of the inverse function in (15). The proof that the control law in (15) ensures stability and optimality for (1) is given in [32].
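Under a quadratic CLF V_L(δ_k) = (1/2) δ_kᵀ P_L δ_k, the resulting control law can be sketched as below. The closed form used here is the standard discrete-time inverse optimal tracking law from the literature and is an assumption about the exact shape of Equation (15); see [32] for the derivation.

```python
import numpy as np

def inverse_optimal_control(f_z, g_z, z_ref_next, P_L, R_z):
    """Inverse optimal tracking control law (sketch of Eq. (15)).

    Assumes the quadratic CLF V_L = 0.5 * delta^T P_L delta, which gives
    the usual discrete-time closed form below.
    f_z        : f(z_k), (n,) drift evaluated at the current state
    g_z        : g_s(z_k), (n, m) input matrix
    z_ref_next : z_{r,k+1}, (n,) next point of the reference trajectory
    P_L, R_z   : positive definite symmetric weighting matrices
    """
    # (R_z + 0.5 g^T P_L g) is invertible because both terms are
    # positive (semi)definite, as required by the definition above
    inv_term = np.linalg.inv(R_z + 0.5 * g_z.T @ P_L @ g_z)
    return -0.5 * inv_term @ g_z.T @ P_L @ (f_z - z_ref_next)
```

In the scalar case with f(z_k) above the reference, the law returns a negative control effort that pushes the state back toward the trajectory, as expected from the tracking-error formulation.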

Metaheuristic Optimization Algorithms
In this section, the four optimization algorithms used in this paper are briefly described. It is worth noting that the versions used are the originals proposed by their respective authors; variants, hybrids, and improved versions are not considered.

Ant Lion Optimizer
The main inspiration for the ALO algorithm is the foraging behavior of antlion larvae [16]. The algorithm mimics the interactions between antlions and their prey (ants) in traps. To model these interactions, ants move around the search space while antlions hunt them and become more proficient at using traps. Because the movements of ants searching for food are stochastic in nature, a random walk is chosen to model this movement, where cumsum computes the cumulative sum, n represents the number of iterations, t is the step of the random walk (or iteration), and r(t) is a function of t and rand, a random number produced with a uniform distribution on the interval [0, 1]. The random moves are based on Equation (17), and ants update their position with a random move at every step of the process. Given that every search space has a boundary, Equation (16) is not used directly to update the position of an ant. In order to keep the random moves inside the search space, they are normalized using a min-max normalization (Equation (18)), where a_i is the minimum random move for the ith variable, b_i is the maximum random move for the ith variable, c_i^t is the minimum for the ith variable at the tth iteration, and d_i^t is the maximum for the ith variable at the tth iteration. Equation (18) is applied at each iteration to ensure that random moves remain confined to the search space.
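The random walk of Equations (16)-(17) and the min-max normalization of Equation (18) can be sketched as:

```python
import numpy as np

def random_walk(n_iter, rng):
    """Eq. (16): cumulative sum of +/-1 steps, where each step is +1 if
    rand > 0.5 and -1 otherwise (Eq. (17)). Starts at 0."""
    steps = np.where(rng.random(n_iter) > 0.5, 1.0, -1.0)
    return np.concatenate(([0.0], np.cumsum(steps)))

def normalize_walk(X, a, b, c_t, d_t):
    """Eq. (18): min-max normalization mapping the walk from its own
    range [a, b] into the current search interval [c_t, d_t]."""
    return (X - a) * (d_t - c_t) / (b - a) + c_t
```

After normalization, the walk's minimum and maximum land exactly on the current bounds c_t and d_t, which is what keeps the ants inside the search space.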
When the prey is close to the trap, antlions hurl grains of sand outwards from the pit, causing the ant to slide further down. This behavior is modeled by reducing the radius of the ants' random walk, where I is a ratio, c^t is the vector of minima of all the variables at the tth iteration, and d^t is the vector of maxima of all the variables at the tth iteration.
When an ant falls to the bottom of the pit, the antlion updates its position to the last position of the ant in order to increase the chances of catching the prey, where t is the current iteration, Antlion_j^t is the position of the chosen jth antlion at the tth iteration, and Ant_i^t is the position of the ith ant at the tth iteration. At the end of each iteration, the fittest antlion is kept and considered the elite. Given that the elite is the best antlion, it affects the movements of all the ants during the iterations. Thus, every ant walks randomly around an antlion selected by a roulette wheel and around the elite, where R_A^t is the random walk around the antlion selected by the roulette wheel at the tth iteration, R_E^t is the random walk around the elite at the tth iteration, and Ant_i^t is the position of the ith ant at the tth iteration. Figure 1 shows the pseudo code for the ALO algorithm.
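The trap-shrinking of Equations (19)-(20) and the elitism update (21) can be sketched as below. The exponent schedule w in the ratio I is an assumption, since the text only calls I "a ratio" (in the original ALO, w grows stepwise with the iteration count).

```python
import numpy as np

def shrink_bounds(c, d, t, T, w=2):
    """Eqs. (19)-(20) sketch: shrink the walk bounds toward the antlion
    by the ratio I = 10^w * t / T. The schedule for the exponent w is an
    assumption; the paper only describes I as a ratio."""
    I = max(1.0, 10**w * t / T)
    return c / I, d / I

def ant_position(R_A, R_E):
    """Eq. (21): an ant's next position is the average of the random
    walk around the roulette-selected antlion (R_A) and the one around
    the elite (R_E)."""
    return (np.asarray(R_A) + np.asarray(R_E)) / 2.0
```

As t grows, I increases and the admissible interval around the antlion contracts, reproducing the sliding-ant behavior described above.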


Grey Wolf Optimizer
The Grey Wolf Optimizer is inspired by the collective behavior of a pack of wolves hunting together [23]. Grey wolves show a very well-defined social hierarchy, in which the leader of the pack is called the dominant or alpha (α) wolf. The alpha is in charge of decisions such as moving from one area to another, when to sleep, and when to hunt. The next level in the hierarchy is the beta (β), the group of wolves that help the alpha discipline the pack; a beta is normally the best candidate to lead the pack when the alpha dies or becomes too old. The next level is the delta (δ), which serves as scout or sentinel. Lastly, the omega (ω) is the lowest in rank; it serves as the scapegoat and is the last to eat. The omega wolves are actually very useful to the pack, since without them the pack would fight among themselves.
Grey wolves work as a team, and their hunting strategy consists of three steps: finding and chasing the prey; surrounding and harassing it until it stops moving; and attacking. In the GWO algorithm, the search for prey corresponds to the exploration of the space and the attacking step to the exploitation phase. The encircling behavior is governed by equations in which t is the current iteration, X_p is a vector representing the position of the prey, A and C are coefficient vectors, and X is the position of a grey wolf.
Vectors A and C are obtained from r_1 and r_2, random vectors in [0, 1], and a, a vector whose components decrease linearly from 2 to 0 over the iterations. The alpha wolf usually leads the hunt, and the beta and delta join in. However, in an optimization search space, the location of the best position is not known; the GWO algorithm therefore assumes that these three wolves are closest to the location of the prey. Their positions are saved, and the rest of the pack is encouraged to follow them, updating their positions accordingly. The pseudo code in Figure 2 describes the GWO algorithm.
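One GWO position update can be sketched as below: each wolf computes an attraction point around each of the three leaders using A = 2a·r_1 − a and C = 2·r_2, and moves to the average of the three.

```python
import numpy as np

def gwo_update(X, leaders, a, rng):
    """One GWO position update (sketch).

    X       : (dim,) current wolf position
    leaders : the three best positions [X_alpha, X_beta, X_delta]
    a       : scalar decreasing linearly from 2 to 0 over iterations
    rng     : numpy random Generator
    """
    candidates = []
    for X_l in leaders:
        A = 2 * a * rng.random(X.shape) - a   # coefficient vector A
        C = 2 * rng.random(X.shape)           # coefficient vector C
        D = np.abs(C * X_l - X)               # encircling distance to leader
        candidates.append(X_l - A * D)        # attraction point X1/X2/X3
    return np.mean(candidates, axis=0)        # average of the three points
```

As a goes to 0, A vanishes and the wolf converges to the centroid of the three leaders, which is the exploitation (attacking) phase.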


Harris Hawks Optimization
Harris Hawks Optimization is a relatively new algorithm proposed by Heidari et al. [27] in 2019. It is a nature-inspired, population-based technique that mimics the cooperative behavior and hunting style of Harris's hawks. These raptors pounce on prey from different angles, attempting to surprise it, and they display different chasing patterns depending on the particular natural scenario and the escaping patterns of the prey (usually rabbits). The HHO algorithm, like many other nature-inspired techniques, has both exploration and exploitation phases. For the exploration phase, the hawks mainly follow two strategies: the first consists of perching based on the locations of other hawks, in order to be close to them at the moment of the attack, and the second is perching on random tall trees, still close to other members. Both strategies are equally probable, as modeled by q in Equation (29), where X(t + 1) is the position vector of the hawks for the next iteration, X_rabbit is the position of the rabbit, X(t) is the current position of the hawks, r_1, r_2, r_3, r_4, and q are random numbers in the range (0, 1) updated at each iteration, LB and UB are the lower and upper bounds of the variables, X_rand(t) is a hawk selected randomly from the current population, and X_m is the average position of the current population of hawks. The HHO algorithm transfers from exploration to exploitation, and changes between different exploitative strategies, based on the energy of the prey. This energy decreases during the escaping behavior, as modeled by Equation (30), where E is the escaping energy of the rabbit, T is the maximum number of iterations, and E_0 is the initial energy, which randomly changes in the interval (−1, 1) at each iteration. When its value decreases from 0 to −1, the prey is physically weakening, and when it increases from 0 to 1, the prey is strengthening.
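The escaping energy of Equation (30) and the two exploration strategies of Equation (29) can be sketched as:

```python
import numpy as np

def escaping_energy(E0, t, T):
    """Eq. (30): E = 2 * E0 * (1 - t/T); decays toward zero over iterations."""
    return 2 * E0 * (1 - t / T)

def hho_explore(X, X_rand, X_rabbit, X_mean, LB, UB, rng):
    """Eq. (29): two equally probable perching strategies, selected by q."""
    r1, r2, r3, r4, q = rng.random(5)
    if q >= 0.5:
        # perch relative to a randomly selected hawk
        return X_rand - r1 * np.abs(X_rand - 2 * r2 * X)
    # perch relative to the rabbit, the population mean, and a random
    # point inside the variable bounds
    return (X_rabbit - X_mean) - r3 * (LB + r4 * (UB - LB))
```

Since |E| shrinks linearly with t, later iterations favor the exploitative besiege phases described next.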
The dynamic escaping energy E decreases during the iterations. The next phase is called "soft besiege" when the rabbit still has energy and tries to escape performing random misleading jumps. This behavior is modeled by the following rules: where ∆X(t) is the difference between the position vector of the prey and the current location at iteration t, J = 2(1 − r 5 ) represents the random hop strength of the rabbit when trying to escape, and r 5 is a random number in (0, 1). The value of J changes to simulate the random movement of the rabbit. "Hard besiege" is when the prey is very tired and has low escaping energy. The Harris's hawks surround the prey to finally execute the surprise pounce. Their current positions are updated by Equation (33): The pseudo code for the HHO algorithm is described in Figure 3.
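The soft and hard besiege updates, Equations (31)-(33), can be sketched as:

```python
import numpy as np

def soft_besiege(X, X_rabbit, E, r5):
    """Eqs. (31)-(32): the rabbit still has energy and performs random
    misleading jumps with hop strength J = 2 * (1 - r5)."""
    J = 2 * (1 - r5)
    dX = X_rabbit - X                      # Delta X(t), prey minus hawk
    return dX - E * np.abs(J * X_rabbit - X)

def hard_besiege(X, X_rabbit, E):
    """Eq. (33): the prey is exhausted (low |E|); the hawks close in on
    the rabbit directly before the surprise pounce."""
    return X_rabbit - E * np.abs(X_rabbit - X)
```

When the escaping energy E reaches zero, both updates collapse onto the rabbit's position, i.e., the surprise pounce succeeds.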


Whale Optimization Algorithm
The Whale Optimization Algorithm is inspired by the foraging behavior of humpback whales [25]. These mammals have a complex social behavior, which helps them hunt in groups to feed on their favorite prey: krill. Humpback whales hunt krill and small fish that swim close to the surface. They create distinctive bubbles along a circular path around the school, forming a so-called bubble-net to trap their prey.
Because the solution of an optimization problem in the search space is not known, the Whale Optimization Algorithm assumes that the best candidate found so far is the target prey. The behavior of the whales, or search agents, is given by equations in which t indicates the current iteration, X is the position vector, A and C are coefficient vectors, and X* is the position vector of the best solution so far; X* is updated in each iteration whenever a better solution is found. The vectors A and C are determined from a, which decreases linearly from 2 to 0 over the iterations, and r, a random vector in [0, 1]. The exploitation phase is called "bubble-net attacking", in which the whales circle their prey along a spiral described by Equation (37), where D' = |X*(t) − X(t)| is the distance of the ith whale to the prey (the best solution obtained so far), b is a constant defining the shape of the logarithmic spiral, and l is a random number in [−1, 1].
The exploration phase consists of a random search by each whale with |A| ≥ 1; this random value of A forces the whale to move away from a reference whale. The mathematical model is as follows: D = |C · X_rand − X| (38), where X_rand is a random position vector chosen from the current population. The pseudo code for the WOA algorithm is shown in Figure 4.
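A single WOA position update, combining the encircling, spiral, and random-search mechanisms, can be sketched as below. Following the original algorithm, the spiral branch is taken with probability 0.5, a detail not restated in the text above.

```python
import numpy as np

def woa_update(X, X_best, X_rand, a, b, rng):
    """One WOA position update (sketch).

    X      : (dim,) current whale position
    X_best : best solution found so far (the assumed prey)
    X_rand : a randomly chosen whale from the population
    a      : scalar decreasing linearly from 2 to 0 over iterations
    b      : logarithmic-spiral shape constant
    """
    r = rng.random(X.shape)
    A = 2 * a * r - a                       # coefficient vector A
    C = 2 * rng.random(X.shape)             # coefficient vector C
    if rng.random() < 0.5:
        if np.all(np.abs(A) < 1):
            D = np.abs(C * X_best - X)      # shrinking encircling of the prey
            return X_best - A * D
        D = np.abs(C * X_rand - X)          # Eq. (38): explore around a random whale
        return X_rand - A * D
    # spiral update toward the prey (Eq. (37))
    l = rng.uniform(-1, 1)
    D_prime = np.abs(X_best - X)
    return D_prime * np.exp(b * l) * np.cos(2 * np.pi * l) + X_best
```

The |A| < 1 test decides between exploitation (encircling the best solution) and exploration (moving relative to a random whale), matching the description above.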

Optimal Control Strategy Architecture
Nonlinear systems can exhibit complex dynamic behavior depending on the stability of their equilibrium points. Convergence of the states of a nonlinear system to one operating point can be unpredictable, depending on the nature and the initial state of the system, so robust control strategies are necessary. Figure 5 displays the optimal neural control strategy for uncertain nonlinear systems subject to disturbances. The RHONO estimates the unmeasured states of the nonlinear system needed to develop a state feedback control law. The controlled system may work properly in the presence of small disturbances, but under large disturbances and parameter uncertainties the system could become unstable, affecting performance and operating safety. To address this issue, metaheuristic algorithms are used to identify optimal tuning parameters for the controller. The metaheuristic algorithms propose a P_L matrix for the CLF to reduce the mean square tracking error between the neural model state and its given trajectory reference. The optimization strategy stops when the reference trajectory is reached and the CLF matrix satisfies the asymptotic stability condition given in Section 3. The optimization strategy is able to find optimal parameters for reference trajectory tracking at different operating points.


Case Study
The mathematical model is derived from Activated Sludge Model No. 1 (ASM1) [34] and describes the aerobic digestion process that occurs in a continuous stirred-tank reactor (CSTR), consisting of the material balances of the components, the kinetics, and the stoichiometry of the major processes in the plant. The total chemical oxygen demand (TCOD) is the variable of interest, which is controlled by regulating the oxygen transfer coefficient (KLA). The following nonlinear discrete-time Equations (40)-(49) describe the biological transformation processes that occur in the reactor, and Figure 6 illustrates the plant configuration.

Equations (40)-(48) give the material balances for the following components: soluble oxygen; readily biodegradable soluble substrate; slowly biodegradable particulate substrate; active heterotrophic particulate biomass; active autotrophic particulate biomass; soluble nitrate and nitrite nitrogen; soluble ammonium (NH4+ and NH3) nitrogen; soluble biodegradable organic nitrogen; and particulate biodegradable organic nitrogen. The parameters of the model and the Monod reaction rates µ_i, i = 1, ..., 7 [35] are given in Appendix A. The inputs of the plant are the dilution rates (D, D_r), KLA, and the biomass and substrate concentrations (X_{i,in}). The output of the plant is the TCOD given by (49), which is composed of the readily biodegradable soluble substrate (X_{1,k}), the slowly biodegradable particulate substrate (X_{2,k}), and the inert soluble and particulate organic material (I_s). The steady-state value for I_s is given by the corresponding constant value of the influent:

TCOD = X_{1,k} + X_{2,k} + I_s    (49)

Results
The performance of the control strategy for trajectory tracking is verified with the biological nonlinear model (40)-(49) in the presence of load disturbances and parameter uncertainty. Reference trajectories are proposed based on the hydraulic retention time (HRT) of the wastewater treatment plant and the maximal daily rate of organic biodegradability per biomass unit. The neural network structure for the biological model consists of five neurons in the hidden layer and one neuron in the output layer, with a logistic and a linear activation function, respectively. The neural network estimates the X_{1,k} and X_{2,k} states of the system in order to calculate the TCOD output. The parameters of the neural structure are given in Appendix A. The optimization consists of using each of the metaheuristic algorithms (ALO, GWO, HHO, and WOA) to find the best set of parameters to configure the control law (15) for optimal trajectory tracking. The matrix P_L is used as the variable to be optimized. P_L is a symmetric matrix whose entries must be real positive numbers; because the matrix is symmetric, P_L12 and P_L21 are the same, so the vector to be optimized consists of three variables: [P_L11, P_L12, P_L22]. The fitness function is the mean square tracking error between the neural model state and its given trajectory reference; the objective is thus to minimize the area between the desired trajectory and the actual output. In this section, two cases are presented in which the tracking error is compared when the system is subject to load disturbances and to parameter uncertainty, respectively. In the following tests, the metaheuristic algorithms were configured with 50 search agents, 100 iterations, a lower boundary of 0.00001, and an upper boundary of 1. MATLAB R2016a (MathWorks Inc., Natick, MA, USA) was used for these simulations.
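The optimization setup described above can be sketched as follows: the three-variable search vector is decoded into the symmetric P_L matrix, and the fitness is the mean square tracking error. The `simulate` callable is a hypothetical placeholder for running the closed-loop neural model with a candidate P_L; it is not part of the paper.

```python
import numpy as np

def decode_PL(v):
    """Map the 3-variable search vector [p11, p12, p22] to the symmetric
    2x2 CLF matrix P_L (with P_L12 = P_L21)."""
    p11, p12, p22 = v
    return np.array([[p11, p12],
                     [p12, p22]])

def fitness(v, simulate):
    """Mean square tracking error between the neural model state and its
    reference. `simulate` is a hypothetical placeholder that runs the
    closed loop with the candidate P_L and returns (states, references)."""
    z, z_ref = simulate(decode_PL(v))
    return np.mean((z - z_ref) ** 2)
```

Any of the four metaheuristics can then minimize `fitness` over the box [0.00001, 1]^3 with 50 agents and 100 iterations, as configured above.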

External Disturbance
In all of the case studies, reference paths are used to comply with the TCOD requirements. The allowable TCOD concentration leaving a wastewater treatment plant has a maximum value of 50 mg COD/L. A step disturbance with an amplitude equal to 40% of the steady-state value of X_{1,in} (the input concentration of readily biodegradable substrate) is applied at t = 5 d. The tracking error comparison is displayed in Figures 7 and 8. Figure 7 displays the tracking error for the substrates along their respective desired trajectories. For the neural model control with the ALO algorithm, the tracking performance for both substrates is better than with the other methods. In both scenarios, the neural model presents a slight transient-state error when the disturbance is applied at t = 5 d. Figure 8a shows the TCOD tracking error as a function of the substrates defined by (49). As in the previous test, the TCOD with the ALO algorithm shows good convergence along the desired trajectory compared with the other methods. Figure 8b portrays the behavior of the KLA signal in reaching the reference trajectories of the system. A time-variant behavior of KLA is observed before the disturbance is applied, after which the KLA signal shows an incremental response over the remaining simulation time to reject the disturbance. For the neural model control with the ALO algorithm, the KLA signal shows a stable increasing behavior over the simulation time up to a value of 5 d^-1, maintaining the system along the reference trajectories. Thus, the control law with ALO outperforms the other methods in terms of trajectory tracking for a disturbed nonlinear system.



Parameter Uncertainty
Active heterotrophs are known to govern TCOD removal in activated sludge systems, and the biodegradation rates are expressed in terms of the active biomass. In this test, the biodegradation rate is investigated by increasing the maximum specific growth rate µmax,H of the active heterotrophic biomass by 20% at t = 5 d. The results are shown in Figures 9 and 10. Figure 9 illustrates the tracking error performance of the substrates in the presence of parameter uncertainty. The neural model control using the metaheuristic algorithms shows good tracking performance with a negligible transient-state error. In this test, HHO shows better convergence along the reference trajectory in the presence of parameter uncertainty. Figure 10a shows the TCOD tracking error resulting from the transient error of the substrates. Figure 10b displays the behavior of the KLA signal as it drives the system toward the reference trajectories. The KLA shows a time-variant behavior after the parameter uncertainty is introduced. For the control law with the WOA algorithm, KLA shows a stable increasing performance over the simulation time up to a value of 4.8 d⁻¹ to reject the disturbance. Thus, the control strategy with WOA demonstrates a more stable behavior of KLA, but the control law with HHO shows better convergence of the neural states. Overall, the control law combined with these metaheuristic algorithms is effective in terms of trajectory tracking in the presence of parameter uncertainty.
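The parameter-uncertainty scenario can be reproduced in outline: a +20% step in µmax,H injected at t = 5 d into the simulation loop. The one-state Monod substrate model below is purely illustrative (its constants `mu0`, `Ks`, `Y`, `X` are assumptions), not the paper's Activated Sludge Model; only the step mechanism itself follows the test described above.

```python
def mu_max_H(t, mu0=4.0, t_step=5.0, increase=0.20):
    """Maximum specific growth rate with a +20% step at t = t_step days."""
    return mu0 * (1.0 + increase) if t >= t_step else mu0

def simulate_substrate(s0=100.0, Ks=20.0, Y=0.67, X=1.0, dt=0.01, t_end=10.0):
    """Forward-Euler integration of a one-state Monod substrate model:
    dS/dt = -(mu_max_H(t) / Y) * S / (Ks + S) * X."""
    t, s, traj = 0.0, s0, []
    while t < t_end:
        mu = mu_max_H(t)                      # time-varying growth rate
        s += dt * (-(mu / Y) * s / (Ks + s) * X)
        s = max(s, 0.0)                       # substrate cannot go negative
        traj.append((t, s))
        t += dt
    return traj

traj = simulate_substrate()
```

In the full study, the same step would be applied to the plant while the neural observer and control law run unchanged, so that the transient it induces measures the robustness of each tuned configuration.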

Conclusions
This paper proposes an optimal neural tracking control strategy for uncertain nonlinear systems subject to external disturbances. The novelty of this strategy lies in the implementation of recent metaheuristic algorithms (ALO, GWO, HHO, and WOA) to find the optimal tuning of the control law. The efficacy of the coupled strategy is demonstrated on a highly nonlinear biological model subject to disturbances, uncertainties, and hard-to-measure state variables. A neural observer based on a RHONN is proposed to estimate the unmeasured states of the system needed to develop a state feedback control law. Inverse optimal control theory for trajectory tracking based on a neural model is used to design a stabilizing control law. To evaluate the efficiency of the coupled strategy, four configurations with metaheuristic algorithms are used, and two case studies are examined: the system subject to load disturbances and to parameter uncertainty, respectively. Simulation results show that the optimal control law combined with the ALO metaheuristic algorithm is a robust method for stabilizing the system states along the reference trajectories in the presence of external disturbances, while the optimal control law combined with HHO is effective in the presence of parameter uncertainty. The control strategy with the other methods shows a significant transient-state error due to limited tuning. Thus, optimal operation of the plant can improve energy efficiency and increase treatment capacity. As future work, the control strategy will be tested on a real-scale pilot plant for wastewater treatment to validate its efficacy.

Acknowledgments: A graduate scholarship for Roxana Recio-Colmenares was provided by the National Council for Science and Technology (CONACyT) of México.

Conflicts of Interest:
The authors declare no conflict of interest.