1. Introduction
Neural sliding mode control (SMC) has become an effective paradigm for dealing with the nonlinearities and uncertainties inherent in robotic systems. In particular, discrete-time formulations of SMC combined with neural approximators have gained attention for real-time control applications involving robotic manipulators. However, the tuning of control gains in these architectures remains a non-trivial task that directly influences performance, robustness, and stability.
Recent contributions in the literature have explored a variety of approaches to optimize sliding mode controller parameters using nature-inspired and learning-based strategies. For example, in [1], a robust adaptive neural SMC framework was proposed, optimized via the Grey Wolf Optimizer to enhance tracking accuracy and robustness in robot manipulators. Similarly, in [2], a Particle Swarm Optimization (PSO)-tuned SMC was implemented in the context of robotic rehabilitation, highlighting its effectiveness for human–robot interaction. Within physical human–robot collaboration (pHRC), recent work has modeled a robot's trust in its human co-worker using measurable factors, validated on a collaborative manipulator setup [3]. Other researchers have focused on Bayesian-based strategies, such as adaptive Gaussian process modeling with adaptive kernels to improve sample efficiency in robotic control scenarios [4]. Genetic Algorithms (GAs) have also been investigated for gain optimization in SMC settings; for instance, [5] demonstrated the effectiveness of GA-tuned SMC in mitigating chattering and ensuring precision tracking in multi-DOF manipulators.
Beyond traditional metaheuristics, recent advances have incorporated Artificial Intelligence techniques into the sliding mode framework. For example, in [6], an adaptive neural sliding mode controller for a flexible robotic manipulator was proposed, where the neural network learns online to approximate model uncertainties, ensuring robust tracking despite dynamic variations. In a related direction, in [7], a deep reinforcement learning-based SMC design was introduced for partially known nonlinear systems, in which a policy gradient method adaptively adjusts the sliding mode control effort, effectively bridging classical robustness with data-driven adaptability. Complementarily, sampled-data admittance control integrated with a data-driven moving-horizon velocity estimator (based on Willems' fundamental lemma) has been shown to stabilize pHRI under noisy, discontinuous velocity measurements and to improve both transient and steady-state tracking [8].
Despite recent advances in Artificial Intelligence (AI)-based controller design, most studies either focus on continuous-time implementations or lack a unified methodology for comparing multiple optimization strategies under the same normalized framework. Moreover, few works incorporate a surrogate modeling strategy to reduce computational cost during optimization iterations. Also, in real-world control applications, the tuning of controller gains is frequently performed manually, relying on heuristic adjustments and iterative trial-and-error refinements by human experts. This manual approach is not only time-consuming but also prone to suboptimal performance, especially in systems with nonlinear dynamics, modeling errors, or parametric uncertainties. These challenges are amplified when dealing with discrete-time implementations and neural-based controllers, where interactions between gain parameters and recurrent dynamics are not easily interpretable.
To address these issues, this work proposes a comprehensive methodology to optimize the gains of a neural discrete-time sliding mode controller using three metaheuristic techniques: Bayesian Optimization (BO), Particle Swarm Optimization (PSO), and Genetic Algorithms (GAs). The robot dynamics are first estimated via a recurrent high-order neural network trained online with an Extended Kalman Filter (EKF) algorithm that dynamically updates the measurement and process noise covariance matrices. Then, a dataset of tracking errors and gains is collected by sweeping the controller parameters in the physical domain and subsequently normalizing them. This normalization ensures that all optimization algorithms operate within a common domain $[0,1]$, enabling fair comparison and improved convergence behavior.
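As a concrete illustration of this normalization step, the following minimal Python sketch performs the column-wise min-max scaling (the gain ranges and the 80 × 80 grid are illustrative placeholders, chosen only so that the sweep yields the 6400 gain pairs used in this work; the actual bounds come from the physical domain):

```python
import numpy as np

def minmax_normalize(x: np.ndarray):
    """Column-wise min-max scaling to [0, 1]; also return the bounds needed
    to map optimizer outputs back to physical gain units."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo), lo, hi

# Placeholder sweep standing in for the collected gain dataset:
# an 80 x 80 grid gives the 6400 gain pairs evaluated in simulation.
gains = np.stack(np.meshgrid(np.linspace(0.5, 5.0, 80),
                             np.linspace(0.5, 5.0, 80)), -1).reshape(-1, 2)
gains_n, g_lo, g_hi = minmax_normalize(gains)   # search space in [0, 1]

# De-normalization applied to an optimizer output k_n in [0, 1]:
k_phys = g_lo + 0.5 * (g_hi - g_lo)             # example at k_n = 0.5
```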
The objective is to identify the optimal control gains that minimize steady-state tracking errors in both joints of a 2-DOF robotic manipulator. By leveraging surrogate models and normalized search spaces, this study demonstrates how each optimization method balances accuracy, computational effort, and robustness, offering valuable insights into controller design for robotic applications.
The main contributions of this work are as follows:
Discrete-time SMC + recurrent high-order NN integration. A discrete-time sliding-mode controller is coupled with a recurrent high-order neural network identified online via an EKF with adaptive noise covariance updates. This NN reproduces the 2-DOF robot dynamics used by the controller.
Unified, normalized benchmark across BO/PSO/GA. All optimizers run in the same pipeline (shared data loading, normalized gain space $[0,1]$, identical budgets/criteria, common steady-state error objective), enabling a fair, apples-to-apples comparison; a minimal sketch of this equal-budget loop is given after the contribution summary below.
Surrogate-assisted screening with diagnostics. GP surrogates are integrated, and model adequacy ($R^2$), evaluation counts, and wall-clock time are reported alongside tracking performance.
Progression and convergence analytics. Progression plots (early/mid/final) and normalized convergence curves of the aggregate objective are provided to expose exploration–exploitation dynamics and steady-state improvements in discrete time.
Constraint-aware evaluation. Actuator torque limits are enforced via the SMC saturation branch, and bounded control actions are verified in simulation, linking optimization outcomes to implementability.
Actionable guidance. The unified analysis reveals when PSO, BO, or GA is preferable (e.g., PSO is fastest, BO is most sample-efficient, and GA offers competitive accuracy at higher cost), providing practical guidance for discrete-time SMC with recurrent high-order NN plants.
Principled alternative to heuristic tuning. Ad hoc trial-and-error tuning is replaced with a reproducible, data-driven pipeline (discrete-time SMC with a recurrent high-order NN plant model) optimized under a unified, normalized setup with BO/PSO/GA.
Together, these elements constitute a novel, reproducible benchmarking framework for discrete-time SMC gain tuning with recurrent high-order NN-based plant models in 2-DOF manipulators, extending beyond isolated demonstrations to deliver a systematic methodology and decision-relevant evidence for controller design.
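To make the equal-budget comparison concrete, the following Python sketch shows the shape of such a benchmarking loop. It is a sketch under stated assumptions: the name `run_benchmark` and the optimizer-callable interface (taking an objective, a budget, and a random generator, and returning the best gain and cost) are illustrative, not the paper's actual API.

```python
import time
import numpy as np

def run_benchmark(optimizers: dict, objective, budget: int = 50, seed: int = 0):
    """Run every optimizer on the same normalized objective with an equal
    budget, recording best cost, evaluation count, and wall-clock time."""
    results = {}
    for name, optimize in optimizers.items():
        rng = np.random.default_rng(seed)   # same seed for every method
        n_evals = 0

        def counted(k):                     # wrap to count objective calls
            nonlocal n_evals
            n_evals += 1
            return objective(k)

        t0 = time.perf_counter()
        best_k, best_j = optimize(counted, budget, rng)
        results[name] = {"best_gain": best_k, "best_cost": best_j,
                         "evaluations": n_evals,
                         "seconds": time.perf_counter() - t0}
    return results
```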
This paper is organized as follows:
Section 2 details the proposed methodology, including the discrete-time modeling of the 2-DOF robot manipulator, the identification of the plant using a recurrent high-order NN trained with the EKF, the implementation of the neural discrete-time sliding mode control algorithm, and the organization of the dataset used for training and simulation. It also describes the application of three metaheuristic optimization algorithms—BO, PSO, and GA—for automatic gain tuning.
Section 3 presents the main simulation results across the entire dataset, together with tables summarizing the key outcomes, and
Section 4 discusses the findings and their implications.
3. Results
To enhance the transparency of the optimization process, progression plots are provided for each algorithm: BO (Figure 2), PSO (Figure 3), and GA (Figure 4). Each figure is arranged as a 2 × 3 grid: the top row shows Joint 1 and the bottom row Joint 2; the columns correspond to early, mid, and final stages of the search. In every panel, the black curve is the reference, the blue curve is the estimated position $\hat{q}_i$, and the red curve (right axis) is the estimation error $e_i$; these signals are reconstructed directly from the trajectories logged in the dataset. When available, the three milestones are taken from the optimization logs at iterations 1, 25, and 50; otherwise, a representative early/mid/final triplet is selected by sorting the per-simulation cost $J$ and picking the worst, median, and best cases. This presentation complements the convergence summary (Figure 5) by showing how tracking quality improves over time in each method.
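A minimal sketch of this milestone-selection fallback is given below. The `logs` dictionary (mapping logged iterations to simulation indices) is an illustrative structure, not the paper's actual logging format:

```python
import numpy as np

def select_milestones(costs: np.ndarray, log_iters=(1, 25, 50), logs=None):
    """Pick early/mid/final milestone simulations for the progression plots.
    Prefer the logged iterations; otherwise fall back to worst/median/best
    ranked by per-simulation cost J."""
    if logs is not None and all(i in logs for i in log_iters):
        return [logs[i] for i in log_iters]       # indices at iters 1/25/50
    order = np.argsort(costs)                     # ascending cost
    return [order[-1], order[len(order) // 2], order[0]]  # worst/median/best
```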
3.1. Neural Sliding Mode Control Simulation over the Dataset
As a result of the script and for the dataset obtained in Section 2.4, Figure 6 and Figure 7 show tracking-performance comparisons of the proposed control scheme for the 2-DOF planar robotic manipulator, for Joints 1 and 2, respectively. Each figure displays the reference signal (in black), the real joint position (in red), and the estimated position provided by the recurrent high-order NN (in blue) for all simulations with the 6400 different values of the two controller gains defined in (18).
In Figure 6, the measured position $q_1$ and the estimated state $\hat{q}_1$ closely follow the reference over the full 5 s simulation. The red and blue semi-transparent traces correspond to the 6400 simulations with different controller gains, allowing visual assessment of variability; the narrow dispersion indicates consistent tracking. The inset zoom emphasizes the initial transient. These results also show that the recurrent high-order NN accurately estimates the robot state, as $\hat{q}_1$ matches $q_1$ with high fidelity.
Figure 7 shows analogous results for Joint 2. After a short transient, both $q_2$ and $\hat{q}_2$ align with the reference, demonstrating robust tracking over the full sweep of gains. The agreement between $q_2$ and $\hat{q}_2$ further validates the effectiveness of the neural estimator. To explicitly illustrate optimization progress (beyond aggregate tracking), per-algorithm progression plots are included (early/mid/final milestones taken from iterations 1, 25, and 50, or a worst/median/best fallback); see Figure 2, Figure 3 and Figure 4, and the convergence of the objective value over iterations/generations in Figure 5. These plots complement Figure 6 and Figure 7 by showing how performance improves as each method converges.
Table 1 reports the mean values of the state estimation errors for clarity. MAE (mean absolute error) is used as the primary metric, and the mean error (bias) and RMSE (root mean squared error) are also reported. For the joint positions $q_1$ and $q_2$, the MAE values reported in Table 1 are accompanied by very small mean errors, indicating negligible bias, and the corresponding RMSE values are close to the MAE, suggesting the absence of large outliers in the position estimates. For the joint velocities $\dot{q}_1$ and $\dot{q}_2$, the MAE and mean-error values in Table 1 are likewise small; however, the larger gap between RMSE and MAE for Joint 2 points to sporadic higher deviations in its velocity estimate. Table 1 complements Figure 6 and Figure 7 and clearly demonstrates the accuracy of the state estimation algorithm.
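For reference, a minimal Python sketch of these three metrics is given below (the function and variable names are illustrative; `x_true` and `x_hat` stand for any measured state and its neural estimate):

```python
import numpy as np

def estimation_metrics(x_true: np.ndarray, x_hat: np.ndarray) -> dict:
    """MAE, mean error (bias), and RMSE of a state-estimation error signal."""
    e = x_true - x_hat
    return {"MAE": np.mean(np.abs(e)),
            "bias": np.mean(e),
            "RMSE": np.sqrt(np.mean(e ** 2))}

# A gap between RMSE and MAE (RMSE >> MAE) flags sporadic large deviations,
# as observed for the Joint 2 velocity estimate in Table 1.
```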
3.2. Optimal Control Gains for BO, PSO, and GA
Table 2, Table 3, Table 4 and Table 5 summarize the performance of the optimization algorithms applied to the tuning of the neural discrete-time sliding mode controller gains for the robot manipulator.
Table 2 and Table 3 jointly summarize accuracy and efficiency under an identical pipeline (shared data loading/normalization and a single Python implementation). The optimal gains concentrate in a narrow region of the normalized gain space, and the average steady-state errors are computed over the final 0.1 s of the simulation ($t \in [4.9, 5.0]$ s). Quantitatively, PSO attains the lowest error in Joint 1, GA the lowest in Joint 2, and BO remains competitive in both joints (see Table 2) while using only 50 evaluations. In terms of time, PSO is the fastest (23.44 s), BO is similar (23.65 s), and GA is costlier (61.98 s plus a 7.70 s refinement, about 69.68 s in total), consistent with its larger evaluation budget.
Table 4 shows that the GA requires the highest number of function evaluations (7500), significantly more than PSO with 1500 and BO with only 50. This extensive sampling by the GA likely contributes to its ability to find a more refined solution, as reported in Table 2, but comes at the cost of considerably increased computational time.
Table 5 presents the predictive accuracy of the Gaussian Process surrogate models used in the optimization, reported through the coefficient of determination $R^2$. The low $R^2$ values for both joints (0.0336 for DOF 1 and 0.0084 for DOF 2) indicate that the surrogate models provide only coarse approximations of the true cost landscape. As such, while they offer a useful heuristic for guiding the search process, direct simulation evaluations remain essential to ensure accurate assessment of candidate solutions.
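As a hedged illustration of how such an $R^2$ figure can be obtained with scikit-learn, the sketch below fits a GP surrogate to a held-out split. The random placeholder arrays stand in for the normalized gains and the Joint 1 cost; the paper's exact kernel choice and train/test split are not specified, so this is an assumption, not the actual implementation:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the normalized gain sweep and the
# Joint 1 steady-state cost; replace with the actual dataset arrays.
rng = np.random.default_rng(0)
gains_n = rng.random((400, 2))
cost_j1 = rng.random(400)

X_tr, X_te, y_tr, y_te = train_test_split(gains_n, cost_j1,
                                          test_size=0.2, random_state=0)
gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(),
                              normalize_y=True, alpha=1e-6)
gp.fit(X_tr, y_tr)
print("surrogate R^2:", r2_score(y_te, gp.predict(X_te)))
```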
Figure 5 illustrates the convergence behavior of the three optimization algorithms over 50 iterations or generations. The BO trajectory shows significant oscillations, reflecting its probabilistic sampling strategy and active exploration of the gain space. In contrast, both PSO and GA exhibit rapid convergence within the first few iterations, stabilizing early in low-error regions. For PSO and GA, the plot displays the best normalized cost value found at each generation, highlighting their exploitation-driven search dynamics. The dashed horizontal line representing PSO marks the lowest final cost among the three methods, consistent with the results reported in Table 2. These convergence profiles illustrate the distinct exploration–exploitation balances intrinsic to each optimization strategy. For clarity, Figure 5 plots the normalized objective $\tilde{J}$ (built from the steady-state tracking-error cost defined in Section 2.6; lower is better). Each curve reports the best-so-far value $\tilde{J}^{\mathrm{best}}(t) = \min_{\tau \le t} \tilde{J}(\tau)$ as a function of the iteration/generation index $t$, and all methods are run with an equal budget of 50 iterations/generations. The $y$-axis is unitless due to normalization to $[0,1]$, and the $x$-axis counts iterations (BO) or generations (PSO/GA).
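Computing such a best-so-far curve is a one-liner; the sketch below (with illustrative sample values) shows why the plotted envelope is monotone non-increasing even when the raw BO samples oscillate:

```python
import numpy as np

def best_so_far(costs_per_iter: np.ndarray) -> np.ndarray:
    """Monotone non-increasing convergence curve: the running minimum of the
    normalized objective, as plotted in Figure 5."""
    return np.minimum.accumulate(costs_per_iter)

# Example: raw samples oscillate, but the best-so-far envelope only drops.
raw = np.array([0.9, 0.4, 0.7, 0.3, 0.8, 0.25])
print(best_so_far(raw))   # [0.9 0.4 0.4 0.3 0.3 0.25]
```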
The early stagnation of the PSO curve suggests that the swarm quickly located a promising region in the search space and collectively converged without further significant improvement, possibly due to limited diversity or suboptimal balance in its exploration parameters. On the other hand, the persistent oscillations in the BO curve indicate that the acquisition function continues to propose exploratory samples far from previously evaluated regions. This reflects the BO strategy’s emphasis on global search, even after identifying near-optimal regions, which may delay convergence but can help avoid premature local optima. Such behavior highlights that while BO may take longer to settle, it remains valuable in problems where the cost landscape is highly multimodal or where evaluations are costly and limited in number.
Overall, the results demonstrate that all three optimization strategies effectively identify high-performance gain configurations. BO offers a favorable trade-off between tracking performance and computational efficiency, whereas the GA achieves the lowest tracking cost at the expense of significantly higher computational effort and number of evaluations.
Figure 8 shows the torques produced over a 5 s simulation with a sampling period of 1 ms. Dashed horizontal lines indicate the actuator torque limits (in N·m) for Joints 1 and 2. Both $\tau_1$ and $\tau_2$ remain within their bounds, exhibiting the expected behavior of SMC while satisfying the torque requirements stated in Section 2.3. The apparent saturation in the control signals is an intended consequence of the control law in (8): whenever the commanded torque exceeds its actuator limit, the second branch is activated and the torque is clipped to the limit value with the sign of the computed control. This explains the plateaus at the Joint 1 and Joint 2 torque limits, ensures compliance with actuator constraints, and produces the characteristic behavior typical of SMC.
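A minimal sketch of this saturation branch is shown below. The numeric limits in `TAU_MAX` are placeholders only; the actual per-joint bounds are those stated in Section 2.3:

```python
import numpy as np

# Illustrative limits only; the actual per-joint bounds are given in Sec. 2.3.
TAU_MAX = np.array([15.0, 5.0])   # [Joint 1, Joint 2], N*m (placeholders)

def saturate(tau_cmd: np.ndarray) -> np.ndarray:
    """Saturation branch of the control law (8): whenever the commanded
    torque exceeds its actuator limit, clip to the limit with the same sign."""
    return np.where(np.abs(tau_cmd) > TAU_MAX,
                    TAU_MAX * np.sign(tau_cmd),
                    tau_cmd)

# Example: a commanded torque of [20.0, -7.5] N*m is clipped to [15.0, -5.0].
print(saturate(np.array([20.0, -7.5])))
```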
4. Discussion
The results presented in Section 3 confirm the effectiveness of the proposed AI-based optimization framework for tuning the gains of a neural discrete-time sliding mode controller applied to a 2-DOF robotic manipulator. However, several additional aspects merit discussion to contextualize the contributions and limitations of this work.
First, while this study focused on three popular metaheuristic algorithms—Bayesian Optimization (BO), Particle Swarm Optimization (PSO), and Genetic Algorithms (GAs)—it is important to note that other AI-driven and classical optimization techniques are available. Alternatives such as Differential Evolution, Ant Colony Optimization, Reinforcement Learning-based controllers, or even gradient-based deterministic methods (e.g., Sequential Quadratic Programming) could also be explored. Each approach brings trade-offs in terms of convergence rate, robustness, and computational demand, which may prove beneficial in different application scenarios.
Second, the optimization objective in this study targeted the minimization of the steady-state tracking error, as defined by the cost function in (22). While this metric is fundamental for ensuring accurate long-term tracking, it does not explicitly account for transient dynamics such as overshoot, settling time, or control energy expenditure.
Third, the simulation data used for training the surrogate models and evaluating the controller was generated through a systematic sweep of normalized gain values. Despite the limited size of the dataset (on the order of hundreds of samples), the convergence of BO, PSO, and GA to similar optimal regions indicates that the data were sufficiently informative to guide the optimization processes. This also highlights the advantage of working in a normalized gain space, which allowed the models to generalize well across the explored domain.
It is also worth noting that the surrogate models used in BO and PSO approximated the cost landscape with relatively low precision (as reflected by the $R^2$ scores in Table 5). Nevertheless, the optimization routines remained effective due to the reinforcement provided by direct simulation evaluations. This suggests that even modest-quality models can support efficient gain tuning when combined with simulation-based corrections.
Finally, although the neural plant model employed an EKF for online adaptation of covariance matrices, future research could investigate other adaptive training mechanisms such as recursive least squares, ensemble learning, or deep learning-based dynamics estimators. Incorporating uncertainty quantification into the plant model could further improve the robustness of the control strategy in real-world settings.
In summary, the methodology proposed in this study establishes a solid foundation for AI-based controller optimization. It remains extensible to other objective functions, optimization techniques, and controller architectures, thereby offering a flexible framework for advancing autonomous control of robotic systems.
Recent studies have explored data-driven or metaheuristic tuning of robust controllers in robotic applications. For instance, the work in [26] combines SMC with online BO for a bio-inspired underwater robot, showing marked improvements in depth and pitch regulation after roughly fifty online updates while maintaining Lyapunov-stable operation. In terrestrial settings, local BO with crash constraints has been proposed to safely tune controller parameters under hardware limits [27]. Within metaheuristics, the work in [28] integrates PSO with a modified power–rate SMC to mitigate chattering and simplify tuning on a multi-DOF manipulator with experimental validation; the authors in [29] compare several metaheuristics (DE, PSO, hybrids) for manipulator trajectory tracking using a multi-term objective (position/orientation/joint errors) and uniform iteration budgets; and Keshti [30] employs a GA to tune SMC gains for human-arm trajectory tracking, focusing on minimizing a summary tracking error.
Compared with the above, we explicitly adopt a discrete-time SMC and a recurrent high-order neural network (NN) that reproduces the dynamics of the 2-DOF manipulator (serving as the predictive plant for indirect control). Within this discrete-time setting, we evaluate the same controller and dataset across BO, PSO, and GA, enabling a direct trade-off reading between accuracy and computation (Table 2 and Table 3).
Table 2 shows that PSO attains the lowest steady-state error for Joint 1, while GA attains the lowest for Joint 2 (by a narrow margin), and BO is competitive in both joints. From Table 3, PSO achieves the shortest optimization time among the three, BO is similar, and GA is noticeably more expensive, consistent with prior reports that PSO offers fast convergence for nonlinear objectives [28] and that BO is attractive for sample-efficient tuning [26,27]. Together with the progression plots and convergence curves, these results contextualize our contribution: a reproducible, apples-to-apples benchmark of discrete-time SMC gain tuning with a recurrent high-order NN-based plant model for a 2-DOF robot, revealing that PSO is an excellent first choice when compute is tight, BO is attractive when safe, sample-efficient exploration is needed, and GA can yield competitive accuracy at higher computational cost. Table 6 summarizes this comparison.
Following industry practice, steady state is declared once the trajectory enters and remains within a tolerance band for a dwell time. In accordance with the ISO 9283 [31] “position stabilization time,” the band can be tied to robot repeatability or set as a percentage band, and the dwell time ensures persistence inside the band. In our experiments (sampling at 1 kHz), we adopt a 0.1 s dwell (100 samples) once the band is entered and compute steady-state errors over that dwell segment. This aligns with the classical settling-time definition based on tolerance bands used in control theory [32].
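A minimal sketch of this band-plus-dwell detection is shown below (assuming a 1 kHz error signal, so that 100 samples correspond to the 0.1 s dwell; the tolerance `band` value is application-specific and not fixed here):

```python
import numpy as np

def steady_state_start(err: np.ndarray, band: float,
                       dwell: int = 100) -> int | None:
    """Return the first sample index after which |err| stays inside the
    tolerance band for `dwell` consecutive samples (100 samples = 0.1 s
    at 1 kHz), or None if the signal never settles."""
    inside = np.abs(err) <= band
    run = 0
    for k, ok in enumerate(inside):
        run = run + 1 if ok else 0
        if run >= dwell:
            return k - dwell + 1
    return None

# The steady-state error is then computed over err[i : i + dwell],
# where i = steady_state_start(err, band).
```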
5. Conclusions
This work presented a comprehensive AI-based optimization framework for tuning the gains of a neural discrete-time sliding mode controller applied to a two-degree-of-freedom robotic manipulator. The controller was indirectly implemented through a neural network trained with an EKF, allowing the estimation of the plant dynamics and facilitating robust control under model uncertainties.
Three optimization strategies—Bayesian Optimization (BO), Particle Swarm Optimization (PSO), and Genetic Algorithms (GAs)—were evaluated using a normalized parameter space and surrogate modeling based on Gaussian Processes. The optimization process was driven by the minimization of normalized steady-state tracking errors for both joints of the robot. The normalization framework enabled a unified treatment of the gain space and tracking error costs, improving convergence efficiency and reproducibility.
The results demonstrated that all three metaheuristics successfully identified high-performance gain configurations. GAs achieved the lowest tracking cost but at the expense of a significantly higher number of evaluations and computation time. BO and PSO yielded nearly identical results with much lower computational effort. These findings highlight that the proposed methodology provides a flexible and efficient solution for automatic controller tuning, especially in scenarios where analytical models are unavailable or costly to evaluate.
Future work will explore real-time adaptation of controller gains using online learning techniques and extend the optimization framework to multi-objective formulations involving control effort, energy consumption, and robustness metrics.