1. Introduction
Neural sliding mode control (SMC) has become an effective paradigm for dealing with the nonlinearities and uncertainties inherent in robotic systems. In particular, discrete-time formulations of SMC combined with neural approximators have gained attention for real-time control applications involving robotic manipulators. However, the tuning of control gains in these architectures remains a non-trivial task that directly influences performance, robustness, and stability.
Recent contributions in the literature have explored a variety of approaches to optimize sliding mode controller parameters using nature-inspired and learning-based strategies. For example, in [1], a robust adaptive neural SMC framework was proposed, optimized via the Grey Wolf Optimizer to enhance tracking accuracy and robustness in robot manipulators. Similarly, in [2], a Particle Swarm Optimization (PSO)-tuned SMC was implemented in the context of robotic rehabilitation, highlighting its effectiveness for human–robot interaction. Within physical human–robot collaboration (pHRC), recent work has modeled a robot's trust in its human co-worker using measurable factors, validated on a collaborative manipulator setup [3]. Other researchers have focused on Bayesian-based strategies, such as adaptive Gaussian process modeling with adaptive kernels to improve sample efficiency in robotic control scenarios [4]. Genetic Algorithms (GAs) have also been investigated for gain optimization in SMC settings; for instance, [5] demonstrated the effectiveness of GA-tuned SMC in mitigating chattering and ensuring precision tracking in multi-DOF manipulators.
Beyond traditional metaheuristics, recent advances have incorporated Artificial Intelligence techniques into the sliding mode framework. For example, in [6], an adaptive neural sliding mode controller for a flexible robotic manipulator was proposed, where the neural network learns online to approximate model uncertainties, ensuring robust tracking despite dynamic variations. In a related direction, in [7], a deep reinforcement learning-based SMC design was introduced for partially known nonlinear systems, in which a policy gradient method adaptively adjusts the sliding mode control effort, effectively bridging classical robustness with data-driven adaptability. Complementarily, sampled-data admittance control integrated with a data-driven moving-horizon velocity estimator (based on Willems' fundamental lemma) has been shown to stabilize pHRI under noisy, discontinuous velocity measurements and to improve both transient and steady-state tracking [8].
Despite recent advances in Artificial Intelligence (AI)-based controller design, most studies either focus on continuous-time implementations or lack a unified methodology for comparing multiple optimization strategies under the same normalized framework. Moreover, few works incorporate a surrogate modeling strategy to reduce computational cost during optimization iterations. Also, in real-world control applications, the tuning of controller gains is frequently performed manually, relying on heuristic adjustments and iterative trial-and-error refinements by human experts. This manual approach is not only time-consuming but also prone to suboptimal performance, especially in systems with nonlinear dynamics, modeling errors, or parametric uncertainties. These challenges are amplified when dealing with discrete-time implementations and neural-based controllers, where interactions between gain parameters and recurrent dynamics are not easily interpretable.
To address these issues, this work proposes a comprehensive methodology to optimize the gains of a neural discrete-time sliding mode controller using three metaheuristic techniques: Bayesian Optimization (BO), Particle Swarm Optimization (PSO), and Genetic Algorithms (GAs). The robot dynamics are first estimated via a recurrent high-order neural network trained online with an Extended Kalman Filter (EKF) algorithm that dynamically updates the measurement and process noise covariance matrices. Then, a dataset of tracking errors and gains is collected by sweeping the controller parameters in the physical domain and subsequently normalizing them. This normalization ensures that all optimization algorithms operate within a common domain $[0,1]$, enabling fair comparison and improved convergence behavior.
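As a concrete illustration of this normalization step, the following minimal Python sketch performs the column-wise min-max scaling (the gain ranges and the 80 × 80 grid are illustrative placeholders, chosen only so that the sweep yields the 6400 gain pairs used in this work; the actual bounds come from the physical domain):

```python
import numpy as np

def minmax_normalize(x: np.ndarray):
    """Column-wise min-max scaling to [0, 1]; also return the bounds needed
    to map optimizer outputs back to physical gain units."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo), lo, hi

# Placeholder sweep standing in for the collected gain dataset:
# an 80 x 80 grid gives the 6400 gain pairs evaluated in simulation.
gains = np.stack(np.meshgrid(np.linspace(0.5, 5.0, 80),
                             np.linspace(0.5, 5.0, 80)), -1).reshape(-1, 2)
gains_n, g_lo, g_hi = minmax_normalize(gains)   # search space in [0, 1]

# De-normalization applied to an optimizer output k_n in [0, 1]:
k_phys = g_lo + 0.5 * (g_hi - g_lo)             # example at k_n = 0.5
```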
The objective is to identify the optimal control gains that minimize steady-state tracking errors in both joints of a 2-DOF robotic manipulator. By leveraging surrogate models and normalized search spaces, this study demonstrates how each optimization method balances accuracy, computational effort, and robustness, offering valuable insights into controller design for robotic applications.
The main contributions of this work are as follows:
Discrete-time SMC + recurrent high-order NN integration. A discrete-time sliding-mode controller is coupled with a recurrent high-order neural network identified online via an EKF with adaptive noise covariance updates. This NN reproduces the 2-DOF robot dynamics used by the controller.
Unified, normalized benchmark across BO/PSO/GA. All optimizers run in the same pipeline (shared data loading, normalized gain space $[0,1]$, identical budgets/criteria, common steady-state error objective), enabling a fair, apples-to-apples comparison; a minimal sketch of this equal-budget loop is given after the contribution summary below.
Surrogate-assisted screening with diagnostics. GP surrogates are integrated, and model adequacy ($R^2$), evaluation counts, and wall-clock time are reported alongside tracking performance.
Progression and convergence analytics. Progression plots (early/mid/final) and normalized convergence curves of the aggregate objective are provided to expose exploration–exploitation dynamics and steady-state improvements in discrete time.
Constraint-aware evaluation. Actuator torque limits are enforced via the SMC saturation branch, and bounded control actions are verified in simulation, linking optimization outcomes to implementability.
Actionable guidance. The unified analysis reveals when PSO, BO, or GA is preferable (e.g., PSO is fastest, BO is most sample-efficient, and GA offers competitive accuracy at higher cost), providing practical guidance for discrete-time SMC with recurrent high-order NN plants.
Principled alternative to heuristic tuning. Ad hoc trial-and-error tuning is replaced with a reproducible, data-driven pipeline (discrete-time SMC with a recurrent high-order NN plant model) optimized under a unified, normalized setup with BO/PSO/GA.
Together, these elements constitute a novel, reproducible benchmarking framework for discrete-time SMC gain tuning with recurrent high-order NN-based plant models in 2-DOF manipulators, extending beyond isolated demonstrations to deliver a systematic methodology and decision-relevant evidence for controller design.
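To make the equal-budget comparison concrete, the following Python sketch shows the shape of such a benchmarking loop. It is a sketch under stated assumptions: the name `run_benchmark` and the optimizer-callable interface (taking an objective, a budget, and a random generator, and returning the best gain and cost) are illustrative, not the paper's actual API.

```python
import time
import numpy as np

def run_benchmark(optimizers: dict, objective, budget: int = 50, seed: int = 0):
    """Run every optimizer on the same normalized objective with an equal
    budget, recording best cost, evaluation count, and wall-clock time."""
    results = {}
    for name, optimize in optimizers.items():
        rng = np.random.default_rng(seed)   # same seed for every method
        n_evals = 0

        def counted(k):                     # wrap to count objective calls
            nonlocal n_evals
            n_evals += 1
            return objective(k)

        t0 = time.perf_counter()
        best_k, best_j = optimize(counted, budget, rng)
        results[name] = {"best_gain": best_k, "best_cost": best_j,
                         "evaluations": n_evals,
                         "seconds": time.perf_counter() - t0}
    return results
```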
This paper is organized as follows:
Section 2 details the proposed methodology, including the discrete-time modeling of the 2-DOF robot manipulator, the identification of the plant using a recurrent high-order NN trained with the EKF, the implementation of the neural discrete-time sliding mode control algorithm, and the organization of the dataset used for training and simulation. It also describes the application of three metaheuristic optimization algorithms—BO, PSO, and GA—for automatic gain tuning.
Section 3 presents the main simulation results across the entire dataset, together with tables summarizing the key outcomes, and
Section 4 discusses the findings and their implications.
3. Results
To enhance the transparency of the optimization process, progression plots are provided for each algorithm: BO (Figure 2), PSO (Figure 3), and GA (Figure 4). Each figure is arranged as a 2 × 3 grid: the top row shows Joint 1 and the bottom row Joint 2; the columns correspond to early, mid, and final stages of the search. In every panel, the black curve is the reference, the blue curve is the estimated position $\hat{q}_i$, and the red curve (right axis) is the estimation error $e_i$; these signals are reconstructed directly from the trajectories logged in the dataset. When available, the three milestones are taken from the optimization logs at iterations 1, 25, and 50; otherwise, a representative early/mid/final triplet is selected by sorting the per-simulation cost $J$ and picking the worst, median, and best cases. This presentation complements the convergence summary (Figure 5) by showing how tracking quality improves over time in each method.
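A minimal sketch of this milestone-selection fallback is given below. The `logs` dictionary (mapping logged iterations to simulation indices) is an illustrative structure, not the paper's actual logging format:

```python
import numpy as np

def select_milestones(costs: np.ndarray, log_iters=(1, 25, 50), logs=None):
    """Pick early/mid/final milestone simulations for the progression plots.
    Prefer the logged iterations; otherwise fall back to worst/median/best
    ranked by per-simulation cost J."""
    if logs is not None and all(i in logs for i in log_iters):
        return [logs[i] for i in log_iters]       # indices at iters 1/25/50
    order = np.argsort(costs)                     # ascending cost
    return [order[-1], order[len(order) // 2], order[0]]  # worst/median/best
```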
3.1. Neural Sliding Mode Control Simulation over the Dataset
As a result of the script and for the dataset obtained in Section 2.4, Figure 6 and Figure 7 show tracking-performance comparisons of the proposed control scheme for the 2-DOF planar robotic manipulator, for Joints 1 and 2, respectively. Each figure displays the reference signal (in black), the real joint position (in red), and the estimated position provided by the recurrent high-order NN (in blue) for all simulations with the 6400 different values of the two controller gains defined in (18).
In Figure 6, the measured position $q_1$ and the estimated state $\hat{q}_1$ closely follow the reference over the full 5 s simulation. The red and blue semi-transparent traces correspond to the 6400 simulations with different controller gains, allowing visual assessment of variability; the narrow dispersion indicates consistent tracking. The inset zoom emphasizes the initial transient. These results also show that the recurrent high-order NN accurately estimates the robot state, as $\hat{q}_1$ matches $q_1$ with high fidelity.
Figure 7 shows analogous results for Joint 2. After a short transient, both $q_2$ and $\hat{q}_2$ align with the reference, demonstrating robust tracking over the full sweep of gains. The agreement between $q_2$ and $\hat{q}_2$ further validates the effectiveness of the neural estimator. To explicitly illustrate optimization progress (beyond aggregate tracking), per-algorithm progression plots are included (early/mid/final milestones taken from iterations 1, 25, and 50, or a worst/median/best fallback); see Figure 2, Figure 3 and Figure 4, and the convergence of the objective value over iterations/generations in Figure 5. These plots complement Figure 6 and Figure 7 by showing how performance improves as each method converges.
Table 1 reports the mean values of the state estimation errors for clarity. MAE (mean absolute error) is used as the primary metric, and the mean error (bias) and RMSE (root mean squared error) are also reported. For the joint positions $q_1$ and $q_2$, the MAE values reported in Table 1 are accompanied by very small mean errors, indicating negligible bias, and the corresponding RMSE values are close to the MAE, suggesting the absence of large outliers in the position estimates. For the joint velocities $\dot{q}_1$ and $\dot{q}_2$, the MAE and mean-error values in Table 1 are likewise small; however, the larger gap between RMSE and MAE for Joint 2 points to sporadic higher deviations in its velocity estimate. Table 1 complements Figure 6 and Figure 7 and clearly demonstrates the accuracy of the state estimation algorithm.
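For reference, a minimal Python sketch of these three metrics is given below (the function and variable names are illustrative; `x_true` and `x_hat` stand for any measured state and its neural estimate):

```python
import numpy as np

def estimation_metrics(x_true: np.ndarray, x_hat: np.ndarray) -> dict:
    """MAE, mean error (bias), and RMSE of a state-estimation error signal."""
    e = x_true - x_hat
    return {"MAE": np.mean(np.abs(e)),
            "bias": np.mean(e),
            "RMSE": np.sqrt(np.mean(e ** 2))}

# A gap between RMSE and MAE (RMSE >> MAE) flags sporadic large deviations,
# as observed for the Joint 2 velocity estimate in Table 1.
```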
3.2. Optimal Control Gains for BO, PSO, and GA
Table 2, Table 3, Table 4 and Table 5 summarize the performance of the optimization algorithms applied to the tuning of the neural discrete-time sliding mode controller gains for the robot manipulator.
Table 2 and Table 3 jointly summarize accuracy and efficiency under an identical pipeline (shared data loading/normalization and a single Python implementation). The optimal gains concentrate in a narrow region of the normalized gain space, and the average steady-state errors are computed over the final 0.1 s of the simulation ($t \in [4.9, 5.0]$ s). Quantitatively, PSO attains the lowest error in Joint 1, GA the lowest in Joint 2, and BO remains competitive in both joints (see Table 2) while using only 50 evaluations. In terms of time, PSO is the fastest (23.44 s), BO is similar (23.65 s), and GA is costlier (61.98 s plus a 7.70 s refinement, about 69.68 s in total), consistent with its larger evaluation budget.
Table 4 shows that the GA requires the highest number of function evaluations (7500), significantly more than PSO with 1500 and BO with only 50. This extensive sampling by the GA likely contributes to its ability to find a more refined solution, as reported in Table 2, but comes at the cost of considerably increased computational time.
Table 5 presents the predictive accuracy of the Gaussian Process surrogate models used in the optimization, reported through the coefficient of determination $R^2$. The low $R^2$ values for both joints (0.0336 for DOF 1 and 0.0084 for DOF 2) indicate that the surrogate models provide only coarse approximations of the true cost landscape. As such, while they offer a useful heuristic for guiding the search process, direct simulation evaluations remain essential to ensure accurate assessment of candidate solutions.
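As a hedged illustration of how such an $R^2$ figure can be obtained with scikit-learn, the sketch below fits a GP surrogate to a held-out split. The random placeholder arrays stand in for the normalized gains and the Joint 1 cost; the paper's exact kernel choice and train/test split are not specified, so this is an assumption, not the actual implementation:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the normalized gain sweep and the
# Joint 1 steady-state cost; replace with the actual dataset arrays.
rng = np.random.default_rng(0)
gains_n = rng.random((400, 2))
cost_j1 = rng.random(400)

X_tr, X_te, y_tr, y_te = train_test_split(gains_n, cost_j1,
                                          test_size=0.2, random_state=0)
gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(),
                              normalize_y=True, alpha=1e-6)
gp.fit(X_tr, y_tr)
print("surrogate R^2:", r2_score(y_te, gp.predict(X_te)))
```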
Figure 5 illustrates the convergence behavior of the three optimization algorithms over 50 iterations or generations. The BO trajectory shows significant oscillations, reflecting its probabilistic sampling strategy and active exploration of the gain space. In contrast, both PSO and GA exhibit rapid convergence within the first few iterations, stabilizing early in low-error regions. For PSO and GA, the plot displays the best normalized cost value found at each generation, highlighting their exploitation-driven search dynamics. The dashed horizontal line representing PSO marks the lowest final cost among the three methods, consistent with the results reported in Table 2. These convergence profiles illustrate the distinct exploration–exploitation balances intrinsic to each optimization strategy. For clarity, Figure 5 plots the normalized objective $\tilde{J}$ (built from the steady-state tracking-error cost defined in Section 2.6; lower is better). Each curve reports the best-so-far value $\tilde{J}^{\mathrm{best}}(t) = \min_{\tau \le t} \tilde{J}(\tau)$ as a function of the iteration/generation index $t$, and all methods are run with an equal budget of 50 iterations/generations. The $y$-axis is unitless due to normalization to $[0,1]$, and the $x$-axis counts iterations (BO) or generations (PSO/GA).
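Computing such a best-so-far curve is a one-liner; the sketch below (with illustrative sample values) shows why the plotted envelope is monotone non-increasing even when the raw BO samples oscillate:

```python
import numpy as np

def best_so_far(costs_per_iter: np.ndarray) -> np.ndarray:
    """Monotone non-increasing convergence curve: the running minimum of the
    normalized objective, as plotted in Figure 5."""
    return np.minimum.accumulate(costs_per_iter)

# Example: raw samples oscillate, but the best-so-far envelope only drops.
raw = np.array([0.9, 0.4, 0.7, 0.3, 0.8, 0.25])
print(best_so_far(raw))   # [0.9 0.4 0.4 0.3 0.3 0.25]
```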
The early stagnation of the PSO curve suggests that the swarm quickly located a promising region in the search space and collectively converged without further significant improvement, possibly due to limited diversity or suboptimal balance in its exploration parameters. On the other hand, the persistent oscillations in the BO curve indicate that the acquisition function continues to propose exploratory samples far from previously evaluated regions. This reflects the BO strategy’s emphasis on global search, even after identifying near-optimal regions, which may delay convergence but can help avoid premature local optima. Such behavior highlights that while BO may take longer to settle, it remains valuable in problems where the cost landscape is highly multimodal or where evaluations are costly and limited in number.
Overall, the results demonstrate that all three optimization strategies effectively identify high-performance gain configurations. BO offers a favorable trade-off between tracking performance and computational efficiency, whereas the GA achieves the lowest tracking cost at the expense of significantly higher computational effort and number of evaluations.
Figure 8 shows the torques produced over a 5 s simulation with a sampling period of 1 ms. Dashed horizontal lines indicate the actuator torque limits (in N·m) for Joints 1 and 2. Both $\tau_1$ and $\tau_2$ remain within their bounds, exhibiting the expected behavior of SMC while satisfying the torque requirements stated in Section 2.3. The apparent saturation in the control signals is an intended consequence of the control law in (8): whenever the commanded torque exceeds its actuator limit, the second branch is activated and the torque is clipped to the limit value with the sign of the computed control. This explains the plateaus at the Joint 1 and Joint 2 torque limits, ensures compliance with actuator constraints, and produces the characteristic behavior typical of SMC.
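A minimal sketch of this saturation branch is shown below. The numeric limits in `TAU_MAX` are placeholders only; the actual per-joint bounds are those stated in Section 2.3:

```python
import numpy as np

# Illustrative limits only; the actual per-joint bounds are given in Sec. 2.3.
TAU_MAX = np.array([15.0, 5.0])   # [Joint 1, Joint 2], N*m (placeholders)

def saturate(tau_cmd: np.ndarray) -> np.ndarray:
    """Saturation branch of the control law (8): whenever the commanded
    torque exceeds its actuator limit, clip to the limit with the same sign."""
    return np.where(np.abs(tau_cmd) > TAU_MAX,
                    TAU_MAX * np.sign(tau_cmd),
                    tau_cmd)

# Example: a commanded torque of [20.0, -7.5] N*m is clipped to [15.0, -5.0].
print(saturate(np.array([20.0, -7.5])))
```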
4. Discussion
The results presented in Section 3 confirm the effectiveness of the proposed AI-based optimization framework for tuning the gains of a neural discrete-time sliding mode controller applied to a 2-DOF robotic manipulator. However, several additional aspects merit discussion to contextualize the contributions and limitations of this work.
First, while this study focused on three popular metaheuristic algorithms—Bayesian Optimization (BO), Particle Swarm Optimization (PSO), and Genetic Algorithms (GAs)—it is important to note that other AI-driven and classical optimization techniques are available. Alternatives such as Differential Evolution, Ant Colony Optimization, Reinforcement Learning-based controllers, or even gradient-based deterministic methods (e.g., Sequential Quadratic Programming) could also be explored. Each approach brings trade-offs in terms of convergence rate, robustness, and computational demand, which may prove beneficial in different application scenarios.
Second, the optimization objective in this study targeted the minimization of the steady-state tracking error, as defined by the cost function in (22). While this metric is fundamental for ensuring accurate long-term tracking, it does not explicitly account for transient dynamics such as overshoot, settling time, or control energy expenditure.
Third, the simulation data used for training the surrogate models and evaluating the controller was generated through a systematic sweep of normalized gain values. Despite the limited size of the dataset (on the order of hundreds of samples), the convergence of BO, PSO, and GA to similar optimal regions indicates that the data were sufficiently informative to guide the optimization processes. This also highlights the advantage of working in a normalized gain space, which allowed the models to generalize well across the explored domain.
It is also worth noting that the surrogate models used in BO and PSO approximated the cost landscape with relatively low precision (as reflected by the $R^2$ scores in Table 5). Nevertheless, the optimization routines remained effective due to the reinforcement provided by direct simulation evaluations. This suggests that even modest-quality models can support efficient gain tuning when combined with simulation-based corrections.
Finally, although the neural plant model employed an EKF for online adaptation of covariance matrices, future research could investigate other adaptive training mechanisms such as recursive least squares, ensemble learning, or deep learning-based dynamics estimators. Incorporating uncertainty quantification into the plant model could further improve the robustness of the control strategy in real-world settings.
In summary, the methodology proposed in this study establishes a solid foundation for AI-based controller optimization. It remains extensible to other objective functions, optimization techniques, and controller architectures, thereby offering a flexible framework for advancing autonomous control of robotic systems.
Recent studies have explored data-driven or metaheuristic tuning of robust controllers in robotic applications. For instance, the work in [26] combines SMC with online BO for a bio-inspired underwater robot, showing marked improvements in depth and pitch regulation after roughly fifty online updates while maintaining Lyapunov-stable operation. In terrestrial settings, local BO with crash constraints has been proposed to safely tune controller parameters under hardware limits [27]. Within metaheuristics, the work in [28] integrates PSO with a modified power–rate SMC to mitigate chattering and simplify tuning on a multi-DOF manipulator with experimental validation; the authors in [29] compare several metaheuristics (DE, PSO, hybrids) for manipulator trajectory tracking using a multi-term objective (position/orientation/joint errors) and uniform iteration budgets; and Keshti [30] employs a GA to tune SMC gains for human-arm trajectory tracking, focusing on minimizing a summary tracking error.
Compared with the above, we explicitly adopt a discrete-time SMC and a recurrent high-order neural network (NN) that reproduces the dynamics of the 2-DOF manipulator (serving as the predictive plant for indirect control). Within this discrete-time setting, we evaluate the same controller and dataset across BO, PSO, and GA, enabling a direct trade-off reading between accuracy and computation (Table 2 and Table 3).
Table 2 shows that PSO attains the lowest steady-state error for Joint 1, while GA attains the lowest for Joint 2 (by a narrow margin), and BO is competitive in both joints. From Table 3, PSO achieves the shortest optimization time among the three, BO is similar, and GA is noticeably more expensive, consistent with prior reports that PSO offers fast convergence for nonlinear objectives [28] and that BO is attractive for sample-efficient tuning [26,27]. Together with the progression plots and convergence curves, these results contextualize our contribution: a reproducible, apples-to-apples benchmark of discrete-time SMC gain tuning with a recurrent high-order NN-based plant model for a 2-DOF robot, revealing that PSO is an excellent first choice when compute is tight, BO is attractive when safe, sample-efficient exploration is needed, and GA can yield competitive accuracy at higher computational cost. Table 6 summarizes this comparison.
Following industry practice, steady state is declared once the trajectory enters and remains within a tolerance band for a dwell time. In accordance with the ISO 9283 [31] “position stabilization time,” the band can be tied to robot repeatability or set as a percentage band, and the dwell time ensures persistence inside the band. In our experiments (sampling at 1 kHz), we adopt a 0.1 s dwell (100 samples) once the band is entered and compute steady-state errors over that dwell segment. This aligns with the classical settling-time definition based on tolerance bands used in control theory [32].
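A minimal sketch of this band-plus-dwell detection is shown below (assuming a 1 kHz error signal, so that 100 samples correspond to the 0.1 s dwell; the tolerance `band` value is application-specific and not fixed here):

```python
import numpy as np

def steady_state_start(err: np.ndarray, band: float,
                       dwell: int = 100) -> int | None:
    """Return the first sample index after which |err| stays inside the
    tolerance band for `dwell` consecutive samples (100 samples = 0.1 s
    at 1 kHz), or None if the signal never settles."""
    inside = np.abs(err) <= band
    run = 0
    for k, ok in enumerate(inside):
        run = run + 1 if ok else 0
        if run >= dwell:
            return k - dwell + 1
    return None

# The steady-state error is then computed over err[i : i + dwell],
# where i = steady_state_start(err, band).
```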
5. Conclusions
This work presented a comprehensive AI-based optimization framework for tuning the gains of a neural discrete-time sliding mode controller applied to a two-degree-of-freedom robotic manipulator. The controller was indirectly implemented through a neural network trained with an EKF, allowing the estimation of the plant dynamics and facilitating robust control under model uncertainties.
Three optimization strategies—Bayesian Optimization (BO), Particle Swarm Optimization (PSO), and Genetic Algorithms (GAs)—were evaluated using a normalized parameter space and surrogate modeling based on Gaussian Processes. The optimization process was driven by the minimization of normalized steady-state tracking errors for both joints of the robot. The normalization framework enabled a unified treatment of the gain space and tracking error costs, improving convergence efficiency and reproducibility.
The results demonstrated that all three metaheuristics successfully identified high-performance gain configurations. GAs achieved the lowest tracking cost but at the expense of a significantly higher number of evaluations and computation time. BO and PSO yielded nearly identical results with much lower computational effort. These findings highlight that the proposed methodology provides a flexible and efficient solution for automatic controller tuning, especially in scenarios where analytical models are unavailable or costly to evaluate.
Future work will explore real-time adaptation of controller gains using online learning techniques and extend the optimization framework to multi-objective formulations involving control effort, energy consumption, and robustness metrics.