Article

Neuro-Visual Adaptive Control for Precision in Robot-Assisted Surgery

by Claudio Urrea *, Yainet Garcia-Garcia, John Kern and Reinier Rodriguez-Guillen
Electrical Engineering Department, Faculty of Engineering, University of Santiago of Chile, Las Sophoras 165, Estación Central, Santiago 9170020, Chile
* Author to whom correspondence should be addressed.
Technologies 2025, 13(4), 135; https://doi.org/10.3390/technologies13040135
Submission received: 17 January 2025 / Revised: 26 March 2025 / Accepted: 29 March 2025 / Published: 1 April 2025

Abstract: This study introduces a Neuro-Visual Adaptive Control (NVAC) architecture designed to enhance precision and safety in robot-assisted surgery. The proposed system enables semi-autonomous guidance of the laparoscope based on image input. To achieve this, the architecture integrates the following: (1) a computer vision system based on the YOLO11n model, which detects surgical instruments in real time; (2) a Model Reference Adaptive Control with Proportional–Derivative terms (MRAC-PD), which adjusts the robot’s behavior in response to environmental changes; and (3) Closed-Form Continuous-Time Neural Networks (CfC-mmRNNs), which efficiently model the system’s dynamics. These networks address common deep learning challenges, such as the vanishing gradient problem, and facilitate the generation of smooth control signals that minimize wear on the robot’s actuators. Performance evaluations were conducted in CoppeliaSim, utilizing real cholecystectomy images featuring surgical tools. Experimental results demonstrate that the NVAC achieves maximum tracking errors of 1.80 × 10⁻³ m, 1.08 × 10⁻⁴ m, and 1.90 × 10⁻³ m along the x, y, and z axes, respectively, under highly significant dynamic disturbances. This hybrid approach provides a scalable framework for advancing autonomy in robotic surgery.

1. Introduction

Minimally invasive robotic surgery has revolutionized surgical procedures, offering benefits to both patients—by reducing recovery time and the risk of complications [1,2]—and surgeons by enhancing their capabilities and providing an ergonomically optimal position. Advances in hardware and software have improved precision and dexterity in tissue and organ manipulation, incorporating advanced features such as tremor reduction in the surgeon’s hand, thereby enhancing procedural safety and quality [3,4].
In this context, camera-holding robots have emerged as a solution to improve the stability and quality of intraoperative vision without significantly impacting operation time. These devices minimize unintended movements, reduce the need for endoscope cleaning, and optimize workflow [5]. Over their evolution, they have progressed from basic mechanical systems to intelligent assistant robots integrating computer vision (CV) and machine learning (ML) [6].
However, challenges persist in achieving precise and intuitive control, adapting to dynamic surgical environments, and rejecting external disturbances that negatively affect robot autonomy, field of view (FOV) stability, and procedural fluency. To address these issues, this work proposes a Neuro-Visual Adaptive Control architecture that enhances precision, visual stability, and responsiveness to dynamic environmental variations, thereby promoting robotic autonomy.

Related Work

For a camera-holding robot to assist the surgeon with a degree of autonomy, robust environmental perception and control strategies that adapt to changing conditions are essential. A widely adopted approach to maintaining an appropriate FOV involves dynamically adjusting the vision sensor’s position based on the movement of surgical instruments [7,8,9,10,11]. In this context, artificial intelligence (AI) has driven the development of image-guided robotic surgery, often employing hierarchical control architectures. These are characterized by two primary levels: low-level control, responsible for executing movements through robot and tool regulation, and high-level control, tasked with planning and decision-making based on surgical requirements and visual data.
A commercial example is the RoboLens robot, part of the Sina surgical system, which tracks surgical instruments via image processing [6]. Other systems, such as the Smart Tissue Autonomous Robot (STAR), utilize Image-Based Visual Servoing (IBVS) to enhance real-time surgical instrument alignment precision [12]. In [13], a combination of fuzzy adaptive control, disturbance observation, dynamic surface control, and Lyapunov functions improves synchronization and robustness in teleoperation systems with communication delays. Reference [14] proposes an Adaptive Zeroing Neural Network (AZNN) controller to optimize FOV stability and mitigate external disturbances. Meanwhile, Reference [15] introduces an Adaptive Proportional Sliding Mode controller (APSMC) for manipulating an electromagnetically actuated colonoscope.
As surgical environments grow more complex, adaptive control systems have become critical for procedures demanding high precision, such as neurosurgery [16]. For low-level control, various adaptive strategies have been developed alongside hybrid approaches combining classical and advanced controllers. In [17], an Adaptive Fuzzy Control is introduced for flexible robotic endoscopes, adjusting parameters in real time to compensate for friction and nonlinear dynamics without requiring a precise physical model. Reference [18] designs a Fractional-Order Adaptive Fuzzy Sliding Mode Control (AFFOSMC) strategy to enhance trajectory tracking accuracy in surgical manipulators.
Model Reference Adaptive Control (MRAC) stands out as a prominent adaptive control approach, dynamically adjusting its parameters based on a predefined reference model to optimize system performance amid uncertainties and dynamic variations. MRAC has broad applicability in aerospace, renewable energy, and industrial and service robotics [19,20]. In [21], an MRAC based on Robust Fixed Point Transformations (RFPTs) was implemented in a master–slave system for soft tissue surgery, enabling the slave arm to accurately replicate the surgeon’s movements.
Cursi et al. reported significant improvements in autonomous trajectory tracking tasks using adaptive modeling with probabilistic networks like Bayesian Neural Networks (BNNs) and Evidential Neural Networks (EvNNs), outperforming Feedforward Artificial Neural Network (ANN)-based adaptive modeling [22,23]. In [24], a hybrid adaptive control was developed for a continuum surgical robot, combining neural network-trained inverse kinematics with online tuning of a PID controller. The authors reported a 50% reduction in maximum tracking error using a neural inverse model, with subsequent integration of the adaptive PID controller yielding further improvements of 74% in maximum error (down to 2.41 mm) and 66% in mean error (down to 1.49 mm). These findings underscore how hybrid strategies and data-driven modeling can effectively compensate for nonlinear system dynamics and manage unpredictable random errors during surgical procedures.
Among visual information processing techniques for surgical tool detection, object detection stands out, with deep learning-based methods excelling in this domain. Algorithms such as Faster Region-based Convolutional Neural Networks (Faster R-CNNs) and the You Only Look Once (YOLO) family have been widely adopted for their real-time processing capabilities and high detection rates [25,26,27]. Other proposed approaches include anchor-free CNNs [28], URTNet [29], Hourglass modules [30], and Transformer architectures [31], among others.
Despite these advances, challenges remain in validating system performance under more demanding and variable scenarios, applying data-driven control techniques with faster inference and higher precision, and implementing a holistic approach that enhances camera-holding robot autonomy without compromising safety. The future success of these systems hinges on achieving precise, adaptable endoscopic tracking robust to anatomical variations and external disturbances. Furthermore, proposed solutions must be inherently scalable, facilitating their transition from experimental settings to real-world clinical applications, and optimized for efficient operation on devices with limited computational resources without sacrificing performance critical to safe and effective surgical procedures.
In response to these challenges, we propose a Neuro-Visual Adaptive Control (NVAC) architecture, representing a significant advancement in assistive surgical robotics. This approach effectively addresses critical challenges such as real-time surgical tool tracking, adaptation to nonlinear dynamics, and rejection of significant disturbances. The primary contributions of this work are as follows:
  • Development of a hybrid Neuro-Visual Adaptive Control architecture integrating a Model Reference Adaptive Controller with Proportional–Derivative terms (MRAC-PD) and Closed-Form Continuous-Time Neural Networks with mixed-memory Long Short-Term Memory architectures (CfC-mmRNN). While hybrid adaptive control and neural network approaches exist, the specific combination of MRAC-PD and CfC-mmRNN in robotic surgery is novel. This integration enables precise and adaptable modeling of dynamic surgical environments, delivering high tracking accuracy, rapid response to variations, optimized inference speed, and smoother control signals, thereby enhancing system stability and efficiency.
  • Design and implementation of an optimized visual detection system for robotic surgery, based on YOLO version 11, nano variant (YOLO11n), enabling high-precision real-time tracking with an efficient balance between accuracy and computational load, facilitating improved trajectory planning.
  • Experimental validation in simulated environments demonstrating the robustness, precision, and superiority of the controller compared to standard control strategies (e.g., PID, SMC, and baseline MRAC-PD) under complex, nonlinear surgical conditions.
  • Comprehensive performance evaluation of the controllers using metrics such as tracking accuracy and external disturbance rejection, including significant forces (up to 15 N), sensor failures, and combined perturbations. These complex, representative scenarios highlight the controller’s robustness.
  • Establishment of an integrated framework to promote autonomy in robotic surgery, with a scalable approach for complex surgical environments, providing a solid foundation for future research and development in surgical assistant robots.
This article is organized as follows: Section 2 details the methodology, encompassing computational resources, system description, and the design of the control strategy, covering its theoretical basis and practical implementation. Section 3 presents the experimental results, highlighting validation across diverse scenarios. Section 4 discusses the key factors driving the controller’s performance, evaluates limitations, and suggests future research directions. Section 5 summarizes the conclusions and main contributions. Finally, Section 6 proposes future work aligned with the study’s findings.

2. Materials and Methods

This section outlines the resources, tools, and methodologies employed to develop and validate the proposed Neuro-Visual Adaptive Control system. It details the UR5 robot’s dynamic model, simulation platforms, controller design and validation approaches, performance evaluation metrics, and experimental protocols, ensuring reproducibility and scientific rigor.
The methodology adopts an integrated framework to achieve autonomous visual tracking and robotic control in a simulated surgical environment. It comprises three core modules—Simulation Environment, Visual Processing, and Control Design—as depicted in Figure 1.

2.1. Simulation Setup

For the validation of the designed control strategies, CoppeliaSim [32] was employed—a software platform integrating multiple physics engines and enabling advanced modeling of soft tissues, simulation of tissue deformations, physiological movements, rigid body dynamics, and instrument–tissue interactions with high-precision contact models. This capability supports progressive testing as the research advances, within high-fidelity surgical scenarios [9,33].
The simulation environment replicates a controlled surgical setting, wherein the robot interacts with a mannequin featuring real laparoscopic abdominal surgery images from various patients embedded in its abdomen (Figure 2). These images, applied as textures onto a surface, provide a realistic visual foundation for evaluating the control system. The resulting visual fidelity captures the complexity of the surgical field (e.g., variations in lighting and tissue characteristics), exposing the system to natural anatomical variability and enhancing its robustness [34,35].
To enable dynamic interaction with the images, they are rotated within a range of ±30° around the z-axis from their initial position, at an angular velocity of 0.05 rad/s. This mechanism allows assessment of the system’s ability to maintain a stable visual reference, facilitating laparoscope guidance. The surgical scenario also incorporates elements of laparoscopic surgery, such as a trocar, through which the robot’s end-effector—a passive cylindrical tool with dimensions (0.01 m × 0.01 m × 0.262 m)—is inserted to simulate a laparoscope. At its tip, this tool integrates a perspective vision sensor with Legacy OpenGL rendering, a resolution of 270 × 270 pixels, a 45° field of view, and a detection range of 0.001–0.180 m for short-range visualization.
To ensure surgical safety and fidelity, constraints were established using virtual hyperplanes that delimit the workspace, preventing collisions between the laparoscope and tissues. A fraction of the robot’s total operational space was defined with the following dimensions: [x: 0.3769 m to 0.5969 m, y: −0.025 m to 0.245 m, z: 0.9757 m to 1.1547 m]. Additionally, the Remote Center of Motion (RCM) constraint was implemented virtually—a critical element in minimally invasive surgery—enabling surgical instruments to pivot appropriately around the insertion point without causing displacements that could compromise patient safety [35,36,37].
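To make these safety constraints concrete, the minimal Python sketch below checks whether a commanded laparoscope position stays inside the workspace box quoted above and measures how far the instrument axis drifts from the trocar fulcrum (an RCM-violation metric). The function names and example points are illustrative and are not part of the authors' implementation.

```python
import numpy as np

# Workspace box quoted in Section 2.1 (metres).
WORKSPACE = {"x": (0.3769, 0.5969), "y": (-0.025, 0.245), "z": (0.9757, 1.1547)}

def inside_workspace(p, box=WORKSPACE):
    """Return True if the Cartesian point p = (x, y, z) lies inside the safety box."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(p, box.values()))

def rcm_error(tool_origin, tool_tip, fulcrum):
    """Perpendicular distance from the trocar fulcrum to the laparoscope axis (ideally ~0 m)."""
    o, t, f = map(np.asarray, (tool_origin, tool_tip, fulcrum))
    axis = t - o
    axis = axis / np.linalg.norm(axis)
    return np.linalg.norm((f - o) - np.dot(f - o, axis) * axis)

print(inside_workspace((0.48, 0.10, 1.05)))                                    # True: inside the box
print(rcm_error((0.48, 0.10, 1.20), (0.48, 0.10, 0.95), (0.48, 0.10, 1.10)))   # ~0.0 for a collinear case
```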
This methodology ensures the validity of the simulation environment and facilitates the transfer of skills from this setting to real-world surgical practice, as demonstrated in prior autonomous endoscopic positioning systems using CoppeliaSim [38,39]. Thus, the developed environment enables the evaluation of control strategies within a context representative of clinical reality, preparing the system to address surgical scenarios with enhanced precision and safety.

2.1.1. Hardware and Software Tools

The study utilizes MatLab® R2024a [40] and CoppeliaSim Edu 4.8.0 for controller design, implementation, and validation, integrated via the ZeroMQ Remote API with simulation stepping enabled (sim.setStepping(true)) for seamless synchronization. Surgical tool detection leverages the YOLO11n architecture [41], trained in Python 3.9.21, with labeling performed using Roboflow [42]. Simulations were conducted on a high-performance computational platform, with specifications detailed in the Supplementary Materials.
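The controllers were implemented in MatLab®, but the same synchronization pattern can be sketched with the publicly documented Python client of the CoppeliaSim ZeroMQ Remote API; the package and call names below follow that client, and the step count is illustrative.

```python
# Minimal synchronous-stepping loop with the CoppeliaSim ZeroMQ Remote API
# (Python client; the article uses the equivalent MATLAB bindings).
from coppeliasim_zmqremoteapi_client import RemoteAPIClient

client = RemoteAPIClient()      # connects to a running CoppeliaSim instance
sim = client.require('sim')

sim.setStepping(True)           # the client triggers every simulation step
sim.startSimulation()
try:
    for _ in range(1000):       # 1000 steps of Ts = 0.001 s, i.e., 1 s of simulated time
        # ... read joint states, run the NVAC update, apply torques ...
        sim.step()              # advance the physics engine by one step
finally:
    sim.stopSimulation()
```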

2.1.2. Dataset

The CholecSeg8k dataset [43], comprising 8080 annotated laparoscopic cholecystectomy images (854 × 480 pixels, PNG format), is derived from 17 videos performed by 13 different surgeons (Figure 3). This dataset, an extension of Cholec80 [44], provides extensive diversity in surgical techniques and anatomical variability, while also presenting challenging conditions such as low illumination, blurriness, opacity, smoke presence, and a range of surgical instruments varying in size and depth. This richness of scenarios makes the dataset particularly suitable for our study, as its variability enhances the robustness and generalization capability of the trained detection model.
Since the images from the dataset were loaded as textures in the simulation environment, their resolution was adjusted to 270 × 270 pixels—a process that could invalidate prior labeling. Furthermore, as a strategy to maintain precise focus of the FOV on the operational region, only the tip of the surgical tool was labeled as “tool” in Roboflow, aiming to minimize errors or deviations in the FOV. The labeled images were applied as textures onto a disk-shaped structure in CoppeliaSim and captured via MatLab® to generate the training dataset used in the study. For these reasons, broadly speaking, any dataset employed would have required relabeling to align with the experiment’s specific criteria.

2.2. Description of the System Under Study

The UR5 collaborative robot, developed by Universal Robots, was selected to validate the control strategy in robot-assisted surgery due to its versatility and suitability for both industrial and clinical environments. Its key features include a 6 Degrees of Freedom (DoF) configuration for adaptable positioning and seamless integration with external devices, supported by technical specifications such as a 5 kg payload, 850 mm operational radius, and ±0.1 mm positioning accuracy [45]. Figure 4 presents a schematic of the UR5 robot. Detailed modeling of its kinematics and dynamics, including coordinate systems, dimensional parameters, Denavit-Hartenberg (D-H) parameters, and dynamic specifications, as well as the implementation of Remote Center of Motion constraints, are provided in the Supplementary Materials.

2.3. Design of Control Strategy

This section introduces the Neuro-Visual Adaptive Control, an innovative hybrid control architecture designed for the precise guidance of camera-holding robots in laparoscopic surgery. NVAC synergistically integrates advanced visual information processing via the YOLO11n model with adaptive control techniques and neural networks. This combination enables the system to efficiently interpret the complex surgical environment in real time, adapt to anatomical variations and unforeseen disturbances, and maintain optimal camera positioning throughout the procedure. The proposed architecture is conceived to overcome the limitations of conventional approaches, offering enhanced robustness, precision, and responsiveness in challenging environments.
YOLO, first introduced by Redmon et al. in 2015, is renowned for its efficiency in object detection. Developed by Ultralytics under the AGPL-3.0 license, YOLO11 enhances performance across various CV tasks. The YOLO11n variant, utilized in this study, balances high inference speed (9.5 ms) and accuracy with a reduced parameter count, motivating its selection for real-time surgical tool tracking. Figure 5 illustrates the YOLO11 architecture based on [46]. This method generates a continuous sequence of reference points from the centroids of detected bounding boxes, updated periodically through image acquisition and processing, enabling the generation of precise tracking trajectories.
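As an illustration of this reference-point generation step, the sketch below runs a trained YOLO11n model through the Ultralytics API and returns the bounding-box centroid of the most confident detection; the weight-file name and helper function are hypothetical.

```python
# Sketch: obtaining a visual reference point from YOLO11n detections.
from ultralytics import YOLO
import numpy as np

model = YOLO("tool_tip.pt")                     # YOLO11n weights trained on the single 'tool' class (illustrative name)

def tool_tip_centroid(frame_bgr):
    """Return the (u, v) pixel centroid of the highest-confidence detection, or None."""
    result = model(frame_bgr, verbose=False)[0]
    if len(result.boxes) == 0:
        return None
    best = int(np.argmax(result.boxes.conf.cpu().numpy()))
    x1, y1, x2, y2 = result.boxes.xyxy[best].cpu().numpy()
    return (0.5 * (x1 + x2), 0.5 * (y1 + y2))   # bounding-box centroid used as the tracking reference
```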
The proposed low-level control consists of a Model Reference Adaptive Control scheme with Proportional–Derivative (PD) terms (MRAC-PD), leveraging its ability to manage nonlinear dynamics without requiring an exact mathematical model of the system. As demonstrated in prior studies [47,48], this configuration maintains accuracy in the presence of parametric uncertainties. The PD component enhances response speed and reduces overshoot, which is critical when manipulating delicate anatomical structures. Unlike conventional controllers such as PID, whose performance degrades significantly under nonlinear conditions, or Sliding Mode Control (SMC), which introduces high-frequency oscillations detrimental to surgical instrumentation, MRAC-PD generates smooth control signals and sustains adaptive robustness.
Additionally, a data-driven robot modeling approach is incorporated—a key strategy in surgical robotics, where analytical models are limited by the mechanical complexity of these systems, dynamic interactions during surgery, and the inherent variability of the surgical environment. The use of data-driven models enhances trajectory tracking accuracy, increases system robustness to sensor failures, and facilitates the integration of kinematic constraints. Furthermore, it enables adaptation to changing operational conditions, rejection of external disturbances, and management of parametric uncertainties. For implementation in real-time control systems, it is essential that such models are computationally efficient, ensuring low latency and optimal dynamic response.
In this context, the architecture of Closed-Form Continuous-Time Neural Networks (CfC) emerges as an ideal solution. Hasani et al.’s Liquid Time-Constant Networks (LTCs) [49] introduced adaptive “liquid” time constants for flexible representation; however, their reliance on Ordinary Differential Equation (ODE) solvers incurs a high computational cost. Closed-Form Continuous-Time Neural Networks (CfC) [47] overcome this limitation by providing a closed-form solution, reducing computational overhead without sacrificing accuracy, thus enabling real-time execution on hardware with limited resources.
The integration of CfC with mixed-memory architectures, such as Long Short-Term Memory (CfC-mmRNN) (Figure 6), effectively mitigates the vanishing gradient problem, which is critical for modeling complex, long-term temporal dependencies. This capability is essential for anticipating tissue movements and proactively responding to environmental changes. Experimental results reported by the authors [47] demonstrate that the CfC-mmRNN architecture achieves superior performance in terms of Mean Squared Error (MSE) when modeling physical dynamics, significantly outperforming architectures such as LTCs and various LSTM variants, as well as achieving an MSE approximately 18% lower than the Transformer architecture.
Even more notable is its computational efficiency, with inference times less than half those of Transformer and LTC architectures, making it an optimal choice for applications requiring real-time processing of temporal sequences with irregularly sampled data. Furthermore, its design—based on a shared backbone that branches into specialized functions (f, g, and h)—facilitates the capture of temporal and structural relationships without compromising the model’s expressiveness.
The integration of MRAC-PD with CfC-mmRNN directly addresses the critical requirements of robot-assisted surgery. While MRAC-PD ensures stability, precise tracking, and robustness, CfC-mmRNN enables the modeling of complex nonlinear dynamics without imposing a computational burden that would limit its clinical applicability. This approach optimizes millimeter-level precision, real-time adaptation, and resilience to system disturbances and uncertainties.
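A minimal sketch of how such a CfC network with mixed LSTM memory might be instantiated is given below, assuming the open-source ncps package that accompanies Hasani et al.'s work; the layer sizes and input layout (joint positions, velocities, and accelerations mapped to six torques) are illustrative rather than the authors' exact configuration.

```python
# Sketch of a CfC inverse-dynamics model with mixed (LSTM) memory.
import torch
from ncps.torch import CfC

model = CfC(
    input_size=18,        # q, q_dot, q_ddot for a 6-DoF manipulator (assumed input layout)
    units=64,             # hidden CfC units (illustrative size)
    proj_size=6,          # predicted torque vector tau_CfC
    mixed_memory=True,    # wrap the CfC cell with an LSTM memory -> CfC-mmRNN
    batch_first=True,
)

x = torch.randn(8, 200, 18)     # batch of 8 trajectories, 200 samples each
tau_pred, _ = model(x)          # (8, 200, 6) predicted torque sequence
print(tau_pred.shape)
```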

2.3.1. Design of the Low-Level MRAC-PD-CfC-mmRNN Adaptive Controller

This subsection provides a detailed overview of the low-level MRAC-PD-CfC-mmRNN controller, which serves as the core of the NVAC and is responsible for regulating the robotic manipulator’s dynamic response. The discrete-time dynamics of an n-DoF manipulator—outlined in Supplementary Material, Section 4—are modeled as follows:
$$M(q[k])\,\frac{q[k+1] - 2q[k] + q[k-1]}{T_s^{2}} + C(q[k], \dot{q}[k])\,\dot{q}[k] + G(q[k]) = \tau[k] \qquad (1)$$
where
  • $q[k]$: joint position vector.
  • $\dot{q}[k] = (q[k] - q[k-1])/T_s$: joint velocity vector.
  • $\ddot{q}[k] = (q[k+1] - 2q[k] + q[k-1])/T_s^{2}$: joint acceleration vector.
  • $M(q[k])$: inertia matrix.
  • $C(q[k], \dot{q}[k])\,\dot{q}[k]$: Coriolis and centrifugal force term.
  • $G(q[k])$: gravitational force vector.
  • $\tau[k]$: control torque vector.
  • $T_s$: sampling period.
The controller formulation defines position and velocity errors as follows:
$$e[k] = q_d[k] - q[k] \qquad (2)$$
$$\dot{e}[k] = \frac{e[k] - e[k-1]}{T_s} = \dot{q}_d[k] - \dot{q}[k] \qquad (3)$$
where
  • $q_d[k]$: desired position at step k.
  • $\dot{q}_d[k]$: desired velocity at step k.
  • $q[k]$: current position of the robot.
The reference model, specified by Equations (4) and (5), incorporates adaptation gains $\gamma_1$ and $\gamma_2$ to adjust the reference position $q_r[k]$ and velocity $\dot{q}_r[k]$, facilitating rapid adaptation to system changes.
$$q_r[k] = q_d[k] + \gamma_1 e[k] \qquad (4)$$
$$\dot{q}_r[k] = \dot{q}_d[k] + \gamma_2 \dot{e}[k] \qquad (5)$$
where
  • $\gamma_1, \gamma_2$: positive diagonal gain matrices.
The adaptation law, given by Equation (6), dynamically updates the adaptive parameters $\Gamma[k]$ to minimize tracking errors between $q_r[k]$ and $q[k]$.
$$\Gamma[k+1] = \Gamma[k] + \gamma_1 (q_r[k] - q[k]) + \gamma_2 (\dot{q}_r[k] - \dot{q}[k]) \qquad (6)$$
where
  • $\Gamma[k] > 0$: vector of adaptive parameters, ensuring convergence of position and velocity errors while compensating for dynamic uncertainties.
The control law, defined in Equation (7), integrates the CfC-mmRNN-based inverse model $\tau_{CfC}[k]$ with PD and adaptive terms for robustness and stability:
$$\tau[k] = \tau_{CfC}[k] + K_p e[k] + K_d \dot{e}[k] + \Gamma[k] \qquad (7)$$
where
  • $\tau_{CfC}[k]$: torque predicted by the inverse model.
  • $K_p, K_d$: positive diagonal gain matrices.
  • $K_p e[k]$: proportional term that reduces the error.
  • $K_d \dot{e}[k]$: derivative term that improves dynamic stability.
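A compact sketch of one discrete control update implementing Equations (2)–(7) is shown below; the gain values are illustrative placeholders rather than the tuned parameters of Table 1, and the CfC-mmRNN prediction enters as an externally supplied torque.

```python
import numpy as np

Ts = 0.001                                   # sampling period Ts = 0.001 s (Section 2.3.3)

def mrac_pd_step(qd, qd_dot, q, q_prev, Gamma, Kp, Kd, g1, g2, tau_cfc):
    """One MRAC-PD update implementing Equations (2)-(7).

    qd, qd_dot : desired joint position / velocity at step k
    q, q_prev  : measured joint positions at steps k and k-1
    Gamma      : adaptive parameter vector Gamma[k]
    Kp, Kd     : diagonal PD gain matrices; g1, g2: adaptation gain matrices
    tau_cfc    : torque predicted by the CfC-mmRNN inverse model at step k
    """
    q_dot = (q - q_prev) / Ts                # finite-difference joint velocity
    e = qd - q                               # Eq. (2)
    e_dot = qd_dot - q_dot                   # Eq. (3)
    qr = qd + g1 @ e                         # Eq. (4) reference position
    qr_dot = qd_dot + g2 @ e_dot             # Eq. (5) reference velocity
    Gamma_next = Gamma + g1 @ (qr - q) + g2 @ (qr_dot - q_dot)   # Eq. (6)
    tau = tau_cfc + Kp @ e + Kd @ e_dot + Gamma                  # Eq. (7)
    return tau, Gamma_next

# Illustrative 6-DoF gains (not the tuned values from Table 1).
n = 6
Kp, Kd = np.diag([80.0] * n), np.diag([12.0] * n)
g1, g2 = np.diag([0.5] * n), np.diag([0.1] * n)
```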

2.3.2. Stability Analysis

A Lyapunov candidate function is proposed in Equation (8) to assess stability:
$$V[k] = \tfrac{1}{2}\, e[k]^{T} K_p e[k] + \tfrac{1}{2}\, \dot{e}[k]^{T} M(q[k])\, \dot{e}[k] + \tfrac{1}{2}\, \tilde{\Gamma}[k]^{T} \Pi^{-1} \tilde{\Gamma}[k] \qquad (8)$$
where
  • $\tilde{\Gamma}[k] = \Gamma[k] - \Gamma^{*}$: adaptive parameter error.
  • $\Pi^{-1}$: inverse adaptive gain matrix.
The Lyapunov difference is computed as
$$\Delta V[k] = V[k+1] - V[k] \qquad (9)$$
Substituting the adaptation law (6) and control law (7) into the dynamics (1) yields
$$\Delta V[k] \le -T_s\, \dot{e}[k]^{T} K_d \dot{e}[k] - T_s\, \dot{e}[k]^{T} K_p e[k] + T_s\, \dot{e}[k]^{T} \tilde{\tau}[k] \qquad (10)$$
where
  • $\tau^{*}[k]$: actual torque required.
  • $\tilde{\tau}[k] = \tau^{*}[k] - \tau_{CfC}[k]$: modeling error, representing the discrepancy between the learned model and the actual system dynamics.
Stability conditions:
(a) The controller gains must satisfy the following:
  • $K_p > 0$: ensures that the proportional term contributes to system stability.
  • $K_d > T_s K_p / 2$: guarantees adequate damping in the discrete-time system.
(b) Adaptation parameters:
  • $\gamma_1, \gamma_2 < 2/T_s$: restricts adaptation gains to prevent excessive oscillations.
(c) The sampling interval must be sufficiently small to satisfy the conditions specified in Equations (11) and (12):
$$\left\| I + T_s M^{-1}(q[k]) K_d \right\| < 1 \qquad (11)$$
$$\left\| M^{-1}(q[k]) \right\| < L \qquad (12)$$
where
  • $L$: a positive constant bounding the inverse of the inertia matrix.
(d) Modeling error:
  • $\|\tilde{\tau}[k]\| \le \xi$, where $\xi > 0$ is an upper bound.
These conditions ensure the asymptotic stability of the system, guaranteeing precise tracking and bounded adaptive parameters:
  • Asymptotic convergence of position error: $e[k] \to 0$ as $k \to \infty$.
  • Velocity error convergence: $\dot{e}[k] \to 0$ as $k \to \infty$.
  • Bounded adaptive parameters: $\|\Gamma[k]\| < \infty$.
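The scalar conditions (a) and (b) can be checked numerically before deployment, as in the sketch below; condition (c) additionally requires the inertia bound L and is therefore not evaluated here. The gains are illustrative.

```python
import numpy as np

def check_gain_conditions(Kp, Kd, g1, g2, Ts=0.001):
    """Check stability conditions (a) and (b) for diagonal gain matrices."""
    cond_a1 = np.all(np.diag(Kp) > 0)                       # Kp > 0
    cond_a2 = np.all(np.diag(Kd) > Ts * np.diag(Kp) / 2)    # Kd > Ts*Kp/2
    cond_b = np.all(np.diag(g1) < 2 / Ts) and np.all(np.diag(g2) < 2 / Ts)
    return cond_a1 and cond_a2 and cond_b

Kp, Kd = np.diag([80.0] * 6), np.diag([12.0] * 6)
g1, g2 = np.diag([0.5] * 6), np.diag([0.1] * 6)
print(check_gain_conditions(Kp, Kd, g1, g2))   # True for these illustrative gains
```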

2.3.3. Trajectory Generation

To select the optimal solution for trajectory generation, a methodology comprising three main components was followed:
  • Constraints: after defining the workspace, the RCM constraint was implemented, ensuring that all movements maintain a fixed fulcrum point.
  • Trajectory generation: smooth and continuous trajectories were generated using the minimum-jerk method between target points, considering maximum velocity, acceleration, and sampling step through interpolation to achieve seamless transitions.
  • Optimal configuration selection: eight inverse kinematics solutions were generated per target position, and the best was selected based on three criteria:
    • Position constraint verification: ensures the end-effector remains within the safe operational volume.
    • Joint limit verification: eliminates configurations exceeding safe joint ranges, preventing collisions or singularities.
    • Minimum configuration distance: prioritizes the solution with the smallest displacement from the current configuration for more efficient movements.
Table 1 lists the control strategy design parameters. The sampling period is Ts = 0.001 s, and Figure 7 depicts the NVAC scheme with the MRAC-PD-CfC-mmRNN low-level controller.
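As an illustration of the minimum-jerk interpolation used between target points, the following sketch generates a smooth Cartesian transition sampled at Ts = 0.001 s; the boundary points are arbitrary values inside the restricted workspace and do not come from the article.

```python
import numpy as np

def minimum_jerk(p0, pf, T, Ts=0.001):
    """Minimum-jerk interpolation between two Cartesian points p0 and pf over T seconds."""
    t = np.linspace(0.0, T, int(round(T / Ts)) + 1)
    s = t / T
    # 5th-order minimum-jerk profile: zero velocity and acceleration at both ends.
    blend = 10 * s**3 - 15 * s**4 + 6 * s**5
    return np.asarray(p0) + np.outer(blend, np.asarray(pf) - np.asarray(p0))

# Example: 2 s transition between two points inside the restricted workspace.
path = minimum_jerk([0.45, 0.05, 1.05], [0.50, 0.10, 1.02], T=2.0)
print(path.shape)   # (2001, 3) Cartesian samples at Ts = 0.001 s
```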

2.4. Neural Network Training

The success of the training process heavily depends on the quantity and quality of input data. In this study, a gray-box approach was adopted, utilizing bounded input signals within a clinically relevant workspace to characterize the robot’s dynamics.
To train the inverse model, trajectories were generated within the defined workspace. A Monte Carlo method was employed to identify feasible configurations and exhaustively explore possible trajectories. Notably, training trajectories were intentionally constrained to an “effective space” to establish an operational safety margin, preventing collisions with tissues in the event of transient control errors.
Random configurations were generated within joint limits, and forward kinematics was used to determine the positions and orientations of the end-effector. Each trajectory was designed at varying sampling frequencies (200 Hz, 250 Hz, 400 Hz, and 500 Hz) and comprised a maximum of 10 segments per trajectory. These segments were constructed by connecting selected points according to two possible configurations: straight and/or circular segments. Figure 8a depicts the general workspace of the assistant robot, accounting for the laparoscope’s length, while Figure 8b illustrates the restricted working area used for generating training trajectories.
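A minimal sketch of this Monte Carlo sampling step is given below; the forward-kinematics routine and the workspace test are assumed to be supplied by the caller, and the ±360° joint range follows the UR5 datasheet.

```python
import numpy as np

# UR5 joint limits (rad): +/- 2*pi for all six joints (manufacturer specification).
Q_MIN, Q_MAX = -2 * np.pi * np.ones(6), 2 * np.pi * np.ones(6)

def sample_configurations(n_samples, fk, inside_workspace, rng=np.random.default_rng(0)):
    """Monte Carlo sampling of joint configurations whose end-effector lies in the
    restricted training workspace. `fk` maps a joint vector to an (x, y, z) position
    and `inside_workspace` implements the safety box of Section 2.1 (both supplied
    by the caller)."""
    kept = []
    while len(kept) < n_samples:
        q = rng.uniform(Q_MIN, Q_MAX)
        if inside_workspace(fk(q)):
            kept.append(q)
    return np.vstack(kept)
```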
Figure 9 presents examples of these trajectories within the restricted workspace.
Table 2 specifies the training parameters for the CfC-mmRNN (system identification) and YOLO11n (tool tip detection) neural networks.

2.5. Performance Metrics

The detection performance of the YOLO11n model is evaluated using recall (R), defined in (13), precision (P), defined in (14), and the mean average precision (mAP), which represents the area under the precision–recall curve and is defined in (16) [29].
$$R = \frac{TP}{TP + FN} \qquad (13)$$
$$P = \frac{TP}{TP + FP} \qquad (14)$$
where
  • TP (true positive): correct detections of the tool tip as the region of interest.
  • FP (false positive): incorrect identifications of a region as the tool tip.
  • FN (false negative): missed detections of the tool tip despite its presence.
$$AP = \int_{0}^{1} P(R)\, dR \qquad (15)$$
$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i \qquad (16)$$
where
  • N: the total number of classes in the test set.
  • i: represents class i.
  • $AP_i$: the average precision for class i.
The $AP_{0.5}$ refers to the average precision (AP) when the Intersection over Union (IoU) is set to 0.5, enabling measurement of the model’s accuracy when a detection is considered correct with 50% overlap between the prediction and the actual tool. The $AP_{0.5:0.95}$ represents the AP across varying IoU thresholds (from 0.5 to 0.95), allowing evaluation of the detection model’s performance at different levels of overlap [29].
For the training of the CfC-mmRNN neural network, the Mean Squared Error (MSE) loss function is employed, defined as
$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \qquad (17)$$
where
  • n: number of dataset examples.
  • $y_i$: actual value for the i-th example.
  • $\hat{y}_i$: predicted value for the i-th example.
Three performance indices (PIs) quantitatively assess controller performance, each capturing distinct response characteristics:
  • Mean Absolute Error (MAE): Robust against outliers and disturbances, MAE offers a balanced measure of average error, making it well suited for dynamic, unpredictable surgical environments:
    $$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| e_i \right| \qquad (18)$$
  • Integral of Time-Weighted Absolute Error (ITAE): By assigning greater weight to persistent errors over time, ITAE is essential for evaluating long-term stability and precision in surgical tracking:
    $$ITAE = \int_{0}^{T} t \left| e(t) \right| dt \qquad (19)$$
  • Integral of Time-Weighted Squared Error (ITSE): Combining temporal weighting with quadratic error penalization, ITSE effectively identifies large, late-stage errors and assesses robustness against disturbances:
    $$ITSE = \int_{0}^{T} t\, e^{2}(t)\, dt \qquad (20)$$
These metrics, as detailed in [48,49,50,51,52,53], provide a comprehensive evaluation of the system’s performance under surgical conditions.
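For reference, the three performance indices above can be approximated from a sampled error signal as in the following sketch, using rectangular integration at the controller's sampling period.

```python
import numpy as np

def performance_indices(e, Ts=0.001):
    """Discrete approximations of the MAE, ITAE, and ITSE indices for an error signal e[k]."""
    e = np.asarray(e, dtype=float)
    t = np.arange(e.size) * Ts
    mae = np.mean(np.abs(e))              # Eq. (18)
    itae = np.sum(t * np.abs(e)) * Ts     # Eq. (19), rectangular integration
    itse = np.sum(t * e**2) * Ts          # Eq. (20)
    return mae, itae, itse
```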

3. Results

This section presents the experimental results validating the proposed Neuro-Visual Adaptive Control architecture for the semi-autonomous operation of a camera-holding robot in surgical environments. Overall, the NVAC controller demonstrated exceptional performance across all evaluation metrics, achieving tracking errors of 1.80 × 10⁻³ m, 1.08 × 10⁻⁴ m, and 1.90 × 10⁻³ m along the x, y, and z axes, respectively, in complex scenarios with significant dynamic disturbances. The YOLO11n-based visual processing system attained a precision of 97.30% and a recall of 94.80% in surgical instrument detection, while the CfC-mmRNN neural network provided accurate system identification with a Mean Squared Error (MSE) of 8.40 × 10⁻³.
Comparative analysis with conventional control techniques (PID, SMC, and standard MRAC-PD) confirmed the superior performance of NVAC in trajectory tracking, disturbance rejection, and control signal smoothness. The system maintained robust stability even under challenging conditions, including simulated sensor failures and tissue collisions. These findings underscore the potential of NVAC to significantly enhance precision and safety in robot-assisted surgical applications. The following subsections provide a detailed analysis of these results concerning surgical instrument localization, system identification, and control performance across various simulated surgical scenarios.

3.1. Results of Surgical Tool Localization

To train the YOLO11n model for surgical tool tip detection, a total of 2561 images from the CholecSeg8k dataset were employed, divided into 80% for training (2364 images), 10% for validation (99 images), and 10% for testing (98 images), each resized to 640 × 640 pixels. Training outcomes, illustrated in Figure 10, exhibit stable and consistent convergence of performance metrics, indicating robust model behavior.
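A hedged sketch of such a training run with the Ultralytics API is shown below; the dataset path, epoch count, and batch size are illustrative, with the actual hyperparameters listed in Table 2.

```python
# Sketch of the YOLO11n training run, assuming the Ultralytics API and a Roboflow-style
# data.yaml describing the single 'tool' class (file names are illustrative).
from ultralytics import YOLO

model = YOLO("yolo11n.pt")                      # pretrained nano backbone
model.train(
    data="cholecseg8k_tooltip/data.yaml",       # 80/10/10 train/val/test split
    imgsz=640,                                  # images resized to 640 x 640
    epochs=100,                                 # illustrative; see Table 2 for the actual values
    batch=16,
)
metrics = model.val()                           # precision, recall, mAP@0.5, mAP@0.5:0.95
```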
The YOLO11n model achieved a precision of 97.30%, recall of 94.80%, an $mAP_{0.5}$ of 97.5%, and an $mAP_{0.5:0.95}$ of 72.7% in detecting surgical instruments. These outcomes are significant given the visual complexity and variability of instruments in surgical settings, highlighting the model’s reliability in minimizing false positives that could compromise procedural safety. The elevated recall underscores its ability to rarely miss detectable instruments, a critical factor for continuous tracking during surgery. These results confirm the model’s effectiveness for real-world surgical scenarios and its appropriateness for autonomous tool tracking tasks.
Figure 11a showcases representative examples of tool tip detection in the test dataset, demonstrating high precision in bounding box placement. Figure 11b depicts the localization of bounding box centroids, calculated in MatLab® from CoppeliaSim data, providing reliable spatial references for robotic control.

3.2. Results of System Identification

System identification utilized the CfC-mmRNN neural network, chosen for its robustness against the vanishing gradient problem in irregularly sampled data, effectively capturing long-term dependencies. A total of 80 trajectories at varying frequencies were employed, sampled at Ts = 0.001 s.
Training the inverse dynamics model yielded an MSE of 8.4 × 10⁻³, showcasing high precision in predicting the dynamic behavior of the UR5 robot despite its inherent nonlinearities. Figure 12 compares predicted torques with actual values for a test trajectory, revealing strong alignment with minimal deviations, even in regions of significant dynamic variability. This torque curve analysis underscores the model’s capability to accurately capture both overarching system dynamics and subtle variations, with minor discrepancies at torque peaks exerting negligible impact on control performance, as confirmed by subsequent tests.
Table 3 outlines the computational costs of the trained neural networks, confirming their suitability for real-time implementation on resource-constrained hardware.
The YOLO11x model, trained with parameters identical to those of YOLO11n, attained a precision of 98% and a recall of 93.5%, closely aligning with YOLO11n’s performance. YOLO11n was selected due to its substantially lower computational cost, enhancing its applicability for real-time surgical operations.

3.3. Simulation Results

Experiments evaluated the robot’s capability to track millimetric trajectories within the abdominal workspace using the NVAC strategy. Reference trajectories were derived from interpolated points captured during instrument tracking at various time instants, ensuring realistic motion profiles. To enhance safety, the robot’s torque was restricted according to manufacturer specifications.

3.3.1. Experimental Tests

Four progressively challenging test scenarios were devised to assess the NVAC’s performance, extracting key visual points for dynamic camera path planning:
  • Scenario 1: Visual data provided by YOLO11n detects the surgical instrument, generating reference points for a tracking path with moderate geometric transitions.
  • Scenario 2: To simulate estimation errors or sensor signal transmission failures, a controlled disturbance was introduced in joint 6 of the manipulator. Specifically, at the 9 s mark of the simulation, a sudden displacement of 0.06 radians was applied, lasting 0.2 s. This abrupt alteration in the reference mimics an unexpected trajectory deviation, enabling evaluation of the controller’s ability to detect discrepancies and recover stability in real time.
  • Scenario 3: The system’s robustness is evaluated against an abrupt perturbation simulating a 15 N collision with elements of the surgical environment.
  • Scenario 4: Multiple perturbations are integrated in a challenging scenario: an initial sensor signal failure at the 9 s mark, followed by a 15 N collision at the 10 s mark, assessing real-time path adjustment and tracking accuracy.
The performance of the NVAC was compared against three established controllers—PID, conventional MRAC-PD (with identical tuning parameters), and SMC—using quantitative performance indices.
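To illustrate how the Scenario 3 and 4 collision impulses can be injected in CoppeliaSim, the sketch below applies a 15 N force to the robot's distal link through the ZeroMQ Remote API; the object path, force direction, and timing are illustrative, and the `sim` handle is the one obtained in Section 2.1.1.

```python
# Sketch of the collision disturbance: a 15 N force impulse applied to the last link.
link = sim.getObject('/UR5/link6_respondable')   # hypothetical object path in the scene
t0, duration = 10.0, 0.2                         # impulse start time and length (s), per Scenario 4

def apply_collision_force(t_sim):
    """Call once per simulation step; the force is re-applied during the impulse window."""
    if t0 <= t_sim < t0 + duration:
        # 15 N along the negative y-axis of the world frame, no torque.
        sim.addForceAndTorque(link, [0.0, -15.0, 0.0], [0.0, 0.0, 0.0])
```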

3.3.2. Trajectory Tracking

Figure 13 showcases the performance of the implemented controllers in tracking the generated trajectories. Figure 13a presents results for Scenario 1, employing a higher density of control points to enhance tracking precision and assess controller behavior under substantial dynamic demands. Figure 13b illustrates performance in Scenario 2, incorporating a perturbation represented by a sudden displacement—a reference value significantly offset from the current workspace—simulating abrupt motion due to estimation errors or sensor signal failures.
For detailed performance analysis, Figure 14, Figure 15 and Figure 16 present tracking results in Cartesian space (x, y, and z axes) for trajectories 1 and 2, with the first and second columns corresponding to each trajectory, respectively.
Analysis of trajectory tracking highlights the challenges of achieving millimetric precision within the surgical Cartesian space. The proposed NVAC controller demonstrates shorter settling times and superior precision, particularly in regions with steep slopes and curvature. Conversely, the conventional MRAC-PD controller exhibits closely comparable performance but with slightly extended settling times and increased errors. The SMC controller displays intermediate performance, with evident deviations in curved segments and direction changes. The PID controller shows the greatest deviations from the reference, markedly affected by gravitational effects alongside SMC. A comparative analysis of the four controllers confirms that the NVAC delivers the best tracking performance, consistently maintaining close proximity to the reference from the outset.
Table 4 provides performance indices for trajectory 1. Table 5 provides performance indices for trajectory 2, which includes perturbation, facilitating a detailed comparison of controller performance.
Similar to the images, Table 4 confirms the superior performance of the proposed controller compared to the other controllers, with behavior akin to that of the MRAC-PD. The proposed controller consistently achieved the lowest performance index (PI) values across most metrics. MAE ranges from 10⁻⁵ to 10⁻⁶, indicating exceptional robustness and precision in tracking. ITAE, generally on the order of 10⁻³ for most joints, reflects rapid error convergence. ITSE, ranging from 10⁻⁶ to 10⁻⁹, confirms minimal time-weighted errors. The PI values for Scenario 2, in the presence of a sensor failure disturbance, highlight greater differences among the controllers, with the proposed controller demonstrating a marked improvement over the others.
A critical attribute of control systems is their capacity to reject external disturbances, which is paramount in surgery where precision and stability are essential. In addition to the perturbation in Scenario 2, a further 15 N force impulse was applied to simulate a collision, evaluating performance under extreme conditions. Figure 17 illustrates tracking performance under these disturbances, with Figure 17a representing Scenario 3 and Figure 17b representing Scenario 4.
Analysis of the controllers’ responses to perturbations reveals distinct recovery capabilities. The proposed NVAC exhibits robust disturbance rejection, with minimal deviation from the desired path and swift recovery, closely resembling the behavior of conventional MRAC-PD. In contrast, SMC and PID demonstrate greater vulnerability, displaying pronounced deviations and prolonged recovery times. Scenario 4, the most realistic and demanding scenario, integrates an initial perturbation simulating abrupt motion due to estimation errors or sensor failures with a subsequent 15 N collision. Table 6 and Table 7 quantitatively compare controller performance for Scenarios 3 and 4, respectively, using performance indices.
Performance index analysis, as described in the methodology, validates the NVAC’s effectiveness and exceptional robustness. These findings confirm superior tracking precision, efficient reduction of persistent errors, and a remarkable capacity to mitigate large errors under substantial perturbations. Figure 18 and Figure 19 illustrate tracking errors for the designed controllers in Scenarios 3 and 4, respectively.
For Scenario 3, the proposed controller achieved maximum errors of 1.80 × 10⁻³ m on the x- and z-axes and 6.11 × 10⁻⁵ m on the y-axis, compared to MRAC-PD’s 1.70 × 10⁻³ m, 6.92 × 10⁻⁵ m, and 1.80 × 10⁻³ m for the x-, y-, and z-axes, respectively. In Scenario 4, the proposed controller recorded maximum errors of 1.80 × 10⁻³ m, 1.08 × 10⁻⁴ m, and 1.90 × 10⁻³ m across the x-, y-, and z-axes, respectively, with MRAC-PD showing comparable values.
Despite these similarities, comparative torque analysis reveals distinct differences in perturbation response. The proposed NVAC demonstrates faster and more stable recovery with reduced torque oscillations, signifying enhanced disturbance rejection. Its control action is less aggressive than that of MRAC-PD, minimizing abrupt variations while preserving performance. These findings establish the NVAC as a more robust and precise solution, ideally suited for surgical environments where stability is paramount.

4. Discussion

The results obtained with the NVAC controller demonstrate exceptional performance in terms of surgical precision. In the undisturbed scenario, the maximum errors achieved are submillimeter: 4.87 × 10⁻⁵ m (x-axis), 1.22 × 10⁻⁵ m (y-axis), and 1.81 × 10⁻⁴ m (z-axis). In the most complex scenario, tracking errors increase to 1.80 × 10⁻³ m (x-axis), 1.08 × 10⁻⁴ m (y-axis), and 1.90 × 10⁻³ m (z-axis), remaining below the clinically acceptable threshold of 2 mm for robotic surgery, as established in the consulted literature [18,24].
Compared to previous studies, such as Wang et al., who reported Mean Absolute Errors of 1.49 × 10⁻³ m without disturbances and 1.98 × 10⁻³ m under external perturbations, our approach maintains lower errors even under more demanding conditions. Additionally, the NVAC’s ability to recover stability after disturbances is significantly superior. While the controller proposed by Ren et al. [17] requires approximately 2 s to stabilize following disturbances causing angular errors below 1°, the proposed controller achieves recovery times under 0.7 s against 15 N forces—well above typical forces in surgical procedures (e.g., surgical camera: 0.2 N; soft tissue penetration (e.g., liver): 0.08 N; soft tissue resection (e.g., liver): 0.9 N; suturing on soft tissue (e.g., skin): 1.7 N; suturing with knot-tying: 8 N; liver and gallbladder retraction: 6.6 N; general surgical manipulation: 0.5–8 N) [54,55].
This robustness is particularly valuable in surgical environments where involuntary patient movements, accidental collisions, or interactions with denser tissues could introduce unforeseen disturbances. Although a 15 N perturbation exceeds forces commonly encountered in laparoscopic surgery, this test validates the system’s ability to maintain stability in critical scenarios, ensuring patient safety under adverse conditions.
The simulated disturbance in Scenario 2, introducing a sudden displacement of 0.06 radians (approximately 3.4°) for 0.2 s, represents a severe sensor failure compared to values reported in the literature. For instance, relative to Ren et al. [17], the NVAC controller demonstrates the ability to manage a larger angular error with a substantially shorter recovery time.

4.1. Study Limitations

While the obtained results are promising, several limitations of this study must be acknowledged. The validation of the NVAC controller was conducted exclusively in simulated environments. Although CoppeliaSim provides a robust platform for modeling complex dynamics, differences exist between simulations and real surgical scenarios, particularly regarding anatomical variability, tissue behavior, fluctuating lighting conditions, and the presence of fluids. For broader acceptance within the medical community, demonstrating results on physical hardware, even in a laboratory setting, would be ideal.
Validation with real cholecystectomy images marks a significant advancement, but comparison with larger clinical datasets could strengthen the model’s applicability. Furthermore, although the CholecSeg8k dataset is extensive, it represents a limited subset of potential variations in surgical instruments and visual conditions. The system’s ability to detect instruments not represented in the training set may be compromised. While YOLO11n proved efficient for real-time surgical instrument detection, benchmarking against alternatives such as Mask R-CNN or Swin Transformer could confirm its optimality. Additionally, the performance comparison of the proposed controller could be enriched by including modern deep learning-based control approaches.

4.2. Feasibility and Challenges of Transitioning from Simulation to Real-World Clinical Trials

The NVAC system holds potential to enhance visualization in various robotic surgery procedures, such as laparoscopic cholecystectomy, colorectal resection, Nissen fundoplication, and radical prostatectomy. In these procedures, key challenges include frequent camera repositioning, assistant fatigue, difficulty maintaining stable vision in deep spaces, and coordination of multiple instruments.
NVAC addresses these issues through efficient instrument tracking, improved visual stability, submillimeter precision, and real-time adaptation to different anatomical regions. These advantages can enhance surgical safety and efficiency, supporting its potential for future clinical application. Effective transition from simulation-based findings to real-world clinical settings requires rigorous validation and progressive adaptation.

Considerations for Clinical Validation of the NVAC Controller

Developing a Neuro-Visual Adaptive Control system for robotic surgery entails not only validation in simulated environments but also evaluation of its applicability in real clinical contexts. The following sections address critical aspects that would demonstrate its feasibility in an operating room.
1. Enhancements in visual perception: The YOLO11n model delivers high success rates in detecting surgical tool tips, providing rapid computation and accurate reference point acquisition. Additional capabilities of the YOLO family, such as instance segmentation and pose estimation, establish a foundation for future advancements in surgical scene perception. However, despite its effectiveness in real-time detection, YOLO11n exhibits limitations in visual perception, including difficulties in detecting fine details and ensuring precise identification under partial occlusions. Advanced techniques could mitigate these limitations, enhancing precision in complex surgical scenes, although they require greater computational resources, which may be impractical in hardware-constrained operating rooms.
A hybrid approach is recommended, utilizing YOLO11n for rapid detection while supplementing it with semantic segmentation for scenarios demanding greater detail and precision. Figure 20 showcases YOLO11n’s performance on complex dataset images, demonstrating robust detection despite limited training data and default parameters. Performance enhancements are achievable by optimizing parameters such as the initial learning rate, optimizer, and loss function. Overall, the YOLO family provides a highly efficient solution for real-time object detection, widely implemented in medical patents.
2. Currently, various commercial platforms have incorporated robotic assistance technologies into laparoscopic procedures. However, these systems primarily rely on manual guidance or servo-control strategies with limited adaptability. In contrast, the proposed NVAC controller offers greater adaptability to intraoperative disturbances, reduced tracking errors, and enhanced stability of the FOV, thereby improving surgical efficiency. To assess its applicability in real-world surgery, integration with certified surgical platforms and verification of compatibility with endoscopic vision systems used in laparoscopy would be necessary. A comparative study with commercial platforms, evaluating precision, latency, and robustness under varying surgical conditions, would be critical to validate its performance and implementation potential.
Given that these systems operate in environments with resource-constrained hardware, optimization strategies have been explored in three key areas: visual processing, neural model inference, and low-level control. These enhancements would reduce computational latency and ensure execution on medical hardware without delays, facilitating adoption in real clinical settings.
  • Computational cost reduction strategies:
    Model compression: techniques such as quantization (reducing precision from FP32 to INT8) and pruning unnecessary connections reduce memory usage and accelerate inference without sacrificing precision.
    CfC networks: by eliminating the need for numerical ODE solvers, CfC networks substantially decrease computational overhead, enabling deployment on edge devices; recent studies validate their efficiency on resource-constrained microcontrollers.
  • Visual processing optimization:
    Lightweight YOLO variants: evaluating models like YOLO11n and YOLO-Tiny reduces the number of parameters and layers while maintaining detection accuracy.
    Inference acceleration: methods such as ONNX and TensorRT conversions optimize model execution, reducing latency and improving detection speed (a minimal export sketch follows this list).
  • Hardware implementation:
    Edge-AI frameworks: technologies such as TinyML and TinyEngine support efficient model execution on devices like Jetson Nano or low-power microcontrollers, ensuring real-time processing capabilities.
    Containerization: deployment in Docker or Singularity containers ensures portability and scalability, facilitating seamless integration across medical hardware [56,57,58].
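A minimal export sketch for the inference-acceleration step mentioned above is shown below, assuming the Ultralytics export API; the weight-file name is illustrative, and an INT8 TensorRT build would additionally require calibration data.

```python
# Sketch of export-based inference acceleration for the trained YOLO11n detector.
from ultralytics import YOLO

model = YOLO("tool_tip.pt")                            # trained YOLO11n weights (illustrative name)
model.export(format="onnx", imgsz=640)                 # portable ONNX graph for ONNX Runtime
model.export(format="engine", imgsz=640, half=True)    # TensorRT FP16 engine for NVIDIA GPUs (e.g., Jetson)
```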
3. Evaluation in high-fidelity simulation: to enhance the validity of the experiments, a simulation in CoppeliaSim could be conducted, incorporating tissue modeling that accounts for effects such as breathing, intra-abdominal pressure, variable lighting, and occlusions. This would challenge the robustness of the computer vision system and test the controller’s adaptability to unpredictable movements of surgical instruments.
4. Safety and ethical considerations: for future implementation in real surgery, it is critical to address human error management, including unexpected surgeon movements and potential visual detection failures, as well as system fault tolerance, by evaluating recovery mechanisms for sensory feedback errors. A key aspect is the design of intuitive human–robot interaction (HRI) interfaces that integrate haptic, visual, and auditory feedback, enabling early detection of failures or anomalous behavior. Similarly, predictive monitoring of critical parameters—such as force, velocity, and precision—helps reduce the risk of mechanical failures and tissue damage.
5. Compliance with international medical device regulations and ethical considerations is also essential. Partial or full autonomy of the system must remain under surgeon supervision, ensuring that critical decisions stay within human control. Moreover, transparency in algorithms and interpretability of their decisions are vital for building trust in clinical use. Equity in access to these technologies must be addressed to prevent widening healthcare disparities. Legal accountability for potential complications should be clearly defined among manufacturers, medical institutions, and surgeons, with established protocols for intervention in case of failures. This balance between robotic autonomy and human oversight will be pivotal for the acceptance and integration of these systems into surgical practice, ensuring both patient safety and ethical responsibility in their application.
6. Clinical validation plan:
To transition to a real clinical environment, a three-stage validation plan is proposed:
  • Enhanced simulated scenarios: tests in improved simulations modeling tissue dynamics and interactions more realistically, incorporating pre-recorded surgical videos.
  • Physical validation: validation using laparoscopic phantoms and pelvitrainers, enabling evaluation in a controlled environment.
  • Surgeon evaluation: assessment with surgeons using simulators such as Fundamentals of Laparoscopic Surgery (FLS), gathering feedback on the system’s utility in real procedures. The applicability of the NVAC controller could extend to other surgeon-support tasks involving surgical instruments. Performance evaluation with standardized exercises—such as object transfer, cutting precision, and intracorporeal suturing—using abdominal phantoms with realistic tissue properties will demonstrate its applicability.
The MRAC-PD-CfC-mmRNN system represents an advancement in robotic surgery automation, offering more precise and robust adaptive control. However, its applicability in real clinical settings requires further validation, optimization for certified hardware, and compliance with regulatory standards. Developing a testing protocol in collaboration with medical teams will be crucial for its effective implementation in surgical practice.

5. Conclusions

This study marks a significant advancement in hybrid control architectures for assisted robotic surgery. The proposed Neuro-Visual Adaptive Controller innovatively integrates Model Reference Adaptive Control with Proportional–Derivative terms, Closed-Form Continuous-Time Neural Networks, and the YOLO11n model, effectively overcoming the limitations in precision and adaptability inherent in conventional methods. Key findings include the following:
  • Exceptional surgical precision, with tracking errors of 1.80 × 10⁻³ m (x-axis), 1.08 × 10⁻⁴ m (y-axis), and 1.90 × 10⁻³ m (z-axis), considered negligible by clinical standards.
  • Robust rejection of external disturbances, ensuring operational stability and heightened safety within the surgical environment.
  • Seamless integration of computer vision achieves reliable detection of surgical instruments with a precision of 97.30%, a recall of 94.80%, an mAP50 of 97.5%, and an mAP50–95 of 72.7%, thereby optimizing real-time visual control.
  • Reduced risk of mechanical wear and tissue damage, supported by smooth control signals and stable dynamic responses.
Although the validation of the NVAC was conducted in simulated environments, the results obtained provide a robust foundation for its application in complex clinical scenarios. This work establishes a new benchmark for the integration of neural networks and adaptive control in robotic surgery, paving the way for transformative future applications. However, implementation in real surgery still requires validation on certified hardware, testing with surgeons, and compliance with medical regulations, which constitute the next phase of research toward clinical application.

6. Future Works

The findings of this study provide a robust foundation for advancing research in assisted robotic surgery. Nevertheless, several critical areas merit further investigation to fully realize the potential and clinical applicability of the proposed Neuro-Visual Adaptive Controller. The most significant future research directions include the following:
  • Enhanced validation in advanced simulated environments: strengthen validation through advanced simulations by incorporating extreme scenarios, such as severe disturbances, variable lighting conditions, and dynamic tissue motion. Furthermore, leveraging clinical databases and pre-recorded surgical videos—without conducting live trials—will refine and validate the NVAC in simulated settings that closely replicate real-world conditions, ensuring a rigorous preparatory phase for clinical deployment.
  • Optimization of visual perception models: assess the integration of multimodal vision techniques within simulated environments, incorporating fluorescence and 3D imaging to address current YOLO11n limitations in detecting critical fine details. This enhancement seeks to improve scene perception accuracy, tackling challenges such as occlusions and small instrument detection, while preserving computational feasibility for surgical contexts.
  • Enhancement of computational efficiency: implement quantization and pruning techniques to adapt YOLO11n for low-power hardware, complemented by latency and efficiency testing in simulated environments (an illustrative export sketch is provided at the end of this section). These optimizations aim to refine the NVAC for seamless integration into resource-constrained clinical devices, ensuring real-time performance without sacrificing precision.
  • Integration of reinforcement learning: incorporate reinforcement learning techniques to autonomously optimize robotic trajectories and dynamically adapt to evolving surgical conditions. This approach aims to bolster the NVAC’s adaptability, enabling proactive responses to intraoperative variability and enhancing tracking robustness.
  • Advancement of robot–surgeon collaboration: develop and evaluate, within simulated environments, a multimodal human–robot collaboration interface integrating visual, auditory, and haptic feedback. This interface seeks to enable intuitive supervision and real-time system adjustments, simulating operational conditions in clinical settings and improving surgeon-robot synergy.
These research directions offer transformative potential for assisted robotic surgery, advancing the frontiers of autonomy and precision beyond current capabilities. By pursuing these developments, the NVAC could address a broader spectrum of clinical applications, significantly improving surgical outcomes and patient experiences.
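As an illustration of the computational-efficiency direction above, the sketch below shows one way a trained YOLO11n detector could be exported with reduced numerical precision using the open-source Ultralytics API. The weights file name, image size, and dataset configuration are placeholders, and this is not the deployment pipeline used in this study.

```python
# Hedged sketch: exporting a trained YOLO11n detector for low-power hardware.
# "yolo11n_tools.pt" is a hypothetical weights file; the actual pipeline may differ.
from ultralytics import YOLO

model = YOLO("yolo11n_tools.pt")            # load a trained tool-tip detector (assumed path)

# FP16 ONNX export: roughly halves memory footprint with little accuracy loss on most GPUs.
model.export(format="onnx", half=True, imgsz=640)

# INT8 TensorRT export would additionally need a small calibration set of representative frames:
# model.export(format="engine", int8=True, data="cholec_tools.yaml")  # hypothetical dataset config
```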

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/technologies13040135/s1, File S1: Neuro-Visual Adaptive Control in Robot-Assisted Surgery [45,53,59].

Author Contributions

Conceptualization, C.U. and Y.G.-G.; methodology, C.U. and Y.G.-G.; software, C.U. and Y.G.-G.; validation, C.U. and Y.G.-G.; formal analysis, C.U. and Y.G.-G.; investigation, C.U. and Y.G.-G.; resources, C.U. and Y.G.-G.; data curation, C.U. and Y.G.-G.; writing—original draft preparation, C.U. and Y.G.-G.; writing—review and editing, C.U., Y.G.-G., J.K. and R.R.-G.; visualization, C.U.; supervision, C.U. and J.K.; project administration, C.U.; funding acquisition, C.U. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

This work was supported by the Faculty of Engineering of the University of Santiago of Chile and the Agencia Nacional de Investigación y Desarrollo de Chile (ANID-Subdirección de Capital Humano/Doctorado Nacional/2022-21220266), Chile.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

Adam: Adaptive Moment Estimation
AFFOSMC: Adaptive Fuzzy Fractional-Order Sliding Mode Controller
AI: Artificial intelligence
ANN: Artificial Neural Network
AP: Average Precision
APSMC: Adaptive Proportional Sliding Mode controller
AZNN: Adaptive Zeroing Neural Network
BNN: Bayesian Neural Network
CfC: Closed-Form Continuous-Time Neural Network
CfC-mmRNN: Closed-Form Continuous-Time Neural Network with mixed-memory Recurrent Neural Network (combines the CfC’s RNN state with an LSTM)
CNN: Convolutional Neural Network
CV: Computer vision
D-H: Denavit-Hartenberg
DoF: Degrees of freedom
EP: End point
EvNN: Evidential Neural Network
FN: False negative
FOV: Field of view
FP: False positive
GELU: Gaussian error linear unit
GPU: Graphics Processing Unit
HRI: Human–robot interaction
IBVS: Image-Based Visual Servoing
IoU: Intersection over Union
ITAE: Integral of Time-Weighted Absolute Error
ITSE: Integral of Time-Weighted Squared Error
LSTM: Long Short-Term Memory
LTC: Liquid Time-Constant Networks
MAE: Mean absolute error
mAP: mean Average Precision
ML: Machine learning
MRAC: Model Reference Adaptive Control
MSE: Mean Squared Error
NVAC: Neuro-Visual Adaptive Control
ODE: Ordinary Differential Equations
P: Precision
PD: Proportional–Derivative
PID: Proportional–Integral–Derivative
PIs: Performance indexes
PNG: Portable Network Graphics
R: Recall
RCM: Remote Center of Motion
RFPT: Robust Fixed Point Transformations
RNN: Recurrent Neural Networks
SGDM: Stochastic Gradient Descent with Momentum
SMC: Sliding Mode Control
SP: Start point
STAR: Smart Tissue Autonomous Robot
TP: True positive
YOLO: You Only Look Once
YOLO11n: You Only Look Once, version 11, nano variant

References

  1. Mirbagheri, A.; Farahmand, F.; Sarkar, S.; Alamdar, A.; Moradi, M.; Afshari, E. The Sina Robotic Telesurgery System. In Handbook of Robotic and Image-Guided Surgery; Elsevier: Amsterdam, The Netherlands, 2020; Chapter 7; pp. 107–121. [Google Scholar]
  2. Dupont, P.E.; Nelson, B.J.; Goldfarb, M.; Hannaford, B.; Menciassi, A.; O’malley, M.K.; Simaan, N.; Valdastri, P.; Yang, G.-Z. A decade retrospective of medical robotics research from 2010 to 2020. Sci. Robot. 2021, 6, eabi8017. [Google Scholar]
  3. Higgins, R.M.; Gould, J.C. Clinical Applications of Robotics in General Surgery. In Handbook of Robotic and Image-Guided Surgery; Elsevier: Amsterdam, The Netherlands, 2020; pp. 211–221. [Google Scholar]
  4. Culmer, P.; Alazmani, A.; Mushtaq, F.; Cross, W.; Jayne, D. Haptics in surgical robots. In Handbook of Robotic and Image-Guided Surgery; Elsevier: Amsterdam, The Netherlands, 2020; pp. 239–263. [Google Scholar]
  5. Wan, G.-Y.; Zhou, X.-Y.; Duan, H.-X.; Zou, Z.-Y.; Zhang, M.-M.; Mao, J.-B. Comparison of robotic camera holders with human assistants in endoscopic surgery: A systematic review and meta-analysis. Minim. Invasive Ther. Allied Technol. 2023, 32, 153–162. [Google Scholar]
  6. Abedin-Nasab, M. Handbook of Robotic and Image-Guided Surgery; Elsevier: Amsterdam, The Netherlands, 2020. [Google Scholar]
  7. Li, B.; Lu, B.; Wang, Z.; Zhong, F.; Dou, Q.; Liu, Y.-H. Learning laparoscope actions via video features for proactive robotic field-of-view control. IEEE Robot. Autom. Lett. 2022, 7, 6653–6660. [Google Scholar] [CrossRef]
  8. Li, J.; Huang, Y.; Zhang, X.; Xie, K.; Xian, Y.; Luo, X.; Chiu, P.W.Y.; Li, Z. An autonomous surgical instrument tracking framework with a binocular camera for a robotic flexible laparoscope. IEEE Robot. Autom. Lett. 2023, 8, 4291–4298. [Google Scholar] [CrossRef]
  9. Huang, Y.; Li, J.; Zhang, X.; Xie, K.; Li, J.; Liu, Y.; Ng, S.H.; Chiu, P.W.Y.; Li, Z. A surgeon preference-guided autonomous instrument tracking method with a robotic flexible endoscope based on dVRK platform. IEEE Robot. Autom. Lett. 2022, 7, 2250–2257. [Google Scholar] [CrossRef]
  10. Ou, Y.; Zargarzadeh, S.; Tavakoli, M. Robot Learning Incorporating Human Interventions in the Real World for Autonomous Surgical Endoscopic Camera Control. J. Med. Robot. Res. 2023, 8, 2340004. [Google Scholar] [CrossRef]
  11. Kong, X.; Mo, H.; Dong, E.; Liu, Y.; Sun, D. Automatic Tracking of Surgical Instruments with a Continuum Laparoscope Using Data-Driven Control in Robotic Surgery. Adv. Intell. Syst. 2023, 5, 2200188. [Google Scholar] [CrossRef]
  12. Opfermann, J.D.; Leonard, S.; Decker, R.S.; Uebele, N.A.; Bayne, C.E.; Joshi, A.S.; Krieger, A. Semi-autonomous electrosurgery for tumor resection using a multi-degree of freedom electrosurgical tool and visual servoing. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 3653–3660. [Google Scholar] [CrossRef]
  13. Mehrjouyan, A.; Menhaj, M.B.; Hooshiar, A. Safety-enhanced observer-based adaptive fuzzy synchronization control framework for teleoperation systems. Eur. J. Control 2023, 73, 100885. [Google Scholar] [CrossRef]
  14. Huang, Y.; Li, W.; Zhang, X.; Li, J.; Li, Y.; Sun, Y.; Chiu, P.W.Y.; Li, Z. 4-dof visual servoing of a robotic flexible endoscope with a predefined-time convergent and noise-immune adaptive neural network. IEEE/ASME Trans. Mechatron. 2023, 29, 576–587. [Google Scholar] [CrossRef]
  15. Li, Y.; Ng, W.Y.; Li, W.; Huang, Y.; Zhang, H.; Xian, Y.; Li, J.; Sun, Y.; Chiu, P.W.Y.; Li, Z. Towards semi-autonomous colon screening using an electromagnetically actuated soft-tethered colonoscope based on visual servo control. IEEE Trans. Biomed. Eng. 2023, 71, 77–88. [Google Scholar] [CrossRef]
  16. Adjepong, D. Adaptive Control Systems for Medical Robots in Neurosurgery: A Review. HSOA J. Surg. Curr. Trends Innov. 2020, 4, 030. [Google Scholar] [CrossRef]
  17. Ren, F.; Wang, X.; Yu, N.; Han, J. Adaptive fuzzy control for tendon-sheath actuated bending-tip system with unknown friction for robotic flexible endoscope. Front. Neurosci. 2024, 18, 1330634. [Google Scholar] [CrossRef] [PubMed]
  18. Sachan, S.; Swarnkar, P. Intelligent fractional-order sliding mode optimised control of surgical manipulator for healthcare system. Electr. Eng. 2024, 106, 2131–2142. [Google Scholar] [CrossRef]
  19. Kökçam, E.; Tan, N. Comparison of Adaptation Mechanisms on MRAC. IEEE Access 2024, 12, 31862–31874. [Google Scholar] [CrossRef]
  20. Sánchez–Sánchez, P.; Guillermo Cebada–Reyes, J.; Montiel–Martínez, A.; Reyes–Cortés, F. Optimizing Dehydration Systems: Implementing Model Reference Adaptive Control for Enhanced Efficiency. Int. J. Adapt. Control Signal Process. 2025, 39, 1–23. [Google Scholar] [CrossRef]
  21. Tar, J.K.; Kovács, L.; Takács, A.; Takács, B.; Zentay, P.; Haidegger, T.; Rudas, I.J. Novel design of a Model Reference Adaptive Controller for soft tissue operations. In Proceedings of the 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC), San Diego, CA, USA, 5–8 October 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 2446–2451. [Google Scholar]
  22. Cursi, F.; Bai, W.; Yeatman, E.M.; Kormushev, P. Task accuracy enhancement for a surgical macro-micro manipulator with probabilistic neural networks and uncertainty minimization. IEEE Trans. Autom. Sci. Eng. 2022, 21, 241–256. [Google Scholar] [CrossRef]
  23. Cursi, F.; Bai, W.; Yeatman, E.M.; Kormushev, P. Adaptive kinematic model learning for macro-micro surgical manipulator control. In Proceedings of the 2022 International Conference on Advanced Robotics and Mechatronics (ICARM), Guilin, China, 9–11 July 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 494–501. [Google Scholar]
  24. Wang, Z.; Wang, T.; Zhao, B.; He, Y.; Hu, Y.; Li, B.; Zhang, P.; Meng, M.Q.-H. Hybrid adaptive control strategy for continuum surgical robot under external load. IEEE Robot. Autom. Lett. 2021, 6, 1407–1414. [Google Scholar] [CrossRef]
  25. Ping, L.; Wang, Z.; Yao, J.; Gao, J.; Yang, S.; Li, J.; Shi, J.; Wu, W.; Hua, S.; Wang, H. Application and evaluation of surgical tool and tool tip recognition based on Convolutional Neural Network in multiple endoscopic surgical scenarios. Surg. Endosc. 2023, 37, 7376–7384. [Google Scholar] [CrossRef]
  26. Pan, X.; Bi, M.; Wang, H.; Ma, C.; He, X. DBH-YOLO: A surgical instrument detection method based on feature separation in laparoscopic surgery. Int. J. Comput. Assist. Radiol. Surg. 2024, 19, 2215–2225. [Google Scholar] [CrossRef]
  27. Liu, Z.; Zhou, Y.; Zheng, L.; Zhang, G. SINet: A hybrid deep CNN model for real-time detection and segmentation of surgical instruments. Biomed. Signal Process. Control 2024, 88, 105670. [Google Scholar] [CrossRef]
  28. Song, H.; Zhao, Z.; Liu, K.; Wu, Y.; Li, F. Anchor-Free Convolutional Neural Network Application to Enhance Real-Time Surgical Tool Detection in Computer-Aided Surgery. IEEE Trans. Med. Robot. Bionics 2023, 6, 73–83. [Google Scholar]
  29. Peng, C.; Li, Y.; Long, X.; Zhao, X.; Jiang, X.; Guo, J.; Lou, H. Urtnet: An unstructured feature fusion network for real-time detection of endoscopic surgical instruments. J. Real-Time Image Process. 2024, 21, 190. [Google Scholar]
  30. Benavides, D.; Cisnal, A.; Fontúrbel, C.; de la Fuente, E.; Fraile, J.C. Real-Time Tool Localization for Laparoscopic Surgery Using Convolutional Neural Network. Sensors 2024, 24, 4191. [Google Scholar] [CrossRef]
  31. Loza, G.; Valdastri, P.; Ali, S. Real-time surgical tool detection with multi-scale positional encoding and contrastive learning. Healthc. Technol. Lett. 2024, 11, 48–58. [Google Scholar] [PubMed]
  32. Rohmer, E.; Singh, S.P.N.; Freese, M. CoppeliaSim (formerly V-REP): A Versatile and Scalable Robot Simulation Framework. In Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan, 3–7 November 2013. [Google Scholar]
  33. Li, W.; Chiu, P.W.Y.; Li, Z. An accelerated finite-time convergent neural network for visual servoing of a flexible surgical endoscope with physical and RCM constraints. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 5272–5284. [Google Scholar] [CrossRef] [PubMed]
  34. Amirthavarshini, G.A.; Thondiyath, A. RCM Constrained Task Space Trajectory Generation for Autonomous Robotic Surgeries. In Proceedings of the 2024 IEEE International Conference on Cyborg and Bionic Systems (CBS), Nagoya, Japan, 20–22 November 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 121–125. [Google Scholar]
  35. Kuo, J.-Y.; Song, K.-T. A Design for Remote Center of Motion Control Using a Software Computing Approach. In Proceedings of the 2020 International Automatic Control Conference (CACS), Hsinchu, Taiwan, 4–7 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–5. [Google Scholar]
  36. Huang, Y.; Li, J.; Li, W.; Zhang, X.; Sun, Y.; Xie, K.; Hu, Y.; Chiu, P.W.Y.; Li, Z. An Accelerated Anti-Noise Adaptive Neural Network for Robotic Flexible Endoscope With Multitype Surgical Objectives and Constraints. IEEE Trans. Syst. Man Cybern. Syst. 2024, 55, 990–1003. [Google Scholar]
  37. Colan, J.; Davila, A.; Fozilov, K.; Hasegawa, Y. A concurrent framework for constrained inverse kinematics of minimally invasive surgical robots. Sensors 2023, 23, 3328. [Google Scholar] [CrossRef]
  38. Iovene, E.; Cattaneo, D.; Fu, J.; Ferrigno, G.; De Momi, E. Hybrid Tracking Module for Real-Time Tool Tracking for an Autonomous Exoscope. IEEE Robot. Autom. Lett. 2024, 9, 6067–6074. [Google Scholar]
  39. Zinchenko, K.; Song, K.-T. Autonomous endoscope robot positioning using instrument segmentation with virtual reality visualization. IEEE Access 2021, 9, 72614–72623. [Google Scholar]
  40. MATLAB; Mathworks: Natick, MA, USA, 2024; Available online: https://www.mathworks.com (accessed on 25 March 2025).
  41. Ultralytics YOLO11. 2024. Available online: https://github.com/ultralytics/ultralytics (accessed on 20 December 2024).
  42. Roboflow. 2024. Available online: https://roboflow.com (accessed on 17 December 2024).
  43. Kaggle. CholecSeg8k: A Semantic Segmentation Dataset Based on Cholec80. Available online: https://www.kaggle.com/datasets/newslab/cholecseg8k (accessed on 25 March 2025).
  44. Twinanda, A.P.; Shehata, S.; Mutter, D.; Marescaux, J.; De Mathelin, M.; Padoy, N. EndoNet: A deep architecture for recognition tasks on laparoscopic videos. IEEE Trans. Med. Imaging 2017, 36, 86–97. [Google Scholar]
  45. Universal Robots A/S. UR5 Especificaciones Técnicas No. Artículo 110105. Available online: https://www.universal-robots.com/media/50591/ur5_es.pdf (accessed on 25 March 2025).
  46. Rao, S.N. YOLOv11 Architecture Explained: Next-Level Object Detection with Enhanced Speed and Accuracy. Medium. Available online: https://medium.com/@nikhil-rao-20/yolov11-explained-next-level-object-detection-with-enhanced-speed-and-accuracy-2dbe2d376f71 (accessed on 25 February 2025).
  47. Hasani, R.; Lechner, M.; Amini, A.; Rus, D.; Grosu, R. Liquid time-constant networks. Proc. AAAI Conf. Artif. Intell. 2021, 35, 7657–7666. [Google Scholar] [CrossRef]
  48. Hasani, R.; Lechner, M.; Amini, A.; Liebenwein, L.; Ray, A.; Tschaikowski, M.; Teschl, G.; Rus, D. Closed-form continuous-time neural networks. Nat. Mach. Intell. 2022, 4, 992–1003. [Google Scholar] [CrossRef]
  49. Urrea, C.; Garcia-Garcia, Y. Design and Performance Analysis of Level Control Strategies in a Nonlinear Spherical Tank. Processes 2023, 11, 720. [Google Scholar] [CrossRef]
  50. Kern, J.; Rodriguez-Guillen, R.; Urrea, C.; Garcia-Garcia, Y. Enhancing 3D Rock Localization in Mining Environments Using Bird’s-Eye View Images from the Time-of-Flight Blaze 101 Camera. Technologies 2024, 12, 162. [Google Scholar] [CrossRef]
  51. Hodson, T.O. Root mean square error (RMSE) or mean absolute error (MAE): When to use them or not. Geosci. Model Dev. Discuss. 2022, 2022, 5481–5487. [Google Scholar] [CrossRef]
  52. Rodriguez-Guillen, R.; Kern, J.; Urrea, C. Fast Rock Detection in Visually Contaminated Mining Environments Using Machine Learning and Deep Learning Techniques. Appl. Sci. 2024, 14, 731. [Google Scholar] [CrossRef]
  53. Urrea, C.; Garcia-Garcia, Y.; Kern, J. Closed-Form Continuous-Time Neural Networks for Sliding Mode Control with Neural Gravity Compensation. Robotics 2024, 13, 126. [Google Scholar] [CrossRef]
  54. Leong, F.; Mohammadi, A.; Tan, Y.; Thiruchelvam, D.; Lai, C.Y.; Valdastri, P.; Oetomo, D. Disturbance Rejection in Multi-DOF Local Magnetic Actuation for the Robotic Abdominal Surgery. IEEE Robot. Autom. Lett. 2018, 3, 1568–1575. [Google Scholar]
  55. Picod, G.; Jambon, A.; Vinatier, D.; Dubois, P. What can the operator actually feel when performing a laparoscopy? Surg. Endosc. Other Interv. Tech. 2005, 19, 95–100. [Google Scholar]
  56. Le, A.T.; Shakiba, M.; Ardekani, I.; Abdulla, W.H. Optimizing Plant Disease Classification with Hybrid Convolutional Neural Network–Recurrent Neural Network and Liquid Time-Constant Network. Appl. Sci. 2024, 14, 9118. [Google Scholar] [CrossRef]
  57. Huang, Z.; Contreras, L.F.H.; Leung, W.H.; Yu, L.; Truong, N.D.; Nikpour, A.; Kavehei, O. Efficient edge-AI models for robust ECG abnormality detection on resource-constrained hardware. J. Cardiovasc. Transl. Res. 2024, 17, 879–892. [Google Scholar] [CrossRef] [PubMed]
  58. Belhaoua, A.; Kimpe, T.; Crul, S. TensorRT-based surgical instrument detection assessment for deep learning on edge computing. In Medical Imaging 2024: Image-Guided Procedures, Robotic Interventions, and Modeling; SPIE: Bellingham, WA, USA, 2024; Volume 12928, pp. 368–371. [Google Scholar]
  59. Kufieta, K. Force Estimation in Robotic Manipulators: Modeling, Simulation and Experiments. Master’s Thesis, Norwegian University of Science and Technology, Trondheim, Norway, 2014. Available online: https://tomgra.folk.ntnu.no/Diplomer/Kufieta.pdf (accessed on 25 March 2025).
Figure 1. Methodology of the Neuro-Visual Adaptive Controller (NVAC) Operation. (a) Simulation Environment: This represents the virtual scenario where the camera-holding robot interacts with surgical tools and the patient. (b) Visual Information Processing: In this stage, visual data are interpreted, and motion trajectories for the robot are generated using computer vision models and inverse kinematics. (c) Control Strategy Design: The robot’s response is defined based on the processed visual information, adjusting its movement through the low-level MRAC-PD-CfC-mmRNN controller.
Figure 2. Simulation environment in CoppeliaSim representing a surgical scenario with a human mannequin. A laparoscope equipped with assisted vision is depicted operating within the simulated environment.
Figure 3. Examples of the dataset used in the study. (a) Images from the original CholecSeg8k dataset; (b) images captured in MATLAB® from the CoppeliaSim environment; (c) example of annotations in a subset of images from the original dataset, labeling only the tip of the surgical tool as ground truth.
Figure 4. Representation of the structure and joint configuration of the UR5 robot using the Denavit-Hartenberg convention.
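As a brief reminder of the convention shown in Figure 4, each joint’s contribution can be written as a single homogeneous transform. The minimal sketch below implements the standard Denavit-Hartenberg matrix; the UR5’s specific parameter values from the paper are not reproduced here, so the function is deliberately generic.

```python
import numpy as np

def dh_transform(theta: float, d: float, a: float, alpha: float) -> np.ndarray:
    """Standard Denavit-Hartenberg homogeneous transform for a single link."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

# Chaining the six link transforms yields the end-effector pose:
# T06 = T1 @ T2 @ T3 @ T4 @ T5 @ T6  (UR5 parameter values omitted here)
```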
Figure 5. YOLO11 architecture. The diagram illustrates the main components of the model: convolutional layers (Conv) combined with C2PSA blocks at the top, processing pathways featuring BottleNeck and SPPF on the left, and specialized modules such as PSA, FFN, and Detect blocks.
Figure 6. CfC-mmRNN neural network architecture. The figure illustrates the combination of Closed-Form Continuous-Time (CfC) networks with LSTM, forming the CfC-mmRNN architecture. On the left, a traditional LSTM cell is depicted, while on the right, the CfC cell is shown, incorporating a time vector and neural network heads (g, f, h). This integration enhances computational efficiency and the modeling of temporal dynamics.
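For readers who want to experiment with this kind of architecture, the following minimal PyTorch sketch builds a mixed-memory CfC (a CfC cell paired with an LSTM state) using the open-source ncps package. It is an illustrative stand-in rather than the authors’ implementation, and the input dimensionality, hidden size, and output size are assumptions.

```python
# Hedged sketch: a mixed-memory CfC (CfC + LSTM state) for sequence regression,
# built with the open-source "ncps" package. Sizes below are illustrative only.
import torch
from ncps.torch import CfC

cfc_mm = CfC(
    input_size=18,         # e.g., positions, velocities, accelerations of 6 joints (assumed)
    units=64,              # hidden CfC units (assumed)
    proj_size=6,           # linear read-out to 6 outputs, e.g., joint torques (assumed)
    mixed_memory=True,     # pair the CfC state with an LSTM (the "mmRNN" variant)
    batch_first=True,
)

x = torch.randn(8, 100, 18)     # batch of 8 sequences, 100 time steps each
y, hidden = cfc_mm(x)           # y has shape (8, 100, 6)
print(y.shape)
```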
Figure 7. Schematic of the MRAC-PD-CfC-mmRNN low-level controller. The MRAC-PD-CfC-mmRNN controller is responsible for managing the dynamic response of the robotic manipulator within the NVAC framework. The controller accounts for the robot’s dynamics and incorporates a reference model that adjusts the desired trajectory using adaptive gains (γ1, γ2). The CfC-mmRNN network predicts the inverse torque (τ_CfC), supplemented by proportional, derivative, and adaptive terms to enhance control stability and robustness.
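To make the structure in Figure 7 concrete, the sketch below shows a generic discrete-time update of this kind of controller: a learned feedforward torque plus proportional, derivative, and adaptively scaled terms, with the adaptive gains driven by the tracking error (an MIT-rule-style update). It is a simplified illustration under assumed signal names and update laws, not the exact controller equations of this paper.

```python
import numpy as np

def mrac_pd_step(q, qd, q_ref, qd_ref, tau_cfc, Kp, Kd, gamma1, gamma2, theta, dt):
    """One illustrative MRAC-PD update for a 6-DoF arm (simplified sketch).

    q, qd          : measured joint positions / velocities, shape (6,)
    q_ref, qd_ref  : reference-model positions / velocities, shape (6,)
    tau_cfc        : feedforward torque predicted by the CfC-mmRNN inverse model, shape (6,)
    Kp, Kd         : fixed proportional / derivative gains, shape (6,)
    gamma1, gamma2 : adaptation rates, shape (6,)
    theta          : adaptive gains, shape (2, 6), updated in place
    """
    e = q_ref - q                  # position tracking error
    de = qd_ref - qd               # velocity tracking error

    # MIT-rule-style adaptation of the two adaptive gains (illustrative only).
    theta[0] += gamma1 * e * q_ref * dt
    theta[1] += gamma2 * e * qd_ref * dt

    # Learned feedforward + PD feedback + adaptive terms.
    tau = tau_cfc + Kp * e + Kd * de + theta[0] * q_ref + theta[1] * qd_ref
    return tau, theta
```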
Figure 8. Workspace of the camera-holding robot. (a) Representation of the robot’s complete workspace, including the laparoscope (blue points), with a restricted operational region in the abdominal area (red lines); (b) example of a trajectory within the restricted workspace, showing the starting point (SP, green circle), endpoint (EP, red star), and planned trajectory (blue line). Top and front views enable analysis of the trajectory relative to imposed limits, highlighting compliance with spatial constraints.
Figure 9. Example of trajectories generated within the restricted workspace for training the robot’s inverse model.
Figure 10. Performance curves of YOLO11 training across training epochs. The top row displays the training losses for box detection (train/box_loss), classification (train/cls_loss), and distribution focal loss (train/dfl_loss), along with precision and recall metrics. The bottom row presents the validation losses (val/box_loss, val/cls_loss, val/dfl_loss) and the mAP50 and mAP50–95 metrics. These plots demonstrate the model’s effective convergence during training and its ability to generalize on validation data.
Figure 11. Detection of the surgical tool tip. (a) Detection results on the test set using YOLO11n; (b) localization of the centroid of the bounding boxes.
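As a concrete illustration of the centroid extraction described in Figure 11, the sketch below runs an Ultralytics YOLO11n model on a single frame and computes the center of each detected bounding box. The weights path and image name are placeholders, not files provided with this article.

```python
# Hedged sketch: detect the surgical tool tip and compute bounding-box centroids.
# "yolo11n_tools.pt" and "frame.png" are placeholder names.
from ultralytics import YOLO

model = YOLO("yolo11n_tools.pt")
results = model("frame.png", conf=0.5)          # run inference on one frame

for box in results[0].boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()       # corner coordinates in pixels
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0   # centroid used as the visual target
    print(f"tool tip centroid: ({cx:.1f}, {cy:.1f}) px, confidence {float(box.conf):.2f}")
```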
Figure 12. Validation of the inverse model training for a test trajectory. The torques estimated by the inverse model (solid line) are compared with the actual torques applied by the robot (dashed line) across each of the six joints, verifying the model’s accuracy in reproducing the system’s dynamics.
Figure 13. Trajectory tracking performance of the implemented controllers. (a) Desired Cartesian trajectory 1 for Scenario 1. (b) Desired Cartesian trajectory 2 for Scenario 2.
Figure 14. Simulation results for the implemented control strategies. (a) Tracking of the desired Cartesian trajectory 1 on the x-axis; (b) tracking of the desired Cartesian trajectory 2 on the x-axis.
Figure 15. Simulation results for the implemented control strategies. (a) Tracking of the desired Cartesian trajectory 1 on the y-axis; (b) tracking of the desired Cartesian trajectory 2 on the y-axis.
Figure 16. Simulation results for the implemented control strategies. (a) Tracking of the desired Cartesian trajectory 1 on the z-axis; (b) tracking of the desired Cartesian trajectory 2 on the z-axis.
Figure 17. Cartesian trajectory tracking under external disturbances. (a) Trajectory 1, Scenario 3; (b) trajectory 2, Scenario 4.
Figure 18. Cartesian tracking errors in surgical instrument tracking for Scenario 3. The figure compares the tracking errors of the designed controllers along the Cartesian axes (x, y, z).
Figure 19. Cartesian tracking errors in surgical instrument tracking for Scenario 4. The figure compares the tracking errors of the designed controllers along the Cartesian axes (x, y, z).
Figure 20. Examples of surgical tool detection with YOLO11n in challenging scenarios, including low lighting, partial occlusion, smoke presence, and small instrument detection.
Table 1. Design parameters for the NVAC control strategy.
Parameters | Joint 1 | Joint 2 | Joint 3 | Joint 4 | Joint 5 | Joint 6
Kp | 100 | 3000 | 320 | 80 | 30 | 2
Kd | 100 | 380 | 80 | 20 | 8 | 2
γ1 | 4 | 4 | 4 | 4 | 3 | 3
γ2 | 3 | 3 | 2 | 2 | 3 | 3
Table 2. Neural network training parameters for system identification and tool tip detection.
Parameters | CfC-mmRNN | Parameters | YOLO11n
Epochs | 50 | Epochs | 300
Initial Learning Rate | 0.001 | Initial Learning Rate | 0.01
Batch Size | 64 | Batch Size | 16
Activation Function | GELU | Optimizer | SGDM
Optimizer | Adam | Momentum | 0.937
Weight Decay | 6.0 × 10⁻⁶ | Weight Decay | 5.0 × 10⁻⁴
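For reference, the YOLO11n column of Table 2 corresponds roughly to an Ultralytics training call like the sketch below. The dataset configuration file name is a placeholder, and any arguments not listed in Table 2 are left at their library defaults.

```python
# Hedged sketch: training YOLO11n with the hyperparameters listed in Table 2.
# "cholec_tools.yaml" is a placeholder dataset configuration.
from ultralytics import YOLO

model = YOLO("yolo11n.pt")                 # start from pretrained nano weights
model.train(
    data="cholec_tools.yaml",
    epochs=300,
    batch=16,
    optimizer="SGD",                       # stochastic gradient descent with momentum (SGDM)
    lr0=0.01,                              # initial learning rate
    momentum=0.937,
    weight_decay=5.0e-4,
)
```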
Table 3. Computational cost of trained neural network models.
Model | Parameters | Memory | Inference Speed
YOLO11x | 56,874,931 | 111.7 MB | 34.8 ms
YOLO11n | 2,590,035 | 5.37 MB | 9.5 ms
CfC-mmRNN | 197,638 | 798 KB | 26 ms
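Figures like those in Table 3 can be reproduced approximately with a parameter count and a simple wall-clock timing loop, as sketched below for a generic PyTorch model. Warm-up iterations, hardware, and batch size strongly affect the measured latency, so the numbers obtained this way are indicative only.

```python
import time
import torch

def count_parameters(model: torch.nn.Module) -> int:
    """Total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

@torch.no_grad()
def mean_latency_ms(model: torch.nn.Module, example: torch.Tensor, runs: int = 100) -> float:
    """Average single-input inference time in milliseconds, after a short warm-up."""
    model.eval()
    for _ in range(10):                     # warm-up iterations
        model(example)
    start = time.perf_counter()
    for _ in range(runs):
        model(example)
    return (time.perf_counter() - start) / runs * 1e3
```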
Table 4. Performance indexes for Cartesian and joint positions under trajectory 1.
PIs | Controller | x | y | z | Joint 1 | Joint 2 | Joint 3 | Joint 4 | Joint 5 | Joint 6
MAE | PID | 0.0047 | 9.394 × 10⁻⁵ | 0.0081 | 6.71 × 10⁻⁵ | 0.0065 | 0.0120 | 0.0031 | 0.0008 | 0.0005
MAE | SMC | 0.0045 | 0.0002 | 0.0031 | 3.85 × 10⁻⁵ | 0.0020 | 0.0098 | 0.0027 | 0.0001 | 0.0002
MAE | MRAC-PD | 2.60 × 10⁻⁵ | 3.24 × 10⁻⁶ | 1.71 × 10⁻⁵ | 1.43 × 10⁻⁵ | 6.90 × 10⁻⁵ | 6.82 × 10⁻⁵ | 2.00 × 10⁻⁵ | 2.16 × 10⁻⁵ | 0.0002
MAE | Proposed | 5.22 × 10⁻⁶ | 3.91 × 10⁻⁶ | 2.11 × 10⁻⁵ | 1.36 × 10⁻⁵ | 3.02 × 10⁻⁵ | 1.99 × 10⁻⁵ | 2.03 × 10⁻⁵ | 2.18 × 10⁻⁵ | 0.0001
ITAE | PID | 0.5538 | 0.0084 | 0.9556 | 0.0066 | 0.7814 | 1.4354 | 0.3425 | 0.0709 | 0.0363
ITAE | SMC | 0.5383 | 0.0174 | 0.3751 | 0.0028 | 0.2325 | 1.1793 | 0.3337 | 0.0289 | 0.0116
ITAE | MRAC-PD | 0.0003 | 0.0002 | 0.0008 | 0.0015 | 0.0014 | 0.0013 | 0.0008 | 0.0023 | 0.0105
ITAE | Proposed | 0.0003 | 0.0004 | 0.0012 | 0.0013 | 0.0021 | 0.0010 | 0.0008 | 0.0022 | 0.0105
ITSE | PID | 0.0026 | 1.01 × 10⁻⁶ | 0.0077 | 5.88 × 10⁻⁷ | 0.0052 | 0.0174 | 0.0010 | 6.73 × 10⁻⁵ | 2.18 × 10⁻⁵
ITSE | SMC | 0.0024 | 4.04 × 10⁻⁶ | 0.0011 | 1.75 × 10⁻⁷ | 0.0004 | 0.0117 | 0.0009 | 7.79 × 10⁻⁶ | 2.33 × 10⁻⁶
ITSE | MRAC-PD | 3.93 × 10⁻⁸ | 9.27 × 10⁻¹⁰ | 1.77 × 10⁻⁸ | 2.54 × 10⁻⁸ | 2.747 × 10⁻⁷ | 2.40 × 10⁻⁷ | 3.23 × 10⁻⁸ | 5.58 × 10⁻⁸ | 1.34 × 10⁻⁶
ITSE | Proposed | 2.27 × 10⁻⁹ | 2.480 × 10⁻⁹ | 3.44 × 10⁻⁸ | 2.25 × 10⁻⁸ | 7.79 × 10⁻⁸ | 3.34 × 10⁻⁸ | 4.09 × 10⁻⁸ | 5.52 × 10⁻⁸ | 1.34 × 10⁻⁶
Best performance is denoted in bold.
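The performance indexes reported in Tables 4-7 follow their standard definitions; a short sketch of how they can be computed from a sampled error signal is given below. Trapezoidal integration is one reasonable choice here; the paper’s exact numerical scheme is not specified in this excerpt, and the example signal is purely illustrative.

```python
import numpy as np

def performance_indexes(t: np.ndarray, e: np.ndarray) -> dict:
    """MAE, ITAE, and ITSE for a sampled tracking-error signal e(t)."""
    dt = np.diff(t)
    def integrate(f):                       # trapezoidal rule on the sampled signal
        return float(np.sum(0.5 * (f[:-1] + f[1:]) * dt))
    mae = float(np.mean(np.abs(e)))         # mean absolute error
    itae = integrate(t * np.abs(e))         # integral of time-weighted absolute error
    itse = integrate(t * e ** 2)            # integral of time-weighted squared error
    return {"MAE": mae, "ITAE": itae, "ITSE": itse}

# Illustrative example: a 10 s error signal sampled at 1 kHz that decays exponentially.
t = np.linspace(0.0, 10.0, 10_001)
e = 1e-3 * np.exp(-2.0 * t)
print(performance_indexes(t, e))
```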
Table 5. Performance indexes for Cartesian and joint positions under trajectory 2.
PIs | Controller | x | y | z | Joint 1 | Joint 2 | Joint 3 | Joint 4 | Joint 5 | Joint 6
MAE | PID | 0.0049 | 1.53 × 10⁻⁴ | 0.0084 | 1.02 × 10⁻⁴ | 0.0055 | 0.0121 | 0.0031 | 0.0013 | 0.0024
MAE | SMC | 0.0045 | 2.45 × 10⁻⁴ | 0.0033 | 3.35 × 10⁻⁵ | 0.0029 | 0.0098 | 0.0027 | 1.95 × 10⁻⁴ | 0.0011
MAE | MRAC-PD | 2.49 × 10⁻⁵ | 2.86 × 10⁻⁶ | 3.362 × 10⁻⁵ | 1.47 × 10⁻⁵ | 8.34 × 10⁻⁵ | 6.70 × 10⁻⁵ | 4.05 × 10⁻⁵ | 2.23 × 10⁻⁵ | 0.0015
MAE | Proposed | 6.68 × 10⁻⁶ | 4.63 × 10⁻⁶ | 2.81 × 10⁻⁵ | 1.44 × 10⁻⁵ | 3.92 × 10⁻⁵ | 2.78 × 10⁻⁵ | 3.89 × 10⁻⁵ | 2.19 × 10⁻⁵ | 0.0013
ITAE | PID | 0.9228 | 0.0373 | 1.6236 | 0.0174 | 1.0312 | 2.3259 | 0.5386 | 0.3276 | 0.5482
ITAE | SMC | 0.8693 | 0.0582 | 0.6403 | 0.0046 | 0.5909 | 1.8829 | 0.5075 | 0.0440 | 0.2332
ITAE | MRAC-PD | 0.0012 | 3.89 × 10⁻⁴ | 0.0052 | 0.0022 | 0.0078 | 0.0054 | 0.0073 | 0.0034 | 0.3128
ITAE | Proposed | 0.0013 | 7.59 × 10⁻⁴ | 0.0048 | 0.0022 | 0.0067 | 0.0051 | 0.0078 | 0.0033 | 0.3126
ITSE | PID | 0.0044 | 8.99 × 10⁻⁶ | 0.0138 | 4.96 × 10⁻⁶ | 0.0056 | 0.0282 | 0.0015 | 6.47 × 10⁻³ | 0.0073
ITSE | SMC | 0.0039 | 2.04 × 10⁻⁵ | 0.0021 | 2.38 × 10⁻⁷ | 0.0019 | 0.0185 | 0.0013 | 1.29 × 10⁻⁵ | 0.0035
ITSE | MRAC-PD | 5.675 × 10⁻⁸ | 1.93 × 10⁻⁹ | 3.464 × 10⁻⁷ | 6.18 × 10⁻⁸ | 9.86 × 10⁻⁷ | 5.92 × 10⁻⁷ | 8.84 × 10⁻⁷ | 1.52 × 10⁻⁷ | 0.0060
ITSE | Proposed | 2.02 × 10⁻⁸ | 1.06 × 10⁻⁹ | 2.78 × 10⁻⁷ | 6.72 × 10⁻⁸ | 5.58 × 10⁻⁷ | 4.23 × 10⁻⁷ | 9.83 × 10⁻⁷ | 1.44 × 10⁻⁷ | 0.0055
Best performance is denoted in bold.
Table 6. Performance indexes for Cartesian and joint positions under Scenario 3.
PIs | Controller | x | y | z | Joint 1 | Joint 2 | Joint 3 | Joint 4 | Joint 5 | Joint 6
MAE | PID | 6.70 × 10⁻⁵ | 0.0065 | 0.0121 | 6.70 × 10⁻⁵ | 0.0065 | 0.0121 | 0.0033 | 0.0007 | 0.0004
MAE | SMC | 0.0046 | 0.0002 | 0.0032 | 3.89 × 10⁻⁵ | 0.0019 | 0.0100 | 0.0029 | 0.0001 | 0.0002
MAE | MRAC-PD | 5.68 × 10⁻⁵ | 3.82 × 10⁻⁶ | 4.83 × 10⁻⁵ | 1.47 × 10⁻⁵ | 9.33 × 10⁻⁵ | 0.0001 | 6.75 × 10⁻⁵ | 2.16 × 10⁻⁵ | 0.0002
MAE | Proposed | 3.59 × 10⁻⁵ | 4.22 × 10⁻⁶ | 5.24 × 10⁻⁵ | 1.40 × 10⁻⁵ | 5.48 × 10⁻⁵ | 6.74 × 10⁻⁵ | 6.77 × 10⁻⁵ | 2.18 × 10⁻⁵ | 0.0001
ITAE | PID | 0.0066 | 0.7860 | 1.4465 | 0.0066 | 0.7860 | 1.4465 | 0.3767 | 0.0708 | 0.0362
ITAE | SMC | 0.5538 | 0.0177 | 0.3841 | 0.0028 | 0.2285 | 1.1966 | 0.3613 | 0.0289 | 0.0116
ITAE | MRAC-PD | 0.0046 | 0.0004 | 0.0053 | 0.0014 | 0.0049 | 0.0080 | 0.0077 | 0.0022 | 0.0105
ITAE | Proposed | 0.0046 | 0.0003 | 0.0057 | 0.0013 | 0.0057 | 0.0078 | 0.0076 | 0.0023 | 0.0105
ITSE | PID | 5.86 × 10⁻⁷ | 0.0052 | 0.0176 | 5.86 × 10⁻⁷ | 0.0052 | 0.0176 | 0.0016 | 6.73 × 10⁻⁵ | 2.18 × 10⁻⁵
ITSE | SMC | 0.0026 | 4.21 × 10⁻⁶ | 0.0012 | 1.79 × 10⁻⁷ | 0.0004 | 0.0121 | 0.0013 | 7.82 × 10⁻⁶ | 2.34 × 10⁻⁵
ITSE | MRAC-PD | 3.77 × 10⁻⁶ | 4.06 × 10⁻⁹ | 3.62 × 10⁻⁶ | 2.97 × 10⁻⁸ | 2.19 × 10⁻⁶ | 9.69 × 10⁻⁶ | 8.62 × 10⁻⁶ | 5.52 × 10⁻⁸ | 1.34 × 10⁻⁶
ITSE | Proposed | 3.72 × 10⁻⁶ | 4.49 × 10⁻⁹ | 3.65 × 10⁻⁶ | 2.72 × 10⁻⁸ | 2.08 × 10⁻⁶ | 9.41 × 10⁻⁶ | 8.71 × 10⁻⁶ | 5.58 × 10⁻⁸ | 1.34 × 10⁻⁶
Best performance is denoted in bold.
Table 7. Performance indexes for Cartesian and joint positions under Scenario 4.
PIs | Controller | x | y | z | Joint 1 | Joint 2 | Joint 3 | Joint 4 | Joint 5 | Joint 6
MAE | PID | 0.0049 | 1.53 × 10⁻⁴ | 0.0084 | 1.02 × 10⁻⁴ | 0.0055 | 0.0121 | 0.0032 | 0.0013 | 0.0024
MAE | SMC | 0.0046 | 2.53 × 10⁻⁴ | 0.0033 | 3.37 × 10⁻⁵ | 0.0029 | 0.0099 | 0.0028 | 1.95 × 10⁻⁴ | 0.0011
MAE | MRAC-PD | 4.86 × 10⁻⁵ | 5.02 × 10⁻⁶ | 5.05 × 10⁻⁵ | 1.48 × 10⁻⁵ | 9.29 × 10⁻⁵ | 9.93 × 10⁻⁵ | 7.46 × 10⁻⁵ | 2.23 × 10⁻⁵ | 0.0015
MAE | Proposed | 3.01 × 10⁻⁵ | 6.89 × 10⁻⁶ | 4.58 × 10⁻⁵ | 1.44 × 10⁻⁵ | 4.97 × 10⁻⁵ | 5.99 × 10⁻⁵ | 7.29 × 10⁻⁵ | 2.19 × 10⁻⁵ | 0.0013
ITAE | PID | 0.9378 | 0.0376 | 1.6297 | 0.0173 | 1.0364 | 2.3382 | 0.5698 | 0.3276 | 0.5482
ITAE | SMC | 0.8863 | 0.0597 | 0.6504 | 0.0046 | 0.5864 | 1.9021 | 0.5380 | 0.0440 | 0.2331
ITAE | MRAC-PD | 0.0060 | 8.28 × 10⁻⁴ | 0.0086 | 0.0022 | 0.0097 | 0.0119 | 0.0141 | 0.0033 | 0.3128
ITAE | Proposed | 0.0059 | 0.0012 | 0.0083 | 0.0022 | 0.0088 | 0.0115 | 0.0145 | 0.0032 | 0.3126
ITSE | PID | 0.0046 | 9.30 × 10⁻⁶ | 0.0138 | 4.95 × 10⁻⁶ | 0.0056 | 0.0285 | 0.0022 | 6.47 × 10⁻⁴ | 0.0073
ITSE | SMC | 0.0041 | 2.19 × 10⁻⁵ | 0.0022 | 2.41 × 10⁻⁷ | 0.0018 | 0.0190 | 0.0017 | 1.298 × 10⁻⁵ | 0.0034
ITSE | MRAC-PD | 4.19 × 10⁻⁶ | 3.78 × 10⁻⁸ | 4.35 × 10⁻⁶ | 6.61 × 10⁻⁸ | 3.04 × 10⁻⁶ | 1.09 × 10⁻⁵ | 1.06 × 10⁻⁵ | 1.52 × 10⁻⁷ | 0.0060
ITSE | Proposed | 4.15 × 10⁻⁶ | 5.33 × 10⁻⁸ | 4.21 × 10⁻⁶ | 7.14 × 10⁻⁸ | 2.52 × 10⁻⁶ | 1.07 × 10⁻⁵ | 1.08 × 10⁻⁵ | 1.44 × 10⁻⁷ | 0.0055
Best performance is denoted in bold.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
