Stereo Visual Servoing Control of a Soft Endoscope for Upper Gastrointestinal Endoscopic Submucosal Dissection

Quickly and accurately completing endoscopic submucosal dissection (ESD) operations within narrow lumens is currently challenging because of the environment’s high flexibility, invisible collision, and natural tissue motion. This paper proposes a novel stereo visual servoing control for a dual-segment robotic endoscope (DSRE) for ESD surgery. Departing from conventional monocular-based methods, our DSRE leverages stereoscopic imaging to rapidly extract precise depth data, enabling quicker controller convergence and enhanced surgical accuracy. The system’s dual-segment configuration enables agile maneuverability around lesions, while its compliant structure ensures adaptability within the surgical environment. The implemented stereo visual servo controller uses image features for real-time feedback and dynamically updates gain coefficients, facilitating rapid convergence to the target. In visual servoing experiments, the controller demonstrated strong performance across various tasks. Even when subjected to unknown external forces, the controller maintained robust performance in target tracking. The feasibility and effectiveness of the DSRE were further verified through ex vivo experiments. We posit that this novel system holds significant potential for clinical application in ESD surgeries.


Introduction
Gastrointestinal (GI) cancer is a prevalent and highly malignant tumor in clinical practice [1].Stomach cancer, in particular, ranks third in terms of mortality due to its low survival rate, with 782,685 deaths reported in 2018 [2].Improving the survival rate of GI cancer relies heavily on early-stage treatment [3,4].Owing to the high compliance, control precision, and cost-effectiveness, flexible endoscopes are increasingly being utilized in a variety of minimally invasive surgical procedures (see Figure 1a).Since the first application in 1988 [5], endoscopic submucosal dissection (ESD) has become the gold standard for the early treatment of GI cancer [6].ESD enables the en bloc resection of submucosal lesions, resulting in a lower recurrence rate [7].Currently, most ESD procedures are dependent on systems that utilize a single flexible arm outfitted with a monocular endoscope and various electrosurgical knives designed for specific surgical tasks [8].However, due to the limited degrees of freedom (DoFs) of existing ESD endoscopes and the high skills required, improper operations during the procedure may lead to some complications such as tissue perforation and bleeding [9].Additionally, the narrow visual field offered by traditional endoscopes hinders the surgeon's ability to perform precise, multi-angle excisions, emphasizing the necessity for an ESD system with improved visual capabilities and instrument control [10].
Micromachines 2024, 15, 276 2 of 22 required, improper operations during the procedure may lead to some complications such as tissue perforation and bleeding [9].Additionally, the narrow visual field offered by traditional endoscopes hinders the surgeon's ability to perform precise, multi-angle excisions, emphasizing the necessity for an ESD system with improved visual capabilities and instrument control [10].To simplify the procedure and improve maneuverability, researchers have proposed different robot arm designs by adapting conventional surgical tools, enabling novices to perform the surgery easily.Chiu et al. [11] compared the feasibility and proficiency of ESD using an insulation-tipped (IT) diathermic knife, demonstrating that the robotic ESD device MASTER is more effective in terms of time cost and exhibits a good grasping and cutting efficiency [12].Other similar ESD devices include ESD+AWC [13], K-FLEX [14], and TURBT [15].However, the monocular vision system on traditional ESD endoscopes limits depth perception and lacks the stereoscopic perception of the lesion, which may increase the surgical operation time [16] and compromise the accuracy of resection [17,18].The application of stereoscopic vision in various miniaturized surgical instruments, such as the da Vinci surgical system [19] and Versius surgical robot [20], enhances surgeons' hand-eye coordination and depth judgment capabilities in minimally invasive surgeries.Figure 1b illustrates a dual-segment robotic endoscope (DSRE) [21] with a stereo vision system, which is used to reach the stomach lesion through the esophagus and perform the excision.As shown in Figure 1c, the DSRE consists of two air-driven soft segments that can be actively bent by tuning the air pressure.
Challenges remain in controlling the end effector to achieve a desired pose in a flexible continuum robot arm, especially given the hyper-redundancy and infinite DoFs [22].Contact with the target also introduces disturbances at the tip and the circumferential body.Therefore, accurate modeling of the relationship between the actuation inputs, tip pose, and compensation for disturbances is essential.Although modeling in continuum robotics is a well-explored topic [23], the design of an eye-in-hand system like a flexible endoscope is not common.The stereo vision, embedded at the robot tip, contributes to image-based visual servoing (IBVS), enhancing the system's compatibility with the eye- To simplify the procedure and improve maneuverability, researchers have proposed different robot arm designs by adapting conventional surgical tools, enabling novices to perform the surgery easily.Chiu et al. [11] compared the feasibility and proficiency of ESD using an insulation-tipped (IT) diathermic knife, demonstrating that the robotic ESD device MASTER is more effective in terms of time cost and exhibits a good grasping and cutting efficiency [12].Other similar ESD devices include ESD+AWC [13], K-FLEX [14], and TURBT [15].However, the monocular vision system on traditional ESD endoscopes limits depth perception and lacks the stereoscopic perception of the lesion, which may increase the surgical operation time [16] and compromise the accuracy of resection [17,18].The application of stereoscopic vision in various miniaturized surgical instruments, such as the da Vinci surgical system [19] and Versius surgical robot [20], enhances surgeons' hand-eye coordination and depth judgment capabilities in minimally invasive surgeries.Figure 1b illustrates a dual-segment robotic endoscope (DSRE) [21] with a stereo vision system, which is used to reach the stomach lesion through the esophagus and perform the excision.As shown in Figure 1c, the DSRE consists of two air-driven soft segments that can be actively bent by tuning the air pressure.
Challenges remain in controlling the end effector to achieve a desired pose in a flexible continuum robot arm, especially given the hyper-redundancy and infinite DoFs [22].Contact with the target also introduces disturbances at the tip and the circumferential body.Therefore, accurate modeling of the relationship between the actuation inputs, tip pose, and compensation for disturbances is essential.Although modeling in continuum robotics is a well-explored topic [23], the design of an eye-in-hand system like a flexible endoscope is not common.The stereo vision, embedded at the robot tip, contributes to image-based visual servoing (IBVS), enhancing the system's compatibility with the eye-in-hand setup [24].Table 1 summarizes the related research on multi-segment soft robot control, highlighting the superiority of visual servoing for robot operation in unstructured environments.The classical IBVS has been widely used to solve the tracking [25], shape control [26], and depth estimation [27] problems of flexible endoscopes.During endoscopic operations, external interference, such as inserting internal instruments, may lead to potential issues with the proportional controller.These issues could manifest as slow convergence [28] and decreased tracking performance [25].To overcome these uncertainties, control algorithms such as sliding mode control [29], model predictive control [30], and multi-sensor fusion [31] are deployed in the IBVS framework to improve system robustness.The camera not only provides a view of the lesion but also senses the actual motion of the robot arm, serving as a proprioception mechanism for disturbance compensation.In [31], the depth of the endoscopic image is extracted for camera position adjustment in laparoscopic surgeries.In this study, stereo vision is employed to capture a wide view of the lesion and estimate the depth between the robot tip and the desired target, providing sensing information for disturbance compensation.This study presents a dual-segment robotic endoscope with a control algorithm based on binocular visual servoing, and preliminarily discusses the application value of stereo vision in ESD surgery.To address the extant clinical challenges, the design requirements of the DSRE are introduced in detail in Section 2, and the fabrication procedure of the robot and the design of the pneumatic control unit are explained, respectively.Section 3 explains the modeling of the relationship between the actuation inputs of the robot and the tip pose, as well as the approach to sensing an object in 3D form.Additionally, Section 3 introduces an adaptive stereo visual servoing (ASVS)-based control scheme.The experiment results in Section 4 demonstrate the effectiveness of the robot and the proposed methods through controller tests and an ex vivo ESD trial.Finally, Section 5 concludes this work.Compared with a previously published conference paper [21], the main improvements in this expanded study include a smaller and more flexible robot for ESD with segments calibration, the ASVS controller with adaptive gain and PD control, and a more comprehensive experimental validation.

Robot Design
The DSRE is designed to function the same as the distal tip [37] on a traditional endoscope; thus, it was designed with the following three goals in mind: (i) to allow for multiple orientations along a resection path, as a single segment proves inadequate; (ii) to fit within the adult esophagus, which measures 25 cm in length and 2 cm in diameter [38]; and (iii) to facilitate a wide viewing angle during surgery and the efficient operation of a T-type electrosurgical cutting knife.
To this end, both the proximal and distal segment of the DSRE were designed to have three actuators longitudinally aligned with the main body and have a 2 mm through working channel for the electrosurgical knife or biopsy grasper, as shown in Figure 2b.When fabricated with appropriate pretension, this construction allows for approximately 170 • and 88 • of curvature when the proximal and distal are fully actuated, respectively.More detailed dimensions of the DSRE are shown in Figure 2b.As shown in Table 2, compared with the previous design [21] and conventional endoscopes, the DSRE's robotic actuator can provide more precise control, while its smaller size and increased bending capabilities expand the operational area within the gastrointestinal tract.Additionally, the stereo vision provides supplementary depth information.
CF-XZ1200L/I [39]  Two commercial cameras (OV6946, OmniVision, Santa Clara, CA, USA) with a distribution distance of 4.4 mm were attached at the tip, providing a wide view angle during surgery, as shown in Figure 2c.Similarly, a T-type electrosurgical cutting knife was fixed at the tip, acting as the end effector of the robot arm.The resection power and mode could be set on an electrosurgical generator (DGD-300B-2, Beilin, Beijing, China).Two commercial cameras (OV6946, OmniVision, Santa Clara, CA, USA) with a distribution distance of 4.4 mm were attached at the tip, providing a wide view angle during surgery, as shown in Figure 2c.Similarly, a T-type electrosurgical cutting knife was fixed at the tip, acting as the end effector of the robot arm.The resection power and mode could be set on an electrosurgical generator (DGD-300B-2, Beilin, Beijing, China).

Robot Fabrication
As shown in Figure 2a, by casting the prototypes on 3D-printed molds, the soft segments were constructed of two low-stiffness silicones, Ecoflex 0050 and Dragon Skin 10 AF (Smooth-On, Macungie, PA, USA) mixed in a 1:1 ratio.To restrict elongation and limit radial expansion when pressurized, a nylon thread was attached at the end of each segment and another thread was coiled spirally.The DSRE of two serially connected segments was then affixed to a linear stage to facilitate insertion into the stomach.

Pneumatic Control Design
Pneumatic actuation is used in the DSRE system for its high precision control, quick response, safety for medical use, and the adaptability of in-body applications.Each air chamber of the robotic arm was linked to a pneumatic regulator (ITV0030, SMC, Tokyo, Japan) by a 60 cm silicone tube (0.3 mm inner, 0.8 mm outer diameter), and command individual pressures continuously to bend the DSRE with a direction angle and a bending angle.These valves were chosen for the following reasons: fast response times (0.1 s with no load), high stability (within 0.5% full span repeatability error), and accurate response (within 1% full span linearity error).

Robot Calibration
The calibration of soft robots is an essential process that helps determine the mechanical behavior under different loads and conditions, enabling better control and manipulation.To verify the mechanical performance and chamber consistency of the proposed dual-segment soft robot, we initially examined its bending behavior corresponding to each independent air chamber.
Figure 3a illustrates the configurations of the electromagnetic (EM) trackers located at the proximal and distal segments of the DSRE.Notably, to accurately obtain the position and orientation of each segment center, two trackers were symmetrically affixed to a 3Dprinted part mounted at the proximal/distal segment's end.We subsequently processed the average of the sensed data to calculate the positions and Euler angles.During testing, the pneumatic regulator was controlled to gradually increase the air pressure with a constant increment of 5000 Pa, while simultaneously recording the bending angles.As shown in Figure 3b, the curves of the bending angles corresponding to each segment's three chambers were roughly equivalent.When reaching the maximum air pressure (0.14 MPa for the proximal segment and 0.1 MPa for the distal segment), the standard deviations of the bending angles across the different chambers were remarkably low, which were only 0.1012 rad and 0.0428 rad, respectively.These results illustrate that the soft robot has a consistent and stable bending performance in all directions.
As shown in Figure 2a, by casting the prototypes on 3D-printed molds, the soft segments were constructed of two low-stiffness silicones, Ecoflex 0050 and Dragon Skin 10 AF (Smooth-On, Macungie, PA, USA) mixed in a 1:1 ratio.To restrict elongation and limit radial expansion when pressurized, a nylon thread was attached at the end of each segment and another thread was coiled spirally.The DSRE of two serially connected segments was then affixed to a linear stage to facilitate insertion into the stomach.

Pneumatic Control Design
Pneumatic actuation is used in the DSRE system for its high precision control, quick response, safety for medical use, and the adaptability of in-body applications.Each air chamber of the robotic arm was linked to a pneumatic regulator (ITV0030, SMC, Tokyo, Japan) by a 60 cm silicone tube (0.3 mm inner, 0.8 mm outer diameter), and command individual pressures continuously to bend the DSRE with a direction angle and a bending angle.These valves were chosen for the following reasons: fast response times (0.1 s with no load), high stability (within 0.5% full span repeatability error), and accurate response (within 1% full span linearity error).

Robot Calibration
The calibration of soft robots is an essential process that helps determine the mechanical behavior under different loads and conditions, enabling better control and manipulation.To verify the mechanical performance and chamber consistency of the proposed dual-segment soft robot, we initially examined its bending behavior corresponding to each independent air chamber.
Figure 3a illustrates the configurations of the electromagnetic (EM) trackers located at the proximal and distal segments of the DSRE.Notably, to accurately obtain the position and orientation of each segment center, two trackers were symmetrically affixed to a 3D-printed part mounted at the proximal/distal segment's end.We subsequently processed the average of the sensed data to calculate the positions and Euler angles.During testing, the pneumatic regulator was controlled to gradually increase the air pressure with a constant increment of 5000 Pa, while simultaneously recording the bending angles.As shown in Figure 3b, the curves of the bending angles corresponding to each segment's three chambers were roughly equivalent.When reaching the maximum air pressure (0.14 MPa for the proximal segment and 0.1 MPa for the distal segment), the standard deviations of the bending angles across the different chambers were remarkably low, which were only 0.1012 rad and 0.0428 rad, respectively.These results illustrate that the soft robot has a consistent and stable bending performance in all directions.

Methodology
Through geometric analysis and an optimization algorithm, the modeling was first established, and the two endoscopes placed in an embedded sensing system to compensate for the external load effect.

Shape to Tip Pose
For soft segments with a large slenderness ratio, when the eccentrically fixed chamber is subjected to changing air pressure, the robot will bend and deform.Referring to the PCC model proposed in [42], it is assumed that the bending shape of each soft robot on the DSRE is a circular arc.As shown in Figure 4a, the DSRE can be geometrically parameterized by Φ = (z s θ s φ s θ e φ e ) T in the configuration space, where z s is the overall insert distance provided by a linear stage, θ s and θ e are the bending angles, φ s and φ e are the direction angles between the bending plane and the oxz plane, l s and l e are the length of the proximal and distal segment, and z e and t d are the length of the middle cap and tip cap.( ) where , and the subscripts e and s are used to represent the proximal and distal segment, respectively, i ρ is the distance between the center of the air chamber and the center of the segment, are the air pressure of the chambers in each segment.

Depth Estimation
The two cameras mounted at the robot tip could not only view the lesion with a wide angle but also could be employed to calculate the depth of a target (vertical distance from camera to the target).In this study, we employed the semi-global matching (SGM) algorithm [43], renowned for its efficacy in disparity estimation.From the disparity and stereo camera intrinsic information, the targets' 3D position relative to camera frames can be estimated.The SGM algorithm encompasses a pixelwise matching technique based on mutual information ( MI ) [44] and a semi-global 2D constraint, composed of multiple 1D constraints.It is detailed as follows.

Mutual Information-Based Cost function
Initially, SGM computes the similarity between pixels in two rectified stereo images Taking the proximal segment as an example, the coordinate transformation sequence from the O sb X sb Y sb Z sb to O st X st Y st Z st coordinate system is as follows: first rotate the φ s angle around the z-axis, then translate l s /θ s along the x-axis, then rotate the θ s angle around the y-axis, and, finally, translate l s /θ s along the negative direction of the x-axis.The corresponding homogeneous transformation matrix is calculated as follows: where τ * , R * ∈ R 4×4 , respectively, denote the translation and rotation about the {x, y, z} axis.Considering the proximal segment and the displacements z s , z e , and t d , the overall transformation matrix where the shift angle between the two segments is 60 • The Jacobian matrix J r ∈ R 6×5 is used to analytically establish the approximate relationship between the robot tip velocity v t ∈ R 6 and joint velocity .Φ: where J r can be derived through forward kinematics b t T. A more detailed calculation of the forward kinematics and the Jacobian matrix can be found in Appendix A.
To reach a given target position of the end of the robot, we need to inversely solve the appropriate joint configuration.

Air Pressure to Shape
After finding the shape parameters, the robot system should set proper air pressures for each segment.The joint angles θ s , θ e , φ s , and φ e can be calculated from the actuator space variables q = z s p s,1 p s,2 p s,3 p e,1 p e,2 p e,3 T : where i ∈ {e, s}, and the subscripts e and s are used to represent the proximal and distal segment, respectively, ρ i is the distance between the center of the air chamber and the center of the segment, p i,m , m ∈ {1, 2, 3} are the air pressure of the chambers in each segment.

Depth Estimation
The two cameras mounted at the robot tip could not only view the lesion with a wide angle but also could be employed to calculate the depth of a target (vertical distance from camera to the target).In this study, we employed the semi-global matching (SGM) algorithm [43], renowned for its efficacy in disparity estimation.From the disparity and stereo camera intrinsic information, the targets' 3D position relative to camera frames can be estimated.The SGM algorithm encompasses a pixelwise matching technique based on mutual information (MI) [44] and a semi-global 2D constraint, composed of multiple 1D constraints.It is detailed as follows.

Mutual Information-Based Cost function
Initially, SGM computes the similarity between pixels in two rectified stereo images using a mutual information (MI)-based cost function, which is less sensitive to photometric changes.Thus, the MI value can be used to evaluate the suitability of pixels.The higher the value, the better the match, and vice versa.For the left and right images I l , I r from the end effector of the DSRE, the MI value is MI I l ,I r = H I l + H I r − H I l ,I r (H is the entropy of image, H I l ,I r is the joint entropy of images).The MI-based cost function would be computed for each pixel and disparity, resulting in a cost volume that the SGM method would use in subsequent steps to compute the final disparity map.

Aggregate Cost
After the initial cost computation, these costs need to be aggregated across multiple paths through the image to enforce the smoothness constraints while preserving disparity discontinuities, which are often present at object boundaries.A path-wise dynamic programming approach was applied in the SGM algorithm.The cost aggregation step involves accumulating the costs for each pixel along several paths through the image.Typically, 8 paths are considered to cover all directions: left-to-right, right-to-left, top-to-bottom, bottom-to-top, and the four diagonals.
For a given pixel h and disparity d, the aggregated cost E s (h, d) along a path s is calculated as follows: where C(h, d) is the matching cost at pixel h for disparity d computed from the mutual information, E s (h − s, d) is the cost for pixel h at disparity d accumulated from the previous pixel along path s, K 1 and K 2 are the penalty for small disparity changes and for large disparity changes, respectively, to allow for disparity discontinuities.The last term is the minimum cost for the previous pixel h − s over all disparities.After the cost aggregation step, each pixel will have 8 different accumulated costs, 1 for each path.These costs are then combined to produce a single, aggregated cost for each pixel and disparity: The final disparity for each pixel is chosen as the disparity that minimizes this aggregated cost.

Disparity Refinement
Once the initial disparity map is computed, it often requires further refinement to improve the accuracy, especially in occluded or low-texture areas.Refinement steps may include subpixel enhancement, occlusion handling, disparity map filtering, and left-right consistency check.Figure 5 illustrates the application of the SGM algorithm on the DSRE system, which was conducted in the MATLAB R2022a (Mathworks Inc.Portola Valley, CA, USA) with the disparity SGM and reconstruct Scene function.The depth information would be further used in the interaction matrix L in the ASVS controller.This matrix is essential for providing accurate and reliable positional information regarding the target.

Adaptive Stereo Visual Servoing Control
The visual feedback is directly utilized in the control loop of IBVS.The goal is to match the current image with the desired image.The error signal

Monocular Vision Model
Take the left camera as an example to demonstrate the camera model for brevity.The midpoint of the connection line between the two cameras coincides with t O .
As shown in Figure 4b, as a given point

Adaptive Stereo Visual Servoing Control
The visual feedback is directly utilized in the control loop of IBVS.The goal is to match the current image with the desired image.The error signal e ∈ R 2 for the controller is derived from the difference between the current image features s ∈ R 2 and the desired image features s d , and .e= .s.

Monocular Vision Model
Take the left camera as an example to demonstrate the camera model for brevity.The midpoint of the connection line between the two cameras coincides with O t .
As shown in Figure 4b, as a given point its coordinates in the image frame O l x l y l and pixel frame O pl uv are q l = x l y l T and s l = u l v l T , respectively.According to the pinhole camera model, the perspective equation can be obtained from the relationship on similar triangles, i.e., as follows: where λ xl , λ yl are the focal length in pixels, c xl , c yl are the optical center in pixels, and λ l is the focal length in millimeters.
The motion of the features .s l on the image plane can be predicted using the interaction matrix and velocity of camera v cl ∈ R 6 : . s l = L l v cl (8) with (7), L l ∈ R 2×6 is given by where z e cl is the depth of point Q, which can be estimated online with the stereo vision.Similarly, we can obtain the right camera model with .s r = L r v cr .

Stereo Vision Model
Considering the distance d between cameras, the transformation matrix cl t T ∈ R 4×4 from the left camera frame to the robot tip frame is while the transformation matrix for the right camera is cr t T = τ x (−d/2).Since the left-right camera is fixed on the tip of the DSRE, the velocity transformation between the camera frame and robot tip frame is given by v cl = cl t Mv t v cr = cr t Mv t (11) where the transformation matrix c* t M ∈ R 6×6 can be calculated from [45] with (10), i.e., T represent the image features motion extracted from the left and right cameras simultaneously, and the overall velocity transformation from the robot tip frame to the pixel frame can be derived from ( 8), (11), and (12): where L ∈ R 4×6 is the overall interaction matrix of the stereo vision system.Inserting (3) in (13), we obtain the relationship of the joint velocity and image feature: LJ r ∈ R 4×5 is a singular matrix without an inverse matrix.Commonly used pseudoinverse methods often have stability problems.When the robot approaches a singular configuration, this method will cause excessive joint velocities and cause safety problems.The damped least squares method [46] (15) where σ ∈ R is a non-zero damping constant.
The inverse problem could be solved by minimizing the object (14).The joint velocities could be derived further as

Adaptive PD Controller
The classic IBVS tasks usually use a proportional control law [28].Let the error of the feature satisfy the first-order equation .e = −ηe, so that the error value can decrease rapidly in an exponential speed.Thus, Although a proportional controller responds quickly and is simple to deploy, it cannot eliminate the steady-state error (SSE).In this paper, we adopt a PD controller to reduce the SSE and increase the system stability.Adding a derivative term to (17), we obtain where K d is the non-negative derivative gain.
To enhance the efficiency of the IBVS convergence process, we propose the implementation of an adaptive proportional gain.η is dynamically adjusted in real time based on the infinity norm of the feature error, aiming to reduce the number of iterations necessary for convergence and enhance the system's performance.η is set as follows: where η u and η l are the preset maximum and minimum gains, respectively, e 0 is the initial error to a new target.The detailed flow chart of the ASVS controller is shown in Figure 6.
Micromachines 2024, 15, 276 11 of 22 Although a proportional controller responds quickly and is simple to deploy, it cannot eliminate the steady-state error (SSE).In this paper, we adopt a PD controller to reduce the SSE and increase the system stability.Adding a derivative term to (17), we obtain where d K is the non-negative derivative gain.To enhance the efficiency of the IBVS convergence process, we propose the implementation of an adaptive proportional gain.η is dynamically adjusted in real time based on the infinity norm of the feature error, aiming to reduce the number of iterations necessary for convergence and enhance the system's performance.η is set as follows: ( ) where u η and l η are the preset maximum and minimum gains, respectively, 0 e is the initial error to a new target.The detailed flow chart of the ASVS controller is shown in Figure 6.

Depth Estimation Validation
To validate the efficacy of the proposed camera depth estimation algorithm, a comparison was made between the ground truth, derived from EM trackers, model 10001742 (NDI Aurora, Waterloo, ON, Canada), and the values estimated by the algorithm.The specific setup used for the testing process is illustrated in Figure 7a.Three trackers were fixed on a piece of tissue while two base trackers were mounted on the tip of the distal segment.

Experimental Validation 4.1. Depth Estimation Validation
To validate the efficacy of the proposed camera depth estimation algorithm, a comparison was made between the ground truth, derived from EM trackers, model 10001742 (NDI Aurora, Waterloo, ON, Canada), and the values estimated by the algorithm.The specific setup used for the testing process is illustrated in Figure 7a.Three trackers were fixed on a piece of tissue while two base trackers were mounted on the tip of the distal segment.For this experiment, the ground truth was defined as the distance between the base tracker and the trackers placed within the task space.The stereo camera in Figure 7b was first well calibrated with the method in [47], resulting in a notably low mean reprojection error of merely 0.2 pixels.A comparison was then made between the estimated depth, as calculated by the SGM, and the actual values across different altitudes.As depicted in the error band curves in Figure 8, the RMSE of the estimated depth parameters of different markers were 1.5334 mm, 1.5449 mm, and 1.5468 mm, respectively, when compared with the ground truth.This illustrates that the proposed algorithm is highly accurate, and it could obtain reliable depth estimations in various surgical applications.The estimated depth was further used in the visual servoing control.ground truth.This illustrates that the proposed algorithm is highly accurate, and it could obtain reliable depth estimations in various surgical applications.The estimated depth was further used in the visual servoing control.

Controller Performance
The performance of the ASVS was evaluated through the execution of four distinct manipulation tasks in this section.The visual servo controller's parameters during these tests were established as follows: ground truth.This illustrates that the proposed algorithm is highly accurate, and it could obtain reliable depth estimations in various surgical applications.The estimated depth was further used in the visual servoing control.

Controller Performance
The performance of the ASVS was evaluated through the execution of four distinct manipulation tasks in this section.The visual servo controller's parameters during these tests were established as follows: Comparative error band analysis of the camera depth estimation results.

Controller Performance
The performance of the ASVS was evaluated through the execution of four distinct manipulation tasks in this section.The visual servo controller's parameters during these tests were established as follows: q 0 = 0 0.2 0.2 0.2 0.2 0.2 0.2 T , η u = 1, η l = 0.5, σ = 0.5, K d = diag{0.02,0.06, 0.02, 0.06} T .In this study, the tracking error was quantified by the Euclidean distance between the current coordinates and target coordinates of the extracted image feature in the left and right frames.The control goal was to stabilize the error below 15 pixels.The frame rate of the left and right cameras was 30 FPS to ensure that the calculated air pressures have been applied to the segments to generate the desired robot shape.
As indicated in Figure 9, to simplify the feature detection and extraction process, the stereo vision system was programmed to track the AprilTags [48] in the workspace.Tags with ID 1, 2, 8, and 9 were selected as the tracking targets.The soft robot was controlled to track the target tags to the desired position on the image plane under the air pressure from the pneumatic regulators.A straight beam equipped with a force sensor was fixed on a linear rail guide slide actuator, and the tip caused contact disturbance to the robot.More results can be found in the Supplementary Video S1.  10b.The initial orientation of the DSRE robot was positioned to face the initial target with the AprilTag ID 3.After initialization, the robot was controlled to bring tags with different IDs to the ROI areas.These tags were equidistantly distributed at 90-degree intervals along the circumference of a circle.In each static tag tracking experiment, the adaptively adjusted gains of the ASVS controller were compared with the classic visual servoing method [28] employing fixed gains of 0.5 η = , 0.75 η = , and 1 η = .Figure 10b illustrates the trajectories of various tags in the left and right images, and all ultimately converged within the ROIs.Due to the inherent limitation of robots with inconsistent stiffness in all directions, the DSRE cannot reach the target at some positions, and it will overshoot after passing through these positions, which resulted in the controller's tracking effect on tag 1 being inferior to several others, as depicted by the blue curve in Figure 10b.Figure 10c and Table 3 detail the pixel errors and convergence times for tracking different tags, respectively.As indicated by (17), when the gain was low, the robot's end effector moved at a reduced velocity, resulting in excessive time consumption.Conversely, with high gain, the end effector's velocity increased, potentially leading to overshoot and oscillation.With the gain set at constant values of 5, 0.75, and 1, the classic controller required an average of 19.4 s, 8.29 s, and 9.04 s, respectively, to achieve the tracking of a static target, reducing the error to below 15 pixels.Meanwhile, the proposed control algorithm necessitated only 4.99 s, corresponding to a reduction in time consumption of 74.28%, 39.81%, and 37.94%, T with a radius of 15 pixels, as shown by the red circle and the green dashed circle in Figure 10b.The initial orientation of the DSRE robot was positioned to face the initial target with the AprilTag ID 3.After initialization, the robot was controlled to bring tags with different IDs to the ROI areas.These tags were equidistantly distributed at 90-degree intervals along the circumference of a circle.In each static tag tracking experiment, the adaptively adjusted gains of the ASVS controller were compared with the classic visual servoing method [28] employing fixed gains of η = 0.5, η = 0.75, and η = 1. Figure 10b illustrates the trajectories of various tags in the left and right images, and all ultimately converged within the ROIs.Due to the inherent limitation of robots with inconsistent stiffness in all directions, the DSRE cannot reach the target at some positions, and it will overshoot after passing through these positions, which resulted in the controller's tracking effect on tag 1 being inferior to several others, as depicted by the blue curve in Figure 10b.Figure 10c and Table 3 detail the pixel errors and convergence times for tracking different tags, respectively.As indicated by (17), when the gain was low, the robot's end effector moved at a reduced velocity, resulting in excessive time consumption.Conversely, with high gain, the end effector's velocity increased, potentially leading to overshoot and oscillation.With the gain set at constant values of 5, 0.75, and 1, the classic controller required an average of 19.4 s, 8.29 s, and 9.04 s, respectively, to achieve the tracking of a static target, reducing the error to below 15 pixels.Meanwhile, the proposed control algorithm necessitated only 4.99 s, corresponding to a reduction in time consumption of 74.28%, 39.81%, and 37.94%, respectively.This result illustrates the robustness and reliability of this method in achieving the accurate control of the soft robot arm and precise tracking of the moving target.This part introduces a tracking experiment designed to evaluate the control performance of the robot under dynamic target changes.As demonstrated in the previous experiment, comparisons were made between the ASVS and classical algorithms using different fixed gains.Figure 11a shows the experimental setup, with tags affixed to an electric linear slide rail, which moved back and forth between two points, A and B, 15 mm apart, at a specified velocity of 0.75mm/s = tag v . The endoscopic images captured during the experiment and the corresponding tracking errors are presented in Figure 11b,c, respectively.The robot was tasked to follow tag 9, while the linear guide remained stationary until the initial stable tracking of the target was accomplished by the robot.As shown in Figure 11b, a significant surge in error occurred when the tag's movement direction changed, but the controller quickly re-captured the tag and subsequently reduced the tracking difference to less than 15 pixels.Table 4 enumerates the tracking errors' root mean  This part introduces a tracking experiment designed to evaluate the control performance of the robot under dynamic target changes.As demonstrated in the previous experiment, comparisons were made between the ASVS and classical algorithms using different fixed gains.Figure 11a shows the experimental setup, with tags affixed to an electric linear slide rail, which moved back and forth between two points, A and B, 15 mm apart, at a specified velocity of v tag = 0.75mm/s.The endoscopic images captured during the ex-periment and the corresponding tracking errors are presented in Figure 11b,c, respectively.The robot was tasked to follow tag 9, while the linear guide remained stationary until the initial stable tracking of the target was accomplished by the robot.As shown in Figure 11b, a significant surge in error occurred when the tag's movement direction changed, but the controller quickly re-captured the tag and subsequently reduced the tracking difference to less than 15 pixels.Table 4 enumerates the tracking errors' root mean square error (RMSE), mean absolute value error (MAE), standard deviation (SD), and maximum error e max for different controllers.The ASVS outperformed the classic controller across various metrics, maintaining the RMSE and MAE beneath 15 pixels, with an SD of merely 5.14 pixels.This rapid error correction is indicative of the agility and precision of the real-time tracking of dynamic targets.The performance of dealing with external disturbance was also evaluated.During the test, the robot utilized endoscopic image feedback exclusively.An external force of  The performance of dealing with external disturbance was also evaluated.During the test, the robot utilized endoscopic image feedback exclusively.An external force of unknown magnitude and location was applied to the soft robot using a slender beam affixed with a single-axial force sensor, which caused a mutable robot configuration.The control objective was to bring the target tag into the predefined ROIs on the image planes.The robot configuration and experimental results are shown in Figure 12.During the experiment, at 11.6 s, the robot first received an external force of 120 mN, and the tracking error increased from 5.9 pixels to 93.4 pixels.The controller subsequently engaged to resist the external force and tracked the tag to the ROIs within approximately 3.5 s, reducing the error to less than 15 pixels.After the external force was maintained for 5 s, it was removed, and the controller repeated the above error correction process.During the entire test process, the robot was affected by dynamic external forces a total of three times.The maximum external force was 150 mN.The robot could maintain the tracking of the target and the error could converge to five pixels.This shows that image-based feedback control is efficacious in enabling the DSRE robot to adjust its configuration responsively to external environmental conditions, thereby enhancing the tracking precision.unknown magnitude and location was applied to the soft robot using a slender beam affixed with a single-axial force sensor, which caused a mutable robot configuration.The control objective was to bring the target tag into the predefined ROIs on the image planes.
The robot configuration and experimental results are shown in Figure 12.During the experiment, at 11.6 s, the robot first received an external force of 120 mN, and the tracking error increased from 5.9 pixels to 93.4 pixels.The controller subsequently engaged to resist the external force and tracked the tag to the ROIs within approximately 3.5 s, reducing the error to less than 15 pixels.After the external force was maintained for 5 s, it was removed, and the controller repeated the above error correction process.During the entire test process, the robot was affected by dynamic external forces a total of three times.The maximum external force was 150 mN.The robot could maintain the tracking of the target and the error could converge to five pixels.This shows that image-based feedback control is efficacious in enabling the DSRE robot to adjust its configuration responsively to external environmental conditions, thereby enhancing the tracking precision.

Trajectory Tracking
Marking the lesion is an important early step in ESD surgery [7].The end effector of the soft robot carrying the electrosurgical knife should always be perpendicular to the working surface, marking along the periphery of the target lesion by coagulation.This experiment aims to evaluate the feasibility of robots in the marking process, with the DSRE tracking a triangular reference trajectory.This trajectory guided the tag moving along the pre-defined path in the captured images.The experimental trajectories of the pixel positions on the left and right image planes are shown in Figure 13.The overall RMSE of the left and right image errors was 13.51 pixels, which demonstrates that the controller has a good tracking performance for the trajectories.

Trajectory Tracking
Marking the lesion is an important early step in ESD surgery [7].The end effector of the soft robot carrying the electrosurgical knife should always be perpendicular to the working surface, marking along the periphery of the target lesion by coagulation.This experiment aims to evaluate the feasibility of robots in the marking process, with the DSRE tracking a triangular reference trajectory.This trajectory guided the tag moving along the pre-defined path in the captured images.The experimental trajectories of the pixel positions on the left and right image planes are shown in Figure 13.The overall RMSE of the left and right image errors was 13.51 pixels, which demonstrates that the controller has a good tracking performance for the trajectories.

Resection in an Ex Vivo Porcine Stomach
This test was designed to assess the practical efficacy of robotic systems in ESD procedures.As shown in Figure 14, an ex vivo porcine stomach was affixed within a receptacle to simulate surgical conditions.The operator used a gamepad to perform a remote operation program on the DSRE and controlled the robot to complete the ESD surgery in the stomach.Figure 15 shows the entire process of the ESD surgery.According to the DSRE configuration shown in Figure 2c, the angle between the end effector and the horizontal plane was only 12.5°, which illustrates that the robot can cut tissue in multiple directions.Such a small operating angle can improve the cutting efficiency and safety of traditional ESD. Figure 15a,b show the left endoscopic images during ESD marking and cutting, respectively.The operator operated the DSRE to perform electrocoagulation around the lesion and successfully completed a series of marking points.During the cutting process, the operator used tweezers to assist in pulling the target tissue up to expose the cutting point to the robot and used the joystick of the gamepad to adjust the cutting angle of the robot.After the procedure, as illustrated in Figure 15c, the DSRE has completely removed the target tissue mass without causing perforation in the porcine stomach wall.Finally, a piece of tissue measuring approximately 54  mm was successfully removed.Therefore, this ex vivo experiment illustrates the feasibility of applying the DSRE in ESD surgery.

Resection in an Ex Vivo Porcine Stomach
This test was designed to assess the practical efficacy of robotic systems in ESD procedures.As shown in Figure 14, an ex vivo porcine stomach was affixed within a receptacle to simulate surgical conditions.The operator used a gamepad to perform a remote operation program on the DSRE and controlled the robot to complete the ESD surgery in the stomach.Figure 15 shows the entire process of the ESD surgery.According to the DSRE configuration shown in Figure 2c, the angle between the end effector and the horizontal plane was only 12.5 • , which illustrates that the robot can cut tissue in multiple directions.Such a small operating angle can improve the cutting efficiency and safety of traditional ESD. Figure 15a,b show the left endoscopic images during ESD marking and cutting, respectively.The operator operated the DSRE to perform electrocoagulation around the lesion and successfully completed a series of marking points.During the cutting process, the operator used tweezers to assist in pulling the target tissue up to expose the cutting point to the robot and used the joystick of the gamepad to adjust the cutting angle of the robot.After the procedure, as illustrated in Figure 15c, the DSRE has completely removed the target tissue mass without causing perforation in the porcine stomach wall.Finally, a piece of tissue measuring approximately 5 × 4 mm was successfully removed.Therefore, this ex vivo experiment illustrates the feasibility of applying the DSRE in ESD surgery.

Resection in an Ex Vivo Porcine Stomach
This test was designed to assess the practical efficacy of robotic systems in ESD procedures.As shown in Figure 14, an ex vivo porcine stomach was affixed within a receptacle to simulate surgical conditions.The operator used a gamepad to perform a remote operation program on the DSRE and controlled the robot to complete the ESD surgery in the stomach.Figure 15 shows the entire process of the ESD surgery.According to the DSRE configuration shown in Figure 2c, the angle between the end effector and the horizontal plane was only 12.5°, which illustrates that the robot can cut tissue in multiple directions.Such a small operating angle can improve the cutting efficiency and safety of traditional ESD. Figure 15a,b show the left endoscopic images during ESD marking and cutting, respectively.The operator operated the DSRE to perform electrocoagulation around the lesion and successfully completed a series of marking points.During the cutting process, the operator used tweezers to assist in pulling the target tissue up to expose the cutting point to the robot and used the joystick of the gamepad to adjust the cutting angle of the robot.After the procedure, as illustrated in Figure 15c, the DSRE has completely removed the target tissue mass without causing perforation in the porcine stomach wall.Finally, a piece of tissue measuring approximately 5 4 × mm was successfully removed.Therefore, this ex vivo experiment illustrates the feasibility of applying the DSRE in ESD surgery.

Conclusions
This work presents a novel dual-segment soft robotic endoscope with stereo vision designed to enhance reachability and dexterity during ESD surgery.The proposed system can execute multi-angle cutting operations at a small angle relative to the lesion surface, allowing for efficient en bloc resection.Additionally, the system incorporates two calibrated RGB cameras and a depth estimation algorithm to provide real-time 3D information of the tumor, which is also used to guide the control framework.Based on the adaptively updated gain and PD control laws, the stereo vision servo controller improves the convergence speed and path tracking performance during surgery.The experimental results indicate that the proposed system improves the motion stability and precision.Ex vivo testing further demonstrates its significant potential in endoscopic surgery.

Conclusions
This work presents a novel dual-segment soft robotic endoscope with stereo vision designed to enhance reachability and dexterity during ESD surgery.The proposed system can execute multi-angle cutting operations at a small angle relative to the lesion surface, allowing for efficient en bloc resection.Additionally, the system incorporates two calibrated RGB cameras and a depth estimation algorithm to provide real-time 3D information of the tumor, which is also used to guide the control framework.Based on the adaptively updated gain and PD control laws, the stereo vision servo controller improves the convergence speed and path tracking performance during surgery.The experimental results indicate that the proposed system improves the motion stability and precision.Ex vivo testing further demonstrates its significant potential in endoscopic surgery.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/mi15020276/s1,Video S1: the experimental results of the dual-segment robotic endoscope (DSRE).ϕ θ θ In order to simplify the formula, use s and c instead of sine and cosine functions, and the forward kinematics in (2) can be extended to  The pose Θ t ∈ R 6 of the robot end effector can be calculated from (A2) The velocity v t ∈ R 6 of the robot end effector is Thus, the Jacobian matrix J r ∈ R 6×5 in ( 3) is used to represent the approximate linear relationship between the robot tip velocity v t and joint velocity .Φ, which can be derived from the forward kinematics as follows:

Figure 1 .
Figure 1.(a) Flexible endoscopes provide effective minimally invasive treatment through different naturally narrow orifices in the human body.(b) DSRE promotes dexterity and flexibility for ESD, and stereo vision contributes to evaluation before and after surgery.(c) Schematic diagram of the DSRE silicone-body soft segment, which would bend when the air pressure in the chamber increases.

Figure 1 .
Figure 1.(a) Flexible endoscopes provide effective minimally invasive treatment through different naturally narrow orifices in the human body.(b) DSRE promotes dexterity and flexibility for ESD, and stereo vision contributes to evaluation before and after surgery.(c) Schematic diagram of the DSRE silicone-body soft segment, which would bend when the air pressure in the chamber increases.

Figure 2 .
Figure 2. (a) Fabrication process of the DSRE.(b) Detailed dimension.(c) DSRE with two cameras and an electrosurgical knife.

Figure 2 .
Figure 2. (a) Fabrication process of the DSRE.(b) Detailed dimension.(c) DSRE with two cameras and an electrosurgical knife.

Figure 3 .
Figure 3. (a) Configuration of the EM trackers on proximal and distal segments.(b) Bending angles of proximal and distal segments corresponding to the air pressure, the red lines correspond to the maximum air pressure exerted on each segment.

Micromachines 2024, 15 , 276 7 of 22 Figure 4 .
Figure 4. (a) Illustration of coordinate frames.(b) The binocular stereo vision on the tip.3.1.2.Air Pressure to ShapeAfter finding the shape parameters, the robot system should set proper air pressures for each segment.The joint angles s θ , e θ , s ϕ , and e ϕ can be calculated from the actu-

Figure 4 .
Figure 4. (a) Illustration of coordinate frames.(b) The binocular stereo vision on the tip.

Figure 5 .
Figure 5.The application of the SGM algorithm on the DSRE system.

2 ∈
e  for the controller is derived from the difference between the current image features 2 ∈ s  and the desired image features d s , and = e s   .
coordinates in the image frame l l l O x y and pixel frame pl O uv are to the pinhole camera model, the perspective equation can be obtained from the relationship on similar triangles, i.e., as follows: λ are the focal length in pixels, xl c , y l c are the optical center in pixels, and l λ is the focal length in millimeters.

Figure 5 .
Figure 5.The application of the SGM algorithm on the DSRE system.
× is the skew-symmetric matrix of c

Figure 6 .
Figure 6.Flow chart of the ASVS controller.

Figure 6 .
Figure 6.Flow chart of the ASVS controller.

Figure 7 .
Figure 7. Depth estimation validation setup.(a) A base tracker was used to measure the ground truth of depth.(b) The stereo vision on the tip of DSRE.(c) Image of left camera.(d) Image of right camera.

Figure 8 .
Figure 8. Comparative error band analysis of the camera depth estimation results.

Figure 7 .
Figure 7. Depth estimation validation setup.(a) A base tracker was used to measure the ground truth of depth.(b) The stereo vision on the tip of DSRE.(c) Image of left camera.(d) Image of right camera.

Figure 7 .
Figure 7. Depth estimation validation setup.(a) A base tracker was used to measure the ground truth of depth.(b) The stereo vision on the tip of DSRE.(c) Image of left camera.(d) Image of right camera.

Figure 8 .
Figure 8. Comparative error band analysis of the camera depth estimation results.

Micromachines 2024 ,Figure 9 .
Figure 9. (a) The actuator configuration.(b) Experimental setup for the ASVS controller performance verification.4.2.1.Static Target TrackingIn this experiment, the regions of interest (ROIs) within the left and right images were delineated as circular areas centered at points

Figure 9 .
Figure 9. (a) The actuator configuration.(b) Experimental setup for the ASVS controller performance verification.4.2.1.Static Target Tracking In this experiment, the regions of interest (ROIs) within the left and right images were delineated as circular areas centered at points s dl = 210 200 T and s dr = 190 200

Figure 10 .
Figure 10.Static target tracking results.(a) Distribution configuration of target markers.(b) Target movement trajectories in the left and right images, the tags moved from the start points and converged to the end points.(c) Tracking errors of different controllers.

Figure 10 .
Figure 10.Static target tracking results.(a) Distribution configuration of target markers.(b) Target movement trajectories in the left and right images, the tags moved from the start points and converged to the end points.(c) Tracking errors of different controllers.

Micromachines 2024 ,
15, 276 15 of 22 square error (RMSE), mean absolute value error (MAE), standard deviation (SD), and maximum error max e for different controllers.The ASVS outperformed the classic controller across various metrics, maintaining the RMSE and MAE beneath 15 pixels, with an SD of merely 5.14 pixels.This rapid error correction is indicative of the agility and precision of the real-time tracking of dynamic targets.(a)(

Figure 11 .
Figure 11.Dynamic target tracking results.(a) Experiment setup.(b) Endoscopic view when the tag moved.(c) Tracking error of different controllers.

Figure 11 .
Figure 11.Dynamic target tracking results.(a) Experiment setup.(b) Endoscopic view when the tag moved.(c) Tracking error of different controllers.

Figure 12 .
Figure 12.Tracking results under unknown external disturbance.The upper pictures show the robot configurations before, during, and after the external force application.The figure below shows the changes in external force and tracking error results.

Figure 12 .
Figure 12.Tracking results under unknown external disturbance.The upper pictures show the robot configurations before, during, and after the external force application.The figure below shows the changes in external force and tracking error results.

Figure 13 .
Figure 13.Triangular trajectory tracking results.(a) Target movment in the left image.(b) Target movment in the right image.

Figure 13 .
Figure 13.Triangular trajectory tracking results.(a) Target movment in the left image.(b) Target movment in the right image.

Figure 13 .
Figure 13.Triangular trajectory tracking results.(a) Target movment in the left image.(b) Target movment in the right image.

Figure 14 .
Figure 14.Experimental setup of ex vivo trial in a porcine stomach.

Figure 14 .Figure 15 .
Figure 14.Experimental setup of ex vivo trial in a porcine stomach.

Figure 15 .
Figure 15.ESD test results in ex vivo porcine stomach.(a) Marking points.(b) Cutting with assistance.(c) Cutting results.(d) Target tissue after dissection.

Table 1 .
Comparison of the performance of the DSRE with related robot system.

Table 3 .
Convergence times of ASVS and the classic control law with different constant  .

Table 3 .
Convergence times of ASVS and the classic control law with different constant η.

Table 4 .
Control performance of ASVS and the classic control law with different constant  in[28].

Table 4 .
[28]rol performance of ASVS and the classic control law with different constant η in[28].