1. Introduction
Self-driving laboratories (SDLs) are transforming the landscape of autonomous experimentation by integrating robotics, computer vision, and intelligent planning to accelerate scientific workflows. Within these environments, dexterous manipulators equipped with sophisticated grippers play a pivotal role in executing precise and adaptive tasks, often outperforming human operators in terms of reliability and efficiency [1,2,3]. A key challenge in such dynamic settings lies in enabling robotic systems to interact with textured objects in real time, requiring seamless coordination between vision-based perception and motion planning. This coordination must not only ensure accurate object tracking but also generate smooth and feasible joint trajectories for safe and efficient manipulation.
This work presents a foundational control framework designed to address this challenge by coupling real-time object tracking with smooth and stable motion execution. The system integrates a hybrid vision pipeline based on feature detection and homography-driven pose estimation with Jacobian-based motion planning for a 7-DOF manipulator. While the vision module is validated using a highly textured, opaque object to ensure reliable tracking, the primary contribution lies in the rigorous quantitative analysis of joint motion dynamics. Specifically, we evaluate velocity continuity, acceleration, jerk, snap, and smoothness cost functions to benchmark trajectory quality and mechanical stability. Rather than claiming architectural novelty, this study focuses on validating the control pipeline’s ability to generate smooth, feasible trajectories in response to dynamic visual input. Depth information is leveraged to interpret object orientation in 3D space, guiding the manipulator’s end-effector toward dynamically changing targets. Trajectories are generated using the RRT* algorithm, while inverse kinematic solutions are computed using Damped Least Squares (DLS), chosen for its potential to optimize joint smoothness and precision [4]. By establishing a modular and reproducible baseline, this framework lays the groundwork for future extensions involving more complex vision modalities and real-world SDL objects such as transparent glassware, reflective surfaces, and low-texture microplates.
While SDLs ultimately encompass capabilities such as autonomous hypothesis generation, adaptive decision-making, and closed-loop experimental optimization, the present work does not attempt to implement these higher-level functions. Instead, we focus on a foundational subsystem that enables such autonomy: reliable, vision-guided motion execution. The study develops and rigorously evaluates a modular control pipeline that integrates real-time object tracking with smooth and stable manipulator motion. Since SDL platforms depend on precise, repeatable, and dynamically responsive robotic manipulation to carry out experimental actions, this subsystem forms a necessary building block for future SDL architectures. Through quantitative analysis of joint-level smoothness, stability, and tracking performance, the work establishes a reproducible baseline that can be incorporated into more advanced frameworks involving planning, reasoning, and adaptive experimentation. Although robotic manipulation is central to SDL operation, existing frameworks seldom combine real-time feature-based tracking, homography-guided depth estimation, and smooth Jacobian-based motion execution within a unified pipeline. Perception and motion planning are often evaluated in isolation, leaving a gap in integrated, quantitatively validated approaches for dynamic object tracking and smooth trajectory generation. This motivates our contribution: a modular, reproducible baseline for vision-guided motion execution that supports the manipulation layer of SDL environments without overstating system-level autonomy.
The remainder of this paper is organized as follows. Section 2 reviews previous work related to kinematic analysis, motion planning schemes, and vision algorithms. Kinematic modeling and workspace analysis of the manipulator, the motion planning schemes, and the vision algorithm are presented in Section 3. Experimental results, including simulation and comparison studies of the proposed motion schemes, are presented in Section 4.
2. Related Work
In the context of SDLs, mobile manipulators that combine a dexterous arm with a mobile base are proving essential for automating complex experimental workflows. These integrated platforms offer both spatial mobility and fine-grained manipulation capabilities, allowing them to navigate dynamic lab environments and interact with diverse instruments and materials. Their deployment in chemical research settings introduces distinct challenges and opportunities, particularly due to the delicate and potentially hazardous nature of lab operations. Tasks such as handling fragile glassware, precisely dispensing reagents, and interfacing with analytical equipment require high levels of accuracy, repeatability, and safety, making mobile manipulators a cornerstone of autonomous scientific discovery.
Screw theory formulations are widely used for the kinematic modeling of robotic systems with a high number of degrees of freedom (DOF) [5,6]. This approach has proved particularly flexible for modeling complex systems with coupled and offset joints [7,8]. Liu et al. [9] introduced a kinematic modeling approach for a 6-DOF industrial robot utilizing screw theory formulations. They employed a Particle Swarm Optimization (PSO)-based algorithm to minimize synthesis errors while considering kinematic and dynamic constraints. Another method for kinematic modeling of a redundant manipulator, combining screw theory with the Newton-Raphson method, was presented by Ge et al. [10]. They derived forward kinematic equations based on screw theory formulations and obtained joint solutions using the Newton-Raphson method. Screw theory-based kinematic modeling and motion analysis of a fixed-base dual-arm robot were demonstrated by Sulaiman et al. [11]. They utilized screw theory-derived kinematic equations to plot the robot’s workspace. Additionally, Sulaiman et al. [12] derived kinematic equations for a 10-DOF dual-arm robot with a wheelbase using screw theory. These equations were employed to evaluate singularities and dexterous regions within the robot’s workspace. An iterative method for determining forward kinematic equations using screw theory formulations was demonstrated by Medrano et al. [13]. They applied this approach to model a 6-DOF manipulator and conducted simulation studies to demonstrate the advantages of their method.
Recent advancements in vision-based grasping have significantly enhanced the capabilities of robotic manipulators in dynamic and unstructured environments [14,15]. Hélénon et al. [16] introduced a plug-and-play vision-based grasping framework that leverages Quality-Diversity (QD) algorithms to generate diverse sets of open-loop grasping trajectories. Their system integrates multiple vision modules for 6-DoF object detection and tracking, allowing trajectory generalization across different manipulators such as the Franka Research 3 and UR5 arms. This modular approach improves adaptability and reproducibility in robotic grasping tasks. Kushwaha et al. [17] proposed a vision-based intelligent grasping system using sparse neural networks to reduce computational overhead while maintaining high grasp accuracy. Their Sparse-GRConvNet and Sparse-GINNet architectures utilize the Edge-PopUp algorithm to identify high-quality grasp poses in real time. Extensive experiments on benchmark datasets and a cobot validated the models’ effectiveness in manipulating unfamiliar objects with minimal network parameters. Wang et al. [18] developed a trajectory planning method for manipulator grasping under visual occlusion using monocular vision and multi-layer neural networks. Their approach combines Gaussian sampling with Hopfield neural networks to optimize grasp paths in cluttered environments. The proposed method achieved a 99.5% identification accuracy and demonstrated significant improvements in motion smoothness and efficiency.
Zhang et al. [19] presented a comprehensive survey of robotic grasping techniques, tracing developments from classical analytical methods to modern deep learning-based approaches. Their work highlights the evolution of grasp synthesis and the integration of vision algorithms in robotic manipulation. Similarly, Newbury et al. [20] reviewed deep learning approaches to grasp synthesis, emphasizing the role of convolutional neural networks and transformer models in improving grasp reliability. Du et al. [21] provided a detailed review of vision-based robotic grasping, covering object localization, pose estimation, and grasp inference for parallel grippers. Their analysis underscores the importance of combining RGB-D data with machine learning models to enhance grasp precision in real-world scenarios. These findings align with the growing trend of integrating vision and motion planning for dexterous manipulation. In addition to grasp synthesis, trajectory planning remains a critical component of robotic manipulation. Zhang et al. [22] proposed a time-optimal trajectory planning strategy that incorporates dynamic constraints and input shaping algorithms to improve motion speed and smoothness. Finally, Ortenzi et al. [23] developed an iterative method for determining joint solutions of redundant manipulators performing telemanipulation tasks. Their approach avoids singularities and joint limits, enabling smooth and reliable motion execution. These contributions collectively demonstrate the importance of integrating vision-based perception with robust motion planning schemes to enhance the adaptability, precision, and safety of robotic manipulators in complex environments.
Existing research in SDLs has made notable progress in robotic manipulation, motion planning, and vision-based perception, yet several key limitations persist. Many frameworks lack real-time adaptability in dynamic lab environments, particularly when interacting with textured or moving objects. Vision systems are often limited to static object detection or rely on depth sensors, without integrating feature-based tracking and homography-driven depth estimation for textured surfaces. Furthermore, while screw theory has been widely applied for kinematic modeling, its use in deriving both forward and inverse kinematics with stability guarantees under redundancy and singularities is still underexplored. Combining screw theory with the Damped Least Squares (DLS) method addresses this gap by enabling smooth and continuous joint trajectories. In motion planning, although algorithms like RRT* are known for optimality, their coupling with real-time visual feedback for adaptive trajectory generation remains limited in the current literature. The proposed framework bridges this gap by combining RRT*-based planning with a vision pipeline that supports dynamic pose estimation and grasping. Additionally, few studies offer a unified simulation-based evaluation of these components using quantitative metrics such as RMSE pose errors, velocity continuity, and higher-order motion profiles. By addressing these gaps, our work contributes a cohesive and scalable solution for autonomous experimentation in next-generation SDLs.
4. Results and Discussion
The robotic manipulation pipeline in this study starts with a kinematic formulation, where inverse kinematics is solved and the workspace is analyzed to identify valid end-effector configurations. This step establishes the robot’s ability to access and interact with objects throughout its reachable area. Afterward, object detection is carried out using feature extraction and matching methods, allowing the system to recognize and determine the position of target items within the scene. Once detected, pose estimation is employed to determine the precise position and orientation of the object, providing critical spatial information for manipulation. The final stage involves tracking the textured object, in which the robot dynamically adjusts its position and orientation to follow the object. This step-by-step pipeline enables the robot to operate autonomously with both reliability and adaptability, even in dynamic or unstructured environments. The object detection and pose estimation method was integrated into the Robot Operating System (ROS) and implemented using OpenCV 4.11.0 on an Ubuntu 20.04 machine. The experiments were carried out on a system equipped with a GHz Intel Core processor and 16 GB of RAM. To assess the performance of the algorithm, a simulation environment was developed using Rviz and Gazebo, where a mobile manipulator interacts with a textured book cover, as shown in Figure 4. The complete simulation setup, including both the Gazebo world and the Rviz visualization, is presented in Figure 4a,b.
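To illustrate how such a feature-based detection and pose-estimation stage can be composed with OpenCV, the following is a minimal sketch, assuming a known-size planar template image of the book cover and a calibrated camera intrinsic matrix K. The function name, thresholds, and minimum-match count here are illustrative assumptions, not the exact implementation used in this study.

```python
import cv2
import numpy as np

def estimate_object_pose(template, frame, K, obj_w, obj_h):
    """Detect a textured planar object and estimate its 3D pose.

    template: grayscale image of the planar object (e.g., the book cover)
    frame:    grayscale camera frame
    K:        3x3 camera intrinsic matrix
    obj_w/h:  physical width/height of the object in meters (assumed known)
    """
    sift = cv2.SIFT_create()
    kp_t, des_t = sift.detectAndCompute(template, None)
    kp_f, des_f = sift.detectAndCompute(frame, None)
    if des_t is None or des_f is None:
        return None

    # FLANN matching with Lowe's ratio test to prune ambiguous correspondences
    flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5}, {"checks": 50})
    matches = flann.knnMatch(des_t, des_f, k=2)
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < 0.7 * p[1].distance]
    if len(good) < 10:
        return None  # too few correspondences to trust a homography

    src = np.float32([kp_t[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_f[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC homography rejects outliers caused by occlusion or mismatches
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None

    # Project the template corners into the frame, then recover pose with PnP
    h_px, w_px = template.shape[:2]
    corners_px = np.float32([[0, 0], [w_px, 0], [w_px, h_px], [0, h_px]])
    img_pts = cv2.perspectiveTransform(corners_px.reshape(-1, 1, 2), H)
    obj_pts = np.float32([[0, 0, 0], [obj_w, 0, 0],
                          [obj_w, obj_h, 0], [0, obj_h, 0]])
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, None)
    return (rvec, tvec) if ok else None
```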
The motion strategies were evaluated based on the smoothness of joint motions using a smoothness function and the error range relative to the desired end-effector trajectory. Additionally, the velocity, acceleration, jerk, and snap values of the motions were determined to analyse the joint motions. The proposed method aimed to gradually adjust joint positions and orientations from a stable state. However, due to the arbitrary number of iterations, the time required to find an inverse kinematics (IK) solution for a given end-effector pose was variable. Despite this, the duration of a single iteration remained constant with respect to the dimensionality of $J$ and $e$, and was unaffected by the full algorithm’s completion. To address this variability, a maximum time limit for the algorithm was enforced by setting an upper bound on the number of iterations. In Jacobian-based inverse kinematics solvers, the task-space dimensionality of $J$ in 3-dimensional space is typically either 3 or 6. A 3-dimensional formulation encodes only the positional information for the end-effector, while a 6-dimensional formulation is often preferred as it includes both positional and orientation information. In this work, a 6-dimensional task-space vector was chosen to account for both positional and orientation components. The system was deemed repeatable if a given goal vector $t$ consistently produced the same pose vector. However, achieving repeatability in redundant systems requires special measures, as this consistency is not guaranteed inherently. An alternative approach involves resetting the system to a predefined default pose, ensuring repeatable solutions. However, this method may introduce sharp discontinuities in the solution trajectory. For every inverse solution technique employed in this work, the error vector $e$ was assigned values as described in (19):

$e = t - s$,  (19)

where $s$ denotes the current end-effector pose. The joint update $\Delta\theta$ was determined such that it moves $s$ closer to $t$; the starting iteration assumes $s$ as the initial end-effector pose. The stopping conditions were implemented to improve the performance of the method and reduce computational effort during the iterations. In this work, the stopping criteria were as follows (a code sketch illustrating these rules is given after the list):
Finding a solution within the error limits given in (20).
Convergence to a local minimum.
Non-convergence after the allotted time.
Maximum iterations reached.
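To make these stopping rules concrete, the sketch below outlines the iteration skeleton they govern. It assumes hypothetical helpers forward_kinematics and jacobian, and the dls_step update sketched further below; the tolerance and time-budget values are placeholders rather than the exact limits used in this work.

```python
import time
import numpy as np

def solve_ik(theta0, t_goal, link_lengths, eps=1e-3, max_iters=200, time_budget=0.05):
    """Iterative IK loop governed by the four stopping criteria listed above.

    theta0:      initial joint vector
    t_goal:      desired 6-D end-effector vector (position + orientation)
    eps:         error tolerance (placeholder value)
    max_iters:   upper bound on iterations, bounding total runtime
    time_budget: allotted wall-clock time in seconds (placeholder value)
    """
    theta = theta0.copy()
    start = time.monotonic()
    prev_err = np.inf
    for _ in range(max_iters):                      # criterion 4: max iterations
        e = t_goal - forward_kinematics(theta)      # e = t - s, Eq. (19)
        err = np.linalg.norm(e)
        if err < eps:                               # criterion 1: within error limits
            return theta, "converged"
        if abs(prev_err - err) < 1e-9:              # criterion 2: local minimum;
            return theta, "local_minimum"           # caller may randomize and retry
        if time.monotonic() - start > time_budget:  # criterion 3: allotted time
            return theta, "timeout"
        theta = theta + dls_step(jacobian(theta), e, link_lengths)
        prev_err = err
    return theta, "max_iterations"
```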
If the solution converges to a local minimum, pose vectors can be randomized to avoid recurrence in future iterations. An allotted time can be specified for each step to prevent the method from exceeding a predetermined duration. Additionally, the maximum number of steps can be set to limit computational time. The iteration step size, $\alpha$, was determined using the desired $e$ and $J$ as given in (20):

$\alpha = \dfrac{\langle e,\; J J^{T} e \rangle}{\langle J J^{T} e,\; J J^{T} e \rangle}$.  (20)

The step size was limited by scaling it with $\alpha$. The determination of $\alpha$ was carried out after computing $\Delta\theta$ for better approximation. A common approach involved limiting joint rotation increments to a maximum of 5 degrees per iteration. Initially, $\Delta\theta$ was computed without including $\alpha$, and later checked to ensure no $\Delta\theta$ values exceeded the threshold $\gamma$, applied as the clamping rule in Equation (21):

$\Delta\theta_i \leftarrow \operatorname{sign}(\Delta\theta_i)\,\min\!\left(\lvert\Delta\theta_i\rvert,\;\gamma\right), \quad \gamma = 5^{\circ}$.  (21)

After finalizing $\alpha$, the joint values $\theta$ were updated during iterations using (22):

$\theta \leftarrow \theta + \alpha\,\Delta\theta$.  (22)
The damping constant $\lambda$ of the DLS method was determined by multiplying the magnitude of the desired change in the end effector by a small constant that was set at each iteration. The small constant was taken as 1/100 of the total length of the segments from base to end effector. This constant was added to prevent error oscillations that arise when the desired change is too small to damp, a problem that occurs when successive target positions are very close to each other. The inclusion of $\lambda$ prevents the Jacobian matrix from converging to a singular space at any time. An orthogonal set of vectors equal in length to the desired change vector results in a good damping effect; however, this increases the computational time required to calculate the joint values. Each iteration attempted to lower the errors toward the desired values, and the iteration stopped when the stopping criteria specified above were met.
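A minimal NumPy sketch of the DLS update consistent with this description follows. The damping heuristic (1/100 of the summed link lengths, scaled by the error magnitude) mirrors the paragraph above, and the clamping helper mirrors Equation (21); treat it as an illustrative reconstruction, not the verbatim implementation.

```python
import numpy as np

def dls_step(J, e, link_lengths, gamma_max_deg=5.0):
    """One Damped Least Squares update: dtheta = J^T (J J^T + lambda^2 I)^-1 e.

    J:            6 x n Jacobian at the current joint configuration
    e:            6-D task-space error vector, e = t - s (Eq. 19)
    link_lengths: lengths of the segments from base to end effector
    """
    # Damping: 1/100 of the total arm length, scaled by the error magnitude,
    # so very small desired changes are still damped enough to avoid oscillation.
    lam = 0.01 * np.sum(link_lengths) * np.linalg.norm(e)

    m = J.shape[0]
    dtheta = J.T @ np.linalg.solve(J @ J.T + (lam ** 2) * np.eye(m), e)

    # Clamp joint increments to at most gamma_max_deg per iteration (cf. Eq. 21).
    gamma = np.deg2rad(gamma_max_deg)
    return np.clip(dtheta, -gamma, gamma)
```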
A RealSense camera mounted on the mobile platform (as shown in Figure 4) was used to determine the initial pose of the textured book cover. After the pose was obtained, the RRT* motion planner generated a trajectory that enabled the manipulator to track the book’s motion while keeping the end effector in a grasp-ready configuration. The DLS approach was applied to compute the joint angles required to follow this path. The outputs of the vision module are displayed in Figure 5a,b.
Figure 5a presents the detected bounding box around the object, while Figure 5b shows the resulting pose estimation. These visual outcomes demonstrate that the algorithm successfully identifies and orients the target, supplying the necessary information for the subsequent manipulation stage. The mobile manipulator was placed in front of a book as shown in Figure 4. The manipulator follows a trajectory computed via the Rapidly exploring Random Tree Star (RRT*) algorithm, with corresponding joint configurations derived using the Damped Least Squares (DLS) inverse kinematics method. The end-effector pose is defined such that the manipulator’s gripper approaches the book frontally, maintaining a face-to-face orientation at a distance of 15 cm along the x-axis, as shown in Figure 6 and Figure 7. Figure 6 and Figure 7 illustrate the manipulator’s progression through its initial, intermediate, and final configurations as it approaches the target pose, visualized in Rviz and Gazebo environments, respectively. To further validate the system’s manipulation capabilities, we performed tracking experiments following the motions of the book. During tracking, the camera mounted on the manipulator was used to detect the updated poses of the book.
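As a concrete illustration of this grasp-ready placement, the snippet below derives a face-to-face approach pose by offsetting the estimated object pose 15 cm along its surface normal. The 4x4 homogeneous-transform representation and the assumption that the object frame's x-axis is its surface normal are illustrative choices, not necessarily the exact conventions used in the implementation.

```python
import numpy as np

def grasp_approach_pose(T_obj, standoff=0.15):
    """Compute an end-effector target facing the object front at a fixed standoff.

    T_obj:    4x4 homogeneous transform of the object in the robot base frame
    standoff: approach distance in meters (15 cm in our experiments)
    """
    T_ee = T_obj.copy()
    normal = T_obj[:3, 0]                 # object x-axis taken as the surface normal
    T_ee[:3, 3] = T_obj[:3, 3] + standoff * normal
    # Face the object: end-effector x-axis points back along the object normal.
    T_ee[:3, 0] = -normal
    T_ee[:3, 1] = -T_obj[:3, 1]           # flip y to keep a right-handed frame
    T_ee[:3, 2] = np.cross(T_ee[:3, 0], T_ee[:3, 1])  # z = (-x) x (-y) = x x y
    return T_ee
```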
Figure 8 and Figure 9 present snapshots of the manipulator’s motion in Rviz and Gazebo as it responds to variations in the book’s pose. These images illustrate how the robot updates its motion plan and adjusts both its joints and end-effector orientation to reach the target. Although the mobile base remains fixed, the arm continuously adapts its configuration using the pose information supplied by the vision module. The colored coordinate axes shown in each frame correspond to the estimated pose of the book, emphasizing the system’s capability to track and align with the object during the approach. Collectively, these visual results demonstrate the robustness of the combined perception and planning pipeline in enabling accurate and responsive interaction with the target.
The RRT* planner is used only during the initial approach, where it computes an optimal, collision-free path from the robot’s current configuration to a region near the target. After the end-effector arrives at this location, the system switches to a real-time tracking mode. In this stage, continuous pose updates from the vision module (running at roughly 13 FPS) are fed into a DLS-based inverse kinematics controller, which applies small corrective motions. Rather than generating new RRT* plans at every update, the DLS controller functions within a visual-servoing loop, compensating for the instantaneous pose error between the end-effector and the newly estimated target pose. By separating long-range global planning (handled by RRT*) from short-range local tracking (handled by DLS servoing), the system achieves both computational efficiency and smooth, responsive motion. The experimental results confirmed that the proposed framework can smoothly transition from visual pose estimation and trajectory planning to direct physical interaction, demonstrating strong performance in both tracking and manipulation scenarios. All planning and execution were performed using the ROS MoveIt environment, which provided stable and adaptable path optimization for accurate object handling. To obtain quantitative performance insights, we conducted a series of trials using a textured book cover placed at multiple positions within the camera’s field of view, under varying environmental conditions. The system achieved a tracking accuracy of 96.7%, as given in Table 2, indicating consistent pose estimation until occlusion occurred due to end-effector interference. Pose estimation error remained within ±0.63 cm, confirming the system’s suitability for precision grasping tasks. The maximum pose estimation error (±0.63 cm) was computed by comparing the output of the vision pipeline to the ground-truth object pose provided by the Gazebo simulation environment. The detection module achieved an average processing latency of 75 ms per frame, corresponding to a real-time performance of approximately 13.30 FPS. The system attained a precision of 97.1% and a recall of 96.5%, demonstrating reliable object identification and localization even under challenging lighting and background conditions. These results indicate that the approach is well-suited for dynamic manipulation tasks in semi-structured environments and generalizes effectively across different object types and camera viewpoints. Overall, the demonstrated performance highlights its potential for deployment in practical applications such as automated sorting, assistive robotic systems, and mobile manipulation.
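The two-phase control flow described above can be summarized in a short loop. The sketch below reuses estimate_object_pose, grasp_approach_pose, and dls_step from the earlier sketches, while plan_rrt_star, execute_path, to_matrix, pose_error, and the robot/camera interfaces are hypothetical helpers; the book-cover dimensions are placeholders.

```python
import time
import numpy as np

def track_object(robot, camera, K, template, obj_w=0.20, obj_h=0.28, rate_hz=13.0):
    """Phase 1: one-shot RRT* approach; Phase 2: DLS visual servoing, no replanning."""
    # --- Phase 1: global plan to a grasp-ready pose near the initial target ---
    pose = estimate_object_pose(template, camera.grab(), K, obj_w, obj_h)
    target = grasp_approach_pose(to_matrix(pose))      # 15 cm frontal standoff
    execute_path(robot, plan_rrt_star(robot.joint_state(), target))

    # --- Phase 2: servo on continuous pose updates from the vision module ---
    dt = 1.0 / rate_hz
    while True:
        pose = estimate_object_pose(template, camera.grab(), K, obj_w, obj_h)
        if pose is not None:
            target = grasp_approach_pose(to_matrix(pose))
            e = pose_error(robot.ee_pose(), target)    # 6-D instantaneous error
            dq = dls_step(jacobian(robot.joint_state()), e, robot.link_lengths)
            robot.command_joint_velocities(dq / dt)    # small corrective motion
        time.sleep(dt)                                 # hold ~13 Hz update rate
```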
Table 3 presents a comparative evaluation of the proposed motion scheme against several established pose-estimation and position-error detection methods. The results clearly demonstrate the superior performance of the proposed framework in both recall and positional accuracy. In terms of pose-estimation recall, the proposed scheme achieves a value of 96%, substantially outperforming all competing approaches. Classical feature-matching and voting-based techniques adopted in [32], such as Geometric Consistency (GC) [33], Hough Transform (HG) [34], Search of Inliers (SI) [35], Spectral Technique (ST) [36], NNSR [37], and RANSAC [26], exhibit significantly lower recall values ranging from 0% to 31%. These methods are known to be sensitive to feature noise, viewpoint variations, and partial occlusions, which limits their robustness in dynamic laboratory environments. The near-perfect recall of the proposed method highlights its reliability in consistently detecting and tracking the target object under varying visual conditions. A similar trend is observed in the position-error comparison. Existing approaches such as Dong and Rodriguez et al. [38] report mean errors as high as 1.90 mm, while Li et al. [39] achieve 0.14 mm. More recent high-precision methods such as Zhao et al. [40] report errors in the range of 0.068–0.102 mm. In contrast, the proposed scheme attains a lower mean position error of 0.09 mm, placing it within the same high-accuracy regime while maintaining a simpler and more computationally efficient pipeline. This improvement can be attributed to the integration of homography-based pose estimation with the DLS inverse kinematics strategy, which enhances both spatial accuracy and motion stability. Overall, the results confirm that the proposed vision-guided motion framework not only surpasses traditional feature-based pose-estimation techniques in recall but also delivers competitive or superior positional accuracy relative to state-of-the-art methods. These findings underscore the suitability of the proposed approach for precise, reliable, and adaptive manipulation tasks in autonomous SDL environments.
4.1. Evaluation of Trajectory Motions
Maximum errors and RMSE values of the Cartesian motions obtained using the DLS method are shown in Table 4. The maximum translational errors range from 0.74 mm to 1.11 mm, while rotational errors peak at 1.75 deg. The RMSE values indicate consistent performance, with translational deviations remaining below 1 mm and rotational deviations under 1.10 deg. These results demonstrate the DLS method’s effectiveness in maintaining low pose errors across the trajectory. Velocity continuity (VC), acceleration profile (AP), jerk, snap, smoothness, and RMSE errors of the end effector were evaluated. Maximum AP, jerk, and snap values of the motions were calculated to analyse the behaviour of the motions. VC, AP, jerk, and snap values of the manipulator joints are given in Table 5.
Table 5 presents the dynamic motion metrics of the manipulator joints, including velocity continuity (VC), acceleration profile (AP), jerk, and snap, evaluated across all seven joints ($J_1$ to $J_7$). The VC values range from 0.08 deg/s to 0.30 deg/s, indicating smooth transitions in joint velocities without abrupt discontinuities. The acceleration profiles span from 0.75 deg/s² to 1.9 deg/s², reflecting the rate of change in velocity during motion execution. Jerk values, which quantify the variation in acceleration, remain below 0.41 deg/s³, suggesting well-regulated dynamic behavior. Similarly, snap values, representing the fourth derivative of position, are maintained below 0.46 deg/s⁴, confirming the overall smoothness and stability of the joint trajectories. These metrics collectively demonstrate that the motion planning and control strategies employed yield dynamically consistent and mechanically safe joint movements suitable for precise and compliant manipulation tasks.
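These derivative-based metrics can be computed directly from the sampled joint trajectories by repeated finite differencing; a minimal sketch under that assumption follows. The exact definitions of VC and AP used in Table 5 may differ, so the aggregation choices here (maximum absolute values per joint) are illustrative.

```python
import numpy as np

def joint_motion_metrics(theta, dt):
    """Per-joint velocity/acceleration/jerk/snap summaries from sampled angles.

    theta: (T, n) array of joint angles in degrees, sampled every dt seconds
    dt:    sampling period in seconds
    """
    vel = np.gradient(theta, dt, axis=0)        # deg/s
    acc = np.gradient(vel, dt, axis=0)          # deg/s^2
    jerk = np.gradient(acc, dt, axis=0)         # deg/s^3
    snap = np.gradient(jerk, dt, axis=0)        # deg/s^4
    return {
        # velocity continuity: largest jump between consecutive velocity samples
        "VC": np.max(np.abs(np.diff(vel, axis=0)), axis=0),
        "AP": np.max(np.abs(acc), axis=0),      # peak acceleration per joint
        "jerk": np.max(np.abs(jerk), axis=0),   # peak jerk per joint
        "snap": np.max(np.abs(snap), axis=0),   # peak snap per joint
    }
```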
The DLS method performed better in terms of the VC, AP, jerk, and snap values of the motions. Motions obtained using the Jacobian transpose (JT) method exhibited lower velocity continuity, unstable acceleration profiles, and higher jerk and snap values.
Table 6 presents the smoothness values associated with the individual joint motions of the manipulator, denoted as $J_1$ through $J_7$. The smoothness metric serves as a quantitative indicator of trajectory continuity, where lower values correspond to reduced dynamic fluctuations in joint motion, specifically in velocity, acceleration, jerk, and snap, thereby reflecting smoother and more mechanically stable trajectories. The results indicate that most joints exhibit smooth motion profiles, with values ranging from 0.78 to 1.57. Notably, one joint shows a higher smoothness value of 2.14, suggesting increased variability or dynamic complexity in its motion. Overall, the smoothness metrics reflect well-conditioned joint behavior, contributing to stable and precise end-effector control throughout the manipulation task.
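One common way to express such a composite smoothness cost is a weighted sum of the mean squared higher-order derivatives; the sketch below uses that form as an assumption, since the paper's exact cost function is defined in Section 3 and may weight the terms differently.

```python
import numpy as np

def smoothness_cost(theta_j, dt, weights=(1.0, 1.0, 1.0, 1.0)):
    """Composite smoothness cost for one joint trajectory (lower = smoother).

    theta_j: (T,) sampled angles of a single joint in degrees
    dt:      sampling period in seconds
    weights: relative weighting of velocity/acceleration/jerk/snap terms
    """
    d = theta_j
    cost = 0.0
    for w in weights:
        d = np.gradient(d, dt)          # next derivative order
        cost += w * np.mean(d ** 2)     # mean squared fluctuation at this order
    return cost
```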
4.2. Discussion
The DLS method determined feasible joint motions with optimum accuracy and feasibility. It prevented the occurrence of high joint velocities and provided smooth motion even near singular regions by employing an appropriate damping factor. The performance of the DLS-based inverse kinematics solver is influenced by the choice of the damping factor $\lambda$. In our implementation, the system exhibits low to moderate sensitivity to variations in $\lambda$, primarily because the trajectories are smooth and the manipulator does not operate near severe singularities during the evaluated tasks. The behavior across damping regimes is summarized below; a short sketch for probing this sensitivity follows the list.
Small damping values yield highly accurate joint updates but may amplify noise in the Jacobian pseudo-inverse, causing oscillations when the manipulator approaches singular configurations.
Moderate damping values provide a good balance between accuracy and stability. In this range, the system maintains smooth joint trajectories, low RMSE pose errors, and stable velocity/acceleration profiles.
Large damping values overly suppress the Jacobian, leading to slower convergence and slightly increased tracking error, but the system remains stable and does not exhibit discontinuities.
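A minimal way to probe this sensitivity is to sweep fixed damping values and record the error decay; the sketch below substitutes a constant $\lambda$ for the adaptive one used in our implementation and relies on the hypothetical forward_kinematics/jacobian helpers, so both the harness and the example $\lambda$ values are illustrative.

```python
import numpy as np

def sweep_damping(theta0, t_goal, lambdas, iters=100):
    """Probe DLS sensitivity by tracking error decay for fixed damping values."""
    results = {}
    for lam in lambdas:
        theta = theta0.copy()
        errs = []
        for _ in range(iters):
            J = jacobian(theta)                       # hypothetical helper
            e = t_goal - forward_kinematics(theta)    # hypothetical helper
            dtheta = J.T @ np.linalg.solve(
                J @ J.T + (lam ** 2) * np.eye(J.shape[0]), e)
            theta = theta + dtheta
            errs.append(np.linalg.norm(e))
        results[lam] = errs                           # error history per lambda
    return results

# Example sweep over small/moderate/large damping (placeholder values):
# histories = sweep_damping(theta0, t_goal, lambdas=[1e-3, 1e-1, 1.0])
```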
When the target object is located near the boundary of the manipulator’s workspace, the proposed framework handles this situation through a combination of workspace analysis, trajectory planning, and the stabilizing properties of the DLS inverse kinematics method. First, the full Cartesian workspace of the manipulator and gripper is precomputed during the modeling stage. This allows the system to determine whether a target pose is fully reachable or lies close to the workspace limits. When the object is near the boundary, the RRT* planner automatically generates trajectories that remain within feasible regions and avoids joint-limit violations. Second, the DLS inverse kinematics formulation provides additional robustness near workspace boundaries. As the manipulator approaches stretched or near-singular configurations, the damping term prevents numerical instability and ensures smooth joint updates. Finally, if the target pose lies outside the reachable workspace, the system does not attempt to force an infeasible configuration. Instead, it selects the closest reachable pose and plans a safe trajectory to that point, ensuring stable behavior without abrupt or unsafe joint motions. While the experiment utilized a textured book cover as a proxy object, the demonstrated capabilities, particularly robust tracking of textured planar surfaces and smooth, redundant motion planning, are directly transferable to high-value tasks in an SDL environment.
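The fallback to the closest reachable pose could be realized, for instance, by querying the precomputed workspace point set from the modeling stage; the snippet below uses a simple nearest-neighbor lookup as an illustrative strategy under that assumption, not necessarily the exact one implemented.

```python
import numpy as np

def clamp_to_workspace(target_pos, workspace_points, margin=0.01):
    """Return the target if reachable, else the closest precomputed reachable point.

    target_pos:       desired end-effector position, shape (3,)
    workspace_points: (N, 3) reachable positions sampled during workspace analysis
    margin:           reachability tolerance in meters (placeholder value)
    """
    dists = np.linalg.norm(workspace_points - target_pos, axis=1)
    if np.min(dists) < margin:       # target lies within the sampled workspace
        return target_pos
    return workspace_points[np.argmin(dists)]  # plan safely to the closest pose
```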
The system maintains stable tracking under partial visibility (at least 60% of the textured object visible) through a combination of redundant feature extraction, FLANN-based matching, and RANSAC homography estimation. RANSAC effectively rejects outlier correspondences introduced by occlusion, enabling reliable homography computation even when only a subset of the object is visible. Depth information is further integrated to stabilize the surface normal when visual features are sparse. The current vision pipeline, which relies on feature-based detection and homography-driven pose estimation, cannot generalize to more challenging SDL objects such as transparent glassware, reflective instrument panels, or low-texture plastic microplates. These object classes often lack sufficient visual features for reliable SIFT-based tracking and require alternative sensing modalities (e.g., depth fusion, polarization imaging, or thermal vision) or learning-based approaches (e.g., deep neural networks trained on domain-specific datasets) to ensure robust perception. However, the primary objective of this study is to establish a foundational motion and control framework that is modular and sensor-oriented. The architecture is designed to accommodate future upgrades to the perception stack without requiring changes to the underlying kinematic modeling or trajectory optimization methods. In this regard, the current implementation serves as a proof-of-concept that validates the motion planning and control pipeline under ideal visual conditions, providing a baseline for future extensions.
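A lightweight way to gate tracking on partial visibility is to threshold the RANSAC inlier ratio returned by the homography step, as the snippet below illustrates using the inlier mask from cv2.findHomography. The 0.6 threshold stands in for the 60% visibility criterion; this is an assumed proxy, since inlier ratio and visible fraction are related but not identical.

```python
import cv2

def visible_enough(src_pts, dst_pts, min_inlier_ratio=0.6):
    """Accept a detection only if enough correspondences survive RANSAC."""
    H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
    if H is None or mask is None:
        return False, None
    inlier_ratio = float(mask.sum()) / len(mask)   # fraction of inlier matches
    return inlier_ratio >= min_inlier_ratio, H
```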
Although this work is motivated by the broader vision of SDLs, it does not attempt to implement the full spectrum of SDL autonomy, such as hypothesis generation, experimental reasoning, or closed-loop optimization. Instead, the system presented here is to be understood as a manipulation-level subsystem that enables those higher-level capabilities. By focusing on the integration of real-time feature-based tracking, homography-guided depth estimation, and smooth Jacobian-based motion execution, we address a foundational layer that many SDL frameworks implicitly assume but rarely evaluate in a unified manner. The quantitative analysis of smoothness, stability, and tracking accuracy demonstrates that reliable, reproducible motion execution can be achieved even under dynamic visual input, an essential prerequisite for any platform that aims to automate experimental procedures. Positioning the contribution in this way ensures that the claims remain aligned with the demonstrated capabilities while highlighting the subsystem’s importance for future SDL architectures that incorporate planning, reasoning, and adaptive experimentation.
5. Conclusions
This study introduced a multimodal control framework that integrates vision-guided object tracking with Jacobian-based motion planning for autonomous manipulation in SDL environments. The proposed system combined feature-based detection and homography-driven pose estimation with RRT*-based trajectory generation and inverse kinematics computed via the Damped Least Squares (DLS) method. Kinematic modeling using screw theory enabled robust workspace analysis and stable joint solutions, even in the presence of redundancy and near-singular configurations. Quantitative evaluations demonstrated the effectiveness of the framework. The manipulator achieved a maximum Cartesian pose error of 1.11 mm and a root mean square error (RMSE) of 0.97 mm across translational axes. Rotational RMSE remained below 1.10 deg, confirming precise orientation tracking. Joint motion profiles showed velocity continuity values ranging from 0.08 deg/s to 0.30 deg/s, with acceleration profiles peaking at 1.9 deg/s². Higher-order metrics such as jerk and snap were maintained below 0.41 deg/s³ and 0.46 deg/s⁴, respectively, indicating smooth and dynamically stable trajectories. Additionally, joint smoothness values ranged from 0.78 to 2.14, with most joints exhibiting consistent motion behavior. Despite these promising results, certain limitations persist, particularly in handling edge-of-workspace scenarios and highly nonlinear object motions. Future work will focus on enhancing the linear kinematic model with second-order approximations to improve accuracy near workspace boundaries. Real-time feedback from tactile and force-torque sensors will also be incorporated to enable compliant manipulation and safe human-robot interaction: force feedback would complement the vision pipeline by providing stability during grasping, alignment, and interaction tasks where visual cues may be unreliable, extending the system from purely vision-guided motion to a more robust multimodal control strategy suitable for real-world SDL operations. Furthermore, extending the framework to support multi-arm coordination and mobile base integration will enhance scalability and operational reach. Finally, benchmarking the system across diverse laboratory tasks and deploying it in real-world experimental settings will be essential for validating its generalizability and impact on autonomous scientific discovery.
Performance comparisons further underscore the advantages of the proposed approach. In pose-estimation recall, the proposed scheme achieved a value of 0.96, substantially surpassing classical methods such as GC (0.24), HG (0.31), ST (0.01), NNSR (0.00), and RANSAC (0.00). This near-perfect recall highlights the robustness of the vision pipeline in consistently detecting and tracking textured planar objects under varying visual conditions. A similar trend is observed in the position-error evaluation, where the proposed method attained a mean error of 0.09 mm, significantly outperforming Dong and Rodriguez et al. (1.90 mm) and Li et al. (0.14 mm), while remaining competitive with recent high-precision approaches such as Zhao et al. (0.068–0.102 mm). Collectively, these results reinforce the precision, reliability, and stability of the integrated perception-and-motion framework.
This work demonstrated foundational capabilities that support the operational demands of SDL workflows, without attempting to implement full SDL autonomy. By combining reliable tracking of textured planar objects with smooth, redundancy-aware motion execution, the system establishes a manipulation-level baseline that can underpin tasks such as handling labeled reagent bottles, interacting with instrument interfaces, and managing microplate logistics. Future work will extend the perception module to accommodate more challenging laboratory objects by incorporating multi-modal sensing and deep learning-based detection and pose estimation. We also plan to evaluate the framework in real laboratory settings involving diverse geometries, material properties, and environmental conditions. Finally, evaluating the planner in obstacle-rich or constrained environments will allow the advantages of RRT* to be demonstrated more explicitly. Together, these developments will further validate the scalability and robustness of the proposed subsystem within practical autonomous experimentation pipelines.