1. Introduction
The robotic arm system has been widely investigated and developed for various research purposes. In [1,2], the position and orientation of any point on a robotic arm can be derived using kinematic equations. Furthermore, the desired speed or force at any location on the arm can be obtained through the Jacobian matrix. Based on these foundational theories, several practical applications have emerged. For example, in [3], multiple mobile manipulator robots can cooperatively grasp and transport an object along a desired path, while [4] demonstrates a mobile manipulator robot that follows a planned path while pushing an object.
Recent reviews highlight that mobile manipulators, with various levels of autonomy, are advancing rapidly, especially for deployment in hazardous, industrial, and logistics applications [5,6]. Mobile manipulators today are increasingly deployed as single- or dual-entity systems, enabling both fully autonomous and semi-autonomous operation under challenging conditions [5]. Market analyses confirm that the delivery robot sector is expanding rapidly, with projections for exponential growth in both Asia and the US, and increasing integration into last-mile logistics and urban environments [6,7].
With the integration of cameras, robotic arms have greatly contributed to industrial automation, reducing the need for humans to perform dangerous or monotonous tasks. For instance, the system in [8] can both wipe a whiteboard and perform peg-hole insertion. In [9], fuzzy logic is employed to design a shape-sorting controller for a vision-enabled robotic manipulator. In [10], trajectory tracking for robotic manipulators is achieved using a nonlinear backstepping controller with velocity observers. Despite these advancements, the delivery of goods still requires considerable human effort. If a mobile robot equipped with a robotic arm could automatically fetch and deliver items to specified locations, it would reduce workplace injuries. Such systems are known as intelligent mobile robots. The intelligent mobile robot described in [11] can autonomously open doors. In [12], a dual-arm mobile manipulator can grasp a book from a table and return it to a bookshelf. The system in [13] features autonomous grasping, allowing the robot to approach and capture a target even if the object moves. In [14], the robot can pick items from warehouse shelves, and the mechanism in [15] can grasp objects at varying heights.
A 2022 survey by Wu et al. [16] provides a comprehensive overview of learning-based control for robotic visual servoing, including recent advances in deep learning and reinforcement learning techniques. Likewise, Amaya-Mejía et al. (2024) surveyed modern approaches to visual servoing in autonomous manipulation in challenging contexts such as on-orbit servicing [17], confirming the trend toward multi-modal visual control for robust, adaptive robotic manipulation.
The primary design objective of the intelligent mobile robot in this study is to grasp a target object with arbitrary orientation, even if the vehicle does not stop at a precise position. Afterward, the system transports the target object to another location and places it appropriately.
The proposed mobile robot system utilizes a single camera, one robotic arm, and a vehicle platform. Unlike previous configurations [15], the camera in this system is mounted on the end-effector of the manipulator. The camera detects the target object and determines the positions of both the end-effector and the mobile robot. Detection is achieved using HSV color space methods. To achieve accurate positioning, this study employs a visual servoing approach [18,19,20], which can calculate the necessary velocity for the camera to reach the desired position.
The visual servoing method implemented here is called “Homography-Based 2D Visual Servoing” (HBVS) [21]. Leveraging properties of the homography matrix [22,23], the HBVS approach computes the relative translation and rotation between two coordinate frames. Given a set of desired feature points in the image, if the camera can observe the current feature points, the HBVS algorithm determines how to move the camera so that the actual and desired points coincide. The camera can be mounted either on the manipulator or the mobile platform. For example, refs. [24,25] place the camera on a mobile robot to achieve target localization with HBVS. The system in [13] employs a camera on the end-effector, as in the current study. Additionally, autonomous underwater vehicles have utilized HBVS for station-keeping [26] and localization [27].
Recent developments in homography-based visual servoing confirm the strong interest in its application to underactuated as well as fully actuated robotic systems, including UAVs and mobile manipulators [17,28,29]. For example, Huang et al. (2023) introduced a robust HBVS method for quadrotor UAVs, while geometry-based extensions and deep vision-based adaptations are emerging [28,29]. Challenges remain in integrating efficient visual feedback and improving the robustness and efficiency of HBVS for tasks such as autonomous picking in logistics, surgery, and space environments [5,30].
While a variety of mobile manipulation systems have been explored in previous studies, many existing approaches rely on multiple cameras, extensive sensor arrays, or address navigation and manipulation as separate challenges. Integrated solutions utilizing a single camera for both mobile base control and precise object manipulation, particularly with validation on real robots in practical delivery scenarios, appear to be relatively limited. Specifically, there seems to be a gap in the literature regarding systems that achieve autonomous navigation and dexterous pick-and-place tasks with minimal hardware and robust experimental evaluation.
To the best of our knowledge, this work represents an initial attempt to demonstrate a complete object delivery cycle using only a single end-effector-mounted camera for both vehicle guidance and arm control, accompanied by comprehensive experimental validation.
Unlike prior approaches that mount the camera on the mobile base or in the surrounding environment, our system positions the camera directly on the end-effector of the robotic arm. This configuration enables more precise and adaptive visual feedback during grasping, significantly enhancing the robot’s ability to localize and grasp objects from various positions and orientations. The camera-on-end-effector design ensures that relevant feature points remain in view throughout manipulation, resulting in higher grasp success rates, particularly when vehicle stopping accuracy is limited.
The main contributions of this work are summarized as follows:
We present a unified mobile robot system that integrates a single camera for both vehicle and manipulator visual servoing.
We develop and empirically validate a homography-based control strategy for object delivery, encompassing both navigation and pick-and-place, with minimal hardware requirements.
We offer detailed experimental results in real-world settings to illustrate the system’s effectiveness and reliability.
The remainder of this paper is organized as follows. The system construction is described in Section 2. Section 3 provides details on the robotic arm used in this study. The proposed control designs for both the arm and mobile base are presented in Section 4. Section 5 discusses experimental results, and Section 6 concludes the paper and suggests directions for future research.
2. System Construction
The system used in this study includes three main components: the robotic arm system, the vehicle system, and the camera system, as illustrated in Figure 1. This section describes the construction of these components and provides a detailed explanation of the control objectives.
In this work, the vehicle system is referred to as the Eddie robot. The movement of the Eddie robot is controlled by applying different voltages at specific time intervals to each motor, allowing the robot to move forward, backward, turn left, and turn right. Commands are sent from the computer to the control panel, enabling the Eddie robot to perform different movements based on the situation.
The robotic arm utilized in this system is a 6-degree-of-freedom (6-DOF) manipulator. The reference configuration is shown in Figure 2. For each frame $i$, link $i$ connects frame $i$ at Motor $i$ to frame $i+1$ at Motor $i+1$. This reference configuration is used to develop the kinematic equations [13,15], which are essential for controlling the robotic arm. The motors employed in the robotic arm are AX-12+ servomotors. In this study, goal positions are sent to each motor to precisely control the movement of the robotic arm. Communication between the computer and the motors is handled via USB2DYNAMIXEL, which allows different programming languages to be used to send commands to the motors.
The camera used in the system is a Logitech C170, featuring a diagonal field of view of 58° and a focal length of 2.3 mm. The input image size is 160 columns by 120 rows of pixels. As shown in Figure 2, the camera is mounted on the robotic arm’s end-effector and continuously captures images during operation. These images are used both for vehicle localization, as the system approaches the target platform, and for end-effector localization during the object grasping phase. Image information is processed using the Homography-Based Visual Servoing (HBVS) algorithm [21] to determine the required velocity of the camera, thereby guiding the end-effector closer to the desired position through matrix transformations.
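As an illustration of the HSV-based detection mentioned earlier, the sketch below thresholds a camera frame in HSV space with OpenCV and returns the centroid of the largest color blob; the HSV bounds and the helper name detect_target_centroid are placeholders, not the exact implementation used in this system.

```python
# Minimal sketch of HSV-based target detection on the 160x120 camera image.
# The HSV bounds are placeholders and would need tuning for the actual object color.
import cv2
import numpy as np

def detect_target_centroid(frame_bgr, lower_hsv=(0, 120, 70), upper_hsv=(10, 255, 255)):
    """Return the pixel centroid of the largest color blob, or None if nothing is found."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(lower_hsv), np.array(upper_hsv))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    m = cv2.moments(largest)
    if m["m00"] == 0:
        return None
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])  # (u, v) centroid in pixels
```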
3. Robotic Arm Analysis
The reference configuration of the robotic arm is illustrated in Figure 2 and Figure 3. The transformation from frame $i$ to frame $j$ can be expressed as follows:

$$T_i^j = \begin{bmatrix} R_i^j & p_i^j \\ \mathbf{0}^{T} & 1 \end{bmatrix}, \tag{1}$$

$$T_1^j = T_1^2\, T_2^3 \cdots T_{j-1}^{j}, \tag{2}$$

where $R_i^j$ is a $3 \times 3$ rotation matrix, $p_i^j$ is a $3 \times 1$ translation vector, and $T_i^{i+1}$ denotes the transformation for each link.
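For readers who prefer code, the following NumPy sketch builds a per-link homogeneous transformation and composes a chain as in (1) and (2); for brevity it assumes every joint rotates about its local z-axis with a fixed link offset, which is a simplification of the actual 6-DOF geometry.

```python
# Illustrative construction and composition of homogeneous transformations, as in (1) and (2).
# Single z-axis joint rotations and the 5 cm link offsets are assumptions for illustration.
import numpy as np

def link_transform(theta: float, link_offset: np.ndarray) -> np.ndarray:
    """Build T_i^{i+1} from a joint angle about z and a 3x1 translation, as in (1)."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = link_offset
    return T

# Composing the chain as in (2): T_1^j = T_1^2 T_2^3 ... T_{j-1}^j
joint_angles = [0.1, -0.3, 0.5]                 # example joint values (radians)
offsets = [np.array([0.0, 0.0, 0.05])] * 3      # assumed link offsets
T = np.eye(4)
for th, off in zip(joint_angles, offsets):
    T = T @ link_transform(th, off)
print(T[:3, 3])  # position of the last frame expressed in frame 1
```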
The manipulator Jacobian describes the relationship between the velocities of the joints and the velocity of the end-effector with respect to the base frame. This relationship enables the use of the Jacobian matrix to control the robotic arm by specifying joint velocities, thereby allowing the end-effector to follow the desired trajectory. The Jacobian is defined as follows:

$$v = J(q)\,\dot{q}, \tag{3}$$

where $v$ is the velocity of the end-effector with respect to the first frame (a $6 \times 1$ vector), $J(q)$ is the $6 \times 6$ Jacobian matrix, and $\dot{q}$ is the joint velocity vector (also $6 \times 1$).

The terms $v$ and $J(q)$ can be written as follows:

$$v = \begin{bmatrix} \dot{p}_1^{E} \\ \omega_1^{E} \end{bmatrix}, \tag{4}$$

$$J(q) = \begin{bmatrix} J_{v} \\ J_{\omega} \end{bmatrix}, \tag{5}$$

where $J_{v}$ and $J_{\omega}$ are $3 \times 6$ matrices.

The rotational part of the Jacobian, $J_{\omega}$, is given by the following:

$$J_{\omega} = \begin{bmatrix} z_1^{1} & z_1^{2} & \cdots & z_1^{6} \end{bmatrix}, \tag{6}$$

and $J_{v}$ can be written as follows:

$$J_{v} = \begin{bmatrix} z_1^{1} \times \left(p_1^{E} - p_1^{1}\right) & \cdots & z_1^{6} \times \left(p_1^{E} - p_1^{6}\right) \end{bmatrix}, \tag{7}$$

where $p_1^{b}$ is the position of the origin of frame $b$ expressed in frame 1, $z_a^{b}$ denotes the $z$-axis vector of frame $b$ expressed in frame $a$, and $E$ denotes the end-effector frame.
The manipulator Jacobian defined by (6) and (7) can be calculated by determining the parameters $z_1^{i}$, $p_1^{i}$, and $p_1^{E}$. These parameters can be extracted from the transformations $T_1^{i}$ as shown in (2), while the individual link rotations and translations are given by (1).
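The column-wise construction of the Jacobian in (6) and (7) can be sketched as follows; the simplified all-z-axis joint model and the link offsets are illustrative assumptions, not the arm’s true kinematic parameters.

```python
# Sketch of assembling the geometric Jacobian: for joint i, the angular part is z_1^i (Eq. (6))
# and the linear part is z_1^i x (p_1^E - p_1^i) (Eq. (7)). Geometry below is assumed.
import numpy as np

def link_transform(theta, offset):
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    T[:3, 3] = offset
    return T

def geometric_jacobian(joint_angles, offsets):
    """Return the 6xN Jacobian stacking the linear part J_v on top of the angular part J_w."""
    n = len(joint_angles)
    transforms = [np.eye(4)]                          # T_1^1 = identity (base frame)
    for th, off in zip(joint_angles, offsets):
        transforms.append(transforms[-1] @ link_transform(th, off))
    p_end = transforms[-1][:3, 3]                     # p_1^E, end-effector position
    J = np.zeros((6, n))
    for i in range(n):
        z_i = transforms[i][:3, 2]                    # z_1^i: third column of R_1^i
        p_i = transforms[i][:3, 3]                    # p_1^i
        J[:3, i] = np.cross(z_i, p_end - p_i)         # linear part, Eq. (7)
        J[3:, i] = z_i                                # angular part, Eq. (6)
    return J

J = geometric_jacobian([0.1, -0.2, 0.3, 0.0, 0.4, -0.1],
                       [np.array([0, 0, 0.05])] * 6)  # assumed 6-DOF geometry
print(J.shape)  # (6, 6)
```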
4. Control System Design
During the vehicle localization process, the robotic arm is held in a fixed configuration so that the camera is parallel to the direction of the vehicle. Homography-based visual servoing is then applied: if the calculated translation velocity in the x-direction is greater than zero, the vehicle turns right to bring the x-direction velocity below zero; conversely, if the calculated x-direction velocity is less than zero, the vehicle turns left to increase it. These turning maneuvers are combined with forward movement, allowing the vehicle to approach its goal position. The vehicle stops in front of the grasping platform when the calculated translation velocity in the z-direction falls below a fixed threshold.
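A minimal sketch of this steering decision is given below; the stop threshold and the command names are assumptions used only to illustrate the logic.

```python
# Sketch of the vehicle-steering decision during localization (threshold and command
# names are assumed, not the values used in the actual system).
Z_STOP_THRESHOLD = 0.05  # stop when the computed z-direction velocity falls below this

def vehicle_step(v_x: float, v_z: float) -> str:
    """Map the HBVS camera-velocity output to a vehicle command."""
    if v_z < Z_STOP_THRESHOLD:
        return "stop"           # close enough to the grasping platform
    if v_x > 0:
        return "forward_right"  # steer right while moving forward
    return "forward_left"       # steer left while moving forward
```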
Once the vehicle has stopped at the grasping position, it begins searching for the target object. To locate the object placed on the grasping platform, the robotic arm gradually changes its configuration until the camera detects the target. If the object is detected on either side of the system, the arm will rotate approximately 10° in that direction and search again. The search process ends when either the target is found directly in front of it, or the camera fails to see the target after all arm movements are exhausted.
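The search routine can be summarized as in the following sketch, where detect_target and rotate_arm are hypothetical helpers standing in for the HSV detector and the arm controller.

```python
# Sketch of the target-search routine: the arm sweeps in ~10 degree steps until the
# detector reports the object directly in front, or the sweep is exhausted.
SEARCH_STEP_DEG = 10

def search_for_target(detect_target, rotate_arm, max_steps: int = 6) -> bool:
    for _ in range(max_steps):
        side = detect_target()          # returns "left", "right", "center", or None
        if side == "center":
            return True                 # target found directly in front
        if side in ("left", "right"):
            rotate_arm(SEARCH_STEP_DEG if side == "right" else -SEARCH_STEP_DEG)
        else:
            rotate_arm(SEARCH_STEP_DEG)  # keep sweeping if nothing is visible
    return False                         # search exhausted without finding the target
```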
Visual servoing is a technique that uses image data to control robots. The approach used in this study is homography-based 2D visual servoing (HBVS), which relies solely on image information for robot control. The control law for HBVS is as follows:

$$\begin{bmatrix} v \\ \omega \end{bmatrix} = -\operatorname{diag}(\lambda_1, \ldots, \lambda_6) \begin{bmatrix} e_{v} \\ e_{\omega} \end{bmatrix}, \tag{8}$$

where $v$ and $\omega$ are the commanded translational and rotational camera velocities, $e_{v}$ and $e_{\omega}$ are the task-function errors defined below, and the six gains $\lambda_1, \ldots, \lambda_6$ are adjusted adaptively according to the error in each direction, as defined in (9)–(14). The six parameters are determined by trial and error. For example, if the camera moves too quickly and the object leaves the image frame, the corresponding gain is decreased; if the camera moves too slowly, it is increased. These values are selected to keep the visual target within the image plane during manipulator movement.
While the selection of the adaptive parameters in the above equations is based on practical experimentation and iterative tuning, this heuristic approach is common in applied visual servoing and robotics literature, especially when system dynamics, sensor noise, and mechanical uncertainties can hardly be modeled exactly. To date, a rigorous theoretical justification (e.g., through Lyapunov or input-to-state stability analysis) remains an open challenge for image-based and homography-based visual servoing subject to hardware constraints and quantization. Our empirical parameter choices ensure responsiveness while avoiding overshoot or loss of the object within the image frame.
In practice, larger gain values improve convergence speed but can reduce robustness, potentially causing the target to leave the field of view under disturbances or delays. Conversely, smaller gain values enhance stability and robustness against noise, but at the expense of slower response and potentially longer task times. Our tuning was guided by repeated experiments seeking a balance: the gains are chosen large enough for efficient task completion, but not so large as to trigger instability or target loss during real robot runs. We acknowledge that formal analysis and parameter optimization are potential future research directions.
The HBVS task function is defined as follows:

$$e_{v} = (\mathbf{H} - \mathbf{I})\, m^{*}, \tag{15}$$

$$[e_{\omega}]_{\times} = \mathbf{H} - \mathbf{H}^{T}, \tag{16}$$

where $m^{*}$ is defined as the centroid of the feature points in the reference image and $[\,\cdot\,]_{\times}$ denotes the skew-symmetric matrix associated with a vector. The vectors $e_{v}$ and $e_{\omega}$ represent, respectively, the translation and rotation errors of the camera.
Importantly, HBVS requires that the determinant of the homography matrix $\mathbf{H}$ equals 1. Therefore, the system first normalizes the homography matrix by dividing it by the cube root of its determinant before computing the error functions (15) and (16). After calculating the errors, the system checks whether they satisfy the convergence conditions, namely that the translation errors and the $z$-axis rotation error each fall below fixed thresholds. These criteria are designed so that the gripper can descend successfully and grasp the object. If they are met, the localization process stops; otherwise, the system continues adjusting the gripper. Additionally, if the end-effector's x-position is less than 18 cm from the origin of the base frame (frame 1) in Figure 2 and the computed velocity in x is negative, the system determines that the arm is too close and commands the vehicle to move backward slightly.
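The normalization and error computation described above can be summarized in the following sketch; the convergence thresholds and the reference centroid are placeholders, since the exact values used in the experiments are not reproduced here.

```python
# Sketch of the HBVS error computation: normalize H so that det(H) = 1, evaluate the
# task-function errors of (15) and (16), and test simple convergence thresholds.
import numpy as np

def hbvs_errors(H: np.ndarray, m_star: np.ndarray):
    """Return (e_v, e_w) for a 3x3 homography H and reference centroid m_star (homogeneous)."""
    H = H / np.cbrt(np.linalg.det(H))        # enforce det(H) = 1
    e_v = (H - np.eye(3)) @ m_star           # translation error, Eq. (15)
    S = H - H.T                              # skew-symmetric part encodes the rotation error, Eq. (16)
    e_w = np.array([S[2, 1], S[0, 2], S[1, 0]])
    return e_v, e_w

def converged(e_v, e_w, tol_t=0.01, tol_rz=0.01):
    """Check the translation errors and the z-axis rotation error against fixed thresholds."""
    return bool(np.all(np.abs(e_v) < tol_t) and abs(e_w[2]) < tol_rz)
```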
In the grasping process, the system defines the parameter $N$ as the number of iterations used for updating the Jacobian matrix in (3). A larger value of $N$ results in a more precise grasping process, while a smaller value allows for faster execution. The system also defines the parameter $\Delta x$, which represents the desired translation and rotation increments of the end-effector divided by $N$, where the desired overall displacement is selected heuristically. In particular, $\Delta x$ is set so that the end-effector descends step by step toward the target object.
Next, the system inverts the Jacobian matrix in (3) to calculate the joint increments $\Delta q = J^{-1} \Delta x$. The joint rotation angles used in the Jacobian matrix are then updated by adding the computed increments to the previous values. These calculations are repeated $N = 100$ times. After 100 iterations, the final joint angles are computed by summing all increments and are then used to control the robotic arm, guiding the gripper downward to grasp the object.
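A compact sketch of this iterative update is shown below; the jacobian(q) helper is assumed to return the current manipulator Jacobian from (3)–(7), and no joint-limit handling is included for brevity.

```python
# Sketch of the iterative grasp-approach update: the desired end-effector displacement is split
# into N steps, each converted to joint increments through the inverse of the Jacobian in (3).
import numpy as np

def grasp_descent(q0, total_displacement, jacobian, N=100):
    """Accumulate joint increments dq = J(q)^-1 * dx over N iterations and return the final angles."""
    q = np.array(q0, dtype=float)
    dx = np.asarray(total_displacement, dtype=float) / N   # per-iteration increment (Delta x)
    for _ in range(N):
        J = jacobian(q)
        dq = np.linalg.solve(J, dx)   # invert the Jacobian relation in (3)
        q = q + dq                    # updated angles feed the next Jacobian evaluation
    return q
```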
If, during these updates, any joint angle exceeds its allowable range, the system checks whether the x-position relative to the origin of the base frame (frame 1) in Figure 2 is greater than 25 cm. If it is, the vehicle moves forward; otherwise, it moves backward. The system then restarts the search, localization, and grasping procedures, allowing it to try different configurations for the robotic arm.
After the object is grasped, the arm is returned to its initial vehicle-control configuration. The system then performs object recognition and rotates left until the target sign appears in the right half of the image, at which point it stops turning and moves toward the sign. The full control procedure is shown in Figure 4.
5. Practical Experiments
All experiments were conducted in a corridor of a laboratory building. The floor surface was flat and smooth without any obstacles. Experiments took place during the daytime under clear weather conditions, using only ambient sunlight for illumination. Background clutter was minimal, and there were no moving objects in the scene during the tests.
To showcase the system’s robustness in the presence of disturbances and varying environmental conditions, we conducted three experiments designed to evaluate its performance under such challenges:
Case 1: Disturbance at placing platform; grasping platform ahead; object without orientation; placing platform ahead.
Case 2: Disturbance at grasping platform; grasping platform left; object with orientation; placing platform ahead.
Case 3: Disturbance at grasping platform; grasping platform ahead; object without orientation; placing platform elsewhere.
Figure 5, Figure 6 and Figure 7 illustrate that the system was able to accurately identify the correct sign at the placing platform and successfully complete the delivery task in each of the three scenarios. The error function curves for the visual servoing system, shown in Figure 8, Figure 9 and Figure 10, indicate that both the translational errors and the z-axis rotation error converge closely to zero in all cases. In addition, the initial and final images of HBVS, displayed in Figure 11, Figure 12 and Figure 13, demonstrate that the actual and target feature points nearly overlap for each case.
Table 1 presents the success rates for each stage of the experiment, from gripper localization to object grasping. Both vehicle control and object placement performance were influenced by environmental conditions. Out of 20 trials, the system achieved an overall success rate of 80%, with performance largely determined by the effectiveness of the grasping phase. Due to constraints in personnel and available time, the experiment was limited to 20 trials, and baseline methods were not directly compared. As a result, these success rates should be considered preliminary; more extensive statistical analysis and baseline comparisons will be provided in future work.
The accuracy of the grasping process is influenced by the performance of the HBVS algorithm, the quality of the motors, and insufficient rigidity in the mechanical design of the robotic arm. Greater motor accuracy and higher image resolution would improve the precision of the HBVS process. The performance of the HBVS algorithm is constrained by the maximum and minimum speeds of each robotic arm motor. If the parameters in (9)–(14) are not properly defined, the robotic arm may fail to reach the desired speed. As a result, the target object may move outside the camera’s field of view, or the HBVS may not converge.
Furthermore, reducing weight and torsion in the limbs and joints, or redesigning the robotic arm using stronger materials, may reduce excessive motion and shaking, thereby significantly improving the success rate.
Notably, the computation time for each HBVS and Jacobian update cycle was not individually recorded during our experiments. Nevertheless, the implemented system responded in real-time with no perceptible delay between visual feedback and robot action, demonstrating acceptable practical performance for the presented use case. We recognize that detailed computational profiling could strengthen quantitative evaluation, and we plan to include such measurements in future work.
6. Conclusions and Future Works
This study demonstrates the application of an intelligent mobile robot equipped with a single camera located on the gripper, capable of autonomously transporting objects from one location to another. Experimental results show that the implemented homography-based visual servoing (HBVS) strategy successfully localizes both the vehicle and the gripper. Moreover, the Jacobian matrix approach proved effective for visual servoing as well as grasping procedures. This method allows the robot to accurately position its gripper above a target object and to execute pick-and-place operations.
The main contributions of this research include the application of HBVS for localizing both the gripper and the vehicle, as well as the design of a comprehensive control procedure tailored for pick-and-place tasks performed by the intelligent mobile robot. It should be noted, however, that HBVS is not ideally suited for moving the camera across larger distances. When the camera is far from the target, a higher gain value is needed to avoid excessively slow motion. Conversely, when the camera approaches the target, keeping the gain unchanged can result in excessive speed, which may cause the target to fall outside the camera’s field of view. Therefore, the control parameters are adaptively adjusted based on the error in each direction to ensure that the camera’s motion remains sufficiently rapid without losing the target.
This work also introduces a general control framework for object delivery systems. While HBVS was employed in this study, it may be substituted with alternative visual servoing algorithms, such as image-based visual servoing. Accuracy and robustness of grasping could be further improved through advanced path planning techniques, which help minimize unwanted gripper motion and vibration. Vehicle navigation can also benefit from more sophisticated path planning, obstacle avoidance, or enhanced image processing, enabling operation in more complex environments.
Notably, since significant motion and vibration were observed in the robotic arm during experiments, the present system does not yet achieve precise object placement—a limitation we aim to address in future work. Additionally, comparing our approach with other visual servoing methods will be an important avenue for future investigation. Finally, we recognize that our evaluation was limited to a single laboratory setting and a moderate variety of objects and scenes. To fully validate generalization ability, future studies will expand to a broader range of object types, diverse environments, and detailed analyses of success rates across different scenarios.