Article

Real-Time SLAM and Faster Object Detection on a Wheeled Lifting Robot with Mobile-ROS Interaction

School of Software Engineering, Tongji University, Shanghai 201804, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(14), 5982; https://doi.org/10.3390/app14145982
Submission received: 2 June 2024 / Revised: 28 June 2024 / Accepted: 8 July 2024 / Published: 9 July 2024
(This article belongs to the Special Issue Computer Vision, Robotics and Intelligent Systems)

Abstract

Wheeled lifting robots have found widespread applications in various industrial and logistical environments. However, traditional robots offer limited visual perception capabilities, and their remote control methods are inefficient, which can raise safety concerns. To address these issues, this work proposes an autonomous multi-sensor-enabled wheeled lifting robot system, i.e., AMSeWL-R, to facilitate remote autonomous operations. Specifically, AMSeWL-R integrates real-time simultaneous localization and mapping (SLAM) with object detection on a wheeled lifting robot. Additionally, a novel mobile-ROS interaction method is proposed to achieve real-time communication and control between a mobile device and a ROS host. Furthermore, a lightweight object detection algorithm based on YOLOv8, i.e., YOLOv8-R, is proposed to achieve faster detection. Experimental results validate the effectiveness of the AMSeWL-R system for accurately detecting objects and mapping its surroundings. Furthermore, TensorRT acceleration is employed during practical testing on a Jetson Nano to achieve real-time detection using the proposed YOLOv8-R, demonstrating its efficacy in real-world scenarios.

1. Introduction

Autonomous mobile robots (AMRs) [1] have a wide range of applications across various industries, including manufacturing [2], warehousing [3], retail [4], and healthcare [5]. The advantage of AMRs lies in their capability to adapt to and operate in dynamic environments: automatically maneuvering through complex spaces while executing tasks with precision and efficiency [6,7,8]. As typical AMRs, wheeled lifting robots combine mobility and lifting capability, making them versatile assets in various industrial and logistical settings. Robot Operating System (ROS) is an open-source framework for building robot applications. Combining ROS with mobile devices offers a more convenient and efficient way to control and interact with robots. However, the potential of mobile-ROS interaction has not yet been fully exploited. Additionally, due to the limited onboard computing resources available in AMRs, most existing object detectors [9,10,11,12,13,14] struggle to balance accuracy and efficiency. The above challenges constrain the usability of AMRs in complex dynamic environments. Therefore, a new system is urgently needed to boost the development and capabilities of AMRs.
The first challenge is how to capture real-world data about surrounding environments and further construct the connection between mobile devices and ROS on AMRs. Simultaneous localization and mapping (SLAM) [15] stands as a cornerstone technology in the field of robotics, providing essential capabilities for AMRs to construct detailed maps of their surroundings while simultaneously determining their position within the maps [16,17]. Among the various implementations of SLAM, Google’s Cartographer [18] has been recognized for its robustness and precision in generating real-time, scalable maps. Iviz [19] is an open-source ROS visualization app for mobile devices that greatly improves the accessibility and utility of SLAM data. Built on the Unity Engine, Iviz provides an advanced 3D visualization platform that enables intuitive interaction with complex robotic data. Compared with the ROS visualization tool (RViz [20]), which is traditionally used on desktop environments and requires a full ROS setup, Iviz is designed for mobility and ease of use, making it accessible on a range of mobile devices. Iviz provides a real-time, interactive view of the environment, enhancing situational awareness and operational control over autonomous systems. This capability is particularly vital in settings where AMRs are expected to operate with high levels of autonomy and precision, such as navigating through densely cluttered warehouse spaces or complex healthcare facilities [21]. To maximize the effectiveness of AMRs in complex and dynamic environments, integrating Cartographer with Iviz presents a compelling strategy to enhance mobile-ROS interaction. This integration aims to streamline the way AMRs interact with the real world and provide human operators with superior tools to control and interact with these autonomous systems effectively.
The second challenge is how to meet the requirement of real-time detection onboard AMRs. With the development of deep learning, object detection algorithms have achieved remarkable advancements and exhibit high accuracy and robustness [22,23,24]. They can be divided into two types: one-stage and two-stage. Among them, one-stage detectors directly predict object classes and locations in one pass without needing a separate region proposal step, making them suitable for real-time applications [25,26,27]. As a representative, the YOLO (you only look once) family [28] demonstrates robust end-to-end object detection capabilities. However, even with one of the most advanced one-stage detectors, i.e., YOLOv8, ensuring real-time detection on edge devices with limited resources remains a challenging task.
This work introduces AMSeWL-R: a novel autonomous wheeled lifting robot system designed to enhance remote operations through mobile-ROS integration. Furthermore, to address the critical need for efficient and timely responses in dynamic environments, the system incorporates a state-of-the-art object detection algorithm optimized for real-time performance on edge devices. To validate the usability of the proposed system, a fully functional wheeled lifting robot is designed, produced, and tested under real-world conditions. The main contributions of this work are as follows:
  • An autonomous multi-sensor-enabled wheeled lifting robot system, i.e., AMSeWL-R, is proposed to integrate real-time SLAM with faster object detection. By harnessing data from multiple sensors, including LiDAR and cameras, the AMSeWL-R system achieves robust environmental perception and localization accuracy, which is essential for safe and efficient operation in complex environments.
  • A novel mobile-ROS interaction method is introduced to establish seamless real-time communication and control between a mobile device and a ROS host. This innovative method bridges the gap between mobile platforms and robotic systems, enabling users to interact with AMRs from their mobile devices remotely.
  • An innovative lightweight object detection algorithm, i.e., YOLOv8-R, is proposed to improve real-time object detection speed, with significant speed gains achieved through TensorRT acceleration. Tests conducted on a Jetson Nano in real-world deployment scenarios not only validate its efficacy but also demonstrate the practical applicability of our system. The code is available at https://github.com/Lei00764/AMSeWL-R (accessed on 1 June 2024).

2. Related Work

2.1. SLAM

SLAM is essential for autonomous navigation as it achieves localization and mapping in dynamic and unknown environments. Rapid advancements in the industry have made cost-effective LiDAR sensors capable of producing impressive mapping results, significantly extending their utility in autonomous robotics. GTSAM [29] implements smoothing and mapping for robotics and vision using factor graphs and Bayes networks, offering a more efficient computational paradigm than traditional sparse-matrix approaches. M-LOAM [30] enhances robot perception by integrating multiple LiDAR inputs to construct and optimize a global map, reducing uncertainties in measurements and estimations. Google’s Cartographer [18] is renowned for its real-time environmental mapping capabilities. It supports both 2D and 3D LiDAR inputs for precise robot localization and grid map construction; 2D SLAM utilizes 2D grid maps directly for navigation, while 3D SLAM, which is based on a hybrid probability grid, presents complexities for direct usage. Given its ease of use, this work employs Google’s Cartographer for localization and mapping purposes.

2.2. Lightweight Object Detection

Lightweight object detection models aim to achieve an ideal balance between computational complexity and performance, making them highly suitable for deployment in resource-constrained environments. MobileNet [31,32] utilizes depth-wise separable convolutions (DW-Conv), which reduce computational cost and model size by decomposing standard convolutions into depth-wise and point-wise operations. ShuffleNet [33] employs feature shuffling techniques to enable efficient information flow across channels at a reduced computational load by using pointwise group convolutions followed by channel shuffling. FasterNet [34] adopts a feature reuse strategy that economizes on computational resources by leveraging existing features to generate new feature maps rather than creating them from scratch. StarNet [35] employs an implicit high-dimensional strategy: mapping inputs to a high-dimensional nonlinear feature space without the need to widen the network.
Network pruning [36] is another effective strategy that allows for an explicit evaluation of the trade-off between model lightness and accuracy. It systematically removes redundant or less significant weights from the network, which can significantly improve computational efficiency without a substantial loss of performance. SLIM [37] employs L1 regularization on the scaling factors of batch normalization (BN) layers to identify and prune less significant channels, efficiently streamlining the network. Taylor FO and Taylor SO [38] use Taylor expansions to estimate the contribution of neurons to the final loss and iteratively remove those with low scores to reduce computation, energy consumption, and memory transfer costs while maintaining high accuracy. LAMP [39] introduces a new global pruning significance score, i.e., the LAMP score, which automatically determines the sparsity level for each layer by minimizing model-level ℓ2 distortion without the need for hyperparameter tuning or extensive computation. The model-level ℓ2 distortion is derived from the need to reduce the pruning-incurred distortion of the model’s output for the worst-case input.
Therefore, to achieve a lightweight version of an object detection model, this work considers combining both lightweight model replacement and network pruning.

3. Proposed Method

3.1. Overview

The proposed AMSeWL-R represents a comprehensive integration of several subsystems, i.e., mechanics, electronic control, and vision subsystems, which enhance its functionality and autonomy in various operational environments. The mechanical subsystem provides the structural framework and includes elements such as wheels, motors, and the lifting apparatus. The electronic control subsystem is the nerve center that commands and coordinates the robot’s operations, ensuring tasks are executed as planned. The vision subsystem is aimed at perceiving the surrounding environment using multiple sensors, including cameras, LiDAR, and inertial measurement units (IMUs). Based on these subsystems that are needed for AMRs, a wheeled lifting robot is designed and produced to validate the usability of the proposed AMSeWL-R.
The mobile-ROS interaction is another key innovation in this work and significantly boosts the remote controllability of AMRs. Enhanced integration facilitates superior control dynamics and data exchange, enabling remote and autonomous operations with heightened efficiency and reliability. Moreover, a faster object detector, i.e., YOLOv8-R, is introduced in the vision subsystem. Figure 1 depicts the framework of the proposed AMSeWL-R, and further details about the AMSeWL-R system’s capabilities are discussed in the following sections.

3.2. Mobile-ROS Interaction

The mobile-ROS interaction in the AMSeWL-R system begins by connecting a LiDAR sensor to a ROS-running host, such as a PC or an embedded board, which serves as the foundation for real-time mapping and data processing. Using Google’s Cartographer [18], the system performs real-time mapping to model the robot’s environment accurately. Data are continuously transmitted to a mobile device equipped with the Iviz application during the mapping process. The integration with Iviz facilitates sophisticated visualization of the ROS data, enabling users to interact with the system more intuitively and engagingly. The detailed interaction process is outlined in Algorithm 1.
To augment its remote control ability, the system features tailored enhancements in Iviz, including toggling between 2D and 3D views, halting scans, and other remote operations. These functionalities are supported by a robust TCP communication protocol developed in Python that guarantees reliable and secure data exchange between the mobile interface and the ROS host. This setup enriches the adaptability and operational flexibility and streamlines user interaction and control.   
Algorithm 1: Mobile-ROS Interaction
  Input: LiDAR data from a ROS-running host (PC or embedded board), mobile device with Iviz
  Output: Real-time map visualization and control actions
1   while True do
2       Bind socket to the specified port and listen for incoming connections;
3       Accept incoming connection from the mobile device;
4       while connected do
5           Receive and decode command from the mobile device;
6           if command is “kill” then
7               Terminate the current roslaunch process;
8           else if command is “switch 2D/3D” then
9               Start to play the 2D/3D data bag collected in advance;
10          else if command is “switch real-time mapping” then
11              Start the real-time mapping script;
12      end
13      Close the connection with the mobile device;
14      Close the socket;
15  end
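For concreteness, the following is a minimal Python sketch of the ROS-host side of Algorithm 1. The port number, command strings, bag file name, and launch file are illustrative placeholders rather than the authors’ actual implementation, and the listening socket is kept open across connections for simplicity.

```python
import socket
import subprocess

HOST, PORT = "0.0.0.0", 9000  # port number is an assumption for illustration

def serve():
    """Minimal sketch of the ROS-host command server from Algorithm 1 (not the authors' exact code)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((HOST, PORT))          # bind socket and listen for the mobile device running Iviz
    srv.listen(1)
    proc = None                     # currently running roslaunch/rosbag process, if any
    while True:
        conn, _addr = srv.accept()  # accept incoming connection from the mobile device
        with conn:
            while True:
                data = conn.recv(1024)
                if not data:
                    break                         # mobile device disconnected
                cmd = data.decode().strip()       # receive and decode command
                if proc is not None:
                    proc.terminate()              # stop whatever is currently running
                if cmd == "kill":
                    proc = None                   # terminate the current roslaunch process
                elif cmd == "switch 2D/3D":
                    # play a pre-recorded 2D/3D data bag (file name is an assumption)
                    proc = subprocess.Popen(["rosbag", "play", "map_demo.bag"])
                elif cmd == "switch real-time mapping":
                    # start the real-time Cartographer mapping launch (launch file is an assumption)
                    proc = subprocess.Popen(["roslaunch", "cartographer_ros", "demo_revo_lds.launch"])
        # connection closed; loop back and wait for the next client

if __name__ == "__main__":
    serve()
```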

3.3. Vision Subsystem

The vision subsystem is empowered by the proposed YOLOv8-R, which has been specifically optimized for faster detection speed through significant lightweight modifications to both YOLOv8’s backbone and detection head.
As shown in Figure 2, the proposed YOLOv8-R uses StarNet [35] as its backbone, which embodies the concept of a lightweight architecture while maintaining robust feature extraction capabilities. The core of StarNet is the star operation: an element-wise multiplication technique that efficiently projects inputs into a high-dimensional, nonlinear feature space. Specifically, the star operation in StarNet can be formulated as:
$$f(X) = (W_1^{T} X + B_1) * (W_2^{T} X + B_2),$$
where $X$ is the input feature matrix, $W_1$ and $W_2$ are the weight matrices, and $B_1$ and $B_2$ are the bias matrices for the respective transformations. All matrices are of dimension $\mathbb{R}^{d \times d}$. The operation $*$ denotes the element-wise multiplication of the resultant feature vectors and enables rich, nonlinear feature mapping without expanding the depth or width of the network substantially.
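The following PyTorch sketch illustrates the star operation defined above, using two 1 × 1 convolutions as the affine branches; the channel configuration is an assumption for illustration and not the exact StarNet/YOLOv8-R block.

```python
import torch
import torch.nn as nn

class StarOperation(nn.Module):
    """Sketch of the star operation f(X) = (W1^T X + B1) * (W2^T X + B2).
    Layer sizes are illustrative assumptions, not the exact YOLOv8-R configuration."""
    def __init__(self, channels: int):
        super().__init__()
        self.branch1 = nn.Conv2d(channels, channels, kernel_size=1)  # W1, B1
        self.branch2 = nn.Conv2d(channels, channels, kernel_size=1)  # W2, B2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # element-wise multiplication of the two affine projections (the "star")
        return self.branch1(x) * self.branch2(x)

# usage: x = torch.randn(1, 64, 80, 80); y = StarOperation(64)(x)  # y has the same shape as x
```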
The proposed YOLOv8-R employs a lightweight shared convolutional detection head (LSCDH), which benefits the model by processing multi-scale features efficiently with substantially fewer parameters. A mathematical comparison regarding the calculated parameters is as follows:
$$P_{\mathrm{traditional}} = \sum_{s=1}^{N} C_{\mathrm{out},s} \times C_{\mathrm{in},s} \times K_s^2, \qquad P_{\mathrm{shared}} = C_{\mathrm{out}} \times C_{\mathrm{in}} \times K^2,$$
where $P_{\mathrm{traditional}}$ represents the total number of parameters when each scale uses separate convolutional filters, and $P_{\mathrm{shared}}$ denotes the reduced number of parameters resulting from using shared convolutional filters across all scales. Here, $N$ is the number of scales, $C$ represents the number of channels, and $K$ is the kernel size.
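As a quick numerical illustration of the two formulas, the snippet below compares the parameter counts for hypothetical values of $N$, $C$, and $K$; the actual channel and kernel sizes in YOLOv8-R may differ.

```python
# Illustrative parameter count comparison for the formulas above.
# N = 3 detection scales, C_in = C_out = 64 channels, K = 3 (values are assumptions).
N, C_in, C_out, K = 3, 64, 64, 3

p_traditional = sum(C_out * C_in * K**2 for _ in range(N))  # separate 3x3 conv per scale
p_shared = C_out * C_in * K**2                              # one 3x3 conv shared by all scales

print(p_traditional, p_shared)  # 110592 vs. 36864, i.e., a 3x reduction for N = 3
```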
To address the issue of varying target scales detected by each detection head, a scale layer is employed to adjust the features at each scale, which can be depicted as:
$$Y_i = \gamma_i \cdot X_i,$$
where $Y_i \in \mathbb{R}^{H_i \times W_i}$ is the output feature map, $X_i \in \mathbb{R}^{H_i \times W_i}$ is the input feature map from the $i$-th layer, and $\gamma_i$ is a scaling factor that is learned during training.
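A minimal sketch of such a scale layer, interpreting $\gamma_i$ as a single learnable scalar per detection scale, is shown below; this is an illustrative reading of the equation rather than the exact implementation.

```python
import torch
import torch.nn as nn

class ScaleLayer(nn.Module):
    """Sketch of the per-scale adjustment Y_i = gamma_i * X_i applied after the shared head."""
    def __init__(self, init_value: float = 1.0):
        super().__init__()
        self.gamma = nn.Parameter(torch.tensor(init_value))  # learned during training

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gamma * x

# one ScaleLayer per detection scale, applied to the shared-convolution output:
# scales = nn.ModuleList(ScaleLayer() for _ in range(3))
```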
To further reduce the model size for edge device deployment, the LAMP [39] technique is introduced to prune weights based on their importance: maintaining crucial weights while eliminating less significant ones. The LAMP score is designed to reflect the relative importance of weights based on their contribution to model-level distortion when pruned, which can be calculated as below:
$$\mathrm{score}(u; W) = \frac{(W[u])^2}{\sum_{v \geq u} (W[v])^2},$$
where $W[u]$ represents the weight at position $u$, and the denominator sums the squares of the weights from position $u$ to the end of the weight matrix. The smaller the LAMP score, the more likely it is to be pruned.
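The following sketch computes LAMP scores for a single layer’s weight tensor according to the equation above, with weights sorted in ascending order of magnitude; it is an illustrative reimplementation, not the authors’ pruning code.

```python
import torch

def lamp_scores(weight: torch.Tensor) -> torch.Tensor:
    """Sketch of the LAMP score for one layer: score(u; W) = W[u]^2 / sum_{v >= u} W[v]^2,
    where the index follows the ascending order of weight magnitudes."""
    w2 = weight.flatten().pow(2)
    sorted_w2, order = torch.sort(w2)                                         # ascending magnitude
    tail_sums = torch.flip(torch.cumsum(torch.flip(sorted_w2, [0]), 0), [0])  # sum_{v >= u} W[v]^2
    scores_sorted = sorted_w2 / tail_sums
    scores = torch.empty_like(scores_sorted)
    scores[order] = scores_sorted                 # restore the original weight positions
    return scores.view_as(weight)

# weights with the smallest scores are the first candidates for pruning
```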
Following the pruning process, the pruned network undergoes fine-tuning on the original dataset to restore any performance that was potentially lost by the pruning.

3.4. Mechanical and Electronic Control Subsystem

The mechanical design of the AMSeWL-R system is focused on balancing light weight with robust construction; aluminum and carbon fiber materials are used for the main frame to enhance strength without adding excessive weight. The chassis and manipulator structures, as illustrated in Figure 3, are designed with compact, flexible configurations to facilitate efficient navigation and material handling. Notably, the chassis integrates advanced wheel designs for better traction, and the manipulator structure utilizes a crank–slider mechanism for precise movement of the arm.
The electronic control system of AMSeWL-R leverages advanced task scheduling and motor control strategies to enhance its operational capabilities. It utilizes FreeRTOS [40] for effective task management, ensuring that multiple tasks such as chassis navigation, turret operations, and IMU data processing are handled concurrently with real-time priorities. Each component depicted in Figure 4a–d plays a vital role: the core control unit (Figure 4a) manages the primary command operations; the wheel motor (Figure 4b), lift motor (Figure 4c), and robotic arm yaw axis motor (Figure 4d) are integral for precise movement and handling and are equipped with feedback systems for accurate positioning. Precise motor control is implemented through a PID control strategy, which is mathematically represented by the following equation:
$$\tau(t) = K_p e(t) + K_i \int_0^t e(t')\,\mathrm{d}t' + K_d \frac{\mathrm{d}}{\mathrm{d}t} e(t),$$
where $\tau(t)$ is the torque applied, $e(t)$ is the position error, and $K_p$, $K_i$, and $K_d$ are the proportional, integral, and derivative controller gains, respectively.
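A discrete-time illustration of this control law is sketched below in Python; the actual controller runs in C on the STM32 under FreeRTOS, and the gains, sampling period, and torque limit are placeholder values.

```python
class PID:
    """Discrete-time illustration of the PID law above; gains and limits are placeholders."""
    def __init__(self, kp: float, ki: float, kd: float, dt: float, out_limit: float):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.out_limit = out_limit
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, target: float, measured: float) -> float:
        error = target - measured
        self.integral += error * self.dt                       # approximates the integral term
        derivative = (error - self.prev_error) / self.dt       # approximates the derivative term
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(-self.out_limit, min(self.out_limit, out))  # clamp the torque command

# e.g., wheel_pid = PID(kp=8.0, ki=0.2, kd=0.05, dt=0.001, out_limit=5.0)  # 5 N·m max (M3508)
```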
Detailed specifications and functionalities of these motors are outlined in Table 1. The architecture supports a distributed processing approach, with upper units tasked with complex vision processing and lower units focusing on direct motor control, enhancing the system’s responsiveness and efficiency for real-world applications.

4. Experiments

4.1. Implementation Details

The proposed YOLOv8-R was implemented in Python 3.10.9 with PyTorch 1.13.1+cu117 and was trained for 100 epochs on a PC equipped with an NVIDIA GeForce RTX 4060 laptop GPU (Santa Clara, CA, USA). The dataset consists of about 1000 images that were captured using a USB camera in various industrial and logistical settings. These images contain two object categories: cola and box. The dataset was divided into training, validation, and test sets with a ratio of 7:2:1, and the training set was expanded through preprocessing and data augmentation. All other parameters followed the default settings of YOLOv8. For a fair comparison, the other state-of-the-art detectors used the settings from their official code and were trained for the same number of epochs on the same dataset.
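For reference, a training run with this setup could be launched through the Ultralytics YOLOv8 API roughly as follows; the model and dataset configuration file names are assumptions, and the actual YOLOv8-R architecture definition is provided in the linked repository.

```python
from ultralytics import YOLO

# Hypothetical config names; the modified YOLOv8-R architecture (Star Block backbone + LSCDH head)
# is defined in the authors' repository, not in stock Ultralytics.
model = YOLO("yolov8-r.yaml")          # assumed custom model config
model.train(
    data="cola_box.yaml",              # assumed dataset config: 2 classes (cola, box), 7:2:1 split
    epochs=100,                        # matches the training setup described above
    imgsz=640,                         # default YOLOv8 input size
    device=0,                          # RTX 4060 laptop GPU
)
metrics = model.val()                  # reports mAP@0.5 and mAP@0.5:0.95 on the validation split
```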

4.2. Evaluation Metrics

The performance of the object detection models was evaluated using mean average precision (mAP), the number of parameters, floating-point operations (FLOPs, reported in billions), and frames per second (FPS).
The mAP at an intersection over union (IoU) threshold of 0.5 (mAP@0.5) is used to evaluate the model’s performance at detecting objects with a minimum overlap of 50%. This metric provides an initial measure of how well the model detects objects without considering varying levels of overlap. The mAP at IoU thresholds ranging from 0.5 to 0.95 (mAP@0.5:0.95) offers a more comprehensive evaluation by averaging the precision across multiple IoU thresholds. The mAP is calculated using the following equation:
$$\mathrm{mAP} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{AP}_i,$$
where $\mathrm{AP}_i$ is the average precision at the $i$-th IoU threshold, and $n$ represents the number of IoU thresholds considered.
The parameter count reflects the memory footprint of the models, FLOPs corresponds to their computational complexity, and FPS quantifies the speed at which a model processes images. The FPS values in Table 2 were measured on a PC equipped with an NVIDIA GeForce RTX 4060 laptop GPU (the same machine used for training), with a batch size of 8 in the following experiments.
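A rough sketch of how such an FPS measurement can be performed with a fixed batch of 8 images is given below; it is an illustrative benchmark rather than the exact measurement script used here.

```python
import time
import torch

def measure_fps(model, batch: torch.Tensor, warmup: int = 10, iters: int = 100) -> float:
    """Rough FPS measurement sketch: images per second for a fixed batch (here batch size 8)."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):                 # warm-up iterations to stabilize clocks/caches
            model(batch)
        if batch.is_cuda:
            torch.cuda.synchronize()            # make sure queued GPU work has finished
        start = time.perf_counter()
        for _ in range(iters):
            model(batch)
        if batch.is_cuda:
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    return iters * batch.shape[0] / elapsed     # images per second

# e.g., fps = measure_fps(net, torch.randn(8, 3, 640, 640).cuda())
```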

4.3. Comparative Analysis

In this section, comparison experiments were conducted to verify the efficiency and performance of the proposed YOLOv8-R. Specifically, YOLOv8-R was compared with YOLOv8 models of different sizes. As shown in Table 2, the experimental results indicate that YOLOv8-R not only achieves comparable detection accuracy but also attains the fastest object detection speed among the five models. Furthermore, visualization results in real scenarios are shown in Figure 5.
The closely comparable mAP values across these models can be attributed to the robust performance of the YOLO series and the simplicity of the dataset we employed. Within the vision subsystem, our foremost objective is to achieve the fastest possible detection speeds while ensuring reliable object detection performance. To achieve this, a powerful lightweight module, i.e., the Star Block, is employed to extract features from images more efficiently. Meanwhile, pruning techniques are utilized to effectively remove non-essential parameters, thereby retaining only those that are crucial for maintaining the integrity of the detection performance. Combining pruning with the lightweight modules yields far fewer parameters than the original model. In simpler scenarios, this configuration not only maintains high object detection performance but, most importantly, significantly increases detection speed. Consequently, robust real-time object detection can be achieved on edge devices with limited computational resources. Notably, the main contribution of this section is the construction of such a scenario to verify the effectiveness of the proposed AMSeWL-R system, which can later be extended to more complex scenarios.

4.4. Ablation Study

An ablation study was conducted to demonstrate the effectiveness of the introduced modules. YOLOv8n serves as the baseline and uses C2f modules in its backbone. To ensure fairness, all variants of the detector were configured with the same environmental settings except for the studied module. As presented in Table 3, replacing the C2f modules in the backbone with Star Blocks significantly reduces both the parameter count and FLOPs. Substituting the detection head in YOLOv8n with the LSCDH further decreases the complexity of the model and improves FPS. Finally, applying the LAMP pruning technique to the detector yields a dramatic reduction in parameters and computational requirements while substantially boosting the FPS. The ablation study results indicate that the implemented improvements contribute significantly to reducing model complexity while enhancing detection speed. Consequently, these enhancements enable real-time detection capabilities in resource-constrained environments, demonstrating the practical applicability of the proposed YOLOv8-R.

5. Real-World Tests

5.1. Mobile-ROS Interaction Test

The mobile-ROS interaction test was conducted on mobile devices and a portable laptop running ROS. An RPLIDAR A1M8 LiDAR [41] and a Sprint RoboSense 16-beam LiDAR [42] were used to collect data and enabled the system to perform real-time mapping and data processing. The testing environment included an underground parking lot at Tongji University, Jiading Campus, Shanghai, China, as shown in Figure 6a. During the test, the system demonstrated its capability to accurately capture and visualize the surrounding environment. The corresponding mapping result is illustrated in Figure 6b. Notably, gaps in the scanned image correspond to the positions of parked cars, demonstrating the system’s precision at detecting and mapping obstacles.
Specifically, the RPLIDAR A1M8 LiDAR was employed for detailed 2D environmental mapping, as shown in Figure 7a. As depicted in Figure 7b, the LiDAR performed real-time scanning and mapping, and the data were then transmitted from the laptop to the mobile device for visualization. The scanned results were displayed on the mobile device in real time, allowing users to view the environmental map, control the system, and make necessary adjustments directly through the mobile interface. This enables users to monitor and manage environmental data more conveniently.
To further illustrate the mobile-ROS capabilities of the proposed AMSeWL-R system, a real-world test was conducted to switch between the 2D and 3D views using ROS on the portable laptop. The switching process involved three steps, as depicted in Figure 8. The seamless transitions between 2D and 3D views demonstrated the system’s capability to adapt to different visualization needs and provided users with flexible and intuitive control over the mapping and interaction processes. This feature is critical for enhancing the safety and efficiency of remote autonomous operations.
The excellent mobile-ROS interaction ability significantly enhances the usability and functionality of the proposed AMSeWL-R system, making it a powerful tool for real-time environmental monitoring and interaction.

5.2. Onboard Object Detection Test

To evaluate the onboard object detection capability of the proposed YOLOv8-R, we conducted experiments on a wheeled lifting robot equipped with an NVIDIA Jetson Nano as the edge device. Meanwhile, as a high-performance deep learning inference optimizer and runtime library from NVIDIA, TensorRT was employed to accelerate YOLOv8-R. Figure 9a shows the wheeled lifting robot in a working condition, demonstrating its capability to handle various objects. Figure 9b provides the environment from the camera’s perspective, showcasing the robot’s view and interaction with the surroundings. Figure 9c,d display real-world tests in which the robot successfully detects a cola can and a box using the proposed YOLOv8-R.
Notably, FPS values greater than 30 are typically considered adequate for real-time applications. Compared with YOLOv8n, which achieved 14 FPS, the proposed YOLOv8-R achieved an average of 34 FPS, demonstrating its capability for real-time detection on edge devices with limited resources.
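The TensorRT acceleration path can be reproduced approximately with the Ultralytics export utilities, as sketched below; the file names and the FP16 setting are assumptions rather than the exact deployment configuration used on the Jetson Nano.

```python
from ultralytics import YOLO

# Sketch of TensorRT acceleration on the Jetson Nano via the Ultralytics export path;
# file names are placeholders and FP16 is an assumed setting for the Nano.
model = YOLO("yolov8-r-best.pt")                     # trained YOLOv8-R weights (placeholder name)
model.export(format="engine", half=True, imgsz=640)  # builds a TensorRT engine on-device

trt_model = YOLO("yolov8-r-best.engine")             # load the generated engine for inference
results = trt_model("camera_frame.jpg")              # detect cola cans and boxes in a frame
```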

6. Discussion

The proposed AMSeWL-R system offers significant advancements in enhancing the intelligence and operational capabilities of AMRs. Future work could explore the development of multi-agent systems where multiple AMRs collaborate to complete tasks more efficiently. Such an approach would enhance productivity and open up new possibilities for complex operations in various industries. By leveraging the coordinated efforts of multiple robots, tasks could be completed faster and with greater accuracy, thereby optimizing the overall performance.
Additionally, AMRs can be combined with the Segment Anything Model (SAM) [43], a large zero-shot segmentation model, to create automatic inspection robots capable of detecting road damage such as cracks. This application has significant market potential and practical value in infrastructure maintenance. Integrating SLAM-based localization and navigation would further enhance task completion, making this a highly promising research direction. Furthermore, the potential for multi-agent collaboration introduces an additional layer of efficiency and innovation, expanding the capabilities and applications of AMRs across various sectors.
Practical applications of the AMSeWL-R system are diverse and include roles in libraries, museums, construction sites, and infrastructure maintenance. These applications require careful consideration of the dimensions, geometries, and weights of the objects that the equipment can handle as well as the robot’s ability to navigate barriers and understand its maximum loading height and minimum corridor width. By addressing these considerations, the AMSeWL-R system can significantly enhance the practical applications of AMRs, making them more versatile and efficient in various real-world scenarios.

7. Conclusions

The proposed AMSeWL-R system enhances the intelligence of AMRs by integrating multiple sensors, including cameras and LiDARs. This integration significantly improves the robot’s environmental perception and operational capabilities, allowing for more accurate and efficient task execution. By integrating real-time SLAM with efficient object detection and providing sophisticated visualization through mobile-ROS interaction, the system demonstrates substantial advancements towards smarter and more versatile AMRs. Specifically, the novel mobile-ROS interaction method facilitates seamless real-time communication and control between mobile devices and the ROS host, enhancing user interaction. The proposed YOLOv8-R, optimized with TensorRT acceleration on the Jetson Nano, demonstrates significant improvements in detection speed. YOLOv8-R has a far smaller parameter count than YOLOv8n (0.1 M vs. 3.0 M) and achieves an approximately 143% increase in detection speed on the embedded device (from 14 to 34 FPS), proving itself effective for real-time applications in resource-constrained environments. The combination of these functions positions AMSeWL-R as a robust solution for complex, dynamic industrial environments. We believe that our AMSeWL-R system can boost the development of AMRs, particularly in terms of environmental perception and user experience.

Author Contributions

Supervision, L.Z.; Writing—original draft, X.L.; Writing—review and editing, L.Z. and Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under grant 62272343, in part by the Shuguang Program of Shanghai Education Development Foundation and Shanghai Municipal Education Commission under grant 21SG23, and in part by the Fundamental Research Funds for the Central Universities.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Siegwart, R.; Nourbakhsh, I.R.; Scaramuzza, D. Introduction to Autonomous Mobile Robots; MIT Press: Cambridge, MA, USA, 2011. [Google Scholar]
  2. Fragapane, G.; Ivanov, D.; Peron, M.; Sgarbossa, F.; Strandhagen, J.O. Increasing Flexibility and Productivity in Industry 4.0 Production Networks with Autonomous Mobile Robots and Smart Intralogistics. Ann. Oper. Res. 2022, 308, 125–143. [Google Scholar] [CrossRef]
  3. Balakrishnan, S.; Azman, A.D.; Nisar, J.; Ejodame, O.E.; Cheng, P.S.; Kin, T.W.; Yi, Y.J.; Das, S.R. IoT-Enabled Smart Warehousing with AMR Robots and Blockchain: A Comprehensive Approach to Efficiency and Safety. In Proceedings of the International Conference on Mathematical Modeling and Computational Science (ICMMCS), Madurai, Tamilnadu, India, 24–25 February 2023; pp. 261–270. [Google Scholar]
  4. Chen, Y.; Luo, Y.; Yang, C.; Yerebakan, M.O.; Hao, S.; Grimaldi, N.; Li, S.; Hayes, R.; Hu, B. Human Mobile Robot Interaction in the Retail Environment. Sci. Data 2022, 9, 673. [Google Scholar] [CrossRef] [PubMed]
  5. Fragapane, G.; Hvolby, H.H.; Sgarbossa, F.; Strandhagen, J.O. Autonomous Mobile Robots in Hospital Logistics. In Proceedings of the IFIP International Conference on Advances in Production Management Systems (APMS), Novi Sad, Serbia, 30 August–3 September 2020; pp. 672–679. [Google Scholar]
  6. Saleh, S.A.M.; Suandi, S.A.; Ibrahim, H.; Hamad, Q.S.; Al Amoudi, I. AGVs and AMRs Robots: A Brief Overview of the Differences and Navigation Principles. In Proceedings of the International Conference on Robotics, Vision, Signal Processing and Power Applications (RoViSP), Penang, Malaysia, 28–29 August 2023; pp. 255–260. [Google Scholar]
  7. Loganathan, A.; Ahmad, N.S. A Systematic Review on Recent Advances in Autonomous Mobile Robot Navigation. Eng. Sci. Technol. Int. J. 2023, 40, 101343. [Google Scholar] [CrossRef]
  8. Tadić, S.; Krstić, M.; Dabić-Miletić, S.; Božić, M. Smart Material Handling Solutions for City Logistics Systems. Sustainability 2023, 15, 6693. [Google Scholar] [CrossRef]
  9. Padilla, R.; Netto, S.L.; Da Silva, E.A. A Survey on Performance Metrics for Object-Detection Algorithms. In Proceedings of the IEEE International Conference on Systems, Signals and Image Processing (IWSSIP), Niteroi, Brazil, 1–3 July 2020; pp. 237–242. [Google Scholar]
  10. Joseph, K.; Khan, S.; Khan, F.S.; Balasubramanian, V.N. Towards Open World Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 5830–5840. [Google Scholar]
  11. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  12. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot Multibox Detector. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37. [Google Scholar]
  13. Jiang, P.; Ergu, D.; Liu, F.; Cai, Y.; Ma, B. A Review of YOLO Algorithm Developments. Procedia Comput. Sci. 2022, 199, 1066–1073. [Google Scholar] [CrossRef]
  14. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-time Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  15. Durrant-Whyte, H.; Bailey, T. Simultaneous Localization and Mapping: Part I. IEEE Robot. Autom. Mag. 2006, 13, 99–110. [Google Scholar] [CrossRef]
  16. Chen, C.W.; Lin, C.L.; Hsu, J.J.; Tseng, S.P.; Wang, J.F. Design and Implementation of AMR Robot Based on RGBD, VSLAM and SLAM. In Proceedings of the International Conference on Orange Technology (ICOT), Tainan, Taiwan, 16–17 December 2021; pp. 1–5. [Google Scholar] [CrossRef]
  17. Cadena, C.; Carlone, L.; Carrillo, H.; Latif, Y.; Scaramuzza, D.; Neira, J.; Reid, I.; Leonard, J.J. Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age. IEEE Trans. Robot. 2016, 32, 1309–1332. [Google Scholar] [CrossRef]
  18. Hess, W.; Kohler, D.; Rapp, H.; Andor, D. Real-Time Loop Closure in 2D LIDAR SLAM. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016; pp. 1271–1278. [Google Scholar] [CrossRef]
  19. Zea, A.; Hanebeck, U.D. Iviz: A ROS Visualization App for Mobile Devices. Softw. Impacts 2021, 8, 100057. [Google Scholar] [CrossRef]
  20. RViz. Available online: https://wiki.ros.org/rviz (accessed on 1 June 2024).
  21. Costa, G.d.M.; Petry, M.R.; Moreira, A.P. Augmented Reality for Human-Robot Collaboration and Cooperation in Industrial Applications: A Systematic Literature Review. Sensors 2022, 22, 2725. [Google Scholar] [CrossRef] [PubMed]
  22. Xiao, Y.; Tian, Z.; Yu, J.; Zhang, Y.; Liu, S.; Du, S.; Lan, X. A Review of Object Detection Based on Deep Learning. Multimed. Tools Appl. 2020, 79, 23729–23791. [Google Scholar] [CrossRef]
  23. Deng, J.; Xuan, X.; Wang, W.; Li, Z.; Yao, H.; Wang, Z. A Review of Research on Object Detection Based on Deep Learning. J. Phys. Conf. Ser. 2020, 1684, 012028. [Google Scholar] [CrossRef]
  24. Kaur, R.; Singh, S. A Comprehensive Review of Object Detection with Deep Learning. Digit. Signal Process. 2023, 132, 103812. [Google Scholar] [CrossRef]
  25. Carranza-García, M.; Torres-Mateo, J.; Lara-Benítez, P.; García-Gutiérrez, J. On the Performance of One-Stage and Two-Stage Object Detectors in Autonomous Vehicles Using Camera Data. Remote Sens. 2020, 13, 89. [Google Scholar] [CrossRef]
  26. Cheng, M.; Bai, J.; Li, L.; Chen, Q.; Zhou, X.; Zhang, H.; Zhang, P. Tiny-RetinaNet: A One-Stage Detector for Real-Time Object Detection. In Proceedings of the International Conference on Graphics and Image Processing (ICGIP 2019), Hangzhou, China, 12–14 October 2019; Volume 11373, pp. 195–202. [Google Scholar]
  27. Pham, M.T.; Courtrai, L.; Friguet, C.; Lefèvre, S.; Baussard, A. YOLO-Fine: One-Stage Detector of Small Objects under Various Backgrounds in Remote Sensing Images. Remote Sens. 2020, 12, 2501. [Google Scholar] [CrossRef]
  28. Diwan, T.; Anirudh, G.; Tembhurne, J.V. Object Detection Using YOLO: Challenges, Architectural Successors, Datasets and Applications. Multimed. Tools Appl. 2023, 82, 9243–9275. [Google Scholar] [CrossRef] [PubMed]
  29. Dellaert, F.; GTSAM Contributors. borglab/gtsam; Georgia Tech Borg Lab: Atlanta, GA, USA, 2022. [Google Scholar] [CrossRef]
  30. Jiao, J.; Ye, H.; Zhu, Y.; Liu, M. Robust Odometry and Mapping for Multi-Lidar Systems with Online Extrinsic Calibration. IEEE Trans. Robot. 2021, 38, 351–371. [Google Scholar] [CrossRef]
  31. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520. [Google Scholar]
  32. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
  33. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. Shufflenet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 6848–6856. [Google Scholar]
  34. Chen, J.; Kao, S.h.; He, H.; Zhuo, W.; Wen, S.; Lee, C.H.; Chan, S.H.G. Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 12021–12031. [Google Scholar]
  35. Ma, X.; Dai, X.; Bai, Y.; Wang, Y.; Fu, Y. Rewrite the Stars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 5694–5703. [Google Scholar]
  36. Liu, Z.; Sun, M.; Zhou, T.; Huang, G.; Darrell, T. Rethinking the Value of Network Pruning. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019; pp. 1–21. [Google Scholar]
  37. Liu, Z.; Li, J.; Shen, Z.; Huang, G.; Yan, S.; Zhang, C. Learning Efficient Convolutional Networks through Network Slimming. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2736–2744. [Google Scholar]
  38. Molchanov, P.; Mallya, A.; Tyree, S.; Frosio, I.; Kautz, J. Importance Estimation for Neural Network Pruning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 11264–11272. [Google Scholar]
  39. Lee, J.; Park, S.; Mo, S.; Ahn, S.; Shin, J. Layer-adaptive Sparsity for the Magnitude-based Pruning. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual Event, Austria, 3–7 May 2021. [Google Scholar]
  40. FreeRTOS. Available online: https://www.freertos.org (accessed on 1 June 2024).
  41. Slamtec. Available online: https://www.slamtec.com (accessed on 1 June 2024).
  42. Robosense. Available online: https://www.robosense.ai (accessed on 1 June 2024).
  43. Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment Anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Vancouver, BC, Canada, 17–24 June 2023; pp. 4015–4026. [Google Scholar]
Figure 1. Overview of the AMSeWL-R system. The simulated structure of the wheeled lifting robot demonstrates its mechanical subsystem. The vision subsystem includes both vision-based and LiDAR-based perception. Vision-based perception utilizes the proposed YOLOv8-R for object detection and is enhanced with TensorRT acceleration for efficient edge computation on an NVIDIA Jetson Nano (Santa Clara, CA, USA). LiDAR-based perception employs Google’s Cartographer [18] for real-time mapping and is integrated with the ROS system and user interface on mobile devices for control and visualization. The proposed system processes estimated object positions and control commands for effective operation.
Figure 2. Different backbone architectures of YOLOv8 and YOLOv8-R: YOLOv8 uses C2f modules, while the proposed YOLOv8-R employs Star Blocks.
Figure 3. Design components of the wheeled lifting robot. (a) The overall structure of the wheeled lifting robot is equipped with a lifting mechanism and storage for cylindrical objects. (b) Detailed view of the manipulative structure highlighting the mechanical arm and its components. (c) The chassis structure showcases the robust wheel design and framework intended for stable movement and efficient navigation in varied environments.
Figure 4. Key components of the AMSeWL-R system’s electronic control unit. (a) The core control board integrates essential processing capabilities. (b) The wheel motor is used for optimal locomotion efficiency. (c) The lift motor provides precise elevation control. (d) The robot arm yaw axis motor ensures accurate rotational movement. Detailed specifications and functionalities for these components are provided in Table 1.
Figure 5. Visual analysis of the prediction results. The figure demonstrates YOLOv8-R’s capability to accurately detect objects in real scenarios.
Figure 6. (a) The location of the mapping experiment: an underground parking lot at Tongji University, Jiading Campus, Shanghai, China, where we conducted our mapping tests. (b) Visualization on a mobile device using Iviz showing the scanning results from the parking lot. Notably, the gaps in the scanned image correspond to the positions of cars in the parking lot. This illustrates the effectiveness and accuracy of the SLAM data captured and visualized using Iviz.
Figure 7. (a) RPLIDAR A1M8 LiDAR [41]: utilized for real environmental mapping. (b) LiDAR and mobile interaction: showcases real-time scanning, map construction, and mobile integration.
Figure 8. (a) Illustration of the switching process from 2D to 3D and (b) from 3D to 2D. The yellow numbers 1–3 in the top left corners indicate the execution order. Switching between 2D and 3D views is achieved using ROS on a Mac, with remote operation tests performed via a mobile device running the improved Iviz.
Figure 9. (a) Wheeled lifting robot in working condition, demonstrating its capability to handle various objects. (b) Environment from the camera’s perspective, showcasing the robot’s view of and interaction with the surroundings. (c,d) Real-world tests to detect a cola and box using YOLOv8-R.
Table 1. Specifications of the control core board and three motors.
Component | Model No. | Features
Control Core Board | RoboMaster Development Board C | STM32 main control chip, rich interfaces, compact structure, integrated high-precision IMU sensor, strong protection features
Wheel Motor | RoboMaster M3508 | CAN bus control, dual closed-loop, max output power: 220 W, torque: 5 N·m
Lifting Motor | RoboMaster M2006 | High precision, small size, max torque: 1000 mN·m, power: 44 W, max speed: 500 rpm
Yaw Motor | RoboMaster GM6020 | Powered by 24 V DC, speed control via CAN/PWM, built-in angle sensor, FOC technology
Table 2. Experimental results of the proposed YOLOv8-R compared to four different sizes of YOLOv8 on the collected test dataset. The up arrow (↑) indicates that higher values are better, and the down arrow (↓) indicates that lower values are better.
Model | mAP@0.5 ↑ | mAP@0.5:0.95 ↑ | Parameters/M ↓ | FLOPs/B ↓ | FPS ↑
YOLOv8n | 0.960 | 0.841 | 3.0 | 8.1 | 283.0
YOLOv8s | 0.965 | 0.842 | 11.1 | 28.4 | 115.3
YOLOv8m | 0.971 | 0.847 | 25.8 | 78.7 | 38.3
YOLOv8l | 0.980 | 0.861 | 43.5 | 164.8 | 26.4
YOLOv8-R (Ours) | 0.973 | 0.821 | 0.1 | 1.0 | 639.8
Table 3. Ablation study for the proposed YOLOv8-R. The up arrow (↑) indicates that higher values are better, and the down arrow (↓) indicates that lower values are better.
Model | Parameters/M ↓ | ΔParam | FLOPs/B ↓ | ΔFLOPs | FPS ↑ | ΔFPS
Baseline (YOLOv8n) | 3.0 | - | 8.1 | - | 283.0 | -
+ Star Block | 2.2 | −26.7% | 6.5 | −19.8% | 286.2 | +1.1%
+ LSCDH | 1.4 | −53.3% | 4.5 | −44.4% | 314.9 | +11.3%
+ LAMP Pruning (YOLOv8-R) | 0.1 | −96.7% | 1.0 | −87.7% | 639.8 | +126.1%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
