Article

LiDAR–Visual–Inertial Multi-UGV Collaborative SLAM Framework

1 College of Future Technology, Shanghai University, Shanghai 201900, China
2 Shanghai SAGE Intelligent Technology Co., Ltd., Shanghai 201100, China
* Author to whom correspondence should be addressed.
Drones 2026, 10(1), 31; https://doi.org/10.3390/drones10010031
Submission received: 2 December 2025 / Revised: 30 December 2025 / Accepted: 1 January 2026 / Published: 5 January 2026

Highlights

What are the main findings?
  • A LiDAR–visual–inertial fusion-based collaborative SLAM framework is proposed, achieving precise fused mapping and trajectory estimation for UGVs in GPS-denied environments.
  • The proposed framework consists of three stages. The first provides accurate odometry information for each UGV, the second achieves global map generation based on local map similarities, and the third constructs a multi-UGV global map. It is validated through real-world experiments, demonstrating remarkable performance compared to existing methods.
What are the implications of the main findings?
  • This framework enhances the robustness of multi-UGV operations in large-scale, challenging environments, supporting applications in autonomous logistics and search and rescue missions.
  • It advances C-SLAM technology by emphasizing front-end sensor fusion, reducing the dependency on back-end integration alone and facilitating scalable deployment in real-world unmanned systems.

Abstract

The collaborative execution of tasks by multiple Unmanned Ground Vehicles (UGVs) has become a development trend in the field of unmanned systems. Existing collaborative Simultaneous Localization and Mapping (SLAM) frameworks mainly employ visual–inertial or LiDAR–inertial methods, whereas collaborative SLAM systems that fuse all three sensor types, namely LiDAR, camera, and IMU, remain relatively uncommon. As a result, existing systems often cannot achieve robust and accurate global localization in real-world environments. To address this issue, a LiDAR–visual–inertial multi-UGV collaborative SLAM framework is proposed in this paper. The whole system is divided into three parts. The first part constructs a front-end odometry that integrates the raw information from LiDAR, visual, and inertial sensors, providing accurate initial pose estimation and local mapping of each UGV for the collaborative system. The second part exploits the similarity between different local maps to form a global map of the environment. The third part achieves global localization and mapping optimization for the multi-UGV system. To verify the effectiveness of the proposed framework, a series of real-world experiments has been conducted. Over an average trajectory length of 237 m, the framework achieves a mean Absolute Pose Error (APE) of 1.49 m and a mean Relative Pose Error (RPE) of 1.68° after global optimization. The experimental results demonstrate that the proposed framework achieves superior collaborative localization and mapping performance, with the mean APE reduced by 5.4% and the mean RPE reduced by 1.4% compared with other methods.

1. Introduction

Over the past few decades, SLAM has made remarkable advancements, enabling UGVs to concurrently localize themselves and construct maps in unknown environments [1,2]. Building on the rapid development of SLAM technology, autonomous vehicles [3], drones [4], and home robots [5] are now widely used. Although existing SLAM methods enable agents to complete tasks in certain constrained environments, they still cannot meet the needs of practical applications.
Nowadays, it is necessary to utilize multiple agents to perform more challenging tasks collaboratively [6]. Collaborative SLAM (C-SLAM) has therefore become a technology in urgent need of development and breakthrough, yet existing C-SLAM methods mainly focus on improving accuracy and robustness in simulated environments [7].
However, implementing multi-UGV collaborative SLAM presents some challenges. UGVs need to sense their surroundings while understanding their current state. Furthermore, UGVs need to understand the poses and environmental information of other UGVs. The challenges become particularly difficult when a Global Positioning System (GPS) is unavailable. To address the challenges above, UGVs need to reconstruct a local map by fusing multi-source sensors and build a global map by exchanging information.
In this paper, a LiDAR–visual–inertial multi-UGV collaborative SLAM framework is proposed. It helps the multi-UGV system to achieve high-precision collaborative mapping without GPS or geo-tagged databases. To summarize, the features of the proposed framework are the following:
A new LiDAR–visual–inertial multi-UGV collaborative SLAM framework is proposed, which mainly comprises odometry, segment management, and global optimization modules.
A high-precision front-end odometry system integrating images, LiDAR point clouds and IMU data has been proposed, which effectively enhances the pose estimation accuracy and global map construction accuracy of the collaborative SLAM system.
A large number of experiments have been conducted to verify the positioning accuracy and global robustness of the proposed framework.

2. Related Works

Applying SLAM to multi-UGV collaboration through multi-sensor fusion involves a series of stages that intersect diverse research domains. To establish the necessary background, this section reviews the pertinent literature. Section 2.1 covers advancements in front-end odometry systems within SLAM, whereas Section 2.2 covers the back-end fusion and optimization of SLAM. Additionally, Section 2.3 introduces existing technologies for collaborative SLAM systems.

2.1. Front-End Odometry Systems

In recent years, numerous studies in the field of collaborative SLAM have utilized various types of sensors, primarily cameras, LiDAR, and Inertial Measurement Units (IMUs) [8]. Visual-based methods are preferred by many researchers due to the low cost and ease of deployment of such sensors. Early seminal works predominantly employed feature-based approaches, which estimate the camera pose by detecting feature points in the image, as exemplified by ORB-SLAM2 [9]. However, monocular feature-based methods lack metric scale information, leading to scale uncertainty and scale drift in trajectory estimation. To address this, integrating an IMU sensor recovers metric scale and provides an effective solution to tracking failures in challenging environments; ORB-SLAM3 [10] and OpenVINS [11] exemplify this approach. In recent years, researchers have also devoted considerable effort to visual–inertial SLAM. Liu et al. proposed RDS-SLAM [12], enabling real-time and robust tracking in dynamic environments. In 2024, Fan et al. proposed SchurVINS [13], which significantly improves computational efficiency, rendering it deployable on resource-constrained devices such as autonomous vehicles.
Visual sensors are susceptible to illumination and texture variations, whereas LiDAR-based SLAM addresses these limitations by providing high-density 3D point clouds with geometric structure, enabling robust pose estimation in textureless or low-light environments. Consequently, LiDAR has been extensively applied in front-end odometry. Unlike visual methods' reliance on image features, LiDAR SLAM employs scan-matching techniques such as the Iterative Closest Point (ICP) algorithm [14]. Through such methods, point clouds can be aligned frame to frame, offering superior accuracy in large-scale outdoor scenarios. To accelerate registration, Shan and Englot proposed LeGO-LOAM [15], incorporating point cloud clustering and ground segmentation into the data preprocessing stage. More recently, FAST-LIO2 [16] has offered a fast and versatile LiDAR-inertial odometry framework. He et al. proposed PLACE-LIO [17], a plane-centric LiDAR-inertial odometry method combined with a plane-occupancy voxel grid for map representation. Studies such as LVI-SAM [18] further incorporate visual data into LiDAR-based methods.

2.2. Back-End Fusion and Optimization

Back-end optimization is crucial for achieving high-precision, globally consistent trajectory estimation and map representation. It systematically corrects the drift errors accumulated in the agents' trajectories through pose graph optimization and loop closure detection. For example, LOAM [19] constructs an error function based on the distances from points to lines and planes, and solves the resulting nonlinear pose optimization problem. DL-SLOT [20] proposes a sliding-window-based 3D multi-object tracking method that integrates the state estimation of static and dynamic objects in the environment into a unified optimization framework. EPL-VINS [21] introduces a novel representation of spatial lines, constructs line reprojection residuals based on it, and performs two-degree-of-freedom (2-DoF) optimization of the spatial lines in the back end.
Researchers have also conducted extensive work in the domain of loop closure optimization techniques. Loopy-SLAM [22] employs frame-to-model tracking via a data-driven, point-based submap generation approach, while enabling online loop closure detection through global place recognition. Shi et al. propose a novel multi-head network, termed LCR-Net [23], which integrates loop closure and relocalization within a unified framework. The 2DLIW-SLAM [24] framework integrates multi-sensor data from 2D LiDAR, inertial measurement unit (IMU), and wheel odometry, while introducing a novel loop closure detection algorithm based on global feature point matching.

2.3. Collaborative SLAM Systems

Deploying multi-UGV collaborative SLAM introduces several key challenges. Beyond estimating their own poses, UGVs must effectively sense the surrounding environment. Additionally, they require mechanisms to comprehend the poses and environmental observations of fellow UGVs. Over recent years, a growing body of research has explored collaborative systems. Ref. [25] proposed a visual SLAM algorithm combining point, line, and plane features for collaborative optimization. Additionally, Ref. [26] proposed an active visual SLAM approach with a multi-agent centralized–distributed architecture. Recent studies on vision-based methods have focused on reducing the data volume exchanged during fusion; Ref. [27] utilized sparse distance measurements and odometry information for efficient localization. Vision-based collaborative research has thus gradually matured. On the LiDAR side, Ref. [28] introduced a cooperative LiDAR-based SLAM approach for localization in unfamiliar environments with minimal data exchange. Ref. [29] introduced a C-SLAM system supporting LiDAR, stereo, and RGB-D sensing. Ref. [30] proposed a multi-robot collaborative SLAM approach using LiDAR data with a new environmental feature descriptor. LiDAR thus provides a new avenue for multi-robot collaboration.

3. Proposed Methods

3.1. Framework Overview

An overview of the proposed collaborative SLAM framework is illustrated in Figure 1. Each individual UGV is equipped with identical modules, including the front-end odometry module and local segment management module. Additionally, there is a shared global optimization module, enabling collaborative localization and mapping among multiple UGVs. The specific process is as follows. Firstly, LiDAR point cloud, camera image and IMU data are captured by sensors on UGVs. Secondly, these sensor data are fed into the LiDAR–visual–inertial odometry module to extract features and estimate odometry information. Then, the extracted de-skewed point cloud and LiDAR odometry data are sent to the Segment Management module to construct a local map. Finally, the local map is forwarded to the shared global optimizer. Subsequent loop closure detection and pose graph optimization refine the trajectory estimates. This refinement thereby culminates in the finalization of multi-UGV collaborative localization and mapping.

3.2. Front-End Odometry

As shown in Figure 1, the processing pipeline utilizes point cloud, image, and IMU measurement information as the input for the front-end odometry module. The module contains two key sub-systems: a LiDAR–inertial system (LIS) and a visual–inertial system (VIS). The LIS module extracts LiDAR features to obtain the de-skewed point cloud and uses these features together with the VIS to calculate the LiDAR odometry. Besides this, the VIS module estimates the visual odometry by minimizing the combined residuals of visual and IMU data. Finally, the de-skewed point cloud is exported to the backend along with the LiDAR odometry.
The LIS comprises feature extraction and feature matching modules to extract point cloud features and compute the LiDAR odometry. Specifically, IMU measurements are used to de-skew the raw scan, yielding the de-skewed point cloud P_ds at time k, while edge and plane features are extracted from the raw point cloud. After extraction, the edge and plane features are used solely for feature matching. The VIS is mainly composed of three submodules: depth registration, feature tracking, and LiDAR-aided visual–inertial odometry. The initialization of the visual–inertial odometry within the VIS is prone to divergence because the problem is highly nonlinear. This is exacerbated when the sensor moves slowly or at a constant speed, since insufficient acceleration excitation renders the scale unobservable. Additionally, monocular cameras lack depth information. The de-skewed point cloud P_ds can provide depth references for visual odometry and is therefore passed to the VIS to assist depth estimation. During the VIS initialization phase, the output of the LiDAR-aided visual–inertial odometry is fed back to the depth registration module. This feedback uses the estimated inter-frame poses to register and accumulate multiple consecutive de-skewed LiDAR frames, generating a denser (though still sparse) depth map that significantly enhances the reliability of depth association for visual features. Specifically, the LIS is integrated into the visual–inertial odometry calculations, leveraging LiDAR's ability to directly measure depth. During initialization, the LIS is initialized first, and its estimates serve as initial values. Visual features are then associated with LiDAR depth points via interpolation, with both projected onto a camera-centered unit sphere. After downsampling, the depth points are stored in polar coordinates, and a 2D K-D tree is built to find the three closest depth points for each visual feature. Notably, the IMU bias between two key image frames is assumed to be constant during this process. This depth association significantly enhances the initialization speed and robustness of the VIS.
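The depth-association step can be sketched in a few lines of Python. The snippet below is only a minimal illustration under simplifying assumptions: it averages the ranges of the three nearest LiDAR points instead of interpolating on the plane through them, and it ignores azimuth wrap-around in the polar K-D tree. All function and variable names are hypothetical rather than taken from the authors' implementation.

import numpy as np
from scipy.spatial import cKDTree

def to_polar(unit_dirs):
    # unit_dirs: (N, 3) bearing vectors on the camera-centered unit sphere
    az = np.arctan2(unit_dirs[:, 1], unit_dirs[:, 0])      # azimuth
    el = np.arcsin(np.clip(unit_dirs[:, 2], -1.0, 1.0))    # elevation
    return np.stack([az, el], axis=1)

def associate_depth(feature_dirs, lidar_points):
    # Project LiDAR points onto the unit sphere and keep their ranges as depth.
    ranges = np.linalg.norm(lidar_points, axis=1)
    lidar_dirs = lidar_points / ranges[:, None]
    tree = cKDTree(to_polar(lidar_dirs))          # 2D K-D tree in (azimuth, elevation)
    # Three closest depth points per visual feature, as described in the text.
    _, idx = tree.query(to_polar(feature_dirs), k=3)
    # Simplified depth hypothesis: average range of the three neighbours
    # (the paper interpolates on the three points instead).
    return ranges[idx].mean(axis=1)

# Toy usage with random data.
rng = np.random.default_rng(0)
pts = rng.normal(size=(500, 3)) * 5.0 + np.array([10.0, 0.0, 0.0])
feats = pts[:10] / np.linalg.norm(pts[:10], axis=1, keepdims=True)
print(associate_depth(feats, pts))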
Despite advancements in initialization, the VIS remains susceptible to runtime failures in challenging operational scenarios, particularly under abrupt motion changes, varying illumination, or sparsely textured environments. Rapid movements or textureless surroundings cause a significant drop in tracked features, which may lead to optimization divergence and large IMU bias estimates. Consequently, the VIS is considered to have failed if the number of tracked features drops below a set threshold or the IMU bias estimate exceeds a threshold. Upon failure, the VIS reinitializes and alerts the LIS so that the corrupted estimates do not contaminate it.
Following the depth-assisted initialization, convergence is accelerated, and the overall robustness of the VIS is enhanced. Then, the system proceeds to the continuous tracking phase. In the feature tracker module, corner points are detected as visual features and tracked by the Kanade–Lucas–Tomasi optical flow method [14]. This can maintain feature correspondence across frames, enabling system state estimation. The system state is represented as
x = ( R, p, v, b ),
where R ∈ SO(3) is the rotation matrix, p ∈ ℝ^3 is the translation vector, v ∈ ℝ^3 is the linear velocity, b = [b_a, b_w] is the IMU bias vector for acceleration and angular velocity, and the transformation from the local coordinate system to the world coordinate system can be represented as T = [R | p].
Meanwhile, a sliding window mechanism is used to minimize reprojection errors and IMU measurement residuals, and the optimization formula is
min_x { Σ_{(i,j,k)∈F} ‖ r_reproj^{i,j,k}(x) ‖²_{Σ_reproj} + Σ_{(j,k)∈M} ‖ r_IMU^{j,k}(x) ‖²_{Σ_IMU} },
where r_reproj^{i,j,k}(x) represents the residual of feature point i observed in frames j and k, and r_IMU^{j,k}(x) represents the residual of the IMU pre-integration between the two frames. F is the set of feature-point observations, M is the set of IMU measurements, and Σ_reproj and Σ_IMU are the corresponding covariance matrices.
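To make the windowed objective concrete, the following sketch evaluates the combined cost for given residual vectors. The reprojection and IMU pre-integration residuals are supplied as inputs rather than computed, and all names are hypothetical; this is not the authors' solver, only an illustration of the covariance-weighted (Mahalanobis) sum.

import numpy as np

def mahalanobis_sq(r, cov):
    # ||r||^2_Sigma = r^T Sigma^{-1} r
    return float(r @ np.linalg.solve(cov, r))

def window_cost(reproj_residuals, reproj_cov, imu_residuals, imu_cov):
    # reproj_residuals: list of 2-vectors (pixel errors); imu_residuals: list of 15-vectors
    cost = sum(mahalanobis_sq(r, reproj_cov) for r in reproj_residuals)
    cost += sum(mahalanobis_sq(r, imu_cov) for r in imu_residuals)
    return cost

# Toy usage: two feature observations and one pre-integration residual.
print(window_cost([np.array([0.5, -0.2]), np.array([0.1, 0.3])],
                  np.eye(2) * 1.5**2,
                  [np.zeros(15) + 0.01],
                  np.eye(15) * 0.1))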
Following this optimization, the refined visual–inertial odometry is derived through LiDAR-aided visual–inertial odometry estimation. Together with the visual odometry from the VIS, the previously extracted edge and plane features are matched against the feature map maintained in a sliding window via scan matching, producing the estimated LiDAR odometry T_k. Lastly, T_k, along with the de-skewed point cloud P_ds, is passed to the backend for mapping.

3.3. Segment Management

The segment management module is specifically designed for the construction and management of the local maps. The tightly coupled LiDAR–visual–inertial odometry and the de-skewed point cloud from the front-end are received as the module input. The data structure in this module is shown in Figure 2.
Each submap comprises a pose segment and a local map, enabling localized trajectory estimation and environmental modeling. The pose segment is constituted by nodes and edges. The nodes represent the absolute LiDAR poses, capturing the sensor's position and orientation at discrete timestamps. These nodes are linked by edges that represent the relative poses between frames. The initial pose incorporated is designated as the base node. The local point cloud map is composed of multiple frames of de-skewed point clouds under the coordinate system of the reference node. Specifically, the input data pair (P_ds^k, T_{k-1}^{k}) at time k is transformed to the reference node coordinate system of the submap. The local point cloud is aligned by minimizing the distance between the point cloud of the current frame P_ds^k and the previous frame P_ds^{k-1}, formulated as
min_T Σ_i ‖ p_k^i − T_{k-1}^{k} p_{k-1}^i ‖²,
where p_k^i and p_{k-1}^i, respectively, represent the i-th pair of corresponding points in the point clouds of frame k and frame k−1, while T_{k-1}^{k} denotes the optimized relative transformation matrix.
Finally, the pose graph is updated: the optimized relative pose T_{k-1}^{k} is added as an edge, and the current frame pose is added as a node, where the update is given by
P_k = P_{k-1} ⊕ T_{k-1}^{k},
Here, P_k and P_{k-1} represent the node poses at time k and time k−1, and ⊕ denotes the pose composition operator, which combines the rotation and translation.
It should be noted that, to ensure pose accuracy while reducing redundant computation, scan-to-map registration is performed every five frames.
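The submap bookkeeping described above can be illustrated with a short, simplified sketch. It assumes 4 × 4 homogeneous transforms, composes node poses by matrix multiplication, accumulates de-skewed clouds in the base-node frame, and leaves the periodic scan-to-map refinement as a placeholder; the class and function names are hypothetical, not the authors' code.

import numpy as np

class Submap:
    """Minimal submap holder: node poses (4x4), edges (relative 4x4), fused cloud."""
    def __init__(self, base_pose):
        self.nodes = [base_pose]          # first node is the base (reference) node
        self.edges = []                   # relative transforms between consecutive nodes
        self.cloud = np.empty((0, 3))     # points expressed in the base-node frame

    def add_frame(self, T_rel, points_k, frame_idx, refine_every=5):
        # Pose-graph update: P_k = P_{k-1} (+) T_{k-1->k}, i.e. composition in SE(3).
        P_k = self.nodes[-1] @ T_rel
        self.nodes.append(P_k)
        self.edges.append(T_rel)
        # Transform the de-skewed cloud of frame k into the base-node frame and accumulate.
        T_base_k = np.linalg.inv(self.nodes[0]) @ P_k
        pts_h = np.hstack([points_k, np.ones((len(points_k), 1))])
        self.cloud = np.vstack([self.cloud, (pts_h @ T_base_k.T)[:, :3]])
        # Scan-to-map registration only every few frames to limit redundant computation;
        # the ICP/NDT refinement itself is omitted in this sketch.
        if frame_idx % refine_every == 0:
            pass  # refine self.nodes[-1] against self.cloud

# Toy usage: straight-line motion with identity rotation.
sm = Submap(np.eye(4))
step = np.eye(4); step[0, 3] = 0.5
for k in range(1, 11):
    sm.add_frame(step, np.random.rand(100, 3), k)
print(len(sm.nodes), sm.cloud.shape)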

3.4. Global Optimizer

The Global Optimizer is composed of loop detection and pose graph optimization. Figure 3 shows a clear illustration of the pose graph optimization process. Through this optimization, the coordinate systems of multiple UGVs are unified, cumulative errors are eliminated, and multi-session maps are integrated.
To fuse sub-maps from multiple UGVs into a global map, a global optimizer module with loop detection and pose optimization is proposed. Inspired by GACM [31], obstacle contours are extracted to ensure scene consistency and to generate thumbnails for matching. Firstly, the ground plane normal vector N_p is estimated using the RANSAC [32] algorithm. Subsequently, point clouds within a threshold distance from the ground are removed, retaining obstacle point clouds, which are projected onto the ground plane along the normal vector. The projected point cloud is aligned to the Z-axis using a rotation matrix, generating a 50 m × 50 m thumbnail with a resolution of 0.5 m/pixel. Notably, thumbnails are generated every 40 frames to balance scene coverage and computational efficiency.
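As a rough illustration of the thumbnail generation, the following sketch rasterises obstacle points into a 50 m × 50 m grid at 0.5 m/pixel, matching the values above. For brevity, the RANSAC plane fit and the rotation that aligns the projection with the Z-axis are omitted, and a known ground height is assumed; all names are hypothetical.

import numpy as np

def make_thumbnail(points, ground_z=0.0, clearance=0.3, extent=50.0, resolution=0.5):
    """Occupancy thumbnail of obstacle points projected onto the ground plane.
    The RANSAC ground fit is replaced by a fixed ground height for brevity."""
    # Remove points within a threshold distance of the ground plane.
    obstacles = points[points[:, 2] > ground_z + clearance]
    # Project onto the plane and rasterise into an (extent/resolution)^2 grid.
    n = int(extent / resolution)                       # 100 x 100 pixels for 50 m / 0.5 m
    img = np.zeros((n, n), dtype=np.uint8)
    ij = np.floor((obstacles[:, :2] + extent / 2.0) / resolution).astype(int)
    keep = (ij >= 0).all(axis=1) & (ij < n).all(axis=1)
    img[ij[keep, 1], ij[keep, 0]] = 255                # mark occupied cells
    return img

# Toy usage: a ring-shaped "building" plus flat ground points.
rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, 2000)
wall = np.stack([15 * np.cos(theta), 15 * np.sin(theta), rng.uniform(0, 3, 2000)], axis=1)
ground = np.stack([rng.uniform(-25, 25, 2000), rng.uniform(-25, 25, 2000), np.zeros(2000)], axis=1)
print((make_thumbnail(np.vstack([wall, ground])) > 0).sum(), "occupied pixels")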
After generating the thumbnails, scene similarity is measured using NetVLAD [33] descriptors. To incorporate geometric constraints for submap alignment, the base node poses of all sub-maps are indexed in a K-D tree data structure. This structure facilitates the selection of the k nearest candidate submaps by querying for the closest base nodes in pose space. Subsequently, the Normal Distributions Transform (NDT) registration is applied to align the projected point clouds. Specifically, point clouds from the current sub-map P and point clouds from candidates Q are aligned to estimate rigid transformations. The transformations are expressed as
T^* = arg min_T Σ_i [ −exp( −½ (p_i − T q_i)^⊤ Σ^{−1} (p_i − T q_i) ) ],
where T^* denotes the optimal rigid transformation, T is the rigid transformation to be optimized, p_i ∈ P and q_i ∈ Q represent the source and target points, respectively, and Σ is the Gaussian covariance matrix.
For candidates passing NDT verification, the Generalized Iterative Closest Point (GICP) algorithm is used to compute the precise relative transformation between the point clouds of the two sub-maps. Specifically, the process minimizes covariance-weighted point-to-point distances, which can be represented as
T^* = arg min_T Σ_i [ −exp( −½ (p_i − T q_i)^⊤ ( Σ_{p_i} + R Σ_{q_i} R^⊤ )^{−1} (p_i − T q_i) ) ],
Here, p_i ∈ P and q_i ∈ Q represent the source and target points, Σ_{p_i} and Σ_{q_i} are the covariances of points p_i and q_i, R is the rotation matrix, and T ∈ SE(3) is the rigid-body transformation matrix that aligns the source point cloud to the target point cloud.
These transformations are added as loop-closure constraints to the global pose graph, completing loop-closure detection. Finally, pose graph optimization is performed to optimize the global pose graph.
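A simplified sketch of the candidate-retrieval step is given below. It indexes the base-node positions of all submaps in a K-D tree, retrieves the k nearest candidates, and filters them by descriptor similarity; random vectors stand in for the NetVLAD descriptors, cosine similarity is used as a generic similarity measure, and the NDT verification and GICP refinement are only indicated in comments. All names are hypothetical.

import numpy as np
from scipy.spatial import cKDTree

def find_loop_candidates(base_positions, descriptors, query_idx, k=5, sim_thresh=0.8):
    """Select loop-closure candidates for one submap.
    base_positions: (N, 3) base-node translations; descriptors: (N, D) global
    descriptors (e.g. NetVLAD vectors), both stacked over all UGVs."""
    tree = cKDTree(base_positions)
    # k nearest base nodes in pose space (k+1 because the query itself is returned).
    _, idx = tree.query(base_positions[query_idx], k=k + 1)
    idx = [i for i in np.atleast_1d(idx) if i != query_idx]
    # Keep candidates whose thumbnail descriptors are similar enough (cosine similarity).
    q = descriptors[query_idx]
    sims = descriptors[idx] @ q / (
        np.linalg.norm(descriptors[idx], axis=1) * np.linalg.norm(q) + 1e-9)
    cands = [(j, float(s)) for j, s in zip(idx, sims) if s > sim_thresh]
    # Each surviving pair would then be verified by NDT and refined with Generalized-ICP
    # (both omitted here) to yield a loop-closure constraint for the global pose graph.
    return cands

# Toy usage: 20 submaps with random base positions and descriptors.
rng = np.random.default_rng(2)
pos = rng.uniform(-100, 100, (20, 3))
desc = rng.normal(size=(20, 128))
print(find_loop_candidates(pos, desc, query_idx=0, sim_thresh=-1.0))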
As illustrated in Algorithm 1, the framework achieves multi-UGV collaborative SLAM through front-end LiDAR–visual–inertial odometry for robust local pose estimation, submap construction for consistent environmental modeling, and shared global optimization with loop closure detection that refines the trajectories and fuses the submaps into a unified map.
Algorithm 1: Multi-UGV Collaborative SLAM Framework
  Input: Sensor data per UGV: {Camera images I_t, IMU measurements M_t, LiDAR points L_t}; UGVs V_1 to V_N; thresholds f_th, b_th; window size w
  Output: Unified global map M and trajectory T_global
Initialize
  for each UGV V_i in V_1 … V_N do
    Initialize FrontEndOdometry(V_i);   // initial pose T_i^0
    Initialize LocalSegmentManager(V_i);   // pose graph + local map
  end
  SharedGlobalOptimizer ← InitializeGlobalOptimizer()
Front-End Odometry
  while sensor data available do
    for each UGV V_i do
       // Front-End Odometry
       CaptureSensors(I_t^i, M_t^i, L_t^i);
       (T_od^i, P_ds^i) ← OdometryLVI(I_t^i, M_t^i, L_t^i; f_th, b_th, w)
       if failure (low features or high bias) then
          Reinit VIS with LIS fallback;
       else
          // Local Segment Management
          Build local submap;
           Submap_i ← BuildLocalSubmap(T_od^i, P_ds^i);
           Send Submap_i to SharedGlobalOptimizer;
       end
    end
Global Optimization
    Thumbnails ← GenerateThumbnails({Submap1, …, SubmapN});
       // Similarity matching + K-D tree for k-nearest
    Candidates ← DetectLoops(Thumbnails; NetVLAD);
    for each candidate pair (Submap_i, Submap_j) do
        T_rel ← AlignSubmaps(Submap_i, Submap_j);   // NDT verification + GICP refinement
        AddLoopConstraint(T_rel)   // to the global pose graph
    end
     GlobalPoseGraph ← OptimizePoseGraph({Submaps, constraints});
        // Refine trajectories, unify coordinates
     Update local poses {T_i} with global refinements;
     GlobalMap M ← FuseSubmaps(GlobalPoseGraph);
  end
return GlobalMap M, trajectory T_global

4. Results

This section presents the results of the experiments undertaken to verify the feasibility of the proposed framework and evaluate the overall performance on the Unmanned Ground Vehicle platform. We have designed a series of evaluation experiments for each module in the system, aiming to quantify performance metrics like accuracy, robustness, and computational efficiency. To provide a more comprehensive overview of the experimental details, Section 4.1 introduces the data collection platform. In Section 4.2, the accuracy of the odometry module is evaluated and compared with vision-only and laser-vision methods. In addition, the entire collaborative mapping process is assessed in Section 4.3.

4.1. Data Collection

This work focuses on the collaborative mapping of large-scale environments explored by Unmanned Ground Vehicles (UGVs) equipped with multiple sensor systems. Therefore, all evaluation experiments are based on real-world datasets, collected using the UGV shown in Figure 4a. The UGV is equipped with a sensor suite for perception and navigation, including a Velodyne VLP-16 LiDAR, an Intel RealSense D435i camera with a built-in IMU, and a u-blox F9P GNSS receiver. The sensor specifications are presented in Table 1.
Five sequences (UGV1–UGV5) are collected on campus, including challenging scenes with repetitive textures and dynamic objects, as shown in Figure 4b. The IMU operates at a high rate of 100 Hz to ensure accurate estimation during aggressive maneuvers, while the 30 Hz camera and 10 Hz LiDAR provide complementary visual and geometric information at frequencies typical of outdoor robotics applications. To ensure consistency, all sequences are collected in the same area and time period using the aforementioned UGV. Additionally, GNSS data are collected as the ground truth for trajectory evaluation, and the GNSS measurements achieve centimeter-level accuracy throughout the experiments. The front-end task is processed on a system with an AMD Ryzen 5 4600H CPU (3.0 GHz), an NVIDIA GeForce GTX 1650 GPU, and 8 GB of memory, while the back-end task runs on a system with an AMD Ryzen 5 5600H CPU (3.3 GHz), an NVIDIA GeForce RTX 3050 GPU, and 16 GB of memory.

4.2. Evaluation of Odometry Results

High-precision odometry is pivotal in establishing a robust foundation for subsequent fusion mapping processes, as it ensures the accuracy of initial pose estimates that propagate through the entire system. To evaluate the precision of front-end odometry, the evo package is used to compute metrics, including Absolute Pose Error (APE) and Relative Pose Error (RPE).
In APE tests, the absolute pose error for each aligned frame i is defined as
E_i = P_{gt,i}^{−1} P_{est,i} ∈ SE(3),
where P_est,i represents the estimated pose and P_gt,i the ground-truth pose of frame i; the APE is reported as the root mean squared error (RMSE) over all frames of the per-frame translational error
APE_i = ‖ trans(E_i) ‖.
In RPE tests, the relative pose error between frames i and j is defined as
E_{i,j} = ( P_{gt,i}^{−1} P_{gt,j} )^{−1} ( P_{est,i}^{−1} P_{est,j} ) ∈ SE(3),
where P_est and P_gt represent the estimated and ground-truth poses and the pair (i, j) measures local drift, and the rotational RPE for each pair is defined as
RPE_{i,j} = angle( log_SO(3)( rot(E_{i,j}) ) ),
where the angle is expressed in degrees, and the final RPE is reported as the RMSE over all relative pairs.
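For reference, the two metrics can be computed from aligned 4 × 4 pose matrices as in the following sketch, which mirrors what the evo package reports. The RPE here is evaluated over consecutive frame pairs and the rotation angle is recovered via the SO(3) trace identity; this is one common convention, not necessarily the exact configuration used in the experiments.

import numpy as np

def ape_rmse(gt, est):
    """Translational APE (RMSE) from lists of aligned 4x4 poses."""
    errs = [np.linalg.inv(g) @ e for g, e in zip(gt, est)]
    trans = np.array([E[:3, 3] for E in errs])
    return float(np.sqrt((np.linalg.norm(trans, axis=1) ** 2).mean()))

def rpe_rot_rmse(gt, est, delta=1):
    """Rotational RPE (RMSE, degrees) over frame pairs (i, i + delta)."""
    angles = []
    for i in range(len(gt) - delta):
        dg = np.linalg.inv(gt[i]) @ gt[i + delta]
        de = np.linalg.inv(est[i]) @ est[i + delta]
        E = np.linalg.inv(dg) @ de
        # Rotation angle of E via the SO(3) trace identity.
        cos_a = np.clip((np.trace(E[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
        angles.append(np.degrees(np.arccos(cos_a)))
    return float(np.sqrt((np.array(angles) ** 2).mean()))

# Toy usage: estimate equals ground truth shifted by 0.1 m in x.
gt = [np.eye(4) for _ in range(10)]
for i, T in enumerate(gt):
    T[0, 3] = 0.3 * i
est = [T.copy() for T in gt]
for T in est:
    T[0, 3] += 0.1
print(ape_rmse(gt, est), rpe_rot_rmse(gt, est))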
Two other odometry methods are compared, including the visual-only method ORB-SLAM3 (mono) [10] and the LiDAR-Visual based method GACM [31]. The key parameter settings in the experiment are shown in Table 2. All parameters for the compared methods are based on the official recommended configurations, enabling us to ensure a fair and reproducible comparison across all approaches. The loop detection modules are deactivated in these approaches to prioritize the evaluation of odometry performance. Since the estimated trajectories and GPS-based ground truth use different coordinate systems, the trajectory coordinates and scale are aligned prior to trajectory evaluation.
Table 3 presents the experimental results, listing the Root Mean Square Error (RMSE), with the best-performing results highlighted in bold. The results demonstrate that the proposed odometry method outperforms the vision-only and LiDAR–visual methods in both the APE and RPE tests. These gains are primarily attributable to the highly robust front-end odometry, which provides significantly lower initial drift than the vision-dominant method (ORB-SLAM3), especially in sequences with aggressive motion, illumination changes, and dynamic interference.
To evaluate the precision of the odometry within the proposed framework across different directional components, a visualization of the odometry errors along these components is presented in Figure 5. It provides a comparative analysis of the odometry errors along the East, North, and Up axes for the different odometry methods employed in the front-end module. The proposed odometry method yields the smallest error in most sequences, underscoring that the front-end odometry provides a robust foundation for mapping.
It should be noted that, due to the occurrence of long-distance tracking losses in ORB-SLAM3, its predicted trajectory is substantially shorter. As a result, although the errors in the up direction appear minimal owing to insufficient samples, the actual errors remain considerable. This is particularly evident in the horizontal plane directions, where its errors far exceed those of the other methods. Overall, the odometry errors in the proposed framework are the smallest.
In summary, the proposed LiDAR–visual–inertial odometry module offers a robust solution. By incorporating constraints from LiDAR, visual, and IMU data, the odometry module substantially enhances the robustness and precision of pose estimation. This facilitates a solid foundation for the collaborative mapping of the environment.

4.3. Assessment of the Entire System

To investigate whether the odometry’s accuracy decreases due to the error accumulation in a multi-UGV framework during loop detection and global optimization, the proposed method is evaluated in a collaborative scenario. Three data sequences (UGV 1–3) are employed for the collaborative mapping experiment. Figure 6 illustrates the mapping results, showing maps from different UGVs displayed in distinct colors. The proposed method successfully merges maps from the different sequences, producing an accurate global map consistent with the real-world environment.
Simultaneously, the odometry trajectory accuracy of the UGV is assessed to determine whether substantial deviations are incurred following the integration of collaborative mapping. As illustrated in Figure 7, the trajectory optimized post-loop-closure is overlaid with the ground truth in a unified 3D visualization. The dashed line and the solid line, respectively, denote the ground truth trajectories and the estimated trajectories of three sequences. Notably, the estimated trajectory of each sequence closely overlaps with the ground truth. Besides this, the magnified views of the overlapping regions between different trajectories are provided. In these overlapping areas, the relative relationships among the trajectories are clearly visible, with all trajectories well aligned on the same horizontal plane, demonstrating that global optimization achieves effective and consistent fusion.
Furthermore, to offer a more objective and quantitative evaluation of whether odometry accuracy is enhanced post-loop-closure, the variations in odometry errors across the different processing stages of the system are compared in Table 4. Although not every UGV exhibits reduced errors following optimization, the aggregate system error diminishes overall. This outcome is intuitive, as the integration of pose graphs across multiple UGVs enables those with minimal drift to offset inaccuracies in their counterparts, which may in turn amplify errors in selected individual UGVs. The observed performance differences across sequences primarily stem from varying initial drift levels and the degree of trajectory overlap during inter-robot loop closure. Therefore, when the initial drift level is high, a slight increase in error may occur after loop detection and loop-closure optimization.
Finally, the real-time performance of the framework is analyzed. Table 5 presents the time cost of each stage in the global optimization process. The majority of the time is spent on thumbnail generation and loop detection, mainly because the processes of multiple UGVs incur scheduling delays, which degrades real-time performance.

5. Conclusions

In this paper, a LiDAR–visual–inertial multi-UGV collaborative SLAM framework is presented. It consists of three main components: a front-end LiDAR–visual–inertial odometry module for precise odometry estimation, a segment management module that combines pose and point cloud data to create local maps, and a global optimization module that fuses the individual vehicle maps into a high-precision global map. Real-world experiments confirm that the framework is effective for multi-UGV fusion SLAM, providing accurate odometry and superior mapping results. However, scenes that are highly similar yet not identical may still trigger incorrect loop-closure detections. Future work will focus on extending cooperative SLAM to more types of unmanned platforms and on enhancing robustness against such false detections to improve generalization.

Author Contributions

Conceptualization, H.W. and P.W.; methodology, H.W. and P.W.; validation, P.W., X.Z.; resources, J.Z. (Jianyong Zheng); writing, review, and editing, H.W. and P.W.; investigation, K.W.; project administration, funding acquisition, J.Z. (Jianzheng Zhang). All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62403299, the Research Funds of the Shanghai Collaborative Innovation Office under Grant XTCX-KJ-2023-2-17, the Research Fund for the Advanced Ocean Institute of Southeast University, Nantong, under Grant KP202403, and the Guangdong Basic and Applied Basic Research Foundation under Grant 2024A1515010269.

Data Availability Statement

The original data presented in the study are openly available at https://github.com/vanjules-shu/UGV_datasets (accessed on 25 December 2025).

Conflicts of Interest

Authors Jianzheng Zhang and Kun Wei were employed by the company Shanghai SAGE Intelligent Technology Co., Ltd. The remaining authors—Hongyu Wei, Pingfan Wu, Xutong Zhang, and Jianyong Zheng—declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Zheng, S.; Wang, J.; Rizos, C.; Ding, W.; El-Mowafy, A. Simultaneous localization and mapping (SLAM) for autonomous driving: Concept and analysis. Remote Sens. 2023, 15, 1156. [Google Scholar] [CrossRef]
  2. Ahmed, M.F.; Masood, K.; Fremont, V.; Fantoni, I. Active SLAM: A review on last decade. Sensors 2023, 23, 8097. [Google Scholar] [CrossRef] [PubMed]
  3. He, W.; Chen, W.; Tian, S.; Zhang, L. Towards full autonomous driving: Challenges and frontiers. Front. Phys. 2024, 12, 1485026. [Google Scholar] [CrossRef]
  4. Arafat, M.Y.; Alam, M.M.; Moh, S. Vision-based navigation techniques for unmanned aerial vehicles: Review and challenges. Drones 2023, 7, 89. [Google Scholar] [CrossRef]
  5. Wang, S.; Ahmad, N.S. AI-based approaches for improving autonomous mobile robot localization in indoor environments: A comprehensive review. Eng. Sci. Technol. Int. J. 2025, 63, 101977. [Google Scholar] [CrossRef]
  6. Kshirsagar, J.; Shue, S.; Conrad, J.M. A survey of implementation of multi-robot simultaneous localization and mapping. In Proceedings of the 2018 IEEE SoutheastCon, St. Petersburg, FL, USA, 19–22 April 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–7. [Google Scholar] [CrossRef]
  7. Chen, W.; Wang, X.; Gao, S.; Shang, G.; Zhou, C.; Li, Z.; Xu, C.; Hu, K. Overview of multi-robot collaborative SLAM from the perspective of data fusion. Machines 2023, 11, 653. [Google Scholar] [CrossRef]
  8. Li, Z.; Jiang, C.; Gu, X.; Xu, Y.; Zhou, F.; Cui, J. Collaborative positioning for swarms: A brief survey of vision, LiDAR and wireless sensors based methods. Def. Technol. 2024, 33, 475–493. [Google Scholar] [CrossRef]
  9. Mur-Artal, R.; Tardós, J.D. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Trans. Robot. 2017, 33, 1255–1262. [Google Scholar] [CrossRef]
  10. Campos, C.; Elvira, R.; Rodríguez, J.J.; Montiel, J.M.; Tardós, J.D. ORB-SLAM3: An accurate open-source library for visual, visual–inertial, and multimap SLAM. IEEE Trans. Robot. 2021, 37, 1874–1890. [Google Scholar] [CrossRef]
  11. Geneva, P.; Eckenhoff, K.; Lee, W.; Yang, Y.; Huang, G. OpenVINS: A research platform for visual-inertial estimation. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 4666–4672. [Google Scholar]
  12. Liu, Y.; Miura, J. RDS-SLAM: Real-time dynamic SLAM using semantic segmentation methods. IEEE Access 2021, 9, 23772–23785. [Google Scholar] [CrossRef]
  13. Fan, Y.; Zhao, T.; Wang, G. SchurVINS: Schur complement-based lightweight visual inertial navigation system. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 17964–17973. [Google Scholar]
  14. Besl, P.J.; McKay, N.D. A method for registration of 3-D shapes. In Sensor Fusion IV: Control Paradigms and Data Structures; SPIE: Bellingham, WA, USA, 1992; Volume 1611, pp. 586–606. [Google Scholar]
  15. Shan, T.; Englot, B. LeGO-LOAM: Lightweight and ground-optimized LiDAR odometry and mapping on variable terrain. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 4758–4765. [Google Scholar]
  16. Xu, W.; Cai, Y.; He, D.; Lin, J.; Zhang, F. Fast-LIO2: Fast direct LiDAR-inertial odometry. IEEE Trans. Robot. 2022, 38, 2053–2073. [Google Scholar] [CrossRef]
  17. He, L.; Li, B.; Chen, G.E. PLACE-LIO: Plane-Centric LiDAR-Inertial Odometry. IEEE Robot. Autom. Lett. 2025, in press. [Google Scholar]
  18. Shan, T.; Englot, B.; Ratti, C.; Rus, D. LVI-SAM: Tightly-coupled LiDAR-visual-inertial odometry via smoothing and mapping. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 5692–5698. [Google Scholar]
  19. Zhang, J.; Singh, S. LOAM: LiDAR odometry and mapping in real-time. In Robotics: Science and Systems; MIT Press: Cambridge, MA, USA, 2014; Volume 2, pp. 1–9. [Google Scholar]
  20. Tian, X.; Zhu, Z.; Zhao, J.; Tian, G.; Ye, C. DL-SLOT: Tightly-coupled dynamic LiDAR SLAM and 3D object tracking based on collaborative graph optimization. IEEE Trans. Intell. Veh. 2023, 9, 1017–1027. [Google Scholar] [CrossRef]
  21. Zeng, D.; Liu, X.; Huang, K.; Liu, J. EPL-VINS: Efficient point-line fusion visual-inertial SLAM with LK-RG line tracking method and 2-DoF line optimization. IEEE Robot. Autom. Lett. 2024, 9, 5911–5918. [Google Scholar] [CrossRef]
  22. Liso, L.; Sandström, E.; Yugay, V.; Van Gool, L.; Oswald, M.R. Loopy-SLAM: Dense neural SLAM with loop closures. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 20363–20373. [Google Scholar]
  23. Shi, C.; Chen, X.; Xiao, J.; Dai, B.; Lu, H. Fast and accurate deep loop closing and relocalization for reliable LiDAR SLAM. IEEE Trans. Robot. 2024, 40, 2620–2640. [Google Scholar] [CrossRef]
  24. Zhang, B.; Peng, Z.; Zeng, B.; Lu, J. 2DLIW-SLAM: 2D LiDAR-inertial-wheel odometry with real-time loop closure. Meas. Sci. Technol. 2024, 35, 075205. [Google Scholar] [CrossRef]
  25. Zhu, J.; Yang, B.; Yu, Z.; Hu, H. Visual SLAM based on collaborative optimization of points, lines, and planes. In Proceedings of the 2024 IEEE International Conference on Unmanned Systems (ICUS), Nanjing, China, 18–20 October 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 375–379. [Google Scholar]
  26. Elahian, S.; Atashgah, M.A.; Tarvirdizadeh, B. Collaborative autonomous navigation of quadrotors in unknown outdoor environments: An active visual SLAM approach. IEEE Access 2024, 12, 147115–147128. [Google Scholar] [CrossRef]
  27. Lee, Y.H.; Zhu, C.; Wiedemann, T.; Staudinger, E.; Zhang, S.; Günther, C. CovoR-SLAM: Cooperative SLAM using visual odometry and ranges for multi-robot systems. IEEE Trans. Intell. Transp. Syst. 2025, in press. [Google Scholar] [CrossRef]
  28. Zhong, S.; Qi, Y.; Chen, Z.; Wu, J.; Chen, H.; Liu, M. DCL-SLAM: A distributed collaborative LiDAR SLAM framework for a robotic swarm. IEEE Sens. J. 2023, 24, 4786–4797. [Google Scholar] [CrossRef]
  29. Lajoie, P.Y.; Beltrame, G. Swarm-SLAM: Sparse decentralized collaborative simultaneous localization and mapping framework for multi-robot systems. IEEE Robot. Autom. Lett. 2023, 9, 475–482. [Google Scholar] [CrossRef]
  30. Shao, S.; Han, G.; Jia, H.; Shi, X.; Wang, T.; Song, C.; Hu, C. Multi-robot collaborative SLAM based on novel descriptor with LiDAR remote sensing. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 19317–19327. [Google Scholar] [CrossRef]
  31. He, J.; Zhou, Y.; Huang, L.; Kong, Y.; Cheng, H. Ground and aerial collaborative mapping in urban environments. IEEE Robot. Autom. Lett. 2020, 6, 95–102. [Google Scholar] [CrossRef]
  32. Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381–395. [Google Scholar] [CrossRef]
  33. Arandjelovic, R.; Gronat, P.; Torii, A.; Pajdla, T.; Sivic, J. NetVLAD: CNN architecture for weakly supervised place recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 5297–5307. [Google Scholar]
Figure 1. The proposed cooperative framework. Each UGV uses laser point clouds, RGB images, and inertial measurements for UGV motion estimation. Local mapping builds local maps from raw sensor data. The global optimizer combines these into a unified global map.
Figure 2. A visual representation of the local map maintained in the segment management module, illustrating the data structure of the submaps.
Figure 3. Visual depiction of multi-UGV pose optimization, featuring orange dashed lines to represent loop-closure constraints and a magnified view highlighting the optimization process across red and green trajectories. Different colored lines represent different trajectories.
Figure 4. (a) Four-wheel UGV used as a data acquisition platform equipped with LiDAR, camera, IMU and GNSS receiver. (b) Some challenging scenarios in the dataset, including repetitive textures like trees and dynamic scenes including pedestrians and vehicles.
Figure 5. The errors of front-end odometry across the East, North, and Up components, wherein values approaching zero signify diminishing error magnitudes. For each component, a localized magnified inset is furnished.
Figure 6. Collaborative mapping results, where UGVs 1 to 3 are represented in pink, cyan, and yellow, respectively.
Figure 7. Comparative visualization of the trajectory following unmanned ground vehicle collaborative fusion mapping and loop closure optimization against the ground truth trajectory.
Table 1. Detailed parameters of sensors.
Parameter | Value
IMU frequency | 100 Hz
Camera frequency | 30 Hz
Camera resolution | 1280 × 720
GNSS frequency | 10 Hz
LiDAR frequency | 10 Hz
Gyroscope noise | 1.89 × 10^−3 rad/s^0.5
Acceleration noise | 1.22 × 10^−2 m/s^1.5
Gyroscope random walk | 1.58 × 10^−5 rad/s^1.5
Acceleration random walk | 3.28 × 10^−4 m/s^2.5
Table 2. The key parameters of the algorithms.
Parameter | ORB-SLAM3 | GACM | Proposed
Visual Features | (max) 2000 | (max) 50 | (max) 150
LiDAR Features | --- | (max) keylines: 80 | (min) edge: 10, surface: 100
Scale Pyramid | levels: 8 | levels: 1 | ---
Feature Distance | factor: 1.2 | factor: 1.2 | (min) 20 pixels
Voxel Filter | --- | (min) 40 pixels | 0.4 m
Table 3. Comparison of different odometry methods.
Sequence | Length (m) | Duration (s) | ORB-SLAM3 APE (m) | ORB-SLAM3 RPE (°) | GACM APE (m) | GACM RPE (°) | Proposed APE (m) | Proposed RPE (°)
UGV1 | 255.6 | 319.4 | 18.36 | 6.244 | 5.734 | 2.239 | 1.231 | 0.902
UGV2 | 258.9 | 311.3 | 31.01 | 10.24 | 4.080 | 2.660 | 1.785 | 2.919
UGV3 | 197.8 | 246.5 | 30.15 | 11.73 | 1.838 | 2.476 | 1.590 | 1.223
UGV4 | 191.5 | 234.3 | 26.82 | 4.168 | 1.405 | 2.232 | 1.584 | 1.459
UGV5 | 212.2 | 254.8 | 17.86 | 4.982 | 3.930 | 2.009 | 3.339 | 3.050
Mean | --- | --- | 24.84 | 7.472 | 3.398 | 2.323 | 1.906 | 1.911
Table 4. Assessment of the comprehensive system.
Sequence | Front-End Odometry APE (m) | Front-End Odometry RPE (°) | Loop Detection APE (m) | Loop Detection RPE (°) | Global Optimization APE (m) | Global Optimization RPE (°)
UGV1 | 1.231 | 0.902 | 1.347 | 0.903 | 1.176 | 0.902
UGV2 | 1.785 | 2.919 | 1.782 | 2.934 | 1.795 | 2.934
UGV3 | 1.590 | 1.223 | 1.588 | 1.205 | 1.564 | 1.204
Mean | 1.535 | 1.681 | 1.572 | 1.681 | 1.488 | 1.680
Table 5. Time cost of the global optimization.
Task | Min (s) | Max (s) | Mean (s)
Thumbnail Generation | 0.186 | 62.51 | 5.798
Descriptor Extraction | 0.002 | 0.526 | 0.067
Loop Detection | 1.017 | 73.02 | 8.388
Global Graph Optimization | 0.040 | 11.93 | 1.056