Article

A Distributed Multi-Robot Collaborative SLAM Method Based on Air–Ground Cross-Domain Cooperation

School of Electronic Information Engineering, Changchun University of Science and Technology, Jilin 130022, China
* Author to whom correspondence should be addressed.
Drones 2025, 9(7), 504; https://doi.org/10.3390/drones9070504
Submission received: 22 May 2025 / Revised: 26 June 2025 / Accepted: 15 July 2025 / Published: 18 July 2025

Abstract

To overcome the limitations in the perception performance of individual robots and homogeneous robot teams, this paper presents a distributed multi-robot collaborative SLAM method based on air–ground cross-domain cooperation. By integrating environmental perception data from UAV and UGV teams across the air and ground domains, the method enables more efficient, robust, and globally consistent autonomous positioning and mapping. First, to address the significant field-of-view differences between UAVs and UGVs, which complicate a unified environmental understanding, this paper proposes an iterative registration method based on semantic and geometric feature assistance. This method calculates the correspondence probability of air–ground loop closure keyframes using these features and iteratively computes the rotation angle and translation vector to determine the coordinate transformation matrix. The resulting matrix provides strong initialization for back-end optimization, significantly reducing global pose estimation errors. Next, to overcome the convergence difficulties and high computational complexity of large-scale distributed back-end nonlinear pose graph optimization, this paper introduces a multi-level partitioning majorization–minimization DPGO method incorporating loss kernel optimization. This method constructs multi-level, balanced pose subgraphs based on the coupling degree of robot nodes. It then uses a majorization–minimization surrogate function with a non-trivial loss kernel to drive the distributed pose graph optimization problem toward a first-order critical point, thereby significantly improving global pose estimation accuracy. Finally, experimental results on benchmark SLAM datasets and the GRACO dataset demonstrate that the proposed method effectively integrates environmental feature information from air–ground cross-domain UAV and UGV teams, achieving high-precision global pose estimation and map construction.

1. Introduction

With the continuous advancement of computer vision and artificial intelligence, robot technology has entered a new stage of rapid development [1,2]. Among these advancements, CSLAM (collaborative simultaneous localization and mapping) has emerged as a core technology for multi-robot cooperative perception and environmental understanding, driving the intelligent evolution of robotics [3,4]. CSLAM tightly integrates environmental perception with multi-robot state estimation, enabling not only high-quality mapping and precise positioning but also improving robots' adaptability to complex environments [5,6,7]. As a result, robots can operate over broader areas and accomplish more sophisticated tasks. However, while multi-robot collaboration within a single domain has shown great potential, it still faces inherent limitations [8]. To overcome these challenges, air–ground cross-domain collaborative multi-robot SLAM has gained increasing attention as a promising research direction [9,10]. By leveraging the complementary strengths of UAVs and UGVs, such as their differing motion characteristics, load capacities, and communication capabilities, this approach significantly expands the application scope of robotic systems, as shown in Figure 1.
In recent years, significant progress has been made in multi-robot collaborative SLAM algorithms. For instance, Rosen et al. [11] introduced SE-Sync, a method that reformulates the multi-robot pose graph optimization problem as maximum likelihood estimation under an assumed measurement noise distribution, enabling the recovery of globally optimal solutions to the special Euclidean synchronization problem. Similarly, Yu et al. [12] developed a Grassmannian manifold map merging algorithm that integrates line and surface features from multiple robots to achieve global map fusion. Building on this, Huang et al. [13] proposed DiSCo-SLAM, which utilizes lightweight descriptors for efficient inter-robot loop closure detection and employs a two-stage distributed Gauss–Seidel method for global pose optimization. Further advancing the field, Lajoie et al. [14] introduced Swarm-SLAM, which maximizes algebraic connectivity to select the most critical loop closure keyframes, thereby minimizing communication overhead and enhancing global SLAM accuracy. Additionally, Zhong et al. [15] presented DCL-SLAM, which uses LiDAR–Iris descriptors for efficient loop closure recognition and enables fully distributed collaborative SLAM with interchangeable front-end LiDAR odometry. Despite these advancements, current algorithms still face challenges in air–ground cross-domain collaboration. The extreme disparity in field of view between UAVs and UGVs makes it difficult to establish consistent environmental feature representations, significantly degrading SLAM accuracy and performance in air–ground multi-robot systems.
To address these limitations, this paper proposes a distributed multi-robot collaborative SLAM method based on air–ground cross-domain cooperation. The algorithm integrates environmental feature information from UAVs and UGVs to achieve global pose estimation and map construction in cross-domain scenarios. Specifically, each robot (UAV or UGV) runs a local single-robot semantic-assisted LiDAR odometry module at the front end, capturing real-time local keyframes of its surrounding environment. When UAVs and UGVs come within communication range, they exchange keyframe feature information and perform PCM (pairwise consistency maximization) incremental anomaly detection [16] to eliminate erroneous inter-robot loop closures. After validating loop closure candidates, the algorithm employs an iterative registration method based on semantic and geometric feature assistance to compute the coordinate transformation matrix between the UAVs' and UGVs' point clouds. This establishes a unified coordinate system and provides robust initialization for back-end optimization. Finally, the method implements a multi-level partitioning majorization–minimization DPGO method incorporating loss kernel optimization. This approach decouples the factor graphs of different robot nodes and performs majorization–minimization with loss kernel optimization to ensure stable convergence to first-order critical points, realizing high-precision localization and map construction across the whole robot team. The main contributions of this paper are as follows:
  • To address the significant field-of-view disparity and alignment errors between UAVs and UGVs, we propose an iterative registration method based on semantic and geometric feature assistance. The method uses semantic and geometric features to calculate the correspondence probability of loop closure keyframes. Then, through iterative optimization of rotation angles and translation vectors, it computes the optimal coordinate transformation matrix, ensuring all robots operate in a unified coordinate system during back-end optimization. This effectively reduces pose estimation errors in cross-domain collaboration.
  • To address the challenges of large-scale nonlinear DPGO, including convergence difficulties and high computational complexity, we propose a multi-level partitioning majorization–minimization DPGO method incorporating loss kernel optimization. Our method constructs multi-level balanced subgraphs based on node coupling degrees and formulates a non-trivial loss kernel optimization framework using majorization–minimization surrogate functions, enabling DPGO to converge smoothly to first-order critical points while significantly improving positioning and pose estimation accuracy.
  • Extensive experiments are conducted to evaluate the effectiveness of the proposed method. The results show that it effectively integrates the environmental feature information of air–ground cross-domain UAV and UGV teams to cooperatively achieve high-precision global pose estimation and map construction.
The remainder of this paper is structured as follows: Section 2 reviews related work in cross-domain collaborative SLAM. Section 3 elaborates on the proposed framework, followed by experimental validation in Section 4. Finally, Section 5 concludes this paper.

2. Related Work

This section briefly reviews existing research on distributed pose graph optimization and air–ground cross-domain cooperative SLAM.

2.1. Distributed Pose Graph Optimization

Choudhary et al. [17] first proposed the distributed Gauss–Seidel (DGS) algorithm, which approximates the maximum likelihood trajectory estimation problem by decoupling it into two sequential subproblems over the rotation and translation components, solved through a two-phase distributed DGS framework. During distributed iterations, each robot employs quadratic relaxation and projection for rotation initialization to eliminate non-convex constraints, followed by single-iteration Gauss–Newton refinement to recover complete poses, ultimately matching the accuracy of centralized pose estimation. Tian et al. [18] introduced an asynchronous stochastic parallel PGO method that operates without synchronization. Through Riemannian gradient optimization with bounded time-delay rank relaxation, they established global first-order convergence for sufficiently small step sizes. Inspired by SE-Sync [11], they further proposed a certifiable distributed PGO method via Riemannian block coordinate descent, proving convergence to global first-order critical points under specified conditions, representing the first asynchronous algorithm for multi-robot SLAM applications.
Recent work by Tian et al. [19] enhanced distributed PGO with robust kernels, using graduated non-convexity (GNC) [20] to estimate inter-frame transformations and initialize local trajectories in global coordinates. They formulated a progressively non-convex Riemannian block coordinate descent method to solve the distributed PGO problem. Fan et al. [21] developed a majorization–minimization (MM) method for distributed PGO, ensuring convergence to first-order critical points. Unlike previous approaches that represent each pose as an individual node, their method updates all poses from the same robot as a unified node, utilizing enhanced constraint information. They implemented an accelerated initialization approach with quadratic convergence properties for distributed PGO and redesigned the optimization using Nesterov acceleration [22], significantly improving computational efficiency while reducing inter-node communication redundancy. Li et al. [23] proposed a multi-level graph-distributed PGO algorithm combining graph partitioning with Riemannian optimization. Their approach refines highly coupled nodes through multi-level graph partitioning, reformulates distributed PGO as a low-rank convex relaxation, and solves it via modified Riemannian block coordinate descent.

2.2. Air–Ground Cross-Domain Cooperative SLAM

He et al. [24] proposed a camera–LiDAR fusion method for multi-robot collaborative SLAM. Their framework incorporates a visual–LiDAR odometry module with point-line-plane constraints to provide robust front-end odometry. The system generates thumbnail representations delineating obstacle contours, employs neural networks for descriptor extraction to facilitate inter-robot data association, and ultimately integrates map segments with robot poses to achieve superior cross-domain collaborative localization and mapping. Zhong et al. [15] devised DCL-SLAM, a front-end replaceable fully distributed SLAM framework for multi-robot collaboration in unknown environments. Operating through peer-to-peer communication, the system utilizes lightweight LiDAR–Iris descriptors for inter-robot place recognition and rejects spurious inter-robot loop closures via PCM-based maximal clique set verification.
Liu et al. [25] innovatively developed a bio-inspired semantic SLAM framework for large-scale heterogeneous cross-domain collaboration. Their approach constructs semi-dense semantic maps through bio-inspired tightly-coupled visual-inertial odometry, employing neural-inspired compact encoding for biological motion cues. The topological data of semantic features serve as cross-view scene descriptors, enabling air–ground heterogeneous robots to recognize identical scenes under viewpoint variations, thereby accomplishing large-scale environmental mapping. Duan et al. [26] created a ground-prior-assisted localization method for air–ground heterogeneous robots. The approach first generates rotation-invariant candidate co-visualization context descriptors using pre-mapped ground references, establishing unified representation for heterogeneous point cloud features. Heterogeneous robots then achieve refined positioning through descriptor matching between queried co-visual contexts and allocated candidate descriptors. Qiao et al. [27] introduced a genetic algorithm-based cross-domain semantic map registration method for significant viewpoint variations. This technique employs a fitness evaluation model combining semantic–geometric feature correlations, implements latent semantic correspondence selection to eliminate outliers, and incorporates an adaptive mutation step-size adjustment strategy to accelerate convergence while enhancing co-location precision.

3. Materials and Methods

Figure 2 shows the architectural framework of the distributed multi-robot collaborative SLAM method based on air–ground cross-domain cooperation. The framework achieves environmental perception and global optimization through a coordinated mechanism integrating three modules: the robot-local front-end module, the distributed loop closure configuration module, and the robust DPGO back-end module. In the robot-local front-end module, both UGVs and UAVs employ the SA-LIO [28] methodology to process raw LiDAR–IMU measurements, producing real-time local odometry trajectories while extracting semantic-assisted loop closure keyframes. Once robots establish communication, the distributed loop closure configuration module first utilizes the PCM incremental anomaly detection module to filter out erroneous loop closure keyframes that may arise from interactions between different robots; only valid loop closure keyframes are retained for further processing. Next, an iterative registration method based on semantic and geometric feature assistance calculates the precise transformation matrix between corresponding valid loop closure keyframes. This process provides a reliable initialization for the relative pose measurements used in the back-end DPGO. Finally, the robust DPGO back-end module constructs a multi-level balanced optimization subgraph based on the degree of coupling between the relative pose measurements of robot nodes, and builds a majorization–minimization surrogate function with a non-trivial loss kernel to integrate and optimize intra-robot local pose estimates and inter-robot relative poses, thereby achieving globally optimal trajectory estimation and high-precision map construction. The technical implementations of these modules are discussed in the following sections.

3.1. Robot-Local Front-End Module

Both UGVs and UAVs autonomously run the SA-LIO [28] algorithm locally, generating real-time local motion trajectories by fusing LiDAR and IMU data and constructing semantic-enhanced local point cloud keyframes for loop closure detection. The algorithm filters out dynamic objects and performs precise semantic segmentation of the static environment by processing point cloud data with a deep learning-based neural network. Laser odometry provides real-time local pose estimation, and the semantic features extracted from the keyframes are used to identify loop closure candidates, facilitating distributed loop closure detection for air–ground cross-domain teams of UAVs and UGVs. For a detailed introduction to the SA-LIO algorithm, we refer readers to [28].

3.2. Distributed Loop Closure Configuration Module

When UAVs and UGVs enter communication range, they exchange local loop closure keyframe information. To address the challenges of erroneous loop closure detection and the significant field-of-view disparity between UAVs and UGVs, this paper proposes a distributed loop closure configuration module consisting of two key components: (1) a PCM [16] incremental anomaly detection module that filters out incorrect inter-robot loop closures, and (2) an iterative registration method based on semantic and geometric feature assistance that computes transformation matrices between corresponding air–ground keyframes, providing well-initialized, unified coordinate references for the subsequent DPGO.

3.2.1. PCM Incremental Anomaly Detection Module

When two robots enter each other’s communication range, they exchange local loop closure detection keyframes. However, significant differences in the field of view between UAVs and UGVs, as well as inaccuracies in loop closure detection due to repetitive scenes, can severely impact subsequent global pose optimization. To eliminate errors in keyframes during the air–ground loop closure detection, we utilize the PCM [16] incremental anomaly detection module. The PCM incremental anomaly detection module verifies the accuracy of loop closure detection by examining the consistency of inter-robot noise measurements.
Consider a multi-robot system comprising $N = \{\alpha, \beta, \gamma, \dots\}$ robots, where each robot possesses only local perception capabilities. Each robot $\alpha \in N$ maintains a set of $n$ poses $g_{\alpha_1}, \dots, g_{\alpha_n} \in SE(d)$, where $SE(d)$ denotes the special Euclidean group in $d$-dimensional space. A pose $g_{\alpha_i} \in SE(d)$ can be decomposed into a rotation matrix $R_{\alpha_i} \in SO(d)$ and a translation vector $T_{\alpha_i} \in \mathbb{R}^d$, with $SO(d)$ denoting the special orthogonal group. The intra-robot noisy measurements are expressed as

$$\bar{g}_{\alpha\alpha}^{ij} = \begin{bmatrix} \bar{R}_{\alpha\alpha}^{ij} & \bar{T}_{\alpha\alpha}^{ij} \\ 0 & 1 \end{bmatrix} \in SE(d)$$

Similarly, the inter-robot noisy measurements are defined as

$$\tilde{g}_{\alpha\beta}^{ij} = \begin{bmatrix} \tilde{R}_{\alpha\beta}^{ij} & \tilde{T}_{\alpha\beta}^{ij} \\ 0 & 1 \end{bmatrix} \in SE(d)$$

where $\alpha, \beta \in N$ denote distinct robots. Here, $\bar{T}_{\alpha\alpha}^{ij}, \tilde{T}_{\alpha\beta}^{ij} \in \mathbb{R}^d$ represent translational measurements, while $\bar{R}_{\alpha\alpha}^{ij}, \tilde{R}_{\alpha\beta}^{ij} \in SO(d)$ correspond to rotational measurements. These measurements adhere to the following noise model:

$$\tilde{R}_{\alpha_i\beta_j} = R_{\alpha_i}^{T} R_{\beta_j} R_{\epsilon}, \qquad R_{\epsilon} \sim \mathrm{Langevin}(I_d, w) \tag{1}$$

$$\tilde{T}_{\alpha_i\beta_j} = R_{\alpha_i}^{T}\left(T_{\beta_j} - T_{\alpha_i}\right) + T_{\epsilon}, \qquad T_{\epsilon} \sim \mathcal{N}(0, \tau^{-1} I_d) \tag{2}$$

where $I_d$ denotes the $d \times d$ identity matrix. The isotropic Langevin noise model for the rotational components is the directional counterpart of the Gaussian noise applied to the translational components.

A loop closure is deemed acceptable when the measurement noise between two robots satisfies the following condition:

$$\left\| \bar{g}_{\beta\beta}^{lj}\cdot \tilde{g}_{\beta\alpha}^{ji}\cdot \bar{g}_{\alpha\alpha}^{ik}\cdot \left(\tilde{g}_{\beta\alpha}^{lk}\right)^{-1} \right\|_2^2 < \sigma \tag{3}$$

where $\bar{g}_{\beta\beta}^{lj}$ and $\bar{g}_{\alpha\alpha}^{ik}$ denote the intra-robot noisy measurements between keyframes $l$ and $j$ of robot $\beta$ and between keyframes $i$ and $k$ of robot $\alpha$, respectively. The terms $\tilde{g}_{\beta\alpha}^{ji}$ and $\tilde{g}_{\beta\alpha}^{lk}$ represent inter-robot noisy measurements: $\tilde{g}_{\beta\alpha}^{ji}$ refers to the measurement between robot $\beta$ at keyframe $j$ and robot $\alpha$ at keyframe $i$, while $\tilde{g}_{\beta\alpha}^{lk}$ corresponds to the measurement between robot $\beta$ at keyframe $l$ and robot $\alpha$ at keyframe $k$. Finally, the threshold parameter $\sigma$ is empirically set to $\sigma = 5$ in our system configuration.
We utilized the PCM incremental anomaly detection module to filter out erroneous loop closure detection keyframes that occurred due to interactions between different robots. Only the valid loop closure keyframes were retained for further processing, which ultimately improved the accuracy and robustness of the subsequent optimization process.
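To make the consistency test of Equation (3) concrete, the following Python sketch checks the measurement cycle for a pair of inter-robot loop closures and keeps an approximately maximal pairwise-consistent set. The `odometry` callback and the greedy clique step are illustrative simplifications of the exact maximal-clique search used by PCM [16]; all names here are our own.

```python
import numpy as np
from itertools import combinations

def cycle_error(g_bb_lj, g_ba_ji, g_aa_ik, g_ba_lk):
    """Residual of the measurement cycle in Equation (3): composing two
    intra-robot odometry segments with two inter-robot loop closures
    should return (close) to the identity when both loops are correct."""
    cycle = g_bb_lj @ g_ba_ji @ g_aa_ik @ np.linalg.inv(g_ba_lk)
    return np.linalg.norm(cycle - np.eye(cycle.shape[0])) ** 2

def filter_loops_pcm(candidates, odometry, sigma=5.0):
    """Keep a large set of pairwise-consistent inter-robot loop closures.
    `candidates` maps a loop id to (SE(3) measurement, (j, i) keyframes);
    `odometry(robot, i, j)` is a hypothetical helper returning the
    intra-robot relative pose between keyframes i and j.  A greedy
    approximation stands in for PCM's exact maximal-clique search."""
    consistent = {c: set() for c in candidates}
    for a, b in combinations(candidates, 2):
        (g_ji, (j, i)), (g_lk, (l, k)) = candidates[a], candidates[b]
        err = cycle_error(odometry("beta", l, j), g_ji,
                          odometry("alpha", i, k), g_lk)
        if err < sigma:                      # Equation (3) threshold
            consistent[a].add(b)
            consistent[b].add(a)
    # greedy clique: keep the loop consistent with the most others,
    # together with its consistency neighborhood
    kept = max(consistent, key=lambda c: len(consistent[c]))
    return {kept} | consistent[kept]
```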

3.2.2. Iterative Registration Method Based on Semantic and Geometric Feature Assistance

After identifying the correct loop closure detection keyframes between the UAVs and UGVs, the next step is to accurately determine the relative pose measurements. This step is essential for eliminating optimization errors that arise from the significant field-of-view differences between the UAVs and UGVs. For the local loop closure semantic keyframes $A$ and $B$, provided by the UAV and UGV, respectively, we aim to find a transformation matrix $T$, consisting of a rotation matrix $R$ and a translation vector $T$, that minimizes the coordinate transformation error in the point cloud mapping between the UAV and UGV. However, directly determining the transformation matrix $T$ between the heterogeneous coordinate systems of the UAV and UGV is highly challenging [29].
To achieve precise global coordinate transformation, we recast the problem of minimizing the coordinate transformation error as a probabilistic correspondence problem in heterogeneous point cloud alignment. Specifically, the higher the correspondence probability of matched points in the local loop closure maps, the smaller the coordinate transformation error and the higher the precision of the transformation matrix $T$ [27]. Consequently, we propose an iterative registration method based on semantic and geometric feature assistance. The method structure is shown in Figure 3; it integrates semantic and geometric descriptors, followed by an iterative optimization scheme that determines the loop closure transformation matrix $T$ through successive refinements of $R$ and $T$.
Based on the line and surface feature descriptors extracted from the UAV and UGV keyframes, we derive the key feature point set $P_A = \{p_A^k\}_{k=1}^{m}$ in local point cloud map $A$. For each key feature point $p_A^k$, its surrounding semantic distribution is characterized as $L_A^k = \{l_A^k\}_{k=1}^{\omega}$, where $l_A^k$ represents the semantic labels of the $\omega$ neighboring points. These points are clustered into $M$ groups such that all points within a group share the same semantic label. Let $p_{A,m}^k$ and $l_{A,m}^k$ denote the geometric coordinate and semantic label of the $k$-th feature point in $P_A$, respectively, where $m \in M$. Assuming the transformed counterpart of $p_A^k$ in the loop closure candidate map $B$ is $p_B^i$, with $p_{B,m}^i$ and $l_{B,m}^i$ representing the corresponding geometric coordinate and semantic label, a higher correspondence probability between $p_A^k$ and $p_B^i$ in global coordinates yields a more accurate transformation matrix $T$ and smaller alignment errors. The probability of potential corresponding points is calculated as follows:

$$\rho\left(p_{A,m}^k, p_{B,m}^i, l_{A,m}^k, l_{B,m}^i, T\right) = \prod_{m=0}^{M} \rho_{ge}\left(p_{A,m}^k, p_{B,m}^i, T\right)\cdot\rho_{se}\left(l_{A,m}^k, l_{B,m}^i\right) \tag{4}$$

where $\rho_{se}(l_{A,m}^k, l_{B,m}^i)$ represents the semantic similarity measure between keypoint $p_A^k$ and its potential correspondence $p_B^i$; it equals 1 if and only if $p_A^k$ and $p_B^i$ carry the same semantic label, and 0 otherwise. Equation (5) is as follows:

$$\rho_{se}\left(l_{A,m}^k, l_{B,m}^i\right) = \begin{cases} 1, & l_{A,m}^k = l_{B,m}^i \\ 0, & l_{A,m}^k \ne l_{B,m}^i \end{cases} \tag{5}$$

$\rho_{ge}(p_{A,m}^k, p_{B,m}^i, T)$ quantifies the geometric similarity measure between keypoint $p_A^k$ and its potential correspondence $p_B^i$. The correspondence probability under transformation matrix $T$ is computed using a Gaussian probability density function, formally expressed as follows:

$$\rho_{ge}\left(p_{A,m}^k, p_{B,m}^i, T\right) = \frac{1}{\sqrt{(2\pi)^3\,\left|Z_{k,i}\right|}}\exp\!\left(-\frac{1}{2}\left(p_B^i - T(p_A^k)\right)^{T} Z_{k,i}^{-1}\left(p_B^i - T(p_A^k)\right)\right) \tag{6}$$

where $T(\cdot)$ denotes the transformation function associated with the current transformation matrix $T$, and $Z_{k,i}$ represents the covariance matrix of the geometric feature residuals, computed as follows:

$$Z_{k,i} = \psi_{B,m}^i + R\,\psi_{A,m}^k\,R^{T} \tag{7}$$

where $R \in SO(3)$ denotes the rotation matrix component, and $\psi_{A,m}^k$ and $\psi_{B,m}^i$ represent the geometric feature covariance matrices associated with the potential corresponding points $p_A^k$ and $p_B^i$, respectively.

This leads to the correspondence probability formulation for the UAV and UGV teams' local maps $A$ and $B$ under transformation matrix $T$, as follows:

$$\rho(A, B, T) = \prod_{k=0}^{M} \rho\left(p_{A,m}^k, p_{B,m}^i, l_{A,m}^k, l_{B,m}^i, T\right) \tag{8}$$
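A direct transcription of Equations (5)–(8) into code may help clarify how the semantic gate and the Gaussian geometric term combine. The sketch below assumes 3D keypoints, 4x4 homogeneous transformation matrices, and a hypothetical list of matched keypoint pairs; the function names are our own.

```python
import numpy as np

def semantic_match(l_a, l_b):
    """Equation (5): binary semantic similarity of two labels."""
    return 1.0 if l_a == l_b else 0.0

def geometric_match(p_a, p_b, T, psi_a, psi_b):
    """Equation (6): Gaussian geometric similarity of a keypoint pair
    under the 4x4 homogeneous transformation T; psi_a and psi_b are the
    3x3 feature covariances entering Equation (7)."""
    R = T[:3, :3]
    Z = psi_b + R @ psi_a @ R.T                   # Equation (7)
    r = p_b - (R @ p_a + T[:3, 3])                # residual p_B - T(p_A)
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** 3 * np.linalg.det(Z))
    return norm * np.exp(-0.5 * r @ np.linalg.solve(Z, r))

def map_correspondence_prob(pairs, T):
    """Equation (8): joint correspondence probability of local maps A
    and B; `pairs` is a hypothetical list of matched keypoints, each as
    (p_a, p_b, l_a, l_b, psi_a, psi_b)."""
    prob = 1.0
    for p_a, p_b, l_a, l_b, psi_a, psi_b in pairs:
        prob *= (geometric_match(p_a, p_b, T, psi_a, psi_b)
                 * semantic_match(l_a, l_b))
    return prob
```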
The optimal transformation matrix $T$ is subsequently computed via an adaptive iterative adjustment strategy, as outlined in Algorithm 1. First, the initial parameters are given, namely the initial rotation angle $R_I$, the initial translation vector $T_I$, and the predefined correspondence probability threshold $\tau$. The algorithm then iteratively evaluates the correspondence probability under the current transformation parameters, compares the result with $\tau$, and adaptively scales the rotation angle and translation step size. Through continued iteration, the optimal cross-domain coordinate transformation matrix $T$ between the UAV and UGV is ultimately determined. This process provides reliable relative pose measurement initialization for the back-end DPGO.
Algorithm 1. Adaptive Iterative Adjustment Strategy
Input: initial rotation angle $R_I$; initial translation vector $T_I$; correspondence probability threshold $\tau$
Output: rotation angle $R$; translation vector $T$
1: while $\rho(A, B, T) < \tau$ do
2:   evaluate $\rho(A, B, T)$ using Equation (8)
3:   if $\rho(A, B, T) < \tau/10$ then
4:     $R = R_I \times 10$, $T = T_I \times 5$
5:   else if $\rho(A, B, T) < \tau/3$ then
6:     $R = R_I \times 5$, $T = T_I \times 3$
7:   else if $\rho(A, B, T) < \tau/2$ then
8:     $R = R_I \times 2$, $T = T_I$
9:   else
10:    $R = R_I$, $T = T_I$
11:  end if
12: end while
13: return $R$, $T$
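A minimal Python sketch of Algorithm 1, reusing `map_correspondence_prob` from the previous sketch, is given below. The single-axis rotation update, the trial-step acceptance test, and the iteration cap are our own simplifications, since the paper specifies only the step-size scaling rules and the threshold $\tau$.

```python
import numpy as np

def rotz(theta):
    """Rotation about the z-axis (a 1-DOF stand-in for the paper's
    rotation-angle update)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def adaptive_registration(pairs, R_step, t_step, tau, max_iters=200):
    """Sketch of Algorithm 1: scale the rotation/translation increments
    according to how far the correspondence probability of Equation (8)
    lies below the threshold tau, then try a refinement step.  `t_step`
    is a length-3 translation increment."""
    T = np.eye(4)
    for _ in range(max_iters):
        rho = map_correspondence_prob(pairs, T)   # Equation (8)
        if rho >= tau:                            # converged
            break
        if rho < tau / 10:   dR, dt = R_step * 10, t_step * 5
        elif rho < tau / 3:  dR, dt = R_step * 5,  t_step * 3
        elif rho < tau / 2:  dR, dt = R_step * 2,  t_step
        else:                dR, dt = R_step,      t_step
        T_new = T.copy()
        T_new[:3, :3] = rotz(dR) @ T[:3, :3]      # rotate by scaled angle
        T_new[:3, 3] = T[:3, 3] + dt              # translate by scaled step
        if map_correspondence_prob(pairs, T_new) > rho:
            T = T_new                             # keep improving steps only
    return T
```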

3.3. Robust DPGO Back-End Module

After passing through the distributed loop closure configuration module, effective pose initialization was achieved among multiple UAVs and UGVs in the air–ground cross-domain. The next step is to perform pose estimation and optimization of the global robot using the robust DPGO back-end module. To enhance the robustness and accuracy of the DPGO in the proposed method and to mitigate the negative effects caused by false loop closures, this paper introduces a multi-level partitioning majorization–minimization DPGO method incorporating loss kernel optimization.
DPGO, as a nonlinear non-convex optimization problem, estimates unknown poses by modeling multi-robot trajectories as graph structures in which vertices represent individual poses and edges represent relative pose constraints [30]. DPGO in the proposed method integrates odometry estimates within robots and loop closure constraints between robots to collaboratively solve for globally consistent trajectories among air–ground cross-domain UAVs and UGVs [31]. The DPGO problem is formally modeled through a directed graph $G = (\nu, \varepsilon)$, where a vertex in $\nu$ represents an unknown pose $g_{\alpha_i} \in SE(d)$ of robot $\alpha$ at keyframe $i$, and a directed edge in $\varepsilon \subseteq \nu \times \nu$ represents a noisy relative measurement $\tilde{g}_{\alpha\beta}^{ij} \in SE(d)$ between robot $\alpha$ at keyframe $i$ and robot $\beta$ at keyframe $j$. The directed graph $G$ is assumed to be weakly connected, which ensures the solvability of the optimization problem while maintaining algorithmic generality.
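As a concrete illustration of this graph model, the following minimal sketch stores poses and the two kinds of measurement edges; the class and field names are our own illustrative choices, not the paper's implementation.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class PoseGraph:
    """Directed pose graph G = (V, E): vertices keyed by (robot, keyframe)
    hold SE(3) pose estimates; edges hold noisy relative measurements
    together with their rotation/translation weights (w, tau)."""
    vertices: dict = field(default_factory=dict)   # (robot, i) -> 4x4 pose
    edges: list = field(default_factory=list)      # measurement tuples

    def add_pose(self, robot, i, g=None):
        self.vertices[(robot, i)] = np.eye(4) if g is None else g

    def add_measurement(self, a, i, b, j, g_meas, w=1.0, tau=1.0):
        """Intra-robot odometry edge when a == b; inter-robot loop
        closure otherwise (the latter passes through the Huber kernel
        during optimization)."""
        self.edges.append((a, i, b, j, g_meas, w, tau))
```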
The directed graph G represents an idealized model under the assumption of perfect loop closure detection. In practical multi-robot systems, however, inter-robot measurements are susceptible to outliers caused by erroneous loop closures, which introduce significant noise in relative pose estimation and degrade DPGO performance [32]. To address these measurement anomalies and improve the optimization accuracy, we construct a robust multi-level graph-optimized DPGO problem with Huber loss function through maximum likelihood estimation, as follows:
$$\min_{X \in \mathcal{X}} F(X) \tag{9}$$

The objective function $F(X)$ is defined as follows:

$$\begin{aligned} F(X) \triangleq{}& \sum_{\alpha \in N}\sum_{(i,j)\in\varepsilon_{\alpha\alpha}} \frac{1}{2}\left( w_{\alpha_i\alpha_j}\left\| R_{\alpha_j} - R_{\alpha_i}\tilde{R}_{\alpha_i\alpha_j} \right\|^2 + \tau_{\alpha_i\alpha_j}\left\| T_{\alpha_j} - T_{\alpha_i} - R_{\alpha_i}\tilde{T}_{\alpha_i\alpha_j} \right\|^2\right) \\ &+ \sum_{\alpha,\beta\in N,\ \alpha\neq\beta}\ \sum_{(i,j)\in\varepsilon_{\alpha\beta}} \frac{1}{2}\,\eta\!\left( w_{\alpha_i\beta_j}\left\| R_{\beta_j} - R_{\alpha_i}\tilde{R}_{\alpha_i\beta_j} \right\|^2 + \tau_{\alpha_i\beta_j}\left\| T_{\beta_j} - T_{\alpha_i} - R_{\alpha_i}\tilde{T}_{\alpha_i\beta_j} \right\|^2\right) \end{aligned} \tag{10}$$

where $w_{\alpha_i\alpha_j}, \tau_{\alpha_i\alpha_j}, w_{\alpha_i\beta_j}, \tau_{\alpha_i\beta_j}$ denote the weights of the respective noisy measurement terms, and $\eta(\cdot)$ represents the Huber non-trivial loss kernel, which is specifically defined as follows:

$$\eta(s) = \begin{cases} s, & |s| \le a^2 \\ 2a\sqrt{|s|} - a^2, & |s| \ge a^2 \end{cases} \tag{12}$$

where $a > 0$. The Huber non-trivial loss kernel function is shown in Figure 4.
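For illustration, a direct transcription of this kernel and of the weight it induces (the derivative used later as the surrogate weight in Equation (30)) might look as follows; the function names are ours, and the scalar form assumes the kernel is applied to a squared residual.

```python
def huber_kernel(s, a=1.0):
    """Huber non-trivial loss kernel of Equation (12), applied to a
    squared residual s: identity near zero, square-root growth in the
    tail, which bounds the influence of outlier loop closures."""
    return s if abs(s) <= a ** 2 else 2.0 * a * abs(s) ** 0.5 - a ** 2

def huber_weight(s, a=1.0):
    """Derivative of the kernel, the surrogate weight mu of Equation
    (30): inter-robot edges with large residuals are down-weighted."""
    return 1.0 if abs(s) <= a ** 2 else a / abs(s) ** 0.5
```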
To enhance communication efficiency, reduce communication overhead, and improve optimization performance for DPGO, we adopt a multi-level partitioning strategy for the directed graph G . The original multi-robot pose graph is recursively decomposed into balanced subgraphs across multiple hierarchical levels.
As shown in Figure 5, the procedure begins by partitioning $G$ into $k$ balanced subgraphs using edge-cut or vertex-cut methods, with the choice between these strategies depending on the graph's inherent connectivity patterns [33]. Subsequently, each subgraph is further partitioned to generate $N$ disjoint node sets $X = \{X_\alpha, X_\beta, \dots, X_N\}$. Notably, the state space of every robot node $X_\alpha$ combines both its local pose information and pose estimates from adjacent robots within the same hierarchical level. The specific definition is as follows:

$$X_\alpha = \left\{g_{\alpha_i}, \dots, g_{\beta_j}, \dots, g_{\gamma_k}\right\}, \qquad \alpha, \beta, \gamma \in N, \quad i, j, k \in [n] \tag{13}$$

where $X_\alpha$ represents all the poses contained in each subgraph after partitioning. The multi-level partitioning DPGO problem can then be formulated as follows:
$$\min_{X\in\mathcal{X}} F(X) = \sum_{\alpha\in N} F_\alpha(X) \tag{14}$$

where Equations (15)–(17) are as follows:

$$\begin{aligned} F_\alpha(X) \triangleq{}& \sum_{(g_{\alpha_i},\, g_{\alpha_j})\in\varepsilon_i} \frac{1}{2}\left( w_{\alpha_i\alpha_j}\left\| R_{\alpha_j} - R_{\alpha_i}\tilde{R}_{\alpha_i\alpha_j} \right\|^2 + \tau_{\alpha_i\alpha_j}\left\| T_{\alpha_j} - T_{\alpha_i} - R_{\alpha_i}\tilde{T}_{\alpha_i\alpha_j} \right\|^2\right) \\ &+ \sum_{(g_{\alpha_i},\, g_{\beta_j})\in\varepsilon_i^{\alpha\beta}} \frac{1}{2}\,\eta\!\left( w_{\alpha_i\beta_j}\left\| R_{\beta_j} - R_{\alpha_i}\tilde{R}_{\alpha_i\beta_j} \right\|^2 + \tau_{\alpha_i\beta_j}\left\| T_{\beta_j} - T_{\alpha_i} - R_{\alpha_i}\tilde{T}_{\alpha_i\beta_j} \right\|^2\right) \end{aligned} \tag{15}$$

$$X_\alpha = \left\{g_{\alpha_i}, \dots, g_{\beta_j}, \dots, g_{\gamma_k}\right\} \tag{16}$$

$$\text{s.t.}\quad g_{\alpha_i}, g_{\beta_j} \in SE(d), \qquad \alpha, \beta, \gamma \in N, \quad i, j, k \in [n] \tag{17}$$

where $F_\alpha(X)$ denotes the multi-level partitioning DPGO subproblem constructed for robot $\alpha$, and $\varepsilon_i$ represents the constraint set of local pose edges associated with robot $\alpha \in N$.
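The paper selects edge-cut or vertex-cut partitioning depending on graph connectivity [33] without prescribing a specific partitioner. As one concrete possibility, the sketch below uses recursive spectral bisection on a weighted adjacency matrix to produce balanced subgraphs (called as `multilevel_partition(adj, np.arange(n), depth)`); all function names are illustrative, and a connected subgraph is assumed at each level.

```python
import numpy as np

def balanced_bipartition(adj):
    """One partition level: split the vertices into two equal halves by
    sorting the Fiedler vector of the graph Laplacian (spectral
    bisection, one possible edge-cut heuristic)."""
    lap = np.diag(adj.sum(axis=1)) - adj
    _, vecs = np.linalg.eigh(lap)
    order = np.argsort(vecs[:, 1])        # second-smallest eigenvector
    half = len(order) // 2
    return order[:half], order[half:]

def multilevel_partition(adj, nodes, depth):
    """Recursively bisect `depth` times, yielding 2**depth balanced
    subgraphs; minimizing the cut keeps strongly coupled poses (heavy
    edges) inside the same subgraph."""
    if depth == 0 or len(nodes) <= 1:
        return [nodes]
    sub = adj[np.ix_(nodes, nodes)]
    left, right = balanced_bipartition(sub)
    return (multilevel_partition(adj, nodes[left], depth - 1)
            + multilevel_partition(adj, nodes[right], depth - 1))
```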
For a simpler representation, the above formula can be rewritten as follows:
$$\min_{X\in\mathcal{X}} F(X) \triangleq \sum_{\alpha\in N}\left( \sum_{(g_{\alpha_i},\, g_{\alpha_j})\in\varepsilon_i} F_\alpha^{\alpha_i\alpha_j}(X) + \sum_{(g_{\alpha_i},\, g_{\beta_j})\in\varepsilon_i^{\alpha\beta}} F_\alpha^{\alpha_i\beta_j}(X) \right) \tag{18}$$

where $F_\alpha^{\alpha_i\alpha_j}(X)$ and $F_\alpha^{\alpha_i\beta_j}(X)$ represent the measurement estimates within and between the nodes of robot $\alpha$, respectively. Equations (19) and (20) are as follows:

$$F_\alpha^{\alpha_i\alpha_j}(X) \triangleq \frac{1}{2}\left( w_{\alpha_i\alpha_j}\left\| R_{\alpha_j} - R_{\alpha_i}\tilde{R}_{\alpha_i\alpha_j} \right\|^2 + \tau_{\alpha_i\alpha_j}\left\| T_{\alpha_j} - T_{\alpha_i} - R_{\alpha_i}\tilde{T}_{\alpha_i\alpha_j} \right\|^2\right) \tag{19}$$

$$F_\alpha^{\alpha_i\beta_j}(X) \triangleq \frac{1}{2}\,\eta\!\left( w_{\alpha_i\beta_j}\left\| R_{\beta_j} - R_{\alpha_i}\tilde{R}_{\alpha_i\beta_j} \right\|^2 + \tau_{\alpha_i\beta_j}\left\| T_{\beta_j} - T_{\alpha_i} - R_{\alpha_i}\tilde{T}_{\alpha_i\beta_j} \right\|^2\right) \tag{20}$$
Next, we can achieve global optimization of DPGO by optimizing each component of the objective function.
According to reference [31], when $\alpha = \beta$ or $\alpha \ne \beta$, there exists a sparse positive semidefinite matrix $\Psi_{\alpha_i\beta_j} \in \mathbb{R}^{n(d+1)\times n(d+1)}$ that satisfies the following Equation (21):

$$\frac{1}{2}\left\|X\right\|^2_{\Psi_{\alpha_i\beta_j}} = \frac{1}{2} w_{\alpha_i\beta_j}\left\| R_{\beta_j} - R_{\alpha_i}\tilde{R}_{\alpha_i\beta_j} \right\|^2 + \frac{1}{2}\tau_{\alpha_i\beta_j}\left\| T_{\beta_j} - T_{\alpha_i} - R_{\alpha_i}\tilde{T}_{\alpha_i\beta_j} \right\|^2 \tag{21}$$

Then, the intra-node measurement term $F_\alpha^{\alpha_i\alpha_j}(X)$ and the inter-node measurement term $F_\alpha^{\alpha_i\beta_j}(X)$ can be expressed as follows:

$$F_\alpha^{\alpha_i\alpha_j}(X) \triangleq \frac{1}{2}\left\|X\right\|^2_{\Psi_{\alpha_i\alpha_j}} \tag{22}$$

$$F_\alpha^{\alpha_i\beta_j}(X) \triangleq \frac{1}{2}\,\eta\!\left(\left\|X\right\|^2_{\Psi_{\alpha_i\beta_j}}\right) \tag{23}$$
For any matrices $A, B, C \in \mathbb{R}^{m\times n}$, it can easily be proved that the following holds:

$$\frac{1}{2}\left\|A - B\right\|^2_{\Psi_{\alpha_i\beta_j}} \le \left\|A - C\right\|^2_{\Psi_{\alpha_i\beta_j}} + \left\|B - C\right\|^2_{\Psi_{\alpha_i\beta_j}} \tag{24}$$

where $\Psi_{\alpha_i\beta_j}$ is positive semidefinite, and equality holds when $C = \frac{1}{2}A + \frac{1}{2}B$. Setting $C = 0$, the above formula can be rewritten as follows:

$$\frac{1}{2}\left\|A - B\right\|^2_{\Psi_{\alpha_i\beta_j}} \le \left\|A\right\|^2_{\Psi_{\alpha_i\beta_j}} + \left\|B\right\|^2_{\Psi_{\alpha_i\beta_j}} \tag{25}$$

Applying this bound to the right-hand side of $\frac{1}{2}\|X\|^2_{\Psi_{\alpha_i\beta_j}}$ yields the following:

$$\frac{1}{2}\left\|X\right\|^2_{\Psi_{\alpha_i\beta_j}} \le w_{\alpha_i\beta_j}\left(\left\| R_{\beta_j}\right\|^2 + \left\| R_{\alpha_i}\tilde{R}_{\alpha_i\beta_j} \right\|^2\right) + \tau_{\alpha_i\beta_j}\left(\left\| T_{\beta_j}\right\|^2 + \left\| T_{\alpha_i} + R_{\alpha_i}\tilde{T}_{\alpha_i\beta_j} \right\|^2\right) \tag{26}$$

Meanwhile, since $\tilde{R}_{\alpha_i\beta_j}^{T}\tilde{R}_{\alpha_i\beta_j} = \tilde{R}_{\alpha_i\beta_j}\tilde{R}_{\alpha_i\beta_j}^{T} = I$, this can be further expressed as follows:

$$\frac{1}{2}\left\|X\right\|^2_{\Psi_{\alpha_i\beta_j}} \le w_{\alpha_i\beta_j}\left(\left\| R_{\beta_j}\right\|^2 + \left\| R_{\alpha_i}\right\|^2\right) + \tau_{\alpha_i\beta_j}\left(\left\| T_{\beta_j}\right\|^2 + \left\| T_{\alpha_i} + R_{\alpha_i}\tilde{T}_{\alpha_i\beta_j} \right\|^2\right) \tag{27}$$

There exists a positive semidefinite matrix $\Omega_{\alpha_i\beta_j} \in \mathbb{R}^{(d+1)n\times(d+1)n}$ such that the right-hand side of the above inequality can be rewritten as follows:

$$\frac{1}{2}\left\|X\right\|^2_{\Omega_{\alpha_i\beta_j}} = w_{\alpha_i\beta_j}\left(\left\| R_{\beta_j}\right\|^2 + \left\| R_{\alpha_i}\right\|^2\right) + \tau_{\alpha_i\beta_j}\left(\left\| T_{\beta_j}\right\|^2 + \left\| T_{\alpha_i} + R_{\alpha_i}\tilde{T}_{\alpha_i\beta_j} \right\|^2\right) \tag{28}$$

where $\Omega_{\alpha_i\beta_j}$ is a block-diagonal matrix that can be used to decouple the unknown poses of different nodes.
Let $X^{(k)} = [X_\alpha^{(k)}, \dots, X_N^{(k)}] \in \mathcal{X}$, where $X_\alpha^{(k)} \in \mathcal{X}_\alpha$ represents the state variable of $F_\alpha(X_\alpha)$ at iteration $k$. Then, for any $X \in \mathcal{X}_\alpha$, the minimizing surrogate function $E_\alpha^{\alpha_i\beta_j}(X|X^{(k)})$ of $F_\alpha^{\alpha_i\beta_j}(X)$ can be obtained as follows:

$$E_\alpha^{\alpha_i\beta_j}\!\left(X|X^{(k)}\right) \triangleq \frac{1}{2}\mu_{\alpha_i\beta_j}^{(k)}\left\|X - X^{(k)}\right\|^2_{\Omega_{\alpha_i\beta_j}} + \left\langle \nabla F_\alpha^{\alpha_i\beta_j}\!\left(X^{(k)}\right),\; X - X^{(k)}\right\rangle + F_\alpha^{\alpha_i\beta_j}\!\left(X^{(k)}\right) \ge F_\alpha^{\alpha_i\beta_j}(X) \tag{29}$$

Equality holds when $X = X^{(k)}$. Here, $\mu_{\alpha_i\beta_j}^{(k)}$ is the weight associated with the loss kernel function, defined as follows:

$$\mu_{\alpha_i\beta_j}^{(k)} \triangleq \begin{cases} 1, & \alpha = \beta \\ \nabla\eta\!\left(\left\|X^{(k)}\right\|^2_{\Psi_{\alpha_i\beta_j}}\right), & \alpha \ne \beta \end{cases} \tag{30}$$
The surrogate function $G_\alpha(X|X^{(k)})$ of $F_\alpha(X)$ can then be further constructed such that, for any $X \in \mathcal{X}_\alpha$:

$$G_\alpha\!\left(X|X^{(k)}\right) \triangleq \sum_{(g_{\alpha_i},\, g_{\alpha_j})\in\varepsilon_i} E_\alpha^{\alpha_i\alpha_j}\!\left(X|X^{(k)}\right) + \sum_{(g_{\alpha_i},\, g_{\beta_j})\in\varepsilon_i^{\alpha\beta}} E_\alpha^{\alpha_i\beta_j}\!\left(X|X^{(k)}\right) + \frac{\varsigma}{2}\left\|X - X^{(k)}\right\|^2 \ge F_\alpha(X) \tag{31}$$

where $\varsigma \ge 0 \in \mathbb{R}$, and equality holds when $X = X^{(k)}$.

We can further obtain the following:

$$\sum_{\alpha\in N} G_\alpha\!\left(X|X^{(k)}\right) \ge \sum_{\alpha\in N} F_\alpha(X) = F(X) \tag{32}$$

with equality when $X = X^{(k)}$.

There also exists a positive semidefinite matrix $\Gamma_\alpha^{(k)} \in \mathbb{R}^{(d+1)n_\alpha\times(d+1)n_\alpha}$ such that Equation (33) is as follows:

$$G_\alpha\!\left(X|X^{(k)}\right) \triangleq \frac{1}{2}\left\|X - X_\alpha^{(k)}\right\|^2_{\Gamma_\alpha^{(k)}} + \left\langle \nabla_{X_\alpha} F_\alpha\!\left(X^{(k)}\right),\; X - X_\alpha^{(k)}\right\rangle \tag{33}$$

where $\nabla_{X_\alpha} F_\alpha(X^{(k)})$ represents the Euclidean gradient of $F_\alpha(X)$ with respect to $X \in \mathcal{X}_\alpha$ at $X^{(k)} \in \mathcal{X}$.

For any node $\alpha \in N$, $G_\alpha(X|X^{(k)})$ is equivalent to the following:

$$G_\alpha\!\left(X|X^{(k)}\right) = \sum_{i=1}^{n} G_{\alpha_i}\!\left(X_{\alpha_i}|X_\alpha^{(k)}\right) \tag{34}$$

where $G_{\alpha_i}(X_{\alpha_i}|X_\alpha^{(k)})$ is a function of the single pose $X_{\alpha_i} \in \mathcal{X}_\alpha$, and $X_{\alpha_i} = [R_{\alpha_i}, t_{\alpha_i}] \in SE(d)$ denotes the $i$-th pose in the directed graph $\mathcal{X}_\alpha$ of node $\alpha$. Furthermore, there exists a positive semidefinite matrix $\Gamma_{\alpha_i}^{(k)} \in \mathbb{R}^{(d+1)\times(d+1)}$ such that Equation (35) is as follows:

$$G_{\alpha_i}\!\left(X|X^{(k)}\right) = \frac{1}{2}\left\|X - X_{\alpha_i}^{(k)}\right\|^2_{\Gamma_{\alpha_i}^{(k)}} + \left\langle \nabla_{X_{\alpha_i}} F_\alpha\!\left(X^{(k)}\right),\; X - X_{\alpha_i}^{(k)}\right\rangle \tag{35}$$

where $\nabla_{X_{\alpha_i}} F_\alpha(X^{(k)})$ represents the Euclidean gradient of $F_\alpha(X)$ at $X^{(k)} \in \mathcal{X}$ with respect to the single pose $X_{\alpha_i} \in SE(d)$.
$G_\alpha(X|X^{(k)})$ couples the multi-level directed graphs of different nodes, making majorization–minimization methods for DPGO possible. By setting $\varsigma > 0$ and letting it approach 0, $G_\alpha(X|X^{(k)})$ becomes the tightest upper-bound surrogate function of $F_\alpha(X)$, which improves the convergence speed and convergence stability.
Therefore, the optimization problem of $F(X)$ can be solved through independent optimization problems over $X \in \mathcal{X}_\alpha$ at each node $\alpha \in N$. Based on the above derivation, the following update rule is proposed:

$$X_\alpha^{(k+1)} \leftarrow \operatorname*{argmin}_{X\in\mathcal{X}_\alpha \subseteq SE(d)} G_\alpha\!\left(X|X^{(k)}\right) \tag{36}$$

Since Equations (34) and (36) can be solved within a single multi-level partitioned node $\alpha \in N$, we can further decompose Equation (36) into an independent optimization problem for a single pose $X_{\alpha_i} \in SE(d)$ at node $\alpha$, as follows:

$$X_{\alpha_i}^{(k+1)} \leftarrow \operatorname*{argmin}_{X\in\mathcal{X}_\alpha \subseteq SE(d)} G_{\alpha_i}\!\left(X|X^{(k)}\right) \tag{37}$$

From the above, Equation (38) follows:

$$F\!\left(X^{(k)}\right) = \sum_{\alpha\in N} G_\alpha\!\left(X^{(k)}|X^{(k)}\right) \ge \sum_{\alpha\in N} G_\alpha\!\left(X^{(k+1)}|X^{(k)}\right) \ge F\!\left(X^{(k+1)}\right) \tag{38}$$
The above derivation shows that $F(X^{(k+1)}) \le F(X^{(k)})$. Therefore, Equations (36) and (37) are reasonable update rules for distributed PGO with proven convergence to first-order critical points. The specific optimization procedure formulated from these rules is shown in Algorithm 2. The algorithm details are as follows:
(1) In row 6 of Algorithm 2, each robot node $\alpha$ performs a round of inter-node communication to obtain $X_\beta^{(k)}$ from its neighboring robot nodes $\beta \in N$;
(2) In row 7 of Algorithm 2, each robot node $\alpha$ evaluates $\Gamma_\alpha^{(k)}$ and $\nabla_X F_\alpha(X^{(k)})$ using its own state $X_\alpha^{(k)}$ and its neighbors' states $X_\beta^{(k)}$;
(3) In rows 8 to 11 of Algorithm 2, $X_\alpha^{(k+1)}$ is calculated for each robot node $\alpha$;
(4) In row 11 of Algorithm 2, each robot node $\alpha$ is iteratively optimized to improve the final solution $X^{(k+1)}$.
Algorithm 2. Multi-level Partitioning DPGO Method
1: Input: an initial iterate $X^{(0)} \in \mathcal{X}$ and $\varsigma \ge 0$
2: Output: a sequence of iterates $\{X^{(k)}\}$
3: function
4: for $k = 0, 1, 2, \dots$ do
5:   for node $\alpha = 1, 2, \dots, N$ do
6:     retrieve $X_\beta^{(k)}$ from each neighbor $\beta \in N$
7:     evaluate $\Gamma_\alpha^{(k)}$, $\nabla_X F_\alpha(X^{(k)})$
8:     for $i = 1, 2, \dots, n$ do
9:       $X_{\alpha_i}^{(k+1)} \leftarrow \operatorname{argmin}_{X\in\mathcal{X}_\alpha \subseteq SE(d)} G_{\alpha_i}(X|X^{(k)})$
10:    end for
11:    assemble $X_\alpha^{(k+1)}$ from $X_{\alpha_i}^{(k+1)}$, $i = 1, 2, \dots, n$
12:  end for
13:  assemble $X^{(k+1)}$ from $X_\alpha^{(k+1)}$, $\alpha = 1, 2, \dots, N$
14: end for
15: return $\{X^{(k)}\}$
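To make the per-node update of rows 7–11 concrete, the following sketch performs one synchronous round of the surrogate minimization in Equations (33)–(37), treating each robot's block as a flat vector. Here `grad_F` and `Gamma` are hypothetical callbacks supplying the Euclidean gradient and surrogate curvature, and the retraction of each updated pose back onto $SE(d)$ is omitted for brevity.

```python
import numpy as np

def mm_dpgo_step(X, robots, grad_F, Gamma, zeta=1e-3):
    """One synchronous round of Algorithm 2: each robot block minimizes
    its quadratic surrogate 0.5*||X - X_k||_G^2 + <g, X - X_k>.
    `X` maps a robot id to its flattened pose block."""
    X_next = {}
    for alpha in robots:
        g = grad_F(alpha, X)                        # gradient at X^(k)
        G = Gamma(alpha) + zeta * np.eye(g.shape[0])
        # closed-form minimizer of the quadratic surrogate
        X_next[alpha] = X[alpha] + np.linalg.solve(G, -g)
    return X_next
```

Since each surrogate upper-bounds $F_\alpha$ and is tight at $X^{(k)}$, accepting these blockwise minimizers guarantees $F(X^{(k+1)}) \le F(X^{(k)})$, matching Equation (38).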
The proposed optimization algorithm requires no extra computational effort beyond each robot node $\alpha$ communicating with its neighboring robot nodes $\beta \in N$. Next, the first-order convergence of the algorithm is verified.
Theorem 1.
Suppose that for any node $\alpha = 1, 2, \dots, N$, the following conditions hold:

$$G_\alpha\!\left(X^{(k)}|X^{(k)}\right) \ge G_\alpha\!\left(X^{(k+1)}|X^{(k)}\right) \tag{39}$$

$$\mathrm{grad}\; G_\alpha\!\left(X^{(k+1)}|X^{(k)}\right) = 0 \tag{40}$$

where $\mathrm{grad}$ denotes the Riemannian gradient. Then, for the iterative sequence $\{X^{(k)}\}$ generated by the optimization algorithm, we can conclude the following:
(1) $F(X^{(k)})$ is non-increasing;
(2) $F(X^{(k)}) \to F^{\infty}$ as $k \to \infty$;
(3) $\left\|X^{(k+1)} - X^{(k)}\right\| \to 0$ as $k \to \infty$, if $\varsigma > 0$;
(4) $\left\|\mathrm{grad}\, F(X^{(k)})\right\| \to 0$ as $k \to \infty$, if $\varsigma > 0$.
Proof: 
(1) According to Formulas (31) and (32), we can deduce the following:

$$G\!\left(X|X^{(k)}\right) \ge F(X) \tag{41}$$

with equality when $X = X^{(k)}$. From Formula (38), we can derive the following:

$$F\!\left(X^{(k+1)}\right) \le G\!\left(X^{(k+1)}|X^{(k)}\right) \le G\!\left(X^{(k)}|X^{(k)}\right) = F\!\left(X^{(k)}\right) \tag{42}$$

Therefore, (1) holds, and $F(X^{(k)})$ is non-increasing.

(2) Since (1) holds and $F(X^{(k)}) \ge 0$, we can conclude the following:

$$\exists F^{\infty} \in \mathbb{R}, \qquad F\!\left(X^{(k)}\right) \to F^{\infty} \tag{43}$$

(3) According to Formula (42), we have the following:

$$F\!\left(X^{(k)}\right) - F\!\left(X^{(k+1)}\right) \ge G\!\left(X^{(k+1)}|X^{(k)}\right) - F\!\left(X^{(k+1)}\right) \tag{44}$$

Substituting Formulas (29) and (33) into the above equation, we obtain the following:

$$F\!\left(X^{(k+1)}\right) \le \frac{1}{2}\left\|X^{(k+1)} - X^{(k)}\right\|^2_{\Omega^{(k)}} + \left\langle \nabla F\!\left(X^{(k)}\right),\; X^{(k+1)} - X^{(k)}\right\rangle + F\!\left(X^{(k)}\right) \tag{45}$$

$$G\!\left(X^{(k+1)}|X^{(k)}\right) = \frac{1}{2}\left\|X^{(k+1)} - X^{(k)}\right\|^2_{\Gamma^{(k)}} + \left\langle \nabla F\!\left(X^{(k)}\right),\; X^{(k+1)} - X^{(k)}\right\rangle + F\!\left(X^{(k)}\right) \tag{46}$$

$$F\!\left(X^{(k)}\right) - F\!\left(X^{(k+1)}\right) \ge \frac{1}{2}\left\|X^{(k+1)} - X^{(k)}\right\|^2_{\Gamma^{(k)}} - \frac{1}{2}\left\|X^{(k+1)} - X^{(k)}\right\|^2_{\Omega^{(k)}} \tag{47}$$

Since $\Gamma^{(k)} \succeq \Omega^{(k)}$, we can derive the following:

$$F\!\left(X^{(k)}\right) - F\!\left(X^{(k+1)}\right) \ge \frac{\delta}{2}\left\|X^{(k+1)} - X^{(k)}\right\|^2 \tag{48}$$

where $\delta > 0$ is a constant scalar. As $k \to \infty$, we have $F(X^{(k)}) \to F^{\infty}$, and thus $F(X^{(k)}) - F(X^{(k+1)}) \to 0$. Therefore, we can conclude the following:

$$\left\|X^{(k+1)} - X^{(k)}\right\| \to 0 \tag{49}$$

(4) Since (3) holds and $\mathrm{grad}\, F(X^{(k)})$ is continuous, it follows that $\|\mathrm{grad}\, F(X^{(k)})\| \to 0$ as $k \to \infty$, if $\varsigma > 0$. □
Thus, we can demonstrate that the DPGO method presented in this paper converges stably. Compared with other DPGO methods, the multi-level partitioning majorization–minimization DPGO method incorporating loss kernel optimization proposed here imposes the mildest requirements for first-order convergence, while significantly enhancing the convergence performance of the nonlinear optimization.
The above DPGO method operates periodically on each robot. Each robot first determines neighboring robots within communication range by analyzing connectivity in the distributed pose factor graph. Subsequently, it performs loop closure detection through interactive keyframe matching. Following the elimination of erroneous keyframes and computation of the optimal loop closure transformation matrix, the method implements the multi-level partitioning majorization–minimization DPGO method incorporating loss kernel optimization. In each iteration of the algorithm, the optimal solution for each robot pose estimation is found, and a consensus is reached on the optimal trajectory and pose estimation.

4. Results

This section presents a systematic evaluation of the performance of the distributed multi-robot collaborative SLAM method based on air–ground cross-domain cooperation proposed in this paper. First, we conduct numerical experiments comparing our DPGO optimization approach with existing methods, specifically analyzing its convergence speed and optimization accuracy. Second, we validate the method’s practical performance in air–ground cooperative UAV and UGV teams by assessing trajectory estimation accuracy and map construction quality using air–ground multi-robot datasets. Experimental results demonstrate that our method offers significant advantages for air–ground cooperative UAV and UGV teams, delivering robust, high-precision pose and trajectory estimation while maintaining global map consistency.

4.1. DPGO Performance Analysis

We evaluate our DPGO method using 2D and 3D SLAM standard benchmark datasets. The 3D evaluation employs standard datasets including the Sphere, Grid, Garage, and Torus, while the 2D evaluation utilizes the City and CSAIL datasets [11]. This dual-testing framework guarantees a reliable assessment of the method’s universality and practical applicability across different scenarios.
We employ two quantitative metrics to evaluate method performance: the relative suboptimality gap and the Riemannian gradient norm [21].
The relative suboptimality gap is defined as $(F - F^*)/F^*$, where $F$ denotes the objective value at each iteration and $F^*$ represents the global optimal value computed by SE-Sync [11]. Specifically, SE-Sync reformulates the non-convex optimization problem of the original pose graph through semidefinite relaxation and Riemannian optimization on matrix manifolds, as follows:

$$F^{*} = \min_{Y^{T}Y \succeq 0}\ \left\langle Q,\; Y^{T}Y\right\rangle \tag{50}$$

where $Q$ is the coefficient matrix constructed from the noisy measurement values, defined as follows:

$$Q = \sum_{\alpha\in N}\sum_{(i,j)\in\varepsilon_{\alpha\alpha}} Q_{\bar{g}_{\alpha\alpha}^{ij}} + \sum_{\alpha,\beta\in N,\ \alpha\neq\beta}\ \sum_{(i,j)\in\varepsilon_{\alpha\beta}} Q_{\tilde{g}_{\alpha\beta}^{ij}} + Q_{anchor} \tag{51}$$

where $Q_{\bar{g}_{\alpha\alpha}^{ij}}$, $Q_{\tilde{g}_{\alpha\beta}^{ij}}$, and $Q_{anchor}$ represent the intra-robot measurement noise, the inter-robot measurement noise, and the anchor point constraints, respectively. These matrices are explicitly defined as follows:

$$Q_{\bar{g}_{\alpha\alpha}^{ij}} = \begin{bmatrix} \Omega_R\,(\bar{R}_{\alpha\alpha}^{ij})^{T}\bar{R}_{\alpha\alpha}^{ij} & \Omega_R\,(\bar{R}_{\alpha\alpha}^{ij})^{T}\bar{T}_{\alpha\alpha}^{ij} \\ \left(\Omega_R\,(\bar{R}_{\alpha\alpha}^{ij})^{T}\bar{T}_{\alpha\alpha}^{ij}\right)^{T} & \Omega_T\,(\bar{T}_{\alpha\alpha}^{ij})^{T}\bar{T}_{\alpha\alpha}^{ij} \end{bmatrix} \otimes \begin{bmatrix} \Omega_R I_d & 0 \\ 0 & \Omega_T \end{bmatrix} \tag{52}$$

$$Q_{\tilde{g}_{\alpha\beta}^{ij}} = \begin{bmatrix} \Omega_R\,(\tilde{R}_{\alpha\beta}^{ij})^{T}\tilde{R}_{\alpha\beta}^{ij} & \Omega_R\,(\tilde{R}_{\alpha\beta}^{ij})^{T}\tilde{T}_{\alpha\beta}^{ij} \\ \left(\Omega_R\,(\tilde{R}_{\alpha\beta}^{ij})^{T}\tilde{T}_{\alpha\beta}^{ij}\right)^{T} & \Omega_T\,(\tilde{T}_{\alpha\beta}^{ij})^{T}\tilde{T}_{\alpha\beta}^{ij} \end{bmatrix} \otimes \begin{bmatrix} \Omega_R I_d & 0 \\ 0 & \Omega_T \end{bmatrix} \tag{53}$$

$$Q_{anchor} = \lambda \sum_{\alpha\in N}\sum_{k\in A_\alpha} e_{\alpha_k} e_{\alpha_k}^{T} \otimes I_{d+1} \tag{54}$$

where $\Omega_R$ and $\Omega_T$ represent the rotation information matrix and the translation information matrix, respectively, $A_\alpha$ represents the anchor point pose set of robot $\alpha$, and $e_{\alpha_k}$ corresponds to the standard basis vector.

In Equation (50), $Y$ represents the matrix variable of the rotation components relaxed by the SDP and is defined as follows:

$$Y = \begin{bmatrix} u_{\alpha_1} & \cdots & u_{\alpha_i} & \cdots & u_{\beta_j} & \cdots & u_{N_n} \end{bmatrix} \in \mathbb{R}^{d\cdot N\times r} \tag{55}$$

where $u_{\alpha_i} \in \mathbb{R}^d$ is the rotation vector remapped to its corresponding rotation matrix via $R_{\alpha_i} = \mathrm{mat}(u_{\alpha_i})$. After computing the global optimal objective value $F^*$, the relative suboptimality gap $(F - F^*)/F^*$ at each iteration is quantified; convergence to the global optimum is indicated as this value approaches zero.
The Riemannian gradient norm measures the variation of the gradient during optimization. Its global measure is defined as follows:

$$\left\|\mathrm{grad}\, f\right\|_{\mathcal{M}} = \sqrt{\sum_{\alpha\in N}\sum_{i=1}^{n} \left\|\mathrm{grad}\, f\!\left(R_{\alpha_i}\right)\right\|_F^2} \tag{56}$$

where $\mathcal{M}$ represents the product manifold of the rotation group. The term $\mathrm{grad}\, f(R_{\alpha_i})$ denotes the Riemannian gradient associated with a single robot pose, obtained by projecting the Euclidean gradient onto the tangent space, specifically:

$$\mathrm{grad}\, f\!\left(R_{\alpha_i}\right) = R_{\alpha_i}\cdot\mathrm{skew}\!\left(\sum_{\beta\in N}\sum_{(i,j)\in\varepsilon} R_{\alpha_i}^{T} R_{\beta_j}\tilde{R}_{\alpha\beta}^{ij}\right) \tag{57}$$

where $\mathrm{skew}(M) = \frac{M - M^{T}}{2}$ is the projection onto the Lie algebra of skew-symmetric matrices. The smaller the global Riemannian gradient norm, the stronger the indicated convergence.
Together, these complementary metrics evaluate the convergence behavior of the method from different angles and provide a quantitative basis for assessing the optimization effect.
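As an illustration of how the two metrics can be computed from an iterate, consider the following sketch; `rotations` and `euclidean_grads` are hypothetical per-pose quantities, and the gradient-norm routine projects each Euclidean gradient onto the tangent space of $SO(3)$ via the skew operator defined above.

```python
import numpy as np

def suboptimality_gap(F_k, F_star):
    """Relative suboptimality gap (F - F*) / F* against the SE-Sync
    global optimum; it approaches zero at the global solution."""
    return (F_k - F_star) / F_star

def riemannian_grad_norm(rotations, euclidean_grads):
    """Global Riemannian gradient norm of Equation (56): project each
    per-pose Euclidean gradient onto the tangent space of SO(3) with
    skew(M) = (M - M^T)/2, then accumulate Frobenius norms."""
    total = 0.0
    for R, g in zip(rotations, euclidean_grads):
        M = R.T @ g                      # pull the gradient back to SO(3)
        total += np.linalg.norm(R @ (0.5 * (M - M.T)), "fro") ** 2
    return np.sqrt(total)
```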
To rigorously evaluate our proposed DPGO optimization method, we conduct comparative experiments with three state-of-the-art PGO algorithms: DGS (distributed Gauss–Seidel) [17], RBCD++ (accelerated Riemannian block coordinate descent) [18], and MM-PGO (majorization–minimization for distributed pose graph optimization) [21]. DGS adopts its standard configuration, while RBCD++ and MM-PGO are configured with Nesterov acceleration. For comprehensive scalability validation, we test all methods using 5-node and 10-node robot team configurations. The comparative results are presented in Table 1.
The scalar F in Table 1 denotes the globally optimized objective value for the multi-robot pose graph optimization problem, which corresponds to the negative log-likelihood under the given measurement noise model. Its magnitude is determined by the covariance-weighted sum of all measurement residuals (rotational and translational), inherently lacking a unified dimension. We typically focus more on the numerical change relative to the initial value than on its physical unit. However, this value can serve as an indicator of the pose graph’s consistency. A smaller value indicates a higher consistency in the pose graph.
Table 1 presents the objective function values of each method at 12, 25, 50, 100, 250, and 500 iterations for both 5-node and 10-node robot systems. The key advantage of DPGO is its ability to achieve higher precision more quickly, rather than its eventual attainment of the highest possible precision. The results highlight the correlation between convergence rate and iteration count across various optimization methods. These data demonstrate the superior convergence performance of our method across all datasets. While the performance of each method is similar in simpler datasets, like Sphere(3D) and CSAIL(2D), our method reaches optimal solution convergence within 50 iterations. In more complex scenarios, particularly in large-scale datasets, like Garage(3D) and City(2D), our method achieves optimal convergence within 100 iterations, while other methods fail to converge even after 500 iterations. These findings confirm that our method outperforms alternatives in both convergence quality and speed, especially in complex real-world scenarios.
To evaluate the optimization performance of each method, we set the Riemannian gradient norm threshold and the relative suboptimality gap threshold to $10^{-1}$ and $10^{-5}$, respectively. Specifically, when the Riemannian gradient norm satisfies $\|\mathrm{grad}\, f\|_{\mathcal{M}} < 10^{-1}$, optimal convergence has been achieved and the optimization has reached a critical state. Similarly, when the relative suboptimality gap satisfies $(F - F^*)/F^* < 10^{-5}$, the current value of the objective function is very close to the global optimal solution.
To further compare how the two evaluation metrics, the Riemannian gradient norm and the relative suboptimality gap, evolve during iteration, Figure 6 and Figure 7 present the optimization results of all methods on the Grid(3D) and Torus(3D) datasets, respectively. The comparison curves demonstrate the superior efficiency of our method, which achieves optimal convergence within just 200 iterations on both datasets. Specifically, it reaches a relative suboptimality gap of $10^{-5}$, well below the threshold, within 200 iterations and reduces the Riemannian gradient norm to $10^{-1}$ within 500 iterations. In contrast, the DGS method shows the worst optimization performance, failing to converge within 500 iterations: in the 5-node configuration, its Riemannian gradient norm only reaches the order of $10^{0}$ and its relative suboptimality gap does not fall below $10^{-5}$; in the more complex 10-node configuration, the gradient norm only reaches $10^{-1}$ and the gap remains above $10^{-4}$. RBCD++ and MM-PGO perform similarly, with both reaching the Riemannian gradient threshold of $10^{-1}$ within 500 iterations and keeping the relative suboptimality gap below $10^{-3}$. Therefore, the DPGO method proposed in this paper outperforms all comparison methods in both convergence speed and optimization performance. Notably, our method converges faster in the 5-node configuration than in the 10-node one, verifying the negative correlation between the number of robot nodes and the complexity of the optimization problem: fewer nodes impose tighter spatial constraints on pose estimates, thereby accelerating convergence. Despite this node-dependent variation, our method consistently delivers superior convergence rates and final optimization accuracy across all system scales.
The comparative analysis results of DPGO performance analysis demonstrate that our DPGO algorithm achieves superior performance, exhibiting both higher convergence accuracy and lower computational overhead than other methods.

4.2. Air–Ground Cross-Domain Collaborative SLAM Performance Analysis

Following the evaluation of the DPGO method, we systematically verified the actual performance of our air–ground cross-domain cooperative SLAM method.
We conducted experiments using the GRACO air–ground collaborative dataset [34], which is the first benchmark dataset specifically designed for air–ground cooperative multi-robot SLAM. Developed by Sun Yat-sen University, the dataset includes synchronized data collected from aerial drones and ground vehicles, with both equipped with multi-modal sensors, such as LiDAR, cameras, and GNSS. The GRACO dataset consists of eight aerial sequences and six ground sequences, each providing highly accurate ground truth trajectories. Notably, it contains abundant loop closure features across both aerial and ground sequences. These comprehensive characteristics make GRACO particularly well-suited for evaluating the performance of SLAM algorithms in air–ground cross-domain collaboration and multi-modal fusion scenarios.
We evaluate the method performance on the GRACO dataset using two key metrics, namely absolute trajectory error (ATE) for global accuracy and relative trajectory error (RTE) for local stability [35]. These two metrics jointly evaluate the pose estimation performance of the SLAM system.
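For reference, simplified versions of the two metrics over aligned position sequences might be computed as follows; the trajectory alignment step (commonly Umeyama alignment) is assumed to have been applied beforehand, and the frame offset `delta` is an illustrative parameter.

```python
import numpy as np

def ate_rmse(est, gt):
    """Absolute trajectory error: RMSE of translational differences for
    N x 3 position arrays, assuming the trajectories are already
    aligned."""
    d = est - gt
    return np.sqrt((d ** 2).sum(axis=1).mean())

def rte_rmse(est, gt, delta=1):
    """Relative trajectory error: RMSE of the translational drift over
    a fixed frame offset `delta`, measuring local consistency."""
    errs = [np.linalg.norm((est[i + delta] - est[i])
                           - (gt[i + delta] - gt[i])) ** 2
            for i in range(len(est) - delta)]
    return np.sqrt(np.mean(errs))
```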
To comprehensively evaluate the performance advantages of the proposed method, we compared it with four mainstream SLAM algorithms, namely Lio-Sam [36], Fast-Lio2 [37], ORB-SLAM3 [38], and GAC-Mapping [24]. Table 2 presents the comparative pose estimation errors of each method on the sequences of the GRACO dataset. All performance metrics were extracted directly from the original publications.
The comparison data in Table 2 demonstrate that the proposed method outperforms the other algorithms in pose estimation accuracy across all sequences. In the aerial sequence, Fast-Lio2, GAC-Mapping, and ORB-SLAM3 show significant errors, primarily due to their limited use of altitude information. Although Lio-Sam exhibits a small pose estimation error, the semantics-assisted strategy in our method further enhances pose trajectory estimation accuracy. Our method reduces the mean ATE of each sequence by 0.3494 m, highlighting the effectiveness of the proposed method. In the ground sequence, despite the interference from numerous repeated features, such as railings and trees, the proposed method maintains high pose estimation accuracy through its robust feature-matching mechanism. As a result, its error value remains consistently lower than that of the other methods. The mean ATE of each sequence is reduced by 0.2335 m, 0.5864 m, 5.0058 m, and 0.4981 m, respectively, when compared to Lio-Sam, Fast-Lio2, ORB-SLAM3, and GAC-Mapping. These results demonstrate the exceptional pose estimation accuracy of the proposed method in both ground and aerial sequences, validating its universality and robustness in air–ground cross-domain collaborative SLAM tasks.
To further intuitively evaluate the effectiveness of the proposed method, we conducted a detailed comparative analysis of two representative sequences from the GRACO dataset—the Ground-01 and the Aerial-03 sequences—using three benchmark SLAM methods, namely Lio-Sam, Fast-Lio2, and ORB-SLAM3.
Ground-01 is a loop closure sequence with a relatively high traveling speed, and the absence of distinctive building features poses a significant challenge for feature matching. Figure 8 presents the comparison between the ground truth trajectory and the trajectory estimation by four methods for Ground-01. Specifically, Figure 8a,b display the comparison of the overall trajectory alignment error and the alignment error in each direction for each method on Ground-01, respectively. From the overall trajectory comparison graphs, it is evident that our method achieves the smallest error when compared to the ground truth trajectory. In the latter part of the sequence, as the speed increases, the trajectory and relative rotation errors in the y-direction for the predicted trajectories of other methods increase significantly. However, our method significantly reduces trajectory drift, especially the relative trajectory errors in each direction.
As shown in Figure 9, the comparison results of the ATE evaluation metrics for each method on Ground-01 are presented. From the comparison chart, it is evident that our method outperforms the other methods across all evaluation indicators in this sequence. Specifically, our method reduces the absolute pose estimation error by 0.165, 0.059, and 4.219 compared to Lio-Sam, Fast-Lio2, and ORB-SLAM3, respectively. Among all methods, ORB-SLAM3 performs the worst, with its maximum error reaching 10.089, while our method achieves a significantly lower maximum error of 0.595.
As shown in Figure 10, the results of the RTE comparison highlight the superior performance of our method. The evaluation metrics consistently show that the error in our method’s relative pose estimation is lower than that of other comparative methods. Specifically, our method achieves optimal results with a maximum error of 0.099 and a mean error of 0.026. In contrast, ORB-SLAM3 demonstrates the worst performance, with a maximum error of 1.099 during operation. However, the performance differences between Lio-Sam and Fast-Lio2 are not significant. The mean relative pose errors are 0.026 and 0.185, respectively, and the maximum errors are 0.127 and 0.414, respectively. These error values are notably higher than those of our method.
The Aerial-03 sequence presents a significant challenge due to the dense, repetitive structures at an altitude of 20 m, which greatly complicate pose estimation. As illustrated in Figure 11, the comparison between the ground truth trajectory and the trajectories generated by each method on the Aerial-03 sequence is shown. Specifically, Figure 11a,b display the overall trajectory alignment error comparison and the trajectory error comparison in each direction for each method on Aerial-03, respectively. Through this comparative analysis, it becomes evident that the error of the method proposed in this paper is considerably lower than that of the other methods. Notably, during mid-flight UAV turning maneuvers, other methods demonstrate significant amplification of z-direction errors, while our proposed method maintains trajectory stability and effectively reduces trajectory drift.
As shown in Figure 12, the ATE comparison chart for each method on the Aerial-03 sequence is presented. It is evident from the comparison chart that our method outperforms the others in all ATE indicators. The mean absolute error of our method is 0.166, which represents reductions of 0.384, 35.848, and 24.034 compared to Lio-Sam, Fast-Lio2, and ORB-SLAM3, respectively.
Similarly, Figure 13 presents the RTE comparison chart for each method. The comparison chart clearly illustrates that the relative pose error of our method is significantly lower than that of the other methods. Specifically, our method achieves a mean relative pose error of 0.044 and a maximum error of only 0.272, demonstrating superior stability and accuracy. In comparison, our method reduces the mean relative pose error by 0.175, 16.581, and 4.226 when compared to Lio-Sam, Fast-Lio2, and ORB-SLAM3, respectively.
These experimental results validate that the proposed method delivers excellent pose estimation performance on both ground and aerial sequences.
We then analyzed the cooperative positioning and mapping performance of our method on UAV and UGV air–ground sequences containing many repeated loop closures. The experiment evaluated air–ground collaboration using two robot teams: a two-robot team (one UAV and one UGV) and a three-robot team (one UAV and two UGVs). Table 3 reports the comparative results of air–ground multi-robot cooperative SLAM for the proposed method and GAC-Mapping.
The data in Table 3 show that the proposed method significantly outperforms GAC-Mapping on both air–ground sequence groups. For Team 1, which pairs the Aerial-02 and Ground-04 sequences of GRACO, the mean ATE and RTE were reduced by 1.1885 m and 2.7819 m, respectively. Moreover, owing to the PCM incremental anomaly detection module and the multi-level partitioning scheme, the total parameter footprint decreased by 18.9% and the processing delay was shortened by 38.9%. For Team 2, comprising the Aerial-08, Ground-03, and Ground-06 sequences of GRACO, the parameter reduction reached 29.5%, the computing delay fell by 35.6%, and the ATE and RTE were reduced by 0.2387 m and 2.8485 m, respectively. Overall, these results correspond to an average improvement of 42.7% in trajectory estimation accuracy, a 24.2% reduction in resource consumption, and a 37.3% gain in operational efficiency, demonstrating the dual accuracy and efficiency advantages of air–ground cooperative pose estimation.
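The aggregate percentages quoted above follow directly from the per-team entries of Table 3; the short script below, with the table values hard-coded purely for illustration, reproduces them:

```python
import numpy as np

# Per-sequence values transcribed from Table 3 (ATE in m, parameters in MB,
# time in s); Team 1 = {Aerial-02, Ground-04},
# Team 2 = {Aerial-08, Ground-03, Ground-06}.
gac_ate  = [np.mean([1.9351, 1.6639]), np.mean([1.4051, 0.8727, 1.4401])]
ours_ate = [np.mean([0.4071, 0.8148]), np.mean([0.5350, 1.1962, 1.2706])]
gac_par,  ours_par  = [2.37, 5.16], [1.92, 3.64]
gac_time, ours_time = [3.72, 7.97], [2.27, 5.13]

# Per-team relative improvement in percent, then averaged over teams.
rel = lambda g, o: [(gi - oi) / gi * 100 for gi, oi in zip(g, o)]

print(np.mean(rel(gac_ate,  ours_ate)))   # ~42.7 (% ATE improvement)
print(np.mean(rel(gac_par,  ours_par)))   # ~24.2 (% parameter reduction)
print(np.mean(rel(gac_time, ours_time)))  # ~37.3 (% time reduction)
```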
We also examined the global trajectory comparison and the details of map construction. To distinguish the mapping areas, each robot's map information is independently color-coded. This makes each robot's local map contribution immediately identifiable while the global map remains consistent, allowing a clear visual assessment of the collaboration effect.
Figure 14 compares the global trajectory estimates of our method and GAC-Mapping against the ground truth trajectories, with the UAV running the Aerial-02 sequence and the UGV running the Ground-04 sequence of GRACO. The comparison shows that our method clearly outperforms GAC-Mapping: the latter's trajectory errors are relatively large on both the aerial and ground sequences, while our global trajectory estimate stays much closer to the true trajectory. Notably, an individual robot's trajectory error does not always decrease during optimization; variations in pose estimation can even raise a robot's instantaneous error, especially during UAV turning maneuvers. Nevertheless, the overall system error is significantly reduced. This matches expectations: robots with more accurate trajectory estimates help correct the estimates of the others, improving system-level positioning accuracy while preserving global trajectory consistency. Compared with single-robot SLAM, air–ground cross-domain collaborative SLAM therefore shows clear advantages for large-scale mapping, particularly in map completeness and coverage of previously unexplored environments.
As the mapping result in Figure 15a demonstrates, our method effectively fuses the local maps of an air–ground team composed of one UAV and one UGV into a unified global map that is rich in detail and highly consistent with the actual environment. The magnified view in Figure 15b clearly shows the smooth transition between the road and the roadside trees, neat map edges, and accurately identified low bushes at the corners of the environment. These details comprehensively validate the geometric consistency of the constructed map.
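Conceptually, this fusion step transforms each robot's local point cloud into the common global frame fixed by the registration and optimization stages and tags it with a per-robot display color. A minimal sketch, in which the robot identifiers, colors, and the source of the transforms are illustrative assumptions:

```python
import numpy as np

# Illustrative per-robot display colors (RGB); the actual palette and
# robot identifiers are placeholders.
ROBOT_COLORS = {"uav_1": (219, 68, 55), "ugv_1": (66, 133, 244)}

def merge_local_maps(local_maps, transforms):
    """Fuse per-robot local point clouds into one color-coded global map.
    local_maps: {robot_id: (N_i, 3) points in that robot's local frame}
    transforms: {robot_id: (R, t)} rigid transforms into the global frame,
    e.g., as fixed by air-ground registration and pose graph optimization.
    Returns an (M, 6) XYZRGB array."""
    merged = []
    for rid, pts in local_maps.items():
        R, t = transforms[rid]
        world = pts @ R.T + t                          # local -> global
        rgb = np.tile(ROBOT_COLORS[rid], (len(world), 1))
        merged.append(np.hstack([world, rgb]))
    return np.vstack(merged)
```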
Figure 16 compares the trajectory estimates of an air–ground collaborative team, consisting of one UAV and two UGVs, with the ground truth trajectories. The UAV runs the Aerial-08 sequence of GRACO, while the two UGVs run the Ground-03 and Ground-06 sequences, respectively. The global trajectory comparison again favors our method: in the region where the aerial and ground trajectories intersect, GAC-Mapping exhibits significant trajectory errors, whereas our method shows the smallest trajectory deviation and lower cumulative error. These results indicate that our method effectively exploits loop closure detections from both the UAV and the UGVs to achieve accurate global trajectory estimation.
The mapping result in Figure 17a further highlights our method's effectiveness in air–ground cross-domain collaborative SLAM tasks: the local maps of the UAV and UGVs are seamlessly merged into a single global environment map that is rich in detail and globally consistent with the actual scene. The magnified local view in Figure 17b confirms this consistency, with smooth transitions between roads, buildings, and street-side trees, neat edges, and very few outlying points.
In summary, the comparison results demonstrate our method's superior performance in air–ground cross-domain collaborative SLAM tasks: it significantly improves both trajectory estimation accuracy and mapping quality for air–ground cross-domain multi-robot SLAM systems, providing reliable technical support for multi-robot cooperative tasks in complex environments.

5. Discussion

The experimental results on both the benchmark SLAM datasets and the air–ground GRACO dataset demonstrate the superior performance of the proposed method, particularly in large-scale distributed pose graph optimization and air–ground cross-domain SLAM. On the benchmark SLAM datasets, the method reaches optimal convergence with minimal iterations in both the 5-node and 10-node configurations. It excels in the complex 10-node scenarios, primarily owing to two key techniques: loss kernel optimization and multi-level partitioning. Loss kernel optimization reduces the loop closure detection errors caused by repetitive or complex environments, improving optimization performance in challenging scenarios. Multi-level partitioning divides the global directed pose graph into multiple balanced directed subgraphs according to the number of robot nodes and their degree of coupling; these subgraphs are then refined to minimize communication complexity and accelerate convergence. Experiments on the GRACO dataset further validate the method's exceptional performance in air–ground cross-domain cooperative SLAM, even under the significant viewpoint differences between UAVs and UGVs. This is achieved by the iterative registration method based on semantic and geometric feature assistance, which computes a high-precision air–ground loop closure transformation matrix by integrating the semantic features of the loop closure and performing probabilistic calculations on the geometric features, thereby enhancing global pose estimation accuracy. Furthermore, the robustness of the multi-level partitioning majorization–minimization DPGO method with loss kernel optimization yields optimal global pose estimation and precise map construction in collaborative scenarios involving multiple UAV and UGV teams. Together, these results show that the method effectively integrates environmental perception data from UAVs and UGVs, maintaining high-precision pose estimation and efficient optimization performance in complex cross-domain collaborative tasks.
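To make these two techniques concrete, the sketch below shows (i) the per-edge weight obtained when a quadratic surrogate majorizes a Huber-type loss kernel, and (ii) a greedy, coupling-aware node assignment. Both are deliberately simplified stand-ins: the η_s² kernel and the multi-level coarsening and refinement of the actual method are more elaborate, and all names and thresholds here are chosen for illustration only.

```python
import numpy as np

def huber_weight(r, delta=1.0):
    """Per-edge weight of the quadratic surrogate that majorizes a
    Huber-type loss at residual norm r: the surrogate upper-bounds the
    kernel and touches it at r, so loop-closure edges with large
    residuals are down-weighted in the next least-squares step."""
    return 1.0 if r <= delta else delta / r

def greedy_balanced_partition(coupling, k):
    """One-level greedy stand-in for multi-level partitioning: assign
    each node to the block it is most strongly coupled to, with a size
    penalty that keeps the k blocks balanced (grouping strongly coupled
    nodes reduces inter-block communication).
    coupling: (n, n) symmetric matrix of inter-node edge counts."""
    blocks = [[] for _ in range(k)]
    for v in np.argsort(-coupling.sum(axis=1)):   # most coupled first
        scores = []
        for blk in blocks:
            idx = np.array(blk, dtype=int)
            scores.append(coupling[v, idx].sum() - len(blk))
        blocks[int(np.argmax(scores))].append(int(v))
    return blocks
```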
Although the method demonstrates excellent performance and robustness, some limitations remain. First, it has been validated only on public datasets; its behavior in real-world, complex environments, with challenges such as sensor noise, has yet to be verified. Additionally, as the method's performance scales up, the communication complexity between robot nodes increases, which may degrade operational efficiency and performance, particularly on platforms with limited computational resources or memory. Nevertheless, the comparisons of running parameters and time consumption with other methods show that, through techniques such as PCM incremental anomaly discrimination, multi-level partitioning, and loss kernel optimization, the method generalizes well across diverse scenarios and outperforms existing methods in both runtime resource consumption and robustness.
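For intuition, PCM-style anomaly discrimination keeps only sets of inter-robot loop closures that are mutually consistent: two candidate closures, composed with the intra-robot odometry between their endpoints, should yield approximately the identity transform [16]. A simplified sketch using 4×4 SE(3) matrices, with a Frobenius-norm residual standing in for the covariance-weighted log-map norm of the original formulation and a purely illustrative threshold:

```python
import numpy as np

def cycle_residual(T_ik, T_jl, T_ij_a, T_lk_b):
    """Residual of the cycle formed by two inter-robot loop closures
    T_ik, T_jl and the intra-robot odometry chains T_ij_a (robot a) and
    T_lk_b (robot b); all arguments are 4x4 SE(3) matrices. For mutually
    consistent measurements the composition is close to the identity."""
    cycle = np.linalg.inv(T_ik) @ T_ij_a @ T_jl @ T_lk_b
    return np.linalg.norm(cycle - np.eye(4))

def pairwise_consistent(T_ik, T_jl, T_ij_a, T_lk_b, tau=0.1):
    # Consistent pairs become edges of a consistency graph; the maximum
    # clique of that graph gives the accepted loop-closure set.
    return cycle_residual(T_ik, T_jl, T_ij_a, T_lk_b) < tau
```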

6. Conclusions

This paper proposes a distributed multi-robot collaborative SLAM method based on air–ground cross-domain cooperation, achieving high-precision collaborative SLAM by fusing environmental features from both UAVs and UGVs. The method begins with local single-robot SA-LIO at each robot's front end, which extracts local keyframes of the surrounding environment in real time. When UAVs and UGVs enter communication range, they exchange loop closure keyframes and apply PCM incremental anomaly detection to eliminate incorrect loop closures, ensuring reliable data association. Next, an iterative registration method based on semantic and geometric feature assistance computes the relative coordinate transformation matrix between heterogeneous keyframes, establishing a unified coordinate system and providing accurate initial pose estimates for back-end optimization. Finally, a multi-level partitioning majorization–minimization DPGO method incorporating loss kernel optimization ensures smooth convergence of the optimization process to first-order critical points, enabling precise global localization and mapping for air–ground UAV and UGV teams. Experimental results on benchmark SLAM datasets and the GRACO dataset demonstrate that our method achieves state-of-the-art performance in air–ground cross-domain collaborative SLAM compared to other approaches.
In future work, we will continue to explore more accurate and reliable algorithmic improvements, drawing on advanced techniques such as chordal initialization and spectral relaxation to reduce communication complexity and enhance optimization efficiency. We also plan to extend the proposed method from public datasets to real-world environments to ensure optimal system performance. These advances will contribute to the deployment of air–ground cross-domain multi-robot systems in practical application scenarios.

Author Contributions

Conceptualization, Y.B.; Data curation, X.J.; Methodology, Y.B.; Software, Y.B.; Supervision, P.L. and C.W.; Validation, P.L.; Writing—original draft, Y.B.; Writing—review and editing, Y.B., P.L., C.W. and X.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Jilin Province Science and Technology Development Plan Project, grant number 20250102228JC.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CSLAM: Collaborative simultaneous localization and mapping
UAV: Unmanned aerial vehicle
UGV: Unmanned ground vehicle
PCM: Pairwise consistency maximization
DPGO: Distributed pose graph optimization

References

1. Jiang, W.; Xue, H.; Si, S.; Xiao, L.; Zhao, D.; Zhu, Q.; Nie, Y.; Dai, B. R2SCAT-LPR: Rotation-Robust Network with Self- and Cross-Attention Transformers for LiDAR-Based Place Recognition. Remote Sens. 2025, 17, 1057.
2. Barbieri, L.; Tedeschini, B.C.; Brambilla, M.; Nicoli, M. Deep Learning-Based Cooperative LiDAR Sensing for Improved Vehicle Positioning. IEEE Trans. Signal Process. 2024, 72, 1666–1682.
3. Chang, Y.; Hughes, N.; Ray, A.; Carlone, L. Hydra-Multi: Collaborative Online Construction of 3D Scene Graphs with Multi-Robot Teams. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; pp. 10995–11002.
4. Xu, H.; Liu, P.; Chen, X.; Shen, S. D2SLAM: Decentralized and Distributed Collaborative Visual-Inertial SLAM System for Aerial Swarm. IEEE Trans. Robot. 2024, 40, 3445–3464.
5. Araújo, A.G.; Pizzino, C.A.P.; Couceiro, M.S.; Rocha, R.P. A Multi-Drone System Proof of Concept for Forestry Applications. Drones 2025, 9, 80.
6. Chen, Z.; Zhao, J.; Feng, T.; Ye, C.; Xiong, L. Robust Loop Closure Selection Based on Inter-Robot and Intra-Robot Consistency for Multi-Robot Map Fusion. Remote Sens. 2023, 15, 2796.
7. Xie, Y.; Zhang, Y.; Chen, L.; Cheng, H.; Tu, W.; Cao, D. RDC-SLAM: A Real-Time Distributed Cooperative SLAM System Based on 3D LiDAR. IEEE Trans. Intell. Transp. Syst. 2022, 23, 14721–14730.
8. Li, Z.; Xu, P.; Ding, L.; Shan, J.; Yang, W.; Deng, Z.; Gao, H.; Liu, T.; Yang, H. Enhancing Cooperative Exploration and Planning: UAV-Legged Robot Synergy. IEEE Robot. Autom. Lett. 2024, 9, 10367–10374.
9. Miller, I.D.; Cladera, F.; Smith, T.; Taylor, C.J.; Kumar, V. Stronger Together: Air-Ground Robotic Collaboration Using Semantics. IEEE Robot. Autom. Lett. 2022, 7, 9643–9650.
10. Munguia, R.; Grau, A.; Bolea, Y.; Obregón-Pulido, G. A Simultaneous Control, Localization, and Mapping System for UAVs in GPS-Denied Environments. Drones 2025, 9, 69.
11. Rosen, D.M.; Carlone, L.; Bandeira, A.S.; Leonard, J.J. SE-Sync: A certifiably correct algorithm for synchronization over the special Euclidean group. Int. J. Robot. Res. 2019, 38, 95–125.
12. Yu, Z.; Qiao, Z.; Qiu, L.; Yin, H.; Shen, S. Multi-Session, Localization-Oriented and Lightweight LiDAR Mapping Using Semantic Lines and Planes. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; pp. 7210–7217.
13. Huang, Y.; Shan, T.; Chen, F.; Englot, B. DiSCo-SLAM: Distributed Scan Context-Enabled Multi-Robot LiDAR SLAM With Two-Stage Global-Local Graph Optimization. IEEE Robot. Autom. Lett. 2022, 7, 1150–1157.
14. Lajoie, P.; Beltrame, G. Swarm-SLAM: Sparse Decentralized Collaborative Simultaneous Localization and Mapping Framework for Multi-Robot Systems. IEEE Robot. Autom. Lett. 2024, 9, 475–482.
15. Zhong, S.; Qi, Y.; Chen, Z.; Wu, J.; Chen, H.; Liu, M. DCL-SLAM: A Distributed Collaborative LiDAR SLAM Framework for a Robotic Swarm. IEEE Sens. J. 2024, 24, 4786–4797.
16. Mangelson, J.G.; Dominic, D.; Eustice, R.M.; Vasudevan, R. Pairwise Consistent Measurement Set Maximization for Robust Multi-Robot Map Merging. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 2916–2923.
17. Choudhary, S.; Carlone, L.; Nieto, C.; Rogers, J.; Christensen, H.; Dellaert, F. Distributed trajectory estimation with privacy and communication constraints: A two-stage distributed Gauss-Seidel approach. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016; pp. 5261–5268.
18. Tian, Y.; Koppel, A.; Bedi, A.S.; How, J.P. Asynchronous and Parallel Distributed Pose Graph Optimization. IEEE Robot. Autom. Lett. 2020, 5, 5819–5826.
19. Tian, Y.; Chang, Y.; Arias, F.H.; Nieto-Granda, C.; How, J.P.; Carlone, L. Kimera-Multi: Robust, Distributed, Dense Metric-Semantic SLAM for Multi-Robot Systems. IEEE Trans. Robot. 2022, 38, 2022–2038.
20. Yang, H.; Antonante, P.; Tzoumas, V.; Carlone, L. Graduated Non-Convexity for Robust Spatial Perception: From Non-Minimal Solvers to Global Outlier Rejection. IEEE Robot. Autom. Lett. 2020, 5, 1127–1134.
21. Fan, T.; Murphey, T. Majorization Minimization Methods for Distributed Pose Graph Optimization. IEEE Trans. Robot. 2024, 40, 22–42.
22. Nesterov, Y. Introductory Lectures on Convex Optimization: A Basic Course; Springer Science & Business Media: Berlin, Germany, 2013.
23. Li, C.; Guo, G.; Yi, P.; Hong, Y. Distributed Pose-Graph Optimization with Multi-Level Partitioning for Multi-Robot SLAM. IEEE Robot. Autom. Lett. 2024, 9, 4926–4933.
24. He, J.; Zhou, Y.; Huang, L.; Kong, Y.; Cheng, H. Ground and Aerial Collaborative Mapping in Urban Environments. IEEE Robot. Autom. Lett. 2021, 6, 95–102.
25. Liu, D.; Wu, J.; Du, Y.; Zhang, R.; Cong, M. SBC-SLAM: Semantic Bioinspired Collaborative SLAM for Large-Scale Environment Perception of Heterogeneous Systems. IEEE Trans. Instrum. Meas. 2024, 73, 5018110.
26. Duan, X.; Wu, M.; Xiong, C.; Hu, Q.; Zhao, P. A Base-Map-Guided Global Localization Solution for Heterogeneous Robots Using a Co-View Context Descriptor. Remote Sens. 2024, 16, 4027.
27. Qiao, X.; Tang, J.; Li, G.; Yang, W.; Shen, B.; Niu, W.; Xie, W.; Peng, X. SMR-GA: Semantic Map Registration Under Large Perspective Differences Through Genetic Algorithm. IEEE Robot. Autom. Lett. 2025, 10, 2518–2525.
28. Liu, P.; Bi, Y.; Shi, J.; Zhang, T.; Wang, C. Semantic-Assisted LIDAR Tightly Coupled SLAM for Dynamic Environments. IEEE Access 2024, 12, 34042–34053.
29. Zhong, Y. Intrinsic shape signatures: A shape descriptor for 3D object recognition. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision Workshops, Kyoto, Japan, 27 September–4 October 2009; pp. 689–696.
30. Tian, Y.; Khosoussi, K.; Rosen, D.M.; How, J.P. Distributed Certifiably Correct Pose-Graph Optimization. IEEE Trans. Robot. 2021, 37, 2137–2156.
31. Fan, T.; Murphey, T. Majorization Minimization Methods for Distributed Pose Graph Optimization with Convergence Guarantees. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020; pp. 5058–5065.
32. Barron, J.T. A General and Adaptive Robust Loss Function. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 4326–4334.
33. Xu, H.; Shen, S. BDPGO: Balanced Distributed Pose Graph Optimization Framework for Swarm Robotics. arXiv 2021, arXiv:2109.04502.
34. Zhu, Y.; Kong, Y.; Jie, Y.; Xu, S.; Cheng, H. GRACO: A Multimodal Dataset for Ground and Aerial Cooperative Localization and Mapping. IEEE Robot. Autom. Lett. 2023, 8, 966–973.
35. Evo: Python Package for the Evaluation of Odometry and SLAM. Available online: https://github.com/MichaelGrupp/evo (accessed on 19 November 2017).
36. Shan, T.; Englot, B.; Meyers, D.; Wang, W.; Ratti, C.; Rus, D. LIO-SAM: Tightly-coupled Lidar Inertial Odometry via Smoothing and Mapping. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 5135–5142.
37. Xu, W.; Cai, Y.; He, D.; Lin, J.; Zhang, F. FAST-LIO2: Fast Direct LiDAR-Inertial Odometry. IEEE Trans. Robot. 2022, 38, 2053–2073.
38. Campos, C.; Elvira, R.; Rodríguez, J.J.G.; Montiel, J.M.M.; Tardós, J.D. ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap SLAM. IEEE Trans. Robot. 2021, 37, 1874–1890.
Figure 1. Air–ground cross-domain collaborative robot team.
Figure 2. The architectural framework of the air–ground cross-domain collaborative SLAM system.
Figure 3. The structure of the iterative registration method based on semantic and geometric feature assistance.
Figure 4. The η_s² Huber loss kernel.
Figure 5. Procedure of the multi-level graph partitioning.
Figure 6. Optimized performance comparison results of all methods in the Grid(3D) dataset: (a) The Riemannian gradient norm comparison diagram of each method with a 5-node configuration; (b) the relative suboptimality gap comparison diagram of each method with a 5-node configuration; (c) the Riemannian gradient norm comparison diagram of each method with a 10-node configuration; (d) the relative suboptimality gap comparison diagram of each method with a 10-node configuration.
Figure 7. Optimized performance comparison results of all methods in the Torus(3D) dataset: (a) The Riemannian gradient norm comparison diagram of each method with a 5-node configuration; (b) the relative suboptimality gap comparison diagram of each method with a 5-node configuration; (c) the Riemannian gradient norm comparison diagram of each method with a 10-node configuration; (d) the relative suboptimality gap comparison diagram of each method with a 10-node configuration.
Figure 8. The comparison results of trajectory estimation by each method and ground truth trajectory on the Ground-01 sequence of the GRACO dataset: (a) The comparison results of trajectory alignment errors of each method; (b) the comparison results of trajectory errors in each direction of each method.
Figure 9. The ATE comparison results of each method on the Ground-01 sequence of the GRACO dataset: (a) The comparison results of various ATE indicators; (b) the comparison results of the ATE box plots.
Figure 10. The RTE comparison results of each method on the Ground-01 sequence of the GRACO dataset: (a) The comparison results of various RTE indicators; (b) the comparison results of the RTE box plots.
Figure 11. The comparison results of trajectory estimation by each method and ground truth trajectory in the Aerial-03 sequence of the GRACO dataset: (a) The comparison results of trajectory alignment errors of each method; (b) the comparison results of trajectory errors in each direction of each method.
Figure 12. The ATE comparison results of each method on the Aerial-03 sequence of the GRACO dataset: (a) The comparison results of various ATE indicators; (b) the comparison results of the ATE box plots.
Figure 13. The RTE comparison results of each method on the Aerial-03 sequence of the GRACO dataset: (a) The comparison results of various RTE indicators; (b) the comparison results of the RTE box plots.
Figure 14. The comparison results of the ground truth trajectories on the Aerial-02 and the Ground-04 sequences with the cooperative estimated trajectories of our method.
Figure 15. The air–ground collaborative mapping results of our method on the Aerial-02 and the Ground-04 sequences: (a) The complete map constructed by our method; (b) a partial enlarged view of the map constructed by our method.
Figure 16. The comparison results of the ground truth trajectories on the Aerial-08, Ground-03, and Ground-06 sequences with the cooperative estimated trajectories of our method.
Figure 17. The air–ground collaborative mapping results of our method on the Aerial-08, Ground-03, and Ground-06 sequences: (a) The complete map constructed by our method; (b) a partial enlarged view of the map constructed by our method.
Table 1. Results of DPGO data on 2D and 3D baseline datasets for each method.
| Number of Nodes | Method | Number of Iterations | Sphere(3D) | Grid(3D) | Garage(3D) | Torus(3D) | City(2D) | CSAIL(2D) |
|---|---|---|---|---|---|---|---|---|
| | F | | 1.6870×10³ | 8.4319×10⁴ | 1.2625×10⁰ | 2.4227×10⁴ | 6.3862×10² | 3.1704×10¹ |
| 5 | DGS | 12 | 1.7245×10³ | 8.4351×10⁴ | 1.4857×10⁰ | 2.4294×10⁴ | 6.8616×10² | 3.4619×10¹ |
| | | 25 | 1.6953×10³ | 8.4325×10⁴ | 1.4062×10⁰ | 2.4249×10⁴ | 6.7354×10² | 3.3285×10¹ |
| | | 50 | 1.6889×10³ | 8.4321×10⁴ | 1.3328×10⁰ | 2.4238×10⁴ | 6.6427×10² | 3.2617×10¹ |
| | | 100 | 1.6877×10³ | 8.4321×10⁴ | 1.3170×10⁰ | 2.4235×10⁴ | 6.5536×10² | 3.2081×10¹ |
| | | 250 | 1.6871×10³ | 8.4320×10⁴ | 1.2867×10⁰ | 2.4229×10⁴ | 6.5228×10² | 3.1805×10¹ |
| | | 500 | 1.6870×10³ | 8.4320×10⁴ | 1.2768×10⁰ | 2.4228×10⁴ | 6.5006×10² | 3.1704×10¹ |
| | RBCD++ | 12 | 1.6941×10³ | 8.4347×10⁴ | 1.5249×10⁰ | 2.4251×10⁴ | 6.7047×10² | 3.1707×10¹ |
| | | 25 | 1.6904×10³ | 8.4323×10⁴ | 1.4521×10⁰ | 2.4231×10⁴ | 6.5928×10² | 3.1707×10¹ |
| | | 50 | 1.6871×10³ | 8.4321×10⁴ | 1.3595×10⁰ | 2.4229×10⁴ | 6.5279×10² | 3.1705×10¹ |
| | | 100 | 1.6870×10³ | 8.4320×10⁴ | 1.3364×10⁰ | 2.4228×10⁴ | 6.4418×10² | 3.1704×10¹ |
| | | 250 | 1.6870×10³ | 8.4320×10⁴ | 1.3276×10⁰ | 2.4228×10⁴ | 6.3892×10² | 3.1704×10¹ |
| | | 500 | 1.6870×10³ | 8.4320×10⁴ | 1.3199×10⁰ | 2.4227×10⁴ | 6.3862×10² | 3.1704×10¹ |
| | MM-PGO | 12 | 1.6936×10³ | 8.4405×10⁴ | 1.4482×10⁰ | 2.4274×10⁴ | 6.5652×10² | 3.1706×10¹ |
| | | 25 | 1.6903×10³ | 8.4341×10⁴ | 1.4047×10⁰ | 2.4235×10⁴ | 6.5337×10² | 3.1705×10¹ |
| | | 50 | 1.6871×10³ | 8.4325×10⁴ | 1.3658×10⁰ | 2.4229×10⁴ | 6.4968×10² | 3.1704×10¹ |
| | | 100 | 1.6870×10³ | 8.4320×10⁴ | 1.3282×10⁰ | 2.4227×10⁴ | 6.4421×10² | 3.1704×10¹ |
| | | 250 | 1.6870×10³ | 8.4319×10⁴ | 1.3094×10⁰ | 2.4227×10⁴ | 6.3896×10² | 3.1704×10¹ |
| | | 500 | 1.6870×10³ | 8.4319×10⁴ | 1.2876×10⁰ | 2.4227×10⁴ | 6.3865×10² | 3.1704×10¹ |
| | Our method | 12 | 1.6894×10³ | 8.4342×10⁴ | 1.4172×10⁰ | 2.4247×10⁴ | 6.5133×10² | 3.1706×10¹ |
| | | 25 | 1.6875×10³ | 8.4320×10⁴ | 1.3817×10⁰ | 2.4229×10⁴ | 6.4977×10² | 3.1704×10¹ |
| | | 50 | 1.6870×10³ | 8.4319×10⁴ | 1.3394×10⁰ | 2.4228×10⁴ | 6.4518×10² | 3.1704×10¹ |
| | | 100 | 1.6870×10³ | 8.4319×10⁴ | 1.3105×10⁰ | 2.4227×10⁴ | 6.3927×10² | 3.1704×10¹ |
| | | 250 | 1.6870×10³ | 8.4319×10⁴ | 1.2872×10⁰ | 2.4227×10⁴ | 6.3862×10² | 3.1704×10¹ |
| | | 500 | 1.6870×10³ | 8.4319×10⁴ | 1.2766×10⁰ | 2.4227×10⁴ | 6.3862×10² | 3.1704×10¹ |
| 10 | DGS | 12 | 1.7418×10³ | 8.5326×10⁴ | 1.5635×10⁰ | 2.4434×10⁴ | 1.5718×10³ | 3.5107×10¹ |
| | | 25 | 1.7047×10³ | 8.4854×10⁴ | 1.4715×10⁰ | 2.4327×10⁴ | 9.6577×10² | 3.3472×10¹ |
| | | 50 | 1.6893×10³ | 8.4611×10⁴ | 1.3364×10⁰ | 2.4305×10⁴ | 8.9752×10² | 3.2785×10¹ |
| | | 100 | 1.6875×10³ | 8.4594×10⁴ | 1.3170×10⁰ | 2.4298×10⁴ | 7.7745×10² | 3.2479×10¹ |
| | | 250 | 1.6872×10³ | 8.4462×10⁴ | 1.2867×10⁰ | 2.4259×10⁴ | 7.0063×10² | 3.1792×10¹ |
| | | 500 | 1.6872×10³ | 8.4332×10⁴ | 1.2768×10⁰ | 2.4246×10⁴ | 6.4072×10² | 3.1743×10¹ |
| | RBCD++ | 12 | 1.7025×10³ | 8.4425×10⁴ | 1.5663×10⁰ | 2.4352×10⁴ | 6.7946×10² | 3.1711×10¹ |
| | | 25 | 1.6956×10³ | 8.4345×10⁴ | 1.4951×10⁰ | 2.4262×10⁴ | 6.6175×10² | 3.1708×10¹ |
| | | 50 | 1.6873×10³ | 8.4323×10⁴ | 1.3618×10⁰ | 2.4241×10⁴ | 6.5428×10² | 3.1705×10¹ |
| | | 100 | 1.6870×10³ | 8.4321×10⁴ | 1.3364×10⁰ | 2.4239×10⁴ | 6.5396×10² | 3.1705×10¹ |
| | | 250 | 1.6870×10³ | 8.4320×10⁴ | 1.3276×10⁰ | 2.4227×10⁴ | 6.5122×10² | 3.1704×10¹ |
| | | 500 | 1.6870×10³ | 8.4320×10⁴ | 1.3199×10⁰ | 2.4227×10⁴ | 6.4973×10² | 3.1704×10¹ |
| | MM-PGO | 12 | 1.7004×10³ | 8.4376×10⁴ | 1.4925×10⁰ | 2.4266×10⁴ | 6.5693×10² | 3.1706×10¹ |
| | | 25 | 1.6896×10³ | 8.4329×10⁴ | 1.4529×10⁰ | 2.4236×10⁴ | 6.5379×10² | 3.1705×10¹ |
| | | 50 | 1.6872×10³ | 8.4320×10⁴ | 1.3864×10⁰ | 2.4227×10⁴ | 6.5231×10² | 3.1705×10¹ |
| | | 100 | 1.6870×10³ | 8.4319×10⁴ | 1.3282×10⁰ | 2.4227×10⁴ | 6.5138×10² | 3.1704×10¹ |
| | | 250 | 1.6870×10³ | 8.4319×10⁴ | 1.3094×10⁰ | 2.4227×10⁴ | 6.4732×10² | 3.1704×10¹ |
| | | 500 | 1.6870×10³ | 8.4319×10⁴ | 1.2876×10⁰ | 2.4227×10⁴ | 6.4297×10² | 3.1704×10¹ |
| | Our method | 12 | 1.6942×10³ | 8.4372×10⁴ | 1.4237×10⁰ | 2.4261×10⁴ | 6.5517×10² | 3.1705×10¹ |
| | | 25 | 1.6875×10³ | 8.4324×10⁴ | 1.4017×10⁰ | 2.4235×10⁴ | 6.5135×10² | 3.1705×10¹ |
| | | 50 | 1.6870×10³ | 8.4319×10⁴ | 1.3688×10⁰ | 2.4227×10⁴ | 6.4936×10² | 3.1704×10¹ |
| | | 100 | 1.6870×10³ | 8.4319×10⁴ | 1.3105×10⁰ | 2.4227×10⁴ | 6.4762×10² | 3.1704×10¹ |
| | | 250 | 1.6870×10³ | 8.4319×10⁴ | 1.2872×10⁰ | 2.4227×10⁴ | 6.3989×10² | 3.1704×10¹ |
| | | 500 | 1.6870×10³ | 8.4319×10⁴ | 1.2766×10⁰ | 2.4227×10⁴ | 6.3862×10² | 3.1704×10¹ |
Table 2. Results of our method and other methods on the sequences of the GRACO dataset.
All entries are given as ATE (m)/RTE (m).

| Sequences | Lio-Sam | Fast-Lio2 | ORB-SLAM3 | GAC-Mapping | Our Method |
|---|---|---|---|---|---|
| Ground-01 | 0.3162/0.0764 | 0.4280/0.1976 | 4.9654/0.7897 | 1.0347/3.8892 | 0.2736/0.0557 |
| Ground-02 | 0.8321/0.1173 | 1.0039/0.1736 | 3.4576/6.9509 | 0.3541/3.3475 | 0.6584/0.1008 |
| Ground-03 | 1.1246/0.1007 | 1.6097/0.1458 | 4.9613/7.0285 | 1.6059/1.8317 | 1.0097/0.1325 |
| Ground-04 | 0.7291/0.0923 | 0.7292/0.2011 | 6.2968/6.9404 | 1.4731/2.1381 | 0.6937/0.0782 |
| Ground-05 | 1.9802/0.1027 | 1.8987/0.2179 | 8.5792/6.9073 | 1.5922/3.5628 | 0.9583/0.1694 |
| Ground-06 | 0.9217/0.0788 | 2.3513/0.2470 | 6.2770/6.9252 | 1.4306/2.0467 | 0.9089/0.1037 |
| Aerial-01-40 m | 0.4592/0.2379 | 86.8127/12.3536 | 22.6670/8.7108 | 1.2199/4.3624 | 0.4375/0.2586 |
| Aerial-02-20 m | 0.1857/0.2836 | 35.3482/22.4799 | 8.7979/8.0089 | 1.3960/2.1995 | 0.1699/0.2576 |
| Aerial-03-20 m | 0.2759/0.1742 | 37.6768/18.5712 | 25.7488/5.3842 | 0.8057/0.6752 | 0.1884/0.1566 |
| Aerial-04-40 m | 3.8695/1.5274 | 36.0139/10.9409 | 18.1040/8.2972 | 5.3285/15.7729 | 1.9439/1.2463 |
| Aerial-05-40 m | 0.8951/0.5993 | 2.5204/0.5657 | 28.9463/8.5584 | 1.6986/9.5503 | 0.8722/0.5081 |
| Aerial-06-20 m | 2.6581/1.4525 | 56.4376/17.2058 | 14.2582/8.4145 | 2.9802/10.7491 | 1.2358/0.9367 |
| Aerial-07-25 m | 5.1380/1.6013 | 46.2759/12.2668 | 14.7024/8.4329 | 1.4761/8.4197 | 5.8716/1.3675 |
| Aerial-08-25 m | 0.3145/0.1457 | 17.2637/4.6123 | 16.2044/8.2248 | 1.5794/3.9037 | 0.2819/0.1596 |
Table 3. Results of our method and other methods on the air–ground collaborative sequences of the GRACO dataset.
Parameters and time cost are reported once per team.

| Team | Sequences | GAC-Mapping ATE(m)/RTE(m) | GAC-Mapping Parameters (MB) | GAC-Mapping Time Cost (s) | Our Method ATE(m)/RTE(m) | Our Method Parameters (MB) | Our Method Time Cost (s) |
|---|---|---|---|---|---|---|---|
| Team 1 | Aerial-02-20 m | 1.9351/3.2048 | 2.37 | 3.72 | 0.4071/0.4926 | 1.92 | 2.27 |
| | Ground-04 | 1.6639/2.9573 | | | 0.8148/0.1057 | | |
| Team 2 | Aerial-08-25 m | 1.4051/4.7883 | 5.16 | 7.97 | 0.5350/0.4160 | 3.64 | 5.13 |
| | Ground-03 | 0.8727/2.6930 | | | 1.1962/0.3573 | | |
| | Ground-06 | 1.4401/2.1815 | | | 1.2706/0.3438 | | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
