A Distributed Multi-Robot Collaborative SLAM Method Based on Air–Ground Cross-Domain Cooperation
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
1. There are too many principles and formulas in the multi-robot collaboration section, but the connection and support between the various sections are not obvious. For example, what is the relationship between PCM in 3.2.2, Iterative Registration in 3.2.3, and the so-called DPGO in 3.3? There are too many theoretical formulas and the connection is loose, making it difficult for readers to understand. The content seems to be deliberately piled up.
2. What exactly is the advantage of the so-called DPGO method? The contents of Table 1 and Figure 6 do not match. The data in Table 1 do not show the advantages of DPGO, so why can it be shown in Figure 6? The logic does not make sense.
3. The mainstream air-ground collaborative SLAM methods in recent years (such as SBC-SLAM and DCL-SLAM) were not compared, and only single-robot algorithms such as LIO-SAM and FAST-LIO2 were selected, which lacks specificity.
4. Only ATE and RTE are used to evaluate trajectory accuracy, lacking quantitative indicators of map quality (such as average registration error and feature matching recall). In terms of computing resource consumption (such as communication volume and memory usage), no comparison is made with the comparison method, which cannot reflect the actual deployment advantages of the algorithm.
5. The experiments are entirely based on public datasets (such as GRACO) and have not been tested in real scenarios (such as complex buildings and dynamic outdoor environments). This makes it difficult to verify the robustness of the algorithm under practical challenges such as sensor noise and occlusion.
6. The experimental results did not demonstrate the reliability of the proposed method. They only provided vague multi-robot collaborative mapping effects and the local magnification effect of a single robot. The actual experimental results were unconvincing and lacked comparison.
Author Response
Comments 1: There are too many principles and formulas in the multi-robot collaboration section, but the connection and support between the various sections are not obvious. For example, what is the relationship between PCM in 3.2.2, Iterative Registration in 3.2.3, and the so-called DPGO in 3.3? There are too many theoretical formulas and the connection is loose, making it difficult for readers to understand. The content seems to be deliberately piled up.
Response 1: We fully agree with your assessment and have made the following revisions to the manuscript. First, in the overall introduction of the algorithm, we explain in detail how the components relate to one another. Second, at the beginning and end of each section we have added explanations of how that section builds on the preceding one and leads into the next. These changes strengthen the connections between the parts and improve the logical coherence of the manuscript. The specific details of these revisions can be found in the red-marked sections on lines 189 to 198, lines 228 to 235, lines 258 to 226, lines 319 to 320, and lines 322 to 325 of the manuscript.
Comments 2: What exactly is the advantage of the so-called DPGO method? The contents of Table 1 and Figure 6 do not match. The data in Table 1 do not show the advantages of DPGO, so why can it be shown in Figure 6? The logic does not make sense.
Response 2: We fully agree with your assessment and have made the following revisions to the manuscript. The core objective of DPGO is to converge to the optimum in as few iterations as possible, not merely to reach an optimal value after an unbounded number of iterations. To address this, we first revised Table 1 by adding the objective function values at 12, 25, and 50 iterations and re-analyzed the data presented in the table. Next, we modified Figures 6 and 7 by establishing thresholds for the Riemannian gradient norm and the relative suboptimality gap, which enables readers to compare the methods intuitively through the convergence curves. Moreover, we have strengthened the analysis linking Figures 6 and 7 to further clarify the relationship between the two figures. We believe these revisions will help readers gain a clearer understanding of the performance of the DPGO method presented in this paper. For detailed information about these revisions, please refer to the sections marked in red on lines 524 to 586 of the manuscript.
Comments 3: The mainstream air-ground collaborative SLAM methods in recent years (such as SBC-SLAM and DCL-SLAM) were not compared, and only single-robot algorithms such as LIO-SAM and FAST-LIO2 were selected, which lacks specificity.
Response 3: We fully agree with your assessment and have made the following revisions to the manuscript. To verify the practical performance of our method, we compared it with mainstream air-ground cooperative SLAM approaches from recent years, selecting algorithms with open-source code. The experiments showed, however, that most of them were designed for multi-robot collaborative SLAM within a single domain: while these algorithms performed well on the ground sequences, their performance on the aerial sequences was notably poor. As a result, we decided to include only GAC-Mapping in the manuscript; this air-ground collaborative multi-robot SLAM algorithm demonstrated strong performance on both aerial and ground sequences. Because the single-robot benchmark methods outperform most multi-robot collaborative SLAM methods on these sequences, our evaluation still focuses on comparisons with those single-robot baselines. The specific details of these revisions can be found in the red-marked sections on lines 612 to 627 of the manuscript.
Comments 4: Only ATE and RTE are used to evaluate trajectory accuracy, lacking quantitative indicators of map quality (such as average registration error and feature matching recall). In terms of computing resource consumption (such as communication volume and memory usage), no comparison is made with the comparison method, which cannot reflect the actual deployment advantages of the algorithm.
Response 4: We fully agree with your assessment and have made the following revisions to the manuscript. First, we fully understand that map quality is an important evaluation criterion for SLAM, and that average registration error and feature matching recall are key indicators for assessing it. However, given that our research objective is globally consistent collaborative trajectory optimization, the ATE and RTE metrics meet the requirements. Moreover, in a SLAM framework based on pose graph optimization, trajectory accuracy (ATE/RTE) is closely linked to map quality: ATE/RTE inherently reflects the effectiveness of map construction and optimization, especially when global trajectory accuracy is a key requirement. Achieving a globally consistent and accurate trajectory is our primary optimization goal, which largely hinges on the effective construction and optimization of the local maps. Our experiments demonstrate that the ATE/RTE of our method outperforms the benchmark SLAM methods, providing indirect evidence, aligned with our application goals, that the maps generated by our algorithm are of high quality. In future work we will focus on relocalization based on air-ground collaborative maps, so a more detailed analysis of map quality will be provided in subsequent work. We appreciate your understanding. Second, regarding the comparison of computing resources and execution time, we have included a comparison between our method and GAC-Mapping. The experimental results demonstrate that our method not only delivers more accurate pose estimation but also requires fewer computing resources and less execution time, exhibiting higher efficiency. The specific details of these revisions can be found in the red-marked sections on lines 707 to 725 of the manuscript.
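For readers who wish to reproduce the trajectory evaluation, the sketch below shows one common way ATE is computed: rigidly aligning the estimated positions to the ground truth with the closed-form Umeyama solution and taking the RMSE of the residuals. This is a minimal illustration under our own naming, assuming timestamp-matched (N, 3) position arrays; it is not the evaluation script used in the manuscript.

```python
import numpy as np

def umeyama_alignment(est, gt):
    """Closed-form rigid alignment (R, t) mapping est onto gt (no scale)."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g
    H = E.T @ G                         # 3x3 cross-covariance (unnormalized)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])          # reflection guard: enforce det(R) = +1
    R = Vt.T @ D @ U.T
    t = mu_g - R @ mu_e
    return R, t

def ate_rmse(est, gt):
    """Absolute Trajectory Error: RMSE of the aligned position residuals."""
    R, t = umeyama_alignment(est, gt)
    aligned = est @ R.T + t
    return np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1)))
```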
Comments 5: The experiments are entirely based on public datasets (such as GRACO) and have not been tested in real scenarios (such as complex buildings and dynamic outdoor environments). This makes it difficult to verify the robustness of the algorithm under practical challenges such as sensor noise and occlusion.
Response 5: We fully agree with your assessment and have made the following revisions to the manuscript. Testing in real-world scenarios is indeed a key component of evaluating the proposed approach. In this manuscript, we used the public GRACO dataset, selected for reproducibility and fairness of comparison. This dataset is derived from raw data collected by real sensors and therefore inherently includes sensor noise, so the strong performance of our method also demonstrates a degree of robustness to such noise. We fully understand and agree with the reviewer's view that comprehensive field tests in real outdoor environments are a vital next step: such tests reveal challenges that cannot be fully reproduced with recorded data and are essential for practical deployment of the technology. Thus, while the current study validates the core algorithm's effectiveness and efficiency on publicly available datasets, our future research will focus on assessing the algorithm's robustness and adaptability in complex real-world scenarios. At present, due to time and technical constraints, we defer real-scenario testing to future work. Revisions have been made to the discussion and conclusion sections to reflect this direction. The specific details of these revisions can be found in the red-marked sections on lines 822 to 833 and lines 852 to 858 of the manuscript.
Comments 6: The experimental results did not demonstrate the reliability of the proposed method. They only provided vague multi-robot collaborative mapping effects and the local magnification effect of a single robot. The actual experimental results were unconvincing and lacked comparison.
Response 6: We fully agree with your assessment and have made the following revisions to the manuscript. In the previous version, comparisons with other methods on multi-robot air-ground collaboration performance were lacking: the manuscript only verified the effectiveness of our method without drawing comparisons. To address this, we have now included a comparison of the global trajectories with GAC-Mapping, as well as a comparison of the estimation errors over the pose sequences, together with a detailed analysis of the results. The specific details of these revisions can be found in the red-marked sections on lines 703 to 784 of the manuscript.
Reviewer 2 Report
Comments and Suggestions for Authors
drones-3687078
Paper Title: A Distributed Multi-Robot Collaborative SLAM Method Based on Air–Ground Cross-Domain Cooperation
Authors proposed a distributed multi-robot collaborative SLAM method based on air-ground cross-domain cooperation. They integrated environmental feature information from UAVs and UGVs to achieve global pose estimation and map construction in cross-domain scenarios. They have shown the performance of the proposed method by comparing with other competing methods in an extensive set of examples. The manuscript is well written and can be accepted after minor revisions.
My comments are as follows.
- Lines 56-59: " For instance, Rosen et al. [11] introduced SE-Sync, which reformulates the multi-robot pose-mapping optimization problem as a maximum likelihood estimation under assumed measurement noise distributions, enabling the recovery of globally optimal solutions for special Euclidean synchronization."
Lines 472-473: "First, the relative suboptimality gap, which is defined as (F−F∗)/F∗, compares each iteration's objective value F against the global optimal F∗ computed by SE-SYNC [11]."
Please elaborate on the SE-SYNC method for computing the relative suboptimality gap.
- Lines 475-476: "Second, the Riemannian gradient norm measures reflect the gradient variations during optimization, where smaller values reflect stronger convergence properties."
Please elaborate on or give a mathematical expression for the Riemannian gradient norm.
- Table 1:
1) What is the meaning of the numerical value of the global optimal F*? Isn't it a physical quantity having a certain unit? Please add some explanations.
2) For the most part of entries, the convergence is achieved before 100 iterations. As a result, methods are not fully compared. Please consider re-making the table with 100, 50, 25, 12 iterations, for example.
- Figure 6:
1) Please make a comment on acceptable or target values of 'Grad Norm' and 'F-F*'.
2) If 10^0 is enough for 'Grad Norm', the proposed method's performance is about the same as other methods except the DGS. Please check and make some comments.
3) Table 1 shows that F* for Grid(3D) is 8.4e4. Then an F-F* value of 10^1 would be enough. With 5 nodes, the proposed method is of similar performance to the other methods (except MM-PGO). Again please check and make some comments.
- Figure 7:
Lines 515-522: " Notably, our method exhibits faster convergence with 5-node configurations versus 10-node systems, confirming an inverse relationship between node count and optimization complexity. This verifies the negative correlation between the number of robot nodes and the complexity of the optimization problem. Fewer nodes impose tighter spatial constraints on pose estimates, thereby accelerating convergence. Despite this node-dependent variation, our method consistently outperforms other methods, delivering superior convergence rates and final optimization accuracy regardless of system scale."
The proposed method and other methods (except DGS) work well with 5 nodes (Compare Figure 7b with Figure 7d). Then why try 10 nodes? Please consider using a different example.
Author Response
Comments 1: Lines 56-59: " For instance, Rosen et al. [11] introduced SE-Sync, which reformulates the multi-robot pose-mapping optimization problem as a maximum likelihood estimation under assumed measurement noise distributions, enabling the recovery of globally optimal solutions for special Euclidean synchronization."
Lines 472-473: "First, the relative suboptimality gap, which is defined as (F−F∗)/F∗, compares each iteration's objective value F against the global optimal F∗ computed by SE-SYNC [11]."
Please elaborate on the SE-SYNC method for computing the relative suboptimality gap.
Response 1: We fully agree with your assessment and have made the following revisions to the manuscript. In the DPGO experiment section, we have added a detailed description of how the relative suboptimality gap is computed against the SE-Sync optimum in the evaluation metrics, which supplements an important experimental detail. The specific details of these revisions can be found in the red-marked sections on lines 488 to 505 of the manuscript.
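For illustration, the gap is a simple per-iteration normalization of the logged objective values against the certified SE-Sync optimum. The sketch below is ours, not the manuscript's code; the sample objective values are invented, with F* borrowed from the Grid(3D) value of 8.4e4 quoted by the reviewer.

```python
import numpy as np

def relative_suboptimality_gap(F_iterates, F_star):
    """Per-iteration relative suboptimality gap (F_k - F*) / F*.

    F_iterates : objective values F_k logged at each DPGO iteration
    F_star     : globally optimal objective value certified by SE-Sync
    """
    F = np.asarray(F_iterates, dtype=float)
    return (F - F_star) / F_star

# Hypothetical usage: convergence is declared once the gap drops below a threshold.
gaps = relative_suboptimality_gap([9.1e4, 8.6e4, 8.45e4, 8.41e4], F_star=8.4e4)
```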
Comments 2: Lines 475-476: "Second, the Riemannian gradient norm measures reflect the gradient variations during optimization, where smaller values reflect stronger convergence properties."
Please elaborate on or give a mathematical expression for the Riemannian gradient norm.
Response 2: We fully agree with your assessment and have made the following revisions to the manuscript. We have added the mathematical definition and computation of the Riemannian gradient norm as an evaluation metric in the DPGO experiment section, which supplements an important experimental detail. The specific details of these revisions can be found in the red-marked sections on lines 506 to 512 of the manuscript.
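For reference, this quantity is conventionally defined as the Frobenius norm of the Euclidean gradient projected onto the tangent space of the pose manifold; we assume the definition added to the manuscript is equivalent to this standard form:

\[
\bigl\lVert \operatorname{grad} F(X) \bigr\rVert \;=\; \bigl\lVert \operatorname{Proj}_{T_X \mathcal{M}}\!\bigl( \nabla F(X) \bigr) \bigr\rVert_F ,
\qquad
\operatorname{Proj}_{T_R}(Z) \;=\; R \,\operatorname{skew}\!\bigl( R^{\top} Z \bigr),
\]

where \(\mathcal{M}\) is the product manifold of poses, \(R \in SO(d)\) is a rotation block, \(\operatorname{skew}(A) = (A - A^{\top})/2\), and a vanishing norm certifies a first-order critical point.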
Comments 3: Table 1:
1) What is the meaning of the numerical value of the global optimal F*? Isn't it a physical quantity having a certain unit? Please add some explanations.
2) For the most part of entries, the convergence is achieved before 100 iterations. As a result, methods are not fully compared. Please consider re-making the table with 100, 50, 25, 12 iterations, for example.
Response 3:
1) We fully agree with your assessment and have made the following revisions to the manuscript. F* represents the globally optimal objective value of the multi-robot pose graph optimization problem. Its magnitude is a covariance-weighted sum of all measurement residuals (rotation and translation) and therefore has no unified physical unit (a standard form of this objective is sketched after this response). We provided detailed explanations in the manuscript. The specific details of these revisions can be found in the red-marked sections on lines 525 to 532 of the manuscript.
2) We fully agree with your assessment and have made the following revisions to the manuscript. We have redrafted Table 1 and included the objective function values for 12, 25, and 50 iterations. This addition allows for a comprehensive comparison of the optimization performance of each method. The specific details of these revisions can be found in the red-marked sections in Table 1 of the manuscript.
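For context, a standard form of the pose-graph objective whose minimum defines F* (the form used throughout the SE-Sync/DPGO literature; we assume the manuscript's objective is an equivalent weighting) is

\[
F(X) \;=\; \sum_{(i,j)\in\mathcal{E}} \kappa_{ij} \bigl\lVert R_j - R_i \tilde{R}_{ij} \bigr\rVert_F^2 \;+\; \tau_{ij} \bigl\lVert t_j - t_i - R_i \tilde{t}_{ij} \bigr\rVert_2^2 ,
\]

where \(\kappa_{ij}\) and \(\tau_{ij}\) are inverse-covariance (concentration) weights of the measured relative rotations \(\tilde{R}_{ij}\) and translations \(\tilde{t}_{ij}\); because the sum mixes dimensionless rotational residuals with weighted translational ones, F carries no single physical unit.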
Comments 4: Figure 6:
1) Please make a comment on acceptable or target values of 'Grad Norm' and 'F-F*'.
2) If 10^0 is enough for 'Grad Norm', the proposed method's performance is about the same as other methods except the DGS. Please check and make some comments.
3) Table 1 shows that F* for Grid(3D) is 8.4e4. Then an F-F* value of 10^1 would be enough. With 5 nodes, the proposed method is of similar performance to the other methods (except MM-PGO). Again please check and make some comments.
Response 4:
1) We fully agree with your assessment and have made the following revisions to the manuscript. We have added the acceptable values of the Riemannian gradient norm and the relative suboptimality gap in the DPGO experimental section, enriching the experimental details. The specific details of these revisions can be found in the red-marked sections on lines 556 to 562 of the manuscript.
2) We fully agree with your assessment and have made the following revisions to the manuscript. We have set the threshold of the Riemannian gradient norm to . This threshold indicates that when the Riemannian gradient norm falls below this value, the iterate is considered to have reached a critical point. This ensures that the rotation error in pose estimation is less than , and the translation error is less than 5% of the scene scale. Additionally, a smaller Riemannian gradient norm correlates with better convergence of the method. The core objective of DPGO is to reach optimal convergence as quickly as possible; as shown in the comparison plots, our method demonstrates the best convergence performance. Finally, we have added the corresponding comments to the experimental analysis. The specific details of these revisions can be found in the red-marked sections on lines 563 to 586 of the manuscript.
3) We fully agree with your assessment and have made the following revisions to the manuscript. In the previous version, we plotted the raw difference between the objective value and the optimum rather than the relative suboptimality gap. We have now modified the computation to use the relative suboptimality gap and have set a threshold for it at . A smaller value indicates better optimization, bringing the result closer to the optimal objective value. The core objective of DPGO is to reach optimal convergence as quickly as possible, so the convergence performance is demonstrated through the comparison curves in the figure. The specific details of these revisions can be found in the red-marked sections on lines 563 to 586 of the manuscript.
Comments 5: Figure 7:
Lines 515-522: " Notably, our method exhibits faster convergence with 5-node configurations versus 10-node systems, confirming an inverse relationship between node count and optimization complexity. This verifies the negative correlation between the number of robot nodes and the complexity of the optimization problem. Fewer nodes impose tighter spatial constraints on pose estimates, thereby accelerating convergence. Despite this node-dependent variation, our method consistently outperforms other methods, delivering superior convergence rates and final optimization accuracy regardless of system scale."
The proposed method and other methods (except DGS) work well with 5 nodes (Compare Figure 7b with Figure 7d). Then why try 10 nodes? Please consider using a different example.
Response 5: We fully agree with your assessment and sincerely appreciate your concern about this matter, as well as your thoughtful suggestions for improving the manuscript's quality. The core challenge of DPGO lies in the nonlinear growth of computational complexity as the number of nodes increases. This arises from several factors: first, more nodes sharply increase the complexity of the network topology; second, the number of relative pose constraints between nodes grows significantly with the number of nodes; and third, distributed algorithms must demonstrate robustness across systems of varying scales. Testing only a single scale (e.g., 5 nodes) therefore does not adequately assess scalability, which is a core concern of DPGO. Moreover, the performance gap between the 5-node and 10-node setups in this study shows that the computational cost of DPGO increases substantially as the number of nodes rises: although the relative suboptimality gap and the Riemannian gradient norm change little, the number of iterations required to reach a critical point increases notably. A detailed comparison shows that our method exhibits lower sensitivity to node-scale expansion and superior optimization performance. Including both the 5-node and 10-node experiments is thus essential in DPGO research: they demonstrate the algorithm's robustness to system-scale expansion, expose the weaknesses of the comparison methods on large-scale topologies, and verify the method's practicality in realistic scenarios (e.g., 10-robot collaborative SLAM). As such, retaining the 10-node experiment aligns with the evaluation standards of the field and supports the conclusions drawn in the paper.
Reviewer 3 Report
Comments and Suggestions for Authors
This paper presents a solution to collaborative SLAM between air and ground agents. The method is based on a framework consisting of three modules: robot-local front-end, distributed loop closure, and robust DPGO back-end. The proposed framework is compared with other state-of-the-art methods to demonstrate high-precision global pose estimation and map construction.
The paper requires some improvements to better understand the notation and operations described:
- Notation and order of presentation of ideas should be improved. For example, Equation (1) presents for the first time the notation of g variables, but they are not explained until page 9, where g are defined as position-orientation pairs (t,R). Thus, if they are (t,R) pairs, how is the product operation in Equation (1) defined? Are g variables actually 4x4 homogeneous matrices containing [R,t;0,0,0,1]? Please change the order of defining notation and symbols so that operations can be clearly understood starting in Equation (1).
- Please check all symbols in the paper for consistency and to avoid ambiguity. For example: is the F function of Equation (10) the same as in Equation (6)? If not, please redefine with different symbols to avoid confusion.
- Please indicate the end of proof of Theorem 1 with the standard square symbol [] typically used to denote the end of proofs.
- Please clarify if the pairs AAAA/BBBB shown in Table 2 are ATE/RTE.
- Page 23 describes an experimental analysis of the cooperative positioning and mapping performance, but the conditions of this analysis are unclear. What dataset or instance is being used in this concrete analysis? All datasets/instances described in the experiments were totally clear until this point.
Author Response
Comments 1: Notation and order of presentation of ideas should be improved. For example, Equation (1) presents for the first time the notation of g variables, but they are not explained until page 9, where g are defined as position-orientation pairs (t,R). Thus, if they are (t,R) pairs, how is the product operation in Equation (1) defined? Are g variables actually 4x4 homogeneous matrices containing [R,t;0,0,0,1]? Please change the order of defining notation and symbols so that operations can be clearly understood starting in Equation (1).
Response 1: We fully agree with your assessment and have made the following revisions to the manuscript. First, we have provided detailed definitions for all symbols used in Equation (1) before its introduction. Second, the composition of g was clearly stated, and an appropriate definition for the product operation was included. The specific details of these revisions can be found in the red-marked sections on lines 236 to 257 of the manuscript.
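For concreteness, the standard convention, which we assume matches the revised definition in the manuscript, represents each pose g = (t, R) as a 4x4 homogeneous matrix, under which the product in Equation (1) is ordinary matrix multiplication:

\[
g_i \;=\; \begin{bmatrix} R_i & t_i \\ \mathbf{0}^{\top} & 1 \end{bmatrix} \in SE(3),
\qquad
g_i\, g_j \;=\; \begin{bmatrix} R_i R_j & R_i t_j + t_i \\ \mathbf{0}^{\top} & 1 \end{bmatrix},
\]

so a pair (t, R) and its homogeneous-matrix form are interchangeable, with composition acting on the rotation and translation parts as shown.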
Comments 2: Please check all symbols in the paper for consistency and to avoid ambiguity. For example: is the F function of Equation (10) the same as in Equation (6)? If not, please redefine with different symbols to avoid confusion.
Response 2: We fully agree with your assessment and have made the following revisions to the manuscript. First, we carefully examined the function F in equations (10) and (6). Since these are different functions, we have corrected the function in equation (6) accordingly. Additionally, we reviewed and revised all the symbols used in the formulas throughout the document to ensure consistency and clarity. The specific details of these revisions can be found in the red-marked sections of equation (6) of the manuscript.
Comments 3: Please indicate the end of proof of Theorem 1 with the standard square symbol [] typically used to denote the end of proofs.
Response 3: We fully agree with your assessment and have made the following revisions to the manuscript. We added the standard square symbol at the end of the proof of Theorem 1 to mark its conclusion. The specific details of these revisions can be found in the red-marked sections on lines 453 to 454 of the manuscript.
Comments 4: Please clarify if the pairs AAAA/BBBB shown in Table 2 are ATE/RTE.
Response 4: We fully agree with your assessment and have made the following revisions to the manuscript. We have ensured that the data in Table 2 accurately correspond to the ATE and RTE respectively, and we have updated Table 2 accordingly. The specific details of these revisions can be found in the red-marked sections in Table 2 of the manuscript.
Comments 5: Page 23 describes an experimental analysis of the cooperative positioning and mapping performance, but the conditions of this analysis are unclear. What dataset or instance is being used in this concrete analysis? All datasets/instances described in the experiments were totally clear until this point.
Response 5: We fully agree with your assessment and have made the following revisions to the manuscript. First, we have added descriptions of the specific datasets used in the two cooperative positioning and mapping performance analyses. Additionally, we have updated Table 3 to offer a clearer analysis of the performance of our method on these tasks. The specific details of these revisions can be found in the red-marked sections on lines 732 to 735 and lines 762 to 765 of the manuscript.
Round 2
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have satisfactorily addressed my reviewer comments.