Three-Level MIFT: A Novel Multi-Source Information Fusion Waterway Tracking Framework
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This paper addresses multitarget tracking that fuses LiDAR and AIS information for inland vessel monitoring. An improved adaptive LiDAR tracking algorithm is proposed, and an enhanced Dempster-Shafer theory is employed to fuse AIS information. The work is valuable, but the manuscript can be improved in the following aspects:
1. The figures are unclear. Consider converting them to vector graphics and increasing font sizes.
2. Add the performance of comparative methods in Subsection 3.4.
3. The evidence fusion shows instability in high-conflict scenarios, requiring manual threshold tuning. However, the paper lacks quantitative validation for dynamic environments.
4. The Voxel Grid filter discards high-frequency point cloud data, potentially degrading size estimation for sparse targets. How to handle this potential issue?
5. Some symbols, such as \beta in eq.13, are introduced without justification.
6. Though the focus is on non-deep learning methods, a brief discussion or comparison with a lightweight deep learning tracker could better contextualize the performance claims.
7. Add a brief introduction to the content of the paper's sections at the end of the introduction section.
8. There are some typos and grammatical errors, e.g., “ours LiDAR”, “Over the horizon early warning capability”. A thorough proofreading is required.
Author Response
Dear professors,
We have completed the revisions in response to the comments of all reviewers. Every comment was valuable and has considerably improved the quality of the manuscript, for which we are very grateful. All sections have been revised accordingly. The revised manuscript (Version 2) has been uploaded to the MDPI system and supersedes Version 1.
Comments 1: The figures are unclear. Consider converting them to vector graphics and increasing font sizes.
Response 1: Thank you very much for your helpful comment.
We thank the reviewer for their valuable feedback on the quality of the figures. We agree that clarity is essential for proper presentation. Following this suggestion, we have revised the figures in the manuscript. They have now been converted to high-resolution vector graphics, and all font sizes have been increased to ensure better readability. We are confident that the updated figures are now much clearer.
Comments 2: Add the performance of comparative methods in Subsection 3.4.
Response 2: Thank you very much for your helpful comment.
We thank the reviewer for the suggestion to add comparative methods to Subsection 3.4. We would like to clarify that the experiment in this subsection is designed as an ablation study to specifically measure the impact of our fusion method. For this purpose, the most appropriate baseline is our own LiDAR-only tracker. We have now explicitly stated this rationale in the manuscript to explain why this comparison is direct and avoids the implementation biases that would arise from integrating our module with third-party algorithms.
Comments 3: The evidence fusion shows instability in high-conflict scenarios, requiring manual threshold tuning. However, the paper lacks quantitative validation for dynamic environments.
Response 3: Thank you very much for your helpful comment.
We thank the reviewer for their valuable feedback on the performance of our evidence fusion method in high-conflict scenarios. We agree that this is a limitation and that further validation is required. In the "Conclusions and Future Work" section, we have now explicitly stated that the fusion mechanism can be challenged by high-conflict cases and currently relies on manually tuned parameters. We have also emphasized that quantitative validation in dynamic environments is a primary objective for our future research.
Comments 4: The Voxel Grid filter discards high-frequency point cloud data, potentially degrading size estimation for sparse targets. How to handle this potential issue?
Response 4: Thank you very much for your helpful comment.
We thank the reviewer for raising the important point about potential information loss due to the Voxel Grid filter. We agree that this required clarification. In the revised manuscript, we have now added a justification for our chosen voxel size (0.1 m), explaining that it is sufficiently small relative to the targets of interest to retain essential details for size estimation while still providing the benefits of downsampling. We appreciate the suggestion to make this explicit.
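To make the downsampling step in this response concrete, the following is a minimal sketch of a centroid-based voxel grid filter in plain Python. It is not the authors' implementation: the function names `voxel_downsample` and `extent`, and the synthetic 4 m line of points standing in for a hull side, are assumptions made purely for illustration. It shows that, with a 0.1 m voxel, the measured extent of the downsampled target shrinks by less than one voxel width.

```python
import math
from collections import defaultdict

def voxel_downsample(points, voxel_size):
    """Replace all points that fall in the same cubic voxel by their centroid."""
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        buckets[key].append((x, y, z))
    # One centroid per occupied voxel.
    return [tuple(sum(c) / len(pts) for c in zip(*pts))
            for pts in buckets.values()]

def extent(points, axis):
    """Length of the point set along one axis (0 = x, 1 = y, 2 = z)."""
    vals = [p[axis] for p in points]
    return max(vals) - min(vals)

# Hypothetical target: 200 returns along a roughly 4 m line, 0.02 m apart.
hull = [(0.01 + 0.02 * i, 0.0, 0.0) for i in range(200)]
down = voxel_downsample(hull, voxel_size=0.1)

shrink = extent(hull, 0) - extent(down, 0)
print(len(hull), len(down), round(shrink, 3))
```

In this toy case the length estimate changes by 0.08 m, which is small relative to a vessel-sized target; sparse targets with only a few returns per voxel would need to be checked separately.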
Comments 5: Some symbols, such as \beta in eq.13, are introduced without justification.
Response 5: Thank you very much for your helpful comment.
We thank the reviewer for their feedback regarding the undefined symbol \beta in Equation 13. In the revised text, we have now explicitly defined \beta as a fuzzy function adjustment parameter and explained its purpose in controlling the model's sensitivity. We appreciate the opportunity to improve the manuscript's clarity.
Comments 6: Though the focus is on non-deep learning methods, a brief discussion or comparison with a lightweight deep learning tracker could better contextualize the performance claims.
Response 6: Thank you very much for your helpful comment.
We appreciate the reviewer’s insightful suggestion. In the revised manuscript, we have expanded the comparison set by including three additional recent non-deep-learning-based algorithms, namely Guo’s algorithm (2024), Dalhaug’s algorithm (2025), and Xu’s algorithm (2025), to provide a more comprehensive evaluation. In addition, a discussion has been added (Lines 513-535) explaining that deep-learning-based trackers were not adopted in this work mainly because of their high computational cost and limited real-time applicability on resource-constrained embedded or edge computing platforms, which is the focus of our study.
To demonstrate the efficiency of our proposed framework, runtime performance measurements have also been reported, showing an average processing time of approximately 36.99 ms per frame (around 27 FPS) on a single CPU thread. Furthermore, the Conclusion section now explicitly states that future work will explore integrating lightweight deep learning models for enhanced feature extraction and dynamic scene understanding.
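The relation between the reported per-frame time and the quoted frame rate can be verified with a short calculation. The sketch below assumes nothing beyond the 36.99 ms figure in the response, except for the 10 Hz sensor rate, which is a hypothetical value used only to illustrate a real-time budget and is not taken from the manuscript.

```python
def frames_per_second(ms_per_frame):
    """Convert an average per-frame processing time in milliseconds to FPS."""
    return 1000.0 / ms_per_frame

fps = frames_per_second(36.99)      # figure reported in the response

# Assumption for illustration only: a LiDAR publishing scans at 10 Hz.
sensor_rate_hz = 10.0
budget_ms = 1000.0 / sensor_rate_hz
headroom_ms = budget_ms - 36.99     # time left per scan after processing

print(round(fps, 2), round(headroom_ms, 2))
```

Under that assumed scan rate the tracker would finish each frame well inside the 100 ms budget; a faster sensor would shrink the headroom accordingly.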
Comments 7: Add a brief introduction to the content of the paper's sections at the end of the introduction section.
Response 7: Thank you very much for your helpful comment.
We agree this significantly aids reader navigation. Following your advice, we have revised the end of the Introduction (lines 66-82) to explicitly outline the three main contributions of the paper: the improved adaptive LiDAR tracking module, the enhanced decision-level fusion method, and the three-level track management framework. This new addition clearly guides the reader through the subsequent sections and the overall structure of the manuscript.
Comments 8: There are some typos and grammatical errors, e.g., “ours LiDAR”, “Over the horizon early warning capability”. A thorough proofreading is required.
Response 8: Thank you very much for your helpful comment.
We sincerely thank the reviewer for their meticulous reading and for pointing out the typos and grammatical errors, such as "ours LiDAR" and "Over the horizon early warning capability." Following this critical advice, we have conducted a thorough, full-manuscript proofreading. This process has corrected all identified grammatical errors, typographical mistakes, and instances of awkward or non-standard phrasing throughout the paper, significantly enhancing the clarity and flow of the text.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
I have the following questions regarding the research results:
Q1. Questionable Experimental Validation
The comparison baseline appears insufficient. Table 1 only compares against SORT (2016), Faggioni (2022), Yao (2023), and Qi (2023). For a 2025 publication claiming to be "novel," the absence of more recent state-of-the-art methods raises questions about the comprehensiveness of the evaluation. What was the state-of-the-art method before this paper? Also, the claim of "outperforming mainstream non-deep learning methods" (lines 19-22) seems overstated given the limited comparison set.
Q2. Unsubstantiated Real-time Claims
Line 107 states the goal is to "achieve real-time perception," yet no computational performance metrics are provided throughout the paper. Without processing time measurements, memory usage, or throughput analysis, the real-time capability remains unverified.
Q3. Lack of Statistical Rigor
The paper repeatedly uses terms like "significantly" (lines 85, 444) without proper statistical testing. While claiming "statistical reliability" based on 10 experimental runs (line 418), no p-values, confidence intervals, or significance tests are reported. For instance, comparing Yao's algorithm (IDSW: 5.50±1.18) with the proposed method (IDSW: 5.10±0.88), the difference appears marginal and may not be statistically significant.
Q4. Parameter Optimization Transparency
Critical parameters like w(iou), w(maha) weights (lines 209-210) and α, θ(bel) thresholds (line 305) lack explanation of their optimization process. Were these hand-tuned or systematically optimized? This affects reproducibility and fair comparison with baseline methods.
Q5. Ground Truth Reliability
The Gazebo simulation setup (lines 391-392) lacks detailed explanation of ground truth generation, particularly for occlusion scenarios. How are true positions determined when targets are occluded? This fundamental issue affects all evaluation metrics.
Q6. Implementation Fairness
Lines 430-434 claim fair comparison by using "same point cloud preprocessing," but implementation details of baseline methods are unclear. Were baseline methods optimally configured, or potentially disadvantaged by suboptimal implementation?
Q7. Overgeneralization
The conclusion (lines 584-586) claims the framework is "tailored for inland waterways" based solely on simulation results. This should be qualified as "simulated inland waterways" to avoid overgeneralization.
Q8. Technical Inconsistencies
- Section numbering error: "2.2. Improved Adaptive LiDAR Tracking Algorithm" appears twice (lines 130, 235).
- Non-standard colon usage in title (line 1): "3-Level MIFT:" uses non-English punctuation.
- Inconsistent terminology: "tracks" vs "trajectories" used interchangeably without clear distinction.
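The concern raised in Q3 can be illustrated with a Welch t-test computed from the summary statistics quoted there. This is a sketch under two assumptions that the report does not confirm: that the "±" values are sample standard deviations, and that each method was run 10 times (the number of runs mentioned for line 418). The function name `welch_t` is introduced here for illustration.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    from per-group mean, sample standard deviation, and sample size."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# IDSW values quoted in Q3: Yao's algorithm vs. the proposed method.
# Assumption: n = 10 runs per method, "±" = sample standard deviation.
t_stat, dof = welch_t(5.50, 1.18, 10, 5.10, 0.88, 10)
print(round(t_stat, 2), round(dof, 1))
```

With these assumptions the statistic is about 0.86 on roughly 17 degrees of freedom, well below the two-sided 5% critical value of about 2.1, which is consistent with the reviewer's suspicion that the IDSW difference may not be statistically significant.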
Author Response
Please see the attachment part.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
This paper proposes a framework for tracking and decision-making on inland waterways, utilizing data from both LiDAR and AIS to overcome the limitations of single-source systems. The core of the methodology is a robust tracking engine that employs an Augmented Extended Kalman Filter coupled with a hybrid cost association technique. For reliable data integration, the framework uses a decision-level fusion strategy that combines the strengths of Dempster-Shafer evidence theory with Covariance Intersection. Finally, it implements a three-level trajectory management scheme specifically designed to support accurate and timely Beyond Visual Range warning capabilities, ensuring safer navigation in complex inland environments.
The paper is nice and I enjoyed reading it; however, I have several concerns:
- There is no related work section and the survey of related work is included in the introduction. I would encourage the author to split out the related work section.
- In Figure 1 and Figure 3, there are many colors for the boxes and the backgrounds. Why was such a diverse color palette necessary for this figure?
- The authors should clarify Figure 2, as the distinction between "unmatched tracks" and "unmatched detections" is currently ambiguous. Specifically, an explanation is needed to define precisely what constitutes an "unmatched track" within the context of the tracking algorithm (e.g., a track that has ceased receiving new detection updates), and how this differs from an "unmatched detection" (e.g., a new sensor measurement that could not be successfully associated with any existing track). Providing clear definitions for these two distinct categories is essential for fully understanding the performance metrics and the internal logic of the multi-source fusion process illustrated in the figure.
- The authors need to clarify what T represents in Equation 1.
- In Equation 2, the authors define a recursion, i.e., X_k is defined in terms of X_{k-1}; however, the definition of X_0 is missing.
- In Equation 3, the authors define a recursion, i.e., P_k is defined in terms of P_{k-1}; however, the definition of P_0 is missing.
- The authors write “When the innovation is large, increase Q_k and reduce R_k , so that the filter can quickly respond to sudden changes; When the innovation is small, reduce Q_k and increase R_k to make the filter smooth estimation.” The methodology lacks clarity regarding the precise tuning of the covariance matrices. Specifically, the paper does not specify by how much to increase or decrease the process noise covariance (Q_k) and the measurement noise covariance (R_k) during the filter's calibration. This ambiguity makes practical replication and optimization of the proposed tracking method difficult.
- The authors do not discuss combining LiDAR and AIS with other devices, such as ultrasonic sensors, as suggested in Y. Wiseman, "Ancillary Ultrasonic Rangefinder for Autonomous Vehicles", International Journal of Security and its Applications, Vol. 12(5), pp. 49-58, 2018. Available online at: https://u.cs.biu.ac.il/~wisemay/ijsia2018.pdf and also in Premnath, S., Mukund, S., Sivasankaran, K., Sidaarth, R., & Adarsh, S., "Design of an autonomous mobile robot based on the sensor data fusion of LIDAR 360, ultrasonic sensor and wheel speed encoder", In 2019 9th IEEE International Conference on Advances in Computing and Communication (ICACC), pp. 62-65, 2019. I would encourage the authors to cite these two papers and add some text about combining different devices, at least as future work.
- In Equation 19, the bounds of the summation (Sigma) are not specified.
- In Equation 21, the bounds of the summation (Sigma) are not specified.
- Including a discussion of the proposed model's potential limitations and possible improvements would be valuable.
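The distinction requested in the Figure 2 comment above, between an "unmatched track" and an "unmatched detection", can be sketched as follows. This is an illustrative simplification and not the manuscript's method: it uses greedy nearest-neighbour matching on Euclidean distance with a gate, whereas the paper is described as using a hybrid cost association. The function name `associate`, the identifiers, the coordinates, and the gate value are all hypothetical.

```python
import math

def associate(tracks, detections, gate):
    """Greedy gated nearest-neighbour association.

    tracks, detections: dicts mapping an id to an (x, y) position.
    Returns (matches, unmatched_tracks, unmatched_detections).
    """
    # All candidate pairs, cheapest first.
    pairs = sorted(
        (math.dist(tp, dp), tid, did)
        for tid, tp in tracks.items()
        for did, dp in detections.items()
    )
    matches, used_t, used_d = [], set(), set()
    for cost, tid, did in pairs:
        if cost > gate:
            break  # every remaining pair is even farther apart
        if tid in used_t or did in used_d:
            continue
        matches.append((tid, did))
        used_t.add(tid)
        used_d.add(did)
    # Unmatched track: an existing track that received no measurement.
    unmatched_tracks = sorted(set(tracks) - used_t)
    # Unmatched detection: a measurement not assigned to any track.
    unmatched_detections = sorted(set(detections) - used_d)
    return matches, unmatched_tracks, unmatched_detections

tracks = {"T1": (0.0, 0.0), "T2": (50.0, 10.0)}
detections = {"D1": (0.5, 0.2), "D2": (120.0, -30.0)}
matches, lost, new = associate(tracks, detections, gate=5.0)
print(matches, lost, new)
```

Here T2 is an unmatched track (a candidate for coasting or deletion) and D2 is an unmatched detection (a candidate for starting a new track), which mirrors the two categories the reviewer asks the authors to define.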
Author Response
Please see the attachment part.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have addressed all of my comments. However, based on their revisions, I have the following questions:
1. Please explain the rationality of choosing a voxel size of 0.1 m. Would voxel sizes of 0.2 m or 0.5 m be infeasible? Additionally, using a larger voxel size might help improve the algorithm efficiency.
2. Further efforts should be made to enhance the readability of the figures, as their current quality is quite poor.
3. Comment 7 asks the authors to introduce the content of each section of the paper. If this requirement is unclear, please refer to other professional papers for examples.
Author Response
Dear professors,
We have completed the revisions in response to the comments of all reviewers. Every comment was valuable and has considerably improved the quality of the manuscript, for which we are very grateful. All sections have been revised accordingly. The revised manuscript (Version 3) has been uploaded to the MDPI system and supersedes Version 2.
Comments 1: Please explain the rationality of choosing a voxel size of 0.1 m. Would voxel sizes of 0.2 m or 0.5 m be infeasible? Additionally, using a larger voxel size might help improve the algorithm efficiency.
Response 1: Thank you very much for your helpful comment.
We appreciate the reviewer’s valuable suggestion. In response, we have added an explanation in Section 3.2.1 to clarify the rationale for choosing a voxel size of 0.1 m. Specifically, larger voxel sizes (e.g., 0.2 m or 0.5 m) were also evaluated but led to partial loss of hull contour information and unstable clustering results, especially for small vessels and dense traffic scenes.
Therefore, a voxel size of 0.1 m was selected as a balanced configuration that preserves geometric fidelity while maintaining computational efficiency. The corresponding clarification has been added in lines 194-200 of the revised manuscript.
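The trade-off described in this response can be illustrated by counting how many voxels a target occupies at each candidate size, since a voxel grid filter outputs one point per occupied voxel. The sketch below does not reproduce the authors' evaluation: the 6 m hull side sampled every 0.05 m is a hypothetical stand-in for a small vessel, and `occupied_voxels` is a name introduced here.

```python
import math

def occupied_voxels(points, voxel_size):
    """Set of 2D voxel indices that contain at least one point."""
    return {(math.floor(x / voxel_size), math.floor(y / voxel_size))
            for x, y in points}

# Hypothetical small-vessel hull side: 120 returns over about 6 m.
side = [(0.025 + 0.05 * i, 0.0) for i in range(120)]

# Points surviving the filter for each voxel size discussed above.
counts = {size: len(occupied_voxels(side, size)) for size in (0.1, 0.2, 0.5)}
print(counts)
```

In this toy case the contour is described by 60 points at 0.1 m but only 12 at 0.5 m, which gives a rough sense of why coarser voxels can blur hull contours and destabilize clustering for small targets, while also reducing the data to be processed.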
Comments 2: Further efforts should be made to enhance the readability of the figures, as their current quality is quite poor.
Response 2: Thank you very much for your helpful comment.
We sincerely thank the reviewer for pointing out this issue and apologize for the poor readability of the figures in the original submission.
We agree completely that the quality of the figures was insufficient. In response to this comment, we have rebuilt and re-exported all figures in the manuscript to ensure high resolution. Furthermore, we have specifically increased the font sizes in Figures 1, 2, and 3 to enhance readability. We have carefully checked the final revised PDF to confirm that all figures are now clear and easy to see.
Comments 3: Comment 7 asks the authors to introduce the content of each section of the paper. If this requirement is unclear, please refer to other professional papers for examples.
Response 3: Thank you very much for your helpful comment.
Thank you for this valuable suggestion. We agree that adding a brief introduction to the paper's sections at the end of the introduction is necessary. In accordance with your recommendation, we have added this content to the revised manuscript. This new text, which provides an overview of the paper's organization, can now be found on lines 83-88 of the revised manuscript. We believe this improves the clarity of the paper's structure.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have addressed all my concerns. The revised manuscript is ready for publication.
Author Response
Dear professors,
We have completed the revisions in response to the comments of all reviewers. Every comment was valuable and has considerably improved the quality of the manuscript, for which we are very grateful. All sections have been revised accordingly. The revised manuscript (Version 3) has been uploaded to the MDPI system and supersedes Version 2.
Comments 1: The authors have addressed all my concerns. The revised manuscript is ready for publication.
Response 1: Thank you very much for your helpful comment.
We are extremely grateful for the reviewer's final approval of our work. Your professional, detailed, and constructive comments throughout the review process have been invaluable in improving the quality of our manuscript. We sincerely thank you for your time and patient guidance.
Author Response File: Author Response.pdf
Round 3
Reviewer 1 Report
Comments and Suggestions for Authors
I have no more comments.

