Open Access Article

Peer-Review Record

Multiscale Region-Based Convolutional Neural Networks for 3D Object Detection with LiDAR Sensors

Sensors 2026, 26(4), 1156; https://doi.org/10.3390/s26041156

by Wei-Jong Yang¹, Song-Bo Yao² and Jar-Ferr Yang³,*

Reviewer 1: Ming Yang

Reviewer 2: Anonymous

Submission received: 5 January 2026 / Revised: 30 January 2026 / Accepted: 5 February 2026 / Published: 11 February 2026

(This article belongs to the Section Vehicular Sensing)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper presents a multi-scale 3D object detection network for LiDAR point clouds. By introducing a pyramid fusion RoI head, cross-attention modules, and improved data augmentation methods within the Voxel R-CNN framework, it effectively enhances the detection performance for multi-scale objects in sparse point clouds. The research problem is well-defined, the methodology is innovative, the experimental design is comprehensive, and the conclusions are well-supported by data, making this a solid contribution to the field.

Nevertheless, to further enhance the academic rigor, clarity, and impact of the paper, the following points could be refined:

Suggestion 1: Expand the comparative discussion of the research background.
It is recommended to include a brief comparison and discussion of recent 3D detection methods based on multi-modal fusion (e.g., LiDAR-Camera), such as TransFusion and IS-Fusion, in the Introduction or Related Work section. This would help more clearly delineate the research space, applicable scenarios, and unique advantages of the proposed "LiDAR-only" method, strengthening its dialogue within the field.

Suggestion 2: Provide a deeper analysis of the experimental results.
In the footnote or main text related to Table 2, a brief explanation of why the "use_road_plane" data augmentation technique negatively affects the performance of the proposed network would be beneficial. This analysis would help readers understand the dependency of different methods on data assumptions and scene priors, reflecting the authors' in-depth consideration of model robustness.

Suggestion 3: Optimize the visual representation of the core module.
Figure 7 (the detailed architecture of the pyramid fusion RoI head) could convey the core innovations of the method more clearly. In particular, the figure could better distinguish the feature scales visually and clarify how the key (K) and value (V) in the attention modules aggregate information from different levels; these correspond to the two key designs of "multi-scale feature fusion" and "cross-level attention interaction." Refining the figure (for example, by adding scale annotations, arrows for cross-level information flow, or subfigures) to illustrate the feature flow and interaction from deep to shallow layers more intuitively would help readers understand this sophisticated design quickly and accurately, thereby enhancing the readability and reproducibility of the paper.

Suggestion 4: Strengthen the interpretation of ablation results.
In the Discussion section (or Section 4.2), we suggest adding a brief analysis, even if only hypothetical, of the interesting observation that "the fusion module significantly improves pedestrian detection but may slightly affect the detection of larger objects". For instance, this could be explored from perspectives such as the receptive field requirements of objects at different scales or the scale sensitivity of feature representations, thereby adding depth and insight to the paper's analysis.

Suggestion 5: Refine linguistic details and textual standardization.
Please carefully check the grammar and spelling throughout the manuscript, correcting minor typos (e.g., "increasing important" in the abstract should be "increasingly important"; "exit" in the introduction should be "exist"), and ensure consistent use of formulas, symbols, and terminology. This will further enhance the academic rigor and professional expression of the paper.

In summary, this paper presents a valuable piece of research. Addressing the above suggestions would further improve its completeness, clarity, and academic impact.

Comments on the Quality of English Language

The overall quality of the English language is acceptable for scientific communication, and the core ideas of the research are conveyed understandably. However, the manuscript would benefit from thorough proofreading and editing to elevate its professionalism and readability. Several instances of grammatical inaccuracies, awkward phrasing, and typographical errors are present throughout the text. Correcting these issues will enhance the clarity of presentation and ensure consistency in terminology, ultimately strengthening the scholarly impact of the work. Below are representative examples for the authors' reference:

Abstract: "...is increasing important..." should be "...is increasingly important...".
Introduction: "...there still exit many..." should be "...there still exist many...".
Introduction: "If the image sensors fail to work properly, the Lidar-image collaborations will not also function correctly." The word "also" is awkward here; consider rephrasing, e.g., "...will also fail to function correctly" or simply "...will not function correctly."

We recommend a careful line-by-line review of the manuscript, potentially with the assistance of a native English speaker or professional editing service, to polish the language before final publication.

Author Response

See attached PDF file for details

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The manuscript submitted by the authors for review is quite interesting and has a scientific and applied character. At the same time, there are a number of comments and recommendations:

In our opinion, similar implemented projects and the scientific literature related to the authors’ research topic have not been sufficiently analyzed. The application of LiDAR systems for solving applied problems is currently a very popular topic, and it is evident that similar studies have been conducted previously. The objectives and scientific novelty of the presented work should be derived from a thorough analysis of the relevant literature.
The research methodology is described in great detail; however, it would be advisable to present a comparative table demonstrating the advantages of the proposed method over existing ones. In addition, the article lacks detailed information on the measurement conditions, sensor parameters, calibration procedures, sample size, and data processing workflow, which makes it difficult for other researchers to reproduce the results.
In our view, the statistical analysis of the results is somewhat weak. The presented graphs and tables are not always accompanied by error estimates, confidence intervals, or tests of statistical significance of the differences between methods, which reduces the persuasiveness of the conclusions.
Unfortunately, the paper does not include a “Discussion” section, where the authors could address controversial or debatable issues related to the study.
The “Conclusions” section is very general in nature and should be made more specific and detailed based on the obtained research results.

Author Response

See attached file for details

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

Thank you to the authors for taking the comments and recommendations into account; the work now looks much better.
