Rotation- and Scale-Invariant Object Detection Using Compressed 2D Voting with Sparse Point-Pair Screening
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This manuscript presents a well-motivated and clever method to accelerate the Generalized Hough Transform (GHT) for robust object detection. The core idea of using a two-level screening process to compress the 4D search space into a 2D voting scheme is sound, and the reported performance improvements are significant. The work addresses a practical and important problem in industrial machine vision. The paper has several weaknesses:
- The method's performance depends on several hyperparameters, such as the quantization bin sizes (Δ_bin, Δθ, Δφ), the grid-filtering threshold (Thre_size = 4), and the fuzzy voting neighborhood size (3×3). The manuscript needs an ablation study or sensitivity analysis to explain how these parameters were chosen and how they impact the results.
- The introduction dismisses deep learning (DL) due to latency and data needs. However, it fails to discuss modern, lightweight DL architectures (e.g., MobileNets, YOLO-tiny) designed for real-time performance. A discussion on why this GHT-based method remains competitive is necessary.
- The paper's novelty over prior GHT methods using gradient differences needs to be more clearly defined. The authors should explicitly state what makes their two-level screening and lookup pipeline a fundamental improvement.
- The experiments are conducted on a private dataset. To strengthen the claims and allow for fair comparison, the authors should either test their method on a public benchmark dataset or provide much more detail about their own dataset and consider making it public. Furthermore, the paper only shows successful results; an analysis of failure cases is needed to understand the method's limitations.
Author Response
Comments 1:
The method's performance depends on several hyperparameters, such as the quantization bin sizes (Δ_bin, Δθ, Δφ), the grid-filtering threshold (Thre_size = 4), and the fuzzy voting neighborhood size (3×3). The manuscript needs an ablation study or sensitivity analysis to explain how these parameters were chosen and how they impact the results.
Response 1:
Thank you for this comment. We agree that a clear sensitivity analysis is essential. Accordingly, we have added a new subsection 4.2 “Hyperparameter Sensitivity Analysis” immediately following Section 4.1 (page 11, lines 338–355). In it, we run single-variable trials on our augmented dataset (1,000 images), measure localization precision (P) for each tested value of Δ_bin, Δθ, Δφ, Thre_size, and neighborhood size, and summarize the results in Table 3.
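For readers unfamiliar with single-variable trials, the procedure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `one_at_a_time_sweep`, the parameter keys, and the `evaluate` callback (which would run the detector on the dataset and return the localization precision P) are all hypothetical. Only the baseline values Thre_size = 4 and a 3×3 neighborhood come from the review record; the candidate values in the usage comment are placeholders.

```python
from typing import Any, Callable, Dict, List, Tuple


def one_at_a_time_sweep(
    baseline: Dict[str, Any],
    candidates: Dict[str, List[Any]],
    evaluate: Callable[[Dict[str, Any]], float],
) -> List[Tuple[str, Any, float]]:
    """Vary one hyperparameter at a time while holding the others at baseline.

    `evaluate` is a hypothetical callback that runs the detector with the
    given parameters and returns a score such as localization precision.
    """
    results = []
    for name, values in candidates.items():
        for value in values:
            params = dict(baseline)  # copy so the baseline is never mutated
            params[name] = value     # change exactly one parameter
            results.append((name, value, evaluate(params)))
    return results


# Hypothetical usage (candidate values are placeholders, not from the paper):
# baseline = {"thre_size": 4, "neighborhood": 3}
# candidates = {"thre_size": [2, 4, 6], "neighborhood": [1, 3, 5]}
# rows = one_at_a_time_sweep(baseline, candidates, run_detector_and_score)
```

Each returned row holds the parameter name, the tested value, and the resulting score, which is the shape a summary table of such trials would typically take.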
Comments 2:
The introduction dismisses deep learning (DL) due to latency and data needs. However, it fails to discuss modern, lightweight DL architectures (e.g., MobileNets, YOLO-tiny) designed for real-time performance. A discussion on why this GHT-based method remains competitive is necessary.
Response 2:
Thank you for raising this point. While we recognize that lightweight DL models such as MobileNets and YOLO-tiny have made significant progress in reducing inference time, our manuscript is focused specifically on a training-free, annotation-free approach that can be deployed immediately on commodity CPU hardware without any specialized accelerators or calibration data. Introducing a detailed comparison to lightweight DL architectures would require an extensive survey of their various implementations and platform-specific optimizations, which would detract from the core contribution—namely, the novel two-level screening and lookup pipeline for GHT.
Moreover, due to the strict length limits set by the journal, we chose to emphasize quantitative runtime and robustness comparisons against classical GHT variants, leaving a broader DL discussion for future work. We believe this maintains a clear and focused narrative while still demonstrating that our method offers a unique real-time, data-independent solution in settings where DL training or hardware acceleration is not feasible.
Comments 3:
The paper's novelty over prior GHT methods using gradient differences needs to be more clearly defined. The authors should explicitly state what makes their two-level screening and lookup pipeline a fundamental improvement.
Response 3:
Thank you for this valuable suggestion. We agree that our contributions must be contrasted directly with prior gradient-difference GHT methods. Accordingly, we have added a focused paragraph at the very beginning of Section 3 “Our Method” (page 4, lines 129–140) that explicitly enumerates our three fundamental innovations and highlights how they together yield orders-of-magnitude speedups and improved noise resilience.
Comments 4:
The experiments are conducted on a private dataset. To strengthen the claims and allow for fair comparison, the authors should either test their method on a public benchmark dataset or provide much more detail about their own dataset and consider making it public. Furthermore, the paper only shows successful results; an analysis of failure cases is needed to understand the method’s limitations.
Response 4:
Thank you for this valuable suggestion. While we recognize the benefit of public benchmarks and failure‐case analysis, our current study focuses on a highly specialized, in‐line industrial inspection setup where the exact lighting, vibration, and fixturing conditions are integral to the method’s validation. Recasting our pipeline on existing public datasets would require a substantial effort to recreate those specific operating parameters—work that is beyond the scope and length constraints of this manuscript.
To ensure transparency, we have augmented Section 4.1 (page 11, lines 309–312) with a detailed description of our dataset’s composition, augmentation levels, and capture conditions. A systematic study of failure modes and cross‐benchmark comparisons is planned as part of a follow-on journal extension, where we can devote the necessary resources to re-capture, annotate, and analyze those scenarios without detracting from the core contributions presented here. We appreciate your understanding and look forward to exploring these directions in future work.
Reviewer 2 Report
Comments and Suggestions for Authors
I present only a few corrections to improve the text provided. Also, the size of the figures should be increased, as they do not adequately reflect the work performed, for example Figures 6, 7, 8, 9, 10, 11, 12, and 13.
Comments for author File: Comments.docx
Author Response
Comments 1: I present only a few corrections to improve the text provided. Also, the size of the figures should be increased as they do not adequately reflect the work performed, example: Figure 6, 7, 8, 9, 10, 11, 12 and 13.
Response 1: Thank you very much for your careful reading, and especially for pointing out the grammatical errors in our manuscript. We have corrected all identified grammar issues and, in addition, agree with your suggestion to enlarge the figures. We have increased the display size of some figures throughout the manuscript to better showcase the details of our results.
Reviewer 3 Report
Comments and Suggestions for Authors
The authors propose research on rotation- and scale-invariant object detection using compressed 2D voting with sparse point-pair screening. They commence with a comprehensive introduction that delineates the project's scope and the primary contributions of the research. They continue with a review of related work to further demonstrate the developed method. They then explain the experiment performed to validate the research and close with their conclusions.
- It is essential that they provide a thorough explanation regarding the rationale behind the utilization of blur, noise, occlusion, and nonlinear illumination as evaluation metrics for the system's robustness, as opposed to the employment of alternative tests.
- The graphs in Figure 9 are not visible. They should be placed in at least two different rows to improve visibility.
- It is recommended that a "future work" section be added to identify a future path for this research.
Author Response
Comments 1:
It is essential that they provide a thorough explanation regarding the rationale behind the utilization of blur, noise, occlusion, and nonlinear illumination as evaluation metrics for the system’s robustness, as opposed to the employment of alternative tests.
Response 1:
Thank you for this insightful comment. We agree that a clear justification of these four interference modes is necessary. Accordingly, we have refined the paragraph in Section 4.1 “Experimental environment” (page 11, lines 314–331) to make our rationale explicit. This addition clearly explains why we chose blur, noise, occlusion, and nonlinear illumination, and why other tests were not included.
Comments 2:
The graphs in Figure 9 are not visible. They should be placed in at least two different rows to improve visibility.
Response 2:
Thank you for this suggestion. We agree that splitting Figure 9 into two rows greatly enhances readability. With these changes, Figure 9 now appears in two clear rows of two subplots, making each graph much easier to inspect.
Comments 3:
It is recommended that a “future work” section be added to identify a future path for this research.
Response 3:
Thank you for this valuable suggestion. We agree that outlining concrete next steps will strengthen the manuscript. Accordingly, we have added a new Section 6 “Future Work” immediately following the Conclusion (page 17, lines 436–452).
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have provided clear, in-depth, and persuasive responses to all the issues raised previously. All concerns have been properly addressed, and the scientific rigor and precision of the research have been further enhanced. Overall, this work demonstrates considerable academic value and innovation, meeting the acceptance criteria, and thus I recommend accepting it.