Pose-Perceptive Convolution: Learning Geometry-Adaptive Receptive Fields for Robust 6D Pose Estimation
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This paper presents a novel method for RGBD-based 6DOF pose estimation. This method focuses on improving a feature extraction frontend rather than extending complex backend modules. To this end, it proposes PPF-Net, which includes a PPC-based encoder, an LFA module for feature refinement, and a probabilistic pose regression head for uncertainty-aware prediction. Its performance is evaluated using the MP6D, YCB, LINEMOD, and Occlusion-LINEMOD datasets and compared with previous methods. An ablation study is also conducted to validate the contribution of the proposed method.
Overall, this paper is well written, with detailed explanations and clear figures that make it easy to follow. The main idea is well articulated, and the proposed new features are technically sound and clearly explained against the baseline method. The performance evaluations are also sufficiently conducted through comparative and ablation studies.
However, there are several drawbacks in the comparative studies (Section 4.4), which require clarification and improvement:
- There is no explanation or justification for the selection of the previous methods used for comparison. Several other recent methods report better performance, e.g., "RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images" (2024) and "Cross-modal attention and geometric contextual aggregation network for 6DoF object pose estimation" (2025).
- Why do you think that the selected methods are appropriate for fair comparisons? For example, GDRNPP uses RGB-based results, while MegaPose and DFTr are categorized as backend processing and appear to outperform the proposed method.
- Some methods do not report the detailed results in their papers. How did you obtain all the results of previous methods? Did you reproduce them on your own?
- The authors pointed out the complexity of other methods. What is the computational cost of the proposed method?
- Please separately highlight the top and second-ranked methods in the tables for better interpretability, even though the proposed method does not achieve the top ranking in the evaluations.
- Please use words such as “outperform”, “boost”, “slightly”, “highly”, and so on carefully throughout the discussion.
These points weaken the significance of the proposed method and make the overall contribution less convincing.
In addition, the BOP benchmark (https://bop.felk.cvut.cz/home/) can be helpful for comparative studies.
Author Response
Dear Reviewer,
We sincerely appreciate the time and effort you dedicated to reviewing our manuscript, as well as your highly professional and detailed feedback. Your rigorous review and insightful comments have been crucial in significantly improving the quality of this work.
We have carefully considered and addressed each of your comments, providing detailed explanations in this response letter (PDF). All revisions have been highlighted in the manuscript, which is compiled along with this response for your review.
Once again, we thank you for your guidance and constructive input.
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The paper presents a well-motivated and novel approach to addressing the geometric mismatch problem in 6D pose estimation. The proposed Pose-Perceptive Convolution (PPC) is a meaningful contribution, offering structured and geometry-aware receptive field adaptation, which differentiates it from existing deformable convolutions. The extensive experiments demonstrate the effectiveness and robustness of the method.
However, there are a few minor issues that need to be addressed:
1. Figure 1 is referenced before it is properly introduced, and Table 1 uses “DCN” without specifying the version (e.g., DCNv2).
2. Long sentences should be shortened, for example, the first sentence in Section 3.2.
Author Response
Dear Reviewer,
We sincerely appreciate the time and effort you dedicated to reviewing our manuscript, as well as your highly professional and detailed feedback. Your rigorous review and insightful comments have been crucial in significantly improving the quality of this work.
We have carefully considered and addressed each of your comments, providing detailed explanations in this response letter (PDF). All revisions have been highlighted in the manuscript, which is compiled along with this response for your review.
Once again, we thank you for your guidance and constructive input.
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The paper addresses a fundamental yet often overlooked issue: the geometric mismatch between standard convolutional receptive fields and the projected morphology of objects under varying poses. The proposed Pose-Perceptive Convolution (PPC) is a novel and principled solution that dynamically adapts both the shape and sampling density of the receptive field, effectively aligning feature extraction with object geometry. The integration of a Lightweight Fusion Attention (LFA) module and a probabilistic regression head further enhances the robustness and uncertainty-awareness of the model, particularly in challenging scenarios such as occlusion and symmetry.
This paper is well-structured and technically sound. The writing is clear, the methodology is well-explained, and the claims are appropriately supported by the results. In my opinion, this manuscript is ready for acceptance in its current form.
Recommendation: Accept
Author Response
We sincerely thank the reviewer for the encouraging assessment and the recommendation for acceptance. We are delighted that you recognized the importance of addressing the fundamental geometric mismatch problem and appreciated the novelty of our Pose-Perceptive Convolution (PPC) design. Your validation of the technical soundness and clarity of our work is highly motivating. We have performed a final proofread of the manuscript to ensure it maintains the high standard you observed.
Reviewer 4 Report
Comments and Suggestions for Authors
General Notes:
- The proposed model does not provide better performance when compared with the current existing models.
- The advantages of its use and its innovation must be properly highlighted and identified in the experimental results section.
Abstract:
- “Extensive experiments on multiple benchmarks demonstrate that PPF-Net achieves performance comparable to state-of-the-art methods.” – Quantify the performance. If it is comparable and not better, what is the objective of developing this method? Please elaborate on this.
Keywords:
- Do not use acronyms
Introduction:
- You have stated the innovations, but please be clearer about what is entirely new and what is based on pre-existing methods.
Related Work:
- Please do not start a section directly with a subsection.
- Add a table comparing the methods.
- Explore the existing research gaps in more depth.
Methodology:
- Do not describe well-known methods with such detail. Use citations instead, or create a proper appendix.
Experimental Results and Analyses:
- Please do not start a section directly with a subsection.
- Add a table describing the tests performed, their objectives, and the performance metrics used.
- Why were these datasets chosen? Are they the most representative of real-world conditions?
- Add a proper k-fold analysis to better understand the contribution of each dataset to the model training, instead of this division into independent subsections.
- In the vast majority of the tests, the proposed model does not present a better performance.
Comments on the Quality of English Language
Moderate
Author Response
Dear Reviewer,
We sincerely appreciate the time and effort you dedicated to reviewing our manuscript, as well as your highly professional and detailed feedback. Your rigorous review and insightful comments have been crucial in significantly improving the quality of this work.
We have carefully considered and addressed each of your comments, providing detailed explanations in this response letter (PDF). All revisions have been highlighted in the manuscript, which is compiled along with this response for your review.
Once again, we thank you for your guidance and constructive input.
Author Response File:
Author Response.pdf
Reviewer 5 Report
Comments and Suggestions for Authors
This manuscript focuses on the core problem of the mismatch between the fixed receptive field of standard convolutions and object projection shapes in the field of 6D object pose estimation, and proposes the Pose-Perceptive Convolution (PPC) and Pose-Perceptive Fusion Network (PPF-Net). It is a research work that combines both academic value and application potential.
Its core strengths are reflected in the following aspects: 1) By dynamically adjusting the receptive field shape and sampling density, the PPC module forms a structured geometric alignment scheme that differs from traditional deformable convolutions; 2) The module combination of PPC, Lightweight Fusion Attention (LFA), and probabilistic regression accurately addresses domain challenges such as geometric mismatch, feature noise, and pose ambiguity; 3) Experiments cover multiple typical datasets (including industrial parts, general objects, and heavily occluded scenarios), and standard metrics such as ADD and VSD are used to verify the model performance, while ablation experiments clarify the independent contributions of each module.
This manuscript still has room for optimization at the detail level: 1) In the method description, some parameter tuning bases and the operation details of the bidirectional fusion module are not fully explained, which affects reproducibility; 2) The references do not cover recently emerging Transformer-based pose estimation methods, resulting in insufficient comprehensiveness of the research context; 3) The presentation of figures and tables is limited in form (e.g., only numerical tables are used for ablation experiments), and intuitive materials such as feature visualization are lacking; 4) There is a lack of generalization experiments targeting object size and texture complexity, and the robustness of the conclusions can be further strengthened.
Author Response
Dear Reviewer,
We sincerely appreciate the time and effort you dedicated to reviewing our manuscript, as well as your highly professional and detailed feedback. Your rigorous review and insightful comments have been crucial in significantly improving the quality of this work.
We have carefully considered and addressed each of your comments, providing detailed explanations in this response letter (PDF). All revisions have been highlighted in the manuscript, which is compiled along with this response for your review.
Once again, we thank you for your guidance and constructive input.
Author Response File:
Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors’ responses are thorough and clear, and the revisions are acceptable. While the evaluation results do not strongly demonstrate the superiority of the proposed method compared to existing methods, I think that the findings of this study are still a valuable contribution to the field.
Reviewer 4 Report
Comments and Suggestions for Authors
The authors have addressed the vast majority of my comments.
