Robust 6-DoF Pose Estimation under Hybrid Constraints
Abstract
1. Introduction
- To improve keypoint localization accuracy, a Heatmap Wing Loss is designed specifically for heatmap regression. With this loss, the algorithm improves the quality of the network-predicted heatmaps and keeps keypoint localization stable when the object is occluded.
- Drawing on end-to-end algorithms, the heatmap regression network is extended with a translation regression branch, so that several constraints can be imposed on the pose, improving the stability of pose estimation in occluded scenes.
- To better integrate the two constraints of keypoints and translation, a pose optimization module is designed that further improves the accuracy of the estimated poses.
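The first contribution builds on the Wing loss of Feng et al. The paper's exact Heatmap Wing Loss is defined in Section 3.3; as a rough sketch only, applying the standard Wing loss pixel-wise to the difference between predicted and ground-truth heatmaps looks like the following. The parameter values `w` and `eps` and the function names are illustrative, not the paper's:

```python
import numpy as np

def wing_loss(diff, w=0.1, eps=0.02):
    """Wing loss (Feng et al.): logarithmic for small errors, L1 for large.

    `w` bounds the non-linear region and `eps` controls its curvature.
    The small defaults here are illustrative choices for unit-normalized
    heatmap values, not the paper's settings.
    """
    diff = np.abs(diff)
    # c makes the two pieces meet continuously at |x| = w
    c = w - w * np.log(1.0 + w / eps)
    return np.where(diff < w, w * np.log(1.0 + diff / eps), diff - c)

def heatmap_wing_loss(pred, gt):
    """Mean Wing loss over all pixels of predicted vs. ground-truth heatmaps."""
    return float(np.mean(wing_loss(pred - gt)))
```

Compared with an MSE loss, the logarithmic branch keeps the gradient large for small residuals, so the network is pushed to sharpen the heatmap peak rather than ignore small errors.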
2. Related Work
2.1. End-to-End Methods
2.2. Two-Stage Methods
3. Proposed Approach
3.1. Pipeline
Algorithm 1: Calculation of the output bounding box
3.2. Multi-Task Network
3.3. Loss Function
3.4. Initial Pose
3.5. Pose Optimization
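The pose optimization module integrates the keypoint and translation constraints described in the contributions. As a minimal sketch, assuming a nonlinear least-squares formulation with an axis-angle rotation and a soft prior tying the translation to the network-predicted value (the weight `lam` and all function names are illustrative, not the paper's exact formulation):

```python
import numpy as np
from scipy.optimize import least_squares

def rodrigues(rvec):
    """Axis-angle vector -> 3x3 rotation matrix."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    S = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * S + (1.0 - np.cos(theta)) * (S @ S)

def residuals(x, pts3d, pts2d, K, t_pred, lam=1.0):
    """2D keypoint reprojection residuals plus a soft translation constraint.

    x = [rvec (3), t (3)]; lam weights the translation prior (illustrative).
    """
    R, t = rodrigues(x[:3]), x[3:]
    proj = (K @ (pts3d @ R.T + t).T).T          # project model keypoints
    proj = proj[:, :2] / proj[:, 2:3]           # perspective division
    return np.concatenate([(proj - pts2d).ravel(), lam * (t - t_pred)])

def refine_pose(x0, pts3d, pts2d, K, t_pred):
    """Refine an initial pose x0 under both constraints."""
    return least_squares(residuals, x0, args=(pts3d, pts2d, K, t_pred)).x
```

The same hybrid objective could equally be minimized with the Ceres Solver cited in the references; `scipy.optimize.least_squares` is used here only to keep the sketch self-contained.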
4. Experiments
4.1. Benchmark Datasets
4.2. Evaluation Metrics
4.3. Implementation
4.4. Ablation Studies
4.5. Performance on LINEMOD
4.6. Performance on Occlusion LINEMOD
4.7. Running Time
5. Discussion
- In this paper, only the translation component is used as a pose constraint. The predicted rotation component, keypoint correlation information, and symmetry information could also be used to constrain the pose; imposing more constraints should further improve the stability of pose estimation.
- Two-stage algorithms based on heatmaps struggle to estimate the pose of occluded objects, mainly because a heatmap cannot locate keypoints that fall outside the image. The idea of voting methods could therefore be borrowed to generate features within the image that represent keypoints lying outside it.
- This paper focuses on estimating the pose of a single object per image; when an image contains multiple objects, the algorithm must process them one by one, which reduces efficiency. Estimating the poses of multiple objects simultaneously will be studied in future work.
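The voting idea mentioned in the second point above can be illustrated concretely: pixels inside the image cast unit direction votes toward a keypoint, and intersecting the voted lines recovers the keypoint even when it lies outside the image bounds. This is a PVNet-style sketch under assumed inputs, not the paper's implementation:

```python
import numpy as np

def keypoint_from_votes(pixels, directions):
    """Recover a keypoint from per-pixel unit direction votes.

    Each pixel p with unit direction d constrains the keypoint k to the
    line p + s*d; projecting onto the line normal n = (-dy, dx) gives the
    linear constraint n . k = n . p. Stacking all constraints and solving
    in the least-squares sense locates k, with no requirement that the
    solution fall inside the image.
    """
    n = np.stack([-directions[:, 1], directions[:, 0]], axis=1)
    b = np.sum(n * pixels, axis=1)
    k, *_ = np.linalg.lstsq(n, b, rcond=None)
    return k
```

With noisy network-predicted directions, the same least-squares intersection (or RANSAC over pairs of votes, as in PVNet) still yields a usable estimate, which is what makes the approach attractive for occluded or truncated objects.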
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Feng, W.; Tian, F.P.; Zhang, Q.; Sun, J. 6D Dynamic Camera Relocalization from Single Reference Image. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 4049–4057.
- Tian, F.P.; Feng, W.; Zhang, Q.; Wang, X.; Sun, J.; Loia, V.; Liu, Z.Q. Active Camera Relocalization from a Single Reference Image without Hand-Eye Calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 2791–2806.
- Rad, M.; Lepetit, V. BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 3848–3856.
- Tekin, B.; Sinha, S.N.; Fua, P. Real-Time Seamless Single Shot 6D Object Pose Prediction. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 292–301.
- Pavlakos, G.; Zhou, X.; Chan, A.; Derpanis, K.G.; Daniilidis, K. 6-DoF object pose from semantic keypoints. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 2011–2018.
- Wang, H.; Sridhar, S.; Huang, J.; Valentin, J.; Song, S.; Guibas, L.J. Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 2637–2646.
- Zhao, Z.; Peng, G.; Wang, H.; Fang, H.S.; Li, C.; Lu, C. Estimating 6D Pose from Localizing Designated Surface Keypoints. arXiv 2018, arXiv:1812.01387.
- Oberweger, M.; Rad, M.; Lepetit, V. Making Deep Heatmaps Robust to Partial Occlusions for 3D Object Pose Estimation. arXiv 2018, arXiv:1804.03959.
- Xiang, Y.; Schmidt, T.; Narayanan, V.; Fox, D. PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes. arXiv 2017, arXiv:1711.00199.
- Kehl, W.; Manhardt, F.; Tombari, F.; Ilic, S.; Navab, N. SSD-6D: Making RGB-Based 3D Detection and 6D Pose Estimation Great Again. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1530–1538.
- Bukschat, Y.; Vetter, M. EfficientPose: An efficient, accurate and scalable end-to-end 6D multi object pose estimation approach. arXiv 2020, arXiv:2011.04307.
- Labbé, Y.; Carpentier, J.; Aubry, M.; Sivic, J. CosyPose: Consistent multi-view multi-object 6D pose estimation. In Proceedings of the ECCV, Glasgow, UK, 23–28 August 2020.
- Su, Y.; Saleh, M.; Fetzer, T.; Rambach, J.R.; Navab, N.; Busam, B.; Stricker, D.; Tombari, F. ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 6728–6738.
- Castro, P.; Kim, T.K. CRT-6D: Fast 6D Object Pose Estimation with Cascaded Refinement Transformers. arXiv 2022, arXiv:2210.11718.
- Hu, Y.; Fua, P.; Salzmann, M. Perspective Flow Aggregation for Data-Limited 6D Object Pose Estimation. arXiv 2022, arXiv:2203.09836.
- Hodan, T.; Michel, F.; Brachmann, E.; Kehl, W.; Buch, A.G.; Kraft, D.; Drost, B.; Vidal, J.; Ihrke, S.; Zabulis, X.; et al. BOP: Benchmark for 6D Object Pose Estimation. arXiv 2018, arXiv:1808.08319.
- Hodan, T.; Sundermeyer, M.; Drost, B.; Labbé, Y.; Brachmann, E.; Michel, F.; Rother, C.; Matas, J. BOP Challenge 2020 on 6D Object Localization. In Proceedings of the ECCV Workshops, Glasgow, UK, 23–28 August 2020.
- Do, T.T.; Cai, M.; Pham, T.; Reid, I. Deep-6DPose: Recovering 6D Object Pose from a Single RGB Image. arXiv 2018, arXiv:1802.10367.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Liu, F.; Fang, P.; Yao, Z.; Fan, R.; Pan, Z.; Sheng, W.; Yang, H. Recovering 6D object pose from RGB indoor image based on two-stage detection network with multi-task loss. Neurocomputing 2019, 337, 15–23.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv 2017, arXiv:1506.01497.
- Sundermeyer, M.; Marton, Z.C.; Durner, M.; Triebel, R. Augmented Autoencoders: Implicit 3D Orientation Learning for 6D Object Detection. Int. J. Comput. Vis. 2020, 128, 714–729.
- Li, Y.; Wang, G.; Ji, X.; Xiang, Y.; Fox, D. DeepIM: Deep Iterative Matching for 6D Pose Estimation. Int. J. Comput. Vis. 2020, 128, 657–678.
- Gupta, K.; Petersson, L.; Hartley, R. CullNet: Calibrated and Pose Aware Confidence Scores for Object Pose Estimation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Korea, 27–28 October 2019; pp. 2758–2766.
- Hu, Y.; Hugonot, J.; Fua, P.; Salzmann, M. Segmentation-Driven 6D Object Pose Estimation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3380–3389.
- Zhao, W.; Zhang, S.; Guan, Z.; Luo, H.; Tang, L.; Peng, J.; Fan, J. 6D object pose estimation via viewpoint relation reasoning. Neurocomputing 2020, 389, 9–17.
- Peng, S.; Zhou, X.; Liu, Y.; Lin, H.; Huang, Q.; Bao, H. PVNet: Pixel-Wise Voting Network for 6DoF Object Pose Estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 3212–3223.
- Song, C.; Song, J.; Huang, Q. HybridPose: 6D Object Pose Estimation Under Hybrid Representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 431–440.
- ER-Pose: Learning edge representation for 6D pose estimation of texture-less objects. Neurocomputing 2023, 515, 13–25.
- Park, K.; Patten, T.; Vincze, M. Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 7667–7676.
- Zakharov, S.; Shugurov, I.; Ilic, S. DPOD: 6D Pose Object Detector and Refiner. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 1941–1950.
- Wang, G.; Manhardt, F.; Tombari, F.; Ji, X. GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 16606–16616.
- Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep High-Resolution Representation Learning for Human Pose Estimation. arXiv 2019, arXiv:1902.09212.
- Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep High-Resolution Representation Learning for Visual Recognition. arXiv 2019, arXiv:1908.07919.
- Feng, Z.H.; Kittler, J.; Awais, M.; Huber, P.; Wu, X.J. Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks. arXiv 2017, arXiv:1711.06753.
- Wang, X.; Bo, L.; Fuxin, L. Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression. arXiv 2019, arXiv:1904.07399.
- Zhang, F.; Zhu, X.; Dai, H.; Ye, M.; Zhu, C. Distribution-Aware Coordinate Representation for Human Pose Estimation. arXiv 2019, arXiv:1910.06278.
- Lepetit, V.; Moreno-Noguer, F.; Fua, P. EPnP: An Accurate O(n) Solution to the PnP Problem. Int. J. Comput. Vis. 2008, 81, 155.
- Hinterstoisser, S.; Lepetit, V.; Ilic, S.; Holzer, S.; Bradski, G.; Konolige, K.; Navab, N. Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes. In Proceedings of the Computer Vision—ACCV 2012, Daejeon, Korea, 5–9 November 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 548–562.
- Brachmann, E.; Krull, A.; Michel, F.; Gumhold, S.; Shotton, J.; Rother, C. Learning 6D Object Pose Estimation Using 3D Object Coordinates. In Proceedings of the ECCV, Zurich, Switzerland, 6–12 September 2014.
- Agarwal, S.; Mierle, K.; Ceres Solver Team. Ceres Solver. Available online: http://ceres-solver.org (accessed on 18 September 2022).
- Li, Z.; Wang, G.; Ji, X. CDPN: Coordinates-Based Disentangled Pose Network for Real-Time RGB-Based 6-DoF Object Pose Estimation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 7677–7686.
- Yu, X.; Zhuang, Z.; Koniusz, P.; Li, H. 6DoF Object Pose Estimation via Differentiable Proxy Voting Loss. arXiv 2020, arXiv:2002.03923.
- Xiong, F.; Liu, C.; Chen, Q. Region Pixel Voting Network (RPVNet) for 6D Pose Estimation from Monocular Image. Appl. Sci. 2021, 11, 743.
| Object | ADD(-S)/% | ADD(-S)/m | ADD(-S)/% | ADD(-S)/m | ADD(-S)/% | ADD(-S)/m | ADD(-S)/% | ADD(-S)/m |
|---|---|---|---|---|---|---|---|---|
| Trans. Module | ✓ | ✓ | ✓ | ✓ | ✕ | ✕ | ✕ | ✕ |
| Opt. Module | ✓ | ✓ | ✕ | ✕ | ✓ | ✓ | ✕ | ✕ |
| ape | 31.5 | 0.14 | 14.8 | 0.08 | 32.3 | 0.33 | 25.8 | 0.41 |
| can | 75.1 | 0.04 | 10 | 0.06 | 75.8 | 0.03 | 69.8 | 0.03 |
| cat | 30.6 | 0.66 | 4.9 | 0.17 | 26.8 | 3.02 | 24 | 2.79 |
| duck | 34.6 | 0.35 | 11.3 | 0.09 | 31.1 | 0.76 | 28.3 | 0.65 |
| driller | 56 | 0.09 | 29 | 0.06 | 61.7 | 0.09 | 53.4 | 0.10 |
| eggbox | 46.7 | 0.38 | 1.4 | 0.13 | 51.4 | 1.11 | 42.1 | 1.13 |
| glue | 53.4 | 0.36 | 21.6 | 0.10 | 51.1 | 3.72 | 50.5 | 6.25 |
| holepuncher | 42 | 0.03 | 8 | 0.05 | 41.9 | 0.03 | 35.6 | 0.03 |
| Mean | 46.2 | 0.26 | 12.6 | 0.09 | 46.5 | 1.14 | 41.2 | 1.42 |
Mean Keypoint Location Error/cm

| Object | Heatmap Wing Loss | MSE Loss |
|---|---|---|
| ape | 8.4 | 9.1 |
| can | 4.1 | 4.6 |
| cat | 13 | 15.1 |
| duck | 9 | 10 |
| driller | 5.2 | 6.8 |
| eggbox | 38.4 | 39.2 |
| glue | 14.6 | 20.7 |
| holepuncher | 8.2 | 12.8 |
| Mean | 12.6 | 14.8 |
Method | Proposed | BetaPose | CDPN | DPVL | RPVNet | HybridPose | ER-Pose | PoseCNN | DeepIM
---|---|---|---|---|---|---|---|---|---
ape | 73 | 41.2 | 64.4 | 69.1 | 55.6 | 63.1 | 62.6 | - | 77 |
benchvise | 99.5 | 85.7 | 97.8 | 100 | 98.7 | 99.9 | 100 | - | 97.5 |
cam | 96.9 | 78.9 | 91.7 | 94.1 | 83.6 | 90.4 | 95.8 | - | 93.5 |
can | 99.3 | 85.2 | 95.9 | 98.5 | 93.2 | 98.5 | 99.2 | - | 96.5 |
cat | 92.3 | 73.9 | 83.8 | 83.1 | 75.5 | 89.4 | 90.7 | - | 82.1 |
driller | 98 | 77 | 96.2 | 99 | 94.7 | 98.5 | 99 | - | 95 |
duck | 72.7 | 42.7 | 66.8 | 63.5 | 63.5 | 65 | 68.6 | - | 77.7 |
eggbox | 99.8 | 78.9 | 99.7 | 100 | 95.8 | 100 | 100 | - | 97.1 |
glue | 96.6 | 72.5 | 99.6 | 98 | 93.4 | 98.8 | 98.7 | - | 99.4 |
hole. | 95.3 | 63.9 | 85.8 | 88.2 | 82.5 | 89.7 | 89.7 | - | 52.8 |
iron | 98.1 | 94.4 | 97.9 | 99.9 | 96.1 | 100 | 99.6 | - | 98.3 |
lamp | 99.4 | 98.1 | 97.9 | 99.8 | 96.8 | 99.5 | 99.4 | - | 97.5 |
phone | 95 | 51 | 90.8 | 96.4 | 91.5 | 94.9 | 96.8 | - | 87.7 |
Mean | 93.5 | 72.6 | 89.9 | 91.5 | 86.1 | 91.3 | 92.3 | 62.7 | 88.6 |
Method | Proposed | HeatmapNet | RPVNet | DPVL | DPOD | HybridPose | ER-Pose | GDR-Net | DPOD+Ref
---|---|---|---|---|---|---|---|---|---
ape | 31.5 | 17.6 | 17.9 | 19.2 | - | 20.9 | 25.9 | 39.3 | - |
can | 75.1 | 53.9 | 69.5 | 69.8 | - | 75.3 | 72.1 | 79.3 | - |
cat | 30.6 | 3.3 | 19 | 21.1 | - | 24.9 | 25.3 | 23.5 | - |
duck | 34.6 | 19.2 | 31.1 | 34.3 | - | 27.9 | 35.8 | 44.4 | - |
driller | 56 | 62.4 | 63.7 | 71.6 | - | 70.2 | 72.9 | 71.3 | - |
eggbox | 46.7 | 25.9 | 59.2 | 47.3 | - | 52.4 | 48.7 | 58.2 | - |
glue | 53.4 | 39.6 | 46.6 | 39.7 | - | 53.8 | 58.8 | 49.3 | - |
hole. | 42 | 21.3 | 42.8 | 45.3 | - | 54.2 | 47.4 | 58.7 | - |
Mean | 46.2 | 30.4 | 43.7 | 43.5 | 32.8 | 47.5 | 48.3 | 53.0 | 47.2 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ren, H.; Lin, L.; Wang, Y.; Dong, X. Robust 6-DoF Pose Estimation under Hybrid Constraints. Sensors 2022, 22, 8758. https://doi.org/10.3390/s22228758