A 6DoF Pose Estimation Dataset and Network for Multiple Parametric Shapes in Stacked Scenarios
Abstract
1. Introduction
- We construct a new dataset, with accompanying evaluation metrics, for stacked scenes of parametric part objects drawn from multiple templates.
- We propose a new network to provide benchmark results; it jointly performs foreground segmentation, instance segmentation, template segmentation, 6D pose estimation, and parameter value prediction.
2. Related Works
2.1. Dataset
2.2. Instance-Level 6D Pose Estimation
2.3. Category-Level 6D Pose Estimation
3. Dataset
3.1. Dataset Description
3.2. Synthetic Data Generation
3.3. Evaluation Metrics
4. Baseline Method
4.1. Foreground Segmentation
4.2. Template Segmentation
4.3. Translation Branch
4.4. Rotation Branch
4.5. Parameter Values Branch
4.6. Visibility Branch
5. Experiments and Results
5.1. Implementation Details
5.2. Results
5.3. Ablation Study
5.4. Further Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Zeng, L.; Dong, Z.; Yu, J.; Hong, J.; Wang, H. Sketch-based retrieval and instantiation of parametric parts. Comput.-Aided Des. 2019, 113, 82–95.
2. Shapiro, V.; Vossler, D.L. What is a parametric family of solids? In Proceedings of the Third ACM Symposium on Solid Modeling and Applications, Salt Lake City, UT, USA, 17–19 May 1995; pp. 43–54.
3. Zeng, L.; Lv, W.; Zhang, X.; Liu, Y. ParametricNet: 6DoF Pose Estimation Network for Parametric Shapes in Stacked Scenarios. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 772–778.
4. Rad, M.; Lepetit, V. BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 3848–3856.
5. Park, K.; Patten, T.; Vincze, M. Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 7668–7677.
6. Li, Z.; Wang, G.; Ji, X. CDPN: Coordinates-Based Disentangled Pose Network for Real-Time RGB-Based 6-DoF Object Pose Estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 7678–7687.
7. Peng, S.; Liu, Y.; Huang, Q.; Zhou, X.; Bao, H. PVNet: Pixel-Wise Voting Network for 6DoF Pose Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 4556–4565.
8. Song, C.; Song, J.; Huang, Q. HybridPose: 6D Object Pose Estimation under Hybrid Representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 431–440.
9. Xiong, F.; Liu, C.; Chen, Q. Region Pixel Voting Network (RPVNet) for 6D Pose Estimation from Monocular Image. Appl. Sci. 2021, 11, 743.
10. He, Y.; Sun, W.; Huang, H.; Liu, J.; Fan, H.; Sun, J. PVN3D: A Deep Point-Wise 3D Keypoints Voting Network for 6DoF Pose Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 11632–11641.
11. He, Y.; Huang, H.; Fan, H.; Chen, Q.; Sun, J. FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual Event, 19–25 June 2021; pp. 3003–3013.
12. Liu, H.; Liu, G.; Zhang, Y.; Lei, L.; Xie, H.; Li, Y.; Sun, S. A 3D Keypoints Voting Network for 6DoF Pose Estimation in Indoor Scene. Machines 2021, 9, 230.
13. Wang, H.; Sridhar, S.; Huang, J.; Valentin, J.; Song, S.; Guibas, L.J. Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 2642–2651.
14. Tian, M.; Ang, M.H.; Lee, G.H. Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 530–546.
15. Wang, J.; Chen, K.; Dou, Q. Category-Level 6D Object Pose Estimation via Cascaded Relation and Recurrent Reconstruction Networks. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021.
16. Chen, D.; Li, J.; Wang, Z.; Xu, K. Learning Canonical Shape Space for Category-Level 6D Object Pose and Size Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 11973–11982.
17. Chen, W.; Jia, X.; Chang, H.J.; Duan, J.; Shen, L.; Leonardis, A. FS-Net: Fast Shape-Based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual Event, 19–25 June 2021; pp. 1581–1590.
18. Dong, Z.; Liu, S.; Zhou, T.; Cheng, H.; Zeng, L.; Yu, X.; Liu, H. PPR-Net: Point-Wise Pose Regression Network for Instance Segmentation and 6D Pose Estimation in Bin-Picking Scenarios. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 4–8 November 2019; pp. 1773–1780.
19. Zeng, L.; Lv, W.J.; Dong, Z.K.; Liu, Y.J. PPR-Net++: Accurate 6-D Pose Estimation in Stacked Scenarios. IEEE Trans. Autom. Sci. Eng. 2021.
20. Hinterstoisser, S.; Lepetit, V.; Ilic, S.; Holzer, S.; Bradski, G.; Konolige, K.; Navab, N. Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes. In Proceedings of the Asian Conference on Computer Vision (ACCV), Daejeon, Korea, 5–9 November 2012; pp. 548–562.
21. Tejani, A.; Tang, D.; Kouskouridas, R.; Kim, T.K. Latent-Class Hough Forests for 3D Object Detection and Pose Estimation. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; pp. 462–477.
22. Rennie, C.; Shome, R.; Bekris, K.E.; De Souza, A.F. A Dataset for Improved RGBD-Based Object Detection and Pose Estimation for Warehouse Pick-and-Place. IEEE Robot. Autom. Lett. 2016, 1, 1179–1185.
23. Eppner, C.; Höfer, S.; Jonschkowski, R.; Martín-Martín, R.; Sieverling, A.; Wall, V.; Brock, O. Lessons from the Amazon Picking Challenge: Four Aspects of Building Robotic Systems. In Proceedings of Robotics: Science and Systems, Ann Arbor, MI, USA, 18–22 June 2016.
24. Xiang, Y.; Schmidt, T.; Narayanan, V.; Fox, D. PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes. In Proceedings of Robotics: Science and Systems, Pittsburgh, PA, USA, 26–30 June 2018.
25. Calli, B.; Singh, A.; Walsman, A.; Srinivasa, S.; Abbeel, P.; Dollar, A.M. The YCB Object and Model Set: Towards Common Benchmarks for Manipulation Research. In Proceedings of the International Conference on Advanced Robotics (ICAR), Istanbul, Turkey, 27–31 July 2015; pp. 510–517.
26. Liu, X.; Iwase, S.; Kitani, K.M. StereOBJ-1M: Large-Scale Stereo Image Dataset for 6D Object Pose Estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 10870–10879.
27. Hodan, T.; Haluza, P.; Obdržálek, Š.; Matas, J.; Lourakis, M.; Zabulis, X. T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-Less Objects. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, CA, USA, 24–31 March 2017; pp. 880–888.
28. Brégier, R.; Devernay, F.; Leyrit, L.; Crowley, J.L. Symmetry Aware Evaluation of 3D Object Detection and Pose Estimation in Scenes of Many Parts in Bulk. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW), Venice, Italy, 22–29 October 2017; pp. 2209–2218.
29. Kleeberger, K.; Landgraf, C.; Huber, M.F. Large-Scale 6D Object Pose Estimation Dataset for Industrial Bin-Picking. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 4–8 November 2019; pp. 2573–2578.
30. Lepetit, V.; Moreno-Noguer, F.; Fua, P. EPnP: An Accurate O(n) Solution to the PnP Problem. Int. J. Comput. Vis. 2009, 81, 155.
31. Arun, K.S.; Huang, T.S.; Blostein, S.D. Least-Squares Fitting of Two 3-D Point Sets. IEEE Trans. Pattern Anal. Mach. Intell. 1987, 9, 698–700.
32. Umeyama, S. Least-Squares Estimation of Transformation Parameters between Two Point Patterns. IEEE Trans. Pattern Anal. Mach. Intell. 1991, 13, 376–380.
33. Shotton, J.; Glocker, B.; Zach, C.; Izadi, S.; Criminisi, A.; Fitzgibbon, A. Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, 23–28 June 2013; pp. 2930–2937.
34. Li, Y.; Wang, G.; Ji, X.; Xiang, Y.; Fox, D. DeepIM: Deep Iterative Matching for 6D Pose Estimation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 683–698.
35. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 June 2017; pp. 652–660.
36. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. arXiv 2017, arXiv:1706.02413.
37. Jiang, M.; Wu, Y.; Zhao, T.; Zhao, Z.; Lu, C. PointSIFT: A SIFT-Like Network Module for 3D Point Cloud Semantic Segmentation. arXiv 2018, arXiv:1807.00652.
38. Chen, W.; Jia, X.; Chang, H.J.; Duan, J.; Leonardis, A. G2L-Net: Global to Local Network for Real-Time 6D Pose Estimation with Embedding Vector Features. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 4233–4242.
39. Gao, G.; Lauri, M.; Zhang, J.; Frintrop, S. Occlusion Resistant Object Rotation Regression from Point Cloud Segments. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018.
40. Brégier, R.; Devernay, F.; Leyrit, L.; Crowley, J.L. Defining the Pose of Any 3D Rigid Object and an Associated Distance. Int. J. Comput. Vis. 2018, 126, 571–596.
| Dataset | Modality | Multiple Categories in a Scene | Pose Variability | Task | Occlusion | Object Types |
|---|---|---|---|---|---|---|
| LineMOD [20] | RGB-D | Yes | Limited | Instance-level | Cluttered | Household |
| IC-MI [21] | RGB-D | Yes | Limited | Instance-level | Cluttered | Household |
| Rutgers APC [22] | RGB-D | Yes | Limited | Instance-level | Cluttered | Household |
| YCB-Video [24] | RGB-D | Yes | Limited | Instance-level | Cluttered | Household |
| StereOBJ-1M [26] | RGB-D | Yes | Limited | Instance-level | Cluttered | Household |
| T-LESS [27] | RGB-D | Yes | High | Instance-level | Cluttered | Industrial |
| Siléane [28] | RGB-D | No | High | Instance-level | Stacked | Industrial |
| Fraunhofer IPA [29] | RGB-D | No | High | Instance-level | Stacked | Industrial |
| CAMERA [13] | RGB-D | Yes | Limited | Category-level | Cluttered | Household |
| REAL [13] | RGB-D | Yes | Limited | Category-level | Cluttered | Household |
| Parametric Dataset [3] | Depth | No | High | Category-level | Stacked | Industrial |
| Multi-Parametric Dataset | RGB-D | Yes | High | Category-level | Stacked | Industrial |
| Dataset | BG IoU (%) | FG IoU (%): TN06 | FG IoU (%): TN16 | FG IoU (%): TN34 | FG IoU (%): TN42 | mIoU (%) | Overall Accuracy (%) |
|---|---|---|---|---|---|---|---|
| TEST-L | 99.99 | 99.64 | 99.82 | 99.58 | 99.63 | 99.74 | 99.96 |
| TEST-G | 99.95 | 98.81 | 99.48 | 98.59 | 99.04 | 99.17 | 99.84 |
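The segmentation scores above (per-class IoU, mIoU, and overall point accuracy) follow standard definitions; the paper's exact protocol is given in Section 3.3. A minimal NumPy sketch, with function and argument names chosen here for illustration:

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes):
    """Per-class IoU, mean IoU, and overall accuracy for label arrays.

    pred, gt: integer label arrays of the same shape
    (e.g., 0 = background, 1..num_classes-1 = template classes).
    """
    pred, gt = np.asarray(pred), np.asarray(gt)
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        # Undefined IoU (class absent in both) is excluded from the mean.
        ious.append(inter / union if union > 0 else float("nan"))
    miou = float(np.nanmean(ious))
    overall_acc = float(np.mean(pred == gt))
    return ious, miou, overall_acc
```

The same routine covers both the binary foreground/background case and the per-template case by varying `num_classes`.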
Translation, mAP (%):

| Dataset | thres = 0.5 cm | thres = 1 cm | thres = 2 cm | thres = 5 cm |
|---|---|---|---|---|
| TEST-L | 92.9 | 99.1 | 99.6 | 99.5 |
| TEST-G | 90.8 | 98.4 | 99.4 | 99.3 |

Rotation, mAP (%):

| Dataset | thres = 5° | thres = 10° | thres = 15° | thres = 20° |
|---|---|---|---|---|
| TEST-L | 41.9 | 52.3 | 56.1 | 59.7 |
| TEST-G | 36.8 | 53.0 | 57.4 | 60.9 |

Parameter values, mAP (%):

| Dataset | thres = 5% | thres = 10% | thres = 15% | thres = 20% |
|---|---|---|---|---|
| TEST-L | 51.0 | 76.2 | 86.3 | 91.2 |
| TEST-G | 6.0 | 43.7 | 71.0 | 83.6 |
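The three metrics above threshold a per-instance error: Euclidean distance for translation (cm), geodesic angle for rotation (degrees), and relative error for parameter values (%). A hedged NumPy sketch of these generic error definitions and threshold-based accuracy; the paper's exact mAP protocol, including symmetry handling for rotations, is defined in Section 3.3, and all names here are illustrative:

```python
import numpy as np

def translation_error(t_pred, t_gt):
    # Euclidean distance between predicted and ground-truth translations.
    return float(np.linalg.norm(np.asarray(t_pred, float) - np.asarray(t_gt, float)))

def rotation_error_deg(R_pred, R_gt):
    # Geodesic distance between two 3x3 rotation matrices, in degrees.
    cos_theta = (np.trace(R_pred @ R_gt.T) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

def parameter_error_pct(p_pred, p_gt):
    # One plausible definition: worst-case relative error over the
    # predicted parameter values, as a percentage.
    p_pred, p_gt = np.asarray(p_pred, float), np.asarray(p_gt, float)
    return float(np.max(np.abs(p_pred - p_gt) / np.abs(p_gt)) * 100.0)

def accuracy_at(errors, thresholds):
    # Fraction of instances whose error falls below each threshold.
    errors = np.asarray(errors, float)
    return {t: float(np.mean(errors < t)) for t in thresholds}
```

For example, a prediction whose translation error is 0.8 cm counts as correct at the 1 cm, 2 cm, and 5 cm thresholds but not at 0.5 cm, which is why the mAP columns are monotonically non-decreasing left to right.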
Translation (ablation), mAP (%):

| Dataset | Method | thres = 0.5 cm | thres = 1 cm | thres = 2 cm | thres = 5 cm |
|---|---|---|---|---|---|
| TEST-L | w/o residual | 91.1 | 99.1 | 99.7 | 99.5 |
| TEST-L | with residual | 92.9 | 99.1 | 99.6 | 99.5 |
| TEST-G | w/o residual | 88.4 | 98.4 | 99.4 | 99.3 |
| TEST-G | with residual | 90.8 | 98.4 | 99.4 | 99.3 |
Rotation (ablation), mAP (%):

| Dataset | Method | thres = 5° | thres = 10° | thres = 15° | thres = 20° |
|---|---|---|---|---|---|
| TEST-L | w/o residual | 36.0 | 44.2 | 48.1 | 50.6 |
| TEST-L | with residual | 41.9 | 52.3 | 56.1 | 59.7 |
| TEST-G | w/o residual | 34.1 | 45.2 | 49.3 | 52.1 |
| TEST-G | with residual | 36.8 | 53.0 | 57.4 | 60.9 |
Parameter values (ablation), mAP (%):

| Dataset | Method | thres = 5% | thres = 10% | thres = 15% | thres = 20% |
|---|---|---|---|---|---|
| TEST-L | w/o residual | 47.5 | 73.4 | 84.4 | 89.6 |
| TEST-L | with residual | 51.0 | 76.2 | 86.3 | 91.2 |
| TEST-G | w/o residual | 4.8 | 41.3 | 69.7 | 80.6 |
| TEST-G | with residual | 6.0 | 43.7 | 71.0 | 83.6 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
MDPI and ACS Style
Zhang, X.; Lv, W.; Zeng, L. A 6DoF Pose Estimation Dataset and Network for Multiple Parametric Shapes in Stacked Scenarios. Machines 2021, 9, 321. https://doi.org/10.3390/machines9120321

AMA Style
Zhang X, Lv W, Zeng L. A 6DoF Pose Estimation Dataset and Network for Multiple Parametric Shapes in Stacked Scenarios. Machines. 2021; 9(12):321. https://doi.org/10.3390/machines9120321

Chicago/Turabian Style
Zhang, Xinyu, Weijie Lv, and Long Zeng. 2021. "A 6DoF Pose Estimation Dataset and Network for Multiple Parametric Shapes in Stacked Scenarios" Machines 9, no. 12: 321. https://doi.org/10.3390/machines9120321

APA Style
Zhang, X., Lv, W., & Zeng, L. (2021). A 6DoF Pose Estimation Dataset and Network for Multiple Parametric Shapes in Stacked Scenarios. Machines, 9(12), 321. https://doi.org/10.3390/machines9120321