DeepHMap++: Combined Projection Grouping and Correspondence Learning for Full DoF Pose Estimation
Abstract
1. Introduction
- We present a simple yet efficient projection grouping module that removes spurious local maxima in each layer of the projection heatmaps. The module learns correlation constraints among the projections of different bounding-box corners (BBCs) and selects the optimal projection.
- To suppress jitter during inference, multiple correspondence hypotheses are randomly sampled from the local maxima and their surrounding neighborhoods, then ranked by a correspondence-evaluation network.
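The two bullets above describe a peak-finding and hypothesis-sampling step that precedes scoring. As a minimal sketch of that idea (not the authors' implementation; the function names, threshold, and sampling radius are illustrative assumptions), one can extract local maxima from a corner heatmap and perturb them inside a small neighborhood to build a hypothesis pool:

```python
import numpy as np

def local_maxima(heatmap, threshold=0.5):
    """Return (row, col) peaks whose response exceeds `threshold`
    and dominates their 3x3 neighborhood (borders skipped)."""
    h, w = heatmap.shape
    peaks = []
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            v = heatmap[r, c]
            if v > threshold and v == heatmap[r - 1:r + 2, c - 1:c + 2].max():
                peaks.append((r, c))
    return peaks

def sample_hypotheses(peaks, n_samples=10, radius=2, rng=None):
    """Randomly jitter each peak inside a (2*radius+1)^2 neighborhood
    to form a pool of 2D correspondence hypotheses for one corner."""
    rng = rng or np.random.default_rng(0)
    pool = []
    for (r, c) in peaks:
        jitter = rng.integers(-radius, radius + 1, size=(n_samples, 2))
        pool.extend([(r + dr, c + dc) for dr, dc in jitter])
    return pool
```

In the paper's pipeline, each hypothesis in the pool would then be scored by the learned correspondence-evaluation network; here that network is left out of the sketch.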
2. Related Works
2.1. Direct Methods
2.2. Two-Stage Pipeline
3. Methods
3.1. Local Patch Based Heatmap Prediction
3.2. Projection Grouping
3.3. Correspondence Learning Based Hypothesis Scoring
3.3.1. Generating Hypothesis Pool
3.3.2. Learning with a Hybrid Loss
3.4. Training Dataset
4. Evaluations
4.1. Datasets and Evaluation Metric
4.2. Architecture and Parameter Selection for Projection Grouping Module
4.3. Correspondence Evaluation
4.4. Results from the Full Pipeline
4.5. Runtime Analysis
5. Conclusions and Future Work
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Lepetit, V.; Fua, P. Monocular Model-Based 3D Tracking of Rigid Objects; Now Publishers Inc.: Hanover, MA, USA, 2005; pp. 1–89.
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Rothganger, F.; Lazebnik, S.; Schmid, C.; Ponce, J. 3D object modeling and recognition using local affine-invariant image descriptors and multi-view spatial constraints. Int. J. Comput. Vis. 2006, 66, 231–259.
- Crivellaro, A.; Rad, M.; Verdie, Y.; Yi, K.M.; Fua, P.; Lepetit, V. Robust 3D object tracking from monocular images using stable parts. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 1465–1479.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- Brachmann, E.; Krull, A.; Michel, F.; Gumhold, S.; Shotton, J.; Rother, C. Learning 6D Object Pose Estimation Using 3D Object Coordinates. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 536–551.
- Rad, M.; Lepetit, V. BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth. In Proceedings of the International Conference on Computer Vision, Venice, Italy, 22–29 October 2017.
- Nigam, A.; Penate-Sanchez, A.; Agapito, L. Detect Globally, Label Locally: Learning Accurate 6-DOF Object Pose Estimation by Joint Segmentation and Coordinate Regression. IEEE Robot. Autom. Lett. 2018, 3, 3960–3967.
- Hodan, T.; Haluza, P.; Obdržálek, Š.; Matas, J.; Lourakis, M.; Zabulis, X. T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Santa Rosa, CA, USA, 24–31 March 2017; pp. 880–888.
- Crivellaro, A.; Rad, M.; Verdie, Y.; Moo Yi, K.; Fua, P.; Lepetit, V. A novel representation of parts for accurate 3D object detection and tracking in monocular images. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 4391–4399.
- Oberweger, M.; Rad, M.; Lepetit, V. Making Deep Heatmaps Robust to Partial Occlusions for 3D Object Pose Estimation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018.
- Brachmann, E.; Rother, C. Learning Less is More: 6D Camera Localization via 3D Surface Regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018.
- Krull, A.; Brachmann, E.; Nowozin, S.; Michel, F.; Shotton, J.; Rother, C. PoseAgent: Budget-Constrained 6D Object Pose Estimation via Reinforcement Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
- Tekin, B.; Sinha, S.N.; Fua, P. Real-Time Seamless Single Shot 6D Object Pose Prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018.
- Hartley, R.I.; Zisserman, A. Multiple View Geometry in Computer Vision, 2nd ed.; Cambridge University Press: Cambridge, UK, 2004.
- Brachmann, E.; Krull, A.; Nowozin, S.; Shotton, J.; Michel, F.; Gumhold, S.; Rother, C. DSAC—Differentiable RANSAC for Camera Localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
- Yi, K.M.; Trulls, E.; Ono, Y.; Lepetit, V.; Salzmann, M.; Fua, P. Learning to Find Good Correspondences. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018.
- Hinterstoisser, S.; Lepetit, V.; Ilic, S.; Holzer, S.; Bradski, G.; Konolige, K.; Navab, N. Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In Proceedings of the Asian Conference on Computer Vision, Daejeon, Korea, 5–9 November 2012; pp. 548–562.
- Xiang, Y.; Schmidt, T.; Narayanan, V.; Fox, D. PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes. In Proceedings of the Robotics: Science and Systems, Pittsburgh, PA, USA, 26–30 June 2018.
- Tejani, A.; Kouskouridas, R.; Doumanoglou, A.; Tang, D.; Kim, T.K. Latent-Class Hough Forests for 6 DoF Object Pose Estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 119–132.
- Wohlhart, P.; Lepetit, V. Learning descriptors for object recognition and 3D pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015.
- Balntas, V.; Doumanoglou, A.; Sahin, C.; Sock, J.; Kouskouridas, R.; Kim, T.K. Pose Guided RGBD Feature Learning for 3D Object Pose Estimation. In Proceedings of the International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3856–3864.
- Doumanoglou, A.; Kouskouridas, R.; Malassiotis, S.; Kim, T.K. Recovering 6D object pose and predicting next-best-view in the crowd. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
- Li, C.; Bai, J.; Hager, G.D. A Unified Framework for Multi-View Multi-Class Object Pose Estimation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018.
- Mitash, C.; Boularias, A.; Bekris, K. Physics-Based Scene-Level Reasoning for Object Pose Estimation in Clutter. arXiv 2018, arXiv:1806.10457.
- Wu, J.; Zhou, B.; Russell, R.; Kee, V.; Wagner, S.; Hebert, M.; Torralba, A.; Johnson, D. Real-Time Object Pose Estimation with Pose Interpreter Networks. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Madrid, Spain, 1–5 October 2018.
- Do, T.T.; Cai, M.; Pham, T.; Reid, I. Deep-6DPose: Recovering 6D Object Pose from a Single RGB Image. arXiv 2018, arXiv:1802.10367.
- Periyasamy, A.S.; Schwarz, M.; Behnke, S. Robust 6D Object Pose Estimation in Cluttered Scenes using Semantic Segmentation and Pose Regression Networks. arXiv 2018, arXiv:1810.03410.
- Sundermeyer, M.; Marton, Z.C.; Durner, M.; Brucker, M.; Triebel, R. Implicit 3D Orientation Learning for 6D Object Detection from RGB Images. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 712–729.
- Doumanoglou, A.; Balntas, V.; Kouskouridas, R.; Kim, T.K. Siamese Regression Networks with Efficient Mid-Level Feature Extraction for 3D Object Pose Estimation. arXiv 2016, arXiv:1607.02257.
- Kehl, W.; Milletari, F.; Tombari, F.; Ilic, S.; Navab, N. Deep learning of local RGB-D patches for 3D object detection and 6D pose estimation. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 205–220.
- Zhang, H.; Cao, Q. Combined Holistic and Local Patches for Recovering 6D Object Pose. In Proceedings of the International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2219–2227.
- Zeng, A.; Yu, K.T.; Song, S.; Suo, D.; Walker, E.; Rodriguez, A.; Xiao, J. Multi-view self-supervised deep learning for 6D pose estimation in the Amazon Picking Challenge. In Proceedings of the IEEE International Conference on Robotics and Automation, Singapore, 29 May–3 June 2017.
- Brachmann, E.; Michel, F.; Krull, A.; Ying Yang, M.; Gumhold, S. Uncertainty-driven 6D pose estimation of objects and scenes from a single RGB image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
- Jafari, O.H.; Mustikovela, S.K.; Pertsch, K.; Brachmann, E.; Rother, C. iPose: Instance-Aware 6D Pose Estimation of Partly Occluded Objects. In Proceedings of the Asian Conference on Computer Vision, Perth, Australia, 4–6 December 2018.
- Kehl, W.; Manhardt, F.; Tombari, F.; Ilic, S.; Navab, N. SSD-6D: Making RGB-based 3D detection and 6D pose estimation great again. In Proceedings of the International Conference on Computer Vision, Venice, Italy, 22–29 October 2017.
- Pavlakos, G.; Zhou, X.; Chan, A.; Derpanis, K.G.; Daniilidis, K. 6-DoF object pose from semantic keypoints. In Proceedings of the IEEE International Conference on Robotics and Automation, Singapore, 29 May–3 June 2017.
- Michel, F.; Kirillov, A.; Brachmann, E.; Krull, A.; Gumhold, S.; Savchynskyy, B.; Rother, C. Global hypothesis generation for 6D object pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
- Sock, J.; Kim, K.I.; Sahin, C.; Kim, T.K. Multi-Task Deep Networks for Depth-Based 6D Object Pose and Joint Registration in Crowd Scenarios. arXiv 2018, arXiv:1806.03891.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017.
- Krull, A.; Brachmann, E.; Michel, F.; Yang, M.Y.; Gumhold, S.; Rother, C. Learning Analysis-by-Synthesis for 6D Pose Estimation in RGB-D Images. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015.
- Sahin, C.; Kim, T.K. Recovering 6D Object Pose: A Review and Multi-modal Analysis. In Proceedings of the European Conference on Computer Vision Workshops on Assistive Computer Vision and Robotics, Munich, Germany, 8–14 September 2018.
- Opitz, M.; Waltner, G.; Poier, G.; Possegger, H.; Bischof, H. Grid loss: Detecting occluded faces. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 386–402.
- Yang, S.; Luo, P.; Loy, C.C.; Tang, X. Faceness-Net: Face Detection through Deep Facial Part Responses. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 1845–1859.
- Jaderberg, M.; Simonyan, K.; Zisserman, A.; Kavukcuoglu, K. Spatial transformer networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 2017–2025.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016.
- Nair, V.; Hinton, G.E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, 21–24 June 2010; pp. 807–814.
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
- Dang, Z.; Yi, K.M.; Hu, Y.; Wang, F.; Fua, P.; Salzmann, M. Eigendecomposition-free Training of Deep Networks with Zero Eigenvalue-based Losses. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018.
- Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. TensorFlow: A System for Large-Scale Machine Learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, Savannah, GA, USA, 2–4 November 2016.
- Ferraz, L.; Binefa, X.; Moreno-Noguer, F. Very Fast Solution to the PnP Problem with Algebraic Outlier Rejection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014.
- Rad, M.; Oberweger, M.; Lepetit, V. Feature Mapping for Learning Fast and Accurate 3D Pose Inference from Synthetic Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018.
| Sequence | ADD\|I (BB8 [8]) | ADD\|I (CorrNet w/o CC) | ADD\|I (CorrNet) | 2D Repr. (BB8 [8]) | 2D Repr. (CorrNet w/o CC) | 2D Repr. (CorrNet) |
|---|---|---|---|---|---|---|
| Ape | 40.4 | 49.7 | 51.2 | 96.6 | 97.7 | 98.2 |
| Benchvise | 91.8 | 93.0 | 93.5 | 90.1 | 97.5 | 98.1 |
| Camera | 55.7 | 61.7 | 62.9 | 86.0 | 96.2 | 96.5 |
| Can | 64.1 | 71.5 | 72.9 | 91.2 | 98.1 | 98.4 |
| Cat | 62.6 | 65.5 | 67.1 | 98.8 | 98.7 | 98.8 |
| Driller | 74.4 | 81.5 | 82.6 | 80.9 | 88.8 | 90.1 |
| Duck | 44.3 | 55.7 | 59.0 | 92.2 | 97.0 | 97.3 |
| Eggbox | 57.8 | 74.6 | 77.0 | 91.0 | 95.2 | 95.5 |
| Glue | 41.2 | 83.4 | 85.1 | 92.3 | 97.8 | 98.5 |
| Holepuncher | 67.2 | 73.9 | 75.7 | 95.3 | 97.8 | 98.2 |
| Iron | 84.7 | 87.8 | 89.3 | 84.8 | 88.5 | 89.9 |
| Lamp | 76.5 | 80.7 | 83.1 | 75.8 | 86.5 | 88.1 |
| Phone | 54.0 | 62.1 | 65.6 | 85.3 | 90.6 | 91.0 |
| Average | 62.7 | 72.4 | 74.2 | 89.3 | 94.6 | 95.3 |
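The tables in this section report pose accuracy under the ADD metric of Hinterstoisser et al. (average 3D distance of model points between the estimated and ground-truth poses) and the 2D reprojection error (average pixel distance of the projected model points). As a hedged sketch of how these scores are computed (the function names and argument conventions are ours, not the paper's), assuming a rotation matrix R, translation t, model points in the object frame, and intrinsics K:

```python
import numpy as np

def add_metric(R_est, t_est, R_gt, t_gt, model_pts):
    """ADD: mean 3D distance between model points transformed by the
    estimated and ground-truth poses. A pose is commonly counted as
    correct if ADD is below 10% of the object diameter."""
    p_est = model_pts @ R_est.T + t_est
    p_gt = model_pts @ R_gt.T + t_gt
    return np.linalg.norm(p_est - p_gt, axis=1).mean()

def reproj_error(R_est, t_est, R_gt, t_gt, model_pts, K):
    """Mean 2D pixel distance between projections of the model points
    under both poses; a 5-pixel threshold is the usual criterion."""
    def project(R, t):
        cam = model_pts @ R.T + t        # points in the camera frame
        uv = cam @ K.T                   # pinhole projection
        return uv[:, :2] / uv[:, 2:3]    # perspective divide
    diff = project(R_est, t_est) - project(R_gt, t_gt)
    return np.linalg.norm(diff, axis=1).mean()
```

For symmetric objects the ADI variant replaces the point-to-point distance with the distance to the closest transformed model point, which is why the tables label the column ADD|I.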
| Sequence | ADD\|I (RANSAC [11]) | ADD\|I (CorrNet w/o CC) | ADD\|I (CorrNet) | 2D Repr. (RANSAC [11]) | 2D Repr. (CorrNet w/o CC) | 2D Repr. (CorrNet) |
|---|---|---|---|---|---|---|
| Ape | 16.5 | 17.0 | 17.3 | 64.7 | 66.5 | 68.6 |
| Can | 42.5 | 45.8 | 49.2 | 53.0 | 61.7 | 64.9 |
| Cat | 2.8 | 2.9 | 3.0 | 47.9 | 51.4 | 53.3 |
| Driller | 47.1 | 54.6 | 57.7 | 35.1 | 48.5 | 55.0 |
| Duck | 11.0 | 12.1 | 13.2 | 36.1 | 39.5 | 47.3 |
| Eggbox | 24.7 | 24.9 | 25.0 | 10.3 | 10.3 | 10.4 |
| Glue | 39.5 | 39.7 | 39.9 | 44.9 | 51.7 | 53.4 |
| Holepuncher | 21.9 | 21.9 | 21.9 | 52.9 | 56.6 | 57.6 |
| Average | 25.8 | 27.4 | 28.4 | 43.1 | 48.3 | 51.3 |
| Sequence | AUC (PoseCNN [19]) | ADD\|I (PoseCNN [19]) | 2D Repr. (PoseCNN [19]) | AUC (DeepHMap [11]) | ADD\|I (DeepHMap [11]) | 2D Repr. (DeepHMap [11]) | AUC (DeepHMap++) | ADD\|I (DeepHMap++) | 2D Repr. (DeepHMap++) |
|---|---|---|---|---|---|---|---|---|---|
| 002 master chef can | 50.1 | 3.6 | 0.1 | 68.5 | 32.9 | 9.9 | 75.8 | 40.1 | 20.1 |
| 003 cracker box | 52.9 | 25.1 | 0.1 | 74.7 | 62.6 | 24.5 | 78.0 | 69.5 | 34.5 |
| 004 sugar box | 68.3 | 40.3 | 7.1 | 74.9 | 44.5 | 47.0 | 76.5 | 49.7 | 58.9 |
| 005 tomato soup can | 66.1 | 25.5 | 5.2 | 68.7 | 31.1 | 41.5 | 72.1 | 36.1 | 49.8 |
| 006 mustard bottle | 80.8 | 61.9 | 6.4 | 72.6 | 42.0 | 42.3 | 78.9 | 57.9 | 60.1 |
| 007 tuna fish can | 70.6 | 11.4 | 3.0 | 38.2 | 6.8 | 7.1 | 51.6 | 9.8 | 19.5 |
| 008 pudding box | 62.2 | 14.5 | 5.1 | 82.9 | 58.4 | 43.9 | 85.6 | 67.2 | 56.8 |
| 009 gelatin box | 74.8 | 12.1 | 15.8 | 82.8 | 42.5 | 62.1 | 86.7 | 59.1 | 76.8 |
| 010 potted meat can | 59.5 | 18.9 | 23.1 | 66.8 | 37.6 | 38.5 | 70.1 | 42.0 | 42.3 |
| 011 banana | 72.1 | 30.3 | 0.3 | 44.9 | 16.8 | 8.2 | 47.9 | 19.3 | 10.5 |
| 019 pitcher base | 53.1 | 15.6 | 0.0 | 70.3 | 57.2 | 15.9 | 71.8 | 58.5 | 19.8 |
| 021 bleach cleanser | 50.2 | 21.2 | 1.2 | 67.1 | 65.3 | 12.1 | 69.1 | 69.4 | 18.5 |
| 024 bowl | 69.8 | 12.1 | 4.4 | 58.6 | 25.6 | 16.0 | 60.2 | 27.7 | 18.1 |
| 025 mug | 58.4 | 5.2 | 0.8 | 38.0 | 11.6 | 20.3 | 43.4 | 12.9 | 26.3 |
| 035 power drill | 55.2 | 29.9 | 3.3 | 72.6 | 46.1 | 40.9 | 76.8 | 51.8 | 50.1 |
| 036 wood block | 61.8 | 10.7 | 0.0 | 57.7 | 34.3 | 2.5 | 61.3 | 35.7 | 2.8 |
| 037 scissors | 35.3 | 2.2 | 0.0 | 30.9 | 0.0 | 0.0 | 42.9 | 2.1 | 6.7 |
| 040 large marker | 58.1 | 3.4 | 1.4 | 46.2 | 3.2 | 0.0 | 47.6 | 3.6 | 0.8 |
| 051 large clamp | 50.1 | 28.5 | 0.3 | 42.4 | 10.8 | 0.0 | 44.1 | 11.2 | 8.7 |
| 052 extra large clamp | 46.5 | 19.6 | 0.6 | 48.1 | 29.6 | 0.0 | 51.9 | 30.9 | 0.8 |
| 061 foam brick | 85.9 | 54.5 | 0.0 | 82.7 | 51.7 | 52.4 | 84.1 | 55.4 | 59.7 |
| Average | 61.0 | 21.3 | 3.7 | 61.4 | 33.8 | 23.1 | 65.5 | 38.6 | 30.6 |
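The AUC column above is the area under the accuracy-versus-threshold curve for the ADD error, as introduced with PoseCNN, where the threshold is swept from 0 to a maximum (0.1 m in the PoseCNN protocol). A minimal sketch, assuming errors in metres and approximating the integral on a uniform grid (the function name and step count are illustrative):

```python
import numpy as np

def add_auc(add_errors, max_threshold=0.10, n_steps=1000):
    """Area under the accuracy-vs-threshold curve for ADD errors,
    normalised to [0, 100]: sweep the threshold from 0 to
    `max_threshold` and average the fraction of correct poses."""
    thresholds = np.linspace(0.0, max_threshold, n_steps)
    errors = np.asarray(add_errors, dtype=float)
    accuracy = np.array([(errors < th).mean() for th in thresholds])
    return 100.0 * accuracy.mean()
```

Unlike a single fixed threshold, the AUC rewards methods whose errors are small even when they miss the strict 10%-of-diameter criterion, which is why the AUC and ADD|I columns rank methods differently for some objects.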
| Stage | Processing Step | Time |
|---|---|---|
| First stage | Projection heatmap predictions | 80 ms |
| First stage | Projection heatmap predictions (in parallel) | 20 ms |
| Second stage | Projection grouping | 3 ms |
| Second stage | Correspondence evaluation | 20 ms |
| Full pipeline | | ms |
| Full pipeline (in parallel) | | ms |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Fu, M.; Zhou, W. DeepHMap++: Combined Projection Grouping and Correspondence Learning for Full DoF Pose Estimation. Sensors 2019, 19, 1032. https://doi.org/10.3390/s19051032