6DoF Pose Estimation of Transparent Object from a Single RGB-D Image
Abstract
1. Introduction
- We propose a new deep-learning-based approach for 6DoF pose estimation of transparent objects from a single RGB-D image. A two-stage, end-to-end neural network extracts discriminative high-level features, which yields accurate 6DoF pose estimates.
- We introduce a novel extended point cloud representation for 6DoF pose estimation. Unlike a classic point cloud, this representation does not require depth as input, so object pose can be recovered efficiently without time-consuming depth reconstruction.
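
To make the contrast in the second contribution concrete, the following minimal sketch compares classic back-projection, which needs a depth value for every pixel, with a depth-free per-pixel feature row. It is an illustration under assumptions, not the authors' implementation: the component names (UV code, normal, plane) are taken from the ablation table, while the function names, the UV encoding as normalised pixel coordinates, the four-parameter plane, and the nine-value row layout are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Plane = Tuple[float, float, float, float]  # assumed form: a*x + b*y + c*z + d = 0


def classic_point(u: float, v: float, depth: float,
                  fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Pinhole back-projection of one pixel; requires a metric depth value."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def extended_points(mask: Sequence[Sequence[bool]],
                    normals: Sequence[Sequence[Vec3]],
                    plane: Plane) -> List[List[float]]:
    """Build one depth-free feature row per masked pixel (hypothetical layout).

    Row layout: [u_code, v_code, nx, ny, nz, a, b, c, d].
    """
    height, width = len(mask), len(mask[0])
    rows: List[List[float]] = []
    for v in range(height):
        for u in range(width):
            if not mask[v][u]:
                continue
            # Assumed UV code: pixel coordinates scaled to [0, 1].
            u_code = u / max(width - 1, 1)
            v_code = v / max(height - 1, 1)
            rows.append([u_code, v_code, *normals[v][u], *plane])
    return rows


if __name__ == "__main__":
    mask = [[True, False], [False, True]]
    normals = [[(0.0, 0.0, 1.0)] * 2 for _ in range(2)]
    plane = (0.0, 1.0, 0.0, -0.5)
    for row in extended_points(mask, normals, plane):
        print(row)
```

The point of the sketch is only that `extended_points` never reads a depth map, whereas `classic_point` cannot be evaluated without one.
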
2. Related Work
3. Method
3.1. The First Stage
3.2. The Second Stage
3.3. Dataset
4. Experiments
4.1. Experimental Settings
4.2. Evaluation Metric
4.3. Accuracy
4.4. Efficiency
4.5. Ablation Study
4.6. Qualitative Evaluation
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest

| Object | FCRN + RA | FCRN + DF | CG + DF | CG + RA | Ours | 
|---|---|---|---|---|---|
| cup | 12.2 | 40.3 | 76.5 | 71.2 | 88.6 | 
| flower | 13.5 | 53.8 | 76.3 | 80.3 | 92.2 | 
| heart | 7.7 | 35.2 | 28.5 | 73.3 | 88.7 | 
| square | 25.5 | 35.6 | 71.8 | 54.4 | 77.0 | 
| stemless | 23.2 | 32.4 | 69.3 | 62.7 | 80.1 | 
| all | 16.4 | 39.1 | 64.0 | 68.4 | 85.4 | 

| Time | FCRN + RA | FCRN + DF | CG + DF | CG + RA | Ours |
|---|---|---|---|---|---|
| per instance | 0.108 s | 0.074 s | 0.819 s | 0.855 s | 0.069 s |
| per image | 0.345 s | 0.234 s | 2.606 s | 2.715 s | 0.223 s |
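
The timings in the efficiency table can be turned into relative speedups with a few lines of arithmetic. The sketch below copies the reported values and computes how many times slower each baseline is than "Ours". The names `TIMES`, `speedup`, and `instances_per_image` are hypothetical, and reading the per-image to per-instance ratio as an average instance count assumes that per-image time scales linearly with the number of instances, which the table itself does not state.

```python
# (per-instance seconds, per-image seconds), copied from the efficiency table.
TIMES = {
    "FCRN + RA": (0.108, 0.345),
    "FCRN + DF": (0.074, 0.234),
    "CG + DF": (0.819, 2.606),
    "CG + RA": (0.855, 2.715),
    "Ours": (0.069, 0.223),
}


def speedup(baseline: str, reference: str = "Ours", per_image: bool = False) -> float:
    """Return how many times slower `baseline` is than `reference`."""
    idx = 1 if per_image else 0
    return TIMES[baseline][idx] / TIMES[reference][idx]


def instances_per_image(method: str) -> float:
    """Ratio of per-image to per-instance time (assumed linear scaling)."""
    per_instance, per_image = TIMES[method]
    return per_image / per_instance


if __name__ == "__main__":
    for name in TIMES:
        print(f"{name:10s} per-instance x{speedup(name):.2f} "
              f"per-image x{speedup(name, per_image=True):.2f} "
              f"ratio {instances_per_image(name):.2f}")
```

With these numbers, the two CG-based pipelines come out roughly twelve times slower than "Ours" per instance, while the FCRN-based pipelines are within a factor of about 1.6.
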

| Object | w/o UV Code | w/o Normal | w/o Plane | Ours |
|---|---|---|---|---|
| cup | 35.6 | 71.6 | 72.5 | 88.6 |
| flower | 44.0 | 79.7 | 81.0 | 92.2 |
| heart | 36.2 | 69.4 | 73.4 | 88.7 |
| square | 23.8 | 58.1 | 62.3 | 77.0 |
| stemless | 29.2 | 65.3 | 68.3 | 80.1 |
| all | 33.8 | 68.8 | 71.5 | 85.4 |
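
One way to read the ablation table is to compute, for each object, how much the score falls when a single component is removed. The sketch below does this with the values copied from the table. The names `ABLATION` and `component_drops` are hypothetical, and the scores are treated as plain numbers in whatever unit the table reports.

```python
# Scores copied from the ablation table; "Ours" is the full model.
ABLATION = {
    "cup":      {"w/o UV Code": 35.6, "w/o Normal": 71.6, "w/o Plane": 72.5, "Ours": 88.6},
    "flower":   {"w/o UV Code": 44.0, "w/o Normal": 79.7, "w/o Plane": 81.0, "Ours": 92.2},
    "heart":    {"w/o UV Code": 36.2, "w/o Normal": 69.4, "w/o Plane": 73.4, "Ours": 88.7},
    "square":   {"w/o UV Code": 23.8, "w/o Normal": 58.1, "w/o Plane": 62.3, "Ours": 77.0},
    "stemless": {"w/o UV Code": 29.2, "w/o Normal": 65.3, "w/o Plane": 68.3, "Ours": 80.1},
    "all":      {"w/o UV Code": 33.8, "w/o Normal": 68.8, "w/o Plane": 71.5, "Ours": 85.4},
}


def component_drops(row: dict) -> dict:
    """Score lost relative to the full model when each component is removed."""
    full = row["Ours"]
    return {name: round(full - score, 1)
            for name, score in row.items() if name != "Ours"}


if __name__ == "__main__":
    for obj, row in ABLATION.items():
        drops = component_drops(row)
        worst = max(drops, key=drops.get)
        print(f"{obj:9s} {drops}  largest drop: {worst}")
```

For every object in the table, removing the UV code causes the largest drop, followed by the normal and then the plane.
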

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Xu, C.; Chen, J.; Yao, M.; Zhou, J.; Zhang, L.; Liu, Y. 6DoF Pose Estimation of Transparent Object from a Single RGB-D Image. Sensors 2020, 20, 6790. https://doi.org/10.3390/s20236790