A Feature-Enhancement 6D Pose Estimation Method for Weakly Textured and Occluded Targets
Abstract
1. Introduction
2. Related Work
2.1. Pose Estimation from RGB Images
2.2. Pose Estimation from RGB-D Images
3. Method
3.1. Overall Process
3.2. Semantic Segmentation
3.3. Feature Extraction and Fusion
3.3.1. Color Feature Extraction
3.3.2. Point Cloud Feature Extraction
3.3.3. Feature Fusion
3.3.4. Pose Estimation and Refinement
4. Experiments and Results
4.1. Datasets
4.1.1. LineMOD Dataset
4.1.2. Occlusion LineMOD Dataset
4.2. Evaluation Metrics
4.3. Implementation Details
4.4. Results on the Datasets
4.4.1. Results on the LineMOD Dataset
4.4.2. Results on the Occlusion LineMOD Dataset
4.4.3. Ablation Experiments
- Improving only the color feature extraction network.
- Improving only the point cloud feature extraction network.
- Improving both networks, with all three variants evaluated without the refinement network.
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zhou, X.; Xu, X.; Liang, W.; Zeng, Z.; Yan, Z. Deep-Learning-Enhanced Multitarget Detection for End–Edge–Cloud Surveillance in Smart IoT. IEEE Internet Things J. 2021, 8, 12588–12596.
- Jiang, L.; Hu, R.; Wang, X.; Tu, W.; Zhang, M. Nonlinear Prediction with Deep Recurrent Neural Networks for Non-Blind Audio Bandwidth Extension. China Commun. 2018, 15, 72–85.
- Zhou, X.; Li, Y.; Liang, W. CNN-RNN Based Intelligent Recommendation for Online Medical Pre-Diagnosis Support. IEEE/ACM Trans. Comput. Biol. Bioinform. 2021, 18, 912–921.
- Yang, S. A Novel Study on Deep Learning Framework to Predict and Analyze the Financial Time Series Information. Future Gener. Comput. Syst. 2021, 125, 812–819.
- Shi, D.; Zheng, H. A Mortality Risk Assessment Approach on ICU Patients Clinical Medication Events Using Deep Learning. Comput. Model. Eng. Sci. 2021, 128, 161–181.
- Zhou, X.; Liang, W.; Wang, K.I.K.; Wang, H.; Yang, L.T.; Jin, Q. Deep-Learning-Enhanced Human Activity Recognition for Internet of Healthcare Things. IEEE Internet Things J. 2020, 7, 6429–6438.
- Zhou, X.; Liang, W.; Li, W.; Yan, K.; Shimizu, S.; Wang, K.I.K. Hierarchical Adversarial Attacks Against Graph-Neural-Network-Based IoT Network Intrusion Detection System. IEEE Internet Things J. 2022, 9, 9310–9319.
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. Adv. Neural Inf. Process. Syst. 2017, 30, 5099–5108. Available online: https://dl.acm.org/doi/10.5555/3295222.3295263 (accessed on 9 September 2025).
- Lv, P.; Wang, J.; Zhang, X.; Shi, C. Deep Supervision and Atrous Inception-Based U-Net Combining CRF for Automatic Liver Segmentation from CT. Sci. Rep. 2022, 12, 16995.
- Luo, X.; Chen, Z. English Text Quality Analysis Based on Recurrent Neural Network and Semantic Segmentation. Future Gener. Comput. Syst. 2020, 112, 507–511.
- Mo, C.; Sun, W. Point-by-Point Feature Extraction of Artificial Intelligence Images Based on the Internet of Things. Comput. Commun. 2020, 159, 1–8.
- Jiang, F.; Wang, K.; Dong, L.; Pan, C.; Xu, W.; Yang, K. AI Driven Heterogeneous MEC System with UAV Assistance for Dynamic Environment: Challenges and Solutions. IEEE Netw. 2021, 35, 400–408.
- Zhang, P.; Liu, X.; Li, W.; Yu, X. Pharmaceutical Cold Chain Management Based on Blockchain and Deep Learning. J. Internet Technol. 2021, 22, 1531–1542.
- Zhou, X.; Liang, W.; Wang, K.I.K.; Wang, H.; Yang, L.T.; Jin, Q. Edge-Enabled Two-Stage Scheduling Based on Deep Reinforcement Learning for Internet of Everything. IEEE Internet Things J. 2023, 10, 3295–3304.
- Li, C.; Wang, Z. Efficient Complex ISAR Object Recognition Using Adaptive Deep Relation Learning. IET Comput. Vis. 2020, 14, 185–191.
- Zhou, W.; Zhao, Y.; Chen, W.; Zhang, X. Research on Investment Portfolio Model Based on Neural Network and Genetic Algorithm in Big Data Era. J. Wirel. Com. Netw. 2020, 2020, 228.
- Sajja, G.S.; Meesala, M.K.; Addula, S.R.; Ravipati, P. Optimizing Retail Supply Chain Sales Forecasting with a Mayfly Algorithm-Enhanced Bidirectional Gated Recurrent Unit. SN Comput. Sci. 2025, 6, 737.
- Peng, S.; Liu, Y.; Huang, Q.; Zhou, X.; Bao, H. PVNet: Pixel-Wise Voting Network for 6DoF Pose Estimation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 4556–4565.
- Shugurov, I.; Li, F.; Busam, B.; Ilic, S. OSOP: A Multi-Stage One Shot Object Pose Estimation Framework. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 6825–6834.
- Xiang, Y.; Schmidt, T.; Narayanan, V.; Fox, D. PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes. arXiv 2017, arXiv:1711.00199.
- Qi, C.R.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum PointNets for 3D Object Detection from RGB-D Data. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 918–927.
- Xu, D.; Anguelov, D.; Jain, A. PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 244–253.
- Wang, C.; Xu, D.; Zhu, Y.; Martín-Martín, R.; Lu, C.; Fei-Fei, L.; Savarese, S. DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 430–439.
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 June 2017; pp. 652–660.
- He, Y.; Huang, H.; Fan, H.; Sun, J.; Liu, X.; Yi, L. FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 3003–3013.
- Zhao, G.; Yao, Y.; Wang, D.; Chen, Q. A Novel Depth and Color Feature Fusion Framework for 6D Object Pose Estimation. IEEE Trans. Multimed. 2021, 23, 1630–1639.
- Wang, G.; Manhardt, F.; Tombari, F.; Ji, X. GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 16606–16616.
- Do, T.T.; Cai, M.; Pham, T.; Reid, I. Deep-6DPose: Recovering 6D Object Pose from a Single RGB Image. arXiv 2018, arXiv:1802.10367.
- Park, K.; Patten, T.; Vincze, M. Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7667–7676.
- Shan, W.; Chen, S.; Ma, X.; Xu, Y. An Efficient Optimization Framework for 6D Pose Estimation in Terminal Vision Guidance of AUV Docking. In Proceedings of the 2023 5th International Conference on Robotics and Computer Vision (ICRCV), Nanjing, China, 22–24 September 2023; pp. 249–253.
- Hu, Y.; Fua, P.; Wang, W.; Salzmann, M. Single-Stage 6D Object Pose Estimation. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2927–2936.
- Avery, A.; Savakis, A. DeepRM: Deep Recurrent Matching for 6D Pose Refinement. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Vancouver, BC, Canada, 19–20 June 2023; pp. 6206–6214.
- Li, Z.; Wang, G.; Ji, X. CDPN: Coordinates-Based Disentangled Pose Network for Real-Time RGB-Based 6-DoF Object Pose Estimation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7677–7686.
- Zakharov, S.; Shugurov, I.; Ilic, S. DPOD: 6D Pose Object Detector and Refiner. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1941–1950.
- Zuo, L.; Xie, L.; Pan, H.; Wang, Z. A Lightweight Two-End Feature Fusion Network for Object 6D Pose Estimation. Machines 2022, 10, 254.
- Kehl, W.; Manhardt, F.; Tombari, F.; Ilic, S.; Navab, N. SSD-6D: Making RGB-Based 3D Detection and 6D Pose Estimation Great Again. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1530–1538.
- Li, F.; Zhang, Y.; Wang, X.; Liu, J.; Gao, Z.; Zhou, Y. NeRF-Pose: A First-Reconstruct-Then-Regress Approach for Weakly-Supervised 6D Object Pose Estimation. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Paris, France, 1–2 October 2023; pp. 2115–2125.
- Shugurov, I.; Zakharov, S.; Ilic, S. DPODv2: Dense Correspondence-Based 6 DoF Pose Estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 7417–7435.
- Petitjean, T.; Wu, Z.; Laligant, O.; Demonceaux, C. QaQ: Robust 6D Pose Estimation via Quality-Assessed RGB-D Fusion. In Proceedings of the 2023 18th International Conference on Machine Vision and Applications (MVA), Hamamatsu, Japan, 18–22 June 2023; pp. 1–7.
- He, Y.; Sun, W.; Huang, H.; Liu, J.; Fan, H.; Sun, J. PVN3D: A Deep Point-Wise 3D Keypoints Voting Network for 6DoF Pose Estimation. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11629–11638.
- Chen, W.; Duan, J.; Basevi, H.; Chang, H.J.; Leonardis, A. PointPoseNet: Point Pose Network for Robust 6D Object Pose Estimation. In Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass, CO, USA, 1–5 March 2020; pp. 2813–2822.
- Jiang, X.; Li, D.; Chen, H.; Zheng, Y.; Zhao, R.; Wu, L. Uni6D: A Unified CNN Framework without Projection Breakdown for 6D Pose Estimation. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 11164–11174.
- Sun, M.; Zheng, Y.; Bao, T.; Chen, J.; Jin, G.; Wu, L.; Zhao, R.; Jiang, X. Uni6Dv2: Noise Elimination for 6D Pose Estimation. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), San Diego, CA, USA, 10–12 May 2023; pp. 1832–1844.
- Kumar, A.; Shukla, P.; Kushwaha, V.; Nandi, G.C. Context-Aware 6D Pose Estimation of Known Objects Using RGB-D Data. arXiv 2022, arXiv:2212.05560.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
- Wang, H.; Situ, H.; Zhuang, C. 6D Pose Estimation for Bin-Picking Based on Improved Mask R-CNN and DenseFusion. In Proceedings of the 2021 26th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Vasteras, Sweden, 7–10 September 2021; pp. 1–7.
- Xu, Y.; Lin, K.-Y.; Zhang, G.; Wang, X.; Li, H. RNNPose: 6-DoF Object Pose Estimation via Recurrent Correspondence Field Estimation and Pose Optimization. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 4669–4683.
- Ning, X.; Yang, B.; Huang, S.; Zhang, Z.; Pan, B. RGB-Based Set Prediction Transformer of 6D Pose Estimation for Robotic Grasping Application. IEEE Access 2024, 12, 138047–138060.

Results on the LineMOD dataset (%). The first four methods use RGB input; the remaining methods use RGB-D input.

| Object | RNNPose [49] | CDPN [33] | DPOD [34] | RKDT [50] | DenseFusion [23] | NF6D [26] | Uni6D [42] | Uni6D-v2 [43] | CA6D [44] | FFB6D [25] | Ours |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ape | 88.2 | 64.4 | 87.7 | 91.1 | 92.3 | 93.5 | 93.7 | 95.7 | 99.0 | 98.4 | 95.6 |
| benchvise | 100 | 97.8 | 98.5 | 97.2 | 93.2 | 95.2 | 99.8 | 99.9 | 99.0 | 100 | 99.5 |
| camera | 98.0 | 91.7 | 96.1 | 88.4 | 94.4 | 93.9 | 95.9 | 95.7 | 98.0 | 99.4 | 99.6 |
| can | 99.3 | 95.9 | 99.7 | 98.2 | 93.1 | 95.5 | 99.0 | 96.0 | 98.0 | 99.0 | 99.4 |
| cat | 96.4 | 83.8 | 94.7 | 88.2 | 96.5 | 98.6 | 98.1 | 99.2 | 96.9 | 99.9 | 99.1 |
| driller | 99.7 | 96.2 | 98.8 | 97.8 | 87.0 | 94.8 | 99.1 | 99.2 | 97.9 | 100 | 99.6 |
| duck | 89.3 | 66.8 | 86.3 | 80.1 | 92.3 | 96.0 | 89.9 | 92.1 | 94.3 | 98.4 | 97.6 |
| eggbox | 99.5 | 99.7 | 99.9 | 99.3 | 99.8 | 100 | 100 | 100 | 95.2 | 100 | 99.9 |
| glue | 99.7 | 99.6 | 96.8 | 96.6 | 100 | 99.7 | 99.2 | 99.6 | 95.1 | 100 | 100 |
| holepuncher | 97.4 | 85.8 | 86.9 | 79.4 | 92.1 | 94.5 | 90.2 | 92.0 | 98.0 | 99.5 | 99.5 |
| iron | 100 | 97.9 | 100 | 98.5 | 97.0 | 97.5 | 99.4 | 97.9 | 97.9 | 99.9 | 99.5 |
| lamp | 99.8 | 97.9 | 96.8 | 98.5 | 95.3 | 98.9 | 99.4 | 98.4 | 99.9 | 99.5 | 99.7 |
| phone | 98.4 | 90.8 | 94.7 | 94.8 | 92.8 | 95.1 | 97.4 | 97.6 | 98.0 | 99.7 | 99.0 |
| MEAN | 97.4 | 89.9 | 95.2 | 92.9 | 94.3 | 96.4 | 97.0 | 97.2 | 97.5 | 99.5 | 99.1 |
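Per-object accuracies like those in the tables are conventionally computed with the ADD metric (and its symmetric variant ADD-S, used for objects such as eggbox and glue on LineMOD): a pose counts as correct when the mean model-point distance between the predicted and ground-truth poses falls below 10% of the object diameter. As a hedged illustration (the paper's exact evaluation code is not shown here; the function names and the `model_pts` array are hypothetical), the standard computation can be sketched in Python with NumPy:

```python
import numpy as np

def add_metric(model_pts, R_pred, t_pred, R_gt, t_gt):
    # ADD: mean distance between *corresponding* model points
    # transformed by the predicted and ground-truth poses.
    pred = model_pts @ R_pred.T + t_pred
    gt = model_pts @ R_gt.T + t_gt
    return np.linalg.norm(pred - gt, axis=1).mean()

def add_s_metric(model_pts, R_pred, t_pred, R_gt, t_gt):
    # ADD-S: for symmetric objects, match each predicted point to its
    # *nearest* ground-truth point instead of its fixed counterpart.
    pred = model_pts @ R_pred.T + t_pred
    gt = model_pts @ R_gt.T + t_gt
    dists = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=2)
    return dists.min(axis=1).mean()

def pose_correct(add_value, diameter, threshold=0.1):
    # Standard LineMOD criterion: ADD(-S) below 10% of the object diameter.
    return add_value < threshold * diameter
```

Under this convention, each percentage in the tables is the fraction of test frames for which `pose_correct` holds for that object.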
Results on the Occlusion LineMOD dataset (%).

| Object | FFB6D [25] | DenseFusion [23] | NF6D [26] | Ours |
|---|---|---|---|---|
| ape | 47.2 | 73.2 | 68.4 | 75.0 |
| can | 85.2 | 88.6 | 92.7 | 92.9 |
| cat | 45.7 | 72.2 | 78.0 | 69.3 |
| driller | 81.4 | 92.5 | 95.1 | 88.5 |
| duck | 53.9 | 59.7 | 62.1 | 63.8 |
| eggbox | 70.2 | 94.2 | 96.1 | 96.8 |
| glue | 60.1 | 92.6 | 93.5 | 96.5 |
| holepuncher | 85.9 | 78.8 | 83.6 | 90.6 |
| MEAN | 66.2 | 81.5 | 83.7 | 84.0 |
Ablation results on the LineMOD dataset (%); all variants are evaluated without the refinement network.

| Object | DenseFusion | Improved Color Feature Extraction | Improved Point Cloud Feature Extraction | Both |
|---|---|---|---|---|
| ape | 79.5 | 86.8 | 82.1 | 88.8 |
| benchvise | 84.2 | 96.4 | 93.1 | 98.1 |
| camera | 76.5 | 93.1 | 84.8 | 92.7 |
| can | 86.6 | 94.1 | 86.6 | 95.2 |
| cat | 88.8 | 95.4 | 92.0 | 97.7 |
| driller | 77.7 | 94.0 | 87.0 | 94.6 |
| duck | 76.3 | 87.4 | 84.6 | 89.8 |
| eggbox | 99.9 | 99.9 | 99.9 | 99.9 |
| glue | 99.4 | 99.9 | 99.9 | 99.9 |
| holepuncher | 79.0 | 94.5 | 87.3 | 93.6 |
| iron | 92.1 | 95.4 | 93.9 | 95.4 |
| lamp | 92.3 | 95.7 | 95.2 | 97.9 |
| phone | 88.0 | 95.8 | 92.7 | 96.2 |
| MEAN | 86.2 | 94.5 | 90.7 | 95.4 |
Ablation of the supervision network and the adaptive feature selection module on the LineMOD dataset (%).

| Object | Supervision Network | Adaptive Feature Selection Module | Both |
|---|---|---|---|
| ape | 90.1 | 90.1 | 90.9 |
| benchvise | 97.7 | 97.6 | 98.2 |
| camera | 97.0 | 96.8 | 97.3 |
| can | 97.0 | 97.9 | 98.4 |
| cat | 97.5 | 96.4 | 97.4 |
| driller | 95.7 | 97.0 | 97.8 |
| duck | 91.5 | 91.0 | 92.2 |
| eggbox | 99.9 | 99.9 | 99.9 |
| glue | 99.9 | 99.9 | 99.9 |
| holepuncher | 94.5 | 94.2 | 95.2 |
| iron | 96.2 | 98.3 | 98.5 |
| lamp | 97.5 | 98.2 | 99.0 |
| phone | 96.4 | 96.7 | 97.1 |
| MEAN | 96.2 | 96.3 | 97.0 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Liu, X.; Zhou, K.; Zeng, Q.; Li, P. A Feature-Enhancement 6D Pose Estimation Method for Weakly Textured and Occluded Targets. Electronics 2025, 14, 4125. https://doi.org/10.3390/electronics14204125
