Dynamic Die-Forging Scene Semantic Segmentation via Point Cloud–BEV Feature Fusion with Star Encoding
Abstract
1. Introduction
1. We construct a comprehensive semantic dataset of forging point clouds encompassing both simulated and real-world scenarios, providing an essential data foundation for 3D recognition tasks in complex industrial environments.
2. We propose a novel semantic segmentation model, PBNet, that integrates 3D point clouds with a Bird’s-Eye View (BEV) representation for forging applications. In the encoding stage of the BEV branch, we design a star-based encoding module and a hierarchical feature alignment mechanism to enhance the encoder’s nonlinear mapping capability and feature representation capacity. In the BEV decoding stage, we introduce a multi-level feature offset calibration module that corrects the feature misalignment caused by downsampling, so that features stay aligned throughout upsampling. Furthermore, we develop a weighted adaptive feature fusion module that dynamically integrates cross-view features from the point view and the BEV representation, improving the accuracy and robustness of forging point cloud segmentation.
3. When trained on the synthetic dataset and tested on the real dataset, PBNet improves mIoU by 1.1 percentage points compared to RPVNet. After further fine-tuning on the real dataset, PBNet achieves an mIoU of 85.9%, still outperforming RPVNet and PTv3 and maintaining the best performance.
2. Related Work
3. Data and Methods
3.1. Data Acquisition
3.2. Framework Overview
3.3. Point-to-BEV and BEV-to-Point
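As a rough illustration of the two mappings named in this heading, the sketch below pools per-point features into a 2D grid by dropping the height coordinate, then copies each cell's feature back to the points that fell into it. The max-pooling rule, the nearest-cell gather, and the names `point_to_bev`, `bev_to_point`, and `cell_size` are assumptions made for illustration; the paper's actual projection and interpolation scheme may differ.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int]


def point_to_bev(points: List[Point], feats: List[List[float]],
                 cell_size: float) -> Tuple[Dict[Cell, List[float]], List[Cell]]:
    """Max-pool per-point features into the BEV cell each point falls in."""
    grid: Dict[Cell, List[float]] = {}
    cells: List[Cell] = []
    for (x, y, _z), f in zip(points, feats):
        # The height coordinate is ignored: BEV is a top-down 2D grid.
        cell = (int(x // cell_size), int(y // cell_size))
        cells.append(cell)
        if cell in grid:
            grid[cell] = [max(a, b) for a, b in zip(grid[cell], f)]
        else:
            grid[cell] = list(f)
    return grid, cells


def bev_to_point(grid: Dict[Cell, List[float]],
                 cells: List[Cell]) -> List[List[float]]:
    """Copy each cell's feature back to every point that mapped to it."""
    return [list(grid[c]) for c in cells]


points = [(0.1, 0.1, 0.0), (0.4, 0.2, 1.0), (1.5, 0.2, 0.3)]
feats = [[1.0, 5.0], [3.0, 2.0], [7.0, 7.0]]
grid, cells = point_to_bev(points, feats, cell_size=1.0)
per_point = bev_to_point(grid, cells)
```

The first two points share a cell, so after the round trip both carry the channel-wise maximum of their original features, while the third point keeps its own.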
3.4. BEV Branch Encoding Module
3.4.1. Star-Based Encoding Module
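The module name suggests the star operation from "Rewrite the stars" (listed in the references): the element-wise product of two linear projections of the same input. The sketch below shows only that operation on a single feature vector, not the authors' module; the weights, biases, and the function names `linear` and `star` are illustrative assumptions.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Affine map: one output channel per row of `weight`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def star(x: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Vector:
    """Star operation: (W1 x + b1) multiplied element-wise by (W2 x + b2)."""
    branch_a = linear(x, w1, b1)
    branch_b = linear(x, w2, b2)
    return [p * q for p, q in zip(branch_a, branch_b)]


x = [1.0, 2.0]
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]    # identity branch
w2, b2 = [[1.0, 1.0], [1.0, -1.0]], [1.0, 0.0]   # mixing branch
out = star(x, w1, b1, w2, b2)
```

Because each output channel is a product of two linear forms, it contains pairwise terms in the input channels, which is one way a multiplication of this kind can increase nonlinear mapping capability without widening the layer. Practical blocks usually also apply an activation to one branch, omitted here.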
3.4.2. Dual-Branch Subsampling Module
3.5. Decoder Module
3.5.1. Multi-Level Feature Alignment Module
3.5.2. Weighted Feature Fusion Module
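One common way to weight two feature sources adaptively is to turn a pair of scores into softmax weights and take the weighted sum of the two feature vectors. The sketch below does this for a single point, assuming the scores are supplied by some learned layer that is not shown; `fuse`, `point_score`, and `bev_score` are hypothetical names, and the paper's module may compute its weights differently.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(point_feat: List[float], bev_feat: List[float],
         point_score: float, bev_score: float) -> List[float]:
    """Blend point-view and BEV features with weights that sum to one."""
    w_point, w_bev = softmax([point_score, bev_score])
    return [w_point * p + w_bev * b for p, b in zip(point_feat, bev_feat)]


balanced = fuse([2.0, 4.0], [4.0, 0.0], point_score=0.0, bev_score=0.0)
point_dominant = fuse([1.0], [0.0], point_score=50.0, bev_score=-50.0)
```

Equal scores reduce to a plain average, while a large score gap lets one view dominate. That per-point switch is what makes the fusion adaptive rather than fixed.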
3.6. Loss Function
3.7. Evaluation Metric
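The tables report per-class IoU and mIoU. For a class c, IoU is TP / (TP + FP + FN), and mIoU is the mean over classes. The sketch below computes both from integer label lists using a confusion matrix; the function names and the toy labels are illustrative assumptions, not the authors' evaluation code.

```python
from typing import List


def confusion_matrix(y_true: List[int], y_pred: List[int],
                     num_classes: int) -> List[List[int]]:
    """cm[t][p] counts points with true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_iou(cm: List[List[int]]) -> List[float]:
    """IoU_c = TP / (TP + FP + FN); NaN if class c never appears."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious


def mean_iou(ious: List[float]) -> float:
    """Average the per-class IoUs, skipping undefined (NaN) classes."""
    valid = [v for v in ious if v == v]
    return sum(valid) / len(valid)


cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
ious = per_class_iou(cm)
miou = mean_iou(ious)
```

In the result tables, the mIoU column is consistent, up to rounding, with the mean of the Forging and Cavity columns.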
4. Experiments
4.1. Experimental Setup
4.2. Quantitative Results
4.3. Qualitative Results
4.4. Ablation Study
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
References
1. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660.
2. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
3. Wang, Y. DGCNN: Learning Point Cloud Representations by Dynamic Graph CNN. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2020.
4. Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, Y.; Wang, Z.; Trigoni, N.; Markham, A. Randla-net: Efficient semantic segmentation of large-scale point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11108–11117.
5. Tang, H.; Liu, Z.; Zhao, S.; Lin, Y.; Lin, J.; Wang, H.; Han, S. Searching efficient 3d architectures with sparse point-voxel convolution. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXVIII; Springer: Cham, Switzerland, 2020; pp. 685–702.
6. Zhou, H.; Zhu, X.; Song, X.; Ma, Y.; Wang, Z.; Li, H.; Lin, D. Cylinder3d: An effective 3d framework for driving-scene lidar semantic segmentation. arXiv 2020, arXiv:2008.01550.
7. Milioto, A.; Vizzo, I.; Behley, J.; Stachniss, C. Rangenet++: Fast and accurate lidar semantic segmentation. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 4213–4220.
8. Zhang, Y.; Zhou, Z.; David, P.; Yue, X.; Xi, Z.; Gong, B.; Foroosh, H. Polarnet: An improved grid representation for online lidar point clouds semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9601–9610.
9. Liong, V.E.; Nguyen, T.N.T.; Widjaja, S.; Sharma, D.; Chong, Z.J. Amvnet: Assertion-based multi-view fusion network for lidar semantic segmentation. arXiv 2020, arXiv:2012.04934.
10. Xu, J.; Zhang, R.; Dou, J.; Zhu, Y.; Sun, J.; Pu, S. Rpvnet: A deep and efficient range-point-voxel fusion network for lidar point cloud segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 16024–16033.
11. Li, X.; Zhang, G.; Pan, H.; Wang, Z. Cpgnet: Cascade point-grid fusion network for real-time lidar semantic segmentation. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 11117–11123.
12. Atzmon, M.; Maron, H.; Lipman, Y. Point convolutional neural networks by extension operators. arXiv 2018, arXiv:1803.10091.
13. Wu, W.; Qi, Z.; Fuxin, L. Pointconv: Deep convolutional networks on 3d point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9621–9630.
14. Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic graph cnn for learning on point clouds. ACM Trans. Graph. (TOG) 2019, 38, 1–12.
15. Thomas, H.; Qi, C.R.; Deschaud, J.E.; Marcotegui, B.; Goulette, F.; Guibas, L.J. Kpconv: Flexible and deformable convolution for point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6411–6420.
16. Wu, X.; Jiang, L.; Wang, P.S.; Liu, Z.; Liu, X.; Qiao, Y.; Ouyang, W.; He, T.; Zhao, H. Point transformer v3: Simpler faster stronger. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 4840–4851.
17. Chang, A.X.; Funkhouser, T.; Guibas, L.; Hanrahan, P.; Huang, Q.; Li, Z.; Savarese, S.; Savva, M.; Song, S.; Su, H.; et al. Shapenet: An information-rich 3d model repository. arXiv 2015, arXiv:1512.03012.
18. Maturana, D.; Scherer, S. Voxnet: A 3d convolutional neural network for real-time object recognition. In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–2 October 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 922–928.
19. Qi, C.R.; Su, H.; Nießner, M.; Dai, A.; Yan, M.; Guibas, L.J. Volumetric and multi-view cnns for object classification on 3d data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 5648–5656.
20. Choy, C.B.; Xu, D.; Gwak, J.; Chen, K.; Savarese, S. 3d-r2n2: A unified approach for single and multi-view 3d object reconstruction. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; pp. 628–644.
21. Riegler, G.; Osman Ulusoy, A.; Geiger, A. Octnet: Learning deep 3d representations at high resolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3577–3586.
22. Zhou, Y.; Tuzel, O. Voxelnet: End-to-end learning for point cloud based 3d object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4490–4499.
23. Wang, Z.; Lu, F. VoxSegNet: Volumetric CNNs for semantic part segmentation of 3D shapes. IEEE Trans. Vis. Comput. Graph. 2019, 26, 2919–2930.
24. Choy, C.; Gwak, J.; Savarese, S. 4d spatio-temporal convnets: Minkowski convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3075–3084.
25. Cheng, R.; Razani, R.; Taghavi, E.; Li, E.; Liu, B. 2-s3net: Attentive feature fusion with adaptive feature selection for sparse semantic segmentation network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 12547–12556.
26. Ando, A.; Gidaris, S.; Bursuc, A.; Puy, G.; Boulch, A.; Marlet, R. Rangevit: Towards vision transformers for 3d semantic segmentation in autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 5240–5250.
27. Kong, L.; Liu, Y.; Chen, R.; Ma, Y.; Zhu, X.; Li, Y.; Hou, Y.; Qiao, Y.; Liu, Z. Rethinking range view representation for lidar segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 228–240.
28. Zhou, Z.; Zhang, Y.; Foroosh, H. Panoptic-polarnet: Proposal-free lidar point cloud panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13194–13203.
29. Zhou, Y.; Sun, P.; Zhang, Y.; Anguelov, D.; Gao, J.; Ouyang, T.; Guo, J.; Ngiam, J.; Vasudevan, V. End-to-end multi-view fusion for 3d object detection in lidar point clouds. In Proceedings of the Conference on Robot Learning, Virtual, 16–18 November 2020; pp. 923–932.
30. Wang, Y.; Fathi, A.; Kundu, A.; Ross, D.A.; Pantofaru, C.; Funkhouser, T.; Solomon, J. Pillar-based object detection for autonomous driving. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 18–34.
31. Zhang, F.; Fang, J.; Wah, B.; Torr, P. Deep fusionnet for point cloud semantic segmentation. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 644–663.
32. Gerdzhev, M.; Razani, R.; Taghavi, E.; Bingbing, L. Tornado-net: Multiview total variation semantic segmentation with diamond inception module. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 9543–9549.
33. Liu, Z.; Tang, H.; Lin, Y.; Han, S. Point-voxel cnn for efficient 3d deep learning. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 32.
34. Ma, X.; Dai, X.; Bai, Y.; Wang, Y.; Fu, Y. Rewrite the stars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 5694–5703.
35. Berman, M.; Triki, A.R.; Blaschko, M.B. The lovász-softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4413–4421.
36. Everingham, M.; Eslami, S.A.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes challenge: A retrospective. Int. J. Comput. Vis. 2015, 111, 98–136.
37. Qian, G.; Li, Y.; Peng, H.; Mai, J.; Hammoud, H.; Elhoseiny, M.; Ghanem, B. Pointnext: Revisiting pointnet++ with improved training and scaling strategies. Adv. Neural Inf. Process. Syst. 2022, 35, 23192–23204.
38. Wang, P.S. Octformer: Octree-based transformers for 3d point clouds. ACM Trans. Graph. (TOG) 2023, 42, 1–11.
39. Graham, B.; Engelcke, M.; Van Der Maaten, L. 3d semantic segmentation with submanifold sparse convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 9224–9232.
40. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
41. Vasu, P.K.A.; Gabriel, J.; Zhu, J.; Tuzel, O.; Ranjan, A. Mobileone: An improved one millisecond mobile backbone. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7907–7917.
42. Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. Shufflenet v2: Practical guidelines for efficient cnn architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131.
43. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324.
44. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. Ghostnet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589.








| Methods | Publication | Forging (%) | Cavity (%) | Speed (ms) | mIoU (%) |
|---|---|---|---|---|---|
| KPConv [15] | 2019 | 71.8 | 71.0 | − | 71.4 |
| RandLA-Net [4] | 2020 | 74.2 | 73.1 | 420 | 73.7 |
| PointNext [37] | 2022 | 75.3 | 74.2 | − | 74.8 |
| OctFormer [38] | 2023 | 77.4 | 76.3 | 91 | 76.9 |
| PTv3 [16] | 2024 | 79.1 | 78.6 | 70 | 78.85 |
| SparseConvNet [39] | 2018 | 70.1 | 69.5 | 200 | 69.8 |
| SPVNAS [5] | 2020 | 78.2 | 77.9 | 160 | 78.0 |
| Cylinder3D [6] | 2020 | 79.5 | 78.4 | 170 | 78.9 |
| RPVNet [10] | 2021 | 80.1 | 79.4 | 165 | 79.8 |
| CPGNet [11] | 2022 | 79.5 | 78.9 | 50 | 79.2 |
| PBNet (Ours) | - | 81.6 | 80.3 | 30 | 80.9 |

| Methods | Publication | Forging (%) | Cavity (%) | Speed (ms) | mIoU (%) |
|---|---|---|---|---|---|
| RandLA-Net [4] | 2020 | 78.4 | 77.6 | 420 | 78.0 |
| OctFormer [38] | 2023 | 82.6 | 80.9 | 91 | 81.8 |
| PTv3 [16] | 2024 | 85.2 | 83.6 | 70 | 84.4 |
| SparseConvNet [39] | 2018 | 74.3 | 72.1 | 200 | 73.2 |
| SPVNAS [5] | 2020 | 83.6 | 82.9 | 160 | 83.25 |
| Cylinder3D [6] | 2020 | 84.7 | 83.5 | 170 | 84.1 |
| RPVNet [10] | 2021 | 85.1 | 84.4 | 165 | 84.8 |
| CPGNet [11] | 2022 | 84.5 | 83.9 | 50 | 84.2 |
| PBNet (Ours) | - | 86.6 | 85.3 | 30 | 85.9 |

| No. | SEM | DDM | MFAM | WFFM | mIoU |
|---|---|---|---|---|---|
| 1 | ✓ | ✓ | ✓ | ✓ | |
| 2 | ✗ | ✓ | ✓ | ✓ | |
| 3 | ✓ | ✗ | ✓ | ✓ | |
| 4 | ✓ | ✓ | ✗ | ✓ | |
| 5 | ✓ | ✓ | ✓ | ✗ | |

| Module | Forging (%) | Cavity (%) | mIoU (%) |
|---|---|---|---|
| MobileOne Block [41] | 83.8 | 82.7 | 83.3 |
| ShuffleNet Block [42] | 81.8 | 80.1 | 80.9 |
| MobileNetV3 Block [43] | 80.2 | 79.1 | 79.7 |
| Ghost Block [44] | 79.4 | 78.7 | 79.1 |
| SEM | 86.6 | 85.3 | 85.9 |

| | 0.0 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 |
|---|---|---|---|---|---|---|
| mIoU | | | | | | |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Feng, X.; Wang, A.; Meng, G.; Xu, Y.; Yang, J.; Cheng, X.; Xiong, Y.; Wang, J. Dynamic Die-Forging Scene Semantic Segmentation via Point Cloud–BEV Feature Fusion with Star Encoding. Sensors 2026, 26, 708. https://doi.org/10.3390/s26020708

