Localization and Pixel-Confidence Network for Surface Defect Segmentation
Abstract
1. Introduction
- As shown in Figure 1, the background region occupies a significantly larger area than the defect region. This imbalance causes small defects to be overlooked during training, leading to degraded segmentation performance. Although existing studies have attempted to address this issue by improving sample-selection strategies, designing class-sensitive evaluation metrics, and developing weighted loss functions, several problems remain, including limited convergence efficiency, high sensitivity to hyperparameters, and poor cross-domain generalization [15,16].
- As shown in Figure 2, certain defect types contain numerous internal gap areas, and these gaps lie directly over the background region. When region partitioning relies solely on pixel values, texture, or morphological features [17,18], these internal gaps are often misclassified as background, and the resulting over-segmentation underestimates the defect area relative to the ground truth.
- (1)
- LPC-Net is proposed. It addresses two key challenges in surface defect segmentation: foreground–background area imbalance and the over-segmentation of internal defect gaps. Experimental results show that LPC-Net improves mean Pixel Accuracy (mPA) and mean Intersection over Union (mIoU) compared to baseline and state-of-the-art (SOTA) methods.
- (2)
- A Defect Localization Module (DLM) is introduced. It estimates the location of defects and embeds the spatial information into the loss function of the second-stage backbone through weighted coefficients. By dynamically adjusting pixel-wise loss weights, the model's sensitivity to defect regions is enhanced, mitigating the adverse effects of area imbalance (a minimal sketch of this weighting idea appears after this list).
- (3)
- A Pixel Confidence Module (PCM) is introduced. It captures the probabilistic distribution of neighboring pixels from the initial segmentation results and generates pixel confidence matrices to guide the model's decision-making, reducing the misclassification of fine internal gaps within defects.
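To make the weighting idea in contribution (2) concrete, the snippet below is a minimal PyTorch-style sketch, not the authors' implementation: a coarse defect-location map from a first-stage locator is converted into per-pixel weights that scale a standard cross-entropy loss, so pixels near suspected defects contribute more to the gradient. The weight formula, the `alpha` parameter, and the function names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def location_weighted_ce(logits, target, location_map, alpha=4.0):
    """Pixel-wise cross-entropy scaled by a coarse defect-location map.

    logits:       (N, C, H, W) raw scores from the second-stage backbone
    target:       (N, H, W)    integer class labels (0 = background)
    location_map: (N, H, W)    values in [0, 1] from a first-stage locator
    alpha:        strength of the up-weighting for suspected defect regions
                  (illustrative choice, not taken from the paper)
    """
    # Per-pixel cross-entropy, kept unreduced so it can be re-weighted.
    ce = F.cross_entropy(logits, target, reduction="none")  # (N, H, W)

    # Pixels flagged by the locator get weight 1 + alpha * p; background ~1.
    weights = 1.0 + alpha * location_map

    # Normalize by the total weight so the loss scale stays comparable.
    return (weights * ce).sum() / weights.sum()
```

Under these assumptions, a pixel the locator scores at 0.9 would be weighted 1 + 4.0 × 0.9 = 4.6 times as heavily as a pure background pixel, which is one simple way a location prior can counteract the area imbalance described above.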
2. Related Works
2.1. Image Segmentation
2.2. Feature Enhancement
2.3. Area Imbalance
3. Methods
3.1. Overall Network Architecture
3.2. Defect Localization Module
3.3. Pixel Confidence Module
- Increased misclassification at defect boundaries: In edge regions, especially where boundary kurtosis is high (e.g., sharp tips or needle-like protrusions), the PCM may incorrectly assign background labels to true defect pixels. This occurs because the surrounding pixels are predominantly background, which lowers the confidence score of the edge pixel after PCM processing (a minimal sketch of the neighborhood-confidence idea follows this list).
- Parameter explosion: Even with a small update coefficient, prolonged training can drive the accumulated confidence values toward saturation at the binary extremes (0 or 1), compromising segmentation accuracy.
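The following is a minimal sketch of the neighborhood-confidence idea described above, under the assumption that the module averages the predicted defect probabilities in a local window around each pixel and blends them with the pixel's own prediction. The window size, blending rule, and function names are illustrative and not taken from the paper; the fusion ratio is reminiscent of the weight ratio ablated in Section 4.5, although the exact fusion rule used there is not reproduced here.

```python
import torch
import torch.nn.functional as F

def pixel_confidence(prob_map, window=5, ratio=0.8):
    """Blend each pixel's defect probability with its neighborhood statistics.

    prob_map: (N, 1, H, W) defect probabilities from the initial segmentation
    window:   size of the square neighborhood (assumed, e.g., 5x5)
    ratio:    weight kept for the original prediction; the remainder comes
              from the neighborhood confidence (cf. the 0.8:0.2 setting that
              performs best on mIoU in the ablation table)
    """
    # Mean defect probability in the local window acts as a confidence map:
    # a pixel surrounded by confident defect pixels is itself likely defect.
    neighborhood = F.avg_pool2d(prob_map, kernel_size=window,
                                stride=1, padding=window // 2)

    # Fuse the original prediction with the neighborhood confidence.
    refined = ratio * prob_map + (1.0 - ratio) * neighborhood
    return refined
```

In this form, a small background-labeled gap surrounded by high-probability defect pixels is pulled toward the defect class, which is the behavior the module aims for; the boundary side effect noted in the first point arises because the same averaging also pulls thin defect tips toward the background.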
3.4. Loss Function
4. Experiments and Results
4.1. Datasets
4.2. Performance Metrics
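The comparisons in Section 4 are reported in terms of mean Pixel Accuracy (mPA) and mean Intersection over Union (mIoU), as stated in the contributions. The following is a minimal sketch of the standard way these two metrics are computed from predicted and ground-truth label maps; the function name and the two-class (background/defect) setup are assumptions for illustration.

```python
import numpy as np

def mpa_miou(pred, target, num_classes=2):
    """Standard mPA / mIoU from integer label maps (0 = background, 1 = defect).

    pred, target: integer NumPy arrays of the same shape.
    Returns the two metrics as percentages, matching the tables below.
    """
    pa, iou = [], []
    for c in range(num_classes):
        pred_c = (pred == c)
        true_c = (target == c)
        inter = np.logical_and(pred_c, true_c).sum()
        union = np.logical_or(pred_c, true_c).sum()
        # Per-class pixel accuracy: correctly labeled pixels of class c
        # over all ground-truth pixels of class c.
        pa.append(inter / max(true_c.sum(), 1))
        # Per-class IoU: intersection over union for class c.
        iou.append(inter / max(union, 1))
    return float(np.mean(pa)) * 100, float(np.mean(iou)) * 100
```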
4.3. Experimental Setup
4.4. Experiment and Analysis
4.5. Ablative Study
5. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Huang, Y.; Qiu, C.; Yuan, K. Surface defect saliency of magnetic tile. Vis. Comput. 2020, 36, 85–96. [Google Scholar] [CrossRef]
- Rasheed, A.; Zafar, B.; Rasheed, A.; Ali, N.; Sajid, M.; Dar, S.H.; Habib, U.; Shehryar, T.; Mahmood, M.T. Fabric defect detection using computer vision techniques: A comprehensive review. Math. Probl. Eng. 2020, 2020, 8189403. [Google Scholar] [CrossRef]
- Ren, Z.; Fang, F.; Yan, N.; Wu, Y. State of the art in defect detection based on machine vision. Int. J. Precis. Eng. Manuf.-Green Technol. 2022, 9, 661–691. [Google Scholar] [CrossRef]
- Malamas, E.N.; Petrakis, E.G.; Zervakis, M.; Petit, L.; Legat, J.D. A survey on industrial vision systems, applications and tools. Image Vis. Comput. 2003, 21, 171–188. [Google Scholar] [CrossRef]
- Wang, H.; Li, X.; Huo, L.; Hu, C. Global and edge enhanced transformer for semantic segmentation of remote sensing. Appl. Intell. 2024, 54, 5658–5673. [Google Scholar] [CrossRef]
- Tabernik, D.; Šela, S.; Skvarč, J.; Skočaj, D. Segmentation-based deep-learning approach for surface-defect detection. J. Intell. Manuf. 2020, 31, 759–776. [Google Scholar] [CrossRef]
- Dorafshan, S.; Thomas, R.J.; Maguire, M. Comparison of deep convolutional neural networks and edge detectors for image-based crack detection in concrete. Constr. Build. Mater. 2018, 186, 1031–1045. [Google Scholar] [CrossRef]
- Nock, R.; Nielsen, F. Statistical region merging. IEEE Trans. Pattern Anal. Mach. Intell. 2004, 26, 1452–1458. [Google Scholar] [CrossRef] [PubMed]
- Tang, Z.; Tang, J.; Zou, D.; Rao, J.; Qi, F. Two guidance joint network based on coarse map and edge map for camouflaged object detection. Appl. Intell. 2024, 54, 7531–7544. [Google Scholar] [CrossRef]
- Minaee, S.; Boykov, Y.; Porikli, F.; Plaza, A.; Kehtarnavaz, N.; Terzopoulos, D. Image segmentation using deep learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3523–3542. [Google Scholar] [CrossRef] [PubMed]
- Li, W.; Li, B.; Niu, S.; Wang, Z.; Wang, M.; Niu, T. LSA-Net: Location and shape attention network for automatic surface defect segmentation. J. Manuf. Process. 2023, 99, 65–77. [Google Scholar] [CrossRef]
- Li, X.; Sun, X.; Meng, Y.; Liang, J.; Wu, F.; Li, J. Dice loss for data-imbalanced NLP tasks. arXiv 2019, arXiv:1911.02855. [Google Scholar]
- Yu, Y.; Wang, C.; Fu, Q.; Kou, R.; Huang, F.; Yang, B.; Yang, T.; Gao, M. Techniques and challenges of image segmentation: A review. Electronics 2023, 12, 1199. [Google Scholar] [CrossRef]
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. arXiv 2017, arXiv:1708.02002. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Yun, S.; Han, D.; Oh, S.J.; Chun, S.; Choe, J.; Yoo, Y. Cutmix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6023–6032. [Google Scholar]
- Li, Y.; Zhang, Y.; Cui, W.; Lei, B.; Kuang, X.; Zhang, T. Dual encoder-based dynamic-channel graph convolutional network with edge enhancement for retinal vessel segmentation. IEEE Trans. Med. Imaging 2022, 41, 1975–1989. [Google Scholar] [CrossRef] [PubMed]
- Li, N.; Zhang, T.; Li, B.; Yuan, B.; Xu, S. RS-UNet: Lightweight network with reflection suppression for floating objects segmentation. Signal Image Video Process. 2023, 17, 4319–4326. [Google Scholar] [CrossRef]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. arXiv 2015, arXiv:1511.00561. [Google Scholar] [CrossRef] [PubMed]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Berlin, Germany, 2015; pp. 234–241. [Google Scholar]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122. [Google Scholar]
- Ji, J.; Lu, X.; Luo, M.; Yin, M.; Miao, Q.; Liu, X. Parallel fully convolutional network for semantic segmentation. IEEE Access 2020, 9, 673–682. [Google Scholar] [CrossRef]
- Zhang, C.; Jiang, W.; Zhang, Y.; Wang, W.; Zhao, Q.; Wang, C. Transformer and CNN hybrid deep neural network for semantic segmentation of very-high-resolution remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–20. [Google Scholar] [CrossRef]
- Hu, K.; Zhang, Z.; Niu, X.; Zhang, Y.; Cao, C.; Xiao, F.; Gao, X. Retinal vessel segmentation of color fundus images using multiscale convolutional neural network with an improved cross-entropy loss function. Neurocomputing 2018, 309, 179–191. [Google Scholar] [CrossRef]
- Liu, Z.; Yeoh, J.K.; Gu, X.; Dong, Q.; Chen, Y.; Wu, W.; Wang, L.; Wang, D. Automatic pixel-level detection of vertical cracks in asphalt pavement based on GPR investigation and improved mask R-CNN. Autom. Constr. 2023, 146, 104689. [Google Scholar] [CrossRef]
- Xie, X.; Pan, X.; Zhang, W.; An, J. A context hierarchical integrated network for medical image segmentation. Comput. Electr. Eng. 2022, 101, 108029. [Google Scholar] [CrossRef]
- Wang, T.S.; Kim, G.T.; Kim, M.; Jang, J. Contrast Enhancement-Based Preprocessing Process to Improve Deep Learning Object Task Performance and Results. Appl. Sci. 2023, 13, 10760. [Google Scholar] [CrossRef]
- Dai, X.; Xia, M.; Weng, L.; Hu, K.; Lin, H.; Qian, M. Multiscale location attention network for building and water segmentation of remote sensing image. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–19. [Google Scholar] [CrossRef]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv 2015, arXiv:1506.01497. [Google Scholar] [CrossRef] [PubMed]
- Zhang, J.; Zhang, Y.; Xu, X. Objectaug: Object-level data augmentation for semantic image segmentation. In Proceedings of the 2021 IEEE International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; pp. 1–8. [Google Scholar]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
- Xu, J.; Xiong, Z.; Bhattacharyya, S.P. PIDNet: A real-time semantic segmentation network inspired by PID controllers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 19529–19539. [Google Scholar]
- Hong, Y.; Pan, H.; Sun, W.; Jia, Y. Deep dual-resolution networks for real-time and accurate semantic segmentation of road scenes. arXiv 2021, arXiv:2101.06085. [Google Scholar]
- Zhang, W.; Pang, J.; Chen, K.; Loy, C.C. K-net: Towards unified image segmentation. Adv. Neural Inf. Process. Syst. 2021, 34, 10326–10338. [Google Scholar]
Layer Name | Output Size | Type |
---|---|---|
Input Image | / | |
Bottleneck block1 | | |
Bottleneck block2 | | |
Bottleneck block3 | | |
Bottleneck block4 | | |
Bottleneck block5 | | |
Bottleneck block6 | | |
Bottleneck block7 | | |
Bottleneck block8 | | |
Upsample1 | | |
Upsample2 | | |
Model | CF Defect Dataset | | | MT Defect Dataset | | |
---|---|---|---|---|---|---|
FCN | 92.28 | 87.12 | 84.44 | 81.00 | 74.22 | 43.48 |
U-Net | 93.37 | 88.79 | 86.67 | 79.17 | 75.06 | 44.93 |
PIDNet | 91.68 | 88.07 | 86.67 | 82.22 | 76.16 | 47.83 |
DDRNet | 92.96 | 88.90 | 88.89 | 81.99 | 75.59 | 46.83 |
K-Net | 94.11 | 89.28 | 88.89 | 81.84 | 73.91 | 42.03 |
DeepLabV3+ | 93.24 | 87.82 | 86.67 | 79.72 | 75.32 | 44.93 |
LPC-Net | 94.52 | 89.68 | 91.11 | 83.65 | 76.48 | 49.28 |
Stage | Parameters | GFLOPs | Latency | FPS |
---|---|---|---|---|
First stage | 0.16 M | 0.65 | 1.26 ms | 794.2 |
Second stage | 39.75 M | 14.84 | 6.66 ms | 150.1 |
LPC-Net (total) | 39.91 M | 15.49 | 7.92 ms | 126.2 |
Model | CF Defect Dataset | | | MT Defect Dataset | | |
---|---|---|---|---|---|---|
DeepLabV3+ | 93.24 | 87.82 | 86.67 | 79.72 | 75.32 | 44.93 |
LPC-Net (DLM) | 94.22 | 89.40 | 88.89 | 83.30 | 76.88 | 47.83 |
LPC-Net (PCM) | 94.00 | 89.60 | 86.67 | 80.20 | 75.22 | 43.48 |
LPC-Net (DLM + PCM) | 94.52 | 89.68 | 91.11 | 83.65 | 76.48 | 49.28 |
Ratio | | | |
---|---|---|---|
1.0:0 | 93.24 | 87.82 | 86.67 |
0.9:0.1 | 93.64 | 88.15 | 86.67 |
0.85:0.15 | 94.08 | 88.96 | 88.89 |
0.8:0.2 | 94.00 | 89.60 | 86.67 |
0.75:0.25 | 93.08 | 88.03 | 84.44 |
0.7:0.3 | 92.27 | 88.60 | 86.67 |