Leveraging Vision Foundation Model via PConv-Based Fine-Tuning with Automated Prompter for Defect Segmentation
Abstract
1. Introduction
- We propose MSPCA-LoRA, a parameter-efficient fine-tuning (PEFT) method that combines partial convolution with LoRA to transfer knowledge learned on natural images to industrial defect images. Beyond enabling broad knowledge transfer to industrial scenarios, it strengthens the model's sensitivity to local prior knowledge across scales while remaining parameter-efficient (a hedged sketch of this design follows the list).
- We devise the IPEG to generate prompt embeddings automatically, eliminating manual prompt design during both training and inference. These prompt embeddings then guide the prediction of segmentation masks, enhancing SAM's practical applicability in industrial settings (see the prompter sketch after this list).
- We make slight yet effective architectural adjustments to the mask decoder, turning SAM into an end-to-end semantic segmentation model suited to defect segmentation tasks. On this architecture, we combine MSPCA-LoRA and the IPEG to build PA-SAM. We conduct extensive comparison and ablation experiments on two commonly used defect segmentation datasets, demonstrating the effectiveness of the proposed method on such downstream tasks.
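To make the first contribution concrete, below is a minimal PyTorch sketch of the idea behind MSPCA-LoRA: a standard LoRA branch (down-projection A, zero-initialized up-projection B) whose low-rank bottleneck is passed through partial convolutions (the PConv operator of Chen et al., "Run, don't walk") at several kernel sizes before projecting back. The class names, kernel sizes, channel-split ratio, and summation-based fusion are our illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class PartialConv2d(nn.Module):
    """PConv: convolve only the first dim // n_div channels; pass the rest through untouched."""
    def __init__(self, dim: int, kernel_size: int, n_div: int = 4):
        super().__init__()
        self.dim_conv = dim // n_div
        self.dim_keep = dim - self.dim_conv
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.split(x, [self.dim_conv, self.dim_keep], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)

class MSPCALoRA(nn.Module):
    """Sketch of a LoRA adapter with multi-scale PConv branches in the rank-r bottleneck."""
    def __init__(self, in_dim: int, out_dim: int, rank: int = 4,
                 kernel_sizes=(3, 5, 7), alpha: float = 4.0):
        super().__init__()
        self.down = nn.Linear(in_dim, rank, bias=False)            # LoRA matrix A
        self.pconvs = nn.ModuleList([PartialConv2d(rank, k) for k in kernel_sizes])
        self.up = nn.Linear(rank, out_dim, bias=False)             # LoRA matrix B
        nn.init.zeros_(self.up.weight)                             # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, tokens: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # tokens: (B, N, C) sequence from a ViT block, with N == h * w
        z = self.down(tokens)                                      # (B, N, r)
        zmap = z.transpose(1, 2).reshape(z.size(0), -1, h, w)      # (B, r, h, w)
        zmap = sum(pc(zmap) for pc in self.pconvs)                 # fuse local priors across scales
        z = zmap.flatten(2).transpose(1, 2)                        # back to (B, N, r)
        return self.up(z) * self.scale                             # residual added to the frozen output
```

In the same hedged spirit, the second contribution can be pictured as a small head that maps frozen image-encoder features directly to a fixed set of sparse prompt embeddings, so neither training nor inference needs hand-drawn points or boxes. The layer choices, pooling, and number of prompts below are assumptions, not the exact IPEG.

```python
class IPEG(nn.Module):
    """Image-to-Prompt Embedding Generator (sketch): prompt embeddings from image features."""
    def __init__(self, feat_dim: int = 256, prompt_dim: int = 256, num_prompts: int = 4):
        super().__init__()
        self.num_prompts = num_prompts
        self.squeeze = nn.Sequential(
            nn.Conv2d(feat_dim, prompt_dim, kernel_size=3, padding=1),
            nn.GELU(),
            nn.AdaptiveAvgPool2d(1),                   # (B, prompt_dim, 1, 1)
        )
        self.to_prompts = nn.Linear(prompt_dim, prompt_dim * num_prompts)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) from the adapter-tuned SAM image encoder
        g = self.squeeze(feats).flatten(1)             # (B, prompt_dim)
        p = self.to_prompts(g)                         # (B, prompt_dim * num_prompts)
        return p.view(p.size(0), self.num_prompts, -1)  # sparse prompts for the mask decoder
```

In both sketches only the small added modules would be trained while the pretrained SAM weights stay frozen, which is what keeps the overall approach parameter-efficient.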
2. Related Work
2.1. Defect Segmentation Models
2.2. Segment Anything Model
2.3. Adaptation of SAM
3. Methodology
3.1. Overview of PA-SAM
3.2. MSPCA-LoRA
3.3. Image-to-Prompt Embedding Generator
3.4. Multi-Class Mask Decoder
3.5. Loss Function for Segmentation
4. Experiments and Analysis of Results
4.1. Datasets
4.2. Implementations
4.3. Evaluation Metrics
4.4. Comparison Study
4.5. Ablation Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Tulbure, A.A.; Tulbure, A.A.; Dulf, E.H. A review on modern defect detection models using DCNNs–Deep convolutional neural networks. J. Adv. Res. 2022, 35, 33–48.
- Tang, B.; Chen, L.; Sun, W.; Lin, Z.K. Review of surface defect detection of steel products based on machine vision. IET Image Process. 2023, 17, 303–322.
- Ren, Z.; Fang, F.; Yan, N.; Wu, Y. State of the art in defect detection based on machine vision. Int. J. Precis. Eng. Manuf.-Green Technol. 2022, 9, 661–691.
- Saberironaghi, A.; Ren, J.; El-Gindy, M. Defect detection methods for industrial products using deep learning techniques: A review. Algorithms 2023, 16, 95.
- Minaee, S.; Boykov, Y.; Porikli, F.; Plaza, A.; Kehtarnavaz, N.; Terzopoulos, D. Image segmentation using deep learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3523–3542.
- Shelhamer, E.; Long, J.; Darrell, T. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241.
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A. Semantic image segmentation with deep convolutional nets and fully connected CRFs. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the Computer Vision—ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part V; Springer: Cham, Switzerland, 2014; pp. 740–755.
- Mottaghi, R.; Chen, X.; Liu, X.; Cho, N.G.; Lee, S.W.; Fidler, S.; Urtasun, R.; Yuille, A. The role of context for object detection and semantic segmentation in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 891–898.
- Wang, Y.; Zhang, Y.; Jiang, Z.; Zheng, L.; Chen, J.; Lu, J. Robust learning against label noise based on activation trend tracking. IEEE Trans. Instrum. Meas. 2022, 71, 5025812.
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. LLaMA: Open and efficient foundation language models. arXiv 2023, arXiv:2302.13971.
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. GPT-4 technical report. arXiv 2023, arXiv:2303.08774.
- Zhou, C.; Li, Q.; Li, C.; Yu, J.; Liu, Y.; Wang, G.; Zhang, K.; Ji, C.; Yan, Q.; He, L.; et al. A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT. Int. J. Mach. Learn. Cybern. 2024, 15, 1–65.
- Han, Z.; Gao, C.; Liu, J.; Zhang, J.; Zhang, S.Q. Parameter-efficient fine-tuning for large models: A comprehensive survey. arXiv 2024, arXiv:2403.14608.
- Houlsby, N.; Giurgiu, A.; Jastrzebski, S.; Morrone, B.; De Laroussilhe, Q.; Gesmundo, A.; Attariyan, M.; Gelly, S. Parameter-efficient transfer learning for NLP. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 2790–2799.
- Sung, Y.L.; Cho, J.; Bansal, M. LST: Ladder side-tuning for parameter and memory efficient transfer learning. Adv. Neural Inf. Process. Syst. 2022, 35, 12991–13005.
- Li, X.L.; Liang, P. Prefix-tuning: Optimizing continuous prompts for generation. arXiv 2021, arXiv:2101.00190.
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-rank adaptation of large language models. arXiv 2021, arXiv:2106.09685.
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 4015–4026.
- Ji, W.; Li, J.; Bi, Q.; Liu, T.; Li, W.; Cheng, L. Segment anything is not always perfect: An investigation of SAM on different real-world applications. Mach. Intell. Res. 2024, 21, 617–630.
- Hu, B.; Gao, B.; Tan, C.; Wu, T.; Li, S.Z. Segment anything in defect detection. arXiv 2023, arXiv:2311.10245.
- Chen, K.; Liu, C.; Chen, H.; Zhang, H.; Li, W.; Zou, Z.; Shi, Z. RSPrompter: Learning to prompt for remote sensing instance segmentation based on visual foundation model. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4701117.
- Zhang, K.; Liu, D. Customized segment anything model for medical image segmentation. arXiv 2023, arXiv:2304.13785.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
- Yang, B.; Liu, Z.; Duan, G.; Tan, J. Residual shape adaptive dense-nested Unet: Redesign the long lateral skip connections for metal surface tiny defect inspection. Pattern Recognit. 2024, 147, 110073.
- Kong, D.; Hu, X.; Gong, Z.; Zhang, D. Segmentation of void defects in X-ray images of chip solder joints based on PCB-DeepLabV3 algorithm. Sci. Rep. 2024, 14, 11925.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Strudel, R.; Garcia, R.; Laptev, I.; Schmid, C. Segmenter: Transformer for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 7262–7272.
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. arXiv 2021, arXiv:2105.15203.
- Cheng, B.; Schwing, A.; Kirillov, A. Per-pixel classification is not all you need for semantic segmentation. Adv. Neural Inf. Process. Syst. 2021, 34, 17864–17875.
- Zhao, L.; Zhang, Y.; Duan, J.; Yu, J. Cross-supervised contrastive learning domain adaptation network for steel defect segmentation. Adv. Eng. Inform. 2025, 64, 102964.
- Ma, M.; Yang, L.; Liu, Y.; Yu, H. A transformer-based network with feature complementary fusion for crack defect detection. IEEE Trans. Intell. Transp. Syst. 2024, 25, 16989–17006.
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 8748–8763.
- Liu, S.; Zeng, Z.; Ren, T.; Li, F.; Zhang, H.; Yang, J.; Jiang, Q.; Li, C.; Yang, J.; Su, H.; et al. Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; Springer: Cham, Switzerland, 2024; pp. 38–55.
- He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 16000–16009.
- Zhang, C.; Liu, L.; Cui, Y.; Huang, G.; Lin, W.; Yang, Y.; Hu, Y. A comprehensive survey on segment anything model for vision and beyond. arXiv 2023, arXiv:2305.08196.
- Ma, J.; He, Y.; Li, F.; Han, L.; You, C.; Wang, B. Segment anything in medical images. Nat. Commun. 2024, 15, 654.
- Cen, J.; Zhou, Z.; Fang, J.; Shen, W.; Xie, L.; Jiang, D.; Zhang, X.; Tian, Q. Segment anything in 3D with NeRFs. Adv. Neural Inf. Process. Syst. 2023, 36, 25971–25990.
- Zhang, R.; Jiang, Z.; Guo, Z.; Yan, S.; Pan, J.; Ma, X.; Dong, H.; Gao, P.; Li, H. Personalize segment anything model with one shot. arXiv 2023, arXiv:2305.03048.
- Chen, T.; Zhu, L.; Deng, C.; Cao, R.; Wang, Y.; Zhang, S.; Li, Z.; Sun, L.; Zang, Y.; Mao, P. SAM-Adapter: Adapting segment anything in underperformed scenes. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 3367–3375.
- Chen, T.; Lu, A.; Zhu, L.; Ding, C.; Yu, C.; Ji, D.; Li, Z.; Sun, L.; Mao, P.; Zang, Y. SAM2-Adapter: Evaluating & adapting segment anything 2 in downstream tasks: Camouflage, shadow, medical image segmentation, and more. arXiv 2024, arXiv:2408.04579.
- Ye, Z.; Lovell, L.; Faramarzi, A.; Ninić, J. SAM-based instance segmentation models for the automation of structural damage detection. Adv. Eng. Inform. 2024, 62, 102826.
- Pu, X.; Jia, H.; Zheng, L.; Wang, F.; Xu, F. ClassWise-SAM-Adapter: Parameter-efficient fine-tuning adapts segment anything to SAR domain for semantic segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 4791–4804.
- Yan, Z.; Li, J.; Li, X.; Zhou, R.; Zhang, W.; Feng, Y.; Diao, W.; Fu, K.; Sun, X. RingMo-SAM: A foundation model for segment anything in multimodal remote-sensing images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5625716.
- Qiu, Z.; Hu, Y.; Li, H.; Liu, J. Learnable ophthalmology SAM. arXiv 2023, arXiv:2304.13425.
- Wang, A.; Islam, M.; Xu, M.; Zhang, Y.; Ren, H. SAM meets robotic surgery: An empirical study on generalization, robustness and adaptation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Vancouver, BC, Canada, 8–12 October 2023; Springer: Cham, Switzerland, 2023; pp. 234–244.
- Li, Y.; Wang, D.; Yuan, C.; Li, H.; Hu, J. Enhancing agricultural image segmentation with an agricultural segment anything model adapter. Sensors 2023, 23, 7884.
- Zhong, Z.; Tang, Z.; He, T.; Fang, H.; Yuan, C. Convolution meets LoRA: Parameter efficient finetuning for segment anything model. arXiv 2024, arXiv:2401.17868.
- Shazeer, N.; Mirhoseini, A.; Maziarz, K.; Davis, A.; Le, Q.; Hinton, G.; Dean, J. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv 2017, arXiv:1701.06538.
- Cao, S.; Wu, Q.; Ma, L. TongueSAM: An Universal Tongue Segmentation Model Based on SAM with Zero-Shot. In Proceedings of the 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Istanbul, Turkey, 5–8 December 2023; pp. 4520–4526.
- Seyoum Wahd, A.; Felfeliyan, B.; Zhou, Y.; Ghosh, S.; McArthur, A.; Zhang, J.; Jaremko, J.L.; Hareendranathan, A. Sam2Rad: A segmentation model for medical images with learnable prompts. arXiv 2024, arXiv:2409.06821.
- Liu, N.; Xu, X.; Su, Y.; Zhang, H.; Li, H.C. PointSAM: Pointly-supervised segment anything model for remote sensing images. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5608515.
- Tabernik, D.; Šela, S.; Skvarč, J.; Skočaj, D. Segmentation-based deep-learning approach for surface-defect detection. J. Intell. Manuf. 2020, 31, 759–776.
- Huang, Y.; Yang, X.; Liu, L.; Zhou, H.; Chang, A.; Zhou, X.; Chen, R.; Yu, J.; Chen, J.; Chen, C.; et al. Segment anything model for medical images? Med. Image Anal. 2024, 92, 103061.
- Shaharabany, T.; Dahan, A.; Giryes, R.; Wolf, L. AutoSAM: Adapting SAM to medical images by overloading the prompt encoder. arXiv 2023, arXiv:2306.06370.
- Zhang, X.; Liu, Y.; Lin, Y.; Liao, Q.; Li, Y. UV-SAM: Adapting segment anything model for urban village identification. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 26–27 February 2024; Volume 38, pp. 22520–22528.
- Delle Castelle, C.; Spampinato, F.; Proietto Salanitri, F.; Bellitto, G.; Spampinato, C. Leveraging SAM and learnable prompts for pancreatic MRI segmentation. In Proceedings of the International Workshop on Personalized Incremental Learning in Medicine, Marrakesh, Morocco, 10 October 2024; Springer: Cham, Switzerland, 2024; pp. 25–34.
- Chen, J.; Kao, S.H.; He, H.; Zhuo, W.; Wen, S.; Lee, C.H.; Chan, S.H.G. Run, don't walk: Chasing higher FLOPS for faster neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 12021–12031.
- Song, G.; Song, K.; Yan, Y. Saliency detection for strip steel surface defects using multiple constraints and improved texture features. Opt. Lasers Eng. 2020, 128, 106000.
- Wang, Y.; Zhang, Y.; Jiang, Z.; Zheng, L.; Chen, J.; Lu, J. Prototype-based supervised contrastive learning method for noisy label correction in tire defect detection. IEEE Sens. J. 2023, 24, 660–670.
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
- Cheng, B.; Misra, I.; Schwing, A.G.; Kirillov, A. Masked-attention mask transformer for universal image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1290–1299.
- Xu, J.; Xiong, Z.; Bhattacharyya, S.P. PIDNet: A real-time semantic segmentation network inspired by PID controllers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 19529–19539.
| Method | SD-Saliency-900 mIoU | SD-Saliency-900 mDice | Tire-Seg mIoU | Tire-Seg mDice |
|---|---|---|---|---|
| U-Net | 62.21 | 74.45 | 59.88 | 73.79 |
| DeepLabv3+ | 65.27 | 78.37 | 55.15 | 70.37 |
| SegFormer | 66.52 | 79.50 | 61.25 | 74.72 |
| Mask2Former | 71.73 | 83.74 | 66.87 | 78.77 |
| PIDNet | 71.25 | 82.86 | 64.72 | 76.38 |
| PA-SAM | 73.87 | 84.90 | 68.30 | 80.22 |
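For reference, the mIoU and mDice scores in the comparison above are class-averaged intersection-over-union and Dice coefficients. A minimal sketch of how such scores can be computed follows; the paper's exact averaging and background handling may differ.

```python
import torch

def miou_mdice(pred: torch.Tensor, target: torch.Tensor,
               num_classes: int, eps: float = 1e-6) -> tuple[float, float]:
    """Mean IoU and mean Dice over classes for integer label maps of shape (B, H, W)."""
    ious, dices = [], []
    for c in range(num_classes):
        p, t = pred == c, target == c
        inter = (p & t).sum().float()
        ious.append((inter + eps) / ((p | t).sum().float() + eps))
        dices.append((2 * inter + eps) / (p.sum().float() + t.sum().float() + eps))
    return torch.stack(ious).mean().item(), torch.stack(dices).mean().item()
```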
| Rank | MSPCA-LoRA Params (M) | Conv-LoRA Params (M) | LoRA Params (M) |
|---|---|---|---|
| | 0.5970 | 0.5979 | 0.5967 |
| | 0.6038 | 0.6076 | 0.6028 |
| | 0.6108 | 0.6197 | 0.6090 |
| | 0.6339 | 0.6695 | 0.6274 |
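As a sanity check on the parameter counts above: for a frozen weight $W \in \mathbb{R}^{d \times k}$, a plain LoRA adapter learns the low-rank update

$$\Delta W = B A, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},$$

adding $r\,(d + k)$ trainable parameters per adapted matrix, i.e., growth linear in the rank $r$. Conv-LoRA and MSPCA-LoRA add slightly more than plain LoRA at each rank because of the small convolutional modules inserted in the $r$-dimensional bottleneck; the exact overhead depends on design details not reproduced here.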
| PEFT Method | SD-Saliency-900 mIoU | SD-Saliency-900 mDice | Tire-Seg mIoU | Tire-Seg mDice |
|---|---|---|---|---|
| LoRA | 70.70 | 82.29 | 64.08 | 76.43 |
| Conv-LoRA | 72.34 | 83.39 | 66.71 | 78.70 |
| MSPCA-LoRA | 72.76 | 83.70 | 66.87 | 78.91 |
| MSPCA-LoRA | IPEG | Params (M)/Ratio (%) | SD-Saliency-900 mIoU | SD-Saliency-900 mDice | Tire-Seg mIoU | Tire-Seg mDice |
|---|---|---|---|---|---|---|
| | | 4.20/4.48 | 50.16 | 63.70 | 43.90 | 53.92 |
| ✓ | | 26.19/22.60 | 72.76 | 83.80 | 66.87 | 78.91 |
| | ✓ | 5.16/5.45 | 52.93 | 67.65 | 45.00 | 55.15 |
| ✓ | ✓ | 27.15/23.24 | 73.87 | 84.90 | 68.30 | 80.22 |
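The Params (M)/Ratio (%) column contrasts trainable parameters with the full model size. Numbers of this kind are typically produced with a helper like the following; the function name and rounding are ours, not the authors'.

```python
import torch

def trainable_params(model: torch.nn.Module) -> tuple[float, float]:
    """Return (trainable parameters in millions, trainable share of all parameters in %)."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable / 1e6, 100.0 * trainable / total
```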
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Jiang, Y.; Chen, J.; Lu, J. Leveraging Vision Foundation Model via PConv-Based Fine-Tuning with Automated Prompter for Defect Segmentation. Sensors 2025, 25, 2417. https://doi.org/10.3390/s25082417