You Only Look Once–Aluminum: A Detection Model for Complex Aluminum Surface Defects Based on Improved YOLOv8
Abstract
1. Introduction
- We design a C2f module that incorporates the Universal Inverted Bottleneck and Space-to-depth Conv (C2f-US). It enhances the model’s ability to extract features from low-resolution images and small objects while simplifying the architecture and reducing the computational cost in floating-point operations.
- We introduce the Channel Prior and Multi-scale Convolutional Attention (CPMSCA) mechanism, which improves the module’s ability to capture contextual information through a symmetric strip-convolution structure, enabling more efficient extraction of complex defect features.
- We propose an Omni-Dimensional Efficient-reparameterized Generalized Feature Pyramid Network (ODE-RepGFPN) to capture both high-level semantic and low-level spatial information more effectively. We also refactor the Rep module to make it more efficient during training.
- We propose Focaler-WIoU (FW-IoU), which integrates Wise-IoU (WIoU) and Focaler-IoU, so that the model focuses on hard-to-classify samples lying near the boundary between positive and negative samples, and to alleviate data imbalance.
2. Related Works
3. Methods
3.1. YOLOv8n Model Architecture
3.2. YOLO-AL Model Architecture
3.3. C2f-UIB with SPD-Conv
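As a concrete illustration of the SPD-Conv ingredient, below is a minimal PyTorch sketch of the space-to-depth building block described by Sunkara and Luo (cited in the references). The class name, channel widths, and the SiLU activation follow common YOLOv8 convention and are illustrative assumptions, not the paper’s exact code.

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    """Space-to-depth followed by a non-strided convolution.

    Unlike a stride-2 convolution or pooling, the space-to-depth step keeps
    every input pixel, which is why it helps on low-resolution images and
    small defects.
    """
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        # space-to-depth quadruples the channel count before the conv
        self.conv = nn.Conv2d(4 * c_in, c_out, 3, stride=1, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, 4C, H/2, W/2): stack the four pixel-parity sub-grids
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.act(self.bn(self.conv(x)))
```

For a (1, 64, 80, 80) input, `SPDConv(64, 128)` produces a (1, 128, 40, 40) map: the same 2× downsampling as a stride-2 convolution, but without discarding pixels.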
3.4. Channel Prior and Multi-Scale Convolutional Attention
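Since the exact CPMSCA layout is defined by the paper’s figures, the sketch below only illustrates the two ideas it combines: a channel-prior attention stage in the spirit of CPCA (Huang et al.) followed by SegNeXt-style multi-scale strip convolutions (Guo et al.). The class names, the kernel set (7, 11, 21), and the reduction ratio are assumptions, not the authors’ implementation.

```python
import torch
import torch.nn as nn

class ChannelPrior(nn.Module):
    """Channel attention: avg- and max-pooled descriptors through a shared MLP."""
    def __init__(self, c: int, r: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(c, c // r, 1), nn.ReLU(inplace=True), nn.Conv2d(c // r, c, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                          self.mlp(x.amax((2, 3), keepdim=True)))
        return x * w

class StripAttention(nn.Module):
    """Multi-scale attention from symmetric pairs of k x 1 / 1 x k depth-wise
    strip convolutions, which approximate large kernels at low cost."""
    def __init__(self, c: int, kernels=(7, 11, 21)):
        super().__init__()
        self.local = nn.Conv2d(c, c, 5, padding=2, groups=c)
        self.strips = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(c, c, (1, k), padding=(0, k // 2), groups=c),
                nn.Conv2d(c, c, (k, 1), padding=(k // 2, 0), groups=c))
            for k in kernels)
        self.mix = nn.Conv2d(c, c, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.local(x)
        a = a + sum(s(a) for s in self.strips)  # sum the multi-scale branches
        return x * self.mix(a)                   # spatial attention weights

class CPMSCA(nn.Module):
    """Channel prior applied first, then multi-scale strip-conv attention."""
    def __init__(self, c: int):
        super().__init__()
        self.cp, self.sa = ChannelPrior(c), StripAttention(c)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.sa(self.cp(x))
```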
3.5. ODE-RepGFPN
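The “Rep” refactoring mentioned in the contributions relies on structural reparameterization: a multi-branch block used during training (e.g., parallel 3×3 and 1×1 convolutions, each followed by BatchNorm) collapses into a single 3×3 convolution at inference. Below is a minimal sketch of that standard RepVGG-style fusion; it shows the generic technique, not the authors’ exact ODE-RepGFPN code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d):
    """Fold BatchNorm statistics into the preceding conv's weight and bias:
    y = gamma * (Wx + b - mu) / std + beta."""
    std = (bn.running_var + bn.eps).sqrt()
    w = conv.weight * (bn.weight / std).reshape(-1, 1, 1, 1)
    b = bn.bias - bn.running_mean * bn.weight / std
    if conv.bias is not None:
        b = b + conv.bias * bn.weight / std
    return w, b

def merge_3x3_1x1(w3, b3, w1, b1):
    """Zero-pad the 1x1 kernel to 3x3 and add, so the two parallel branches
    become one 3x3 conv. An identity branch can be folded the same way by
    expressing it as a 1x1 conv with an identity kernel."""
    return w3 + F.pad(w1, [1, 1, 1, 1]), b3 + b1
```

Because the fused kernel computes exactly the same function, the multi-branch form only pays its extra cost during training, while inference runs a single plain convolution.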
3.6. Focaler-WIoU
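For intuition, here is a minimal sketch of one plausible FW-IoU formulation, assuming the combination is WIoU v1’s distance-based focusing factor applied on top of Focaler-IoU’s linear interval mapping. The interval bounds `d` and `u`, and the function signature, are illustrative hyperparameters rather than the paper’s exact definition.

```python
import torch

def fw_iou_loss(iou: torch.Tensor,         # IoU per box pair, shape (N,)
                pred_xy: torch.Tensor,      # predicted box centres, shape (N, 2)
                gt_xy: torch.Tensor,        # ground-truth centres, shape (N, 2)
                enclose_wh: torch.Tensor,   # smallest enclosing box w/h, shape (N, 2)
                d: float = 0.0, u: float = 0.95) -> torch.Tensor:
    # Focaler-IoU: linear interval mapping that re-focuses regression
    # on samples whose IoU falls inside [d, u]
    iou_f = ((iou - d) / (u - d)).clamp(0.0, 1.0)
    # WIoU v1: centre-distance focusing factor; detached so it only
    # re-weights gradients instead of creating a competing objective
    dist2 = ((pred_xy - gt_xy) ** 2).sum(dim=1)
    diag2 = (enclose_wh ** 2).sum(dim=1).detach()
    r_wiou = torch.exp(dist2 / diag2)
    return (r_wiou * (1.0 - iou_f)).mean()
```

Tuning `d` and `u` shifts the emphasis between easy and hard samples, which is how the loss targets boxes near the positive/negative boundary.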
4. Experiments and Results
4.1. Experiment Introduction
4.1.1. Experimental Settings
4.1.2. Dataset
4.1.3. Evaluation Indicators
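For reference, the quantities reported in the tables below follow the standard detection definitions; mAP@0.5 averages per-class AP at an IoU threshold of 0.5, while mAP@0.5:0.95 additionally averages over IoU thresholds from 0.5 to 0.95 in steps of 0.05:

```latex
\mathrm{P} = \frac{TP}{TP + FP}, \qquad
\mathrm{R} = \frac{TP}{TP + FN}, \qquad
\mathrm{AP} = \int_{0}^{1} P(R)\,\mathrm{d}R, \qquad
\mathrm{mAP} = \frac{1}{N_{\mathrm{cls}}} \sum_{i=1}^{N_{\mathrm{cls}}} \mathrm{AP}_{i}
```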
4.2. Visualization of the Results
4.3. Ablation Experiment
4.4. Comparative Experiment
4.4.1. Comparison with Other Models
4.4.2. Comparison of C2f-US Utilization Schemes
4.4.3. Comparison of FW-IoU Utilization Schemes
5. Conclusions and Discussion
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Stojanovic, B.; Bukvic, M.; Epler, I. Application of Aluminum and Aluminum Alloys in Engineering. Appl. Eng. Lett. 2018, 3, 52–62. [Google Scholar] [CrossRef]
- Fu, P.; Peng, L.; Ding, W. Automobile Lightweight Technology: Development Trends of Aluminum/Magnesium Alloys and Their Forming Technologies. Strateg. Study CAE 2018, 20, 84–90. [Google Scholar] [CrossRef]
- Yuferov, Y.; Zykov, F.; Malshakova, E. Defects of Porous Self-Structured Anodic Alumina Oxide on Industrial Aluminum Grades. Solid State Phenom. 2018, 284, 1134–1139. [Google Scholar] [CrossRef]
- Wu, Q.; Dong, K.; Qin, X.; Hu, Z.; Xiong, X. Magnetic Particle Inspection: Status, Advances, and Challenges—Demands for Automatic Non-Destructive Testing. NDT E Int. 2024, 143, 103030. [Google Scholar] [CrossRef]
- Alay, T.K.; Cagirici, M.; Yagmur, A.; Gur, C.H. Determination of the Anisotropy in Mechanical Properties of Laser Powder Bed Fusion Inconel 718 by Ultrasonic Testing. Nondestruct. Test. Eval. 2024, 40, 206–224. [Google Scholar] [CrossRef]
- Liu, J.; Zhang, F.; Fang, L. The Summary on Important Damage Detection Technology of Composite Materials. Adv. Mat. Res. 2014, 1055, 32–37. [Google Scholar] [CrossRef]
- Gulhan, U.K. Development of hybrid optical sensor based on deep learning to detect and classify the micro-size defects in printed circuit board. Measurement 2023, 206, 112247. [Google Scholar] [CrossRef]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar] [CrossRef]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.E. SSD: Single Shot MultiBox Detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016, Proceedings, Part I; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2016; Volume 9905, pp. 21–37. [Google Scholar] [CrossRef]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar] [CrossRef]
- Alkandary, K.; Yildiz, A.S.; Meng, H. A Comparative Study of YOLO Series (v3–v10) with DeepSORT and StrongSORT: A Real-Time Tracking Performance Study. Electronics 2025, 14, 876. [Google Scholar] [CrossRef]
- Zhang, G.; Cao, X.; Liu, S.; Jin, L.; Yang, Q. Electromagnetic Ultrasonic Nonlinear Detection of Plastic Damage in Aluminum Based on Cumulative Effect. JET 2019, 34, 3961–3967. [Google Scholar] [CrossRef]
- Reis, D.; Kupec, J.; Hong, J.; Daoudi, A. Real-Time Flying Object Detection with YOLOv8. arXiv 2024, arXiv:2305.09972. [Google Scholar] [CrossRef]
- Zhang, D.; Song, K.; Xu, J.; He, Y.; Yan, Y. Unified Detection Method of Aluminum Profile Surface Defects: Common and Rare Defect Categories. Opt. Lasers Eng. 2020, 126, 105936. [Google Scholar] [CrossRef]
- Wang, W.; Chen, J.; Han, G.; Shi, X.; Qian, G. Application of Object Detection Algorithms in Non-Destructive Testing of Pressure Equipment: A Review. Sensors 2024, 24, 5944. [Google Scholar] [CrossRef] [PubMed]
- Roy, A.M.; Bhaduri, J. DenseSPH-YOLOv5: An Automated Damage Detection Model Based on DenseNet and Swin-Transformer Prediction Head-Enabled YOLOv5 with Attention Mechanism. Adv. Eng. Inform. 2023, 56, 102007. [Google Scholar] [CrossRef]
- Gao, G.; Ma, Y.; Wang, J.; Li, Z.; Wang, Y.; Bai, H. CFR-YOLO: A Novel Cow Face Detection Network Based on YOLOv7 Improvement. Sensors 2025, 25, 1084. [Google Scholar] [CrossRef]
- Xu, K.; Lu, X.; Shen, T.; Zhu, X.; Wang, S.; Wang, X.; Wang, J. Rebar binding point location method based on improved YOLOv5 and thinning algorithm. Measurement 2025, 242, 116029. [Google Scholar] [CrossRef]
- Zhang, M.; Ye, S.; Zhao, S.; Wang, W.; Xie, C. Pear Object Detection in Complex Orchard Environment Based on Improved YOLO11. Symmetry 2025, 17, 255. [Google Scholar] [CrossRef]
- Li, L.; Zhang, P.; Yang, S.; Jiao, W. YOLOv5-SFE: An algorithm fusing spatio-temporal features for detecting and recognizing workers’ operating behaviors. Adv. Eng. Inform. 2023, 56, 101988. [Google Scholar] [CrossRef]
- Wang, X.; Gao, H.; Jia, Z.; Li, Z. BL-YOLOv8: An Improved Road Defect Detection Model Based on YOLOv8. Sensors 2023, 23, 8361. [Google Scholar] [CrossRef]
- Zhao, H.; Wang, F.; Lei, G.B.; Xiong, Y.; Xu, L.; Xu, C.Z.; Zhu, W. LSD-YOLOv5: A Steel Strip Surface Defect Detection Algorithm Based on Lightweight Network and Enhanced Feature Fusion Mode. Sensors 2023, 23, 6558. [Google Scholar] [CrossRef]
- Wu, D.; Meng, F. NBD-YOLOv5: An Efficient and Accurate Aluminum Surface Defect Detection Method. In Proceedings of the 2024 7th International Conference on Advanced Algorithms and Control Engineering, Shanghai, China, 1–3 March 2024; pp. 1190–1196. [Google Scholar] [CrossRef]
- Wang, L.; Zhang, G.; Wang, W.; Chen, J.; Jing, X.; Yuan, H.; Huang, Z. A defect detection method for industrial aluminum sheet surface based on improved YOLOv8 algorithm. Front. Phys. 2024, 12, 1419998. [Google Scholar] [CrossRef]
- Varghese, R.; Sambath, M. YOLOv8: A Novel Object Detection Algorithm with Enhanced Performance and Robustness. In Proceedings of the 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), Chennai, India, 18–19 April 2024; pp. 1–6. [Google Scholar] [CrossRef]
- Qin, D.; Leichner, C.; Delakis, M.; Fornoni, M.; Luo, S.; Yang, F.; Wang, W.; Banbury, C.; Ye, C.; Akin, B.; et al. MobileNetV4: Universal Models for the Mobile Ecosystem. In Computer Vision—ECCV 2024. ECCV 2024. Lecture Notes in Computer Science; Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G., Eds.; Springer: Cham, Switzerland, 2025; Volume 15098. [Google Scholar] [CrossRef]
- Nascimento, M.G.D.; Prisacariu, V.; Fawcett, R. DSConv: Efficient Convolution Operator. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 5147–5156. [Google Scholar] [CrossRef]
- Sunkara, R.; Luo, T. No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Grenoble, France, 19–23 September 2022. [Google Scholar] [CrossRef]
- Huang, H.; Chen, Z.; Zou, Y.; Lu, M.; Chen, C.; Song, Y.; Zhang, H.; Yan, F. Channel prior convolutional attention for medical image segmentation. Comput. Biol. Med. 2024, 178, 108784. [Google Scholar] [CrossRef] [PubMed]
- Guo, M.-H.; Lu, C.-Z.; Hou, Q.; Liu, Z.; Cheng, M.-M.; Hu, S.-M. SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation. arXiv 2022, arXiv:2209.08575. [Google Scholar] [CrossRef]
- Li, C.; Zhou, A.; Yao, A. Omni-Dimensional Dynamic Convolution. arXiv 2022, arXiv:2209.07947. [Google Scholar] [CrossRef]
- Xu, X.; Jiang, Y.; Chen, W.; Huang, Y.; Zhang, Y.; Sun, X. DAMO-YOLO: A Report on Real-Time Object Detection Design. arXiv 2023, arXiv:2211.15444. [Google Scholar] [CrossRef]
- Zhang, H.; Zhang, S. Focaler-IoU: More Focused Intersection over Union Loss. arXiv 2024, arXiv:2401.10525. [Google Scholar] [CrossRef]
- Tong, Z.; Chen, Y.; Xu, Z.; Yu, R. Wise-IoU: Bounding Box Regression Loss with Dynamic Focusing Mechanism. arXiv 2023, arXiv:2301.10051. [Google Scholar] [CrossRef]
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. Int. J. Comput. Vision 2020, 128, 336–359. [Google Scholar] [CrossRef]
- Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint Triplets for Object Detection. arXiv 2019, arXiv:1904.08189. [Google Scholar] [CrossRef]
- Sun, Y.; Dong, L.; Huang, S.; Ma, S.; Xia, Y.; Xue, J.; Wang, J.; Wei, F. Retentive Network: A Successor to Transformer for Large Language Models. arXiv 2023, arXiv:2307.08621. [Google Scholar] [CrossRef]
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9626–9635. [Google Scholar] [CrossRef]
- Adarsh, P.; Rathi, P.; Kumar, M. YOLO v3-Tiny: Object Detection and Recognition using one stage improved model. In Proceedings of the 2020 6th International Conference on Advanced Computing and Communication Systems, Coimbatore, India, 6–7 March 2020; pp. 687–694. [Google Scholar] [CrossRef]
- Khanam, R.; Hussain, M. What is YOLOv5: A deep look into the internal features of the popular object detector. arXiv 2024, arXiv:2407.20892. [Google Scholar] [CrossRef]
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar] [CrossRef]
- Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024, arXiv:2410.17725. [Google Scholar] [CrossRef]
- Tian, Y.; Ye, Q.; Doermann, D. YOLOv12: Attention-Centric Real-Time Object Detectors. arXiv 2025, arXiv:2502.12524. [Google Scholar] [CrossRef]
Version | mAP (COCO) | Parameters (M) | Key Strengths | Key Weaknesses |
---|---|---|---|---|
YOLOv5 | 28.0–50.7% | 2.6–97.2 | Fast, easy deployment, active community | Lower accuracy, limited task support |
YOLOv8 | 37.3–53.9% | 3.2–68.2 | Improved accuracy, multi-task support, suitable for edge AI applications | Slightly higher computational demand |
YOLOv9 | Up to 55.6% | 2.0–57.3 | Enhanced accuracy, efficient parameters | Increased training complexity |
YOLOv10 | Up to 55.0% | ~25 | End-to-end detection, reduced latency | Limited community resources |
YOLOv11 | Up to 54.7% | 20–28 | Improved feature extraction, OBB support | Complex architecture |
YOLOv12 | 40.6–55.2% | 2.6–59.1 | Highest accuracy, efficient deployment | Slower inference, higher hardware needs |
Algorithms | GFLOPs | Params/M | Backbone | Attention | Neck | Loss Function
---|---|---|---|---|---|---
YOLOv8n | 8.1 | 3.0 | Original C2f | No additional attention | Original Neck | Original CIoU
YOLO-AL | 7.9 | 3.4 | C2f module incorporating Universal Inverted Bottleneck and Space-to-depth Conv (C2f-US) | Channel Prior and Multi-scale Convolutional Attention (CPMSCA) | Omni-Dimensional Efficient-reparameterized Generalized Feature Pyramid Network (ODE-RepGFPN) | Focaler-WIoU (FW-IoU)
C2f-US | ODE-RepGFPN | FW-IoU | CPMSCA | P | R | mAP@0.5 | GFLOPs | Params/M
---|---|---|---|---|---|---|---|---
– | – | – | – | 0.824 | 0.759 | 0.793 | 8.1 | 3.0
√ | – | – | – | 0.852 | 0.781 | 0.809 | 7.6 | 3.0
– | √ | – | – | 0.856 | 0.772 | 0.799 | 8.2 | 3.3
– | – | √ | – | 0.842 | 0.783 | 0.796 | 8.1 | 3.0
– | – | – | √ | 0.866 | 0.770 | 0.795 | 8.3 | 3.2
√ | √ | – | – | 0.864 | 0.780 | 0.800 | 7.7 | 3.3
√ | – | √ | – | 0.846 | 0.779 | 0.801 | 7.6 | 3.0
√ | – | – | √ | 0.865 | 0.754 | 0.795 | 7.8 | 3.2
√ | √ | √ | – | 0.859 | 0.775 | 0.798 | 7.7 | 3.3
√ | √ | – | √ | 0.864 | 0.758 | 0.800 | 7.9 | 3.4
√ | – | √ | √ | 0.865 | 0.774 | 0.811 | 7.8 | 3.2
√ | √ | √ | √ | 0.865 | 0.778 | 0.815 | 7.9 | 3.4
Algorithm | P | R | mAP@0.5 | GFLOPs | Params/M | FPS
---|---|---|---|---|---|---|
Faster R-CNN [9] | 0.542 | 0.679 | 0.643 | 401.9 | 136.9 | 11.5 |
SSD [10] | 0.362 | 0.557 | 0.373 | 30.9 | 25.0 | 2.6 |
CenterNet [37] | 0.621 | 0.631 | 0.796 | 137.2 | 35.6 | 176.0 |
RetinaNet [38] | 0.807 | 0.239 | 0.361 | 170.1 | 38.0 | 83.5
FCOS [39] | 0.862 | 0.665 | 0.804 | 161.9 | 32.2 | 73.5 |
Damo-YOLO [33] | 0.822 | 0.734 | 0.791 | 8.4 | 3.3 | 204.1 |
YOLOv3-tiny [40] | 0.692 | 0.703 | 0.704 | 18.9 | 12.1 | 290.5 |
YOLOv5n [41] | 0.820 | 0.719 | 0.768 | 7.1 | 2.5 | 195.4 |
YOLOv8n [26] | 0.824 | 0.759 | 0.793 | 8.1 | 3.0 | 202.3 |
YOLOv10n [42] | 0.752 | 0.729 | 0.753 | 8.2 | 2.7 | 163.8 |
YOLOv11n [43] | 0.795 | 0.770 | 0.797 | 6.3 | 2.6 | 169.0 |
YOLOv12s [44] | 0.835 | 0.768 | 0.786 | 21.2 | 9.2 | 75.2 |
OURS | 0.865 | 0.778 | 0.815 | 7.9 | 3.4 | 212.9 |
Algorithm | LB | NC | OP | SC | DS | VC | PB | JS | PI | CL |
---|---|---|---|---|---|---|---|---|---|---|
Faster R-CNN [9] | 0.811 | 0.754 | 0.918 | 0.470 | 0.065 | 0.998 | 0.136 | 0.494 | 0.817 | 0.959 |
SSD [10] | 0.702 | 0.613 | 0.711 | 0.049 | 0.182 | 0.964 | 0.098 | 0.192 | 0.358 | 0.567 |
CenterNet [37] | 0.955 | 0.428 | 0.946 | 0.552 | 0.406 | 0.994 | 0.393 | 0.850 | 0.634 | 0.987 |
RetinaNet [38] | 0.088 | 0.567 | 0.696 | 0.266 | 0.225 | 0.981 | 0.100 | 0.620 | 0.059 | 0.011
FCOS [39] | 0.830 | 0.892 | 0.915 | 0.823 | 0.395 | 0.991 | 0.322 | 0.821 | 0.676 | 0.934 |
Damo-YOLO [33] | 0.944 | 0.888 | 0.908 | 0.614 | 0.413 | 0.989 | 0.334 | 0.849 | 0.951 | 0.975 |
YOLOv3-tiny [40] | 0.859 | 0.819 | 0.950 | 0.456 | 0.358 | 0.981 | 0.208 | 0.660 | 0.854 | 0.896 |
YOLOv5n [41] | 0.975 | 0.904 | 0.955 | 0.482 | 0.339 | 0.995 | 0.214 | 0.846 | 0.990 | 0.977 |
YOLOv8n [26] | 0.972 | 0.908 | 0.950 | 0.570 | 0.399 | 0.995 | 0.342 | 0.852 | 0.956 | 0.987 |
YOLOv10n [42] | 0.962 | 0.894 | 0.897 | 0.442 | 0.327 | 0.994 | 0.251 | 0.805 | 0.987 | 0.970 |
YOLOv11n [43] | 0.974 | 0.911 | 0.961 | 0.527 | 0.397 | 0.995 | 0.391 | 0.842 | 0.978 | 0.988 |
YOLOv12s [44] | 0.888 | 0.864 | 0.918 | 0.584 | 0.334 | 0.986 | 0.344 | 0.723 | 0.860 | 0.947 |
OURS | 0.985 | 0.925 | 0.957 | 0.650 | 0.516 | 0.995 | 0.396 | 0.878 | 0.988 | 0.979 |
Method | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 | Params/M | GFLOPs
---|---|---|---|---|---|---|
Backbone | 0.824 | 0.759 | 0.793 | 0.571 | 3.00 | 8.1 |
Backbone-US-1,2 | 0.852 | 0.781 | 0.809 | 0.576 | 2.96 | 7.6 |
Backbone-US-3,4 | 0.813 | 0.764 | 0.789 | 0.563 | 2.85 | 7.4 |
Backbone-US-1,2,3,4 | 0.847 | 0.753 | 0.792 | 0.562 | 2.74 | 7.0 |
Method | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 |
---|---|---|---|---|
CIoU | 0.824 | 0.759 | 0.793 | 0.571 |
Focaler-CIoU | 0.829 | 0.768 | 0.795 | 0.572 |
Focaler-GIoU | 0.834 | 0.761 | 0.802 | 0.576 |
Focaler-WIoU | 0.842 | 0.783 | 0.796 | 0.579 |