MCH-YOLOv12: Research on Surface Defect Detection Algorithm for Aluminum Profiles Based on Improved YOLOv12
Abstract
1. Introduction
- (1) We designed the MultiScaleGhost convolution, which strengthens the extraction of multi-scale defect features. It improves detection performance on small targets while reducing the number of parameters and the computational load.
- (2) We proposed the Spatial-Channel Collaborative Gated Linear Unit (SCCGLU). By combining direction-aware channel modeling with edge-guided spatial enhancement, it improves the perception of irregular defects. We integrated it after the C3k2 module of YOLOv12 using a post-enhancement fusion strategy.
- (3) We constructed the Hybrid Head, a detection head that integrates the advantages of anchor-based and anchor-free approaches. This improves detection accuracy and robustness for irregular defects and for defect classes affected by class imbalance.
2. YOLOv12 Detection Algorithm
3. Methods
3.1. MultiScaleGhost: Lightweight GhostConv Module with Multi-Scale Feature Enhancement
3.2. Spatial-Channel Collaborative Gated Linear Unit
3.2.1. Channel-Wise Gated Linear Unit
3.2.2. Improved Channel-Wise Gated Linear Unit
3.2.3. SCCGLU-C3k2 Fusion Structure Design
3.3. Hybrid Head: Integrating Anchor-Based and Anchor-Free Approaches
3.3.1. Anchor-Free Branch
3.3.2. Balancing and Optimization of Multiple Loss Functions in Hybrid Head
3.3.3. Weighted Non-Maximum Suppression
4. Experiments
4.1. Dataset
4.2. Experimental Environment and Parameter Settings
4.3. Evaluation Metrics
4.4. Comparison with YOLOv12
4.5. Ablation Experiment
4.6. Comparative Experiment
4.7. Generalization Evaluation on the NEU-DET Dataset
4.8. Analysis of Failure Cases
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Symbol | Definition |
---|---|
X | input feature map |
X_p | the base feature map output by the main convolutional module |
X_gi | the multi-scale feature map generated by the i-th branch |
Y | the final fused feature map |
C | the number of channels in the input feature map |
C′ | the target number of channels for the final feature map |
m | the number of output channels of the main convolutional module |
H, W | the height and width of the feature map |
W_p | the parameters of the 1 × 1 convolutional kernel in the main convolutional module |
W_gi | the parameters of the depthwise separable convolution kernel for the i-th branch |
k_i | the kernel size of the i-th branch |
s | the reduction ratio of the main convolution module |
n | the number of branches in the multi-scale derivation module |
BN | Batch Normalization |
ReLU | Rectified Linear Unit |
Concat | channel concatenation operation |
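
To make the roles of the symbols above concrete, the following is a minimal parameter-counting sketch for a Ghost-style block with multi-scale branches. It is not the authors' implementation. The function names, the assumption that each branch maps the m base channels to m channels, and the assumption that the pointwise part of each depthwise separable convolution is an m-to-m 1 × 1 convolution are all illustrative choices that the table does not state.

```python
def multiscale_ghost_params(c_in, c_out, s, kernel_sizes, pointwise=True):
    """Count conv weights (no biases, no BN) for a Ghost-style multi-scale block.

    Assumptions (not taken from the paper): the main 1x1 convolution outputs
    m = c_out // s channels, and every branch maps those m channels to m channels.
    """
    m = c_out // s                      # output channels of the main 1x1 convolution
    primary = c_in * m                  # weights of the 1x1 kernel
    branches = 0
    for k in kernel_sizes:              # one branch per kernel size
        branches += m * k * k           # depthwise k x k convolution over m channels
        if pointwise:
            branches += m * m           # assumed m -> m pointwise 1x1 convolution
    fused_channels = m * (1 + len(kernel_sizes))  # channels after Concat
    return {"m": m, "params": primary + branches, "fused_channels": fused_channels}


def standard_conv_params(c_in, c_out, k):
    """Weights of a dense k x k convolution, for comparison."""
    return c_in * c_out * k * k


# Hypothetical sizes chosen only for illustration.
ghost = multiscale_ghost_params(64, 128, s=4, kernel_sizes=(3, 5, 7))
dense = standard_conv_params(64, 128, k=3)
print(ghost)   # {'m': 32, 'params': 7776, 'fused_channels': 128}
print(dense)   # 73728
```

With these illustrative sizes the fused output reaches the target channel count only because s equals the number of branches plus one; whether the paper ties s and n together in this way is an assumption of the sketch.
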
Attention Module | Attention Type | Enhanced Dimensions | Key Mechanism | Complexity | Suitability for Defect Detection |
---|---|---|---|---|---|
SE-Net | Channel-wise | Channel | Squeeze-and-Excitation | Low | Good for global context; lacks spatial focus |
CBAM | Channel and Spatial | Channel and Spatial | Channel and spatial attention in sequence | Medium | Better local focus; still limited on edges |
SA | Self-Attention | Full Feature Map | Long-range dependencies with high computational cost | High | Powerful but expensive for real-time use |
GAM | Global Attention | Global Channel + Spatial | Global context fusion with separate attention branches | High | Effective but overcomplex for light models |
SCCGLU (Ours) | Direction-aware and Edge | Directional Channel + Edge Spatial | Combines directional channel modeling and edge-aware enhancement via Gated Linear Unit | Medium | Designed for fine-grained, edge-blurred, irregular defects |
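
The last row of the table builds on a Gated Linear Unit. As a reminder of the underlying gating idea only, here is a minimal sketch of the basic GLU (one half of the features is multiplied by the sigmoid of the other half). It does not model the direction-aware or edge-guided parts of SCCGLU, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function used as the gate activation."""
    return 1.0 / (1.0 + math.exp(-z))


def glu(features):
    """Basic gated linear unit on a flat feature vector.

    The first half of `features` carries values, the second half carries gates;
    each value is scaled by sigmoid(gate), so a gate near 0 halves the value and
    a large positive gate passes it through almost unchanged.
    """
    if len(features) % 2 != 0:
        raise ValueError("GLU needs an even number of features")
    half = len(features) // 2
    values, gates = features[:half], features[half:]
    return [v * sigmoid(g) for v, g in zip(values, gates)]


print(glu([1.0, 2.0, 0.0, 0.0]))  # [0.5, 1.0]
```
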
Loss Function | Applied Module | Purpose | Key Feature |
---|---|---|---|
CIoU Loss | Anchor-based regression | Optimizes bounding box localization by considering overlap, center distance, and aspect ratio | More accurate and stable than IoU and GIoU, especially in tight bounding box regression |
BCE Classification Loss | Anchor-based classification | Binary classification to predict object presence per anchor | Simple cross-entropy loss; does not address class imbalance |
GIoU Loss | Anchor-free regression | Optimizes bounding box regression by penalizing non-overlapping predictions | Improves over IoU when predicted boxes do not overlap with ground truth |
Focal Loss | Anchor-free classification | Enhances classification by addressing foreground–background imbalance | Down-weights easy examples, focuses on hard examples, mitigates class imbalance |
BCE Centerness Loss | Centerness prediction | Supervises the centerness score, indicating how close a point is to object center | Helps suppress low-quality predictions far from the center of objects |
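
The anchor-free losses in the table follow well-known definitions, which can be sketched for a single prediction as below. This is a plain-Python illustration of the standard GIoU loss, binary focal loss, and FCOS-style centerness target. The function names and the default values alpha = 0.25 and gamma = 2 (the defaults from the original focal loss paper) are assumptions, not settings reported in this paper, and the loss weighting used in the Hybrid Head is not reproduced here.

```python
import math


def box_area(b):
    """Area of an axis-aligned box (x1, y1, x2, y2); empty boxes give 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def giou_loss(pred, target):
    """1 - GIoU for two boxes with positive area."""
    inter = box_area((max(pred[0], target[0]), max(pred[1], target[1]),
                      min(pred[2], target[2]), min(pred[3], target[3])))
    union = box_area(pred) + box_area(target) - inter
    hull = box_area((min(pred[0], target[0]), min(pred[1], target[1]),
                     max(pred[2], target[2]), max(pred[3], target[3])))
    giou = inter / union - (hull - union) / hull  # penalty grows with empty hull space
    return 1.0 - giou


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for predicted probability p in (0, 1) and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def centerness(l, t, r, b):
    """FCOS-style centerness target from positive distances to the four box sides."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


print(giou_loss((0, 0, 1, 1), (2, 0, 3, 1)))        # disjoint boxes: loss above 1
print(focal_loss(0.9, 1) < focal_loss(0.5, 1))      # True: easy example is down-weighted
print(centerness(1, 1, 1, 1))                       # 1.0 at the box center
```

The disjoint-box case shows why GIoU is useful for regression: plain IoU would be 0 regardless of how far apart the boxes are, whereas the GIoU loss still changes with the size of the enclosing box.
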
Module | YOLOv12 | MCH-YOLOv12 | Purpose of Modification | Benefit |
---|---|---|---|---|
Backbone | Standard convolution | MultiScaleGhost convolution replaces standard convolution | To overcome the limitation of single-scale feature extraction | Enhances feature representation and detection of irregular defects |
Neck | C3k2 | SCCGLU-C3k2: Spatial-Channel Collaborative Gated Linear Unit added after C3k2 | To better capture direction-aware and edge-sensitive features | Improves perception of fine-grained or edge-localized defect features |
Head | Anchor-based | Hybrid Head: combines anchor-based and anchor-free branches | To adapt to varied object shapes/sizes and handle class imbalance | Improves accuracy, robustness, and localization under diverse scenarios |
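
Because the Hybrid Head yields predictions from two branches, overlapping boxes have to be merged, and Section 3.3.3 names Weighted Non-Maximum Suppression for this. The sketch below shows one common weighted NMS variant, in which each cluster of overlapping boxes is replaced by its score-weighted average. The exact formulation used in the paper is not given in this excerpt, so the function names, the IoU threshold of 0.5, and the choice to keep the top score of each cluster are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def weighted_nms(boxes, scores, iou_thr=0.5):
    """Merge overlapping boxes by score-weighted averaging of their coordinates."""
    remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    merged = []
    while remaining:
        top = remaining[0]
        # The top box always belongs to its own cluster.
        cluster = [top] + [i for i in remaining[1:]
                           if iou(boxes[top], boxes[i]) >= iou_thr]
        total = sum(scores[i] for i in cluster)
        if total > 0:
            box = tuple(sum(scores[i] * boxes[i][k] for i in cluster) / total
                        for k in range(4))
        else:  # all-zero scores: fall back to the top box
            box = tuple(boxes[top])
        merged.append((box, scores[top]))
        remaining = [i for i in remaining if i not in cluster]
    return merged


# Hypothetical detections: two overlapping boxes and one separate box.
dets = weighted_nms([(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)],
                    [0.9, 0.6, 0.8])
print(dets)
```

Compared with standard NMS, which discards the lower-scoring box outright, this variant lets both overlapping predictions contribute to the final coordinates.
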
Name | Experimental Configuration |
---|---|
Programming language | Python 3.12 |
Deep learning framework | PyTorch 2.3.0 + CUDA 12.1 |
CPU | Intel(R) Xeon(R) Platinum 8352V (Intel Corporation, Santa Clara, CA, USA) |
Memory | 48 GB |
GPU | RTX 3080 × 2 (20 GB) (NVIDIA Corporation, Santa Clara, CA, USA) |
Development environment | JupyterLab (version 3.6.3) |
Parameter | Parameter Value |
---|---|
Input image size | 640 × 640 |
Number of CPU threads | 8 |
Initial learning rate | 0.01 |
Final learning rate | 0.01 |
Batch size | 32 |
Optimizer | SGD |
Number of training epochs | 400 |
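
For readers who want to reproduce these settings, the table can be written as a set of training arguments. The sketch below uses the argument names of the Ultralytics training API (`imgsz`, `workers`, `lr0`, `lrf`, `batch`, `optimizer`, `epochs`). Whether the authors used that toolkit is an assumption, and so is the mapping of the table's final learning rate onto `lrf`, which Ultralytics interprets as a fraction of `lr0` rather than an absolute value. The file names in the commented usage are placeholders.

```python
# Hypothetical mapping of the parameter table onto Ultralytics-style arguments.
train_args = {
    "imgsz": 640,        # input image size 640 x 640
    "workers": 8,        # number of CPU data-loading threads
    "lr0": 0.01,         # initial learning rate
    "lrf": 0.01,         # final learning rate entry (a fraction of lr0 in Ultralytics)
    "batch": 32,         # batch size
    "optimizer": "SGD",  # optimizer
    "epochs": 400,       # number of training passes over the dataset
}

# Usage sketch (requires the ultralytics package; both YAML names are placeholders):
# from ultralytics import YOLO
# YOLO("mch-yolov12.yaml").train(data="aluminum.yaml", **train_args)
print(train_args)
```
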
Models | jupi | budaodian | tufen | cahua | aoxian | qikeng | zangdian | tucengkailie | loudi | pengshang | mAP@0.5/% |
---|---|---|---|---|---|---|---|---|---|---|---|
YOLOv12n | 94.0 | 86.9 | 88.5 | 88.6 | 86.2 | 65.4 | 76.0 | 93.1 | 76.9 | 93.8 | 91.5 |
Ours | 98.6 | 95.0 | 92.7 | 95.9 | 92.1 | 73.6 | 82.7 | 99.2 | 86.7 | 94.3 | 95.0 |
Model Number | MultiScaleGhost | SCCGLU-C3k2 | Hybrid Head | Precision/% | Recall/% | mAP@0.5/% | mAP@0.5~0.95/% | Parameters/10^6 | FLOPs/G |
---|---|---|---|---|---|---|---|---|---|
A0 | ✗ | ✗ | ✗ | 84.9 | 87.9 | 91.5 | 69.4 | 11.1 | 19.6 |
A1 | √ | ✗ | ✗ | 87.2 | 88.7 | 92.9 | 71.6 | 8.3 | 18.3 |
A2 | ✗ | √ | ✗ | 86.1 | 91.1 | 93.6 | 74.1 | 7.5 | 19.4 |
A3 | ✗ | ✗ | √ | 90.3 | 84.9 | 92.3 | 71.6 | 7.1 | 18.2 |
A4 | √ | √ | ✗ | 87.0 | 89.7 | 93.3 | 71.7 | 9.3 | 18.8 |
A5 | √ | ✗ | √ | 88.6 | 89.5 | 94.1 | 74.3 | 9.4 | 18.6 |
A6 | ✗ | √ | √ | 89.5 | 89.8 | 94.3 | 75.6 | 8.0 | 19.0 |
A7 | √ | √ | √ | 92.8 | 91.1 | 95.0 | 71.9 | 7.0 | 17.1 |
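
The net effect of all three modules can be read off by comparing row A0 (baseline) with row A7 (full model). The short script below performs that arithmetic using only the values in the table. The dictionary keys and helper names are illustrative.

```python
# Values copied from rows A0 (baseline YOLOv12) and A7 (all three modules).
baseline = {"precision": 84.9, "recall": 87.9, "map50": 91.5,
            "map50_95": 69.4, "params_m": 11.1, "gflops": 19.6}
full = {"precision": 92.8, "recall": 91.1, "map50": 95.0,
        "map50_95": 71.9, "params_m": 7.0, "gflops": 17.1}


def point_gain(key):
    """Absolute change in percentage points (A7 minus A0)."""
    return round(full[key] - baseline[key], 1)


def relative_change(key):
    """Relative change in percent with respect to A0."""
    return round(100.0 * (full[key] - baseline[key]) / baseline[key], 1)


for key in ("precision", "recall", "map50", "map50_95"):
    print(key, point_gain(key))          # 7.9, 3.2, 3.5, 2.5 points
for key in ("params_m", "gflops"):
    print(key, relative_change(key))     # -36.9 % and -12.8 %
```

In other words, the full model gains 3.5 points of mAP@0.5 over the baseline while using roughly 37% fewer parameters and about 13% fewer FLOPs.
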
Model | Precision/% | Recall/% | mAP@0.5/% | mAP@0.5~0.95/% | Parameters/10^6 | FLOPs/G |
---|---|---|---|---|---|---|
YOLOv5 | 81.3 | 83.4 | 87.2 | 62.0 | 9.6 | 16.5 |
YOLOv7 | 83.9 | 86.0 | 90.1 | 66.7 | 36.9 | 104.7 |
YOLOv8 | 86.6 | 84.5 | 91.6 | 69.1 | 11.2 | 28.6 |
YOLOv9 | 72.2 | 67.3 | 72.7 | 46.0 | 9.1 | 26.7 |
YOLOv10 | 84.1 | 81.4 | 87.8 | 64.5 | 9.3 | 21.6 |
YOLOv11 | 83.9 | 82.2 | 88.0 | 65.1 | 9.4 | 21.5 |
YOLOv12 | 84.9 | 87.9 | 91.5 | 69.4 | 11.1 | 19.6 |
Faster R-CNN | 88.6 | 89.5 | 94.1 | 74.3 | 41.3 | 251.4 |
SSD | 84.6 | 77.4 | 85.5 | 60.2 | 24.7 | 98.3 |
Ours | 92.8 | 91.1 | 95.0 | 71.9 | 7.0 | 17.1 |
Model | Precision/% | Recall/% | mAP@0.5/% | mAP@0.5~0.95/% | Parameters/10^6 | FLOPs/G |
---|---|---|---|---|---|---|
YOLOv5 | 76.7 | 72.8 | 80.3 | 54.1 | 8.7 | 15.3 |
YOLOv7 | 77.9 | 72.8 | 80.4 | 55.4 | 33.4 | 98.7 |
YOLOv8 | 81.3 | 80.4 | 85.2 | 61.4 | 10.8 | 25.7 |
YOLOv9 | 78.5 | 76.5 | 82.3 | 58.0 | 8.3 | 24.5 |
YOLOv10 | 78.0 | 71.6 | 81.1 | 54.0 | 8.9 | 20.1 |
YOLOv11 | 78.3 | 78.8 | 83.1 | 58.3 | 8.6 | 20.7 |
YOLOv12 | 85.0 | 80.3 | 87.9 | 61.4 | 10.7 | 19.3 |
Ours | 89.6 | 82.0 | 89.2 | 67.4 | 6.8 | 16.9 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sun, Y.; Yan, H.; Shang, Z.; Yang, M. MCH-YOLOv12: Research on Surface Defect Detection Algorithm for Aluminum Profiles Based on Improved YOLOv12. Sensors 2025, 25, 5389. https://doi.org/10.3390/s25175389