WeldSimAM and EnNWD Co-Optimization: Enhancing Lightweight YOLOv11 for Multi-Scale Weld Defect Detection
Abstract
1. Introduction
- We introduce WeldSimAM, an enhanced attention module built on the parameter-free SimAM [10]. It explicitly models horizontal and vertical features to better represent the directionality of linear defects.
- We formulate an Enhanced Normalized Wasserstein Distance (EnNWD) loss that uses the Wasserstein distance [12] to quantify the distributional difference between predicted and ground-truth boxes. It adds a scale penalty term and adaptive weighting for small objects, which improves detection accuracy for small and irregularly shaped defects.
- We validate the model with 10-fold cross-validation on three datasets (one self-built and two public) and compare it with YOLO-series and non-YOLO SOTA methods. On the self-built dataset it achieves 99.67% precision, 99.65% recall, and 132 FPS at a model size of 5.21 MB.
2. Related Work
2.1. YOLOv11 Baseline Structure
2.2. Attention Mechanisms
2.3. Bounding Box Regression Loss
2.4. Comparative Analysis of Recent SOTA Methods
3. Method
3.1. Overall Architecture of the Improved YOLOv11 Model
3.2. WeldSimAM: Directional and Channel-Enhanced Attention
- Directional Convolution Parameters: The 1 × 3 and 3 × 1 convolutions in WeldSimAM use stride = 1 (for dense feature extraction) and padding = 1 along the length-3 axis of each kernel (to maintain the feature map size), followed by ReLU activation to add nonlinearity.
- Directional Attention Branch: Horizontal and vertical features are extracted with 1 × 3 and 3 × 1 convolutions, respectively, to capture the linear structure of weld defects (e.g., cracks, linear porosity). The design is motivated by the observation that linear defects have pronounced directional traits, which directional convolution kernels can reinforce. For the input feature map, the horizontal mean (mean(dim = 3)) and vertical mean (mean(dim = 2)) are computed first, and the corresponding convolution kernels are then applied to extract the horizontal and vertical features.
- Dynamic Weight Fusion: The directional attention and the original SimAM basic attention are fused through softmax with fixed weights (0.3 × directional attention + 0.7 × basic attention) to balance the contribution of the directional features. To justify the 0.3/0.7 fusion weights, we conducted a sensitivity analysis (Table 2): weights of 0.2/0.8 lowered mAP@0.5:0.95 by 0.60 percentage points (72.49% vs. 73.09%, about 0.8% relative), while 0.4/0.6 raised the false positive rate by 2.3 percentage points (3.4% vs. 1.1%). The 0.3/0.7 ratio gives the best balance between directional feature enhancement and background suppression. This weighted fusion strategy avoids over-reliance on single-direction features, which is consistent with the multi-scale feature fusion idea in BiFPN [28].
- Channel Attention Enhancement: A 1 × 1 convolution adjusts the channel dimension, and the sensitivity of important channels is enhanced by computing the channel mean (x.mean([2, 3])) and normalizing it. This component follows the channel attention design in CBAM [11], which has proven effective at suppressing background noise in industrial scenes.
- Aspect Ratio Adaptive Adjustment: The feature map is resized by bilinear interpolation with a scale_factor parameter, which improves the handling of targets with different aspect ratios (e.g., vertical-strip and square weld defects). This addresses the inconsistent performance of the original SimAM on targets of different shapes.
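The directional branch and the fixed 0.3/0.7 fusion described above can be illustrated with a minimal, framework-free sketch for a single channel. It is not the authors' implementation: the SimAM weights follow the published parameter-free formulation [10], but the 3-tap kernel values, the way the row and column features are combined (a sigmoid of their sum), and the names `conv3_same` and `weldsimam_like` are assumptions made only for illustration. The channel attention and aspect-ratio steps are omitted.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def simam_weights(fmap, lam=1e-4):
    """SimAM attention weights for one channel given as an H x W nested list."""
    h, w = len(fmap), len(fmap[0])
    mu = sum(map(sum, fmap)) / (h * w)
    sq = [[(v - mu) ** 2 for v in row] for row in fmap]
    var = sum(map(sum, sq)) / (h * w - 1)
    return [[sigmoid(s / (4.0 * (var + lam)) + 0.5) for s in row] for row in sq]


def conv3_same(profile, taps):
    """3-tap 1-D convolution with stride 1 and zero padding 1, then ReLU."""
    padded = [0.0] + list(profile) + [0.0]
    return [
        max(0.0, sum(t * padded[i + k] for k, t in enumerate(taps)))
        for i in range(len(profile))
    ]


def weldsimam_like(fmap, taps=(0.25, 0.5, 0.25), w_dir=0.3, w_base=0.7):
    """Fuse a directional attention map with SimAM weights (hypothetical form)."""
    h, w = len(fmap), len(fmap[0])
    # Mean over the width axis (dim = 3): one value per row.
    row_profile = [sum(row) / w for row in fmap]
    # Mean over the height axis (dim = 2): one value per column.
    col_profile = [sum(fmap[i][j] for i in range(h)) / h for j in range(w)]
    # The taps stand in for learned kernel weights (assumption).
    row_feat = conv3_same(row_profile, taps)
    col_feat = conv3_same(col_profile, taps)
    base = simam_weights(fmap)
    out = []
    for i in range(h):
        out_row = []
        for j in range(w):
            # Assumed combination of the two directional features.
            directional = sigmoid(row_feat[i] + col_feat[j])
            fused = w_dir * directional + w_base * base[i][j]
            out_row.append(fmap[i][j] * fused)
        out.append(out_row)
    return out


# A 4 x 5 map with a bright vertical streak in column 2, mimicking a crack.
demo = [[0.1, 0.1, 1.0, 0.1, 0.1] for _ in range(4)]
result = weldsimam_like(demo)
```

Because both attention terms lie in (0, 1), the fused weight only rescales each activation; in this toy input the streak column keeps a larger response than the background.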
3.3. Adaptive NWD Loss (EnNWD)
- Small Target Weight Mechanism: The relative areas of the prediction box (area_pred = W1 × H1) and the ground-truth box (area_target = W2 × H2) are computed, and a small target weight coefficient (small_obj_weight) is derived from them to adapt the loss weight to the target size. This follows from the observation that small targets are more sensitive to position deviations than large ones.
- Scale Difference Penalty Term: The scale difference (scale_diff) between the prediction box and the ground-truth box is computed and converted into a scale penalty (scale_penalty) that discourages scale inconsistency between the two boxes. This addresses the limitation of the original NWD in handling scale variations in large targets.
- Multi-Scale Weighted Fusion: The penalty weights of the center distance (center_distance) and the width-height distance (wh_distance) are adjusted according to the target size to improve regression for large-scale targets. This follows the weight adjustment strategy of Shape-IoU [27], which accounts for the effect of target shape and scale on the loss.
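To make the three components concrete, the following is a minimal sketch of how they could be combined with the normalized Wasserstein distance of [12], in which each box (cx, cy, w, h) is treated as a 2-D Gaussian. The variable names are taken from the text, but the exact functional forms of `small_obj_weight`, `scale_penalty`, and the size-dependent weighting, as well as the constants `c`, `ref_area`, `alpha`, and `beta`, are hypothetical choices and should not be read as the published EnNWD formula.

```python
import math


def ennwd_like_loss(pred, target, c=12.8, ref_area=32.0 * 32.0,
                    alpha=0.5, beta=1.0, eps=1e-7):
    """Scalar loss for two boxes given as (cx, cy, w, h) in pixels."""
    cx1, cy1, w1, h1 = pred
    cx2, cy2, w2, h2 = target
    area_pred = w1 * h1
    area_target = w2 * h2

    # Squared 2-Wasserstein distance between the two box Gaussians [12],
    # split into its center and width-height parts.
    center_distance = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
    wh_distance = ((w1 - w2) ** 2 + (h1 - h2) ** 2) / 4.0

    # Hypothetical multi-scale weighting: larger targets put more weight
    # on the width-height term.
    size_ratio = min(1.0, area_target / ref_area)
    wasserstein_sq = center_distance + (1.0 + size_ratio) * wh_distance
    nwd = math.exp(-math.sqrt(wasserstein_sq) / c)

    # Hypothetical scale penalty: normalized area mismatch in [0, 1).
    scale_diff = abs(area_pred - area_target) / (area_pred + area_target + eps)
    scale_penalty = alpha * scale_diff

    # Hypothetical small-object weight: approaches 1 + beta for tiny targets
    # and 1 for large ones.
    small_obj_weight = 1.0 + beta * math.exp(-area_target / ref_area)

    return small_obj_weight * (1.0 - nwd) + scale_penalty


# The same 2-pixel center shift costs more for an 8 x 8 box than a 64 x 64 box.
small = ennwd_like_loss((12, 10, 8, 8), (10, 10, 8, 8))
large = ennwd_like_loss((12, 10, 64, 64), (10, 10, 64, 64))
```

With these choices the loss is zero for identical boxes, and an identical position error is penalized more heavily on the smaller target, which is the behavior the small target weight mechanism is meant to produce.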
4. Experiments
4.1. Dataset Description
4.1.1. Self-Built Weld Defect Dataset
4.1.2. Public Datasets
- NEU-DET [15]: 1800 steel surface defect images covering 6 defect types, including cracks, inclusions, and patches, whose multi-scale characteristics resemble those of weld defects. The dataset is split 8:1:1 under 10-fold cross-validation, and images are resized to 640 × 640.
- PCB Defect Dataset [29]: 1460 PCB defect images covering 5 defect types, with small, low-contrast targets (e.g., pinholes, open circuits) that resemble tiny weld pores. The same 10-fold cross-validation split and resizing strategy are used.
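One way to obtain an 8:1:1 split within 10-fold cross-validation is to rotate the folds so that, in each round, eight folds are used for training, one for validation, and one for testing. The sketch below illustrates that reading; the rotation scheme, the function name `ten_fold_splits`, and the seed are assumptions, since the exact protocol is not spelled out here.

```python
import random


def ten_fold_splits(n_images, k=10, seed=0):
    """Yield (train, val, test) index lists with an 8:1:1 ratio per round."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    # Interleaved slicing gives k folds of (almost) equal size.
    folds = [indices[i::k] for i in range(k)]
    for r in range(k):
        v = (r + 1) % k  # assumed: the next fold serves as validation
        test = folds[r]
        val = folds[v]
        train = [i for f_id, fold in enumerate(folds)
                 if f_id not in (r, v) for i in fold]
        yield train, val, test


# NEU-DET has 1800 images, so each fold holds 180 of them.
splits = list(ten_fold_splits(1800))
train, val, test = splits[0]
```

For the 1800 NEU-DET images this gives 1440 training, 180 validation, and 180 test images in every round.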
4.2. Experimental Setup
- Training Hyperparameters: 100 epochs, batch size = 64 (mixed precision), SGD optimizer (initial learning rate = 0.01, momentum = 0.937, weight decay = 0.0005).
- Loss Weights: 7.5 for bounding box regression, 0.5 for classification, and 1.5 for distribution focal loss.
- Metrics: Mean Average Precision (mAP@0.5, mAP@0.5:0.95), Precision (P), Recall (R), model size (MB), and real-time inference speed (FPS).
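The loss weights listed above act as fixed multipliers on the three loss terms. A small sketch of that weighted sum is given below; the dictionary keys and the function name `total_loss` are illustrative, and the individual loss values in the example are made up.

```python
# Loss weights from the experimental setup.
LOSS_WEIGHTS = {"box": 7.5, "cls": 0.5, "dfl": 1.5}


def total_loss(box_loss, cls_loss, dfl_loss, weights=LOSS_WEIGHTS):
    """Weighted sum of the regression, classification, and DFL terms."""
    return (weights["box"] * box_loss
            + weights["cls"] * cls_loss
            + weights["dfl"] * dfl_loss)


# Example with made-up per-term values: 7.5*0.2 + 0.5*0.4 + 1.5*0.1 = 1.85
example = total_loss(0.2, 0.4, 0.1)
```

The bounding box term carries by far the largest weight, so a change in the regression loss (where EnNWD acts) dominates the total.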
4.3. Evaluation Metrics
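The metrics named in the experimental setup have standard definitions, sketched below under the usual conventions: precision is TP / (TP + FP), recall is TP / (TP + FN), a prediction counts as a true positive when its IoU with a ground-truth box reaches the chosen threshold, and mAP@0.5:0.95 averages the AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The helper names are illustrative, and the evaluation code used for the reported numbers may differ in detail.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision and recall from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# The ten IoU thresholds behind mAP@0.5:0.95.
IOU_THRESHOLDS = [0.5 + 0.05 * i for i in range(10)]


def mean_over_thresholds(ap_by_threshold):
    """Average the per-threshold AP values into a single score."""
    return sum(ap_by_threshold) / len(ap_by_threshold)
```

For instance, two 2 × 2 boxes offset by one pixel in each direction overlap in a 1 × 1 region, giving an IoU of 1/7, which would not count as a match at the 0.5 threshold.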
4.4. Ablation Study
- All models show an upward trend in mAP@0.5:0.95 as the number of training epochs increases.
- The proposed model (ours, red curve) consistently outperforms the other variants throughout training.
- It ultimately reaches an mAP@0.5:0.95 above 0.73, roughly 4 percentage points higher than the baseline model (blue curve) at approximately 0.69.
- This indicates that the collaborative optimization of WeldSimAM and EnNWD improves the model's detection accuracy across multiple IoU thresholds.
- The mAP@0.5 of each model rises rapidly in the early training stage and stabilizes in the later stage.
- The proposed model (ours, red curve) reaches a final mAP@0.5 close to 1.0.
- It performs slightly better than YOLOv11 + WeldSimAM (orange curve) and YOLOv11 + EnNWD (green curve).
- This further supports the performance gain of the dual-module optimization, although the margin at this threshold is small.
- Small weld detection map: Compact, small-scale weld regions (e.g., some of the samples labeled good).
- Horizontal weld detection map: Wide, transversely distributed qualified welds (e.g., good samples marked by red boxes).
- Vertical weld detection map: Long, longitudinally distributed defective welds (e.g., bad samples marked by green boxes).
- For the vertical welds (bad samples), the confidence of the baseline YOLOv11 is 0.60, which increases to 0.67 (with WeldSimAM), 0.73 (with EnNWD), and finally reaches 0.76 in the proposed model (an absolute gain of 0.16, about 27% relative to the baseline), indicating that the dual-module optimization yields its largest gain on long-strip defective welds.
- For the horizontal welds (good samples), the confidence of the baseline YOLOv11 is 0.71, which increases to 0.74 (with WeldSimAM), 0.74 (with EnNWD), and 0.78 in the proposed model. No variant falls below the baseline value of 0.71, indicating stable recognition of wide, qualified welds.
- For the small welds, the confidence rises steadily across the variants (from 0.71 to 0.78 for good samples), indicating that both modules help in recognizing small welds.
4.5. Comparative Analysis
4.5.1. Comparison with YOLO-Series Models (Self-Built Dataset)
4.5.2. Comparison with Non-YOLO SOTA Methods (Public Datasets)
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Khumaidi, A.; Yuniarno, E.M.; Purnomo, M.H. Welding defect classification based on convolution neural network (CNN) and Gaussian kernel. In Proceedings of the 2017 International Seminar on Intelligent Technology and Its Applications, Surabaya, Indonesia, 28–29 August 2017; pp. 268–273.
- Czimmermann, T.; Ciuti, G.; Milazzo, M.; Chiurazzi, M.; Roccella, S.; Oddo, C.M.; Dario, P. Visual-Based Defect Detection and Classification Approaches for Industrial Applications—A Survey. Sensors 2020, 20, 1459.
- Zhang, H.; Chen, Z.; Zhang, C.; Xi, J.; Le, X. Weld Defect Detection Based on Deep Learning Method. In Proceedings of the 2019 IEEE 15th International Conference on Automation Science and Engineering, Vancouver, BC, Canada, 22–26 August 2019; pp. 1184–1189.
- Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLOv8 [EB/OL]. GitHub Repository. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 19 November 2025).
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Cengil, E. Weld Defect Detection with YOLOv10. NATURENGS 2024, 5, 77–81.
- Wang, A.; Chen, H.; Liu, L.; Han, K.; Ding, G.; Yao, H. YOLOv10: Real-time end-to-end object detection. arXiv 2024.
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 658–666.
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. arXiv 2019.
- Yang, L.; Zhang, R.-Y.; Li, L.; Xie, X. SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; Volume 139, pp. 11863–11874.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Wang, J.; Xu, C.; Yang, W.; Yu, L. A normalized Gaussian Wasserstein distance for tiny object detection. arXiv 2021.
- Wu, L.; Chu, Y.K.; Yang, H.G.; Chen, Y.X. Sim-YOLOv8 Object Detection Model for DR Image Defects in Aluminum Alloy Welds. Chin. J. Lasers 2024, 51, 1602103.
- Xu, J.; Ye, D.; Zhang, S.; Wang, K.; Chen, S. Metallic surface defect detection via NWD-WIoU based on grayscale co-generation entropy gain. Appl. Intell. 2025, 55, 752.
- Song, K.; Yan, Y. A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects. Appl. Surf. Sci. 2013, 285, 858–864.
- Jocher, G. Ultralytics YOLO [EB/OL]. GitHub Repository. 2024. Available online: https://github.com/ultralytics/ultralytics (accessed on 15 July 2025).
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Wu, Z.; Jiao, C.; Sun, J.; Chen, L. Tire Defect Detection Based on Faster R-CNN. In Communications in Computer and Information Science; Springer: Singapore, 2020; Volume 1336, pp. 203–218.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022.
- Wang, C.Y.; Liao, H.Y.M.; Yeh, I.H. YOLOv9: Learning what you want to learn using programmable gradient information. arXiv 2024.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 2117–2125.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- Zhao, Z.; Huang, W.Q.; Li, T.; Zhu, J. eCBAM and saSIoU Co-Optimized YOLOv11 for Riverine Floating Garbage Classification Under Complex Aquatic Scenarios. Appl. Sci. 2026, 16, 651.
- Yu, J.; Jiang, Y.; Wang, Z.; Cao, Z.; Huang, T. UnitBox: An Advanced Object Detection Network. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016; pp. 521–525.
- Zhang, H.; Wang, S.; Wang, S.; Wang, Z. Shape-IoU: More accurate metric considering bounding box shape and scale. arXiv 2023, arXiv:2307.02155. Available online: https://arxiv.org/abs/2307.02155 (accessed on 9 November 2025).
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
- Wu, H.; Tang, M. PCB surface defect detection based on improved YOLOv7-tiny. In Proceedings of the 2023 5th International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI), Hangzhou, China, 15–17 December 2023; pp. 334–337.

| Method | Core Design | Advantages | Limitations | Our Improvements |
|---|---|---|---|---|
| Sim-YOLOv8 [13] | Vanilla SimAM + YOLOv8 | Parameter-free, lightweight | Isotropic, no directional feature modeling | WeldSimAM with 1 × 3/3 × 1 directional convolutions |
| NWD-WIoU [14] | NWD + WIoU hybrid loss | Robust to tiny object deviation | Ignores scale disparity between multi-scale defects | EnNWD with scale penalty and small-object weighting |
| eCBAM-YOLO [25] | Enhanced CBAM + YOLOv7 | Strong background suppression | Model size increased by 30% | Maintain 5.21 MB via parameter-free directional branch |
| EfficientDet-Lite3 [28] | Compound scaling + MobileNetV2 | Good multi-scale adaptation | Low speed (95 FPS) for its model size (4.8 MB) | 132 FPS with comparable size, task-specific optimization |
| YOLOv10-Weld [6] | YOLOv10 + data augmentation | Real-time (126 FPS) | No attention/loss optimization for weld defects | Dual optimization of attention and loss, 6 FPS improvement |

| Fusion Weight (Directional:Basic) | mAP@0.5 (%) | mAP@0.5:0.95 (%) | False Positive Rate (%) |
|---|---|---|---|
| 0.2:0.8 | 99.45 | 72.49 | 1.2 |
| 0.3:0.7 (Proposed) | 99.47 | 73.09 | 1.1 |
| 0.4:0.6 | 99.46 | 72.83 | 3.4 |

| Name | Specific Information |
|---|---|
| CPU | Intel(R) Xeon(R) Gold 5418Y (10 cores) |
| GPU | NVIDIA RTX 4090 |
| RAM | 120 GB |
| CUDA | 11.8 |
| PyTorch | 2.2.2 |
| Python | 3.10 |

| Model | mAP@0.5 (%) | mAP@0.5:0.95 (%) | P (%) | R (%) | Size (MB) | FPS |
|---|---|---|---|---|---|---|
| YOLOv11 (Baseline) | 99.35 ± 0.08 | 69.53 ± 0.32 | 99.36 ± 0.07 | 99.38 ± 0.06 | 5.21 | 128 ± 2 |
| YOLOv11 + WeldSimAM | 99.47 ± 0.06 * | 73.09 ± 0.28 ** | 99.43 ± 0.05 | 99.34 ± 0.07 | 5.21 | 130 ± 2 |
| YOLOv11 + EnNWD | 99.45 ± 0.07 * | 72.90 ± 0.30 ** | 99.12 ± 0.08 | 99.45 ± 0.05 | 5.21 | 129 ± 2 |
| Proposed Model | 99.48 ± 0.05 ** | 73.29 ± 0.25 ** | 99.67 ± 0.04 ** | 99.65 ± 0.04 ** | 5.21 | 132 ± 2 |

| Model | mAP@0.5 (%) | mAP@0.5:0.95 (%) | FPS |
|---|---|---|---|
| YOLOv11 + WeldSimAM (Backbone only) | 99.41 ± 0.07 | 72.15 ± 0.31 | 129 ± 2 |
| YOLOv11 + WeldSimAM (Head only) | 99.38 ± 0.08 | 71.82 ± 0.33 | 131 ± 2 |
| YOLOv11 + WeldSimAM (P3-P5 layers) | 99.47 ± 0.06 * | 73.09 ± 0.28 ** | 130 ± 2 |

| Model | mAP@0.5 (%) | mAP@0.5:0.95 (%) | P (%) | R (%) | Size (MB) | FPS |
|---|---|---|---|---|---|---|
| YOLOv5 | 99.15 ± 0.12 | 62.19 ± 0.45 | 97.69 ± 0.15 | 98.47 ± 0.11 | 4.43 | 115 ± 3 |
| YOLOv6 | 99.43 ± 0.09 | 66.12 ± 0.38 | 99.24 ± 0.08 | 99.74 ± 0.05 | 8.15 | 108 ± 3 |
| YOLOv8n | 99.25 ± 0.10 | 63.44 ± 0.42 | 97.57 ± 0.14 | 98.53 ± 0.10 | 5.36 | 125 ± 2 |
| YOLOv9t | 99.38 ± 0.08 | 68.32 ± 0.35 | 98.62 ± 0.11 | 99.50 ± 0.07 | 3.95 | 120 ± 2 |
| YOLOv10n | 99.38 ± 0.08 | 67.32 ± 0.36 | 98.02 ± 0.13 | 98.91 ± 0.09 | 5.48 | 126 ± 2 |
| YOLOv11 | 99.35 ± 0.08 | 69.53 ± 0.32 | 99.36 ± 0.07 | 99.38 ± 0.06 | 5.21 | 128 ± 2 |
| Proposed Model | 99.48 ± 0.05 ** | 73.29 ± 0.25 ** | 99.67 ± 0.04 ** | 99.65 ± 0.04 ** | 5.21 | 132 ± 2 |

| Model | Dataset | mAP@0.5 (%) | mAP@0.5:0.95 (%) | P (%) | R (%) | Size (MB) | FPS |
|---|---|---|---|---|---|---|---|
| Faster R-CNN (ResNet50) [17] | NEU-DET | 92.30 ± 0.52 | 48.50 ± 0.68 | 91.80 ± 0.45 | 90.20 ± 0.51 | 42.50 | 25 ± 1 |
| SSD (MobileNetV2) [19] | NEU-DET | 89.70 ± 0.58 | 45.30 ± 0.72 | 88.90 ± 0.50 | 87.60 ± 0.55 | 3.20 | 110 ± 3 |
| EfficientDet-Lite3 [28] | NEU-DET | 94.50 ± 0.35 | 52.50 ± 0.61 | 93.70 ± 0.38 | 92.80 ± 0.42 | 4.80 | 95 ± 2 |
| Proposed Model | NEU-DET | 97.80 ± 0.22 ** | 63.30 ± 0.40 ** | 97.20 ± 0.25 ** | 96.90 ± 0.28 ** | 5.21 | 132 ± 2 |
| Faster R-CNN (ResNet50) [17] | PCB | 90.60 ± 0.55 | 47.20 ± 0.70 | 89.90 ± 0.48 | 88.50 ± 0.53 | 42.50 | 23 ± 1 |
| SSD (MobileNetV2) [19] | PCB | 88.30 ± 0.60 | 43.70 ± 0.75 | 87.50 ± 0.52 | 86.80 ± 0.58 | 3.20 | 107 ± 3 |
| EfficientDet-Lite3 [28] | PCB | 93.20 ± 0.38 | 50.90 ± 0.63 | 92.60 ± 0.40 | 91.70 ± 0.45 | 4.80 | 92 ± 2 |
| Proposed Model | PCB | 97.00 ± 0.25 ** | 61.70 ± 0.42 ** | 96.50 ± 0.27 ** | 96.20 ± 0.30 ** | 5.21 | 132 ± 2 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Huang, W.; Cheng, Q.; Zhu, J. WeldSimAM and EnNWD Co-Optimization: Enhancing Lightweight YOLOv11 for Multi-Scale Weld Defect Detection. Technologies 2026, 14, 140. https://doi.org/10.3390/technologies14030140

