Article

Detection of Welding Defects Tracked by YOLOv4 Algorithm

School of Intelligent Manufacturing and Control Engineering, Shanghai Polytechnic University, Shanghai 201209, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(4), 2026; https://doi.org/10.3390/app15042026
Submission received: 13 January 2025 / Revised: 6 February 2025 / Accepted: 12 February 2025 / Published: 14 February 2025
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

The recall rate of the original YOLOv4 model for detecting internal defects in aluminum alloy welds is relatively low. To address this issue, this paper introduces an enhanced model, YOLOv4-cs1. The improvements include optimizing the stacking method of residual blocks, modifying the activation functions of different convolutional layers, and eliminating the downsampling layers in the PANet (Path Aggregation Network) to preserve edge information. Building on these enhancements, the YOLOv4-cs2 model further incorporates an improved Spatial Pyramid Pooling (SPP) module after the third and fourth residual blocks. The experimental results demonstrate that, compared with the original YOLOv4 model, the recall rates for pore and slag inclusion detection increase by 28.9% and 16.6% with YOLOv4-cs1, and by 45% and 25.2% with YOLOv4-cs2. Additionally, the mAP values of the two models reach 85.79% and 87.5%, increases of 0.98 and 2.69 percentage points, respectively, over the original YOLOv4 model.

1. Introduction

Aluminum alloy materials are not only corrosion resistant and easy to use and maintain, but they also significantly reduce the weight of structures. The aluminum processing industry has grown primarily through mature technology and low-cost replication and expansion, increasing production capacity and enabling widespread application in aerospace, construction, and rail transportation. Because the welding of aluminum alloy weldments is sensitive to many process parameters, both manual welding and robot-assisted automatic welding can produce defects such as pores, slag inclusion, and incomplete penetration. These welding defects pose potential safety hazards, including leakage and explosion, making weld defect detection a critical issue in the industry.
In industrial production, X-ray non-destructive testing is commonly used to detect internal defects in aluminum alloy weldments. However, prolonged manual review of X-ray images strains inspectors' eyesight, and human subjectivity and fatigue introduce errors that are unacceptable in engineering practice. Moreover, current real-time weld defect detection systems are limited in accuracy, efficiency, and false positive rates. It is therefore essential to develop an automated weld defect detection system that detects defects more effectively, quickly, and accurately [1].
Object detection is a fundamental task in computer vision, aimed at identifying object categories within images and precisely determining their locations. Object detection algorithms can be classified into convolution-based and transformer-based methods [2]. Convolution-based approaches can be further subdivided into two-stage and one-stage detectors. A two-stage detector consists of a first stage that proposes candidate bounding boxes and a second stage that refines and classifies these proposals; notable examples include SPPNet [3], Fast R-CNN [4], and Faster R-CNN with its Region Proposal Network (RPN) [5]. While these methods achieve high detection accuracy, they do so at the expense of processing speed [6]. In contrast, one-stage detectors such as YOLO [7], RetinaNet [8], and SSD [9] directly predict both the bounding boxes and class probabilities of objects within an image, thereby significantly increasing detection speed and better meeting practical application requirements. Zhao et al. [10] introduced a novel infrared aerial target detection method called YOLO-Mamba to address challenges such as distance dependence and computational complexity in existing approaches. Huang et al. [11] proposed an enhanced RetinaNet algorithm to improve detection performance under multi-scale variations in targets. To tackle the detection of occluded objects, Biffi et al. [12] developed a deep learning method based on Adaptive Training Sample Selection (ATSS), which labels only the center points of objects, thereby enhancing practicality.
Weld defect detection, as a specialized application of computer vision, is a promising research area, particularly given the critical importance of oil and natural gas as primary energy sources, which gives the task a strong practical foundation. Currently, experts in weld seam detection focus primarily on challenges such as dataset scarcity, multi-scale defect detection, and tiny target detection. Guo et al. [13] proposed a detection method that integrates Generative Adversarial Networks (GANs) with transfer learning to balance data distribution and augment image samples, thereby addressing data imbalance. Kumaresan et al. [14] explored an image-centric approach and employed real-time image data augmentation to overcome the limitations of X-ray datasets. Ji et al. [15] tackled inconsistent scales and wide boundary transitions in weld seam defects by using geometric transformation-based data augmentation and a Feature Pyramid Network (FPN) to improve detection accuracy. Liu et al. [16] addressed significant shape and size variations by designing multi-scale feature extraction modules, proposing the LF-YOLO method, which balances performance and computational cost. Pan et al. [17] introduced a gray value curve enhancement module and the WD-YOLO model to handle large shape and size variations. Yang et al. [18] proposed an end-to-end detection model with bidirectional convolutional LSTM blocks to optimize shortcuts, addressing the lack of time-based information in existing methods. To address the weak generalization of anchor-based detectors for large-scale defects, Zuo et al. [19] proposed an efficient anchor-free detector with Dynamic Receptive Field Allocation (DRFA) and task alignment, improving both localization and classification. Additionally, Zuo et al. [20] designed a Multilevel Attention Feature Fusion Network (MAFFN) to enhance prediction accuracy for multi-scale defects.
Gu Jing et al. [21] proposed an enhanced Faster-RCNN model for detecting weld defects. The model employs a multi-layer feature extraction network and incorporates multiple sliding windows to improve detection performance. Guo Feng et al. [22] primarily investigated the impact of different activation functions, including Mish, Swish, and Leaky-ReLU, on the detection results of the YOLOv4 model. Their findings indicate that the simultaneous use of the Swish and Mish activation functions in the YOLOv4 model yields the best detection performance. Inspired by these studies, this manuscript introduces improvements to the original YOLOv4 model. Compared with existing algorithms, the improved model demonstrates superior performance metrics in defect detection, providing more accurate results across various types of defects.
This paper is organized as follows: Section 1 introduces deep learning models and their applications; Section 2 details the prediction principles, activation functions, and overall framework of the YOLOv4 algorithm; Section 3 presents the improved YOLOv4-cs1 and YOLOv4-cs2 models, along with diagrams of their frameworks; and Section 4 describes the experimental conditions, hardware setup, and analysis of the experimental results. Finally, a comparative chart of the different models’ performance in defect detection is provided.

2. YOLOv4 Algorithm

The YOLOv4 algorithm is an enhanced version of the YOLOv3 algorithm. As illustrated in Figure 2, YOLOv4 comprises three main components: the backbone feature extraction network CSPDarknet53, the feature fusion network PANet, and the prediction head. Specifically, CSPDarknet53 integrates Darknet-53 with CSPNet, incorporating partial dense blocks and transition layers. The backbone feature extraction process involves three steps: (1) performing convolution, normalization, and activation function operations on the input image; (2) conducting preliminary feature extraction through five Res_body blocks; and (3) extracting feature maps with higher semantic information from the last three Res_body blocks. These feature maps are then convolved and processed through Spatial Pyramid Pooling (SPP) before being fed into the PANet for further feature refinement. Finally, the prediction results are generated via three YOLO Head layers. Mish, the activation function used in CSPDarknet53, is defined by Equation (1) and visualized in Figure 1.
$$\mathrm{Mish}(x) = x \times \tanh\left(\ln\left(1 + e^{x}\right)\right) \tag{1}$$
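For readers implementing the model, Equation (1) maps directly onto a few lines of PyTorch (the framework used for training in Section 4.3). The following is a minimal sketch; recent PyTorch releases also provide an equivalent built-in `nn.Mish`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    """Mish activation from Equation (1): x * tanh(ln(1 + exp(x)))."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # softplus(x) = ln(1 + exp(x)), computed in a numerically stable way
        return x * torch.tanh(F.softplus(x))
```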

3. Algorithm Improvement

3.1. k-Means++ Clustering

The original YOLOv4 anchor boxes are well suited to the Visual Object Classes (VOC) dataset; however, aluminum alloy weld defects are small target objects. To improve both the training efficiency and the prediction accuracy of the model, we employ the k-means++ algorithm to re-cluster the X-ray weld defect dataset and use the resulting clusters as the initial anchor boxes. Compared with traditional k-means [23], k-means++ [24] improves the selection of initial cluster centers by spreading them far apart, thereby achieving superior clustering performance. The input image resolution is set to 416 × 416 pixels. Considering the varying sizes of the different defect types, we determine nine anchor box dimensions: (300,29), (15,160), (135,24), (64,74), (41,39), (28,38), (51,20), (26,25), and (15,19). These anchor boxes are assigned to three feature layers of different scales: 13 × 13, 26 × 26, and 52 × 52.
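As an illustration of this step, the sketch below clusters ground-truth box sizes using scikit-learn's k-means++ initialization. It uses plain Euclidean distance on (width, height) pairs, whereas many YOLO pipelines cluster with an IoU-based distance, so treat it as a simplified stand-in rather than the authors' exact procedure:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_anchors(wh: np.ndarray, n_anchors: int = 9) -> np.ndarray:
    """Cluster ground-truth box sizes into anchor dimensions.

    wh: array of shape (N, 2) holding (width, height) pairs rescaled
    to the 416 x 416 network input.
    """
    km = KMeans(n_clusters=n_anchors, init="k-means++", n_init=10, random_state=0)
    km.fit(wh)
    anchors = km.cluster_centers_.round().astype(int)
    # Sort by area so the smallest anchors go to the 52 x 52 feature layer
    # and the largest to the 13 x 13 layer.
    return anchors[np.argsort(anchors.prod(axis=1))]
```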

3.2. Framework Optimization

3.2.1. YOLOv4-cs1

Beyond the anchor boxes optimized by k-means++ clustering, the network framework itself is modified to improve the speed and accuracy of aluminum alloy weld defect detection. Figure 3 illustrates the improved framework of CSPDarknet53. The residual blocks are sequentially stacked through downsampling with different strides, effectively addressing the degradation problem in deep models and fully utilizing the semantic information from higher layers.
However, as the network deepens and residual blocks are continuously fused, the number of parameters the model must train inevitably increases. To address this, all 3 × 3 standard convolutions in the PANet are replaced with depthwise separable convolutions [25], significantly reducing the number of training parameters. Since the pore and incomplete penetration defects in the training dataset (internal defect maps of aluminum alloy welds) are small targets, downsampling in the original YOLOv4 model can discard their edge information and thereby reduce the model's effectiveness. The downsampling layers and corresponding feature fusion operations in the PANet are therefore removed to preserve more effective edge information as the network depth increases. Additionally, the ReLU activation function of all 3 × 3 depthwise separable convolutions in the PANet is replaced with ReLU6, which caps activations at 6 and thus helps prevent numerical instability in low-precision computation. The mathematical expressions for the ReLU and ReLU6 activation functions are given in Equations (2) and (3), respectively, and their graphs are shown in Figure 4. The overall architecture of the improved YOLOv4-cs1 model is shown in Figure 5.
$$\mathrm{ReLU}(x) = \max(0, x) \tag{2}$$
$$\mathrm{ReLU6}(x) = \min(6, \max(0, x)) \tag{3}$$
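A minimal PyTorch sketch of the replacement block described above, pairing a 3 × 3 depthwise convolution with a 1 × 1 pointwise convolution and ReLU6; the exact layer ordering and normalization in the authors' network may differ:

```python
import torch.nn as nn

def dw_separable_conv(in_ch: int, out_ch: int, stride: int = 1) -> nn.Sequential:
    """3x3 depthwise convolution followed by a 1x1 pointwise convolution.

    ReLU6 (Equation (3)) clips activations at 6, keeping values
    well behaved in low-precision computation.
    """
    return nn.Sequential(
        # Depthwise: one filter per input channel (groups=in_ch).
        nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1, groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch),
        nn.ReLU6(inplace=True),
        # Pointwise: 1x1 convolution mixes channels.
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU6(inplace=True),
    )
```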

3.2.2. YOLOv4-cs2

While ensuring the depth of the model, issues such as feature extraction and gradient vanishing must be carefully addressed. The YOLOv4-cs2 model refines the residual block structure of the YOLOv4-cs1 network. Specifically, a 3 × 3 convolution is added to the large residual branch, connecting the input with the output of the last layer in the original residual block. This modification reduces the number of parameters while enhancing the extraction of local detailed features. The defects of interest in this study are pores, slag inclusion, and incomplete penetration. Pore and incomplete penetration defects often carry little edge information, which poses significant challenges for edge feature extraction. The SPP (Spatial Pyramid Pooling) layer in the original network significantly increases the receptive field and effectively separates important contextual features, playing a crucial role in extracting edge features from small targets. Accordingly, two additional SPP modules are incorporated into the YOLOv4-cs1 model, placed after the third and fourth residual blocks, respectively. The pooling kernels of these SPP modules are set to 3 × 3, 5 × 5, and 7 × 7, enabling the network to better capture edge information. Finally, the number of channels in the model is adjusted to reduce the overall parameter count. The architecture of the YOLOv4-cs2 model is shown in Figure 6.
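To make the added module concrete, here is a minimal PyTorch sketch of an SPP block with the stated 3 × 3, 5 × 5, and 7 × 7 kernels. Stride-1 pooling with padding k // 2 preserves spatial size so the pooled maps can be concatenated with the input; the concatenation quadruples the channel count, which fits the channel adjustment mentioned above (a following 1 × 1 convolution to restore the count is an assumption here, not confirmed by the paper):

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial Pyramid Pooling with stride-1 max pools (kernels 3, 5, 7)."""

    def __init__(self, kernels=(3, 5, 7)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernels
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate the input with each pooled map along the channel axis,
        # yielding 4x the input channels while keeping the spatial size.
        return torch.cat([x] + [p(x) for p in self.pools], dim=1)
```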

4. Experiment

4.1. Dataset

The dataset used in this experiment consists of images of internal defects in aluminum alloy welds captured by X-ray inspection; it is not publicly available. It covers three object classes: pores, slag inclusion, and incomplete penetration, totaling 1005 images. Because 1005 images are insufficient for effectively training deep learning models, data augmentation is applied. By rotating, cropping, and adjusting the contrast of the images, the dataset is expanded to contain more feature information; Figure 7a–c illustrate the augmentation process for incomplete penetration defects. After annotation with the LabelMe tool (its interface is shown in Figure 8), the augmented dataset comprises 5165 labeled images. The dataset is randomly divided into three subsets: 3616 images for training, 516 for validation, and 1033 for testing. Figure 9 shows the class distribution across the training, validation, and test sets. Note that the data used in the subsequent comparison experiments come from the test set. The augmented dataset improves the robustness of the model during training.
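A minimal torchvision sketch of the three augmentation operations shown in Figure 7; the parameter ranges are assumptions, since the paper does not specify them:

```python
from torchvision import transforms

# Rotation, cropping, and contrast adjustment, as in Figure 7a-c.
# The rotation angle, crop scale, and contrast factor are illustrative guesses.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=10),
    transforms.RandomResizedCrop(size=416, scale=(0.8, 1.0)),
    transforms.ColorJitter(contrast=0.3),
])
```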

4.2. Hardware Facilities

The hardware configuration used in the experiment is detailed in Table 1.

4.3. Model Training

All models in this experiment and comparison are implemented in PyTorch v2.1.0. Training employs the Adam optimizer with a freeze/unfreeze strategy. The specific parameter settings before and after unfreezing are detailed in Table 2.
After unfreezing, the batch size is reduced to 2 to prevent excessive data from exhausting GPU memory and halting the program midway.
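The freeze/unfreeze schedule in Table 2 can be sketched as follows. The `backbone` attribute is hypothetical, and reading Table 2's "Decay" as Adam weight decay and "Eps" as the Adam epsilon is an interpretation rather than something the paper states:

```python
import torch
from torch import nn

def configure_phase(model: nn.Module, freeze_backbone: bool, lr: float) -> torch.optim.Adam:
    """Set up one phase of the freeze/unfreeze schedule (Table 2)."""
    # Freeze or unfreeze the (hypothetical) CSPDarknet53 backbone attribute.
    for p in model.backbone.parameters():
        p.requires_grad = not freeze_backbone
    trainable = (p for p in model.parameters() if p.requires_grad)
    return torch.optim.Adam(trainable, lr=lr, weight_decay=5e-5, eps=1e-8)

# Phase 1, epochs 0-85: frozen backbone, batch size 8, lr = 1e-3.
# Phase 2, epochs 86-180: unfrozen backbone, batch size 2, lr = 1e-4.
```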

4.4. Experimental Results and Analysis

4.4.1. Comparison with Other Object Detection Models

To comprehensively evaluate the performance of the YOLOv4-cs2 model, several commonly used evaluation metrics are employed, including Precision, Recall, AP (Average Precision), and mAP (mean Average Precision). The calculation formulas for Precision and Recall are defined in Equations (4) and (5).
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{4}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{5}$$
Taking slag inclusion as an example, in Equations (4) and (5) TP is the number of slag inclusions correctly detected, FP is the number of detections incorrectly labeled as slag inclusions, and FN is the number of slag inclusions that were missed.
To further evaluate its detection performance, the proposed YOLOv4-cs2 model is compared with several other object detection methods, including YOLOv4, YOLOv3, YOLOv4-Tiny, CenterNet, and SSD. Table 3 summarizes the results in terms of precision and recall.
As shown in Table 3, the YOLOv4-Tiny, CenterNet, and SSD models achieve high precision for the three defect types, but their recall rates are relatively low, particularly for incomplete penetration. YOLOv3 and YOLOv4 improve recall for all three defect types while maintaining precision, yet their recall for specific defects, such as pores, remains low.
Due to the lack of obvious edge features and the tendency to lose edge information after convolution, deepening the network is crucial for extracting robust edge information from pores. YOLOv4-cs1 modifies the feature fusion mode of residual network blocks, thereby adjusting the network depth and ensuring proper gradient flow. Removing the downsampling layer helps minimize the loss of edge information, leading to better overall prediction results compared to the original YOLOv4 model, with a significant improvement in recall rate.
YOLOv4-cs2 builds upon YOLOv4-cs1 by adding two SPP (Spatial Pyramid Pooling) modules to enhance the semantic information of the feature maps obtained from the residual blocks. Additionally, it refines the large residual edges within the residual blocks, thereby improving edge feature extraction. As shown in Table 3, although the precision for pores and slag inclusion decreases slightly, the recall for these two defects improves significantly, by 25.79 and 18.53 percentage points, respectively, over the original YOLOv4 model, and by 9.24 and 6.32 percentage points over the YOLOv4-cs1 model.
As shown in Table 3, the precision of the YOLOv4-cs2 model decreases while its recall increases. To provide a comprehensive evaluation, the F1 score, the harmonic mean of precision and recall, is introduced; its formula is given in Equation (6).
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{6}$$
The values of F1 for three defects, calculated through the above equations, are shown in Table 4.
A higher F1 value indicates better detection performance of the model. As shown in the results, YOLOv4-Tiny, CenterNet, and SSD exhibit very low F1 values for detecting incomplete penetration defects, making these models unsuitable for this type of defect detection. Compared to YOLOv3 and the original YOLOv4 model, YOLOv4-cs1 demonstrates a significant increase in F1 values for pore and slag inclusion defects. Furthermore, YOLOv4-cs2 possesses higher-level semantic information and retains more defect edge information, resulting in higher F1 values for pore, slag inclusion, and incomplete penetration defects compared to YOLOv4-cs1, as illustrated in Figure 10.
The missed alarm rate is another metric for evaluating the model. It is the proportion of positive samples incorrectly classified as negative among all positive samples, i.e., it equals 1 − Recall. Its formula is given in Equation (7).
$$\mathrm{Missed\ alarm} = \frac{FN}{TP + FN} \tag{7}$$
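Equations (4)–(7) reduce to a few lines of Python; a minimal sketch for one defect class:

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall, F1, and missed alarm rate (Equations (4)-(7))."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "missed_alarm": 1.0 - recall,  # FN / (TP + FN)
    }
```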
The comparison results of different models regarding the missed detection rates are presented in Figure 11.
As shown in Figure 11, the missed detection rates of the YOLOv4-cs2 model for pores and slag inclusion are relatively low, demonstrating that the improved model has advantages in detecting these types of defects.
Two other important metrics for evaluating the model are AP (Average Precision) and mAP (mean Average Precision). AP measures the average precision for a specific defect type, while mAP represents the average precision across all defect types. The calculation formulas for AP and mAP are provided in Equation (8) and Equation (9), respectively.
$$AP_j = \frac{1}{N_j} \sum_{i=1}^{N_j} \mathrm{precision}_i \tag{8}$$
$$\mathrm{mAP} = \frac{1}{C} \sum_{j=1}^{C} AP_j \tag{9}$$
In Equation (8), AP_j is the average of the precision values over the N_j instances of defect type j. In Equation (9), mAP is the average of the AP values across all C defect classes. A comparison between the different models is presented in Table 5, and Figure 12 visualizes the AP and mAP results. The results indicate that the YOLOv4-cs2 model performs best overall, demonstrating improved detection performance.
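As a quick sanity check of Equation (9) against Table 5, averaging the three per-class APs reported for YOLOv4-cs2 gives a value close to, but not exactly, the published mAP, which suggests the per-class APs in the table were rounded before printing:

```python
from statistics import mean

# Per-class APs for YOLOv4-cs2, taken from Table 5 (rounded to whole percent).
ap = {"pores": 0.92, "slag_inclusion": 0.96, "incomplete_penetration": 0.74}
print(f"mAP = {mean(ap.values()):.4f}")  # 0.8733 vs. the reported 87.5%
```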

4.4.2. Comparison of Test Results

The detected defects are illustrated in Figure 13, Figure 14, Figure 15 and Figure 16, corresponding to the models listed in Table 5, from YOLOv4-cs2 to SSD. The tests focus on images of pores, slag inclusion, incomplete penetration, and combinations of all three defect types. Figure 13 shows that YOLOv4-Tiny did not detect all of the pores, while the CenterNet model misclassified some pores as slag inclusion. In Figure 14, which shows images of slag inclusion, the SSD model incorrectly identified one region as a pore and failed to detect a small slag inclusion.
Figure 15 clearly demonstrates that, during the detection of incomplete penetration, YOLOv4-cs1 missed the defect. Additionally, the detection frames of YOLOv4-Tiny and SSD identified locations that do not contain any defects. In the final inspection of images containing all three types of defects, it is evident that the YOLOv4-cs2 and YOLOv3 models perform better overall, with YOLOv4-cs2 showing greater accuracy in detecting incomplete penetration defects.

5. Conclusions

The modified YOLO models demonstrate enhanced automatic detection performance for three types of internal weld defects. The main conclusions are as follows:
  • An improved model, YOLOv4-cs1, is proposed. This model primarily modifies the fusion method involving residual blocks, the feature extraction approach of the PANet network, and the activation functions. As a result, the model can better learn edge information.
  • YOLOv4-cs2 further improves upon YOLOv4-cs1. In YOLOv4-cs2, the residual block structures and activation functions corresponding to different convolution kernels are modified to accelerate the learning of rare features. Two SPP (Spatial Pyramid Pooling) modules are added after the third and fourth residual blocks to expand the model’s receptive field.
  • The results indicate that the recall rates for pores and slag inclusion are significantly improved in both optimized models, which is attributed to their enhanced ability to learn edge information. In the future, we will continue to focus on designing an advanced intelligent detection system for aluminum alloy weld defects that is aimed at improving the safety and automation of equipment.

Author Contributions

Conceptualization, validation, writing—original draft preparation: Y.C. and Y.W.; writing—review and editing: Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (no. 51809161).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare that they have no competing financial interests that could affect this research.

References

  1. Du, W.; Shen, H.; Fu, J.; Zhang, G.; He, Q. Approaches for improvement of the X-ray image defect detection of automobile casting aluminum parts based on deep learning. NDT E Int. 2019, 107, 102144. [Google Scholar] [CrossRef]
  2. Arkin, E.; Yadikar, N.; Xu, X.; Aysa, A.; Ubul, K. A survey: Object detection methods from CNN to transformer. Multimed. Tools Appl. 2023, 82, 21353–21383. [Google Scholar] [CrossRef]
  3. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed]
  4. Girshick, R. Fast R-CNN. arXiv 2015, arXiv:1504.08083. [Google Scholar]
  5. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv 2015, arXiv:1506.01497. [Google Scholar] [CrossRef] [PubMed]
  6. Zhou, H.-Y.; Gao, B.-B.; Wu, J. Adaptive feeding: Achieving fast and accurate detections by adaptively combining object detectors. In Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3505–3513. [Google Scholar]
  7. Jiang, P.; Ergu, D.; Liu, F.; Cai, Y.; Ma, B. A review of yolo algorithm developments. Procedia Comput. Sci. 2022, 199, 1066–1073. [Google Scholar] [CrossRef]
  8. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  9. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference; Springer: Cham, Switzerland, 2016; Part I; pp. 21–37. [Google Scholar]
  10. Zhao, Z.; He, P. Yolo-mamba: Object detection method for infrared aerial images. Signal Image Video Process 2024, 18, 8793–8803. [Google Scholar] [CrossRef]
  11. Huang, L.; Wang, Z.; Fu, X. Pedestrian detection using retinanet with multi-branch structure and double pooling attention mechanism. Multimed. Tools Appl. 2024, 83, 6051–6075. [Google Scholar] [CrossRef]
  12. Biffi, L.J.; Mitishita, E.; Liesenberg, V.; dos Santos, A.A.; Gonçalves, D.N.; Estrabis, N.V.; Silva, J.d.A.; Osco, L.P.; Ramos, A.P.M.; Centeno, J.A.S.; et al. Atss deep learning-based approach to detect apple fruits. Remote Sens. 2020, 13, 54. [Google Scholar] [CrossRef]
  13. Guo, R.; Liu, H.; Xie, G.; Zhang, Y. Weld defect detection from imbalanced radiographic images based on contrast enhancement conditional generative adversarial network and transfer learning. IEEE Sens. J. 2021, 21, 10844–10853. [Google Scholar] [CrossRef]
  14. Kumaresan, S.; Aultrin, K.S.J.; Kumar, S.S.; Anand, M.D. Deep learning-based weld defect classification using vgg16 transfer learning adaptive fine-tuning. Int. J. Interact. Des. Manuf. 2023, 17, 2999–3010. [Google Scholar] [CrossRef]
  15. Ji, C.; Wang, H.; Li, H. Defects detection in weld joints based on visual attention and deep learning. NDT E Int. 2023, 133, 102764. [Google Scholar]
  16. Liu, M.; Chen, Y.; Xie, J.; He, L.; Zhang, Y. Lf-yolo: A lighter and faster yolo for weld defect detection of X-ray image. IEEE Sens. J. 2023, 23, 7430–7439. [Google Scholar] [CrossRef]
  17. Pan, K.; Hu, H.; Gu, P. Wd-yolo: A more accurate yolo for defect detection in weld x-ray images. Sensors 2023, 23, 8677. [Google Scholar] [CrossRef]
  18. Yang, L.; Xu, S.; Fan, J.; Li, E.; Liu, Y. A pixel-level deep segmentation network for automatic defect detection. Expert Syst. Appl. 2023, 215, 119388. [Google Scholar] [CrossRef]
  19. Zuo, F.; Liu, J.; Fu, M.; Wang, L.; Zhao, Z. An efficient anchor-free defect detector with dynamic receptive field and task alignment. IEEE Trans. Ind. Inform. 2024, 20, 8536–8547. [Google Scholar] [CrossRef]
  20. Zuo, F.; Liu, J.; Fu, M.; Wang, L.; Zhao, Z. STMA-net: A spatial transformation-based multi-scale attention network for complex defect detection with X-ray images. IEEE Trans. Instrum. Meas. 2024, 73, 5014511. [Google Scholar] [CrossRef]
  21. Gu, J.; Xie, Z.Q.; Zhang, X.Y. Weld Defect Detection based on Improved Deep Learning. J. Astronaut. Metrol. Meas. 2020, 40, 75–79. [Google Scholar]
  22. Guo, F.; Qian, Y.; Shi, Y.F. Real-time railroad track components inspection based on the improved YOLOv4 framework. Automat. Constr. 2021, 125, 103596. [Google Scholar] [CrossRef]
  23. Mi, J.; Wen, X.; Sun, C.; Lu, Z.; Jing, W. Energy-efficient and Low Package Loss Clustering in UAV-assisted WSN using Kmeans++ and Fuzzy Logic. In Proceedings of the 2019 IEEE/CIC International Conference on Communications Workshops in China, ICCC Workshops 2019, Changchun, China, 11–13 August 2019; pp. 210–215. [Google Scholar]
  24. Adnan, R.M.; Khosravinia, P.; Karimi, B.; Kisi, O. Prediction of hydraulics performance in drain envelopes using Kmeans based multivariate adaptive regression spline. Appl. Soft. Comput. 2021, 100, 107008. [Google Scholar] [CrossRef]
  25. Wang, G.; Yuan, G.; Li, T.; Lv, M. An multi-scale learning network with depthwise separable convolutions. IPSJ Trans. Comput. Vis. Appl. 2018, 10, 11. [Google Scholar] [CrossRef]
Figure 1. Mish activation function.
Figure 2. YOLOv4 framework diagram.
Figure 3. Improvements of CSPDarknet53.
Figure 4. Graphs of the ReLU and ReLU6 activation functions.
Figure 5. YOLOv4-cs1 framework diagram.
Figure 6. YOLOv4-cs2 framework diagram.
Figure 7. Data enhancement: (a) rotating, (b) cropping, and (c) increasing contrast.
Figure 8. LabelMe interface diagram.
Figure 9. Dataset partitioning.
Figure 10. F1 values of different models.
Figure 11. Missed detection rates of different models.
Figure 12. Comparison of AP and mAP for varied models.
Figure 13. Pores (red frames).
Figure 14. Slag inclusion (blue frames). (a) YOLOv4-cs2, (b) YOLOv4-cs1, (c) YOLOv4, (d) YOLOv3, (e) YOLOv4-Tiny, (f) CenterNet, (g) SSD.
Figure 15. Incomplete penetration (green frames). (a) YOLOv4-cs2, (b) YOLOv4-cs1, (c) YOLOv4, (d) YOLOv3, (e) YOLOv4-Tiny, (f) CenterNet, (g) SSD.
Figure 16. Images containing three defects simultaneously. (a) YOLOv4-cs2, (b) YOLOv4-cs1, (c) YOLOv4, (d) YOLOv3, (e) YOLOv4-Tiny, (f) CenterNet, (g) SSD.
Table 1. Hardware information.

| Processor | Graphics Card | Memory |
|---|---|---|
| AMD R7 4800H (TSMC, Taiwan, China) | NVIDIA GTX 1650 Ti (NVIDIA, Santa Clara, CA, USA) | 16 GB |
Table 2. Model parameters.

| Status | Size | Batch Size | Learning Rate | Decay | Eps | Epoch |
|---|---|---|---|---|---|---|
| Before unfreezing | 416 × 416 | 8 | 1 × 10⁻³ | 5 × 10⁻⁵ | 1 × 10⁻⁸ | 0–85 |
| After unfreezing | 416 × 416 | 2 | 1 × 10⁻⁴ | 5 × 10⁻⁵ | 1 × 10⁻⁸ | 86–180 |
Table 3. Comparison of results of different models.

| Model | Pores Precision | Pores Recall | Slag Inclusions Precision | Slag Inclusions Recall | Incomplete Penetration Precision | Incomplete Penetration Recall |
|---|---|---|---|---|---|---|
| YOLOv4-cs2 | 91.81% | 83.12% | 92.19% | 92% | 87.19% | 51.96% |
| YOLOv4-cs1 | 92.97% | 73.88% | 94.1% | 85.68% | 89.6% | 46.8% |
| YOLOv4 | 97.28% | 57.33% | 94.2% | 73.47% | 81.12% | 58.36% |
| YOLOv3 | 97.31% | 65.07% | 98.42% | 79.05% | 93.8% | 48.10% |
| YOLOv4-Tiny | 94.05% | 47.26% | 98.38% | 64.11% | 93.75% | 8.42% |
| CenterNet | 99.71% | 18.76% | 99.77% | 45.79% | 99.9% | 0.11% |
| SSD | 95.26% | 43.33% | 97.56% | 63.26% | 99.9% | 1.12% |
Table 4. F1 values of different models.

| Model | Pore F1 | Slag Inclusion F1 | Incomplete Penetration F1 |
|---|---|---|---|
| YOLOv4-cs2 | 0.87 | 0.92 | 0.65 |
| YOLOv4-cs1 | 0.82 | 0.89 | 0.62 |
| YOLOv4 | 0.72 | 0.83 | 0.68 |
| YOLOv3 | 0.78 | 0.88 | 0.64 |
| YOLOv4-Tiny | 0.63 | 0.78 | 0.15 |
| CenterNet | 0.32 | 0.63 | 0.002 |
| SSD | 0.6 | 0.77 | 0.02 |
Table 5. Comparison of AP and mAP.

| Model | Pore AP | Slag Inclusion AP | Incomplete Penetration AP | mAP |
|---|---|---|---|---|
| YOLOv4-cs2 | 92% | 96% | 74% | 87.5% |
| YOLOv4-cs1 | 90% | 96% | 72% | 85.79% |
| YOLOv4 | 88% | 92% | 75% | 84.81% |
| YOLOv3 | 88% | 95% | 76% | 86.30% |
| YOLOv4-Tiny | 71% | 84% | 43% | 65.93% |
| CenterNet | 87% | 93% | 74% | 84.75% |
| SSD | 83% | 91% | 49% | 74.11% |