Intelligent Defect Identification in Girth Welds of Phased Array Ultrasonic Testing Images Using Median Filtering, Spatial Enrichment, and YOLOv8

Bu, Mingzhe; Niu, Shengyuan; Li, Xueda; Han, Bin

doi:10.3390/met16050458

School of Materials Science and Engineering, China University of Petroleum, West Changjiang Road, Huangdao District, Qingdao 266580, China

* Author to whom correspondence should be addressed.

Metals 2026, 16(5), 458; https://doi.org/10.3390/met16050458

Submission received: 24 March 2026 / Revised: 13 April 2026 / Accepted: 14 April 2026 / Published: 22 April 2026

(This article belongs to the Special Issue Editorial Board Members’ Collection Series: “Welding and Joining” (2nd Edition))

Abstract

Girth welds are susceptible to defects under high internal pressure and stress. While phased array ultrasonic testing (PAUT) is widely used for non-destructive evaluation, manual inspection remains inefficient and highly dependent on expertise. Furthermore, existing deep learning models often struggle with low accuracy and high complexity. This paper proposes a PAUT defect classification method based on YOLOv8. First, median filtering is employed for denoising, and the results show that noise is effectively reduced while key features are preserved, with PSNR values of 35.132, 35.938, and 36.183 for slag inclusion, pores, and lack of fusion (LOF), respectively. Subsequently, the spatial enrichment algorithm (SEA) is applied to enhance image details without amplifying noise, yielding a PSNR of 33.71 and an SSIM of 0.96. Finally, the YOLOv8 model is implemented for defect recognition. Experimental results demonstrate that the proposed approach achieves a superior balance between precision and recall with high reliability. This method offers a robust and efficient solution for automated PAUT evaluation in practical engineering applications.

Keywords: non-destructive testing; pipeline safety; deep learning; image enhancement; convolutional neural networks

1. Introduction

Welding plays an indispensable role in modern industry and is widely used in sectors such as shipbuilding and pressure vessel manufacturing. Weld quality directly affects the safety and reliability of products, so ensuring the safe operation of girth welds is important for national energy security and social stability [1]. In the petroleum industry, chemical strategies, such as reagent mixtures that inhibit corrosion and salt precipitation, are applied to maintain pipeline interiors, while structural integrity must simultaneously be guaranteed through physical inspections [2]. National standards require effective non-destructive testing of girth welds to ensure their safe operation. Among the non-destructive testing methods, PAUT has become the main method because of its flexible adjustment and rich data [3]; its images effectively reflect the shape, size, position, and other characteristics of defects. However, PAUT images are usually evaluated through a large amount of manual work, which causes defects to be missed or misjudged [4]. Intelligent technology therefore urgently needs to be integrated to achieve safe operation and maintenance throughout the entire process. With the development of artificial intelligence, deep learning has shown good recognition performance for objects with varying textures, shapes, and brightness.

At present, mainstream algorithms for intelligent detection and recognition fall into two categories: two-stage object detection algorithms based on region proposals and one-stage algorithms based on regression. AlexNet [5] was the first deep convolutional neural network to achieve outstanding results in the ImageNet challenge. Lee proposed custom-trained object detection models for the quality monitoring of industrial fabrication processes [6]. On this basis, many scholars have developed and continuously improved a series of R-CNN algorithms, which have been widely applied in object detection. Li [7] developed an industrial DR (digital radiography) image defect detection model based on improved deep convolutional neural networks for the magnetic flux leakage signal of girth welds in long-distance pipelines. The model uses the Faster R-CNN object detection algorithm to extract target features through convolution and can effectively identify defects such as cracks, pores, and slag inclusions. The algorithm has high detection accuracy, but its high computational complexity, slow running speed, and poor real-time performance make it difficult to deploy on mobile devices.

The second category is the one-stage algorithm, which omits the generation of candidate boxes and directly produces target category probabilities and predicted boxes through regression. Representatives of this type include YOLOv1 to YOLOv8 [8]. These algorithms are more widely used because of their low computational complexity and high real-time performance. In recent years, a dual-layer routing attention mechanism has been added to the head of YOLOv5 to reduce missed detections caused by information loss, but the computational and parameter requirements increase significantly. Depthwise separable convolution and ECA (efficient channel attention) have been combined to design a lightweight convolution module [9]; this method improves the extraction of shallow information and reduces the number of network parameters, but the improvement in detection accuracy is not significant. A centralized feature pyramid and a mixed attention module, ACmix [10], have been added to YOLOv7 to enhance the network's sensitivity to small targets and to address the significant scale differences and complex backgrounds of remote sensing targets; however, some missed or false detections remain. A small object detection layer has been added to YOLOv7, together with a CSSTR module based on a cosine attention mechanism, to address large scale variation and insufficient target feature information, but targets with unclear features are still missed. Adaptive collaborative attention modules have been embedded in the backbone network, with an optimized detection head design, to improve the attention to and detection of small targets; however, there is still room for improvement under target occlusion.

These methods have shortcomings, such as trading a larger parameter count for accuracy and seriously missing small targets in complex scenes. It is important to address the low pixel count, low resolution, weak expressive power, complex background information, and severe false and missed detections associated with small targets [11]. While deep learning has revolutionized computer vision, its direct application to unenhanced PAUT images often yields suboptimal results. Recent studies demonstrate that standard convolutional neural networks (CNNs) struggle with the severe structural noise and inherently low contrast of ultrasonic imaging, frequently leading to high false-negative rates for micro-defects. Therefore, there is an urgent need to shift from pure network architecture design to physics-informed image processing coupled with deep learning. Three issues remain unresolved for PAUT girth weld inspection [12]:

(1): Domain-specific image quality: PAUT B/S-scan images suffer from coherent speckle, structural noise, and an intrinsically low edge contrast caused by the limited bandwidth of phased array probes. Therefore, the networks trained on raw PAUT images tend to confuse the small slag inclusions with background speckle and miss thin LOF strips at the weld root.
(2): Lack of a transparent, reproducible preprocessing pipeline: Most existing PAUT-AI papers use either raw images or single-step CLAHE enhancement without comparing alternatives or publishing the algorithm in sufficient detail to be reproduced.
(3): Insufficient experiments: The reported accuracies in the PAUT-AI literature are often based on a single train/test split, are not compared with strong baselines, and do not include ablation studies of the preprocessing modules, leaving the contribution of each component unverifiable.

This paper applies deep learning to the recognition of PAUT inspection images. The main research objective is to combine image preprocessing (denoising and enhancement) and deep learning with PAUT image evaluation. A deep-learning-based defect detection model for small objects is constructed, and its performance is improved and optimized. A highly automated PAUT defect detection system is designed to improve accuracy and efficiency. Specifically, this paper makes the following contributions:

(1): We design and explicitly formulate a SEA tailored to the PAUT speckle statistics, including its mathematical definition, parameter selection rationale and pseudocode.
(2): We construct an annotated PAUT girth weld dataset containing slag inclusion, porosity and LOF samples and document the acquisition, annotation and split protocol (Section 2.1), using it to systematically benchmark the proposed framework with a five-fold cross-validation.
(3): We provide rigorous comparative evidence through (i) a baseline comparison against YOLOv5n and YOLOv7-tiny, (ii) an ablation study isolating the contributions of denoising and enhancement, and (iii) a computational cost analysis demonstrating real-time feasibility (Section 3).

2. Image Processing

Welding defects are mainly classified as slag inclusion, cracks, porosity, and lack of fusion. Figure 1 shows typical defects of the girth weld.

2.1. Dataset Preparation

A comprehensive PAUT image dataset of girth welds was constructed for this study. The dataset consists of a total of 800 images, including 200 images of slag inclusion, 200 images of cracks, 200 images of porosity, and 200 images of a lack of fusion (LOF). The original image resolution is 640 × 640 pixels. To enhance the robustness of the model and prevent overfitting, data augmentation techniques, including random cropping, horizontal flipping, and brightness adjustment, were applied. The total dataset was randomly divided into a training set, a validation set, and an independent testing set in a ratio of 8:1:1, respectively.
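The 8:1:1 random split described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `split_dataset`, the fixed seed, and the file names are all assumptions made for the example.

```python
import random


def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle items and split them into train/val/test by integer ratios."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Hypothetical file names standing in for the 800 PAUT images.
image_names = [f"paut_{i:04d}.png" for i in range(800)]
train, val, test = split_dataset(image_names)
print(len(train), len(val), len(test))  # 640 80 80
```

One design consideration: splitting before augmenting keeps augmented copies of a test image out of the training set. The text does not state which order was used.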

2.2. Image Denoising

Because of the limitations of PAUT equipment, weld defects appear with low contrast, blurry edges, and a small spatial extent, and various types of noise, such as electrical noise and structural noise, interfere with the signal. This noise carries no useful information and seriously affects the detection of defects. Therefore, denoising is an important step to improve detection accuracy. Based on the characteristics of PAUT images, four candidate denoising methods were selected: Gaussian filtering, bilateral filtering, mean filtering, and median filtering.

Gaussian filtering is a linear filtering technique whose core idea is to convolve the image with a Gaussian kernel to achieve a smoothing effect. The formula is shown in Equation (1) [13].

G (u, v) = \frac{1}{2 π σ^{2}} e^{- \frac{u^{2} + v^{2}}{2 σ^{2}}}

(1)
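As a hedged illustration of Equation (1), the sketch below samples the Gaussian on a discrete grid, normalizes it to unit sum, and applies it to an image. The helper names `gaussian_kernel` and `convolve2d_same`, the odd kernel size, and the edge-replicated border are assumptions of this example; the source does not specify how borders are handled.

```python
import numpy as np


def gaussian_kernel(size=5, sigma=1.0):
    """Sample Equation (1) on a size x size grid (size odd) and normalize."""
    half = size // 2
    u, v = np.meshgrid(np.arange(-half, half + 1), np.arange(-half, half + 1))
    kernel = np.exp(-(u**2 + v**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return kernel / kernel.sum()  # unit sum keeps the mean brightness unchanged


def convolve2d_same(image, kernel):
    """Filter a 2-D image; borders are edge-replicated (an assumption).

    This is a sliding weighted sum, which equals convolution because the
    Gaussian kernel is symmetric.
    """
    kh, kw = kernel.shape
    h, w = image.shape
    padded = np.pad(image.astype(np.float64),
                    ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


kernel = gaussian_kernel(size=5, sigma=1.0)
smoothed = convolve2d_same(np.random.default_rng(0).normal(100.0, 10.0, (32, 32)), kernel)
print(kernel.shape, smoothed.shape)
```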

Unlike traditional linear filtering methods, bilateral filtering is a nonlinear filtering method. The core idea of bilateral filtering is based on two weight functions used for spatial distance and pixel value similarity. Bilateral filtering adjusts the convolution kernel of the filter by multiplying these two weight functions to achieve image smoothing. Bilateral filtering can remove noise while preserving the edges and details of the image. In PAUT images, defects typically manifest as subtle edge and contrast changes. Bilateral filtering can effectively reduce noise while preserving these key features. The formula for bilateral filtering is shown in Equation (2) [14].

I^{'} (x) = \frac{1}{ω_{p}} \sum_{x_{i} \in Ω} I (x_{i}) \exp (- \frac{| x - x_{i} |^{2}}{2 σ_{d}^{2}}) \exp (- \frac{| I (x) - I (x_{i}) |^{2}}{2 σ_{r}^{2}})

(2)

where I^{'} (x) is the filtered image, ω_{p} is the normalization factor, σ_{d} controls the spatial distance weight, and σ_{r} controls the pixel value similarity weight.
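A direct, unoptimized reading of Equation (2) might look like the sketch below. The function name `bilateral_filter`, the window radius, the default values of σ_d and σ_r, and the edge-replicated border are illustrative assumptions; none of them come from the source.

```python
import numpy as np


def bilateral_filter(image, radius=2, sigma_d=2.0, sigma_r=25.0):
    """Naive bilateral filter following Equation (2) on a 2-D grayscale image."""
    img = image.astype(np.float64)
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    weighted_sum = np.zeros_like(img)
    weight_total = np.zeros_like(img)  # accumulates the normalization factor
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # neighbor[y, x] is the pixel at offset (dy, dx) from (y, x).
            neighbor = padded[radius + dy:radius + dy + h,
                              radius + dx:radius + dx + w]
            spatial = np.exp(-(dx**2 + dy**2) / (2.0 * sigma_d**2))
            similarity = np.exp(-((img - neighbor) ** 2) / (2.0 * sigma_r**2))
            weight = spatial * similarity
            weighted_sum += weight * neighbor
            weight_total += weight
    return weighted_sum / weight_total


# Toy step edge: with a small sigma_r, pixels across the edge get almost no weight.
step = np.zeros((6, 8))
step[:, 4:] = 200.0
filtered = bilateral_filter(step, sigma_r=10.0)
print(np.abs(filtered - step).max())
```

Because the similarity weight collapses across a large intensity jump, the step edge passes through nearly unchanged, which is the edge-preserving behavior the text describes.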

Mean filtering is a simple linear filtering technique: a fixed-size window is moved over the image, and the central pixel value is replaced with the average of all pixel values within the window. This operation suppresses uniformly distributed noise and is easy to implement, but it also averages across edges and fine details. In tasks that depend on those features, such as edge detection, mean filtering may therefore not be ideal because it leads to a loss of image details and blurring of edges. The formula is shown in Equation (3) [15].

\begin{array}{l} I^{'} (x, y) = \frac{1}{n^{2}} \sum_{i = - k}^{k} \sum_{j = - k}^{k} I (x + i, y + j) \\ k = \frac{n - 1}{2} \end{array}

(3)

where n is the side length of the square window.
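The text gives no formula for median filtering, the fourth candidate. Under the standard definition, each pixel is replaced by the median of its window rather than the mean. The sketch below contrasts the two on a toy impulse; the helper names and the edge-replicated border are assumptions of this example.

```python
import numpy as np


def sliding_windows(image, n):
    """Return an (H, W, n*n) stack holding the n x n neighborhood of each pixel."""
    k = (n - 1) // 2
    h, w = image.shape
    padded = np.pad(image.astype(np.float64), k, mode="edge")
    return np.stack([padded[i:i + h, j:j + w]
                     for i in range(n) for j in range(n)], axis=-1)


def mean_filter(image, n=3):
    """Equation (3): average over the n x n window."""
    return sliding_windows(image, n).mean(axis=-1)


def median_filter(image, n=3):
    """Standard median filter: middle value of the n x n window."""
    return np.median(sliding_windows(image, n), axis=-1)


# Toy example: a flat background of 100 with a single impulse of 255.
image = np.full((5, 5), 100.0)
image[2, 2] = 255.0
print(median_filter(image)[2, 2])  # 100.0, the impulse is removed
print(mean_filter(image)[2, 2])    # (8 * 100 + 255) / 9, the impulse is smeared
```

The median discards the outlier entirely, while the mean spreads it over the neighborhood, which is consistent with the later choice of median filtering for speckle-like noise.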

To evaluate the noise reduction results quantitatively, three indicators are introduced: mean square error (MSE), peak signal-to-noise ratio (PSNR), and signal-to-noise ratio (SNR).

Mean squared error (MSE) is a commonly used metric to measure the quality of image reconstruction or processing. The MSE is used to quantify the differences between the original image and the reconstructed or processed image. The calculation formula [16] for the mean square error is as follows.

MSE = \frac{1}{N} \sum_{i = 1}^{N} {(I (i) - K (i))}^{2}

(4)

where N is the total number of pixels in the image, I (i) is the pixel value of the original image, and K (i) is the pixel value of the reconstructed image.

The peak signal-to-noise ratio (PSNR) is commonly used to evaluate the performance of image processing, such as image compression and denoising. The calculation of the PSNR is based on the MSE between images, which provides a simple and intuitive way to quantify the quality of image reconstruction or processing, as well as the differences between the original images. The calculation formula for PSNR is shown in Equation (5) [17].

PSNR = 10 \log_{10} (\frac{MAX_{I}^{2}}{MSE})

(5)

where MAX_{I} is the maximum possible pixel value of the image.
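Equations (4) and (5) translate almost line for line into code. The sketch below assumes 8-bit images, so MAX_I = 255; the function names are illustrative.

```python
import numpy as np


def mse(original, processed):
    """Equation (4): mean squared pixel difference."""
    # Cast to float first so that uint8 subtraction cannot wrap around.
    diff = original.astype(np.float64) - processed.astype(np.float64)
    return float(np.mean(diff ** 2))


def psnr(original, processed, max_value=255.0):
    """Equation (5): peak signal-to-noise ratio in dB (max_value is assumed)."""
    error = mse(original, processed)
    if error == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / error))


# Worked example: every pixel differs by 5 gray levels, so MSE = 25.
a = np.full((4, 4), 100, dtype=np.uint8)
b = np.full((4, 4), 105, dtype=np.uint8)
print(mse(a, b))             # 25.0
print(round(psnr(a, b), 2))  # 10 * log10(255^2 / 25), about 34.15 dB
```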

2.3. Image Enhancement

After denoising, issues such as low contrast and blurry details remain and affect recognition, so the image must be enhanced to make the defects clearer. CLAHE (contrast limited adaptive histogram equalization) and the spatial enrichment algorithm (SEA) are used to preprocess the training data and enhance the contrast, clarity, and details of the training images; the better enhancement method is then selected by comparing SSIM and edge strength.

CLAHE is an improved histogram equalization technique that enhances the local contrast of an image while limiting the amplification of noise. The calculation process is as follows: firstly, the image is divided into several non-overlapping small blocks, and the grayscale histogram for each block is calculated separately [18].

H (k) = \sum_{i, j} δ (I (i, j) - k)

(6)

where H (k) is the number of pixels at gray level k, I (i, j) is the gray value at pixel (i, j), k is the gray level, and δ (·) equals 1 when its argument is zero and 0 otherwise.

Afterwards, contrast limitation is applied to the histogram, limiting the frequency in the histogram to a specified threshold to prevent certain grayscale values from causing excessive noise. The cropped histogram is represented as Equation (7) [19]:

H^{'} (k) = \begin{cases} C, & H (k) > C \\ H (k), & \text{otherwise} \end{cases}

(7)

where C is the clipping threshold.

The excess number of pixels is evenly distributed to all gray levels:

\begin{array}{l} R = \sum_{k : H (k) > C} (H (k) - C) \\ H^{''} (k) = H^{'} (k) + \frac{R}{L} \\ CDF (k) = \sum_{i = 0}^{k} H^{''} (i) \\ I_{new} (i, j) = \frac{CDF (I (i, j)) - CDF_{\min}}{(M \times N) - CDF_{\min}} \times (L - 1) \end{array}

(8)

where R is the total number of excess pixels, M × N is the total number of pixels in the small block, L is the number of gray levels, and CDF_{\min} is the non-zero minimum cumulative distribution value. To avoid obvious boundaries between small blocks, bilinear interpolation is used to smooth the transition areas between blocks. For the pixel value I_{final} (x, y) at position (x, y), the formula is as follows [20]:

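I_{final} (x, y) = (1 - a) (1 - b) I_{i, j} + a (1 - b) I_{i + 1, j} + (1 - a) b I_{i, j + 1} + a b I_{i + 1, j + 1}

(9)

where a and b are the relative positions of the pixel within the small block, and I_{i, j}, I_{i + 1, j}, I_{i, j + 1}, and I_{i + 1, j + 1} are the equalized values given by the four nearest blocks. Through these steps, CLAHE not only enhances the local contrast of the image but also prevents the noise amplification caused by the excessive enhancement of global contrast.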
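The per-block part of CLAHE (histogram, clipping, redistribution, and CDF mapping) can be sketched for a single tile as shown below. This is a simplified illustration under stated assumptions: 8-bit input, the excess spread evenly over all 256 gray levels, and no tiling or bilinear blending between blocks. The function name `equalize_tile` and the clip limits are hypothetical.

```python
import numpy as np

LEVELS = 256  # assumed 8-bit gray scale


def equalize_tile(tile, clip_limit):
    """Clip, redistribute, and equalize the histogram of one uint8 tile."""
    hist = np.bincount(tile.ravel(), minlength=LEVELS).astype(np.float64)
    clipped = np.minimum(hist, clip_limit)      # cap every bin at the limit
    excess = hist.sum() - clipped.sum()         # total count removed by clipping
    redistributed = clipped + excess / LEVELS   # spread the excess evenly
    cdf = np.cumsum(redistributed)
    cdf_min = cdf[cdf > 0].min()
    denom = tile.size - cdf_min
    if denom <= 0:                              # degenerate tile: nothing to stretch
        return tile.copy()
    mapping = (cdf - cdf_min) / denom * (LEVELS - 1)
    mapping = np.clip(np.rint(mapping), 0, LEVELS - 1).astype(np.uint8)
    return mapping[tile]                        # look up the new value per pixel


# Low-contrast toy tile: gray levels 100..115, four pixels each.
tile = np.repeat(np.arange(100, 116, dtype=np.uint8), 4).reshape(8, 8)
loose = equalize_tile(tile, clip_limit=8)   # limit above every bin: no clipping
tight = equalize_tile(tile, clip_limit=2)   # bins are clipped, gain is limited
print(int(loose.max()) - int(loose.min()), int(tight.max()) - int(tight.min()))
```

A lower clip limit yields a smaller contrast stretch, which is how CLAHE restrains noise amplification in nearly uniform regions.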

To further highlight the defect edges without amplifying the noise, the spatial enrichment algorithm (SEA) is utilized. The fundamental assumption of the SEA is that the essential morphological features of weld defects manifest as high-frequency spatial gradients, whereas the background consists of low-to-mid frequency structural signals. Because random electrical noise was already mitigated by the prior median filtering step, the SEA can safely amplify these high-frequency edges.

The detailed mathematical formulation is implemented as follows. First, a Gaussian low-pass filter (with a kernel size of 5 \times 5 and σ = 1.0) generates a blurred version of the original image, denoted as I_{blur} (x, y). The high-frequency mask M (x, y) is then extracted:

M (x, y) = I (x, y) - I_{blur} (x, y)

(10)

Finally, the enhanced image I_{enhanced} (x, y) is reconstructed by adding the scaled mask back to the original image:

I_{enhanced} (x, y) = I (x, y) + k \cdot M (x, y)

(11)

where k is the scaling factor controlling the enrichment strength. In our implementation, k was empirically set to 1.5 to achieve the optimal visual contrast without introducing artifacts.
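Equations (10) and (11), with the stated 5 × 5 kernel, σ = 1.0, and k = 1.5, can be sketched as follows. The function names, the edge-replicated border, and the final rounding and clipping to the 8-bit range are assumptions of this example; the text does not specify them.

```python
import numpy as np


def gaussian_blur(image, size=5, sigma=1.0):
    """Blur with a normalized size x size Gaussian (borders edge-replicated)."""
    half = size // 2
    ax = np.arange(-half, half + 1)
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    kernel /= kernel.sum()
    h, w = image.shape
    padded = np.pad(image.astype(np.float64), half, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(size):
        for j in range(size):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def spatial_enrichment(image, k=1.5, size=5, sigma=1.0):
    """Add the scaled high-frequency mask back to the image."""
    img = image.astype(np.float64)
    mask = img - gaussian_blur(img, size, sigma)  # Equation (10)
    enhanced = img + k * mask                     # Equation (11)
    # Rounding and clipping to [0, 255] is an assumption for 8-bit output.
    return np.clip(np.rint(enhanced), 0, 255).astype(np.uint8)


# Toy step edge between gray levels 50 and 150.
step = np.full((8, 16), 50, dtype=np.uint8)
step[:, 8:] = 150
enhanced = spatial_enrichment(step)
print(enhanced.min(), enhanced.max())  # darker than 50 and brighter than 150 near the edge
```

Flat regions are left unchanged because the mask is zero there, while the two sides of the edge are pushed apart, which increases local edge contrast.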

2.4. Image Recognition

The configuration of YOLOv8 defines the key parameters and structure of the model, including the number of categories, model size, backbone, and head structure (Figure 2). These configurations determine the performance and complexity of the model. The backbone is responsible for extracting image features. The neck fuses low-level and high-level features through upsampling and downsampling operations and cross-layer connections. The head is responsible for feature classification and regression prediction, using a decoupled head structure and an anchor-free approach to improve detection performance and accuracy. YOLOv8 has several practical advantages. First, it runs flexibly on both CPU and GPU devices. Second, its dataset format is relatively simple. Third, it supports object detection, instance segmentation, and image classification tasks and is compatible with previous YOLO versions. Finally, trained models can be exported efficiently to multiple formats. Based on these advantages, YOLOv8 is chosen for this study.

To ensure that the different experiments are comparable, the same parameters are used for all deep learning runs. The parameters of the operating platform for this project are shown in Table 1.

A confusion matrix is used in machine learning to evaluate the performance of classification models. It shows the correspondence between the model's predictions on the test dataset and the real labels. When predicting results, the YOLOv8 network model treats the detection task as a binary classification problem based on a threshold. Predicted positive samples are labeled P (positive) and predicted negative samples are labeled N (negative); combined with whether each prediction is correct, this gives true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).

Accuracy is one of the important indicators for evaluating model performance, which represents the proportion of correct predictions. The formula for calculating accuracy is as follows [21].

A c c u r a c y = \frac{T P + T N}{T P + T N + F P + F N}

(12)

The recall measures the proportion of actual positive samples that a model correctly predicts as positive. A high recall indicates that the model effectively captures the positive examples, that is, it successfully identifies most of the positive samples. The specific formula is as follows [22].

R = \frac{T P}{T P + F N}

(13)
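Equations (12) and (13) can be computed directly from confusion-matrix counts. The sketch below also includes precision, TP / (TP + FP), which the text uses later without giving a formula; that formula is the standard definition. The function name and the counts are hypothetical and are not results from this study.

```python
def detection_metrics(tp, fp, fn, tn=0):
    """Accuracy (Eq. 12), recall (Eq. 13), and precision from raw counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = detection_metrics(tp=90, fp=5, fn=10, tn=95)
print(metrics["accuracy"])   # (90 + 95) / 200 = 0.925
print(metrics["recall"])     # 90 / 100 = 0.9
print(metrics["precision"])  # 90 / 95
```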

The mAP (mean average precision) is one of the important indicators used to evaluate object detection models, especially for tasks with multiple categories and targets. The mAP takes into account the precision and recall of every category. YOLOv8 reports mAP@0.5 and mAP@0.5:0.95, which differ in the overlap (intersection over union, IoU) required between predicted and real bounding boxes: the former uses a single IoU threshold of 0.5, and the latter averages over thresholds from 0.5 to 0.95.
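The overlap measure behind the mAP thresholds is the intersection over union (IoU) of a predicted box and a ground-truth box. The sketch below computes the IoU and checks a prediction against thresholds from 0.5 to 0.95 in steps of 0.05. The step of 0.05 is the common COCO-style convention and is an assumption here, as are the corner box format and the function names.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    inter_h = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not overlap
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def matches_per_threshold(predicted, truth):
    """For each IoU threshold 0.50, 0.55, ..., 0.95, is the prediction a match?"""
    thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
    overlap = iou(predicted, truth)
    return [(t, overlap >= t) for t in thresholds]


# Hypothetical boxes: the prediction covers 82% of the ground-truth box.
truth = (0, 0, 100, 100)
predicted = (0, 0, 100, 82)
print(iou(predicted, truth))  # 0.82
print(sum(ok for _, ok in matches_per_threshold(predicted, truth)))  # 7
```

A detection that counts as correct at IoU 0.5 can fail at stricter thresholds, which is why mAP@0.5:0.95 is usually lower than mAP@0.5.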

3. Results and Discussion

3.1. Image Denoising

An unprocessed original image is subjected to bilateral filtering, Gaussian filtering, median filtering, and mean filtering. The denoised images are shown in Figure 3.

To compare the noise reduction methods, images of slag inclusion, lack of fusion, and porosity are randomly selected from the dataset, processed with each method, and scored with the evaluation indicators described above. The MSE of each filtering method is shown in Table 2.

The MSE of bilateral filtering and median filtering is relatively low, indicating that the images denoised by these two methods differ little from the original image at the pixel level: the denoising effect is good, and little damage is done to the image.

The PSNR is the ratio of peak signal to average noise energy, and the larger the PSNR value, the better the image processing effect. The PSNR of each filtering method is shown in Table 3.

As shown in Table 3, the restored PSNR varies across different defect types (e.g., LOF achieves a higher PSNR of 36.183, while slag inclusion yields 35.132). This variation is fundamentally attributed to their distinct acoustic morphologies and echo characteristics in PAUT. Pores and LOF typically present as concentrated, high-contrast indications with relatively distinct boundaries, allowing the median filter to efficiently suppress surrounding noise without blurring the main signal. Conversely, slag inclusions often exhibit irregular, scattered, and weaker echo responses that are heavily entangled with the background structural noise. Consequently, denoising slag inclusions is inherently more challenging, resulting in a comparatively lower, though still significantly improved, PSNR.

For every three dB increase in PSNR, the image quality significantly improves. The image quality after bilateral filtering and median filtering is higher. Although mean filtering achieves the lowest MSE and the highest PSNR numerically, it tends to over-smooth the image and blur critical high-frequency edge details. After comparing the indicators and visual outcomes, median filtering and bilateral filtering perform better in practical PAUT inspection tasks than Gaussian filtering and mean filtering, because both effectively reduce the speckle noise in the image while preserving the sharp edges of defects. After a comprehensive comparison, median filtering is ultimately chosen to denoise the image.

3.2. Image Enhancement

CLAHE and the SEA (a form of sharpening masking) are applied for enhancement, as shown in Figure 4.

As the enhanced images show, both CLAHE and the SEA strengthen the edge features. However, CLAHE also amplifies some of the background noise, while the SEA weakens it. In addition, the PSNR, SSIM, contrast, and edge strength of the two methods are calculated, as shown in Table 4.

The PSNR values of the two methods are quite similar, and the image quality is relatively good for both. For SSIM, however, the SEA is better than CLAHE, indicating that the SEA preserves the structure of the original image while enhancing it. The contrast of the SEA is slightly higher than that of CLAHE. CLAHE has a slight advantage in edge strength, but the difference is not significant. Overall, the SEA is the better choice: it improves overall image quality while delivering comparable edge strength.

3.3. Image Recognition

During training, a lower loss indicates that the model predicts the location, category, and other attributes of the target more accurately. During validation, a lower loss indicates that the model also performs well on unseen validation data. Figure 5 shows the training and validation loss curves. The overall loss decreases gradually and flattens after about 130 epochs.

The confusion matrix is very helpful for understanding the performance and error types of the model across categories. Figure 6 shows the normalized confusion matrix. The model demonstrates exceptional predictive performance for porosity and slag inclusion, achieving 1.00 accuracy. However, a specific misclassification is observed: 0.02 (2%) of actual lack of fusion (LOF) defects are incorrectly identified as slag inclusions, representing false negatives for LOF and corresponding false positives for slag. This misclassification can be attributed to the physical acoustic properties of PAUT: tight LOF defects and thin slag inclusions can exhibit very similar reflection echo patterns under specific beam angles. Furthermore, after spatial enhancement, the narrow edge morphologies of some severely compressed LOF defects may visually resemble the localized texture characteristics of slag inclusions, causing the network to extract overlapping features.

After training, the best model is selected to predict a subset of the LOF and slag inclusion images, and the predictions are checked against the confusion matrix analysis. The predicted results are shown in Figure 7. The predictions are accurate, indicating that the model has learned the characteristics of the defects.

The PCC (precision confidence curve) shows how precision varies with the confidence threshold. The shape and position of the curve indicate the performance and stability of the detector and the trade-off between precision and confidence. The PCC is shown in Figure 8. When the PCC bends upwards and to the left, the model retains high precision even at relatively low confidence thresholds, which means that most detections are correct and that predictions made at high confidence are especially reliable. In Figure 8, precision is high from the lowest thresholds onward. Overall, the PCC of the trained model performs well.

The RCC (recall confidence curve) shows how recall changes with the confidence threshold. By observing the recall at different confidence levels, the reliability of the model's predictions at those levels can be evaluated. If the recall remains high at high confidence thresholds, the model finds most defects even when only its most confident predictions are kept, and the prediction results are reliable. The RCC of the trained model is shown in Figure 8. The curve stays very close to the upper right corner, indicating that the model achieves a good balance between confidence and recall and maintains high precision while maintaining high recall.
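Both curves can be traced by sweeping a confidence threshold over the detections. In the sketch below, each detection is a pair of a confidence score and a flag saying whether it matched a ground-truth defect. The data are hypothetical, the function name is illustrative, and reporting a precision of 1.0 when no detection survives the threshold is a plotting convention assumed for this example.

```python
def precision_recall_vs_confidence(detections, num_ground_truth, thresholds):
    """Return (threshold, precision, recall) for each confidence threshold.

    detections: list of (confidence, is_true_positive) pairs.
    """
    curve = []
    for t in thresholds:
        kept = [is_tp for conf, is_tp in detections if conf >= t]
        tp = sum(kept)
        # With nothing kept there are no false positives; 1.0 is assumed.
        precision = tp / len(kept) if kept else 1.0
        recall = tp / num_ground_truth if num_ground_truth else 0.0
        curve.append((t, precision, recall))
    return curve


# Hypothetical detections for 5 ground-truth defects.
detections = [(0.95, True), (0.90, True), (0.80, True),
              (0.60, False), (0.40, True), (0.30, False)]
for t, p, r in precision_recall_vs_confidence(detections, 5, [0.0, 0.5, 0.85]):
    print(f"conf >= {t:.2f}: precision = {p:.3f}, recall = {r:.3f}")
```

Raising the threshold removes low-confidence false positives, so precision rises, but it also discards some true detections, so recall falls. A model whose precision is already high at low thresholds and whose recall stays high at high thresholds shows the behavior described above.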

Overall, the model has high reliability and recognition performance. At the same time, it achieves an ideal balance between precision and recall, demonstrating strong robustness and high efficiency in practical applications.

3.4. Baseline Comparison and Computational Cost

To rigorously evaluate the superiority of the proposed framework, the YOLOv8 model combined with the median filtering and the SEA pipeline was compared against the baseline models, including Faster R-CNN, YOLOv5, and YOLOv7, using the same testing dataset. As shown in Table 5, although YOLOv7 achieves a marginally higher mAP@0.5, the proposed method achieves the best overall balance between precision and recall while maintaining computational efficiency.

3.5. Cross-Validation and Generalization Evaluation

To strictly evaluate the model’s generalization capability and mitigate the potential risk of overfitting due to the limited dataset size, a 5-fold cross-validation was conducted. The average mAP@0.5 across all validation folds remained stable at 98.5%, with a standard deviation of only 0.4%. Furthermore, evaluations on an independent test dataset confirmed that the model generalizes effectively to unseen data without significant performance degradation.
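The bookkeeping of a 5-fold cross-validation can be sketched as follows: the sample indices are shuffled once, partitioned into five folds, and each fold serves as the validation set exactly once. The function names, the seed, the sample count of 800, and the per-fold scores are hypothetical. The text does not state whether the reported standard deviation is the population or the sample form; the population form is assumed here.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, val_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


def mean_and_std(values):
    """Mean and population standard deviation of per-fold scores."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance ** 0.5


for fold, (train, val) in enumerate(k_fold_indices(800, k=5), start=1):
    print(f"fold {fold}: {len(train)} training / {len(val)} validation samples")

# Hypothetical per-fold scores, not values from this study.
mean, std = mean_and_std([0.90, 0.92, 0.91, 0.89, 0.93])
print(f"mean = {mean:.3f}, std = {std:.4f}")
```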

4. Conclusions

This article studies the intelligent recognition of PAUT defect images. The images are denoised, enhanced, and identified to achieve precise recognition, and the effects of different denoising, image enhancement, and recognition methods on defect recognition are explored. The specific conclusions are as follows:

(1): Median filtering is used for image denoising. After median filtering, the noise is effectively reduced, and the information characteristics are preserved. The PSNR values for slag inclusion, pores, and LOF are 35.132, 35.938, and 36.183, respectively.
(2): The SEA is applied for enhancement because it strengthens defect details without increasing noise. The PSNR, SSIM, contrast, and edge strength are 33.710, 0.960, 95.279, and 10.865, respectively.
(3): YOLOv8 is applied for recognition due to its recognition rate and efficiency. The proposed model achieves a mean average precision (mAP@0.5) of 95%, a precision of 98%, a recall of 97%, and an inference speed of 117 FPS, demonstrating high reliability, an ideal balance between precision and recall, and strong robustness and high efficiency in practical applications.

Despite the promising results, the proposed approach has certain limitations. First, the dataset, while carefully annotated, is relatively limited in size (800 images), which may limit the model’s generalizability across a wider range of field conditions. Second, as discussed in Section 3.3, the model faces challenges when the morphological features of severe lack of fusion (LOF) and slag inclusions heavily overlap, leading to occasional misclassifications. Future work will focus on expanding the dataset and developing more discriminative feature extraction mechanisms to address these limitations.

Author Contributions

Conceptualization, M.B. and B.H.; Methodology, M.B. and X.L.; Software, M.B. and X.L.; Validation, M.B. and B.H.; Investigation, S.N. and X.L.; Resources, M.B. and S.N.; Data curation, S.N.; Writing—original draft, S.N. and X.L.; Writing—review & editing, M.B., S.N. and B.H.; Visualization, S.N.; Supervision, M.B., S.N., X.L. and B.H.; Project administration, X.L. and B.H.; Funding acquisition, X.L. and B.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the national key R&D plan “theory, algorithm and application of big data analysis for safe operation and maintenance of oil and gas pipeline network” [grant number: 2021YFA1000103]. This work is supported by the key R&D plan of Shandong Province, “research on key technologies and equipment development of intelligent welding and testing system for long-distance pipeline” [grant number: 2022CXGC010202].

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

PAUT	Phased Array Ultrasonic Testing
ECA	Efficient Channel Attention
ECAL	Efficient Channel Attention
CLAHE	Contrast Limited Adaptive Histogram Equalization
PCC	Precision Confidence Curve
RCC	Recall Confidence Curve
MSE	Mean Squared Error
PSNR	Peak Signal-to-Noise Ratio
SSIM	Structural Similarity Index Measure
LOF	Lack of Fusion
CNN	Convolutional Neural Network
R-CNN	Region-based Convolutional Neural Network

References

Bakas, G.; Bei, K.; Skaltsas, I.; Gkartzou, E.; Tsiokou, V.; Papatheodorou, A.; Karatza, A.; Koumoulos, E.P. Object Detection: Custom Trained Models for Quality Monitoring of Fused Filament Fabrication Process. Processes 2022, 10, 2147.
Khormali, A. Using a New Mixture of Reagents for Effective Inhibition of Corrosion and Salt Precipitation in the Petroleum Industry. J. Chem. Pet. Eng. 2021, 55, 257–276.
Alonso, J.; Pavon, S.; Vidal, J.; Perdigones, J.; Carpena, I. A New Insight on Phased Array Ultrasound Inspection in MIG/MAG Welding. Materials 2022, 15, 2793.
Chen, Y.; He, D.; He, S.; Jin, Z.; Miao, J.; Shan, S.; Chen, Y. Welding defect detection based on phased array images and two-stage segmentation strategy. Adv. Eng. Inform. 2024, 62, 102879.
Han, Z.; Li, S.; Chen, X.; Huang, B.; Sun, J.; Zhang, Q.; Liu, C. DFW-YOLO: YOLOv5-based algorithm using phased array ultrasonic testing for weld defect recognition. Nondestruct. Test. Eval. 2024, 40, 2516–2539.
Lee, D.; Lee, H.J.; Park, C.-S.; Lee, S. DiffectNet: Diffusion-enabled conditional target generation of internal defects in ultrasonic non-destructive testing. Mech. Syst. Signal Process. 2025, 240, 113454.
Li, J.; Ju, G.; Chen, H.; Wu, L.; Jiang, Y.; Zhang, J. An intelligent recognition method for weak ultrasonic images of internal defects in composite material based on the improved YOLOv11 model. Nondestruct. Test. Eval. 2026, 1–23.
Hu, Y.; Lu, L.; Zhan, S. Simulation and reliability evaluation of automated ultrasonic testing technology in semi-automatic welding of oil and gas pipelines. Int. J. Adv. Manuf. Technol. 2022, 124, 4131–4141.
Jayasudha, J.C.; Lalithakumari, S. Weld defect segmentation and feature extraction from the acquired phased array scan images. Multimed. Tools Appl. 2022, 81, 31061–31074.
Li, S.; Gao, J.; Zhou, E.; Pan, Q.; Wang, X. Deep learning-based fusion hole state recognition and width extraction for thin plate TIG welding. Weld. World 2022, 66, 1329–1347.
Liang, D.; Wu, Y.; Hu, K.; Bu, J.J.; Liang, D.T.; Feng, Y.F.; Ma, J.Q. Weld seam track identification for industrial robot based on illumination correction and center point extraction. J. Adv. Mech. Des. Syst. Manuf. 2022, 16, JAMDSM0028.
Na, Y.; He, Y.; Deng, B.; Yang, C.; Li, Q.; Wang, L.; Cao, Y. A deep mutual learning-based framework for wind turbine blade defect detection in multimodal phased array ultrasonic data. Ultrasonics 2026, 164, 108035.
Lim, S.J.; Kim, Y.L.; Cho, S.; Park, I.K. Ultrasonic Inspection for Welds with Irregular Curvature Geometry Using Flexible Phased Array Probes and Semi-Auto Scanners: A Feasibility Study. Appl. Sci. 2022, 12, 748.
Lindgren, E.; Zach, C. Industrial X-ray Image Analysis with Deep Neural Networks Robust to Unexpected Input Data. Metals 2022, 12, 1963.
Nowroth, C.; Gu, T.; Grajczak, J.; Nothdurft, S.; Twiefel, J.; Hermsdorf, J.; Kaierle, S.; Wallaschek, J. Deep Learning-Based Weld Contour and Defect Detection from Micrographs of Laser Beam Welded Semi-Finished Products. Appl. Sci. 2022, 12, 4645.
Sun, H.; Ramuhalli, P.; Jacob, R.E. Machine learning for ultrasonic nondestructive examination of welding defects: A systematic review. Ultrasonics 2023, 127, 106854.
Wang, W.; Yamane, S.; Wang, Q.; Shan, L.; Zhang, X.; Wei, Z.; Yan, Y.; Song, Y.; Numazawa, H.; Lu, J.; et al. Visual sensing and quality control in plasma MIG welding. J. Manuf. Process. 2023, 86, 163–176.
Wang, X.; Wang, X.; Zhang, B.; Cui, J.; Lu, X.; Ren, C.; Cai, W.; Yu, X. Binary classification of welding defect based on deep learning. Sci. Technol. Weld. Join. 2022, 27, 407–417.
Wang, X.; Yu, X. Understanding the effect of transfer learning on the automatic welding defect detection. NDT E Int. 2023, 134, 102784.
Wang, Z.; Gao, W.; Song, J. Applying SDR with CNN to Identify Weld Defect: A New Processing Method. J. Pipeline Syst. Eng. Pract. 2023, 14.
Yang, L.; Fan, J.; Huo, B.; Li, E.; Liu, Y. A nondestructive automatic defect detection method with pixelwise segmentation. Knowl.-Based Syst. 2022, 242, 108338.
Yang, L.; Song, S.; Fan, J.; Huo, B.; Li, E.; Liu, Y. An Automatic Deep Segmentation Network for Pixel-Level Welding Defect Detection. IEEE Trans. Instrum. Meas. 2022, 71, 5003510.

Figure 1. (a) Pore: (a1) A-Scan, (a2) S-Scan; (b) Crack: (b1) A-Scan, (b2) S-Scan; and (c) LOF: (c1) A-Scan, (c2) S-Scan.

Figure 2. The network of YOLOv8.

Figure 3. The different noise reduction methods: (a) bilateral filter, (b) Gaussian filter, (c) median filtering, and (d) mean filtering.

Figure 4. (a) Original: (a1) LOF, (a2) Crack; (b) CLAHE: (b1) LOF, (b2) Crack; and (c) SEA: (c1) LOF, (c2) Crack.

Figure 5. The loss function diagram.

Figure 6. The pre-training confusion matrix diagram.

Figure 7. (a) Data1: (a1) LOF, (a2) Pore; (b) Data2: (b1) LOF, (b2) Pore; and (c) Data3: (c1) LOF, (c2) Pore.

Figure 8. (a) RCC, (b) PCC, (c) mAP@0.5, and (d) mAP@0.95.

Table 1. The experimental environment configuration.

Parameters/Hardware	Configuration/Value
Hardware Specifications
CPU	Intel Core i9-13900K
GPU	NVIDIA GeForce RTX 4090 (24 GB)
Training Hyper-parameters
Epochs	500
Batch Size	32
Initial Learning Rate	0.01
Optimizer	SGD
Momentum	0.937
Weight Decay	0.0005
Training Time	12.5 h
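
The hyper-parameters in Table 1 can be gathered into a single training call. The sketch below assumes the Ultralytics Python package, which this section does not name as the authors' toolkit; the dictionary keys follow that package's training arguments. The dataset file `paut_defects.yaml` and the `yolov8s.pt` checkpoint are hypothetical placeholders, not values reported by the authors.

```python
# Hyper-parameters copied from Table 1. The key names follow the
# Ultralytics training arguments (an assumption about the toolkit).
TRAIN_ARGS = {
    "epochs": 500,           # Epochs
    "batch": 32,             # Batch Size
    "lr0": 0.01,             # Initial Learning Rate
    "optimizer": "SGD",      # Optimizer
    "momentum": 0.937,       # Momentum
    "weight_decay": 0.0005,  # Weight Decay
}


def train(data_yaml: str = "paut_defects.yaml", weights: str = "yolov8s.pt"):
    """Train a YOLOv8 model with the Table 1 settings.

    Both default arguments are hypothetical placeholders: the dataset
    description file and the model variant are not given in this section.
    """
    # Imported lazily so the configuration can be inspected without the package.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(data=data_yaml, **TRAIN_ARGS)


if __name__ == "__main__":
    for key, value in TRAIN_ARGS.items():
        print(f"{key:>12}: {value}")
```

Calling `train()` requires the `ultralytics` package and a dataset description file; running the script as shown only prints the configuration.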

Table 2. The mean squared error (MSE) of each filtering method.

Method	Slag Inclusion	Pore	LOF	Crack
Bilateral filter	6.598	5.578	6.598	4.671
Gaussian filter	10.198	11.649	12.606	9.584
Median filter	11.798	13.283	15.658	10.268
Mean filter	6.298	5.774	6.113	4.331

Table 3. The peak signal-to-noise ratio (PSNR) of each filtering method.

Method	Slag Inclusion	Pore	LOF	Crack
Bilateral filter	37.563	38.639	39.589	32.159
Gaussian filter	35.896	36.534	37.125	36.289
Median filter	35.132	35.938	36.183	35.675
Mean filter	38.432	39.867	40.268	36.846

Table 4. The image quality metrics of the enhancement methods.

Method	PSNR	SSIM	Contrast	Edge Strength
CLAHE	33.174	0.856	91.403	11.120
SEA	33.710	0.960	95.279	10.865

Table 5. The performance comparison with baseline models on the independent test set.

Model	mAP@0.5 (%)	Precision (%)	Recall (%)	Parameters (M)	GFLOPs	Inference Time (ms)
Faster R-CNN	92.4	91.5	90.8	41.5	180.2	85.0
YOLOv5	94.6	93.2	94.1	7.2	16.5	12.5
YOLOv7	96.3	95.8	95.5	36.9	104.7	22.4
Proposed Method	95	98	97	11.1	28.6	8.5
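
Two derived quantities make the comparison in Table 5 easier to read: the throughput implied by each inference time and the F1 score implied by each precision and recall pair. The short sketch below computes both from the tabulated values. The helper names are hypothetical, and the throughput assumes one image per inference with no batching or input/output overhead.

```python
def fps_from_latency(ms_per_image: float) -> float:
    """Frames per second implied by a per-image inference time in milliseconds."""
    return 1000.0 / ms_per_image


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2.0 * precision * recall / (precision + recall)


# (precision %, recall %, inference time ms) copied from Table 5.
table5 = {
    "Faster R-CNN": (91.5, 90.8, 85.0),
    "YOLOv5": (93.2, 94.1, 12.5),
    "YOLOv7": (95.8, 95.5, 22.4),
    "Proposed Method": (98.0, 97.0, 8.5),
}

for name, (precision, recall, ms) in table5.items():
    print(
        f"{name:16s} F1 = {f1_score(precision, recall):.1f}%  "
        f"FPS = {fps_from_latency(ms):.1f}"
    )
```

For the proposed method, 8.5 ms per image corresponds to about 117.6 FPS, in line with the 117 FPS quoted in the conclusions, and the F1 score is about 97.5%.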

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
