Deep CNN-Based Multi-Class TIG Welding Defect Classification Using HDR Images with Explainable AI

Nikam, Deepika; Nikam, Sagar; Bhosale, Tejaswini; Harkin, Declan; Sawant, Mayur; McGarrigle, Cormac

doi:10.3390/jmmp10060193


by Deepika Nikam ¹, Sagar Nikam ¹,*, Tejaswini Bhosale ², Declan Harkin ¹, Mayur Sawant ³ and Cormac McGarrigle ¹

¹ School of Computing, Engineering, and Intelligent Systems, Ulster University, Magee Campus, Northland Road, Derry/Londonderry BT48 7JL, Northern Ireland, UK

² Department of Computer Science and Engineering, MIT Art, Design and Technology University, Pune 412201, India

³ Department of Mechanical Engineering, MIT Art, Design and Technology University, Pune 412201, India

* Author to whom correspondence should be addressed.

J. Manuf. Mater. Process. 2026, 10(6), 193; https://doi.org/10.3390/jmmp10060193

Submission received: 18 April 2026 / Revised: 23 May 2026 / Accepted: 27 May 2026 / Published: 30 May 2026


Abstract

Recent advances in deep convolutional neural networks (D-CNNs) have improved automated welding defect inspection. This study presents an explainable comparative framework for multi-class classification of defects in Aluminium 5083 TIG weld joints using High Dynamic Range (HDR) image data, integrating transfer learning, stratified five-fold cross-validation, computational-time analysis, and Grad-CAM-based visual interpretation. Five transfer-learning-based D-CNN architectures (VGG16, VGG19, Inception V3, MobileNet, and DenseNet) were trained, validated, and tested under a common evaluation protocol to assess their suitability for welding defect classification. The dataset was organised into five classes: good weld, contamination, lack of fusion, lack of penetration, and misalignment. Model performance was compared using multiple evaluation metrics, and stratified five-fold cross-validation was performed to assess model stability. Training and inference times were also recorded to evaluate computational feasibility. Grad-CAM was used as an explainable artificial intelligence (XAI) technique to provide visual interpretation of weld regions. Among the evaluated models, DenseNet achieved the best overall performance, with a classification accuracy of 98%, and showed the least confusion across defect classes. The Grad-CAM visualisations showed that the model focused on defect-relevant weld regions, demonstrating that transfer-learning D-CNNs with XAI can support TIG welding defect classification and effective visual quality assessment.

Keywords: welding defects; deep convolutional neural networks; Grad-CAM; defect detection

1. Introduction

In contemporary manufacturing, welding is one of the most important processes for creating strong and durable joints between components. It is widely used in applications ranging from large structural assemblies to precision-engineered products [1]. Among the different welding methods, Tungsten Inert Gas (TIG) welding, also known as Gas Tungsten Arc Welding (GTAW), is widely used for producing precise and clean welds. Unlike processes that use consumable electrodes, TIG welding employs a non-consumable tungsten electrode with a separate filler rod, while an inert shielding gas, such as argon or helium, protects the weld pool from contamination [2,3]. Owing to the high level of control over heat input and weld quality in TIG welding, it is commonly used to weld thin sheets of metal, pipes, and geometrically complex components [4].

Despite its advantages, TIG welding is still prone to defects due to the rapid heating and cooling of the metal, which can generate deformation, residual stress, and local irregularities in the weld region [5]. Common defects include contamination, spatter, lack of fusion, lack of penetration, and misalignment. These defects can compromise the weld integrity, reduce service performance, and increase the risk of product failure if not identified in time [6,7]. Therefore, reliable inspection of the weld quality is essential for maintaining structural safety, reducing rework, and improving manufacturing efficiency [8,9,10].

Conventional weld inspection techniques, such as liquid penetrant testing, ultrasonic testing, magnetic particle inspection, radiographic testing, and visual inspection, are widely used for defect detection [11,12]. However, these approaches can be labour-intensive, time-consuming, and sometimes inconsistent, particularly when the inspection result depends on manual judgement [13]. In recent years, deep learning-based image analysis has emerged as a promising alternative to the traditional techniques for weld inspection because it can extract discriminative features directly from image data and support faster, more objective, and potentially more scalable defect assessment [14,15,16,17,18,19,20,21]. Apart from welding, AI-based monitoring and detection frameworks are being increasingly applied to broader engineering inspection tasks, including underground pipeline recognition, spatial localisation, road service-performance monitoring, and hidden defect identification, thus demonstrating the wider relevance of deep learning-based inspection methods across engineering systems [22,23].

1.1. Literature Review

Zhang et al. [24] conducted a study on the identification and categorisation of weld defects using the Cut-Cascade RCNN model. The work aimed to use weakly supervised semantic segmentation to improve the precision and effectiveness of defect identification in welds. The study reported an accuracy of 90.15%, suggesting potential for practical use in weld quality assurance procedures. Say et al. [25] introduced a method for classifying welding images into cavity, cracks, slag inclusion, lack of fusion, shape defects, and a normal class using Inception V3, Random Forest, LeNet, and a CNN model. They focused on automating the classification process and improving the precision and efficiency of weld defect detection. The CNN model classified the weld defects with an accuracy of 92%, outperforming the other models. Cherkasov et al. [26] conducted a study using the iRVision computer vision system integrated with deep learning models to detect welding defects. Using the iRVision system, they developed models to classify the weld data into two categories: defective and non-defective. The most successful neural network model achieved an accuracy of 0.92, a precision of 0.90, and a recall of 0.81 on the test sample.

Particularly focusing on the TIG welding process, Boutin et al. [27] conducted research by integrating image processing with Machine Learning (ML) to categorise weld configurations. This was an in situ analysis and monitoring of the TIG welding process to ensure weld quality in joining 316L stainless steel. They utilised a K-nearest neighbour (KNN) classification algorithm to categorise various weld configurations based on extracted geometric characteristics. Analysing the images of the melt pool, the proposed model showed strong predictive ability for real-time process control and the enhancement of welding procedures through the integration of ML models. Ghimire and Selvam [28] showed the use of Convolutional Neural Networks (CNNs) for the purpose of automated defect classification in the Tungsten Inert Gas (TIG) welding process. The aim was to improve weld quality by overcoming the drawbacks associated with manual inspection. For this purpose, the weld quality monitoring system was integrated with the CNN model for weld monitoring and automating defect classification. This approach reduced the need for manual inspection and boosted the efficiency and reliability of the weld quality, thereby creating new possibilities for advancements in industrial welding processes.

Some researchers have proposed the integration of High Dynamic Range (HDR) technology with CNN models for defect monitoring. Bacioiu et al. [29,30] used a CNN model to improve the classification of welding defects. The dataset was first built to capture multiple defects in the welding of SS304 stainless steel and Aluminium 5083. A conventional CNN model was then applied to the dataset with the aim of classifying defects such as burn through, contamination, lack of fusion, misalignment, and lack of penetration. The six-class defect classification accuracy was 93.4% for SS304 stainless steel and 71% for Aluminium 5083. Using the SS304 stainless steel dataset, Sekhar et al. [31] investigated a pre-trained deep learning model for classifying six types of weld defects, focusing on the class imbalance issue in order to enhance accuracy and robustness. They found that the DenseNet169-SGD architecture achieved an accuracy of 97.28% in the six-class defect classification. For the Aluminium 5083 dataset, Wang et al. [32] proposed using CNNs such as ResNet18 and WeldNet, where WeldNet is a combination of a CNN and an FCN. WeldNet was additionally combined with Focal Ensemble (FE) and Knowledge Distillation (KD) to improve the accuracy of defect classification. WeldNet + FE achieved an accuracy of 89.1%, compared with 79.3% for the ResNet18 model.

Explainable artificial intelligence has also become increasingly important in manufacturing inspection because high classification accuracy alone does not guarantee that a CNN is using physically meaningful defect features. Grad-CAM is particularly useful in this context because it generates class-wise heatmaps by using gradient information from convolutional layers, thereby highlighting the image regions that contribute most to the model prediction [33]. In manufacturing environments, this allows engineers to assess whether the model focuses on relevant defect zones, material discontinuities, weld irregularities, corrosion regions, or process-induced features. Recent manufacturing studies have demonstrated the value of Grad-CAM-based interpretation across different inspection tasks. Beyond image-based defect inspection, Wu et al. [34] demonstrated that physically interpretable data-driven models, supported by XAI methods such as SHAP and Grad-CAM, can reveal process dynamics for real-time cavity profile prediction in electrochemical machining. Similarly, Wu et al. [35] used a data-driven CNN approach alongside Grad-CAM to identify frequency regions associated with acoustic emission source motion and positioning effects in laser powder bed fusion, thus demonstrating the value of XAI for interpreting process dynamics beyond image-based inspection. Lee et al. [36] applied CNN-based defect detection to wire arc additive manufacturing using high dynamic range images and used Grad-CAM to verify the basis of the model’s feature learning. Aminudin et al. [37] used Grad-CAM for binary corrosion image classification to provide an explainable visual interpretation of corrosion-related image regions. Elhendawy and El-Taybany [38] applied CNN-based machine vision to welding defect classification and used visualisation techniques to show that the best-performing model responded strongly to weld defect regions. These studies show that Grad-CAM and related visual explanation approaches can support model transparency by linking CNN predictions with physically meaningful manufacturing features and process dynamics.
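As a rough illustration of the Grad-CAM computation described above, the following NumPy sketch combines feature maps and class-score gradients into a heatmap. It assumes that the activations and gradients of a convolutional layer have already been extracted from a network; the function name `grad_cam_heatmap` and the random example arrays are illustrative only and are not taken from any of the cited studies.

```python
import numpy as np

def grad_cam_heatmap(activations, gradients):
    """Combine conv-layer activations and class-score gradients into a heatmap.

    activations, gradients: arrays of shape (H, W, K) for a single image.
    """
    # Channel weights: global-average-pool the gradients over the spatial axes.
    weights = gradients.mean(axis=(0, 1))            # shape (K,)
    # Weighted sum of the feature maps, then ReLU to keep positive evidence.
    cam = np.maximum((activations * weights).sum(axis=-1), 0.0)
    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Placeholder inputs standing in for real layer outputs (assumption).
rng = np.random.default_rng(0)
acts = rng.random((7, 7, 32))
grads = rng.normal(size=(7, 7, 32))
heatmap = grad_cam_heatmap(acts, grads)
```

In practice the low-resolution map would be upsampled to the input image size and overlaid on the weld image.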

Although previous studies have applied CNN-based models and XAI techniques to manufacturing inspection, the literature still shows three practical gaps in TIG weld inspection using HDR image data. First, few studies provide a systematic comparison of multiple transfer-learning architectures under the same experimental conditions for multi-class TIG defect classification. Second, limited attention has been given to explaining model predictions in a way that supports engineering interpretation of defect-specific regions. Third, the effect of using a balanced and defect-organised dataset structure for a consistent comparison across different CNN models has not been sufficiently discussed for TIG welding. To address these gaps, this study develops an application-specific and explainable comparative framework for TIG weld defect classification using HDR imagery. The study performs a controlled comparison of five transfer-learning-based D-CNN models under a common evaluation protocol and employs Grad-CAM as an explainable artificial intelligence (XAI) technique to visually interpret defect-relevant regions in the weld images.

1.2. Aim and Objectives

This paper aims to develop an explainable deep learning framework for multi-class classification and XAI-based visual interpretation of TIG welding defects using HDR image data. The study evaluates multiple transfer-learning-based D-CNN architectures under a common protocol to identify the most suitable model for defect classification and to examine whether the model focus, interpreted through the XAI technique Grad-CAM, corresponds to the defect-relevant regions in the weld images. This aim has been achieved using the following methodology:

To organise and balance the TIG weld image dataset into five classes: good weld, contamination, lack of fusion, lack of penetration, and misalignment.
To train, validate, and test five transfer-learning-based D-CNN architectures under the same experimental conditions.
To compare the models using multiple performance measures, including accuracy, precision, recall, and F1-score, along with stratified five-fold cross-validation and training/inference-time analysis, to identify the most suitable architecture for multi-class defect classification.
To interpret the predictions of the best-performing model using Grad-CAM and examine whether the highlighted regions correspond to visually meaningful defect locations.

2. Materials and Methods

Figure 1 shows the workflow used in the current study to classify and detect defects in TIG-welded joints. Figure 1 and the following subsections explain how the data were organised, preprocessed, and partitioned; the D-CNN models used for training, validation, testing, and defect classification; and the application of the best model to defect detection in images.

Detecting defects helps to identify and fine-tune the process parameters that can eliminate defects when joining components. Defect data therefore plays a vital role in building a machine learning model that can be used to identify and fine-tune these parameters. This paper uses a public dataset from Bacioiu et al. [30] on the TIG welding of Aluminium 5083. The dataset is available through an online repository at https://www.kaggle.com/datasets/danielbacioiu/tig-aluminium-5083 (accessed on 2 April 2024) [39]. It consists of both good and defective welds captured in real time using weld monitoring cameras during the TIG welding of 5083 grade Aluminium. The images were recorded with a Xiris XVC-1000 camera, which has a High Dynamic Range (HDR) of 140+ dB; this reduces arc brightness and brings up details from the weld pool and surrounding area, generating tones with greater accuracy. HDR imaging is particularly important in examining TIG welding because the intense arc brightness and darker surrounding weld-pool regions can cause saturation and loss of defect-relevant visual information in conventional images. By preserving details across both bright and dark regions, HDR imaging improves the visibility of weld-pool boundaries, surface irregularities, and local defect features, thereby supporting more reliable image-based defect classification. The original database contained 30,625 images, each captured from a recorded video at 0.018 s intervals. However, the data was not organised by class, which created an imbalance in the data partitioning. To overcome this problem, the data in this study was organised according to the previously determined classes so as to improve the computational efficiency of the testing. The data was structured into five classes: good weld, contamination, lack of fusion, lack of penetration, and misalignment. This is shown in Figure 2.

2.1. Dataset Pre-Processing, Training Configuration, and Computational Environment

The original images of 1280 × 1024 pixels were cropped to 800 × 974 pixels to reduce the black pixels surrounding the melt pool [30]. Blurry images and images captured under poor lighting conditions were discarded from the dataset. Additionally, data augmentation operations (rescaling, shear, zoom, rotation, width and height shifts, vertical and horizontal flips, and a fill mode) were applied to the dataset, as shown in Figure 1.

The images in the dataset were re-organised according to the five classes. The new balanced dataset consists of 15,000 images. To construct it, images were grouped according to the defect-category labels provided in the original dataset. Since the original class distribution was imbalanced, 3000 images were selected from each class to ensure equal class representation. For classes containing more than 3000 images, a representative subset was selected on the basis of the original class labels and the visual clarity of the weld/defect features, taking care to preserve the variation in weld appearance within each defect category. This balancing strategy was used to reduce class bias during training and to allow a fair comparison of the evaluated D-CNN models under the same experimental conditions. The dataset was further partitioned into three subsets: training data (70%), test data (20%), and validation data (10%). Table 1 shows the number of original images and partitioned images according to class and subset. In addition to the stratified 70/20/10 train–test–validation split, stratified 5-fold cross-validation was performed to assess the robustness and stability of the classification outcomes. The balanced dataset of 15,000 images was divided into five folds, each containing 3000 images with 600 images from each of the five classes. In each cross-validation iteration, one fold was used as the testing set and the remaining four folds were used for training, so each iteration used 12,000 images for training and 3000 images for testing. This process was repeated five times so that each fold was used once as the testing set. The 5-fold cross-validation split is illustrated in Figure 3.
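The partition sizes described above can be checked with a short NumPy sketch. This is only an illustration of a stratified split on a balanced label array, not the authors' code: the helper `stratified_split`, the random seed, and the use of integer class labels are assumptions made for the example.

```python
import numpy as np

N_CLASSES, PER_CLASS = 5, 3000
labels = np.repeat(np.arange(N_CLASSES), PER_CLASS)   # 15,000 balanced labels
rng = np.random.default_rng(42)                        # arbitrary seed (assumption)

def stratified_split(labels, fractions, rng):
    """Return one index array per fraction, preserving class proportions."""
    parts = [[] for _ in fractions]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        # Cumulative cut points inside this class, e.g. 2100 and 2700 of 3000.
        cuts = np.round(np.cumsum(fractions)[:-1] * len(idx)).astype(int)
        for part, chunk in zip(parts, np.split(idx, cuts)):
            part.append(chunk)
    return [np.concatenate(p) for p in parts]

# Stratified 70/20/10 train-test-validation split.
train, test, val = stratified_split(labels, [0.7, 0.2, 0.1], rng)
# Five equal stratified folds for cross-validation.
folds = stratified_split(labels, [0.2] * 5, rng)

print(len(train), len(test), len(val))                 # 10500 3000 1500
print([len(f) for f in folds])                         # [3000, 3000, 3000, 3000, 3000]
```

Each fold holds 600 indices per class, so using one fold for testing leaves 12,000 images for training, matching the counts stated in the text.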

Hyperparameter settings are essential in training CNN models. Adaptive moment estimation (Adam) is the optimisation algorithm used in the current CNN models [31]. In this study we used the following hyperparameters: learning rate (=1 × 10⁻³), dropout rate (=0.25), number of epochs (=50), batch size (=64), number of added classifier layers (=2), number of units in the added dense layer (=16), and cost function (=categorical cross-entropy). Here, the number of layers and units refers to the newly added trainable classifier head placed on top of the pretrained transfer-learning base, rather than to the original layers of the pretrained CNN architectures. These values were taken from previous studies [30,31]. The computational environment and model execution time were also recorded to assess the practical feasibility of the evaluated models for industrial deployment. All models were trained and evaluated using Google Colab with GPU acceleration. The recorded runtime used an NVIDIA Tesla T4 GPU with 15 GB GPU memory and 12 GB allocated system RAM. The software environment included Python 3.12.9 and TensorFlow 2.20.0. Training time was measured for 50 epochs, while inference time was calculated as the average prediction time per image over the corresponding test set. For the 5-fold cross-validation, training and inference times were averaged across the five folds.
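One possible way to obtain an average prediction time per image is sketched below. The text does not state whether batching, data loading, or warm-up runs were included in the measurement, so the batch size of 64 at inference, the helper `mean_inference_ms`, and the stand-in `dummy_predict` are all assumptions made for illustration.

```python
import time
import numpy as np

def mean_inference_ms(predict, images, batch_size=64):
    """Average wall-clock prediction time per image, in milliseconds."""
    start = time.perf_counter()
    for i in range(0, len(images), batch_size):
        predict(images[i:i + batch_size])
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(images)

# Stand-in for a trained model: returns uniform scores over five classes.
def dummy_predict(batch):
    return np.full((len(batch), 5), 0.2)

# Small placeholder array instead of real weld images (assumption).
images = np.zeros((256, 32, 32, 3), dtype=np.float32)
ms_per_image = mean_inference_ms(dummy_predict, images)
```

With a real model, `predict` would wrap the model's prediction call, and a warm-up pass before timing is usually advisable on a GPU.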

2.2. Deep-Convolutional Neural Network (D-CNN) Models

Deep-Convolutional Neural Networks (D-CNNs) have become increasingly popular in recent years for tackling complex problems in computer vision [40]. Drawing inspiration from the human brain’s visual cortex, D-CNNs are a cutting-edge method in deep learning for image recognition. Training a CNN from scratch generally requires substantial data to achieve a robust predictive model. However, when data is scarce, advanced techniques are needed to yield satisfactory predictions with limited datasets. Transfer learning (TL) is one such technique, where the pre-trained features of an existing CNN model are used to initialise a new CNN for a specific classification task. Tsiakmaki et al. [41] highlight its effectiveness by using pre-trained CNN models on large datasets to initialise new models on smaller datasets for different purposes. This approach remains informative even if the new task diverges significantly from the original model’s training. This study employs five transfer-learning-based D-CNN models for feature extraction and comparative evaluation. Each model utilises pre-trained weights and integrates a new classifier, facilitating the continuous detection of defects in TIG welding processes through intelligent image classification. The CNNs sequentially process selected image data, learning potentially relevant features for assessing weld quality.

CNN models can automatically learn high-level features from raw images, thus allowing applications to be developed in a shorter timeframe. A CNN stacks layers to extract features from images. The stack mainly consists of convolution layers, ReLU activations, pooling layers, and dense layers with a SoftMax activation function.

Convolution layer: A convolution is a mathematical operation applied to a matrix. This matrix is usually the image represented in the form of pixels/numbers. The convolution operation extracts the features from the image. For a two-dimensional image, the discrete convolution operation is defined as follows:

(p * q)(i, j) = \sum_{m} \sum_{n} p(i - m, j - n) \, q(m, n) \qquad (1)

where p represents the input image or feature map, q represents the convolution kernel, and (i,j) denotes the spatial location in the output feature map. This operation enables the convolutional layer to extract local spatial features, such as edges, textures, and defect-related patterns, from the input weld images.
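Equation (1) can be made concrete with a small NumPy sketch. It assumes that p is zero outside its bounds (a "full" convolution) and uses an illustrative function name, `conv2d_full`; note that deep learning frameworks typically implement the closely related cross-correlation, which omits the kernel flip.

```python
import numpy as np

def conv2d_full(p, q):
    """Discrete 2-D convolution of Eq. (1), treating p as zero outside its bounds."""
    H, W = p.shape
    kh, kw = q.shape
    out = np.zeros((H + kh - 1, W + kw - 1))
    for m in range(kh):
        for n in range(kw):
            # Each kernel tap q(m, n) adds a copy of p shifted by (m, n),
            # so out[i, j] accumulates p[i - m, j - n] * q[m, n].
            out[m:m + H, n:n + W] += q[m, n] * p
    return out

p = np.array([[1.0, 2.0],
              [3.0, 4.0]])
q = np.array([[1.0, 0.0],
              [0.0, -1.0]])
print(conv2d_full(p, q))
# [[ 1.  2.  0.]
#  [ 3.  3. -2.]
#  [ 0. -3. -4.]]
```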

ReLU: In a CNN, the Rectified Linear Unit (ReLU) layer integrates non-linearity and rectification into a single layer, which helps to overcome the problem of vanishing gradients [42]. This activation function yields the nonlinear properties of the decision function and the overall network without affecting the cancellation problem. A rectified linear unit is a simple mathematical formulation defined as follows:

X_{p}^{(l)} = f (X_{p}^{(l - 1)}) = \max (0, X_{p}^{(l - 1)}) \qquad (2)

Pooling Layer: Pooling reduces the spatial size of a feature map. There are three types of pooling: minimum pooling, maximum pooling, and average pooling. Maximum pooling provides a form of translation invariance and thus benefits generalisation. It computes the maximum value over a neighbourhood in each feature map. The maximum pooling function depends on two hyperparameters, the filter size (K^{(l)}) and the stride (S^{(l)}).

Input Size:

n_{1}^{(l - 1)} \times n_{2}^{(l - 1)} \times n_{3}^{(l - 1)}

Output Size:

n_{1}^{(l)} \times n_{2}^{(l)} \times n_{3}^{(l)}

Max Pooling:

n_{p}^{(l)} = \underset{(K, S) \in n_{p}}{\max} \; n_{(K, S)}
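A minimal NumPy sketch of maximum pooling on a single feature map is given below. It assumes a square K × K window, no padding, and the usual unpadded output size of floor((n − K)/S) + 1 along each axis; these details and the function name `max_pool2d` are assumptions, since the text does not specify them.

```python
import numpy as np

def max_pool2d(x, k, s):
    """Max pooling over one 2-D feature map with a k x k window and stride s."""
    out_h = (x.shape[0] - k) // s + 1
    out_w = (x.shape[1] - k) // s + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Maximum over the window whose top-left corner is (i*s, j*s).
            out[i, j] = x[i * s:i * s + k, j * s:j * s + k].max()
    return out

x = np.arange(16).reshape(4, 4)
print(max_pool2d(x, k=2, s=2))
# [[ 5  7]
#  [13 15]]
```

Pooling is applied independently to each of the channels, so the depth of the output equals the depth of the input.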

Dense Layer: A dense layer is a multilayer perceptron structure that maps higher-level features from the input data. If layer (l−1) is a dense layer, the dense layer is defined as:

X_{p}^{(l)} = f (Z_{p}^{(l)})

where:

Z_{p}^{(l)} = \sum_{q = 1}^{n_{1}^{(l - 1)}} w_{p, q}^{(l)} \cdot X_{q}^{(l - 1)}

The SoftMax output provides a probabilistic representation of the predefined classes from the feature map, obtained by tuning the weight parameters w_{p, q}^{(l)}.
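The dense-layer mapping and the SoftMax output can be sketched in NumPy as follows. The weights here are random placeholders rather than trained values, no bias term is included (matching the summation as written), and the sizes (16 input units, five classes) simply mirror the classifier head described earlier; the function names are illustrative.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: max(0, z) element-wise."""
    return np.maximum(0.0, z)

def softmax(z):
    """Map raw scores to class probabilities that sum to one."""
    # Subtracting the maximum avoids overflow without changing the result.
    e = np.exp(z - z.max())
    return e / e.sum()

def dense(x_prev, w, activation):
    """Z_p = sum_q w[p, q] * x_prev[q], followed by the activation f."""
    return activation(w @ x_prev)

rng = np.random.default_rng(0)
features = rng.random(16)                 # output of a 16-unit hidden layer
w_out = rng.normal(size=(5, 16))          # placeholder weights for five weld classes
probs = dense(features, w_out, softmax)
predicted_class = int(np.argmax(probs))
```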

Figure 4 shows the state-of-the-art D-CNN architectures used in this study for comparative analysis. Each model has its own advantages and is optimised for specific needs in terms of efficiency, accuracy, and computational demand. VGG16 and VGG19 are noted for their depth, which aids thorough feature extraction. VGG16 is particularly valued for its balance of performance and simplicity and is especially suited to image classification tasks. VGG19, with three additional layers, can model more complex patterns but may require more computing power and runs the risk of overfitting.

Inception V3 distinguishes itself with its inception modules that allow it to efficiently capture information at various scales. This architecture optimises computational cost while increasing network depth, offering substantial improvements in speed and efficiency compared to VGG models. Inception V3's multi-scale processing capability and advanced features like label smoothing and factorised convolutions make it highly effective in managing overfitting and enhancing model accuracy on large datasets. Inception V3 is tailored for scenarios that require managing large-scale data with efficient computation. It benefits from its ability to handle complex data with nuanced feature extraction.

MobileNet is designed to reduce computational cost through depth-wise separable convolutions, which separate spatial filtering from channel-wise feature combination. This design reduces the number of parameters and operations compared with deeper architectures such as VGG16 and VGG19. In the present TIG welding defect classification task, MobileNet was included to assess whether a lightweight CNN architecture can provide efficient classification of HDR weld images, particularly for applications where rapid frame-level prediction and reduced computational demand are important.
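The parameter saving from depth-wise separable convolutions can be illustrated with simple arithmetic. The layer sizes below (a 3 × 3 kernel mapping 64 channels to 128) are chosen only for illustration and are not taken from the MobileNet configuration used in this study; biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution (biases ignored)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """One depth-wise k x k filter per input channel plus a 1 x 1 point-wise mix."""
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 64, 128               # illustrative layer sizes (assumption)
std = standard_conv_params(k, c_in, c_out)
sep = separable_conv_params(k, c_in, c_out)
print(std, sep, round(std / sep, 1))      # 73728 8768 8.4
```

For this example layer, the separable form needs roughly one eighth of the weights of the standard convolution.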

DenseNet introduces an innovative approach by connecting each layer to every other layer in a feed-forward fashion. This connectivity ensures excellent feature propagation and reuse, significantly reducing the risk of vanishing gradients, which is a common problem in deeper networks like VGG19 and Inception V3. DenseNet is exceptionally efficient in terms of parameters and computation, leveraging feature reuse to minimise redundancy and enhance training effectiveness. It is robust against overfitting and is scalable, making it an excellent choice for both academic research and complex image processing tasks. DenseNet offers advantages in environments where maximum parameter efficiency and deep network capabilities are needed without the extensive computational overhead typically associated with such depth.

2.3. Experimental Testing

Experimental testing involved running five D-CNN models on a dataset comprising both good and defective weld joints to determine the optimal hyperparameters for the models. The experimental procedure was conducted as follows:

Recording and preprocessing of welding image data.
Configuring the CNN models within the TensorFlow framework.
Training each model architecture using the prepared training and validation datasets.
Fine-tuning the upper layers of the networks to enhance their performance.
Inputting test data into the trained models to obtain classification outcomes.
Assessing and contrasting the classification performance of each network.
Employing Grad-CAM to generate heatmaps, highlighting potential defect locations within the test images.

The CNN models were developed using Python in the TensorFlow deep learning framework, which facilitates the programming of machine learning algorithms and their execution. TensorFlow’s capabilities include a data preprocessing tool utilised for both the training and validation dataset. After preprocessing, each network, coupled with an additional classifier, was trained and subsequently fine-tuned. These refined models were then employed to classify the test data, and their performance was evaluated using appropriate metrics and visualisation techniques, including the use of heat maps generated through the Grad-CAM process to visually locate defects in the welding images.

2.4. Performance Measure

In classification tasks, results can be efficiently summarised using a confusion matrix. This matrix forms the basis for deriving other performance metrics for CNN models. Accuracy, a common metric for multi-class classification, measures the overall effectiveness of a classifier by indicating the proportion of correct predictions, as detailed by Johnson and Khoshgoftaar [43] and Branco et al. [44]. In scenarios involving imbalanced datasets as well as balanced ones, the additional metrics such as precision, recall (also known as sensitivity), and the F1-Score are commonly utilised. For visualising and evaluating model performance in cases of imbalanced data, the AUC (Area Under the Curve) measure is particularly valuable. The AUC value, which quantifies the overall performance across all possible prediction thresholds, is a critical measure for comparing different models, as noted by Branco et al. [44]. For the multi-class classification task, precision, recall, F1-score, and AUC were calculated using macro-averaging across the five classes. Macro-averaging calculates each metric independently for every class and then reports the unweighted mean across all classes. This approach was selected because the dataset was balanced and each weld class was considered equally important. For the 5-fold cross-validation, the final results were reported as mean ± standard deviation across the five folds. Typically, performance evaluation in deep learning also involves tracking accuracy and loss or cost functions. The accuracy graph depicts classification performance as a percentage, while the loss graph accounts for forecast uncertainties by quantifying deviations from actual values. These graphs are instrumental in assessing the training process, providing insights into the efficacy of chosen hyperparameters, and guiding modifications for improved training outcomes. 
For multi-class classifications, such as distinguishing between acceptable and defective welding images, the categorical cross-entropy loss function is frequently employed. In this study, the performance of D-CNN models was further evaluated using gradient-weighted class activation mapping (Grad-CAM), an XAI technique, according to Selvaraju et al. [33].
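The macro-averaged metrics described above can be derived from a confusion matrix as in the following sketch. The 3 × 3 matrix contains made-up counts purely for illustration and is not a result from this study; the function name `macro_metrics` and the row/column convention are assumptions.

```python
import numpy as np

def macro_metrics(cm):
    """Accuracy and macro-averaged precision, recall, F1 from a confusion matrix.

    cm[i, j] counts samples of true class i predicted as class j.
    Assumes every class is predicted at least once (no zero column sums).
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)       # per predicted class (column sums)
    recall = tp / cm.sum(axis=1)          # per true class (row sums)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": tp.sum() / cm.sum(),
        # Macro-averaging: unweighted mean of the per-class values.
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
    }

# Made-up 3-class counts for illustration only; not results from this study.
cm = [[9, 1, 0],
      [1, 8, 1],
      [0, 1, 9]]
scores = macro_metrics(cm)
```

Because the dataset is balanced, macro-averaged recall coincides with overall accuracy, as it does for the symmetric example above.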

3. Results

3.1. Model Performance, Cross-Validation Stability, and Computational Efficiency

Table 2 presents the performance analysis of the five D-CNN models evaluated in the current study using the stratified 70/20/10 train–test–validation split and stratified five-fold cross-validation. For the 70/20/10 split, VGG16 achieved an accuracy of 0.91, with precision, recall, and F1-score values of 0.91, indicating balanced and reliable classification performance across the defect classes. VGG19 showed slightly improved performance over VGG16, with an accuracy of 0.92 and corresponding precision, recall, and F1-score values of 0.92. Inception V3 further improved the classification results, achieving an accuracy of 0.94 with precision, recall, and F1-score values of 0.94, which indicates good overall generalisation and consistent class-wise performance. MobileNet achieved an accuracy of 0.95 and similarly strong precision, recall, and F1-score values of 0.95, demonstrating that it provides an effective balance between classification performance and computational efficiency. Among all evaluated models, DenseNet achieved the best overall performance in the 70/20/10 split, with an accuracy of 0.98 and precision, recall, F1-score, and AUC values of 0.98. These results indicate that DenseNet provides the most robust and balanced performance for multi-class TIG weld defect classification on the present dataset.

To further assess the stability and robustness of the classification outcomes, stratified five-fold cross-validation was performed. The results showed a similar overall trend to the 70/20/10 split, with DenseNet again achieving the highest mean performance, including an accuracy of 0.95 ± 0.03, precision of 0.96 ± 0.02, recall of 0.95 ± 0.02, and F1-score of 0.94 ± 0.02. Inception V3 also showed stable performance, achieving a cross-validation accuracy of 0.94 ± 0.02, followed by MobileNet with 0.91 ± 0.03, VGG19 with 0.88 ± 0.03, and VGG16 with 0.87 ± 0.03; the only change in the accuracy ranking relative to the 70/20/10 split is that Inception V3 moved ahead of MobileNet. Although the cross-validation values were slightly lower than the corresponding 70/20/10 split results for most models, this is expected because cross-validation averages model performance across five different test partitions rather than relying on a single fixed test set. The standard deviations of 0.02 to 0.03 in accuracy indicate stable model performance across different data partitions. Therefore, the five-fold cross-validation provides a more robust and conservative estimate of model stability. Based on its highest classification accuracy, strong cross-validation performance, and consistent class-wise behaviour, DenseNet was selected for further detailed analysis using confusion matrices and Grad-CAM visualisation.
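A stratified five-fold split and the mean ± standard deviation summary can be sketched as follows. This is an illustrative sketch only: the helper `stratified_folds`, the random seed, and the per-fold accuracies are hypothetical, and the use of the sample standard deviation is an assumption because the study does not state which form was used.

```python
import random
import statistics


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin into the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Balanced dataset from Table 1: 5 classes x 3000 images.
labels = [c for c in range(5) for _ in range(3000)]
folds = stratified_folds(labels)

for i, test_idx in enumerate(folds):
    train_size = len(labels) - len(test_idx)
    print(f"fold {i}: train={train_size}, test={len(test_idx)}")

# Hypothetical per-fold accuracies, for illustration only (not study results).
fold_accuracy = [0.90, 0.92, 0.88, 0.91, 0.89]
mean = statistics.mean(fold_accuracy)
std = statistics.stdev(fold_accuracy)  # sample standard deviation (assumed)
print(f"accuracy = {mean:.2f} ± {std:.2f}")
```

Each fold holds 3000 test images (600 per class), leaving 12,000 images for training, which matches the fold sizes stated for the cross-validation in this study.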

Table 3 presents the training and inference times for the stratified 70/20/10 train–test–validation split, while Table 4 presents the average training and inference times per fold for the stratified five-fold cross-validation. For the 70/20/10 split, MobileNet achieved the shortest training time of 70.2 min and the lowest inference time of 7.0 ms/image, indicating its suitability for lightweight or edge-based deployment. DenseNet required 76.4 min for training and achieved an inference time of 10.0 ms/image, while also producing the highest classification performance. VGG16, VGG19, and Inception V3 required longer inference times of 15.2, 16.0, and 14.0 ms/image, respectively. These results show that DenseNet provides a strong balance between classification accuracy and computational feasibility, whereas MobileNet provides the fastest inference speed. For the five-fold cross-validation, the average training and inference times were higher than those obtained from the 70/20/10 split. The longer training time per fold is expected because each fold used 12,000 training images compared with 10,500 training images in the 70/20/10 split; in addition, each model was trained five times, which increases the total computational cost. Therefore, while five-fold cross-validation provides a more robust estimate of model stability, it also requires a greater computational cost. In the five-fold cross-validation analysis, DenseNet achieved the lowest average inference time of 15.0 ms/image (Table 4), which remains practical for rapid weld image classification while maintaining the highest overall classification performance.
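The per-image inference times follow directly from the total inference times and the number of test images. The short sketch below reproduces that arithmetic for the values in Table 3; the function name `ms_per_image` is introduced here for illustration.

```python
# Total inference time in seconds for the 3000 test images, from Table 3.
TOTAL_INFERENCE_S = {
    "VGG16": 45.6,
    "VGG19": 48.0,
    "Inception V3": 42.0,
    "MobileNet": 20.9,
    "DenseNet": 29.9,
}
N_TEST_IMAGES = 3000


def ms_per_image(total_seconds, n_images):
    """Convert a total inference time in seconds to milliseconds per image."""
    return total_seconds * 1000.0 / n_images


latency = {model: ms_per_image(t, N_TEST_IMAGES)
           for model, t in TOTAL_INFERENCE_S.items()}
fastest = min(latency, key=latency.get)

for model, value in latency.items():
    print(f"{model}: {value:.1f} ms/image")
print("fastest:", fastest)
```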

3.2. Training and Validation Accuracy/Loss

Figure 5 depicts the training dynamics of the D-CNN models used in the current study, namely VGG16, VGG19, Inception V3, MobileNet, and DenseNet. Each plot shows the accuracy and loss for both the training and validation datasets across epochs. For VGG16, the training and validation loss curves show stable convergence, indicating that the model learned relevant weld image features without severe divergence between training and validation behaviour. The validation loss closely follows the training loss in Figure 5a, suggesting good generalisation without significant overfitting. Training accuracy for VGG16 increases sharply at first and then levels off, which is typical convergence behaviour. Validation accuracy is closely aligned with training accuracy, suggesting that the model performs consistently on both training and unseen validation data. VGG19 (Figure 5b) shows a similar trend to VGG16, as both belong to the VGG family. The training and validation losses for VGG19 start high and decrease, but the validation loss is slightly higher than that of VGG16 in the later epochs, which might indicate slightly more overfitting. The training and validation accuracies of VGG19 increase steadily, with validation accuracy slightly lower than training accuracy, again suggesting minor overfitting. The training loss for the Inception V3 model (Figure 5c) decreases steadily, which is a good indicator of learning. However, it does not reach as low a value as for the VGG models, which might suggest less effective optimisation or a need to adjust the learning parameters. The validation loss of the Inception V3 model starts lower but shows slight volatility, which can be typical of deeper and more complex architectures. The training and validation accuracies of the Inception V3 model are very close to each other, indicating that the model generalises well despite its complexity. 
The training and validation losses for the MobileNet model (Figure 5d) decrease and are quite low compared with the other models, suggesting effective learning and optimisation. The accuracies are very stable, with validation accuracy almost matching training accuracy, which is desirable in practical scenarios. The losses for the DenseNet model (Figure 5e) show that the model is well optimised, with the lowest training and validation loss among all models. Its accuracies are also high and stable, indicating strong learning and generalisation capabilities.

3.3. Classifications of Defects

Table 5 presents the confusion matrices for the five D-CNN models applied to the welding defect classification task.

Each matrix presents the classification results for the five weld categories including good weld, contamination, lack of fusion, lack of penetration, and misalignment. The confusion matrix analysis shows that VGG16 achieved reasonable classification performance across all classes, although greater confusion remained for lack of penetration and misalignment compared with the other categories. VGG19 improved the overall classification performance, with stronger results for contamination, lack of penetration, and misalignment, while some confusion was still observed for lack of fusion. Inception V3 further improved the class-wise predictions and showed more balanced performance across all categories, with only limited confusion between visually similar defects. MobileNet also demonstrated strong classification performance, particularly for good weld, contamination, and misalignment, while a small number of errors remained for lack of fusion and lack of penetration. Among all the evaluated models, DenseNet exhibited the most robust and reliable performance, with the highest number of correct predictions in each class and the lowest overall level of misclassification. In particular, DenseNet classified contamination, lack of fusion, lack of penetration, and misalignment with very high accuracy, while only a small number of samples were assigned to neighbouring classes. Overall, the confusion matrix analysis confirms that DenseNet achieved the best multi-class TIG weld defect classification performance on the present dataset.
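The class-wise reading of a confusion matrix can be made concrete with a small sketch. It uses the VGG16 matrix from Table 5 and assumes that rows are actual classes and columns are predicted classes; the helper names are hypothetical and the code is illustrative rather than the authors' analysis.

```python
CLASSES = ["Good weld", "Contamination", "Lack of fusion",
           "Lack of penetration", "Misalignment"]

# VGG16 confusion matrix from Table 5 (assumed: rows actual, columns predicted).
VGG16 = [
    [574, 2, 8, 7, 9],
    [10, 555, 12, 8, 15],
    [5, 8, 550, 12, 25],
    [18, 16, 30, 512, 24],
    [14, 9, 13, 34, 530],
]


def per_class_recall(cm):
    """Fraction of each actual class that was predicted correctly."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


def largest_confusion(cm):
    """Return (actual, predicted, count) for the largest off-diagonal entry."""
    best = (0, 0, -1)
    for actual, row in enumerate(cm):
        for predicted, count in enumerate(row):
            if actual != predicted and count > best[2]:
                best = (actual, predicted, count)
    return best


recall = per_class_recall(VGG16)
for name, value in zip(CLASSES, recall):
    print(f"{name}: recall = {value:.3f}")

actual, predicted, count = largest_confusion(VGG16)
print(f"most frequent error: {CLASSES[actual]} predicted as "
      f"{CLASSES[predicted]} ({count} images)")
```

For VGG16 the lowest recall is for lack of penetration (512 of 600 images), and the most frequent single error is misalignment predicted as lack of penetration (34 images), which is in line with the confusion noted for these two classes.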

3.4. Detection of Weld Joints

Figure 6 depicts HDR camera-captured images alongside the corresponding Grad-CAM images. These images were analysed using the DenseNet model through the application of Grad-CAM heatmaps, which provide “visual explanations” for the model’s decision-making process, as proposed by Selvaraju et al. [33]. This approach exploits the ability of convolutional layers to retain spatial information, enabling the identification of specific, class-related features within the images, such as those associated with good welds, contamination, lack of fusion, lack of penetration, and misalignment. Figure 7 visualises the most critical regions in the welded joint as a heatmap.

The Grad-CAM analysis in this study was applied exclusively to the DenseNet architecture. Grad-CAM uses gradient information flowing into the final convolutional layer to assign importance values to the features relevant to each predicted class. The heatmap visualisations highlight the most critical regions in the welded joint (shown in yellow), indicating where the DenseNet model focuses most when assessing weld quality, while less pertinent areas are shown in blue. This method enhances the transparency and understanding of how the CNN processes and evaluates TIG welding images, and, by delineating the areas of interest that correlate with specific welding defects, it can also inform further refinement of the model. Such interpretability supports consistent weld evaluation and quality assurance in TIG welding applications.
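The core Grad-CAM computation of Selvaraju et al. [33] can be sketched in a few lines: each channel of the final convolutional feature maps is weighted by the spatial average of the class-score gradient for that channel, the weighted maps are summed, and a ReLU keeps only positive evidence. The sketch below is a toy illustration with made-up numbers, not the implementation used in this study; in practice the feature maps and gradients would be obtained from a deep learning framework, and the scaling to [0, 1] is a common display convention assumed here.

```python
def grad_cam(feature_maps, gradients):
    """Compute a Grad-CAM heatmap.

    feature_maps, gradients: lists of K channels, each an H x W nested list.
    gradients holds d(class score) / d(feature map) at each position.
    """
    k = len(feature_maps)
    h = len(feature_maps[0])
    w = len(feature_maps[0][0])
    # Channel weights: global average of the gradients over spatial positions.
    weights = [sum(sum(row) for row in gradients[c]) / (h * w)
               for c in range(k)]
    # Weighted sum of the feature maps, followed by ReLU.
    cam = [[max(0.0, sum(weights[c] * feature_maps[c][i][j] for c in range(k)))
            for j in range(w)]
           for i in range(h)]
    # Scale to [0, 1] so the map can be displayed as a heatmap.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Toy example: 2 channels, 2 x 2 spatial grid, hypothetical values.
A = [[[1.0, 0.0], [0.0, 2.0]],
     [[0.0, 3.0], [1.0, 0.0]]]
G = [[[1.0, 1.0], [1.0, 1.0]],       # channel weight +1.0
     [[-1.0, -1.0], [-1.0, -1.0]]]   # channel weight -1.0
heatmap = grad_cam(A, G)
print(heatmap)
```

In this toy case the second channel has a negative weight, so the positions where it is active are suppressed by the ReLU and only the positions supported by the first channel remain highlighted.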

4. Discussion

In the present study, the five evaluated transfer-learning-based CNN architectures were assessed on a balanced HDR image dataset for the multi-class TIG welding defect classification of Aluminium 5083 joints. The benefits of this study are therefore methodological and application-oriented rather than architectural, as the work establishes a consistent comparative framework for assessing how different deep learning model families perform under the same dataset and evaluation setting. Among the evaluated models, DenseNet performed best in distinguishing good welds from defects such as contamination, lack of fusion, lack of penetration, and misalignment, achieving a classification accuracy of 98%. In addition, the use of Grad-CAM as an XAI tool complements numerical performance measures by highlighting defect-relevant weld regions, thereby improving the interpretability of the classification outcomes.

Comparison with previously reported studies further supports the effectiveness of the evaluated transfer-learning-based models for TIG weld defect classification. As shown in Figure 8, Bacioiu et al. [29,30] applied a conventional CNN model to TIG welding datasets for Aluminium 5083 and SS304, while Sekhar et al. [31] and Wang et al. [32] investigated more advanced CNN-based architectures on similar datasets. The conventional CNN reported by Bacioiu et al. [30] showed the lowest accuracy, around 70%, whereas the transfer-learning-based architectures evaluated in the present study achieved a substantially higher classification performance, ranging from 91% to 98%. Similar findings were reported by Sekhar et al. [31] on a TIG welding dataset for SS304, where DenseNet also demonstrated a high classification accuracy of 97.28% for multi-class and 99.69% for two-class classification. These findings suggest that advanced deep CNN models, particularly DenseNet, provide a more effective solution for multi-class TIG weld defect classification than earlier conventional approaches.

Among the evaluated architectures, DenseNet stands out because of its ability to efficiently learn and reuse features across layers, which is particularly advantageous in weld defect classification. The dense connectivity in DenseNet allows each layer to access the feature maps of all preceding layers, promoting extensive feature reuse. This is particularly beneficial in weld image analysis, where defects have complex patterns and are challenging to capture. By reusing features across layers, DenseNet can more effectively learn the critical differences between a good weld and a defective weld, leading to more accurate classification. The confusion matrices further support this result, as DenseNet showed the lowest overall misclassification and the most balanced class-wise performance among all evaluated architectures.

The cross-validation and computational-time results further strengthen the practical interpretation of the model comparison. The consistent ranking of DenseNet across both the 70/20/10 split and the five-fold cross-validation suggests that its superior performance is not only due to the selected hold-out test set but also reflects a stable feature-learning capability across different data partitions. However, model selection for industrial inspection should consider both predictive accuracy and computational efficiency. Within the computational environment used in this study, DenseNet provided the strongest balance between classification performance, cross-validation stability, and inference latency, while MobileNet showed the lowest inference time among the evaluated models. The inference-time results can also be related to the image acquisition rate of the HDR camera used for dataset generation. Bacioiu et al. [30] used an Xiris XVC-1000 HDR camera and monitored the welding process at 55 frames per second, corresponding to approximately 18.18 ms per frame. In the 70/20/10 split analysis, DenseNet achieved an inference time of 10.0 ms/image, while MobileNet achieved the fastest inference time of 7.0 ms/image. Both values are below the nominal camera frame interval, suggesting that these models provide promising inference latency for frame-level TIG weld image classification. However, this comparison considers only model prediction time. Full real-time deployment would also require evaluation of image acquisition, preprocessing, data transfer, hardware integration, and synchronisation with welding speed and process parameters.
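The frame-interval comparison above can be written as a small check. The sketch uses the 55 frames per second and the 70/20/10 split inference times quoted in the text; the `overhead_ms` parameter is a hypothetical placeholder for acquisition, preprocessing, and data-transfer time, which was not measured in this study.

```python
CAMERA_FPS = 55  # frame rate reported for the HDR camera
frame_interval_ms = 1000.0 / CAMERA_FPS

# Inference times per image from the 70/20/10 split analysis.
INFERENCE_MS = {"DenseNet": 10.0, "MobileNet": 7.0}


def keeps_up(latency_ms, overhead_ms=0.0):
    """True if inference plus any extra per-frame overhead fits in one frame.

    overhead_ms is a hypothetical allowance for steps other than model
    prediction; it defaults to zero, as in the comparison in the text.
    """
    return latency_ms + overhead_ms <= frame_interval_ms


print(f"frame interval: {frame_interval_ms:.2f} ms")
for model, latency_ms in INFERENCE_MS.items():
    headroom = frame_interval_ms - latency_ms
    print(f"{model}: {latency_ms:.1f} ms/image, "
          f"headroom {headroom:.2f} ms, keeps up: {keeps_up(latency_ms)}")
```

With zero overhead both models fit within the frame interval, but the headroom is only a few milliseconds, so any additional per-frame processing in a deployed pipeline would need to be measured before real-time operation could be claimed.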

Examining the DenseNet model further through Grad-CAM (Figure 6) shows that the model focuses on specific areas of the welds, such as the bright, uniform region in good welds and the irregular, asymmetrical regions in defective welds such as contamination, lack of fusion, and lack of penetration. Similarly, the heatmaps in Figure 7 visualise the intensity of the model’s focus across the welds. In good welds, the heatmap shows a concentrated, symmetrical region, indicating consistent quality. In contrast, defective welds exhibit more scattered or shifted heatmap patterns, reflecting the model’s detection of irregularities. Taken together, these visual tools indicate that DenseNet not only classifies weld quality accurately but also provides interpretable visual cues that highlight the specific areas contributing to its decisions. From an industrial perspective, the high classification accuracy of DenseNet and its low inference time indicate that the proposed framework could support rapid frame-level assessment of TIG weld quality. When integrated with an HDR camera near the welding station, the model could classify weld images during the welding process, and Grad-CAM could help operators verify whether predictions are based on physically meaningful weld regions.

The proposed framework was validated on the balanced Aluminium 5083 TIG welding HDR dataset and showed strong classification performance, cross-validation stability, and interpretable Grad-CAM responses. However, generalisation beyond the present dataset should be examined in future work using additional weld images from different materials, plate thicknesses, welding parameters, lighting conditions, and imaging systems. Such external testing would help assess the cross-material and cross-camera robustness of the proposed framework.

5. Conclusions

The present study systematically evaluated advanced transfer-learning-based D-CNN models for the multi-class TIG welding defect classification of Aluminium 5083 joints using HDR imagery. The key contribution of this work lies in combining a controlled comparative evaluation of multiple transfer-learning-based D-CNN architectures with XAI-based interpretability through Grad-CAM. This framework provides not only quantitative evidence for model selection but also visual insight into defect-relevant weld regions, thereby supporting more reliable and interpretable TIG weld inspection. All the models were trained, validated, and tested, and their performance measures were compared to identify the most suitable model for the Grad-CAM-based defect analysis. In addition, stratified five-fold cross-validation was performed to assess model stability across different data partitions, and training/inference time was recorded to evaluate computational feasibility. The accuracy of the most suitable model was also compared with that of models previously applied by Bacioiu et al. [30], Sekhar et al. [31] and Wang et al. [32]. The main findings are summarised as follows:

DenseNet achieved the highest classification accuracy of 98% in the 70/20/10 train–test–validation split, compared with MobileNet at 95%, Inception V3 at 94%, VGG19 at 92%, and VGG16 at 91%.
The stratified five-fold cross-validation results confirmed the stability of the classification outcomes, with DenseNet achieving the highest mean accuracy of 0.95 ± 0.03 across the five folds.
The computational-time analysis showed that DenseNet provided the strongest balance between classification accuracy, cross-validation stability, and inference latency within the computational environment used in this study.
DenseNet exhibited robust performance across the defect classes, with the highest accuracy and the least confusion, making it reliable and suitable for defect classification in the TIG welding process. Its improved performance relative to the other evaluated models and to models reported in the literature [30,31,32] is associated with more effective feature reuse and stronger class-wise consistency, as reflected in both the performance metrics and the confusion matrices.
Compared with the conventional CNN model proposed by Bacioiu et al. [30] on a similar dataset, the advanced architectures evaluated in the present study achieved substantially higher classification accuracy.
The “Grad-CAM with DenseNet” approach has the potential to automate the identification of critical features within complex defects in welded joints.

Author Contributions

Conceptualisation, D.N., T.B. and S.N.; methodology, D.N., T.B. and D.H.; software, D.N., T.B. and D.H.; validation, D.N., T.B. and D.H.; formal analysis, D.N., T.B., D.H. and S.N.; investigation, D.N., T.B., and D.H.; resources, C.M. and S.N.; data curation, T.B. and D.H.; writing—original draft preparation, D.N., M.S., T.B. and S.N.; writing—review and editing, D.N., D.H., M.S., T.B., C.M. and S.N.; visualisation, T.B. and D.H.; supervision, C.M. and S.N.; project administration, D.N. and S.N.; funding acquisition, D.N. and S.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The dataset used in this research work is publicly available at https://www.kaggle.com/datasets/danielbacioiu/tig-aluminium-5083 (accessed on 2 April 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Shravan, C.; Radhika, N.; Deepak Kumar, N.H.; Sivasailam, B. A Review on Welding Techniques: Properties, Characterisations and Engineering Applications. Adv. Mater. Process. Technol. 2024, 10, 1126–1181.
2. Ogbonna, O.S.; Akinlabi, S.A.; Madushele, N.; Mashinini, P.M.; Abioye, A.A. Application of MIG and TIG Welding in Automobile Industry. J. Phys. Conf. Ser. 2019, 1378, 042065.
3. Sen, R.; Choudhury, S.P.; Kumar, R.; Panda, A. A Comprehensive Review on the Feasibility Study of Metal Inert Gas Welding. Mater. Today Proc. 2018, 5, 17792–17801.
4. Singh, S.R.; Khanna, P. A-TIG (Activated Flux Tungsten Inert Gas) Welding:—A Review. Mater. Today Proc. 2021, 44, 808–820.
5. Ezer, M.A.; Çam, G. A Study on Microstructure and Mechanical Performance of Gas Metal Arc Welded AISI 304 L Joints. Materwiss. Werksttech. 2022, 53, 1043–1052.
6. Şenol, M.; Çam, G. Investigation into Microstructures and Properties of AISI 430 Ferritic Steel Butt Joints Fabricated by GMAW. Int. J. Press. Vessel. Pip. 2023, 202, 104926.
7. Serindağ, H.T.; Çam, G. Characterizations of Microstructure and Properties of Dissimilar AISI 316L/9Ni Low-Alloy Cryogenic Steel Joints Fabricated by Gas Tungsten Arc Welding. J. Mater. Eng. Perform. 2023, 32, 7039–7049.
8. Serindağ, H.T.; Çam, G. Multi-Pass Butt Welding of Thick AISI 316L Plates by Gas Tungsten Arc Welding: Microstructural and Mechanical Characterization. Int. J. Press. Vessel. Pip. 2022, 200, 13–16.
9. Serindağ, H.T.; Tardu, C.; Kirçiçek, İ.Ö.; Çam, G. A Study on Microstructural and Mechanical Properties of Gas Tungsten Arc Welded Thick Cryogenic 9% Ni Alloy Steel Butt Joint. CIRP J. Manuf. Sci. Technol. 2022, 37, 1–10.
10. Sumesh, A.; Nair, B.B.; Rameshkumar, K.; Santhakumari, A.; Raja, A.; Mohandas, K. Decision Tree Based Weld Defect Classification Using Current and Voltage Signatures in GMAW Process. Mater. Today Proc. 2018, 5, 8354–8363.
11. Stavridis, J.; Papacharalampopoulos, A.; Stavropoulos, P. A Cognitive Approach for Quality Assessment in Laser Welding. Procedia CIRP 2018, 72, 1542–1547.
12. Gupta, M.; Khan, M.A.; Butola, R.; Singari, R.M. Advances in Applications of Non-Destructive Testing (NDT): A Review. Adv. Mater. Process. Technol. 2022, 8, 2286–2307.
13. Lukács, H.I.; Beregi, B.Z.; Porteleki, B.; Fischl, T.; Botzheim, J. Attention U-Net-Based Semantic Segmentation for Welding Line Detection. Sci. Rep. 2025, 15, 15276.
14. Pugliese, R.; Regondi, S.; Marini, R. Machine Learning-Based Approach: Global Trends, Research Directions, and Regulatory Standpoints. Data Sci. Manag. 2021, 4, 19–29.
15. Sarker, I.H. Machine Learning: Algorithms, Real-World Applications and Research Directions. SN Comput. Sci. 2021, 2, 160.
16. Patil, D.B.; Nigam, A.; Mohapatra, S.; Nikam, S. A Deep Learning Approach to Classify and Detect Defects in the Components Manufactured by Laser Directed Energy Deposition Process. Machines 2023, 11, 854.
17. Ma, G.; Yuan, H.; Yu, L.; He, Y. Monitoring of Weld Defects of Visual Sensing Assisted GMAW Process with Galvanized Steel. Mater. Manuf. Process. 2021, 36, 1178–1188.
18. Albawi, S.; Mohammed, T.A.; Al-Zawi, S. Understanding of a Convolutional Neural Network. In Proceedings of the 2017 International Conference on Engineering and Technology, ICET, Antalya, Turkey, 21–23 August 2017; pp. 1–6.
19. Paturi, U.M.R.; Cheruku, S. Application and Performance of Machine Learning Techniques in Manufacturing Sector from the Past Two Decades: A Review. Mater. Today Proc. 2020, 38, 2392–2401.
20. Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B.; Liu, T.; Wang, X.; Wang, G.; Cai, J.; et al. Recent Advances in Convolutional Neural Networks. Pattern Recognit. 2018, 77, 354–377.
21. Cui, J.; Zhang, B.; Wang, X.; Wu, J.; Liu, J.; Li, Y.; Zhi, X.; Zhang, W.; Yu, X. Impact of Annotation Quality on Model Performance of Welding Defect Detection Using Deep Learning. Weld. World 2024, 68, 855–865.
22. Lv, H.; Li, C.; Dai, J.; Zhang, Y.; Fan, Z.; Tan, Y.; Wang, D.; Xie, B. Lightweight Framework for Underground Pipeline Recognition and Spatial Localization Based on Multiview 2-D GPR Images. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5110115.
23. Wang, D.; Lv, H.; Zhang, Y.; Fan, Z.; Ni, Y.; Lv, S.; Shen, P.; Tang, F.; Wu, H. Monitoring and Detection Technologies and AI-Powered Development Toward Transparent Roads. Engineering 2025.
24. Zhang, B.; Wang, X.; Cui, J.; Wu, J.; Wang, X.; Li, Y.; Li, J.; Tan, Y.; Chen, X.; Wu, W.; et al. Welding Defects Classification by Weakly Supervised Semantic Segmentation. NDT E Int. 2023, 138, 102899.
25. Say, D.; Zidi, S.; Qaisar, S.M.; Krichen, M. Automated Categorization of Multiclass Welding Defects Using the X-Ray Image Augmentation and Convolutional Neural Network. Sensors 2023, 23, 6422.
26. Cherkasov, N.; Ivanov, M.; Ulanov, A. Classification of Weld Defects Based on Computer Vision System Data and Deep Learning. In Proceedings of the 2023 International Conference on Industrial Engineering, Applications and Manufacturing, ICIEAM, Sochi, Russia, 15–19 May 2023; pp. 856–860.
27. Boutin, T.; Bendaoud, I.; Delmas, J.; Borel, D.; Bordreuil, C. Machine Learning Approach for Weld Configuration Classification within the GTAW Process. CIRP J. Manuf. Sci. Technol. 2023, 47, 116–131.
28. Ghimire, R.; Selvam, R. Machine Learning-Based Weld Classification for Quality Monitoring. Eng. Proc. 2023, 59, 9241.
29. Bacioiu, D.; Melton, G.; Papaelias, M.; Shaw, R. Automated Defect Classification of SS304 TIG Welding Process Using Visible Spectrum Camera and Machine Learning. NDT E Int. 2019, 107, 102139.
30. Bacioiu, D.; Melton, G.; Papaelias, M.; Shaw, R. Automated Defect Classification of Aluminium 5083 TIG Welding Using HDR Camera and Neural Networks. J. Manuf. Process. 2019, 45, 603–613.
31. Sekhar, R.; Sharma, D.; Shah, P. Intelligent Classification of Tungsten Inert Gas Welding Defects: A Transfer Learning Approach. Front. Mech. Eng. 2022, 8, 824038.
32. Wang, R.; Wang, H.; He, Z.; Zhu, J.; Zuo, H. WeldNet: A Lightweight Deep Learning Model for Welding Defect Recognition. Weld. World 2024, 68, 2963–2974.
33. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. Int. J. Comput. Vis. 2020, 128, 336–359.
34. Wu, M.; Yao, Z.; Verbeke, M.; Karsmakers, P.; Gorissen, B.; Reynaerts, D. Data-driven models with physical interpretability for real-time cavity profile prediction in electrochemical machining processes. Eng. Appl. Artif. Intell. 2025, 160, 111807.
35. Wu, M.; Shukla, S.; Vrancken, B.; Verbeke, M.; Karsmakers, P. Data-Driven Approach to Identify Acoustic Emission Source Motion and Positioning Effects in Laser Powder Bed Fusion with Frequency Analysis. Procedia CIRP 2025, 133, 531–536.
36. Lee, C.; Seo, G.; Kim, D.B.; Kim, M.; Shin, J.-H. Development of Defect Detection AI Model for Wire + Arc Additive Manufacturing Using High Dynamic Range Images. Appl. Sci. 2021, 11, 7541.
37. Aminudin, M.A.I.; Abdullah, M.N.; Mustapha, F.; Eng, K.K.; Mustapha, M.; Mustapha, A. Explainable Deep Learning Framework for Binary Corrosion Image Classification Using Grad-CAM. Sensors 2025, 25, 7070.
38. Elhendawy, G.A.; El-Taybany, Y. Machine Vision-Assisted Welding Defect Detection System with Convolutional Neural Networks. Int. J. Precis. Eng. Manuf. 2025, 26, 3185–3194.
39. Daniel, B. TIG Aluminium 5083. Available online: https://www.kaggle.com/danielbacioiu/tig-aluminium-5083 (accessed on 2 April 2024).
40. Wang, Z.; Li, L.; Chen, H.; Lin, S.; Wu, J.; Ding, T.; Tian, J.; Xu, M. Recognition of GTAW Weld Penetration Based on the Lightweight Model and Transfer Learning. Weld. World 2023, 67, 251–264.
41. Tsiakmaki, M.; Kostopoulos, G.; Kotsiantis, S.; Ragos, O. Transfer Learning from Deep Neural Networks for Predicting Student Performance. Appl. Sci. 2020, 10, 2145.
42. Glorot, X.; Bordes, A.; Bengio, Y. Deep Sparse Rectifier Neural Networks. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS) 2011, Fort Lauderdale, FL, USA, 11–13 April 2011; Volume 15, pp. 315–323.
43. Johnson, J.M.; Khoshgoftaar, T.M. Survey on Deep Learning with Class Imbalance. J. Big Data 2019, 6, 27.
44. Branco, P.; Torgo, L.; Ribeiro, R.P. A Survey of Predictive Modeling on Imbalanced Domains. ACM Comput. Surv. 2016, 49, 1–50.

Figure 1. Process used in the current study.

Figure 2. Different classes in the dataset built for the joining of grade 5083 Aluminium using a TIG welding process [39].

Figure 3. Stratified 5-fold cross-validation split of the balanced dataset.

Figure 4. Architecture of CNN models used in the current study: (a) VGG16, (b) VGG19, (c) Inception V3, (d) MobileNet and (e) DenseNet.

Figure 5. Training and validation accuracy/loss for the D-CNN models: (a) VGG16, (b) VGG19, (c) Inception V3, (d) MobileNet, and (e) DenseNet.

Figure 6. HDR camera captured image alongside Grad-CAM image detecting defects in (a) good weld, (b) contamination, (c) lack of fusion, (d) lack of penetration and (e) misalignment using DenseNet model.

Figure 7. Visualisation of the most critical regions in (a) good weld, (b) contamination, (c) lack of fusion, (d) lack of penetration and (e) misalignment through heatmap.

Figure 8. Comparison of classification accuracies for previously proposed and currently used CNN models [30,31,32].

Table 1. Split of dataset used in current study.

Class	Original Dataset	Balanced Dataset	Training Images (70%)	Testing Images (20%)	Validation Images (10%)
Good weld	10,947	3000	2100	600	300
Contamination	8403	3000	2100	600	300
Lack of fusion	5035	3000	2100	600	300
Lack of penetration	3053	3000	2100	600	300
Misalignment	3187	3000	2100	600	300

Table 2. Model performance analysis for 70/20/10 split and cross-validation split.

Model	Evaluation Method	Accuracy	Precision	Recall	F1-Score	AUC
VGG16	70/20/10 split	0.91	0.91	0.91	0.91	0.98
VGG16	5-fold Cross-Validation (mean ± standard deviation)	0.87 ± 0.03	0.87 ± 0.03	0.89 ± 0.02	0.90 ± 0.01	0.94 ± 0.03
VGG19	70/20/10 split	0.92	0.92	0.92	0.92	0.97
VGG19	5-fold Cross-Validation (mean ± standard deviation)	0.88 ± 0.03	0.89 ± 0.03	0.88 ± 0.04	0.89 ± 0.02	0.95 ± 0.02
Inception V3	70/20/10 split	0.94	0.94	0.94	0.94	0.97
Inception V3	5-fold Cross-Validation (mean ± standard deviation)	0.94 ± 0.02	0.92 ± 0.02	0.92 ± 0.02	0.92 ± 0.03	0.95 ± 0.02
MobileNet	70/20/10 split	0.95	0.95	0.95	0.95	0.96
MobileNet	5-fold Cross-Validation (mean ± standard deviation)	0.91 ± 0.03	0.89 ± 0.02	0.91 ± 0.03	0.91 ± 0.03	0.93 ± 0.02
DenseNet	70/20/10 split	0.98	0.98	0.98	0.98	0.98
DenseNet	5-fold Cross-Validation (mean ± standard deviation)	0.95 ± 0.03	0.96 ± 0.02	0.95 ± 0.02	0.94 ± 0.02	0.96 ± 0.02

Table 3. Training and inference time for 70/20/10 train–test–validation split.

Model	Training Time for 50 Epochs (min)	Test Images	Total Inference Time (s)	Inference Time per Image (ms/Image)
VGG16	98.6	3000	45.6	15.2
VGG19	88.3	3000	48.0	16.0
Inception V3	80.0	3000	42.0	14.0
MobileNet	70.2	3000	20.9	7.0
DenseNet	76.4	3000	29.9	10.0

Table 4. Training and inference time for 5-fold cross-validation split.

Model	Average Training Time per Fold for 50 Epochs (min)	Test Images per Fold	Average Total Inference Time per Fold (s)	Average Inference Time per Image (ms/Image)
VGG16	128.8	3000	78.6	26.2
VGG19	118.0	3000	66.8	22.3
Inception V3	116.2	3000	63.4	21.1
MobileNet	105.0	3000	56.6	18.9
DenseNet	110.0	3000	45.0	15.0

Table 5. Confusion matrix.

Model	Actual Class \ Predicted Class	Good Weld	Contamination	Lack of Fusion	Lack of Penetration	Misalignment
VGG16	Good weld	574	2	8	7	9
	Contamination	10	555	12	8	15
	Lack of fusion	5	8	550	12	25
	Lack of penetration	18	16	30	512	24
	Misalignment	14	9	13	34	530
VGG19	Good weld	553	4	18	10	15
	Contamination	11	559	9	8	13
	Lack of fusion	25	10	535	12	18
	Lack of penetration	15	7	13	555	10
	Misalignment	12	6	7	5	570
Inception V3	Good weld	576	5	7	3	9
	Contamination	8	559	14	9	10
	Lack of fusion	12	16	547	11	14
	Lack of penetration	5	3	4	582	6
	Misalignment	15	4	8	13	560
MobileNet	Good weld	575	7	4	8	6
	Contamination	5	565	11	9	10
	Lack of fusion	18	15	550	10	7
	Lack of penetration	8	4	12	570	6
	Misalignment	6	6	3	5	580
DenseNet	Good weld	583	3	5	4	5
	Contamination	2	589	3	2	4
	Lack of fusion	4	4	587	2	3
	Lack of penetration	5	2	6	585	2
	Misalignment	2	6	3	2	587

Diagonal entries (shown in bold in the original table) are correct predictions; off-diagonal entries are misclassifications.
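
Overall accuracy and per-class precision, recall, and F1-score can be recovered from a confusion matrix. The sketch below does this for the DenseNet block of Table 5. It assumes that rows are true classes and columns are predicted classes, which the table does not state; if the orientation is reversed, precision and recall swap while accuracy is unchanged. The function names are illustrative.

```python
CLASSES = [
    "Good weld",
    "Contamination",
    "Lack of fusion",
    "Lack of penetration",
    "Misalignment",
]

# DenseNet block of Table 5.
# ASSUMPTION: rows are true classes, columns are predicted classes.
DENSENET = [
    [583, 3, 5, 4, 5],
    [2, 589, 3, 2, 4],
    [4, 4, 587, 2, 3],
    [5, 2, 6, 585, 2],
    [2, 6, 3, 2, 587],
]


def accuracy(cm):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_metrics(cm):
    """Return a list of (precision, recall, f1) tuples, one per class."""
    n = len(cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        row_total = sum(cm[k])                       # samples whose true class is k
        col_total = sum(cm[i][k] for i in range(n))  # samples predicted as class k
        recall = tp / row_total
        precision = tp / col_total
        f1 = 2 * precision * recall / (precision + recall)
        metrics.append((precision, recall, f1))
    return metrics


print(f"accuracy: {accuracy(DENSENET):.3f}")
for name, (p, r, f1) in zip(CLASSES, per_class_metrics(DENSENET)):
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

The diagonal sums to 2931 of 3000 images, so the computed accuracy rounds to 0.98, consistent with the 98% reported for DenseNet.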


Share and Cite

MDPI and ACS Style

Nikam, D.; Nikam, S.; Bhosale, T.; Harkin, D.; Sawant, M.; McGarrigle, C. Deep CNN-Based Multi-Class TIG Welding Defect Classification Using HDR Images with Explainable AI. J. Manuf. Mater. Process. 2026, 10, 193. https://doi.org/10.3390/jmmp10060193

