Article

Grad-CAM-Assisted Deep Learning for Mode Hop Localization in Shearographic Tire Inspection

by Manuel Friebolin *, Michael Munz and Klaus Schlickenrieder
Institute for Production Engineering and Materials Testing, Ulm University of Applied Sciences, 89081 Ulm, Germany
* Author to whom correspondence should be addressed.
AI for Sensor Data Analytics Research Group.
AI 2025, 6(10), 275; https://doi.org/10.3390/ai6100275
Submission received: 14 September 2025 / Revised: 15 October 2025 / Accepted: 16 October 2025 / Published: 21 October 2025

Abstract

In shearography-based tire testing, so-called “Mode Hops” (abrupt phase changes caused by sudden laser mode transitions) can lead to significant disturbances in the interference image analysis. These artifacts distort defect assessment, lead to retesting or false-positive decisions, and, thus, represent a significant hurdle for the automation of the shearography-based tire inspection process. This work proposes a deep learning workflow that combines a pretrained, optimized ResNet-50 classifier with Grad-CAM, providing a practical and explainable solution for the reliable detection and localization of Mode Hops in shearographic tire inspection images. We trained the algorithm on an extensive, cross-machine dataset comprising more than 6.5 million test images. The final deep learning model achieves a classification accuracy of 99.67%, a false-negative rate of 0.48%, and a false-positive rate of 0.24%. Applying a probability-based quadrant-repeat decision rule within the inspection process effectively reduces process-level false positives to zero, with an estimated probability of repetition of ≤0.084%. This statistically validated approach increases the overall inspection accuracy to 99.83%. The method allows the robust detection and localization of relevant Mode Hops and represents a significant contribution to explainable, AI-supported tire testing. It fulfills central requirements for the automation of shearography-based tire testing and contributes to the possible certification process of non-destructive testing methods in safety-critical industries.

1. Introduction

Quality assurance in tire testing is an essential challenge in tire manufacturing and tire retreading. In particular, shearography-based tire testing has established itself as a reliable, non-destructive method for identifying internal defects. Shearography is widely used in industrial practice, especially for retreaded tires, where nearly 100% of tires undergo non-destructive testing. For new tires, shearographic inspection is often performed as part of random quality control. As a result, millions of tires are subjected to shearographic testing worldwide each year, contributing significantly to safety and reliability in the tire industry. Despite the maturity of the technology, so-called Mode Hops can occur, i.e., abrupt phase changes in the interference patterns caused by sudden shifts in longitudinal laser modes, which are discussed in the literature in connection with interferometric instability [1].
The detection of relevant defects by analyzing the surface deformation is, thus, distorted and may require the entire test procedure to be repeated. Precise identification of these artifacts (Mode Hops) is, therefore, necessary in order to increase the detection quality of defects or findings in the carcass, prevent repetition of the entire tire testing process, and minimize false positives in the findings.
The aim of this work is to propose a deep learning workflow that integrates a pretrained, optimized ResNet-50 classifier with Grad-CAM for the robust and explainable detection of Mode Hops in shearography-based tire testing.
Deep learning has established itself as the state of the art in image analysis, outperforming classical machine learning approaches due to its ability to learn complex, hierarchical features directly from data. This is particularly advantageous in the context of shearographic imaging, where signal distortions caused by Mode Hops exhibit high variability depending on the inspection area of the tire, the material structure, and the specific test conditions. In contrast to rule-based or hand-crafted feature methods, convolutional neural networks (CNNs) can generalize effectively across such variations, making them a suitable choice for robust and scalable defect detection. Their proven performance in visual pattern recognition tasks motivates their application to reliably identify Mode Hops and, thereby, enhance both the quality and automation potential of the inspection process.
For the first time, the method developed in this work was trained and evaluated on a representative industrial dataset of shearography-based tire inspections, demonstrating its applicability under real-world testing conditions.
This work provides a comprehensive overview of the development pipeline—from dataset generation and model architecture design to training methodology and explainable visualization and localization of Mode Hops using Gradient-weighted Class Activation Mapping (Grad-CAM). In addition, a method for spatial localization of detected artifacts within the shearographic image data has been integrated, enabling deeper analysis and improved interpretability of the results. The developed system is designed to support a robust and automatable decision-making process within the shearography-based inspection workflow and forms a foundation for future industrial implementation in the context of explainable artificial intelligence. Beyond tire inspection, the developed workflow can serve as a generic approach for explainable AI in non-destructive testing and advanced manufacturing, where XAI methods such as Grad-CAM are increasingly used for anomaly detection and process diagnostics [2].
Shearography has long been established as a non-destructive testing (NDT) technique in tire inspection due to its sensitivity to surface and subsurface strain anomalies [3]. However, systematic overviews of AI integration in shearography-based tire testing remain scarce. Prior studies primarily addressed defect or quality assessment rather than interference artefacts such as Mode Hops. Wang et al. [4] and Chang et al. [5] demonstrated that convolutional neural networks can reliably detect structural anomalies in tire shearography, while Li et al. [6] combined digital shearography with machine learning to improve defect characterization. More recently, Saleh et al. [2] showed that Xception Networks with Grad-CAM interpretation can support explainable defect localization in industrial visual inspection tasks. Building on these advances, the present paper targets the automation of Mode-Hop detection and explainable quadrant-level localization under real-world factory conditions, aiming to improve both reliability and traceability in the tire inspection process.

2. State of the Art

2.1. Shearography

Tire testing using shearography is an economical and highly precise method of non-destructive material testing. The method is based on the basic principles of laser interferometry, in which coherent laser light is split into two beams: a reference beam and a measurement beam that illuminates the tire. The interference of both beams creates patterns whose phase shifts provide information about the deformations and stress states occurring in the tire material [7].
In a specially designed test rig, the tire is first subjected to a controlled mechanical load in a measuring chamber using negative pressure. The negative pressure causes the tire to be stressed evenly and produces fine deformations in the structure. These deformations are then recorded by a high-resolution camera system, which captures the interferometric patterns. By recording the difference in surface structure between normal pressure and negative pressure, the change in the tire surface is made visible (see Figure 1).

2.2. Mode Hop as a Disturbance in Shearographic Inspection

The effect of Mode Hops describes abrupt changes between different longitudinal laser modes, which can lead to sudden phase shifts in the interferometric patterns. Such unforeseen mode changes make it difficult to analyze the measurement results. Mode hopping is often induced by internal and external factors, such as temperature drift or mechanical vibrations, leading to rapid changes in the dominant laser mode [1]. Previous work on specific Mode Hop detection is sparse and often integrated into studies to improve laser mode stability, using statistical and machine learning approaches to identify and correct for these abrupt phase changes [8,9,10,11,12,13].
While existing systems and studies investigate methods for the detection and stabilization of Mode Hops in laser systems, there is a lack of approaches for the direct detection of Mode Hops in shearography image data, especially in the context of tire testing. Earlier tire-related research using shearography and deep learning primarily focused on defect detection and quality assessment of the tire carcass. For instance, Wang et al. [4] and Chang et al. [5] developed CNN-based models for shearographic defect classification and visual quality analysis, while Li et al. [6] proposed hybrid training strategies to improve defect characterization in digital shearography. Wu et al. [14] further advanced segmentation-based defect quantification using U-Net architectures. However, none of these studies addresses Mode Hops as optical interference artefacts, which differ fundamentally from structural defects and represent a distinct source of uncertainty in shearographic inspection. The present work, therefore, focuses on the automated identification of Mode Hops in tire inspection image data, aiming to derive specific measures for improving process robustness and inspection reliability.

2.3. Deep Learning in Non-Destructive Testing

The integration of AI-supported image processing methods has led to significant progress in the field of non-destructive testing in recent years. Modern approaches are often based on deep learning methods such as convolutional neural networks (CNNs) [15] and transfer learning, in which pre-trained models are fine-tuned to domain-specific datasets as a powerful feature extractor [4,6]. These methods make it possible to detect material defects and structural weaknesses in image data from non-destructive testing [5,14]. In tire testing, CNNs have been applied successfully for defect localization and quality grading in shearographic and surface images [5,14]. These methods enable reliable detection of structural defects such as separations, blisters, and voids, but they do not address non-defect artefacts such as Mode Hops that can impair defect interpretation. In this regard, the current study extends previous tire-related deep learning research by explicitly focusing on optical interference artefacts that arise during laser-based image acquisition. In addition, classic image processing methods are combined with modern machine learning algorithms to identify defects automatically and reproducibly, which enables a high degree of automation and efficiency gains, especially in industrial applications [4,5]. With Gradient-weighted Class Activation Mapping (Grad-CAM) [16], the image areas that contribute to a particular prediction can be visually highlighted. This method enhances the transparency and traceability of decisions, which is particularly beneficial in safety-critical NDT applications. To the authors’ knowledge, no existing study has applied Grad-CAM or other explainable AI methods to shearography-based tire testing to verify model performance. Hence, the automatic detection and explainable localization of Mode Hops in shearographic tire testing remains unexplored.

3. Methodology

3.1. Measuring Setup and Testing Machines

To obtain shearography images with Mode Hops, two ITT1-DD shearography test machines (SDS Systemtechnik GmbH, Calw, Germany) are used. Each machine is equipped with a measurement head comprising a high-resolution camera (Allied Vision Manta G-235, Allied Vision Technologies GmbH, Stadtroda, Germany) and an image processing unit that delivers 8-bit grayscale images at a resolution of 768 × 575 pixels. The shear angle in the interferometry system is 25°. The object surface is illuminated by four monochromatic laser diodes arranged in a square configuration, producing coherent light (see Figure 2).
These machines are installed in a tire retreading facility and are operated both in the incoming goods inspection and the final product quality control. The image data for the Mode Hop dataset consist of shearographic tire inspection images generated under varying conditions using these two independent systems. Due to the differences in hardware aging, maintenance status, and environmental influences, the resulting data capture a broad and representative spectrum of image characteristics typical for industrial testing environments.
The variability in the taken images originates from several sources, including the following:
  • Age-related system characteristics: Progressive soiling of camera lenses and laser diodes, as well as aging of optical components and machine mechanics.
  • Machine environmental influences: Temperature fluctuations, vibrations or shocks, and dust and dirt particles that can affect image quality.
  • Tire artifacts: Fluid residues, dirt, stickers, deposits of ice or snow, and stress markers due to mechanical stress.
  • Tire findings: Blisters, separations, wear marks, repair patches, and other damage.
In the present paper, it is assumed that illumination-related variations, including changes in surface reflection due to tire age or contamination, are sufficiently covered by the 269,809 inspected tires included in the datasets. Furthermore, both shearography machines are mechanically isolated by vibration dampers; at the same time, it is assumed that operational vibrations occurring during industrial use are also reflected in the acquired data. Given the large number of images collected over several months and across two independent machines, the dataset can be regarded as representative for the inspection scenario in the context of the investigated systems.

3.2. Dataset Description

Based on the described reference machines and the recording period as well as their environmental influences, a comprehensive and representative dataset was collected under real operating conditions. In the following section, the structure, quality assurance procedures, and properties of the cleaned image data used for model training are presented in detail.
The Mode Hop labeling was carried out by experienced shearography experts to ensure data quality for subsequent training and evaluation. A more detailed description of the labeling procedure is provided further below.
The data were cleaned up in several steps to ensure high quality and representativeness for training the machine learning models. This included separation according to test areas (crown and sidewall) and the selection of pure shearography images. The images are available as TIFF stacks and, in addition to the grayscale shearography images, also contain the corresponding infrared images for additional information gathering.
In addition, a pre-trained ResNet-50 classifier (ImageNet weights) was fine-tuned on a manually curated subset of images to identify and remove unusable data. This included calibration plate recordings and plain grayscale images from failed acquisitions. Through this process, corrupted or irrelevant data were excluded from the training and evaluation datasets.
Some of the sources of error that arose during data collection include inspection and calibration plates used during maintenance and rare camera system malfunctions. These unusable data were identified and removed using a combination of rule-based filters and machine learning classifiers. We refer to the cleaned dataset of testing machine 1 as dataset T1, and analogously, the dataset of testing machine 2 as dataset T2.
Table 1 lists the data acquired by machines during field deployment.
The dataset T1 (except 70 recordings) contains the categorization of the tire conditions according to the criteria “good”, “critical”, and “bad”. This evaluation assesses the condition of the tire carcass and is based on the worst sector. In this dataset, one test contains 24 sectors (8 sectors each for the crown, the upper sidewall, and the lower sidewall). The assessment was carried out by a quality assurance machine operator from the company from which the data originated.
A total of 77.95% of the tires do not contain a “critical” or “bad” sector (see Table 2). A visual inspection of the data showed that the tires with the rating “good” are similar to new tires and are, therefore, comparable with them. Therefore, “good” class data can be used to represent new tires, and no additional testing machine is required for the new tire segment.
Similar to dataset T1, dataset T2 contains 24 sectors for each test (8 sectors each for the crown, the upper sidewall, and the lower sidewall), but no categorization of the tire conditions into “good”, “critical” or “bad”, since this labeling was not provided by the machine operator or the company from which the data originated.
It comprises a total of 5,778,459 images taken over a period of around 3.5 years and, thus, represents a comprehensive and representative database.
Table 3 describes and defines the characteristic features and variance of Mode Hops.
Each row displays representative examples where the described artifact appears within a specific quadrant of the image, highlighting the spatial variability and aiding in visual differentiation.
The Mode Hop defect pattern is manifested by abrupt changes in the spectral and phase behavior of the light in the affected quadrants. In the grayscale images of the shearography-based tire inspection, these defect patterns manifest themselves as noise, sudden intensity changes, shifted or fragmented interference rings, and sharp artifact boundaries that could be misinterpreted as material findings and, thus, distort the results or invalidate the inspection images.
As part of the investigation of the characteristic features and variance of Mode Hops, 164,145 images labeled as “critical” or “bad” from dataset T1 were manually inspected. All Mode Hops were identified during this process. The labeling procedure was performed by several experienced shearography experts. Each expert sequentially inspected the dataset that had already been reviewed by the previous expert, so that the annotations were iteratively refined. Through this stepwise process, the dataset was consolidated and continuously improved in quality. This ensured a robust labeling outcome, although no formal inter-rater reliability study with independent double annotations was conducted. The relative occurrence of Mode Hops was found to be 0.04% in crown recordings and 0.04% in sidewall recordings. This proportion statistically reflects the probability of encountering Mode Hops in a part of dataset T1 and on field testing machine 1, and is, therefore, used as a reference value.
Since the datasets T1 and T2 are not annotated, a classifier was trained on a subset of the manually inspected 164,145 images from T1 to classify all images from datasets T1 and T2. The setup and the results can be seen in Experiment 1 (Section 4.1).
The trained model reaches an accuracy of 99.67%. The number of Mode Hops found in datasets T1 and T2 can be seen in Table 4.
All images classified as containing Mode Hops were manually checked to verify the results. The complete dataset T1, consisting of “critical”, “bad”, and “good” rated images, has a Mode Hop rate of 0.13% in the crown recordings and 0.12% in the sidewall recordings. The dataset T2 has a Mode Hop rate of 0.2% in the crown recordings and 0.18% in the sidewall recordings.

3.3. Model Architecture: ResNet-50 CNN Implemented in PyTorch

The entire model development and implementation was performed in Python 3.12 using the PyTorch 2.5.1 framework [17]. A CNN based on the ResNet-50 architecture [18] was used for the image classification task. ResNet-50 is a deep neural network with 50 layers that addresses the degradation problem in deep networks by introducing residual connections. These residual connections allow information to be passed directly to later layers, which facilitates the training of deep networks. In our implementation, we used a pre-trained ResNet-50 model on the ImageNet-1K dataset [19], whose last fully connected layer was adapted to two outputs for the classification task.

3.4. Rationale for Backbone Selection

ResNet-50 was selected as it represents a state-of-the-art backbone architecture that has been evaluated in numerous scientific publications as well as in industrial and medical applications [20,21]. Compared to earlier networks such as VGG, AlexNet, or Inception, ResNet-50 provides higher accuracy with fewer parameters and enables stable training of very deep models through its residual connections [18]. Due to extensive ImageNet pre-training [19], the network converges faster and provides robust, transferable feature representations, making it particularly suitable for Mode Hop detection in shearography-based tire inspection. Today, ResNet variants form the foundation of many modern approaches in industrial image processing, including defect detection, medical image analysis, and visual quality inspection [22].

3.5. Training Strategy and Hyperparameter Optimization

For the optimization of the hyperparameters, we used Optuna [23], a modern framework for automated, efficient search for optimal model parameters. Optuna enables targeted fine-tuning of the learning rate, batch size, regularization, and other hyperparameters using a define-by-run structure, allowing flexible and dynamic optimization strategies to be implemented.
All models were trained and evaluated on an NVIDIA RTX 4090 GPU (ASUS, Taipei, Taiwan) with CUDA 13.0.

4. Experiments and Results

Before presenting the detailed setup and results, it is important to outline the purpose of the first two experiments conducted in the present paper. Experiment 1 was designed as an initial proof-of-concept to verify that a deep learning model can reliably detect Mode Hops in shearography images. Due to the lack of full annotations in the large datasets T1 and T2, a smaller, manually curated subset (dataset V1) was used for training and testing. The aim was to establish a baseline model, identify recurring patterns in false-positives, and use these findings to guide the creation of a more robust dataset. Experiment 2 built upon the findings of Experiment 1 by expanding the dataset with additional true Mode Hop samples as well as typical false-positive cases identified in the first step. Hyperparameter optimization was performed using Optuna to refine the model configuration. The purpose of this stage was to achieve higher robustness and generalization capability across different machines and operating conditions, while significantly reducing false positives.

4.1. Experiment 1: Initial Training and Mode Hop Identification

This first experiment aims to perform an initial training with a subset of dataset T1 (dataset V1, see Table 5) to find all Mode Hops in the datasets T1 and T2 (see Table 4), since both datasets are too large to annotate manually. Furthermore, the goal is to identify potential false-positive images and their characteristic features to incorporate the findings into the final training and test dataset (later summarized in the last table of Section 4.1.2).
In this experiment, 154 crown recordings and 154 sidewall recordings with Mode Hops were manually selected from dataset T1. Due to the small dataset, augmentation in the form of vertical, horizontal, and combined mirroring was applied, which increased the amount of data by a factor of four without introducing new variants into the system.
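The four-fold mirroring augmentation can be sketched with NumPy (`mirror_augment` is an illustrative helper name, not from the original implementation):

```python
import numpy as np

def mirror_augment(img: np.ndarray) -> list:
    """Return the original image plus its horizontal, vertical, and
    combined mirror images: a four-fold expansion that introduces no
    new variants, matching the augmentation described above."""
    return [
        img,
        img[:, ::-1],     # horizontal mirror
        img[::-1, :],     # vertical mirror
        img[::-1, ::-1],  # combined mirror
    ]

sample = np.arange(12).reshape(3, 4)
augmented = mirror_augment(sample)
```

Note that mirroring moves a corner-quadrant Mode Hop into another corner quadrant but never out of the corners, so the corner-location property of the class is preserved.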
It should be noted that the Mode Hop is always located in one of the four corner quadrants of the image, since the inspection process is a two-sided 360° inspection process. The images without Mode Hop in the sidewall and crown area were randomly selected manually from the dataset T1.

4.1.1. Training Dataset Overview

Table 5 summarizes the augmented and original Mode Hop images in the training dataset V1.

4.1.2. Training Setup

The training and model parameters used in this experiment are detailed in Table 6. The corresponding test dataset is summarized in Table 7.

4.1.3. Test Dataset and Results

To test the model, we selected additional images from dataset T1 (see Table 7). With this first model, a sensitivity of 100% was achieved, meaning that all Mode Hops present in the test set were correctly detected. In total, 12,761 out of 738,429 images in dataset T1 were classified as containing Mode Hops. This corresponds to a positive prediction rate (true and false positives) of 1.73%. An analysis of the false-positive images revealed specific and recurring features among the misclassifications (see Table 8).

4.2. Experiment 2: Data Expansion and Model Robustness

4.2.1. Objective and Strategy

The goal of this experiment is to refine the data basis and develop a more robust and generalizable model for Mode Hop detection. The procedure involves expanding dataset V1 with Mode Hops and false positives from Experiment 1 and subsequently optimizing the training and model hyperparameters.

4.2.2. Training and Test Dataset Overview

In the first step, the trained model from Experiment 1 was used to identify Mode Hops and false positives in the second dataset T2.
A total of 1072 of the identified Mode Hops were added to the training set and 7868 to the test set of dataset V1, together with additional images without Mode Hops, resulting in the new training and test dataset V2 (see Table 9 and Table 10).

4.2.3. Hyperparameter Settings

To fine-tune the model performance, the Python library Optuna was used for hyperparameter optimization. The tree-structured Parzen estimator served as the optimization algorithm [24]. It is a Bayesian optimization approach that selects promising hyperparameter configurations based on probability models. The following parameter ranges were explored: batch size [8, 16, 32, 64], learning rate [1 × 10⁻⁵, 1 × 10⁻²] (log-uniform), weight decay [1 × 10⁻⁵, 1 × 10⁻²] (log-uniform), and optimizer [‘Adam’, ‘SGD’].
The optimization was achieved by defining an objective function that minimizes the validation loss. Optuna performed a predefined number of trials, testing a different combination of hyperparameters in each trial to achieve the best possible performance.
Fifty trials were conducted as part of the optimization. The best trial achieved a minimum validation loss of approximately 0.088 and resulted in the best hyperparameter setting displayed in Table 11.

4.2.4. Training and Results

The model was trained using the final hyperparameters determined via Optuna optimization. To ensure consistent evaluation, a fixed data split was defined with 80% of the data used for training and 20% for validation. To increase data variance, augmentation with random rotation of ±10° (seed = 42) was applied. Additionally, early stopping [25] with a patience of 5 epochs and a minimum improvement delta of 0.001 was implemented to avoid early overfitting. The left-hand plot in Figure 3 shows the evolution of training and validation loss over 15 epochs. The training loss continuously decreases, indicating an improved model fit. In contrast, the validation loss begins to fluctuate from epoch 6 onwards and does not show significant improvement thereafter. From around epoch 10, a clear divergence between training and validation loss becomes apparent. Similarly, the right-hand plot in Figure 3 shows that training accuracy quickly rises to values above 98%, while validation accuracy stabilizes around 97–97.5% with slight variations. These trends suggest overfitting tendencies, as the model continues to improve on training data while generalization to validation data stagnates. However, it is important to note that the validation set is relatively small and may not adequately reflect the underlying data distribution. As such, the observed fluctuations and the apparent onset of overfitting may be misleading.
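The early-stopping criterion (patience of 5 epochs, minimum improvement delta of 0.001) can be sketched as a small framework-independent helper; the class name and the loss curve below are illustrative:

```python
class EarlyStopping:
    """Stop training when the validation loss fails to improve by at least
    `min_delta` for `patience` consecutive epochs (values from the text)."""

    def __init__(self, patience: int = 5, min_delta: float = 0.001):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.counter = 0

    def step(self, val_loss: float) -> bool:
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience  # True -> stop training

# Illustrative validation-loss curve: improvement stalls after epoch 2.
stopper = EarlyStopping()
losses = [0.50, 0.40, 0.39, 0.399, 0.398, 0.40, 0.41, 0.40, 0.42]
stop_epoch = None
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        stop_epoch = epoch  # patience exhausted after 5 stagnant epochs
        break
```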
In contrast, the results obtained on the larger and more representative test dataset V2 (see Table 10) are consistently excellent and confirm the high generalization performance of the model. This is also reflected in the confusion matrix shown in Figure 4, which demonstrates a very low number of false positives and false negatives alongside an overall accuracy of 99.67%.
This suggests that the early signs of overfitting observed during validation are more likely attributable to the limited scope of the validation set rather than to a fundamental overfitting problem.
In summary, the model is very well adapted to the training data and achieves a high level of validation accuracy from the sixth epoch onwards. The overall validation accuracy of approximately 97% shows that the model is well suited for the reliable detection of Mode Hops (see Figure 3).
The confusion matrix in Figure 4 presents the true class labels by row and the predicted labels by column, following the convention described in [26]. A total of 15,080 true negatives and 7845 true positives were recorded, alongside 37 false positives and 38 false negatives. The model, thus, achieves an accuracy of 99.67%, a false-positive rate of 0.24% (37 images out of 15,117), and a false-negative rate of 0.48% (38 images out of 7883).
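These rates follow directly from the confusion matrix counts; the short check below reproduces them:

```python
# Confusion-matrix counts reported in Figure 4.
tn, tp, fp, fn = 15080, 7845, 37, 38

accuracy = (tp + tn) / (tp + tn + fp + fn)  # 22,925 / 23,000
fpr = fp / (fp + tn)                        # 37 / 15,117
fnr = fn / (fn + tp)                        # 38 / 7,883

print(f"accuracy {accuracy:.2%}, FPR {fpr:.2%}, FNR {fnr:.2%}")
# accuracy 99.67%, FPR 0.24%, FNR 0.48%
```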
These results confirm that the model optimized by Optuna is not only able to distinguish the classes very precisely, but also has very few misclassifications. Also, they demonstrate the outstanding performance and generalization capability of the model.

4.3. Experiment 3: Explainability and Localization Using Grad-CAM

4.3.1. Objective and Setup

The experiment aims to perform a plausibility check of the recognized Mode Hop feature and simultaneously identify the affected image quadrant. For this purpose, a ResNet-50 network trained on Mode Hop detection is used as a basis, whose final fully connected layer was adapted for a binary classification (two classes). The Grad-CAM technique is applied to determine in which image quadrant, i.e., in which corner of the image, the Mode Hop object is located and whether the network specifically identifies the characteristic features of the Mode Hop.
At the beginning, each image is pre-processed. The input image is first made square using the ZeroPaddingToSquare class, which adds black padding equally on both sides of the shorter image dimension, and is then scaled to a size of 224 × 224 pixels. Finally, the images are converted into tensor format and normalized to ensure compatibility with the subsequent network analysis, i.e., the ResNet-50 model and the Grad-CAM analysis.
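A minimal PyTorch sketch of this preprocessing, assuming channel-first tensors and placeholder normalization statistics (the exact mean/std values used in the paper are not specified here):

```python
import torch
import torch.nn.functional as F

class ZeroPaddingToSquare:
    """Pad the shorter image dimension equally with black (zeros) so the
    image becomes square, as described in the text."""

    def __call__(self, img: torch.Tensor) -> torch.Tensor:
        _, h, w = img.shape                  # (C, H, W)
        diff = abs(h - w)
        a, b = diff // 2, diff - diff // 2
        if h < w:
            return F.pad(img, (0, 0, a, b))  # pad top/bottom
        return F.pad(img, (a, b, 0, 0))      # pad left/right

# 768 x 575 pixel shearography frame as a channel-first tensor.
img = torch.rand(1, 575, 768)
square = ZeroPaddingToSquare()(img)          # -> (1, 768, 768)
resized = F.interpolate(square.unsqueeze(0), size=(224, 224),
                        mode="bilinear", align_corners=False).squeeze(0)
normalized = (resized - 0.5) / 0.5           # placeholder mean/std
```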
During the forward pass, the activations of the target layer are stored; during the backward pass, the gradient of the predicted class score with respect to these activations is recorded. The channel-wise importance weights are obtained by applying global average pooling to the gradients, and the final Grad-CAM heatmap is computed as the channel-wise weighted sum of the activation maps.
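A minimal sketch following the standard Grad-CAM formulation, using forward/backward hooks; the tiny stand-in network at the end replaces the actual ResNet-50 only to keep the example self-contained:

```python
import torch
import torch.nn as nn

class GradCAM:
    """Minimal Grad-CAM: store the target layer's activations on the
    forward pass and its gradients on the backward pass, then weight the
    activation maps by their globally averaged gradients."""

    def __init__(self, model: nn.Module, target_layer: nn.Module):
        self.model = model
        self.acts = None
        self.grads = None
        target_layer.register_forward_hook(self._save_acts)
        target_layer.register_full_backward_hook(self._save_grads)

    def _save_acts(self, module, inputs, output):
        self.acts = output.detach()

    def _save_grads(self, module, grad_input, grad_output):
        self.grads = grad_output[0].detach()

    def __call__(self, x: torch.Tensor, class_idx: int) -> torch.Tensor:
        self.model.zero_grad()
        score = self.model(x)[0, class_idx]   # predicted class score
        score.backward()
        # Channel-wise weights: global average pooling over the gradients.
        weights = self.grads.mean(dim=(2, 3), keepdim=True)
        # Weighted sum of activation maps, rectified and normalized to [0, 1].
        cam = torch.relu((weights * self.acts).sum(dim=1)).squeeze(0)
        return cam / (cam.max() + 1e-8)

# Tiny stand-in network; the paper applies this to ResNet-50.
net = nn.Sequential(
    nn.Conv2d(1, 4, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(4, 2),
)
cam = GradCAM(net, net[0])(torch.rand(1, 1, 16, 16), class_idx=1)
```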
This heatmap highlights the image regions that are most important for classification, using a color scale ranging from green (low relevance) to red (high relevance), superimposed semi-transparently on the original image (see Figure 5).
The generated heat map is normalized and binarized using a threshold of 0.5 to obtain a binary mask of the most relevant regions. This value effectively suppresses background noise while preserving salient features that contribute to the model’s decision. Using OpenCV, contours are extracted from this mask. For each detected region, a bounding box is computed via cv2.boundingRect. For visualization purposes (see Figure 6), this bounding box is extended to the nearest vertical and horizontal image edges. However, the quadrant determination itself is based solely on the geometric center of the original activation region extracted from the Grad-CAM heatmap. This center point is compared to the four outer corners of the image, and the quadrant corresponding to the corner with the shortest Euclidean distance is assigned (top left, top right, bottom left, or bottom right).
In summary, the experiment aims to enable not only the classification of the Mode Hop feature, but also its spatial localization in the image by combining deep learning, visual explanation using Grad-CAM, and geometric analysis. The extension of the determined activation region to the edges of the image allows the relevant quadrant to be determined automatically.

4.3.2. Results and Interpretation

The experiment shows that the relevant image areas that contribute to the classification of the Mode Hop feature are reliably highlighted using Grad-CAM. This not only increases the traceability of the AI decision, but also makes it possible to localize the feature in the image space. The identified regions are visually marked by bounding boxes, and the affected image quadrant is automatically determined and displayed. Although no quantitative localization metric such as intersection-over-union (IoU) with expert-annotated ground-truth masks was applied, this is not directly applicable here since our approach focuses on quadrant-level determination rather than pixel-accurate segmentation. Instead, a qualitative assessment was conducted: Three independent shearography experts reviewed 7845 true-positive Mode Hop cases from dataset V2 with the corresponding Grad-CAM heatmaps. They confirmed that the quadrant assignments consistently matched the actual Mode Hop locations and did not reveal any systematic misinterpretations. This expert validation provides strong confidence in the plausibility of the Grad-CAM-based quadrant localization, even though a formal pixel-level evaluation remains open for future work. This localization offers several advantages: First, it is possible to recognize not only whether a Mode Hop is present, but also where it is located in the image. Second, the visual transparency improves the analysis of false-positive decisions, which helps to optimize the model and assess the relevance of such cases for the downstream process. Overall, the method makes an important contribution to explainable AI in Mode Hop recognition and increases the applicability of the model in practical scenarios.

4.4. Experiment 4: Statistical Validation of False Positives

4.4.1. Objective and Methodology

The objective of this experiment is the reduction of false-positive detections in automated Mode Hop classification through probability-based plausibility analysis and spatial correlation across image quadrants.
To suppress false-positive results, a probabilistic decision rule is integrated into the inspection process. The core idea is based on the statistical unlikelihood of two independent Mode Hops occurring in the same quadrant in consecutive inspections of the same tire sector. If such a repeated detection occurs, the event is classified as a true positive with high confidence. Conversely, if the second inspection does not reproduce the Mode Hop in the same quadrant, the event is classified as a false positive and excluded from further processing.
Process overview:
  • Initial detection: The AI model flags a Mode Hop in a given image quadrant during the first inspection run.
  • Targeted re-inspection: The same tire sector is immediately re-inspected under identical conditions.
  • Probability check: The second detection is compared with the first; only if the Mode Hop is present in the same quadrant is it confirmed as a true positive.
  • Decision:
    • Confirmed → classification as a valid Mode Hop.
    • Not confirmed → classification as false-positive, discard.
Formal decision rule:
$$\text{Decision} = \begin{cases} \text{True Positive}, & \text{if } P(M_1 \cap M_2) \geq \tau \\ \text{False Positive}, & \text{if } P(M_1 \cap M_2) < \tau \end{cases}$$
with
  • $P(M_1)$: Probability of occurrence of a Mode Hop in the first test.
  • $P(M_2 \mid M_1)$: Probability of a true Mode Hop occurring again in the same quadrant.
  • $P(M_1 \cap M_2)$: Probability that both events occur consistently.
  • $\tau$: Decision threshold.
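Expressed as code, the decision rule reduces to a comparison of the two inspection outcomes. This is an illustrative sketch with hypothetical quadrant labels:

```python
from typing import Optional

def quadrant_repeat_decision(first_quadrant: str,
                             second_quadrant: Optional[str]) -> str:
    """Quadrant-repeat decision rule: a detection is confirmed only if
    the re-inspection reproduces a Mode Hop in the same quadrant."""
    if second_quadrant is not None and second_quadrant == first_quadrant:
        return "true positive"      # repeated detection in the same quadrant
    return "false positive"         # not reproduced -> discard

print(quadrant_repeat_decision("top left", "top left"))   # true positive
print(quadrant_repeat_decision("top left", None))         # false positive
```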
Building on the method developed in Experiment 3 for localizing detected Mode Hop areas via Grad-CAM and image quadrant mapping, this fourth experiment investigates to what extent the repetition of tests of the same tire sector is suitable for validating or excluding false-positive results. The aim is to quantify the probability of a Mode Hop occurring repeatedly in the same region and to formalize the automated decision rule for the inspection process introduced above.
The fourth experiment, thus, addresses the problem of false positives in automated Mode Hop detection using deep learning by integrating spatial localization with probabilistic evaluation.
Below, we use the multiplication rule of conditional probability (a direct consequence of Bayes' theorem):
$$P(M_1 \cap M_2) = P(M_1) \cdot P(M_2 \mid M_1)$$
To quantify the frequency of Mode Hops within the dataset under investigation, an overall evaluation was carried out for the crown and sidewall areas as well as for the datasets T1 and T2 (see Table 4).
The probability of occurrence $P(M_1)$ of a Mode Hop is estimated by
$$P(M_1) = \frac{N_{\text{MH}}}{N_{\text{Total}}}$$
with:
  • $N_{\text{MH}}$: Number of recordings with Mode Hops.
  • $N_{\text{Total}}$: Total number of recordings (with and without Mode Hops).
The estimated probabilities based on datasets T1 and T2 for the two test areas are as follows:
  • Crown: $P_{\text{Crown}} = \frac{4155}{4{,}095{,}318} \approx 0.101\,\%$
  • Sidewall: $P_{\text{Sidewall}} = \frac{7650}{2{,}421{,}570} \approx 0.316\,\%$
  • Total: $P_{\text{Total}} = \frac{11{,}805}{6{,}516{,}888} \approx 0.181\,\%$
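These rates can be reproduced directly from the reported counts:

```python
# Occurrence-rate estimates P(M1) = N_MH / N_Total from datasets T1 and T2
rates = {
    "crown":    4_155 / 4_095_318,
    "sidewall": 7_650 / 2_421_570,
    "total":   11_805 / 6_516_888,
}
for area, p in rates.items():
    print(f"P_{area} = {100 * p:.3f} %")
# P_crown = 0.101 %, P_sidewall = 0.316 %, P_total = 0.181 %
```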
The basic data originate from datasets T1 and T2. In particular, it was calculated whether false-positive cases can be systematically identified by repeated occurrence in the same image quadrant. Figure 7 shows the center of gravity distribution of the affected quadrants across the test dataset V2.
The groupings in the quadrants each show the sum of the detected Mode Hops over the T1 and T2 datasets. Note that there are no data points at the top and bottom of the graph, which results from the zero-padding applied during pre-processing. Each point represents the geometric center of a detected Mode Hop; the frequency of identical centers is visualized by a color scale. The Mode Hops are distributed evenly across the quadrants, with a standard deviation of 4.37%. This indicates a uniform occurrence without any recognizable systematic local dependency.
If it is also known in which of the four quadrants the respective Mode Hop occurs, the uncertainty can be reduced further. Assuming an equally distributed probability of occurrence across the four quadrants, the conditional probability for a specific quadrant $Q_i$ is
$$P(MH \mid Q_i) = \frac{P_{\text{Total}}}{4} \approx \frac{0.181\,\%}{4} \approx 0.0453\,\%$$
with
  • $P(MH \mid Q_i)$: Conditional probability of a Mode Hop occurring in a specific quadrant $Q_i$, assuming uniform distribution.
  • $P_{\text{Total}}$: Total probability of a Mode Hop occurring in a single shearography image.
This corresponds to the conditional probability of occurrence of a Mode Hop in a known quadrant, assuming uniform distribution.
The probability that two consecutive Mode Hops occur is the product of the individual probabilities under the assumption of statistical independence, which is reasonable here since each inspection run is performed under identical but separate measurement cycles without causal influence between the occurrences:
$$P_{2 \times MH} = P_{\text{Total}} \cdot P_{\text{Total}} \approx (0.181\,\%)^2 \approx 3.28 \times 10^{-4}\,\%$$
with
  • $P_{2 \times MH}$: Probability that two Mode Hops occur independently in two consecutive shearography images.
The additional condition that both Mode Hops occur in the same quadrant results in a combined probability, the coincidence probability:
$$P_{2 \times MH \text{ in } Q_i} \approx P_{2 \times MH} \cdot \frac{1}{4} \approx 3.28 \times 10^{-4}\,\% \cdot \frac{1}{4} \approx 8.2 \times 10^{-5}\,\%$$
with
  • $P_{2 \times MH \text{ in } Q_i}$: Probability that two Mode Hops occur consecutively in the same quadrant $Q_i$, assuming statistical independence and uniform distribution.
This coincidence probability corresponds to the decision threshold τ .
Expectation:
$$\frac{1}{p} \approx \frac{1}{8.2 \times 10^{-7}} \approx 1{,}219{,}512$$
with
  • $\frac{1}{p}$: Expected number of cases until this event occurs once (mathematical expectation).
A probability of $8.2 \times 10^{-5}\,\%$ corresponds to an occurrence of about 1 in 1,219,512 cases.
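The chain of estimates above can be verified numerically; note that the final expectation is computed from the rounded value $8.2 \times 10^{-7}$, as in the text:

```python
p_total = 11_805 / 6_516_888        # P_Total ≈ 0.181 %
p_two = p_total ** 2                # two independent Mode Hops in a row
p_same_quadrant = p_two / 4         # ... additionally in the same quadrant

print(f"P(2xMH)        = {100 * p_two:.2e} %")           # ≈ 3.28e-04 %
print(f"P(2xMH in Q_i) = {100 * p_same_quadrant:.1e} %") # ≈ 8.2e-05 %

# expected number of inspections until one random coincidence,
# computed from the rounded probability used in the text
print(f"1 / p ≈ {1 / 8.2e-7:,.0f}")                      # ≈ 1,219,512
```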
This emphasizes that, for the decision process, the relevant measure is not the overall probability $P(M_1 \cap M_2)$, but rather the conditional probability $P(M_2 \mid M_1)$. The repeated inspection is only performed when a Mode Hop has already been detected in the first run. At this point, the decision rule evaluates whether the same feature is detected again in the same quadrant. Given the extremely low random coincidence probability ($8.2 \times 10^{-5}\,\%$), a second detection in the same quadrant is accepted as a true positive with high confidence. If no Mode Hop is detected in the repeated measurement, the initial finding is classified as a false positive and discarded.

4.4.2. Probability Calculations and Result

Following this decision rule, false-positive cases from the first inspection are eliminated in the second run unless they are confirmed by repeated detection in the same quadrant. This process either confirms first-run detections as verified true positives or removes them entirely from the set of detections, thereby increasing the overall accuracy:
When applying this verification rule, it is assumed that the 37 false-positive results identified in the first test are successfully filtered out, effectively reducing the number of false positives for the entire inspection process to zero. The resulting accuracy is
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{7845 + 15{,}117}{7845 + 15{,}117 + 0 + 38} \approx 99.83\,\%.$$
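The accuracy figure follows directly from the confusion-matrix counts:

```python
tp, tn, fp, fn = 7_845, 15_117, 0, 38
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"Accuracy = {100 * accuracy:.2f} %")   # 99.83 %
```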

4.4.3. Uncertainty Estimation of the Repeat Probability

Let $k$ denote the number of observed false-positive events on negatives and $N$ the number of inspected negative images, with $\hat{p}_{\text{FP}} = k/N$. A 95% Wilson confidence interval for $p_{\text{FP}}$ is reported. Since a retest is only triggered after a first detection, the relevant quantity is the conditional repeat probability
$$P_{\text{repeat}} = p_{\text{FP}} \cdot s,$$
where $s = \sum_q p_q^2$ denotes the probability that two independent false positives fall into the same quadrant. For uniformly distributed false positives, $s = 0.25$.
For our data ($k = 37$, $N = 15{,}117$), we obtain
$$\hat{p}_{\text{FP}} \approx 0.002448$$
with a 95% Wilson confidence interval of
$$[0.001776,\ 0.003372].$$
Consequently, the conditional repeat probability estimate is
$$\hat{P}_{\text{repeat}} = \tfrac{1}{4} \cdot \hat{p}_{\text{FP}} \approx 6.12 \times 10^{-4},$$
with a 95% confidence interval of
$$[P_{\text{repeat,L}},\ P_{\text{repeat,U}}] \approx [4.44 \times 10^{-4},\ 8.43 \times 10^{-4}]\ (= [0.0444\,\%,\ 0.0843\,\%]).$$
This confirms that the chance of two consecutive false positives occurring in the same quadrant is negligible, supporting the robustness of the quadrant-repeat rule as a statistical safeguard against spurious confirmations.
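The Wilson interval and the derived repeat-probability bounds can be reproduced in a few lines (standard Wilson score formula, $z = 1.96$ for the 95% level):

```python
import math

def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple:
    """95% Wilson score confidence interval for a binomial proportion."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

k, n = 37, 15_117
p_fp = k / n                       # ≈ 0.002448
lo, hi = wilson_interval(k, n)     # ≈ (0.001776, 0.003372)

# conditional repeat probability: a second, independent false positive
# must also land in the same quadrant (factor 1/4)
p_repeat = p_fp / 4                # ≈ 6.12e-4
print(f"p_FP = {p_fp:.6f}, CI = [{lo:.6f}, {hi:.6f}]")
print(f"P_repeat = {p_repeat:.2e}, CI = [{lo/4:.2e}, {hi/4:.2e}]")
# P_repeat ≈ 6.12e-04, CI ≈ [4.44e-04, 8.43e-04]
```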
In addition to accuracy, the inference time per shearography image was measured on an RTX 4090 GPU and determined to be approximately 12–13 milliseconds at 40% utilization. This corresponds to a throughput of more than 80 images per second, which meets the requirements for real-time integration into the tire inspection workflow. Hence, the proposed method is not only highly accurate but also computationally efficient, making it suitable for industrial deployment.

5. Discussion

The present work shows that the automated detection of Mode Hops in shearography-based tire inspection using deep learning is not only technically feasible, but can also be carried out with high accuracy. In particular, the ResNet-50 model, pre-trained on the ImageNet dataset, optimized with Optuna, and combined with the explainable AI technique Grad-CAM, achieves a highly robust classification result with an accuracy of 99.67% and reliably determines the affected quadrant. The visualization using Grad-CAM also enables transparent traceability of the classification decision, creating a valid basis for industrial applications and possible certification procedures. The greatest strengths of the developed system lie in its high generalization capability across two independent testing machines and a long-term stable dataset, and in the clear reduction of the false-positive rate through plausibility checks across image quadrants. The integration of deep learning with statistical probability analysis (Experiment 4) offers, for the first time, a method for plausible confirmation or refutation of an initial Mode Hop result, marking a decisive step towards automating the inspection decision. However, challenges remain. The variance of Mode Hop manifestations continues to be a limiting factor: even though the dataset has been expanded through augmentation and transfer learning (using pre-trained weights that incorporate another dataset), individual, rare manifestations are not fully represented. In addition, environmental factors such as temperature drift, the aging of optical components, and mechanical vibrations influence image quality and can, therefore, indirectly affect detection accuracy. To further reduce residual false-positive detections, a probability-based mechanism was developed, referred to as the quadrant-repeat decision rule.
This rule defines that a Mode Hop detection is only accepted if the same image quadrant is repeatedly activated across consecutive image acquisitions of the same tire region. A central argument for safeguarding the AI system and its results is the probabilistic validation of false-positive findings using the quadrant-repeat decision rule introduced in Experiment 4. The probability of a random double occurrence of a Mode Hop in the same image quadrant is estimated at $8.2 \times 10^{-5}\,\%$ (see Section 4.4.1), which is statistically negligible and practically equivalent to ruling out random coincidences. By applying this probability-based quadrant-repeat decision rule, false positives are effectively eliminated at the process level, reducing the estimated probability of repetition to ≤0.084%. This mechanism increases the overall inspection accuracy to 99.83%, representing a significant improvement compared to previous purely visual or experience-based evaluation methods. The study addresses many systemic sources of error, but does not exclude all of them. For example, the influence of the mechanical stability of individual machine components over time was only observed, not measured. Similarly, microclimatic differences in the measuring chamber or non-linear effects in the laser system could lead to further artifacts whose influence has not yet been causally analyzed. Further research is needed here, particularly with regard to a robust system design that requires little maintenance over many years.
Several limitations of the present paper should be noted.
  • The Mode Hop labeling was performed by several experienced shearography experts, who sequentially inspected the dataset. Their annotations were subsequently compared and consolidated. However, a formal inter-rater reliability study, in which multiple experts independently label the same dataset for statistical agreement analysis, was not conducted. This may introduce subjectivity and remains an open issue for future work.
  • The validation was restricted to two shearography systems. While two independent machines were included, no third-party external dataset was available. We, therefore, added a leave-one-machine-out validation to mimic an external shift; nevertheless, a true external validation on different hardware and environments remains for future work.
  • The applied data augmentation was limited to mirroring and small rotations. While this preserved realistic shearographic patterns and was sufficient for the present proof of concept, it did not extend beyond the naturally occurring variations already present in the dataset, such as intensity- or contrast-related changes due to laser drift, contamination, reflections, or operational vibrations. Because only two machines were used, these effects are represented but not in the full range of possible industrial conditions.
  • While Grad-CAM provided plausible quadrant-level localization of Mode Hops, no quantitative pixel-level metric such as IoU was applied, as this is not directly applicable in our setup. The method focuses on quadrant determination rather than precise segmentation, and detailed annotation masks are not available in the dataset. The absence of such annotations remains a limitation for future work.
  • The probabilistic validation in Experiment 4 assumed independence and did not include an explicit uncertainty interval. Although the variance is negligible in large datasets, this remains a limitation for smaller datasets.
  • Indications of overfitting were observed in Experiment 2 after epoch 6, likely due to the small validation set size. While test results confirmed high generalization, the limited validation data constrain interpretability.
  • Finally, aspects of industrial implementation such as long-term machine stability, certification requirements, and operator acceptance were not addressed in this work. These factors are critical for practical deployment and should be investigated in future research.
The following questions in particular arise for future work:
  • To what extent can an online learning process adaptively adjust the model to new machine environments?
  • What requirements must an explainable AI system meet in order to be recognized as a certifiable system in non-destructive testing?

6. Summary and Outlook

The method for Mode Hop detection developed in this work represents a significant extension of existing test procedures. For the first time, a deep learning workflow was introduced that detects, localizes, and explains Mode Hops with high precision. The combination of ResNet-50, Grad-CAM, and probability analysis raises the state of the art to a new level and provides a solid basis for the automation of tire testing.
The main contributions of this work can be summarized as follows:
  • Development of a robust classification model for Mode Hops with 99.67% accuracy.
  • Visual verification using Grad-CAM for explainable decisions.
  • Introduction of a statistically validated method for the verification and elimination of false-positive findings.
  • Establishment of one of the most comprehensive datasets in shearography-based tire inspection to date.
The results of this work enable a significant increase in inspection process reliability and efficiency. Automated detection and localization of Mode Hops can minimize wrong decisions, shorten inspection times, and reduce manual visual inspections. In particular, this opens up the possibility of fully automating the shearography-supported tire inspection process.
From an industrial perspective, the developed approach can be integrated into existing shearography-supported tire testing machines and forms the basis for a patented process of automated, AI-supported Mode Hop detection. The combination of AI, explainable analysis, and probability testing represents a unique selling point and fulfills essential requirements for future certifiable testing procedures in safety-critical industries.
By integrating the probability-based quadrant-repeat decision rule, the workflow effectively suppresses process-level false positives (estimated probability of repetition ≤0.084%), thereby increasing the overall inspection accuracy to 99.83%. This statistically validated improvement highlights the practical relevance of the proposed approach for reliable, certifiable AI-assisted tire inspection.
Beyond tire inspection, the proposed workflow illustrates how explainable AI can be embedded into non-destructive testing and advanced manufacturing, supporting real-time anomaly detection and transparent decision-making. This transferability highlights the potential impact of XAI not only in tire inspection but also across broader domains of quality assurance and industrial diagnostics.
A promising approach for future research is the detailed analysis of false-negative images. Such an examination could help to identify previously unrecognized visual patterns and failure modes of the model. The resulting insights can be used to selectively expand and diversify the training dataset, ensuring that such cases are better represented and improving the robustness of future inspections.
Future work should also extend the validation to additional shearography systems and operating conditions in order to further verify the robustness and generalization capability of the trained models. We plan, first, train-on-site-A/test-on-site-B protocols; second, periodic re-validation to capture temporal drift; and third, standardized reporting across subgroups to quantify domain robustness. True external validation on independent machines in partner plants will be a crucial next step.
Finally, the approach should be contextualized within broader XAI applications in non-destructive testing and advanced manufacturing. Beyond tire inspection, the developed workflow can serve as a transferable template for explainable AI in real-time quality control, where methods such as Grad-CAM are increasingly used for anomaly detection and process diagnostics. Embedding explainable AI into industrial workflows not only increases trust and acceptance but also supports the path toward future certifiable AI systems in safety-critical industries.
The detection of Mode Hops is not just a technical detail in image analysis. It is a critical building block for the full automation of tire testing. This work has created a scientifically sound, practically feasible, and technologically innovative solution that forms the basis for an industrially relevant patent.

Author Contributions

Conceptualization, M.F.; methodology, M.F.; software, M.F.; validation, M.F., M.M. and K.S.; formal analysis, M.F.; investigation, M.F.; resources, M.F.; data curation, M.F.; writing—original draft preparation, M.F.; writing—review and editing, M.M. and K.S.; visualization, M.F.; supervision, M.M. and K.S.; project administration, K.S.; funding acquisition, K.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by SDS Systemtechnik GmbH.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from SDS Systemtechnik GmbH and are available from the authors with the permission of SDS Systemtechnik GmbH.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. The figure illustrates the principle of shearography used in tire inspection. The shearography camera and its laser diodes illuminate the tire surface to record phase variations under different pressure conditions. The blue line represents the tire surface under normal pressure, while the red line shows the deformation under negative pressure. Two visible deformations can be observed, corresponding to defects within the tire carcass. The arrows indicate the deformation of defects between the normal-pressure and negative-pressure states.
Figure 2. The figure illustrates the camera system and laser configuration used in the shearography-based tire testing setup. On the left, the hardware arrangement of the camera unit is shown, including four laser diodes positioned in the corners around the camera lens. On the right, the illumination pattern of the tire surface produced by the laser diodes is depicted, together with the image area captured by the camera.
Figure 3. Training and validation metrics in Experiment 2: performance trends of the model in Experiment 2, indicating convergence and overfitting after epoch 6.
Figure 4. Confusion matrix for Experiment 2: confusion matrix of the final model based on test dataset V2, showing significantly reduced false positives and false negatives.
Figure 4. Confusion matrix for Experiment 2: confusion matrix of the final model based on test dataset V2, showing significantly reduced false positives and false negatives.
Ai 06 00275 g004
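The rates reported for the final model follow directly from a confusion matrix such as the one in Figure 4. A small sketch of the computation; the counts below are illustrative values chosen to be consistent with the test-dataset sizes in Table 10 and the reported rates, not the paper's actual cell values:

```python
def classification_metrics(tp, fn, fp, tn):
    """Derive accuracy, FNR, and FPR from raw confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    fnr = fn / (tp + fn)  # missed Mode Hops among all true Mode Hop images
    fpr = fp / (fp + tn)  # false alarms among all Mode-Hop-free images
    return accuracy, fnr, fpr

# Illustrative counts only: 7,883 Mode Hop and 15,117 Mode-Hop-free test images.
acc, fnr, fpr = classification_metrics(tp=7845, fn=38, fp=37, tn=15080)
print(f"accuracy={acc:.2%}, FNR={fnr:.2%}, FPR={fpr:.2%}")
# accuracy=99.67%, FNR=0.48%, FPR=0.24%
```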
Figure 5. Grad-CAM heatmap of Mode Hop localization (Experiment 3): visualization of network attention used for Mode Hop classification and spatial localization. (a) Original input image. (b) Image with heatmap overlay.
Ai 06 00275 g005
Figure 6. Bounding box based on Grad-CAM evaluation (Experiment 3): illustration of quadrant-based Mode Hop localization using heatmap contour detection. (a) Original input image. (b) Image with bounding box, where the green box highlights the localized Mode Hop region identified by the Grad-CAM analysis.
Ai 06 00275 g006
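The paper derives bounding boxes from heatmap contour detection. The following simplified sketch (pure NumPy, thresholding in place of contour extraction, with an assumed cutoff of 0.5) illustrates the principle of turning a normalized heatmap into a box and a quadrant label:

```python
import numpy as np

def cam_to_bbox_and_quadrant(cam, thresh=0.5):
    """Threshold a normalized Grad-CAM heatmap, take the bounding box of the
    activated pixels, and map the box center to one of four image quadrants."""
    ys, xs = np.nonzero(cam >= thresh)
    if ys.size == 0:
        return None, None                     # no salient region found
    x0, y0 = int(xs.min()), int(ys.min())
    x1, y1 = int(xs.max()), int(ys.max())
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2     # box center
    h, w = cam.shape
    quadrant = ("upper" if cy < h / 2 else "lower") + "-" + \
               ("left" if cx < w / 2 else "right")
    return (x0, y0, x1, y1), quadrant

# Synthetic heatmap with a hot spot in the upper-left quadrant:
cam = np.zeros((224, 224))
cam[30:60, 20:80] = 1.0
bbox, quad = cam_to_bbox_and_quadrant(cam)
print(bbox, quad)  # (20, 30, 79, 59) upper-left
```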
Figure 7. Relative distribution of Mode Hops across the four image quadrants in test dataset V2.
Ai 06 00275 g007
Table 1. Summary of shearography data recorded by two reference tire inspection machines in field operation: cleaned shearography image data, number of tire tests, and resulting sector images collected over different time periods by two automated inspection systems in real operation.
| | Testing Machine 1 | Testing Machine 2 |
|---|---|---|
| Recording period | 8 months | 42 months |
| Tire tests before cleaning | 30,774 | 239,032 |
| Cleaned shearography images of the crown | 246,143 | 1,926,079 |
| Cleaned shearography images of the sidewall | 492,286 | 3,852,380 |
| Cleaned image data total (both machines) | 6,516,888 | |
Table 2. Tire condition distribution in dataset T1 (manual inspection): evaluation of tire condition (good, critical, and bad) based on manual visual inspection.

| Dataset T1 | Evaluated Tire Tests | Tire Condition Good | Tire Condition Critical | Tire Condition Bad |
|---|---|---|---|---|
| Testing machine 1 | 30,704 | 23,932 | 1,613 | 5,159 |
| Percentage | 100% | 77.95% | 5.25% | 16.80% |
Table 3. Variability and visual characteristics of Mode Hops in shearography-based tire inspection. This table presents a structured overview of typical Mode Hop manifestations in shearography phase images. It compares occurrences in two main tire regions: the Crown (tread area) and the Sidewall. Five primary artifact types are identified and illustrated:

| Characteristic | Crown | Sidewall |
|---|---|---|
| Sharp artifact boundaries: abrupt transitions or edges in the fringe pattern. The artifact is visible in the upper-left quadrant of the Crown image and the upper-right quadrant of the Sidewall image. | Ai 06 00275 i001 | Ai 06 00275 i002 |
| Intensity changes: linear or wavy brightness variations not related to physical defects. The artifact appears in the upper-right quadrant of both the Crown and Sidewall images. | Ai 06 00275 i003 | Ai 06 00275 i004 |
| Noise: high-frequency graininess superimposed on the interference fringes. The noise is evident in the upper-left quadrant of both the Crown and Sidewall images. | Ai 06 00275 i005 | Ai 06 00275 i006 |
| Combination of intensity and noise: a mixed manifestation of brightness changes and noise, visible in the upper-left quadrant of the Crown image and the lower-right quadrant of the Sidewall image. | Ai 06 00275 i007 | Ai 06 00275 i008 |
| Fragmented interference rings: broken or incomplete circular fringe structures. In this case, the feature is present only in the upper-left quadrant of the Sidewall image. | – | Ai 06 00275 i009 |
Table 4. Mode Hop occurrence in the cleaned datasets T1 and T2, determined using the classifier model from Experiment 1: absolute numbers of Mode Hop and non-Mode-Hop images in the crown and sidewall recordings.

| | Dataset T1 | Dataset T2 | Dataset Total |
|---|---|---|---|
| Crown recordings without Mode Hops | 245,833 | 1,922,234 | 2,168,067 |
| Crown recordings with Mode Hops | 310 | 3,845 | 4,155 |
| Sidewall recordings without Mode Hops | 491,686 | 3,845,330 | 4,337,016 |
| Sidewall recordings with Mode Hops | 600 | 7,050 | 7,650 |
| Total recordings with Mode Hops | 910 | 10,895 | 11,805 |
| Total recordings | 738,429 | 5,778,459 | 6,516,888 |
Table 5. Training dataset V1: augmented vs. original images. Summary of Mode Hop images and augmentation strategy for the crown and sidewall areas.

| Training Dataset V1 | With Mode Hop | Augmented Data | Total Images with Mode Hop | Without Mode Hop |
|---|---|---|---|---|
| Shearography images of the crown | 154 | 462 | 616 | 616 |
| Shearography images of the sidewall | 154 | 462 | 616 | 616 |
Table 6. The training and model parameters used in Experiment 1: architecture, optimizer, loss function, and other settings applied in the initial training round.

| Training and Model Parameters | Value/Setting |
|---|---|
| Model | pre-trained ResNet-50 (ImageNet-1K) |
| Output layer | fc = nn.Linear(2048, 2) |
| Optimizer | Adam |
| Learning rate | 1 × 10⁻⁴ |
| Weight decay | 1 × 10⁻⁴ |
| Loss function | cross-entropy loss |
| Number of epochs | 10 |
| Batch size | 16 |
| Train/validation split | 80%/20% |
| Random rotation | ±10° (training only) |
| Normalization | mean = [0.485, 0.485, 0.485], std = [0.229, 0.229, 0.229] |
Table 7. Test dataset V1.

| Test Dataset V1 | With Mode Hop | Augmented Data | Total Images with Mode Hop | Without Mode Hop |
|---|---|---|---|---|
| Shearography images | 15 | 39 | 54 | 60 |
Table 8. This table contains examples of recurring patterns observed in false-positive classifications in the model from Experiment 1. These images were incorrectly labeled as Mode Hops. The artifacts have visual similarities to real Mode Hop features.
Similar patterns and combined noise:
Linear structures caused by tire lettering or surface textures, in combination with high-frequency image noise, or high-frequency image noise alone, produce artifacts that closely resemble the visual characteristics of true Mode Hops. This resemblance can lead to false-positive classifications by the model. Correctly identifying these cases as false positives typically requires expert knowledge and, in some cases, careful visual inspection by trained personnel.
Ai 06 00275 i010
Table 9. Training dataset V2: enlarged and augmented dataset; refined Mode Hop dataset with more samples and augmentations for improved generalization.

| Training Dataset V2 | With Mode Hop | Augmented Data | Total Images with Mode Hop | Without Mode Hop |
|---|---|---|---|---|
| Shearography images of the crown | 751 | 2,253 | 3,004 | 2,440 |
| Shearography images of the sidewall | 629 | 1,887 | 2,516 | 2,245 |
| Total | 1,380 | 4,140 | 5,520 | 4,685 |
Table 10. Test dataset V2: composition of evaluation data; dataset used for final model evaluation.

| Test Dataset V2 | With Mode Hop | Augmented Data | Total Images with Mode Hop | Without Mode Hop |
|---|---|---|---|---|
| Shearography images of the crown | 1,867 | – | 1,867 | 4,173 |
| Shearography images of the sidewall | 6,016 | – | 6,016 | 10,944 |
| Total | 7,883 | – | 7,883 | 15,117 |
Table 11. The best hyperparameter setting achieved.

| Hyperparameter | Value/Setting |
|---|---|
| Optimizer | SGD |
| Learning rate | 0.00264 |
| Weight decay | 0.00873 |
| Loss function | CrossEntropyLoss |
| Batch size | 32 |
| Momentum | 0.7878 |

Share and Cite

MDPI and ACS Style

Friebolin, M.; Munz, M.; Schlickenrieder, K. Grad-CAM-Assisted Deep Learning for Mode Hop Localization in Shearographic Tire Inspection. AI 2025, 6, 275. https://doi.org/10.3390/ai6100275
