Enhancing Interpretability in Drill Bit Wear Analysis through Explainable Artificial Intelligence: A Grad-CAM Approach

Abstract: This study introduces a novel method for analyzing vibration data related to drill bit failure. Our approach combines explainable artificial intelligence (XAI) with convolutional neural networks (CNNs). Conventional signal analysis methods, such as the fast Fourier transform (FFT) and wavelet transform (WT), require extensive knowledge of drilling equipment specifications, which limits their adaptability to different conditions. In contrast, our method applies XAI algorithms to CNNs to identify fault signatures directly from vibration signals. The signals are transformed into their frequency components and used as inputs to a CNN model, which is trained to detect patterns indicative of drill bit failure. XAI algorithms then generate attention maps that highlight the regions of interest used by the CNN. By scrutinizing these maps, engineers can identify critical frequencies associated with drill bit failure, providing valuable insights for maintenance and optimization. This method offers a transparent and interpretable framework for analyzing vibration data, enabling informed decision-making and proactive maintenance strategies to enhance drilling efficiency and minimize downtime. The integration of XAI with CNNs facilitates a deeper understanding of the root causes of drill bit failure and improves overall drilling performance.


Introduction
Vibration signal analysis plays a crucial role in various industries, enabling the detection and diagnosis of mechanical faults in machinery and structures. It involves measuring the oscillations of an object in response to internal or external forces, using sensors such as accelerometers or piezoelectric transducers [1]. It aims to extract meaningful information from these vibrations to assess the condition, performance, and health of the equipment being monitored. In industrial applications, vibration analysis is widely used for the predictive maintenance, condition monitoring, and fault diagnosis of machinery [2]. By analyzing the frequency, amplitude, and other characteristics of vibration signals, engineers can detect abnormal patterns or deviations from normal operation, which may indicate impending failures or malfunctions [3]. This helps prevent unexpected downtime, reduce maintenance costs, and extend the lifespan of the equipment [4]. Vibration signal analysis encompasses many techniques, including time-domain and frequency-domain analysis. Time series analysis entails examining raw vibration signals in their time-domain representation to identify trends, changes, and periods [5][6][7]. Spectral analysis decomposes vibration signals into their constituent frequency components, which makes it possible to identify dominant frequencies associated with specific faults or operational conditions [8].
A drilling operation is a dynamic process that involves the high-speed rotation of a drill bit and its interaction with geological formations. These interactions generate vibration signals that contain valuable information about the condition of the drill bit, the effectiveness of the drilling process, and the properties of the surrounding rock formations [9]. Drill bit wear is a major concern in drilling operations. It is a gradual process affected by factors like drilling parameters, rock properties, and operational conditions [10]. Detecting and assessing drill bit wear early on is crucial to avoid expensive downtime, minimize maintenance costs, and maintain operations [11,12]. Vibration analysis is a conventional method for detecting subtle changes in the vibration patterns of drill bits. These changes could indicate the beginning of wear or degradation. By analyzing the frequency, amplitude, and other characteristics of these signals, deviations from normal operations can be detected [13]. These deviations may appear as changes in vibration harmonics, increased levels of high-frequency noise, or alterations in the frequency spectrum of the vibration signal [14].
Signal analysis techniques such as the fast Fourier transform (FFT) and wavelet transform (WT) are commonly used in various studies. The FFT breaks down time-based waveforms into different frequency components [15], while the WT analyzes signals in both the time and frequency domains simultaneously [16]. Li et al. [17] conducted an analysis of vibration signals from a downhole drill string. They used a three-axis accelerometer to measure vibration signals during drilling in an oil well. The time-frequency characteristics of the signal were then analyzed using both the FFT and the short-time Fourier transform (STFT). This analysis proved effective in identifying harmful vibrations associated with drilling into igneous rock. Rafezi and Hassani [18] applied signal analysis to study the relationship between drill signals and bit wear by analyzing measurement-while-drilling (MWD) data. Their analysis included both time-domain statistical analysis and frequency spectrum analysis of the signals. They were able to identify specific frequency bands that were responsive to both bit wear and drillability. Similarly, Karakus and Perez [19] discovered that acoustic emission monitoring techniques are effective in optimizing diamond core drilling performance and detecting changes in drilling conditions. They attached acoustic emission sensors to both the drill and the rocks, recorded emitted acoustic signals during drilling, and analyzed the waveforms. Their observations revealed that as wear accelerates, the amplitudes of the acoustic emissions decrease.
Another study by Kawamura et al. [20] used a GoPro camera to capture drilling sounds and applied signal processing techniques, such as time series analysis, FFT, and WT, to assess button bit failure. The results of this study demonstrated the usefulness of drilling sounds in detecting the condition of drill bits. However, signal analysis methods have some limitations. Traditional vibration analysis relies heavily on the expertise and understanding of the analyst, making the interpretation of vibration signals subjective and susceptible to human errors [21,22]. In today's complex industrial environment, where machinery and structures are becoming increasingly intricate, conventional vibration analysis may struggle to effectively handle this complexity [23]. This can lead to inaccurate detection and diagnosis of faults. Furthermore, the sheer volume of vibration data generated can be overwhelming, leading to time-consuming tasks related to data analysis and management. It is crucial to tackle these challenges to enhance the accuracy and efficiency of vibration analysis techniques.
By integrating advanced technologies like artificial intelligence and machine learning, we can overcome these limitations and create more reliable and effective methods for fault detection and diagnosis. Machine learning algorithms are capable of handling complex data patterns and efficiently processing large amounts of data, enabling real-time monitoring [24]. Igbal and Madan [14] utilized a CNN algorithm to detect bearing faults using vibration and acoustic signals. Their approach involved several steps, including data preprocessing, STFT, and feature extraction. The model was evaluated on four distinct bearing conditions and achieved a classification accuracy of 100%. In another study, Kumar et al. [25] developed artificial neural network (ANN) models to predict the geomechanical properties of sedimentary rock types using acoustic signal dominant frequencies. A microphone was used to record the drilling sounds during the core drilling operations in the laboratory. The ANN model was efficient in determining the physicomechanical rock properties. Senjoba et al. [26] proposed a method to detect drill bit failure in rotary percussion drills using a one-dimensional convolutional neural network (1D CNN) with drill vibration as the input data. The 1D CNN model was evaluated on five drilling conditions: normal, defective, abrasion, high pressure, and misdirection. The model achieved a classification accuracy of 88.7%.
CNNs are highly effective for tasks such as fault detection. However, understanding the reasons behind their predictions can be challenging. This lack of interpretability is a crucial issue in applications where transparency, accountability, and the ability to explain decisions are essential. It can also hinder the identification and resolution of biases or errors in the model's predictions [27]. To address this, explainable artificial intelligence (XAI) can be applied to make deep learning models transparent to users. XAI aims to make the decision-making process of AI systems understandable and transparent to humans by employing diverse techniques. A notable method involves highlighting the specific regions within an input that contribute most significantly to the model's predictions [28]. In this study, we propose a novel approach that integrates XAI techniques, specifically Gradient-weighted Class Activation Mapping (Grad-CAM), with CNNs for interpretable drill bit wear detection from vibration data. Although the integration of Grad-CAM with CNNs is not a new concept, our study introduces an innovative application of this technique in the domain of drill bit wear detection. By leveraging Grad-CAM, our approach provides transparent visualizations that highlight the regions of interest within vibration signals contributing to the detection of drill bit wear. This newfound interpretability not only enhances the trustworthiness of the detection system but can also enable domain experts to gain actionable insights into the underlying causes of wear and degradation. This integration introduces interpretability to the diagnostic process, enabling stakeholders to not only rely on accurate predictions but also gain insights into the features and patterns that drive the model's decisions. By applying XAI to vibration analysis, engineers and researchers can pinpoint the specific areas responsible for failure. This aids in fault localization, root cause analysis, and effective maintenance strategies. In this manuscript, we present a detailed exploration of the application of Grad-CAM for interpretable drill bit wear detection from vibration data. We evaluate the performance of our proposed method on real-world drilling datasets, showcasing its effectiveness in identifying and assessing drill bit wear under diverse operating conditions. Figure 1 shows the proposed approach.
Appl. Sci. 2024, 14, x FOR PEER REVIEW

Background Theories

Convolutional Neural Networks
CNNs are a class of deep neural networks widely applied for tasks like image classification and feature extraction. They consist of several layers, including convolutional, pooling, and fully connected layers. Within the convolutional layers, the inputs are convolved with filters to extract the features. Hyperparameters like the filter size, stride, and the number of filters play pivotal roles in this process [29]. The operation of a single filter in the convolutional layer is given by Equation (1):

$$Z_l^k = f\left(x * \alpha_l + b\right) \qquad (1)$$

where $x$ is the input, $l$ is the index of filters in the convolution layer, $k$ is the index of the convolutional layer, $f$ is the non-linear activation function, $*$ represents the convolutional operation, $b$ is the bias, $Z_l^k$ is the feature map generated by the $l$-th filter, and $\alpha_l$ is the corresponding kernel matrix of the $l$-th filter.
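The single-filter operation described by Equation (1) can be illustrated with a short sketch. It is a minimal, standard-library example rather than the implementation used in this study: the function name `conv1d_feature_map`, the kernel values, and the bias are hypothetical, the input is one-dimensional for brevity, and the sliding product is computed as a cross-correlation, which is how CNN libraries commonly implement the convolution.

```python
def conv1d_feature_map(x, kernel, bias, activation):
    """Slide one filter over x (no padding), add the bias, apply the activation."""
    width = len(kernel)
    outputs = []
    for start in range(len(x) - width + 1):
        window = x[start:start + width]
        # Weighted sum of the window with the kernel weights.
        total = sum(w * v for w, v in zip(kernel, window))
        outputs.append(activation(total + bias))
    return outputs


def relu(value):
    """Non-linear activation f: negative values are clipped to zero."""
    return max(0.0, value)


signal = [0.0, 1.0, 3.0, 2.0, 0.0, -1.0]
edge_filter = [-1.0, 0.0, 1.0]   # hypothetical kernel, not a learned one
feature_map = conv1d_feature_map(signal, edge_filter, bias=0.5, activation=relu)
print(feature_map)   # [3.5, 1.5, 0.0, 0.0]
```

The feature map is shorter than the input by the kernel width minus one, and it responds only where the signal rises across the window, which is the sense in which a filter "extracts" a feature.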
The pooling layer divides the feature map into equal-length segments and represents each segment by its average or maximum value. This pooling operation downsamples the output bands of the convolution, effectively reducing variability in hidden activations. After multiple convolution and pooling operations, the original input image is transformed into a series of feature maps. These feature maps are then connected to generate a new output, forming the final representation of the original input. This representation is fully connected via weights to the output layer. The values from the fully connected layers are passed through a non-linear activation function, such as softmax, which returns the probability of each class. During the training process, the weights and biases are learned using the stochastic gradient descent method, while the gradients are computed using the backpropagation method. The softmax function is represented by Equation (2) [30]:

$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \qquad (2)$$

where $z_i$ is the score of the $i$-th class and $K$ is the number of classes.
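The pooling and softmax steps described above can be sketched as follows. This is an illustrative standard-library example under stated assumptions: the helper names `max_pool` and `softmax`, the feature-map values, and the two class scores are hypothetical and do not come from the trained model.

```python
import math


def max_pool(feature_map, size):
    """Split the feature map into equal-length segments and keep each maximum."""
    return [max(feature_map[i:i + size])
            for i in range(0, len(feature_map) - size + 1, size)]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to one."""
    peak = max(scores)   # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


pooled = max_pool([0.2, 0.9, 0.1, 0.4, 0.7, 0.3], size=2)
probabilities = softmax([2.0, 0.0])   # hypothetical scores for (normal, abnormal)
print(pooled)          # [0.9, 0.4, 0.7]
print(probabilities)
```

Subtracting the largest score before exponentiation leaves the result unchanged mathematically and is a common safeguard against numerical overflow.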
CNNs have various architectures, each designed to effectively handle specific types of data and tasks. These architectures differ in terms of depth, width, specific layers, and connectivity patterns. Some popular architectures include AlexNet, VGG, GoogLeNet, Inception, and ResNet-50. Each of these architectures has its own advantages and trade-offs in terms of computational efficiency, memory usage, and performance on different tasks. In this study, the ResNet-50 architecture was used. The ResNet-50 model consists of a series of residual blocks, each containing two or more convolutional layers.
The key feature of a residual block is the inclusion of a "skip connection" that adds the input to the block's output, as shown in Figure 2. This allows the network to learn a residual mapping on top of the identity mapping. By incorporating residual connections, information can bypass certain layers, effectively addressing the issue of vanishing gradients. This improves the model's learning efficiency and simplifies the training of deep networks, which is crucial for capturing complex features [31,32].
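A toy sketch may help make the skip connection concrete. It is not ResNet-50: the convolutional layers are replaced by a hypothetical `scale_layer` that multiplies every element by a single weight, purely so that the effect of adding the input back can be followed by hand.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def scale_layer(values, weight):
    """Stand-in for a convolutional layer: scales every element by one weight."""
    return [weight * v for v in values]


def residual_block(x, weight_1, weight_2):
    """Compute relu(F(x) + x), where F is two stacked layers with a ReLU between."""
    residual = scale_layer(relu(scale_layer(x, weight_1)), weight_2)
    # Skip connection: the block input is added back to the learned residual.
    return relu([r + v for r, v in zip(residual, x)])


x = [1.0, 2.0, 3.0]
identity_out = residual_block(x, weight_1=0.0, weight_2=0.0)
learned_out = residual_block(x, weight_1=0.5, weight_2=2.0)
print(identity_out)   # [1.0, 2.0, 3.0]
print(learned_out)    # [2.0, 4.0, 6.0]
```

With all weights at zero the block still passes its input through unchanged, which illustrates why stacking many such blocks does not by itself block the flow of information or gradients.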

Gradient-Weighted Class Activation Mapping
Grad-CAM is a technique used to interpret and highlight the regions of interest in input data that have the most impact on a model's predictions. It achieves this by visualizing the activation maps generated by a CNN model. By doing so, Grad-CAM provides valuable insights into the features and patterns present in the vibration data that indicate drill bit wear. Unlike class activation mapping (CAM), which requires a specific type of CNN with a global average pooling layer to generate feature maps, Grad-CAM is more versatile and can be applied to a wider range of CNN architectures [33,34]. To obtain the class-specific localization map, which is a two-dimensional matrix representing the discriminative regions for a particular class, the process involves calculating the gradient of the class score ($y^c$) with respect to the activations ($A^k$) of a specific convolutional layer. These gradients are then globally averaged across both the width ($i$) and height ($j$) dimensions. This pooling operation results in the neuron importance weights $\alpha_k^c$, as expressed in Equation (3):

$$\alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k} \qquad (3)$$

where $Z$ is the number of spatial locations in the feature map.
Computing $\alpha_c^k$, written $\alpha_k^c$ below, during the backward pass entails a sequence of matrix multiplications between the weight matrices and the gradients with respect to the activation functions. This computational procedure effectively creates a partial linearization of the deep network downstream from the activation map $A^k$. The weight $\alpha_k^c$ essentially quantifies the "importance" of feature map $k$ in relation to the target class $c$. To generate the class-specific localization map $L_{\text{Grad-CAM}}^c$, a weighted combination of the forward activation maps is performed. This combination is then followed by the application of a Rectified Linear Unit (ReLU) activation function. The resulting expression is presented as Equation (4):

$$L_{\text{Grad-CAM}}^c = \mathrm{ReLU}\left(\sum_k \alpha_k^c A^k\right) \qquad (4)$$

In this equation, the index $k$ runs over all feature maps, while $\alpha_k^c$ represents the weight assigned to feature map $k$ for class $c$. The ReLU operation ensures that any negative values are changed to zero, highlighting the regions that have a positive impact on the classification of class $c$. By using this formulation, the Grad-CAM localization map can pinpoint the important areas in the input that aid in predicting the target class [35].
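The two Grad-CAM steps described above (global averaging of the gradients into importance weights, then a ReLU over the weighted sum of activation maps) can be sketched in a few lines. The example assumes that the activations and gradients of the chosen layer have already been extracted from the network; the function name `grad_cam` and all numerical values are hypothetical, and only the standard library is used.

```python
def grad_cam(activations, gradients):
    """Return importance weights and the Grad-CAM map.

    Both arguments are lists of K two-dimensional maps (lists of rows):
    the feature maps A^k and the gradients of the class score w.r.t. A^k.
    """
    # Neuron importance weights: global average of each gradient map.
    weights = []
    for grad_map in gradients:
        values = [g for row in grad_map for g in row]
        weights.append(sum(values) / len(values))

    rows, cols = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * cols for _ in range(rows)]
    # Weighted combination of the forward activation maps.
    for weight, act_map in zip(weights, activations):
        for i in range(rows):
            for j in range(cols):
                cam[i][j] += weight * act_map[i][j]
    # ReLU keeps only locations with a positive influence on the class.
    cam = [[max(0.0, v) for v in row] for row in cam]
    return weights, cam


# Hypothetical values: two 2x2 feature maps and their gradients.
activations = [[[1.0, 2.0], [0.0, 1.0]],
               [[3.0, 0.0], [1.0, 1.0]]]
gradients = [[[0.5, 0.5], [0.5, 0.5]],
             [[-1.0, -1.0], [-1.0, -1.0]]]
weights, cam = grad_cam(activations, gradients)
print(weights)   # [0.5, -1.0]
print(cam)       # [[0.0, 1.0], [0.0, 0.0]]
```

In this toy case the second feature map has a negative weight, so locations where it is strongly active are suppressed, and only the one location dominated by the first map survives the ReLU.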

Data Collection and Preprocessing
In this study, we employed the data collection and preprocessing methodology described by Senjoba et al. [26], in which the temporal acceleration of drill vibrations was recorded using acceleration sensors (600 series; TEAC Corporation, Tokyo, Japan) fixed to the guide cell of a rock drill. A stationary drifter (YH-70; Yamamoto Rock Machine Ltd., Tokyo, Japan) was used for horizontally drilling through 18 m³ of marble rock at a sampling frequency of 50 kHz. The operational parameters, including impact pressure, striking frequency, rotation pressure, and hits per minute, are shown in Table 1 and were consistent with those reported in Senjoba et al. [26]. Different drill bits, each with distinct conditions, were used for data collection. These included a standard bit (normal) and a bit with chipped buttons (abnormal), as shown in Figure 3. A controlled environment was maintained during drilling by regulating the drilling parameters under all conditions. This ensured consistency across the conditions, enabling us to capture relevant signals.
The dataset was divided into smaller segments to capture local variations effectively and to reduce computation time. Each segment was 0.06 s in length, containing 3000 data points. This ensured that at least 3 drill hits were captured within each segment. To minimize information loss at the segment edges, a 12.5% overlap was employed. The normal and abnormal conditions each comprised 3784 segments. Thereafter, the FFT was used to convert the time-domain data segments into the frequency domain. This transformation enables a thorough examination of the different frequency components present in the signal.
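The segmentation and frequency transformation can be sketched as follows. The segment length (3000 samples), the overlap (12.5%), and the sampling rate (50 kHz) are taken from the text; everything else is an assumption made for illustration. In particular, the function names are hypothetical, the input is synthetic, and the spectrum is computed with a direct discrete Fourier transform from the standard library instead of an FFT routine, which gives the same values but is only practical for short signals.

```python
import cmath
import math

SAMPLING_RATE_HZ = 50_000
SEGMENT_LENGTH = 3000       # 0.06 s at 50 kHz
OVERLAP_FRACTION = 0.125


def segment_signal(signal, length, overlap_fraction):
    """Cut a signal into fixed-length segments that overlap by the given fraction."""
    hop = length - int(length * overlap_fraction)
    return [signal[start:start + length]
            for start in range(0, len(signal) - length + 1, hop)]


def magnitude_spectrum(segment):
    """One-sided magnitude spectrum via a direct DFT (O(n^2), illustration only)."""
    n = len(segment)
    spectrum = []
    for k in range(n // 2 + 1):
        coefficient = sum(value * cmath.exp(-2j * math.pi * k * t / n)
                          for t, value in enumerate(segment))
        spectrum.append(abs(coefficient))
    return spectrum


# One second of synthetic samples stands in for the recorded acceleration.
segments = segment_signal(list(range(SAMPLING_RATE_HZ)), SEGMENT_LENGTH, OVERLAP_FRACTION)
print(len(segments), SEGMENT_LENGTH / SAMPLING_RATE_HZ)   # 18 0.06

# A 5 kHz tone sampled at 50 kHz; 100 samples give 500 Hz frequency bins.
tone = [math.sin(2 * math.pi * 5000 * t / SAMPLING_RATE_HZ) for t in range(100)]
spectrum = magnitude_spectrum(tone)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print(peak_bin * SAMPLING_RATE_HZ / len(tone))   # 5000.0
```

With a 12.5% overlap, consecutive segments start 2625 samples apart, so each segment shares its first 375 samples with the end of the previous one.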

Vibration Signal Analysis
Signal analysis plays a crucial role in understanding the dynamic behavior of drilling operations. Time series and frequency analysis were conducted to identify temporal patterns and frequency components in the vibration signals [36]. These patterns and components provide insights into various conditions affecting the drill bit's performance. This study carried out a comparative analysis between a standard drill bit and a faulty one to underscore differences in their vibration characteristics. The analysis focused on significant frequencies and distinctive peaks, indicating notable variations in vibration characteristics between the two drill conditions. Figures 4 and 5 display the time-series waveforms and frequency spectra of the two conditions. Time series analysis revealed no discernible trends, with both normal and faulty conditions exhibiting fluctuations in acceleration amplitude over time. Subsequently, the frequency analysis of the two bit conditions was carried out, as depicted in Figures 4b and 5b. Generally, both conditions exhibited similar dominant frequencies within the range of 0-6000 Hz, with the highest peaks consistently clustered around the 4000-5000 Hz frequency range. A distinct trend occurred in terms of amplitude: the normal drill bit consistently exhibited a lower amplitude than the faulty drill bit. The presence of larger peaks indicated increased vibrational energy at the dominant frequency due to structural damage to the abnormal drill bit, reduced cutting efficiency, and uneven contact of the bit with the rock. The analysis process required a detailed examination of the signals, making it labor-intensive. Therefore, it is essential to implement techniques that can improve the efficiency and accuracy of the process, such as deep learning models and XAI.

Statistical Analysis
Statistical feature extraction is a crucial step in analyzing vibration data. It involves computing descriptive measures from the vibration signals [37]. These measures summarize important information about the data's distribution, variability, and central tendency [38]. Metrics such as mean, variance, standard deviation, skewness, and kurtosis were computed to characterize the signals, as shown in Figure 6. The statistical features were calculated as average values of the normal and abnormal datasets. The normal dataset had slightly lower mean, variance, standard deviation, skewness, and kurtosis values compared to the abnormal dataset. The observations from Figure 6 indicate that the abnormal dataset had a large number of outliers, suggesting irregularities in the signal. A correlation coefficient of 0.9938 was obtained, indicating that the normal and abnormal datasets are highly correlated.
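The descriptive measures listed above can be computed as in the following sketch. It assumes population (biased) forms of the moments and the Pearson definition of the correlation coefficient, neither of which is specified in the text; the function names and the sample values are hypothetical and are not measurements from this study.

```python
import math


def describe(values):
    """Mean, variance, standard deviation, skewness, and kurtosis (population forms)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    skewness = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    kurtosis = sum((v - mean) ** 4 for v in values) / (n * std ** 4)
    return {"mean": mean, "variance": variance, "std": std,
            "skewness": skewness, "kurtosis": kurtosis}


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    spread_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return covariance / (spread_a * spread_b)


# Hypothetical amplitude values, not measurements from the study.
normal = [1.0, 2.0, 3.0, 4.0, 5.0]
abnormal = [2.0, 4.0, 6.0, 8.0, 10.0]
stats = describe(normal)
correlation = pearson(normal, abnormal)
print(stats)
print(correlation)
```

In this toy case the second series is an exact multiple of the first, so the correlation is essentially 1 even though the amplitudes differ by a factor of two. A correlation close to 1 therefore describes similar spectral shape and does not rule out an amplitude difference between the conditions.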

t-SNE Evaluation
The t-distributed stochastic neighbor embedding (t-SNE) technique was utilized to investigate the underlying patterns and relationships between normal and abnormal drilling conditions. t-SNE is a data visualization technique that reduces high-dimensional data into a lower-dimensional space while preserving local relationships among data points [39]. It is particularly beneficial for visualizing complex datasets and revealing hidden structures. Figure 7 demonstrates a substantial overlap between the two conditions. This observation indicates a strong similarity in the frequency characteristics of the vibration data and a non-linear distribution of the data, highlighting the intricate nature of the underlying data structure.

Hyperparameter Tuning and Evaluation Metrics
The hyperparameters were adjusted based on the training set to obtain the best performance and stability in the model. To prevent overfitting, the cross-validation technique was employed, where the data were divided into training, validation, and testing sets. Out of the 3784 data points, 2784 were used for training and 500 each were used for validation and testing. The model was trained using the hyperparameters indicated in Table 2: an Adam optimizer with an initial learning rate of 0.001 and a categorical cross-entropy loss function. A minibatch size of 8 and a maximum of 10 epochs were used. The training process was conducted in MATLAB R2022b with the Deep Learning Toolbox. The training machine had 32 GB of memory, an Intel i7-8750H CPU running at 2.2 GHz, NVIDIA GeForce GTX [...] where the model failed to detect actual abnormalities (FP).
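The data split can be sketched as below. The sketch reads the reported counts as 2784 training, 500 validation, and 500 testing samples, which is an assumption based on the fact that these numbers add up to 3784; the random shuffling, the seed, and the function name `split_indices` are likewise illustrative choices, since the study's MATLAB procedure is not described in that detail.

```python
import random


def split_indices(total, n_train, n_val, n_test, seed=0):
    """Shuffle sample indices and cut them into disjoint train/validation/test sets."""
    if n_train + n_val + n_test != total:
        raise ValueError("split sizes must add up to the total")
    indices = list(range(total))
    random.Random(seed).shuffle(indices)   # seeded for reproducibility
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train, val, test = split_indices(3784, n_train=2784, n_val=500, n_test=500)
print(len(train), len(val), len(test))   # 2784 500 500
```

Keeping the three index sets disjoint ensures that the validation and testing samples are never seen during training, which is what makes their accuracy a meaningful check against overfitting.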

Interpretability with Grad-CAM
To gain insight into the decision-making process of the ResNet-50 model, we used the Grad-CAM algorithm. This algorithm highlights the significant frequencies that are responsible for a given prediction. Additionally, we applied the t-SNE technique to visualize the activations in the last ten convolutional layers, as shown in Figure 9. Through t-SNE visualizations of activations, we can identify the convolutional layers that capture the most discriminative features relevant to the classification task. Layers that show well-separated clusters of activations indicate the learning of important features for distinguishing between different classes. Figure 9b demonstrates better cluster separation compared to the commonly used final layer. This suggests that the layer effectively captured the crucial features required for the classification task. Given the well-separated clusters of activations, this layer emerges as a strong candidate for generating meaningful visualizations. Therefore, by utilizing activations from this layer in our model, we can potentially enhance interpretability and gain valuable insights into the decision-making process of our model.
Figure 10 displays the attention maps generated using Grad-CAM. These maps provide insights into the specific spectral regions that the model considers essential for making predictions. By overlaying these heat maps onto the spectral plots of the test dataset, we can easily identify the frequencies that the model deems significant. In these visualizations, areas of high attention are indicated in red to highlight their importance, while areas of low attention are represented in blue. The model's attention is directed towards specific frequency ranges depending on the condition of the drill bit. For instance, when predicting a normal drill bit condition (Figure 10a), the model primarily focuses on frequencies within the range of 0-1000 Hz. Attention is also observed in the frequency range of 4000-6000 Hz, indicating the presence of distinctive spectral features that are crucial for identifying normal drill bit behavior. On the other hand, in scenarios involving abnormal drill conditions (Figure 10b), the model's attention shifts to different frequency ranges. Specifically, attention is concentrated within the frequency range of 2500-6000 Hz. This suggests that spectral characteristics within this range play a crucial role in distinguishing abnormal drill bit behavior from normal operation.
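Reading frequency ranges off an attention map can also be done programmatically. The following sketch assumes a one-dimensional attention profile that has already been normalized to the range 0 to 1 and aligned with a frequency axis; the function name `high_attention_bands`, the 0.5 threshold, and the attention values are hypothetical and were chosen only to illustrate the idea, not taken from Figure 10.

```python
def high_attention_bands(frequencies, attention, threshold):
    """Return (start_hz, end_hz) ranges where attention is at or above the threshold."""
    bands = []
    start = None
    previous = None
    for frequency, score in zip(frequencies, attention):
        if score >= threshold and start is None:
            start = frequency                 # a new band opens here
        elif score < threshold and start is not None:
            bands.append((start, previous))   # the band closed at the previous bin
            start = None
        previous = frequency
    if start is not None:
        bands.append((start, previous))       # band still open at the last bin
    return bands


# Hypothetical attention values on a 500 Hz grid, not values from the study.
frequencies = [500 * i for i in range(13)]    # 0 to 6000 Hz
attention = [0.9, 0.8, 0.7, 0.2, 0.1, 0.1, 0.2, 0.3, 0.8, 0.9, 0.7, 0.6, 0.6]
bands = high_attention_bands(frequencies, attention, threshold=0.5)
print(bands)   # [(0, 1000), (4000, 6000)]
```

The reported band edges depend on the chosen threshold and on the frequency resolution of the map, so any such ranges are best treated as approximate.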

Discussion
Vibrational signals provide valuable information about the operating conditions of machinery. Identifying the frequencies associated with abnormal drill bit behavior is crucial for detecting faults early and preventing further damage. The interpretability of Grad-CAM enables us to identify the frequency ranges that the model considers representative of drill bit failure. The results showed that, under normal conditions, the model focused on frequencies of 0-1000 Hz and 4000-6000 Hz, whereas under abnormal conditions it focused on the range of 2500-6000 Hz. The signal analysis method yielded similar results: both approaches identified dominant frequencies within the 0-6000 Hz range for the different conditions. Frequency analysis provided additional insight by distinguishing the acceleration peaks of each condition in more detail. Grad-CAM does not offer this level of detail, which is a limitation of the approach. On the other hand, Grad-CAM provides interpretable visualizations directly from the neural network, reducing the need for domain-specific knowledge and subjective analysis while accomplishing the task efficiently. In practical applications, combining traditional signal analysis with Grad-CAM is likely to offer more insight than either method alone.
Leveraging Grad-CAM effectively requires a high-accuracy model that consistently predicts drill bit conditions. This ensures that the Grad-CAM interpretations are based on reliable predictions, which makes the resulting insights more trustworthy. Before applying Grad-CAM, it is also crucial to determine the most informative convolutional layer in the neural network. The t-SNE analysis is valuable here because it allows researchers to visualize feature representations at different layers of the network. By selecting the most discriminative layer, Grad-CAM can more accurately identify the regions of the input data that contribute significantly to the model's predictions.
The use of Grad-CAM for interpretability in drill bit failure detection represents a significant advancement in condition monitoring and predictive maintenance. Incorporating interpretable AI techniques such as Grad-CAM into existing monitoring systems can improve the efficiency and reliability of maintenance processes, reduce downtime, and optimize resource utilization. Ultimately, adopting advanced analytic solutions supports equipment reliability, cost savings, and operational excellence in industrial settings.
The main limitation of the study is that it does not consider the geological conditions associated with rock formations. Geological factors have a significant impact on drilling performance and on the wear patterns of drill bits. Additionally, the study does not address the variability in drilling parameters encountered during actual drilling operations. In real-life scenarios, drilling parameters often change due to various factors, making it difficult to control these variables consistently. Consequently, future research should prioritize the development of systems that can adapt to dynamic drilling conditions by incorporating geological data and accounting for fluctuations in drilling parameters. Such an approach would enhance the system's robustness and practicality in real-world settings.

Conclusions
This study presents a novel method to enhance the interpretability of drill bit wear vibration data using explainable artificial intelligence (XAI) algorithms. By incorporating XAI techniques, we obtained valuable insights into the decision-making process of the ResNet-50 model. Grad-CAM identified the frequencies in the signals that were indicative of drill bit wear. Under normal conditions, the model primarily focused on frequencies within the ranges of 0-1000 Hz and 4000-6000 Hz. Under abnormal conditions, the model's attention shifted to the frequency range of 2500-6000 Hz. The results obtained from the Grad-CAM method demonstrate a level of consistency and reliability comparable to that of conventional signal analysis. This similarity underscores the trustworthiness and robustness of the model's predictions. The interpretability provided not only instills confidence in the model's predictions but also serves as a valuable diagnostic tool for domain experts and engineers involved in drilling operations. It forms a foundation upon which further research and development can be built, ultimately improving our understanding of drill bit failures in rock drilling operations and our ability to mitigate them.

Figure 1. The proposed method for drill bit failure detection diagnosis with the application of a deep learning model for classification and Grad-CAM for vibrational diagnosis.

Figure 2. Residual block of ResNet-50 models with skip connections to solve the vanishing gradient problem in deep networks.

Figure 3. Drill bits used in the experiments: (a) normal and (b) abnormal.

Figure 4. Signal analysis for the normal drill bit: (a) time series analysis and (b) frequency analysis.

Figure 5. Signal analysis for the abnormal drill bit: (a) time series analysis and (b) frequency analysis.

Figure 6. Statistical features for the normal and abnormal conditions.

Figure 7. Visualization of the normal and abnormal data in two dimensions using the t-SNE technique. The normal drill bit is represented in blue and the abnormal one in orange.

Figure 8. ResNet-50 model confusion matrix: green indicates true positives and true negatives, while pink denotes false positives and false negatives.

Figure 9. Visualization of the ResNet-50 model's activations from the last 10 convolutional layers using the t-SNE method. The normal drill bit is depicted in blue, while the abnormal one is shown in orange.

Table 1. Rock drill machine specifications.

Table 3. Results of the ResNet-50 model.