To address the above challenges, this paper proposes a shapelet-based deep learning framework that integrates local pattern matching and multi-scale spatiotemporal modeling to achieve robust bearing fault diagnosis. Rather than treating noise suppression as an independent preprocessing task, the proposed method embeds adaptive shapelet learning into the front-end representation stage, enabling the model to explicitly capture physically meaningful transient patterns that are insensitive to amplitude variations and impulsive disturbances. These local representations are further refined through gated convolutional feature extraction and bidirectional temporal modeling to enhance stability under complex industrial noise conditions.
3.1. Adaptive Multi-Scale Shapelet Extraction
Shapelets are discriminative subsequences that capture essential local patterns for distinguishing between different fault classes. Unlike traditional black-box feature extraction methods, shapelets provide interpretable representations that can be directly related to physical fault characteristics. The architecture of the adaptive multi-scale shapelet extraction module is shown in
Figure 2.
The shapelet lengths are selected based on the characteristic periodicity of bearing fault impulses. At a sampling rate of 12 kHz and a shaft speed of 1797 RPM, one revolution corresponds to approximately 400 samples. As a representative example, the ball-pass frequency outer race (BPFO) impulse period is about 80 samples. Therefore, the chosen length range spans from a fraction of the BPFO period up to several periods. Short shapelets (16–64 samples) capture localized impulsive transients and their immediate decay, whereas longer shapelets (128–256 samples) provide sufficient context to represent repeated impact patterns over multiple BPFO cycles. This multi-scale design avoids committing to a single fault period and improves robustness when the effective impulse spacing varies with operating conditions and noise contamination.
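The length-selection arithmetic above can be checked in a few lines (a sketch: the sampling rate, shaft speed, and 80-sample BPFO period come from the text, while the specific scale lists are illustrative):

```python
# Samples per revolution at the stated operating point:
# 12 kHz sampling and 1797 RPM shaft speed (values from the text).
fs = 12_000                              # sampling rate [Hz]
rpm = 1797                               # shaft speed [rev/min]
samples_per_rev = fs / (rpm / 60)        # ~400 samples per revolution

bpfo_period = 80                         # ~80 samples per BPFO impulse (from the text)

# Illustrative multi-scale shapelet lengths spanning a fraction of the
# BPFO period up to several periods.
short_scales = [16, 32, 64]              # single impulses and their decay
long_scales = [128, 256]                 # multiple BPFO cycles of context

print(round(samples_per_rev))            # 401
```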
The quality of the initial shapelets affects the convergence and final performance of the model. Random initialization often leads to suboptimal solutions due to the non-convex nature of the optimization landscape. To address this issue, the K-means++ algorithm is employed for shapelet initialization. Using a sliding-window approach, subsequences of multiple lengths are extracted from the input signals and clustered with K-means++, and the resulting cluster centers serve as the initial shapelet patterns. This initialization ensures that the initial shapelet candidates are diverse and reflect actual signal patterns, which allows the subsequent gradient-based optimization to converge more rapidly.
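A minimal NumPy sketch of this initialization (the window stride, subsequence length, cluster count, and synthetic test signal are illustrative; only the K-means++ seeding step is shown, with the subsequent Lloyd iterations omitted):

```python
import numpy as np

def extract_subsequences(signal, length, stride=8):
    """Sliding-window subsequences of a 1-D signal."""
    n = (len(signal) - length) // stride + 1
    return np.stack([signal[i * stride : i * stride + length] for i in range(n)])

def kmeanspp_init(subseqs, k, rng):
    """K-means++ seeding: pick diverse subsequences as initial shapelets."""
    centers = [subseqs[rng.integers(len(subseqs))]]
    for _ in range(k - 1):
        # Squared distance of every candidate to its nearest chosen center;
        # far-away candidates are sampled with higher probability.
        d2 = np.min([np.sum((subseqs - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(subseqs[rng.choice(len(subseqs), p=d2 / d2.sum())])
    return np.stack(centers)

rng = np.random.default_rng(0)
sig = np.sin(np.linspace(0, 40 * np.pi, 4096)) + 0.1 * rng.standard_normal(4096)
subs = extract_subsequences(sig, length=64)
init_shapelets = kmeanspp_init(subs, k=8, rng=rng)
print(init_shapelets.shape)  # (8, 64)
```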
In this module, normalized Euclidean distance with local Z-score normalization is employed to achieve scale-invariant pattern matching. For each location in the input signal, the local mean and standard deviation are computed within a sliding window. The normalized local subsequences are then compared against the shapelet template using a correlation-based distance formula:
$$d_i = \sqrt{2L\left(1 - \rho_i\right)} \qquad (1)$$

In Equation (1), $L$ represents the length of the shapelet, and $\rho_i$ represents the Pearson correlation coefficient between the normalized subsequence at position $i$ and the shapelet $s$. The shapelet $s$ is pre-normalized to zero mean and unit variance, so the correlation simplifies to an inner product form:

$$\rho_i = \frac{1}{L}\sum_{j=1}^{L} s_j\,\hat{x}_{i+j-1} \qquad (2)$$

where $s_j$ is the $j$-th element of the shapelet, $\hat{x}_{i+j-1}$ is the locally normalized input, and the sum $\sum_{j=1}^{L} s_j\,\hat{x}_{i+j-1}$ corresponds to the convolution output at position $i$.
This formula can be efficiently calculated by performing convolution operations using shapelets as convolution kernels. This correlation-based distance measurement ensures robustness against amplitude scaling and offset variations, which is particularly effective for fault mode matching under different operating conditions of CNC machines.
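A NumPy sketch of this convolution-based matching (assuming the $d_i = \sqrt{2L(1-\rho_i)}$ relation between normalized Euclidean distance and Pearson correlation; the test signal, with an exact copy of the template embedded at position 100, is synthetic):

```python
import numpy as np

def shapelet_match(signal, shapelet):
    """Sliding Pearson correlation (and normalized distance) between a
    z-normalized shapelet and locally z-normalized windows of the signal."""
    L = len(shapelet)
    s = (shapelet - shapelet.mean()) / shapelet.std()   # pre-normalized template
    ones = np.ones(L)
    # Local sums via convolution with a ones kernel, shared across shapelets.
    win_sum = np.convolve(signal, ones, mode="valid")
    win_sq = np.convolve(signal ** 2, ones, mode="valid")
    mu = win_sum / L
    sigma = np.sqrt(np.maximum(win_sq / L - mu ** 2, 1e-12))
    # One correlation pass of the raw signal against the template.
    xcorr = np.correlate(signal, s, mode="valid")
    rho = (xcorr - mu * s.sum()) / (L * sigma)          # windowed Pearson correlation
    dist = np.sqrt(np.maximum(2 * L * (1 - rho), 0.0))  # normalized Euclidean distance
    return rho, dist

rng = np.random.default_rng(1)
sig = rng.standard_normal(512)
shp = sig[100:164].copy()               # an exact copy embedded at position 100
rho, dist = shapelet_match(sig, shp)
print(int(np.argmax(rho)))              # 100: the embedded copy is the best match
```

Because the template has zero mean and unit variance, the cross-correlation term is the only per-shapelet pass; the local mean and variance come from the shared ones-kernel convolutions.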
Bearing fault characteristics exhibit different properties at different time scales: short-term patterns capture impulsive fault signatures, while long-term patterns reveal periodic fault behavior. To capture these multi-scale characteristics, we adopt shapelets of different lengths. Distance sequences are calculated at each scale, unified to the same length through adaptive pooling, and finally concatenated to form the shapelet representation.
Distinctive features may manifest at different scales for different fault types. To let the model focus on the effective scales for each fault type, learnable scale weights are introduced. These weights are optimized together with the other model parameters through backpropagation, allowing the model to adaptively emphasize the scales that are discriminative for the classification task.
To unify the multi-scale distance sequences before fusion, adaptive average pooling is applied. Given $S$ different shapelet lengths ($l_1, l_2, \ldots, l_S$), the distance sequences at each scale $s$ are first aligned to a common length through adaptive pooling. The learnable scale weights are computed using the softmax function to ensure they sum to one:

$$\alpha_s = \frac{\exp(w_s)}{\sum_{s'=1}^{S} \exp(w_{s'})} \qquad (3)$$

where $w_s$ are learnable parameters initialized to equal values, ensuring all scales contribute equally at the beginning of training. The scale weights $\alpha_s$ in Equation (3) are global trainable parameters shared across all samples and remain fixed during inference. Fault-type adaptation is achieved by the input-dependent multi-scale shapelet responses and the subsequent gated CNN fusion in Section 3.2 (Equation (8)), rather than by dynamically adjusting $\alpha_s$ per sample. These weights are jointly optimized with the other model parameters through backpropagation. The weighted multi-scale features are then concatenated along the channel dimension:

$$D = \mathrm{concat}\left(\alpha_1 D_1, \alpha_2 D_2, \ldots, \alpha_S D_S\right) \qquad (4)$$
This weighted concatenation mechanism allows the model to adaptively emphasize the scales that are most discriminative for each fault type.
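A sketch of this weighted fusion under assumed sizes (three scales, eight shapelets per scale, distance sequences already pooled to a common length; the softmax over learnable logits follows Equation (3)):

```python
import numpy as np

def softmax(w):
    e = np.exp(w - w.max())
    return e / e.sum()

rng = np.random.default_rng(0)
S, K, T = 3, 8, 32                     # scales, shapelets per scale, pooled length
pooled = [rng.random((K, T)) for _ in range(S)]  # distance maps after pooling

w = np.zeros(S)                        # learnable logits, equal at initialization
alpha = softmax(w)                     # scale weights summing to one

# Weight each scale, then concatenate along the channel dimension.
fused = np.concatenate([alpha[s] * pooled[s] for s in range(S)], axis=0)
print(alpha, fused.shape)              # equal 1/3 weights, shape (24, 32)
```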
Then the distance sequence is converted into similarity scores using an exponential kernel as follows:

$$\mathrm{sim}_i = \exp(-\gamma\, d_i) \qquad (5)$$

In this equation, $\gamma$ represents a scaling parameter that controls the sensitivity to changes in distance. This transformation maps the distance values to similarity scores in $(0, 1]$, with higher values indicating a closer match between the signal and the shapelet pattern.
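The exponential-kernel mapping is a one-liner; $\gamma = 0.5$ here is an assumed value of the scaling parameter:

```python
import numpy as np

gamma = 0.5                               # assumed sensitivity parameter
d = np.array([0.0, 0.5, 1.0, 2.0, 4.0])  # example distance values
sim = np.exp(-gamma * d)                  # similarity scores in (0, 1]
print(sim.round(3))                       # monotonically decreasing, sim = 1 at d = 0
```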
Time Complexity of Shapelet Matching. As shown in Equation (1), we utilize convolutional operations for efficient sliding-window matching. For an input signal of length $N$ with $K$ shapelets at each of $S$ different scales (shapelet lengths $l_1, \ldots, l_S$), the naive sliding-window Euclidean distance computation would require $O(N K S \bar{l})$ multiplications per sample, where $\bar{l}$ is the average shapelet length. Our method reformulates the normalized distance computation using convolution operations. The key insight is that the squared normalized Euclidean distance can be decomposed as in Equation (1). The critical efficiency gains are:

The local sums $\sum x$ and $\sum x^2$ (required for the local mean and variance) are computed via convolution with a ones kernel, which is shared across all shapelets. This reduces the normalization overhead from $O(N K S \bar{l})$ to $O(N S \bar{l})$.

Each shapelet-signal correlation is computed via a single conv1d operation with complexity $O(N l_s)$, but modern deep learning frameworks execute this as a highly parallelized matrix operation rather than sequential window enumeration.
3.2. Gated Parallel CNN
The gated parallel CNN module is designed to extract multi-scale spatial features from the shapelet feature maps while adaptively suppressing noise interference. This module adopts a gating mechanism that selectively filters out irrelevant information, thus enhancing the model’s robustness in noisy environments.
The key of the gating mechanism is the combination of a convolution path used for feature extraction and a gating path that generates values between 0 and 1 to control the flow of information. The gated convolution operation can be described as follows:

$$Y = (W * X + b) \odot \sigma(W_g * X + b_g) \qquad (6)$$

In this equation, $W$ and $b$ represent the main convolution kernels and biases, $W_g$ and $b_g$ are the gate convolution kernels and biases, and $\sigma$ denotes the sigmoid activation function. Produced by the sigmoid-activated gating path, values in the range $(0, 1)$ act as soft gates. This mechanism helps suppress noise-related features while retaining fault-related information.
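A single-channel NumPy sketch of the gated convolution (the kernel size of 5 and the random parameters are illustrative; real layers would have multiple channels):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1d(x, w, b):
    """'Valid' 1-D convolution in the cross-correlation form used by DL frameworks."""
    return np.correlate(x, w, mode="valid") + b

rng = np.random.default_rng(0)
x = rng.standard_normal(128)                          # one input channel, toy size
w_main, b_main = rng.standard_normal(5) * 0.1, 0.0    # main-path parameters
w_gate, b_gate = rng.standard_normal(5) * 0.1, 0.0    # gating-path parameters

features = conv1d(x, w_main, b_main)                  # feature-extraction path
gates = sigmoid(conv1d(x, w_gate, b_gate))            # soft gates in (0, 1)
y = features * gates                                  # element-wise gated output
print(y.shape)                                        # (124,)
```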
We use multiple parallel branches with different convolution kernel sizes, each capturing patterns at a different time scale. The outputs of these branches are concatenated to form the final feature map. To enable adaptive fusion of these multi-scale features based on input characteristics, we introduce learnable gate weights computed through global average pooling (GAP). Given $B$ parallel branches with feature outputs $F_1, F_2, \ldots, F_B$, we first concatenate them and apply global average pooling to obtain a compact representation:

$$z = \mathrm{GAP}\left(\mathrm{concat}(F_1, F_2, \ldots, F_B)\right) \qquad (7)$$

where $z$ contains the pooled features from all branches. The gate weights are then computed through a fully connected layer followed by sigmoid activation:

$$g = \sigma(W z + b) \qquad (8)$$

where $W$ and $b$ are learnable parameters, and $\sigma$ denotes the sigmoid activation function. The gate values $g$ act as soft attention weights that control the contribution of each branch. The final fused features are obtained by applying the gate weights element-wise and concatenating:

$$F = \mathrm{concat}\left(g_1 \odot F_1, g_2 \odot F_2, \ldots, g_B \odot F_B\right) \qquad (9)$$
This gated fusion mechanism enables the model to dynamically adjust the contribution of each scale based on the input signal characteristics. In noisy environments, scales that are more susceptible to noise contamination are automatically down-weighted, thereby enhancing the model’s robustness against different noise types.
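The GAP-based gated fusion can be sketched as follows (three branches with four channels each and a randomly initialized fully connected layer; all sizes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
B, C, T = 3, 4, 32                       # branches, channels per branch, time steps
branches = [rng.standard_normal((C, T)) for _ in range(B)]

concat = np.concatenate(branches, axis=0)  # (B*C, T)
z = concat.mean(axis=1)                    # global average pooling -> (B*C,)

W = rng.standard_normal((B, B * C)) * 0.1  # FC layer: one gate value per branch
b = np.zeros(B)
g = sigmoid(W @ z + b)                     # soft attention weight per branch

fused = np.concatenate([g[i] * branches[i] for i in range(B)], axis=0)
print(fused.shape)                         # (12, 32)
```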
3.3. Residual-Enhanced Bidirectional LSTM Network
After convolutional feature extraction, temporal dependencies are modeled using a BiLSTM with residual connections. This bidirectional design enables the model to capture patterns from both directions, which is useful for periodic fault signatures and repeated impacts.
The bidirectional architecture processes the sequence in both forward and backward directions simultaneously, and the hidden states are concatenated at each time step:

$$h_t = \left[\overrightarrow{h}_t;\, \overleftarrow{h}_t\right] \qquad (10)$$

where $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ are the forward and backward hidden states, respectively. This bidirectional design enables the model to capture patterns dependent on both past and future context, which is particularly beneficial for identifying fault signatures that manifest as symmetric or periodic patterns.
Since the input dimension typically differs from the BiLSTM output dimension, we introduce a linear projection layer to enable the residual connection. The enhanced output with residual connection and layer normalization is computed as:

$$Y = \mathrm{LayerNorm}\left(H + X W_p\right) \qquad (11)$$

where $W_p$ is the projection matrix that aligns the input dimension with the BiLSTM output, $H$ is the BiLSTM output, and $X$ is the input sequence. This residual design facilitates gradient flow during training and helps preserve information from earlier processing stages, improving the overall stability and convergence of the deep network.
3.4. Composite Loss Function
Bearing fault signals are often similar across different fault categories. The cross-entropy loss function only focuses on the classification results and does not distinguish similar categories in the feature space. Therefore, a contrastive loss is added to enhance feature discriminability.
Since the model employs shapelet extraction, features at different scales should represent the same fault type, but they may learn inconsistent patterns. Therefore, a consistency loss is added to maintain feature consistency across different scales.
Weighted cross-entropy loss is the main classification loss. Classes are weighted according to their inverse frequencies. This helps handle cases where certain fault-type samples are scarce.
Supervised contrastive loss pulls features of the same class together and pushes features of different classes apart, using temperature-scaled cosine similarity. This makes the learned features more distinguishable, especially in noisy environments.
Multi-scale consistency loss encourages the similarity of features at adjacent scales. This loss term employs cosine similarity between features at different scales. This can prevent the model from learning contradictory patterns at different scales.
The total loss is a weighted combination of these three components:

$$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{CE}} + \lambda_1 \mathcal{L}_{\mathrm{SC}} + \lambda_2 \mathcal{L}_{\mathrm{MC}} \qquad (12)$$

In this equation, $\mathcal{L}_{\mathrm{CE}}$ represents the weighted cross-entropy loss, $\mathcal{L}_{\mathrm{SC}}$ represents the supervised contrastive loss, and $\mathcal{L}_{\mathrm{MC}}$ represents the multi-scale consistency loss, while $\lambda_1$ and $\lambda_2$ are hyperparameters that balance the contributions of the loss components.
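A numeric sketch of the weighted combination; the component loss values and $\lambda_1$ are placeholders, while $\lambda_2 = 0.02$ matches the consistency-loss weight discussed below:

```python
# Placeholder component losses for illustration; lambda_1 = 0.1 is a
# hypothetical value, lambda_2 = 0.02 matches the weight stated in the text.
L_ce, L_sc, L_mc = 0.85, 2.10, 0.40      # cross-entropy, contrastive, consistency
lambda_1, lambda_2 = 0.1, 0.02

L_total = L_ce + lambda_1 * L_sc + lambda_2 * L_mc
print(round(L_total, 3))                 # 1.068
```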
The composite loss function involves two key hyperparameters: $\lambda_1$ for the supervised contrastive loss and $\lambda_2$ for the multi-scale consistency loss. We conducted a grid search on the validation set to determine the optimal values, with results shown in Table 1.
Based on these results, we selected the values of $\lambda_1$ and $\lambda_2$ for the following reasons:

Contrastive loss weight ($\lambda_1$): This value provides sufficient regularization to enhance inter-class separability without dominating the cross-entropy loss. When $\lambda_1$ is too large, training becomes unstable due to competing gradients between the classification and contrastive objectives; when $\lambda_1$ is too small, the contrastive regularization effect is negligible, resulting in less discriminative feature representations under noise.

Consistency loss weight ($\lambda_2 = 0.02$): The multi-scale consistency loss encourages coherent representations across different temporal scales. A moderate weight of 0.02 enforces this constraint without over-regularizing the model. Higher values tend to force excessive similarity between scales, reducing the model’s ability to capture scale-specific fault patterns.