1. Introduction
In recent years, fault diagnosis of rolling bearings has attracted attention due to its significance in maintaining industrial machinery. Traditional methods often struggle with noisy signals, leading to poor feature extraction. To address this, we propose an Ascending-Dimensional Convolutional Neural Network (ADCNN), which utilizes a larger convolutional kernel in the first layer and introduces a dimensional enhancement technique. These adjustments allow the network to better handle noisy environments, providing improved accuracy in fault diagnosis, even under challenging conditions.
With the advance of industrialization and the rapid development of science and technology, rotating equipment is moving toward higher speeds, larger scales, and greater automation [1]. Rolling bearings are not only essential components of rotating machinery but also vulnerable components in mechanical equipment. The health status of bearings has a significant impact on the performance, smooth operation, and service life of mechanical equipment [2,3]. Statistics have shown that about 44% of equipment failures are caused by rolling bearing failures [4]. Once a bearing fails, it not only affects the operating efficiency of mechanical equipment but also causes economic losses and may even threaten human safety. Therefore, the fault diagnosis of rolling bearings has always been a focus of researchers [5,6,7].
The fault diagnosis of rolling bearings can be roughly divided into the following steps: (1) acquisition of vibration signals, (2) signal preprocessing, (3) extraction of useful feature information from the signals, and (4) fault modeling and diagnosis [8]. Traditional fault diagnosis methods are mainly based on signal analysis theory and are realized by improving the signal processing method. Wang et al. proposed a sparse guided empirical wavelet transformation method and successfully identified the resonance frequency band [9]. Kiral and Karagülle proposed a rolling bearing defect detection method for single and multiple defects based on finite element vibration analysis and established the relationship between bearing failure and vibration response [10]. Jiang et al. proposed an initial center frequency (ICF)-guided variational mode decomposition (VMD) method to accurately extract weak damage features of rotating machinery [11]. Traditional signal processing methods are of great significance in the fault diagnosis of rolling bearings and have provided important guidance for subsequent research. However, it is difficult to extract features manually using traditional fault diagnosis methods, and the work is further complicated if the bearing vibration signals are too complex [12]. Addressing this limitation has therefore become a key research objective.
Recent advancements in traditional methodologies continue to push the boundaries of fault diagnostics. For instance, Yang et al. [13] provided an in-depth analytical model for the time-varying excitation mechanism of bearing surface defects, offering a fundamental theoretical understanding that underpins many vibration-based diagnosis approaches. Furthermore, advanced signal processing techniques, such as the wavelet-supported residual processing method proposed by Melluso et al. [14] for the extraction of subtle torque faults in complex hybrid electric powertrains, demonstrate significant potential for robust extraction under challenging, noisy conditions. Despite these sophisticated developments, the reliance on expert knowledge for manual feature design and the computational complexity involved in processing highly non-stationary signals remain inherent limitations.
In recent years, data-driven bearing fault diagnosis methods have gradually developed. For example, rolling bearing fault diagnosis methods based on a multi-layer perceptron (MLP) [15,16] have helped to address the problems outlined above. Among the many data-driven methods, those based on deep learning have achieved promising results in bearing fault diagnosis [17]. Commonly used deep learning methods include transfer learning [18,19], deep residual networks (ResNets) [20,21], generative adversarial networks [22], deep belief networks [23], and convolutional neural networks (CNNs) [24]. They have been successfully applied to computer vision [25], natural language processing [26], medical image diagnosis [27], autonomous driving [28], etc. Janssens et al. [29] carried out early research on fault diagnosis based on the CNN architecture, converting the original signal into a frequency-domain signal through the Discrete Fourier Transform (DFT) and inputting it into the CNN for model training. Compared to traditional diagnostic methods, the accuracy was improved by 6%. Xue et al. [30] extracted deep features from parallel multi-channel structures of 1D-CNN and 2D-CNN architectures to address the low accuracy and long iteration times of bearing fault diagnosis models. Zhao et al. proposed a new normalized CNN model for bearing fault diagnosis under complicated operating conditions in actual industrial scenarios [31]. In practical applications, there is a greater need for models that require less memory and offer higher prediction efficiency. Common lightweight models include SqueezeNet [32], Xception [33], MobileNet [34], ShuffleNet [35], and depthwise separable convolutional networks [36,37]. In actual operating conditions, bearings work in severe environments, and noise inevitably interferes with the vibration signals during acquisition. Noise-contaminated feature data greatly reduce fault diagnosis accuracy. Therefore, it is also necessary to improve the noise resistance of the network, that is, its ability to extract feature data from noisy signals and classify faults effectively.
A fault diagnosis method based on ADCNN is presented in this paper to address the issues outlined above. The contributions of this paper are summarized as follows:
- (1) A super convolutional kernel is used in the first layer, and an RLTL module is introduced in the tail layer of the network to improve the overall noise resistance and achieve a lightweight design.
- (2) Multiple channel data sources are combined in parallel into two-dimensional data. A 2D-CNN operating on the spatial dimensions of this two-dimensional representation, together with the weight-sharing mechanism, handles the information interaction problem while reducing the number of channels in the height direction and the feature information in the width direction and increasing the number of channels in the depth direction. This keeps the overall network lightweight and improves its noise resistance.
- (3) The grouped convolution method is introduced to reduce the model’s parameters. The increased channels are divided into three groups through a shuffle operation, further improving the feature extraction ability of the convolution layer and the diagnosis accuracy of the network.
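The grouping and shuffle idea in contribution (3) can be illustrated with a minimal sketch. It assumes the standard ShuffleNet-style channel shuffle (reshape, transpose, flatten) and a bias-free 1D convolution. The function names and the channel counts (6 and 12 channels, kernel size 3) are illustrative assumptions and are not taken from the ADCNN architecture.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups (reshape -> transpose -> flatten)."""
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(channels) // groups
    # View the flat channel list as a (groups, per_group) grid.
    grid = [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]
    # Read the grid column by column so each output block mixes all groups.
    return [grid[g][i] for i in range(per_group) for g in range(groups)]


def conv_weight_count(c_in, c_out, kernel_size, groups=1):
    """Number of weights (no bias) in a 1D convolution with `groups` groups."""
    return (c_in // groups) * c_out * kernel_size


# Hypothetical sizes chosen only for illustration.
shuffled = channel_shuffle(list(range(6)), groups=3)
dense = conv_weight_count(12, 12, 3)
grouped = conv_weight_count(12, 12, 3, groups=3)
```

With three groups, the grouped layer holds one third of the weights of the dense layer, and the shuffle lets the next grouped layer see channels that originated in every group.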
The rest of this paper is structured as follows. Section 2 covers the theoretical basis for improving the bearing fault diagnosis model. The architecture of the proposed bearing fault diagnosis network is described in detail in Section 3. A validation example of the model is presented in Section 3. The conclusion of this paper is presented in Section 4.
Although simple fault detection is sufficient for deciding whether a bearing should be replaced, detailed fault-type identification provides additional value in practical engineering applications. Specifically, distinguishing between inner-race, outer-race, and rolling-element faults is beneficial for root-cause analysis, maintenance strategy optimization, and fault progression monitoring. Therefore, fine-grained fault diagnosis remains an important task in intelligent condition monitoring systems.
In addition to conventional CNN-based diagnostic pipelines, recent studies have explored transfer learning strategies for bearing diagnosis in industrial servomotor settings, as well as multi-size, wide-kernel convolution designs to enhance multi-scale feature extraction. Compared with these approaches, our ADCNN focuses on (i) improving noise robustness via a large-kernel frontend, (ii) reducing parameters through the ascending-dimension 2D weight-sharing design, and (iii) compressing the classifier using the RLTL module. These design choices jointly target robust and lightweight deployment.
Conventional 1D-CNN bearing diagnosis models typically rely on the stacking of 1D convolutions and frequent 1 × 1 pointwise convolutions to achieve cross-channel fusion. This often increases parameters rapidly with the channel number and does not explicitly exploit structured inter-channel correlations. In contrast, our ascending-dimension design reshapes the concatenated multi-channel 1D vibration segments into a 2D representation, enabling 2D convolutions to jointly model (i) temporal patterns along the signal axis and (ii) inter-channel correlation patterns along the newly formed spatial axis. Because 2D convolution shares weights across spatial locations, the proposed formulation provides stronger feature interaction with fewer parameters than repeated 1D fusion blocks, which is particularly beneficial under low-SNR conditions and for resource-constrained deployments.
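The parameter argument above can be made concrete with a small counting sketch. It assumes standard convolution parameter formulas (weights plus one bias per output channel). The layer sizes (16 output channels, kernel size 3, and 4 to 64 stacked signal rows) are hypothetical and do not describe the actual ADCNN layers.

```python
def conv1d_params(c_in, c_out, k):
    """Weights plus biases of a standard 1D convolution."""
    return c_in * c_out * k + c_out


def conv2d_params(c_in, c_out, k_h, k_w):
    """Weights plus biases of a standard 2D convolution."""
    return c_in * c_out * k_h * k_w + c_out


# Hypothetical sizes: stack `rows` signal segments and extract 16 feature maps.
counts = {}
for rows in (4, 16, 64):
    as_channels = conv1d_params(rows, 16, 3)  # rows treated as 1D input channels
    as_plane = conv2d_params(1, 16, 3, 3)     # rows stacked into one 2D plane
    counts[rows] = (as_channels, as_plane)
```

In this sketch the 1D layer grows linearly with the number of stacked rows, whereas the 2D layer reuses the same 3 × 3 kernels at every row position, so its parameter count stays fixed.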
Although this work focuses on robust single-domain training under a low SNR, the proposed ADCNN backbone can be naturally extended with transfer learning schemes to handle cross-machine or cross-condition adaptation in future work.
3. Results and Discussion
To verify the effectiveness and advantages of the bearing fault diagnosis model in different noise environments, two bearing fault datasets are taken as examples to conduct bearing fault diagnosis using the proposed model. The experimental environment is as follows: a computer running the Windows 10 operating system, with an Intel(R) Core(TM) i7-10875H CPU, 16 GB of RAM, and an NVIDIA RTX 2060 GPU. The network was implemented in PyTorch 1.7.1, and the programming language is Python 3.8.
3.1. Experimental Setup and Hyperparameter Configuration
The network input is a normalized 1D vibration signal segment with a fixed length of 1024 points, formatted as a tensor of size [Batch Size, 1, 1024]. The output is a probability distribution over the fault classes (10 for CWRU, 4 for SJZU). The model was trained using the Adam optimizer with an initial learning rate of 0.001, β1 = 0.9, and β2 = 0.999. Cross-entropy loss was used as the objective function. Training proceeded for a maximum of 100 epochs with a batch size of 64. To prevent overfitting, we employed early stopping (patience = 10 epochs based on validation loss) and L2 weight decay (coefficient = 1 × 10⁻⁴). The learning rate was reduced by a factor of 0.5 if the validation loss plateaued for five consecutive epochs.
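The early-stopping and learning-rate schedule described above can be sketched as follows. This is a simplified illustration in plain Python, not the authors’ training code and not PyTorch’s scheduler. The class name `PlateauMonitor` and the synthetic validation losses are assumptions made for the example; only the numeric settings (initial rate 0.001, factor 0.5, patience 5 and 10, at most 100 epochs) come from the text.

```python
class PlateauMonitor:
    """Tracks validation loss, halves the learning rate on plateaus,
    and signals when early stopping should trigger."""

    def __init__(self, lr=1e-3, lr_patience=5, lr_factor=0.5, stop_patience=10):
        self.lr = lr
        self.lr_patience = lr_patience
        self.lr_factor = lr_factor
        self.stop_patience = stop_patience
        self.best = float("inf")
        self.stale = 0  # epochs since the last improvement

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.stale = 0
        else:
            self.stale += 1
            # Every `lr_patience` stale epochs, reduce the learning rate.
            if self.stale % self.lr_patience == 0:
                self.lr *= self.lr_factor
        return self.stale >= self.stop_patience


# Synthetic validation losses (illustrative only): improvement, then a plateau.
losses = [1.0, 0.8, 0.7] + [0.7] * 20
monitor = PlateauMonitor()
stopped_epoch = None
for epoch, loss in enumerate(losses[:100], start=1):  # at most 100 epochs
    if monitor.step(loss):
        stopped_epoch = epoch
        break
```

In this toy run the loss stops improving after epoch 3, the learning rate is halved twice, and training halts at epoch 13.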
To test the model’s robustness under realistic operating conditions, Gaussian white noise of varying intensities (SNR from −5 dB to 5 dB) was added to the original vibration data. This noise simulates the interference commonly encountered in industrial environments, and the model’s ability to perform fault diagnosis under these conditions was thoroughly tested. Sliding window sampling was also employed to ensure signal continuity and avoid class imbalance, further enhancing the training dataset. By using k-fold cross-validation, we ensured that the performance of the fault diagnosis model was not overly dependent on a single dataset split. This helped account for variations in the data, such as signal noise or different fault types, improving the robustness of the model and its ability to generalize to new, unseen data.
For reproducibility, each epoch consisted of ⌈N_train/batch_size⌉ iterations. We also fixed the random seed for data splitting and training. The implementation was based on PyTorch (version specified in the final submission) and executed on the hardware described above. During inference, the trained ADCNN takes an unseen normalized vibration segment of size [Batch Size, 1, 1024] as input and outputs a probability distribution over fault classes via the Softmax layer. The predicted fault category is determined by the maximum-probability class. This inference process supports online diagnosis once the model has been trained offline.
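The iteration count and the inference rule above can be sketched in a few lines. This is a minimal illustration under assumptions: the logits are made-up values standing in for the network’s final-layer outputs, and the helper names are hypothetical.

```python
import math


def iterations_per_epoch(n_train, batch_size=64):
    """Number of mini-batches per epoch: ceil(N_train / batch_size)."""
    return math.ceil(n_train / batch_size)


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return (index of the maximum-probability class, probabilities)."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__), probs


n_iters = iterations_per_epoch(1000)       # hypothetical training-set size
label, probs = predict([1.0, 3.0, 2.0])    # hypothetical logits for 3 classes
```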
3.2. Data Preprocessing
In this example, both datasets are divided into training sets and test sets in an 8:2 ratio. Data augmentation and standardization are performed after the dataset is divided to prevent information leakage from the test set. First, the sample length is determined from the known experimental parameters: the number of sampling points in one revolution of the bearing is calculated using Formula (18):
$$N = \frac{60 f_s}{n} \tag{18}$$
where $f_s$ is the sampling frequency (Hz), $n$ represents the running speed of the bearing (r/min), and $N$ is the calculated sample length. After calculating the number of sampling points, a value greater than this number is selected as the sample length. At the same time, the same number of samples is used for each operating state of the bearings in the training set and the test set to prevent class imbalance in the training samples.
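As a quick check of the per-revolution calculation, the sketch below divides the sampling frequency by the number of revolutions per second. The operating parameters are those reported for the two datasets in this paper (12 kHz at 1797 r/min for CWRU; 16,384 Hz at a rotation frequency of 20 Hz for SJZU). The function name is an illustrative assumption.

```python
def points_per_revolution(fs_hz, speed_rpm):
    """Sampling points recorded during one shaft revolution."""
    return 60.0 * fs_hz / speed_rpm


cwru = points_per_revolution(12_000, 1797)     # CWRU: 12 kHz, 1797 r/min
sjzu = points_per_revolution(16_384, 20 * 60)  # SJZU: 16,384 Hz, 20 Hz rotation

sample_length = 1024  # chosen to exceed both per-revolution counts
```

Both values fall below 1024, so each 1024-point sample covers at least one full revolution on either test rig.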
The accuracy of a neural network model largely depends on the number of samples in the training set: the more training samples, the better the training effect. Therefore, a sliding window overlapping sampling method is used for data augmentation in this experiment, and the process is shown in Figure 9. This method better preserves the continuity of the vibration signals and respects their periodicity. The overlapping sampling formula is expressed as follows:
$$m = \left\lfloor \frac{N - L}{S} \right\rfloor + 1, \qquad x_i = x\left[(i-1)S + 1 : (i-1)S + L\right]$$
where m is the number of samples, N is the total length of the signal, $x_i$ is the i-th sample after sampling, $L$ is the length of each sample, $i = 1, 2, \ldots, m$, and S is the length of the sliding window (the shift between consecutive samples).
The sliding window sampling method generates overlapping samples from the original dataset, effectively increasing the number of samples available for training. It should be emphasized that this approach increases sample redundancy rather than intrinsic data diversity: the model encounters more examples, but the variety of information presented to it remains the same. Its primary benefit is that it allows the model to learn the temporal relationships within the data more effectively, particularly when the dataset is small. It should not be confused with methods that increase the diversity of the training data.
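A minimal sketch of overlapping sliding-window sampling is given below. It assumes a sample length of 1024 points and a shift of 128 points between consecutive windows, which is one reading of the values used later in the paper. The 4096-point integer signal is a stand-in for a real vibration record.

```python
def sliding_windows(signal, length=1024, step=128):
    """Cut overlapping fixed-length windows, moving `step` points each time."""
    if len(signal) < length:
        return []
    count = (len(signal) - length) // step + 1
    return [signal[i * step:i * step + length] for i in range(count)]


signal = list(range(4096))  # placeholder signal; values equal their indices
windows = sliding_windows(signal)
overlap = len(set(windows[0]) & set(windows[1]))  # shared points of neighbours
```

Here 25 windows are produced from 4096 points, and neighbouring windows share 896 of their 1024 points, which illustrates why the method adds redundancy rather than new information.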
In practice, rolling bearings often operate in relatively complicated environments, so noise inevitably interferes with the vibration signals. Gaussian white noise of different intensities is added during data preprocessing to investigate the noise resistance of the proposed network model in different noise environments representative of actual operating conditions. Its intensity is measured by the signal-to-noise ratio (SNR) in decibels. The SNR calculation formula is expressed as follows:
$$\mathrm{SNR}_{\mathrm{dB}} = 10 \log_{10}\left(\frac{P_{\mathrm{signal}}}{P_{\mathrm{noise}}}\right)$$
where $P_{\mathrm{signal}}$ is the effective power of the signal and $P_{\mathrm{noise}}$ is the effective power of the noise. The lower the SNR value, the greater the energy of the noise interference and the greater the difficulty in identifying the original signal.
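Adding Gaussian white noise at a prescribed SNR can be sketched as follows. The noise variance is set to the signal power divided by 10^(SNR/10). The sinusoidal test signal, the fixed random seed, and the function names are assumptions made for this illustration and are not part of the paper’s pipeline.

```python
import math
import random


def mean_power(x):
    """Average power of a discrete signal."""
    return sum(v * v for v in x) / len(x)


def add_awgn(signal, target_snr_db, rng):
    """Return (noisy signal, noise) with white Gaussian noise at the target SNR."""
    noise_power = mean_power(signal) / (10 ** (target_snr_db / 10))
    sigma = math.sqrt(noise_power)
    noise = [rng.gauss(0.0, sigma) for _ in signal]
    return [s + n for s, n in zip(signal, noise)], noise


def measure_snr_db(signal, noise):
    """SNR in decibels computed from the signal and noise powers."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))


rng = random.Random(0)  # fixed seed for repeatability
clean = [math.sin(2 * math.pi * 50 * t / 12_000) for t in range(24_000)]
noisy, noise = add_awgn(clean, -5.0, rng)
measured = measure_snr_db(clean, noise)
```

At −5 dB the noise carries roughly three times the power of the signal, so the measured value should sit close to the target, with small deviations due to the finite number of noise samples.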
To further illustrate the influence of noise on vibration signals, representative raw signals under clean and noise-contaminated conditions were analyzed. For healthy bearings, the vibration signals exhibit relatively stable periodic patterns with low amplitude variation, whereas faulty bearings (inner-ring, outer-ring, and rolling element faults) show impulsive components and increased signal complexity. As the SNR decreases, these fault-related impulses become progressively masked by noise, leading to blurred time-domain characteristics. This phenomenon is observed consistently in both the CWRU and SJZU datasets and highlights the challenge of extracting discriminative features under severe noise interference.
For the CWRU dataset, labels are categorical indices (0–9), each of which corresponds to a pre-defined bearing health state (e.g., healthy or inner-ring fault with a specific damage size). We trained a single unified multi-class classifier that directly predicts the fault type/severity from raw vibration segments rather than constructing separate datasets for each fault cause. This setting aligns with practical condition monitoring, where a single deployed model is expected to differentiate multiple fault modes and severities.
The predicted fault type refers to the categorical classification output of the ADCNN model. Each category corresponds to a specific bearing condition, including healthy condition, inner-ring fault, outer-ring fault, and rolling element fault, as defined in Table 3 and Table 6.
Therefore, we did not construct separate datasets for different fault causes; instead, we used one unified multi-class dataset and trained a single classifier to directly identify the fault mode and severity.
3.3. Example 1: Network Model Validation Based on CWRU Dataset
3.3.1. Data Description
For our experiments, we used publicly available data from the Case Western Reserve University (CWRU) Bearing Data Center [39]. This dataset includes data from various bearing fault types, such as inner-ring faults, outer-ring faults, and rolling element faults. The experiment was conducted under a 0 HP load condition with a bearing speed of 1797 RPM. The data were sampled at a frequency of 12 kHz with a sample length of 1024 points. The dataset was divided into training and test sets in an 80:20 ratio, with each fault type having samples with damage sizes of 7 mil, 14 mil, and 21 mil. Given that the test set comprises less than 5% of the total data, k-fold cross-validation (with k = 5) was employed to mitigate the risk of evaluating the model based on a small and potentially unrepresentative test set. This technique ensures that each sample has a chance to be used as a test set, providing a more robust and generalized estimate of the model’s performance. The CWRU dataset was collected using a deep-groove ball-bearing test rig.
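The 5-fold procedure can be sketched with index bookkeeping alone. This is a generic illustration, not the authors’ splitting code: the sample count of 100, the seed, and the function name are hypothetical. Consistent with the preprocessing order described in this paper, it assumes the folds are formed before overlapping windows are extracted so that augmented windows do not straddle training and test data.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # fixed seed for repeatability
    folds = [indices[i::k] for i in range(k)]   # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(100, k=5))
all_test = sorted(idx for _, test in splits for idx in test)
```

Each sample appears in exactly one test fold, and the reported score would be the average over the five folds.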
The bearing type used in the experiment was SKF 6205-2RS JEM, with the following specifications: Inner diameter: 25 mm; Outer diameter: 52 mm; Width: 15 mm.
Faults were artificially introduced using electro-discharge machining (EDM), which produced localized defects on the inner ring, outer ring, and rolling elements. Fault diameters were 0.007 inches (0.178 mm), 0.014 inches (0.356 mm), and 0.021 inches (0.533 mm).
The motor load conditions correspond to different torque levels applied to the motor shaft. The 0 HP condition corresponds to a no-load motor condition with a nominal rotational speed of 1797 RPM.
The dataset used in this experimental validation comprises open-source data released by the CWRU Bearing Data Center. As shown in Figure 10, the test bench consists of four parts: a drive motor, a torque sensor, a power tester, and a console. In the bearing experiment, vibration signals were recorded under four load conditions (0 HP, 1 HP, 2 HP, and 3 HP). Under the 0 HP load condition, no external mechanical load is applied to the motor shaft while the bearing rotates at its nominal operating speed. This condition is commonly used as a baseline operating condition in the CWRU dataset.
These four loads correspond to different bearing speeds. In this paper, the signals under the 0 HP operating condition were selected for validation; the corresponding speed was 1797 r/min, and the sampling frequency was 12 kHz. The data include ten states: normal bearings, three types of outer-ring faults, three types of inner-ring faults, and three types of roller faults. The damage sizes for the three types of each fault are 7 mil, 14 mil, and 21 mil.
Based on the above, it is calculated that at least about 400 sampling points are passed during one revolution of the bearing. The final sample length was determined to be 1024 to better extract the feature information. In addition, a sliding window size of 128 was selected for the data enhancement operation. The details of the data samples after dataset enhancement are shown in Table 3.
The labels (0 to 9 for CWRU) are categorical, each representing a specific, pre-defined health state of the bearing (e.g., healthy or inner-ring fault of 0.007 inches). This comprehensive dataset encompassing multiple fault types and severities is essential for the training of a unified diagnostic model capable of directly identifying a wide range of potential failures from raw vibration signals, which aligns with the practical need for versatile condition monitoring systems in industry.
To verify the effectiveness and advantages of the proposed method for fault diagnosis in different noise environments, white Gaussian noise with a signal-to-noise ratio (SNR) ranging from −5 dB to +5 dB was added to the original dataset during data preprocessing to test the noise resistance and robustness of the model. This range simulates the noise interference typically encountered in industrial environments; the extreme level of −5 dB was chosen to test the model’s performance under challenging conditions.
3.3.2. Experimental Results and Analysis
We performed experiments using the publicly available dataset from the CWRU Bearing Data Center. The data were collected under a 0 HP load condition at a speed of 1797 RPM with a sampling frequency of 12 kHz, and each sample was 1024 points long. The fault types included in the dataset are inner-ring faults, outer-ring faults, and rolling element faults, each with damage sizes of 7 mil, 14 mil, and 21 mil. The dataset was split into training and test sets in an 80:20 ratio to ensure balance. In the preprocessing stage, the data were standardized and the sliding window method was applied for data augmentation, using a window size of 128 samples with 50% overlap. Furthermore, each combination of fault type and damage size was represented with multiple samples to increase dataset variability and robustness.
To verify the model, the diagnosis results of the ADCNN model proposed in this paper are compared with those of several traditional network models: the lightweight MLP_0 model, the conventional CNN model, and the CNN1D_N×3Max model with good noise resistance. Table 4 shows the sizes and prediction speeds of different models. Table 5 and Figure 12 present comparisons of the diagnostic accuracies of different models, and Figure 13 reveals the model accuracy under different numbers of epochs.
As seen in Table 5, Figure 11, and Figure 12, ADCNN consistently outperforms traditional models, especially under low-SNR conditions. With a diagnostic accuracy exceeding 99% at higher SNR levels (0 dB and above), ADCNN’s performance in feature extraction and fault classification significantly surpasses that of the MLP_0 model and conventional CNN architectures. This confirms the superiority of ADCNN in handling noisy industrial data.
Table 5 and Figure 12 indicate that the accuracy of MLP_0 and CNN1D_Normal is low under high noise interference (SNR = −5 dB~−1 dB). Their fault diagnosis accuracy is below 60% at SNR = −5 dB and still below 80% at SNR = −1 dB, which indicates that MLP_0 and CNN1D_Normal have poor noise resistance. ADCNN and CNN1D_N×3Max perform similarly, with an accuracy of more than 80% when SNR = −5 dB and around 95% when SNR = −1 dB. Under lower noise interference (SNR = 0 dB~5 dB), ADCNN and CNN1D_N×3Max retain their noise resistance advantage, with an accuracy of around 99%.
To further analyze the potential overfitting behavior of the proposed ADCNN model, classification accuracy under different numbers of training epochs was investigated. As shown in Figure 12, when the number of training epochs is lower than 40, both training and test accuracy increase steadily, indicating effective feature learning. However, when the number of epochs exceeds approximately 50, the training accuracy continues to improve, while the test accuracy shows noticeable fluctuations or slight degradation. This phenomenon suggests a potential tendency toward memorizing training patterns rather than further improving generalization performance. Further increasing the number of training epochs does not lead to additional performance improvement and may introduce a potential risk of overfitting.
Based on this observation, an appropriate training epoch range was selected to balance model convergence and generalization performance. In this study, the optimal number of training epochs for ADCNN was empirically determined to be within the range of 40 to 50, where the model achieves high diagnostic accuracy while maintaining stable performance on the test set.
According to Table 4, from the perspective of model size and prediction time, the parameter count of ADCNN is only 1.15% of that of CNN1D_N×3Max, and it requires a shorter training time. Considering both noise immunity and model size, ADCNN outperforms the other networks, highlighting its advantages in fault diagnosis tasks.
Figure 13 shows that the network accuracy during training will fluctuate with a continuous increase in the number of training epochs, indicating that attention should be paid to the selection of the number of epochs during ADCNN network training to enhance the network stability.
Although CNN1D_N×3Max achieves comparable accuracy to ADCNN under certain SNR conditions, ADCNN demonstrates clear advantages in terms of model compactness and computational efficiency. Specifically, ADCNN requires significantly less memory and exhibits a lower prediction time, as shown in Table 4. This trade-off between accuracy and efficiency highlights the superiority of ADCNN for practical industrial deployment, where both noise robustness and resource constraints must be considered simultaneously.
3.4. Example 2: Validation of the Designed Test Dataset
3.4.1. Data Description
The experimental validation data in this part were obtained from the bearing failure test bench of Shenyang Jianzhu University (SJZU). The structure of the test bench is shown in Figure 13. The entire test bench consists of three parts: a drive motor, a rotor, and an acceleration tester. The bearing experiment was conducted under a load condition of 0 HP, with a rotation frequency of 20 Hz and a sampling frequency of 16,384 Hz. The data include four states: normal bearing, bearing outer-ring fault, bearing inner-ring fault, and bearing roller fault. The bearing used in the SJZU test bench is a deep-groove ball bearing (model: MB ER-8K, Spectra Quest Inc., Richmond, VA, USA). The bearing contains eight rolling elements, with a rolling-element diameter of 0.3125 inch and a pitch diameter of 1.319 inch. The contact angle is 0°. These detailed bearing parameters are provided to ensure the full reproducibility of the experimental conditions. Faults were introduced artificially using controlled machining to simulate realistic defect conditions.
The labels (0 to 3 for SJZU) are categorical, each representing a specific, pre-defined health state of the bearing (e.g., Healthy, inner-ring fault of 0.007 inches). This comprehensive dataset encompassing multiple fault types and severities is essential for the training of a unified diagnostic model capable of directly identifying a wide range of potential failures from raw vibration signals, which aligns with the practical need for versatile condition monitoring systems in industry.
As in Example 1, it is calculated that the bearing passes through at least about 820 sampling points during one revolution, and the sample length was also determined to be 1024. A sliding window size of 128 was selected for the data enhancement operation. The data details are shown in Table 6.
To verify the universality of the proposed method in terms of noise resistance performance, we adopted the same scheme as the original dataset in example 1 and added white Gaussian noise with an SNR of −5 dB to 5 dB.
3.4.2. Experimental Results and Analysis
The designed test datasets are used as examples in this section, and the MLP_0 model, conventional CNN model, CNN1D_N×3Max model, and ADCNN model are used for fault diagnosis. Table 7 compares the model sizes and operation times of the different network models on the test dataset and indicates that the ADCNN model is smaller than the other networks. Table 8 and Figure 14 show the fault diagnosis accuracy obtained by each network model on the dataset.
Table 8 and Figure 14 show that ADCNN performs well on the designed test dataset. Under high noise interference (SNR = −5 dB~−1 dB), its accuracy remains higher than that of the other networks and stays within a range of 85–95%. The accuracy of the conventional CNN model is below 90%. Although CNN1D_N×3Max is a neural network with good noise resistance, its accuracy exceeds 90% only when the SNR is −1 dB, and the accuracy of MLP_0 within this range reaches only about 50%. Under limited noise interference (SNR = 0 dB~5 dB), the accuracy of ADCNN reaches up to 99.87%. These results further demonstrate that ADCNN achieves good noise resistance.
As shown in Table 7, the model size of ADCNN is 48.76 KB, making it much smaller than CNN1D_Normal, the second smallest model. Although its runtime is slightly longer than that of MLP_0, ADCNN is still better than the other networks in terms of overall noise resistance and recognition accuracy.
Figure 15 indicates that, for the SJZU dataset, the optimal number of training epochs for the ADCNN network is within the range of 40–50. Beyond this range, ADCNN no longer matches the accuracy of CNN1D_N×3Max. This indicates that, on this dataset, an excessive number of training epochs can lead to overfitting and reduce the accuracy of the network. It further indicates that attention should be paid to the setting of the number of epochs during network training.
Accuracy is adopted as the primary evaluation metric in this study, as the datasets are balanced across fault categories. Under such conditions, accuracy provides a reliable indicator of overall classification performance. Nevertheless, metrics such as precision, recall, and F1 score are also important for evaluating false-positive and false-negative rates, particularly in imbalanced scenarios. A more comprehensive multi-metric evaluation will be considered in future work to further assess the diagnostic reliability of the proposed method.
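For reference, the additional metrics mentioned above can be derived from a confusion matrix as sketched below. The 2 × 2 matrix is made-up illustrative data, not a result from this study, and the function name is an assumption.

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j.
    Returns a list of (precision, recall, f1) tuples, one per class."""
    n = len(confusion)
    results = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # missed samples of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp   # wrongly assigned to c
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


confusion = [[8, 2], [1, 9]]  # illustrative counts only
metrics = per_class_metrics(confusion)
total = sum(sum(row) for row in confusion)
accuracy = sum(confusion[i][i] for i in range(len(confusion))) / total
```

On balanced data such as this toy matrix, accuracy and the per-class F1 scores tell a similar story; the per-class values become more informative when class sizes differ.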