The effectiveness of the proposed method is assessed using vibration signal (VS) data obtained from real bearing testbeds. Since the primary aim of the method is to detect and diagnose bearing faults, this section first describes the experimental datasets and then evaluates the fault detection capability of the proposed method.
3.1. Experimental Setup and Data Acquisition
Figure 5 depicts the structure of the bearing testbed used for this analysis, which was developed by the Ulsan Industrial Artificial Intelligence (UIAI) Laboratory at Ulsan University, Ulsan, Republic of Korea. The collected data were categorized into four bearing conditions: normal, outer race damage, inner race damage, and roller damage. During the experiment, a three-phase motor drove the testbed at a constant speed of 1800 rpm. The rotational motion was transferred from the rotor shaft to the main shaft via a belt system installed on both sides of the test bearings. The bearings used in this study were FAG NJ206-E-TVP2 cylindrical roller bearings made of chrome steel (AISI 52100), subjected to a constant vertical load of 100 kgf. Artificial defects measuring approximately 3 mm × 0.3 mm × 1 mm were introduced to simulate real fault conditions. To ensure precise data acquisition, signals were recorded from the left side of the target bearing, where the vibration was strongest, using both an accelerometer and an acoustic emission (AE) sensor. A schematic of the complete experimental setup is provided in Figure 6, which gives a detailed overview of the testbed configuration and the data acquisition process.
The data acquisition system, detailed in Table 3, uses the FAG NJ206-E-TVP2 cylindrical roller bearing. Vibration signals were captured with an accelerometer (model PCB-622B01), and AE signals were detected using R15I-AST AE sensors. Both sensors were connected to an NI-9234 data acquisition (DAQ) module, which supports integrated electronics piezoelectric (IEPE) sensors and ensures precise data collection.
Vibration data were acquired at a sampling rate of 25 kHz, with five minutes of continuous data collected for each bearing condition. The data were then divided into 1 s segments (25,000 points each), yielding 309–390 samples per fault type. The test procedure can be repeated for other fault types by replacing the test bearing in the same testbed. Examples of each bearing fault class and the corresponding vibration data are shown in Figure 7 and Figure 8, and the dataset details are given in Table 4.
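A minimal sketch of the segmentation step described above, assuming non-overlapping windows (the text does not state whether windows overlap). The function name `segment_signal` is illustrative. At 25 kHz, each 1 s window holds 25,000 points, and a 5 min record yields 300 complete non-overlapping windows.

```python
import numpy as np

FS_HZ = 25_000      # sampling rate stated in the text
SEGMENT_S = 1.0     # segment length stated in the text

def segment_signal(x, fs=FS_HZ, seconds=SEGMENT_S):
    """Split a 1D record into non-overlapping windows of `seconds` length.

    Trailing samples that do not fill a complete window are discarded.
    """
    x = np.asarray(x)
    win = int(round(fs * seconds))      # samples per window (25,000 here)
    n_windows = len(x) // win           # number of complete windows
    return x[: n_windows * win].reshape(n_windows, win)

# Example: a 5 min synthetic record (zeros stand in for real vibration data)
record = np.zeros(5 * 60 * FS_HZ, dtype=np.float32)
segments = segment_signal(record)
print(segments.shape)  # (300, 25000)
```

If overlapping windows were used instead, the hop length would become a further parameter that controls how many samples each record contributes.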
3.2. Paderborn University Dataset
The experimental test rig developed by Lessmeier et al. [42] for vibration data acquisition consisted of several key components, as illustrated in Figure 9. Designed as a versatile platform, it enabled bearings to be tested under various fault conditions, allowing precise fault characterization and analysis. At the core of the setup, the bearing test module applied a constant radial load to the test bearings, adjustable up to 10 kN before each experiment, which ensured consistency and repeatability across testing conditions. The system was driven by a 425 W permanent magnet synchronous motor (PMSM) with a nominal torque of 1.35 Nm, a nominal speed of 3000 rpm, a nominal current of 2.3 A, and four pole pairs. Manufactured by Hanning Elektro-Werke GmbH & Co. KG (Model: SD4CDu8S009), the motor was controlled by an industrial inverter (KEB Combivert 07F5E 1D-2B0A, Oerlinghausen, Germany) with a 16 kHz switching frequency, replicating real industrial environments, including the PWM-induced noise typical of such systems. Vibration signals were captured by a piezoelectric accelerometer (Model 336C04, PCB Piezotronics, Inc., Depew, NY, USA) securely mounted on the bearing housing adapter. The signals were amplified using a Type 5015A charge amplifier (Kistler Group, Winterthur, Switzerland) and passed through a 30 kHz low-pass filter, which removed high-frequency noise and kept the signal content below the 32 kHz Nyquist frequency of the subsequent digitization. The filtered signals were then digitized at a 64 kHz sampling rate. To mimic real-world operating conditions, a flywheel and a load motor were incorporated into the setup to simulate dynamic load variations and system inertia. The experiments were conducted under four distinct operating conditions, differing in rotational speed, radial force, and load torque, as outlined in Table 5.
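In the public release of the Paderborn data, each recording's file name encodes its operating condition, for example `N15_M07_F10` for 1500 rpm, 0.7 Nm load torque, and 1000 N radial force. The decoder below is a sketch that assumes this naming scheme; `decode_condition` is a hypothetical helper, and the values it returns should be checked against Table 5.

```python
import re
from typing import NamedTuple

class OperatingCondition(NamedTuple):
    speed_rpm: int
    torque_nm: float
    radial_force_n: int

# Assumed file-name pattern: N = speed / 100 rpm, M = torque * 10 Nm, F = force / 100 N
_PATTERN = re.compile(r"N(\d{2})_M(\d{2})_F(\d{2})")

def decode_condition(name: str) -> OperatingCondition:
    """Decode an operating-condition code such as 'N15_M07_F10' (hypothetical helper)."""
    m = _PATTERN.search(name)
    if m is None:
        raise ValueError(f"no operating-condition code in {name!r}")
    n, t, f = (int(g) for g in m.groups())
    return OperatingCondition(speed_rpm=n * 100, torque_nm=t / 10, radial_force_n=f * 100)

print(decode_condition("N15_M07_F10_K001_1.mat"))
# OperatingCondition(speed_rpm=1500, torque_nm=0.7, radial_force_n=1000)
```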
The dataset comprises 32 bearings in three groups: 12 bearings with artificially induced faults, 14 bearings with real faults developed through accelerated lifetime testing, and 6 healthy bearings serving as baseline references.
3.2.1. Case 1: Artificially Induced Faults
Lessmeier et al. [42] induced artificial damage in bearings using three methods, each designed to replicate common fault characteristics observed in industrial environments. The first method, electric discharge machining (EDM), was used to create precise trenches approximately 0.25 mm long in the rolling direction, with depths of 1 to 2 mm. This technique produces highly controlled and repeatable damage patterns, which makes it well suited to maintaining consistent experimental conditions. The second method, drilling, was used to introduce faults of varying diameters, simulating damage caused by localized stress concentrations or surface abrasion and thereby replicating the wear commonly found in industrial applications. The third method used electric engraving to create damage with lengths of 1 to 4 mm. This technique produces irregular damage profiles that mimic defects arising from material fatigue or improper handling. Together, these three techniques give the dataset a diverse range of fault characteristics, supporting a comprehensive evaluation of the proposed fault detection framework. The publicly available bearing codes used in this study for Case 1 are listed in Table 6.
3.2.2. Case 2: Real Bearing Faults
Lessmeier et al. [42] generated bearings with real damage using an accelerated lifetime test rig. The test bearings were subjected to a radial load applied through a spring-screw mechanism. To accelerate the formation of fatigue damage, the load was deliberately set higher than under standard operating conditions while remaining below the bearings' static load capacity to prevent immediate failure. In addition, low-viscosity oil was used to simulate poor lubrication, further promoting the gradual development of bearing damage. Approximately 70% of the resulting damaged bearings exhibited fatigue-induced pitting on the inner race (IR) and the outer race (OR). Most of the remaining bearings showed plastic deformation in the form of debris-induced indentations, mainly as OR faults. One bearing experienced a complete fracture, and no damage to the rolling elements was observed. This controlled approach to accelerated lifetime testing produced realistic and diverse fault scenarios, so the collected data closely reflect real-world failure mechanisms, which makes the dataset valuable for robust fault detection research. The publicly available bearing codes used in this study for Case 2 are listed in Table 7.
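To turn the bearing codes in Table 6 and Table 7 into class labels, a prefix lookup is often sufficient. The sketch below assumes the code scheme of the public Paderborn release (K0xx for healthy bearings, KAxx for outer-ring damage, KIxx for inner-ring damage, KBxx for combined damage); `label_from_code` is a hypothetical helper, not part of any published toolkit. The prefix alone does not distinguish artificial from real damage, so that split still comes from the table in which a code appears.

```python
# Assumed Paderborn code prefixes (verify against Tables 6 and 7)
LABEL_BY_PREFIX = {"K0": "normal", "KA": "OR", "KI": "IR", "KB": "IR+OR"}

def label_from_code(code: str) -> str:
    """Map a bearing code such as 'KA04' to a fault label (hypothetical helper)."""
    prefix = code.strip().upper()[:2]   # first two characters identify the class
    try:
        return LABEL_BY_PREFIX[prefix]
    except KeyError:
        raise ValueError(f"unrecognised bearing code: {code!r}") from None

print([label_from_code(c) for c in ["K001", "KA04", "KI21"]])  # ['normal', 'OR', 'IR']
```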
3.4. Comparative Analysis of Fault Diagnosis Methods
To comprehensively evaluate the performance and generalization capability of the proposed model, it was tested on the UIAI Lab dataset as well as on both the real and artificial subsets of the Paderborn dataset. Each dataset was split into 80% for training and 20% for testing. The UIAI Lab dataset was also used to evaluate the recent model proposed by Guanghua Fu et al. [43] for a direct performance comparison and to carry out an ablation study of the proposed model. The proposed hybrid deep learning model combines CWT with a classification architecture comprising a 1D convolutional ResNet (1D conv ResNet), MHSA, and BiLSTM. Vibration signals are first transformed into time-frequency representations using CWT, which captures nonstationary and nonlinear signal characteristics across multiple scales. The resulting CWT scalograms are then processed by the hybrid network, in which each component contributes to robust and interpretable feature learning: the 1D conv ResNet extracts localized features while ensuring efficient gradient propagation, BiLSTM captures long-term temporal dependencies that are important for tracking fault progression, and MHSA focuses the model on the most informative segments of the signal. Together, these components keep the model resilient in noisy industrial environments and allow it to distinguish effectively between fault types. The model was trained on a bearing dataset with four classes: inner race fault (IRF), outer race fault (ORF), roller fault (RF), and normal condition (NC). The training and validation results show rapid and stable convergence: both accuracy curves approached their peak values within about a dozen epochs, while the training and validation losses declined sharply and stabilized near zero, indicating effective learning without overfitting, as shown in Figure 10. On internal testing, the model achieved perfect classification accuracy across all four fault categories, as reflected in the confusion matrix in Figure 11: 85/85 for IRF, 74/74 for ORF, 55/55 for RF, and 60/60 for NC. Precision, recall, and F1-score reached 1.00 for all classes, as shown in Table 8, further emphasizing the model's diagnostic precision. The t-SNE visualization in Figure 12 shows clearly clustered, non-overlapping distributions for each class, indicating that the model learned well-separated, discriminative features. In addition, the ROC curve for each class yielded an AUC of 1.00, consistent with the error-free confusion matrix. As shown in Table 9, the class-wise metrics further support these findings: all fault types achieved a true positive rate (TPR) of 1.00 with minimal or zero false positive rate (FPR), false negative rate (FNR), and false discovery rate (FDR), apart from a slight drop for the normal class due to minor misclassifications, demonstrating the model's reliability and precision across fault categories.
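The CWT step described above can be sketched as follows. The wavelet family, scale range, and normalization used in the proposed method are not given in this section, so the real-valued Morlet wavelet (w0 = 5), the integer scale grid, and the helper names `morlet`, `cwt_scalogram`, and `scale_to_hz` are illustrative assumptions. In practice a library routine such as PyWavelets' `pywt.cwt` would typically replace the direct convolution used here.

```python
import numpy as np

def morlet(t, w0=5.0):
    """Real-valued Morlet wavelet: Gaussian envelope times a cosine carrier."""
    return np.exp(-0.5 * t**2) * np.cos(w0 * t)

def cwt_scalogram(x, scales, w0=5.0):
    """Magnitude scalogram |W(s, b)| with shape (len(scales), len(x)).

    Each row is the signal convolved with the wavelet dilated to scale s
    (in samples) and normalised by 1/sqrt(s).
    """
    x = np.asarray(x, dtype=float)
    out = np.empty((len(scales), len(x)))
    for i, s in enumerate(scales):
        half = int(np.ceil(4 * s))                 # cover +/-4 envelope widths
        if 2 * half + 1 > len(x):
            raise ValueError(f"scale {s} needs a kernel longer than the signal")
        t = np.arange(-half, half + 1) / s
        psi = morlet(t, w0) / np.sqrt(s)
        # The real Morlet is even, so convolution equals correlation here.
        out[i] = np.abs(np.convolve(x, psi, mode="same"))
    return out

def scale_to_hz(s, fs, w0=5.0):
    """Approximate centre frequency (Hz) of scale s at sampling rate fs."""
    return w0 * fs / (2 * np.pi * s)

# Synthetic 1.25 kHz tone sampled at 25 kHz (0.05 cycles per sample)
fs = 25_000
x = np.sin(2 * np.pi * 1250 * np.arange(2048) / fs)
scales = np.arange(1, 65)
S = cwt_scalogram(x, scales)   # shape (64, 2048); energy peaks near scale 16
```

Each row of `S` can then be read as one frequency band; `scale_to_hz(16, 25_000)` is roughly 1.24 kHz, close to the tone's frequency.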
The narrow confidence intervals (CIs) reported for these metrics indicate a high-performing model with minimal variance, which supports the statistical reliability of the results for bearing fault diagnosis [43]. To evaluate generalization and real-world applicability, the model was further tested on the Paderborn bearing dataset, which includes both real and artificially induced fault conditions. On the real subset, the model achieved an overall accuracy of 97.98 ± 0.18%, with precision, recall, and F1-score each at 98.33% (approximately ±0.16%), as reported in Table 9. Class-wise performance remained high, with perfect classification of normal samples, near-perfect classification of OR real samples, and minor misclassifications between the IR real and OR real classes. The confusion matrix in Figure 11 shows 739/774 correct predictions for IR real, 395/395 for normal, and 615/616 for OR real. As shown in Figure 12, the t-SNE visualizations indicate clear class separation in all three datasets, with minor overlaps in the Paderborn real set attributable to the naturally developed faults. Similarly, the ROC curves in Figure 13 reflect strong classification capability, with high AUC values supporting the model's effectiveness in distinguishing fault types across diverse conditions.
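The class-wise rates quoted in this section (precision, recall or TPR, F1, FPR, FNR, FDR, and accuracy) all follow from the confusion matrix. A minimal NumPy sketch, assuming rows index the true class and columns the predicted class; the function names are illustrative.

```python
import numpy as np

def _safe_div(num, den):
    """Element-wise num / den, returning 0 where den == 0."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

def classwise_metrics(cm):
    """Per-class rates from a confusion matrix (rows = true, cols = predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp            # true class c, predicted otherwise
    fp = cm.sum(axis=0) - tp            # predicted c, true class otherwise
    tn = cm.sum() - tp - fn - fp
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)     # identical to the TPR
    return {
        "precision": precision,
        "recall_tpr": recall,
        "f1": _safe_div(2 * precision * recall, precision + recall),
        "fpr": _safe_div(fp, fp + tn),
        "fnr": _safe_div(fn, tp + fn),
        "fdr": _safe_div(fp, tp + fp),
        "accuracy": float(np.trace(cm) / cm.sum()),
    }

def accuracy_from_counts(correct, totals):
    """Overall accuracy from per-class correct counts and class totals."""
    return sum(correct) / sum(totals)

# UIAI test split: all 85/74/55/60 samples on the diagonal
uiai = classwise_metrics(np.diag([85, 74, 55, 60]))
# Paderborn real: 739/774 (IR), 395/395 (normal), 615/616 (OR)
acc_real = accuracy_from_counts([739, 395, 615], [774, 395, 616])  # about 0.9798
```

Only the diagonal counts are needed for overall accuracy and per-class recall; precision, FPR, and FDR additionally require the off-diagonal entries.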
On the artificial Paderborn subset, the model achieved an even higher accuracy of 98.71 ± 0.15%, with precision, recall, and F1-score each at 98.67% (approximately ±0.13%). Only a few misclassifications occurred between the IR artificial and normal samples, while the OR artificial class was classified with 100% accuracy. The ROC curves for all artificial classes again yielded an AUC of 1.00 (Figure 13), and the t-SNE visualization (Figure 12) revealed distinct, non-overlapping class clusters, further validating the model's robustness and reliability.
An ablation study was conducted on the UIAI Lab dataset to assess the individual contributions of the core components of the proposed model. As shown in Table 9, four configurations were tested: no BiLSTM (CWT + Conv1D + MHSA), no MHSA (CWT + Conv1D + BiLSTM), no Conv1D (CWT + MHSA + BiLSTM), and Conv1D only (CWT + Conv1D). Excluding BiLSTM resulted in an average accuracy of 82.12 ± 0.22%, precision of 83.60 ± 0.21%, recall of 82.59 ± 0.20%, and F1-score of 83.09 ± 0.22%. Removing MHSA instead yielded better results, with 86.50 ± 0.19% accuracy, 87.05 ± 0.17% precision, 87.01 ± 0.19% recall, and 87.07 ± 0.18% F1-score. The best performance among the ablation variants was achieved when Conv1D was removed: 95.05 ± 0.14% accuracy, 96.96 ± 0.13% precision, 96.87 ± 0.15% recall, and 96.91 ± 0.13% F1-score. In contrast, the Conv1D-only configuration performed worst, with 63.68 ± 0.16% accuracy, 62.55 ± 0.18% precision, 62.62 ± 0.17% recall, and 62.43 ± 0.20% F1-score, highlighting the critical contributions of both temporal modeling and attention mechanisms in the proposed framework.
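The ablation variants can be expressed as switches on a single model. The PyTorch sketch below is hypothetical: the text names the components (1D conv ResNet, MHSA, BiLSTM) but not their wiring or sizes, so the ordering (conv, then attention, then BiLSTM), the linear projection of scale bins, the residual add around attention, mean pooling over time, and all layer widths are assumptions. It is not expected to reproduce the 262,308 parameters reported for the proposed model.

```python
import torch
from torch import nn

class ResBlock1D(nn.Module):
    """Residual block: two 1D convolutions plus an identity shortcut."""
    def __init__(self, ch: int, k: int = 3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(ch, ch, k, padding=k // 2), nn.BatchNorm1d(ch), nn.ReLU(),
            nn.Conv1d(ch, ch, k, padding=k // 2), nn.BatchNorm1d(ch),
        )
        self.act = nn.ReLU()

    def forward(self, x):                     # x: (batch, channels, time)
        return self.act(x + self.body(x))

class HybridSketch(nn.Module):
    """Hypothetical scalogram classifier with switchable components."""
    def __init__(self, n_scales=64, d=64, heads=4, hidden=32, n_classes=4,
                 use_conv=True, use_mhsa=True, use_bilstm=True):
        super().__init__()
        self.proj = nn.Linear(n_scales, d)    # per-time-step projection of scale bins
        self.res = ResBlock1D(d) if use_conv else None
        self.mhsa = nn.MultiheadAttention(d, heads, batch_first=True) if use_mhsa else None
        self.lstm = (nn.LSTM(d, hidden, batch_first=True, bidirectional=True)
                     if use_bilstm else None)
        self.head = nn.Linear(2 * hidden if use_bilstm else d, n_classes)

    def forward(self, scalogram):             # (batch, n_scales, time)
        h = self.proj(scalogram.transpose(1, 2))              # (batch, time, d)
        if self.res is not None:
            h = self.res(h.transpose(1, 2)).transpose(1, 2).contiguous()
        if self.mhsa is not None:
            attn, _ = self.mhsa(h, h, h)                      # self-attention over time
            h = h + attn                                      # residual add (assumed)
        if self.lstm is not None:
            h, _ = self.lstm(h)                               # (batch, time, 2 * hidden)
        return self.head(h.mean(dim=1))                       # logits: (batch, n_classes)

# The full model and the four ablation variants, expressed as switches
VARIANTS = {
    "full": {},
    "no_bilstm": {"use_bilstm": False},
    "no_mhsa": {"use_mhsa": False},
    "no_conv1d": {"use_conv": False},
    "conv1d_only": {"use_mhsa": False, "use_bilstm": False},
}

if __name__ == "__main__":
    x = torch.randn(2, 64, 128)   # dummy batch: 2 scalograms, 64 scales, 128 steps
    for name, kwargs in VARIANTS.items():
        print(name, tuple(HybridSketch(**kwargs)(x).shape))  # each -> (2, 4)
```

Keeping all variants in one class means only the switched component differs between runs, which is the property an ablation comparison relies on.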
Compared with recent state-of-the-art models, the proposed method shows notable improvements in both accuracy and robustness. For instance, the model developed by Guanghua Fu et al. [44], which integrates a CNN with BiLSTM and a residual module, uses a dual-path feature extraction strategy: BiLSTM captures temporal characteristics from the vibration signals, while the CNN processes spatial features from time-frequency representations. These complementary features are fused and further refined by a residual module to enhance robustness, particularly under noisy conditions. When applied to the same UIAI Lab dataset, this model achieved an overall accuracy of 92.02 ± 0.20%, with precision, recall, and F1-score each at 92.50% (approximately ±0.22%), as shown in Table 9. Although it outperformed basic CNN and BiLSTM models individually, it still produced several misclassifications across fault classes. In contrast, the proposed model surpassed Fu et al.'s architecture in all evaluation metrics and maintained perfect or near-perfect class separation, illustrating its stronger capability in handling complex, real-world vibration signals.
Overall, these results highlight the strength of the proposed framework in delivering precise and reliable bearing fault diagnosis. Its consistent performance across both controlled and real-world datasets indicates strong generalization ability and practical relevance for industrial scenarios, and the ablation analysis confirms the contribution of each architectural component, pointing to a well-balanced design. To assess real-time suitability, the model's latency and complexity were evaluated. Averaged over 100 runs on an NVIDIA RTX 3060 GPU, inference took 0.19 ms per sample; this figure covers only the model's forward pass and excludes preprocessing steps such as the CWT computation. The model contains only 262,308 trainable parameters, with a model size under 1 MB. These results suggest that real-time deployment on edge or embedded industrial systems is feasible. Moreover, the model addressed key challenges such as nonstationary signal characteristics, class imbalance, and noise interference, making it a robust and scalable candidate for real-time predictive maintenance and intelligent health monitoring systems.
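The size and latency figures can be estimated along the following lines. This is a framework-agnostic sketch with hypothetical helper names. The byte width per parameter depends on the numeric precision used for storage, which is not stated here, and serialized files add some overhead: as float32, the 262,308 parameters alone occupy 1,049,232 bytes (about 1.05 MB); as float16, half that. For GPU timing, kernel launches are asynchronous, so the device should be synchronized (for example with `torch.cuda.synchronize()` in PyTorch) before each clock read.

```python
import time

def weight_bytes(n_params: int, bytes_per_param: int = 4) -> int:
    """Raw storage for the weights alone (4 bytes = float32, 2 = float16)."""
    return n_params * bytes_per_param

def mean_latency_ms(fn, x, runs=100, warmup=10, sync=None):
    """Average wall-clock time of fn(x) in milliseconds over `runs` calls.

    `sync` is an optional callable (e.g. torch.cuda.synchronize) invoked
    before each clock read so that asynchronous GPU work is fully counted.
    If x is a batch of B samples, divide the result by B for a per-sample figure.
    """
    for _ in range(warmup):          # exclude one-off start-up costs
        fn(x)
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(runs):
        fn(x)
    if sync is not None:
        sync()
    return (time.perf_counter() - start) * 1000.0 / runs

N_PARAMS = 262_308                   # trainable parameters reported in the text
print(weight_bytes(N_PARAMS))        # 1049232 (float32)
print(weight_bytes(N_PARAMS, 2))     # 524616 (float16)
```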
Figure 12.
t-SNE visualizations of learned features on (a) UIAI, (b) Paderborn real, and (c) Paderborn artificial datasets, showing clear fault class separation.
Figure 13.
ROC curves of the proposed model on (a) UIAI, (b) Paderborn real, and (c) Paderborn artificial datasets, indicating strong class-wise discrimination performance.