1. Introduction
Under the strategic imperative of high-quality industrial development, rotating machinery serves as critical equipment in industrial systems. Such machinery operates in complex environments characterized by high temperatures, fatigue, and heavy-load conditions [1]. Rolling mills are rotating mechanical equipment for metal processing that induce the plastic deformation of materials through rolling pressure to modify their geometric configurations and properties [2]. The system comprises multiple components, including frames, work rolls, and transmission mechanisms, with work rolls constituting the core working elements. In the rolling process, the work rolls make contact with the surface of the metal to generate frictional forces and compressive stresses, and they are typically supported in rotation by rolling bearings. The rolling elements of a bearing are typically distributed in a strictly symmetrical and equidistant manner around the circumference. This rotationally symmetric geometry forms the foundation for stable operation. However, when a pit or spall appears at a specific location on the inner race, outer race, or rolling elements, it fundamentally compromises this structural symmetry. The defect thereby becomes a fixed and asymmetric excitation source, generating abnormal vibration impacts. Rolling bearings endure severe operational environments with continuous heavy-load operation, resulting in frequent failures. Rolling bearing failures critically degrade the stability of rolling mills, potentially triggering catastrophic accidents with substantial economic loss and safety hazards. Fault diagnosis focuses on analyzing a system’s current state by detecting existing anomalies, identifying their specific modes, and isolating faulty components. In contrast, fault prognosis targets the future progression of faults by predicting how anomalies will evolve and estimating components’ remaining useful life. Therefore, health monitoring and fault diagnosis strategies for rolling bearings have become critical for rolling mill maintenance [3].
Fault diagnosis is a key technology to ensure the safe and reliable operation of industrial systems. Its core lies in comprehensively perceiving the abnormal states of the system, determining fault types, and achieving accurate localization by analyzing operational data. A complete diagnostic process starts with fault detection to perceive system anomalies. It then proceeds to fault identification to determine the specific fault mode, and finally performs fault isolation to locate the faulty component. Brahmbhatt et al. [4] developed a graph-based neural network integrated with system knowledge graphs. By applying advanced graph neural networks, their method significantly enhances fault detection and diagnosis in industrial processes. Gajjar et al. [5] created an integrated framework that combines a neural network classifier with a model-based feedback controller. This hybrid strategy addresses the challenge of concurrent multi-sensor fault detection and system stability control in nonlinear systems. Shahnazari [6] proposed a generic FDI (fault detection and isolation) framework that utilizes specifically designed residual sets. This framework leverages the generated fault signatures to isolate concurrent faults in complex systems without relying on precise models or historical fault data. Xie et al. [7] introduced a digital twin-based approach to tackle issues of redundant data dimensions and high computational load in smart buildings. They employed HVAC system fault detection and diagnosis as a case study to automatically identify and filter critical fault-related data. Taken together, recent advances in the field of fault diagnosis are increasingly integrating digital twins and deep learning to address real-world complexity. These research efforts substantiate the broad applicability of such integrated methodologies across various scenarios. This trend links our research on bearing fault diagnosis to the wider domain of engineering fault diagnosis.
Advancements in IoT and AI technologies have propelled intelligent diagnostic methods into research prominence [8]. Vibration signal acquisition from operational rolling bearings enables data-driven fault diagnosis, which typically involves feature extraction [9] and fault classification [10] procedures. Conventional approaches calculate expert-defined fault features for classification using shallow machine learning models, such as support vector machines [11,12,13], random forests [14,15], and decision trees [16,17]. Despite demonstrating their efficacy in early-stage applications, these methods have exhibited critical limitations. An overreliance on domain expertise impedes a model’s self-adaptation capabilities. In addition, shallow architectures fail to capture the complex nonlinear features in operational processes.
Deep learning (DL) methods have distinct advantages over conventional diagnostic techniques for fault-identification applications. They autonomously learn feature representations from raw vibration signals, eliminating expert dependency while addressing diagnostic challenges under complex working conditions. Cui et al. [18] developed a residual network integrated with Singular Value Decomposition (SVD), employing SVD pooling layers for signal denoising before feeding wavelet time–frequency maps into an enhanced ResNet architecture for fault classification. However, this method suffers from excessive computational complexity, which significantly compromises model training efficiency. Zhang et al. [19] presented a DL model in response to the complex fault types encountered in industrial wind turbines. This approach transforms one-dimensional vibration data into two-dimensional images to enhance feature representation and concurrently redesigns the deep residual network architecture to optimize kernel sensitivity. This study provides a novel framework for fault diagnosis in complex industrial data environments. Zhu et al. [20] employed a wide-kernel convolutional structure to eliminate high-frequency noise interference in vibration signals, substantially improving the diagnostic performance by expanding the size of the convolutional kernels. Chen et al. [21] designed a neural network that automatically learns features. This technique takes raw vibration signals directly as input and uses dual-branch convolutional neural networks with different kernel sizes to extract multi-frequency characteristics. These features are subsequently input into an LSTM network for fault type classification. This method offers a new avenue for the intelligent diagnosis of bearing faults.
DL-based data-driven fault diagnosis methods require the prior acquisition of complete fault pattern datasets for model training [22,23]. However, such datasets are often unavailable in critical industrial scenarios, fundamentally limiting the applicability of data-driven approaches. Digital twin (DT) technology has emerged as a way to address the scarcity of training data in mechanical health management through virtual–physical interaction [24]. A DT serves as a virtual representation of a physical system. It employs digital methods to construct a dynamic model that captures multi-dimensional, multi-spatiotemporal scale, and multi-physical quantity characteristics. This model simulates the attributes, behaviors, and principles of the physical entity in real-world environments. It can be utilized to generate high-fidelity simulated data for various engineering applications. Under insufficient fault data conditions, the DT establishes digital models of physical entities to reflect operational states, providing reliable data support for fault diagnosis and prognosis. Xia et al. [25] proposed a DT-assisted triplex pump diagnostic method that employs simulated fault data from digital models and implements fault identification through sparse denoising autoencoder networks. Yan et al. [26] proposed a DT-enhanced imbalanced fault diagnosis framework that generates simulated data through gearbox nonlinear dynamic analysis and digital twin modeling to overcome the difficulty in acquiring actual fault data. Zhang et al. [27] created a dynamic DT bearing model to simulate operational conditions and generate sufficient fault datasets to resolve diagnostic challenges using unlabeled health status data. Li et al. [28] devised a DT-based synthetic data augmentation method using DT models to generate simulated data, coupled with a lightweight CNN architecture integrating focal modulation mechanisms (FM-LCN) to enhance the diagnostic accuracy. Ming et al. [29] designed a DT-assisted diagnostic framework combining frequency-domain filtering and subdomain adaptation networks to achieve cross-condition feature space alignment, significantly improving the performance under data imbalance.
While DL and DT applications have been separately investigated for bearing fault diagnosis, an integrated framework is notably absent. Inspired by the existing literature, this study proposes a methodology that integrates DT technology with a hybrid diagnostic model for rolling bearing fault diagnosis, combining a multi-scale convolutional neural network, an attention mechanism, and a bidirectional gated recurrent unit (MCNN-AT-BiGRU). First, a DT model of the rolling bearing is established by leveraging its geometric symmetry to simulate multiple failure modes and their evolutionary processes. The model generates simulated data, which is subsequently infused into actual datasets via a hierarchical hybrid strategy to balance the distribution of fault categories. The MCNN-AT-BiGRU hybrid diagnostic model with a symmetric network structure is then constructed. This model uses a symmetrically parallel convolutional neural network layer structure for multi-scale feature extraction, while employing an attention mechanism to capture key features in fault signals. These key features often manifest as asymmetric transient impulses, and focusing on and fusing them is key to the model achieving high-precision fault diagnosis. By capturing the periodicity of bearing faults, the bidirectional gated recurrent unit architecture extracts temporal features from the vibration signals to achieve spatiotemporal feature integration. The principal contributions of this work are as follows:
- (1) To overcome the limitations of purely data-driven methods, this study develops a DT framework. The framework is centered on a high-fidelity model of rolling bearings. It incorporates the physical dimensions of the bearing and physical principles. The model generates diverse types of physics-consistent fault simulation data. This capability effectively addresses the challenge of data scarcity.
- (2) The symmetric network architecture of MCNN-AT-BiGRU is developed, optimizing parameters through virtual–real data co-training and innovatively integrating multi-scale convolutional neural networks (MCNNs) with bidirectional gated recurrent units (BiGRUs) for spatiotemporal complementary feature representation. By leveraging the physical characteristics of bearing faults, the attention mechanism (AT) is incorporated to precisely capture asymmetric transient impulses within fault signals. The deep focus on and fusion of these asymmetric features enhance the model’s accuracy in identifying fault types.
- (3) The experimental results establish the greater recognition accuracy of the proposed method over mainstream diagnostic models in noisy environments and varying operational conditions. Meanwhile, the proposed method maintains stable performance in cross-bearing-type transfer experiments. These results indicate the robustness of the model in mitigating noise interference and adapting to complex industrial settings, exhibiting exceptional generalization performance and transferability.
Section 2 details the DT model for rolling bearings.
Section 3 describes the architecture of the proposed hybrid diagnostic model composed of a multi-scale convolutional neural network with an attention mechanism and a bidirectional gated recurrent unit (MCNN-AT-BiGRU), along with the integrated fault diagnosis process.
Section 4 presents the experimentally designed scenarios to validate and discuss the performance of the MCNN-AT-BiGRU model.
Section 5 concludes the study and outlines prospective research directions.
4. Experiments and Results
In this section, experiments were conducted under four scenarios: single operating conditions, noise interference, variable operating conditions, and cross-bearing generalization performance, with a subsequent analysis to verify the effectiveness of the proposed methodology. The classification accuracy was chosen as the main performance parameter to assess the model’s capacity for diagnosis. All the experiments shared identical runtime environments. The implementation used Python 3.9 and was developed on the PyTorch 2.1 framework. The hardware consisted of an NVIDIA GeForce RTX 4060 GPU and an Intel Core i7-12700H CPU.
4.1. Dataset Description
The experimental data in this study originated from the Case Western Reserve University (CWRU) Bearing Data Center [33]. Vibration signals were collected from the drive-end SKF-6205-2RS deep-groove ball bearings using the setup shown in Figure 10. The sampling frequency was set to 12 kHz. The dataset encompassed four operational conditions (0, 1, 2, and 3 hp, where 1 hp = 745.7 W) and ten bearing health states: NOR (normal state) plus IRF (inner ring fault), ORF (outer ring fault), and ROF (roller fault), each with defect diameters of 0.178 mm, 0.356 mm, and 0.534 mm. All the faults were introduced using EDM (electrical discharge machining), a technique that generates highly precise and controllable pits on metal materials through electrical discharges. Each state comprised 100 samples (1024 points per sample), totaling 1000 samples. To address fault sample scarcity, a bearing dynamics model was constructed using DT technology. This model simulated the same operating conditions and fault patterns as the real data to generate simulated vibration signals. A stratified hybrid strategy was used to inject synthetic data into the measured dataset. The normal state retained all 100 originally measured samples. The mild wear (0.178 mm) and moderate wear (0.356 mm) states each mixed 50 measured and 50 simulated samples in a 1:1 ratio. The severe spalling (0.534 mm) states combined 20 measured and 80 simulated samples at a 1:4 ratio, ensuring a total of 100 samples per health state. The final augmented datasets A/B/C/D (corresponding to the four operating conditions) preserved 1000 total samples, divided into a training set (700 samples) and a test set (300 samples) at a 7:3 ratio. The test set exclusively contained unreplaced original measured data to validate model generalization. The dataset configurations are listed in Table 4.
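The stratified hybrid strategy described above can be sketched as a per-class quota lookup. This is only a minimal illustration: the quotas follow the ratios stated in the text, while the function name `mix_class`, the dictionary keys, and the way samples are stored are hypothetical and not taken from the original study.

```python
import random

# (measured, simulated) sample counts per health state, following the
# ratios in the text. The keys are illustrative labels (assumption).
QUOTAS = {
    "normal": (100, 0),    # all measured
    "0.178mm": (50, 50),   # mild wear, 1:1
    "0.356mm": (50, 50),   # moderate wear, 1:1
    "0.534mm": (20, 80),   # severe spalling, 1:4
}


def mix_class(measured, simulated, severity, rng):
    """Draw the quota of measured and simulated samples for one health state."""
    n_meas, n_sim = QUOTAS[severity]
    if len(measured) < n_meas or len(simulated) < n_sim:
        raise ValueError("not enough samples to fill the quota")
    # Sampling without replacement keeps every selected sample unique.
    return rng.sample(measured, n_meas) + rng.sample(simulated, n_sim)
```

Each health state still ends up with 100 samples, so the ten-class dataset keeps its total of 1000 samples regardless of how many measured samples a class contributes.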
4.2. Single Operating Condition Scenario Experiment
To evaluate the diagnostic capability of the model under single operating conditions, this study validated the proposed method using datasets A, B, C, and D. To ensure a robust and reliable evaluation, we employed a k-fold cross-validation strategy. This involved partitioning the training data into temporary folds and iteratively using different subsets for training and validation, thereby providing a more reliable estimate of model performance and reducing the variance of the results. Furthermore, considering the random sequencing characteristics of the training data, 10 complete repeated trials of this cross-validation process were conducted, with the averaged performance metrics serving as the final evaluation criterion to enhance result reliability. During each trial, 150 complete iterations were executed, with loss function values and classification accuracy recorded at each iteration to comprehensively monitor the training process. To improve training efficiency without compromising generalization performance, an early stopping mechanism was implemented, which terminated training when the validation loss failed to decrease for ten consecutive epochs, indicating the completion of model optimization. Additionally, the batch size was identified as a crucial factor influencing model performance. While a larger batch size can reduce training time per epoch, it might adversely affect generalization ability. Therefore, to strike an optimal balance, experiments were conducted using Dataset B while systematically varying only the batch size, with the comparative results presented in
Table 5.
As shown in the table above, the training difficulty varied across different batch sizes, leading to distinct early stopping epochs. While batch sizes of 32 and 128 achieved shorter training durations, the former yielded a higher diagnostic accuracy. Consequently, a batch size of 32 optimally balanced the diagnostic precision and computational efficiency, emerging as the preferred choice.
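The early stopping rule used in these trials (stop once the validation loss has not decreased for ten consecutive epochs, within a budget of 150 epochs) can be sketched as follows. The class name, the `min_delta` tolerance, and the helper `epochs_run` are assumptions introduced for illustration; the source does not describe its implementation.

```python
class EarlyStopping:
    """Signal a stop when the validation loss stops improving."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum decrease counted as improvement (assumption)
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_losses, patience=10, max_epochs=150):
    """Return the epoch at which training ends for a given loss history."""
    stopper = EarlyStopping(patience)
    history = val_losses[:max_epochs]
    for epoch, loss in enumerate(history, start=1):
        if stopper.step(loss):
            return epoch
    return len(history)
```

Because different batch sizes produce different validation-loss trajectories, the same rule naturally yields different stopping epochs, which is the effect reported for Table 5.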
With the determined optimal batch size, Table 6 compares the proposed method with BiGRU, CNN, MCNN, and CNN-GRU under single operating conditions. The proposed model achieved optimal diagnostic performance with an average recognition accuracy of 99.79%. Compared to the BiGRU, CNN, MCNN, and CNN-GRU models, the diagnostic precision improved by 34.66%, 6.61%, 4.34%, and 1.66%, respectively. Given the use of identical training parameters, the performance differences evident in Table 6 can be explained by the inherent strengths and limitations of each model’s architecture. The BiGRU model exhibited significant limitations in fault identification because of the difficulty in extracting spatial features from data [34]. Convolution-based comparison models all exceeded 92% diagnostic accuracy, confirming the superiority of convolutional operations in feature extraction [35].
The incorporation of the coordinate attention mechanism was pivotal to the model’s further performance improvement. As shown in
Figure 11, which visualizes the attention weights when the model processed outer and inner ring fault signals, the highlighted regions exhibited a high degree of spatiotemporal consistency with the transient impulse peaks caused by fault impacts in the raw signal. Rather than merely propagating features, the mechanism learned and prioritized the importance of each location in the feature map. It enhanced feature channels rich in critical fault information while suppressing irrelevant noise, thereby making the entire model’s information flow more efficient.
Furthermore, a permutation test was performed on the classification accuracy of the proposed MCNN-AT-BiGRU model relative to the CNN-GRU model. The test was based on 10 independent runs with 10,000 permutation resamplings, and the results are shown in Figure 12. Analysis revealed that the permutation distribution of accuracy under the null hypothesis exhibited an approximately normal shape centered around zero, indicating that if the performance of the two models were equivalent, differences would be expected to cluster near zero. However, the observed accuracy advantage of +1.66% lay far in the right extreme tail of the distribution and fell entirely outside the confidence interval derived from the permutation distribution. This result suggests that the observed difference is highly unlikely to be due to random variance (p < 0.01), leading to a rejection of the null hypothesis. Therefore, the 1.66% performance improvement achieved by the MCNN-AT-BiGRU model is statistically significant, confirming the effectiveness of the proposed model enhancement.
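A permutation test of this kind can be sketched in a few lines. The version below is a one-sided, unpaired test on the difference of mean accuracies. Whether the original analysis was paired or unpaired, and how its p-value was computed, is not stated in the text, so the pooling scheme and the add-one correction here are assumptions.

```python
import random


def permutation_test(acc_a, acc_b, n_resamples=10_000, seed=0):
    """One-sided permutation test for mean(acc_a) > mean(acc_b).

    acc_a, acc_b: per-run accuracies of the two models.
    Returns the observed mean difference and the estimated p-value.
    """
    rng = random.Random(seed)
    n_a, n_b = len(acc_a), len(acc_b)
    observed = sum(acc_a) / n_a - sum(acc_b) / n_b
    pooled = list(acc_a) + list(acc_b)
    count = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the model labels are exchangeable,
        # so shuffle the pooled accuracies and re-split them.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (count + 1) / (n_resamples + 1)
    return observed, p_value
```

With 10 runs per model, `permutation_test(runs_proposed, runs_cnn_gru)` would return the observed accuracy gap together with the fraction of shuffled splits that reach or exceed it; the two argument names are placeholders for the per-run accuracy lists.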
We selected Dataset B to further validate the model. In addition to accuracy, the metrics of precision, recall, and F1-score were also introduced. As before, ten repeated experiments were conducted, and the average value was taken as the final result. The specific experimental results are summarized in Table 7. Moreover, Figure 13 shows the confusion matrices of the different methods. The proposed model can distinguish all ten bearing health states, which further supports the superiority of the model structure.
The proposed method integrated multi-scale convolutional structures with attention mechanisms. Multi-scale convolutional layers captured hierarchical spatial features, whereas attention weighting optimized the feature representation. Bidirectional gated recurrent networks further extracted the temporal features. This hierarchical feature learning structure significantly enhanced the feature extraction capability of the model, thereby achieving optimal diagnostic results under stable working conditions, which supports the validity of the proposed method.
4.3. Noise Interference Scenario Experiment
The vibration signals of rolling bearings collected in industrial fields are often affected by environmental noise. During the initial stages of a fault, weak fault characteristics are easily masked by noise, which obscures critical fault information.
To evaluate the diagnostic performance of models in noisy environments, this study introduced Gaussian white noise to perturb the experimental datasets. Different noise intensities were simulated by adjusting the signal-to-noise ratio (SNR) parameter. Taking an inner race fault sample with a damage diameter of 0.178 mm as an example,
Figure 14 illustrates the original signal and noise-contaminated signals at different SNRs. Distinctive fault characteristics were clearly observable in the original signal. However, these characteristics became progressively obscured after the noise injection. As the SNR decreased, the masking effect on the fault features increased significantly. Consequently, the difficulty of fault identification increased correspondingly.
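Injecting Gaussian white noise at a prescribed SNR follows from the definition SNR_dB = 10·log10(P_signal / P_noise): the noise variance is set to the signal power divided by 10^(SNR_dB/10). The sketch below assumes NumPy and a hypothetical function name `add_awgn`; it is an illustration of the standard procedure rather than the code used in the study.

```python
import numpy as np


def add_awgn(signal, snr_db, rng=None):
    """Return the signal corrupted by white Gaussian noise at the given SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    # Average power of the clean signal.
    p_signal = np.mean(signal ** 2)
    # Noise power that yields the requested signal-to-noise ratio.
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise
```

At 0 dB the noise carries as much power as the signal, and at negative SNRs it carries more, which is why the fault impulses become progressively harder to see as the SNR decreases.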
Dataset B was selected for the noise interference experiments. Gaussian white noise with SNRs ranging from −4 dB to 8 dB was injected to simulate varying interference intensities. The experimental parameters were consistent with those of previous configurations to ensure comparability. To ensure result reliability, average values from 10 repeated trials were adopted as final evaluation metrics. Each trial involved 150 iterations for comprehensive assessment.
Figure 15 illustrates performance comparison results of different models under varying SNR conditions.
The results showed that the diagnostic accuracy of all the models progressively improved with increasing SNR. Among them, the BiGRU method exhibited suboptimal overall performance. Its accuracy notably lagged behind that of the comparative methods, particularly in low-SNR regions. In contrast, the CNN and MCNN architectures demonstrated stronger noise resistance. This advantage primarily stems from the convolution and pooling operations, which extracted more robust features. These operations suppressed noise interference during signal processing. The CNN-GRU model combined spatial feature extraction from convolutional layers with the temporal modeling capabilities of GRU networks. This hybrid approach achieved superior performance compared with the standalone CNN and MCNN frameworks. The proposed method maintained the leading performance across all the SNR levels. It sustained approximately 90% accuracy even under negative SNR conditions.
Although the aforementioned experiment illustrated the significant robustness of the proposed MCNN-AT-BiGRU model against Gaussian white noise, the acoustic environments in industrial settings are often more complex. As an idealized assumption, Gaussian white noise has a power spectral density that is uniformly distributed across frequencies. In contrast, the background noise in practical mechanical systems typically exhibits specific frequency characteristics. For instance, pink noise and Brownian noise, whose energy is predominantly concentrated in the low-frequency region, more realistically simulate the random interference generated by physical processes such as bearing wear and structural resonance. Furthermore, blue noise, as another typical type of colored noise, possesses an energy distribution opposite to that of pink noise, increasing with frequency. It is often used to simulate high-frequency-dominated interference, such as certain types of electromagnetic interference or high-frequency meshing noise. These types of noise form a continuum of energy distribution from low to high frequencies, providing a more realistic scenario for a comprehensive assessment of the diagnostic model’s robustness.
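One common way to synthesize the noise types discussed above is to shape the spectrum of white noise so that its power spectral density follows 1/f^α, with α = 0 for white, 1 for pink, 2 for Brownian, and −1 for blue noise. The sketch below uses this FFT-based approach with NumPy. The function names and the spectral-shaping method itself are assumptions, since the text does not say how the colored noise was generated.

```python
import numpy as np

# Power spectral density is proportional to 1 / f**alpha.
ALPHA = {"white": 0.0, "pink": 1.0, "brown": 2.0, "blue": -1.0}


def colored_noise(n, kind, rng=None):
    """Generate n samples of unit-variance colored noise by spectral shaping."""
    rng = np.random.default_rng() if rng is None else rng
    alpha = ALPHA[kind]
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    scale = np.zeros_like(freqs)              # DC component is dropped
    scale[1:] = freqs[1:] ** (-alpha / 2.0)   # amplitude goes as f^(-alpha/2)
    noise = np.fft.irfft(spectrum * scale, n=n)
    return noise / noise.std()


def mix_at_snr(signal, noise, snr_db):
    """Scale the noise so that signal + noise has the requested SNR in dB."""
    signal = np.asarray(signal, dtype=float)
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return signal + gain * noise
```

For example, `mix_at_snr(x, colored_noise(len(x), "pink"), 2.0)` would contaminate a vibration sample `x` with pink noise at an SNR of 2 dB.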
To evaluate the model’s performance in noise environments that more closely resemble industrial realities, this study incorporated diagnostic experiments under colored noise interference. The SNR was set to 2 dB, with all the other parameters remaining unchanged in the experiment. The average accuracy of ten repeated experiments was taken as the evaluation metric. The results are summarized in Table 8.
As evidenced in Table 8, under the condition of SNR = 2 dB, the proposed MCNN-AT-BiGRU model consistently achieved the highest diagnostic accuracy across all four noise environments. Under the most interfering Brownian noise and pink noise, the proposed model achieved accuracy advantages of 5.42% and 3.38%, respectively, compared with the second-best model, CNN-GRU, which highlighted the critical role of its attention mechanism in capturing transient pulses from strong background noise. Under blue noise, the accuracy of all the models was higher than that under the other colored noises, even approaching the level under Gaussian white noise. The proposed model remained in the lead with an accuracy of 94.36%, which indicated that its multi-scale convolutional structure could effectively extract high-frequency detailed features. The proposed model exhibited stable performance under noises with different energy distributions. Its symmetric network architecture and attention mechanism jointly ensured that it could accurately focus on the essential features of faults, whether in Brownian and pink noises with energy concentrated in low frequencies or in blue noise with energy concentrated in high frequencies. These results supported the effectiveness of the proposed method in terms of robust diagnostic capability under high-noise environments.
4.4. Variable Operating Conditions Scenario Experiment
To validate the cross-condition adaptability of the proposed method, datasets A, B, C, and D were used to simulate different load conditions. Specifically, Dataset B under the 1 hp load condition served as the training set for model parameter optimization. Datasets A, C, and D under 0 hp, 2 hp, and 3 hp load conditions were subsequently used to evaluate the model generalization performance. The notation “B→C” denoted training on Dataset B and testing on Dataset C for fault diagnosis. The experimental parameters remained consistent with previous configurations to ensure methodological uniformity. The reliability of the results was ensured through averaged metrics from 10 repeated trials, each involving 150 iterations.
Figure 16 presents performance comparisons of the various models under varying operational conditions.
The results illustrated diagnostic accuracy degradation across all the models in cross-condition scenarios (“B→A”, “B→C”, and “B→D”) compared to same-condition testing (“B→B”). This confirmed that operational condition variations significantly increased the diagnostic difficulty. Regarding model performance, BiGRU exhibited the poorest results across all the test scenarios. This reflected its inadequate adaptability to changes in operational conditions. Although outperforming BiGRU, CNN and MCNN still showed noticeable performance degradation under cross-condition settings. The CNN-GRU illustrated relative advantages in specific scenarios (e.g., “B→C” and “B→D”). Nevertheless, its overall performance was inferior to that of the proposed method. The proposed method consistently achieved optimal performance across all the test conditions, demonstrating more significant advantages when there were large differences in working conditions. This validated exceptional cross-condition adaptability and generalization capabilities.
4.5. Generalization Performance Experiment for Different Bearings
In practical industrial applications, bearing faults exhibit diverse generation mechanisms and characteristics. This necessitates the use of diagnostic models to recognize known fault patterns and accurately classify similar fault types.
An additional bearing fault dataset from Jiangnan University was incorporated to evaluate the cross-type fault diagnosis capability of the proposed model [
3]. The test bearings were N205EM cylindrical roller bearings, with simulated faults introduced via wire-electrical discharge machining to create 0.3 mm × 0.05 mm (width × depth) grooves on the outer ring, inner ring, and roller surfaces. A cross-validation scheme was implemented through bidirectional testing between the different bearing-type datasets. Initially, the CWRU bearing dataset (Dataset P) was designated as the source domain, whereas the Jiangnan University bearing dataset (Dataset Q) served as the target domain for transfer fault diagnosis. Conversely, Dataset Q was configured as the source domain, with Dataset P acting as the target domain.
Owing to the structural differences between the Jiangnan University and CWRU datasets, the CWRU data were recalibrated to align with the experimental requirements. Specifically, faults with a defect diameter of 0.356 mm on the outer ring, inner ring, and roller at a rotational speed of 1750 r/min were selected. The number of classification categories in the DL model architecture was adjusted from 10 to 4 to accommodate the dataset consistency. The specifications of the key dataset are listed in
Table 9.
During the data preprocessing stage, distinct sampling strategies were implemented for both datasets. Dataset P from the CWRU continued to use overlapping sampling techniques. This approach simulated industrial scenarios with limited fault samples. For Jiangnan University Dataset Q, sequential sampling was applied directly to the drive-end vibration signals. Ultimately, 200 samples per fault category were obtained from each dataset, and each sample comprised 1024 consecutive sampling points. Preprocessed samples from Datasets P and Q were used separately for model training. The complete training workflow is illustrated in
Figure 17.
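The two sampling strategies above differ only in the stride of the sliding window: a stride equal to the window length gives sequential (non-overlapping) samples, while a smaller stride gives overlapping samples and therefore more samples from the same recording. The sketch below assumes NumPy. The function name `segment` and the stride value in the usage note are illustrative, because the overlap ratio is not given in the text.

```python
import numpy as np


def segment(signal, length=1024, stride=None, max_samples=None):
    """Cut a 1-D signal into fixed-length windows.

    stride=None (or stride == length) gives sequential sampling;
    stride < length gives overlapping sampling.
    """
    signal = np.asarray(signal)
    stride = length if stride is None else stride
    if stride <= 0:
        raise ValueError("stride must be positive")
    starts = range(0, len(signal) - length + 1, stride)
    windows = [signal[s:s + length] for s in starts]
    if max_samples is not None:
        windows = windows[:max_samples]
    if not windows:
        return np.empty((0, length), dtype=signal.dtype)
    return np.stack(windows)
```

For instance, `segment(x, 1024, stride=256, max_samples=200)` would produce up to 200 overlapping samples of 1024 points each, whereas `segment(x, 1024, max_samples=200)` would produce sequential ones.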
Dataset P from CWRU and Dataset Q from Jiangnan University were each used in turn as source-domain training data to construct a basic model and save the parameters of the feature extraction layers. During the model transfer phase, all the parameters except those of the batch normalization and fully connected layers remained frozen. Only 20% of the target-domain samples were utilized for fine-tuning the trainable layer parameters. The remaining 80% of the target-domain data was reserved for performance evaluation. The diagnostic accuracies of the four models under different source–target domain configurations are summarized in Table 10.
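In PyTorch, the freezing scheme described above amounts to disabling gradients everywhere and then re-enabling them for batch normalization and fully connected layers. The sketch below is a hedged illustration: the helper names are hypothetical, the random 20/80 split is not stratified by class, and none of it is taken from the authors' code.

```python
import torch
from torch import nn


def freeze_for_transfer(model):
    """Freeze all parameters except batch-norm and fully connected layers.

    Returns the list of parameters that remain trainable.
    """
    for p in model.parameters():
        p.requires_grad = False
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.Linear)):
            for p in m.parameters(recurse=False):
                p.requires_grad = True
    return [p for p in model.parameters() if p.requires_grad]


def split_target(n_samples, fine_tune_fraction=0.2, seed=0):
    """Randomly split target-domain indices into fine-tuning and evaluation sets."""
    gen = torch.Generator().manual_seed(seed)
    perm = torch.randperm(n_samples, generator=gen)
    n_ft = int(round(n_samples * fine_tune_fraction))
    return perm[:n_ft], perm[n_ft:]
```

Only the returned trainable parameters would then be passed to the optimizer, for example `torch.optim.Adam(freeze_for_transfer(model), lr=1e-3)`, where the learning rate is an arbitrary placeholder.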
The results in
Table 10 show the superior performance of the proposed method in cross-type transfer experiments. Diagnostic accuracy rates of 92.52% and 98.76% were achieved, significantly outperforming those of the comparative models. This established enhanced cross-domain generalization capabilities of the proposed method. Further analysis revealed higher diagnostic accuracy when Dataset Q served as the source domain and Dataset P as the target domain. This superiority stemmed from fundamental differences in data structures and sampling strategies between Dataset P and Dataset Q. Dataset Q possesses a greater sample abundance and a relatively sparse sample space distribution. These characteristics enabled the models to learn more generalized feature representations during the source-domain training. Under conditions of favorable source-domain sample sparsity, only a small number of the target-domain fault samples were sufficient for fine-tuning. The accurate identification of similar faults across different bearing types and damage sizes was achieved. This result illustrated the transfer capabilities of the method in cross-domain bearing fault diagnosis.
5. Conclusions
This study proposes an intelligent fault diagnosis framework based on virtual–physical data collaborative optimization. Initially, leveraging the geometric symmetry of rolling bearings, digital twin technology is used to establish a DT model of rolling bearings to simulate various fault modes and their evolution processes, and to generate simulation data to balance the distribution of fault samples in practical applications. Subsequently, a symmetrically parallel MCNN-AT-BiGRU network architecture is constructed. This architecture combines the spatial and temporal feature extraction capabilities through an integrated design. An attention mechanism is incorporated after the MCNN module to enhance the fault impulse characteristics. This mechanism converts critical fault information into attention weights for feature modulation. These optimized weights act on the original feature matrices to facilitate advanced feature learning by the BiGRU.
The experimental results indicate that the method proposed here shows better diagnostic capabilities under diverse fault conditions. The model also exhibits strong transfer learning capability and generalization performance. This provides an effective solution for bearing fault diagnosis, thereby ensuring the safe and efficient operation of bearings.
It is also important to acknowledge several limitations of the current study. These limitations clarify the scope of our present findings and point to a clear direction for future research. First, the DT model uses a simplified rectangular geometry to represent bearing spalls. This simplification fails to fully capture the intricate morphologies of real-world single damage. Moreover, it struggles to reflect the interactive characteristics and complex coexistence states of simultaneous faults. Second, the current noise-resistance experiments mainly focus on additive white Gaussian noise and include only some colored noises. They do not address more prevalent and complex industrial interferences. Addressing these limitations will be a core focus of our subsequent research. Thus, future work will not only assess the model’s long-term reliability under extreme conditions but also extend the current framework to diagnose simultaneous faults.