1. Introduction
As essential components in rotating machinery, bearings are extensively utilized across various industries such as manufacturing, aerospace, automotive, energy, and heavy industry. Due to their frequent operation in challenging conditions, they often fail before reaching their anticipated service life. This not only results in equipment downtime and production losses but can also cause serious injuries and significant economic losses. Therefore, monitoring the health of rotating machinery components is crucial. This can both ensure the normal operation of the machinery and serve as an important measure to protect the safety of the operator [1,2,3,4].
Over the past few years, deep learning (DL) technology has achieved notable advancements in rotating machinery fault diagnosis, and its wide range of applications has demonstrated great potential. A DL model processes the raw data end-to-end and converts them directly into fault diagnosis results. This end-to-end approach greatly simplifies the traditional diagnostic process by eliminating the need for extensive feature engineering and expert intervention. Owing to its efficiency and high degree of automation, DL is rapidly gaining recognition and substantial traction in rotating machinery fault diagnosis. Li et al. [5] designed an end-to-end adaptive multi-scale fully convolutional network (AMFCN) for bearing fault diagnosis in various signal-to-noise ratio (SNR) environments. Wen et al. [6] utilized a convolutional neural network (CNN) with a hierarchical structure to evaluate the fault location and fault severity of mechanical devices. Xu et al. [7] developed a hybrid DL model based on a CNN and deep forest (gcForest). Wang et al. [8] introduced a batch normalization method at each layer of the deep neural network (DNN). Although DL-based methods have made significant strides in intelligent fault diagnosis, the literature [9] points out that they assume the training and test data follow the same distribution. This assumption poses a serious limitation in engineering practice. Obtaining sufficient and balanced data remains a key challenge for intelligent diagnosis. Notably, adaptive variational autoencoding generative adversarial networks (AVAEGANs) [10] and traceable multi-domain collaborative generative adversarial networks (TMCGANs) [11] have made substantial progress in addressing this issue. Due to the complexity and variability of operating environments, rotating components frequently operate under diverse conditions. This variability leads to differences in data distribution, thereby weakening the diagnostic performance of the models [12]. Therefore, devising robust diagnostic techniques to meet the challenges of domain shift is vital for enhancing the reliability of rotating machinery.
Transfer learning (TL), an emerging artificial intelligence (AI) method, leverages knowledge from one domain to facilitate learning in other related but distinct domains. This method eliminates the need for training and test data to follow the same distribution, thereby significantly reducing application limitations. Among TL approaches, domain adaptation (DA)-based methods are particularly widely used [13]. An increasing number of scholars have employed TL methods for intelligent fault diagnosis, achieving notable results. Guo et al. [14] designed a deep convolutional transfer learning network (DCTLN) to solve the fault diagnosis problem under diverse working conditions. Zhang et al. [15] presented a Wasserstein distance-guided multi-adversarial network (WDMAN) to promote effective representation learning of source and target domains. In [16], a novel joint distribution adaptation mechanism (IJDA) was introduced, which combined maximum mean discrepancy (MMD) and correlation alignment (CORAL) as a new distribution difference index.
DA-based methods can effectively utilize some target domain data during the training stage, thereby demonstrating excellent performance in the target domain. However, rotating machinery often works in a constantly changing operating environment, making it very difficult to acquire target data in advance. In this case, DA methods cannot be used because no target domain data are available. Therefore, it is both interesting and necessary to explore a generalized diagnostic method that is trained only on source domain data, without access to any target data, yet still performs well on unseen target data. This capability is equally critical for diagnostic tasks performed in real time. This approach is referred to as domain generalization (DG) [17]. DG has also attracted growing interest in intelligent diagnosis. Two commonly employed techniques in DG are enhancing data diversity and learning representations that are invariant across domains. The former aims to enhance the diversity and usability of training data through the augmentation and extension of the input data. The most commonly used tool for this purpose is the generative adversarial network (GAN) [18]. In addition, numerical simulation methods have become a common tool in intelligent diagnostic research because they can efficiently simulate complex system behaviors and fault characteristics. Zheng et al. [19] developed a numerical simulation-enhanced RV reducer fault diagnosis method. Wang et al. [20] combined numerical simulation models with machine learning to implement the online fault diagnosis of bearings. Xu et al. [21] solved the problem of insufficient fault sample data by establishing a connecting rod-fastening rotor dynamics model. These numerical simulation methods effectively alleviate the challenges brought about by data scarcity in intelligent diagnostic research and provide important data support for model development and performance verification. However, there is little research on applying simulation data augmentation to DG. For example, a literature review of DG [22] pointed out in its future work that using simulation models to generate simulation data has great potential for application in DG. The latter technique aims to learn feature representations that are stable and consistent across domains, and centers on training models to extract features that are valid across different data distributions. For example, the study in [23] introduced an intelligent fault detection approach utilizing multiple source domains, with the goal of effectively extracting domain-invariant characteristics from diverse source domains. Shi et al. [24] presented an intelligent fault diagnosis method for unknown conditions built upon domain generalization theory. In contrast, Chen et al. [25] developed a general adversarial domain invariant generalization (ADIG) regression framework to address the issue of feature migration between different domains. Hu et al. [26] designed a novel loss for domain generalization. In addition, both federated learning and meta-learning have shown promising results in improving domain generalization capabilities. For example, recent studies [27,28,29,30] explored the potential of meta-learning and federated learning in DG, which makes both methods particularly useful in the case of small samples. Among them, Jian et al. [29] designed a domain generation module to generate data with different distributions and used meta-learning to simulate domain shift. These methods have demonstrated their effectiveness in handling previously unseen target data, thus paving the way for a new research trajectory within the realm of intelligent diagnosis. However, these methods still need sufficient source domain training data to ensure good performance in the target domain.
Overall, DG can effectively tackle fault diagnosis in scenarios where the target domain is unseen. However, how to obtain a substantial quantity of source domain data and effectively learn domain-invariant feature representations to support DG based on multi-source domains remains an open problem. To overcome this difficulty, this study applies simulation data to DG for the first time. It constructs multi-source domains by combining simulation data with measured data, enabling the model to learn more domain-independent features through adversarial training and thereby generalize effectively to unseen target domains. The key contributions of this study are outlined below:
We propose a DG method augmented by numerical simulation data, where simulation data representing different operating conditions are used together with measured data from other different conditions. Increasing domain diversity by introducing different speeds and loads improves generalization performance across unknown domains.
The proposed method is superior to the traditional DG method without simulated data augmentation. Experimental verification shows that with the augmentation of simulated data, the generalization performance of the adversarial training model on the unknown target domain is effectively improved.
The remaining sections of this article are organized as follows:
Section 2 details the basic theory of transfer learning.
Section 3 details the general framework of the proposed method.
Section 4 presents the experimental design and result analysis.
Section 5 summarizes the paper.
3. The Proposed Method
This paper mainly addresses the scarcity of source domain data in multi-source domain generalization by constructing a finite element model (FEM) to generate simulation data as an augmented domain and combining it with the real domains.
Figure 2 shows the overall flow chart of the article.
Table 1 shows the main parameters of the network.
3.1. Finite Element Model
Building on the existing research in the laboratory [40] and incorporating the concept of DG, an innovative DG method utilizing simulation data enhancement is introduced to effectively address the issue of limited source domains in multi-source domain adversarial learning processes. The general steps of this paper are as follows:
Step 1: Establish an FEM of the initial healthy state, simulate the dynamic response under such a state, and obtain the simulation signal of the healthy state.
Step 2: To make the vibration response signal generated by the FEM closely match the vibration response of the actual physical system in its running state, cosine similarity is used as the matching metric to modify the FEM. The closer the cosine similarity is to 1, the better the model matches the physical system. Generally, satisfactory results are achieved when the cosine similarity is greater than 0.6 [41]. The cosine similarity is computed as follows:
$$\cos(V_{p}, V_{s}) = \frac{\sum_{i=1}^{N} V_{p,i} V_{s,i}}{\sqrt{\sum_{i=1}^{N} V_{p,i}^{2}} \sqrt{\sum_{i=1}^{N} V_{s,i}^{2}}}$$
where $V_{p}$ and $V_{s}$ are the vibration responses of length $N$ obtained by the physical system test and numerical model simulation, respectively.
Step 3: Based on the modified FEM, faults are added to construct an FEM with fault characteristics to obtain simulation data.
Through the above steps, an FEM from a healthy state to a fault state is constructed to ensure that the simulation signal is highly consistent with the real signal in the healthy state and successfully generates high-fidelity simulation signals of various typical faults, preparing for the subsequent construction of simulation-enhanced multi-source domains.
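The matching check in Step 2 can be illustrated with a minimal sketch. The function names and the toy sine signals below are hypothetical and are not taken from the study; only the cosine similarity definition and the 0.6 acceptance level come from the text.

```python
import math


def cosine_similarity(v_p, v_s):
    """Cosine similarity between a measured response v_p and a simulated response v_s."""
    if len(v_p) != len(v_s):
        raise ValueError("both responses must have the same length N")
    dot = sum(p * s for p, s in zip(v_p, v_s))
    norm_p = math.sqrt(sum(p * p for p in v_p))
    norm_s = math.sqrt(sum(s * s for s in v_s))
    if norm_p == 0.0 or norm_s == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero signal")
    return dot / (norm_p * norm_s)


def fem_is_acceptable(v_p, v_s, threshold=0.6):
    """Return True when the similarity exceeds the acceptance level quoted in the text."""
    return cosine_similarity(v_p, v_s) > threshold


# Hypothetical toy signals: a measured sine and a slightly phase-shifted simulated sine.
N = 256
measured = [math.sin(2 * math.pi * 5 * n / N) for n in range(N)]
simulated = [math.sin(2 * math.pi * 5 * n / N + 0.3) for n in range(N)]
score = cosine_similarity(measured, simulated)
accepted = fem_is_acceptable(measured, simulated)
```

In practice the FEM parameters would be adjusted and the simulation rerun until the check passes; that update loop depends on the specific model and is not sketched here.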
3.2. Multi-Source Domain Construction
In the finite element modeling process, we ensured the accuracy of the simulation model through strict modeling and correction steps. At the same time, cosine similarity was used as an evaluation index to verify that there was a high similarity between the simulation data and the measured data. The generation of fault data is based on a corrected high-precision model and has good credibility. Therefore, it can be considered that the current simulation data have high reliability and can reduce the risk of the model being biased towards simulation data.
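The simulation working condition data $\mathcal{D}_{s} = \{(x_{j}^{s}, y_{j}^{s})\}_{j=1}^{n_{s}}$ are obtained through the FEM. Among them, $x_{j}^{s}$ represents the simulation data sample, $y_{j}^{s}$ represents the corresponding label, and $n_{s}$ denotes the number of simulation data samples. $\mathcal{D}_{r}$ represents the measured data of diverse working conditions. Constructing the multi-source domain data $\mathcal{D}_{ms}$, we obtain the following:
$$\mathcal{D}_{ms} = \{\mathcal{D}_{s}, \mathcal{D}_{r}^{1}, \mathcal{D}_{r}^{2}, \ldots, \mathcal{D}_{r}^{N}\}$$
where $N$ is the number of real domains. In order to distinguish data from diverse working conditions, the domain label $d$ is introduced, and the multi-source domain dataset may be expanded to the following:
$$\mathcal{D}_{ms} = \{(x_{j}, y_{j}, d_{j})\}, \quad d_{j} \in \{1, 2, \ldots, k\}$$
where $k$ and $i$ represent the number of multi-source domains and the number of real domains, respectively.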
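As a minimal sketch of this construction, the simulated domain and the measured domains can be merged into one list of (sample, class label, domain label) triples. The function name, the toy samples, and the choice of domain label 0 for the simulated data are assumptions made for illustration only.

```python
import random


def build_multi_source(simulated, measured_domains):
    """Merge one simulated domain and several measured domains into (x, y, d) triples.

    simulated: list of (x, y) pairs generated by the FEM.
    measured_domains: list of lists of (x, y) pairs, one list per real working condition.
    Domain label 0 marks simulation data (an assumption of this sketch);
    labels 1..N mark the measured domains in the order given.
    """
    dataset = [(x, y, 0) for x, y in simulated]
    for d, domain in enumerate(measured_domains, start=1):
        dataset.extend((x, y, d) for x, y in domain)
    return dataset


# Hypothetical toy data: two-point "signals" with class labels 0 (healthy) and 1 (fault).
simulated = [([0.1, 0.2], 0), ([0.3, 0.4], 1)]
measured_domains = [
    [([0.5, 0.6], 0)],
    [([0.7, 0.8], 1), ([0.9, 1.0], 0)],
]
dataset = build_multi_source(simulated, measured_domains)

# Random mini-batch drawn across all domains, using a seeded generator for repeatability.
batch = random.Random(0).sample(dataset, 3)
```

Keeping the domain label alongside each sample lets the same mini-batch feed both the task classifier (through the class label) and the domain discriminator (through the domain label).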
The constructed multi-source domain data are used for domain adversarial training to achieve domain generalization. The model architecture consists of three parts: feature extractor $F$, task classifier $T$, and domain discriminator $D$. Samples are randomly selected from the multi-source domain data and input into the model. After passing through the feature extractor $F$, the extracted features $F(x)$ are transmitted to $T$ and $D$, respectively. $T$ maps $F(x)$ to the corresponding category labels to learn the discriminative information. $D$ performs a domain confusion operation on $F(x)$ to make it domain invariant. In domain adversarial training, task classification loss and domain discrimination loss are considered separately to make the learned features discriminative and domain invariant. For the original samples from the multi-source domain data $\mathcal{D}_{ms}$, the task classification loss $L_{c}$ and the domain discrimination loss $L_{d}$ can be expressed as follows:
$$L_{c} = \frac{1}{n} \sum_{j=1}^{n} L_{CE}\left(T(F(x_{j})), y_{j}\right)$$
$$L_{d} = \frac{1}{n} \sum_{j=1}^{n} L_{CE}\left(D(R(F(x_{j}))), d_{j}\right)$$
where $y_{j}$ and $d_{j}$ represent the category label and domain label, respectively. $L_{CE}$ and $R$ are the cross-entropy loss and gradient reversal layer, respectively. $n$ denotes the number of source domain samples.
The total loss is expressed as follows:
$$L = L_{c} + L_{d}$$
By alternately training the domain discriminator and the feature extractor, the model learns to distinguish between multi-source domains while simultaneously extracting features that are common across these domains. This process ultimately realizes the goal of domain adversarial training, enhancing the model’s robustness and reducing dependency on any specific domain.
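The loss computation and the role of the gradient reversal layer can be sketched in plain Python. This is an illustrative sketch rather than the implementation used in the study: the class and function names are hypothetical, the logits are assumed to come from the task classifier and the domain discriminator, and the total loss is taken as the unweighted sum of the two terms, which is an assumption.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy loss of one sample for an integer label."""
    return -math.log(softmax(logits)[label])


class GradientReversal:
    """Identity in the forward pass; multiplies the gradient by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # reversal strength (hypothetical parameter)

    def forward(self, features):
        return list(features)

    def backward(self, grad_output):
        return [-self.lam * g for g in grad_output]


def batch_losses(task_logits, domain_logits, class_labels, domain_labels):
    """Mean task loss, mean domain loss, and their unweighted sum (assumed total)."""
    n = len(class_labels)
    loss_c = sum(cross_entropy(z, y) for z, y in zip(task_logits, class_labels)) / n
    loss_d = sum(cross_entropy(z, d) for z, d in zip(domain_logits, domain_labels)) / n
    return loss_c, loss_d, loss_c + loss_d


# Hypothetical batch of two samples, two fault classes, three source domains.
loss_c, loss_d, total = batch_losses(
    task_logits=[[0.0, 0.0], [0.0, 0.0]],
    domain_logits=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    class_labels=[0, 1],
    domain_labels=[0, 2],
)

grl = GradientReversal(lam=1.0)
features = grl.forward([1.0, 2.0])       # features pass through unchanged
reversed_grad = grl.backward([0.5, -1.0])  # gradient sign is flipped for the extractor
```

Because the reversal happens only in the backward pass, the discriminator is updated to reduce the domain loss while the feature extractor receives the opposite gradient, which pushes its features toward domain invariance.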