To validate the effectiveness of the STFDAN framework, we conducted comparative experiments on three publicly available datasets against several traditional models. Additionally, to evaluate the impact of each individual component of the model on its diagnostic performance, we performed ablation experiments using the Case Western Reserve University (CWRU) bearing dataset and the Jiangnan University (JNU) bearing dataset.
3.1. Data Description and Processing Methods
All experiments in this study were conducted using the PyTorch 1.4 framework. The specific hardware configuration includes an RTX 3080Ti GPU, an Intel i9-10980XE processor, 125.5 GiB of RAM, and a 64-bit Linux operating system.
In the comparative experiments, we primarily tested the proposed model on the following three publicly available bearing datasets. A description of these datasets is provided below:
- A. Case Western Reserve University Bearing Fault Dataset (CWRU)
The CWRU dataset is one of the most widely recognized and authoritative publicly available datasets for bearing fault diagnosis [30]. Therefore, this study uses the CWRU dataset as the primary data source for model evaluation. The bearing model used in the dataset is SKF6205, and the fault data collection test rig is shown in Figure 4.
The experimental setup is as follows: the signal acquisition frequency is 12 kHz, and the motor load is 0 HP, 1 HP, 2 HP, or 3 HP, each corresponding to a different rotational speed. These speeds represent different operating conditions, or tasks; the corresponding relationships are shown in Table 2. Vibration signals for the different transfer tasks are obtained from the acceleration sensor mounted on the drive-end bearing. Each transfer task contains 10 categories with labels from 0 to 9, comprising one healthy state (NA) and three fault types: inner race fault (IF), outer race fault (OF), and rolling element fault (BF). Based on fault severity, each fault type is further divided into light (0.007 inches), medium (0.014 inches), and severe (0.021 inches) faults, giving 9 fault categories in total; the corresponding relationships are shown in Table 3. The original vibration signal is segmented with a window length of 1024 points without overlap. After applying the data augmentation method introduced in Section 2.2 to the samples, the data is split into training, validation, and test sets in a 3:1:1 ratio. The final distribution of the generated samples is shown in Table 4, and a minimal sketch of the segmentation and splitting steps is given below.
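The following is a minimal NumPy sketch of this preprocessing, assuming one raw vibration record per class; the augmentation step of Section 2.2 is omitted, and the array and variable names are placeholders:

```python
import numpy as np

def segment_signal(signal, window=1024):
    """Split a 1-D vibration signal into non-overlapping windows of 1024 points."""
    n = len(signal) // window
    return signal[:n * window].reshape(n, window)

def split_3_1_1(samples, labels, seed=0):
    """Shuffle the samples and split them into training/validation/test sets at a 3:1:1 ratio."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    n_train = int(len(samples) * 3 / 5)
    n_val = int(len(samples) / 5)
    train_idx, val_idx, test_idx = np.split(idx, [n_train, n_train + n_val])
    return ((samples[train_idx], labels[train_idx]),
            (samples[val_idx], labels[val_idx]),
            (samples[test_idx], labels[test_idx]))

# raw_signal: one drive-end acceleration record sampled at 12 kHz (placeholder);
# label: its class index, 0-9 as defined in Table 3.
# windows = segment_signal(raw_signal)                        # shape (num_samples, 1024)
# splits = split_3_1_1(windows, np.full(len(windows), label))
```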
- B. Jiangnan University Bearing Fault Dataset (JNU)
The JNU dataset is a bearing fault dataset obtained from Jiangnan University [31]. The experimental setup uses a PCB MA352A60 accelerometer to collect vertical vibration signals. The bearing failure test rig at Jiangnan University is shown in Figure 5.
The sampling frequency is 50 kHz, with rotational speeds of 600 r/min, 800 r/min, and 1000 r/min. Each speed corresponds to a different operating condition, and the resulting transfer tasks are shown in Table 5. Each transfer task involves four classes: inner race fault (IF), outer race fault (OF), rolling element fault (BF), and the normal state (NA); the corresponding labels and fault mappings are shown in Table 6. The processing method is identical to that used for the CWRU dataset, and the sample set is again divided into training, validation, and test sets in a 3:1:1 ratio.
- C. Southeast University Bearing Fault Dataset (SEU)
The Southeast University bearing dataset is a publicly available dataset provided by Southeast University [32]. It consists of two sub-datasets, one for bearings and one for gears; only the bearing sub-dataset is used here to evaluate the performance of the proposed model. The bearing data cover two operating conditions with different rotational speeds and loads, 20 Hz-0 V and 30 Hz-2 V, which are treated as two domains labeled 0 and 1. Each condition contains one healthy state (NA) and four fault types: roller fault (BF), inner race fault (IF), outer race fault (OF), and composite fault (CF), where the composite fault refers to the simultaneous occurrence of inner and outer race faults. The corresponding classification categories and label assignments are shown in Table 7. The data are processed in the same way as the two datasets described above; of the eight vibration channels collected, the planetary-gear x-direction signal (channel 2) is selected for the subsequent experiments.
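As a rough illustration of this channel selection, the sketch below assumes each SEU record has been parsed into an array with one column per acquisition channel; the file name, delimiter, and loading routine are placeholders rather than the dataset's actual format:

```python
import numpy as np

# Hypothetical loader: each record is assumed to yield an array of shape (num_points, 8),
# one column per acquisition channel.
record = np.loadtxt("seu_bearing_record.csv", delimiter=",")

# Channel 2 (column index 1) is assumed to hold the planetary-gear x-direction vibration.
channel_2 = record[:, 1]

# channel_2 is then segmented into 1024-point windows and split 3:1:1,
# exactly as for the CWRU and JNU data.
```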
3.3. Comparative Experiments
This section presents comparative experiments on the three publicly available datasets described in Section 3.1. Six widely used methods are selected for comparison with the proposed model. To ensure a fair comparison, all experiments are conducted on the same computing platform with identical hyperparameter settings: the learning rate is set to 0.0003 and the batch size to 32. The compared methods are the machine learning-based Support Vector Machine (SVM) [33], Deep Domain Confusion (DDC) [34], the Deep Fully Convolutional Neural Network (DFCNN) [35], the Domain-Adversarial Neural Network (DANN) [36], Correlation Alignment (CORAL) [37], and Multi-Kernel Maximum Mean Discrepancy (MKMMD).
The SVM uses an RBF kernel with parameters C = 100 and γ = 0.01. DDC introduces an adaptation layer and an additional domain confusion loss to learn domain-invariant representations. DFCNN converts each 784-point sample into a grayscale image for training in order to extract deeper fault features. To ensure that the comparative algorithms are representative, the feature extractors used in DANN and CORAL are based on ResNet-18 [38], and MKMMD is implemented with two feature extractors: a ResNet-18-based variant (RMKMMD) and a Transformer-based [39] variant (TMKMMD). ResNet-18 is well suited to processing time-series data; its network structure is shown in Figure 8.
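For reference, the following is a minimal PyTorch sketch of the multi-kernel MMD loss underlying RMKMMD and TMKMMD; the five-kernel Gaussian family and the mean-distance bandwidth heuristic are common choices and are assumptions here, not necessarily the exact settings used in the experiments:

```python
import torch

def gaussian_kernels(x, y, num_kernels=5, mul_factor=2.0):
    """Sum of Gaussian kernels whose bandwidths are spread around the mean pairwise distance."""
    total = torch.cat([x, y], dim=0)                  # (n_s + n_t, d)
    dist2 = torch.cdist(total, total).pow(2)          # squared pairwise distances
    base = dist2.mean().detach()
    bandwidths = [base * mul_factor ** (i - num_kernels // 2) for i in range(num_kernels)]
    return sum(torch.exp(-dist2 / b) for b in bandwidths)

def mk_mmd(source_features, target_features):
    """Multi-kernel MMD between source and target feature batches of equal size."""
    n = source_features.size(0)
    k = gaussian_kernels(source_features, target_features)
    return k[:n, :n].mean() + k[n:, n:].mean() - 2.0 * k[:n, n:].mean()
```

In RMKMMD and TMKMMD, such a term would be added to the classification loss, with the features coming from the ResNet-18 or Transformer backbone, respectively.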
- A. Comparative Experiments on the Case Western Reserve University (CWRU) Dataset
The CWRU experiments involve 12 variable-operating-condition fault diagnosis tasks. For convenience, "a–b" denotes the transfer task from source domain "a" to target domain "b". To verify the superiority of the proposed STFDAN model, it is compared with the seven fault diagnosis methods described above. In addition, the computational cost of STFDAN is assessed by comparing the average training time (Time) over the 12 transfer tasks; training time is reported in seconds (s) throughout the subsequent experiments. The diagnostic results of the different methods for the 12 cross-domain tasks are shown in Table 9.
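With the four load conditions treated as domains, the 12 cross-domain tasks are simply all ordered source-target pairs, as the short snippet below illustrates:

```python
from itertools import permutations

# Conditions 0-3 correspond to the motor loads 0-3 HP listed in Table 2.
tasks = [f"{src}-{tgt}" for src, tgt in permutations(range(4), 2)]
print(tasks)   # ['0-1', '0-2', '0-3', '1-0', ..., '3-2'], i.e., 12 transfer tasks
```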
As can be seen from Table 9, STFDAN outperforms the other seven compared methods in most of the transfer tasks. The STFDAN model extracts features with parallel SECNN and BiLSTM branches and fuses the resulting spatiotemporal features through a cross-attention mechanism to obtain the final feature representation. Compared with RMKMMD, which uses ResNet-18 as its feature extractor, STFDAN shows clear advantages in 9 of the 12 transfer tasks; notably, it outperforms RMKMMD by 6.25% in the 3–0 transfer task. Compared with TMKMMD, which uses a Transformer as its feature extractor, STFDAN is superior in both accuracy and training time. Overall, this indicates that STFDAN extracts more transferable features under complex transfer conditions.
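As an architectural illustration only, the sketch below shows how a parallel SE-CNN/BiLSTM extractor with cross-attention fusion can be assembled in PyTorch; all layer sizes, the SE-block placement, and the attention configuration are assumptions for illustration, not the actual STFDAN implementation:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation channel recalibration for 1-D feature maps."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                       # x: (batch, channels, length)
        w = self.fc(x.mean(dim=2))              # squeeze over the time axis
        return x * w.unsqueeze(2)               # re-weight the channels

class ParallelSpatioTemporalExtractor(nn.Module):
    """Parallel SE-CNN and BiLSTM branches fused by cross-attention (illustrative sizes)."""
    def __init__(self, d_model=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(1, d_model, kernel_size=16, stride=4, padding=8), nn.ReLU(),
            SEBlock(d_model),
            nn.Conv1d(d_model, d_model, kernel_size=8, stride=2, padding=4), nn.ReLU())
        self.bilstm = nn.LSTM(input_size=8, hidden_size=d_model // 2,
                              batch_first=True, bidirectional=True)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    def forward(self, x):                       # x: (batch, 1, 1024)
        spatial = self.cnn(x).transpose(1, 2)   # (batch, T_s, d_model) spatial tokens
        temporal, _ = self.bilstm(x.view(x.size(0), -1, 8))   # (batch, 128, d_model)
        fused, _ = self.cross_attn(query=spatial, key=temporal, value=temporal)
        return fused.mean(dim=1)                # pooled spatio-temporal feature vector

# Usage sketch: features = ParallelSpatioTemporalExtractor()(torch.randn(32, 1, 1024))  # -> (32, 64)
```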
In addition, STFDAN achieves higher accuracy than DANN and CORAL in eight transfer tasks. In the 3–0 transfer task, its accuracy is 7.03% to 7.43% higher than that of DANN and CORAL. This indicates that STFDAN's advantage is limited in transfer tasks with small feature differences, but in complex transfer tasks, such as low-to-high-speed transfers and especially the 3–0 task, it shows significant gains. Overall, STFDAN maintains an accuracy above 98% in all transfer tasks, which demonstrates its stability; its particularly strong performance in the 3–0 task indicates its suitability for complex transfer learning scenarios.
Comparing the three models that all use ResNet-18 as the feature extractor (RMKMMD, DANN, and CORAL), DANN exhibits very stable performance across most tasks, especially tasks 0–1, 2–3, and 3–2, where its accuracy approaches or reaches 100%. This indicates that DANN adapts well even in transfer tasks with significant differences between the source and target domains. CORAL performs exceptionally well in tasks where the source-target gap is smaller or feature alignment is easier, particularly tasks 0–1, 2–3, and 1–2, because it reduces domain discrepancy by aligning the covariance matrices of the source and target features. RMKMMD achieves higher accuracy than the other two models in tasks 3–1, 3–0, and 3–2, suggesting that it is more effective in low-speed to high-speed transfer tasks, where the distribution differences between the source and target domains are more pronounced.
From the perspective of computational cost, the comparison in Table 9 shows that the proposed STFDAN model requires less training time and achieves higher average accuracy than DANN, CORAL, RMKMMD, and TMKMMD. Specifically, STFDAN takes only 121 s, 25 s less than the ResNet-18-based RMKMMD, while the Transformer-based TMKMMD requires roughly three times the training time of STFDAN. This indicates that the parallel network architecture of STFDAN has lower computational overhead than ResNet-18 and is better suited to industrial applications. DFCNN does not align the source and target domains; it is trained only on source-domain data and then applied directly to the target domain for testing, which results in a shorter training time. DDC is based on the relatively simple AlexNet architecture and therefore also trains faster than STFDAN; however, the average accuracy of STFDAN is 15.26% higher than that of DDC. Considering all these factors, STFDAN offers the best overall performance.
Furthermore, analysis of the experimental results reveals a common pattern across all transfer learning methods: when the source domain consists of low-speed data and the target domain consists of high-speed data, transfer performance is consistently lower than in the other transfer tasks. This occurs because high-speed signals are more complex, containing more high-frequency components and more intricate vibration patterns.
Transferring from low-speed to high-speed feature recognition is therefore more difficult, leading to a slight decrease in accuracy in transfer tasks 2–0 and 3–0. In addition, the feature distribution gap between the source and target domains in these tasks is relatively large, especially in the 3–0 task, which makes unsupervised feature alignment between the two domains more challenging.
In the 3–0 transfer task, the large difference in rotational speed between the source domain (low speed) and the target domain (high speed) makes it difficult to align the two domains effectively. To visualize the feature extraction capability of the proposed model, we take the 3–0 transfer task as an example and apply the t-distributed stochastic neighbor embedding (t-SNE) algorithm to visualize the features extracted by the DDC, DFCNN, DANN, CORAL, RMKMMD, TMKMMD, and STFDAN methods. The t-SNE feature visualization for each method is shown in Figure 9.
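A minimal sketch of how such a visualization can be produced with scikit-learn's TSNE is given below; the feature matrices and label arrays are placeholders for the outputs of a trained model:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(source_feats, target_feats, source_labels, target_labels):
    """Embed source and target features jointly in 2-D and plot them with shared class colors."""
    feats = np.vstack([source_feats, target_feats])
    emb = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(feats)
    n_s = len(source_feats)
    plt.scatter(emb[:n_s, 0], emb[:n_s, 1], c=source_labels, marker="o",
                cmap="tab10", s=10, label="source")
    plt.scatter(emb[n_s:, 0], emb[n_s:, 1], c=target_labels, marker="x",
                cmap="tab10", s=10, label="target")
    plt.legend()
    plt.show()
```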
From the feature map in Figure 9a, it can be observed that the distance between source-domain and target-domain samples is small, while within each domain the distances between different classes are large. These results indicate that the proposed model can effectively align the source and target domains and correctly classify the faults.
To visualize the feature extraction process in the STFDAN model, we use t-SNE plots to analyze the extracted features for the 3–0 transfer task from a network architecture perspective. The extracted features from the SECNN layer, the BiLSTM layer, and the final fully connected layer are visualized, as shown in Figure 10.
Figure 10a illustrates the feature distribution of the source and target domain test samples before they are input into the model; Figure 10b shows the output features of the SECNN layer; Figure 10c presents the output features of the BiLSTM layer; and Figure 10d depicts the output features of the final fully connected layer after training.
From these visualizations, we observe that both SECNN and BiLSTM have certain feature extraction capabilities. As the network deepens, the overlap between different fault categories in the source and target domains significantly decreases, and the feature distribution of the same category becomes more concentrated.
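One way to collect such layer-wise features for t-SNE is with PyTorch forward hooks; the sketch below assumes the trained model exposes submodules named `secnn`, `bilstm`, and `fc`, which are illustrative attribute names rather than the actual ones in STFDAN:

```python
import torch

def collect_layer_outputs(model, inputs, layer_names=("secnn", "bilstm", "fc")):
    """Run one forward pass and capture the flattened output of each named submodule."""
    captured, handles = {}, []

    def make_hook(name):
        def hook(module, inp, out):
            out = out[0] if isinstance(out, tuple) else out   # LSTMs return (output, state)
            captured[name] = out.detach().flatten(start_dim=1).cpu().numpy()
        return hook

    for name in layer_names:
        handles.append(getattr(model, name).register_forward_hook(make_hook(name)))
    with torch.no_grad():
        model(inputs)
    for h in handles:
        h.remove()
    return captured   # dict of {layer name: (batch, features)} arrays, ready for t-SNE
```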
The confusion matrix for the 3–0 transfer task is shown in Figure 11. It can be seen that almost all samples in the target-domain test set are correctly classified, with only 5 samples misclassified. This further validates the superior performance of the proposed model compared with the other methods.
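A confusion matrix of this kind can be generated directly from the target-domain predictions, for example with scikit-learn; the label arrays below are placeholders:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

def plot_confusion(y_true, y_pred, num_classes=10):
    """Plot the confusion matrix of target-domain test predictions (10 classes for CWRU)."""
    cm = confusion_matrix(y_true, y_pred, labels=list(range(num_classes)))
    ConfusionMatrixDisplay(cm, display_labels=list(range(num_classes))).plot(cmap="Blues")
    plt.show()
```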
- B. Comparative Experiments on the Jiangnan University (JNU) Dataset
The JNU dataset involves three operating conditions, resulting in a total of six transfer tasks. As in the CWRU experiments, we compare the accuracy and average training time (Time) of the proposed STFDAN model with those of the fault diagnosis methods described above.
The basic parameters are kept consistent, and the experiments are conducted on the same computational platform. The diagnostic accuracies for the six transfer tasks across the different methods are shown in Table 10, and the corresponding histograms are presented in Figure 12.
The experimental data show that the proposed model still achieves high accuracy on the JNU dataset, and its advantage is even more pronounced than on the CWRU dataset: its average accuracy is 4.28% higher than the best of the other six methods. The diagnostic accuracy for each transfer task ranges from 98.78% to 99.65%, indicating that STFDAN has higher adaptability and stronger generalization ability than the other methods. In terms of time cost, although DDC and DFCNN require less time, their average fault diagnosis accuracies are only between 72.89% and 83.14%, which implies a relatively high risk of missed detections and misclassifications in practical applications. Meanwhile, DANN, CORAL, RMKMMD, and TMKMMD require more training time than the proposed STFDAN model yet achieve lower accuracy. These results further demonstrate the practical engineering value of the proposed STFDAN model.
To visually demonstrate the contribution of the STFDAN model to cross-domain fault diagnosis, we also perform t-SNE feature visualization for the comparative experiments on the JNU dataset; the results are shown in Figure 13. To make the transfer effect more intuitive, the source and target domain samples are marked with different symbols, while the same color is used for the same class in both domains. As Figure 13 shows, the STFDAN model outperforms the other methods: not only are the source and target domain data fully aligned in the feature space, but the distance between different classes is large and samples of the same class are clustered together. This indicates an excellent transfer effect, with most samples correctly classified.
Additionally, to further assess the convergence speed and learning ability of the STFDAN model, we plot the accuracy and loss curves, as well as the confusion matrix, for Task 0–1 to observe the model's behavior during training, as shown in Figure 14. The curves include the accuracy on the source-domain training set, the accuracy on the target-domain validation set, and the loss over the entire training phase. The loss curve for the first 50 epochs records only the classification loss on the source-domain training set, while the subsequent 100 epochs record both the classification loss and the domain adaptation loss. As shown in Figure 14a, the accuracy and loss values stabilize after the 10th epoch, indicating that the model is highly stable. After the 50th epoch, with the addition of the domain alignment loss, the accuracy on the target-domain validation set gradually stabilizes and approaches 100%, suggesting that the STFDAN model effectively captures and learns meaningful features of the data. In the confusion matrix of Figure 14b, the predicted labels are almost identical to the true labels, with the vast majority of samples, shown along the diagonal, correctly diagnosed.
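This two-phase schedule (50 source-only epochs followed by 100 epochs that add the domain adaptation term) can be sketched as follows; the assumption that the model returns (features, logits), the paired data loaders, and the weight lambda_da are illustrative, not the exact STFDAN training code:

```python
import torch

def train(model, classifier_loss, domain_loss, source_loader, target_loader,
          lr=0.0003, warmup_epochs=50, adapt_epochs=100, lambda_da=1.0):
    """Source-only classification for 50 epochs, then classification plus domain alignment."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # learning rate as in Section 3.3
    for epoch in range(warmup_epochs + adapt_epochs):
        for (xs, ys), (xt, _) in zip(source_loader, target_loader):  # loaders use batch size 32
            feats_s, logits_s = model(xs)                # assumed (features, logits) output
            loss = classifier_loss(logits_s, ys)         # source-domain classification loss
            if epoch >= warmup_epochs:                   # phase 2: add the domain adaptation loss
                feats_t, _ = model(xt)
                loss = loss + lambda_da * domain_loss(feats_s, feats_t)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```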
- C. Comparative Experiments on the Southeast University (SEU) Dataset
The comparative experiment on the Southeast University dataset involves two transfer tasks. The seven methods mentioned earlier are again used for comparison, and the results are shown in Table 11. On the SEU dataset, the STFDAN model shows excellent performance: the highest average accuracy among the other seven methods is only 76.00%, whereas STFDAN reaches 99.22%, an improvement of 23.22 percentage points. In addition, STFDAN significantly outperforms the other methods in both transfer tasks.
The SEU dataset has more complex fault modes, including not only single faults of the inner race, outer race, and rolling element, but also a composite fault involving both the inner and outer races. Additionally, the SEU data are collected at two different frequency-load levels (20 Hz-0 V and 30 Hz-2 V), so the data distribution differences in the SEU dataset are larger than in the other two datasets, which were collected at the same frequency. Other models have limited feature extraction capability under these conditions, making it challenging to perform feature alignment with unsupervised transfer learning methods. In contrast, the STFDAN model processes the input data from different perspectives: through its parallel network structure it extracts temporal and spatial features separately and then fuses them. This enables more comprehensive feature representations to be extracted from inputs with large distribution differences, which benefits the subsequent feature alignment and further demonstrates that the proposed STFDAN model is better able to handle complex transfer tasks.
In the comparison of computational overhead, although DDC and DFCNN take less time than STFDAN, the average accuracy of STFDAN is 25.94% higher than that of DFCNN and 58.75% higher than that of DDC. DANN, CORAL, and RMKMMD, which use ResNet-18 as the feature extractor, and TMKMMD, which uses a Transformer, all take more time than the proposed STFDAN model, incurring higher computational overhead while achieving lower accuracy. In conclusion, the STFDAN model achieves higher fault diagnosis accuracy with relatively low computational overhead, making it valuable in practical applications.
To present the experimental results more intuitively, we visualize the features of the source and target domain test sets for transfer task 0–1, as shown in Figure 15. In Figure 15a, both the classification and transfer effects are clearly good. For the other five methods, the target-domain data of label 2 differ significantly in distribution from the source-domain data and are incorrectly classified as label 3, whereas in the feature map of our model almost all samples of label 2 are correctly classified. This further validates the generalization ability and practical applicability of the proposed model.
Additionally, we plot the confusion matrices for the different transfer tasks on the SEU dataset to further demonstrate the engineering value of the proposed STFDAN model. The confusion matrices for the two transfer tasks are shown in Figure 16.