1. Introduction
With the advent of Industry 4.0, modern information technology has become deeply integrated with manufacturing, leading to significant advancements in machine manufacturing and industrial production. Rotating machinery equipment is extensively used in these fields, and bearings, as key mechanical components of such machinery, affect the safe operation of the equipment in its entirety [1,2,3]. Statistics show that about 40% of failures in rotating equipment are caused by bearing faults [4]. Thus, accurate and real-time detection of bearing faults is essential for the smooth progress of mechanical manufacturing and industrial production.
With the boom in big data and artificial intelligence technology, data-driven intelligent fault diagnosis methods have become a key research focus in recent years [5,6,7]. Data processing plays a key role in the effectiveness of fault diagnosis. Raw bearing fault signals typically reflect time domain information, and, after serial processing, they can reveal frequency domain information. However, when a model considers only time domain or only frequency domain information, its fault diagnosis performance is often suboptimal on nonlinear bearing fault signals. Therefore, attention has been given to the Continuous Wavelet Transform (CWT), which can simultaneously reflect both time and frequency domain information. Gu et al. [8] proposed a hybrid deep learning model for fault diagnosis that effectively extracts fault features from bearings and handles small sample datasets. This model uses variational mode decomposition (VMD) [9] and CWT algorithms for data processing and employs a convolutional neural network (CNN) [10] for model training. Cheng et al. [11] introduced a rotational machinery diagnosis method based on the CWT and Local Binary Convolutional Neural Networks. Stable and accurate fault diagnosis technology can reliably detect the types of faults in motors, providing dependable support for the operational monitoring and maintenance of rotating machinery [5,6,7]. As an intelligent algorithm, the deep learning model can extract fault features from bearing data for end-to-end fault diagnosis. Wang et al. [12] proposed a method combining an improved residual network and wavelet transform for the intelligent fault diagnosis of gearboxes. This approach effectively extracts features and diagnoses single faults, compound faults, and unbalance faults. Jiang et al. [13] introduced a multi-scale convolutional neural network featuring channel attention that uses both max pooling and average pooling layers to identify bearing fault characteristics at different scales. Regarding nonlinear feature extraction methods, Zhang et al. [14] developed an adaptive activation function with a tanh function and slope thresholding. These were incorporated into the Residual Network (ResNet), allowing the network to extract features that are significantly different between fault types. However, deep learning requires a large amount of training data, and deep learning algorithms need the training and testing data to come from the same operating conditions, meaning they must share the same distribution. During the normal operation of actual rotating equipment, real-time changes in operating conditions such as humidity, voltage, speed, current fluctuations, and load can cause the data distribution to vary. These changes decrease the accuracy of deep learning algorithms on test data [15].
Recently, bearing fault diagnosis methods based on domain adaptation transfer learning have addressed several challenges. Specifically, they have resolved the issues of low generalizability and low robustness owing to limited data in deep learning. They have tackled the problems associated with the source and target data being in different feature spaces or distributions. Schwendemann et al. [16] proposed the Layered Maximum Mean Discrepancy (LMMD) method, an extension of the Maximum Mean Discrepancy (MMD) that incorporates the unique characteristics of the proposed intermediary domain. Lu et al. [17] developed an architecture in which the conditional and marginal distributions are adapted across multiple neural network layers. This method uses the MMD to measure the distribution discrepancies and introduces an adaptive weighting strategy to ascertain the importance of different distributions. Mao et al. [18] combined the adaptability of Domain Adversarial Neural Networks (DANNs) with structured relational information across various failure models to enhance transfer learning effectiveness. Chen et al. [19] proposed the Multi-Gradient Hierarchical Domain Adaptation Network, which concurrently acquires transferable domain invariance and class-discriminative insights, improving the diagnostic transferability of bearing faults. All of these methods have achieved satisfactory results in some respects. However, traditional bearing fault diagnosis methods based on transfer learning and CWT time–frequency images still face the following major challenges in feature capture and domain adaptation:
(1) When the fault signal is weak, the data are smooth, or feature contrast is not apparent, CWT alone may not clearly display bearing fault characteristics. Therefore, enhancing data contrast through sharpening methods to improve the discriminative power of data features is particularly important.
(2) In the process of feature extraction for fault diagnosis, models need strong feature capture capabilities. Traditional residual networks often struggle to adequately focus on important features when capturing complex fault patterns, leading to suboptimal feature extraction.
(3) Domain adaptation algorithms based on kernel methods, such as the MMD, LMMD, and Joint Maximum Mean Discrepancy (JMMD), rely heavily on the selection and tuning of the kernel function to achieve feature alignment. When dealing with data exhibiting complex nonlinear distributions, the choice of kernel function greatly influences the algorithm’s ability to capture feature differences and interactions within the data. Domain adaptation algorithms based on adversarial learning, such as DANNs and Conditional Adversarial Domain Adaptation Networks (CDANs), align features between the source and target domains through adversarial training. Although adversarial learning excels at capturing complex nonlinear distribution differences, the training process is prone to gradient instability, mode collapse, and vanishing gradients, making it difficult for the model to converge. Additionally, a DANN primarily focuses on aligning feature distributions and lacks explicit alignment of class conditions, which can adversely affect classification performance.
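The kernel dependence described in challenge (3) can be illustrated with a minimal sketch of a biased squared-MMD estimate under a Gaussian (RBF) kernel. The function names, the toy feature vectors, and the bandwidth values are illustrative assumptions rather than settings from this study; the point is only that the same pair of samples yields very different discrepancy values as the bandwidth changes.

```python
import math

def gaussian_kernel(x, y, bandwidth):
    """RBF kernel between two feature vectors (bandwidth is an assumed hyperparameter)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd_squared(source, target, bandwidth):
    """Biased estimate of the squared MMD between two samples."""
    def mean_kernel(xs, ys):
        total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))
    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))

# Toy "source" and "target" features (hypothetical values, for illustration only).
source = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]]
target = [[1.0, 1.1], [1.2, 0.9], [0.9, 1.0]]

for bw in (0.1, 1.0, 10.0):
    print(f"bandwidth={bw}: MMD^2={mmd_squared(source, target, bw):.4f}")
```

With a very wide kernel, the two clearly separated samples look almost identical to the estimator, which is one way to see why kernel selection matters for alignment.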
In order to solve the problems mentioned above, this paper proposes the CWT-SimAM-DAMS model. The specific innovations and contributions are as follows:
(1) The one-dimensional bearing fault signal is segmented using a sliding window, the segmented data are processed with the CWT algorithm, and, finally, the resulting CWT time–frequency images are enhanced by overlaying high-frequency features using the Unsharp Masking (USM) algorithm. This method is named CWT-USM.
(2) The SimAM attention mechanism is integrated into the Residual Network to enhance the model’s feature extraction capability for input images and provide a robust feature extraction foundation for JMMD and CDAN domain adaptation algorithms. This model is named SimAM-ResNet.
(3) The model’s generalization ability is enhanced utilizing the JMMD and CDAN domain adaptation algorithms and designing an adaptive weighting strategy. The JMMD domain adaptive algorithm provides stable distribution alignment to make adversarial training more stable, and the CDAN domain adaptive algorithm mitigates the JMMD domain adaptive algorithm’s dependence on the kernel method by capturing complex nonlinear distribution differences through adversarial learning. CDAN and JMMD domain adaptive algorithms focus on both the joint distribution of labels and features. The adaptive weighting strategy considers the classification, JMMD, and CDAN loss, effectively reducing the discrepancy in joint distributions and achieving global domain alignment. Additionally, parameters are adaptively adjusted at various stages of model training to ensure the model’s optimal performance.
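The sliding-window step in contribution (1) can be sketched as follows. This is a minimal illustration, not the implementation used in this study: the function name, the window length, and the toy signal are assumptions, and the overlap is exposed as a parameter.

```python
def sliding_windows(signal, window_size, overlap=0.5):
    """Split a 1D sequence into fixed-length segments with fractional overlap."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Step between consecutive window starts; at least one sample.
    step = max(1, int(round(window_size * (1.0 - overlap))))
    return [signal[start:start + window_size]
            for start in range(0, len(signal) - window_size + 1, step)]

# Toy signal of 10 samples; window of 4 samples with 50% overlap (assumed values).
signal = list(range(10))
segments = sliding_windows(signal, window_size=4, overlap=0.5)
print(segments)
```

Each segment would then be passed to the CWT and sharpened with USM; trailing samples that do not fill a complete window are dropped in this sketch.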
The rest of the paper is organized as follows: 
Section 2 describes the theoretical concepts of transfer learning, CWT, USM, SimAM, and ResNet. 
Section 3 presents a new domain adaptive method for diagnosing bearing faults, including the SimAM attention mechanism, the JMMD and CDAN domain adaptation algorithms, and the weight adaptive strategy. 
Section 4 describes the specifics of the dataset and the parameter settings used in this study. 
Section 5 provides experimental results and conducts an analysis. 
Section 6 summarizes the paper.
  5. Experimental Verification
  5.1. Experiment on Unsharp Mask Parameter Settings
We selected nine values for the 
 and 
 parameters in the Unsharp Masking algorithm based on reference [34], and the experiments were conducted on the CWRU dataset. The ResNet model was utilized, and each experiment was repeated five times. Table 6 reports the corresponding results, which demonstrate the feasibility of the USM algorithm.
By analyzing 
Table 6, it can be seen that, after applying the USM algorithm, the overall performance of fault diagnosis improved. In transfer task 2-3, the accuracy of all nine parameter configurations selected in this study was higher than the results using the original CWT images. Moreover, different parameter configurations had a certain impact on the final results. Specifically, when the parameters were set to 
, the algorithm performed best, achieving an average accuracy of 86.59%, which is higher than the average accuracy of 84.89% achieved without using the USM algorithm. 
 and 
 correspond to moderate blurring combined with strong enhancement of image edges and details during sharpening. This setting reduces minor noise while avoiding the loss of too many detailed features. It makes the fault features more pronounced without excessively emphasizing noise, effectively balancing the signal-to-noise ratio and showcasing the frequency and time information of CWT images at different scales. Therefore, 
, 
 were selected as the parameter settings for the Unsharp Masking algorithm.
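For readers unfamiliar with unsharp masking, the operation can be sketched as adding back a scaled difference between the image and a Gaussian-blurred copy. The sketch below is a pure-Python illustration under stated assumptions: `sigma` (blur strength) and `amount` (enhancement strength) are assumed labels for the two USM parameters, the demo values are arbitrary and are not the settings selected in this study, and the output is not clipped to a valid intensity range.

```python
import math

def gaussian_kernel_1d(sigma):
    """Normalized 1D Gaussian weights covering about three standard deviations."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    blurred = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xi = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xi]
            new_row.append(acc)
        blurred.append(new_row)
    return blurred

def transpose(image):
    return [list(col) for col in zip(*image)]

def gaussian_blur(image, sigma):
    """Separable Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel_1d(sigma)
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))

def unsharp_mask(image, sigma, amount):
    """sharpened = original + amount * (original - blurred)."""
    blurred = gaussian_blur(image, sigma)
    return [[p + amount * (p - b) for p, b in zip(row, brow)]
            for row, brow in zip(image, blurred)]

# Toy grayscale image with a vertical step edge (hypothetical values).
image = [[0.2, 0.2, 0.2, 0.8, 0.8, 0.8] for _ in range(4)]
sharpened = unsharp_mask(image, sigma=1.0, amount=1.5)
print([round(v, 3) for v in sharpened[0]])
```

Flat regions are left unchanged, while the contrast across the edge is exaggerated, which is the effect relied on to make weak fault features in the time–frequency image more visible.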
  5.2. Comparative Experiment of Image Processing Method
Comparative experiments were conducted on the CWRU dataset to validate the effectiveness of the proposed image extraction method (CWT-USM), which combines the CWT and USM in the signal feature extraction process. In the experiment, several different image transformation methods [35] were selected for comparison, including the Gramian Angular Summation Field (GASF), Gramian Angular Difference Field (GADF), Recurrence Plot (RP), and Markov Transition Field (MTF) methods. The corresponding two-dimensional images are shown in Figure 7.
The ResNet model was selected for the experiments. A 50% data overlap was used for all image processing methods, and the signal period sampling points were the same as those used for the CWT algorithm. Figure 8 depicts the results after conducting five experiments for each method and averaging the results. The accuracy of images processed using the GASF, GADF, RP, and MTF methods in transfer task 0-1 was below 60%. In contrast, the accuracy of images processed with the CWT-USM method in transfer task 0-1 was 85.04%, which is 30.40 percentage points higher than the second-highest accuracy, achieved by MTF (54.64%). Additionally, in other transfer tasks, the accuracy of the RP and MTF was significantly improved compared to that of GASF and GADF, but their accuracy was still lower than that of the CWT-USM method proposed in this study. The results indicate that the CWT-USM method can extract richer and more accurate data features, significantly improving the accuracy of bearing fault diagnosis.
  5.3. Comparative Experiments with Different Dimensional Inputs
To compare the impact of different dimensional inputs on fault diagnosis outcomes, we evaluated the original one-dimensional (1D) time domain signal, the 1D frequency domain signal processed by FFT, and the proposed CWT-USM method, which includes time–frequency domain information. The experiments were conducted using the ResNet model, and the results are shown in Table 7.
The results indicate that CWT-USM outperformed the 1D frequency domain input across all transfer tasks. Although the accuracy of CWT-USM was slightly lower in transfer tasks 0-3, 1-2, 1-3, and 2-1 compared to the one-dimensional time domain input, the overall average accuracy of CWT-USM was higher. Specifically, the CWT-USM method improved the average accuracy by 3.61% compared to the 1D time domain input and by 14.46% compared to the 1D frequency domain input.
These experimental results demonstrate the superiority of using CWT-USM as input. By encompassing both frequency domain and time domain information relating to the vibration signal, CWT-USM provides richer feature information, leading to better fault diagnosis performance.
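To make the time–frequency idea concrete, the following is a minimal pure-Python sketch of a CWT magnitude map (scalogram) using a complex Morlet wavelet. The wavelet choice, the `omega0` value, the scale list, and the toy sinusoid are assumptions made for illustration; the study's actual wavelet and scale settings are not reproduced here.

```python
import cmath
import math

def morlet(t, omega0=6.0):
    """Complex Morlet mother wavelet (small correction term omitted)."""
    return math.pi ** -0.25 * cmath.exp(1j * omega0 * t) * math.exp(-0.5 * t * t)

def cwt_magnitude(signal, scales, dt=1.0, omega0=6.0):
    """Return |W(s, b)| as a list of rows, one row per scale s."""
    n = len(signal)
    rows = []
    for s in scales:
        row = []
        for b in range(n):
            acc = 0j
            for k in range(n):
                t = (k - b) * dt / s
                if abs(t) > 5.0:  # wavelet is negligible beyond 5 std devs
                    continue
                acc += signal[k] * morlet(t, omega0).conjugate()
            row.append(abs(acc) * dt / math.sqrt(s))
        rows.append(row)
    return rows

# Toy signal: a pure sinusoid at 0.05 cycles per sample (hypothetical).
n = 128
freq = 0.05
signal = [math.sin(2.0 * math.pi * freq * k) for k in range(n)]

# The scale whose Morlet centre frequency matches the sinusoid.
omega0 = 6.0
matched_scale = omega0 / (2.0 * math.pi * freq)
scales = [4.0, 8.0, matched_scale, 40.0]

scalogram = cwt_magnitude(signal, scales, omega0=omega0)
best = max(range(len(scales)), key=lambda i: scalogram[i][n // 2])
print("strongest response at scale", round(scales[best], 2))
```

Each row of the scalogram tracks how strongly one frequency band is present at each time index, which is the information that a 1D time domain or 1D frequency domain input provides only in part.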
  5.4. Comparative Experiments on Different Domain Adaptation Strategies
To enhance the persuasiveness and general applicability of the experiments, this study introduced the PU bearing dataset in addition to the CWRU dataset. The experiments compared several transfer strategies: the baseline model without any transfer strategy (SimAM-ResNet), a model using the Conditional Adversarial Domain Adaptation Network (SimAM-ResNet-CDAN), a model using the Joint Maximum Mean Discrepancy (SimAM-ResNet-JMMD), a model combining CDAN and JMMD but without the adaptive weighting algorithm (SimAM-ResNet-CDAN-JMMD), and the proposed method (CWT-SimAM-DAMS). The experiments were repeated five times, and the results on the CWRU and PU datasets are presented in Table 8 and Table 9, as well as Figure 9 and Figure 10, respectively.
On the CWRU dataset, SimAM-ResNet-CDAN-JMMD, which combines two domain adaptation algorithms, achieved a higher average fault diagnosis accuracy than SimAM-ResNet without domain adaptation and than SimAM-ResNet-CDAN and SimAM-ResNet-JMMD, which each use a single domain adaptation algorithm. However, on some transfer tasks, e.g., transfer task 0-2, its accuracy was still lower than that obtained with a single domain adaptation algorithm. The proposed method, CWT-SimAM-DAMS, achieved an accuracy greater than or equal to that of the other domain adaptation algorithms across all transfer tasks, and it removes the accuracy drop of the SimAM-ResNet-CDAN-JMMD method relative to the SimAM-ResNet-CDAN and SimAM-ResNet-JMMD methods on transfer task 0-2. On the PU dataset, the proposed method showed a significant improvement in transfer tasks 0-2 and 2-1. Although its accuracy decreased in transfer tasks 0-1, 1-0, and 2-0, tasks 0-2 and 2-1 improved by 14.03% and 13.42%, respectively, compared to the SimAM-ResNet-CDAN-JMMD method, so the average fault diagnosis accuracy improved significantly. Compared to other domain adaptation algorithms, the proposed CWT-SimAM-DAMS method exhibits stronger adaptability and accuracy. Because it adjusts the optimization objectives in real time and jointly considers three of them (the classification, JMMD, and CDAN losses), it reduces the distribution differences between the source and target domains, enhancing the model’s ability to diagnose bearing faults.
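As a rough illustration of how three objectives can be combined with weights that change during training, the sketch below uses a sigmoid ramp of the kind commonly used in adversarial domain adaptation. This is explicitly not the adaptive weighting strategy of this study, whose exact form is given elsewhere in the paper: the ramp schedule, the fixed `mix` factor, the function names, and the toy loss values are all assumptions.

```python
import math

def ramp_weight(progress, gamma=10.0):
    """Sigmoid ramp rising from 0 towards 1 as training progress goes from 0 to 1."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0

def total_loss(cls_loss, jmmd_loss, cdan_loss, progress, mix=0.5):
    """Classification loss plus a progressively weighted blend of alignment losses.

    `mix` (hypothetical) splits the alignment weight between JMMD and CDAN.
    """
    lam = ramp_weight(progress)
    return cls_loss + lam * (mix * jmmd_loss + (1.0 - mix) * cdan_loss)

# Toy loss values at three stages of training (hypothetical numbers).
for progress in (0.0, 0.5, 1.0):
    loss = total_loss(cls_loss=1.0, jmmd_loss=0.4, cdan_loss=0.6, progress=progress)
    print(f"progress={progress}: total={loss:.4f}")
```

Early in training, the classifier dominates so that the features become discriminative before the alignment terms exert their full influence; any scheme that rebalances the three terms over time follows the same general pattern.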
  5.5. Model Comparison Experiment
To verify the feasibility of our proposed bearing fault diagnosis model compared to other bearing fault diagnosis models, we selected several common algorithms and models in bearing fault diagnosis for comparative verification. Each model was applied five times to obtain the average diagnostic accuracy. Figure 11 and Figure 12, as well as Table 10 and Table 11, present the experimental results comparing the CWT-SimAM-DAMS model with the competitor models. Table 12 and Table 13 present the training and testing times of different models. Additionally, the performance of each method was assessed through confusion matrices, as shown in Figure 13 and Figure 14.
The experimental results show that the CWT-SimAM-DAMS model achieved an average accuracy of 99.29% on the CWRU dataset and 86.93% on the PU dataset. Compared to several traditional bearing fault diagnosis methods, the CWT-SimAM-DAMS method significantly improves average accuracy. Specifically, the accuracy of the CWT-SimAM-DAMS method on the CWRU and PU datasets was 13.56% and 25.42% higher, respectively, than that of the traditional ResNet model. Similarly, compared to the CNN model, the CWT-SimAM-DAMS method achieved an average accuracy improvement of 12.18% and 30.66% on the CWRU and PU datasets, respectively. This indicates that the CWT-SimAM-DAMS model has superior feature extraction and domain alignment capabilities.
Table 12 and Table 13 show that the ResNet model had the longest training and testing times on both the CWRU and PU datasets. Although the CWT-SimAM-DAMS model has relatively long training times compared to other models, its testing time does not significantly increase. For the CWRU dataset, the training time difference between all models was less than one minute, and the testing time of the CWT-SimAM-DAMS model was only 0.068898 min longer than the fastest AlexNet model. For the PU dataset, the training time difference between all models was within 10 min, and the testing time of the CWT-SimAM-DAMS model was only 0.4784722 min longer than the fastest CNN model. Considering that, in practical industrial applications, model training is usually conducted offline, training time is not a critical issue compared to model accuracy. Additionally, the difference in testing time between models is not significant. Taking both model accuracy and testing time into account, the CWT-SimAM-DAMS model still has a significant advantage.
   5.6. Ablation Study
Various ablation experiments were conducted on the CWRU dataset to verify the proposed CWT-SimAM-DAMS method, referred to as Method 1 in Table 14, and the efficacy of each component. These ablation experiments involved systematically removing key modules of Method 1 and observing their impact on the final performance, thereby revealing the contributions and importance of each module.
By comparing different combinations, it was found that removing the adaptive weighting module led to a significant decrease in performance, indicating the critical importance of the adaptive weighting module for the effectiveness of the CWT-SimAM-DAMS method. Conversely, when CWT-USM was replaced with regular CWT or the Residual Network integrated with SimAM was substituted by a standard residual network, performance decreased, but the impact was relatively minor. This indicates that, while the image processing module and the residual network integrated with SimAM contribute to performance enhancement, their effect is not as pronounced as that of the weight adaptive strategy module. The complete Method 1 model outperformed all other combinations, validating that integrating all modules achieves the best performance.
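For context on what the SimAM component removed in the ablation contributes, the sketch below applies the parameter-free SimAM gating, as commonly formulated, to a single feature-map channel in pure Python. The function name, the regularization value `lam`, and the toy feature map are assumptions; a real network would apply this per channel on tensors inside each residual block.

```python
import math

def simam_channel(feature_map, lam=1e-4):
    """Apply SimAM attention to one channel given as a 2D list of floats."""
    values = [v for row in feature_map for v in row]
    n = len(values) - 1  # number of "other" neurons in the channel
    mean = sum(values) / len(values)
    sq_dev = [[(v - mean) ** 2 for v in row] for row in feature_map]
    variance = sum(d for row in sq_dev for d in row) / n
    output = []
    for row, dev_row in zip(feature_map, sq_dev):
        new_row = []
        for v, d in zip(row, dev_row):
            # Inverse energy: larger for activations that stand out from the channel mean.
            inv_energy = d / (4.0 * (variance + lam)) + 0.5
            gate = 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid
            new_row.append(v * gate)
        output.append(new_row)
    return output

# Toy channel with one activation that stands out from the background (hypothetical).
channel = [[1.0, 1.0, 1.0],
           [1.0, 5.0, 1.0],
           [1.0, 1.0, 1.0]]
attended = simam_channel(channel)
print([[round(v, 3) for v in row] for row in attended])
```

The distinctive activation receives a gate closer to 1 than the background activations, so it is attenuated less, without introducing any learnable attention parameters.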