Research on Unsupervised Domain Adaptive Bearing Fault Diagnosis Method Based on Migration Learning Using MSACNN-IJMMD-DANN

Li, Xiaoxu; Wang, Jiahao; Wang, Jianqiang; Wang, Jixuan; Li, Qinghua; Yu, Xuelian; Chen, Jiaming

doi:10.3390/machines13070618


by Xiaoxu Li, Jiahao Wang, Jianqiang Wang, Jixuan Wang, Qinghua Li *, Xuelian Yu and Jiaming Chen

College of Mechanical and Vehicular Engineering, Changchun University, Changchun 130022, China

* Author to whom correspondence should be addressed.

Machines 2025, 13(7), 618; https://doi.org/10.3390/machines13070618

Submission received: 7 June 2025 / Revised: 9 July 2025 / Accepted: 16 July 2025 / Published: 17 July 2025

(This article belongs to the Special Issue Advances in Bearing Modeling, Fault Diagnosis, RUL Prediction (2nd Edition))


Abstract

To address the problems of feature extraction, the cost of obtaining labeled samples, and large differences in domain distribution in bearing fault diagnosis under variable operating conditions, an unsupervised domain-adaptive bearing fault diagnosis method based on migration learning using MSACNN-IJMMD-DANN is proposed, where MSACNN denotes a multi-scale and attention-based convolutional neural network, IJMMD an improved joint maximum mean discrepancy, and DANN a domain adversarial neural network. First, to extract fault-type features from the source and target domains, an MSACNN based on multi-scale and attention mechanisms is established. Second, to reduce the feature distribution difference between the source and target domains, the joint maximum mean discrepancy and correlation alignment approaches are combined to create the metric criterion. Then, the adversarial loss mechanism in DANN is introduced to reduce the interference of weakly correlated domain features for better fault diagnosis and identification. Finally, the method is validated using bearing datasets from Case Western Reserve University, Jiangnan University, and our laboratory. The experimental results demonstrate that the method achieves higher accuracy than the compared methods across different migration tasks, providing an effective solution for bearing fault diagnosis in industrial environments with varying operating conditions.

Keywords:

fault diagnosis; transfer learning; multi-scale convolution; attention mechanism; JMMD; unsupervised domain adaptation

1. Introduction

Rolling bearings are essential components of rotating machinery, which plays an important part in industrial production [1,2]. The condition of rolling bearings has a significant impact on the performance, stability, and service life of mechanical equipment. Rolling bearings are subjected to high pressure during operation and are easily damaged. Failure can entail not just equipment downtime, but also catastrophic industrial accidents, resulting in irreversible economic losses and social consequences [3,4]. As a result, fault diagnosis for rolling bearings in complicated industrial equipment is an essential method for ensuring that the equipment operates normally.

In recent years, deep learning algorithms have demonstrated better performance in fault diagnosis owing to their strong autonomous feature learning capacity, and they have been widely developed [5,6,7]. Among them, the convolutional neural network (CNN) is a representative example. It is known for its excellent feature extraction ability and can classify input signals by fault type directly, realizing end-to-end bearing fault diagnosis, so it has gradually become a research hotspot in the field [8,9]. For example, Chen et al. [10] used convolutional neural networks with varied kernel sizes to extract features from faulty vibration signals, which were then classified using a long short-term memory network. Li et al. [11] developed a novel wavelet-driven convolutional layer to replace the network’s initial convolutional layer and extract more information for diagnosis.

Although deep learning-based fault diagnosis algorithms have produced promising results, they rely on two stringent assumptions. First, the training and test data must follow the same distribution. Second, enough labeled samples must be available to support the target diagnostic task [12]. However, in complicated real-world circumstances, the collected training and test samples often follow different distributions, and labeled samples are difficult and expensive to obtain. Labeling enough samples for each operating condition is impractical, which limits the practical use of such fault diagnosis methods.

To deal with the aforementioned issues, transfer learning has been introduced into intelligent fault diagnosis algorithms [13]. Transfer learning uses prior knowledge learned in the source domain to enhance the performance of the prediction model in the target domain, making it a powerful way to address insufficient or unlabeled data in the target domain. Bearing fault diagnosis based on transfer learning can be divided into four approaches: instance-based methods, parameter-based methods, generative adversarial network (GAN)-based methods, and feature-based methods.

Instance-based methods facilitate domain alignment either by re-weighting source domain samples to help the classifier diagnose target domain samples or by exploiting statistics of the target domain samples. For instance, Tian et al. [1] employed a sum–mean matching strategy to alleviate the class imbalance problem, allocating higher weights to generated comparable samples in order to limit the impact of irrelevant data. Parameter-based methods reuse the parameters of an already trained model, or pre-train the network parameters, and then fine-tune part of the network on a small amount of target domain data so that the model can extract features in the target domain. For instance, Z et al. [3] trained a CNN with sufficient normal data and then replaced the fully connected layer with an SVM as the diagnostic model, which retains high fault detection capability even with limited sample sizes. GAN-based (adversarial) approaches combine generative adversarial networks with transfer learning: the GAN generator serves as the feature extractor, and the adversarial game between the feature extractor and the discriminator is used to learn domain-invariant features. Kuang et al. [14] used two-layer adversarial transfer learning to adaptively align both the marginal and conditional distributions in order to learn domain-invariant knowledge for fault diagnosis under class-imbalanced conditions.

Feature-based methods map the source and target domain data into a shared feature space and reduce the gap between the two domains there. Tong et al. [15] used maximum mean discrepancy (MMD) to decrease the distributional difference between the two domains, obtaining a transferable feature representation of the training and test data, and subsequently trained a nearest-neighbor classifier on the transferable features. Azamfar et al. [16] improved MMD-based adaptation by minimizing the distributional distance between the source and target datasets while extracting domain-invariant features. Li et al. [17] used multi-kernel maximum mean discrepancy (MK-MMD) to improve the transfer of learned source domain features to the target domain for intelligent fault classification. An et al. [18] employed correlation alignment (CORAL) to align second-order moments and reduce the distance between the source and target domains. Han et al. [19] employed joint distribution adaptation to match the conditional and marginal distributions of the two domains, producing features with smaller distributional discrepancies between domains. Zhao et al. [20] proposed an improved joint maximum mean discrepancy (JMMD) method for more reliably matching feature distributions from different domains. Xiao et al. [21] integrated a loss function into the joint maximum mean discrepancy to align marginal and conditional distributions during domain adaptation, while assigning weights to source domain samples to prevent negative transfer.

Among them, instance-based migration learning methods are less effective when some features are specific to the source (target) domain. In such cases, the re-weighted samples cannot reduce the domain differences, so they are only suitable for scenarios where the data of the two domains are relatively similar. Parameter-based migration learning approaches encode migrated knowledge at the model level, such as network parameters, previous knowledge, and so on, with the majority of the methods relying on labeled samples in the target domain. When there is little similarity between the two domains, feature-based migration learning methods are more advantageous because they can map data from the source and target domains separately into a common feature space and use the distance metric to reduce the difference between the two domains. This paper studies bearing fault diagnosis under unsupervised variable operating condition situations and therefore adopts a feature-based transfer learning technique. The primary goal of the feature-based transfer learning approach is to identify a distance metric criterion that can quantify the difference between the source and target domains, hence measuring their similarity. As a result, a suitable metric is required for model training in order to appropriately evaluate the difference between the two domains. When training the network, the metric can be optimized by model tuning to improve the similarity between the two domains, and the final model obtained can be effectively migrated.

Based on the research background and analysis presented above, this paper proposes a multi-scale and attention-based convolutional neural network (MSACNN) that does not require signal preprocessing and automatically learns multi-scale fault information and intrinsic multi-scale features in the signal. In the domain adaptation stage, the joint maximum mean discrepancy (JMMD) is first introduced. This method is further improved, resulting in the improved JMMD, referred to as IJMMD. In addition, the adversarial loss mechanism in the domain adversarial neural network (DANN) is introduced to enhance domain confusion both between source domains and between source domains and target domains. This helps to reduce the interference of weakly correlated domain features and further enhances the learning process. Based on these enhancements, a fault diagnosis model using transfer learning is established to improve fault diagnosis accuracy and robustness. This model supports fault mode migration across different working conditions or equipment types and enables accurate fault-type identification, even within the target domain. As a result, it is applicable to diverse working conditions encountered in industrial environments. At the same time, migration learning decreases reliance on data labels in the target domain, increases efficient diagnosis under unlabeled or sparsely labeled settings, and lowers data labeling costs.

The remainder of this paper is organized as follows: Section 2 introduces the theoretical basis, Section 3 introduces the network framework structure, Section 4 introduces the relevant experiments and results analysis, and Section 5 presents the conclusions.

2. Methods

2.1. Unsupervised Domain Adaptation

Domain adaptation (DA), an important component of transfer learning, attempts to reduce the discrepancy between the source and target distributions by aligning their feature spaces, so that models trained in the source domain can be applied effectively to tasks in the target domain. DA approaches are often classified as supervised or unsupervised, depending on the availability of labeled data in the target domain. Given the pervasive difficulty of obtaining labeled data in industrial settings, this study focuses on the application of unsupervised domain adaptation to bearing fault diagnosis.

In unsupervised domain adaptation, the labeled source domain data is defined as

D_{s} = \{(x_{i}^{s}, y_{i}^{s})\}, i = 1, \dots, n_{s}, x_{i}^{s} \in R^{d}

where $n_{s}$ is the number of source domain samples, $x_{i}^{s}$ is the ith sample of the source domain, $y_{i}^{s}$ is the corresponding fault label, $R$ is the set of real numbers, and $d$ is the feature dimension. The unlabeled target domain data is defined as

D_{t} = \{x_{j}^{t}\}, j = 1, \dots, n_{t}, x_{j}^{t} \in R^{d}

where $n_{t}$ is the number of samples in the target domain and $x_{j}^{t}$ is the jth sample in the target domain. Unsupervised domain adaptation aims to reduce the distributional gap between the source and target domains by identifying transferable features that align with the data distributions of both domains. This enables the model, trained on labeled source data, to achieve effective classification performance in the unlabeled target domain.

2.2. Multi-Scale Convolution

Convolutional networks are a type of feed-forward network that performs convolutional computation. They typically include three network sections: input, hidden, and output [22]. The framework is seen in Figure 1. Multi-scale convolutional neural networks improve typical CNN designs by including convolutional layers made up of filters with different kernel sizes. This approach allows for the extraction of defect characteristics at various degrees of granularity. As a result, using multi-scale convolutional kernels allows for the acquisition of richer and more precise feature representations, which improves the effectiveness and reliability of fault diagnostic tasks.

Multi-scale convolution is defined as

y_{i}^{k} = f \left[ \mathrm{Conv}_{k \in K} (X, W_{i}^{k}) + b_{i} \right]

(1)

where $y_{i}^{k}$ is the output of the ith feature at convolution kernel scale $k$; $X$ is the input signal; $W_{i}^{k}$ is the ith convolution kernel at scale $k$; $b_{i}$ is the bias added to the output of the ith feature; $K = [k_{1}, k_{2}, \dots, k_{n}]$ is the set of convolution kernel scales; and $f$ is the activation function.
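As a rough illustration of Equation (1), the following sketch applies one kernel per scale to a 1-D signal and keeps one activated feature map per scale. It is a minimal pure-Python example, not the MSACNN itself: the kernel sizes (2, 3, 5), the moving-average weights, the ReLU activation, and the toy signal are all assumptions chosen for readability.

```python
def conv1d_valid(signal, kernel, bias=0.0):
    """'Valid' 1-D cross-correlation of a signal with one kernel, plus a bias."""
    k = len(kernel)
    return [
        sum(signal[t + j] * kernel[j] for j in range(k)) + bias
        for t in range(len(signal) - k + 1)
    ]


def relu(values):
    """Activation function f (ReLU is an assumption here)."""
    return [max(0.0, v) for v in values]


def multi_scale_conv(signal, kernels_by_scale, biases):
    """One activated feature map per kernel scale k in K, as in Eq. (1)."""
    return {
        k: relu(conv1d_valid(signal, weights, biases[k]))
        for k, weights in kernels_by_scale.items()
    }


# Hypothetical toy signal and three kernel scales.
x = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
kernels = {k: [1.0 / k] * k for k in (2, 3, 5)}  # simple averaging kernels
biases = {k: 0.0 for k in kernels}

features = multi_scale_conv(x, kernels, biases)
for k, fmap in features.items():
    print(f"scale {k}: {len(fmap)} output values")
```

Small kernels respond to short, sharp transients while larger kernels summarize longer patterns, which is the motivation for combining several scales.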

The input layers mainly take the vibration signal of the bearing as input. The hidden layers mainly consist of multi-scale convolutional operations (convolutional layers), pooling operations (pooling layers), and activation functions, which together facilitate the fault feature extraction process. Each convolutional operation is immediately followed by pooling operations. The output layers are mainly responsible for the fusion of the extracted features to enable the recognition and classification output of rolling bearing faults, which mainly consist of fully connected (FC) layers and a Softmax layer.

In order to obtain information across different dimensions, three convolutional kernels with different sizes are designed. In addition, in order to adjust the ability to pay attention to the key features and assign weight to the important features, the convolutional attention module CBAM is introduced, whose structure is shown in Figure 2.

2.3. Convolutional Block Attention Module

As shown in Figure 3, the convolutional block attention module (CBAM) is composed of two elements: the channel attention module (CAM) and the spatial attention module (SAM). CBAM is an embeddable, lightweight attention module that enhances its features by combining the spatial and channel mechanisms. It aims to improve the capability of neural network feature representation by highlighting useful features while suppressing unnecessary ones. By using CBAM to weight the sample features, adaptive weighting of the sample features can be achieved, suppressing the possible negative effects of noise or useless features in the migration process, thus improving the model’s migration capability.

As seen in Figure 4, the channel attention module pools the input features over the spatial dimensions (height and width) using both MaxPool and AvgPool operations. A shared multi-layer perceptron (MLP) then processes each pooled descriptor. The two MLP outputs are summed element-wise and passed to a Sigmoid activation function to generate the channel attention map. The process is detailed below:

M_{c} (F) = σ \{ \mathrm{MLP} [\mathrm{MaxPool} (F)] + \mathrm{MLP} [\mathrm{AvgPool} (F)] \} = σ \{ W_{1} [W_{0} (F_{max}^{c})] + W_{1} [W_{0} (F_{avg}^{c})] \}

(2)

where $M_{c} (F)$ is the channel attention feature, $σ$ is the Sigmoid activation function, $F$ is the input feature, MLP is the multi-layer perceptron, MaxPool is maximum pooling, AvgPool is average pooling, $F_{max}^{c}$ and $F_{avg}^{c}$ denote the max-pooled and average-pooled features of the channel attention module, respectively, and $W_{0}$ and $W_{1}$ denote the weights of the two fully connected layers of the MLP, which are shared between the two pooled inputs.
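The channel attention computation in Equation (2) can be sketched in a few lines of pure Python. This is an illustrative example under stated assumptions: the feature map is a list of channels with the spatial positions flattened, the weight matrices `W0` and `W1` are made-up values, and a ReLU is placed between the two MLP layers as in the usual CBAM design (the equation itself does not show it).

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def shared_mlp(vector, w0, w1):
    """Two-layer MLP W1(ReLU(W0 v)); the ReLU is an assumption."""
    hidden = [max(0.0, h) for h in matvec(w0, vector)]
    return matvec(w1, hidden)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def channel_attention(feature_map, w0, w1):
    """Eq. (2): one attention weight per channel."""
    f_max = [max(channel) for channel in feature_map]
    f_avg = [sum(channel) / len(channel) for channel in feature_map]
    mlp_max = shared_mlp(f_max, w0, w1)
    mlp_avg = shared_mlp(f_avg, w0, w1)
    return [sigmoid(a + b) for a, b in zip(mlp_max, mlp_avg)]


# Hypothetical feature map: 4 channels, 3 spatial positions each.
F = [[0.1, 0.9, 0.2], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.5, -0.5, 0.0]]
W0 = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]        # 4 -> 2 units
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]   # 2 -> 4 units

weights = channel_attention(F, W0, W1)
refined = [[w * v for v in channel] for w, channel in zip(weights, F)]
print([round(w, 3) for w in weights])
```

Each channel of the input is rescaled by its attention weight, so channels the MLP scores as informative are emphasized and the others are suppressed.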

Figure 5 depicts the spatial attention module (SAM), which captures spatial dependencies in the feature maps and uses them to build attention maps that highlight the significance of target locations. SAM takes the output of the channel attention module (CAM) as its input. First, channel-wise average pooling and max pooling generate two spatial feature maps. These two maps are then concatenated along the channel dimension and passed through a convolutional layer with a 7 × 7 kernel, followed by a Sigmoid function, to produce the spatial attention map. The detailed process is given below:

M_{s} (F) = σ ( f^{7 \times 7} ( [\mathrm{AvgPool} (F); \mathrm{MaxPool} (F)] ) ) = σ ( f^{7 \times 7} ( [F_{avg}^{s}; F_{max}^{s}] ) )

(3)

where $M_{s} (F)$ is the spatial attention feature, $σ$ is the Sigmoid activation function, $F$ is the feature map output by the CAM, $f^{7 \times 7}$ is a convolution operation with a 7 × 7 kernel, and $F_{max}^{s}$ and $F_{avg}^{s}$ denote the channel-wise max-pooled and average-pooled feature maps of the spatial attention module, respectively.
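The spatial attention step in Equation (3) can be illustrated with a small pure-Python sketch. The 8 × 8 toy feature map, the constant kernel weights, and the zero padding that keeps the output the same size as the input are assumptions made for the example; a real model would learn the 7 × 7 kernel.

```python
import math


def spatial_attention(feature_map, kernel, bias=0.0):
    """Eq. (3). feature_map[c][h][w]; kernel[m][i][j], m = 0 (avg) or 1 (max)."""
    channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    # Channel-wise pooling yields two H x W maps.
    avg_map = [[sum(feature_map[c][h][w] for c in range(channels)) / channels
                for w in range(width)] for h in range(height)]
    max_map = [[max(feature_map[c][h][w] for c in range(channels))
                for w in range(width)] for h in range(height)]
    stacked = [avg_map, max_map]  # concatenation along the channel axis
    k = len(kernel[0])
    pad = k // 2
    attention = []
    for h in range(height):
        row = []
        for w in range(width):
            total = bias
            for m in range(2):
                for i in range(k):
                    for j in range(k):
                        hh, ww = h + i - pad, w + j - pad
                        # Positions outside the map count as zero padding.
                        if 0 <= hh < height and 0 <= ww < width:
                            total += kernel[m][i][j] * stacked[m][hh][ww]
            row.append(1.0 / (1.0 + math.exp(-total)))  # Sigmoid
        attention.append(row)
    return attention


# Hypothetical input: 3 channels of size 8 x 8, and a constant 7 x 7 kernel.
F = [[[float(c + h + w) for w in range(8)] for h in range(8)] for c in range(3)]
K = [[[0.01] * 7 for _ in range(7)] for _ in range(2)]

M = spatial_attention(F, K)
print(len(M), len(M[0]))
```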

2.4. JMMD and CORAL

MMD and MK-MMD align only the marginal distributions and cannot resolve the domain shift arising from the joint distribution of inputs and outputs. To address such domain shift more effectively, the JMMD is used. JMMD evaluates the divergence between the joint distributions of data from the source and target domains, thereby enhancing domain adaptation performance. The corresponding loss function of JMMD is defined as follows:

L_{JMMD} (P, Q) = \left\| E_{P} \left( \otimes_{l = 1}^{|L|} φ^{l} (z^{s l}) \right) - E_{Q} \left( \otimes_{l = 1}^{|L|} φ^{l} (z^{t l}) \right) \right\|_{\otimes_{l = 1}^{|L|} H^{l}}^{2}

(4)

where $P$ is the distribution of the source domain, $Q$ is the distribution of the target domain, $|L|$ is the number of layers in the corresponding set, $\otimes_{l = 1}^{|L|} φ^{l} (z^{s l})$ is the feature mapping of the source domain data in the Hilbert space, $\otimes_{l = 1}^{|L|} φ^{l} (z^{t l})$ is the feature mapping of the target domain data in the Hilbert space, $z^{s l}$ is the activation generated by the source domain in the lth layer, $z^{t l}$ is the activation generated by the target domain in the lth layer, and $H^{l}$ is the Reproducing Kernel Hilbert Space (RKHS) corresponding to the lth layer.
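In practice, the squared norm in Equation (4) is usually estimated with the kernel trick, where the tensor product of per-layer feature maps becomes a product of per-layer kernels. The sketch below shows one such empirical estimate in pure Python. The Gaussian kernel with a fixed bandwidth of 1.0, the choice of two layers per sample, and the toy activations are assumptions for illustration only.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel between two activation vectors (bandwidth is assumed)."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def joint_kernel(sample_a, sample_b):
    """Product of per-layer kernels; a sample is a sequence of layer activations."""
    value = 1.0
    for layer_a, layer_b in zip(sample_a, sample_b):
        value *= gaussian_kernel(layer_a, layer_b)
    return value


def jmmd(source, target):
    """Biased empirical estimate of the squared JMMD in Eq. (4)."""
    def mean_kernel(xs, ys):
        total = sum(joint_kernel(x, y) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


# Hypothetical samples: (feature-layer activation, classifier-layer output).
source = [([0.0, 0.0], [0.9, 0.1]), ([1.0, 0.0], [0.8, 0.2]), ([0.0, 1.0], [0.7, 0.3])]
target = [([2.0, 2.0], [0.2, 0.8]), ([3.0, 2.0], [0.3, 0.7]), ([2.0, 3.0], [0.1, 0.9])]

print("JMMD(source, source):", jmmd(source, source))
print("JMMD(source, target):", jmmd(source, target))
```

The estimate vanishes when both sample sets are identical and grows as the joint activations of the two domains drift apart.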

To reduce feature distribution discrepancies between the source and target domains, JMMD is incorporated into the overall loss function. The loss function is defined as follows:

L = L_{c} + λ_{J M M D} L_{J M M D} (D_{s}, D_{t})

(5)

where $L_{c}$ is the cross-entropy loss function; $λ_{JMMD}$ is the trade-off parameter of the total loss function; $D_{s}$ is the source domain dataset; and $D_{t}$ is the target domain dataset.

Correlation alignment (CORAL) utilizes a linear transformation to align the second-order statistical properties of feature distributions between the source and target domains, and has demonstrated effectiveness in unsupervised pre-adaptation tasks. Specifically, the CORAL loss computes the second-order covariance distance between the source and target domains to quantify distributional discrepancies. The formula is as follows:

L_{CORAL} = \frac{1}{4 d^{2}} \| C_{s} - C_{t} \|_{F}^{2}

(6)

where $d$ is the feature dimension of each sample and $\| \cdot \|_{F}^{2}$ is the squared matrix Frobenius norm. The covariance matrices of the two domains are obtained as follows:

C_{s} = \frac{X_{s}^{T} X_{s} - \frac{1}{n_{s}} {(I^{T} X_{s})}^{T} (I^{T} X_{s})}{n_{s} - 1}

(7)

C_{t} = \frac{X_{t}^{T} X_{t} - \frac{1}{n_{t}} {(I^{T} X_{t})}^{T} (I^{T} X_{t})}{n_{t} - 1}

(8)

where $C_{s}$ is the covariance matrix of the source domain data, $C_{t}$ is the covariance matrix of the target domain data, $X_{s}$ is the matrix of source domain training samples (one sample per row), $X_{t}$ is the matrix of target domain training samples, $n_{s}$ is the number of source domain training samples, $n_{t}$ is the number of target domain training samples, and $I$ is the column vector with all elements equal to one.
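A small numerical example helps make Equations (6) to (8) concrete. The following pure-Python sketch computes the two covariance matrices and the CORAL loss for toy data; the sample values are made up, and the samples are stored one per row.

```python
def covariance(samples):
    """Eq. (7)/(8): covariance of an n x d sample matrix given as a list of rows."""
    n, d = len(samples), len(samples[0])
    column_sums = [sum(row[j] for row in samples) for j in range(d)]  # 1^T X
    return [
        [
            (sum(row[a] * row[b] for row in samples)
             - column_sums[a] * column_sums[b] / n) / (n - 1)
            for b in range(d)
        ]
        for a in range(d)
    ]


def coral_loss(source, target):
    """Eq. (6): squared Frobenius distance between covariances, over 4 d^2."""
    d = len(source[0])
    c_s, c_t = covariance(source), covariance(target)
    frobenius_sq = sum(
        (c_s[a][b] - c_t[a][b]) ** 2 for a in range(d) for b in range(d)
    )
    return frobenius_sq / (4.0 * d * d)


# Hypothetical 2-D samples: the target is the source scaled by a factor of 2.
source = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
target = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]]

print(coral_loss(source, source))  # identical covariances
print(coral_loss(source, target))
```

Here the source covariance is (4/3) times the identity and the target covariance is (16/3) times the identity, so the squared Frobenius distance is 32 and the loss is 32 / 16 = 2.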

The gradient with respect to the input features is calculated using the chain rule. Because $C_{t}$ enters the loss with a negative sign, the target-side gradient carries a minus sign:

\frac{\partial L_{CORAL}}{\partial X_{s}^{i j}} = \frac{1}{d^{2} (n_{s} - 1)} \left( \left( X_{s}^{T} - \frac{1}{n_{s}} (I^{T} X_{s})^{T} I^{T} \right)^{T} (C_{s} - C_{t}) \right)^{i j}

(9)

\frac{\partial L_{CORAL}}{\partial X_{t}^{i j}} = - \frac{1}{d^{2} (n_{t} - 1)} \left( \left( X_{t}^{T} - \frac{1}{n_{t}} (I^{T} X_{t})^{T} I^{T} \right)^{T} (C_{s} - C_{t}) \right)^{i j}

(10)

where $X_{s}^{i j}$ is the jth dimension of the ith source domain sample and $X_{t}^{i j}$ is the jth dimension of the ith target domain sample.
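One way to sanity-check the source-side gradient in Equation (9) is to compare it with a central finite difference of the CORAL loss. The sketch below does this in pure Python on made-up data; the helper names, the step size `eps`, and the sample values are assumptions for the example.

```python
def covariance(samples):
    """Covariance of an n x d sample matrix given as a list of rows."""
    n, d = len(samples), len(samples[0])
    sums = [sum(row[j] for row in samples) for j in range(d)]
    return [[(sum(r[a] * r[b] for r in samples) - sums[a] * sums[b] / n) / (n - 1)
             for b in range(d)] for a in range(d)]


def coral_loss(source, target):
    d = len(source[0])
    c_s, c_t = covariance(source), covariance(target)
    return sum((c_s[a][b] - c_t[a][b]) ** 2
               for a in range(d) for b in range(d)) / (4.0 * d * d)


def coral_grad_source(source, target):
    """Analytic gradient of Eq. (9): centered source data times (C_s - C_t)."""
    n, d = len(source), len(source[0])
    means = [sum(row[j] for row in source) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in source]
    c_s, c_t = covariance(source), covariance(target)
    diff = [[c_s[a][b] - c_t[a][b] for b in range(d)] for a in range(d)]
    scale = d * d * (n - 1)
    return [[sum(centered[i][k] * diff[k][j] for k in range(d)) / scale
             for j in range(d)] for i in range(n)]


def numeric_grad(source, target, i, j, eps=1e-6):
    """Central finite difference of the loss with respect to source[i][j]."""
    plus = [row[:] for row in source]
    minus = [row[:] for row in source]
    plus[i][j] += eps
    minus[i][j] -= eps
    return (coral_loss(plus, target) - coral_loss(minus, target)) / (2.0 * eps)


# Hypothetical toy data.
source = [[0.0, 1.0], [2.0, 0.5], [1.0, 3.0]]
target = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [2.0, 2.0]]

analytic = coral_grad_source(source, target)
max_error = max(abs(analytic[i][j] - numeric_grad(source, target, i, j))
                for i in range(len(source)) for j in range(len(source[0])))
print("largest analytic/numeric difference:", max_error)
```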

JMMD reduces inter-domain distributional differences by minimizing the maximum mean discrepancy between the joint distributions of the two domains, which takes higher-order statistical differences into account. CORAL reduces lower-order (second-order) statistical differences by aligning the covariance matrices of the source and target domains. Because CORAL and JMMD operate on different statistics with different optimization goals, combining them reduces the inter-domain distributional differences more comprehensively than either alone, especially in complex transfer learning tasks. In addition, the CORAL approach is simple to implement and requires neither a complex network structure nor additional optimization objectives.

Considering the relative importance of the marginal and conditional distributions, a dynamic balance parameter is added to adjust the relative importance between them. The formula is as follows:

L = ζ_{cls} + α \sum_{l \in Γ} {CORAL}_{m} + (1 - α) {JMMD}_{c} - L_{D}

(11)

L_{D} = \sum_{j \in \{s, t\}} E_{S_{j} \sim D_{j}} \left[ \frac{1}{2} \log G_{d} (M_{j} (x_{j})) + \frac{1}{2} \log (1 - G_{d} (M_{j} (x_{j}))) \right]

(12)

where $ζ_{cls}$ is the cross-entropy loss, $α$ is a hyperparameter that controls the relative weight of the CORAL and JMMD parts of the loss, $\sum_{l \in Γ} {CORAL}_{m}$ is the CORAL loss, ${JMMD}_{c}$ is the JMMD loss (weighted by $1 - α$), $\sum_{j \in \{s, t\}} E_{S_{j} \sim D_{j}}$ is the expected value over the samples of each domain, $\frac{1}{2} \log G_{d} (M_{j} (x_{j}))$ is the source-domain adversarial loss term, $\frac{1}{2} \log (1 - G_{d} (M_{j} (x_{j})))$ is the target-domain adversarial loss term, and $G_{d}$ is a discriminator that determines whether the samples are from the source domain.
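To show how the terms of Equations (11) and (12) fit together, the sketch below combines already-computed loss values into the total objective. It is only an arithmetic illustration: all numbers are invented, the rule for choosing `alpha` is not specified in this section and is therefore passed in as a plain argument, and the discriminator outputs are assumed to be probabilities strictly between 0 and 1.

```python
import math


def adversarial_term(probs_by_domain):
    """Eq. (12): average of the two half-weighted log terms, summed over domains."""
    total = 0.0
    for probs in probs_by_domain.values():
        total += sum(0.5 * math.log(p) + 0.5 * math.log(1.0 - p)
                     for p in probs) / len(probs)
    return total


def total_loss(cls_loss, coral_terms, jmmd_term, l_d, alpha):
    """Eq. (11): classification + weighted CORAL and JMMD terms - L_D."""
    return cls_loss + alpha * sum(coral_terms) + (1.0 - alpha) * jmmd_term - l_d


# Hypothetical values for one mini-batch.
discriminator_outputs = {"s": [0.5, 0.5], "t": [0.5]}
l_d = adversarial_term(discriminator_outputs)
loss = total_loss(cls_loss=0.4, coral_terms=[0.1, 0.3], jmmd_term=0.2,
                  l_d=l_d, alpha=0.5)
print(round(l_d, 4), round(loss, 4))
```

With a fully confused discriminator (all outputs 0.5), each domain contributes log 0.5, so L_D equals -2 log 2 and the total is 0.4 + 0.2 + 0.1 + 2 log 2.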

By combining classification loss, CORAL loss, JMMD loss, and DANN network adversarial loss (see the next subsection for details) to achieve distributional alignment between source and target domains and reduce distributional differences (as seen in Figure 6), the model’s performance on the target domain is improved, and its domain generalization ability is enhanced.

JMMD enables the simultaneous alignment of both marginal and conditional distributions between the source and target domains, addressing the inconsistency in joint distributions of multivariate features within domain adaptation. In contrast to MMD and MK-MMD, which focus solely on aligning marginal distributions, JMMD captures more intricate feature dependencies. This leads to enhanced generalization capability of cross-domain models, improved performance in complex tasks, and higher accuracy in fault diagnosis applications.

When discrepancies exist between marginal and conditional distributions, relying solely on one often results in degraded diagnostic performance. While existing approaches typically assign equal importance to both distribution types, their relative significance is in fact data-dependent. To address this, we introduce a balancing factor that adaptively adjusts the weighting between marginal and conditional distances, allowing for more flexible and accurate distribution alignment.

2.5. DANN

DANN is a domain adaptation method based on adversarial training [23]. This approach further minimizes the distribution discrepancy between the source and target domains, thereby enhancing the performance of cross-domain transfer. The loss function of the DANN network mainly consists of the classification loss $L_{y}$ and the domain classification loss $L_{d}$:

L_{y} = \log \frac{1}{G_{y} [G_{f} (x_{i})]_{y_{i}}}

(13)

L_{d} = d_{i} \log \frac{1}{G_{d} [G_{f} (x_{i})]} + (1 - d_{i}) \log \frac{1}{1 - G_{d} [G_{f} (x_{i})]}

(14)

where $G_{f} (x_{i})$ is the feature representation of the sample obtained by the feature extractor, $G_{y} [\cdot]$ and $G_{d} [\cdot]$ are the outputs of the label classifier and the domain discriminator, respectively, and $d_{i}$ is the domain label of the sample.
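The two DANN losses can be evaluated by hand for a single sample, as in the short sketch below. The domain loss is written here as the standard binary cross-entropy, which is how this loss is normally defined for DANN; the class probabilities, the discriminator output, and the 0/1 coding of the domain label are illustrative assumptions.

```python
import math


def label_loss(class_probs, true_label):
    """Eq. (13): negative log-probability assigned to the true fault class."""
    return math.log(1.0 / class_probs[true_label])


def domain_loss(domain_prob, domain_label):
    """Binary cross-entropy form of the domain loss for one sample."""
    return (domain_label * math.log(1.0 / domain_prob)
            + (1 - domain_label) * math.log(1.0 / (1.0 - domain_prob)))


# Hypothetical outputs for one sample.
class_probs = [0.7, 0.1, 0.1, 0.1]   # softmax over four health states
l_y = label_loss(class_probs, true_label=0)
l_d_match = domain_loss(0.9, domain_label=1)     # discriminator agrees with label
l_d_mismatch = domain_loss(0.9, domain_label=0)  # discriminator disagrees
print(round(l_y, 4), round(l_d_match, 4), round(l_d_mismatch, 4))
```

The domain loss is small when the discriminator output matches the domain label and large otherwise, which is what the feature extractor exploits when it is trained to confuse the discriminator.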

The DANN architecture is primarily composed of a feature extractor, a domain classifier, and a label predictor, as illustrated in Figure 7. The Gradient Reversal Layer (GRL) is the key component. Its role is to invert the gradient when passing the loss from the domain classifier back to the feature extractor, so that the feature extractor produces “domain indistinguishable” features. For the label loss, the feature extractor performs normal gradient descent to perform well on the source-domain label prediction task. For the domain loss, the gradient inversion layer changes the sign of the gradient as it propagates to the feature extractor, thus forcing the feature extractor to generate features that are domain indistinguishable, preventing the domain classifier from easily distinguishing the source of the features.
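The behavior of the gradient reversal layer can be summarized in a minimal, framework-free sketch: identity in the forward pass, sign flip in the backward pass. The class name, the scaling factor `lam` (a common addition in DANN implementations that this section does not mention), and the toy gradient values are assumptions.

```python
class GradientReversal:
    """Identity on the forward pass; negated (and scaled) gradient on backward."""

    def __init__(self, lam=1.0):
        self.lam = lam  # reversal strength; an assumed, commonly used knob

    def forward(self, features):
        # Features pass through unchanged to the domain classifier.
        return list(features)

    def backward(self, grad_output):
        # The gradient coming back from the domain classifier changes sign
        # before it reaches the feature extractor.
        return [-self.lam * g for g in grad_output]


grl = GradientReversal(lam=1.0)
features = [0.2, -1.3, 0.7]
domain_grad = [0.5, -0.1, 0.0]   # hypothetical dL_d / d(features)

passed = grl.forward(features)
reversed_grad = grl.backward(domain_grad)

# The feature extractor sees the label-loss gradient unchanged and the
# domain-loss gradient reversed.
label_grad = [0.1, 0.2, -0.3]    # hypothetical dL_y / d(features)
extractor_grad = [a + b for a, b in zip(label_grad, reversed_grad)]
print(passed, reversed_grad, extractor_grad)
```

In a deep learning framework this is typically implemented as a custom autograd operation, so that a single backward pass trains the domain classifier to separate domains while pushing the feature extractor toward domain-indistinguishable features.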

3. Bearing Fault Diagnosis Model Based on Migration Learning

This paper takes rolling bearings as the research object and studies a bearing fault diagnosis method based on transfer learning. An MSACNN network is constructed to automatically extract fault features directly from the raw signals. By leveraging domain adaptation, fault diagnosis knowledge is transferred and applied to different scenarios, addressing issues such as scarce available data, insufficient labeled information, and significant differences in domain distribution. The diagnostic flowchart is shown in Figure 8, the structural diagram in Figure 9, and the structural parameter table in Table 1. The diagnostic process consists of the following steps:

Feature extraction: the MSACNN model is applied to extract multi-scale features from complex signals, ensuring stable and accurate cross-domain fault detection.

Domain adaptation: a combination of classification loss, CORAL loss, JMMD loss, and adversarial loss from the DANN is used to align the distributions between source and target domains.

Model training and diagnosis: the model is trained by updating network weights, and the diagnostic result is obtained once the model has converged.

4. Results

4.1. Introduction to the Experimental Setup and Open Bearing Dataset

To ensure a fair comparison between the proposed method and the comparison methods during training, all methods used the same hyperparameters: a batch size of 64, 180 epochs, and an initial learning rate of 0.001. The experiments were run on an Intel Core i7-14700KF 3.40 GHz processor and an NVIDIA GTX 4060 GPU, with PyTorch 1.13 as the software environment.

To evaluate the performance of the proposed model, a cross-machine fault diagnosis task was conducted using bearing datasets from Jiangnan University (JNU) and Case Western Reserve University (CWRU).

The Jiangnan University bearing dataset was collected using an experimental bench equipped with a Mitsubishi SB-JR induction motor, which drives a centrifugal fan system to carry out fault diagnosis tests. As shown in Figure 10, the setup mainly consists of a signal recorder, accelerometer, and amplifier. This open dataset was specifically designed for research on bearing fault diagnosis and predictive maintenance, with a sampling frequency of 50 kHz. The Jiangnan University bearing data acquisition device collects fault data at three rotational speeds: 600 r/min, 800 r/min, and 1000 r/min. Each rotational speed contains three types of faults, namely, inner ring fault (IB), outer ring fault (OB), and rolling body fault (TB), as well as the normal state (N).

The Case Western Reserve University experimental setup is shown in Figure 11. The experimental rig consists of a 2 Hp motor, a dynamometer, and a torque sensor. This experiment uses the bearing drive end (DE) dataset, with a data sampling frequency of 12 kHz. Based on the bearing fault location, there are four health states, namely, inner ring fault (IF), ball fault (BF), outer ring fault (OF), and normal condition (NC). Since the Jiangnan University bearing dataset is a four-class classification dataset, we likewise treated the Case Western Reserve University data as a four-class classification dataset, selecting the case with a motor speed of 1750 r/min, a load of 2 Hp, and a fault size of 0.007 inch.

A total of 1000 samples were collected for each fault category, resulting in 4000 samples in total. Each sample had a length of 3072, and the dataset was split into training and testing sets, with a ratio of 8:2. Cross-domain transfer was performed across different rotational speeds, specifically 600 r/min, 800 r/min, 1000 r/min, and 1750 r/min, corresponding to working conditions A1, B1, C1, and D1, respectively. In each transfer task, one condition was designated as the source domain and another as the target domain. Based on this setup, 12 domain adaptation tasks were constructed, as detailed in Table 2.
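The task construction described above can be reproduced with a few lines of Python. This sketch assumes that the 12 tasks are all ordered source/target pairs of the four working conditions and that the 4000 samples refer to a single working condition; Table 2 remains the authoritative list.

```python
from itertools import permutations

# Working conditions and their rotational speeds in r/min.
conditions = {"A1": 600, "B1": 800, "C1": 1000, "D1": 1750}

# Every ordered (source, target) pair of distinct conditions.
tasks = [f"{source} -> {target}" for source, target in permutations(conditions, 2)]

# Dataset sizes for one condition: 4 classes x 1000 samples, split 8:2.
samples_per_class, n_classes, sample_length = 1000, 4, 3072
total_samples = samples_per_class * n_classes
n_train = total_samples * 8 // 10
n_test = total_samples - n_train

print(len(tasks), tasks[:3])
print(total_samples, n_train, n_test)
```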

4.2. Comparative Experiments and Analysis of Results

To validate the effectiveness of the proposed method, a comparison study was performed against several well-established domain adaptation methods in the field of intelligent fault diagnosis, based on the aforementioned experimental design. The methods were domain adversarial neural network (DANN) [24], maximum mean discrepancy (MMD) [25], deep subdomain adaptive network (DSAN) [26], and Deep CORAL (DC) [27]. DANN uses adversarial training to close the distributional gap between the source and target domains. DSAN replaces the usual multi-kernel MMD with a local maximum mean discrepancy metric. Deep CORAL aligns features by matching the covariance of the source domain representations to that of the target domain. MMD performs feature alignment by reducing the mean difference between the source and target domains, thereby limiting the distributional discrepancy in feature space. To reduce the effect of randomness, each method was tested multiple times on each migration task.

As can be seen from Table 3, the proposed method achieves the highest diagnostic accuracy on all migration tasks, averaging 98.75% and peaking at 99.50%, which is better than the other domain adaptation methods.
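
The average and peak figures can be checked directly from the per-task accuracies of the proposed method in Table 3. The short sketch below copies only the mean values from that column (the ± terms are omitted) and uses hypothetical variable names.

```python
# Per-task accuracies (%) of the proposed method, copied from Table 3.
proposed = {
    "A1->B1": 98.65, "A1->C1": 99.50, "A1->D1": 98.60,
    "B1->A1": 98.17, "B1->C1": 98.45, "B1->D1": 97.87,
    "C1->A1": 98.50, "C1->B1": 99.27, "C1->D1": 99.32,
    "D1->A1": 98.57, "D1->B1": 98.80, "D1->C1": 99.33,
}

# Mean over the 12 tasks and the task with the highest accuracy.
mean_acc = sum(proposed.values()) / len(proposed)
best_task = max(proposed, key=proposed.get)
```

Rounded to two decimals, `mean_acc` reproduces the 98.75% average, and `best_task` is the A1 → C1 task with 99.50%.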

Specifically, DC, MMD, and DANN adapt only the global domain features, i.e., they align only the marginal distributions, and their average diagnostic accuracies are 92.35%, 84.07%, and 92.43%, respectively. The DSAN method introduces the local maximum mean discrepancy criterion and reaches an average diagnostic accuracy of 94.54%. The method in this paper aligns the marginal and conditional distributions of the source and target domains simultaneously in the domain adaptation stage and dynamically adjusts their weights according to their relative importance, further reducing the distribution difference. Its average diagnostic accuracy of 98.75% is the highest among the compared methods, indicating that, when diagnosing bearing faults across different working conditions or equipment types, the model can cope with large domain distribution differences and accurately identify the fault type in the target domain, making it applicable to variable working conditions in industrial environments.

As shown in Table 4, taking the migration task A1 → D1 as an example, with 4000 training samples, the training time required by the proposed method is 0.105 s, which is longer than that of the MMD method (0.058 s) and the DC method (0.098 s), but shorter than that of the DANN method (0.113 s) and the DSAN method (0.173 s). Considering both fault diagnosis accuracy and computational efficiency, the transfer learning-based bearing diagnosis method proposed in this paper has certain advantages.

To visualize the classification performance of the migration fault diagnosis model, this paper analyzes the migration task by means of a multi-class confusion matrix and loss-rate graphs. The confusion matrix directly displays the relationship between the true labels and the model-predicted labels for all fault types. The vertical axis of the matrix represents the true labels and the horizontal axis represents the predicted labels. The values on the main diagonal are the numbers of correctly classified samples for each fault type, and the remaining values are the numbers of samples misclassified as other fault types. Taking the cross-device migration task A1 → D1 as an example, the confusion matrix of the model proposed in this paper, along with those of the DC, DANN, and DSAN models, is shown in Figure 12.
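
As an illustration of how such a matrix is assembled and read, the following sketch builds a four-class confusion matrix with rows as true labels and columns as predicted labels, then derives accuracy from the main diagonal. The label vectors are toy values chosen for the example, not predictions from the paper, and the function names are hypothetical.

```python
import numpy as np


def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Count samples per (true, predicted) pair; rows are true classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        matrix[t, p] += 1
    return matrix


def accuracy(matrix):
    """Share of all samples that lie on the main diagonal."""
    return np.trace(matrix) / matrix.sum()


# Toy labels for illustration only (0 = NC, 1 = IF, 2 = OF, 3 = BF).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 1]
cm = confusion_matrix(y_true, y_pred, n_classes=4)
```

In this toy case, one IF sample is misclassified as OF (`cm[1, 2]`) and one BF sample as IF (`cm[3, 1]`), so six of the eight samples fall on the main diagonal.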

From Figure 12, it can be seen that the classification accuracy of the model proposed in this paper reaches 98.60%, which is significantly higher than that of DC, DANN, and DSAN (92.15%, 93.95%, and 95.37%, respectively). The higher values on the main diagonal indicate more accurate classification results for each fault type. In addition, the proportion of misclassifications off the main diagonal is significantly reduced, reflecting the strong adaptability of the model to distributional differences.

Figure 13 shows how the classification losses in the target and source domains change over 180 training iterations. Here, the blue curve indicates the classification loss of the target domain (target_cls_loss), and the orange curve indicates the classification loss of the source domain (source_cls_loss).

In the initial stage, over the first 25 training iterations, the loss of every method decreases quickly, indicating that the methods capture effective features early in training. At the 25th iteration, the loss values of DC, DANN, DSAN, and the proposed method are 6.76, 4.17, 3.15, and 2.76 in the target domain and 2.44, 2.95, 1.59, and 1.27 in the source domain, respectively. The proposed method therefore descends fastest and has the lowest loss at this point, indicating that its feature extraction is the most effective.

The overall trend shows that all methods gradually learn and converge. The loss values of DC, DANN, DSAN, and the proposed method finally converge to 4.23, 2.52, 1.32, and 0.71 in the target domain, and to 1.21, 0.75, 0.51, and 0.33 in the source domain, respectively. The proposed method has the fastest convergence and the lowest loss values during iterative training, stabilizing earlier and converging better than the other methods.

From the perspective of domain adaptation, the target and source domain losses of the proposed model are closest to each other: the blue curve lies nearest to the orange curve, and the target-source differences for DC, DANN, DSAN, and the proposed method are 3.02, 1.77, 0.82, and 0.38, respectively. Domain adaptation aims to minimize the distribution discrepancy between the source and target domains, so when the loss values of both domains converge toward each other, the model has effectively aligned their feature representations. This demonstrates that the proposed method not only achieves high classification performance on the source domain but also exhibits strong generalization capability on the target domain.
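
The target-source gap used in this comparison is simply the difference between the two converged losses. The sketch below recomputes it from the converged values quoted in the text; the dictionary and variable names are hypothetical. Because the quoted losses are already rounded to two decimals, a recomputed gap may differ from the quoted one in the last digit (for DSAN, 1.32 − 0.51 gives 0.81, while the text reports 0.82).

```python
# Converged classification losses quoted in the text: (target, source).
final_losses = {
    "DC": (4.23, 1.21),
    "DANN": (2.52, 0.75),
    "DSAN": (1.32, 0.51),
    "Proposed": (0.71, 0.33),
}

# Gap between target and source loss; a smaller gap suggests better alignment.
gaps = {name: round(tgt - src, 2) for name, (tgt, src) in final_losses.items()}
smallest = min(gaps, key=gaps.get)
```

The ordering is unaffected by the rounding difference: the proposed method has the smallest gap, followed by DSAN, DANN, and DC.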

Furthermore, the experimental results show that even when the model transfers well, the source and target domain loss curves do not completely overlap. The target domain loss approaches the source domain loss but remains slightly higher, primarily because the target domain has no true labels and exhibits distributional bias. If the target domain loss were lower than the source domain loss (the blue curve below the orange curve), it would suggest that the model is overfitting to the target domain, or that some form of supervised target domain setting is being used instead of standard unsupervised domain adaptation.

4.3. Ablation Experiment and Result Analysis

To validate the effectiveness of each module in the fault diagnosis model proposed in this paper, ablation experiments were conducted based on the experimental data settings described in Section 4.1. Ablation Experiment 1: only the CBAM module was removed. Ablation Experiment 2: the CORAL module and the DANN adversarial mechanism module were removed, while the JMMD module was retained. Ablation Experiment 3: the JMMD module and the DANN adversarial mechanism module were removed, while the CORAL module was retained. Ablation Experiment 4: the JMMD and CORAL modules were removed, while the DANN adversarial mechanism module was retained. Ablation Experiment 5: the DANN adversarial mechanism module was removed, while the IJMMD module was retained. To reduce the influence of random error, each configuration was tested multiple times on every migration task.
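
The five ablation settings can be summarized as a small configuration table. The sketch below is one hypothetical way to encode them; the class and field names are illustrative and not taken from the authors' code. It assumes that CBAM stays enabled in Experiments 2 to 5, since only Experiment 1 is described as removing it, and it treats IJMMD as JMMD combined with CORAL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AblationConfig:
    cbam: bool   # attention module in the feature extractor
    jmmd: bool   # joint maximum mean discrepancy loss
    coral: bool  # correlation alignment loss
    dann: bool   # adversarial domain discriminator loss


# Assumption: CBAM is kept in Experiments 2-5 (only Experiment 1 removes it).
ABLATIONS = {
    "Experiment 1": AblationConfig(cbam=False, jmmd=True, coral=True, dann=True),
    "Experiment 2": AblationConfig(cbam=True, jmmd=True, coral=False, dann=False),
    "Experiment 3": AblationConfig(cbam=True, jmmd=False, coral=True, dann=False),
    "Experiment 4": AblationConfig(cbam=True, jmmd=False, coral=False, dann=True),
    "Experiment 5": AblationConfig(cbam=True, jmmd=True, coral=True, dann=False),
    "Proposed": AblationConfig(cbam=True, jmmd=True, coral=True, dann=True),
}


def uses_ijmmd(cfg):
    """IJMMD is active when both JMMD and CORAL are enabled."""
    return cfg.jmmd and cfg.coral
```

Under this encoding, Experiments 2 and 5 differ only in the CORAL flag, so comparing them isolates the contribution of CORAL to the JMMD criterion.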

As shown in Tables 5 and 6, after the CBAM was removed in Experiment 1, the average accuracy was only 85.08%. This indicates that the attention mechanism module assigns weights to important features, enabling the extraction of features that better characterize fault types. Accurate feature extraction is crucial for bearing diagnosis. Comparing Experiment 2 with Experiment 5, i.e., JMMD with IJMMD, the corresponding average diagnostic accuracies were 94.20% and 95.92%, respectively. This indicates that CORAL improves the performance of JMMD by aligning the covariance matrices of the source and target domains, thereby reducing differences in second-order statistics. When CORAL is combined with JMMD (IJMMD), the two criteria jointly reduce distribution differences between domains, achieving better transfer and higher accuracy than the unimproved JMMD.

Figure 14 shows the loss-rate change curves for each ablation experiment. Experiment 1 corresponds to Figure 14a. Due to the removal of the CBAM, the model finds it difficult to capture effective features, resulting in particularly slow feature extraction before 50 training iterations. After 50 training iterations, the curve gradually stabilizes, but the convergence characteristics are poor. Comparing Experiment 2 (Figure 14b), Experiment 3 (Figure 14c), and Experiment 5 (Figure 14e), it can be seen from the figures that the loss values of both the source domain and the target domain in the IJMMD remain below 2.5 after 75 training iterations, which is lower than the loss values of the single JMMD and CORAL, and the convergence effect is better. This indicates that the IJMMD outperforms the standalone JMMD and CORAL in reducing domain distribution differences. It demonstrates that this paper successfully combines JMMD with CORAL, improving the performance of both JMMD and CORAL. Experiment 4, corresponding to Figure 14d, shows that by utilizing the adversarial mechanism in the DANN network, the loss values of the source domain and target domain are both below 2.0 after 50 training iterations, with good convergence characteristics. Therefore, after selecting the IJMMD domain adaptation method, this paper further introduces the adversarial mechanism of the DANN network to reduce domain distribution differences and achieve better transfer effects. As shown in Figure 14f, the proposed model brings the loss values of the target domain and source domain closer together, achieving better transfer results.

4.4. Introduction to the Experimental Setup and Our Laboratory Bearing Dataset

Bearing vibration data were collected on the bearing experimental bench in our laboratory to further verify the performance of the model in this paper. The structure of the experimental bench is shown in Figure 15. The rolling bearings used were SKF6007 deep groove ball bearings. The failure types included inner ring fault (IF), rolling element fault (BF), and outer ring fault (OF), while NF denoted the normal state. At the beginning of the experiment, the motor speed was set to 2000 r/min on the console, and the bearing vibration data were measured at a radial load of 1000 N. The sampling frequency was set to 20 kHz. A total of 1000 samples were obtained for each fault condition, totaling 4000 samples, with a sample length of 3072, and the ratio of training samples to test samples was 8:2. The samples were migrated between four rotational speeds, 800 r/min, 1200 r/min, 1600 r/min, and 2000 r/min, corresponding to working conditions A2, B2, C2, and D2, respectively. One working condition was used as the source domain and another as the target domain, giving 12 migration tasks, as shown in Table 7.

4.5. Experimental Results and Analysis

To validate the effectiveness of the proposed method, it was compared, under the above experimental setup, with domain adaptation methods commonly used in the field of intelligent fault diagnosis: DANN, MMD, DSAN, and DC. To reduce the influence of random error, each method was tested multiple times on every migration task.

As shown in Table 8, the average diagnostic accuracies of DC, MMD, DANN, and DSAN are 91.55%, 84.21%, 92.81%, and 94.41%, respectively. The method presented in this paper aligns the marginal and conditional distributions of the source and target domains simultaneously during the domain adaptation stage, performs dynamic weight adjustment, and achieves an average diagnostic accuracy of 98.58%, which remains higher than that of the other methods. The model can cope with large differences in domain distribution and accurately identify fault types in the target domain, making it applicable to variable working conditions in industrial environments.

To show the classification performance of the migration fault diagnosis model, the migration task was analyzed through the multi-class confusion matrix and the loss-rate graph. Taking the migration task A2 → B2 as an example, the confusion matrices of the model proposed in this paper, along with those of DC, DANN, and DSAN, are shown in Figure 16.

As shown in Figure 16, the classification accuracy of the model proposed in this paper reached 99.42%, which is significantly better than that of DC, DANN, and DSAN (91.15%, 92.67%, and 93.30%, respectively). The higher values on the main diagonal indicate more accurate classification results for each fault type. In addition, the proportion of misclassifications off the main diagonal is significantly reduced, reflecting the strong adaptability of the model to distributional differences.

Figure 17 shows that the model proposed in this research exhibits lower loss during iterative training than other models, the source and target domain losses are closer, and the training is the first to achieve stability with good convergence.

5. Discussion

This article presents an unsupervised domain-adaptive bearing fault diagnosis model, MSACNN-IJMMD-DANN, based on transfer learning, aiming to enhance both the accuracy and robustness of fault identification. The approach enables the transfer of fault knowledge across different operating conditions or equipment types, making it well suited for complex and variable industrial environments. The specific conclusions are as follows:

Based on practical needs, this study adopts migration learning to reduce the dependence on labeled data in the target domain. This enables efficient diagnosis when target labels are absent or scarce and addresses the high cost and complexity of acquiring labeled samples in bearing fault diagnosis.

The convolutional network model described in this paper, which integrates multi-scale and attention mechanisms, effectively mitigates the difficulty of feature extraction in bearing fault diagnosis under varying operating conditions. The migration learning-based diagnostic model achieves high accuracy across different bearing datasets.

During the domain adaptation phase, the joint maximum mean discrepancy (JMMD) and correlation alignment (CORAL) methods were employed to establish the distribution alignment criterion. In addition, the adversarial loss mechanism from the DANN was incorporated to further align the joint distribution between the source and target domains. This effectively reduces distribution discrepancies and enhances transfer performance. The approach supports unsupervised domain adaptation in bearing fault diagnosis and offers a practical solution for handling variable working conditions in industrial environments.

The method described in this paper achieved better experimental results and transfer performance than the compared methods, and it can serve as a theoretical basis for further research on unsupervised domain adaptation methods for multi-source, multi-target applications and on cross-modal transfer learning methods for more comprehensive fault diagnosis.

Author Contributions

Conceptualization, X.L. and J.W. (Jiahao Wang); methodology, X.L., J.W. (Jiahao Wang), and J.W. (Jianqiang Wang); software, J.W. (Jiahao Wang); validation, J.W. (Jiahao Wang), J.W. (Jianqiang Wang), and J.W. (Jixuan Wang); formal analysis, J.W. (Jiahao Wang); investigation, J.W. (Jiahao Wang); resources, J.W. (Jiahao Wang); data curation, J.W. (Jiahao Wang) and J.C.; writing—original draft preparation, J.W. (Jiahao Wang); writing—review and editing, J.W. (Jiahao Wang), J.W. (Jianqiang Wang), X.Y., and J.C.; visualization, J.W. (Jianqiang Wang); supervision, X.L. and Q.L.; project administration, X.L. and Q.L.; funding acquisition, X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Department of Education of Jilin Province under grant JJKH20251092CY; Changchun University, under grant ZKP202018; and Changchun Jiamei Machinery Manufacturing Co., Ltd., under grant 2024JBH01LX6.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data contained in this study are available on request from the corresponding authors, except for those published. Certain data are not released due to confidentiality.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

MSACNN: Multi-scale and attention-based convolutional neural network; IJMMD: improved joint maximum mean discrepancy; DANN: domain adversarial neural network; CORAL: correlation alignment; CNN: convolutional neural network; GAN: generative adversarial network; MMD: maximum mean discrepancy; MK-MMD: multi-kernel maximum mean discrepancy; JMMD: joint maximum mean discrepancy; DA: domain adaptation; FC: fully connected layer; CBAM: convolutional block attention module; CAM: channel attention module; SAM: spatial attention module; RKHS: Reproducing Kernel Hilbert Space; GRL: Gradient Reversal Layer; CWRU: Case Western Reserve University; JNU: Jiangnan University; DSAN: deep subdomain adaptive network; DC: Deep CORAL, Deep correlation alignment.

References

1. Tian, J.; Jiang, Y.; Zhang, J.; Luo, H.; Yin, S. A novel data augmentation approach to fault diagnosis with class-imbalance problem. Reliab. Eng. Syst. Saf. 2024, 243, 109832.
2. Lu, W.; Liu, J.; Lin, F. The Fault Diagnosis of Rolling Bearings Is Conducted by Employing a Dual-Branch Convolutional Capsule Neural Network. Sensors 2024, 24, 3384.
3. Zhong, S.; Fu, S.; Lin, L. A novel gas turbine fault diagnosis method based on transfer learning with CNN. Measurement 2019, 137, 435–453.
4. Yang, X.; Yang, J.; Jin, Y.; Liu, Z. A New Method for Bearing Fault Diagnosis across Machines Based on Envelope Spectrum and Conditional Metric Learning. Sensors 2024, 24, 2674.
5. Wang, Y.; Liang, J.; Gu, X.; Ling, D.; Yu, H. Multi-scale attention mechanism residual neural network for fault diagnosis of rolling bearings. Proc. Inst. Mech. Eng. Part C J. Mech. Eng. Sci. 2022, 236, 10615–10629.
6. Li, Y.; Gu, X.; Wei, Y. A Deep Learning-Based Method for Bearing Fault Diagnosis with Few-Shot Learning. Sensors 2024, 24, 7516.
7. Zhu, H.; Sui, Z.; Xu, J.; Lan, Y. Fault Diagnosis of Mechanical Rolling Bearings Using a Convolutional Neural Network–Gated Recurrent Unit Method with Envelope Analysis and Adaptive Mean Filtering. Processes 2024, 12, 2845.
8. Li, X.; Chen, J.; Wang, J.; Wang, J.; Li, X.; Kan, Y. Research on Fault Diagnosis Method of Bearings in the Spindle System for CNC Machine Tools Based on DRSN-Transformer. IEEE Access 2024, 12, 74586–74595.
9. Li, X.; Chen, J.; Wang, J.; Wang, J.; Wang, J.; Li, X.; Kan, Y. Multi-Scale Channel Mixing Convolutional Network and Enhanced Residual Shrinkage Network for Rolling Bearing Fault Diagnosis. Electronics 2025, 14, 855.
10. Chen, X.; Zhang, B.; Gao, D. Bearing fault diagnosis base on multi-scale CNN and LSTM model. J. Intell. Manuf. 2021, 32, 971–987.
11. Li, T.; Zhao, Z.; Sun, C.; Cheng, L.; Chen, X.; Yan, R.; Gao, R.X. Wavelet Kernel Net: An interpretable deep neural network for industrial intelligent diagnosis. IEEE Trans. Syst. Man Cybern. Syst. 2021, 52, 2302–2312.
12. Liu, X.; Chen, J.; Zhang, K.; Liu, S.; He, S.; Zhou, Z. Cross-domain intelligent bearing fault diagnosis under class imbalanced samples via transfer residual network augmented with explicit weight self-assignment strategy based on meta data. Knowl. Based Syst. 2022, 251, 109272.
13. Kumar, P.; Kumar, P.; Hati, A.S.; Kim, H.S. Deep transfer learning framework for bearing fault detection in motors. Mathematics 2022, 10, 4683.
14. Kuang, J.; Xu, G.; Tao, T.; Wu, Q. Class-imbalance adversarial transfer learning network for cross-domain fault diagnosis with imbalanced data. IEEE Trans. Instrum. Meas. 2021, 71, 1–11.
15. Tong, Z.; Li, W.; Zhang, B.; Jiang, F.; Zhou, G. Bearing Fault Diagnosis Under Variable Working Conditions Based on Domain Adaptation Using Feature Transfer Learning. IEEE Access 2018, 6, 76187–76197.
16. Azamfar, M.; Singh, J.; Li, X.; Lee, J. Cross-domain Gearbox Diagnostics under Variable Working Conditions with Deep Convolutional Transfer Learning. J. Vib. Control 2021, 27, 854–864.
17. Li, X.; Zhang, W.; Ding, Q.; Sun, J.-Q. Multi-Layer domain adaptation method for rolling bearing fault diagnosis. Signal Process. 2019, 157, 180–197.
18. An, J.; Ai, P.; Liu, D. Deep domain adaptation model for bearing fault diagnosis with domain alignment and discriminative feature learning. Shock Vib. 2020, 2020, 4676701.
19. Han, T.; Liu, C.; Yang, W.; Jiang, D. Deep Transfer Network with Joint Distribution Adaptation: A New Intelligent Fault Diagnosis Framework for Industry Application. ISA Trans. 2020, 97, 269–281.
20. Zhao, K.; Jiang, H.; Wang, K.; Pei, Z. Joint Distribution Adaptation Network with Adversarial Learning for Rolling Bearing Fault Diagnosis. Knowl. Based Syst. 2021, 222, 106–117.
21. Xiao, Y.; Shao, H.; Han, S.; Huo, Z.; Wan, J. Novel joint transfer network for unsupervised bearing fault diagnosis from simulation domain to experimental domain. IEEE/ASME Trans. Mechatron. 2022, 27, 5254–5263.
22. Zhao, X.; Wang, L.; Zhang, Y.; Han, X.; Deveci, M.; Parmar, M. A review of convolutional neural networks in computer vision. Artif. Intell. Rev. 2024, 57, 99.
23. Yang, Y.; Zhang, T.; Li, G.; Kim, T.; Wang, G. An unsupervised domain adaptation model based on dual-module adversarial training. Neurocomputing 2022, 475, 102–111.
24. Xu, Y.; Liu, J.; Wan, Z.; Zhang, D.; Jiang, D. Rotor fault diagnosis using domain-adversarial neural network with time-frequency analysis. Machines 2022, 10, 610.
25. Zhang, W.; Wu, D. Discriminative joint probability maximum mean discrepancy (DJP-MMD) for domain adaptation. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; IEEE: New York, NY, USA, 2020; pp. 1–8.
26. Zhu, Y.; Zhuang, F.; Wang, J.; Ke, G.; Chen, J.; Bian, J.; Xiong, H.; He, Q. Deep subdomain adaptive networks for image classification. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 1713–1722.
27. Wang, Z.; Ming, X. A Domain Adaptation Method Based on Deep Coral for Rolling Bearing Fault Diagnosis. In Proceedings of the 2023 IEEE 14th International Symposium on Diagnostics for Electrical Machines, Power Electronics and Drives (SDEMPED), Chania, Greece, 28–31 August 2023; IEEE: New York, NY, USA, 2023; pp. 211–216.

Figure 1. Multi-scale convolutional network structure.

Figure 2. Schematic diagram of the multi-scale attention module.

Figure 3. Structure of the CBAM.

Figure 4. Structure of the CAM.

Figure 5. Structure of the SAM.

Figure 6. Schematic diagram of the domain distribution.

Figure 7. Structure of the DANN.

Figure 8. MSACNN-IJMMD-DANN diagnostic flowchart.

Figure 9. Structure of the MSACNN-IJMMD-DANN.

Figure 10. JiangNan University laboratory bench.

Figure 11. Case Western Reserve University lab bench.

Figure 12. Confusion matrices of different migration learning fault diagnosis methods in migration test A1 → D1. (a) DC: accuracy = 92.15%; (b) DANN: accuracy = 93.95%; (c) DSAN: accuracy = 95.37%; (d) the method proposed in this paper: accuracy = 98.60%.

Figure 13. Plot of fault diagnosis loss rate with different migration learning methods in migration test A1 → D1. (a) DC: loss-rate variation graph; (b) DANN: loss-rate variation graph; (c) DSAN: loss-rate variation graph; (d) the method proposed in this paper: loss-rate variation graph.

Figure 14. Plot of loss-rate variation graph for ablation experiment in migration test A1 → D1. (a) Experiment 1: loss-rate variation graph; (b) Experiment 2: loss-rate variation graph; (c) Experiment 3: loss-rate variation graph; (d) Experiment 4: loss-rate variation graph; (e) Experiment 5: loss-rate variation graph; (f) the method proposed in this paper: loss-rate variation graph.

Figure 15. Bearing test bench in our school laboratory.

Figure 16. Confusion matrices of different migration learning fault diagnosis methods in migration test A2 → B2. (a) DC: accuracy = 91.15%; (b) DANN: accuracy = 92.67%; (c) DSAN: accuracy = 93.30%; (d) the method proposed in this paper: accuracy = 99.42%.

Figure 17. Plot of fault diagnosis loss rate with different migration learning methods in migration test A2 → B2. (a) DC: loss-rate variation graph; (b) DANN: loss-rate variation graph; (c) DSAN: loss-rate variation graph; (d) the method proposed in this paper: loss-rate variation graph.

Table 1. Structural parameter settings.

Layers	Components	Tied Parameters	Padding
Layer1	Conv1	Kernels: 64 × 1 × 32, stride: 16	24
	BN	32	/
	ReLU	/	/
	Maxpool 1	2	/
Layer2	Conv2	Kernels: 32 × 1 × 32, stride: 16	16
	BN	32	/
	ReLU	/	/
	Maxpool 2	2	/
Layer3	Conv3	Kernels: 16 × 1 × 32, stride: 16	8
	BN	32	/
	ReLU	/	/
	Maxpool 3	2	/
CBAM	/	96	/
FC1	/	96 × 512	/
FC2	/	512 × 4	/

Table 2. Rolling bearing failure data information.

Type of Data Set	Type of Fault	Sampling Frequency	Speed (r/min)	Load (Hp)	Fault Size	Sample Size	Label
CWRU	NC	12 kHz	1750	2	0	1000	0
	IF		1750	2	0.007 inch	1000	1
	OF		1750	2	0.007 inch	1000	2
	BF		1750	2	0.007 inch	1000	3
JNU	N	50 kHz	600	/	/	1000	0
	IB		600	/	/	1000	1
	OB		600	/	/	1000	2
	TB		600	/	/	1000	3
	N		800	/	/	1000	0
	IB		800	/	/	1000	1
	OB		800	/	/	1000	2
	TB		800	/	/	1000	3
	N		1000	/	/	1000	0
	IB		1000	/	/	1000	1
	OB		1000	/	/	1000	2
	TB		1000	/	/	1000	3

Table 3. Cross-condition experimental setup and results.

Migration Tasks	MMD	DC	DANN	DSAN	Method of This Paper
A1 → B1	82.83 ± 1.42%	91.57 ± 0.95%	93.97 ± 1.18%	94.25 ± 0.92%	98.65 ± 0.41%
A1 → C1	82.75 ± 1.44%	92.90 ± 0.96%	92.15 ± 1.20%	93.72 ± 0.92%	99.50 ± 0.43%
A1 → D1	84.75 ± 1.43%	92.15 ± 0.96%	93.95 ± 1.19%	95.37 ± 0.94%	98.60 ± 0.42%
B1 → A1	84.33 ± 1.43%	93.17 ± 0.95%	93.80 ± 1.19%	93.35 ± 0.92%	98.17 ± 0.41%
B1 → C1	85.35 ± 1.42%	94.72 ± 0.96%	92.87 ± 1.18%	94.20 ± 0.92%	98.45 ± 0.41%
B1 → D1	82.82 ± 1.43%	92.70 ± 0.97%	90.97 ± 1.18%	95.60 ± 0.93%	97.87 ± 0.43%
C1 → A1	81.52 ± 1.44%	92.22 ± 0.96%	92.07 ± 1.20%	94.47 ± 0.92%	98.50 ± 0.42%
C1 → B1	86.15 ± 1.42%	91.42 ± 0.95%	91.62 ± 1.18%	95.30 ± 0.91%	99.27 ± 0.42%
C1 → D1	85.00 ± 1.42%	91.60 ± 0.95%	90.07 ± 1.19%	96.25 ± 0.93%	99.32 ± 0.41%
D1 → A1	83.25 ± 1.43%	91.72 ± 0.97%	92.62 ± 1.20%	94.80 ± 0.94%	98.57 ± 0.43%
D1 → B1	85.15 ± 1.43%	91.97 ± 0.95%	92.87 ± 1.18%	93.92 ± 0.92%	98.80 ± 0.41%
D1 → C1	84.90 ± 1.42%	92.00 ± 0.96%	92.21 ± 1.18%	93.25 ± 0.92%	99.33 ± 0.41%
Average Value	84.07	92.35	92.43	94.54	98.75

Table 4. Experimental accuracy and runtime of different domain adaptation methods.

Migration Tasks A1 → D1	MMD	DC	DANN	DSAN	Method of This Paper
Accuracy	84.75 ± 1.43%	92.15 ± 0.96%	93.95 ± 1.19%	95.37 ± 0.94%	98.60 ± 0.42%
Runtime	0.058 s	0.098 s	0.113 s	0.173 s	0.105 s

Table 5. Ablation test results.

Migration Tasks	Experiment 1	Experiment 2	Experiment 3	Experiment 4	Experiment 5	Method of This Paper
A1 → B1	86.80 ± 0.74%	94.87 ± 0.54%	91.32 ± 0.93%	92.62 ± 0.61%	95.95 ± 0.52%	98.65 ± 0.41%
A1 → C1	84.70 ± 0.75%	93.97 ± 0.56%	90.67 ± 0.95%	92.55 ± 0.63%	95.70 ± 0.52%	99.50 ± 0.43%
A1 → D1	84.95 ± 0.75%	94.35 ± 0.54%	90.13 ± 0.93%	92.75 ± 0.62%	96.53 ± 0.53%	98.60 ± 0.42%
B1 → A1	85.32 ± 0.74%	94.07 ± 0.55%	90.87 ± 0.93%	93.12 ± 0.61%	95.15 ± 0.52%	98.17 ± 0.41%
B1 → C1	85.15 ± 0.74%	94.53 ± 0.55%	90.05 ± 0.94%	93.10 ± 0.61%	95.87 ± 0.53%	98.45 ± 0.41%
B1 → D1	84.75 ± 0.76%	94.45 ± 0.54%	90.30 ± 0.94%	92.22 ± 0.62%	94.97 ± 0.52%	97.87 ± 0.43%
C1 → A1	84.73 ± 0.76%	94.12 ± 0.56%	90.75 ± 0.95%	92.67 ± 0.63%	95.62 ± 0.54%	98.50 ± 0.42%
C1 → B1	85.02 ± 0.75%	93.95 ± 0.55%	90.27 ± 0.95%	93.12 ± 0.62%	96.72 ± 0.52%	99.27 ± 0.42%
C1 → D1	84.15 ± 0.74%	93.71 ± 0.56%	91.97 ± 0.94%	92.82 ± 0.62%	96.22 ± 0.53%	99.32 ± 0.41%
D1 → A1	84.72 ± 0.75%	93.75 ± 0.56%	90.60 ± 0.95%	92.02 ± 0.61%	95.72 ± 0.53%	98.57 ± 0.43%
D1 → B1	85.02 ± 0.76%	94.37 ± 0.54%	90.47 ± 0.94%	91.97 ± 0.63%	95.95 ± 0.54%	98.80 ± 0.41%
D1 → C1	85.61 ± 0.75%	94.27 ± 0.55%	91.22 ± 0.93%	92.17 ± 0.62%	96.61 ± 0.53%	99.33 ± 0.41%
Average Value	85.08	94.20	90.72	92.59	95.92	98.75

Table 6. Accuracy and runtime of ablation test.

Migration Tasks A1 → D1	Experiment 1	Experiment 2	Experiment 3	Experiment 4	Experiment 5	Method of This Paper
Accuracy	84.95 ± 0.75%	94.35 ± 0.54%	90.13 ± 0.93%	92.75 ± 0.62%	96.53 ± 0.53%	98.60 ± 0.42%
Runtime	0.143 s	0.063 s	0.023 s	0.095 s	0.081 s	0.105 s

Table 7. Rolling bearing failure data.

Working Condition	Type of Fault	Speed (r/min)	Label
A2	NF	800	0
	IF		1
	OF		2
	BF		3
B2	NF	1200	0
	IF		1
	OF		2
	BF		3
C2	NF	1600	0
	IF		1
	OF		2
	BF		3
D2	NF	2000	0
	IF		1
	OF		2
	BF		3

Table 8. Cross-condition experimental setup and results.

Migration Tasks	MMD	DC	DANN	DSAN	Method of This Paper
A2 → B2	85.12 ± 1.37%	91.15 ± 1.05%	92.67 ± 0.94%	93.30 ± 0.72%	99.42 ± 0.51%
A2 → C2	83.62 ± 1.38%	92.95 ± 1.07%	92.72 ± 0.94%	95.25 ± 0.71%	97.55 ± 0.52%
A2 → D2	82.77 ± 1.38%	91.82 ± 1.06%	92.60 ± 0.95%	94.67 ± 0.71%	98.52 ± 0.52%
B2 → A2	85.20 ± 1.37%	91.15 ± 1.05%	94.52 ± 0.94%	94.37 ± 0.73%	99.70 ± 0.51%
B2 → C2	86.45 ± 1.37%	92.90 ± 1.05%	93.90 ± 0.96%	93.75 ± 0.72%	97.87 ± 0.52%
B2 → D2	82.77 ± 1.39%	91.92 ± 1.07%	93.57 ± 0.96%	94.80 ± 0.71%	96.53 ± 0.51%
C2 → A2	85.40 ± 1.38%	89.85 ± 1.06%	92.72 ± 0.95%	94.92 ± 0.72%	98.30 ± 0.53%
C2 → B2	81.70 ± 1.37%	90.62 ± 1.05%	90.05 ± 0.94%	93.55 ± 0.71%	99.27 ± 0.51%
C2 → D2	85.60 ± 1.37%	91.02 ± 1.05%	93.90 ± 0.94%	95.57 ± 0.73%	98.92 ± 0.51%
D2 → A2	83.97 ± 1.39%	90.92 ± 1.07%	91.65 ± 0.96%	94.77 ± 0.72%	99.17 ± 0.53%
D2 → B2	84.67 ± 1.37%	90.85 ± 1.05%	92.40 ± 0.94%	93.82 ± 0.71%	99.35 ± 0.51%
D2 → C2	83.21 ± 1.38%	93.40 ± 1.05%	93.07 ± 0.95%	94.17 ± 0.73%	98.30 ± 0.51%
Average Value	84.21	91.55	92.81	94.41	98.58

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
