4.1. Dataset Description
(1) Experiment (1) (Cross-domain bearing fault diagnosis on the same device)
The validity of the proposed method is verified using the test rig shown in
Figure 3. The rig mainly consists of a motor, coupling, bearing housing, gearbox, and brake. The test bearing is an NSK6205; the operating speeds include 1800 r/min and 2400 r/min; the radial loads include 0 N and 600 N; and the sampling frequency is 10,240 Hz. Inner and outer ring cracks are artificially introduced by wire cutting, at depths of 0.2 mm and 0.5 mm. This experiment therefore involves five distinct health states (normal, plus inner and outer ring faults at each of the two depths), with each state comprising 100 samples; each sample has a length of 1024 points. Six different working conditions (A1, A2…A6) are set up according to different loads and rotational speeds, and their detailed information is shown in
Table 1. A multi-source domain transfer experiment task is established to validate the model performance (see
Table 2 and
Table 3).
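The sample construction described above (100 samples of 1024 points per health state, sampled at 10,240 Hz) can be sketched as follows. This is only an illustrative sketch: the text does not state whether adjacent samples overlap, so non-overlapping windows are assumed, and the function name `segment_signal` and the synthetic sine record are hypothetical stand-ins for the measured data.

```python
import math


def segment_signal(signal, sample_len=1024, n_samples=100):
    """Cut a 1-D vibration record into fixed-length, non-overlapping samples.

    Assumption: windows do not overlap (the source does not specify this).
    """
    needed = sample_len * n_samples
    if len(signal) < needed:
        raise ValueError(f"need at least {needed} points, got {len(signal)}")
    return [signal[i * sample_len:(i + 1) * sample_len] for i in range(n_samples)]


fs = 10_240  # sampling frequency in Hz, as stated in the text

# Synthetic 10 s record standing in for one measured signal (not real data).
record = [math.sin(2 * math.pi * 50 * n / fs) for n in range(fs * 10)]

samples = segment_signal(record)
sample_duration = len(samples[0]) / fs  # seconds covered by one sample
print(len(samples), len(samples[0]), sample_duration)
```

Under these assumptions each sample spans 0.1 s of signal, which at the stated speeds of 1800 r/min and 2400 r/min corresponds to 3 and 4 shaft revolutions, respectively.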
(2) Experiment (2) (Cross-equipment bearing fault diagnosis)
Experiment (2) utilizes bearing datasets from different devices, including data from Case Western Reserve University (CWRU) [
36], the American Society for Mechanical Failure Prevention Technology (MFPT) [
37], Jiangnan University (JNU) [
38], and our own test data. The configurations of the datasets are summarized in
Table 4, and the results of source domain selection for Experiment (2) are presented in
Table 5. For the CWRU dataset, the test bearing is an SKF6205 deep groove ball bearing. The vibration signals are collected from the drive-end bearing; the IN and OU fault diameters are 0.36 mm, and the sampling frequency is 12 kHz. In the MFPT dataset, the fault size is 0.38 mm, the speed is 1500 r/min, and the sampling frequencies for the Normal, OU, and IN conditions are 97,656 Hz, 97,656 Hz, and 48,828 Hz, respectively. The Normal and OU radial loads for dataset F are 270 lbs, and the IN loads are 250 lbs and 300 lbs. In the JNU dataset, the experimental speeds are 600 r/min and 800 r/min, and the sampling frequency is 50 kHz. N205 bearings are used for the normal, outer ring fault, and rolling element fault conditions. NU205 bearings with detachable outer rings are used for the inner ring faults; the inner and outer ring faults measure 0.3 × 0.25 mm (width × depth). The self-test dataset consists of vibration signals collected from the faulty bearings, with a fault depth of 0.2 mm and a sampling frequency of 10,240 Hz. Three health states are selected for each dataset: normal, inner ring fault (IN), and outer ring fault (OU), totaling 300 samples. The experimental results of cross-equipment diagnosis are displayed in
Table 6.
4.2. Experimental Results
To validate the effectiveness of the proposed method, it is compared with several traditional machine learning methods and transfer learning (TL) methods, including k-Nearest Neighbors (KNN) [
39], Support Vector Machine (SVM) [
40], Domain Adversarial Neural Network (DANN) [
41], Geodesic Flow Kernel (GFK) [
42], Transfer Component Analysis (TCA) [
43], and Semi-Supervised Transfer Component Analysis (SSTCA) [
44].
(1) Implementation Details:
As suggested in the literature [
45], a traversal (grid) search is employed to find the optimal parameter settings. For MJSPTA, the fixed average weight parameter
,
is utilized as the kernel function. The kernel function of SVM is
with a penalty factor C = 1; the optimal number of nearest neighbors of the KNN method is selected in
. In TCA, the optimal hyper-parameters are obtained through Bayesian optimization, with the parameter
searched in the range of
, and the subspace dimension searched in the range of
. For SSTCA, the parameter
is in the range of
, and the parameter
is in the range of
. Moreover, TCA and SSTCA both use a linear kernel function and an SVM classifier with C = 1. The parameter settings of GFK follow the literature [
42], with a nearest-neighbor classifier; the parameter settings of TLPP follow the literature [
46], optimizing parameters
,
, and
within the range of
using a linear kernel function and an SVM classifier. For DANN, the training batch size is set to 64, and the learning rate is 0.001, as recommended in the literature [
47]. The detailed hyper-parameter settings are summarized in
Table 7.
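The traversal (grid) search mentioned above can be sketched as an exhaustive loop over every parameter combination. The sketch below is illustrative only and is not the authors' implementation: `grid_search`, `toy_accuracy`, the parameter names `mu` and `dim`, and the candidate values are hypothetical placeholders. In practice the scoring function would train the model and return its validation accuracy, and the ranges would be those listed in Table 7.

```python
import math
from itertools import product


def grid_search(score_fn, grid):
    """Evaluate score_fn on every combination in grid and return the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:  # ties keep the first combination visited
            best_params, best_score = params, score
    return best_params, best_score


def toy_accuracy(mu, dim):
    """Hypothetical stand-in for 'train the model, return validation accuracy'."""
    return 1.0 - 0.05 * abs(math.log10(mu)) - 0.002 * abs(dim - 20)


# Placeholder candidate values (assumed for illustration, not from the paper).
grid = {"mu": [0.01, 0.1, 1, 10, 100], "dim": [10, 20, 30, 40]}

best_params, best_score = grid_search(toy_accuracy, grid)
print(best_params, best_score)
```

The cost grows with the product of the grid sizes (here 5 × 4 = 20 evaluations), which is why a coarse logarithmic grid is a common choice for regularization-type parameters.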
(2) Experimental Results:
In Experiment (1), the results of multi-source domain selection are presented in
Table 2. The three source domains with the greatest similarity to the target domain were selected for the transfer task experiment, and the diagnostic results are shown in
Table 3. Most of the transfer learning (TL) algorithms in the comparison set performed similarly to the traditional machine learning methods, achieving accuracies mostly in the range of 80–90%. However, the proposed MJSPTA method significantly outperformed them, achieving an accuracy of 98.63%. The reason for this superior performance is that MJSPTA enhances the separability of the samples by constructing graph embeddings and applying Fisher’s discriminant criterion. This approach reduces intra-class distances and increases inter-class distances while retaining the local sample consistency, making prediction results more reliable. Additionally, MJSPTA incorporates a voting mechanism to further improve reliability.
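The voting mechanism is not specified in detail in this section. As a rough illustration, one common choice is a sample-wise majority vote over the labels predicted from each selected source domain; the sketch below assumes that rule. The function name `majority_vote` and the example predictions are hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_source):
    """Fuse per-source label predictions sample by sample via majority vote.

    Assumption: plain (unweighted) majority voting. On a tie, the label that
    appears first among the sources is returned.
    """
    fused = []
    for votes in zip(*predictions_per_source):
        label, _count = Counter(votes).most_common(1)[0]
        fused.append(label)
    return fused


# Hypothetical predictions for four target samples from three source domains.
preds = [
    ["N", "IN", "OU", "IN"],  # classifier trained on source domain 1
    ["N", "IN", "IN", "IN"],  # classifier trained on source domain 2
    ["N", "OU", "OU", "IN"],  # classifier trained on source domain 3
]

fused = majority_vote(preds)
print(fused)
```

With three source domains, a single source that disagrees with the other two is overruled, which is one way such a scheme can reduce the influence of a poorly matched source.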
In Experiment (2), the results of source domain selection are presented in
Table 5, and the results of cross-domain diagnosis are displayed in
Table 6. Even though the source and target domain data originated from different devices, the average recognition rate of the proposed MJSPTA method still achieved 98.93%, surpassing all the compared methods. This superior performance is attributed to MJSPTA’s more accurate low-dimensional mapping, which considers bidirectional mapping. MJSPTA utilizes distinct projection matrices for the source and target domains and Fisher’s criterion to identify shared fault features across domains. This approach preserves the local manifold structure and discriminative information of the data while minimizing the domain discrepancies in the subspace.
The superior performance of MJSPTA is closely related to its ability to align with the vibrational dynamics and physical fault characteristics of bearings.
For early weak faults (e.g., an inner ring fault with 0.2 mm depth), MJSPTA’s local manifold structure preservation module retains the transient impulse features of vibration signals, which are the core physical signatures of early faults. This ensures that even subtle fault characteristics are not lost during domain adaptation, leading to near-perfect classification accuracy (e.g., 100.00% for the A2 condition in Experiment (1)). In contrast, comparison methods such as TCA and DANN neglect these physical features, resulting in lower accuracy for early faults.
The MMD-based distribution alignment aligns the energy distribution of vibration signals in characteristic frequency bands between source and target domains. This ensures that the physical essence of fault signals is consistent across domains, especially in cross-equipment experiments where bearing models and operating conditions differ.
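To make the MMD term concrete, the following sketch computes the squared maximum mean discrepancy (MMD) between two feature sets. It assumes a linear kernel, for which the biased empirical estimate reduces to the squared distance between the two domain means; the kernel actually used by MJSPTA may differ. The helper names and the toy feature values are hypothetical.

```python
def column_means(X):
    """Mean of each feature (column) over the samples (rows) of X."""
    n = len(X)
    return [sum(col) / n for col in zip(*X)]


def linear_mmd2(Xs, Xt):
    """Squared MMD with a linear kernel: ||mean(Xs) - mean(Xt)||^2."""
    ms, mt = column_means(Xs), column_means(Xt)
    return sum((a - b) ** 2 for a, b in zip(ms, mt))


# Toy two-dimensional features (illustrative values, not measured data).
Xs = [[0.0, 1.0], [2.0, 3.0]]  # source domain, mean (1, 2)
Xt = [[1.0, 2.0], [3.0, 6.0]]  # target domain, mean (2, 4)

mmd_before = linear_mmd2(Xs, Xt)

# Shifting the target features by the mean difference removes the
# first-order discrepancy measured by the linear-kernel MMD.
ms, mt = column_means(Xs), column_means(Xt)
Xt_aligned = [[v - (b - a) for v, a, b in zip(x, ms, mt)] for x in Xt]
mmd_after = linear_mmd2(Xs, Xt_aligned)
print(mmd_before, mmd_after)
```

A linear kernel only matches the domain means; nonlinear kernels also penalize differences in higher-order statistics, at a higher computational cost.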
The low standard deviation of MJSPTA (±0.10% in Experiment (1); ±0.16% in Experiment (2)) reflects its stability in capturing consistent physical fault characteristics, while comparison methods with higher standard deviations (e.g., DANN, ±0.71% in Experiment (1)) are more susceptible to non-physical noise interference, indicating weaker robustness in extracting physical fault features.
To visualize the data processing effects, two representative tasks were selected: the multi-source domain transfer task A6 → A1 and the cross-equipment transfer task Z1 → A1. The transformed fault features were projected into a 3D space using the t-Distributed Stochastic Neighbor Embedding (t-SNE) [
48] algorithm and visualized as scatter plots, as shown in
Figure 4 and
Figure 5, respectively. As shown in
Figure 4, after domain adaptation via multiple transfer learning methods, most fault categories are separated more distinctly. However, similar faults exhibit poor clustering. There is overlap between normal and outer ring 0.5 mm samples, making it difficult to distinguish the faults. From
Figure 5, it is apparent that the clustering performance of the compared methods is inferior to that of MJSPTA, with significantly poorer class separability. The reasons are as follows: the proposed method reduces the distributional differences between the data of the two domains, preserves the local manifold structure of the samples, and explores domain-shared features by jointly aligning the source and target domains at both the feature level and the sample level. This makes the fault features more distinguishable and representative.
To quantitatively evaluate the clustering performance shown in
Figure 4 and
Figure 5, we calculated the Trace of Between-Class Scatter ($\mathrm{tr}(S_b)$) and the Trace of Within-Class Scatter ($\mathrm{tr}(S_w)$). As shown in
Table 8, the Fisher ratio $\mathrm{tr}(S_b)/\mathrm{tr}(S_w)$ for MJSPTA is 434.69, which is substantially higher than that of TCA (196.13) and DANN (75.36). This quantitative result confirms that MJSPTA yields the most compact intra-class structure and the most separable inter-class margins.
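For reference, a Fisher ratio of this kind can be computed directly from labeled features as the trace of the between-class scatter divided by the trace of the within-class scatter. The sketch below assumes the common definitions in which the between-class term is weighted by class size and neither term is normalized; the values in Table 8 may rest on a different normalization. The function name `fisher_ratio` and the toy data are hypothetical.

```python
def fisher_ratio(X, y):
    """Return (tr(S_b), tr(S_w), tr(S_b) / tr(S_w)) for features X, labels y.

    tr(S_b) = sum over classes of n_c * ||mu_c - mu||^2
    tr(S_w) = sum over samples of ||x_i - mu_{class(i)}||^2
    """
    overall = [sum(col) / len(X) for col in zip(*X)]
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)

    tr_sb = tr_sw = 0.0
    for members in groups.values():
        mu = [sum(col) / len(members) for col in zip(*members)]
        tr_sb += len(members) * sum((m - o) ** 2 for m, o in zip(mu, overall))
        tr_sw += sum(sum((v - m) ** 2 for v, m in zip(x, mu)) for x in members)
    return tr_sb, tr_sw, tr_sb / tr_sw


# Toy embedded features: two well-separated classes (illustrative only).
X = [[0, 0], [0, 2], [4, 0], [4, 2]]
y = ["IN", "IN", "OU", "OU"]

tr_sb, tr_sw, ratio = fisher_ratio(X, y)
print(tr_sb, tr_sw, ratio)
```

A larger ratio means the class means are far apart relative to the spread within each class, which is the sense in which a higher value indicates better clustering.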
4.3. Parametric Analysis
To further analyze the impact of the target variance factor, the local preservation factor, and the subspace dimension d on the fault diagnosis results, a parametric analysis of these three parameters was conducted. According to the multi-source domain selection method, the optimal set of source domains was obtained, and three transfer tasks (Z1 → A1, Z2 → A3, and X4 → A5) were randomly selected for the analysis.
The effect of the variation of the local preservation factor
on the recognition accuracy is shown in
Figure 6a; the value of
should be set to a small value in the interval
.
The target variance term aims to optimize the feature mapping by maintaining the variance properties of the target domain, helping the model to maintain the distribution and internal structure of the data in the target domain. By optimizing the target variance term, the model can effectively adapt to the data distribution in the target domain and can enhance its generalization ability. The influence of parameter
is shown in
Figure 6b, and the recognition accuracy increases monotonically as
increases. Therefore, as the figure illustrates, the optimal range of
is
.
As shown in
Figure 6c, MJSPTA is not sensitive to the value of the subspace embedding dimension d. The reason is that the target variance term stabilizes the feature mapping at the global distribution level, while the local preservation term provides fault tolerance at the local structure level: it constrains the manifold structure in the feature space and preserves the similarity relationships of neighboring samples. Together, these two terms give more latitude in choosing the subspace dimension in engineering practice.