1. Introduction
With the continuous advancement of intelligent manufacturing and Industry 4.0 [1,2,3], rotating machinery plays a vital role in aerospace, rail transit, and high-end equipment, and its running state directly affects the safety and stability of the overall system. Any latent failure can cause serious consequences. Intelligent fault diagnosis has therefore gradually become one of the core research directions of Prognostics and Health Management (PHM) [4,5]. Recent progress in intelligent fault diagnosis has explored directions beyond conventional domain generalization. For instance, some studies have addressed open-set and unknown fault recognition, such as adaptive evolutionary reconstruction metric networks that can generalize to previously unseen fault types [6]. Others have proposed multi-source adversarial transfer frameworks that improve anti-noise performance under cross-domain conditions, especially in complex urban environments [7]. Further directions include virtual domain-driven semi-supervised hyperbolic metric networks with domain-class adversarial decoupling, which achieve robustness and discriminability through non-Euclidean embeddings [8], and continuous evolution learning frameworks, which leverage lightweight module expansion and knowledge distillation for train transmission fault diagnosis [9]. In the past few years, deep learning has shown remarkable advantages in fault diagnosis tasks owing to its excellent feature extraction and pattern recognition capabilities [10]. However, such methods are usually based on the premise that the data distributions of the source domain and the test domain are consistent. Once actual working conditions change, for example under different loads [11], rotational speed fluctuations [12], or shifts in sensor installation position [13], model performance tends to decrease significantly. This phenomenon is often referred to as distribution shift [14].
In order to alleviate the problem of distribution shift, Weiss et al. [15] proposed transfer learning and Yu et al. [16] proposed ideas such as domain adaptation (DA). Although DA methods have achieved some results, they often assume that some target domain data can be obtained during the training stage. In real industrial scenarios, this premise does not always hold. In contrast, the domain generalization (DG) method proposed by Khoee et al. [17] does not rely on any target domain information, but trains a model with cross-domain robustness on multiple source domains so that it maintains stable performance in unknown target domains. This is more in line with actual needs in industrial applications, and has made DG a research hotspot in intelligent fault diagnosis and machine learning in recent years [18].
The existing DG methods can be roughly divided into three categories. The first category is data augmentation methods: Zhang et al. [19] used Mixup to improve data diversity and robustness through sample interpolation or adversarial generation. The second category is invariant representation and robust optimization methods: Arjovsky et al. [20] proposed IRM and Sagawa et al. [21] proposed GroupDRO; these methods try to learn cross-domain-invariant features or improve generalization by minimizing the worst-domain risk. The third category is meta-learning methods: Li et al. [22] proposed MLDG and Shi et al. [23] proposed Fish; these methods enforce consistent update directions by simulating a “training/testing” domain partition, thereby improving cross-domain adaptability. Although these methods have improved DG performance to some extent, limitations remain: data augmentation methods often lack explicit modeling of inter-domain differences; invariant representation methods are difficult to optimize and highly sensitive to hyperparameters; and meta-learning methods often ignore alignment at the feature distribution level. Therefore, how to form organic synergy across the three levels of data augmentation, optimization consistency, and representation invariance remains an urgent problem in this field.
Based on this, this paper proposes a meta-learning domain generalization method for fault diagnosis that achieves complementarity and integration at three levels: data, optimization, and representation. Specifically, at the data level, we introduce Mixup in the task construction stage to generate interpolated samples and smooth decision boundaries, thereby enhancing the robustness of the model to distribution perturbations. At the optimization level, we design an arithmetic gradient alignment strategy that constrains the update directions of different source domains to be consistent and reduces cross-domain optimization conflicts. At the representation level, centroid convergence constraints aggregate the features of same-class samples toward a shared cross-domain centroid, strengthening semantic consistency and domain invariance.
The main contributions of this paper can be summarized as follows:
A meta-learning domain generalization framework (MGADA) integrating multi-level constraints is proposed, which improves the adaptability of the model in the unseen target domain through the “meta-training/meta-testing” mechanism, and effectively deals with the distribution difference problem in cross-working condition fault diagnosis.
A collaborative mechanism is designed at three levels: data, optimization, and representation. This includes introducing Mixup data augmentation in the inner loop to alleviate small-sample overfitting; applying arithmetic gradient alignment in the outer loop to ensure cross-domain optimization consistency; and realizing cross-domain feature alignment and intra-class aggregation through centroid convergence constraints.
Excellent performance is achieved in cross-working-condition bearing fault diagnosis tasks: the average accuracy on the CWRU dataset is 98.86%, significantly better than benchmark methods such as ERM, Mixup, Fish, IRM, GroupDRO, and MLDG; the ablation experiments and visualization results further verify the effectiveness and complementarity of each module.
3. Method
In practical engineering applications, bearing fault diagnosis often faces the dual challenges of limited sample data and cross-condition distribution discrepancies. As illustrated in Figure 1, within the meta-learning framework this study partitions source domain data into a support set and a query set and introduces Mixup-based data augmentation in the inner loop. By performing sample interpolation within the support set, mixed samples are generated to artificially expand the data space. This design effectively alleviates overfitting under small-sample conditions, while simultaneously enhancing intra-task data diversity and smoothing decision boundaries.
Building on this, to address the shortcomings of traditional meta-learning in terms of optimization path exploration and convergence point selection, this study further proposes an arithmetic meta-learning strategy. This strategy combines gradient alignment theory with the concept of arithmetic averaging: in the outer loop, it constrains gradient direction consistency across different source domains, while in the feature space, it introduces a centroid convergence mechanism that drives same-class samples from different domains toward a shared centroid. These mechanisms collectively enhance the model's generalization capability to unseen target domains.
Given a collection of multiple source domains $\mathcal{S} = \{D_1, D_2, \dots, D_K\}$, the target domain is $D_T$ (invisible during training; used for validation only). The samples in each domain $D_k$ are drawn from a distribution $P_k(x, y)$, and the category set $\mathcal{Y}$ is consistent across domains. The feature extraction network is denoted as $g_\psi$, and the classification head is denoted as $h_\phi$. The overall model is Equation (2):

$$f_\theta = h_\phi \circ g_\psi, \qquad \theta = (\psi, \phi) \tag{2}$$

The core goal of this study is to meta-learn the parameters $\theta$ on multiple source domains so that the model also generalizes well to the out-of-domain distribution $P_T(x, y)$.
3.1. Meta-Learning Framework
A task samples $m$ domain indices from the source domains $\mathcal{S}$. For each sampled domain $k$, a support set $S_k$ and a query set $Q_k$ are randomly divided.
Inner loop (support): compute the support-set supervised loss (cross-entropy) for each domain $k$ on the current shared parameters $\theta$, Equation (3):

$$\mathcal{L}_k^{\mathrm{sup}}(\theta) = \frac{1}{|S_k|} \sum_{(x, y) \in S_k} \ell_{\mathrm{CE}}\big(f_\theta(x), y\big) \tag{3}$$

One or more gradient descent steps with learning rate $\alpha$ yield the domain-adapted inner-loop parameters, Equation (4):

$$\theta_k' = \theta - \alpha \nabla_\theta \mathcal{L}_k^{\mathrm{sup}}(\theta) \tag{4}$$

Outer loop (query): evaluate the updated parameters $\theta_k'$ on each domain's query set and aggregate, Equation (5):

$$\mathcal{L}^{\mathrm{qry}} = \frac{1}{m} \sum_{k=1}^{m} \mathcal{L}_k^{\mathrm{qry}}(\theta_k') \tag{5}$$

The outer loop uses the learning rate $\beta$ to update the original shared parameters $\theta$ (the gradient is propagated through the inner-loop update, or the second-order term is ignored in a first-order approximation), Equation (6):

$$\theta \leftarrow \theta - \beta \nabla_\theta \mathcal{L}^{\mathrm{qry}} \tag{6}$$
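For concreteness, the following is a minimal first-order sketch in PyTorch of the meta-training step described by Equations (3)–(6). Identifiers such as `meta_train_step`, `domain_batches`, and the learning-rate values are illustrative assumptions, not the authors' released implementation.

```python
import copy
import torch
import torch.nn.functional as F

def meta_train_step(model, domain_batches, inner_lr=0.01, outer_lr=0.001):
    """One first-order meta-update over sampled source domains.

    domain_batches: list of ((xs, ys), (xq, yq)) support/query pairs,
    one per sampled domain (Equations (3)-(6), first-order variant).
    """
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    for (xs, ys), (xq, yq) in domain_batches:
        # Inner loop: adapt a copy of the shared parameters on the support set.
        adapted = copy.deepcopy(model)
        sup_loss = F.cross_entropy(adapted(xs), ys)       # Eq. (3)
        adapted.zero_grad()
        sup_loss.backward()
        with torch.no_grad():
            for p in adapted.parameters():
                p -= inner_lr * p.grad                     # Eq. (4)
        # Outer loop: evaluate the adapted parameters on the query set.
        adapted.zero_grad()
        qry_loss = F.cross_entropy(adapted(xq), yq)       # Eq. (5)
        qry_loss.backward()
        for g, p in zip(meta_grads, adapted.parameters()):
            g += p.grad / len(domain_batches)
    # First-order approximation: apply the averaged query gradient
    # directly to the original shared parameters (Eq. (6)).
    with torch.no_grad():
        for p, g in zip(model.parameters(), meta_grads):
            p -= outer_lr * g
```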
3.2. Intra-Domain Implementation of Mixup Data Augmentation
Within the mini-batch of each domain, sample pairs $(x_i, y_i)$ and $(x_j, y_j)$ are drawn, a mixing coefficient $\lambda \sim \mathrm{Beta}(\alpha_{\mathrm{mix}}, \alpha_{\mathrm{mix}})$ is sampled, and mixed samples are constructed as follows, Equation (7):

$$\tilde{x} = \lambda x_i + (1 - \lambda)\, x_j, \qquad \tilde{y} = \lambda y_i + (1 - \lambda)\, y_j \tag{7}$$

The mixed samples $(\tilde{x}, \tilde{y})$ replace or are combined with the original samples and jointly participate in the computation of $\mathcal{L}^{\mathrm{sup}}$ and $\mathcal{L}^{\mathrm{qry}}$. The corresponding Mixup cross-entropy is shown in Equation (8):

$$\mathcal{L}_{\mathrm{mix}} = \lambda\, \ell_{\mathrm{CE}}\big(f_\theta(\tilde{x}), y_i\big) + (1 - \lambda)\, \ell_{\mathrm{CE}}\big(f_\theta(\tilde{x}), y_j\big) \tag{8}$$
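A minimal sketch of the intra-domain Mixup of Equations (7) and (8); the permutation-based pairing and the `alpha` default are illustrative choices.

```python
import numpy as np
import torch
import torch.nn.functional as F

def mixup_loss(model, x, y, alpha=0.2):
    """Mixup within one domain's mini-batch (Eqs. (7)-(8)).

    Pairs each sample with a random partner from the same batch,
    mixes the inputs (Eq. (7)), and realizes the label mixing as a
    lambda-weighted sum of the two cross-entropy terms (Eq. (8)).
    """
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(x.size(0))           # random partner indices
    x_mix = lam * x + (1.0 - lam) * x[perm]    # input-space interpolation
    logits = model(x_mix)
    return lam * F.cross_entropy(logits, y) + \
        (1.0 - lam) * F.cross_entropy(logits, y[perm])
```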
3.3. Arithmetic Difference Strategy
The arithmetic strategy aims to let the model parameters converge to a central point equidistant from the optimal solutions of the individual source domains. The strategy combines gradient alignment theory with arithmetic averaging to design a comprehensive objective function, which not only considers gradient alignment between different source domains but also promotes convergence of the model toward the centroid of the per-domain optimal solutions.
3.3.1. Gradient Alignment
If the task gradient directions and magnitudes of different source domains are close to each other, then the “synthetic gradient” used to update the shared parameters will be more robust and invariant to the out-of-domain distribution. For each domain $k$, we define its task gradient (which can be measured on the support or query set), Equation (9):

$$g_k = \nabla_\theta \mathcal{L}_k(\theta) \tag{9}$$

Denote the average gradient over the $m$ sampled domains as Equation (10):

$$\bar{g} = \frac{1}{m} \sum_{k=1}^{m} g_k \tag{10}$$

To address the limitation of traditional meta-learning strategies that may produce inconsistent or noisy gradient directions across domains, we propose a novel arithmetic gradient alignment strategy. Instead of aligning raw gradients from different domains directly (as in Fish or MLDG), we introduce a structured arithmetic formulation that promotes consistency among gradients by minimizing their arithmetic differences. The idea of arithmetic alignment is to minimize the gradient deviation by making each task gradient $g_k$ as close as possible to the arithmetic mean gradient $\bar{g}$, Equation (11):

$$\mathcal{L}_{\mathrm{align}} = \frac{1}{m} \sum_{k=1}^{m} \left\| g_k - \bar{g} \right\|_2^2 \tag{11}$$

This objective is equivalent to minimizing the variance of gradients over all pairs of domains (the classical identity), Equation (12):

$$\frac{1}{m} \sum_{k=1}^{m} \left\| g_k - \bar{g} \right\|_2^2 = \frac{1}{2m^2} \sum_{k=1}^{m} \sum_{l=1}^{m} \left\| g_k - g_l \right\|_2^2 \tag{12}$$
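A sketch of the alignment penalty of Equations (9)–(12); `per_domain_grads` is an assumed list of flattened per-domain task gradients (e.g., obtained via `torch.autograd.grad`), and the `__main__` block numerically checks the classical identity of Equation (12).

```python
import torch

def arithmetic_alignment_loss(per_domain_grads):
    """Variance of per-domain task gradients around their mean (Eq. (11)).

    per_domain_grads: list of 1-D tensors, each the flattened gradient
    g_k of one source domain's task loss w.r.t. the shared parameters.
    """
    g = torch.stack(per_domain_grads)            # shape (m, num_params)
    g_bar = g.mean(dim=0, keepdim=True)          # Eq. (10)
    return ((g - g_bar) ** 2).sum(dim=1).mean()  # Eq. (11)

# Sanity check of the classical identity in Eq. (12):
#   (1/m) sum_k ||g_k - g_bar||^2 == (1/(2 m^2)) sum_{k,l} ||g_k - g_l||^2
if __name__ == "__main__":
    m, d = 4, 10
    grads = [torch.randn(d) for _ in range(m)]
    lhs = arithmetic_alignment_loss(grads)
    g = torch.stack(grads)
    rhs = (torch.cdist(g, g) ** 2).sum() / (2 * m * m)
    assert torch.allclose(lhs, rhs, atol=1e-5)
```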
3.3.2. Centroid Convergence
In the feature space, the in-domain class centroids are computed for each domain $k$ and each class $y$, Equation (13):

$$c_k^{y} = \frac{1}{|S_k^{y}|} \sum_{x \in S_k^{y}} g_\psi(x) \tag{13}$$

where $S_k^{y}$ denotes the set of support samples in domain $k$ labeled as class $y$.
The “global prototype” shared across domains is defined as the arithmetic average of the per-domain centroids, Equation (14):

$$\bar{c}^{\,y} = \frac{1}{m} \sum_{k=1}^{m} c_k^{y} \tag{14}$$

The centroid convergence loss encourages same-class prototypes to align across different domains, Equation (15):

$$\mathcal{L}_{\mathrm{cent}} = \frac{1}{m\,|\mathcal{Y}|} \sum_{k=1}^{m} \sum_{y \in \mathcal{Y}} \left\| c_k^{y} - \bar{c}^{\,y} \right\|_2^2 \tag{15}$$
This constraint encourages samples from the same class but different domains to cluster in feature space, thereby improving the domain-invariant semantic representation. Compared to existing domain alignment methods, our centroid-level constraint operates at the semantic prototype level, rather than aligning global domain statistics (e.g., in CORAL or MMD).
There is also a pairwise equivalent form (measuring cross-domain intra-class compactness), Equation (16):

$$\mathcal{L}_{\mathrm{cent}} = \frac{1}{2m^2\,|\mathcal{Y}|} \sum_{y \in \mathcal{Y}} \sum_{k=1}^{m} \sum_{l=1}^{m} \left\| c_k^{y} - c_l^{y} \right\|_2^2 \tag{16}$$
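A sketch of Equations (13)–(15); tensor names are illustrative, and the code assumes every class appears in each domain's batch (e.g., via class-balanced sampling).

```python
import torch

def centroid_convergence_loss(domain_features, domain_labels, num_classes):
    """Penalize deviation of per-domain class centroids from the
    cross-domain prototype (Eqs. (13)-(15)).

    domain_features: list of (n_k, d) feature tensors, one per domain.
    domain_labels:   list of (n_k,) integer label tensors.
    Assumes each class is present in every domain's batch.
    """
    per_domain = []  # centroids c_k^y: one (num_classes, d) tensor per domain
    for feats, labels in zip(domain_features, domain_labels):
        cents = torch.stack([feats[labels == y].mean(dim=0)
                             for y in range(num_classes)])   # Eq. (13)
        per_domain.append(cents)
    c = torch.stack(per_domain)                   # (m, num_classes, d)
    c_bar = c.mean(dim=0, keepdim=True)           # Eq. (14), global prototypes
    return ((c - c_bar) ** 2).sum(dim=-1).mean()  # Eq. (15)
```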
Combining outer-loop supervision, Mixup, gradient alignment, and centroid convergence, the outer-loop objective of a single task is Equation (17):

$$\mathcal{L}_{\mathrm{total}} = \mathcal{L}^{\mathrm{qry}} + \lambda_{\mathrm{mix}} \mathcal{L}_{\mathrm{mix}} + \lambda_{\mathrm{align}} \mathcal{L}_{\mathrm{align}} + \lambda_{\mathrm{cent}} \mathcal{L}_{\mathrm{cent}} \tag{17}$$

where $\mathcal{L}_{\mathrm{mix}}$ is the cross-entropy term generated by the Mixup samples in the inner and outer loops, and $\lambda_{\mathrm{mix}}$, $\lambda_{\mathrm{align}}$, and $\lambda_{\mathrm{cent}}$ are the weight coefficients.
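Assembling Equation (17) is then a weighted sum of the terms computed above; the weight defaults below are placeholders, not the paper's settings.

```python
# Single-task outer-loop objective of Eq. (17); weights are placeholders.
def total_loss(qry_loss, mix_loss, align_loss, cent_loss,
               lam_mix=1.0, lam_align=0.1, lam_cent=0.1):
    return (qry_loss + lam_mix * mix_loss
            + lam_align * align_loss + lam_cent * cent_loss)
```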
In real-world industrial scenarios, fault diagnosis models often encounter unseen target machines and limited labeled data. To address this, MGADA leverages meta-learning, which simulates domain shifts across multiple source domains during training, thereby improving few-shot generalization to unseen machines.
Additionally, Mixup augmentation increases the task diversity within each meta-task episode, while centroid convergence ensures semantic consistency even with limited samples. These components collectively help MGADA to achieve cross-machine generalization under small-sample conditions.
Unlike traditional methods that rely heavily on abundant labeled data, our approach focuses on task-based meta-optimization, which enables the model to rapidly adapt to new fault conditions with only a few labeled examples from new machines.
In summary, the MGADA framework integrates Mixup-based intra-domain augmentation, a novel arithmetic gradient alignment strategy for inter-domain consistency, and a semantic centroid convergence constraint for class-level alignment. These components are designed to jointly enhance the model’s generalization to unseen domains.
4. Experimental Evaluation
The CWRU bearing public dataset of Case Western Reserve University is used in this experiment to evaluate the generalization ability of the proposed method. MGADA is comprehensively compared with other benchmark methods, and the results show that it performs consistently across all evaluations and achieves state-of-the-art results on the adopted dataset. The study also discusses the effectiveness of each component. The experimental details are described below.
4.1. Benchmark Method
ERM: Empirical risk minimization (ERM) is the most basic learning paradigm. The model is trained directly on the overall distribution of all source domain samples by minimizing the empirical risk to learn the parameters.
Mixup: Mixup builds on ERM by introducing linear interpolation-based sample augmentation.
Fish: Fish enhances generalization through meta-learning-style gradient alignment. It simulates the process of “adapting on one source domain and generalizing on another,” with the core idea of aligning gradient directions across source domains to learn parameters that are stable across domains.
IRM: Invariant Risk Minimization (IRM) learns an invariant feature representation across multiple environments (domains) and requires that the same classifier is optimal on all source domains using this representation.
GroupDRO: Group Distributionally Robust Optimization (GroupDRO) assumes that each source domain is a “group distribution,” and the optimization objective is to minimize the worst-case domain risk. Implementation typically involves adaptive reweighting to improve performance on weaker domains.
MLDG: Meta-Learning for Domain Generalization (MLDG) draws inspiration from MAML, dividing source domains into “meta-train” and “meta-test” subsets. The model first updates on meta-train, then computes the loss on meta-test and backpropagates it to the original parameters. By explicitly simulating the cross-domain transfer process, it improves generalization to the target domain.
The above baseline methods can be broadly categorized into three classes. Data augmentation methods (e.g., Mixup, which builds on ERM) rely on sample interpolation and generation to enhance data diversity and model robustness but lack explicit modeling of inter-domain differences. Meta-learning methods (e.g., Fish, MLDG) explicitly improve generalization by simulating the “cross-domain training/testing” process, where Fish emphasizes gradient direction consistency and MLDG adopts the MAML approach for cross-domain optimization. The third class comprises invariance or robust optimization methods (e.g., IRM, GroupDRO): the former enforces learning of domain-invariant features, while the latter minimizes worst-case domain risk to improve performance on weaker domains.
Although these methods tackle the DG problem from different perspectives—data augmentation, optimization robustness, feature invariance, and learning paradigms—they each present limitations. To address this, the study proposes a method that integrates meta-learning, Mixup, arithmetic gradient alignment, and centroid convergence. By collaboratively modeling across the data, optimization, and representation levels, the method not only alleviates overfitting in source domains, but also reduces cross-domain gradient conflicts and strengthens feature distribution alignment, thereby significantly improving diagnostic generalization under complex working conditions.
4.2. Experimental Dataset
The CWRU bearing dataset is provided by the Case Western Reserve University Bearing Data Center.
Figure 2 shows a bearing test platform similar to that of Case Western Reserve University. The bearing states include the following: Normal (Nor), Inner Race Fault (IRF), Outer Race Fault (ORF), and Ball Fault (BF). Each fault type has three severity levels, with fault diameters of 0.1778 mm, 0.3556 mm, and 0.5334 mm, respectively. Each fault type is tested under four operating loads (Horsepower, HP): 0HP, 1HP, 2HP, and 3HP, corresponding to torque levels of approximately 0, 1.47, 2.94, and 4.41 N·m, respectively. The vibration data collected under each load are treated as an independent source domain, while one load (e.g., 2HP) is excluded during training and used as the target domain to evaluate out-of-distribution generalization. This study uses the drive-end bearing signals sampled at 12 kHz. Considering the normal state and the three fault locations with three fault diameters each, a total of 10 health conditions are defined. For each health condition, 200 samples are selected, and each sample consists of 512 data points.
4.2.1. Experimental Output Definition
The output of the experiment is the predicted fault class of each bearing signal sample, corresponding to one of the ten predefined categories: normal, and nine specific faults (three fault types × three severities). The model is evaluated in terms of multi-class classification accuracy on the unseen target domain.
4.2.2. Bearing Specification
The bearings used in the experiment are 6205-2RS deep groove ball bearings, manufactured by SKF. The fault conditions (ball, inner race, and outer race defects) were seeded using electro-discharge machining (EDM) with fault diameters of 0.007″, 0.014″, and 0.021″, following the standard CWRU bearing test setup.
The data preprocessing pipeline of this study centers on fixed-length time-domain segmentation with a random starting point, conversion to a dB-scaled standard STFT power spectrum, and rendering as a rasterized color spectrogram. This stably converts the CWRU time-domain vibration signals into a time–frequency image dataset of uniform style, convenient for training and evaluating downstream deep learning models.
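A minimal sketch of this pipeline under stated assumptions (512-point segments, SciPy's STFT, Matplotlib rasterization); the window length, overlap, and colormap are illustrative choices, not reported settings.

```python
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt

def segment_to_spectrogram_image(x, fs=12_000, seg_len=512, out_path="sample.png"):
    """Random-start fixed-length segmentation + dB STFT power spectrum
    + rasterized color spectrogram (window/overlap are illustrative)."""
    start = np.random.randint(0, len(x) - seg_len)
    seg = x[start:start + seg_len]                   # fixed-length segment
    f, t, Zxx = signal.stft(seg, fs=fs, nperseg=64, noverlap=48)
    power_db = 20 * np.log10(np.abs(Zxx) + 1e-10)    # dB-scaled power spectrum
    plt.imsave(out_path, power_db, origin="lower", cmap="viridis")
```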
4.3. Experimental Setup
The CWRU dataset is divided into four domains according to the load of different operating conditions: 0HP, 1HP, 2HP, and 3HP. Each domain contains 10 label classes corresponding to different fault types and severities. The numbers “7”, “14”, and “21” represent the fault diameters of 0.007 inch, 0.014 inch, and 0.021 inch, respectively, indicating increasing severity levels (slight, moderate, and severe). Specifically, “BF”, “IRF”, and “ORF” denote Ball Fault, Inner Race Fault, and Outer Race Fault, while “Nor” represents the normal condition.
Therefore, the 10 label classes are organized as follows:
0: 7BF (0.007″ Ball Fault)
1: 7IRF (0.007″ Inner Race Fault)
2: 7ORF (0.007″ Outer Race Fault)
3: 14BF (0.014″ Ball Fault)
4: 14IRF (0.014″ Inner Race Fault)
5: 14ORF (0.014″ Outer Race Fault)
6: 21BF (0.021″ Ball Fault)
7: 21IRF (0.021″ Inner Race Fault)
8: 21ORF (0.021″ Outer Race Fault)
9: Nor (Normal)
Hence, there is a direct relation between label classes and severity levels: each fault type (ball, inner race, outer race) includes three severity levels, which together form nine fault classes plus one normal class. In the experiments, a cross-domain generalization evaluation protocol is adopted: one load condition is fixed as the target domain (not participating in training), and the remaining load conditions serve as source domains. In terms of model structure, this study uses the residual network ResNet-18 as the backbone for feature extraction. The input is 3 × 224 × 224, the output feature dimension is 256, and a classification head maps the 256-dimensional features to the 10-class label space. On this basis, Mixup data augmentation, meta-learning inner/outer loop updates, gradient alignment, and centroid convergence regularization are introduced to realize cross-domain generalization training.
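A sketch of the described backbone configuration; the use of torchvision's ResNet-18 and the identifier `MGADANet` are assumptions about the implementation, not the authors' released code.

```python
import torch.nn as nn
from torchvision.models import resnet18

class MGADANet(nn.Module):
    """ResNet-18 backbone -> 256-d features -> 10-class head,
    matching the structure described in Section 4.3."""
    def __init__(self, feat_dim=256, num_classes=10):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, feat_dim)
        self.features = backbone                            # g_psi: 3x224x224 -> 256
        self.classifier = nn.Linear(feat_dim, num_classes)  # h_phi: 256 -> 10

    def forward(self, x):
        return self.classifier(self.features(x))
```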
Initialization of training parameters: random seed 2025 (fixed to ensure reproducibility of the experiment); optimizer SGD with momentum 0.9; learning rate initialized to 0.001 and dynamically adjusted with cosine annealing scheduling; weight decay 0.0001; batch size 64; 600 training rounds; inner-loop learning rate $\alpha$ and outer-loop learning rate $\beta$; loss weight coefficients $\lambda_{\mathrm{align}}$ and $\lambda_{\mathrm{cent}}$. For Mixup augmentation, the interpolation coefficient $\lambda$ is sampled from a $\mathrm{Beta}(\alpha_{\mathrm{mix}}, \alpha_{\mathrm{mix}})$ distribution with $\alpha_{\mathrm{mix}} = 0.2$, following [19].
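A sketch of this training setup in PyTorch; the inner-loop rate and loss weights are placeholders because their values are not restated here.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR

torch.manual_seed(2025)                      # fixed seed for reproducibility

model = MGADANet()                           # backbone sketch from above
optimizer = SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)
scheduler = CosineAnnealingLR(optimizer, T_max=600)  # 600 training rounds

BATCH_SIZE = 64
INNER_LR = 0.01    # placeholder for the inner-loop rate alpha
LAM_ALIGN = 0.1    # placeholder weight for L_align
LAM_CENT = 0.1     # placeholder weight for L_cent
MIXUP_ALPHA = 0.2  # Beta(0.2, 0.2) interpolation, following [19]
```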
4.4. Effectiveness Analysis
The study discusses the performance of the proposed method and baseline methods in terms of classification accuracy. To further validate the intra-class clustering effect and specific classification behavior of the proposed method, the study visualizes its results on different target domains using t-SNE [37] and presents the corresponding confusion matrices [38]. For a more comprehensive comparison, the study independently implemented MLDG, ERM, Mixup, Fish, IRM, and GroupDRO, as the original authors provided only demonstration code; all implementations were evaluated on datasets and network architectures not reported in the original works. The results of these methods are presented in Table 1.
As shown in Table 1, the performance of different methods under cross-condition testing on the CWRU dataset varies significantly. ERM, as the most basic empirical risk minimization approach, achieves an overall accuracy of 98.14%, indicating that deep models still possess a certain degree of cross-domain generalization capability when data are sufficient. Building on ERM, Mixup enhances data diversity through sample interpolation, reaching an average accuracy of 98.01% and performing better on some individual tasks; however, lacking explicit inter-domain consistency modeling, it shows no overall advantage.
Meta-learning methods exhibit larger performance differences. Fish achieves nearly 100% accuracy on the “0, 2, 3 → 1” and “0, 1, 3 → 2” tasks, demonstrating the value of aligned optimization directions. In contrast, MLDG performs unstably on certain tasks (e.g., only 79.30% on “0, 2, 3 → 1”), indicating insufficient adaptability under complex operating conditions. Among invariant representation and robust optimization methods, IRM almost fails (average accuracy of only 16.61%), whereas GroupDRO is relatively stable (98.15%) but still inferior to ERM and Mixup, revealing the limitations of worst-case domain optimization strategies.
In comparison, the proposed MGADA consistently achieves the best or near-best performance across all tasks, with an average accuracy of 98.86%, significantly outperforming all baseline methods. In particular, it reaches 99.75% and 99.90% on the “0, 1, 2 → 3” and “0, 1, 3 → 2” tasks, respectively. Its superiority arises from three synergistic components: Mixup enhances sample diversity and smooths the decision boundary; arithmetic gradient alignment ensures consistent cross-domain update directions, mitigating optimization conflicts; and centroid convergence constraints promote clustering of same-class samples across domains, strengthening semantic consistency and domain invariance. Overall, MGADA effectively combines data augmentation, optimization consistency, and feature alignment, demonstrating stronger robustness and stability in cross-condition diagnostic tasks.
In Figure 3, under the four target domain experimental settings (a: 1, 2, 3 → 0HP; b: 0, 2, 3 → 1HP; c: 0, 1, 3 → 2HP; d: 0, 1, 2 → 3HP), MGADA ensures that the recognition rates for most classes are close to 100%. In particular, for tasks b and c, nearly all classes achieve perfect classification, indicating that the method can effectively learn domain-invariant features in cross-condition scenarios.
In Figure 3a (the 1, 2, 3 → 0HP task), a few misclassifications occur, such as confusion between the “7BF” and “Nor” classes (e.g., a small number of 7BF samples are misclassified as Normal). This may be due to the low-load operating condition, where signal differences among classes are less pronounced, leading to partial overlap in feature distributions. Nevertheless, the overall recognition rate remains high.
The proposed MGADA method demonstrates excellent overall performance across different target domain test tasks on the CWRU dataset. Most fault classes are correctly identified, and the values along the diagonal of the confusion matrices are close to perfect, reflecting the model’s strong generalization capability and high classification accuracy.
Figure 4 presents the feature distribution visualizations of the proposed method under different target domain operating conditions (0HP, 1HP, 2HP, and 3HP). The following observations can be made:
(i) In all target domain scenarios, samples of the same fault class form clear and compact clusters in the low-dimensional space, with large separations between different classes. This indicates that the proposed method can learn highly discriminative feature representations, effectively mitigating distribution discrepancies caused by varying load conditions.
(ii) Although the target domains were not involved during training (reserved as test domains), the visualization shows that target domain samples can still align with the class clusters formed by the source domains in the embedding space. This is attributed to the gradient alignment in this study: by maintaining consistent gradient directions across different source domains during optimization, the model learns update patterns that generalize to unseen domains.
(iii) The clusters of each class exhibit stable distributions across different target domain experiments, demonstrating the effectiveness of the centroid convergence constraint, which drives same-class samples to aggregate at unified feature centroids under cross-domain conditions, enhancing semantic consistency of the feature distribution.
(iv) In the t-SNE visualizations, cluster boundaries are relatively smooth, with no severe class overlap. This is closely related to the incorporation of Mixup data augmentation, which introduces interpolation constraints in both the sample and label spaces, smoothing the decision boundaries and improving robustness.
Although the overall clustering effect is satisfactory, in some target domains (e.g., the 0 HP target domain), the clusters of certain classes are close to those of other classes, indicating that load variations still have some impact on feature distributions. This suggests that further improving cross-domain discriminability under complex operating conditions remains a challenging research problem.
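For reference, a minimal sketch of producing such embeddings with scikit-learn's t-SNE from extracted feature vectors; array names are illustrative.

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(features, labels, out_path="tsne.png"):
    """Project (n, d) backbone features to 2-D with t-SNE, color by class."""
    emb = TSNE(n_components=2, random_state=2025).fit_transform(features)
    plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=5, cmap="tab10")
    plt.savefig(out_path, dpi=200)
```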
4.5. Ablation Experiments
In order to further verify the effectiveness and necessity of each component module of the proposed method, systematic ablation experiments were carried out. As noted by Sheikholeslami et al. [39], the core idea of ablation is to remove or retain key modules one at a time and investigate the resulting changes in model performance, so as to clarify how different modules contribute to generalization.
Specifically, the proposed method consists of three main parts: Mixup, gradient alignment, and centroid convergence. In the ablation experiments, the study designs the following control groups: (i) No Mixup: removes Mixup (retaining only the meta-learning part) to evaluate the role of data augmentation in improving generalization; (ii) No Align: removes the gradient alignment constraint $\mathcal{L}_{\mathrm{align}}$ to investigate the effect of optimization consistency on cross-domain generalization; and (iii) No Cent: removes the centroid convergence constraint $\mathcal{L}_{\mathrm{cent}}$ to verify the contribution of feature invariance modeling. According to the accuracies in Table 1, all methods achieve their best classification performance on the target domain 2HP. To minimize the influence of target domain selection on the experimental results, the target domain is set to 2HP, with 0HP, 1HP, and 3HP as source domains. The results of the ablation experiments are shown in Table 2 and Figure 5.
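In implementation terms, each ablation variant corresponds to disabling one term of Equation (17); a purely illustrative sketch of such toggles:

```python
# Ablation variants as toggles on Eq. (17); weight values are placeholders.
ABLATIONS = {
    "full":     dict(use_mixup=True,  lam_align=0.1, lam_cent=0.1),
    "no_mixup": dict(use_mixup=False, lam_align=0.1, lam_cent=0.1),
    "no_align": dict(use_mixup=True,  lam_align=0.0, lam_cent=0.1),
    "no_cent":  dict(use_mixup=True,  lam_align=0.1, lam_cent=0.0),
}
```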
In Table 2, the removal of any single module leads to a significant decline in model performance: the accuracy of No Mixup drops to 79.80%, No Align to 89.30%, and No Cent to 88.05%, whereas the complete MGADA method achieves a classification accuracy of 99.90%. These results demonstrate that each module plays an important role in enhancing cross-condition generalization capability.
Further insights can be obtained from the confusion matrices (Figure 5), which reveal specific classification differences. When Mixup is removed, classes such as 7BF and Normal exhibit noticeable confusion, indicating that the model struggles to form robust decision boundaries due to limited training sample diversity. The incorporation of Mixup generates diversified samples through interpolation, effectively smoothing the feature space and improving the model's adaptability to cross-condition variations.
After removing gradient alignment, the recognition accuracy of fault types such as 14ORF and 21IRF decreases, reflecting update conflicts caused by misaligned optimization directions across source domains. The arithmetic gradient alignment mechanism effectively coordinates gradient directions from multiple domains, enhancing the stability of the optimization process.
Without centroid convergence, features of the same class under different operating conditions are dispersed, leading to ambiguous distinctions between classes such as 7BF vs. Normal and 14IRF vs. 21IRF. This indicates that the absence of semantic consistency constraints weakens the quality of the learned representations. The centroid convergence strategy drives same-class features to aggregate toward a unified centroid, significantly improving intra-class compactness and inter-class separability.
Ultimately, under the complete MGADA framework, the synergy of all three components enables the model to achieve 99.90% accuracy on this task, with the confusion matrix exhibiting a nearly ideal diagonal structure.
4.6. Insightful Analysis
Although IRM (Invariant Risk Minimization) aims to learn invariant features across domains, its poor performance in our experiments may be attributed to two main reasons. First, IRM assumes the existence of invariant causal mechanisms across domains, which may not hold in vibration signal data where sensor position, noise level, and operational load introduce strong spurious correlations. Second, IRM’s optimization is highly sensitive to hyperparameters and often suffers from underfitting when the sample size is limited, which is common in few-shot domain generalization tasks.
MGADA demonstrates the most significant improvements over Mixup and GroupDRO under multi-source and high domain-shift settings, such as when combining 0HP and 3HP as source domains to test on 1HP. In such settings, the arithmetic gradient alignment helps mitigate inter-domain gradient conflicts, which GroupDRO fails to address explicitly. Furthermore, MGADA’s centroid convergence mechanism enhances class-wise semantic consistency, allowing better generalization across mismatched distributions compared to vanilla Mixup.
Under low-load conditions (e.g., 0HP → 1HP): The Mixup module is especially effective in enhancing intra-domain diversity and preventing overfitting. The centroid constraint helps preserve class structure, which is often fragile under low SNR.
Under high-load conditions (e.g., 2HP → 3HP): The gradient alignment module plays a dominant role in balancing gradients across conflicting source domains. This is crucial when vibration patterns are complex and source domains vary significantly.
These observations highlight that each component of MGADA contributes under different domain-shift scenarios, and their combined effect offers a more robust and adaptive generalization strategy.
4.7. Practical Implications for Industrial Fault Monitoring
The proposed MGADA framework holds several advantages for real-world deployment in rotating machinery health monitoring:
4.7.1. Deployment in Online Monitoring Systems
MGADA can be integrated into existing predictive maintenance pipelines by replacing or enhancing the diagnostic model component. Given that MGADA is trained offline across multiple operational domains, its inference stage only requires a single forward pass, making it suitable for real-time condition monitoring.
4.7.2. Feasibility on Embedded and Edge Devices
The backbone used (a standard ResNet-18, a comparatively lightweight CNN) keeps the model size modest. After offline training, MGADA can be deployed on edge AI platforms such as NVIDIA Jetson Nano, Raspberry Pi with Coral TPU, or ARM-based microcontrollers with AI accelerators. Our tests show that the inference time per sample is below 20 ms on Jetson TX2, satisfying the latency requirements of industrial PHM.
4.7.3. Robustness to Unseen Domains in PHM
In practical settings, rotating machinery often encounters non-stationary conditions, such as load variation, sensor drift, or environmental noise. MGADA explicitly addresses these challenges by learning domain-invariant and semantically consistent representations, improving robustness against previously unseen conditions. This reduces the need for frequent retraining when operating conditions shift, thus lowering maintenance cost and improving system reliability.
4.7.4. Adaptability to Other Industrial Assets
While our experiments focus on bearing datasets, MGADA is a generic framework and can be extended to other rotating components (e.g., gears, shafts, and impellers) by retraining on appropriate sensor data. This scalability makes it a versatile tool for broader industrial Prognostics and Health Management (PHM) systems.
5. Conclusions
This study addresses the challenges of small sample sizes and cross-domain distribution discrepancies in bearing fault diagnosis for rotating machinery under complex operating conditions. It proposes a meta-learning-based cross-domain bearing fault diagnosis method that integrates data augmentation and gradient alignment (MGADA). Within the meta-learning framework, MGADA incorporates Mixup data augmentation, arithmetic gradient alignment, and centroid convergence constraints to enhance cross-domain robustness and generalization capability from three perspectives: data, optimization, and representation. Compared to existing meta-learning or domain generalization frameworks such as Fish, MLDG, or MetaMixup, MGADA introduces a unique arithmetic gradient alignment loss that minimizes pairwise gradient differences across domains, along with a centroid convergence mechanism that encourages cross-domain semantic consistency. These two novel modules together contribute to the improved performance demonstrated in our experiments.
In cross-condition experiments on the CWRU dataset, MGADA achieves the best or near-best classification results across different target domains, with an average accuracy of 98.86%, significantly outperforming several representative methods including ERM, Mixup, Fish, IRM, GroupDRO, and MLDG.
Further ablation studies validate the critical contributions of each module to performance improvement, while t-SNE and confusion matrix visualizations demonstrate that MGADA effectively aligns cross-domain features and promotes intra-class clustering. In summary, the proposed method not only provides a new theoretical perspective for domain-generalized fault diagnosis but also shows strong practical potential for applications under complex operating conditions and unseen environments.
Limitations and Future Work
Although we have compared MGADA with several widely used domain generalization (DG) and meta-learning baselines such as ERM, Mixup, Fish, IRM, GroupDRO, and MLDG, we acknowledge that more advanced and recent DG methods—particularly those based on adversarial learning, causal inference, or contrastive learning—are not yet included in this study. Representative examples include ANDMask, CausalDG, and contrastive DG methods like CLOCS.
In future work, we plan to integrate and benchmark MGADA against these emerging approaches in the context of fault diagnosis, which will offer a more comprehensive comparison and further validate the generalization capability of the proposed framework. We also plan to combine open-set generalization with learning under class imbalance in a meta-learning framework [40], extending MGADA with modules for open-set fault detection, uncertainty-aware decision-making, and semi-supervised adaptation, so that it can handle unknown categories and be deployed reliably in real, noisy, and evolving industrial environments.