Episodic Training and Feature Orthogonality-Driven Domain Generalization for Rotating Machinery Fault Diagnosis Under Unseen Working Conditions

Liao, Yixiao; Zhou, Songbin; Liu, Yisen; Pang, Kunkun; Li, Jing; Li, Chang; Zhao, Lulu

doi:10.3390/machines13070563

by Yixiao Liao ^1,2, Songbin Zhou ^1,2,*, Yisen Liu ^1,2, Kunkun Pang ^1,2, Jing Li ^1,2, Chang Li ^1,2 and Lulu Zhao ^1,2

¹ Institute of Intelligent Manufacturing, Guangdong Academy of Sciences, Guangzhou 510070, China

² Guangdong Key Laboratory of Modern Control Technology, Guangzhou 510070, China

^* Author to whom correspondence should be addressed.

Machines 2025, 13(7), 563; https://doi.org/10.3390/machines13070563

Submission received: 22 May 2025 / Revised: 17 June 2025 / Accepted: 27 June 2025 / Published: 28 June 2025

(This article belongs to the Special Issue Advanced Signal Processing Methods and Deep Neural Networks for Machine Fault Diagnosis)

Abstract

In recent years, domain generalization-based fault diagnosis (DGFD) methods have shown significant potential in rotating machinery fault diagnosis in unseen target domains. However, these methods focus on learning domain-invariant representations via feature distribution adaptation. The generalization of classifiers and the orthogonality between fault-related and domain-related features have not been thoroughly explored, which hinders further improvements in DGFD performance. To address these limitations, an episodic training and feature orthogonality-driven domain generalization (EODG) method is proposed. In this method, episodic training is introduced to jointly improve the generalization capabilities of both the feature extractor and fault classifier, while a novel feature transfer loss is proposed for learning domain-invariant representations. Furthermore, the orthogonality between fault-related and domain-related features is enhanced by minimizing their cosine similarity, thereby improving the generalization capability of the DGFD model. The experimental results validated the effectiveness and superiority of the proposed method on domain generalization-based fault diagnosis tasks.

Keywords: fault diagnosis; rotating machinery; domain generalization; episodic training; feature orthogonality

1. Introduction

Rotating machinery is a critical component of mechanical systems and is widely used in industrial applications [1,2]. With the growing complexity and intelligence of mechanical systems, unexpected failures of rotating machinery can lead to severe economic losses and even safety accidents. Consequently, rotating machinery fault diagnosis has become increasingly important for ensuring the operational reliability of mechanical systems, as it can significantly reduce safety accidents and downtime [3,4].

With the advancement of artificial intelligence, intelligent fault diagnosis (IFD) has gained widespread attention and become the mainstream technology for rotating machinery fault diagnosis [5,6]. Many researchers have already integrated a whole IFD system into a bearing [7]. Among these IFD methods, deep learning (DL)-based methods are particularly prominent due to their extraordinary nonlinear mapping capability, which allows them to map the raw operating data of rotating machinery to its health condition without manual feature extraction [8,9]. However, DL models typically require a large amount of labeled training data and assume that the test data at the inference stage are independent of, and identically distributed with, the training data. Because the working condition of rotating machinery has a significant influence on data distributions, a DL model needs to be retrained from scratch whenever the working condition changes in order to avoid performance degradation, which limits the application of DL-based methods [10,11].

To solve the above problems, researchers introduced transfer learning for rotating machinery fault diagnosis, which includes domain adaptation-based fault diagnosis (DAFD) and domain generalization-based fault diagnosis (DGFD) [12,13,14]. DAFD leverages labeled source-domain datasets to learn fault diagnosis knowledge and transfers it to target-domain fault diagnosis tasks via collaborative training with unlabeled target-domain datasets. Tian et al. [15] proposed a multi-source information transfer learning method for DAFD, which used local maximum mean discrepancy (MMD) for fine-grained local alignment and used distribution distance to weigh source domains. Qian et al. [16] developed a novel distribution discrepancy metric for cross-machine fault diagnosis by combining MMD and CORAL. Huo et al. [17] proposed a novel linear superposition network with pseudo-label learning for DAFD. Li et al. [18] proposed an auto-regulated universal domain adaptation network for universal domain adaptation fault diagnosis, which does not require prior knowledge about the label space of the target domain. DAFD can effectively retrain the model for fault diagnosis tasks under new working conditions without obtaining a labeled dataset, which significantly reduces the cost of training. However, DAFD needs to collect target-domain datasets, and the trained model is only available for the target domain, while the diagnosis performance of the trained model still suffers from significant degradation under unseen working conditions.

In practical industrial applications, the working condition of rotating machinery frequently needs to be adjusted to satisfy manufacturing requirements, and collecting fault data is expensive and time-consuming [19]. DGFD was proposed to address this concern and to further broaden the applications of DL models. The goal of DGFD is to learn domain-invariant fault diagnosis knowledge from multiple source domains and to train a model that can be applied effectively to fault diagnosis tasks in an unseen target domain. Therefore, the trained DGFD model can maintain its performance under unseen working conditions. Recently, DGFD research has made considerable progress, and many works have been published. Li et al. [20] proposed a time-stretching method for domain augmentation and combined it with domain-adversarial training and distance metric learning to learn domain-invariant fault diagnosis knowledge. Zhang et al. [21] proposed conditional generative adversarial networks (CGANs) for bearing DGFD, which used a discriminator that can simultaneously classify the fault type and domain label for domain-adversarial training. Chen et al. [22] proposed adversarial-domain-invariant generalization (ADIG) for bearing fault diagnosis under unseen conditions, in which adversarial learning and feature normalization strategies were leveraged to learn domain-invariant knowledge. Ragab et al. [23] used mutual information to capture shareable fault information and learn domain-independent representations. Li et al. [24] proposed a causal consistency loss and a collaborative training loss to learn consistent causality knowledge. Jia et al. [25] proposed causal disentanglement domain generalization for machine domain generalization fault diagnosis, which used a structural causal model to disentangle fault-related and domain-related representations. Targeting imbalanced DGFD, Zhao et al. [26] used a semantic regularization-based mix-up strategy to synthesize samples for minority classes; they acquired discriminative knowledge by minimizing the triplet loss. Zhu et al. [27] proposed a decoupled interpretable robust domain generalization network (DIRNet), which used dynamic Shapley to prune the fault-unrelated neural basis functions. Pang [28] maximized the independence between features and domain labels to obtain domain-invariant features based on the Hilbert–Schmidt Independence Criterion (HSIC). Xu et al. [29] proposed a Domain-Private-Suppress Meta-Recognition Network (DPSMR), which can recognize unknown fault types in domain generalization fault diagnosis tasks. Recent studies have applied transformers and self-supervised learning to DGFD: Lu et al. [30] proposed a prior knowledge-embedded convolutional autoencoder (PKECA), which constructed a centroid-based self-supervised learning strategy to improve the generalization of the model; Xiao et al. [31] proposed a Bayesian variational transformer that treated all the attention weights as latent random variables to train an ensemble of networks for enhancing the generalization of the fault diagnosis model. In the existing literature, most DGFD methods focus on learning domain-invariant representations across source domains, while the generalization of fault classifiers is overlooked. Because the domain-invariant representations are learned by aligning feature distributions, the learned representations are not strictly domain-invariant or independent of domain-related features. Therefore, these methods are less effective when the discrepancy between the target domain and the source domains is substantial.

To address these challenges, this paper proposes an episodic training and feature orthogonality-driven domain generalization (EODG) method. This method introduces episodic training between the general modules and domain-specific modules to improve the generalization capabilities of both the fault classifier and the feature extractor. Specifically, the general feature extractor is paired with domain-specific classifiers, and the general classifier is combined with domain-specific feature extractors, while the hybrid models are trained by supervised learning. Via episodic training, the classifier learns to classify features with or without domain information, thereby broadening the decision boundaries. In addition, a novel feature transfer loss is proposed for learning domain-invariant representations. This loss minimizes the distribution discrepancy between same-class samples across different source domains, while maximizing the distribution discrepancy between different-class samples. As a result, the intra-class feature distribution becomes more compact, while the inter-class separability is improved. Furthermore, a feature orthogonalization constraint is applied to fault-related and domain-related features to further eliminate domain information.

In EODG, the basic domain generalization capability of the DGFD model is achieved by minimizing the feature transfer loss, whereas the combination of episodic training and feature orthogonality further improves the generalization of both the general feature extractor and the general fault classifier. The main contributions of this study are as follows:

(1): A novel EODG method is proposed for the DGFD of rotating machinery. The proposed EODG method can effectively diagnose the health state of rotating machinery under unseen working conditions by jointly improving the generalization capabilities of the feature extractor and fault classifier.
(2): Episodic training is introduced to broaden the decision boundaries of the general fault classifier. The general module is integrated with domain-specific modules, and the hybrid models are trained by supervised learning.
(3): A feature orthogonalization constraint is combined with the proposed feature transfer loss to train a general feature extractor that can extract domain-invariant features.

2. Materials and Methods

2.1. Problem Formulation

In this paper, the heterogeneous DGFD problem is studied. We consider the source domains

D^{s} = {\{D_{i}^{s}\}}_{i = 1}^{n_{s}} = {\{X_{i}^{s}, P (X_{i}^{s})\}}_{i = 1}^{n_{s}}

and their learning tasks

T^{s} = {\{T_{i}^{s}\}}_{i = 1}^{n_{s}} = {\{Y_{i}^{s}, f_{i}^{s} (\cdot)\}}_{i = 1}^{n_{s}}

, as well as the target domains

D^{t} = {\{D_{i}^{t}\}}_{i = 1}^{n_{t}} = {\{X_{i}^{t}, P (X_{i}^{t})\}}_{i = 1}^{n_{t}}

and learning tasks

T^{t} = {\{T_{i}^{t}\}}_{i = 1}^{n_{t}} = {\{Y_{i}^{t}, f_{i}^{t} (\cdot)\}}_{i = 1}^{n_{t}}

, where

X_{i}^{s}

represents the feature space,

P (X_{i}^{s})

represents the marginal distribution,

Y_{i}^{s}

represents the label space, and

f_{i}^{s} (\cdot)

represents the predictive function.

In a heterogeneous DGFD setting, the marginal distributions vary across domains; that is,

P (X_{1}^{s}) \neq \dots \neq P (X_{n_{s}}^{s}) \neq P (X_{1}^{t}) \neq \dots \neq P (X_{n_{t}}^{t})

. The label space of different source domains can be different; however, for every source domain, its label space

Y_{i}^{s} (1 \leq i \leq n_{s})

is required to overlap with at least one other source domain

Y_{j}^{s} (1 \leq j \neq i \leq n_{s})

; that is,

Y_{j}^{s} \cap Y_{i}^{s} \neq \emptyset

. Each target-domain label space is a subset of the union of the source-domain label spaces; that is,

Y_{l}^{t} \subset \cup_{i = 1}^{n_{s}} Y_{i}^{s}, (l = 1, 2, \dots, n_{t})

. The aim of DGFD is to train an intelligent fault diagnosis model with source-domain samples such that the trained model can accurately diagnose faults in unseen target domains.
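
The two label-space conditions above can be checked mechanically before an experiment is configured. The following is a minimal sketch, assuming each label space is given as a Python set of fault codes; the function name and the example codes are hypothetical and serve only as illustration. The subset test used here also accepts equality.

```python
from typing import List, Set


def check_heterogeneous_setting(source_labels: List[Set[str]],
                                target_labels: List[Set[str]]) -> bool:
    """Return True if the label spaces satisfy the two conditions above."""
    # Condition 1: every source label space overlaps with at least one other.
    for i, y_i in enumerate(source_labels):
        others = [y_j for j, y_j in enumerate(source_labels) if j != i]
        if not any(y_i & y_j for y_j in others):
            return False
    # Condition 2: every target label space lies inside the union of the
    # source label spaces.
    union = set().union(*source_labels)
    return all(y_t <= union for y_t in target_labels)


# Hypothetical fault codes, chosen only for illustration.
sources = [{"N", "IRF", "BF"}, {"N", "ORF", "BF"}, {"N", "IRF", "ORF"}]
valid = check_heterogeneous_setting(sources, [{"N", "IRF", "ORF", "BF"}])
invalid = check_heterogeneous_setting(sources, [{"N", "CF"}])  # "CF" never seen
```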

2.2. The Proposed Method

As illustrated in Figure 1, in the proposed episodic training and feature orthogonality-driven domain generalization (EODG) method, the general feature extractor

G (\cdot)

and general fault classifier

C (\cdot)

are combined to create the final DGFD model

C (G (\cdot))

, whereas the domain classification model

D (G_{d} (\cdot))

that consists of a domain feature extractor

G_{d} (\cdot)

and domain classifier

D (\cdot)

and the domain-specific fault diagnosis model

C_{i}^{s} (G_{i}^{s} (\cdot))

that consists of domain-specific feature extractors

G_{i}^{s} (\cdot)

and domain-specific fault classifiers

C_{i}^{s} (\cdot)

(i = 1, 2, \dots, n_{s})

are used to facilitate the learning of domain-invariant fault diagnosis knowledge. The learning procedure of EODG consists of three parts: (1) supervised learning; (2) domain-invariant representation learning; (3) episodic training. Supervised learning aims to enable the DGFD model, domain-specific fault diagnosis models, and the domain classification model to complete their respective fundamental tasks effectively. In addition, domain-invariant representation learning aims to enable

G (\cdot)

to extract domain-invariant features. Finally, episodic training is applied to enhance the generalization capabilities of both

C (\cdot)

and

G (\cdot)

.

2.2.1. Supervised Learning

Supervised learning is applied to train these models to acquire basic abilities on their respective classification tasks. Via supervised learning,

G (\cdot)

learns to extract fault-related features across multiple source domains,

C (\cdot)

learns to diagnose the health conditions of machinery in these domains,

G_{d} (\cdot)

learns to extract domain-related features from multiple source-domain data,

D (\cdot)

learns to classify their domain label,

G_{i}^{s} (\cdot)

learns to extract fault-related features from specific source domains, and

C_{i}^{s} (\cdot)

learns to diagnose the health conditions of machinery in specific source domains.

In this study, cross-entropy loss is used for all supervised learning tasks. The supervised learning losses for the DGFD model

L_{S}

, the domain classification model

L_{S D}

, and the domain-specific fault diagnosis models

L_{S i}

are defined as follows:

L_{S} = E [- y^{s} \log ({\hat{y}}^{s})]

(1)

L_{S D} = E [- d^{s} \log ({\hat{d}}^{s})]

(2)

L_{S i} = E [- y_{i}^{s} \log ({\hat{y}}_{i}^{s})], (i = 1, 2, \dots, n_{s})

(3)

where

n_{s}

is the number of source domains;

y^{s}

is the fault label of the source-domain sample

x^{s}

;

{\hat{y}}^{s} = C (G (x^{s}))

is the output of the DGFD model;

d^{s}

is the domain label of the source-domain sample

x^{s}

;

{\hat{d}}^{s} = D (G_{d} (x^{s}))

is the output of the domain classification model;

y_{i}^{s}

is the fault label of the i-th source-domain sample

x_{i}^{s}

, and

{\hat{y}}_{i}^{s} = C_{i}^{s} (G_{i}^{s} (x_{i}^{s}))

is the output of the i-th source-domain-specific model.
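
Equations (1)–(3) share one form and differ only in which model produces the prediction and which label is used. The following is a minimal NumPy sketch of that shared cross-entropy, assuming the model outputs are unnormalized logits and the labels are integer class indices; both are assumptions of this sketch, not details stated in the paper.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Batch mean of -log(probability assigned to the true class)."""
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + 1e-12).mean())


# A uniform prediction over 4 classes costs log(4), whatever the labels are.
uniform_loss = cross_entropy(np.zeros((3, 4)), np.array([0, 1, 3]))

# A confident, correct prediction costs much less.
confident_loss = cross_entropy(np.array([[10.0, 0.0, 0.0, 0.0]]), np.array([0]))
```

The same function would be called with fault logits and fault labels for L_{S} and L_{S i}, and with domain logits and domain labels for L_{S D}.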

2.2.2. Domain-Invariant Representation Learning

In EODG, a feature orthogonalization constraint and a feature transfer loss are combined to learn domain-invariant representations. Feature orthogonalization encourages orthogonality between fault-related and domain-related features by minimizing their cosine similarity. To align the feature distributions of same-class samples across different domains and to separate the feature distributions of different-class samples, a novel feature transfer loss based on MMD [32] is proposed in this paper. The feature orthogonalization loss

L_{F O}

and feature transfer loss

L_{F T}

are defined as follows:

L_{F O} = E [c s (G (x^{s}), G_{d} (x^{s}))]

(4)

L_{F T} = \frac{E_{i, j, c} [M M D^{2} (Z_{i, c}^{s}, Z_{j, c}^{s})] + E_{i, c} [M M D^{2} (Z_{i, c}^{s}, {\bar{Z}}_{c}^{s})]}{E_{c, \hat{c}} [M M D^{2} (Z_{c}^{s}, Z_{\hat{c}}^{s})] + E_{c} [M M D^{2} (Z_{c}^{s}, {\bar{Z}}^{s})]}

(5)

where

c s (\cdot, \cdot)

represents the cosine similarity function;

M M D (\cdot, \cdot)

represents the maximum mean discrepancy function;

Z_{i, c}^{s} = G (X_{i, c}^{s})

and

Z_{j, c}^{s} = G (X_{j, c}^{s})

represent the features of dataset

X_{i, c}^{s}

and

X_{j, c}^{s}

, respectively;

X_{i, c}^{s}

and

X_{j, c}^{s}

contain all c-th-category samples that belong to the i-th source domain and the j-th source domain, respectively;

Z_{c}^{s} = G (X_{c}^{s})

represents the feature set of dataset

X_{c}^{s}

, which contains all samples belonging to the c-th category;

{\bar{Z}}_{c}^{s} = E_{z_{c}^{s} \sim Z_{c}^{s}} [z_{c}^{s}]

represents the center of

Z_{c}^{s}

, which is the feature center of the c-th category;

{\bar{Z}}^{s} = E [G (x^{s})]

is the feature center of all samples;

i = 1, 2, \dots, n_{s} - 1

,

j = i + 1, i + 2, \dots, n_{s}

,

c = 1, 2, \dots, n_{c} - 1

, and

\hat{c} = c + 1, c + 2, \dots, n_{c}

, where

n_{c}

represents the number of categories.

It can be seen from Equation (5) that by minimizing

L_{F T}

, the feature distributions of samples that have the same fault label but different domain labels are aligned, and all feature distributions of same-class samples are aligned with their feature center; meanwhile, the feature distributions of different categories are separated, and the feature distributions of each category are separated from the feature center of all samples.
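
Equations (4) and (5) can be sketched numerically as follows. This is an illustrative NumPy version, not the authors' implementation. It assumes a Gaussian kernel with bandwidth `sigma` and a biased MMD estimator, neither of which is specified in the text. Feature centers are treated as one-sample sets, and all function names are hypothetical.

```python
import numpy as np
from itertools import combinations


def cosine_similarity_loss(f_fault, f_domain, eps=1e-12):
    """Equation (4): mean cosine similarity between paired feature vectors."""
    num = (f_fault * f_domain).sum(axis=1)
    den = np.linalg.norm(f_fault, axis=1) * np.linalg.norm(f_domain, axis=1) + eps
    return float((num / den).mean())


def mmd2(x, y, sigma=1.0):
    """Biased squared MMD with a Gaussian kernel (kernel choice is an assumption)."""
    def kernel(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return float(kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean())


def feature_transfer_loss(feats, labels, domains, eps=1e-12):
    """Equation (5): intra-class discrepancy divided by inter-class discrepancy.

    Requires at least two classes so that the denominator is defined.
    """
    classes = np.unique(labels)
    z_c = {c: feats[labels == c] for c in classes}
    center_all = feats.mean(axis=0, keepdims=True)
    same_pairs, to_class_center = [], []
    for c in classes:
        center_c = z_c[c].mean(axis=0, keepdims=True)
        doms = np.unique(domains[labels == c])
        z_ic = {d: feats[(labels == c) & (domains == d)] for d in doms}
        # Same class, different source domains.
        same_pairs += [mmd2(z_ic[i], z_ic[j]) for i, j in combinations(doms, 2)]
        # Same class, each domain against the class center.
        to_class_center += [mmd2(z_ic[d], center_c) for d in doms]
    diff_pairs = [mmd2(z_c[a], z_c[b]) for a, b in combinations(classes, 2)]
    to_global_center = [mmd2(z_c[c], center_all) for c in classes]
    numerator = (np.mean(same_pairs) if same_pairs else 0.0) + np.mean(to_class_center)
    denominator = np.mean(diff_pairs) + np.mean(to_global_center)
    return float(numerator / (denominator + eps))


# Synthetic features: 2 classes x 2 domains, 20 samples each (illustrative only).
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 40)
domains = np.tile(np.repeat([0, 1], 20), 2)
noise = 0.1 * rng.normal(size=(80, 8))
class_separated = noise + 3.0 * labels[:, None]    # classes apart, domains aligned
domain_separated = noise + 3.0 * domains[:, None]  # domains apart, classes mixed
loss_good = feature_transfer_loss(class_separated, labels, domains)
loss_bad = feature_transfer_loss(domain_separated, labels, domains)
orthogonal = cosine_similarity_loss(np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]))
```

In this toy setting the loss is small when features cluster by fault class and large when they cluster by domain, which is the behavior the paragraph above describes.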

2.2.3. Episodic Training

Most existing DGFD methods focus on training feature extractors to extract domain-invariant features, while the generalization capability of the fault classifier is overlooked. In EODG, episodic training [33] is applied to improve the generalization capabilities of both the general feature extractor

G (\cdot)

and the general fault classifier

C (\cdot)

. In episodic training, general modules and domain-specific modules are combined to form hybrid models, which are trained in a supervised manner. As shown in Figure 2, for the case of two source domains, the i-th and j-th domain-specific feature extractors are combined with the general fault classifier to form hybrid models

C (G_{i}^{s} (\cdot))

and

C (G_{j}^{s} (\cdot))

, respectively; meanwhile, the i-th and j-th domain-specific fault classifiers are combined with the general feature extractor to form hybrid models

C_{i}^{s} (G (\cdot))

and

C_{j}^{s} (G (\cdot))

. Then, labeled samples from other domains are used for the supervised training of each domain-specific hybrid model. Finally, the trained

G (\cdot)

is able to extract general features from samples of other domains which can be classified by the domain-specific fault classifier, and the trained

C (\cdot)

is able to classify the features extracted by the domain-specific feature extractor. Therefore, the generalization capabilities of

G (\cdot)

and

C (\cdot)

are effectively improved.

To meet the requirements of a heterogeneous DGFD setting, only shared-category samples from other domains are used for the supervised training of each domain-specific hybrid model in EODG. The loss of episodic training is defined as follows:

\begin{array}{l} L_{E T} = L_{E T, G} + L_{E T, C} \\ = E_{i, j, x_{i j}^{s}} [- y_{i j}^{s} \log (C_{i}^{s} (G (x_{i j}^{s})))] + E_{i, j, x_{i j}^{s}} [- y_{i j}^{s} \log (C (G_{i}^{s} (x_{i j}^{s})))] \end{array}

(6)

where

x_{i j}^{s}

is the sample of the i-th source domain that comes from the shared categories between the i-th and j-th source domains, and

y_{i j}^{s}

is the fault label of

x_{i j}^{s}

,

i = 1, 2, \dots, n_{s} - 1

,

j = i + 1, i + 2, \dots, n_{s}

.
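
The two terms of Equation (6) can be sketched as cross-entropy losses of the hybrid models. The example below is a simplified NumPy illustration in which random linear maps stand in for the networks; these stand-ins and all names are hypothetical. Which domain's shared-category samples are paired with which domain-specific module follows Equation (6) and the accompanying text, so the sketch takes that pairing as an input rather than fixing it.

```python
import numpy as np


def cross_entropy(logits, labels):
    """Mean cross-entropy from unnormalized logits and integer labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def episodic_loss(G, C, G_spec, C_spec, batches):
    """Return (L_ET,G, L_ET,C) averaged over the supplied pairings.

    batches: list of (i, x, y), where i indexes the domain-specific modules
    paired with the shared-category samples x and their fault labels y.
    """
    # General extractor feeding a domain-specific classifier: trains G.
    loss_g = np.mean([cross_entropy(C_spec[i](G(x)), y) for i, x, y in batches])
    # Domain-specific extractor feeding the general classifier: trains C.
    loss_c = np.mean([cross_entropy(C(G_spec[i](x)), y) for i, x, y in batches])
    return float(loss_g), float(loss_c)


def make_linear(rng, n_in, n_out):
    """Hypothetical stand-in for a network: a fixed random linear map."""
    w = rng.normal(size=(n_in, n_out))
    return lambda x: x @ w


rng = np.random.default_rng(0)
G, C = make_linear(rng, 6, 4), make_linear(rng, 4, 3)
G_spec = [make_linear(rng, 6, 4) for _ in range(2)]
C_spec = [make_linear(rng, 4, 3) for _ in range(2)]
batches = [(0, rng.normal(size=(5, 6)), np.array([0, 1, 2, 0, 1])),
           (1, rng.normal(size=(5, 6)), np.array([2, 2, 0, 1, 0]))]
l_et_g, l_et_c = episodic_loss(G, C, G_spec, C_spec, batches)
l_et = l_et_g + l_et_c
```

In an actual training loop the domain-specific modules would be held fixed while these two terms update the general modules, so that the general feature extractor and classifier must adapt to partners they were not trained with.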

2.3. Diagnosis Procedures

In EODG, supervised learning, domain-invariant representation learning and episodic training are used for model training. During the training procedure, the losses for the general feature extractor

G (\cdot)

, the general fault classifier

C (\cdot)

, the domain feature extractor

G_{d} (\cdot)

, the domain classifier

D (\cdot)

, domain-specific feature extractors

G_{i}^{s} (\cdot)

and domain-specific fault classifiers

C_{i}^{s} (\cdot)

are defined as follows:

L_{G} = L_{S} + α L_{F O} + β L_{F T} + λ L_{E T, G}

(7)

L_{C} = L_{S} + λ L_{E T, C}

(8)

L_{G_{d}} = L_{S D} + α L_{F O}

(9)

L_{D} = L_{S D}

(10)

L_{G_{i}^{s}} = L_{S i}, (i = 1, 2, \dots, n_{s})

(11)

L_{C_{i}^{s}} = L_{S i}, (i = 1, 2, \dots, n_{s})

(12)

where

α

,

β

, and

λ

are tradeoff parameters.
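
As a compact restatement of Equations (7)–(12), the per-module objectives can be assembled from the individual loss values as in the sketch below. The function name and all numeric values are placeholders chosen only for illustration.

```python
from typing import Dict, List, Union


def module_losses(l_s: float, l_sd: float, l_si: List[float],
                  l_fo: float, l_ft: float, l_et_g: float, l_et_c: float,
                  alpha: float, beta: float, lam: float
                  ) -> Dict[str, Union[float, List[float]]]:
    """Combine the individual losses into one objective per module."""
    return {
        "G": l_s + alpha * l_fo + beta * l_ft + lam * l_et_g,  # Eq. (7)
        "C": l_s + lam * l_et_c,                               # Eq. (8)
        "G_d": l_sd + alpha * l_fo,                            # Eq. (9)
        "D": l_sd,                                             # Eq. (10)
        "G_i": list(l_si),                                     # Eq. (11), one per domain
        "C_i": list(l_si),                                     # Eq. (12), one per domain
    }


# Placeholder loss values and tradeoff parameters, for illustration only.
objectives = module_losses(l_s=0.5, l_sd=0.2, l_si=[0.3, 0.4],
                           l_fo=0.1, l_ft=0.25, l_et_g=0.6, l_et_c=0.7,
                           alpha=1.0, beta=4.0, lam=1.0)
```

Note that the feature orthogonalization term appears in the objectives of both feature extractors, whereas the feature transfer term affects only the general feature extractor.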

The procedures of the proposed EODG method are presented in Figure 3, and summarized as follows:

Step 1: Collect vibration signals from rotating machinery and partition them into labeled source-domain signals for model training and unseen target-domain signals for model evaluation. Then, segment and standardize these signals to form labeled source-domain datasets

{\{D_{i}^{s}\}}_{i = 1}^{n_{s}}

and testing datasets

{\{D_{i}^{t}\}}_{i = 1}^{n_{t}}

.

Step 2: Construct a general feature extractor

G (\cdot)

, a general fault classifier

C (\cdot)

, a domain feature extractor

G_{d} (\cdot)

, a domain classifier

D (\cdot)

, domain-specific feature extractors

{\{G_{i}^{s}\}}_{i = 1}^{n_{s}}

and domain-specific fault classifiers

{\{C_{i}^{s}\}}_{i = 1}^{n_{s}}

. Pre-train these modules via supervised learning.

Step 3: Sample a batch of training data

{\{B_{i}^{s}\}}_{i = 1}^{n_{s}}

from

{\{D_{i}^{s}\}}_{i = 1}^{n_{s}}

. Train

{\{G_{i}^{s}\}}_{i = 1}^{n_{s}}

and

{\{C_{i}^{s}\}}_{i = 1}^{n_{s}}

. Then, freeze the parameters of

{\{G_{i}^{s}\}}_{i = 1}^{n_{s}}

and

{\{C_{i}^{s}\}}_{i = 1}^{n_{s}}

, and train

G (\cdot)

,

C (\cdot)

,

G_{d} (\cdot)

and

D (\cdot)

.

Step 4: Repeat Step 3 until the labeled source-domain dataset

{\{D_{i}^{s}\}}_{i = 1}^{n_{s}}

is traversed.

Step 5: Repeat Steps 3 and 4 until the preset maximum number of epochs is reached.
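
The ordering of updates in Steps 3 to 5 can be sketched as a two-phase loop. The skeleton below records only the schedule; the two callbacks are hypothetical placeholders for the actual parameter updates, and pre-training (Step 2) is assumed to have been done beforehand.

```python
from typing import Callable, List, Sequence, Tuple


def run_schedule(batches: Sequence[object],
                 max_epochs: int,
                 train_specific: Callable[[object], None],
                 train_general: Callable[[object], None]
                 ) -> List[Tuple[int, int, str]]:
    """Run the update order of Steps 3-5 and return a log of
    (epoch, batch index, phase) entries."""
    log: List[Tuple[int, int, str]] = []
    for epoch in range(max_epochs):           # Step 5: preset number of epochs
        for b, batch in enumerate(batches):   # Step 4: traverse the source data
            train_specific(batch)             # Step 3: update G_i^s and C_i^s
            log.append((epoch, b, "specific"))
            train_general(batch)              # Step 3: G_i^s, C_i^s frozen;
            log.append((epoch, b, "general"))  # update G, C, G_d and D
    return log


calls = []
log = run_schedule(batches=["B1", "B2"], max_epochs=2,
                   train_specific=lambda batch: calls.append(("specific", batch)),
                   train_general=lambda batch: calls.append(("general", batch)))
```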

3. Experimental Study

3.1. Datasets Description

In this study, a well-known public HUST bearing dataset and a CNC bearing dataset are used for the verification of the proposed method. Detailed descriptions of these datasets are provided below.

3.1.1. Huazhong University of Science and Technology (HUST) Bearing Dataset

The HUST bearing dataset was provided by Zhao et al. [34]. The test rig of this dataset is shown in Figure 4, and it consists of the following: 1. speed control; 2. a motor; 3. a shaft; 4. an accelerometer; 5. a bearing; and 6. a data acquisition board.

This dataset contains a normal state and four types of failure state, and each failure state has two severity levels. Vibration signals are collected at a sampling rate of 25.6 kHz. The data on the five types of health state (normal, inner-race fault, outer-race fault, ball fault, and inner- and outer-race combination fault) under six rotating speeds (20 Hz, 25 Hz, 30 Hz, 35 Hz, 40 Hz, and varying speeds (0–40–0 Hz)) are selected to evaluate the proposed method. The details of the HUST bearing dataset are listed in Table 1.

The test bearing used in the HUST dataset is a deep groove ball bearing, Rexnord ER16K, and its detailed specifications are listed in Table 2.

3.1.2. CNC Bearing Dataset

The test rig of the CNC bearing dataset is shown in Figure 5. The spindle of CNC is supported by four rolling bearings, and four types of faults (inner-race fault, outer-race fault, and cage fault) are introduced to the third bearing (marked in red). The experimental setup involved cutting aluminum materials under seven spindle speeds, with a feed rate of 2500 mm/min, a cutting depth of 0.1 mm, and a cutting width of 3 mm. Vibration data were acquired using an accelerometer mounted on the bearing housing with a sampling rate of 25.6 kHz. The details of the CNC bearing dataset are listed in Table 3.

The test bearing used in the CNC dataset is an angular contact ball bearing, NSK 40BNR10, and its detailed specifications are listed in Table 4.

3.2. Implementation Details

3.2.1. Network Structure and Hyperparameters

The network structures of the general feature extractor

G (\cdot)

, the general fault classifier

C (\cdot)

, the domain feature extractor

G_{d} (\cdot)

, the domain classifier

D (\cdot)

, domain-specific feature extractors

G_{i}^{s} (\cdot)

and domain-specific fault classifiers

C_{i}^{s} (\cdot)

are listed in Table 5. The structure of ResBlock is shown in Figure 6, where Ch represents the output channel, and W represents the convolutional kernel size.

As can be seen from Table 5,

G (\cdot)

,

G_{d} (\cdot)

, and

G_{i}^{s} (\cdot)

share the same network structure. The distinction between

C (\cdot)

,

D (\cdot)

, and

C_{i}^{s} (\cdot)

lies in their output layers, where the output sizes of

C (\cdot)

and

C_{i}^{s} (\cdot)

are determined by the number of classes

n_{c}

, and the output size of

D (\cdot)

is determined by the number of source domains

n_{s}

.

In this study, the negative slope of Leaky_ReLU is set as 0.1, and the Adam optimizer is used for training with a learning rate of 0.004. The model is trained using a batch size of 64 for 128 epochs. The tradeoff parameters are set as follows:

α = 1

,

β = 4

, and

λ = 4

.

3.2.2. Experimental Setting

To evaluate the effectiveness of the proposed method, the HUST bearing dataset and the CNC bearing dataset are used for multiple-source-domain generalization fault diagnosis experiments; the details of the experimental settings are listed in Table 6 and Table 7, respectively. In this paper, the domain shifts between source and target domains arise from variations in rotating speed, to which the fault characteristic frequencies are directly proportional. In the HUST bearing dataset, each category has 100 samples, with a sample length of 2048 data points. In the CNC bearing dataset, each category has 200 samples, with a sample length of 2048 data points.
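
The sample construction described above (fixed-length segments of 2048 points, standardized before training) can be sketched as follows. The sketch assumes non-overlapping windows and per-segment z-score standardization; the paper states neither detail, so both are assumptions. The 30 Hz test tone is synthetic.

```python
import numpy as np


def segment_and_standardize(signal: np.ndarray,
                            length: int = 2048,
                            n_samples: int = 100) -> np.ndarray:
    """Cut a 1-D signal into n_samples non-overlapping segments and
    standardize each segment to zero mean and unit variance."""
    needed = length * n_samples
    if signal.size < needed:
        raise ValueError(f"signal has {signal.size} points, {needed} required")
    segments = signal[:needed].reshape(n_samples, length)
    mean = segments.mean(axis=1, keepdims=True)
    std = segments.std(axis=1, keepdims=True)
    return (segments - mean) / (std + 1e-12)


# Synthetic stand-in for a vibration record: 8 s at 25.6 kHz = 204800 points.
fs = 25_600
t = np.arange(fs * 8) / fs
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 30 * t) + 0.1 * rng.normal(size=t.size)
samples = segment_and_standardize(signal)  # 100 samples of 2048 points
```

At a sampling rate of 25.6 kHz, one 2048-point sample spans 80 ms, so 100 non-overlapping samples require 8 s of signal per category under these assumptions.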

For each task, three speeds are randomly selected, and the corresponding domains are designated as source domains, whereas the remaining domains serve as target domains. To fulfill the requirements of a heterogeneous setting, the fault types differ across source domains. Specifically, all domains contain the normal state, since normal-state data are easy to acquire. In addition, each target fault type is required to appear in at least two different source domains so that consistent representations can be learned across domains. To verify the model's DGFD performance for each fault type, all target domains share the same fault types, which cover all fault types that occur in the source domains.

Datasets of each domain are identified by their working conditions and fault categories. In Table 6, the working condition is represented by the rotating speed in units of Hz, where 20 denotes 20 Hz, 25 denotes 25 Hz, and so on, and VS denotes varying speed (0–40–0 Hz). The definitions of the fault codes can be found in Table 1. In Table 7, the working condition is represented by the spindle rotation speed in units of rpm, where 6k denotes 6000 rpm, 7k denotes 7000 rpm, etc. The definitions of the fault codes are listed in Table 3. For example, the dataset of the first source domain (S1) of the first HUST task (H1) is represented by 20 (N IRF BF ComF), where 20 indicates that the test rig operates at a speed of 20 Hz, and (N IRF BF ComF) refers to the inclusion of four types of data (normal, inner-race fault, ball fault, and inner- and outer-race combination fault). For target domains, all datasets share the same fault categories and are represented jointly; for example, the target domains of H1 are represented by 35 40 VS (N IRF ORF BF ComF), where 35 40 VS indicates the rotating speeds of the three target domains, each containing five types of data (normal, inner-race fault, outer-race fault, ball fault, and inner- and outer-race combination fault).

3.3. Benchmarked Approaches

To evaluate the effectiveness of the proposed method, five methods are used for comparison.

(1) Convolutional Neural Networks (CNNs): All source-domain datasets are combined and used to train the model in a supervised learning manner.

(2) Conditional generative adversarial networks (CGANs) [21]: CGANs use a discriminator that can simultaneously classify the fault type and domain label for domain-adversarial training.

(3) Adversarial-Domain-Invariant Generalization (ADIG) [22]: In ADIG, the reshaped two-dimensional frequency spectrum is used as the input of the model, and adversarial learning and feature normalization strategies are leveraged to learn domain-invariant knowledge.

(4) Conditional Contrastive Domain Generalization (CCDG) [23]: CCDG uses mutual information to capture shareable fault information and learn domain-independent representations for rotary machine fault diagnosis in unseen domains.

(5) Causal Consistency Networks (CCNs) [24]: CCNs use the proposed causal consistency loss and collaborative training loss to learn consistent causality knowledge for bearing domain generalization fault diagnosis.

3.4. Results and Discussion

In this study, all experiments were implemented on an NVIDIA TITAN V GPU (Nvidia Corporation, Santa Clara, CA, USA) with the PyTorch 2.6.0 framework. The diagnostic accuracy, defined as the ratio of the number of correctly predicted test samples to the total number of test samples, is adopted as the evaluation metric. Each task is repeated ten times to reduce randomness.
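
The evaluation metric and the aggregation over repeated trials can be written out as a short sketch. The function names and example values are illustrative, and the use of the population standard deviation (`ddof=0`) is an assumption, since the paper does not state which estimator is reported.

```python
import numpy as np


def accuracy(predicted: np.ndarray, true: np.ndarray) -> float:
    """Correctly predicted test samples divided by all test samples."""
    return float((predicted == true).mean())


def summarize(trial_accuracies) -> tuple:
    """Mean and standard deviation over repeated trials (ddof=0 assumed)."""
    accs = np.asarray(trial_accuracies, dtype=float)
    return float(accs.mean()), float(accs.std())


# Illustrative values only, not results from the paper.
acc = accuracy(np.array([0, 1, 2, 2]), np.array([0, 1, 1, 2]))  # 3 of 4 correct
mean_acc, std_acc = summarize([0.8, 0.9])
```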

3.4.1. Experimental Results of HUST Bearing Dataset

The diagnosis results of the HUST bearing dataset are presented in Table 8, which includes the mean and standard deviation of the accuracies of ten trials for each task. Specifically, the final row (Avg.) shows the average performance across all tasks. To further illustrate these results, Figure 7 shows the accuracy curve of each target domain, along with the average accuracy curve of all target domains, while Figure 8 presents a histogram of the diagnostic accuracy and corresponding standard deviations.

CNNs achieve the lowest overall accuracy of 73.06%, which reveals the limitations of traditional supervised learning when applied to DGFD tasks. Among the DGFD methods, CGANs and ADIG are adversarial methods, and CCDG, CCNs and the proposed EODG method are non-adversarial methods. ADIG achieves the second highest overall accuracy of 83.74%, significantly outperforming CGANs, CCDG and CCNs. The superior performance of ADIG over that of CGANs indicates that incorporating frequency-spectrum inputs and feature normalization strategies can enhance generalization.

Among these methods, the proposed EODG method achieves the highest overall accuracy of 87.47% and the best accuracy on nearly all tasks. The only exception is Task H2, where its accuracy (80.65%) is marginally lower, by 0.55 percentage points, than that of the best-performing method (ADIG, 81.20%). As illustrated in Figure 7, EODG demonstrates consistently strong performance across individual target domains. Figure 8 shows that EODG achieves the best overall performance with relatively small standard deviations.

In summary, the results on the HUST bearing dataset demonstrate the effectiveness and superiority of the proposed EODG method in DGFD tasks. The combination of episodic training (EPI), the feature transfer (FT) constraint and the feature orthogonalization (FO) constraint can significantly improve the generalization of the intelligent fault diagnosis model.

3.4.2. Experimental Results of CNC Bearing Dataset

The results for the CNC bearing dataset are shown in Table 9 and follow a pattern similar to those for the HUST bearing dataset. CNN achieves the lowest overall accuracy of 70.84%. The overall accuracies of CGANs, CCDG and CCNs are close to one another (72.41%, 74.14% and 75.05%, respectively), all slightly higher than that of CNN. ADIG achieves the second highest overall accuracy of 80.88% and, in some tasks, is more accurate than the proposed EODG method.

EODG achieves the highest overall accuracy of 85.56%, significantly outperforming the other methods. It achieves the highest accuracy in six of the nine individual tasks; in the remaining tasks (C3, C5 and C6), its accuracy falls short of that of the best-performing method, ADIG, by no more than 1.5 percentage points. Figure 9 and Figure 10 show the accuracy curves and histogram of the experimental results, respectively. In these figures, the EODG method shows the best diagnosis performance in most individual target domains and the best overall performance with relatively small standard deviations, demonstrating the effectiveness, superiority and robustness of the proposed method.
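
The per-task comparison can be reproduced directly from the mean accuracies in Table 9. The sketch below is illustrative only (the helper name `shortfalls` is hypothetical); it lists every task in which the proposed method is not the most accurate, together with the best method and the gap in percentage points.

```python
METHODS = ["CNN", "CGANs", "ADIG", "CCDG", "CCN", "Proposed"]

# Mean accuracies (%) copied from Table 9, in the column order of METHODS.
TABLE9 = {
    "C1": [65.18, 67.57, 77.60, 68.65, 73.47, 86.38],
    "C2": [66.77, 59.88, 73.69, 66.44, 76.55, 80.52],
    "C3": [67.64, 73.70, 80.30, 73.19, 73.18, 80.03],
    "C4": [74.80, 80.49, 79.43, 74.66, 82.10, 89.45],
    "C5": [68.22, 79.68, 87.95, 73.87, 68.96, 87.18],
    "C6": [73.42, 75.45, 87.72, 83.27, 73.40, 86.22],
    "C7": [85.95, 69.76, 86.56, 84.89, 80.38, 94.67],
    "C8": [72.07, 79.33, 78.49, 75.83, 74.22, 84.73],
    "C9": [63.51, 65.84, 76.19, 66.43, 73.16, 80.87],
}


def shortfalls(table, methods, ours="Proposed"):
    """Map each task where `ours` is not best to (best method, gap in points)."""
    idx = methods.index(ours)
    result = {}
    for task, accs in table.items():
        best = max(accs)
        if accs[idx] < best:
            result[task] = (methods[accs.index(best)], round(best - accs[idx], 2))
    return result


gaps = shortfalls(TABLE9, METHODS)
for task, (method, gap) in gaps.items():
    print(f"{task}: best is {method}, EODG trails by {gap:.2f} points")
```

Only the mean accuracies are compared here; given the standard deviations reported in Table 9, gaps of this size are of the same order as the trial-to-trial variation.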

3.5. Feature Visualization

To further evaluate the effectiveness of the proposed method, t-distributed stochastic neighbor embedding (t-SNE) [35] is used for feature visualization. Because all methods except CNN use the output features of the feature extractor for domain-invariant representation learning, the visualization is conducted on these features. Task C7 is selected for the visualization.

Figure 11 presents the results of feature visualization, where each legend entry consists of the fault label and the domain label. For example, “IRF(T)” denotes the inner-race fault samples from the target domain. It can be seen from Figure 11 that CNN, CGANs, and CCNs exhibit category confusion in the extracted features. ADIG exhibits good inter-class discrimination, with almost no category confusion. However, its inter-domain integration is poor, as few target-domain samples are integrated with the corresponding source-domain samples of the same category. CCDG shows good inter-class discrimination and better inter-domain integration than ADIG. Among these methods, EODG exhibits the best domain-invariant feature extraction ability, with clear inter-class boundaries and excellent inter-domain integration: the target-domain samples are clustered near the corresponding source-domain samples of the same category.

The feature visualization results demonstrate that EODG can effectively extract domain-invariant features, even in the presence of unseen working conditions.

3.6. Ablation Study

In this section, the influences of FO, EPI, and FT on the performance of the model are analyzed. Six variants of EODG are used for the ablation study: (1) FO: the model is trained with supervised learning and feature orthogonalization constraint; (2) EPI: the model is trained with supervised learning and episodic training; (3) FT: the model is trained with supervised learning and feature transfer constraint; (4) EPI + FO: the model is trained with supervised learning, episodic training, and feature orthogonalization constraint; (5) FT + FO: the model is trained with supervised learning, feature transfer constraint, and feature orthogonalization constraint; (6) FT + EPI: the model is trained with supervised learning, feature transfer constraint, and episodic training.
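
One way to see that the six variants cover the design space is to enumerate every non-empty subset of the three components. The sketch below is illustrative; it assumes, as in the descriptions above, that supervised learning is always active and that only FO, EPI and FT are switched on or off.

```python
from itertools import combinations

COMPONENTS = ("FO", "EPI", "FT")  # the three optional training components


def all_configurations(components):
    """Every non-empty subset of components; supervised learning is always on."""
    configs = []
    for size in range(1, len(components) + 1):
        configs.extend(frozenset(c) for c in combinations(components, size))
    return configs


configs = all_configurations(COMPONENTS)
full_model = frozenset(COMPONENTS)  # all three components together, i.e., EODG
ablation_variants = [c for c in configs if c != full_model]

for variant in ablation_variants:
    print(" + ".join(sorted(variant)))
print("full model:", " + ".join(sorted(full_model)))
```

Three components give seven non-empty subsets: the six ablation variants plus the full EODG model, so every combination of components is evaluated.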

The results of the ablation study are listed in Table 10 and shown in Figure 12. Among the single-component variants, the contribution ranking is FT (78.47%) > EPI (76.76%) > FO (74.70%). The combination of FT and EPI achieves the second highest overall accuracy of 79.45%, which is higher than that of the individual EPI and FT variants. EPI + FO (76.99%) also achieves a higher overall accuracy than either EPI or FO alone. FT + FO (76.34%) performs slightly better than FO, but slightly worse than FT. EODG, which combines FO, EPI and FT, outperforms all six variants in every individual task, which demonstrates the effectiveness of this combination.
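
The ranking can be checked against the Avg. row of Table 10. The snippet below is a small illustrative calculation with hypothetical variable names; it sorts the configurations by overall accuracy and reports the margin between the full model and the best ablation variant.

```python
# Overall accuracies (%) copied from the Avg. row of Table 10.
AVG = {
    "FO": 74.70, "EPI": 76.76, "FT": 78.47,
    "EPI + FO": 76.99, "FT + FO": 76.34, "FT + EPI": 79.45,
    "Proposed": 85.56,
}

# All configurations, best first.
ranking = sorted(AVG, key=AVG.get, reverse=True)

# Single-component variants only, best first.
single = [name for name in ranking if "+" not in name and name != "Proposed"]

# Margin (percentage points) of the full model over the best ablation variant.
margin = round(AVG["Proposed"] - AVG[ranking[1]], 2)

print("overall ranking:", " > ".join(ranking))
print("single components:", " > ".join(single))
print(f"full model leads the best variant by {margin:.2f} points")
```

The margin of the full model over FT + EPI is larger than the difference between any two ablation variants' overall accuracies, which is in line with the observation that the three components work best in combination.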

In Figure 12, EODG shows the highest accuracy in every task and, according to Table 10, the smallest standard deviation in four of the nine tasks (C1, C3, C5 and C6). The results of the ablation study demonstrate that FO, EPI and FT can effectively improve the generalization of the DGFD model. Furthermore, FT gives the model the basic ability to perform DGFD by learning domain-invariant representations, while the addition of EPI and FO further boosts the DGFD performance of the intelligent fault diagnosis model. In industrial practice, rotating machinery is often required to operate under varying working conditions. The proposed EODG method can train a DGFD model that mitigates the performance degradation caused by changes in working conditions, thereby broadening the applicability of intelligent fault diagnosis methods in industry.

4. Conclusions

This paper proposed an EODG method for domain generalization fault diagnosis tasks, which aims to improve the performance of intelligent fault diagnosis models under unseen working conditions. In EODG, the proposed feature transfer loss gives the model the basic ability to perform domain generalization fault diagnosis, whereas the combination of episodic training and feature orthogonality further improves the generalization of both the general feature extractor and the general fault classifier. The model is trained using only the labeled data from source domains, and the samples from unseen target domains are used to evaluate the domain generalization fault diagnosis performance. Two bearing fault diagnosis datasets were used to evaluate EODG; on both, EODG achieved the highest overall accuracy and the highest accuracy in most individual tasks. The results of the comparative study, the feature visualization, and the ablation study demonstrate that EODG can significantly improve the performance of the intelligent fault diagnosis model under unseen working conditions by taking advantage of the feature orthogonalization constraint, episodic training and the feature transfer constraint.

Author Contributions

Conceptualization, Y.L. (Yixiao Liao) and S.Z.; methodology, Y.L. (Yixiao Liao); software, J.L.; validation, L.Z. and C.L.; data curation, J.L.; writing—original draft preparation, Y.L. (Yixiao Liao) and K.P.; writing—review and editing, Y.L. (Yisen Liu); supervision, S.Z.; project administration, S.Z. and Y.L. (Yixiao Liao); funding acquisition, Y.L. (Yixiao Liao) and S.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the GDAS’ Project of Science and Technology Development (grant number 2022 GDASZH 2022010108).

Data Availability Statement

The data used in this study are available on request to the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Misbah, I.; Lee, C.K.M.; Keung, K.L. Fault diagnosis in rotating machines based on transfer learning: Literature review. Knowl.-Based Syst. 2024, 283, 111158.
2. Lin, H.; Huang, X.; Chen, Z.; He, G.; Xi, C.; Li, W. Matching pursuit network: An interpretable sparse time–frequency representation method toward mechanical fault diagnosis. IEEE Trans. Neural Netw. Learn. Syst. 2024, 1–12.
3. Yang, X.; He, G.; Ding, K.; Li, Y.; Ding, X.; Li, W. A novel optimization demodulation method for gear fault vibration overmodulation signal and its application to fault diagnosis. IEEE Trans. Instrum. Meas. 2023, 72, 3517812.
4. Tang, S.; Ma, J.; Yan, Z.; Zhu, Y.; Khoo, B.C. Deep transfer learning strategy in intelligent fault diagnosis of rotating machinery. Eng. Appl. Artif. Intell. 2024, 134, 108678.
5. Li, J.; Yue, K.; Chen, Z.; Xia, J.; Li, W.; Zhang, X. An Uncertainty-Aware Continual Learning Framework for Fault Diagnosis of Rotating Machinery With Homogeneous-Heterogeneous Faults. IEEE Trans. Autom. Sci. Eng. 2024, 1–15.
6. Yuan, B.; Lei, L.; Chen, S. Optimized Variational Mode Decomposition and Convolutional Block Attention Module-Enhanced Hybrid Network for Bearing Fault Diagnosis. Machines 2025, 13, 320.
7. Wang, S.; Zhang, X.; Ma, T.; Kong, Y.; Gao, S.; Han, Q. Symmetrical Triboelectric In Situ Self-Powered Sensing and Fault Diagnosis for Double-Row Tapered Roller Bearings in Wind Turbines: An Integrated and Real-Time Approach. Adv. Sci. 2025, 12, 2500981.
8. Ren, Z.; Lin, T.; Feng, K.; Zhu, Y.; Liu, Z.; Yan, K. A systematic review on imbalanced learning methods in intelligent fault diagnosis. IEEE Trans. Instrum. Meas. 2023, 72, 3508535.
9. Dai, J.; Tian, L.; Chang, H. An Intelligent Diagnostic Method for Wear Depth of Sliding Bearings Based on MGCNN. Machines 2024, 12, 266.
10. Qian, Q.; Zhang, B.; Li, C.; Mao, Y.; Qin, Y. Federated transfer learning for machinery fault diagnosis: A comprehensive review of technique and application. Mech. Syst. Signal Process. 2025, 223, 111837.
11. Wang, J.; Yang, S.; Liu, Y.; Wen, G. Deep subdomain transfer learning with spatial attention ConvLSTM network for fault diagnosis of wheelset bearing in high-speed trains. Machines 2023, 11, 304.
12. Xiao, Y.; Shao, H.; Yan, S.; Wang, J.; Peng, Y.; Liu, B. Domain generalization for rotating machinery fault diagnosis: A survey. Adv. Eng. Inform. 2025, 64, 103063.
13. Davoodabadi, A.; Behzad, M.; Arghand, H.A.; Mohammadi, S.; Gelman, L. Intelligent Diagnosis of Rolling Element Bearings Under Various Operating Conditions Using an Enhanced Envelope Technique and Transfer Learning. Machines 2025, 13, 351.
14. Ma, S.; Leng, J.; Zheng, P.; Chen, Z.; Li, B.; Li, W.; Liu, Q.; Chen, X. A digital twin-assisted deep transfer learning method towards intelligent thermal error modeling of electric spindles. J. Intell. Manuf. 2025, 36, 1659–1688.
15. Tian, J.; Han, D.; Li, M.; Shi, P. A multi-source information transfer learning method with subdomain adaptation for cross-domain fault diagnosis. Knowl.-Based Syst. 2022, 243, 108466.
16. Qian, Q.; Qin, Y.; Luo, J.; Wang, Y.; Wu, F. Deep discriminative transfer learning network for cross-machine fault diagnosis. Mech. Syst. Signal Process. 2023, 186, 109884.
17. Huo, C.; Jiang, Q.; Shen, Y.; Zhu, Q.; Zhang, Q. Enhanced transfer learning method for rolling bearing fault diagnosis based on linear superposition network. Eng. Appl. Artif. Intell. 2023, 121, 105970.
18. Li, J.; Zhang, X.; Yue, K.; Chen, J.; Chen, Z.; Li, W. An auto-regulated universal domain adaptation network for uncertain diagnostic scenarios of rotating machinery. Expert Syst. Appl. 2024, 249, 123836.
19. Xia, J.; Huang, R.; Chen, Z.; He, G.; Li, W. A novel digital twin-driven approach based on physical-virtual data fusion for gearbox fault diagnosis. Reliab. Eng. Syst. Saf. 2023, 240, 109542.
20. Li, X.; Zhang, W.; Ma, H.; Luo, Z.; Li, X. Domain generalization in rotating machinery fault diagnostics using deep neural networks. Neurocomputing 2020, 403, 409–420.
21. Zhang, Q.; Zhao, Z.; Zhang, X.; Liu, Y.; Sun, C.; Li, M.; Wang, S.; Chen, X. Conditional adversarial domain generalization with a single discriminator for bearing fault diagnosis. IEEE Trans. Autom. Sci. Eng. 2021, 70, 3514515.
22. Chen, L.; Li, Q.; Shen, C.; Zhu, J.; Wang, D.; Xia, M. Adversarial domain-invariant generalization: A generic domain-regressive framework for bearing fault diagnosis under unseen conditions. IEEE Trans. Ind. Inform. 2021, 18, 1790–1800.
23. Ragab, M.; Chen, Z.; Zhang, W.; Eldele, E.; Wu, M.; Kwoh, C.-K.; Li, X. Conditional contrastive domain generalization for fault diagnosis. IEEE Trans. Autom. Sci. Eng. 2022, 71, 3506912.
24. Li, J.; Wang, Y.; Zi, Y.; Zhang, H.; Li, C. Causal consistency network: A collaborative multimachine generalization method for bearing fault diagnosis. IEEE Trans. Ind. Inform. 2022, 19, 5915–5924.
25. Jia, L.; Chow, T.W.S.; Yuan, Y. Causal disentanglement domain generalization for time-series signal fault diagnosis. Neural Netw. 2024, 172, 106099.
26. Zhao, C.; Shen, W. Imbalanced domain generalization via semantic-discriminative augmentation for intelligent fault diagnosis. Adv. Eng. Inform. 2024, 59, 102262.
27. Zhu, Q.; Liu, H.; Bao, C.; Zhu, J.; Mao, X.; He, S.; Peng, F. Decoupled interpretable robust domain generalization networks: A fault diagnosis approach across bearings, working conditions, and artificial-to-real scenarios. Adv. Eng. Inform. 2024, 61, 102445.
28. Pang, S. Stacked maximum independence autoencoders: A domain generalization approach for fault diagnosis under various working conditions. Mech. Syst. Signal Process. 2024, 208, 111035.
29. Xu, M.; Zhang, Y.; Lu, B.; Liu, Z.; Sun, Q. A novel domain-private-suppress meta-recognition network based universal domain generalization for machinery fault diagnosis. Knowl.-Based Syst. 2025, 309, 112775.
30. Lu, F.; Tong, Q.; Jiang, X.; Du, X.; Xu, J.; Huo, J. Prior knowledge embedding convolutional autoencoder: A single-source domain generalized fault diagnosis framework under small samples. Comput. Ind. 2025, 164, 104169.
31. Xiao, Y.; Shao, H.; Wang, J.; Yan, S.; Liu, B. Bayesian variational transformer: A generalizable model for rotating machinery fault diagnosis. Mech. Syst. Signal Process. 2024, 207, 110936.
32. Gretton, A.; Borgwardt, K.; Rasch, M.; Schölkopf, B.; Smola, A.J. A kernel method for the two-sample-problem. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 4–7 December 2006.
33. Li, D.; Zhang, J.; Yang, Y.; Liu, C.; Song, Y.-Z.; Hospedales, T. Episodic training for domain generalization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019.
34. Zhao, C.; Zio, E.; Shen, W. Domain generalization for cross-domain fault diagnosis: An application-oriented perspective and a benchmark study. Reliab. Eng. Syst. Saf. 2024, 245, 109964.
35. Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.

Figure 1. Overview of EODG.

Figure 2. The schematic diagram of episodic training.

Figure 3. The diagnosis procedures of EODG.

Figure 4. Test rig of HUST bearing dataset.

Figure 5. Test rig of CNC bearing dataset. (a) CNC picture. (b) Structure diagram.

Figure 6. Structure of ResBlock.

Figure 7. Accuracy curves (%) of HUST bearing dataset. (a) Accuracy curve of Target Domain 1. (b) Accuracy curve of Target Domain 2. (c) Accuracy curve of Target Domain 3. (d) Average accuracy curve of all target domains.

Figure 8. Accuracy histogram (%) of HUST bearing dataset.

Figure 9. Accuracy curves (%) of CNC bearing dataset. (a) Accuracy curve of Target Domain 1. (b) Accuracy curve of Target Domain 2. (c) Accuracy curve of Target Domain 3. (d) Accuracy curve of Target Domain 4. (e) Average accuracy curve of all target domains.

Figure 10. Accuracy histogram (%) of CNC bearing dataset.

Figure 11. Feature visualization of task C7.

Figure 12. Accuracy histogram (%) of CNC bearing dataset ablation experiment.

Table 1. Details of HUST bearing dataset.

Label	Bearing State	Mark	Rotating Speeds (Hz)
1	Normal	N	20, 25, 30, 35, 40, and varying speed (0-40-0)
2	Inner-race fault	IRF
3	Outer-race fault	ORF
4	Ball Fault	BF
5	Inner- and outer-race fault	ComF

Table 2. Detailed specifications of HUST bearing.

Pitch Diameter/mm	Ball Diameter/mm	Number of Balls	Contact Angle/°
38.52	7.94	9	0
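
Geometric specifications such as those in Table 2 are the usual inputs to the standard kinematic formulas for bearing fault characteristic frequencies. The sketch below is included only as an illustration of how the table values can be used; it is not part of the EODG method. It assumes a stationary outer race, an inner race rotating with the shaft, and no slip, and the function name is hypothetical.

```python
import math


def bearing_fault_frequencies(shaft_hz, pitch_mm, ball_mm, n_balls, contact_deg):
    """Kinematic fault frequencies (Hz), assuming a rotating inner race and no slip."""
    ratio = (ball_mm / pitch_mm) * math.cos(math.radians(contact_deg))
    return {
        # Ball pass frequency, outer race (outer-race defect).
        "BPFO": 0.5 * n_balls * shaft_hz * (1 - ratio),
        # Ball pass frequency, inner race (inner-race defect).
        "BPFI": 0.5 * n_balls * shaft_hz * (1 + ratio),
        # Ball spin frequency (rolling-element defect).
        "BSF": 0.5 * (pitch_mm / ball_mm) * shaft_hz * (1 - ratio ** 2),
        # Fundamental train (cage) frequency.
        "FTF": 0.5 * shaft_hz * (1 - ratio),
    }


# HUST bearing geometry from Table 2, evaluated at a 20 Hz rotating speed.
freqs = bearing_fault_frequencies(
    shaft_hz=20.0, pitch_mm=38.52, ball_mm=7.94, n_balls=9, contact_deg=0.0
)
for name, value in freqs.items():
    print(f"{name}: {value:.2f} Hz")
```

Because these frequencies scale with the rotating speed, the same fault produces different spectral signatures under different working conditions, which is one reason why models trained at some speeds may degrade at others.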

Table 3. Details of CNC bearing dataset.

Label	Bearing State	Mark	Working Condition	Spindle Speeds (rpm)
1	Normal	N	Aluminum cutting	6000, 7000, 8000, 9000, 10,000, 11,000, 12,000
2	Inner-race fault	IRF
3	Outer-race fault	ORF
4	Cage Fault	CF

Table 4. Detailed specifications of CNC bearing.

Pitch Diameter/mm	Ball Diameter/mm	Number of Balls	Contact Angle/°
54	5.6	22	18

Table 5. Network structures.

Modules	Layer Type	Activation Function	Kernel Size	Output
$G(\cdot)$, $G_{d}(\cdot)$, $G_{i}^{s}(\cdot)$	Input	/	/	(1, 2048)
	Conv1	ReLU	16 × 129	(16, 1920)
	MaxPooling1	/	8	(16, 240)
	ResBlock1	ReLU	32 × 5	(32, 240)
	MaxPooling2	/	4	(32, 60)
	ResBlock2	ReLU	32 × 5	(32, 60)
	MaxPooling3	/	4	(32, 15)
	Conv2	ReLU	64 × 15	(64, 1)
	Flatten	/	/	(64)
$C(\cdot)$, $D(\cdot)$, $C_{i}^{s}(\cdot)$	Linear1	Leaky_ReLU	32	(32)
	Linear2	Leaky_ReLU	16	(16)
	Linear3 (Output)	SoftMax	$n_{c}$ / $n_{s}$	($n_{c}$ / $n_{s}$)
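
The output sizes in Table 5 can be reproduced with the usual output-length formula for one-dimensional convolution and pooling. The sketch below makes several assumptions that are inferred from the Output column rather than stated in the table: Conv1 and Conv2 use stride 1 without padding, each pooling layer uses a stride equal to its kernel size, and each ResBlock preserves the sequence length (modeled here as a single kernel-5 stage with padding 2).

```python
def out_length(length, kernel, stride=1, padding=0):
    """Output length of a 1-D convolution or pooling layer."""
    return (length + 2 * padding - kernel) // stride + 1


# (name, output channels, kernel, stride, padding); stride and padding are assumed.
LAYERS = [
    ("Conv1", 16, 129, 1, 0),
    ("MaxPooling1", 16, 8, 8, 0),
    ("ResBlock1", 32, 5, 1, 2),
    ("MaxPooling2", 32, 4, 4, 0),
    ("ResBlock2", 32, 5, 1, 2),
    ("MaxPooling3", 32, 4, 4, 0),
    ("Conv2", 64, 15, 1, 0),
]


def trace_shapes(input_length, layers):
    """Propagate the sequence length through the layers and record each output."""
    shapes = []
    length = input_length
    for name, channels, kernel, stride, padding in layers:
        length = out_length(length, kernel, stride, padding)
        shapes.append((name, channels, length))
    return shapes


shapes = trace_shapes(2048, LAYERS)
for name, channels, length in shapes:
    print(f"{name:12s} ({channels}, {length})")
```

Under these assumptions the final convolution reduces the sequence to a single position with 64 channels, which matches the 64-dimensional flattened feature in the table.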

Table 6. Experimental setting of HUST bearing dataset.

Task	Source Domain S1	Source Domain S2	Source Domain S3	Target Domains
H1	20 (N IRF BF ComF)	25 (N IRF ORF BF)	30 (N ORF BF ComF)	35 40 VS (N IRF ORF BF ComF)
H2	30 (N IRF BF ComF)	35 (N IRF ORF BF)	40 (N ORF BF ComF)	20 25 VS (N IRF ORF BF ComF)
H3	20 (N IRF BF ComF)	30 (N IRF ORF BF)	40 (N ORF BF ComF)	25 35 VS (N IRF ORF BF ComF)
H4	20 (N IRF BF ComF)	25 (N IRF ORF BF)	40 (N ORF BF ComF)	30 35 VS (N IRF ORF BF ComF)
H5	20 (N IRF BF ComF)	35 (N IRF ORF BF)	40 (N ORF BF ComF)	25 30 VS (N IRF ORF BF ComF)
H6	25 (N IRF BF ComF)	30 (N IRF ORF BF)	35 (N ORF BF ComF)	20 40 VS (N IRF ORF BF ComF)
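
The structure of these tasks can be made explicit with a small data structure. The sketch below encodes Task H1 from Table 6 (the dictionary layout and variable names are illustrative) and checks two properties that are visible in the table: each source domain lacks one fault class, while the three source domains together cover every class that appears in the target domains.

```python
# Fault classes of the HUST bearing dataset (marks from Table 1).
ALL_FAULTS = {"N", "IRF", "ORF", "BF", "ComF"}

# Task H1 from Table 6: rotating speed (Hz) -> fault classes available for training.
TASK_H1 = {
    "sources": {
        20: {"N", "IRF", "BF", "ComF"},
        25: {"N", "IRF", "ORF", "BF"},
        30: {"N", "ORF", "BF", "ComF"},
    },
    # Unseen working conditions; "VS" denotes the varying-speed condition.
    "targets": [35, 40, "VS"],
}

# Classes absent from each individual source domain.
missing = {speed: ALL_FAULTS - labels for speed, labels in TASK_H1["sources"].items()}

# Classes seen in at least one source domain.
covered = set().union(*TASK_H1["sources"].values())

for speed, absent in missing.items():
    print(f"source {speed} Hz lacks: {sorted(absent)}")
print("all target classes covered by the sources:", covered == ALL_FAULTS)
```

In this setting the model never sees a complete label set at any single working condition, so it has to combine class information across source domains before it is evaluated on the targets.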

Table 7. Experimental setting of CNC bearing dataset.

Task	Source Domain S1	Source Domain S2	Source Domain S3	Target Domains
C1	6k (N ORF CF)	7k (N IRF ORF)	8k (N IRF CF)	9k 10k 11k 12k (N IRF ORF CF)
C2	10k (N ORF CF)	11k (N IRF ORF)	12k (N IRF CF)	6k 7k 8k 9k (N IRF ORF CF)
C3	6k (N ORF CF)	9k (N IRF ORF)	12k (N IRF CF)	7k 8k 10k 11k (N IRF ORF CF)
C4	8k (N ORF CF)	9k (N IRF ORF)	10k (N IRF CF)	6k 7k 11k 12k (N IRF ORF CF)
C5	7k (N ORF CF)	9k (N IRF ORF)	11k (N IRF CF)	6k 8k 10k 12k (N IRF ORF CF)
C6	7k (N ORF CF)	8k (N IRF ORF)	9k (N IRF CF)	6k 10k 11k 12k (N IRF ORF CF)
C7	9k (N ORF CF)	10k (N IRF ORF)	11k (N IRF CF)	6k 7k 8k 12k (N IRF ORF CF)
C8	6k (N ORF CF)	7k (N IRF ORF)	12k (N IRF CF)	8k 9k 10k 11k (N IRF ORF CF)
C9	6k (N ORF CF)	11k (N IRF ORF)	12k (N IRF CF)	7k 8k 9k 10k (N IRF ORF CF)

Table 8. Diagnosis results (%) of HUST bearing dataset.

Tasks	CNN	CGANs	ADIG	CCDG	CCN	Proposed
H1	67.84 ± 3.08	66.89 ± 2.49	78.10 ± 5.04	66.71 ± 3.05	72.29 ± 2.48	81.91 ± 3.95
H2	56.46 ± 3.25	64.81 ± 4.44	81.20 ± 5.25	65.84 ± 5.25	69.19 ± 2.04	80.65 ± 3.12
H3	80.66 ± 3.01	84.28 ± 3.20	86.63 ± 3.52	82.45 ± 2.67	82.55 ± 1.22	94.47 ± 1.16
H4	78.56 ± 3.69	79.87 ± 2.79	85.70 ± 3.10	78.69 ± 2.53	71.53 ± 2.53	89.92 ± 2.57
H5	82.15 ± 1.87	83.78 ± 4.75	88.63 ± 3.70	81.26 ± 3.78	81.21 ± 0.76	95.57 ± 0.78
H6	69.55 ± 4.14	78.93 ± 1.59	82.17 ± 2.97	74.59 ± 4.38	72.56 ± 0.72	82.31 ± 2.76
Avg.	72.54 ± 8.98	76.43 ± 7.74	83.74 ± 3.58	74.92 ± 6.60	74.89 ± 5.08	87.47 ± 6.12

The highest accuracy of each row is marked in bold.

Table 9. Diagnosis results (%) of CNC bearing dataset.

Tasks	CNN	CGANs	ADIG	CCDG	CCN	Proposed
C1	65.18 ± 1.16	67.57 ± 0.42	77.60 ± 1.94	68.65 ± 4.67	73.47 ± 2.50	86.38 ± 0.61
C2	66.77 ± 2.70	59.88 ± 2.61	73.69 ± 3.11	66.44 ± 3.48	76.55 ± 3.29	80.52 ± 2.20
C3	67.64 ± 2.15	73.70 ± 2.68	80.30 ± 1.25	73.19 ± 2.45	73.18 ± 1.88	80.03 ± 0.90
C4	74.80 ± 3.12	80.49 ± 1.09	79.43 ± 0.16	74.66 ± 2.65	82.10 ± 3.58	89.45 ± 2.42
C5	68.22 ± 1.71	79.68 ± 3.79	87.95 ± 1.98	73.87 ± 0.94	68.96 ± 0.64	87.18 ± 1.83
C6	73.42 ± 2.96	75.45 ± 2.71	87.72 ± 0.52	83.27 ± 2.28	73.40 ± 4.81	86.22 ± 0.59
C7	85.95 ± 3.00	69.76 ± 2.57	86.56 ± 4.94	84.89 ± 1.78	80.38 ± 2.13	94.67 ± 1.06
C8	72.07 ± 1.05	79.33 ± 1.66	78.49 ± 1.37	75.83 ± 1.43	74.22 ± 1.69	84.73 ± 2.14
C9	63.51 ± 1.50	65.84 ± 3.01	76.19 ± 2.42	66.43 ± 2.58	73.16 ± 1.89	80.87 ± 3.32
Avg.	70.84 ± 6.43	72.41 ± 6.73	80.88 ± 4.96	74.14 ± 6.25	75.05 ± 3.81	85.56 ± 4.48

The highest accuracy of each row is marked in bold.

Table 10. Ablation experiment results (%) of CNC bearing dataset.

Tasks	FO	EPI	FT	EPI + FO	FT + FO	FT + EPI	Proposed
C1	66.94 ± 1.63	73.41 ± 2.71	79.24 ± 1.33	70.91 ± 0.94	79.42 ± 2.21	82.46 ± 1.91	86.38 ± 0.61
C2	64.76 ± 2.25	66.51 ± 1.83	71.54 ± 1.75	68.29 ± 2.36	64.82 ± 0.99	69.10 ± 4.94	80.52 ± 2.20
C3	73.31 ± 1.86	71.81 ± 2.34	71.56 ± 1.91	71.91 ± 1.38	70.22 ± 1.50	76.35 ± 1.37	80.03 ± 0.90
C4	83.02 ± 0.99	82.75 ± 1.39	78.18 ± 1.86	82.06 ± 3.89	76.30 ± 3.58	80.68 ± 2.23	89.45 ± 2.42
C5	73.03 ± 1.90	81.64 ± 2.47	77.68 ± 3.72	79.20 ± 5.63	76.09 ± 2.18	76.89 ± 2.90	87.18 ± 1.83
C6	78.17 ± 4.20	80.92 ± 0.87	83.30 ± 1.66	81.30 ± 1.17	81.87 ± 1.68	82.64 ± 1.37	86.22 ± 0.59
C7	89.23 ± 2.38	86.92 ± 0.97	92.61 ± 2.13	91.64 ± 2.60	90.53 ± 0.70	93.69 ± 2.09	94.67 ± 1.06
C8	77.38 ± 2.48	77.34 ± 2.16	80.92 ± 1.81	77.13 ± 1.02	79.08 ± 2.48	80.96 ± 1.19	84.73 ± 2.14
C9	66.49 ± 3.40	69.55 ± 2.60	71.24 ± 2.05	70.50 ± 1.22	68.75 ± 2.29	72.32 ± 3.22	80.87 ± 3.32
Avg.	74.70 ± 7.68	76.76 ± 6.44	78.47 ± 6.48	76.99 ± 7.03	76.34 ± 7.28	79.45 ± 6.67	85.56 ± 4.48

The highest accuracy of each row is marked in bold.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

