Article

Few-Shot Fault Diagnosis of Rotating Machinery Using Complex Convolution and Disentangled Representation Learning

1 CRRC Academy, Beijing 100071, China
2 State Key Laboratory of Rail Transit Vehicle System, Southwest Jiaotong University, Chengdu 610031, China
3 School of Electronic Engineering, Xidian University, Xi’an 710071, China
4 School of Mechanical Engineering, Southwest Jiaotong University, Chengdu 610031, China
5 CRRC Co., Ltd., Beijing 100036, China
6 CRRC Qingdao Sifang Co., Ltd., Qingdao 266111, China
7 School of Automation and Intelligence, Beijing Jiaotong University, Beijing 100044, China
* Author to whom correspondence should be addressed.
Machines 2026, 14(6), 655; https://doi.org/10.3390/machines14060655 (registering DOI)
Submission received: 2 May 2026 / Revised: 29 May 2026 / Accepted: 1 June 2026 / Published: 4 June 2026
(This article belongs to the Special Issue Intelligent Predictive Maintenance and Machine Condition Monitoring)

Abstract

Few-shot fault diagnosis is a challenging task in rotating machinery health monitoring because only limited labeled fault samples are available in practical industrial scenarios. Under such conditions, deep learning models are prone to overfitting and may fail to extract stable fault-sensitive features from vibration signals. Moreover, the weak fault-related components are usually coupled with operating-condition variations, background vibration, and environmental noise, which further degrades the discriminability and generalization ability of diagnostic models. To address these problems, this paper proposes a complex-valued disentangled representation learning network for few-shot fault diagnosis of rotating machinery. First, a direction-pair complex augmentation strategy is developed for triaxial vibration measurements. Two directional vibration components are selected and organized as the real and imaginary branches of a complex-valued input, which increases sample diversity under few-shot conditions. Then, a lightweight complex-valued convolution block is designed to model the coupled dynamic characteristics between different vibration directions and extract fault-sensitive representations. Furthermore, a dual-branch disentangled representation structure is developed to decompose the learned features into fault-sensitive representations and condition-related interference representations. To enhance the separability of fault embeddings under limited samples, a cosine-based disentangled representation loss is introduced, which improves intra-class compactness and inter-class discrimination while suppressing irrelevant interference information. Finally, a few-shot diagnosis strategy is constructed to identify fault categories with only a small number of labeled samples. 
Experimental results demonstrate that the proposed method consistently outperforms representative methods in terms of diagnostic accuracy, feature separability, and robustness, especially under extremely limited labeled samples.

1. Introduction

Rotating machinery is widely used in modern industrial systems, including rail transit systems, wind turbines, aero-engines, power generation equipment, and intelligent manufacturing equipment [1]. As key transmission and supporting components [2], bearings, gears, shafts, and other rotating elements usually operate under complex mechanical loads, varying speeds, and harsh operating environments. During long-term operation, these components are prone to fatigue, wear, pitting, cracking, lubrication degradation, and other localized defects or degradation phenomena [3]. If such faults are not detected promptly, they may cause performance degradation, unplanned downtime, cascading damage to adjacent components, and even serious safety accidents [4]. Therefore, accurate and timely fault diagnosis of rotating machinery is of great significance for ensuring operational reliability, reducing maintenance costs, and enhancing the safety of industrial equipment [5,6]. Vibration signal analysis has become one of the most widely used techniques for rotating machinery fault diagnosis because mechanical defects usually induce fault-related dynamic responses that can be captured by vibration sensors [7,8]. Localized defects in bearings or gears can generate fault-induced impulses and modulation components during rolling contact or gear meshing processes. These fault-related dynamic responses are subsequently manifested in the measured vibration signals [9]. Traditional fault diagnosis methods usually rely on handcrafted features extracted in the time, frequency, or time-frequency domains, such as time-domain statistical indicators, spectral features, envelope spectrum features, wavelet-based features, and empirical mode decomposition-based features [10]. These methods have achieved promising performance in specific diagnostic scenarios. 
However, their effectiveness often depends heavily on prior knowledge, signal preprocessing strategies, and expert experience, which limits their generalization and adaptability to complex industrial environments.
Driven by recent advances in deep learning, data-driven fault diagnosis methods have attracted increasing attention in rotating machinery health monitoring [11,12]. Through hierarchical nonlinear transformations, deep neural networks can automatically learn discriminative representations from raw vibration signals or transformed representations, such as spectral and time-frequency representations [13]. Compared with handcrafted feature-based methods, deep learning models alleviate the dependence on manual feature design and have shown strong representation learning capability in complex signal analysis tasks [14,15]. Among them, convolutional neural networks, recurrent neural networks, auto-encoders, and attention-based networks have been widely applied to rotating machinery fault diagnosis.
However, the promising performance of most deep diagnostic models usually relies on sufficient labeled training samples. The assumption of abundant labeled data is difficult to satisfy in practical industrial scenarios. Rotating machinery generally operates under normal conditions for most of its service life, whereas fault conditions occur infrequently. In particular, early-stage, severe, or specific fault types are rare, costly, and sometimes unsafe to reproduce in real equipment. As a result, the number of labeled fault samples is often far smaller than the number of normal samples, and some fault categories may contain only a few labeled samples [16,17]. Under limited-sample conditions, deep neural networks are prone to overfitting. This is because limited training data cannot fully characterize the intra-class variability and inter-class discrepancies of fault patterns. The learned features may be dominated by sample-specific patterns rather than intrinsic fault mechanisms. This problem becomes more severe when fault-related signal components are weak or when operating conditions vary. Consequently, diagnostic models trained with limited samples often suffer from unstable feature representations, reduced classification accuracy, and poor generalization to unseen working conditions. Therefore, few-shot fault diagnosis has become an important and challenging problem in intelligent maintenance of rotating machinery [18,19]. The key issue is how to learn robust and fault-discriminative representations from limited labeled samples [20]. An effective model should not only avoid overfitting but also enhance intra-class compactness and inter-class separability in the learned feature space [21,22]. This requirement motivates the development of representation learning, metric learning, and disentangled feature learning strategies for few-shot fault diagnosis.
To address the difficulty of extracting subtle fault features and improving generalization to unseen fault classes, Li et al. [18] proposed an attention-based deep meta-transfer learning method named ADMTL. Wang et al. [23] proposed a dual graph neural network with residual blocks for few-shot mechanical fault diagnosis under limited labeled data. Ren et al. [24] proposed a Few-shot GAN to address severe data imbalance; their method pre-trains the GAN on a sample-rich class to learn a general sample distribution paradigm. Wang et al. [25] proposed a few-shot fault diagnosis model that combines self-supervised learning with an improved Siamese network to address the lack of labeled samples. Lin et al. [26] proposed IFMAML, a few-shot meta-transfer fault diagnosis method for cross-domain diagnosis, in which sparse principal component analysis is first used to enhance domain-invariant features and reduce redundancy.
In addition to the scarcity of labeled samples, feature coupling is another critical factor that limits the performance of few-shot fault diagnosis. Measured vibration signals from rotating machinery usually contain multiple coupled components. Fault-related components, such as weak impulses, modulation components, and transient responses, are often mixed with condition-related information, background vibration, structural transmission path effects, and environmental noise. When sufficient labeled samples are available, deep models may still learn useful fault-related patterns from complex signal distributions. However, in few-shot scenarios, limited labeled samples cannot adequately characterize intrinsic fault mechanisms and intra-class variations. As a result, the model may overfit to sample-specific fluctuations or condition-related patterns rather than fault-discriminative information. This problem becomes more pronounced when fault-induced components are weak. In practical rotating machinery systems, variations in speed, load, and structural transmission paths may change the amplitude, frequency distribution, and modulation characteristics of vibration signals. These variations may dominate weak early fault signatures. Consequently, samples from the same fault class may exhibit large intra-class variability under different operating conditions.
Motivated by the above analysis, this paper proposes a few-shot fault diagnosis method based on complex-valued disentangled representation learning. Two vibration samples are simultaneously fed into the shared feature extraction network, and their feature representations are optimized according to sample-to-sample relations. Through this mechanism, the proposed method can learn a more compact and separable fault feature space under few-shot conditions. Unlike standard episodic N-way K-shot meta-learning methods, such as Prototypical Networks and MAML, this study focuses on a fixed limited-labeled-sample fault diagnosis scenario. Therefore, the proposed complex-valued disentangled representation network (CVDRNet) addresses few-shot diagnosis from the perspective of robust representation learning rather than task-level meta-adaptation, by combining direction-pair complex-valued feature extraction, lightweight complex-valued convolution, dual-branch disentangled representation learning, and a cosine-based disentangled representation loss. The main contributions of this paper are summarized as follows:
  • To model the coupled dynamic characteristics between different vibration directions, a lightweight complex-valued convolutional module is designed. This module enhances fault-sensitive feature extraction while maintaining a compact network structure.
  • By using a dual-input weight-sharing structure, sample-to-sample relations are exploited under few-shot conditions. This structure provides a basis for discriminative representation learning from limited labeled fault samples.
  • To separate fault-sensitive information from condition-related interference, a cosine-based disentangled representation loss is introduced. It enhances intra-class compactness and inter-class separability in the fault-sensitive feature space, thereby improving the robustness and generalization ability of few-shot fault diagnosis.

2. Related Principles

2.1. Complex-Valued Convolution

Complex-valued convolution provides a structured way to model the interaction between two correlated signal components [27]. Unlike conventional real-valued convolution, complex-valued convolution processes the real and imaginary branches simultaneously and introduces cross-branch feature interaction through complex multiplication. In this study, it is employed to model the coupled dynamic responses between two directional vibration signals selected from triaxial measurements. For a triaxial vibration sample, the signals measured along three spatial directions are denoted as
$$x = \left[ x^{(1)}, x^{(2)}, x^{(3)} \right],$$
where $x^{(1)}$, $x^{(2)}$, and $x^{(3)}$ represent the vibration signals measured along the three directions. A complex-valued input is then constructed by selecting two directional components from the triaxial measurements. Given an ordered directional pair $(p, q)$ with $p, q \in \{1, 2, 3\}$ and $p \neq q$, the corresponding complex-valued vibration representation is defined as
$$z^{(p,q)} = x^{(p)} + j\, x^{(q)},$$
where $j$ denotes the imaginary unit. Here, $x^{(p)}$ and $x^{(q)}$ serve as the real and imaginary branches, respectively. It should be noted that this representation is not a strict analytic signal, nor does it represent physical in-phase and quadrature components. Instead, it is introduced as a direction-pair-based complex representation to model cross-directional dynamic coupling in vibration responses. Let the complex-valued convolution kernel be
$$W = W_r + j\, W_i,$$
where $W_r$ and $W_i$ denote the real and imaginary parts of the convolution kernel, respectively. The complex-valued convolution of $W$ and $z^{(p,q)}$ is formulated as
$$W * z^{(p,q)} = \left( W_r + j W_i \right) * \left( x^{(p)} + j x^{(q)} \right) = \left( W_r * x^{(p)} - W_i * x^{(q)} \right) + j \left( W_r * x^{(q)} + W_i * x^{(p)} \right),$$
where $*$ denotes the convolution operator. According to this formulation, complex-valued convolution does not independently filter the two directional signals. Instead, both output branches integrate information from the two selected directions. Specifically, the real output branch combines $W_r * x^{(p)}$ and $W_i * x^{(q)}$, whereas the imaginary output branch combines $W_r * x^{(q)}$ and $W_i * x^{(p)}$ in a complementary manner. This cross-branch interaction enables the network to capture cross-directional coupling characteristics in vibration responses.
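The real/imaginary expansion above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes one-dimensional signals, a sliding-window ("valid") correlation as the convolution operator (the convention used by most deep learning libraries), and toy values for the kernel and the two directional signals. The function names `conv1d_valid` and `complex_conv1d` are hypothetical.

```python
def conv1d_valid(kernel, signal):
    """Sliding-window filtering without padding (cross-correlation convention)."""
    k = len(kernel)
    return [sum(kernel[t] * signal[n + t] for t in range(k))
            for n in range(len(signal) - k + 1)]


def complex_conv1d(w_r, w_i, x_p, x_q):
    """Return the real and imaginary branches of (w_r + j w_i) * (x_p + j x_q)."""
    rr = conv1d_valid(w_r, x_p)  # W_r * x^(p)
    ii = conv1d_valid(w_i, x_q)  # W_i * x^(q)
    ri = conv1d_valid(w_r, x_q)  # W_r * x^(q)
    ir = conv1d_valid(w_i, x_p)  # W_i * x^(p)
    real = [a - b for a, b in zip(rr, ii)]
    imag = [a + b for a, b in zip(ri, ir)]
    return real, imag


# Toy directional signals and kernel parts (illustrative values only).
x_p = [0.0, 1.0, 0.5, -0.5, 0.25]
x_q = [0.2, -0.1, 0.4, 0.3, -0.6]
w_r = [0.5, -1.0, 0.25]
w_i = [0.1, 0.3, -0.2]

real, imag = complex_conv1d(w_r, w_i, x_p, x_q)

# Cross-check against Python's native complex arithmetic.
z = [complex(a, b) for a, b in zip(x_p, x_q)]
w = [complex(a, b) for a, b in zip(w_r, w_i)]
reference = conv1d_valid(w, z)
max_error = max(abs(complex(r, i) - ref)
                for r, i, ref in zip(real, imag, reference))
```

Both output branches depend on both input directions, which is the cross-branch interaction described above; `max_error` stays at floating-point rounding level because the four real-valued filterings reproduce the complex product exactly.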

2.2. Metric Learning

Metric learning aims to learn a discriminative embedding space in which the similarity between samples reflects their semantic relationships [28,29]. Instead of directly learning a mapping from input samples to class labels, it emphasizes sample-to-sample relationships in the feature space. A well-structured embedding space can improve inter-class separability and reduce overfitting to sample-specific patterns [30]. Given an input vibration sample $x_i$, a feature extractor $G(\cdot\,; \theta)$ maps it into an embedding vector:
$$f_i = G(x_i; \theta),$$
where $\theta$ denotes the learnable parameters of the feature extractor. In the learned embedding space, samples from the same fault class are expected to be close to each other, whereas samples from different fault classes should be well separated. Therefore, the key objective of metric learning is to enhance intra-class compactness and inter-class separability.
Cosine similarity is commonly used to measure the angular similarity between two feature vectors and is insensitive to feature magnitude. For two embedding vectors $f_i$ and $f_j$, the cosine similarity is defined as
$$s(f_i, f_j) = \frac{f_i^{T} f_j}{\| f_i \|_2 \, \| f_j \|_2},$$
where $s(f_i, f_j)$ theoretically ranges from $-1$ to $1$. A larger value indicates higher angular similarity in the embedding space. For a given sample $x_i$, samples from the same fault class are regarded as positives, whereas samples from different classes are regarded as negatives. Let $\mathcal{P}_i$ and $\mathcal{N}_i$ denote the positive and negative sets associated with $x_i$, respectively. One general form of the metric learning objective can be expressed as
$$\mathcal{L}_{\mathrm{metric}} = \sum_{i} \left[ \sum_{j \in \mathcal{P}_i} \ell_{\mathrm{pos}}\big( s(f_i, f_j) \big) + \sum_{k \in \mathcal{N}_i} \ell_{\mathrm{neg}}\big( s(f_i, f_k) \big) \right],$$
where $\ell_{\mathrm{pos}}(\cdot)$ encourages higher similarity between positive pairs, whereas $\ell_{\mathrm{neg}}(\cdot)$ penalizes high similarity between negative pairs. Thus, intra-class samples are pulled closer, and inter-class samples are pushed apart.
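As a small illustration of the general objective above, the sketch below evaluates cosine similarity, builds the positive and negative index sets for each anchor, and sums the pairwise penalties. The two penalty functions passed in the example (1 - s for positives, max(0, s) for negatives) are placeholder choices made for this sketch only, since the general form leaves them unspecified; excluding the anchor itself from its positive set is likewise an assumption. All function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Angular similarity of two vectors; insensitive to their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def positive_negative_sets(labels, i):
    """Indices sharing the anchor's label (anchor excluded) and indices that do not."""
    pos = [j for j, y in enumerate(labels) if y == labels[i] and j != i]
    neg = [k for k, y in enumerate(labels) if y != labels[i]]
    return pos, neg


def metric_objective(embeddings, labels, pos_penalty, neg_penalty):
    """Sum the positive and negative pair penalties over all anchors."""
    total = 0.0
    for i, f_i in enumerate(embeddings):
        pos, neg = positive_negative_sets(labels, i)
        total += sum(pos_penalty(cosine_similarity(f_i, embeddings[j])) for j in pos)
        total += sum(neg_penalty(cosine_similarity(f_i, embeddings[k])) for k in neg)
    return total


# Two classes whose embeddings point along orthogonal axes (toy values).
embeddings = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
labels = [0, 0, 1, 1]
loss = metric_objective(embeddings, labels,
                        pos_penalty=lambda s: 1.0 - s,
                        neg_penalty=lambda s: max(0.0, s))
```

In this toy case same-class vectors are parallel and different-class vectors are orthogonal, so every pairwise penalty vanishes, regardless of the differing vector lengths.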

3. Methods

Let $\mathcal{D} = \{ (x_i, y_i) \}_{i=1}^{N}$ denote a training dataset containing $N$ triaxial vibration samples collected from rotating machinery, where $x_i$ is the $i$-th triaxial vibration sample and $y_i$ is its corresponding fault label. For each sample, the vibration responses measured along three spatial directions can be represented as
$$x_i = \left[ x_i^{(1)}, x_i^{(2)}, x_i^{(3)} \right],$$
where $x_i^{(1)}$, $x_i^{(2)}$, and $x_i^{(3)}$ denote the vibration signals collected from the three spatial directions. To exploit the complementary information in triaxial vibration measurements and increase the diversity of limited samples, a physically constrained random direction-pair selection strategy is adopted. Specifically, the vertical vibration component is always selected as one branch of the complex-valued input because it is generally more sensitive to fault-induced impacts and dynamic responses in rotating machinery. The other branch is randomly selected from the two remaining directional components. In this way, the constructed direction-pair complex vibration sample preserves a fault-sensitive vertical reference while introducing complementary directional information. Given an ordered directional pair $(p, q)$ with $p, q \in \{1, 2, 3\}$ and $p \neq q$, the corresponding complex-valued input is defined as
$$z_i^{(p,q)} = x_i^{(p)} + j\, x_i^{(q)},$$
where $j$ denotes the imaginary unit. In this representation, $x_i^{(p)}$ and $x_i^{(q)}$ are assigned to the real and imaginary branches, respectively. This construction is not intended to represent a strict analytic signal or physically defined in-phase and quadrature components. Instead, it is introduced as a structured direction-pair complex representation for modeling cross-directional dynamic coupling in vibration responses.
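The direction-pair selection described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the text does not say which axis index is vertical or which branch the vertical component occupies, so the `vertical` argument (0-based, default 2) and the choice to place the vertical component in the real branch are assumptions, and the function name `make_direction_pair` is hypothetical.

```python
import random


def make_direction_pair(sample, vertical=2, rng=random):
    """Build a complex-valued input from a triaxial sample.

    sample   -- three equal-length lists, one per measurement direction
    vertical -- 0-based index of the vertical direction (assumed, not from the paper)
    rng      -- object providing choice(), e.g. random or random.Random(seed)
    """
    p = vertical                                      # always-selected vertical reference
    q = rng.choice([d for d in range(3) if d != p])   # random complementary direction
    z = [complex(a, b) for a, b in zip(sample[p], sample[q])]
    return p, q, z


# Toy triaxial sample with two time steps per direction.
sample = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
p, q, z = make_direction_pair(sample, vertical=2, rng=random.Random(0))
```

Calling the function repeatedly on the same sample yields one of two complex-valued views, which is how the strategy adds diversity when labeled samples are scarce.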
The proposed complex-valued disentangled representation network (CVDRNet) is illustrated in Figure 1. For illustration, two direction-pair complex vibration samples, $S_1$ and $S_2$, are shown as the inputs of CVDRNet. In practice, all samples in a mini-batch are processed by the shared network during training. The network consists of four main blocks: CommBlock, CondBlock, FaultBlock, and ClassBlock. For simplicity, let $B_{\mathrm{cm}}$, $B_{\mathrm{cond}}$, $B_{\mathrm{fault}}$, and $B_{\mathrm{cl}}$ denote these four blocks, respectively. The CommBlock maps the direction-pair complex vibration input into a deep common feature representation. The feature map extracted by the CommBlock is then simultaneously fed into the CondBlock and FaultBlock. The CondBlock learns condition-related interference representations, while the FaultBlock extracts fault-sensitive representations for fault identification. Finally, the fault-sensitive feature extracted from the FaultBlock is fed into the ClassBlock to generate the predicted class probability vector. A classification loss is introduced to measure the discrepancy between predictions and labels, and a cosine-based disentangled representation loss is employed to promote the disentanglement between condition-related and fault-sensitive feature representations. In general, $B_{\mathrm{cm}}(z_r; \theta_{\mathrm{cm}})$ is a function parameterized by $\theta_{\mathrm{cm}}$, where $z_r$ denotes the $r$-th direction-pair complex vibration sample. The feature extracted from the CommBlock is denoted as
$$F_r^{\mathrm{cm}} = B_{\mathrm{cm}}\left( z_r; \theta_{\mathrm{cm}} \right), \quad F_r^{\mathrm{cm}} \in \mathbb{R}^{C_{\mathrm{ch}} \times H \times W}.$$
The CondBlock is parameterized by $\theta_{\mathrm{cond}}$ and maps the common feature $F_r^{\mathrm{cm}}$ to a condition-related feature representation:
$$F_r^{\mathrm{cond}} = B_{\mathrm{cond}}\left( F_r^{\mathrm{cm}}; \theta_{\mathrm{cond}} \right), \quad F_r^{\mathrm{cond}} \in \mathbb{R}^{C_{\mathrm{ch}} \times H \times W}.$$
Here, $F_r^{\mathrm{cond}}$ denotes the condition-related feature representation that captures operating-condition variations, direction-dependent responses, background vibration, and other interference components. Similarly, the FaultBlock is parameterized by $\theta_{\mathrm{fault}}$ and maps the common feature $F_r^{\mathrm{cm}}$ to a fault-sensitive feature representation:
$$F_r^{\mathrm{fault}} = B_{\mathrm{fault}}\left( F_r^{\mathrm{cm}}; \theta_{\mathrm{fault}} \right), \quad F_r^{\mathrm{fault}} \in \mathbb{R}^{C_{\mathrm{ch}} \times H \times W}.$$
The representation $F_r^{\mathrm{fault}}$ is expected to preserve discriminative fault information, and its separation from condition-related interference is further promoted by the disentangled representation loss. Finally, $B_{\mathrm{cl}}(F_r^{\mathrm{fault}}; \theta_{\mathrm{cl}})$ represents a classification function parameterized by $\theta_{\mathrm{cl}}$, which maps the fault-sensitive representation to the predicted class probability vector:
$$\hat{y}_r = \mathrm{softmax}\left( B_{\mathrm{cl}}\left( F_r^{\mathrm{fault}}; \theta_{\mathrm{cl}} \right) \right).$$
For the two illustrated input samples $S_1$ and $S_2$, CVDRNet produces two sets of outputs:
$$\left\{ F_1^{\mathrm{cond}}, F_1^{\mathrm{fault}}, \hat{y}_1 \right\}, \quad \left\{ F_2^{\mathrm{cond}}, F_2^{\mathrm{fault}}, \hat{y}_2 \right\}.$$
The fault-sensitive representations are used for classification, while the condition-related and fault-sensitive representations are jointly constrained by the cosine-based disentangled representation loss. Therefore, the proposed framework can explicitly enhance the discriminability of fault-sensitive features and reduce the influence of condition-related interference under few-shot conditions.
Table 1 lists the detailed architectures of the CVDRNet variants. In Table 1, the superscript of CVDRNet denotes the number of stacked Complex-ResBlocks, and the subscript denotes the dimension of the final fault-sensitive and condition-related embeddings. The kernel size and channel number of each building block are shown in square brackets, where the numbers outside the brackets indicate the number of stacked blocks. The symbol $N_c$ in the ClassBlock row denotes the number of fault classes. Unless otherwise specified, $\mathrm{CVDRNet}_{32}^{4}$ is adopted as the default architecture in this paper.

3.1. Lightweight Complex-Valued Feature Extraction Module

The shared feature extraction module is designed to learn cross-directional fault-related representations from direction-pair complex vibration samples. Although conventional real-valued convolutional neural networks have been widely used in fault diagnosis, most real-valued convolutional operations fuse different input channels in a general manner without explicitly modeling their structured interactions. This strategy may be insufficient for capturing the cross-directional dynamic coupling between two directional vibration components. In the proposed method, two selected vibration directions are organized as the real and imaginary branches of a complex-valued input. Therefore, complex-valued convolution is introduced to model the interaction between these two branches in a structured manner. Figure 2 compares several representative convolutional structures, including ResBlock, ComplexBlock, and Res-ComplexBlock. Compared with real-valued convolution, complex-valued convolution enables information exchange between the real and imaginary branches through complex multiplication. For a complex-valued feature map $F = F_R + j F_I$ and a complex-valued convolution kernel $W = W_R + j W_I$, the complex convolution can be expressed as
$$W * F = \left( W_R * F_R - W_I * F_I \right) + j \left( W_R * F_I + W_I * F_R \right),$$
where $*$ denotes the convolution operator. In this formulation, both the real and imaginary output branches integrate information from the two input branches. Therefore, complex-valued convolution provides a structured mechanism for learning cross-directional coupling characteristics in vibration responses.
However, directly stacking standard complex-valued convolutional layers usually introduces a large number of parameters and high computational cost. This is undesirable for few-shot fault diagnosis because over-parameterized models are more prone to overfitting when labeled fault samples are limited. To address this problem, a lightweight Complex-ConvBlock is designed by incorporating complex depthwise convolution into the complex-valued convolutional operation. The detailed structures of the proposed Complex-ConvBlock and its residual version, Complex-ResBlock, are shown in Figure 3. Given an input feature map $F_{\mathrm{in}} \in \mathbb{R}^{C_{\mathrm{in}} \times H \times W}$, where $C_{\mathrm{in}}$ is assumed to be even, the first $C_{\mathrm{in}} / 2$ channels form the real branch, and the remaining $C_{\mathrm{in}} / 2$ channels form the imaginary branch. The Complex-ConvBlock consists of three main operations: pointwise expansion, complex depthwise filtering, and pointwise projection. First, a pointwise convolution followed by a nonlinear activation is used to expand the channel dimension and improve representation capability:
$$\tilde{F} = \sigma\left( P_{\mathrm{exp}}\left( F_{\mathrm{in}} \right) \right),$$
where $P_{\mathrm{exp}}(\cdot)$ denotes the pointwise expansion operation and $\sigma(\cdot)$ is the activation function. Second, a complex depthwise convolution is performed on the expanded feature map. Unlike standard convolution, depthwise convolution applies an individual convolutional filter to each input channel, thereby reducing the computational burden. Let $\tilde{F}_R$ and $\tilde{F}_I$ denote the expanded real and imaginary branches, respectively. The complex depthwise convolution can be formulated as
$$G_R = D_R * \tilde{F}_R - D_I * \tilde{F}_I,$$
$$G_I = D_R * \tilde{F}_I + D_I * \tilde{F}_R,$$
where $D_R$ and $D_I$ denote the real and imaginary depthwise convolution kernels, respectively. The obtained feature maps $G_R$ and $G_I$ are then concatenated along the channel dimension as the real-valued tensor representation of the complex depthwise output. Finally, another pointwise convolution is adopted to fuse channel information and project the feature map to the desired output dimension:
$$F_{\mathrm{out}} = P_{\mathrm{proj}}\left( \left[ G_R, G_I \right] \right),$$
where $P_{\mathrm{proj}}(\cdot)$ denotes the pointwise projection operation and $[\cdot, \cdot]$ represents channel-wise concatenation. With pointwise expansion, complex depthwise filtering, and pointwise projection, the Complex-ConvBlock can extract cross-directional fault-related features with reduced computational cost.
To further improve feature propagation and alleviate optimization difficulty, a residual version of the Complex-ConvBlock, termed Complex-ResBlock, is also used in CVDRNet. As shown in Figure 3, the Complex-ResBlock introduces a shortcut connection between the input and output features:
$$F_{\mathrm{res}} = F_{\mathrm{out}} + S\left( F_{\mathrm{in}} \right),$$
where $S(\cdot)$ denotes an identity mapping or a projection operation used to match the feature dimensions. This residual design helps preserve useful low-level vibration information and facilitates the training of deeper complex-valued networks. As a result, the lightweight complex-valued feature extraction module can effectively model cross-directional vibration coupling while maintaining a compact network structure suitable for few-shot fault diagnosis.
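The three stages of the block and the residual shortcut can be traced in a few lines of plain Python. This is a simplified sketch under several assumptions, not the architecture in Table 1 or Figure 3: feature maps are one-dimensional (a list of channels, each a list of values), the activation is ReLU, the depthwise kernels have odd length with zero padding, normalization layers are omitted, and the shortcut is the identity mapping. All function names are hypothetical.

```python
import random


def pointwise(weights, feat):
    """1x1 convolution: each output channel is a weighted sum of input channels."""
    width = len(feat[0])
    return [[sum(w * ch[n] for w, ch in zip(row, feat)) for n in range(width)]
            for row in weights]


def relu(feat):
    return [[max(0.0, v) for v in ch] for ch in feat]


def depthwise_same(kernels, feat):
    """Filter each channel with its own odd-length kernel, zero-padded to keep the width."""
    out = []
    for k, ch in zip(kernels, feat):
        pad = len(k) // 2
        padded = [0.0] * pad + list(ch) + [0.0] * pad
        out.append([sum(k[t] * padded[n + t] for t in range(len(k)))
                    for n in range(len(ch))])
    return out


def complex_conv_block(f_in, p_exp, d_r, d_i, p_proj, residual=True):
    expanded = relu(pointwise(p_exp, f_in))             # pointwise expansion + activation
    half = len(expanded) // 2
    f_r, f_i = expanded[:half], expanded[half:]         # real / imaginary branches
    rr, ii = depthwise_same(d_r, f_r), depthwise_same(d_i, f_i)
    ri, ir = depthwise_same(d_r, f_i), depthwise_same(d_i, f_r)
    g_r = [[a - b for a, b in zip(x, y)] for x, y in zip(rr, ii)]  # D_R*F_R - D_I*F_I
    g_i = [[a + b for a, b in zip(x, y)] for x, y in zip(ri, ir)]  # D_R*F_I + D_I*F_R
    f_out = pointwise(p_proj, g_r + g_i)                # project the concatenation [G_R, G_I]
    if residual:                                        # identity shortcut; shapes must match
        f_out = [[a + b for a, b in zip(x, y)] for x, y in zip(f_out, f_in)]
    return f_out


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


c_in, lam, width = 4, 2, 16                 # toy sizes; lam plays the role of the expansion ratio
f_in = rand_matrix(c_in, width)
p_exp = rand_matrix(lam * c_in, c_in)       # expansion weights
d_r = rand_matrix(lam * c_in // 2, 3)       # real depthwise kernels (length 3)
d_i = rand_matrix(lam * c_in // 2, 3)       # imaginary depthwise kernels (length 3)
p_proj = rand_matrix(c_in, lam * c_in)      # projection weights
f_res = complex_conv_block(f_in, p_exp, d_r, d_i, p_proj)
```

The only cross-channel mixing happens in the two pointwise stages, while the depthwise stage couples each real channel with its imaginary counterpart; this is where the saving relative to a full complex convolution comes from.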

3.2. Parameter Count for Complex-ConvBlock

To further analyze the computational efficiency of the proposed lightweight complex-valued feature extraction module, the model size and computational cost of Complex-ConvBlock are discussed in this subsection. For a standard convolutional block with spatial size $h \times w$, $c$ input channels, $c'$ output channels, and kernel size $k_h \times k_w$, the number of multiply-add operations can be approximately calculated as
$$\mathrm{MAdds}_{\mathrm{std}} = h \cdot w \cdot c \cdot c' \cdot k_h \cdot k_w.$$
Although standard convolution has strong representational capacity, its computational cost increases rapidly with the number of input and output channels. This problem becomes more pronounced when complex-valued convolution is used, because the real and imaginary branches need to be processed and coupled simultaneously.
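The growth of the standard-convolution cost with channel count can be illustrated with a short calculation. The sketch below only evaluates the multiply-add count of a standard convolution as the product of spatial size, input channels, output channels, and kernel size; the layer sizes are illustrative and are not taken from Table 1, and the function name is hypothetical.

```python
def madds_standard(h, w, c_in, c_out, k_h, k_w):
    """Approximate multiply-adds of a standard convolution layer."""
    return h * w * c_in * c_out * k_h * k_w


# Illustrative layer: a 1 x 1024 feature map filtered with 1 x 3 kernels.
base = madds_standard(h=1, w=1024, c_in=32, c_out=32, k_h=1, k_w=3)
wider = madds_standard(h=1, w=1024, c_in=64, c_out=64, k_h=1, k_w=3)
growth = wider // base
```

Doubling both the input and output channels quadruples the count, which is the quadratic channel dependence that motivates replacing full convolutions with depthwise filtering.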
The proposed Complex-ConvBlock is designed as a lightweight replacement for standard complex convolution. Its detailed transformation process from $c$ input channels to $c'$ output channels with stride $s$ is shown in Table 2. In the complex-valued representation, the input and output channels are divided into real and imaginary branches, i.e., $c = c_R + c_I$ and $c' = c'_R + c'_I$. The expansion ratio $\lambda$ controls the number of intermediate channels in the Complex-ConvBlock.
As shown in Table 2, the Complex-ConvBlock first partitions the input channels into the real and imaginary branches. Then, a $1 \times 1$ convolution followed by ReLU is used to expand the channel dimension from $(c_R + c_I)$ to $(\lambda c_R + \lambda c_I)$. After that, a $1 \times 3$ complex depthwise convolution is performed for lightweight local feature filtering. Finally, another $1 \times 1$ convolution is adopted to project the feature map to $(c'_R + c'_I)$, and the real and imaginary branches are concatenated along the channel dimension to obtain the final output feature with $c'$ channels. Following the operation sequence in Table 2, the multiply-add operations of Complex-ConvBlock can be approximately expressed as
$$\mathrm{MAdds}_{\mathrm{CCB}} = \lambda h w \left( \tfrac{1}{2} c^{2} k_h k_w + c\, k_h k_w + c\, c' \right),$$
where the three terms represent the computational contributions of the expansion, complex depthwise filtering, and projection stages, respectively, following the calculation rule adopted for the block. Compared with directly stacking standard complex convolutional layers, the use of depthwise convolution significantly reduces the computational burden while retaining the interaction between the real and imaginary branches. The main advantage of Complex-ConvBlock lies in reducing computational cost while maintaining sufficient representation capability. To evaluate this property, Figure 4 compares the parameter count, memory cost, floating-point operations, multiply-add operations, and total memory requirement among Complex-ConvBlock, Complex-ResBlock, ResBlock, and ResCBlock. Figure 4 shows that Complex-ConvBlock and Complex-ResBlock require fewer operations and lower memory consumption than ResBlock and ResCBlock under the same input conditions. This indicates that the proposed lightweight complex-valued design provides a more compact feature extractor for few-shot fault diagnosis, where over-parameterized models may easily overfit limited labeled fault samples.
The expansion ratio λ controls the trade-off between computational cost and feature representation ability. A larger λ increases the number of intermediate channels and may improve the expressive capacity of the network. However, it also leads to higher parameter count, memory usage, and computational cost. As shown in Figure 4, the computational statistics of Complex-ConvBlock and Complex-ResBlock increase approximately linearly with the expansion ratio. Therefore, a moderate expansion ratio is preferred to balance diagnostic performance and model complexity. In this paper, the expansion ratio is set to λ = 2 unless otherwise specified. This setting provides sufficient feature transformation capability while keeping the network compact for few-shot fault diagnosis.

3.3. Cosine-Based Disentangled Representation Loss

CVDRNet is optimized by jointly considering fault classification and feature disentanglement. For a direction-pair complex vibration sample z r , the predicted class probability vector is obtained from the fault-sensitive representation:
y ^ r = softmax B cl F r fault ; θ cl .
The overall loss function consists of a classification loss and a cosine-based disentangled representation loss:
L total = α L cls + β L CosDR ,
where α and β are weighting coefficients. The classification loss supervises fault prediction, while the cosine-based disentangled representation loss enhances the discriminability of fault-sensitive representations and suppresses the similarity between fault-sensitive and condition-related representations. For a mini-batch B , the classification loss is formulated as the cross-entropy between the predicted probability vector and the ground-truth label:
$$\mathcal{L}_{\mathrm{cls}} = -\frac{1}{|\mathcal{B}|} \sum_{r \in \mathcal{B}} \sum_{c=1}^{N_c} y_{r,c} \log \hat{y}_{r,c},$$
where $N_c$ denotes the number of fault classes, $y_{r,c}$ is the one-hot encoded label, and $\hat{y}_{r,c}$ is the predicted probability of the $c$-th class.
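The classification branch and the weighted sum of the two losses can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names are hypothetical, the logits stand in for the classifier output before the softmax, and the default values of `alpha` and `beta` are placeholders, since the weighting coefficients are not specified in this section.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of class scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(batch_logits, labels):
    """Mean negative log-probability of the true class over a mini-batch.

    With one-hot labels, the inner sum over classes keeps only the
    log-probability of the ground-truth class.
    """
    losses = []
    for logits, label in zip(batch_logits, labels):
        probs = softmax(logits)
        losses.append(-math.log(probs[label]))
    return sum(losses) / len(losses)


def total_loss(l_cls, l_cosdr, alpha=1.0, beta=1.0):
    """Weighted sum of the two losses; alpha and beta are placeholder values."""
    return alpha * l_cls + beta * l_cosdr
```

For a batch whose logits are all equal, every class receives probability $1/N_c$, so the loss equals $\log N_c$; this is a convenient sanity check before training.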
The classification loss mainly supervises the fault prediction results, but it does not explicitly separate fault-sensitive information from condition-related interference. Therefore, a cosine-based disentangled representation loss is further introduced. For metric calculation, the fault-sensitive and condition-related feature maps are first transformed into embedding vectors by global average pooling:
$$f_r^{\mathrm{fault}} = \mathrm{GAP}\left(F_r^{\mathrm{fault}}\right), \quad f_r^{\mathrm{cond}} = \mathrm{GAP}\left(F_r^{\mathrm{cond}}\right).$$
The cosine similarity between two embedding vectors $a$ and $b$ is defined as
$$s(a, b) = \frac{a^{T} b}{\|a\|_2 \, \|b\|_2}.$$
As illustrated in Figure 5, each mini-batch contains fault-sensitive and condition-related embeddings. For each anchor fault-sensitive embedding $f_m^{\mathrm{fault}}$, the positive and negative index sets are defined as
$$\mathcal{P}_m = \{\, n \mid y_n = y_m,\ n \neq m \,\}, \quad \mathcal{N}_m = \{\, n \mid y_n \neq y_m \,\}.$$
Here, $\mathcal{P}_m$ contains the indices of samples from the same fault class as the anchor sample, while $\mathcal{N}_m$ contains the indices of samples from different fault classes. In addition, condition-related embeddings are used to construct disentanglement constraints to reduce the correlation between $f^{\mathrm{fault}}$ and $f^{\mathrm{cond}}$. The cosine-based disentangled representation loss is composed of a positive compactness term and a negative term that includes inter-class separation and feature disentanglement:
$$\mathcal{L}_{\mathrm{CosDR}} = \mathcal{L}_{\mathrm{CosDR}}^{p} + \mathcal{L}_{\mathrm{CosDR}}^{n}.$$
The positive compactness term is defined as
$$\mathcal{L}_{\mathrm{CosDR}}^{p} = \frac{1}{|\mathcal{B}|} \sum_{m \in \mathcal{B}} \log\left(1 + \sum_{n \in \mathcal{P}_m} \exp\left(-\mu \left(s\left(f_m^{\mathrm{fault}}, f_n^{\mathrm{fault}}\right) - \varphi\right)\right)\right),$$
where $\mu$ is a scaling factor and $\varphi$ is the similarity margin. This term pulls fault-sensitive embeddings from the same fault class closer in the embedding space. The negative term is defined as
$$\mathcal{L}_{\mathrm{CosDR}}^{n} = \frac{1}{|\mathcal{B}|} \sum_{m \in \mathcal{B}} \log\left(1 + \sum_{n \in \mathcal{N}_m} \exp\left(\mu \left(s\left(f_m^{\mathrm{fault}}, f_n^{\mathrm{fault}}\right) - \varphi\right)\right)\right) + \frac{1}{|\mathcal{B}|^{2}} \sum_{m \in \mathcal{B}} \sum_{n \in \mathcal{B}} \max\left(0,\ s\left(f_m^{\mathrm{fault}}, f_n^{\mathrm{cond}}\right) - \varepsilon\right),$$
where $\varepsilon$ is the disentanglement margin. The first part pushes fault-sensitive embeddings from different fault classes apart, while the second part suppresses the similarity between fault-sensitive and condition-related embeddings. With the Softplus-like formulation, the optimization strength is automatically adjusted according to the relative hardness of sample pairs. Hard positive pairs with low cosine similarity and hard negative pairs with high cosine similarity receive larger gradients. Meanwhile, the disentanglement term penalizes highly similar fault-sensitive and condition-related embeddings only when their cosine similarity exceeds the margin $\varepsilon$. Therefore, the proposed loss provides richer pairwise supervisory information within each mini-batch. It improves intra-class compactness, inter-class separability, and feature disentanglement, thereby enabling CVDRNet to learn more robust fault-sensitive representations under few-shot conditions.
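To make the structure of the loss concrete, the following plain-Python sketch evaluates the positive term, the negative term, and their sum for one mini-batch of pooled embeddings. It is an illustration under stated assumptions rather than the authors' implementation: the sign convention inside the exponentials is inferred from the description that low-similarity positive pairs and high-similarity negative pairs receive larger gradients, the function names are hypothetical, and the default values of `mu`, `phi`, and `eps` are placeholders.

```python
import math


def cosine(a, b):
    """Cosine similarity s(a, b) between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosdr_loss(f_fault, f_cond, labels, mu=2.0, phi=0.5, eps=0.1):
    """Return (positive term, negative term, total) for one mini-batch.

    f_fault, f_cond: lists of pooled embedding vectors, one per sample.
    mu, phi, eps: scaling factor and margins (placeholder values).
    """
    size = len(f_fault)
    pos_total = neg_total = dis_total = 0.0
    for m in range(size):
        pos_sum = neg_sum = 0.0
        for n in range(size):
            if n == m:
                continue  # the anchor is never paired with itself
            sim = cosine(f_fault[m], f_fault[n])
            if labels[n] == labels[m]:
                # Assumed sign: low-similarity positives are penalized more.
                pos_sum += math.exp(-mu * (sim - phi))
            else:
                # Assumed sign: high-similarity negatives are penalized more.
                neg_sum += math.exp(mu * (sim - phi))
        pos_total += math.log1p(pos_sum)
        neg_total += math.log1p(neg_sum)
        for n in range(size):
            # Hinge: active only when fault/condition similarity exceeds eps.
            dis_total += max(0.0, cosine(f_fault[m], f_cond[n]) - eps)
    l_pos = pos_total / size
    l_neg = neg_total / size + dis_total / size ** 2
    return l_pos, l_neg, l_pos + l_neg
```

In a training framework the same computation would normally be vectorized over a similarity matrix; the explicit loops are used here only to keep the correspondence with the index sets visible.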

4. Experimental Results and Comparisons

In this section, extensive experiments are conducted on the Paderborn University (PU) bearing dataset [31] and a drivetrain dynamics simulator (DDS) bearing dataset to verify the effectiveness and generalization ability of the proposed method. The former provides real bearing vibration signals collected under different operating conditions, while the latter is used to further evaluate the diagnostic capability of the model in a controlled drivetrain simulation environment.

4.1. Data Preprocessing

The continuous vibration signals are segmented into fixed-length samples to construct the training and testing sets. Specifically, each raw signal is divided by a sliding window with a sample length of $L$ and an overlap ratio of $\rho$. Each segmented sample inherits the health-condition label of the original signal. To avoid information leakage, the training and testing sets are split before direction-pair augmentation, and the augmented samples generated from the same original segment are assigned to the same subset.
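The sliding-window segmentation described above can be sketched as follows. The function name is hypothetical, and the window length and overlap ratio in the usage note are toy values, since the settings used in the experiments are not stated in this section.

```python
def segment_signal(signal, length, overlap):
    """Split a 1-D signal into fixed-length windows with a given overlap ratio.

    The hop between consecutive windows is length * (1 - overlap), rounded to
    at least one sample. Trailing samples that do not fill a window are dropped.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap ratio must lie in [0, 1)")
    step = max(1, int(round(length * (1.0 - overlap))))
    last_start = len(signal) - length
    return [signal[start:start + length]
            for start in range(0, last_start + 1, step)]
```

For example, a 10-sample signal with `length=4` and `overlap=0.5` gives a hop of 2 samples and four windows starting at samples 0, 2, 4, and 6.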
To simulate a fixed limited-labeled-sample fault diagnosis scenario rather than a standard episodic N-way K-shot meta-learning protocol, only 30% of the available samples in each fault category are used for model training in all experiments, while the remaining 70% are used for testing. The training samples are randomly selected before direction-pair augmentation, ensuring that the augmented samples are generated only from the selected training subset. This setting restricts the amount of labeled training data and allows different methods to be trained and evaluated under the same data partition and noise conditions, thereby evaluating their ability to learn fault-discriminative representations from limited samples.
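The order of operations, a per-class 30/70 split followed by augmentation inside each subset, can be sketched as below. The helper names and the data layout are hypothetical, and pairing every unordered combination of the three axes is an assumption made for illustration; which direction pairs are actually used is not specified in this section.

```python
import random
from collections import defaultdict
from itertools import combinations


def stratified_split(labels, train_ratio=0.3, seed=0):
    """Return (train_indices, test_indices) with train_ratio taken per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        n_train = int(round(train_ratio * len(idxs)))
        train.extend(idxs[:n_train])
        test.extend(idxs[n_train:])
    return sorted(train), sorted(test)


def direction_pairs(segment):
    """Build complex-valued signals from pairs of vibration directions.

    segment: dict mapping an axis name to a list of samples (assumed layout).
    The first axis of each pair becomes the real part, the second the
    imaginary part.
    """
    signals = []
    for axis_a, axis_b in combinations(sorted(segment), 2):
        signals.append([complex(x, y)
                        for x, y in zip(segment[axis_a], segment[axis_b])])
    return signals
```

Because `direction_pairs` is called only on segments that have already been assigned to a subset, every augmented sample inherits the subset of its source segment and no segment contributes to both training and testing.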

4.2. Implementation Details

All experiments were implemented in PyTorch 2.12.0 and executed on a workstation equipped with an NVIDIA 5060 Ti GPU and an Intel Core i7-14700K CPU. Unless otherwise specified, CVDRNet 32 4 was adopted as the default backbone, and the expansion ratio in the Complex-ConvBlock was set to $\lambda = 2$. The proposed model was trained in an end-to-end manner using the Adam optimizer with an initial learning rate of $1 \times 10^{-4}$. The batch size was fixed at 256, and the training process was conducted for 50 epochs. In addition, a step-wise learning-rate decay strategy was adopted, in which the learning rate was multiplied by 0.01 every 20 epochs to facilitate convergence and mitigate overfitting. To reduce the influence of random initialization and random data partitioning, each experiment was independently repeated ten times. The final diagnostic results are reported as the mean value and standard deviation over the ten independent runs.
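The step-wise decay and the reporting convention can be written out as a short sketch. It assumes an initial learning rate of 1e-4, zero-based epoch indexing, and the sample standard deviation; none of these conventions is confirmed by the text, and the function names are hypothetical.

```python
import statistics


def step_lr(epoch, base_lr=1e-4, gamma=0.01, step_size=20):
    """Learning rate at a zero-based epoch under step-wise decay.

    The rate is multiplied by gamma once every step_size epochs.
    """
    return base_lr * gamma ** (epoch // step_size)


def summarize_runs(accuracies):
    """Mean and sample standard deviation over repeated runs (assumed form)."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)
```

Under these assumptions, epochs 0 to 19 use 1e-4, epochs 20 to 39 use 1e-6, and epochs 40 to 49 use 1e-8, so most of the effective optimization happens in the first 20 epochs.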

4.3. Dataset Description

Two bearing datasets are employed for performance evaluation, namely the PU-bearing dataset and a DDS bearing dataset. The PU dataset serves as a real-world bearing benchmark, whereas the DDS dataset provides a controlled drivetrain scenario for further validation. These two datasets offer complementary experimental conditions, covering real measured bearing vibration signals and simulator-based drivetrain fault cases.
The PU-bearing dataset was collected from the bearing test rig shown in Figure 6. The platform is designed to record bearing signals under different health conditions and operating conditions. It provides vibration responses of healthy and faulty bearings, together with operating parameters such as rotational speed, load torque, and radial load. In this study, vibration signals are selected as the diagnostic data source. The dataset contains normal bearings and faulty bearings with different localized defect characteristics. Since the signals are collected from a physical bearing test platform, the PU dataset is suitable for evaluating the diagnostic performance of the proposed method under real measurement conditions and varying operating environments.
The DDS bearing dataset was established using the drivetrain dynamics simulator rig shown in Figure 7. The simulator is constructed to reproduce representative operating conditions of rotating machinery and to support the analysis of bearing fault dynamics in drivetrain systems. The platform mainly consists of a driving motor, a rolling bearing support unit, an inertial flywheel, a gearbox transmission module, and a magnetic powder brake. The driving motor provides rotational drive at preset speeds of 1000, 1500, and 2000 r/min, while the magnetic powder brake is used to apply different mechanical loads. The flywheel is introduced to emulate inertial effects during drivetrain operation, and the gearbox module forms the transmission path through which fault-induced vibration propagates. All signals are sampled at 40 kHz to capture transient impacts and dynamic characteristics caused by bearing defects. The DDS dataset includes four health categories: normal condition (NC), inner race fault (IR), outer race fault (OR), and rolling element fault (RB). For each fault type, three artificial defect diameters are considered, namely 0.5 mm, 1.0 mm, and 2.0 mm. Nine operating conditions are formed by combining three rotational speeds with three load levels of 0, 2, and 4 hp.
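As a small worked example, the nine operating conditions and the number of samples recorded per shaft revolution follow directly from the stated speeds, loads, and sampling rate. The variable and function names are hypothetical.

```python
from itertools import product

SPEEDS_RPM = (1000, 1500, 2000)   # preset motor speeds, r/min
LOADS_HP = (0, 2, 4)              # load levels, hp
SAMPLING_RATE_HZ = 40_000         # stated sampling rate

# Every speed is combined with every load level.
operating_conditions = [
    {"speed_rpm": speed, "load_hp": load}
    for speed, load in product(SPEEDS_RPM, LOADS_HP)
]


def samples_per_revolution(speed_rpm, fs=SAMPLING_RATE_HZ):
    """Number of samples acquired during one shaft revolution."""
    revolutions_per_second = speed_rpm / 60.0
    return fs / revolutions_per_second
```

At 40 kHz, one revolution spans 2400 samples at 1000 r/min, 1600 samples at 1500 r/min, and 1200 samples at 2000 r/min, which is useful when choosing a window length that covers a fixed number of revolutions.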

4.4. Comparison with Other Methods on the PU-Bearing Dataset

To evaluate the diagnostic performance of the proposed method on real measured bearing signals, several representative methods are selected for comparison on the PU-bearing dataset, including CLFormer [32], DRSN [33], SepFormer [34], ResNetTh [35], and TST [36]. Although ResNetTh was originally developed for DAB converter fault diagnosis, it is used here as an architecture-level baseline because its residual threshold-denoising structure is relevant to supervised one-dimensional fault classification under noisy conditions. The comparison is conducted under four noise levels, namely 20 dB, 24 dB, 28 dB, and 32 dB. The diagnostic accuracy and the number of trainable parameters are reported in Table 3.
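How a signal can be corrupted to a target SNR is sketched below. The noise model is an assumption: this section does not state how the noise is generated, and additive white Gaussian noise scaled to the measured signal power is used here only because it is a common convention. The function name is hypothetical.

```python
import math
import random


def add_awgn(signal, snr_db, seed=None):
    """Add white Gaussian noise so that the expected SNR equals snr_db.

    SNR (dB) = 10 * log10(P_signal / P_noise), with power taken as the
    mean squared amplitude of the clean signal.
    """
    rng = random.Random(seed)
    signal_power = sum(v * v for v in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in signal]
```

A lower SNR value means stronger noise: at 20 dB the noise power is 1% of the signal power, while at 32 dB it is roughly 0.06%.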
As shown in Table 3, the proposed CVDRNet variants consistently outperform the compared methods under all noise levels. Among the baseline methods, SepFormer achieves the best overall performance, with accuracies of 83.21%, 88.38%, 93.26%, and 93.72% under 20 dB, 24 dB, 28 dB, and 32 dB, respectively. This indicates that transformer-based feature modeling is effective for bearing fault diagnosis under noisy conditions. However, all CVDRNet variants achieve higher diagnostic accuracy than SepFormer, demonstrating the advantage of the proposed complex-valued disentangled representation learning framework. Specifically, the lightweight CVDRNet 32 2 achieves accuracies of 85.14%, 91.66%, 93.73%, and 95.66% under the four noise levels, respectively. Compared with SepFormer, CVDRNet 32 2 improves the average accuracy from 89.64% to 91.55%, while using only 1.4 M parameters. This result shows that the direction-pair complex representation and lightweight complex-valued convolution can enhance fault-sensitive feature extraction even with a relatively compact network structure.
Among all tested variants, CVDRNet 32 6 obtains the best overall performance on the PU-bearing dataset. Its accuracies reach 91.84%, 96.06%, 98.03%, and 97.94% under 20 dB, 24 dB, 28 dB, and 32 dB, respectively. Compared with the strongest baseline SepFormer, CVDRNet 32 6 improves the average accuracy by approximately 6.33 percentage points. The improvement is more significant under stronger noise interference. For example, under the 20 dB condition, CVDRNet 32 6 exceeds SepFormer by 8.63 percentage points, which indicates that the proposed method has stronger noise robustness.
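The averages and gains quoted above can be reproduced directly from the per-SNR accuracies given in the text. The snippet below is plain arithmetic on those reported numbers; the variable names are hypothetical.

```python
# Accuracies (%) at 20, 24, 28, and 32 dB, as quoted in the text.
sepformer = [83.21, 88.38, 93.26, 93.72]
cvdrnet_32_2 = [85.14, 91.66, 93.73, 95.66]
cvdrnet_32_6 = [91.84, 96.06, 98.03, 97.94]


def mean(values):
    """Arithmetic mean of a list of accuracies."""
    return sum(values) / len(values)


# Average gain of the deepest variant over the strongest baseline.
gain_avg = mean(cvdrnet_32_6) - mean(sepformer)

# Gain at the noisiest condition (20 dB is the first entry).
gain_20db = cvdrnet_32_6[0] - sepformer[0]
```

The means are 89.6425 for SepFormer, 91.5475 for CVDRNet 32 2, and 95.9675 for CVDRNet 32 6, which gives an average gain of 6.325 percentage points and a 20 dB gain of 8.63 percentage points, in line with the rounded figures in the text.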
The superior low-SNR performance is mainly attributed to representation-level noise suppression. The direction-pair complex representation exploits complementary cross-directional information, since fault-induced responses are more likely to show consistent coupling across vibration directions than random noise. The complex-valued convolution further models the interaction between the real and imaginary branches, thereby enhancing coupled fault-sensitive features. In addition, the cosine-based disentangled representation loss improves intra-class compactness and inter-class separability, making the learned fault-sensitive embeddings less affected by noise-induced perturbations. It can also be observed that increasing the embedding dimension does not always lead to consistent performance improvement. CVDRNet 256 4 has slightly more parameters than CVDRNet 32 4 and CVDRNet 64 4 , but its accuracy is not consistently higher under all noise levels. In contrast, increasing the network depth from CVDRNet 32 4 to CVDRNet 32 6 brings a more evident improvement in most cases. This suggests that deeper complex-valued residual feature extraction is more beneficial than simply increasing the embedding dimension for the PU-bearing dataset.

4.5. Comparison with Other Methods on the DDS Bearing Dataset

To further evaluate the robustness and generalization ability of the proposed method under controlled drivetrain operating conditions, comparative experiments are conducted on the DDS bearing dataset. Five representative fault diagnosis methods are selected as baseline models. The experiments are performed under four noise levels. The diagnostic accuracy and the number of trainable parameters are listed in Table 4.
As shown in Table 4, the proposed CVDRNet variants achieve consistently better performance than the compared methods under all noise levels. Among the baseline models, SepFormer obtains the highest overall accuracy, reaching 86.69%, 89.06%, 92.55%, and 93.26% under 20 dB, 24 dB, 28 dB, and 32 dB, respectively. This indicates that SepFormer has a relatively strong capability for extracting fault-related features from noisy vibration signals. However, all CVDRNet variants outperform SepFormer across the four noise levels, which demonstrates the effectiveness of the proposed complex-valued feature extraction and disentangled representation learning strategy. Specifically, the lightweight CVDRNet 32 2 achieves accuracies of 89.18%, 92.44%, 94.87%, and 96.94% under the four noise levels, respectively. Compared with SepFormer, its average accuracy increases from 90.39% to 93.36%, while the parameter number is reduced from 2.1 M to 1.4 M. This result suggests that the direction-pair complex representation can enhance diagnostic performance without relying on a large-scale network structure. Therefore, even a compact CVDRNet variant can effectively capture fault-sensitive information from multi-directional vibration responses.
Among all tested CVDRNet variants, CVDRNet 32 6 achieves the highest average accuracy on the DDS bearing dataset. Its diagnostic accuracies reach 92.66%, 97.16%, 96.78%, and 98.73% under 20 dB, 24 dB, 28 dB, and 32 dB, respectively. Compared with the strongest baseline SepFormer, CVDRNet 32 6 improves the average accuracy by approximately 5.94 percentage points. In particular, under the 24 dB noise condition, the improvement reaches 8.10 percentage points, indicating that the proposed model has strong robustness to noise interference in drivetrain fault diagnosis.
It can also be observed that different CVDRNet variants show slight performance differences. CVDRNet 32 4 achieves the best result under the 28 dB condition, with an accuracy of 97.73%, while CVDRNet 32 6 obtains the best results under 20 dB, 24 dB, and 32 dB. This indicates that increasing the network depth generally improves the feature representation capability, but the optimal architecture may vary slightly under different noise levels. In contrast, simply increasing the embedding dimension from 32 to 64 or 256 does not always lead to consistent improvement. For example, CVDRNet 64 4 and CVDRNet 256 4 have slightly more parameters than CVDRNet 32 4 , but their accuracies are not consistently higher. This suggests that an excessively large embedding dimension may introduce redundant parameters and does not necessarily improve generalization.

4.6. Feature Disentanglement Visualization

To further analyze the effectiveness of the proposed disentangled representation learning strategy, t-SNE visualization is used to project the learned high-dimensional embeddings into a two-dimensional space. Figure 8 shows the visualization results of the fault-sensitive embeddings $f^{\mathrm{fault}}$ and the condition-related embeddings $f^{\mathrm{cond}}$.
As shown in Figure 8a, the fault-sensitive embeddings $f^{\mathrm{fault}}$ form compact and distinguishable clusters. Samples belonging to the same fault class are closely distributed, while samples from different fault classes are clearly separated in the embedding space. This indicates that the FaultBlock can effectively preserve discriminative fault-related information. The compact intra-class distribution also demonstrates that the proposed cosine-based disentangled representation loss helps reduce feature dispersion within the same fault category. Meanwhile, the clear inter-class boundaries show that the learned fault-sensitive representations are suitable for fault classification under limited labeled samples. In contrast, Figure 8b shows the distribution of the condition-related embeddings $f^{\mathrm{cond}}$. Compared with the fault-sensitive embeddings, the condition-related embeddings are more dispersed and do not form clear class-wise clusters. Samples from different fault categories are highly mixed in the projected space. This phenomenon suggests that the CondBlock mainly captures non-fault-dominant information, such as operating-condition variations, direction-related responses, background vibration, and other interference components. Since these representations do not show obvious fault-discriminative structures, they are effectively separated from the fault-sensitive feature space.
The comparison between Figure 8a,b verifies the feature disentanglement capability of CVDRNet. The proposed framework encourages fault-related information to be concentrated in $f^{\mathrm{fault}}$, while condition-related interference is absorbed by $f^{\mathrm{cond}}$. Therefore, the model can learn more robust and discriminative fault-sensitive representations, which contributes to the improved diagnostic performance under few-shot and noisy conditions.

4.7. Ablation Study

4.7.1. Block Design Ablation

To investigate the influence of different convolutional block designs, three variants of CVDRNet are compared, including CVDRNet with ResBlock, CVDRNet with Complex-ConvBlock, and CVDRNet with Complex-ResBlock. The experimental results under different SNRs are reported in Table 5. Among the three block designs, CVDRNet with Complex-ResBlock achieves the best average performance. Its Top-1 accuracies reach 88.52%, 93.69%, 95.78%, and 96.98% under the four SNR conditions. Compared with the conventional ResBlock, Complex-ResBlock improves the average accuracy from 93.04% to 93.74%. The improvement is especially evident under the low-SNR condition of 16 dB, where the accuracy increases by 2.37 percentage points. This result suggests that the residual complex-valued structure is beneficial for improving feature propagation and enhancing the robustness of the model under strong noise interference.

4.7.2. Loss Function Ablation

To verify the contribution of the proposed cosine-based disentangled representation loss, two loss configurations are compared, namely the conventional classification loss $\mathcal{L}_{\mathrm{cls}}$ and the joint loss $\mathcal{L}_{\mathrm{cls}} + \mathcal{L}_{\mathrm{CosDR}}$. The experimental results under different SNRs are shown in Table 6. Adding $\mathcal{L}_{\mathrm{CosDR}}$ improves the accuracy, and the improvement is particularly significant under stronger noise interference. Under the 16 dB condition, the accuracy increases by 6.72 percentage points after adding $\mathcal{L}_{\mathrm{CosDR}}$. Under the 20 dB and 24 dB conditions, the improvements are 4.47 and 3.87 percentage points, respectively. This demonstrates that the proposed loss can enhance the robustness of the learned representation when fault-related components are weakened by noise. In contrast, under the 32 dB condition, the improvement is smaller because the fault-sensitive information is easier to identify when the noise level is lower.

4.7.3. Hyperparameter Sensitivity of $\mathcal{L}_{\mathrm{CosDR}}$

The hyperparameters $\mu$ and $\varphi$ in $\mathcal{L}_{\mathrm{CosDR}}$ determine the optimization behavior of the cosine-based disentangled representation loss. To investigate their influence on diagnostic performance, a sensitivity analysis is conducted on the PU-bearing dataset under the 32 dB SNR condition using Top-1 accuracy as the evaluation metric. As shown in Figure 9, the Top-1 accuracy changes noticeably with different combinations of $\mu$ and $\varphi$, and a high-performance region can be observed in the parameter space. This indicates that the strength of the disentanglement constraint directly affects the learned representation. When $\mu$ or $\varphi$ deviates from the suitable range, the diagnostic accuracy decreases. A weak constraint may fail to enhance the discriminability of fault-sensitive embeddings. Conversely, an overly strong constraint may interfere with classification-oriented feature learning. Overall, the sensitivity analysis shows that the effectiveness of $\mathcal{L}_{\mathrm{CosDR}}$ depends on a proper balance between scaling and margin constraints. Therefore, the selected hyperparameter combination is adopted in the reported experiments to balance fault-sensitive feature discrimination and condition-related feature disentanglement.

5. Conclusions

This paper proposes a complex-valued disentangled representation network, termed CVDRNet, for few-shot fault diagnosis of rotating machinery. A direction-pair complex representation strategy is designed for triaxial vibration signals, where two selected directional components are assigned to the real and imaginary branches. This strategy increases sample diversity and enables cross-directional dynamic coupling to be modeled under limited labeled samples.
A lightweight complex-valued feature extraction module is further developed. By combining complex-valued convolution with depthwise convolution, the proposed Complex-ConvBlock enhances real-imaginary branch interaction while reducing computational cost. The Complex-ResBlock improves feature propagation through residual learning. In addition, a cosine-based disentangled representation loss is introduced to enhance intra-class compactness and inter-class separability of fault-sensitive embeddings while suppressing condition-related interference.
Experiments on the PU and DDS bearing datasets show that CVDRNet consistently outperforms representative methods when only 30% of samples are used for training. The improvement is not solely due to increased model size, since the lightweight CVDRNet 32 2 contains only 1.4 M parameters, fewer than SepFormer with 2.1 M parameters, but still achieves higher average accuracy on both datasets. For industrial applications, larger CVDRNet variants are suitable for offline or server-side diagnosis, while CVDRNet 32 2 offers a more practical balance between accuracy and model size for resource-constrained deployment.
Visualization and ablation studies further verify the effectiveness of the complex-valued block design and the disentangled representation loss. Future work will focus on adaptive direction selection, lightweight deployment, and cross-condition domain generalization, especially for cases with large training-test distribution gaps caused by wide rotational speed or load variations.

Author Contributions

Conceptualization, Q.Z. and K.Y.; methodology, Q.Z. and K.Y.; software, Q.Z.; validation, Q.Z., X.X. and Z.C.; formal analysis, Q.Z.; investigation, Q.Z., X.X. and Z.C.; resources, L.Y., Y.F. and K.Y.; data curation, Q.Z. and Z.C.; writing—original draft preparation, Q.Z.; writing—review and editing, X.X., Z.C., L.Y., Y.F. and K.Y.; visualization, Q.Z.; supervision, L.Y., Y.F. and K.Y.; project administration, L.Y. and K.Y.; funding acquisition, L.Y., Y.F. and K.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under grant 52505133 and by the Foundation of CRRC GROUP under grants 2025CZA383 and 2026CKA712-2.

Data Availability Statement

The data underlying this study are available in the article. Further details can be requested from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, H.; Li, C.; Ding, P.; Li, S.; Li, T.; Liu, C.; Zhang, X.; Hong, Z. A novel transformer-based few-shot learning method for intelligent fault diagnosis with noisy labels under varying working conditions. Reliab. Eng. Syst. Saf. 2024, 251, 110400.
  2. Yu, G.; Wu, P.; Lv, Z.; Hou, J.; Ma, B.; Han, Y. Few-shot fault diagnosis method of rotating machinery using novel mcgm based cnn. IEEE Trans. Ind. Inform. 2023, 19, 10944–10955.
  3. Wang, X.; Jiang, H.; Mu, M.; Dong, Y. A trackable multidomain collaborative generative adversarial network for rotating machinery fault diagnosis. Mech. Syst. Signal Process. 2025, 224, 111950.
  4. Xiao, Y.; Shao, H.; Yan, S.; Wang, J.; Peng, Y.; Liu, B. Domain generalization for rotating machinery fault diagnosis: A survey. Adv. Eng. Inform. 2025, 64, 103063.
  5. Li, X.; Shao, H.; Lu, S.; Xiang, J.; Cai, B. Highly efficient fault diagnosis of rotating machinery under time-varying speeds using lsismm and small infrared thermal images. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 7328–7340.
  6. Shubita, R.R.; Alsadeh, A.S.; Khater, I.M. Fault detection in rotating machinery based on sound signal using edge machine learning. IEEE Access 2023, 11, 6665–6672.
  7. Bagri, I.; Tahiry, K.; Hraiba, A.; Touil, A.; Mousrij, A. Vibration signal analysis for intelligent rotating machinery diagnosis and prognosis: A comprehensive systematic literature review. Vibration 2024, 7, 1013–1062.
  8. Tama, B.A.; Vania, M.; Lee, S.; Lim, S. Recent advances in the application of deep learning for fault diagnosis of rotating machinery using vibration signals. Artif. Intell. Rev. 2023, 56, 4667–4709.
  9. Pang, P.; Tang, J.; Luo, J.; Chen, M.; Yuan, H.; Jiang, L. An explainable and lightweight improved 1d cnn model for vibration signals of rotating machinery. IEEE Sens. J. 2024, 24, 6976–6997.
  10. Ruiz-Sarrio, J.E.; Antoninodaviu, J.A.; Martis, C. Comprehensive diagnosis of localized rolling bearing faults during rotating machine start-up via vibration envelope analysis. Electronics 2024, 13, 375.
  11. Liang, X.; Zhang, M.; Feng, G.; Xu, Y.; Zhen, D.; Gu, F. A novel deep model with meta-learning for rolling bearing few-shot fault diagnosis. J. Dyn. Monit. Diagn. 2023, 2, 102–114.
  12. Zhang, X.; Tang, J.; Qu, Y.; Qin, G.; Guo, L.; Xie, J.; Long, Z. Few-shot fault diagnosis based on heterogeneous information fusion and meta learning. IEEE Sens. J. 2023, 23, 21433–21442.
  13. Xu, L.; Teoh, S.S.; Ibrahim, H. A deep learning approach for electric motor fault diagnosis based on modified inceptionv3. Sci. Rep. 2024, 14, 12344.
  14. Tang, H.; Tang, Y.; Su, Y.; Feng, W.; Wang, B.; Chen, P.; Zuo, D. Feature extraction of multi-sensors for early bearing fault diagnosis using deep learning based on minimum unscented kalman filter. Eng. Appl. Artif. Intell. 2024, 127, 107138.
  15. Xu, K.; Kong, X.; Wang, Q.; Yang, S.; Huang, N.; Wang, J. A bearing fault diagnosis method without fault data in new working condition combined dynamic model with deep learning. Adv. Eng. Inform. 2022, 54, 101795.
  16. Jiang, C.; Chen, H.; Xu, Q.; Wang, X. Few-shot fault diagnosis of rotating machinery with two-branch prototypical networks. J. Intell. Manuf. 2023, 34, 1667–1681.
  17. Lin, J.; Shao, H.; Zhou, X.; Cai, B.; Liu, B. Generalized maml for few-shot crossdomain fault diagnosis of bearing driven by heterogeneous signals. Expert Syst. Appl. 2023, 230, 120696.
  18. Li, C.; Li, S.; Wang, H.; Gu, F.; Ball, A.D. Attention-based deep meta-transfer learning for few-shot fine-grained fault diagnosis. Knowl.-Based Syst. 2023, 264, 110345.
  19. Ren, C.; Jiang, B.; Lu, N.; Simani, S.; Gao, F. Meta-learning with distributional similarity preference for few-shot fault diagnosis under varying working conditions. IEEE Trans. Cybern. 2023, 54, 2746–2756.
  20. Liu, S.; Liu, X.; Jiang, Z. A few-shot bearing fault diagnosis method integrating improved generative adversarial network and cnn-bilstm-attention hybrid network. Appl. Sci. 2026, 16, 2660.
  21. Leite, D.; Andrade, E.; Rativa, D.; Maciel, A.M. Fault detection and diagnosis in industry 4.0: A review on challenges and opportunities. Sensors 2024, 25, 60.
  22. Saleem, F.; Umar, M.; Kim, J.-M. An optimized few-shot learning framework for fault diagnosis in milling machines. Machines 2025, 13, 1010.
  23. Wang, H.; Wang, J.; Zhao, Y.; Liu, Q.; Liu, M.; Shen, W. Few-shot learning for fault diagnosis with a dual graph neural network. IEEE Trans. Ind. Inform. 2022, 19, 1559–1568.
  24. Ren, Z.; Zhu, Y.; Liu, Z.; Feng, K. Few-shot gan: Improving the performance of intelligent fault diagnosis in severe data imbalance. IEEE Trans. Instrum. Meas. 2023, 72, 3516814.
  25. Wang, H.; Wang, X.; Yang, Y.; Gryllias, K.; Liu, Z. A few-shot machinery fault diagnosis framework based on self-supervised signal representation learning. IEEE Trans. Instrum. Meas. 2024, 73, 3509114.
  26. Lin, C.; Kong, Y.; Han, Q.; Wang, T.; Dong, M.; Liu, H.; Chu, F. An information fusion-based meta transfer learning method for few-shot fault diagnosis under varying operating conditions. Mech. Syst. Signal Process. 2024, 220, 111652.
  27. Lee, C.; Hasegawa, H.; Gao, S. Complex-valued neural networks: A comprehensive survey. IEEE/CAA J. Autom. Sin. 2022, 9, 1406–1426.
  28. Huang, K.; Wu, S.; Sun, B.; Yang, C.; Gui, W. Metric learning-based fault diagnosis and anomaly detection for industrial data with intraclass variance. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 547–558.
  29. Xie, J.; Liu, J.; Ding, T.; Wang, T.; Yu, T. Self-attention metric learning based on multiscale feature fusion for few-shot fault diagnosis. IEEE Sens. J. 2023, 23, 19771–19782.
  30. Zheng, W.; Tian, X.; Yang, B.; Liu, S.; Ding, Y.; Tian, J.; Yin, L. A few shot classification methods based on multiscale relational networks. Appl. Sci. 2022, 12, 4059.
  31. Lessmeier, C.; Kimotho, J.K.; Zimmer, D.; Sextro, W. Condition monitoring of bearing damage in electromechanical drive systems by using motor current signals of electric motors: A benchmark data set for data-driven classification. PHM Soc. Eur. Conf. 2016, 3.
  32. Fang, H.; Deng, J.; Bai, Y.; Feng, B.; Li, S.; Shao, S.; Chen, D. Clformer: A lightweight transformer based on convolutional embedding and linear self-attention with strong robustness for bearing fault diagnosis under limited sample conditions. IEEE Trans. Instrum. Meas. 2021, 71, 3504608.
  33. Zhao, M.; Zhong, S.; Fu, X.; Tang, B.; Pecht, M. Deep residual shrinkage networks for fault diagnosis. IEEE Trans. Ind. Inform. 2019, 16, 4681–4690.
  34. Yin, K.; Chen, C.; Shen, Q.; Deng, J. A lightweight and rapidly converging transformer based on separable linear self-attention for fault diagnosis. Meas. Sci. Technol. 2025, 36, 0161b4.
  35. Cai, F.; Zhan, M.; Chai, Q.; Jiang, J. Fault diagnosis of dab converters based on resnet with adaptive threshold denoising. IEEE Trans. Instrum. Meas. 2022, 71, 3515510.
  36. Jin, Y.; Hou, L.; Chen, Y. A time series transformer based method for the rotating machinery fault diagnosis. Neurocomputing 2022, 494, 379–395.
Figure 1. Overall framework of the proposed few-shot fault diagnosis method.
Figure 2. Comparison of three typical convolutional structures. The cube thickness indicates the relative number of channels: (a) ResBlock; (b) ComplexBlock; (c) Res-ComplexBlock.
Figure 3. Structures of the proposed complex-valued blocks in CVDRNet: (a) Complex-ConvBlock; (b) Complex-ResBlock.
Figure 4. Comparison of parameter count, memory cost, FLOPs, MAdds, and total memory requirement among Complex-ConvBlock, Complex-ResBlock, ResBlock, and ResCBlock. (a) The five computational statistics are evaluated with $h = 1$, $w = 200$, $c = 32$, $c' = 128$, $k = (1, 3)$, $s = (1, 1)$, and expansion ratio $\lambda \in \{1, 2, 4, 6, 8, 10, 12\}$. Since ResBlock and ResCBlock do not contain the expansion ratio $\lambda$, their results are shown as straight lines. (b) The five computational statistics are evaluated with $h = 1$, $w = 200$, $c = 32$, $\lambda = 1$, $k = (1, 3)$, $s = (1, 1)$, and output channels $c' \in \{64, 128, 256\}$.
Figure 5. Illustration of positive and negative pairs in a mini-batch. Yellow and gray nodes represent $f_{\text{fault}}$ and $f_{\text{cond}}$, respectively, while different node shapes indicate different fault classes. Red edges denote positive pairs within the same fault class, blue edges denote negative pairs from different fault classes, and gray edges denote disentanglement constraints between $f_{\text{fault}}$ and $f_{\text{cond}}$.
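The pair structure described in the Figure 5 caption can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' loss: the toy embeddings, the helper names `cosine` and `build_pairs`, and the choice to compare each sample's fault embedding with its own condition embedding are all hypothetical, and the weighting by μ and φ used in the actual loss is not reproduced here.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_pairs(labels):
    """Split all index pairs of a mini-batch into positive and negative pairs."""
    positive, negative = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            positive.append((i, j))  # same fault class (red edges)
        else:
            negative.append((i, j))  # different fault classes (blue edges)
    return positive, negative


# Hypothetical toy batch: four samples, two fault classes.
labels = [0, 0, 1, 1]
f_fault = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
f_cond = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]

positive, negative = build_pairs(labels)
pos_sim = [cosine(f_fault[i], f_fault[j]) for i, j in positive]
neg_sim = [cosine(f_fault[i], f_fault[j]) for i, j in negative]

# Assumed form of the gray edges: fault embedding vs. condition embedding
# of the same sample.
dis_sim = [cosine(f, c) for f, c in zip(f_fault, f_cond)]
```

A cosine-based objective of this kind would typically push `pos_sim` up and `neg_sim` and `dis_sim` down; how the terms are combined is defined in the main text, not here.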
Figure 6. Paderborn University bearing test rig.
Figure 7. Drivetrain dynamics simulator rig.
Figure 8. t-SNE visualization of the learned feature representations. (a) The fault-sensitive embeddings $f_{\text{fault}}$ form compact and separable clusters for different fault classes, indicating that discriminative fault-related information is effectively preserved. (b) The condition-related embeddings $f_{\text{cond}}$ present a more dispersed distribution without clear fault-class clustering, suggesting that condition-related interference is separated from the fault-sensitive representation space.
Figure 9. Hyperparameter sensitivity analysis of $\mu$ and $\varphi$ in $\mathcal{L}_{\text{CosDR}}$ on the PU-bearing dataset under the 32 dB SNR condition. The surface represents the Top-1 accuracy obtained with different combinations of $\mu$ and $\varphi$.
Table 1. Architectures for the variants of the proposed CVDRNet.

| Block Name | Layer Name | CVDRNet 32 2 | CVDRNet 256 4 | CVDRNet 64 4 | CVDRNet 32 4 | CVDRNet 32 6 |
|---|---|---|---|---|---|---|
| CommBlock | Complex-ConvBlock | [1 × 7, 64; 1 × 3, 128] × 1 | [1 × 7, 64; 1 × 3, 128] × 1 | [1 × 7, 64; 1 × 3, 128] × 1 | [1 × 7, 64; 1 × 3, 128] × 1 | [1 × 7, 64; 1 × 3, 128] × 1 |
| | Complex-ResBlock | [1 × 5, 256] × 1 | [1 × 5, 256] × 1; [1 × 5, 512] × 1 | [1 × 5, 256] × 1; [1 × 5, 512] × 1 | [1 × 5, 256] × 1; [1 × 5, 512] × 1 | [1 × 5, 256] × 1; [1 × 5, 512] × 2 |
| CondBlock / FaultBlock | Complex-ResBlock | [1 × 3, 512] × 1 | [1 × 3, 512] × 1; [1 × 3, 1024] × 1 | [1 × 3, 512] × 1; [1 × 3, 1024] × 1 | [1 × 3, 512] × 1; [1 × 3, 1024] × 1 | [1 × 3, 512] × 1; [1 × 3, 1024] × 2 |
| | Global Average Pooling | (all variants) | | | | |
| | Dense | 32-d | 256-d | 64-d | 32-d | 32-d |
| ClassBlock | Dense | 32-d, $N_c$-d | 256-d, $N_c$-d | 64-d, $N_c$-d | 32-d, $N_c$-d | 32-d, $N_c$-d |
| MAdds | | 260.1 M | 890.4 M | 890.0 M | 889.9 M | 757.8 M |
Table 2. Complex-ConvBlock transforming from $c$ to $c'$ channels with stride $s$.

| Input | Operator | Output |
|---|---|---|
| $h \times w \times c$ | Allocate channels | $h \times w \times (c_R + c_I)$ |
| $h \times w \times (c_R + c_I)$ | 1 × 1 conv2d, ReLU | $h \times w \times (\lambda c_R + \lambda c_I)$ |
| $h \times w \times (\lambda c_R + \lambda c_I)$ | 1 × 3 complex depthwise conv, stride $s$, ReLU | $\frac{h}{s} \times \frac{w}{s} \times (\lambda c_R + \lambda c_I)$ |
| $\frac{h}{s} \times \frac{w}{s} \times (\lambda c_R + \lambda c_I)$ | 1 × 1 conv2d, linear | $\frac{h}{s} \times \frac{w}{s} \times (c'_R + c'_I)$ |
| $\frac{h}{s} \times \frac{w}{s} \times (c'_R + c'_I)$ | Concat channels | $\frac{h}{s} \times \frac{w}{s} \times c'$ |
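The shape bookkeeping in Table 2 can be traced with a small helper. This is a minimal sketch under stated assumptions rather than the authors' implementation: the function name `complex_convblock_shapes` is hypothetical, channels are assumed to split evenly into real and imaginary halves, the stride is taken as a single integer as in the table, and spatial sizes are rounded up when they do not divide evenly.

```python
def complex_convblock_shapes(h, w, c, c_out, lam, s):
    """Return (operator, output shape) rows following Table 2.

    Assumptions (not stated in the table): c and c_out split evenly into
    real and imaginary parts, and h/s, w/s use ceiling rounding.
    """
    if c % 2 or c_out % 2:
        raise ValueError("c and c_out must be even to split into real/imaginary halves")
    c_r = c_i = c // 2              # real / imaginary input channels
    co_r = co_i = c_out // 2        # real / imaginary output channels
    hs, ws = -(-h // s), -(-w // s)  # ceiling division by the stride
    expanded = lam * c_r + lam * c_i
    return [
        ("allocate channels", (h, w, c_r + c_i)),
        ("1x1 conv2d, ReLU", (h, w, expanded)),
        ("1x3 complex depthwise conv, stride s, ReLU", (hs, ws, expanded)),
        ("1x1 conv2d, linear", (hs, ws, co_r + co_i)),
        ("concat channels", (hs, ws, c_out)),
    ]


# Example using the sizes quoted in the Figure 4 caption with lambda = 6.
for operator, shape in complex_convblock_shapes(h=1, w=200, c=32, c_out=128, lam=6, s=1):
    print(f"{operator:45s} -> {shape}")
```

Only the middle two layers see the expanded width `lam * c`, which is why the block's cost scales with the expansion ratio while its input and output widths stay fixed.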
Table 3. Comparison of fault diagnosis accuracy (%) on the PU-bearing dataset under different noise levels when only 30% of samples are used for training. The results are reported as mean ± standard deviation over ten independent runs. The best and second-best mean accuracy values are highlighted in bold and underlined, respectively.

| Method | 20 dB | 24 dB | 28 dB | 32 dB | Params |
|---|---|---|---|---|---|
| CLFormer | 65.56 ± 0.92 | 70.79 ± 0.94 | 71.74 ± 0.65 | 73.01 ± 1.05 | 0.02 M |
| DRSN | 76.43 ± 1.04 | 80.67 ± 1.12 | 83.29 ± 1.11 | 85.09 ± 0.64 | 4.3 M |
| SepFormer | 83.21 ± 1.09 | 88.38 ± 0.84 | 93.26 ± 0.87 | 93.72 ± 1.10 | 2.1 M |
| ResNetTh | 73.56 ± 0.65 | 80.43 ± 0.52 | 86.86 ± 0.58 | 88.57 ± 0.87 | 8.0 M |
| TST | 58.78 ± 0.85 | 67.94 ± 1.23 | 69.14 ± 0.58 | 78.10 ± 0.74 | 0.5 M |
| CVDRNet 32 2 | 85.14 ± 0.86 | 91.66 ± 0.87 | 93.73 ± 0.96 | 95.66 ± 0.84 | 1.4 M |
| CVDRNet 32 4 | 88.52 ± 0.92 | 93.69 ± 0.86 | 95.78 ± 0.85 | 96.98 ± 0.91 | 18.3 M |
| CVDRNet 64 4 | 88.69 ± 0.88 | <u>93.93 ± 0.98</u> | 95.96 ± 1.28 | <u>97.93 ± 1.07</u> | 18.4 M |
| CVDRNet 256 4 | <u>90.21 ± 0.75</u> | 93.53 ± 0.72 | <u>96.58 ± 1.24</u> | 96.67 ± 1.19 | 18.8 M |
| CVDRNet 32 6 | **91.84 ± 0.48** | **96.06 ± 0.62** | **98.03 ± 1.04** | **97.94 ± 1.02** | 33.0 M |
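The noise levels in the column headers are signal-to-noise ratios in dB. One common way to produce such test conditions is to add zero-mean Gaussian noise scaled to a target SNR. The sketch below assumes that procedure; the noise model, the helper names `add_noise_at_snr` and `measured_snr_db`, and the synthetic sine signal are illustrative assumptions and are not taken from this section.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng):
    """Add zero-mean Gaussian noise so that the expected SNR equals snr_db."""
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB of a noisy signal relative to its clean version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - x) ** 2 for x, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


rng = random.Random(0)  # fixed seed for repeatability
# Hypothetical stand-in for a vibration segment: a 50 Hz sine sampled at 5 kHz.
clean = [math.sin(2.0 * math.pi * 50.0 * t / 5000.0) for t in range(20000)]
noisy = {snr: add_noise_at_snr(clean, snr, rng) for snr in (20, 24, 28, 32)}

for snr, segment in noisy.items():
    print(f"target {snr} dB, measured {measured_snr_db(clean, segment):.2f} dB")
```

Lower dB values mean stronger noise, which is consistent with accuracy falling from the 32 dB column to the 20 dB column.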
Table 4. Comparison of fault diagnosis accuracy (%) on the DDS bearing dataset under different noise levels when only 30% of samples are used for training. The results are reported as mean ± standard deviation over ten independent runs. The best and second-best mean accuracy values are highlighted in bold and underlined, respectively.

| Method | 20 dB | 24 dB | 28 dB | 32 dB | Params |
|---|---|---|---|---|---|
| CLFormer | 69.41 ± 0.59 | 70.48 ± 1.04 | 72.71 ± 0.75 | 74.83 ± 1.15 | 0.02 M |
| DRSN | 77.83 ± 0.56 | 80.97 ± 0.72 | 82.82 ± 0.57 | 85.45 ± 0.90 | 4.3 M |
| SepFormer | 86.69 ± 0.70 | 89.06 ± 0.50 | 92.55 ± 0.65 | 93.26 ± 0.63 | 2.1 M |
| ResNetTh | 79.41 ± 0.81 | 79.72 ± 1.14 | 86.83 ± 0.60 | 87.41 ± 0.91 | 8.0 M |
| TST | 61.23 ± 0.69 | 67.69 ± 0.82 | 77.24 ± 0.57 | 77.29 ± 1.05 | 0.5 M |
| CVDRNet 32 2 | 89.18 ± 0.45 | 92.44 ± 0.84 | 94.87 ± 0.69 | 96.94 ± 0.86 | 1.4 M |
| CVDRNet 32 4 | 92.00 ± 0.95 | <u>96.20 ± 0.44</u> | **97.73 ± 0.59** | <u>98.28 ± 0.95</u> | 18.3 M |
| CVDRNet 64 4 | <u>92.61 ± 0.73</u> | 95.47 ± 1.11 | <u>97.58 ± 0.66</u> | 97.33 ± 0.93 | 18.4 M |
| CVDRNet 256 4 | 91.88 ± 0.88 | 95.03 ± 0.64 | 96.92 ± 1.08 | 97.88 ± 1.19 | 18.8 M |
| CVDRNet 32 6 | **92.66 ± 0.60** | **97.16 ± 0.90** | 96.78 ± 1.18 | **98.73 ± 0.65** | 33.0 M |
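Each cell in the comparison tables is a mean ± standard deviation over ten independent runs. As a minimal sketch of that summary step, the snippet below formats a list of per-run accuracies in the same style. The run values are made up for illustration, the helper name `summarize` is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the captions do not say which estimator was used.

```python
import statistics


def summarize(accuracies):
    """Format per-run accuracies as 'mean ± std' with two decimals."""
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies)  # sample standard deviation (assumed)
    return f"{mean:.2f} ± {std:.2f}"


# Hypothetical accuracies (%) from ten independent runs.
runs = [90.0, 91.0, 92.0, 93.0, 94.0, 90.0, 91.0, 92.0, 93.0, 94.0]
print(summarize(runs))  # prints "92.00 ± 1.49"
```

With standard deviations near one percentage point, differences of a few hundredths between neighbouring rows are well inside run-to-run variation.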
Table 5. Ablation comparison of different block designs in CVDRNet under different SNRs (Top-1 accuracy, %).

| Model | Block | 16 dB | 20 dB | 24 dB | 32 dB |
|---|---|---|---|---|---|
| CVDRNet | ResBlock | 86.15 | 92.46 | 95.63 | 97.90 |
| CVDRNet | Complex-ConvBlock | 86.44 | 93.32 | 96.27 | 98.27 |
| CVDRNet | Complex-ResBlock | 88.52 | 93.69 | 95.78 | 96.98 |
Table 6. Ablation comparison of different loss functions in CVDRNet under different SNRs (Top-1 accuracy, %).

| Model | Loss | 16 dB | 20 dB | 24 dB | 32 dB |
|---|---|---|---|---|---|
| CVDRNet | $\mathcal{L}_{\text{class}}$ | 81.80 | 89.22 | 91.91 | 96.39 |
| CVDRNet | $\mathcal{L}_{\text{class}} + \mathcal{L}_{\text{CosDR}}$ | 88.52 | 93.69 | 95.78 | 96.98 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhou, Q.; Xian, X.; Chen, Z.; Yan, L.; Fan, Y.; Yin, K. Few-Shot Fault Diagnosis of Rotating Machinery Using Complex Convolution and Disentangled Representation Learning. Machines 2026, 14, 655. https://doi.org/10.3390/machines14060655

AMA Style

Zhou Q, Xian X, Chen Z, Yan L, Fan Y, Yin K. Few-Shot Fault Diagnosis of Rotating Machinery Using Complex Convolution and Disentangled Representation Learning. Machines. 2026; 14(6):655. https://doi.org/10.3390/machines14060655

Chicago/Turabian Style

Zhou, Qiuyang, Xiaoyu Xian, Zhengyu Chen, Lei Yan, Yuming Fan, and Kexin Yin. 2026. "Few-Shot Fault Diagnosis of Rotating Machinery Using Complex Convolution and Disentangled Representation Learning" Machines 14, no. 6: 655. https://doi.org/10.3390/machines14060655

APA Style

Zhou, Q., Xian, X., Chen, Z., Yan, L., Fan, Y., & Yin, K. (2026). Few-Shot Fault Diagnosis of Rotating Machinery Using Complex Convolution and Disentangled Representation Learning. Machines, 14(6), 655. https://doi.org/10.3390/machines14060655

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.