1. Introduction
Rotating machinery is widely used in modern industrial systems, including rail transit systems, wind turbines, aero-engines, power generation equipment, and intelligent manufacturing equipment [1]. As key transmission and supporting components [2], bearings, gears, shafts, and other rotating elements usually operate under complex mechanical loads, varying speeds, and harsh operating environments. During long-term operation, these components are prone to fatigue, wear, pitting, cracking, lubrication degradation, and other localized defects or degradation phenomena [3]. If such faults are not detected promptly, they may cause performance degradation, unplanned downtime, cascading damage to adjacent components, and even serious safety accidents [4]. Therefore, accurate and timely fault diagnosis of rotating machinery is of great significance for ensuring operational reliability, reducing maintenance costs, and enhancing the safety of industrial equipment [5,6]. Vibration signal analysis has become one of the most widely used techniques for rotating machinery fault diagnosis because mechanical defects usually induce fault-related dynamic responses that can be captured by vibration sensors [7,8]. Localized defects in bearings or gears can generate fault-induced impulses and modulation components during rolling contact or gear meshing processes. These fault-related dynamic responses are subsequently manifested in the measured vibration signals [9]. Traditional fault diagnosis methods usually rely on handcrafted features extracted in the time, frequency, or time-frequency domains, such as time-domain statistical indicators, spectral features, envelope spectrum features, wavelet-based features, and empirical mode decomposition-based features [10]. These methods have achieved promising performance in specific diagnostic scenarios. However, their effectiveness often depends heavily on prior knowledge, signal preprocessing strategies, and expert experience, which limits their generalization and adaptability to complex industrial environments.
Driven by recent advances in deep learning, data-driven fault diagnosis methods have attracted increasing attention in rotating machinery health monitoring [11,12]. Through hierarchical nonlinear transformations, deep neural networks can automatically learn discriminative representations from raw vibration signals or transformed representations, such as spectral and time-frequency representations [13]. Compared with handcrafted feature-based methods, deep learning models alleviate the dependence on manual feature design and have shown strong representation learning capability in complex signal analysis tasks [14,15]. Among them, convolutional neural networks, recurrent neural networks, auto-encoders, and attention-based networks have been widely applied to rotating machinery fault diagnosis.
However, the promising performance of most deep diagnostic models usually relies on sufficient labeled training samples. The assumption of abundant labeled data is difficult to satisfy in practical industrial scenarios. Rotating machinery generally operates under normal conditions for most of its service life, whereas fault conditions occur infrequently. In particular, early-stage, severe, or specific fault types are rare, costly, and sometimes unsafe to reproduce in real equipment. As a result, the number of labeled fault samples is often far smaller than the number of normal samples, and some fault categories may contain only a few labeled samples [16,17]. Under limited-sample conditions, deep neural networks are prone to overfitting. This is because limited training data cannot fully characterize the intra-class variability and inter-class discrepancies of fault patterns. The learned features may be dominated by sample-specific patterns rather than intrinsic fault mechanisms. This problem becomes more severe when fault-related signal components are weak or when operating conditions vary. Consequently, diagnostic models trained with limited samples often suffer from unstable feature representations, reduced classification accuracy, and poor generalization to unseen working conditions. Therefore, few-shot fault diagnosis has become an important and challenging problem in intelligent maintenance of rotating machinery [18,19]. The key issue is how to learn robust and fault-discriminative representations from limited labeled samples [20]. An effective model should not only avoid overfitting but also enhance intra-class compactness and inter-class separability in the learned feature space [21,22]. This requirement motivates the development of representation learning, metric learning, and disentangled feature learning strategies for few-shot fault diagnosis.
To better extract subtle fault features and improve generalization to unseen fault classes, Li et al. [18] proposed an attention-based deep meta-transfer learning method named ADMTL. Wang et al. [23] proposed a dual graph neural network with residual blocks for few-shot mechanical fault diagnosis with limited labeled data. Ren et al. [24] proposed a Few-shot GAN to address severe data imbalance; the GAN is pre-trained on a sample-rich class to learn a general sample distribution paradigm. Wang et al. [25] proposed a few-shot fault diagnosis model that combines self-supervised learning with an improved Siamese network to address the lack of labeled samples. Lin et al. [26] proposed IFMAML, a few-shot meta-transfer fault diagnosis method for cross-domain diagnosis, in which sparse principal component analysis is first used to enhance domain-invariant features and reduce redundancy.
In addition to the scarcity of labeled samples, feature coupling is another critical factor that limits the performance of few-shot fault diagnosis. Measured vibration signals from rotating machinery usually contain multiple coupled components. Fault-related components, such as weak impulses, modulation components, and transient responses, are often mixed with condition-related information, background vibration, structural transmission path effects, and environmental noise. When sufficient labeled samples are available, deep models may still learn useful fault-related patterns from complex signal distributions. However, in few-shot scenarios, limited labeled samples cannot adequately characterize intrinsic fault mechanisms and intra-class variations. As a result, the model may overfit to sample-specific fluctuations or condition-related patterns rather than fault-discriminative information. This problem becomes more pronounced when fault-induced components are weak. In practical rotating machinery systems, variations in speed, load, and structural transmission paths may change the amplitude, frequency distribution, and modulation characteristics of vibration signals. These variations may dominate weak early fault signatures. Consequently, samples from the same fault class may exhibit large intra-class variability under different operating conditions.
Motivated by the above analysis, this paper proposes a few-shot fault diagnosis method based on complex-valued disentangled representation learning. Two vibration samples are simultaneously fed into a shared feature extraction network, and their feature representations are optimized according to sample-to-sample relations. Through this mechanism, the proposed method can learn a more compact and separable fault feature space under few-shot conditions. Unlike standard episodic N-way K-shot meta-learning methods, such as Prototypical Networks and MAML, this study focuses on a fixed limited-labeled-sample fault diagnosis scenario. Therefore, the proposed CVDRNet addresses few-shot diagnosis from the perspective of robust representation learning rather than task-level meta-adaptation, by combining direction-pair complex-valued feature extraction, lightweight complex-valued convolution, dual-branch disentangled representation learning, and a cosine-based disentangled representation loss. The main contributions of this paper are summarized as follows:
(1) To model the coupled dynamic characteristics between different vibration directions, a lightweight complex-valued convolutional module is designed. This module enhances fault-sensitive feature extraction while maintaining a compact network structure.
(2) By using a dual-input weight-sharing structure, sample-to-sample relations are exploited under few-shot conditions. This structure provides a basis for discriminative representation learning from limited labeled fault samples.
(3) To separate fault-sensitive information from condition-related interference, a cosine-based disentangled representation loss is introduced. It enhances intra-class compactness and inter-class separability in the fault-sensitive feature space, thereby improving the robustness and generalization ability of few-shot fault diagnosis.
3. Methods
Let
denote a training dataset containing
N triaxial vibration samples collected from rotating machinery, where
is the
i-th triaxial vibration sample and
is its corresponding fault label. For each sample, the vibration responses measured along three spatial directions can be represented as
where
,
, and
denote the vibration signals collected from the three spatial directions. To exploit the complementary information in triaxial vibration measurements and increase the diversity of limited samples, a physically constrained random direction-pair selection strategy is adopted. Specifically, the vertical vibration component is always selected as one branch of the complex-valued input because it is generally more sensitive to fault-induced impacts and dynamic responses in rotating machinery. The other branch is randomly selected from the two remaining directional components. In this way, the constructed direction-pair complex vibration sample preserves a fault-sensitive vertical reference while introducing complementary directional information. Given an ordered directional pair
with
and
, the corresponding complex-valued input is defined as
where
j denotes the imaginary unit. In this representation,
and
are assigned to the real and imaginary branches, respectively. This construction is not intended to represent a strict analytic signal or physically defined in-phase and quadrature components. Instead, it is introduced as a structured direction-pair complex representation for modeling cross-directional dynamic coupling in vibration responses.
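The direction-pair construction described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the axis names `x`, `y`, `z`, the choice of `z` as the vertical direction, and the helper name `make_direction_pair` are all hypothetical.

```python
import random


def make_direction_pair(sample, vertical="z", rng=random):
    """Build one direction-pair complex sample from a triaxial record.

    `sample` maps a direction name to a list of floats of equal length.
    The vertical direction always supplies the real part; the imaginary
    part is drawn at random from the two remaining directions.
    (Axis names and the vertical axis are assumptions for illustration.)
    """
    others = sorted(name for name in sample if name != vertical)
    partner = rng.choice(others)
    paired = [complex(re, im) for re, im in zip(sample[vertical], sample[partner])]
    return partner, paired


# Toy triaxial record with three time steps per direction.
triaxial = {
    "x": [0.1, 0.2, 0.3],
    "y": [-0.4, 0.0, 0.4],
    "z": [1.0, -1.0, 0.5],
}
partner, z_pair = make_direction_pair(triaxial, rng=random.Random(0))
```

Calling the helper several times on the same record can yield both possible pairs, which is how the strategy increases sample diversity while always keeping the vertical reference.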
The proposed complex-valued disentangled representation network (CVDRNet) is illustrated in
Figure 1. For illustration, two direction-pair complex vibration samples,
and
, are shown as the inputs of CVDRNet. In practice, all samples in a mini-batch are processed by the shared network during training. The network consists of four main blocks: CommBlock, CondBlock, FaultBlock, and ClassBlock. For simplicity, let
,
,
, and
denote these four blocks, respectively. The CommBlock maps the direction-pair complex vibration input into a deep common feature representation. The feature map extracted by the CommBlock is then simultaneously fed into the CondBlock and FaultBlock. The CondBlock learns condition-related interference representations, while the FaultBlock extracts fault-sensitive representations for fault identification. Finally, the fault-sensitive feature extracted from the FaultBlock is fed into the ClassBlock to generate the predicted class probability vector. A classification loss is introduced to measure the discrepancy between predictions and labels, and a cosine-based disentangled representation loss is employed to promote the disentanglement between condition-related and fault-sensitive feature representations. In general,
is a function parameterized by
, where
denotes the
r-th direction-pair complex vibration sample. The feature extracted from the CommBlock is denoted as
The CondBlock is parameterized by
and maps the common feature
to a condition-related feature representation:
Here,
denotes the condition-related feature representation that captures operating-condition variations, direction-dependent responses, background vibration, and other interference components. Similarly, the FaultBlock is parameterized by
and maps the common feature
to a fault-sensitive feature representation:
The representation
is expected to preserve discriminative fault information, and its separation from condition-related interference is further promoted by the disentangled representation loss. Finally,
represents a classification function parameterized by
, which maps the fault-sensitive representation to the predicted class probability vector:
For the two illustrated input samples
and
, CVDRNet produces two sets of outputs:
The fault-sensitive representations are used for classification, while the condition-related and fault-sensitive representations are jointly constrained by the cosine-based disentangled representation loss. Therefore, the proposed framework can explicitly enhance the discriminability of fault-sensitive features and reduce the influence of condition-related interference under few-shot conditions.
Table 1 lists the detailed architectures of the CVDRNet variants. In
Table 1, the superscript of CVDRNet denotes the number of stacked Complex-ResBlocks, and the subscript denotes the dimension of the final fault-sensitive and condition-related embeddings. The kernel size and channel number of each building block are shown in square brackets, where the numbers outside the brackets indicate the number of stacked blocks. The symbol
in the ClassBlock row denotes the number of fault classes. Unless otherwise specified,
is adopted as the default architecture in this paper.
3.1. Lightweight Complex-Valued Feature Extraction Module
The shared feature extraction module is designed to learn cross-directional fault-related representations from direction-pair complex vibration samples. Although conventional real-valued convolutional neural networks have been widely used in fault diagnosis, most real-valued convolutional operations fuse different input channels in a general manner without explicitly modeling their structured interactions. This strategy may be insufficient for capturing the cross-directional dynamic coupling between two directional vibration components. In the proposed method, two selected vibration directions are organized as the real and imaginary branches of a complex-valued input. Therefore, complex-valued convolution is introduced to model the interaction between these two branches in a structured manner.
Figure 2 compares several representative convolutional structures, including ResBlock, ComplexBlock, and Res-ComplexBlock. Compared with real-valued convolution, complex-valued convolution enables information exchange between the real and imaginary branches through complex multiplication. For a complex-valued feature map
and a complex-valued convolution kernel
, the complex convolution can be expressed as
where ∗ denotes the convolution operator. In this formulation, both the real and imaginary output branches integrate information from the two input branches. Therefore, complex-valued convolution provides a structured mechanism for learning cross-directional coupling characteristics in vibration responses.
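As a hedged illustration of this coupling, the sketch below implements a 1-D complex convolution with four real convolutions, using the standard identity (W_r + jW_i) ∗ (h_r + jh_i) = (W_r ∗ h_r − W_i ∗ h_i) + j(W_r ∗ h_i + W_i ∗ h_r). The symbol names and the helper functions are assumptions introduced here, and "convolution" follows the deep-learning convention (sliding dot product without kernel flipping).

```python
def conv1d_valid(signal, kernel):
    """'Valid' 1-D sliding dot product (convolution in the deep-learning sense)."""
    n, k = len(signal), len(kernel)
    return [sum(signal[t + i] * kernel[i] for i in range(k)) for t in range(n - k + 1)]


def complex_conv1d(h_re, h_im, w_re, w_im):
    """Complex convolution expressed with four real convolutions."""
    rr = conv1d_valid(h_re, w_re)  # W_r * h_r
    ii = conv1d_valid(h_im, w_im)  # W_i * h_i
    ri = conv1d_valid(h_re, w_im)  # W_i * h_r
    ir = conv1d_valid(h_im, w_re)  # W_r * h_i
    out_re = [a - b for a, b in zip(rr, ii)]
    out_im = [a + b for a, b in zip(ir, ri)]
    return out_re, out_im


# Toy input branches and kernel branches.
h_re, h_im = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 0.0, 2.0]
w_re, w_im = [1.0, -1.0], [0.5, 2.0]
out_re, out_im = complex_conv1d(h_re, h_im, w_re, w_im)

# Reference: the same sliding product computed directly on complex numbers.
h = [complex(a, b) for a, b in zip(h_re, h_im)]
w = [complex(a, b) for a, b in zip(w_re, w_im)]
reference = conv1d_valid(h, w)
```

Each output branch depends on both input branches, which is the structured cross-branch interaction that a plain real-valued convolution over two stacked channels does not enforce.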
However, directly stacking standard complex-valued convolutional layers usually introduces a large number of parameters and high computational cost. This is undesirable for few-shot fault diagnosis because over-parameterized models are more prone to overfitting when labeled fault samples are limited. To address this problem, a lightweight Complex-ConvBlock is designed by incorporating complex depthwise convolution into the complex-valued convolutional operation. The detailed structures of the proposed Complex-ConvBlock and its residual version, Complex-ResBlock, are shown in
Figure 3. Given an input feature map
, where
is assumed to be even, the first
channels form the real branch, and the remaining
channels form the imaginary branch. The Complex-ConvBlock consists of three main operations: pointwise expansion, complex depthwise filtering, and pointwise projection. First, a pointwise convolution followed by a nonlinear activation is used to expand the channel dimension and improve representation capability:
where
denotes the pointwise expansion operation and
is the activation function. Second, a complex depthwise convolution is performed on the expanded feature map. Unlike standard convolution, depthwise convolution applies an individual convolutional filter to each input channel, thereby reducing the computational burden. Let
and
denote the expanded real and imaginary branches, respectively. The complex depthwise convolution can be formulated as
where
and
denote the real and imaginary depthwise convolution kernels, respectively. The obtained feature maps
and
are then concatenated along the channel dimension as the real-valued tensor representation of the complex depthwise output. Finally, another pointwise convolution is adopted to fuse channel information and project the feature map to the desired output dimension:
where
denotes the pointwise projection operation and
represents channel-wise concatenation. With pointwise expansion, complex depthwise filtering, and pointwise projection, the Complex-ConvBlock can extract cross-directional fault-related features with reduced computational cost.
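One way to write the three stages described above is given below. It is a sketch consistent with the prose rather than a reproduction of the paper's equations: the symbols X (input), P_exp and P_proj (pointwise expansion and projection), σ (activation), U_r and U_i (expanded real and imaginary branches), K_r and K_i (real and imaginary depthwise kernels), V_r and V_i (depthwise outputs), and Y (block output) are all notation assumed here for illustration.

```latex
\begin{align}
U &= \sigma\bigl(P_{\mathrm{exp}}(X)\bigr), \qquad U = [\,U_r,\; U_i\,] \\
V_r &= K_r \circledast U_r - K_i \circledast U_i \\
V_i &= K_r \circledast U_i + K_i \circledast U_r \\
Y &= P_{\mathrm{proj}}\bigl(\mathrm{Concat}(V_r,\, V_i)\bigr)
\end{align}
```

Here ⊛ stands for depthwise (per-channel) convolution. The second and third lines have the same sign pattern as ordinary complex multiplication, so the real and imaginary branches remain coupled even though each channel is filtered independently.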
To further improve feature propagation and alleviate optimization difficulty, a residual version of the Complex-ConvBlock, termed Complex-ResBlock, is also used in CVDRNet. As shown in
Figure 3, the Complex-ResBlock introduces a shortcut connection between the input and output features:
where
denotes an identity mapping or a projection operation used to match the feature dimensions. This residual design helps preserve useful low-level vibration information and facilitates the training of deeper complex-valued networks. As a result, the lightweight complex-valued feature extraction module can effectively model cross-directional vibration coupling while maintaining a compact network structure suitable for few-shot fault diagnosis.
3.2. Parameter Count for Complex-ConvBlock
To further analyze the computational efficiency of the proposed lightweight complex-valued feature extraction module, the model size and computational cost of Complex-ConvBlock are discussed in this subsection. For a standard convolutional block with spatial size
,
c input channels,
output channels, and kernel size
, the number of multiply-add operations can be approximately calculated as
Although standard convolution has strong representational capacity, its computational cost increases rapidly with the number of input and output channels. This problem becomes more pronounced when complex-valued convolution is used, because the real and imaginary branches need to be processed and coupled simultaneously.
The proposed Complex-ConvBlock is designed as a lightweight replacement for standard complex convolution. Its detailed transformation process from
c input channels to
output channels with stride
s is shown in
Table 2. In the complex-valued representation, the input and output channels are divided into real and imaginary branches, i.e.,
and
. The expansion ratio
controls the number of intermediate channels in the Complex-ConvBlock.
As shown in
Table 2, the Complex-ConvBlock first partitions the input channels into the real and imaginary branches. Then, a
convolution followed by ReLU is used to expand the channel dimension from
to
. After that, a
complex depthwise convolution is performed for lightweight local feature filtering. Finally, another
convolution is adopted to project the feature map to
, and the real and imaginary branches are concatenated along the channel dimension to obtain the final output feature with
channels. Following the operation sequence in
Table 2, the multiply-add operations of Complex-ConvBlock can be approximately expressed as
where the three terms represent the computational contributions of the expansion, complex depthwise filtering, and projection stages, respectively, following the calculation rule adopted for the block. Compared with directly stacking standard complex convolutional layers, the use of depthwise convolution significantly reduces the computational burden while retaining the interaction between the real and imaginary branches. The main advantage of Complex-ConvBlock lies in reducing computational cost while maintaining sufficient representation capability. To evaluate this property,
Figure 4 compares the parameter count, memory cost, floating-point operations, multiply-add operations, and total memory requirement among Complex-ConvBlock, Complex-ResBlock, ResBlock, and ResCBlock.
Figure 4 shows that Complex-ConvBlock and Complex-ResBlock require fewer operations and lower memory consumption than ResBlock and ResCBlock under the same input conditions. This indicates that the proposed lightweight complex-valued design provides a more compact feature extractor for few-shot fault diagnosis, where over-parameterized models may easily overfit limited labeled fault samples.
The expansion ratio
controls the trade-off between computational cost and feature representation ability. A larger
increases the number of intermediate channels and may improve the expressive capacity of the network. However, it also leads to higher parameter count, memory usage, and computational cost. As shown in
Figure 4, the computational statistics of Complex-ConvBlock and Complex-ResBlock increase approximately linearly with the expansion ratio. Therefore, a moderate expansion ratio is preferred to balance diagnostic performance and model complexity. In this paper, the expansion ratio is set to
unless otherwise specified. This setting provides sufficient feature transformation capability while keeping the network compact for few-shot fault diagnosis.
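The cost comparison and the roughly linear dependence on the expansion ratio can be illustrated with a small counting sketch. It does not reproduce the paper's expressions. It assumes stride 1, a spatial size h × w, a k × k kernel, pointwise convolutions applied separately to the real and imaginary branches, and a complex depthwise stage realized as four real depthwise convolutions. The function names and the example sizes are hypothetical.

```python
def standard_conv_madds(h, w, c_in, c_out, k):
    """Multiply-adds of a standard k x k convolution (common counting convention)."""
    return h * w * c_in * c_out * k * k


def complex_convblock_madds(h, w, c_in, c_out, k, t):
    """Multiply-adds of an expansion / complex depthwise / projection block.

    Assumptions (for illustration only): each branch holds half of the
    channels, the expansion ratio t scales the per-branch width, and the
    complex depthwise stage costs four real depthwise convolutions.
    """
    half_in, half_out = c_in // 2, c_out // 2
    mid = t * half_in                              # expanded width per branch
    expansion = 2 * h * w * half_in * mid          # 1x1 conv on both branches
    depthwise = 4 * h * w * mid * k * k            # K_r and K_i on U_r and U_i
    projection = 2 * h * w * mid * half_out        # 1x1 conv on both branches
    return expansion + depthwise + projection


# Example: a 1 x 1024 feature map, 64 -> 64 channels, kernel size 3.
standard = standard_conv_madds(1, 1024, 64, 64, 3)
light_t2 = complex_convblock_madds(1, 1024, 64, 64, 3, t=2)
light_t4 = complex_convblock_madds(1, 1024, 64, 64, 3, t=4)
```

Under these assumptions every stage scales with the expanded width, so doubling the expansion ratio doubles the count, which matches the approximately linear trend described for Figure 4.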
3.3. Cosine-Based Disentangled Representation Loss
CVDRNet is optimized by jointly considering fault classification and feature disentanglement. For a direction-pair complex vibration sample
, the predicted class probability vector is obtained from the fault-sensitive representation:
The overall loss function consists of a classification loss and a cosine-based disentangled representation loss:
where
and
are weighting coefficients. The classification loss supervises fault prediction, while the cosine-based disentangled representation loss enhances the discriminability of fault-sensitive representations and suppresses the similarity between fault-sensitive and condition-related representations. For a mini-batch
, the classification loss is formulated as the cross-entropy between the predicted probability vector and the ground-truth label:
where
denotes the number of fault classes,
is the one-hot encoded label, and
is the predicted probability of the
c-th class.
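For reference, the mini-batch cross-entropy described above can be sketched in plain Python. The helper names are hypothetical, and integer class indices are used in place of one-hot vectors, which gives the same value because only the true-class term of the sum is non-zero.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one logit vector."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(batch_logits, batch_labels):
    """Mean cross-entropy over a mini-batch; labels are integer class indices."""
    losses = []
    for logits, label in zip(batch_logits, batch_labels):
        probs = softmax(logits)
        losses.append(-math.log(probs[label]))
    return sum(losses) / len(losses)


# Uniform logits over four classes give a loss of log(4).
uniform_loss = cross_entropy([[0.0, 0.0, 0.0, 0.0]], [2])
# A confident, correct prediction gives a much smaller loss than a confident, wrong one.
good_loss = cross_entropy([[5.0, 0.0, 0.0, 0.0]], [0])
bad_loss = cross_entropy([[5.0, 0.0, 0.0, 0.0]], [1])
```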
The classification loss mainly supervises the fault prediction results, but it does not explicitly separate fault-sensitive information from condition-related interference. Therefore, a cosine-based disentangled representation loss is further introduced. For metric calculation, the fault-sensitive and condition-related feature maps are first transformed into embedding vectors by global average pooling:
The cosine similarity between two embedding vectors
and
is defined as
As illustrated in
Figure 5, each mini-batch contains fault-sensitive and condition-related embeddings. For each anchor fault-sensitive embedding
, the positive and negative index sets are defined as
Here,
contains the indices of samples from the same fault class as the anchor sample, while
contains the indices of samples from different fault classes. In addition, condition-related embeddings are used to construct disentanglement constraints to reduce the correlation between
and
. The cosine-based disentangled representation loss is composed of a positive compactness term and a negative term that includes inter-class separation and feature disentanglement:
The positive compactness term is defined as
where
is a scaling factor and
is the similarity margin. This term pulls fault-sensitive embeddings from the same fault class closer in the embedding space. The negative term is defined as
where
is the disentanglement margin. The first part pushes fault-sensitive embeddings from different fault classes apart, while the second part suppresses the similarity between fault-sensitive and condition-related embeddings. With the Softplus-like formulation, the optimization strength is automatically adjusted according to the relative hardness of sample pairs. Hard positive pairs with low cosine similarity and hard negative pairs with high cosine similarity receive larger gradients. Meanwhile, the disentanglement term penalizes highly similar fault-sensitive and condition-related embeddings only when their cosine similarity exceeds the margin
. Therefore, the proposed loss provides richer pairwise supervisory information within each mini-batch. It improves intra-class compactness, inter-class separability, and feature disentanglement, thereby enabling CVDRNet to learn more robust fault-sensitive representations under few-shot conditions.
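The structure of such a loss can be sketched as follows. This is a stand-in under explicit assumptions, not the paper's exact formulation: it uses a multi-similarity-style Softplus form for the positive and negative terms, a hinge on the cosine similarity between each sample's own fault-sensitive and condition-related embeddings for the disentanglement part, and hypothetical parameter names and default values (`scale`, `margin`, `dis_margin`).

```python
import math


def global_average_pool(feature_map):
    """Average each channel (a list of values) to obtain one embedding vector."""
    return [sum(channel) / len(channel) for channel in feature_map]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def disentangled_loss(fault_emb, cond_emb, labels, scale=10.0, margin=0.5, dis_margin=0.0):
    """Softplus-like pairwise loss plus a hinge-style disentanglement penalty."""
    n = len(labels)
    total = 0.0
    for i in range(n):
        # Positive set: other samples of the same class. Negative set: other classes.
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        neg = [j for j in range(n) if labels[j] != labels[i]]
        # Low-similarity positives and high-similarity negatives dominate the sums.
        pos_sum = sum(math.exp(-scale * (cosine(fault_emb[i], fault_emb[j]) - margin)) for j in pos)
        neg_sum = sum(math.exp(scale * (cosine(fault_emb[i], fault_emb[j]) - margin)) for j in neg)
        # Penalize fault/condition similarity only above the disentanglement margin.
        dis = max(0.0, cosine(fault_emb[i], cond_emb[i]) - dis_margin)
        total += (math.log1p(pos_sum) + math.log1p(neg_sum)) / scale + dis
    return total / n


# Toy embeddings: two classes along different axes, condition embeddings along a third.
fault = [[1.0, 0.0, 0.0], [1.0, 0.01, 0.0], [0.0, 1.0, 0.0], [0.01, 1.0, 0.0]]
cond = [[0.0, 0.0, 1.0]] * 4
loss_clustered = disentangled_loss(fault, cond, [0, 0, 1, 1])
loss_mixed = disentangled_loss(fault, cond, [0, 1, 0, 1])
loss_entangled = disentangled_loss(fault, fault, [0, 0, 1, 1])
```

In this toy case the loss is smallest when same-class embeddings are aligned and the condition-related embeddings are orthogonal to them, and it grows when labels are mixed or when the two embedding types coincide.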
4. Experimental Results and Comparisons
In this section, extensive experiments are conducted on the Paderborn University (PU) bearing dataset [31] and a drivetrain dynamics simulator (DDS) bearing dataset to verify the effectiveness and generalization ability of the proposed method. The former provides real bearing vibration signals collected under different operating conditions, while the latter is used to further evaluate the diagnostic capability of the model in a controlled drivetrain simulation environment.
4.1. Data Preprocessing
The continuous vibration signals are segmented into fixed-length samples to construct the training and testing sets. Specifically, each raw signal is divided by a sliding window with a sample length of L and an overlap ratio of . Each segmented sample inherits the health-condition label of the original signal. To avoid information leakage, the training and testing sets are split before direction-pair augmentation, and the augmented samples generated from the same original segment are assigned to the same subset.
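The sliding-window segmentation can be sketched as below. The function name and the example window length and overlap ratio are illustrative assumptions, since the values used in the experiments are not restated here.

```python
def segment_signal(signal, window_length, overlap_ratio):
    """Split a 1-D signal into fixed-length windows with a given overlap ratio."""
    if not 0.0 <= overlap_ratio < 1.0:
        raise ValueError("overlap_ratio must lie in [0, 1)")
    # Hop size between consecutive window starts (at least one sample).
    step = max(1, int(round(window_length * (1.0 - overlap_ratio))))
    last_start = len(signal) - window_length
    return [signal[s:s + window_length] for s in range(0, last_start + 1, step)]


# Toy example: 10 samples, windows of 4 samples with 50% overlap.
segments = segment_signal(list(range(10)), window_length=4, overlap_ratio=0.5)
```

Every window would then inherit the health-condition label of the record it was cut from.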
To simulate a fixed limited-labeled-sample fault diagnosis scenario rather than a standard episodic N-way K-shot meta-learning protocol, only 30% of the available samples in each fault category are used for model training in all experiments, while the remaining 70% are used for testing. The training samples are randomly selected before direction-pair augmentation, ensuring that the augmented samples are generated only from the selected training subset. This setting restricts the amount of labeled training data and allows different methods to be trained and evaluated under the same data partition and noise conditions, thereby evaluating their ability to learn fault-discriminative representations from limited samples.
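A per-class 30%/70% partition performed before any augmentation can be sketched as follows. The helper name, the seed, and the toy labels are hypothetical, and the rounding rule for the training count is an assumption.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.3, seed=0):
    """Return (train_indices, test_indices) with the same fraction taken per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = max(1, round(train_fraction * len(indices)))
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)


# Toy labels: ten segments for each of two classes.
toy_labels = ["NC"] * 10 + ["IR"] * 10
train_idx, test_idx = stratified_split(toy_labels)
```

Direction-pair augmentation would then be applied only to the segments indexed by `train_idx`, so no augmented copy of a test segment can reach the training set.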
4.2. Implementation Details
All experiments were implemented in PyTorch 2.12.0 and executed on a workstation equipped with an NVIDIA 5060 Ti GPU and an Intel Core i7-14700K CPU. Unless otherwise specified, was adopted as the default backbone, and the expansion ratio in the Complex-ConvBlock was set to . The proposed model was trained in an end-to-end manner using the Adam optimizer with an initial learning rate of . The batch size was fixed at 256, and the training process was conducted for 50 epochs. In addition, a step-wise learning-rate decay strategy was adopted, in which the learning rate was multiplied by 0.01 every 20 epochs to facilitate convergence and mitigate overfitting. To reduce the influence of random initialization and random data partitioning, each experiment was independently repeated ten times. The final diagnostic results are reported as the mean value and standard deviation over the ten independent runs.
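The step-wise decay described above can be traced with a few lines of Python. The initial learning rate of 1e-3 is a placeholder assumption, not the value used in the experiments, and the function name is hypothetical. In PyTorch the same rule corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.01)`.

```python
def step_decay_lr(initial_lr, epoch, step_size=20, gamma=0.01):
    """Learning rate at a given epoch when it is multiplied by gamma every step_size epochs."""
    return initial_lr * gamma ** (epoch // step_size)


# Learning rate for each of the 50 training epochs (placeholder initial value).
schedule = [step_decay_lr(1e-3, epoch) for epoch in range(50)]
```

With 50 epochs the rate is reduced twice, at epochs 20 and 40, so the last ten epochs run at 1e-4 times the initial value.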
4.3. Dataset Description
Two bearing datasets are employed for performance evaluation, namely the PU-bearing dataset and a DDS bearing dataset. The PU dataset serves as a real-world bearing benchmark, whereas the DDS dataset provides a controlled drivetrain scenario for further validation. These two datasets offer complementary experimental conditions, covering real measured bearing vibration signals and simulator-based drivetrain fault cases.
The PU-bearing dataset was collected from the bearing test rig shown in
Figure 6. The platform is designed to record bearing signals under different health conditions and operating conditions. It provides vibration responses of healthy and faulty bearings, together with operating parameters such as rotational speed, load torque, and radial load. In this study, vibration signals are selected as the diagnostic data source. The dataset contains normal bearings and faulty bearings with different localized defect characteristics. Since the signals are collected from a physical bearing test platform, the PU dataset is suitable for evaluating the diagnostic performance of the proposed method under real measurement conditions and varying operating environments.
The DDS bearing dataset was established using the drivetrain dynamics simulator rig shown in
Figure 7. The simulator is constructed to reproduce representative operating conditions of rotating machinery and to support the analysis of bearing fault dynamics in drivetrain systems. The platform mainly consists of a driving motor, a rolling bearing support unit, an inertial flywheel, a gearbox transmission module, and a magnetic powder brake. The driving motor provides rotational drive at preset speeds of 1000, 1500, and 2000 r/min, while the magnetic powder brake is used to apply different mechanical loads. The flywheel is introduced to emulate inertial effects during drivetrain operation, and the gearbox module forms the transmission path through which fault-induced vibration propagates. All signals are sampled at 40 kHz to capture transient impacts and dynamic characteristics caused by bearing defects. The DDS dataset includes four health categories: normal condition (NC), inner race fault (IR), outer race fault (OR), and rolling element fault (RB). For each fault type, three artificial defect diameters are considered, namely 0.5 mm, 1.0 mm, and 2.0 mm. Nine operating conditions are formed by combining three rotational speeds with three load levels of 0, 2, and 4 hp.
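The nine operating conditions follow directly from the Cartesian product of the listed speeds and loads, as this small sketch shows (the variable names are illustrative).

```python
from itertools import product

speeds_rpm = (1000, 1500, 2000)   # preset motor speeds in r/min
loads_hp = (0, 2, 4)              # load levels in hp

# Every (speed, load) combination is one operating condition.
operating_conditions = list(product(speeds_rpm, loads_hp))
```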
4.4. Comparison with Other Methods on the PU-Bearing Dataset
To evaluate the diagnostic performance of the proposed method on real measured bearing signals, several representative methods are selected for comparison on the PU-bearing dataset, including CLFormer [32], DRSN [33], SepFormer [34], ResNetTh [35], and TST [36]. Although ResNetTh was originally developed for DAB converter fault diagnosis, it is used here as an architecture-level baseline because its residual threshold-denoising structure is relevant to supervised one-dimensional fault classification under noisy conditions. The comparison is conducted under four noise levels, namely 20 dB, 24 dB, 28 dB, and 32 dB. The diagnostic accuracy and the number of trainable parameters are reported in Table 3.
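The noise-injection procedure is not detailed here. A common way to obtain a prescribed noise level, assuming the dB values denote signal-to-noise ratio and the noise is additive white Gaussian, is sketched below. The function names, the test tone, and the seed are illustrative assumptions.

```python
import math
import random


def add_awgn(signal, snr_db, rng=None):
    """Add white Gaussian noise so that the expected SNR equals snr_db."""
    rng = rng or random.Random()
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measure_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_energy = sum(x * x for x in clean)
    noise_energy = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10.0 * math.log10(signal_energy / noise_energy)


# Toy check on a sinusoid: the measured SNR should be close to the 20 dB target.
clean = [math.sin(2.0 * math.pi * 0.01 * n) for n in range(20000)]
noisy = add_awgn(clean, 20.0, random.Random(0))
measured = measure_snr_db(clean, noisy)
```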
As shown in Table 3, the proposed CVDRNet variants consistently outperform the compared methods under all noise levels. Among the baseline methods, SepFormer achieves the best overall performance, with accuracies of 83.21%, 88.38%, 93.26%, and 93.72% under 20 dB, 24 dB, 28 dB, and 32 dB, respectively. This indicates that transformer-based feature modeling is effective for bearing fault diagnosis under noisy conditions. However, all CVDRNet variants achieve higher diagnostic accuracy than SepFormer, demonstrating the advantage of the proposed complex-valued disentangled representation learning framework. Specifically, the lightweight CVDRNet variant achieves accuracies of 85.14%, 91.66%, 93.73%, and 95.66% under the four noise levels, respectively. Compared with SepFormer, it improves the average accuracy from 89.64% to 91.55% while using only 1.4 M parameters. This result shows that the direction-pair complex representation and lightweight complex-valued convolution can enhance fault-sensitive feature extraction even with a relatively compact network structure.
The best-performing CVDRNet variant on the PU-bearing dataset reaches accuracies of 91.84%, 96.06%, 98.03%, and 97.94% under 20 dB, 24 dB, 28 dB, and 32 dB, respectively. Compared with the strongest baseline, SepFormer, it improves the average accuracy by approximately 6.33 percentage points. The improvement is larger under stronger noise interference. For example, under the 20 dB condition, it exceeds SepFormer by 8.63 percentage points, which indicates that the proposed method has stronger noise robustness.
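The averages and percentage-point gains quoted above follow directly from the per-noise-level accuracies. The short check below recomputes them from the values given in the text; the variable names `lightweight` and `best_variant` are placeholders for the corresponding CVDRNet variants.

```python
# Accuracies (%) on the PU-bearing dataset at 20, 24, 28, and 32 dB, as quoted in the text.
sepformer = [83.21, 88.38, 93.26, 93.72]
lightweight = [85.14, 91.66, 93.73, 95.66]
best_variant = [91.84, 96.06, 98.03, 97.94]


def mean(values):
    """Arithmetic mean over the four noise levels."""
    return sum(values) / len(values)


def gain_pp(model, baseline):
    """Difference of average accuracies, in percentage points."""
    return mean(model) - mean(baseline)


print("SepFormer average:", round(mean(sepformer), 2))
print("Lightweight variant average:", round(mean(lightweight), 2))
print("Best variant gain (pp):", round(gain_pp(best_variant, sepformer), 2))
print("Best variant gain at 20 dB (pp):", round(best_variant[0] - sepformer[0], 2))
```

The per-level differences shrink from 8.63 points at 20 dB to 4.22 points at 32 dB, which is the pattern behind the statement that the gain is larger under stronger noise.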
The superior low-SNR performance is mainly attributed to representation-level noise suppression. The direction-pair complex representation exploits complementary cross-directional information, since fault-induced responses are more likely to show consistent coupling across vibration directions than random noise. The complex-valued convolution further models the interaction between the real and imaginary branches, thereby enhancing coupled fault-sensitive features. In addition, the cosine-based disentangled representation loss improves intra-class compactness and inter-class separability, making the learned fault-sensitive embeddings less affected by noise-induced perturbations.
It can also be observed that increasing the embedding dimension does not always lead to consistent performance improvement. A variant with a larger embedding dimension has slightly more parameters than its smaller-embedding counterparts, but its accuracy is not consistently higher under all noise levels. In contrast, increasing the network depth from the shallower to the deeper variant brings a more evident improvement in most cases. This suggests that deeper complex-valued residual feature extraction is more beneficial than simply increasing the embedding dimension for the PU-bearing dataset.
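To make the real-imaginary interaction concrete, the sketch below builds a direction-pair complex sequence from two toy vibration channels and applies a one-dimensional complex-valued convolution (implemented as a correlation, as is usual in deep learning). It is a minimal illustration in plain Python and not the network's actual layer; all function names and the toy values are assumptions.

```python
def direction_pair(real_axis, imag_axis):
    """Combine two directional vibration channels into one complex sequence."""
    return [complex(r, i) for r, i in zip(real_axis, imag_axis)]


def correlate(x, w):
    """Valid-mode 1-D correlation of a sequence with a kernel."""
    k = len(w)
    return [sum(x[n + j] * w[j] for j in range(k)) for n in range(len(x) - k + 1)]


def complex_conv_split(xr, xi, wr, wi):
    """Complex convolution written as four real ones, exposing the cross terms.

    (xr + i*xi) * (wr + i*wi) = (xr*wr - xi*wi) + i*(xr*wi + xi*wr)
    """
    real = [a - b for a, b in zip(correlate(xr, wr), correlate(xi, wi))]
    imag = [a + b for a, b in zip(correlate(xr, wi), correlate(xi, wr))]
    return real, imag


# Toy samples from two vibration directions and a toy complex kernel.
x_dir = [1.0, 2.0, 0.0, -1.0]
y_dir = [0.5, -1.0, 1.5, 0.0]
w_real, w_imag = [1.0, 0.5], [-0.5, 2.0]

z = direction_pair(x_dir, y_dir)
kernel = [complex(a, b) for a, b in zip(w_real, w_imag)]
direct = correlate(z, kernel)  # native complex arithmetic
split_real, split_imag = complex_conv_split(x_dir, y_dir, w_real, w_imag)
print(direct)
print(split_real, split_imag)
```

Each output branch mixes both input directions: the real part contains a term from the imaginary-branch input, and the imaginary part contains a term from the real-branch input. Two independent real-valued convolutions would not produce this coupling.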
4.5. Comparison with Other Methods on the DDS Bearing Dataset
To further evaluate the robustness and generalization ability of the proposed method under controlled drivetrain operating conditions, comparative experiments are conducted on the DDS bearing dataset. Five representative fault diagnosis methods are selected as baseline models. The experiments are performed under four noise levels (20 dB, 24 dB, 28 dB, and 32 dB). The diagnostic accuracy and the number of trainable parameters are listed in Table 4.
As shown in Table 4, the proposed CVDRNet variants achieve consistently better performance than the compared methods under all noise levels. Among the baseline models, SepFormer obtains the highest overall accuracy, reaching 86.69%, 89.06%, 92.55%, and 93.26% under 20 dB, 24 dB, 28 dB, and 32 dB, respectively. This indicates that SepFormer has a relatively strong capability for extracting fault-related features from noisy vibration signals. However, all CVDRNet variants outperform SepFormer across the four noise levels, which demonstrates the effectiveness of the proposed complex-valued feature extraction and disentangled representation learning strategy. Specifically, the lightweight CVDRNet variant achieves accuracies of 89.18%, 92.44%, 94.87%, and 96.94% under the four noise levels, respectively. Compared with SepFormer, its average accuracy increases from 90.39% to 93.36%, while the number of parameters is reduced from 2.1 M to 1.4 M. This result suggests that the direction-pair complex representation can enhance diagnostic performance without relying on a large-scale network structure. Therefore, even a compact CVDRNet variant can effectively capture fault-sensitive information from multi-directional vibration responses.
The CVDRNet variant with the highest average accuracy on the DDS bearing dataset reaches diagnostic accuracies of 92.66%, 97.16%, 96.78%, and 98.73% under 20 dB, 24 dB, 28 dB, and 32 dB, respectively. Compared with the strongest baseline, SepFormer, it improves the average accuracy by approximately 5.94 percentage points. In particular, under the 24 dB noise condition, the improvement reaches 8.10 percentage points, indicating that the proposed model has strong robustness to noise interference in drivetrain fault diagnosis.
It can also be observed that different CVDRNet variants show slight performance differences. One variant achieves the best result under the 28 dB condition, with an accuracy of 97.73%, while another obtains the best results under 20 dB, 24 dB, and 32 dB. This indicates that increasing the network depth generally improves the feature representation capability, but the optimal architecture may vary slightly under different noise levels. In contrast, simply increasing the embedding dimension from 32 to 64 or 256 does not always lead to consistent improvement. For example, the variants with embedding dimensions of 64 and 256 have slightly more parameters than the variant with an embedding dimension of 32, but their accuracies are not consistently higher. This suggests that an excessively large embedding dimension may introduce redundant parameters and does not necessarily improve generalization.
4.6. Feature Disentanglement Visualization
To further analyze the effectiveness of the proposed disentangled representation learning strategy, t-SNE visualization is used to project the learned high-dimensional embeddings into a two-dimensional space. Figure 8 shows the visualization results of the fault-sensitive embeddings and the condition-related embeddings.
As shown in Figure 8a, the fault-sensitive embeddings form compact and distinguishable clusters. Samples belonging to the same fault class are closely distributed, while samples from different fault classes are clearly separated in the embedding space. This indicates that the FaultBlock can effectively preserve discriminative fault-related information. The compact intra-class distribution also suggests that the proposed cosine-based disentangled representation loss helps reduce feature dispersion within the same fault category. Meanwhile, the clear inter-class boundaries show that the learned fault-sensitive representations are suitable for fault classification with limited labeled samples. In contrast, Figure 8b shows the distribution of the condition-related embeddings. Compared with the fault-sensitive embeddings, the condition-related embeddings are more dispersed and do not form clear class-wise clusters. Samples from different fault categories are highly mixed in the projected space. This phenomenon suggests that the CondBlock mainly captures non-fault-dominant information, such as operating-condition variations, direction-related responses, background vibration, and other interference components. Since these representations do not show obvious fault-discriminative structures, they appear to be effectively separated from the fault-sensitive feature space.
The comparison between Figure 8a and Figure 8b supports the feature disentanglement capability of CVDRNet. The proposed framework encourages fault-related information to be concentrated in the fault-sensitive embeddings, while condition-related interference is absorbed by the condition-related embeddings. Therefore, the model can learn more robust and discriminative fault-sensitive representations, which contributes to the improved diagnostic performance under few-shot and noisy conditions.
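The qualitative reading of the t-SNE plots can be complemented by a simple numerical check of intra-class compactness and inter-class separability. The sketch below is not part of the reported procedure: it assumes cosine similarity as the measure, uses made-up two-dimensional embeddings, and the function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def class_similarity(embeddings, labels):
    """Mean cosine similarity over same-class pairs and over different-class pairs."""
    intra, inter = [], []
    for (u, lu), (v, lv) in combinations(list(zip(embeddings, labels)), 2):
        (intra if lu == lv else inter).append(cosine(u, v))
    return sum(intra) / len(intra), sum(inter) / len(inter)


labels = [0, 0, 1, 1]
# Toy "fault-sensitive" embeddings: samples of the same class point the same way.
fault_like = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
# Toy "condition-related" embeddings: directions are unrelated to the fault label.
cond_like = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.2, 0.9]]

print("fault-like (intra, inter):", class_similarity(fault_like, labels))
print("condition-like (intra, inter):", class_similarity(cond_like, labels))
```

For embeddings that behave like those in Figure 8a, the intra-class value is expected to be clearly higher than the inter-class value. For embeddings like those in Figure 8b, the two values should be close, or even reversed as in this toy case.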
4.7. Ablation Study
4.7.1. Block Design Ablation
To investigate the influence of different convolutional block designs, three variants of CVDRNet are compared, including CVDRNet with ResBlock, CVDRNet with Complex-ConvBlock, and CVDRNet with Complex-ResBlock. The experimental results under different SNRs are reported in Table 5. Among the three block designs, CVDRNet with Complex-ResBlock achieves the best average performance. Its Top-1 accuracies reach 88.52%, 93.69%, 95.78%, and 96.98% under the four SNR conditions. Compared with the conventional ResBlock, Complex-ResBlock improves the average accuracy from 93.04% to 93.74%. The improvement is especially evident under the low-SNR condition of 16 dB, where the accuracy increases by 2.37 percentage points. This result suggests that the residual complex-valued structure is beneficial for improving feature propagation and enhancing the robustness of the model under strong noise interference.
4.7.2. Loss Function Ablation
To verify the contribution of the proposed cosine-based disentangled representation loss, two loss configurations are compared, namely the conventional classification loss alone and the joint loss, which adds the disentangled representation loss to the classification loss. The experimental results under different SNRs are shown in Table 6. The gain from the joint loss is particularly significant under stronger noise interference. Under the 16 dB condition, the accuracy increases by 6.72 percentage points after adding the disentangled representation loss. Under the 20 dB and 24 dB conditions, the improvements are 4.47 and 3.87 percentage points, respectively. This demonstrates that the proposed loss can enhance the robustness of the learned representation when fault-related components are weakened by noise. In contrast, under the 32 dB condition, the improvement is smaller because the fault-sensitive information is easier to identify when the noise level is lower.
4.7.3. Hyperparameter Sensitivity of the Disentangled Representation Loss
The scaling and margin hyperparameters of the cosine-based disentangled representation loss determine its optimization behavior. To investigate their influence on diagnostic performance, a sensitivity analysis is conducted on the PU-bearing dataset under the 32 dB SNR condition using Top-1 accuracy as the evaluation metric. As shown in Figure 9, the Top-1 accuracy changes noticeably with different combinations of the two hyperparameters, and a high-performance region can be observed in the parameter space. This indicates that the strength of the disentanglement constraint directly affects the learned representation. When either hyperparameter deviates from the suitable range, the diagnostic accuracy decreases. A weak constraint may fail to enhance the discriminability of fault-sensitive embeddings. Conversely, an overly strong constraint may interfere with classification-oriented feature learning. Overall, the sensitivity analysis shows that the effectiveness of the loss depends on a proper balance between scaling and margin constraints. Therefore, the selected hyperparameter combination is adopted in the reported experiments to balance fault-sensitive feature discrimination and condition-related feature disentanglement.
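To illustrate how a scaling factor and a margin shape a cosine-based objective, the sketch below implements a generic large-margin cosine loss in the style of CosFace. This form is an assumption used for illustration and is not necessarily the loss proposed in this paper; the function names, the class-center formulation, and the default values of `scale` and `margin` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def margin_cosine_loss(embedding, class_centers, label, scale=16.0, margin=0.2):
    """Softmax cross-entropy over scaled cosines, with a margin on the true class."""
    logits = []
    for j, center in enumerate(class_centers):
        c = cosine(embedding, center)
        if j == label:
            c -= margin  # the true-class cosine must exceed the others by the margin
        logits.append(scale * c)
    peak = max(logits)  # subtract the maximum for a numerically stable log-sum-exp
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[label]


centers = [[1.0, 0.0], [0.0, 1.0]]  # toy class directions
sample = [0.9, 0.1]                 # toy embedding close to class 0

# Small sweep over the two hyperparameters, mirroring a sensitivity analysis.
for scale in (4.0, 16.0, 64.0):
    for margin in (0.0, 0.2, 0.5):
        loss = margin_cosine_loss(sample, centers, 0, scale, margin)
        print(f"scale={scale:5.1f} margin={margin:.1f} loss={loss:.6f}")
```

In this generic form, a larger margin keeps the loss above zero until the true-class cosine leads by a wider gap, while the scale controls how sharply violations are penalized. This matches the qualitative trade-off described above between a constraint that is too weak and one that is too strong.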
5. Conclusions
This paper proposes a complex-valued disentangled representation network, termed CVDRNet, for few-shot fault diagnosis of rotating machinery. A direction-pair complex representation strategy is designed for triaxial vibration signals, where two selected directional components are assigned to the real and imaginary branches. This strategy increases sample diversity and enables cross-directional dynamic coupling to be modeled under limited labeled samples.
A lightweight complex-valued feature extraction module is further developed. By combining complex-valued convolution with depthwise convolution, the proposed Complex-ConvBlock enhances real-imaginary branch interaction while reducing computational cost. The Complex-ResBlock improves feature propagation through residual learning. In addition, a cosine-based disentangled representation loss is introduced to enhance intra-class compactness and inter-class separability of fault-sensitive embeddings while suppressing condition-related interference.
Experiments on the PU and DDS bearing datasets show that CVDRNet consistently outperforms representative methods when only 30% of samples are used for training. The improvement is not solely due to increased model size: the lightweight CVDRNet variant contains only 1.4 M parameters, fewer than SepFormer with 2.1 M parameters, yet still achieves higher average accuracy on both datasets. For industrial applications, larger CVDRNet variants are suitable for offline or server-side diagnosis, while the lightweight variant offers a more practical balance between accuracy and model size for resource-constrained deployment.
Visualization and ablation studies further verify the effectiveness of the complex-valued block design and the disentangled representation loss. Future work will focus on adaptive direction selection, lightweight deployment, and cross-condition domain generalization, especially for cases with large training-test distribution gaps caused by wide rotational speed or load variations.