1. Introduction
With the advancement of high-speed rotating machinery toward the era of intelligent manufacturing, the condition monitoring and intelligent fault diagnosis of key transmission components, such as rolling bearings, have become essential techniques for ensuring the reliable operation of transmission systems [1]. In modern industrial applications such as aeroengines, new energy vehicles, and wind turbines, the reliability of rolling bearings critically influences the overall performance and operational safety of mechanical systems. Within these critical transmission systems, severe bearing failures may lead to unplanned downtime, substantial maintenance costs, and even serious safety incidents [2].
In high-speed transmission equipment featuring adequate lubrication and high machining precision, rolling bearings gradually develop surface contact fatigue damage with prolonged operation. The accumulated damage manifests as abnormal vibration and noise, leading to progressive performance degradation and eventual failure [3]. Therefore, accurate monitoring of the real-time status of rolling bearings is essential for timely maintenance and risk prevention [4]. By analyzing real-time sensor data, intelligent fault diagnosis can assess the operational status of bearings and issue early warnings for necessary maintenance or component replacement. This capability significantly enhances industrial equipment utilization efficiency and reduces maintenance costs, leading to its widespread application in industrial health monitoring systems [5].
Fault data in rotating machinery are often characterized by high uncertainty, data imbalance, difficulty in data acquisition, and complex multi-source signals, which together constitute a typical few-shot problem. Consequently, there is a pressing need for intelligent, efficient, and reliable fault diagnosis models capable of operating with few-shot training. It is within this engineering context that deep learning-based intelligent diagnostics have garnered attention, due to their capabilities in addressing such challenges [6]. Currently, fault diagnosis models for rotating mechanical components can be broadly divided into model-based and data-driven methods [7]. Model-based fault diagnosis approaches typically require substantial prior knowledge to develop accurate fault mechanistic models. For instance, Bera et al. [8] proposed an adaptive model-based method to evaluate imbalance faults in rotor-bearing systems. Zhao et al. [9] addressed the limited-data problem in bearing diagnosis by augmenting and reconstructing the dataset, employing a Support Vector Machine (SVM) for fault classification. Although their model achieved high accuracy, its diagnostic capability under non-stationary conditions remains limited. Due to the structural complexity, highly variable operating conditions, and strong environmental disturbances in rotating machinery, traditional model-based methods often fail to meet practical diagnostic requirements [10].
In contrast, data-driven fault diagnosis techniques, which require minimal reliance on precise analytical models or expert knowledge, have emerged as the predominant trend in the intelligent development of modern rotating machinery diagnosis [11].
Data-driven approaches, particularly deep learning and reinforcement learning, have been extensively applied to bearing fault diagnosis. Prominent techniques include convolutional neural networks (CNNs) [12], generative adversarial networks (GANs) [13], transfer learning (TL) [14], and manifold learning [15]. These methods excel at capturing critical information from datasets and establishing mappings between raw sensory data and discriminative features.
CNN models have been extensively utilized in fault diagnosis for high-speed rotating machinery, owing to their local connectivity and weight-sharing properties [16]. Ongoing research has focused on enhancing CNN architectures to enable direct input of raw vibration signals for end-to-end intelligent diagnosis. Zhang et al. [17] developed a deep CNN with wide convolutional kernels based on domain adaptation principles, achieving robust fault diagnosis under varying operating conditions using raw bearing vibration signals. Similarly, Jiang et al. [18] proposed an intelligent diagnosis model that systematically extracts discriminative fault characteristics from the raw vibration signals of wind turbine gearboxes. Meanwhile, attention mechanisms have been incorporated into recent CNNs, allowing models to emphasize the most informative features. Wang et al. [19] designed an attention-based multi-layer fusion CNN (AMFCNN) to construct a lightweight multi-sensor gear fault diagnosis model. Zhou et al. [20] proposed a frequency-domain attention CNN to mitigate environmental noise interference. Data-driven fault diagnosis models, however, inherently depend on a substantial quantity of labelled samples [21], whereas the operational conditions of high-speed rotating machinery are highly variable and complex, making it difficult to acquire sufficient bearing fault data [22]. Consequently, relying solely on CNN models for fault diagnosis carries a high risk of overfitting. This data scarcity hinders the model’s ability to generalize, compromising diagnostic accuracy and failing to meet monitoring requirements.
To address the issue of insufficient sample size, the latest data-driven fault diagnosis models have been optimized at the algorithm level and the data level [23]. A representative algorithm-level method is TL [24]. To address the challenge of data scarcity, Ai et al. [25] pioneered a TL framework based entirely on simulated data. Their approach employs a domain-invariant transformation technique to enable effective feature sharing between simulation and real-world domains. Wang et al. [26] proposed a rolling bearing fault diagnosis method that uses improved alternating TL to transfer from a dynamic simulation model in the source domain to the target domain. However, the performance of TL strongly depends on the availability of sufficient source-domain samples. When the source data are limited, the model’s generalization ability in the target domain is reduced [27].
Data-level approaches enhance training data diversity by transforming existing samples, but they must be tailored to the specific signal characteristics of the application scenario [28]. Among these, GAN models are particularly effective, as they can generate new samples while learning the intrinsic data distribution. A GAN comprises a generator, which produces synthetic data, and a discriminator, which distinguishes between real and generated samples. These two components are trained adversarially until they reach a Nash equilibrium [9]. Huang et al. [29] employed an enhanced GAN to improve diagnostic accuracy for wind turbine gearboxes under actual in-service conditions. Miao et al. [30] proposed a data augmentation strategy based on an improved variational autoencoder GAN. Zhuang et al. [31] developed a two-stage feature extractor with residual attention mechanisms and corresponding GAN-based generative modules. Owing to the nonlinear and non-stationary nature of vibration signals, the generator often struggles to accurately capture the underlying distribution of raw data. Furthermore, the generator itself exhibits deficiencies in its capacity for feature extraction and instability in its model framework [32].
The few-shot diagnosis of rolling bearings in high-speed rotating machinery faces dual challenges: the degraded accuracy of a standalone CNN due to insufficient training samples and the limited feature representation capability of GAN. To address these challenges, this study introduces a novel few-shot intelligent diagnosis framework that synergistically integrates GAN with the convolutional block attention module (CBAM). The specifics of this framework are detailed as follows:
- (1) The limited and imbalanced fault data were augmented using a GAN with the Wasserstein distance loss function, thereby generating a diverse training dataset with increased sample sizes for each bearing fault category to resolve the data imbalance problem.
- (2) To address the challenge of few-shot fault diagnosis, a novel CNN architecture incorporating a convolutional block attention module (CBAM-CNN) was proposed. This model was designed to fuse convolutional operations with an attention mechanism, facilitating a focus on salient feature regions and the capture of both local and global dependencies.
- (3) This study mainly focused on the pitting failure of rolling bearings in a typical high-speed rotating machine, specifically the two-speed automatic mechanical transmission (2AMT). Experimental results confirm that CBAM-CNN delivers high-precision fault identification and diagnosis even under few-shot conditions.
The proposed method synergistically combines the strengths of the CBAM and GAN models, thereby effectively compensating for their respective limitations. The novel method has the potential to reduce the reliance of fault diagnosis models on large-scale datasets, thereby enabling effective fault diagnosis of rolling bearings even with limited sample sizes.
4. Few-Shot Intelligent Diagnosis Model Based on CBAM
The data generated by the GAN alleviate the shortage of usable fault data caused by the short fault-evolution duration in rotating machinery bearings. To further strengthen the model’s capability in identifying and extracting salient features, the CBAM-CNN architecture is proposed.
4.1. Model Structural Parameters
The architecture of the proposed CBAM-CNN model is presented in Figure 10. For few-shot real data, large convolution kernels are initially employed to extract the global characteristics of the raw vibration data, followed by the global average pooling layer for feature compression. Subsequently, the CBAM is incorporated into the feature extraction stage to enhance feature representation. Thereafter, a 1 × 1 convolution is applied to fuse and calibrate the features derived from both real and generated samples.
The detailed configuration of the model’s structure and parameters is listed in Table 4. The input sample size is 24 × 24, and the network includes two convolutional layers. The first layer adopts a large convolution kernel of 11 × 11 with eight channels, while the second layer employs a 6 × 6 kernel with the channel number expanded to 24. The use of large convolution kernels provides a wide receptive field, covering larger regions of the input data and allowing the model to capture large-scale features while reducing the need for deeper convolutional layers. The pooling layers utilize 2 × 2 average pooling to accomplish dimensionality reduction and feature aggregation, thereby enhancing the model’s ability to emphasize dominant trends in fault-related features.
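The layer sizes above can be traced with a short shape calculation. The following is only an illustrative sketch: the stride of 1, the absence of padding, the single input channel, and the placement of one 2 × 2 average-pooling layer after each convolution are assumptions, since this section states only the kernel sizes and channel counts (Table 4 remains the authoritative configuration). The function names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Side length after a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(input_size=24):
    """Return (layer, side length, channels) for the assumed layer stack."""
    shapes = [("input", input_size, 1)]          # assumption: one input channel
    size = conv_out(input_size, kernel=11)       # conv1: 11 x 11 kernel, 8 channels
    shapes.append(("conv1", size, 8))
    size = conv_out(size, kernel=2, stride=2)    # assumed 2 x 2 average pooling
    shapes.append(("pool1", size, 8))
    size = conv_out(size, kernel=6)              # conv2: 6 x 6 kernel, 24 channels
    shapes.append(("conv2", size, 24))
    size = conv_out(size, kernel=2, stride=2)    # assumed 2 x 2 average pooling
    shapes.append(("pool2", size, 24))
    return shapes


for name, side, channels in trace_shapes():
    print(f"{name}: {side} x {side} x {channels}")
```

Under these assumptions the side length shrinks as 24, 14, 7, 2, 1, so the two large kernels alone reduce a 24 × 24 input to a 1 × 1 × 24 descriptor, which is consistent with the remark that large kernels reduce the need for deeper convolutional layers. With padding or different strides the intermediate sizes would differ.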
4.2. 2AMT Bearing Failure Diagnosis Experiment
The target bearing application is associated with the 2AMT used in pure electric vehicles, as shown in Figure 11a. The 2AMT system consists of three shafts, front and rear housings, gears, and rolling bearings.
The bearing highlighted in Figure 11a serves as the target bearing in the experimental setup, and the corresponding sensor arrangement is shown in Figure 11b. The experimental bearings are deep groove ball bearings. To establish a comprehensive fault dataset, six types of defective bearings were fabricated using laser machining technology. The surface pitting faults are indicated by the red circles in Figure 12. These include single faults occurring on the inner ring, outer ring, and rolling element, as well as compound faults combining the inner and outer rings, the inner ring and rolling element, and the outer ring and rolling element. The pitting diameter is 0.53 mm for single faults and 0.18 mm for compound faults.
Vibration data were collected using the LMS Test Lab system. Vibration signals were acquired for seven bearing types: one healthy bearing, three single pitting fault types, and three compound pitting fault types. Each bearing type was tested for 30 s at a sampling frequency of 16,384 Hz. For vibration-signal analysis, the axial vibration signals were selected under an operating condition of 5000 r/min and 32 N·m. This experimental condition was established based on the China Light-Duty Vehicle Test Cycle (CLTC-P) and formulated according to the comprehensive shift logic of the 2AMT together with the operating speed-torque range of the second gear. Partial vibration acceleration signal segments for the seven bearing types are presented in Figure 13.
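The acquisition settings above imply some useful orders of magnitude. The sketch below works them out under two stated assumptions that are not confirmed in this section: that each 24 × 24 input sample is built from 576 consecutive signal points, and that 5000 r/min is the rotational speed of the shaft carrying the target bearing.

```python
SAMPLING_RATE_HZ = 16_384        # from the text
RECORD_SECONDS = 30              # from the text
SHAFT_SPEED_RPM = 5000           # from the text; assumed to be the bearing shaft speed
POINTS_PER_SAMPLE = 24 * 24      # assumption: one 24 x 24 sample = 576 consecutive points

# Total number of signal points recorded for one bearing type.
points_per_record = SAMPLING_RATE_HZ * RECORD_SECONDS

# Signal points captured during one shaft revolution.
points_per_revolution = SAMPLING_RATE_HZ / (SHAFT_SPEED_RPM / 60)

# Shaft revolutions covered by a single input sample.
revolutions_per_sample = POINTS_PER_SAMPLE / points_per_revolution

# Number of non-overlapping samples available in one 30 s record.
non_overlapping_samples = points_per_record // POINTS_PER_SAMPLE

print(points_per_record, round(points_per_revolution, 1),
      round(revolutions_per_sample, 2), non_overlapping_samples)
```

Under these assumptions, one record holds 491,520 points, one revolution spans about 196.6 points, each sample covers just under three revolutions, and 853 non-overlapping samples fit in a record, which would be enough to draw the 200 samples per condition used later without overlapping windows.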
4.3. Diagnostic Results for Rolling Bearing Few-Shot Dataset
To validate the CBAM-based few-shot intelligent diagnostic model, the 2AMT single-point pitting bearing dataset was first analyzed. Each fault condition contained 200 samples, and three pitting fault types were considered: inner ring, rolling element, and outer ring. For each fault type, 10~50% of the samples were randomly selected as real data, while the remaining samples were generated using the GAN to form a hybrid dataset, as summarized in Table 5. The four bearing conditions (normal, inner ring pitting, ball pitting, and outer ring pitting) were labeled as N, IP, BP, and OP, respectively.
Fifteen datasets were constructed based on the three pitting fault types (outer ring, inner ring, and rolling element) under different real-data proportions, and multiple diagnostic experiments were performed for each case. The overall distribution of recognition accuracies is shown in Figure 14. When the rolling element pitting data were used as few-shot data, the recognition accuracy for all data proportions exceeded 96.32%, representing a significant improvement compared with the RVDCNN baseline. Furthermore, within each fault category, the recognition accuracy consistently increased as the proportion of real samples increased.
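The composition of the fifteen hybrid datasets can be enumerated as follows. This is a minimal sketch, assuming the real-data proportion is varied in 10% steps from 10% to 50% (which, with three few-shot fault types, yields the fifteen cases mentioned above); Table 5 gives the actual split, and the function name is hypothetical.

```python
SAMPLES_PER_CLASS = 200                     # from the text
FEW_SHOT_CLASSES = ("OP", "IP", "BP")       # fault labels from the text
REAL_PERCENTAGES = (10, 20, 30, 40, 50)     # assumption: 10% steps


def hybrid_composition():
    """Map (few-shot class, real %) to (real samples, GAN-generated samples)."""
    plan = {}
    for label in FEW_SHOT_CLASSES:
        for percent in REAL_PERCENTAGES:
            n_real = SAMPLES_PER_CLASS * percent // 100
            plan[(label, percent)] = (n_real, SAMPLES_PER_CLASS - n_real)
    return plan


plan = hybrid_composition()
print(len(plan))            # number of hybrid datasets
print(plan[("BP", 10)])     # real vs. generated samples at the 10% level
```

At the most data-starved setting, the few-shot class therefore contains only 20 real samples, with the remaining 180 supplied by the GAN.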
The mean and standard deviation of the ten repeated diagnostic results are summarized in Table 6. Under the same few-shot ratio, increasing the severity of pitting faults enhanced the fault-related features in vibration signals, thereby improving local feature learning through the CBAM. When the proportion of real data reached 50%, the average diagnostic accuracies for the outer ring, inner ring, and rolling element pitting faults were 99.26%, 99.46%, and 99.54%, respectively, indicating that the model could accurately identify fault types under this condition.
When the proportion of real data was limited to 10%, confusion matrices were employed to characterize the diagnostic performance across the three pitting fault types, as shown in Figure 15. In these matrices, the horizontal and vertical axes correspond to the predicted and actual labels, respectively, and the matrix values represent the normalized proportion of predicted samples.
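For readers unfamiliar with this layout, the sketch below builds such a matrix with rows as actual labels and columns as predicted labels. Row-wise normalization (each actual class sums to 1) is an assumption, chosen because the per-class percentages reported for the few-shot class in this section add up to 100%. The example labels reproduce only the outer-ring row described in the text; the function name is hypothetical.

```python
LABELS = ("N", "IP", "BP", "OP")


def normalized_confusion(actual, predicted, labels=LABELS):
    """Rows are actual labels, columns are predicted labels; each row sums to 1."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        counts[index[a]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Classes with no actual samples get an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# 100 actual outer-ring samples: 85 correct, 9 predicted as IP, 6 as BP.
actual = ["OP"] * 100
predicted = ["OP"] * 85 + ["IP"] * 9 + ["BP"] * 6
op_row = normalized_confusion(actual, predicted)[LABELS.index("OP")]
print(op_row)
```

The diagonal entry of each row is the per-class accuracy (recall), so the 85% figure for outer ring pitting corresponds to the last entry of the OP row.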
When the outer ring pitting data were used as the few-shot samples, the overall diagnostic accuracy was 94.00%, with an 85.00% accuracy for outer ring pitting samples; 9% and 6% of these were misclassified as “IP” and “BP”, respectively.
When the inner ring pitting data served as few-shot samples, the overall accuracy was 95.00%, and the diagnostic accuracy for inner ring pitting was 86.00%, with the remaining 14% (8% and 6%) misclassified into the other two pitting categories, “BP” and “OP”.
When the rolling element pitting data served as few-shot samples, the overall diagnostic accuracy reached 96.25%, and the accuracy for rolling element pitting was 90.00%, with the remaining 10% (6% and 4%) misclassified into other categories, including “IP” (4%).
The confusion matrices indicate that, when the real data of outer ring and inner ring pitting faults were scarce, the augmented samples generated by GAN were more susceptible to misclassification into other pitting categories. In contrast, for few-shot rolling element pitting faults, the CBAM effectively enhanced the extraction and recognition of pitting features through its adaptive weighting, thereby yielding superior diagnostic performance.
4.4. Diagnostic Results for Rolling Bearing Mixed Sample Dataset
As the number of transmission components increases, the rolling bearings inside the gearbox are exposed to various excitation sources and ambient noise during operation, which substantially degrades the signal-to-noise ratio. To further evaluate the generalization capability of the proposed CBAM-CNN model, a hybrid bearing dataset was constructed, as summarized in Table 7.
The dataset consisted of seven bearing types: the bearing in normal condition; single-point pitting faults located on the inner ring, rolling element, and outer ring (fault size: 0.53 mm); and compound pitting faults occurring on inner & outer rings (IOP), inner ring & rolling element (IBP), and outer ring & rolling element (OBP) (fault size: 0.18 mm). Based on these seven bearing types, nine datasets (labeled A~H and J) were generated.
In dataset A, the inner ring pitting fault (IP) was defined as the few-shot category, with a real-to-generated sample ratio of 2:8 and a total of 200 samples per fault type. In dataset G, both IP and BP were considered few-shot categories, while the remaining five fault types each contained 200 samples. Following this criterion, six datasets were designed with single few-shot labels, and three datasets contained dual few-shot labels.
Each of the nine few-shot bearing datasets was subjected to 10 repeated diagnostic experiments using the CBAM-CNN model. The mean accuracy and standard deviation were adopted as evaluation metrics, and the results are presented in Figure 16. As illustrated by the error bars, diagnostic accuracy exhibited slight fluctuations across datasets but remained generally stable. Datasets D~J showed slightly lower diagnostic accuracy compared to A~C, reflecting the increased difficulty introduced by compound faults and limited quantities of real samples.
The detailed results are summarized in Table 8. The CBAM-CNN model demonstrated consistently high and stable diagnostic performance across all fault types. The accuracies for datasets A~C were 98.36%, 98.27%, and 98.31%, indicating stable performance for single fault types. Dataset E, which contained few-shot compound faults involving the inner ring and rolling element, achieved the lowest accuracy at 96.90%. Datasets G~J, each containing dual few-shot labels, achieved accuracies of 97.65%, 97.55%, and 97.59%, on average approximately 0.72 percentage points lower than datasets A~C, demonstrating the model’s adaptability to complex labeling scenarios. As datasets D~F involved compound few-shot categories and datasets G~J contained multiple few-shot labels, the increased complexity in data distribution and temporal sequence structure posed greater challenges to both the generator and discriminator, thereby slightly reducing diagnostic accuracy.
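The gap of roughly 0.72 between the single-label datasets (A~C) and the dual-label datasets (G~J) can be reproduced as a difference of group means. The short check below uses only the accuracies quoted in the text and assumes the gap is measured between the two arithmetic means.

```python
single_label_acc = [98.36, 98.27, 98.31]   # datasets A~C (%), from the text
dual_label_acc = [97.65, 97.55, 97.59]     # datasets G~J (%), from the text


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Difference of group means, expressed in percentage points.
gap_points = mean(single_label_acc) - mean(dual_label_acc)
print(round(gap_points, 2))
```

The result is a difference in percentage points, not a relative change in accuracy.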
To further investigate the classification performance, confusion matrices for datasets D~J were constructed, as shown in Figure 17, where the horizontal and vertical axes represent the predicted and actual fault labels, respectively, and each matrix entry denotes the normalized proportion of predicted samples.
For dataset D, the overall diagnostic accuracy was 97.14%, with 12 misclassified compound fault samples for inner-outer ring pitting. For dataset E, the overall accuracy was 96.57%, with 22 misclassified samples for inner-rolling element pitting. For dataset F, the accuracy was 96.43%, with 18 misclassified samples for outer-rolling element pitting. For dataset G, the accuracy was 97.57%, with 10 misclassified inner ring and 10 misclassified rolling element pitting samples. For dataset H, the accuracy was 97.21%, with 11 misclassified inner ring and 8 misclassified outer ring samples. For dataset J, the accuracy was 97.14%, with 9 misclassified outer ring and 8 misclassified rolling element samples.
Overall, the misclassified samples were predominantly concentrated in the few-shot fault categories, indicating that data scarcity remained the main challenge. Nevertheless, the CBAM-CNN model maintained high robustness and diagnostic reliability under these complex few-shot and compound fault conditions, confirming its potential applicability in real-world transmission systems.
4.5. Model Core Ablation and Comparison of Different Diagnostic Models
The objective of the ablation study is to quantify the functional contribution of each core module to the overall diagnostic performance. To systematically evaluate the independent effects of the CBAM-CNN and the GAN, the ablation experiments were conducted using dataset C. As in the hyperparameter tuning process, all ablation configurations were trained under identical optimization settings to ensure a fair comparison. The diagnostic accuracies of the ablation configurations are summarized in Table 9, and the corresponding confusion matrices are illustrated in Figure 18.
As shown in Table 9, the baseline CNN model that excludes both the GAN and CBAM modules achieves the lowest performance across all evaluation metrics, with an average accuracy of 85.47%. This performance degradation is attributable to the limited original training data, data imbalance, and the restricted capability of traditional convolutional layers to capture complex fault patterns.
When the GAN module is introduced, the model performance improves significantly. This improvement is primarily attributed to the GAN’s ability to generate class-consistent synthetic samples, thereby enriching the training distribution, mitigating data imbalance, and reducing overfitting. When the CBAM module is incorporated, the diagnostic accuracy improves relative to the GAN + CNN model. The CBAM adaptively refines both channel and spatial attention distributions, enabling the model to emphasize salient fault features and suppress irrelevant information, which enhances its ability to focus on and represent key features. Finally, combining the GAN with the CBAM enhances the diagnostic performance of the model further.
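The channel and spatial refinement mentioned above follows the general CBAM recipe: a channel weight computed from globally average-pooled and max-pooled descriptors passed through a shared two-layer perceptron, followed by a spatial mask computed from channel-wise average and max maps. The pure-Python sketch below illustrates that recipe only; the weights `w1`, `w2`, and `kernel` are hypothetical, untrained placeholders, and the reduction ratio and kernel size used in the paper's model are not stated in this section.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(vector, w1, w2):
    """Two-layer perceptron with ReLU, shared by both pooled descriptors."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vector))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(fmap, w1, w2):
    """fmap: C channels, each H rows of W floats. Returns one weight per channel."""
    flat = [[v for row in ch for v in row] for ch in fmap]
    avg_desc = [sum(ch) / len(ch) for ch in flat]    # global average pooling
    max_desc = [max(ch) for ch in flat]              # global max pooling
    avg_out = shared_mlp(avg_desc, w1, w2)
    max_out = shared_mlp(max_desc, w1, w2)
    return [sigmoid(a + m) for a, m in zip(avg_out, max_out)]


def spatial_attention(fmap, kernel):
    """kernel: [2][k][k] weights over the channel-wise average and max maps."""
    n_ch, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    avg_map = [[sum(fmap[c][i][j] for c in range(n_ch)) / n_ch
                for j in range(width)] for i in range(height)]
    max_map = [[max(fmap[c][i][j] for c in range(n_ch))
                for j in range(width)] for i in range(height)]
    k = len(kernel[0])
    pad = k // 2
    mask = []
    for i in range(height):
        row = []
        for j in range(width):
            total = 0.0
            for plane, pooled in zip(kernel, (avg_map, max_map)):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < height and 0 <= x < width:  # zero padding
                            total += plane[di][dj] * pooled[y][x]
            row.append(sigmoid(total))
        mask.append(row)
    return mask


def cbam(fmap, w1, w2, kernel):
    """Apply channel attention first, then spatial attention."""
    weights = channel_attention(fmap, w1, w2)
    refined = [[[v * w for v in row] for row in ch]
               for ch, w in zip(fmap, weights)]
    mask = spatial_attention(refined, kernel)
    return [[[v * mask[i][j] for j, v in enumerate(row)]
             for i, row in enumerate(ch)] for ch in refined]
```

With all-zero placeholder weights every sigmoid evaluates to 0.5, so each value is scaled by 0.25; trained weights would instead raise the scale of informative channels and positions and lower the others. In practice this would be written with a tensor library rather than nested lists.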
Each module’s contribution to the improvement in diagnostic accuracy was then quantified. The contribution degree of the GAN was 8.06%, that of the CBAM was 9.63%, and that of the combined GAN and CBAM-CNN was 16.25%. The CBAM module therefore contributed more than the GAN, and because the combined contribution is smaller than the sum of the individual contributions, there is a certain degree of functional overlap between the two modules.
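The overlap between the two modules can be made explicit with a small calculation. This is only one possible reading: it assumes the contribution degrees are additive gains measured against the same baseline, which this section does not state, so the result should be treated as indicative.

```python
gan_gain = 8.06         # contribution degree of the GAN (%), from the text
cbam_gain = 9.63        # contribution degree of the CBAM (%), from the text
combined_gain = 16.25   # contribution degree of GAN + CBAM-CNN (%), from the text

# If the modules were fully independent, their gains would simply add up.
independent_sum = gan_gain + cbam_gain

# The shortfall of the combined gain is read here as functional overlap.
overlap = round(independent_sum - combined_gain, 2)
print(round(independent_sum, 2), overlap)
```

Under this reading, the individual gains sum to 17.69, so about 1.44 of the combined improvement is shared by both modules rather than added independently.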
To further demonstrate the proposed model’s reliability, a recent related model was included for comparison. This baseline model integrates an asymmetric convolutional network (AC-Net) with a multi-head attention mechanism (MHA) and utilizes TL and a GAN to generate fault data [39]. The confusion matrices are illustrated in Figure 19, and the diagnostic accuracies are summarized in Table 10.
The comparative results reveal that the proposed model delivers superior diagnostic accuracy in few-shot fault diagnosis tasks. The CBAM enhances the model’s capacity to focus on discriminative and fine-grained fault features, thereby improving classification precision. Owing to the scarcity of fault samples in the source domain, the baseline model’s fault detection capability was not fully developed. Consequently, even after fine-tuning on the target-domain data, its diagnostic accuracy remained lower than that of the CBAM-CNN model.
Overall, the results demonstrate that GAN-based data augmentation broadens the training data distribution, while CBAM preserves essential diagnostic semantics. These mechanisms play a pivotal role in improving the generalization capability and diagnostic reliability of the proposed model.
5. Discussion
The GAN module, built on the Wasserstein distance loss function, effectively addresses data imbalance and the scarcity of training samples by synthesizing a high-quality fault dataset. In addition, the convolutional attention module enables the model to focus on both local features and global information in the input data. Combining convolutional operations with the attention mechanism allows the model to extract fault features and capture the structure of the data distribution, thereby enhancing its capacity to identify bearing faults at various scales. The recognition results on the few-shot bearing datasets demonstrate the efficacy of the proposed fault diagnosis model. In application, the model receives vibration acceleration signals from the sensors of high-speed equipment, assesses the operating status of key components in real time, and issues information such as fault alerts.
However, in practical engineering applications, the model parameters require fine-tuning based on the operating conditions and noise levels of the target equipment to achieve optimal diagnostic accuracy. Additionally, there is still room for improvement in the final diagnostic results, because deviations remain between the GAN-generated data and the real data, and the limited training dataset of the CBAM-CNN can lead to overfitting.