1. Introduction and Literature Review
Gearboxes, as core components in rotating machinery, are widely used in industrial manufacturing, transportation, aerospace, and other fields. Their operational reliability is critical to ensuring the safe and stable operation of rotating machinery [1]. However, gearboxes typically consist of multiple tightly coupled gears with complex internal structures and operate under harsh and variable conditions, making them prone to failures. Once a failure occurs, it may lead to significant economic losses or even casualties [2]. Therefore, analyzing the operational monitoring data of gearboxes to achieve condition monitoring and fault diagnosis is of significant engineering importance.
Since vibration signals are most sensitive to gearbox faults, vibration analysis has become the most widely adopted technical approach for gearbox fault diagnosis [3,4]. Traditionally, researchers have employed frequency-domain analysis, wavelet transforms, and other methods that depend on manual experience to analyze gearbox vibration signals and identify faults [5,6]. Such methods suffer from low data processing efficiency and heavy reliance on expert knowledge, making them inadequate for rapidly handling large volumes of monitoring signals in modern industrial systems [7]. With advancements in artificial intelligence and big data technologies, deep learning-based intelligent diagnosis models have emerged as powerful tools for rapid and accurate gearbox fault identification [8,9]. Numerous deep learning models, such as deep convolutional neural networks (DCNNs) and deep generative adversarial networks (GANs), have been extensively applied in constructing intelligent fault diagnosis models for gearboxes [10]. However, due to their large number of parameters, existing intelligent diagnosis models rely heavily on extensive fault data for parameter training, and their diagnostic performance is closely tied to the volume of training data [11]. With insufficient training samples, these models are highly prone to overfitting and therefore generalize poorly. In practical engineering applications, the directly obtainable gearbox fault data are often extremely limited, and conducting fault simulation experiments on gearboxes in laboratory settings to acquire fault data is highly costly. Consequently, the gearbox fault data that can be collected fall far short of the training requirements of intelligent models. Research on intelligent fault diagnosis methods for gearboxes under few-shot conditions is therefore of both academic significance and engineering value.
In recent years, scholars have made progress in few-shot gearbox fault diagnosis. Existing research can be broadly categorized into three approaches: data augmentation-based methods, algorithm optimization-based methods, and transfer learning-based methods [12]. For data augmentation, Su et al. [13] proposed an improved GAN-based vibration data augmentation method for planetary gearbox fault diagnosis. Zhang et al. [14] developed a semi-supervised GAN for gearbox fault data augmentation, where augmented data enhanced diagnostic performance under few-shot conditions. In algorithm optimization, Zhu et al. [15] introduced a contrastive learning and self-attention mechanism-based method for few-shot gear fault recognition. Chen et al. [16] designed a label-assisted semi-supervised adversarial learning network to extract vibration features from limited fault samples. For transfer learning, Li et al. [17] proposed a cross-feature transferable network for gearbox fault diagnosis under data scarcity. However, data augmentation-based methods often demand substantial computational resources, making them less practical for engineering applications. Algorithm optimization-based approaches typically require meticulously designed network architectures or parameter regularization schemes to extract more fault information from limited samples, imposing high demands on model developers’ expertise. Transfer learning-based methods face challenges in selecting appropriate transfer sources and designing meta-transfer tasks, risking negative transfer that degrades diagnostic performance. In summary, there is an urgent need for innovative research directions and technical solutions for intelligent gearbox fault diagnosis under few-shot conditions.
Knowledge-informed deep learning (KDL) is recognized as a powerful tool with which to overcome the data bottleneck of deep learning and has become a new research frontier in artificial intelligence [18], as shown in Figure 1. Unlike traditional deep learning workflows, KDL integrates both data-driven learning and prior knowledge guidance during model construction and training, significantly reducing the demand for training data [19]. In fields such as image recognition, KDL has successfully enabled researchers to achieve strong generalization capabilities under limited-sample conditions [20]. In recent years, researchers in mechanical fault diagnosis have begun incorporating diagnostic prior knowledge into models with the aim of achieving fault diagnosis with strong generalization ability despite insufficient fault samples [21]. For instance, Kim et al. [22] proposed a domain adaptation learning method enhanced by bearing prior knowledge for few-shot rolling bearing fault diagnosis. Liu et al. [23] proposed a knowledge-informed cross-category filtering framework for fault diagnosis under small samples. Matania et al. [24] proposed a physical knowledge-informed algorithm to overcome the lack of fault samples for fault severity estimation of gearboxes. Sun et al. [25] presented a physical knowledge-based data fusion and reconstruction network for bearing fault diagnosis under incomplete data. Consequently, incorporating gearbox fault diagnosis prior knowledge into the development and training of intelligent models holds great potential for achieving high performance with limited fault samples.
Building on these analyses, this paper proposes a hierarchical attention-guided data–knowledge dual-driven fusion network for intelligent gearbox fault diagnosis under few-shot conditions. Unlike traditional single data-driven paradigms, the proposed method synergizes prior knowledge and monitoring data to overcome limited-sample constraints, establishing a novel data–knowledge dual-driven fusion paradigm for few-shot gearbox fault diagnosis. The main contributions of this paper are as follows:
- (1) A hierarchical attention-guided data–knowledge dual-driven fusion network is proposed, enabling parameter training with limited data and domain knowledge.
- (2) An intelligent fault diagnosis method based on the above network is developed to address the challenge of accurate gearbox fault identification under few-shot conditions.
- (3) Extensive case studies on gearbox fault experiments validate the diagnostic effectiveness and superiority of the proposed method over related methods in few-shot scenarios.
The remainder of this paper is organized as follows. Section 2 introduces the theoretical foundations of CNNs and attention mechanisms. Section 3 details the proposed diagnostic method. Section 4 and Section 5 validate the method’s effectiveness through two gearbox fault diagnosis case studies. Section 6 concludes the paper.
3. Proposed Method
3.1. Overview
This study aims to achieve gear condition monitoring and fault diagnosis through the analysis of gearbox vibration signals, focusing on diagnosing gear cracks and pitting, the most prevalent failure modes during gear operation. Vibration signals are acquired from the gearbox using vibration sensors and data acquisition systems. Through manual signal segmentation and labeling, a dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$ of $N$ labeled samples is constructed, where $x_i \in \mathbb{R}^{M}$ represents the $i$th vibration sample containing $M$ data points, and $y_i$ denotes the corresponding fault label of $x_i$. The objective is to develop a fault diagnosis model using $D$ that accurately learns the nonlinear mapping $f: x_i \mapsto y_i$.
In practical engineering scenarios, the scarcity of gearbox fault signals severely limits the availability of training data for intelligent diagnosis models. The primary challenge lies in training a high-performance diagnostic model with minimal fault samples. To address this, we propose a hierarchical attention-guided data–knowledge dual-driven fusion network. The key innovation of this method is the design of a hierarchical attention module that synergizes prior knowledge with monitoring data, thereby overcoming the constraints imposed by insufficient fault samples on model training.
As shown in Figure 2, the proposed method involves three key steps, detailed as follows. (1) Prior Feature Construction: Domain knowledge of fault diagnosis is utilized to extract prior features from monitoring data. (2) Hierarchical Abstract Feature Extraction: A deep convolutional neural network (CNN) is designed to hierarchically capture abstract features from the spectrum of the monitoring data. (3) Hierarchical Attention-Guided Fusion: A hierarchical attention module dynamically allocates feature weights across layers, enabling adaptive fusion of prior features and abstract features. The fused features are then used to achieve accurate fault identification.
3.2. Knowledge-Driven Prior Feature Construction
In the field of fault diagnosis, experts have accumulated substantial domain knowledge, including failure mechanisms, fault characteristics, and signal processing methods, collectively referred to as prior knowledge. Statistical features of vibration signals, such as peak values and kurtosis, can partially characterize gearbox health conditions. Crucially, these features are derived from fault mechanism analyses and require no data-driven parameter training.
Inspired by Reference [27] and supported by preliminary experiments on gearbox fault diagnosis, our method selects 10 signal feature indicators to construct the prior feature vector $P = [p_1, p_2, \ldots, p_{10}]$, where $p_j$ represents the $j$th feature indicator, as listed in Table 1. These 10 prior features are computed directly from the input data and subsequently normalized via zero-mean standardization. The constructed prior feature vector can partially characterize gearbox health conditions. For instance, when gearbox faults occur, the peak-to-peak values and root mean square values of the vibration signals change significantly. Moreover, because the selected indicators are not tied to a specific machine, the prior feature vector retains some of this characterization capability across different mechanical equipment and thus offers a degree of generalization ability.
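The prior feature construction described above can be sketched as follows. Table 1 is not reproduced here, so the ten indicators below are an assumption: they are common time-domain statistics used in vibration analysis, and the actual set in Table 1 may differ. The function names `prior_features` and `standardize` are hypothetical, and scaling to unit variance is also an assumption, since the text only specifies zero-mean standardization.

```python
import math


def prior_features(signal):
    """Return ten time-domain indicators for one vibration sample.

    Assumption: this indicator set is illustrative, not the paper's Table 1.
    A constant signal has zero variance and would raise ZeroDivisionError.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    var = sum(c * c for c in centered) / n
    std = math.sqrt(var)
    rms = math.sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)
    abs_mean = sum(abs(x) for x in signal) / n
    return {
        "mean": mean,
        "std": std,
        "rms": rms,
        "peak": peak,
        "peak_to_peak": max(signal) - min(signal),
        # Non-excess (Pearson) kurtosis: fourth central moment / variance^2.
        "kurtosis": sum(c ** 4 for c in centered) / n / var ** 2,
        "skewness": sum(c ** 3 for c in centered) / n / std ** 3,
        "crest_factor": peak / rms,
        "shape_factor": rms / abs_mean,
        "impulse_factor": peak / abs_mean,
    }


def standardize(rows):
    """Scale each feature column to zero mean and unit variance.

    Columns with zero spread are only centered (divisor falls back to 1.0).
    """
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
```

In use, each sample's feature dictionary would be converted to a list in a fixed order, and `standardize` would be applied across all training samples so that the statistics come from the training set only.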
3.3. Data-Driven Fault Feature Extraction
When using limited training samples, deep features extracted by neural networks often suffer from insufficient generalization capability in characterizing gearbox health states. To address this, hierarchical feature fusion is adopted to obtain more generalizable data representations.
As shown in Figure 2, the proposed method employs a seven-layer deep convolutional neural network (CNN) for hierarchical automatic feature extraction from the spectrum of the vibration data. The detailed parameters of this CNN are listed in Table 2. The computational process for each convolutional layer follows Equations (1) and (2). To facilitate hierarchical feature fusion, we first process the features from each layer using an identity convolution kernel $K_a$. The processed feature $h_a$ at the $a$th layer is expressed as

$$h_a = K_a * z_a + b_a^{K} \qquad (6)$$

where $z_a$ is the output of the pooling layer at the $a$th layer, $*$ denotes the convolution operation, and $b_a^{K}$ is the bias term of the identity convolution layer. The processed feature is then compressed by a fully connected layer:

$$v_a = W_a h_a + b_a \qquad (7)$$

where $W_a$ and $b_a$ are the weight matrix and bias term of the $a$th fully connected layer, respectively. The compressed feature $v_a$ retains essential hierarchical information while balancing the dimensionality across layers, ensuring compatibility for subsequent attention-guided fusion.
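A minimal sketch of the per-layer processing described above is given below. It rests on several assumptions that the text does not confirm: the identity convolution is treated as a pointwise (kernel size 1) convolution over channels, the processed feature map is flattened before the fully connected layer, and no activation is applied. The function names are hypothetical.

```python
def pointwise_conv(pooled, weights, bias):
    """Apply a kernel-size-1 convolution (an assumption for the identity kernel).

    pooled:  C_in x L feature map from a pooling layer (list of channels)
    weights: C_out x C_in mixing matrix
    bias:    C_out bias terms
    """
    length = len(pooled[0])
    return [
        [sum(w * pooled[c][t] for c, w in enumerate(row)) + b for t in range(length)]
        for row, b in zip(weights, bias)
    ]


def fully_connected(vector, weights, bias):
    """Compute weights @ vector + bias for a dense layer."""
    return [sum(w * v for w, v in zip(row, vector)) + b for row, b in zip(weights, bias)]


def compress(pooled, conv_w, conv_b, fc_w, fc_b):
    """Map one layer's pooled output to a fixed-length feature vector."""
    mapped = pointwise_conv(pooled, conv_w, conv_b)
    flat = [value for channel in mapped for value in channel]  # flatten C_out x L
    return fully_connected(flat, fc_w, fc_b)
```

Choosing the same output size for every layer's fully connected weights gives all seven layers features of equal length, which is what makes a shared attention scoring layer and column-wise concatenation straightforward.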
3.4. Hierarchical Attention-Guided Feature Fusion
Although most intelligent diagnostic models employ deep data features for fault identification, the representational capacity and generalization ability of features at different depths in DCNNs vary significantly due to varying numbers of convolutional operations. Existing studies in this field have demonstrated that assigning different weights to distinct CNN layers can achieve more robust fault feature extraction with enhanced generalization capability [28]. To obtain superior generalization performance, we adopt a weighting strategy to fuse data features from different depths.
Specifically, after obtaining seven abstract features from seven layers of the deep convolutional network, a hierarchical attention module is employed to achieve weighted fusion of the seven abstract features with the prior features, as illustrated in Figure 3.
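For the feature $v_a$ extracted from the $a$th layer, its attention score $e_a$ is computed as

$$e_a = \sigma\left(W_s v_a + b_s\right) \qquad (8)$$

where $\sigma(\cdot)$ is a nonlinear activation function, and $W_s$ and $b_s$ are the weight matrix and bias term of the attention scoring layer. The attention weight $\alpha_a$ is then normalized via

$$\alpha_a = \frac{\exp(e_a)}{\sum_{k=1}^{7} \exp(e_k)} \qquad (9)$$

The weighted feature at the $a$th layer is derived as

$$\tilde{v}_a = \alpha_a v_a \qquad (10)$$

Finally, a concatenation function $\mathrm{Concat}(\cdot)$ fuses all weighted hierarchical features with the prior features $P$:

$$F = \mathrm{Concat}\left(\tilde{v}_1, \tilde{v}_2, \ldots, \tilde{v}_7, P\right) \qquad (11)$$

where $F$ denotes the fused feature. $\mathrm{Concat}(\cdot)$ indicates that feature combination is performed along the column direction, ensuring dimensional compatibility of features. The resulting $F$ integrates both multi-layer abstract features (automatically extracted by the deep CNN) and knowledge-informed prior features, forming a fused representation with enhanced discriminative power for characterizing gearbox health states.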
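The attention-guided fusion steps above (scoring, normalization, weighting, and concatenation) can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the nonlinear activation is taken to be `tanh`, each layer receives one scalar score from a scoring layer shared across layers, and the normalization is a softmax over layers. The function names are hypothetical.

```python
import math


def attention_score(feature, weights, bias):
    """Scalar score for one layer feature: tanh(w . v + b) (tanh is assumed)."""
    return math.tanh(sum(w * v for w, v in zip(weights, feature)) + bias)


def softmax(scores):
    """Normalize scores into weights that sum to 1 (max-shifted for stability)."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(layer_features, score_weights, score_bias, prior):
    """Weight each layer feature by its attention weight, then append the prior.

    layer_features: list of equal-length feature vectors, one per CNN layer
    score_weights, score_bias: parameters of the shared scoring layer
    prior: knowledge-driven prior feature vector
    Returns the fused vector and the attention weights.
    """
    scores = [attention_score(f, score_weights, score_bias) for f in layer_features]
    alphas = softmax(scores)
    fused = []
    for alpha, feature in zip(alphas, layer_features):
        fused.extend(alpha * v for v in feature)
    fused.extend(prior)  # prior features are concatenated unweighted
    return fused, alphas
```

Returning the attention weights alongside the fused vector makes it easy to inspect which depths the module emphasizes for a given sample.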
3.5. Method Training Process
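The weighted fused feature $F$ is then fed into a Softmax classifier to achieve the final classification of gearbox vibration data. For $C$ health-state classes, the operation of the Softmax classifier is formulated as

$$\hat{y}_c = \frac{\exp\left(\theta_c^{\top} F\right)}{\sum_{k=1}^{C} \exp\left(\theta_k^{\top} F\right)}, \quad c = 1, \ldots, C \qquad (12)$$

where $\hat{y}$ is the output of the Softmax classifier, and $\theta$ is the parameter of the Softmax classifier.
The training objective is to minimize the discrepancy between predicted and true labels, for which the cross-entropy loss $L$ over the $N$ training samples is adopted:

$$L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} \mathbb{1}\{y_i = c\} \log \hat{y}_{i,c} \qquad (13)$$

where $\mathbb{1}\{\cdot\}$ is an indicator function that returns 1 if $y_i = c$ and 0 otherwise.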
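As a small worked illustration of the classifier and loss described above, the following sketch computes Softmax probabilities from a fused feature vector and the mean cross-entropy over a batch. The function names and the bias-free parameterization (one weight row per class) are assumptions made for brevity.

```python
import math


def softmax_classifier(fused, theta):
    """Return class probabilities for one fused feature vector.

    theta holds one weight row per class (no bias term, by assumption).
    """
    logits = [sum(t * v for t, v in zip(row, fused)) for row in theta]
    shift = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, labels):
    """Mean negative log-probability assigned to the true class.

    The indicator function in the loss selects exactly one term per sample,
    which is why indexing by the integer label is sufficient.
    """
    return -sum(math.log(p[y]) for p, y in zip(probabilities, labels)) / len(labels)
```

With all-zero parameters the classifier outputs a uniform distribution, so the loss equals the logarithm of the number of classes; this is a convenient sanity check at initialization.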
The training workflow of the proposed method is summarized in Algorithm 1.
Algorithm 1 Pseudo-code of the training process of the proposed method. |
Input: | Gearbox vibration dataset $D$ |
Initialize: | Network parameters (CNN weights, attention weights, classifier parameters) |
Configure: | Learning rate, number of training epochs, optimizer (e.g., Adam) |
1: | for each training epoch do |
2: | Compute prior features $P_i$ for training samples in $D$ using Table 1; |
3: | Extract hierarchical abstract features and fuse with prior features via Equations (6)–(11); |
4: | Calculate predicted labels using Equation (12); |
5: | Compute cross-entropy loss via Equation (13); |
6: | Update parameters using the optimizer; |
7: | end for |