A Cross-Modal Multi-Layer Feature Fusion Meta-Learning Approach for Fault Diagnosis Under Class-Imbalanced Conditions

Luo, Haoyu; Liu, Mengyu; Deng, Zihao; Cheng, Zhe; Yang, Yi; Shen, Guoji; Hu, Niaoqing; Xiao, Hongpeng; Xing, Zhitao

doi:10.3390/act14080398

by Haoyu Luo ^1,2, Mengyu Liu ^3, Zihao Deng ^1,2, Zhe Cheng ^1,2,*, Yi Yang ^1,2,*, Guoji Shen ^1,2, Niaoqing Hu ^1,2, Hongpeng Xiao ^1,2 and Zhitao Xing ^1,2

^1 College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China

^2 National Key Laboratory of Equipment State Sensing and Smart Support, National University of Defense Technology, Changsha 410073, China

^3 College of Mechanical and Vehicle Engineering, Hunan University, Changsha 410082, China

^* Authors to whom correspondence should be addressed.

Actuators 2025, 14(8), 398; https://doi.org/10.3390/act14080398

Submission received: 14 July 2025 / Revised: 6 August 2025 / Accepted: 9 August 2025 / Published: 11 August 2025

(This article belongs to the Special Issue AI, Designing, Sensing, Instrumentation, Diagnosis, Controlling, and Integration of Actuators in Digital Manufacturing—2nd Edition)

Abstract

In practical applications, intelligent diagnostic methods for actuator-integrated gearboxes in industrial driving systems encounter challenges such as the scarcity of fault samples and variable operating conditions, which undermine diagnostic accuracy. This paper introduces a multi-layer feature fusion meta-learning (MLFFML) approach to address fault diagnosis problems in cross-condition scenarios with class imbalance. First, meta-training is performed to develop a mature fault diagnosis model on the source domain, obtaining cross-domain meta-knowledge; subsequently, meta-testing is conducted on the target domain, extracting meta-features from limited fault samples and abundant healthy samples to rapidly adjust model parameters. For data augmentation, this paper proposes a frequency-domain weighted mixing (FWM) method that preserves the physical plausibility of signals while enhancing sample diversity. Regarding the feature extractor, this paper integrates shallow and deep features by replacing the first layer of the feature extraction module with a dual-stream wavelet convolution block (DWCB), which transforms actuator vibration or acoustic signals into the time-frequency space to flexibly capture fault characteristics and fuses information from both amplitude and phase aspects; following the convolutional network, an encoder layer of the Transformer network is incorporated, containing multi-head self-attention mechanisms and feedforward neural networks to comprehensively consider dependencies among different channel features, thereby achieving a larger receptive field compared to other methods for actuation system monitoring. Furthermore, this paper experimentally investigates cross-modal scenarios where vibration signals exist in the source domain while only acoustic signals are available in the target domain, specifically validating the approach on industrial actuator assemblies.

Keywords:

class imbalance; cross-modal; meta-learning; wavelet convolution

1. Introduction

Gearboxes function as critical transmission components in industrial actuator driving systems and are extensively utilized across numerous fields, such as industrial production and transportation. They are commonly seen in actuation-critical applications like robotic joint reducers, automotive transmissions, construction machinery, ships, and wind turbine pitch control systems [1,2,3], where they perform the core task of transmitting mechanical power to actuator outputs. However, during actual operation, gearboxes in actuation chains frequently encounter variable rotational speeds, fluctuating loads, and prolonged exposure to high-intensity alternating stresses. These demanding conditions inevitably heighten the risk of failures, including missing gear teeth and gear cracks, which can degrade actuator positioning accuracy or cause unexpected downtime. The occurrence of such failures not only severely disrupts the operational continuity of equipment but also poses latent safety hazards to overall machinery integrity and actuation reliability. In recent years, fault diagnosis has become a critical method for monitoring actuation system condition and mitigating the risk of equipment failure.

Fault diagnosis is a critical technology in industrial production and equipment operation maintenance. Its core lies in precisely monitoring the operational status of equipment or systems and promptly identifying the type and location of faults when anomalies occur. Currently, fault diagnosis techniques are widely applied to various equipment, such as aircraft engines [4,5], gearboxes [6,7], and bearings [8,9]. Traditional fault diagnosis methods often rely on manual feature extraction and expert experience, with their performance constrained by the completeness of prior knowledge and generalization capabilities [10]. In recent years, with advancements in IoT sensing technology and computational capabilities, data-driven fault diagnosis methods based on deep learning have achieved significant progress. These methods overcome the limitations of traditional approaches by automatically learning complex fault feature representations from multi-source monitoring signals, including raw vibration, temperature, and acoustic emission. The application of deep reinforcement learning in modeling fault diagnosis as Markov decision processes [11], Long Short-Term Memory (LSTM) networks in capturing long-term dependencies of non-stationary fault signals [7], and Transformers in global feature extraction [12] has significantly enhanced fault recognition accuracy, driving a technological paradigm shift in this field.

However, fault diagnosis based on deep learning in industrial scenarios faces severe challenges. First, deep learning models typically require a large number of labeled training samples to achieve good performance [13,14], yet fault samples, especially those of rare or novel faults, are often difficult to obtain, leading to the sample scarcity problem [15]. This significantly degrades the diagnostic performance of traditional deep learning methods. Second, real-world fault diagnosis data often suffer from a serious class imbalance problem, where the number of healthy-state or common fault samples far exceeds that of certain specific fault samples [16]. This imbalance causes general learning algorithms to be easily biased towards the majority class, reducing the recall rate for minority-class faults. Third, when a diagnostic model is transferred from a laboratory environment to real operating conditions, factors such as load variations, environmental noise, and sensor drift cause data distribution shifts [4], leading to significant degradation in model performance.

To address the issue of scarce fault samples, existing research has conducted preliminary explorations, yielding various tentative or representative methodological frameworks. These methods typically involve improvements and integrations across dimensions including data augmentation strategies, loss function modifications, and meta-learning strategies. Data augmentation strategies encompass approaches such as generating fault samples via Generative Adversarial Networks (GANs) [17,18,19] and the Synthetic Minority Over-sampling Technique (SMOTE) [16,20]. Wang et al. [21] incorporated an adaptive network into the feature extraction component of a generative adversarial network, improving the traditional variational autoencoder to expand the dataset. Gao et al. [22] introduced multi-distribution mega-trend diffusion into generative adversarial networks, proposing an intelligent virtual sample method that demonstrated promising experimental results on real-world industrial datasets. Wang et al. [23] combined transfer entropy with convolutional generative adversarial networks to augment training data for fault diagnosis in air conditioning systems. Yusoff et al. [24] developed a comprehensive strategy integrating feature transformation and data-level resampling to address issues of sample overlap and class imbalance. Liu et al. [25] proposed an adaptive correction method for imbalanced datasets by incorporating niche techniques and the Synthetic Minority Over-sampling Technique (SMOTE), effectively enhancing model training outcomes. However, the physical interpretability and diversity of such methods remain questionable. Regarding loss function adjustments, deep reinforcement learning can flexibly leverage imbalanced distributions to construct reward functions, enhancing agents’ sensitivity to limited samples [26,27], yet its effectiveness is constrained in extreme imbalance scenarios. 
The core advantage of meta-learning methods lies in their few-shot learning capability, which minimizes reliance on extensive labeled data and overcomes the dependence of traditional machine learning on large sample sizes. Current research primarily focuses on metric-based meta-learning methods that aim to learn effective distance or similarity functions for the classification of novel categories in few-shot tasks. For instance, Lin et al. [28] embedded deep features into BiLSTM to enhance discriminability between features of different fault categories, subsequently using prototypes per fault class to identify similar samples. Wang et al. [29] integrated a self-attentive multi-scale convolutional module to significantly enhance the classical MAML architecture, maintaining high accuracy with minimal fault samples. Chen et al. [30] employed a multi-scale attention mechanism for feature extraction to intensify focus on critical features while refining prototype vector accuracy, and Bi et al. [31] adopted instance-guided matching and dimension-guided attention modules to dynamically adjust prototype features. Mu et al. [32] developed a fine-grained feature learner and a task inequality metric index, enhancing the discrimination capability for subtle features. Dong et al. [33] proposed an interpretable integration fusion time-frequency prototype contrastive learning method, enhancing the fusion of multi-sensor signals and selection of unlabeled samples, thereby improving the network’s decision-making capability through prototypes. Nevertheless, existing metric-based meta-learning methods emphasize increasing inter-sample distance metrics and feature extraction but lack sufficient exploration of global dependencies in deep features and causal relationship analysis. This deficiency renders the methods more vulnerable to sample scarcity and background noise.

To address the limitations of existing data augmentation methods, this paper proposes a frequency-domain weighted mixing (FWM) method for the augmentation of fault signals. This approach superimposes existing signals in the frequency domain to enhance sample diversity while preserving physical plausibility. Another key contribution is the design of a causal self-attention (CSA) mechanism, which guides the feature extraction network to focus on the causal characteristics of signals and the dependencies between channels in deep features. Finally, to improve network interpretability, we introduce a dual-stream wavelet convolution block (DWCB) that directly extracts time-frequency domain magnitude and phase information from signals and fuses both types of information.

Given the multimodal nature of fault diagnosis data, current research extensively explores diverse modalities, including vibration signals, audio signals [34], temperature data, and image data [35]. Furthermore, this study incorporates cross-domain experiments between vibration and acoustic signals, aiming to align and fuse information from disparate sensors or data sources to achieve more robust and comprehensive feature representations.

The proposed method is experimentally validated to enable effective early warning by monitoring actuator vibration or acoustic signals during incipient faults, such as gearbox wear and pitting. This approach suppresses the escalation of localized faults into system-level failures, significantly reducing potential economic losses and personnel safety risks. Constructed for cross-domain imbalanced sample scenarios, the intelligent diagnostic model can be trained to maturity with only a limited amount of data, thereby mitigating deployment challenges in practical applications.

The main contributions of this paper are summarized as follows:

(1) We propose the Multi-layer Feature Fusion Meta-Learning (MLFFML) framework for cross-domain class-imbalance scenarios, further incorporating cross-modal experimental scenarios.

(2) FWM is designed to synthesize training samples by fusing amplitude spectra of multiple signals while preserving phase characteristics, effectively addressing sample scarcity with physical plausibility.

(3) DWCB replaces standard CNN layers to process magnitude and phase information through parallel paths, significantly enhancing time-frequency feature discrimination for weak fault signatures.

(4) CSA is integrated to dynamically isolate domain-invariant features. It enforces independence constraints between noise-corrupted and clean feature representations, suppressing spurious correlations during prototype generation.

2. Materials and Methods

2.1. Overall Framework

Meta-learning, a novel network training paradigm also termed “learning to learn”, has been extensively studied for its superior few-shot learning capability and generalization performance. However, fault diagnosis imposes stringent requirements on feature extraction quality. This paper proposes MLFFML, which implements effective improvements at both shallow and deep feature extraction stages. At the shallow stage, the DWCB architecture processes amplitude and phase information in signal time-frequency representations separately, followed by feature fusion. At the deep stage, CSA leverages transformer encoder layers to enhance global feature dependencies, while a variable independence loss constrains the network to suppress redundant features and enhance feature sparsity. Additionally, FWM augments signal diversity by synthesizing spectral components across different samples, enriching training sample variability for improved model robustness. The overall framework diagram of MLFFML is shown in Figure 1.

The MLFFML framework explicitly addresses three persistent challenges in planetary gearbox fault diagnosis: complex signal transmission paths from planet gears to sensors, weak fault characteristics for planet gear and sun gear-localized faults, and strong cross-modal heterogeneity. Our technical contributions directly target these issues: DWCB’s wavelet convolution adapts to non-stationary gear-meshing signals, extracting physically interpretable time-frequency representations. CSA’s causal constraints suppress noise-induced pseudo-features under variable operating conditions. FWM preserves fault-related spectral components while augmenting limited samples.

The proposed method’s training process is divided into meta-training and meta-testing phases, which train the model on the known data of the source and target domains, respectively. Since the model architecture, loss computation methodology, and dataset construction remain identical across these two phases, they can be described uniformly. As an example, meta-training begins with the random generation of meta-tasks $T_{train} = \{(S_{1}, Q_{1}), \ldots, (S_{m}, Q_{m})\}$ from the available data. The set $S_{i} = \{(x_{i}, y_{i})\}^{N \times K}$ used for task-specific adaptation is called the support set, while the set $Q_{i} = \{(x_{i}, y_{i})\}^{N \times L}$ used to evaluate task performance is termed the query set. Here, $N$ denotes the number of classes, and $K$ and $L$ represent the number of samples per category in the support set and query set, respectively. The details of the proposed method are as follows.
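The task construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sample_episode`, the toy label list, and the values K = 5 and L = 5 are assumptions chosen for the example; only the N-way, K-support, L-query structure comes from the text.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, l_query, rng):
    """Draw one meta-task (S_i, Q_i) as lists of (sample_index, class) pairs."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    classes = rng.sample(sorted(by_class), n_way)  # pick N classes
    support, query = [], []
    for c in classes:
        # K + L distinct samples per class, split into support and query
        picked = rng.sample(by_class[c], k_shot + l_query)
        support += [(i, c) for i in picked[:k_shot]]
        query += [(i, c) for i in picked[k_shot:]]
    return support, query


rng = random.Random(0)
# Toy data (hypothetical): 8 classes with 20 samples each
labels = [c for c in range(8) for _ in range(20)]
tasks = [sample_episode(labels, n_way=8, k_shot=5, l_query=5, rng=rng)
         for _ in range(3)]  # m = 3 meta-tasks
support, query = tasks[0]
print(len(support), len(query))  # 40 40
```

Each task draws its support and query samples without overlap, so the query set measures adaptation on samples the task has not used for its prototypes.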

First, during the data augmentation stage, FWM expands the number of samples for each fault category to match the count of healthy samples, thereby balancing the dataset. Subsequently, meta-training tasks are constructed from the expanded dataset, with Gaussian noise injected into the support set to create noised samples. The feature extraction stage then employs DWCB, followed by CNN and Transformer encoder layers. Notably, the dual-stream wavelet convolution replaces the first convolutional layer in conventional CNNs, utilizing trainable, complex-valued wavelet kernels optimized through gradient descent to extract interpretable time-frequency representations. Meanwhile, the multi-head self-attention mechanism within Transformer encoders enhances global feature dependencies. Following feature extraction, prototype vectors are computed per class by averaging support-set features. These prototypes serve two purposes: to enable the calculation of classification probability and to function as input to the triplet loss. Furthermore, a variable independence loss constrains feature representations between clean and noised support sets to enforce causal disentanglement and feature sparsity.

The remaining portion of Section 2 details the constituent modules of the proposed method. Specifically, Section 2.4 addresses not only the causal self-attention mechanism but also comprehensively elaborates on the computation of training loss functions and classification probabilities.

2.2. Dual-Stream Wavelet Convolution Block

A novel dual-stream wavelet convolution block (DWCB) is designed to extract key frequency-band information from signals. Inspired by WaveletKernelNet [36] and the wavelet transform [37], it employs complex Morlet wavelet convolution for time-frequency decomposition of the signal. The complex Morlet wavelet basis function is expressed as follows:

$$\psi_{s}(t) = \frac{1}{\sqrt{s}} \pi^{-\frac{1}{4}} \exp\left(i \omega_{0} \frac{t}{s} - \frac{t^{2}}{2 s^{2}}\right) \tag{1}$$

where $\omega_{0}$ denotes the central frequency and $s$ represents the scale parameter, which also functions as a trainable parameter in the complex wavelet convolutional layer. $\psi_{s}$ denotes the complex Morlet wavelet basis function at scale $s$. By varying the scale parameter $s$, a set of complex Morlet wavelet filters is generated. The convolution process is divided into real-part computation and imaginary-part computation, with their mathematical expressions provided below. $\phi_{s}$ represents the real part of the wavelet, $\varphi_{s}$ represents the imaginary part of the wavelet, $*$ denotes the convolution operation, and $x$ is the input signal being processed.

$$h = \phi_{s}(t) * x + i \cdot \varphi_{s}(t) * x \tag{2}$$

As shown in Figure 2, the DWCB separates amplitude and phase information after complex Morlet wavelet convolution. This design is necessitated by the inherent differences in the nature of amplitude and phase information. Using the same feature extraction methods—especially the ReLU activation function—may cause phase information loss, thereby diminishing its value in subsequent processes. After extracting preliminary features, amplitude and phase information are integrated via channel-wise concatenation. Multi-dimensional feature weighting is then applied to adaptively adjust the convolutional features.
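As a rough illustration of Eqs. (1) and (2), the sketch below builds a small bank of complex Morlet kernels and splits the convolution output into amplitude and phase streams. It is a NumPy sketch under stated assumptions, not the trainable layer of the paper: the scales are fixed rather than learned, and the kernel length (65), the value of the central frequency parameter (5.0), the scale values, and the function names are all hypothetical. The subsequent per-stream feature extraction and channel-wise concatenation are omitted.

```python
import numpy as np


def morlet_kernel(scale, length=65, omega0=5.0):
    """Sampled complex Morlet wavelet of Eq. (1); `scale` is in samples."""
    t = np.arange(length) - (length - 1) / 2
    u = t / scale
    return (np.pi ** -0.25 / np.sqrt(scale)) * np.exp(1j * omega0 * u - 0.5 * u ** 2)


def dual_stream_features(x, scales):
    """Apply Eq. (2) per scale and return (amplitude, phase) maps."""
    amp, phase = [], []
    for s in scales:
        psi = morlet_kernel(s)
        # Real-part and imaginary-part convolutions, combined as in Eq. (2)
        h = (np.convolve(x, psi.real, mode="same")
             + 1j * np.convolve(x, psi.imag, mode="same"))
        amp.append(np.abs(h))      # amplitude stream
        phase.append(np.angle(h))  # phase stream, in [-pi, pi]
    return np.stack(amp), np.stack(phase)


fs = 10_000                          # sampling frequency used for the test bench
t = np.arange(2048) / fs             # one sample of 2048 points
x = np.sin(2 * np.pi * 500 * t)      # toy 500 Hz tone, not real gearbox data
scales = np.array([2.0, 4.0, 8.0, 16.0])
amp, phase = dual_stream_features(x, scales)
print(amp.shape, phase.shape)  # (4, 2048) (4, 2048)
```

With these assumed settings, the largest scale has a pass band near 500 Hz and therefore gives the strongest amplitude response to the toy tone, which is the sense in which each scale selects a frequency band.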

2.3. Frequency-Domain Weighted Mixing Method

To address the issue of fault data scarcity, this paper proposes a sample augmentation method based on frequency-domain signal mixing, termed the frequency-domain weighted mixing (FWM) method. The method operates by randomly selecting three original signal samples ($x_{1}$, $x_{2}$, and $x_{3}$), decomposing them into frequency-domain representations via Fourier transform, with the computational procedure detailed as follows:

$$X_{k}(\omega) = \mathcal{F}\{x_{k}(t)\} = A_{k}(\omega) e^{j \phi_{k}(\omega)}, \quad k = 1, 2, 3 \tag{3}$$

where $X_{k}$ is the frequency-domain representation of signal $x_{k}$ and $A_{k}$ represents the amplitude. Randomly weighted mixing is applied to amplitude spectra across multiple samples using weights of $\lambda_{1}, \lambda_{2}, \lambda_{3} \sim U(0, 1)$ while preserving the phase information of a single reference signal to maintain the temporal characteristics of the waveform. $A_{syn}$ represents the amplitude of the synthesized signal in the frequency domain, and $\phi_{syn}$ denotes the phase information of the synthesized signal in the frequency domain.

$$A_{syn}(\omega) = \sum_{k=1}^{3} \lambda_{k} A_{k}(\omega) \tag{4}$$

$$\phi_{syn}(\omega) = \phi_{1}(\omega) \tag{5}$$

Ultimately, the time-domain signal is reconstructed via the inverse Fourier transform and subjected to standardization processing. This fusion strategy preserves the energy within fault-characteristic frequency bands while ensuring phase consistency to prevent temporal distortion in synthesized signals, thereby effectively expanding the diversity of fault samples. $\mathrm{Norm}$ denotes the normalization of the time-domain signal obtained from the inverse Fourier transform.

$$\hat{x}(t) = \mathrm{Norm}\left(\mathcal{F}^{-1}\{A_{syn}(\omega) e^{j \phi_{syn}(\omega)}\}\right) \tag{6}$$
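Eqs. (3) to (6) map almost directly onto a few FFT calls. The following NumPy sketch shows one possible reading; the function name `fwm` is hypothetical, the normalization is assumed to be zero-mean, unit-variance standardization (the text does not specify its exact form), and the weights are used as drawn without rescaling.

```python
import numpy as np


def fwm(x1, x2, x3, rng):
    """Frequency-domain weighted mixing of three equal-length signals."""
    spectra = [np.fft.rfft(x) for x in (x1, x2, x3)]          # Eq. (3)
    lam = rng.uniform(0.0, 1.0, size=3)                       # lambda_k ~ U(0, 1)
    amp_syn = sum(w * np.abs(X) for w, X in zip(lam, spectra))  # Eq. (4)
    phase_syn = np.angle(spectra[0])                          # Eq. (5): phase of x1
    x_hat = np.fft.irfft(amp_syn * np.exp(1j * phase_syn), n=len(x1))
    # Assumed form of Norm in Eq. (6): z-score standardization
    return (x_hat - x_hat.mean()) / (x_hat.std() + 1e-12)


rng = np.random.default_rng(0)
signals = rng.standard_normal((3, 2048))  # toy stand-ins for three fault samples
x_hat = fwm(signals[0], signals[1], signals[2], rng)
print(x_hat.shape)  # (2048,)
```

Because only the amplitude spectra are mixed, the synthesized signal keeps the phase spectrum of the reference sample `x1` at every nonzero frequency bin, which is what the text means by preserving the temporal characteristics of the waveform.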

2.4. Causal Self-Attention

The proposed method incorporates a Transformer encoder layer after the CNN layers. The Transformer encoder layer consists of two sub-layers: a multi-head self-attention mechanism and a feed-forward neural network. Each sub-layer includes residual connections and layer normalization. This architecture establishes global dependencies across sequences while dynamically weighting and integrating contextual information. Its structure is shown below. Here, $\mathrm{Attention}$ denotes the multi-head attention mechanism [38], where $Q$, $K$, and $V$ represent the query, key, and value vectors, respectively. $F$ is the input feature, and $F'$ is the output feature. $d_{k}$ is the dimension of the key vector.

$$F' = \mathrm{LayerNorm}(F + \mathrm{Attention}(Q, K, V, F)) \tag{7}$$

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V \tag{8}$$
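To make Eqs. (7) and (8) concrete, here is a single-head NumPy sketch of the attention sub-layer with its residual connection and layer normalization. Several simplifications are assumptions of this example rather than details from the paper: one head instead of several, random projection matrices `w_q`, `w_k`, `w_v` applied to $F$ to obtain $Q$, $K$, $V$, no learnable scale or shift in the layer normalization, and no feed-forward sub-layer.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention of Eq. (8); also returns the weights."""
    d_k = k.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d_k))
    return weights @ v, weights


def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


rng = np.random.default_rng(0)
f = rng.standard_normal((16, 32))  # toy input: 16 tokens with 32 features each
w_q, w_k, w_v = (rng.standard_normal((32, 32)) / np.sqrt(32) for _ in range(3))
attn_out, weights = attention(f @ w_q, f @ w_k, f @ w_v)
f_prime = layer_norm(f + attn_out)  # residual connection + LayerNorm, Eq. (7)
print(f_prime.shape)  # (16, 32)
```

Every row of `weights` sums to one, so each output token is a convex combination of all value vectors; this is the mechanism behind the global dependencies mentioned above.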

A causal decomposition loss is introduced to enforce independence constraints between variables of the original support-set features and the noise-injected support-set features. Combined with feature transformations in the Transformer, this approach uncovers underlying causal associations and enhances the sparsity of feature representations. $\mathrm{corr}$ denotes the computation of the correlation matrix. $f_{clean}^{T}$ and $f_{noise}^{T}$ represent the deep features of the original sample and the noise-added sample, respectively.

$$L_{C} = \frac{1}{d} \sum_{i=1}^{d} \left(\mathrm{corr}(f_{clean}^{T}, f_{noise}^{T})_{ii} - 1\right)^{2} + \frac{1}{d(d-1)} \sum_{i \neq j} \mathrm{corr}(f_{clean}^{T}, f_{noise}^{T})_{ij}^{2} \tag{9}$$
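One way to read Eq. (9) is as a penalty on the cross-correlation matrix between the clean and noised feature batches: diagonal entries are pulled towards 1 and off-diagonal entries towards 0. The sketch below follows that reading. It assumes that $d$ is the feature dimension, that features are arranged as a (samples × d) array, and that the correlation is computed after standardizing each feature over the batch; the function name and the toy data are hypothetical.

```python
import numpy as np


def causal_decomposition_loss(f_clean, f_noise, eps=1e-8):
    """Eq. (9) on two (n_samples, d) feature arrays."""
    n, d = f_clean.shape
    # Standardize every feature dimension over the batch
    zc = (f_clean - f_clean.mean(axis=0)) / (f_clean.std(axis=0) + eps)
    zn = (f_noise - f_noise.mean(axis=0)) / (f_noise.std(axis=0) + eps)
    corr = zc.T @ zn / n                      # d x d cross-correlation matrix
    diag = np.diag(corr)
    on_diag = np.sum((diag - 1.0) ** 2) / d   # first term of Eq. (9)
    off_diag = (np.sum(corr ** 2) - np.sum(diag ** 2)) / (d * (d - 1))  # second term
    return on_diag + off_diag


rng = np.random.default_rng(0)
f_clean = rng.standard_normal((64, 8))        # toy features, not model outputs
loss_same = causal_decomposition_loss(f_clean, f_clean)
loss_unrelated = causal_decomposition_loss(f_clean, rng.standard_normal((64, 8)))
print(loss_same < loss_unrelated)  # True
```

The loss is small when each feature of the noised sample tracks the same feature of the clean sample and is uncorrelated with the others, and it grows when noise destroys that correspondence.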

2.5. Classification Loss

The prototype vector for each class is calculated from the support-set samples, and triplet loss is employed to reduce intra-class sample distance while increasing inter-class sample distance. In this triplet loss configuration, the anchor is set to the features of all samples from both the support set and query set, the positive sample corresponds to the prototype feature of the anchor’s own class, and the negative samples are selected as the closest heterogeneous samples to prototype vectors. This configuration enhances intra-class compactness by aligning samples with their class prototypes while improving inter-class discrimination through explicit contrast with the most confusable negative prototypes.

The total loss consists of two components: causal disentanglement loss and classification loss.

$$L_{total} = \alpha L_{C} + \beta L_{tri} \tag{10}$$

The classification loss is calculated using the triplet loss function, as shown below. $D$ denotes the Euclidean distance.

$$L_{tri} = \max\left(\sum_{i}^{N} \left(D(x_{i}^{a}, x_{i}^{p}) - D(x_{i}^{a}, x_{i}^{n}) + a\right), 0\right) \tag{11}$$

Here, $x_{i}^{a}$ denotes the anchor sample, which is selected from all samples in both the support set and query set during calculation. $x_{i}^{p}$ represents the positive sample, chosen as the prototype sample of the anchor sample’s category. $x_{i}^{n}$ is the negative sample, selected as the most similar sample from a class different from that of the anchor sample. This configuration effectively minimizes intra-class distances while maintaining separation between the closest inter-class samples. $a$ is a manually set distance (margin) parameter, set to 20 here. The classification outcome is determined by the distances between sample features and the prototype features of each category. This process is relatively simple and does not affect model training; thus, we will not elaborate on it here.
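The prototype computation, the triplet loss of Eq. (11), and the distance-based classification can be sketched together in NumPy. The sketch follows the printed form of Eq. (11), where the maximum with zero is taken over the summed terms; many implementations instead clamp each triplet separately. The function names and the two-cluster toy features are assumptions for illustration, and the margin of 20 is the value stated in the text.

```python
import numpy as np


def class_prototypes(support_feats, support_labels, n_classes):
    """Prototype of each class: mean of its support-set features."""
    return np.stack([support_feats[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])


def triplet_loss(feats, labels, protos, margin=20.0):
    """Eq. (11): prototype positives, nearest other-class sample as negative."""
    total = 0.0
    for x, y in zip(feats, labels):
        d_pos = np.linalg.norm(x - protos[y])
        d_neg = np.linalg.norm(feats[labels != y] - x, axis=1).min()
        total += d_pos - d_neg + margin
    return max(total, 0.0)


def classify(feats, protos):
    """Assign each sample to the class of its nearest prototype."""
    dists = np.linalg.norm(feats[:, None, :] - protos[None, :, :], axis=2)
    return dists.argmin(axis=1)


rng = np.random.default_rng(0)
labels = np.repeat(np.arange(2), 5)  # toy data: 2 classes, 5 samples each
feats = rng.normal(0.0, 1.0, size=(10, 4)) + 100.0 * labels[:, None]
protos = class_prototypes(feats, labels, n_classes=2)
loss = triplet_loss(feats, labels, protos)
pred = classify(feats, protos)
print(loss, pred.tolist())
```

In this toy case the two clusters are far more than the margin apart, so the loss is zero and every sample is assigned to its own class; the loss becomes positive only when anchors sit closer to another class than the margin allows.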

3. Experimental Verification

A CNN is selected as the backbone network, configured with batch normalization layers and LeakyReLU activation functions. The inner loop uses the SGD optimizer with a learning rate of 0.001 for 50 epochs, while the outer loop uses the Adam optimizer with a learning rate of 0.0001 for 100 epochs. Comparative methods include MAML, Borderline-SMOTE, ProtoNet, ProtoNet+MLTI [39], EprotoNet [40], MLDSO [41], and PMML [28], where ProtoNet+MLTI employs task interpolation to augment the original meta-training tasks; EprotoNet utilizes an attention mechanism and elastic factors to flexibly distinguish between different classes; MLDSO is a metric-based meta-learning method that excels under conditions of limited samples and variable operating conditions; and PMML enhances the discriminability of meta-features between prototypes and test samples by integrating Bidirectional LSTM (BiLSTM), thereby forming more discriminative metric features.

Two datasets were utilized in the experiments: the planetary gearbox dataset and the Prognostics and Health Management (PHM) Data Challenge 2009 Dataset. The former was employed to investigate cross-modal fault diagnosis from vibration to acoustic signals, with experimental results detailed in Section 3.1. The latter focused on cross-domain fault diagnosis based solely on vibration signals, with experimental results presented in Section 3.2. Cross-modal tasks validate the model’s ability to extract essential signal features, while single-modal tasks verify its stability with homogeneous data. Moreover, the two datasets differ in fault types: the former contains multiple single faults, while the latter includes compound faults involving bearings, gears, and shafts. Together, they cover the data diversity requirements of real-world industrial scenarios.

The system operated on Windows 11, with an Intel Core i7-13650HX CPU and NVIDIA GeForce RTX 4060 GPU. Computation leveraged PyTorch 2.1.2 + cu121 and Python 3.11.7.

3.1. Cross-Modal Fault Diagnosis

3.1.1. Dataset Description and Preprocessing: Planetary Gearbox Dataset

The experimental data was generated using our team’s planetary gearbox test platform (structural schematic shown in Figure 3). Vibration and acoustic signals under three operating conditions were selected for this study, encompassing signals representing seven fault types, along with healthy signals. The sampling frequency for all signals was consistently set at 10 kHz.

A schematic diagram of the test-bench structure and faults is shown in Figure 4. The planetary gearbox used in the test bench is a single-stage reduction gearbox. The sun gear has 21 teeth, the planet gear has 31 teeth, and the ring gear has 84 teeth, all with a module of 1.5. Both the planet gears and sun gear are made of 20CrMnTi carburized steel. The ring gear is made of alloy steel.

The dataset distribution is detailed in Table 1. The imbalance ratio (p) is defined as the ratio of samples per fault category to healthy samples, where the healthy samples in training are consistently set to 50 and each sample has a fixed length of 2048 data points. For the target domain, we tested two scenarios with imbalance ratios p of 0.1 and 0.2, which means that each fault category contains 5 and 10 samples, respectively. In the source domain, the imbalance ratio was set to 0.4. Multiplying this ratio by the number of healthy samples results in 20 samples per fault category in the source domain.
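The sample counts implied by the imbalance ratio can be checked with a few lines of arithmetic. The helper names below are hypothetical; the 50 healthy samples, the ratios 0.1, 0.2, and 0.4, and the seven fault types come from the text, and the number of samples to synthesize assumes that FWM raises each fault class to the healthy count, as stated in Section 2.1.

```python
HEALTHY_SAMPLES = 50  # healthy training samples, fixed in the experiments
FAULT_TYPES = 7       # seven fault types in the planetary gearbox dataset


def fault_samples_per_class(p, healthy=HEALTHY_SAMPLES):
    """Fault samples per category for imbalance ratio p."""
    return round(p * healthy)


def samples_to_synthesize(p, healthy=HEALTHY_SAMPLES):
    """Samples FWM must add per fault class to match the healthy count."""
    return healthy - fault_samples_per_class(p, healthy)


for p in (0.1, 0.2, 0.4):
    n_fault = fault_samples_per_class(p)
    total = HEALTHY_SAMPLES + FAULT_TYPES * n_fault
    print(p, n_fault, samples_to_synthesize(p), total)
# p = 0.1 -> 5 per fault class, 45 to synthesize, 85 training samples in total
# p = 0.2 -> 10 per fault class, 40 to synthesize, 120 in total
# p = 0.4 -> 20 per fault class, 30 to synthesize, 190 in total
```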

As shown in Table 2, three operating conditions were selected to construct cross-domain diagnostic tasks: 1500 rpm at 4 N·m, 2100 rpm at 2 N·m, and 2100 rpm at 4 N·m. In this experiment, vibration signals were utilized as training data in the source domain, while acoustic signals were adopted for both training and testing in the target domain. The six cross-modal fault diagnosis tasks, as detailed in Table 2, involve both the source and target domains suffering from scarcity of fault samples and class imbalance. The imbalance ratios are provided in columns 3 and 4 of Table 2, with the number of samples per fault category indicated in parentheses. During training, the source domain exclusively utilizes vibration signals, while the target domain relies solely on acoustic signals. The final model evaluation is conducted exclusively on the acoustic dataset of the target domain.

3.1.2. Experimental Results of the Planetary Gearbox Test Bench

The experimental results for the planetary gearbox dataset are presented in Table 3 and Table 4, with all experiments repeated eight times and the highest and lowest values excluded, while Figure 5 displays radar charts for all methods. These tables list the average accuracy and standard deviation of all methods, facilitating multi-angle methodological evaluation. At a target-domain imbalance ratio of 0.2, the proposed method achieved an average accuracy of 96.78%, representing a 7.70% improvement over the best baseline method (PMML), while also exhibiting the lowest standard deviation among all methods, confirming its stability. At a target-domain imbalance ratio of 0.1, the proposed method attained an average accuracy of 92.36%, outperforming PMML (the best comparative method) by 7.25% while maintaining a low standard deviation. Furthermore, architectural analysis reveals that both PMML and the proposed method—which conduct deeper processing of deep-level features—deliver superior performance, with average accuracy degradation controlled within 5% as the imbalance ratio changes. In contrast, MLDSO (which performed well in Table 3) suffered an approximately 10% decline in average accuracy due to reduced fault samples. This demonstrates that mining global dependencies within deep-level features significantly enhances the robustness of intelligent diagnostic methods under fault-scarce scenarios.

According to Figure 5, it is evident that the area occupied by the proposed method’s graph is the largest. At a target domain imbalance ratio of 0.2, the proposed method demonstrates advantages primarily in classification tasks C3 and C5. When the target-domain imbalance ratio is 0.1, the advantage of the proposed method over PMML narrows in tasks C5 and C1, but it shows clear superiority in tasks C2, C3, C4, and C6. Figure 6 displays the confusion matrices of all methods for task C2 on the planetary gearbox dataset; the values on the diagonal represent the ratio of correctly predicted samples to the total samples for each class, which is the recall rate. The recall rate is widely used in small-sample diagnostics to evaluate method performance. It is clear that categories labeled 1, 2, 3, 4, and 5—corresponding to faults such as broken and cracked planetary gears, pitting, and cracked/broken sun gears—are more challenging to classify. This is because planetary gears and sun gears are not in direct contact with the planetary gearbox housing, making the vibration and acoustic signals from these faults unable to transmit directly outward, causing characteristic energy attenuation. When combined with environmental noise in acoustic measurements, traditional methods struggle to capture subtle defects. Our approach overcomes this through DWCB and CSA. The proposed method achieves an average recall rate of over 85% for these difficult categories, proving its effectiveness in extracting subtle fault features from acoustic and vibration signals.
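The relationship between the confusion-matrix diagonal and the recall rate can be illustrated with a small example. The counts below are made up for illustration and are not taken from Figure 6; the sketch assumes rows index the true class and columns the predicted class.

```python
import numpy as np


def per_class_recall(conf):
    """Recall per class: correct predictions divided by true samples (row sums)."""
    conf = np.asarray(conf, dtype=float)
    return np.diag(conf) / conf.sum(axis=1)


# Hypothetical 3-class confusion matrix of raw counts
conf = np.array([[9, 1, 0],
                 [2, 7, 1],
                 [0, 0, 10]])
recall = per_class_recall(conf)
print(recall.tolist())  # [0.9, 0.7, 1.0]
```

When a confusion matrix is already row-normalized, as described for Figure 6, its diagonal can be read directly as the per-class recall.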

3.1.3. Ablation Experiment

To demonstrate the effectiveness of each module in the proposed method, this section conducts an ablation study. The Ab1 variant removes FWM from the proposed method. Building upon Ab1, Ab2 further replaces DWCB with a standard convolutional layer of identical kernel size and channel count. Ab3 additionally removes CSA. The ablation study is conducted on the planetary gearbox dataset with a target-domain imbalance ratio of 0.1. As shown in Table 5, the average accuracy falls from 92.36% for the full method to 91.79% for Ab1, 79.02% for Ab2, and 77.89% for Ab3. These results indicate that FWM, DWCB, and CSA all contribute to the proposed method, with the replacement of DWCB causing the largest drop.
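The cumulative structure of the ablation variants can be written down as a small configuration table. The sketch below is illustrative only: the `Variant` class and its flag names are hypothetical, the variants are read as cumulative (each builds on the previous one), and the accuracies are the average row of Table 5.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    use_fwm: bool        # frequency-domain weighted mixing augmentation
    use_dwcb: bool       # dual-stream wavelet convolution block (else standard conv)
    use_csa: bool        # CSA module after the convolutional network
    avg_accuracy: float  # average accuracy from Table 5 (%)

# Assumption: each variant builds on the previous one.
variants = {
    "Proposed": Variant(True, True, True, 92.36),
    "Ab1": Variant(False, True, True, 91.79),
    "Ab2": Variant(False, False, True, 79.02),
    "Ab3": Variant(False, False, False, 77.89),
}

# Accuracy lost at each step, in percentage points.
names = list(variants)
drops = {
    f"{prev} -> {cur}": round(variants[prev].avg_accuracy - variants[cur].avg_accuracy, 2)
    for prev, cur in zip(names, names[1:])
}
for step, drop in drops.items():
    print(f"{step}: {drop:.2f} points")
```

Because the variants are cumulative, each difference measures the effect of removing one module given that the earlier ones are already absent, not its isolated contribution.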

3.2. Cross-Domain Fault Diagnosis Under Single-Mode Data

3.2.1. Dataset Description and Preprocessing: PHM Data Challenge 2009

The Prognostics and Health Management (PHM) Data Challenge 2009 dataset is a vibration signal dataset focused on gearbox faults, integrating multiple fault types across gears, bearings, and transmission shafts. The gearbox structure and sensor positions used in this dataset are shown in Figure 7, and this study exclusively utilizes data from the input-end vibration sensor channel. Six operational states were selected from the dataset, with specific details provided in Table 6, where IS, ID, and OS denote input shaft, idler shaft, and output shaft, respectively. Although the PHM 2009 dataset contains only vibration data, it offers comprehensive coverage of fault locations and types, encompassing diverse compound fault scenarios.

As shown in Table 7, this paper performs single-mode, cross-operating-condition class imbalance experiments on this dataset. Input speeds of 30 Hz, 40 Hz, and 50 Hz are selected to form six cross-domain diagnostic tasks, with the imbalance ratio settings and healthy sample counts identical to those used for the planetary gearbox dataset. Experiments were repeated eight times, excluding the highest and lowest values.
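The six tasks in Table 7 are the ordered source/target pairs of the three input speeds, and the fault sample counts follow from the imbalance ratio p and the 50 healthy samples. A minimal sketch, with hypothetical names (`fault_samples`, `tasks`) chosen for illustration:

```python
from itertools import permutations

HEALTHY_SAMPLES = 50  # healthy training samples per domain, as in Table 7

def fault_samples(p, healthy=HEALTHY_SAMPLES):
    """Fault samples per class for imbalance ratio p (fault count = healthy * p)."""
    return round(healthy * p)

speeds_hz = [30, 40, 50]
# Ordered (source, target) pairs; the enumeration order matches C1 to C6 in Table 7.
tasks = {
    f"C{i}": pair
    for i, pair in enumerate(permutations(speeds_hz, 2), start=1)
}

for name, (source, target) in tasks.items():
    print(
        f"{name}: {source} Hz -> {target} Hz, "
        f"source faults/class = {fault_samples(0.4)}, "
        f"target faults/class = {fault_samples(0.1)} or {fault_samples(0.2)}"
    )
```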

3.2.2. Experimental Results of PHM Dataset

Experimental results for the PHM dataset are presented in Table 8 and Table 9, with Figure 8 providing radar chart visualization. Compared with the cross-modal scenario, single-mode fault diagnosis presents a notably lower challenge, as manifested by improved overall accuracy and a narrowed accuracy range from minimum to maximum. The proposed method outperforms all others, achieving average accuracies of 93.72% and 90.45% at target-domain imbalance ratios of 0.2 and 0.1, respectively, while also delivering the lowest average standard deviations. PMML, the strongest baseline, performs 5.76 and 8.99 percentage points worse than the proposed method under these two settings. Moreover, after halving the number of target-domain fault samples, the proposed method’s average accuracy decreased by only 3.27 percentage points, which is significantly less than for the other methods, demonstrating its reduced dependency on fault samples.

Figure 8 illustrates the radar chart experimental results for the PHM dataset, showing that the proposed method exhibits the smallest performance degradation after fault sample reduction. Compared to the best baseline, PMML, the proposed method demonstrates significant advantages in tasks C1 and C2. Figure 9 displays the confusion matrices of all methods for task C2 under a target-domain imbalance ratio of 0.1 on the PHM dataset, revealing that fault classes labeled 3, 4, and 5 present greater classification challenges. According to Table 6, these difficulties arise because such faults involve gears and bearings located far from the input shaft, so fault features must be transmitted indirectly through gear meshing, which leads to weakened and aliased characteristics. The proposed method achieved a recall rate of over 85% for these three challenging fault types, substantially outperforming the comparative methods.
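The degradation figures quoted above follow directly from the average rows of Table 8 and Table 9. The short sketch below recomputes the drop for every method; the dictionary and variable names are illustrative, while the numbers are copied from the two tables.

```python
# Average accuracy (%) on the PHM dataset: (imbalance ratio 0.2, imbalance ratio 0.1),
# taken from the "Avg" rows of Table 8 and Table 9.
avg_acc = {
    "BSMOTE": (73.13, 61.26),
    "Proto": (75.53, 65.43),
    "MAML": (69.06, 56.25),
    "EProto": (78.36, 68.31),
    "Proto+MLTI": (71.80, 61.96),
    "MLDSO": (85.69, 69.73),
    "PMML": (87.96, 81.46),
    "Ours": (93.72, 90.45),
}

# Drop in percentage points when the target-domain fault samples are halved.
drops = {method: round(high - low, 2) for method, (high, low) in avg_acc.items()}
most_robust = min(drops, key=drops.get)

for method, drop in sorted(drops.items(), key=lambda item: item[1]):
    print(f"{method:>10}: -{drop:.2f} points")
print("smallest drop:", most_robust)
```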

4. Conclusions

The scarcity of fault samples and the prevalence of class imbalance pose significant challenges for intelligent diagnosis of actuator-integrated gearboxes. To address these challenges, this paper proposes a multi-layer feature fusion meta-learning (MLFFML) method for class-imbalanced scenarios across operating conditions and modalities. The approach includes meta-training and meta-testing stages, utilizing known samples from the source and target domains to achieve rapid adaptation to the target domain under fault sample scarcity, which is particularly critical for driving system actuators with limited maintenance data. The method employs FWM data augmentation, DWCB, and CSA mechanisms. Evaluations on two datasets tested scenarios with five and ten fault samples per class, and the proposed method outperformed all compared baselines in average accuracy. On the planetary gearbox dataset, the method learned cross-domain knowledge between vibration and acoustic signals from industrial actuator assemblies, maintaining over 92% average accuracy, with confusion matrices indicating a recall rate of over 80% for samples with faint fault signals, which supports its robustness in actuator health monitoring. PHM dataset experiments demonstrated strong performance for compound faults involving bearings, gears, and drive-shaft actuators, sustaining over 90% average accuracy under variable actuation loads.

Current limitations involve interpretability deficiencies. While the DWCB extracts time-frequency features, subsequent deep feature extraction lacks explainability, representing a direction for future research. Emerging physics-informed neural networks incorporate prior physical knowledge as a loss term during model training to guide the learning process, enhancing physical consistency. Furthermore, expert knowledge can be integrated into the network to optimize feature extraction pathways, ultimately constructing a well-defined semantic space.

Author Contributions

Conceptualization, H.L. and M.L.; methodology, H.L. and Z.D.; validation, H.L. and M.L.; formal analysis, H.L., Z.C., Y.Y., G.S. and N.H.; investigation, Z.C.; resources, Y.Y.; data curation, H.L., M.L. and Z.D.; writing—original draft preparation, H.L. and M.L.; writing—review and editing, H.L. and H.X.; visualization, H.L. and Z.X.; supervision, G.S. and N.H.; project administration, Z.C. and Y.Y.; funding acquisition, Z.C. and Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

The authors are grateful for the financial support from the National Natural Science Foundation of China (Grant No. 52275140); the Young Elite Scientists Sponsorship Program by CAST (No. YESS20240414); the Natural Science Foundation of Hunan Province of China (Grant No. 2024JJ5408); and the National Key Laboratory Fund Project (No. 614XXXX2203).

Data Availability Statement

The detailed data supporting the results of this study are available from the corresponding authors upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CSA	Causal Self-Attention
DWCB	Dual-stream Wavelet Convolution Block
FWM	Frequency-Domain Weighted Mixing
MLFFML	Multi-Layer Feature Fusion Meta-Learning
PHM	Prognostics and Health Management

References

Zhang, L.; Fan, Q.; Lin, J.; Zhang, Z.; Yan, X.; Li, C. A nearly end-to-end deep learning approach to fault diagnosis of wind turbine gearboxes under nonstationary conditions. Eng. Appl. Artif. Intell. 2023, 119, 105735.
Liu, D.; Cui, L.; Cheng, W. A review on deep learning in planetary gearbox health state recognition: Methods, applications, and dataset publication. Meas. Sci. Technol. 2024, 35, 012002.
Chen, S.; Liu, Z.; He, X.; Zou, D.; Zhou, D. Multi-mode fault diagnosis datasets of gearbox under variable working conditions. Data Br. 2024, 54, 110453.
Wang, Y.Q.; Zhao, Y.P.; Liang, B.Y.; Zhang, T.D.; Hou, K.X. Weighted class-aware matching adaptation network for aero-engine imbalanced multi-source cross-domain fault diagnosis under class shift. Eng. Appl. Artif. Intell. 2025, 149, 110510.
Li, B.; Xue, S.K.; Fu, Y.H.; Tang, Y.D.; Zhao, Y.P. Kernel adapted extreme learning machine for cross-domain fault diagnosis of aero-engines. Aerosp. Sci. Technol. 2024, 146, 108970.
Feng, Z.; Gao, T.; Yu, X.; Zhang, Y.; Chen, X.; Yang, Y.; Du, M. Planet bearing fault diagnosis via double encoder signal analysis. Mech. Syst. Signal Process. 2025, 224, 111978.
Chen, Y.; Liu, X.; Rao, M.; Qin, Y.; Wang, Z.; Ji, Y. Explicit speed-integrated LSTM network for non-stationary gearbox vibration representation and fault detection under varying speed conditions. Reliab. Eng. Syst. Saf. 2025, 254, 110596.
Truong, G.B.; Tran, T.T.; Than, N.L.; Nguyen, V.Q.; Nguyen, T.H.; Pham, V.T. SC-MambaFew: Few-shot learning based on Mamba and selective spatial-channel attention for bearing fault diagnosis. Comput. Electr. Eng. 2025, 123, 110004.
Lian, Y.; Wang, J.; Li, Z.; Liu, W.; Huang, L.; Jiang, X. Residual attention guided vision transformer with acoustic-vibration signal feature fusion for cross-domain fault diagnosis. Adv. Eng. Inform. 2025, 64, 103003.
Li, Z.; Liu, C.; Huang, W.; Wang, F.; Yang, W. Fault diagnosis method based on multimodal-deep tensor projection network under variable working conditions. Mech. Syst. Signal Process. 2025, 225, 112336.
Wei, Z.; Wang, H.; Zhao, Z.; Zhou, Z.; Yan, R. Gearbox fault diagnosis based on temporal shrinkage interpretable deep reinforcement learning under strong noise. Eng. Appl. Artif. Intell. 2025, 139, 109644.
Yang, H.; Song, Y.; Wang, D.; Xing, J.; Li, Y. A global information-guided denoising diffusion probabilistic model for fault diagnosis with imbalanced data. Eng. Appl. Artif. Intell. 2025, 147, 110312.
Kim, T.; Ko, J.U.; Lee, J.; Chae Kim, Y.; Jung, J.H.; Youn, B.D. Spectrum-guided GAN with density-directionality sampling: Diverse high-fidelity signal generation for fault diagnosis of rotating machinery. Adv. Eng. Inform. 2024, 62, 102821.
Liu, D.; Zhong, S.; Lin, L.; Zhao, M.; Fu, X.; Liu, X. Feature-level SMOTE: Augmenting fault samples in learnable feature space for imbalanced fault diagnosis of gas turbines. Expert Syst. Appl. 2024, 238, 122023.
Dong, Z.; Jiang, Y.; Jiao, W.; Zhang, F.; Wang, Z.; Huang, J.; Wang, X.; Zhang, K. Double attention-guided tree-inspired grade decision network: A method for bearing fault diagnosis of unbalanced samples under strong noise conditions. Adv. Eng. Inform. 2025, 64, 103004.
Xu, Y.; Fan, R.Z.; He, Y.L.; Zhu, Q.X.; Zhang, Y.; Zhang, M.Q. DeepSMOTE with Laplacian matrix decomposition for imbalance instance fault diagnosis. Chemom. Intell. Lab. Syst. 2025, 259, 105338.
Chen, H.; Wei, J.; Huang, H.; Wen, L.; Yuan, Y.; Wu, J. Novel imbalanced fault diagnosis method based on generative adversarial networks with balancing serial CNN and Transformer (BCTGAN). Expert Syst. Appl. 2024, 258, 125171.
Liu, X.; Liu, S.; Xiang, J.; Sun, R. A transfer learning strategy based on numerical simulation driving 1D Cycle-GAN for bearing fault diagnosis. Inf. Sci. 2023, 642, 119175.
Liu, J.; Zhang, C.; Jiang, X. Imbalanced fault diagnosis of rolling bearing using improved MsR-GAN and feature enhancement-driven CapsNet. Mech. Syst. Signal Process. 2022, 168, 108664.
Gamel, S.A.; Ghoneim, S.S.; Sultan, Y.A. Improving the accuracy of diagnostic predictions for power transformers by employing a hybrid approach combining SMOTE and DNN. Comput. Electr. Eng. 2024, 117, 109232.
Wang, X.; Jiang, H.; Wu, Z.; Yang, Q. Adaptive variational autoencoding generative adversarial networks for rolling bearing fault diagnosis. Adv. Eng. Inform. 2023, 56, 102027.
Gao, X.; Zhang, Y.; Fu, J.; Li, S. Data augmentation using improved conditional GAN under extremely limited fault samples and its application in fault diagnosis of electric submersible pump. J. Frankl. Inst. 2024, 361, 106629.
Wang, H.; Zhou, H.; Chen, Y.; Yang, L.; Bi, W. Deep learning GAN-based fault detection and diagnosis method for building air-conditioning systems. Sustain. Cities Soc. 2025, 118, 106068.
Yusoff, M.; Mahmud, Y.; Azmi, P.A.R.; Sallehud-din, M.T.M. The improvement of SMOTE-ENN-XGBoost through Yeo Johnson strategy on Dissolved Gas Analysis dataset. Energy Rep. 2025, 13, 6281–6290.
Liu, K.; Zhao, X.; Hui, Y. An adaptive imbalance robust graph embedding broad learning system fault diagnosis for imbalanced batch processes data. Process Saf. Environ. Prot. 2024, 192, 694–706.
He, S.; Cui, Q.; Chen, J.; Pan, T.; Hu, C. Contrastive feature-based learning-guided elevated deep reinforcement learning: Developing an imbalanced fault quantitative diagnosis under variable working conditions. Mech. Syst. Signal Process. 2024, 211, 111192.
Yin, Y.; Zhang, L.; Shi, X.; Wang, Y.; Peng, J.; Zou, J. Improved Double Deep Q Network Algorithm Based on Average Q-Value Estimation and Reward Redistribution for Robot Path Planning. Comput. Mater. Contin. 2024, 81, 2769–2790.
Lin, L.; Zhang, S.; Fu, S.; Liu, Y.; Suo, S.; Hu, G. Prototype matching-based meta-learning model for few-shot fault diagnosis of mechanical system. Neurocomputing 2025, 617, 129012.
Wang, C.; Wang, N.; Deng, L. Sample less meta-learning fault diagnosis based on ordered time–frequency features. Eng. Appl. Artif. Intell. 2025, 154, 110881.
Chen, Y.; Yue, J.; Liu, Z.; Chen, J. A semi-supervised wise-attention weighted prototype network for rolling bearing fault diagnosis under noisy and limited labeled data conditions. Neurocomputing 2025, 647, 130563.
Bi, H.; Peng, T.; Han, J.; Cui, H.; Liu, L. Dynamic matching-prototypical learning for noisy few-shot relation classification. Knowl. Based Syst. 2025, 309, 112888.
Mu, M.; Jiang, H.; Wang, X.; Dong, Y. A task-oriented theil index-based meta-learning network with gradient calibration strategy for rotating machinery fault diagnosis with limited samples. Adv. Eng. Inform. 2024, 62, 102870.
Dong, Y.; Jiang, H.; Wang, X.; Mu, M. An interpretable integration fusion time-frequency prototype contrastive learning for machine fault diagnosis with limited labeled samples. Inf. Fusion 2025, 124, 103340.
Del Rosario Bautista-Morales, M.; Patiño-López, L. Acoustic detection of bearing faults through fractional harmonics lock-in amplification. Mech. Syst. Signal Process. 2023, 185, 109740.
Yao, Y.; Chen, Q.; Gui, G.; Yang, S.; Zhang, S. A hierarchical adversarial multi-target domain adaptation for gear fault diagnosis under variable working condition based on raw acoustic signal. Eng. Appl. Artif. Intell. 2023, 123, 106449.
Li, T.; Zhao, Z.; Sun, C.; Cheng, L.; Chen, X.; Yan, R.; Gao, R.X. WaveletKernelNet: An Interpretable Deep Neural Network for Industrial Intelligent Diagnosis. IEEE Trans. Syst. Man, Cybern. Syst. 2022, 52, 2302–2312.
Li, C.; Liang, M. A generalized synchrosqueezing transform for enhancing signal time–frequency representation. Signal Process. 2012, 92, 2264–2274.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2023, arXiv:1706.03762.
Yao, H.; Zhang, L.; Finn, C. Meta-Learning with Fewer Tasks through Task Interpolation. arXiv 2022, arXiv:2106.02695.
Luo, J.; Shao, H.; Lin, J.; Liu, B. Meta-learning with elastic prototypical network for fault transfer diagnosis of bearings under unstable speeds. Reliab. Eng. Syst. Saf. 2024, 245, 110001.
Zhang, D.; Zheng, K.; Bai, Y.; Yao, D.; Yang, D.; Wang, S. Few-shot bearing fault diagnosis based on meta-learning with discriminant space optimization. Meas. Sci. Technol. 2022, 33, 115024.

Figure 1. The overall framework diagram of the proposed method.

Figure 2. Structural diagram of DWCB.

Figure 3. Basic architecture of the planetary gearbox test bench.

Figure 4. Gearbox structure and schematic fault diagram. (a) Schematic diagram of the gearbox; (b) broken planetary gear; (c) planetary gear crack; (d) planetary gear pitting; (e) sun gear crack; (f) broken sun gear; (g) ring gear crack; (h) broken ring gear.

Figure 5. Radar charts of experimental results for the planetary gearbox dataset: (a) target-domain imbalance ratio of 0.2; (b) target-domain imbalance ratio of 0.1.

Figure 6. Confusion matrices of all methods for Task C2 on the planetary gearbox dataset: (a) BSMOTE; (b) ProtoNet; (c) MAML; (d) EProtoNet; (e) ProtoNet+MLTI; (f) MLDSO; (g) PMML; (h) proposed method.

Figure 7. Gearbox used on PHM Data Challenge 2009: (a) simplified diagram; (b) physical prototype.

Figure 8. Radar charts of experimental results for the PHM dataset: (a) target-domain imbalance ratio of 0.2; (b) target-domain imbalance ratio of 0.1.

Figure 9. Confusion matrices of all methods for Task C2 on the PHM dataset: (a) BSMOTE; (b) ProtoNet; (c) MAML; (d) EProtoNet; (e) ProtoNet+MLTI; (f) MLDSO; (g) PMML; (h) proposed method.

Table 1. The states contained in the planetary gearbox dataset.

Fault Location	Fault Type	Training Sample	Testing Sample	Label
None	Health	50	250	0
Planet gear teeth	Broken	50*p	250	1
Planet gear teeth	Crack	50*p	250	2
Planet gear teeth	Spall	50*p	250	3
Sun gear teeth	Crack	50*p	250	4
Sun gear teeth	Broken	50*p	250	5
Ring gear teeth	Crack	50*p	250	6
Ring gear teeth	Broken	50*p	250	7

Table 2. The operating conditions included in the planetary gearbox dataset.

Case	Source Domain	Target Domain	Source-Domain p and Fault Samples per Class	Target-Domain p and Fault Samples per Class	Healthy Samples
C1	1500 rpm 4 N·m	2100 rpm 2 N·m	0.4 (20)	0.1 (5)/0.2 (10)	50
C2	1500 rpm 4 N·m	2100 rpm 4 N·m	0.4 (20)	0.1 (5)/0.2 (10)	50
C3	2100 rpm 2 N·m	1500 rpm 4 N·m	0.4 (20)	0.1 (5)/0.2 (10)	50
C4	2100 rpm 2 N·m	2100 rpm 4 N·m	0.4 (20)	0.1 (5)/0.2 (10)	50
C5	2100 rpm 4 N·m	1500 rpm 4 N·m	0.4 (20)	0.1 (5)/0.2 (10)	50
C6	2100 rpm 4 N·m	2100 rpm 2 N·m	0.4 (20)	0.1 (5)/0.2 (10)	50

Table 3. Experimental results of the planetary gearbox test bench with a target-domain imbalance ratio of 0.2 (%).

Case	BSMOTE	Proto	MAML	EProto	Proto+MLTI	MLDSO	PMML	Ours
C1	63.28 ± 3.73	73.05 ± 3.14	79.73 ± 4.55	82.92 ± 2.85	74.92 ± 3.62	91.89 ± 1.50	92.25 ± 2.05	98.20 ± 1.07
C2	61.84 ± 1.41	70.13 ± 2.47	74.34 ± 2.23	72.40 ± 4.07	74.30 ± 1.32	94.06 ± 0.78	90.00 ± 1.38	97.28 ± 1.06
C3	61.11 ± 2.58	63.78 ± 3.77	66.15 ± 3.01	69.79 ± 9.23	79.95 ± 1.81	86.08 ± 1.70	83.75 ± 2.70	95.31 ± 1.84
C4	58.26 ± 4.48	65.10 ± 6.73	72.94 ± 2.33	82.92 ± 0.77	74.84 ± 2.62	88.21 ± 0.67	92.08 ± 0.79	97.62 ± 0.48
C5	58.69 ± 1.84	63.78 ± 2.84	61.30 ± 1.87	69.82 ± 10.25	78.96 ± 0.43	85.46 ± 1.30	84.63 ± 1.46	94.90 ± 2.22
C6	64.96 ± 2.37	69.14 ± 4.48	68.65 ± 1.83	72.29 ± 7.43	77.99 ± 4.56	87.05 ± 3.91	91.80 ± 1.31	97.36 ± 0.82
Avg	61.36 ± 2.74	67.50 ± 3.90	70.52 ± 2.64	75.02 ± 5.77	76.83 ± 2.39	88.79 ± 1.64	89.08 ± 1.61	96.78 ± 1.25

Table 4. Experimental results of the planetary gearbox test bench with a target-domain imbalance ratio of 0.1 (%).

Case	BSMOTE	Proto	MAML	EProto	Proto+MLTI	MLDSO	PMML	Ours
C1	42.01 ± 4.75	64.77 ± 2.60	68.60 ± 3.24	75.81 ± 1.87	66.67 ± 2.68	76.10 ± 2.91	91.23 ± 1.10	93.50 ± 0.57
C2	46.21 ± 2.02	58.72 ± 5.68	58.54 ± 0.79	68.36 ± 8.12	62.03 ± 1.66	79.79 ± 4.21	86.92 ± 1.71	94.39 ± 3.10
C3	46.77 ± 8.72	56.82 ± 5.53	51.52 ± 6.30	65.39 ± 2.77	67.11 ± 2.91	74.00 ± 2.98	77.39 ± 4.92	89.49 ± 2.81
C4	45.86 ± 3.88	58.96 ± 4.27	64.73 ± 2.55	78.93 ± 3.67	67.01 ± 1.00	81.51 ± 2.71	89.25 ± 0.77	94.67 ± 2.81
C5	38.70 ± 3.33	56.61 ± 2.39	44.04 ± 4.05	62.08 ± 2.41	70.10 ± 0.71	75.11 ± 4.03	78.17 ± 1.95	86.36 ± 1.95
C6	50.79 ± 2.54	64.64 ± 2.51	55.70 ± 2.13	59.45 ± 8.15	63.85 ± 2.43	83.49 ± 3.65	87.70 ± 5.15	95.77 ± 1.00
Avg	45.06 ± 4.21	60.09 ± 3.83	57.19 ± 3.18	68.34 ± 4.50	66.13 ± 1.90	78.33 ± 3.42	85.11 ± 2.60	92.36 ± 2.04

Table 5. The mean accuracy and standard deviation of the ablation study results.

	Proposed Method	Ab1	Ab2	Ab3
C1	93.50 ± 0.57	94.53 ± 1.48	83.24 ± 2.03	83.20 ± 2.36
C2	94.39 ± 3.10	92.93 ± 1.10	80.89 ± 8.38	81.24 ± 3.39
C3	89.49 ± 2.81	89.04 ± 0.69	76.11 ± 2.26	73.24 ± 2.64
C4	94.67 ± 2.81	95.57 ± 1.27	79.82 ± 2.44	81.80 ± 2.63
C5	86.36 ± 1.95	84.06 ± 2.05	72.00 ± 2.86	69.99 ± 3.58
C6	95.77 ± 1.00	94.63 ± 1.54	82.04 ± 3.06	77.85 ± 5.72
Avg	92.36 ± 2.04	91.79 ± 1.36	79.02 ± 3.51	77.89 ± 3.38

Table 6. PHM Data Challenge 2009.

Fault Type	Gear			Bearing			Input Shaft
Fault Type	32T	48T	80T	IS:IS	ID:IS	OS:IS	Input Shaft
1	Normal	Normal	Normal	Normal	Normal	Normal	Normal
2	Crack	Eccentricity	Normal	Normal	Normal	Normal	Normal
3	Normal	Eccentricity	Tooth Broken	Rolling Element	Normal	Normal	Normal
4	Crack	Eccentricity	Tooth Broken	Inner Ring	Rolling Element	Outer Ring	Normal
5	Normal	Normal	Tooth Broken	Inner Ring	Rolling Element	Outer Ring	Imbalance
6	Normal	Normal	Normal	Normal	Rolling Element	Outer Ring	Imbalance

Table 7. Detailed parameters of the PHM dataset.

Case	Source Domain	Target Domain	Source Domain p (Fault Samples per Class)	Target Domain p (Fault Samples per Class)	Healthy Samples
C1	30 Hz	40 Hz	0.4 (20)	0.1 (5)/0.2 (10)	50
C2	30 Hz	50 Hz	0.4 (20)	0.1 (5)/0.2 (10)	50
C3	40 Hz	30 Hz	0.4 (20)	0.1 (5)/0.2 (10)	50
C4	40 Hz	50 Hz	0.4 (20)	0.1 (5)/0.2 (10)	50
C5	50 Hz	30 Hz	0.4 (20)	0.1 (5)/0.2 (10)	50
C6	50 Hz	40 Hz	0.4 (20)	0.1 (5)/0.2 (10)	50

Table 8. Experimental results of the PHM dataset with a target-domain imbalance ratio of 0.2 (%).

Case	BSMOTE	Proto	MAML	EProto	Proto+MLTI	MLDSO	PMML	Ours
C1	72.64 ± 1.44	79.86 ± 5.16	70.17 ± 2.00	79.17 ± 2.49	75.21 ± 1.21	85.89 ± 3.19	88.47 ± 3.94	93.97 ± 1.78
C2	77.75 ± 2.31	83.61 ± 3.55	75.00 ± 1.99	89.24 ± 4.59	72.29 ± 2.79	89.28 ± 1.73	84.83 ± 4.33	97.39 ± 0.77
C3	66.39 ± 2.37	65.14 ± 2.59	68.22 ± 1.72	74.31 ± 3.76	67.88 ± 2.91	81.25 ± 2.00	87.61 ± 2.62	93.33 ± 2.57
C4	83.47 ± 1.76	77.88 ± 1.10	76.19 ± 4.35	79.97 ± 3.62	76.35 ± 2.08	87.44 ± 1.11	92.56 ± 0.76	94.89 ± 1.76
C5	66.11 ± 5.26	67.85 ± 4.52	72.28 ± 2.49	76.25 ± 5.02	66.60 ± 2.94	85.28 ± 1.67	87.81 ± 1.94	91.19 ± 1.22
C6	72.39 ± 4.78	78.85 ± 4.84	52.47 ± 4.22	71.25 ± 4.53	72.47 ± 2.29	85.03 ± 6.25	86.50 ± 4.04	91.56 ± 2.42
Avg	73.13 ± 2.99	75.53 ± 3.63	69.06 ± 2.79	78.36 ± 4.00	71.80 ± 2.37	85.69 ± 2.66	87.96 ± 2.94	93.72 ± 1.75

Table 9. Experimental results of the PHM dataset with a target-domain imbalance ratio of 0.1 (%).

Case	BSMOTE	Proto	MAML	EProto	Proto+MLTI	MLDSO	PMML	Ours
C1	60.19 ± 1.67	72.59 ± 3.82	54.72 ± 5.85	71.44 ± 3.43	67.34 ± 2.43	71.61 ± 5.36	75.65 ± 4.49	85.92 ± 3.35
C2	58.89 ± 4.84	71.11 ± 9.50	68.02 ± 5.82	73.03 ± 2.55	62.69 ± 3.97	73.61 ± 5.54	81.43 ± 1.73	92.31 ± 2.23
C3	56.52 ± 2.68	56.53 ± 3.33	55.46 ± 2.86	61.85 ± 2.18	56.69 ± 2.55	72.00 ± 1.77	81.54 ± 1.96	90.39 ± 1.24
C4	67.35 ± 1.98	68.52 ± 1.01	57.17 ± 3.01	73.59 ± 3.69	67.82 ± 3.38	69.48 ± 4.36	86.46 ± 2.30	94.92 ± 1.15
C5	58.94 ± 3.67	60.57 ± 4.45	51.89 ± 4.68	63.87 ± 6.35	56.34 ± 3.94	67.06 ± 3.09	79.37 ± 1.99	87.00 ± 3.26
C6	65.65 ± 3.18	63.24 ± 4.97	50.24 ± 1.43	66.09 ± 5.48	60.90 ± 8.14	64.59 ± 7.01	84.33 ± 2.45	92.19 ± 2.13
Avg	61.26 ± 3.00	65.43 ± 4.51	56.25 ± 3.94	68.31 ± 3.95	61.96 ± 4.07	69.73 ± 4.52	81.46 ± 2.49	90.45 ± 2.22

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
