Lightweight Adaptive Feature Compression and Dynamic Network Fusion for Rotating Machinery Fault Diagnosis Under Extreme Conditions
Abstract
1. Introduction
- Facing extreme operating conditions in which strong noise and class imbalance are coupled, the “K-means feature module clustering–workload-adaptive multi-scale convolution–self-attention dynamic GRU auto-encoder–balanced-subset F1-weighted voting” pipeline is, for the first time, embedded into a unified framework, yielding AFM-CDGAE, which achieves a Pareto-optimal trade-off between accuracy and lightweight design with 0.87 M parameters and 2.1 ms inference latency;
- A Workload-Adaptive Weight Rescaler is designed so that the multi-scale convolutional weights are modulated simultaneously by spatial attention and the real-time CPU load λ, balancing denoising and multi-scale fidelity while reducing edge-side peak CPU utilization from 78% to 42% and power consumption from 12.5 W to 7.8 W (a minimal sketch of this rescaling follows this list);
- A reconstruction–classification joint loss and BST-WVI ensemble strategy are proposed. Balanced subsets are constructed by permutation sampling, and weights are recalibrated via minority-class F1-weighted voting, improving Macro-F1 on Paderborn and CWRU by 7.7% and 9.5%, respectively, under a 1:100 extreme imbalance and restoring recall to 99% and 98%, breaking the systematic majority-class bias of traditional majority voting.
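As a complement to the second contribution, the following PyTorch sketch shows one way such a workload-adaptive rescaler could be realized. The module name, the kernel sizes, the softmax gating rule, and the idea of reading λ from a system monitor are illustrative assumptions rather than the paper's implementation.

```python
# Illustrative sketch only: names, kernel sizes, and the gating formula are assumptions.
import torch
import torch.nn as nn

class WorkloadAdaptiveMultiScaleConv(nn.Module):
    """Multi-scale 1-D convolution whose branch mixture is modulated by a
    spatial-attention map and a real-time load factor lam in [0, 1]."""

    def __init__(self, in_ch, out_ch, kernel_sizes=(3, 7, 15)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv1d(in_ch, out_ch, k, padding=k // 2) for k in kernel_sizes]
        )
        # Learnable mixing logit per branch (before load-dependent rescaling).
        self.branch_logits = nn.Parameter(torch.zeros(len(kernel_sizes)))
        # Lightweight spatial attention over the fused feature map.
        self.spatial_att = nn.Sequential(
            nn.Conv1d(out_ch, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x, lam):
        # lam is the measured CPU load in [0, 1] (e.g., obtained from a system
        # monitor such as psutil); a higher load shifts the mixture towards the
        # cheaper small-kernel branches.
        bias = torch.linspace(1.0, -1.0, steps=len(self.branches), device=x.device)
        weights = torch.softmax(self.branch_logits + lam * bias, dim=0)
        fused = sum(w * branch(x) for w, branch in zip(weights, self.branches))
        return fused * self.spatial_att(fused)  # spatial re-weighting
```

In this sketch a larger load factor λ shifts the softmax mixture toward the cheaper small-kernel branches, trading some multi-scale fidelity for lower compute, while the spatial-attention map re-weights the fused feature map.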
2. Theoretical Foundations
2.1. Adaptive Feature Module Clustering (AFM)
2.2. Workload-Adaptive Multi-Scale Convolution (WAMSC)
2.3. Self-Attention Dynamic Gated Auto-Encoder (Dynamic GRU Autoencoder, DGAE)
2.3.1. Encoder
2.3.2. Self-Attention Mechanism
2.3.3. Decoder
2.3.4. Loss Function
2.4. Classification-Reconstruction Joint Loss
2.4.1. Reconstruction Loss
2.4.2. Classification Loss
2.4.3. Joint Loss Function
2.5. Balanced Subset Training and Weighted Voting Integration
2.5.1. Balanced Subset Training
- Data partitioning: The training dataset is divided into multiple subsets, each containing the same number of minority-class and majority-class samples. This can be achieved through a clustering-based approach (a construction sketch follows this list);
- Subset training: A base model is trained independently on each balanced subset, ensuring that every base model learns from a relatively balanced data distribution and thereby reducing the bias toward the majority classes;
- Model integration: The classification results of all base models are integrated, via weighted voting, to obtain the final classification result.
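The following NumPy sketch illustrates one plausible way to build such balanced subsets by permutation sampling of the majority class, as also described in the Introduction; the function name, the random seed, and the trimming rule are assumptions, not the paper's exact procedure.

```python
# Illustrative sketch only: a simple permutation-sampling construction of balanced subsets.
import numpy as np

def build_balanced_subsets(X, y, minority_label, n_subsets, seed=0):
    """Pair every minority-class sample with a disjoint, equally sized chunk of
    permuted majority-class samples, yielding n_subsets balanced subsets."""
    rng = np.random.default_rng(seed)
    min_idx = np.where(y == minority_label)[0]
    maj_idx = rng.permutation(np.where(y != minority_label)[0])
    subsets = []
    for chunk in np.array_split(maj_idx, n_subsets):
        chunk = chunk[: len(min_idx)]           # trim the chunk to the minority size
        idx = np.concatenate([min_idx, chunk])
        subsets.append((X[idx], y[idx]))
    return subsets
```

With a 1:100 imbalance, choosing n_subsets close to the imbalance ratio yields roughly 1:1 subsets, each containing every minority sample and a disjoint slice of the majority class.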
2.5.2. Weighted Voting Integration
- Model training: Multiple independent models are trained, each on a different data subset or with a different algorithm;
- Classification weighting: For a new data point, each model produces an independent classification, and the results are weighted by the pre-defined weights;
- Result fusion: The weighted results are summed or averaged to obtain the final fused classification score.
Algorithm 1. Minority-F1 Weighted Voting for AFM-CDGAE Fault Diagnosis
Inputs: the K pre-trained DGAE base classifiers (ensemble members of AFM-CDGAE); a labeled calibration set; a single input sample for fault diagnosis.
Output: the predicted fault class of the input sample.
1: Initialize the K × C matrix of per-class voting weights.
2: for k = 1 to K do
3:   Predict the labels of the calibration set with the k-th base model.
4:   Compute the C-dimensional F1 score (one score per class).
5:   Store the per-class F1 scores as the weights of the k-th base model.
6: end for
7: Normalize the weights so that, for each class, the sum over the K models equals 1.
8: Initialize the fused probability vector.
9: for k = 1 to K do
10:  Obtain the soft-max probability of the test sample from the k-th model.
11:  Update the fused probability with the element-wise product of the k-th weights and the k-th probabilities.
12: end for
13: Output the class with the maximum fused probability.
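For illustration, the sketch below implements the fusion rule of Algorithm 1 with NumPy and scikit-learn, assuming each base classifier exposes scikit-learn-style predict and predict_proba methods; the clipping constant used to avoid division by zero is an added assumption.

```python
# Illustrative sketch of the F1-weighted voting in Algorithm 1.
import numpy as np
from sklearn.metrics import f1_score

def f1_weighted_vote(models, X_cal, y_cal, x_test, n_classes):
    """Fuse the soft-max outputs of K base classifiers using per-class F1
    weights estimated on a held-out calibration set (cf. Algorithm 1)."""
    # Per-class F1 score of each base model on the calibration set -> shape (K, C).
    W = np.stack([
        f1_score(y_cal, m.predict(X_cal), average=None, labels=list(range(n_classes)))
        for m in models
    ])
    # Normalize so that, for every class, the weights sum to 1 across the K models.
    W = W / np.clip(W.sum(axis=0, keepdims=True), 1e-12, None)
    # Accumulate the element-wise product of weights and class probabilities.
    fused = np.zeros(n_classes)
    for w_k, model in zip(W, models):
        fused += w_k * model.predict_proba(x_test.reshape(1, -1))[0]
    # Return the class with the maximum fused probability.
    return int(np.argmax(fused))
```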
3. The Network Structure and Fault Diagnosis Process of AFM-CDGAE
4. Experimental Verification and Comparative Analysis
4.1. Introduction to the Dataset
4.1.1. The Paderborn Public Dataset from Germany
4.1.2. The Public Dataset of Rolling Bearings from Case Western Reserve University (CWRU) in the United States
4.2. Experimental Analysis
4.2.1. Main Experiment Results
4.2.2. Comparative Experiment
4.2.3. Ablation Experiment
Remove AFM
Remove Load Adaptive Scaling
Remove Spatial Attention SAM
Remove the Reconstruction Branch
The Decisive Advantage of BST-WVI in Extremely Imbalanced Scenarios
Gradual Restoration Experiment: The “Final Push” Effect of BST-WVI
5. Conclusions and Prospects
5.1. Main Conclusions
- Feature compression and denoising: The AFM module compresses the 512-dimensional spectrum into 32/48-dimensional “feature modules” through K-means clustering, achieving over 90% dimensionality compression while retaining 98.4% of the fault energy and significantly reducing input redundancy and computational load (a minimal compression sketch follows this list);
- Load-adaptive multi-scale convolution: WAMSC introduces spatial attention, λ-scale real-time load scaling, and channel–height–width triaxial collaborative attention. On Jetson Xavier NX, the CPU peak utilization rate is reduced from 78% to 42%, the power consumption is reduced from 12.5 W to 7.8 W, and background noise is effectively suppressed;
- Lightweight and high-performance inference: The network has only 0.87 M parameters and 2.1 ms (RTX 3070)/7.3 ms (Jetson Xavier NX, INT8) inference latency, meeting the requirements of real-time edge monitoring;
- Extreme-condition robustness: Under the four extreme conditions of only 5% labeled training data, 10 dB noise, 100:1 class imbalance, and ±20% speed/load drift, Macro-F1 decreased by only 1.5%/1.8% (Paderborn/CWRU), while the average decline of the five latest baselines was 6.7%;
- Breakthrough of the imbalance bottleneck: Through balanced subset training and F1-weighted voting, BST-WVI raises the Macro-F1 of the two datasets in the extremely imbalanced 1:100 scenario by 7.7% (Paderborn) and 9.5% (CWRU), respectively, and the recall rates simultaneously recover to 99% and 98%. The experiments confirm that BST-WVI is the key “final push” module.
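As referenced in the first bullet, the following sketch shows one plausible realization of the AFM compression step: K-means groups the 512 FFT bins into feature modules, and each module is summarized by its energy. Clustering the bins by their cross-sample amplitude profiles is an assumption; the paper's AFM may use different bin descriptors.

```python
# Illustrative sketch of K-means-based feature-module compression.
import numpy as np
from sklearn.cluster import KMeans

def afm_compress(S, n_modules=32, seed=0):
    """S: (n_samples, 512) FFT magnitude spectra -> (n_samples, n_modules)."""
    # Cluster the 512 frequency bins using their amplitude profile across samples.
    labels = KMeans(n_clusters=n_modules, n_init=10, random_state=seed).fit_predict(S.T)
    # Aggregate the energy of the bins assigned to each feature module.
    modules = np.stack(
        [np.square(S[:, labels == k]).sum(axis=1) for k in range(n_modules)],
        axis=1,
    )
    return modules, labels
```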
5.2. Summary of Innovation Points
- For the first time, the “K-means feature module clustering–load-adaptive multi-scale convolution–self-attention dynamic gated auto-encoding” pipeline is embedded into a unified framework, achieving Pareto optimality in terms of accuracy and lightweight design;
- A Workload-Adaptive Weight Rescaler is proposed, enabling the convolutional weights to be dynamically modulated by spatial attention and the real-time CPU load λ simultaneously, taking both denoising and multi-scale fidelity into account;
- A reconstruction–classification joint loss combined with minority-class F1-weighted voting is designed (a loss sketch follows this list), breaking the traditional majority-voting bias toward the majority class and significantly improving the recognition rate of weak faults in extremely imbalanced scenarios.
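A minimal sketch of a reconstruction–classification joint loss of the kind described above is given below; the equal default weighting α = 0.5 and the specific MSE-plus-cross-entropy combination are assumptions consistent with Sections 2.4.1–2.4.3, not the paper's exact formulation.

```python
# Illustrative sketch only: one plausible reconstruction-classification joint loss.
import torch
import torch.nn.functional as F

def joint_loss(x, x_recon, logits, targets, alpha=0.5):
    """Weighted sum of a reconstruction (MSE) term and a classification (CE) term."""
    recon = F.mse_loss(x_recon, x)            # reconstruction branch
    clf = F.cross_entropy(logits, targets)    # classification branch
    return alpha * recon + (1.0 - alpha) * clf
```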
5.3. Method Limitations
5.3.1. Hardware Limitations
5.3.2. Systematic Elaboration of Limitations
- Sample size dependence: The model requires a minimum of 40 to 50 samples for minority classes; performance degrades significantly with fewer samples.
- Complex noise adaptability: Under non-Gaussian/non-stationary noise (e.g., 15 dB impulsive noise), the Macro-F1 decreases by more than 5%, which is worse than under Gaussian noise.
- Low-power hardware adaptation: The model cannot meet real-time requirements on ultra-low-power platforms (e.g., Raspberry Pi Zero W) and requires further optimization.
- Calibration overhead: Recalibration is needed for large-scale operating condition changes, increasing maintenance cost in long-term deployment.
5.4. Future Outlook
- Multimodal expansion: Uniformly map multiple physical quantities such as vibration, current, acoustic emission, and temperature to a shared hidden space, further enhancing the cross-sensor generalization capability;
- Adaptive clustering: Research online incremental K-means or contrastive clustering to enable AFM to continuously update feature modules in streaming data scenarios and avoid concept drift;
- Causal explainability: Introduce gradient causality graphs (Grad-CAM-GC) or Shapley values to quantify the contribution of each frequency band to diagnostic decisions, meeting the auditing requirements of safety-critical domains;
- Federated deployment: By integrating federated learning and differential privacy, collaborative training of wind turbine clusters is achieved without data leaving the factory, addressing issues of data silos and privacy compliance;
- Fault prediction: By introducing a temporal Transformer prediction head on top of the existing diagnostic framework, end-to-end joint optimization of “diagnosis–remaining life prediction” is achieved, providing a more complete decision-making basis for predictive maintenance.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
AFM: Adaptive Feature Module
CDGAE: Conditional Dynamic GRU Auto-Encoder
AFM-CDGAE: Adaptive Feature Module–Conditional Dynamic GRU Auto-Encoder
WAMSC: Workload-Adaptive Multi-Scale Convolution
DGAE: Dynamic GRU Auto-Encoder
DGRUAE: Dynamic GRU Auto-Encoder (also written as DGAE)
GRU: Gated Recurrent Unit
CNN: Convolutional Neural Network
FFT: Fast Fourier Transform
MSE: Mean Square Error
BST-WVI: Balanced Subset Training and Weighted Voting Integration
CWRU: Case Western Reserve University (bearing dataset)
t-SNE: t-distributed Stochastic Neighbor Embedding
Grad-CAM: Gradient-weighted Class Activation Mapping
Grad-CAM++: Improved Gradient-weighted Class Activation Mapping
AUC: Area Under the ROC Curve
AUPRC: Area Under the Precision–Recall Curve
INT8: 8-bit Integer quantization
SMOTE: Synthetic Minority Over-sampling Technique
GAN: Generative Adversarial Network
VAE: Variational Auto-Encoder
DANN: Domain-Adversarial Neural Network
TrAdaBoost: Transfer AdaBoost
EEMD: Ensemble Empirical Mode Decomposition
CEEMDAN: Complete Ensemble Empirical Mode Decomposition with Adaptive Noise
SEHT: Signal-Enhanced Hilbert Transform
DITSVD: Doubly Improved Truncated Singular Value Decomposition
JR-TFViT: Lightweight Jamming Recognition–Time–Frequency Vision Transformer
ADAGCN: Adaptive Graph Convolutional Network
ResNet: Residual Network
EfficientNet: Efficient Convolutional Neural Network architecture
TS-TCC: Time-Series Temporal Contrastive Coding
CLR: Cyclical Learning Rate
References
- Khan, S.; Yairi, T. A review on the application of deep learning in system health management. Mech. Syst. Signal Process. 2018, 107, 241–265.
- Guo, P.; Huang, W.; Ding, C.; Shi, J.; Zhu, Z. A novel sparse-aware contrastive learning network with adaptive gating neurons for extreme class imbalance diagnosis scenarios. Mech. Syst. Signal Process. 2025, 235, 112895.
- Li, X.; Wang, Y.; Zhao, S.; Yao, J.; Li, M. Adaptive Convergent Visibility Graph Network: An interpretable method for intelligent rolling bearing diagnosis. Mech. Syst. Signal Process. 2025, 222, 111761.
- Wen, W.; Bai, Y.; Hu, F.; Cheng, W. Intelligent fault diagnosis based on receptive field of DCNN for rotary machine under variable conditions. Procedia Manuf. 2020, 49, 119–125.
- Bin, L.; Jian, G. JR-TFViT: A lightweight efficient radar jamming recognition network based on global representation of the time–frequency domain. Electronics 2022, 11, 2794.
- Mao, J.; Sun, L.; Chen, J.; Yu, S. A parallel image denoising network based on nonparametric attention and multiscale feature fusion. Sensors 2025, 25, 317.
- Zhang, Y.; Su, C.; He, X.; Tang, J.; Xie, M.; Liu, H. Progressive hybrid hypergraph attention network with channel information fusion for remaining useful life prediction of rolling bearings. Mech. Syst. Signal Process. 2025, 236, 112987.
- Liu, Y.; Wang, Y.; Zhao, H.; Shi, X.; Xie, D.; Gao, Z. An early weak fault assessment method for rolling bearings based on adaptive frequency focusing and multi-level activation quantum-inspired neural network. Measurement 2025, 255, 118039.
- Sun, Q.; Tang, Y. Singularity analysis using continuous wavelet transform for bearing fault diagnosis. Mech. Syst. Signal Process. 2002, 16, 1025–1041.
- Zhang, J.; Zhao, Z.; Jiao, Y.; Zhao, R.; Hu, X.; Che, R. DPCCNN: A new lightweight fault diagnosis model for small samples and high noise problem. Neurocomputing 2025, 626, 129526.
- Hao, J.; Lv, Y.; Liu, J.; Liu, Y.-C. Dynamic weighted multimodal fusion for fault diagnosis of marine rotating machinery under noisy and low-sample conditions. Ocean Eng. 2025, 339, 122082.
- Liu, Z.; Li, M. An early fault characteristics analysis method of reactor canned motor pump based on signal enhancement Hilbert transform and complete ensemble empirical mode decomposition with optimized adaptive noise. Measurement 2025, 255, 118035.
- Lee, G.J.; Kim, S.K.; Lee, H.J. Sound-based unsupervised fault diagnosis of industrial equipment considering environmental noise. Sensors 2024, 24, 7319.
- Rajagopalan, S.; Purohit, A.; Singh, J. Genetically optimised SMOTE-based adversarial discriminative domain adaptation for rotor fault diagnosis at variable operating conditions. Meas. Sci. Technol. 2024, 35, 105027.
- Hu, C.; Deng, R.; Hu, X.; He, M.; Zhao, H.; Jiang, X. An automatic methodology for lithology identification in a tight sandstone reservoir using a bidirectional long short-term memory network combined with Borderline-SMOTE. Acta Geophys. 2024, 73, 1–17.
- Xiao, L.; Feng-Liang, Z. Classification of multi-type bearing fault features based on semi-supervised generative adversarial network (GAN). Meas. Sci. Technol. 2024, 35, 025014.
- He, W.; Chen, J.; Zhou, Y.; Liu, X.; Chen, B.; Guo, B. An intelligent machinery fault diagnosis method based on GAN and transfer learning under variable working conditions. Sensors 2022, 22, 9175.
- Wang, Y.; Li, D.; Li, L.; Sun, R.; Wang, S. A novel deep learning framework for rolling bearing fault diagnosis enhancement using VAE-augmented CNN model. Heliyon 2024, 10, e35407.
- Khan, M.A.; Asad, B.; Vaimann, T.; Kallaste, A.; Pomarnacki, R.; Hyunh, V.K. Improved fault classification and localization in power transmission networks using VAE-generated synthetic data and machine learning algorithms. Machines 2023, 11, 963.
- Yang, M.; Yuncheng, J.; Jinfeng, H. Application of transfer learning in fault diagnosis of seawater hydraulic pumps. J. Vib. Shock 2020, 2020, 1–8.
- Chen, J.; Ling, J.; Lei, N.; Li, L. BDSER-InceptionNet: A novel method for near-infrared spectroscopy model transfer based on deep learning and balanced distribution adaptation. Sensors 2025, 25, 4008.
- Xu, F.; Zhang, R. Explainable domain adaptation learning framework for credit scoring in internet finance through adversarial transfer learning and ensemble fusion model. Mathematics 2025, 13, 1045.
- Zhang, G.; Kong, X.; Ma, H.; Wang, Q.; Du, J.; Wang, J. Dual disentanglement domain generalization method for rotating machinery fault diagnosis. Mech. Syst. Signal Process. 2025, 228, 112460.
- Wei, J.; Wang, Q.; Zhang, G.; Ma, H.; Wang, Y. Domain knowledge guided pseudo-label generation framework for semi-supervised domain generalization fault diagnosis. Adv. Eng. Inform. 2025, 67, 103540.
- Kumar, S.; Sinha, B.B. Enhanced fault diagnosis of rolling bearings with noise filtering and neural networks. J. Vib. Eng. Technol. 2025, 13, 411–425.
- Jiang, L.; Dong, F.; Xu, S.; Yin, B. Fault diagnosis method for proton exchange membrane fuel cells based on the fusion of deep learning and ensemble learning. J. Energy Eng. 2025, 151, 04025018.
- Long, C.; Yu, T.; Feng, G.; Wang, T. Research on air compressor fault detection algorithm based on ensemble learning. Eng. Lett. 2025, 33, 1–6.
- Li, X.; Gu, J.; Li, M.; Zhang, X.; Guo, L.; Wang, Y.; Lyu, W.; Wang, Y. Adaptive expert ensembles for fault diagnosis: A graph causal framework addressing distributional shifts. Mech. Syst. Signal Process. 2025, 234, 112762.
- Zhao, Z.; Jin, Z.; Xin, X.; Fu, Y.; Huang, X.; Li, L.; Qin, H.; Wei, C.; Li, Y.; Liu, Y. Cross-domain fault diagnosis of marine diesel engines based on stepwise diffusion and iterative bidirectional optimization. Eng. Appl. Artif. Intell. 2025, 155, 110994.
- Song, X.; Wu, C.; Song, S.; Stojanovic, V.; Tejado, I. Fuzzy wavelet neural adaptive finite-time self-triggered fault-tolerant control for a quadrotor unmanned aerial vehicle with scheduled performance. Eng. Appl. Artif. Intell. 2024, 131, 107832.
- Xia, P.; Huang, Y.; Wang, Y.; Liu, C.; Liu, J. Augmentation-based discriminative meta-learning for cross-machine few-shot fault diagnosis. Sci. China Technol. Sci. 2023, 66, 1698–1716.
- Rezazadeh, N.; De Oliveira, M.; Lamanna, G.; Perfetto, D.; De Luca, A. WaveCORAL-DCCA: A Scalable Solution for Rotor Fault Diagnosis Across Operational Variabilities. Electronics 2025, 14, 3146.
- Smith, W.A.; Randall, R.B. Rolling element bearing diagnostics using the Case Western Reserve University data: A benchmark study. Mech. Syst. Signal Process. 2015, 64–65, 100–131.
- Lessmeier, C.; Kimotho, J.K.; Zimmer, D.; Sextro, W. Condition Monitoring of Bearing Damage in Electromechanical Drive Systems by Using Motor Current Signals of Electric Motors: A Benchmark Data Set for Data-Driven Classification. In Proceedings of the European Conference of the Prognostics and Health Management Society, Bilbao, Spain, 6–8 July 2016.
- Rezazadeh, N.; De Luca, A.; Perfetto, D.; Salami, M.R.; Lamanna, G. Systematic Critical Review of Structural Health Monitoring Under Environmental and Operational Variability: Approaches for Baseline Compensation, Adaptation, and Reference-Free Techniques. Smart Mater. Struct. 2025, 34, 073001.