Bearings are critical components in modern industrial machinery, directly influencing system stability, efficiency, and operational safety. Prolonged service in high-load, high-speed environments such as wind turbines, high-speed trains, and aerospace systems inevitably induces bearing failure modes including wear, fatigue, and cracking. These failures not only trigger unplanned downtime with significant economic losses but may also escalate into catastrophic safety incidents. Consequently, developing precise early-stage bearing fault diagnosis technologies holds substantial engineering value for ensuring continuous production, optimizing maintenance strategies, and enhancing equipment reliability. Traditional diagnostic approaches that rely on time-domain statistical features, frequency-domain Fourier analysis, and time-frequency transformations have inherent limitations when processing non-stationary, nonlinear vibration signals; in particular, they lack robustness for incipient weak faults under strong noise interference. Recent advances in artificial intelligence have enabled machine learning algorithms such as support vector machines and artificial neural networks to show significant potential in fault classification. For signal processing, empirical mode decomposition adaptively decomposes non-stationary signals into intrinsic mode functions but suffers from mode mixing. Ensemble and complete ensemble empirical mode decomposition partially address this limitation through Gaussian white noise-assisted ensemble averaging, whereas variational mode decomposition reformulates decomposition as a variational optimization problem, offering a rigorous theoretical foundation with superior noise immunity and mode separation. Regarding feature extraction, entropy-based complexity metrics are particularly vital.
Permutation entropy quantifies signal complexity through ordinal pattern probability; multiscale permutation entropy extends the analysis across temporal scales; weighted permutation entropy and multiscale weighted permutation entropy enhance local feature representation via weighting factors; refined composite multiscale weighted permutation entropy further integrates multiscale analysis, weighted entropy, and composite refinement to significantly boost feature robustness and noise immunity. For classifier construction, support vector machines excel in small-sample nonlinear classification yet require careful parameter optimization. Intelligent optimization techniques, including the whale optimization algorithm, particle swarm optimization, and genetic algorithms, effectively address this challenge. To handle high-dimensional features, t-distributed stochastic neighbor embedding enables nonlinear dimensionality reduction while preserving primarily the local structure of the data, providing critical support for model visualization and construction.
Current research on fault diagnosis for wind turbine gearbox bearings primarily encompasses three methodological domains: signal processing, feature extraction, and pattern recognition. Regarding signal preprocessing, empirical mode decomposition (EMD) [1,2,3,4], ensemble empirical mode decomposition (EEMD) [5,6], and variational mode decomposition (VMD) [7,8,9,10,11,12] are extensively applied to enhance the signal-to-noise ratio, mitigate noise interference, and reduce mode mixing. Within feature extraction, information entropy-based approaches [13,14,15] and deep learning methodologies [16,17,18,19] have been widely investigated and implemented. For classification under limited sample size and noise contamination, where improving accuracy and efficiency is paramount, support vector machine (SVM) algorithms [20,21,22] meet the classification requirements, achieving intelligent classification while enhancing both diagnostic efficiency and precision.
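The entropy measures described above can be made concrete with a short sketch. The following pure-Python example is illustrative only and is not the implementation used in this study: it assumes variance-based weights for the weighted variant and one common refined composite formulation (averaging pattern probabilities over the shifted coarse-grained series at each scale). All function names, as well as the defaults for the embedding dimension `m`, delay `tau`, and `max_scale`, are hypothetical.

```python
import math
from collections import defaultdict


def ordinal_pattern(window):
    """Return the ordinal pattern of a window (indices sorted by value)."""
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))


def pattern_weights(x, m=3, tau=1, weighted=False):
    """Accumulate pattern counts; if weighted, each window counts by its variance."""
    counts = defaultdict(float)
    for start in range(len(x) - (m - 1) * tau):
        window = [x[start + j * tau] for j in range(m)]
        if weighted:
            mean = sum(window) / m
            w = sum((v - mean) ** 2 for v in window) / m
        else:
            w = 1.0
        counts[ordinal_pattern(window)] += w
    return counts


def entropy_from_counts(counts, m):
    """Shannon entropy of the pattern distribution, normalized by log(m!)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)
    return h / math.log(math.factorial(m))


def permutation_entropy(x, m=3, tau=1, weighted=False):
    """Plain (weighted=False) or weighted (weighted=True) permutation entropy."""
    return entropy_from_counts(pattern_weights(x, m, tau, weighted), m)


def coarse_grain(x, scale, offset=0):
    """Average non-overlapping blocks of length `scale`, starting at `offset`."""
    n_blocks = (len(x) - offset) // scale
    return [
        sum(x[offset + i * scale: offset + (i + 1) * scale]) / scale
        for i in range(n_blocks)
    ]


def refined_composite_mwpe(x, m=3, tau=1, max_scale=3):
    """One entropy value per scale, pooling probabilities over shifted series."""
    values = []
    for scale in range(1, max_scale + 1):
        pooled = defaultdict(float)
        for offset in range(scale):
            counts = pattern_weights(coarse_grain(x, scale, offset), m, tau, weighted=True)
            total = sum(counts.values())
            if total == 0:
                continue  # constant series carries no weighted pattern information
            for pattern, weight in counts.items():
                pooled[pattern] += weight / total / scale
        values.append(entropy_from_counts(pooled, m))
    return values


# Example: a two-tone test signal standing in for a vibration record
signal = [math.sin(0.3 * i) + 0.5 * math.sin(1.7 * i) for i in range(200)]
features = refined_composite_mwpe(signal, m=3, tau=1, max_scale=3)
```

The resulting list holds one normalized entropy value per scale and could serve as a feature vector for a classifier; the appropriate `m`, `tau`, and number of scales depend on the signal and are not prescribed here.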
Numerous scholars have contributed significantly to research on wind turbine bearing fault diagnosis. Maohua Xiao et al. [23] demonstrated that introducing the improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) method yielded effective dimensionality reduction; combined with information entropy algorithms, it efficiently separated bearing vibration signal features for identifying different vibration signals. Runze Qi et al. [24] applied ICEEMDAN to dimensionality reduction of transformer mechanical fault signals, finding that it enhanced reduction effectiveness, suppressed noise, and yielded purer fault feature data. Xiao Yang et al. [25] investigated cross-domain fault diagnosis using refined composite multiscale weighted permutation entropy (RCMWPE), which enhanced noise resistance, avoided information loss, and captured weak fault features through refined composite multiscale weighted calculation of minimally analyzed coarse-grained sequences. Wei Sun et al. [26] integrated the whale optimization algorithm (WOA) with SVM to study partial discharge faults in power plants by building virtual models, training on the data, and identifying fault types. They found that an improved WOA (IWOA) had a significant impact on the VMD and SVM parameters, and that the IWOA-VMD-SVM combination markedly improved accuracy and operational speed. Bing Wang et al. [27] proposed combining a whale swarm algorithm-optimized multilayer SVM with multiscale Kolmogorov entropy for rolling bearing fault signals, achieving 97.8% diagnostic accuracy. Zhen Wang et al. [28] optimized VMD and t-distributed stochastic neighbor embedding (t-SNE), achieving 100% accuracy in parallel-axis gearbox fault diagnosis. Wei He et al. [29] combined singular value decomposition (SVD), t-SNE, and SVM, developing a novel feature extraction path and a parametric t-SNE to eliminate irrelevant information and enhance fault severity separability through nonlinear projection. Jiawei Liu et al. [30] combined deep learning with t-SNE for sequential fault diagnosis in proton exchange membrane fuel cell water management subsystems, achieving 96.88% accuracy and broadening the application scope of this combined diagnostic approach. Smith et al. [31] proposed the local mean decomposition (LMD) method. Ma et al. [32] proposed the modified VMD and Teager energy operator (MVMD-TEO) method, which autonomously determines the VMD mode number to extract incipient fault features of bearings. Yang et al. [33] determined VMD parameters by computing the central frequency components and the envelope spectrum energy ratios between different components. Li et al. [34] proposed an estimation approach for the quadratic penalty term; the method analyzed the spectral distribution patterns of bearing vibration signals, ultimately enabling adaptive determination of the parameters. Li et al. [35] employed a deeply stacked least squares support vector machine (LSSVM) algorithm to acquire intrinsic fault characteristics of rolling bearings through adaptive feature extraction. Ding et al. [36] employed a genetically variant particle swarm optimization (GVPSO) algorithm to determine the decomposition mode number and penalty factor of VMD. Gear fault signals were decomposed into multiple intrinsic mode functions (IMFs), the sample entropy of each IMF was calculated as a feature value, and these features were then input into a probabilistic neural network (PNN) to classify gear faults. Nazari et al. [37] proposed a successive variational mode decomposition (SVMD) method. This approach achieved modal decomposition by imposing a series of constraints on the VMD optimization problem, eliminating the need to preset the number of decomposition modes K. It significantly reduced computational complexity, adaptively partitioned the frequency domain, and exhibited greater robustness to the initialization of modal center frequencies. Yan et al. [20] employed a particle swarm optimization (PSO)-tuned SVM to recognize multiple fault states in rolling bearings. Song et al. [38] utilized wavelet packet threshold denoising combined with a BP neural network to enhance the quality of bearing fault signals and achieve efficient diagnosis.
Current research efforts focus on four primary technical routes: deep learning, time-frequency analysis, information entropy, and traditional machine learning. Deep learning relies heavily on massive labeled data, incurs high training costs, involves complex models, and exhibits limited generalization under small samples. Despite the widespread adoption of signal processing techniques (e.g., EMD, EEMD, VMD, and LMD) in bearing fault diagnosis, their inherent limitations under noisy industrial environments warrant critical scrutiny. Empirical decomposition methods (EMD/EEMD) are pronouncedly sensitive to noise interference, which induces mode mixing that obscures fault-related frequency components and distorts the decomposition. While VMD theoretically circumvents mode mixing via variational constraints, its efficacy hinges critically on preset parameters (mode number K and penalty factor α), which do not adapt to signal heterogeneity. Similarly, entropy-based features (e.g., MPE and WPE) are sensitive to signal length, become unstable under strong noise, and incur high computational overhead in multiscale implementations, limiting their robustness for weak fault detection. These constraints collectively impede reliable feature extraction, necessitating integrated approaches to enhance noise immunity and parameter autonomy.
The performance of traditional classifiers such as SVM depends strongly on the penalty factor and the kernel function; parameter tuning is therefore difficult, and poorly chosen values lead to overfitting or reduced accuracy. While scholars globally have achieved significant results in wind turbine bearing fault diagnosis, challenges persist, including signal noise interference and difficulties in acquiring high-quality operational data. Consequently, integrating multiple research methodologies is essential to enhance diagnostic precision and reliability, ensuring the stable and efficient operation of wind turbines.
While deep learning architectures (such as convolutional neural networks (CNNs) and transformers) excel in automated feature representation, they confront persistent challenges in industrial fault diagnosis applications. These challenges arise from a fundamental conflict: their substantial dependence on large-scale labeled datasets is at odds with the inherent scarcity of actual fault samples. This limitation is exacerbated by restricted cross-domain generalization, which manifests as significant performance deterioration under varying operational conditions, and is further compounded by computational costs that impede real-time deployment on resource-constrained edge devices. Collectively, these constraints call for hybrid methodologies that incorporate prior knowledge of signal processing to achieve robust feature extraction while remaining computationally tractable. In contrast, the approach presented in this study, which integrates feature extraction with an optimized SVM, offers distinct advantages in structural simplicity, training efficiency, effectiveness with limited samples, and deployment practicality. These characteristics make it particularly well suited to scenarios characterized by data scarcity or demanding real-time processing requirements.
Consequently, this research focuses on enhancing the robustness and diagnostic accuracy of traditional modeling paradigms, ultimately seeking to deliver an efficient and pragmatically viable solution for engineering applications operating under low-resource constraints.
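The dependence of SVM performance on kernel parameters noted above can be illustrated with a small sketch. The example below assumes a Gaussian (RBF) kernel with width parameter `gamma`, which is one common choice and is not confirmed here as the kernel used in this study; the function name and sample points are hypothetical. It shows the two degenerate regimes that parameter optimization has to avoid: a very small `gamma` makes all samples look alike, whereas a very large `gamma` makes every sample similar only to itself, which favors overfitting.

```python
import math


def rbf_kernel_matrix(points, gamma):
    """Gram matrix of the RBF kernel exp(-gamma * ||p - q||^2)."""
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q))) for q in points]
        for p in points
    ]


# Three hypothetical feature vectors
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]

# Very small gamma: all entries are close to 1 (samples are barely distinguishable)
k_small = rbf_kernel_matrix(points, gamma=1e-4)

# Very large gamma: the matrix approaches the identity (each sample is isolated)
k_large = rbf_kernel_matrix(points, gamma=1e3)
```

Useful values of `gamma` lie between these extremes and interact with the penalty factor, which is why the two are usually tuned jointly.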
This study proposes an integrated framework for collaborative bearing fault diagnosis across multiple datasets, combining VMD, refined composite multiscale weighted permutation entropy (RCMWPE) feature extraction, and a whale optimization algorithm-optimized support vector machine (WOA-SVM). The core contributions are summarized as follows. Firstly, the integration of VMD signal decomposition, RCMWPE feature extraction, and the WOA-SVM classifier combines VMD’s ability to decouple non-stationary signals, RCMWPE’s highly robust feature representation, and WOA-SVM’s effective parameter optimization. Secondly, generalization capability is validated on the PRONOSTIA (four fault classes) and CWRU (ten fault classes) benchmark datasets, confirming adaptability across datasets. Thirdly, under complex operating conditions, exemplified by the CWRU ten-class task, the system demonstrates diagnostic robustness for nonlinear vibration signals and multimode faults. Finally, experimental results indicate that WOA-SVM achieves accuracies of 96.5% and 99.67% on the PRONOSTIA and CWRU datasets, respectively, significantly outperforming the conventional SVM, genetic algorithm-optimized SVM, and particle swarm optimization-optimized SVM baseline models.
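To indicate how the whale optimization step of such a framework might operate, the following is a minimal pure-Python sketch of a simplified WOA variant. It is not the implementation used in this study. It assumes scalar coefficients `A` and `C` per agent and an immediately updated best position. The function `surrogate_cv_error` is a hypothetical stand-in for the cross-validation error of an SVM over (log10 C, log10 gamma); in practice it would be replaced by an actual classifier evaluation.

```python
import math
import random


def woa_minimize(objective, bounds, n_agents=20, n_iter=100, b=1.0, seed=0):
    """Simplified whale optimization; returns (best_pos, best_val, history)."""
    rng = random.Random(seed)

    def clip(pos):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(pos, bounds)]

    agents = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_agents)]
    best_pos = min(agents, key=objective)[:]
    best_val = objective(best_pos)
    history = [best_val]

    for t in range(n_iter):
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 toward 0
        for i, x in enumerate(agents):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best solution; otherwise explore
                # around a randomly chosen agent
                ref = best_pos if abs(A) < 1.0 else agents[rng.randrange(n_agents)]
                new = [r - A * abs(C * r - v) for r, v in zip(ref, x)]
            else:
                # logarithmic spiral toward the best solution
                l = rng.uniform(-1.0, 1.0)
                new = [
                    abs(bp - v) * math.exp(b * l) * math.cos(2.0 * math.pi * l) + bp
                    for bp, v in zip(best_pos, x)
                ]
            agents[i] = clip(new)
            val = objective(agents[i])
            if val < best_val:
                best_pos, best_val = agents[i][:], val
        history.append(best_val)
    return best_pos, best_val, history


def surrogate_cv_error(params):
    """Hypothetical error surface with its minimum at C = 10, gamma = 0.01."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


# Search log10(C) and log10(gamma) in [-3, 3]
best, err, trace = woa_minimize(surrogate_cv_error, [(-3.0, 3.0), (-3.0, 3.0)], seed=1)
```

Searching in logarithmic coordinates is a design choice of this sketch, made because the penalty factor and kernel width typically vary over several orders of magnitude.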