4.1. Analysis of Model Comparison
To comprehensively evaluate the performance of PDW-Net, five representative fault diagnosis methods were selected for systematic comparison: Random Forest (a traditional machine learning method), 1D-CNN (an end-to-end deep learning method), WT+2D-CNN (a fusion of time–frequency analysis and deep learning), LSTM (a temporal sequence modeling method), and EMD+LSTM (a combination of adaptive decomposition and deep learning). The experimental results are presented in Table 4 and Figure 8.
Random Forest achieves an average accuracy of 87.44%. Its performance is primarily constrained by the representational capability of its features. Although time-domain statistical features can reflect fault characteristics to a certain extent, their relationship to the physical mechanisms of faults is mostly an indirect statistical correlation. In contrast, PDW-Net performs end-to-end feature learning from raw signals to fault categories through physically guided deep learning while retaining physical interpretability. The resulting gain of 10.43 percentage points over the Random Forest baseline demonstrates the representational advantages of deep learning and supports the effectiveness of integrating physical understanding into the feature learning process.
The pronounced instability of 1D-CNN exposes a fundamental limitation of purely data-driven approaches in small-sample industrial fault diagnosis scenarios. With merely 255 training samples available, learning discriminative fault representations directly from 25,600-dimensional raw signals necessitates navigating an intractably large hypothesis space, rendering the optimization process highly susceptible to convergence toward suboptimal local minima or overfitting to individual sample characteristics. PDW-Net addresses this challenge by imposing multi-level physical constraints that progressively narrow the effective learning space: VMD reduces signal dimensionality from 25,600 points to 7 × 2560 points; physics-based feature extraction achieves statistical-level abstraction of fault-relevant information; adaptive frequency-band weighting selectively emphasizes diagnostically critical spectral regions; and the heterogeneous network architecture employs modality-specific processing tailored to distinct feature types. This physically guided hierarchical abstraction strategy not only enhances overall diagnostic performance, but more critically, stabilizes the learning dynamics, as evidenced by a reduction in standard deviation to 2.60%.
PDW-Net surpasses EMD+LSTM and WT+2D-CNN by margins of 50.31 and 35.50 percentage points, respectively, a performance disparity that underscores the critical role of signal decomposition strategy in rotating machinery fault diagnosis. The inherent mode mixing problem in empirical mode decomposition (EMD) introduces spurious cross-component interference between impact and modulation constituents during decomposition, fundamentally compromising the fidelity of subsequent feature extraction. Similarly, the fixed wavelet basis functions employed in WT+2D-CNN lack the adaptive flexibility required to match the scale-varying characteristics of heterogeneous fault types. VMD, by contrast, exploits its variational optimization framework in conjunction with narrowband spectral constraints to achieve physically meaningful separation of signal constituents into spectrally distinct Intrinsic Mode Functions, thereby providing a clean and well-structured input representation for subsequent targeted diagnostic processing.
The diagnostic performance of PDW-Net for different fault types on the independent test set is summarized in Table 5. The specific diagnostic results are illustrated by the confusion matrix shown in Figure 7, which presents the diagnostic details from five representative experiments. Overall, 78 out of 80 test samples were correctly identified, yielding an overall accuracy of 97.5%. This result is consistent with the average accuracy of 97.87% obtained from 20 repeated experiments, supporting the model’s stable diagnostic capability.
It is worth noting that the two misclassifications in Table 5 (main shaft wear misjudged as insufficient refrigerant, and excessive refrigerant misjudged as condenser failure) can be explained physically in terms of spectral characteristics. As shown in Figure 3, both main shaft wear and insufficient refrigerant have dense modulation components in the medium- and high-frequency bands (the intervals of IMF3 and IMF4 are both 160 Hz), and their amplitude variation trends are similar, resulting in partial overlap in the feature space. Similarly, excessive refrigerant and condenser failure both exhibit wideband modulation and energy dispersion, which can easily lead to confusion. This observation indicates that, although PDW-Net achieves an average accuracy of 97.87%, the separability of features for fault pairs with similar spectral structures still needs to be enhanced, which suggests a direction for subsequent improvements.
4.2. Analysis of Ablation Study
To objectively assess the individual contributions of the key architectural components within the proposed Physically Guided Dual-path Adaptive Weighted Diagnostic Network (PDW-Net), a systematic ablation study was conducted. As summarized in Table 6 and illustrated in Figure 9, five variant models were constructed by selectively removing or replacing critical design elements of PDW-Net, thereby enabling a controlled investigation into their respective impacts on overall diagnostic performance.
The M0 variant employs an equal-weight strategy, assigning identical weights to all seven IMF components obtained from VMD. Its accuracy drops to 63.63%, a decrease of 34.24 percentage points relative to the full model. This substantial loss indicates that treating all IMF components as equally important contradicts the underlying physics, because fault-relevant information is distributed unevenly across frequency bands. Equal-weight processing ignores this physically based difference in feature distribution, preventing the network from focusing on the frequency-band information most relevant to specific faults. The adaptive weighting mechanism instead learns the importance weight of each IMF in a data-driven manner, in effect recognizing fault-related physical patterns and allocating attention accordingly.
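The contrast between equal and adaptive weighting can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the softmax normalisation, the names `softmax` and `weighted_reconstruction`, the toy IMF values, and the score vector are hypothetical and are not taken from the PDW-Net implementation.

```python
import math

def softmax(scores):
    """Map raw scores to positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_reconstruction(imfs, weights):
    """Return sum_k weights[k] * imfs[k], sample by sample."""
    if len(imfs) != len(weights):
        raise ValueError("one weight per IMF is required")
    length = len(imfs[0])
    return [sum(w * imf[t] for w, imf in zip(weights, imfs))
            for t in range(length)]

# Toy stand-ins for K = 7 IMFs (hypothetical values, not measured data).
K = 7
imfs = [[float(k + 1)] * 4 for k in range(K)]

# M0-style baseline: every IMF contributes equally (1/K is an assumption).
equal_weights = [1.0 / K] * K

# Hypothetical learned scores that favour IMF3 and IMF4 (indices 2 and 3).
scores = [0.1, 0.2, 2.0, 1.5, 0.1, 0.0, 0.0]
adaptive_weights = softmax(scores)

equal_mix = weighted_reconstruction(imfs, equal_weights)
adaptive_mix = weighted_reconstruction(imfs, adaptive_weights)
```

With equal weights the mixture is simply the mean of the IMFs, whereas the adaptive weights concentrate the reconstruction on whichever bands received the largest scores.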
In the M1 variant, the multilayer perceptron (MLP) output is used directly as the reconstruction weight, without being scaled by the kurtosis or energy of the corresponding branch. The accuracy of this variant decreased by 9.5% compared to the full model, indicating that kurtosis and energy effectively characterize the transient pulse components and the steady-state vibration components of the vibration signal, respectively.
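To make the two statistics concrete, the following sketch computes kurtosis and energy for two toy components and uses them to scale a set of MLP weights. The multiplicative scaling with normalisation in `scale_weights` is an assumption chosen for illustration; the exact scaling rule used in PDW-Net is not specified in this section, and the signals and weights are hypothetical.

```python
import math

def kurtosis(x):
    """Fourth standardized moment (non-excess); about 3 for Gaussian data."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0.0:
        return 0.0
    return sum((v - mean) ** 4 for v in x) / (n * var ** 2)

def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)

def scale_weights(mlp_weights, stats):
    """Scale each MLP weight by its IMF statistic, normalised over IMFs.

    Assumed form: w_k * s_k / sum(s). This is an illustrative choice only.
    """
    total = sum(stats)
    return [w * s / total for w, s in zip(mlp_weights, stats)]

# Toy components: one isolated impact and one steady sinusoid (hypothetical).
impulse = [0.0] * 7 + [8.0] + [0.0] * 2
sine = [math.sin(2 * math.pi * t / 8) for t in range(8)]

mlp_weights = [0.5, 0.5]  # hypothetical MLP outputs for the two components
kurt_branch = scale_weights(mlp_weights, [kurtosis(impulse), kurtosis(sine)])
energy_branch = scale_weights(mlp_weights, [energy(impulse), energy(sine)])
```

The impulsive component has a much higher kurtosis than the sinusoid, so under this assumed rule the kurtosis branch shifts weight toward it even though the MLP outputs are identical.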
The M2 and M3 variants retain only the kurtosis branch and only the energy branch, respectively; M3 achieves 85.25% accuracy, compared with 66.75% for M2. The synergy produced by dual-branch fusion is reflected in the 12.62 percentage point improvement of the full model over the better single branch (M3). This improvement stems from the physical orthogonality between kurtosis and energy features: kurtosis features reflect transient impact intensity, while energy features reflect steady-state distribution patterns. By using these two orthogonal feature types simultaneously, the dual-branch architecture achieves more robust fault identification.
The M4 variant substitutes the heterogeneous dual-branch architecture with two homogeneous FFT-1D-CNN branches, which reduces accuracy to 81.69%. This result demonstrates the value of employing specialized network architectures for different types of fault features. Impact features manifest as discrete spectral peaks in the frequency domain, lending themselves naturally to localized feature extraction via 1D-CNN. Modulation features, by contrast, exhibit continuous band-structured patterns in the time–frequency domain, making them inherently amenable to spatial feature learning via 2D-CNN. The heterogeneous design matches each physical feature type to a suitable processing method, whereas the homogeneous design fails to fully exploit the analytical potential of the different features.
4.3. Engineering Applicability and Deployment Considerations of Methodology
This section conducts a comprehensive analysis of the parameter robustness, computational cost, deployment complexity and statistical significance of the proposed PDW-Net method from the perspective of engineering application.
In terms of parameter robustness, during the actual operation of the refrigeration system, environmental conditions such as pressure, temperature, and load are constantly changing. The compressor used in this study is a horizontal DC inverter compressor. The impact of these environmental condition changes on the vibration signal is multi-faceted: on the one hand, load changes directly cause the adjustment of the compressor’s rotational speed, resulting in all speed-related characteristic frequencies (including shaft frequency, harmonics, and fault characteristic frequencies, etc.) being proportionally scaled; on the other hand, changes in temperature and pressure may introduce additional frequency components or alter the amplitude distribution of existing components by modifying the flow field, excitation force, or structural resonance characteristics. Therefore, the influence of environmental changes on the signal cannot be simply attributed to a single factor.
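The proportional scaling of speed-related frequencies can be shown with a short calculation. The rotational speeds and harmonic orders below are hypothetical values chosen for illustration, not operating points of the compressor studied here.

```python
def shaft_frequency_hz(rpm):
    """Shaft rotation frequency in Hz for a speed given in rev/min."""
    return rpm / 60.0

def speed_related_frequencies(rpm, orders):
    """Frequencies locked to shaft speed: order * shaft frequency."""
    base = shaft_frequency_hz(rpm)
    return [order * base for order in orders]

orders = [1, 2, 3]  # shaft frequency and two harmonics (illustrative orders)

low_speed = speed_related_frequencies(3000, orders)   # hypothetical speed
high_speed = speed_related_frequencies(3600, orders)  # hypothetical speed

# Every speed-locked component moves by the same factor (3600 / 3000).
ratios = [hi / lo for lo, hi in zip(low_speed, high_speed)]
```

Because every speed-locked component shifts by the same ratio, a speed change relocates these components in frequency without changing how many of them there are.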
However, no matter how complex the influence mechanism is, the number of independent narrowband frequency components that need to be separated by VMD is mainly determined by the physical mechanism of the fault and the inherent structure of the compressor. These inherent attributes (such as the number of motor poles, the number of stator slots, structural resonance modes, etc.) do not change with environmental conditions. In other words, changes in environmental conditions may alter the values or amplitudes of certain frequency components, but they will not essentially increase or decrease the number of independent components that need to be separated. Therefore, the minimum number of modes K required for VMD is stable.
Based on the above understanding, in the second stage of this paper, the maximum optimal K value among all faults (i.e., K = 7) is taken as the fixed number of modes. This is an over-complete decomposition strategy, ensuring that K is never less than the actual required number of modes under any operating conditions, thereby avoiding under-decomposition due to insufficient modes. Regarding the penalty factor α, the complete reconstruction property of VMD guarantees that the sum of all IMFs can precisely reconstruct the original signal regardless of parameter selection, which is a necessary condition for information integrity. However, complete reconstruction does not directly ensure that fault features are effectively separated in individual IMFs. This is precisely the purpose of designing the kurtosis weighting and energy weighting mechanisms in this paper: by leveraging the sensitivity of kurtosis to impulse components and energy to steady-state components, it adaptively screens and enhances IMFs containing the main fault information, while suppressing the interference that may be introduced by redundant modes. Therefore, by fixing K = 7 and combining the physically guided weighting strategy, effective extraction of fault features can be achieved while ensuring information integrity.
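In practice, the reconstruction property can be checked numerically for any given decomposition. The sketch below is an assumption-laden illustration: the two sinusoids stand in for modes and are not the output of an actual VMD run, and `relative_reconstruction_error` is a hypothetical helper name.

```python
import math

def relative_reconstruction_error(signal, imfs):
    """Relative L2 error between a signal and the sum of its modes."""
    recon = [sum(imf[t] for imf in imfs) for t in range(len(signal))]
    num = math.sqrt(sum((s - r) ** 2 for s, r in zip(signal, recon)))
    den = math.sqrt(sum(s * s for s in signal))
    return num / den

# Synthetic stand-ins for two narrowband modes (not real VMD output).
n = 256
mode_a = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
mode_b = [0.5 * math.sin(2 * math.pi * 40 * t / n) for t in range(n)]
signal = [a + b for a, b in zip(mode_a, mode_b)]

# All modes kept: the error should be numerically zero.
error_full = relative_reconstruction_error(signal, [mode_a, mode_b])

# One mode dropped: the error equals the energy share of the missing mode.
error_partial = relative_reconstruction_error(signal, [mode_a])
```

A near-zero error confirms that no information has been lost across the set of modes, but, as noted above, it says nothing about whether the fault features are cleanly separated into individual IMFs.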
In terms of computational cost and deployment complexity, the computational cost of this method mainly lies in the offline optimization and online diagnosis stages. In the offline stage, a two-stage PSO is conducted for each fault type, requiring multiple VMDs. This process is performed only once during initial system modeling and does not affect real-time diagnosis. In the online diagnosis stage, only one VMD (K = 7) and neural network forward calculation are required. On an industrial control computer equipped with an AMD 12-core CPU, the average inference time for a single sample is 94.17 ± 4.29 ms (only model forward calculation), and approximately 10.6 samples can be processed per second. The total number of model parameters is 40,567, with 40,559 trainable parameters, which is a lightweight structure and convenient for embedded or edge deployment. In the training stage, based on 255 training samples, 50 rounds of iterations are completed in about 25 s on the same hardware, demonstrating rapid re-training capability.
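The reported throughput follows directly from the mean inference latency. The snippet below reproduces that arithmetic and, as an added illustration, shows the range implied by one standard deviation; like the reported figure, it covers only the model forward calculation.

```python
def throughput_per_second(latency_ms):
    """Samples processed per second at a given per-sample latency."""
    return 1000.0 / latency_ms

mean_ms, std_ms = 94.17, 4.29  # reported inference time (forward pass only)

nominal = throughput_per_second(mean_ms)          # about 10.6 samples/s
slow = throughput_per_second(mean_ms + std_ms)    # one std slower
fast = throughput_per_second(mean_ms - std_ms)    # one std faster
```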
To verify the statistical significance of the performance differences among the various methods, the Friedman test was conducted on the accuracy rates of the 20 experiments. The results indicated significant differences in diagnostic performance among the methods (χ² = 69.04, p < 0.001). Subsequently, the Nemenyi post hoc test was used for pairwise comparisons. PDW-Net was significantly superior to EMD+LSTM, WT+2D-CNN, 1D-CNN, and LSTM at the 95% confidence level (p < 0.001), but the difference from Random Forest was not significant (p = 0.2591). Random Forest was significantly better than EMD+LSTM, WT+2D-CNN, and LSTM (p < 0.05), but not significantly different from 1D-CNN (p = 0.4275). 1D-CNN was significantly better than EMD+LSTM (p = 0.0414). These results confirm the statistically significant superiority of PDW-Net over the four deep learning baselines; its advantage over Random Forest, although 10.43 percentage points in average accuracy, did not reach significance under the Nemenyi test.
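For readers unfamiliar with the Friedman test, the statistic can be computed from per-run ranks as sketched below. This is a plain-Python illustration without tie correction, not the code used in this study, and the accuracy table is a hypothetical toy example rather than the experimental data.

```python
def average_ranks(values):
    """Rank ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks

def friedman_statistic(table):
    """table[r][m] is the accuracy of method m in run r.

    Returns (chi-square statistic, mean rank of each method).
    No tie correction is applied in this sketch.
    """
    n = len(table)       # number of repeated runs (blocks)
    k = len(table[0])    # number of methods (treatments)
    rank_sums = [0.0] * k
    for row in table:
        for m, r in enumerate(average_ranks(row)):
            rank_sums[m] += r
    chi2 = (12.0 / (n * k * (k + 1))) * sum(s * s for s in rank_sums) \
        - 3.0 * n * (k + 1)
    return chi2, [s / n for s in rank_sums]

# Hypothetical toy table: 3 runs x 3 methods (not the paper's results).
toy = [
    [0.60, 0.80, 0.95],
    [0.55, 0.85, 0.97],
    [0.62, 0.79, 0.96],
]
chi2, mean_ranks = friedman_statistic(toy)
```

The statistic is compared against a chi-square distribution with k − 1 degrees of freedom. With six methods (five degrees of freedom) the critical value at the 0.001 level is about 20.5, so the reported value of 69.04 lies well inside the rejection region.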
The Friedman test on the ablation experiments (χ² = 64.64, p < 0.001) indicated significant differences in diagnostic performance among the variant models. The Nemenyi post hoc test showed that the complete model was significantly better than the equal-weight variant (M0), the kurtosis-only variant (M2), and the homogeneous dual-branch variant (M4) at the 95% confidence level (p < 0.05), but not significantly different from the direct MLP weight variant (M1) or the energy-only variant (M3) (p > 0.05). Further analysis revealed that M1 and M3 had relatively high performance fluctuations (coefficients of variation, CV, of 0.117 and 0.133, respectively), much higher than that of the complete model (0.027), indicating that their diagnostic results were more affected by random factors. From an engineering application perspective, the complete model therefore retains a clear advantage in average accuracy and stability, which supports the effectiveness of the proposed physics-guided dual-path architecture.
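The coefficient of variation used here is the ratio of the standard deviation to the mean of the per-run accuracies. The sketch below assumes the population standard deviation, since the text does not state which estimator was used, and the input list is a hypothetical example rather than measured accuracies.

```python
import statistics

def coefficient_of_variation(accuracies):
    """Standard deviation divided by mean (population std is assumed)."""
    return statistics.pstdev(accuracies) / statistics.mean(accuracies)

# Hypothetical values used only to exercise the function.
example_cv = coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9])

# Cross-check from summary figures quoted in this paper for the full
# model: standard deviation 2.60% and mean accuracy 97.87%.
full_model_cv = 2.60 / 97.87
```

The ratio 2.60 / 97.87 rounds to 0.027, which appears consistent with the CV reported for the complete model.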
It should be noted that the current study focuses on discrete, single-fault scenarios, which serves as a foundation for establishing the diagnostic framework. In real-world industrial applications, faults typically develop gradually and may occur in combination—scenarios that are not addressed in this work. Nevertheless, the proposed framework offers a reasonable starting point for such extensions: the physical features (kurtosis and energy) are inherently continuous, VMD decomposes the signal into distinct frequency bands, and the heterogeneous dual-branch architecture is designed to process physically distinct signal components. Building upon these foundations, our future work will focus specifically on two extensions: continuous fault severity estimation to capture gradual degradation, and multi-label combined fault diagnosis to handle simultaneous faults. These directions will be pursued in subsequent studies to enhance the framework’s applicability in real-world industrial settings.
In summary, PDW-Net demonstrates parameter robustness under different operating conditions, while featuring controllable computational costs, low deployment complexity, and statistically validated performance improvements. The current work establishes a fundamental diagnostic framework for discrete single-fault scenarios. Future work will build upon this foundation to address issues such as continuous severity assessment and combined fault diagnosis.