1. Introduction
Most existing bearing fault diagnosis methods rely on analyzing parameters such as vibration and temperature signals under specific operating conditions [1,2]. Traditional signal processing techniques include time-domain, frequency-domain, and time–frequency domain analysis [3,4,5,6]. Depending on the available techniques and actual working conditions, different signal processing methods are selected and combined to extract fault features from the signals, enabling rapid and accurate fault identification.
The direct fusion of multi-scale data by integrating raw signals retains a large amount of information; however, this richness significantly increases processing complexity and leads to serious redundancy. Zhang et al. [7] proposed a method that resamples the raw data and constructs multi-state time series matrices, thereby greatly enriching the information content. Wang et al. [8] introduced a temporal–spatial learning framework with attention. The fusion of multi-source data supports fault diagnosis in complex environments and mitigates the limitations of single-source information. However, heterogeneity among data sources may degrade fusion performance and limit the information gain. To address this, Cui et al. [9] constructed a dual-branch network that enhances feature complementarity, overcoming the inconsistency caused by simple data concatenation. Qin et al. [10] designed a channel attention module that filters the information channels of multi-source inputs, enabling dynamic focus on critical channels. Liang et al. [11] proposed a multi-scale approximate entropy computation method that preserves signal characteristics while adaptively adjusting feature weights. Huo et al. [12] combined frequency-domain energy representations with deep features extracted from multi-scale data to create a comprehensive 3D feature representation. Li D et al. [13] proposed a Variable Filtered-Waveform Variational Mode Decomposition method, which introduces fractional-order constraints and dynamically adjusts the Wiener-filtered waveform. Li T et al. [14] developed a degradation digital twin model based on data–model fusion, which reveals the interdependent mechanisms within fault evolution.
With the advancement of lightweight strategies, significant breakthroughs have been achieved in fault diagnosis [15,16]. Xie K et al. [17] replaced the fully connected layers in the SE module with group convolutions and introduced cross-channel interactions to improve efficiency. Cheng et al. [18] incorporated an attention mechanism into their model, substantially improving its discriminative capability. Dong et al. [19] transformed the input into a one-dimensional format and applied the SE module to dynamically weight key channels. Cai et al. [20] converted the input data into Markov transition fields and fed them into an inverted residual-based ShuffleNetV2 network. Tong et al. [21] encoded one-dimensional signals into two-dimensional images and integrated group normalization with a dual-attention mechanism, improving classification performance under imbalanced data conditions. Lu et al. [22] extracted domain-invariant features from source equipment using a multi-scale residual network and employed knowledge distillation to transfer the learned knowledge efficiently to a lightweight student model. W et al. [23] presented a data-driven fault diagnosis approach for rolling bearings under strong noise, combining advanced signal preprocessing with a lightweight convolutional neural network to achieve robust fault identification. Although these methods achieve a degree of model lightweighting, they often incur significant accuracy degradation and reduced computational efficiency. It is therefore necessary to design fault diagnosis models based on lightweight architectures that overcome the poor accuracy and low efficiency caused by partial structural replacement.
In conventional signal-to-image transformation, the generated grayscale representations often lack physical correspondence with the actual vibration source locations, leading to ambiguous feature distributions among different fault types. Furthermore, most existing approaches neglect the statistical characteristics of vibration signals—such as kurtosis and impulsiveness—which are crucial indicators of mechanical degradation. To overcome these limitations, this study introduces a physics-guided image construction strategy. The collected bearing vibration signals are segmented using a sliding window, and dimensionless indicators are computed for each segment. These indicators are then used to assign adaptive weights to predefined image regions, thereby constructing a novel bearing fault image dataset. By embedding physical prior knowledge into the weighting matrix design, the proposed approach ensures that the signal-to-image transformation process maintains physical consistency and diagnostic relevance.
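As a minimal sketch of the per-segment step, the Python code below cuts a signal into sliding windows and computes a set of common dimensionless indicators (kurtosis, crest, impulse, shape, and clearance factors). The indicators, window length, and step actually used in this study are defined with the transformation method in Section 3.1; the indicator set and the function names here are assumptions for illustration only.

```python
import math
from typing import Dict, List, Sequence


def sliding_windows(signal: Sequence[float], width: int, step: int) -> List[List[float]]:
    """Cut a 1-D signal into windows of `width` samples, advancing by `step` samples."""
    return [list(signal[i:i + width]) for i in range(0, len(signal) - width + 1, step)]


def dimensionless_indicators(x: Sequence[float]) -> Dict[str, float]:
    """Common dimensionless indicators of one segment (an assumed, illustrative set).

    The segment must not be constant, otherwise the ratios divide by zero.
    """
    n = len(x)
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    abs_mean = sum(abs(v) for v in x) / n
    sqrt_abs_mean = sum(math.sqrt(abs(v)) for v in x) / n
    return {
        "kurtosis": sum((v - mean) ** 4 for v in x) / n / variance ** 2,  # impulsiveness
        "crest_factor": peak / rms,
        "impulse_factor": peak / abs_mean,
        "shape_factor": rms / abs_mean,
        "clearance_factor": peak / sqrt_abs_mean ** 2,
    }


# Usage: each 256-sample window holds 4 full sine periods,
# so its kurtosis is 1.5 and its crest factor is sqrt(2).
sine = [math.sin(2 * math.pi * k / 64) for k in range(1024)]
for window in sliding_windows(sine, width=256, step=256):
    print(dimensionless_indicators(window))
```

A localized impulse, typical of a bearing defect, raises the kurtosis well above the sine value of 1.5, which is why such indicators can serve as physically meaningful weights.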
In vibration-based fault diagnosis, feature representations often suffer from noise interference and redundant channel coupling, which degrade discriminability and increase model complexity. Moreover, lightweight networks tend to lose critical fault information during feature compression. To address these challenges, this study introduces a cheap channel obfuscation module that achieves noise suppression, channel decorrelation, and lightweight feature enhancement within a unified framework.
4. Experiment
4.1. Datasets
The experimental data used in this study are sourced from the bearing dataset of the Case Western Reserve University (CWRU) Bearing Data Center [28]. The dataset includes bearing vibration signals under various fault modes and load conditions. For each fault diameter, three fault types are provided: inner race fault, rolling element fault, and outer race fault. This study uses data collected from the drive-end bearing, sampled at 12 kHz, with 119,808 data points per condition. The fault diameters are 0.007, 0.014, and 0.021 inches; the three fault types at three diameters, together with the normal condition, give a total of ten bearing conditions. Additionally, datasets under four load conditions (0 HP, 1 HP, 2 HP, and 3 HP) are used for comparative experiments.
The second dataset is the Southeast University (SEU) bearing dataset [29], which was collected using a transmission system dynamics simulator and includes vibration data under two operating conditions: 20 Hz-0 V and 30 Hz-2 V. This study uses the X-direction vibration data. The dataset contains five fault categories, including rolling element, inner race, and outer race faults, with 341,333 sample points for each condition. Data from the two load conditions, 0 HP and 2 HP, were used for comparative experiments.
4.2. Experimental Environment
The experiments were run on a cloud server with the following configuration: GPU: RTX 2080 Ti; memory: 40 GB; image environment: Python 3.12 (Ubuntu 22.04), CUDA 12.1, PyTorch 2.3.2.
The architectural details of the proposed ADGCC-Net are presented in Table 1. The network processes bearing vibration signals converted into 32 × 32 grayscale images by the physics-informed transformation method described in Section 3.1. The architecture combines spatial reduction with channel expansion and achieves computational efficiency mainly through grouped pointwise convolutions.
Input Representation: The network accepts preprocessed 2D representations (32 × 32 pixels) in which vibration characteristics are encoded through dimensionless indicator weighting.
Initial Feature Extraction: The first convolutional layer (Conv2d) uses 32 filters with a 3 × 3 kernel (stride = 2, padding = 1) to extract preliminary spatial features while halving the spatial dimensions to 16 × 16. Batch normalization follows immediately to stabilize gradient propagation and accelerate convergence.
Multi-Scale Feature Learning: Stage1 further processes the features at a spatial resolution of 16 × 16 while maintaining a channel depth of 32.
Grouped Pointwise Convolution Layers: Conv1 (implied in the initial layers): although not explicitly labeled in Table 1, our implementation applies grouped pointwise convolution in the early stages to establish efficient feature foundations. Conv5 (channel expansion layer): a grouped 1 × 1 convolution (groups = 32) expands the channel dimension from 32 to 1024 while preserving the spatial resolution (16 × 16). Because each group maps 32/32 = 1 input channel to its output channels, this layer requires only 1024 weights instead of the 32 × 1024 = 32,768 weights of a standard 1 × 1 convolution, a parameter reduction of 1 − 1/32 ≈ 96.9%.
Global Feature Aggregation: Adaptive average pooling collapses the spatial dimensions to 1 × 1, turning the 1024-channel feature maps into a compact 1024-dimensional representation vector.
Classification Head: A fully connected layer maps the high-level features to 10 output neurons, corresponding to the fault categories in our experimental setup (9 fault types plus the normal condition).
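The spatial size after the first layer and the Conv5 parameter saving follow directly from standard convolution arithmetic. The stdlib-only Python check below reproduces them; it counts weights only (biases excluded), uses hypothetical helper names, and is an illustrative calculation rather than the ADGCC-Net implementation. The output-size formula is the one documented for PyTorch's `Conv2d` with dilation 1.

```python
def conv2d_out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a 2-D convolution with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


def conv2d_weights(in_ch: int, out_ch: int, kernel: int, groups: int = 1) -> int:
    """Weight count (bias excluded) of a Conv2d with a square kernel."""
    if in_ch % groups or out_ch % groups:
        raise ValueError("in_ch and out_ch must be divisible by groups")
    return out_ch * (in_ch // groups) * kernel * kernel


# First layer: 32 x 32 input, 3 x 3 kernel, stride 2, padding 1
print(conv2d_out_size(32, kernel=3, stride=2, padding=1))  # 16

# Conv5: 1 x 1 convolution expanding 32 -> 1024 channels
standard = conv2d_weights(32, 1024, kernel=1)             # 32768
grouped = conv2d_weights(32, 1024, kernel=1, groups=32)   # 1024
print(1 - grouped / standard)                             # 0.96875
```

With groups equal to the number of input channels, each output channel reads exactly one input channel, so the saving is exactly a factor of 32 (about 96.9%).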
4.3. Results on CWRU
The rolling bearing dataset was split into non-overlapping training and testing sets at a ratio of 0.3:0.2. A sliding window was applied to resample the data, yielding 1008 dimensionless weighted grayscale images for the training set and 672 for the testing set. The batch size was 32, the learning rate 0.0018, the activation function ReLU, and the loss function cross-entropy. The network parameters were updated by backpropagation with the Adam optimizer for 60 training epochs.
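One way to keep training and testing sets non-overlapping is to split each record into disjoint regions before windowing, so that no training window shares samples with a test window. The sketch below follows that reading; the 0.3/0.2 fractions come from the text, whereas the window width of 1024 samples (one 32 × 32 image), the step of 512, and the function name are assumptions. It is not meant to reproduce the exact image counts reported above.

```python
from typing import List, Sequence, Tuple


def split_then_window(
    signal: Sequence[float],
    train_frac: float = 0.3,
    test_frac: float = 0.2,
    width: int = 1024,  # assumed: 32 x 32 samples per image
    step: int = 512,    # assumed window step
) -> Tuple[List[List[float]], List[List[float]]]:
    """Split one record into disjoint train/test regions, then window each region."""
    n_train = int(len(signal) * train_frac)
    n_test = int(len(signal) * test_frac)
    train_part = signal[:n_train]
    test_part = signal[n_train:n_train + n_test]

    def windows(part: Sequence[float]) -> List[List[float]]:
        # Windows never cross the region boundary, so the sets stay disjoint.
        return [list(part[i:i + width]) for i in range(0, len(part) - width + 1, step)]

    return windows(train_part), windows(test_part)


# Usage on a synthetic record with the CWRU record length of 119,808 samples
record = [float(i) for i in range(119_808)]
train, test = split_then_window(record)
print(len(train), len(test))
```

Windowing after the split, rather than before, prevents overlapping windows from leaking the same samples into both sets.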
4.3.1. Model Testing and Analysis
When the modulation factor was set to 1, representing a fully weighted strategy, the experimental results were as shown in Figure 5, which illustrates the trends in loss and accuracy during training under various load conditions on the CWRU bearing dataset.
Confusion matrices [30] were plotted from the predicted and actual labels of the test data under load conditions of 0 HP, 1 HP, 2 HP, and 3 HP, as shown in Figure 6. In the figure, the horizontal axis represents the predicted classes and the vertical axis the actual classes.
To quantitatively evaluate the model’s overall performance under different load conditions, we calculated four key evaluation metrics (Accuracy, Precision, Recall, and F1-Score [31]) from the confusion matrices shown in Figure 6. The results for the four load conditions are summarized as bar charts in Figure 7.
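For reference, the four metrics can be computed from a confusion matrix as sketched below. The text does not state how the per-class precision, recall, and F1 are averaged over the ten classes; macro-averaging (an unweighted mean over classes) is assumed here, and the function name is hypothetical. Rows are actual classes and columns predicted classes, matching the axis convention of Figure 6.

```python
from typing import Dict, List


def metrics_from_confusion(cm: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1.

    cm[i][j] counts samples whose actual class is i and predicted class is j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(k))  # column sum
        actual = sum(cm[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


cm = [[8, 2],
      [1, 9]]
print(metrics_from_confusion(cm))
```

Macro-averaging gives every fault class equal influence, which matters when class sizes differ; a support-weighted average would be the alternative.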
According to the confusion matrices, the model performs the ten-class classification task very well under all load conditions. The classification metrics derived from them remain consistently high, indicating that the model discriminates effectively between fault categories. Furthermore, the training and testing accuracy curves agree closely, both exceeding 99% early in training and remaining stable thereafter, which highlights the model’s generalization ability and robustness. In summary, the proposed ADGCC-Net exhibits outstanding fault diagnosis performance on the CWRU dataset across load conditions.
4.3.2. Accuracy Comparison of Different Models Based on CWRU Data
To validate the superiority of the proposed method, several representative diagnostic approaches from recent research on the CWRU bearing dataset were selected for comparative experiments, as shown in Table 2. Under the same experimental conditions, the modulation factors for the 0 HP, 1 HP, 2 HP, and 3 HP loads were set to 0.4, 0.6, 0.6, and 0.2, respectively. With these settings, our model achieves the highest diagnostic accuracy under all operating conditions, reaching an accuracy of 1.0 under the 2 HP and 3 HP loads. These results demonstrate that the proposed modulation mechanism adapts effectively to variations in load conditions and significantly enhances the model’s adaptability to diverse operating scenarios.
4.4. Results on SEU
The rolling bearing dataset was divided into non-overlapping training and testing sets at a ratio of 0.3:0.2. Using a sliding window for data resampling, 1632 dimensionless weighted grayscale images were generated for the training set and 1088 for the testing set. The batch size was 32, the learning rate 0.002, the activation function ReLU, and the loss function cross-entropy. The network parameters were optimized with the Adam algorithm for 80 training epochs.
4.4.1. Model Testing
When the modulation factor was set to 0.8, i.e., applying a 0.8 weighting strategy, the experimental results were as shown in Figure 8, which illustrates the trends of the model’s loss and accuracy over the training epochs under different load conditions on the SEU bearing dataset.
Based on the test results under the 0 HP and 2 HP load conditions, confusion matrices were plotted from the predicted and actual labels, as shown in Figure 9. The horizontal axis represents the predicted classes and the vertical axis the actual classes. As observed, the ADGCC-Net model demonstrates excellent fault diagnosis performance on the SEU bearing dataset.
4.4.2. Accuracy Comparison of Different Models Based on SEU Data
To validate the superiority of the proposed approach, representative diagnostic methods applied to the SEU bearing dataset in recent years were selected for comparative experiments, as summarized in Table 3. Under the same experimental conditions, with a 0 HP load and the modulation factor set to 0.8, the proposed model achieved a diagnostic accuracy of 99.91%. These results indicate that the proposed modulation mechanism adapts effectively to varying load conditions and significantly enhances the model’s adaptability to different working environments.
4.5. Modulation Factor Experiment
To evaluate the robustness and generalization capability of the model, moderate-intensity noise with a signal-to-noise ratio (SNR) of 15 dB was added to the CWRU dataset only. On this basis, a linear weighting model controlled by the modulation factor
was constructed and incorporated into the feature fusion process of dimensionless indicators, aiming to enhance the model’s responsiveness to key regional features. The linear weighting model is defined as
, where
represents the indicator value. As shown in
Table 4, the model was validated under various operating conditions to analyze how introducing the modulation factor affects fault recognition accuracy, thereby assessing the model’s generalization across scenarios. In the comparison between full weighting (modulation factor = 1) and no weighting (modulation factor = 0), the model achieved a maximum accuracy of 1.
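Adding noise at a prescribed SNR means scaling the noise so that the signal-to-noise power ratio equals 10^(SNR/10). A minimal sketch, assuming additive white Gaussian noise (the text specifies only the 15 dB level, not the noise model) and hypothetical function names:

```python
import math
import random
from typing import List, Optional, Sequence


def add_noise_at_snr(signal: Sequence[float], snr_db: float,
                     seed: Optional[int] = None) -> List[float]:
    """Return signal + white Gaussian noise with P_signal / P_noise = 10 ** (snr_db / 10)."""
    rng = random.Random(seed)
    p_signal = sum(v * v for v in signal) / len(signal)
    sigma = math.sqrt(p_signal / 10 ** (snr_db / 10))  # noise standard deviation
    return [v + rng.gauss(0.0, sigma) for v in signal]


def measured_snr_db(clean: Sequence[float], noisy: Sequence[float]) -> float:
    """Empirical SNR in dB computed from a clean signal and its noisy version."""
    p_signal = sum(v * v for v in clean) / len(clean)
    p_noise = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_signal / p_noise)


# One second of a 50 Hz tone sampled at 12 kHz, corrupted at 15 dB
clean = [math.sin(2 * math.pi * 50 * k / 12_000) for k in range(12_000)]
noisy = add_noise_at_snr(clean, snr_db=15.0, seed=0)
print(round(measured_snr_db(clean, noisy), 2))  # close to 15 for long signals
```

Because the noise power is set relative to each signal's own power, the same SNR corresponds to different absolute noise levels for different fault records.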
The comparative analysis above indicates that, with an appropriately adjusted modulation factor, the proposed method adapts effectively to fault diagnosis tasks under various operating conditions, demonstrating strong generalization performance.
The impact of introducing the modulation factor on fault identification accuracy was also analyzed on the SEU bearing dataset to evaluate the generalization capability and adaptability of the proposed model. As shown in Table 5, diagnostic accuracy was validated experimentally under various working conditions. Compared with the non-weighted strategy (modulation factor = 0), the fully weighted strategy (modulation factor = 1) increased the model’s accuracy by approximately 3%.
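The text characterizes the linear weighting model through its endpoints only: no weighting at a modulation factor of 0 and full weighting at 1. Purely as an illustration, the sketch below assumes one linear form consistent with those endpoints, a per-region weight w = (1 − λ) + λ·x/max(x), where λ stands for the modulation factor and x for a region's indicator value. The symbol λ, the max-normalization, the horizontal-band region layout, and both function names are hypothetical and are not the authors' definitions.

```python
from typing import List, Sequence


def linear_region_weights(indicators: Sequence[float], lam: float) -> List[float]:
    """Hypothetical linear weighting w = (1 - lam) + lam * x / max(x).

    lam = 0 -> every weight is 1 (no weighting);
    lam = 1 -> weights equal the max-normalized indicators (full weighting).
    Indicators are assumed positive, as ratios such as the crest factor are.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("modulation factor must lie in [0, 1]")
    top = max(indicators)
    if top <= 0:
        raise ValueError("indicators must be positive")
    return [(1.0 - lam) + lam * v / top for v in indicators]


def weight_row_bands(image: List[List[float]], weights: Sequence[float]) -> List[List[float]]:
    """Scale consecutive bands of rows, one weight per band (hypothetical layout)."""
    band = max(1, len(image) // len(weights))
    return [
        [weights[min(r // band, len(weights) - 1)] * p for p in row]
        for r, row in enumerate(image)
    ]


print(linear_region_weights([1.0, 2.0, 4.0], lam=0.0))  # [1.0, 1.0, 1.0]
print(linear_region_weights([1.0, 2.0, 4.0], lam=0.5))  # [0.625, 0.75, 1.0]
print(linear_region_weights([1.0, 2.0, 4.0], lam=1.0))  # [0.25, 0.5, 1.0]
```

Intermediate values such as 0.4 or 0.8 blend the unweighted image with the indicator-weighted one, which is one way a single factor could be tuned per load condition.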
4.6. Ablation Study
To verify the effectiveness of the proposed dimensionless grayscale-weighted image construction and the cheap channel obfuscation module on the CWRU bearing dataset, experiments were conducted at an SNR of 15 dB under the 0 HP load condition, with the modulation factor set to 1. The proposed grayscale image construction method was compared with a baseline in which the original data were directly converted into grayscale images. As shown in Table 6, the proposed method achieved the highest diagnostic accuracy. Moreover, the ablation study shows that the dimensionless grayscale-weighted images significantly improve the robustness of the time-frequency representation by normalizing energy distributions and concentrating weights on key frequency bands, maintaining an identification accuracy above 98.5% even in noisy environments.
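The baseline in this ablation converts raw data directly into grayscale images. A common way to do this is sketched below, assuming row-major filling of 1024 consecutive samples into a 32 × 32 grid and per-image min-max scaling to 8-bit gray levels; the baseline's exact mapping is not specified in this section, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def signal_to_grayscale(segment: Sequence[float], side: int = 32) -> List[List[int]]:
    """Fill a side x side image row by row and min-max scale it to 0..255."""
    if len(segment) != side * side:
        raise ValueError(f"expected {side * side} samples, got {len(segment)}")
    lo, hi = min(segment), max(segment)
    span = (hi - lo) or 1.0  # a constant segment becomes an all-zero image
    pixels = [round(255 * (v - lo) / span) for v in segment]
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


img = signal_to_grayscale([math.sin(k / 10) for k in range(1024)])
print(len(img), len(img[0]), min(map(min, img)), max(map(max, img)))  # 32 32 0 255
```

Such a mapping preserves the waveform but carries no information about which regions are physically significant, which is the gap the indicator weighting targets.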
Based on the above experiments, features were extracted from the AdaptiveAvgPool2d layer of the model and visualized with t-SNE, as shown in Figure 10. Compared with the baseline model, the features learned by the model equipped with the cheap channel obfuscation module exhibit tighter intra-class clustering and clearer inter-class separation, indicating that the extracted representations are more discriminative. This validates that the proposed module effectively enhances feature diversity.
To further demonstrate the effectiveness of the proposed dimensionless grayscale weighting map and the cheap channel obfuscation module on the SEU bearing dataset, an experimental comparison was conducted under the 0 HP load condition with the modulation factor set to 0.8. The proposed grayscale weighting construction method was compared with the direct grayscale transformation of raw data. As shown in Table 7, the proposed method achieved the highest diagnostic accuracy. Additionally, the ablation experiments showed that the dimensionless grayscale weighting approach significantly improves the robustness of time-frequency representations by normalizing the energy distribution and emphasizing key frequency bands, maintaining a recognition accuracy above 98.9% under noisy conditions.
Based on the above experiments, the features of the model’s AdaptiveAvgPool2d layer were visualized using t-SNE, as shown in Figure 11. Compared with the baseline model, the features learned by the model equipped with the cheap channel obfuscation module show tighter intra-class clustering and clearer inter-class separation, indicating that the extracted feature representation has higher discriminative power. This also demonstrates the model’s strong generalization ability.
4.7. Discussion
While deep learning has advanced bearing fault diagnosis, significant gaps remain in constructing physically meaningful inputs and learning robust, lightweight representations. Firstly, in conventional signal-to-image transformation, the generated grayscale representations often lack physical correspondence with the actual vibration source locations, leading to ambiguous feature distributions. Moreover, these methods frequently overlook crucial statistical characteristics of vibration signals, such as kurtosis and impulsiveness, which are vital indicators of mechanical health. Secondly, even with improved inputs, feature representations often suffer from noise interference and redundant channel coupling, which degrade discriminability and inflate model complexity. Compounding this, lightweight networks designed for efficiency tend to lose critical fault information during feature compression.
Our work directly addresses these gaps. To bridge the first, we introduce a physics-guided image construction strategy that embeds physical prior knowledge by using dimensionless indicators to adaptively weight image regions, ensuring the transformation maintains physical consistency and diagnostic relevance. To address the second, we propose a Cheap Channel Obfuscation module within a unified framework, achieving simultaneous noise suppression, channel decorrelation, and lightweight feature enhancement without significant information loss.
It is also important to contextualize the limitations and future directions of our approach, which further define the current research frontier. The model’s performance is influenced by empirically set preprocessing parameters (e.g., sliding window length and overlap ratio), whose optimal values are likely dataset-specific. Furthermore, while our model shows high efficacy on single, stationary faults, its performance on compound or progressively evolving faults, which are more complex and closer to real-world scenarios, requires further investigation and could necessitate multi-label or temporal modeling extensions. Architecturally, the aggressive initial spatial downsampling benefits efficiency but may attenuate subtle, high-frequency fault signatures; future work could address this trade-off with parallel branch structures designed to preserve fine details. Together, these points outline both the advances made by the present study and the direction of future research.
5. Conclusions
This study has presented two key contributions for advancing bearing fault diagnosis. Through extensive experimental validation, the following conclusions are drawn:
The proposed construction method of the dimensionless weighted grayscale map integrates grayscale representations of bearing vibration signals with dimensionless feature indicators extracted from different signal positions. By introducing prior physical knowledge to modulate the spatial distribution of grayscale features, the model’s attention to critical fault regions is significantly enhanced, thereby improving overall diagnostic accuracy and discriminative capability. The proposed scheme demonstrates strong performance, consistently achieving diagnostic accuracy above 96.88% on both the CWRU and SEU datasets after converting the raw signals into image data.
The proposed cheap channel obfuscation module combines the low-cost feature generation mechanism of the Ghost module with the channel reorganization strategy of ShuffleNetV2. This design effectively strengthens inter-channel information interaction and feature expression efficiency while maintaining low computational complexity and achieving a highly efficient and lightweight network architecture. Compared to the baseline, the incorporation of the proposed module yielded a 1% gain in recognition accuracy. The features from the pooling layer exhibit improved inter-class separation, confirming its effectiveness in learning more discriminative representations.
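The two generic ingredients named here can be illustrated with standard-library Python: (a) the ShuffleNet-style channel shuffle, which views the channel axis as (groups, n), transposes it, and flattens it so that later grouped operations mix information across groups, and (b) the weight count of a Ghost module, in which a primary convolution produces out/ratio intrinsic maps and cheap depthwise operations generate the rest. How the cheap channel obfuscation module composes these pieces is not detailed in this section; the function names, ratio = 2, and the 3 × 3 depthwise kernel are assumptions.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def channel_shuffle(channels: Sequence[T], groups: int) -> List[T]:
    """Reorder channels: view as (groups, n), transpose to (n, groups), flatten."""
    if len(channels) % groups:
        raise ValueError("channel count must be divisible by groups")
    n = len(channels) // groups
    return [channels[g * n + i] for i in range(n) for g in range(groups)]


def standard_conv_weights(in_ch: int, out_ch: int, k: int) -> int:
    """Weights of an ordinary k x k convolution (bias excluded)."""
    return out_ch * in_ch * k * k


def ghost_module_weights(in_ch: int, out_ch: int, k: int,
                         ratio: int = 2, dw_k: int = 3) -> int:
    """Weights of a Ghost module: a primary k x k conv yields out_ch // ratio
    intrinsic maps; depthwise dw_k x dw_k cheap ops yield the remaining maps."""
    intrinsic = out_ch // ratio
    return intrinsic * in_ch * k * k + (out_ch - intrinsic) * dw_k * dw_k


print(channel_shuffle(list(range(6)), groups=2))  # [0, 3, 1, 4, 2, 5]
print(standard_conv_weights(32, 64, k=1))         # 2048
print(ghost_module_weights(32, 64, k=1))          # 1312
```

The shuffle costs no parameters at all, while the Ghost-style generation replaces half of the pointwise weights with a few depthwise ones; both are consistent with the low-complexity goal stated above.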
The proposed method demonstrates promising performance in bearing fault diagnosis; however, its effectiveness is highly dependent on choices made during the data preprocessing stage. Parameters such as sliding window length, overlap ratio, and weighting factors in the indicator matrix are set empirically, and their optimal values are likely influenced by rotational speed, bearing type, and fault severity. Future work should focus on developing adaptive parameter determination mechanisms to improve generalization.