Article
Peer-Review Record

Few-Shot Bearing Fault Diagnosis Based on ALA-FMD and MSCA-RN

Electronics 2025, 14(13), 2672; https://doi.org/10.3390/electronics14132672
by Hengdi Wang 1, Fanghao Shui 1, Ruijie Xie 1,*, Jinfang Gu 2 and Chang Li 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 16 June 2025 / Revised: 29 June 2025 / Accepted: 30 June 2025 / Published: 1 July 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

A bearing fault diagnosis method was proposed by integrating the Artificial Lemming Algorithm (ALA) with Feature Mode Decomposition (FMD) and a Multi-Scale Coordinate Attention Relation Network (MSCA-RN). The ALA was used to optimize key FMD parameters for effective signal decomposition, and the minimum residual energy index was employed to select optimal components. These components were transformed into time-frequency maps using the continuous wavelet transform. The MSCA-RN captured multi-scale features and spatial dependencies, enabling few-shot learning for fault classification. I have a few suggestions for improvement.

  1. The novelty of the proposed method is limited, as both FMD and Relation Networks have been previously used in fault diagnosis.
  2. The ALA optimization process lacks comparison with more advanced or recent optimization algorithms beyond CPO.
  3. There is no ablation study to isolate the effects of MSCA, RN, and ALA components on model performance.
  4. The experiments rely heavily on benchmark datasets (CWRU and SKF), limiting the method's generalizability.
  5. The small sample size experiments are not validated on unseen real-world data, affecting external validity.
  6. The figures and results (e.g., envelope spectra and accuracy plots) are not statistically analyzed for significance.
  7. Comparisons are mainly made with outdated or baseline models (RN, CARN, CPO) without including newer architectures like Transformers or CNN-GRU hybrids.
  8. The feature extraction performance of ALA-FMD is not benchmarked against modern deep feature extraction techniques.

Author Response

Comments 1: The novelty of the proposed method is limited, as both FMD and Relation Networks have been previously used in fault diagnosis.

Response 1: Thank you for pointing this out. We agree with this comment. Although Feature Mode Decomposition (FMD) and Relation Network (RN) have been applied independently in the field of fault diagnosis, this paper is the first to integrate ALA optimization with MSCA-RN. ALA effectively mitigates the parameter sensitivity issue inherent in FMD by simulating lemming behavior, while MSCA-RN overcomes the limitations of traditional RNs, which rely solely on single-scale features, by utilizing multi-scale feature fusion and dynamic relation modeling. This integration improves diagnostic accuracy by approximately 9% in small sample scenarios. Furthermore, by constructing a few-shot learning framework and conducting cross-condition transfer experiments, the method's effectiveness in practical industrial environments has been validated. Additionally, ablation experiments reveal that both the ALA module and the MSCA module significantly impact accuracy, demonstrating the synergistic innovation value of each component. I hope this response meets your satisfaction, and once again, thank you for your suggestions.

 

Comments 2: The ALA optimization process lacks comparison with more advanced or recent optimization algorithms beyond CPO.

Response 2: Thank you for pointing this out. We agree with this comment. In the revised manuscript, we have incorporated comparisons with mainstream optimization algorithms from recent years, including Particle Swarm Optimization (PSO) and the Sparrow Search Algorithm (SSA). The results of the visualization analysis are presented as follows:

The optimal parameter set for FMD was determined using the ALA algorithm, aiming to minimize envelope entropy as the objective function. The outcomes of the optimization are visually examined throughout the process and compared against the results from the Crested Porcupine Optimizer (CPO), Particle Swarm Optimization (PSO), and Sparrow Search Algorithm (SSA). Figure 7 depicts the variation curve of envelope entropy, while Table 3 provides details on the optimal FMD configuration obtained through ALA optimization. The results indicate that the ALA algorithm reaches an envelope entropy of 0.58 and stabilizes after 20 iterations. In comparison, the PSO method achieves an envelope entropy of 1.02 after 74 iterations, showing a tendency to become trapped in local optima and demonstrating a slower optimization rate. Although PSO converges quickly in the early phases, it becomes vulnerable to local optima later on due to the convergence of particle velocities. The SSA algorithm initially prioritizes exploration, which limits its convergence speed. However, as the optimization process continues, the followers start to actively search, significantly improving convergence speed. Additionally, its capability to resist local optima exceeds that of the PSO method, yet its overall search effectiveness still lags behind that of the ALA method. (Located on line 374-391)
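The response names minimum envelope entropy as the objective that ALA minimizes. The sketch below shows one common definition of envelope entropy (the Shannon entropy of the normalized Hilbert envelope), assuming that is the form intended; the function names and the two test signals are illustrative and are not taken from the manuscript. A plain DFT is used so the example has no dependencies.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of x via a plain O(N^2) DFT (adequate for short demos)."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist for even n), double positive bins, zero negative bins.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        upper = n // 2
    else:
        upper = (n + 1) // 2
    for k in range(1, upper):
        weights[k] = 2.0
    return [
        sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]


def envelope_entropy(x):
    """Shannon entropy of the envelope after normalizing it to sum to 1."""
    envelope = [abs(z) for z in analytic_signal(x)]
    total = sum(envelope)
    probs = [a / total for a in envelope]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


n = 64
# A pure tone has a flat envelope, so its entropy is the maximum, ln(n).
smooth = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
# A periodic impulse train (a crude stand-in for fault impacts) has a peaky envelope.
impulsive = [1.0 if t % 16 == 0 else 0.0 for t in range(n)]

print("smooth tone   :", round(envelope_entropy(smooth), 4))
print("impulse train :", round(envelope_entropy(impulsive), 4))
```

Under this definition a lower value means a sparser, more impulsive envelope, which is why an optimizer that drives the entropy down tends to favor decompositions that isolate periodic fault impacts.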

 

Comments 3: There is no ablation study to isolate the effects of MSCA, RN, and ALA components on model performance. 

Response 3: Thank you for pointing this out. We agree with this comment. In the revised manuscript, three control experiments were designed to quantitatively evaluate the independent contributions of ALA-FMD and MSCA-RN. The first experiment tested the untreated signal directly with RN for recognition. The second experiment fed the signal processed with fixed-parameter FMD into RN for recognition. The third experiment fed the signal processed with fixed-parameter FMD into MSCA-RN for recognition. The final accuracy comparison is presented in the table. As shown in the table, the ALA module improves recognition accuracy by approximately 4.9%, while the MSCA module enhances overall recognition accuracy by about 4.1%. This indicates that both the ALA module and the MSCA module hold significant value. (Located on lines 488-497)

| Model | Test Set A | Test Set B | Test Set C |
| --- | --- | --- | --- |
| RN | 86.7 | 83.8 | 82.1 |
| FMD+RN | 88.1 | 84.9 | 83.2 |
| FMD+MSCA-RN | 92.4 | 90.2 | 87.5 |
| ALA-FMD+MSCA-RN | 96.8 | 94.3 | 91.2 |
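The contribution of each module can be read off the ablation table by differencing adjacent configurations. The sketch below averages those differences over the three test sets; this averaging is an assumption about how a per-module gain might be summarized, and the percentages quoted in the response may have been derived differently (for example from a single test set or a different pairing of rows).

```python
# Accuracy values copied from the ablation table (Test Set A, B, C).
accuracy = {
    "RN": (86.7, 83.8, 82.1),
    "FMD+RN": (88.1, 84.9, 83.2),
    "FMD+MSCA-RN": (92.4, 90.2, 87.5),
    "ALA-FMD+MSCA-RN": (96.8, 94.3, 91.2),
}


def mean_gain(with_module, without_module):
    """Average accuracy difference (percentage points) across the test sets."""
    diffs = [a - b for a, b in zip(accuracy[with_module], accuracy[without_module])]
    return sum(diffs) / len(diffs)


# Each pair isolates one change: (configuration with it, configuration without it).
steps = {
    "fixed-parameter FMD": ("FMD+RN", "RN"),
    "MSCA": ("FMD+MSCA-RN", "FMD+RN"),
    "ALA": ("ALA-FMD+MSCA-RN", "FMD+MSCA-RN"),
}

for module, (after, before) in steps.items():
    print(f"{module}: {mean_gain(after, before):+.2f} points")
```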

 

Comments 4: The experiments rely heavily on benchmark datasets (CWRU and SKF), limiting the method's generalizability.

Response 4: Thank you for pointing this out. We agree with this comment. This paper first validates the proposed method using the CWRU bearing dataset, and then further verifies the method with experimental data collected from an existing test rig. The bearing model used in the dataset is SKF6205. Due to the limitations of our laboratory's experimental conditions, the bearing used in the actual tests was SKF6215. To further validate the generalization ability of the proposed method in this paper, data from the CWRU dataset, experimental data, and the SEU dataset were utilized. Data collected under the same operating conditions for different bearings within the datasets were compared and analyzed. The number of training samples remained at 20, and a 4-way 5-shot task was constructed. The accuracy results obtained from different datasets are shown in the table.

| Dataset | MSCA-RN | CARN | RN | SERN | CBRN |
| --- | --- | --- | --- | --- | --- |
| CWRU | 96.8 | 94.3 | 91.2 | 94.1 | 93.9 |
| Test data | 94.6 | 90.3 | 86.5 | 89.7 | 90.1 |
| SEU | 95.9 | 94.1 | 91.3 | 93.7 | 94.2 |

The experimental data achieved an accuracy of 94.6%, and the accuracies on the CWRU and SEU datasets were both at least 95.9%. Compared with the other methods, the approach presented in this paper improved the average accuracy by at least 2%, demonstrating that the proposed model also exhibits excellent diagnostic performance across other datasets. (Located on lines 604-615)
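The "at least 2%" margin can be checked directly from the cross-dataset table. The sketch below averages each model over the three datasets and compares MSCA-RN with the strongest baseline; taking an unweighted mean over datasets is an assumption about how the average was formed.

```python
# Accuracy values copied from the cross-dataset table.
results = {
    "MSCA-RN": {"CWRU": 96.8, "Test data": 94.6, "SEU": 95.9},
    "CARN": {"CWRU": 94.3, "Test data": 90.3, "SEU": 94.1},
    "RN": {"CWRU": 91.2, "Test data": 86.5, "SEU": 91.3},
    "SERN": {"CWRU": 94.1, "Test data": 89.7, "SEU": 93.7},
    "CBRN": {"CWRU": 93.9, "Test data": 90.1, "SEU": 94.2},
}

# Unweighted mean accuracy per model across datasets.
means = {model: sum(acc.values()) / len(acc) for model, acc in results.items()}

baselines = {m: v for m, v in means.items() if m != "MSCA-RN"}
best_baseline = max(baselines, key=baselines.get)
margin = means["MSCA-RN"] - baselines[best_baseline]

for model, value in sorted(means.items(), key=lambda kv: -kv[1]):
    print(f"{model:8s} {value:.2f}")
print(f"margin over {best_baseline}: {margin:.2f} points")
```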

 

Comments 5: The small sample size experiments are not validated on unseen real-world data, affecting external validity.

Response 5: Thank you for pointing this out. We agree with this comment. The ALA-FMD and MSCA-RN fusion method proposed in this study achieves adaptive noise reduction and feature enhancement for non-stationary vibration signals by dynamically adjusting the parameters of Feature Mode Decomposition (FMD) with the Artificial Lemming Algorithm (ALA). By integrating the Multi-Scale Coordinate Attention Relation Network (MSCA-RN), the method leverages the coordinate attention mechanism to capture both the global energy distribution and the local texture features of time-frequency maps. Furthermore, the nonlinear metric capability of the Relation Network (RN) facilitates fault similarity discrimination even with limited sample sizes. The core advantages of this method include:

1. Data Adaptability: ALA-FMD effectively extracts fault features under different load and speed conditions on the CWRU dataset by selecting the optimal modal components through REI, with an SNR improvement of 11.5% compared to traditional FMD.

2. Small Sample Robustness: MSCA-RN employs a meta-learning training strategy, achieving a diagnostic accuracy of 91.2% for unseen working conditions in the 4-way 5-shot task, which is a 5.9% improvement over traditional RN models.

3. Cross-Dataset Generalization Capability: In addition to the CWRU dataset, the experimental data of SKF6215 bearings collected by our laboratory is introduced, validating the model's effectiveness across different bearing models and speeds, with an average diagnostic accuracy of 92.83%.

To further verify the generalization performance of the proposed model, we incorporated additional datasets for validation. Although the effectiveness of the method has been confirmed on standard datasets and in supplementary experiments, complex operational data collected in real time from industrial sites is notably absent, which may affect the external validity of the model in extreme scenarios. Due to the limitations of current experimental conditions, validation using industrial site data has not been conducted, for which we sincerely apologize. Future research will focus on collaborating with industrial enterprises to collect bearing vibration data in scenarios such as wind power and machine tools, thereby constructing an industrial dataset that encompasses variable operating conditions and multiple fault types. Once again, we apologize, and we will actively implement your suggestions by prioritizing industrial site data validation in our subsequent research to further improve the external validity of the study. We hope these responses meet your satisfaction. (Located on lines 604-615, 630-634)
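The 4-way 5-shot task referred to in this response can be sketched as an episode sampler: pick four classes, then draw five support samples and a handful of query samples per class. The sample identifiers, the pool size of 30 per class, and the function name are hypothetical placeholders; the query size of 5 follows the setting mentioned elsewhere in this record, and nothing here reproduces the authors' data pipeline.

```python
import random


def sample_episode(data_by_class, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode as (support, query) lists of (sample, label)."""
    classes = rng.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for label in classes:
        # Sample without replacement so support and query never overlap.
        items = rng.sample(data_by_class[label], k_shot + n_query)
        support += [(x, label) for x in items[:k_shot]]
        query += [(x, label) for x in items[k_shot:]]
    return support, query


# Placeholder pool: 30 sample ids for each of the four bearing conditions.
pool = {label: [f"{label}_{i}" for i in range(30)] for label in ("NC", "IF", "OF", "RF")}

rng = random.Random(0)
support, query = sample_episode(pool, n_way=4, k_shot=5, n_query=5, rng=rng)
print(len(support), "support samples,", len(query), "query samples")
```

With four classes and five shots the support set holds 20 samples, which matches the 20 training samples stated for the task.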

 

Comments 6: The figures and results (e.g., envelope spectra and accuracy plots) are not statistically analyzed for significance.

Response 6: Thank you for pointing this out. We agree with this comment. We have supplemented the key charts and results in the text with statistical significance analysis. (Located on lines 402-405, 423-424, 583-591, 597-603)

 

Comments 7: Comparisons are mainly made with outdated or baseline models (RN, CARN, CPO) without including newer architectures like Transformers or CNN-GRU hybrids.

Response 7: Thank you for pointing this out. We agree with this comment. In the revised manuscript, to validate the advantages of the proposed model, it was compared with current mainstream models, namely ResNet-34, CNN-GRU, and Transformer. The query set size for the MSCA-RN model was set to 5, and the obtained accuracy is illustrated in Figure 14. As demonstrated in Figure 14, the fault recognition accuracy of the proposed method surpasses that of the other models, with an average accuracy improvement of approximately 5% compared to ResNet-34, around 5.4% compared to CNN-GRU, and approximately 5.9% compared to Transformer. (Located on lines 479-487)

 

 

 

Comments 8: The feature extraction performance of ALA-FMD is not benchmarked against modern deep feature extraction techniques.

Response 8: Thank you for pointing this out. We agree with this comment. To validate the feature extraction capability of ALA-FMD, it was compared with four contemporary deep feature extraction techniques: CNN-FE, VAE-FE, Transformer-FE, and DCNN-FE. The metrics used for comparison include SNR, REI, the inter-class to intra-class distance ratio (B/W Ratio), and the effective feature ratio (EF Ratio). The results of this comparison are presented in Table 9.

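| Method | SNR | REI | B/W Ratio | EF Ratio (%) |
| --- | --- | --- | --- | --- |
| ALA-FMD | 1.783 | 0.58 | 3.82 | 89.5 |
| CNN-FE | 1.558 | 1.09 | 2.56 | 78.3 |
| VAE-FE | 1.481 | 1.12 | 2.13 | 72.6 |
| Transformer-FE | 1.612 | 1.20 | 2.89 | 81.7 |
| DCNN-FE | 1.653 | 0.77 | 3.24 | 85.4 |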

The SNR of ALA-FMD is 0.13 higher than that of DCNN-FE, while the REI is 32% lower. This indicates that the parameters optimized by ALA can decompose signals more accurately, preserving fault characteristics while effectively suppressing noise. Furthermore, the B/W Ratio and EF Ratio of ALA-FMD are 17.9% and 4.1 percentage points higher than those of DCNN-FE, respectively. This demonstrates that the extracted time-frequency features exhibit greater inter-class differences and more compact intra-class distributions in the feature space. Among the deep learning models, Transformer-FE outperforms CNN-FE and VAE-FE due to the self-attention mechanism's capability to capture long-range dependencies in non-stationary signals; however, it still falls short of the parameter-adaptive decomposition performance of ALA-FMD. Additionally, ALA-FMD feature extraction does not require training, providing significant advantages for industrial real-time diagnostics. (Located on lines 537-553)
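The comparisons between ALA-FMD and DCNN-FE can be recomputed from the Table 9 values. The sketch below is only an arithmetic check; it shows that a relative difference depends on which method is taken as the baseline, which matters most for the REI figure.

```python
# Values copied from the ALA-FMD and DCNN-FE rows of Table 9.
ala = {"SNR": 1.783, "REI": 0.58, "B/W": 3.82, "EF": 89.5}
dcnn = {"SNR": 1.653, "REI": 0.77, "B/W": 3.24, "EF": 85.4}

snr_diff = ala["SNR"] - dcnn["SNR"]                # absolute difference
bw_rel = (ala["B/W"] - dcnn["B/W"]) / dcnn["B/W"]  # relative to DCNN-FE
ef_points = ala["EF"] - dcnn["EF"]                 # percentage points

rei_gap = dcnn["REI"] - ala["REI"]
rei_rel_to_dcnn = rei_gap / dcnn["REI"]  # how much lower ALA-FMD is than DCNN-FE
rei_rel_to_ala = rei_gap / ala["REI"]    # how much higher DCNN-FE is than ALA-FMD

print(f"SNR difference        : {snr_diff:.3f}")
print(f"B/W relative increase : {bw_rel:.1%}")
print(f"EF difference         : {ef_points:.1f} points")
print(f"REI gap vs DCNN-FE    : {rei_rel_to_dcnn:.1%}")
print(f"REI gap vs ALA-FMD    : {rei_rel_to_ala:.1%}")
```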

 

 

 

 

 

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

electronics-3732499

Authors addressed the problem of bearing fault feature extraction using small sample sizes. They proposed an intelligent diagnostic method that integrates an Artificial Lemming Algorithm (ALA) for Feature Mode Decomposition parameter optimization (ALA-FMD) with a Multi-Scale Coordinate Attention Relation Network (MSCA-RN). They presented the theory and implementation of the proposed algorithm. They have shown that the proposed method achieves a maximum accuracy of 96.8%, with an average accuracy of 92.83% on the test data.

My comments are as follows.

1.

Lines 43-44: "Therefore, researching fault diagnosis methods under conditions of small samples and varying operating conditions holds significant theoretical importance and practical value."

Lines 99-100: "Based on the aforementioned analysis, this paper proposes a small-sample bearing fault diagnosis method utilizing ALA-FMD and MSCA-RN."

Please add some explanation of 'small sample sizes'. Please contrast them with 'large sample sizes'.

Small sample sizes: accelerometer data gathered for 10 seconds

Large sample sizes: accelerometer data gathered for 10 minutes

2.

Lines 251-252: " The minimum Residual Energy Index (REI) is adopted as the selection criterion for optimal modal components."

Lines 382-395: " The data corresponding to different fault types were decomposed into signals using  FMD, with the minimum REI employed as the selection criterion for optimal modal components. Figure 8 illustrates the REI values of each modal component for various bearing fault types."

Please define the minimum REI.

3.

Lines 341-343: "To validate the noise reduction effect of ALA-FMD and the classification performance of the MSCA-RN model, this paper employs the Case Western Reserve University (CWRU) bearing dataset [25] to verify the proposed fault diagnosis method."

Please elaborate on the CWRU dataset. Is it not original signals for bearing fault detection? How is it related to the dataset you have used in the test of the proposed algorithm?

4.

Lines 344-345: "The dataset was collected using acceleration sensors with SKF6205 bearings at a sampling frequency of 12 kHz."

Please add some comments on the choice of sampling frequency. Figures 9, 10, and 16 show spectra up to 1 kHz. Then how about using 3 kHz or 6 kHz?

5.

Lines 347-349: "The four bearing conditions utilized in the experiment are classified as Normal Class (NC), Inner-ring Fault (IF), Outer-ring Fault (OF), and Rolling-body Fault (RF), with a fault diameter of 0.178 mm."

1) Why this diameter? Fault diameters should have some distribution (e.g., 0.1 to 0.9 mm).

2) Why the same diameter for different faults. Shouldn't they be different for different faults?

3) What about the shape of the fault? The wording 'diameter' suggests circular pits. Figure 15 shows fault shapes which appear irregular.

6.

Some minor points

1) Equation font size: Please make the fonts in equations the same size as those in the text.

2) Please consider removing the vertical grid lines (in red) in Figures 9 and 10. They interfere with identifying the values of spectrum peaks.

 

 

Author Response

Comments 1: Lines 43-44: "Therefore, researching fault diagnosis methods under conditions of small samples and varying operating conditions holds significant theoretical importance and practical value."

Lines 99-100: "Based on the aforementioned analysis, this paper proposes a small-sample bearing fault diagnosis method utilizing ALA-FMD and MSCA-RN."

Please add some explanation of 'small sample sizes'. Please contrast them with 'large sample sizes'.

Small sample sizes: accelerometer data gathered for 10 seconds

Large sample sizes: accelerometer data gathered for 10 minutes

Response 1: Thank you for pointing this out. We agree with this comment. "Small sample" refers to scenarios in bearing fault diagnosis where the duration of a single data acquisition is typically short and the total number of samples is limited. Usually, there are only a dozen to a few dozen samples for each fault type. In contrast, "large sample" refers to situations where the fault data available for training diagnostic models is abundant. Typically, there are hundreds, thousands, or even tens of thousands of samples for each fault category, covering a wide range of operating conditions and damage levels. This study focuses on small-sample scenarios, not to overlook the value of large samples, but to address the more prevalent issue of data scarcity in industrial settings.

 

Comments 2: Lines 251-252: " The minimum Residual Energy Index (REI) is adopted as the selection criterion for optimal modal components."

Lines 382-395: " The data corresponding to different fault types were decomposed into signals using  FMD, with the minimum REI employed as the selection criterion for optimal modal components. Figure 8 illustrates the REI values of each modal component for various bearing fault types."

Please define the minimum REI.

Response 2: Thank you for pointing this out. We agree with this comment. REI is used to measure the noise residue in the modal components. Its calculation formula relates the original signal to the reconstructed signal of the modal component:

[equation not preserved in this record]

The smaller the REI value, the more effective energy is retained in the component, and the less noise is present. (Located on lines 317-321)
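The response describes REI as a quantity computed from the original signal and the reconstructed signal of a modal component, with smaller values being better. The sketch below assumes one plausible form, the energy of the residual divided by the energy of the original signal; this form is an assumption made for illustration, since the formula itself does not appear in this record, and the toy modes are hypothetical.

```python
import math


def residual_energy_index(original, reconstructed):
    """ASSUMED form: residual energy divided by original-signal energy."""
    residual = sum((o - r) ** 2 for o, r in zip(original, reconstructed))
    energy = sum(o ** 2 for o in original)
    return residual / energy


def select_min_rei(original, modes):
    """Return (index, REI) of the mode with the smallest REI."""
    scores = [residual_energy_index(original, m) for m in modes]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy data: three hypothetical "modes" that keep 90%, 20%, and 0% of the signal.
signal = [math.sin(2 * math.pi * t / 16) for t in range(64)]
modes = [
    [0.9 * s for s in signal],
    [0.2 * s for s in signal],
    [0.0 for _ in signal],
]

best_index, best_rei = select_min_rei(signal, modes)
print("selected mode:", best_index, "REI:", round(best_rei, 4))
```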

 

Comments 3: Lines 341-343: "To validate the noise reduction effect of ALA-FMD and the classification performance of the MSCA-RN model, this paper employs the Case Western Reserve University (CWRU) bearing dataset [25] to verify the proposed fault diagnosis method."

Please elaborate on the CWRU dataset. Is it not original signals for bearing fault detection? How is it related to the dataset you have used in the test of the proposed algorithm?

Response 3: Thank you for pointing this out. We agree with this comment. The CWRU bearing dataset, constructed by the Bearing Data Center at Case Western Reserve University, is a widely used benchmark dataset in the field of mechanical fault diagnosis. The data was collected in a laboratory environment, where vibration signals from motor bearings were recorded using acceleration sensors. It encompasses a variety of operating conditions, covering common fault types such as NC, IF, OF, and RF, with uniformly set fault diameters to facilitate the comparison of different research results.

This study directly utilizes the raw vibration signals under different operating conditions from the CWRU dataset as the test dataset. In the experimental procedure, four types of fault data samples, namely NC, IF, OF, and RF, are selected from the CWRU dataset under specific load and speed conditions. First, the ALA-FMD method is applied to these raw data for noise reduction and feature extraction, utilizing ALA to dynamically optimize FMD parameters and screen out fault feature modes with high signal-to-noise ratios. Subsequently, the extracted features are input into the MSCA-RN model for fault classification. Through experiments conducted on the CWRU dataset, the fault diagnosis performance of the proposed ALA-FMD and MSCA-RN fusion method under small-sample and variable operating conditions was validated. The cross-condition diagnosis accuracy between different load conditions reached 91.2%-94.3%, effectively demonstrating the effectiveness and robustness of the algorithm on such standard datasets.

 

Comments 4: Lines 344-345: "The dataset was collected using acceleration sensors with SKF6205 bearings at a sampling frequency of 12 kHz."

Please add some comments on the choice of sampling frequency. Figures 9, 10, and 16 show spectra up to 1 kHz. Then how about using 3 kHz or 6 kHz?

Response 4: Thank you for pointing this out. We agree with this comment. The CWRU bearing dataset used in the experiment is a standard test set in the field, and its 12 kHz sampling rate is widely recognized. This configuration not only meets the needs of fault characteristic analysis but also facilitates direct comparison with existing research results, ensuring the repeatability and scientific rigor of the experiment. Furthermore, although the bearing fault characteristic frequencies of interest are mainly fundamental frequencies, their harmonic components can extend beyond 1 kHz. According to the Nyquist theorem, the sampling frequency must be at least twice the highest analysis frequency. A 12 kHz sampling frequency covers frequency components up to 6 kHz, ensuring the complete capture of harmonic characteristics, avoiding aliasing distortion, and facilitating noise suppression by ALA-FMD.
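The Nyquist argument can be made concrete by comparing the three candidate sampling rates raised in the comment. The sketch below computes the Nyquist limit for each rate and the frequency at which a tone would appear after sampling; the 2 kHz harmonic is a hypothetical example chosen only because it lies above 1 kHz, not a value taken from the manuscript.

```python
def nyquist_frequency(fs_hz):
    """Highest frequency representable without aliasing at sampling rate fs_hz."""
    return fs_hz / 2.0


def aliased_frequency(f_hz, fs_hz):
    """Apparent frequency of a real tone at f_hz after sampling at fs_hz."""
    folded = f_hz % fs_hz
    return min(folded, fs_hz - folded)


harmonic_hz = 2000.0  # hypothetical harmonic above 1 kHz

for fs_hz in (3000.0, 6000.0, 12000.0):
    limit = nyquist_frequency(fs_hz)
    seen = aliased_frequency(harmonic_hz, fs_hz)
    status = "aliased" if seen != harmonic_hz else "preserved"
    print(f"fs={fs_hz:7.0f} Hz  Nyquist={limit:6.0f} Hz  "
          f"2 kHz tone appears at {seen:6.0f} Hz ({status})")
```

At 3 kHz the example tone folds down to 1 kHz and would be indistinguishable from a genuine 1 kHz component, whereas 6 kHz and 12 kHz preserve it; the higher rate simply leaves more headroom for higher-order harmonics.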

 

Comments 5: Lines 347-349: "The four bearing conditions utilized in the experiment are classified as Normal Class (NC), Inner-ring Fault (IF), Outer-ring Fault (OF), and Rolling-body Fault (RF), with a fault diameter of 0.178 mm."

1) Why this diameter? Fault diameters should have some distribution (e.g., 0.1 to 0.9 mm).

2) Why the same diameter for different faults. Shouldn't they be different for different faults?

3) What about the shape of the fault? The wording 'diameter' suggests circular pits. Figure 15 shows fault shapes which appear irregular.

Response 5: Thank you for pointing this out. We agree with this comment.

1) The CWRU bearing dataset used in the experiment is a widely adopted benchmark in the field, with the fault diameter uniformly set at 0.178 mm, which is the standard testing condition in the mechanical fault diagnosis domain, facilitating result comparability across different methods. Moreover, the 0.178 mm diameter represents a moderate level of fault severity, which not only generates distinct vibration characteristics but also avoids signal saturation caused by excessive faults, ensuring that ALA-FMD can effectively decompose time-frequency features.

2) The uniform diameter is adopted to achieve a single-variable control experiment. By fixing the fault size, the differences in vibration signals among IF, OF, and RF faults are primarily determined by the fault location rather than the extent of damage, thereby validating the MSCA-RN's capability to identify faults at different locations. At the same time, under small-sample conditions, controlling the fault diameter reduces the number of variables, avoids feature confusion introduced by size differences, and enhances the credibility of the model's generalization ability.

3) In bearing faults, "diameter" is typically used to characterize the equivalent damage size: even if the actual shape is irregular, its macroscopic scale can be described by the equivalent circle diameter. The irregular shape of the fault in Figure 15 is a common damage morphology in actual working conditions, and the ALA-FMD in this study effectively captures the time-frequency texture characteristics of irregular faults through multi-modal decomposition, demonstrating the robustness of the method.
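The equivalent circle diameter mentioned in point 3 is the diameter of a circle with the same area as the damaged region, d = 2·sqrt(A/π). The sketch below applies it to a polygonal outline; the outline coordinates are hypothetical and do not describe the fault in Figure 15.

```python
import math


def polygon_area(vertices):
    """Area of a simple polygon from its (x, y) vertices (shoelace formula)."""
    total = 0.0
    for i, (x0, y0) in enumerate(vertices):
        x1, y1 = vertices[(i + 1) % len(vertices)]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def equivalent_circle_diameter(area):
    """Diameter of the circle whose area equals the given area."""
    return 2.0 * math.sqrt(area / math.pi)


# Hypothetical irregular pit outline, coordinates in millimetres.
outline_mm = [(0.00, 0.00), (0.15, 0.02), (0.20, 0.10), (0.12, 0.18), (0.03, 0.14)]

area_mm2 = polygon_area(outline_mm)
print(f"area = {area_mm2:.4f} mm^2, "
      f"equivalent diameter = {equivalent_circle_diameter(area_mm2):.3f} mm")
```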

 

Comments 6: Some minor points

1) Equation font size: Please make the fonts in equations the same size as those in the text.

2) Please consider removing the vertical grid lines (in red) in Figures 9 and 10. They interfere with identifying the values of spectrum peaks.

Response 6: Thank you for pointing this out. We agree with this comment.

1) The font size of all formulas in the text has been uniformly adjusted to match the body text as required, ensuring that the formula symbols and variables are harmoniously typeset with the main text, thereby enhancing the document's readability.

2) The red vertical grid lines in Figures 9 and 10 have been removed. After the removal, the identification of spectral peak positions has become clearer, avoiding interference from the grid lines on the readings of peak amplitude and frequency. The modified spectrum has been rechecked for peak annotations to ensure the accurate visualization of fault characteristic frequencies.

 

 

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The manuscript may be accepted in its present form.
