4.1. Experimental Setting
In this section, two bearing datasets under complex working conditions are used to validate the effectiveness of the proposed method for rolling bearing fault diagnosis. In each case, the feature extraction quality and classification accuracy of the proposed ECMSE method are compared with those of its variants, CMSE and DBCMSE. Several advanced methods, including CMWPE, RCMFE, RCMDE, HFDE, and HWPE [20,30,31,32,33], are also considered for comparison. Recent data-driven and deep-learning-based approaches, such as semi-supervised metric learning, convolutional neural networks, and cross-domain adaptation models, have also achieved promising results in rolling bearing diagnosis [2,5,6,7]. Since the main objective of this paper is to evaluate whether ECMSE provides more discriminative entropy features, the primary comparisons are conducted among entropy-based feature extraction methods using the same HBA–KELM classifier. To further position the proposed method against broader data-driven diagnosis models, representative SVM, KELM, 1D-CNN, CNN-LSTM, and 1D-ResNet baselines are additionally evaluated in Section 4.4. Comparisons with transfer-learning and domain-adaptation models under variable working conditions remain an important direction for future work.
In the experiments, all parameters of CMSE and DBCMSE are set consistently with those of ECMSE. The same delay time and embedding dimension m are also adopted for the other comparison methods. To ensure the same feature vector length, the decomposition level k of HWPE and HFDE is set to 3, which yields 2³ = 8 nodes, and the maximum scale factor of the other methods is set to 8. The independent parameters of RCMFE, namely the similarity tolerance and the gradient, are set following [30]. The number of classes C for RCMDE and HFDE is set to 5 [31]. All methods are implemented in MATLAB 2018b and executed on a computer equipped with an Intel® Core™ i5-1135G7 CPU @ 2.40 GHz and 16.00 GB of RAM. For reproducibility, all random training/test partitions are generated before feature extraction and are kept identical for all compared methods within the same run. ECMSE parameter selection, HBA–KELM optimization, and classifier training are performed only on the training samples, whereas the test samples are used exclusively for final evaluation. Each reported maximum, minimum, mean, and standard deviation is obtained from 30 independent runs.
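The evaluation protocol above has three parts. Partitions are drawn once and shared by all methods, tuning uses training data only, and statistics are taken over 30 runs. The following minimal Python sketch illustrates this protocol; it is not the authors' MATLAB code. The function names `make_partitions` and `summarize`, the fixed seed, and the use of the sample standard deviation (`ddof=1`) are all assumptions.

```python
import numpy as np

def make_partitions(labels, n_train_per_class, n_runs=30, seed=0):
    """Draw n_runs stratified train/test index splits once, before feature extraction.

    Reusing the same list for every compared method makes the per-run accuracies
    of different methods paired observations.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)  # fixed seed: an assumption for reproducibility
    partitions = []
    for _ in range(n_runs):
        train_idx = []
        for c in np.unique(labels):
            idx = np.flatnonzero(labels == c)
            train_idx.extend(rng.choice(idx, size=n_train_per_class, replace=False))
        train_idx = np.sort(np.array(train_idx))
        # Everything not drawn for training is held out for final evaluation only.
        test_idx = np.setdiff1d(np.arange(labels.size), train_idx)
        partitions.append((train_idx, test_idx))
    return partitions

def summarize(accuracies):
    """Max, min, mean and sample standard deviation (ddof=1 is an assumption)."""
    a = np.asarray(accuracies, dtype=float)
    return {"max": a.max(), "min": a.min(), "mean": a.mean(), "std": a.std(ddof=1)}
```

For Case 1 (10 classes × 100 samples, 20 per class for training), `make_partitions(np.repeat(np.arange(10), 100), 20)` gives 30 splits of 200 training and 800 test indices each.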
4.2. Test Verification Case 1
In Case 1, the rolling bearing dataset [34] from Case Western Reserve University (CWRU) is employed to evaluate the performance of the proposed method. Figure 6 shows the experimental rig, which mainly consists of five components: a fan-end bearing, an induction motor, a drive-end bearing, a torque transducer/encoder, and a dynamometer.
In Case 1, the test objects are 6205-2RS JEM deep groove ball bearings manufactured by SKF Group (Gothenburg, Sweden), located at the drive end. The fault types include inner race fault, outer race fault, and ball fault. To simulate different damage levels, defects are introduced into healthy bearings by electro-discharge machining (EDM), with fault diameters of 0.1778 mm, 0.3556 mm, and 0.5334 mm for each fault type.
As a result, vibration signals under ten different working conditions (the normal condition and nine fault conditions) are obtained. In this experiment, the motor load is set to 1 HP, the sampling frequency is 12 kHz, and the rotational speed is approximately 1772 rpm. For each working condition, 100 non-overlapping samples with a length of 1024 are obtained. Of these, 20 samples are selected as the training set and the remaining 80 as the test set, giving 200 training samples and 800 test samples in total. The detailed information of bearings under different working conditions is presented in Table 2, and the corresponding vibration signal waveforms are shown in Figure 7.
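The sample construction can be sketched with the hypothetical helper `segment` below. It cuts one record into 100 non-overlapping windows of 1024 points. The text does not say where the windows start, so this sketch assumes they are taken consecutively from the beginning of the record. The last lines compute how much shaft rotation one window covers under the stated Case 1 settings.

```python
import numpy as np

def segment(signal, n_samples=100, length=1024):
    """Split the first n_samples * length points into non-overlapping windows.

    Taking the windows from the start of the record is an assumption.
    """
    signal = np.asarray(signal)
    needed = n_samples * length
    if signal.size < needed:
        raise ValueError(f"need {needed} points, got {signal.size}")
    return signal[:needed].reshape(n_samples, length)

# Time span of one 1024-point window in Case 1 (12 kHz, about 1772 rpm).
fs, rpm, length = 12_000, 1772, 1024
window_s = length / fs        # about 0.0853 s per sample
revs = window_s * rpm / 60    # about 2.52 shaft revolutions per sample
```

Each sample therefore spans roughly two and a half shaft revolutions. This is enough for several fault impacts to appear in every window at the stated speed.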
After all signal samples are obtained, the MSC index is used to determine the optimal parameter combination of ECMSE. To this end, after each random split, only the 20 training samples of each working state are used for MSC-based parameter selection; no test samples are involved in this process. The MSC values of the training samples are calculated under different parameter settings. To simplify parameter selection, the remaining parameters are first fixed, and the variation of MSC with the low threshold is observed under different high thresholds. Figure 8a shows that as the low threshold decreases, the MSC values for the three high-threshold groups gradually increase. Once the low threshold decreases to a certain value, the MSC values in each group reach their maximum and do not increase further, so this value is selected as the optimal low threshold. Subsequently, with the optimal low threshold fixed, the MSC values under different combinations of the high threshold and the embedding dimension m are calculated. As shown in Figure 8b, the combination that maximizes MSC gives the optimal high threshold and embedding dimension, and the final ECMSE parameters are set accordingly. Next, the ECMSE features of all samples are calculated. The feature vectors of the training set are fed into the HBA–KELM classifier for training, and the feature vectors of the test set are then fed into the optimized model for verification. As displayed in Figure 9, the proposed approach effectively identifies bearing faults of different types and damage degrees, and the maximum recognition accuracy reaches 100%.
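The two-stage search described above can be sketched as follows. The MSC index is defined elsewhere in the paper and is not reproduced here. Instead, `score_fn` is a hypothetical callable that stands in for the MSC value computed on training samples only. Which parameter is held fixed in stage 1 (`m_fixed` here) is also an assumption. Stage 1 averages the score over the high-threshold groups, which is likewise one reasonable choice rather than the paper's stated rule.

```python
import itertools

def two_stage_search(score_fn, low_grid, high_grid, m_grid, m_fixed, high_probe):
    """Two-stage grid search in the spirit of the MSC-based selection.

    score_fn(low, high, m) is a HYPOTHETICAL stand-in for the MSC value of the
    training samples. low_grid should be ordered from large to small: max()
    returns the first maximum, so ties resolve to the largest low threshold
    at which the score has already reached its plateau.
    """
    # Stage 1: m held fixed (assumption); average over the probe high thresholds.
    def stage1(low):
        return sum(score_fn(low, hi, m_fixed) for hi in high_probe) / len(high_probe)
    best_low = max(low_grid, key=stage1)

    # Stage 2: low threshold fixed; exhaustive search over (high threshold, m).
    best_high, best_m = max(itertools.product(high_grid, m_grid),
                            key=lambda pair: score_fn(best_low, pair[0], pair[1]))
    return best_low, best_high, best_m
```

Ordering the low-threshold grid from large to small mirrors the text. The selected value is the point where MSC stops increasing, not an arbitrary smaller value on the plateau.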
To evaluate feature extraction performance, comparative experiments are conducted against CMSE, DBCMSE, RCMFE, RCMDE, HFDE, HWPE, and CMWPE. t-SNE [35] is employed to reduce the multidimensional features of all methods to a two-dimensional space, and the visualization results are shown in Figure 10. In the CMSE features, OR2, B1, and B3 lie too close to one another to be distinguished. In contrast, apart from a few overlapping B3 samples, ECMSE and DBCMSE exhibit good overall separability, indicating the advantage of the proposed multi-order difference method in extracting detailed information. HFDE also shows relatively good clustering performance; however, its B1 and B3 clusters remain difficult to distinguish. The other methods exhibit large intra-class dispersion and severe feature overlap, making it difficult to separate different fault types. Overall, DBCMSE and ECMSE produce similar visualization results and show the best clustering performance among all methods. Following feature extraction, the feature vectors are fed into HBA–KELM for classification. To reduce the influence of randomness, all methods are executed 30 times, with the training and test sets randomly reselected before each run. The recognition accuracy and total computation time of the different methods are presented in Table 3 and Figure 11a. As shown, the proposed ECMSE method achieves the highest recognition accuracy in all experiments.
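The t-SNE projection used for the visualization can be sketched with scikit-learn's `TSNE`. The standardization step, the perplexity, the PCA initialization, and the random placeholder features (200 samples × 8 values) are assumptions made for illustration. The paper does not report these settings here.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

def embed_2d(features, seed=0, perplexity=30.0):
    """Project an (n_samples, n_features) entropy feature matrix to 2-D with t-SNE."""
    X = StandardScaler().fit_transform(features)   # per-feature scaling: an assumption
    return TSNE(n_components=2, perplexity=perplexity, init="pca",
                random_state=seed).fit_transform(X)

# Hypothetical placeholder features standing in for real entropy vectors.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 8))
Y = embed_2d(features)
```

Because t-SNE preserves local neighborhoods rather than global distances, the plots support qualitative statements about cluster overlap. The quantitative comparison rests on the classification accuracies.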
Although DBCMSE requires the shortest computation time, ECMSE, which integrates CMSE and DBCMSE, incurs only a slight increase in computational cost while clearly improving diagnostic accuracy and robustness.
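For reference, a minimal KELM with an RBF kernel is sketched below. It uses the standard closed form β = (I/C + Ω)⁻¹T, where Ω is the training kernel matrix and T holds one-hot targets. This is a sketch, not the authors' implementation. The HBA layer that tunes the regularization coefficient C and the kernel parameter is omitted. The kernel form exp(-gamma·||x - y||²) and the 0/1 one-hot coding are assumptions.

```python
import numpy as np

def rbf(A, B, gamma):
    """RBF kernel matrix exp(-gamma * ||a - b||^2) between rows of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negative round-off

class KELM:
    def __init__(self, C=1.0, gamma=1.0):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        self.X = np.asarray(X, float)
        self.classes = np.unique(y)
        # One-hot target matrix T (n_samples x n_classes).
        T = (np.asarray(y)[:, None] == self.classes[None, :]).astype(float)
        omega = rbf(self.X, self.X, self.gamma)
        # Output weights: beta = (I / C + Omega)^-1 T, solved without explicit inversion.
        self.beta = np.linalg.solve(np.eye(len(self.X)) / self.C + omega, T)
        return self

    def predict(self, X):
        scores = rbf(np.asarray(X, float), self.X, self.gamma) @ self.beta
        return self.classes[np.argmax(scores, axis=1)]
```

Training reduces to one linear solve whose size equals the number of training samples. This explains why a shallow classifier of this kind is cheap in the small-sample setting used here.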
To analyze the influence of sample set partitioning on the results, six scenarios (i.e., training/test set ratios) are considered: 10/90, 20/80, 40/60, 60/40, 80/20, and 90/10. Figure 11b shows the average recognition accuracy for each scenario over 30 runs. As observed, enlarging the training set generally improves recognition accuracy, although an excessive number of training samples may reduce the efficiency of classifier parameter optimization. By contrast, the proposed method is hardly affected by the sample ratio and achieves the highest accuracy under all ratios. Notably, although DBCMSE performs similarly to ECMSE in Case 1, its classification performance degrades at the 10/90 ratio, indicating that ECMSE is less dependent on the number of training samples. Overall, the proposed method demonstrates clear advantages in both recognition accuracy and robustness.
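The ratio study can be sketched as a loop that converts each ratio into per-class sample counts (100 samples per class) and averages 30 runs. Here `run_once` is a hypothetical callable that would split the data, tune and train on the training part only, and return the test accuracy. Integer flooring of the counts is an assumption; it is exact for all six stated ratios.

```python
import numpy as np

# Train/test ratios examined in the text (percent of each class used for training).
RATIOS = [(10, 90), (20, 80), (40, 60), (60, 40), (80, 20), (90, 10)]

def per_class_counts(ratio, samples_per_class=100):
    """Translate a train/test ratio into (n_train, n_test) per class."""
    train, test = ratio
    n_train = samples_per_class * train // (train + test)
    return n_train, samples_per_class - n_train

def ratio_sweep(run_once, n_runs=30):
    """Mean accuracy per ratio; run_once(n_train_per_class, run_index) is hypothetical."""
    results = {}
    for ratio in RATIOS:
        n_train, _ = per_class_counts(ratio)
        accs = [run_once(n_train, run) for run in range(n_runs)]
        results[ratio] = float(np.mean(accs))
    return results
```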
4.3. Test Verification Case 2
In Case 2, the experimental data were obtained from the aero-engine bearing test bench in the laboratory, which mainly consists of a spindle testing machine, a refrigeration system, a hydraulic loading system, and a lubrication system. Figure 12 shows the experimental setup and the locations of the accelerometers. The test object is the NU1010EM (inner race detachable)/N1010EM (outer race detachable) single-row cylindrical roller bearing manufactured by NSK. The bearing fault types include inner race fault, outer race fault, rolling element fault, outer race and rolling element compound fault, and inner race and rolling element compound fault.
The main geometric parameters of the tested bearing are listed in Table 4. Two piezoelectric acceleration sensors were mounted on the bearing housing by threaded connections, as shown in Figure 12b. Accelerometer 1 was installed near the upper radial position of the bearing housing, and Accelerometer 2 was installed near the lower radial position to collect vibration responses from different measurement locations.
To simulate different degrees of bearing damage, a laser marking machine and a wire cutting machine were used to process healthy bearings. This produced single-point and multi-point fault bearings with damage dimensions of 9 mm (length) × 0.2 mm (width). For single-point faults, one rectangular defect was fabricated on the corresponding bearing component. For multi-point and compound faults, three defects of the same nominal size were fabricated on the corresponding single or combined components, as summarized in Table 5. In Case 2, vibration signals of rolling bearings under nine different working conditions were collected. During data acquisition, the axial load was set to 2 kN, the motor speed to 2000 rpm, and the sampling frequency to 20.48 kHz. For each working condition, 100 non-overlapping samples with a length of 1024 were obtained, of which 20 were selected as the training set and the remaining 80 as the test set. The detailed information of bearings under different working conditions is presented in Table 5, and the corresponding waveforms are shown in Figure 13. In this subsection, the parameter selection procedure of ECMSE is the same as that in Case 1. After each random training/test split, the 20 training samples of each bearing type are used for MSC-based parameter selection, and the test samples are not used during parameter tuning. The MSC values of the training samples are calculated under different parameter settings.
Figure 14a shows that as the low threshold decreases, the MSC trends corresponding to different high thresholds are generally consistent. Once the low threshold falls to a sufficiently small value, the MSC values of the three groups reach their maximum and remain unchanged thereafter, so this value is selected as the optimal low threshold. The low-threshold selection in Cases 1 and 2 indicates that even when the high threshold varies, the MSC trends with respect to the low threshold remain similar. Hence, it is feasible to determine the low threshold independently. Similar to Case 1, the MSC values under different combinations of the high threshold and the embedding dimension m are shown in Figure 14b. The combination with the maximum MSC is selected, and the final ECMSE parameters are determined accordingly. The different optimal combinations of the high threshold and m selected in Case 1 and Case 2 reflect the data-dependent sensitivity of the ECMSE parameters. The low threshold mainly identifies approximately equal adjacent amplitudes, and its MSC trends are similar in the two cases. Once it decreases below a certain value, the MSC values become stable, indicating that ECMSE is relatively insensitive to smaller low-threshold values. By contrast, the high threshold and the embedding dimension m directly affect the symbolic slope patterns and are more sensitive to the signal structure. For the CWRU dataset in Case 1, the single-point faults are characterized by relatively regular impulse responses with subtle amplitude variations. A lower high threshold and a slightly larger embedding dimension therefore help capture fine local fluctuation patterns. In Case 2, the in-house dataset differs in bearing type, acquisition platform, and fault form, and the multi-point and compound faults produce more complex amplitude changes. A larger high threshold and a smaller embedding dimension therefore reduce excessive pattern fragmentation and provide better feature separability. This indicates that the optimal ECMSE parameters should be selected for each target dataset. The compact MSC-based search range allows stable parameter determination using only the training samples. Subsequently, the ECMSE features of all samples are extracted. The feature vectors of the training set are then fed into the fault classifier for model training, and the feature vectors of the test set are input into the optimized model for fault identification.
Similar to Case 1, the features extracted by all methods are input into HBA–KELM for classification. To reduce the influence of randomness, each method is executed 30 times, with the training and test sets randomly reselected each time. The experimental results are presented in Table 6 and Figure 15a. The proposed method achieves the highest average recognition accuracy and the smallest standard deviation. Although DBCMSE requires the shortest computation time, its average accuracy is only 99.11%, which is 0.71 percentage points lower than that of ECMSE (99.82%). The training/test sets are also divided into different ratios (10/90, 20/80, 40/60, 60/40, 80/20, and 90/10). For each ratio, the average recognition accuracy over 30 runs is calculated, and the results are shown in Figure 15b. The recognition accuracy of the comparative methods remains below that of the proposed method under all training/test ratios, further demonstrating the superiority of ECMSE.
4.4. Additional Robustness and Baseline Analysis
To further examine robustness and compare against broader baselines, three additional analyses are conducted: (1) an SNR-based noise robustness test, (2) a comparison with representative 1D signal-based machine-learning and deep-learning baselines, and (3) a classifier optimization ablation study. For the noise robustness test, additive white Gaussian noise is injected into the Case 1 test samples at different SNR levels, while the training samples remain unchanged. Each method is repeated 30 times under the same random partitions. The results are shown in Table 7.
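The noise injection can be sketched using the usual definition SNR = 10·log10(P_signal / P_noise). This definition fixes the noise variance as the signal power divided by 10^(SNR/10). Computing the power separately for each test sample is an assumption, and the helper name `add_awgn` is hypothetical.

```python
import numpy as np

def add_awgn(x, snr_db, rng):
    """Return x plus white Gaussian noise with 10*log10(P_signal / P_noise) = snr_db."""
    x = np.asarray(x, float)
    p_signal = np.mean(x ** 2)                 # per-sample signal power (assumption)
    p_noise = p_signal / 10 ** (snr_db / 10)   # noise variance implied by the SNR
    return x + rng.normal(0.0, np.sqrt(p_noise), size=x.shape)
```

Applying it only to the test windows, with the training windows left clean, reproduces the mismatch scenario described above. The model is tuned on clean data and evaluated on noisy data.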
ECMSE exhibits the best anti-noise performance among all compared methods, highlighting its engineering applicability in noisy bearing monitoring environments. Even under the most severe noise condition, ECMSE still achieves an average recognition accuracy of 93.68%, outperforming DBCMSE, CMSE, RCMDE, and RCMFE by 3.23, 13.35, 9.06, and 17.14 percentage points, respectively.
As shown in Table 7, the recognition accuracy of all methods decreases as the noise intensity increases, but ECMSE maintains the highest average accuracy at all SNR levels. When the SNR decreases from the clean condition to the most severe noise level, the accuracy of ECMSE decreases by 6.27 percentage points. This is smaller than the decreases of DBCMSE (9.39 percentage points), CMSE (16.17 percentage points), RCMDE (13.44 percentage points), and RCMFE (16.13 percentage points). These results indicate that fusing CMSE and DBCMSE improves robustness under noisy conditions, which is consistent with the complementary roles of the two branches. The CMSE branch retains relatively stable low-frequency trend information, while the DBCMSE branch preserves high-frequency fault-related details.
To position the proposed method against broader fault diagnosis models, representative 1D signal-based baselines are considered, including SVM, KELM, 1D-CNN, CNN-LSTM, and 1D-ResNet. For a fair comparison, all additional baselines use the same training/test partitions as the entropy-based methods, and the test samples are not used for model selection. The comparison results are shown in Table 8.
The results in Table 8 indicate that the proposed ECMSE+HBA–KELM framework achieves competitive performance compared with representative deep-learning models while using compact entropy features and a shallow classifier. In Case 1, ECMSE+HBA–KELM improves the average accuracy by 0.87 percentage points over 1D-ResNet, the strongest deep-learning baseline. In Case 2, the corresponding improvement is 1.21 percentage points. This suggests that the proposed entropy representation remains effective in small-sample diagnosis, where end-to-end deep models may not fully exploit their representation capacity.
To separate the contribution of ECMSE from that of the classifier optimizer, a classifier ablation study is further conducted by combining ECMSE with KELM, PSO-KELM, GA-KELM, and HBA–KELM. The results are shown in Table 9.
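The following sketch shows what each optimizer in the ablation is tuning. It scores a candidate (C, gamma) pair by k-fold cross-validation accuracy on the training set. It then uses plain random search as a stand-in for HBA, PSO, or GA, so it is explicitly not the Honey Badger Algorithm. The cross-validation fitness, the search ranges, and the helper names are assumptions; the paper does not state its fitness function here. The sketch relies on the fact that KELM with one-hot targets has the same closed form as scikit-learn's `KernelRidge` with `alpha = 1/C`.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import StratifiedKFold

def kelm_cv_accuracy(X, y, C, gamma, n_splits=5, seed=0):
    """Fitness: k-fold CV accuracy on training data only (assumed fitness function).

    KELM with one-hot targets equals kernel ridge regression with alpha = 1 / C.
    """
    classes = np.unique(y)
    accs = []
    folds = StratifiedKFold(n_splits, shuffle=True, random_state=seed)
    for tr, va in folds.split(X, y):
        T = (y[tr, None] == classes[None, :]).astype(float)  # one-hot targets
        model = KernelRidge(alpha=1.0 / C, kernel="rbf", gamma=gamma).fit(X[tr], T)
        pred = classes[np.argmax(model.predict(X[va]), axis=1)]
        accs.append(np.mean(pred == y[va]))
    return float(np.mean(accs))

def random_search(X, y, n_iter=30, seed=0, log_c=(-2, 4), log_g=(-3, 2)):
    """Stand-in optimizer (plain random search, NOT HBA) over log10(C) and log10(gamma)."""
    rng = np.random.default_rng(seed)
    best = (-1.0, None, None)
    for _ in range(n_iter):
        C, g = 10 ** rng.uniform(*log_c), 10 ** rng.uniform(*log_g)
        acc = kelm_cv_accuracy(X, y, C, g)
        if acc > best[0]:
            best = (acc, C, g)
    return best  # (best CV accuracy, C, gamma)
```

Swapping `random_search` for HBA, PSO, or GA changes only how candidate pairs are proposed. The fitness evaluation and the training-only data access stay the same, which is the variable the ablation isolates.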
Finally, to examine whether the improvement of ECMSE over its two branches is statistically meaningful, a paired Wilcoxon signed-rank test is conducted over the 30 repeated runs. In each run, the compared methods use the same random training/test partition, so their recognition accuracies form paired observations. The null hypothesis is that the median paired accuracy difference is zero, and a two-sided test with a significance level of 0.05 is adopted. The statistical results are listed in Table 10.
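The paired test can be reproduced with SciPy's `scipy.stats.wilcoxon`, as sketched below. The wrapper name `paired_wilcoxon` is hypothetical. By default, SciPy discards zero differences (`zero_method="wilcox"`), so runs where both methods score identically, for example both at 100%, do not contribute to the statistic.

```python
import numpy as np
from scipy.stats import wilcoxon

def paired_wilcoxon(acc_a, acc_b, alpha=0.05):
    """Two-sided Wilcoxon signed-rank test on per-run paired accuracies.

    acc_a[i] and acc_b[i] must come from the same training/test partition.
    Returns (statistic, p-value, significant at level alpha).
    """
    a, b = np.asarray(acc_a, float), np.asarray(acc_b, float)
    res = wilcoxon(a, b, alternative="two-sided")
    return float(res.statistic), float(res.pvalue), bool(res.pvalue < alpha)
```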
The ablation results in Table 9 show that HBA–KELM provides the best classification performance among the tested KELM variants. Compared with ECMSE+KELM, ECMSE+HBA–KELM improves the mean accuracy by 1.11 percentage points in Case 1 and 1.50 percentage points in Case 2. It also outperforms ECMSE+PSO-KELM and ECMSE+GA-KELM, indicating that HBA is effective for optimizing the KELM parameters in this task. In addition, Table 10 shows that the improvement of ECMSE over CMSE and RCMDE is highly significant in both cases. The improvement over DBCMSE is smaller but still statistically significant at the 0.05 level in both Case 1 and Case 2. These results support the effectiveness of fusing the low-frequency CMSE branch and the high-frequency DBCMSE branch.