Intelligent Fault Diagnosis of Rolling Bearings Based on a Complete Frequency Range Feature Extraction and Combined Feature Selection Methodology

The use of multiscale entropy methods to characterize vibration signals has proven promising for the intelligent diagnosis of mechanical equipment. However, current multiscale entropy methods utilize only the information in the low-frequency range and discard the information in the high-frequency range. To take full advantage of the available information, this paper proposes a fault feature extraction method that combines a bidirectional composite coarse-graining process with fuzzy dispersion entropy. To avoid redundancy in the full-frequency-range feature information, the Random Forest algorithm combined with the Maximum Relevance Minimum Redundancy algorithm is applied for feature selection. Together with the K-nearest neighbor classifier, these methods form an intelligent diagnosis framework for rolling bearings. The effectiveness of the proposed framework is evaluated through a numerical simulation and two experimental examples. The validation results demonstrate that the features extracted by the proposed method are more sensitive to bearing health conditions than those extracted by hierarchical fuzzy dispersion entropy, composite multiscale fuzzy dispersion entropy, multiscale fuzzy dispersion entropy, multiscale dispersion entropy, multiscale permutation entropy, and multiscale sample entropy. In addition, the proposed method is able to identify the fault categories and health states of rolling bearings simultaneously. The proposed methodology thus provides a new and effective framework for the intelligent fault diagnosis of rolling bearings in rotating machinery.


Introduction
In modern industrial systems in major engineering fields such as aviation, electric power, the chemical industry, and mining, rotating machinery is widely used as an integral part of these systems. With the continuous progress of science and technology, rotating machinery systems are becoming increasingly complex [1]. The rolling bearing, as a key component, usually operates continuously in a harsh environment and under complex loading; thus, it is prone to failure [2,3]. Once a rolling bearing fails, the reliability and stability of the rotating machinery system are directly affected, and the failure can create safety risks or even lead to catastrophic accidents [4]. Therefore, research on rolling bearing health condition monitoring and intelligent fault diagnosis is very important.
Owing to their cost-effectiveness, vibration sensors have been widely used in bearing condition monitoring [5]. In general, vibration-based rolling bearing health condition monitoring and intelligent fault diagnosis consist of three major steps: vibration signal acquisition, signal feature extraction, and fault identification and classification [6,7]. Among these steps, feature extraction strongly affects the final identification result. Extracting proper features from complex vibration signals is the key step in realizing intelligent fault diagnosis [8,9].
As a statistical measure, information entropy can be used to quantify the internal information and complexity of a time series [10]. Usually, the greater the complexity or irregularity of a time series, the larger the corresponding entropy value; conversely, a smaller entropy value usually indicates lower complexity and irregularity. Since a fault in a bearing generally increases the complexity of the vibration signal during operation, an appropriately configured entropy can serve as a feasible measure of bearing health conditions.
Based on vibration signals, several commonly defined information entropies have been successfully applied to fault diagnosis with useful results. For example, Yan [11] used approximate entropy as a feature index measuring signal complexity and successfully applied it to identifying structural defects in bearings. Han et al. [12] adopted sample entropy as an index reflecting the regularity and features of vibration signals and realized the fault diagnosis of rolling bearings. Zheng et al. [13] utilized fuzzy entropy as the feature and demonstrated its effectiveness through experimental verification. Zhang et al. [14] characterized the fault states of motor bearings using the permutation entropy of the vibration signal and verified the effectiveness of their method through experiments and comparative studies. Rostaghi et al. [15] proposed dispersion entropy for the condition monitoring of rotating machinery and, using several sets of experimental data, demonstrated that it performs better than other commonly used entropy methods. In recent years, improved entropy methods based on dispersion entropy have also been developed and applied to fault diagnosis [16,17].
For more complicated diagnosis scenarios, the single-scale entropies mentioned above are sometimes insufficient. Accordingly, the multiscale entropy method was proposed [18]. Multiscale entropy is calculated by expanding the original time series into series at multiple scales using the coarse-graining method. Long et al. [19] diagnosed different fault categories of rolling bearings using the multiscale sample entropy of the vibration signal as the feature index and compared the results with those obtained using the traditional single-scale sample entropy. Wu et al. [20] extracted multiscale permutation entropy features from the vibration signals of faulty rolling bearings and compared them with single-scale permutation entropy and multiscale sample entropy to verify the superiority and effectiveness of multiscale permutation entropy. Zhang et al. [21] used the multiscale dispersion entropy of the vibration signal as the feature for rolling bearing diagnosis and obtained effective diagnosis results.
Multiscale entropy feature extraction methods are widely used, but they still have two major drawbacks. On the one hand, multiscale entropy based on the traditional coarse-graining method has a large variance in entropy values at large scales, which reduces the reliability of the entropy assessment results. On the other hand, current multiscale methods extract only the low-frequency information from the sequence and do not consider the high-frequency information. To overcome the high variance of entropy values in multiscale entropy methods based on traditional coarse-graining, composite multiscale entropy was proposed [22]. Although this solved the high-variance problem at large scales, the utilization of high-frequency information remained unresolved. Hierarchical entropy [23] is an entropy method that takes into account the high-frequency information of a sequence; it essentially coarse-grains the sequence using two operators, a difference operator and an average operator. However, hierarchical entropy cannot consider multiscale information concurrently, and the coarse-grained sequences in different layers are not strictly distributed as high-frequency or low-frequency components [24]. In addition, feature selection after multiscale entropy feature extraction has not been considered in most cases, which can lead to redundant feature information. Therefore, a new method is needed that can take advantage of the full frequency range information while minimizing possible feature redundancy.
Fuzzy dispersion entropy (FDE) has been proven to be a promising method for feature extraction [25]. However, it is a single-scale measure, which makes it difficult to reflect information at multiple scales. To fully extract the entropy feature information of the full frequency band of the signal while avoiding feature redundancy and achieving better diagnosis, this paper proposes an intelligent diagnosis framework for rolling bearing fault identification that combines bidirectional composite multiscale fuzzy dispersion entropy (BCMFDE), the Random Forest (RF) algorithm, the Maximum Relevance Minimum Redundancy (mRMR) algorithm, and the K-nearest neighbor (KNN) classifier. A numerical simulation and two experiments are used to demonstrate the effectiveness and versatility of the proposed method.
The rest of this paper is organized as follows. Section 2 describes the basic definitions of the proposed methodology. Section 3 describes the framework of the proposed fault diagnosis method. Section 4 simulates the signals of rolling bearings with different faults and verifies the effectiveness of the proposed method using the simulated signals. Section 5 verifies the effectiveness of the proposed method using two experimental examples. Section 6 summarizes the conclusions.

BCMFDE
The BCMFDE method is formed by combining the FDE method and the bidirectional composite coarse-graining process (BCCGP).

FDE
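FDE is a method used to characterize the complexity of a time series and to estimate the dynamic changes in signal fluctuations. The calculation procedure is as follows [26]:
1.
The time series $x = \{x(i);\ i = 1, 2, \ldots, N\}$ is mapped to $y = \{y(1), y(2), \ldots, y(N)\}$, with $0 < y(i) < 1$, using the normal cumulative distribution function.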

2.
Each element $y(i)$ is mapped to a new symbolic sequence $Z^c = \{z_1^c, z_2^c, \ldots, z_N^c\}$ using the linear transformation
$$z_i^c = c \cdot y(i) + 0.5,$$
where $c$ is the number of categories (classes).

3.
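The embedding vectors of $Z^c$ with embedding dimension $m$ and delay time $d$ are constructed as follows:
$$z_j^{m,c} = \{z_j^c, z_{j+d}^c, \ldots, z_{j+(m-1)d}^c\}, \quad j = 1, 2, \ldots, N-(m-1)d.$$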

4.
The fuzzy membership function is introduced for the sequence $Z^c$ as follows:

5.
Each vector $z_j^{m,c}$ is mapped to a dispersion pattern $\pi_{v_0 v_1 \ldots v_{m-1}}$ according to its degrees of membership, where $z_j^c$ belongs to class $v_0$, $z_{j+d}^c$ to class $v_1$, …, and $z_{j+(m-1)d}^c$ to class $v_{m-1}$. The membership degree of each vector $z_j^{m,c}$ is calculated to obtain the membership degree of each dispersion pattern. In general, the number of dispersion patterns to which each vector $z_j^{m,c}$ is attributed in FDE is equal to $c^m$.

6.
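The probability of each dispersion pattern $\pi_{v_0 v_1 \ldots v_{m-1}}$ is calculated as follows:
$$p(\pi_{v_0 v_1 \ldots v_{m-1}}) = \frac{1}{N-(m-1)d} \sum_{j=1}^{N-(m-1)d} \mu_{\pi_{v_0 v_1 \ldots v_{m-1}}}\left(z_j^{m,c}\right),$$
where $\mu_{\pi}(z_j^{m,c})$ denotes the membership degree of $z_j^{m,c}$ to pattern $\pi$. Finally, the FDE is calculated according to Shannon's entropy as follows:
$$\mathrm{FDE}(x, m, c, d) = -\sum_{\pi=1}^{c^m} p(\pi_{v_0 v_1 \ldots v_{m-1}}) \ln p(\pi_{v_0 v_1 \ldots v_{m-1}}).$$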
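To make the six steps concrete, the following Python sketch computes FDE with NumPy. It is an illustrative sketch under stated assumptions, not the reference implementation of [26]: step 1 uses the normal cumulative distribution function (as in standard dispersion entropy), and the fuzzy membership function of step 4, whose equation is not reproduced above, is assumed here to be a triangular function centred on each integer class. The function name `fuzzy_dispersion_entropy` and its default parameters are hypothetical.

```python
import math
import numpy as np

def fuzzy_dispersion_entropy(x, m=3, c=6, d=1, normalize=True):
    """Sketch of FDE following the six steps above (membership function assumed)."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    if sigma == 0:
        raise ValueError("constant signal: the normal-CDF mapping is undefined")
    # Step 1: map x to y in (0, 1) with the normal CDF (mean and std of x)
    y = np.array([0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0)))) for v in x])
    # Step 2: linear map to the class axis; no rounding, unlike crisp dispersion entropy
    z = c * y + 0.5
    # Step 4 (assumed): triangular membership of z in the integer classes 1..c.
    # Clipping to [1, c] makes the memberships of each element sum to 1.
    zc = np.clip(z, 1.0, float(c))
    classes = np.arange(1, c + 1)
    u = np.maximum(0.0, 1.0 - np.abs(zc[:, None] - classes[None, :]))  # shape (N, c)
    # Step 3: number of embedding vectors z_j^{m,c}
    n_vec = len(x) - (m - 1) * d
    if n_vec <= 0:
        raise ValueError("series too short for the chosen m and d")
    # Step 5: membership of vector j in every pattern = product of its element memberships,
    # built as an outer product over the m positions (c**m patterns per vector)
    pat = u[:n_vec]
    for k in range(1, m):
        nxt = u[k * d : k * d + n_vec]
        pat = (pat[:, :, None] * nxt[:, None, :]).reshape(n_vec, -1)
    # Step 6: pattern probabilities, then Shannon entropy
    p = pat.sum(axis=0) / n_vec
    p = p[p > 0]
    h = float(-np.sum(p * np.log(p)))
    return h / math.log(c ** m) if normalize else h
```

Under these assumptions, each embedding vector distributes a total weight of one over the $c^m$ patterns, so the probabilities of step 6 sum to one and the normalized value lies in [0, 1]; a white-noise record should score close to 1, and a slowly varying sinusoid noticeably lower.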

BCCGP
The BCCGP is an improvement of the composite coarse-graining process [27] that enables the multiscale decomposition of both the low- and high-frequency components of a time series. The calculation procedure of the BCCGP is as follows:
1.
For a time series $x = \{x(i);\ i = 1, 2, \ldots, N\}$ of length $N$, where $N$ is a positive integer, the bidirectional composite coarse-graining operator at scale factor $\tau$ is built from $d_{o,j}^{(\tau)}$ and $a_{o,j}^{(\tau)}$, which represent the difference operator and the average operator, respectively.

2.
The coarse-grained series formed by the operators $d_{o,j}^{(\tau)}$ and $a_{o,j}^{(\tau)}$ are denoted $d_o^{(\tau)}$ and $a_o^{(\tau)}$, which represent the coarse-grained series of the high-frequency and low-frequency components, respectively.

3.
According to the definition of FDE, the BCMFDE is obtained by applying FDE to the high-frequency and low-frequency coarse-grained series at each scale factor $\tau$.
The BCCGP with scale factors τ = 2 and τ = 3 is shown in Figure 1. In the bidirectional composite coarse-graining process, both the difference and the average operators are applied to the original time series, so the BCCGP can capture more comprehensive feature information from the low-frequency and high-frequency components of the time series simultaneously.
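Because the exact operator equations are not reproduced above, the following NumPy sketch illustrates the idea under an explicit assumption. The low-frequency branch uses the standard composite coarse-graining (non-overlapping window means at every offset o), while the high-frequency branch uses a Haar-like difference operator assumed here purely for illustration; at τ = 2 it reduces to the hierarchical-entropy difference (x(1) − x(2))/2. The function name `bidirectional_composite_coarse_grain` is hypothetical, and the BCMFDE at scale τ would then be obtained by applying FDE to each returned series.

```python
import numpy as np

def bidirectional_composite_coarse_grain(x, tau):
    """Return (high, low): lists of tau coarse-grained series, one per offset o = 1..tau.

    low:  non-overlapping window means starting at offset o (standard composite coarse-graining).
    high: assumed Haar-like difference over the same windows,
          (sum of first half - sum of second half) / tau; the middle sample of an odd window is ignored.
    """
    x = np.asarray(x, dtype=float)
    half = tau // 2
    weights = np.zeros(tau)
    weights[:half] = 1.0
    weights[tau - half:] = -1.0  # zero-sum weights reject the DC level for every tau
    high, low = [], []
    for o in range(tau):  # o = 0 corresponds to offset 1 in the 1-based notation of the text
        n_win = (len(x) - o) // tau
        windows = x[o : o + n_win * tau].reshape(n_win, tau)
        low.append(windows.mean(axis=1))
        high.append(windows @ weights / tau)
    return high, low

high, low = bidirectional_composite_coarse_grain(np.arange(1.0, 7.0), tau=2)
```

For x = [1, 2, …, 6] and τ = 2, the low-frequency series are [1.5, 3.5, 5.5] and [2.5, 4.5], and the high-frequency series are constant at −0.5; for a constant input the high-frequency series vanish, which is the behaviour expected of a difference operator.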

Feature Selection
The advantage of BCMFDE is that it considers the information of both the low- and high-frequency components at different scales. However, the more scales are considered, the more features are constructed, and consequently the more computational effort is required. To balance feature richness against computational burden, a reasonable selection must be made from the extracted feature set. To this end, RF-mRMR is used for feature selection.

RF
RF is an ensemble learning algorithm whose basic units are decision tree models [28]. The algorithm aggregates the results of all decision trees and determines the final result by voting. RF can also evaluate the importance of features; the main idea is to calculate the contribution of each feature to each decision tree. This contribution can be represented using the out-of-bag (OOB) data error rate [29], where the OOB data are the samples not used when a given decision tree is built. The importance of a feature is measured by its average contribution across all trees.
To evaluate the importance of a feature, the steps are as follows: 1.
The corresponding OOB data are selected for each decision tree to calculate the OOB data error rate, denoted $e_{OOB1}$.

2.
The OOB data error rate is calculated again after adding random noise to the feature under evaluation in all OOB samples; this error rate is denoted $e_{OOB2}$.

3.
The importance ψ of the feature when there are $N_t$ decision trees in the forest can be expressed as:
$$\psi = \frac{1}{N_t} \sum_{n=1}^{N_t} \left(e_{OOB2}^{(n)} - e_{OOB1}^{(n)}\right) \tag{13}$$
where the superscript $(n)$ denotes the $n$-th decision tree. ψ reflects the importance of the feature because, if the OOB accuracy decreases sharply when random noise is added (i.e., $e_{OOB2}$ increases), the feature has a major impact on the model results, and its importance is therefore relatively high. The parameters of RF are set based on the suggestions given in [30], with the number of decision trees $N_t = 50$.
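The noise-based importance ψ can be sketched in NumPy as follows. To stay self-contained, the sketch replaces full decision trees with hypothetical one-split decision stumps for binary labels; the bootstrap sampling, OOB evaluation, random feature subsets of size √p, and Gaussian noise scaled to each feature's standard deviation are illustrative assumptions rather than the exact settings of [30]. The names `fit_stump` and `oob_noise_importance` are hypothetical.

```python
import numpy as np

def fit_stump(X, y):
    """Hypothetical stand-in for a decision tree: a one-split stump for labels in {0, 1}."""
    best = (np.inf, 0, 0.0, 0, 1)  # (training error, feature, threshold, left label, right label)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            left = X[:, f] <= t
            for ll, rl in ((0, 1), (1, 0)):
                err = np.mean(np.where(left, ll, rl) != y)
                if err < best[0]:
                    best = (err, f, t, ll, rl)
    _, f, t, ll, rl = best
    return lambda Z: np.where(Z[:, f] <= t, ll, rl)

def oob_noise_importance(X, y, n_trees=50, seed=0):
    """psi_f = (1/N_t) * sum over trees of (e_OOB2 - e_OOB1) when noise is added to feature f."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_sub = max(1, int(np.sqrt(p)))  # random feature subset per tree, as in RF
    psi = np.zeros(p)
    for _ in range(n_trees):
        boot = rng.integers(0, n, size=n)          # bootstrap sample for this tree
        oob = np.setdiff1d(np.arange(n), boot)     # out-of-bag samples
        feats = rng.choice(p, size=n_sub, replace=False)
        tree = fit_stump(X[boot][:, feats], y[boot])
        X_oob = X[oob][:, feats]
        e1 = np.mean(tree(X_oob) != y[oob])        # e_OOB1
        for j, f in enumerate(feats):
            X_noisy = X_oob.copy()
            X_noisy[:, j] += rng.normal(0.0, X[:, f].std(), size=oob.size)
            e2 = np.mean(tree(X_noisy) != y[oob])  # e_OOB2 for feature f
            psi[f] += e2 - e1
    # Features outside a tree's subset cannot change its error, so they contribute 0 there.
    return psi / n_trees
```

With one informative feature and several pure-noise features, the informative feature should receive by far the largest ψ, because perturbing it raises the OOB error of every stump that splits on it.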

mRMR
The mRMR algorithm acts as a filter for feature evaluation and selection [31,32]. Its core concept is to maximize the relevance between features and the categorical variable while minimizing the redundancy among different features. The basic theory of the mRMR algorithm is summarized below [33].
Mutual information is used to measure the similarity between variables, and each feature is assigned a score accordingly; features are then selected according to their scores. For two random variables X and Y, the mutual information is defined as:
$$I(X;Y) = \iint p(a,b) \log \frac{p(a,b)}{p(a)\,p(b)} \,\mathrm{d}a\,\mathrm{d}b \tag{14}$$
where p(a) and p(b) are the probability density functions of X and Y, respectively, and p(a, b) is their joint probability density function. Let $T_i$ denote a feature and $v$ denote the category. To ensure maximum relevance between the features and the category, the maximum relevance criterion is expressed as:
$$\max D(S, v), \quad D = \frac{1}{|S|} \sum_{T_i \in S} I(T_i; v) \tag{15}$$
where S is a feature subset, and $I(T_i; v)$ is the mutual information between feature $T_i$ and category $v$.
The maximum relevance criterion can find the feature subset S that has the greatest mutual information with the category; however, a subset selected by this criterion alone may contain redundant features. To ensure minimum redundancy among features, the minimum redundancy criterion is applied:
$$\min R(S), \quad R = \frac{1}{|S|^2} \sum_{T_i, T_j \in S} I(T_i; T_j) \tag{16}$$
To obtain feature sets with maximum relevance and minimum redundancy, D and R need to be optimized simultaneously. To achieve this, we define a feature sensitivity Φ for each feature as:
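A greedy mRMR selection can be sketched as follows. The sketch makes assumptions that are not stated above: features are discretized into equal-width bins so that mutual information can be estimated from histograms, and each candidate is scored by its relevance minus its mean mutual information with the already selected features (the difference form of combining relevance and redundancy). The helper names are hypothetical, and the exact form of Φ used in this paper may differ.

```python
import numpy as np

def mutual_info_discrete(a, b):
    """I(A;B) in nats for two integer-coded arrays, estimated from their joint histogram."""
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(joint, (a_idx, b_idx), 1.0)
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])))

def discretize(X, n_bins=8):
    """Equal-width binning of each column (assumed estimator for the densities in Eq. (14))."""
    Xd = np.empty(X.shape, dtype=int)
    for j in range(X.shape[1]):
        edges = np.linspace(X[:, j].min(), X[:, j].max(), n_bins + 1)[1:-1]
        Xd[:, j] = np.digitize(X[:, j], edges)
    return Xd

def mrmr_select(X, y, k, n_bins=8):
    """Greedy selection: start from the most relevant feature, then add the best relevance-redundancy trade-off."""
    Xd = discretize(X, n_bins)
    p = X.shape[1]
    relevance = np.array([mutual_info_discrete(Xd[:, j], y) for j in range(p)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best_j, best_score = -1, -np.inf
        for j in range(p):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info_discrete(Xd[:, j], Xd[:, s]) for s in selected])
            score = relevance[j] - redundancy  # difference form: relevance minus mean redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

In a toy case with two informative features, one near-duplicate of the first, and one noise feature, the greedy search keeps exactly one of the two duplicates together with the other informative feature, which is the redundancy-avoiding behaviour the criterion is designed for.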

KNN
The KNN classifier [34], which is widely used in mechanical fault diagnosis, is used to classify rolling bearings with different fault types and severities.
The KNN classifier performs classification by measuring the distance between feature vectors. The basic idea is that, if most of the K nearest neighboring samples $h_K$ of a sample h in the feature space belong to category L, then sample h also belongs to category L.
The basic steps of KNN include: 1.
Calculating the distance between the feature data of the test sample and the feature data of each training sample.

2.
Sorting the distances in ascending order.

3.
Selecting the K samples with the smallest distances.

4.
Counting the frequency of occurrence of each category among the K nearest samples.

5.
Returning the category with the highest frequency among the K nearest samples as the classification of the test sample.
The number of nearest neighbors K affects the results of the model, as shown in Figure 2: the classification results under K = 5 or K = 10 are inconsistent with those under K = 1. This indicates that K affects the complexity and generalization of the model. Therefore, to give the model better generalization, K = 5 is chosen as the number of nearest neighbors for the KNN classifier in this paper.
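The five KNN steps translate directly into a short NumPy function. Euclidean distance is assumed, since the distance metric is not specified above, and ties in the vote are broken in favour of the smallest label; `knn_predict` is a hypothetical name. The default K = 5 matches the choice made in this paper.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Classify each test sample by majority vote among its k nearest training samples."""
    # Step 1: Euclidean distance from every test sample to every training sample, shape (n_test, n_train)
    dist = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    # Steps 2-3: sort distances in ascending order and keep the k nearest (stable sort for ties)
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]
    preds = []
    for row in y_train[nearest]:
        # Step 4: frequency of each category among the k neighbours
        labels, counts = np.unique(row, return_counts=True)
        # Step 5: most frequent category (np.argmax returns the first, i.e. smallest, label on ties)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)
```

The broadcasted distance matrix keeps the sketch short but uses memory proportional to n_test × n_train × n_features, which is acceptable for feature sets of a few hundred samples.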

Intelligent Fault Diagnosis Framework
Based on the above discussion, we propose a rolling bearing fault feature set construction method that combines BCMFDE feature extraction with RF-mRMR feature selection, aiming to extract features over the whole frequency range of the signal while minimizing information redundancy. This method is combined with the KNN classifier to form an intelligent fault diagnosis framework. The procedure of the proposed framework is depicted in Figure 3 and consists of the following steps:

•
Step 1: Signal acquisition, as shown in Figure 3a. A vibration sensor is used to collect the dynamic response for bearing condition diagnosis. The collected vibration signal is segmented into samples of equal length before analysis.

•
Step 2: Feature set construction, as shown in Figure 3b. First, the analyzed signal is processed by the BCCGP to obtain the low-frequency and high-frequency component series at different scales. The FDE of each series is calculated using Equation (12), and the candidate feature set of rolling bearing faults consisting of BCMFDE values is constructed. Second, RF-mRMR is used to select the dominant features from the candidate feature set based on the importance ψ and sensitivity Φ of the features at each scale, yielding a new rolling bearing fault feature set.

•
Step 3: Failure identification and classification, as shown in Figure 3c. The new rolling bearing fault feature set is randomly divided into a training sample set and a test sample set.


Simulation
Simulated Bearing Damage Vibration Response
In this section, simulated signals of rolling bearings with different faults are used to evaluate the effectiveness of the proposed signal processing framework [35]. The simulated bearing faults include the roller fault, the inner race fault, and the outer race fault. The sampling frequency is 10.24 kHz, and the shaft rotating speed is 1800 rpm. The bearing parameters are listed in Table 1, and the schematic diagram of the simulated bearing is shown in Figure 4.
Assuming that the local defect begins to make contact with a roller at time $t = 0$, the impact force excited by the local defect on the outer ring can be expressed as:
Then, the impact force excited by the local defect on the inner ring can be expressed as:
and the impact force excited by the local defect on the roller can be expressed as:
In Equations (18)-(20), $d_o$ represents the pulse intensity of the outer race; $d_i$ is the pulse intensity of the inner race; $d_{bo}$ is the pulse intensity of the outer race to the roller; $d_{bi}$ is the pulse intensity of the inner race to the roller; $\delta$ is the unit impulse function; $k$ is the number of pulses; $\beta$ is the pulse phase difference coefficient, with $\beta = 0.5$; $f_o$ is the characteristic frequency of bearing outer race damage; $f_i$ is the characteristic frequency of bearing inner race damage; and $f_b$ is the characteristic frequency of roller damage. For simplicity, all pulse intensities in the simulation are assumed to be unity.
The vibration amplitude decay envelope function due to damping can be expressed as:
where $\zeta_e$ and $f_e$ are the damping ratio and the natural frequency of the bearing system, respectively. The amplitude transfer function is expressed as:
Sensors 2023, 23, 8767 10 of 24
The load distribution can be expressed as:
where $q_{max}$ represents the maximum load intensity, and ε represents the load distribution coefficient.
The simulated outer race fault signal can be expressed as:
The simulated inner race fault signal can be expressed as:
where $f_r$ is the rotating frequency of the shaft.
The simulated roller fault signal can be expressed as:
where $f_c$ is the rotating frequency of the bearing cage.
To simulate actual working conditions, Gaussian white noise is added to the simulated signals at a signal-to-noise ratio (SNR) of 5 dB. The time histories of the simulated bearing vibration responses under normal, roller fault, inner race fault, and outer race fault conditions are plotted in Figure 5.
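Because the fault-signal equations are not reproduced above, the sketch below uses a generic textbook model instead of the paper's: a train of unit impulses at a hypothetical fault characteristic frequency, each exciting a single-degree-of-freedom damped resonance, followed by Gaussian white noise added at a prescribed SNR. Only the 10.24 kHz sampling frequency and the 5 dB SNR come from the text; `f_fault`, `f_n`, and `zeta` are hypothetical placeholders rather than the parameters of Table 1, and the load distribution and amplitude transfer terms are omitted.

```python
import numpy as np

def impulse_train_response(fs, duration, f_fault, f_n, zeta):
    """Generic fault signal: unit impulses every 1/f_fault seconds, each exciting a damped
    single-degree-of-freedom resonance (natural frequency f_n, damping ratio zeta)."""
    n = int(round(fs * duration))
    t = np.arange(n) / fs
    impulses = np.zeros(n)
    n_imp = int(np.floor(duration * f_fault))
    idx = np.round(np.arange(n_imp) * fs / f_fault).astype(int)
    impulses[idx[idx < n]] = 1.0
    # Impulse response of the resonance, truncated after roughly 10 decay time constants
    n_h = min(n, int(10.0 * fs / (zeta * 2.0 * np.pi * f_n)) + 1)
    th = t[:n_h]
    wd = 2.0 * np.pi * f_n * np.sqrt(1.0 - zeta ** 2)  # damped natural frequency (rad/s)
    h = np.exp(-zeta * 2.0 * np.pi * f_n * th) * np.sin(wd * th)
    return t, np.convolve(impulses, h)[:n]

def add_noise_at_snr(signal, snr_db, rng):
    """Add zero-mean Gaussian white noise so that 10*log10(P_signal / P_noise) = snr_db."""
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)

# Hypothetical parameters: only fs = 10.24 kHz and SNR = 5 dB come from the text.
rng = np.random.default_rng(0)
t, clean = impulse_train_response(fs=10240, duration=0.2, f_fault=100.0, f_n=3000.0, zeta=0.1)
noisy = add_noise_at_snr(clean, snr_db=5.0, rng=rng)
```

The noise power is derived from the measured power of the clean signal, so the achieved SNR of a finite record fluctuates slightly around the 5 dB target.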

Simulation Analysis
The vibration signal is processed according to the procedure outlined in Section 3. Based on the suggestions given in [27], a sliding window of 2048 points is applied to each vibration response to extract 300 samples from the original signal. BCMFDE features are extracted from each sample to construct the feature set. For comparison, other entropy feature extraction methods are also applied to the same signals, including HFDE (hierarchical fuzzy dispersion entropy), CMFDE (composite multiscale fuzzy dispersion entropy), MFDE (multiscale fuzzy dispersion entropy), MDE (multiscale dispersion entropy), MPE (multiscale permutation entropy), and MSE (multiscale sample entropy). Based on the suggestions given in [25,36], the parameters used for feature calculation are listed in Table 2. The feature sets constructed by the different entropy methods are first visualized using the t-SNE [37] algorithm, as shown in Figure 6. According to [30], a feature set has good separability for categorization when samples of the same category lie close together and samples of different categories lie far apart. Judging by this visualization, the feature set constructed by the BCMFDE method has the best separability among the methods compared in Figure 6. To quantitatively evaluate the feature extraction ability of the different entropy methods, the classification performance of each feature set was tested using the KNN classifier, with test accuracy as the evaluation measure.
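The text does not state the step between consecutive windows, so the following sketch assumes that the 300 windows of 2048 points are spread evenly over the record, overlapping when the record is short. `sliding_window_samples` is a hypothetical helper.

```python
import numpy as np

def sliding_window_samples(signal, window=2048, n_samples=300):
    """Cut n_samples equal-length windows from signal with evenly spaced start indices (assumed)."""
    signal = np.asarray(signal)
    if n_samples < 2:
        raise ValueError("need at least two windows")
    if len(signal) < window + n_samples - 1:
        raise ValueError("signal too short for the requested windows")
    step = (len(signal) - window) // (n_samples - 1)  # >= 1 thanks to the length check above
    starts = np.arange(n_samples) * step
    return np.stack([signal[s : s + window] for s in starts])
```

Each row of the returned array is one sample; computing the BCMFDE features row by row yields the feature set for one health condition.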
The KNN classifier was trained first: 80% of each feature set was randomly selected to form the training set, and the rest was used as the test set. The training set was used to train the classifier, and the test set was then fed to the trained model to obtain the classification accuracy. This process was repeated 10 times, and the mean and standard deviation of the test accuracy, together with the mean computation time, are listed in Table 3 for comparison. Table 3 shows that, among the MFDE, MDE, MPE, and MSE methods, MFDE obtained the highest mean accuracy and the smallest standard deviation, indicating that FDE features are more sensitive. In addition, feature extraction based on BCMFDE produced the best mean accuracy and standard deviation compared with the methods based on traditional coarse-graining, indicating that the bidirectional composite coarse-graining approach indeed increases information richness and therefore provides better classification accuracy.
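The repeated 80/20 evaluation can be sketched as follows, assuming integer class labels starting at 0, Euclidean distance, and the population standard deviation (NumPy's default). A compact KNN is included so that the block runs on its own; the function names are hypothetical, and timing is omitted.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    dist = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return np.array([np.bincount(row).argmax() for row in y_train[nearest]])

def repeated_holdout_accuracy(X, y, n_repeats=10, train_frac=0.8, k=5, seed=0):
    """Mean and standard deviation of test accuracy over repeated random train/test splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    accs = []
    for _ in range(n_repeats):
        perm = rng.permutation(n)              # fresh random split for each repetition
        tr, te = perm[:n_train], perm[n_train:]
        pred = knn_predict(X[tr], y[tr], X[te], k)
        accs.append(np.mean(pred == y[te]))
    accs = np.array(accs)
    return float(accs.mean()), float(accs.std())
```

Reporting both the mean and the spread over repetitions, as in Table 3, separates the quality of a feature set from the luck of a single split.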
The confusion matrix intuitively shows the category and number of misclassified samples. The confusion matrix of the fifth test run was visualized and analyzed, as shown in Figure 7. Figure 7 shows that the feature set constructed using the BCMFDE method has the smallest number of misclassified samples, which is consistent with the t-SNE visualization of the feature sets in Figure 6. This indicates that the BCMFDE method has the best feature extraction capability among the compared methods.
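A confusion matrix like those in Figure 7 can be computed in a few lines: rows are true classes, columns are predicted classes, and the off-diagonal entries count misclassified samples. The integer class coding (for example 0 = normal, 1 = roller fault, 2 = inner race fault, 3 = outer race fault) and the name `confusion_matrix` are hypothetical conventions for this sketch.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] = number of samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # Unbuffered in-place accumulation, so repeated (i, j) pairs are all counted
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
n_misclassified = int(cm.sum() - np.trace(cm))  # total minus the correctly classified diagonal
```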
Although the BCMFDE-based method improves classification accuracy, its disadvantages are an increased computational burden and the risk of redundant feature information. Compared with traditional coarse-graining, bidirectional composite coarse-graining considers not only the information of the low-frequency components but also the additional information of the high-frequency components, which doubles the number of extracted features. To avoid redundant feature information while minimizing the computational cost, RF-mRMR is used to select important and sensitive features (assessed by importance ψ and sensitivity Φ and ranked from highest to lowest) from the raw feature set. The features whose importance ψ and sensitivity Φ rank in the top τ are selected and used to construct a new feature set. This feature selection procedure constitutes the RF-mRMR-BCMFDE process.
In the RF-mRMR-BCMFDE process, the importance ψ and sensitivity Φ of each feature were calculated based on Equations (13) and (17), respectively, and the results are shown in Figure 8. In Figure 8, feature indexes 1-16 correspond to the high-frequency components and feature indexes 17-32 correspond to the low-frequency components. It can be seen that the high-frequency components contain rich information, which is helpful for classification and compensates for the incomplete feature extraction from the low-frequency components alone. In addition, the sensitivity and importance differ significantly across features; therefore, a weighted feature selection strategy is necessary.
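To make the low-frequency/high-frequency split concrete, the sketch below shows one coarse-graining level at scale 2. It borrows the averaging and differencing pair used in hierarchical entropy, which is an assumption: the exact BCMFDE operator is defined in the method section of the paper, not in this excerpt. "Composite" is illustrated by repeating the split for each starting offset k; in a composite scheme the entropy of each offset's sequence would then typically be averaged. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def composite_split_scale2(x: Sequence[float]) -> List[Tuple[List[float], List[float]]]:
    """For each starting offset k in {0, 1}, return (low, high) sequences.

    ASSUMED operators (hierarchical-entropy style):
      low[j]  = (x[k+2j] + x[k+2j+1]) / 2   # averaging -> low-frequency part
      high[j] = (x[k+2j] - x[k+2j+1]) / 2   # differencing -> high-frequency part
    """
    out = []
    for k in range(2):
        low, high = [], []
        for i in range(k, len(x) - 1, 2):
            low.append((x[i] + x[i + 1]) / 2)
            high.append((x[i] - x[i + 1]) / 2)
        out.append((low, high))
    return out
```

Because each scale now yields both a low-frequency and a high-frequency sequence, an entropy computed on each doubles the feature count, which matches the 16 + 16 = 32 feature indexes in Figure 8. For the pair (1, 3), low = 2 and high = -1, and the original samples are recovered as low + high and low - high.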
To quantitatively evaluate the RF-mRMR feature selection method, the KNN classifier was trained and tested as before. For comparison, two other feature selection methods, the RF method and the mRMR method, were also used. The test results are shown in Figure 9. From Figure 9, it can be observed that the RF-mRMR method produced the highest mean accuracy and a relatively small standard deviation. Moreover, the RF-mRMR method achieved a higher mean accuracy and a smaller standard deviation than the BCMFDE method without feature selection (Table 3). This indicates that the RF-mRMR method can effectively reduce the redundancy of the feature set, further supporting the effectiveness of the proposed method.
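For readers unfamiliar with the mRMR baseline compared in Figure 9, the sketch below implements the generic greedy mRMR criterion in its difference form: at each step it picks the feature with the largest relevance to the labels minus its mean redundancy with the already selected features, both measured by mutual information. This is the textbook criterion, not necessarily the paper's sensitivity Φ (Equation (17) is outside this excerpt); features are assumed to be pre-discretized into integer bins, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def mutual_information(a: Sequence[int], b: Sequence[int]) -> float:
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def mrmr_score(i: int, selected: List[int], features, relevance) -> float:
    """Relevance of feature i minus its mean redundancy with the selected features."""
    redundancy = sum(mutual_information(features[i], features[j]) for j in selected)
    return relevance[i] - redundancy / len(selected)


def mrmr(features: List[Sequence[int]], labels: Sequence[int], k: int) -> List[int]:
    """Greedily select k feature indices with the mRMR difference criterion."""
    relevance = [mutual_information(f, labels) for f in features]
    selected = [max(range(len(features)), key=lambda i: relevance[i])]
    while len(selected) < k:
        remaining = [i for i in range(len(features)) if i not in selected]
        selected.append(max(remaining, key=lambda i: mrmr_score(i, selected, features, relevance)))
    return selected
```

In a toy case where feature 1 duplicates feature 0 and feature 2 carries independent label information, mRMR keeps features 0 and 2 and skips the redundant duplicate. This redundancy penalty is exactly what a pure relevance ranking lacks.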

Experimental Validation
In this section, two experimental examples are used to verify the bearing damage detection effectiveness and generalization capability of the proposed signal processing framework. The first example focuses on the diagnosis of different fault categories, while the second example addresses both fault categories and fault severities.

Test Setup
The experimental setup is shown in Figure 10. It is composed of a motor, a shaft supported by a test bearing and a healthy bearing, and a belt-wheel loading system. The shaft is driven by the motor at 1800 rpm. The driving-side bearing is healthy, and the driven-side bearing is the test bearing, into which different faults can be introduced. The kinematic parameters of the test bearing are listed in Table 4.

The vibration is picked up by an accelerometer fixed on the test bearing seat and digitized by an NI9185-based data acquisition system. The sampling frequency is 10.24 kHz, and the sampling duration is 60 s. As shown in Figure 11, three different bearing component faults and three different fault combinations are simulated by artificially introducing damage to the bearing parts with electrical discharge machining (EDM). Including the healthy baseline, eight categories of bearing conditions were tested, as listed in Table 5. Typical time histories of the vibration signals corresponding to the eight bearing categories are shown in Figure 12.

Diagnosis Results and Analysis
The vibration signals corresponding to the eight bearing categories are processed according to the procedure outlined in Section 4. For each bearing vibration response, a sliding window of 2048 points is applied and 300 samples are extracted from the original signal.
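The numbers above are self-consistent if the 2048-point windows do not overlap: a 60 s record at 10.24 kHz contains 614,400 points, which is exactly 300 × 2048. The text does not state the window step, so the sketch below assumes step = window (no overlap); the function name `segment` is hypothetical.

```python
from typing import List, Optional, Sequence


def segment(signal: Sequence[float], window: int = 2048, step: int = 2048,
            max_samples: Optional[int] = None) -> List[List[float]]:
    """Cut a 1-D signal into fixed-length windows (step == window means no overlap)."""
    starts = range(0, len(signal) - window + 1, step)
    segments = [list(signal[s:s + window]) for s in starts]
    return segments if max_samples is None else segments[:max_samples]


FS = 10_240          # sampling frequency in Hz (10.24 kHz)
DURATION = 60        # record length in seconds
n_points = FS * DURATION        # 614,400 points per record
n_windows = n_points // 2048    # 300 non-overlapping windows
```

If overlapping windows were used instead (step < window), more than 300 samples could be drawn from the same record, so the reported count of 300 is consistent with the no-overlap assumption.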
The evaluation procedure used is similar to the one described in Section 4.2. For comparison purposes, the entropy feature extraction methods based on BCMFDE, HFDE, CMFDE, MFDE, MDE, MPE, and MSE are used. The feature sets constructed by the different entropy methods are visualized using the t-SNE algorithm, as shown in Figure 13. According to Figure 13, qualitatively, the feature set constructed by the BCMFDE method has the best separability among all the methods used.

For the quantitative evaluation, the test results of the BCMFDE method and the other entropy feature extraction methods on the KNN classifier are listed in Table 6. It can be seen from Table 6 that the feature extraction method based on BCMFDE obtained the best mean accuracy and the smallest standard deviation. The confusion matrix of the fifth test run is shown in Figure 14. As can be seen from Figure 14, the feature set constructed using the BCMFDE method yields the smallest number of misclassified samples. This indicates that the BCMFDE method has the best feature extraction capability for different fault categories of bearings.
In the RF-mRMR-BCMFDE process, the importance ψ and sensitivity Φ calculated for each feature are shown in Figure 15. As can be seen in Figure 15, the high-frequency components also contain features of high sensitivity and high importance. In addition, the sensitivity and importance differ significantly across features; therefore, suitable feature selection is necessary. In this section, the features with importance ψ and sensitivity Φ in the top τ = 16 are selected. The selected features are then used to construct a new feature set.
For quantitative evaluation, the test results of the RF-mRMR, RF, and mRMR feature selection methods on the KNN classifier are shown in Figure 16. As can be seen from Figure 16, the RF-mRMR-based feature selection method obtained the best mean accuracy and the smallest standard deviation.
In addition, the RF-mRMR method achieves the same mean accuracy and standard deviation as the BCMFDE method without feature selection (Table 6). This demonstrates that the RF-mRMR feature selection method can effectively reduce the redundancy of the feature set without degrading classification performance.

The Test Data
To verify the effectiveness of the proposed fault diagnosis method in diagnosing both the fault categories and the fault severities of rolling bearings, experimental verification is carried out on the rolling bearing vibration dataset of Case Western Reserve University (CWRU). The rolling bearing fault simulation test rig is shown in Figure 17; it is composed of a motor, a torque transducer, and a dynamometer. The test bearings support the motor shaft. The kinematic parameters of the test bearing are listed in Table 7.
The accelerometer is placed near the drive end of the motor and is used to acquire vibration signals. The vibration signals are collected with a 16-channel DAT recorder. The sampling frequency is 12.00 kHz. Single-point faults with diameters of 0.18 mm, 0.36 mm, and 0.53 mm are introduced to the test bearing parts using EDM. The fault depth is 0.28 mm.

The bearing vibration signals acquired at a motor speed of 1797 rpm are selected as the data samples for analysis. The data samples cover 10 categories of bearing conditions, including the healthy baseline and different fault categories and severities, as listed in Table 8. Typical time histories of the vibration signals corresponding to the 10 bearing categories are shown in Figure 18.

Diagnosis Results and Analysis
In this section, the vibration signals corresponding to the 10 categories of bearings obtained in the experiments are processed according to the procedure outlined in Section 4. For each bearing vibration response, a sliding window of 2048 points is applied and 55 samples are extracted from the original signal.
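As a quick check on how much of each CWRU record is used, assuming (as the text does not state) that the 2048-point windows do not overlap, the 55 samples per condition require

```latex
N = 55 \times 2048 = 112{,}640 \ \text{points}, \qquad
T = \frac{N}{f_s} = \frac{112{,}640}{12{,}000\ \text{Hz}} \approx 9.39\ \text{s}
```

of signal at the 12.00 kHz sampling frequency per bearing condition.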
The evaluation procedure used is similar to that in Section 4.2. For comparison purposes, the entropy feature extraction methods based on BCMFDE, HFDE, CMFDE, MFDE, MDE, MPE, and MSE are used to extract the entropy features of each sample and construct different feature sets, respectively. The feature sets constructed by the different entropy methods are visualized using the t-SNE algorithm, as shown in Figure 19. According to Figure 19, qualitatively, the feature set constructed by the BCMFDE method has the best separability among all the methods used.

For the quantitative evaluation, the test results of the BCMFDE method and the other entropy feature extraction methods on the KNN classifier are listed in Table 9. It can be seen from Table 9 that the feature extraction method based on BCMFDE obtained the best mean accuracy and the smallest standard deviation. The confusion matrix of the fifth test run is shown in Figure 20. As can be seen from Figure 20, the feature set constructed using the BCMFDE method yields the smallest number of misclassified samples. This indicates that the BCMFDE method has the best feature extraction capability for different fault severities of bearings.
In the RF-mRMR-BCMFDE process, the importance ψ and sensitivity Φ calculated for each feature are shown in Figure 21. As can be seen in Figure 21, the high-frequency components also contain features of high sensitivity and importance, indicating that the high-frequency information is also important for classification. In addition, the sensitivity and importance differ significantly across features; therefore, suitable feature selection is necessary. In this section, the features with importance ψ and sensitivity Φ in the top τ = 16 are selected.
For quantitative evaluation, the test results of the RF-mRMR, RF, and mRMR feature selection methods on the KNN classifier are shown in Figure 22. As can be seen from Figure 22, the RF-mRMR-based feature selection method obtained the best mean accuracy and the smallest standard deviation. Moreover, compared with the BCMFDE method without feature selection (Table 9), the RF-mRMR method achieved a higher mean accuracy with a similar standard deviation, which further demonstrates the validity of the RF-mRMR method.

Conclusions
For the condition assessment of rolling bearings in rotating machinery, a combination of the BCMFDE-based feature extraction method and the RF-mRMR feature selection method is proposed for constructing rolling bearing fault feature sets. BCMFDE extracts richer feature information from both the high-frequency and the low-frequency components to characterize bearing faults, while RF-mRMR selects the features with high importance and sensitivity for classification, thus reducing the redundancy of the feature sets and improving the efficiency of fault identification. The validation results of the numerical simulation and two experiments demonstrate that the proposed framework, i.e., the fault feature set construction process combined with the KNN classifier, can automatically identify both the fault categories and the fault severities of bearings. The proposed framework provides a new perspective for intelligent bearing fault diagnosis. With some modifications, the framework is expected to be extendable to the intelligent diagnosis of other types of faults, such as gear damage, or to the intelligent condition monitoring of a complete system.


3. Selecting the K samples with the smallest distance.
4. Calculating the frequency of occurrence of each category among the top K samples.
5. Returning the category with the highest occurrence frequency among the top K samples as the classification of the test sample.
The value of the nearest neighbor number K affects the results of the model, as shown in Figure 2. As shown in Figure 2, the judgment results under K = 5 or K = 10 are inconsistent with those under K = 1. This indicates that the number of nearest neighbors K affects the complexity and generalization of the model. Therefore, to give the model better generalization, K = 5 is chosen as the number of nearest neighbors for the KNN classifier in this paper.
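The KNN steps above can be sketched as follows. Steps 1 and 2 are not visible in this excerpt, so computing the Euclidean distance to every training sample and sorting by it are assumptions, as is the tie-breaking rule; `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_X: List[Sequence[float]], train_y: List[int],
                x: Sequence[float], k: int = 5) -> int:
    # Assumed steps 1-2: Euclidean distance to each training sample, sorted ascending.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    # Step 3: keep the k samples with the smallest distance.
    nearest = [train_y[i] for i in order[:k]]
    # Step 4: count how often each category occurs among the k neighbors.
    counts = Counter(nearest)
    # Step 5: return the most frequent category; on a tie, Counter.most_common
    # keeps first-encountered order, i.e. the category of the closer neighbor wins.
    return counts.most_common(1)[0][0]
```

The toy case below mirrors the effect described for Figure 2: one close sample of class 1 surrounded by four farther samples of class 0 is labeled 1 with K = 1 but 0 with K = 5, showing how a small K follows individual points while a larger K smooths the decision.

```python
train_X = [(0.1, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
train_y = [1, 0, 0, 0, 0]
knn_predict(train_X, train_y, (0.0, 0.0), k=1)  # 1
knn_predict(train_X, train_y, (0.0, 0.0), k=5)  # 0
```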

Figure 3. Schematic diagram of the intelligent fault diagnosis framework.


Figure 5. Simulated vibration signals with different bearing states.

Figure 6. Feature set visualization for different entropy methods.

Figure 8. The normalized importance and sensitivity of each feature.

Figure 12. The raw vibration signal of eight categories of conditions of rolling bearings.

Figure 13. Feature set visualization for different entropy methods.

Figure 14.Confusion matrix of different entropy methods.

Figure 15.The normalized importance and sensitivity of each feature.

Figure 18. The raw vibration signal of 10 categories of conditions of rolling bearings.
Sensors 2023, 23, x FOR PEER REVIEW 22

Figure 19. Feature set visualization for different entropy methods.

Figure 20. Confusion matrix of different entropy methods.

Figure 21. The normalized importance and sensitivity of each feature.

Table 1. Parameters of the simulation bearing.
The sampling frequency is 10.24 kHz. The shaft rotating speed is 1800 rpm. The bearing parameters are listed in Table 1. The schematic diagram of the simulated bearing is shown in Figure 4.

Table 2. Parameters of the entropy-based methods.

Testing accuracy and time obtained using different methods.

Table 4. Parameters of the test bearing.

Table 6. Testing accuracy and time obtained using different methods.

Table 7. Parameters of the test bearing.

Table 8. The detailed descriptions for 10 different working conditions.
Figure 17. Rolling bearing fault simulation test rig.

Table 9. Testing accuracy and time obtained using different methods.