A New Fuzzy Logic Classifier Based on Multiscale Permutation Entropy and Its Application in Bearing Fault Diagnosis

The self-organizing fuzzy (SOF) logic classifier is an efficient, non-parametric classifier. Its classification process is divided into an offline training stage, an online training stage, and a testing stage. Representative samples of the different categories, called prototypes, are obtained through the first two stages. In the testing stage, however, the classification of a testing sample depends entirely on the prototype with the maximum similarity, without considering the influence of the other prototypes on the classification decision. Aiming at the testing stage, this paper proposes a new SOF classifier based on the harmonic mean difference (HMDSOF). In the testing stage of HMDSOF, firstly, the prototypes in each category are sorted in descending order of their similarity to the testing sample. Secondly, multiple local mean vectors of the sorted prototypes are calculated. Finally, the testing sample is classified into the category with the smallest harmonic mean difference. On top of this new classifier, multiscale permutation entropy (MPE) is used to extract fault features, linear discriminant analysis (LDA) is used to reduce the dimension of the fault features, and the proposed HMDSOF is used to classify them. At the end of this paper, the proposed fault diagnosis method is applied to diagnosis examples from two groups of different rolling bearings. The results verify the superiority and generalization ability of the proposed method.


Introduction
Rotating machinery has been widely used in various modern industries such as wind turbines, aero engines, water turbines, and gas turbines, and rolling bearings are a key component of rotating machinery. In the testing stage of the SOF classifier, classification depends only on the maximum similarity between the testing samples and the prototypes in each category. This does not take into account the impact of the other prototypes in the same category on the classification of testing samples, so classification accuracy may be affected. This paper improves the testing stage of SOF, i.e., its classification decision. The harmonic mean difference SOF (HMDSOF) proposed in this paper not only considers the influence of the other prototypes on the testing sample but also assigns different weights to different prototypes. In the experimental part, the influence of the parameter g on the classification result of HMDSOF is analyzed using the bearing fault data of Case Western Reserve University, and a default value of the parameter g is given. Then, by comparing the classification results of HMDSOF with those of SOF, SVM, DT, KNN, ELM, the least squares support vector machine (LSSVM), and the kernel extreme learning machine (KELM), the validity and rationality of the proposed HMDSOF are illustrated. Finally, the generalization ability of HMDSOF is verified with bearing testing data from a coal washer.

Multiscale Permutation Entropy
MPE can be defined as the set of permutation entropy values of a time series at different scales, and its calculation can be described as follows: (1) Assume a one-dimensional time series x(i), i = 1, ..., N of length N. Set the embedding dimension m and the delay time τ, and then perform phase-space reconstruction to obtain a matrix of the following form:

$$\begin{bmatrix} x(1) & x(1+\tau) & \cdots & x(1+(m-1)\tau) \\ \vdots & \vdots & & \vdots \\ x(j) & x(j+\tau) & \cdots & x(j+(m-1)\tau) \\ \vdots & \vdots & & \vdots \\ x(K) & x(K+\tau) & \cdots & x(K+(m-1)\tau) \end{bmatrix} \tag{1}$$

where K is the number of reconstruction vectors, K = N − (m − 1)τ.
To illustrate Formula (1), consider the example x = (4, 8, 9, 6, 5, 11, 7). When τ = 1 and m = 3, five embedding vectors are obtained: (4, 8, 9), (8, 9, 6), (9, 6, 5), (6, 5, 11), and (5, 11, 7). (2) Arrange the elements of each row of the reconstruction matrix in increasing order. It is important to note that if two equal elements exist in a reconstructed vector, they are kept in their original order. That is, let p and q be any two numbers between 1 and m; if $x(i+(j_p-1)\tau) = x(i+(j_q-1)\tau)$ and $p < q$, the following formula is obtained:

$$x(i+(j_p-1)\tau) \le x(i+(j_q-1)\tau) \tag{4}$$

(3) Each of the K reconstruction vectors of the one-dimensional time series is mapped to a symbol (ordinal pattern) sequence, and the permutation entropy is expressed as:

$$PE(m) = -\sum_{j=1}^{k} P_j \ln P_j \tag{5}$$

where $P_j$ represents the probability of the j-th ordinal pattern occurring. (4) After steps (1), (2), and (3), the permutation entropy at the first scale is obtained. To calculate multiscale permutation entropy, Formula (6) is used to coarse-grain the time series.

Entropy 2020, 22, 27 4 of 20
$$y_j^{(s)} = \frac{1}{s}\sum_{i=(j-1)s+1}^{js} x(i), \quad 1 \le j \le \left\lfloor \frac{N}{s} \right\rfloor \tag{6}$$

where s = 1, 2, ... is the scale factor and $y_j^{(s)}$ is the coarse-grained time series of length ⌊N/s⌋. It can be seen from Formula (6) that coarse-graining is achieved by averaging the time series over non-overlapping windows of length s; $y_j^{(1)}$ (j = 1, 2, ..., N) is the original time series. (5) After coarse-graining, the permutation entropy at each scale is calculated according to steps (1), (2), and (3).
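Steps (1)–(5) above can be sketched in code. The following Python functions are an illustrative NumPy implementation, not the authors' code: `permutation_entropy` uses a stable argsort so that equal elements keep their original order, as required by Formula (4), and `coarse_grain` implements the windowed averaging of Formula (6). For the example series x = (4, 8, 9, 6, 5, 11, 7) with m = 3 and τ = 1, all five ordinal patterns are distinct, so the permutation entropy equals ln 5.

```python
import numpy as np

def permutation_entropy(x, m=3, tau=1):
    """Permutation entropy of a 1-D series, steps (1)-(3)."""
    x = np.asarray(x, dtype=float)
    K = len(x) - (m - 1) * tau          # number of reconstruction vectors
    counts = {}
    for i in range(K):
        v = x[i:i + (m - 1) * tau + 1:tau]
        # stable sort keeps equal elements in original order (Formula (4))
        pattern = tuple(np.argsort(v, kind="stable"))
        counts[pattern] = counts.get(pattern, 0) + 1
    p = np.array(list(counts.values())) / K
    return -np.sum(p * np.log(p))

def coarse_grain(x, s):
    """Coarse-grained series y_j^(s): means of non-overlapping windows of length s (Formula (6))."""
    x = np.asarray(x, dtype=float)
    n = len(x) // s
    return x[:n * s].reshape(n, s).mean(axis=1)

def mpe(x, m=3, tau=1, smax=5):
    """Multiscale permutation entropy: PE of each coarse-grained series, s = 1..smax."""
    return [permutation_entropy(coarse_grain(x, s), m, tau) for s in range(1, smax + 1)]
```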

Linear Discriminant Analysis
Theoretically, the extracted multiscale permutation entropy set could be used directly to identify fault categories. However, the high-dimensional feature set contains a lot of redundant information, so a dimensionality reduction algorithm is needed to reduce the dimension of the initial feature set; this not only avoids the curse of dimensionality but also improves fault diagnosis performance. The role of LDA is to project a high-dimensional matrix into a low-dimensional one with minimal intraclass dispersion and maximal interclass dispersion. Assume the calculated multiscale permutation entropy set is $Y = [y_1, y_2, \ldots, y_n] \in \mathbb{R}^{d\times n}$, where n is the total number of samples, d is the dimension, and d = s = 32. LDA learns, in a supervised manner, a linear transformation matrix $W \in \mathbb{R}^{d\times m}$ ($m \ll d$). After the calculation below, a high-dimensional sample $y \in \mathbb{R}^d$ is mapped to a low-dimensional sample $x \in \mathbb{R}^m$.
$$x = W^{T} y$$

Y is partitioned by category as $Y = [Y_1, Y_2, \ldots, Y_C]$, where C is the number of categories, $Y_i \in \mathbb{R}^{d\times n_i}$ is the data set of category i, and $n_i$ is the number of samples in category i.
The optimal projection matrix W should satisfy the Fisher criterion, i.e., maximize the between-class dispersion relative to the within-class dispersion:

$$W^{*} = \arg\max_{W} \frac{\operatorname{tr}(W^{T} S_b W)}{\operatorname{tr}(W^{T} S_w W)}$$

where $S_b = \sum_{i=1}^{C} n_i (\bar{y}_i - \bar{y})(\bar{y}_i - \bar{y})^{T}$ is the between-class scatter matrix, $S_w = \sum_{i=1}^{C} \sum_{y \in Y_i} (y - \bar{y}_i)(y - \bar{y}_i)^{T}$ is the within-class scatter matrix, $\sum_{i=1}^{n} (y_i - \bar{y})(y_i - \bar{y})^{T}$ is the scatter matrix of the whole data set, and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ is the overall mean.
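The projection step can be sketched as follows. This is a textbook LDA implementation (class scatter matrices followed by an eigendecomposition of $S_w^{-1} S_b$), offered as an assumed illustration rather than the paper's exact procedure; samples are stored as columns to match $Y \in \mathbb{R}^{d\times n}$.

```python
import numpy as np

def lda_projection(Y, labels, m):
    """Fit an LDA projection W in R^{d x m}; columns of Y are samples."""
    d, n = Y.shape
    mean_all = Y.mean(axis=1, keepdims=True)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(labels):
        Yc = Y[:, labels == c]
        mc = Yc.mean(axis=1, keepdims=True)
        Sw += (Yc - mc) @ (Yc - mc).T
        Sb += Yc.shape[1] * (mc - mean_all) @ (mc - mean_all).T
    # Fisher criterion: top-m eigenvectors of pinv(Sw) @ Sb
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-eigvals.real)
    return eigvecs[:, order[:m]].real   # project a sample y with x = W.T @ y
```

A sample is then reduced with `x = W.T @ y`, matching the mapping $x = W^T y$ above.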

Self-Organizing Fuzzy Logic Classifier
SOF is a non-parametric fuzzy rule-based classifier. The algorithm comprises three stages: the offline training stage, the online training stage, and the testing stage. In the first two stages, fuzzy rules for each category are constructed from the prototypes of each category while the meta-parameters are iteratively updated; in the testing stage, the testing samples are classified. The specific process is as follows:

Offline Training Stage
The role of the offline training stage is to find prototypes in each category and build fuzzy rules belonging to the different categories. Suppose there are a total of K samples (each sample here is the low-dimensional feature vector produced by LDA), and the sample set belonging to category c is $\{x\}^c_{K_c}$, with multimodal density set $D^{MM}_{K_c}$. The samples are then ranked: $r_1$ is the sample with the greatest multimodal density, $r_2$ is the sample with the smallest distance from $r_1$, $r_3$ is the sample with the smallest distance from $r_2$, and so on. The multimodal density set of the sorted sample set $\{r\}$ is denoted $D^{MM}_{K_c}(r)$. Then, select the initial prototypes according to Formula (10).
where p 0 represents a collection of initial prototypes.
(3) In order to increase the number of initial prototypes, each initial prototype selected by Equation (11) is used as a center to attract nearby samples and form a data cloud.
It is important to note here that as mentioned above, the sample x i may not be unique, so the data cloud may not consist of only two samples.
(4) Define the set $p_0$ of initial prototypes obtained by Formula (10) as $\phi_0$; that is, define the set of data cloud centers as $\phi_0$. Recalculate the multimodal density according to Equation (12).
where $\phi_i \in \phi_0$, $S_i$ is the number of samples in the ith data cloud, and n is the number of elements in the set $\phi_0$.
(5) According to Formula (13), the set $\phi^{neighbor}_i$ of neighboring centers of each data cloud center is formed.
$G^{c,L}_{K_c}$ is the average radius of the local area of influence around the data samples corresponding to granularity level L, with a default value of L = 12. The calculation process is as shown in Equation (14), where $G^{c,L-1}_{K_c}$ is the average radius at granularity level L − 1.
$Q^{c,L}_{K_c}$ is the number of times that the distance between any two samples in $\{x\}^c_{K_c}$ is less than $G^{c,L-1}_{K_c}$. $Q^{c,1}_{K_c}$ is the number of times that the distance between any two samples in $\{x\}^c_{K_c}$ is less than the average square distance $\bar{d}^c_{K_c}$.
(6) According to Formula (16), select the most representative prototypes $p^c$ of category c from the data cloud centers.
where $\phi \in \phi^{neighbor}_i$. (7) After the representative prototypes of category c are determined, AnYa-type fuzzy rules belonging to each category are constructed according to Formula (17):

IF $(x \sim p^c_1)$ OR $(x \sim p^c_2)$ OR $\cdots$ OR $(x \sim p^c_{N_c})$ THEN (category c)    (17)

where $N_c$ is the number of prototypes in $p^c$, x represents a training sample, and ∼ denotes similarity.

Online Training Stage
After the offline training stage, online training samples are input to continue the training. The purpose of the online training stage is to continue selecting prototypes, update the meta-parameters of the classifier, and improve the classification accuracy on the testing samples. The online training process assumes that the samples are stream data arriving one by one. When an online training sample of category c is input, denote it $x^c_{K_c+1}$ and denote the enlarged sample set $\{x\}^c_{K_c+1}$. To improve computational efficiency, the average radius of the local area of influence is calculated with a new recursive formula (Formula (18)). Whether the sample $x^c_{K_c+1}$ is a prototype is then determined according to Formula (19), where mm is the number of elements in $p^c$ and $(p, p^c_l, p^c_j) \in p^c$. If Formula (19) is not satisfied, whether $x^c_{K_c+1}$ is a prototype is further judged according to Formula (20). If either Formula (19) or Formula (20) is satisfied, the meta-parameters of the SOF are updated accordingly. If neither Formula (19) nor (20) is satisfied, the sample is assigned to the nearest prototype, $p^c_{n^*} = \arg\min_{p \in p^c} d(x^c_{K_c+1}, p)$, and the corresponding meta-parameters are updated. After that, Equation (17) is updated accordingly, and the SOF classifier is ready to process the next data sample or to enter the testing stage.

Testing Stage
The role of the testing stage is to classify the input testing samples. Assume the testing sample set is $\{z_1, z_2, \ldots, z_{vv}\}$. To determine the category of a testing sample $z_{ii}$, the classification process of SOF is as follows: (1) According to Formula (23), calculate the similarity between each prototype selected in the first two stages and the testing sample.
(2) Classify the testing sample into the category of the prototype that has the greatest similarity to the testing sample.
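The maximum-similarity decision of steps (1) and (2) can be sketched as below. The exact similarity of Formula (23) is not reproduced in this excerpt, so a Cauchy-type similarity $1/(1+\|z-p\|^2)$ is assumed here purely for illustration; any similarity that decreases with distance yields the same decision structure.

```python
import numpy as np

def similarity(z, p):
    # Assumed Cauchy-type similarity; the exact form of Formula (23) is in the full paper.
    return 1.0 / (1.0 + np.sum((np.asarray(z, float) - np.asarray(p, float)) ** 2))

def sof_classify(z, prototypes):
    """prototypes: dict mapping category label -> list of prototype vectors.
    Returns the label of the prototype most similar to z (SOF testing stage)."""
    best_label, best_sim = None, -1.0
    for label, protos in prototypes.items():
        for p in protos:
            s = similarity(z, p)
            if s > best_sim:
                best_label, best_sim = label, s
    return best_label
```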

Proposed HMDSOF
After the offline and online training stages, SOF has selected a number of representative prototypes from each category of samples; during prototype selection, samples of different categories do not affect each other. In the testing stage, however, the classification of a testing sample depends only on the prototype with the greatest similarity to it. The effect of the other prototypes in the same category on the classification decision is not considered, which limits classification accuracy. To improve the classification accuracy, this paper proposes a SOF classifier based on the harmonic mean difference, called HMDSOF. The first two stages of HMDSOF are the same as those of SOF. In the testing stage of HMDSOF, the category of a testing sample is determined, and the corresponding label assigned, by calculating the harmonic mean difference between the testing sample and the prototypes of each category. The innovation consists of two main points: (1) The influence of the different prototypes in the same category on the classification of the testing sample is considered by calculating multiple local mean vectors within each category. (2) To distinguish the influence of the different prototypes in the same category on the classification decision, the harmonic mean difference, constructed by introducing the concept of the harmonic mean, is used as the classification decision; prototypes that differ only slightly from the testing sample carry greater weight in the decision. The main process is as follows.
(1) Calculate the similarity between each prototype in each category and the test sample using Equation (22), and arrange the results in descending order.
(2) In each category, the prototypes are sorted according to the similarity ranking; that is to say, the prototypes with greater similarity to the testing sample $z_{ii}$ are placed in front. (3) According to Formula (25), calculate the local average vectors of the sorted prototypes in category c.
where 1 ≤ i ≤ g, and g is a parameter set before the testing stage. In addition, g cannot be larger than the minimum number of prototypes over the categories, that is, $g \le \min_c(N_c)$. It follows that g is also the number of local average vectors $a^{Rc}_i$, and $a^{Rc}_1 = p^{Rc}_1$. (4) Construct the harmonic mean difference by introducing the concept of the harmonic mean. Suppose there is a sample set $\{y_1, y_2, \ldots, y_g\}$ with g elements; its harmonic mean is calculated as shown in Formula (26):

$$H(y_1, \ldots, y_g) = \frac{g}{\sum_{i=1}^{g} \frac{1}{y_i}} \tag{26}$$

The calculation of the difference is shown in Formula (27); its value range satisfies $Difference(z_{ii}, p) \ge 1$. The harmonic mean difference of a category is the harmonic mean of the differences between each local mean vector of that category and the testing sample. This paper applies the proposed harmonic mean difference to the SOF-based classification decision. Denoting the harmonic mean difference HMD(·), its calculation process is as shown in Equation (28).
To illustrate how the harmonic mean difference assigns different weights to different prototypes in the same category, Equation (29) is given. As can be seen from Equation (29), once the parameter g is set, $HMD(z_{ii}, \{a^c_i\}_{i=1}^{g})$ is a fixed value. As a result, a prototype with a smaller difference from the testing sample is given more weight.
(5) Assign the testing sample to the category with the smallest harmonic mean difference. It should be noted that when g = 1, Formula (30) reduces to Formula (24); that is, when g = 1, HMDSOF degenerates into SOF. Equation (31) expresses this relationship between SOF and HMDSOF.
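Steps (1)–(5) can be sketched as follows. Since Formulas (25)–(30) are not reproduced in this excerpt, two assumptions are made and marked in the comments: prototypes are ranked by Euclidean distance (equivalent to descending similarity for any distance-decreasing similarity), and the difference is taken as $1 + \|z - a\|$, which respects the stated range $Difference \ge 1$. With g = 1 the decision reduces to the nearest prototype, i.e., plain SOF.

```python
import numpy as np

def hmdsof_classify(z, prototypes, g=3):
    """prototypes: dict label -> array-like (N_c, d); requires g <= min(N_c)."""
    z = np.asarray(z, dtype=float)
    best_label, best_hmd = None, np.inf
    for label, P in prototypes.items():
        P = np.asarray(P, dtype=float)
        # steps (1)-(2): sort prototypes by similarity to z (here: ascending distance)
        P = P[np.argsort(np.linalg.norm(P - z, axis=1))]
        # step (3): local mean vectors a_i = mean of the i most similar prototypes
        local_means = np.array([P[:i].mean(axis=0) for i in range(1, g + 1)])
        # step (4): differences (assumed form, always >= 1) and their harmonic mean
        diffs = 1.0 + np.linalg.norm(local_means - z, axis=1)
        hmd = g / np.sum(1.0 / diffs)
        # step (5): keep the category with the smallest harmonic mean difference
        if hmd < best_hmd:
            best_label, best_hmd = label, hmd
    return best_label
```

Because the harmonic mean weights each local mean by the reciprocal of its difference, local means built from prototypes close to the testing sample dominate the decision, matching the weighting argument of Equation (29).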

Proposed Fault Diagnosis Method
The fault diagnosis method proposed in this paper is shown in Figure 1: after the vibration signal is collected, the multiscale permutation entropy set is first extracted. The multiscale permutation entropy parameters selected in this paper are an embedding dimension of m = 6 and a delay time of τ = 1. In order to capture as much of the signal's character as possible, the scale factor is set to s = 32 [33,34]. Such a feature set spans many scales, and the entropy values of the different categories are intertwined, which is not conducive to the final classification. Therefore, linear discriminant analysis (LDA) is used to reduce the dimensionality of the multiscale permutation entropy feature set; the dimension after reduction is nine. Then, the reduced feature set is randomly divided into online training samples, offline training samples, and testing samples. Finally, the proposed HMDSOF classifier is used for classification: after its training parameters are updated in the two training stages, the testing samples are classified. For convenience of description, this fault diagnosis method is named MPE-LDA-HMDSOF.


Experiment 1
In Experiment 1, the experimental data of rolling bearings provided by Case Western Reserve University (CWRU) is used to verify the effectiveness of the proposed method. The experimental equipment is shown in Figure 2. It consists mainly of a three-phase induction motor, a torque sensor, and a load motor. The testing bearings are 6205-2RS (SKF, Sweden) deep groove bearings. The vibration acceleration signal of the bearing is obtained from the driving end under the condition of a rotation speed of 1797 r/min and a sampling frequency of 12 kHz. The bearing vibration signals are first classified into four categories, namely ordinary rolling bearings (normal) and rolling bearings with ball failure (B), outer ring failure (OR), and inner ring failure (IR). The faulty bearing is formed on the normal bearing by using electro-discharge machining (EDM), and each fault condition is classified according to the fault size of 0.007, 0.014, and 0.021 inches (1 inch = 25.4 mm), so the bearing vibration signal is finally classified into 10 categories. The first 102,400 points under each category are divided into 50 non-overlapping data samples on average; that is, 2048 sampling points are taken as a sample, and 50 samples can be obtained for each category, for a total of 500 samples. A detailed description of the class label is given in Table 1. The time-domain waveforms of their typical vibration signals are shown in Figure 3. So, a multiscale permutation entropy feature set with the size of 500 × 32 is obtained. The results of the multiscale permutation entropy corresponding to the vibration signal of Figure 3 are shown in Figure 4. In this paper, 10 samples are randomly selected from each category to form the online training sample set. In the remaining samples, 10 samples are randomly selected in each category to form the offline training sample set, and then the remaining samples constitute the testing sample set. 
It is known that both the online training set and offline training set have 100 samples, and the test sample set has 300 samples.
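The segmentation described above (the first 102,400 points of each category split into 50 non-overlapping samples of 2048 points) is a simple reshape. A minimal sketch with a stand-in signal, not the authors' code:

```python
import numpy as np

# Stand-in for one category's vibration signal (the real data has >= 102,400 points).
signal = np.arange(102_400, dtype=float)

# 50 non-overlapping samples of 2048 consecutive points each.
segments = signal[:50 * 2048].reshape(50, 2048)
```

Doing this for each of the 10 categories yields 500 samples, from which the 500 × 32 MPE feature set is computed.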
Since the experiments in this paper are conducted with randomly selected samples, in order to reduce the impact of contingency, the average of 10 experiments is taken, and the maximum and minimum classification accuracies are given. In addition, the standard deviation of the classification accuracy is given to analyze the stability of each classification method. Three different feature extraction methods (MPE, MPE-PCA, and MPE-LDA) are used to extract fault features, which are then used to compare SOF and the proposed HMDSOF; the comparison is listed in Table 2. All the methods are implemented in MATLAB R2016a and tested on an Intel Core i5-6200U CPU @ 2.30 GHz with 4.00 GB RAM running 64-bit Windows 10. The contribution rate of each principal component of MPE after PCA treatment is listed in Table 3, and the first eight principal components, whose cumulative contribution rate reaches 90%, are selected to form the feature set. Since the minimum number of prototypes (obtained for the third category) after training is 4 in this experiment, the case g ≥ 5 does not arise. It can be seen that the bigger the value of g, the longer the classification time. When the fault feature extraction method is MPE (numbers 1–4), the average classification time of HMDSOF (g = 3) is 1.0576 s longer than that of SOF.
The average classification accuracy of HMDSOF (g = 3) is 0.7334% higher than that of SOF, and the standard deviation of its classification accuracy is 0.0005 lower. When the fault feature extraction method is MPE-PCA (numbers 5–8), compared with SOF, the average classification time of HMDSOF (g = 3) is 1.0379 s longer, its average classification accuracy is 0.6% higher, and the standard deviation of its classification accuracy is 0.0054 lower. When the fault feature extraction method is MPE-LDA (numbers 9–12), compared with SOF, the average classification time of HMDSOF (g = 3) is 0.9893 s longer, and the standard deviation of the classification accuracy is reduced by 0.0022. The average classification accuracy improves by only 0.3667%, but the maximum accuracy of HMDSOF reaches 100%, which is satisfactory. When the classification method is HMDSOF and different feature extraction methods are compared (for example, numbers 2, 6, 10, or numbers 3, 7, 11), the five indicators show the advantages of the proposed MPE-LDA-HMDSOF. In conclusion, all three fault feature extraction methods show a better classification effect with HMDSOF than with SOF, which proves the effectiveness of the proposed HMDSOF; and under the same classification method, HMDSOF, the comparison of different classifier inputs proves the rationality of the proposed MPE-LDA-HMDSOF. In addition, as g increases, the classification takes longer; at g = 3 the HMDSOF classifier offers the best trade-off between accuracy and efficiency, so the default value of g is set to 3. To make the proposed HMDSOF more convincing, this paper also compares it with other common classification methods, namely SVM, DT, KNN, ELM, the least squares support vector machine (LSSVM), and the kernel extreme learning machine (KELM).
The input of each classification method is the feature set processed by LDA after calculating multiscale permutation entropy. The training samples of the six comparison methods are the union of the online and offline training samples of the HMDSOF, and their test samples are the same as those of HMDSOF. The penalty factor of the standard SVM is 100, and its kernel parameter is 0.01. The minimum number of parent nodes of DT is 5. The number of nearest neighbors of KNN is K = 5, and the number of hidden layer nodes of ELM is 100 [21,35]. The Gaussian kernel parameter of the LSSVM is 0.5. The kernel function of the KELM is RBF, and its regularization parameter is 10,000 [36–39]. The classification results are shown in Table 4. It can be seen from Table 4 that the SVM has the lowest classification accuracy, and its standard deviation shows that its classification effect varies greatly across testing samples, so the algorithm is very unstable. The standard deviation of the classification accuracy of DT is 1.7525, so this algorithm is also unstable, and its minimum classification accuracy is 9% lower than that of HMDSOF. The maximum classification accuracy of KNN is 99%, but the standard deviation of its classification accuracy is 1.4915 higher than that of HMDSOF; the choice of input samples has a great influence on KNN's accuracy. The average classification accuracy of ELM is 2.0333% lower than that of HMDSOF, and the standard deviation of its classification accuracy is 0.7316 higher. Compared with SVM, the calculation speed and classification accuracy of LSSVM are significantly improved. KELM has the fastest calculation speed, but its maximum and minimum classification accuracies are 1% lower than those of HMDSOF; moreover, the standard deviation of its classification accuracy shows that its stability is not as good as that of the proposed HMDSOF.
In a word, the classification accuracy of HMDSOF is the highest; thus, the classification result is the best.
To express the classification effects of the various methods more intuitively, Figure 5 shows the classification results of each method in the fifth experiment. SVM has the lowest classification accuracy: 70 of the 300 samples do not match their real category, and 67 of these 70 misclassified samples are assigned to category 6, for an overall accuracy of 76.6667%. In the classification results of DT, 27 samples are misclassified, and the overall classification accuracy is 91%. In the classification results of KNN, six samples are misclassified, of which four samples in category 6 are classified as category 3 and one sample in category 6 is classified as category 9; in category 9, one sample is misclassified as category 5. The overall classification accuracy of KNN reaches 98%. In the classification results of ELM, a total of 10 samples are misclassified, giving an overall classification accuracy of 96.6667%. In the classification results of SOF, four samples are misclassified, among which three samples in category 6 are classified as category 3 and one sample in category 9 is classified as category 5; the total classification accuracy of SOF is 98.6667%. There are 20 misclassified samples in the classification results of LSSVM, for a classification accuracy of 93.3333%. There are five misclassified samples in the classification results of KELM, for a classification accuracy of 98.3333%. In the classification results of the proposed HMDSOF, there are no misclassified samples, and the classification accuracy is 100%.
In addition, in order to evaluate the results of this experiment from different perspectives, the F-score is introduced [40]. Its calculation process is shown in Formulas (32)–(34).
where precision(j), Recall(j), and F-score(j) represent the precision, recall, and F-score measures of the j-th predicted class, respectively [41]. The F-scores of each category corresponding to the experimental results in Figure 5 are shown in Figure 6.
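Formulas (32)–(34) are the standard per-class precision, recall, and F-score. A minimal illustrative sketch (not the authors' code):

```python
import numpy as np

def f_scores(y_true, y_pred, classes):
    """Per-class (precision, recall, F-score), as in Formulas (32)-(34)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    out = {}
    for j in classes:
        tp = np.sum((y_pred == j) & (y_true == j))   # true positives for class j
        fp = np.sum((y_pred == j) & (y_true != j))   # false positives
        fn = np.sum((y_pred != j) & (y_true == j))   # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        out[j] = (precision, recall, f)
    return out
```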

Experiment 2
The fan-end bearing of CWRU has proven to be a more complex database [35]. In Experiment 2, we use its data to verify the effectiveness of the proposed fault diagnosis method. All the parameters used in Experiment 2 are exactly the same as those in Experiment 1. The classification results of each classification method are shown in Table 5.


Experiment 3
This section uses experimental data from a rolling bearing in a coal washer to verify the generalization ability of the proposed fault diagnosis method. The experimental device is shown in Figure 7a. The motor speed is 1500 r/min, and the sampling frequency is 10 kHz. Two acceleration sensors are used to measure the bearing signal, and the positions of the measuring points are shown in Figure 7b. The two bearing models are NJ210 (NSK, Japan) and NJ405 (NSK, Japan). NJ210 has two states, normal and crack, and NJ405 also has two states, normal and peeling; their fault states are shown in Figure 8. In order to distinguish the two bearings, NJ210 is denoted A and NJ405 is denoted B, so the collected signals can be divided into four categories. Their classification is shown in Table 6, and typical time-domain waveforms corresponding to the four states are given in Figure 9.
After calculating the multiscale permutation entropy of the obtained experimental data, LDA is used for dimensionality reduction, and the reduced feature set is input into the different classification methods for comparison. In this experiment, there are 200 samples for each state, among which 50 samples from each category were randomly selected as the online training samples of HMDSOF and SOF. Then, 50 samples from the remaining 150 samples were randomly selected for offline training, and the remaining 100 samples were used as the testing samples. For SVM, DT, KNN, and ELM, there are 400 training samples and 400 testing samples. The other parameters used in Experiment 3 are the same as those used in Experiment 1, and the comparison results are shown in Table 7.

It can be concluded from Table 7 that among the six classification methods, SVM has the lowest classification accuracy and the worst classification effect. The classification result of KNN is the most unstable: the standard deviation of its classification accuracy is the largest, and its classification time is the longest. In terms of the four accuracy indicators, the classification effect of SOF is better than that of SVM, DT, KNN, and ELM. Although the average classification time of HMDSOF is 0.717304 s longer than that of SOF, its maximum classification accuracy is 1.25% higher, its minimum classification accuracy is 1.5% higher, its average classification accuracy is 1.375% higher, and the standard deviation of its classification accuracy is 0.108864 lower; such results are satisfactory. The standard deviation of the classification accuracy of LSSVM is very close to that of HMDSOF, but its average classification accuracy is 4.1% lower than that of HMDSOF. KELM has the fastest classification speed and the shortest classification time; however, its maximum classification accuracy is 0.5% lower than that of HMDSOF, and its average classification accuracy is 1.625% lower.
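The HMDSOF testing stage compared above can be sketched as follows. This is only one reading of the paper's description (sort each class's prototypes by similarity to the testing sample, build local mean vectors from the nearest prototypes, and assign the class with the smallest harmonic mean difference); the exact similarity measure and local-mean construction may differ from the authors' implementation.

```python
import numpy as np

def hmd_predict(x, prototypes, labels, g=3):
    # Schematic HMDSOF decision rule (an assumption, not the paper's
    # code): for each class, sort prototypes by distance to x
    # (similarity descending), form up to g local mean vectors from
    # the k nearest prototypes (k = 1..g), and score the class by the
    # harmonic mean of the distances from x to those local means.
    best_score, best_label = np.inf, None
    for c in np.unique(labels):
        P = prototypes[labels == c]
        order = np.argsort(np.linalg.norm(P - x, axis=1))
        P = P[order]
        k = min(g, len(P))
        local_means = [P[:j + 1].mean(axis=0) for j in range(k)]
        d = np.array([np.linalg.norm(x - m) for m in local_means])
        hmd = k / np.sum(1.0 / np.maximum(d, 1e-12))  # harmonic mean
        if hmd < best_score:
            best_score, best_label = hmd, c
    return best_label
```

Because the harmonic mean weights small distances heavily, a class whose several nearest local means are all close to the testing sample is favored over one with a single close prototype, which is the intuition behind using more than the single most similar prototype.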

Conclusions
In this paper, a new SOF classifier (HMDSOF) based on the harmonic mean difference is proposed. Based on this, a new bearing fault diagnosis method is proposed. The validity and generalization of the proposed fault diagnosis method are verified by the bearing experimental data of Case Western Reserve University and the bearing experimental data of coal washer. The following conclusions can be drawn in this paper.
(1) As the parameter g increases, the classification time of HMDSOF increases. When g = 3, the classification effect of HMDSOF is optimal.
(2) Under the premise of the same input, the classification accuracy of the proposed HMDSOF is always higher than that of SOF, and its classification effect is better. Comparisons with SVM, DT, KNN, ELM, LSSVM, and KELM show that the proposed HMDSOF has higher classification accuracy and can be better applied to bearing fault diagnosis.
(3) By changing the input of the classifiers, it is shown that the proposed bearing fault diagnosis method, MPE-LDA-HMDSOF, has better classification performance, with a classification accuracy of 100%.
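The dimensionality-reduction step of the MPE-LDA-HMDSOF pipeline can be sketched with a plain Fisher LDA (a generic implementation for illustration, not the paper's code; the number of retained components is an assumption):

```python
import numpy as np

def lda_transform(X, y, n_components):
    # Fisher LDA: project features onto the directions that maximize
    # between-class scatter relative to within-class scatter.
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Solve the generalized eigenproblem Sb w = lambda * Sw w via pinv.
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-vals.real)
    W = vecs[:, order[:n_components]].real
    return X @ W
```

The MPE feature vectors of the training samples would be stacked into X, with y holding the fault-category labels; the projected features then serve as the classifier input.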