Bearing Fault Diagnosis Based on Multiscale Permutation Entropy and Support Vector Machine

Bearing fault diagnosis has attracted significant attention over the past few decades. It consists of two major parts: vibration signal feature extraction and condition classification for the extracted features. In this paper, multiscale permutation entropy (MPE) was introduced for feature extraction from faulty bearing vibration signals. After extracting feature vectors by MPE, the support vector machine (SVM) was applied to automate the fault diagnosis procedure. Simulation results demonstrated that the proposed method is a very powerful algorithm for bearing fault diagnosis and has much better performance than the methods based on single scale permutation entropy (PE) and multiscale entropy (MSE).


Introduction
Bearings are among the most frequently used components in rotary machines. Bearing failures can lead to unpredictable productivity losses for production facilities. Therefore, bearing fault diagnosis has attracted significant attention from the research and engineering communities over the past decades. Generally, a bearing fault diagnosis process can be decomposed into three steps: data acquisition, feature extraction, and fault condition classification.
Vibration-based signal analysis in the time-frequency domain has been a major technique for bearing fault diagnosis. Several statistical parameters in the time domain and the frequency domain, such as the root mean square, kurtosis, and skewness, have been shown to be capable of fault detection [1,2]. In [1], nine features in the time domain and seven features in the frequency domain were used for bearing fault detection. We call this method the time domain and frequency domain statistical formulas (TDFDSFs) method throughout this paper.
Time-frequency analysis methods, such as the short-time Fourier transform [3], the Wigner-Ville distribution [4], and the wavelet transform [5], have been widely used to detect bearing faults since they can provide abundant information about machine faults. However, these time-frequency based methods often require considerable computation time, as they involve many Fourier transforms or convolution operations. Moreover, due to the clearance and nonlinear stiffness of bearings, the vibration signals are often characterized by nonlinearity. Therefore, these commonly used time-frequency analysis techniques may exhibit limitations because of their linearity assumption.
In order to overcome this problem, several nonlinear parameter estimation techniques were applied to extract defect-related features hidden in the measured signals. In [6], Hong and Liang combined the Lempel-Ziv complexity with the continuous wavelet transform and found that the new method was more effective in bearing fault diagnosis. Then, the methods based on approximate entropy (ApEn) [7] and multiscale entropy (MSE) [8] were used for bearing fault diagnosis. Both ApEn and MSE can be used for measuring the regularity of a time series. Although these entropy-based methods are simple and require much less computation time, they have very good performance in bearing fault diagnosis.
In [9], a new entropy-based method named permutation entropy (PE) was exploited to assess the status of a rotary machine. The PE was introduced by Bandt [10]. It estimates the complexity of a time series through the comparison of neighboring values. The PE has been widely used in a number of applications, such as electroencephalography (EEG) signal analysis [11,12], stock market analysis [13], tool breakage detection in end milling [14], and chatter detection in turning processes [15]. Time series derived from physiological and mechanical systems are usually complicated and consist of multiple temporal scale structures. Because it is a single scale algorithm, the PE-based method has limited performance in analyzing these complicated data. To overcome this shortcoming, based on the concept of multiscale analysis [16], Aziz proposed a new method termed multiscale permutation entropy (MPE) to calculate entropy over multiple scales [17]. In addition, Li employed the MPE method to track the effect of the anesthetic drug sevoflurane on the brain and showed that the MPE index outperforms the single scale PE index [18].
In this paper, we introduce MPE as the feature extractor of a bearing fault diagnosis system. After extracting feature vectors by MPE, the multi-class support vector machine (SVM) [19] is used as a classifier. The SVM is probably the most popular and powerful machine learning algorithm because of its well-established theoretical background and intuitive geometrical interpretation. Nowadays, the SVM is widely applied and has even served as the baseline in computer vision, pattern recognition, information retrieval, and data mining. In our simulations, the bearing vibration signal datasets from Case Western Reserve University (CWRU) [20] are utilized. Experimental results demonstrate that the proposed MPE-based algorithm provides a significantly higher accuracy of prediction than the traditional feature extraction methods.
The remainder of this paper is organized as follows: Section 2 provides a review of permutation entropy. In Section 3, the proposed algorithm based on multiscale permutation entropy is introduced. In Section 4, several examples are presented to demonstrate the effectiveness of the proposed MPE algorithm. A conclusion is given in Section 5.

Permutation Entropy
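Given a time series {x(k), k = 1, 2, …, N}, the m-dimensional delay embedding vector at time i is defined as:

$$X_i^m = \{x(i),\, x(i+\tau),\, \ldots,\, x(i+(m-1)\tau)\}, \quad i = 1, 2, \ldots, N-(m-1)\tau \qquad (1)$$

where m is the embedded dimension and τ is the time delay. We say that $X_i^m$ has the permutation $\pi_{r_0 r_1 \cdots r_{m-1}}$ if it satisfies:

$$x(i+r_0\tau) \le x(i+r_1\tau) \le \cdots \le x(i+r_{m-1}\tau) \qquad (2)$$

where $0 \le r_i \le m-1$ and $r_i \ne r_j$ for $i \ne j$.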
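There are m! possible permutations for an m-tuple vector. For each permutation π, we determine the relative frequency by:

$$p(\pi) = \frac{\#\{\, i \mid 1 \le i \le N-(m-1)\tau,\ X_i^m \text{ has type } \pi \,\}}{N-(m-1)\tau} \qquad (3)$$

The PE of dimension m is then defined as:

$$H_{PE}(m) = -\sum_{\pi} p(\pi) \log p(\pi) \qquad (4)$$

where the sum runs over all m! permutations and terms with $p(\pi) = 0$ contribute nothing. The maximum value of $H_{PE}(m)$ is $\log(m!)$, attained when all possible permutations appear with the same probability. Therefore, the normalized permutation entropy (NPE) can be obtained as:

$$H_{NPE}(m) = \frac{H_{PE}(m)}{\log(m!)} \qquad (5)$$

For any time series, $0 \le H_{NPE}(m) \le 1$ is satisfied.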
In the remainder of this section, we explain the PE algorithm by using the example time series of Equation (6). We set the time delay τ to 1. When the embedded dimension m is 3, five embedding vectors $X_1^3, \ldots, X_5^3$ are obtained. There are six (3!) possible permutations of dimension 3, which are denoted by $\pi_{012}$, $\pi_{021}$, $\pi_{102}$, $\pi_{120}$, $\pi_{201}$, and $\pi_{210}$, respectively. The embedding vectors $X_1^3$ and $X_2^3$ have the permutation type $\pi_{012}$, the vector $X_4^3$ has the permutation type $\pi_{102}$, while both $X_3^3$ and $X_5^3$ correspond to $\pi_{201}$. Therefore, the probability of each permutation is given by:

$$p(\pi_{012}) = \tfrac{2}{5},\quad p(\pi_{021}) = 0,\quad p(\pi_{102}) = \tfrac{1}{5},\quad p(\pi_{120}) = 0,\quad p(\pi_{201}) = \tfrac{2}{5},\quad p(\pi_{210}) = 0$$

The PE and the NPE of dimension 3 are then calculated by:

$$H_{PE}(3) = -\left(\tfrac{2}{5}\log\tfrac{2}{5} + \tfrac{1}{5}\log\tfrac{1}{5} + \tfrac{2}{5}\log\tfrac{2}{5}\right), \qquad H_{NPE}(3) = \frac{H_{PE}(3)}{\log(3!)} \approx 0.589$$

The value of PE depends on the selection of the embedded dimension m and the delay τ. If m is too small, the scheme will not work since there are too few distinct states. However, it is often inappropriate to choose a large value of m for detecting the dynamic change of a time series [17]. Moreover, Cao [21] indicated that the delay τ is related to the signal under analysis and its sampling rate.
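The procedure of this section can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the natural logarithm is assumed (the normalized value does not depend on the base), and ties are broken by order of appearance. The seven-point series below is an assumed example, chosen because it reproduces the permutation types listed above; it is not confirmed to be the series of Equation (6).

```python
import math
from collections import Counter


def ordinal_patterns(x, m=3, tau=1):
    """Count the permutation type of every m-dimensional embedding vector."""
    n_vectors = len(x) - (m - 1) * tau
    if m < 2 or n_vectors <= 0:
        raise ValueError("need m >= 2 and a series longer than (m - 1) * tau")
    counts = Counter()
    for i in range(n_vectors):
        vector = [x[i + k * tau] for k in range(m)]
        # (r_0, ..., r_{m-1}): the offsets that sort the vector in ascending
        # order. sorted() is stable, so equal values keep their original order
        # (a tie-breaking assumption, not stated in the text).
        pattern = tuple(sorted(range(m), key=vector.__getitem__))
        counts[pattern] += 1
    return counts


def permutation_entropy(x, m=3, tau=1, normalize=False):
    """PE as in Equation (4); NPE as in Equation (5) when normalize=True."""
    counts = ordinal_patterns(x, m, tau)
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(math.factorial(m)) if normalize else h


# Assumed example series (hypothetical stand-in for Equation (6)).
x = [4, 7, 9, 10, 6, 11, 3]
counts = ordinal_patterns(x, m=3, tau=1)
print(dict(counts))  # pattern (0, 1, 2) twice, (2, 0, 1) twice, (1, 0, 2) once
print(permutation_entropy(x, 3, 1, normalize=True))  # approximately 0.589
```

Only the permutations that actually occur are stored, so the zero-probability terms of Equation (4) never have to be evaluated.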

Proposed Bearing Fault Diagnosis System Based on Multiscale Permutation Entropy and Support Vector Machine
The concept of multiscale analysis was originally proposed by Costa [16], who indicated that the single scale entropy algorithm yielded contradictory results when applied to real-world datasets obtained in health and disease states. To address this, Costa proposed a coarse-grained procedure to obtain multiple scale time series from the original time series. Then, the entropy at each scale is calculated to analyze the physiological signal. Given a time series x = {x_1, x_2, …, x_N}, one can construct a consecutive coarse-grained time series $y^{(s)}$ corresponding to the time scale s. First, the original time series is divided into non-overlapping windows of length s. Second, the data points inside each window are averaged by Equation (11):

$$y_j^{(s)} = \frac{1}{s} \sum_{i=(j-1)s+1}^{js} x_i, \quad 1 \le j \le \lfloor N/s \rfloor \qquad (11)$$

The schematic illustration of the coarse-grained procedure is shown in Figure 1.

Figure 1. Schematic illustration of the coarse-grained procedure, modified from [16].
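The two coarse-graining steps described above can be sketched as follows. The function name is hypothetical, and discarding trailing samples that do not fill a complete window is an assumption, since the text does not say how an incomplete last window is treated.

```python
def coarse_grain(x, s):
    """Average non-overlapping windows of length s (scale s = 1 keeps x as is)."""
    if s < 1:
        raise ValueError("the scale s must be a positive integer")
    n_windows = len(x) // s  # assumption: an incomplete last window is dropped
    return [sum(x[j * s:(j + 1) * s]) / s for j in range(n_windows)]


series = [1, 2, 3, 4, 5, 6, 7]
print(coarse_grain(series, 2))  # [1.5, 3.5, 5.5]
print(coarse_grain(series, 3))  # [2.0, 5.0]
```

Note that the coarse-grained series at scale s has only about N/s points, so the number of embedding vectors available for the entropy estimate shrinks as the scale grows.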
Based on the concepts of multiscale analysis and PE, Aziz proposed a new method termed multiscale permutation entropy (MPE). In MPE analysis, the entropy of the coarse-grained time series at each scale is calculated by the NPE algorithm defined in Equations (3)-(5). Li employed MPE analysis to track the effect of the anesthetic drug sevoflurane on the brain and showed that the MPE index outperforms the single scale PE index [18]. In this paper, motivated by these previous efforts, we investigate the utility of MPE for detecting a variety of bearing faults in rotary machines. The flowchart of the multiscale permutation entropy algorithm is shown in Figure 2.

Rather than a random separating hyperplane in a linearly separable dataset, the SVM generates a unique hyperplane for a given dataset. This hyperplane not only has an intuitive geometrical interpretation, but has also been proved theoretically to balance the in-sample error ($E_{in}$) and the generalization error. Given a binary labeled dataset as shown in Figure 3, there are many hyperplanes that can separate the red circles (positive: 1) from the blue circles (negative: −1), such as the three gray lines plotted in Figure 3a. These gray lines may come from the perceptron learning algorithm (PLA), the Adaline algorithm, the least squares regression algorithm, or the logistic regression algorithm, where the last three determine their separating hyperplanes from the corresponding objective functions but without a direct geometrical interpretation. By contrast, the SVM originated from a geometrical view, as shown in Figure 3b. It seeks a separating hyperplane that stays as far as possible from both the positive and the negative samples without training error. In other words, the SVM seeks the separating hyperplane that maximizes the margin σ between the positive and the negative samples. Such a hyperplane can effectively tolerate the error of unseen samples and is claimed to have good generalization ability. The objective function of the SVM is then modeled as a constrained optimization problem. In our algorithm, the SVM classifier is implemented by the LIBSVM software [20].
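The constrained optimization problem itself is not written out in the text. As a hedged illustration, the standard hard-margin formulation, which matches the description of a maximum-margin hyperplane without training error, reads as follows. The symbols $w$, $b$, $x_n$, $y_n$, $\xi_n$, and $\phi$ are notation assumed here, not taken from the paper.

$$\min_{w,\,b}\ \frac{1}{2} w^{T} w \quad \text{subject to} \quad y_n \left( w^{T} x_n + b \right) \ge 1, \quad n = 1, \ldots, N$$

Under this scaling, the distance between the hyperplane and the nearest sample is $1/\lVert w \rVert$, so minimizing $w^{T} w$ maximizes the margin. LIBSVM's C-SVC solves the soft-margin variant, in which slack variables $\xi_n$ allow some violations at a cost weighted by the parameter C:

$$\min_{w,\,b,\,\xi}\ \frac{1}{2} w^{T} w + C \sum_{n=1}^{N} \xi_n \quad \text{subject to} \quad y_n \left( w^{T} \phi(x_n) + b \right) \ge 1 - \xi_n, \quad \xi_n \ge 0$$

Here $\phi$ is the feature map induced by the chosen kernel. A large C penalizes violations heavily and brings the solution close to the hard-margin case.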

Experimental Data
In order to validate the capability of the MPE algorithm, experimental analyses on bearing faults were conducted. All the bearing data we used were obtained from the CWRU Bearing Data Center [4]. The time-domain vibration signals of the bearings were collected for the normal case, the ball fault case, the inner race fault case, and the outer race fault case at the 6 o'clock position. The shaft rotating speeds of the motor are 1730, 1750, 1772, and 1797 rpm, and the sampling frequency is 48 kHz. For all fault conditions, the defect size of the point fault is 14 mil in diameter.
In these experiments, the vibration signals collected under the different fault conditions are divided into non-overlapping windows of 2048 points each. The number of windows for each fault condition at a specific rotating speed is shown in Table 1. Then, in Tables 2-9, the time domain and frequency domain statistical formulas (TDFDSFs) method [1], the MSE method [8,16], the PE method [10,16], and the proposed MPE method were used to extract the features, and their performances were compared. To demonstrate the effect of the number of training samples, the experiments were run with different training set sizes (10%, 20%, 30%, 40%, and 50% of the total samples), and the remaining samples were used for prediction. The average accuracy of prediction for each experiment was computed over 200 tests.
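The data preparation described above can be sketched as follows. The helper names are hypothetical, and the source does not state whether the random split is drawn per fault class or over the pooled samples; the sketch splits one list of samples, which could be applied either way.

```python
import random


def split_windows(signal, width=2048):
    """Cut a signal into non-overlapping windows; a short tail is discarded."""
    n_windows = len(signal) // width
    return [signal[k * width:(k + 1) * width] for k in range(n_windows)]


def random_split(samples, train_fraction, rng):
    """Randomly assign a fraction of the samples to training, the rest to testing."""
    order = list(range(len(samples)))
    rng.shuffle(order)
    n_train = int(train_fraction * len(samples))
    train = [samples[i] for i in order[:n_train]]
    test = [samples[i] for i in order[n_train:]]
    return train, test


# Synthetic stand-in for one recorded vibration signal (not CWRU data).
signal = [float(k % 97) for k in range(10 * 2048 + 500)]
windows = split_windows(signal)
rng = random.Random(0)
for fraction in (0.1, 0.2, 0.3, 0.4, 0.5):
    train, test = random_split(windows, fraction, rng)
    print(fraction, len(train), len(test))
```

Averaging the test accuracy over repeated calls to `random_split` (200 in the paper) reduces the dependence of the reported figure on any single random partition.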

Results
The average accuracies of prediction for the different feature extraction methods are presented in Tables 2-5. In the experiments, 16 TDFDSF features, 16 MSE features, a single PE feature, and 16 MPE features were used to train the corresponding SVM models. The parameter r of MSE was set to 0.15σ, where σ represents the standard deviation of the original signals. The embedded dimension m and the time delay τ of MPE were set to 5 and 1, respectively. The parameters C and γ of the SVM were 100 and the reciprocal of the number of features, respectively.

As presented in Tables 2-5, for training set sizes of 10%, 20%, 30%, 40%, and 50%, the accuracies of the MPE-based fault diagnosis system are all superior to those of the TDFDSF, the MSE, and the single scale PE based systems. As shown by the experimental results, the single scale PE (i.e., MPE at scale 1) is not good enough to classify different bearing faults. However, when MPE features are used, the prediction accuracy of the fault diagnosis system reaches up to 99%. Another advantage of the MPE is that it is more robust to variation of the training set size. The computational cost of the SVM training procedure can therefore be greatly reduced, since a large number of training samples is unnecessary.

In the following, we only present the confusion matrices of MPE with 16 scales, in Tables 6-9. At each shaft rotating speed, 50% of the total samples from the four fault conditions are used for training, and the remainder are used for testing. All the parameters are the same as those used in the previous experiment. The experimental results show that the average accuracies are close to 99% when the MPE is utilized. Therefore, the proposed method provides a significant improvement in bearing fault diagnosis.

Furthermore, in Tables 10-12, we show the effect of varying the number of features for the proposed MPE algorithm. From these results, one can see that even if only 5 features are adopted, the recognition rate of the proposed MPE algorithm is more than 99%. Therefore, the proposed method is robust to the number of features: a very small number of features suffices to achieve very high accuracies.
Moreover, our simulations with the proposed MPE algorithm show that if the data used for training the SVM are collected at 1730 rpm, the recognition rate for the data at 1750 rpm is 95.36%. If the data used for training the SVM are collected at 1750 rpm, the recognition rate for the data at 1730 rpm is 99.26%. When the difference between the rotating speed of the training data and that of the testing data is small, the recognition rate remains high.
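To make the feature extraction settings of this section concrete (16 scales, m = 5, τ = 1, one 2048-point window per sample), the following sketch computes a 16-dimensional MPE feature vector. It is an illustration under assumptions rather than the authors' code: the function names are hypothetical, the input is synthetic noise instead of CWRU data, the natural logarithm is used, and ties are broken by order of appearance.

```python
import math
import random
from collections import Counter


def coarse_grain(x, s):
    """Average non-overlapping windows of length s; a short tail is dropped."""
    return [sum(x[j * s:(j + 1) * s]) / s for j in range(len(x) // s)]


def normalized_pe(x, m, tau):
    """Normalized permutation entropy of x, between 0 and 1."""
    n_vectors = len(x) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the chosen m and tau")
    # Permutation type of each embedding vector: offsets in ascending value order.
    counts = Counter(
        tuple(sorted(range(m), key=lambda k: x[i + k * tau]))
        for i in range(n_vectors)
    )
    h = -sum((c / n_vectors) * math.log(c / n_vectors) for c in counts.values())
    return h / math.log(math.factorial(m))


def mpe_features(x, n_scales=16, m=5, tau=1):
    """One NPE value per scale 1..n_scales, used as the feature vector."""
    return [normalized_pe(coarse_grain(x, s), m, tau)
            for s in range(1, n_scales + 1)]


# Synthetic 2048-point window: a sine wave plus Gaussian noise (not real data).
rng = random.Random(0)
window = [math.sin(0.05 * k) + 0.5 * rng.gauss(0.0, 1.0) for k in range(2048)]
features = mpe_features(window)
print(len(features))
print([round(f, 3) for f in features])
```

At scale 16 a 2048-point window shrinks to 128 points, which yields 124 embedding vectors against 5! = 120 possible permutations, so the estimates at the largest scales rest on few samples per pattern. The feature vector would then be passed to the SVM with C = 100 and γ = 1/16, as stated above.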

Conclusions
Multiscale permutation entropy (MPE) is an effective way to measure the complexity of chaotic time series, such as the bearing vibration signals in our experiments. Compared with PE and other well-known complexity measures, MPE can extract features with high distinguishability. Combined with the SVM, the simulation results of bearing fault diagnosis show that the proposed framework achieves much higher accuracies than the other methods. Because MPE is robust to the training set size, a large amount of computational cost can be saved in the training process.

Figure 3. Different separating hyperplanes resulting from different algorithms: (a) the hyperplanes (three gray lines) resulting from general linear classification algorithms; (b) the hyperplane (gray line) resulting from the linear SVM algorithm, where the margin σ is the distance between the hyperplane and the nearest sample.

Figure 4. The flowchart of our framework with the one-versus-one SVM of c classes.

Table 1. The window number of each fault condition.

Table 2. The average accuracies at 1730 rpm.

Table 3. The average accuracies at 1750 rpm.

Table 4. The average accuracies at 1772 rpm.

Table 5. The average accuracies at 1797 rpm.

Table 10. The average accuracies of the proposed MPE algorithm at 1730 rpm when the number of features varies from 1 to 20.

Table 11. The average accuracies of the proposed MPE algorithm at 1750 rpm when the number of features varies from 1 to 20.

Table 12. The average accuracies of the proposed MPE algorithm at 1772 rpm when the number of features varies from 1 to 20.