Bearing Fault Diagnosis Using a Particle Swarm Optimization-Least Squares Wavelet Support Vector Machine Classifier

Bearings are key components of rotating machines. Hence, monitoring the health condition of a bearing is of paramount importance. This paper develops a novel particle swarm optimization (PSO)-least squares wavelet support vector machine (PSO-LSWSVM) classifier, designed by combining a PSO, a least squares procedure, and a new wavelet kernel function-based support vector machine (SVM), for bearing fault diagnosis. In this work, bearing fault classification is transformed into a pattern recognition problem consisting of three stages of data processing. Firstly, a rich information dataset is built by extracting features from signals decomposed by nonlocal means (NLM) de-noising and empirical mode decomposition (EMD). Secondly, a minimum-redundancy maximum-relevance (mRMR) method is employed to determine a feature subset that provides optimal performance. Thirdly, a novel classifier, namely the LSWSVM, is proposed with the aid of a PSO to provide higher classification accuracy. The key innovation of this work is to propose a new classifier with the aid of a new wavelet kernel type to increase the classification precision of bearing fault diagnosis. The merits of the proposed approach are demonstrated on a benchmark bearing dataset with a comprehensive comparison procedure.


Introduction
Since the bearing is a crucial component of a machine, its failure can cause major disruption of the machine's operation. Therefore, condition monitoring of rolling bearings has become more and more important to detect early damage and increase the safety of operating systems. In the literature, two approaches can be applied to detect bearing defects: (1) acoustic signal analysis, where the acoustic signal is acquired to obtain bearing characteristic information, and (2) vibration signal analysis, where the vibration signal is acquired. Among them, the vibration signal usually provides better defect detection accuracy because it contains rich information about the bearing characteristics and less measurement noise [1].
Bearing defects can be detected by either analyzing the fault frequency spectrum [2] or pattern recognition [3]. However, the analysis in [4] showed that pattern recognition can give higher accuracy than the spectrum approach. In traditional pattern recognition, the system includes three major components: feature extraction, feature selection, and feature classification. The goal of feature extraction is to obtain as much information about the condition of the system as possible. For this purpose, we employ the NLM-EMD method, which was developed in our previous work [5] and has proved its effectiveness, to extract a rich bearing feature set.
Feature extraction usually results in a large feature set. Unfortunately, a large feature set does not necessarily provide higher classification accuracy, as it possibly contains irrelevant and redundant features. Thus, it is important to eliminate the irrelevant and redundant features before the feature set is fed to a classifier. To obtain an optimal feature subset, the minimum-redundancy maximum-relevance (mRMR) feature selection method was developed [6]. The mRMR searches for a combination of candidate features with minimum redundancy and maximum relevance. Due to these merits, the mRMR is employed in this paper to select effective features.
Once the salient features are selected, they are fed into a classifier to identify the system condition. Due to its high classification performance and low requirement on sample data, the support vector machine (SVM) proposed by Cortes and Vapnik [7] has been successfully applied to signal processing [8], regression analysis [9], pattern recognition [10], and bearing fault diagnosis [11]. However, the original SVM classifier imposes a high computational burden due to the method used to solve the quadratic programming problem in the SVM [12]. To reduce this burden, many methods have been developed, for example, the SVM-light decomposition algorithm [13], the sequential minimal optimization (SMO) algorithm [14], the neighbor algorithm [15], and the least squares SVM (LSSVM) [16]. Among them, the LSSVM is commonly applied in real applications due to its simplicity in implementation and efficiency in classification and computation [17].
In the SVM classifier, a kernel function is used to transform data from a lower-dimensional space to a higher-dimensional space. Hence, the prior selection of the kernel decides the way the SVM classifies [18]. Several kinds of kernels have been developed for the SVM, for example, polynomial, dot-product, and radial basis function (RBF) kernels. Among them, the RBF kernel has been shown to be more effective because it has a good capacity to approximate nonlinear functions. Recently, the wavelet kernel has been developed as an effective method for nonlinear approximation and mapping [4,19]. In [20], Zhang et al. employed the wavelet kernel for the SVM classifier, and a wavelet SVM (WSVM) classifier was proposed as a result. Since the wavelet transform provides better approximation capacity than the RBF, the WSVM classifier provides higher accuracy than the SVM with the RBF kernel. Since then, the WSVM has been employed in many real applications, such as the medical field [21] and machine fault diagnosis [22]. Due to the merits of the LSSVM classifier and the approximation capability of the wavelet kernel, a new least squares wavelet support vector machine (LSWSVM) is proposed for the first time in this paper to improve both computational efficiency and classification accuracy. However, the generalization performance of the LSWSVM is affected by its parameters. Thus, it is necessary to optimize the parameters to obtain better performance. In the literature, particle swarm optimization (PSO) [23] has been developed as an effective technique to optimize the parameters of a process. Compared with other optimization methods, PSO has many advantages, such as simple implementation, few parameters, parallel computation ability, and fast convergence [24]. PSO has proved its optimization capacity in many practical applications, such as optimizing the parameters of SVMs [25] and other optimization problems [26,27].
Therefore, the PSO is used in this paper to effectively select the parameters of the LSWSVM, leading to a new PSO-LSWSVM classifier, which addresses the aforementioned difficulties in the use of the SVM classifier.
In summary, the novelties and main contributions of this paper can be listed as follows:

• A new methodology for bearing fault diagnosis is developed by combining feature extraction based on the NLM-EMD method, feature selection based on the mRMR, and a new PSO-LSWSVM classifier.

• To improve the generalization performance of the SVM, a novel PSO-LSWSVM classifier, which combines a least squares procedure, a new wavelet kernel function, and the PSO, is proposed.

Feature Extraction
In this paper, we employ the NLM-EMD method, which was developed in our previous work [5] and has proved its effectiveness, to extract a rich bearing feature set. For the merits of the NLM-EMD and its detailed description, interested readers can refer to the previous work [5].

Nonlocal Mean (NLM) De-Noising
Consider a noisy signal of the form y = u + n, where u is the true signal and n is an additive noise. The noise component can be eliminated using the NLM estimate:

û(s) = (1/Z(s)) ∑_{t∈Ω} w(s, t) y(t), Z(s) = ∑_{t∈Ω} w(s, t) (1)

where the weight w(s, t) measures the similarity between the patches centered at samples s and t, and Ω is the search window. The parameters used in (1) can be designed as in [5]. For a more detailed description of NLM de-noising, interested readers can refer to our previous paper [5].
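As an illustration only, the weighted patch average of (1) can be sketched for a 1-D signal as follows. The patch size, search window, and bandwidth h below are illustrative choices, not the tuned values of [5]:

```python
import numpy as np

def nlm_denoise_1d(y, patch_half=3, search_half=20, h=0.5):
    """1-D nonlocal means: each sample becomes a weighted average of
    samples whose surrounding patches look similar to its own patch."""
    n = len(y)
    pad = np.pad(y, patch_half, mode="reflect")
    out = np.empty(n)
    for i in range(n):
        p_i = pad[i:i + 2 * patch_half + 1]          # patch around sample i
        lo, hi = max(0, i - search_half), min(n, i + search_half + 1)
        w = np.empty(hi - lo)
        for idx, j in enumerate(range(lo, hi)):
            p_j = pad[j:j + 2 * patch_half + 1]
            w[idx] = np.exp(-np.mean((p_i - p_j) ** 2) / h ** 2)
        out[i] = np.dot(w, y[lo:hi]) / w.sum()       # Z(s) normalization
    return out
```

The bandwidth h trades smoothing against detail preservation: larger h averages more aggressively over less similar patches.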

Empirical Mode Decomposition
Consider an original signal x(t); a number of IMFs C_j(t) can be obtained from the original signal using the EMD method as [28]:

x(t) = ∑_{j=1}^{n} C_j(t) + r_n(t) (2)

where the dominant frequency decreases from C_1(t) through C_n(t), and the residue r_n(t) contains no meaningful information. Generally, fault information is distributed significantly over the high- and mid-frequency components [4,19]. Thus, the first five IMFs are used in this work for bearing fault analysis, since they represent the mid- and high-frequency components of the original signal.
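A compact sifting sketch illustrates how (2) arises. This simplified version uses linear envelopes through the local extrema instead of the cubic splines of standard EMD, and a fixed number of sifting iterations; by construction, the extracted IMFs plus the residue reconstruct the signal exactly:

```python
import numpy as np

def emd(x, max_imfs=5, sift_iters=8):
    """Sift out IMFs by repeatedly subtracting the mean of the upper and
    lower envelopes through local extrema (linear envelopes here)."""
    def envelope_mean(s):
        idx = np.arange(len(s))
        mx = np.where((s[1:-1] > s[:-2]) & (s[1:-1] > s[2:]))[0] + 1
        mn = np.where((s[1:-1] < s[:-2]) & (s[1:-1] < s[2:]))[0] + 1
        if len(mx) < 2 or len(mn) < 2:
            return None                       # too few extrema: stop sifting
        up = np.interp(idx, np.r_[0, mx, len(s) - 1], np.r_[s[0], s[mx], s[-1]])
        lo = np.interp(idx, np.r_[0, mn, len(s) - 1], np.r_[s[0], s[mn], s[-1]])
        return (up + lo) / 2

    imfs, residue = [], np.asarray(x, dtype=float).copy()
    for _ in range(max_imfs):
        if envelope_mean(residue) is None:    # residue has too few extrema
            break
        h = residue.copy()
        for _ in range(sift_iters):
            m = envelope_mean(h)
            if m is None:
                break
            h = h - m
        imfs.append(h)
        residue = residue - h
    return imfs, residue
```

On a two-tone signal, the first IMF captures the higher-frequency oscillation and the residue carries the slow trend, mirroring the frequency ordering described above.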

Energy Feature Extraction
In the previous section, the EMD was employed to decompose the original signal into a number of IMF components with different frequency bands. Since the energy of the fault vibration signal is distributed across these frequency bands, the IMF components provide a natural reference for the signal energy. Hence, in order to capture the effects of faults on the change of the energy of the vibration signal, IMF energy features are employed.
Each IMF component C_j(t) possesses an energy E_j, which can be calculated as:

E_j = ∑_t |C_j(t)|² (3)

Then, a normalization procedure can be applied to each E_j:

p_j = E_j / T (4)

where T is the total energy of the first five IMF components:

T = ∑_{j=1}^{5} E_j (5)
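The energy-feature computation of (3)-(5) reduces to a few lines; this sketch returns the normalized energies p_j for whatever list of IMFs is passed in:

```python
import numpy as np

def energy_features(imfs):
    """Energy of each IMF, E_j = sum_t C_j(t)^2, normalized by the
    total energy T of the retained IMFs (Equations (3)-(5))."""
    E = np.array([np.sum(np.square(c)) for c in imfs], dtype=float)
    return E / E.sum()
```

By construction the normalized energies sum to one, so they describe how the signal energy is distributed over the frequency bands rather than its absolute level.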

Time-Domain Feature Extraction
Time-domain features usually provide rich information to distinguish the normal condition from fault conditions. In this paper, the nine time-domain dimensionless parameters defined in Table 1 are used to extract fault information from the de-noised signal and the first five IMFs to obtain rich information about bearing faults.
Finally, a set of features, which includes 5 + 9 × 6 = 59 features (five energy features, plus nine time-domain features for each of the de-noised signal and the five IMFs), is obtained to represent a bearing condition.
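Since Table 1 is not reproduced here, the following sketch computes a typical set of dimensionless time-domain parameters (skewness, kurtosis, crest, shape, impulse, and clearance factors); this selection is illustrative and not necessarily the paper's exact nine:

```python
import numpy as np

def time_domain_features(x):
    """A typical set of dimensionless time-domain features; the paper's
    Table 1 is not reproduced here, so this selection is illustrative."""
    x = np.asarray(x, dtype=float)
    mean, std = x.mean(), x.std()
    rms = np.sqrt(np.mean(x ** 2))
    absmean = np.mean(np.abs(x))
    peak = np.max(np.abs(x))
    return {
        "skewness": np.mean((x - mean) ** 3) / std ** 3,
        "kurtosis": np.mean((x - mean) ** 4) / std ** 4,
        "crest_factor": peak / rms,
        "shape_factor": rms / absmean,
        "impulse_factor": peak / absmean,
        "clearance_factor": peak / np.mean(np.sqrt(np.abs(x))) ** 2,
    }
```

Because each parameter is a ratio of signal statistics, the features are insensitive to the absolute signal amplitude, which helps them generalize across load conditions.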

Minimum Redundancy Maximum Relevance (mRMR) Feature Selection
Let F be the initial feature set and |S| be the cardinality of the sought feature subset S. The following criterion is developed for minimal redundancy:

min W, W = (1/|S|²) ∑_{f_i, f_j ∈ S} I(f_i, f_j) (6)

and the maximum relevance criterion is defined as:

max V, V = (1/|S|) ∑_{f_i ∈ S} I(C, f_i) (7)

where I(f_i, f_j) is the mutual information of two features, f_i and f_j, and I(C, f_i) quantifies the relevance between the feature f_i in S and the target class C.
To obtain a feature subset with minimum redundancy and maximum relevance, the mRMR criterion is obtained by combining (6) and (7):

max Φ, Φ = V − W (8)

The complete procedure of the mRMR can be referred to [4]. To obtain the desired feature subset, the forward selection search [29] is employed.
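A minimal greedy version of this procedure can be sketched as follows. It uses a histogram estimate of mutual information and the difference form of (8); the bin count and the exact incremental criterion are illustrative choices:

```python
import numpy as np

def mutual_info(a, b, bins=8):
    """Histogram estimate of mutual information I(a; b) in nats."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz])))

def mrmr(X, y, k):
    """Greedy forward mRMR: start from the most relevant feature, then
    repeatedly add the feature maximizing relevance minus mean redundancy."""
    n_feat = X.shape[1]
    relevance = np.array([mutual_info(X[:, j], y) for j in range(n_feat)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = -1, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(X[:, j], X[:, s]) for s in selected])
            if relevance[j] - redundancy > best_score:
                best, best_score = j, relevance[j] - redundancy
        selected.append(best)
    return selected
```

Ranking features this way penalizes a candidate that merely duplicates an already-selected feature, even if its own relevance to the class label is high.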

Least Squares Support Vector Machine (LSSVM)
Given a training set of N data points, (x_1, y_1), (x_2, y_2), ..., (x_N, y_N), where x_i ∈ R^d is the i-th input vector and y_i ∈ {±1} is the corresponding target, we employ the idea of transforming an input pattern into a reproducing kernel Hilbert space using a set of mapping functions, φ(x). The reproducing kernel, K(x, x'), in this space is the dot product of the mapping functions at x and x', i.e., K(x, x') = ⟨φ(x), φ(x')⟩. In the newly defined kernel space, a linear classifier usually has the form:

y(x) = sign(ω^T φ(x) + b) (9)

To facilitate the selection of the parameters ω and b, the LSSVM formulates the optimization problem as:

min J(ω, e) = (1/2) ω^T ω + (C/2) ∑_{i=1}^{N} e_i², subject to y_i(ω^T φ(x_i) + b) = 1 − e_i, i = 1, ..., N (10)

The feature mapping φ(x) is usually unknown, and Mercer's condition [30] can be applied.
After eliminating ω and e_i via the Karush-Kuhn-Tucker conditions, the solution follows from a linear system in the dual variables α_i and the bias b. The decision function of the LSSVM classifier becomes:

y(x) = sign( ∑_{i=1}^{N} α_i y_i K(x, x_i) + b ) (12)

The RBF kernel can be chosen as:

K(x, x') = exp(−‖x − x'‖² / (2σ²)) (13)

where σ is a free parameter.
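The point that LSSVM training reduces to one linear solve (rather than a quadratic program) can be shown concretely. The sketch below follows Suykens' standard classification formulation with the RBF kernel of (13); hyperparameter values are illustrative:

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    """K(x, x') = exp(-||x - x'||^2 / (2 sigma^2)), as in Equation (13)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_train(X, y, C=10.0, sigma=1.0):
    """LSSVM training as a single linear system (Suykens' formulation):
    [[0, y^T], [y, Omega + I/C]] [b; alpha] = [0; 1],
    with Omega_ij = y_i y_j K(x_i, x_j); prediction via Equation (12)."""
    n = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / C
    sol = np.linalg.solve(A, np.r_[0.0, np.ones(n)])
    b, alpha = sol[0], sol[1:]
    return lambda Xt: np.sign(rbf_kernel(Xt, X, sigma) @ (alpha * y) + b)
```

Replacing the inequality constraints of the classical SVM with the equality constraints of (10) is exactly what turns the quadratic program into this (n + 1) × (n + 1) linear system.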

Least Squares Wavelet Support Vector Machine (LSWSVM)
Generally, the wavelet family has the form:

h_{a,c}(z) = |a|^{−1/2} h((z − c)/a) (14)

where z, a, c ∈ R, a is a dilation factor, c is a translation factor, and h(z) is the mother wavelet, which satisfies the following admissibility condition [31,32]:

C_h = ∫_0^{+∞} |F(ω)|² / |ω| dω < ∞ (15)

where F(ω) is the Fourier transform of h(z). Employing the wavelet transform for a function g(z), one obtains:

W_{a,c}(g) = ⟨g(z), h_{a,c}(z)⟩ (16)

where ⟨·, ·⟩ indicates the dot product. The function g(z) can be recovered from its wavelet transform as [31]:

g(z) = (1/C_h) ∫_0^{+∞} ∫_{−∞}^{+∞} W_{a,c}(g) h_{a,c}(z) (dc da)/a² (17)

Discretizing (17) gives:

ĝ(z) = ∑_{i=1}^{N} W_i h_{a_i, c_i}(z) (18)

where W_i is the reconstruction coefficient, and g(z) is approximated by ĝ(z). For a multidimensional input z = [z_1, z_2, ..., z_N]^T ∈ R^N, a wavelet function can be constructed as the tensor product [31]:

h(z) = ∏_{j=1}^{N} h(z_j) (19)

Then, for z, z' ∈ R^N, the dot-product wavelet kernels can be computed as:

K(z, z') = ∏_{j=1}^{N} h((z_j − c_j)/a) h((z'_j − c'_j)/a) (20)

and the following expression describes the translation-invariant wavelet kernels [31]:

K(z, z') = ∏_{j=1}^{N} h((z_j − z'_j)/a) (21)

Substituting (21) into (12), the decision function of the LSWSVM classifier has the form:

y(x_t) = sign( ∑_{i=1}^{N} α_i y_i ∏_{j=1}^{d} h((x_{t,j} − x_{i,j})/a) + b ) (22)

where x_{t,j} and x_{i,j} denote the j-th elements of the test sample x_t and the i-th training sample x_i, respectively. To approximate a general nonlinear model, we propose in this paper to use the new wavelet kernel given in (23), where a is a parameter of the RBF kernel, and k and λ are new parameters that control the kernel shape. It is obvious from (23) that the performance of the defined wavelet kernel depends significantly on the selection of the parameters a, k, and λ: when they are changed, the shape of the kernel changes. Therefore, these parameters need to be optimized to obtain good system performance.
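A translation-invariant wavelet kernel of the form (21) can be sketched as follows. Since the exact expression of the paper's Equation (23) is not reproduced in this text, the sketch uses a Morlet-type mother wavelet (the default k = 1.75 matches the setting reported later for Figure 13a), and the role assigned to λ below is an assumption for illustration:

```python
import numpy as np

def wavelet_kernel(X1, X2, a=1.0, k=1.75, lam=1.0):
    """Morlet-type translation-invariant wavelet kernel, a sketch of (21):
    K(x, x') = prod_j cos(k (x_j - x'_j) / a) * exp(-lam (x_j - x'_j)^2 / (2 a^2)).
    NOTE: the paper's exact Equation (23) is not reproduced here; the role
    of lam is an assumption for illustration."""
    diff = X1[:, None, :] - X2[None, :, :]
    return np.prod(np.cos(k * diff / a) * np.exp(-lam * diff ** 2 / (2 * a ** 2)),
                   axis=-1)
```

The oscillatory cosine factor is what distinguishes this kernel from a plain RBF: it lets the kernel change sign with the sample difference, which is the source of the better nonlinear approximation capacity discussed above.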

Particle Swarm Optimization (PSO) for Parameter Selection of LSWSVM-the PSO-LSWSVM Classifier
In order to obtain the optimal values of the parameters a, k, and λ, particle swarm optimization (PSO) [33] is employed in this paper. A detailed description of the PSO can be found in our previous work [4,19]; it is omitted here to reduce the length of the paper. The velocity, position, and inertia weight of the PSO are updated using the following three equations:

v_i^{t+1} = w v_i^t + c_1 r_1 (p_{best,i} − x_i^t) + c_2 r_2 (G_{best} − x_i^t) (24)

x_i^{t+1} = x_i^t + v_i^{t+1} (25)

w = w_max − (w_max − w_min) t / t_max (26)

The definitions of the parameters used in Equations (24)-(26) can be referred to [4,19]. The LSWSVM classification model constructed using the wavelet kernel function defined in (23) has four user-determined parameters: a regularization parameter C and three kernel parameters, λ, k, and a. In this paper, we use PSO to automatically select the parameters of the LSWSVM classifier; hence, a relatively new classifier, i.e., the PSO-LSWSVM, is proposed. The step-by-step implementation of the PSO-based parameter selection for the LSWSVM classifier is described below.
Step 1: Initializes the parameters of the PSO: the population size N, and the position and velocity of each particle (encoding C, a, k, and λ, the parameters of the LSWSVM).
Step 2: Uses the following fitness function, obtained from the output of the LSWSVM classifier, to evaluate the initialized particles:

fitness = N_t / (N_t + N_f) (27)

where N_t and N_f denote the numbers of true and false classifications, respectively.
Step 3: Creates a new swarm by updating the velocity and position of each particle using (24) and (25).
Step 4: For the new obtained swarm, the fitness values are computed and compared to update the p best,i and G best of the swarm.
Step 5: Checks the termination condition: if the maximum number of iterations is reached, goes to Step 6; otherwise, returns to Step 3 and continues the closed-loop process.
Step 6: Extracts the optimal parameters of the wavelet kernel of the LSWSVM classifier from the global best position, G_best.
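Steps 1-6 can be sketched as a generic PSO loop. For a self-contained example it minimizes a test function rather than the classification fitness of (27) (maximizing fitness is equivalent to minimizing its negative), and it uses a constant inertia weight instead of the schedule in (26); all hyperparameter values are illustrative:

```python
import numpy as np

def pso(fitness, dim, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5, lo=-5.0, hi=5.0, seed=0):
    """Generic PSO minimizer following Steps 1-6 of the text."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n_particles, dim))       # Step 1: initialize
    v = np.zeros_like(x)
    pbest = x.copy()                                  # Step 2: evaluate
    pbest_f = np.array([fitness(p) for p in x])
    gbest = pbest[np.argmin(pbest_f)].copy()
    for _ in range(iters):                            # Step 5: max-iteration loop
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # Step 3: (24)
        x = np.clip(x + v, lo, hi)                                  #         (25)
        f = np.array([fitness(p) for p in x])
        better = f < pbest_f                          # Step 4: update p_best, G_best
        pbest[better], pbest_f[better] = x[better], f[better]
        gbest = pbest[np.argmin(pbest_f)].copy()
    return gbest, float(pbest_f.min())                # Step 6: extract G_best
```

For the classifier, each particle's position would encode the tuple (C, a, k, λ), and the fitness evaluation would train and validate an LSWSVM with those values.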

Fault Diagnosis Methodology
The proposed fault diagnosis methodology is briefly described in Figure 1. The implementation is executed as follows: Step 1: A number of effective IMFs are obtained after filtering the vibration signals using the NLM and EMD.
Step 2: Extracts the energy and time domain features to obtain a combined feature set.
Step 3: Uses the mRMR feature selection technique to get an optimal feature subset.
Step 4: Uses the wavelet kernel function defined in (23) for LSSVM classifier and optimizes the parameters using the PSO technique.
Step 5: Classifies the bearing fault types using the PSO-LSWSVM classifier based on the 'one to others' multi-class classification strategy [34], which is illustrated in Figure 2, and the feature subset selected in Step 3.
Remark: Although the full fault diagnosis system, which includes feature extraction, feature selection, and feature classification, is presented in this paper, the major contribution of this paper is to introduce a novel PSO-LSWSVM classifier. The feature extraction tasks are mainly taken from the previous work [4], while the feature selection based on the mRMR is a standard and well-known technique in the literature.
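The 'one to others' strategy of Step 5 can be sketched generically: one binary scorer per class, with each test sample assigned to the class whose scorer responds most strongly. The nearest-mean scorer below is a simple stand-in for the PSO-LSWSVM binary classifier, used only to keep the example self-contained:

```python
import numpy as np

def one_vs_rest_predict(X_train, y_train, X_test, train_binary):
    """'One to others' strategy: train one binary scorer per class
    (that class = +1, all others = -1) and pick, for each test sample,
    the class whose scorer responds most strongly."""
    classes = np.unique(y_train)
    scores = []
    for c in classes:
        yb = np.where(y_train == c, 1.0, -1.0)
        scores.append(train_binary(X_train, yb)(X_test))
    return classes[np.argmax(np.vstack(scores), axis=0)]

def nearest_mean_scorer(X, yb):
    """Stand-in binary scorer (the paper uses the PSO-LSWSVM here):
    score = distance to the 'others' mean minus distance to the class mean."""
    mu_p, mu_n = X[yb == 1].mean(axis=0), X[yb == -1].mean(axis=0)
    return lambda Xt: (np.linalg.norm(Xt - mu_n, axis=1)
                       - np.linalg.norm(Xt - mu_p, axis=1))
```

For the 10-class bearing problem this means training 10 binary PSO-LSWSVM models, one per condition, and taking the argmax of their decision values.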

Training and Test Data Configuration
The data used in this experiment are taken from the Case Western Reserve University Bearing Data Center (2014) [35]. The bearing test bed is shown in Figure 3. In this paper, four types of bearing conditions are considered: one normal condition (no fault), labeled NM, and three fault conditions. The three fault conditions are a fault at the outer race, a fault at the inner race, and a fault at the ball, labeled ORF, IRF, and BF, respectively. For each fault type, the fault size can be 0.007, 0.014, or 0.021 inches. Therefore, in total, 10 bearing conditions (10 classes) are taken into account.

Parameter Selection
In the first simulation set, we illustrate the performance of the NLM and EMD. Figures 4-7 illustrate the de-noising results using the NLM. The de-noised signals are then passed through the EMD to obtain the effective IMF components. The 59 features are then extracted from the de-noised signal and the IMF components, as described in Section 2. In the second and third simulation sets, the computed feature set is fed into the mRMR feature selection to get an optimal feature subset. The selected feature subset is then used as input to a classifier to identify the bearing conditions. The LSWSVM classifier was implemented based on a modification of the LS-SVMlab toolbox [36]. In order to verify the effectiveness of the PSO and the proposed wavelet kernel function, we constructed four different classifiers: (1) an LSRBFSVM classifier using an RBF kernel for the LSSVM, with parameters selected by the user; (2) a PSO-LSRBFSVM classifier (an LSRBFSVM whose parameters are selected by the PSO); (3) an LSWSVM classifier using the proposed wavelet kernel in (23), with parameters selected by the user; and (4) a PSO-LSWSVM classifier (using the PSO to automatically select the parameters of the LSWSVM). In addition, to verify the effects of the parameters λ, k, and a, the PSO-LSWSVM classifier is used in three different circumstances: (a) λ and k are first selected by the user, and the PSO is used to tune the parameters a and C; (b) λ is first selected, and the PSO is used to tune the parameters k, a, and C simultaneously; and (c) the PSO is used to tune the parameters λ, k, a, and C simultaneously. These classifiers are also compared with the k-nearest neighbor (KNN) [37] and probabilistic neural network (PNN) [38] classifiers, which are widely applied to bearing fault diagnosis, to further verify the effectiveness of the proposed classifier.

Performance Evaluation
According to the forward selection search algorithm [29], 59 feature subsets are created based on the mRMR feature selection. To compare the generalization performance of the classifiers, we consider each feature subset as an independent dataset. Thus, we have 59 different datasets corresponding to 59 feature subsets. To evaluate the performance of the methods, the extracted feature vectors are used as inputs for the classifiers to obtain the classification accuracies. In this paper, to estimate the generalized classification accuracy, l-fold cross-validation (CV) [39], with l set to 3, is employed. To obtain a precise classification result, the l-fold CV is performed ten times in this study.
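The repeated l-fold protocol can be sketched as follows, with any classifier plugged in as a `fit_predict` callable; the fold sizes and shuffling scheme here are the standard choices, not necessarily the paper's exact ones:

```python
import numpy as np

def repeated_kfold_accuracy(X, y, fit_predict, l=3, repeats=10, seed=0):
    """Repeated l-fold CV: shuffle, split into l folds, train on l-1 folds,
    score on the held-out fold, and average accuracy over all runs."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(repeats):
        folds = np.array_split(rng.permutation(len(y)), l)
        for i in range(l):
            test = folds[i]
            train = np.hstack([folds[j] for j in range(l) if j != i])
            pred = fit_predict(X[train], y[train], X[test])
            accs.append(np.mean(pred == y[test]))
    return float(np.mean(accs))
```

With l = 3 and ten repetitions, each reported accuracy is thus an average over 30 train/test splits, which reduces the variance of the estimate.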

Training Process
First, the training process is performed to obtain an optimal feature subset for each classifier and the kernel parameters of the LSRBFSVM and LSWSVM classifiers. The PSO is performed at this training step. The validation accuracy in this study is computed as follows:

Accuracy = (1/K) ∑_{k=1}^{K} (N_TP^k / N_S^k) × 100% (28)

where K = 10 is the number of classes, N_TP^k is the number of true classifications for class k, and N_S^k is the number of samples of class k used in this experiment. The validation accuracies on the 59 feature datasets for the KNN, PNN, LSRBFSVM, PSO-LSRBFSVM, LSWSVM, and PSO-LSWSVM classifiers are shown in Figures 8-13, respectively. The mean and best results and the computational time (for one fold) of each method are also reported in Table 2 for comparison. The subspaces corresponding to the best records are assigned as the optimal feature subsets according to the forward selection search algorithm [29]. Observing these figures, we can see that the combined 59 features yield a low classification accuracy due to the presence of irrelevant and redundant features; for example, 43% for the KNN, 55.95% for the PNN, 45.71% for the LSRBFSVM, 68.57% for the PSO-LSRBFSVM, 62.86% for the LSWSVM, and around 90.95% for the PSO-LSWSVM. By using the mRMR criterion for feature selection, the classification accuracy is clearly increased. For example, for the KNN classifier, the peak value is obtained at 7 features, with the accuracy increased to 83.91%; for the PNN classifier, at 17 features, with the accuracy increased to 91.42%; for the LSRBFSVM, at 11 features, to 91.43%; for the PSO-LSRBFSVM, at 20 features, to 94.76%; for the LSWSVM, at 12 features, to 99.05%; and for the PSO-LSWSVM, the peak value is obtained at only 2 features, with the accuracy increased to 100%.
Figure 13 shows the PSO-LSWSVM in the three circumstances: (a) k = 1.75 and λ = 1, with the PSO automatically selecting C and a; and (c) the PSO automatically selecting C, a, k, and λ. From these results, four observations can be made: (1) the feature subsets selected by the mRMR commonly yield higher accuracy than the use of all 59 features; (2) although the computational time of the PSO-LSWSVM classifier (PSO: 30.52 s + LSWSVM: 0.422 s) is higher than that of the KNN (0.125 s), the PNN (0.109 s), and the PSO-LSRBFSVM classifier (PSO: 24.49 s + LSRBFSVM: 0.375 s), it gives much better performance. It should be noticed that although the PSO requires a higher computational time, PSO training is done offline and thus does not affect real-time fault diagnosis; (3) comparing Figure 12 with Figures 8-10 shows that the LSWSVM classifier provides better accuracy than the KNN, PNN, and LSRBFSVM classifiers; (4) comparing Figure 11 with Figure 10, and Figure 13 with Figure 12, it is clear that using the PSO for parameter selection always provides better performance than manual selection. In addition, comparisons between Figure 13a-c show that all parameters, λ, k, a, and C, have significant effects on the performance of the LSWSVM classifier, and that selecting the four parameters simultaneously produces better generalization performance. Based on Table 2 and the forward selection search algorithm [29], 8, 17, 20, and 2 features are selected as the optimal feature subsets for the KNN, PNN, PSO-LSRBFSVM, and PSO-LSWSVM classifiers, respectively.

Conclusions
Two major contributions have been presented in this paper:

• A new pattern recognition approach for bearing fault diagnosis is developed by combining feature extraction based on the NLM-EMD method, feature selection based on the mRMR, and a new PSO-LSWSVM classifier.

• A novel PSO-LSWSVM classifier, which combines a least squares procedure, a new wavelet kernel function, and the PSO, is proposed.
In the presented method, the combined NLM-EMD is first employed to acquire more effective IMF components of the vibration signals. Then, for the de-noised signal and each IMF component, the energy and time-domain feature parameters are extracted to obtain characteristic parameters. Next, the mRMR feature selection technique is adopted to eliminate the irrelevant and redundant features and select the best combined feature subset. Finally, the selected feature subset is fed into the proposed PSO-LSWSVM classifier to identify the bearing conditions, wherein a novel combination of a PSO, a least squares procedure, and a new wavelet kernel is proposed to address the difficulties in the use of the traditional SVM classifier. By experimenting with real bearing vibration signals, we verified that the proposed wavelet kernel function has better generalization performance than previous kernels, i.e., the RBF kernel, and that the proposed PSO-LSWSVM classifier can overcome the difficulties in the use of the traditional SVM classifier. In addition, the use of the NLM-EMD for feature extraction and the mRMR for feature selection is effective. Therefore, the proposed fault diagnosis methodology based on the NLM-EMD, the mRMR feature selection, and the PSO-LSWSVM classifier improves the bearing recognition accuracy significantly, up to 95.53%.