Weighted Kernel Entropy Component Analysis for Fault Diagnosis of Rolling Bearings

This paper presents a supervised feature extraction method called weighted kernel entropy component analysis (WKECA) for fault diagnosis of rolling bearings. The method builds on kernel entropy component analysis (KECA), which attempts to preserve the Renyi entropy of the data set after dimension reduction. WKECA makes full use of the label information and introduces a weighting strategy into feature extraction. Class-related weights are introduced to denote differences among samples from different patterns, and a genetic algorithm (GA) is used to search for weights that optimize the classification results. Features based on wavelet packet decomposition are first derived from the original signals. The intrinsic geometric features extracted by WKECA are then fed into a support vector machine (SVM) classifier to recognize different operating conditions of the bearings, yielding an overall accuracy of 97% on the experimental samples. The experimental results demonstrate the feasibility and effectiveness of the proposed method.


Introduction
Rolling element bearings are widely used in rotating machines in modern industry, and bearing failure is one of the most common causes of machine breakdown. Unexpected failures may cause huge economic losses and even lead to casualties [1][2][3]. It is therefore important to diagnose bearing faults accurately at an early stage [4,5]. Vibration-based fault diagnosis has been studied extensively, with the goal of dealing more accurately with problems such as varying load effects and noise contamination [3][4][5][6][7][8]. In particular, the sensitivity of diagnostic features derived from vibration signals may vary with load conditions because of nonlinear effects and non-stationary noise, so no single-domain processing method can comprehensively extract fault features that reflect the bearing condition [9]. High-dimensional feature sets constructed from mixed-domain features are therefore often used for diagnosis [10,11]. Although more features provide more information, they also contain considerable redundant and interfering information, which increases computation time and reduces recognition accuracy. More effective feature extraction and dimensionality reduction methods are needed to obtain higher diagnostic accuracy [12,13].

Brief Review of KECA
Let $p(x)$ be the probability density function of a given sample $X = \{x_1, \ldots, x_N\}$. Its Renyi entropy of order $\alpha$ is $H_\alpha(X) = \frac{1}{1-\alpha} \lg \int p^\alpha(x)\,dx$ [28], where $\alpha \geq 0$ and $\alpha \neq 1$. KECA employs the Renyi quadratic entropy ($\alpha = 2$) because its value can be estimated elegantly with a Parzen window density estimator [29]. The Renyi quadratic entropy is $H(X) = -\lg \int p^2(x)\,dx$. Because the logarithm is monotonic, only the integral $V(p) = \int p^2(x)\,dx = E\{p(x)\}$ needs to be considered [21,22,30]. To estimate $V(p)$, the Parzen window density estimator $\hat{p}(x) = \frac{1}{N} \sum_{i=1}^{N} K_\sigma(x, x_i)$ is used [21,29], where $K_\sigma(x, x_i)$ is the estimator or kernel function centered at $x_i$ and $\sigma$ is the smoothing width or kernel size. According to the convolution theorem, the convolution of two Gaussian functions is another Gaussian function with $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$. Substituting $K_\sigma(x, x_i)$ and $\hat{p}(x)$ into $V(p)$ gives the estimate

$$\hat{V}(p) = \frac{1}{N^2} \mathbf{1}^T K \mathbf{1},$$

where $K$ is the $N \times N$ kernel matrix whose element $(i, j)$ is $K_\sigma(x_i, x_j)$, and $\mathbf{1}$ is an $N \times 1$ vector of ones. The Renyi entropy can therefore be estimated from the kernel matrix, which can be decomposed as $K = E D E^T$, where $D = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)$ and $E = [\alpha_1, \alpha_2, \ldots, \alpha_N]$. Here $\lambda_i$ and $\alpha_i$ are the eigenvalues and corresponding eigenvectors, respectively. Then

$$\hat{V}(p) = \frac{1}{N^2} \sum_{i=1}^{N} \left( \sqrt{\lambda_i}\, \alpha_i^T \mathbf{1} \right)^2.$$

The terms of this sum are the so-called entropy values, and each term $\sqrt{\lambda_i}\, \alpha_i^T \mathbf{1}$ contributes to the entropy estimate. The eigenvectors and corresponding eigenvalues are ranked in decreasing order of their entropy values. KECA selects the eigenvalues and eigenvectors associated with the $d$ largest entropy values [21], unlike PCA and KPCA, which select the largest eigenvalues. The resulting KECA expression is $\Phi_{keca} = D_d^{1/2} E_d^T$, where $D_d$ and $E_d$ store the selected $d$ eigenvalues and corresponding eigenvectors.
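The selection rule reviewed above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `gaussian_kernel_matrix` and `keca` are illustrative, and the Gaussian kernel is used without its normalization constant, which only rescales every entropy value by the same factor and so should not change the ranking.

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma):
    """N x N matrix with entries exp(-||x_i - x_j||^2 / (2 sigma^2)).

    The density normalization constant is omitted (assumption for brevity).
    """
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against round-off negatives
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def keca(K, d):
    """Return the d x N KECA projection, the chosen indices, and all entropy terms."""
    N = K.shape[0]
    eigvals, eigvecs = np.linalg.eigh(K)      # K = E D E^T for symmetric K
    eigvals = np.clip(eigvals, 0.0, None)     # remove tiny negative eigenvalues
    ones = np.ones(N)
    # Entropy term of eigenpair i: (sqrt(lambda_i) * alpha_i^T 1)^2 / N^2
    contrib = eigvals * (eigvecs.T @ ones) ** 2 / N ** 2
    order = np.argsort(contrib)[::-1][:d]     # d largest entropy terms
    # Phi_keca = D_d^{1/2} E_d^T
    Phi = np.sqrt(eigvals[order])[:, None] * eigvecs[:, order].T
    return Phi, order, contrib

# Small demonstration on random data (illustrative only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 4))
K_demo = gaussian_kernel_matrix(X_demo, sigma=1.5)
Phi_demo, order_demo, contrib_demo = keca(K_demo, d=3)
```

Summing all entries of `contrib_demo` recovers $\frac{1}{N^2}\mathbf{1}^T K \mathbf{1}$, which is a convenient sanity check that the eigen-expansion matches the matrix form of the estimate.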

Introduction of WKECA
Consider a set of training samples $x_i \in R^N$ ($i = 1, 2, \ldots, N$), each of which belongs to one of $c$ classes. Define the weight vector $[u_1, u_2, \ldots, u_N]$ and the label values $\{l_1, l_2, \ldots, l_c\}$. Each sample takes the label value of its own class, so $u_i = l_j$ if $x_i$ belongs to the $j$-th class, where $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, c$. The weights thus depend on the class and carry the class information. A weight matrix $W$ with the same dimensions as the original kernel matrix $K$ is defined from these weights, and the new weighted kernel matrix $K_W$ is constructed element-wise as $K_W(i, j) = K(i, j)\, W(i, j)$. The effect of the weights can be analyzed in two cases: (1) If $u_i = u_j$, the samples $x_i$ and $x_j$ belong to the same class and $W(i, j) = 1$, so the corresponding entries of $K_W$ equal those of the original kernel matrix $K$. (2) If $u_i \neq u_j$, $W(i, j)$ is a positive value through which the label information is embedded in the weighted kernel matrix. Eigendecomposing $K_W$ gives $K_W = E_W D_W E_W^T$; the eigenvalues $\lambda_{w1}, \lambda_{w2}, \ldots, \lambda_{wN}$ of the weighted kernel matrix are ranked in decreasing order of their entropy values, and $\alpha_{w1}, \alpha_{w2}, \ldots, \alpha_{wN}$ are the corresponding eigenvectors. The subspace $U_W$ is spanned by the principal axes $u_{wi}$ that contribute most to the Renyi entropy estimate. Requiring $\|u_{wi}\|^2 = 1$ gives $u_{wi} = \lambda_{wi}^{-1/2} \Phi \alpha_{wi}$. Both training and testing samples can then be projected onto $U_W$ to extract the intrinsic features.
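The element-wise weighting can be sketched as follows. The explicit form of $W$ is not reproduced here, so `ratio_weight_matrix` is a purely hypothetical stand-in chosen only because it satisfies the two stated properties ($W(i, j) = 1$ within a class, positive otherwise); it should not be read as the paper's definition. The names `wkeca_features` and `class_weights` are likewise illustrative, and the class weights are assumed to be positive.

```python
import numpy as np

def ratio_weight_matrix(u):
    """Hypothetical W(i, j) = min(u_i, u_j) / max(u_i, u_j).

    Equals 1 when u_i == u_j and is positive otherwise (assumes u > 0).
    This is an illustrative placeholder, not the paper's definition of W.
    """
    u = np.asarray(u, dtype=float)
    return np.minimum.outer(u, u) / np.maximum.outer(u, u)

def wkeca_features(K, labels, class_weights, d, weight_fn=ratio_weight_matrix):
    """Build K_W = K * W element-wise and keep the d largest entropy terms."""
    u = np.array([class_weights[c] for c in labels], dtype=float)
    K_w = K * weight_fn(u)                    # K_W(i, j) = K(i, j) W(i, j)
    N = K_w.shape[0]
    eigvals, eigvecs = np.linalg.eigh(K_w)    # K_W = E_W D_W E_W^T
    eigvals = np.clip(eigvals, 0.0, None)     # K_W need not be exactly PSD
    contrib = eigvals * (eigvecs.T @ np.ones(N)) ** 2 / N ** 2
    order = np.argsort(contrib)[::-1][:d]
    feats = np.sqrt(eigvals[order])[:, None] * eigvecs[:, order].T
    return K_w, feats
```

Passing a different `weight_fn` is the intended way to substitute the actual weight matrix definition once it is known.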
For the out-of-sample data set $x_t$, the extracted features can be calculated. Let $\Phi$ refer to a collection of the out-of-sample data sets; $K = \Phi^T \Phi$ is the inner product matrix.
Then we can extract the first d nonlinear principal components which contribute most to Renyi entropies of the input data by using the weighted kernel matrix. The number, d, of the projection vectors is determined in terms of

Selecting Optimal Weights for Weighted Kernel Entropy Component Analysis by Genetic Algorithm
The relevance of different classes leads to diversified generalization performance. The weights are therefore important to the recognition system, and determining them can be treated as an optimization problem. GA is a search and optimization process inspired by the laws of natural evolution and selection [31]; it is a powerful intelligent optimization tool based on a group of independent computations controlled by a probabilistic strategy. GA has been widely used in various applications because of its excellent global search ability [31,32]. In this study, GA is used to find the most suitable weights for WKECA, where optimality is defined in terms of recognition accuracy and class separability. The main optimization process is as follows:
(1) Individual encoding: each individual is a set of weights $l_1, l_2, \ldots, l_c$, and each weight is binary encoded.
(2) Population initialization: an initial population of $n_r$ individuals (set to 20) is created randomly.
(3) Fitness calculation: individuals are selected for the next generation according to their fitness. Following Liu and Wang [19], the fitness function is defined as $f(X) = CA + k R_{BW}$, where $CA$ is the training accuracy, which represents the performance of the extracted features, $k$ is a positive constant, and $R_{BW}$ is the Fisher criterion, which indicates the class separability. $R_{BW}$ is the ratio of the between-class distance $S_b$ to the within-class distance $S_w$ [33]. Maximizing the fitness function yields high classification accuracy and large class separability, so that, with a proper $k$, more discriminative information is evolved than with KECA. Good generalization performance can therefore be expected from WKECA on both training and testing samples.
(5) Terminating conditions: the program terminates when the fitness value no longer changes during the iterations or when the number of iterations reaches the maximum value (50 in this study).
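The optimization loop can be sketched in NumPy as below. Only the binary encoding, the population size of 20, the limit of 50 generations, and the fitness form $CA + k R_{BW}$ come from the text. Everything else is an assumption made for illustration: tournament selection, single-point crossover, bit-flip mutation, elitism, the rates `p_cross` and `p_mut`, the weight range `[lo, hi]`, the 8-bit resolution, the `patience` used to decide that the fitness "does not change", and the trace-based form of the Fisher ratio.

```python
import numpy as np

def fisher_ratio(F, y):
    """R_BW as (between-class scatter) / (within-class scatter); trace form assumed."""
    F, y = np.asarray(F, dtype=float), np.asarray(y)
    mean_all = F.mean(axis=0)
    s_b = s_w = 0.0
    for c in np.unique(y):
        Fc = F[y == c]
        mc = Fc.mean(axis=0)
        s_b += len(Fc) * np.sum((mc - mean_all) ** 2)
        s_w += np.sum((Fc - mc) ** 2)
    return s_b / s_w

def decode(bits, n_weights, n_bits, lo, hi):
    """Map a binary chromosome to n_weights real values in [lo, hi]."""
    ints = bits.reshape(n_weights, n_bits) @ (2 ** np.arange(n_bits)[::-1])
    return lo + (hi - lo) * ints / (2 ** n_bits - 1)

def run_ga(fitness, n_weights, n_bits=8, pop_size=20, max_gen=50,
           lo=0.1, hi=10.0, p_cross=0.8, p_mut=0.02, patience=10, seed=0):
    """Maximize fitness(weights); in WKECA this would be CA + k * R_BW."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, n_weights * n_bits))
    best_bits, best_fit, stall = None, -np.inf, 0
    for _ in range(max_gen):
        fits = np.array([fitness(decode(ind, n_weights, n_bits, lo, hi))
                         for ind in pop])
        if fits.max() > best_fit:
            best_fit, best_bits, stall = fits.max(), pop[fits.argmax()].copy(), 0
        else:
            stall += 1
            if stall >= patience:          # fitness no longer changes
                break
        # Tournament selection between random pairs (assumed operator).
        a, b = rng.integers(0, pop_size, size=(2, pop_size))
        parents = pop[np.where(fits[a] >= fits[b], a, b)]
        children = parents.copy()
        # Single-point crossover on consecutive pairs (assumed operator).
        for i in range(0, pop_size - 1, 2):
            if rng.random() < p_cross:
                cut = rng.integers(1, children.shape[1])
                children[i, cut:] = parents[i + 1, cut:]
                children[i + 1, cut:] = parents[i, cut:]
        # Bit-flip mutation, then keep the best individual found so far.
        flips = rng.random(children.shape) < p_mut
        children = np.where(flips, 1 - children, children)
        children[0] = best_bits
        pop = children
    return decode(best_bits, n_weights, n_bits, lo, hi), best_fit
```

In an actual WKECA run the `fitness` callable would extract features with the candidate weights, train the classifier, and return the training accuracy plus `k * fisher_ratio(features, labels)`; a toy objective such as `lambda w: -np.sum((w - 3.0) ** 2)` is enough to exercise the loop.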

Fault Diagnosis Based on WKECA
A high-dimensional feature set that represents the operating condition of the machine well should first be extracted from the raw vibration signals. The vibration signals of faulty bearings are generally non-stationary, and wavelet packet decomposition (WPD), which provides a more detailed analysis, is a powerful tool for dealing with non-stationary signals [34]. WPD is effective for decomposing both high- and mid-frequency information from a signal into the corresponding frequency regions and is now widely used for fault diagnosis of bearings [34][35][36][37][38]. In this study, WPD is performed to extract fault features including the relative energy in a wavelet packet node (REWPN) and the entropy in a wavelet packet node (EWPN). The REWPN indicates the normalized energy of a wavelet packet node, and the EWPN represents the uncertainty of the normalized coefficients of a wavelet packet node [39]. For a given sample $x(n)$, let $C_i^j$ denote the $j$-th wavelet packet coefficient of the $i$-th wavelet packet node. REWPN and EWPN can then be expressed as

$$\mathrm{REWPN}(i) = \frac{\sum_{j=1}^{K} (C_i^j)^2}{\sum_{m=1}^{N} \sum_{j=1}^{K} (C_m^j)^2}, \qquad \mathrm{EWPN}(i) = -\sum_{j=1}^{K} p_i^j \log p_i^j,$$

where $p_i^j = (C_i^j)^2 / \sum_{j=1}^{K} (C_i^j)^2$, $N$ is the total number of wavelet packet nodes, and $K$ is the total number of wavelet packet coefficients in each wavelet packet node. The REWPNs and EWPNs reflect the diversity among different fault patterns of bearings. They are used as the high-dimensional input vector to WKECA for dimensionality reduction, which can be written as $x_i = [\mathrm{REWPN}(1), \ldots, \mathrm{REWPN}(p), \mathrm{EWPN}(1), \ldots, \mathrm{EWPN}(p)]^T$, where $p$ is the number of wavelet packet nodes. The implementation of the proposed fault diagnosis method using WKECA for bearings is shown in Figure 1:
(1) Decompose the vibration signals into different frequency bands using WPD to acquire the high-dimensional feature set $X = [x_1, \ldots, x_N]^T$ of REWPNs and EWPNs, where $N$ is the number of signal samples.
(2) Apply the WKECA algorithm to the high-dimensional data set obtained from the vibration signals to capture its intrinsic manifold structure, and obtain the low-dimensional features by projecting the original high-dimensional observation space into the low-dimensional feature space. The optimal mapping direction is acquired at the same time, so that new testing samples can be mapped into the low-dimensional feature space.
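The two node features can be computed directly from the coefficients of each wavelet packet node, as in the following sketch. The function name `wavelet_packet_node_features` is illustrative, and the natural logarithm and the convention $0 \log 0 = 0$ are assumptions. The node coefficients themselves are taken as input; in practice they could come from a wavelet packet library such as PyWavelets, which is not used here so that the block stays self-contained.

```python
import numpy as np

def wavelet_packet_node_features(nodes):
    """Return [REWPN(1..p), EWPN(1..p)] for a list of per-node coefficient arrays.

    Assumes every node has non-zero energy; natural log is an assumption.
    """
    nodes = [np.asarray(c, dtype=float) for c in nodes]
    energies = np.array([np.sum(c ** 2) for c in nodes])
    rewpn = energies / energies.sum()          # relative energy of each node
    ewpn = np.empty(len(nodes))
    for i, c in enumerate(nodes):
        p = c ** 2 / energies[i]               # normalized coefficient energies
        p = p[p > 0]                           # treat 0 * log(0) as 0
        ewpn[i] = -np.sum(p * np.log(p))       # entropy of the node
    return np.concatenate([rewpn, ewpn])

# Two toy nodes: one with energy spread evenly, one concentrated in a single coefficient.
features_demo = wavelet_packet_node_features([[1.0, 1.0], [2.0, 0.0]])
```

With 16 terminal nodes at decomposition level 4, the same function would return the 32-dimensional vector of 16 REWPNs followed by 16 EWPNs.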

Experimental Description
To evaluate the effectiveness of WKECA, an experimental study on fault diagnosis of rolling bearings was performed. As shown in Figure 2, the tested bearings were delivered through an automatic machinery system comprising a preset mechanism, a measuring mechanism, a sorting mechanism, and a feeding mechanism [40,41]. The radial vibration signals at one point of the tested bearings were detected by a piezoelectric acceleration sensor (YD-1, Far East Vibration (Beijing) System Engineering Technology Co., Ltd., Beijing, China) located on top of the bearings and amplified by a charge amplifier (DHF-2, same company as the sensor). The charge sensitivity and frequency response of the sensor are 6-10 pC/(m·s⁻²) and 1-10,000 Hz ± 1 dB, respectively, and the frequency range of the amplifier is 0.3 Hz-100 kHz. The signals were then converted to voltage signals by an A/D converter (PCI-9114, ADLINK Technology, Inc., Taiwan) and sent to a computer for further processing. The sampling frequency was 25 kHz, and the rotational speed of the driving motor was set to 1500 rpm. Deep groove ball bearings (6328-2RZ, Changjiang Bearing Co., Ltd., Chongqing, China) were used as the tested bearings, and four different operating conditions (inner race fault, outer race fault, ball fault, and normal condition) were simulated in this experiment.
Single point defects were introduced to the tested bearings with an electric engraving pen. The widths of the scratch defects were 65 ± 22 µm, 70 ± 20 µm, and 70 ± 20 µm for the inner race, outer race, and ball, respectively, and the depths of the scratch defects were 0.2 ± 0.05 mm. The characteristic bearing defect frequencies can be calculated by [42]:

$$\text{Defect on inner race (BPI)} = \frac{Z}{2} f_r \left(1 + \frac{d}{D} \cos\alpha\right)$$

$$\text{Defect on outer race (BPO)} = \frac{Z}{2} f_r \left(1 - \frac{d}{D} \cos\alpha\right)$$

$$\text{Defect on ball (BS)} = \frac{f_r D}{2d} \left[1 - \left(\frac{d}{D}\right)^2 \cos^2\alpha\right]$$

where $Z$ is the number of rolling elements, $f_r$ is the rotational frequency, $d$ is the diameter of the rolling element, $D$ is the pitch diameter, and $\alpha$ is the contact angle. According to the kinematic parameters of the tested bearings and the rotational speed, the characteristic bearing defect frequencies of the inner race, outer race, and ball are 121.75 Hz, 78.25 Hz, and 55 Hz, respectively. Figure 3 shows the four vibration signal waveforms in the time domain together with their amplitude spectra. The peak values of the accelerations are obtained at 24.42 Hz, which is close to the rotational frequency of 25 Hz. It is difficult to distinguish the different faults from Figure 3 alone because of the effects of noise. The vibration signals under the four conditions were selected as samples, and 100 bearings were tested for each state. This gives 400 samples in total, each of length 25,000. Half of the samples in the original data set are used as the training set in the experiment.
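The reported defect frequencies can be checked numerically with the standard kinematic formulas. The bearing geometry is not listed in the text, so the values $Z = 8$ and $d/D = 0.2175$ used below are assumptions back-solved from the reported inner race and outer race frequencies at $f_r = 25$ Hz, and a contact angle of zero is assumed; the function name is illustrative.

```python
import math

def bearing_defect_frequencies(Z, f_r, d_over_D, alpha=0.0):
    """Return (BPI, BPO, BS) in Hz from the standard kinematic formulas."""
    r = d_over_D * math.cos(alpha)
    bpi = 0.5 * Z * f_r * (1.0 + r)                 # defect on inner race
    bpo = 0.5 * Z * f_r * (1.0 - r)                 # defect on outer race
    bs = f_r / (2.0 * d_over_D) * (1.0 - r ** 2)    # defect on ball
    return bpi, bpo, bs

f_r = 1500 / 60.0  # 1500 rpm corresponds to 25 Hz
# Z and d/D are inferred values, not quoted from the paper.
bpi, bpo, bs = bearing_defect_frequencies(Z=8, f_r=f_r, d_over_D=0.2175)
print(f"BPI = {bpi:.2f} Hz, BPO = {bpo:.2f} Hz, BS = {bs:.2f} Hz")
```

Under these assumed parameters the inner race and outer race values reproduce 121.75 Hz and 78.25 Hz, and the ball value comes out near 54.75 Hz, consistent with the rounded 55 Hz quoted above.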

Dimensionality Reduction and Pattern Classification
The high-dimensional feature set containing REWPNs and EWPNs is constructed first. The wavelet packet node energy features obtained by Daubechies2 (db2) wavelet packet decomposition were found to achieve the best classification performance for bearing fault diagnosis after many experiments on a series of Daubechies wavelets [43]. Here, db2 is selected as the mother wavelet function to implement binary WPD of the vibration signals, with the maximum decomposition level set to 4. The normalized wavelet packet energy and wavelet packet node entropy spectra of the bearing vibration signals are shown in Figure 4. Different bearing faults have different amplitudes in different frequency bands. In total, 32 fault features, comprising 16 REWPNs and 16 EWPNs, are used for fault diagnosis of the bearings. After the high-dimensional feature set is constructed, it is input into WKECA for nonlinear dimension reduction, where the parameter $k$ of the fitness function is set to 0.001. The first $d$ component vectors contributing most to the Renyi entropy are extracted by WKECA, and the similar methods PCA, KPCA, and KECA are applied for comparison. For every method, the target dimensionality is chosen so that the cumulative variance contribution rate exceeds 95%. For visualization, the first three principal components of the projection results are plotted in Figures 5-8, where Figures 5a, 6a, 7a, and 8a show the training results and Figures 5b, 6b, 7b, and 8b show the testing results. PCA, KPCA, and KECA do not separate the four classes well because some samples overlap, which leads to low recognition accuracy. By contrast, WKECA misjudges few samples: the testing points are consistent with the training points, and the WKECA algorithm clearly identifies the different classes for both the training and the testing samples. This indicates that WKECA has better clustering performance than PCA, KPCA, and KECA, because it introduces the fault class label information and a weighting strategy into feature extraction, which is conducive to pattern recognition.
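The rule for choosing the target dimensionality can be written as a short helper. This is a sketch with an illustrative name, `choose_dimension`; which quantities are accumulated (eigenvalues for PCA and KPCA, entropy terms for KECA and WKECA) is an assumption left to the caller.

```python
import numpy as np

def choose_dimension(contributions, threshold=0.95):
    """Smallest d whose cumulative contribution rate exceeds the threshold."""
    c = np.sort(np.asarray(contributions, dtype=float))[::-1]  # largest first
    ratio = np.cumsum(c) / c.sum()                             # cumulative rate
    # side="right" returns the first index whose rate is strictly greater.
    d = int(np.searchsorted(ratio, threshold, side="right")) + 1
    return min(d, len(c))
```

For contributions `[90, 6, 3, 1]` the cumulative rates are 0.90, 0.96, 0.99, and 1.00, so two components are kept.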

Results and Discussion
In fault diagnosis based on pattern recognition, feature extraction techniques find a low-dimensional representation of the samples, and a classifier is then needed to identify the different bearing faults. The support vector machine (SVM) is adopted because of its well-developed statistical learning theory. Fifty samples from each of the inner race fault, outer race fault, ball fault, and normal conditions were selected randomly for SVM training, and the others were used for testing. The quantitative evaluation procedure for SVM, PCA-SVM, KPCA-SVM, KECA-SVM, and WKECA-SVM was repeated 10 times. To highlight the effectiveness of the proposed WKECA-SVM method, its fault detection rate was compared with the results of the other four methods. The average testing results are summarized in Table 1; the classification accuracies are 77.5%, 83%, 89.5%, 93%, and 97% for SVM, PCA-SVM, KPCA-SVM, KECA-SVM, and WKECA-SVM, respectively. The results show that dimension reduction yields satisfactory overall classification results and that introducing WKECA improves the classification accuracy significantly. WKECA performs better than the other methods in extracting discriminative features, which leads to high classification rates. WKECA is therefore suitable as a feature extraction step prior to classification and works well for fault pattern recognition. To obtain discriminative representations through GA, a suitable fitness function is important to the whole recognition procedure, so the effect of the parameter $k$ in the fitness function must be understood. Table 2 presents the results of the evolutionary process for different $k$, where $CA_{test}$ is the testing accuracy. $R_{BW}$ increases as $k$ increases, while $CA_{test}$ decreases accordingly. This observation shows that $k$ adjusts the contribution of class separability to the fitness function, and that a proper $k$ can lead to a larger $R_{BW}$ together with good classification performance.
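The evaluation protocol (50 randomly chosen training samples per class, the rest for testing, repeated 10 times and averaged) can be sketched as follows. To keep the block dependency-free, a nearest-centroid classifier is swapped in for the SVM; it is only a stand-in, and any `fit_predict` callable with the same signature, for example one wrapping an SVM library, could be passed instead. All function names are illustrative.

```python
import numpy as np

def nearest_centroid_fit_predict(X_train, y_train, X_test):
    """Stand-in classifier (not an SVM): assign each test point to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    d2 = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d2.argmin(axis=1)]

def repeated_split_accuracy(X, y, fit_predict, n_train_per_class=50,
                            n_repeats=10, seed=0):
    """Average test accuracy over repeated random per-class train/test splits."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    accs = []
    for _ in range(n_repeats):
        # Draw the same number of training samples from every class.
        train_idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c), size=n_train_per_class, replace=False)
            for c in np.unique(y)
        ])
        test_mask = np.ones(len(y), dtype=bool)
        test_mask[train_idx] = False           # everything else is test data
        y_pred = fit_predict(X[train_idx], y[train_idx], X[test_mask])
        accs.append(float(np.mean(y_pred == y[test_mask])))
    return float(np.mean(accs)), accs
```

Calling the function with smaller values of `n_train_per_class` gives the kind of accuracy-versus-training-size comparison used to study small training sets.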

To investigate the performance of WKECA in handling the Small Sample Size (SSS) problem with different training sample sizes, PCA, KPCA, and KECA were applied for comparison. Figure 9 presents the recognition rates of the four feature extraction methods and of the original features for different numbers of labeled samples. The classification accuracy increases as the training sample size increases. This shows that feature extraction based on manifold learning can improve recognition performance, and that WKECA achieves higher classification accuracy than the other methods. The effects of the SSS problem are evident in the other methods when only ten samples are used for training, whereas WKECA is less sensitive to the training sample size. This indicates that WKECA can capture the intrinsic geometric structure embedded in the data and achieve efficient performance in feature extraction and classification.

Conclusions
In this study, a new feature extraction method called weighted kernel entropy component analysis (WKECA) is proposed for fault diagnosis of rolling bearings. It makes the most of the label information and introduces a weighting strategy into feature extraction, and GA is used to find optimal weights that achieve high training classification accuracy. The original high-dimensional feature sets are first constructed with WPD, which provides a more detailed analysis of the signals. WKECA is then used to extract the intrinsic independent features among the multiple manifolds to reflect the states of the rolling bearings. Finally, the extracted intrinsic geometric features are fed into an SVM to recognize the different operating conditions of the bearings. WKECA outperforms PCA, KPCA, and KECA in terms of testing accuracy. The results demonstrate the feasibility and effectiveness of the proposed method for fault diagnosis of rolling bearings. As future work, we intend to extend the approach to diagnosing different fault magnitudes in different machines. The main challenge is the long training time, which is inevitably faced by almost all evolutionary processes for pattern recognition. Fast optimization strategies therefore deserve further investigation.
