Bearing Fault Diagnosis Based on Statistical Locally Linear Embedding

Fault diagnosis is essentially a kind of pattern recognition. The measured signal samples usually distribute on nonlinear low-dimensional manifolds embedded in the high-dimensional signal space, so how to implement feature extraction, dimensionality reduction and improve recognition performance is a crucial task. In this paper a novel machinery fault diagnosis approach based on a statistical locally linear embedding (S-LLE) algorithm which is an extension of LLE by exploiting the fault class label information is proposed. The fault diagnosis approach first extracts the intrinsic manifold features from the high-dimensional feature vectors which are obtained from vibration signals that feature extraction by time-domain, frequency-domain and empirical mode decomposition (EMD), and then translates the complex mode space into a salient low-dimensional feature space by the manifold learning algorithm S-LLE, which outperforms other feature reduction methods such as PCA, LDA and LLE. Finally in the feature reduction space pattern classification and fault diagnosis by classifier are carried out easily and rapidly. Rolling bearing fault signals are used to validate the proposed fault diagnosis approach. The results indicate that the proposed approach obviously improves the classification performance of fault pattern recognition and outperforms the other traditional approaches.

locally linear embedding (LLE) [9], Laplacian Eigenmaps (LapEig) [10], local tangent space alignment (LTSA) [11], etc. Today, manifold learning methods are widely used in cluster analysis, image processing, bioinformatics, etc. For instance, the LLE algorithm is used to extract characteristic MR features of brain alterations [12] and solve face recognition problems [13]. A local embedding method based on LLE and a semi-supervised LapEig algorithm is presented to achieve lower dimensionality from high-dimensional data and implement data visualization and classification [14,15]. Furthermore, manifold learning is relatively seldom studied in the fault diagnosis field. Yang [16] proposed a nonlinear time series noise reduction method based on principal manifold learning applied to the analysis of gearbox vibration signals with snaggletooth, which was only for signal denoising. The LTSA algorithm as a typical manifold learning method used for dimensionality reduction from original high-dimensional feature datasets [1]. However, this traditional manifold learning is an unsupervised learning method, and cannot be applied efficiently to supervised learning problems.
In some fault diagnosis tasks, data are from multiple classes and the class label information is known, which can help in classification tasks. The information provided by these class labels may be used to guide the dimensionality reduction procedure.
For the supervised expansion of manifold learning, a supervised LLE method (SLLE) for classification problems by utilizing the class label information was proposed by Ridder et al. [17]. Although SLLE improves the performance of LLE related to classification, the information provided by the Euclidean distance between samples is not sufficient to select proper neighbors for classification. A supervised manifold learning approach based on correntropy LLE with class labels information for visualization and classification on noisy artificial and real-world datasets was proposed by Genaro et al. [18]. Based on probability-based distance and the supervised locally linear embedding technique, a novel dimension reduction method for classification is introduced in [19]. A novel supervised manifold learning technique called Supervised Laplacian Eigenmaps (S-LE) was proposed by Raducanu et al. [20], which makes use of class label information to guide the non-linear dimensionality reduction procedure for face recognition problems by adopting the large margin concept. Zhang et al. introduced in [21] a supervised feature extraction method called locally discriminating projection (LDP) and achieved good recognition accuracy. A new supervised manifold learning algorithm based on the S-LapEig algorithm for machinery fault diagnosis was introduced in [22] by Jiang et al., and the good performance of the approach on a series of benchmark and real fault datasets verified its feasibility and efficiency. Su et al. [23] proposed a fault diagnosis method based on supervised extended local tangent space alignment (SE-LTSA) for dimension reduction to improve the effectiveness of fault diagnosis in machinery. Most of these methods are based on the improvement of manifold learning methods and to solve a certain task.
In this paper, aimed at addressing the difficulty of handling high-dimensional nonlinear fault data, we propose a new fault classification approach based on supervised manifold learning for rolling bearing fault diagnosis. Because of the prominent properties of considering both the local geometry information and the class information of the data, the proposed approach has efficient capability to deal with the supervised learning problem. Some experiments with the proposed method show its feasibility and effectiveness.
The remainder of the paper is organized as following: in Section 2, LLE theory and manifold learning methods are reviewed, and a new supervised manifold learning (statistical locally linear embedding, S-LLE) algorithm for feature extraction or reduction is proposed. The implementation steps of the algorithm are described in detail. In Section 3, we discuss a rolling bearing fault diagnosis strategy, and introduce the implementation process and flow chart of the proposed approach. In Section 4, we illustrate a feature extraction method by utilizing time-domain, frequency-domain and EMD analysis of the original rolling bearing vibration fault signals. In Section 5, we introduce the rolling bearing fault experiment setup and signal acquisition first, and then implement feature dimension reduction experimentally by utilizing S-LLE, and comparisons with other feature reduction methods such as PCA, LDA and LLE are also discussed. Finally, the proposed fault diagnosis approach is applied to pattern classification experiments with the original data and reduction of statistical features extracted from multi-domain rolling bearing vibration fault signals by three classification algorithms, namely, CART, K-NN and RBF-SVM, and the classification performance is analyzed and discussed. Finally, the conclusions are presented in Section 6.

Locally Linear Embedding Algorithm
The locally linear embedding (LLE) algorithm was originally proposed by Roweis et al. [9] to achieve non-linear dimension reduction. LLE is an unsupervised learning algorithm that can compute low-dimensional features that preserve the neighborhood relationship as that of the high-dimensional data. In other words, the low dimensional space is required to preserve the neighborhood configuration. The LLE algorithm to compute a lower-dimensional representation of the original data is described as follows.
Given a set of data, is in a high-dimensional input data space D R . The data points are assumed to lie on or near a nonlinear manifold of intrinsic dimensionality dD  . The goal of LLE is to find a low-dimensional embedding of dataset X by mapping the D-dimensional data into a single global coordinate system in Euclidean distance D R . The LLE algorithm can be generalized to three steps: select neighbors, reconstruct with linear weights and map to embedded coordinates. The steps of the LLE algorithm are illustrated in Figure 1.
Step 1: Using the Euclidean distance to define neighborhood of each input vector. Consider the -th Step 3: Computing the coordinates of the original high dimensional data i x in the low dimensional space. The low dimensional embedding is obtained based on the idea that LLE preserves the local linearity from neighbors and the corresponding reconstruction weights nn W  . Using nn W  , the low dimensional output space Y can be computed by minimizing the cost function: Subject to two constraints: where ij w is the same linear combination weight in the high-dimensional space and where Y is an nd  embedding data matrix (being dD  ), and d i yR  is the output sample vector, and 0 is a column vector of zeros. Based on W , we can define a sparse, symmetric, and positive semi definite matrix M as follows: Then, rewriting Equation (3) to find Y : It is possible to calculate d + 1 eigenvectors of M , which are associated to the d + 1 smallest eigenvalues. The first eigenvector is the unit vector with all equal components, which is discarded. The remaining d eigenvectors constitute the embedding coordinates found by LLE.

Statistical Locally Linear Embedding Algorithm
As mentioned above, the class label information is not utilized in the LLE dimensionality reduction process. The statistical locally linear embedding algorithm (S-LLE) is proposed based on utilizing class label information in the LLE algorithm to improve performance. The main idea of S-LLE is to utilize sample statistics to measure the relationship between samples. The membership of a sample laying on a nonlinear manifold can still be estimated using either parametric or nonparametric approaches.
In the parametric approach, each class is decomposed to a set of clusters (subclasses) because the high dimensionality data is complicated and cannot be modeled by only one Gaussian model. A cluster-space representation for classification is extended to estimate the pairwise similarity between samples [24]. The K-means clustering algorithm [25] was selected due to its efficiency and easy implementation. The purpose of the K-means algorithm is to partition data into a fixed number of K clusters by minimizing the mean squared error distance. After clustering, the training sample would be labeled into KL clusters, where L is the number of classes and K is the number of clusters of each class. That means we have KL subclasses. The probability density of a sample X with D dimension in subclass ω ik can be defined as: where ω ik is the k-th cluster in class i , ik m is the mean of and ik ik S S r I     is the sample covariance matrix of ω ik plus a small value r times identity matrix to prevent singularity.
Since the subclasses are mutually exclusive and statistically exhaustive, the likelihood of sample x in a class can be determined by the sum overall the subclasses, is given by: where i n is the number of samples of the k-th class, and n is the number of samples of k-th cluster in class i . In the nonparametric approach, the Parzen-window density estimation technique [26] was used to estimate the membership of samples and devise the statistic based similarity between samples. The Parzen-window defined a D -dimensional hypercube local region V with its length h to estimate density, i.e., D Vh  , the number n k falls into the hypercube V centered at x given by: where φ()  is a kernel function defined as: Therefore the density can be estimated as: where j n is number of samples in class ω j . Thus the likelihood of a sample for each class can be estimated as follows: Without loss of generality, we assume that samples are independent. Therefore, the likelihood of samples x and x belonging to the same class ω i is equal to the product of the individual likelihoods.
A new distance measure between any pair of samples can be defined as: The measurement d incorporates the class information provided by the statistical cluster model for each class. In S-LLE, d is used to find the neighbors of each sample instead of the original Euclidean distance in the first step of LLE algorithm. Later, the K-nearest neighbors of each sample are determined according to d . The subsequent steps follow the same procedure as LLE.
Both the parametric and nonparametric approaches can be used in statistical LLE to estimate the class-conditional probability density function of samples. The statistical LLE algorithm improves the performance of LLE in classification applications by incorporating class label information.

Statistical Locally Linear Embedding Algorithm for Bearing Fault Diagnosis
In this paper, we propose a new nonlinear dimensionality reduction method based on supervised manifold learning theory, which is a new fault diagnosis approach called the S-LLE algorithm. Taking special consideration of both the information of labeled data and local neighbor geometry information, the algorithm can obtain the whole intrinsic geometry of the dataset, and has good data classification performance. The approach first learns the intrinsic geometric structure of the fault data in the signal space to capture the nonlinear embedded manifold features and map the high-dimensional fault data into a low-dimensional embedded space. The process of the presented algorithm consists of two parts: training and testing. First we divide the fault dataset into a training dataset and a test dataset. Using the proposed S-LLE algorithm, the class label information is used to determine neighbors of the training dataset so as to map overlapping high-dimensional data into clusters in the embedded space, we map the training dataset into a low-dimensional feature space, and exploit the characteristic patterns of the dataset.
In order to further improve the diagnosis performance and ensure the diagnosis reliability, the rolling bearing fault diagnosis model based on the supervised manifold learning S-LLE approach can be described in four main steps as follows: signal acquisition, feature extraction, dimensionality reduction and pattern recognition. The implementation process and flow chart of the proposed approach is shown in Figure 2: (1) Signal acquisition: The acquisition of the original vibration signals is the first step in the rolling bearing fault diagnosis process. learning method S-LLE is used to select the salient features from the raw statistical feature dataset. (4) Pattern recognition: Implementing fault classification of the training samples in the low-dimensional embedded space according to class label information and learning geometric structure feature by optimized classifiers. To test the dataset, we also map it onto the same feature space according to the mapping matrix of the training dataset, and evaluate the classification capability. Finally, pattern recognition is carried out in the embedded spaces. In order to reliably diagnose complex roller bearing faults, the proposed fault diagnosis approach is applied for the roller bearings fault diagnosis.  Compared with other fault diagnosis methods, this method has several advantages as follows: (1) The method is based on nonlinear dimensionality reduction and can treat high-dimensional nonlinear data, which avoids the "curse of dimensionality". (2) The method can capture more accurately the intrinsic geometric distribution properties of samples by the sample label information, and utilize the obtained distribution feature to classify the fault category.
(3) The feature extraction method based on time-domain, frequency-domain and time-frequency domain is simple and the implementation speed is high, which greatly reduces the fault diagnosis difficulties.

High-Dimensional Fault Features Extracted from Accelerometer Sensor Vibration Signals
In recent years, intelligent fault diagnosis based on statistical features has received extensive attention because it can exploit important fault-related information contained in machinery operating vibration signals, and many fault diagnosis methods are proposed based on the features extracted from vibration signals [27]. Vibration signal analysis is an important means in the field of online detection and fault diagnosis of mechanical equipment. Generally, these signals are generated by accelerometer sensors on rolling bearings [28]. When faults occur in a rolling bearing, the vibration signal in this malfunction condition will deviate in the time-domain from that of normal condition both in amplitude and phase position. Meanwhile, the amplitude and distribution in the frequency spectrum of the transformed vibration signal also will change under different fault conditions [1].
The essential aim of signal processing is to map a signal from the time domain into another space in which some important information of the signals can be revealed, and consequently, some dominant features of the signals can be extracted [29]. When faults or abnormal running states occur, the measured vibration signals are usually non-stationary and non-linear, and the components of the vibration signals are also very complicated. Time-frequency domain methods are considered to be best way for analyzing the nonlinear and non-stationary signals of faulty bearings [30,31] due to the deficiencies of the Fourier transform. Wavelet transform [32,33] is a commonly used time-frequency domain signal analysis method, but the results of wavelet transform are related to the selected wavelet basis. Empirical Mode Decomposition (EMD) [30] doesn't need a base function and is completely based on the local characteristic time scale of the signal, so EMD is a self-adaptive signal processing method that is applicable to non-stationary and non-linear vibration signals and fault diagnosis of rotating machinery [34,35].
Many research results show that a multi-domain feature set can fully represent the bearing fault feature information which can provide an effective diagnosis for various faults of rolling bearings operating under variable speed and load or unknown speed conditions. For this purpose, various original features that can be extracted from accelerometer sensor signals of bearings have been investigated. This section presents a vibration signal feature calculation method from the time-domain, frequency-domain, and time-frequency domain as they will be used in a high-dimensional fault feature dataset.  Tables 1 and 2. These statistical features have demonstrated their effectiveness in previous publications, not only independent of speeds and loads, but to some extent they also can indicate complex roller bearing faults.  However, when the roller bearing fault is at an early stage, it is very hard to distinguish the feeble fault features only from the time-domain and frequency-domain signals. In this study, we adopt EMD energy to further mine for more characteristic information for accurate roller bearing diagnosis. EMD energy can reveal the original vibration signal in the time-frequency amplitude and distribution view. Some previous researches show that the typical roller bearing faults can be identified by EMD energy [30,34].
The original vibration signal is decomposed into a finite stationary intrinsic mode function (IMF) by the EMD method, the procedure needed to generate the adaptive IMF basis. An IMF is defined as any function satisfying the following conditions: (1) In the whole dataset, the number of extrema and the number of zero crossings must either equal or differ by at most one; (2) At any point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero.
Based on EMD algorithm, the rolling bearing original vibration signal () xt can be decomposed into a set of IMFs: where () j ct is the -th j -IMF of the signal () xt which represents different frequency bands ranging from high frequency to low frequency, and () n rt is the final residue which can be a monotonic trend. More detailed information about EMD can be found in [5]. For an arbitrary time series () xt , we can always have its Hilbert transform () yt : where P represents the Cauchy principal value. With this definition () xt and () yt , we can have an analytic signal () zt : In which, It is also well know that the amplitude of the analytic signal () at , in Equation (13)  where n is the IMF number. In this study, the first six intrinsic mode functions (IMFs) containing almost all the valid information are selected. When different faults occur in the roller bearing, each IMF component will change in terms of its intrinsic information. By an energy calculation for each IMF component, the characteristic information of the original signal can be extracted more fully and effectively. Thus, the whole multi-domain feature set containing 22 statistical features is constructed, which is composed of 12 time domain, four frequency domain and six time-frequency domain terms.

Experiment Setup and Signal Acquisition
To evaluate the effectiveness of the feature extraction and dimensionality reduction methods for rolling bearings, bearing vibration data of different faults related to the bearing in this paper were provided by the Bearing Data Center of the Case Western Reserve University (CWRU), and acquired by bearing accelerometer sensors under different operating loads and bearing conditions [36]. The bearing data have been validated in many researches [37,38] and has become a standard dataset for rolling bearings. In this experiment, the rolling bearings are installed in a motor-driven mechanical system. As shown in Figure 3, the rolling bearing fault test rig is made up of a 2 HP motor (left), a torque transducer/encoder (center), and a dynamometer (right). The bearing type is SKF6205-2RS JEM, a deep groove ball bearing. Four types of vibration signal datasets (normal, ball fault, inner race fault and outer race fault) are acquired from the bearings with the sampling frequency of 48 kHz during about 10 s by using a 16 channel DAT recorder, and tested under motor loads is 2 HP at the speed of 1750 r/min. A single point fault is introduced to the test bearing inner race and outer race, respectively, using an electro-discharge machining with the fault diameter of 21 mils inches and the fault depth of 11 mils (1 mil = 25.4 um). More detailed information about the test rig can be found in [36]. The length of the signal data in every dataset is about 480,000, we can extract 100 samples for each vibration condition, that is, every sample data includes 4096 points, and thus the overall dataset consists of 400 samples. Figure 4 presents the vibration signal waveforms and power spectra from four signal samples of the different fault types.

Feature Extraction
For the every obtained data set, we extract statistical 22 features following the time-domain, frequency domain and time-frequency domain for the next feature dimension reduction. Twelve time-domain and four frequency-domain statistical features could be calculated directly using the feature definition equations as shown in Tables 1 and 2, and time-frequency domain features are extracted from the EMD energy. The calculated value of the six dimensional and six dimensionless time-domain statistical features are shown in Figures 5 and 6, respectively, and the calculated value of the four frequency-domain statistical features are shown in Figure 7. The six time-frequency domain statistical features obtained from the first six IMFs energy are obtained by applying EMD method. The EMD result of a signal sample in the dataset is shown in Figure 8.     The normalized IMFs energy was analyzed after EMD, and the results are shown in Figure 9, and the energy distributions are mutually different. From Figure 9, we note that there is a relatively large difference between the normal bearing and the faulty bearing signals.

Feature Dimension Reduction
In traditional diagnosis methods, these 22 features value are calculated from vibration signals to construct feature vectors which are directly input into the diagnosis model for rolling bearing fault diagnosis. However, due to the fact the diagnosis model involves too many parameters, directly applying a model on such high-dimensional feature set is very complicated and inefficient. Based on the manifold learning theory, the specific samples in the same state have the same distribution property and geometric structure in the sample space. The samples in different states also have different embedded manifolds. One only needs to select the salient features by a manifold learning algorithm to recover the geometric distribution embedded in the high-dimensional fault features [1].
In order to demonstrate the superiority of the presented S-LLE dimensionality reduction method, when S-LLE is carried out in the process of the training sample labeled into KL clusters, K is set to 4 and L is set to 4. An experiment was conducted on the dataset to evaluate its dimensionality reduction performance on the sample dataset and make a comparison with PCA, LDA, and LLE as the most representative dimensionality reduction approaches. The experimental results of dimensionality reduction with the four approaches are shown in Figure 10, where it can be seen that PCA, LDA and LLE have poor sample classification performance. PCA and LDA obviously have three classes of overlap and LLE obviously has two classes of overlap. Compared with them, S-LLE can obtain a more clear separation of the clustering on the mapping, so S-LLE can identify each fault accurately for all feature samples. This is due to the fact that S-LLE has a greater ability to discovery local neighbor geometry information in the data manifold by utilizing the class label information. Therefore, we can use the S-LLE algorithm to obtain the original multi-domain feature dataset and select the salient features. This added process can capture intrinsic global geometric structure embedded in the high-dimensional fault features and achieve an efficient classification for fault pattern recognition.

Classification Performance Analysis
In this section, three classifiers are adopted to evaluate the performance of the feature reduction method based on S-LLE, which are classification and regression trees (CART), K-nearest-neighbor classifier (K-NN, K = 4), and support vector machine with RBF kernel (RBF-SVM). In this study, PRTools [39] is used to implement K-NN, and CART classifier, and LIBSVM [40] is used to implement the SVM classifier. For the RBF-SVM classifier, there is a parameter C to control the trade-off between the margin and the size of the slack variables, and there is also a parameter  for the RBF kernel function. Hence, we will use the ten-fold cross-validation and the gird search to find the best C These classifiers trained on the reduced feature dataset are compared to that of classifiers trained on the 22 original statistical feature dataset. In this experiment, 50 signal samples per class are selected randomly as a training dataset, thus 200 samples are collected as the training dataset to calculate the fitness function and construct the diagnosis model, and 20, 40, 60, 80, 100 samples per class, respectively, are selected as the test dataset to test the classification accuracy rate. Each experiment is carried out ten times, to give ten classification results. The average classification accuracy using statistical LLE is presented in Table 3. Figure 11 shows the visualized representation of the comparison result of the average classification accuracy between the proposed method and the classical methods.  Further comparison on the classification effect between statistical LLE and supervised LLE [17] methods which all utilize the class label information, the parameter α which controls the amount to which class label information fully used should be incorporated is set to 0.5, using the same calculation procedure as above, the average classification accuracy using supervised LLE is presented in Table 4, and the visualized representation is shown in Figure 12.   As shown in Tables 3 and 4 and Figures 11 and 12, it can be seen that various classifiers perform better on the reduced feature dataset than on the original statistical feature dataset, and the average classification accuracy by various classifiers using the statistical LLE method is higher than that using supervised LLE. From Table 3, among these classifiers, the CART classifier works the worst on the original statistical feature set (the average classification accuracy is in the range of 77.76%-89.24%), whereas the reduced features followed by the CART classifier perform relatively well (the average classification accuracy is in the range of 90.54%-93.56%). Among the three classifiers, the optimized RBF-kernel SVM has a higher diagnostic accuracy than that of CART, and K-NN in roller bearing fault diagnosis, and the accuracies on the reduced feature dataset are in the range of 94.07%-97.26%, and on the original feature dataset they range from 82.32% to 92.79%. Therefore, a reasonable feature dimension reduction method is a necessary step prior to final classification on account of the fact the original feature dataset contains too much fault unrelated or redundant information. Overall, from the above experimental results, it can be seen that the parameter-optimized SVM model has high diagnostic accuracy; these experimental results also indicate that the proposed diagnosis model is obviously superior to the traditional diagnosis methods with reduced feature datasets. This demonstrates the effectiveness of S-LLE for feature dimensionality reduction of the given input space, and also confirmed the obviously improveed performance of the classifier.

Conclusions
In this paper, aimed at addressing the difficulty of dealing with high-dimensional nonlinear fault data, we propose a new fault reduction and classification approach based on statistical locally linear embedding (S-LLE) for rolling bearing fault diagnosis. Because of the prominent features of considering both the local geometry information and the class information of the data, the proposed approach efficiently deals with the supervised learning problem. For dealing with the test fault samples, the approach applies S-LLE to find the projection that best approximates the implicit mapping from high-dimensional feature samples dataset to their embedding. The experimental result show that S-LLE outperforms the other traditional dimensionality reduction methods such as PCA, LDA and LLE. Finally fault classification is carried out in the embedded space. Some experiments show the RBF-SVM classifier has the best fault classification performance through the use of the feature reduction methods based on S-LLE. The experimental results indicate that the proposed approach obviously improves the fault classification performance, and can be an effective and efficient tool for rolling bearing fault diagnosis. Therefore, we can safely make use of S-LLE in order to extract the most effective and salient features for classification in practical applications.