Spectral Regression Based Fault Feature Extraction for Bearing Accelerometer Sensor Signals

Bearings are not only the most important element but also a common source of failures in rotary machinery. Bearing fault prognosis technology has been receiving more and more attention recently, in particular because it plays an increasingly important role in avoiding the occurrence of accidents. Therein, fault feature extraction (FFE) of bearing accelerometer sensor signals is essential to highlight representative features of bearing conditions for machinery fault diagnosis and prognosis. This paper proposes a spectral regression (SR)-based approach for fault feature extraction from original features including time, frequency and time-frequency domain features of bearing accelerometer sensor signals. SR is a novel regression framework for efficient regularized subspace learning and feature extraction technology, and it uses the least squares method to obtain the best projection direction, rather than computing the density matrix of features, so it also has the advantage in dimensionality reduction. The effectiveness of the SR-based method is validated experimentally by applying the acquired vibration signals data to bearings. The experimental results indicate that SR can reduce the computation cost and preserve more structure information about different bearing faults and severities, and it is demonstrated that the proposed feature extraction scheme has an advantage over other similar approaches.


Introduction
Bearings are one of the most important components in rotating machinery [1]. Many of the faults of rotating machinery relate to the bearings, whose running conditions directly affect the precision, reliability and life of the machine [2]. Breakdowns caused by bearing performance degradation and inappropriate operation can not only lead to huge economic losses for enterprises, but also potentially serious casualties [3]. In recent years, therefore, bearings fault prognosis technology has received more and more attention, in particularly fault feature extraction (FFE) of bearing accelerometer sensor signals has become more and more important in order to avoid the occurrence of accidents.
Bearing accelerometer sensor signal analysis-based techniques, which are the most suitable and effective ones for bearing, have been extensively used since in machine prognosis it is easy to obtain sensor signals containing abundant information. These techniques mainly include three categories, namely, time domain analysis, frequency domain analysis and time-frequency domain analysis. Time domain analysis calculates characteristic features of signals statistics such as root mean squares (RMS), kurtosis value, skewness value, peak-peak value, crest factor, impulse factor, margin factor, etc. Frequency domain analysis search for a train of ringing occurring at any of the characteristic defect frequencies, it is widely applied in fast Fourier transform (FFT), spectrum analysis, envelop analysis, etc. Time-frequency domain analysis investigates given signals in both the time and frequency domain, which is successfully developed for non-stationary signals, and some different technologies such as short-time Fourier transform (STFT), wavelet transform (WT), wavelet packet transform (WPT), Hilbert-Huang transform (HHT), etc. are described in the literature [3][4][5][6]. Among them, energy features of reconstructed vibration signals are commonly calculated for the purpose of signal analysis, for example, the wavelet energy can represent the characteristics of vibration signals. Consequently, a lot of original features can be generated from accelerometer sensor signals, therefore it is a necessity to deal with large scale feature dimensions. The biggest challenge is how to extract the most useful information that can reflect comprehensive performance degradation. Previous research has shown that different features are sensitive to different faults and degradation stages, for example, kurtosis value and crest factor are sensitive to impulse faults, especially in the incipient stage, but they will decrease to normal-like levels as the damage grows, which shows that the stability of these features is not satisfactory [7].
Feature extraction means transforming the existing features into a lower dimensional space which is useful for feature reduction to avoid the redundancy due to high-dimensional data [8]. Principal component analysis (PCA) might be one of feature extraction techniques which is often used for bearing fault detection or classification, PCA has the ability to discriminate directions with the largest variance in a data set, and extract several representative features by using data projection. Factor analysis (FA) is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors, FA has been demonstrated to be able to extract important knowledge from sensor data based on the inter-correlation of sensor data [9]. Locality preserving projections (LPP) is a linear projective map that arises by solving a variational problem that optimally preserves the intrinsic geometry structure of the dataset in a low-dimensional space. Liao and Lee in [10] used PCA to find the first two principal components (wavelet packet node energy) which contain more than 90 percent of the variation information.
Widodo and Yang in [11] employed PCA to obtain one dimensional features of condition monitor histories from which the survival probability of the historical event data. Côme and Oukhellou in [12] applied the independent factor analysis for intelligent fault diagnosis of railway track circuits, and the diagnosis system aimed to recover the latent variables linked to track circuit defects using features extracted, significantly improving estimation accuracy and removing indeterminacy. Yu in [13] used LPP to extract the most representative features for representing the bearing performance, indicating that LPP could find more meaningful low-dimensional information hidden in the high dimensional feature set compared with PCA. PCA, FA and LPP play a manifest role in feature extraction, however, they have their limitations and don't contain a full exploitation of the multivariate nature of the data [14].
Spectral methods have recently emerged as a powerful tool for dimensionality reduction and manifold learning [15], these methods use information contained in the eigenvectors of a data affinity matrix to reveal low dimensional structure in high dimensional data. Spectral regression (SR) is a novel regression framework for efficient regularized subspace learning and feature extraction technology [16]. Different from other similar methods, SR combines the spectral graph analysis and regression to provide an efficient and effective approach for regularized subspace learning problem. It is shown that SR casts the problem of learning an embedding function into a regression framework, which avoids eigen-decomposition of dense matrices. Due to its superior properties, for example, the lower computation cost and the more structured information, it can be used in all unsupervised, semi-supervised or supervised problems. SR has been adopted for various applications such as location of the sensor nodes [17], human action recognition [18], facial image retrieval [19], EEG signals [20], etc. To the best of our knowledge, no research results have been published to data on the use of SR for bearing fault feature extraction and machine prognosis, therefore, this paper will be the first time SR was applied to feature extraction of bearing faults.
The rest of the paper is organized as follows: Section 2 proposes the signal processing (including feature calculation) from accelerometer sensors according to the time domain, frequency domain and time-frequency domain. In Section 3, the graph embedding view and SR-based feature extraction approach are introduced. Section 4 gives a description of the experiments and analysis, bearing accelerometer sensor signals from bearings are employed to evaluate the effectiveness of the proposed method. Finally, concluding remarks and future work on this approach are given in Section 5.

Signal Processing from Accelerometer Sensor
To diagnose the abnormality, it is important to record certain physical parameters which vary according to the variation in the operation of the machine [21]. Vibration signals are extensively used in signature matching for abnormality detection and diagnosis. Generally, these signals are generated by accelerometer sensors on bearings [22]. The essential aim of signal processing is to map a signal from the time domain into another space in which some important information of the signals can be revealed, and consequently, some dominant features of the signals can be extracted [23]. For this purpose, various original features that can be extracted from accelerometer sensor signals of bearings have been investigated. This section presents a brief discussion of feature generation from time-domain, frequency-domain, and time-frequency domain as they will be used throughout the paper.
Time domain features often involve statistical features that are sensitive to impulse faults [13], especially in the incipient stage, so we calculated some dimensional features, such as RMS, square root of the amplitude (SRA), kurtosis value (KV), skewness value (SV) and peak-peak value (PPV), in addition, some dimensionless features, such as crest factor (CF), impulse factor (IF), margin factor (MF), shape factor (SF) and kurtosis factor (KF). These features are defined as follows: Frequency domain analysis is another description of a signal, that can reveal some information that cannot be found in the time domain [24]. Frequency domain features are calculated on the basis of FFT from time domain vibration signals, these features often involve statistical results of frequency, such as frequency center (FC), RMS frequency (RMSF) and root variance frequency (RVF), etc. These features are defined as follows: Time-frequency domain methods are considered to be best way for analyzing non-stationary signals [25], due to the deficiency of the Fourier transform. Many time-frequency analysis technologies have been developed, including STFT, WT (or WPT), HHT, etc. In this study, we adopt WPT to present bearing accelerometer sensor signals in time-frequency distribution diagrams with multi-resolution. As we know, wavelet packet analysis (WPA) is an extension of WT which provides complete level-by-level decomposition. As shown in Figure 1, wavelet packets are particular linear combinations wavelets. The wavelet packets inherit properties such as orthogonality, smoothness and time-frequency localization from their corresponding wavelet functions. Let Ψ be a wavelet packet function with three integer indices i, j and k which are the modulation or oscillation parameter, the scale parameter, and the translation parameter, respectively: The wavelet packet coefficients of a signal s can be computed by taking the inner product of the signal and the wavelet packet function: The wavelet packet node energy WPNE(j,k) can represent the characteristics of vibration signals, and it is defined as:  In this application, we use a specific wavelet function -DB4‖ from the Daubechies (DB) wavelets family as the mother wavelet and decompose the vibration signals into four levels. In general, the biggest challenge in wavelet analysis is the selection and determination of the mother wavelet function as well as the decomposition level of signals for the different real-world applications [21]. Different mother wavelet functions and corresponding orders have different effects on the feature extraction. Rafiee etc. in [26] presented a novel solution to find the best mother wavelet function for fault classification purposes as well as the best level of decomposing the vibration signals by wavelet analysis in machine condition monitoring; the experimental results demonstrated that a DB4 orthogonal wavelet discloses abnormal transients generated by the bearing damage from the vibration signals more effectively than other wavelets in the range of DB2 and DB20, and the optimized value of the decomposition level is 4. In addition, a large number of previous studies have demonstrated that DB4 has been widely implemented as it matches the transient components in vibration signals and showed effectiveness in defect detection and fault diagnosis of bearings, because it has the advantages of orthogonality and computational simplicity [27]. Subsequently, we calculate wavelet packet node energy in fourth level as the input features of bearing time-frequency domain:

SR-Based Feature Extraction Approach
In this section, after the graph embedding and SR method are presented, SR-based fault feature extraction approach is proposed to extract useful information from the calculated original features of vibration signals.

Graph Embedding
The SR is fundamentally based on regression theory and spectral graph analysis, so it can be incorporated into other algorithms easily [28]. It can be used in all unsupervised, semi-supervised or supervised problems and integrated with different other suggested regularizers to make it more flexible [29]. In concrete applications, an affinity graph will be constructed first via the labeled and unlabeled samples, in order to reveal the intrinsic structured information and to learn the responses with the given data. Subsequently, with these obtained responses, the ordinary regression is applied for learning the embedding function.
The SR aims at finding a low-dimensional subspace Z = [z 1 , z 2 , …, z m ] (z i ∈R d ), when given where m is the sample number, say x i can be represented with z i . Let x = [x 1 , x 2 , …, x m ] T be high-dimensional space and y = [y 1 , y 2 , …, y m ] T be the mapped low-dimensional space, a reasonable criterion for choosing a map is to minimize: where the matrix W ij with m × m entries contains the weight of the edge, these edges join points x i and x j in a nearest-neighbor graph G with m points. The objective function will be heavily penalized if neighboring points x i and x j are mapped far apart. Therefore, the purpose of minimizing is to ensure that if x i and x j are -close‖ then y i and y j are close as well. Following some algebraic steps, we have: where D is a diagonal matrix, which contains column sums of W, D ii = ∑ j W ji , and L = D − W is the graph Laplacian matrix. And then, the minimization problem in Equation (18) In order to remove the arbitrary scaling factor in the embedding, a constraint y T Dy = 1 will be imposed. Obviously, it is because of L = D − W, Equation (20) is also equivalent to the maximization problem: * y y=1 yy y = arg max y y = argmax yy The optimal y's in Equation (21) can be obtained by solving the generalized eigenvalue problem: For simply mapping for training samples and new testing samples, we choose a linear function here: where A is a n×d matrix, x i is mapped to y i . Substituting Equation (23) into Equation (21), we have: The optimal A's in Equation (24) can be also obtained by solving the generalized eigenvalue problem: This maximum eigen-problem formulation in some cases can provide a more numerically stable solution. In the remainder of this paper, we will develop the SR algorithm based on Equation (25).

Spectral Regression Algorithm
The SR has been used in various applications where it has demonstrated efficacy compared to PCA, FA, and some common manifold techniques in both feature quality and calculation efficiency [15]. Meanwhile, the SR algorithm uses the least squares method to get the best projection direction, rather than computing the density matrix of features, so it also has an advantage in speed. An affinity graph G of both labeled and unlabeled points is constructed to find the intrinsic discriminant structure and to learn the responses with the given data. Then, with these responses, the ordinary regression is applied for learning the embedding function [30].
Given a training set with l labeled samples x 1 , x 2 , …, x l and a testing set with (m − l) unlabeled samples x l+1 , x l+2 , …, x m , where the sample x i ∈R n belongs to one of c classes, and let l k be the number of labeled samples in the k-th class (the sum of l k is equal to l). The SR is summarized as follows: Step1: Constructing the adjacency graph G: Let X be the training set and G denote a graph with m nodes, where the i-th node corresponds to the sample x i . In order to model the local structure as well as the label information, then the graph G will be constructed through the following three steps: (1) If x i is among p nearest neighbors of x j or x j is among p nearest neighbors of x i , then nodes i and j are connected by an edge; (2) If x i and x j are in the same class (i.e., same label), then nodes i and j are also connected by an edge; (3) Otherwise, if x i and x j are not in the same class, then the edge will be deleted between nodes i and j.
Step2: Constructing the weight matrix W: Let W be the sparse symmetric m×m matrix, where W ij having the weight of the edge joining vertices i and j.
(1) If there is no any edge between nodes i and j, then W ij = 0; (2) Otherwise, if both x i and x j belong to the k-th class, where δ ( 0 < δ ≤ 1) is a given parameter to adjust the weight between supervised and unsupervised neighbor information. Therein, s(i, j) is a similarity evaluation function between x i and x j , there are three variations, the first one is Simple-minded function s(i, j) = 1, the second one is Heat kernel function: where  ∈R, the third one is Cosine weight: Step3: Eigen-decomposing: Let D be the m × m diagonal matrix, whose the (i, i)-th element is the sum of the i-th column (or row) of W. Find y 0 , y 1 , …, y c−1 , which are the largest c generalized eigenvectors of eigen-problem: where the first eigenvector y 0 is a vector of all ones with eigenvalue 1.
Step4: Regularized least squares: Calculate c-1vectors a 1 ,…, a c−1 with a k ∈R n (k = 1,…, c−1), therein a k is then a solution of regularized least square problem: where k i y is the i-th element of y k . In order to obtain k a , the following linear equations system can be used to solve through the classic Gaussian elimination method.
where I is a n × n identity matrix.
Step5: SR Embedding: Let A be an n × (c − 1) obtained transformation matrix through the previously mentioned processes, where A = [a 1 , …, a c−1 ]. The testing samples or new sample can be embedded into c − 1 dimensional subspace by:

SR-Based Fault Feature Extraction
Feature extraction, which is a mapping process from the measured signal space to the feature space, can be regarded as the most important step for intelligent fault diagnosis systems [14]. The effective feature extraction is important for the pattern recognition of bearing faults [31]. In this work, we propose an SR-based fault feature extraction scheme for bearing accelerometer sensor signals, The flow chart of the proposed scheme is shown in Figure 2, which includes three parts: i.e., signal processing (or named as feature calculation), feature extraction and method evaluation.  Table 1), which have been enough to represent the bearing performance states and fault severity. As we know, because it is difficult to estimate which features are more sensitive to defect development and propagation in a machine system, as various factors affect the effectiveness of the features. In this case, we believe that it is more helpful to generate more and more various features. Secondly, we extract the most representative features from 29 initial features via the SR-based method. Obviously, very large initial features' dimension will result in decreasing performance of bearing prognosis and therefore also increasing computational costs. How to extract the really effective information of bearing fault is a challenging problem. In this paper, if we choose the first d Based on the new projection data set z using SR-based method, the high-dimensional data space is reduced to a low-dimensional data space, however, retaining the majority of local variation information in the projected data set. With the reduced dimensions and local variance information preservation, the extracted features z will be used as the new input features of pattern recognizers for bearing faults.
Finally, we validate the SR-based method using K-means in the case of original features and extracted features. In this paper, we compare the SR-based method with PCA-based, FA-based ones, etc., and the experiment results show that SR-based method is the best for extracting the useful information to represent bearing performance conditions from the available original features as you can calculate from the vibration signals. Moreover, the validated result confirms that the features extracted by SR ensure effective fault recognition at higher accuracy than the 29 original features.

Data Acquisition
Data acquisition is a process of collecting and storing useful data from targeted physical assets for the purpose of Condition-based Maintenance (CBM). This process is an essential step in implementing a CBM program for machinery fault diagnosis and prognosis. To evaluate the effectiveness of the signal processing and feature extraction methods for bearings, the vibration data related to the bearing and the system investigation in this paper were provided by the Bearing Data Center of the Case Western Reserve University (CWRU), and acquired by bearing accelerometer sensors under different operating loads and bearing conditions [32]. The bearing data of CWRU has been validated in many research works and become a standard dataset for bearing studies [2,13,14,21].
The test-rig shown in Figure 3 consists of a 2 HP motor (left), a torque transducer/encoder (center), a dynamometer (right), and control electronics (not shown). The test bearing type is a 6205-2RS JEM SKF, which is a deep groove ball bearing, the dynamometer is controlled so that desired torque load levels can be achieved. Accelerometer sensors were placed at the 12 o'clock position at the drive end of the motor housing. The experimental rotating frequency is about 30 Hz, the test bearings support the motor shaft and the load was 2 HP at the speed of 1,797 rpm, single point faults were introduced to the inner race, ball and outer race of the test bearings using electro-discharge machining (EDM) with fault diameters of 0.007, 0.014, 0.021 and 0.028 inches, and the fault depth is 0.011 inches. More detailed information about the test-rig can be found in [32]. The vibration signals were collected through accelerometers using a 16 channel digital audio tape (DAT) recorder at the sampling frequency 12 kHz. In order to evaluate the performance of the SR-based feature extraction approach proposed in this paper, we separate the experimental vibration data into four datasets, named as D_IRF, D_ORF, D_BF and D_MIX. Specifically, similar to the ORF and BF datasets, the IRF dataset includes five severity conditions, i.e., normal, and four types of fault bearings with faulty diameter: 0.007 (IRF07), 0.014 (IRF14), 0.021 (IRF21) and 0.028 (IRF28) inches in the inner race of the bearings, respectively. The D_MIX dataset, however, contains four different states which are normal, and three types of faults, i.e., inner race fault (IRF), ball fault (BF) and outer race fault (ORF) all with a fault diameter of 0.014 inches. The length of the signal data in every dataset is 1,024, that is, every example data includes 1,024 points. We extracted 100 examples for each severity condition, and thus the D_MIX and D_ORF dataset consists of 400 examples, simultaneously, the D_IRF and D_BF datasets contain 500 examples, respectively. The detailed description with respect to the experimental datasets is presented in Table 2, where -07‖, -14‖, -21‖ and -28‖ mean that fault diameter is 0.007, 0.014, 0.021 and 0.028 inches. For verifying the proposed scheme in this study, the overall datasets are split into two portions, i.e., training datasets (50%) and test datasets (50%).  Figure 4 presents the vibration signal waveforms from four signal samples of the different fault types in the D_MIX dataset, note that there is a manifest difference in the overall vibration magnitude for the new health bearing when compared with other three types of fault bearings. Nevertheless, we still need to process the signal (calculate signal features) due to very high dimensions of the original vibration signals.  (c) (d)

Signal Processing
For the obtained vibration signal data, we calculate original features following the time domain, frequency domain and time-frequency domain for the next feature extraction. Time domain features could be calculated directly from vibration signals using Equations (1)- (10). For validating the employed time domain features in this work, Table 3 lists the average value of the statistical time domain features in the D_MIX dataset. It can be seen from Figure 5, there are some differences in the various fault types of bearings in the D_MIX dataset, but some existed differences is still not easy to be distinguished, especially in the ball fault bearings.  (c) (d) As mentioned earlier, furthermore, some statistical features of time domain are sensitive to inchoate faults, for instance, RMS and kurtosis values should be able to capture the mutual difference in the time domain signal for the fault and healthy bearings. Figure 6 shows this character of four statistical features of time domain in the D_IRF dataset, we note that the feature of RMS can recognize differences in four bearing conditions, however, kurtosis values, peak-peak value and impulse factor only can capture better infancy fault, where it shows poor ability to identify much more severe faults. (c) (d) The advantage of frequency domain analysis over time domain analysis is its ability to easily identify and isolate certain frequency components of interest. The most widely used conventional analysis is the spectrum analysis by means of fast Fourier transform (FFT), which is a well-established method because of its simplicity. Figure 7 shows the spectrum based on FFT for a normal sample and three different fault samples in the D_MIX dataset, and Figure 8 displays the corresponding spectrum for a normal bearing and three outer race fault bearings with faulty diameter: 0.007 (ORF07), 0.014 (ORF14) and 0.021 (ORF21) inches in the D_ORF dataset, respectively.
The Fourier spectrum analysis provides a general method for examining the global energy-frequency distribution. The main idea of spectrum analysis is to either look at the whole spectrum or look closely at certain frequency components of interest and thus extract features from the obtained vibration signal data. On this basis, we calculate frequency domain features, such as frequency center, RMS frequency and root variance frequency using Equations (11)- (13). However, the features from the FFT analysis results tend to average out transient vibrations and don't provide a wholesome measure of bearing health states. Therefore, one manifest limitation of frequency domain analysis is its inability to handle non-stationary waveform signals, which are very common when machinery faults occur [33].
Time-frequency analysis, which investigates waveform signals in both time and frequency domain, has been developed for non-stationary waveform signals. Traditional time-frequency analysis uses time-frequency distributions, which represent the energy or power of waveform signals in two-dimensional functions of both time and frequency to better reveal fault patterns for more accurate diagnosis. In this study, we decompose vibration signals obtained from the test-rig into four scales using WPT with mother wavelet ‗DB4', Figure 9 displays the original and decomposed signals from a normal bearing sample and a ball fault bearing sample in the D_BF dataset, therein, we list only eight decomposed signals for the purpose of simplifying indication. From Figure 9, we note that there is a relatively large difference between the normal bearing and the ball fault bearing, especially at the high frequency of the decomposed signals. For the purpose of comparison, we calculate the average value of the wavelet packet nodes energy from decomposed signals of the normal bearings and the ball fault bearings with faulty diameter: 0.007 (BF07), 0.014 (BF14), 0.021 (BF21) and 0.028 (BF28) inches in the D_BF dataset using Equation (16), respectively. The normalized wavelet packet energy was analyzed from the corresponding sixteen decomposed signal nodes, the results are shown in Figure 10, its distribution of energy are different mutually.

Feature Extraction
In the technique presented in this paper, the total 29 features were calculated from 10 time domain features, three frequency domain features and 16 time-frequency domain features. In general, it is difficult to estimate which features are more sensitive to fault development and propagation in a machine system, furthermore, the effectiveness of these original features could change under different working conditions. In addition, this amount of original features is too many, thus it could be a burden and decrease the performance of the classifier or recognizer. Therefore, feature extraction and dimension reduction using some related technique are proposed in this study, so that more salient and low dimensional features are extracted for performing bearings fault diagnosis or prognosis.
At first, we take two experiments, each select randomly three features from the total 29 features in the D_MIX dataset, which are illustrated in Figure 11(a,b), respectively. Similarly, we also select randomly three features in the D_IRF dataset, the first and the second selected features are presented in Figure 12(a,b), respectively. It is shown that these features cannot separate well among the conditions of bearing fault because of high-dimensional data tends to redundancy, therefore, we cannot input them into the classifier directly.
In order to validate the performance of SR-based method for feature extraction, SR is originally implemented in the D_MIX dataset, the first d eigenvectors corresponding to the large d eigenvalues are selected to implement data projection using Equation (32). (a) (b) Figure 13(a) shows the data projection result with the first two eigenvectors corresponding to the large two eigenvalues, where the first two projected column vectors are plotted. For the purpose of the comparison, the projected results using PCA, FA and LPP are also illustrated in Figure 13(b-d), respectively. In addition, Figure 14 presents corresponding comparison of the data projection result with the first three eigenvectors.
We generally keep the first several eigenvectors corresponding to the large eigenvalues which can keep most variance information of the given data. However, high input data dimensions could decrease the recognition performance of the classifiers and result in more training time cost. Thus, the selection of the number of the eigenvectors should be based on the requirement of the real-world applications [1]. In this study, we select the first three eigenvectors of data projection result for inputting the classifiers, and we also display the first two eigenvectors of data projection result for visualization well. As shown in Figures 13 and 14, it is obvious that the data projection result with the first two or three eigenvectors using SR outperforms other methods in the D_MIX dataset. Similarly, we also perform these four feature extraction algorithms in the D_IRF dataset, the extracted first two and three features are compared in Figures 15 and 16, respectively. Severity recognition references to the identification of the differentiation of defective states of the bearings, e.g., normal, IRF07, IRF14, IRF21, IRF28 in the D_IRF dataset. From the corresponding compared results, we can observe that SR has better projection performance over other three methods, as it can obtain a more clear separation of the clustering on the map for the corresponding severity recognition. This is due to the fact that SR is capable of discovering local structured information of the data manifold. However, PCA aims to discover the global structure of the Euclidean space. For the D_IRF dataset, each of fault severity classes is a local structure, SR preserves the intrinsic geometry structure of the dataset in a low-dimensional space. This illustrates that the local information could be more meaningful than the global information from given dataset in some industrial situations. In addition, LPP shows better performance than PCA and FA, since LPP is also graph embedding method based on the local structure of the manifold. This result indicates that features extracted via spectral graph embedding analysis could be more effective than which extracted via global structure by PCA and FA, which illustrates that SR-based feature extraction is very effective to extract most sensitive features for fault classification and severity recognition tasks. As we know, the clearer the separation, the more robust a classifier is. Consequently, the extracted features by SR are able to improve the performance of the classifiers more effectively, which further proves that SR is capable of extracting the most effective features from original features without too much calculation cost.

Method Evaluation
In this study, K-Means is adopted to evaluate the performance of SR, PCA, FA and LPP. The first three extracted features corresponding to the largest eigenvalues are employed as the input features of K-Means. K-Means was implemented to recognize the clusters of the different bearing fault types, the acquired training dataset and testing dataset are used for modeling K-Means and checking misclassification, respectively. For given dataset, the accuracy rates are presented in Table 4, the classification results based on the original 29 features (OF29) and the first three features extracted by SR, PCA, FA and LPP are also presented in Table 4 and Figure 17. It can be observed that PCA and FA don't improve the recognition performance of K-Means in comparison with using OF29, both LPP and SR improve the accuracy rate, respectively. The results of this experiment are consistent with the actual situation of the CWRU dataset, since the data quality of the artificially introduced faults on bearings is very good, thus the features in different fault conditions are pretty separable. The K-Means recognized relatively accurately all of the different severity classes through the use of the methods based on PCA, LPP and SR. In addition, it is shown that the SR all gives more satisfied results as compared to others in four datasets, this further demonstrates the effectiveness of SR for feature extraction or dimensionality reduction of the given input space, and also be confirmed to improve the performance of the classifier obviously. Therefore, we can safely make use of SR in order to extract the most effective features among the practical applications.  Figure 17. The comparison of the accuracy rate in four datasets.
In order to further evaluate the proposed SR-based method, we adopt other experimental data, in particular, the bearing fault data acquired from an accelerated bearing life tester (ABLT-1) at the Hangzhou Bearing Test and Research Center in China (detailed information is described in [1]). The differentiation of fault states of bearings include three classes: normal, slightly degradation, and severe degradation (failure). The fault conditions can be estimated by the magnitude of the representative features, which are produced by the effective feature extraction methods. For this case, we collect the data from the whole life of the bearing to implement fault classification, and randomly select 100 samples from each fault states, and thus 300 samples are collected for the test bearing, 50% of samples are used as the training set to construct K-means model, while the remaining 50% of samples are used as the testing set to test the classification accuracy rate of K-means using the first three extracted features corresponding to the largest eigenvalues.
In this case, we not only compare with PCA, FA and LPP, but also compare with some other Graph Embedding based approaches, such as Laplacian Eigenmap (LE) and Linear Discriminant Analysis (LDA). The experimental results of K-means is shown in Table 5, the accuracy rate of classification by K-Means using the features extracted by SR is significantly better than that of K-Means using the features extracted by other methods, SR shows a similar performance with the supervised-based LDA. In addition, we can observe that there are some differences in the computational time of feature extraction consumed among the several methods. In Table 5, the computational time of LDA method is the highest, although it seeks the projective functions which are very perfect in the training set and testing set, so it is computationally expensive. The PCA method fails to show more improvement in the computational time; this is probably due to the fact that PCA does not encode discriminating information. The SR method achieves significantly better performance than other methods, which suggests that SR only needs to solve c-1 regularized least squares problems which are very efficient, This nice property makes it possible to apply SR to high dimensional large datasets in real-world applications. We also note that the classification accuracy rate of LPP-based and LE-based methods are also relatively higher, this is mostly due to the fact that the structured information in the experimental data is very important for feature extraction. Fortunately, SR is similar to them, and it is capable to discover local structured information in the data manifold. Specifically, this important property may enable SR to find more meaningful low-dimensional information hidden in the highdimensional features compared with PCA and FA methods. Overall, this case also further demonstrates that the SR-based feature extraction method is very effective to improve the performance of classifiers. It is noted that we tested the performance of the SR processing using the whole training and testing data for feature extraction in this experiment, which is not related to new test samples. In fact, handling data out samples (i.e., new inputs) problems presents a big challenge in the area of feature extraction. Due to space limitation, this problem is not discussed detail in the paper. In the real-world application, we should firstly transfer the training data into the project space under the weight matrix W, then using the same weight matrix W to treat the new testing data.

Conclusions
This paper has proposed a novel fault feature extraction scheme by adopting SR for bearing accelerometer sensor signals, and is the first time SR was applied to feature extraction of bearing faults. SR combines the spectral graph analysis and regression to provide an efficient and effective approach for regularized subspace learning problems, so that it can extract the most representative features from original calculated features. We adopt K-Means to evaluate the performance of the proposed feature extraction approaches, and the experimental results on obtained bearing vibration signal data have revealed that SR yields higher classification rates than other similar approaches, such as PCA, FA and LPP etc. According to this result, we conclude that the SR-based feature extraction scheme has great potential to be an effective and efficient tool for bearing fault diagnosis and prognosis, and the application of the SR-based method can serve as a promising alternative for intelligent maintenance systems in the future.