Fault Diagnosis of Rolling Bearing Based on Shift Invariant Sparse Feature and Optimized Support Vector Machine

Abstract: The vibration signal of a faulty rotating machine is a periodic impact signal, so the fault characteristics recur periodically. The shift invariant K-SVD algorithm can capture such recurring patterns effectively and is thus suitable for fault feature extraction of rotating machinery. With the over-complete dictionary learned from training samples of the different classes, the shift invariant sparse feature of the training and test samples can be formed from the sparse codes and employed as the input of a classifier. A support vector machine (SVM) with optimized parameters has been extensively used in intelligent diagnosis of machinery faults. Hence, in this study, a novel fault diagnosis method for rolling bearings using the shift invariant sparse feature and an optimized SVM is proposed. Firstly, dictionary learning by the shift invariant K-SVD algorithm is conducted. Then, the shift invariant sparse feature is constructed with the learned over-complete dictionary. Finally, the optimized SVM is employed to classify the shift invariant sparse features corresponding to different classes, and bearing fault diagnosis is thereby achieved. To optimize the SVM, three methods are carried out: grid search, genetic algorithm (GA), and particle swarm optimization (PSO). The experimental results show that the shift invariant sparse feature obtained with shift invariant K-SVD can effectively distinguish the bearing vibration signals corresponding to different running states. Moreover, the optimized SVM can significantly improve the diagnosis precision.


Introduction
Sparse representation has been widely employed in image, video, and speech signal processing [1][2][3]. In recent years, a growing number of researchers have applied sparse representation to machinery fault diagnosis and performance degradation assessment [4]. Yu et al. proposed a novel classification method based on group sparse representation for bearing and gear fault diagnosis [5]. Peng et al. applied sparse representation to extract bearing fault features [6]. Fan et al. put forward a transient feature extraction method using sparse representation in a wavelet basis [7].
For sparse representation, an over-complete dictionary can be constructed from a predefined dictionary, but this requires prior knowledge of the signals and is therefore often infeasible in engineering practice. The dictionary can also be formed by randomly choosing samples from the training set if the number of training samples is large enough. However, this approach does not work well when the dataset is very large, so more effective dictionary learning algorithms are needed, e.g., K-means singular value decomposition (K-SVD) [8] and the method of optimal directions (MOD) [9]. The K-SVD algorithm was first proposed for image processing and has been extensively applied in that field [10][11][12]. K-SVD has also been employed in machinery fault diagnosis. Zhu put forward a cutting force denoising method for micro-milling condition monitoring [...] including kernel function parameter and penalization factor; hence it is essential to optimize these parameters by means of intelligent evolutionary algorithms, e.g., particle swarm optimization (PSO) [35] and genetic algorithm (GA) [36,37]. For machinery fault diagnosis, SVM with optimized parameters has been widely used. Lu et al. applied an adaptive feature extraction method and an SVM optimized by PSO to drivetrain gearbox fault diagnosis [29]. Wang et al. utilized an SVM optimized by GA to realize bearing fault diagnosis [30]. In this paper, three methods (grid search, GA, and PSO) are used to optimize the SVM.
In this study, a novel method using the shift invariant sparse feature and an optimized SVM is put forward to realize bearing fault diagnosis. First, shift invariant K-SVD is adopted to learn an over-complete dictionary from training samples, which are vibration signals of rolling bearings in different running states. Next, the shift invariant sparse feature is constructed from the sparse codes computed with the learned dictionary and used as the input of the SVM. Finally, an SVM optimized by three different methods is used to distinguish the running states of rolling bearings (normal state, inner race fault, outer race fault, and rolling element fault), so that intelligent fault diagnosis of rolling bearings is achieved.
The remainder of the paper is organized as follows. Section 2 introduces the feature extraction method with shift invariant K-SVD. Section 3 describes the pattern recognition method using the optimized SVM. Section 4 presents the proposed bearing fault diagnosis model. Section 5 reports the rolling bearing fault experiment conducted to validate the effectiveness of the proposed method. Finally, Section 6 draws the conclusions.

Feature Extraction Using Shift Invariant K-SVD Algorithm
In periodic impact signals, the same fault mode appears repeatedly at different times, i.e., the signal has shift invariant characteristics. The standard K-SVD dictionary learning method does not take this into account: the learned dictionary contains multiple basis functions that belong to the same fault feature mode and differ only in impact position. The shift invariant K-SVD algorithm (SI-KSVD) [16] solves this problem. Each fault mode, namely each basis function, can appear at any moment, and translated copies of the same basis function represent the periodically recurring signal characteristics. Although the fault characteristics may be submerged in strong noise and interference, their periodic recurrence makes it easier for the shift invariant K-SVD algorithm to converge to these recurring patterns. Therefore, the shift invariant K-SVD algorithm is well suited to extracting features of periodic impact signals, and a sparse feature can be formed on this basis.
In summary, for the feature extraction method using shift invariant K-SVD, there are two stages: dictionary learning and sparse coefficients solving. Firstly, dictionary learning using shift-invariant K-SVD is carried out to obtain a redundant dictionary. Afterwards, sparse codes can be solved, and the discriminative sparse feature is constructed based on the sparse codes.

Shift Invariant K-SVD Algorithm
For a long signal $x \in \mathbb{R}^{p \times 1}$, assume that there are a total of $K$ basis functions $d_k \in \mathbb{R}^{q \times 1}$ ($q \ll p$). Each basis function corresponds to a characteristic pattern, and the over-complete dictionary $D$ is constructed by translating the basis functions $d_k$ in the time domain. The goal of shift invariant K-SVD is to learn the basis functions from the long signal $x$, thereby forming the over-complete dictionary $D$. The objective function is [17]:
$$\min_{\{d_k\},\, s} \left\| x - \sum_{k=1}^{K} \sum_{\tau} s_{k,\tau} T_\tau d_k \right\|_2^2 \quad \text{s.t.} \quad \|s\|_0 \le T \tag{1}$$
where $T_\tau$ is the shift operator that translates the basis function $d_k$ to time $\tau$ and extends it to a dictionary atom with the same length as the original long signal: the basis function $d_k$ starts at time $\tau$ in this atom and all remaining entries are 0. Each basis function $d_k$ of length $q$ can be translated up to time $p - q + 1$, forming $p - q + 1$ dictionary atoms, so the over-complete dictionary $D$ contains $K \times (p - q + 1)$ dictionary atoms. $s_{k,\tau}$ is the sparse coefficient of the dictionary atom obtained by translating $d_k$ to time $\tau$ and extending it, $s$ is the sparse coefficient vector of the long signal $x$, and $T$ is the sparsity prior. Like the K-SVD dictionary learning algorithm, the shift invariant K-SVD algorithm is a two-step iterative algorithm consisting of a sparse coefficient solving stage and a dictionary updating stage. During the sparse coefficient solving stage, all basis functions are fixed. Because the over-complete dictionary contains so many atoms, the calculation is very time-consuming; the fast matching pursuit algorithm [38] can greatly improve the computing efficiency. During the dictionary update stage, the basis functions are updated in sequence. When the basis function $d_k$ is updated, the other basis functions remain fixed, and the sparse coefficients of the dictionary atoms corresponding to $d_k$ are updated as well.
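The sparse coefficient solving stage can be illustrated with a minimal sketch. It uses plain matching pursuit as a simpler stand-in for the fast matching pursuit of [38], assumes unit-norm basis functions, and uses 0-based shifts; the function name `matching_pursuit` and the toy signal are illustrative assumptions, not the authors' implementation.

```python
def matching_pursuit(x, basis, n_nonzero):
    """Plain matching pursuit over all shifts of unit-norm basis functions.

    Returns (coeffs, residual), where coeffs maps (k, tau) to s_{k,tau}.
    """
    p, q = len(x), len(basis[0])
    residual = list(x)
    coeffs = {}
    for _ in range(n_nonzero):
        best = (0.0, None, None)  # (correlation, k, tau)
        for k, d in enumerate(basis):
            for tau in range(p - q + 1):
                # Inner product of the atom T_tau d_k with the residual.
                corr = sum(d[j] * residual[tau + j] for j in range(q))
                if abs(corr) > abs(best[0]):
                    best = (corr, k, tau)
        corr, k, tau = best
        if k is None:  # residual is orthogonal to every atom
            break
        coeffs[(k, tau)] = coeffs.get((k, tau), 0.0) + corr
        # Subtract the selected atom's contribution from the residual.
        for j in range(q):
            residual[tau + j] -= corr * basis[k][j]
    return coeffs, residual


# Toy signal: one copy of basis function 0, scaled by 2, starting at sample 3.
basis = [[0.6, 0.8], [1.0, 0.0]]
x = [0.0, 0.0, 0.0, 1.2, 1.6, 0.0, 0.0, 0.0]
coeffs, residual = matching_pursuit(x, basis, n_nonzero=1)
```

The exhaustive search over every basis function and shift is what makes this stage expensive for long signals, which is why a faster pursuit variant is preferred in practice.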
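In the dictionary update stage, for a given basis function (feature pattern) $d_\kappa$, let $\sigma_\kappa = \{\tau \mid s_{\kappa,\tau} \neq 0\}$ be the set of its active shifts, i.e., the non-zero coefficients obtained in the preceding sparse decomposition stage. Define the signal $\tilde{x}_\kappa$ from which the contributions of all other basis functions $d_k$ ($k \neq \kappa$) have been removed:
$$\tilde{x}_\kappa = x - \sum_{k \neq \kappa} \sum_{\tau} s_{k,\tau} T_\tau d_k = \sum_{\tau \in \sigma_\kappa} s_{\kappa,\tau} T_\tau d_\kappa + r \tag{2}$$
where $r = x - \sum_{k} \sum_{\tau} s_{k,\tau} T_\tau d_k$ is the residual signal. From Equation (1), the optimal basis function can be updated by the following equation:
$$d_\kappa^{\mathrm{opt}} = \arg\min_{\|d\|_2 = 1} \left\| \tilde{x}_\kappa - \sum_{\tau \in \sigma_\kappa} s_{\kappa,\tau} T_\tau d \right\|_2^2 \tag{3}$$
The above equation can be expressed as:
$$d_\kappa^{\mathrm{opt}} = \arg\min_{\|d\|_2 = 1} \sum_{\tau \in \sigma_\kappa} \left\| T_\tau^{*} \tilde{x}_\kappa - s_{\kappa,\tau} d \right\|_2^2 \tag{4}$$
where $T_\tau^{*}$ is the adjoint operator of $T_\tau$, which extracts from the long signal a segment of the same length $q$ as the basis function $d_\kappa$, starting at time $\tau$.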
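The two operators can be made concrete with a short sketch. The helper names `shift`, `extract`, and `dot` are hypothetical, and shifts are 0-based here, so the valid offsets are 0 to p - q; the last lines check numerically that the extraction operator is the adjoint of the shift operator.

```python
def shift(d, tau, p):
    """T_tau: place basis function d at offset tau inside a zero signal of length p."""
    q = len(d)
    if not 0 <= tau <= p - q:
        raise ValueError("shift must keep the basis function inside the signal")
    return [0.0] * tau + list(d) + [0.0] * (p - q - tau)


def extract(x, tau, q):
    """T*_tau: cut the length-q segment of x that starts at offset tau."""
    return list(x[tau:tau + q])


def dot(a, b):
    """Inner product of two equally long sequences."""
    return sum(u * v for u, v in zip(a, b))


d = [1.0, -2.0, 1.0]
x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
atom = shift(d, tau=2, p=len(x))

# Adjoint property: <T_tau d, x> equals <d, T*_tau x>.
lhs = dot(atom, x)
rhs = dot(d, extract(x, 2, len(d)))

# Each basis function yields p - q + 1 atoms.
n_atoms_per_basis = len(x) - len(d) + 1
```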
Using only the activation information of the basis function $d_\kappa$ from the preceding sparse decomposition stage, i.e., the set of non-zero coefficients $\sigma_\kappa$, the sparse coefficients and the new basis function can be updated simultaneously. By Equation (4), the matrix formed by the segments $T_\tau^{*} \tilde{x}_\kappa$ corresponding to $\sigma_\kappa$ can be obtained, and singular value decomposition is then performed on it. After the singular value decomposition, only the largest singular value is retained, i.e., the first principal component is selected to obtain the best basis function and the corresponding sparse coefficients, as given by Equations (5) and (6).
The flow of the shift invariant K-SVD algorithm is as follows:
(1) Initialization. The long signal $x$ and the length $q$ and number $K$ of the basis functions are given. The initial basis functions are formed by randomly truncating segments of length $q$ from the long signal $x$ and normalizing them. Set the iteration counter $t = 1$ and the tolerance error $\varepsilon$;
(2) Sparse coefficient solving stage. The fast matching pursuit algorithm is used to obtain the sparse coefficient vector $s$ of the long signal;
(3) Basis function update stage. The basis functions are updated in turn. When the $\kappa$-th basis function $d_\kappa$ is reached, the set $\sigma_\kappa$ of sparse coefficients activated by $d_\kappa$ is determined, and $d_\kappa$ and the corresponding sparse coefficients $(s_{\kappa,\tau})_{\tau \in \sigma_\kappa}$ are updated through Equations (5) and (6);
(4) Let $t = t + 1$ and check the termination criterion. If the ratio of the reconstruction errors $\left\| x - \sum_{k} \sum_{\tau} s_{k,\tau} T_\tau d_k \right\|_2^2$ of two adjacent iterations is less than $\varepsilon$, the iteration is terminated; otherwise steps (2)-(4) are repeated.
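The basis function update in step (3) amounts to a rank-one approximation of the matrix whose columns are the extracted segments. The following is a minimal sketch under stated assumptions: power iteration stands in for a library SVD routine, the name `rank_one_update` is hypothetical, and the segments are assumed to have been extracted already at the active shifts.

```python
import math


def rank_one_update(segments, n_iter=50):
    """Best unit-norm d and coefficients s minimising sum_tau ||seg_tau - s_tau d||^2.

    Power iteration on M M^T, where the columns of M are the segments,
    stands in for a library SVD routine.
    """
    q = len(segments[0])
    # Start from the segment with the largest energy.
    d = max(segments, key=lambda seg: sum(v * v for v in seg))
    norm = math.sqrt(sum(v * v for v in d))
    if norm == 0.0:
        raise ValueError("all segments are zero")
    d = [v / norm for v in d]
    for _ in range(n_iter):
        # w = M (M^T d): project every segment on d, then recombine.
        proj = [sum(a * b for a, b in zip(seg, d)) for seg in segments]
        w = [sum(c * seg[j] for c, seg in zip(proj, segments)) for j in range(q)]
        norm = math.sqrt(sum(v * v for v in w))
        d = [v / norm for v in w]
    # Coefficients are the projections of the segments on the new basis function.
    s = [sum(a * b for a, b in zip(seg, d)) for seg in segments]
    return d, s


# Three noiseless occurrences of the pattern (0.6, 0.8) with amplitudes 2, -1, 3.
segments = [[1.2, 1.6], [-0.6, -0.8], [1.8, 2.4]]
d, s = rank_one_update(segments)
```

The returned `d` corresponds to the first left singular vector, and `s` to the largest singular value times the first right singular vector, so both quantities are refreshed in one step.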
If there are multiple long signals $x_i$ ($i = 1, 2, \ldots, N$) forming a training set $X$, shift invariant K-SVD can still be used to learn the basis functions. First, the shift invariant K-SVD algorithm is applied to the first long signal $x_1$, where the initial basis functions $D_0$ are constructed by randomly truncating segments from the long signals and normalizing them. This yields the sparse coefficients $s_1$ and the corresponding basis functions $D_1$. The basis functions $D_1$ represent the signal sample $x_1$ more sparsely than the initial basis functions, which means that $D_1$ is closer to the nature of the signal $x_1$. Then, $D_1$ is used as the initial basis functions and the shift invariant K-SVD algorithm is applied to the second long signal $x_2$, yielding the sparse coefficients $s_2$ and the basis functions $D_2$. After that, $D_2$ is used as the initial basis functions for the third long signal $x_3$. This process continues until the last long signal $x_N$, for which the sparse coefficients $s_N$ and the basis functions $D_N$ are obtained. If the algorithm needs to continue, $D_N$ is used as the initial basis functions for the first long signal $x_1$, and the process is repeated over the long signals in the same order. The algorithm generally stops once the basis functions $D$ have stabilized, i.e., once the relative error of $D$ between two adjacent iterations is less than the tolerance error. The basis functions $D$ obtained at that point are the learned basis functions.

Shift Invariant Sparse Feature
After the shift invariant dictionary learning, $K$ basis functions are obtained. Each basis function $d_i$ ($i = 1, 2, \ldots, K$) is translated in the time domain and extended to the length of the original long signal, which gives the sub-dictionary $D_i$ corresponding to $d_i$, containing $p - q + 1$ dictionary atoms. If there are $L$ classes of signals and each class contains multiple training samples that are long signals, the shift invariant K-SVD algorithm is run separately on the samples of each class, so that $K$ basis functions are learned per class and $LK$ sub-dictionaries $\{D_i \mid i = 1, 2, \ldots, LK\}$ are obtained. The whole redundant dictionary $D = [D_1, D_2, \ldots, D_{LK}]$ is formed by concatenating the sub-dictionaries. For each signal sample, the matching pursuit algorithm is applied with the over-complete dictionary $D$ to solve the sparse coefficient vector $s = [s_1; s_2; \ldots; s_{LK}]$, where $s_i$ ($i = 1, 2, \ldots, LK$) corresponds to the sub-dictionary $D_i$. Afterwards, the $l_1$ norm, the $l_2$ norm, or the maximum absolute value $F_i$ ($i = 1, 2, \ldots, LK$) of each sparse coefficient vector $s_i$ is computed, which gives an $LK$-dimensional sparse feature $F = [F_1, F_2, \ldots, F_{LK}]$ for each signal. Alternatively, the $M$ ($M \ge 2$) largest absolute values of each sparse coefficient vector $s_i$ can be used, which is denoted as M-Max and gives an $LKM$-dimensional sparse feature $F = [F_1, F_2, \ldots, F_{LKM}]$ for each signal. The $LK$-dimensional or $LKM$-dimensional sparse feature is named the shift invariant sparse feature.
From the perspective of sparse representation, the sub-dictionaries learned from the class of a test sample are better adapted to that sample and are therefore more likely to be activated when approximating it. Assume that the class label of the test sample $y_j$ is $j$ ($j = 1, 2, \ldots, L$). When the sparse coefficients are solved with the whole over-complete dictionary $D$, the non-zero terms are most likely to appear in the coefficient vectors $s_i$ ($i = K(j - 1) + 1, K(j - 1) + 2, \ldots, Kj$) of the sub-dictionaries $D_i$ belonging to class $j$. Consequently, the $l_1$ norm, the $l_2$ norm, or the $M$ ($M \ge 1$) largest absolute values of these $s_i$ are larger than those of the other sub-dictionaries. Therefore, the shift invariant sparse features of different classes are distinguishable and can be employed as the input of a classifier.
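The feature construction can be sketched in a few lines. The function name `shift_invariant_feature` and the toy coefficient vectors are illustrative assumptions; each inner list plays the role of one sub-dictionary's coefficient vector.

```python
import math


def shift_invariant_feature(sub_coeffs, kind="l1", m=1):
    """Collapse each sub-dictionary's coefficient vector into one or m numbers.

    sub_coeffs: list of LK coefficient vectors, one per sub-dictionary.
    kind: "l1", "l2" or "max"; for "max", the m largest absolute values are kept.
    """
    feature = []
    for s in sub_coeffs:
        mags = [abs(v) for v in s]
        if kind == "l1":
            feature.append(sum(mags))
        elif kind == "l2":
            feature.append(math.sqrt(sum(v * v for v in mags)))
        elif kind == "max":
            top = sorted(mags, reverse=True)[:m]
            # Pad with zeros if the vector has fewer than m entries.
            feature.extend(top + [0.0] * (m - len(top)))
        else:
            raise ValueError(f"unknown feature kind: {kind}")
    return feature


# Two sub-dictionaries with five atoms each; only a few atoms are active.
sub_coeffs = [[0.0, 3.0, 0.0, -4.0, 0.0], [0.0, 0.0, 0.5, 0.0, 0.0]]
f_l1 = shift_invariant_feature(sub_coeffs, "l1")
f_l2 = shift_invariant_feature(sub_coeffs, "l2")
f_2max = shift_invariant_feature(sub_coeffs, "max", m=2)
```

In this toy case the first sub-dictionary dominates every variant of the feature, which is the behaviour expected for the sub-dictionaries of a sample's own class.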

Classification with Optimized SVM
In this study, LIBSVM [39] is applied to the multi-class classification task using the one-against-one method. The RBF kernel $K(x_i, x_j) = e^{-g \|x_i - x_j\|^2}$ is suitable for non-linear classification, where $g$ is the parameter that controls the width of the RBF kernel. Moreover, the penalization factor $c$ also has a large impact on SVM performance. Consequently, the parameters $(c, g)$ should be jointly optimized to obtain the best SVM. In the following subsections, three different methods (grid search, GA, and PSO) are used to optimize the SVM.
The whole process of the optimized SVM is as follows. First, linear normalization to [0, 1] is applied to both the training set and the test set. Then, cross validation is carried out on the training set with different parameters $(c, g)$, and the parameters that yield the highest cross validation accuracy are selected; they define the best SVM model for the training set. Finally, the test set is predicted with the best SVM model.
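A small sketch of the RBF kernel shows how the parameter g shapes the similarity between two feature vectors. The function name `rbf_kernel` and the sample points are illustrative; LIBSVM evaluates this kernel internally.

```python
import math


def rbf_kernel(x_i, x_j, g):
    """K(x_i, x_j) = exp(-g * ||x_i - x_j||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-g * sq_dist)


same = rbf_kernel([0.2, 0.7], [0.2, 0.7], g=1.0)     # identical points
near = rbf_kernel([0.0, 0.0], [1.0, 0.0], g=1.0)     # unit distance
sharper = rbf_kernel([0.0, 0.0], [1.0, 0.0], g=4.0)  # same distance, larger g
```

A larger g makes the kernel decay faster with distance, so the decision boundary becomes more local.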
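The normalization step can be sketched as follows. One assumption is made loudly here: the per-feature minimum and maximum are taken from the training set only and then reused for the test set, which the text does not specify. The helper names `fit_min_max` and `scale` are hypothetical.

```python
def fit_min_max(train):
    """Column-wise minimum and maximum of the training feature vectors."""
    columns = list(zip(*train))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale(samples, lo, hi):
    """Map each feature linearly so that the training range becomes [0, 1]."""
    scaled = []
    for row in samples:
        scaled.append([
            (v - a) / (b - a) if b > a else 0.0  # constant features map to 0
            for v, a, b in zip(row, lo, hi)
        ])
    return scaled


train = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]
test = [[5.0, 40.0]]
lo, hi = fit_min_max(train)
train_scaled = scale(train, lo, hi)
test_scaled = scale(test, lo, hi)
```

Under this assumption a test value may fall slightly outside [0, 1] when it exceeds the training range, as the second feature of the toy test sample does.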

Grid Search
Candidate values of the parameters $(c, g)$ are laid out on a grid with a fixed step, and every grid point is evaluated to find the pair with the highest cross validation accuracy.
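A minimal sketch of the grid search follows. The power-of-two grid and the function `toy_cv_accuracy` are assumptions for illustration; in the actual method the score would be the cross validation accuracy of an SVM trained with the given (c, g).

```python
def grid_search(cv_accuracy, exponents_c, exponents_g):
    """Evaluate every (c, g) = (2**a, 2**b) and keep the most accurate pair."""
    best_acc, best_c, best_g = -1.0, None, None
    for a in exponents_c:
        for b in exponents_g:
            c, g = 2.0 ** a, 2.0 ** b
            acc = cv_accuracy(c, g)
            # Strict '>' keeps the first pair visited when accuracies tie.
            if acc > best_acc:
                best_acc, best_c, best_g = acc, c, g
    return best_c, best_g, best_acc


def toy_cv_accuracy(c, g):
    """Hypothetical stand-in for k-fold cross validation of an SVM."""
    return 1.0 / (1.0 + (c - 4.0) ** 2 + (g - 0.5) ** 2)


best_c, best_g, best_acc = grid_search(toy_cv_accuracy, range(-3, 4), range(-3, 4))
```

The cost grows with the product of the two grid sizes, which is the main motivation for the evolutionary alternatives described next.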

Genetic Algorithm
The genetic algorithm (GA) imitates the genetic and evolutionary process of organisms in nature [36]. It operates on an encoding of the decision variables and uses the objective function as search information while exploiting the information of all points in the population. Hence, GA has excellent global search ability. Its shortcomings are that the local search ability is weak and the result is easily affected by the algorithm parameters.
GA consists of the following steps: population initialization, individual evaluation, selection, crossover, mutation, and evaluation of the stopping criterion. In this study, the fitness value of GA is the cross validation accuracy of the SVM on the training set. The selection operation chooses relatively good individuals from the current population and copies them to the next population. First, the total fitness of all individuals in the population is computed; then the relative fitness of each individual is computed and used as its selection probability; finally, the roulette method is employed to select new individuals. In the crossover operation, two chromosomes are randomly selected from the population, and the crossover probability determines whether crossover is performed. The crossover position is selected randomly and is the same for both chromosomes. In the mutation operation, a chromosome is randomly selected from the population, and the mutation probability determines whether mutation is performed. The mutation position, i.e., the gene to be mutated, is selected randomly. After the mutation is completed, the feasibility of the chromosome is tested.
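The three genetic operators can be sketched as follows. A binary chromosome encoding is assumed for illustration, since the encoding is not specified in the text, and the function names and probabilities are hypothetical.

```python
import random


def roulette_select(population, fitness, rng):
    """Draw len(population) individuals with probability proportional to fitness."""
    total = sum(fitness)
    probs = [f / total for f in fitness]
    return rng.choices(population, weights=probs, k=len(population))


def crossover(parent_a, parent_b, p_cross, rng):
    """Single-point crossover at the same random position in both chromosomes."""
    if rng.random() >= p_cross or len(parent_a) < 2:
        return parent_a[:], parent_b[:]
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]


def mutate(chromosome, p_mut, rng):
    """Flip one randomly chosen gene with probability p_mut."""
    child = chromosome[:]
    if rng.random() < p_mut:
        pos = rng.randrange(len(child))
        child[pos] = 1 - child[pos]
    return child


rng = random.Random(0)
population = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 0, 1, 0]]
fitness = [0.50, 0.90, 0.70]  # e.g. cross validation accuracies
parents = roulette_select(population, fitness, rng)
child_a, child_b = crossover([0, 0, 0, 0], [1, 1, 1, 1], p_cross=1.0, rng=rng)
mutant = mutate([0, 0, 0, 0], p_mut=1.0, rng=rng)
```

In a full run these operators would be applied generation after generation, with each chromosome decoded to a (c, g) pair before its fitness is evaluated.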

Particle Swarm Optimization
PSO is also an evolutionary algorithm [35]. Unlike the genetic algorithm, PSO has no crossover and mutation operations; the particles are updated only through their velocities, which makes PSO easier to implement. However, when dealing with complex high-dimensional problems, PSO is prone to premature convergence and its convergence performance is poor, so the optimal solution cannot be guaranteed.
Suppose there are $N$ particles in the particle swarm and the $i$th particle is $p_i = (p_{i1}, p_{i2}, \ldots, p_{iK})$, where $K$ is the number of parameters to be optimized (in this study $K = 2$), and its velocity is $v_i = (v_{i1}, v_{i2}, \ldots, v_{iK})$. The fitness value of the $i$th particle is the cross validation accuracy of the SVM based on $p_i$. During the iterations, the best position found by the $i$th particle, which indicates the local best, is denoted by $pbest_i$, while the best position found by the whole swarm, which indicates the global best, is denoted by $gbest$. First, the particles are initialized with random numbers in the specified range. At the $k$th iteration, the velocity and position of the $i$th particle are updated as follows [35]:
$$v_i^{k+1} = w_v v_i^{k} + c_1 r_1 \left( pbest_i - p_i^{k} \right) + c_2 r_2 \left( gbest - p_i^{k} \right)$$
$$p_i^{k+1} = p_i^{k} + w_p v_i^{k+1}$$
where $w_v$ and $w_p$ are elastic coefficients for the velocity update and the particle update, respectively; $c_1$ and $c_2$ are acceleration coefficients, which represent the local and global search ability, respectively; and $r_1$ and $r_2$ are random numbers uniformly distributed in [0, 1]. Each iteration corresponds to one generation, and the iterations terminate when the maximum number of generations is reached. When the iterations end, the global best is obtained, which corresponds to the best cross validation accuracy.
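The particle update can be sketched as a short maximization loop. Several things here are assumptions for illustration only: the coefficient values, the clamping of positions to the search range, drawing new random numbers for every dimension, and the function `toy_accuracy`, which stands in for the SVM cross validation accuracy.

```python
import random


def pso_maximize(fitness, bounds, n_particles=10, n_gen=30,
                 c1=1.5, c2=1.7, wv=1.0, wp=1.0, seed=0):
    """Maximise fitness(p) with velocity and position updates of the PSO form."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]
    for _ in range(n_gen):
        for i in range(n_particles):
            for k in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][k] = (wv * vel[i][k]
                             + c1 * r1 * (pbest[i][k] - pos[i][k])
                             + c2 * r2 * (gbest[k] - pos[i][k]))
                lo, hi = bounds[k]
                # Clamp to the search range (an added safeguard).
                pos[i][k] = min(hi, max(lo, pos[i][k] + wp * vel[i][k]))
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
            if fit > gbest_fit:
                gbest, gbest_fit = pos[i][:], fit
        history.append(gbest_fit)
    return gbest, gbest_fit, history


def toy_accuracy(p):
    """Hypothetical stand-in for SVM cross validation accuracy at (c, g) = p."""
    c, g = p
    return 1.0 / (1.0 + (c - 4.0) ** 2 + (g - 0.5) ** 2)


bounds = [(0.1, 100.0), (0.01, 10.0)]
gbest, gbest_fit, history = pso_maximize(toy_accuracy, bounds)
```

The `history` list records the global best fitness per generation, which never decreases by construction and can be plotted to inspect convergence.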

Bearing Fault Diagnosis Method Using Shift Invariant Sparse Feature and Optimized SVM
In this paper, a fault diagnosis method for rolling bearing using shift invariant sparse feature and optimized SVM is proposed. There are a total of five stages in the proposed method: dictionary learning with shift invariant K-SVD, sparse feature extraction, optimization of SVM, best SVM training and fault diagnosis. Figure 1 describes the whole process of the proposed method, and the description with regard to each stage is introduced in detail as follows:

(1) Dictionary learning with shift invariant K-SVD. Using the training set, an over-complete dictionary is obtained with shift invariant K-SVD.
(2) Sparse feature extraction. Using the learned over-complete dictionary, the sparse feature of all samples is constructed as described in Section 2.2 and employed as the input of the SVM.
(3) Optimization of SVM. Three methods (grid search, GA, and PSO) are implemented to find the best (c, g), i.e., the pair with the best cross validation accuracy.
(4) SVM model training. Using the training set, the best SVM model is learned with the best c and g.
(5) Fault diagnosis. For the test set, the category label of each test sample is predicted with the learned SVM model, and fault diagnosis of rolling bearings is thereby achieved.

Description of the Experiment
The proposed scheme based on the shift invariant sparse feature and optimized SVM is verified by an experiment on rolling bearings with artificially machined faults. Figure 2 shows the test rig [26]. An AC motor drives the shaft through a coupling at 720 rpm, and the shaft is supported by rolling bearings (GB203). The vibration signals are acquired by a data acquisition system (NI PXI-1042); the acceleration sensor (Kistler 8791A250) is located on the bracket fixed on the rolling bearing, and the sampling rate is 25.6 kHz. Electro-discharge machining is carried out on the surface of the outer race, inner race, and rolling element of the rolling bearings, producing three classes of fault: outer race fault (ORF), inner race fault (IRF), and rolling element fault (REF). Including the normal state, there are a total of four states.
With respect to the data set, there are a total of 120 samples containing four running [...] The training set is formed by randomly selecting 150 samples from the 300 samples of each state, while the test set consists of the remaining samples. Hence, the training set and the test set each contain 600 samples.

Feature Extraction with Shift Invariant Sparse Feature
First, the shift invariant K-SVD algorithm for multiple training samples is applied to the training samples of each class to learn the dictionary corresponding to that class. The basis function length is 256 points and the number of basis functions is 4. When the matching pursuit algorithm is used for sparse decomposition, the sparsity prior T must be set. Theoretically, T should equal the quotient of the original signal length divided by the basis function length, but to allow correction of the sparse decomposition error, it can be set to 1.2 times the quotient [19], namely 1.2 × 2048/256 ≈ 10. In addition, the number of basis functions is generally set to greater than 2. If it is too large, the amount of calculation increases greatly and most basis functions converge to noise. Conversely, with too few basis functions, the fault feature components are hard to extract. In this paper, 4 basis functions are selected, and the influence of the number of basis functions on the classification results is discussed in the next section.
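The parameter values quoted in this section (signal length 2048, basis function length 256, 4 basis functions per class, four classes) fix the sparsity prior, the dictionary size, and the feature dimension. The short calculation below is a sketch of that arithmetic; the variable names are illustrative.

```python
signal_length = 2048      # p, points per sample
basis_length = 256        # q, points per basis function
n_basis_per_class = 4     # K
n_classes = 4             # L: normal, inner race, rolling element, outer race

# Sparsity prior: 1.2 times the quotient p / q, rounded to an integer.
quotient = signal_length // basis_length
sparsity_prior = round(1.2 * signal_length / basis_length)

# Every basis function yields p - q + 1 shifted atoms (one sub-dictionary).
atoms_per_sub_dictionary = signal_length - basis_length + 1
n_sub_dictionaries = n_classes * n_basis_per_class
n_atoms = n_sub_dictionaries * atoms_per_sub_dictionary

# Feature dimension LKM for M = 1 (l1 norm, l2 norm, Max), 2 and 3.
feature_dims = {m: n_sub_dictionaries * m for m in (1, 2, 3)}
```

The dictionary therefore holds far more atoms than the signal has samples, which is why the pursuit stage dominates the computing time.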

The four learned basis functions of each class are shown in Figure 4. The figure shows that the basis functions belonging to different classes are significantly different.
After the basis functions of each class have been learned, the sub-dictionaries for each class are formed, and the over-complete dictionary is constructed by concatenating the sub-dictionaries. Using the over-complete dictionary, the sparse coefficients of all samples are calculated with the sparsity prior 10; the sparse coefficients of four test samples, one from each of the four classes, are shown in Figure 5. As shown in the figure, each sample mostly activates the atoms corresponding to its own category. Then, the shift invariant sparse feature is computed from the sparse codes using the $l_1$ norm, the $l_2$ norm, or M-Max. Hence, a 16-dimensional ($l_1$ norm, $l_2$ norm, and Max), 32-dimensional (M = 2), or 48-dimensional (M = 3) feature vector is acquired for each sample.
Through the shift invariant K-SVD algorithm, the matching pursuit algorithm, and the $l_1$ norm, the shift invariant sparse feature can be obtained. The shift invariant sparse feature of a randomly selected test sample for each of the four states and the sum of the shift invariant sparse features of all test samples from the same state are shown in Figures 6 and 7, respectively, where sub-dictionaries no. 1 to 4, 5 to 8, 9 to 12, and 13 to 16 correspond to the normal state, inner race fault, rolling element fault, and outer race fault, respectively.
These two figures show that, for the test samples, the sub-dictionaries corresponding to the class of each sample are more likely to be activated, and their feature values are significantly larger than those of the other sub-dictionaries. This reveals that shift invariant K-SVD makes signals of the same class produce similar sparse features; hence the shift invariant sparse feature is discriminative and can be employed as the input feature vector of the classifier.

Fault Diagnosis Using Shift Invariant Sparse Feature
After the features of the training set and test set were extracted, SVM was used for classification. First, a standard SVM was employed, in which (c, g) are not optimized but specified in advance. Then, the influence of the parameter set of the shift invariant sparse feature was analyzed. Finally, the optimization of the SVM was carried out.

Diagnosis Result with Standard SVM
For the standard SVM, c and g are both set to 1. Based on the same learned over-complete dictionary, five feature extraction methods that generate the sparse feature in different ways are compared: the maximum absolute value, the 2 largest absolute values, the 3 largest absolute values, the $l_1$ norm, and the $l_2$ norm, denoted as Max, 2-Max, 3-Max, L1, and L2, respectively. Their classification results are given in Table 1, and Figure 8 shows the detailed classification results for the four classes. Table 1 indicates that the method used to construct the sparse feature has a great impact on the diagnosis result. The sparse feature using L1 ($l_1$ norm) achieves the highest accuracy, so the $l_1$ norm is used in the subsequent classification task with the optimized SVM. The accuracy of the feature extraction method based on Max (maximum absolute value) is the lowest, because the Max method ignores much of the important information in the sparse codes. However, the accuracy improves as M increases. Figure 8 shows that, on the whole, the rolling element fault obtains the worst result, which indicates that the rolling element fault is complicated and harder to recognize. For the normal state and the outer race fault, the method based on L1 ($l_1$ norm) outperforms all the other methods.

Influence of Parameter Set of Shift Invariant Sparse Feature
With the feature extraction method based on L1 (l1 norm) and standard SVM, in which c and g are both set to 1, the influence of the parameter set of the shift invariant sparse feature is examined. In the shift invariant K-SVD algorithm, the number of basis functions K for each class has a great influence on the whole fault diagnosis method, so experiments were conducted with K taking each value in {2, 3, ..., 10}.
The dictionary size increases rapidly as K increases. The classification results and dictionary training times for different K are illustrated in Figures 9 and 10, respectively, and the average sparse coding time for one sample is shown in Figure 11. Figure 9 shows that, in general, the diagnosis accuracy grows with the number of basis functions K, yet levels off once the dictionary size reaches a relatively large value. Figures 10 and 11 show that the dictionary training time rises quickly as the dictionary size increases, and that the sparse coding time for the training and test samples also grows quickly. Therefore, K should be selected by weighing the diagnosis precision against the computing time and memory consumption. In this paper, K is set to 4.

Diagnosis Results Using Optimized SVM
Using the shift invariant sparse feature, the SVM was optimized with three methods: grid search, GA, and PSO. For all three methods, the search ranges of c and g are restricted to $2^{-10}$ to $2^{10}$ and 5-fold cross-validation is used. For grid search, the base-2 logarithms of c and g are stepped with a step size of 1. For GA and PSO, the fitness is the cross-validation accuracy, the population size is 20, and the maximum number of generations is 100. The remaining GA parameters, the crossover and mutation probabilities, are set to 0.4 and 0.2, respectively, while the remaining PSO parameters are wv = 1, wp = 1, c1 = 1.5, and c2 = 1.7. The result of the grid search is shown in Figure 12, while Figures 13 and 14 show the fitness curves of GA and PSO, respectively. The figures show that the GA loop terminates at the 50th generation and that, on the training set, the best cross-validation accuracies of the three methods at their best (c, g) are relatively high.
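The grid search described above can be sketched as follows. The function names `grid_search` and `svm_cv_accuracy` are hypothetical, and the use of scikit-learn with an RBF kernel is an assumption made for illustration (the paper refers to LIBSVM and does not give its code). The search itself only needs a scoring function that returns the cross-validation accuracy for a given (c, g).

```python
from itertools import product


def grid_search(score_fn, log2_min=-10, log2_max=10, step=1):
    """Exhaustively evaluate score_fn(c, g) on a base-2 logarithmic grid.

    Returns (best_c, best_g, best_score). Ties keep the first point found.
    """
    exponents = range(log2_min, log2_max + 1, step)
    best = (None, None, float("-inf"))
    for ec, eg in product(exponents, exponents):
        c, g = 2.0 ** ec, 2.0 ** eg
        score = score_fn(c, g)
        if score > best[2]:
            best = (c, g, score)
    return best


def svm_cv_accuracy(X, y, folds=5):
    """Return a scoring function based on k-fold cross-validation.

    Assumption: scikit-learn is installed and an RBF-kernel SVC stands in
    for the LIBSVM classifier used in the paper.
    """
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    def score(c, g):
        model = SVC(C=c, gamma=g, kernel="rbf")
        return cross_val_score(model, X, y, cv=folds).mean()

    return score


# Usage with feature matrix X and labels y (not defined here):
# best_c, best_g, best_acc = grid_search(svm_cv_accuracy(X, y))
```

With exponents from -10 to 10 in steps of 1, the grid contains 21 x 21 = 441 candidate pairs, each of which requires one 5-fold cross-validation run.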

Figure 11. Average sparse coding time using different K.
Figure 12. Optimized SVM using grid search.

Using the best (c, g), the optimized SVM model is trained on the training samples and then used to predict the category labels of the test set. In LIBSVM, the default values of (c, g) are (1, 1/k), where k denotes the data dimension. SVM with the default (c, g) was also run for comparison with optimized SVM. Table 2 lists the diagnosis results and computation times of the different methods. The table indicates that the proposed method, based on the shift invariant sparse feature and optimized SVM, can effectively distinguish the different operating conditions and thus achieves fault diagnosis of rolling bearings. With the default (c, g), the diagnosis precision is not high, which shows that the SVM parameters have a significant impact on the diagnosis precision and that parameter optimization is necessary. Among the three optimization methods, PSO achieves the highest accuracy. However, the computation times of GA and PSO are much longer than that of grid search, so the efficiency of these algorithms needs to be improved. The classification accuracies and computation times of the optimization algorithms are also affected by the algorithms' own parameter settings. Generally speaking, when the dataset is small, grid search is sufficient, but when the dataset is very large it is better to use GA or PSO.
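For comparison with grid search, the PSO search over (c, g) can be sketched as below, using the population size, generation count, c1, and c2 reported in this section. Several details are assumptions rather than the authors' implementation: wv is taken to be the inertia weight on the velocity and wp the factor applied to the velocity in the position update, the search runs on the base-2 logarithms of c and g, and the velocity is clamped to 20% of the search range. The name `pso_search` is hypothetical.

```python
import random


def pso_search(score_fn, pop_size=20, max_gen=100, c1=1.5, c2=1.7,
               wv=1.0, wp=1.0, log2_min=-10.0, log2_max=10.0, seed=0):
    """Maximise score_fn(c, g) with a basic particle swarm.

    Particles live in (log2 c, log2 g) space (an assumption).
    Returns (best_c, best_g, best_score).
    """
    rng = random.Random(seed)
    v_max = 0.2 * (log2_max - log2_min)  # assumed velocity clamp

    def clamp(x, lo, hi):
        return max(lo, min(hi, x))

    def fitness(p):
        return score_fn(2.0 ** p[0], 2.0 ** p[1])

    # Random initial positions and velocities.
    pos = [[rng.uniform(log2_min, log2_max) for _ in range(2)]
           for _ in range(pop_size)]
    vel = [[rng.uniform(-v_max, v_max) for _ in range(2)]
           for _ in range(pop_size)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g_idx = max(range(pop_size), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g_idx][:], pbest_fit[g_idx]

    for _ in range(max_gen):
        for i in range(pop_size):
            for d in range(2):
                # Velocity update: inertia + cognitive + social terms.
                v = (wv * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = clamp(v, -v_max, v_max)
                # Position update, kept inside the search range.
                pos[i][d] = clamp(pos[i][d] + wp * vel[i][d],
                                  log2_min, log2_max)
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
            if fit > gbest_fit:
                gbest, gbest_fit = pos[i][:], fit

    return 2.0 ** gbest[0], 2.0 ** gbest[1], gbest_fit
```

With 20 particles and 100 generations, this sketch evaluates the fitness 2020 times, compared with 441 evaluations for the grid with step size 1. Each evaluation is a full 5-fold cross-validation, which is consistent with the longer computation time reported for PSO.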

Conclusions
In this paper, a new fault diagnosis method for rolling bearings based on shift invariant sparse feature and optimized SVM is proposed. The shift invariant sparse feature is used to extract shift invariant features from the vibration signals of rolling bearings, in which fault impacts recur periodically. A rolling bearing fault experiment was carried out, and analysis of the experimental vibration signals shows that the shift invariant sparse feature based on shift invariant K-SVD is highly discriminative and can effectively distinguish the different states of rolling bearings. Among the methods for constructing the shift invariant sparse feature, the l1 norm achieves the highest classification accuracy. The influence of the parameter of the shift invariant sparse feature, namely the number of basis functions, is also discussed; the results show that it should be chosen by weighing the diagnosis precision against the computing time and memory consumption. As for optimized SVM, the classification results indicate that parameter optimization is essential for SVM and that optimization by grid search, GA, or PSO can significantly improve its classification ability. Among the three methods, PSO has the longest running time but obtains the highest classification accuracy. In future work, other effective shift invariant dictionary learning methods will be explored to obtain better sparse features of bearing faults. For the optimized SVM, improved optimization methods based on GA or PSO can be considered to further enhance the optimization ability.

Data Availability Statement:
The data used to support the findings of this study are available from the corresponding author upon request.