Intelligent Diagnosis Method for Rotating Machinery Using Dictionary Learning and Singular Value Decomposition

Rotating machinery is widely used in industrial applications. With the trend towards more precise machines and more critical operating conditions, mechanical failures may easily occur. Condition monitoring and fault diagnosis (CMFD) technology is an effective tool to enhance the reliability and security of rotating machinery. In this paper, an intelligent fault diagnosis method based on dictionary learning and singular value decomposition (SVD) is proposed. First, the dictionary learning scheme generates an adaptive dictionary whose atoms reveal the underlying structure of the raw signals. Essentially, dictionary learning is employed as an adaptive feature extraction method that requires no prior knowledge. Second, the singular value sequence of the learned dictionary matrix serves as the feature vector. Since this vector is generally of high dimensionality, simple and practical principal component analysis (PCA) is applied to reduce the dimensionality. Finally, the K-nearest neighbor (KNN) algorithm is adopted to identify and classify fault patterns automatically. Two experimental case studies are investigated to corroborate the effectiveness of the proposed method in intelligent diagnosis of rotating machinery faults. The comparison analysis shows that the dictionary learning-based matrix construction approach outperforms the mode decomposition-based methods in terms of capacity and adaptability for feature extraction.


Introduction
As a type of equipment widely used in modern industry, rotating machinery is becoming more precise and more structurally complicated. The operating conditions have also become more severe, involving high speeds, high loads, high temperatures, etc. Mechanical sub-systems in rotating machinery, especially critical components such as bearings [1], gearboxes [2], rotors [3] and fans [4], are easily subject to failure, resulting in unexpected downtime losses or even disastrous accidents. Condition monitoring and fault diagnosis (CMFD) technology is a promising tool to realize early fault alarms and minimize losses.
Among the various approaches used in CMFD technology, the signal-based diagnosis approaches and data-driven diagnosis approaches attract continuous interest [5,6]. In signal-based approaches, the foundation is that the fault information can be reflected in the monitored signals, and a diagnosis result can be made by checking the consistency between real-time data and healthy signal patterns. decomposition (LCD), local mean decomposition (LMD), etc., to decompose the signals and get a finite number of components, which contain the different frequency information from high to low [32][33][34]. Then, the initial feature matrix can be formed automatically by merging these components. Actually, the singular values of the matrix based on this strategy mainly reflect the division states of frequency bands by ATSD. When the signals in different conditions have similar spectral contents, this feature extraction strategy may not distinguish the condition patterns exactly [35].
In this paper, a novel framework using dictionary learning and SVD is proposed for intelligent diagnosis in rotating machinery. We investigate the potential of introducing the dictionary learning scheme as the initial feature matrix extraction method to improve the sensitivity and diagnosis capability of singular values. Specifically, the learned dictionary can reveal the abundant inherent information of the analyzed signal, leading to the desired feature space. By applying SVD to the dictionary matrices, the singular value sequences can be obtained and serve as the feature vectors of the raw signals. Due to the high dimensionality of the feature vectors, principal component analysis (PCA) is chosen to reduce the dimensionality and improve the discriminability [36,37]. The fault patterns can be visually observed from the scatter plots of the first two or three principal components, and the intelligent diagnosis results are obtained by the K-nearest neighbor (KNN) algorithm. In addition, the effectiveness and superiority of the proposed feature extraction strategy are investigated in comparison with the traditional EMD-based method.
The remainder of this work is organized as follows: the dictionary learning scheme and SVD are reviewed in Sections 2 and 3, respectively. Then, an intelligent monitoring and diagnosis method of rotating machinery using dictionary learning and SVD is proposed in Section 4. Section 5 contains the description of two fault datasets from bearing and gearbox, the diagnosis procedures, the discussion and comparison of the results. Finally, the conclusions can be drawn in Section 6.

Sparse Representation Theory
The basic idea of sparse representation theory is that a digital signal can be represented by a sparse linear combination of atoms drawn from a fixed over-complete dictionary. Generally speaking, an input signal $y \in \mathbb{R}^n$ can be expressed as:
$$y = Dx + \xi \tag{1}$$
where $D \in \mathbb{R}^{n \times K}$ is a matrix called the dictionary, which contains $K$ atoms $d_i \in \mathbb{R}^n$, $i = 1, \ldots, K$, as its columns, $x \in \mathbb{R}^K$ is the sparse representation coefficient vector, and $\xi$ is assumed to be additive noise. When the dictionary $D$ and input signal $y$ are fixed, we hope to obtain a succinct representation coefficient, which means that the majority of the entries in the coefficient vector are zero or close to zero. That is, only a small proportion of atoms contribute to approximating the input signal. To measure the sparsity level of the coefficient vector $x$, its $l_0$-norm can be calculated as follows:
$$\|x\|_0 = \#\{i : x_i \neq 0\} \tag{2}$$
which represents the number of nonzero entries in $x$. Then the search for the sparsest representation can be transformed into the following optimization problem:
$$(P_{0,\varepsilon}): \quad \min_{x} \|x\|_0 \quad \text{subject to} \quad \|y - Dx\|_2 \le \varepsilon \tag{3}$$
where the approximation error is assessed by the $l_2$-norm and $\varepsilon$ is a parameter which depends on the noise level of the signal. This optimization process is generally called sparse coding. For an over-complete dictionary, the linear system underlying Equation (3) is underdetermined. Solving it is a combinatorial optimization problem, and sparse coding is a typical non-deterministic polynomial (NP) hard problem. Thus, scholars turn to approximate algorithms to look for the sparsest solution. Matching pursuit (MP) and orthogonal matching pursuit (OMP) are two of the simplest yet efficient greedy methods. These approaches select the atoms in sequence, according to the correlation between the columns of the dictionary and the residual signal. Since sparse coding is an indispensable step in the dictionary learning scheme, the primary ideas of OMP are introduced briefly.
First, the algorithm defines an initial residual $r^{(0)} = y$ and an initial linear combination of atoms $\hat{y}^{(0)} = 0$. Then, for $k = 1, 2, \ldots$, $r^{(k)}$ and $\hat{y}^{(k)}$ are updated step by step, always keeping $y = r^{(k)} + \hat{y}^{(k)}$. All the atoms in the dictionary $D$ are normalized, that is, $\|d_i\|_2 = 1$. During the $k$th update, the atom that has the maximum correlation with $r^{(k-1)}$ is selected and added to the current linear combination. The correlation between the atoms and the residual can be measured by the following equation:
$$c_i^{(k)} = \left| \langle r^{(k-1)}, d_i \rangle \right|, \quad i = 1, \ldots, K \tag{4}$$
At this stage, the current linear combination of atoms $\hat{y}^{(k)}$ can be obtained:
$$\hat{y}^{(k)} = \sum_{i=1}^{k} a_i^k d_{j_i} \tag{5}$$
where $d_{j_i}$ denotes the atom selected at the $i$th update and the coefficients $a_i^k$ are determined by the least squares method, that is, by minimizing
$$\left\| y - \sum_{i=1}^{k} a_i^k d_{j_i} \right\|_2 \tag{6}$$
The updated residual $r^{(k)} = y - \hat{y}^{(k)}$ is then used in the next iteration. The iteration is repeated until the residual falls below a set threshold or the number of nonzero elements in $x$ reaches an upper limit. OMP can generate highly sparse solutions in the sparse coding stage and is adopted in this paper.
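The OMP iteration described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in this work: the function name `omp`, the parameters `max_atoms` and `tol`, and the random test dictionary are all assumptions introduced for the example.

```python
import numpy as np

def omp(D, y, max_atoms, tol=1e-8):
    """Sketch of orthogonal matching pursuit for a dictionary with unit-norm columns."""
    y = np.asarray(y, dtype=float)
    x = np.zeros(D.shape[1])
    support = []                 # indices of the atoms selected so far
    coeffs = np.zeros(0)
    residual = y.copy()          # initial residual equals the signal
    for _ in range(max_atoms):
        if np.linalg.norm(residual) <= tol:
            break
        # Correlation of every atom with the current residual
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = 0.0  # never select the same atom twice
        support.append(int(np.argmax(corr)))
        # Least-squares coefficients restricted to the selected atoms
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    if support:
        x[support] = coeffs
    return x

# Hypothetical test problem: a 2-sparse signal in a random normalized dictionary
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)   # normalize atoms to unit l2-norm
x_true = np.zeros(50)
x_true[[3, 17]] = [1.5, -2.0]
y = D @ x_true
x_hat = omp(D, y, max_atoms=5)
```

The two stopping rules in the loop correspond to the residual threshold and the upper limit on the number of nonzero coefficients mentioned in the text.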

K-SVD Dictionary Learning
Sparse representation differs from other conventional basis representation models because the dictionary can provide a wider array of basis functions. This offers more flexibility in signal representation, and thus more effectiveness in tasks such as signal compression and feature extraction. It should be noted that two challenges exist in this model: one is solving for the sparse coefficients, as mentioned above, and the other is designing the dictionary to fit the structure of the analyzed data. Early dictionaries were chosen as a prespecified set of basis functions, such as discrete cosine transforms (DCT), wavelets, curvelets, short-time Fourier transforms, etc. This dictionary design scheme is simple and usually leads to fast algorithms, but its performance largely depends on how well the atoms can sparsely represent the input signals, which means the signal pattern has to be found manually beforehand. Another route is to adapt the dictionary to a set of sample signals through learning. Such a dictionary learning process can capture the inherent characteristics of the raw signal in the learned atoms, and is essentially an adaptive method that requires no prior knowledge of the signals.
K-singular value decomposition (K-SVD) is a highly efficient dictionary learning method. The algorithm mainly includes two stages. The first stage finds the sparse coefficients for a fixed dictionary, which can be regarded as sparse coding. The second stage updates the dictionary based on the acquired coefficient vectors. The columns of the dictionary are updated sequentially in K-SVD. The pivotal steps of this method are illustrated briefly below.
Given a signal set $Y = \{y_i\}_{i=1}^{N}$, $y_i \in \mathbb{R}^n$, and an initial dictionary $D \in \mathbb{R}^{n \times K}$, the sparse representation coefficient vectors $x_i$ corresponding to the signal samples $y_i$ can be gathered to construct the coefficient matrix $X \in \mathbb{R}^{K \times N}$. In this problem, we want to obtain the sparse representation of the sample signals $Y$ while both the dictionary $D$ and the coefficient matrix $X$ are allowed to change. A desirable dictionary, which can decompose the signals sparsely, should be learned in this procedure. The problem can be formulated as:
$$\min_{D, X} \sum_{i=1}^{N} \|x_i\|_0 \quad \text{subject to} \quad \|Y - DX\|_F^2 \le \varepsilon \tag{7}$$
where $\varepsilon$ is the reconstruction error and the Frobenius norm of a matrix $A$ is defined as $\|A\|_F = \left( \sum_{ij} A_{ij}^2 \right)^{1/2}$. First, $D$ is fixed and an optimal coefficient matrix $X$ is determined.
Then, a second step is conducted to update the dictionary. Here, only one column $d_k$ is modified at a time, while the other columns in $D$ are frozen. The corresponding coefficient vector $x_T^k$, which is the $k$th row of $X$, is changed as well. We just need to minimize the target function $\|Y - DX\|_F^2$ to realize the atom update. This process can be expressed as:
$$\|Y - DX\|_F^2 = \left\| Y - \sum_{j=1}^{K} d_j x_T^j \right\|_F^2 = \left\| \left( Y - \sum_{j \neq k} d_j x_T^j \right) - d_k x_T^k \right\|_F^2 = \left\| E_k - d_k x_T^k \right\|_F^2 \tag{8}$$
where $E_k$ is the error for all signal samples when the atom $d_k$ is removed. At this point, one may intend to perform singular value decomposition (SVD) on $E_k$ and find an alternative $d_k$ and $x_T^k$ that reduce the error. However, a step must be taken before the SVD to make sure that the updated $x_T^k$ does not fill in, that is, that it remains sparse. A variable is defined as:
$$\omega_k = \left\{ i \mid 1 \le i \le N, \; x_T^k(i) \neq 0 \right\} \tag{9}$$
The sample signals $\{y_i\}$ that use the atom $d_k$ are indexed by $\omega_k$, and the positions of the nonzero entries in $x_T^k$ are given by the parameter $i$ in Equation (9). Define $\Omega_k$ as a matrix of size $N \times |\omega_k|$, with ones at the $(\omega_k(i), i)$th entries and zeros elsewhere. The multiplication $x_R^k = x_T^k \Omega_k$ shrinks the length of $x_T^k$ to $|\omega_k|$ by removing the zero entries. Similarly, the matrices $Y_k^R = Y \Omega_k$, $Y_k^R \in \mathbb{R}^{n \times |\omega_k|}$, and $E_k^R = E_k \Omega_k$, $E_k^R \in \mathbb{R}^{n \times |\omega_k|}$, only include the sample signals or error columns that use the atom $d_k$. Therefore, we define:
$$\left\| E_k \Omega_k - d_k x_T^k \Omega_k \right\|_F^2 = \left\| E_k^R - d_k x_R^k \right\|_F^2 \tag{10}$$
Figure 1 presents a more intuitive explanation of this procedure. Now $E_k^R$ can be decomposed via SVD as $E_k^R = U \Delta V^T$. After that, $d_k$ is replaced by the first column of $U$, and $x_R^k$ is replaced by the first column of $V$ multiplied by $\Delta(1, 1)$. All the atoms in $D$ are updated one by one, and the alternation of sparse coding and dictionary update is repeated until convergence or until the maximum number of iterations is reached. The detailed K-SVD algorithm is given in [27].
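The single-atom update can be illustrated with a short NumPy sketch. It is offered as an assumption-laden illustration, not the reference K-SVD code: the function name `ksvd_update_atom`, the in-place update style, the choice to leave unused atoms untouched, and the random test matrices are all hypothetical.

```python
import numpy as np

def ksvd_update_atom(Y, D, X, k):
    """Sketch of one K-SVD atom update; modifies D[:, k] and X[k, :] in place."""
    omega = np.flatnonzero(X[k, :])   # samples whose representation uses atom k
    if omega.size == 0:
        return                        # unused atom: left unchanged in this sketch
    # Error without atom k, restricted to the columns indexed by omega
    E_R = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, k], X[k, omega])
    U, s, Vt = np.linalg.svd(E_R, full_matrices=False)
    D[:, k] = U[:, 0]                 # new atom: first left singular vector
    X[k, omega] = s[0] * Vt[0, :]     # new coefficients on the same support

# Hypothetical data: 30 signals of length 8, dictionary with 12 atoms
rng = np.random.default_rng(1)
D = rng.standard_normal((8, 12))
D /= np.linalg.norm(D, axis=0)
X = rng.standard_normal((12, 30)) * (rng.random((12, 30)) < 0.2)
X[0, :5] = 1.0                        # make sure atom 0 is used by some samples
Y = rng.standard_normal((8, 30))

support_before = np.flatnonzero(X[0, :])
err_before = np.linalg.norm(Y - D @ X)
ksvd_update_atom(Y, D, X, k=0)
err_after = np.linalg.norm(Y - D @ X)
```

Because the update only touches the columns indexed by the support of the coefficient row, the zero entries stay zero, and the rank-one SVD approximation guarantees that the Frobenius error does not increase.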

Singular Value Decomposition
According to matrix theory, a matrix $A$ ($A \in \mathbb{R}^{m \times n}$) can be decomposed by SVD into a number of mutually orthogonal, unit-rank elementary matrices, that is:
$$A = U \Sigma V^T = \sum_{i=1}^{r} \lambda_i u_i v_i^T \tag{11}$$
where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are two orthogonal matrices with columns $u_i$ and $v_i$, $\Sigma \in \mathbb{R}^{m \times n}$ is a diagonal matrix, $r$ is the rank of $A$, and $\lambda_i$ ($i = 1, 2, \ldots, r$) are the singular values of $A$. Both the values and the variation trend of the singular value sequence reflect the intrinsic characteristics of the matrix. Consequently, the singular values can be used to describe the important information contained in the matrix. Prior to performing SVD, the feature matrix is traditionally formed using the phase space reconstruction technique. Two reconstruction parameters, namely the lag time and the embedding dimension, influence the initial feature matrix and, in turn, the resulting singular values. Unfortunately, no mature theory is available to guide the selection of the reconstruction parameters. As a result, further studies focus on extracting a feature matrix which can represent the nonlinear and non-stationary characteristics of raw vibration signals.
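As a small worked example of extracting a singular value sequence, the following sketch uses `numpy.linalg.svd` on a toy matrix and checks the rank-one expansion. The helper name `singular_value_sequence` and the toy matrix are assumptions made only for illustration.

```python
import numpy as np

def singular_value_sequence(A):
    """Return the singular values of A in descending order."""
    return np.linalg.svd(A, compute_uv=False)

# Toy matrix whose singular values are known by inspection
A = np.array([[3.0, 0.0],
              [0.0, 4.0]])
sv = singular_value_sequence(A)

# Rebuild A as a sum of unit-rank elementary matrices
U, s, Vt = np.linalg.svd(A)
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
```

In the proposed method the same call would be applied to a learned dictionary matrix instead of the toy matrix, and the returned vector would serve as the feature vector.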

Proposed Method
In essence, dictionary learning allows more flexibility to extract and analyze the inherent characteristics of a signal, without any prior knowledge. The learning process adapts the basic atoms in a dictionary to the characteristic patterns of the monitored signals. The underlying structures and scales of the signal, such as periodic impulses and resonance information, can be captured by the basis atoms. Thus the learned dictionary contains a substantial amount of signature information, delivering potential benefits for feature extraction. For this purpose, we propose to introduce the dictionary learning scheme as the initial feature matrix extraction method for further analysis. The detailed steps of dictionary learning are described as follows:
(1). Given a signal sample, partition the signal into a number of overlapping segments and generate the dataset Y for dictionary learning.
(2). Set the initial parameters, i.e., initial dictionary D 0 , iteration number K, noise level ε.
(3). Sparse coding: use the OMP algorithm in Section 2.1 to find the sparse representation coefficients.
(4). Dictionary update: update the atoms in the dictionary according to the learning algorithm in Section 2.2.
(5). Repeat steps (3) and (4) until the number of iterations is reached.
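Step (1) above can be sketched as a sliding-window operation. The segment length of 100 points and the sample length of 6000 points follow the experimental settings reported later in this paper, whereas the step size of 10 points, the function name `make_training_matrix`, and the sinusoidal test signal are assumptions, since the overlap is not specified in the text.

```python
import numpy as np

def make_training_matrix(signal, atom_len, step):
    """Stack overlapping segments of a 1-D signal as the columns of Y."""
    signal = np.asarray(signal, dtype=float)
    if signal.size < atom_len:
        raise ValueError("signal is shorter than one segment")
    starts = range(0, signal.size - atom_len + 1, step)
    return np.stack([signal[s:s + atom_len] for s in starts], axis=1)

# Hypothetical signal sample of 6000 points
signal = np.sin(2 * np.pi * 0.01 * np.arange(6000))
Y = make_training_matrix(signal, atom_len=100, step=10)
# With these settings Y has (6000 - 100) // 10 + 1 = 591 columns of length 100
```

A smaller step gives more training columns for the dictionary at a higher computational cost, so the step size is a tuning choice rather than a fixed part of the method.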
A more intuitive workflow is illustrated in Figure 2. The tuned parameters in K-SVD dictionary learning are given in Table 1. With the above steps, the characteristic patterns in the signal sample can be fully explored and mined, and a dictionary matrix which contains abundant diagnostic information can be adaptively learned.
Following the data-driven diagnostic procedure, a novel intelligent diagnosis method for rotating machinery is proposed in this work (see Figure 3). In the first step, sensitive features are extracted from the acquired vibration signals to represent the different machinery conditions. This step relies on two stages: dictionary learning and singular value extraction. Dictionary learning is employed to extract the initial feature matrix. Then, the singular value sequence of the learned dictionary matrix is used as the feature vector of the analyzed signal. Technically, the singular value sequence is of high dimensionality, which makes it impractical to use directly as the input of a classification model. Hence, the next step is dimensionality reduction, for which both linear and nonlinear methods exist. Although nonlinear dimensionality reduction methods have been applied successfully in some fault diagnosis cases, they suffer from disadvantages such as computational burden and estimation errors [37]. Moreover, the nonlinear methods may not outperform the traditional linear ones according to some numerical experiments reported in the literature [38]. This framework is not limited to a specific dimensionality reduction approach, as linear and nonlinear methods may work in different scenarios. For simplicity, this work applies a basic approach, PCA, to reduce the dimensionality and map the singular value sequence to low-dimensional principal components for pattern recognition. Finally, the K-nearest neighbor (KNN) classifier is utilized to identify the different machinery conditions automatically. Unlike artificial neural networks (ANN) and support vector machines (SVM), KNN is a non-parametric classification model whose training process simply stores the training samples, which avoids the cost of parameter tuning and model training required by other algorithms.
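The last two stages of the framework, PCA followed by KNN, can be sketched with NumPy alone. This is an illustrative sketch under stated assumptions: the function names, the neighbor count `k=3`, the Euclidean distance, and the synthetic two-class feature vectors are all hypothetical and are not taken from the experiments in this paper.

```python
import numpy as np

def pca_fit(F, n_components):
    """Fit PCA on the rows of F; return mean, components and contribution rates."""
    mean = F.mean(axis=0)
    _, s, Vt = np.linalg.svd(F - mean, full_matrices=False)
    ratio = s ** 2 / np.sum(s ** 2)       # contribution rate of each component
    return mean, Vt[:n_components], ratio[:n_components]

def pca_transform(F, mean, components):
    """Project feature vectors onto the retained principal components."""
    return (F - mean) @ components.T

def knn_predict(train_Z, train_labels, test_Z, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    dist = np.linalg.norm(test_Z[:, None, :] - train_Z[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    votes = train_labels[nearest]         # labels must be non-negative integers
    return np.array([np.bincount(v).argmax() for v in votes])

# Synthetic stand-in for singular value sequences of two machine conditions
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 25)
F = rng.normal(0.0, 0.1, size=(50, 10)) + 5.0 * labels[:, None]
train = np.r_[0:15, 25:40]
test = np.r_[15:25, 40:50]

mean, comps, ratio = pca_fit(F[train], n_components=3)
Z_train = pca_transform(F[train], mean, comps)
Z_test = pca_transform(F[test], mean, comps)
pred = knn_predict(Z_train, labels[train], Z_test, k=3)
```

Note that the PCA mean and components are fitted on the training samples only and then reused for the testing samples, which keeps the test data out of the model construction.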

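[Figure 3. Flowchart of the proposed intelligent diagnosis method. Recoverable block labels: dimensionality reduction (the refined feature vectors); construct the KNN using the training dataset; input the testing samples; intelligent fault diagnosis results.]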

Experimental Results and Comparisons
To demonstrate the applicability and superiority of the proposed method, two fault datasets, one from bearings and one from a gearbox, are analyzed.

Experimental Data
The rolling bearing dataset is from the Electrical Engineering lab at Case Western Reserve University. Many scholars have utilized this dataset as a standard reference to test their algorithms over the last decade. The experiment platform consists of a 2 hp motor, a torque transducer, a dynamometer, and control electronics. The test bearings, of type SKF6205 (Svenska Kullagerfabriken AB, Gothenburg, Sweden), support the motor shaft. A normal condition and three fault conditions, i.e., inner race fault, ball fault and outer race fault, are tested, during which the vibration signal of the bearing is collected by acceleration sensors with a sampling frequency of 12 kHz. Single point faults with a fault diameter of 7 mils (0.007 in) are introduced to the test bearings. The experimental speed is 1772 rpm.
In each condition, a long record is divided into 50 samples, each of which contains 6000 data points. Twenty randomly selected samples are used as the training dataset and the remaining 30 samples are used to test the recognition rate. In total, there are 80 samples for training and 120 samples for testing over the four bearing conditions. The time waveforms and frequency spectra of the signal samples in the four conditions are presented in Figure 4.
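The sample preparation can be sketched as follows. The counts (50 samples of 6000 points per condition, 20 for training and 30 for testing) come from the description above, while cutting the record into non-overlapping pieces, the random seed, the function names, and the synthetic records are assumptions made for illustration.

```python
import numpy as np

SAMPLES_PER_CONDITION = 50
POINTS_PER_SAMPLE = 6000
N_TRAIN = 20

def cut_samples(record):
    """Cut one long record into 50 non-overlapping samples of 6000 points."""
    record = np.asarray(record, dtype=float)
    needed = SAMPLES_PER_CONDITION * POINTS_PER_SAMPLE
    if record.size < needed:
        raise ValueError("record is too short")
    return record[:needed].reshape(SAMPLES_PER_CONDITION, POINTS_PER_SAMPLE)

def random_split(n_samples, n_train, rng):
    """Return sorted, disjoint training and testing index arrays."""
    order = rng.permutation(n_samples)
    return np.sort(order[:n_train]), np.sort(order[n_train:])

rng = np.random.default_rng(0)
conditions = ["normal", "inner race fault", "ball fault", "outer race fault"]
splits = {}
for name in conditions:
    record = rng.standard_normal(300_000)      # placeholder for a measured record
    samples = cut_samples(record)
    train_idx, test_idx = random_split(len(samples), N_TRAIN, rng)
    splits[name] = (samples[train_idx], samples[test_idx])

n_train_total = sum(tr.shape[0] for tr, _ in splits.values())   # 4 * 20 = 80
n_test_total = sum(te.shape[0] for _, te in splits.values())    # 4 * 30 = 120
```

Splitting each condition separately keeps the classes balanced, which matches the 80 and 120 totals stated in the text.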

Diagnosis Procedure and Results
As presented previously, the dictionary learning scheme is first adopted to generate the initial feature matrices in this work. A dictionary with a size of 100 × 100 is learned from each sample, that is, 100 atoms, each with a length of 100 points. In principle, increasing the atom length directly increases the computational burden and reduces the learning capacity of existing dictionary learning algorithms. For fault recognition of rotating machinery, setting the atom length large enough to include one impact allows more evident fault patterns to be contained in the learned dictionary. In our experiments, each impact, if present, lasts fewer than 100 points in most cases.
For clarity, the dictionary matrices corresponding to the four bearing conditions are shown in Figures 5-8, respectively. We compare some selected atoms in each dictionary with enlarged views of the raw signal. One can clearly observe that the atoms have captured the underlying structure of the raw signal. In the normal condition, no obvious impacts appear in the learned atoms, whereas sharp impulses can be noticed in the fault conditions. Generally, the mechanisms that generate the impulses differ when faults occur at different positions. Hence, these impulses exhibit different fault signatures, which can be utilized for pattern recognition.
Then, the singular value sequences of all 200 samples are extracted from the learned dictionary matrices. As shown in Figure 9, the gaps between the singular value sequences of the different bearing conditions are easy to find. The variation trends of the sequences are well separated, so fault diagnosis using singular value sequences is feasible. Furthermore, the new features obtained by PCA, namely the principal components, are adopted to recognize and classify the bearing conditions. To compare the proposed method with the traditional EMD-based method, we apply EMD to decompose the sample signals, construct feature matrices and extract singular value sequences. Here, we present the results after dimensionality reduction. The contribution rates of the first few principal components are listed in Table 2. The scatter plots of the principal components using the two pre-processors are shown in Figures 10 and 11, respectively.
In both methods, the first two principal components account for up to 95% of the contribution rate, indicating the reliability of the first two or three principal components. From Figure 10, all the samples are well separated in 2D and 3D space based on dictionary learning and SVD. In contrast, Figure 11 shows an overlapping area between the inner race fault and the ball fault when EMD-SVD pre-processing is used.
Figure 10. Scatter plots of the principal components after dimensionality reduction using dictionary learning and SVD: (a) the first two principal components; (b) the first three principal components.
Figure 11. Scatter plots of the principal components after dimensionality reduction using EMD and SVD: (a) the first two principal components; (b) the first three principal components.
By inspecting the time waveforms and frequency spectra in Figure 4, we find that the frequency components are relatively similar for these two conditions (inner race fault and ball fault), leading to a similar partition of the frequency bands. Thus, the major elements in the IMF matrices tend to be the same in the two conditions, which results in the overlapping area of the principal components. The feature matrix based on this strategy cannot directly reflect the detailed information of the vibration signal.
The first three principal components are selected as the feature vector. The 80 training samples are used to construct the KNN classifier. The diagnosis results for the 120 testing samples are given in Figure 12. The multi-class confusion matrix illustrates the detailed recognition accuracy and misclassification error for all conditions (inner race fault, ball fault, outer race fault and normal correspond to labels 1, 2, 3 and 4, respectively). The diagnosis accuracy using dictionary learning is 100%, while the EMD pre-processor only achieves 90.8% accuracy, and its particular misclassification errors are consistent with the analysis of the principal components in 2D and 3D space.
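For readers who wish to reproduce this kind of evaluation, a multi-class confusion matrix and the overall accuracy can be computed as sketched below. The label vectors are a toy example invented for illustration (they are not the experimental results), and the labels are zero-based rather than 1 to 4. The last lines only check the arithmetic that 109 correct decisions out of 120 testing samples round to the reported 90.8%.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true labels, columns are predicted labels."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    return np.trace(cm) / cm.sum()

# Toy labels (hypothetical): one sample of class 0 is mistaken for class 1
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 2, 3, 3]
cm = confusion_matrix(y_true, y_pred, n_classes=4)
toy_accuracy = accuracy(cm)                      # 7 of 8 correct

# Arithmetic check on the reported figure for 120 testing samples
reported_percent = round(109 / 120 * 100, 1)     # 90.8
```

Off-diagonal entries of the matrix show which pairs of conditions are confused, which is how the overlap between two fault classes in the scatter plots would appear in the classification results.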

Experimental Data
The second dataset is from our gearbox test rig, which includes a single-stage cylindrical straight gearbox, a DC motor for driving the gearbox, a magnetic powder brake for loading, and a data acquisition system. Four types of faults are created on the gear and bearing inside the gearbox: root crack, tooth broken, outer race fault and roller fault. Since the high-speed stage is more significant for the lifetime of a gearbox, all faults are introduced on the high-speed gear and bearing. The vibration signal of the gearbox casing is measured by an accelerometer at a sampling frequency of 20 kHz. The motor speed is 1500 rpm and the load is 11 N·m in the experiments. A detailed schematic diagram of the test rig and the damaged components is given in Figure 13.

As in the sample preparation for case 1, each data subset contains 50 samples, of which 20 are used for training and the remaining 30 for testing. Each sample is a section of the raw signal containing 6000 points. The time waveforms and frequency spectra of signal samples in the five conditions are presented in Figure 14.
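The sample preparation can be illustrated with a short sketch that cuts one vibration record into 50 samples of 6000 points and splits them 20/30 for training and testing. The details are assumptions made for illustration: the segments are taken as consecutive and non-overlapping, the split is random, the record is synthetic, and the helper names `make_samples` and `split_train_test` are hypothetical. Under the non-overlapping assumption, 50 samples of 6000 points at 20 kHz correspond to 15 s of signal per condition.

```python
import numpy as np

def make_samples(signal, n_samples=50, sample_len=6000):
    """Cut a 1-D record into consecutive, non-overlapping samples (assumption)."""
    needed = n_samples * sample_len
    if signal.size < needed:
        raise ValueError("record is too short for the requested samples")
    return signal[:needed].reshape(n_samples, sample_len)

def split_train_test(samples, n_train=20, seed=0):
    """Randomly pick n_train samples for training; the rest are for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(samples))
    return samples[order[:n_train]], samples[order[n_train:]]

fs = 20_000                      # sampling frequency in Hz
t = np.arange(15 * fs) / fs      # 15 s synthetic record
rng = np.random.default_rng(1)
record = np.sin(2 * np.pi * 25 * t) + 0.1 * rng.standard_normal(t.size)

samples = make_samples(record)
train, test = split_train_test(samples)
print(samples.shape, train.shape, test.shape)
```

Repeating this for each of the five conditions would give 100 training and 150 testing samples in total.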

Diagnosis Procedure and Results
The above-mentioned procedure is applied again to the gearbox fault datasets. Due to space limitations, we directly show the diagnosis results after dimensionality reduction. From Table 3, it can be found that the first three principal components reach a relatively satisfactory contribution rate of over 90% in both methods. As shown in Figure 15, with the dictionary learning pre-processor almost no overlapping area can be observed for the first two principal components, and all samples are distinctly separated with the first three principal components. For comparison, the results with the EMD pre-processor are presented in Figure 16. The distribution areas of three machinery conditions, namely root crack, tooth broken and roller fault, are very close in both 2D and 3D space, which may lead to misclassification.
The diagnosis results using the first three principal components are illustrated in Figure 17 (root crack, tooth broken, outer race fault, roller fault and normal correspond to labels 1, 2, 3, 4 and 5, respectively). Consistent with the scatter plots of the principal components, the diagnosis accuracy using dictionary learning is 100%, while that of EMD is 96%, again demonstrating the superiority of the proposed method.
Figure 15. Scatter plots of the principal components after dimensionality reduction using dictionary learning and SVD: (a) the first two principal components; (b) the first three principal components.

Conclusions
In this research, a novel intelligent diagnosis method using dictionary learning and SVD is proposed for rotating machinery. The main idea of the framework is to extract the initial feature matrix from the raw signals with a dictionary learning scheme. The atoms of the learned dictionary matrix capture the underlying structures of the analyzed signals and thus preserve abundant discriminative information. The dictionary learning scheme extracts the feature matrix efficiently while avoiding the selection of lag time and embedding dimension. The singular value sequence of the matrix is then computed to characterize it. Because this sequence is high-dimensional, PCA is applied to reduce its dimensionality and generate the most significant PCs. Finally, the first several PCs are used as fault feature vectors for a KNN classifier, which diagnoses the faults automatically. The proposed method is especially suited to classifying and recognizing machinery conditions, as verified on two datasets from a bearing and a gearbox, respectively. Comparisons with the existing EMD-based method demonstrate the superiority of the new approach.
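The first two stages summarized above, learning a dictionary from a raw signal and taking its singular value sequence as the feature vector, can be sketched in a few lines. This is not the authors' implementation. As a stand-in, the sketch uses the method of optimal directions (MOD) with a simple thresholding sparse coder; the patch length, number of atoms, sparsity level and iteration count are arbitrary assumptions, as are all function names and the synthetic test signal.

```python
import numpy as np

def sparse_code(D, Y, k):
    """Thresholding coder: keep the k most correlated atoms, refit by least squares."""
    X = np.zeros((D.shape[1], Y.shape[1]))
    corr = D.T @ Y
    for j in range(Y.shape[1]):
        support = np.argsort(-np.abs(corr[:, j]))[:k]
        coef, *_ = np.linalg.lstsq(D[:, support], Y[:, j], rcond=None)
        X[support, j] = coef
    return X

def learn_dictionary(Y, n_atoms=32, k=3, n_iter=10, seed=0):
    """MOD dictionary learning (stand-in): alternate sparse coding and D = Y X^+."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iter):
        X = sparse_code(D, Y, k)
        D = Y @ np.linalg.pinv(X)            # least-squares dictionary update
        dead = np.linalg.norm(D, axis=0) < 1e-12
        # Re-draw atoms that no patch used, then renormalize every column.
        D[:, dead] = rng.standard_normal((Y.shape[0], int(dead.sum())))
        D /= np.linalg.norm(D, axis=0)
    return D

def singular_value_feature(signal, patch_len=64, n_atoms=32):
    """Arrange the signal into patch columns, learn D, return its singular values."""
    n_patches = signal.size // patch_len
    Y = signal[:n_patches * patch_len].reshape(n_patches, patch_len).T
    D = learn_dictionary(Y, n_atoms=n_atoms)
    return np.linalg.svd(D, compute_uv=False)

rng = np.random.default_rng(2)
t = np.arange(6000) / 20_000                 # one 6000-point sample at 20 kHz
signal = np.sin(2 * np.pi * 300 * t) + 0.2 * rng.standard_normal(t.size)
feature = singular_value_feature(signal)
print(feature.shape)
```

Each sample yields one singular value vector; stacking these vectors row by row gives the high-dimensional feature matrix that PCA then reduces before KNN classification.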
