Improved Dynamic Mode Decomposition and Its Application to Fault Diagnosis of Rolling Bearing

To solve the intractable problems of optimal rank truncation threshold and dominant modes selection strategy of the standard dynamic mode decomposition (DMD), an improved DMD algorithm is introduced in this paper. Distinct from the conventional methods, a convex optimization framework is introduced by applying a parameterized non-convex penalty function to obtain the optimal rank truncation number. This method is inspirited by the performance that it is more perfectible than other rank truncation methods in inhibiting noise disturbance. A hierarchical and multiresolution application similar to the process of wavelet packet decomposition in modes selection is presented so as to improve the algorithm’s performance. With the modes selection strategy, the frequency spectrum of the reconstruction signal is more readable and interference-free. The improved DMD algorithm successfully extracts the fault characteristics of rolling bearing fault signals when it is utilized for mechanical signal feature extraction. Results demonstrated that the proposed method has good application prospects in denoising and fault feature extraction for mechanical signals.


Introduction
Rolling bearing plays an important role in modern equipment [1], relating to the health condition and the remaining service life. Amplitude-modulated and/or frequency-modulated (AM-FM) multi-component signals would be introduced when local damage such as cracks and exfoliation corrosion occurred during the operation [2]. It becomes intractable to extract the fault feature as the fact of that the measured signals are usually immersed with intensive background noise, especially in the incipient failure. Thus, robust fault feature extraction method should perfectly eliminate the noisy components and acquire the useful features generated from the non-stationary time series, which provides the proof for the maintenance plan before the whole machine gets disabled.
Recent decades have witnessed significant advances in the feature extraction of fault data collected from either experiments or numerical simulations. Short-time Fourier Transform (STFT) [3] adopts the method of piecewise averaging of the unstable signal. Several segments are sliced by a constant sliding window in accordance with the order of the original timeline. Spectrum characteristics are then received by Fourier Transform (FT) on each segment. However, the time-frequency resolution is constant and cannot transform over the change of time and frequency, which is the shortcoming It will increase the calculation time if over-estimating the r value because of a large amount of data. Simulation results showed that the cross-correlation coefficient between the reconstruction data and the original synthetic signal even get a bit smaller along with the increase of the selected r. Conversely, the calculation result may not be accurate if the r value is too small, which would deduct some useful signal characteristics. In the past, most scholars presented the rank truncation criteria based on SVD, including selecting an "elbow" of the singular value with a logarithmic curve and a threshold such as a certain proportion of selected data variance. They mainly use the idea of discarding noises by substituting a smaller (compressed) matrix that captures the essential characteristics of the signal. A recent breakthrough by Gavish [21] provided a hard optimal threshold foundation in theory whether the noise magnitude is known or unknown. The methods of dominant mode selection strategy with a reduced order model are not identical. For a review of criterion to select dominant modes, we refer the reader to [22], which summarized the recent research situation and analyzed their respective advantages and disadvantages. At the same time, Kou [22] developed a method that multiplied each normalized DMD mode by its time coefficient. This method considered the evolution of each mode, but neglected the faint modes that should be treated as noise modes. Dang [23] introduced a tool, multiscale permutation entropy, to calculate the complexity of each DMD mode. Modes whose entropies were smaller than a threshold were chosen to recover the signal. However, for filtering the noise components, an entropy threshold needs to be set by multiple attempts.
In this paper, a DMD framework for mechanical fault signal processing is proposed, solving the intractable problems of optimal rank truncation threshold and dominant modes selection strategy based on the conventional DMD algorithm. The optimal rank truncation r of the similarity matrix is calculated with parametric non-convex penalty function for low-rank approximation. Inspired by WPT, which splits the signal up into a collection of different signals according to its level of decomposition, DMD modes are divided into slow modes and fast modes by splitting the continuous-time eigenvalues of the similarity matrix. Hierarchical application of the basic DMD algorithm is applied with a given level l. DMD modes are reconstructed by a number of bottomed modes. Fast Fourier Transform (FFT) is performed on the reconstructed signal for the purpose of extracting the fault features. The improved DMD algorithm is applied to process the rolling bearing's simulation signal for noise reduction and feature extraction. Amazingly, by comparing the signal-to-noise ratio (SNR), our method has tremendous advantages in noise removal and fault feature extraction, in comparison with traditional mature noise reduction methods such as EMD, SVD and WT. Then, the proposed method is applied to the bearing signal in practice; as a result, it can successfully extract the fault characteristics that are highly consistent with the actual field. This paper is organized as follows. Section 2 introduces the basic DMD algorithm for mechanical fault signal processing. Section 3 measures the solving of the optimal rank truncation order of the similarity matrix, and the strategy of dominant modes selection are outlined. Analysis results of dynamic simulation signal and practical bearing fault signal are described in Section 4. Conclusions are summarized in Section 5.

DMD Algorithm for Mechanical Signal Processing
DMD is an equation-free Koopman frequency analysis technique based on the theory of SVD and mode decomposition. Essentially, it is a kind of order reduction method that can represent potential dynamics' characteristics of a complex high-dimensional system by extracting a series of single frequency. An order reduction algorithm decomposed from DMD in flow fields is divided into two kinds of mathematical expressions. One introduces a companion matrix approximating to the infinite dimensional linear Koopman operator, and another presents a similarity matrix for a linear mapping operation on the singular values of POD. For detailed description of the two methods, we suggest reading [22]. Both methods are capable of implementing the core algorithm, but the latter is more commonly used in numerical calculations for its robustness. We generalize the DMD algorithm framework for mechanical time-series signal processing based on the similarity matrix.
Supposing that a size of N sample points is acquired by a sensor with equal intervals between two successive sampling points, we define the one-dimensional time series as S = [x 1 , x 2 , · · · x i · · · x N ], x i ∈ R , while ∆t = x i+1 − x i . S can also be made from a digital simulation signal. A m × n shift-stack Hankel matrix X S can be constructed by Equation (1): Apparently, the evolution of the sequence X and Y is determined by the eigenvalues of A for the linear system. It can also be taken as an operator to approximate the dynamic characteristics when the data are produced in nonlinear systems, such as bearing vibration signal.
This solution minimizes the error: where • F is the Frobenius norm, given by In what follows, we outline the key steps of the DMD algorithm for mechanical signal processing. 1. Firstly, seek an invertible matrix decomposition by SVD on matrix X: Sensors 2018, 18, 1972 5 of 15 where U and V are orthonormal, called the left and right singular vector, respectively, U * U = I, V * V = I. The symbol * denotes the complex conjugate transpose. ∑ ∈ R p×p contains a number of non-zero singular values σ 1 , · · · , σ p by descending sequence in its diagonal. The matrix A of Equation (4) may be obtained by the pseudoinverse of X: 2. It is quite normal that the matrix A contains such a large amount of data that leads to the calculating processing reluctant. Thus, we choose a given number of truncated rank r, and project it onto POD modes with the order of characteristic vectors. Thus, we have a similarity matrix A as follows: where A ∈ R r×r . The eigenvalues and eigenvectors of A are then represented by those of A as they process the same dynamical features.
3. Perform eigenvalue decomposition on the similarity matrix A: where W = [ω 1 , ω 2 , · · · , ω r ] ∈ R r×r is the eigenvectors of similarity matrix A, and Λ = diag([λ 1 , λ 2 , · · · , λ r ]) ∈ R r×r is a diagonal matrix containing the corresponding complex eigenvalue λ i . 4. Compute the reconstruction matrix of DMD. The evolution of the signal characteristics can be characterized by the similarity matrix. In addition, the i-th eigenvector of original operator is presented by the relevant feature vector of the similarity matrix: Equation (11), the project DMD approximate solution, is often called standard DMD mode. By firstly rewriting for convenience i = ln(λ i )/∆t, the approximate solution of reconstruction matrix of DMD is then given by: where Φ is a matrix consisting of DMD modes φ i , Ω = diag(λ i ) is a diagonal matrix whose enters are continuous-time eigenvalues of the similarity matrix A, b is a vector containing the initial amplitude of each mode, and b = Φ Γ X, Γ denotes the Moore-Penrose pseudoinverse. 5. Finally, signal reconstruction and feature extraction can proceed based on the reconstruction matrix X DMD . We take the first column of X DMD as the recuperative signal that the length is just about half of the original signal. Then, the Fourier transform is applied on the recuperative signal for a spectrogram, determining whether there is a failure frequency.

Rank Estimation Based on Convex Optimization
As in Section 2, non-zero singular values σ 1 , · · · , σ p are arranged in a descending sequence when performing SVD on X. Singular spectrum analysis treats the ahead k values representing the useful signals, and the residual p − k orders for part of the noise. The difficulty lies in the determination of optimized rank k. It cannot perfectly restore the dynamic information with the method of singular spectrum or hard optimal threshold foundation for their truncated rank orders k are smaller than the ideal ones, which is proved in Section 4.1. Here, a convex optimization framework is introduced to estimate the best rank r for Equation (9).
The noise polluted Hankel matrix X can be supposed to be a combination of an expectation matrix and a residual noise matrix: where X E is the desired low rank matrix, and X N is a zero mean Gaussian white noise matrix. Note that for unitary matrices U and V, Φ(X) = Φ(U T XV). Define the sparseness-inducing regularization X − X E F by adding a penalty term λΦ(X). The typical resulting low-rank approximation problem can be written as follows: where λ is a positive regularization parameter balancing the least-square fitting term, and 0 ≤ α ≤ 1/λ. When the function φ(x) is represented by |x|, the convex issue of low-rank approximation problem for rank minimization turns to a typical nuclear norm problem [24]. (14) can be equivalently written as: Dang [24] and Selesnick [25] proved that parametric non-convex penalty function method can be more accurate than using nuclear norm in estimating non-zero singular values, when the signal contains a large noise level. Therefore, the non-convex penalty function is used to obtain the optimized estimation of singular value as the following equation: In this paper, the optimal rank r of singular value, which should be determined in the second step of the DMD algorithm, is obtained by the convex optimization framework.

Dominant Modes Selection Strategy
Reconstruction matrix of DMD is obtained by a superposition of φ i exp( i ∆t)b i , and i is continuous-time eigenvalues of the similarity matrix A, which constitutes a diagonal matrix. Every single i corresponds to a DMD mode φ i . In this part, we introduce a method to select optimal dominant modes for DMD with two steps. Firstly, subtract modes relating to i = ln(λ i )/∆t ≈ 0 from all the dynamic features, and then define a hierarchical application of basic DMD algorithm.
Step 1. By the construction of DMD methodology, b = Φ Γ X, which means that φb renders the columns of X with a dimensionality reduction. Thus, the diagonal matrix of frequencies λ i dictates how the first period of signal X gets altered over time to reconstruct the subsequent periods. It becomes apparent that any period of the signal that does not change or changes very slowly in time must have an associated Fourier mode located near the first period with the frequency i ≈ 0. We regard the slow oscillations as the background modes (noises) by setting a threshold ξ, and produce a representation of the low-frequency reconstruction matrix X i ≤ξ DMD and remaining frequency reconstruction matrix X i >ξ DMD of the form: This observation makes it possible to pick out background modes from all the dynamic features with DMD, thus we only take the remaining frequency reconstruction matrix into account in this first step; thus, Step 2. A hierarchical DMD is introduced to split the reconstructed signal with several levels, inspired by the idea of wavelet packet decomposition. Mathematically, we divide the solution of the DMD approximate solution of Equation (18) into slow modes and fast modes, by splitting the number of i > ξ at its median: where the symbol (1) at the top right corner of X DMD (1) represents the first level, and r 1 equals the optimal rank r minus the number of i ≤ ξ. X S

DMD
(1) and X F DMD (1) are the slow modes matrix and fast modes matrix, respectively. Then, we apply the second segmentation on each of them: where the symbol (SS) at the top right corner represents slow modes in the first level and also the same in the second level, while (SF) represent fast modes in the first level and slow modes in the second level. The solution (20) can be made more precise by accounting for more numbers of segmentation; here, we say level (L). The arbitrary L-th DMD modes can be presented as follows: where l = 1, 2, · · · , L represents the decomposition levels, j = 1, 2, · · · , 2 l−1 represents the number of bins per level, and r l represents the truncation rank of l level. Figure 1 demonstrates the process in terms of the multiresolution DMD solution. Each X DMD (l,j) can be obtained in each level l and each bin j by linear operator A (l,j) : The solution minimizes the error yielding the least-square fit. It can be given similarly by Equation (5).
Sensors 2018, 18, 1972 8 of 15 As in the fifth step of DMD core algorithm in Section 2, Fourier transform can be applied on the recuperative signal for spectrogram with each X DMD (l,j) , determining whether there is a failure frequency.
of bins per level, and l r represents the truncation rank of l level. Figure 1 demonstrates the process in terms of the multiresolution DMD solution. Each ( , ) lj DMD X can be obtained in each level l and each bin j by linear operator ( , ) lj A : The solution minimizes the error yielding the least-square fit. It can be given similarly by formula (5).
As in the fifth step of DMD core algorithm in Section 2, Fourier transform can be applied on the recuperative signal for spectrogram with each ( , ) lj DMD X , determining whether there is a failure frequency. Rewrite the Equation (21) as follows: Define that   Rewrite the Equation (21) as follows: Define that g (l,j) = 1 when there is a failure frequency for X DMD (l,j) , accordingly, g (l,j) = 0 when there is no failure frequency. g (l,j) acts as a sifting operation for each bin j at level l. Ultimately, the flowchart of the improved DMD method is shown in Figure 2.

Analysis of Simulated Signal
There are many kinds of dynamic models simulating bearing fault signal. Comparatively speaking, the model first established by McFadden [26] and later refined by Randall [27] can better reflect the characteristics of rolling bearing vibration. According to the structure characteristics, shortterm impact signal should be produced when bearing parts revolve through the fault location. The

Analysis of Simulated Signal
There are many kinds of dynamic models simulating bearing fault signal. Comparatively speaking, the model first established by McFadden [26] and later refined by Randall [27] can better reflect the characteristics of rolling bearing vibration. According to the structure characteristics, short-term impact signal should be produced when bearing parts revolve through the fault location. The impact motivates the bearing system oscillating by its natural frequency with high frequency attenuation. Define t as a period of two intervals, S(t) as the natural frequency oscillation function, and A k as the i-th amplitude of shock response. The interference of additive noise n t (zero mean additive background noise) should be taken into consideration along with the simulation signal in view of the poor working environment of rolling bearings. Therefore, the mathematical model of fault rolling bearing can be described as: where a k is the k-th impact energy; γ and ϕ A are the initial phases; f m is the modulation frequency; B is the attenuation coefficient depended on the bearing system; τ k is the time lag from its mean period due to the presence of tiny fluctuation, and c A is a random constant. Most frequently, the fault of rolling bearing appears as the damage in outer race, inner race, and/or the surface of rolling elements. Generally, three common types of faults on outer race, inner race and rolling elements can be modeled as f m = 0, f m = f r , f m = f c , respectively. Here, f r and f c are denoted as rotational frequency and cage frequency. The time-domain fault simulation signal of the inner race is plotted in Figure 3a, added by the Gaussian white noise so the SNR ratio is 6. The parameters selection of Equation (24) is listed in Table 1.  Figure 3a, added by the Gaussian white noise so the SNR ratio is 6. The parameters selection of Equation (24) is listed in Table 1.     We choose ξ = 0.04 (retain 90% of i ) and take the decomposing level L as 2. Frequency spectrum with improved DMD plotted in Figure 3d, while FFT frequency and envelope spectrum are also plotted for comparison in Figure 3b,c, respectively. Inner race fault frequency can be identified, but the modulation frequency (FM) is not clear connected to the intuition with the FFT frequency spectrum. Inner race frequency and rotational modulation frequency are visibly presented in the envelope spectrum, but there is still a certain amount of noise frequency surrounding the principal frequencies.
By adopting our improved DMD, not only can the frequency of inner race fault be clearly visualized, but there is also a little amount of noise outlined in Figure 3d compared with Figure 3c.
Variations of cross correlation coefficient along with the truncated rank order are shown in Figure 4. The best truncation rank number of SVD, hard optimal threshold and convex optimization DMD were also marked on the curve. It can be concluded that the sparse optimization framework outperforms the other methods in the quality of signal recovery.  Comparisons between improved DMD with other conventional spectral analysis methods are characterized in Figure 5. Figure 5a is the frequency spectrum of the original situation signal without noise taking by FFT. Figure 5b is the result of SVD, and we select the first four singular values for signal reconstruction. Figure 5c is the multi-scale WT signal spectrum; here, we use wavelet function db4 with three scale wavelet decomposition [5], and choose the high frequency component of the second layer for signal reconstruction. Figure 5d is the frequency spectrum by applying FFT on superposed IMFs of EMD, and we choose IMFs of 1, 4, 5, 6 for superposition.  . Cross correlation coefficient between DMD reconstruction signal and original simulation signal with different truncation rank r. Cross correlation coefficient is first ascending and then slightly declining. The rank (r = 4) in blue circle point obtained by singular spectrum; the rank (r = 13) with the green triangle point obtained by the hard optimal threshold; the rank (r = 231) with a red cross mark obtained by our method.
Comparisons between improved DMD with other conventional spectral analysis methods are characterized in Figure 5. Figure 5a is the frequency spectrum of the original situation signal without noise taking by FFT. Figure 5b is the result of SVD, and we select the first four singular values for signal reconstruction. Figure 5c is the multi-scale WT signal spectrum; here, we use wavelet function db4 with three scale wavelet decomposition [5], and choose the high frequency component of the second layer for signal reconstruction. Figure 5d is the frequency spectrum by applying FFT on superposed IMFs of EMD, and we choose IMFs of 1, 4, 5, 6 for superposition. Figure 5e was obtained by our improved DMD method, while ξ = 0.04, L = 3. SNR of the reconstruction signal is calculated as 7.8, 9.5, 6.6, 14.2 from Figure 5b-e successively. Our method has a tremendous advantage in both denoising and fault feature extraction.
second layer for signal reconstruction. Figure 5d is the frequency spectrum by applying FFT on superposed IMFs of EMD, and we choose IMFs of 1, 4, 5, 6 for superposition.

Analysis of Measured Signal
Vibration signals of the bearing system in an industrial spot are more complex than the dynamic simulation signal [28]. To verify the effect of fault features extraction, the improved DMD algorithm

Analysis of Measured Signal
Vibration signals of the bearing system in an industrial spot are more complex than the dynamic simulation signal [28]. To verify the effect of fault features extraction, the improved DMD algorithm is applied on real environment fault signals. An integrated fault apparatus of gear-bearing transmission is shown in Figure 6. The high-speed shaft is driven by an AC motor (550 w, 220 v/50 Hz) with a belt, and the speed of the motor is 1450 r/min. A corrosion pit with the depth of 1.0 mm was fabricated by electric spark processing in the out race of the testing bearing (deep groove ball bearing, bearing number 6207). An acceleration sensor (PCB-352C33, Depew, NY, USA) is installed on the bearing pedestal along the vertical direction. The vibration signal is then obtained by a machinery health analyzer CSI2130 (Emerson, St. Louis, MO, USA) with sampling frequency of 16,384 Hz. The number of rolling elements is Z = 9, and ball diameter is d = 11.1 mm, pitch diameter of the testing bearing is D = 53.5 mm. Thus, rotation frequency and outer ring fault frequency can be calculated as 24.17 Hz and 87.01 Hz, respectively [29].  A total number of 10,000 samples, shown in Figure 7a in the time domain, are analyzed with improved DMD. The FFT spectrum is shown in Figure 7b. Pang [30] proved the superiority of the envelope spectrum over the FFT in the incipient rolling bearing fault diagnosis. For comparison, the envelope spectrum of the measured signal is shown in Figure 7c.
Obviously, we can't judge whether there is a bearing fault from Figure 7b for a large amount of interference frequencies presented in the frequency spectrum. Figure 7c is slightly better than A total number of 10,000 samples, shown in Figure 7a in the time domain, are analyzed with improved DMD. The FFT spectrum is shown in Figure 7b. Pang [30] proved the superiority of the envelope spectrum over the FFT in the incipient rolling bearing fault diagnosis. For comparison, the envelope spectrum of the measured signal is shown in Figure 7c.  Obviously, we can't judge whether there is a bearing fault from Figure 7b for a large amount of interference frequencies presented in the frequency spectrum. Figure 7c is slightly better than Figure 7b, but there are still some noise frequencies presented in the frequency spectrum. Meanwhile, in order to illustrate the effectiveness of our method, the frequency spectrum of WPT is applied and the results are shown in Figure 8. Here, we use wavelet function db15 with eleven layers of decomposition. Though it is obvious that the fault frequency f o and its harmonics (2 f o , 3 f o , 4 f o , 5 f o ) can be identified in Figure 8, where f o is the frequency of the outer race, the noise is still present in the acquired signal, as we can see in the red elliptical mark.  Subsequently, we apply our improved DMD method on the measured signal with chosen parameters m = 5000, ξ = 0.05, L = 7. The non-fault signal modes are filtered out and the useful signals with characteristic frequencies are retained by examining each bin (total sixty-four pins) of the seventh level. Then, the reconstruction matrix can be obtained by Equation (23). Figure 9 shows the spectrum of our improved DMD method. Obviously, not only are the fault frequency f o and its harmonics (2 f o , 3 f o , 4 f o , 5 f o ) easy to be identified, but there is also little interference frequency close to them. That is to say, we are more confident to determine that the outer race of the testing rolling bearing has failed, which is highly consistent with the actual situation.

Conclusions
This paper puts forward an improved DMD decomposition algorithm, which successfully applied on the bearing fault signal. Main solution of our work includes the following two aspects: (1) optimal rank truncation number of the similarity matrix is obtained by using convex optimization through introducing a parameterized non-convex penalty function. Our rank determinative method is more perfectible than others in feature extraction with standard DMD algorithms; (2) in order to archive the dominant modes selection strategy, we firstly subtract slow oscillations modes by setting a threshold, and then define a hierarchical application of the basic DMD algorithm. We apply our improved DMD framework both on simulated and measured signals for roller bearing signal fault diagnosis. SNR is better than other classical noise reduction methods after the bearing simulation signal is processed by our improved DMD algorithm. In addition, the frequency spectrum of the reconstruction signal is more readable and interference-free. Results show that our method has a good application prospect in denoising and feature extraction with a mechanical signal.

Conclusions
This paper puts forward an improved DMD decomposition algorithm, which successfully applied on the bearing fault signal. Main solution of our work includes the following two aspects: (1) optimal rank truncation number of the similarity matrix is obtained by using convex optimization through introducing a parameterized non-convex penalty function. Our rank determinative method is more perfectible than others in feature extraction with standard DMD algorithms; (2) in order to archive the dominant modes selection strategy, we firstly subtract slow oscillations modes by setting a threshold, and then define a hierarchical application of the basic DMD algorithm. We apply our improved DMD framework both on simulated and measured signals for roller bearing signal fault diagnosis. SNR is better than other classical noise reduction methods after the bearing simulation signal is processed by our improved DMD algorithm. In addition, the frequency spectrum of the reconstruction signal is more readable and interference-free. Results show that our method has a good application prospect in denoising and feature extraction with a mechanical signal.