Optimized Dynamic Mode Decomposition via Non-Convex Regularization and Multiscale Permutation Entropy

Dynamic mode decomposition (DMD) is essentially a hybrid algorithm based on mode decomposition and singular value decomposition, and it inevitably inherits the drawbacks of these two algorithms, including the selection strategy of truncated rank order and wanted mode components. A novel denoising and feature extraction algorithm for multi-component coupled noisy mechanical signals is proposed based on the standard DMD algorithm, which provides a new method solving the two intractable problems above. Firstly, a sparse optimization method of non-convex penalty function is adopted to determine the optimal dimensionality reduction space in the process of DMD, obtaining a series of optimal DMD modes. Then, multiscale permutation entropy calculation is performed to calculate the complexity of each DMD mode. Modes corresponding to the noise components are discarded by threshold technology, and we reconstruct the modes whose entropies are smaller than a threshold to recover the signal. By applying the algorithm to rolling bearing simulation signals and comparing with the result of wavelet transform, the effectiveness of the proposed method can be verified. Finally, the proposed method is applied to the experimental rolling bearing signals. Results demonstrated that the proposed approach has a good application prospect in noise reduction and fault feature extraction.


Introduction
Generally, amplitude-modulated and/or frequency-modulated (AM-FM) multi-component signals collected by sensors are interfered with by background noise, resulting lower signal-to-noise ratio (SNR) [1]. The reason for the low SNR signals obtained by a determinate acquisition system are usually as follows: firstly, when a rolling bearing is in the infantile fault period, the fault feature is inconspicuous and buried in the background noise. Secondly, signal energy as well as incipient failure features will attenuate because of the long distance between the fault source and the sensor, which usually is located on the bearing pedestal. Moreover, the characteristic signal will also be attenuated when there is a long transmission path from the sensor to the acquisition device [2]. Thirdly, the signals collected by sensors contain strong background noise. The main transmission system of a machine is a complex system composed of multiple components such as motors, couplings, multiple reduction gearbox. The vibration of all components will be collected by the sensor. Noisy signals make it difficult to identify the fault features in signal processing, therefore, it is necessary to perform denoising and characterize fault feature accurately on the low SNR vibration signal.
In the last few decades, many theoretical denoising methods have been proposed and are widely used in medical signaling, image processing and mechanical engineering signal processing [3][4][5]. Donolo and John-stone [6], taking the advantage of the spatial diversity of wavelet basis functions, proposed a denoising method with a threshold based on wavelet transform. Their methods concentrate the energy of the ideal signal, benefiting from the effective wavelet, guaranteeing a certain de-noising effect. However, the selection of threshold functions is the main problem with wavelet de-noising algorithms, which usually generate a deviation between the calculated result and the real one. Empirical mode decomposition (EMD) [7] can be treated as a typical adaptive signal decomposition method of frequency-selection and filtering. The signal is decomposed into a series of intrinsic mode functions (IMFs) arranged from high frequency to low frequency. The source signal can be processed by abandoning high-frequency components for the consideration that noise components are generally distributed in the high-frequency domain. However, as EMD has the defects of modal aliasing, the ideal signal and noise coexist in the selected IMFs. This would result in the loss of useful information. Ensemble empirical mode decomposition (EEMD) [8] was developed to solve the mode-mixing problem of EMD by adding noise to the source signal. It is more accurate and effective for data decomposition than EMD, but it prompts new trouble by being time consuming. Variable mode decomposition (VMD) [9,10] is a signal decomposition method based on wiener filtering, one dimensional Hilbert transform and heterodyne demodulation analysis, but the decomposition results are restricted by the selection of penalty parameters and the number of components. Singular value decomposition (SVD) [11], as a data-processing method without prior knowledge, has been successfully applied to signal denoising and proved to be effective. However, there exist two pivotal problems: one is that how to determine the effective rank order after decomposition, and the other one is that how to achieve the reconstruction matrix. Meanwhile, the noise reduction effect of SVD is not prominent when analyzing the low SNR signal.
Dynamic mode decomposition (DMD) [12] is an equation-free Koopman frequency analysis technique based on the theory of SVD and mode decomposition. It can extract the spatiotemporal coherent characteristics by decomposing a dynamic model into a series of single-frequency non-orthogonal modes based on its inherent space-time. The reconstructed signal can characterize the useful information of the original signal perfectly by superposing the filtered modes according to time coefficients. The algorithm is suitable for extracting dynamic information of nonlinear systems [13]. The instantaneous components and high-frequency noise components can be effectively removed, due to the dynamic delay processing in the algorithm [12]. The DMD algorithm has been well applied in the fluid mechanics community. Rowley [12] applied DMD in the jet cross flow field. The results show that DMD theory can extract the shedding frequency while the linear stability analysis method cannot. In addition, DMD can separate the shear layer and wall vortices, whereas the proper orthogonal decomposition (POD) algorithm is ineffective. Schmidt [13] applied DMD to the flow field in a variety of environmental conditions, indicating that the DMD theory is effective for feature extraction with numerical simulation and measured velocity field data. Moreover, DMD can separate video frames into a background and multiple foregrounds successfully when applied to video processing [14]. DMD can also be used for biometrics to detect fraudulent samples [15]. In the field of robotics and neuroscience, DMD has been used to estimate the perturbation of human-robot interactions [16], and the coherent modes in large-scale neural recording have been extracted [17]. Nowadays, theoretical research with DMD mainly focuses on two aspects. In the first aspect, a variety of improved methods are proposed based on the standard DMD to solve the problem that DMD cannot accurately extract the dynamic information due to the influence of signal noise. These strategies range from noise-corrected dynamic mode decomposition with control [18], total least-squares DMD [19], compressed sensing DMD [20] to sparsity-promoting DMD [21]. The improved DMD algorithm can extract fluid dynamic information Entropy 2018, 20, 152 3 of 19 more accurately in varying degrees. In the second aspect, in view of the large data processing problems, researchers aim at reducing the computational complexity and the program storage space in either pre-processing or post-processing, using strategies including random DMD [22], random low-rank DMD [23], and tensor-based DMD [24]. The standard DMD and its improved algorithm are more suitable in analyzing periodic or quasi-periodic signals, but it is prone to generate some false mode components. Therefore, the theory of fault vibration signal processing and fault mode recognition with DMD needs further study.
Essentially, DMD is a hybrid algorithm based on mode decomposition and SVD, and it inevitably inherits the drawbacks of these two algorithms, including the selection strategy of truncated rank order and DMD modes. The number of truncation rank for the similar matrix needs to be artificially and blindly selected in advance; whether the number value is appropriate or not is of crucial importance for accurately representing the dynamic information of reconstructed signal. The measured vibration signal with the characteristic of non-linear and non-stationary is coupled with multi-components and contain strong environmental noise. To achieve the purpose of fault feature extraction with DMD, it requires separation of the fault feature components effectively from the DMD modes. It is necessary to establish an adaptive optimization algorithm based on DMD to achieve an accurate truncation of rank and screen the single-frequency non-orthogonal modes effectively.
In this paper, an optimized DMD algorithm is proposed for extracting the fault features drowned in the multi-component coupled and noisy mechanical signal. The idea of sparse optimization is adopted to select the optimal rank for the solution of truncated rank order selection problem. By introducing the constraint of non-convex regulation to the rank function, the similar matrix has low rank attribute, making the ideal components (signals) more centralized. As for the fact that the noise's multiscale permutation entropy (MPE) is larger than the ideal component's, the low rank modes are selected corresponding to faults and inherent features by a threshold, solving the difficulty in selecting useful DMD modes. Then the signal is reconstructed by the low rank DMD modes, and Fourier transform (FT) is conducted to achieve the purpose of denoising and fault diagnosis of the noisy mechanical signal.
The structure of this paper is as follows. Section 2 introduces the basic theory, including the standard algorithm of DMD, the sparse optimization algorithm with non-convex optimization regulation, and MPE. In Section 3, the proposed algorithm is applied to the normal and faulty bearing noisy simulation signals to verify its effectiveness. Analysis results of two experimental bearing fault signals are described in Section 4. Conclusions are summarized in Section 5.

Methodology
In this part, the standard DMD algorithm is introduced based on time series in fault diagnosis. To solve the problem of selecting the optimal rank, a technique based on sparse optimization is proposed. The details are described as in Section 2.2. Then, we introduce the basic algorithm of MPE, and show the basic flow chart in the form of illustration.

DMD with Time Series Signal
Supposing that a signal S is acquired by numerical simulation or a sensor with equal intervals between two successive sampling points, S = (x 1 , Hankel matrix X can be obtained by a sliding window over the corresponding vector S. Decomposing the Hankel matrix into two shift-stack matrices as in the form of Equation (2).
Assume that the two continuous sequence matrices satisfy a mapping relation with optimal linear operator. DMD algorithm utilizes a low rank expression of matrix A to capture the potential dynamic characteristics.
A can be also regarded as an operator to approximate the dynamic characteristics when the data are produced in nonlinear systems [13]. Matrix A usually contains many complex entries if S is presented in high-dimensional space. The DMD algorithm finds an optimal similarity matrix F ∈ C r×r by projecting the matrix A with its r-th order eigenvectors according to POD.
where U * is the complex conjugate transpose of POD mode U, and U is obtained by SVD of matrix X t .
where U ∈ C m×r , V ∈ R r×(n−1) are known as the left and right eigenvectors respectively, U * U = I, V * V = I, The symbol * denotes the complex conjugate transpose. Σ r ∈ R r×r is a diagonal matrix containing r non-zero eigenvalues {σ 1 , σ 2 , · · · σ r } on its diagonal. The matrix F is determined by minimizing the Frobenius norm of the error between X t+1 and AX t .
Optimal solution of the above formula can be got directly.
Equation (7) was proposed by Schmid [13], while the literature also introduced the application of DMD algorithm. Then a matrix F DMD ∈ R r×r with lower dimension is obtained. After the similar transformation above, F DMD and A are almost equivalent with each other. Hence, the eigenvectors and eigenvalues of A are replaced by the eigenvectors and eigenvalues of F DMD .
Matrix F DMD optimizes the low-dimensional approximation of the internal data mapping matrix A by the POD subspace modes of X t . The dynamic information of the r-dimensional subspace is expressed as: Entropy 2018, 20, 152

of 19
Perform eigenvalue decomposition on the similarity matrix F DMD : where W = [ω 1 , ω 2 , · · · , ω r ] ∈ R r×r are the eigenvectors of similarity matrix F DMD , and Λ = diag([λ 1 , λ 2 , · · · , λ r ]) ∈ R r×r is a diagonal matrix containing the corresponding complex eigenvalue λ i . The evolution of the signal characteristics can be characterized by similarity matrix. In addition, the i-th eigenvector of original operator is presented by relevant feature vector of similarity matrix.
Equation (10), the project DMD approximate solution, is often called standard DMD mode. By firstly rewriting for convenience i = ln(λ i )/∆t, the approximate solution of reconstruction matrix of DMD is then given by: where Φ is a matrix consisted of DMD modes φ i , Ω = diag(λ i ) is a diagonal matrix whose entries are continuous-time eigenvalues i of the similarity matrix F DMD , b is a vector containing the initial amplitude of each mode, b = Φ Γ x t , Γ denotes the Moore-Penrose pseudoinverse.

Sparse Optimized DMD via Non-Convex Regularization
As described in Section 2.1, r-th non-zero eigenvalues of the similarity matrix F DMD are retained to reconstruct the analytical signal, and r is the truncation rank order for the SVD of X t . In the process of standard DMD, the rank of r needs to be manually preset. It plays a crucial role in accurately expressing the dynamic features of the original signal. If r is too large, it may reconstruct some unnecessary sparse components, such as noise, which cannot play a good denoising effect. Moreover, it increases the storage space and decreases the calculating efficiency. On the contrary, if r is too small, it will inevitably filter out useful low-rank components that symbolize the system dynamic information, failing to extract the signal physical characteristics. The difficulty lies in the determination of optimized truncation rank r.
Recently, many sparse optimization algorithms have been proposed in the fields of signal and image processing [16], compressive sensing [17], machine learning [25], and data mining [26] for consideration that the optimization variables have some sparse structures. Sparse optimization makes it possible to reconstruct high-dimensional signals and extract potential information from a small amount of data. Meanwhile, sparse optimization can greatly accelerate the calculation speed in large-scale optimization problems. Based on DMD theory, this paper introduces a parametric sparse optimization framework for the optimal low-rank approximation of X t .
If the shift-stack Hankel matrix X t take the form of where ∼ X t is the ideal clean matrix containing all the dynamical characteristics of the vibration system. N t is a random noisy matrix. A typical sparse optimization framework with minimal low rank matrix approximation is defined as follows.
where λ > 0 is a regularization coefficient, Π is a sparse regularization. The above minimum low-rank approximation problem is transformed into a standard problem of minimizing nuclear penalty when Theoretically, the nuclear norm optimization framework is convex, and it can be directly solved by taking traditional SVD on the shift-stack Hankel matrix X t . However, the solution of convex norm-based nuclear norm optimization often leads to underestimate the rank of the original matrix [27]. Selesnick [28] proposed a sparse optimization method based on non-convex regularization to denoise a noisy signal. The calculation process can guarantee the strict convexity of Equation (13) and generates a unique solution. Inspired by the idea of [28,29], this paper adopts a sparse framework of non-convex optimization to obtain the best truncated rank of the shift-stack matrix. Non-convex optimization can approximate the non-zero singular value of the calculation matrix more accurately than nuclear norm optimization framework. Moreover, the non-convex method is more robust to strong background noise then nuclear norm optimization.
regularization of sparse optimization. Non-convex penalty function must meet the following five properties: Chen [30] listed various penalty functions satisfying the above assumption in Table II. Here we employ the partly quadratic penalty function [27], defined as: where α is a non-convex penalty parameter and controls the degree of non-convexity of φ, satisfying 0 ≤ α ≤ 1 λ . Finally, the sparse optimization objective function based on non-convex regularization is defined as Equation (15). The optimal truncation rank r of singular value which should be determined in the process of DMD algorithm is obtained by the non-convex optimization framework.

Optimized DMD Modes via MPE
The result of the DMD algorithm obtains a series of single-frequency modes that represent the dynamics of the original system. DMD modes are similar to the IMFs obtained by EMD, while each DMD mode corresponds to a single frequency. It can also be regarded as the vibration modes in mechanical vibration. All DMD modes express the inherent characteristics, fault information and noise background components (introduced by the shift-stack matrix X t+1 ) of the original system information, making it difficult to diagnose mechanical fault features effectively. Approximate entropy (AE) [31] reflects the time series' complexity by calculating the approximate degree of two points in the given dimension and the probability of new mode when the dimension changes. AE has the characteristics of stable calculation results with short data sequence. Sample entropy [32,33] is an improved algorithm, and it performs better than AE to maintain the consistency of the calculation results. Permutation entropy (PE) [33] is an algorithm based on time series phase space reconstruction and neighborhood value comparison, which can effectively describe the complexity of the system. PE has the advantages of simple algorithm, fast computing speed, and reliable calculation results [34]. MPE is an improved algorithm based on PE. In addition, it is especially suitable for nonlinear data. In this paper, MPE threshold is used to identify and remove the modes of noise components. The remaining low-rank components are reconstructed by DMD to achieve the purpose of noise reduction and fault feature extraction for noisy fault signals.
The basic idea is that it first gets the time series coarse-grained in time-scale, then calculates PE on the coarse-grained data. The process of MPE is briefly described here. For more detailed information about MPE, it can be referred to in [33,35].
where S is the scale factor, S = 1, 2, . . ., when S = 1, the coarse-grained sequence is the original sequence, and the result of MPE is PE. N s means rounding N s . If S ≥ 2, the original sequence is coarse-grained with the length of N s . Secondly, the phase space reconstruction is carried on the coarse-grained time series.
where i = 1, 2, · · · , N. d ≥ 2 is the embedding dimension; τ is the delay time; Each row vector in the reconstructed phase space matrix is a reconstruction vector, and there are d reconstruction vectors in total. Considering every two data in the form of ascending or descending order, the j-th reconstruction vector has d! possible ordinal patterns. Finally, the MPE value H is defined as: where p i represents the frequency of a particular sequence pattern in all reconstruction vectors. Figure 1 shows the calculation process of p i with a few capacity sample data sequences. The regular parameter ln(d!) ensures that the maximum value of H(N, d, τ) is 1.
MPE is a measure of system complexity. The larger the entropy, the more complex the system components, and the closer the system is to the stochastic signal. Conversely, the smaller the entropy value, the simpler the system components appears. For the fixed-length analysis sequence, the result of MPE dependent on the scale factor s and the embedding dimension d. Generally, if d is too small, the reconstructed vector contains little information, the algorithm loses its validity and cannot detect the kinetic mutation of the sequence. If d is too large, the reconstruction of phase will homogenize the time series, which is not only time-consuming but also unable to reflect the small changes of the sequence. Bandt [33] and Zunino [34] suggested that d should be in the range 3-7. Time delay τ has little effect on the MPE of time series [35,36], and in most literatures [36][37][38], authors employ τ = 1. In this paper, we employ s = 5, d = 6, τ = 1 in the calculation of MPE. In [36,38,39], the length N of time series are suggested as N > 5d!, thus the sample points of the simulation signal should be larger than 3600.
Ultimately, the flowchart of the algorithm based on DMD via non-convex regularization and MPE is shown in Figure 2.  MPE is a measure of system complexity. The larger the entropy, the more complex the system components, and the closer the system is to the stochastic signal. Conversely, the smaller the entropy value, the simpler the system components appears. For the fixed-length analysis sequence, the result of MPE dependent on the scale factor s and the embedding dimension d . Generally, if d is too small, the reconstructed vector contains little information, the algorithm loses its validity and cannot detect the kinetic mutation of the sequence. If d is too large, the reconstruction of phase will homogenize the time series, which is not only time-consuming but also unable to reflect the small changes of the sequence. Bandt [33] and Zunino [34] suggested that d should be in the range 3-7.
Time delay τ has little effect on the MPE of time series [35,36], and in most literatures [36][37][38], authors employ 1 τ = . In this paper, we employ In [36,38,39], the length N of time series are suggested as , thus the sample points of the simulation signal should be larger than 3600. Ultimately, the flowchart of the algorithm based on DMD via non-convex regularization and MPE is shown in Figure 2.

Algorithm Application in Fault Bearing Simulation Signal
Here we adopt the dynamic model of bearing fault signal established by Randall [40]. The mathematical model of fault rolling bearing is described in Equation (19). According to the characteristics of bearing structure, instantaneous impact signal should be generated when bearing parts pass through the fault location. The impact motivates the bearing system oscillating by its natural frequency with high-frequency attenuation. Define T as a period of two intervals, ( ) S t as the natural frequency oscillation function, and k A as the i -th amplitude of shock response. The interference of additive noise t n (zero mean additive background noise) should be taken into consideration in the simulation signal in view of poor working environment of rolling bearings. Therefore, the mathematical model of fault rolling bearing can be described as:

Algorithm Application in Fault Bearing Simulation Signal
Here we adopt the dynamic model of bearing fault signal established by Randall [40]. The mathematical model of fault rolling bearing is described in Equation (19). According to the characteristics of bearing structure, instantaneous impact signal should be generated when bearing parts pass through the fault location. The impact motivates the bearing system oscillating by its natural frequency with high-frequency attenuation. Define T as a period of two intervals, S(t) as the natural frequency oscillation function, and A k as the i-th amplitude of shock response. The interference of additive noise n t (zero mean additive background noise) should be taken into consideration in the simulation signal in view of poor working environment of rolling bearings. Therefore, the mathematical model of fault rolling bearing can be described as: where a k is the k-th impact energy; γ and ϕ A are the initial phases; f m is the modulation frequency; B is the attenuation coefficient depended on the bearing system; τ k is the time lag from its mean period due to the presence of tiny fluctuation, and c A is a random constant.
Time domain simulation signals of rolling bearing with normal bearing and inner race, outer race and roller faults are respectively plotted in Figure 3. It should be pointed out that Gaussian white noise is added so as to SNR reach to 0.5 precisely. The parameters selection of Equation (19) is listed in Table 1, where f r and f c are denoted as rotational frequency and cage frequency. f i , f o , f b respectively represent the failure frequency of the inner circle, the outer ring, and the rolling element. The sampling frequency is selected as 5 KHz, and the number of total sampling points is 8000.   The FT spectrum analysis of the bearing noisy signal is shown in Figure 4. There are a large number of noisy spectral components, as shown by the blue dotted boxes in Figure 4a-d, which severely affect the feature identification of the fault. Rotational frequency and its resonance frequencies (as marked in blue circles in Figure 3a) should be easy to recognize in the spectrum domain when the bearing is under healthy conditions. However, numbers of noise frequencies revolve around the center frequencies, as marked in the brown dotted box in Figure 3a, resulting in the fact that we cannot interpret the signal properly. The same situation appears in the bearing inner race fault signal spectrum, as shown in Figure 3c. In general, the three common types of faults on the inner race, the outer race and the rolling elements can be modeled as m respectively. That is to say, rotational frequency, 0, cage frequency should modulate their center frequency, respectively. In Figure 3b,d, the center frequency belongs to the inner race and balls can be identified, but it is difficult to identify their respective modulation frequency as they are submerged by the noise components. The FT spectrum analysis of the bearing noisy signal is shown in Figure 4. There are a large number of noisy spectral components, as shown by the blue dotted boxes in Figure 4a-d, which severely affect the feature identification of the fault. Rotational frequency and its resonance frequencies (as marked in blue circles in Figure 3a) should be easy to recognize in the spectrum domain when the bearing is under healthy conditions. However, numbers of noise frequencies revolve around the center frequencies, as marked in the brown dotted box in Figure 3a, resulting in the fact that we cannot interpret the signal properly. The same situation appears in the bearing inner race fault signal spectrum, as shown in Figure 3c. In general, the three common types of faults on the inner race, the outer race and the rolling elements can be modeled as f m = f r , f m = 0, f m = f c , respectively. That is to say, rotational frequency, 0, cage frequency should modulate their center frequency, respectively. In Figure 3b,d, the center frequency belongs to the inner race and balls can be identified, but it is difficult to identify their respective modulation frequency as they are submerged by the noise components.
domain when the bearing is under healthy conditions. However, numbers of noise frequencies revolve around the center frequencies, as marked in the brown dotted box in Figure 3a, resulting in the fact that we cannot interpret the signal properly. The same situation appears in the bearing inner race fault signal spectrum, as shown in Figure 3c. In general, the three common types of faults on the inner race, the outer race and the rolling elements can be modeled as m respectively. That is to say, rotational frequency, 0, cage frequency should modulate their center frequency, respectively. In Figure 3b,d, the center frequency belongs to the inner race and balls can be identified, but it is difficult to identify their respective modulation frequency as they are submerged by the noise components.  In this paper, the parameters λ, a are selected as 1.2, 0.5 respectively in the process of sparse optimization using non-convex optimization regularization. The MPE threshold is selected as 0.7 (refer to the conclusion section). We recover the signals by DMD modes whose MPE values are less than the threshold. Spectrums were obtained by performing FT on the reconstruction signals, as shown in Figure 5. In this paper, the parameters , a λ are selected as 1.2, 0.5 respectively in the process of sparse optimization using non-convex optimization regularization. The MPE threshold is selected as 0.7 (refer to the conclusion section). We recover the signals by DMD modes whose MPE values are less than the threshold. Spectrums were obtained by performing FT on the reconstruction signals, as shown in Figure 5. Meanwhile, in order to illustrate the effectiveness of our method, multi-scale wavelet transform is applied on the four noisy signals by taking the wavelet function db1, WT results are shown in Figure 6. Though it is obvious that the harmonic frequencies of r f and o f can be identified in Figure 6a,c, there are still some frequencies attributed by the noise involving around the midfrequencies, as can we see in the brown dotted box. Meanwhile the spectra of the proposed algorithm are cleaner and more readable then the results of FT and WT. Not only can we pick out their center and modulation frequencies in Figure 5, but also there is few noise frequencies surrounding them. Our method has tremendous advantages in both de-noising and fault feature extraction compared with FT and WT.  Meanwhile, in order to illustrate the effectiveness of our method, multi-scale wavelet transform is applied on the four noisy signals by taking the wavelet function db1, WT results are shown in Figure 6. Though it is obvious that the harmonic frequencies of f r and f o can be identified in Figure 6a,c, there are still some frequencies attributed by the noise involving around the mid-frequencies, as can we see in the brown dotted box. Meanwhile the spectra of the proposed algorithm are cleaner and more readable then the results of FT and WT. Not only can we pick out their center and modulation frequencies in Figure 5, but also there is few noise frequencies surrounding them. Our method has tremendous advantages in both de-noising and fault feature extraction compared with FT and WT. Meanwhile, in order to illustrate the effectiveness of our method, multi-scale wavelet transform is applied on the four noisy signals by taking the wavelet function db1, WT results are shown in Figure 6. Though it is obvious that the harmonic frequencies of r f and o f can be identified in Figure 6a,c, there are still some frequencies attributed by the noise involving around the midfrequencies, as can we see in the brown dotted box. Meanwhile the spectra of the proposed algorithm are cleaner and more readable then the results of FT and WT. Not only can we pick out their center and modulation frequencies in Figure 5, but also there is few noise frequencies surrounding them. Our method has tremendous advantages in both de-noising and fault feature extraction compared with FT and WT.

Algorithm Application in Experimental Signals of Rolling Bear
Signals of bearing systems collected by sensors from practice spots are much more complex than the simulation signals, which contain multi-component signals and background noises transmitted by other components. In order to verify the effectiveness of the proposed algorithm in signal de-noising and fault diagnosis, we apply the proposed algorithm to the actual bearing signals. We test the bearing signals from two experimental environments. First, the bearing signal is from the published bearing test data at the university of Cincinnati [41]; the second comes from our rotating machinery vibration analysis test rig.
The Intelligent Maintenance Systems of Cincinnati is shown in Figure 7. Four Rexnord ZA-2115 double row bearings were installed on a shaft, which was driven by an AC motor at the speed of 2000 RPM via sever rub belts. Each row of the bearing has Z = 16 roller elements. The circle diameter of the bearing bitch and roller elements are respectively D = 28.15 mm, d = 3.31 mm, and the bearing's contact angle is α = 15.17 • . They utilized high sensitivity quartz ICP accelerometers (PCB 353B33), installed on the bearing housing, collecting the vibration signals and NI DAQ Card 6062E collecting the signals.
We take their experimental data of fourth channel in Set No. 3 to verify our method. The total number of the time series data is 20,480, and the sampling frequency is 20 KHz. Finally, the bearing failed because of outer race fault. The rotation frequency and outer ring fault frequency can be respectively calculated as f r = 33.33 Hz, f o = 236.17 Hz by formulas [42].
The time domain and the frequency domain of the selected signals are shown in Figure 8. Obviously, it is hard to identify whether there is a bearing fault from Figure 8b for such large amounts of interference frequency presented in the frequency spectrum.
Then in Figure 9. We recover the wanted signals by DMD modes whose MPE values are less than 0.7. Spectrums are obtained by performing FT on the reconstruction signal, as shown in Figure 10.
machinery vibration analysis test rig. The Intelligent Maintenance Systems of Cincinnati is shown in Figure 7. Four Rexnord ZA-2115 double row bearings were installed on a shaft, which was driven by an AC motor at the speed of 2000 RPM via sever rub belts. Each row of the bearing has The time domain and the frequency domain of the selected signals are shown in Figure 8. Obviously, it is hard to identify whether there is a bearing fault from Figure 8b for such large amounts of interference frequency presented in the frequency spectrum. We take their experimental data of fourth channel in Set No. 3 to verify our method. The total number of the time series data is 20480, and the sampling frequency is 20 KHz. Finally, the bearing failed because of outer race fault. The rotation frequency and outer ring fault frequency can be respectively calculated as 33.33Hz, 236. 17Hz by formulas [42].
The time domain and the frequency domain of the selected signals are shown in Figure 8. Obviously, it is hard to identify whether there is a bearing fault from Figure 8b for such large amounts of interference frequency presented in the frequency spectrum.  Figure 9. We recover the wanted signals by DMD modes whose MPE values are less than 0.7. Spectrums are obtained by performing FT on the reconstruction signal, as shown in Figure 10.    As shown in Figure 10 there are a few characteristic frequencies located in the frequency spectrogram provided by our algorithm. Outer race failure frequency o f and its resonant frequency 2 o f can be easily captured. Also, some resonant frequencies of the rotation frequency r f are presented in the spectrum. However, regrettably, some of the frequencies we cannot account for are still in Figure 10, which are shown in the red box, revealing that the signal of the actual bearing system is complex. Yet we still got the failure frequency of the bearing test signal, which is in line with experimental results. Wavelet packet decomposition and EEMD are applied on the experimental signal, as shown in Figure 11. Figure 11a is the spectrum of the recover signal transformed by wavelet packet, with db15 and the default threshold. The optimal amplitude of the noise is set as 0.2 and the ensemble size is As shown in Figure 10 there are a few characteristic frequencies located in the frequency spectrogram provided by our algorithm. Outer race failure frequency f o and its resonant frequency 2 f o can be easily captured. Also, some resonant frequencies of the rotation frequency f r are presented in the spectrum. However, regrettably, some of the frequencies we cannot account for are still in Figure 10, which are shown in the red box, revealing that the signal of the actual bearing system is complex. Yet we still got the failure frequency of the bearing test signal, which is in line with experimental results.
Wavelet packet decomposition and EEMD are applied on the experimental signal, as shown in Figure 11. Figure 11a is the spectrum of the recover signal transformed by wavelet packet, with db15 and the default threshold. The optimal amplitude of the noise is set as 0.2 and the ensemble size is 100 for the EEMD. Figure 11b shows the spectrum of IMF3 which has the largest correlation with the original signal. A large amount of unexplainable interference frequencies, visible in the spectrogram, seriously affect the recognition of the rotation frequency and outer ring fault frequency. Thus, the optimized DMD outperforms wavelet packet decomposition and EEMD in the mode decomposition of rolling bearing experimental signal. 100 for the EEMD. Figure 11b shows the spectrum of IMF3 which has the largest correlation with the original signal. A large amount of unexplainable interference frequencies, visible in the spectrogram, seriously affect the recognition of the rotation frequency and outer ring fault frequency. Thus, the optimized DMD outperforms wavelet packet decomposition and EEMD in the mode decomposition of rolling bearing experimental signal. Next, we apply the algorithm on the experimental signal measured by our rotating machinery vibration analysis test rig. The physical structure of the experimental rig is shown in Figure 12. The whole device consists of a variable speed drive motor, shaft, gear box, deflection wheel, bearing and governor. A picture of the sensor in Figure 12  Next, we apply the algorithm on the experimental signal measured by our rotating machinery vibration analysis test rig. The physical structure of the experimental rig is shown in Figure 12. The whole device consists of a variable speed drive motor, shaft, gear box, deflection wheel, bearing and governor. A picture of the sensor in Figure 12 refers to the position where the single channel vibration acceleration sensor (PCB-352C33, NY, USA) is installed. A cylinder roller bearing NU205 with artificial defects on the outside surface of the inner race was mounted in the housing. The fault bearing physical diagram is shown in Figure 13. The number of rolling elements is Z = 13, and the cylinder roller diameter is d = 7.5 mm, pitch diameter of the testing bearing is D = 39 mm. We set the motor at the speed of 980 RPM. So, rotation frequency and outer ring fault frequency can be calculated as f r = 16.3 Hz, f i = 126.3 Hz, respectively [42]. Next, we apply the algorithm on the experimental signal measured by our rotating machinery vibration analysis test rig. The physical structure of the experimental rig is shown in Figure 12. The whole device consists of a variable speed drive motor, shaft, gear box, deflection wheel, bearing and governor. A picture of the sensor in Figure 12 refers to the position where the single channel vibration acceleration sensor (PCB-352C33, NY, USA) is installed. A cylinder roller bearing NU205 with artificial defects on the outside surface of the inner race was mounted in the housing. The fault bearing physical diagram is shown in Figure 13. The number of rolling elements is = 13, and the cylinder roller diameter is = 7.5 mm, pitch diameter of the testing bearing is = 39 mm. We set the motor at the speed of 980RPM. So, rotation frequency and outer ring fault frequency can be calculated as 16.3Hz,126.3Hz , respectively [42]. A total number of 4000 sample points, shown in Figure 14a in time domain, are taken into analysis with the proposed algorithm. The frequency domain is shown in Figure 14b.
As is shown in Figure 14b, rotational frequency and its resonant frequencies are presented in the spectrum diagram. The center frequency of the inner race can also be easily found out, but the modulation components are not obvious. Meanwhile, there are large amounts of interference frequency presented in the frequency spectral. The spectrum of the test signal is not intuitionistic, which makes it difficult to diagnose the fault in view of the technicians. As is shown in Figure 14b, rotational frequency and its resonant frequencies are presented in the spectrum diagram. The center frequency of the inner race can also be easily found out, but the modulation components are not obvious. Meanwhile, there are large amounts of interference frequency presented in the frequency spectral. The spectrum of the test signal is not intuitionistic, which makes it difficult to diagnose the fault in view of the technicians.
Subsequently, the proposed algorithm is applied to the tested signal with the same parameters  Figure 15 shows the MPE values of each DMD modes. DMD modes whose MPE values are less than 0.82 are reconstructed to form a recovery signal. Spectrums are obtained by performing FT on the reconstruction signal, as shown in Figure 16. Wavelet packet decomposition and EEMD are also employed on our experimental signal, as shown in Figure 17. The methods and parameters used in the analysis are consistent with the signal processing procedure of Cincinnati. Here the spectrum of IMF1 decomposed by EEMD is selected because it has the largest correlation with the original signal, as shown in Figure 17b. Obviously, Figure 16 is more readable than Figures 13b and 17. The rotational frequency r f and its harmonics ( 2 r f , 3 r f , 4 r f 5 r f ,  ) are easily identified. Not only could the center frequency be pointed out clearly but also the modulation frequency of the inner face. That is to say, we are more confident to judge that the inner race of the testing rolling bearing has encountered failure, which is highly consistent with the actual situation. The basic reason the frequencies are easy to read is that there are a few interference frequencies nearby them. Therefore, the proposed algorithm of noise reduction Subsequently, the proposed algorithm is applied to the tested signal with the same parameters λ = 1.2, a = 0.5, s = 5, d = 6, τ = 1. 682 single-frequency DMD modes are obtained by sparse optimization algorithm. Figure 15 shows the MPE values of each DMD modes. DMD modes whose MPE values are less than 0.82 are reconstructed to form a recovery signal. Spectrums are obtained by performing FT on the reconstruction signal, as shown in Figure 16.
Wavelet packet decomposition and EEMD are also employed on our experimental signal, as shown in Figure 17. The methods and parameters used in the analysis are consistent with the signal processing procedure of Cincinnati. Here the spectrum of IMF1 decomposed by EEMD is selected because it has the largest correlation with the original signal, as shown in Figure 17b. Obviously, Figure 16 is more readable than Figures 13b and 17. The rotational frequency f r and its harmonics (2 f r , 3 f r , 4 f r 5 f r , · · · ) are easily identified. Not only could the center frequency be pointed out clearly but also the modulation frequency of the inner face. That is to say, we are more confident to judge that the inner race of the testing rolling bearing has encountered failure, which is highly consistent with the actual situation. The basic reason the frequencies are easy to read is that there are a few interference frequencies nearby them. Therefore, the proposed algorithm of noise reduction and feature extraction is proved to be practical and to outperform other mode decompositions such as wavelet packet decomposition and EEMD. and feature extraction is proved to be practical and to outperform other mode decompositions such as wavelet packet decomposition and EEMD.

Conclusions
As a powerful tool for Koopman spectrum analysis technology with the characteristic of equation-free and data driven, DMD decomposes the time series into a series of single-frequency modes without any assumption. Theoretically, dynamic features of the original system including fault features and noise components exist in these DMD modes only if the truncation rank order r is appropriate. The truncated rank order needs to be preset artificially in the standard DMD algorithm, which is a crucial problem. Another puzzle with the application of standard DMD is of how to define a selection strategy to filter the undesired DMD modes in the process of signal reconstruction.
By proposing solutions for the selection strategy of truncated rank order and DMD modes, this paper puts forward a novel denoising and feature extraction algorithm for multi-component coupled noisy mechanical signals. Non-convex penalty regularization is adopted to the rank function of the shift-stack matrix, and a series of optimal DMD modes are obtained by this sparse optimization method. MPE is performed as a tool of optimal mode selection. After calculating the MPE of each DMD mode, signal is reconstructed by the modes whose entropies are smaller than a threshold. The proposed algorithm is successfully applied in rolling bearing simulation and experimental signals, demonstrating that it has a good application prospect in noise reduction and fault feature extraction. DMD algorithm has been widely used in the field of fluid mechanics and video processing. This paper presents an optimized DMD algorithm and applies it to mechanical vibration signal. The parameters of non-convex penalty regularization and MPE in our algorithm are proved to be appropriate by repeated verification. The non-convex optimization method is used to extract the dynamic characteristics of the original system more comprehensively than selecting truncation rank with a "elbow" of the singular value. The dynamic characteristics, fault features and noise components of the original signal can be identified by using MPE method. However, for filtering the noise components, entropy threshold needs to be set by multiple attempts in our algorithm. For example, when dealing with the experimental signal of Cincinnati, we firstly set the threshold values of 0.7, 0.75, 0.8, etc., and then observed the effect of the recovery signal through the spectrum diagram respectively. We found that when the threshold value was 0.7, the fault characteristics were more obvious, and the interference frequencies were less. Therefore, 0.7 was selected as the criterion. Research on different vibration signals, various transmission systems and different degrees of fault damage to select a certain MPE threshold is our future work.