Laplace Prior-Based Bayesian Compressive Sensing Using K-SVD for Vibration Signal Transmission and Fault Detection

Abstract: Vibration signal transmission plays a fundamental role in equipment prognostics and health management. However, long-term condition monitoring requires signal compression before transmission because of the high sampling frequency. In this paper, an efficient Bayesian compressive sensing algorithm is proposed. The contribution is explicitly decomposed into two components: a multitask scenario and a Laplace prior-based hierarchical model. This combination makes full use of the sparsity promotion under Laplace priors and the correlation between sparse blocks to improve the efficiency. Moreover, a K-singular value decomposition (K-SVD) dictionary learning method is used to find the best sparse representation of the signal. Simulation results show that the Laplace prior-based reconstruction performs better than typical algorithms. The comparison between a fixed dictionary and a learned dictionary also illustrates the advantage of the K-SVD method. Finally, a fault detection case using the reconstructed signal is analyzed, with several algorithms used for verification. The effectiveness of the proposed method is validated by simulation and experimental tests; the experimental results show that Lap-CBCS-KSVD achieved good classification accuracy under the SVM and RF models.


Introduction
High-speed vibration signal transmission plays an important role in capturing equipment failure, which is of interest in many applications. Compared with wired monitoring processes, wireless systems greatly increase flexibility, maintainability, and scalability [1]. However, due to the limitations of wireless transmission bandwidth, real-time monitoring must compress the signal for transmission. In previous studies, vibration signal compression was applied for structural health monitoring (SHM) [2,3]. However, the sampling frequency in SHM is much lower than that in mechanical monitoring. For example, only a 240-Hz sampling frequency [4] is needed in bridge structure monitoring, while a sampling frequency of at least 5-20 kHz [5] is required to realize mechanical monitoring, which causes great difficulties in vibration compression. At present, methods based on the transform domain dominate vibration signal compression. Among them, wavelet transform [6], arithmetic coding [7], and Huffman coding [8] are widely used. However, some deficiencies remain in the research on vibration signal compression.
Compressive sensing (CS) theory [9-11], which is based on the observation that a small collection of projections of a sparse signal may contain sufficient information to recover it, emerged in recent years. CS can be regarded as a way to go beyond the Nyquist sampling theorem, because it requires fewer measurements. A number of algorithms for recovering the original sparse signals were proposed. These algorithms can be classified into three categories: convex optimization algorithms [12], greedy-based algorithms [13], and Bayesian algorithms [14].
Bayesian compressive sensing (BCS) combines Bayesian estimation with CS to estimate the posterior distribution of the original signal rather than only a point estimate. One advantage of BCS is that the noise generated in signal transmission is taken into consideration, and the original signal is not strictly required to be sparse. Furthermore, BCS may be extended to multitask CS (MCS) [15], which performs multiple sets of CS measurements jointly. MCS theory is based on the observation that measurements of different tasks are statistically related when multitask reconstructions are performed under the same scenario. Moreover, Zhang et al. [16] exploited the situation in which elements in the nonzero rows of the matrix are temporally correlated, and proposed two sparse Bayesian learning algorithms. Babacan et al. [17] first used the Laplace prior-based BCS algorithm and obtained sparser reconstructions. Another advantage of the Laplace prior is its log-concavity, which eliminates local minima. BCS is widely used in the fields of image processing [18], electrocardiograph (ECG) signal reconstruction [19], and radar signal estimation [20], but it is rarely used for mechanical vibration signal reconstruction.
The premise of CS is that a signal must be sparse, either directly or in a certain transform domain. An over-complete dictionary that decomposes the original signal efficiently is therefore required. Dictionaries can be divided into two categories: fixed dictionaries and learning dictionaries. Fixed dictionaries, such as the discrete cosine transform, Fourier transform, and wavelet transform, depend strongly on prior knowledge of the original signal. Learning dictionaries are updated adaptively according to the original signal and have attracted increasing attention. Typical learning dictionaries, such as the method of optimal directions (MOD) [21] and K-singular value decomposition (K-SVD) [22], produce good results in application. The K-SVD algorithm was successfully applied in the fields of image denoising and CS. Zhou et al. [23] replaced the orthogonal basis function with a K-SVD-trained over-complete dictionary. Yang et al. [24] improved the K-SVD method by means of the correlation coefficient matching criterion and dictionary cutting. Shi et al. [25] combined the K-SVD algorithm with the idea that high- and low-resolution dictionaries can be cogenerated. In addition to K-SVD, Jafari et al. [26] found a link between sparsity in the dictionary and sparsity in decomposition, and proposed a greedy adaptive learning algorithm for finding sparse atoms. Ophir et al. [27] presented a multiscale dictionary learning method for different applications. The algorithm not only reduces the training time but also improves the reconstruction quality.
However, vibration signal transmission is not the main purpose of machinery condition monitoring. In the past few years, considerable attention was paid to the combination of CS and fault detection. For example, Wang et al. [28] proposed a proximal decomposition algorithm for reconstruction of sparse time-frequency (TF) representation. The experiments on bearings and gears show that the proposed method can retain TF features through small measurements. Tang et al. [29] described a CS framework of characteristic harmonics for detecting bearing faults. In their framework, the processes of sampling and fault detection are performed simultaneously. Sun et al. [30] introduced the block sparse Bayesian learning method for CS reconstruction. The Bayesian algorithm works well by exploiting the block property and inner structures of the original signal. Experiments illustrate that the Bayesian method is suitable for signal reconstruction and fault detection.
In this paper, we introduce a Laplace prior into the hierarchical MCS model in combination with K-SVD dictionary learning. The contribution of this work is twofold. Firstly, we develop a new technique, named Laplace prior-based correlated-sparse-block Bayesian CS (Lap-CBCS), which imposes sparseness over the original signal and extends the Laplace prior-based BCS algorithm to the multitask scenario. Secondly, we use the K-SVD dictionary for sparse decomposition. For a given complex signal, such a dictionary can be trained to promote sparsity. Compared to a fixed transform, K-SVD offers improved decomposition and good signal reconstruction performance. The proposed method is referred to as Lap-CBCS-KSVD. Finally, an application of Lap-CBCS-KSVD for fault classification is presented using planetary gearbox data. The classification results of the support vector machine (SVM) and random forest (RF) models show that Lap-CBCS-KSVD achieves good classification accuracy.
The structure of this paper can be summarized as follows: in Section 2, we review CS theory and K-SVD dictionary optimization. In Section 3, the Laplace prior-based correlated-sparse-block BCS algorithm is presented. In Section 4, the framework of the Lap-CBCS-KSVD method is proposed. Section 5 compares the simulation results of the proposed method with those of typical algorithms. Section 6 presents the fault classification results using the reconstructed signal. Section 7 summarizes the paper.

Compressive Sensing
CS [11] uses a low-dimensional signal to approximate the original signal. Denoting Ψ as the sparse transform, the original signal x (x ∈ R^N) can be represented as

$$x = \Psi\theta, \tag{1}$$

where θ is the coefficient vector in the Ψ-domain. If $\theta_0$ denotes θ with its N − M smallest coefficients set to 0, then $\|\theta_0 - \theta\|_2 / \|\theta\|_2$ is negligibly small when M << N. Based on this observation, the CS measurements may be represented as

$$y = \Phi x = \Phi\Psi\theta = \Theta\theta, \tag{2}$$

where Φ is the M × N measurement matrix and Θ = ΦΨ. Solving for a sparse vector θ with respect to Θ is a commonly discussed problem [11,12]. BCS [14], in contrast, solves for the maximum a posteriori estimate of the original signal from a probabilistic perspective.
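The measurement model above can be illustrated numerically. The following is a minimal sketch, assuming an orthonormal DCT basis for Ψ and a random Gaussian matrix for Φ; the sizes, the sparse coefficients, and the helper name `dct_basis` are illustrative choices, not values taken from the paper.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT basis Psi (n x n) such that x = Psi @ theta."""
    k = np.arange(n)[:, None]          # frequency index (rows of the DCT-II matrix)
    t = np.arange(n)[None, :]          # time index (columns)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)            # rescale the DC row so that C is orthonormal
    return C.T                         # columns of Psi are the DCT atoms

rng = np.random.default_rng(0)
N, M = 64, 24                          # signal length and number of measurements (assumed)
Psi = dct_basis(N)

theta = np.zeros(N)
theta[[2, 7, 19]] = [1.5, -0.8, 0.6]   # a sparse coefficient vector (assumed)
x = Psi @ theta                        # signal that is sparse in the Psi-domain

Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # random Gaussian measurement matrix
Theta = Phi @ Psi                      # combined sensing matrix
y = Theta @ theta                      # M compressed measurements
```

Only the M entries of `y` need to be transmitted; recovering `theta` from `y` and `Theta` is the reconstruction problem addressed in the following sections.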
In the field of BCS, a Gaussian prior-based model is widely used. However, the Laplace prior, which can be constructed hierarchically from Gaussian and exponential distributions, recently emerged. The results of Reference [17] show that the Laplace prior-based model promotes sparseness better than the Gaussian prior-based model, while also being log-concave. Because the Laplace prior is not conjugate to the Gaussian likelihood, a hierarchical model is adopted here in the manner of the relevance vector machine (RVM) [31]. In this paper, we introduce this hierarchical model to the multitask scenario in Section 3.

K-SVD
To obtain a stronger sparse representation, this paper replaces the traditional transform bases with an over-complete dictionary. The basic idea is to use the K-SVD algorithm proposed by Aharon et al. [22] to train on signal blocks and adaptively update the dictionary atoms until an over-complete dictionary is obtained. As a generalization of K-means, the K-SVD algorithm effectively reduces the number of atoms in the dictionary while ensuring that the remaining atoms still represent all the information. Compared to a fixed sparse dictionary, K-SVD avoids the strong dependence on prior knowledge and the poor adaptability. Given training matrix Y and sparse matrix G, the process of learning dictionary D is described as

$$\min_{D,G} \|Y - DG\|_F^2 \quad \text{subject to each column of } G \text{ being sparse}. \tag{3}$$

Let $E_i$ represent the error after removing the i-th atom, $d_j$ represent the j-th column of dictionary D, and $g_j$ indicate the j-th row of sparse matrix G. We have the following equation:

$$\|Y - DG\|_F^2 = \Big\| \Big( Y - \sum_{j \neq i} d_j g_j \Big) - d_i g_i \Big\|_F^2 = \|E_i - d_i g_i\|_F^2. \tag{4}$$

Since the elements of $g_i$ may be 0, $E_i$ and $g_i$ must be shrunk to the nonzero entries during training. We define $\omega_i$ as

$$\omega_i = \{ k \mid g_i(k) \neq 0 \}, \tag{5}$$

which represents the nonzero index set of $g_i$. Another matrix $\Omega_i$, which places 1 at the positions $(\omega_i(k), k)$, is also introduced; thus, we have $E_i^R = E_i\Omega_i$ and $g_i^R = g_i\Omega_i$ for zero shrinking. Then, $E_i^R$ can be decomposed as

$$E_i^R = U \Delta V^T. \tag{6}$$

Atom $d_i$ in the dictionary is replaced by the first column of matrix U. The coefficient $g_i^R$ is updated by the product of the first column of matrix V and Δ(1,1). At this point, one atom is updated, and the remaining atoms can be updated in the same manner to generate a new dictionary.
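The atom update described above can be sketched in a few lines. This is a minimal illustration of a single K-SVD atom update only (the sparse coding stage is omitted), and the function name `ksvd_atom_update`, the synthetic data, and all sizes are assumptions made for the example.

```python
import numpy as np

def ksvd_atom_update(Y, D, G, i):
    """Update atom i of D and row i of G in place with a rank-1 SVD fit."""
    omega = np.flatnonzero(G[i, :])            # training blocks that use atom i
    if omega.size == 0:
        return                                 # unused atom: nothing to update
    G[i, omega] = 0.0                          # remove the contribution of atom i
    E_R = Y[:, omega] - D @ G[:, omega]        # error restricted to the support of row i
    U, s, Vt = np.linalg.svd(E_R, full_matrices=False)
    D[:, i] = U[:, 0]                          # new unit-norm atom
    G[i, omega] = s[0] * Vt[0, :]              # new coefficients on the same support

# Synthetic example (assumed sizes): 16-sample blocks, 8 atoms, 40 training blocks.
rng = np.random.default_rng(1)
n, K, T = 16, 8, 40
D = rng.standard_normal((n, K))
D /= np.linalg.norm(D, axis=0)                 # normalize the initial atoms
G = np.zeros((K, T))
for t in range(T):                             # two active atoms per training block
    G[[t % K, (t + 3) % K], t] = rng.standard_normal(2)
Y = D @ G + 0.05 * rng.standard_normal((n, T))

err_before = np.linalg.norm(Y - D @ G)
ksvd_atom_update(Y, D, G, i=0)
err_after = np.linalg.norm(Y - D @ G)
```

Because the leading singular triplet is the best rank-1 approximation of the restricted error, the update cannot increase the representation error, and restricting to the support keeps the zero pattern of G unchanged.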

Laplace Prior-Based Correlated-Sparse-Block BCS Method
In this section, we propose a new algorithm called Lap-CBCS. A hierarchical model that combines Laplace priors and multitask BCS is established; then, the signal is reconstructed by estimating the hyperparameters shared by all signal blocks in the model. The improved algorithm makes full use of the sparseness promotion with Laplace priors and the correlation between signal blocks to improve the accuracy of the reconstruction.

Distribution of the Lap-CBCS Hierarchical Model
Consider an N-dimensional signal x. Let Φ ∈ R^{M×N} represent the measurement matrix, and let Ψ be an N × N transform basis such that x = Ψθ. Assume that L tasks of CS measurements are performed, with the L observations statistically related as defined below.

$$y_i = \Theta\theta_i + n_i, \quad i = 1, \ldots, L. \tag{7}$$

In Equation (7), Θ = ΦΨ and $n_i$ represents a noise vector, modeled as a zero-mean Gaussian random variable with unknown precision $\alpha_0$ (variance $1/\alpha_0$). The corresponding Gaussian likelihood function is given by Equation (8). The original signal $\theta_i$ is assigned Laplace priors, as shown in Equation (9). Because Laplace priors are not conjugate to the Gaussian likelihood, we introduce a hierarchical prior to solve this problem.
In Equation (10), γ is a set of hyperparameters determining the prior distribution of $\theta_i$. To apply Laplace priors to a Bayesian model, hyperparameters λ and ν are introduced for γ, as shown in Equations (11) and (12). By combining Equations (10) and (11), we obtain Equation (13).

$$p(\lambda \mid \nu) = \mathrm{Ga}(\lambda \mid \nu/2, \nu/2) \tag{12}$$

In this manner, the original signal $\theta_i$ can be estimated with hyperparameters γ, λ, and ν. Moreover, the noise precision $\alpha_0$ can be represented by hyperparameters a and b. Figure 1 shows the hierarchical prior model of MCS using Laplace priors.
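For reference, one form of the hierarchy that is consistent with the description above is sketched below. It follows the single-task Laplace construction of Babacan et al. [17], applied to every block with shared γ. The exact normalization of Equations (8)-(11) and (13) in this paper, and the shape and rate conventions of the Gamma densities, are assumptions here.

```latex
% Assumed forms: a sketch in the style of [17], not necessarily the exact equations of this paper.
\begin{align*}
p(y_i \mid \theta_i, \alpha_0) &= \left(\frac{\alpha_0}{2\pi}\right)^{M/2}
  \exp\!\left(-\frac{\alpha_0}{2}\,\|y_i - \Theta\theta_i\|_2^2\right), \\
p(\theta_i \mid \gamma) &= \prod_{j=1}^{N} \mathcal{N}(\theta_{i,j} \mid 0, \gamma_j), \\
p(\gamma_j \mid \lambda) &= \mathrm{Ga}(\gamma_j \mid 1, \lambda/2)
  = \frac{\lambda}{2}\exp\!\left(-\frac{\lambda\gamma_j}{2}\right), \\
p(\theta_i \mid \lambda) &= \int p(\theta_i \mid \gamma)\, p(\gamma \mid \lambda)\, d\gamma
  = \frac{\lambda^{N/2}}{2^{N}} \exp\!\left(-\sqrt{\lambda}\,\sum_{j=1}^{N} |\theta_{i,j}|\right), \\
p(\alpha_0 \mid a, b) &= \mathrm{Ga}(\alpha_0 \mid a, b).
\end{align*}
```

The fourth line is the point of the construction: marginalizing the Gaussian prior over exponentially distributed variances yields a Laplace density on each coefficient, while every conditional distribution in the hierarchy remains conjugate.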


Bayesian Estimation for Hyperparameters
The greatest difference between Lap-CBCS and multitask BCS is the embedded layer of hyperparameter λ. This section shows how to estimate the hyperparameters and $\theta_i$. Given γ, λ, and ν, the posterior of $\theta_i$ with known measurement $y_i$ is a multivariate Gaussian distribution with mean $\mu_i$ and covariance $\Sigma_i$, as given in Equation (15).
Then, we differentiate the objective $L^{\mathrm{Laplace}}_{\mathrm{multi\text{-}task}}$ with respect to γ and λ and set the result to 0. Simplification readily yields the update in Equation (18), where $\mu_{i,j}$ represents the estimated mean value associated with the j-th hyperparameter $\gamma_j$ for signal i and $\Sigma_{i,jj}$ is the j-th diagonal component of the covariance matrix of signal i. Note that $\gamma_j^{new}$ is a function of $\mu_{i,j}$ and $\Sigma_{i,jj}$, while $\mu_{i,j}$ and $\Sigma_{i,jj}$ are functions of $\gamma_j^{new}$: this situation suggests an iterative solution that alternates between Equations (15) and (18). Compared to the Gaussian hyperparameter $\alpha_i^{new}$ in Reference [14], we have $\gamma_j^{new} > \alpha_i^{new}$, which indicates that Laplace priors can better promote sparseness.
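The alternating scheme described above can be sketched as follows. This is an illustrative EM-style variant, not necessarily the exact update of Equation (18): it assumes the noise precision `alpha0` is known, that all blocks share the same sensing matrix, and that the shared γ update is the positive root of λγ² + Lγ − S_j = 0 with S_j = Σ_i (μ_{i,j}² + Σ_{i,jj}), which follows from a Gaussian prior on each coefficient combined with an exponential prior on γ_j. The function name `lap_multitask_em` and the synthetic test problem are hypothetical.

```python
import numpy as np

def lap_multitask_em(Theta, Y, alpha0, n_iter=200, nu=0.0):
    """EM-style estimate of L sparse vectors that share the variances gamma.

    Theta: (M, N) sensing matrix; Y: (M, L) measurements, one column per block.
    alpha0: noise precision, assumed known in this sketch.
    Returns the posterior means (N, L), gamma (N,), and lambda.
    """
    M, N = Theta.shape
    L = Y.shape[1]
    gamma = np.ones(N)
    lam = 1.0
    for _ in range(n_iter):
        # Gaussian posterior of every theta_i; the covariance is shared across
        # blocks because Theta and gamma are shared.
        TG = Theta * gamma                           # Theta @ diag(gamma)
        C = np.eye(M) / alpha0 + TG @ Theta.T        # marginal covariance of y_i
        Mu = TG.T @ np.linalg.solve(C, Y)            # posterior means, shape (N, L)
        q = np.einsum("mn,mn->n", Theta, np.linalg.solve(C, Theta))
        Sigma_diag = np.maximum(gamma - gamma**2 * q, 0.0)
        # Shared gamma update: positive root of lam*g^2 + L*g - S_j = 0,
        # written in a form that avoids cancellation for small S_j.
        S = np.sum(Mu**2, axis=1) + L * Sigma_diag
        gamma = 2.0 * S / (L + np.sqrt(L**2 + 4.0 * lam * S))
        # lambda update from the Gamma hyperprior with parameter nu.
        lam = (N - 1 + nu / 2.0) / (np.sum(gamma) / 2.0 + nu / 2.0)
    return Mu, gamma, lam

# Synthetic check (assumed sizes): 3 blocks with a common support of 4 atoms.
rng = np.random.default_rng(0)
M, N, L, K = 32, 64, 3, 4
Theta = rng.standard_normal((M, N)) / np.sqrt(M)
support = rng.choice(N, size=K, replace=False)
X = np.zeros((N, L))
X[support, :] = rng.choice([-1.0, 1.0], size=(K, L)) * rng.uniform(1.0, 2.0, size=(K, L))
sigma = 0.01
Y = Theta @ X + sigma * rng.standard_normal((M, L))

Mu, gamma, lam = lap_multitask_em(Theta, Y, alpha0=1.0 / sigma**2)
rel_err = np.linalg.norm(Mu - X) / np.linalg.norm(X)
```

Working with the M × M matrix `C` rather than inverting an N × N matrix keeps the iteration well defined when entries of `gamma` shrink toward zero, which is the expected behavior for coefficients outside the common support.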

Framework of the Lap-CBCS-KSVD Method
A flowchart of the Lap-CBCS-KSVD method is shown in Figure 2. The model includes three parts: off-line dictionary training, data compression, and signal reconstruction. The measurement matrix and block length are set before the acquisition node is activated. When a signal is collected, it must be split into blocks because the collected data are too large to be used in their entirety as training samples. Each signal block is decomposed using dictionary Ψ after training, and the original signal is compressed with a measurement matrix Φ. The compressed signal is transmitted through the wireless sensor network, and the reconstruction stage is completed at the management node. In this part, the Laplace prior-based hierarchical model is introduced for Bayesian estimation. Based on the observation of received signal y i (i = 1, . . . , L), the sparse coefficients θ i (i = 1, . . . , L) are estimated with respect to matrix Θ = ΦΨ. Finally, the original signal blocks x i (i = 1, . . . , L) are reconstructed through inverse transformation. The whole algorithm works as follows:

1. Original signal x is split into L blocks according to the block length set. During the off-line stage, these blocks are used as samples for K-SVD dictionary training.
2. During signal acquisition, the signal blocks are decomposed and compressed using K-SVD dictionary Ψ and measurement matrix Φ.
3. The compressed signal blocks are transmitted to the upper node via a wireless sensor network.
4. The Laplace prior-based hierarchical model is established, and Bayesian estimation is conducted for sparse coefficients θ i .
5. The inverse transformation is applied to obtain signal blocks and reconstructed signal x'.
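The acquisition side of the steps above (splitting and compression) can be sketched as follows. The synthetic test signal, the number of measurements per block, and the helper names `split_blocks` and `compress_blocks` are assumptions for illustration; the block length of 32 and the 20-kHz sampling frequency match values used later in the paper.

```python
import numpy as np

def split_blocks(x, block_len):
    """Split a 1-D signal into columns of length block_len (an incomplete tail is dropped)."""
    n_blocks = len(x) // block_len
    return x[: n_blocks * block_len].reshape(n_blocks, block_len).T

def compress_blocks(X, Phi):
    """Apply the same measurement matrix Phi to every block (column) of X."""
    return Phi @ X

rng = np.random.default_rng(0)
fs = 20_000                                       # sampling frequency in Hz
t = np.arange(4096) / fs
x = np.sin(2 * np.pi * 300 * t) + 0.1 * rng.standard_normal(t.size)  # stand-in signal

N, M = 32, 16                                     # block length and measurements per block
X = split_blocks(x, N)                            # shape (N, L), one block per column
Phi = rng.standard_normal((M, N)) / np.sqrt(M)    # measurement matrix fixed before acquisition
Y = compress_blocks(X, Phi)                       # shape (M, L), transmitted to the upper node
```

Only `Y` is transmitted; the receiving node, which knows Φ and the trained dictionary Ψ, then carries out steps 4 and 5.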

Simulation
In this section, we examine the performance of our proposed Lap-CBCS-KSVD algorithm for real accelerometer data from a bearing and a gearbox. Case 1 comes from the ball bearing fault test bed [32] belonging to the electric engineering laboratory of Case Western Reserve University. The test bench consists of a 2-hp motor, a torque decoder, and a power tester. The tested bearing supports the rotor of a motor. The drive-end bearing is an SKF6205, and the fan-end bearing is an SKF6203. The bearing faults can be divided into three types according to the fault location: inner raceway fault, outer raceway fault, and ball fault. We select two datasets: IR007_0_105 and B028_0_3005.
Case 2 involves raw data collected from a gearbox test bed. The failure of a gearbox will substantially affect the stable operation of mechanical equipment. Thus, gearbox monitoring is of great significance in real scenarios. The experimental gearbox is a JZQ175. The electromagnetic speed-regulating motor provides 4 kW of power, and the air-cooled magnetic powder brake provides load for the gearbox. The data acquisition system consists of four 3056B4 piezoelectric sensors produced by Dytran Instruments, Inc. During the experiment, the sampling frequency was set to 20 kHz.
We conducted two types of preset fault experiments for the gearbox: crack failure experiments and broken-tooth failure experiments. Four failure states were considered: 5-mm crack fault, 5-mm broken-tooth fault, 8-mm crack fault, and 10-mm broken-tooth fault. The selected datasets are shown in Table 1, and the signal after sampling is shown in Figure 3. In this paper, the compression ratio (CR) is denoted as

$$\mathrm{CR} = \frac{N - M}{N}, \tag{19}$$

where N is the length of the original signal and M is the number of measurements, so that a larger CR corresponds to a more highly compressed signal.

The mean square error (MSE), peak signal-to-noise ratio (PSNR), and Pearson correlation coefficient (r) were used as evaluation metrics, where u represents the original signal, u' represents the reconstructed signal, and $u_{max}$ represents the largest component of vector u.
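The three metrics can be computed as in the sketch below. The normalizations are assumptions, since several conventions exist: MSE is taken as the mean of the squared error, PSNR as 10 lg(u_max² / MSE) with u_max the largest component of u, and r as the standard Pearson coefficient.

```python
import numpy as np

def mse(u, u_rec):
    """Mean square error between the original u and the reconstruction u_rec."""
    u, u_rec = np.asarray(u, dtype=float), np.asarray(u_rec, dtype=float)
    return float(np.mean((u - u_rec) ** 2))

def psnr(u, u_rec):
    """Peak signal-to-noise ratio in dB, using the largest component of u as the peak."""
    u = np.asarray(u, dtype=float)
    return float(10.0 * np.log10(np.max(u) ** 2 / mse(u, u_rec)))

def pearson_r(u, u_rec):
    """Pearson correlation coefficient between u and u_rec."""
    return float(np.corrcoef(u, u_rec)[0, 1])

u = np.array([0.0, 1.0, 2.0, 3.0])
u_rec = u + 0.1                       # a reconstruction with a constant offset
m, p, r = mse(u, u_rec), psnr(u, u_rec), pearson_r(u, u_rec)
```

A smaller MSE, a larger PSNR, and an r closer to 1 all indicate a more faithful reconstruction. The example also shows that r alone cannot detect a constant offset, which is one reason to report the three metrics together.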

Comparison with Other Reconstruction Algorithms
In this part, several widely used CS algorithms are compared with Lap-CBCS. Figure 4 presents the reconstruction results of basis pursuit (BP) [12], orthogonal matching pursuit (OMP) [13], BCS [14], and regularized orthogonal matching pursuit (ROMP) [33]. The B028_0_3005 bearing data were chosen as an example to illustrate the reconstruction performance. The blue lines represent raw signals, and the red lines with spikes represent reconstructed signals. To ensure that only one factor varied between the experiments, the discrete cosine transform (DCT) was selected as the common sparse basis. In contrast to Lap-CBCS, the BCS algorithm uses the traditional Gaussian prior model for reconstruction. The BP and OMP algorithms are typical representatives of convex optimization and greedy iteration algorithms, respectively. The results in Figure 4 show that Lap-CBCS recovered the original signal better than the other four algorithms; the MSE results confirmed this conclusion.
Then, we investigated the effect of different compression ratios on the reconstruction performance of different algorithms. As shown in Figure 5, we varied the compression ratio (CR) from 0 to 1. A smaller MSE indicates more accurate reconstruction. For each data point, 100 groups of experiments were conducted to calculate the average ρ and variance τ. Figure 5 gives the range of [ρ − 2τ, ρ + 2τ] for each point, which can be regarded as the 95% confidence interval. Across all CRs, the MSE of Lap-CBCS was smaller than that of BP, BCS, and OMP, which confirms the effectiveness of the proposed algorithm. Moreover, the variance of Lap-CBCS was the smallest. Figure 5 also shows that the variance of each point increased with increasing CR, which indicates the instability of reconstruction algorithms under high CRs.

Robustness and Cost Analysis of the Reconstruction Algorithms
Further experiments were conducted to compare the robustness of OMP, BP, BCS, and Lap-CBCS. The datasets IR007_0_105 and break10mm_800r_15nm are used in this section. One hundred experiments were conducted to calculate the average MSE for each signal-to-noise ratio (SNR). The SNR is defined in Equation (23).

$$\mathrm{SNR(dB)} = 20 \lg \left( \|\Phi x\|_2 / \|y\|_2 \right) \tag{23}$$

Figure 6 shows that the four algorithms were not substantially different when the SNR was small; however, the PSNR of Lap-CBCS increased significantly when the SNR was large, indicating that Lap-CBCS is superior to the other three algorithms.
Furthermore, Figure 7 compares the average running time of Lap-CBCS, OMP, BP, and BCS. Lap-CBCS and BCS consumed more time when the CR was small, which reflects the complexity of Bayesian algorithms. However, the time costs of Lap-CBCS and BCS decreased as the CR approached 1, at which point BP had a greater time cost than BCS and Lap-CBCS. In summary, when CR > 0.7, there was little difference in the time consumption of Lap-CBCS, BCS, and OMP; thus, Lap-CBCS achieved a good balance between cost and efficiency under high CR. Therefore, Lap-CBCS is suitable for reconstructing highly compressed signals.


Comparison with K-SVD Dictionary Learning and Traditional Sparse Representation
Traditional fixed dictionaries are generally obtained via an orthogonal transform, such as fast Fourier transform (FFT), DCT, and wavelet packet transform (WPT). When the signal characteristics are consistent with the atomic features in the dictionary, an efficient representation can be obtained. However, for real signals, the sparseness is unknown. These fixed orthogonal bases are not sufficiently flexible to represent such signals. An adaptive over-complete dictionary is needed to ensure that the atomic scale in the dictionary is close to that of the original signal.
To compare the sparseness of dictionaries, a threshold ε was set to 2% of the peak-to-peak value of signal x. Thus, we have ε = |max(x) − min(x)| × 2%. The data points in the range [−ε, ε] were set to 0, and the number of nonzero elements in signal x was counted as $N_0$. With signal length N, the sparseness of signal x can be represented as

$$\eta = \frac{N_0}{N}. \tag{24}$$

In this section, we compare the K-SVD adaptive dictionary with the fixed dictionaries of FFT, DCT, and WPT. The bearing data 12k_Drive_End_OR007@3_0_144 and 12k_Drive_End_B028_0_3005 were randomly selected for the simulation. Firstly, we compared the sparseness of the original signal and that of the transformed signals. A signal segment consisting of 150 points was chosen for comparison. For K-SVD training, the number of atoms was set to 50, and the number of iterations was set to 10. Figure 8 displays the original signal and the results of the K-SVD, DCT, and WPT transforms. The signal after K-SVD dictionary optimization had the smallest sparseness η, that is, the fewest nonzero elements.
Next, the reconstruction effects were compared. The signal block length was set to 32, and we prepared 200 blocks for training. The remaining 500 blocks were chosen for assessing the dictionary validity. To ensure that only one factor varied, Lap-CBCS was adopted as the common reconstruction algorithm. Figure 9 shows the MSE, PSNR, and r of the signals reconstructed using different sparse dictionaries. The signal reconstructed using the K-SVD dictionary had a small MSE and a large PSNR, which indicates good performance. Comparison of the correlation coefficients confirmed this conclusion.
In addition, several factors affect the performance of the K-SVD algorithm and must be considered; they are discussed below. Figure 10a shows the effects of different initial dictionaries for K-SVD training. As can be seen, r does not vary substantially for the three initial dictionaries. Figure 10b examines the relationship between the number of atoms and r. The results show that r varied in a small range of [0.92, 0.95], indicating that K-SVD training is not sensitive to the number of atoms. Therefore, no special requirements are imposed on the initial dictionary or the number of atoms in this paper.
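The sparseness measure defined at the start of this section can be computed as in the following minimal sketch; the function name `sparseness` and the example vector are illustrative.

```python
import numpy as np

def sparseness(x, ratio=0.02):
    """Fraction of entries whose magnitude exceeds ratio * (peak-to-peak value of x)."""
    x = np.asarray(x, dtype=float)
    eps = abs(np.max(x) - np.min(x)) * ratio      # threshold: 2% of peak-to-peak by default
    n0 = int(np.count_nonzero(np.abs(x) > eps))   # entries that survive the thresholding
    return n0 / x.size

coeffs = np.array([0.0, 0.01, -0.01, 1.0, -1.0, 0.5, 0.0, 0.0])
eta = sparseness(coeffs)   # peak-to-peak = 2, eps = 0.04, three entries survive
```

A smaller value means a sparser representation, so the same function can be applied to the coefficient vectors produced by each dictionary to compare them on equal terms.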

Application of Fault Detection with the Reconstructed Signal
The proposed reconstruction method was validated using signals collected from a planetary gearbox. The structure of the mechanical test rig is shown in Figure 11. Wear failure was seeded on one tooth of the sun gear, planet gear, and ring gear. The experiments were conducted at speeds of 400 rpm and 800 rpm and under loads of 0.4 Nm and 1.2 Nm. The sampling frequency was set to 20 kHz. The specific failures are shown in Figure 12.
relationship between the number of atoms and r. The results show that r varied in a small range of [0.92, 0.95], indicating that K-SVD training is not sensitive to the number of atoms. Therefore, no special requirements for the initial dictionary and atoms exist in this paper.

Application of fault detection with the reconstructed signal
The proposed reconstruction method was validated using signals collected from the planetary gearbox. The structure of the mechanical test rig is shown in Figure 11. Wear failure was seeded on one tooth of the sun gear, planet gear, and ring gear. The experiments were conducted under the speeds, 400 rpm and 800 rpm, and loads, 0.4 Nm and 1.2 Nm. The sampling frequency was set to 20 kHz. The specific failures are shown in Figure 12.     The ultimate purpose of condition monitoring is to identify fault types. In this section, we use the reconstructed signals, rather than the original signal, for failure classification. The reconstruction effectiveness was validated by comparing the classification accuracies of different methods. The data used in our test can be divided into four categories: (1) normal state, (2) planet gear failure, (3) ring gear failure, and (4) sun gear failure. The training samples and testing samples are shown in Tables 2 and 3. We used the raw signals from the test rig as the training samples and used the reconstructed signals as the testing samples. The signal block length was set to 200 sampling points, and the number of blocks is shown in Tables 2 and 3.  In the application of fault diagnosis for a planetary gearbox, various features may be sensitive to different equipment states. Therefore, feature extraction is needed before fault classification. We decomposed the planetary gearbox signal using the WPT by three levels and took the energy spectra of eight decomposition nodes as fault features. Support vector machines (SVMs) are state-of-the-art large-margin classifiers that are widely used in pattern recognition and many other applications. RF, which constructs a large set of independent decision trees, was also introduced as a classification algorithm. The results of these trees were combined for the classification task. In this paper, SVM and RF were selected to compare the effectiveness of different reconstruction algorithms. 
Higher classification accuracy indicates more accurate reconstruction.
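The feature extraction step described above (three-level WPT, energies of the eight terminal nodes) can be illustrated with a minimal sketch. The wavelet is not specified in this section, so the sketch assumes a Haar wavelet; the function names `haar_step` and `wpt_energy_features`, the normalization of the energies to sum to one, and the natural (non-frequency-sorted) node order are likewise assumptions made for illustration and not necessarily the authors' implementation.

```python
import math


def haar_step(x):
    """One orthonormal Haar analysis step: return (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def wpt_energy_features(block, levels=3):
    """Normalized energies of the 2**levels wavelet packet nodes of a block.

    Assumption: Haar wavelet, nodes kept in natural (a, d) splitting order.
    The block length must be divisible by 2**levels (200 is, for levels=3).
    """
    if len(block) % (2 ** levels) != 0:
        raise ValueError("block length must be divisible by 2**levels")
    nodes = [list(block)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            # Unlike the plain wavelet transform, every node is split again.
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    energies = [sum(c * c for c in node) for node in nodes]
    total = sum(energies)
    if total == 0.0:
        return [0.0] * len(energies)
    return [e / total for e in energies]


# Toy 200-point block (hypothetical data, not from the test rig).
block = [math.sin(2.0 * math.pi * 0.05 * n) for n in range(200)]
features = wpt_energy_features(block)
print(len(features))  # eight features, one per level-3 node
```

Each 200-point block yields one eight-dimensional feature vector, which would then be passed to the SVM or RF classifier.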
The classification accuracy is the ratio of the number of correctly identified testing samples to the total number of testing samples. Figure 13 shows that, as the CR increased, the classification accuracy decreased gradually. However, Lap-CBCS-KSVD achieved the best classification accuracy when the CR was approximately 0.4-0.8. Thus, the same conclusion as that in Section 5.2 was obtained here: Lap-CBCS-KSVD is recommended for reconstructing highly compressed signals. Moreover, Lap-CBCS-KSVD was better than Lap-CBCS-DCT when the CR varied from 0.5 to 0.8, a common range in practice. The results of all DCT-based methods showed that Lap-CBCS was better than other commonly used reconstruction algorithms, such as BP, OMP, BCS, and ROMP. In accordance with Figure 8, the sparseness η of WPT was the largest, suggesting that WPT promoted sparsity the least among FFT, DCT, and WPT. This is the key reason why the signals reconstructed by Lap-CBCS-WPT performed so poorly in Figure 13.
Moreover, we studied the confusion matrices for different compression ratios, taking CR = 0.1 and 0.9 as examples. With the four states represented by N, P, R, and S, we determined the confusion matrices of SVM using the signal reconstructed by Lap-CBCS-KSVD. Table 4 shows that the planet gear failure, ring gear failure, and sun gear failure were likely to be confused with the normal state when the CR was large. However, as the CR decreased, as in Table 5, the reconstructed signal became increasingly easy to classify.
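As an illustration of how the classification accuracy and the confusion matrices over the four states N, P, R, and S can be computed, the following is a minimal sketch. The function names and the toy labels are hypothetical and are not taken from Tables 4 and 5; rows are assumed to index the true state and columns the predicted state.

```python
def confusion_matrix(true_labels, predicted_labels, classes=("N", "P", "R", "S")):
    """Count matrix with rows = true state and columns = predicted state."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Correctly classified testing samples divided by all testing samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Toy labels (hypothetical): two faulty blocks are mistaken for the normal state.
y_true = ["N", "N", "P", "R", "S", "S"]
y_pred = ["N", "N", "N", "R", "S", "N"]
cm = confusion_matrix(y_true, y_pred)
for state, row in zip("NPRS", cm):
    print(state, row)
print(accuracy(cm))
```

Off-diagonal entries in the first column correspond to the case discussed above, in which a fault state is confused with the normal state.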
Finally, we considered another approach to machinery monitoring: sending the whole signal uncompressed and performing fault detection at the center. We carried out further experiments to evaluate this alternative. The classification accuracy on the original signal was 98.75% using SVM and 98.33% using random forest. The fault classification of the original signal was thus considerably better than that of the reconstructed signal. However, if a classification accuracy of 90% is acceptable, a compressed signal with CR = 0.5 is expected to significantly lower the sensor and transmission costs. Accordingly, there is a trade-off between classification accuracy and transmission cost. The present method makes efficient transmission possible when the resulting classification accuracy is acceptable.

Conclusions
Wireless transmission of vibration signals has great potential. However, the high sampling frequency requires an efficient signal compression approach. This paper developed a new technique that imposes sparseness on the original signal by means of Laplace priors and extends the algorithm to a multitask scenario. In addition, a K-SVD training method was used for sparse decomposition of the signal. The reconstruction performance of Lap-CBCS was compared with that of typical algorithms, such as OMP, BP, and BCS. This paper also discussed the benefits of K-SVD dictionary learning. Finally, we presented a fault detection case using the reconstructed signal and several algorithms for verification. The experimental results showed that Lap-CBCS-KSVD achieved good classification accuracy under the SVM and RF models.