Bearing Fault Diagnosis Based on Improved Convolutional Deep Belief Network

Abstract: Mechanical equipment fault detection is critical in industrial applications. Traditional fault diagnosis methods, based on vibration signal processing and analysis, rely on rich professional knowledge and artificial experience, and achieving accurate feature extraction and fault diagnosis with such approaches is difficult. To learn fault features from data automatically, a deep learning method is used. This study proposes a qualitative and quantitative method for rolling bearing fault diagnosis based on an improved convolutional deep belief network (CDBN). First, the original vibration signal is converted to a frequency-domain signal with the fast Fourier transform to improve shallow inputs. Second, the Adam optimizer is introduced to accelerate model training and convergence. Finally, the model structure is optimized: a multi-layer feature fusion learning structure is put forward in which the characterization capability of each layer is fully used to improve the generalization ability of the model. In the experimental verification, a laboratory self-made bearing vibration signal dataset was used. The dataset included healthy bearings, nine single faults of different types and sizes, and three different types of compound fault signals. The results under 0 kN and 1 kN loads both indicate that the proposed model has better diagnostic accuracy, averaging 98.15% and 96.15%, respectively, than the traditional stacked autoencoder, artificial neural network, deep belief network, and standard CDBN. With improved diagnostic accuracy, the proposed model realizes reliable and effective qualitative and quantitative diagnosis of bearing faults.


Introduction
With the explosive progress of modern science and industry, machinery and equipment in fields such as aerospace, rail, and wind power are becoming faster, more automated, and more precise than before. However, increasingly complex operating conditions inevitably lead to failures. Thus, monitoring and diagnosing the health of rotating machinery and equipment to ensure operational safety and reliability is immensely necessary and urgent; fortunately, intelligent diagnosis approaches combining classification algorithms and signal processing techniques have produced promising results [1].
Based on vibration signal processing, traditional fault diagnosis methods extract fault components from the initial noisy signal by using rich professional knowledge. At present, signal processing methods such as empirical mode decomposition (EMD) [2], the wavelet packet transform [3], and morphological filters [4] are commonly used for time-frequency analysis. Dong et al. [5] applied an improved convolutional neural network with anti-interference capability to rolling bearing performance degradation assessment. Gong et al. [6] proposed a deep learning method combining improved convolutional neural networks and support vector machines with data fusion for intelligent fault diagnosis. To realize bearing fault detection, Shi et al. [7] formed a quantity matrix of bearing vibration signal features on the basis of EMD and local mean decomposition. In recent years, machine learning methods, such as support vector machines [8] and artificial neural networks (ANNs) [9], have been gradually introduced into mechanical fault diagnosis. Their introduction has minimized human intervention and reliance on professional skills, moving the field in an intelligent direction.
As an active field in machine learning research, deep learning has begun to appear in bearing failure diagnosis in recent years, including the stacked autoencoder (SAE) [10], deep belief networks (DBNs) [11], and convolutional neural networks (CNNs) [12]. Zhao et al. [13] summarized the emerging research on machine health monitoring based on deep learning and discussed new trends in deep learning monitoring methods. Considering ways of enhancing robust features for rotating machinery, Shen et al. [14] proposed an automatic learning method with robust features based on a contractive autoencoder. Chen et al. [12] proposed cyclic spectral coherence and convolutional neural networks for bearing fault diagnosis. Shao et al. [15] compressed the data with autoencoders and constructed a convolutional deep belief network (CDBN) to diagnose bearing faults. In view of compressed sensing and deep learning theory, Wen [16] studied a fault diagnosis method for bearing vibration signals that can extract features automatically.
In summary, traditional methods have several shortcomings:
1. Traditional vibration signal processing and analysis methods rely on certain professional skills;
2. Existing shallow machine learning methods rely on the accuracy of manual feature extraction;
3. Improper selection of parameters for standard deep learning models can easily result in failure to converge effectively, so diagnostic accuracy is difficult to guarantee;
4. Existing research on the quantitative diagnosis of bearing faults is relatively inadequate compared with that on qualitative diagnosis.
It can be concluded that general deep models face several challenges: (1) a suitable signal preprocessing method is needed to enhance features; (2) measures must be taken during the training process to make the model more stable; (3) the model should determine not only the fault type but also the fault degree. In response to these problems, a CDBN offers fast computation and strong feature extraction. Based on the standard CDBN model, this study introduces the Adam optimization algorithm and optimizes the structure of the CDBN. An improved CDBN model is proposed to enhance diagnostic accuracy. Unlike images, raw fault diagnosis data are usually noisy, and the feature extraction capability of a standard CDBN is insufficient, which is why a band-pass filter is used to preprocess the data. The main contributions of the proposed method are as follows:
1. A band-pass filter is introduced in the preprocessing step to filter out noise;
2. Qualitative and quantitative diagnoses of bearing faults can be effectively implemented;
3. Both single and compound faults can be effectively identified;
4. A comparative experiment under different operating loads further confirmed the reliability of the model.
The rest of the paper is organized as follows. Section 2 introduces the theoretical background of the CDBN and the structure of the proposed method. The experimental results are discussed in Section 3. Finally, conclusions are summarized in Section 4.

Restricted Boltzmann Machine
A DBN is composed of several restricted Boltzmann machines (RBMs). The RBM model is defined through an energy function. An RBM is a basic component of DBNs and a key preprocessing unit in deep learning [17]; it is a basic single-layer machine learning network with extensive applications. The formulas below follow the study of Chen and Li [17]. As shown in Figure 1, an RBM has two layers, namely, a visible layer and a hidden layer. Each neuron node is connected to every node in the adjacent layer, while nodes within the same layer are not connected. The connection weight between the two layers is W. RBM neurons are Boolean, implying that only two states exist, 0 and 1: state 1 indicates activation of a neuron, and state 0 indicates suppression. For visible and hidden layer neurons in binary states, the energy function of the RBM is

E(v, h; \theta) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j, (1)

where \theta = {w_{ij}, a_i, b_j} is the parameter set of the RBM, w_{ij} denotes the weight between the ith unit of the visible layer and the jth unit of the hidden layer, a_i denotes the offset of the ith unit of the visible layer, b_j denotes the offset of the jth unit of the hidden layer, v_i is the state of the ith unit of the visible layer, and h_j is the state of the jth unit of the hidden layer.
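As a small sketch, the energy function above can be evaluated directly from the weights and offsets. This is an illustrative pure-Python helper (the function and variable names are ours, not from the paper):

```python
def rbm_energy(v, h, W, a, b):
    """E(v, h) = -sum_i a_i*v_i - sum_j b_j*h_j - sum_ij v_i*W_ij*h_j."""
    visible_term = -sum(a[i] * v[i] for i in range(len(v)))
    hidden_term = -sum(b[j] * h[j] for j in range(len(h)))
    interaction = -sum(v[i] * W[i][j] * h[j]
                       for i in range(len(v)) for j in range(len(h)))
    return visible_term + hidden_term + interaction

# A toy RBM with 2 visible units and 1 hidden unit, all states binary
energy = rbm_energy(v=[1, 0], h=[1], W=[[2.0], [3.0]], a=[0.5, 0.5], b=[1.0])
# energy = -(0.5) - (1.0) - (1*2.0*1) = -3.5
```

Lower energy corresponds to higher probability under the Boltzmann distribution, which is what training shapes.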

After the parameters are determined, the joint probability distribution of (v, h) is obtained from Formula (1):

P(v, h; \theta) = \frac{1}{Z(\theta)} \exp(-E(v, h; \theta)), \quad Z(\theta) = \sum_{v,h} \exp(-E(v, h; \theta)), (2)

where Z(\theta) represents the normalization constant.
From the joint probability distribution, the marginal probability and the conditional probabilities of the visible and hidden neurons are derived as Formulas (4) and (5), respectively. Given the states of the visible and hidden neurons, the probabilities of activating the jth hidden neuron and the ith visible neuron are, with \sigma(x) = 1/(1 + e^{-x}),

P(h_j = 1 | v) = \sigma(b_j + \sum_i v_i w_{ij}), \quad P(v_i = 1 | h) = \sigma(a_i + \sum_j w_{ij} h_j).

The contrastive divergence algorithm [18] is used to train the RBM. The training task is to find the optimal value of the parameter \theta so that the marginal probability of the visible neurons under the distribution represented by the RBM is maximized; that is, the goal is to maximize the log-likelihood function.
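The activation probabilities above can be sketched as follows (an illustrative plain-Python version; the function names are ours):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def p_hidden_given_visible(v, W, b):
    """P(h_j = 1 | v) = sigma(b_j + sum_i v_i * W_ij)."""
    n_vis, n_hid = len(v), len(b)
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(n_vis)))
            for j in range(n_hid)]

def p_visible_given_hidden(h, W, a):
    """P(v_i = 1 | h) = sigma(a_i + sum_j W_ij * h_j)."""
    n_vis, n_hid = len(a), len(h)
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(n_hid)))
            for i in range(n_vis)]

# With zero weights and offsets, every unit is activated with probability 0.5
probs = p_hidden_given_visible([1, 0], [[0.0], [0.0]], [0.0])
```

In contrastive divergence training these two conditionals are sampled alternately (Gibbs sampling) to obtain the reconstruction statistics used in the weight update.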
Convolutional Deep Belief Network and Its Improvement

Convolutional Restricted Boltzmann Machine
The convolutional RBM (CRBM) is an improvement on the original RBM, and its structure is similar to that of the RBM. The CRBM is an improved model composed of two random variable matrix layers, namely, the visible and hidden layers. An image with the salient features of local receptive fields and weight sharing is regarded as the input layer of the CRBM. The hidden layers are locally connected with the visible layers, and their weights are shared through convolution.
The CRBM model, as shown in Figure 2, comprises three layers: the visible layer V, the hidden layer H, and the pooling layer P. We assume that the size of the input layer matrix is N_v × N_v, the number of groups of matrices in the hidden layer is K, and each group is a binary array of size N_H × N_H, so there are K N_H^2 hidden layer units in total. Each group of hidden layers is associated with a filter of size N_w × N_w.
Figure 3 shows the process of obtaining the hidden layer from the visible layer. The size of the convolution kernel is 3 × 3, and the hidden layer units are divided into K sub-matrices. W^1, W^2, ..., W^K connect the visible and hidden layers. Each hidden layer unit represents a specific feature extracted from a neighborhood of the visible layer. Moreover, b_k indicates the bias of each hidden group, and c is the bias globally shared by all visible units. The energy function of the CRBM is given as Equation (9):

E(v, h) = -\sum_{k=1}^{K} \sum_{i,j} h_{ij}^k (\tilde{w}^k * v)_{ij} - \sum_{k=1}^{K} b_k \sum_{i,j} h_{ij}^k - c \sum_{i,j} v_{ij}, (9)

where * indicates convolution, v_{ij} denotes the value of the visible unit at position (i, j), h^k denotes the kth hidden group, h_{ij}^k denotes the value of its unit at position (i, j), and w^k denotes the convolution kernel of the kth hidden group (\tilde{w}^k is w^k flipped horizontally and vertically).
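The convolution that produces one hidden group from the visible layer can be sketched as a "valid" 2-D convolution (our illustrative code; flipping the kernel makes it a true convolution rather than a cross-correlation):

```python
def conv2d_valid(v, w):
    """Valid 2-D convolution of an N_v x N_v visible layer with an N_w x N_w kernel."""
    n_v, n_w = len(v), len(w)
    n_h = n_v - n_w + 1                      # hidden group size: N_v - N_w + 1
    wf = [row[::-1] for row in w[::-1]]      # flip the kernel in both axes
    return [[sum(v[i + m][j + n] * wf[m][n]
                 for m in range(n_w) for n in range(n_w))
             for j in range(n_h)] for i in range(n_h)]

# A 3x3 visible layer convolved with a 2x2 kernel yields a 2x2 hidden group
hidden = conv2d_valid([[1, 1, 1], [1, 1, 1], [1, 1, 1]], [[1, 1], [1, 1]])
# hidden == [[4, 4], [4, 4]]
```

The resulting pre-activations are passed through the sigmoid in the CRBM conditionals, and the pooling layer then downsamples each hidden group.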
As with the standard RBM, the conditional probability distributions are given as follows:

P(h_{ij}^k = 1 | v) = \sigma((\tilde{w}^k * v)_{ij} + b_k), \quad P(v_{ij} = 1 | h) = \sigma((\sum_k w^k * h^k)_{ij} + c).

A CRBM is a single-layer network and can be considered the basis of a CDBN. Stacking multiple CRBMs, with the hidden layer of the previous CRBM serving as the visible layer of the next, constitutes a CDBN. During training, the lowest-layer CRBM is trained first, one layer at a time, up to the top layer.

CDBN and Its Improvement
Proposed by Lee in 2009, a CDBN is a network model built from CRBMs. Multiple CRBMs are connected to form a CDBN, with the output of each CRBM layer used as the input of the subsequent layer. The model's fitting ability is further improved by this multi-layer linking.
The objective of a CDBN is to maximize the log-likelihood function L(\theta) in order to obtain the optimal parameters, and the reconstruction error is used to evaluate the model. The reconstruction error is the difference between a training sample and the data reconstructed by Gibbs sampling from the distribution represented by the RBM, with the training sample as the initial state. A smaller reconstruction error indicates better training. The reconstruction error is given by Equation (12), with X_i and \hat{X}_i indicating the real output and the ideal output:

E = \sum_i (X_i - \hat{X}_i)^2. (12)

Based on the standard CDBN model, an improved structure is proposed. The standard CDBN uses only the features of the final layer and ignores the comprehensive utilization of the features of each layer; as a consequence, the classification results are not representative. This study therefore improves the model connection mode: the outputs of the two CRBM layers are combined into one vector and input to the softmax classifier, fusing multi-level features to further improve classification accuracy.
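The reconstruction error can be sketched as a sum of squared differences between each sample and its Gibbs reconstruction (a minimal illustration; the function name is ours):

```python
def reconstruction_error(X, X_hat):
    """Sum of squared differences between real outputs X_i and reconstructions X_hat_i."""
    return sum((x - y) ** 2 for x, y in zip(X, X_hat))

# One mismatched element of magnitude 0.5 contributes 0.25 to the error
err = reconstruction_error([1.0, 0.0, 1.0], [0.5, 0.0, 1.0])  # = 0.25
```

During training this quantity is tracked per epoch; a monotonically decreasing curve indicates that the CRBM layers are converging.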
Model training can be summarized in the following steps:
1. Forward propagation: update the weights and biases of the network with the Adam optimizer.
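The per-parameter Adam update mentioned above can be sketched as a single-parameter step (the function name is ours; the default hyperparameters follow common practice, not values stated in the paper):

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient and its
    square, bias-corrected, then a per-parameter scaled step."""
    m = beta1 * m + (1 - beta1) * grad          # first moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction (t = step count)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# The first step from theta = 0 with unit gradient moves by roughly -lr
theta, m, v = adam_step(0.0, 1.0, 0.0, 0.0, t=1)
```

Because the step is normalized by the running second moment, frequently updated parameters take smaller steps while rarely updated (sparse) parameters take larger ones, which is the behavior the text describes.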

Band-Pass Filter for Signal Preprocessing
A band-pass filter is a commonly used data processing method for filtering out clutter. It retains frequency components within a certain range [19] while filtering out those in other ranges. Its bandwidth is determined by the parameter λ, with which it is positively correlated.
Assuming that the input image size is N × N, the passband bandwidth is given by Equation (13) [19]. The filter function is then given by Equation (14), with x and y denoting the angular frequencies of a two-dimensional filter [20], where \rho(x, y) = \sqrt{x^2 + y^2}. Figure 4 shows the spectrogram of a band-pass filter. It can effectively remove high-frequency noise and the low-frequency connected-domain parts of the image, retaining the texture features and making the samples more suitable for convolutional networks to process.
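A frequency-domain band-pass mask of the kind described above can be sketched as an idealized binary mask over radial frequency \rho(x, y) (our simplification; the paper's filter function (14) is parameterized differently, so this is only illustrative):

```python
import math

def bandpass_mask(n, low, high):
    """Binary mask over an n x n spectrum: keep radial frequency rho(x, y)
    (distance from the spectrum centre) within [low, high]."""
    c = n / 2.0
    return [[1.0 if low <= math.sqrt((x - c) ** 2 + (y - c) ** 2) <= high else 0.0
             for x in range(n)] for y in range(n)]

mask = bandpass_mask(8, 1.0, 3.0)
# The zero-frequency centre (low-frequency content) is suppressed,
# while mid-band frequencies pass through
```

Multiplying the sample's spectrum elementwise by such a mask removes both the low-frequency connected-domain component and high-frequency noise, which is the effect the paper attributes to its filter.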

Fault Diagnosis Model Based on Improved CDBN
Based on the CDBN model in Section 2.2, we further improved the model in two aspects: changes to the preprocessing stage and the introduction of an optimizer. First, in the preprocessing stage, we introduced a band-pass filter to process the original samples after fast Fourier transform (FFT) and folding. The center frequency of the band-pass filter is set to 50% of the bandwidth, and the passband width is set to 40% of the bandwidth. The spectrogram of the band-pass filter is shown in Figure 4. Second, during the training phase, the Adam optimizer [20] is introduced, enabling the model to use a different learning rate for each parameter: smaller steps are applied to frequently updated parameters, whereas larger steps suit sparse parameters. This algorithm further improves the model's convergence speed and reduces training errors.
Figure 5 is the flowchart of the improved CDBN-based bearing fault diagnosis model proposed in this study. The input vibration signal has a length of 1024. After FFT, it is normalized to between 0 and 1 for signals of different fault types and degrees of failure. Moreover, band-pass filtering is conducted to enhance texture features and reduce the effect of noise. The CDBN model is a combination of two CRBM layers. For the first CRBM layer, the convolution kernel size is 7 × 7 and the number of link weight matrices is 9, each matching the size of the convolution kernel; the pooling kernel size is 2 × 2, and max pooling is used. The output of the first layer is a 13 × 13 feature map, which is used as the input of the second CRBM layer. The second hidden layer adopts a size of 9 × 9 with 16 link weight matrices. The size of the second convolution kernel is 5 × 5, and the size of the second pooling kernel is 2 × 2; the proposed model again uses max pooling. At the end of the model, softmax is used for classification. The extracted features are mapped to 13 different classes, divided according to fault type or degree. Unlike the standard CDBN, before classification, the outputs of the first and second layers are fused to further increase diagnostic accuracy. Table 1 displays the specific parameters of the network.
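The layer sizes quoted above can be checked with a small size calculation (valid convolution followed by non-overlapping max pooling; the helper name is ours):

```python
def crbm_output_size(n_in, kernel, pool):
    """Feature-map side length after one CRBM layer: (n_in - kernel + 1) // pool."""
    return (n_in - kernel + 1) // pool

# 1024-point spectrum folded to 32 x 32, then passed through the two CRBM layers
s1 = crbm_output_size(32, kernel=7, pool=2)   # first layer: 7x7 kernel, 2x2 pooling
s2 = crbm_output_size(s1, kernel=5, pool=2)   # second layer: 5x5 kernel, 2x2 pooling
# s1 = 13, matching the 13 x 13 feature map in the text; the second hidden
# layer is 13 - 5 + 1 = 9, matching its stated 9 x 9 size, and s2 = 4 after pooling
```

This confirms that the quoted 32 → 13 → 9 sizes are internally consistent with 7 × 7 and 5 × 5 kernels and 2 × 2 pooling.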


Dataset Description
To confirm the performance of the proposed model in bearing fault diagnosis, a laboratory-made bearing fault test platform was used to collect multiple faults and vibration signals at various levels.
As shown in Figure 6, the test platform included variable frequency motors, normal bearings, test bearings, loading systems, and acceleration sensors. The test bearing model was SKF 6205-2RS, and the specific bearing parameters are listed in Table 2. Single-point or compound faults were set on the bearing surface by wire cutting. Figure 7 shows real pictures of four bearings in four different health conditions. The fault types included outer-race faults, ball faults, inner-race faults, inner-race compound ball faults (IB), outer-race compound ball faults (OB), and inner-race compound outer-race faults (IO). IB indicates that an inner-race fault and a ball fault exist on the test bearing at the same time, both with a fault degree of 0.2 mm; likewise, IO indicates simultaneous inner-race and outer-race faults, and OB indicates simultaneous outer-race and ball faults. The specific datasets are described in Tables 3 and 4; dataset1 was collected under a 0 kN load and dataset2 under a 1 kN load. The motor speed was set to 961 rpm, and the signal sampling frequency was 10 kHz.

The original vibration signal was segmented into samples with a length of 1024. From each label, 300 samples were obtained. Two-thirds of the samples were used as training data, and the remaining third was used as testing data.
The raw data of the bearing vibration signals are shown in Figure 8. The signal is noisy, and key information is hard to discern. The first step of data preprocessing is to apply the FFT and fold the result into a 32 × 32 matrix. Figure 9 shows the 13 sample types after FFT processing and folding. The second step of data preprocessing is filtering and normalization, for the purpose of filtering out noise. Figure 10 shows the samples after band-pass filtering. The figures clearly show that the noise was reduced without weakening the fault signal; on the contrary, the texture features of the samples were enhanced, which is more conducive to feature extraction by convolutional networks.
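The FFT-and-fold step can be sketched end-to-end (a simplified illustration using a naive DFT; the band-pass filtering step is omitted here, and the side length is derived from the segment length, so a 1024-point segment would fold to 32 × 32):

```python
import cmath
import math

def preprocess(segment):
    """FFT magnitude of a signal segment, peak-normalized to [0, 1] and folded
    into a square matrix (32 x 32 for a 1024-point segment)."""
    n = len(segment)
    side = math.isqrt(n)                                 # 32 when n == 1024
    spectrum = [abs(sum(segment[t] * cmath.exp(-2j * math.pi * k * t / n)
                        for t in range(n)))
                for k in range(n)]                       # naive DFT magnitude
    peak = max(spectrum) or 1.0
    norm = [s / peak for s in spectrum]                  # normalize to [0, 1]
    return [norm[r * side:(r + 1) * side] for r in range(side)]

# A 16-point toy sine segment folds to a 4 x 4 "image"
img = preprocess([math.sin(2 * math.pi * 3 * t / 16) for t in range(16)])
```

In practice an FFT library routine would replace the quadratic-time DFT loop; the folding and normalization are the parts specific to this pipeline.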

Diagnosis Results and Comparative Analysis
The comparison experiment was split into two parts. The proposed model was first compared with SAE, ANN, and DBN; then, it was compared with the standard CDBN.

Comparison with SAE, ANN, and DBN
The test results of the proposed model are shown in Figure 11b,d. The proposed model can achieve 100% accuracy for most fault types and degrees. By contrast, misclassifications occur when detecting an IO fault with a small fault degree, because the features of the IO fault type are less obvious than those of other fault types. On the collected experimental dataset, the proposed improved CDBN model achieved an overall accuracy of 98.15% under 0 kN load and 96.15% under 1 kN load.
Tables 5 and 6 show the comparison of the proposed improved CDBN model with SAE, ANN, and DBN. The length of the input signal was 1024. The SAE used a single hidden layer with 100 hidden neurons; the ANN had the same structure as the SAE; and the DBN had two hidden layers of 100 neurons each. The listed results indicate that the proposed model had an obvious advantage over the other models: when the detection performance of the other models decreased considerably, the proposed model was still able to maintain high detection accuracy. The test results of the standard CDBN are illustrated in Figure 11a,c. The figure shows that the improved CDBN achieved higher accuracy than the standard CDBN, especially for outer-race and ball faults. However, for the IO fault type (label 12), the classification accuracy of both models showed no obvious improvement because of the difficulty in capturing distinguishable features. In general, the proposed improved CDBN model performed better than the standard CDBN model. A commonly used method for high-dimensional data analysis is t-distributed stochastic neighbor embedding (t-SNE), which can reduce data of any dimensionality to three or fewer dimensions. As shown in Figure 13, an intuitive feature distribution was obtained by using t-SNE to visualize the features of the standard and improved CDBN.


Conclusions
An improved CDBN-based fault diagnosis model was proposed for the effective extraction and learning of quantitative and qualitative features of different bearing fault types and sizes. The proposed model was able to extract deep features of the dataset, effectively implementing the diagnosis of multiple degrees and types of bearing faults. First, a band-pass filter was used to preprocess the original signal to obtain optimized features. Simultaneously, the Adam optimizer was introduced to speed up training and improve the model's convergence. Finally, in combination with the softmax two-layer connection, multi-layer feature fusion was used to fully exploit the feature representation capabilities of each layer of the model. Compared with the standard CDBN, the improved CDBN model displayed higher accuracy. The improved model also ensured the characterization of the learned features and ultimately enabled qualitative and quantitative diagnosis of bearing faults. The experimental results show that the proposed method had reduced training error, a smoother decline, and higher diagnostic accuracy than SAE, ANN, and DBN. The proposed model had a positive impact on the classification of single and compound fault types of the datasets. Future research will focus on dealing with unbalanced or small data, since these cases are more meaningful for practical applications. In future investigations, comparative experiments on different optimizers, other networks (such as recurrent neural networks and generative adversarial networks), and attention-based mechanisms will be conducted.

Figure 3. The process of obtaining hidden layers from the visible layer based on the CRBM.
Appl. Sci. 2020, 10, 6359. The training procedure is as follows: (a) use the CD algorithm to pre-train W and b, and determine the opening and closing of the corresponding hidden units; (b) propagate upward layer by layer, compute the excitation value of each hidden unit, and use the sigmoid function to standardize it; (c) use the minimum mean square error criterion with the backward error propagation algorithm to update the parameters of the network.
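Steps (a) and (b) above can be sketched for a one-dimensional CRBM layer. This is an illustrative sketch under assumed shapes (a 1-D visible vector, a single shared convolution kernel, and a scalar hidden bias), not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_activation(v, W, b):
    """Step (b): excitation of one CRBM hidden feature map.

    v -- 1-D visible-layer input
    W -- shared 1-D convolution kernel (assumed shape)
    b -- scalar hidden bias
    A 'valid' cross-correlation gives each hidden unit's excitation,
    which the sigmoid then standardizes to a probability.
    """
    excitation = np.correlate(v, W, mode="valid") + b
    return sigmoid(excitation)

def sample_hidden(h_prob, rng):
    """Step (a): stochastically open/close hidden units by their probability."""
    return (rng.random(h_prob.shape) < h_prob).astype(float)

rng = np.random.default_rng(0)
v = rng.random(16)
W = rng.standard_normal(4) * 0.1
h = hidden_activation(v, W, b=0.0)
states = sample_hidden(h, rng)
print(h.shape)  # 13 hidden units for a length-16 input and length-4 kernel
```

In a full CD update these sampled hidden states would drive the downward reconstruction of the visible layer before the weights are adjusted.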

Figure 4. The spectrogram of a band-pass filter.

Figure 5. Bearing fault diagnosis model based on an improved convolutional deep belief network (CDBN). FFT: fast Fourier transform.

Figure 6. Bearing fault diagnosis model based on an improved CDBN.

Figure 7. The real pictures of four bearings with four different health conditions.

Figure 8. Raw data of bearing vibration signals of different labels.

Figure 9. Visualization of the data after FFT.

Figure 10. Visualization of data after filtering and normalization. The main purpose of this step is to filter out noise.


Figure 11. (a) Testing result of dataset1 using the standard CDBN. (b) Testing result of dataset1 using the improved CDBN. (c) Testing result of dataset2 using the standard CDBN. (d) Testing result of dataset2 using the improved CDBN.

Figure 12. Comparison of second-layer reconstruction error when training (a) the standard CDBN and (b) the improved CDBN.

Figure 13. Feature visualization of (a) dataset1 using the standard CDBN, (b) dataset1 using the improved CDBN, (c) dataset2 using the standard CDBN, and (d) dataset2 using the improved CDBN.

The results indicate the obvious optimization effect of the proposed method on feature processing. The CDBN integrates the convolution operation into the original DBN, which gives it a stronger feature extraction capability. On the basis of the standard CDBN, a filter was introduced to remove noise, so that the improved CDBN model has a more stable learning ability. The feature distribution of the original data is relatively loose, and mixed regions exist without obvious class boundaries. These phenomena are not conducive to the subsequent classification. For instance, the features of labels 6 and 7 are mostly muddled; thus, misclassification occurred, as shown in Figure 11. By contrast, the feature distribution obtained with the improved CDBN is tightly clustered, with easily observed boundaries and wider inter-class distances, which are helpful for further feature extraction and classification. Reconstruction error is a key indicator for measuring model learning ability. Different optimizers were employed for comparative analysis, and the resulting reconstruction errors are shown in Figure 14. As shown in the figure, during training, the reconstruction error of the CDBN model with the Adam optimizer dropped smoothly and was the smallest.

Figure 14. Comparison of reconstruction errors during training with different optimizers.
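For reference, the Adam update applied to such a reconstruction-error gradient follows the standard rule (biased first- and second-moment estimates with bias correction). The sketch below uses the usual default hyperparameters; the toy quadratic objective is illustrative only, not the paper's reconstruction error.

```python
import numpy as np

def adam_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; state holds (m, v, t)."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, t)

# Toy use: minimize a quadratic "reconstruction error" ||theta - target||^2.
target = np.array([1.0, -2.0])
theta = np.zeros(2)
state = (np.zeros(2), np.zeros(2), 0)
for _ in range(2000):
    grad = 2 * (theta - target)
    theta, state = adam_step(theta, grad, state, lr=0.05)
print(theta)  # close to [1.0, -2.0]
```

The adaptive per-parameter step size is what produces the smooth, rapid error decline reported for Adam relative to plain gradient descent.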

Table 1 displays the specific parameters of the network.

Table 1. Specific parameters of the network.

Table 2. Specifications of the test bearings.

Table 3. Description of compound faults for dataset1. IB: inner-race fault and ball fault exist at the same time on the test bearing; IO: inner-race fault and outer-race fault exist at the same time; OB: outer-race fault and ball fault exist at the same time.

Table 4. Description of compound faults for dataset2.

Table 5. Comparison of testing results of dataset1 under load 0 kN by different models.

Table 6. Comparison of testing results of dataset2 under load 1 kN by different models.