A Multichannel Data Fusion Method Based on Multiple Deep Belief Networks for Intelligent Fault Diagnosis of Main Reducer

Abstract: To address the poor efficiency of intelligent fault diagnosis methods for the main reducer and the poor effectiveness of multichannel data fusion, this paper proposes a multichannel data fusion method based on deep belief networks and random forest fusion for fault diagnosis. Multiple deep belief networks (MDBNs) are constructed to obtain deep representative features from multiple modalities of multichannel data. A random forest then fuses the deep representative features obtained from the MDBNs to form the multiple deep belief networks fusion (MDBNF) model, which is intended to improve the multichannel data fusion effect. The proposed method is applied to fault diagnosis of the main reducer and its performance is evaluated. Single sensory data, multichannel data, and two intelligent models based on support vector machine and deep belief networks are used for comparison in the experiments. The results indicate that, without fusion, the classification accuracy of the test sets collected by sensor 1 and sensor 2 is 88.35% and 88.73%, respectively. The comparison results show that the method has good convergence. The data fusion of the proposed diagnostic model can effectively improve the correlation between the collected vibration signals and the failure mode, thereby improving the diagnostic accuracy by nearly 8%.


Introduction
As the crucial part of the rear axle, the condition of the main reducer has a direct impact on the level of vibration, safety and comfort, and any fault of the main reducer may lead to production downtime, economic loss and human injury [1,2]. Vibration signals generated during the running process can effectively reflect the variance of condition. Therefore, fault diagnosis based on vibration signals is the most-used way of machinery condition monitoring and fault diagnosis [3]. McLaughlin et al. utilized bi-spectrum analysis based on modulating signal to acquire features of gear vibration signal [4]. Marnatha et al. employed vibration signals and statistical parameters to detect local fault of helical gear tooth [5]. Yang et al. employed ensemble empirical mode decomposition to extract features of gear vibration signals [6]. Jena et al. used active noise cancellation and adaptive wavelet transform in gear fault diagnosis [7].
The first step of machinery fault diagnosis is to monitor valid condition symptoms [8,9]. Due to the configuration complexity and failure diversity of mechanical systems, it is difficult to determine the best position of sensors for monitoring. Meanwhile, the relationship between fault patterns and condition symptoms is a complex, non-linear mapping [10]. Collecting vibration signals from single-channel sensors may encounter the limitations of installation position and direction of sensor

Deep Belief Network with Gaussian-Bernoulli RBM
A deep belief network (DBN) is a typical form of deep learning network and contains a visible layer, multiple hidden layers and an output layer. The visible layer receives the input data, which are then passed through the hidden layers to achieve the non-linear transform that constitutes deep learning [20,21]. The structure of a DBN is a stack of restricted Boltzmann machines (RBMs), each of which is composed of a visible layer and a hidden layer with binary stochastic units [22]. In order to deal with real-valued input data that cannot be normalized into {0, 1}, we propose to constitute the DBN from Gaussian-Bernoulli RBMs (GRBMs) [23], in which the visible units are Gaussian neurons. The architecture of the GRBM is shown in Figure 1.
In Figure 1, $v_i$ and $h_j \in \{0, 1\}$ are the visible units and hidden units; the visible units are real-valued Gaussian neurons. Vectors $a$ and $b$ represent the biases of the hidden layer and the visible layer, and $w_{ij}$ represents the weight between the $i$th visible unit and the $j$th hidden unit. $N_v$ and $N_h$ represent the numbers of visible units and hidden units, respectively. Considering the standard deviation of the visible units, the joint energy of the visible and hidden layer units of the GRBM is given as follows:

$$E(v, h; \theta) = \sum_{i=1}^{N_v} \frac{(v_i - b_i)^2}{2\sigma_i^2} - \sum_{i=1}^{N_v} \sum_{j=1}^{N_h} w_{ij} h_j \frac{v_i}{\sigma_i} - \sum_{j=1}^{N_h} a_j h_j$$

where $\theta = \{w, b, a\}$ are the model parameters and $\sigma_i$ represents the standard deviation of the $i$th visible unit. The joint probability distribution of the visible vector $v$ and the hidden vector $h$ is defined as follows:

$$P(v, h; \theta) = \frac{\exp(-E(v, h; \theta))}{Z(\theta)}$$

where $Z(\theta)$ is the normalization factor, which depends on $\theta$. For a GRBM structure without intralayer connections, the visible units are conditionally independent given $h$, and the hidden units are conditionally independent given $v$. The probability of the visible vector $v$ is given by:

$$P(v; \theta) = \frac{1}{Z(\theta)} \sum_{h} \exp(-E(v, h; \theta))$$

The conditional probabilities of the $i$th visible unit $v_i$ and the $j$th hidden unit $h_j$ are expressed as follows:

$$P(v_i = x \mid h; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\frac{\left(x - b_i - \sigma_i \sum_{j=1}^{N_h} w_{ij} h_j\right)^2}{2\sigma_i^2}\right)$$

$$P(h_j = 1 \mid v; \theta) = g\left(a_j + \sum_{i=1}^{N_v} w_{ij} \frac{v_i}{\sigma_i}\right) \quad (6)$$

where $x$ is a real number and $g(\cdot)$ represents the logistic function. By stacking $L$ GRBMs, a DBN is constructed with $L$ hidden layers $(h^{(1)}, h^{(2)}, \ldots, h^{(L)})$, in which the input layer and $h^{(1)}$ form GRBM 1, $h^{(1)}$ and $h^{(2)}$ form GRBM 2, and $h^{(L-1)}$ and $h^{(L)}$ form GRBM $L$. The training of the DBN consists of two procedures: pretraining and fine-tuning [24].
The first procedure is unsupervised pretraining of the stack of GRBMs one by one. The hidden layer of each GRBM is used as the input layer of the next GRBM, and the parameters (weights and biases) of each GRBM are optimized independently. Once a GRBM finishes pretraining, the next GRBM starts pretraining. The second procedure is supervised fine-tuning of the whole network by the back-propagation algorithm. In this procedure, all of the hidden layers are considered as a whole and the model parameters are adjusted to decrease the training error [25].
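The two conditional distributions of a GRBM can be illustrated with a minimal sketch. It assumes the common GRBM parameterization in which the hidden activation is $g(a_j + \sum_i w_{ij} v_i / \sigma_i)$ and the visible unit is Gaussian with mean $b_i + \sigma_i \sum_j w_{ij} h_j$. The function names, the toy parameter values and the single Gibbs step are illustrative assumptions, not the training code used in this work.

```python
import math
import random


def logistic(x):
    """Logistic function g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, w, a, sigma):
    """P(h_j = 1 | v) = g(a_j + sum_i w[i][j] * v[i] / sigma[i])."""
    return [
        logistic(a[j] + sum(w[i][j] * v[i] / sigma[i] for i in range(len(v))))
        for j in range(len(a))
    ]


def visible_means(h, w, b, sigma):
    """Mean of the Gaussian P(v_i | h): b_i + sigma_i * sum_j w[i][j] * h[j]."""
    return [
        b[i] + sigma[i] * sum(w[i][j] * h[j] for j in range(len(h)))
        for i in range(len(b))
    ]


def gibbs_step(v, w, a, b, sigma, rng):
    """One v -> h -> v' sampling step (assumed building block of pretraining)."""
    p_h = hidden_probs(v, w, a, sigma)
    h = [1 if rng.random() < p else 0 for p in p_h]   # binary hidden units
    mu = visible_means(h, w, b, sigma)
    v_new = [rng.gauss(mu[i], sigma[i]) for i in range(len(mu))]  # Gaussian visibles
    return p_h, h, v_new


# Toy GRBM with 2 visible and 2 hidden units (hypothetical values).
w = [[0.1, -0.2], [0.3, 0.4]]   # w[i][j]: visible i to hidden j
a = [0.0, 0.1]                  # hidden biases
b = [0.0, 0.0]                  # visible biases
sigma = [1.0, 1.0]              # visible standard deviations
p_h, h, v_new = gibbs_step([0.2, -0.3], w, a, b, sigma, random.Random(0))
```

With all weights set to zero, `hidden_probs` reduces to $g(a_j)$, which is a quick sanity check on the implementation.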

Deep Learning of Multichannel Data Using MDBNs
Through several nonlinear transformations in the form of a stack of GRBMs, a DBN can effectively extract the deep representative features. In order to obtain reliable data to ensure diagnostic accuracy, we propose to use multiple sensors to collect multichannel vibration signals as the input data of diagnostic model. Srivastava et al. proposed a multimodal learning method with deep Boltzmann machines by direct combination of multiple modalities [26]. However, the calculation cost of direct linear combination of input data with different modalities is too high, and the execution time will be increased. The combination process may occupy a large memory space and bring contradictions and conflicts [27]. Based on this, we construct multiple deep belief networks (MDBNs) in which each DBN is used to obtain deep representative features of vibration signal collected from one channel.
As shown in Figure 2, the architecture of MDBNs consists of several DBNs (expressed as DBN$_1$, DBN$_2$, ..., DBN$_M$), wherein DBN$_i$ is used to learn deep representative features of the vibration signals collected from the $i$th of $M$ channels, expressed as Channel 1, Channel 2, ..., Channel $M$. By using MDBNs, after multiple non-linear hierarchical transformations of the multichannel input data, the deep representative features of the raw data are acquired.

With the multichannel input data represented as $X^{(1)}, X^{(2)}, \ldots, X^{(M)}$, the output of the $i$th DBN is expressed as:

$$p_i = \left(p_{i1}, p_{i2}, \ldots, p_{iN_{iL}}\right) \quad (7)$$

where $i = 1, 2, \ldots, M$ and $N_{iL}$ represents the number of output units of the $i$th DBN. With the proposed MDBNs, each channel's data $X^{(i)}$ is converted into deep representative features expressed as $p_i$. In order to obtain the final diagnostic result, the output vectors $p_1, p_2, \ldots, p_M$ of the MDBNs with $M$ independent DBN structures need to be used as the input of the diagnostic model. The output vector of MDBNs is expressed as follows:

$$\left(p_1, p_2, \ldots, p_M\right) \quad (8)$$

Random Forest Fusion of Deep Representative Features
With the representative features of each DBN in the form of Equation (7), the diagnostic model could acquire a result $f_i$ as the condition pattern corresponding to the vibration signal collected from the $i$th channel. However, the results of multiple DBNs corresponding to multichannel vibration signals may be contradictory, such as $f_i \neq f_j$. The difficulty lies in determining which DBN's result is valid. Therefore, we fuse the outputs of all DBNs together to form the input of the diagnostic model.
In order to effectively fuse the deep representative features in the form of Equation (8) extracted from multichannel vibration signals by using MDBNs, we propose to use a random forest to fuse these features and obtain the final result, expressed as follows:

$$f = F(p_1, p_2, \ldots, p_M) \quad (9)$$

where $F(\cdot)$ represents the fusion function.
As shown in Figure 3, the fusion model based on random forest consists of $K$ decision trees expressed as $T_1, T_2, \ldots, T_K$, whose outputs serve as the diagnostic results. Each decision tree is composed of several splits and nodes, in which the splits direct a sample toward an output and the nodes determine the output of the decision tree in the form of a specific class $c$. The outputs of the $K$ decision trees jointly determine the output vector of the fusion model, expressed as follows:

$$C = (c_1, c_2, \ldots, c_K) \quad (10)$$

where $c_k$ is the class assigned by the $k$th decision tree $T_k$. With the output vector $C$, the final diagnostic result is produced by the majority voting operation:

$$f = \arg\max_{c} \sum_{k=1}^{K} I(c_k = c) \quad (11)$$

where $I(\cdot)$ is the indicator function.
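The majority voting operation over the tree outputs can be sketched in a few lines. The helper name `majority_vote`, the example votes and the tie-breaking rule (the class seen first wins) are assumptions made for illustration; the text does not specify how ties are resolved.

```python
from collections import Counter


def majority_vote(tree_outputs):
    """Return the class assigned by the largest number of decision trees.

    Ties are broken in favour of the class that appears first in the
    list (an assumption; the source does not state a tie rule).
    """
    counts = Counter(tree_outputs)
    return counts.most_common(1)[0][0]


# Hypothetical output vector C of K = 10 decision trees for one sample.
C = ["C6", "C5", "C6", "C3", "C6", "C6", "C5", "C6", "C6", "C3"]
final = majority_vote(C)  # "C6" receives 6 of the 10 votes
```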

Application to Intelligent Fault Diagnosis of Main Reducer
The proposed multichannel data fusion method based on multiple deep belief networks and random forest fusion is applied to construct an intelligent fault diagnostic model, MDBNF. The model includes five modules: collecting multichannel vibration signals using multiple sensors, preprocessing the collected signals to generate multiple modalities features, deep learning of the multiple modalities features using MDBNs, fusing the deep representative features obtained from MDBNs, and making the diagnostic decision.

Multiple Modalities Features of Multichannel Vibration Signal
Multiple modalities features of multichannel vibration signal can be extracted using different preprocessing techniques, such as time domain analysis, frequency domain analysis and time-frequency domain analysis. Considering the complementarity of these techniques, we utilize all these techniques to achieve multiple modalities features of multichannel vibration signal.
The time domain analysis technique is easy to implement and independent of rotating speed. The frequency domain analysis technique is superior to the time domain analysis technique for early-stage and distributed faults [28][29][30][31]. Therefore, for a signal $x(i)$ of length $n$, we employ eight statistical parameters, namely kurtosis, crest factor, variance, standard deviation, RMS, skewness, mean and impulse indicator [32], to preprocess the time domain and frequency domain signals.
Additionally, we use the wavelet packet transform (WPT) to decompose the vibration signal for time-frequency domain analysis and to extract more condition parameters. With a maximum decomposition level of $J$, we calculate the wavelet coefficients and use the $2^J$ energies of the $J$th level as condition parameters of the time-frequency domain.
In this way, the multiple modalities features of a vibration signal consist of eight statistical parameters of the time domain, eight statistical parameters of the frequency domain and $2^J$ condition parameters of the time-frequency domain, expressed as follows:

$$Z = (z_1, z_2, z_3)$$

where $z_1$, $z_2$ and $z_3$ represent the parameters of the time domain, frequency domain and time-frequency domain, respectively.
For multichannel vibration signals, the multiple modalities features are expressed as $Z_1, Z_2, \ldots, Z_M$, where $Z_i$ represents the multiple modalities features of the vibration signal collected from the $i$th sensor.
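As an illustration of the eight statistical parameters, the sketch below computes them for one signal. The exact definitions used in [32] are not reproduced here, so the formulas are assumptions based on common conventions: population variance, moment-based skewness and kurtosis, crest factor as peak over RMS, and impulse indicator as peak over mean absolute value. The function name `time_domain_features` is hypothetical.

```python
import math


def time_domain_features(x):
    """Eight statistical parameters of a signal x (assumed common definitions).

    Assumes x is non-constant and not identically zero, so that the
    standard deviation, RMS and mean absolute value are all non-zero.
    """
    n = len(x)
    mean = sum(x) / n
    centered = [xi - mean for xi in x]
    variance = sum(c * c for c in centered) / n          # population variance
    std = math.sqrt(variance)
    rms = math.sqrt(sum(xi * xi for xi in x) / n)
    peak = max(abs(xi) for xi in x)
    mean_abs = sum(abs(xi) for xi in x) / n
    return {
        "mean": mean,
        "variance": variance,
        "std": std,
        "rms": rms,
        "skewness": sum(c ** 3 for c in centered) / (n * std ** 3),
        "kurtosis": sum(c ** 4 for c in centered) / (n * std ** 4),
        "crest_factor": peak / rms,
        "impulse_indicator": peak / mean_abs,
    }


# A square-like toy signal: every parameter except mean and skewness equals 1.
features = time_domain_features([1.0, -1.0, 1.0, -1.0])
```

The same function could be applied to a spectrum amplitude sequence to obtain the frequency domain parameters, again under the assumption that identical definitions are used in both domains.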

The Proposed Diagnostic Model of MDBNF
With the multiple modalities features of multichannel vibration signal in the form of multiple statistical parameters, the proposed diagnostic method constructs MDBNs model as shown in Figure 2 to obtain deep representative features of multiple modalities from multichannel data which are sensitive to fault patterns.
For each DBN in MDBNs, greedy layer-by-layer unsupervised learning is executed to implement the non-linear mapping and to model a high-level abstraction of the raw signal collected from each sensor. After the pretraining of the DBN model, the parameters of the model are fine-tuned in a supervised way by using the back-propagation algorithm to improve the diagnostic accuracy. After the unsupervised pretraining and supervised fine-tuning, the DBN model is well trained on the training samples. The output of each DBN model is a deep representative feature vector, expressed as Equation (7), of one channel's data.
In order to process multichannel vibration signals collected from multiple sensors, multiple DBNs constitute the model of MDBNs to obtain deep representative features of multichannel data. With the output vector of MDBNs expressed as Equation (8), a fusion model as shown in Figure 3 is constructed to fuse the multiple modalities of deep representative features, and then make the diagnostic decision using Equation (11).
After the MDBNs model is well trained on the training samples, it can be used to diagnose the fault pattern of the main reducer by using vibration measurements from multiple sensors. The structure of the proposed diagnostic model of MDBNF for the main reducer is shown in Figure 4.
The procedure of the proposed diagnostic model of MDBNF for the main reducer is described as follows:
Step 1: Collect multichannel vibration signals of the main reducer from $M$ sensors, expressed as $X_1(t), X_2(t), \ldots, X_M(t)$. Define the fault patterns for fault diagnosis.
Step 2: Preprocess the collected signals to generate multiple modalities features, expressed as $Z_1, Z_2, \ldots, Z_M$, using several statistical parameters of the time domain, frequency domain and time-frequency domain.
Step 3: Construct the model of MDBNs consisting of multiple DBN structures to implement deep learning of multichannel data, and generate deep representative features expressed as Equation (8).
Step 4: Use the fusion model based on the random forest method to fuse the deep representative features of MDBNs, and obtain the output vector expressed as Equation (10).
Step 5: With the output of the fusion model, make the final diagnostic decision by using Equation (11).
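The five steps above can be wired together as in the following sketch. Everything in it is a placeholder chosen for illustration: `diagnose` is a hypothetical helper, the preprocessing, DBN and tree arguments are plain callables standing in for trained components, and concatenation of the per-channel deep features is an assumed way of forming the fusion input.

```python
from collections import Counter


def diagnose(signals, preprocess, dbns, trees):
    """Run Steps 2 to 5 for one multichannel sample.

    signals:    one raw vibration signal per channel (Step 1 output)
    preprocess: callable, raw signal -> feature list
    dbns:       one callable per channel, features -> deep features
    trees:      callables, fused features -> class label
    """
    if len(signals) != len(dbns):
        raise ValueError("need exactly one DBN per channel")
    features = [preprocess(x) for x in signals]            # Step 2
    deep = [dbn(z) for dbn, z in zip(dbns, features)]      # Step 3
    fused = [value for p in deep for value in p]           # assumed: concatenation
    votes = [tree(fused) for tree in trees]                # Step 4
    return Counter(votes).most_common(1)[0][0]             # Step 5


# Toy stand-ins for the trained components (hypothetical).
toy_preprocess = lambda x: [sum(x), max(x)]
toy_dbns = [lambda z: [z[0]], lambda z: [z[1]]]
toy_trees = [
    lambda f: "C1" if f[0] > 0 else "C2",
    lambda f: "C1" if f[1] > 0 else "C2",
    lambda f: "C2",
]
result = diagnose([[1, 2], [3, -1]], toy_preprocess, toy_dbns, toy_trees)
```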

Experiment Setup
The experiments are carried out on a main reducer fault diagnosis test rig which consists of a control cabinet part used to control the rotating speed and rotating time, a drive part used to start the driving motor and a fixture part used to simulate the running states of main reducer under certain rotating speed. The main reducer test rig is shown in Figure 5. In order to implement reliable condition monitoring of the main reducer to collect multichannel data for fault diagnosis, we install two sensors on the tested main reducer in horizontal direction and vertical direction as shown in Figure 6.
The common faults of the main reducer occur in the pair of gears, including gear error, gear burr, gear hard point, misalignment, broken gear tooth and gear crack, as listed in Table 1. Seven condition patterns were simulated at the rotating speed of 1200 rps (revolutions per second) with the sampling frequency of 12 kHz. They contained six fault patterns and the normal condition. We ensured that the sampling frequency was higher than the gear meshing frequency so that effective fault information could be preserved during sampling.
For each condition pattern, we repeated 10 tests to collect enough data to represent the pattern. In each test, 20 signals with a duration of 0.2 s were collected. In this way, for each sensor, 1400 vibration signals corresponding to the seven condition patterns were acquired, of which 1050 signals were used as the training set to train the MDBNs model and 350 signals were used as the testing set to test the performance of the model. The collected multichannel vibration signals are expressed as $X_1(t) = [x^1_1(t), x^1_2(t), \ldots, x^1_{1400}(t)]$ and $X_2(t) = [x^2_1(t), x^2_2(t), \ldots, x^2_{1400}(t)]$. Each experiment was executed for 50 trials.
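The sample counts follow directly from the acquisition settings, as the short check below shows. The variable names are illustrative; the number of points per signal is derived here from the stated duration and sampling frequency rather than quoted from the text.

```python
patterns = 7             # six fault patterns plus the normal condition
tests_per_pattern = 10
signals_per_test = 20
duration_s = 0.2         # length of each collected signal
sampling_hz = 12_000     # 12 kHz sampling frequency

signals_per_sensor = patterns * tests_per_pattern * signals_per_test
points_per_signal = round(duration_s * sampling_hz)   # derived, not stated in the text
train, test = 1050, 350
train_fraction = train / signals_per_sensor

print(signals_per_sensor, points_per_signal, train_fraction)
```

Running it confirms 1400 signals per sensor and a 75%/25% split between training and testing sets.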

Data Preprocessing
In the preprocessing of raw vibration signals to extract condition parameters, the maximum decomposition level of WPT is 6, and the Daubechies wavelet of order 4 (db4) is employed as the mother wavelet. Then, $2^6$ sets of coefficients are obtained by using WPT, and the energy of each set is combined to form a set of condition parameters. In this way, multiple modalities features with a dimension of 80 (8 time domain, 8 frequency domain and 64 time-frequency domain parameters) are extracted for the vibration signal collected from each sensor and stored in a matrix of 1400 rows (number of samples) and 80 columns (number of features).
In order to implement deep learning of the multiple modalities features extracted from the vibration signals of the two sensors, two feature sets of 1050 training samples corresponding to the seven condition patterns are expressed as $Z_1 = [Z^1_1, Z^1_2, \ldots, Z^1_{1050}]$ and $Z_2 = [Z^2_1, Z^2_2, \ldots, Z^2_{1050}]$, in which $Z^1_i$ and $Z^2_i$ are composed of 80 features extracted from the $i$th vibration signal, and $Z_1$ and $Z_2$ are used to construct the MDBNs model with two DBNs.
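To show how the $2^6$ energy parameters arise, the following sketch builds a full wavelet packet tree and returns the energy of each terminal node. For self-containment it uses the Haar wavelet instead of the db4 wavelet employed in this work, so the numerical values would differ from the actual features; the function names are hypothetical, and the nodes are returned in natural (not frequency) order.

```python
import math


def haar_split(x):
    """One Haar analysis step: approximation and detail halves of x."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def wpt_energies(x, level):
    """Energies of the 2**level terminal nodes of a Haar wavelet packet tree.

    Assumes len(x) is a multiple of 2**level; otherwise a trailing
    sample is dropped at every odd-length split.
    """
    nodes = [list(x)]
    for _ in range(level):
        # Split every node, approximation and detail alike (full packet tree).
        nodes = [half for node in nodes for half in haar_split(node)]
    return [sum(c * c for c in node) for node in nodes]


signal = [float(i % 5) for i in range(64)]   # toy signal, length 2**6
energies = wpt_energies(signal, 6)
feature_dim = 8 + 8 + len(energies)          # time + frequency + time-frequency
```

Because the Haar transform is orthonormal, the node energies sum to the energy of the input signal, which gives a simple check of the decomposition.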

Model Design
The DBN model was developed using the principles described in Section 2.1. For the vibration signals collected from the two sensors, we developed a deep learning model of MDBNs containing two DBNs named DBN1 and DBN2. The number of hidden layers of DBN1 and DBN2, which affects learning performance and computational burden, was set to 2. The number of hidden neurons in the first hidden layer was set to 50, and the number of hidden neurons in the second hidden layer was set to 30. In order to train each DBN of MDBNs, the number of pretraining epochs was set to 100, and the number of fine-tuning epochs was set to 200. The dimension of the output vector of MDBNs was 30, which means that the dimension of the input for the fusion model was 30. In order to construct the fusion model based on random forest, the number of decision trees was set to 10.

Results of the Proposed Model of MDBNF
The well-trained diagnostic model of MDBNF was used for main reducer fault diagnosis. Using a testing set of 350 samples collected from each sensor and corresponding to the seven condition patterns, 342 samples were correctly classified, which means that the classification accuracy of the proposed diagnostic model is 97.72%. Additionally, the classification accuracy for the different fault patterns is shown in Table 2. Table 2 indicates that the proposed diagnostic model can effectively diagnose the seven condition patterns of the main reducer, and identifies the normal status with a classification accuracy of 100%.
A set of detailed diagnostic results of MDBNF in one trial is shown in Figure 8 for intuitive presentation. Figure 8 shows that the vast majority of samples in the testing set are correctly classified by the proposed MDBNF model.
Symmetry 2020, 12, x FOR PEER REVIEW 11 of 16

[Table 2. Columns: Fault Patterns; Diagnostic Result (C1, C2, C3, C4, C5, C6, C7); Accuracy.]
The advantage of the proposed diagnostic model is that it employs vibration signals collected from multiple sensors, which significantly enhances the reliability and sensitivity of the relationship between the captured symptoms and the fault patterns. With the multichannel data, a fusion model based on random forest was used to fuse the multiple modalities features extracted from these multichannel data.

Principal Component Analysis of the Deep Representative Features
To verify the ability of MDBNs for deep learning of fault-sensitive features, we utilized principal component analysis (PCA) to visualize the deep representative features obtained from MDBNs. The dimension of the output vector of MDBNs is 30. PCA was carried out on the 30 features of each sample, and the top three principal components of the feature vectors of the testing set are shown in Figure 9.
As shown in Figure 9, most points corresponding to the same fault pattern are clustered together, and points corresponding to different fault patterns are mostly separated. Because only the top three principal components of the feature vector are plotted for visualization, some useful information is discarded. Although the points obtained by PCA cannot be used directly to achieve superior classification accuracy for the fault patterns, the entire deep representative feature vector obtained from MDBNs is fault-sensitive and well suited to classification. The result reveals that the proposed MDBNs can adaptively mine the deep representative characteristics of the main reducer.

Effectiveness of the Fusion Model
To validate the effectiveness of the fusion model for multichannel data in the proposed diagnostic model, we conducted a comparison experiment on a typical testing sample composed of $x^{1}_{268}(t)$ and $x^{2}_{268}(t)$, collected from the two sensors. The experiment covers three situations. In the first, the deep representative features $p_1$ of the testing sample $x^{1}_{268}(t)$ outputted from DBN1 are used directly as the input of the classifier. In the second, the deep representative features $p_2$ of the testing sample $x^{2}_{268}(t)$ outputted from DBN2 are used directly as the input of the classifier. In the third, the fusion model based on random forest fuses the output of the entire MDBNs, expressed as $(p_1, p_2)$, and the final diagnostic result is obtained by majority voting. The outputs of the three situations and the comparison of classification accuracy on the testing set are shown in Figure 10.
According to Figure 10a-c, the diagnostic results obtained by directly using the deep representative features $p_1$ and $p_2$ of the testing samples $x^{1}_{268}(t)$ and $x^{2}_{268}(t)$ are incorrect, namely C5 and C3. However, the output of the fusion model for $(p_1, p_2)$ reflects the actual fault pattern of the testing sample, namely C6. Without fusion of multichannel data, the outputs of the classifier fed with the deep representative features $p_1$ and $p_2$ from DBN1 and DBN2 are inferior to the output of the fusion model. The results reveal that vibration measurements from a single sensor may result in wrong fault patterns, whereas the proposed diagnostic model with data fusion can lead to correct classification results. As shown in Figure 10d, without data fusion, the classification accuracies of the testing sets collected from sensor 1 and sensor 2 are 88.35% and 88.73%, respectively. The comparison indicates that the multichannel data fusion of the proposed diagnostic model can effectively improve the relevance between the collected vibration signals and the fault patterns, enhancing the diagnostic performance by nearly 8%.
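The two ingredients of the third situation, forming the fused input $(p_1, p_2)$ and taking a majority vote over the trees, can be illustrated with a minimal sketch. The feature values and the per-tree votes below are hypothetical and chosen only for illustration; the function names are assumptions and are not taken from the original implementation.

```python
from collections import Counter

def fuse_features(p1, p2):
    """Concatenate the feature vectors from DBN1 and DBN2 into one input (p1, p2)."""
    return list(p1) + list(p2)

def majority_vote(tree_votes):
    """Return the class label predicted by the largest number of trees."""
    label, _count = Counter(tree_votes).most_common(1)[0]
    return label

# Hypothetical deep features of one testing sample from the two channels.
p1 = [0.12, 0.80, 0.33]
p2 = [0.57, 0.05]
fused = fuse_features(p1, p2)

# Hypothetical votes cast by seven trees for the fused input.
votes = ["C6", "C5", "C6", "C3", "C6", "C6", "C5"]
predicted = majority_vote(votes)
print(len(fused), predicted)  # prints: 5 C6
```

In this toy case some individual trees still vote for C5 or C3, but the vote over the whole forest returns C6.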

Comparison of Different Diagnostic Models
To validate the superiority of the proposed diagnostic model based on MDBNs and multichannel data fusion, we carried out a set of experiments comparing it with several state-of-the-art fault diagnosis methods on the same sample set.
The compared diagnostic models include: (1) the proposed diagnostic model of MDBNF, (2) a diagnostic model with MDBNs structure and KNN fusion, (3) a diagnostic model with MDBNs structure and support vector classification (SVC) fusion, (4) a diagnostic model with deep learning of single sensory data and without data fusion, (5) a diagnostic model with shallow learning of representative features of multichannel data, and (6) a diagnostic model with shallow learning of representative features of single sensory data. Twenty trials were carried out for each model. The average classification accuracies of all compared diagnostic models are shown in Table 3.
From Table 3, we can conclude as follows:
(1) With the same deep learning architecture of MDBNs, the average classification accuracies of the diagnostic models that use KNN and SVC to fuse the deep representative features of multichannel data reach 94.63% and 95.79%, respectively. However, the performance of these two models is still inferior to that of the proposed MDBNF model, which uses random forest fusion for multichannel data and reaches an accuracy of 97.72%. This indicates that random forest fusion with a majority voting strategy is better than the simple classification strategies of KNN and SVC. MDBNF can fuse the deep features outputted from the DBNs, each fed with one channel of the multichannel data, to obtain the final result in a higher layer.
(2) Without using multiple sensors to collect vibration signals, the accuracy of the DBN model with deep learning of single sensory data is 88.58%, which is not ideal and is inferior to the models with multichannel data fusion. This indicates that sensory data in different channels may contain different failure-sensitive characteristics. Therefore, multichannel data can exhibit more complete characteristic information of the main reducer.
(3) The classification accuracies of the SVM-based models with shallow learning of representative features are the worst, namely 73.16% and 74.37%, no matter which kind of data is selected. This indicates that the deep learning architecture indeed extracts more fault-sensitive features of the main reducer than shallow learning and can effectively establish non-linear relationships between vibration measurements and fault patterns of the main reducer.
(4) Compared with all the peer models, the proposed diagnostic model of MDBNF with multichannel data performs best for main reducer fault diagnosis. This indicates that multichannel data and a deep learning architecture can improve the reliability and accuracy of fault diagnosis.
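The margins implied by the accuracies quoted above can be checked with a few lines of arithmetic. The sketch below uses only the four average accuracies that the text attributes to a specific model; the dictionary keys are informal labels introduced here, and the two SVM-based figures are left out because the text does not state which of them belongs to which data type.

```python
# Average classification accuracies (%) quoted from Table 3 in the text.
reported = {
    "MDBNF (random forest fusion)": 97.72,
    "MDBNs + SVC fusion": 95.79,
    "MDBNs + KNN fusion": 94.63,
    "DBN, single sensor": 88.58,
}

proposed_name = "MDBNF (random forest fusion)"
proposed = reported[proposed_name]

# Gap, in percentage points, between the proposed model and each alternative.
margins = {
    name: round(proposed - accuracy, 2)
    for name, accuracy in reported.items()
    if name != proposed_name
}

for name, margin in margins.items():
    print(f"{name}: {margin:.2f} points below MDBNF")
```

The gaps are expressed in percentage points of average accuracy over the twenty trials, not as relative improvements.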

Conclusions
In this paper, a multichannel data fusion method based on deep belief networks and random forest fusion is proposed for fault diagnosis of the main reducer. Collecting vibration signals with a single sensor is limited by the location and orientation of the sensor installation. To tackle this issue, multiple channels of data were collected with multiple sensors to capture more complete feature information. Then, the structure of multiple deep belief networks (MDBNs) was constructed to learn deep representative features from the multichannel data. Finally, an MDBNF model was established, in which a random forest fuses the features from the MDBNs to obtain the final diagnosis result. To confirm the effectiveness and efficiency of the proposed diagnostic model, we carried out a set of comparison experiments on a main reducer fault test rig. The experimental results verify the effectiveness of the proposed method, which achieves the best testing accuracy among all the compared methods.
The method proposed in this paper is effective, but it has been tested with only a limited number of sensors, sensor types and operating conditions. Our future work will focus on testing the MDBNF diagnostic model with more sensors, more sensor types and more operating conditions. Furthermore, to support reasonable maintenance proposals, it is worthwhile to investigate more effective approaches that can correctly diagnose the severity of a fault pattern after fault recognition.