An Ensemble Convolutional Neural Networks for Bearing Fault Diagnosis Using Multi-Sensor Data

Multi-sensor data fusion is a feasible technique to achieve accurate and robust results in fault diagnosis of rotating machinery under complex conditions. However, the problem of information losses is always ignored during the fusion process. To solve above problem, an ensemble convolutional neural network model is proposed for bearing fault diagnosis. The framework of the proposed model contains three convolutional neural network branches: one multi-channel fusion convolutional neural network branch and two 1-D convolutional neural network branches. The former branch extracts the coupling features based on multi-sensor data and the latter two branches extract the inherent features based on single-sensor data, which can collect comprehensive fault information and reduce information losses. Furthermore, the support vector machine ensemble strategy is employed to fuse the results of multiple branches, which can improve the generalization and robustness of the proposed model. The experiments show that the proposed can obtain more effective and robust results than other methods.


Introduction
Rotating machinery is widely used in modern industry. Due to long-time running under complicated conditions such as high speed, heavy load and strong impact, rotating machinery will inevitably have some faults, which can result in enormous losses and serious casualties [1]. Therefore, the fault diagnosis of rotating machinery is necessary to ensure the safe and efficient operation of machinery [2,3].
Traditional fault diagnosis methods are mainly based on model analysis or signal processing techniques. The model-based diagnosis methods emphasize the deep understanding of the dynamic characteristics of rotating machinery. Immovilli et al. [4] conduct a theoretical analysis based on the vibration and current signals. Kerschen et al. [5] provide extensive reviews on model-based analysis of vibrating systems. These methods usually require the design of the explicit mathematical model to simulate the behavior of the machine, while the development of the mathematical model is almost impossible when dealing with modern machines with very complex structures. The methods based on signal processing techniques often utilize signal models, such as power spectrum [6], high order spectrum [6][7][8], composite spectrum [9][10][11], to directly extract the fault features from the measured the optimization of individual network, which limits the generalization of the model. In addition, the splicing or arranging of massive time-domain data may lead to the problem of over-fitting.
To solve the above problems, a novel model named ensemble convolutional neural networks (ECNN) using multi-sensor data is proposed for bearing fault diagnosis in this manuscript. The proposed model, ECNN, can automatically and effectively extract features for classification that gets rid of the dependence on signal processing techniques and diagnosis experience. Besides, a multi-channel fusion convolutional neural network (MCF-CNN) and two 1-D CNNs are designed to construct ensemble model. The coupling features between multi-sensor data are extracted by the MCF-CNN model and the inherent features of the single-sensor data are extracted by 1-D CNN model, overcoming the problem of information losses. Furthermore, to improve the generalization and robustness of the proposed model, the SVM combination strategy is employed to integrate the result of multiple CNN models. The proposed model uses frequency spectrum as the input of CNN, verified based on two typical rolling bearing fault databases.
In Section 2, a brief theory of CNN is introduced. Section 3 describes the main framework of ECNN in detail. In Section 4, the experimental results are analyzed and discussed based on the CWRU database and the Paderborn University database. Finally, the conclusions and outlook are given in Section 5.

The Standard Convolutional Neural Network
CNN is one of the most prevalent deep learning models in recent years. As shown in Figure 1, CNN processes the input samples through multiple convolutional layers and pooling layers to obtain a series of deep feature maps. Suppose the input of lth convolutional layer is x, which belongs to R A×B and A and B are the dimensions of the input data. Then the output of the convolutional layer can be calculated as follows: where u l j is the linear activation, k l j is a set of kernels of lth convolutional layer, * represents the convolutional operation, x l−1 represents the feature maps in the previous layer l − 1, b l j is the bias vector and f is the Relu activation function. The output u l j is conducted by the max pooling operation: where u l j is the output feature maps of pooling layer and up(·) represents the up-sampling function of max pooling.
After several operations of convolution and pooling, deep feature maps are expanded into fully connected layer: where x l is the fully connected layer, k l is the weight, x l−1 represents the feature vector in the previous layer l − 1, b l is the bias vector. For the output x = {x 1 , x 2 , ..., x I } of last fully connected layer, softmax function achieves the mapping between the fully connected layer and the target output: where y i is the predicted probability belonging to the ith class, x i is the ith output neuron of last fully connected layer.

The Proposed Model
In this manuscript, a novel model, ECNN, is developed for bearing fault diagnosis. The input of the model is the frequency spectrum of the collected signals so that we select the one-dimensional convolutional neural network (1-D CNN) with the one-dimensional filter as the kernel of CNN. It is clearly seen that the framework contains three CNN branches based on two-sensor data. Figure 2 shows the overall framework of the proposed model. The MCF-CNN branch is used to extract the coupling features between the two-sensor data and the two 1-D CNN branches focus on the contribution of the single-sensor data. Thus, more comprehensive fault information can be collected for fault diagnosis, overcoming the problem of information losses. The SVM combination strategy is employed to give the final results by fusing three CNN branches. More details of the proposed model are described in the following subsection.

Multi-Channels Fusion Convolutional Neural Network
As shown in Figure 3, MCF-CNN has two independent channels at the input layer which process multi-channel data separately. After multiple convolution and pooling operations, the deep features from multi-sensor data are fused at the fully connected layer. The main idea is that the two independent channels based on different sensors can extract the coupling features. Then the two channels are fused to enhance the fault information that facilitates classification. Finally, the classification is accomplished at the end of the network. It can be assumed that the MCF-CNN has better performance than the CNN with single-channel input because the multi-channel inputs are simultaneously trained under the same learning framework and the parameters of different channels can be jointly optimized during the training process. Figure 4 shows the fusion layer of MCF-CNN. Each input channel of MCF-CNN processes the input data through the convolution and pooling operations, calculating a number of feature maps. The two channels are set at the same network structure. The number of convolutional layers, the size of the kernel and the activation function are identical. At the end of each channel, the extracted feature maps are expanded into a one-dimensional feature vector, which is flatten layer. Next, the fusion layer combines the 1-D feature vector of the two channels. Denote the two feature vectors as v 1 and v 2 . The dimensions of v 1 and v 2 should be equal, then the fused feature vector z can be obtained where w 1 and w 2 are the fusion weight of the feature vectors v 1 and v 2 , respectively, which are learned during the training process. The dimension of the fusion weight is consistent with the dimension of the feature vector. The relationship can be determined by the two weights, which is beneficial to extract the coupling features.  After the fusion layer, two fully connected layers are used to classify the input signals. The first fully connected layer has hundreds of neurons and the number of neurons in the last fully connected layer corresponds to the categories of classification tasks. The softmax function is used to convert the vector of the last fully connected layer into a probability distribution form.

The Construction of Ensemble Convlutional Neural Networks
The traditional fault diagnosis methods based on multi-sensor data ignore the coupling information between signals. The CNN fault diagnosis models based on multi-sensor data focuses on the coupling information between the signals and ignores the inherent information of the single-channel sensor. These methods all result in information losses. In addition, the performance of CNN fault diagnosis model is often limited due to single network framework. Ensemble learning is a new technique, which uses multiple individual learners and a certain combination strategy to get better results than each individual learner. Recently, a lot of ensemble learning methods have been applied for machinery fault diagnosis. Thus, ensemble learning of ECNN is constructed for fault diagnosis.
To collect more comprehensive fault information, the 1-D CNN model and the MCF-CNN model are used to construct ECNN. Since the databases contain the data from two sensors, two 1-D CNN models and one MCF-CNN model can be obtained. The input of the 1-D CNN is single sensor data, while the input of the MCF-CNN is the data from two sensors which are accepted by the two channel of MCF-CNN respectively.

The Fusion Strategy
The next step is to design a fusion strategy to combine the results of three CNN branches. Among the common fusion strategies, the majority voting is a widely used fusion method for ease of execution. However, the majority voting treats each classifier equally, resulting in poor performance when there are fewer classifiers. Obviously, it is not appropriate to use this method here because there are only three classifiers. The SVM is a popular machine learning method for classification, regression because of the small structural risk. In addition, the SVM model with kernel function can also learn the non-learning relationship of the input data. Thus, The SVM model with RBF kernel is chosen as the ensemble learning algorithm to fuse the results of three CNN branches.
For the nth samples, define the predicted probability that belong to the ith class of the mth CNN as in and m ∈ {1, 2, 3}, which can also be regarded as deep feature representations of the input sample. The deep feature representations of y (1) in , y (2) in and y (3) in are employed as the input of the SVM ensemble learning algorithm. The input can be expressed as follow: where n ∈ N, N represents the number of the train samples. On the training set, we use the deep feature representations y n−train of the nth training sample as the input and the real labelŷ n−train of the samples as the output to learn the SVM ensemble learning algorithm. On the testing set, the deep feature representations y n−test are used as the input of the SVM ensemble learning algorithm and then the predictionỹ n−test of the samples can be obtained. The accuracy of the classifier is can be expressed: whereŷ n−test represents the real label of the n sample on the testing set and I(·) represents the indicator function.

The General Procedure of the Proposed Model
This manuscript develops a new model called ECNN for bearing fault diagnosis. Figure 5 gives the flowchart illustrating of the proposed model and the general procedure can be summarized: Step 1: The signal acquisition devices collect data from multiple sensors.
Step 2: The collected data are divided into training and testing set and the raw signals are divided into a series of segments. The frequency spectrum of the segment is used as the input of the CNN model.
Step 3: The MCF-CNN based on multi-sensor data and two 1-D CNNs based on single-sensor data are designed to construct ECNN based on training set.
Step 4: The SVM ensemble algorithm is employed to combine the results of three CNN branches.
Step 5: Validate the performance of the ECNN based on the testing set.

Discussion of the Proposed Model
Based on the typical model of deep learning, convolutional neural network, the proposed model fuses the multi-sensors data as two level. The advantage of the proposed model is can be summarized as follows: 1.
The proposed model does not require complex mathematical models and does not rely on signal processing techniques and expert experience compared with the traditional fault diagnosis methods.

2.
Compared with intelligent diagnosis methods, the proposed model does not require the process of complex feature extraction and feature selection. Besides, the deep learning framework can effectively extract more useful fault information for classification, which can further improve the diagnostic accuracy. Similarly, the new network structure, MCF-CNN, has the better ability to extract features. 3.
The MCF-CNN model fuses multi-sensor data at the feature level and ECNN fuses the results of three CNN branches at decision level, effectively overcoming the problem of information losses during the fusion process.

Experiments
To demonstrate the effectiveness, the proposed model is tested based on two typical rolling bearing databases. The CNN models are created based on the framework of TensorFlow 1.4. The Nesterov Adam algorithm [49] is used to optimize the CNN models and the learning rate of the optimizer is 0.0005. All the experiments are carried out on a computer with Intel CPU E5-2680 and an NVIDIA Tesla T4 GPU.

Data Processing
The input of current CNN fault diagnosis models is usually a short segment of raw signals and the CNN models based on multi-sensor data fusion often directly combine segments from multiple sensors to long sample or arrange the segments to a 2-D image as the input of CNN models. Obviously, the length of segment has an impact on the performance of the CNN models. If the length of segment is too long, the splicing or arranging of much time domain data can result in over-fitting and the waste of resources and time. If the length is too short, the model appears to be under-fitting, which cannot learn effective fault features.
According to the literature [50], in the collected signals of the rotating machine, information about the fault characteristics often resides in the low frequency components and useless information generally exists in the high frequency components. Thus, traditional fault diagnosis methods usually convert time domain signals to the frequency domain through Fourier Transform (FT) and only the frequency components under 1 kHz are used for fault diagnosis. Figure 6 gives the three steps of data preprocessing. First, the raw signals are divided into several segments with a shift size. Second, each segment is transformed into the frequency domain by Fast Fourier transform (FFT). Finally, the frequency components under 1 kHz are kept and normalized as the input of CNN models.
The min-max normalization strategy is used to map each sample into [0, 1] interval. The equation is as follows: where x represents the sample,x represents the normalized result of x, max(·) and min(·) represent the maximum and minimum function, respectively, and X represents the training set.  As shown in Table 1, ten kinds of fault bearings are collected for fault classification, including different fault types, fault severities and fault orientations. Based on the four different loads, for each fault, bearing data of three loads are randomly selected as training data and the left for testing and ten different datasets are collected. Bearing data of each load has approximately 12,000 points and each segment has 1200 points. Therefore, each dataset contains 3000 training segments and 1000 testing segments, whose low frequency components are used as the training and testing input.

Experiment and Analysis
The excellent performance of the proposed model is proved in comparison with traditional fault diagnosis methods, including SVM, RF and AdaBoost algorithm. The input of traditional methods is the connected frequency spectrum of the two-sensor signals. In addition, a multi-sensor data fusion method based deep convolutional neural network (DCNN) [38] is also chosen for comparison and the raw vibration data from the two sensors is connected as the input of the DCNN model. Based on the deep CNN model, this method can learn features from raw data and optimize a combination of different fusion levels adaptively to satisfy the requirements of any fault diagnosis task.
Ten trials are carried out based on the datasets mentioned in the previous subsection. Figure 8 shows the results of comparison and Table 2 gives the testing average accuracy in detail. It can be seen that traditional fault diagnosis methods are not effective compared with the CNN models. The average accuracy of RF, AdaBoost and SVM is only 69.2%, 68.23% and 77.25%, respectively, which is far from the needs of the industrial application. Comparatively, the average accuracy of DCNN is 86.50%, exhibiting an evident advantage over the traditional methods. However, the drawback is that the standard deviation is large (8.47%), which shows that the generalization of the individual network is limited. Among all the five methods, ECNN has the highest average accuracy (96.78%) and the smallest standard deviation (2.93%).   Figure 9 gives the confusion matrix of different methods for the first trial. The ordinate axis of the confusion matrix represents the actual label of each class and the horizontal axis represents the predicted label. It can be seen from Figure 9a-c that SVM cannot distinguish the class of 2 and 4. The class 2 is the ball fault and class 4 is the out race fault. In addition to the class 2 and 4, RF and Adaboost cannot accurately identify other classes, which explains why the accuracy of the two methods is lower than SVM. From Figure 9d,e, DCNN shows the low testing accuracy only in class 6 and 7, both of two classes has the same fault types and fault orientations. However, ECNN can accurately distinguish each class. Through the above experiments, two conclusions can be made. On the one hand, the proposed model has higher accuracy than traditional methods. The reason is that the CNN models can automatically learn more effective features from the input data, while traditional methods rely heavily on manual feature extraction. For traditional methods, the selection of sensitive features is generally time-consuming, blind and subjective so that diagnosis results are generally poor. On the other hand, compared with the direct decision of DCNN, three CNN branches of ECNN based on different sensors and different networks can make joint decisions, which not only ensures the high diagnostic accuracy but also has good generalization.

Dataset Description
The experimental data is from the Chair of Design and Drive Technology at Paderborn University [48]. As shown in Figure 10, the test rig includes: (1) an electric motor (Hanning-Motor SD4CDu8S-009/425W/Y230V), (2) a torque measurement module, (3) a rolling bearing test module, (4) a flywheel and (5) a load motor (Siemens-Motor 1FT7062-1AF70-1DG1). The test bench is a modular system and the current signals are collected at a sample rate of 64 kHz. Three health conditions of ball bearings are provided: healthy, inner race fault and outer race fault. Both inner and outer race fault in bearings contain two groups of damages: artificial and real damages. Bearings with real damages are obtained by an accelerated lifetime test. The more details of the damages can be obtained in Reference [53]. The test rig can be operated under different operating conditions. Bearings are run at a speed of 1500 r/min with a load torque of 0.1 N·m and a radial force on the bearing of 1000 N. In order to get closer to industrial applications, the dataset in Table 3 of healthy bearings with real damages were collected, including a total of 15 bearings. For each bearing, the current signals of two channels are measured and each channel collects 20 samples. Therefore, there are a total of 100 samples for each health condition and 80 samples of each are randomly selected as training data and the remaining 20 samples for testing. Ten combinations are chosen to test the model. Table 3. Data sets for healthy bearings and bearings with real damages.

Healthy OR Damgae IR Damgae
The sample is the raw current data of 256,000 points, which is sliced into 60 subsamples. Each subsample has 5120 points with a shift size of 1024. The frequency spectrum of these subsamples is the input of the CNN models. During the testing, the subsamples from the same raw data are voted for the final result. The detailed situation of healthy, inner race fault and outer race fault bearings is shown in Tables 4-6. Therefore, a training set of 14,400 subsamples and a test set of 3600 subsamples are obtained.

Experiment and Analysis
The experiments are divided into two parts. The first part is the comparison among the proposed model and the traditional methods mentioned in Section 4.2.2. In the second part, the comparison among ECNN, three CNN branches and a fusion convolutional neural network (FCNN) is carried out. As shown in Figure 11, FCNN is chosen here because it has a similar structure and the same input as MCF-CNN. Through the comparison, we can explain the reasons for the excellent performance of ECNN.
In the first part, ten trials are carried out. Figure 12 shows the comparison of SVM, RF, AdaBoost and ECNN. The testing accuracy is given in Table 7. Consistent with the results based on the CWRU database, with the highest accuracy (98.17%) and the smallest standard deviation (1.74%), ECNN shows apparently better performance than the other four methods, which demonstrates the good robustness of ECNN. However, difference also exists between the results of the two databases that SVM (85.75%) and AdaBoost (87.50%) show higher accuracy than RF (70.5%). As shown in Figure 9, the confusion matrix of different methods is given. From Figure 13a-c, it can be found that the judgment of the healthy condition and inner race damage is the main reason for the different performance of the three methods. In addition, AdaBoost has higher accuracy in outer race damage than the other two methods. From Figure 13d, ECNN has high accuracy of 100% in each class.  In the second part, two current signals are denoted as U and V and define the 1-D CNN based on U and V as CNN_U and CNN_V, respectively. Table 8 and Figure 14 show the results of the five models. The accuracy of CNN_U is 86.42% and CNN_V is 67.25%, while the accuracy of MCF-CNN is 97.75%. Obviously, the performance of MCF-CNN is much better than the two 1-D CNN models, illustrating the advantages of the model based on multi-sensor data fusion. Besides, MCF-CNN also has higher accuracy than FCNN (92.5%), which shows the effectiveness of the new structure of MCF-CNN. Among all the five methods, ECNN has the highest average accuracy and the small standard deviation. From Figure 15, it can be seen that the low accuracy happens in healthy class and inner race class of CNN_V.  The reasons for the above results can be explained as follows. FCNN can be regarded as ECNN without the branch of MCF-CNN, while the performance of ECNN is better than FCNN. Besides, the accuracy of MCF-CNN is sightly lower than ECNN. Thus, it can be inferred that the excellent performance of ECNN is mainly attributed to MCF-CNN and the CNN_U and CNN_V branches slightly improve the accuracy of ECNN.

Visualization
Recently, the CNN models have developed a variety of complex structures, which make the interpretation of the internal mechanism of the CNN model very difficult. However, with the visualization in the invisible layers, some interesting phenomena can help us understand the mechanism [54].
In this manuscript, linear discriminant analysis (LDA) is used to extract two features for visualization. As shown in Figure 16, the visualization of different layers and different networks is given and LD1 and LD2 represent the first two principle components obtained by LDA. Some interesting phenomena can be easily found. First, Figure 16a,b are the visualizations of the input layer of CNN_U and CNN_V. It can be seen that the conditions can not be distinguished at all. Therefore, extracting and selecting sensitive features is difficult, which is the main reason for the poor performance of traditional methods. Second, Figure 16c is the visualization of CNN_U in the first fully connected layers. The healthy condition is easily distinguished from the other two damage conditions, while the two damage conditions can not be distinguished from each other clearly. The visualization of CNN_V in the first fully connected layers is shown in Figure 16d. There are no clear boundaries among the three conditions in that the accuracy of CNN_V is lower than the other branches. Third, Figure 16e is the visualization of MCF-CNN in the first fully connected layers. Obviously, the three conditions can be easily distinguished, which means that MCF-CNN can extract more effective fault features compared to the other two CNNs.

1.
The experiments are carried out based on two typical rolling bearing fault databases: CWRU database and Paderborn University database. The collected data of the former is vibration signals and the latter is current signals. For the two groups of experiments, the proposed model has much higher accuracy and smaller standard deviation compared with the traditional methods, which proves that the proposed model is effective and has better generalization. What is more, the results of DCNN on the two databases are quite different, while the results of ECNN are extremely consistent, demonstrating the good robustness of the proposed model.

2.
The second part of the experiment in case 2 illustrates that the excellent performance of the proposed model is mainly from the contribution of the MCF-CNN branch and the left two branches play slightly auxiliary roles. According to the visualization analysis, MCF-CNN can extract more effective features for fault diagnosis.

3.
In the literature [50], the deep inception net with atrous convolution (ACDIN) is proposed to diagnose real bearing faults by only relying on artificial bearing data sets, which achieves high diagnostic accuracy of 96% on Paderborn University database. Different from ECNN, the input of ACDIN is raw vibration signals from single sensor. In the literature [43], a deep random forest fusion (DRFF) technique by using acoustic emission signals and vibration signals is proposed to address fault diagnosis of gearboxes. Similar to FCNN, two deep Boltzmann machines extract the sensitive features of the two signals separately and RF fuses the deep features for classification. However, DRFF ignores the coupling information between the two signals. In comparison, MCF-CNN can be regarded as a feature-level fusion model to extract the coupling features and ECNN is constructed by three CNN branches that can be considered as a decision-level fusion model to collect information comprehensively. This ensures the excellent performance of the proposed model.

Conclusions
In this manuscript, a novel model ECNN using multi-sensor data is developed for bearing fault diagnosis that get rid of the dependence on signal processing techniques and diagnosis experience. The MCF-CNN model fuses multi-sensor data at the feature level and ECNN fuses the results of three CNN branches at decision level, effectively overcoming the problem of information losses during the fusion process. The proposed model is applied on two typical databases, and the results show that the proposed model has higher accuracy and better generalization than traditional intelligent methods.
The CNN models using multi-sensor data in the field of bearing fault diagnosis are still in the exploratory stage and there are still many problems to be solved, such as the fusion of "heterogeneous" sensors. In the future, we will explore a more efficient fusion method so that these methods can be applied in the modern industry as soon as possible.