A Bearing Fault Diagnosis Method Using Multi-Branch Deep Neural Network

Abstract: Feature extraction from a signal is the most important step in signal-based fault diagnosis. Deep learning, or the deep neural network (DNN), is an effective method to extract features from signals. In this paper, a novel vibration signal-based bearing fault diagnosis method using DNN is proposed. First, the measured vibration signals are transformed into a new data form called multiple-domain image-representation. By this transformation, the task of signal-based fault diagnosis is transferred into the task of image classification. After that, a DNN with a multi-branch structure is proposed to handle the multiple-domain image-representation data. The multi-branch structure of the proposed DNN helps to extract features in multiple domains simultaneously, which leads to better feature extraction and, in turn, to better fault diagnosis performance. The effectiveness of the proposed method was verified via experiments conducted with actual bearing fault signals and comparisons with well-established published methods.


Introduction
Rolling element bearings are the most important components in rotary machines. The health condition of bearings has a profound effect on the performance of the machines. According to a literature review, about 40% of failure cases in large machinery systems and 90% in small rotary machines are caused by bearing defects [1]. In industry, bearing faults can lead to massive losses of time and money; therefore, early detection of bearing faults is a critical task.
Intelligent signal-based fault diagnosis is the most popular approach in machine health monitoring. The signal types used for diagnosing can be vibration signal [2,3], acoustic emission signal [4], or current signal [5,6]. Among those types of signals, vibration signal is exploited most extensively because vibration signal is easy to measure and can provide highly accurate information about the bearing health condition [7]. The fault diagnosis performance of the signal-based approach highly depends on the procedure of feature extraction in which discriminating features are extracted from vibration signals. After extracting the fault features from the fault signals, an intelligent decision-maker based on machine learning algorithms is exploited to determine the type of fault occurring.
Traditionally, feature extraction exploits signal processing techniques to extract information from the fault signal in the time domain, frequency domain, and time-frequency domain [8]. This traditional approach has some disadvantages. First, the diagnostic accuracy depends on the signal processing technique and requires expert knowledge [9]. Because of this expert knowledge requirement, it is difficult to propose a generalized framework for feature extraction. It means that features used to make [...]
The effectiveness of the proposed method is verified through experiments with actual bearing fault data supplied by Case Western Reserve University [29]. The major contributions of this paper are summarized as follows:
1. Proposing a method to represent vibration signals in high-dimensional form;
2. Proposing an MB-DNN with a multi-branch structure to handle the new representation of the vibration signal in the high-dimensional domain;
3. Transforming the task of signal-based fault diagnosis into the task of image classification;
4. Achieving significant classification accuracy with the proposed fault diagnosis method.
The remainder of the paper is organized as follows. Section 2 describes how the multiple-domain image-representation (MDIR) data are constructed from vibration signals. Section 3 explains the multi-branch DNN (MB-DNN) and its application in bearing fault diagnosis. Experiments with actual bearing data are presented in Section 4. Section 5 concludes the paper.

Multiple-Domain Image-Representation of Vibration Signal
Originally, a vibration signal is time-series data, which is a 1-D data form. A new method to represent vibration signals in high dimensions is proposed. It is motivated by the following three facts:
1. It may be easier to understand and mine information in high-dimensional data [30].
2. CNN and its variants are suitable for the task of recognizing two-dimensional visual patterns [31].
3. By transforming the signal into visual data, the task of fault diagnosis can be converted into the task of image classification.
The vibration signal is transformed into a time-domain image by a simple method proposed by D. Nguyen et al. in [32]. Consider a signal $x_l$ with $n^2$ samples. All the samples in the signal are rearranged into a square matrix $X_l$ of size $n \times n$ as follows:

$$ X_l = \begin{bmatrix} x_l(1) & x_l(2) & \cdots & x_l(n) \\ x_l(n+1) & x_l(n+2) & \cdots & x_l(2n) \\ \vdots & \vdots & \ddots & \vdots \\ x_l((n-1)n+1) & x_l((n-1)n+2) & \cdots & x_l(n^2) \end{bmatrix} \tag{1} $$

Then the obtained square matrix is normalized to the range [0.0, 1.0] by linear normalization:

$$ I = \frac{X_l - \min(X_l)}{\max(X_l) - \min(X_l)} \tag{2} $$

The normalized sample $I_{ij}$, $i, j \in \{1, \dots, n\}$, is placed at row $i$, column $j$ of the matrix. The corresponding time-domain image consists of pixels that are the normalized samples of the corresponding signal, as shown in Figure 1.
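The time-domain transformation described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of [32]: the function name `time_domain_image`, the row-by-row fill order, and the treatment of a constant signal are assumptions made here.

```python
def time_domain_image(samples, n):
    """Rearrange n*n samples into an n x n matrix scaled linearly to [0, 1].

    Assumption: the samples fill the matrix row by row.
    """
    if len(samples) != n * n:
        raise ValueError("expected exactly n * n samples")
    lo, hi = min(samples), max(samples)
    span = hi - lo
    if span == 0:
        # Assumption: a constant signal is mapped to an all-zero image.
        return [[0.0] * n for _ in range(n)]
    return [[(samples[r * n + c] - lo) / span for c in range(n)]
            for r in range(n)]


# Toy signal with four samples, rearranged into a 2 x 2 image.
image = time_domain_image([0.5, -1.0, 2.0, 1.0], n=2)
```

The smallest sample maps to pixel value 0 and the largest to 1, so the image uses the full gray-scale range regardless of the signal amplitude.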
The time-frequency image representation is based on the continuous wavelet transform (CWT). The CWT uses the same length of time data as in the previous transformation. A mother wavelet is a function $\psi(t)$ with zero average (i.e., $\int_{\mathbb{R}} \psi(t)\,dt = 0$), normalized (i.e., $\|\psi\| = 1$), and centered in the neighborhood of $t = 0$ [33]. Scaling $\psi(t)$ by a positive quantity $s$ and translating it by $u \in \mathbb{R}$, a wavelet family can be defined as:

$$ \psi_{u,s}(t) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t-u}{s}\right) \tag{3} $$

Given $x(t) \in L^2(\mathbb{R})$, the continuous wavelet transform of $x(t)$ at time $u$ and scale $s$ (which inversely relates to frequency) is defined as:

$$ W x(u,s) = \int_{-\infty}^{+\infty} x(t)\,\psi^{*}_{u,s}(t)\,dt \tag{4} $$

where $\psi^{*}$ denotes the complex conjugate of $\psi$. CWT decomposes the input signal $x(t)$ into a series of wavelet coefficients. The scalogram of $x(t)$ is defined by the function:

$$ S(s) = \|W x(u,s)\| = \left( \int_{-\infty}^{+\infty} |W x(u,s)|^2\,du \right)^{1/2} \tag{5} $$

If a time interval $[t_0, t_1]$ needs to be considered, the corresponding windowed scalogram is defined by the function:

$$ S_{[t_0,t_1]}(s) = \left( \int_{t_0}^{t_1} |W x(u,s)|^2\,du \right)^{1/2} \tag{6} $$

In other words, the scalogram is the absolute value of the CWT of a signal, plotted as a function of time and frequency, as shown in Figure 2.
Each vibration signal sample is thus represented by both a time-domain image and a time-frequency-domain image, as shown in Figure 3. This representation of the vibration signal is named multiple-domain image-representation (MDIR). By using MDIR data, the problem of fault diagnosis based on vibration signals can now be considered as a task of image classification.
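To make the scalogram construction concrete, the sketch below discretizes the CWT integral directly with unit sample spacing and takes the absolute value of the coefficients. The complex Morlet wavelet, its parameter `omega0`, and all function names are assumptions chosen for brevity; the paper uses a Morse wavelet and does not specify an implementation. The direct sum costs O(n²) per scale, so it is meant for illustration only.

```python
import cmath
import math


def morlet(t, omega0=6.0):
    """Complex Morlet mother wavelet (an assumption; the paper uses a Morse wavelet)."""
    return math.pi ** -0.25 * cmath.exp(1j * omega0 * t) * math.exp(-0.5 * t * t)


def cwt(signal, scales, omega0=6.0):
    """Return coeffs[k][u]: the CWT of `signal` at scale scales[k] and time index u."""
    n = len(signal)
    coeffs = []
    for s in scales:
        row = []
        for u in range(n):
            acc = 0j
            for t in range(n):
                # Discretized integral of x(t) * conj(psi((t - u) / s)), dt = 1.
                acc += signal[t] * morlet((t - u) / s, omega0).conjugate()
            row.append(acc / math.sqrt(s))
        coeffs.append(row)
    return coeffs


def scalogram_image(coeffs):
    """Absolute value of the CWT: one row per scale, one column per time step."""
    return [[abs(c) for c in row] for row in coeffs]


# A cosine at 0.75 rad/sample should respond most strongly near scale
# omega0 / 0.75 = 8, i.e. the third entry of `scales`.
signal = [math.cos(0.75 * t) for t in range(128)]
scales = [2.0, 4.0, 8.0, 16.0]
image = scalogram_image(cwt(signal, scales))
```

In practice a library routine with an FFT-based implementation would replace the triple loop; the structure of the result (scales along one axis, time along the other) stays the same.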

Proposed Multi-Branch Deep Neural Network
In the previous section, the multiple-domain image-representation (MDIR) of vibration signal has been described. To handle the MDIR data, a multi-branch deep neural network (MB-DNN) is proposed. The structure of MB-DNN is shown in Figure 4. The proposed MB-DNN consists of several types of layers, including convolutional layer, batch normalization layer, pooling layer, feature fusing layer, dense layers, and SoftMax layer.
The convolutional layer, based on the convolution operation, is the most important layer type and has been employed in many well-known DNN models such as LeNet [34], AlexNet [35], VGGNet [36], ResNet [37], and DenseNet [38]. The convolutional layer convolves the input with its kernels and feeds the result into the activation function to generate the output. Consider a convolutional layer with $m$ kernels whose input has $M$ feature maps. The output of that layer can be calculated as follows:

$$ y_j = f\!\left( \sum_{i=1}^{M} x_i * k_{ij} + b_j \right), \quad j = 1, \dots, m \tag{7} $$

where $x_i$ is the $i$-th input feature map, $k_{ij}$ and $b_j$ are the kernel and bias associated with the $j$-th output feature map $y_j$, $*$ denotes the convolution operation, and $f$ denotes the activation function. In this paper, the Rectified Linear Unit (ReLU) is used as the activation function since it is simple and easy to compute.

The batch normalization layer exploits the batch normalization technique proposed in [39] to improve the training process of DNNs. Batch normalization ensures that the transformation inserted in the network can represent the identity transform. The pooling layer reduces the dimension of the input feature maps. The dimension reduction can be conducted by a max or average operation: the pooling layer computes the maximum or average value of a group of neurons in the previous layer.
For convenience in naming and ease of drawing the neural network structure, three successive layers (one convolutional layer, one batch normalization layer, and one pooling layer) are grouped to construct a convolutional-batch normalization-pooling (CBP) module. Assume a CBP module whose convolutional layer has $m$ kernels. With an input feature map $x$ of size $M \times n \times n$, the output of the CBP module $y$ has size $m \times h \times h$, where $h = n/2$.
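The shape bookkeeping of a CBP module can be checked with a small pure-Python sketch. It assumes 3 × 3 kernels with zero padding, ReLU activation, and 2 × 2 max pooling; batch normalization is left out because it does not change the shape of the data. The function names and the nested-list data layout are hypothetical, and the loop computes cross-correlation, which is how deep learning frameworks usually implement convolution.

```python
def conv3x3_relu(x, kernels, biases):
    """x: M maps of size n x n; kernels[j][i] is the 3x3 kernel from input map i to output map j."""
    n = len(x[0])
    out = []
    for kernel_set, b in zip(kernels, biases):
        fmap = [[b] * n for _ in range(n)]
        for x_i, k in zip(x, kernel_set):
            for r in range(n):
                for c in range(n):
                    for dr in (-1, 0, 1):
                        for dc in (-1, 0, 1):
                            rr, cc = r + dr, c + dc
                            # Positions outside the map count as zero (zero padding).
                            if 0 <= rr < n and 0 <= cc < n:
                                fmap[r][c] += x_i[rr][cc] * k[dr + 1][dc + 1]
        out.append([[max(0.0, v) for v in row] for row in fmap])  # ReLU
    return out


def max_pool_2x2(x):
    """Halve each spatial dimension by taking the maximum of every 2 x 2 block."""
    h = len(x[0]) // 2
    return [[[max(f[2 * r][2 * c], f[2 * r][2 * c + 1],
                  f[2 * r + 1][2 * c], f[2 * r + 1][2 * c + 1])
              for c in range(h)] for r in range(h)] for f in x]


def cbp_module(x, kernels, biases):
    """Convolution + ReLU followed by pooling (batch normalization omitted here)."""
    return max_pool_2x2(conv3x3_relu(x, kernels, biases))


# One 4 x 4 input map (M = 1) and two kernels (m = 2) give a 2 x 2 x 2 output.
x = [[[1.0] * 4 for _ in range(4)]]
kernels = [[[[1.0] * 3 for _ in range(3)]] for _ in range(2)]
y = cbp_module(x, kernels, biases=[0.0, 0.0])
```

With an input of size M × n × n and m kernels, the output has m maps of size n/2 × n/2, which matches the relation h = n/2 stated in the text.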
As shown in Figure 4, the proposed MB-DNN has two branches. Branch I handles the time-domain images, while branch II handles the time-frequency-domain images of the vibration signals. Each branch consists of several CBP modules. The two branches simultaneously extract features from the input MDIR data and generate two types of feature maps. After that, the feature fusing layer fuses these feature maps into a single one and forwards it to the next part of the network. The operation of the feature fusing layer is described as follows:

$$ x = C(x_t, x_f) \tag{8} $$

where $C$ denotes the fusing operation of the layer, and $x_t$, $x_f$ are the feature maps extracted by branch I and II, respectively. Assume that $x_t$ and $x_f$ have sizes $a_1 \times b_1 \times b_1$ and $a_2 \times b_2 \times b_2$, respectively. First, the two feature maps are flattened to have sizes $1 \times a_1 b_1^2$ and $1 \times a_2 b_2^2$, respectively. The output of this layer $x$ then has size $1 \times (a_1 b_1^2 + a_2 b_2^2)$.
The dense layer is a traditional perceptron neural network. The purpose of using a dense layer is to collect all features from the previous feature map. The feature map generated by this layer is used for the classification task, which is performed by the SoftMax layer. With an input feature map $x$, the SoftMax layer computes the probabilities:

$$ p_i = \frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}}, \quad i = 1, \dots, N \tag{9} $$

where $N$ is the number of classes in the classification task. The SoftMax function calculates the probability of each target class over all target classes. The loss function of MB-DNN is the cross-entropy loss:

$$ L = -\sum_{i=1}^{N} q_i \log p_i \tag{10} $$

where $q$ is the true label of the input data and $p$ is the output of the SoftMax function.
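The fusing, SoftMax, and loss steps can be sketched as follows. The sketch assumes that the fusing operation is a flatten-and-concatenate step, which is consistent with the fused size of 1 × 7056 reported later for branch outputs of 16 × 7 × 7 and 128 × 7 × 7. The function names are hypothetical, and the maximum is subtracted inside `softmax` only for numerical stability.

```python
import math


def flatten(maps):
    """Flatten a stack of 2-D feature maps into one list."""
    return [v for fmap in maps for row in fmap for v in row]


def fuse(x_t, x_f):
    """Assumed fusing operation: flatten both branch outputs and concatenate them."""
    return flatten(x_t) + flatten(x_f)


def softmax(x):
    """Convert scores into class probabilities that sum to one."""
    peak = max(x)
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(q, p):
    """Cross-entropy between a one-hot label q and predicted probabilities p."""
    return -sum(q_i * math.log(p_i) for q_i, p_i in zip(q, p) if q_i > 0)


# Branch outputs with the sizes reported in the paper: 16 x 7 x 7 and 128 x 7 x 7.
x_t = [[[0.0] * 7 for _ in range(7)] for _ in range(16)]
x_f = [[[0.0] * 7 for _ in range(7)] for _ in range(128)]
fused = fuse(x_t, x_f)

probs = softmax([1.0, 2.0, 3.0])
loss = cross_entropy([0, 0, 1], probs)
```

Because the label is one-hot, the loss reduces to the negative log-probability assigned to the true class.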
Using the MDIR data and MB-DNN, the proposed fault diagnosis method is illustrated in Figure 5. As shown in the diagram, the original fault signals are transformed into MDIR data. Then the obtained dataset is split into a training set and a testing set. The MB-DNN is trained with the training set by the back-propagation algorithm. Using stochastic gradient descent with momentum, the weights of MB-DNN are updated by the equations

$$ v_{k+1} = \beta v_k - \alpha \nabla L(w_k) \tag{11} $$

$$ w_{k+1} = w_k + v_{k+1} \tag{12} $$

where $w$ denotes the weights, $v$ the accumulated update (velocity), $\nabla L(w)$ the gradient of the loss with respect to the weights, $\alpha$ the learning rate, and $\beta$ the momentum parameter.
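A minimal sketch of one momentum update is given below. It assumes the classical momentum form, in which a velocity term accumulates past gradients; the paper's exact formulation may differ. The function name is hypothetical, the symbol β for the momentum follows the training settings stated later in the paper, and the default values are the ones reported there (learning rate 0.001, momentum 0.9).

```python
def sgd_momentum_step(weights, velocity, grads, lr=0.001, beta=0.9):
    """One update of stochastic gradient descent with (classical) momentum."""
    new_velocity = [beta * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy problem: minimize L(w) = 0.5 * w^2, whose gradient is simply w.
# A larger learning rate is used here only to keep the demonstration short.
w, v = [1.0], [0.0]
for _ in range(200):
    w, v = sgd_momentum_step(w, v, grads=w, lr=0.1)
```

On this toy problem the weight oscillates around the minimum at zero with a shrinking amplitude, which is the typical behavior of momentum on a quadratic loss.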

Data Preparation
The actual bearing fault data are supplied by the Bearing Data Center of Case Western Reserve University (CWRU) [40]. The bearing fault testbed is shown in Figure 6. The testbed consists of a 2-hp motor (left), a torque transducer (center), and a dynamometer (right). The test bearings support the motor shaft and were seeded with faults using electro-discharge machining (EDM). The vibration signals in this paper were measured by an accelerometer and digitized with a sampling frequency of 12 kHz. The accelerometer is installed at the drive end with a magnetic base. Four types of bearing conditions are considered: the no-fault condition, bearing with inner race fault, bearing with outer race fault, and bearing with fault at the rolling elements (ball fault). Each type of bearing fault is introduced to the test bearings with different defect sizes: 7 mils (milli-inches), 14 mils, and 21 mils. The types of bearing conditions with different fault diameters are shown in Table 1 and Figure 7. There are 10 types of bearing conditions, labeled from 0 to 9. The bearing testbed can be operated under different load conditions. In this paper, four load conditions were considered: 0 hp, 1 hp, 2 hp, and 3 hp.

Signal Pre-Processing
In the data source supplied by CWRU, the fault signal of each bearing condition is measured and stored in a single MATLAB file; thus, there are 10 different signal files corresponding to the 10 bearing conditions in Table 1. In the intelligent fault diagnosis approach, classifiers require data samples to be trained; therefore, the original signal files are split into equal-length signal samples. Each signal sample must contain enough sampling points to convey the information of the bearing status; if the length is too short, the signal sample cannot reflect the bearing health status. Normally, the sample length is selected to be equal to one revolution of the rotary shaft. In this work, the rotary speed of the shaft is S = 1796 rpm. Accordingly, the rotary frequency is S_f = 1796/60 ≈ 30 Hz. In the CWRU testbed, the sampling frequency is F = 12,000 Hz. The minimum value of the sample length is F/S_f = 400 sampling points. As mentioned in Section 3, the proposed method includes a step in which signal samples are transformed into gray images and classified by a CNN-like neural network. We therefore aim to make an MNIST-like data set in which each image has a size of 1 × 28 × 28, so the length of the signal samples is set to 28 × 28 = 784. The time-domain image has a size of 1 × 28 × 28.
The time-frequency domain image conversion exploits CWT using the Morse wavelet function. To deal with the time-frequency domain image, the obtained time-frequency images are scaled to a size of 3 × 224 × 224. The MDIR data of the vibration signals with ten labels were obtained as shown in Figure 8.
From each original signal file, 300 data samples were obtained. Accordingly, the 10-class classification task is balanced since each class has 300 data samples. For each label, the image data were split randomly with a ratio of 7:3 into the training set and the test set.
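The segmentation and the per-class 7:3 split can be sketched as follows. The text does not state whether consecutive signal samples overlap, so the `stride` parameter is an assumption, as are the function names and the fixed random seed.

```python
import random


def segment(signal, length=784, stride=784):
    """Cut a long signal into windows of `length` points taken every `stride` points."""
    last_start = len(signal) - length
    return [signal[i:i + length] for i in range(0, last_start + 1, stride)]


def split_per_class(samples, train_ratio=0.7, seed=0):
    """Randomly split the samples of one class into a training and a test set."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    cut = round(train_ratio * len(samples))
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test


# Synthetic stand-in for one signal file, long enough for 300 windows of 784 points.
signal = list(range(784 * 300))
samples = segment(signal)
train_set, test_set = split_per_class(samples)
```

Splitting each class separately keeps the training and test sets balanced, with 210 and 90 samples per class for the 300 samples described above.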

Design and Train the Proposed DNN
The structure of the proposed MB-DNN is designed as follows. First, the number of branches is two, corresponding to the two types of input images. The input sizes of the branches must match the input images; therefore, the first branch has an input size of 1 × 28 × 28 and the second branch has an input size of 3 × 224 × 224. The kernel sizes of the convolutional layers and pooling layers are 3 × 3 and 2 × 2, respectively. The number of kernels in each convolutional layer is selected by a simple rule: start with a small number of kernels in the first layer and double that number in each subsequent layer. The first convolutional layer has 8 kernels, the second has 16, the third has 32, and so on. Each CBP module consists of one convolutional layer, one batch normalization layer, and one pooling layer. Since zero padding is used in the convolutional layers and the kernel size of the pooling layers is 2 × 2, the size of the data decreases by a factor of 2 after each CBP module. The number of CBP modules is increased by one until the size of the output data is an odd number. In the first branch, two CBP modules are used; in the second branch, five CBP modules are used.
The proposed MB-DNN has the configuration shown in Figure 9. Branch I consists of two CBP modules that handle the time-domain images. Branch II uses five CBP modules to handle the time-frequency-domain images. The output of branch I has a size of 16 × 7 × 7, and the output of branch II has a size of 128 × 7 × 7. These two outputs are fed into the feature fusing layer to generate output data with a size of 1 × 7056. Then three successive dense layers are used to learn from the fused feature map. Finally, a SoftMax layer with ten outputs classifies the feature map generated by the third dense layer.
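The design rule above (halve the spatial size and double the kernel count until the size becomes odd) can be verified with a short sketch. The function name `cbp_plan` is hypothetical; the starting value of 8 kernels and the input sizes 28 and 224 are taken from the text.

```python
def cbp_plan(input_size, first_kernels=8):
    """List the output shape (kernels, height, width) of each CBP module in a branch."""
    if input_size < 1:
        raise ValueError("input_size must be positive")
    shapes = []
    size, kernels = input_size, first_kernels
    while size % 2 == 0:
        size //= 2          # 2 x 2 pooling halves the spatial size
        shapes.append((kernels, size, size))
        kernels *= 2        # the next convolutional layer doubles the kernel count
    return shapes


branch_1 = cbp_plan(28)     # time-domain images
branch_2 = cbp_plan(224)    # time-frequency-domain images

# Length of the fused feature vector after flattening and concatenation.
fused_length = sum(k * h * w for k, h, w in (branch_1[-1], branch_2[-1]))
```

The rule reproduces the numbers in the text: two modules ending at 16 × 7 × 7 for branch I, five modules ending at 128 × 7 × 7 for branch II, and a fused length of 16 · 49 + 128 · 49 = 7056.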

The MB-DNN is trained by the mini-batch stochastic gradient descent with momentum algorithm, with learning rate α = 0.001, momentum β = 0.9, and batch size B = 10.

Fault Diagnosis Result
Four other bearing fault diagnosis methods are compared with the proposed method. The first method is published in [25]. In this method, the deep transfer learning technique is utilized to transfer a very deep neural network (AlexNet), pre-trained in the image classification domain, into the domain of bearing fault diagnosis. The vibration signals are transformed into image form by using CWT. The second method is published in [24]. This method uses LeNet-5, which is a classical type of CNN introduced in [41]. In this method, each sample of the vibration signal is rearranged into a square matrix. The third method is published in [26]. This method utilizes the 1-D form of CNN, which can process the vibration signal samples directly in 1-D form without any transformation. The fourth method is published in [42]. As in [26], the vibration signals in [42] are used directly in 1-D form without any transformation. However, the number of convolution kernels of the CNN is decreased with the reduction in the convolution kernel size.
The accuracy of all methods is shown in Figure 10. It can be observed that under all load conditions, the proposed method and the AlexNet-based method in [25] achieve the best performance. The two methods have high mean accuracy and small standard deviation. The LeNet-5-based method in [24] has slightly lower accuracy. The 1-D CNN-based method in [26] has lower accuracy still, and the 1-D CNN-based method in [42] has the poorest performance.

Evaluation under Noisy Conditions
The bearing data set supplied by CWRU has been extensively employed as a benchmark for evaluating bearing fault diagnosis methods. Recently proposed fault diagnosis methods with advanced signal processing and feature learning techniques can achieve very high accuracy. It is therefore not easy to highlight the performance of newly proposed fault diagnosis methods, and noise signals are often added to the original signals to evaluate them. This helps to evaluate the robustness of fault diagnosis methods under more challenging conditions. In this evaluation scenario, additive white Gaussian noise (AWGN) is added to the original vibration signals as in Figure 11.
The signal-to-noise ratio (SNR) compares the level of the signal with the level of the added Gaussian noise. The SNR of a noisy signal is computed as follows:

$$ \mathrm{SNR} = 10 \log_{10}\!\left( \frac{P_{signal}}{P_{noise}} \right) \tag{13} $$

where $P_{signal}$ and $P_{noise}$ are the power of the signal and the noise, respectively. Nineteen noise levels (in dB) in the range [−8, −7, …, 10] are taken into account. The comparison is illustrated in Figure 12.
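The noise injection can be sketched by solving the SNR definition for the noise power and drawing Gaussian samples with the corresponding standard deviation. The function names, the fixed seed, and the sinusoidal stand-in signal are assumptions for illustration; the paper does not describe how the noise was generated.

```python
import math
import random


def mean_power(values):
    """Average power of a sampled signal."""
    return sum(v * v for v in values) / len(values)


def add_awgn(signal, snr_db, seed=0):
    """Add white Gaussian noise so that the result has approximately the requested SNR (dB)."""
    rng = random.Random(seed)
    # From SNR = 10 * log10(P_signal / P_noise): P_noise = P_signal / 10^(SNR / 10).
    noise_power = mean_power(signal) / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in signal]


def measured_snr_db(signal, noisy):
    """SNR in dB computed from the clean signal and its noisy version."""
    noise = [n - s for s, n in zip(signal, noisy)]
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


# Stand-in for a vibration signal, corrupted at the most severe level used in the paper.
clean = [math.sin(0.1 * t) for t in range(20000)]
noisy = add_awgn(clean, snr_db=-8.0)
achieved = measured_snr_db(clean, noisy)
```

At −8 dB the noise power is about 6.3 times the signal power, which illustrates why fault signatures become hard to extract at the lowest SNR levels.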
Figure 11. Noise signal.
In the previous section, all compared methods achieved very high diagnostic accuracy when the inputs were the original signals. When low-level noise was added to the input signals, with the SNR in the range [0, …, 10], the performance of the methods in [26,42] was significantly reduced, while the performance of the proposed algorithm and the methods in [24,25] was reduced only slightly. However, when high-level noise was added to the input signals, with the SNR in the range [−8, …, −1], the performance of all methods decreased dramatically, because the noise makes it harder to extract fault signatures from the signals. In the worst case (−8 dB), the methods in [24,26,42] fail completely, since their accuracy is under 50%. The trend for all methods is that the higher the noise level, the lower the accuracy of the diagnosis result.
Among all methods, the proposed one achieves the best performance with good robustness against noise. Even under the worst noise case (−8 dB), it achieves an accuracy of 57%. Comparing the structures of all methods, the method in [25] only considers the time-frequency domain features, and the methods in [24,26,42] only consider the time-domain features. The proposed method differs in that it receives features from multiple domains. As a result, it can extract more robust features from signals under noisy conditions. It can therefore be concluded that the proposed network with a multi-branch deep structure extracts fault features more effectively even under severe noise conditions, which leads to better diagnostic performance compared to the other DNNs.

Conclusions
This paper proposed a novel method of bearing fault diagnosis based on vibration signals. By using simple transformation methods, time-series vibration signals are transformed into a high-dimensional data form (MDIR). With this transformation, the task of fault diagnosis becomes the task of image classification. A novel DNN with a multi-branch structure, named MB-DNN, is proposed to handle the MDIR data of the vibration signal. The proposed MB-DNN inherits the advantages of CNN in processing high-dimensional data. In addition, MB-DNN has a multi-branch structure with two branches that can simultaneously extract features from the time domain and the time-frequency domain. The proposed method obtains high classification accuracy and, in particular, remains effective under noisy conditions. The proposed algorithm can be applied as an automatic fault detection process for the early detection of bearing faults; therefore, it helps to reduce the failure rate of machinery and save repair costs. Besides vibration signals, the proposed method can also be applied to current signals and acoustic emission signals.