Research on Intelligent Fault Diagnosis of Rolling Bearing Based on Improved Deep Residual Network

Abstract: Rolling bearings are among the most fault-prone parts in rotating machinery. To detect faults in time and reduce losses, this paper presents an intelligent diagnosis method for rolling bearings. At present, the deep residual network (ResNet) is one of the most widely used convolutional neural networks (CNNs) and has become a hotspot in fault diagnosis. However, the fully connected layer of the deep residual network has too many training parameters, which lengthens model training and testing time. We therefore propose a new network structure in which global average pooling (GAP) replaces the fully connected layer of the traditional ResNet. This effectively solves the problem of too many parameters in the traditional ResNet model, and data augmentation, dropout, and other deep learning training techniques are used to prevent the model from overfitting. Experiments show that the fault diagnosis accuracy of the improved algorithm reaches 99.83% and that the training time is shortened. In addition, the whole process of rolling bearing fault detection does not require any manual feature extraction, and this "end-to-end" algorithm has good versatility and operability.


Introduction
Hydropower units, large wind power equipment, and other rotating machinery are developing towards high precision. A reliable health monitoring system is key to the steady operation of mechanical equipment [1]. Rolling bearings affect the overall performance of rotating machinery [2], so their fault diagnosis has attracted increasing attention. It can minimize maintenance costs and increase system reliability [3].
Rolling bearing fault diagnosis mainly consists of extracting features from collected vibration signals and then identifying and classifying faults. Fault diagnosis methods based on signal processing have been widely studied. A.H. Zamanian et al. developed a fault diagnosis method for gears based on the Gaussian correlation of vibration signals and wavelet coefficients [4]. Wei Fan and Gaigai Cai put forward a gearbox fault diagnosis method based on sparse representation in a wavelet basis, which is superior to empirical mode decomposition (EMD), especially in transient feature extraction [5]. Kang, M., Kim, J., and Kim, J.-M. proposed a fault diagnosis scheme based on a binary bat algorithm (BBA) that is superior to other dimensionality reduction methods [6]. Su, Z., et al. used supervised extended local tangent space alignment (SE-LTSA) for dimensionality reduction to improve the effectiveness of fault diagnosis in rotating machinery [7]. Yu, D., et al. put forward a new morphological component analysis (MCA) method for the complex fault diagnosis of gearboxes [8]. Dolenc, B., examined a method based on vibration analysis for the diagnosis of distributed bearing faults [9]. An, X., et al. proposed a new vibration analysis method based on adaptive local iterative filtering for a hydropower unit [10]. Hu, A., et al. presented a novel method based on the linear transformation of intrinsic time-scale decomposition (ITD) and cubic spline interpolation. This method decomposes nonstationary vibration signals and was used to identify wind turbine gearbox faults; it can recognize fault conditions even when two or more fractal dimensions are close to each other [11]. From these features, fault detection and classification can be done through various machine learning techniques [12].
A support vector machine (SVM) has been used to detect bearing faults using harmonics of fault-related frequencies from vibration signals [13]. An artificial neural network (ANN) combined with a genetic algorithm has also been applied to bearing faults [14]. The SVM and the ANN are the two most popular intelligent diagnosis methods.
Although traditional intelligent diagnosis methods have made a great contribution to rolling-element bearing fault diagnosis, they still have some shortcomings. For example, feature extraction or selection relies heavily on expert knowledge and extensive human labor [15]. Furthermore, artificial intelligence methods such as SVM and ANN are shallow learning models, and it is difficult for a shallow learning model to learn complex nonlinear relationships effectively [16][17][18]. Hence, deep learning has been investigated in recent years for automatic and effective fault feature learning of rolling-element bearings.
Deep learning is no longer a new concept [19]. At first, deep learning was used in image processing, audio processing, and other related fields, where it achieved great success [20,21]. Researchers have therefore introduced deep learning into fault diagnosis. Wang, X., et al. proposed a new data-driven remaining useful life (RUL) estimation approach for bearings based on a deep spatiotemporal convolutional neural network [22]. Feng, J., et al. applied deep neural networks to overcome the shortcomings of the aforementioned intelligent diagnosis methods [23].
Tra, V., et al. presented a novel method using convolutional neural networks with a stochastic training algorithm for detecting early bearing defects under changeable operating speeds [24]. Xia, M., et al. developed a new approach based on a convolutional neural network (CNN) and multiple sensors for fault diagnosis of rotating machinery [25]. Khan, M.A., et al. proposed a new deep learning model based on a dilated convolutional neural network (D-CNN) for bearing fault detection in induction motors (IMs) [26]. Kumar, A., et al. put forward a method that recognizes faults of a centrifugal pump using an advanced convolutional neural network (ACNN) and acoustic images [27]. Shao, Y., et al. combined a support vector machine (SVM) and a convolutional neural network into a new hybrid intelligent fault diagnosis framework, which outperforms some traditional fault diagnosis methods and has high precision for rolling bearings [28]. Qin, Y., et al. applied optimized deep belief networks with improved logistic sigmoid units to planetary gearboxes of wind turbines [29]. Jun, P., et al. developed LiftingNet, which achieves layer-wise feature learning and effectively classifies mechanical failure data even at different speeds and under the effects of random noise [30]. Zhou, Q., et al. combined convolutional neural networks and nonlinear auto-regression neural networks into a new method for imbalanced fault diagnosis of rotating machinery [31]. Azamfar, M., et al. put forward a method based on motor current signature analysis and a 2-D convolutional neural network for gearbox fault diagnosis [32]. Xie, S., et al. presented a new convolutional neural network with a one-dimensional structure (ODCNN) for the automatic fault diagnosis of rolling bearings [33]. However, these methods have many parameters and converge slowly, which limits their use in practical projects.
To solve this issue, this paper develops an improved deep residual network for an intelligent fault diagnosis system. By changing the structure of the deep residual network, the training time can be shortened and the fault recognition accuracy can be improved. The method extracts features directly from the raw vibration signal and avoids manual feature extraction, and it is more reliable and effective than traditional fault diagnosis methods. Based on the excellent performance of the improved deep residual network, rolling-element bearing fault detection from a one-dimensional vibration signal is studied, and the experimental results validate that the proposed ResNet model is effective. To summarize, our contributions are as follows. An improved deep residual network structure is proposed, in which GAP is introduced to replace the fully connected layer; this is an effective way to solve the problem of too many parameters in the traditional deep residual network model. The method is validated on rolling bearing fault data of different types. Compared with existing models, the training time of this model is shorter and the classification accuracy is higher.
The rest of this article is arranged as follows. Section 2 presents the background of ResNet. Section 3 presents the details of the proposed fault diagnosis method. Section 4 presents a comprehensive case study of fault diagnosis for rolling-element bearings to illustrate the effectiveness of the presented method. Section 5 summarizes the paper.

Deep Residual Network (ResNet)
ResNet was proposed by He et al. in 2015 [34]. The ResNet model is an updated version of the ConvNet model, but it differs from traditional deep networks in that it adds identity mappings. These make the backpropagation of errors and the optimization of model parameters easier, and they further reduce the training difficulty of deep neural networks. ResNet has achieved strong results in computer vision tasks such as image segmentation, image recognition, and target positioning. Therefore, ResNet is used in this study. ResNet principally consists of a number of residual building blocks, several convolutional layers, a global average pooling layer, and a fully connected output layer. This section introduces the theory of ResNet in detail.

Convolutional Layer
In the convolutional layer, the inputs are convolved with the convolutional kernels to obtain the feature maps. The weights of each convolutional kernel are shared across all positions of the input, which significantly reduces the number of parameters to train. The mathematical form of the convolutional operation can be expressed by: where l indicates the layer number of the network; w r is the output of feature map, r is the index of the input feature maps, r is the index of the output feature maps. The convolution operation can also be understood from Figure 1. The input is 4 by 4 with a single input feature map, and there are two kernels, each 2 by 2, which produce two output feature maps.
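The setting described for Figure 1 (one 4 by 4 input and two 2 by 2 kernels) can be illustrated with a minimal sketch. The input values, the kernel values, the stride of 1, and the absence of padding are assumptions chosen for illustration, and the function name `conv2d_valid` is hypothetical. As in most deep learning toolkits, the kernel is applied without flipping.

```python
def conv2d_valid(x, kernel, bias=0.0):
    """Slide `kernel` over `x` (stride 1, no padding) and return one feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(x) - kh + 1):
        row = []
        for j in range(len(x[0]) - kw + 1):
            s = bias
            for a in range(kh):
                for b in range(kw):
                    # The same kernel weights are reused at every position (weight sharing).
                    s += x[i + a][j + b] * kernel[a][b]
            row.append(s)
        out.append(row)
    return out

# Hypothetical 4 x 4 input: a single input feature map.
x = [[1, 2, 0, 1],
     [0, 1, 3, 2],
     [2, 1, 0, 0],
     [1, 0, 1, 2]]

# Two hypothetical 2 x 2 kernels, giving two output feature maps.
kernels = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
]
feature_maps = [conv2d_valid(x, k) for k in kernels]
```

Each kernel contributes only four weights regardless of the input size, which is the parameter saving that weight sharing provides.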

Max Pooling Layer
The pooling layer primarily performs a down-sampling operation. In this study, the input is a time-domain signal, so the maximum pooling function is used. Its advantage is that it yields location-independent features, which is important for time-domain signals. The pooling operation can be understood from Figure 2, which uses a 2 × 2 region and a stride of 2. The pooling kernel starts on the first 2 × 2 region, whose maximum is 9, and then moves two steps to the next region, whose maximum is 2. The mathematical form of the max pooling operation is expressed by:
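A minimal sketch of 2 × 2 max pooling with a stride of 2 follows. The input values are hypothetical and were chosen only so that the first two windows give 9 and 2, as in the description of Figure 2; the function name `max_pool2d` is also an assumption.

```python
def max_pool2d(x, size=2, stride=2):
    """Return the maximum of each size x size window, moving `stride` cells at a time."""
    out = []
    for i in range(0, len(x) - size + 1, stride):
        row = []
        for j in range(0, len(x[0]) - size + 1, stride):
            window = [x[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(max(window))
        out.append(row)
    return out

# Hypothetical 4 x 4 feature map.
x = [[1, 9, 2, 0],
     [3, 4, 1, 2],
     [5, 6, 7, 8],
     [0, 1, 3, 4]]
pooled = max_pool2d(x)  # 2 x 2 output: one maximum per non-overlapping window
```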

Residual Building Block
ResNet is composed of several residual building blocks, which are the core component of the model. Two common residual building blocks are shown in Figure 3 [35]. Figure 3a is the original residual building block, and Figure 3b is the proposed residual building block. Both consist of two convolutional layers, two batch normalizations (BN), and two ReLU activation functions, but the location of the ReLU activation function differs. In the original building block, the outputs of the residual path and the identity mapping are added before the final ReLU activation function. In the proposed building block, BN and ReLU are placed before each convolutional layer, so the block has a path that directly connects the input and the output. This is more conducive to the backpropagation of errors in the neural network, and thus makes the network easier to train and improves generalization. Therefore, the proposed residual building block is used in this paper. The activation function used is the rectified linear unit (ReLU), which can accelerate convergence. ReLU rarely encounters the vanishing gradient problem because its derivative is either 1 or 0. The activation function can be expressed as:
$$Y(x) = \max(0, x)$$
where $x$ is the input of the ReLU and $Y(x)$ is its output.
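The ordering inside the proposed building block (BN and ReLU before each convolution, followed by the identity shortcut) can be sketched on a one-dimensional signal. This is a simplified illustration under stated assumptions, not the implementation used in the experiments: the normalization here acts on a single signal without learned scale or shift, the convolutions are single-channel with zero padding, and all function names and kernel values are hypothetical.

```python
import math

def relu(v):
    """Y(x) = max(0, x), applied element-wise."""
    return [max(0.0, a) for a in v]

def batch_norm(v, eps=1e-5):
    """Simplified normalization of one 1-D signal to zero mean and unit variance."""
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / math.sqrt(var + eps) for a in v]

def conv1d_same(v, kernel):
    """Single-channel 1-D convolution with zero padding so the length is preserved."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(v) + [0.0] * pad
    return [sum(padded[i + k] * kernel[k] for k in range(len(kernel)))
            for i in range(len(v))]

def preact_residual_block(x, kernel1, kernel2):
    """BN -> ReLU -> conv -> BN -> ReLU -> conv, then add the untouched input."""
    out = conv1d_same(relu(batch_norm(x)), kernel1)
    out = conv1d_same(relu(batch_norm(out)), kernel2)
    return [a + b for a, b in zip(out, x)]  # identity shortcut

x = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5]

# With all-zero kernels the residual path contributes nothing,
# so the block reduces to the identity mapping.
y_identity = preact_residual_block(x, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])

# With hypothetical non-zero 1 x 3 kernels the block adds a learned correction to x.
y = preact_residual_block(x, [0.1, 0.2, 0.1], [0.3, 0.0, -0.3])
```

Because the shortcut carries the input to the output unchanged, the gradient with respect to the input always contains a direct term, which is why this arrangement eases backpropagation.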

Global Average Pooling
To avoid overfitting, a Global Average Pooling (GAP) layer is adopted in the ResNet. A brief introduction to GAP is presented below. In a traditional CNN, the fully connected layers are usually the last two layers and connect the convolutional structure to a traditional neural network classifier. The convolutional layers extract features and output feature maps, while the fully connected layers make predictions such as classification. However, because of the large number of parameters in the fully connected layers, overfitting occurs easily. To address this problem, GAP is introduced into the deep residual network. Replacing the fully connected layer of a CNN with GAP was first proposed by Lin et al. [36]. GAP averages each feature map into a single value, and the resulting vector can be interpreted as the classification confidence for each category. The GAP layer has no parameters to optimize, which greatly decreases the number of parameters and helps avoid overfitting.
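The parameter saving can be made concrete with a minimal sketch. The feature-map values and the layer sizes (64 feature maps of length 32 and 10 classes) are assumptions for illustration only and are not taken from the network in this paper; the function names are hypothetical.

```python
def global_average_pooling(feature_maps):
    """Average each 1-D feature map down to a single value (no trainable parameters)."""
    return [sum(fm) / len(fm) for fm in feature_maps]

def fc_param_count(n_maps, map_len, n_classes):
    """Weights plus biases of a dense layer applied to the flattened feature maps."""
    return n_maps * map_len * n_classes + n_classes

# Three hypothetical feature maps of length 4.
feature_maps = [[1.0, 3.0, 5.0, 7.0],
                [2.0, 2.0, 2.0, 2.0],
                [0.0, 4.0, 0.0, 4.0]]
gap_vector = global_average_pooling(feature_maps)  # one value per feature map

# Hypothetical sizes: 64 maps of length 32 flattened into a 10-class dense layer.
fc_params = fc_param_count(n_maps=64, map_len=32, n_classes=10)
gap_params = 0  # GAP only averages, so it adds nothing to optimize
```

Under these assumed sizes, the dense layer alone would carry 20,490 parameters, whereas the GAP layer carries none.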

The Objective Function
First, the output layer uses the softmax activation function, which yields a probability distribution over the possible target categories. The softmax operation can be specified as:
$$P(y_i) = \frac{e^{y_i}}{\sum_{j=1}^{k} e^{y_j}}$$
where $k$ is the number of classes (health states), $y_i$ is the $i$-th input of the softmax function, and $P(y_i)$ is the corresponding output, which can be regarded as the estimated probability of an observation belonging to the $i$-th class. The loss is then calculated when training the ResNet, and the objective function must be minimized for precise prediction. In multi-class classification problems, the cross-entropy error is usually used as the objective to be minimized [37]. The cross-entropy loss function can be written as:
$$E = -\sum_{i=1}^{k} t_i \log P(y_i)$$
where $t_i$ and $P(y_i)$ are the target value and the predicted value, respectively.
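The softmax output and the cross-entropy objective described above can be sketched in a few lines. The input values and the one-hot target are hypothetical, and subtracting the maximum input and adding a small `eps` inside the logarithm are common numerical safeguards assumed here rather than details stated in this paper.

```python
import math

def softmax(y):
    """Turn k real-valued inputs into k probabilities that sum to 1."""
    m = max(y)  # subtracting the maximum leaves the result unchanged but avoids overflow
    exps = [math.exp(v - m) for v in y]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))

# Hypothetical inputs to the softmax layer for a 3-class problem.
inputs = [2.0, 1.0, 0.1]
probs = softmax(inputs)

# One-hot target: the true health state is class 0.
target = [1, 0, 0]
loss = cross_entropy(target, probs)
```

With a one-hot target, only the predicted probability of the true class enters the loss, so the loss approaches zero as that probability approaches 1.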

The Proposed Method
In this research, a deep ResNet framework for feature learning and fault diagnosis of rolling bearings is proposed. The network is composed of an input layer, a convolution layer, a max-pooling layer, and eight residual blocks, followed by a GAP layer and a softmax output layer. ResNet is one of the most widely used convolutional neural networks (CNNs) and has become a hotspot in fault diagnosis. However, because of the large number of parameters in the fully connected layers, convergence is slow during network training. This paper proposes an improved ResNet algorithm for intelligent fault diagnosis of rolling bearings. The complete network structure of the presented ResNet is shown in Figure 4. The method improves the ResNet structure by introducing GAP [36] to replace the fully connected layer part, which reduces the number of training parameters and the testing time of the model. The proposed method does not need any manual feature extraction or feature transformation of the original data during the entire fault diagnosis process. The original fault data of the rolling bearing are simply input into the improved ResNet model, and the fault diagnosis results are output automatically. This "end-to-end" algorithm structure has better operability and versatility.
As shown in Figure 4, the input of the network is the one-dimensional time-domain signal of the rolling bearing, and the output is the probability distribution over the failure types.
Each residual block consists of two convolutional operations, ReLU activation functions, batch normalizations (BNs), and one identity shortcut, as shown in Figure 3b. The parameters of the convolution layers in the residual blocks are shown in Table 1. The convolution kernels in the residual blocks are all of size 1 × 3, the quantity of convolution kernels is 1, and the main difference lies in the stride.

Experimental Verifications
The ResNet model was implemented in the open-source Python language (version 3.5) with the TensorFlow toolkit (version 2.0). TensorFlow, developed by Google, is an open-source machine learning library based on data flow graphs. It computes gradients automatically through backpropagation to optimize the model parameters (weights and biases), and it is suitable for the rapid development of deep learning algorithms. TensorFlow also supports large-scale, fast matrix computations on graphics processing units (GPUs), which greatly reduces the training time required for the ResNet algorithm.

Experimental Data Collection
The experimental data are provided by the Case Western Reserve University Bearing Data Center. As shown in Figure 5, the test stand is mainly composed of a 2 hp motor (left), a torque transducer/encoder (center), and a dynamometer (right). Faults were seeded into the motor bearings by electro-discharge machining (EDM). Faults of 0.007 inches, 0.014 inches, and 0.021 inches in diameter were introduced separately at the inner raceway, the rolling element (i.e., ball), and the outer raceway. In total, there are therefore one normal condition (no fault) and nine different fault conditions. The bearings were tested under the normal condition, ball fault (BF), outer race fault (OR), and inner race fault (IR). Vibration data were collected at a motor load of 3 horsepower (motor speed of 1730 RPM) at 12,000 samples per second.
For training our proposed ResNet model, we employ enough training samples.

Signal Normalization
To increase the reliability of the model, the input signal needs to be normalized, as shown in the following equation.
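As an illustration only, the sketch below assumes zero-mean, unit-variance (z-score) normalization, which is one common choice for vibration signals; the exact form used in this study may differ. The function name and the sample values are hypothetical.

```python
import statistics

def zscore_normalize(signal):
    """Assumed normalization: subtract the mean and divide by the standard deviation."""
    mean = sum(signal) / len(signal)
    std = statistics.pstdev(signal)  # population standard deviation
    if std == 0:
        return [0.0 for _ in signal]  # a constant signal carries no variation to scale
    return [(v - mean) / std for v in signal]

# Hypothetical raw vibration samples.
raw = [0.2, -0.1, 0.4, 0.0, -0.5]
normalized = zscore_normalize(raw)
```

After this step every input segment has the same scale, so segments recorded at different amplitudes are treated comparably by the network.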

Data Augmentation
Data augmentation here means overlapping sampling: when training samples are cut from the original signal, each segment overlaps with the next one. When the step is smaller than the length of a single sample, the samples overlap, and more samples can be extracted from a signal of fixed length. This is shown in Figure 6. When the step equals the sample length, there is no data augmentation. In our case, the stride is 28. Data segmentation is performed for each type of vibration signal collected. A starting point is first selected randomly, and then 1024 points are taken. Repeating this operation 600 times gives 600 samples. Since there are 10 kinds of signals, a total of 6000 samples are obtained, which are divided into a training set of 4200, a validation set of 1200, and a test set of 600 according to the proportion 7:2:1. The composition of the experimental sample data is shown in Table 2. The vibration waveforms of the bearing in the 10 states are shown in Figure 7. Figure 7a1 is the waveform of the normal state.
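The overlapping sampling and the 7:2:1 split can be sketched as follows. The sample length of 1024 and the stride of 28 come from the text; the synthetic signal, the fixed starting point at index 0, the shuffling seed, and the function names are assumptions for illustration.

```python
import random

def overlapping_segments(signal, length=1024, stride=28):
    """Cut fixed-length samples whose start points are `stride` apart, so they overlap."""
    starts = range(0, len(signal) - length + 1, stride)
    return [signal[s:s + length] for s in starts]

def split_7_2_1(samples, seed=0):
    """Shuffle and split into training, validation, and test sets in the ratio 7:2:1."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10
    n_val = len(shuffled) * 2 // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# Synthetic stand-in for one class of vibration signal, just long enough for 600 samples:
# 1024 + 599 * 28 = 17796 points.
signal = list(range(17796))
segments = overlapping_segments(signal)
train, val, test = split_7_2_1(segments)
```

Under these assumptions a single class yields 420 training, 120 validation, and 60 test samples, which scales to the 4200, 1200, and 600 reported for the 10 classes. Without overlap, the same 17,796 points would give only 17 samples of length 1024.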

Dropout
In deep learning algorithms of recent years, dropout is a commonly used method to reduce overfitting [38]. During each training iteration, dropout randomly drops some neurons, so that the network propagates forward and updates backward only the parameters of the retained neurons. In this way, dropout weakens the "cooperative relationship" between neurons and makes each neuron function more independently, thus achieving model regularization. In this study, dropout is used during training and not during testing, and the dropout rate is set to 0.5 [38].
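A minimal sketch of dropout with a rate of 0.5 follows. It assumes the "inverted" convention, in which retained activations are scaled by 1/(1 - rate) during training so that no rescaling is needed at test time; the function name, the activation values, and the seed are hypothetical.

```python
import random

def dropout(activations, rate=0.5, training=True, rng=random):
    """Randomly zero activations during training; pass them through unchanged at test time."""
    if not training:
        return list(activations)
    keep = 1.0 - rate
    # Retained activations are scaled by 1 / keep so the expected value is preserved.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

activations = [0.8, 1.2, 0.5, 2.0, 1.0, 0.3]
train_out = dropout(activations, rate=0.5, training=True, rng=random.Random(0))
test_out = dropout(activations, rate=0.5, training=False)
```

Because a different random subset of neurons is dropped at each iteration, no neuron can rely on the presence of any particular other neuron.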

Hyperparameter Setup
Hyperparameters have a great influence on the fault diagnosis accuracy of residual neural networks. According to [39], the more important hyperparameters are the optimizer, learning rate, activation function, convolution kernel, and pooling kernel. The hyperparameters in the experiments were set on the basis of empirical recommendations. In our case, ReLU activation functions are selected. The convolution kernel and pooling kernel are shown in Figure 2. The Adam optimizer was used in the experiment. If the learning rate is too large, training fails to converge; if it is too small, training is too slow. In this study, an exponentially decaying learning rate is used to address this problem. The initial learning rate is set to 0.01 and is then gradually reduced over the iterations with an attenuation coefficient of 0.99. The best classification effect is achieved when the learning rate is 0.001.
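The exponential decay schedule can be sketched as below. The initial rate of 0.01 and the coefficient of 0.99 come from the text; applying the decay once per step (`decay_steps=1`) and the function name are assumptions, since the decay interval is not stated.

```python
def exponential_decay(step, initial_lr=0.01, decay_rate=0.99, decay_steps=1):
    """Learning rate after `step` updates: initial_lr * decay_rate ** (step / decay_steps)."""
    return initial_lr * decay_rate ** (step / decay_steps)

# First decay step at which the rate has fallen to 0.001 or below.
first_step_at_0001 = next(s for s in range(10000) if exponential_decay(s) <= 0.001)

schedule = [exponential_decay(s) for s in (0, 1, 100, first_step_at_0001)]
```

Under this assumption the rate starts at 0.01 and reaches the 0.001 level after 230 decay steps, so training takes large steps early and progressively finer steps later.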

Outcome of Experiment
The established ResNet method was applied to vibration-based fault diagnosis using the dataset collected from the bearings. In training the improved ResNet model, the Adam optimization algorithm was adopted. Adam (Adaptive Moment Estimation) combines the Momentum algorithm and the RMSProp algorithm. Data augmentation and dropout were also used to mitigate overfitting. The mini-batch size was 16, the ReLU activation function was used, and training ran for 100 epochs. The final improved ResNet achieves its highest accuracy of 99.83% on the test set, and its diagnostic results are shown in Table 3. From Table 3, it can be seen that the performance of the improved ResNet algorithm is significantly better than that of the traditional fully connected ResNet algorithm. In terms of time, the number of model parameters is greatly reduced and the training time is significantly shortened in the improved ResNet algorithm because the fully connected layer is removed, which is of great significance for applying the model to online rapid diagnosis and monitoring of faults. In terms of accuracy, the improved ResNet algorithm reaches 99.83%, while the traditional ResNet algorithm reaches 98.48%. Figure 8 shows the training and testing results of one trial. Figure 8a shows the relationship between the epoch and the accuracy of the model. Figure 8b shows the relationship between the epoch and the cross entropy.

Discussion
To evaluate the accuracy of the developed ResNet more thoroughly, we use precision and recall to evaluate the algorithm [40][41][42]. Precision is the proportion of actually positive samples among all samples predicted to be positive. Recall is the proportion of samples predicted to be positive among all actually positive samples. Precision and recall are defined as follows:
$$P = \frac{TP}{TP + FP} \quad (7)$$
$$R = \frac{TP}{TP + FN} \quad (8)$$
$$F1 = \frac{2PR}{P + R}$$
where P is precision and R is recall. TP is the number of samples that are actually positive and predicted positive (correct prediction). FP is the number of samples that are actually negative but predicted positive (wrong prediction). FN is the number of samples that are actually positive but predicted negative (wrong prediction). F1 is the harmonic mean of precision and recall. In this study, the precision and recall are calculated according to Equations (7) and (8) from the experimental results of the improved ResNet only, and are shown in Table 4.
To further show the ability of the improved ResNet algorithm to identify minor faults and to detail the fault misjudgments, we introduce the multi-class confusion matrix [37] for a detailed quantitative analysis of the fault results. The confusion matrix comprehensively reflects the diagnosis precision and the number of misjudgments of bearings under different fault grades, as well as which true fault types are misjudged. The confusion matrix of the bearing corresponding to Table 4 is shown in Figure 9.
The confusion matrix of the improved ResNet is shown in Figure 9. The X-axis represents the predicted category of the fault, and the Y-axis represents the true label of the fault. The numbers on the main diagonal represent the accuracy of the improved ResNet algorithm for the correct diagnosis of each type of fault state.
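The link between the confusion matrix and the per-class precision, recall, and F1 values can be illustrated with a small sketch. It follows the layout described above (rows are true labels, columns are predicted labels); the three-class labels and predictions are hypothetical and are not the experimental results.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Count samples per (true label, predicted label) pair."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1  # rows: true label, columns: predicted label
    return cm

def per_class_metrics(cm):
    """Return (precision, recall, F1) for each class, treating it as the positive class."""
    metrics = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(len(cm))) - tp  # predicted c, actually another class
        fn = sum(cm[c]) - tp                             # actually c, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics.append((precision, recall, f1))
    return metrics

# Hypothetical labels for a 3-class problem; one class-1 sample is misjudged as class 2.
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 2, 1, 2, 2, 2]
cm = confusion_matrix(y_true, y_pred, 3)
metrics = per_class_metrics(cm)
```

In this toy case the single misjudgment lowers the recall of class 1 and the precision of class 2, while class 0 is unaffected, which mirrors how off-diagonal entries are read in a confusion matrix.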
According to the confusion matrix, two of the ten health states show misclassifications: F1 was misjudged as F2, and F7 was misjudged as F4. Analyzing the types of misclassification shows that all of them are errors between different fault categories, and they essentially consist of small faults being misjudged as larger faults, which is meaningful for risk prediction. The recognition accuracy of the algorithm between the normal state and the fault states is 100%, and the comprehensive fault identification rate reaches 99.8%. The experiments show that the improved ResNet algorithm has superior identification ability and higher diagnostic accuracy for micro-faults of rolling bearings.

Performance Comparison
We compared the designed method with the modified CNN, SVM, KNN, and DPBN. The results are shown in Table 5. The modified CNN achieves an accuracy of 98.2%, KNN achieves 91.9%, the support vector machine (SVM) achieves 94.1%, and the DPBN achieves 92.3%. The comparison confirms that the improved ResNet method designed in this paper performs better than DPBN, SVM, KNN, and the existing CNN. This is attributed to the GAP used in the improved ResNet. GAP ensures that fewer training parameters are needed and helps prevent overfitting to the training data, which supports good defect identification results even on unseen data.

Conclusions
This paper developed an improved deep residual network for defect identification in bearings. The modified ResNet was modeled using vibration signals acquired by accelerometers. The conclusions of the study are as follows. ResNet achieved better performance after its fully connected layer was modified: GAP replaced the fully connected layer part of the traditional ResNet model, which reduced the number of training parameters and helped avoid overfitting of the ResNet, supporting high defect identification accuracy even on unseen data. The proposed method was also compared with existing machine learning and deep learning methods. The results show that the accuracy of the designed method reaches 99.8%, which is higher than that of the existing machine learning and deep learning methods in identifying defects of rolling bearings.
The improved ResNet algorithm does not need any manual feature extraction from the original fault data; it takes the original fault data directly as the model input and automatically outputs the fault classification results. This "end-to-end" model has much better versatility and operability. Furthermore, dropout, an adaptive variable learning rate, and data augmentation can also be used to effectively decrease the training parameters and calculation time of the model while preventing overfitting. The method has been verified on experimental data; verification on actual engineering data remains to be done.