Bearing State Recognition Method Based on Transfer Learning Under Different Working Conditions

Bearing state recognition, especially under variable working conditions, suffers from low reusability of monitoring data, low state recognition accuracy and poor generalization ability of the model. Feature-based transfer learning methods can address these problems, but they rely on signal processing knowledge and expert diagnosis experience to obtain the cross-characteristics of data from different working conditions in advance. Therefore, this paper proposes an improved balanced distribution adaptation (BDA) method, named multi-core balanced distribution adaptation (MBDA). This method constructs a weighted mixed kernel function to map data from different working conditions to a unified feature space. It does not need to obtain the cross-characteristics of the different working conditions in advance, which simplifies data processing and meets the need for end-to-end state recognition in practical applications. At the same time, MBDA adopts the A-Distance algorithm to estimate the balance factor of the distribution and the balance factor of the kernel function, which not only effectively reduces the distribution difference between data from different working conditions, but also improves efficiency. Further, feature self-learning and rolling bearing state recognition are realized by a stacked autoencoder (SAE) neural network with a classification function. The experimental results show that, compared with other algorithms, the proposed method effectively improves transfer learning performance and can accurately identify the bearing state under different working conditions.


Introduction
Modern industrial production technology makes great contributions to improving productivity, reducing losses, saving natural resources and human resources, reducing the scrap rate, and ensuring product quality. As a key piece of equipment, rotary machinery is widely applied in important engineering fields, such as power, electric power, chemical, metallurgy, mining and machinery manufacturing. Once large-scale mechanical equipment fails, it will cause huge economic losses and even cause different degrees of casualties. As a key component of rotating machinery, rolling bearings play an important role in ensuring safe and efficient operation of the machine. The working condition of the rolling bearing not only affects the operation of the machine itself, but also the subsequent production. According to statistics, in all rotating machinery faults, bearing failure accounts for about 30% [1]. Therefore, it is of great significance for the continuous production system to realize the state recognition of rolling bearings. The rapid development of signal analysis and processing technology, is understandable. Section 3 gives a flow chart of the bearing diagnostic algorithm for the proposed method. We performed experiments to show the good performance of the proposed method in the bearing state identification in Section 4, and a conclusion is presented in Section 5.

Balanced Distribution Adaptation
As a cross-domain, cross-model and cross-task learning method, transfer learning can effectively handle the case in which source data and target data follow different distributions. Assume the original feature space is $\mathcal{X}$ and the label space is $\mathcal{Y}$. The labeled source data and the unlabeled target data are $D_S = \{X_S, Y_S\}$ and $D_T = \{X_T, Y_T\}$, respectively, where $X_S, X_T \in \mathcal{X}$ and $Y_S, Y_T \in \mathcal{Y}$. The marginal and conditional probability distributions of the source data are $P(X_S)$ and $P(Y_S \mid X_S)$, and those of the target data are $P(X_T)$ and $P(Y_T \mid X_T)$. In practical applications, these distributions are not equal, i.e., $P(X_S) \neq P(X_T)$ and $P(Y_S \mid X_S) \neq P(Y_T \mid X_T)$. The goal of transfer learning methods is to minimize the marginal and conditional distribution discrepancy between source data and target data. BDA, as a transfer learning method, adaptively minimizes both discrepancies between domains by weighting them with a balance factor $\mu$; this weighted sum is Equation (1).

BDA uses the maximum mean discrepancy (MMD) to calculate the distribution difference between the two domains. The method assumes that there is a mapping function $\varphi$ that satisfies $P(\varphi(X_S)) \approx P(\varphi(X_T))$ and $P(Y_S \mid \varphi(X_S)) \approx P(Y_T \mid \varphi(X_T))$. Under this assumption, Equation (1) can be rewritten as Equation (2), where $n_s$ and $n_t$ are the numbers of samples of source data and target data, respectively, and $C$ is the number of categories. By further applying matrix tricks and regularization, Equation (2) can be rewritten as Equation (3), in which the first term represents the marginal and conditional distribution divergences between the two domains, $A$ denotes the transformation matrix, and $\lambda$ is the regularization parameter. The constraint $A^T X H X^T A = I$ ensures that the transformed data $A^T X$ preserve the important properties of $X_S$ and $X_T$.
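For reference, the standard BDA formulation that this section summarizes can be written as below. This is a sketch that assumes the paper follows that formulation. The symbols $D(\cdot,\cdot)$ (a distance between distributions), $D_S^{(c)}$ and $D_T^{(c)}$ (the class-$c$ samples of each domain, with pseudo-labels on the target side), and $n_s^{(c)}$, $n_t^{(c)}$ (their counts) are notational assumptions that are not defined in the text.

```latex
\begin{align}
D(D_S, D_T) &\approx (1-\mu)\, D\big(P(X_S), P(X_T)\big)
  + \mu\, D\big(P(Y_S \mid X_S), P(Y_T \mid X_T)\big) \tag{1} \\
D(D_S, D_T) &\approx (1-\mu) \Big\| \frac{1}{n_s}\sum_{i=1}^{n_s} x_{s_i}
  - \frac{1}{n_t}\sum_{j=1}^{n_t} x_{t_j} \Big\|_{\mathcal{H}}^{2}
  + \mu \sum_{c=1}^{C} \Big\| \frac{1}{n_s^{(c)}} \sum_{x_{s_i} \in D_S^{(c)}} x_{s_i}
  - \frac{1}{n_t^{(c)}} \sum_{x_{t_j} \in D_T^{(c)}} x_{t_j} \Big\|_{\mathcal{H}}^{2} \tag{2} \\
\min_{A}\; & \operatorname{tr}\Big( A^{T} X \Big( (1-\mu) M_0 + \mu \sum_{c=1}^{C} M_c \Big) X^{T} A \Big)
  + \lambda \|A\|_F^{2}
  \quad \text{s.t. } A^{T} X H X^{T} A = I,\; 0 \le \mu \le 1 \tag{3}
\end{align}
```

With this weighting, $\mu$ close to 0 emphasizes the marginal term and $\mu$ close to 1 emphasizes the conditional term.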
Here $I_{n_s+n_t} \in \mathbb{R}^{(n_s+n_t)\times(n_s+n_t)}$ is the identity matrix, and $H = I_{n_s+n_t} - \frac{1}{n_s+n_t}\mathbf{1}\mathbf{1}^T$ is the centering matrix. The balance factor $\mu \in [0, 1]$ is estimated by searching in the experiment. $M_0$ and $M_c$ are the MMD matrices, whose elements are defined for each pair of samples. The above non-convex optimization problem is transformed into a trace optimization problem by the Lagrange multiplier method; the detailed derivation is omitted here.
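As a minimal sketch of how the MMD matrices can be built, the following assumes the usual BDA definitions: an entry of $M_0$ is $1/n_s^2$ for two source samples, $1/n_t^2$ for two target samples, and $-1/(n_s n_t)$ otherwise, and $M_c$ applies the same pattern to the samples of class $c$ only. The function names, the use of pseudo-labels for target samples, and the handling of a class that is missing from one domain are illustrative assumptions, not details taken from the paper.

```python
def marginal_mmd_matrix(n_s, n_t):
    """Return M_0 for n_s source samples followed by n_t target samples."""
    # e_i = 1/n_s for source samples and -1/n_t for target samples, so
    # M_0 = e e^T has entries 1/n_s^2, 1/n_t^2 and -1/(n_s * n_t).
    e = [1.0 / n_s] * n_s + [-1.0 / n_t] * n_t
    return [[ei * ej for ej in e] for ei in e]


def conditional_mmd_matrix(y_s, y_t_pseudo, c):
    """Return M_c for class c, using pseudo-labels for the target samples."""
    n_sc = sum(1 for y in y_s if y == c)
    n_tc = sum(1 for y in y_t_pseudo if y == c)
    n = len(y_s) + len(y_t_pseudo)
    if n_sc == 0 or n_tc == 0:
        # Class absent from one domain: contribute nothing (a simplification).
        return [[0.0] * n for _ in range(n)]
    e = [1.0 / n_sc if y == c else 0.0 for y in y_s]
    e += [-1.0 / n_tc if y == c else 0.0 for y in y_t_pseudo]
    return [[ei * ej for ej in e] for ei in e]


def balanced_mmd_matrix(y_s, y_t_pseudo, classes, mu):
    """Return (1 - mu) * M_0 + mu * sum over classes of M_c."""
    n = len(y_s) + len(y_t_pseudo)
    m = [[(1.0 - mu) * v for v in row]
         for row in marginal_mmd_matrix(len(y_s), len(y_t_pseudo))]
    for c in classes:
        m_c = conditional_mmd_matrix(y_s, y_t_pseudo, c)
        for i in range(n):
            for j in range(n):
                m[i][j] += mu * m_c[i][j]
    return m


# Toy example: 3 source samples and 3 target samples with two classes.
M0 = marginal_mmd_matrix(2, 3)
M = balanced_mmd_matrix([0, 0, 1], [0, 1, 1], classes=[0, 1], mu=0.5)
```

The combined matrix is the one that appears inside the trace term of the BDA objective; solving for the transformation matrix $A$ would additionally require a generalized eigenvalue solver, which is left out of this sketch.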

Multi-Core Balanced Distribution Adaptation
BDA has two problems: (1) obtaining the balance factor $\mu$ is inefficient; and (2) some cross-characteristics of the data from different domains must be obtained in advance. To solve these problems, this paper proposes the MBDA method.

The kernel function plays an important role in BDA, and the effect of a single kernel function in transfer learning is not ideal. A weighted mixed kernel function combines the advantages of different kernel functions, but adds a new parameter $\gamma$. The mixed kernel is a weighted combination of $K_{RBF}$ and $K_{Poly}$, the radial basis function (RBF) kernel and the polynomial (Poly) kernel, respectively, where $\gamma \in [0, 1]$ controls the relative weight of the two kernel functions.

MBDA adopts the A-Distance to empirically estimate $\gamma$. The specific process is as follows: (1) Train a binary SVM classifier $h$ to distinguish source data from target data and obtain its loss value $err(h)$; (2) Calculate the A-Distance between source data and target data from $err(h)$; (3) Calculate the balance factor $\gamma$ from $A_{RBF}(X_S, X_T)$ and $A_{Poly}(X_S, X_T)$, the A-Distance values between the source data $X_S$ and the target data $X_T$ after the RBF and polynomial kernel mappings, respectively. The larger the A-Distance, the greater the difference between source data and target data after the kernel mapping, so the smaller the weight of the corresponding kernel function, and vice versa.
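The estimation of $\gamma$ can be sketched as follows. The A-Distance is computed with the commonly used proxy $2(1 - 2\,err(h))$. The rule for $\gamma$, the assignment of $\gamma$ to the RBF kernel and $1-\gamma$ to the polynomial kernel, and the kernel parameters `sigma`, `degree` and `coef0` are assumptions chosen to match the stated behavior (a larger A-Distance gives a smaller kernel weight); they are not confirmed by the text.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel exp(-||x - y||^2 / (2 sigma^2)); sigma is a hypothetical default."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel (x . y + coef0)^degree; parameters are hypothetical."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree


def mixed_kernel(x, y, gamma):
    """Weighted mixed kernel: gamma * K_RBF + (1 - gamma) * K_Poly (assumed form)."""
    return gamma * rbf_kernel(x, y) + (1.0 - gamma) * poly_kernel(x, y)


def a_distance(err):
    """Proxy A-Distance from the error of a source-vs-target classifier."""
    return 2.0 * (1.0 - 2.0 * err)


def kernel_weight(a_rbf, a_poly):
    """Assumed rule: gamma = 1 - A_RBF / (A_RBF + A_Poly)."""
    total = a_rbf + a_poly
    if total == 0.0:
        return 0.5  # both kernels align the domains equally well
    return a_poly / total


# Hypothetical classifier errors after each kernel mapping.
a_rbf = a_distance(0.125)   # domains easy to separate -> large distance
a_poly = a_distance(0.375)  # domains harder to separate -> small distance
gamma = kernel_weight(a_rbf, a_poly)
k = mixed_kernel([0.2, 0.4], [0.1, 0.5], gamma)
```

In this toy case the RBF mapping leaves the domains further apart, so the RBF kernel receives the smaller weight. In practice `err` would come from a trained SVM, which is omitted here.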
The balance factor $\mu$ plays an important role in minimizing the marginal and conditional distribution discrepancy between the domains. BDA estimates $\mu$ by searching over candidate values in experiments, which is inefficient. To adjust the relative importance of the marginal and conditional distributions for different tasks, we instead estimate $\mu$ from two A-Distance values: $A_{Marginal}(X_S, X_T)$, the A-Distance of the marginal probability distributions of the source and target domains, and $A_{Conditional}(X_S, X_T)$, the A-Distance of the conditional probability distributions of the source and target domains. When $\mu \to 0$, there is a large difference between the source data and the target data themselves, so marginal distribution adaptation is more important; when $\mu \to 1$, conditional distribution adaptation is more important.
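A small sketch of this estimate is given below. The rule $\mu = 1 - A_{Marginal} / (A_{Marginal} + A_{Conditional})$ is an assumption borrowed from related work on adaptive distribution alignment; it reproduces the limiting behavior described above but is not confirmed by the text, and the function name and fallback value are hypothetical.

```python
def estimate_mu(a_marginal, a_conditional):
    """Assumed rule: mu = 1 - A_M / (A_M + A_C), kept inside [0, 1]."""
    total = a_marginal + a_conditional
    if total <= 0.0:
        return 0.5  # no measurable discrepancy: weight both terms equally
    mu = 1.0 - a_marginal / total
    return min(1.0, max(0.0, mu))


# Large marginal discrepancy -> mu near 0 -> marginal adaptation dominates.
mu_marginal_case = estimate_mu(1.8, 0.2)
# Large conditional discrepancy -> mu near 1 -> conditional adaptation dominates.
mu_conditional_case = estimate_mu(0.2, 1.8)
```

Because the estimate is a closed-form ratio of two distances, it needs only two classifier trainings instead of one full transfer run per candidate value of $\mu$.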

Autoencoder
As one of the classic neural network models, the autoencoder consists of two stages: encoding and decoding. Its structure is shown in Figure 1. The autoencoder converts the original data in the high-dimensional space into a coding vector in a low-dimensional space by encoding, and then reconstructs the original data from the coding vector by decoding. The specific implementation process is as follows:

• Coding phase: the information is transmitted from front to back. The input layer is $X = \{x_1, x_2, \ldots, x_n\}$, where the subscript indicates that there are $n$ training samples; $w_1$ and $b_1$ are the weight and bias of the encoding layer, respectively; and $f(\cdot)$ is the activation function, which is usually a sigmoid or tanh function.

• Decoding phase: the information is transmitted from back to front. Here $\hat{x}$ is the output value of the decoding layer, $w_2$ and $b_2$ are the weight and bias of the decoding layer, respectively, and $w_2 = w_1^T$.
Sensors 2020, 20, x FOR PEER REVIEW 6 of 13

As can be seen from the above process, the autoencoder is trained in an unsupervised manner. Training the autoencoder means finding the network parameters that minimize the reconstruction error on the training set $D$. The reconstruction error is generally the quadratic cost function $L$, computed between the input and its predicted value $\hat{X}$ over the $N$ samples. When the reconstruction error is small enough, the compressed feature vectors of the encoding layer retain most of the information of the original data.
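The encoding and decoding phases can be sketched in a few lines. This is a minimal illustration, assuming a sigmoid activation, tied weights ($w_2 = w_1^T$) and a quadratic cost of the form $\frac{1}{2N}\sum_i \|x_i - \hat{x}_i\|^2$; the factor $\frac{1}{2}$, the toy weights and the function names are assumptions.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def encode(x, w1, b1):
    """Coding phase: h = f(w1 x + b1); w1 has one row per hidden unit."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(w1, b1)]


def decode(h, w1, b2):
    """Decoding phase: x_hat = f(w2 h + b2) with tied weights w2 = w1^T."""
    n_in = len(w1[0])
    return [sigmoid(sum(w1[j][i] * h[j] for j in range(len(h))) + b2[i])
            for i in range(n_in)]


def reconstruction_error(X, X_hat):
    """Quadratic cost averaged over the N samples (with an assumed 1/2 factor)."""
    n = len(X)
    total = sum((a - b) ** 2
                for x, x_hat in zip(X, X_hat)
                for a, b in zip(x, x_hat))
    return total / (2.0 * n)


# Toy 3-2-3 autoencoder with hand-picked (untrained) parameters.
w1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
b1 = [0.0, 0.1]
b2 = [0.0, 0.0, 0.0]
X = [[0.2, 0.7, 0.1], [0.9, 0.4, 0.3]]
X_hat = [decode(encode(x, w1, b1), w1, b2) for x in X]
loss = reconstruction_error(X, X_hat)
```

Training would adjust `w1`, `b1` and `b2` by gradient descent to reduce `loss`; that step is omitted here.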

Stacked Autoencoder Neural Network
The SAE neural network is a deep neural network composed of multiple autoencoders. The basic principle is that the output of each autoencoder is used as the input of the next one. Its structure is shown in Figure 2. Compared with a single autoencoder, the SAE neural network is more expressive and can extract richer features from the original data. In the encoding process of an $m$-layer SAE neural network, $a^{(l)}$ is the output value of the $l$th encoding layer; $w^{(l,1)}$ and $b^{(l,1)}$ are the weight and bias of the $l$th encoding layer; and $z^{(l)}$ and $z^{(l+1)}$ are the input values of the $l$th layer and the $(l+1)$th layer, respectively.
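The layer-by-layer encoding can be sketched as a simple loop. The recursion used here, $z^{(l+1)} = w^{(l,1)} a^{(l)} + b^{(l,1)}$ followed by $a^{(l+1)} = f(z^{(l+1)})$ with $a^{(1)} = x$, is the conventional form and is an assumption, as are the sigmoid activation, the random weights and the 4-3-2 layer sizes.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def stacked_encode(x, layers):
    """Pass x through each encoding layer; the output of one feeds the next."""
    a = list(x)
    for w, b in layers:
        # z^(l+1) = w^(l,1) a^(l) + b^(l,1)
        z = [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i
             for row, b_i in zip(w, b)]
        # a^(l+1) = f(z^(l+1))
        a = [sigmoid(v) for v in z]
    return a


def random_layer(n_in, n_out, rng):
    """Hypothetical initializer; pre-training would normally supply these."""
    w = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b


rng = random.Random(0)
sizes = [4, 3, 2]
layers = [random_layer(n_in, n_out, rng)
          for n_in, n_out in zip(sizes, sizes[1:])]
code = stacked_encode([0.1, 0.5, 0.9, 0.3], layers)
```

The final vector `code` is the deepest hidden representation, which is the feature vector a classification layer would consume.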
In the decoding process, the SAE neural network transmits information from back to front, where $a^{(m)}$ is the activation value of the deepest hidden unit, and $w^{(n-l,2)}$ and $b^{(n-l,2)}$ are the weight and bias of the decoding layer.

The training process of the SAE neural network with classification function includes two stages: model pre-training and model fine-tuning:

• Model pre-training. The SAE neural network is constructed and the network model parameters are initialized through unsupervised layer-by-layer training. Through pre-training, all hidden layers are obtained, and the features learned at each layer represent different levels of data characteristics.

• Model fine-tuning. A classification layer is added at the top of the SAE network, and the pre-trained parameters are fine-tuned to implement the classification function. Fine-tuning treats all layers of the SAE neural network as a single model, and every parameter of the model is adjusted at each training iteration. Therefore, fine-tuning can improve the performance of SAE deep neural networks.
In this paper, the quadratic cost function and the cross-entropy cost function are used as the objective functions of the first (pre-training) and second (fine-tuning) stages, respectively. The quadratic cost function is described in Section 2.3.1. In the cross-entropy cost function, $z_i^j$ is the predicted probability that sample $x_i$ belongs to category $j$, $h_{\theta,j}(x_i)$ represents the $j$th value of the output vector, and $\theta$ denotes the neural network parameters. The SAE neural network can automatically extract features and has powerful feature expression capabilities. In fault diagnosis, the SAE neural network provides both noise reduction filtering and feature extraction.
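The cross-entropy cost used in fine-tuning can be illustrated as follows. This sketch assumes the conventional form $-\frac{1}{N}\sum_i \log z_i^{y_i}$, where $y_i$ is the true category of $x_i$, and a softmax classification layer; the averaging over $N$, the softmax, and the example scores are assumptions. The four categories stand for the four bearing states (NS, IF, OF, BF).

```python
import math


def softmax(scores):
    """Turn the output vector into class probabilities z^j."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true category."""
    n = len(labels)
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / n


# Two samples, four categories (0=NS, 1=IF, 2=OF, 3=BF); scores are made up.
outputs = [[2.0, 0.1, -1.0, 0.3], [0.2, 0.1, 3.0, -0.5]]
labels = [0, 2]
probs = [softmax(o) for o in outputs]
loss = cross_entropy(probs, labels)
uniform_loss = cross_entropy([[0.25] * 4, [0.25] * 4], labels)
```

A model that assigns high probability to the correct state yields a loss below the uniform-guess value of $\log 4$.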

Bearing State Recognition Method and Process under Different Working Conditions
The algorithm flow of the bearing state recognition method based on transfer learning under different working conditions is shown in Figure 3. The specific implementation process is as follows:

1. Calculate the spectrum of the labeled bearing data and the unlabeled bearing data, and normalize the amplitude to the range [0, 1]. Because the amplitude spectrum of a real signal is symmetric about zero frequency, only the positive frequency part is used as the feature vector. This not only ensures that information is not lost, but also reduces the amount of computation. The positive frequency domain amplitude of the labeled bearing data is used as labeled source data, and the positive frequency domain amplitude of the unlabeled bearing data is used as unlabeled target data.

2. Map the labeled source data and the unlabeled target data from step 1 to the same feature space by using the MBDA algorithm.

3. Train the SAE network and recognize the bearing state. The training process of SAE is also a feature self-learning process, which can further extract features. It includes two parts: unsupervised pre-training and supervised fine-tuning. Unsupervised pre-training is used to initialize the network parameters, and supervised fine-tuning implements classification by adding a classification layer on top of the network. The labeled source data after spatial mapping in step 2 are used as training samples to obtain the trained model. The unlabeled target data after spatial mapping in step 2 are then input into the model, and the rolling bearing state recognition results are obtained.
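The feature extraction in step 1 can be sketched as follows. This is a minimal illustration that uses a direct discrete Fourier transform from the standard library in place of a fast Fourier transform routine, and it assumes min-max scaling as the normalization to [0, 1]; the function name and the synthetic test signal are hypothetical.

```python
import cmath
import math


def spectrum_features(signal):
    """Return the positive-frequency amplitude spectrum scaled to [0, 1]."""
    n = len(signal)
    half = n // 2  # the amplitude spectrum of a real signal is symmetric
    mags = []
    for k in range(half):
        coeff = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))
        mags.append(abs(coeff))
    lo, hi = min(mags), max(mags)
    if hi == lo:
        return [0.0] * half  # flat spectrum: nothing to scale
    return [(m - lo) / (hi - lo) for m in mags]


# Synthetic signal: a sine wave with 5 cycles over 64 samples.
n_samples = 64
signal = [math.sin(2.0 * math.pi * 5 * t / n_samples) for t in range(n_samples)]
features = spectrum_features(signal)
peak_bin = features.index(max(features))
```

For vibration records of realistic length, a library FFT routine would replace the quadratic-time loop; the resulting feature vectors would then be passed to MBDA (step 2) and the SAE network (step 3).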

Experimental Data
The experimental data are from the bearing data center of Case Western Reserve University. In this experiment, the vibration data of the SKF6205 drive end bearing are used as experimental data. Four different combinations of rotational speed and motor load represent the working conditions A, B, C and D. Each working condition includes four states: normal state (NS), inner raceway fault (IF), outer raceway fault (OF) and ball fault (BF). The detailed information of the experimental data is shown in Table 1. In Table 1, the fault diameter of IF, OF and BF indicates the diameter of the bearing inner raceway fault, outer raceway fault and ball fault, respectively. This paper constructs data sets under single and multiple working conditions, as shown in Table 2. Here, A(T)-B(S) indicates that the source data (training data) are from working condition B and the target data (test data) are from working condition A; A(T)-BC(S) indicates that the source data are from working conditions B and C and the target data are from working condition A; AB(T)-CD(S) indicates that the source data are from working conditions C and D and the target data are from working conditions A and B; and A(T)-BCD(S) indicates that the source data are from working conditions B, C and D and the target data are from working condition A.

Model Performance Analysis
In the experiment, source data (training data) and target data (test data) are selected from the single or multiple working condition data sets. Taking A(T)-B(S) as an example, the labeled source data B(S) are used to train the SAE neural network, and the unlabeled target data A(T) are input into the model to obtain the bearing state. The samples are transformed by fast Fourier transform (FFT) and normalized, and the amplitude of the positive frequency part is taken as the feature vector to train the SAE network. This not only ensures that information is not lost, but also reduces the amount of computation.

Generally, the more layers a neural network has, the stronger its expressive ability. However, when the number of layers is too large, the model becomes difficult to train. Based on preliminary experiments, this paper uses a three-layer SAE network with a 64-32-16 structure, batch_size set to 100, and 200 training iterations.

The corresponding figures show the state recognition accuracy of the test data (target data) and training data (source data) for the four data sets, respectively. The accuracy of the training data and that of the test data follow the same trend, which indicates that there is no over-fitting. In the single/single case A(T)-B(S), the state recognition accuracy of the test data almost reaches 100%. In the single/multiple cases A(T)-BC(S) and A(T)-BCD(S), the state recognition accuracy of the test data is 98.50% and 96.86%, respectively. In the multiple/multiple case AB(T)-CD(S), the state recognition accuracy of the test data reaches about 90.50%.
The experimental results show that the MBDA-SAE method can obtain higher state recognition accuracy under variable working conditions.

Analysis and Comparison of MBDA and Other Algorithms
To demonstrate the advantages of the MBDA method, this paper compares it with traditional transfer learning methods, namely TCA, JDA and BDA. The results of combining the different transfer learning methods with SAE networks under variable working conditions are shown in Table 3. As can be seen from Table 3, the state recognition accuracy of the MBDA-SAE method is higher than that of the TCA-SAE, JDA-SAE and BDA-SAE methods. The reason is that the multi-core kernel function handles the imbalance between source data and target data well. The JDA-SAE and BDA-SAE methods include both the marginal and the conditional distribution difference in their objective functions, so their state recognition accuracy is higher than that of the TCA-SAE method. Compared to the JDA-SAE method, the BDA-SAE method adds a balance factor, obtained by searching in experiments, to balance the marginal and conditional distributions. The BDA-SAE method therefore achieves higher state recognition accuracy at the expense of efficiency.

Conclusions
In this paper, we propose a rolling bearing state recognition method based on an MBDA-SAE neural network for data from different working conditions. The following conclusions have been obtained through experiments: (1) This method builds on BDA theory and constructs a weighted mixed kernel function to map data from different working conditions to a unified feature space, which effectively minimizes the distribution divergence between them. The MBDA method does not need to obtain the cross-characteristics of the different working conditions in advance, which simplifies data processing. (2) This paper adopts the A-Distance algorithm to calculate the balance factor of the distribution and the balance factor of the kernel function. It can adaptively balance the importance of the marginal and conditional distributions and the importance of the different kernel functions, and it improves efficiency. (3) The MBDA method was compared to other transfer learning methods, namely TCA, JDA and BDA. In the single/single case A(T)-B(S), the accuracy of the bearing state recognition methods based on JDA-SAE, BDA-SAE and MBDA-SAE reached more than 90%, whereas the accuracy of the TCA-SAE method is 75%. In the multiple/multiple case AB(T)-CD(S), the state recognition accuracy of the method proposed in this paper reaches more than 90%, whereas the accuracy of the other methods is less than 80%. Therefore, the advantages of this method are more obvious under multiple/multiple conditions. Experiments showed that the MBDA method can better recognize the unknown state of rolling bearings under variable working conditions.
The follow-up work of this article is as follows: (1) During deep neural network training, multiple experiments are currently required to determine suitable hyperparameters (such as the number of network layers, the number of neurons and the number of iterations), so the setting of the hyperparameters will be studied; (2) The features extracted from the multi-layer network feature space will be visualized; (3) This article only studies bearing-related faults, and subsequent studies will distinguish other faults, such as unbalanced loads and broken rotor bars.