Research on Rolling Bearing Fault Diagnosis Method Based on Generative Adversarial and Transfer Learning

Abstract: The diagnosis of rolling bearing faults has become an increasingly popular research topic in recent years. However, many studies assume sufficient training data. In real industrial scenes, bearing fault diagnosis faces two problems: the ratio of normal to failure data is imbalanced, and the amount of unlabeled data far exceeds the amount of labeled data. This paper presents a rolling bearing fault diagnosis method that is suitable for different working conditions and simulates the real industrial scene. Firstly, the dataset is divided into the source and target domains, and the signals are transformed into images by continuous wavelet transform. Secondly, Wasserstein Generative Adversarial Nets-Gradient Penalty (WGAN-GP) is used to generate fake sample images; then, the source domain and target domain data are input into a transfer learning network with ResNet50 as the backbone to extract similar features. Multi-Kernel Maximum Mean Discrepancy (MK-MMD) is used to reduce the marginal distribution difference between the data of the source domain and the target domain. The feasibility of the proposed method is verified by experiments on Case Western Reserve University's dataset. The experimental results show that the average fault diagnosis accuracy can reach 96.58%.


Introduction
With the rapid development of industrial technology, the modernization level of industrial equipment is constantly improving, and the structure is becoming more complex. Casualties and heavy economic losses often accompany equipment failure. As the core component of the rotating mechanism, the rolling bearing supports the transmission elements through the rolling contact between the parts. Its working state plays a vital role in the reliability and safety of running equipment, so it is of great significance to study the fault diagnosis of rolling bearings.
In recent years, scholars at home and abroad have conducted ongoing fault detection research. The relevant research status is summarized as follows:
(1) Fault detection method based on traditional machine learning.
Many researchers have used traditional machine learning for fault detection. Yang et al. proposed a SCADA raw signal reconstruction modeling technique using support vector regression, which showed higher performance in wind turbine fault detection and in the control of false and missed alarms [1]. Hajji et al. developed an improved fault detection and diagnosis (FDD) technique for photovoltaic system failures, in which principal component analysis (PCA) extracts and selects the most relevant multivariate features and supervised machine learning (SML) classifiers are applied for fault diagnosis [2]. Zhang et al. established a data-driven wind turbine fault detection framework by combining random forest (RF) and extreme gradient boosting (XGBoost), which classifies direct sensor signals or variables constructed from prior knowledge. They conducted numerical simulations on three different types of wind turbines using the state-of-the-art wind turbine simulator FAST [3]. Su et al. proposed a distributed sensor-fault detection and diagnosis system based on machine learning algorithms and analyzed its fault detection performance in terms of detection accuracy, the area under the ROC curve (AUC-ROC), false-positive rate, and F1 score [4]. Euldji et al. proposed a technique based on vibration analysis and a decision tree model. In analyzing the vibration signals of a rolling bearing experiment, they used a classification algorithm to construct the decision tree and develop the decision system [5]. Sharma et al. decomposed bearing fault signals under different working conditions by permutation entropy (PE) and flexible analytic wavelet transform (FAWT) [6]. Xia et al. proposed an output hidden feedback Elman adaptive boosting-bootstrap aggregation algorithm under a comprehensive diagnosis framework, which demonstrates good diagnosis performance and high stability in different stages of rolling bearing faults [7]. Singh et al. used the Stockwell transform (ST) to analyze stator current signals and diagnose various motor operating conditions. The extracted features were input into two different support vector machine (SVM) models, and good classification accuracy was achieved [8]. Traditional machine learning methods are simple and fast but are only suitable for small datasets.
(2) Fault detection method based on deep learning and transfer learning.
Fault diagnosis can be significantly improved by applying deep learning and transfer learning, both of which have developed rapidly. Feng et al. proposed a local connection network (LCN) constructed by a normalized sparse autoencoder (NSAE-LCN), which integrates feature extraction and fault recognition into a general learning procedure; the algorithm is validated on two different vibration signal datasets [9]. Wang et al. proposed an extended deep belief network (EDBN), which combines valuable information and features in the original data to construct a dynamic fault classifier that takes the dynamic characteristics of process data into account [10]. Guo et al. proposed a hierarchical learning rate adaptive deep convolutional neural network based on an improved algorithm and studied its application in diagnosing bearing faults and their severity [11]. Andrew et al. studied the feasibility of deep learning for fault detection in industrial cold forging. After collecting data from several common faults, their convolutional neural network classifier detected fault conditions with 99.02% accuracy and further classified each fault with 92.66% accuracy [12]. Ewert et al. examined the possibility of detecting mechanical damage to permanent magnet synchronous motors (PMSM) by analyzing mechanical vibration with a shallow neural network (NN) [13]. Huang et al. developed a new dual-network prognosis model combining a deep convolutional neural network (DCNN) and a multilayer perceptron (MLP), in which one-dimensional time series data and two-dimensional image data are used together as inputs to improve bearing life prediction [14]. Hadi et al. proposed a new method of analyzing speed using the discrete cosine transform (DCT) and used a probabilistic neural network (PNN) to identify bearing failure, which achieved better recognition than traditional SVM and ANN classifiers [15]. Khorram et al. proposed an end-to-end fault detection method that uses the equivalent time series as the input of a novel convolutional long short-term memory recurrent neural network (CRNN) to detect bearing faults with the highest accuracy in the shortest possible time [16]. Supervised learning relies on large amounts of labeled data to achieve high classification accuracy. However, labeling data consumes considerable material and financial resources in real industrial scenarios. Therefore, semi-supervised, unsupervised, and transfer learning have become hot topics studied by more and more scholars [17][18][19][20].
GAN was proposed by Goodfellow [21] in 2014 and has great potential in data generation. Owing to their unsupervised nature, GAN and its variants have been widely used in image processing. Bowles generated synthetic samples with the appearance of actual images through GAN for medical segmentation tasks [22]. Loey et al. used conditional generative adversarial networks (CGAN) based on a deep transfer learning model to generate more images for COVID-19 detection in chest CT scans [23]. Dixit et al. proposed a novel condition-assisted classifier GAN framework, which was combined with Model Agnostic Meta-Learning (MAML) to generate high-quality samples when bearing fault data are limited [24]. Li et al. proposed a semi-supervised FDD method that replaces the binary discriminator of GAN with a multi-class classifier, which realized efficient fault detection for building HVAC systems [25]. Zhou et al. redesigned the Generator and Discriminator of GAN, generated more fault samples using a global optimization scheme, and reduced the high fault diagnosis error rate that uneven feature extraction causes on imbalanced data [26]. Yan et al. adopted a custom generative adversarial network and proposed a GAN integration framework to rebalance the training dataset of cooler AFDD [27].
(4) Summary and analysis of research status.
The above work either classifies and diagnoses the original one-dimensional data directly or applies only simple processing to the data before diagnosis. It does not take into account that, in real industrial scenarios, the proportion of normal data to fault data is imbalanced, the sample size is small, and the experimental data differ from the actual industrial data.

• Mechanical equipment is mostly in normal operation, so fault data are scarce and the proportions of fault types are imbalanced, resulting in poor fault diagnosis.
• The research object is labeled fault data, while the data diagnosed in industry are often unlabeled. In addition, the amount of unlabeled data is far greater than the amount of labeled data.
• Fault samples differ under different working conditions. Accurate prediction then requires a separate model for each condition, and a single model exhibits poor universality.
Based on the analysis of the current research situation, this paper proposes a composite semi-supervised fault diagnosis method combining continuous wavelet transform, Wasserstein Generative Adversarial Nets-Gradient Penalty (WGAN-GP), and transfer learning. The signals are processed by continuous wavelet transform and WGAN-GP and then diagnosed by transfer learning, and the feasibility of the method is verified on the Case Western Reserve University dataset. The experimental results show that the method proposed in this paper can better solve the aforementioned problems. Figure 1 shows the network model. The steps of the diagnosis method are as follows:

WGAN-GP Based Transfer Learning Method
Step 1: Since one-dimensional time-domain signals do not fully express the relationship between frequency and time, the original rolling bearing fault signals are processed first. Continuous wavelet transform converts the one-dimensional vibration data of the rolling bearing faults into two-dimensional time-frequency images. The labeled source domain images are divided into ten fault types and named real images.
Step 2: The real images are taken as the input of the WGAN-GP network, and the internal parameters of the generator and discriminator are updated through the game between them. After Nash equilibrium is reached [28], a large number of fake sample images are generated. These images are called fake images.
Step 3: After the image generation in Step 2, a second selection of fake images is made. First, ResNet50 is trained using only the wavelet-transformed source domain images as the training and validation sets, and the trained weights are obtained. The weights are loaded into the ResNet50 prediction script, which selects the fake sample images whose prediction probability is greater than 90%. After repeated selection, the reliability of the images is relatively high. Finally, the selected images are combined with the real images from Step 1 to form the new source domain.
Step 4: Transfer learning adopts ResNet50 as the backbone network and adds a Multi-Kernel Maximum Mean Discrepancy (MK-MMD) adaptation layer at the fully connected layer of ResNet50. The fault images of the new source domain synthesized in Step 3 and the target domain images to be tested are taken as input. The two branches share parameters, and the rolling bearing data in the target domain are diagnosed through transfer learning.
Processes 2022, 10, x FOR PEER REVIEW 4 of 15
The idea of the Step 1-Step 4 scheme adopted in this paper is as follows:
1. To address the problem that the amount of unlabeled data in real industrial scenes far exceeds the amount of labeled data, Step 1 applies the continuous wavelet transform to increase the extractable feature information, Step 2 generates fake samples to expand the amount of labeled data, and Step 3 performs a second selection of the generated fake samples to improve the accuracy of the semi-supervised fault diagnosis designed in this paper.
2. Step 2 and Step 3 also address the imbalanced proportion of normal and fault data in real industrial scenarios. WGAN-GP expands the under-represented classes, and only the generated samples with high reliability are retained, which balances the data across classes.
3. In real industrial situations, rolling bearings work at different speeds, so faults occur under different working conditions and the fault data differ from one condition to another. Building a separate model to diagnose the faults in each working condition is very inefficient. Therefore, the Step 4 network model is proposed in this paper. ResNet50 is used as the backbone network, MK-MMD is added to reduce the difference between working conditions, and the target domain data of Step 1 and the source domain data of Step 3 are used as the input. Only the fault data from one working condition are used as the source domain to realize fault diagnosis of the target domain under different working conditions.

Continuous Wavelet Transform of Data
Continuous wavelet transform has been widely used to process mechanical fault signals [29,30]. Unlike the Fourier transform, it allows the signal to be observed simultaneously in the time and frequency domains. The short-time Fourier transform is also a time-frequency method, but the wavelet transform has an adaptive window and can therefore better balance frequency resolution and time resolution. For any signal $f(t) \in L^2(R)$, the continuous wavelet transform is defined as:

$$WT_{\phi} f(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} f(t)\, \bar{\phi}\left(\frac{t - b}{a}\right) dt \quad (1)$$

In the formula, $a > 0$ is the scale parameter, which scales the basic wavelet $\phi(t)$. The larger $a$ is, the wider the waveform $\phi(t/a)$ is. The constant factor $1/\sqrt{a}$ keeps the energy of $\phi_{a,b}(t)$ equal for different $a$. $b$ is the location parameter, and $\bar{\phi}(t)$ is the complex conjugate of $\phi(t)$. The continuous wavelet transform can also be rewritten as follows:

$$WT_{\phi} f(a, b) = \langle f(t), \phi_{a,b}(t) \rangle, \quad \phi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \phi\left(\frac{t - b}{a}\right) \quad (2)$$

This inner product form gives the degree of similarity between the signal $f(t)$ and the template function $\phi_{a,b}(t)$. The higher the degree of similarity, the greater $WT_{\phi} f(a, b)$.
This paper processes one-dimensional vibration fault data, so the wavelet amplitude must be smooth and continuous, and a complex-valued wavelet with good phase expression ability should be selected to obtain the time and frequency domain characteristics. Given these characteristics of bearing fault diagnosis, this paper selects the Morlet wavelet, a non-orthogonal complex exponential wavelet. It offers a good balance between time and frequency localization, and its wide envelope contains both positive and negative peaks, which captures more fault vibration information.
In this paper, continuous wavelet transform is applied to all fault data of rolling bearings. The sampling frequency is set to 12 kHz and the sampling length to 1024. Because the fault data are limited, they are augmented by overlapping sampling to obtain more training samples and avoid the overfitting caused by a small amount of data. The overlap ratio is set to 0.5. All the fault data are processed by the wavelet transform, and four of the resulting images are shown in Figure 2.
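The preprocessing described above can be illustrated with a minimal sketch. It uses only the Python standard library and evaluates the wavelet integral directly, so it is slow and meant for illustration rather than production use. The function names, the Morlet centre frequency `w0 = 6.0`, the truncation of the wavelet at four scale widths, and the reading of "overlap 0.5" as a stride of half a window are all assumptions, not details taken from the paper.

```python
import cmath
import math


def overlapping_segments(signal, length=1024, overlap=0.5):
    """Cut a 1-D signal into windows of `length` samples sharing `overlap` of their points."""
    step = max(1, int(length * (1.0 - overlap)))  # 1024 * 0.5 gives a stride of 512
    return [signal[i:i + length] for i in range(0, len(signal) - length + 1, step)]


def morlet(t, w0=6.0):
    """Complex Morlet mother wavelet; w0 is an assumed centre frequency."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def scale_for_frequency(freq_hz, w0=6.0):
    """Scale (in seconds) at which the Morlet wavelet is centred on freq_hz."""
    return w0 / (2.0 * math.pi * freq_hz)


def cwt_magnitude(segment, scales, fs=12000.0, w0=6.0):
    """Return |WT(a, b)| as a list of rows, one row per scale, one column per sample."""
    dt = 1.0 / fs
    n = len(segment)
    rows = []
    for a in scales:
        half = int(math.ceil(4.0 * a / dt))  # the Gaussian envelope is negligible beyond 4a
        offsets = range(-half, half + 1)
        # Conjugated, scaled wavelet sampled at (t - b) = k * dt
        kernel = [morlet(k * dt / a, w0).conjugate() / math.sqrt(a) for k in offsets]
        row = []
        for b in range(n):
            acc = 0j
            for weight, k in zip(kernel, offsets):
                idx = b + k
                if 0 <= idx < n:  # samples outside the segment are treated as zero
                    acc += segment[idx] * weight
            row.append(abs(acc) * dt)  # Riemann-sum approximation of the integral
        rows.append(row)
    return rows
```

Each row of the returned grid corresponds to one scale, so rendering the grid as an image gives a time-frequency picture of one 1024-point window. A faster implementation would normally rely on an FFT-based library routine.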

Generative Adversarial Networks
Generative Adversarial Networks consist of a Generator (G) and a Discriminator (D). The generator G tries to deceive the discriminator D by generating data whose distribution matches that of the real samples as closely as possible, while D tries to distinguish real data from fake data. Through this mutual game, the two models constantly update their internal parameters until they reach Nash equilibrium, and the generator is finally used to generate fake data. GAN trains the two models alternately, and its loss function is:

$$\min_G \max_D V(D, G) = E_{x \sim P_{data}(x)}[\log D(x)] + E_{z \sim P_z(z)}[\log(1 - D(G(z)))] \quad (3)$$

Most GANs have some defects, such as mode collapse, difficulty in model training, and instability of the training process. In this paper, WGAN-GP [31][32][33], a variant of GAN, is used to generate high-quality fake samples and thereby address the imbalanced fault classes and the small number of samples. WGAN replaces the KL and JS divergences with the Wasserstein distance (Formula (4)):

$$W(P_1, P_2) = \inf_{\gamma \in \Pi(P_1, P_2)} E_{(x, y) \sim \gamma}[\| x - y \|] \quad (4)$$

In the formula, $\Pi(P_1, P_2)$ is the set of joint distributions of $P_1$ and $P_2$. For each joint distribution $\gamma$, real data $x$ and generated fake data $y$ are sampled so that $(x, y) \sim \gamma$, the distance between the samples $\| x - y \|$ is calculated, and the expected value $E_{(x, y) \sim \gamma}[\| x - y \|]$ is obtained. Based on WGAN, WGAN-GP replaces the weight clipping strategy with a gradient penalty strategy, which also enforces Lipschitz continuity and yields more stable training and higher-quality generated images. Its objective function is as follows:

$$L = E_{y \sim P_2}[D(y)] - E_{x \sim P_1}[D(x)] + \lambda\, E_{\hat{x} \sim P_{\hat{x}}}\left[\left(\| \nabla_{\hat{x}} D(\hat{x}) \|_2 - 1\right)^2\right] \quad (5)$$

where $\hat{x}$ is sampled along straight lines between real and generated samples, $\| \nabla_{\hat{x}} D(\hat{x}) \|_2$ is the gradient norm of the discriminator D that the penalty term constrains, and $\lambda$ is the penalty coefficient. WGAN-GP uses the Adam optimizer to optimize the parameters during training. The WGAN-GP network structure constructed for the bearing fault diagnosis data is shown in Figure 3.
This paper uses WGAN-GP as the second step to process the vibration fault data after continuous wavelet transformation. The processed data are called real images. During testing, feeding too much fault data at once caused the generative model to collapse: classes with indistinct features produced only a few images, and one fault type produced none. Therefore, fault samples are sent to WGAN-GP in batches, with the same training rounds for each batch. Figure 3 shows the whole training process. First, the Generator turns randomly generated Gaussian noise into images, called fake images here, and then the real and fake images are input into the Discriminator, which judges whether each image is real. During training, the quality of the Generator and the accuracy of the Discriminator increase continuously until the Discriminator can no longer distinguish fake images, indicating that the training has reached Nash equilibrium.
Figure 4 shows the loss function curve obtained after 2000 iterations on one sample class in this paper. The figure shows that the loss of the Discriminator peaks briefly at the beginning of training and then approaches zero after about 100 iterations, indicating that the discriminator network has reached Nash equilibrium. After 1250 iterations, the fluctuation of the generator loss decreases, implying that the quality of the generated images gradually stabilizes.
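To make the gradient penalty concrete, the sketch below evaluates a WGAN-GP style critic loss on a toy problem. It is not the network of Figure 3: the critic `D(x) = tanh(w . x)` is a hypothetical stand-in chosen because its input gradient has a closed form, and the coefficient `lam = 10.0` is an assumed value (the one commonly used for WGAN-GP), since the paper does not state the coefficient it used.

```python
import math
import random


def critic(x, w):
    """Toy critic D(x) = tanh(w . x), a stand-in for the convolutional Discriminator."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))


def critic_grad_norm(x, w):
    """L2 norm of the gradient of D with respect to its input x (closed form for the toy critic)."""
    d = critic(x, w)
    return (1.0 - d * d) * math.sqrt(sum(wi * wi for wi in w))


def wgan_gp_critic_loss(real, fake, w, lam=10.0, rng=random):
    """Batch mean of D(fake) - D(real) + lam * (||grad D(x_hat)||_2 - 1)^2."""
    total = 0.0
    for x, y in zip(real, fake):
        eps = rng.random()  # interpolation factor drawn uniformly from [0, 1)
        x_hat = [eps * a + (1.0 - eps) * b for a, b in zip(x, y)]
        penalty = (critic_grad_norm(x_hat, w) - 1.0) ** 2
        total += critic(y, w) - critic(x, w) + lam * penalty
    return total / len(real)
```

The penalty vanishes only where the critic's gradient norm equals 1, which is how the Lipschitz constraint is encouraged without clipping weights. In a real network the gradient would be obtained by automatic differentiation rather than a closed-form expression.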

Convolutional Neural Network
Convolutional Neural Network (CNN) [34] has been widely used in image processing, semantic image segmentation, and other fields since LeCun proposed it, and it is an important part of deep learning. Its basic structure generally consists of an input layer, convolution layers, pooling layers, fully connected layers, and an output layer. The convolution layer is the core of the convolutional neural network, and the size of its output feature map is given by:

$$W_{out} = \frac{W - F + 2P}{S} + 1, \quad H_{out} = \frac{H - F + 2P}{S} + 1 \quad (6)$$

where $W$ and $H$ represent the width and height of the image, $F$ is the side length of the convolution kernel, $P$ represents the number of pixel layers added to the image edge, and $S$ denotes the step size of the convolution kernel.
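As a quick check of how padding and stride affect a convolution layer, the helper below applies the usual output-size relation `(size - kernel + 2 * padding) / stride + 1` with floor division. The function name and the example layer settings are illustrative assumptions; the 7 × 7, stride 2, padding 3 case matches the well-known first convolution of the ResNet family on 224 × 224 inputs.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Output width (or height) of a convolution along one spatial dimension."""
    if size + 2 * padding < kernel:
        raise ValueError("kernel is larger than the padded input")
    return (size - kernel + 2 * padding) // stride + 1


# A 7x7 kernel with stride 2 and padding 3 halves a 224-pixel side to 112.
first_layer = conv_output_size(224, kernel=7, padding=3, stride=2)

# A 3x3 kernel with stride 1 and padding 1 preserves the spatial size.
same_size = conv_output_size(56, kernel=3, padding=1, stride=1)
```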
With the development of convolutional neural networks, scholars have proposed more and more network models. ResNet [35] was proposed in 2015 and won ILSVRC 2015. He et al. proposed a network structure called the residual block, which alleviates the vanishing gradient problem by using residual mapping. The expression of a single residual block is as follows:

$$y = F(x) + x \quad (7)$$

where $x$ and $y$ are the input and output of the block and $F(x)$ is the residual mapping to be learned. The ResNet network contains two residual blocks, BasicBlock and Bottleneck, which correspond to two network structures, collectively referred to as BuildingBlock, as shown in Figure 5. The left network structure corresponds to shallow networks, such as ResNet34. Deeper networks, such as ResNet50, use the Bottleneck structure, which stacks 1 × 1 convolution layers at the beginning and end of the block to reduce the parameters of the overall network and thus the training time.
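The shortcut connection can be sketched in a few lines. This toy block replaces the convolutions of Figure 5 with small fully connected layers purely to keep the example short, which is an assumption for illustration; the point it shows is that the input is added back to the learned mapping before the final activation.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def linear(values, weights):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def residual_block(x, w1, w2):
    """Return relu(F(x) + x), where F is two linear layers with a ReLU in between."""
    fx = linear(relu(linear(x, w1)), w2)          # residual mapping F(x)
    return relu([f + xi for f, xi in zip(fx, x)])  # shortcut adds the input back
```

When the weights of F are zero, the block passes a non-negative input through unchanged. This is why stacking such blocks does not make it harder to represent the identity mapping, and gradients can flow through the shortcut.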
This paper uses ResNet50 as the backbone network in Step 3 and Step 4. ResNet50 adopts the Bottleneck structure in Figure 5. Compared with ResNet18, which has fewer layers, its parameter count is still within an acceptable range and its performance is better. The training results show that using ResNet101 or other deeper networks is unnecessary. In Step 3, ResNet50 is used to screen the generated images; that is, a composite criterion is used to select the fake fault samples. Firstly, the bearing fault data in the source domain are used as the training set and validation set of the ResNet50 network. Because the classification task of this paper differs from the original ResNet task, a fully connected layer is added after the ResNet50 network structure during training to match the bearing fault classification task. After training, a weight file is generated. Secondly, once WGAN-GP can generate stable fault images, these images are sent to the trained ResNet50 network for batch prediction. The fake images are taken as the input, the minimum prediction probability of the network is set to 90%, and the images that pass this threshold are chosen as the final selected samples and added to the source domain. The composite criterion ensures that the neural network can recognize the retained fake samples, which improves fault diagnosis accuracy.
Figure 6 shows the network structure of the transfer phase proposed in this paper. The updated source domain data and the target domain data to be detected are simultaneously input into the deep transfer learning network, with ResNet50 as the backbone. Since the task in this paper involves ten fault types, the number of output nodes of the fully connected layer is set to 10. Multi-Kernel Maximum Mean Discrepancy (MK-MMD) is added at the fully connected layer to measure and reduce the marginal distribution difference between the source and target domains.
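The confidence-based screening of generated images in Step 3 can be sketched as a simple filter over classifier outputs. The sketch assumes that each generated image carries the fault label of the class it was generated for and that an image is kept only when the classifier predicts that label with probability of at least 0.9. The label check, the function names, and the use of raw logits as input are assumptions, since the paper only states the 90% threshold.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of class scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_confident(samples, logits_batch, intended_labels, threshold=0.9):
    """Keep (sample, label) pairs the classifier assigns to the intended class with high confidence."""
    kept = []
    for sample, logits, label in zip(samples, logits_batch, intended_labels):
        probs = softmax(logits)
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted == label and probs[predicted] >= threshold:
            kept.append((sample, label))
    return kept
```

In practice `logits_batch` would come from the trained ResNet50 in inference mode, and the retained pairs would be merged with the real images to form the new source domain.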

Deep Transfer Learning
Transfer learning applies knowledge or a model learned on one task to another learning task. It helps the learner accomplish the target task by improving baseline performance, shortening model development time, and improving final performance. The success of transfer learning is closely related to the correlation between the tasks. Pan et al. defined transfer learning in 2009 [36]. A domain $D$ is described as a tuple of two elements: the feature space $\mathcal{X}$ and the marginal probability $P(X)$, where $X$ denotes the sample data points, $X = \{x_1, x_2, \cdots, x_n\}$, $x_i$ is a specific vector, and $X \in \mathcal{X}$, so that:
$$D = \{\mathcal{X}, P(X)\}$$
Similarly, a task $T$ can also be expressed as a tuple containing two elements:
$$T = \{\mathcal{Y}, P(Y \mid X)\}$$
where $\mathcal{Y}$ represents the first element, the label space, and $P(Y \mid X)$ represents the other element, which from a probabilistic point of view is the objective function $f$. Following the categories of traditional machine learning, transfer learning can be divided into inductive, transductive, and unsupervised transfer learning.
As a typical representative of inductive transfer, deep transfer learning has two advantages over non-deep transfer learning. One is that it can meet the end-to-end needs of practical applications; the other is that it can automatically extract more expressive features. Deep transfer learning covers different learning types, such as domain adaptation, domain confusion, multi-task learning, one-shot learning, and zero-shot learning. Regarding domain adaptation, Tzeng et al. first proposed the DDC method [37]. Using AlexNet [38] as the backbone network, they added an adaptation layer between the feature extraction layers and the classification layer to align the data of the source domain and target domain as closely as possible. The maximum mean discrepancy (MMD) used in that paper is widely adopted and is expressed as follows:
$$\mathrm{MMD}(X_s, X_t) = \left\| \frac{1}{n_s}\sum_{i=1}^{n_s}\phi(x_i^s) - \frac{1}{n_t}\sum_{j=1}^{n_t}\phi(x_j^t) \right\|_{\mathcal{H}}$$
where $n_s$ and $n_t$ are the numbers of source and target samples, and $\phi$ maps the data of the source domain and the target domain to the reproducing kernel Hilbert space $\mathcal{H}$ [39]. Through this formula, the distance between the source domain and the target domain can be obtained and added to the loss function to improve the efficiency of transfer learning.
MMD depends on the selected kernel function, and its performance decreases when the kernel function performs poorly. MK-MMD addresses this problem: it assumes that the optimal kernel can be formed by a linear combination of multiple kernels, which avoids the difficulty of selecting a single kernel function for MMD. Therefore, this paper adds MK-MMD to the network so that the network can obtain better characterization ability. The loss function after adding MK-MMD is as follows:
$$L = L_{CE}(X_s, Y_s) + \lambda L_{MK\text{-}MMD}(X_s, X_t)$$
In the formula, the term on the left of the plus sign is the cross-entropy loss function of the source domain, and $\lambda$ represents the weight of MK-MMD.
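The MK-MMD term and the combined loss described above can be sketched as follows. This is a simplified illustration rather than the paper's implementation: it assumes Gaussian kernels combined with equal weights (MK-MMD as usually formulated learns the kernel weights), a biased estimate of the squared discrepancy, and features given as plain Python lists. The bandwidth values, the function names, and the default `lam` are assumptions.

```python
import math

def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def multi_kernel(x, y, bandwidths):
    """Equal-weight linear combination of Gaussian kernels (an assumption)."""
    return sum(gaussian_kernel(x, y, b) for b in bandwidths) / len(bandwidths)

def mk_mmd_squared(source, target, bandwidths=(0.5, 1.0, 2.0, 4.0)):
    """Biased estimate of the squared MMD under the combined kernel."""
    def mean_kernel(a, b):
        total = sum(multi_kernel(x, y, bandwidths) for x in a for y in b)
        return total / (len(a) * len(b))
    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))

def total_loss(cross_entropy, source_feats, target_feats, lam=1.0):
    """Source cross-entropy plus the lambda-weighted MK-MMD term."""
    return cross_entropy + lam * mk_mmd_squared(source_feats, target_feats)

# Hypothetical 2-D features: the target batch is shifted away from the source.
src = [[0.0, 0.0], [0.1, -0.1]]
tgt = [[3.0, 3.0], [3.1, 2.9]]
print(mk_mmd_squared(src, src))          # identical batches give zero
print(mk_mmd_squared(src, tgt) > 0.1)    # shifted batches give a clear gap
```

Minimizing `total_loss` pushes the network to classify source samples correctly while pulling the source and target feature distributions together; a larger `lam` puts more emphasis on alignment.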

Data Set Introduction
This paper uses the bearing data set of Case Western Reserve University (CWRU) to verify the effectiveness and feasibility of the proposed method. As shown in Figure 7, the CWRU test-bed consists of, from left to right, a fan-end bearing, a two-horsepower motor, a drive-end bearing, a torque sensor, an encoder, a dynamometer, and other components. Acceleration sensors are placed above the bearing seats at the motor's fan end and drive end to collect vibration acceleration signals of the faulty bearing. Vibration signals are collected by a 16-channel data recorder, and the power and speed are measured by the torque sensor and the encoder.
In this experiment, electrical discharge machining (EDM) was used to introduce single-point damage to the bearings, simulating damage to the inner race, outer race, and rolling element under different loads. The bearings under test support the motor's rotating shaft. The fan-end bearing is an SKF6203, sampled at 12 kHz, and the drive-end bearing is an SKF6205, sampled at 12 kHz and 48 kHz. The experiment has four rotating speeds: 1797 r/min, 1772 r/min, 1750 r/min, and 1730 r/min, corresponding to loads of 0 HP, 1 HP, 2 HP, and 3 HP, respectively. The processed faulty bearings were reinstalled in the test motor, and the vibration acceleration signals were recorded under each of the four motor loads. To meet the requirements of cross-working condition diagnosis, this paper selects the drive-end bearing fault signals sampled at 12 kHz. Faults are divided into ten categories according to the damaged part and fault diameter of the bearing, as shown in Table 1, which lists the fault data at each speed. The original data are converted by the continuous wavelet transform of Step 1 into 10 types of fault pictures at each speed, with 200 pictures of each type, for a total of 8000 images.
After the pictures in Step 2 and Step 3 are generated, 100 high-confidence samples of each class are selected and added as a new training set to train the network. Taking 0 HP as an example, Table 1 compares the generated false sample images with the real sample images. The accuracy of fault identification is obtained by inputting the source and target domain data into the transfer network simultaneously in Step 4.
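The image counts quoted above can be checked with a short bookkeeping sketch. The speeds, loads, class count, and per-class counts are taken from the text; the variable names are hypothetical.

```python
# Motor load (HP) -> shaft speed (r/min), as listed in the text.
SPEED_BY_LOAD = {0: 1797, 1: 1772, 2: 1750, 3: 1730}

NUM_CLASSES = 10           # fault categories per working condition
REAL_PER_CLASS = 200       # wavelet images per category
GENERATED_PER_CLASS = 100  # selected generated images per category

real_per_condition = NUM_CLASSES * REAL_PER_CLASS
real_total = real_per_condition * len(SPEED_BY_LOAD)
generated_per_condition = NUM_CLASSES * GENERATED_PER_CLASS

print(real_per_condition, real_total, generated_per_condition)  # 2000 8000 1000
```

Each working condition therefore contributes 2000 real images, which gives the stated total of 8000 across the four loads, and 1000 selected generated images per condition.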

Experimental Verification under Cross-Working Conditions
To simulate the need for fault diagnosis in real industrial scenes, the proposed method is verified in two stages under cross-working conditions. In Experiment 1, the four rotational speeds are used as the working conditions for transfer learning. For example, 0-1 indicates that 0 HP is the source domain and 1 HP is the target domain for training. A total of 12 transfer schemes were set up in the experiment, and DDC, Deepcoral [40], and DAN [41] were used as the comparison methods. Table 2 shows the results obtained under the same experimental conditions.
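The count of 12 transfer schemes is consistent with taking every ordered (source, target) pair of distinct loads, since 4 × 3 = 12. The sketch below enumerates them in the "source-target" notation used above; that the 12 schemes are exactly these ordered pairs is an assumption, and the variable names are hypothetical.

```python
from itertools import permutations

LOADS_HP = [0, 1, 2, 3]  # the four working conditions

# Every ordered pair of distinct loads: the first is the source domain,
# the second is the target domain.
tasks = [f"{source}-{target}" for source, target in permutations(LOADS_HP, 2)]

print(len(tasks))  # 12
print(tasks)
```

Note that 0-1 and 1-0 are distinct schemes, because swapping the source and target domains changes which data carry labels during training.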

Experimental Verification under Cross-Working Conditions
Based on Experiment 1 above, to simulate the need for fault diagnosis in real industrial scenes, the experiment will verify the proposed method in two stages on the premise of cross-working conditions. In Experiment 1, four different rotational speeds were used as transfer learning conditions under cross-working conditions. For example, 0-1 indicates that 0 HP is the source domain and 1 HP is the target domain for training. A total of 12 migration schemes were set up in the experiment, and the control groups were DDC, Deepcoral [40], and DAN [41]. Under the same experimental conditions, Table 2 shows the experimental results. the driving end bearing is SKF6205. The sampling frequency is 12 KHZ and 48 KHZ. The experiment has four rotating speeds: 1797 r/min, 1772 r/min, 1750 r/min, and 1730 r/min. The corresponding loads are 0 HP, 1 HP, 2 HP and 3 HP, respectively. The processed faulty bearings were re-loaded into the test motor, and the vibration acceleration signal data were recorded under the working conditions of 0, 1, 2, and 3 horsepower motor load, respectively. To meet the requirements of cross-working condition diagnosis in this paper, bearing fault signals at the driving end are selected at a sampling frequency of 12 KHZ. This paper divides faults into ten categories according to the damaged parts and fault diameters of bearings, as shown in Table 1, which lists the fault data at each speed. The original data adopt the continuous wavelet transform method in Step 1 to obtain 10 types of fault pictures at each rate, 200 photos of each type, and a total of 8000 images. After the pictures in Step 2 and Step 3 are generated, each class selects 100 samples with high reliability to join as a new training set to train the network. Take 0 HP as an example; Table 1 compares the generated false sample image and the real sample image. 
The accuracy of fault identification is obtained by inputting the source and target domain into the migration network simultaneously through Step 4.

Experimental Verification under Cross-Working Conditions
Based on Experiment 1 above, to simulate the need for fault diagnosis in real industrial scenes, the experiment will verify the proposed method in two stages on the premise of cross-working conditions. In Experiment 1, four different rotational speeds were used as transfer learning conditions under cross-working conditions. For example, 0-1 indicates that 0 HP is the source domain and 1 HP is the target domain for training. A total of 12 migration schemes were set up in the experiment, and the control groups were DDC, Deepcoral [40], and DAN [41]. Under the same experimental conditions, Table 2 shows the experimental results. the driving end bearing is SKF6205. The sampling frequency is 12 KHZ and 48 KHZ. The experiment has four rotating speeds: 1797 r/min, 1772 r/min, 1750 r/min, and 1730 r/min. The corresponding loads are 0 HP, 1 HP, 2 HP and 3 HP, respectively. The processed faulty bearings were re-loaded into the test motor, and the vibration acceleration signal data were recorded under the working conditions of 0, 1, 2, and 3 horsepower motor load, respectively. To meet the requirements of cross-working condition diagnosis in this paper, bearing fault signals at the driving end are selected at a sampling frequency of 12 KHZ. This paper divides faults into ten categories according to the damaged parts and fault diameters of bearings, as shown in Table 1, which lists the fault data at each speed. The original data adopt the continuous wavelet transform method in Step 1 to obtain 10 types of fault pictures at each rate, 200 photos of each type, and a total of 8000 images. After the pictures in Step 2 and Step 3 are generated, each class selects 100 samples with high reliability to join as a new training set to train the network. Take 0 HP as an example; Table 1 compares the generated false sample image and the real sample image. 
The accuracy of fault identification is obtained by inputting the source and target domain into the migration network simultaneously through Step 4.

Experimental Verification under Cross-Working Conditions
Based on Experiment 1 above, to simulate the need for fault diagnosis in real industrial scenes, the experiment will verify the proposed method in two stages on the premise of cross-working conditions. In Experiment 1, four different rotational speeds were used as transfer learning conditions under cross-working conditions. For example, 0-1 indicates that 0 HP is the source domain and 1 HP is the target domain for training. A total of 12 migration schemes were set up in the experiment, and the control groups were DDC, Deepcoral [40], and DAN [41]. Under the same experimental conditions, Table 2 shows the experimental results. the driving end bearing is SKF6205. The sampling frequency is 12 KHZ and 48 KHZ. The experiment has four rotating speeds: 1797 r/min, 1772 r/min, 1750 r/min, and 1730 r/min. The corresponding loads are 0 HP, 1 HP, 2 HP and 3 HP, respectively. The processed faulty bearings were re-loaded into the test motor, and the vibration acceleration signal data were recorded under the working conditions of 0, 1, 2, and 3 horsepower motor load, respectively. To meet the requirements of cross-working condition diagnosis in this paper, bearing fault signals at the driving end are selected at a sampling frequency of 12 KHZ. This paper divides faults into ten categories according to the damaged parts and fault diameters of bearings, as shown in Table 1, which lists the fault data at each speed. The original data adopt the continuous wavelet transform method in Step 1 to obtain 10 types of fault pictures at each rate, 200 photos of each type, and a total of 8000 images. After the pictures in Step 2 and Step 3 are generated, each class selects 100 samples with high reliability to join as a new training set to train the network. Take 0 HP as an example; Table 1 compares the generated false sample image and the real sample image. 
The accuracy of fault identification is obtained by inputting the source and target domain into the migration network simultaneously through Step 4.

Experimental Verification under Cross-Working Conditions
Based on Experiment 1 above, to simulate the need for fault diagnosis in real industrial scenes, the experiment will verify the proposed method in two stages on the premise of cross-working conditions. In Experiment 1, four different rotational speeds were used as transfer learning conditions under cross-working conditions. For example, 0-1 indicates that 0 HP is the source domain and 1 HP is the target domain for training. A total of 12 migration schemes were set up in the experiment, and the control groups were DDC, Deepcoral [40], and DAN [41]. Under the same experimental conditions, Table 2 shows the experimental results.
Generate sample images (100/type) the driving end bearing is SKF6205. The sampling frequency is 12 KHZ and 48 KHZ. The experiment has four rotating speeds: 1797 r/min, 1772 r/min, 1750 r/min, and 1730 r/min. The corresponding loads are 0 HP, 1 HP, 2 HP and 3 HP, respectively. The processed faulty bearings were re-loaded into the test motor, and the vibration acceleration signal data were recorded under the working conditions of 0, 1, 2, and 3 horsepower motor load, respectively. To meet the requirements of cross-working condition diagnosis in this paper, bearing fault signals at the driving end are selected at a sampling frequency of 12 KHZ. This paper divides faults into ten categories according to the damaged parts and fault diameters of bearings, as shown in Table 1, which lists the fault data at each speed. The original data adopt the continuous wavelet transform method in Step 1 to obtain 10 types of fault pictures at each rate, 200 photos of each type, and a total of 8000 images. After the pictures in Step 2 and Step 3 are generated, each class selects 100 samples with high reliability to join as a new training set to train the network. Take 0 HP as an example; Table 1 compares the generated false sample image and the real sample image. The accuracy of fault identification is obtained by inputting the source and target domain into the migration network simultaneously through Step 4.

Experimental Verification under Cross-Working Conditions
Based on Experiment 1 above, to simulate the need for fault diagnosis in real industrial scenes, the experiment will verify the proposed method in two stages on the premise of cross-working conditions. In Experiment 1, four different rotational speeds were used as transfer learning conditions under cross-working conditions. For example, 0-1 indicates that 0 HP is the source domain and 1 HP is the target domain for training. A total of 12 migration schemes were set up in the experiment, and the control groups were DDC, Deepcoral [40], and DAN [41]. Under the same experimental conditions, Table 2 shows the experimental results. the driving end bearing is SKF6205. The sampling frequency is 12 KHZ and 48 KHZ. The experiment has four rotating speeds: 1797 r/min, 1772 r/min, 1750 r/min, and 1730 r/min. The corresponding loads are 0 HP, 1 HP, 2 HP and 3 HP, respectively. The processed faulty bearings were re-loaded into the test motor, and the vibration acceleration signal data were recorded under the working conditions of 0, 1, 2, and 3 horsepower motor load, respectively. To meet the requirements of cross-working condition diagnosis in this paper, bearing fault signals at the driving end are selected at a sampling frequency of 12 KHZ. This paper divides faults into ten categories according to the damaged parts and fault diameters of bearings, as shown in Table 1, which lists the fault data at each speed. The original data adopt the continuous wavelet transform method in Step 1 to obtain 10 types of fault pictures at each rate, 200 photos of each type, and a total of 8000 images. After the pictures in Step 2 and Step 3 are generated, each class selects 100 samples with high reliability to join as a new training set to train the network. Take 0 HP as an example; Table 1 compares the generated false sample image and the real sample image. 
The accuracy of fault identification is obtained by inputting the source and target domain into the migration network simultaneously through Step 4.

Experimental Verification under Cross-Working Conditions
Based on Experiment 1 above, to simulate the need for fault diagnosis in real industrial scenes, the experiment will verify the proposed method in two stages on the premise of cross-working conditions. In Experiment 1, four different rotational speeds were used as transfer learning conditions under cross-working conditions. For example, 0-1 indicates that 0 HP is the source domain and 1 HP is the target domain for training. A total of 12 migration schemes were set up in the experiment, and the control groups were DDC, Deepcoral [40], and DAN [41]. Under the same experimental conditions, Table 2 shows the experimental results. the driving end bearing is SKF6205. The sampling frequency is 12 KHZ and 48 KHZ. The experiment has four rotating speeds: 1797 r/min, 1772 r/min, 1750 r/min, and 1730 r/min. The corresponding loads are 0 HP, 1 HP, 2 HP and 3 HP, respectively. The processed faulty bearings were re-loaded into the test motor, and the vibration acceleration signal data were recorded under the working conditions of 0, 1, 2, and 3 horsepower motor load, respectively. To meet the requirements of cross-working condition diagnosis in this paper, bearing fault signals at the driving end are selected at a sampling frequency of 12 KHZ. This paper divides faults into ten categories according to the damaged parts and fault diameters of bearings, as shown in Table 1, which lists the fault data at each speed. The original data adopt the continuous wavelet transform method in Step 1 to obtain 10 types of fault pictures at each rate, 200 photos of each type, and a total of 8000 images. After the pictures in Step 2 and Step 3 are generated, each class selects 100 samples with high reliability to join as a new training set to train the network. Take 0 HP as an example; Table 1 compares the generated false sample image and the real sample image. 
The accuracy of fault identification is obtained by inputting the source and target domain into the migration network simultaneously through Step 4.

Experimental Verification under Cross-Working Conditions
Based on Experiment 1 above, to simulate the need for fault diagnosis in real industrial scenes, the experiment will verify the proposed method in two stages on the premise of cross-working conditions. In Experiment 1, four different rotational speeds were used as transfer learning conditions under cross-working conditions. For example, 0-1 indicates that 0 HP is the source domain and 1 HP is the target domain for training. A total of 12 migration schemes were set up in the experiment, and the control groups were DDC, Deepcoral [40], and DAN [41]. Under the same experimental conditions, Table 2 shows the experimental results. the driving end bearing is SKF6205. The sampling frequency is 12 KHZ and 48 KHZ. The experiment has four rotating speeds: 1797 r/min, 1772 r/min, 1750 r/min, and 1730 r/min. The corresponding loads are 0 HP, 1 HP, 2 HP and 3 HP, respectively. The processed faulty bearings were re-loaded into the test motor, and the vibration acceleration signal data were recorded under the working conditions of 0, 1, 2, and 3 horsepower motor load, respectively. To meet the requirements of cross-working condition diagnosis in this paper, bearing fault signals at the driving end are selected at a sampling frequency of 12 KHZ. This paper divides faults into ten categories according to the damaged parts and fault diameters of bearings, as shown in Table 1, which lists the fault data at each speed. The original data adopt the continuous wavelet transform method in Step 1 to obtain 10 types of fault pictures at each rate, 200 photos of each type, and a total of 8000 images. After the pictures in Step 2 and Step 3 are generated, each class selects 100 samples with high reliability to join as a new training set to train the network. Take 0 HP as an example; Table 1 compares the generated false sample image and the real sample image. 
The accuracy of fault identification is obtained by inputting the source and target domain into the migration network simultaneously through Step 4.

Experimental Verification under Cross-Working Conditions
Based on Experiment 1 above, to simulate the need for fault diagnosis in real industrial scenes, the experiment will verify the proposed method in two stages on the premise of cross-working conditions. In Experiment 1, four different rotational speeds were used as transfer learning conditions under cross-working conditions. For example, 0-1 indicates that 0 HP is the source domain and 1 HP is the target domain for training. A total of 12 migration schemes were set up in the experiment, and the control groups were DDC, Deepcoral [40], and DAN [41]. Under the same experimental conditions, Table 2 shows the experimental results. the driving end bearing is SKF6205. The sampling frequency is 12 KHZ and 48 KHZ. The experiment has four rotating speeds: 1797 r/min, 1772 r/min, 1750 r/min, and 1730 r/min. The corresponding loads are 0 HP, 1 HP, 2 HP and 3 HP, respectively. The processed faulty bearings were re-loaded into the test motor, and the vibration acceleration signal data were recorded under the working conditions of 0, 1, 2, and 3 horsepower motor load, respectively. To meet the requirements of cross-working condition diagnosis in this paper, bearing fault signals at the driving end are selected at a sampling frequency of 12 KHZ. This paper divides faults into ten categories according to the damaged parts and fault diameters of bearings, as shown in Table 1, which lists the fault data at each speed. The original data adopt the continuous wavelet transform method in Step 1 to obtain 10 types of fault pictures at each rate, 200 photos of each type, and a total of 8000 images. After the pictures in Step 2 and Step 3 are generated, each class selects 100 samples with high reliability to join as a new training set to train the network. Take 0 HP as an example; Table 1 compares the generated false sample image and the real sample image. 
The accuracy of fault identification is obtained by inputting the source and target domain into the migration network simultaneously through Step 4.

Experimental Verification under Cross-Working Conditions
Based on Experiment 1 above, to simulate the need for fault diagnosis in real industrial scenes, the experiment will verify the proposed method in two stages on the premise of cross-working conditions. In Experiment 1, four different rotational speeds were used as transfer learning conditions under cross-working conditions. For example, 0-1 indicates that 0 HP is the source domain and 1 HP is the target domain for training. A total of 12 migration schemes were set up in the experiment, and the control groups were DDC, Deepcoral [40], and DAN [41]. Under the same experimental conditions, Table 2 shows the experimental results. Real sample images the driving end bearing is SKF6205. The sampling frequency is 12 KHZ and 48 KHZ. The experiment has four rotating speeds: 1797 r/min, 1772 r/min, 1750 r/min, and 1730 r/min. The corresponding loads are 0 HP, 1 HP, 2 HP and 3 HP, respectively. The processed faulty bearings were re-loaded into the test motor, and the vibration acceleration signal data were recorded under the working conditions of 0, 1, 2, and 3 horsepower motor load, respectively. To meet the requirements of cross-working condition diagnosis in this paper, bearing fault signals at the driving end are selected at a sampling frequency of 12 KHZ. This paper divides faults into ten categories according to the damaged parts and fault diameters of bearings, as shown in Table 1, which lists the fault data at each speed. The original data adopt the continuous wavelet transform method in Step 1 to obtain 10 types of fault pictures at each rate, 200 photos of each type, and a total of 8000 images. After the pictures in Step 2 and Step 3 are generated, each class selects 100 samples with high reliability to join as a new training set to train the network. Take 0 HP as an example; Table 1 compares the generated false sample image and the real sample image. 
The accuracy of fault identification is obtained by inputting the source and target domain into the migration network simultaneously through Step 4.

Experimental Verification under Cross-Working Conditions
Based on Experiment 1 above, to simulate the need for fault diagnosis in real industrial scenes, the experiment will verify the proposed method in two stages on the premise of cross-working conditions. In Experiment 1, four different rotational speeds were used as transfer learning conditions under cross-working conditions. For example, 0-1 indicates that 0 HP is the source domain and 1 HP is the target domain for training. A total of 12 migration schemes were set up in the experiment, and the control groups were DDC, Deepcoral [40], and DAN [41]. Under the same experimental conditions, Table 2 shows the experimental results. the driving end bearing is SKF6205. The sampling frequency is 12 KHZ and 48 KHZ. The experiment has four rotating speeds: 1797 r/min, 1772 r/min, 1750 r/min, and 1730 r/min. The corresponding loads are 0 HP, 1 HP, 2 HP and 3 HP, respectively. The processed faulty bearings were re-loaded into the test motor, and the vibration acceleration signal data were recorded under the working conditions of 0, 1, 2, and 3 horsepower motor load, respectively. To meet the requirements of cross-working condition diagnosis in this paper, bearing fault signals at the driving end are selected at a sampling frequency of 12 KHZ. This paper divides faults into ten categories according to the damaged parts and fault diameters of bearings, as shown in Table 1, which lists the fault data at each speed. The original data adopt the continuous wavelet transform method in Step 1 to obtain 10 types of fault pictures at each rate, 200 photos of each type, and a total of 8000 images. After the pictures in Step 2 and Step 3 are generated, each class selects 100 samples with high reliability to join as a new training set to train the network. Take 0 HP as an example; Table 1 compares the generated false sample image and the real sample image. 
The accuracy of fault identification is obtained by inputting the source and target domain into the migration network simultaneously through Step 4.

Experimental Verification under Cross-Working Conditions
Based on Experiment 1 above, to simulate the need for fault diagnosis in real industrial scenes, the experiment will verify the proposed method in two stages on the premise of cross-working conditions. In Experiment 1, four different rotational speeds were used as transfer learning conditions under cross-working conditions. For example, 0-1 indicates that 0 HP is the source domain and 1 HP is the target domain for training. A total of 12 migration schemes were set up in the experiment, and the control groups were DDC, Deepcoral [40], and DAN [41]. Under the same experimental conditions, Table 2 shows the experimental results. the driving end bearing is SKF6205. The sampling frequency is 12 KHZ and 48 KHZ. The experiment has four rotating speeds: 1797 r/min, 1772 r/min, 1750 r/min, and 1730 r/min. The corresponding loads are 0 HP, 1 HP, 2 HP and 3 HP, respectively. The processed faulty bearings were re-loaded into the test motor, and the vibration acceleration signal data were recorded under the working conditions of 0, 1, 2, and 3 horsepower motor load, respectively. To meet the requirements of cross-working condition diagnosis in this paper, bearing fault signals at the driving end are selected at a sampling frequency of 12 KHZ. This paper divides faults into ten categories according to the damaged parts and fault diameters of bearings, as shown in Table 1, which lists the fault data at each speed. The original data adopt the continuous wavelet transform method in Step 1 to obtain 10 types of fault pictures at each rate, 200 photos of each type, and a total of 8000 images. After the pictures in Step 2 and Step 3 are generated, each class selects 100 samples with high reliability to join as a new training set to train the network. Take 0 HP as an example; Table 1 compares the generated false sample image and the real sample image. 
To verify that the method remains effective when the fault sample data are imbalanced, Experiment 2 is added. First, the three groups with the highest accuracy in Experiment 1 were selected: the 0-2, 1-2, and 3-2 migrations. The average accuracies of these three transfer experiments across the four algorithms are 84.7%, 85.09%, and 84.76%, respectively. Second, the samples of three of the ten fault types are randomly reduced to 10% to simulate imbalanced fault data. The proposed algorithm generates the missing 90% as synthetic samples through the WGAN-GP network of Step 2 and adds them to the source domain data set to keep it balanced. Finally, the method is compared with the other three algorithms under the same conditions; the accuracy results are shown in Table 3.
In real industrial scenes, in addition to imbalanced data volumes, data collection is also subject to noise interference. Experiment 3 was therefore set up to verify the robustness of the proposed rolling bearing fault diagnosis method. Building on Experiment 2, noise at SNRs of 5 dB and 10 dB was added to the target domain with the imbalanced fault samples. The smaller the SNR, the stronger the noise. Under the same experimental conditions, the comparison of the algorithms is shown in Table 4.
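As an illustration of the noise injection in Experiment 3, the sketch below adds noise to a signal at a prescribed SNR. The paper does not state the noise type, so zero-mean white Gaussian noise is assumed here; the function names and the synthetic sine test signal are likewise hypothetical. The noise power is set from the definition SNR(dB) = 10 log10(P_signal / P_noise).

```python
import math
import random


def add_noise(signal, snr_db, rng):
    """Add zero-mean Gaussian noise whose expected power gives the requested SNR (dB)."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR (dB) actually realised by comparing clean and noisy signals."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


rng = random.Random(0)
# Stand-in for one second of vibration data sampled at 12 kHz (assumed test signal).
clean = [math.sin(2 * math.pi * 50 * t / 12000) for t in range(12000)]

for snr in (5, 10):
    noisy = add_noise(clean, snr, rng)
    print(snr, round(measured_snr_db(clean, noisy), 2))
```

Because the noise is random, the measured SNR only approximates the target value; the approximation tightens as the signal gets longer.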

Analysis of Experimental Results
As shown in Table 2, the average accuracy of the proposed method over the 12 migration schemes is 96.58%, higher than the 65.36% of DDC, the 66.34% of Deepcoral, and the 94.37% of DAN. This indicates that, after the source domain samples are generated and expanded by this method, training on the new samples yields a better and more accurate network model. In Experiment 1, the three migration schemes whose target domain is 2 HP (0 HP-2 HP, 1 HP-2 HP, and 3 HP-2 HP) have the three highest average accuracies, indicating that the fault features of the 12 kHz drive-end bearing are extracted best under a load of 2 HP. Table 3 shows that when the fault sample data are imbalanced, the diagnostic accuracy of the other methods decreases by more than 10% on average, indicating that imbalanced data degrade the fault diagnosis performance of the other models. The average accuracy of the proposed method is still 99.60%, a gap of only one sample compared with the 99.65% obtained when the data are balanced. The method therefore maintains good accuracy and reliability given the imbalanced data found in real industrial situations.
In Experiment 3, noise at different SNRs was added to the target domain on the basis of Experiment 2 to simulate data collection problems that may arise in real industrial conditions. Under the different noise levels, DDC, Deepcoral, and DAN showed large differences in accuracy. The average classification accuracy of DDC and Deepcoral at SNR = 5 dB is higher than at SNR = 10 dB, while DAN shows the opposite. In contrast, the average classification accuracy of the proposed algorithm remains above 99% in the complex environment with imbalanced data and two different SNRs, which verifies the robustness of the proposed algorithm and suggests that it can address practical industrial problems.

Conclusions
The method proposed in this paper addresses three problems found in real industrial situations and is verified on the bearing data set of Case Western Reserve University. The following conclusions can be drawn:
(1) The data generated by WGAN-GP are screened a second time by Resnet and then added to the source domain data to form the new source domain data, which can solve the problem of imbalanced fault data in industry. In addition, when the data are balanced, it can still improve detection accuracy.
(2) In real industrial situations, the fault data to be detected are typically unlabeled. Resnet50 is used as the backbone network, and MK-MMD is added as the adaptation layer of the network so that the source domain and target domain data can share parameters. This effectively enables the detection of unlabeled data.
(3) Fault data under different working conditions have different fault characteristics, so multiple network models would otherwise be needed to detect data under different working conditions. With the transfer learning method, fault data in any target domain can be predicted with high accuracy from the fault data of other source domains.
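To make the role of MK-MMD in conclusion (2) more concrete, the following is a minimal sketch of a biased estimate of the squared MK-MMD between two batches of feature vectors. It is not the authors' implementation: the equal kernel weights, the Gaussian bandwidths, and the toy feature vectors are all assumptions chosen for illustration, and plain Python is used for readability where a tensor library would normally be used.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * bandwidth ** 2))


def multi_kernel(x, y, bandwidths):
    """Equal-weight average of Gaussian kernels (the weights are an assumption)."""
    return sum(gaussian_kernel(x, y, b) for b in bandwidths) / len(bandwidths)


def mk_mmd2(source, target, bandwidths=(0.5, 1.0, 2.0, 4.0)):
    """Biased estimate of the squared MK-MMD between two feature batches."""
    def mean_kernel(a, b):
        total = sum(multi_kernel(x, y, bandwidths) for x in a for y in b)
        return total / (len(a) * len(b))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2 * mean_kernel(source, target))


# Toy 2-D features: 'near' resembles the source batch, 'far' does not.
src = [[0.0, 0.1], [0.2, -0.1], [0.1, 0.0]]
near = [[0.1, 0.1], [0.0, -0.1], [0.2, 0.0]]
far = [[3.0, 3.1], [3.2, 2.9], [3.1, 3.0]]

print(mk_mmd2(src, near), mk_mmd2(src, far))
```

Used as a loss term on the features of an adaptation layer, a smaller value indicates that the source and target feature distributions are closer, which is the quantity the adaptation layer is trained to reduce.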