Multi-Domain Weighted Transfer Adversarial Network for the Cross-Domain Intelligent Fault Diagnosis of Bearings

Abstract: Transfer learning has attracted attention for the intelligent fault diagnosis of bearings because it can handle bearing datasets that have different distributions. However, traditional intelligent fault diagnosis methods based on transfer learning have two shortcomings: (1) the multi-mode structure characteristics of bearing datasets are neglected, and (2) some local regions of the bearing signals may not be suitable for transfer due to signal fluctuation. Therefore, a multi-domain weighted transfer adversarial network is proposed for the cross-domain intelligent fault diagnosis of bearings. In the proposed method, multi-domain adversarial and attention weighting modules are designed to account for the multi-mode structure characteristics of bearing data and to mitigate the influence of local non-transferable regions of signals, respectively. Two diagnosis cases are used to verify the proposed method. The results show that the proposed method is able to extract domain-invariant features for different cross-domain diagnosis cases, and thus improves the accuracy of fault identification.


Introduction
Transfer learning aims to use existing knowledge to solve problems in related but different domains, relaxing the constraint that the training and testing sets of intelligent models must obey the same probability distribution [1][2][3]. Owing to this advantage, researchers have introduced it into the field of the intelligent fault diagnosis of bearings and have proposed several transfer intelligent fault diagnosis methods [4][5][6][7][8]. These methods regard different bearing datasets as different domains, and the fault knowledge learned from one dataset in the source domain can be applied to another dataset in the target domain by using domain adaptation, which overcomes the limitations of traditional intelligent fault diagnosis methods based on deep learning [9][10][11][12][13]. For example, Du et al. [14] presented a hybrid transfer method, which achieved the transfer of fault knowledge between different domains by the transfer component analysis method and the TrAdaBoost algorithm. Yang et al. [15] established a transfer fault diagnosis method based on a polynomial kernel-induced maximum mean discrepancy, which solved the problem of transfer fault diagnosis between different bearings. Guo et al. [16] proposed a deep transfer learning network based on a one-dimensional deep convolutional neural network. Zou et al. [17] established a novel transfer fault diagnosis method for bearings under different working conditions, which combined deep convolution Wasserstein adversarial networks and a novel variance constraint to achieve a higher fault diagnosis accuracy than the existing models.
The main goal of this paper is to improve the diagnosis accuracy of transfer learning-based bearing fault diagnosis. Traditional intelligent fault diagnosis methods based on transfer learning have two disadvantages. On the one hand, the multi-mode structure characteristics of bearing datasets are neglected. On the other hand, some local regions of the bearing signals may not be suitable for transfer due to signal fluctuation. The existing intelligent diagnosis methods do not consider these characteristics in the process of domain adaptation, and simply align the learned features by directly measuring the overall distributions of the source and target domains [18][19][20][21][22]. This may lead to a misalignment between bearing datasets that have a multi-mode structure [23]. In addition, some local regions of the signals in the bearing datasets may be coupled with useless interference information due to background noise during collection, which makes those local regions unsuitable for transfer [24][25][26][27][28][29].
In order to address the above shortcomings, this paper takes on the following two tasks: (1) to consider the multi-mode structure characteristics of bearing datasets and (2) to mitigate the influence of local non-transferable regions of signals. Based on these two tasks, a multi-domain weighted transfer adversarial network (MWTA) is proposed for cross-domain intelligent fault diagnosis by considering the bearing multi-mode data structure and the local non-transferability of the bearing signals. In the proposed MWTA, firstly, the feature extraction module is designed based on a convolutional neural network and a non-local network. This module aims to map the bearing data to a high-dimensional feature space to extract high-dimensional feature information. Secondly, the multi-domain adversarial module is established by considering the multi-mode structure characteristics of the bearing data. By capturing such multi-mode structure information, this module achieves adversarial transfer training for the different health states of the data by using different channels. Thirdly, the attention weighted module is designed by considering the existence of local non-transferable regions in the bearing signals. This module captures the attention value of each local region to represent the transferability of the region and applies less attention to non-transferable regions to reduce their influence. Finally, a fault classification module is designed to identify the health states of the bearings so as to transfer fault diagnosis knowledge between two different bearing datasets. A diagnosis case for different working conditions and a diagnosis case for different bearings are used to verify the proposed method.
The rest of this paper is organized as follows. In Section 2, the theoretical background is described briefly, including the problem description, the domain adaptation method and the non-local network. In Section 3, the proposed multi-domain weighted transfer adversarial network is introduced, including the feature extraction module, multi-domain adversarial module, attention weighted module and fault classification module. In Section 4, the proposed method is verified through a diagnosis case for different working conditions and a diagnosis case for different bearings. Finally, conclusions are drawn in Section 5.

Problem Description
Transfer intelligent fault diagnosis of bearings aims to learn fault knowledge from a labeled training dataset and apply it to the fault identification of an unlabeled testing dataset [30][31][32]. In this paper, the labeled training dataset was defined as the source domain dataset and expressed as $D_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$, where $x_i^s$ represents the i-th sample of the source domain dataset, each sample being a bearing vibration signal; $y_i^s$ represents the health state label of the i-th sample; and $n_s$ represents the number of source domain samples. The unlabeled testing dataset was defined as the target domain dataset and expressed as $D_t = \{(x_j^t, y_j^t)\}_{j=1}^{n_t}$, where $x_j^t$ represents the j-th sample of the target domain dataset; $y_j^t$ represents the health state label of the j-th sample; and $n_t$ represents the number of target domain samples. In the field of transfer intelligent fault diagnosis of bearings, the source domain and target domain datasets have different probability distributions. Let the probability distributions of the source domain and target domain datasets be expressed as $P(x^s)$ and $P(x^t)$, respectively, with $P(x^s) \neq P(x^t)$ [33].

Domain Adaptation Method
The essential purpose of transfer intelligent fault diagnosis is to reduce the probability distribution difference between the source domain and target domain datasets using domain adaptation methods, of which domain adversarial learning is one. A representative work is the domain adversarial neural network (DANN) proposed by Ghifary et al. [34]. Its structure is as follows.
DANN mainly includes three parts: a feature extractor $G_f$, a domain discriminator $G_d$ and a fault classifier $G_y$. Firstly, the domain discriminator $G_d$ is used to identify whether the extracted features belong to the source or target domain. Then, the feature extractor $G_f$ is fine-tuned according to the loss of the domain discriminator in order to confuse the domain discriminator. Finally, the features extracted from the other dataset can be identified by the fault classifier. The loss function of DANN is as follows:
Training with the above method reduces the probability distribution difference between the source domain and target domain datasets, so that a transfer intelligent fault diagnosis method trained on the source domain dataset can identify the health states of the target domain dataset.
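As a rough illustration of how the three parts interact, the sketch below evaluates an objective of the form commonly used for domain adversarial training: a classification loss on labeled source samples minus a weighted domain discrimination loss on source and target samples together. The function name `dann_objective`, the trade-off argument `lam` and the use of cross-entropy for both terms are assumptions made for this example and are not confirmed by the text.

```python
import math

def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(max(probs[label], 1e-12))

def dann_objective(class_probs, class_labels, domain_probs, domain_labels, lam=1.0):
    """Assumed form: mean classification loss minus lam * mean domain loss.

    class_probs / class_labels: fault classifier outputs and labels (source samples only).
    domain_probs / domain_labels: discriminator outputs and domain labels
    (0 = source, 1 = target) for source and target samples together.
    """
    cls = sum(cross_entropy(p, y) for p, y in zip(class_probs, class_labels)) / len(class_labels)
    dom = sum(cross_entropy(p, d) for p, d in zip(domain_probs, domain_labels)) / len(domain_labels)
    return cls - lam * dom, cls, dom

# Toy values: two source samples for classification, four samples for domain discrimination.
class_probs = [[0.9, 0.1], [0.2, 0.8]]
class_labels = [0, 1]
domain_probs = [[0.5, 0.5]] * 4          # a fully confused discriminator
domain_labels = [0, 0, 1, 1]
total, cls_loss, dom_loss = dann_objective(class_probs, class_labels,
                                           domain_probs, domain_labels, lam=1.0)
```

Under this assumed form, the feature extractor and fault classifier are trained to lower the returned total while the domain discriminator is trained to raise it, which corresponds to the confusion step described above.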

Non-Local Network
A convolutional neural network can only extract the local region information of a bearing sample step by step, which means that the correlation information between local regions that are far apart cannot be obtained. To solve this problem, Wang et al. [35] proposed the non-local network, which can obtain the long-range correlation information of bearing samples by adding local region similarity information to the input features. Consequently, when a noise signal is mixed into the bearing signal, the small similarity between the noise signal and the other signals can be used to restrain its influence on the network training.
The non-local network is mainly composed of three 1 × 1 convolution operators. Its main steps are as follows. (1) The long-range correlation information of the features extracted from the samples is calculated. (2) The long-range correlation information is integrated into the features and the output dimension is adjusted to ensure that the input and output dimensions are consistent. The flowchart of the non-local network is shown in Figure 1. Firstly, the long-range dependence information of the features is calculated by the convolution operators $\theta(\cdot)$ and $\varphi(\cdot)$. The equation is expressed as follows: where $\theta(\cdot)$ and $\varphi(\cdot)$ are two 1 × 1 convolution operations; $w_\theta$ and $w_\varphi$ represent the parameters to be optimized in $\theta(\cdot)$ and $\varphi(\cdot)$, respectively; and $x_u$ represents the input features of the non-local network. Secondly, the calculated long-range correlation information is normalized and integrated into the features extracted from the samples. The features with the long-range correlation information are obtained by Equation (4): where $g(\cdot)$ is a 1 × 1 convolution operation; $C(x_u)$ represents the normalization coefficient, whose size depends on the length of the input features; and $\mathrm{softmax}(\cdot)$ represents the softmax function.
Finally, the idea of the residual network is used to ensure that the output dimension is consistent with the input dimension and that the input feature information is not lost. Therefore, the non-local network can be inserted after any convolutional layer. The above method can be expressed as the following equation: where $w$ represents the parameter to be optimized.
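The three steps above (pairwise similarity, normalization and aggregation, residual output) can be sketched for a single-channel 1-D feature sequence as follows. This is a simplified illustration under stated assumptions: with one channel each 1 × 1 convolution reduces to a scalar multiplication, the normalization is taken to be a softmax over positions, and the names `non_local_1d`, `w_theta`, `w_phi`, `w_g` and `w_out` are introduced only for this example.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def non_local_1d(x, w_theta=1.0, w_phi=1.0, w_g=1.0, w_out=1.0):
    """Single-channel non-local operation on a 1-D feature sequence x.

    Each 1 x 1 convolution reduces to a scalar multiplication here because
    only one channel is modelled (a simplification for illustration).
    """
    theta = [w_theta * v for v in x]
    phi = [w_phi * v for v in x]
    g = [w_g * v for v in x]
    out = []
    for i in range(len(x)):
        # Similarity of position i to every position j, normalised with softmax.
        attention = softmax([theta[i] * phi[j] for j in range(len(x))])
        y_i = sum(a * g_j for a, g_j in zip(attention, g))
        # Residual connection keeps the input information and the input length.
        out.append(w_out * y_i + x[i])
    return out

features = [0.5, -1.0, 2.0, 0.0]
z = non_local_1d(features, w_out=0.5)
```

Because of the residual term, the output always has the same length as the input, and setting `w_out` to zero returns the input unchanged, which is why such a block can be placed after any convolutional layer.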

The Proposed Method
A new method, the multi-domain weighted transfer adversarial network (MWTA), is proposed for the cross-domain intelligent fault diagnosis of bearings. The proposed MWTA mainly includes four parts: a feature extraction module, a multi-domain adversarial module, an attention weighted module and a fault classification module. The feature extraction module was used to extract the fault features of the dataset. The multi-domain adversarial module was used to handle the multi-mode structure of the dataset. The attention weighted module was used to handle the local non-transferability of bearing samples. Finally, the fault classification module was optimized to identify the fault states of the bearing dataset. The structure of the MWTA is shown in Figure 2.

Feature Extraction Module
The feature extraction module, which is composed of a convolutional neural network and a non-local network, was mainly used to project the bearing dataset into a high-dimensional feature space. Firstly, the features were extracted by the convolutional neural network. Secondly, the long-range correlation information was integrated into the features by the non-local network. In this way, the connectivity between the different regions of the original data was preserved, which enhances the ability of fault feature extraction.
In this paper, the convolutional neural network was mainly composed of two convolutional layers, one pooling layer, two Rectified Linear Unit (ReLU) layers and one dropout layer. Let the convolutional neural network be expressed as $G_{fc}$. Therefore, the extracted features can be expressed as: where $x_m$ represents the m-th sample in the union of the source and target domains; $F(\cdot)$ represents the mapping function of the l-th layer; and $w_l$ represents the parameters of the l-th layer to be optimized.
Then, the above features were input into the non-local network. Let the non-local network be expressed as $G_{fn}$. According to Equations (4) and (5), the features with long-range correlation information can be obtained by Equation (7).

Multi-Domain Adversarial Module
The collected source domain and target domain datasets contain a variety of different health states, so the datasets exhibit a multi-modal structure. However, the existing transfer fault diagnosis methods directly align the source domain and target domain features without considering this multi-mode structure. When the source domain and target domain data are taken as input for domain adaptation, source domain features that carry information about one health state could be matched with target domain features that carry information about another health state. Such wrong matching may cause a method to learn wrong fault information during domain adaptation and reduce the accuracy of fault identification. In order to solve this problem, the multi-domain adversarial module was designed. This module was divided into three steps: (1) the health state type of the features was predicted to capture the bearing multi-mode structure; (2) multiple domain adaptation networks were established so that features with different health state types could undergo domain adaptation separately; and (3) domain adaptation was performed on each domain adaptation network to reduce the probability distribution difference between the source domain and target domain features.
Firstly, the health state type of the features was predicted to capture the bearing multi-mode structure. Let the fault classifier be $G_y$, whose specific structure is described in Section 3.4. Furthermore, the feature $x_m^{fn}$ obtained by Equation (7) was taken as the input feature, and it was assumed that there is a total of k types of bearing health states. Thus, the output vector of $G_y$ can be expressed as: where $\hat{P}_k$ represents the predicted probability of the m-th feature on the k-th health state type. Furthermore, the predicted probability $\hat{P}_k$ was used as a category weight and assigned to the feature $x_m^{fn}$; thus, the multi-modal structure of the features was captured. Therefore, the weighted features can be expressed as:
Secondly, k domain adaptation networks were established based on the total number k of bearing health state types. Then, the features $x_{m,k}^{fn}$, obtained in Equation (9), were input into each domain adaptation network. When the health state information of the source domain and target domain features was consistent, they were assigned to the same domain adaptation network by means of the predicted probability $\hat{P}_k$, thereby addressing the multi-mode structure of the datasets.
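The category weighting described in these two steps can be sketched as follows. The sketch assumes that the weighted feature for the k-th domain adaptation network is simply the feature scaled by the predicted probability of the k-th health state; the function name `weight_by_class_probability` and the toy values are hypothetical.

```python
def weight_by_class_probability(feature, class_probs):
    """Return one weighted copy of `feature` per health state.

    feature: list of floats (the extracted feature of one sample).
    class_probs: predicted probabilities over the k health states.
    """
    return [[p * value for value in feature] for p in class_probs]

feature = [1.0, -2.0, 0.5]
class_probs = [0.7, 0.2, 0.1]           # hypothetical classifier output for k = 3
weighted = weight_by_class_probability(feature, class_probs)
```

With this form, a sample that the classifier assigns mostly to one health state contributes mainly to that state's domain adaptation network, while the k weighted copies still sum to the original feature.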
Thirdly, based on the features $x_{m,k}^{fn}$ obtained in the second step, domain adaptation was performed in each domain adaptation network by domain adversarial methods; thus, the probability distribution difference between the source domain and target domain features was reduced. Let the domain discriminator on each domain adaptation network be represented as $G_{dc}$, where the domain discriminator $G_{dc}$ is composed of two fully connected layers, one ReLU layer and one SoftMax layer. The loss error $\mathcal{L}_{dc}^{k}$ of the domain discriminator on the k-th domain adaptation network is expressed as follows: where $L_{dc}^{k}$ represents the loss function of the domain discriminator on the k-th domain adaptation network; $\hat{d}_m$ represents the predicted domain label vector of the m-th sample; and $d_m$ represents the real domain label vector of the m-th sample.
The k domain adaptation networks are trained together. Therefore, the total loss error $\mathcal{L}_{dc}$ of the multi-domain adversarial module can be expressed as:
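A minimal sketch of this aggregation is given below. It assumes that each branch uses a cross-entropy loss averaged over its samples and that the k branch losses are simply summed; the exact normalization used in the method is not confirmed here, and the function names are hypothetical.

```python
import math

def domain_loss(pred_probs, domain_labels):
    """Mean cross-entropy between predicted and real domain labels for one branch."""
    losses = [-math.log(max(p[d], 1e-12)) for p, d in zip(pred_probs, domain_labels)]
    return sum(losses) / len(losses)

def multi_domain_loss(per_branch_probs, domain_labels):
    """Assumed aggregation: sum of the k per-branch losses."""
    return sum(domain_loss(probs, domain_labels) for probs in per_branch_probs)

# Two branches (k = 2), two samples each; labels: 0 = source, 1 = target.
domain_labels = [0, 1]
per_branch_probs = [
    [[0.5, 0.5], [0.5, 0.5]],   # branch 1: discriminator fully confused
    [[0.5, 0.5], [0.5, 0.5]],   # branch 2: discriminator fully confused
]
total_dc = multi_domain_loss(per_branch_probs, domain_labels)
```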

Attention Weighted Module
A bearing sample is a vibration signal and is generally composed of hundreds of continuous sampling points. However, different regions of a sample may have different transferability due to factors such as working conditions and noise. Regions that are not suitable for transfer can hinder the reduction of the probability distribution difference between the source domain and target domain features, resulting in a lower accuracy of fault identification. Therefore, a natural idea is to increase the weight of the regions of a sample that are suitable for transfer and reduce the weight of the regions that are not. The attention weighted module was designed based on this idea. It included three steps: (1) features were extracted from the samples to facilitate transferability measurement; (2) the extracted features were divided into v small features of the same length and the transferability of these v small features was measured; and (3) the transferability values obtained in the second step were used as weights and assigned to the features obtained in the first step.
Firstly, the multi-modal structure of the dataset was handled by the multi-domain adversarial module. Therefore, a reasonable choice is to use the features $x_{m,k}^{fn}$ of the k-th network as the input features of the attention weighted module. It is worth mentioning that an attention weighted module was added to each domain adaptation network to ensure that every feature entering the model was subjected to the feature weighting operation.
Secondly, the features $x_{m,k}^{fn}$ of the k-th domain adaptation network are equally divided into v small features (the size of v is explained in Section 4.1). The v small features are input into the transferability discriminator $G_{df}$ to measure their transferability, where the network structure and function of the transferability discriminator $G_{df}$ and the domain discriminator $G_{dc}$ are the same. In this paper, two different symbols were used to name them in order to improve readability in the subsequent loss error calculations. The output of the transferability discriminator can be expressed as: where $\hat{d}_{m,k}^{v}$ represents the probability vector of the domain prediction of the v-th small feature on the k-th domain adaptation network, and $x_{m,k}^{fn,v}$ represents the v-th of the v small features on the k-th domain adaptation network. In information theory, uncertainty can be measured by an entropy function. Therefore, the weight of the v-th small feature can be generated according to the entropy function, where the size of the weight represents the transferability of the v-th small feature. This weight is calculated as shown in Equation (13): where $W_{m,k}^{v}$ represents the weight of the v-th small feature of the m-th feature on the k-th domain adaptation network.
Thirdly, the weights obtained in Equation (13) were assigned to the corresponding regions to obtain new weighted features $x_{m,k}^{fn,v}$: The transferability discriminator $G_{df}$ was optimized during training to obtain more accurate weights $W_{m,k}^{v}$. The loss error $\mathcal{L}_{df,k}$ of the transferability discriminator $G_{df}$ on the k-th domain adaptation network is as follows: where $L_{df,k}^{v}$ represents the loss function of the transferability discriminator $G_{df}$ on the v-th small feature on the k-th domain adaptation network; $\hat{d}_{m,k}^{v}$ represents the predicted domain label vector of the v-th small feature on the k-th domain adaptation network; and $d_{m,k}^{v}$ represents the real domain label vector of the v-th small feature on the k-th domain adaptation network.
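The region splitting and entropy-based weighting described above can be sketched as follows. Two points are assumptions rather than details taken from the text: the mapping from entropy to weight is taken to be w = 1 - H(d_hat), and the weight is applied by plain multiplication. The function names and the discriminator outputs are hypothetical.

```python
import math

def split_regions(feature, v):
    """Split a feature vector into v contiguous regions of equal length."""
    if len(feature) % v != 0:
        raise ValueError("feature length must be divisible by v")
    size = len(feature) // v
    return [feature[i * size:(i + 1) * size] for i in range(v)]

def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def region_weights(domain_probs):
    """Assumed mapping from entropy to weight: w = 1 - H(d_hat)."""
    return [1.0 - entropy(p) for p in domain_probs]

def apply_weights(regions, weights):
    """Scale every value in a region by that region's weight."""
    return [[w * value for value in region] for region, w in zip(regions, weights)]

feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
regions = split_regions(feature, v=3)
# Hypothetical transferability discriminator outputs (source, target) per region.
domain_probs = [[1.0, 0.0], [0.5, 0.5], [0.9, 0.1]]
weights = region_weights(domain_probs)
weighted = apply_weights(regions, weights)
```

Whatever the exact mapping, the design point is that each of the v regions receives its own scalar weight derived from the discriminator's uncertainty, so regions are scaled independently before classification.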
The k domain adaptation networks are trained together. Therefore, the total loss error $\mathcal{L}_{df}$ of the attention weighted module can be expressed as:

Fault Classification Module
The fault classification module is mainly composed of a fault classifier, which is used to predict the health states of the extracted features. Let the fault classifier be represented as $G_y$, with the fault classifier being composed of three fully connected layers, one ReLU layer and one SoftMax layer. Therefore, the predicted probability of the fault states of the m-th feature $x_{m,k}^{fn,v}$ of the k-th network can be obtained according to Equation (17): Then, the fault classifier is optimized to increase its identification accuracy. The loss error $\mathcal{L}_{y}^{k}$ of the fault classifier is expressed as follows: where $L_{y}^{k}$ represents the loss function of the fault classifier on the k-th domain adaptation network; $\hat{P}_{m,k}$ represents the predicted probability vector of the m-th feature on the k-th network; and $P_{m,k}$ represents the real label vector of the m-th feature on the k-th network.
The k domain adaptation networks are trained together. Therefore, the total loss function of the fault diagnosis can be expressed as:

Objective Loss Function of the Proposed MWTA
According to the objective loss functions of the multi-domain adversarial module, attention weighted module and fault classification module, the total objective loss function of MWTA can be expressed as: where $\theta_f$, $\theta_{dc}$, $\theta_{df}$ and $\theta_y$ represent, respectively, the parameters to be optimized of the feature extraction module, multi-domain adversarial module, attention weighted module and fault classification module. Since the loss functions $\mathcal{L}_{dc}$ and $\mathcal{L}_{df}$ are both domain discrimination loss errors, let $\mathcal{L}_{d}$ represent the sum of $\mathcal{L}_{dc}$ and $\mathcal{L}_{df}$. $\lambda$ represents the trade-off parameter between the loss function $\mathcal{L}_{d}$ and the loss function $\mathcal{L}_{y}$ of the fault classifier.
Based on Equation (20), the optimization objective of the proposed MWTA was obtained: The optimization objective was solved by Adam, which is a commonly used optimization algorithm. After the optimization, the training of the MWTA was completed. Finally, the trained MWTA was applied to the cross-domain intelligent fault diagnosis of rolling bearings.

Experiments
The proposed method was verified by two intelligent diagnosis cases of bearings, where diagnosis case 1 was used to verify the transfer fault diagnosis of bearings under different working conditions and diagnosis case 2 was used to verify the transfer fault diagnosis between different bearing datasets.

Diagnosis Case 1: Transfer Fault Diagnosis between Different Working Conditions
In this diagnosis case, the bearing datasets of Case Western Reserve University (CWRU) were selected to verify the transfer diagnosis effect of bearings under different working conditions [36]. Since the CWRU bearing datasets are openly available, the diagnosis results of the proposed method were compared with those of typical methods and published papers, which makes the comparative results more convincing.

Datasets Description
In this dataset, firstly, we selected bearing monitoring signals recorded at a sampling frequency of 12 kHz and a speed of 1797 r/min to construct the bearing datasets. Secondly, the datasets A, B, C and D were defined in sequence according to the CWRU data, corresponding to four different working conditions of 0, 1, 2 and 3 hp. In each working condition, the dataset consisted of ten health states under four fault types and three fault degrees, where the four fault types were normal (N), roller fault (RF), inner race fault (IF) and outer race fault (OF), and the three fault degrees were 0.18, 0.36 and 0.54 mm. Finally, 1044 sampling points were regarded as one sample, and the dataset under any working condition included 481 normal samples, 450 roller fault samples, 450 inner race fault samples and 450 outer race fault samples. The details of the bearing datasets are shown in Table 1.
In engineering practice, the monitoring data of bearings under no load are easier to obtain than the monitoring data under other load conditions. Therefore, three experimental tasks were set up for diagnosis case 1, as shown in Table 2. In each of the experimental tasks, the dataset under the no-load condition was regarded as the source domain dataset, and the datasets under other load conditions were regarded as the target domain datasets. An experimental task is expressed in the form A→B, where the left side of the arrow represents the source domain dataset and the right side of the arrow represents the target domain dataset. In each experimental task, all source domain data and 50% of the target domain data were used as the training dataset, and the remaining 50% of the target domain data were used as the testing dataset.
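The sample construction and the 50/50 split of the target domain can be sketched as follows. The sketch assumes non-overlapping windows of 1044 points and a split without shuffling; neither detail is stated in the text, and the synthetic signal is only a stand-in for a real vibration record.

```python
def segment_signal(signal, sample_length=1044):
    """Cut a 1-D signal into consecutive, non-overlapping samples."""
    count = len(signal) // sample_length
    return [signal[i * sample_length:(i + 1) * sample_length] for i in range(count)]

def split_target(samples, train_fraction=0.5):
    """Split target-domain samples into a training part and a testing part."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]

# Synthetic stand-in for a vibration record (not real bearing data).
signal = [float(i % 7) for i in range(1044 * 10 + 100)]
samples = segment_signal(signal)
target_train, target_test = split_target(samples)
```

Trailing points that do not fill a complete window are discarded in this sketch; in practice the split would usually be stratified by health state so that both halves contain every class.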

Experimental Parameters
In the experiment, the proposed method consisted of four parts: the feature extraction module, multi-domain adversarial module, attention weighted module and fault classification module. The feature extraction module was composed of a convolutional neural network and a non-local network. The network structure of each part was as follows. The convolutional neural network consisted of seven layers, including two convolutional layers, two pooling layers, two ReLU layers and one dropout layer. The non-local network was mainly composed of three 1 × 1 convolution operators. The multi-domain adversarial module was mainly composed of k domain discriminators $G_{dc}$. The attention weighted module was mainly composed of k transferability discriminators $G_{df}$. The domain discriminator $G_{dc}$ and the transferability discriminator $G_{df}$ had the same network structure, both being composed of two fully connected layers, one ReLU layer and one SoftMax layer. The fault classification module was composed of the fault classifier, which was composed of three fully connected layers, one ReLU layer and one SoftMax layer.
Before the experiments, the hyperparameter v in Section 3.3 and the hyperparameter λ in Equation (20) needed to be determined, where the size of the hyperparameter v affects the accuracy and the training time, and the hyperparameter λ is used to balance the loss errors of the domain discriminators and the fault classifier. The tests were performed on the three experimental tasks in Table 2, and each experiment was repeated 10 times. The experimental results are shown in Figures 3 and 4. It can be seen from Figure 3 that, when the hyperparameter v is 50, the method has the highest accuracy on the testing dataset and the training time is moderate; thus, 50 was selected as the value of the hyperparameter v. It can be seen from Figure 4 that, when the hyperparameter λ takes a value from 1 to 6, the average training accuracy of the proposed method shows an upward trend, whereas when λ takes a value from 6 to 10, the average training accuracy shows a downward trend. Therefore, 6 was used as the value of the hyperparameter λ.

Diagnosis Results and Comparisons
In order to verify the effectiveness of the proposed method, five typical diagnosis methods were introduced for comparison. The first is wavelet envelope features with a support vector machine (WEF-SVM) [37]. In this method, firstly, the bearing signal is decomposed by wavelet envelope analysis. Then, the decomposed signals are used to calculate the wavelet energy ratio. Finally, these energy features are input to the support vector machine for fault classification. The other four are typical transfer diagnosis methods: deep domain confusion: maximizing for domain invariance (DDC) [22], domain adaptive neural network (DANN) [34], transferable attention for domain adaptation (TADA) [23] and adversarial discriminative domain adaptation (ADDA) [38]. The experimental tasks are shown in Table 2, and each experimental task was repeated 10 times. The average accuracy of each task was recorded for each diagnostic method.
The comparison results between the proposed method and the above five typical methods are shown in Table 3. The final average diagnosis accuracy of WEF-SVM is 78.06%. In the feature extraction of WEF-SVM, the different distributions of the source and target domains are not considered, resulting in a low accuracy. DDC uses the concept of domain confusion to reduce the difference in the feature distributions of the source and target domains, and it obtains an average diagnosis accuracy of 92.70%. DANN uses the idea of an adversarial network to confuse the domain discriminator, so that the source domain and target domain features extracted by the feature extractor follow the same feature distribution; its final average diagnosis accuracy is 95.80%. TADA addresses the problem of different transferability in different regions of a sample by means of the attention mechanism; its final average diagnosis accuracy is 96.53%. ADDA is also based on the idea of an adversarial network, aiming to map the source domain network to the target domain, and it obtains an average diagnosis accuracy of 97.40%. For the proposed MWTA, the final average diagnosis accuracy reaches 99.20%. The average diagnostic accuracy of the proposed method is higher than that of the other methods, which demonstrates its effectiveness.
In order to make the comparison more convincing, we also compared the diagnosis results of the proposed method with those of published papers. The comparison results are shown in Table 4. Zhu et al. [39] proposed a bearing fault transfer diagnosis model based on the convolutional neural network (CNN) and maximum mean discrepancy (MMD) methods, and the fault identification accuracy of this model was 94.74%. Xu et al. [40] proposed a method based on transfer component analysis (TCA), and its classification accuracy could reach 91.40%. Chen et al. [41] proposed a deep neural network (DNN) method, which can achieve a 94.95% accuracy on three-class classification tasks. Wang et al. [42] combined the convolutional neural network (CNN) and deep long short-term memory (DLSTM) methods to propose a new transfer model, whose final fault identification accuracy could reach 96.21%. Li et al. [43] used convolutional neural networks and parameter transfer to achieve a transfer accuracy of 92.25%. Van et al. [44] combined wavelet kernel local Fisher discriminant analysis and a support vector machine (SVM) to propose a new classification method, whose accuracy could reach 98.80%. The results show that the proposed method performs better than these fault diagnosis models.
In order to illustrate the alignment of the source domain features and the target domain features in the cross-domain fault diagnosis of bearings, we use t-SNE [45] to visualize these features and visually analyze the diagnosis results of each diagnosis method. The results of the t-SNE feature visualization are shown in Figure 5. The proposed MWTA considers the multi-mode structure of the datasets and the transferability of different regions of the sample. In the feature space, the features of the source and target domains that belong to the same fault state merge with each other, and their distributions tend to be similar. Meanwhile, the features of different health states are separated from each other, which ensures the accuracy of fault identification. Therefore, compared with the other five methods, the proposed method can obtain a higher fault identification accuracy.

Diagnosis Case 2: Transfer Fault Diagnosis between Different Bearings
The experimental data of this diagnosis case were obtained from the Case Western Reserve University bearing dataset, SpectraQuest's mechanical failure simulator [46] and a gearbox bearing failure test bench. SpectraQuest's mechanical failure simulator is shown in Figure 6, and the gearbox bearing failure test bench is shown in Figure 7.

Data Collection
In the experiment, the three bearing datasets obtained from the Case Western Reserve University bearing dataset, SpectraQuest's mechanical failure simulator and the gearbox bearing failure test bench were defined as dataset E, dataset F and dataset G, respectively. All three datasets contained four health states: normal (N), roller fault (RF), inner race fault (IF) and outer race fault (OF). For dataset E, the sampling frequency was 12 kHz, the number of samples was 1710, and the rotation speed and load were 1772 r/min and 1 hp, respectively. For dataset F, the sampling frequency was 5 kHz, the number of samples was 1528, and the rotation speed and load were 1528 r/min and 6 g, respectively. For dataset G, the sampling frequency was 5 kHz, the number of samples was 2400, and the rotation speed and load were 600 r/min and 3 N·mm, respectively. The three datasets are compared in Table 5.

For the three bearing datasets, six groups of experimental tasks were set, as shown in Table 6. Each experimental task is expressed in the form E→F, where the left side of the arrow is the source domain dataset and the right side of the arrow is the target domain dataset. In each experimental task, all of the source domain data and 50% of the target domain data were used as the training dataset, and the remaining 50% of the target domain data were used as the testing dataset.

The proposed method was compared with the five typical transfer intelligent diagnosis methods introduced in Section 4.1.3. The comparison results are shown in Table 7. The average diagnosis accuracy of the proposed MWTA reaches 85.60%, whereas the diagnosis accuracy of the other five methods varies from 40.54% to 77.50%. The diagnosis result of the proposed method is thus better than those of the other methods, which proves its effectiveness. Furthermore, the features extracted by each method were visualized by t-SNE, and the feature visualization results are shown in Figure 8. Compared with the other methods, the proposed method effectively aligns the feature distributions of the source and target domains, and the features of different health states are well separated. Therefore, by considering the multi-mode data structure and the local non-transferable characteristics of the samples, the proposed method can improve the identification accuracy of cross-domain fault diagnosis between different bearings.
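The task construction and data split described above can be sketched as follows. This is only an illustration under stated assumptions: the six tasks are taken to be all ordered pairs of the three datasets (their order in Table 6 may differ), integer indices stand in for the actual vibration samples, and the function names, the random shuffle and the seed are hypothetical choices that the text does not specify.

```python
import random
from itertools import permutations

# Sample counts quoted in the text for datasets E, F and G.
SAMPLE_COUNTS = {"E": 1710, "F": 1528, "G": 2400}


def transfer_tasks(names):
    """Return every ordered (source, target) pair of distinct datasets."""
    return list(permutations(names, 2))


def split_task(source, target, seed=0):
    """Use all source samples and half of the target samples for training;
    hold out the other half of the target samples for testing."""
    shuffled = list(target)
    random.Random(seed).shuffle(shuffled)  # assumed: random split, fixed seed
    half = len(shuffled) // 2
    train = {"source": list(source), "target": shuffled[:half]}
    test = shuffled[half:]
    return train, test


tasks = transfer_tasks(sorted(SAMPLE_COUNTS))
for src, tgt in tasks:
    # Integer indices stand in for the actual vibration samples.
    train, test = split_task(range(SAMPLE_COUNTS[src]), range(SAMPLE_COUNTS[tgt]))
    print(f"{src}->{tgt}: {len(train['source'])} source + "
          f"{len(train['target'])} target for training, "
          f"{len(test)} target for testing")
```

Fixing the seed keeps the held-out half of the target domain identical across the compared methods, which is one simple way to make the accuracies comparable.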

Conclusions
To account for the multi-mode structure of bearing datasets and the transferability of local regions of the samples, a multi-domain weighted transfer adversarial network was proposed for the cross-domain intelligent fault diagnosis of bearings. The proposed method considers the multi-mode structure of the bearing dataset by means of a multi-domain adversarial module, and it addresses the differing transferability of different regions of the bearing signals by means of an attention weighting module. Two fault diagnosis cases illustrated that the proposed method can effectively reduce the difference between the probability distributions of the source domain and target domain datasets, thereby improving the cross-domain fault diagnosis accuracy. In addition, Tables 3 and 7 show that the accuracy of cross-domain fault diagnosis increased by 7.10% on average in the first case and by 21.17% in the second case, indicating that the method can significantly improve the accuracy of cross-domain fault diagnosis.

Figure 1. The flow chart of the non-local module.

Figure 2. The structure of the proposed MWTA.

Figure 3. Diagnostic accuracy of the proposed method according to different values of the hyperparameter V.

Figure 4. Diagnosis accuracy of the proposed method according to different values of the hyperparameter λ.
$n = n_s + n_t$ represents the total number of samples in the source domain and target domain datasets. $\lambda$ is a trade-off parameter between the loss of the domain discriminator and the loss of the fault classifier. $\theta_f$, $\theta_y$ and $\theta_d$ are the optimization parameters of $G_f$, $G_y$ and $G_d$, respectively. $x_i$ represents the $i$-th sample in the source domain dataset. $x_m$ represents the $m$-th sample in the union of the source domain and target domain datasets. $y_i$ represents the health state label of the $i$-th sample; $d_m$ is the domain label of the $m$-th sample, and it represents the source domain when $d_m = 0$ and the target domain when $d_m = 1$. $L_y$ and $L_d$ represent the loss function of the fault classifier and the loss function of the discriminator, respectively. The optimization of the parameters is completed by Equation (2).
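The equations that these symbols belong to are not reproduced in this excerpt. As a point of reference only, the standard domain-adversarial (DANN) objective and its saddle-point optimization, which the symbol definitions above appear to follow, can be written as below. This is an assumption: the paper's own Equations (1) and (2) may differ, for example in normalization or in the ordering of samples.

```latex
% Assumed form: standard DANN objective, written with the symbols defined above.
% Samples are indexed so that i = 1..n_s runs over the source domain and
% m = 1..n runs over the union of the source and target domains.
\begin{equation*}
E(\theta_f, \theta_y, \theta_d)
  = \frac{1}{n_s} \sum_{i=1}^{n_s} L_y\bigl(G_y(G_f(x_i)),\, y_i\bigr)
  - \frac{\lambda}{n} \sum_{m=1}^{n} L_d\bigl(G_d(G_f(x_m)),\, d_m\bigr)
\end{equation*}

% Assumed form of the parameter optimization (saddle point of E).
\begin{align*}
(\hat{\theta}_f, \hat{\theta}_y)
  &= \arg\min_{\theta_f,\, \theta_y} E(\theta_f, \theta_y, \hat{\theta}_d), \\
\hat{\theta}_d
  &= \arg\max_{\theta_d} E(\hat{\theta}_f, \hat{\theta}_y, \theta_d).
\end{align*}
```

In this form, the feature extractor and the fault classifier minimize the classification loss while the feature extractor maximizes the discriminator loss, and the discriminator minimizes its own loss; $\lambda$ sets the balance between the two terms.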

Table 1. Details of the bearing datasets of diagnosis case 1.

Table 2. Transfer diagnosis experiment in diagnosis case 1.

Table 3. The fault identification results of the proposed method and typical methods.

Table 4. The fault identification results of the proposed method and published papers.

Table 5. Details of the bearing datasets of diagnosis case 2.

Table 6. Transfer diagnosis experiment in diagnosis case 2.

Table 7. The fault identification results of the proposed method and typical methods.