1. Introduction
Bearings are among the most important components of mechanical systems, and their healthy working state is a basic guarantee for all kinds of machines to engage in production [
1,
2,
3,
4,
5]. With the continuous progress of measurement technology and the development of Internet of Things technology [
6], massive data measured by sensors have become available, and bearing diagnosis methods based on signal analysis or machine learning have made great progress in recent years [
7,
8,
9,
10]. Among fault diagnosis methods based on signal analysis, common feature extraction methods include wavelet transform, spectral analysis, empirical mode decomposition, and fast Fourier transform [
11,
12,
13,
14]. Feng et al. investigated the correlation between tribological features of abrasive wear and fatigue pitting in gear meshing and constructed an indicator of vibration cyclostationarity (CS) to identify and track wear evolution [
15]. Peeters et al. derived blind filters using constructed envelope spectrum sparsity indicators and proposed an effective fault detection method for rotating machinery [
16]. In addition, methods based on machine learning and deep learning are increasingly widely used because they do not rely on rich expert experience and can effectively learn complex nonlinear relationships [
17]. The common models among them are deep neural networks, sparse coding, and Bayesian analysis, which optimize the parameters of classifiers by utilizing labeled historical fault data [
18,
19,
20].
Although samples from unlabeled test datasets can often be effectively diagnosed by these methods, most of them are usually based on an ideal assumption that training datasets and test datasets are collected from the same part of the same machine and are quite similar in terms of data distribution [
21]. Real-world conditions are often very different from ideal ones because obtaining labeled vibration data from critical parts of some machines is hard or even impossible [
22]. Furthermore, collecting data samples from each machine and labeling them manually costs a large amount of time and money [
23]. Hence, in many cases, training datasets can only be collected from another part of the same machine or from similar but different scenarios [
24]. This kind of bearing fault diagnosis task, in which the training data and test data come from different datasets, is usually called cross-domain fault diagnosis. Intrinsic discrepancies exist in the data distributions between the labeled training datasets (source domain) and the unlabeled test datasets (target domain). These discrepancies between the source and target domains always have an adverse impact on diagnostic accuracy, which is the main difficulty of cross-domain diagnosis tasks. Domain adaptation is currently the main strategy to address this difficulty: it makes knowledge from the source domain useful for the target task by eliminating the distribution discrepancies between the two domains.
To bridge the gap between the two domains in cross-domain diagnosis tasks and avoid the deterioration of performance of source data classifier on discrepant target data, several transfer learning methods based on domain adaptation have been developed in recent studies [
25,
26]. The common strategy of transfer learning methods based on domain adaptation is to train a domain-invariant classifier by eliminating the distribution mismatch between source and target domains [
27,
28]. To reduce the distribution mismatch between the two domains, distance matching methods and adversarial learning models are two main types of strategies. As for the former, maximum mean discrepancy (MMD) [
3] is widely applied for distance measurement. For better and faster computation, the multikernel maximum mean discrepancy (MK-MMD) was proposed [
29]. For instance, Xie et al. trained a domain-invariant classifier for cross-domain gearbox fault diagnosis by transfer component analysis (TCA) technology [
30]. The latter kind of methods train a domain-invariant classifier through adversarial training between domains inspired from the strategy of generative adversarial network (GAN) [
31]. In adversarial domain adaptation methods, the discriminator is responsible for distinguishing whether a feature produced by the feature extractor comes from the source domain or the target domain, whereas the feature extractor tries to produce features that cannot be distinguished by the discriminator. The competition between them drives both to improve until the source and target domain features are indistinguishable. A domain-invariant classifier that can be applied to both source and target datasets is then obtained.
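As a minimal, self-contained illustration of this competition, the toy NumPy sketch below alternates updates between a logistic domain discriminator and a linear feature extractor, with the extractor ascending the discriminator's loss (gradient reversal). It is not the architecture proposed in this paper; the data and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D inputs: source and target domains with shifted means.
xs = rng.normal(0.0, 1.0, (200, 2))
xt = rng.normal(2.0, 1.0, (200, 2))

W = rng.normal(size=(2, 2))    # feature extractor (a linear map)
w, b = np.zeros(2), 0.0        # domain discriminator (logistic regression)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

lr = 0.05
for _ in range(200):
    fs, ft = xs @ W, xt @ W
    ps, pt = sigmoid(fs @ w + b), sigmoid(ft @ w + b)
    # Discriminator step: descend the domain-classification loss
    # (source labelled 1, target labelled 0).
    w -= lr * (fs.T @ (ps - 1) / len(fs) + ft.T @ pt / len(ft))
    b -= lr * (np.mean(ps - 1) + np.mean(pt))
    # Extractor step: ascend the same loss (gradient reversal), so the
    # extracted features become harder to tell apart.
    gW = xs.T @ np.outer(ps - 1, w) / len(xs) + xt.T @ np.outer(pt, w) / len(xt)
    W += lr * gW

# Discriminator accuracy on the learned features; values near 0.5 mean the
# two domains are close to indistinguishable.
acc = 0.5 * (np.mean(sigmoid(xs @ W @ w + b) > 0.5)
             + np.mean(sigmoid(xt @ W @ w + b) <= 0.5))
print(round(float(acc), 2))
```

In a full model the extractor is a deep network and the classifier is trained jointly; the alternating update pattern, however, is the same.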
In addition to marginal distribution alignment methods based on distance matching and adversarial training, conditional distribution alignment for unsupervised bearing health status diagnosis has recently been considered by many researchers. For example, Li et al. used the overfitting of various classifiers in a proposed adversarial multiclassifier model to improve class-level alignment [
32]. Qian et al. introduced a novel soft label strategy to assist joint distribution alignment of fault source and target domain datasets [
33]. Zhang et al. combined a novel subspace alignment model with JDA to achieve joint distribution alignment in fault diagnosis of rolling bearings under varying working conditions [
34]. Wu et al. utilized joint distribution adaptation and a long short-term memory recurrent neural network model simultaneously to achieve domain adaptation under the condition of unbalanced bearing fault data [
35]. Yu et al. proposed an effective simulation data-based domain adaptation strategy for intelligent fault diagnosis in which conditional and marginal distribution alignment is achieved between source data from the simulation model and target data from mechanical equipment in the actual field [
36]. Moreover, Saito et al. proposed the maximum classifier discrepancy (MCD) method, which pushes target samples away from the classifiers' decision boundaries, and achieved better classification performance [
37]. Li et al. measured the completion degree of classification by employing a criterion based on linear discriminant analysis (LDA) to enhance the performance of the classifier on the target domain [
38]. Yu et al. quantitatively measured the relative importance of marginal and conditional distributions of two domains and dynamically extracted domain-invariant features in a proposed novel dynamic adversarial adaptation network (DAAN) [
39].
Although these transfer learning methods have made some progress in cross-domain bearing fault diagnosis, there are still some challenges that can cause negative transfer. Positive transfer means that the knowledge learned from the source domain can effectively improve the performance of the model in the target domain. In contrast, the learning behavior is called negative transfer [
40,
41] when knowledge from the source domain hinders learning in the target task. Obviously, negative transfer should be avoided in cross-domain diagnosis tasks. However, the difficulties that may lead to it have not received sufficient attention from previous researchers. The two main problems leading to negative transfer in cross-domain bearing fault diagnosis are as follows. First, samples with poor transferability have not received the attention they deserve, which leads to negative transfer. Almost all existing domain adaptation models assume that all data have the same influence; certainly, this assumption does not hold in many real-world scenarios. Some samples are more difficult to align because they lie far from the distribution center when the measurement environment is noisy or the conditions are non-stationary [
42,
43]. If each sample has the same weight in the domain adaptation training, achieving a good conditional distribution alignment will be difficult no matter how many training epochs are taken. Moreover, with the continuous updating of model parameters, the good adaptation performance of features from samples with strong transferability will be destroyed. At this stage, negative transfer occurs.
Second, the imbalance between classifier training and domain adaptation may also cause negative transfer. In most previous models, the training of the fault classifier and the domain adaptation process are carried out simultaneously, but the relationship between them is ignored and the two processes are treated independently. Indeed, excessive adaptation is likely to cause the failure of the diagnostic classifier, whereas over-completion of classifier training can cause domain mismatch [
44]. Both processes influence the diagnosis accuracy; that is, over- or under-completion of either process leads to negative transfer. From the above analysis, there is a strong motivation to establish a more advanced method to solve the two challenges that may cause negative transfer in cross-domain fault diagnosis.
In order to solve the above challenges in the field of cross-domain fault diagnosis, a novel DRDA method is proposed in this paper. To address the negative transfer caused by samples with poor transferability, a soft reweighting strategy inspired by curriculum learning and conditional information entropy is proposed. Such an indicator can measure the adaptation performance of each sample in real time and give more attention to poorly aligned samples. After proper weight adjustment, the clustering of the source and target domain datasets is strengthened, and thus the conditional distribution alignment can also be improved. To address the negative transfer caused by the imbalance between classifier training and domain adaptation, a balance factor is introduced in our method to strike a balance between them and obtain a higher accuracy in the final diagnosis on the target domain dataset. Specifically, MMD is used as an estimator to observe the degree of domain adaptation, and an indicator
based on linear discriminant analysis (LDA) [
38] is used to estimate the degree of classifier training. An effective balance factor can then be constructed by combining these two items. Sufficient verification experiments demonstrate that our model outperforms state-of-the-art methods.
The main contributions of our work are as follows.
1. A novel dynamic reweighted domain adaptation method is proposed to address the challenges that cause negative transfer in cross-domain bearing fault diagnosis. A reweighted adversarial loss strategy is introduced in DRDA to eliminate the negative transfer caused by samples with poor transferability.
2. A powerful balance factor is constructed in the proposed method to eliminate the negative transfer caused by the imbalance between classifier training and domain adaptation.
This paper is organized into the following sections. The preliminary concepts are introduced in
Section 2, including the problem description and a brief introduction to the domain adversarial learning and maximum classifier discrepancy theories. In
Section 3, the proposed DRDA model for cross-domain diagnosis is introduced in detail.
Section 4 presents and analyzes the diagnosis performance on cross-location and cross-speed cases. Finally, this article is concluded in
Section 5.
3. Proposed Method
For a more robust and positive transfer, this paper introduces two reweighting strategies from two perspectives. First, adversarial loss is reweighted from the perspective of each sample. Second, a balance between domain adaptation and discrimination is made from the perspective of the whole dataset. In this section, the method, which mainly consists of four parts, will be explained in detail.
3.1. Expanding MCD
The original MCD and much of the derived research realize this idea using two classifiers [
46]. However, in the final step, different methods may classify target samples in different ways: some take the better result of the two classifiers, whereas others sum the two classifiers' probability outputs and compare the sums to identify the final label. To remove this ambiguity, three classifiers are utilized, as shown in
Figure 4: one main fault classifier and two classifiers used only for computing the discrepancy loss. The discrepancy loss contains three parts, and the equation below is our optimization objective:
where the feature generator and the three fault classifiers are trained on the source domain data, and the input is a target domain sample. In our model, the first classifier is used as the main fault classifier. With the feature generator fixed, the discrepancy between the two auxiliary classifiers is maximized on the target domain, so that the target data excluded by the support of the source can be detected. Next, the classifiers are fixed and the discrepancy is minimized by training the feature generator to learn strongly discriminative features.
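Classifier discrepancy is commonly measured as the mean L1 distance between probability outputs. The sketch below combines the three pairwise discrepancies of the three classifiers; this three-part combination is an assumption for illustration, not necessarily the paper's exact loss.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax over class logits."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def discrepancy(p, q):
    """Mean L1 distance between two classifiers' probability outputs."""
    return float(np.mean(np.abs(p - q)))

def three_way_discrepancy(logits1, logits2, logits3):
    """Sum of the three pairwise discrepancies among three classifiers."""
    p1, p2, p3 = (softmax(l) for l in (logits1, logits2, logits3))
    return discrepancy(p1, p2) + discrepancy(p2, p3) + discrepancy(p1, p3)

# Identical classifier outputs give zero discrepancy.
z = np.array([[2.0, 0.5, -1.0]])
print(three_way_discrepancy(z, z, z))  # 0.0
```

During adversarial training, the classifiers maximize this quantity on target data while the generator minimizes it.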
3.2. Reweighted Adversarial Loss
In realistic domain adaptation scenarios, domain distributions often embody multi-mode structures, which are prone to negative transfer and pose a great challenge. Previous studies aimed at achieving excellent domain adaptation but rarely considered the role of each sample; they assumed that all samples have equal transferability and make equal transfer contributions.
To promote positive transfer, it should be noticed that the transferability of each sample in the source domain or the target domain is different [
47]. Different samples do not align well at the same time; samples with strong transferability may easily generate well-transferable features and perform excellently in earlier iterations, whereas others may be just the opposite. Therefore, an effective indicator is needed to estimate such differences. In information theory, entropy measures uncertainty [
48], and a reweighting strategy is proposed in this work to measure the degree of adaptation for each sample. The well-aligned samples obtain a lower entropy, whereas the poorly-aligned samples always have a higher entropy.
In the domain adaptation process, poorly aligned or slowly aligned samples should be given more attention. Hence, the class probabilities output by the classifier are utilized to compute the conditional entropy and reweight the domain adversarial loss of each sample. The traditional domain adversarial loss, computed with the domain discriminator and its parameters, can be extended with this self-adaptive weight and rewritten as follows:
where each source sample and each target sample receives its own weight, computed from the class probabilities produced by the main fault classifier through the conditional information entropy. A larger entropy of a sample indicates worse adaptation performance in the current training epoch; therefore, its corresponding weight in domain adversarial training is larger. Through this mechanism of paying more attention to poorly aligned samples, the negative transfer caused by samples with poor transferability can be effectively eliminated.
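A minimal sketch of such an entropy-based weighting follows, assuming weights of the form 1 + H normalized to mean one; the exact mapping from entropy to weight is an assumption here, since the text does not specify it.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Conditional information entropy of each row of class probabilities."""
    return -np.sum(p * np.log(p + eps), axis=1)

def sample_weights(probs):
    """Weights that grow with a sample's entropy, so poorly aligned
    (uncertain) samples get more attention in the adversarial loss.
    Weights are normalized to mean 1 so the overall loss scale is kept."""
    h = entropy(probs)
    w = 1.0 + h                      # larger entropy -> larger weight
    return w / w.mean()

# A confident (well-aligned) sample gets a smaller weight than an
# uncertain (poorly aligned) one.
probs = np.array([[0.98, 0.01, 0.01],    # well aligned
                  [0.34, 0.33, 0.33]])   # poorly aligned
w = sample_weights(probs)
print(w[1] > w[0])  # True
```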
3.3. Balance Factor for Domain Adaptation and Class Discrimination
In the former section, two main approaches to the domain adaptation problem were introduced. The first, represented by the domain-adversarial neural network (DANN), pursues feature representations that align both domains well, so that these common features transfer well between domains. The second, similar to MCD, holds that well-aligned features are not sufficient and that excellent performance on the target domain task is the ultimate goal; for this purpose, such methods try to find class-specific features, a process called class discrimination.
Xiao et al. [
44] proposed that the importance of the two said items differs during algorithm iteration. For example, at the beginning of training, domain adaptation is more important than class discrimination. As training proceeds, domain-shared features are learned better and class discrimination should receive more consideration. Both inadequate and excessive domain adaptation or discriminant learning are harmful to positive transfer; thus, how to balance these two items dynamically matters. MMD is a well-known choice for measuring the difference between two distributions, so it is naturally used in this work as an estimator of the degree of adaptation between the two domains. MMD is defined as follows:
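The equation itself did not survive extraction; the standard empirical form, consistent with the surrounding description, is assumed here, with φ(·) a feature map into a reproducing kernel Hilbert space H and n_s, n_t the source and target sample sizes:

```latex
\mathrm{MMD}(X_s, X_t)
= \left\| \frac{1}{n_s}\sum_{i=1}^{n_s}\phi\!\left(x_i^{s}\right)
        - \frac{1}{n_t}\sum_{j=1}^{n_t}\phi\!\left(x_j^{t}\right)
  \right\|_{\mathcal{H}}^{2}
```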
For class discrimination, an indicator based on linear discriminant analysis (LDA) [
38] is adopted to estimate this item. LDA's optimization objective for conventional two-category classification is defined as follows:
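The original equation is missing from the extracted text; the standard two-class Fisher criterion is assumed here, with projection vector W, inter-class scatter matrix S_b, and intra-class scatter matrix S_w:

```latex
J(W) = \frac{W^{\mathsf{T}} S_b W}{W^{\mathsf{T}} S_w W}
```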
where the intra-class scatter matrix and the inter-class scatter matrix [
49] appear; the inter-class scatter indicates the mutual distance of clusters with different labels, and the intra-class scatter reflects the compactness of data with the same labels. From this point, class discrimination can be well measured. Extending this idea to the multiclass learning problem, the corresponding indicator is defined as follows:
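The multiclass equation is also missing from the extracted text; one common trace-form generalization is assumed here (the paper's exact indicator may differ):

```latex
J = \frac{\operatorname{tr}(S_b)}{\operatorname{tr}(S_w)}
```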
The above discussion shows that MMD can depict the overall degree of domain adaptation and can be applied in transfer learning, while the LDA-based indicator can measure the features' degree of class discrimination. Combining these two items, a proper computation for the balance factor can be found. However, J(W) and the MMD may have different magnitudes, so normalization is necessary; the corresponding normalized values are both located in the interval [0,1]. The balance factor can then be computed as follows:
Notably, the balance factor is the weight of the domain adaptation loss, and its complement is the weight of the class discrimination loss. Thus, the loss after adding the balance factor for domain adaptation and class discrimination is written as follows:
The balance factor reflects the relative completion of the two processes. If domain adaptation is completed much better than class discrimination, the factor is close to 0; if domain adaptation is much worse than class discrimination, the factor approaches 1; if it equals 0.5, the two loss items have the same weight. The factor can dynamically adjust the weights of the loss items and effectively prevent excessive or insufficient domain adaptation and class discrimination.
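A concrete sketch of how such a factor could be computed is given below, assuming a linear-kernel MMD estimate, a trace-form Fisher ratio for discrimination, min-max normalization against the training history, and a simple averaging combination; all of these choices are assumptions for illustration, not the paper's exact formulas.

```python
import numpy as np

def linear_mmd(fs, ft):
    """Squared distance between source/target feature means
    (a simple linear-kernel MMD estimate)."""
    return float(np.sum((fs.mean(axis=0) - ft.mean(axis=0)) ** 2))

def fisher_ratio(f, y):
    """tr(S_b) / tr(S_w): larger means better class discrimination."""
    mu = f.mean(axis=0)
    sb = sw = 0.0
    for c in np.unique(y):
        fc = f[y == c]
        sb += len(fc) * np.sum((fc.mean(axis=0) - mu) ** 2)
        sw += np.sum((fc - fc.mean(axis=0)) ** 2)
    return sb / (sw + 1e-12)

def minmax(value, history):
    """Normalize a value into [0, 1] against its history during training."""
    lo, hi = min(history), max(history)
    return (value - lo) / (hi - lo + 1e-12)

def balance_factor(mmd_hat, fisher_hat):
    """Combine the normalized terms: weight the adaptation loss more when
    the domains are still far apart AND class structure is already good."""
    return 0.5 * (mmd_hat + fisher_hat)

# Both terms small -> factor near 0 (ease off adaptation);
# both terms near 1 -> factor near 1 (push adaptation harder).
print(balance_factor(0.0, 0.0), balance_factor(1.0, 1.0))
```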
3.4. Dynamic Reweighted Domain Adaptation (DRDA)
The concrete architecture of the proposed DRDA model can be seen in
Figure 4. According to the three parts, the overall optimization objective can be obtained:
The first item is the classifier cross-entropy loss, the second is the reweighted adversarial loss (RAL), and the last is the class discrimination loss; the last two items are multiplied by their respective balance factors. Through the dynamic reweighting of the adversarial loss and the balance factor tuning the relative importance of domain adaptation and classifier training, our model effectively avoids the negative transfer phenomenon and yields a robust end-to-end cross-domain bearing fault diagnosis system.
3.5. Training Steps
To obtain the optimal solution of the model proposed in the previous discussion, a feature generator, three fault classifiers, and one domain discriminator need to be trained. The optimization objective was given in the last subsection; how to solve it in four steps is shown next. The learning rate is a hyperparameter, set to 0.01 or 0.001 in the experiments, and the balance factor is computed as described before; only the detailed procedure of parameter updating is discussed here.
Step A: The main fault classifier and the feature generator are trained to make the main fault classifier more discriminative and to classify the source data correctly. The network is trained to minimize the cross-entropy loss and update their parameters:
Step B: The reweighted domain adversarial loss is minimized. The feature generator is fixed, and only the domain adversarial module is trained; its parameters are updated as follows:
Step C: The feature generator is fixed, and the three fault classifiers are trained to increase the discrepancy, which corresponds to minimizing the negative discrepancy loss. As in MCD, the main fault classifier's cross-entropy loss is also added. The classifiers' parameters are updated as follows:
Step D: Finally, the balanced reweighted domain adversarial loss and the discrepancy loss are optimized. In this step, the fault classifiers and the domain adversarial module are fixed in order to update the parameters of the feature generator:
4. Experiments
The advantage of the proposed DRDA is demonstrated through evaluation experiments on two cases: a bearing fault dataset from Case Western Reserve University (CWRU) and a rotor test dataset (RT). Various previous cross-domain diagnosis models are introduced for a thorough comparison. The performance of the proposed method is compared with the following cross-domain bearing fault diagnosis methods from previous studies: CNN without domain adaptation, transfer component analysis (TCA), joint distribution adaptation (JDA), domain-adversarial neural networks (DANN), fine-grained adversarial network-based domain adaptation (FANDA), and maximum classifier discrepancy (MCD).
4.1. Cross-Location Diagnosis on CWRU Case
The bearing fault datasets from CWRU are available from its official website [
50]. Bearings with faults are placed at either the fan end (FE) or the drive end (DE) in each test. The bearing types at the FE and DE are SKF 6203-2RS and SKF 6205-2RS, respectively. The parameters of the bearings at both ends are listed in
Table 1.
The bearing data samples are divided into four classes according to health status: outer race fault (OF), inner race fault (IF), ball fault (BF), and health (H). These bearing faults are all produced by artificial processing. The sampling frequency is 12 kHz, and the diameter of every fault is 0.007 inches. The specific working conditions corresponding to each dataset are shown in
Table 2.
Data from the first two datasets are collected from the bearing at the DE, with working loads of 0 HP (1797 r/min) and 3 HP (1730 r/min), respectively. Similarly, the other two datasets are collected from the bearing at the FE under the same two working loads. Every dataset has 4000 examples, each a fixed-length time series segment. As shown in
Figure 5, the CWRU case has four cross-location diagnosis tasks between the DE and FE datasets. In each task, the former dataset is the source domain with labels and the latter is the target domain without labels. The performance of each model is measured by the accuracy of fault predictions, which is defined as follows:
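The equation did not survive extraction; the standard definition consistent with the text is assumed here, with N_correct the number of correctly diagnosed target samples and N_total the total number of target samples:

```latex
\mathrm{Accuracy} = \frac{N_{\mathrm{correct}}}{N_{\mathrm{total}}} \times 100\%
```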
The diagnosis results of the four cross-location tasks are listed in
Table 3. First, as expected, the performance of CNN is the worst among all the models because the distribution mismatch is not eliminated at all. The performance of CNN also differs greatly between two reversed tasks, in which the source and target domains correspond to the same two datasets with their roles exchanged. For example, the diagnosis accuracy reached 69% in one task but dropped to approximately 44% in the corresponding reversed task, probably because of the large statistical distribution discrepancy between the two domains. This situation is clearly improved in the other domain adaptation based models, which confirms the necessity of domain adaptation in cross-domain bearing fault diagnosis.
Figure 5.
Cross-location bearing diagnosis tasks in CWRU case. Data samples with inner race faults are used for illustration.
Second, the mean accuracy of the proposed method is 89.38%, outperforming all the comparison models. In addition, domain adaptation models that align both the marginal and conditional data distributions of the two domains perform better than those that align only the marginal distribution: the mean accuracies of FANDA and JDA exceed 84%, whereas that of DANN, which aligns only the marginal distribution, is under 82%. DANN, utilizing adversarial domain adaptation, performs better than TCA, which utilizes the MMD criterion. Moreover, MCD performs better than TCA, JDA, and DANN owing to the longer distance between each sample's features and the decision boundary, and slightly worse than FANDA, probably because of an insufficient degree of domain adaptation.
The confusion matrices of one representative cross-location task, shown in
Figure 6, are introduced to analyze the results in more detail. The diagnosis accuracies of both FANDA and DANN are low, and the classifiers learned by them cannot identify ball faults (BF) and outer race faults (OF) effectively. Part of the BF classification error is corrected by MCD, whereas the misclassification of OF is still evident. As can be seen from the figure, the proposed DRDA can accurately distinguish the four kinds of bearing health status.
4.2. Cross-Speed Diagnosis on RT Case
Another dataset is collected from a rotor test in a practical scenario [
51], and the schematic diagram of the test rig is shown in
Figure 7. The power source of this rotor rig is a three-phase induction motor. The motor is connected through a coupling to a shaft that is supported by several bearings. The bearings to be monitored are located at the right end of the shaft, and radial loads are provided by a loading device. The vibration sensor for data collection is installed at point A. The bearing type in this diagnosis case is HRB 6010-2RZ (Harbin Bearing Manufacturing Co., Ltd., Harbin, China), and the related details are shown in
Table 4.
Similarly, data samples are divided into four classes according to health status: outer race fault (OF), inner race fault (IF), ball fault (BF), and health (H). Three kinds of faults are processed manually by wire electrical discharge machining. As shown in
Figure 8, the rotational speeds of the motor for domains C, D, and E are set to 3000, 5000, and 8000 revolutions per minute (rpm) during the measurement, respectively. Every domain has 4000 samples, each a fixed-length time series segment. The sampling frequency is 65,536 Hz, and the load exerted on the shaft and bearings by the radial loading device is 2.0 kN. According to the above description, the details of the RT case datasets are listed in
Table 5.
The designed RT case has six cross-domain diagnosis tasks, namely C→D, D→C, C→E, E→C, D→E, and E→D, and their results are listed in
Table 6.
Previous work in signal processing showed that the knowledge for fault diagnosis at variable speeds becomes more difficult to extract [
52]. Moreover, the excellent performance of the proposed DRDA confirms that our model can robustly extract domain-invariant features under cross-speed diagnosis scenarios. First, as listed in
Table 6, the performance of TCA and JDA degenerates more seriously than that of the other cross-domain diagnosis models. Their accuracies are under 30% in most cross-speed scenarios. For example, TCA and JDA only achieve an accuracy of 22.90% in the task of
E→
C. In accordance with the CWRU case, JDA performs slightly better than TCA in almost all tasks.
Second, the proposed DRDA performs better than the other domain adaptation methods in almost all tasks, with a mean accuracy of 83.87% and an accuracy of 100% in tasks C→D, C→E, and D→E. Similarly, FANDA, based on marginal and conditional distribution adaptation, outperforms DANN in most tasks. In addition, almost all methods perform unexpectedly much worse in tasks with a high-speed dataset as the source domain and a low-speed dataset as the target domain than in the reversed tasks. For instance, the accuracies of FANDA, MCD, and DRDA are all above 95% in task C→E, whereas they drop to approximately 50% in task E→C, because samples obtained under high rotating speed contain more noise, and extracting effective domain-invariant information for diagnosis is much more difficult when high-speed datasets serve as the source domain.
To demonstrate the effectiveness of the proposed DRDA method more intuitively, the t-distributed stochastic neighbor embedding (t-SNE) technique [
53] is utilized to visualize features by mapping them into 2D space. The t-SNE visualizations of all the models on task
D→
C are shown in
Figure 9. TCA and JDA almost fail completely on this difficult task in the RT case. CNN can only align a small part of the features with IF and OF. The confusion between IF and OF is serious in both DANN and FANDA. The confusion between IF and OF is remarkably improved in MCD, but the discrimination performance between H and BF deteriorates sharply. It is shown in
Figure 9(g) that the distribution boundary of features from each class is tighter in DRDA, and the domain adaptation of features of samples with H, IF, and OF is clearly improved. In addition, the F1 score metric is also introduced to verify the effect of the proposed DRDA method in eliminating negative transfer in difficult diagnosis tasks and improving the final diagnosis performance. The F1 score is defined as
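The equation is missing from the extracted text; the standard combination of precision and recall, consistent with the definitions that follow, is assumed here:

```latex
F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}
           {\mathrm{Precision} + \mathrm{Recall}}
```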
where precision is the proportion of bearing samples given the correct label among all bearing samples in the predicted class, and recall is the proportion of bearing samples given the correct label among all bearing samples in the real class. The F1 scores of the four models that perform well under the accuracy metric on one CWRU cross-location task and on the D→C task are shown in
Table 7. As shown in the table, the diagnosis performance of the proposed DRDA method is still clearly better than that of the other three methods under the F1 score metric. Moreover, compared with the traditional DANN and FANDA methods based on domain adversarial learning, the proposed DRDA achieves a greater improvement in diagnosis performance under the F1 score metric than under the accuracy metric.
4.3. Ablation Study
The effects of the RAL and the balance factor for domain adaptation and class discrimination on enhancing the diagnosis accuracy are analyzed through comparative experiments. The performance of DRDA with and without the RAL and the balance factor on task E→C in the RT case is shown in
Table 8. The diagnosis performance in this task is enhanced when the RAL strategy, the balance factor, or both are introduced. Furthermore, the RAL strategy is introduced into the traditional DANN model. The performance of DANN with and without the RAL module on a CWRU cross-location task and on task D→C in the RT case is shown in
Figure 10, where great improvement can be noticed. This finding shows that the proposed RAL strategy can also promote positive transfer when used in other domain adaptation methods based on adversarial training.