Applied Sciences
  • Article
  • Open Access

17 February 2023

Semi-Supervised Adversarial Transfer Networks for Cross-Domain Intelligent Fault Diagnosis of Rolling Bearings

1 College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310023, China
2 Key Laboratory of Special Purpose Equipment and Advanced Processing Technology, Ministry of Education and Zhejiang Province, Zhejiang University of Technology, Hangzhou 310023, China
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Intelligent Fault Diagnosis of Rotating Machinery

Abstract

Deep learning-based methods have recently been broadly applied in fault diagnosis, but most existing studies assume that source domain and target domain data follow the same distribution. Because differences in operating conditions degrade diagnosis performance, domain adaptation has been introduced to bridge the distribution gap. However, most existing approaches assume that source domain labels are available for all health conditions during training, which is incompatible with the actual industrial situation. To this end, this paper proposes semi-supervised adversarial transfer networks for cross-domain intelligent fault diagnosis of rolling bearings. Firstly, the Gramian Angular Field method is introduced to convert time domain vibration signals into images. Secondly, a semi-supervised learning-based label generating module is designed to generate artificial labels for unlabeled images. Finally, a dynamic adversarial transfer network is proposed to extract the domain-invariant features of all signal images and provide reliable diagnosis results. Two case studies were conducted on public rolling bearing datasets to evaluate the diagnostic performance. An experiment under variable operating conditions and an experiment with different numbers of source domain labels were carried out to verify the generalization and robustness of the proposed approach, respectively. Experimental results demonstrate that the proposed method achieves high diagnosis accuracy on cross-domain tasks with deficient source domain labels, which may make it more feasible in engineering applications than conventional methodologies.

1. Introduction

Complex rotating machinery is widely deployed in critical engineering fields such as aerospace, automobile manufacturing, rail transit, etc. [1,2]. Rolling bearings are considered one of the most essential elements of rotating machinery; thus, an inconspicuous failure of the bearing can lead to the destruction of the entire machine. Specifically, more than 30% of rotating machinery failures are attributed to the deterioration of bearings [3]. Consequently, detecting early rolling bearing faults in time is crucial to preventing accidents [4,5].
To accurately detect faults, data-driven approaches have recently been introduced into this field due to their powerful ability to construct diagnosis models from condition monitoring data without much expert domain knowledge [6]. Among them, fault diagnosis methods using deep neural networks (DNNs), such as deep convolutional networks (DCNs) [7], deep auto-encoders (DAEs) [8], and deep residual networks (DRNs) [9], have attracted increasing attention. However, most of them are developed based on the key assumption that the labeled source domain data and unlabeled target domain data follow the same distribution. In practical industrial scenarios, differences in operating conditions, assembly errors, and noise cause the data distributions to differ, which degrades cross-domain diagnosis performance. This obstacle to transferring learned features from the source domain to the target domain is called the domain shift problem [10].
Specifically, the source-domain classifier can only be trained effectively on data from the source domain. When it is applied to the target domain, its performance is degraded by the domain shift, which makes the diagnosis task unattainable. Figure 1 shows the domain shift phenomenon.
Figure 1. Description of domain shift and domain adaptation.
To address the above problems, scholars have developed domain adaptation (DA) methods for diminishing the discrepancy between sample distributions [11]. Concretely, DA utilizes domain-invariant features extracted from the source and target domains to accomplish a diagnosis task. Meanwhile, many recent studies have shown that DA exhibits encouraging performance on cross-domain tasks, including semantic correlation transfer [12], handwritten text recognition [13], speaker recognition [14], etc.
Although DA is effective in solving cross-domain problems, most of the existing methods assume that source-domain labels are available for each health state, i.e., all source domain data should be labeled [15,16,17,18]. However, this assumption is rarely practical in real-world applications. On the one hand, for large rotating machinery, it is impractical to perform massive and detailed full-cycle testing in the field, because a lot of time must be spent to obtain reliable fault status signals. On the other hand, due to the complexity and uncertainty of mechanical systems, even if vibration signals can be collected in advance, most of the signal labels are unknown. Consequently, source domain labels are insufficient for domain adaptation, resulting in poor detection performance. Fortunately, a small number of labeled signals are easily available, and they can be applied to adaptive fault detection.
To overcome the above drawbacks, a semi-supervised adversarial transfer network for cross-domain diagnosis is proposed in this article. Unlike most existing research, the proposed method does not require a sufficient number of source domain labels for network training. First, a signal-to-image method is adopted to convert vibration signals to images by employing the Gramian Angular Field (GAF). Next, a generative label extension module is designed to generate artificial labels to address the shortage of source domain labels. Then, the artificially labeled data and the labeled data are fed into the dynamic adversarial transfer network for domain-adaptive health state detection. In this way, even if most source domain signals lack corresponding labels, the proposed method can still achieve promising performance in dealing with cross-domain tasks. The contributions of this paper are summarized below:
  • A novel adversarial domain adaptation network is proposed to recognize the health conditions of rolling bearings. In addition to adopting a signal processing method to provide a more comprehensive characterization for feature extraction, an adversarial domain adaptation approach is introduced to align the cross-domain data distributions, which benefits fault diagnosis under significant changes in the working conditions.
  • A semi-supervised learning (SSL)-based label generating module is designed to address the issue of insufficient source domain labels. Generalized features are extracted using consistency regularization and pseudo-labeling, and the artificial labels are then determined by their predicted probabilities. Both techniques are applied in source domain label generation for intelligent fault detection.
The remainder of this paper begins with a review of related works in Section 2. The proposed intelligent fault detection approach is presented in Section 3. Experimental validation of the proposed method is reported in Section 4. Section 5 concludes the paper.

3. Proposed Method

3.1. Problem Formulation

Generally, this study was conducted under the following assumptions:
  • Due to the diversity in operating conditions, the source domain and the target domain are interrelated but follow distinct distributions.
  • The purpose of intelligent fault diagnosis in various domains is consistent.
  • The source-domain data (whether labeled or unlabeled) and all unlabeled data from the target domain can be employed to train networks.
  • During training, not only are signals from the target domain unlabeled, but most signals from the source domain are also unlabeled.
The above assumptions simulate the actual industrial situation where signal labels are insufficient, i.e., a large amount of vibration signals can be collected, but only a small part of them have health condition labels. Existing methods cannot obtain satisfactory diagnosis results.
Let $X$ denote the input signal space and $Y = \{1, 2, \ldots, N_c\}$ represent the set of $N_c$ health conditions, so that any sample $(x_i, y_i)$ satisfies $x_i \in X$ and $y_i \in Y$. A partially unlabeled source domain $D_s$ of $n_s$ samples and an unlabeled target domain $D_t$ of $n_t$ samples are available. $D_s$ and $D_t$ can be formulated as
$$D_s = \{(x_i^s,\ y_i^s = C_h)\}_{i=1}^{n_{su}} \cup \{(x_i^s,\ y_i^s)\}_{i=n_{su}+1}^{n_{sl}}$$
$$D_t = \{(x_j^t,\ y_j^t = 0)\}_{j=1}^{n_t}$$
where $(x_i^s, y_i^s)$ and $(x_j^t, y_j^t)$ denote the samples $(x_i, y_i)$ in $D_s$ and $D_t$, respectively; $C_h$ denotes the labeled machine health conditions; $n_{su}$ and $n_{sl}$ are the numbers of unlabeled and labeled source domain samples, respectively; and $n_s = n_{su} + n_{sl}$. Similarly, $n_t$ denotes the number of target domain samples. $D_s$ and $D_t$ are sampled from the joint distributions $P_s(X, Y)$ and $P_t(X, Y)$, with $P_s \neq P_t$. In particular, since the label spaces of the source and target domains are identical in this paper, $D_s$ and $D_t$ are considered as being sampled from different marginal distributions $P_s(X)$ and $P_t(X)$.
The purpose of this paper is to generate, based on $D_s$, a source domain pseudo-label data set $D_s^{Pseudo}$ that covers the $n_{su}$ unlabeled signals. $D_s^{Pseudo}$ is defined with the following formula:
$$D_s^{Pseudo} = \{(x_i^s,\ y_i^s \in C_h)\}_{i=1}^{n_{su}} \cup \{(x_i^s,\ y_i^s)\}_{i=n_{su}+1}^{n_{sl}}$$
Afterwards, feature extractors and domain discriminators are constructed for minimizing the feature extractor loss and maximizing the discriminator loss so as to retrieve generalization features and reduce the network loss.

3.2. The Hybrid Networks

Aiming at the dilemma of fault detection under diverse operating conditions in the absence of sufficient labeled data, this study proposes a semi-supervised adversarial transfer network for cross-domain diagnosis. Firstly, a signal-to-image conversion method for 1D signal processing is presented. Secondly, a generative label extension network based on SSL is designed to predict and generate artificial labels. Finally, a dynamic adversarial transfer network is proposed for domain adaptation feature knowledge extraction and fault detection classification. An overview of the proposed condition monitoring approach is presented in Figure 3.
Figure 3. The flow diagram of the proposed approach.

3.2.1. Stage 1: Signal Processing

Deep learning has advanced considerably in intelligent condition diagnosis, but it fails to preserve temporal dependency entirely when processing time domain vibration signals, which leads to a loss of signal information. Furthermore, existing sequence networks such as recurrent neural networks are difficult to train; hence, it is hard to construct accurate monitoring models. Converting time domain vibration signals into a 2D image dataset through the GAF not only enriches signal characteristics by representing vibration signals as pixels, but also establishes a bijective mapping between the 1D vibration signals and the 2D space to ensure the integrity of information.
An ordinary inner product cannot retain both observed values during the conversion, because it collapses them into a single value. To avoid this loss of information, a Gram-like matrix is generated by calculating $x \oplus y = \cos(\theta_1 + \theta_2)$, where $\theta_1$ and $\theta_2$ are the angles that encode $x$ and $y$. The original values of the scaled time series then lie along the diagonal, so the 1D vibration signals can be approximately reconstructed from the high-level features extracted by deep learning, and the absolute temporal relationship is maintained [50].
Therefore, it is crucial to convert 1D time series signals into 2D images efficiently [51], and the GAF is adopted in this study to convert 1D time series into RGB images. Given a time series $X = \{x_1, x_2, \ldots, x_n\}$ of $n$ real-valued observations, $X$ is first rescaled so that all values fall in the interval $[-1, 1]$ or $[0, 1]$ by:
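$$\tilde{x}_{-1}^{i} = \frac{(x_i - \max(X)) + (x_i - \min(X))}{\max(X) - \min(X)}$$
$$\tilde{x}_{0}^{i} = \frac{x_i - \min(X)}{\max(X) - \min(X)}$$
where $\tilde{x}_{-1}^{i}$ and $\tilde{x}_{0}^{i}$ represent the values of $x_i$ rescaled to the intervals $[-1, 1]$ and $[0, 1]$, respectively.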
The rescaled time series $\tilde{X}$ can then be represented in polar coordinates by encoding each value as the angular cosine and the time stamp as the radius, and the corresponding formula is as follows:
$$\phi = \arccos(\tilde{x}_i), \quad -1 \le \tilde{x}_i \le 1, \quad \tilde{x}_i \in \tilde{X}; \qquad r = \frac{t_i}{N}, \quad t_i \in \mathbb{N} \tag{8}$$
where t i is the time stamp and N denotes a constant factor to regularize the span of the polar coordinate system.
The encoding map of Equation (8) has several advantages. First, since it is bijective, the original time series can be reconstructed by using $\phi$ and $r$. Second, the time dependence in the original time series is preserved through the coordinate $r$.
Additionally, it can be seen from Equation (8) that values in $[-1, 1]$ correspond to angles in $[0, \pi]$, over which the cosine decreases monotonically. Likewise, values in $[0, 1]$ correspond to angles in $[0, \pi/2]$. By calculating the cosine of the angle sum or the sine of the angle difference between each pair of points, the GAF can generate two different images, i.e., the Gramian angular summation field (GASF) and the Gramian angular difference field (GADF). The two are, respectively, represented by the following formulas:
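$$GASF = \left[\cos(\phi_i + \phi_j)\right] = \tilde{X}' \cdot \tilde{X} - \sqrt{I - \tilde{X}^2}\,' \cdot \sqrt{I - \tilde{X}^2}$$
$$GADF = \left[\sin(\phi_i - \phi_j)\right] = \sqrt{I - \tilde{X}^2}\,' \cdot \tilde{X} - \tilde{X}' \cdot \sqrt{I - \tilde{X}^2}$$
where $I$ denotes the unit row vector (1, 1, ···, 1) and the prime denotes the transpose.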
In short, the time series $X$ is first rescaled, then encoded into the polar coordinate system using Equation (8), and finally converted into an image with the GASF or GADF formula. The diagram of the above transformation is presented in Figure 4. In this study, we use the GASF to convert 1D vibration signals into images.
Figure 4. Diagram of the GAF conversion process.
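As a minimal NumPy sketch of the conversion described above (not the authors' implementation), the functions below rescale a segment to [-1, 1], encode it as angles, and build the GASF and GADF matrices. The names `rescale_to_unit` and `gramian_angular_fields` are hypothetical, the clipping step is an added guard against floating-point rounding, and mapping the resulting matrix to RGB pixels (for example with a colormap) is left out.

```python
import numpy as np


def rescale_to_unit(x):
    """Min-max rescale a 1D series to [-1, 1] (first rescaling formula)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    # Assumes the series is not constant; otherwise the denominator is zero.
    return ((x - x_max) + (x - x_min)) / (x_max - x_min)


def gramian_angular_fields(x):
    """Return the (GASF, GADF) matrices of a 1D series."""
    x_tilde = np.clip(rescale_to_unit(x), -1.0, 1.0)  # guard against rounding
    phi = np.arccos(x_tilde)                          # angular encoding
    gasf = np.cos(phi[:, None] + phi[None, :])        # cos(phi_i + phi_j)
    gadf = np.sin(phi[:, None] - phi[None, :])        # sin(phi_i - phi_j)
    return gasf, gadf


# Illustrative input: a synthetic 512-point segment, not CWRU data.
segment = np.sin(np.linspace(0.0, 8.0 * np.pi, 512))
gasf, gadf = gramian_angular_fields(segment)
```

A segment of n points yields an n × n matrix. The main diagonal of the GASF equals $2\tilde{x}_i^2 - 1$, which is why the rescaled series can be recovered from the image.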

3.2.2. Stage 2: Label Generation

In traditional fault diagnosis methods, the accuracy of the diagnosis model depends heavily on the quantity of labeled data. Cross-domain diagnostic tasks cannot be performed well if there are sufficient signals but not enough labels [52]. SSL provides an effective approach to reducing the dependence on labeled data by exploiting unlabeled data [53]. Therefore, this paper introduces an SSL-based generative label extension network, which utilizes unlabeled images to predict and generate artificial labels. The generated labels can be used for subsequent transfer learning to improve the accuracy of diagnosis models.
First, we define $X = \{(x_b, p_b) : b \in (1, 2, \ldots, B)\}$ as a batch of $B$ labeled samples for an $L$-class classification problem, where $x_b$ are the training examples and $p_b$ denotes their one-hot labels. Then, let $U = \{u_b : b \in (1, 2, \ldots, \mu B)\}$ be a batch of $\mu B$ unlabeled examples, where $\mu$ is a hyperparameter that determines the relative sizes of $X$ and $U$. $p_m(y \mid x)$ is the predicted class distribution produced by the model for input $x$. Additionally, we define $H(p, q)$ as the cross-entropy between two probability distributions $p$ and $q$. Finally, two types of augmentations are leveraged in the proposed method: strong and weak, denoted by $\mathcal{A}(\cdot)$ and $\alpha(\cdot)$, respectively. In this work, weak augmentation uses a standard flip-and-shift strategy, while strong augmentation first applies RandAugment [54] or CTAugment [55] and then CutOut [56], producing severely distorted versions of the input images.
The training process consists of two parts, as depicted in Figure 5: supervised training and unsupervised training. Labeled data are used for supervised training, in which a conventional classification task is carried out to reduce the supervised loss. For unlabeled data, the weakly augmented versions are fed to the model to obtain predictions. When the predicted probability of a class exceeds the threshold, that class is taken as the pseudo-label. Next, this pseudo-label is used to supervise the output for a strongly augmented version of the same image.
Figure 5. The flowchart of label generative module.
The detailed training process is as follows:
Step 1.
Input:
Prepare the labeled batch $X = \{(x_b, p_b) : b \in (1, 2, \ldots, B)\}$, the unlabeled batch $U = \{u_b : b \in (1, 2, \ldots, \mu B)\}$, and the unlabeled data ratio $\mu$.
Step 2.
Supervised training:
Use the conventional cross-entropy loss $H(p, q)$ for the classification task on labeled data. The supervised loss $\ell_s$ on labeled data is defined as
$$\ell_s = \frac{1}{B} \sum_{b=1}^{B} H\big(p_b,\ p_m(y \mid \alpha(x_b))\big)$$
where $\alpha(\cdot)$ denotes the weak augmentation, which is a random function.
Step 3.
Pseudo-labeling:
Apply weak and strong augmentation to the unlabeled data to obtain two augmented versions of each image. Then feed both versions into the model to acquire predictions, and take the argmax of the weakly augmented prediction as the pseudo-label.
Step 4.
Consistency regularization:
Compute the cross-entropy loss $H(p, q)$ between the prediction for the strongly augmented image and the pseudo-label obtained from the weakly augmented image; the unsupervised loss $\ell_u$ on unlabeled data is defined as
$$\ell_u = \frac{1}{\mu B} \sum_{b=1}^{\mu B} \mathbb{1}\big(\max(q_b) \ge \tau\big)\, H\big(\hat{q}_b,\ p_m(y \mid \mathcal{A}(u_b))\big)$$
where $q_b = p_m(y \mid \alpha(u_b))$, $\hat{q}_b = \arg\max(q_b)$ denotes the pseudo-label, $\tau$ is a scalar hyperparameter denoting the threshold for pseudo-labels, and $\mathbb{1}(\cdot)$ represents the indicator function.
Step 5.
Objective loss function:
The training objective is a weighted sum of the two cross-entropy losses, which can be formulated as:
$$L = \ell_s + \lambda_u \ell_u$$
where $\lambda_u$ denotes the relative weight of the unlabeled data loss.
Step 6.
Label generation:
Feed the unlabeled data into the trained model to generate artificial labels for the subsequent diagnosis task.
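Steps 2 to 5 can be sketched in NumPy as below. This is only an illustration under stated assumptions: the functions operate on class-probability arrays that a model would already have produced, the names (`cross_entropy`, `supervised_loss`, `unsupervised_loss`, `total_loss`) are hypothetical, and the default threshold `tau=0.95` and weight `lambda_u=1.0` are placeholder values, not settings taken from the paper.

```python
import numpy as np


def cross_entropy(p, q, eps=1e-12):
    """Row-wise cross-entropy H(p, q) = -sum_k p_k * log(q_k)."""
    return -np.sum(p * np.log(q + eps), axis=1)


def supervised_loss(one_hot_labels, probs_weak_labeled):
    """l_s: mean cross-entropy on weakly augmented labeled samples."""
    return float(cross_entropy(one_hot_labels, probs_weak_labeled).mean())


def unsupervised_loss(probs_weak, probs_strong, tau=0.95):
    """l_u: thresholded pseudo-label loss on the unlabeled batch.

    tau=0.95 is an illustrative default, not a value from the paper.
    """
    pseudo = probs_weak.argmax(axis=1)         # hard pseudo-labels
    mask = probs_weak.max(axis=1) >= tau       # indicator 1(max(q_b) >= tau)
    one_hot = np.eye(probs_weak.shape[1])[pseudo]
    per_sample = cross_entropy(one_hot, probs_strong)
    # Average over the whole unlabeled batch (mu * B), as in the formula.
    l_u = float(np.sum(mask * per_sample) / len(probs_weak))
    return l_u, pseudo, mask


def total_loss(l_s, l_u, lambda_u=1.0):
    """L = l_s + lambda_u * l_u."""
    return l_s + lambda_u * l_u


# Toy predictions for two unlabeled images and three classes.
q_weak = np.array([[0.97, 0.02, 0.01], [0.50, 0.30, 0.20]])
q_strong = np.array([[0.80, 0.10, 0.10], [0.20, 0.50, 0.30]])
l_u, pseudo, mask = unsupervised_loss(q_weak, q_strong)
```

In the toy batch only the first image passes the threshold, so only it contributes to the unsupervised loss; the second image is ignored until the model becomes confident about it.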

3.2.3. Stage 3: Cross-Domain Diagnosis

Feature extraction is crucial to fault diagnosis. Conventional feature extraction methods are not suitable for cross-domain diagnosis, while adversarial transfer networks can effectively extract domain-invariant features to solve cross-domain problems. Therefore, the adversarial transfer network is combined with the label generation module mentioned in Stage 2 to address the degradation of diagnosis performance due to distribution discrepancy.
Figure 6 shows the architecture of the presented networks. As shown in the figure, the network is composed of a feature extractor $G_f$, a label classifier $G_y$ applied to identify source-domain labels, a global domain discriminator $G_d$ applied to align the marginal distributions of the source and target domains, and $C$ local subdomain discriminators $G_d^c$ ($c \in \{1, 2, \ldots, C\}$) applied to align the conditional distributions of the source and target domains. In detail, labeled source domain data, target domain data, and the pseudo-labeled data from Stage 2 are mixed as the input $x$ of the network, and the high-level features $f$ are extracted by the feature extractor $G_f$. The features $f$ are then fed into the label classifier $G_y$ and the domain discriminators ($G_d$ and $G_d^c$) for adversarial training. Finally, optimal domain-invariant feature extraction is achieved by minimizing the loss $L_y$ and maximizing $L_g$ and $L_l$, where $L_y$ is the classification loss of $G_y$, $L_g$ is the loss of $G_d$, and $L_l$ is the loss of $G_d^c$. The losses $L_y$, $L_g$, and $L_l$ are calculated as:
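$$L_y = -\frac{1}{n_s} \sum_{X_i \in D_s} \sum_{c=1}^{C} P_{X_i \to c} \log G_y\big(G_f(X_i)\big)$$
$$L_g = \frac{1}{n_s + n_t} \sum_{X_i \in D_s \cup D_t} L_d\big(G_d(G_f(X_i)),\ d_i\big)$$
$$L_l = \frac{1}{n_s + n_t} \sum_{c=1}^{C} \sum_{X_i \in D_s \cup D_t} L_d^c\big(G_d^c(\hat{y}_i^c\, G_f(X_i)),\ d_i\big)$$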
where $P_{X_i \to c}$ represents the probability that $X_i$ belongs to class $c$, $\hat{y}_i^c$ denotes the predicted probability that the input sample $X_i$ belongs to class $c$, $L_d$ and $L_d^c$ are the domain losses of the global and local discriminators, and $d_i$ is the domain label of $X_i$. Moreover, the dynamic adversarial factor $\omega$ dynamically weights $G_d$ and $G_d^c$ according to the difference in data distribution in different diagnostic tasks. $\omega$ is defined as:
$$\hat{\omega} = \frac{d_{\mathcal{A},g}(D_s, D_t)}{d_{\mathcal{A},g}(D_s, D_t) + \frac{1}{C} \sum_{c=1}^{C} d_{\mathcal{A},l}(D_s^c, D_t^c)}$$
where $d_{\mathcal{A},g}(D_s, D_t)$ and $d_{\mathcal{A},l}(D_s^c, D_t^c)$ represent the $\mathcal{A}$-distance of the global domain discriminator $G_d$ and the local domain discriminator $G_d^c$, respectively.
Figure 6. The architecture of the dynamic transfer networks. $\hat{y}$ denotes the predicted label; $L_y$ and $L_d$ represent the classification loss and the domain loss; $\hat{d}$ and $\hat{d}^c$ denote the predicted domain labels; GRL denotes the gradient reversal layer.
By integrating all components, the final objective function of the transfer network can be expressed as:
$$L\big(\theta_f, \theta_y, \theta_d, \theta_d^c|_{c=1}^{C}\big) = L_y - \lambda\big((1 - \omega) L_g + \omega L_l\big)$$
In summary, the network maps the features of the source and target domains to the same space by confusing the domain labels. By adaptively measuring the distribution difference between the two domains, the target-domain data are labeled using domain-invariant features, which are indistinguishable between the source and target domains. In addition, domain adaptation and feature extraction are performed simultaneously, and label classification is supervised by source domain labels, target domain labels, and domain labels at the same time.
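The dynamic factor and the combined objective are simple arithmetic once the individual losses and $\mathcal{A}$-distances are known. The sketch below assumes those quantities are given as plain numbers; how the $\mathcal{A}$-distances are estimated from the discriminators is not covered here, and the function names and sample values are hypothetical.

```python
def dynamic_factor(d_global, d_local):
    """omega = d_global / (d_global + mean of the C local A-distances)."""
    d_local_mean = sum(d_local) / len(d_local)
    return d_global / (d_global + d_local_mean)


def transfer_objective(l_y, l_g, l_l, omega, lam=1.0):
    """L = L_y - lam * ((1 - omega) * L_g + omega * L_l)."""
    return l_y - lam * ((1.0 - omega) * l_g + omega * l_l)


# Hypothetical numbers chosen only to exercise the formulas.
omega = dynamic_factor(d_global=1.0, d_local=[0.5, 1.5])
objective = transfer_objective(l_y=1.0, l_g=0.4, l_l=0.6, omega=omega, lam=2.0)
```

With these inputs the global distance equals the mean local distance, so the factor is 0.5 and the global and local discriminator losses are weighted equally.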

3.3. System and Steps of Diagnosis Process

In general, the raw vibration signal is first converted into RGB images, both labeled and unlabeled, using the GAF in Stage 1. Secondly, unlabeled images are fed into the network in Stage 2, which combines supervised and unsupervised learning to generate artificial labels. Thirdly, the adversarial transfer network in Stage 3 adaptively reduces the distribution discrepancy to extract generalized features, so as to deliver accurate fault classification results. In this way, the proposed hybrid networks realize feature transfer and complete intelligent fault identification of the signal.
Step 1.
Collect one-dimensional bearing vibration signals under diverse operating conditions.
Step 2.
Convert the raw signals collected in Step 1 into two-dimensional images using the GAF.
Step 3.
Divide the two-dimensional images into labeled datasets, unlabeled datasets, and validation datasets according to required proportion, and input them into label generative extension network for training.
Step 4.
Feed the unlabeled data into the label generative extension model to obtain the model prediction for label classification.
Step 5.
Divide the dataset mixed with real and artificial labels into training set and validation set and input the training set into the dynamic adversarial transfer network for cross-domain diagnosis.
Step 6.
Calculate the accuracy of the validation set and output the health state detection results of the model.

4. Case Studies

To verify the capability of the intelligent fault detection networks proposed in this article, two practical case studies were carried out. In case study 1, a comparative experiment of diagnosis performance under diverse working conditions was conducted, and case study 2 compared the diagnosis performance of the proposed method with conventional methods under various label proportions.

4.1. Dataset Descriptions

In this case, the proposed approach is evaluated on the public datasets acquired from the Bearing Data Center of Case Western Reserve University (CWRU) [57], which have been studied extensively for rolling bearing fault diagnosis. The vibration data used in this work were collected from the sensor placed on the drive end of the motor, which ran at one of four constant rotating speeds, i.e., 1730, 1750, 1772, and 1797 r/min, under four health conditions, i.e., normal (NO), ball fault (BF), inner race fault (IF), and outer race fault (OF). Each of the three fault types was artificially created with diameters of 7, 14, and 21 mils. The CWRU experimental platform is shown in Figure 7.
Figure 7. The CWRU bearing test rig.
Ten operation patterns (containing one NO and nine faulty conditions) in four rotating speed domains are included in the CWRU datasets. It is worth noting that the CWRU dataset is widely considered as a publicly available dataset for fault detection of rolling bearings due to its better data quality and less noise interference. In this study, seven operation conditions were selected to examine the proposed method according to the experimental requirements.
In the CWRU dataset, the vibration signal was acquired from the sensor at a sampling frequency of 12 kHz for about 10 s, so each record has about 120,000 data points. Because each record is this long, truncating the original signal is critical. Therefore, in this article, a sliding window truncation method is used to generate a dataset for seven operation patterns. As depicted in Figure 8, a truncation window slides along the vibration signal with a shift length of 64 data points, and the window size is 512 data points. Each 512-point window is then converted into a 512 × 512 RGB image. The converted results are presented in Figure 9. For example, BF-07 indicates a ball fault with a fault diameter of 7 mils, IF-14 indicates an inner race fault with a fault diameter of 14 mils, and the rest follow the same pattern.
Figure 8. The sliding window truncation method.
Figure 9. Converted images in this work.
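The sliding window truncation can be sketched as follows, assuming the record is available as a 1D NumPy array. The function name `sliding_windows` is hypothetical, and the synthetic record of exactly 120,000 points only stands in for a real CWRU signal.

```python
import numpy as np


def sliding_windows(signal, window=512, shift=64):
    """Cut a 1D record into overlapping windows (one row per window)."""
    signal = np.asarray(signal)
    n_windows = (len(signal) - window) // shift + 1
    if n_windows <= 0:
        # Record shorter than one window: return an empty batch.
        return np.empty((0, window), dtype=signal.dtype)
    starts = shift * np.arange(n_windows)
    index = starts[:, None] + np.arange(window)[None, :]
    return signal[index]


# Synthetic stand-in for a record of exactly 120,000 samples.
record = np.arange(120_000)
segments = sliding_windows(record)
```

A record of exactly 120,000 points yields (120,000 − 512) / 64 + 1 = 1868 overlapping windows, each of which would then be converted into one image. How the 600 images per class used in the experiments were chosen from such windows is not specified in this section.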

4.2. Case Study 1: Variable Operating Conditions Transfer Experiment

Utilizing the sliding window truncation method and the GAF, the raw vibration signal is processed into datasets under four different load scenarios (0, 1, 2, 3 HP). The dataset of each scenario contains seven health states, and each state has 600 images. Thus, there are a total of 4200 images for each operating condition, and each dataset is randomly divided into a training dataset and a testing dataset according to a ratio of 8:2. Detailed descriptions of the dataset are presented in Table 2.
Table 2. The CWRU dataset details for each operating condition.
In this case, the proposed method is evaluated on 12 variable working condition transfer tasks, i.e., $T_{01}$, $T_{02}$, $T_{03}$, $T_{10}$, $T_{12}$, $T_{13}$, $T_{20}$, $T_{21}$, $T_{23}$, $T_{30}$, $T_{31}$, and $T_{32}$. For example, $T_{01}$ represents a transfer task with the 0 HP load as the source domain and the 1 HP load as the target domain, and the other task symbols follow the same pattern. Part of the transfer details are shown in Table 3, where the labeled proportion represents the proportion of labeled images in the training dataset.
Table 3. Partial details of fault diagnosis tasks in this case.
In this experiment, there are seven types of images, totaling 4200 images fed into the hybrid network, of which 3360 images were divided into training dataset and 840 images were divided into testing dataset. Then, half of the training data are randomly selected as labeled data and the remaining training data are treated as unlabeled data. The labeled data and unlabeled data are sent to the label generation model for training, and the artificial label prediction results of unlabeled data are output. Next, the labeled data and the artificially labeled data are imported together into the dynamic adversarial transfer network for training until the network converges. Finally, an accurate cross-domain fault diagnosis model is obtained. To illustrate effectiveness, the results of different transfer tasks are compared. In addition, average identification accuracy is selected as the model measurement standard; the accuracy is the average value of 10 rounds of experiments. Table 4 shows the comparison results.
Table 4. Comparison results.
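The split sizes quoted above follow directly from the 8:2 ratio and the labeled proportion. The sketch below reproduces that bookkeeping with a plain random permutation; the function name `split_indices` and the fixed seed are assumptions, and the split is not stratified by class because the text only states that the data are selected randomly.

```python
import numpy as np


def split_indices(n_images=4200, train_ratio=0.8, labeled_fraction=0.5, seed=0):
    """Return (labeled, unlabeled, test) index arrays for one domain."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_images)
    n_train = int(round(n_images * train_ratio))
    train, test = perm[:n_train], perm[n_train:]
    n_labeled = int(round(n_train * labeled_fraction))
    return train[:n_labeled], train[n_labeled:], test


labeled, unlabeled, test = split_indices()
```

With the default arguments this gives 1680 labeled, 1680 unlabeled, and 840 test images out of 4200, which matches the counts in the text.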
It can be seen from Table 4 that the proposed method maintains an average accuracy of more than 99.17% in fault detection under diverse operating conditions, which indicates that the designed approach can solve the seven-class classification problem effectively. To sum up, this method maintains high accuracy of cross-domain label recognition under various working conditions even when some source-domain labels are absent, which demonstrates the effectiveness of the fault transfer diagnosis network proposed in this article.
Taking task $T_{31}$ as an example, the accuracy and loss curves of the model are shown in Figure 10. As the number of training epochs increased, the accuracy of the fault label classifier improved continuously, approached 99% within 50 epochs, and then remained stable. Meanwhile, the testing loss gradually declined until it approached 0.0087 within about 40 epochs, and then remained stable. From the result, it can be inferred that after sufficient training epochs, the proposed network can effectively extract domain-invariant features of vibration signals and accurately perform cross-domain detection, which indicates that the designed diagnosis network is reasonable.
Figure 10. Training accuracy and loss curves.
Next, additional evaluation metrics were examined. In addition to accuracy, precision, recall, and F1 score were selected to evaluate the validity of the fault detections. Three representative tasks, $T_{01}$, $T_{12}$, and $T_{23}$, were used as examples for analysis, and the results are shown in Figure 11. It can be observed that all evaluation metrics of the three tasks are above 99.2%, decreasing in the order $T_{01}$, $T_{23}$, $T_{12}$. Specifically, the precision, recall, and F1 score of each task were relatively stable and close to the corresponding accuracy. Satisfactory transfer performance between domains was achieved using the proposed networks, which demonstrates that they are suitable for domain adaptation with high robustness.
Figure 11. The precision, recall, and F1 score for $T_{01}$, $T_{12}$, and $T_{23}$.
Furthermore, Figure 12 presents the fault diagnosis result of $T_{21}$, obtained from ten repeated runs, as a confusion matrix. The vertical axis of the confusion matrix denotes the true label, and the horizontal axis denotes the predicted label. From the figure, it can be seen that the average detection accuracy of the seven health states is 99.7%, so the diagnosis performance is satisfactory. Specifically, IF-07, OF-07, BF-14, and NO all have 100% detection accuracy, which means that none of their samples were assigned to other classes, although BF-14 received some misclassified samples from IF-14. BF-07 and OF-14 are occasionally confused with each other. BF-07 has the lowest accuracy, 98.8%, with most of its errors assigned to OF-14. The results indicate that even though half of the training data lack health condition labels, the proposed approach can still achieve precise diagnosis of seven health conditions in the target domain. Thus, the effectiveness of the proposed approach for domain adaptation detection is verified in the presence of insufficient labels.
Figure 12. Confusion matrix of fault detection results.
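Accuracy, precision, recall, and F1 score can all be read off a confusion matrix laid out as described above (rows for true labels, columns for predicted labels). The following is a minimal sketch with a hypothetical function name and a made-up two-class matrix; it assumes every class is predicted at least once, otherwise the precision division would hit zero.

```python
import numpy as np


def per_class_metrics(confusion):
    """Accuracy plus per-class precision, recall, and F1 from a confusion matrix.

    Rows are true labels and columns are predicted labels.
    """
    cm = np.asarray(confusion, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)   # column sums: samples predicted as each class
    recall = tp / cm.sum(axis=1)      # row sums: samples truly in each class
    f1 = 2.0 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1


# Made-up 2-class matrix used only to illustrate the calculation.
toy_cm = [[8, 2],
          [1, 9]]
accuracy, precision, recall, f1 = per_class_metrics(toy_cm)
```

Averaging the per-class values (macro averaging) gives a single precision, recall, and F1 score per task; whether the reported figures use macro or another averaging scheme is not stated in this section.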
Furthermore, Figure 13 shows the ROC curves of the fault diagnosis result of task $T_{23}$. As can be seen from the figure, all the fault classification curves coincide except for some deviation in the OF-14 curve, and the IF-07 and OF-07 curves overlap with the NO curve. This means that the seven bearing health states can be diagnosed almost perfectly by the proposed networks, which further supports their validity.
Figure 13. The ROC curve of fault diagnosis result.

4.3. Case Study 2: Insufficient Source Domain Label Transfer Experiment

To highlight the effectiveness of the proposed networks in fault detection with few labeled samples, this experiment further widens the gap between the numbers of labeled and unlabeled data in the source domain. For the vibration signal data collected at the sampling frequency of 12 kHz, 1/2, 1/4, 1/8, and 1/16 of the source domain images are selected as labeled data, and the rest are unlabeled data. Moreover, the diagnostic tasks $T_{01}$, $T_{12}$, $T_{23}$, $T_{21}$, and $T_{31}$ are performed for experimental evaluations.
In this case study, experiments were conducted to compare the proposed networks with three other commonly used transfer learning models, namely deep adaptation network (DAN), Deep CORAL, and DANN. The structure of the CNN feature extractor used for comparison is the same as that of the proposed networks, and each group of trials was performed 10 times and averaged.
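Averaging repeated trials, as described above, can be sketched with the standard library. The accuracy values and the function name are hypothetical placeholders used only to show the computation, and reporting the standard deviation alongside the mean is an addition of this sketch rather than something the source states.

```python
from statistics import mean, stdev

# Illustrative sketch: summarize 10 repeated trials of one method on one task.
# The accuracies below are hypothetical placeholders, not the paper's results.


def summarize_trials(accuracies):
    """Return the mean and sample standard deviation of repeated-trial accuracies."""
    return mean(accuracies), stdev(accuracies)


trial_accuracies = [0.97, 0.98, 0.99, 0.98, 0.97, 0.99, 0.98, 0.98, 0.99, 0.97]
average, spread = summarize_trials(trial_accuracies)
```

The mean is the kind of value that would populate one cell of a comparison table, while the spread gives a rough sense of how stable a method is across repeated runs.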
The diagnosis accuracies of the proposed method were compared with those of other popular transfer methods, and the comparison results are shown in Table 5. The results show that the proposed networks can still identify fault labels with an average accuracy of 98.32%, even when only one sixteenth of the source samples are labeled. Moreover, compared to the other transfer models, the accuracy of the proposed model remains stable as the labeled proportion changes. This illustrates the robustness of the proposed method in extreme cases. In addition, the proposed networks produce higher detection accuracy than the other diagnosis models, and their accuracy is always higher than that of DANN. This is because the proposed model dynamically adjusts the relative weight of the marginal distribution and the conditional distribution when extracting domain-invariant features, while DANN only considers the adaptation of the marginal distribution.
Table 5. Average testing accuracies in the fault detection of case study 2.
Figure 14 shows that as the amount of labeled data in the source domain decreases, the proposed model still maintains high accuracy, while the other transfer learning methods exhibit overfitting. It also indicates that when the labeled data are insufficient, the proposed method can effectively alleviate overfitting and provide high-accuracy results. Additionally, the adversarial transfer networks are shown to make effective use of the limited labeled data.
Figure 14. Comparison of contrast test accuracy.
To visually present the training results of the different transfer models, taking the 1/16 experiment as an example, t-SNE dimensionality reduction was applied to the last hidden layer of DAN, Deep CORAL, DANN, and the proposed networks, respectively. The features were projected into 2D space, and the resulting visualization is shown in Figure 15.
Figure 15. Visualizations of the learned feature representations in the fully connected layer in the feature extractor: (a) DAN; (b) Deep CORAL; (c) DANN; (d) Proposed.
Figure 15 shows that all four methods can separate the fault classes from the original distribution, but the DAN network and the Deep CORAL network cannot completely overcome the difference in the marginal distribution of the features, and they are prone to negative transfer when few labeled data are available. In contrast, the approach proposed in this study projects samples of each class into the same region, demonstrating clear clustering and separability. This shows that the proposed networks can reliably detect operating states across domains when few labeled data are available.

5. Conclusions

Aiming at the domain shift problem in the case of insufficient source domain labels, this paper proposed semi-supervised adversarial transfer networks for cross-domain fault detection. First, vibration signals were converted into images using the signal-to-image method. Second, a semi-supervised generation module was designed to generate artificial labels for unlabeled images, so as to address the shortage of source domain labels. Finally, the adversarial transfer networks were introduced to extract domain-invariant features across domains and perform fault detection. Two case studies were conducted on public rolling bearing datasets to validate the performance of the proposed approach. The analysis results show that the proposed networks are able to reduce the distribution discrepancy and extract generalized features for domain adaptation detection. Meanwhile, the generalization and robustness of the proposed networks were demonstrated by a cross-domain transfer experiment and an insufficient source-domain label experiment. Therefore, the findings of this research have considerable potential in practical applications, since comprehensive experimental data with health status labels are generally hard to acquire in real industrial settings.
Although the proposed approach is effective for common cross-domain diagnosis problems, diagnostic accuracy degrades dramatically when the training data are unbalanced across health states. To address this issue, future research will focus on accurate and efficient fault detection in the presence of unbalanced health state datasets.

Author Contributions

Methodology, validation, formal analysis, writing—original draft, W.W.; conceptualization, Y.L., J.W., and W.W.; investigation, J.W.; writing—review and editing, J.W. and Y.L.; funding acquisition, B.P. and J.W.; supervision, B.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Major Projects of Zhejiang Provincial Natural Science Foundation of China (Grant No. LD22E050009), the Shaoxing Municipality Science and Technology Projects of “JieBangGuaShuai” (Grant 2021B41006), and the Postdoctoral Science Preferential Funding of Zhejiang Province, China (Grant 273426).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Glowacz, A.; Glowacz, W.; Kozik, J.; Piech, K.; Gutten, M.; Caesarendra, W.; Liu, H.; Brumercik, F.; Irfan, M.; Faizal Khan, Z. Detection of Deterioration of Three-phase Induction Motor using Vibration Signals. Meas. Sci. Rev. 2019, 19, 241–249. [Google Scholar] [CrossRef]
  2. Caesarendra, W.; Tjahjowidodo, T. A Review of Feature Extraction Methods in Vibration-Based Condition Monitoring and Its Application for Degradation Trend Estimation of Low-Speed Slew Bearing. Machines 2017, 5, 21. [Google Scholar] [CrossRef]
  3. Wang, Z.; Wang, J.; Cai, W.; Zhou, J.; Du, W.; Wang, J.; He, G.; He, H. Application of an Improved Ensemble Local Mean Decomposition Method for Gearbox Composite Fault Diagnosis. Complexity 2019, 2019, 1–17. [Google Scholar] [CrossRef]
  4. Bazan, G.H.; Goedtel, A.; Castoldi, M.F.; Godoy, W.F.; Duque-Perez, O.; Morinigo-Sotelo, D. Mutual Information and Meta-Heuristic Classifiers Applied to Bearing Fault Diagnosis in Three-Phase Induction Motors. Appl. Sci. 2020, 11, 314. [Google Scholar] [CrossRef]
  5. Sierra-Alonso, E.F.; Caicedo-Acosta, J.; Orozco Gutiérrez, Á.Á.; Quintero, H.F.; Castellanos-Dominguez, G. Short-Time/-Angle Spectral Analysis for Vibration Monitoring of Bearing Failures under Variable Speed. Appl. Sci. 2021, 11, 3369. [Google Scholar] [CrossRef]
  6. Li, X.; Zhang, W.; Ding, Q. Cross-Domain Fault Diagnosis of Rolling Element Bearings Using Deep Generative Neural Networks. IEEE Trans. Ind. Electron. 2019, 66, 5525–5534. [Google Scholar] [CrossRef]
  7. Li, X.; Jiang, H.; Liu, S.; Zhang, J.; Xu, J. A unified framework incorporating predictive generative denoising autoencoder and deep Coral network for rolling bearing fault diagnosis with unbalanced data. Measurement 2021, 178, 109345. [Google Scholar] [CrossRef]
  8. Krysander, M.; Nyberg, M. Structural Analysis for Fault Diagnosis of Dae Systems Utilizing Mss Sets. IFAC Proc. Vol. 2002, 35, 143–148. [Google Scholar] [CrossRef]
  9. Ma, S.; Chu, F.; Han, Q. Deep residual learning with demodulated time-frequency features for fault diagnosis of planetary gearbox under nonstationary running conditions. Mech. Syst. Signal Process. 2019, 127, 190–201. [Google Scholar] [CrossRef]
  10. Csurka, G. Domain adaptation for visual applications A comprehensive survey. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 26 July 2017. [Google Scholar]
  11. Long, M.; Cao, Y.; Wang, J.; Jordan, M.I. Learning Transferable Features with Deep Adaptation Networks. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 7–9 July 2015; pp. 97–105. [Google Scholar]
  12. Zhao, Y.; Li, S.; Zhang, R.; Liu, C.H.; Cao, W.; Wang, X.; Tian, S. Semantic Correlation Transfer for Heterogeneous Domain Adaptation. IEEE Trans. Neural Netw. Learn. Syst. 2022, 1–13. [Google Scholar] [CrossRef]
  13. Hu, S.; Wang, Q.; Huang, K.; Wen, M.; Coenen, F. Retrieval-based language model adaptation for handwritten Chinese text recognition. Int. J. Doc. Anal. Recognit. (IJDAR) 2022. [Google Scholar] [CrossRef]
  14. Wang, Z.; Hansen, J.H.L. Multi-Source Domain Adaptation for Text-Independent Forensic Speaker Recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 2022, 30, 60–75. [Google Scholar] [CrossRef]
  15. Han, T.; Zhou, T.; Xiang, Y.; Jiang, D. Cross-machine intelligent fault diagnosis of gearbox based on deep learning and parameter transfer. Struct. Control Health Monit. 2021, 29, e2898. [Google Scholar] [CrossRef]
  16. Zhang, L.; Guo, L.; Gao, H.; Dong, D.; Fu, G.; Hong, X. Instance-based ensemble deep transfer learning network: A new intelligent degradation recognition method and its application on ball screw. Mech. Syst. Signal Process. 2020, 140, 106681. [Google Scholar] [CrossRef]
  17. Han, T.; Li, Y.-F. Out-of-distribution detection-assisted trustworthy machinery fault diagnosis approach with uncertainty-aware deep ensembles. Reliab. Eng. Syst. Saf. 2022, 226, 108648. [Google Scholar] [CrossRef]
  18. Guo, L.; Yu, Y.; Duan, A.; Gao, H.; Zhang, J. An unsupervised feature learning based health indicator construction method for performance assessment of machines. Mech. Syst. Signal Process. 2022, 167, 108573. [Google Scholar] [CrossRef]
  19. Sugumaran, V.; Ramachandran, K.I. Automatic rule learning using decision tree for fuzzy classifier in fault diagnosis of roller bearing. Mech. Syst. Signal Process. 2007, 21, 2237–2247. [Google Scholar] [CrossRef]
  20. Shi, Q.; Zhang, H. Fault Diagnosis of an Autonomous Vehicle With an Improved SVM Algorithm Subject to Unbalanced Datasets. IEEE Trans. Ind. Electron. 2021, 68, 6248–6256. [Google Scholar] [CrossRef]
  21. Zhang, N.; Wu, L.; Yang, J.; Guan, Y. Naive Bayes Bearing Fault Diagnosis Based on Enhanced Independence of Data. Sensors 2018, 18, 463. [Google Scholar] [CrossRef]
  22. Guo, L.; Yu, Y.; Gao, H.; Feng, T.; Liu, Y. Online Remaining Useful Life Prediction of Milling Cutters Based on Multisource Data and Feature Learning. IEEE Trans. Ind. Inform. 2022, 18, 5199–5208. [Google Scholar] [CrossRef]
  23. Hoang, D.-T.; Kang, H.-J. A survey on Deep Learning based bearing fault diagnosis. Neurocomputing 2019, 335, 327–335. [Google Scholar] [CrossRef]
  24. Shen, F.; Chen, C.; Yan, R.; Gao, R.X. Bearing fault diagnosis based on SVD feature extraction and transfer learning classification. In Proceedings of the Prognostics and Health Management Conference (PHM), Beijing, China, 21–23 October 2015. [Google Scholar]
  25. Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? In Proceedings of the 28th Conference on Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 8–13 December 2014. [Google Scholar]
  26. Pan, S.J.; Tsang, I.W.; Kwok, J.T.; Yang, Q. Domain adaptation via transfer component analysis. IEEE Trans. Neural Netw. Learn. Syst. 2011, 22, 199–210. [Google Scholar] [CrossRef]
  27. Wang, X.; Schneider, J. Flexible transfer learning under support and model shift. In Proceedings of the 28th Conference on Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 8–13 December 2014. [Google Scholar]
  28. Han, T.; Liu, C.; Wu, R.; Jiang, D. Deep transfer learning with limited data for machinery fault diagnosis. Appl. Soft Comput. 2021, 103, 107150. [Google Scholar] [CrossRef]
  29. Azamfar, M.; Li, X.; Lee, J. Intelligent ball screw fault diagnosis using a deep domain adaptation methodology. Mech. Mach. Theory 2020, 151, 103932. [Google Scholar] [CrossRef]
  30. Qian, Q.; Qin, Y.; Luo, J.; Wang, Y.; Wu, F. Deep discriminative transfer learning network for cross-machine fault diagnosis. Mech. Syst. Signal Process. 2023, 186, 109884. [Google Scholar] [CrossRef]
  31. Huang, M.; Yin, J.; Yan, S.; Xue, P. A fault diagnosis method of bearings based on deep transfer learning. Simul. Model. Pract. Theory 2023, 122, 102659. [Google Scholar] [CrossRef]
  32. Long, M.; Zhu, H.; Wang, J.; Jordan, M.I. Deep Transfer Learning with Joint Adaptation Networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017. [Google Scholar]
  33. Xiao, D.; Huang, Y.; Zhao, L.; Qin, C.; Shi, H.; Liu, C. Domain Adaptive Motor Fault Diagnosis Using Deep Transfer Learning. IEEE Access 2019, 7, 80937–80949. [Google Scholar] [CrossRef]
  34. Yang, B.; Lei, Y.; Jia, F.; Li, N.; Du, Z. A Polynomial Kernel Induced Distance Metric to Improve Deep Transfer Learning for Fault Diagnosis of Machines. IEEE Trans. Ind. Electron. 2020, 67, 9747–9757. [Google Scholar] [CrossRef]
  35. Lu, W.; Liang, B.; Cheng, Y.; Meng, D.; Yang, J.; Zhang, T. Deep Model Based Domain Adaptation for Fault Diagnosis. IEEE Trans. Ind. Electron. 2017, 64, 2296–2305. [Google Scholar] [CrossRef]
  36. Zhang, Z.; Chen, H.; Li, S.; An, Z. Sparse filtering based domain adaptation for mechanical fault diagnosis. Neurocomputing 2020, 393, 101–111. [Google Scholar] [CrossRef]
  37. Pandhare, V.; Li, X.; Miller, M.; Jia, X.; Lee, J. Intelligent Diagnostics for Ball Screw Fault Through Indirect Sensing Using Deep Domain Adaptation. IEEE Trans. Instrum. Meas. 2021, 70, 1–11. [Google Scholar] [CrossRef]
  38. Li, X.; Zhang, W. Deep Learning-Based Partial Domain Adaptation Method on Intelligent Machinery Fault Diagnostics. IEEE Trans. Ind. Electron. 2021, 68, 4351–4361. [Google Scholar] [CrossRef]
  39. Zhang, A.; Gao, X. Supervised dictionary-based transfer subspace learning and applications for fault diagnosis of sucker rod pumping systems. Neurocomputing 2019, 338, 293–306. [Google Scholar] [CrossRef]
  40. Ganin, Y.; Ustinova, E.; Ajakan, H. Domain-Adversarial Training of Neural Networks. J. Mach. Learn. Res. 2016, 17, 1–35. [Google Scholar] [CrossRef]
  41. Hu, R.; Zhang, M.; Xu, W. Cross-domain fault diagnosis of rolling element bearings using DCGAN and DANN. J. Vib. Shock 2022, 41, 21–29. [Google Scholar] [CrossRef]
  42. Ghorvei, M.; Kavianpour, M.; Beheshti, M.T.H.; Ramezani, A. Spatial graph convolutional neural network via structured subdomain adaptation and domain adversarial learning for bearing fault diagnosis. Neurocomputing 2023, 517, 44–61. [Google Scholar] [CrossRef]
  43. Guo, L.; Yu, Y.; Liu, Y.; Gao, H.; Chen, T. Reconstruction Domain Adaptation Transfer Network for Partial Transfer Learning of Machinery Fault Diagnostics. IEEE Trans. Instrum. Meas. 2022, 71, 1–10. [Google Scholar] [CrossRef]
  44. Zheng, H.; Wang, R.; Yang, Y.; Yin, J.; Li, Y.; Li, Y.; Xu, M. Cross-Domain Fault Diagnosis Using Knowledge Transfer Strategy: A Review. IEEE Access 2019, 7, 129260–129290. [Google Scholar] [CrossRef]
  45. Yan, R.; Shen, F.; Sun, C.; Chen, X. Knowledge Transfer for Rotary Machine Fault Diagnosis. IEEE Sens. J. 2020, 20, 8374–8393. [Google Scholar] [CrossRef]
  46. Hubel, D.H.; Wiesel, T.N. Receptive fields, binocular interaction and functional architecture in the cat′s visual cortex. J. Physiol. 1962, 160, 106–154. [Google Scholar] [CrossRef]
  47. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  48. Wang, X.; Liu, X.; Wang, J.; Xiong, X.; Bi, S.; Deng, Z. Improved Variational Mode Decomposition and One-Dimensional CNN Network with Parametric Rectified Linear Unit (PReLU) Approach for Rolling Bearing Fault Diagnosis. Appl. Sci. 2022, 12, 9324. [Google Scholar] [CrossRef]
  49. Srivastava, R.K.; Greff, K.; Schmidhuber, J. Highway Networks. In Proceedings of the International Conference on Machine Learning, Lille, France, 7–9 July 2015. [Google Scholar]
  50. Wang, Z.; Oates, T. Imaging Time-Series to Improve Classification and Imputation. In Proceedings of the 1st International Workshop on Social Influence Analysis / 24th International Joint Conference on Artificial Intelligence (IJCAI), Buenos Aires, Argentina, 25–31 July 2015; pp. 3939–3945. [Google Scholar]
  51. Wang, Z.; Zhao, W.; Du, W.; Li, N.; Wang, J. Data-driven fault diagnosis method based on the conversion of erosion operation signals into images and convolutional neural network. Process Saf. Environ. Prot. 2021, 149, 591–601. [Google Scholar] [CrossRef]
  52. Mahajan, D.; Girshick, R.; Ramanathan, V. Exploring the limits of weakly supervised pretraining. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 185–201. [Google Scholar]
  53. Miyato, T.; Maeda, S.I.; Koyama, M.; Ishii, S. Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 1979–1993. [Google Scholar] [CrossRef] [PubMed]
  54. Cubuk, E.D.; Zoph, B.; Shlens, J.; Le, Q.V. RandAugment Practical automated data augmentation with a reduced search space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Electr Network, 14–19 June 2020; pp. 3008–3017. [Google Scholar]
  55. Berthelot, D.; Carlini, N.; Cubuk, E.D.; Kurakin, A.; Sohn, K. ReMixMatch: Semi-Supervised Learning with Distribution Alignment and Augmentation Anchoring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  56. DeVries, T.; Taylor, G.W. Improved Regularization of Convolutional Neural Networks with Cutout. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  57. Smith, W.A.; Randall, R.B. Rolling element bearing diagnostics using the Case Western Reserve University data: A benchmark study. Mech. Syst. Signal Process. 2015, 64–65, 100–131. [Google Scholar] [CrossRef]