# Frequency-Domain Fusing Convolutional Neural Network: A Unified Architecture Improving Effect of Domain Adaptation for Fault Diagnosis


## Abstract


## 1. Introduction

- We design the network architecture for fault diagnosis from the perspective of frequency-domain characteristics of convolution kernels. The motivation for network design has a clear physical meaning.
- For the first time, we use the amplitude-frequency characteristic curve to describe the frequency domain characteristic of the convolution kernels. This provides a new idea for analyzing the physical meaning of the convolution kernels.
- The proposed FFCNN is compatible with various domain adaptation loss functions and can significantly improve the performance of domain adaptation for fault diagnosis without increasing network complexity.
- Dilated convolution is applied to domain adaptation for fault diagnosis; it enlarges the receptive field without increasing the number of parameters.
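The last point can be made concrete with a short numpy sketch (our own illustration, not the paper's code): a dilated kernel covers the receptive field of a much larger dense kernel while keeping its original number of weights.

```python
import numpy as np

def dilated_conv1d(x, w, r):
    """Valid-mode 1-D convolution (cross-correlation form) with dilation rate r."""
    H = len(w)
    span = (H - 1) * r + 1                # effective receptive field of the kernel
    out_len = len(x) - span + 1
    return np.array([np.dot(x[i:i + span:r], w) for i in range(out_len)])

def receptive_field(H, r):
    """Equivalent dense-kernel size of an H-tap kernel with dilation rate r."""
    return (H - 1) * r + 1

# A 15-tap kernel with r = 3 covers 43 samples but still has only 15 weights.
print(receptive_field(15, 3))  # -> 43
```

Dilating a kernel is equivalent to convolving with a zero-stuffed dense kernel, which is why the parameter count stays fixed while the receptive field grows.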

## 2. Related Work

## 3. Background

#### 3.1. Transfer Learning and Domain Adaptation

- Homogeneous transfer learning. The input spaces of the source domain and target domain are similar and the label spaces are the same, expressed as ${\mathcal{X}}^{S}\cap {\mathcal{X}}^{T}\ne \varnothing $ and ${\mathcal{Y}}^{S}={\mathcal{Y}}^{T}$.
- Heterogeneous transfer learning. Both the input spaces and the label spaces may be different, expressed as ${\mathcal{X}}^{S}\cap {\mathcal{X}}^{T}=\varnothing $ or ${\mathcal{Y}}^{S}\ne {\mathcal{Y}}^{T}$.

- Supervised transfer learning. All data in the target domain have labels.
- Semi-supervised transfer learning. Only part of the data in the target domain have labels.
- Unsupervised transfer learning. All data in the target domain have no labels.

#### 3.2. Convolutional Neural Network

#### 3.3. Dilated Convolution

## 4. Motivation

- The convolution kernels can be regarded as a series of filters, each of which suppresses signals in a single frequency band.
- Different dilation rates yield different AFC curves: convolution kernels with a dilation rate $r>1$ have multiple suppression bands, and kernels with higher dilation rates have more suppression bands.
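The suppression-band behavior above can be reproduced numerically: the AFC of a kernel is the magnitude of its discrete-time Fourier transform, and dilating the kernel (inserting $r-1$ zeros between taps) replicates its frequency response, so extra suppression bands appear. A sketch under our own choice of a simple moving-average (low-pass) kernel; function names are ours:

```python
import numpy as np

def dilate_kernel(w, r):
    """Insert r - 1 zeros between the taps of kernel w."""
    wd = np.zeros((len(w) - 1) * r + 1)
    wd[::r] = w
    return wd

def afc(w, n_freq=512):
    """Amplitude-frequency characteristic: |DTFT| of the kernel on [0, 0.5]."""
    freqs = np.linspace(0.0, 0.5, n_freq)      # normalized frequency f_i
    n = np.arange(len(w))
    return np.abs(np.exp(-2j * np.pi * np.outer(freqs, n)) @ w)

w = np.ones(5) / 5.0                 # moving-average kernel: a single low-pass shape
a1 = afc(w)                          # AFC of the r = 1 kernel
a3 = afc(dilate_kernel(w, 3))        # AFC of the same kernel dilated with r = 3
# a3(f) equals a1(3f): the response repeats, so the dilated kernel shows
# several suppression bands where the original kernel had only a few nulls.
```

Plotting `a1` and `a3` against `freqs` reproduces the qualitative picture of the AFC curves discussed above.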

## 5. Proposed Method

#### 5.1. Frequency-Domain Fusing CNN

#### 5.2. Learning Process

**Algorithm 1.** FFCNN back-propagation.

**Input:** Labeled source domain samples ${\left\{({\mathit{x}}_{i}^{S},{\mathit{y}}_{i})\right\}}_{i=1}^{m}$, unlabeled target domain samples ${\left\{{\mathit{x}}_{i}^{T}\right\}}_{i=1}^{m}$, regularization parameter $\lambda$, learning rate $\eta$, dilation rates $\{{r}_{1},{r}_{2},{r}_{3}\}$.

**Output:** Network parameters $\left\{{\theta}_{{r}_{j}}^{conv1},{\theta}^{conv2},{\theta}^{fc},{\theta}^{clf}\right\}$ and predicted labels for the target domain samples.

**Begin:**

1. Initialize $\left\{{\theta}_{{r}_{j}}^{conv1},{\theta}^{conv2},{\theta}^{fc},{\theta}^{clf}\right\}$.
2. **while** the stopping criterion is not met **do**
3. **for** each mini-batch of ${m}^{\prime}$ source and target domain samples **do**
4. Calculate the output ${\mathit{x}}_{{r}_{j}}^{conv1}$ of each branch of the dilated convolution layer according to Equation (9).
5. Concatenate ${\left\{{\mathit{x}}_{{r}_{j}}^{conv1}\right\}}_{j=1}^{3}$ and calculate the output of the second convolution layer according to Equation (10).
6. Calculate the feature representations ${\mathit{z}}_{i}$ and the output of the softmax layer according to Equation (11).
7. Calculate the loss $\ell ({\mathit{y}}^{S},{\tilde{\mathit{y}}}^{S},{\mathit{z}}^{S},{\mathit{z}}^{T})$ according to Equation (12).
8. Update $\left\{{\theta}_{{r}_{j}}^{conv1},{\theta}^{conv2},{\theta}^{fc},{\theta}^{clf}\right\}$ according to Equation (13).
9. **end for**
10. **end while**
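The forward pass inside the training loop above can be sketched in plain numpy. This is a structural sketch under our own simplifications (single-channel branches, random weights, no pooling or classifier), not the authors' implementation; `branch_forward` and `ffcnn_features` are illustrative names:

```python
import numpy as np

rng = np.random.default_rng(0)

def branch_forward(x, w, r):
    """One dilated-convolution branch with 'same' padding and ReLU activation."""
    pad = (len(w) - 1) * r // 2          # 'same' padding for an odd-length kernel
    xp = np.pad(x, pad)
    span = (len(w) - 1) * r + 1          # effective receptive field
    out = np.array([np.dot(xp[i:i + span:r], w) for i in range(len(x))])
    return np.maximum(out, 0.0)          # ReLU

def ffcnn_features(x, kernels, rates):
    """Concatenate branch outputs; the concatenation feeds the second conv layer."""
    return np.concatenate([branch_forward(x, w, r) for w, r in zip(kernels, rates)])

rates = [1, 3, 5]                        # one of the dilation-rate settings in the paper
kernels = [0.1 * rng.standard_normal(15) for _ in rates]
x_s = rng.standard_normal(128)           # a source-domain sample
x_t = rng.standard_normal(128)           # a target-domain sample
z_s = ffcnn_features(x_s, kernels, rates)
z_t = ffcnn_features(x_t, kernels, rates)
print(z_s.shape)  # (384,): three length-preserving branches of 128 samples each
```

In the full model these concatenated maps pass through the second convolution, pooling, and fully connected layers, and the loss combines the classification error on the source samples with a domain discrepancy between $z^{S}$ and $z^{T}$.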

#### 5.3. Diagnosis Procedure

- Step 1: Data acquisition. The raw vibration signals are collected by sensors. The signals are then sliced by a sliding window of a certain length with a certain step size. Once the samples are ready, they are divided into different working conditions according to the operation settings. Working condition i is the source domain and working condition j is the target domain ($i\ne j$). The samples in each working condition are further divided into training data and testing data. Section 6.1 introduces the datasets used in this paper and the working condition settings.
- Step 2: Domain adaptation. Based on the specific fault diagnosis problem and dataset information, the FFCNN configuration is chosen. The details of the FFCNN used in this paper are stated in Section 5.1. In the training stage, the FFCNN is trained on the source and target training data following Algorithm 1. In the testing stage, the target testing data are fed into the trained FFCNN to obtain classification results.
- Step 3: Results analysis. The diagnosis results are analyzed from three perspectives: network architecture, feature representation, and frequency domain.
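Step 1's sliding-window slicing can be sketched as follows; the window length and step size here are illustrative values, not the paper's settings:

```python
import numpy as np

def slice_signal(signal, window=1024, step=256):
    """Cut a raw vibration signal into overlapping samples with a sliding window."""
    n = (len(signal) - window) // step + 1
    return np.stack([signal[i * step : i * step + window] for i in range(n)])

# A stand-in for a recorded vibration signal; real data would come from sensors.
raw = np.random.default_rng(0).standard_normal(10_000)
samples = slice_signal(raw)
print(samples.shape)  # (36, 1024): 36 overlapping samples of length 1024
```

The resulting sample matrix for each working condition is then split into training and testing sets before domain adaptation.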

## 6. Experiment

#### 6.1. Introduction to Datasets

#### 6.2. Experiment Settings and Compared Methods

- MMD: The MMD criterion maps features to a Reproducing Kernel Hilbert Space (RKHS) to measure the discrepancy between the source and target domains [44]. It is defined as:$${d}_{MMD}\left({\mathit{z}}^{S},{\mathit{z}}^{T}\right)={\left\Vert \frac{1}{{n}_{S}}\sum _{i=1}^{{n}_{S}}\varphi \left({\mathit{z}}_{i}^{S}\right)-\frac{1}{{n}_{T}}\sum _{j=1}^{{n}_{T}}\varphi \left({\mathit{z}}_{j}^{T}\right)\right\Vert }_{\mathcal{H}}$$
- CORAL: The CORAL criterion measures the discrepancy using the second-order statistics of the source and target domain feature representations [45]. It is defined as:$$\begin{array}{c}{d}_{CORAL}\left({\mathit{z}}^{S},{\mathit{z}}^{T}\right)=\frac{1}{4{d}^{2}}{\left\Vert {C}_{S}-{C}_{T}\right\Vert }_{F}^{2}\\ {C}_{S}=\frac{1}{{n}_{S}-1}\left({\mathit{z}}^{S\top }{\mathit{z}}^{S}-\frac{1}{{n}_{S}}{\left({\mathbf{1}}^{\top }{\mathit{z}}^{S}\right)}^{\top }\left({\mathbf{1}}^{\top }{\mathit{z}}^{S}\right)\right)\\ {C}_{T}=\frac{1}{{n}_{T}-1}\left({\mathit{z}}^{T\top }{\mathit{z}}^{T}-\frac{1}{{n}_{T}}{\left({\mathbf{1}}^{\top }{\mathit{z}}^{T}\right)}^{\top }\left({\mathbf{1}}^{\top }{\mathit{z}}^{T}\right)\right)\end{array}$$
- CMD: The CMD criterion matches the domains by explicitly minimizing the differences of higher-order central moments for each moment order [41], where $[a,b]$ is the range of the feature values and ${C}_{k}$ denotes the $k$-th order central moment. It is defined as:$$\begin{array}{c}{d}_{CMD}\left({\mathit{z}}^{S},{\mathit{z}}^{T}\right)=\frac{1}{|b-a|}\left\Vert \mathbb{E}\left({\mathit{z}}^{S}\right)-\mathbb{E}\left({\mathit{z}}^{T}\right)\right\Vert \\ +\sum _{k=2}^{K}\frac{1}{{|b-a|}^{k}}{\left\Vert {C}_{k}\left({\mathit{z}}^{S}\right)-{C}_{k}\left({\mathit{z}}^{T}\right)\right\Vert }_{2}\end{array}$$
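Under the simplifying assumption of a linear feature map $\varphi(z)=z$ for MMD (the paper's MMD works in an RKHS with a nonlinear kernel), the three criteria can be sketched in numpy; the function names are ours:

```python
import numpy as np

def mmd_linear(zs, zt):
    """MMD with a linear feature map: distance between the domain means."""
    return np.linalg.norm(zs.mean(axis=0) - zt.mean(axis=0))

def coral(zs, zt):
    """CORAL: squared Frobenius distance between domain covariances, over 4 d^2."""
    d = zs.shape[1]
    cs = np.cov(zs, rowvar=False)
    ct = np.cov(zt, rowvar=False)
    return np.linalg.norm(cs - ct, ord='fro') ** 2 / (4 * d * d)

def cmd(zs, zt, K=5, a=0.0, b=1.0):
    """CMD: match means plus central moments up to order K (features in [a, b])."""
    ms, mt = zs.mean(axis=0), zt.mean(axis=0)
    loss = np.linalg.norm(ms - mt) / abs(b - a)
    for k in range(2, K + 1):
        cks = ((zs - ms) ** k).mean(axis=0)      # k-th order central moment, source
        ckt = ((zt - mt) ** k).mean(axis=0)      # k-th order central moment, target
        loss += np.linalg.norm(cks - ckt) / abs(b - a) ** k
    return loss

# Identical feature batches give zero discrepancy under all three criteria.
z = np.random.default_rng(0).random((64, 8))
print(mmd_linear(z, z), coral(z, z))  # 0.0 0.0
```

Any one of these functions can serve as the domain loss $d(\cdot)$ added to the classification loss during training.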

#### 6.3. Experiment Results

- The effectiveness of domain adaptation. These tables show that source-only, without domain adaptation, performs poorly. In comparison, the domain adaptation methods greatly exceed source-only in most tasks. For example, in task $B1\to B4$, the accuracy of source-only is 30.32%, while the accuracy of domain adaptation ranges from 75.15% at the lowest to 100% at the highest. However, domain adaptation fails in some cases. In task $B2\to B3$, for example, the accuracy of source-only is 72.27%, compared with 49.80% for CNN-MMD, 60.91% for FFCNN-A, and 55.15% for FFCNN-B. We suppose that these methods did not extract features suitable for adapting the source domain to the target domain. Overall, the domain adaptation methods achieve the highest average accuracy, demonstrating the strong generalization ability of domain adaptation.
- The effectiveness of FFCNN. FFCNN uses different dilation rates to extract features at different scales, which may yield better features. Compared with an ordinary CNN, FFCNN is more effective in most tasks, and in some tasks the improvement is substantial. For example, in task $B5\to B1$, FFCNN-B improves on CNN-MMD by 17.34%, on CNN-CORAL by 22.11%, and on CNN-CMD by 12.33%. However, FFCNN may not be effective in some cases, such as FFCNN-A compared with CNN-MMD and FFCNN-B compared with CNN-CORAL in task $B5\to B3$. For some tasks, a feature extracted at one fixed scale may be the most significant, and multi-scale convolution may weaken the influence of such a feature. Nevertheless, FFCNN performs well both in accuracy on most individual tasks and in average accuracy over all tasks.
- The influence of dilation rate. To clearly illustrate the effect of the dilation rate, the average accuracy of FFCNN with different dilation rates on all tasks is shown in Figure 9. As shown in the figure, FFCNN with $r=1,3,5$ performs better than FFCNN with $r=1,2,3$, except for CORAL on the B tasks. According to Equation (8), kernels of size $H=15$ with dilation rates $r=1,2,3,4,5$ are equivalent to kernels of size ${H}_{dilated}=15,29,43,57,71$. It can be concluded that a larger dilation rate gives a larger receptive field, which improves the effect of domain adaptation. Dilation rate and dilated convolution are analyzed further in the following sections.
- Dilated convolution vs. common convolution. Dilated convolution expands the receptive field by expanding the convolution kernel. According to Equation (8), the receptive field of a given dilation rate is equivalent to that of a specific dense kernel size. To show the advantage of dilated convolution, we take task $B5\to B1$ as an example and apply both dilated and common convolutions to CNN and FFCNN, comparing their numbers of parameters and diagnosis accuracies. The results are shown in Table 6. As can be seen, the models using dilated convolution do not increase their number of parameters as the dilation rate grows, and in general their accuracy is higher than that of the models using common convolution kernels. This shows that, both in model size and in diagnosis accuracy, dilated convolutions have advantages over common convolutions.

#### 6.4. Analysis

#### 6.4.1. Analysis from the Perspective of Network Architecture

#### 6.4.2. Analysis from the Perspective of Feature Representation

#### 6.4.3. Analysis from the Perspective of Frequency Domain

## 7. Discussion

- FFCNN is a unified domain adaptation architecture for fault diagnosis; it can also be applied to other CNN structures, domain adaptation methods, or datasets.
- The dilation rates used to construct an FFCNN need to be determined according to the specific task, not necessarily $r=1,2,3$ or $r=1,3,5$; the number of combined scales can also change.
- The AFC curve can be considered a general CNN analysis method. It provides a new perspective for describing the characteristics of convolution kernels.
- Multi-scale convolution kernels are generally applied in the first layer; the effectiveness of using multi-scale convolution in the middle layers has not yet been studied.

- While FFCNN can improve the effect of domain adaptation, it will still fail if the source and target domains differ too much. How to further enhance the effect of domain adaptation needs further study [47].
- We explained FFCNN from the frequency-domain perspective. Improving the interpretability of deep learning methods for fault diagnosis remains a more challenging task [13].

## 8. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Abbreviations

Symbol | Meaning |
---|---|
$\mathcal{T}$ | Classification task |
$\mathcal{D}$ | A specific domain |
${\mathcal{D}}^{\mathcal{S}},{\mathcal{D}}^{\mathcal{T}}$ | Source domain and target domain |
$\mathcal{X}$ | Input sample space |
$\mathcal{Y}$ | Label space |
${\mathcal{X}}^{\mathcal{S}},{\mathcal{X}}^{\mathcal{T}}$ | Source sample space and target sample space |
${\mathcal{Y}}^{\mathcal{S}},{\mathcal{Y}}^{\mathcal{T}}$ | Source label space and target label space |
$X,Y$ | Dataset and labels |
$\mathit{x},\mathit{y}$ | A sample and a label in the dataset |
$Z$ | Learned feature representation |
$g(\cdot)$ | Feature extractor of the deep learning model |
$h(\cdot)$ | Classifier of the deep learning model |
${\ell}_{clf},d(\cdot)$ | Classification loss and domain loss |
$G(\cdot)$ | A convolution operation |
$A\left({f}_{i}\right)$ | Amplitude-frequency characteristic of $G(\cdot)$ at frequency ${f}_{i}$ |

## References

1. Lei, Y.; Lin, J.; He, Z.; Zuo, M.J. A review on empirical mode decomposition in fault diagnosis of rotating machinery. Mech. Syst. Signal Process. **2013**, 35, 108–126.
2. Peng, Z.; Peter, W.T.; Chu, F. A comparison study of improved Hilbert–Huang transform and wavelet transform: Application to fault diagnosis for rolling bearing. Mech. Syst. Signal Process. **2005**, 19, 974–988.
3. Yan, R.; Gao, R.X.; Chen, X. Wavelets for fault diagnosis of rotary machines: A review with applications. Signal Process. **2014**, 96, 1–15.
4. Konar, P.; Chattopadhyay, P. Bearing fault detection of induction motor using wavelet and Support Vector Machines (SVMs). Appl. Soft Comput. **2011**, 11, 4203–4211.
5. Zhang, X.; Liang, Y.; Zhou, J. A novel bearing fault diagnosis model integrated permutation entropy, ensemble empirical mode decomposition and optimized SVM. Measurement **2015**, 69, 164–179.
6. Li, Z.; Yan, X.; Tian, Z.; Yuan, C.; Peng, Z.; Li, L. Blind vibration component separation and nonlinear feature extraction applied to the nonstationary vibration signals for the gearbox multi-fault diagnosis. Measurement **2013**, 46, 259–271.
7. Saimurugan, M.; Ramachandran, K.; Sugumaran, V.; Sakthivel, N. Multi component fault diagnosis of rotational mechanical system based on decision tree and support vector machine. Expert Syst. Appl. **2011**, 38, 3819–3826.
8. Muralidharan, V.; Sugumaran, V. Feature extraction using wavelets and classification through decision tree algorithm for fault diagnosis of mono-block centrifugal pump. Measurement **2013**, 46, 353–359.
9. Hoang, D.T.; Kang, H.J. A survey on Deep Learning based bearing fault diagnosis. Neurocomputing **2019**, 335, 327–335.
10. Liu, R.; Yang, B.; Zio, E.; Chen, X. Artificial intelligence for fault diagnosis of rotating machinery: A review. Mech. Syst. Signal Process. **2018**, 108, 33–47.
11. Zhao, R.; Yan, R.; Chen, Z.; Mao, K.; Wang, P.; Gao, R.X. Deep learning and its applications to machine health monitoring. Mech. Syst. Signal Process. **2019**, 115, 213–237.
12. Wang, J.; Ma, Y.; Zhang, L.; Gao, R.X.; Wu, D. Deep learning for smart manufacturing: Methods and applications. J. Manuf. Syst. **2018**, 48, 144–156.
13. Lei, Y.; Yang, B.; Jiang, X.; Jia, F.; Nandi, A.K. Applications of machine learning to machine fault diagnosis: A review and roadmap. Mech. Syst. Signal Process. **2020**, 138, 106587.
14. Li, X.; Hu, Y.; Li, M.; Zheng, J. Fault diagnostics between different type of components: A transfer learning approach. Appl. Soft Comput. **2020**, 86, 105950.
15. Zhang, R.; Tao, H.; Wu, L.; Guan, Y. Transfer learning with neural networks for bearing fault diagnosis in changing working conditions. IEEE Access **2017**, 5, 14347–14357.
16. Zhao, Z.; Zhang, Q.; Yu, X.; Sun, C.; Wang, S.; Yan, R.; Chen, X. Unsupervised Deep Transfer Learning for Intelligent Fault Diagnosis: An Open Source and Comparative Study. arXiv **2019**, arXiv:1912.12528.
17. Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. **2009**, 22, 1345–1359.
18. Weiss, K.; Khoshgoftaar, T.M.; Wang, D. A survey of transfer learning. J. Big Data **2016**, 3, 9.
19. Wilson, G.; Cook, D.J. A Survey of Unsupervised Deep Domain Adaptation. arXiv **2018**, arXiv:1812.02849.
20. Wang, M.; Deng, W. Deep visual domain adaptation: A survey. Neurocomputing **2018**, 312, 135–153.
21. Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv **2015**, arXiv:1511.07122.
22. Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv **2017**, arXiv:1706.05587.
23. Wang, P.; Chen, P.; Yuan, Y.; Liu, D.; Huang, Z.; Hou, X.; Cottrell, G. Understanding convolution for semantic segmentation. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 1451–1460.
24. Liu, R.; Wang, F.; Yang, B.; Qin, S.J. Multiscale Kernel Based Residual Convolutional Neural Network for Motor Fault Diagnosis Under Nonstationary Conditions. IEEE Trans. Ind. Inform. **2019**, 16, 3797–3806.
25. Jiang, G.; He, H.; Yan, J.; Xie, P. Multiscale convolutional neural networks for fault diagnosis of wind turbine gearbox. IEEE Trans. Ind. Electron. **2018**, 66, 3196–3207.
26. Qiao, H.; Wang, T.; Wang, P.; Zhang, L.; Xu, M. An adaptive weighted multiscale convolutional neural network for rotating machinery fault diagnosis under variable operating conditions. IEEE Access **2019**, 7, 118954–118964.
27. Huang, W.; Cheng, J.; Yang, Y.; Guo, G. An improved deep convolutional neural network with multi-scale information for bearing fault diagnosis. Neurocomputing **2019**, 359, 77–92.
28. Jia, F.; Lei, Y.; Guo, L.; Lin, J.; Xing, S. A neural network constructed by deep learning technique and its application to intelligent fault diagnosis of machines. Neurocomputing **2018**, 272, 619–628.
29. Yu, J. A selective deep stacked denoising autoencoders ensemble with negative correlation learning for gearbox fault diagnosis. Comput. Ind. **2019**, 108, 62–72.
30. Jing, L.; Zhao, M.; Li, P.; Xu, X. A convolutional neural network based feature learning and fault diagnosis method for the condition monitoring of gearbox. Measurement **2017**, 111, 1–10.
31. Han, T.; Liu, C.; Yang, W.; Jiang, D. A novel adversarial learning framework in deep convolutional neural network for intelligent diagnosis of mechanical faults. Knowl. Based Syst. **2019**, 165, 474–487.
32. Chen, T.; Wang, Z.; Yang, X.; Jiang, K. A deep capsule neural network with stochastic delta rule for bearing fault diagnosis on raw vibration signals. Measurement **2019**, 148, 106857.
33. Li, X.; Zhang, W.; Ding, Q.; Sun, J.Q. Multi-layer domain adaptation method for rolling bearing fault diagnosis. Signal Process. **2019**, 157, 180–197.
34. Han, T.; Liu, C.; Yang, W.; Jiang, D. Deep transfer network with joint distribution adaptation: A new intelligent fault diagnosis framework for industry application. ISA Trans. **2019**, 97, 269–281.
35. Wang, Q.; Michau, G.; Fink, O. Domain adaptive transfer learning for fault diagnosis. In Proceedings of the 2019 Prognostics and System Health Management Conference (PHM-Paris), Paris, France, 2–5 May 2019; pp. 279–285.
36. Guo, L.; Lei, Y.; Xing, S.; Yan, T.; Li, N. Deep convolutional transfer learning network: A new method for intelligent fault diagnosis of machines with unlabeled data. IEEE Trans. Ind. Electron. **2018**, 66, 7316–7325.
37. Li, X.; Zhang, W.; Ding, Q.; Li, X. Diagnosing Rotating Machines With Weakly Supervised Data Using Deep Transfer Learning. IEEE Trans. Ind. Inform. **2020**, 16, 1688–1697.
38. Neupane, D.; Seok, J. Bearing Fault Detection and Diagnosis Using Case Western Reserve University Dataset With Deep Learning Approaches: A Review. IEEE Access **2020**, 8, 93155–93178.
39. Ben-David, S.; Blitzer, J.; Crammer, K.; Pereira, F. Analysis of representations for domain adaptation. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2007; pp. 137–144.
40. Zilong, Z.; Lv, H.; Xu, J.; Zizhao, H.; Qin, W. A Deep Learning Method for Bearing Fault Diagnosis through Stacked Residual Dilated Convolutions. Appl. Sci. **2019**, 9, 1823.
41. Zellinger, W.; Grubinger, T.; Lughofer, E.; Natschläger, T.; Saminger-Platz, S. Central moment discrepancy (CMD) for domain-invariant representation learning. arXiv **2017**, arXiv:1702.08811.
42. Smith, W.A.; Randall, R.B. Rolling element bearing diagnostics using the Case Western Reserve University data: A benchmark study. Mech. Syst. Signal Process. **2015**, 64, 100–131.
43. Lessmeier, C.; Kimotho, J.K.; Zimmer, D.; Sextro, W. Condition monitoring of bearing damage in electromechanical drive systems by using motor current signals of electric motors: A benchmark data set for data-driven classification. In Proceedings of the European Conference of the Prognostics and Health Management Society, Bilbao, Spain, 5–8 July 2016; pp. 5–8.
44. Ghifary, M.; Kleijn, W.B.; Zhang, M. Domain adaptive neural networks for object recognition. In Proceedings of the Pacific Rim International Conference on Artificial Intelligence, Gold Coast, QLD, Australia, 1–5 December 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 898–904.
45. Sun, B.; Saenko, K. Deep CORAL: Correlation alignment for deep domain adaptation. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–10 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 443–450.
46. Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. **2008**, 9, 2579–2605.
47. Jiao, J.; Zhao, M.; Lin, J.; Liang, K. A comprehensive review on convolutional neural network in machine fault diagnosis. Neurocomputing **2020**, 417, 36–63.

**Figure 3.** Several typical amplitude-frequency characteristic curves and the signals after convolution without an activation function. K is the kernel size and r is the dilation rate. In the four parallel subplots below, the first row is the output of the signal after convolution, the second row is the amplitude-frequency characteristic (AFC) curve, and the third row is the FFT spectrogram. In the FFT spectrogram, the blue line represents the original signal and the red line represents the output signal.

**Figure 6.** Example signals of the CWRU and Paderborn datasets. B1 to B6 are the working conditions of the CWRU dataset. P1 to P4 are the working conditions of the Paderborn dataset.

**Figure 9.** Average accuracy of FFCNN with different dilation rates on all tasks. B tasks are the tasks evaluated on the CWRU dataset, and P tasks are the tasks evaluated on the Paderborn dataset.

**Figure 11.** The visualization of learned features on the CWRU dataset. The blue markers represent the source domain, and the red markers represent the target domain. They are obtained from task $B4\to B5$.

**Figure 12.** The visualization of learned features on the Paderborn dataset. The blue markers represent the source domain, and the red markers represent the target domain. They are obtained from task $P3\to P2$.

**Figure 13.** Amplitude-frequency characteristic curves of each filter in the first layer of CNN-MMD (maximum mean discrepancy) from task $B5\to B1$.

**Figure 14.** Amplitude-frequency characteristic curves of each filter in the first layer of FFCNN-A from task $B5\to B1$. (**a**–**c**) represent branches 1, 2, and 3 with dilation rates of 1, 2, and 3, respectively.

**Figure 15.** Amplitude-frequency characteristic curves of each filter in the first layer of FFCNN-B from task $B5\to B1$. (**a**–**c**) represent branches 1, 2, and 3 with dilation rates of 1, 3, and 5, respectively.

**Table 1.** Details of the proposed Frequency-domain Fusing Convolutional Neural Network (FFCNN) architecture.

Layer | Hyperparameters |
---|---|
CONV (${r}_{1}$) | ${r}_{1}=1$; channels: 8; kernel size: 15; stride: 1; activation: ReLU; padding: same |
CONV (${r}_{2}$) | ${r}_{2}=2$ (or 3); channels: 8; kernel size: 15; stride: 1; activation: ReLU; padding: same |
CONV (${r}_{3}$) | ${r}_{3}=3$ (or 5); channels: 8; kernel size: 15; stride: 1; activation: ReLU; padding: same |
POOL1 | Average pooling; stride: 2 |
CONV | channels: 32; kernel size: 15; stride: 1; activation: ReLU; padding: same |
POOL2 | Average pooling; stride: 2 |
Features layer | Node number: 256; activation: ReLU |
Softmax layer | Node number: number of fault types; activation: softmax |

**Table 2.** Working condition settings of the CWRU dataset.

Sampling Frequency | Sensor Position | Speed (rpm) | Name of Setting |
---|---|---|---|
48 kHz | Driven end | 1796 | B1 |
48 kHz | Driven end | 1772 | B2 |
48 kHz | Driven end | 1725 | B3 |
12 kHz | Driven end | 1796 | B4 |
12 kHz | Driven end | 1725 | B5 |
12 kHz | Driven end | 1750 | B6 |

**Table 3.** Working condition settings of the Paderborn dataset.

Rotating Speed (rpm) | Load Torque (Nm) | Radial Force (N) | Fault Type | Name of Setting |
---|---|---|---|---|
900 | 0.7 | 1000 | Health, inner fault, outer fault | P1 |
1500 | 0.1 | 1000 | Health, inner fault, outer fault | P2 |
1500 | 0.7 | 400 | Health, inner fault, outer fault | P3 |
1500 | 0.7 | 1000 | Health, inner fault, outer fault | P4 |

**Table 4.**Diagnosis accuracy (%) on different working conditions compared with different methods using CWRU dataset. The values in bold indicate that FFCNN has a higher accuracy rate than CNN.

Tasks | Source Only | CNN-MMD | FFCNN-A | FFCNN-B | CNN-CORAL | FFCNN-A | FFCNN-B | CNN-CMD | FFCNN-A | FFCNN-B |
---|---|---|---|---|---|---|---|---|---|---|
$B1\to B2$ | 75.10 | 81.13 | 89.65 | 90.28 | 75.20 | 75.17 | 75.44 | 78.49 | 81.91 | 83.64 |
$B1\to B3$ | 78.69 | 79.27 | 81.96 | 84.15 | 79.32 | 81.66 | 83.77 | 82.86 | 87.06 | 90.09 |
$B1\to B4$ | 30.32 | 98.32 | 100.00 | 98.12 | 75.15 | 74.66 | 61.69 | 97.83 | 99.44 | 99.78 |
$B1\to B5$ | 31.13 | 67.48 | 70.48 | 80.76 | 66.90 | 65.19 | 71.26 | 92.90 | 96.92 | 96.19 |
$B1\to B6$ | 48.73 | 100.00 | 100.00 | 99.98 | 76.86 | 76.39 | 70.46 | 99.46 | 99.00 | 99.29 |
$B2\to B1$ | 88.13 | 90.21 | 98.66 | 99.63 | 89.82 | 93.46 | 95.97 | 90.80 | 94.41 | 96.12 |
$B2\to B3$ | 72.27 | 49.80 | 60.91 | 55.15 | 73.54 | 76.42 | 74.27 | 73.49 | 74.93 | 75.17 |
$B2\to B4$ | 50.00 | 97.05 | 97.05 | 96.12 | 57.03 | 68.43 | 66.58 | 97.51 | 98.66 | 98.68 |
$B2\to B5$ | 50.00 | 54.90 | 65.31 | 60.77 | 50.54 | 53.37 | 49.71 | 89.62 | 98.17 | 97.12 |
$B2\to B6$ | 40.40 | 55.91 | 58.42 | 59.30 | 35.33 | 49.52 | 44.14 | 95.80 | 96.63 | 99.44 |
$B3\to B1$ | 60.76 | 99.95 | 100.00 | 100.00 | 76.59 | 92.28 | 96.66 | 99.56 | 99.88 | 99.98 |
$B3\to B2$ | 54.30 | 66.35 | 67.01 | 74.51 | 61.13 | 69.80 | 72.63 | 75.85 | 74.85 | 73.00 |
$B3\to B4$ | 50.00 | 75.02 | 86.62 | 85.86 | 50.00 | 50.00 | 50.00 | 89.19 | 95.85 | 98.15 |
$B3\to B5$ | 51.25 | 59.15 | 96.02 | 97.37 | 51.95 | 51.42 | 52.00 | 86.55 | 92.82 | 95.68 |
$B3\to B6$ | 49.54 | 99.95 | 99.22 | 99.10 | 49.58 | 54.24 | 50.05 | 95.14 | 99.34 | 99.05 |
$B4\to B1$ | 25.71 | 100.00 | 100.00 | 99.19 | 86.33 | 84.15 | 86.52 | 98.90 | 99.95 | 99.90 |
$B4\to B2$ | 33.45 | 75.63 | 75.22 | 74.98 | 73.02 | 74.85 | 74.05 | 76.49 | 76.66 | 76.29 |
$B4\to B3$ | 38.53 | 59.23 | 59.30 | 62.28 | 47.00 | 56.47 | 65.11 | 70.26 | 77.54 | 79.66 |
$B4\to B5$ | 58.89 | 80.98 | 94.80 | 95.48 | 85.25 | 90.23 | 93.66 | 99.39 | 99.56 | 99.10 |
$B4\to B6$ | 78.05 | 100.00 | 90.57 | 90.59 | 94.55 | 89.97 | 84.45 | 100.00 | 100.00 | 100.00 |
$B5\to B1$ | 26.41 | 76.41 | 86.23 | 93.75 | 53.57 | 61.45 | 75.68 | 84.84 | 92.58 | 97.17 |
$B5\to B2$ | 25.46 | 46.09 | 54.34 | 52.22 | 38.91 | 48.68 | 47.04 | 72.70 | 79.10 | 75.12 |
$B5\to B3$ | 35.65 | 76.44 | 71.07 | 79.57 | 66.95 | 67.14 | 56.86 | 70.51 | 77.51 | 80.47 |
$B5\to B4$ | 50.07 | 51.88 | 70.55 | 71.32 | 50.00 | 69.19 | 73.02 | 99.95 | 100.00 | 100.00 |
$B5\to B6$ | 50.07 | 52.39 | 72.10 | 71.97 | 76.66 | 87.60 | 87.01 | 100.00 | 100.00 | 100.00 |
$B6\to B1$ | 25.00 | 95.53 | 95.56 | 100.00 | 46.24 | 42.62 | 52.88 | 98.32 | 99.12 | 99.93 |
$B6\to B2$ | 25.00 | 59.50 | 59.42 | 58.88 | 36.45 | 40.87 | 48.34 | 70.38 | 77.66 | 76.73 |
$B6\to B3$ | 35.84 | 70.07 | 77.63 | 82.32 | 63.38 | 61.91 | 51.93 | 77.03 | 80.86 | 87.04 |
$B6\to B4$ | 51.56 | 100.00 | 100.00 | 100.00 | 75.00 | 75.00 | 74.98 | 100.00 | 100.00 | 100.00 |
$B6\to B5$ | 54.00 | 76.00 | 77.66 | 71.26 | 67.53 | 85.18 | 75.12 | 99.02 | 99.63 | 99.73 |
AVG | 48.14 | 76.49 | 81.86 | 82.83 | 64.33 | 68.91 | 68.71 | 88.76 | 91.67 | 92.42 |

**Table 5.**Diagnosis accuracy (%) on different working conditions compared with different methods using Paderborn dataset. The values in bold indicate that FFCNN has a higher accuracy rate than CNN.

Tasks | Source Only | CNN-MMD | FFCNN-A | FFCNN-B | CNN-CORAL | FFCNN-A | FFCNN-B | CNN-CMD | FFCNN-A | FFCNN-B |
---|---|---|---|---|---|---|---|---|---|---|
$P1\to P2$ | 42.71 | 56.09 | 69.47 | 76.33 | 46.65 | 51.95 | 53.48 | 62.66 | 68.94 | 65.79 |
$P1\to P3$ | 50.62 | 18.07 | 18.30 | 20.61 | 57.72 | 65.04 | 64.94 | 42.28 | 59.80 | 64.42 |
$P1\to P4$ | 41.57 | 51.31 | 46.07 | 54.00 | 46.39 | 52.90 | 53.78 | 54.75 | 61.98 | 63.15 |
$P2\to P1$ | 48.92 | 76.78 | 88.57 | 87.37 | 52.63 | 61.33 | 62.24 | 72.79 | 74.64 | 76.30 |
$P2\to P3$ | 87.05 | 94.47 | 95.15 | 94.89 | 90.46 | 92.35 | 92.48 | 93.13 | 93.78 | 93.16 |
$P2\to P4$ | 88.28 | 91.96 | 90.14 | 92.51 | 88.64 | 85.81 | 86.85 | 88.64 | 87.60 | 90.72 |
$P3\to P1$ | 39.81 | 65.09 | 80.25 | 81.24 | 39.06 | 40.23 | 40.53 | 74.09 | 74.97 | 75.91 |
$P3\to P2$ | 57.62 | 92.12 | 92.90 | 93.88 | 62.77 | 65.10 | 65.40 | 87.21 | 89.78 | 90.40 |
$P3\to P4$ | 51.63 | 86.20 | 85.25 | 85.94 | 51.40 | 49.19 | 47.04 | 79.85 | 80.08 | 78.87 |
$P4\to P1$ | 47.07 | 70.60 | 74.58 | 72.69 | 50.13 | 59.11 | 56.93 | 68.52 | 70.28 | 71.48 |
$P4\to P2$ | 94.73 | 95.74 | 96.09 | 96.71 | 95.02 | 93.46 | 94.60 | 94.73 | 93.98 | 94.30 |
$P4\to P3$ | 60.32 | 90.82 | 89.81 | 90.95 | 81.05 | 84.51 | 84.73 | 87.04 | 87.21 | 88.09 |
AVG | 59.19 | 74.10 | 77.22 | 78.93 | 63.49 | 66.75 | 66.92 | 75.47 | 78.59 | 79.38 |

**Table 6.** Number of parameters and diagnosis accuracy (%) of dilated convolution and common convolution on task $B5\to B1$.

Model | Dilation Rate (Dilated Kernels ${}^{1}$) | Params ${}^{2}$ | Acc | Kernel Size (Common Kernels) | Params ${}^{2}$ | Acc |
---|---|---|---|---|---|---|
CNN | 1 | 11936 | 83.03 | 15 | 11936 | 83.03 |
CNN | 2 | 11936 | 73.55 | 29 | 12272 | 89.30 |
CNN | 3 | 11936 | 96.65 | 43 | 12608 | 64.85 |
CNN | 4 | 11936 | 90.09 | 57 | 12944 | 68.58 |
CNN | 5 | 11936 | 84.48 | 71 | 13280 | 83.87 |
FFCNN | 1, 2, 3 | 11936 | 86.23 | 15, 29, 43 | 12272 | 88.06 |
FFCNN | 1, 3, 5 | 11936 | 93.75 | 15, 43, 71 | 12608 | 87.11 |

^{1} For a fair comparison, the dilated convolution kernels and the common convolution kernels of varying size act only on the first layer.

^{2} Only the parameters of the convolutional layers are counted.
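As a sanity check, Table 6's parameter counts can be reproduced with the standard formula for a 1-D convolutional layer, $C_{in} \cdot C_{out} \cdot H + C_{out}$. We assume (our reading of the architecture) that the CNN baseline uses 24 first-layer channels, matching FFCNN's three 8-channel branches, followed by the shared 24-to-32, size-15 second convolution of Table 1:

```python
def conv1d_params(c_in, c_out, kernel):
    """Weights plus biases of a 1-D convolutional layer."""
    return c_in * c_out * kernel + c_out

second = conv1d_params(24, 32, 15)                       # shared second layer: 11,552

# Dilated first layer: the kernel keeps 15 taps regardless of the dilation rate.
cnn_dilated = conv1d_params(1, 24, 15) + second          # 11,936 for every r
# Common first layer with the equivalent receptive field H + (H - 1)(r - 1):
cnn_common = {r: conv1d_params(1, 24, 15 + 14 * (r - 1)) + second for r in range(1, 6)}
# FFCNN with common kernels of sizes 15, 29, 43 across its three 8-channel branches:
ffcnn_common_123 = sum(conv1d_params(1, 8, k) for k in (15, 29, 43)) + second

print(cnn_dilated, cnn_common[3])  # 11936 12608, matching Table 6
```

Under this assumption, every entry in the Params column of Table 6 is recovered exactly.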

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Li, X.; Zheng, J.; Li, M.; Ma, W.; Hu, Y.
Frequency-Domain Fusing Convolutional Neural Network: A Unified Architecture Improving Effect of Domain Adaptation for Fault Diagnosis. *Sensors* **2021**, *21*, 450.
https://doi.org/10.3390/s21020450
