1. Introduction
Bearings are critical components in rotating machinery. However, they often fail because they usually operate in harsh environments [1, 2]. Early and effective bearing fault diagnosis therefore plays an important role in avoiding unforeseen downtime of rotating machinery.
Compared to current signals [3] and acoustic emission signals [4], vibration signals [5, 6] contain abundant information that reflects the health state of bearings, and they are therefore widely used in bearing fault diagnosis. Generally, fault diagnosis techniques can be categorized into two types: signal analysis methods and data-driven methods. In signal analysis methods, vibration signals are first processed with techniques such as time-domain analysis [7], frequency-domain analysis [8], and time-frequency-domain analysis [9, 10]. Then, based on expert knowledge, features extracted from the different domains are used to detect changes in bearing health and to assess health states. A major limitation of signal analysis methods is that extensive expert knowledge is required to determine the health states and fault types of bearings from the extracted features.
Unlike signal analysis methods, data-driven methods rely only on the collected vibration data for fault diagnosis. Labeled vibration data are first collected. Then, features are extracted from different domains, as in signal analysis methods. Finally, classifiers are trained on these extracted features using machine learning methods such as Support Vector Machine (SVM) [11, 12], Random Forest (RF) [13, 14], and Multi-Layer Perceptron (MLP) [15].
Recently, deep learning methods have gained considerable attention in the field of data-driven fault diagnosis. A major advantage of deep learning is that deep features can be extracted automatically. Generally, deep features carry more useful information for fault diagnosis than the shallow features extracted by traditional machine learning methods. It has been shown that better diagnostic performance can be achieved using deep learning methods [16]. As representative deep learning methods, the auto-encoder (AE), deep belief network (DBN), and convolutional neural network (CNN) have shown their superiority in bearing fault diagnosis. For example, Chen and Li fed time-domain and frequency-domain features extracted from different sensor signals into multiple two-layer sparse autoencoder (SAE) neural networks for fault classification [17]. Gan et al. designed a two-layer hierarchical diagnosis network (HDN) that identifies fault types and ranks fault severity by employing DBNs to provide representative features [18].
As one of the most effective deep learning methods, the CNN has also been applied to fault diagnosis. Common CNN-based methods can be categorized into one-dimensional (1-D) and two-dimensional (2-D) CNN-based methods. In 1-D CNN methods, the raw 1-D time-domain vibration signals are fed directly into the 1-D CNN model [19]. In 2-D CNN methods, the raw vibration signals are usually transformed into 2-D time-frequency data, which are then processed by a 2-D CNN [20]. Levent et al. [19] took the raw vibration data as the input and used a compact adaptive 1-D CNN to diagnose bearing faults. Gao et al. [21] proposed a novel hybrid deep learning method (NHDLM) based on Extended Deep Convolutional Neural Networks with Wide First-layer Kernels (EWDCNN) and long short-term memory (LSTM) to enhance diagnosis accuracy for rotating machinery in complex environments. Han et al. [22] presented a novel diagnosis framework that combines the spatio-temporal pattern network (STPN) approach with a CNN and applied it to fault diagnosis of complex systems. Wang et al. [23] fused multi-sensor vibration signals and transformed them into images to obtain more informative features, which were then fed to a bottleneck CNN for fault diagnosis. However, in CNN-based fault diagnosis methods, each convolutional operation often uses convolutional kernels of the same size. Inspired by the inception structure, convolutional kernels of different sizes can be used to extract more informative features from vibration data and improve the performance of fault diagnosis. To extract multi-scale features, Qiao et al. [24] employed convolutional kernels with different widths as filters with different frequency-domain resolutions, simultaneously extracting features from different frequency bands of the vibration signal. Further, Wang et al. [25] combined dilated convolution with multi-scale convolution for remaining useful life prediction. Compared with a standard convolutional layer, a dilated convolutional layer has a larger receptive field for the same kernel size. Owing to this advantage, dilated convolution can ignore redundant information in vibration signals.
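The receptive-field claim can be made concrete with the standard relation for a single dilated 1-D kernel: a kernel with k taps and dilation rate d spans d(k − 1) + 1 input samples. The sketch below is illustrative only; the function name and the kernel sizes used in the loop are assumptions, not values from the paper.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of input samples spanned by one dilated 1-D kernel."""
    if kernel_size < 1 or dilation < 1:
        raise ValueError("kernel_size and dilation must be positive")
    return dilation * (kernel_size - 1) + 1


# Same 3-tap kernel, growing span as the dilation rate increases.
for d in (1, 2, 4):
    print(f"dilation={d}: spans {effective_kernel_size(3, d)} samples")
# dilation=1: spans 3 samples
# dilation=2: spans 5 samples
# dilation=4: spans 9 samples
```

With dilation rate 1 the span equals the kernel size, which is why a dilated kernel with rate 1 behaves as an ordinary convolutional kernel.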
Motivated by the above discussion, an improved multi-scale convolutional neural network (IMSCNN) is developed for bearing fault diagnosis in this paper. To extract more informative features, we employ four dilated convolutional kernels with different dilation rates in the multi-scale CNN. The dilation rates of two of these four kernels are set to 1, so these two reduce to standard convolutional kernels. Moreover, unlike the multi-scale CNN in [24], a 1-D convolutional layer is placed before the multi-scale CNN to mitigate the effect of noise on bearing fault diagnosis. In summary, the main contributions of the proposed method are as follows:
- (1) To enlarge the receptive field of the multi-scale CNN, four dilated convolutional kernels with different dilation rates are designed. Thus, more informative features can be extracted for fault diagnosis.
- (2) To reduce the noise in vibration signals, an additional 1-D convolutional layer is adopted to extract features before the dilated convolutional layer.
- (3) Two widely used datasets, CWRU and PU, are employed to evaluate the performance of the proposed method against other related methods. The results show the superiority of the proposed method.
The rest of this paper is organized as follows. Section 2 offers a brief review of the CNN and its inception structure. In Section 3, the improved multi-scale dilated CNN is developed for bearing fault diagnosis. In Section 4, two widely used experimental cases are carried out to evaluate the performance of the proposed method against other related methods. Conclusions are drawn in the final section.
3. The Architecture of the Proposed IMSCNN
In the proposed method, the raw vibration data are used as the input of the neural network. The raw vibration data are divided into a number of groups. To facilitate subsequent processing and accelerate the convergence of the neural network, max-min normalization is applied to each group of input data,

$$x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}, \quad i = 1, 2, \ldots, N,$$

where $x_i$ is the $i$th sample, $x_{\min}$ and $x_{\max}$ are the smallest and largest values in the group, and $N$ is the number of samples in the group.
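The grouping and per-group max-min normalization described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: groups are taken as non-overlapping windows, trailing samples that do not fill a group are dropped, and a small `eps` guards against division by zero for a constant group. The function name and the synthetic signal are hypothetical; the group size of 1024 follows the experimental setup.

```python
import numpy as np


def segment_and_normalize(signal, group_size=1024, eps=1e-12):
    """Split a 1-D signal into groups and scale each group to [0, 1]."""
    signal = np.asarray(signal, dtype=np.float64)
    n_groups = signal.size // group_size
    # Assumption: non-overlapping windows, incomplete tail discarded.
    groups = signal[: n_groups * group_size].reshape(n_groups, group_size)
    g_min = groups.min(axis=1, keepdims=True)
    g_max = groups.max(axis=1, keepdims=True)
    # eps only matters when a group is constant (g_max == g_min).
    return (groups - g_min) / (g_max - g_min + eps)


rng = np.random.default_rng(0)
raw = rng.normal(size=5000)        # stand-in for a raw vibration record
x = segment_and_normalize(raw)
print(x.shape)                     # (4, 1024)
```

Normalizing each group separately keeps every input in the same numeric range regardless of the absolute vibration amplitude of the recording it came from.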
The structure of the proposed IMSCNN is shown in Figure 3. In practice, vibration signals are often contaminated by noise. To alleviate this problem, a 1-D convolutional layer is first employed in the proposed method to filter the noise contained in the raw vibration signals. To enhance the ability of feature extraction, a dilated multi-scale convolutional (DMSConv) layer with a larger kernel size is then employed to extract multi-scale features. Inside the DMSConv layer, there are four multi-scale convolutional branches, as shown in Figure 4.
In the DMSConv layer, four dilated convolutional kernels with different dilation rates are integrated through the inception structure to extract features. The details of the DMSConv layer are shown in Table 1, where KS and NC represent the kernel size and number of channels, respectively. From
Table 1, it is noticed that the kernel size of each convolutional layer is singular in multi-scale convolutional. This is for the convenience of using the
convolutional to unify the size of feature map output. Therefore, the number of output channels of the DMSconv layer is 4 × NC.
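The multi-scale feature map (MSFM) is defined as the channel-wise concatenation of the four branch outputs,

$$\mathrm{MSFM} = \mathrm{Concat}\left(\mathrm{FM}_1, \mathrm{FM}_2, \mathrm{FM}_3, \mathrm{FM}_4\right),$$

where $\mathrm{FM}_1$, $\mathrm{FM}_2$, $\mathrm{FM}_3$, and $\mathrm{FM}_4$ represent the feature maps after the four dilated convolutions, respectively.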
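To illustrate how four parallel dilated convolutions yield a feature map with 4 × NC channels, the following NumPy sketch applies each branch with zero padding so that every branch keeps the input length, then stacks the branch outputs along the channel axis. It is a simplified forward pass, not the authors' implementation: the dilation rates (1, 1, 2, 3), the kernel size of 3, the random weights, and the function names are all assumptions chosen for illustration, and bias and activation are omitted.

```python
import numpy as np


def dilated_conv1d_same(x, w, dilation):
    """Dilated 1-D cross-correlation with zero padding; output length == input length.

    x: (c_in, length) input, w: (c_out, c_in, k) kernel weights.
    """
    c_out, c_in, k = w.shape
    length = x.shape[1]
    total = dilation * (k - 1)          # padding needed to keep the length
    left = total // 2
    x_pad = np.pad(x, ((0, 0), (left, total - left)))
    y = np.zeros((c_out, length))
    for j in range(k):
        # Tap j reads the input shifted by j * dilation samples.
        shifted = x_pad[:, j * dilation : j * dilation + length]
        y += w[:, :, j] @ shifted
    return y


def dms_conv(x, branch_weights, dilations):
    """Run all branches on the same input and concatenate along channels."""
    maps = [dilated_conv1d_same(x, w, d) for w, d in zip(branch_weights, dilations)]
    return np.concatenate(maps, axis=0)


rng = np.random.default_rng(0)
c_in, nc, k, length = 2, 4, 3, 64       # hypothetical sizes
dilations = (1, 1, 2, 3)                # hypothetical rates; two branches use rate 1
weights = [rng.normal(size=(nc, c_in, k)) for _ in dilations]
msfm = dms_conv(rng.normal(size=(c_in, length)), weights, dilations)
print(msfm.shape)                       # (16, 64), i.e. 4 * NC channels
```

Because every branch preserves the sequence length, the outputs can be concatenated directly without cropping or resampling.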
To increase robustness and reduce the computational effort, max-pooling is performed on the FM obtained after the first DMSConv layer. To extract deeper features, a second DMSConv layer with a smaller kernel size is utilized. Additionally, global average pooling (GAP) is used to compress the features of each channel into four features. Finally, these features are fed into a fully connected (FC) layer for classification.
The structure of the proposed IMSCNN model is shown in Table 2. Vibration signals are often collected against a high-frequency noise background. Thus, a relatively large kernel size and a small kernel size are selected for the first and second DMSConv layers, respectively, to suppress the high-frequency noise. Following [32], the kernel sizes of the two DMSConv layers are set to 32 and 2 in this study, respectively. To train the proposed IMSCNN model, the cross-entropy loss function is adopted.
The widely used Adam [33] optimizer is employed, and batch normalization (BN) [34] is used to regularize the model and reduce the need for Dropout,
where
represents the output of the
ith layer.
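For readers unfamiliar with the two training components named above, the NumPy sketch below shows their commonly used textbook forms: batch normalization with batch statistics (training mode, scalar scale and shift) and mean softmax cross-entropy. These are standard definitions offered as an assumption about what is meant here, not the authors' code; function names and default values are hypothetical.

```python
import numpy as np


def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch axis; x has shape (batch, features)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy; logits (batch, classes), labels integer class ids."""
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


logits = np.zeros((2, 4))               # uniform prediction over 4 classes
print(cross_entropy(logits, np.array([0, 1])))   # equals log(4)
```

A uniform prediction over C classes gives a loss of log C, which is a convenient sanity check for the first training iterations.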
4. Experiments and Results
To verify the performance of the proposed IMSCNN method, two case studies are carried out on the CWRU and PU datasets. For comparison, widely used neural networks including MLP, CNN, and MSCNN are employed. The details of these neural networks are described as follows.
Table 3.
Architecture-related hyperparameters of MLP.
NO. | Layer Name | Layer Size |
---|---|---|
1 | FC1 | |
2 | FC2 | |
3 | FC3 | |
4 | FC4 | |
5 | FC5 | |
6 | FC6 | |
Table 4.
Architecture-related hyperparameters of CNN.
NO. | Layer Name | Layer Size |
---|---|---|
1 | Conv&Pool1 | |
2 | Conv&Pool2 | |
3 | Conv&Pool3 | |
4 | Conv&Pool4 | |
5 | GAP | 4 |
6 | FC1 | |
7 | FC2 | |
8 | FC3 | |
9 | FC4 | |
For all comparative methods, the batch size is set to 64, Adam is used as the optimizer, and the maximum number of epochs is 100. Each input to the neural network is a group of 1024 data points. The experiments are run on an Intel Core i7-8750H CPU @ 2.20 GHz with 24.0 GB RAM and a GeForce GTX 2070 GPU under the Windows 10 operating system. All methods are implemented in Python 3.6.12 and PyTorch 1.7.1.
4.1. Case 1: CWRU
The CWRU dataset was provided by the Case Western Reserve University Bearing Data Center [35]. The vibration data were collected under one normal condition and three fault locations. Each fault location has three fault diameters, so there are a total of 9 kinds of faults to be classified. In this study, the data acquired at a sampling frequency of 12 kHz are selected. The details of the faults are shown in Table 5.
Table 5 shows that, in addition to the normal bearings, there are three different fault locations: Ball (B), Inner Race (IR), and Outer Race (OR). Each fault location has three fault diameters of 0.007 inches, 0.014 inches, and 0.021 inches. All faults were created artificially by electro-discharge machining (EDM).
In the experiment, 80% of the data collected under each condition is used for training and the remaining 20% for testing. The accuracy results are shown in Table 6, and the confusion matrix is shown in Figure 5. It can be seen from
Table 6 that both CNN and MLP offer satisfactory performance, where the accuracy reaches
and
respectively. Through extracting multi-scale features, MSCNN, SimpleIMSCNN, and IMSCNN can provide
accuracy.
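The 80%/20% split applied to each condition can be sketched as a per-class (stratified) split over the sample labels. The text does not state whether samples are shuffled before splitting, so the random shuffle, the seed, and the function name below are assumptions for illustration.

```python
import numpy as np


def split_per_condition(labels, train_fraction=0.8, seed=0):
    """Return train and test index arrays, splitting every class separately."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        # Assumption: samples of a class are shuffled before the split.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(train_fraction * idx.size))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)


labels = np.repeat(np.arange(10), 50)   # 10 conditions, 50 samples each (synthetic)
train, test = split_per_condition(labels)
print(len(train), len(test))            # 400 100
```

Splitting within each condition keeps the class proportions identical in the training and test sets, so no health state is under-represented at test time.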
To further compare the feature extraction ability, t-SNE [36] is used to visualize the features extracted by all methods. As shown in Figure 6, the features extracted by MLP for classes 1, 4, 7, and 8 lie close together, as do the features extracted by CNN for classes 5 and 8. This explains the misclassifications by MLP and CNN, which are also visible in the confusion matrices in Figure 5. In contrast, the features extracted by MSCNN, SimpleIMSCNN, and IMSCNN are relatively far apart across classes.
4.2. Case 2: PU Dataset
The PU dataset was provided by the Paderborn University Bearing Data Center [37] and contains 14 faulty conditions. In this study, the vibration data were collected under the working conditions of rotating speed 1500 rpm, load torque 0.7 N·m, and radial force 1000 N. The 14 faults are described in Table 7, where the fault location is represented by the fault mode. Since the fault type of NO. 13 KI04 is the same as that of NO. 8 KI14, we only consider NO. 8 KI14. Thus, our goal is to classify 13 faulty conditions.
All data are collected on the test rig through a transducer. The sampling frequency of the vibration data is 64 kHz and the sampling time is 4 s. The bearings with real damage used in this experiment were obtained from accelerated lifetime tests. Low-viscosity oil was also used during the tests, which promotes the appearance of damage. Most of the damage is fatigue damage in the form of pitting. The remaining damage is mainly plastic deformation in the form of indentations caused by debris. We use 80% of the data from each condition for training and 20% for testing.
The confusion matrix is displayed in
Figure 7. As shown in
Figure 7, there are many faulty samples misclassified by MLP and CNN. For MLP, the accuracy rate is only
for fault 7. For CNN, the accuracy of fault 11 is only
, and
of samples of fault 11 are misclassified as fault 8. From the data in confusion matrices of SimpleIMSCNN, MSCNN, and IMSCNN, it can be seen that there are much fewer misclassified samples.
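Per-fault accuracy values of the kind read off a confusion matrix can be computed as the diagonal entry divided by the row sum. The sketch below assumes rows index the true class and columns the predicted class; the function names and the tiny label arrays are hypothetical and are not results from the paper.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix with rows = true class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def per_class_accuracy(cm):
    """Fraction of each true class that was predicted correctly."""
    # Note: a class with no test samples gives a zero row sum and a NaN entry.
    return np.diag(cm) / cm.sum(axis=1)


y_true = [0, 0, 1, 1, 2]                # synthetic labels for illustration
y_pred = [0, 1, 1, 1, 2]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(per_class_accuracy(cm))           # [0.5 1.  1. ]
print(np.trace(cm) / cm.sum())          # 0.8 overall accuracy
```

An off-diagonal entry `cm[i, j]` divided by the sum of row `i` gives the share of class `i` samples misclassified as class `j`.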
Similarly, t-SNE is used to visualize the features extracted by all comparative methods. The visualization results are plotted in Figure 8. As Figure 8 shows, the features extracted by MLP and CNN cannot be well separated. Compared to MSCNN and SimpleIMSCNN, the features extracted by IMSCNN are more distinguishable.
Table 8 lists the accuracy results. From
Table 8, the accuracy of MLP and 1DCNN are
and
, respectively. Through extracting multi-scale features, the accuracy of MSCNN is
. On the other hand, the accuracy of SimpleIMSCNN is
. The proposed IMSCNN method can provide the best performance among the comparative methods, where the accuracy reaches
. It indicates that the noises contained in vibration signals can be filtered by the first 1D convolutional layer of the proposed IMSCNN. Thus, the diagnostic performance of IMSCNN is improved.