1. Introduction
In recent years, deep learning (DL) methods have been increasingly used in fault diagnosis and prediction [1,2,3]. Deep learning is a branch of machine learning based on learning representations of data. Intelligent diagnosis using deep learning reduces the heavy reliance of traditional fault diagnosis methods on diagnostic experts and professional technicians, and it eases the mismatch between the large amount of diagnostic data for mechanical equipment and the relatively few diagnostic experts. Traditional machine learning techniques are limited in their ability to process natural data in its raw form. The most obvious difference between deep learning models and traditional models is that DL can learn abstract representation features from the raw data automatically [4].
Several DL methods, such as the deep belief network (DBN), deep auto-encoder (DAE), and convolutional neural network (CNN), have been applied to fault diagnosis [5]. These three deep learning models are built on different base models, and each has its own characteristics in feature learning [6]. DAE is easy to train and is a purely unsupervised feature learning model [7]. DBN is a probabilistic generative model, which can obtain the joint distribution of observed data and labels [8]. CNN has some attractive advantages, such as shift-invariance and weight sharing [9]. Compared with traditional machine learning methods, DL has achieved good results, but its application in fault diagnosis is still in the development stage.
As one of the most effective DL methods, CNNs have been widely used in image classification, object detection, and semantic segmentation. Gaurav Dhiman et al. proposed a method that targets the discrete-attribute characteristics of tumor-related medical events [10]. CNN is capable of representation learning and can perform shift-invariant classification of input information according to its hierarchical structure. Therefore, it is also called a “shift-invariant artificial neural network (SIANN)”. In the 21st century, with the proposal of DL theory and the improvement of numerical computing equipment, CNN has developed rapidly and has been applied to computer vision, natural language processing, and other fields. As a machine learning model under deep supervised learning, CNN has strong adaptability. It is good at mining local features of data and at extracting global features for classification. Its weight-sharing network structure makes it more similar to biological neural networks, and it has achieved good results across pattern recognition [11].
To further improve the performance of deep CNNs, many studies have been carried out since the pioneering AlexNet [12]. Long Wen et al. [13] built a new CNN based on LeNet-5 for fault diagnosis. By converting signals into two-dimensional images, the proposed method can extract features from the converted 2D images and eliminate the effect of handcrafted features. Zhao et al. [14] proposed a method that converts one-dimensional vibration signals into two-dimensional grey images and then uses a 2D-CNN to extract fault features and perform fault classification. However, the conversion process of the above methods may destroy the timing relationship of the original signal, and the memory usage is significantly higher. Gong et al. proposed a method dedicated to one-dimensional data [15]. Huang et al. [16] developed a novel fault diagnosis method to identify the fault state of the bogie and locate the faulty component. Peng et al. [17] proposed a new multi-branch multi-scale convolutional neural network that can automatically learn and fuse rich complementary fault information from multiple signal components and time scales of vibration signals. Xiong et al. [18] proposed a fault diagnosis data preprocessing method based on an interdimensional similarity graph matrix. Mo et al. [19] developed a new approach that integrates a learnable variational kernel into a 1D-CNN; it focuses more on extracting important fault-related features and provides good performance with limited data. Zhang et al. [20] proposed an intelligent fault diagnosis method for rolling bearings with unlabeled data, based on a convolutional neural network (CNN) and the fuzzy C-means (FCM) clustering algorithm. Chen et al. [21] extended a multi-scale CNN with feature alignment (MSCNN-FA) for bearing fault diagnosis under different working conditions. Yu et al. [22] proposed a novel one-dimensional residual convolutional autoencoder (1-DRCAE) for learning features directly from vibration signals in an unsupervised way. Shao et al. [23] proposed a new framework for rotor-bearing system fault diagnosis under varying working conditions by using CNN with transfer learning. Li et al. [24] proposed a novel three-step intelligent fault diagnosis method based on CNN and Bayesian Gaussian mixture (BGM) for rotating machinery. Guo et al. [25] proposed a rolling element bearing fault diagnosis and localization approach based on a multi-task convolutional neural network (CNN) with information fusion. Xie et al. [26] developed a novel intelligent diagnosis method based on multi-sensor fusion (MSF) and CNN.
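The signal-to-image conversion mentioned above can be illustrated with a minimal sketch. It assumes a simple scheme in which a segment of size × size samples is scaled linearly to grey levels 0 to 255 and filled into the image row by row. The function name `signal_to_gray_image`, the 8 × 8 image size, and the synthetic sine input are illustrative assumptions, not the exact procedure of the cited works.

```python
import math


def signal_to_gray_image(signal, size):
    """Fill a size x size image row by row from the first size*size samples,
    scaling amplitudes linearly to integer grey levels in 0..255."""
    n = size * size
    if len(signal) < n:
        raise ValueError("signal is shorter than size * size samples")
    segment = signal[:n]
    lo, hi = min(segment), max(segment)
    span = hi - lo
    if span == 0:
        # A constant segment carries no contrast; map it to black.
        pixels = [0] * n
    else:
        pixels = [round((x - lo) / span * 255) for x in segment]
    # Cut the flat pixel list into rows of length `size`.
    return [pixels[r * size:(r + 1) * size] for r in range(size)]


# Synthetic stand-in for a vibration segment (assumption: 64 samples).
signal = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
image = signal_to_gray_image(signal, 8)
```

In this layout, samples 7 and 8 are adjacent in time but fall at the end of row 0 and the start of row 1, which is one way the timing relationship of the original signal can be disturbed.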
Although the above studies have achieved good results, the following two shortcomings remain:
- (1)
In traditional deep neural networks, the data are downsampled by repeated use of pooling layers. The pooling layer reduces the number of training parameters, which lowers computing costs and improves computing efficiency. However, although the pooling operation achieves a certain degree of translation invariance, it loses the position information between data points. Position information is an extremely important feature in time series signals because it reflects the overall change trend of the signal. Pooling operations may change the local change trend of the signal, leading to misjudgment.
- (2)
In traditional convolutional networks, each feature channel is treated equally, although some channels carry important features while others are redundant or even irrelevant. The above studies do not pay attention to the weight of each feature map channel, which may lead to feature redundancy to a certain extent.
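The loss of position information described in shortcoming (1) can be seen in a toy example. The sketch below applies non-overlapping max pooling (window 2, stride 2) to two short sequences whose local trends are opposite; the helper `max_pool1d` and the sample values are illustrative assumptions.

```python
def max_pool1d(x, size):
    """Non-overlapping 1D max pooling with window and stride equal to `size`."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


# Two toy signals with opposite local trends inside each pooling window.
rising = [0.0, 1.0, 0.0, 2.0]   # each window goes up
falling = [1.0, 0.0, 2.0, 0.0]  # each window goes down

pooled_rising = max_pool1d(rising, 2)
pooled_falling = max_pool1d(falling, 2)
same_output = pooled_rising == pooled_falling
```

Both inputs pool to the same sequence, so a later layer cannot tell whether the peak came first or last within each window.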
In recent years, the achievements of the attention mechanism in the field of computer vision have attracted wide attention from researchers. It can selectively enhance useful features and weaken redundant features. Jie Hu et al. [27] designed squeeze-and-excitation networks (SENets), which learn channel attention for each convolution block, bringing a clear performance gain for various deep CNN architectures. Zilin Gao et al. [28] improved the SE block by capturing more sophisticated channel-wise dependencies or by combining it with additional spatial attention. Jun Fu et al. [29] proposed a dual attention network (DANet) to adaptively integrate local features with their global dependencies. Chen et al. [30] proposed a transferable convolutional neural network to improve the learning of target tasks. Wang et al. [31] proposed a novel multi-task attention convolutional neural network (MTA-CNN) that can automatically give feature-level attention to specific tasks. The MTA-CNN consists of a global feature shared network (GFS network) for learning globally shared features and K task-specific networks with a feature-level attention module (FLA module). This architecture allows the FLA module to automatically learn the features of specific tasks from globally shared features, thereby sharing information among different tasks. Although these methods achieve higher accuracy, they also bring higher model complexity and more computation. Fang et al. [32] extended an efficient feature extraction method based on CNN and used a lightweight network to complete high-precision fault diagnosis tasks. The spatial attention mechanism (SAM) is used to adjust the weight of the output feature map. This method has good anti-noise ability and domain adaptability. Wang Hui et al. [33] proposed a new intelligent bearing fault diagnosis method that combines the symmetrized dot pattern (SDP) representation with a squeeze-and-excitation convolutional neural network (SE-CNN) model. This method assigns a weight to each feature extraction channel, which strengthens the bearing diagnosis model around the main features and reduces redundant information.
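The channel attention idea behind the SE block can be sketched in a few lines: squeeze each channel to its mean, pass the channel descriptor through a small bottleneck with ReLU and sigmoid, and rescale each channel by the resulting weight. The sketch below is a simplified illustration in plain Python; the function name `se_block`, the omission of bias terms, and the hand-picked weight matrices are assumptions made for readability, and in practice the weights are learned during training.

```python
import math


def se_block(features, w1, w2):
    """Simplified squeeze-and-excitation over 1D feature maps.

    features: list of C channels, each a list of samples.
    w1: bottleneck weights, shape (C // r) x C (illustrative values).
    w2: expansion weights, shape C x (C // r) (illustrative values).
    Returns the rescaled features and the per-channel weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in features]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    scale = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every sample of a channel by that channel's weight.
    out = [[s * v for v in ch] for ch, s in zip(features, scale)]
    return out, scale


# Two toy channels and hand-picked weights (hypothetical, not learned).
features = [[1.0, 3.0], [2.0, 2.0]]
w1 = [[0.5, 0.5]]
w2 = [[1.0], [-1.0]]
out, scale = se_block(features, w1, w2)
```

With these weights the first channel is amplified relative to the second, which mirrors how a trained block would enhance informative channels and suppress redundant ones.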
Inspired by the analyses above, this paper proposes a novel improved convolutional neural network (ICNN) fault diagnosis method. The paper makes the following three contributions:
- (1)
The receptive field is used as a guiding principle for the design of the network model. In this paper, the model is always designed so that the receptive field of the last layer is close to the length of the original signal, which ensures that each feature extracted by the model covers the complete sample.
- (2)
ACCC blocks are used to obtain features of a suitable scale while avoiding pooling layers, which can damage the timing relationship of the signal. Moreover, this block calibrates the feature channels so that informative features are significantly enhanced and irrelevant features are effectively suppressed.
- (3)
Tested on two data sets, the proposed method outperforms the other nine methods and achieves the highest average accuracy, which shows that it performs well.
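The receptive-field guideline in contribution (1) can be checked with a short calculation. The sketch below uses the standard recurrence for stacked convolutions without dilation: each layer enlarges the receptive field by (kernel size minus 1) times the product of the strides of all earlier layers. The layer configuration and the signal length of 1024 are hypothetical values chosen for illustration; they are not the architecture of the proposed ICNN.

```python
def receptive_field(layers):
    """Return the receptive field, in input samples, of one output unit.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    """
    field, jump = 1, 1  # jump = spacing of adjacent units, in input samples
    for kernel, stride in layers:
        field += (kernel - 1) * jump
        jump *= stride
    return field


# Hypothetical 1D convolution stack and signal length (illustrative only).
layers = [(64, 8), (16, 4), (8, 2), (4, 2), (4, 2)]
signal_length = 1024
field = receptive_field(layers)
coverage = field / signal_length
```

For this hypothetical stack the last layer sees 984 of the 1024 input samples, so each output feature depends on almost the whole segment.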
The rest of this paper is organized as follows. The standard CNN theory and the proposed method are given in Section 2. In Section 3, the experimental results are discussed and verified on NASA data sets. In Section 4, the conclusions are given.