1. Introduction
With the growth of the social economy, performance criteria for industrial machinery have become more stringent [1]. Rolling bearings are a vital component of industrial equipment; when they fail, equipment performance degrades and the machine may even shut down [2,3,4]. The effectiveness of industrial equipment depends on the state of its bearings, which operate in complex environments and under variable loading, so several factors can trigger bearing faults [5]. At present, fault diagnosis methods for rotating mechanical equipment fall into traditional methods and data-driven methods. In conventional fault diagnosis based on signal processing techniques [6,7,8], the operating data of the equipment are collected and the major characteristics of the vibration signal are extracted by advanced signal feature extraction methods. These methods then identify the health status of the machine and the operating state of the rolling bearing by comparing the characteristic frequency of each bearing fault state with the characteristic spectrum of the vibration signal [9]. Such methods perform well and can locate the characteristic frequency in a complex signal, but they rely heavily on manual experience [10,11], and it is difficult for them to determine the degree or type of a fault. In addition, because of the complex operating environment and varying loading conditions of rolling bearings, establishing a diagnostic model that detects bearing failures automatically remains challenging.
Most data-driven approaches are still built on traditional signal analysis techniques. Kaixuan Shao [12] proposed a bearing fault diagnostic method that integrates variational mode decomposition (VMD), time-shift multiscale dispersion entropy (TSMDE) and a support vector machine (SVM), optimized by the vibrational Harris Hawks optimization algorithm (VHHO). X. Zhang [13] proposed a feature selection and fault detection system that dynamically selects the optimal feature subset and updates the SVM classifier's parameters to achieve the highest diagnostic rate. This method performed well and can simultaneously obtain an appropriate feature subset and SVM model for fault diagnostics. Fan Jiang [14] proposed a bearing defect diagnostic approach that fuses multi-sensor data using a combination of ensemble empirical mode decomposition (EEMD), correlation coefficient analysis and SVM. All of these methods are based on traditional feature extraction and use a large number of hand-crafted features to identify faults, with the fault category then determined by a machine learning classifier. Although they perform diagnosis from signal to defect, their effectiveness depends heavily on the quality of the signal feature extraction. Feature screening with these methods is also quite time-consuming, so they achieve neither intelligent condition recognition nor end-to-end analysis.
Deep learning has evolved data-driven fault detection methods into intelligent fault diagnosis (IFD), which attempts to learn data features for recognition from collected vibration data automatically, rather than manually extracting features, and aims to build a diagnostic model that can automatically connect the collected data and the health status of the machine [15,16]. Convolutional neural networks (CNN), as the most extensively utilized network in deep learning [17,18], excel at adaptive data feature extraction, and CNN is a well-established technique for evaluating two-dimensional data such as images.
Industrial machines usually operate in complex environments with strong ambient noise. Manufacturing inaccuracies, improper installation, running speed, supported load and lubrication of rolling bearings can all generate noise [19]. Noise may also be produced by fluctuating rotation combined with nonlinear vibration introduced by other parts of the machine, such as gears, rotors or other bearings [20]. The monitoring signal of a rolling bearing acquired by a vibration sensor therefore contains considerable noise, and bearing signals collected from equipment such as shield tunneling machines or wind-driven generators are noisier still [21,22]. Because bearing fault signatures are weak, the fault information may be obscured by strong noise in the signal, which is difficult to resolve with traditional methods. The first step of signal analysis is thus usually to denoise the raw signal, after which the rolling bearing fault features are extracted. Such denoising methods require prior knowledge of the noise and additional computation time [23]. The vibration signal of a rolling bearing is non-stationary and non-linear, and the useful information can be buried in the noise, which prevents a convolutional network from detecting faults reliably. Diagnosing the bearing fault from the time-domain signal alone is therefore difficult [24].
As some useful information can be acquired from multi-domain raw signals [25], we develop an improved multi-domain convolutional neural network that combines 1D and 2D convolutional neural networks to address the above problems in bearing fault diagnosis. The method extracts fault feature information from two domains of the bearing vibration signal: the one-dimensional time-series signal and the two-dimensional time–frequency signal. A multilayer perceptron then establishes the mapping between the feature matrix and the bearing fault state, realizing end-to-end fault diagnosis.
Two bearing datasets are used to verify the effectiveness of the proposed method. One is from the Case Western Reserve University (CWRU) bearing vibration experiment, and the other is from a bearing test bench in a vibration laboratory at Dalian University of Technology (DUT). The proposed method is validated on both datasets, and the model accuracy exceeds 99%. We also add different levels of noise to the raw signal to verify the applicability of the method in strong-noise environments, and the test results show that the algorithm generalizes well.
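The noise test mentioned above can be illustrated with a minimal sketch. It assumes additive white Gaussian noise scaled to a target signal-to-noise ratio (SNR) in decibels; the noise model, the function names `add_noise_at_snr` and `measured_snr_db`, and the SNR values are illustrative assumptions, not details taken from the experiments.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, seed=None):
    """Return a noisy copy of `signal` whose SNR is approximately `snr_db` dB.

    Assumption: the disturbance is additive white Gaussian noise.
    """
    rng = random.Random(seed)
    # Average power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    clean = [math.sin(2 * math.pi * 50 * n / 12000) for n in range(12000)]
    for snr in (10, 0, -4):
        noisy = add_noise_at_snr(clean, snr, seed=0)
        print(snr, round(measured_snr_db(clean, noisy), 2))
```

Lower SNR values correspond to stronger noise, so sweeping the SNR downward gives progressively harder test conditions.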
In general, the main objective of this article is to handle the fault state recognition of a rolling bearing intelligently, with no manual experience and in a complicated environment. Moreover, we construct a new fault diagnosis model that extracts the bearing fault state information adaptively by multi-domain learning of a convolution neural network.
The article is organized as follows. Section 1 introduces the background of fault diagnosis, and the types and characteristics of existing fault diagnosis methods. Section 2 gives an overview of fault features, signal processing and related information about CNN. Section 3 describes the proposed fault diagnostics approach. Section 4 gives the results, and Section 5 presents the conclusions of our work.
3. Methodology
Because of the complex operating conditions, the raw vibration signal of a rolling bearing is irregular and chaotic, and it is difficult to achieve effective feature extraction and accurate fault diagnosis with an ordinary convolutional neural network, especially when the signals contain noise. Over-fitting can also occur because of noise and limited data. To address these problems, we introduce two-domain learning on the raw signals, in which a convolutional neural network with a composite-dimension structure extracts the key features adaptively. Dropout is added to mitigate over-fitting.
3.1. Proposed Model Structure
Because multi-domain learning can obtain more useful information from a raw sample, the first task is to choose a second appropriate data domain for the raw signal. Although the raw bearing vibration data are non-stationary series, the time–frequency domain contains more potential information about the bearing state, according to the fault feature frequencies of Formulas (1)–(3). Hence, the raw time series and the time–frequency data are selected as the multi-domain supervised learning inputs of our diagnostic model. However, the time-series and time–frequency data are one-dimensional and two-dimensional, respectively, and a traditional two-dimensional (2D) convolutional neural network cannot operate on time-series data. Our diagnostic model therefore adopts a parallel structure that uses both one-dimensional (1D) and two-dimensional (2D) convolution layers.
The detailed fault diagnosis flow of our proposed method is shown in Figure 2, which contains three parts: data preprocessing, feature extraction and feature classification. In the data preprocessing part, the first task is to construct the time–frequency domain data: the raw vibration signal is transformed into a two-dimensional time–frequency image by the short-time Fourier transform (STFT). The STFT not only retains the characteristic information of the vibration signal but also converts the 1D signal into a 2D representation. The time–frequency image data and the raw time-series data are then both used as inputs to the diagnostic model.
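The STFT preprocessing step can be sketched as follows. This is a minimal pure-Python illustration, assuming a periodic Hann window and a magnitude spectrogram; the function name `stft_magnitude`, the frame length and the hop size are hypothetical choices, not the settings used in this work. In practice a library routine such as `scipy.signal.stft` would normally be used instead of the explicit DFT loop.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return a magnitude spectrogram as a list of frames.

    Each frame holds frame_len // 2 + 1 frequency bins (0 .. Nyquist).
    Assumptions: periodic Hann window, no padding at the signal edges.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    # Slide the window over the signal with a fixed hop.
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            # Direct DFT of the windowed segment at frequency bin k.
            acc = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


if __name__ == "__main__":
    fs = 1024  # hypothetical sampling rate in Hz
    tone = [math.sin(2 * math.pi * 128 * n / fs) for n in range(512)]
    image = stft_magnitude(tone)
    print(len(image), "time frames x", len(image[0]), "frequency bins")
```

The resulting 2D array (time frames by frequency bins) plays the role of the time–frequency image, while the untouched 1D signal is kept as the second input.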
In the feature extraction part, parallel convolution layers of different dimensionality are constructed to extract the key state feature information from the two shapes of data. The first group, consisting of 2D convolution layers, operates on the time–frequency image and extracts useful features adaptively; the second group, consisting of 1D convolution layers, operates on the time series and extracts the key information from the raw signal adaptively. Two different high-dimensional feature maps are thus obtained, which are merged and classified in the next step.
In the feature classification part, the main purpose is to build the classification relationship from the extracted features to the bearing fault state. First, the two differently shaped feature maps produced by the preceding convolution operations are fused into one by an average pooling operation. The fused feature map is then fed into multiple fully connected layers, which map the feature data to the fault state through a series of computations on the neural nodes and hidden layers.
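One way to picture the fusion step is sketched below. It assumes that each branch's feature maps are reduced by global average pooling to one value per channel and that the two resulting vectors are concatenated; the exact pooling configuration is not specified here, so the function names and the concatenation step are illustrative assumptions.

```python
def global_avg_pool_2d(feature_maps):
    """Average each 2D channel (a list of rows) down to a single value."""
    pooled = []
    for channel in feature_maps:
        total = sum(sum(row) for row in channel)
        count = sum(len(row) for row in channel)
        pooled.append(total / count)
    return pooled


def global_avg_pool_1d(feature_maps):
    """Average each 1D channel (a list of values) down to a single value."""
    return [sum(channel) / len(channel) for channel in feature_maps]


def fuse_features(maps_2d, maps_1d):
    """Concatenate the pooled 2D-branch and 1D-branch features (assumed fusion)."""
    return global_avg_pool_2d(maps_2d) + global_avg_pool_1d(maps_1d)


if __name__ == "__main__":
    maps_2d = [[[1, 2], [3, 4]], [[0, 0], [0, 8]]]  # two channels of size 2x2
    maps_1d = [[1, 3], [2, 2, 2]]                   # two channels, lengths 2 and 3
    print(fuse_features(maps_2d, maps_1d))
```

Pooling to one value per channel makes the two branches compatible regardless of their spatial shapes, so the fused vector can be passed directly to fully connected layers.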
3.2. The Design of Hyper-Parameters
Our method also requires determining the structure of the neural network, such as the number of layers and neurons in the convolutional layers, the size of the convolution kernel, and the number of layers and neurons in the multilayer perceptron. Increasing the number of network layers improves training performance up to a point, but the training time grows considerably at the same time. Guided by well-known reference networks [32,34] and pre-tests on a few samples, we chose an architecture that combines two 2D convolution layers, two 1D convolution layers and a three-layer multilayer perceptron; a detailed description is given in Table 1.
Choosing an appropriate loss function speeds up model training and helps the model converge quickly. Commonly used loss functions include the mean squared error (MSE) and the cross-entropy. This paper uses the cross-entropy as the loss of the network. Compared with MSE, the cross-entropy loss function can improve the training of neural networks, particularly convolutional networks [35]. The cross-entropy loss function is shown in Formula (8).
where $M$ represents the total number of classes; $y_{ic}$ is an indicator variable equal to 1 if category $c$ is the label of training sample $i$ and 0 otherwise; and $p_{ic}$ represents the predicted probability that the observed sample $i$ belongs to category $c$.
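The cross-entropy computation can be sketched in a few lines. The sketch assumes the usual multi-class form, $-\sum_{c=1}^{M} y_{ic}\log(p_{ic})$ for each sample $i$, averaged over the samples in a batch; the averaging, the clipping constant `eps` and the function name `cross_entropy` are assumptions made for illustration.

```python
import math


def cross_entropy(one_hot_labels, probabilities, eps=1e-12):
    """Mean multi-class cross-entropy over a batch.

    one_hot_labels[i][c] is y_ic (0 or 1); probabilities[i][c] is p_ic.
    Probabilities are clipped at `eps` so that log(0) is never evaluated.
    """
    n_samples = len(one_hot_labels)
    total = 0.0
    for y_i, p_i in zip(one_hot_labels, probabilities):
        for y_ic, p_ic in zip(y_i, p_i):
            total -= y_ic * math.log(max(p_ic, eps))
    return total / n_samples


if __name__ == "__main__":
    labels = [[1, 0, 0], [0, 1, 0]]
    probs = [[0.5, 0.25, 0.25], [0.1, 0.8, 0.1]]
    print(cross_entropy(labels, probs))
```

Because only the term for the true class is non-zero, the loss for each sample reduces to the negative log of the probability assigned to its correct class.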
The connection between the raw data and the output of the network model is a sophisticated non-linear mapping. Adding an activation function after each network layer introduces non-linearity into the model. The most common activation functions are Sigmoid, Tanh and ReLU, as shown in Formulas (9)–(11). The ReLU activation function was selected on the basis of preliminary tests.
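For reference, the three activation functions can be written out using their standard definitions: $\sigma(x) = 1/(1+e^{-x})$, $\tanh(x) = (e^{x}-e^{-x})/(e^{x}+e^{-x})$ and $\mathrm{ReLU}(x) = \max(0, x)$. The scalar Python sketch below is illustrative only; the branch in `sigmoid` is a numerical-stability detail added here, not part of the definition.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tanh(x):
    """Hyperbolic tangent; output lies in (-1, 1)."""
    return math.tanh(x)


def relu(x):
    """Rectified linear unit: passes positive inputs, zeroes out the rest."""
    return max(0.0, x)


if __name__ == "__main__":
    for x in (-2.0, 0.0, 2.0):
        print(x, sigmoid(x), tanh(x), relu(x))
```

Unlike Sigmoid and Tanh, ReLU does not saturate for positive inputs, which is one commonly cited reason it trains well in deep convolutional networks.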
Furthermore, the diagnostic model contains many parameters that must be adjusted and optimized during training. However, the training data collected from the device cover only a small proportion of real operating conditions, and defect samples are equally scarce. Over-fitting can therefore occur during training: the model yields good results on the training dataset but performs poorly on the test dataset or on newly collected data. To avoid this problem, our method employs dropout. During model training, some hidden neurons are dropped with a certain probability, which encourages the network to learn more robust features and enhances its generalization ability.
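The dropout mechanism can be illustrated with a short sketch. It assumes the common "inverted dropout" variant, in which surviving activations are rescaled during training so that no adjustment is needed at inference time; the variant, the function name `dropout` and the drop probability used in the example are assumptions, not settings reported here.

```python
import random


def dropout(activations, drop_prob, training=True, rng=random):
    """Apply inverted dropout to a list of activations.

    During training each activation is zeroed with probability `drop_prob`
    and the survivors are scaled by 1 / (1 - drop_prob), which keeps the
    expected value unchanged. At inference the input is returned as-is.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


if __name__ == "__main__":
    rng = random.Random(0)
    print(dropout([1.0, 2.0, 3.0, 4.0], drop_prob=0.5, rng=rng))
    print(dropout([1.0, 2.0, 3.0, 4.0], drop_prob=0.5, training=False))
```

Because a different random subset of neurons is silenced at every training step, no single neuron can be relied on exclusively, which is what discourages over-fitting.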
Meanwhile, the size of the image data changes after each convolution, as shown in Formula (12), so the data matrix shrinks. In addition, values on the edge of the matrix enter the computation only once, whereas values near the center are used many times, so edge information is under-represented and gradually lost. Padding is therefore applied, as shown in Formula (13): the border of the image is padded with zeros before each convolution, which prevents the original image from shrinking after repeated convolutions.
where W represents the size of the image data, F represents the size of the convolution kernel, P represents the size of the padding, S represents the stride (step length of the kernel), and N represents the size of the output data.
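The effect of padding on the output size can be checked numerically. The sketch below uses the standard convolution size relation $N = \lfloor (W - F + 2P)/S \rfloor + 1$, which reduces to $N = \lfloor (W - F)/S \rfloor + 1$ when $P = 0$; it is assumed here that Formulas (12) and (13) express these two cases. The function names and the example sizes are illustrative.

```python
def conv_output_size(w, f, s=1, p=0):
    """Output size N for input size W, kernel size F, stride S and padding P."""
    if f > w + 2 * p:
        raise ValueError("kernel is larger than the padded input")
    return (w - f + 2 * p) // s + 1


def same_padding(f):
    """Padding that keeps the size unchanged for stride 1 and an odd kernel."""
    if f % 2 == 0:
        raise ValueError("size-preserving padding needs an odd kernel size")
    return (f - 1) // 2


if __name__ == "__main__":
    w = 32  # hypothetical input size
    print(conv_output_size(w, f=3))                     # no padding: shrinks
    print(conv_output_size(w, f=3, p=same_padding(3)))  # zero padding: size kept
```

With a 3 × 3 kernel and stride 1, each unpadded convolution removes two rows and two columns, whereas one ring of zeros keeps the size constant across layers.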