1. Introduction
The rolling bearing is a critical component of rotating machinery such as wind turbines, hydraulic turbines, and aero-engines. Rolling bearing faults are the leading cause of failure in rotating machinery, resulting in unplanned downtime and economic losses. Diagnosing rolling bearing faults before catastrophic failure is one of the main concerns of these industries [1] and is essential for the availability and reliability of mechanical systems [2].
The traditional fault diagnosis of rolling bearings is based on extracting fault characteristic frequencies (FCFs) from monitoring signals [2,3]. FCFs can be calculated from the geometric parameters and running speeds of the rolling bearings. However, most rolling bearings work under speed-varying conditions in practice, including the run-up and shutdown of machines and speed fluctuation due to variable loads. When the speed varies, the FCFs change with it, and the spectrum of the nonstationary signal is smeared. Order analysis is a commonly employed approach for dealing with time-varying FCFs [4]. The effect of speed variation can be removed by resampling the raw signal in the angular domain, which converts the FCFs to fault characteristic orders (FCOs) that remain constant. Order analysis relies heavily on the rotational frequency, which can be obtained by installing an auxiliary tachometer [5] or estimated from the existing signals [6,7]. In addition, many studies have focused on extracting time-varying fault characteristics directly through time-frequency analysis. Several methods have been proposed to obtain high-quality time-frequency representations with fine resolution and better energy concentration [8,9,10].
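As a rough illustration of how FCFs follow from bearing geometry and shaft speed, the sketch below uses the standard kinematic formulas for a bearing with a stationary outer race. The function names and the geometry values are hypothetical and are not taken from the test rig in this paper. Dividing each FCF by the rotational frequency gives the corresponding FCO, which does not change with speed.

```python
import math

def fault_characteristic_frequencies(f_r, n_balls, d_ball, d_pitch, contact_angle_deg=0.0):
    """Standard kinematic FCFs (Hz) for a bearing with a fixed outer race.

    f_r is the shaft rotational frequency in Hz; d_ball and d_pitch share one unit.
    """
    ratio = (d_ball / d_pitch) * math.cos(math.radians(contact_angle_deg))
    return {
        "BPFO": 0.5 * n_balls * f_r * (1.0 - ratio),                  # outer-race fault
        "BPFI": 0.5 * n_balls * f_r * (1.0 + ratio),                  # inner-race fault
        "BSF": 0.5 * (d_pitch / d_ball) * f_r * (1.0 - ratio ** 2),   # rolling-element spin
        "FTF": 0.5 * f_r * (1.0 - ratio),                             # cage frequency
    }

def fault_characteristic_orders(n_balls, d_ball, d_pitch, contact_angle_deg=0.0):
    """FCOs are the FCFs per shaft revolution, i.e. the FCFs evaluated at f_r = 1."""
    return fault_characteristic_frequencies(1.0, n_balls, d_ball, d_pitch, contact_angle_deg)

# Hypothetical geometry, for illustration only.
geometry = dict(n_balls=8, d_ball=8.0, d_pitch=40.0)
for speed_hz in (10.0, 20.0, 30.0):
    fcf = fault_characteristic_frequencies(speed_hz, **geometry)
    print(speed_hz, {name: round(value, 2) for name, value in fcf.items()})
print(fault_characteristic_orders(**geometry))
```

The FCFs scale linearly with the shaft speed, while the orders printed on the last line depend only on the geometry, which is why order analysis needs an accurate rotational frequency.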
Although these methods have proved effective under speed-varying conditions, prior knowledge is necessary to calculate the FCFs or FCOs. Deep learning (DL) algorithms have recently gained increasing interest and have proved their ability to diagnose faults more automatically and intelligently [11,12]. As a widely studied DL method, the convolutional neural network (CNN) has obtained promising results in fault diagnosis [13], especially for rolling bearings [14]. With sparse connectivity and shared weights, a CNN can learn deep features while, to some extent, avoiding overfitting and consuming fewer computational resources. Some studies use a CNN as an automatic feature extractor followed by other classifiers [15]. However, most attention has focused on end-to-end fault diagnosis that combines feature learning and classification [16].
Studies on the fault diagnosis of rolling bearings under variable speed conditions using CNN represent a growing field. Huang et al. [17] constructed datasets covering six different operating conditions to study the performance of CNN on nonstationary signals. Similarly, multiple-speed working conditions were considered in Refs. [18,19,20]. CNN-based model validation experiments were also performed using datasets under speed-up conditions [21,22]. Usually, the monitoring data points are divided into two datasets using a sliding window or by random sampling. One dataset is used to train the fault diagnosis model, while the other serves as the testing dataset that verifies the model performance. Although variable speed is investigated in these studies, the testing samples share the same operating conditions as the training samples. This is not the case in the real world, where the operating speed of rolling bearings is more changeable. It is generally challenging to gather comprehensive data covering all speeds for training, especially during the early operation of the equipment. Therefore, more attention should be paid to the generalization of the model. This paper evaluates how well a fault diagnosis model for rolling bearings under speed-varying conditions adapts across different speed domains.
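The difference between the two evaluation settings can be sketched as follows. The helper names, the window length, and the synthetic records keyed by speed are illustrative assumptions; this is not the dataset construction used in the paper.

```python
import numpy as np

def sliding_windows(signal, window, stride):
    """Cut a 1-D signal into overlapping samples of length `window`."""
    starts = range(0, len(signal) - window + 1, stride)
    return np.stack([signal[s:s + window] for s in starts])

def same_speed_split(records, window, stride, train_fraction=0.7, seed=0):
    """Random split: every speed appears in both the training and testing sets."""
    rng = np.random.default_rng(seed)
    train, test = [], []
    for signal in records.values():
        samples = sliding_windows(signal, window, stride)
        order = rng.permutation(len(samples))
        cut = int(train_fraction * len(samples))
        train.append(samples[order[:cut]])
        test.append(samples[order[cut:]])
    return np.concatenate(train), np.concatenate(test)

def cross_speed_split(records, window, stride, test_speeds):
    """Cross-domain split: the testing speeds are never seen during training."""
    train = [sliding_windows(sig, window, stride)
             for speed, sig in records.items() if speed not in test_speeds]
    test = [sliding_windows(sig, window, stride)
            for speed, sig in records.items() if speed in test_speeds]
    return np.concatenate(train), np.concatenate(test)

# Synthetic stand-in records: one noisy sinusoid per hypothetical shaft speed (Hz).
rng = np.random.default_rng(0)
t = np.arange(4096) / 4096.0
records = {speed: np.sin(2 * np.pi * speed * t) + 0.1 * rng.standard_normal(t.size)
           for speed in (10, 20, 30)}

x_train, x_test = same_speed_split(records, window=1024, stride=256)
print("same-speed split:", x_train.shape, x_test.shape)
x_train, x_test = cross_speed_split(records, window=1024, stride=256, test_speeds={30})
print("cross-speed split:", x_train.shape, x_test.shape)
```

In the first split a model is tested on speeds it has already seen; in the second the held-out speed acts as an unseen domain, which is the harder setting this paper targets.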
Some studies attempt to deal with the cross-domain problem using transfer learning algorithms, but transferring the model is relatively complicated, and data from the target domain are essential for training [23,24]. A CNN model could realize the cross-domain transfer of working speeds without domain adaptation treatments. Wang et al. [25] demonstrated the robustness of the CNN model to missing data by removing some data points from the training datasets; the maximum missing data rate in their work was 40%. Wang et al. [26] tested a CNN-based model with training and testing datasets from different working speeds. However, multiple constant speeds instead of time-varying speeds were investigated, and the time series of the raw signals were converted into feature images before being fed into the CNN model. This paper addresses the fault diagnosis of rolling bearings under speed-varying conditions using an improved CNN model that works directly on the raw signals. The speed-varying conditions cover several situations. On the one hand, the samples could be at multiple constant speeds or at speeds that change with time; on the other hand, the training and testing samples could be at the same or different speeds. A novel CNN architecture, namely the fusion multiscale convolutional neural network (F-MSCNN), is proposed to address these speed-varying problems. Compared with the original CNN model, a fusion layer and a multiscale layer are added to enhance the model’s generalization ability.
One of the outstanding advantages of CNN is its capability to process multidimensional data, so that the input of a CNN model can contain heterogeneous information [20]. CNN with data fusion has proved beneficial in improving the fault diagnosis accuracy of rolling bearings. Comprehensive information on the faults can be gathered by fusing the monitoring signals. Related work includes fusing vibration data from multiple directions [27], fusing vibration signals from sensors mounted at different locations [28], and fusing vibration and current signals [29]. Sound signal analysis provides a contactless and effective solution for the fault diagnosis of rotating machines [30,31]. Incipient fault detection and classification can be achieved using sound data collected in a noisy environment [32]. Inspired by these works, this paper studies the fusion of the sound and vibration data of rolling bearings to obtain complementary information on the faults. Instead of simply stitching the multiple signals together, a two-dimensional fusion layer is added at the beginning of the network to obtain the common and characteristic information of the sound and vibration signals.
In addition, convolutional kernels of different sizes are adopted in the proposed model. The concept of multiscale convolution comes from the CNN model codenamed Inception [33]. Convincing results have been obtained by introducing several Inception modules into the original CNN model [34]. Multiscale learning has also been performed with parallel convolutional pathways and feature-level fusion at the later layers [35,36]. Such structures can achieve good performance but increase the number of model parameters. Huang et al. [17] used a multiscale convolution at the beginning of the model to find the sensitive bands in the frequency domain at different resolutions. The fused sound and vibration signals also contain heterogeneous information, including multiple time characteristics at varying speeds. Therefore, this paper adds a multiscale layer after the fusion layer to learn the heterogeneous features. The features are then concatenated and passed to further feature learning layers to realize end-to-end automatic fault diagnosis.
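The idea of a fusion layer followed by a multiscale layer can be sketched in NumPy as below. This is only a minimal forward-pass illustration: the filter counts, kernel widths, stride, zero padding, and random weights are all assumptions, and the function names are hypothetical. The actual F-MSCNN configuration is the one described in Section 3.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def fuse_2d(sound, vibration, kernels, stride):
    """Fuse two raw signals with 2-D kernels of shape (n_filters, 2, width).

    Each kernel spans both sensor rows, so every output value mixes sound
    and vibration samples from the same time window. As in most deep
    learning libraries, the kernel is not flipped (cross-correlation).
    """
    stacked = np.stack([sound, vibration])                      # (2, length)
    width = kernels.shape[-1]
    windows = np.lib.stride_tricks.sliding_window_view(stacked, width, axis=1)
    windows = windows[:, ::stride]                               # (2, n_out, width)
    return relu(np.einsum("rtw,frw->ft", windows, kernels))      # (n_filters, n_out)

def multiscale_1d(features, kernel_bank):
    """Apply several kernel widths in parallel and concatenate along channels.

    features: (channels, length); kernel_bank: list of (n_filters, channels, width).
    Zero padding keeps every branch at the input length so they can be stacked.
    """
    branches = []
    for kernels in kernel_bank:
        width = kernels.shape[-1]
        left = (width - 1) // 2
        padded = np.pad(features, ((0, 0), (left, width - 1 - left)))
        windows = np.lib.stride_tricks.sliding_window_view(padded, width, axis=1)
        branches.append(relu(np.einsum("ctw,fcw->ft", windows, kernels)))
    return np.concatenate(branches, axis=0)

rng = np.random.default_rng(0)
sound = rng.standard_normal(1024)        # stand-in for a raw sound sample
vibration = rng.standard_normal(1024)    # stand-in for a raw vibration sample

fusion_kernels = 0.1 * rng.standard_normal((4, 2, 16))
fused = fuse_2d(sound, vibration, fusion_kernels, stride=4)

kernel_bank = [0.1 * rng.standard_normal((3, 4, width)) for width in (3, 7, 15)]
features = multiscale_1d(fused, kernel_bank)
print(fused.shape, features.shape)
```

Short kernels respond to fast transients and long kernels to slower patterns, so concatenating the branches gives later layers features at several time scales from the same fused input.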
This paper primarily includes the following contributions:
(1) The data fusion, feature extraction, and fault classification processes are combined into an end-to-end automated procedure. The proposed model works directly on the raw signals of sound and vibration.
(2) The proposed method has a strong generalization capacity. Different speed-varying situations are taken into consideration in model validation. Datasets are constructed from samples at multiple constant speeds or at speeds changing with time. The speeds of the testing samples are either the same as those of the training samples or unseen during training. The proposed method achieves high and stable accuracy across the datasets, particularly when the diagnosis crosses different speed domains.
(3) The F-MSCNN method adds a fusion layer and a multiscale convolutional layer to enhance the model performance under speed-varying conditions. The adaptive fusion of the sound and vibration data is realized at the beginning of the network to input comprehensive information to the model. The multiscale convolution operators could increase the depth and width of the network to learn more comprehensive features.
(4) The feature maps learned by F-MSCNN are visualized to reveal the inner feature learning mechanism.
This paper consists of five sections. Section 2 formulates the traditional CNN method. Section 3 describes the proposed F-MSCNN in detail. A verification experiment is presented in Section 4 with a discussion of the diagnosis results, including visualization analysis, computational cost analysis, and comparison with other CNN-based methods and machine learning methods. Finally, the conclusions and further work are presented in Section 5.
2. Convolutional Neural Networks
A convolutional neural network is a specialized kind of neural network that uses the convolution operation in at least one layer in place of a general matrix multiplication [37]. A typical CNN structure includes an input layer, convolutional layers, pooling layers, and a fully connected layer. The convolutional layer uses filter kernels to convolve local regions of the input and generate local features. The desired features can be reorganized and extracted from the raw input data through a series of convolutions. The $j$th output feature matrix of the $l$th convolutional layer, $x_j^l$, is described as:

$$z_j^l = \sum_i x_i^{l-1} * k_{ij}^l + b_j^l, \qquad x_j^l = \Phi\left(z_j^l\right)$$

where $*$ is the convolution operator and $x_i^{l-1}$ is the $i$th input feature matrix of the layer $l-1$. $k_{ij}^l$ and $b_j^l$ are the weights of the convolution kernel and the biases, respectively. The feature matrix $z_j^l$ is the output of the convolution operation. An activation function $\Phi$ is added to enhance the nonlinear expression ability. The rectified linear unit (ReLU) and the sigmoid are commonly used as activation functions.
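A minimal NumPy sketch of the forward pass of one one-dimensional convolutional layer is given below. The function name, array layout, and toy values are assumptions made for illustration, and ReLU is used as the default activation.

```python
import numpy as np

def conv1d_layer(inputs, kernels, biases, activation=lambda z: np.maximum(z, 0.0)):
    """Forward pass of one 1-D convolutional layer.

    inputs:  (n_in, length)        input feature maps of layer l - 1
    kernels: (n_in, n_out, width)  one kernel per input/output pair
    biases:  (n_out,)
    Returns  (n_out, length - width + 1) feature maps after the activation.
    """
    n_in, n_out, width = kernels.shape
    length = inputs.shape[1]
    outputs = np.zeros((n_out, length - width + 1))
    for j in range(n_out):
        for i in range(n_in):
            # np.convolve flips the kernel, i.e. it is a true convolution;
            # most deep learning libraries skip the flip (cross-correlation).
            outputs[j] += np.convolve(inputs[i], kernels[i, j], mode="valid")
        outputs[j] += biases[j]
    return activation(outputs)

x = np.array([[1.0, 2.0, 3.0, 4.0, 5.0]])   # one input feature map
k = np.array([[[1.0, 0.0, -1.0]]])          # one kernel, shape (1, 1, 3)
print(conv1d_layer(x, k, np.array([0.0])))
```

For this toy ramp input the kernel acts as a difference filter, so every output value equals 2 before and after the ReLU.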
The pooling layer is used to reduce the spatial size of the feature map. The pooling operation helps to reduce computational complexity and avoid overfitting. The pooling layer does not learn features; it performs downsampling by dividing each input feature map into rectangular pooling regions. Max-pooling is the most widely used pooling type. For one-dimensional max-pooling, given the size of the pooling region $w$, the $j$th max-pooling output of the layer $l+1$ is presented as:

$$p_j^{l+1}(t) = \max_{(t-1)w+1 \le n \le tw} x_j^l(n)$$

where $x_j^l$ is the $j$th output feature map of layer $l$, $t$ denotes the $t$th pooling region, and $n \in [(t-1)w+1,\, tw]$. The maximum of each region is returned.
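The one-dimensional max-pooling described above can be sketched in a few lines of NumPy. The function name is hypothetical, and dropping trailing samples that do not fill a complete region is an assumption of this sketch.

```python
import numpy as np

def max_pool1d(feature_maps, w):
    """Non-overlapping 1-D max-pooling with region size w.

    feature_maps: (channels, length). Trailing samples that do not fill
    a complete pooling region are dropped.
    """
    channels, length = feature_maps.shape
    n_regions = length // w
    regions = feature_maps[:, :n_regions * w].reshape(channels, n_regions, w)
    return regions.max(axis=2)

x = np.array([[1.0, 3.0, 2.0, 0.0, 5.0, 4.0, 7.0]])
print(max_pool1d(x, w=2))
```

With a region size of 2, the toy input is split into the regions (1, 3), (2, 0), and (5, 4), whose maxima are 3, 2, and 5; the last sample is discarded.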
After several rounds of convolutional and pooling layers, the fully connected layer is added to generate the high-level feature vector. The fully connected layer multiplies the inputs by a weight matrix and then adds a bias vector. Each neuron of the previous layer is connected to this layer:

$$y_j^{l+1} = \sum_i w_{ij}^l x_i^l + b_j^l$$

where $w_{ij}^l$ and $b_j^l$ indicate the weight and bias, respectively. $x_i^l$ is the $i$th input feature map of layer $l$, and the corresponding output is $y_j^{l+1}$. For a classification problem, a softmax activation function is applied in the last fully connected layer to recognize the patterns of the feature. The softmax function calculates the probability that each sample belongs to each of the possible target classes.
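The fully connected layer and the softmax output can be sketched as follows. The feature length, the number of classes, and the random weights are placeholders chosen for illustration, not values from this paper.

```python
import numpy as np

def fully_connected(x, weights, bias):
    """Affine map: multiply the input vector by a weight matrix, then add a bias."""
    return weights @ x + bias

def softmax(logits):
    """Class probabilities; subtracting the maximum avoids overflow in exp."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

rng = np.random.default_rng(0)
features = rng.standard_normal(8)               # flattened high-level feature vector
weights = 0.1 * rng.standard_normal((4, 8))     # 4 hypothetical fault classes
bias = np.zeros(4)

probabilities = softmax(fully_connected(features, weights, bias))
print(probabilities, probabilities.argmax())
```

The probabilities sum to one, and the index of the largest entry is taken as the predicted class.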
5. Conclusions
This paper proposed an improved model named F-MSCNN for the fault diagnosis of rolling bearings under speed-varying conditions using sound and vibration signals. The aim was to classify different types of faults accurately when the speeds of the testing samples were nonstationary and unknown to the trained model. F-MSCNN added a fusion layer and a multiscale convolutional layer to the original CNN to enhance the generalization ability by combining information on health conditions and working conditions from different sensors and integrating multiscale features. F-MSCNN provided a simple end-to-end fault diagnosis framework that worked directly on raw sound and vibration signals without hand-crafted feature extraction. The sound and vibration signals were first fused by a two-dimensional convolution to extract the common and distinctive features. The fused signals were then convolved with multiscale kernels to represent the features at various speeds.
The proposed method was tested on sound and vibration data from a rolling bearing test bed. Six datasets were constructed to test the effectiveness of the model under different speed-varying conditions. Not only multiple speeds but also speeds fluctuating with time were studied. The speeds of the testing samples could be the same as those of the training samples or, more strictly, restricted to new working conditions unknown to the trained model. The results showed that the proposed method had a strong adaptation ability across different speed domains and could achieve a high and stable diagnosis accuracy under different working speeds. Compared with other CNN-based methods and other machine learning methods, the proposed method showed higher accuracy, better stability, and stronger robustness to changes in speed. In addition, to display the inner feature learning mechanism of the network, visualization analysis was performed, including visualizing the activations of all the convolutional layers and the feature distribution.
Further work will consider other changeable working conditions, such as the working loads, and conduct experiments on other objects, including the gearbox, to expand the scope of application of the model. The performance of the proposed model could be further explored under compound faults using a blind dataset. An example of a blind test is shown in Reference [42]. Additionally, more effort could be made to further improve the performance of the proposed model, especially when the speeds change with time.