1. Introduction
With the development of the manufacturing industry, rolling bearings, as core components of mechanical equipment, play an increasingly important role. However, after long-term operation under strong noise and multiple loads, bearings are prone to wear or breakage. An unexpected failure, such as a crack in a bearing, may cause the breakdown of the entire machine, resulting in significant economic losses and severe safety accidents [1]. Therefore, efficient and accurate bearing fault diagnosis is of great significance.
Bearing faults generally occur in the inner ring, outer ring or rolling elements, and a faulty bearing usually produces periodic vibrations when the machine is running, so analyzing the vibration signal during bearing operation is the key to fault diagnosis [2]. Traditional fault diagnosis methods are mainly divided into linear and nonlinear methods. Linear methods mainly include time-domain, frequency-domain and time-frequency analysis [3]. Nonlinear analysis is less widely adopted in fault diagnosis than linear analysis; commonly applied nonlinear methods include chaos theory, fractal dimension and entropy-based analysis, among others. However, as bearing fault datasets grow and production environments become increasingly complex, traditional methods that rely on manual extraction of fault features are no longer adequate [4]. Therefore, constructing novel fault diagnosis models based on deep learning has become a research hotspot.
Frequently used deep learning models include the deep autoencoder (DAE), the deep belief network (DBN), the convolutional neural network (CNN) and the recurrent neural network (RNN). Hou et al. [5] proposed an improved stacked denoising autoencoder (SDAE) diagnosis method in which the hyperparameters of the DAE network were selected adaptively by particle swarm optimization to determine the structure of the SDAE network; the resulting feature representation of the fault state was fed into a Softmax classifier for fault classification and recognition, and the method achieved accurate fault diagnosis under variable operating conditions. Shao et al. [6] studied the DAE in depth: they used a DAE and a contractive autoencoder to improve the extraction of fault features and applied locality preserving projection to fuse the features and improve their quality. Liang T et al. [7] presented a three-step method for rolling bearing fault diagnosis: a series of DBNs with different hyperparameters were constructed and trained, an improved ensemble method was applied to obtain the weight matrix of each DBN, and the DBNs then voted according to their respective weight matrices to produce the final diagnosis. Ma et al. [8] adopted a DBN-based method for bearing degradation assessment under accelerated life testing. Shao et al. [9] proposed a DBN for the diagnosis of induction motor faults that takes vibration signals directly as input, and used the t-SNE algorithm to visualize the learned representations. Han Tao et al. [10] obtained feature maps of the multiwavelet coefficients through the wavelet transform and trained a CNN on them to realize intelligent diagnosis of compound rolling bearing faults. Liang et al. [11] constructed two CNNs, one extracting time-domain features and the other extracting time-frequency features obtained by the continuous wavelet transform, and fused the two kinds of features to diagnose rolling bearing faults. Pan et al. [12] established a bearing fault diagnosis method based on long short-term memory (LSTM) and CNN models. Zhang et al. [13] proposed a fault diagnosis method in which a convolutional denoising autoencoder performed feature extraction and a CNN performed pattern recognition. Yu et al. [14] designed a stacked LSTM network that used augmented data to classify 12 different bearing health conditions, covering both the type and the severity of bearing faults. Zhao et al. [15] designed a convolutional bidirectional LSTM network for bearing fault diagnosis, in which a CNN extracts features from the original signal and a bidirectional LSTM network then classifies the bearing faults.
Based on this brief review of existing diagnosis approaches, the remaining challenges can be summarized as follows. First, many methods conduct comparative experiments with only a single type of noise and do not consider other types. Second, deep learning structures are often determined by trial and error, i.e., the structure is defined somewhat arbitrarily and the best-performing model is selected after many experiments [16]. To solve these problems, an adaptive ADSD-gcForest fault diagnosis model is proposed in this paper. Building on an existing conventional network, it effectively extracts core fault features at different scales under strong noise interference by combining dilated convolutions with different dilation rates and the convolutional block attention module (CBAM). In addition, depthwise separable convolution is incorporated into the dilated convolution to improve computational efficiency [17]. In recent years, many adaptive optimization methods have been developed for network structures, but most of them require the assistance of an intelligent optimization algorithm or transfer learning [18,19]. In contrast, the Meta-ACON activation function optimizes the network structure simply, based on the input samples and without additional complex algorithms; it not only optimizes the model structure but also allows the model to handle different sample data better. Finally, the multi-grained scanning and cascade forest of the deep forest classifier analyze and extract the hidden fault features in the feature vector and output the final classification results. The main contributions of this article are as follows:
An adaptive ADSD-gcForest model is proposed for rolling bearing fault diagnosis, which enables feature extraction under high-noise and complex working conditions. The structure of the model is adaptively optimized according to the characteristics of the sample data.
Combining multiscale depthwise separable dilated convolution with CBAM allows fault features to be extracted effectively under strong noise interference. Without adjusting the original structure of the model, the Meta-ACON activation function is introduced into the convolutional layers to optimize the model structure adaptively according to the fault data of different bearings.
Comparative experiments show that the proposed ADSD-gcForest model has strong generalization ability and robustness, and therefore offers practical value.
The rest of this paper is organized as follows: Section 2 introduces the related theories, Section 3 describes the specific structure of the adaptive ADSD-gcForest model, Section 4 presents the experimental comparisons, and Section 5 draws the conclusions.
3. Method
This section describes the ADSD-gcForest model in detail. The method is implemented in the following three steps.
Step 1: The vibration signals are sampled with a sliding window, noise of different intensities is added, the signals are converted into SDP images, and the samples are divided into a training set and a test set.
Step 2: The training set is fed into the adaptive ADSD-gcForest model for training, during which the Meta-ACON activation function adaptively adjusts the network structure according to the type of sample data to obtain the current optimal structure; the trained model is then saved.
Step 3: The trained model directly extracts fault features from new images and outputs the final diagnosis. The overall flowchart of the fault diagnosis is shown in Figure 6.
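The data preparation in Step 1 can be illustrated with a minimal pure-Python sketch covering sliding-window sampling, the symmetrized dot pattern (SDP) coordinates, and a per-class split at the 7:3 ratio used in Section 4.1. Section 2, which defines the SDP transform, is not part of this excerpt, so the sketch assumes the commonly used SDP formulation (radius from the normalized amplitude of x_i, two mirrored angles around each plane θ_m = 360m/n driven by x_{i+lag}); the function names, the lag, the number of mirror planes and the angle gain are hypothetical choices. Noise injection and rendering the points into an image (for example as a polar scatter plot) are left out.

```python
import random
from typing import Dict, List, Tuple

Sample = List[float]


def sliding_windows(signal: List[float], length: int, stride: int) -> List[Sample]:
    """Cut a 1-D vibration signal into fixed-length, possibly overlapping samples."""
    return [signal[start:start + length]
            for start in range(0, len(signal) - length + 1, stride)]


def sdp_points(segment: Sample, lag: int = 1, n_mirrors: int = 6,
               gain_deg: float = 30.0) -> List[Tuple[float, float]]:
    """Return SDP points as (radius, angle in degrees) pairs.

    Assumed formulation: the radius is the min-max normalized amplitude x_i, and
    for each mirror plane theta_m = 360 * m / n_mirrors the normalized amplitude
    x_{i+lag} is mapped to the two angles theta_m + gain * x and theta_m - gain * x.
    """
    lo, hi = min(segment), max(segment)
    span = (hi - lo) or 1.0  # guard against a constant segment
    points = []
    for i in range(len(segment) - lag):
        r = (segment[i] - lo) / span
        s = (segment[i + lag] - lo) / span
        for m in range(1, n_mirrors + 1):
            theta = 360.0 * m / n_mirrors
            points.append((r, theta + gain_deg * s))  # counter-clockwise arm
            points.append((r, theta - gain_deg * s))  # clockwise arm
    return points


def split_per_class(samples: Dict[str, List[Sample]], train_ratio: float = 0.7,
                    seed: int = 0) -> Tuple[list, list]:
    """Shuffle each class independently and split it into (sample, label) pairs."""
    rng = random.Random(seed)
    train, test = [], []
    for label, items in samples.items():
        items = list(items)
        rng.shuffle(items)
        cut = round(train_ratio * len(items))
        train += [(x, label) for x in items[:cut]]
        test += [(x, label) for x in items[cut:]]
    return train, test
```

With 1000 windows per class and `train_ratio=0.7`, each class contributes 700 training and 300 test samples. Splitting per class rather than over the pooled data keeps the class proportions identical in both sets.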
3.1. Meta-ACON
To achieve effective fault diagnosis on different bearing fault data, the existing structure may have to be adjusted repeatedly to reach higher accuracy. To address this problem, this paper proposes a relatively simple way to adjust the network model adaptively: by setting a single conversion factor $\beta$, the Meta-ACON activation function selects whether to activate the neurons in a layer according to the sample data (activation yields a nonlinear output, whereas non-activation yields a linear output). The Meta-ACON activation function is derived from the smooth maximum function:

$$S_\beta(x_1, \ldots, x_n) = \frac{\sum_{i=1}^{n} x_i e^{\beta x_i}}{\sum_{i=1}^{n} e^{\beta x_i}}$$

where $x_1, \ldots, x_n$ represent the input signal sequence and $\beta$ is the conversion factor. When $\beta \to \infty$, $S_\beta \to \max(x_1, \ldots, x_n)$, and when $\beta \to 0$, $S_\beta$ tends to the arithmetic mean of $x_1, \ldots, x_n$. Many common activation functions have the form $\max(\eta_a(x), \eta_b(x))$.
Here, $\eta_a(x)$ and $\eta_b(x)$ are two freely configurable functions; for example, the ReLU function corresponds to $\eta_a(x) = x$ and $\eta_b(x) = 0$. To simplify the design, only two variables are considered here, and the sigmoid function is denoted by $\sigma$. The smooth approximation of $\max(\eta_a(x), \eta_b(x))$ can then be written as:

$$S_\beta(\eta_a(x), \eta_b(x)) = (\eta_a(x) - \eta_b(x))\,\sigma[\beta(\eta_a(x) - \eta_b(x))] + \eta_b(x)$$
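Furthermore, let $\eta_a(x) = p_1 x$ and $\eta_b(x) = p_2 x$ with $p_1 \neq p_2$. The Meta-ACON activation function is then:

$$f(x) = (p_1 - p_2)\,x\,\sigma[\beta(p_1 - p_2)\,x] + p_2 x$$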
where $p_1$ and $p_2$ are two randomly initialized trainable parameters, so the activation of the neurons in this layer can be controlled through the conversion factor $\beta$:

$$\beta_c = \sigma\left(W_2 W_1 \sum_{h=1}^{H} \sum_{w=1}^{W} x_{c,h,w}\right)$$

where the input sample data are represented as $x \in \mathbb{R}^{C \times H \times W}$, and $C$, $H$ and $W$ denote the number of channels, the height and the width of the input, respectively. $W_1$ is a 1 × 1 convolution with $C$ input channels and $C/r$ output channels ($r$ is a constant, generally set to 16). Similarly, $W_2$ is a 1 × 1 convolution whose numbers of input and output channels are the reverse of those of $W_1$. Since the value of $\beta$ is determined directly by the characteristics of the sample data, different sample data produce different values of $\beta$. Therefore, as the Meta-ACON parameters are updated over repeated training, the structure of the model is continuously optimized. The specific calculation process is shown in Figure 7.
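The Meta-ACON computation described in this section can be checked numerically with a small pure-Python sketch: the smooth maximum, its two-input ACON-C form, and the per-channel conversion factor β_c. It assumes sum pooling over H and W, scalar p_1 and p_2 shared by all channels, bias-free W_1 and W_2, and nested lists as tensors; the function names are hypothetical, and a practical implementation would use a deep learning framework and learn p_1, p_2, W_1 and W_2 by backpropagation.

```python
import math
from typing import List

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function sigma(z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def smooth_max(values: List[float], beta: float) -> float:
    """S_beta(x_1..x_n) = sum(x_i e^{beta x_i}) / sum(e^{beta x_i}), for beta >= 0."""
    top = max(values)  # shifting by the max leaves the ratio unchanged and avoids overflow
    weights = [math.exp(beta * (v - top)) for v in values]
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def acon_c(x: float, p1: float, p2: float, beta: float) -> float:
    """f(x) = (p1 - p2) x sigma(beta (p1 - p2) x) + p2 x."""
    d = (p1 - p2) * x
    return d * sigmoid(beta * d) + p2 * x


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """A 1 x 1 convolution on a C x 1 x 1 tensor reduces to a matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def meta_acon(x: List[Matrix], p1: float, p2: float, w1: Matrix, w2: Matrix) -> List[Matrix]:
    """Apply Meta-ACON to a C x H x W feature map given as nested lists.

    w1 has shape (C/r) x C and w2 has shape C x (C/r); one beta is computed per channel.
    """
    pooled = [sum(sum(row) for row in channel) for channel in x]  # sum over h and w
    betas = [sigmoid(z) for z in matvec(w2, matvec(w1, pooled))]
    return [[[acon_c(v, p1, p2, betas[c]) for v in row] for row in channel]
            for c, channel in enumerate(x)]
```

For two inputs, `smooth_max([a, b], beta)` equals `(a - b) * sigmoid(beta * (a - b)) + b` exactly, which is why the two-variable form above needs no further approximation. With β_c close to 0, `acon_c` returns approximately (p_1 + p_2)x/2, a linear response; larger β_c makes the response increasingly nonlinear.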
3.2. ADSD-gcForest
Compared with time-domain signals, SDP images represent different fault types more intuitively and simply through different geometric features. Therefore, the key to accurate fault diagnosis is a model that can effectively extract geometric features from images. The Visual Geometry Group 16 (VGG16) network is one of the commonly used models in image processing: it extracts features effectively by stacking multiple convolutional layers and reduces the number of network parameters with pooling layers. The model in this paper takes VGG16 as its basic framework. However, the structure of VGG16 is relatively simple. First, although the network is deep, its convolutional layers use ordinary convolution, which cannot extract feature information at multiple scales; this limits the feature extraction ability of the network under strong noise interference. Second, the activation functions in the convolutional layers merely apply a fixed nonlinearity to their input, so the network transfers poorly to new data, and its performance becomes unstable on different sample data. Moreover, most diagnostic models use Softmax as the final classifier. However, Softmax is a simple classifier that cannot learn the feature information missed by the preceding layers, which lowers the final accuracy. To address these problems, the ADSD-gcForest model proposed in this paper makes the following improvements.
Because there are many types of input samples, the receptive field is expanded with dilated convolutions, which are combined with residual connections into three branches to widen the range of feature extraction and enrich the feature information. The three branches use 3 × 3 dilated convolutions with dilation rates of 1, 2 and 3, respectively, and are connected through residual connections, thus realizing multiscale feature extraction. Each dilated convolution is followed by a CBAM. Its channel attention module measures the importance of the different channels of the feature map at each scale, thereby identifying the key channels, and its spatial attention module then locates the key features within the feature map, yielding key features at different scales. Next, the three resulting feature maps are integrated through the residual connection and passed to the next layer. Because the additional dilated convolutions and attention modules may lengthen training, depthwise separable convolution is introduced into the dilated convolution layers: the convolutions of the different input channels are performed separately and in parallel, after which a pointwise convolution determines the weight of each feature map and integrates the feature maps according to these weights. In this way, computational efficiency is improved.
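The receptive-field and cost arguments in this paragraph can be made concrete with a short pure-Python sketch. It computes the span covered by a 3 × 3 kernel at dilation rates 1, 2 and 3, the zero padding that keeps the feature-map size unchanged, a single-channel dilated convolution, and the weight counts of a standard versus a depthwise separable convolution. The helper names and the 64-channel figures are hypothetical (the actual channel widths are listed in Table 1), and CBAM and the residual connections are not modeled.

```python
from typing import List

Image = List[List[float]]


def effective_kernel(k: int, d: int) -> int:
    """Side length covered by a k x k kernel with dilation rate d: k + (k - 1)(d - 1)."""
    return k + (k - 1) * (d - 1)


def same_padding(k: int, d: int) -> int:
    """Zero padding that keeps H and W unchanged at stride 1 (odd k)."""
    return (effective_kernel(k, d) - 1) // 2


def dilated_conv2d(img: Image, kernel: Image, d: int) -> Image:
    """Single-channel 'valid' dilated convolution (cross-correlation, as in CNN layers)."""
    k = len(kernel)
    span = effective_kernel(k, d)
    out_h, out_w = len(img) - span + 1, len(img[0]) - span + 1
    return [[sum(kernel[a][b] * img[i + a * d][j + b * d]
                 for a in range(k) for b in range(k))
             for j in range(out_w)]
            for i in range(out_h)]


def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of an ordinary k x k convolution (biases omitted)."""
    return c_in * c_out * k * k


def separable_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a depthwise separable convolution (biases omitted)."""
    depthwise = c_in * k * k  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution that weights and mixes the channels
    return depthwise + pointwise
```

With 64 input and 64 output channels, a standard 3 × 3 convolution needs 36,864 weights, whereas its depthwise separable counterpart needs 576 + 4,096 = 4,672, roughly 7.9 times fewer. The dilation rate enlarges the span of the kernel (3, 5 and 7 pixels for rates 1, 2 and 3) without adding any weights.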
To allow the network model to adapt to the sample data of different fault types, the original ReLU activation function in the convolutional layers is replaced with the Meta-ACON activation function. Meta-ACON adapts to the characteristics of the input through the conversion factor β: after repeated training, the value of β determines whether the neurons in each layer are activated, so a flexible and efficient network structure is obtained for different input samples. In addition, Softmax is replaced by gcForest, which learns the hidden fault features and outputs the final diagnosis results. The structure of the model is shown in Figure 8, where SD convolution denotes dilated convolution with a depthwise separable mechanism. The detailed parameters of the optimized network are listed in Table 1.
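The data flow of the gcForest classifier, multi-grained scanning (MGS) followed by the cascade forest, can be outlined with a structural sketch. The forests are abstracted as callables that return a class-probability vector; in gcForest they would be trained random forests (35 trees per forest in MGS and 150 trees per forest in the cascade, per Section 4.1). Training, the k-fold generation of class vectors and the automatic choice of the number of cascade levels are deliberately omitted, and the function names and the fixed, non-empty list of levels are assumptions.

```python
from typing import Callable, List, Sequence

ClassVector = List[float]
# Stand-in type: any trained forest that maps one instance to class probabilities.
Forest = Callable[[Sequence[float]], ClassVector]


def multi_grained_scan(features: Sequence[float], forests: List[Forest],
                       window: int, stride: int = 1) -> List[float]:
    """Slide a window over the feature vector and concatenate every forest's class vector."""
    out: List[float] = []
    for start in range(0, len(features) - window + 1, stride):
        instance = features[start:start + window]
        for forest in forests:
            out.extend(forest(instance))
    return out


def cascade_predict(features: Sequence[float], levels: List[List[Forest]]) -> int:
    """Run a fixed, non-empty list of cascade levels and return the predicted class index.

    Each level receives the scanned features augmented with the class vectors
    produced by the previous level; the last level's class vectors are averaged.
    """
    x = list(features)
    vectors: List[ClassVector] = []
    for level in levels:
        vectors = [forest(x) for forest in level]
        x = list(features) + [p for vec in vectors for p in vec]
    mean = [sum(col) / len(vectors) for col in zip(*vectors)]
    return max(range(len(mean)), key=mean.__getitem__)
```

For a feature vector of length L, a window of 240 at stride 1 yields L − 239 instances, so MGS outputs (L − 239) × (number of forests) × (number of classes) values, which become the input of the first cascade level.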
4. Experiments
4.1. Introduction of Datasets
The datasets used in the experiments were the Case Western Reserve University (CWRU) bearing dataset and the University of Ottawa (Canada) bearing dataset. The CWRU dataset contains two bearings: the drive-end bearing SKF6205 and the fan-end bearing SKF6203. The drive-end data were sampled at 12 kHz and 48 kHz, whereas the fan-end data were sampled only at 12 kHz. Each bearing dataset contains ten states: the normal state and three fault types (inner ring, outer ring and rolling element faults), each with three fault levels represented by the fault diameter. Four different load conditions were applied when the bearing data were measured. In total, 8 normal samples, 53 outer ring damage samples, 23 inner ring damage samples and 11 rolling element damage samples were obtained. The University of Ottawa dataset contains 36 datasets of bearing vibration signals under different health conditions, measured under time-varying speed. The bearing conditions are normal, inner ring fault and outer ring fault, and the speed conditions are increasing speed, decreasing speed, increasing then decreasing speed, and decreasing then increasing speed. Each dataset contains two channels: channel 1 holds the vibration data measured by the accelerometer, and channel 2 holds the speed data measured by the encoder. The sampling frequency is 200 kHz and the sampling duration is 10 s.
The Case Western Reserve University drive-end and fan-end bearing data used in this paper were sampled at 12 kHz; for the University of Ottawa dataset, part of the channel 1 data was used, with the samples randomly selected from the dataset. In the sample labels, B, IR and OR indicate that the fault is located in the rolling element, inner ring and outer ring of the bearing, respectively; 007, 014 and 021 indicate fault diameters of 0.1778 mm, 0.3556 mm and 0.5334 mm, respectively; and the final number indicates the load. For example, "−1" means a load of 1 horsepower. A total of 1000 samples were taken for each fault category, and the ratio of training set to test set was 7:3. The details are shown in Table 2.
Six noise settings of different intensities, Noise 1 to Noise 6, were added to the sample dataset. Each setting combines three types of noise: Gaussian noise with signal-to-noise ratios (SNRs) of −4, −2, 0, 2, 4 and 6 dB, salt-and-pepper noise with ratios of 0.3, 0.25, 0.2, 0.15, 0.1 and 0.05 (for Noise 1 to Noise 6, respectively), and Cauchy noise with location parameter 0 and scale parameter 1.

gcForest was used as the classifier in all comparison methods. In SigDSD-gcForest, the sigmoid function is the activation function of the convolutional layers; similarly, ReluDSD-gcForest and PReluDSD-gcForest use ReLU and PReLU, respectively. The parameters of the ADSD-gcForest model were set as follows: a learning rate of 0.00005, a batch size of 580, 350 iterations, the Adam optimizer, a sliding window size of 240 in multi-grained scanning (MGS), 35 trees in each random forest of MGS, and 150 trees in each random forest of the cascade forest.

The diagnostic performance of the trained models is compared using the accuracy, the F1 value and the area under the curve (AUC). The accuracy and F1 value are calculated as

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad F1 = \frac{2TP}{2TP + FP + FN}$$

where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively. The AUC is the area under the receiver operating characteristic (ROC) curve; the higher the AUC, the better the classification performance of the model.
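The noise settings can be reproduced approximately with the sketch below. The paper does not state how the three noise types are combined or how salt-and-pepper noise is defined for a 1-D vibration signal, so the sketch assumes additive Gaussian and Cauchy noise followed by replacing a fraction of the samples with the signal's maximum or minimum; the SNR values are taken to be in dB, and all function names, the composition order and the seeding are hypothetical.

```python
import math
import random
from typing import List


def add_gaussian(x: List[float], snr_db: float, rng: random.Random) -> List[float]:
    """Add white Gaussian noise so that P_signal / P_noise = 10^(snr_db / 10)."""
    p_signal = sum(v * v for v in x) / len(x)
    sigma = math.sqrt(p_signal / 10 ** (snr_db / 10))
    return [v + rng.gauss(0.0, sigma) for v in x]


def add_cauchy(x: List[float], loc: float, scale: float, rng: random.Random) -> List[float]:
    """Add Cauchy noise by inverse-CDF sampling: loc + scale * tan(pi * (u - 0.5))."""
    return [v + loc + scale * math.tan(math.pi * (rng.random() - 0.5)) for v in x]


def add_salt_pepper(x: List[float], ratio: float, rng: random.Random) -> List[float]:
    """Replace a fraction `ratio` of samples by the signal's max (salt) or min (pepper)."""
    hi, lo = max(x), min(x)
    out = []
    for v in x:
        if rng.random() < ratio:
            v = hi if rng.random() < 0.5 else lo
        out.append(v)
    return out


# Noise 1 ... Noise 6 from Section 4.1: (Gaussian SNR in dB, salt-and-pepper ratio).
NOISE_LEVELS = {1: (-4, 0.30), 2: (-2, 0.25), 3: (0, 0.20),
                4: (2, 0.15), 5: (4, 0.10), 6: (6, 0.05)}


def apply_noise(x: List[float], level: int, seed: int = 0) -> List[float]:
    """Hypothetical composition order: Gaussian, then Cauchy(0, 1), then salt-and-pepper."""
    rng = random.Random(seed)
    snr_db, sp_ratio = NOISE_LEVELS[level]
    noisy = add_gaussian(x, snr_db, rng)
    noisy = add_cauchy(noisy, 0.0, 1.0, rng)
    return add_salt_pepper(noisy, sp_ratio, rng)
```

Because the Cauchy distribution has no finite variance, its scale parameter of 1 cannot be translated into an SNR; occasional very large spikes are expected, which is part of what makes this noise setting demanding.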
4.2. Case Study 1: Performance of Drive End Bearing Fault Diagnosis
Figures 9 and 10 show that under Noise 1, three sample categories have low recognition rates after training with the ADSD-gcForest model, and a small amount of overlap appears in the t-SNE plot. Under the other noise settings, only one or two fault categories have low recognition rates. Table 3 shows that, compared with the other methods, the ADSD-gcForest model achieves the highest accuracy and F1 values under all noise settings and working conditions. The VGG16-gcForest model obtains the lowest accuracy and F1 values, about 26–35% lower than those of the ADSD-gcForest model, while the accuracy and F1 values of the Res50-gcForest model are about 18% higher than those of the VGG16-gcForest model. Because the ReLU function alleviates the convergence problems of the sigmoid function, the accuracy and F1 values of the ReluDSD-gcForest model are about 0.6–0.7% higher than those of the SigDSD-gcForest model. PReLU adjusts its negative-slope parameter according to the input data during training, which gives the network a certain adaptive optimization capability; accordingly, the accuracy and F1 values of the PReluDSD-gcForest model are about 1.3% higher than those of the ReluDSD-gcForest model, but still lower than those of the ADSD-gcForest model.
Figure 11 compares the AUC values of the different methods. The AUC values of ADSD-gcForest are the highest under all noise settings, all above 92%, indicating good fault diagnosis performance. These results show that the ADSD-gcForest model can accurately diagnose drive-end bearing faults under different working conditions and strong noise interference.
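The accuracy, F1 value and AUC reported for these case studies are multiclass quantities, while the definitions in Section 4.1 use binary counts. The paper does not say how the per-class values are aggregated, so the sketch below assumes macro averaging over classes, with a one-vs-rest AUC computed from the rank (Mann-Whitney) statistic; the function names are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of correctly classified samples ((TP + TN) / N in the binary case)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> float:
    """Average over classes of F1 = 2TP / (2TP + FP + FN), each class taken as positive."""
    scores = []
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


def binary_auc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """AUC = P(score of a positive > score of a negative), ties counted as 1/2."""
    pos = [s for lab, s in zip(labels, scores) if lab]
    neg = [s for lab, s in zip(labels, scores) if not lab]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc_ovr(y_true: Sequence[int], probs: Sequence[Sequence[float]],
                  n_classes: int) -> float:
    """One-vs-rest AUC for each class, averaged over the classes."""
    return sum(binary_auc([t == c for t in y_true], [row[c] for row in probs])
               for c in range(n_classes)) / n_classes
```

The rank formulation gives the same value as integrating the ROC curve, which is why the AUC needs the predicted class probabilities (for example the averaged class vectors of the last cascade level) rather than only the predicted labels.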
4.3. Case Study 2: Performance of Fan End Bearing Fault Diagnosis
Figures 12 and 13 show that only under Noise 1 are a few fault categories not effectively identified; under the other noise settings, all fault categories are accurately identified. Table 4 shows that the accuracy and F1 values of the VGG16-gcForest and Res50-gcForest models drop by approximately 1.5–1.6% compared with the drive-end results, with the overall accuracy and F1 value of the VGG16-gcForest model ranging from 61% to 75%. The accuracy and F1 values of the SigDSD-gcForest, ReluDSD-gcForest and PReluDSD-gcForest models also decrease. Among them, the decrease of SigDSD-gcForest is the most obvious, ranging from 0.3% to 0.4%, while the accuracy and F1 values of the PReluDSD-gcForest model drop by at least about 1.3–1.5%. The accuracy and F1 values of the ADSD-gcForest model are the highest and similar to those in Case Study 1.
Figure 14 depicts the AUC values obtained by the different diagnostic methods under different noise settings. The AUC values of the ADSD-gcForest model are again the highest and close to those in Case Study 1. These results show that the proposed ADSD-gcForest model can largely achieve effective fault diagnosis for different bearings under multiple working conditions.
4.4. Case Study 3: Performance of the Ottawa Bearing Dataset
To further test the generalization and robustness of the ADSD-gcForest model, Case Study 3 uses the University of Ottawa dataset, divided into six datasets whose sample types are shown in Table 5. The bearings in these datasets have three health conditions, i.e., normal (H), inner race fault (I) and outer race fault (O), under four speed-varying conditions, i.e., speed up (A), slow down (B), speed up then slow down (C) and slow down then speed up (D). Noise was added in the same way as in Case Study 1. The training parameters of the ADSD-gcForest model were set as follows: a learning rate of 0.00005, a batch size of 550, 350 iterations, the Adam optimizer, a sliding window size of 240 in MGS, 35 trees in each random forest of MGS, and 150 trees in each random forest of the cascade forest.
Figure 15 shows that, compared with Case Studies 1 and 2, some fault categories are less distinguishable under Noise 1 and Noise 2, whereas under the other noise settings the fault categories are accurately classified. Figures 16 and 17 show that the training accuracy of the ADSD-gcForest model is the highest and remains stable with small fluctuations, which is consistent with the values in Table 6. Table 6 also shows that the accuracy and F1 values of the VGG16-gcForest and Res50-gcForest models are significantly lower than in Case Studies 1 and 2, and in Figures 16 and 17 the accuracy of these two models also fluctuates significantly. The accuracy rates of the other three models have also decreased, but their fluctuations are relatively small.
Figure 18 shows the AUC values of the different diagnostic models: the AUC values of the ADSD-gcForest model are similar to those in the first two case studies, whereas those of the other diagnostic models have decreased. The comparative experiments on three different bearings show that, under different noise conditions and working conditions, the ADSD-gcForest model achieves effective fault feature extraction on the one hand, and on the other hand the Meta-ACON activation function simply and efficiently completes the adaptive optimization of the model structure, enabling more accurate fault diagnosis.
5. Conclusions
This paper proposes an adaptive ADSD-gcForest model that uses the VGG network as its basic framework. Multiscale features of the input samples are extracted through depthwise separable dilated convolution and combined with CBAM to focus on the core features at each scale; the Meta-ACON activation function is integrated into all convolutional layers so that the model can be optimized adaptively according to the input data; and gcForest serves as the final classifier that outputs the diagnosis result. The experiments use the Case Western Reserve University and University of Ottawa datasets, covering data from three bearings, and show that the ADSD-gcForest model can effectively diagnose faults of different bearings under strong noise and multiple load conditions, which indicates good robustness. It can also be seen that the proposed method improves the transfer ability of the model, simplifies the design of the diagnostic model and avoids repeatedly modifying the model structure.
In modern industrial production, multiple bearings often work together, so effective fault diagnosis across multiple bearings is a hot research topic. With the help of the Meta-ACON activation function, the proposed ADSD-gcForest model can optimize its structure simply according to different bearing data, which gives it some industrial application value. However, Meta-ACON also increases the number of model parameters, which leads to longer training times. Therefore, reducing the training parameters of the Meta-ACON activation function while maintaining high accuracy will be the focus of future research.