4.3. A-HDCNN Model Diagnosis Results
To verify the superiority of the proposed A-HDCNN model, and to compare the performance of the HDCNN and A-HDCNN models after introducing the adaptive dynamic learning rate adjustment law, we conduct experiments on the CWRU bearing data set. The energy spectrum matrix obtained by preprocessing the original vibration signal serves as the model input. The weight parameters of the model are adjusted through training, and the trained model is then evaluated on the test samples. During sample construction, data were collected under several motor load environments. Under each load, 100 sets of samples were collected for each of the 10 health states, of which 50 were used for training and 50 for testing, giving 1000 sets of samples per load. All health-state samples feed the first layer, the failure-mode determination layer, so 1000 samples in total are used there: 500 for training and 500 for testing. In the second layer, the fault severity evaluation layer, each fault mode has three severity levels; each sub-model therefore uses 150 samples for training and 150 for the severity assessment. The initialization parameter configuration for training each layer of the model is shown in
Table 4.
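The hierarchical split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the 32x32 sample shape and the CWRU-style labelling (state 0 healthy, states 1-9 covering 3 fault modes x 3 severity levels) are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for the preprocessed energy spectrum matrices:
# 10 health states x 100 samples each under one load (32x32 shape assumed).
X = rng.normal(size=(10, 100, 32, 32))

# Assumed labelling: state 0 is healthy; states 1-9 are 3 fault modes
# with 3 severity levels each.
mode = np.array([0, 1, 1, 1, 2, 2, 2, 3, 3, 3])      # layer-1 label per state
severity = np.array([0, 0, 1, 2, 0, 1, 2, 0, 1, 2])  # layer-2 label per state

# Layer 1: 50 training / 50 test samples from every state (500 + 500 total).
perm = rng.permutation(100)
tr_idx, te_idx = perm[:50], perm[50:]
X1_train = X[:, tr_idx].reshape(-1, 32, 32)
y1_train = np.repeat(mode, 50)
X1_test = X[:, te_idx].reshape(-1, 32, 32)

# Layer 2: one sub-model per fault mode, 3 severities x 50 = 150 train samples.
layer2_train = {}
for m in (1, 2, 3):
    states = np.where(mode == m)[0]
    layer2_train[m] = (X[states][:, tr_idx].reshape(-1, 32, 32),
                       np.repeat(severity[states], 50))
```

The same index split is reused for every state, so each sub-model sees a balanced 50/50 partition per class.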
The learning rate is the coefficient applied to the gradient in stochastic gradient descent (SGD), and it directly governs how well the model parameters are optimized. Too high a learning rate hinders optimization and causes the loss to oscillate, while too low a learning rate slows convergence and can trap the model in a local optimum. In traditional training, the learning rate is usually fixed, and a value that is too large or too small adversely affects training; many researchers simply choose it from experience. For the A-HDCNN model proposed in this paper, the adaptive dynamic learning rate adjustment strategy matches the learning rate to the number of iterations during training, so that the parameters are always updated along the gradient direction with an appropriate step size. As a result, the initial learning rate has almost no effect on the final convergence of the A-HDCNN model. For the comparative experiments, the HDCNN model retains a fixed learning rate. We select fixed learning rates at equal intervals and observe the convergence and accuracy of each sub-model under the different settings. The experiments show that the model achieves its best accuracy and convergence when the learning rates of the first-layer failure-mode determination model and the three second-layer fault severity evaluation sub-models are set to 0.0035, 0.0040, 0.0040, and 0.0075, respectively. To exclude other influences, the initial learning rates of the A-HDCNN model are set to the same values as in HDCNN.
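The behaviour described above, a large rate early to speed convergence and a small rate later as the loss flattens, can be illustrated with a simple decaying schedule. This is only a sketch: the paper's exact adjustment law is not reproduced here, and the decay constant and floor are assumed values.

```python
import math

def adaptive_lr(initial_lr, epoch, decay=0.05, floor=1e-5):
    """Illustrative staged learning-rate schedule (not the paper's exact
    law): large in early epochs to accelerate convergence, decaying as
    training proceeds, and never dropping below a small floor."""
    return max(initial_lr * math.exp(-decay * epoch), floor)

# Example: starting from the fixed HDCNN setting of 0.0035 used for the
# failure-mode determination layer.
lrs = [adaptive_lr(0.0035, e) for e in range(150)]
```

Any monotone schedule with these two phases (fast early updates, fine late updates) exhibits the qualitative curve shown in Figure 11a.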
Since each model has a large number of training samples, the samples are randomly split and recombined: a small batch of samples is packed and fed to the A-HDCNN model, and the parameters are then optimized according to the average error loss of each batch. The batch size, i.e., the number of samples in each batch, has a significant impact on both optimization performance and training speed. For the first-layer model, the training and test sets each contain 500 samples, so the batch size should be a divisor of 500, such as 100, 50, 10, 5, or 1; otherwise some samples are wasted. For the three second-layer sub-models, the training and test sets each contain 150 samples, so the batch size should be a divisor of 150, such as 30, 10, 5, or 1. A small batch combines few samples per update and slows parameter optimization, while a large batch tends to drive the model into a local optimum. We therefore compared batch sizes and observed their effect on diagnostic accuracy and convergence. For both the failure-mode determination layer and the fault severity evaluation layer, a batch size of 10 gives the best convergence and the highest accuracy.
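The batching procedure above, shuffle, pack into equal batches, and optimize on the mean loss of each batch, can be sketched as follows. The update function is a hypothetical stand-in for the model's SGD step.

```python
import numpy as np

def minibatch_epoch(X, y, batch_size, update_fn, rng):
    """One epoch of mini-batch training: shuffle the samples, pack them
    into equal-sized batches, and call update_fn (a stand-in for one SGD
    step on the batch's average loss) for each batch."""
    assert len(X) % batch_size == 0, "batch size should divide the set size"
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        update_fn(X[idx], y[idx])  # gradient step on the batch-average loss

# Demo: 500 first-layer samples with the chosen batch size of 10.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 8)), rng.integers(0, 4, size=500)
batch_sizes = []
minibatch_epoch(X, y, 10, lambda xb, yb: batch_sizes.append(len(xb)), rng)
```

The divisor requirement in the text corresponds to the assertion: a non-divisor batch size would leave a remainder of unused samples each epoch.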
The following figures show the training and test results of the sub-models in the failure-mode determination layer and the fault severity evaluation layer after adding the dynamic learning rate adjustment law, under a 1 hp load environment.
Figure 11a shows the dynamic learning rate adjustment curve of the failure-mode determination layer model. The learning rate is adjusted dynamically in stages, in real time, according to the iteration number during training. Early in the iterations, the loss function changes relatively quickly, so the learning rate is kept large to accelerate convergence; later, the relative change of the loss becomes small, so the learning rate is reduced. Overall, thanks to the strong feature expression ability of the energy spectrum matrix, there is little tendency to oscillate during training, and the learning rate gradually decreases and converges with the number of iterations. The learning rate update rule proposed in this paper reflects exactly this training process. Take the failure mode determination layer as an example, as shown in
Figure 12a. Compared with the fixed-learning-rate model, once the adaptive learning rate update rule is added, the accuracy converges faster with the number of iterations: convergence is reached at around 20 epochs, whereas the fixed learning rate only converges at around 60 epochs. As shown in
Figure 13a, taking the failure-mode determination layer as an example, the same conclusion can be drawn from the variation of the error loss. For the fault severity evaluation layer,
Figure 13b–d show how the error loss of each corresponding sub-model varies with the number of iterations; after adding the adaptive learning rate update rule, the convergence performance also improves. The learning rate update rule thus accelerates feature learning, and both the feature expression ability and the convergence performance of the sub-models within each layer are greatly improved, which verifies the reliability of the proposed method.
Figure 11b–d show how the learning rates of the fault severity evaluation layer sub-models are updated as the number of iterations increases.
Figure 12b–d compare the accuracy curves, as a function of the number of iterations, against the fixed-learning-rate model. For the sub-models of the fault severity evaluation layer, the update rule yields a clear performance improvement: it accelerates model convergence, improves diagnostic accuracy, and reduces the loss error. These results verify the performance advantages of the A-HDCNN model in fault diagnosis.
To obtain more accurate evaluation results and verify the stability of the model, the samples collected for each layer of the model are randomly shuffled and resplit, and the training and test sets are reconstructed. We perform 20 such random-selection runs. As shown in
Table 5, which summarizes the average diagnostic accuracy of the different algorithms under different load environments over the 20 runs, the A-HDCNN model achieves higher recognition accuracy at the first layer: the diagnostic accuracy is 100%, with every sample correctly classified. This means each failure mode flows into the correct severity evaluation branch, and no second-layer sub-model receives samples of the wrong failure mode. This performance lays a solid foundation for the subsequent failure severity evaluation, and also confirms the advantage of the energy spectrum matrix for failure-mode determination: its features are easy to distinguish, and the model learns them well.
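The 20-run evaluation protocol, reshuffle, resplit 50/50, retrain, and average the test accuracy, can be sketched as below. The nearest-centroid classifier is a hypothetical stand-in for one trained sub-model, used only to make the demo self-contained.

```python
import numpy as np

def repeated_split_accuracy(X, y, eval_fn, runs=20, seed=0):
    """Reshuffle and re-split the samples `runs` times (50/50, as in the
    experiment) and return the mean test accuracy. eval_fn stands in for
    training one sub-model and scoring it on the held-out half."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(runs):
        order = rng.permutation(len(X))
        half = len(X) // 2
        tr, te = order[:half], order[half:]
        accs.append(eval_fn(X[tr], y[tr], X[te], y[te]))
    return float(np.mean(accs))

# Demo with a nearest-centroid stand-in on well-separated dummy classes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.1, size=(50, 4)) for c in (0.0, 5.0)])
y = np.repeat([0, 1], 50)

def nearest_centroid(Xtr, ytr, Xte, yte):
    cents = np.stack([Xtr[ytr == c].mean(axis=0) for c in (0, 1)])
    pred = np.argmin(((Xte[:, None] - cents) ** 2).sum(-1), axis=1)
    return (pred == yte).mean()

mean_acc = repeated_split_accuracy(X, y, nearest_centroid)
```

Averaging over independently reshuffled splits reduces the variance of the reported accuracy and exposes instability that a single lucky split would hide.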
At the second layer, we evaluate the failure severity.
Table 5 also compares the failure severity assessment accuracy for each health condition under different load environments. The classification results show that the second-layer evaluation of the A-HDCNN model is highly accurate: the diagnostic accuracy exceeds 99%, close to 100%, and remains stable under the different loads. The overall model likewise achieves high diagnostic accuracy, as shown in
Table 5.
In addition, to further measure the variable-load capability of the A-HDCNN model and verify its adaptability under different load environments, we combined and shuffled the B, C, and D data sets (the 1 hp, 2 hp, and 3 hp load environments, respectively) and reconstructed a training set and a test set. The model was trained and tested, with the results shown in
Table 5. The A-HDCNN model achieves high diagnostic accuracy, close to 100%, in both failure-mode recognition and fault severity evaluation, which further demonstrates its robustness under variable load environments.
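The mixed-load set construction can be sketched as follows; the array shapes and random labels are placeholders for the real B, C, and D samples, which come from the preprocessing stage.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the B, C, and D sets (1 hp, 2 hp, 3 hp loads); the
# 1000 x 32 x 32 shapes and the labels are assumed for illustration.
B, yB = rng.normal(size=(1000, 32, 32)), rng.integers(0, 10, 1000)
C, yC = rng.normal(size=(1000, 32, 32)), rng.integers(0, 10, 1000)
D, yD = rng.normal(size=(1000, 32, 32)), rng.integers(0, 10, 1000)

# Combine across loads, shuffle, then rebuild a 50/50 train/test split
# so each half mixes all three load environments.
X = np.concatenate([B, C, D])
y = np.concatenate([yB, yC, yD])
order = rng.permutation(len(X))
half = len(X) // 2
X_train, y_train = X[order[:half]], y[order[:half]]
X_test, y_test = X[order[half:]], y[order[half:]]
```

Shuffling before splitting ensures neither half is dominated by a single load, which is what makes the test a variable-load check rather than three separate single-load ones.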
The error loss is an important indicator of model stability. To verify the effect of the adaptive learning rate adjustment strategy on stability, we summarize the average error loss of the different algorithms under different load environments over the 20 runs (150 iterations each), as shown in
Table 6. For both the failure-mode determination layer and the failure severity evaluation layer, the A-HDCNN model is more stable under every load environment, and its error loss converges to a lower minimum than that of the HDCNN model. Moreover, on the mixed B, C, and D data set, the error of the A-HDCNN model still converges well and remains small, further verifying its stability and robustness in a variable load environment.
To further verify the reliability of the A-HDCNN model, and the performance advantage of combining the energy spectrum matrix with it, we compare it against two typical algorithms in common use, DNN and SVM. To reflect the effectiveness of the proposed method, we replace the A-HDCNN model with a DNN or an SVM within the same hierarchical structure; both algorithms take the energy spectrum vector as input and perform the same layer-wise recognition. The DNN uses the ReLU activation function, and the SVM uses a radial basis function kernel for classification.
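The SVM baseline with an RBF kernel can be set up as below. This is a minimal sketch using scikit-learn; the flattened 64-dimensional input vectors and the two well-separated dummy classes are assumptions standing in for the real energy spectrum vectors and fault modes.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in data: flattened energy-spectrum vectors for two fault modes
# (real inputs come from the preprocessing stage; shapes are assumed).
X0 = rng.normal(loc=0.0, size=(50, 64))
X1 = rng.normal(loc=3.0, size=(50, 64))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

# RBF-kernel SVM, as used for the comparison baseline in the text.
clf = SVC(kernel="rbf").fit(X, y)
acc = clf.score(X, y)
```

In the hierarchical setup, one such classifier would serve as the first-layer mode determiner and one per fault mode as a second-layer severity evaluator, mirroring the A-HDCNN structure.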
Table 7 shows the specific diagnosis results. Compared with the fully connected deep neural network (DNN) and the SVM, the A-HDCNN model achieves higher accuracy in both failure-mode determination and failure severity evaluation, with an overall diagnostic accuracy of 99.74%. For the severity evaluation in particular, samples of different severities under the same fault mode have features that are hard to distinguish and easily confused, which makes learning difficult for the DNN and the SVM. Moreover, the poorer first-layer accuracy of the DNN and SVM models means that their second-layer models receive samples of the wrong failure mode, further lowering second-layer accuracy. These results show that the A-HDCNN model can adaptively extract robust, detail-preserving features from the energy spectrum matrix, confirming the combined advantage of the energy spectrum matrix and the A-HDCNN model.
The A-HDCNN model provides a systematic and complete method for bearing fault diagnosis and overcomes the limit that a fixed learning rate places on diagnostic performance in traditional training. The adaptive dynamic learning rate adjustment strategy ensures that the model can adaptively extract robust features, adapts well to different load environments, and delivers better diagnostic performance. To verify the superiority of the adaptive method proposed in this paper, we use the overall diagnostic performance of the model as the indicator and compare the A-HDCNN model with other strong adaptive methods recently proposed in the fault-diagnosis field. For example, to improve the efficiency of continual learning for rolling-bearing fault diagnosis, Tian et al. [
32] incorporated a clonal learning strategy into the convolutional network (DCNN-FD-Softmax), which can adaptively extract deep fault features. Xie et al. [
33] proposed an end-to-end fault diagnosis model based on an adaptive deep belief network (Improved DBN+FFT). Qiao et al. [
34] proposed an adaptive weighted multi-scale convolutional neural network (AWMSCNN) for bearing diagnosis under variable operating conditions. As a further baseline, we also include the energy spectrum matrix combined with a single-level deep convolutional neural network. The comparison results are shown in
Table 8. The experimental results show that the proposed method outperforms the other adaptive methods, with a test accuracy of 99.74%. This is mainly due to the better feature expression of the energy spectrum matrix and the adaptive feature learning ability of the A-HDCNN model. The comparison confirms that the A-HDCNN model proposed in this paper achieves a significant improvement over the other adaptive methods.