Power Equipment Fault Diagnosis Method Based on Energy Spectrogram and Deep Learning

With the development of intelligent industrial manufacturing, rotating machinery plays an increasingly important role in industrial production and daily life. Rolling bearings work in complex and changeable environments and the available computing capacity is limited, so fault feature information cannot be extracted effectively, and current deep learning models struggle to be both lightweight and efficient. Therefore, this paper proposes a fault detection method for power equipment based on an energy spectrum diagram and deep learning. Firstly, a novel two-dimensional time-frequency feature representation, the energy spectrum feature map based on the wavelet packet transform, is proposed, and an energy spectrum feature map dataset is built for subsequent diagnosis. This method realizes multi-resolution analysis, fully extracts the feature information contained in the fault signal, and accelerates the convergence of the subsequent diagnosis model. Secondly, a lightweight residual dense convolutional neural network model (LR-DenseNet) is proposed. This model combines the advantages of residual learning and dense connections, so it can extract deep features more easily while still making effective use of shallow features. Then, based on LR-DenseNet, an LR-DenseSENet model is proposed. It introduces a transfer learning strategy and adds a channel-domain attention mechanism to the channel feature fusion layer; the detection accuracy reaches 99.4%, and the amount of parameter calculation is reduced to one-fifth of that of VGG. Finally, experimental analysis verifies that the fault detection model designed in this paper, which combines the energy spectrum feature map with LR-DenseSENet, achieves a satisfactory detection effect.


Introduction
With the development of intelligent industrial manufacturing, rotating machinery plays an increasingly important role in industrial production and daily life. As the most common components of rotating machinery, rolling bearings work in complex, changeable, and harsh environments for long periods, so some damage is inevitable. Because failures that are not treated in time can cause huge losses to industrial production, it is necessary to carry out scientific and efficient fault diagnosis research on rolling bearings [1].
In recent years, many fault detection methods for rolling bearings based on vibration signal analysis have been proposed. Signal feature extraction and intelligent diagnosis are the two core steps of these methods. In this paper, a lightweight network that combines residual learning and dense convolution (LR-DenseNet) is proposed. It strengthens the information flow between the layers of the deep network model, makes the transfer of features and gradients more effective, and alleviates the problem of vanishing gradients. At the same time, while maintaining superior diagnostic accuracy, it greatly reduces the number of parameters and the computational complexity of the model, and it still has a good diagnostic effect under a variable load environment.
Finally, considering that the training time of the LR-DenseNet model is long and its diagnostic performance needs further optimization, a lightweight dense residual network based on channel attention (LR-DenseSENet, Lightweight Residual Densely Connected Squeeze-and-Excitation Convolutional Neural Network) is proposed. Firstly, a transfer learning strategy is introduced to shorten the training time through cross-domain transfer learning. At the same time, an improved feature fusion layer based on the channel attention module is proposed to improve the classification accuracy, reliability, and robustness of the model.

Related Work
With the development of artificial intelligence technology, a great deal of research has been done on the fault diagnosis of rotating machinery. In summary, research on fault diagnosis methods has gone through several stages. The first stage relied on human perception and expert experience to judge the fault condition subjectively. The diagnosis process is relatively simple, but subjective factors undermine the accuracy of the criteria. At the same time, these methods are restricted to people with professional skills; because the knowledge cannot be mastered by many, their generality is low. In the second stage, with the rapid advance of hardware such as sensors and of signal analysis and data processing technology, some scholars used sensors to acquire rolling bearing vibration data, analyzed the data with vibration signal analysis techniques, and diagnosed faults from the analysis results, which established the concept of diagnosis in use today. Yuwono et al. [15] put forward an intelligent fault diagnosis method that first used the wavelet transform and cepstrum filtering to extract fault features and then diagnosed faults with a hidden Markov model improved by a swarm optimization algorithm. On the Case Western Reserve data sets, the accuracy reached 97.32%. Subsequently, with the continuous development of computer software and hardware and the continuous improvement of artificial intelligence technology, scholars began to combine deep learning algorithms with signal analysis technology for fault diagnosis. Tamilselvan et al. [16] fused the data monitored by multiple sensors and then realized multi-sensor health diagnosis based on a deep belief network (DBN).
The second and third stages of rotating machinery fault diagnosis research share the same two core steps: signal feature extraction and intelligent diagnosis algorithms. These two steps play a decisive role in the final effect of fault diagnosis. The research status of each is introduced next.
Fault feature extraction can be divided into three types according to the perspective of signal analysis: fault feature extraction based on time domain signals [17], fault feature extraction based on frequency domain signals [18], and fault feature extraction based on time-frequency domain signals. Many scholars use time-frequency domain analysis methods to extract fault features. Unlike STFT [19] and EMD [20], wavelet packet decomposition retains the effective local frequency analysis of wavelet decomposition and further decomposes the different frequency ranges to achieve multi-resolution analysis of signals. Li [21] realized the diagnosis and classification of bearing faults based on wavelet packet decomposition and a multi-fault classifier composed of multiple support vector machines. Wu [22] combined wavelet packet decomposition with high-order cumulants to extract fault features effectively and used the principal component analysis algorithm for dimensionality reduction, thus achieving effective identification of rolling bearing fault types. However, the fault signals of rolling bearings are non-stationary and unstable, so traditional time-frequency domain methods cannot fully and effectively characterize the information in vibration signals.
After the characteristic information of the vibration signal has been extracted, it must be classified for fault detection, which is the key to the whole fault detection research [23]. Fault detection models based on traditional signal analysis and shallow machine learning mainly use artificial neural networks and support vector machines [24,25]. Tabrizi [26] obtained eigenvalue vectors of fault signals by means of ensemble empirical mode decomposition and realized further diagnosis and classification of faults by means of support vector machines. Hu [27] obtained a feature vector matrix by singular spectrum decomposition and combined it with a bilevel support vector machine for fault classification. Tan [28] used wavelet packet decomposition to obtain characteristic information of vibration signals and then classified the feature information with the k-nearest neighbor algorithm.
Convolutional neural networks (CNNs), including VGGNet, AlexNet, residual convolutional neural networks, and dense convolutional neural networks, are the most widely used algorithms in deep learning and can extract the features of such irregular time series well. Li et al. [29] designed a multi-scale multi-sensor feature fusion convolutional neural network (MSMFCNN), which fused the rich information provided by multiple sensors and conducted fault diagnosis based on CNN, achieving good diagnosis results. Xiong et al. [30] proposed an enhanced deep residual network with multilevel correlation information for fault diagnosis of rotating machinery, which is used to process the feature information obtained by wavelet packet transformation. Shi et al. [31] proposed a fault diagnosis method based on IMFs and WDenseNets, in which the components of vibration signals obtained through empirical mode decomposition were weighted and input into WDenseNets for fault identification and classification [32].
In summary, fault diagnosis that combines traditional signal analysis with shallow learning networks relies on the professional knowledge of researchers and has strong uncertainty in feature extraction, which has a great impact on diagnosis performance. Fault diagnosis algorithms based on deep learning architectures combine feature extraction with fault identification and classification by virtue of their adaptive extraction of fault features, which significantly improves the efficiency and generalization of fault diagnosis [33]. However, as networks become deeper, the corresponding problems of deep learning architectures begin to appear. Over-fitting caused by too many layers, vanishing gradients, and the hardware demands of a growing amount of parameter calculation are some of the problems at the present stage. Therefore, finding a more efficient intelligent fault diagnosis model is the key work of current research. Table 1 shows the different attributes of deep learning technology fault diagnosis.

Table 1. Attributes of deep learning fault diagnosis methods.
[33]; an enhanced deep residual network with multilevel correlation information; used to process the feature information obtained by wavelet packet transformation; No; No
[34]; a fault diagnosis method based on IMFs and WDenseNets; the components of vibration signals obtained through empirical mode decomposition were weighted and input into WDenseNets for fault identification and classification; No; No
Our method 1; LR-DenseNet; a Lightweight Dense Residual Network in which a new dense residual block is designed by combining residual connection and dense connection; Yes; No
Our method 2; LR-DenseSENet; channel attention and transfer learning are embedded into the network based on LRDB and energy spectrum matrix

Methods Proposed in this Paper
In view of the above limitations, the main research objectives of this paper are twofold. The first is to design a feature analysis method that effectively extracts fault feature information and presents it as two-dimensional data for input to the subsequent network. The second is to design a lightweight deep fault diagnosis network model that ensures the diagnosis effect while reducing the parameter calculation involved in model training. The overall framework of this paper is shown in Figure 1.

Two-Dimensional Time-Frequency Feature Representation Based on Wavelet Packet Transform
The wavelet packet nodes obtained after the wavelet packet transform contain a large amount of information about the vibration signal, and the energy of the wavelet packet nodes [34] is comparatively stable. Based on this, a novel two-dimensional time-frequency feature representation method based on the wavelet packet time-frequency transform is proposed in this paper. This method fully extracts the feature information contained in the fault signal and improves the accuracy of the subsequent fault detection model while realizing multi-resolution analysis.
The energy information of the branch nodes obtained by wavelet packet transformation is calculated as follows:
(1) Based on the original vibration signal $f(t)$, an $N$-layer wavelet packet decomposition and signal reconstruction are carried out to obtain a total of $2^{N}$ branch reconstruction signals.
(2) The corresponding branch band energy $E_{i,j}$ is obtained through calculation.
The total energy contained in the original vibration signal is given by Equation (1), where $i = 0, 1, 2, \cdots, n$ denotes the layer of wavelet packet decomposition and $j = 0, 1, 2, \cdots, 2^{n}$ is the index of the branch nodes. $B$ is defined as the decomposition coefficient, $S_{i,j}$ is the branching signal, and $E_{i,j}$ is the energy contained in the reconstructed signal $S_{i,j}$. $B_{i,j}(n)$ is the corresponding wavelet packet coefficient, and the frequency band energy contained in the $j$-th node of the $i$-th layer is given by Equation (2), where $N$ represents the number of wavelet packet branch nodes contained in the subspace.
The energy of each branch node obtained in the previous steps is then normalized by the total energy of the original vibration signal to obtain the relative energy of the corresponding branch frequency band, as shown in Equation (3).
Here, $e_{i,j}$ represents the energy value of the corresponding node after normalization, and $E$ is the total energy. After the energy information of all nodes is obtained, the energy spectrum feature vector $F$ is constructed, as shown in Equation (4).
On this basis, the energy spectrum matrix is constructed, as shown in Equation (5).
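For orientation, Equations (1)-(5) can be sketched from the symbol definitions given in the text, assuming the standard wavelet packet energy formulation. This is a reconstruction under stated assumptions rather than the authors' typeset equations: the zero-based node index running to $2^{i}-1$, the matrix symbol $M$, its side length $m$, and the row-major square layout in (5) are all assumptions introduced here.

```latex
\begin{align}
E &= \sum_{j=0}^{2^{i}-1} E_{i,j} \tag{1} \\
E_{i,j} &= \sum_{n=1}^{N} \left| B_{i,j}(n) \right|^{2} \tag{2} \\
e_{i,j} &= \frac{E_{i,j}}{E} \tag{3} \\
F &= \left[ e_{i,0},\; e_{i,1},\; \ldots,\; e_{i,2^{i}-1} \right] \tag{4} \\
M &= \begin{bmatrix}
e_{i,0} & \cdots & e_{i,m-1} \\
\vdots & \ddots & \vdots \\
e_{i,(m-1)m} & \cdots & e_{i,m^{2}-1}
\end{bmatrix}, \qquad m^{2} = 2^{i} \tag{5}
\end{align}
```

With this form, the normalized energies in (3) sum to one by construction, so $F$ describes how the signal energy is distributed over the frequency bands of layer $i$.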
The experimental samples were built using 8-layer wavelet packet decomposition and reconstruction: 256 branch nodes are obtained, the energy of each node is calculated from the corresponding wavelet packet coefficients, and all node energies are finally assembled into an energy spectrum feature map, which serves as the input of the subsequent network model.
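The procedure above can be illustrated with a minimal pure-Python sketch. Several choices here are assumptions made only for illustration and are not stated in the paper: the Haar wavelet basis, the use of coefficient energies without the reconstruction step, the natural (not frequency-sorted) node order, the synthetic test signal, and the 16 × 16 row-major layout of the 256 normalized energies.

```python
import math

def haar_step(x):
    """One orthonormal Haar analysis step: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[k] + x[k + 1]) * s for k in range(0, len(x) - 1, 2)]
    detail = [(x[k] - x[k + 1]) * s for k in range(0, len(x) - 1, 2)]
    return approx, detail

def wavelet_packet_nodes(signal, levels):
    """Full wavelet packet tree: every node is split again at every level."""
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes  # 2**levels coefficient lists, in natural order

def energy_spectrum(signal, levels=8):
    """Normalized node energies: the energy spectrum feature vector F."""
    nodes = wavelet_packet_nodes(signal, levels)
    energies = [sum(c * c for c in node) for node in nodes]  # E_{i,j}
    total = sum(energies)                                    # E
    return [e / total for e in energies]                     # e_{i,j}

def to_matrix(vector, side):
    """Row-major reshape of the feature vector (layout is an assumption)."""
    return [vector[r * side:(r + 1) * side] for r in range(side)]

# Synthetic two-tone signal standing in for a vibration record (hypothetical).
n = 2048
signal = [math.sin(2 * math.pi * 50 * t / n) + 0.5 * math.sin(2 * math.pi * 300 * t / n)
          for t in range(n)]
F = energy_spectrum(signal, levels=8)
M = to_matrix(F, 16)
```

Because the Haar step is orthonormal, the node energies add up to the signal energy, so the entries of `F` sum to one. A production pipeline would more likely use a wavelet library and the basis chosen by the authors.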
The notation used above is summarized in Table 2.
Table 2. Notation description.
$f(t)$: The original vibration data signals
$E_{i,j}$: The branch band energy
$B$: The decomposition coefficient
$S_{i,j}$: The branching signal
$B_{i,j}(n)$: The corresponding wavelet packet coefficient
$e_{i,j}$: The energy value of the corresponding node after normalization
$F$: The energy spectrum feature vector

Intelligent Fault Detection Model of LR-DenseNet
The LR-DenseNet intelligent fault detection model proposed in this paper is a new lightweight CNN architecture that integrates residual learning and dense connection ideas. It strengthens the information flow between each layer of the deep network model, makes the transfer of features and gradients more effective, and alleviates the problem of vanishing gradients. At the same time, based on maintaining superior diagnostic accuracy, the number of parameters and computational complexity of the model are greatly reduced.
The LR-DenseNet proposed in this paper mainly consists of three parts: a shallow spectrum image feature extraction module, Local Residual Dense Blocks (LRDBs), and a dense fault feature abstraction module. The specific network structure is shown in Figure 2. Firstly, the shallow module extracts the features of the input data. Subsequently, each LRDB fuses its shallow input features with the features produced by its dense convolution layers; this fusion is realized through a short connection. Finally, feature abstraction and simplification are added to the feature extraction process: a mean pooling layer is placed after the LRDB modules to further abstract the fused features, so that only the abstract features useful for classification are retained, which suits complex data structures such as energy spectrum images.

(1) Shallow feature extraction. The input energy spectrum feature map is processed, and the shallow-level information of the image is extracted through a convolution operation and used as the input of the subsequent LRDB module. Sixty-four 3 × 3 convolution kernels are used to complete feature extraction so that local information of the spectrum is extracted accurately and effectively. The ReLU activation function is used for nonlinear transformation.

(2) Local Residual Dense Block (LRDB). The core unit of LR-DenseNet is the LRDB, which consists of a dense block, a transition layer, and a residual connection. The concrete structure of the LRDB is shown in Figure 3. The input of the block is passed forward to every subsequent convolution layer, so that each layer has access to the features of all preceding layers; this realizes a continuous memory mechanism. This structure benefits feature transmission and gives the neural network a continuous memory between layers.
(3) Dense fault feature abstraction. The input data in this paper are energy spectrum feature maps, which are complex and whose characteristic differences are difficult to distinguish. A fault feature abstraction module consisting of a global average pooling layer is therefore added to the network. It reduces the complexity of the network model to some extent and further abstracts the extracted features, retaining only the abstract features needed for classification, which makes it suitable for the classification of energy spectrum feature maps.
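To make the LRDB structure more concrete, the following sketch tracks only channel counts and parameter counts through one block. It assumes a design in which each dense layer is a 3 × 3 convolution producing `growth_rate` new channels that are concatenated onto its input, and in which a 1 × 1 transition convolution compresses the result back to the block's input width so that the residual addition is shape-compatible. The function name, the growth rate, and the layer count are hypothetical; the paper's actual settings are those of its Table 3.

```python
def lrdb_channel_plan(in_channels, growth_rate, num_layers):
    """Channel and parameter bookkeeping for one assumed LRDB.

    Returns (plan, total_params), where plan is a list of
    (layer_name, input_channels, output_channels, parameter_count).
    """
    plan = []
    channels = in_channels
    total_params = 0
    for layer in range(num_layers):
        # 3x3 convolution over all features concatenated so far, plus biases.
        conv_params = 3 * 3 * channels * growth_rate + growth_rate
        plan.append((f"dense_conv_{layer + 1}", channels, growth_rate, conv_params))
        total_params += conv_params
        channels += growth_rate  # dense connection: concatenate the new features
    # 1x1 transition compresses back to in_channels so that
    # output = block_input + transition(features) is well defined.
    transition_params = 1 * 1 * channels * in_channels + in_channels
    plan.append(("transition_1x1", channels, in_channels, transition_params))
    total_params += transition_params
    return plan, total_params

# Hypothetical configuration: 16 input channels, growth rate 16, 3 dense layers.
plan, total_params = lrdb_channel_plan(in_channels=16, growth_rate=16, num_layers=3)
for name, c_in, c_out, params in plan:
    print(f"{name}: {c_in} -> {c_out} channels, {params} parameters")
print("total parameters:", total_params)
```

In this configuration the concatenated width grows from 16 to 64 channels before the transition layer, which shows why the compression step matters for keeping the block lightweight.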

In this paper, a Lightweight Dense Residual Network fault diagnosis model, LR-DenseNet, is constructed based on the LRDB and the energy spectrum matrix. The model framework is shown in Figure 4. Specific network architecture parameters are shown in Table 3.
The training process of the LR-DenseNet model designed in this paper is divided into three steps: data acquisition and processing, network model training, and online fault diagnosis using the trained model. The details are summarized as follows:
(1) Data acquisition and processing. The vibration sensor collects the original vibration data, which are transmitted through a PCI board card to an industrial computer running Windows 7; the industrial computer produces the data set in mat format. Each sample in the data set is numbered to obtain the sample sequence {x,L}, in which L is the fault type corresponding to the timing signal x.
Information processing is then performed on the mat dataset samples. Firstly, the $2^{8}$ reconstructed branch signals are obtained by wavelet packet transform, and the energy information they contain is calculated. The energy spectrum feature map is obtained by processing the energy information, and the data sample set {F,L} is built for subsequent network input. Finally, the energy spectrum feature map sample set {F,L} is divided into data sets.
(2) Network training. The LR-DenseNet intelligent fault diagnosis network model is built from three parts: shallow feature extraction, the novel local residual dense block, and dense fault feature abstraction. The model parameters are initialized, and the number of iterations and the training batches are set through experiments. The LR-DenseNet model is then trained on the constructed sample dataset. During training, the loss function is used to calculate the training error of the model, and the model parameters are iteratively updated through backpropagation based on that error until the model converges, indicating that training is complete. Meanwhile, the validation set is used to check the model parameters during optimization, and the best model is saved during validation. Finally, the diagnostic performance of the model is tested on the test set.
(3) Online diagnosis. The first two steps constitute offline model training; the trained model is then used to diagnose bearing faults online. The model built in this paper has a built-in fault prediction port. After the vibration data samples to be predicted are sent to the network model, the fault port diagnoses the fault conditions of the samples in real time and gives an early warning according to the fault conditions.
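The network training step (2) above follows a common pattern: update parameters from the loss gradient, evaluate on a validation set after each epoch, keep the best parameters, and report test performance once at the end. The sketch below shows that pattern only. A one-feature logistic model on synthetic data stands in for LR-DenseNet, and the learning rate, epoch count, and split sizes are hypothetical.

```python
import math
import random

def predict(w, b, x):
    """Sigmoid output of a one-feature logistic model (toy stand-in for the network)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def accuracy(w, b, data):
    """Fraction of samples whose thresholded prediction matches the label."""
    return sum((predict(w, b, x) >= 0.5) == (y == 1) for x, y in data) / len(data)

# Synthetic, linearly separable data: label 1 when the feature is positive.
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(200)]
samples = [(x, 1 if x > 0 else 0) for x in xs]
train, val, test = samples[:140], samples[140:170], samples[170:]

w, b, lr = 0.0, 0.0, 0.5
best = {"val_acc": -1.0, "w": w, "b": b}
for epoch in range(50):
    random.shuffle(train)
    for x, y in train:
        err = predict(w, b, x) - y  # gradient of cross-entropy w.r.t. the logit
        w -= lr * err * x           # gradient step on the weight
        b -= lr * err               # gradient step on the bias
    val_acc = accuracy(w, b, val)
    if val_acc > best["val_acc"]:   # keep the best model seen on validation data
        best = {"val_acc": val_acc, "w": w, "b": b}

# The test set is touched only once, with the saved best parameters.
test_acc = accuracy(best["w"], best["b"], test)
print("best validation accuracy:", best["val_acc"], "test accuracy:", test_acc)
```

Selecting the model on validation data and reporting on a separate test set keeps the reported diagnostic performance from being biased by the model selection itself.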

Adaptive Fault Detection Method Based on Transfer Learning
The confusion-matrix analysis of the model proposed in the previous section, which combines the energy spectrum feature map with LR-DenseNet, showed that the classification accuracy of some categories was slightly poor and that the training time of the model was long. This section addresses these problems. Firstly, a transfer learning strategy is introduced to shorten the training time through cross-domain transfer learning. At the same time, an improved feature fusion layer based on the channel attention module is proposed to improve the classification accuracy of the model. The improved model is the LR-DenseSENet model designed in this section. A series of experiments shows that the LR-DenseSENet model greatly improves recognition accuracy, greatly reduces training time, converges faster, and is considerably more reliable and robust.
In this paper, a feature fusion mechanism based on the local residual dense block and the channel attention mechanism is designed. Different weights are allocated according to the importance of different channel features, suppressing low-efficiency features and making full use of high-efficiency ones, which greatly improves the network's feature processing strategy. As an embedded module, the channel-domain attention module is usually placed in one of two positions: at each feature extraction layer of the network, or at the output layer. These two positions have the best effect. To keep the designed model efficient and lightweight, this paper chooses the latter: the SE-block is placed before the fully connected layer of the model, which adds little computation while improving model performance to a certain extent. The improved LR-DenseSENet network model based on the channel-domain attention mechanism is shown in Figure 5. In this paper, channel attention is embedded into the network based on the LRDB and energy spectrum matrix, and a lightweight dense residual adaptive fault diagnosis model, LR-DenseSENet, is thus constructed.
The schematic diagram of the model framework is shown in Figure 6. As can be seen from the figure, it is mainly composed of stacked LRDB modules and SE-block modules. The general working process of the model is divided into the following steps: data acquisition and processing, network training, and online diagnosis. Network training differs from that of LR-DenseNet as follows:

The LR-DenseSENet intelligent fault diagnosis network model was built, including four parts: shallow feature extraction, new local residual dense block, dense fault feature abstraction, and channel feature fusion. At the same time, the model parameters were initialized, and the iteration times and training batches were set through experiments. The specific architecture of the model is shown in Table 4.
The model processes data as follows. First, the input energy spectrum feature map is processed by sixteen 3 × 3 convolution kernels for preliminary convolution. Then, the resulting feature map is fed into the local residual dense blocks (LRDBs). The numbers of convolution kernels of the three LRDBs are 16, 32, and 64, and the output feature map sizes of the three modules are 32 × 32, 16 × 16, and 8 × 8, respectively. All of them use padding = same so that the feature map size is the same before and after processing. After concatenation, the transition layer compresses the features for further processing. Then, the SE module implements the channel-domain attention mechanism to weight the feature channels according to their importance.
Finally, the global mean pooling layer is used to further process the feature map, and the feature vectors used to characterize the input feature information are obtained. Based on these feature vectors, the classifier is used to classify and diagnose the feature map.
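The channel attention and pooling steps described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the channel count (64), the reduction ratio (16), the random weights, and the function names `global_average_pool`, `dense`, and `se_block` are all assumptions chosen for illustration. The 8 × 8 spatial size matches the last LRDB output size given in the text.

```python
import math
import random

def global_average_pool(feature_map):
    """Squeeze: reduce a C x H x W feature map to one mean value per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]

def dense(vector, weights, biases):
    """Fully connected layer; weights holds one row per output unit."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, biases)
    ]

def se_block(feature_map, w1, b1, w2, b2):
    """Squeeze-and-excitation: pool, bottleneck MLP, sigmoid gate, rescale channels."""
    squeezed = global_average_pool(feature_map)
    hidden = [max(0.0, h) for h in dense(squeezed, w1, b1)]               # ReLU
    gates = [1.0 / (1.0 + math.exp(-g)) for g in dense(hidden, w2, b2)]   # sigmoid
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates

# Hypothetical sizes: 64 channels of 8 x 8, reduction ratio 16 (assumed, not from the paper).
rng = random.Random(0)
C, H, W, R = 64, 8, 8, 16
x = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[rng.uniform(-0.1, 0.1) for _ in range(C)] for _ in range(C // R)]
b1 = [0.0] * (C // R)
w2 = [[rng.uniform(-0.1, 0.1) for _ in range(C // R)] for _ in range(C)]
b2 = [0.0] * C

y, gates = se_block(x, w1, b1, w2, b2)
feature_vector = global_average_pool(y)  # one value per channel, fed to the classifier
```

Each gate lies strictly between 0 and 1, so the block can only attenuate or preserve a channel; in a trained network the two dense layers learn which channels to emphasize.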
The LR-DenseSENet model is trained on the constructed sample dataset. During training, the loss function measures the training error, and the model parameters are updated iteratively from this loss by backpropagation until the model converges, at which point training is complete. Meanwhile, the validation set is used to check the model parameters during optimization, and the program saves the best model found during validation. Finally, the diagnostic performance of the model is evaluated on the test set.

Model and Training Process Parameter Settings
To verify the superiority of the proposed model, a sample dataset of energy spectrum feature maps was used for validation experiments. The dataset was divided into five sub-datasets according to load: 0 HP, 1 HP, 2 HP, 3 HP, and variable load. Each sub-dataset contains 10 categories and was split in a ratio of 7:3 into a training set and a test set for the subsequent experiments, giving 1260 training images and 540 test images per category. The initial values of the relevant model parameters are shown in Table 5.
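The 7:3 split described above can be sketched as a per-class (stratified) shuffle and cut. This is only an illustration under stated assumptions: the paper does not say how the split was randomized, and the file names and the function `stratified_split` are hypothetical.

```python
import random

def stratified_split(samples_by_class, train_ratio=0.7, seed=0):
    """Shuffle each class separately and cut it at train_ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in samples_by_class.items():
        shuffled = samples[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_ratio)
        train += [(sample, label) for sample in shuffled[:cut]]
        test += [(sample, label) for sample in shuffled[cut:]]
    return train, test

# Hypothetical file names: 10 classes with 1800 images each.
samples_by_class = {
    label: [f"class{label}_img{i}.png" for i in range(1800)]
    for label in range(10)
}
train_set, test_set = stratified_split(samples_by_class)
print(len(train_set), len(test_set))
```

Splitting within each class keeps the 1260/540 proportion identical across all 10 categories, which a single global shuffle would only approximate.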

Feature Extraction Effect
To verify the effectiveness of the energy spectrum feature map as a fault feature representation method, a verification experiment was designed. The experiment uses the 2 HP load condition and applies the wavelet packet transform to bearing vibration data from four health conditions. The energy spectrum feature maps are constructed from the transform results; the energy distribution is visualized as histograms, and the corresponding energy spectrum feature maps are drawn alongside. The feature extraction results are shown in Figures 7-10.
From the experiment and the analysis, it can be concluded that the proposed 2D time-frequency feature representation based on the wavelet packet time-frequency transform fully extracts the characteristic information of the fault signal while analyzing the signal in finer detail. Because it fully extracts and represents the fault information, the differences between fault types become more distinct. Compared with other feature extraction methods, the proposed method can accelerate the convergence of subsequent diagnosis models to a certain extent.
Therefore, the energy spectrum feature graph is adopted as the expression form of fault features and is used as the input of subsequent networks.
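To make the idea of a wavelet packet energy spectrum concrete, the following pure-Python sketch decomposes a signal with a full wavelet packet tree and arranges the normalized node energies into a small 2D map. It is an illustration under assumptions, not the paper's procedure: the Haar basis, the 4 decomposition levels, the 4 × 4 layout, and the synthetic two-tone signal are all chosen here for simplicity, and the leaves are kept in natural (not frequency) order.

```python
import math

def haar_step(signal):
    """One Haar analysis step: (approximation, detail), each half as long."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def wavelet_packet_nodes(signal, levels):
    """Full wavelet packet tree: every node is split at each level (2**levels leaves)."""
    nodes = [signal]
    for _ in range(levels):
        nodes = [half for node in nodes for half in haar_step(node)]
    return nodes

def energy_spectrum_map(signal, levels, rows):
    """Normalized energy of each leaf node, arranged row by row into a 2D map."""
    energies = [sum(c * c for c in node) for node in wavelet_packet_nodes(signal, levels)]
    total = sum(energies) or 1.0
    shares = [e / total for e in energies]
    cols = len(shares) // rows
    return [shares[r * cols:(r + 1) * cols] for r in range(rows)]

# Synthetic stand-in for a vibration signal; its length must be divisible by 2**levels.
signal = [
    math.sin(2 * math.pi * 50 * t / 1024) + 0.5 * math.sin(2 * math.pi * 200 * t / 1024)
    for t in range(1024)
]
emap = energy_spectrum_map(signal, levels=4, rows=4)
```

Because the Haar transform is orthonormal, the leaf energies sum to the energy of the original signal, so the map is a redistribution of signal energy across sub-bands rather than a lossy summary of it.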

Recognition Accuracy and Performance Analysis of LR-DenseNet Model
The CWRU dataset [35] was selected for verification in this experiment. The dataset is divided into four sub-datasets according to load condition: 0 HP, 1 HP, 2 HP, and 3 HP. Each load corresponds to four fault modes: normal, inner ring fault, outer ring fault, and rolling body fault. Each of the three defect modes (inner ring, outer ring, and rolling body) occurs at three fault sizes: 7 mils, 14 mils, and 21 mils. In summary, one healthy state and three defect states, each with three damage severities, constitute the energy spectrum feature dataset used in the experimental validation. There are 10 health-condition classes and 18,000 samples, 70% of which were used to train the network model and the remaining 30% to test network performance. The batch size was set to 8, the initial learning rate was set to 0.001, the Adam optimizer was used for adaptive adjustment, and 100 rounds of iterative training were carried out. Figure 11 shows the accuracy and loss curves of the LR-DenseNet model at a bearing load of 2 HP.
Figure 11. LR-DenseNet model accuracy and loss error curve.
As can be seen from the figure, the model converges after 40 iterations: the training accuracy settles at 99.32% and the validation accuracy at 94.25%, which is a good result. The loss curves of the training set and validation set converge to 0.02, close to 0. In both the accuracy and loss plots, the training and validation curves gradually draw together with little fluctuation, indicating that the network structure is sound.
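The training setup above uses the Adam optimizer with an initial learning rate of 0.001. As a rough illustration of what one Adam update does, here is a minimal sketch of the standard update rule applied to a toy quadratic objective. The hyperparameters `beta1`, `beta2`, and `eps` are the commonly used defaults and are assumptions, since the paper does not report them; the toy objective stands in for the network loss.

```python
import math

def adam_step(params, grads, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first and second moment estimates."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params

def loss(params):
    """Toy objective f(w) = sum of w_i^2, whose gradient is 2 * w_i."""
    return sum(p * p for p in params)

params = [1.0, -2.0]
state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
loss_before = loss(params)

first = adam_step(params, [2 * p for p in params], state)
params = first
for _ in range(1999):
    params = adam_step(params, [2 * p for p in params], state)
loss_after = loss(params)
```

On the first step the bias correction makes the update magnitude almost exactly equal to the learning rate regardless of gradient scale, which is why the initial learning rate directly bounds how far each weight moves early in training.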

Attention Mechanism and Transfer Learning Effect Analysis
To verify that the channel attention mechanism suits the model designed in this work, a comparative experiment was conducted between the channel attention mechanism and a spatial attention mechanism. The models were trained for 150 epochs on the energy spectrum feature dataset. The model for comparison is Spatial Attention (SPA) [36], which follows the same steps as the attention in SE-Net and is likewise divided into three steps: channel compression, channel transformation, and attention weighting. The comparison of the SE attention module (left) and the SPA attention module (right) is shown in Figure 12. The model with the SE attention mechanism converged within five epochs, quickly and with a smooth curve, and reached a diagnostic accuracy of 100%. The model with the SPA attention mechanism performed worse: it converged only after 20 epochs, the convergence process oscillated visibly, and the validation accuracy did not reach 100%. By contrast, adding the SE attention mechanism improved the convergence performance of the model by 15%, gave higher accuracy, and made the error converge faster, which verifies the advantage of the SE attention mechanism in this paper.
To verify the performance advantage of using transfer learning, the model is evaluated in terms of training accuracy and training duration. The fault diagnosis accuracy of the model with transfer learning and the SE module added is shown in Figure 13.
Figure 13. LR-DenseSENet model training accuracy and error curve.
As can be seen from the figure, the improved model converges within 20 epochs. The final diagnosis accuracy reaches 99.59%, so the model identifies the fault types well. The error loss curve converges quickly and the final error is below 0.05, which indicates that transfer learning and the SE attention mechanism improve the performance of the model.
Next, the performance advantage brought by transfer learning is demonstrated by combining training duration and accuracy. Table 6 compares the experimental results. For the same training duration, the model with transfer learning is slightly more accurate; moreover, training without transfer learning usually takes more than 500 min, whereas training with transfer learning takes only 400 min. In general, transfer learning shortens the training time considerably while achieving high accuracy, which shows that introducing it benefits the model.

Overall Performance Analysis of the Model
Next, the overall performance of the LR-DenseSENet model based on transfer learning is analyzed through three experiments, each addressing a different aspect.
First, to further verify the superiority of the fault diagnosis model that combines the energy spectrum matrix with the LR-DenseSENet model, the proposed method is compared with feature extraction algorithms in common use. WPT is the wavelet packet transform method used in this paper, and FFT denotes the time-frequency graph obtained by Fourier transform. Two-dimensional gray pixel images [37] are 64 × 64 two-dimensional images formed by normalizing the amplitude of the original vibration signal to 0-255. The comparison results are shown in Table 7. The feature representation based on the two-dimensional gray image has the lowest accuracy, only 93.7% on the training set and 92.8% on the test set. The representation based on the Fourier transform spectrum reaches 95.6% on the training set and 94.6% on the test set. The energy spectrum feature based on the wavelet packet transform performs best: its accuracy exceeds 95% on both the training and test sets, with the LR-DenseSENet model reaching 99.59% on the training set and 98.7% on the test set. These results further demonstrate the effectiveness of the energy spectrum feature map as a fault feature representation and the performance advantage of the adaptive fault diagnosis model that combines the energy spectrum feature map with the LR-DenseSENet model.
Next, the lightweight model in this paper is compared with several common deep learning models in terms of accuracy and number of parameters. MATLAB is used to implement the last three network architectures. The comparison is shown in Table 8 and Figure 14. The experimental results show that the lightweight LR-DenseSENet model has only 0.75 M parameters, about 1/5 of the VGG-19 model, and a parameter count similar to that of the other deep models. Its model complexity is only 71.3 M, lower than the 90 M of ResNet-56 and the 250 M of VGG-19, yet it still achieves 99.4% accuracy. The comparison shows that the model designed in this paper delivers strong diagnostic performance with fewer parameters and the lowest model complexity.
To further demonstrate the recognition effect of LR-DenseNet for different fault types, the confusion matrix [38] is used to present the recognition results visually. Figure 15 shows the confusion matrix of the test results: the horizontal axis is the true fault label, the vertical axis is the label predicted by the network, the diagonal cells give the number of correctly classified samples, and the remaining cells give the number of misclassified samples. The position of an off-diagonal cell identifies which fault type was mistaken for which.
As can be seen from the figure, the LR-DenseSENet model distinguishes the fault categories well and locates faults accurately, with only a small number of errors. The LR-DenseSENet model outperforms LR-DenseNet, which demonstrates the effectiveness and strong learning ability of the model.
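For readers unfamiliar with confusion matrices, the following short sketch builds one using the axis convention stated in the text (true label along the horizontal axis, predicted label along the vertical axis). The labels are made up for a three-class toy case and are not the paper's results; the function names are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count matrix indexed as matrix[predicted][true], matching the text's axes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[p][t] += 1
    return matrix

def per_class_accuracy(matrix):
    """Fraction of each true class (one column) that falls on the diagonal."""
    n = len(matrix)
    result = []
    for c in range(n):
        column_total = sum(matrix[r][c] for r in range(n))
        result.append(matrix[c][c] / column_total if column_total else 0.0)
    return result

# Made-up labels for three classes, for illustration only.
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 2, 0, 2]
cm = confusion_matrix(y_true, y_pred, 3)
overall = sum(cm[i][i] for i in range(3)) / len(y_true)
print(cm)                      # [[2, 0, 1], [1, 3, 0], [0, 0, 3]]
print(per_class_accuracy(cm))  # class 1 is always recognized; classes 0 and 2 each lose one sample
```

Note that many libraries use the transposed convention (rows for true labels), so the axes should be checked before comparing matrices from different tools.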

Model Robustness Analysis under Variable Load Environment
To verify the robustness and generalization ability of the LR-DenseSENet model for bearing fault diagnosis, five data sets were made. The sample data for 0 HP, 1 HP, 2 HP, and 3 HP form data sets A, B, C, and D, respectively; the sample data of B, C, and D were then combined and split to form data set E, the variable-load data set. Training is conducted on each data set to observe the adaptability of the model under different load environments. The training results are shown in Table 9. The experimental results show that the LR-DenseSENet model achieves good accuracy under different load environments, identifies the fault types effectively, and converges to the minimum error. After the variable-load data set E is added, the LR-DenseSENet model still converges, with an accuracy close to 99% and a loss error close to 0, which further demonstrates the robustness and generalization ability of the LR-DenseSENet model for bearing fault diagnosis.

On-Line Diagnosis
First, the offline training phase of the LR-DenseSENet model was carried out, followed by the online diagnosis of fault types [39]. The trained LR-DenseSENet model is saved, and the online diagnosis port then uses it to diagnose the state categories of the feature images in the test set of the energy spectrum feature maps, realizing online fault diagnosis [40]. The online prediction results of the LR-DenseSENet model for the input data are shown in Figures 16 and 17.
Figure 16. Bearing normal condition online diagnosis results.
Figure 17. LR-DenseNet model online diagnosis results.
Figure 16 shows the online diagnosis results for the normal-condition samples of rolling bearings, where NL stands for Normal, and the number following Probability is the predicted probability of the diagnosed class. The model identifies the normal condition well. The diagnosis results for samples of the other nine fault types are shown in Figure 17.
The fault classification covers three main fault modes: inner ring fault, outer ring fault, and rolling body fault. Each fault mode has three damage degrees, giving nine fault types in total. For online bearing fault prediction, the energy spectrum images in the test set are first input into the prediction system, which uses the trained network model to make an online diagnosis and outputs the diagnosis result and the predicted probability. Figures 16 and 17 show that the online diagnosis module performs well, with the identification accuracy of all fault categories exceeding 99%. Overall, the LR-DenseSENet model extracts and learns the characteristics of the samples well and can therefore be used for prediction with high accuracy.
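The online diagnosis step described above, in which the system outputs a diagnosed class and a predicted probability, can be sketched as a softmax over the network outputs followed by an argmax. This is an assumption about the output stage, since the paper only states that a classifier is used. The label "NL" comes from the text; the other nine class names and the logits are hypothetical placeholders.

```python
import math

# "NL" is taken from the text; the remaining nine names are hypothetical placeholders
# for three fault modes at three damage sizes.
CLASS_NAMES = ["NL"] + [
    f"{mode}{size}" for mode in ("IR", "OR", "B") for size in ("07", "14", "21")
]

def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def diagnose(logits):
    """Return the most probable class name and its predicted probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]

# Made-up network outputs for one energy spectrum image.
logits = [9.0] + [0.5] * 9
label, probability = diagnose(logits)
print(f"{label}, Probability: {probability:.4f}")
```

The reported probability is the model's confidence in the chosen class for a single image, which is distinct from the classification accuracy measured over the whole test set.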

Conclusions
As a core component of power equipment, the rolling bearing has an important influence on the safe operation of the smart grid, so diagnosing its faults is of great significance. In this paper, the rolling bearing is taken as the research object, the energy spectrum feature map is proposed as a novel fault feature representation, and a lightweight LR-DenseSENet adaptive fault diagnosis model is designed for diagnosis and classification. Experiments on the CWRU dataset, the standard data set for motor bearing fault diagnosis, demonstrate the efficiency and generalization ability of the LR-DenseSENet model.
Author Contributions: Y.L. finalized the version to be published; F.L. conceived and designed the topic and wrote the paper; Q.G., Y.Z. and S.Y. refined the idea and revised the paper. All authors have read and agreed to the published version of the manuscript.