CNN-Based Feature Fusion Motor Fault Diagnosis

Abstract: Artificial intelligence fields have been using deep learning in recent years. Due to its powerful data mining capabilities, deep learning has a wide-ranging impact on the diagnosis of motor faults. A method for diagnosing motor faults based on the multi-feature fusion of convolutional neural network (CNN) is presented in this paper. As far as the method is concerned, CNN is used as the basic framework, and the CNN model has been improved. First, the collected vibration and current signals are preprocessed. Second, segmented multi-time window synchronous input is performed on the processed data. In addition, a multi-scale feature extraction process and time series fusion of vibration and current signals subject to synchronous input in the same time window can be performed, which ultimately enables the identification of motor faults with a high degree of accuracy. In order to verify the validity of the proposed fault diagnosis model, an experimental platform for fault simulation was built for the motor, and vibration and current signals of different motor states were collected and verified by experimentation. According to the results of the experiment, the method can effectively combine motor vibration and current signal fault features, and thus motor fault diagnosis can be improved. In comparison with a single signal input, a multi-signal input provides greater accuracy and stability. As compared to other multi-signal feature fusion methods, such a deep learning model is able to extract fault features in a more comprehensive manner, which helps to improve the accuracy of motor fault diagnosis.


Introduction
As the power source in industrial production, the asynchronous motor offers good economy and reliability, and thus plays an extremely important role. Because asynchronous motors run continuously in diverse industrial environments, their operating performance degrades over time. If a motor fault is not discovered in a timely manner, the possibility of motor failure increases greatly. Motor faults often bring industrial production to a halt, which significantly affects product quality and may result in serious safety problems and irreparable casualties [1]. Monitoring the running state of the motor to ensure the safety of the production process is therefore an important issue in both academia and industry, and it is very important to research and develop state detection and fault diagnosis methods for motors.
Traditional motor fault diagnosis relies mainly on the voltage state and human experience, so it cannot identify fault causes accurately or address motor faults early. To solve this problem, sensors are used to collect signals from the motor, which opens up more possibilities for motor fault diagnosis. Because of the complexity of the motor environment in industrial production, manually extracted fault features of the motor signal are often greatly limited, so more and more signal processing and artificial intelligence methods are being used in the field of motor fault diagnosis. Motor current signature analysis (MCSA) is a well-known spectral analysis method [2] and a common method for online motor monitoring. W. Wang [3] put forward a model for diagnosing rotor broken bar faults that uses a third-order energy operator to demodulate current signals. F. Sabbaghian-Bidgoli [4] used an improved Hilbert-Huang Transform to detect and analyze rotor broken bar faults, and the FFT has also been used to analyze motor current features [5]. Owing to the different characteristics of motor faults, MCSA gives good results for stator and rotor faults. In recent years, the analysis of motor vibration signal features has also performed well in motor fault diagnosis. L. Wang [6] used the short-time Fourier transform to convert the vibration signal into a time-frequency spectrum to analyze motor faults. D. Wu [7] proposed a method for extracting bearing fault features based on the empirical wavelet transform and fuzzy entropy. In addition, acoustic signals can also be used for motor fault diagnosis. A. Glowacz [8] introduced feature extraction methods for motor sound signals and developed a fault diagnosis technique based on acoustic signals. All of these approaches perform feature extraction on the fault signal. They can diagnose faults effectively, but different methods differ in how well they extract features, and they demand considerable expert knowledge, so the overall diagnostic result is often accompanied by uncertainty.
With the continuous development of artificial intelligence, its application in industry has become more and more extensive, and many scholars have applied it to fault diagnosis, addressing the abovementioned problems with good results. Several common artificial intelligence methods have been used successfully in motor fault diagnosis, such as fuzzy theory [9], the BP neural network [10], the support vector machine [11], the ANN [12] and deep learning. Some of the shallow models mentioned above are used to describe the complex mapping between signal and health status; as a result, they are clearly inadequate in diagnosis and generalization performance when facing large amounts of fault data [13]. Compared with other methods, deep learning has a stronger learning ability and can learn more effective features from massive data through a deeper network structure. It performs excellently in fault diagnosis and has also achieved good results in automatic speech recognition [14], computer vision [15] and bioinformatics [16].
Hinton et al. first proposed the deep learning (DL) theory in Science in 2006 [17]. In recent years, research by experts and scholars has fully demonstrated that deep learning has broad prospects in practical applications. Through multilayer nonlinear network training, DL learns the underlying features of the samples without manual extraction, which removes the dependence on manual intervention and expert knowledge and greatly improves classification and prediction ability. The DL models most commonly used in fault diagnosis include the convolutional neural network (CNN), the recurrent neural network (RNN) and the stacked autoencoder (SAE). D. Wang [18] put forward a multi-scale learning neural network containing 1D and 2D convolution channels. The method can learn the local correlation between adjacent and non-adjacent intervals in periodic vibration signals, and its classification accuracy for motor bearing faults reaches 98.58%. J. Zhu [19] reviewed the application of RNNs in fault diagnosis, introduced the RNN and combined neural networks containing RNNs, and discussed the challenges and future development of RNN-based fault diagnosis. W. Sun [20] used a sparse autoencoder (SAE) to learn features, used a denoising autoencoder to add partial corruption to the SAE input to improve the robustness of the feature representation, and then trained a neural network classifier on the features learned by the SAE to identify induction motor faults. Yong Zhu et al. [21] automatically optimized the improved hyper-parameters using particle swarm optimization (PSO) and achieved accurate classification for hydraulic plunger pumps.
In recent years, along with the development of modern science and technology, non-destructive diagnosis methods have become increasingly diverse to meet the needs of production. Non-destructive testing (NDT) has become widely used in fault diagnosis because it does not damage the specimen and offers high detection sensitivity. Common non-destructive methods include thermal imaging, radiographic, ultrasonic, magnetic particle, penetrant and eddy current testing. After fuzzifying the initial signal, Jan Rabcana et al. [22] classified the extracted signal features using a Fuzzy Decision Tree and realized non-destructive diagnosis of aircraft engine blades. Adam Glowacz [23] realized non-destructive diagnosis of an electric impact drill using a fault diagnosis method based on thermal image analysis. Faouaz Jeffali et al. [24] used an infrared detection device to monitor the motor in real time and compared the readings with an established thermal imaging model for non-destructive diagnosis of the motor. Although non-destructive diagnostic methods can detect defects in a timely manner in industrial applications, some tests can only be carried out destructively. Therefore, at present, non-destructive testing cannot completely replace destructive testing and has to be used alongside it.
However, current motor fault diagnosis methods, whether traditional feature extraction, conventional deep learning or non-destructive detection, mostly use measurement data from a single sensor. The industrial environment of the motor is often complicated, the amount of data measured by a single sensor is usually small, and the fault information it contains may be lost due to external interference, so a single-feature model is probably insufficient to diagnose motor faults accurately. Methods are therefore required that fuse multiple features to maximize fault information and minimize the possibility of losing it. Most traditional feature fusion methods rely on manual extraction and fuse the data from different sensors by reconstructing the feature vector [25]. In recent years, some scholars have used deep learning to accomplish multi-sensor data fusion for fault diagnosis. In [26], a parallel multichannel structure of a 1D-CNN and a 2D-CNN was proposed to extract deep features, which were then connected by a feature fusion strategy to diagnose rolling bearing faults. In [27], a method for fusing multimodal sensor signals (data collected by an accelerometer and a microphone) was proposed: features were extracted from the raw vibration and acoustic signals and fused by a 1D-CNN-based network to achieve more accurate and more robust bearing fault diagnosis. In [28], features extracted from different sensor signals were fed into multiple two-layer sparse autoencoder (SAE) neural networks for feature fusion, and the fused feature vector was then used to train a deep belief network (DBN) for further classification. Although the above methods are effective to a certain extent, some problems remain:

1. With single-resolution or single-scale data analysis, the global information of the original signal often cannot be obtained accurately because faults and loads vary.

2. The mapping between different signals and fault types is complex, and common feature fusion strategies may lose fault information in practical applications.

The remainder of this paper is organized as follows. Section 2 introduces the theoretical background of the CNN and GRU. Section 3 describes in detail the proposed CNN-based asynchronous motor fault diagnosis method with feature fusion. Section 4 introduces the experiments in which the vibration and current signals were collected, verifies the model with the collected data and compares the performance of different models. Section 5 draws conclusions.

1D Convolutional Neural Network (1D-CNN)
As part of deep learning, the convolutional neural network performs well in data mining and comes in many types, such as 1D convolution, 2D convolution, 3D convolution, depthwise separable convolution and grouped convolution. Because the vibration and current signals of the motor are 1D time series signals, the 1D convolutional neural network is used as the basic model. Although the feature dimensions of 1D convolution and the other convolutions differ, all of them consist of an input layer, a convolution layer, a pooling layer, a fully-connected layer and an output layer [29].
As for the convolution layer, the convolution kernel is used for feature extraction. In the formula of the 1D convolution layer, l indicates the index of the convolution layer, N indicates the number of channels on layer l − 1, k indicates the convolution kernel, b is the corresponding kernel deviation (bias), j indexes the convolution kernels, f indicates the activation function, and * denotes the convolution operator.
The number of feature maps increases after the data pass through a convolution layer, so a pooling layer is added to process each feature map and to avoid the overfitting problems that arise from the growing number of maps [30]. In the formula of the pooling layer, b and a second deviation term indicate the different operation deviations of each feature map.
After enough convolution and pooling layers have been stacked, the fully-connected layer is used to perform further reasoning [31]. In the formula of the fully connected layer, with l denoting the final pooling layer, w indicates the weight and b indicates the deviation.
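The three layer types described above can be sketched in a few lines of dependency-free Python. This is a minimal illustration rather than the authors' implementation: the function names, the choice of ReLU as the activation f, max pooling as the pooling operation and the toy numbers are all assumptions made here, and the convolution is written as the sliding dot product that deep learning frameworks conventionally call convolution.

```python
from typing import List

Signal = List[List[float]]  # layout: channels x time steps


def conv1d(x: Signal, kernels: List[Signal], biases: List[float]) -> Signal:
    """Valid 1D convolution layer followed by ReLU (assumed activation f).

    kernels[j][i] is the kernel linking input channel i to output map j,
    so every output map sums over all N input channels and adds bias b_j.
    """
    n_in, length, width = len(x), len(x[0]), len(kernels[0][0])
    out = []
    for k_j, b_j in zip(kernels, biases):
        row = []
        for t in range(length - width + 1):
            s = b_j
            for i in range(n_in):
                s += sum(x[i][t + m] * k_j[i][m] for m in range(width))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out


def max_pool1d(x: Signal, size: int) -> Signal:
    """Non-overlapping max pooling applied to each feature map separately."""
    return [
        [max(ch[t:t + size]) for t in range(0, len(ch) - size + 1, size)]
        for ch in x
    ]


def fully_connected(x: Signal, w: List[List[float]], b: List[float]) -> List[float]:
    """Flatten the pooled maps and apply a linear layer: w . x + b."""
    flat = [v for ch in x for v in ch]
    return [sum(wi * v for wi, v in zip(row, flat)) + bi for row, bi in zip(w, b)]


# Toy forward pass: one input channel, one kernel of width 2.
signal = [[1.0, 2.0, 3.0, 4.0, 5.0]]
feature_maps = conv1d(signal, kernels=[[[1.0, 1.0]]], biases=[0.0])
pooled = max_pool1d(feature_maps, size=2)
logits = fully_connected(pooled, w=[[1.0, 1.0]], b=[0.5])
```

With these toy values the convolution yields one feature map of length 4, pooling halves it to length 2, and the fully connected layer reduces it to a single output.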

Gated Recurrent Neural Network (GRU)
The gated recurrent unit (GRU) is a type of recurrent neural network [32]. Its input/output structure is similar to that of a common recurrent neural network, as shown in Figure 2. Given the current input x_t and the hidden state h_{t−1}, which carries the relevant information passed down from the previous node, the GRU produces the output y_t of the current hidden node and the hidden state h_t passed to the next node. Like the long short-term memory (LSTM) network, the GRU was proposed to solve the gradient problems that arise in long-term memory and backpropagation. Refer to Figure 3 for the internal structure of the GRU. Unlike the LSTM, which requires multiple gating units and has many parameters, the GRU can both forget and select with a single gate. With few training samples the GRU performs better; thus the GRU and CNN are combined in this paper. The gating states of the GRU are obtained from h_{t−1} and x_t: r and z denote the reset gate and the update gate, and σ denotes the sigmoid function, which maps values into the range 0 to 1. After the gating signals are obtained, r is first used to obtain the reset data. The reset data are then concatenated with the input x_t, and the tanh activation function scales the result into the range −1 to 1, giving the candidate state h′. h′ mainly contains the information of x_t and is selectively added to the hidden state; this is the selective memory stage. The most important stage of the GRU is the memory update, in which ⊙ denotes element-wise multiplication. The update gate z takes values in (0, 1): the closer z is to 1, the more data are remembered; the closer z is to 0, the more data are forgotten.
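The GRU steps described above (gating, reset, candidate state, memory update) can be sketched as a single time step in plain Python. This is a sketch under stated assumptions, not the paper's code: the names `gru_step`, `w_r`, `w_z` and `w_h` are hypothetical, bias terms are omitted for brevity, and the update is assumed to take the common form h_t = (1 − z) ⊙ h_{t−1} + z ⊙ h′, which matches the statement that z close to 1 remembers more of the new data.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gru_step(x_t: Vector, h_prev: Vector,
             w_r: Matrix, w_z: Matrix, w_h: Matrix) -> Vector:
    """One GRU time step; each weight matrix has shape hidden x (hidden + input)."""
    joined = h_prev + x_t                           # concatenate [h_{t-1}, x_t]
    r = [sigmoid(v) for v in matvec(w_r, joined)]   # reset gate, values in (0, 1)
    z = [sigmoid(v) for v in matvec(w_z, joined)]   # update gate, values in (0, 1)
    h_reset = [ri * hi for ri, hi in zip(r, h_prev)]             # reset data
    h_cand = [math.tanh(v) for v in matvec(w_h, h_reset + x_t)]  # candidate in (-1, 1)
    # Memory update (assumed convention): keep (1 - z) of the old state, add z of the new.
    return [(1.0 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


# With all-zero weights both gates equal 0.5 and the candidate state is 0,
# so the new hidden state is exactly half of the previous one.
zeros = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
h_next = gru_step([0.3], [1.0, -2.0], zeros, zeros, zeros)
```

Running `gru_step` once per time step, feeding each returned state back in as `h_prev`, traverses a whole sequence.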

CNN Fault Diagnosis Framework Based on Feature Fusion
Based on the basic principle of the CNN, a new 1D-CNN diagnosis model is proposed, in which the current and vibration signals collected by the sensors are used as input to diagnose faults of the asynchronous motor.

Signal Preprocessing
A large portion of the motor current signal consists of the main frequency component of the motor power supply. For faults such as bearing faults, the fault features in the current signal are not very obvious. The main frequency of the power supply greatly interferes with the extraction of fault features, so the current signal must be preprocessed. Envelope analysis can effectively eliminate the power supply frequency in the motor current signal, and the Hilbert Transform can improve the accuracy and reliability of current signal fault feature extraction [33]. The Hilbert Transform is therefore used to perform the envelope analysis of the motor current signal x(t)_c. The transform shifts the phase of the positive and negative frequency components of the signal by 90 degrees. The real signal is thereby turned into a complex signal: the original signal is used as the real part, and the analytic signal X(t)_c is constructed.
Following the basic structure of the CNN, the basic CNN structure is improved, and a new 1D-CNN structure is proposed in this paper to diagnose motor faults by fusing the vibration and current signals. In the model, the input is the envelope signal of the current and vibration signals, where X(t)_v indicates the vibration envelope signal, X(t)_c indicates the current envelope signal, and i indicates the signal sampling number.
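To illustrate the envelope analysis described above, the following sketch builds the analytic signal with a discrete Fourier transform and takes its magnitude as the envelope. It is a minimal, dependency-free illustration and not the authors' code: the function names are hypothetical, the naive O(n²) DFT is only suitable for short signals, and the test signal (a carrier standing in for the supply frequency, modulated by a slow component standing in for a fault signature) is made up. In practice a library routine such as `scipy.signal.hilbert`, which returns the analytic signal, would normally be used instead.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive discrete Fourier transform (forward or inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * math.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]
    return [v / n for v in out] if inverse else out


def analytic_signal(x: Sequence[float]) -> List[complex]:
    """Analytic signal: the real part is x, the imaginary part its Hilbert transform."""
    n = len(x)
    spectrum = dft(x)
    # Keep DC (and Nyquist for even n), double positive frequencies, drop negative ones.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        upper = n // 2
    else:
        upper = (n + 1) // 2
    for m in range(1, upper):
        gain[m] = 2.0
    return dft([s * g for s, g in zip(spectrum, gain)], inverse=True)


def envelope(x: Sequence[float]) -> List[float]:
    """Envelope of a real signal as the magnitude of its analytic signal."""
    return [abs(v) for v in analytic_signal(x)]


# Synthetic example: a carrier (8 cycles) amplitude-modulated by a slow component (1 cycle).
n = 64
slow = [1.0 + 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
current = [a * math.cos(2 * math.pi * 8 * t / n) for t, a in zip(range(n), slow)]
env = envelope(current)  # recovers the slow modulation and removes the carrier
```

In this example the envelope follows the slow modulation rather than the carrier, which is the property the preprocessing relies on to suppress the supply frequency.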
Feeding the input point by point into the convolution operation makes the computation complex. To lower the computational complexity, in this paper the vibration and current signals are segmented simultaneously at multiple levels: different time windows divide the signals into segments of equal length, yielding equal-length data flows with different resolutions from which fault features are extracted. An envelope signal with n samples is divided into segments, each of length p.
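The multi-window segmentation can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, the window lengths 50, 100 and 200 are placeholders (the paper's actual settings are in its Table 4), and trailing samples that do not fill a whole segment are simply dropped.

```python
from typing import Dict, List, Sequence, Tuple

Segments = List[List[float]]


def segment(signal: Sequence[float], window: int) -> Segments:
    """Split a signal of n samples into n // window segments of equal length."""
    count = len(signal) // window
    return [list(signal[i * window:(i + 1) * window]) for i in range(count)]


def multi_window_streams(
    vibration: Sequence[float],
    current: Sequence[float],
    windows: Sequence[int],
) -> Dict[int, Tuple[Segments, Segments]]:
    """Segment both signals synchronously with every time window.

    For each window length p the result holds two aligned data flows, so
    segment t of the vibration flow covers the same time span as segment t
    of the current flow.
    """
    if len(vibration) != len(current):
        raise ValueError("vibration and current signals must have equal length")
    return {p: (segment(vibration, p), segment(current, p)) for p in windows}


# A sample of 1000 points per signal, as in the dataset described in the paper.
vib = [float(i) for i in range(1000)]
cur = [float(-i) for i in range(1000)]
streams = multi_window_streams(vib, cur, windows=(50, 100, 200))
```

Each window length gives a data flow at a different resolution: shorter windows produce more time steps with fewer points each.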

Multi-Scale Feature Extraction Module Based on Attention Mechanism (MFBAM)
After the data flows with different resolutions are obtained, the choice of filter size for each data flow is subject to a trade-off: large-scale filters may have good frequency resolution, but there are not enough of them to localize the high-frequency region, whereas small-scale filters cover more frequency bands but with low resolution [34]. Therefore, in this paper each data flow undergoes a multi-scale convolution operation: convolution kernels of three scales, each with a deviation added, are applied to every data flow to extract different features. In the definition, x_k indicates the input data flow vector, j indexes the three scales (j = 1, 2, 3), ω_j^l and b_j^l indicate the convolution operation and deviation at each scale, and f indicates the activation function. The output vector x_j^l is then fed into the max pooling layer, which compresses the feature map. To obtain fault features at different scales, a 1 × 1 unit convolution kernel k_u, with offset a_u, is applied after the pooling layer, and ReLU is chosen as the activation function. These operations yield three fault features F_j (j = 1, 2, 3).
Features obtained with different convolution kernels differ in effect and significance, so an adaptive attention module is added to this layer to assign weights to the features and make the fault more recognizable. The importance score s_j of the fault feature F_j is calculated by a fully connected attention score layer:

s_j = sigmoid(w_t F_j + b_t)

where w_t refers to the weight of the score layer and b_t to its deviation; both are adjusted by network training rather than set manually, and sigmoid is the activation function. The attention weights are computed from these scores, each fault feature is weighted by its attention weight, and the weighted features are joined by the concat (concatenation) function to give the final multi-scale feature. Adding the adaptive attention module improves the sensitivity of the model to fault features and the accuracy of fault diagnosis. The multi-scale feature extraction module based on the attention mechanism (MFBAM) for a single data flow is shown in Figure 4, and the module parameters used for feature extraction are listed in Table 1. The data flows that different time windows produce from the same signal are processed separately in each time step, and for data flows of different signals in the same time window, feature extraction is performed on the same time step. The vibration signal feature extracted in time step t is F_v(t), and the current signal feature is F_c(t).
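To illustrate the attention weighting of the three scale features, the sketch below scores each feature with a sigmoid layer, normalizes the scores into weights, scales the features and concatenates them. The score formula follows the text; the softmax normalization of the scores is an assumption made here, since the weight formula itself is not reproduced in the text, and the function name and toy values are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def attention_fuse(features: List[Vector], w_t: Vector, b_t: float) -> Tuple[Vector, Vector]:
    """Weight multi-scale features by attention and concatenate them.

    Returns (attention weights, fused feature vector).
    """
    # Importance score of each feature: s_j = sigmoid(w_t . F_j + b_t)
    scores = [
        1.0 / (1.0 + math.exp(-(sum(w * f for w, f in zip(w_t, feat)) + b_t)))
        for feat in features
    ]
    # Assumed normalization: softmax over the scores, so the weights sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Scale each feature by its weight, then join the results (concat).
    fused = [a * v for a, feat in zip(weights, features) for v in feat]
    return weights, fused


# Three toy scale features F_1, F_2, F_3 of length 2 each.
scale_features = [[0.2, 0.4], [1.0, 1.5], [3.0, 2.0]]
weights, fused = attention_fuse(scale_features, w_t=[0.5, 0.5], b_t=0.0)
```

Because the sigmoid is monotonic, a feature with a larger weighted sum receives a larger attention weight, and the fused vector keeps the length of all three features combined.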

Feature Fusion
The vibration and current feature vectors extracted in the same time step are further encoded and fused according to their time series information by a group of GRU units. At time step t, the GRU receives the two features as input, and the reset gate r_t and update gate z_t are computed with W and U, the parameter matrices of the gating unit. After the GRU layer has received the time series features of the vibration and current signals and the whole time series has been traversed by the reset gate and update gate, the final output is h_t, the multi-signal fusion feature of the time series encoded by the GRU. This process fuses the features of the vibration and current signals within the same time window. To make the fault feature more comprehensive, the fusion feature h ...

Refer to Table 2 for the basic parameters of the motor. A 3 kW DC servo motor is connected to the drive end of the three-phase asynchronous motor through a coupling, and corrugated resistors of different resistance values are connected across the DC servo motor to operate the motor under variable load. A CHENGTEC data collection crate is used as the data collection device; its main parts are the CT-9208 vibration signal collection system, a USB-1608 series multi-function data acquisition card and the vibration sensors (as shown in Figure 7). The CT-9208 has an 8-bit channel ICP sensor input and one speed input. Its highest sampling rate is 94 ks/s/ch, which fully meets the sampling needs of the actual monitoring process. The USB-1608 series multi-function data acquisition card provides 16 single-ended (SE) or 8 differential (DIFF) analog input channels. A BNC to M5/10-32UNF sensor test cable is used for signal transmission; it is an RG174-50Ω-1.5 high-quality, ultra-low-noise, anti-interference cable made specially for sensors, with a graphite semiconductor layer between the shielding layer and the insulating layer.

The vibration signal is collected as follows. Vibration sensors are installed at the motor drive end in three directions (horizontal, vertical and axial). The other end of each sensor is connected to the vibration conditioning equipment, and the conditioned vibration signals in the three directions are fed to the USB multi-function data acquisition card (DAQ) to complete the collection of the motor vibration signal. The current signal is collected as follows. Current transformers are connected to the protective circuit to collect the current signal on the three-phase current input. The other end is connected to the data collection box, and the current signal is transmitted to the USB multi-function data acquisition card (DAQ) to complete the collection of the motor current signal.
Motor faults are simulated by specially treating different parts of motors of the same model, and the experiment collects signals from motors with four kinds of faults and from healthy motors. The four faults are the rotor broken bar fault, the inter-turn short circuit fault, the bearing outer ring fault and the bearing inner ring fault. The faults are simulated as follows: (1) Rotor broken bar fault: a hole is drilled in a conducting bar on the outside surface of the rotor to break the bar. (2) Inter-turn short circuit fault: a short circuit across three stator turns is simulated by grounding one phase of the motor and drawing out a tap from the three shorted turns. (3) Bearing outer ring fault: a certain amount of abrasion is applied to the bearing outer ring with a grinding machine. (4) Bearing inner ring fault: a certain amount of abrasion is applied to the bearing inner ring with a grinding machine.
Refer to Figure 8 for the simulation of the above fault.
To verify the performance of the proposed CNN fault diagnosis model based on feature fusion, the data collection device is used to collect the vibration and current signals of the motor in five different health states at no load, 25% load, 50% load and 75% load, with the sampling frequency of the data collection card set to 48 kHz. Refer to Figure 9 for the vibration signal waveforms and current envelope signals collected from the motor in the different states at no load. To verify the robustness of the model, the data collected for each motor state under variable load are not distinguished by load, so there are five sets of data. The dataset is built from the vibration and current signals of these five groups. Some 1200 samples are extracted from each group (300 samples per load condition). Each sample contains 1000 vibration signal points and 1000 current envelope signal points.
Then the five sets of vibration and current signals are labeled from Category 0 to Category 4, and all samples are used to form the dataset (as shown in Table 3).
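The dataset layout described above can be sketched as follows. The sketch only mirrors the stated counts (five labeled states, four load conditions, 300 samples per load, 1000 points per signal); the function names are hypothetical, and cutting each recording into consecutive non-overlapping slices is an assumption, since the text does not state how samples are extracted.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[List[float], List[float], int]  # (vibration, current envelope, label)
Recording = Tuple[Sequence[float], Sequence[float]]


def cut_samples(vibration: Sequence[float], envelope: Sequence[float], label: int,
                n_samples: int = 300, length: int = 1000) -> List[Sample]:
    """Cut one recording into n_samples aligned slices of `length` points each."""
    if min(len(vibration), len(envelope)) < n_samples * length:
        raise ValueError("recording too short for the requested samples")
    return [
        (list(vibration[i * length:(i + 1) * length]),
         list(envelope[i * length:(i + 1) * length]),
         label)
        for i in range(n_samples)
    ]


def build_dataset(recordings: Dict[Tuple[int, str], Recording],
                  n_samples: int = 300, length: int = 1000) -> List[Sample]:
    """Merge all loads of a state under one label (Category 0 to Category 4).

    `recordings` maps (state label, load name) to a pair of signals; the load
    is deliberately not stored with the sample, so each state forms one group.
    """
    dataset: List[Sample] = []
    for (label, _load), (vib, env) in recordings.items():
        dataset.extend(cut_samples(vib, env, label, n_samples, length))
    return dataset


# Sample counts implied by the text: 5 states x 4 loads x 300 samples.
per_state = 4 * 300
total = 5 * per_state
```

With the default arguments each state contributes 1200 samples and the full dataset holds 6000.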

Data Verification and Result Analysis
In the experimental verification stage, the multi-time window sampling of the proposed CNN is configured in advance with three different time windows; refer to Table 4 for the specific settings. The Adam optimizer is chosen to minimize the total loss, with the initial learning rate set to 0.001; Adam adapts the learning rate during training. All processes are run under the deep learning framework TensorFlow.
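How Adam adapts the step size can be illustrated with a scalar version of its update rule. This is a plain-Python sketch, not the TensorFlow optimizer used in the experiments: the class name is hypothetical, the learning rate 0.001 comes from the text, and the remaining hyper-parameters (β1 = 0.9, β2 = 0.999, ε = 1e-8) are the commonly used defaults, assumed here because the text does not report them.

```python
import math


class ScalarAdam:
    """Adam update for a single scalar parameter."""

    def __init__(self, lr: float = 0.001, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # running mean of gradients (first moment)
        self.v = 0.0  # running mean of squared gradients (second moment)
        self.t = 0    # step counter used for bias correction

    def step(self, theta: float, grad: float) -> float:
        self.t += 1
        self.m = self.beta1 * self.m + (1.0 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1.0 - self.beta2) * grad * grad
        m_hat = self.m / (1.0 - self.beta1 ** self.t)  # bias-corrected moments
        v_hat = self.v / (1.0 - self.beta2 ** self.t)
        # The effective step is lr scaled by the ratio of the two moments.
        return theta - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


# Minimize the toy loss f(theta) = theta^2, whose gradient is 2 * theta.
opt = ScalarAdam(lr=0.001)
theta = 1.0
first = opt.step(theta, 2.0 * theta)
theta = first
for _ in range(99):
    theta = opt.step(theta, 2.0 * theta)
```

On the very first step the bias-corrected ratio equals the sign of the gradient, so the parameter moves by almost exactly the learning rate regardless of the gradient's magnitude.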

Single-Signal Experiment
To verify the superiority of the proposed motor fault diagnosis method based on CNN feature fusion, the vibration signal and the current envelope signal are first analyzed separately. Each sample in the dataset then contains only 1000 vibration signal points or 1000 current envelope signal points, and the multi-time window CNN with the attention-based multi-scale network is trained and tested. The average accuracy and recall rate are used as the performance indicators of the fault diagnosis model. The experimental results show that, during training, the classification accuracy of the model with the vibration signal as input is better than that of the model with the current signal as input. This means that it is more difficult for the fault diagnosis model to learn fault features from current signals than from vibration signals. The model performs worse when the current dataset is the input, indicating insufficient generalization ability. Although the proposed model has high classification accuracy and converges quickly when the vibration signal is the input, there is a gap between its accuracy on the training set and on the test set, indicating slight overfitting with the vibration signal as input.
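The two performance indicators can be computed from a confusion matrix as sketched below. The function names are hypothetical, rows are assumed to hold the true class and columns the predicted class, and the small matrix is made-up illustration data, not a result from the paper.

```python
from typing import List

Matrix = List[List[int]]


def accuracy(cm: Matrix) -> float:
    """Share of all samples that lie on the diagonal (correct predictions)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_recall(cm: Matrix) -> List[float]:
    """Recall of each class: correct predictions divided by true samples of that class."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]


def average_recall(cm: Matrix) -> float:
    """Unweighted mean of the per-class recalls."""
    recalls = per_class_recall(cm)
    return sum(recalls) / len(recalls)


# Illustrative two-class confusion matrix (rows: true class, columns: predicted).
example = [[9, 1],
           [2, 8]]
example_accuracy = accuracy(example)
example_recall = average_recall(example)
```

When every class has the same number of test samples, as in a balanced dataset, the average recall and the overall accuracy coincide.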

Multi-Signal Experiment
To keep the comparison fair, in the multi-feature fusion experiment with both the vibration signal and the current signal as input, the parameters were set the same as for the single-signal input. The same performance indicators were used, and the training and testing times were recorded. The results of the fault diagnosis experiments with multi-feature fusion are shown in Table 6, the accuracy and loss during training and testing are shown in Figure 12, and the confusion matrix is shown in Figure 13. The experimental results show that the fault identification accuracy of the multi-feature fusion model with multi-signal input is better than that with single-signal input. The multi-feature fusion model performs more stably and shows neither overfitting nor insufficient generalization ability. This is because the proposed multi-feature fusion model effectively fuses the characteristics of the vibration signal and the current signal, and the features of the fault signal are well preserved during the forward propagation of the model. Testing all 1500 samples in the test set took only 2.1 s, demonstrating the high efficiency of the model in fault identification.

Comparative Analysis of Experimental Results
The average recall of the proposed multi-feature fusion model was 99.6%, 2.6% higher than that of the model using only vibration signals. The current signal model had the worst average recall, only 58.2%. This suggests that the fault features contained in the vibration signal are more sensitive than those in the current signal. The proposed method can effectively combine the fault information in the vibration and current signals and improve fault diagnosis.
The confusion matrices above show in detail the classification results of the single-signal models and the proposed multi-signal feature fusion model, and indirectly reveal the sensitivity of the different signals to the different faults. They show that the accuracy of the proposed multi-signal feature fusion model is superior to that of the comparable single-signal models, which confirms the effectiveness of the proposed multi-signal feature fusion strategy. Specifically, the proposed model achieves a better average accuracy than the single-signal models in identifying the five motor states. Among the single-signal models, fault identification is poor in all respects when the input is the current envelope signal and much better when the input is the vibration signal.
In the model with only the vibration signal as input, the identification accuracy for every motor state exceeds 95%, and the motor bearing faults are identified best. In the model with only the current envelope signal as input, the identification precision is best for the healthy motor, the inter-turn short circuit and the rotor broken bar, showing that the motor current carries considerable fault information in these three states and is more sensitive to them than to the bearing fault states. This result is consistent with the current state of motor fault diagnosis research: most studies use the vibration signal, MCSA is mostly applied to inter-turn short circuit and rotor broken bar faults, and far fewer studies use it than use the vibration signal. This shows that MCSA remains quite challenging in motor fault diagnosis and leaves room for further exploration.
t-SNE is regarded as one of the most effective methods for dimensionality reduction and visualization of data. When a high-dimensional dataset has been classified but it is unclear whether it is well separable, t-SNE can project the data into a 2D or 3D space for observation. To show the effectiveness of the proposed feature fusion method more intuitively, the samples are visualized with t-SNE. Refer to Figure 14.

Comparing the accuracy of the experiments above shows that feature fusion of different signals by an ordinary CNN network is not effective. The comparison between experiment 1 and experiment 2, however, shows that adding the GRU network improves the diagnosis accuracy for motor states 0, 1, 2 and 4: the time series encoding of the GRU improves the accuracy of the fault diagnosis model, and capturing the dependence between the time series of different features makes feature extraction more comprehensive. The comparison between experiment 3 and experiment 4 shows that the fault identification accuracy with the proposed MFBAM module is much better in all states than that of the variant without MFBAM, which shows that the MFBAM module captures more comprehensive fault information in the multi-signal feature fusion strategy and greatly reduces the loss of fault information. The comparison between the proposed model and experiment 3 also shows that adding the multi-time window sampling strategy improves the diagnostic performance of the model to some extent.
The experimental models above were obtained by trimming the proposed model to different degrees. These experiments demonstrate the contribution of each part of the model to the multi-signal feature fusion strategy.

Compared with Other Methods
Finally, the proposed multi-signal feature fusion model was compared with some fault diagnosis methods commonly used in the field of motor fault diagnosis. The methods used in the comparative experiments are as follows:
(1) Vibration signal fault features were manually extracted and classified using SVM.
(2) Fault signals were manually extracted, and fusion classification was performed using LSTM.
(3) RNN with an intermediate fusion strategy was adopted for fault signals.
(4) The Fuzzy Decision Tree using vibration signals as input.
(5) The AlexNet using vibration signals as input.
(6) The Pso-LeNet-5 using vibration signals as input.
(7) The DBN network fused with double-layer SAE was used to fuse fault signals.
The results of the above experiments are shown in Table 8. They suggest that the deep learning methods outperform the traditional feature extraction and classification methods, and that the multi-signal feature fusion model proposed here achieves the best fault classification accuracy. These results can be attributed to the fact that in the traditional fault diagnosis methods, the most important step is to effectively extract the fault features in the original signal.

Table 8 entries (classification accuracy, %): [22]: 97.67; AlexNet: 96.43; Pso-LeNet-5 [21]: 98.75; SAE-DBN [28]: 97.23; CNN-Based Feature Fusion: 99.85.

Conclusions
This paper proposes a multi-feature extraction module based on an attention mechanism. Comparative experiments show that the module captures fault features more comprehensively and more efficiently. A motor fault diagnosis method based on multi-feature fusion is also put forward. In this method, three different time windows are used to sample the vibration signal and the current envelope signal at the same time, and the signals sampled in the same time window are input to the attention-based multi-feature extraction module for feature extraction. The GRU then re-encodes the fault features of the two signals along the time series, and the GRU-encoded feature vectors of the different time windows pass through the fully-connected layer into the Softmax layer for fault identification and classification. The experiments show that the method achieves a high diagnosis accuracy, and the comparative tests demonstrate its effectiveness and superiority.

To overcome the above limitations and improve the classification accuracy and model robustness in fault diagnosis, a new feature fusion fault diagnosis model based on the convolutional neural network (CNN) is proposed. The vibration and current signals of the motor are input into the improved CNN. The signals are sampled with different time windows and then fed into a multi-scale feature extraction module combined with an attention mechanism, and the features of the vibration and current signals extracted synchronously in the same time window are encoded and fused along the time series by a gated recurrent unit (GRU) network. Refer to Figure 1 for the overall process. The method uses the vibration signal and the current signal at the same time. Compared with other methods, the proposed model can accurately learn the corresponding fault features from the different signals while exploring the relation between the signals in the time series. The main contributions of this paper can be summarized as follows:
1. Multi-scale feature extraction is performed after sampling with different time windows, which reduces the loss of fault information. The GRU then re-encodes the extracted features of the different signals along the time series, addressing information sharing between different signals and the possible overfitting of the model.
2. The vibration and current signals of the asynchronous motor under five different working conditions are collected experimentally to verify the model accuracy. The proposed model is compared with single-signal models and with some other motor fault diagnosis methods.
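The synchronous multi-time window sampling described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `synchronous_windows`, the non-overlapping stride and the window lengths 4, 8 and 16 are all assumptions chosen for illustration, since the actual window parameters are given in Table 4 and are not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple

# One aligned pair: (vibration segment, current segment) from the same time span.
Window = Tuple[List[float], List[float]]


def synchronous_windows(
    vibration: Sequence[float],
    current: Sequence[float],
    window_lengths: Sequence[int],
) -> Dict[int, List[Window]]:
    """Cut both signals into aligned segments for every window length."""
    if len(vibration) != len(current):
        raise ValueError("signals must be sampled synchronously (equal length)")
    segments: Dict[int, List[Window]] = {}
    for length in window_lengths:
        pairs: List[Window] = []
        # Assumption: segments do not overlap, so the stride equals the length.
        for start in range(0, len(vibration) - length + 1, length):
            stop = start + length
            pairs.append((list(vibration[start:stop]), list(current[start:stop])))
        segments[length] = pairs
    return segments


# Toy signals; the three window lengths are hypothetical placeholders.
vibration = [float(i) for i in range(32)]
current = [float(-i) for i in range(32)]
segments = synchronous_windows(vibration, current, window_lengths=(4, 8, 16))
for length, pairs in segments.items():
    print(length, len(pairs))
```

Each entry of `segments` holds vibration and current segments that cover exactly the same time span, which is the property the fusion stage relies on.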

The feature $h_t^p$ ($p$ indicates the different time windows) is input to the final fully-connected layer. The final global feature output obtained is:

$$\hat{y}_t = \mathrm{FCN}\left(h_t^p\right) \tag{25}$$

3.4. Classification and Training of Faults

The Softmax layer is added after the above fully-connected layer for fault identification and classification. The Softmax layer has a size of five. The network model proposed in this paper is trained by error backpropagation combined with gradient descent. The loss function used during training is the mean square error (MSE). For $n$ training samples, MSE can be expressed as:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{26}$$

where $y_i$ indicates the true fault type and $\hat{y}_i$ indicates the predicted fault type. To sum up, the overall framework of the proposed fault diagnosis model is shown in Figure 5. Furthermore, coordinated adjustment of the data may be needed during the classification training of the model. To prevent overfitting, Dropout is added to the model, and the Dropout parameter is adjusted during training from an initial value of 0.5.
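As a small numerical illustration of the Softmax output and the MSE loss of Equation (26), the sketch below uses plain Python. It assumes one-hot labels over five motor states and sums the squared error over the classes of each sample before averaging over the $n$ samples; the function names and the example scores are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the class scores of one sample."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mse(y_true: Sequence[Sequence[float]], y_pred: Sequence[Sequence[float]]) -> float:
    """Average over n samples of the squared error summed over the classes."""
    n = len(y_true)
    return sum(
        sum((t - p) ** 2 for t, p in zip(truth, pred))
        for truth, pred in zip(y_true, y_pred)
    ) / n


# Hypothetical scores of the fully-connected layer for one sample (five states).
probs = softmax([2.0, 1.0, 0.1, 0.0, -1.0])
label = [1.0, 0.0, 0.0, 0.0, 0.0]  # one-hot truth: state 0
print(sum(probs), mse([label], [probs]))
```

A uniform prediction over five states against a one-hot label gives an MSE of 0.8 under this convention, which is a convenient sanity check when reading training curves.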

Figure 5. Overall framework of fault diagnosis model.

4. Collection and Verification of Experiment Data

4.1. Experiment Platform Building and Data Collection

The fault diagnosis model proposed in this paper uses the vibration and current signals at the same time. To obtain these signals under different faults, an experiment platform was constructed to simulate the faults, and the fault data were collected from the simulation experiments. The experiment platform consists of several three-phase asynchronous motors, one DC servo motor, a data collecting box, a protective circuit, a vibration sensor and a current transformer. Refer to Figure 6 for details.

Figure 6. Experiment platform.

The motor used in the experiment is a 3 kW motor. Faulty motors of the same model are swapped in to collect the different fault signals. Refer to Table 2 for the basic parameters of the motor. A 3 kW DC servo motor is connected to the drive end of the three-phase asynchronous motor through a coupling, and corrugated resistors of different resistance values are connected across the DC servo motor to realize variable-load operation of the motor.

$T$ represents the number of samples that the model classifies correctly; $F$ represents the number of samples that the model classifies incorrectly; $T_i$ and $N_i$ represent the number of correctly classified samples and the total number of samples of the model with different signals as inputs. The Adam optimizer was selected to reduce the total loss, with the initial learning rate set to 0.001; Adam adapts the learning rate during training. Training ended when the classification accuracy stopped improving, and the training and testing times were recorded. To verify the robustness of the model, the whole experiment was repeated 30 times, and all procedures were performed under the TensorFlow deep learning framework. The fault diagnosis results for single vibration signals and single current signals are shown in Table 5; the accuracy and loss of the two models during training and testing are shown in Figure 10; the confusion matrix is shown in Figure 11.
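The defining formulas for these quantities are not reproduced in this section, so the sketch below assumes the usual definition, accuracy $= T/(T+F)$, and summarizes repeated runs by their mean and standard deviation. The function name, the labels and the three run accuracies are made up for illustration and are not results from the paper.

```python
from statistics import mean, pstdev
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Overall accuracy T / (T + F) from true and predicted state labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)  # T
    wrong = len(y_true) - correct                               # F
    return correct / (correct + wrong)


# Hypothetical labels for six test samples over motor states 0 to 4.
y_true = [0, 1, 2, 3, 4, 4]
y_pred = [0, 1, 2, 3, 4, 3]
acc = accuracy(y_true, y_pred)

# Made-up accuracies of repeated runs (the paper repeats the experiment 30 times).
runs = [0.98, 0.99, 1.00]
print(acc, mean(runs), pstdev(runs))
```

Reporting the spread across runs alongside the mean is one simple way to express the robustness that the repeated experiments are meant to verify.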

Figure 10. Model accuracy and loss for single-signal input.

Figure 12. Model accuracy and loss for multi-signal feature fusion input.
In the 2D t-SNE visualization of Figure 14, the five states of the motor are accurately clustered by the feature fusion model, and the motor features under different states are clearly separated. By contrast, the features extracted by the single-signal models are not clustered as well: the single vibration signal model shows a few clustering errors among the test samples, and the single current signal model shows many. This shows that the multi-signal feature fusion model proposed in this paper greatly reduces the loss of fault information and extracts features very effectively.

Figure 14. Diagram for t-SNE visualization of each model.

4.3. Comparison of Trimmed Multi-Feature Fusion Models

To further verify the superiority of the feature fusion strategy of the proposed fault diagnosis model, four groups of experiments are added for contrastive analysis. The first group uses a two-channel CNN model, in which the different signal features extracted by the two channels are fused directly by the fully-connected layer for fault diagnosis. The second group uses a two-channel CNN + GRU model. The inputs of the two channels are the current signal and the vibration signal of the motor. After two-channel feature extraction, the fault features of the two signals are encoded and fused via GRU, which accomplishes the feature fusion of the motor vibration and current signals for fault diagnosis. The third group uses single time window sampling + the MFBAM module proposed in this paper + GRU. The fourth group uses multi-time window sampling + single-scale feature extraction + GRU. The dataset used in these experiments is the same as that used for the model in this paper.
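The trimmed variants can be summarized as a small configuration table. The sketch below is only a bookkeeping aid, not the authors' code: the `Variant` class, its field names and the helper `removed_components` are hypothetical, and the sampling scheme of the first two groups is left as `None` because the text does not state it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Variant:
    name: str
    multi_window: Optional[bool]  # None: sampling scheme not stated in the text
    mfbam: bool                   # multi-scale attention feature extraction
    gru: bool                     # GRU time series encoding and fusion


VARIANTS = [
    Variant("experiment 1: two-channel CNN + FC fusion", None, False, False),
    Variant("experiment 2: two-channel CNN + GRU", None, False, True),
    Variant("experiment 3: single window + MFBAM + GRU", False, True, True),
    Variant("experiment 4: multi window + single scale + GRU", True, False, True),
    Variant("proposed model", True, True, True),
]


def removed_components(variant: Variant) -> List[str]:
    """List the parts of the proposed model that a variant is known to lack."""
    removed = []
    if variant.multi_window is False:
        removed.append("multi-time window sampling")
    if not variant.mfbam:
        removed.append("MFBAM")
    if not variant.gru:
        removed.append("GRU")
    return removed


for variant in VARIANTS:
    print(f"{variant.name}: removed {removed_components(variant) or 'nothing'}")
```

Laying the variants out this way makes it easy to see which single component each pairwise comparison isolates.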
However, manual feature extraction often requires repeated attempts at signal processing and feature selection. If the signal features are not selected properly, accurate classification may not be achieved. In contrast, a fault diagnosis model based on deep learning can adapt its parameters to the training set and then select the effective fault features to improve the classification accuracy of fault diagnosis. By combining the multi-scale attention mechanism over different time windows with GRU encoding, the proposed multi-feature fusion fault diagnosis model can effectively capture the connection between different signals in the time series, overcome the limitations of traditional deep learning motor fault diagnosis methods that use a single feature signal, and expand the motor fault feature space. Different signals also contribute different fault information for different faults. Therefore, the fusion analysis of current and vibration signals is a promising direction for the fault diagnosis of motors under different working conditions.

Table 1. Feature extraction module parameters.

Table 3. Motor fault data set.

Table 4. Setting of CNN time window parameters.

Table 5. Experimental results of single-signal model.

Table 6. Experimental results of multi-signal feature fusion input.

Figure 13. Confusion matrix for multi-signal feature fusion model.

Table 7 gives the accuracies on the testing dataset obtained in the above experiments.

Table 7. Results of trimmed model experiments.

Table 8. Comprehensive comparison of classification accuracy.