Fault Diagnosis of Main Pump in Converter Station Based on Deep Neural Network

Abstract: As the core component of the valve cooling system in a converter station, the main pump plays a major role in ensuring the stable operation of the converter valve. Accurate and efficient fault diagnosis of the main pump from vibration signals therefore helps detect failing equipment and reduce maintenance costs. This paper proposes a new neural network, consisting of convolutional neural networks (CNNs) and long short-term memory (LSTM), that classifies four fault states and one normal state of the main pump from its vibration signals. Multi-scale features are extracted by two CNNs with different kernel sizes, and temporal features are extracted by the LSTM. Moreover, random sampling is used during data processing to balance the imbalanced data, which is meaningful for data symmetry. Experimental results indicate that the network achieved an accuracy of 0.987 on the test set, and the average F1-score, recall, and precision were 0.987, 0.987, and 0.988, respectively. The proposed network performed well in multi-class fault diagnosis of the main pump and was superior to the other methods compared.


Introduction
The main pump, the core of the valve cooling system, drives the cooling medium so that the converter valve operates at a normal temperature through heat exchange. Its condition therefore affects the safety and stability of high-voltage direct current (HVDC) transmission and can even threaten large-scale renewable power generation and load electrification [1,2]. Fault diagnosis of the main pump is thus of great practical significance [3,4]. However, there are few studies on fault diagnosis of the main pump, and most existing methods are time-consuming and laborious. It is therefore urgent to develop an algorithm that can diagnose the state of the main pump promptly and with high accuracy.
Generally, the main pump is a horizontal centrifugal pump, thus the main pump in this paper refers to the horizontal centrifugal pump [5]. In practical applications, the most common states of the main pump are four faults and one normal state: unbalance, looseness, parallel misalignment (PM), angular misalignment (AM), and normal.
At present, two main approaches have been applied to the fault diagnosis of the main pump: machine learning and deep learning. The former extracts signal features manually and carries out fault diagnosis with machine learning methods such as the support vector machine (SVM) and the k-nearest neighbor algorithm (KNN). Kumar et al. [6] extracted features from the original signal and the scale edge integral graph, optimized the SVM parameters with a genetic algorithm (GA), trained the SVM with the optimal parameters, and classified the condition of the centrifugal pump with an accuracy of up to 96.66%. Ebrahimi et al. [7] decomposed the vibration signal into three levels with the Daubechies wavelet and extracted 44 descriptive statistical features from the detail and approximation coefficients of the wavelet, obtaining an SVM classifier with an accuracy of 96.67%. Hui et al. [8] proposed a time-frequency signal analysis method based on cyclostationarity theory: the cyclic autocorrelation functions (CAFs) of various signals were first calculated, and the frequency-domain features of the CAFs were then obtained by fast Fourier transform (FFT) so as to carry out fault classification. Janani et al. [9] used a wrapper model to select appropriate features from the power spectra of the vibration signal and line current signal of an induction motor. The features were input into a multiclass support vector machine (MSVM), and the optimal MSVM classifier was obtained by using fivefold cross-validation to select the optimal Gaussian radial basis function (RBF) and MSVM parameters. Maamar et al. [10] combined a multilayer perceptron with backpropagation (MLP-BP) and a GA: features were extracted using the continuous wavelet transform with three different wavelet functions, and the GA then optimized the number of hidden layers and neurons of the MLP-BP. Janani et al. [11] proposed two MSVM-based methods that use the best energy (BE) criterion and principal component analysis (PCA). The current and vibration signals of the motor were preprocessed by wavelet packet transform (WPT), the appropriate features were then selected according to BE and PCA, and the classification was finally completed by MSVM. Zahoor et al. [12] proposed a three-level fault diagnosis strategy. First, the fault characteristic modes of the vibration signals were identified and selected, and mixed features were extracted in the time domain, frequency domain, and time-frequency domain. Then, the highly correlated features among the mixed features were reduced in dimension and a new feature pool was formed using Pearson linear discriminant analysis (PLDA). Finally, fault classification was carried out by KNN. Zahoor et al. [13] used the cross-correlation between a healthy baseline signal and the other kinds of signals to obtain new features from the correlation sequence. They then extracted mixed features in the time domain, frequency domain, and time-frequency domain from these features, formed feature vectors by calculating correlation coefficients between different signals, and input the feature vectors into an MSVM to implement fault diagnosis. Research on fault diagnosis of the main pump mostly adopts the above methods, but they are time-consuming and prone to mistakes caused by human misinterpretation.
The latter preprocesses the signal and extracts features automatically through deep learning methods to implement fault diagnosis. Deep learning methods, especially the convolutional neural network (CNN) and the recurrent neural network (RNN), have been used in fault diagnosis of mechanical equipment because of their superior ability to extract features automatically. Wang et al. [14] transformed the raw signal into a spectrum signal through the discrete Fourier transform (DFT) and then stacked the spectrum signals as a sample to input into a CNN. Guo et al. [15] proposed a hierarchical CNN structure with an adaptive learning rate. The first layer was used to recognize the fault type of the bearing, and the second layer was used to evaluate the fault size. Because the learning rate had a great impact on the network, they also proposed a method for obtaining an adaptive learning rate to improve the training of the network. Zhang et al. [16] studied rolling bearing faults in a noisy environment and under constantly changing workload and proposed a new CNN training method, which greatly improved the robustness of the network and maintained high accuracy and stability under these conditions. Kumar et al. [17] proposed an improved CNN. A gray image of the sound signal was obtained by processing the signal with the analytic wavelet transform (AWT), and a new entropy-based divergence function was used as the loss function of the CNN to address overfitting. Because of the outstanding ability of RNNs to extract temporal features, RNN-based fault diagnosis models have also been widely developed. Talebi et al. [18] proposed RNN-based dynamic modeling of wind turbines to address faults in wind energy conversion systems, which are inevitable. Residuals were obtained by comparing the output of the built model with the actual system output in order to improve the performance of the built model. Experiments showed that the scheme could obtain fault diagnosis results quickly and that the diagnosis was effective. Przystalka et al. [19] proposed a robust fault detection method based on RNN and chaotic engineering, using a local RNN to learn the chaotic behavior of the chaotic engineering system. Mrugalski et al. [20] studied dynamic nonlinear systems, with particular attention to the robustness of fault diagnosis, and produced a set of fault diagnosis models to improve the robustness of the RNN. The model was used to simulate the disturbance attenuation process of a dynamic nonlinear system, and the results showed that it could improve the robustness of fault estimation. Although deep learning has achieved efficient results in mechanical fault diagnosis, there is little research on applying deep learning methods to main pump fault diagnosis. Moreover, the automatic feature extraction of CNN removes the shortcomings of manual feature extraction in traditional fault diagnosis methods, but CNN cannot extract the temporal features of the signal. RNN, on the other hand, can effectively extract the temporal features of the signal, but its feature extraction ability in other respects is not as good as that of CNN.
To solve the above problems, this paper proposes a fault diagnosis method for the main pump of the valve cooling system in a converter station based on a hybrid Multi-scale Convolutional Neural Network and Long Short-Term Memory (MCNN-LSTM) model, and evaluates the performance of the model with several indexes. The method extracts both temporal and spatial features and retains as many features as possible, which makes it more accurate than the other methods compared. The experimental results show that the method can diagnose the main pump quickly and accurately and has good generalizability. In this paper, Section 2 discusses related work such as 1DCNN and LSTM. Section 3 introduces the construction and function of the network in detail. Section 4 describes the composition and preprocessing of the data. Section 5 describes the experiments and analyzes the results, and Section 6 draws conclusions.

One-Dimensional Convolutional Neural Network (1DCNN)
CNN is an efficient algorithm for image recognition. It is a deep feedforward neural network built on the convolution operation. CNN is highly invariant to translation, scaling, tilting, and other forms of deformation, thus it is mainly used to identify two-dimensional (2D) images with distortion invariance. CNN is commonly divided into 1DCNN and 2DCNN. 2DCNN is usually used for image feature extraction: its 2D filter convolves the data on a two-dimensional plane to extract features, but this loses temporal features when processing time series data. The 1D filter of 1DCNN convolves along a single dimension, which preserves the temporal features of the data. The vibration signal studied in this paper is 1D data with obvious temporal characteristics, thus 1DCNN is used to extract data features. The convolution operation of the 1DCNN convolution layer is as follows:
$$y_i^{l} = K_i^{l} * x_i^{l} + b_i^{l}$$
where $K_i^{l}$ and $b_i^{l}$ are the weights and bias of the i-th filter of the l-th layer, respectively, and $x_i^{l}$ is the i-th local input of the l-th layer. After the 1D convolution layer, a max-pooling layer is applied to reduce the dimension and compress the features, so as to decrease the computational load of the model and extract the major features. The pooling operation of the max-pooling layer is as follows:
$$p_i^{l}(j) = \max_{(j-1)W+1 \le t \le jW} q_i^{l}(t)$$
where $q_i^{l}(t)$ is the t-th neuron in the l-th layer of the i-th channel, and $W$ is the width of the pooling kernel. In this study, we also used a global average pooling (GAP) layer instead of the fully connected layer to regularize the structure of the whole network, prevent overfitting, and directly give each channel an actual category meaning. The GAP layer performed better than the fully connected layer.
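The three operations described above (1D convolution, max pooling, and global average pooling) can be illustrated with a minimal pure-Python sketch. The toy signal, the kernel values, and the function names are illustrative assumptions and are not taken from the paper; the convolution is written in the cross-correlation form commonly used in CNN layers, and the pooling is assumed to be non-overlapping.

```python
def conv1d(signal, kernel, bias=0.0):
    """Valid 1D convolution (cross-correlation form) of one filter over one channel."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[t + j] for j in range(k)) + bias
        for t in range(len(signal) - k + 1)
    ]


def max_pool1d(features, width):
    """Non-overlapping max pooling with pooling kernel width `width`."""
    return [
        max(features[s:s + width])
        for s in range(0, len(features) - width + 1, width)
    ]


def global_average_pool(features):
    """Collapse one channel to a single value by averaging over time."""
    return sum(features) / len(features)


# Hypothetical toy input: a short oscillating signal and a difference-like kernel.
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
kernel = [1.0, 0.0, -1.0]

feature_map = conv1d(signal, kernel)        # length 8 - 3 + 1 = 6
pooled = max_pool1d(feature_map, width=2)   # length 6 / 2 = 3
gap_value = global_average_pool(pooled)     # one value per channel
```

In a real network each convolution layer holds many such filters, and the GAP output has one value per filter channel.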

LSTM
RNN is a kind of network model with memory and parameter sharing, which can effectively process and predict sequence data. However, RNN suffers from vanishing and exploding gradients. LSTM was proposed to deal with this problem [21]. LSTM is an improved network based on the traditional RNN that adds a forget gate, an input gate, and an output gate. Based on the hidden state of the previous time step, the forget gate weights each piece of input information through the sigmoid activation function so as to determine which information is retained and which is discarded. The input gate uses the sigmoid and tanh activation functions to update the information, which determines how much information is updated. The output gate determines what information is output. The three gates jointly control the update of the signal state along the time axis to obtain the information of each time step. The update formulas are as follows:
$$f_t = \sigma(W_f x_t + V_f h_{t-1} + b_f)$$
$$i_t = \sigma(W_i x_t + V_i h_{t-1} + b_i)$$
$$o_t = \sigma(W_o x_t + V_o h_{t-1} + b_o)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + V_c h_{t-1} + b_c)$$
$$h_t = o_t \odot \tanh(c_t)$$
where $b$ is the bias, and $W$ and $V$ are the input state weight and hidden state weight, respectively. At step $t$, the forget gate $f_t$, input gate $i_t$, output gate $o_t$, and cell state $c_t$ are updated from the input $x_t$ and the previous hidden state $h_{t-1}$.
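As a rough illustration of how the three gates interact, the sketch below implements one time step of a single-unit LSTM cell with scalar input, using the standard LSTM update and the W, V, b naming from the text. The parameter values and the toy input sequence are hypothetical; a practical layer uses weight matrices and many units.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell (scalar input, scalar state)."""

    def pre_activation(gate):
        # W: input weight, V: hidden-state weight, b: bias for the given gate.
        return (params["W_" + gate] * x_t
                + params["V_" + gate] * h_prev
                + params["b_" + gate])

    f_t = sigmoid(pre_activation("f"))        # forget gate
    i_t = sigmoid(pre_activation("i"))        # input gate
    o_t = sigmoid(pre_activation("o"))        # output gate
    c_tilde = math.tanh(pre_activation("c"))  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde        # new cell state
    h_t = o_t * math.tanh(c_t)                # new hidden state
    return h_t, c_t


# Hypothetical parameters: all weights 0.5, all biases 0.
params = {f"{kind}_{gate}": 0.5 for kind in ("W", "V") for gate in "fioc"}
params.update({f"b_{gate}": 0.0 for gate in "fioc"})

h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:  # toy input sequence
    h, c = lstm_step(x, h, c, params)
```

Because the hidden state is an output gate value in (0, 1) multiplied by a tanh, it always stays strictly between -1 and 1.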

The Framework of Proposed Model
The framework of the proposed MCNN-LSTM network and its data flow are shown in Figure 1. To avoid destroying the time sequence of the data while extracting multi-scale features, two 1DCNNs with different kernel sizes and numbers of kernels were used for feature extraction. The wide-kernel CNN automatically extracts low-frequency features, and the narrow-kernel CNN automatically extracts high-frequency features. The features were fused after GAP, and the fused features were passed through a Batch Normalization (BN) layer [22] to adjust the data distribution and thereby speed up network training and convergence. The temporal features were then extracted by a two-layer LSTM. Finally, a softmax classifier was used for output classification. The network structure was inspired by the traditional CNN-LSTM and the GoogLeNet model, with some improvements. The network parameters are shown in Table 1.

Model Setup
In this paper, the Adam algorithm [23] was used to update the network parameters. Adam adjusts the learning rate adaptively so that training converges faster; the learning rate was set to 0.006. Mean square error (MSE) was selected as the loss function.
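To make the optimizer setting concrete, the following sketch applies the standard Adam update rule with the learning rate of 0.006 to a toy one-parameter MSE problem. The data, the values of beta1, beta2, and eps (the commonly used defaults), and the function names are assumptions for illustration; the paper only states the optimizer, the learning rate, and the loss.

```python
import math


def adam_minimize(grad_fn, w, lr=0.006, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=2000):
    """Minimize a scalar function of one parameter with the Adam update rule."""
    m = 0.0  # first-moment (mean) estimate of the gradient
    v = 0.0  # second-moment (uncentered variance) estimate of the gradient
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias-corrected moments
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


# Hypothetical toy data generated by y = 2x; the model is y_hat = w * x.
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]


def mse_grad(w):
    """Gradient of the MSE loss mean((w*x - y)^2) with respect to w."""
    return sum(2.0 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


w_first = adam_minimize(mse_grad, w=0.0, steps=1)     # one update from w = 0
w_final = adam_minimize(mse_grad, w=0.0, steps=2000)  # approaches w = 2
```

The first update moves the parameter by almost exactly the learning rate regardless of the gradient magnitude, which is the sense in which Adam normalizes the step size.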

Data
To make full use of the advantages of the deep neural network, a large quantity of data was needed to train the network. We collected vibration signals from the main circulating pump while it worked in different fault states, giving a total of 1,975,914 data points, and the amplitudes of the data differed considerably. To facilitate neural network training, it was necessary to preprocess the data set first.

Data Description
The research object of this paper was the main pump of the water cooling system of the converter valve. The main pump was a NKG200-150-400/410 H1F2KE-SBQQE centrifugal pump. During sampling, the main pump speed was 2978 r/min and the sampling frequency was 12 kHz. To obtain data for the four different faults, we artificially induced the four main pump failures. Data for the normal, unbalance, looseness, parallel misalignment (PM), and angular misalignment (AM) states were collected in the vertical, horizontal, and axial directions by vibration acceleration sensors. The experimental setup is shown in Figure 2. The data set containing the five kinds of state data is shown in Table 2. The data set was seriously imbalanced: there were 317,031 points in each direction for PM and AM, but only 8192 points in each direction for normal, unbalance, and looseness.
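The per-direction counts quoted above are consistent with the stated total of 1,975,914 data points, as the short check below shows. The dictionary layout and variable names are illustrative; the counts themselves come from the text.

```python
# Points recorded per direction for each pump state (values from the text).
points_per_direction = {
    "normal": 8192,
    "unbalance": 8192,
    "looseness": 8192,
    "PM": 317031,
    "AM": 317031,
}
directions = 3  # vertical, horizontal, axial

total_points = directions * sum(points_per_direction.values())

# Ratio between the largest and smallest classes, a simple imbalance measure.
imbalance_ratio = max(points_per_direction.values()) / min(points_per_direction.values())
```

The largest classes hold roughly 39 times as many points as the smallest, which is why the data set is rebalanced before training.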

Data Processing
Table 2 shows that the data were imbalanced. To solve this problem, a random sampling method [24][25][26][27] was used to augment and balance the data set for data symmetry. Based on a comprehensive analysis of the vibration signal, the sample length was set to 1024 points, so as to preserve as much information as possible in each sample while keeping all samples the same length. Different sample lengths are compared in Section 5 to show the impact of sample length on model performance. Each sample was drawn by random sampling and combined the data from the three directions, with 1024 points taken in each direction, forming a 1000 × 1024 × 3 data set. To ensure the effectiveness and robustness of the proposed model, the data set was split into 70% for training and 30% for testing, giving 700 training samples and 300 test samples.
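One way to realize the random sampling described above is to draw fixed-length windows at random start positions from each class record, with the same number of windows per class. The sketch below does this on synthetic noise; the 200 windows per class, the reuse of one synthetic channel for all three directions, the seed, and the function name are assumptions for illustration, since the paper only gives the overall 1000 × 1024 × 3 shape and the 70/30 split.

```python
import random


def sample_windows(record, n_samples, length=1024, rng=None):
    """Draw random windows of `length` consecutive points from a multi-channel record.

    `record` is a list of equally long per-direction sequences. Each returned
    window is a list of `length` tuples, one value per direction.
    """
    rng = rng or random.Random()
    n_points = len(record[0])
    windows = []
    for _ in range(n_samples):
        start = rng.randrange(n_points - length + 1)
        segment = [channel[start:start + length] for channel in record]
        windows.append(list(zip(*segment)))  # shape: length x directions
    return windows


# Record lengths per direction follow the text; the signals are synthetic noise.
lengths = {"normal": 8192, "unbalance": 8192, "looseness": 8192,
           "PM": 317031, "AM": 317031}
rng = random.Random(0)

dataset = []
for label, n_points in lengths.items():
    channel = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
    record = [channel, channel, channel]  # stand-in for three directions
    for window in sample_windows(record, n_samples=200, rng=rng):
        dataset.append((window, label))

rng.shuffle(dataset)
split = len(dataset) * 7 // 10  # 70% training, 30% testing
train, test = dataset[:split], dataset[split:]
```

Windows drawn from the short records overlap heavily, so this balances the class counts without adding new information for the minority classes.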
Since the magnitudes of the data differed, feeding the data directly into the model would increase the computational load of the network and degrade the classification accuracy and the convergence speed of the model. Standardizing the data was therefore necessary. This paper used Z-score standardization, which brings data of different amplitudes to the same scale. The formula is as follows:
$$z_{new} = \frac{z_{old} - \mu}{\delta}$$
where $z_{new}$ is the standardized data, $z_{old}$ is the original data, and $\mu$ and $\delta$ are the mean and standard deviation of the data, respectively.
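Z-score standardization can be sketched in a few lines. The example values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say whether the population or sample form was used or over which subset the statistics were computed.

```python
import statistics


def z_score(values):
    """Standardize a sequence to zero mean and unit standard deviation."""
    mu = statistics.mean(values)
    delta = statistics.pstdev(values)  # population standard deviation (assumed)
    return [(z - mu) / delta for z in values]


# Hypothetical amplitudes with mean 5 and population standard deviation 2.
raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
standardized = z_score(raw)
```

In practice the mean and standard deviation are usually estimated on the training data and reused for the test data, so that no test-set statistics leak into training.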

Results and Discussion
Accuracy, recall, precision, and F1-score are widely adopted in the literature as model evaluation criteria. Therefore, this paper selected accuracy, recall, precision, and F1-score as evaluation indexes. Accuracy is defined as
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{9}$$
where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively.
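For a multi-class problem such as this one, the four indexes are typically computed from the confusion matrix, treating each class in turn as the positive class. The sketch below uses the standard definitions of precision, recall, and F1-score; the 3 × 3 confusion matrix is a made-up example and does not reproduce the paper's results.

```python
def classification_metrics(confusion):
    """Compute accuracy and per-class (precision, recall, F1) from a confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[k][k] for k in range(n)) / total
    per_class = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(confusion[k]) - tp                       # true k, predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append((precision, recall, f1))
    return accuracy, per_class


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
confusion = [
    [5, 1, 0],
    [0, 4, 0],
    [1, 0, 4],
]
accuracy, per_class = classification_metrics(confusion)

# Macro averages: unweighted mean of each index over the classes.
macro_precision = sum(p for p, _, _ in per_class) / len(per_class)
macro_recall = sum(r for _, r, _ in per_class) / len(per_class)
macro_f1 = sum(f for _, _, f in per_class) / len(per_class)
```

Averaging the per-class values without weighting, as done here, corresponds to reporting the average value of each index over the fault types.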

Results Analysis
All experiments in this study were carried out in the Spyder IDE (Python 3.6) on a machine with a GTX 950M graphics card, an Intel Core i5 2.3 GHz processor, and 4 GB of RAM. The neural network was implemented in Keras (2.0.8) with the TensorFlow backend. Third-party libraries such as scikit-learn, SciPy, and Matplotlib were used for data preprocessing and visualization.
We added two sets of comparative experiments to study the effect of sample length and RNN variants on model performance. Table 3 shows the experimental results on the test set for different sample lengths; the evaluation indexes of each fault type were averaged and arranged in the table. Table 3 shows that the 1024-length sample has the best F1-score and precision, which are basically above 0.95, and the best overall performance. The mean values of F1-score, recall, and precision decreased markedly as the sample length was reduced below 1024, and there was also a clear downward trend when it was increased above 1024. Thus, 1024 was the most suitable sample length. The RNN variants compared were unidirectional LSTM, unidirectional gated recurrent unit (GRU) [28], bidirectional LSTM (BiLSTM) [29], and bidirectional GRU (BiGRU) [30]. The RNN variant comparison was carried out with the 1024-length sample. In Table 4, the LSTM performed well in F1-score and recall, but its advantage in precision was not obvious. The precision of LSTM was only slightly higher than that of BiLSTM, but LSTM was superior in the other evaluation indexes, and the unidirectional networks were superior to the bidirectional networks, which is contrary to common experience with RNNs in text processing. We speculate that this may be caused by the change in the length and channel number of the data after processing by the CNN.

Model Evaluation
The confusion matrix for the best classification result on the test set is shown in Figure 3, which indicates that the model learned to separate the classes well. To evaluate the model more objectively and comprehensively, we calculated the F1-score, recall, and precision on the test set, as summarized in Table 5. All indexes are high, indicating stable performance, strong generalization ability, and good fault diagnosis performance.

Algorithm Comparison
There are many algorithms for fault diagnosis from vibration signals. We chose several machine learning and deep learning algorithms for comparative experiments. Table 6 shows that the proposed model performs well in terms of F1-score, recall, precision, and accuracy, outperforming the comparison algorithms.
The specific values of each index are listed in Table 6.

Network Visualization
Neural network models have long been regarded as black boxes whose inner workings are difficult to understand. In this section, t-SNE was applied to visualize the features extracted at each stage of the network and to explore the internal feature extraction and classification process. Starting from the input data, the wide-kernel CNN first performed a preliminary classification, distinguishing PM, normal, and unbalance from AM and looseness. The narrow-kernel CNN then separated AM from looseness. Through the first-layer LSTM, the data could be preliminarily divided into three categories: PM, normal, and unbalance. On the basis of the first-layer LSTM, the second-layer LSTM clearly separated the boundaries of the five types of data. Finally, the data were divided into five categories by softmax. As shown in Figure 4, the extracted feature distribution has very clear boundaries, and the classes are well separated.

Future Work
To address the remaining problems of the model, future research will focus on the following three aspects. First, we need to improve the data preprocessing method and the network structure to achieve more accurate fault classification while reducing the number of network parameters as much as possible. Second, this study only classified four faults and one normal state; in the future, vibration signals of other fault types will be collected to classify more fault types. Finally, other data augmentation methods will be tried, and a new fault diagnosis model will be established by combining machine learning methods such as PCA with deep learning algorithms.
Symmetry 2021, 13, x FOR PEER REVIEW

Conclusions
The main innovation of this paper is a new deep neural network combining CNN and LSTM, which can classify the vibration signals of the main pump directly. The influence of sample length and RNN variant selection on the performance of the model was verified by comparison experiments. The experiments on the test set showed that the model scores highly on several indexes: the mean values of F1-score, recall, and precision were 0.987, 0.987, and 0.988, respectively, with an accuracy of 0.987. Finally, the feature extraction and classification process of the model was visualized by t-SNE.