New Fault Recognition Method for Rotary Machinery Based on Information Entropy and a Probabilistic Neural Network

Feature recognition and fault diagnosis play an important role in the safe and stable operation of rotating machinery. In order to cope with the complexity of the vibration signals of rotating machinery, a feature fusion model based on information entropy and a probabilistic neural network is proposed in this paper. The new method first uses information entropy theory to extract three kinds of entropy features from vibration signals, namely, singular spectrum entropy, power spectrum entropy, and approximate entropy. Then the feature fusion model is constructed to classify and diagnose the fault signals. The proposed approach combines comprehensive information from different aspects and is more sensitive to the fault features. Experimental results on simulated fault signals verify the better performance of the proposed approach. On real two-span rotor data, the fault detection accuracy of the new method is more than 10% higher than that of the methods using the three kinds of information entropy separately. The new approach thus proves to be an effective fault recognition method for rotating machinery.


Introduction
Rotating machinery plays a key role in industrial production. Once a failure occurs, it may lead to significant downtime losses. Condition monitoring and fault diagnosis of rotating machinery are therefore an important guarantee for improving the reliability of mechanical equipment operations [1]. Since the fault feature information of rotating machinery is complex and changeable, it is important for mechanical fault diagnosis to accurately extract the intrinsic characteristics from all kinds of fault signals [2].
Common signal feature extraction methods for rotating machinery include time domain and frequency domain methods [3], e.g., time domain analysis, spectrum analysis, correlation analysis, zoom spectral analysis, independent component analysis, wavelet transform, as well as empirical mode decomposition [4][5][6]. However, these methods are effective only for single-fault diagnosis.
As modern rotating machinery becomes more complex and intelligent, its state features show non-stationary dynamics and multi-source coupling characteristics [7]. There are many factors that can cause faults. A fast estimator of the spectral correlation (the fast spectral correlation) based on the short-time Fourier transform (STFT) for cyclostationary signals has been proposed [8]. One kind of fault can be described with different characteristic indices, and the same symptom is often the result of several different faults. The remainder of this paper is organized as follows: the information entropy features are described in Section 2, the probabilistic neural network in Section 3, and the feature fusion model is analyzed in Section 4. The simulation and application experiments for the presented feature fusion model are carried out in Section 5. Conclusions of this paper are presented in Section 6.

Descriptions of Information Entropy Features
Information entropy is a description of the degree of uncertainty of a system, so we can use it to measure the state change of rotating machinery. A lower entropy value means less uncertainty of the information. That is to say, there is less disorder in the information.
The definition of entropy within a system is as follows [31]: Assume M is a Lebesgue space with a σ-algebra generated by measurable sets and a measure µ satisfying µ(M) = 1. M can be described by a finite partition A = {A_i} whose cells are pairwise disjoint, that is:

M = ∪_{i=1}^{n} A_i, A_i ∩ A_j = Φ, ∀ i ≠ j.

The entropy with respect to partition A is defined as:

H(A) = −∑_{i=1}^{n} µ(A_i) ln µ(A_i),

where µ(A_i) (i = 1, 2, 3, · · · , n) is the measure of set A_i. According to the theory of entropy, we analyze the energy features of vibration signals in the time and frequency domains. Furthermore, we extract entropy features from the vibration signals and generate the following three kinds of features.
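As a concrete illustration of the definition above, the entropy of a finite partition can be computed directly from the measures µ(A_i). The following is a minimal sketch; the partition values are hypothetical examples, not from the paper:

```python
import math

def partition_entropy(measures):
    """Entropy H(A) = -sum(mu_i * ln(mu_i)) of a finite partition
    whose cell measures sum to 1; zero-measure cells contribute nothing."""
    assert abs(sum(measures) - 1.0) < 1e-9
    return -sum(m * math.log(m) for m in measures if m > 0)

# A uniform 4-cell partition has the maximum entropy ln(4) (most disorder);
# a concentrated partition has lower entropy (less uncertainty).
uniform = partition_entropy([0.25, 0.25, 0.25, 0.25])
skewed = partition_entropy([0.85, 0.05, 0.05, 0.05])
```

This matches the interpretation in the text: the more the measure concentrates on a few cells, the lower the entropy.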

Singular Spectrum Entropy
Singular spectrum entropy gives an indicator to measure the complexity or uncertainty of a vibration signal with multiple spatial distributions as a whole. For a discrete time series Y_t = [y_1, y_2, · · · , y_N] (N is the number of samples) of the vibration signal, we can map the original signal into an embedding space by using the delay embedding technique. Assuming that the embedding dimension is M, we obtain an (N − M + 1) × M matrix A whose ith row is:

[y_i, y_{i+1}, · · · , y_{i+M−1}], i = 1, 2, · · · , N − M + 1.

The singular values can be obtained by SVD decomposition of the matrix A. The definition of the singular value decomposition is as follows [32]: for a matrix Y of dimension m × n, there exist two orthogonal matrices U = [u_1, · · · , u_m] ∈ R^{m×m} and V = [v_1, · · · , v_n] ∈ R^{n×n} which satisfy:

Y = U Σ V^T, Σ = diag(σ_1, σ_2, · · · , σ_p), p = min(m, n), σ_1 ≥ σ_2 ≥ · · · ≥ σ_p ≥ 0.

Thus, we decompose A with the SVD and obtain all of the singular values [σ_1, σ_2, · · · , σ_m]. These singular values form the singular spectrum of the vibration signal. We assume k is the number of non-zero singular values; k stands for the number of different patterns in the column space of matrix A. The singular spectrum entropy can then be defined as:

H_s = −∑_{i=1}^{k} p_i ln p_i, p_i = σ_i / ∑_{j=1}^{k} σ_j,

where p_i is the relative weight of the ith singular value with regard to all the singular values. The singular spectrum entropy extracts the intrinsic complexities of the system and describes its status according to the SVD technology.
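The computation above can be sketched as follows (a minimal illustration; the embedding length M = 10 is an assumed default, not a value from the paper):

```python
import numpy as np

def singular_spectrum_entropy(y, M=10):
    """Singular spectrum entropy of a 1-D signal y.
    Builds the (N-M+1) x M delay-embedding matrix, takes its
    singular values, normalizes them into weights p_i, and
    returns H_s = -sum(p_i * ln(p_i))."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    # delay-embedding matrix: row i is [y_i, ..., y_{i+M-1}]
    A = np.array([y[i:i + M] for i in range(N - M + 1)])
    sigma = np.linalg.svd(A, compute_uv=False)
    p = sigma / sigma.sum()
    p = p[p > 0]  # zero singular values contribute nothing
    return float(-(p * np.log(p)).sum())
```

As the text suggests, a pure sinusoid concentrates its energy in very few singular values (low entropy), while broadband noise spreads it over all of them (high entropy).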

Power Spectrum Entropy
We denote by Y(ω) the discrete Fourier transform of the vibration signal Y_t = [y_1, y_2, · · · , y_N] (N is the number of samples). Then the power spectrum of Y_t is S(ω) = (1/(2πN)) |Y(ω)|² [33]. According to the energy conservation law (Parseval's theorem), the signal energy is preserved between the time and frequency domains:

∑_{t=1}^{N} |y_t|² = (1/N) ∑_{k=1}^{N} |Y(ω_k)|².

As a result, the power spectrum S = {S_1, S_2, · · · , S_N} can be considered as a partition of the vibration signal energy in the frequency domain. Thus, we can define the corresponding power spectrum entropy as:

H_f = −∑_{i=1}^{N} q_i ln q_i, q_i = S_i / ∑_{j=1}^{N} S_j,

where q_i is the proportion of the ith power spectrum component with regard to the whole power spectrum.
From the definition of the power spectrum entropy, we know that the power spectrum entropy of the vibration signal depicts the distribution modes of the vibration energy in the frequency domain.
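A minimal sketch of this computation (the real-input FFT is used for convenience; the 1/(2πN) scaling cancels in the normalization):

```python
import numpy as np

def power_spectrum_entropy(y):
    """Power spectrum entropy of a 1-D signal y.
    The FFT power spectrum is treated as a partition of the signal
    energy over frequency; the entropy of the normalized spectrum
    measures how spread out that energy is."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    S = np.abs(np.fft.rfft(y)) ** 2 / (2 * np.pi * N)
    q = S / S.sum()
    q = q[q > 0]  # empty frequency bins contribute nothing
    return float(-(q * np.log(q)).sum())
```

A single tone puts all its energy into one frequency bin (entropy near zero), whereas white noise distributes energy across all bins (high entropy), consistent with the interpretation above.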

Approximate Entropy
Approximate entropy is a metric to evaluate the complexity of a series. It estimates the probability of new mode generation according to the analysis of the complexity of a time series [34]. A non-negative value describes this complexity. Assume {u(i)} (i = 1, 2, 3, · · · , N) (N is the number of samples) is the given time series, m is the predefined pattern dimension, and r is the predefined similarity threshold. The algorithm is as follows:
(1) Form the data series {u(i)} into m-dimensional vectors:
X(i) = [u(i), u(i+1), · · · , u(i+m−1)], i = 1, 2, · · · , N − m + 1;
(2) Calculate the distance between X(i) and X(j):
d[X(i), X(j)] = max_{k=0,· · · ,m−1} |u(i+k) − u(j+k)|;
(3) For each i, count the number of distances that satisfy d[X(i), X(j)] < r, denote it num, and calculate the ratio between num and the total number of vectors:
C_i^m(r) = num / (N − m + 1);
(4) Take the logarithm of each C_i^m(r) and obtain their mean:
φ^m(r) = (1/(N − m + 1)) ∑_{i=1}^{N−m+1} ln C_i^m(r);
(5) Set the dimension to m + 1 and repeat Steps 1 to 4 to obtain φ^{m+1}(r);
(6) The approximate entropy of the series {u(i)} can then be calculated as:
ApEn(m, r, N) = φ^m(r) − φ^{m+1}(r).
According to this definition, approximate entropy is a function of the pattern dimension, the similarity threshold, and the number of samples. It estimates the probability that new patterns are generated for a time series when the dimension changes. It is an effective non-linear analysis method since it only needs short data records and can be applied to both deterministic and stochastic processes.
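The six steps above can be implemented directly (a minimal sketch; note that in practice r is often chosen relative to the standard deviation of the series, e.g. 0.2σ, which the paper does not specify):

```python
import numpy as np

def approximate_entropy(u, m=2, r=0.2):
    """Approximate entropy ApEn(m, r, N) of a time series u,
    following the six steps: embed in dimension m, count r-close
    vector pairs, average the log ratios, repeat for m + 1,
    and take the difference."""
    u = np.asarray(u, dtype=float)
    N = len(u)

    def phi(dim):
        # Step 1: dim-dimensional delay vectors X(i)
        X = np.array([u[i:i + dim] for i in range(N - dim + 1)])
        # Step 2: Chebyshev (max-coordinate) distance between all pairs
        d = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
        # Step 3: fraction of vectors within tolerance r (self-match included)
        C = (d < r).sum(axis=1) / (N - dim + 1)
        # Step 4: mean logarithm of the ratios
        return np.log(C).mean()

    # Steps 5-6: ApEn is the drop in phi when the dimension grows by one
    return float(phi(m) - phi(m + 1))
```

A regular signal generates few new patterns as the dimension grows (low ApEn), while a random one generates many (high ApEn).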

Probabilistic Neural Network (PNN)
The basic idea of a probabilistic neural network is to generate a decision space in a high-dimensional input space based on Bayesian rules, that is, to minimize the expected risk of erroneous classification. A probabilistic neural network is a kind of artificial neural network based on statistics. More specifically, it is a kind of feed-forward network whose activation function is the Parzen window function. A PNN combines the strengths of both the Radial Basis Function (RBF) neural network and the classical probabilistic density estimation method. It performs better in pattern classification than traditional feed-forward neural networks.
We map the sample space to the pattern space with the help of the PNN. As a result, we obtain a network system which is robust and has an adaptive structuring ability. The structure of the PNN is similar to that of an RBF neural network. As shown in Figure 1, the main structure contains four layers: input layer, model layer, summation layer, and output layer.

The input layer receives the samples (the input dimension is n). The model layer is responsible for calculating the pattern correspondence between the input feature vector and the training set. For an input vector x, each model-layer unit generates a non-linear function of the distance between x and its weight vector:

g_k(x) = exp[ −(x − w_k)^T (x − w_k) / (2σ²) ],

where w_k is the connection weight between the input layer and the model layer and σ is the smoothing parameter. We can obtain the distance p_ij (i = 1, 2, · · · , m, where m is the number of models in the sample) between the input vector and the weight vector from every model-layer unit. The summation layer then obtains the conditional probability density of each class by averaging the model-layer outputs belonging to that class:

f_i(x) = (1/m_i) ∑_{k ∈ class i} g_k(x),

where m_i is the number of training samples of class i. The output layer estimates the maximum probability R_k of a test sample belonging to a specific class based on Bayesian minimal risk estimation theory:

R_k = arg max_i f_i(x).

Thus, we can obtain the expected class of the test sample.
Since the weights between the input layer and the model layer could be adaptively chosen according to the training samples, the PNN is simple, robust, and easy to train.
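The layer structure described above can be sketched in a few lines. This is a minimal Gaussian Parzen-window form with an assumed smoothing parameter σ, not the paper's exact implementation:

```python
import numpy as np

class PNN:
    """Minimal probabilistic neural network sketch.
    Pattern layer: one Gaussian kernel per training sample (weights w_k).
    Summation layer: mean kernel response per class (density estimate).
    Output layer: Bayesian decision, class with the largest density."""

    def __init__(self, sigma=0.1):
        self.sigma = sigma  # Parzen-window smoothing parameter

    def fit(self, X, y):
        # training simply stores the samples as pattern-layer weights
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        self.classes = np.unique(self.y)
        return self

    def predict(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        # pattern layer: exp(-||x - w_k||^2 / (2 sigma^2)) for every sample
        d2 = ((X[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=2)
        g = np.exp(-d2 / (2 * self.sigma ** 2))
        # summation layer: average kernel response within each class
        dens = np.stack([g[:, self.y == c].mean(axis=1)
                         for c in self.classes], axis=1)
        # output layer: pick the class with the maximum density
        return self.classes[np.argmax(dens, axis=1)]
```

Note how "training" is just storing the samples, which is why the text calls the PNN simple and easy to train.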


Information Entropy Features Fusion Model
Information fusion is a signal processing procedure. It could manipulate complex multi-source information from different scales and aspects.
The fusion process can be divided into three levels according to the relationships among multi-source information: data-level fusion, feature-level fusion, and decision-level fusion. The main procedures for feature-level fusion are as follows: first, we transform the raw data from every sensor into a feature vector; then we fuse all these feature vectors; finally, we obtain the decision based on the fused results. The fused results compress the raw data and extract the key information; as a result, the computational complexity can be reduced.
This paper proposes a feature fusion model based on information entropy and probabilistic neural networks for rotary machine fault identification. The structure of the fusion model is demonstrated in Figure 2.  According to Figure 2, the main steps of the feature fusion model based on information entropy and the PNN are as follows.
(1) The input data are collected from all the sensor outputs for fault identification. We calculate the singular spectrum entropy, power spectrum entropy, and approximate entropy from the training data and construct the feature vectors. We feed the feature vectors into the PNN as training samples and train the model on these samples.
(2) We calculate the corresponding feature vector for each test sample and feed the vector to the PNN that has already been trained in Step 1. We obtain the fault identification result from the output layer of the PNN.
(3) For each test sample, if its classification result is correct, we add it to the training set. We retrain the whole model with this new training set in order to enhance the performance of the model.
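The three steps above can be sketched as a generic loop. The helper names here are hypothetical: `extract` stands for the three-entropy feature extraction, `clf` for any fit/predict classifier (the PNN in the paper; a tiny nearest-centroid stand-in is included so the sketch is self-contained):

```python
import numpy as np

class NearestCentroid:
    """Tiny stand-in classifier; any model with fit/predict
    (such as a PNN) works in the loop below."""
    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.centroids = {c: X[y == c].mean(axis=0) for c in np.unique(y)}
        return self
    def predict(self, X):
        return [min(self.centroids,
                    key=lambda c: np.linalg.norm(np.asarray(x) - self.centroids[c]))
                for x in np.atleast_2d(X)]

def fused_diagnosis(train_sigs, train_y, test_sigs, test_y, extract, clf):
    """Sketch of the three-step fusion procedure: extract features
    and train (Step 1), classify each test sample (Step 2), and feed
    correctly classified samples back into the training set (Step 3)."""
    F = [extract(s) for s in train_sigs]          # Step 1: feature vectors
    labels = list(train_y)
    preds = []
    for sig, truth in zip(test_sigs, test_y):
        clf.fit(np.array(F), np.array(labels))    # (re)train the model
        p = clf.predict([extract(sig)])[0]        # Step 2: classify
        preds.append(p)
        if p == truth:                            # Step 3: feedback
            F.append(extract(sig))
            labels.append(truth)
    return preds
```

The feedback step grows the training set only with verified samples, which is how the model's performance is enhanced over time.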

Simulated Fault Signal
In this section, we generate three kinds of simulated fault signals of rotary machines as training and test samples.
For all the simulated signals, Class A stands for the unbalance fault, Class B for the coupling misalignment fault, and Class C for the rubbing fault. We set the sampling number to 1024 and the rotation speed of the rotor to 3000 rpm. Thus, the working frequency of the rotor is f_1 = 50 Hz, double the fundamental frequency is f_2 = 100 Hz, the high frequency is f_3 = 200 Hz, half the fundamental frequency is f_4 = 25 Hz, and the extremely high frequency is f_5 = 500 Hz. We construct the simulated signals according to the different frequency constitutions of the different faults.
Class A consists of 90% of frequency f_1 and 5% each of frequencies f_2 and f_3. Thus, the signal can be represented as:

X_1(t) = 0.9 cos(2π f_1 t) + 0.05 cos(2π f_2 t) + 0.05 cos(2π f_3 t) + ε(t). (12)

Class B consists of 50% of frequency f_2, 40% of frequency f_1, and 10% of frequency f_3. Thus, the signal can be represented as:

X_2(t) = 0.4 cos(2π f_1 t) + 0.5 cos(2π f_2 t) + 0.1 cos(2π f_3 t) + ε(t). (13)

Class C consists of 40% of frequency f_1, 20% of frequency f_2, 10% of frequency f_3, 20% of frequency f_4, and 10% of frequency f_5. Thus, the signal can be represented as:

X_3(t) = 0.4 cos(2π f_1 t) + 0.2 cos(2π f_2 t) + 0.1 cos(2π f_3 t) + 0.2 cos(2π f_4 t) + 0.1 cos(2π f_5 t) + ε(t). (14)

In Equations (12)-(14), ε(t) is Gaussian white noise with mean 0 and standard deviation σ = 0.1. We generate 100 groups of data for each of the three classes. We calculate the singular spectrum entropy, power spectrum entropy, and approximate entropy from the training data and form them into feature vectors.
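The three signal classes can be generated as follows. This is a minimal sketch; the sampling rate fs = 1024 Hz is an assumption (the paper states only the sampling number of 1024), and the function name is illustrative:

```python
import numpy as np

def simulated_fault_signal(fault, N=1024, fs=1024, sigma=0.1, seed=None):
    """Generate one group of simulated fault data following the
    weighted-cosine construction: characteristic frequencies
    f1=50, f2=100, f3=200, f4=25, f5=500 Hz plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(N) / fs
    f = {"f1": 50, "f2": 100, "f3": 200, "f4": 25, "f5": 500}
    weights = {
        "A": [(0.90, "f1"), (0.05, "f2"), (0.05, "f3")],   # unbalance
        "B": [(0.40, "f1"), (0.50, "f2"), (0.10, "f3")],   # misalignment
        "C": [(0.40, "f1"), (0.20, "f2"), (0.10, "f3"),
              (0.20, "f4"), (0.10, "f5")],                 # rubbing
    }
    x = sum(a * np.cos(2 * np.pi * f[k] * t) for a, k in weights[fault])
    return x + rng.normal(0.0, sigma, N)
```

With fs = N = 1024, each FFT bin corresponds to 1 Hz, so a Class A signal should show its dominant spectral peak at bin 50.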
We randomly split the generated data into two halves. We use one half as the training set to train the PNN; the other half is used as the test set. Table 1 compares the results of the four methods. The first three methods use singular spectrum entropy, power spectrum entropy, and approximate entropy separately, while the last method fuses all three kinds of entropy with the PNN. From Table 1 we can conclude that the fault identification accuracy becomes much higher when fusing all three entropies than when using the three kinds of information entropy separately. This reflects the better performance and effectiveness of the proposed method based on information entropy and a probabilistic neural network in fault diagnosis.

Rotor Test Platform Fault Experiments
In this section, we collect signals from the rotor test platform and calculate the three kinds of information entropy from the collected signals. The rotor test rig used is an experimental device that simulates the vibration condition of rotating machinery, and it can effectively reproduce many kinds of vibration phenomena generated by rotating machinery. The rotor rig can simulate the running state of the machine by changing the rotor speed, shaft stiffness, mass unbalance, bearing friction, coupling form, or impact condition through different choices.
The working principle diagram of the rotor test rig is shown in Figure 3.

Figure 3. The working principle diagram of the rotor test rig (220 V AC governor motor, rotor system, sensor system, signal acquisition system, and computer client).
We conduct experiments for identifying the unbalance fault, coupling misalignment fault, and rubbing fault with the proposed feature fusion model.

Identifying Unbalance and Rubbing Faults for a Single-Span Rotor
We design a PNN model with three input units and two output units. The three input units correspond to the above-mentioned three kinds of information entropy and the two output units correspond to unbalance and rubbing faults, respectively.
We tested at three different rotation speeds, which are 1200 rpm, 2400 rpm, and 3600 rpm, for a single-span rotor. We collected vibration fault signals at the three rotation speeds. For each rotation speed, we collected 100 groups of data for training and 20 groups of data for testing.
For the single-span rotor, the calculated singular spectrum entropy of each vibration signal at different rotation speeds is shown in Figures 4 and 5.

We calculated the power spectrum entropy of each vibration signal in the single-span rotor at different rotation speeds, and the result is shown in Figures 6 and 7. We calculated the approximate entropy of each vibration signal in the single-span rotor at different rotation speeds, and the result is shown in Figures 8 and 9.
It can be seen from Figures 5-9 that, with the change of speed, the entropy of the single-span rotor in the different failure states varies with different trends, and the range of variation is wide. With the increase of speed, the singular spectrum entropy, power spectrum entropy, and approximate entropy of the unbalance fault show a generally decreasing trend, while those of the rubbing fault show a generally increasing trend. Table 2 compares the fault identification accuracy of the four methods. From Table 2 we can conclude that no single kind of entropy among singular spectrum entropy, power spectrum entropy, and approximate entropy is a clear winner, and their accuracies are much lower than that of the feature fusion model. This is because the three kinds of entropy describe the characteristics of the signals in the time domain, the frequency domain, and the complexity domain, respectively. The proposed feature fusion model combines information from these different characteristics and, as a result, is more sensitive to the fault features and boosts the performance.

Identifying Unbalance, Coupling Misalignment Faults, and Rubbing Faults for Two-Span Rotors
We designed a PNN model with three input units and three output units. The three input units correspond to the above-mentioned three kinds of information entropy, and the three output units correspond to the unbalance, coupling misalignment, and rubbing faults, respectively. Similar to the above experiment, we tested at three different rotation speeds: 1000 rpm, 3000 rpm, and 5000 rpm. We collected vibration fault signals at the three rotation speeds. For each rotation speed, we collected 100 groups of data for training and 20 groups of data for testing. Table 3 compares the results of the four methods. From Table 3 we can see that, when the singular spectrum entropy, power spectrum entropy, and approximate entropy are used separately to classify the faults of the two-span rotors, the classification accuracy is not satisfactory and the fault characteristics are not well reflected. For the proposed feature fusion model, however, the fault classification accuracy is significantly higher than that of any single-feature classification method. This proves the effectiveness of the proposed approach.

Conclusions
In order to cope with the difficulty of identifying complex faults in rotary machinery, this paper proposes a feature fusion model based on information entropy and a probabilistic neural network. To obtain the final fault identification result, we measure the characteristics of a vibration signal by combining singular spectrum entropy, power spectrum entropy, and approximate entropy, and fuse them with a PNN model to classify and diagnose the fault signals. On real two-span rotor data, the fault detection accuracy of our proposed method is more than 10% higher than that of the methods using the three kinds of information entropy separately, proving it to be an effective fault diagnosis method.