## 1. Introduction

Permutation entropy (PE) [1,2], a statistical indicator for detecting nonlinear dynamic changes, has been widely used for fault feature extraction in rotating machinery [3,4,5,6]. For a more comprehensive analysis, multi-scale permutation entropy (MPE) was proposed to enhance the descriptive ability of PE [7,8,9]. MPE has clear merits in describing the complexity of time series, such as high computational efficiency, good robustness, and independence from prior knowledge. Recently, increasing attention has been paid to the application of MPE to fault diagnosis of rotating machinery [10,11,12,13]. Zhao et al. [14] applied MPE and a hidden Markov model to identify different fault types of rolling bearings. Wu et al. [15,16] used MPE to extract fault features and then applied a support vector machine to identify bearing fault types. Tiwari et al. [17] combined MPE with an adaptive neuro-fuzzy classifier to recognize incipient bearing faults. Zheng et al. [18] utilized MPE and support vector machines to recognize various bearing faults. Yao et al. [19] employed MPE to describe fault characteristics and then used an extreme learning machine for bearing pattern identification.

These works have successfully applied MPE to fault diagnosis of rotating machinery. However, the performance of MPE depends on its parameter settings, including the embedding dimension $m$ and the time delay $\tau $. If $m$ and $\tau $ are too small, the reconstructed vectors contain too few states to reflect the real dynamic characteristics of the signal [20,21]. On the other hand, if $m$ and $\tau $ are too large, the reconstructed phase space homogenizes the time series, leading to an inappropriate entropy estimate [22]. Therefore, the selection of $m$ and $\tau $ plays an important role in the MPE method [2]. Until now, most researchers have selected MPE parameters empirically, so the automatic selection of MPE parameters has become an urgent issue.

To select the optimum parameters of MPE automatically, this paper proposes a novel parameter optimization strategy, which we call optimized multi-scale permutation entropy (OMPE). In OMPE, an improved Cao method [23] is proposed to select the embedding dimension $m$ adaptively, while the time delay $\tau $ is determined based on mutual information (MI) [24]. To verify the effectiveness of OMPE, simulated bearing signals and two experimental cases are used for validation. Results demonstrate that OMPE has better feature extraction ability than MPE with existing parameter selection methods.

The rest of this paper is organized as follows. Section 2 describes the proposed OMPE method. Section 3 validates the advantages of OMPE using simulated bearing signals. Section 4 verifies the effectiveness of OMPE using two experimental cases. Finally, Section 5 summarizes the conclusions.

## 2. Optimized Multi-Scale Permutation Entropy

Permutation entropy (PE) was proposed by Bandt and Pompe [1] as a dynamic indicator that reflects the nonlinear behavior of time series. In estimating the complexity of time series, PE has clear merits: high computational efficiency, sensitivity to dynamic changes, and robustness to noise. However, PE faces challenges in fault feature extraction, because the fault information in the vibration signals of rotating machinery is spread over wide frequency bands. To overcome this shortcoming, multi-scale permutation entropy (MPE) was proposed to enhance the fault feature extraction ability of PE; it measures the complexity of a time series over multiple scales.
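The following minimal pure-Python sketch makes PE and MPE concrete. It assumes the common formulation in which PE is the Shannon entropy of the ordinal-pattern distribution normalized by $\ln(m!)$, and MPE applies PE to non-overlapping coarse-grained averages at scales 1 to `max_scale`. The function names are hypothetical; this is an illustration, not the implementation used in this paper.

```python
import math
from collections import Counter


def permutation_entropy(x, m, tau):
    """Normalized permutation entropy of x for embedding dimension m and delay tau."""
    if m < 2:
        raise ValueError("m must be at least 2")
    n_vectors = len(x) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the chosen m and tau")
    counts = Counter()
    for i in range(n_vectors):
        window = [x[i + j * tau] for j in range(m)]
        # Ordinal pattern = index order that sorts the window (stable sort: ties keep order of appearance).
        counts[tuple(sorted(range(m), key=window.__getitem__))] += 1
    h = sum(-(c / n_vectors) * math.log(c / n_vectors) for c in counts.values())
    return h / math.log(math.factorial(m))  # scale to [0, 1]


def coarse_grain(x, scale):
    """Average consecutive non-overlapping windows of length `scale`."""
    return [sum(x[j * scale:(j + 1) * scale]) / scale for j in range(len(x) // scale)]


def multiscale_permutation_entropy(x, m, tau, max_scale):
    """PE of the coarse-grained series at scales 1, 2, ..., max_scale."""
    return [permutation_entropy(coarse_grain(x, s), m, tau) for s in range(1, max_scale + 1)]
```

For instance, a strictly increasing series contains a single ordinal pattern and yields PE = 0. The alternating series `[1, 2, 1, 2, 1]` with $m=2$ yields PE = 1.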

Since PE considers the order of amplitudes in the phase space reconstructed with embedding dimension $m$ and time delay $\tau $, the choice of $m$ and $\tau $ plays an important role in the final performance of MPE. In this study, optimized multi-scale permutation entropy (OMPE) is proposed to automate the parameter selection of MPE. First, the time delay $\tau $ is calculated using mutual information. Then, based on the obtained $\tau $, the embedding dimension $m$ is determined using an improved Cao method. Details are described in Section 2.1 and Section 2.2.

#### 2.1. Time Delay Calculation Based on Mutual Information

Mutual information (MI) quantifies the information shared between the reconstructed space and the original time series, and can be used to select the optimal time delay [24]. Because MI is easy to implement and computationally efficient, this paper adopts the MI method to select the optimal time delay of MPE.

An arbitrary signal $X=\{{x}_{i},i=1,2,\cdots ,N\}$ can be reconstructed with time delay $\tau $ as follows:

According to Ref. [25], the Shannon entropies of $X$ and $Y$ can be calculated as:

where $p\left({x}_{i}\right)$ represents the probability that $X={x}_{i}$, and $p\left({y}_{i}\right)$ represents the probability that $Y={y}_{i}$.

Then, the mutual information between $X$ and $Y$ can be expressed as:

The mutual information $I(X,Y)$ measures the information shared between the reconstructed space and the original time series, and can therefore be used to determine the time delay $\tau $. According to [24], the time delay $\tau $ at which $I(X,Y)$ reaches its first local minimum is the optimum time delay.
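A minimal sketch of the MI-based delay selection follows. It makes three assumptions:

- $X=x_i$ and $Y=x_{i+\tau}$.
- The probabilities come from an equal-width histogram estimator.
- The bin count `n_bins` is a hypothetical value.

The estimator used in [24,25] may differ. Because a local minimum needs a left neighbor, the search starts at $\tau =2$.

```python
import math
from collections import Counter


def mutual_information(x, tau, n_bins=16):
    """Histogram estimate of I(X, Y) between X = x_i and Y = x_{i+tau}, in nats."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for a constant series
    # Equal-width binning; the maximum value is placed in the last bin.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in x]
    pairs = list(zip(bins[:-tau], bins[tau:]))
    n = len(pairs)
    n_xy = Counter(pairs)
    n_x = Counter(a for a, _ in pairs)
    n_y = Counter(b for _, b in pairs)
    # I(X,Y) = sum p(x,y) ln[p(x,y) / (p(x) p(y))], equivalent to H(X) + H(Y) - H(X,Y)
    return sum((c / n) * math.log(c * n / (n_x[a] * n_y[b])) for (a, b), c in n_xy.items())


def first_minimum_delay(x, max_tau=50, n_bins=16):
    """Smallest tau in 2..max_tau at which I(tau) is a local minimum, else None."""
    mi = [mutual_information(x, t, n_bins) for t in range(1, max_tau + 2)]  # mi[j] is I(tau = j + 1)
    for j in range(1, max_tau):
        if mi[j] < mi[j - 1] and mi[j] <= mi[j + 1]:
            return j + 1
    return None  # no local minimum in the searched range
```

Histogram-based MI is sensitive to the bin count, so in practice `n_bins` would be chosen with the signal length in mind.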

#### 2.2. Embedding Dimension Calculation Based on Improved Cao Method

The Cao method [23] uses the variation in distance between nearest-neighbor points in the reconstructed phase space to determine the optimum embedding dimension. It is regarded as an improvement on the false nearest neighbor (FNN) method [26], as it can estimate a more appropriate embedding dimension than the traditional FNN method [23]. In the original Cao method, when the average distance measure of the neighbor points stops changing beyond some embedding dimension ${m}_{0}$, ${m}_{0}+1$ is taken as the optimal embedding dimension. However, ${m}_{0}$ is determined empirically in the original Cao method. This manual step is unsuitable for big data analysis or intelligent fault diagnosis. Therefore, an improved Cao method with an adaptive threshold is proposed to determine the optimal embedding dimension automatically. In this paper, the Chebyshev distance is used to measure the performance of MPE under different embedding dimensions $m$. The embedding dimension $m$ with the largest Chebyshev distance is the one for which the MPE values have the largest class separation distance. Therefore, the proposed OMPE can extract the most discriminative MPE values as fault features. The whole procedure of the improved Cao method consists of two parts: embedding dimension calculation (Section 2.2.1) and threshold adjustment (Section 2.2.2).

#### 2.2.1. Embedding Dimension Calculation

After the time delay $\tau $ has been selected with the MI method, the embedding dimension can be determined with the Cao method in the following steps:

Step 1: Reconstruct an arbitrary signal $X=\{{x}_{i},i=1,2,\cdots ,N\}$ with time delay $\tau $ and embedding dimension $m$ as follows:

Note that when $m=1$, Equation (5) equals Equation (1).

Step 2: Calculate the distance $a(i,m)$ of neighboring points in the phase space:

where $n(i,m)$ $(1\le n(i,m)\le N-m\tau )$ is an integer such that ${y}_{n(i,m)}(m)$ is the nearest neighbor of ${y}_{i}(m)$ in the $m$-dimensional reconstructed phase space, and $\Vert \cdot \Vert $ denotes the Chebyshev distance, as expressed in Equation (7).

Step 3: Calculate the average $E(m)$ of all $a(i,m)$ using Equation (8).

Step 4: Use $E(m)$ to calculate $E1(m)$ as expressed in Equation (9).

In the original Cao method, $E1(m)$ stops changing when $m$ is greater than some value ${m}_{0}$, and ${m}_{0}+1$ is then the desired embedding dimension. However, Cao does not specify how to determine ${m}_{0}$. Hence, a new threshold is proposed in Step 5 to determine the embedding dimension.
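A brute-force sketch of Steps 1 to 4 follows. It assumes the standard definitions of Cao [23]:

- $a(i,m)=\Vert y_i(m+1)-y_{n(i,m)}(m+1)\Vert / \Vert y_i(m)-y_{n(i,m)}(m)\Vert $, using the Chebyshev norm.
- $E(m)$ is the mean of $a(i,m)$ over $i=1,\dots ,N-m\tau $.
- $E1(m)=E(m+1)/E(m)$.

The exact form of Equations (6) to (9) may differ in detail. Skipping neighbors at zero distance is a practical choice of this sketch, not something stated in the text.

```python
def delay_vectors(x, m, tau):
    """y_i(m) = (x_i, x_{i+tau}, ..., x_{i+(m-1)tau}) for every valid start i."""
    return [tuple(x[i + j * tau] for j in range(m)) for i in range(len(x) - (m - 1) * tau)]


def chebyshev(u, v):
    """Chebyshev (maximum-norm) distance between two equal-length vectors."""
    return max(abs(a - b) for a, b in zip(u, v))


def cao_E(x, m, tau):
    """E(m): mean of a(i, m) over the N - m*tau points that also exist in dimension m + 1."""
    y_m, y_m1 = delay_vectors(x, m, tau), delay_vectors(x, m + 1, tau)
    n = len(y_m1)
    ratios = []
    for i in range(n):
        # Nearest neighbour n(i, m) of y_i(m) among the first n vectors (zero distances skipped).
        dists = [(chebyshev(y_m[i], y_m[j]), j) for j in range(n) if j != i]
        dists = [(d, j) for d, j in dists if d > 0]
        if not dists:
            continue
        d_m, nn = min(dists)
        ratios.append(chebyshev(y_m1[i], y_m1[nn]) / d_m)  # a(i, m)
    if not ratios:
        raise ValueError("no valid nearest neighbours; series too short or constant")
    return sum(ratios) / len(ratios)


def cao_E1(x, max_m, tau):
    """[E1(1), ..., E1(max_m)] with E1(m) = E(m + 1) / E(m)."""
    E = [cao_E(x, m, tau) for m in range(1, max_m + 2)]
    return [E[k + 1] / E[k] for k in range(max_m)]
```

With the Chebyshev norm, each $a(i,m)$ is at least 1. The $(m+1)$-dimensional vectors contain the $m$-dimensional ones as their first components, so the maximum can only grow.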

Step 5: Define the increment $\Delta E1(m)$ as follows:

When $\Delta E1(m+1)\le k\Delta E1(m)$, $m+1$ is the desired embedding dimension. Here, $k$ is the threshold applied to the increments of $E1(m)$; in other words, $k$ determines the embedding dimension. If $k$ is too large, the embedding dimension will be too low, and the reconstructed vectors will contain too few states to reflect the real dynamic characteristics of the signal. If $k$ is too small, the embedding dimension will be too high, and the reconstructed phase space will homogenize the time series, resulting in an inappropriate estimate [22]. Therefore, this paper proposes a novel strategy to obtain an appropriate $k$.
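The Step 5 rule can be sketched as follows. The sketch assumes $\Delta E1(m)=\vert E1(m+1)-E1(m)\vert $, which may differ from the exact form of Equation (10). The function name is hypothetical, and `E1[0]` holds $E1(1)$.

```python
def select_embedding_dimension(E1, k):
    """Step 5 rule: return m + 1 for the first m with dE1(m+1) <= k * dE1(m), else None.

    E1[0] holds E1(1); dE1(m) is assumed to be |E1(m+1) - E1(m)|.
    """
    dE1 = [abs(E1[j + 1] - E1[j]) for j in range(len(E1) - 1)]  # dE1[j] is dE1(j + 1)
    for j in range(len(dE1) - 1):
        if dE1[j + 1] <= k * dE1[j]:  # condition for m = j + 1
            return j + 2  # selected embedding dimension m + 1
    return None  # rule never fires within the computed E1 values
```

For example, with `E1 = [0.0, 0.5, 0.875, 1.0]` the increments are 0.5, 0.375 and 0.125. Then k = 0.8 selects m = 2 and k = 0.5 selects m = 3, consistent with the statement that a larger $k$ gives a lower embedding dimension.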

#### 2.2.2. Threshold Adjustment using Chebyshev Distance

To obtain an appropriate parameter $k$, a novel strategy is proposed in this paper. In this strategy, the Chebyshev distance is used to measure the performance of MPE under different values of $k$. The value of $k$ that yields the largest Chebyshev distance is the optimum threshold for the improved Cao method.

Given the time delay $\tau $ obtained in Section 2.1, the parameter $k$ of the improved Cao method is determined in the following six steps.

Step 1: Set the upper bound of the search range ${k}_{m}$ and the stride ${k}_{s}$, and initialize the parameter $k={k}_{0}$.

Step 2: Calculate the embedding dimension ${m}_{k}$ according to Section 2.2.1.

Step 3: Calculate the MPE value using the obtained $\tau $ and ${m}_{k}$.

Step 4: Suppose there are $c$ categories of training samples, and let $mp{e}_{a}({m}_{k},\tau ,S)$ denote the MPE value of category $a$ at scale $S$ with time delay $\tau $ and embedding dimension ${m}_{k}$. Calculate the Chebyshev distance ${D}_{ab}$ between each pair of categories according to Equation (11), which yields $c(c-1)/2$ distances.

Step 5: Calculate the average distance ${D}^{k}$ according to Equation (12).

Here, ${D}^{k}$ represents the average distance between different categories, which can be regarded as the class separation distance. The value of $k$ with the largest ${D}^{k}$ yields the MPE values with the largest class separation distance.

Step 6: Increase $k$ by the stride ${k}_{s}$ and repeat Steps 2–5 until $k$ reaches ${k}_{m}$, so that ${D}^{k}$ is obtained for every candidate $k$. The ${k}_{opt}$ corresponding to the maximum value ${D}_{\mathrm{max}}^{k}$ is the optimum threshold for the improved Cao method.
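The six-step search can be sketched as follows. The sketch makes these assumptions:

- Equation (11) is the Chebyshev distance across scales between two class MPE curves, $D_{ab}=\max_S \vert mpe_a(m_k,\tau ,S)-mpe_b(m_k,\tau ,S)\vert $.
- Equation (12) is the mean of these distances over the $c(c-1)/2$ class pairs.
- Each class is represented by a single MPE curve, for example the mean over its training samples.

`mpe_for_threshold` is a hypothetical callable standing in for Steps 2 and 3; it is not defined in this paper. `K_GRID` reproduces the 0.4 to 2.0 range with stride 0.1 used in Section 3.

```python
from itertools import combinations


def class_separation(mpe_by_class):
    """D^k: mean Chebyshev distance over all c(c-1)/2 pairs of class MPE curves.

    mpe_by_class maps a class label to its MPE values at scales S = 1..S_max.
    """
    distances = [
        max(abs(p - q) for p, q in zip(mpe_by_class[a], mpe_by_class[b]))  # D_ab
        for a, b in combinations(mpe_by_class, 2)
    ]
    return sum(distances) / len(distances)


def optimise_threshold(mpe_for_threshold, k_values):
    """Return (k_opt, D_max): the candidate k with the largest class separation."""
    best_k, best_d = None, float("-inf")
    for k in k_values:
        d = class_separation(mpe_for_threshold(k))  # Steps 2-5 for this k
        if d > best_d:  # strict '>' keeps the smallest k on ties
            best_k, best_d = k, d
    return best_k, best_d


# Search grid of Section 3: k from 0.4 to 2.0 with stride 0.1 (17 candidates).
K_GRID = [round(0.4 + 0.1 * i, 1) for i in range(17)]
```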

The whole procedure of the proposed improved Cao method is shown in Figure 1.

#### 2.3. Procedures of OMPE Method

The proposed OMPE method consists of the following four steps:

Step 1: Apply the MI method described in Section 2.1 to calculate the optimal time delay ${\tau}_{opt}$ of MPE.

Step 2: Determine the optimal threshold ${k}_{opt}$ of the improved Cao method as described in Section 2.2.2.

Step 3: Using the obtained threshold ${k}_{opt}$, calculate the optimal embedding dimension ${m}_{opt}$ of MPE with the improved Cao method described in Section 2.2.1.

Step 4: Calculate the MPE value using the obtained optimum time delay ${\tau}_{opt}$ and optimum embedding dimension ${m}_{opt}$.

The flowchart of OMPE is illustrated in Figure 2.

The proposed OMPE can automatically and adaptively select the optimum time delay ${\tau}_{opt}$ and embedding dimension ${m}_{opt}$ of MPE for an arbitrary signal. Since the adaptive threshold uses the Chebyshev distance to choose the embedding dimension $m$ with the largest class separation distance, OMPE can extract the most discriminative MPE values as fault features, which leads to higher classification accuracy.

## 3. Simulation Validation

In this section, three bearing signals with different fault types, namely ball fault (BF), inner race fault (IRF), and outer race fault (ORF), are simulated for validation. Details of the theoretical models of the three bearing faults can be found in [27].

The simulation parameters are as follows: sampling frequency 20,480 Hz; data length $N=2048$ points; rotating frequency ${f}_{r}=50\ \mathrm{Hz}$; intrinsic frequency ${f}_{n}=2548\ \mathrm{Hz}$; outer race fault characteristic frequency ${f}_{o}=203.9\ \mathrm{Hz}$ with added noise at SNR = −18.626 dB; inner race fault characteristic frequency ${f}_{I}=296.1\ \mathrm{Hz}$ with added noise at SNR = −27.91 dB; and ball fault characteristic frequency ${f}_{B}=262\ \mathrm{Hz}$ with added noise at SNR = −36.15 dB.

Figure 3 shows the time-domain waveforms of the three simulated bearing signals and their noisy counterparts.
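The SNR values above can be reproduced by scaling white Gaussian noise to the power of the clean signal, using $\mathrm{SNR}=10\log_{10}(P_s/P_n)$. The sketch below does this. The fault signal itself is a hypothetical stand-in: a damped resonance at $f_n$, re-excited once per fault period, with an assumed damping ratio `zeta = 0.1`. It is not the theoretical model of [27] and ignores any modulation at $f_r$.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, seed=0):
    """Return signal + white Gaussian noise scaled so that 10*log10(P_s / P_n) = snr_db."""
    rng = random.Random(seed)
    p_signal = sum(s * s for s in signal) / len(signal)
    sigma = math.sqrt(p_signal / 10 ** (snr_db / 10))
    return [s + rng.gauss(0.0, sigma) for s in signal]


def decaying_impulse_train(fs, n, fault_freq, fn, zeta=0.1):
    """Hypothetical stand-in fault signal: a damped resonance at fn excited every 1/fault_freq s."""
    period = 1.0 / fault_freq
    wd = 2 * math.pi * fn * math.sqrt(1 - zeta ** 2)  # damped angular frequency
    out = []
    for i in range(n):
        dt = (i / fs) % period  # time elapsed since the most recent impact
        out.append(math.exp(-zeta * 2 * math.pi * fn * dt) * math.sin(wd * dt))
    return out


# Outer race case of Section 3: fs = 20,480 Hz, N = 2048, f_n = 2548 Hz, f_o = 203.9 Hz.
clean = decaying_impulse_train(fs=20480, n=2048, fault_freq=203.9, fn=2548)
noisy = add_noise_at_snr(clean, snr_db=-18.626)
```

Because the noise is random, the realized SNR of a single 2048-point record fluctuates slightly around the target value.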

The proposed method described in Section 2 is applied to these simulated signals. First, the search range of the threshold $k$, which determines the embedding dimension of MPE, is set from 0.4 to 2.0 with a stride of 0.1. If $k$ is smaller than 0.4, the embedding dimension of MPE becomes so large that the MPE value cannot be obtained. Conversely, if $k$ is larger than 2.0, the embedding dimension becomes too small, resulting in inappropriate MPE values for complexity estimation. The search result for $k$ is plotted in Figure 4. As seen, the maximum value, 0.3265, is reached at $k=0.5$; therefore, we set $k=0.5$. Second, the MI method and the improved Cao method are used to calculate the time delay $\tau $ and the embedding dimension $m$, respectively. The optimum embedding dimension and time delay are listed in Table 1. Finally, MPE is calculated using the optimum embedding dimension and time delay.

For comparison, two other parameter selection methods are also used to calculate MPE. The first uses the fixed parameters $m=6$ and $\tau =1$ (called the FIX method), as described in [18]. The second uses MI to calculate the time delay $\tau $ and the false nearest neighbor (FNN) method to calculate the embedding dimension $m$ (called the MI-FNN method) [28].

Table 1 also lists the parameter settings of the FIX and MI-FNN methods. As seen in Table 1, the embedding dimension $m$ of the FIX and MI-FNN methods remains constant at 6 and 4, respectively. In contrast, the proposed OMPE method adaptively selects a suitable embedding dimension $m$ (6, 9, and 5) for each bearing vibration signal.

The MPE values obtained by the three methods are plotted in Figure 5. As seen in Figure 5, the MPE values of the three simulated signals can be clearly discriminated at each scale using the proposed method, whereas they can barely be discriminated using the FIX or MI-FNN method. This can be explained as follows. First, OMPE uses different parameter combinations for different kinds of signals, which better reflect the real dynamic characteristics of each signal. Second, OMPE uses the Chebyshev distance to choose the embedding dimension $m$ with the largest class separation distance; therefore, it discriminates the three bearing fault types clearly.

## 5. Conclusions

This paper proposes a parameter selection approach for MPE, namely optimized MPE (OMPE). The mutual information method and the improved Cao method are applied to select the time delay and the embedding dimension of MPE, respectively. The performance of OMPE is validated using both simulated and experimental signals. Test results demonstrate that, compared with the other three methods, OMPE has the best feature extraction performance and the highest classification accuracy. The main contributions of this paper are as follows:

(1) A parameter selection strategy is proposed to find the optimum combination of embedding dimension and time delay for MPE;

(2) OMPE enhances the feature extraction ability of MPE through the parameter optimization process;

(3) Results on simulated and experimental signals show that OMPE performs better in describing the dynamic characteristics of vibration signals.

In this preliminary study, the parameter selection strategy was validated with the MPE method. Further validation with hierarchical permutation entropy and composite multi-scale permutation entropy will be considered in future work.