Feature Extraction of Ship-Radiated Noise Based on Regenerated Phase-Shifted Sinusoid-Assisted EMD, Mutual Information, and Differential Symbolic Entropy

To improve the recognition accuracy of ship-radiated noise, a feature extraction method based on regenerated phase-shifted sinusoid-assisted empirical mode decomposition (RPSEMD), mutual information (MI), and differential symbolic entropy (DSE) is proposed in this paper. RPSEMD is an improved empirical mode decomposition (EMD) that alleviates the mode mixing problem of EMD. DSE is a new tool to quantify the complexity of nonlinear time series. It not only has high computational efficiency, but also can measure the nonlinear complexity of short time series. Firstly, the ship-radiated noise is decomposed into a series of intrinsic mode functions (IMFs) by RPSEMD, and the DSE of each IMF is calculated. Then, the MI between each IMF and the original signal is calculated; the sum of MIs is taken as the denominator; and each normalized MI (norMI) is obtained. Finally, each norMI is used as the weight coefficient to weight the corresponding DSE, and the weighted DSE (WDSE) is obtained. The WDSEs are sent into the support vector machine (SVM) classifier to classify and recognize three types of ship-radiated noise. The experimental results demonstrate that the recognition rate of the proposed method reaches 98.3333%. Consequently, the proposed WDSE method can effectively achieve the classification of ships.


Introduction
By analyzing ship-radiated noise, a slice of crucial information such as the type, speed, and tonnage of the ship can be extracted. Consequently, the identification of ship-radiated noise plays a significant role in the military and economic fields [1][2][3]. Due to the influence of marine environment noise, the actual measured ship-radiated noise is non-Gaussian, non-stationary, and non-linear [4][5][6]. Therefore, it is an arduous task to identify ship-radiated noise effectively. To realize the feature extraction of ship-radiated noise, scholars have proposed Fourier transform, wavelet transform and modern spectrum estimation [7]. However, these methods have some limitations. As an example, Fourier transform analysis cannot reflect the time-varying characteristics of the signal well; the wavelet transform needs to select the wavelet basis function and the decomposition level, in advance [8]. Consequently, it is indispensable to find a feature extraction method with higher recognition accuracy.
Empirical mode decomposition (EMD) [9] is an adaptive signal decomposition algorithm, which can decompose the original signal into a series of intrinsic mode functions (IMFs) with different frequencies. Therefore, it provides a new idea for underwater acoustic signal processing. However, EMD may cause the mode mixing problem, which leads to different frequency components in an and the normalized MI is used to weight the corresponding IMF's DSE to obtain the weighted DSE (WDSE). Finally, the feature vector WDSE is put into SVM for classification.

The Traditional EMD Algorithm
The specific steps of EMD are summarized as follows [9]: Step 1. Connect the local maxima/minima of the original signal x(t) to obtain the upper/lower envelope using the cubic spline.
Step 2. Derive the local mean of envelope, m(t), by averaging the upper and lower envelopes.
Step 4. If d(t) satisfies some predefined stoppage criteria, d(t) is assigned as an IMF denoted as c m (t) where m is the IMF index. Otherwise, set x(t) = d(t), and repeat Steps 1-3.
Step 6. Set x(t) = r m (t), and repeat Steps 1-5 to extract the next IMF. The final residue is denoted as r(t).
The result of EMD can be expressed as: where k is the index and K denotes the total number of IMFs. An ideal IMF should only include one monocomponent, such as I MF(t) = a(t)cos(2π f (t)t) [9], where a(t) and f (t) are the instantaneous amplitude and instantaneous phase, respectively. However, in fact, the mode mixing problem often appears, resulting in an IMF being split into two adjacent IMFs or different IMFs existing in an IMF. Then, an IMF containing J IMFs will follow the form: a j (t)cos 2π f j (t)t (2) The work in [28] indicated that a two-mode signal, with a 1 , f 1 and a 2 , f 2 as their respective amplitudes and frequencies, can be separated only if the cutoff frequency ratio is no more than 0.67, which can be expressed as: Therefore, we consider that two points at t 1 and t 2 belong to the same IMF if f (t 1 )/ f (t 2 ) ∈ [0.67, 1.5]. Further, the extreme point detected in the sifting of EMD will all belong to the high-frequency mode if: RPSEMD uses Equations (3) and (4) as the criteria for separating two IMFs. In view of the cause of the mode mixing problem, making the extreme points of each IMF distribute evenly becomes an important factor to solve the mode mixing problem, which can be achieved by adding auxiliary signals.

The Main Idea of RPSEMD
The novelty of RPSEMD can be summarized into three aspects. First, a general-form sinusoid is used as the auxiliary signal: s k (t|a k , f k , θ k ) = a k cos(2π f k t + θ k ) where k represents the k th stage of decomposition to extract the IMF c k (t); a k , f k , and θ k denote the amplitude, frequency, and phase, respectively. Second, the sinusoid-assisted signal s k (t) is designed based on the IMF selected after IMF clustering analysis. This active way ensures that the RPSEMD is deterministic and is able to solve the mode mixing problem more efficiently. Third, s k (t) is shifted by θ k to change the positions of the extreme point, not only helping to retain more details of the separated IMFs, but also ensuring s k (t) is completely canceled out in the final results. The specific steps of RPSEMD are summarized as follows [11]: Step 1. Initialize k = 1.
Step 2. Apply EMD to x(t) and then determine a k and f k with the resulting IMFs. θ ki is acquired by uniformly sampling in [0, 2π] with the phase shifting number I(1 ≤ i ≤ I). After this, Step 3. The EMD of x(t) + s k (t|a k , f k , θ ki ) is performed, which aims to obtain the first IMF. The final IMF c k (t) is calculated by averaging all these first IMFs.
Step 5. Repeat Steps 2-4 until no more IMF can be obtained. Consequently, the final x(t) is regarded as a residue r(t).

Selecting the Parameters of s k (t)
The mode mixing problem occurs because the extreme points of the IMF in the signal are unevenly distributed. Therefore, the target of adding s k (t) is to imitate a homogenous IMF.
(1) Determining a k and f k of s k (t): As described above, the first task is to find the target IMF to be imitated. Equation (2) shows that an IMF is generally a composition of J IMFs if the mode mixing problem exists. In this sense, the initial IMFs by EMD, denoted as ci k (t) (1 ≤ k ≤ K ), can be used as a reference to obtain the target IMF.
a. Obtain the target IMF to be imitated: Since EMD sifts a signal from high-frequency to low-frequency, we analyze the IMFs of ci 1 (t) and pick out the one with the highest frequency as the target IMF. However, x(t) is unknown, and the IMFs of ci 1 (t) are quite arbitrary, while the only certainty is that the frequencies of different IMFs are constrained by Equation (3). In view of these issues, clustering analysis is introduced to cluster the instantaneous frequencies of the extreme in ci 1 (t), to classify the IMFs of ci 1 (t). Wang et al. [11] used hierarchical clustering since it is a nonparametric approach. Specifically, based on the instantaneous frequencies of the extreme, the hierarchical clustering is executed as follows: first, the Euclidean distances between instantaneous frequencies are calculated, then a tree of hierarchical clusters with these distances is created, and finally, the clusters are constructed.
As the number of the clusters P is unknown, we start with P = K to ensure P ≥ J and iteratively reduce P by P m , until P m is zero. Finally, the clusters whose mean frequencies do not satisfy Equation (3) are excluded when comparing with that of s k−1 (t) obtained in the (k − 1) th stage of decomposition.
b. Determining a k and f k : The final P clusters are treated as P IMFs with amplitudes ac p (1 ≤ p ≤ P) and mean frequencies f c p , and the one with the highest frequency, say the p 0th cluster, is determined as the target IMF. We then set f k = f c p 0 and a k = ac p 0 .
Two adjustments of a k are executed based on Equations (3) and (4) to ensure the added s k (t) can be separated from the IMFs that are much easier to be mixed with s k (t), which are described as follows.
First, to separate from the other IMFs in ci 1 (t), the current a k is increased to max(a k , max(ac p × f c p / f k )) according to Equation (4). Second, to separate all IMFs in ci 2 (t), a k is further adjusted as a k + a * k if a k , f k , and a * k , f * k do not satisfy Equations (3) and (4), where a * k and f * k are obtained using the above procedure of obtaining a k and f k for ci 2 (t). This adjustment means merging the target IMF because it is judged to be split into ci 1 (t) and ci 2 (t). The algorithm of determining a k and f k is summarized below: Step 1. For the extreme of ci 1 (t), get their instantaneous amplitudes ai 1 (e) and instantaneous frequencies f i 1 (e), where e indicates the index of an extreme. Step 2. Repeatedly classify f i 1 (e) into P clusters by adjusting P ← P − P m until any two clusters satisfy Equation (3).
Step 3. Find the p 0th cluster and set f k = f c p 0 and a k = ac p 0 .
Step 4. Adjust a k to ensure s k (t) can be separated from the IMFs clustered in ci 1 (t) and ci 2 (t).
(2) Determining θ ki of s k (t): Including the phase θ ki in s k (t) has two intensions: first, make that the auxiliary sinusoid be added complementarily in pairs to make sure they can be eliminated ultimately; second, change the relative positions of the extreme between the added sinusoid and the signal to retain largely the details of the separated IMFs. The second intension is the response to the fact that, occasionally, some extreme describing the details of an IMF may be hidden and cannot be detected after adding a sinusoid; while shifting the sinusoid by a phase can make those extremes appear.
The solution of shifted phase θ k can be defined as follows: where N denotes the number of shifts. To achieve the first intention, N must be an even number so that θ ki+N/2 = θ ki + π(0 ≤ i ≤ N/2) and thus s ki+N/2 (t) = −s ki (t). For the second intention, we set N = 2R(1/2 f 1 ) where f 1 comes from s 1 (t) in the first stage of decomposition and R(·) is a rounding operation. We select f 1 to determine N to balance the calculation and the ability of retaining the details of IMFs, because usually, the details of a high-frequency IMF are much easier to lose. The link of the RPSEMD code is: http://www3.ntu.edu.sg/home/mkmqian/RPSEMD.htm.

Traditional Symbolization
Symbolization plays an important role in symbolic dynamic analysis. The symbolic procedure inevitably leads to the loss of part of statistical information; however, it simplifies time series analysis and contributes to dynamic complexity detection by extracting symbolic dynamic information.
A symbolization in the works of Kurths et al. [29], using typical local dynamic symbolization, conducts symbolic transformation by comparing relationships between adjacent symbols. Given univariate time series Y = {y i , i = 1, . . . , N}, traditional symbolization, being described as especially reflecting the dynamical properties of the record [29], transforms time series into a symbol sequence as Equation (7): 0 : ∆y ≥ 1.5σ ∆ 1 : ∆y > 0 and ∆y ≤ 1.5σ ∆ 2 : ∆y > −1.5σ ∆ and ∆y < 0 3 : ∆y ≤ −1.5σ ∆ Traditional symbolic transformation refines the differences between neighboring elements, but it only considers two adjacent values and lacks flexibility due to the fixed 1.5σ ∆ .

Differential Symbolization
Taking the relationships of three consecutive elements into account, Yao et al. [17,18] proposed differential symbolic transformation with a flexible controlling parameter. The complexity detection of this symbolization is attributed to detailed local dynamic information. Considering univariate time series Y = {y i , i = 1, . . . , N}, the differences between the current element and its forward and backward ones are expressed as D 1 = y(i) − y(i − 1) and D 2 = y(i + 1) − y(i) . The four-symbol differential symbolization with the controlling parameter α is obtained by the following formula: where di f f = D 1 − D 2 and var = (D 2 1 + D 2 2 )/2. The symbolization in Equation (8) takes advantage of more detailed local information of complexity measures than traditional symbolic transformation.
Construction of symbol sequences, or words, is the next step by collecting groups of symbols together in temporal order. This coding process is to create symbol templates or words with finite symbols and has some similarities to embedding theory for phase space construction [18]. The symbol sequence will be coded into m-bit series C(i), and there are 4 m symbols in coded series considering the four-symbol differential symbolization. Taking 3-bit coding as an example, the code for 'αβγ' can be c(i) = α · n 2 + β · n + γ, where n = 4. The procedure of symbolization and coding is illustrated in Figure 1. The probability of each code symbol is P(π) = [p(π 1 ), p(π 2 ), . . . , p(π 4 m )].
Taking the relationships of three consecutive elements into account, Yao et al. [17,18] proposed differential symbolic transformation with a flexible controlling parameter. The complexity detection of this symbolization is attributed to detailed local dynamic information. Considering univariate time series The symbolization in Equation (8) takes advantage of more detailed local information of complexity measures than traditional symbolic transformation. Construction of symbol sequences, or words, is the next step by collecting groups of symbols together in temporal order. This coding process is to create symbol templates or words with finite symbols and has some similarities to embedding theory for phase space construction [18]. The symbol sequence will be coded into m-bit series ( ) C i , and there are 4 m symbols in coded series considering the four-symbol differential symbolization. Taking 3-bit coding as an example, the code for ' αβγ ' can be α β γ The procedure of symbolization and coding is illustrated in Figure 1. The probability of each code symbol is π π π π =  Process of symbolization and coding (data and symbols in virtual frame will not be symbolized or encoded). In the creation of code series, symbol length m is three. The first and last elements will not be transformed according to the determination of symbolization, and the last n − 1 bit symbols are not encoded for this encoding process.
Finally, DSE is obtained by computing the Shannon entropy from the probability distribution for all the words, and then, it is normalized by its highest value, i.e., 2 log 4 m , such that: Figure 1. Process of symbolization and coding (data and symbols in virtual frame will not be symbolized or encoded). In the creation of code series, symbol length m is three. The first and last elements will not be transformed according to the determination of symbolization, and the last n − 1 bit symbols are not encoded for this encoding process.
Finally, DSE is obtained by computing the Shannon entropy from the probability distribution for all the words, and then, it is normalized by its highest value, i.e., log 2 4 m , such that:

Mutual Information
Originating from the classic and profoundly influential work by Shannon, the MI between discrete random variables X and Y is defined as [30]: where p(x, y) is the joint probability distribution function of x and y, p(x) and p(y) are the marginal probability distribution functions of x and y, respectively.
)dxdy (11) In probability theory and information theory, the mutual information of two random variables represents a measure of the interdependence of variables. If X and Y are independent, MI(X; Y) = 0.
Furthermore, the MI can also be expressed as [31]: where H(X) and H(Y) are information entropy, H(X|Y) and H(Y|X) are conditional entropy, and H(X, Y) is the joint entropy of X and Y.
Assume that the original signal x(t) is decomposed by RPSEMD to obtain K IMFs (for convenience, the residue r(t) is counted as the last IMF), which are denoted as I MF 1 (t), I MF 2 (t), . . . , I MF i (t), . . . , I MF K (t), respectively. Therefore, the normalized MI (norMI) between the i th IMF and the original signal can be expressed as: For convenience, the norMI used in the following test represents the norMI between the IMF and the original signal, unless otherwise specified.

The Proposed Feature Extraction Method
The flowchart of the proposed feature extraction method is shown in Figure 2.
Step 1. The three types of recorded ship-radiated noise are normalized.
Step 2. The ship-radiated noise is decomposed into a series of IMFs by RPSEMD.
Step 3. Calculate the DSE of each IMF.
Step 4. The MIs between each IMF and the original signal are calculated, and then, the sum of all MIs is used as the denominator to calculate the normalized value of each MI, expressed as norMI.
Step 5. The norMI is used as the weight coefficient to weight the corresponding DSE, and the feature vector WDSE is obtained. Step 6. The feature vector WDSE is input into the support vector machine for classification.

Performance Analysis of EMD, EEMD, and RPSEMD
To verify that RPSEMD can alleviate the mode mixing problem, we give an example here. The simulation signals are as follows: where 1 s , 2 s , and 3 s represent the three components of S , and the sampling frequency is 10 kHz. EMD, EEMD, and PRSEMD are used to decompose S . The simulation signals and the decomposition results are presented in Figure 3. According to the literature [26], the amplitude of the noise is set as 0.3, and the ensemble size is 100 for the EEMD method. It can be seen from Figure  3b that the IMF1 and IMF2 of the EMD have obvious mode mixing. EEMD alleviates the mode mixing to a certain extent, but the original signal is decomposed into nine mode components, of which IMF6-IMF9 have no practical physical meaning. Compared with EEMD, there is no need to

Performance Analysis of EMD, EEMD, and RPSEMD
To verify that RPSEMD can alleviate the mode mixing problem, we give an example here. The simulation signals are as follows: where s1, s2, and s3 represent the three components of S, and the sampling frequency is 10 kHz. EMD, EEMD, and PRSEMD are used to decompose S. The simulation signals and the decomposition results are presented in Figure 3. According to the literature [26], the amplitude of the noise is set as 0.3, and the ensemble size is 100 for the EEMD method. It can be seen from Figure 3b that the IMF1 and IMF2 of the EMD have obvious mode mixing. EEMD alleviates the mode mixing to a certain extent, but the original signal is decomposed into nine mode components, of which IMF6-IMF9 have no practical physical meaning. Compared with EEMD, there is no need to select parameters for RPSEMD, so it has better practicability. Figure 3d manifests that the original signal is decomposed into three IMFs by PRSEMD, and these IMFs are very close to the three components of the original signal.
Consequently, compared with EMD and EEMD, RPSEMD better optimizes the mode mixing problem. Further, the energy of the signal can be calculated by: where N represents the length of the signal and p i denotes the amplitude of the i th sample point. The energy of simulation signals and IMFs is shown in Tables 1 and 2, respectively. It can be seen that the energy of the IMFs of RPSEMD is equal to the simulation signal. The results demonstrate that it is feasible to use RPSEMD as the signal decomposition method in this paper.
Entropy 2018, 20, x FOR PEER REVIEW 9 of 17 select parameters for RPSEMD, so it has better practicability. Figure 3d manifests that the original signal is decomposed into three IMFs by PRSEMD, and these IMFs are very close to the three components of the original signal. Consequently, compared with EMD and EEMD, RPSEMD better optimizes the mode mixing problem. Further, the energy of the signal can be calculated by: where N represents the length of the signal and i p denotes the amplitude of the i th sample point.
The energy of simulation signals and IMFs is shown in Tables 1 and 2, respectively. It can be seen that the energy of the IMFs of RPSEMD is equal to the simulation signal. The results demonstrate that it is feasible to use RPSEMD as the signal decomposition method in this paper.

Parameter Selection of DSE
Ship-radiated noise has obvious chaotic characteristics. In order to analyze the influence of DSE parameters on the entropy value, three typical chaotic signals are selected for simulation experiments. These chaotic signals can be expressed as follows [32]: (1) Henon mapping: x(n + 1) = 1 − 1.4x(n) 2 + y(n) y(n + 1) = 0.3x(n) , and the initial condition is In this paper, we analyze the data points in the y-direction.
(2) Rossler system: , the initial condition is , and the integral step size is 0.05. In this paper, we analyze the data points in the x-direction.
(3) Mackey-Glass signal: The original signals are taken from 5000 points of the above three types of simulation signals. The original signals are normalized to get the time-domain waveform shown in Figure 4a. We select different DSE parameters to calculate the entropy of simulated signals. In this section, we mainly study the influence of symbol length m and the adjustment factor α on the differential symbolic entropy. The symbol length is set from 1-5 with a step size of one, and the adjustment factor is set from 0.1-0.8 with a step size of 0.01. The DSE entropy under different parameters is shown in Figure 4b.

Parameter Selection of DSE
Ship-radiated noise has obvious chaotic characteristics. In order to analyze the influence of DSE parameters on the entropy value, three typical chaotic signals are selected for simulation experiments. These chaotic signals can be expressed as follows [32]: (1) Henon mapping: x n x n y n y n x n , and the initial condition is In this paper, we analyze the data points in the y-direction.
(2) Rossler system: , the initial condition is , and the integral step size is 0.05. In this paper, we analyze the data points in the x-direction.
(3) Mackey-Glass signal: The original signals are taken from 5000 points of the above three types of simulation signals. The original signals are normalized to get the time-domain waveform shown in Figure 4a. We select different DSE parameters to calculate the entropy of simulated signals. In this section, we mainly study the influence of symbol length m and the adjustment factor α on the differential symbolic entropy. The symbol length is set from 1-5 with a step size of one, and the adjustment factor is set from 0.1-0.8 with a step size of 0.01. The DSE entropy under different parameters is shown in Figure 4b.  It can be seen from Figure 4b that when m is equal to one, the entropy of the three types of chaotic signals is closest. When m is greater than one, these signals can be better distinguished. When m is fixed, the value of α has no significant effect on the entropy of the Rossler signal and the Mackey-Glass signal. For the Henon mapping, as α increases, its entropy increases first and then decreases, and the maximum is obtained around 0.6. In conclusion, the value of m has little influence on DSE. When α is equal to 0.6, DSE can be used to classify the three types of chaotic signals better. Considering the calculation speed and the stability of DSE, we set m and α to three and 0.6, respectively.
It can also be seen from Figure 4b that the Henon mapping has the largest DSE entropy, which means it is more complex. The DSE of the Rossler signal are minimum, which means that the Rossler signal has regularity to some extent. The DSE of the Mackey-Glass signal are a bit lower than those of the Henon map, indicating that the Mackey-Glass signal also has higher complexity. It can be seen from Figure 4b that when m is equal to one, the entropy of the three types of chaotic signals is closest. When m is greater than one, these signals can be better distinguished. When m is fixed, the value of α has no significant effect on the entropy of the Rossler signal and the Mackey-Glass signal. For the Henon mapping, as α increases, its entropy increases first and then decreases, and the maximum is obtained around 0.6. In conclusion, the value of m has little influence on DSE. When α is equal to 0.6, DSE can be used to classify the three types of chaotic signals better. Considering the calculation speed and the stability of DSE, we set m and α to three and 0.6, respectively.
It can also be seen from Figure 4b that the Henon mapping has the largest DSE entropy, which means it is more complex. The DSE of the Rossler signal are minimum, which means that the Rossler signal has regularity to some extent. The DSE of the Mackey-Glass signal are a bit lower than those of the Henon map, indicating that the Mackey-Glass signal also has higher complexity. The results demonstrate that DSE can discriminate the different simulation signals and can be used to quantify the information content of nonlinear time series.
To illustrate the influence of sampling points on entropy, we calculate DSE at different sampling points. Figure 5 shows the DSE under the different data points of these signals; when the number of sampling points is less than 500, the DSE fluctuates greatly; when the number of sampling points is greater than 500, the DSE is more stable. Therefore, when the number of sampling points of the signal is greater than 500, the result of DSE is reliable. The results demonstrate that DSE can discriminate the different simulation signals and can be used to quantify the information content of nonlinear time series.
To illustrate the influence of sampling points on entropy, we calculate DSE at different sampling points. Figure 5 shows the DSE under the different data points of these signals; when the number of sampling points is less than 500, the DSE fluctuates greatly; when the number of sampling points is greater than 500, the DSE is more stable. Therefore, when the number of sampling points of the signal is greater than 500, the result of DSE is reliable.

Analysis of Ship-Radiated Noise Based on RPSEMD, MI, and DSE
In this paper, all data of ship-radiated noise come from the official website of the National Park Service (available at http://www.nps.gov/glba/naturescience/soundclips.htm). Three different types of ship-radiated noise are selected as sample data, namely ferry, cruise ship, and freighter. For convenience, we named the three ship-radiated noise as Ship-I, Ship-II, and Ship-III, respectively. Each type of underwater acoustic signals has 30 sample data. Each sample length is 5000 points, and the sampling frequency is 44.1 kHz. The samples are normalized to get the time-domain waveform of three types of ship-radiated noise shown in Figures 6a, 7a, and 8a, respectively. The abscissa represents the sampling point, and the ordinate represents the normalized amplitude. The RPSEMD decomposition results of the three types of ship-radiated noise are shown in Figures 6b,  7b, and 8b, respectively.

Analysis of Ship-Radiated Noise Based on RPSEMD, MI, and DSE
In this paper, all data of ship-radiated noise come from the official website of the National Park Service (available at http://www.nps.gov/glba/naturescience/soundclips.htm). Three different types of ship-radiated noise are selected as sample data, namely ferry, cruise ship, and freighter. For convenience, we named the three ship-radiated noise as Ship-I, Ship-II, and Ship-III, respectively. Each type of underwater acoustic signals has 30 sample data. Each sample length is 5000 points, and the sampling frequency is 44.1 kHz. The samples are normalized to get the time-domain waveform of three types of ship-radiated noise shown in Figures 6a, 7a and 8a, respectively. The abscissa represents the sampling point, and the ordinate represents the normalized amplitude. The RPSEMD decomposition results of the three types of ship-radiated noise are shown in Figures 6b, 7b and  The results demonstrate that DSE can discriminate the different simulation signals and can be used to quantify the information content of nonlinear time series.
To illustrate the influence of sampling points on entropy, we calculate DSE at different sampling points. Figure 5 shows the DSE under the different data points of these signals; when the number of sampling points is less than 500, the DSE fluctuates greatly; when the number of sampling points is greater than 500, the DSE is more stable. Therefore, when the number of sampling points of the signal is greater than 500, the result of DSE is reliable.

Analysis of Ship-Radiated Noise Based on RPSEMD, MI, and DSE
In this paper, all data of ship-radiated noise come from the official website of the National Park Service (available at http://www.nps.gov/glba/naturescience/soundclips.htm). Three different types of ship-radiated noise are selected as sample data, namely ferry, cruise ship, and freighter. For convenience, we named the three ship-radiated noise as Ship-I, Ship-II, and Ship-III, respectively. Each type of underwater acoustic signals has 30 sample data. Each sample length is 5000 points, and the sampling frequency is 44.1 kHz. The samples are normalized to get the time-domain waveform of three types of ship-radiated noise shown in Figures 6a, 7a, and 8a, respectively. The abscissa represents the sampling point, and the ordinate represents the normalized amplitude. The RPSEMD decomposition results of the three types of ship-radiated noise are shown in Figures 6b,  7b, and 8b, respectively.   Figure 9 shows the DSE of different sampling point numbers, whose entropy gradually stabilizes with the increase of the number of sampling points. When the number of sampling points reaches 500, the DSE entropy of the three types of ships-radiated noise are stable at around 0.9. It can be found that DSE is sensitive to noise, so these signals have a large entropy. Therefore, it is difficult to distinguish these three types of ships if the DSE of the original signal is directly used as the feature vector. It can be seen from Figures 6b, 7b, and 8b that the ship-radiated noise is decomposed into a series of IMFs by RPSEMD, and the frequency of the IMF decreases sequentially with the order of the modes. For different ship-radiated noise, the number of IMFs decomposed by RPSEMD is different. The norMI and DSE for each ship-radiated noise are listed in Figure 10.  Figure 9 shows the DSE of different sampling point numbers, whose entropy gradually stabilizes with the increase of the number of sampling points. When the number of sampling points reaches 500, the DSE entropy of the three types of ships-radiated noise are stable at around 0.9. It can be found that DSE is sensitive to noise, so these signals have a large entropy. Therefore, it is difficult to distinguish these three types of ships if the DSE of the original signal is directly used as the feature vector. It can be seen from Figures 6b, 7b, and 8b that the ship-radiated noise is decomposed into a series of IMFs by RPSEMD, and the frequency of the IMF decreases sequentially with the order of the modes. For different ship-radiated noise, the number of IMFs decomposed by RPSEMD is different. The norMI and DSE for each ship-radiated noise are listed in Figure 10.  Figure 9 shows the DSE of different sampling point numbers, whose entropy gradually stabilizes with the increase of the number of sampling points. When the number of sampling points reaches 500, the DSE entropy of the three types of ships-radiated noise are stable at around 0.9. It can be found that DSE is sensitive to noise, so these signals have a large entropy. Therefore, it is difficult to distinguish these three types of ships if the DSE of the original signal is directly used as the feature vector.  Figure 9 shows the DSE of different sampling point numbers, whose entropy gradually stabilizes with the increase of the number of sampling points. When the number of sampling points reaches 500, the DSE entropy of the three types of ships-radiated noise are stable at around 0.9. It can be found that DSE is sensitive to noise, so these signals have a large entropy. Therefore, it is difficult to distinguish these three types of ships if the DSE of the original signal is directly used as the feature vector. It can be seen from Figures 6b, 7b, and 8b that the ship-radiated noise is decomposed into a series of IMFs by RPSEMD, and the frequency of the IMF decreases sequentially with the order of the modes. For different ship-radiated noise, the number of IMFs decomposed by RPSEMD is different. The norMI and DSE for each ship-radiated noise are listed in Figure 10. It can be seen from Figures 6b, 7b and 8b that the ship-radiated noise is decomposed into a series of IMFs by RPSEMD, and the frequency of the IMF decreases sequentially with the order of the modes. For different ship-radiated noise, the number of IMFs decomposed by RPSEMD is different. The norMI and DSE for each ship-radiated noise are listed in Figure 10. In Figure 10, the abscissa represents the IMF ordinal number of ship-radiated noise, the red circle represents the norMI of IMF, and the blue triangle represents the DSE of IMF. It can be found that DSE decreases with the increase of IMF, indicating that the higher the order of IMF, the lower the complexity. Consequently, if only DSE is used as the feature vector, it is arduous to distinguish the three types of ship-radiated noise.
Li et al. [21] used the IMF with the highest energy as the principal IMF (PIMF) and used the permutation entropy of PIMF as the feature vector (namely EMD-PIMF-PE) to realize the classification of three types of ship-radiated noise. In Figure 10, the DSE of the IMF with the largest norMI is marked with a green dashed line, and the DSE of the IMF with the largest norMI is also listed in Table 3. It can be seen that for Ship-I and Ship-II, the DSE of the IMF with the largest norMI are around 0.27, and the values are quite close. Therefore, it may be arduous to distinguish between Ship-I and Ship-II using this method, which will be proven later.

Parameter
Ship-I Ship-II Ship-III The index of the IMF with the largest norMI 7 5 3 The DSE of the IMF with the largest norMI 0.2802 0.2664 0.7116 Therefore, we consider using norMI as the weight, weighting the corresponding DSE and using the sum of the weighted DSE (WDSE) as the feature vector, giving the formula as follows: where i norMI represents the norMI of the i th IMF, i DSE denotes the DSE of the i th IMF, and K is the number of IMFs. The WDSE of the three ships' radiated noise are listed in Table 4. It can be found that the difference between the feature vectors of Ship-I and Ship-II becomes larger.  In Figure 10, the abscissa represents the IMF ordinal number of ship-radiated noise, the red circle represents the norMI of IMF, and the blue triangle represents the DSE of IMF. It can be found that DSE decreases with the increase of IMF, indicating that the higher the order of IMF, the lower the complexity. Consequently, if only DSE is used as the feature vector, it is arduous to distinguish the three types of ship-radiated noise.
Li et al. [21] used the IMF with the highest energy as the principal IMF (PIMF) and used the permutation entropy of PIMF as the feature vector (namely EMD-PIMF-PE) to realize the classification of three types of ship-radiated noise. In Figure 10, the DSE of the IMF with the largest norMI is marked with a green dashed line, and the DSE of the IMF with the largest norMI is also listed in Table 3. It can be seen that for Ship-I and Ship-II, the DSE of the IMF with the largest norMI are around 0.27, and the values are quite close. Therefore, it may be arduous to distinguish between Ship-I and Ship-II using this method, which will be proven later. Therefore, we consider using norMI as the weight, weighting the corresponding DSE and using the sum of the weighted DSE (WDSE) as the feature vector, giving the formula as follows: where norMI i represents the norMI of the i th IMF, DSE i denotes the DSE of the i th IMF, and K is the number of IMFs. The WDSE of the three ships' radiated noise are listed in Table 4. It can be found that the difference between the feature vectors of Ship-I and Ship-II becomes larger.

Feature Extraction
To verify the effectiveness of the proposed feature extraction method, 30 samples of each type of ship-radiated noise are selected. The WDSE distribution of the three types of ship-radiated noise is shown in Figure 11a. The abscissa represents the number of samples, and the ordinate represents the feature vector WDSE. It can be seen that the WDSE of Ship-III is the largest, and the WDSE of Ship-II is the smallest. The results demonstrate that the WDSE value is at the same level for the same ships, but there is an obvious difference for different types of ships. The above results manifest that the proposed feature extraction method can distinguish three types of ship-radiated noise.

Feature Extraction
To verify the effectiveness of the proposed feature extraction method, 30 samples of each type of ship-radiated noise are selected. The WDSE distribution of the three types of ship-radiated noise is shown in Figure 11a. The abscissa represents the number of samples, and the ordinate represents the feature vector WDSE. It can be seen that the WDSE of Ship-III is the largest, and the WDSE of Ship-II is the smallest. The results demonstrate that the WDSE value is at the same level for the same ships, but there is an obvious difference for different types of ships. The above results manifest that the proposed feature extraction method can distinguish three types of ship-radiated noise.
In order to prove the superiority of the proposed method, the DSE of original ship-radiated noise, the EMD-PIMF-PE method [21], and the DSE of the IMF with the largest norMI (IMF-norMI-DSE) are taken as the feature vector of ship-radiated noise, respectively. According to literature [21], the embedding dimension and time delay of permutation entropy are set as four and one, respectively. As shown in Figure 11b, due to the influence of the marine environment, the DSEs of the three types of ship-radiated noise are basically between 0.9 and 0.94. Therefore, it is not feasible to distinguish these ships directly by using DSE. It can be seen from Figure 11c that the EMD-PIMF-PE method can basically distinguish between Ship-II and Ship-III, but there is a large overlap between Ship-I and Ship-II. Figures 11c and 11d have similar results. It can be seen that the feature vector of Ship-III is the largest, and the feature vector of Ship-II is the smallest. This result is consistent with the proposed method in this paper. Compared with the proposed method, the IMF-norMI-DSE method is completely unable to identify Ship-I because their DSE values fluctuate greatly.

Classification
To realize the automatic identification of ship-radiated noise, the extracted features are input In order to prove the superiority of the proposed method, the DSE of original ship-radiated noise, the EMD-PIMF-PE method [21], and the DSE of the IMF with the largest norMI (IMF-norMI-DSE) are taken as the feature vector of ship-radiated noise, respectively. According to literature [21], the embedding dimension and time delay of permutation entropy are set as four and one, respectively. As shown in Figure 11b, due to the influence of the marine environment, the DSEs of the three types of ship-radiated noise are basically between 0.9 and 0.94. Therefore, it is not feasible to distinguish these ships directly by using DSE. It can be seen from Figure 11c that the EMD-PIMF-PE method can basically distinguish between Ship-II and Ship-III, but there is a large overlap between Ship-I and Ship-II. Figure 11c,d have similar results. It can be seen that the feature vector of Ship-III is the largest, and the feature vector of Ship-II is the smallest. This result is consistent with the proposed method in this paper. Compared with the proposed method, the IMF-norMI-DSE method is completely unable to identify Ship-I because their DSE values fluctuate greatly.

Classification
To realize the automatic identification of ship-radiated noise, the extracted features are input into the SVM for training and testing. For each type of ship-radiated noise, 10 samples are used as training samples, and the remaining 20 samples are used as test samples. To compare classification accuracy, the DSE of the original ship-radiated noise, the EMD-PIMF-PE method [21], and the IMF-norMI-DSE method are also used to classify ship-radiated noise. The SVM outputs of these four methods are shown in Figure 12, respectively, and the recognition rates are listed in Table 5. It can be found that the classification results of SVM are consistent with the feature extraction results in Section 6.1. For each type of ship-radiated noise, the DSE of the original signal is not completely classified correctly, and the classification accuracy is 48.3333%. The EMD-PIMF-PE method can classify Ship-II and Ship-III well, but cannot correctly identify Ship-I, and the classification accuracy is 70%. The IMF-norMI-DSE method is inferior to the EMD-PIMF-PE method, and the classification accuracy is 66.6667%. Compared with the other three methods, the classification accuracy of the proposed method reaches 98.3333%. The results indicate that the proposed method can better classify the three types of ship-radiated noise.

Conclusions
To improve the recognition accuracy of ship-radiated noise, a novel feature extraction method based on RPSEMD, MI, and DSE is proposed. The main findings in this paper are highlighted as follows: (1) A novel differential symbolic entropy for measuring the complexity of time series is introduced.
DSE not only has the advantage of high computational efficiency, but also has a significant effect on shorter time series. It was first applied to underwater acoustic signal processing. (2) Simulation experiments demonstrate that RPSEMD can better alleviate the mode mixing problem compared with EMD and EEMD. Therefore, this paper uses RPSEMD as a signal decomposition tool. (3) Compared with [21], it is often the case that only one IMF with the principal features is selected for feature extraction. In this paper, the entropy is weighted by norMI, so the importance of each IMF is considered. (4) The method proposed in this paper can extract the characteristics of ship-radiated noise more precisely and comprehensively, and the classification accuracy reaches 98.3333%.
Last but not least, this paper provides a new idea for feature extraction of nonlinear and non-stationary signals. For example, the weight coefficient can be replaced by the IMF's energy and other parameters. It is worth noting that although both EEMD and RPSEMD improve the mode mixing problem of EMD, RPSEMD is more efficient than EEMD and therefore has better utility. In the following work, we will use RPSEMD to reduce noise of underwater acoustic signal and compare it with EEMD.