A Comparative Study of Multiscale Sample Entropy and Hierarchical Entropy and Its Application in Feature Extraction for Ship-Radiated Noise

The presence of marine ambient noise makes it difficult to extract effective features from ship-radiated noise. Traditional feature extraction methods based on the Fourier transform or wavelets are limited in such a complex ocean environment. Recently, entropy-based methods have been proven to have many advantages compared with traditional methods. In this paper, we propose a novel feature extraction method for ship-radiated noise based on hierarchical entropy (HE). Compared with the traditional entropy, namely multiscale sample entropy (MSE), which only considers information carried in the lower frequency components, HE takes into account both lower and higher frequency components of signals. We illustrate the different properties of HE and MSE by testing them on simulation signals. The results show that HE has better performance than MSE, especially when the difference in signals is mainly focused on higher frequency components. Furthermore, experiments on real-world data of five types of ship-radiated noise are conducted. A probabilistic neural network is employed to evaluate the performance of the obtained features. Results show that HE has a higher classification accuracy for the five types of ship-radiated noise compared with MSE. This indicates that the HE-based feature extraction method could be used to identify ships in the field of underwater acoustic signal processing.


Introduction
Identification and classification of marine vehicles are important in the field of underwater signal processing, as they are of great value in the military and marine economy [1][2][3][4]. An important aspect of the ship classification problem is to extract effective features from received signals. Features extracted from a signal are the representation of part of the signal's characteristics. Insufficient characteristic reflection will lead to low accuracy of classification. Therefore, there is a great need for the development of feature extraction methods in the field of underwater signal processing.
The traditional feature extraction method is based on the frequency domain. There are many studies devoted to extracting the spectral characteristics of signals, such as the analysis of the power spectral density of signals [5]. However, studies show that traditional methods have shortcomings and limitations in practical applications. For example, the traditional spectrum-based method is based on the assumption of the linearity of the signals, which means the features extracted using this method will miss the signal's nonlinear characteristics [6]. In this paper, we use entropy as a feature extraction method, which is based on the time domain and quantifies the complexity of the signal as a feature.
The rest of this paper is organized as follows. Section 2 introduces sample entropy, multiscale sample entropy, and hierarchical entropy. Section 3 presents a simulation analysis comparing HE and MSE. In Section 4, five types of ship-radiated noise are given to reflect the difference between the two feature extraction methods. Finally, Section 5 is the conclusion.

Sample Entropy
Sample entropy quantifies a system's degree of regularity by calculating the negative natural logarithm of a conditional probability. It was developed by Richman and Moorman in 2000. Compared with approximate entropy, sample entropy eliminates the bias caused by self-matching; meanwhile, it also reduces the computational time. Given a time series {x(i) : 1 ≤ i ≤ N}, where N is the length of the series, it can be reconstructed into a set of vectors as follows: X(i) = [x(i), x(i + 1), . . . , x(i + m − 1)], 1 ≤ i ≤ N − m + 1, where m is the embedding dimension. For the N − m + 1 vectors obtained above, the distance d[X(i), X(j)] between any two vectors is defined as the maximum absolute difference of their corresponding elements:

d[X(i), X(j)] = max{ |x(i + k) − x(j + k)| : 0 ≤ k ≤ m − 1 }. (1)

Since the time series {x(i) : 1 ≤ i ≤ N} has already been given, the standard deviation (SD) of the time series can be readily obtained. Set the threshold to r = 0.1 SD ∼ 0.25 SD. With the distance d[X(i), X(j)], B_i^m(r) is given by:

B_i^m(r) = (number of j, j ≠ i, with d[X(i), X(j)] ≤ r) / (N − m − 1). (2)

Equation (2) computes the probability that the distance between X(i) and the remaining vectors lies within the threshold r. Moreover, the average of B_i^m(r) can be obtained by Equation (3):

B^m(r) = (1 / (N − m)) · Σ_{i=1}^{N−m} B_i^m(r). (3)

Increasing the embedding dimension from m to m + 1, B^{m+1}(r) can be obtained analogously by repeating the previous steps. Finally, the sample entropy SampEn(m, r, N) is given by:

SampEn(m, r, N) = −ln( B^{m+1}(r) / B^m(r) ). (4)

In order to better understand the calculation process of sample entropy, we briefly describe it through Figure 1.
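For concreteness, the procedure above can be sketched in Python (an illustrative, brute-force implementation; the function name `sample_entropy` and the vectorized match counting are ours, not the paper's):

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.15):
    """SampEn(m, r, N) of a 1-D series, with r = r_factor * SD,
    matching the parameter choices used in the text (m = 2, r = 0.15 SD)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    r = r_factor * np.std(x)

    def count_matches(dim):
        # Use the first N - m templates for both dim = m and dim = m + 1,
        # so the two counts are directly comparable; self-matches excluded.
        templates = np.array([x[i:i + dim] for i in range(N - m)])
        total = 0
        for i in range(len(templates)):
            # Chebyshev distance between template i and every template.
            d = np.max(np.abs(templates - templates[i]), axis=1)
            total += np.sum(d <= r) - 1  # "- 1" removes the self-match
        return total

    B = count_matches(m)       # pairs matching over m points
    A = count_matches(m + 1)   # pairs still matching over m + 1 points
    return -np.log(A / B)
```

A regular signal (e.g. a clean sinusoid) yields a small SampEn, while white noise yields a much larger one, which is the behavior the simulations below rely on.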
A time series {x(i) : 1 ≤ i ≤ 50} is given to illustrate the process for calculating SampEn(m, r, N). We specify m = 2 and r = 0.15 SD. The horizontal dashed lines around x(1), x(2), and x(3) represent x(1) ± r, x(2) ± r, and x(3) ± r, respectively. If the absolute difference between two points is less than r, these two points match each other; they can be viewed as "indistinguishable". In Figure 1, all of the points that match x(1), x(2), and x(3) are represented with the same symbol, respectively. Taking {x(1), x(2)} as a two-point template and {x(1), x(2), x(3)} as a three-point template, the number of times each template matches other sequences in the series can be counted. The ratio between the sum of two-point template matches and the sum of three-point template matches can then be obtained, and SampEn(m, r, N) is the natural logarithm of this ratio.
The value of SampEn(m, r, N) is related to the parameters m and r. Therefore, the choices of these two parameters are also very important. According to Chen's research [26], m is set to be one or two, and r = 0.1 SD ∼ 0.25 SD under most circumstances.

Multiscale Sample Entropy
Although SE has many advantages, in some circumstances, it cannot accurately reflect the complexity differences between different signals. The structure of signals generated from complex systems exhibits multiple temporal scale characteristics in the actual ocean environment. SE, as a single-scale-based method, does not account for the interrelationship between entropy and multiple scales. In order to overcome this shortcoming, Costa et al. developed the concept of multiscale sample entropy [15]. MSE can be viewed as SE with a coarse-graining process applied to the time series [27]. The coarse-graining process averages the samples inside moving, but non-overlapping, windows. For a given time series {x(i) : 1 ≤ i ≤ N}, the coarse-graining process is denoted as:

y^(n)(j) = (1/n) · Σ_{i=(j−1)n+1}^{jn} x(i), 1 ≤ j ≤ N_n, (5)

where N is the length of the time series and N_n = ⌊N/n⌋ stands for the largest integer no greater than N/n. Hence, MSE at scale n is obtained by calculating the sample entropy of y^(n). MSE focuses on the lower frequency components of a time series; however, it ignores the information contained in the higher frequency components of the signal. This problem led to the development of hierarchical entropy.
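The coarse-graining step can be sketched as follows (a minimal illustration; `coarse_grain` and `multiscale_entropy` are our own names, and any SampEn routine can be supplied as the `sampen` argument):

```python
import numpy as np

def coarse_grain(x, n):
    """Coarse-grained series y^(n): average x inside non-overlapping
    windows of length n; the result has floor(N / n) points."""
    N = len(x)
    Nn = N // n  # largest integer no greater than N / n
    return np.asarray(x[:Nn * n], dtype=float).reshape(Nn, n).mean(axis=1)

def multiscale_entropy(x, scales, sampen):
    """MSE curve: sample entropy of the coarse-grained series at each scale."""
    return [sampen(coarse_grain(x, n)) for n in scales]
```

Note that at scale n the series shrinks by a factor of n, which is why the data-length considerations discussed later matter for large scales.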

Hierarchical Entropy
Hierarchical entropy (HE) is an algorithm that quantifies the "complexity" of a time series based on SE and hierarchical decomposition. Unlike MSE, hierarchical decomposition takes both the higher and lower frequency components of a time series into consideration [18]. Specifically, for a given time series x = {x(i) : 1 ≤ i ≤ 2^n}, two operators Q0 and Q1 are defined as follows:

Q0(x)_j = (x(2j − 1) + x(2j)) / 2, 1 ≤ j ≤ 2^{n−1}, (6)

Q1(x)_j = (x(2j − 1) − x(2j)) / 2, 1 ≤ j ≤ 2^{n−1}. (7)

Q0(x) and Q1(x) are respectively the lower and higher frequency components of the time series x; their scale is two and their length is 2^{n−1}. As a matter of fact, x can be reconstructed from Q0(x) and Q1(x), since x(2j − 1) = Q0(x)_j + Q1(x)_j and x(2j) = Q0(x)_j − Q1(x)_j.
Q0(x)_j and Q1(x)_j stand for the j-th value in Q0(x) and Q1(x), respectively. Thus, Q0(x) and Q1(x) constitute the two-scale hierarchical decomposition of the time series x.
After we obtain Q0(x) and Q1(x), each of them can also be decomposed by Q0 and Q1. Consequently, we obtain the hierarchical decomposition of the time series x at a scale of three. The tree graph in Figure 2 clearly shows the relationship between the hierarchical components of the time series x. After the hierarchical decomposition, several sub-signals x^(n,e) are obtained, where n represents the scale and e stands for the e-th sub-signal at scale n. Calculating the SE of each sub-signal yields the HE result of x. It is important to choose an appropriate scale in different circumstances. On the one hand, high scales usually lead to computational redundancy; on the other hand, low scales may not provide sufficient accuracy in the computation of SampEn(m, r, N).
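The hierarchical decomposition described above can be sketched as follows (an illustrative implementation assuming the conventions Q0(x)_j = (x(2j−1) + x(2j))/2 and Q1(x)_j = (x(2j−1) − x(2j))/2; function names are ours):

```python
import numpy as np

def hierarchical_decompose(x, n_levels):
    """Hierarchical decomposition of x (length assumed a power of two).

    Q0 averages adjacent pairs (lower frequency part); Q1 takes half
    their difference (higher frequency part). After n_levels splits,
    2**n_levels sub-signals x^(n, e) are obtained; the sub-signal at
    index e corresponds to the operator sequence given by the binary
    expansion of e (index 0 is the repeated-Q0, lowest frequency path).
    """
    def Q0(s):
        return (s[0::2] + s[1::2]) / 2.0

    def Q1(s):
        return (s[0::2] - s[1::2]) / 2.0

    layer = [np.asarray(x, dtype=float)]
    for _ in range(n_levels):
        layer = [op(s) for s in layer for op in (Q0, Q1)]
    return layer
```

Computing SE on each returned sub-signal gives the HE values at the final scale; each split halves the sub-signal length, which is why the scale must be capped so that enough points remain for a stable SE estimate.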

Simulation Analysis of Different Signals Based on Hierarchical Entropy and Multiscale Sample Entropy
In this section, MSE and HE are compared using different simulation signals in order to illustrate their different characteristics. Before the simulation analysis, some preliminary steps need to be done. In this paper, all SE calculations use the same parameters, m = 2 and r = 0.15 SD, and the data length is at least 512 points for every SE calculation. The content of this section is divided into the following subsections. First, we show that the parameters chosen for calculating SE are appropriate. Second, AR signals of three different orders, with different complexities, are used to show that HE is an effective measure of complexity. Third, different simulation signals are constructed, and their HE and MSE results are compared. The results show that MSE attends mainly to the low-frequency components of the signal, while HE retains the information of both the low-frequency and high-frequency components. Finally, considering the noise interference in practical applications, we compare the robustness of the two methods to noise.

Parameter Selection for Sample Entropy
Both HE and MSE are based on SE. When we calculate the SE of a signal, it is important to choose appropriate values of m and r. Since our main purpose is to use entropy as a feature extraction method for ship-radiated noise, the simulation signals in this subsection are set as follows: in Equation (9), S_1(n) and S_2(n) are two sinusoidal signals mixed with Gaussian white noise. We use sinusoidal signals to simulate the periodic signal produced by the ship's engine or propeller, while Gaussian white noise is used to simulate the ambient noise. Since the composition of ship-radiated noise is very complex, including ambient noise, cavitation noise, and signals produced by the propellers and the engine, we simplify the model of ship-radiated noise as in Equation (9). The signal-to-noise ratio (SNR) is set to 5 dB, m = 2, and r = 0.15 SD. To demonstrate the impact of different data lengths on the calculation results, we calculated 60 sets of SE results at data lengths increasing in equal intervals from 150 to 3150, with each set containing 30 results. The result is shown in Figure 3. As the data length increases, the SE results gradually become stable. When the data length is too short, the SE results are too unstable to distinguish the sinusoidal signals of the two different frequencies very well. Although the result becomes more stable as the data length increases, considering the computational cost, the data length is unified to 512 points when calculating sample entropy in this paper. When we calculate HE in this paper, since the data length is 8192 points, we decompose the signal to a scale of five, which guarantees that every SE calculation contained in HE uses at least 512 points.
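Mixing a simulated signal with Gaussian white noise at a prescribed SNR, as done throughout this section, can be sketched as follows (an illustrative helper; the name `add_noise` is ours):

```python
import numpy as np

def add_noise(signal, snr_db, seed=0):
    """Mix Gaussian white noise into `signal` at the given SNR in dB,
    scaling the noise power relative to the measured signal power."""
    rng = np.random.default_rng(seed)
    signal = np.asarray(signal, dtype=float)
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))  # SNR = 10 log10(Ps / Pn)
    return signal + rng.standard_normal(len(signal)) * np.sqrt(p_noise)
```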
After selecting the appropriate data length, the same simulated signals in Equation (9) were used to examine the influence of m and r. From Figures 4 and 5, the SE results of the two signals are too close to distinguish when m = 3, and they become unstable when m is larger than three, so we set m = 2 in this paper. As for r, its value has little effect on the stability of the results, so we set r = 0.15 SD. The same parameters were also examined using the real ship-radiated noise employed in this paper [28], further verifying the conclusions in this section. The results are demonstrated in Figure 6. According to Figure 6, SE cannot distinguish certain types of ship-radiated noise very well. This is why we introduce HE as a new feature extraction method to help distinguish different signals.

Hierarchical Entropy Analysis for the AR Process
Three autoregressive (AR) processes with different orders are given to demonstrate that HE is an effective method for measuring the complexity of different signals. The AR time series are given by:

x(t) = Σ_{i=1}^{p} α_i · x(t − i) + n(t), (10)

where n(t) is Gaussian white noise with a standard normal distribution. The length of each AR process is 2^13. p indicates the order of the AR process, and α_i are the correlation coefficients. The value of α_i in each AR process is given in Table 1 according to [29]. The HE results of the three AR time series are illustrated in Figure 7; HE(n, e) stands for the e-th component of hierarchical entropy at scale n, and this abbreviation is used throughout this paper. The AR process specifies that the output value depends linearly on its own previous values and on a random term. The dependence of the output on previous terms increases as the order p increases. Furthermore, as the order p increases, the correlation of the signal increases accordingly, making the model more predictable [23,29]. That is, the complexity of AR(p + 1) is lower than that of AR(p). Based on this idea, the value of HE should be negatively correlated with the order p. Figure 7 depicts that the sample entropy of the lower frequency components decreases as the order p of the AR time series increases. Hence, HE is confirmed to be an effective method for measuring the complexity of different time series.
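Generating the AR test signals can be sketched as follows (an illustrative helper; the coefficients in the paper's Table 1 are not reproduced here, so the α values below are hypothetical placeholders):

```python
import numpy as np

def ar_process(alphas, length, seed=0):
    """Generate AR(p): x(t) = sum_i alphas[i] * x(t - i - 1) + n(t),
    with n(t) standard Gaussian white noise; no burn-in is discarded
    beyond the p zero-initialized starting values."""
    rng = np.random.default_rng(seed)
    p = len(alphas)
    x = np.zeros(length + p)
    noise = rng.standard_normal(length + p)
    for t in range(p, length + p):
        # x[t-p:t][::-1] is [x(t-1), x(t-2), ..., x(t-p)]
        x[t] = np.dot(alphas, x[t - p:t][::-1]) + noise[t]
    return x[p:]

# Hypothetical coefficients for illustration only -- not Table 1's values.
x1 = ar_process([0.5], 2 ** 13)
x2 = ar_process([0.5, 0.25], 2 ** 13)
```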

Properties for Multiscale Sample Entropy
In this section, a set of simulation signals is employed to demonstrate the key property of MSE: it focuses on the lower frequency components of the signal. This property means that MSE performs well in distinguishing signals with different low-frequency components. To highlight this property, a set of signals is given as follows: the lower frequency components of f_1(n) and f_2(n) are different, while the high-frequency components are the same. The waveforms of f_1(n) and f_2(n) are shown in Figure 8. According to the theory of MSE, MSE should be able to distinguish between the two signals very well, since the difference between them is mainly in the lower frequency components. Figure 9 shows the MSE results for f_1(n) and f_2(n) from scales 1-15. f_1(n) and f_2(n) can be distinguished by MSE easily, since the two signals' MSE curves differ greatly when the scale is greater than eight. Therefore, MSE performs well when distinguishing signals with different low-frequency components.

Properties for Hierarchical Entropy
According to the basic theory of hierarchical entropy, HE takes the higher frequency components of the signal into account in its calculation, while sample entropy and multiscale sample entropy do not. Consequently, hierarchical entropy performs better when measuring the complexity of signals whose information is stored in both the lower and higher frequency components. In order to illustrate this characteristic, a set of synthetic signals f_3(n) and f_4(n) is given, where:

f_4(n) = sin(2π · 5n) for 1 ≤ n ≤ (2^13 − 2^10), and sin(2π · 50n) for (2^13 − 2^10) + 1 ≤ n ≤ 2^13.
f_3(n) and f_4(n) are signals that contain both higher and lower frequency components. Part of the waveform of f_3(n) and f_4(n) is shown in Figure 10. It is obvious that the information stored in the lower frequency components is the same, while the information stored in the higher frequency components is different. According to the theory of sample entropy and multiscale sample entropy, only the lower frequency part is considered, which leads to lower accuracy when distinguishing different signals using SE or MSE. However, HE still measures the complexity of f_3(n) and f_4(n) very well, since it considers the information stored in the higher frequency components. The HE results of the two signals are displayed in Figure 11, and the numerical results of SE, MSE, and HE are shown in Table 2. Before interpreting the results, some abbreviations are explained: MSE(i) stands for the multiscale entropy of a signal at scale i, and HE(n, e) stands for the e-th component of the hierarchical entropy at scale n. These abbreviations are also used in the rest of this paper. According to the results displayed in Figure 11 and Table 2, the histogram at a scale of one is the sample entropy of the signal, and HE(i, 0) is equivalent to MSE(2^(i−1)). Based on this equivalence relationship between MSE and HE, the HE results of f_3(n) and f_4(n) illustrated in Figure 11 also include part of the MSE results. From Figure 11c, the HE results of the low-frequency components of the two signals differ little, but in some of the high-frequency components, the two signals can be successfully distinguished. That is to say, MSE cannot distinguish between signals that differ only in their high-frequency components. Hence, HE performs better than SE or MSE in distinguishing signals with different frequency content, especially when the information of the signal is mainly stored in the higher frequency components.

Feature Extraction Method Based on HE
The main steps of the feature extraction method based on HE are shown in Figure 12.

• Step 1: Five types of ship-radiated noise are given in this paper; choose an appropriate hierarchical decomposition order to guarantee that the length of each sub-signal is longer than 512 points.
• Step 2: By performing the hierarchical decomposition n times, 2^n sub-signals can be obtained, representing the lower and higher frequency components of the original signal, respectively.
• Step 3: Calculate the sample entropy of each sub-signal to obtain the HE result.
• Step 4: Flatten the HE matrix into a vector and pass the vector through an artificial neural network.
• Step 5: Obtain the classification results.
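Steps 1-4 can be sketched as follows, under one illustrative reading in which SE is computed at every node of the decomposition tree, so that five scales yield 2^5 − 1 = 31 features, matching the length-31 vectors used in the classification experiments (function names and the tree traversal are ours):

```python
import numpy as np

def he_feature_vector(x, n_levels, sampen):
    """Hierarchically decompose x, compute SE at every node of the tree
    (1 + 2 + ... + 2**(n_levels - 1) = 2**n_levels - 1 values), and
    flatten the result into one feature vector for the classifier."""
    def Q0(s):
        return (s[0::2] + s[1::2]) / 2.0

    def Q1(s):
        return (s[0::2] - s[1::2]) / 2.0

    features = []
    layer = [np.asarray(x, dtype=float)]
    for _ in range(n_levels):
        features.extend(sampen(s) for s in layer)  # SE of each node at this scale
        layer = [op(s) for s in layer for op in (Q0, Q1)]
    return np.array(features)
```

With an 8192-point segment and n_levels = 5, the deepest sub-signals are 512 points long, consistent with the minimum SE data length chosen earlier; the resulting vector would then be fed to the classifier (a probabilistic neural network in the paper).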

Feature Extraction of Ship-Radiated Noise Based on HE
In this section, five types of ship-radiated noise were employed for the feature extraction (the ship-radiated noise of Ships D and E can be obtained from https://www.nps.gov/glba/learn/nature/soundclips.htm). The sampling frequency of Ships A, B, and C was 52.7 kHz. As for Ships D and E, the sampling frequency was 44.1 kHz. Ship A was a cruise ship; the vessel was less than 50 m away from the hydrophone, and the hydrophone depth was 4.8 m. Ship B was an ocean liner; the vessel was less than 50 m away from the hydrophone, and the hydrophone depth was 5.8 m. Ship C was a motorboat; the distance between the vessel and the hydrophone changed from approximately 50 m to 100 m during the recording of the data.
The hydrophone depth was 5.8 m. Further information on Ships A, B, and C can be found in [30]. The recordings of Ships D and E were downloaded from a public website [31]. We chose a part of each signal and divided it into 100 segments. The length of each segment was 8192 sample points, namely 0.18 s of real-world data for Ships D and E and 0.15 s for Ships A, B, and C. We obtained 100 results for each type of ship-radiated noise by calculating the HE and MSE of every segment. The number of hierarchical decompositions was set to five. The waveforms of the five types of ship-radiated noise are demonstrated in Figure 13. Figure 14 gives the power spectral density analysis results of the five types of signals, from which much useful information can be obtained. The narrow-band spectral lines in Figure 14b,c make it easy to distinguish Ship B and Ship C. As for the remaining ships, Ships A, D, and E in Figure 14a,d,e, few spectral lines can be found to distinguish the different types of ship. Especially for Ships D and E, the lack of any evident distinction in their broadband spectral envelopes makes it difficult to distinguish these two types of ships accurately. Therefore, classifying these five ships using the spectrum as a feature is difficult. The HE results of the five types of ship-radiated noise are illustrated in Figure 15. In order to compare the performance of HE and MSE when both use the same data length for their sub-signals, Figure 16 shows the MSE results of the five types of ships from scales 1-16; when calculating HE at a scale of five, the length of each sub-signal is 512 points, the same as the coarse-grained series for MSE at a scale of 16.
Since it is difficult to see the differences between the five types of ship-radiated noise through Figure 15, part of the HE results is also shown numerically in Table 3, where HE(n) represents the HE result at scale n. According to the MSE result demonstrated in Figure 16, SE can only distinguish Ship C from the other types of ship. Throughout the MSE results from scales 1-16, the entropy differences between Ships A and D and between Ships B and E remained small. To evaluate the performance of the above-mentioned feature extraction methods quantitatively, the results of the two methods were separately classified and identified by a probabilistic neural network (PNN). Since the MSE results for the five types of ships were vectors of length 16, we fed the probabilistic neural network with these vectors to get the classification results. As for HE, we flattened the HE results from matrices into vectors of length 31, then fed the PNN with these vectors. The classification results are demonstrated in Tables 4-6. The training set for each type of ship contained 70 segments, and the test set contained 30. Before assessing the performance of the PNN, the definitions of "sensitivity" and "specificity" are given as follows:

sensitivity = TP / (TP + FN), specificity = TN / (TN + FP),

where TP, TN, FP, and FN are the abbreviations for "true positive", "true negative", "false positive", and "false negative", respectively. It is important to note that "accuracy" is the overall classification accuracy of the neural network, which is also the average of the per-class sensitivities. From Tables 4-6, it is obvious that HE was able to classify the five types of ships very well. Even for those types of ships that SE and MSE could not classify, their sensitivities in HE's results were very high. The accuracy of HE was 9.3% higher than that of MSE and 23.3% higher than that of SE.
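The sensitivity and specificity definitions above can be sketched as a one-vs-rest computation over predicted labels (an illustrative helper; names are ours):

```python
import numpy as np

def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP)
    for the class labeled `positive`."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    return tp / (tp + fn), tn / (tn + fp)
```

Averaging the per-class sensitivities over the five ship classes reproduces the overall accuracy figure used in the tables.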
In order to eliminate the impact of sampling frequency, we reduced the sampling frequency of Ships A, B, and C from 52.7 kHz to 44.1 kHz, calculated the HE results for the five types of ships, and passed the results through the PNN. The classification result is demonstrated in Table 7. From the table, we can see that the classification accuracy was 96%, very close to the accuracy obtained without reducing the sampling frequency. Moreover, we mixed the five types of ship-radiated noise with Gaussian white noise. The SNR was set to 5 dB, and the classification results are illustrated in Tables 8 and 9. According to these results, when noise was mixed into the ship-radiated noise, both HE and MSE were affected. However, even though the accuracy of both methods decreased, HE's accuracy remained higher than MSE's: the accuracy of HE decreased by 5.3% with added noise, while the accuracy of MSE decreased by 14.7% under the same conditions. Furthermore, even when the ship-radiated noise was mixed with noise, HE could still distinguish Ship C very well.

Conclusions
A new method for the feature extraction of ship-radiated noise based on hierarchical entropy was proposed in this paper. The simulation analysis indicated that HE has better performance than MSE when the differences between signals are mainly focused on their high-frequency components. Applying the two feature extraction methods to ship-radiated noise helped distinguish some signals that differed little in the frequency domain. Moreover, in order to compare the performance of HE and MSE, we passed the extracted features through a neural network, and the classification results showed that the classification accuracy of HE was higher than that of MSE. In summary, since HE considers more information, it can, as a new feature extraction method in the field of underwater acoustic signal processing, distinguish different signals better in most circumstances than traditional entropy-based methods such as MSE.
Author Contributions: W.L., X.S., and Y.L. conceived and designed the research, W.L. analyzed the data and wrote the manuscript. X.S. and Y.L. revised the manuscript. All authors have read and approved the final manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: