Underwater Acoustic Target Recognition: A Combination of Multi-Dimensional Fusion Features and Modified Deep Neural Network

Abstract: A method combining multi-dimensional fusion features and a modified deep neural network (MFF-MDNN) is proposed in this paper to recognize underwater acoustic targets. Because the underwater environment is complex and changeable, it is difficult to describe underwater acoustic signals with a single feature. The Gammatone frequency cepstral coefficient (GFCC) and modified empirical mode decomposition (MEMD) are therefore developed to extract multi-dimensional features. Moreover, to ensure the same time dimension, a dimension reduction method is proposed to obtain multi-dimensional fusion features from the original underwater acoustic signals. Then, to reduce redundant features and further improve recognition accuracy, the Gaussian mixture model (GMM) is used to modify the structure of a deep neural network (DNN). The proposed method obtains an accuracy of 94.3% within a maximum of 800 iterations when the dataset contains underwater background noise with weak targets. Compared with other methods, the recognition results demonstrate that the proposed method has higher accuracy and strong adaptability.


Introduction
With the development of sonar technology, underwater acoustic target recognition has become one of the major functions of sonar systems. It is extensively used for marine biological surveys, marine exploration, and other scientific activities. However, existing underwater acoustic target recognition still depends on the decisions of well-trained sonar operators, which makes continuous monitoring and recognition difficult to implement [1,2]. Therefore, underwater acoustic target recognition with high recognition accuracy and efficiency attracts extensive attention in both military and civil fields [3][4][5][6][7][8].
Currently, underwater acoustic target recognition comprises feature extraction and recognition. Feature extraction is the process of extracting various features from underwater acoustic signals [9,10]. The Mel filter bank has been widely used in feature extraction. It was designed to imitate the band-pass filtering behavior of the human ear, and it underlies most speech processing tasks, such as underwater acoustic target recognition and speaker recognition [11][12][13][14][15]. The cepstral or energy features obtained from the Mel filter bank are known as the Mel frequency cepstral coefficient (MFCC), which has been considered the baseline feature for most applications. Lim, T. et al. first introduced the MFCC to recognize underwater acoustic targets; experimental results demonstrated that the method is very promising for underwater acoustic target recognition [11]. Later, Lim and Bae et al. proposed a method which combined the MFCC with a neural network.

Gammatone Frequency Cepstral Coefficient

The DFT is obtained by the Fourier transform, as described by the following equation:

X(m) = ∑_{t=0}^{M_S−1} x(t) e^{−j2πmt/M_S}, m = 0, 1, …, M_S − 1

where x(t) is the original underwater acoustic signal and M_S is the needed number of DFT points.
The Gammatone filter refers to a causal filter with an infinitely long impulse response. In the filter bank, the time-domain impulse response of each Gammatone filter can be considered as the product of the Gammatone function and the acoustic signal, calculated as below:

g_i(k) = k^{n−1} e^{−2πB_i k} cos(2πf_i k + φ_i) u(k), 1 ≤ i ≤ N

where n is the order of the filter, B_i is the attenuation factor of the filter, f_i is the center frequency, φ_i is the phase of the filter, u(k) is the step function, and N is the total number of filters.
When extracting features from underwater acoustic signals, the bandwidth of each filter is determined by the critical band of the human ear to simulate human auditory characteristics. The critical frequency band is expressed as

ERB(f_i) = 24.7 (4.37 f_i / 1000 + 1), b_i = 1.019 ERB(f_i)

where b_i is the bandwidth of each sub-band filter in the Gammatone filter bank, obtained from the critical band. The energy spectrum E_S(i) is obtained by passing the DFT power spectrum through the i-th Gammatone filter, where Q is the number of filters and i = 1, 2, …, Q, bounded by the filter boundaries. The DCT is applied to the logarithm of the filter-bank energies, as described by the following equation:

GFCC(n) = ∑_{i=1}^{Q} log(E_S(i)) cos(πn(i − 0.5)/Q)

To verify the validity of the proposed multi-dimensional fusion features method, Figure 2 shows the time-domain waveform of the original underwater acoustic signals, which is a section of underwater mammal recordings in the dataset.
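The ERB bandwidth formula and the cepstral DCT above can be sketched numerically. This is a minimal illustration, not the paper's implementation: `erb_bandwidths`, `gfcc_from_energies`, and `n_coeffs` are illustrative names, and the full GFCC pipeline (framing, Gammatone filtering of the DFT spectrum) is omitted.

```python
import numpy as np

def erb_bandwidths(center_freqs_hz):
    """Critical-band bandwidth of each Gammatone sub-band filter:
    b_i = 1.019 * 24.7 * (4.37 * f_i / 1000 + 1)."""
    return 1.019 * 24.7 * (4.37 * np.asarray(center_freqs_hz) / 1000.0 + 1.0)

def gfcc_from_energies(filter_energies, n_coeffs=13):
    """DCT of the log filter-bank energies E_S(i), i = 1..Q."""
    q = len(filter_energies)
    log_e = np.log(np.asarray(filter_energies) + 1e-12)  # avoid log(0)
    n = np.arange(n_coeffs)[:, None]
    i = np.arange(1, q + 1)[None, :]
    basis = np.cos(np.pi * n * (i - 0.5) / q)  # DCT-II basis
    return basis @ log_e
```

For example, a filter centered at 1 kHz gets a bandwidth of about 135 Hz, which matches the commonly cited 1.019 ERB scaling.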
Figure 3 shows the feature extraction results of the MFCC and GFCC algorithms on the original underwater acoustic signals shown in Figure 2.
As depicted in Figure 3, because the MFCC pays more attention to semantic features, its feature extraction results on the original underwater acoustic signals are rough (Figure 3a). The GFCC features of the same signals are shown in Figure 3b; the GFCC removes considerable redundant noise information from underwater acoustic signals, improves robustness, and retains effective voiceprint features. Moreover, the extracted features are more precise and accurate.
To further compare the anti-noise performance of the GFCC and MFCC in the feature extraction process, Gaussian white noise was added to the signal in Figure 2. At a signal-to-noise ratio of 10 dB, the time-domain waveform of the noisy signal is shown in Figure 4. Figure 5 shows the feature extraction results of the MFCC and GFCC algorithms on the noisy underwater acoustic signals shown in Figure 4. It can be seen from Figure 5a,b that the GFCC algorithm has strong anti-noise performance and is superior for feature extraction of underwater acoustic targets, which benefits subsequent recognition research. By contrast, the features, distributions, and sizes extracted by the MFCC change significantly, which affects the accuracy of underwater acoustic target recognition.
Meanwhile, to further verify the effectiveness and adaptability of the GFCC, Table 1 shows the MFCC-GMM and GFCC-GMM recognition accuracy when the dataset did not contain underwater background noise with weak targets. As can be seen from Table 1, the recognition accuracy of the GFCC-GMM is higher than that of the MFCC-GMM; the GFCC describes underwater acoustic signals better. Therefore, the GFCC is more robust than the MFCC, and the voiceprint features it extracts have better adaptability for underwater acoustic target recognition.

Modified Empirical Mode Decomposition
MEMD is an improved empirical mode decomposition (EMD) algorithm. It is a signal decomposition method based on the local features of signals. The method exploits the advantages of the wavelet transform while avoiding the problems of selecting a wavelet basis and determining the decomposition scale [25]. Therefore, it is more suitable for non-linear and non-stationary signal analysis. MEMD is a self-adaptive signal decomposition method that can be employed for the analysis of underwater acoustic signals. Following empirical mode decomposition, it is assumed that every complex signal consists of simple intrinsic mode functions (IMFs) and that the IMFs are independent of each other. The algorithm flow of MEMD is shown in Figure 6. It decomposes the different scales or trend components of underwater acoustic signals step by step and produces a series of data sequences with the same feature scales. Compared with the original underwater acoustic signals, the decomposed sequences show stronger regularity.

MEMD calculates the x-coordinates of the interpolation points (IPs) by the following equation, where t_max and t_min contain the IPs of the local maxima and minima.
To extract the IE and IF, the IMFs are usually transformed by the HHT as below:

Ĥ(t) = (1/π) P ∫ H(τ)/(t − τ) dτ

where H(t) is the IMF and P denotes the Cauchy principal value. The analytic signal can be calculated as follows:

z(t) = H(t) + jĤ(t) = a(t) e^{jϑ(t)}

where a(t) is the modulus of the analytic signal and j is the imaginary unit. a(t) is described by the following equation:

a(t) = √(H²(t) + Ĥ²(t))

ϑ(t) is the phase of H(t); it can be calculated as below:

ϑ(t) = arctan(Ĥ(t)/H(t))

The IE a²(t) and the IF can be extracted by analyzing the signals. The IF is obtained by the following equation:

f(t) = (1/2π) dϑ(t)/dt
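The IE and IF equations above can be computed numerically with the analytic signal. This is a sketch, not the paper's code: `instantaneous_features` is a hypothetical helper name, and `scipy.signal.hilbert` supplies the analytic signal z(t).

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_features(imf, fs):
    """Instantaneous energy (IE) and frequency (IF) of one IMF via the HHT:
    z(t) = H(t) + j*Hhat(t) = a(t) * exp(j * theta(t))."""
    z = hilbert(imf)                            # analytic signal
    a = np.abs(z)                               # envelope a(t)
    ie = a ** 2                                 # IE = a^2(t)
    theta = np.unwrap(np.angle(z))              # phase theta(t)
    inst_f = np.diff(theta) / (2 * np.pi) * fs  # IF = (1/2pi) * dtheta/dt
    return ie, inst_f
```

Applied to a pure 5 Hz sinusoid, the IF stays near 5 Hz and the IE near the squared amplitude, except for small edge effects of the Hilbert transform.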

Multi-Dimensional Fusion Features Algorithm
The underwater environment is complex and changeable, so it is difficult to describe underwater acoustic signals with a single feature. The GFCC and MFCC are both algorithms that simulate human hearing; their ability to extract infrasound or ultrasound features is limited, but the MEMD algorithm can effectively extract these features. Therefore, the GFCC and MEMD were developed to extract multi-dimensional features in this paper. On this basis, to ensure the same time dimension, a dimension reduction method is proposed to obtain multi-dimensional fusion features from the original underwater acoustic signals. Figure 7 shows the feature extraction process. The algorithm extracts the IF and IE from the IMFs; meanwhile, it extracts the GFCC features and combines them with the IF and IE to improve the capability of the algorithm. The process eventually extracts features in three dimensions and constructs a feature vector.

Multi-Dimensional Feature Extraction
The global feature vector g⃗ can be expressed as

g⃗(t) = [G_i(t), a²(t), f(t)]

where t is the time series (sampling points), G_i(t) is the i-th GFCC feature value at time t, a²(t) is the IE, and f(t) is the IF. Figure 8 shows the structure of the feature fusion matrix. The feature fusion matrix is sorted in the same time sequence, so the features hold the same order and multiple features can be merged. It is noteworthy that the weight of the GFCC is adjusted to the same proportion as that of the IE and IF; the choice of this ratio can be considered a direction for future research. Since the features are exceedingly redundant and the same time series must be ensured, it is necessary to perform dimensionality reduction operations, similar to random pooling and framing, on the IE and IF.
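Building the fusion matrix from the three per-frame features can be sketched as below. This is illustrative only: `fuse_features` and `rescale` are hypothetical names, and since the paper does not specify how the GFCC/IE/IF proportions are matched, min-max rescaling of the IE and IF rows to the GFCC range is an assumption here.

```python
import numpy as np

def fuse_features(gfcc, ie_pooled, if_pooled):
    """Stack GFCC, IE, and IF into one fusion matrix.
    Rows: GFCC coefficients, then IE, then IF; columns: the shared time frames.
    All inputs must already share the same number of frames."""
    gfcc = np.atleast_2d(np.asarray(gfcc, float))
    t = gfcc.shape[1]
    assert len(ie_pooled) == t and len(if_pooled) == t, "time dimensions must match"

    def rescale(v, ref):
        # Map v linearly onto the value range of the GFCC block (assumed scheme).
        v = np.asarray(v, float)
        span = np.ptp(v) or 1.0
        return (v - v.min()) / span * np.ptp(ref) + ref.min()

    return np.vstack([gfcc, rescale(ie_pooled, gfcc), rescale(if_pooled, gfcc)])
```

The result is a matrix with one column per frame, which matches the fusion-matrix layout the text describes.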

Dimension Reduction Method
The HHT is used to extract the IE and IF from the IMFs over a certain time series (sampling points), whereas the GFCC divides frames before Gammatone filtering. As a result, the temporal dimensions of the IE and IF differ from that of the GFCC, and it is necessary to reduce the dimension of the IE and IF. Figure 9 shows the structure of the dimensionality reduction algorithm for a one-dimensional feature (IE or IF); the red box represents the pooled domain. The size and step of the pooled domain are identical to the frame length used when the GFCC frames are divided. The frame length equals the ratio of the one-dimensional feature length (IE or IF) to the GFCC feature length. The specific process of the proposed multi-dimensional fusion features is shown as Algorithm 1.
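The pooling step above can be sketched as follows. `pool_to_frames` is an illustrative name, and mean pooling is used here as a simple stand-in for the random pooling the text mentions.

```python
import numpy as np

def pool_to_frames(feature_1d, n_frames):
    """Reduce a 1-D feature (IE or IF) to the GFCC frame count by pooling.
    The pool-domain size and step equal len(feature) // n_frames, i.e. the
    frame length described in the text."""
    feature_1d = np.asarray(feature_1d, float)
    frame_len = len(feature_1d) // n_frames
    trimmed = feature_1d[: frame_len * n_frames]   # drop the ragged tail
    # One value per pooled domain (mean over each frame).
    return trimmed.reshape(n_frames, frame_len).mean(axis=1)
```

For example, pooling a 10-sample feature down to 5 frames averages each pair of consecutive samples.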

Algorithm 1 (fragment):
… (6);
6: Calculate the GFCC using Equation (7);
7: Let r_0(t) ← s(t), h_k^0(t) ← r_{k−1}(t);
8: For m ← 1 to M, or until the number of extrema in r_k(t) is 2 or less, do:
9:   For n ← 1 to N do:
10:     Calculate the x-coordinates of the IPs, i.e., obtain t_max and t_min using Equation (11);
11:     Calculate the y-coordinates of the IPs from h_k^{n−1}(t);
12:     y_max ← h_k^{n−1}(t_max), y_min ← h_k^{n−1}(t_min);
13:     Create the maxima envelope e_max(t) using cubic spline interpolation, with the IPs (t_max, y_max);
14:     Create the minima envelope e_min(t) using cubic spline interpolation, with the IPs (t_min, y_min);
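Steps 10–14 of Algorithm 1 (locating extrema and building the two cubic-spline envelopes) can be sketched as below. `envelope_mean` is a hypothetical helper name; the extrema search and spline calls use scipy's `argrelextrema` and `CubicSpline`.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def envelope_mean(h):
    """One sifting step: locate extrema, build cubic-spline envelopes
    e_max(t) and e_min(t), and return their mean (the curve subtracted
    from h in EMD-style sifting)."""
    t = np.arange(len(h))
    t_max = argrelextrema(h, np.greater)[0]   # x-coordinates of maxima IPs
    t_min = argrelextrema(h, np.less)[0]      # x-coordinates of minima IPs
    if len(t_max) < 2 or len(t_min) < 2:      # too few extrema to interpolate
        return None
    e_max = CubicSpline(t_max, h[t_max])(t)   # maxima envelope
    e_min = CubicSpline(t_min, h[t_min])(t)   # minima envelope
    return (e_max + e_min) / 2.0
```

For a pure sinusoid the two envelopes are symmetric, so the envelope mean stays near zero away from the signal edges, which is the behavior sifting relies on.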

Modified Deep Neural Network
When a DNN is directly used to recognize underwater acoustic targets, the direct input of the underwater acoustic signal features introduces many redundant features into the recognition process. The GMM can extract the statistical parameters of the underwater acoustic signals, which reduces the length of the signal representation. It can therefore reduce the size of the DNN model, reduce redundant features, and further improve recognition accuracy. The GMM is thus used to modify the structure of the DNN in this paper. The modified DNN, shown in Figure 10, is a fully connected feedforward neural network with multiple hidden layers, and it can better accomplish underwater acoustic target recognition tasks.

The Gaussian probability density function of each component is weighted to yield the GMM probability density function of order M, which is expressed as

p(x) = ∑_{i=1}^{M} w_i N_i(x)

where x is the D-dimensional eigenvector, N_i(x), i = 1, 2, …, M are the sub-distributions, and w_i, i = 1, 2, …, M are the mixed weights. Each sub-distribution N_i(x) is a D-dimensional joint Gaussian probability distribution:

N_i(x) = (1 / ((2π)^{D/2} |Σ_i|^{1/2})) exp(−(1/2)(x − µ_i)^T Σ_i^{−1} (x − µ_i))
where µ_i is the mean vector and Σ_i is the covariance matrix. The full parameters of the GMM model for each kind of underwater acoustic target are composed of the mean vectors, covariance matrices, and mixed weights of the components:

λ = {w_i, µ_i, Σ_i}, i = 1, 2, …, M

The feature vector sequence extracted from the training data is X = {x_t}, t = 1, 2, …, T, where T is the number of feature vectors, and the likelihood probability is defined as

P(X | λ) = ∏_{t=1}^{T} p(x_t | λ)

where each p(x_t | λ) is evaluated by substituting x_t into the Gaussian density functions above. The parameter estimation method adopted in the GMM model is maximum likelihood estimation (MLE). According to the feature vector sequence extracted from the training data, the parameters of the model are adjusted continuously until the likelihood probability P(X|λ) is maximized, yielding the model parameters λ_i.

Remote Sens. 2019, 11, 1888

According to Equation (16), the input parameters of the DNN are defined using µ_{i,ubm}, the mean vector of the universal background model (UBM). Given an I-dimensional input eigenvector, the activation vector of the first hidden layer is expressed as

a^{(1)} = σ(W^{(1)T} x + b^{(1)})

where W^{(1)T} is the transpose of the first hidden-layer weight matrix with dimension I × N_1, b^{(1)} is the offset vector of size N_1, and σ is the activation function of the hidden layer.
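The MLE fitting of the GMM and the extraction of its statistical parameters as a fixed-length DNN input can be sketched with scikit-learn's `GaussianMixture` as a stand-in for the paper's GMM/UBM training. `gmm_supervector` and its parameters are illustrative, and concatenating the component means is one common choice, not necessarily the paper's exact mapping.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_supervector(features, n_components=4, seed=0):
    """Fit a GMM to a (T, D) feature-vector sequence by MLE (EM) and
    return the concatenated component means as a fixed-length vector.
    This compresses a variable-length sequence to n_components * D values."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", random_state=seed)
    gmm.fit(features)               # maximizes P(X | lambda) via EM
    return gmm.means_.reshape(-1)   # mean supervector
```

Regardless of how many frames T the input sequence has, the output length is fixed, which is what lets the GMM shrink the DNN's input layer.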

The activation vector of the i-th hidden layer is obtained from the activation vector of the (i−1)-th hidden layer:

a^{(i)} = σ(W^{(i)T} a^{(i−1)} + b^{(i)})

where N_i is the number of neurons in the i-th hidden layer, W^{(i)T} is the transpose of the weight matrix of the i-th hidden layer with dimension N_{i−1} × N_i, and b^{(i)} is the offset vector of size N_i. Since the ReLU function performs well in DNN classification and recognition tasks, it serves as the hidden-layer activation function. It is specifically defined as

f(z) = max(0, z)

The output layer of the DNN uses the SoftMax function to produce the output category, finally realizing underwater acoustic signal classification and recognition. The SoftMax function is expressed as

y_k = e^{z_k} / ∑_j e^{z_j}

where z_k is the k-th element of the output-layer vector. The SoftMax regression loss serves as the loss function.
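The layer equations above transcribe directly into code. This is a minimal forward-pass sketch; `hidden_layer` is an illustrative name, and the max-shift inside `softmax` is a standard numerical-stability trick not stated in the text.

```python
import numpy as np

def relu(z):
    """Hidden-layer activation: f(z) = max(0, z)."""
    return np.maximum(0.0, z)

def softmax(z):
    """Output-layer activation: y_k = exp(z_k) / sum_j exp(z_j)."""
    e = np.exp(z - np.max(z))  # shift for numerical stability
    return e / e.sum()

def hidden_layer(a_prev, W, b):
    """Activation of one hidden layer: a_i = sigma(W_i^T a_{i-1} + b_i)."""
    return relu(W.T @ a_prev + b)
```

Chaining `hidden_layer` calls and finishing with `softmax` reproduces the forward pass of the fully connected network described here.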
where e^{w_j^T z^{(i)}} determines the predicted label value of the i-th sample in the dataset, d is the label value, and the indicator returns 1 when y^{(i)} = d is true and 0 otherwise.

Experiment Results and Analysis
This section presents numerical examples to validate the generality and effectiveness of the proposed MFF-MDNN for underwater acoustic target recognition. The dataset was divided into six categories, including four types of ships, underwater mammals, and underwater background noise with weak targets. The total length of the dataset is almost 20 h. Each acoustic signal was divided into two-second segments, and the background noise dataset was used to simulate general underwater conditions. The training set was three times as large as the test set in the experiments.
To demonstrate that the MEMD is suitable for feature extraction from underwater acoustic targets, Figure 11 shows the first 5 IMF waveforms extracted by the EMD and the MEMD from the original underwater acoustic signals in Figure 2: Figure 11a shows the IMF waveforms decomposed by the EMD, and Figure 11b those decomposed by the MEMD. From Figure 11, it can be observed that the underwater environment is extremely complex and that, with the EMD, there is little difference between each order of IMF and the original underwater acoustic signals. The MEMD decomposes the original underwater acoustic signals more distinctly, which improves the effectiveness of feature information extraction and is conducive to the subsequent recognition of underwater targets.
Therefore, Figures 3 and 5, Table 1, and Figure 11 demonstrate the validity of the proposed multi-dimensional fusion features method to some extent.
Similarly, to verify the superiority of the proposed modified DNN, Figure 12 shows the recognition accuracy of 30 experiments with different training and testing sets when the maximum iteration count was 800, comparing the GMM recognition method with MFCC feature extraction (MFCC-GMM) [11,29], the modified DNN with MFCC feature extraction (MFCC-MDNN) [11], and the modified DNN with GFCC feature extraction (GFCC-MDNN) [18].
As depicted in Figure 12, the recognition accuracy of the proposed modified DNN is higher than that of the GMM. Although the GMM can recognize underwater acoustic targets, it is a shallow recognition model with relatively low recognition accuracy. Meanwhile, the recognition accuracy of the GFCC-MDNN was higher than that of the MFCC-MDNN, which shows that the GFCC has strong adaptability to the modified DNN and is more suitable for underwater acoustic target recognition than the MFCC.
To further demonstrate the recognition accuracy of the proposed MFF-MDNN, Figure 13 shows the recognition accuracy of the modified DNN with GFCC feature extraction (GFCC-MDNN) [18], the modified DNN with MFCC and MEMD feature extraction (MM-MDNN) [16], and the proposed MFF-MDNN. As seen in Figure 13, the recognition accuracy of the proposed MFF-MDNN was higher than that of the other methods. Therefore, the proposed MFF-MDNN achieves better recognition accuracy for underwater acoustic target recognition.
To describe more clearly the reliability and stability of the proposed MFF-MDNN method, Table 2 summarizes the recognition results. As seen from Table 2, the recognition accuracy of the proposed MFF-MDNN method was higher than that of the other algorithms in the 30 experiments with different training and testing sets when the maximum iteration count was 800. The proposed multi-dimensional fusion features method describes underwater acoustic signals from multiple angles. It combines the advantages of the GFCC and MEMD, making it more suitable for underwater acoustic target recognition than a single feature. Moreover, the GMM was used to extract the statistical parameters of the feature matrix and thereby modify the structure of the DNN, which reduces redundant features and further improves recognition accuracy. From the comparative experiments in Tables 1 and 2, the following conclusion can be drawn: the proposed MFF-MDNN method improves accuracy when the dataset contains underwater background noise with weak targets. The recognition results demonstrate that the proposed MFF-MDNN method has higher accuracy and stronger adaptability than other methods.

Conclusions
In this paper, a method combining multi-dimensional fusion features and a modified DNN was proposed to recognize underwater acoustic targets. The problem that a single feature cannot describe underwater acoustic signals well was solved effectively by using the GFCC and MEMD to extract multi-dimensional features. On this basis, a dimension reduction method was proposed, which fused the multi-dimensional features in the same time dimension to obtain multi-dimensional fusion features from the original underwater acoustic signals. In addition, the problem of redundant features in the recognition process was addressed by utilizing the GMM to modify the structure of the DNN, which improved recognition accuracy. The proposed method was evaluated on a dataset divided into six categories, including four types of ships, underwater mammals, and underwater background noise with weak targets. The average recognition accuracies of the proposed MFF-MDNN, MM-MDNN, GFCC-MDNN, MFCC-MDNN, GFCC-GMM, and MFCC-GMM were 94.3%, 91.1%, 78.2%, 73.1%, 62.2%, and 59.0%, respectively, when the maximum iteration count was 800. The recognition results showed that the method has good validity and adaptability. However, the MFF-MDNN has high time complexity due to the algorithm principle of the MEMD and the particularity of underwater acoustic signals; further studies can address this in the future.