Research on Underwater Acoustic Source Localization Based on Typical Machine Learning Algorithms

Yuan, Peilong; Wang, Xiaochuan; Zhang, Zhiqiang; Zhang, Jiawei; Zhang, Honggang

doi:10.3390/app15179617



by Peilong Yuan *, Xiaochuan Wang, Zhiqiang Zhang, Jiawei Zhang and Honggang Zhang

Naval University of Engineering, Wuhan 430033, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(17), 9617; https://doi.org/10.3390/app15179617

Submission received: 1 July 2025 / Revised: 30 July 2025 / Accepted: 20 August 2025 / Published: 1 September 2025


Abstract

Underwater acoustic source localization is formulated as a feature learning problem within a machine learning framework, where a data-driven approach directly extracts source distance features from hydroacoustic signals. This study systematically compares the localization performance of four machine learning models—decision tree (DT), random forest (RF), support vector machine (SVM), and feedforward neural network (FNN) models—in both classification and regression tasks. Experimental results demonstrate that, in classification tasks, all algorithms achieve effective localization under high signal-to-noise ratio (SNR) conditions, while the DT model exhibits significant noise sensitivity in low-SNR scenarios; regression tasks show reduced model convergence overall, with only the SVM and RF models maintaining basic localization capabilities at a high SNR. For two-dimensional localization, machine learning classification algorithms are employed, revealing systematic accuracy degradation compared to one-dimensional scenarios, where only the RF and SVM models demonstrate practical value under high-SNR conditions. Validation using measured data from the SWellEx-96 experiment’s S5 event confirms that when constructing datasets with frequency-domain acoustic pressure features from the final 35 min segment, the classification task-driven DT, RF, and SVM models all demonstrate reliable localization performance, benefiting from the inherent high-SNR characteristics of the data.

Keywords:

underwater detection; machine learning; target localization algorithms

1. Introduction

In recent years, with advances in computer science and the growing importance of acoustic source localization in research and practice, an increasing number of fields have an urgent need for underwater acoustic source localization algorithms, including but not limited to ocean exploration, underwater communication, coastal defense, and resource development [1,2].

The earliest marine positioning system was sonar, which emerged in 1906. With the advancement of science and technology [3], beamforming algorithms were developed, which determine the direction of an acoustic source by forming beams with specific directivity. Later came the matched field processing (MFP) algorithm [4], which matches measurements against replica fields computed from a physical model of the acoustic field. MFP can be regarded as an extension of beamforming to the marine environment and has been a popular underwater acoustic localization algorithm in recent decades. In practice, however, traditional MFP suffers from localization errors caused by system mismatch, environmental mismatch, and statistical mismatch, and its sensitivity to the mismatch between model-generated replica fields and measurements limits its practical use [5,6,7]. MFP gives reasonable predictions only if the ocean environment can be accurately modeled, which is difficult because realistic ocean environments are complex and unstable. In recent years, high-quality hydrophones with broader frequency response ranges and higher sensitivity have been developed, capturing a wider range of acoustic signal frequencies and intensities, and the inherent flaws of MFP have become more apparent. With the rapid development of machine learning, underwater acoustic target localization algorithms based on machine learning have begun to emerge and are receiving increasing attention [8,9,10].

Machine learning is a promising approach to locating underwater targets because it can learn features from data without requiring acoustic propagation modeling. Machine learning algorithms can effectively avoid the mismatch problem of traditional algorithms, because the learning system autonomously learns the sample features from a sufficiently large dataset to produce the desired output. Compared with deep learning algorithms such as the long short-term memory network (LSTM) and the convolutional neural network (CNN), traditional machine learning algorithms perform stably in small-sample scenarios (tens to hundreds of samples) and have low computational complexity [11,12,13].

Early neural networks had a small number of large layers and were first introduced into underwater acoustics by Steinberg in 1991 [14]. That work used only linear neurons and Sigmoid activation functions in a simple single-layer network. When the resolution and sampling conditions were well satisfied, the localization error was relatively small; when they were not, such simple neural networks did not necessarily perform satisfactorily [14]. Niu took the acoustic signals received by a multi-channel receiver array in the Noise09 dataset as input and achieved localization through both classification and regression methods [15]. Machine learning, mainly FNN-based classification, was also used to estimate the range of ships at sea; the localization error remained relatively small within 10 km and was much lower than that of the traditional MFP algorithm, which demonstrated the effectiveness of machine learning [16]. Ferguson introduced the convolutional neural network (CNN) to locate acoustic sources in shallow water, using the cepstrum and the generalized cross-correlation of the received signals as network inputs, and precisely determined the range and position of a moving ship [17]. Huang also proposed deep neural networks for acoustic source localization using a two-stage method: feature vectors were first extracted, and a neural network was then constructed to model them, yielding a regression model. Comparison with an FNN confirmed the effectiveness and accuracy of CNN localization [18]. The study also showed that the CNN method depends more heavily on the size of the dataset.

Howarth added Gaussian white noise to experimental data to further explore the accuracy of CNN localization under different SNRs and concluded that whether a CNN achieves accurate localization depends more on the amount of training data [19]. CNN localization also relies on processor clusters to accelerate computation, which leads to high training energy consumption. Niu and Liu proposed a CNN-based multi-task learning (MTL) algorithm that accurately estimated both the range and the depth of the acoustic source, and showed that the MTL CNN is much more robust to array tilt than the traditional MFP algorithm [20]. Furthermore, to study the relationship between linear supervised learning and beamforming, Ozanich proposed a nonlinear deep feedforward neural network (FNN) for direction-of-arrival estimation; experiments with multiple sources on a horizontal array also confirmed the accuracy of the FNN [21].

Because the marine environment is complex and time-varying, localization performance in marine acoustics is often unsatisfactory, especially at low signal-to-noise ratios (SNRs). In addition, small-sample datasets make deep learning methods such as convolutional neural networks (CNNs) difficult to train. This paper therefore investigates machine learning algorithms for the passive localization of both simulated and experimental underwater acoustic sources, formulated as classification and regression tasks on sample sets of a specified size. Four traditional machine learning models are studied: decision tree (DT), random forest (RF), support vector machine (SVM), and feedforward neural network (FNN) models.

Accordingly, this study makes the following contributions:

(1): Systematically compares the performance of machine learning algorithms in acoustic source localization applications.
(2): Within a unified framework, comprehensively evaluates the performance of four classical machine learning models (DT, RF, SVM, and FNN) in underwater acoustic localization, performing one-dimensional (range) and two-dimensional (range + depth) source localization through classification and regression tasks on datasets simulated in an environment similar to that of the SWellEx-96 experiment, under different signal-to-noise ratios (SNRs = 2, 5, and 10).
(3): Provides a standardized data processing procedure and, using simulated and field data (SWellEx-96 experiment), validates the feasibility of using machine learning instead of traditional physics-based methods (e.g., matched field processing, MFP) in complex marine environments, thereby offering practical guidance on algorithm selection and performance boundaries for engineering applications.

2. Basic Theory of Machine Learning for Underwater Acoustic Source Localization

The machine learning algorithms used for underwater acoustic source localization are primarily supervised learning models. During training, each feature vector in the dataset is paired with its actual label, and the model trains itself by iteratively minimizing the discrepancy between its outputs and these labels. The trained model is assessed in the testing stage: a preprocessed test dataset is fed into the trained model, and the predicted labels are compared with the actual labels of the test data to evaluate the training. The algorithm flow of machine learning for underwater acoustic source localization is shown in Figure 1.

2.1. Data Preprocessing

The complex acoustic pressure received by the hydrophone array at frequency f can be expressed as Equation (1).

$$\mathbf{x}(f) = S(f)\,\mathbf{g}(f) + \mathbf{n}(f) \tag{1}$$

Here, $\mathbf{x}(f) = [x_1(f), \dots, x_L(f)]^{T}$ represents the frequency-domain acoustic pressure received by an array of $L$ hydrophones, $S(f)$ represents the intensity of the acoustic source, $\mathbf{g}(f)$ is the Green's function, which contains the characteristics of the point-source acoustic field, and $\mathbf{n}(f)$ represents white noise. To minimize the influence of the source amplitude on localization, the received acoustic pressure is normalized:

$$\tilde{\mathbf{x}}(f) = \frac{\mathbf{x}(f)}{\lVert \mathbf{x}(f) \rVert_2} \tag{2}$$

To reduce the influence of the acoustic source phase, the normalized covariance matrix of the complex acoustic pressure is adopted. This is a complex conjugate symmetric matrix averaged from N snapshots:

$$\mathbf{C}(f) = \frac{1}{N} \sum_{i=1}^{N} \tilde{\mathbf{x}}_i(f)\, \tilde{\mathbf{x}}_i^{H}(f) \tag{3}$$

Here, $\tilde{\mathbf{x}}_i(f)$ is the normalized complex acoustic pressure of the $i$th snapshot and the superscript $H$ denotes the conjugate transpose. The covariance matrix retains the structure of the Green's function, from which features for acoustic source localization can be extracted. Because $\mathbf{C}(f)$ is conjugate symmetric, its upper triangle contains all of its information. For broadband signals, the data from multiple frequencies can be concatenated into a higher-dimensional real-valued input vector.
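The preprocessing chain of Equations (2) and (3) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function name `scm_features` is hypothetical, the snapshots are assumed to already be complex pressures at a single frequency (one row per snapshot, one column per hydrophone), and the "upper triangle" is assumed to include the diagonal.

```python
import numpy as np

def scm_features(snapshots):
    """Hypothetical helper: normalized sample covariance matrix -> real feature vector.

    snapshots: (N, L) complex array, N snapshots from L hydrophones at one frequency.
    Returns (features, C), where features holds the real and imaginary parts of the
    upper triangle (diagonal included, an assumption) of the L x L matrix C.
    """
    x = np.asarray(snapshots, dtype=complex)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)      # Eq. (2): unit 2-norm per snapshot
    C = np.einsum("nl,nm->lm", x, x.conj()) / len(x)       # Eq. (3): (1/N) sum of x_i x_i^H
    iu = np.triu_indices(C.shape[0])                       # upper-triangle indices
    features = np.concatenate([C[iu].real, C[iu].imag])    # real-valued model input
    return features, C
```

Because every snapshot is normalized and then multiplied by its own conjugate transpose, a common complex scale factor (source amplitude and phase) cancels out, which is the motivation given above for Equations (2) and (3). With L = 21 hydrophones the vector has 2 × 231 = 462 entries; for broadband input, the vectors for several frequencies could simply be concatenated.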

2.2. Selection of Data Labels

Suppose the range between the actual acoustic source and the receiver array lies in $(r_{\min}, r_{\max})$. The range dimension of the search interval is discretized into $K$ classes, and the label of each sample is assigned according to the following equation:

$$\Delta r = \frac{r_{\max} - r_{\min}}{K}, \qquad \mathrm{label}_i = \left\lceil \frac{r_i - r_{\min}}{\Delta r} \right\rceil \tag{4}$$

Here, $\Delta r$ is the width of each class interval, $r_i$ is the source range of the $i$th sample, $\mathrm{label}_i$ is the class to which the $i$th sample belongs, and $\lceil \cdot \rceil$ denotes rounding up. For the DT, RF, and SVM models described later, these labels can be used directly for training. For the FNN model, each label must additionally be mapped to a $K$-dimensional binary vector $\mathbf{t}_i$ that serves as the target of the network's output layer. The mapping scheme is as follows:

$$t_{ik} = \begin{cases} 1, & |r_i - r_k| \le \dfrac{\Delta r}{2} \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

Here, $t_{ik}$ is the $k$th element of $\mathbf{t}_i$ and $r_k$ is the source range represented by the $k$th class. Thus, $\mathbf{t}_i$ is the expected output of the neural network; combined with the Softmax function introduced later, the network output forms a probability distribution over the source positions. The range corresponding to the element with the maximum value in the output vector is selected as the estimated source range. In the regression task, the expected output for each sample is the actual source range of that sample.
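The labelling scheme of Equations (4) and (5) can be illustrated with a short NumPy sketch. The function names are hypothetical. `one_hot` places a single 1 at column $\mathrm{label}_i$, which coincides with the window rule of Equation (5) when $r_k$ is taken as the center of class $k$, except for samples lying exactly on a class boundary, where the window rule would flag two neighboring classes. Decoding a predicted class back to the class-center range is likewise an assumption about how the "distance corresponding to the dimension with the maximum value" is defined.

```python
import numpy as np

def range_labels(r, r_min, r_max, K):
    """Eq. (4): integer class labels in 1..K for source ranges r (m)."""
    r = np.asarray(r, dtype=float)
    dr = (r_max - r_min) / K                     # class width, Delta r
    return np.ceil((r - r_min) / dr).astype(int), dr

def one_hot(labels, K):
    """Binary FNN targets t_i: row i has a single 1 in column label_i - 1."""
    labels = np.asarray(labels)
    t = np.zeros((len(labels), K))
    t[np.arange(len(labels)), labels - 1] = 1.0
    return t

def class_center(labels, r_min, dr):
    """Map (predicted) labels back to the center range of their class interval."""
    return r_min + (np.asarray(labels) - 0.5) * dr
```

With the 1D simulation settings of Section 3 (ranges 101 to 10,100 m at 1 m spacing, $r_{\min} = 100$ m, $r_{\max} = 10{,}100$ m, $K = 100$), this gives $\Delta r = 100$ m and exactly 100 samples per class.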

2.3. Algorithm Flow of Typical Models

2.3.1. Decision Tree (DT) Model

The key to a DT model is selecting the best attribute on which to split each node. Ideally, as attributes are successively split, the samples in each node become as similar as possible; the higher this similarity, the higher the purity of the node. Information entropy and the Gini index are commonly used to measure purity. Information entropy is calculated as

$$\mathrm{Ent}(D) = -\sum_{i=1}^{N} p_i \log_2 p_i \tag{6}$$

Here, $D$ is the sample set contained in the current node and $p_i$ is the proportion of samples in $D$ belonging to the $i$th class ($i = 1, 2, \dots, N$). The higher the information entropy $\mathrm{Ent}(D)$, the lower the purity of $D$. Suppose the branch attribute $f$ takes the values $f^1, f^2, \dots, f^n$, and let $D^j$ denote the subset of $D$ whose value of $f$ is $f^j$. The information gain of splitting $D$ on $f$ is then defined as

$$\mathrm{Gain}(D, f) = \mathrm{Ent}(D) - \sum_{j=1}^{n} \frac{|D^j|}{|D|}\, \mathrm{Ent}(D^j) \tag{7}$$

The ID3 algorithm, for example, selects the attribute whose split maximizes the information gain [22].

Similarly, the Gini index can also be used to select the division method, where purity is represented by a quantity called the Gini value, which is defined as

$$\mathrm{Gini}(D) = 1 - \sum_{i=1}^{N} p_i^2 \tag{8}$$

The Gini index of attribute $f$, which is minimized when choosing the splitting attribute, is then defined as

$$\mathrm{Gini\_index}(D, f) = \sum_{j=1}^{n} \frac{|D^j|}{|D|}\, \mathrm{Gini}(D^j) \tag{9}$$
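Equations (6) to (9) translate directly into code. The sketch below is illustrative only (the helper names are hypothetical); it scores one candidate attribute `f` over a label array `y`, where an informative split yields a high information gain and a low Gini index.

```python
import numpy as np

def entropy(y):
    """Eq. (6): information entropy (bits) of the class labels y."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(y):
    """Eq. (8): Gini value of the class labels y."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_scores(y, f):
    """Eqs. (7) and (9): information gain and Gini index of splitting y on attribute values f."""
    y, f = np.asarray(y), np.asarray(f)
    gain, gini_index = entropy(y), 0.0
    for v in np.unique(f):               # each attribute value f^j defines a subset D^j
        subset = y[f == v]
        w = len(subset) / len(y)         # |D^j| / |D|
        gain -= w * entropy(subset)
        gini_index += w * gini(subset)
    return gain, gini_index
```

For labels `[0, 0, 1, 1]`, an attribute that separates the two classes perfectly gives a gain of 1 bit and a Gini index of 0, whereas an attribute that splits each class in half gives a gain of 0 and a Gini index of 0.5; ID3 and CART would therefore both prefer the first attribute.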

2.3.2. Random Forest (RF) Model

The RF model is a classic ensemble model. The two main ensemble strategies are Bagging and Boosting. Boosting trains the base learners sequentially: the first base learner is trained on a randomly assigned sample set, and each subsequent learner places more weight on the samples misclassified by its predecessors. A typical algorithm that employs the Boosting strategy is AdaBoost [23].

In Bagging, each base learner is trained on samples drawn with replacement: a sample is drawn at random and then put back, so that it may be drawn again, until the predetermined number of training samples for that learner is reached. Random forest is an extension of Bagging with decision trees as base learners; in addition to Bagging, it introduces further randomness by choosing the split at each node from a random subset of the attributes. The predictions of the individual trees are then combined, by majority vote for classification or by averaging for regression.
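To make the two sources of randomness concrete, the following is a minimal hand-rolled random forest built from scikit-learn decision trees. It is a teaching sketch, not the RF configuration used in this paper: the number of trees, the `max_features="sqrt"` attribute subsampling, and the helper names are assumptions. In practice `sklearn.ensemble.RandomForestClassifier` implements the same idea.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=50, seed=0):
    """Bagging of decision trees with random attribute subsets at each split."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)        # bootstrap: draw n samples with replacement
        tree = DecisionTreeClassifier(
            max_features="sqrt",                # random subset of attributes per split
            random_state=int(rng.integers(1 << 31)),
        )
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def predict_forest(trees, X):
    """Majority vote over the trees (labels assumed to be non-negative integers)."""
    votes = np.stack([t.predict(X) for t in trees])   # shape (n_trees, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

Each tree sees a different bootstrap sample and different candidate attributes, so their errors are partly decorrelated and the vote is more stable than any single tree, which is consistent with the RF model being less noise-sensitive than the DT model in the results below.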

2.3.3. Support Vector Machine (SVM) Model

In classification problems, the SVM model seeks a hyperplane in the sample space that separates samples of different categories with the maximum margin [24]. When the samples are not linearly separable, they are mapped into a higher-dimensional feature space in which a separating hyperplane can be found. This mapping increases the complexity of the classification model, and the inner products between mapped samples in particular significantly increase the computational load. To avoid such cumbersome calculations, kernel functions are often introduced:

$$\kappa(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)^{T} \phi(\mathbf{x}_j) \tag{10}$$

The kernel function thus gives the inner product of two vectors in the high-dimensional mapping space without computing the mapping $\phi$ explicitly. Because the choice of kernel implicitly determines this feature space, an appropriate kernel function must be selected to achieve good classification performance.
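The kernel trick in Equation (10) can be checked numerically. For 2-D inputs, the degree-2 polynomial kernel $(\mathbf{x}^T\mathbf{z})^2$ equals the inner product of the explicit feature map $\phi(\mathbf{x}) = (x_1^2, \sqrt{2}x_1x_2, x_2^2)$, so the kernel never needs to form $\phi$. The Gaussian (RBF) kernel matrix is also shown, since it is a common default choice; the paper's actual kernel is given in Table 1, not here, and the helper names are illustrative.

```python
import numpy as np

def phi_quadratic(x):
    """Explicit feature map of the homogeneous degree-2 polynomial kernel for 2-D input."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def poly2_kernel(x, z):
    """kappa(x, z) = (x^T z)^2, computed without building phi."""
    return float(np.dot(x, z)) ** 2

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian kernel matrix K[a, b] = exp(-gamma * ||X[a] - Z[b]||^2)."""
    sq = np.sum(X ** 2, axis=1)[:, None] + np.sum(Z ** 2, axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))   # clip tiny negative round-off
```

For x = (1, 2) and z = (3, −1), both `poly2_kernel` and the explicit inner product give 1. In scikit-learn, the same RBF choice corresponds to `SVC(kernel="rbf", gamma=...)`.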

2.3.4. Feedforward Neural Network (FNN)

The feedforward neural network (FNN) consists of multiple layers of neurons: an input layer, an output layer, and one or more hidden layers in between [25]. The input layer $L_1$ contains $I$ neurons, which hold the elements of the input vector of the localization problem, and the inputs to the second layer are linear combinations of the outputs of the first layer. The input to the $j$th neuron in the second layer can be written as

$$m_j = \sum_{i=1}^{I} w_{ji}^{(1)} x_i + w_{j0}^{(1)} \tag{11}$$

Here, $w_{ji}^{(1)}$ and $w_{j0}^{(1)}$ are the weights and bias of the linear combination. Each neuron passes its input through an activation function to produce its output; commonly used activation functions include the Sigmoid, tanh, and Softmax functions. The most important part of training a neural network is updating the weights, and the most commonly used learning algorithm is still error back-propagation (BP). The goal of the weight update is to minimize the mean square error of the network. Commonly used weight optimization schemes include stochastic gradient descent (SGD), adaptive moment estimation (Adam), and the momentum method. In this paper, the neural network is optimized with the SGD algorithm.
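A one-hidden-layer FNN of this kind can be sketched in NumPy: Equation (11) gives the hidden pre-activations, a Sigmoid hidden layer follows, and a Softmax output layer produces class probabilities over the $K$ range bins. The training step below is a sketch under a loudly stated assumption: it back-propagates the cross-entropy loss, which pairs naturally with Softmax, rather than the mean square error mentioned above, and it uses a plain (mini-)batch gradient step. Layer sizes, learning rate, and function names are all hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    a = a - a.max(axis=-1, keepdims=True)       # shift logits for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

def fnn_forward(x, W1, b1, W2, b2):
    """x: (n, I). W1: (H, I), b1: (H,), W2: (K, H), b2: (K,)."""
    m = x @ W1.T + b1                           # Eq. (11): m_j = sum_i w_ji^(1) x_i + w_j0^(1)
    h = sigmoid(m)                              # hidden-layer outputs
    return h, softmax(h @ W2.T + b2)            # class probabilities, shape (n, K)

def sgd_step(x, t, params, lr=0.1):
    """One gradient step on a (mini-)batch; updates the arrays in params in place.

    Assumption: mean cross-entropy loss between Softmax outputs and one-hot targets t.
    """
    W1, b1, W2, b2 = params
    h, y = fnn_forward(x, W1, b1, W2, b2)
    d2 = (y - t) / len(x)                       # dLoss/dlogits for Softmax + cross-entropy
    d1 = (d2 @ W2) * h * (1.0 - h)              # back-propagate through the Sigmoid layer
    W2 -= lr * d2.T @ h
    b2 -= lr * d2.sum(axis=0)
    W1 -= lr * d1.T @ x
    b1 -= lr * d1.sum(axis=0)
```

The estimated range class for a sample is the `argmax` of its probability row; drawing a random mini-batch before each call turns the full-batch step into SGD.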

Table 1 compares the key parameters of the different machine learning models used for underwater acoustic source localization.

3. Result Analysis of Simulation Data

3.1. Simulation Data Generation

All simulation data were generated with KRAKEN [26] from the Acoustics Toolbox, which computes the frequency-domain acoustic field for given marine environmental parameters and source and receiver positions. KRAKEN takes two parameter files (.env and .flp) as input and outputs a series of calculation results. The .env file mainly describes the marine environment, such as the number of medium layers, the sound speed profile, and the ocean bottom boundary. The .flp file mainly describes the source and receiver parameters, including the source range and depth, the number and depths of the receivers, and the type of receiver array.

The simulated environment is that of the SWellEx-96 experiment, shown in Figure 2. The experiment was conducted near San Diego, California, in May 1996; details of the waveguide environment are available on the official experiment website. The seawater depth is 216.5 m. The sound speed in the waveguide depends mainly on depth, salinity, and temperature. During the experiment, 51 sets of CTD (conductivity, temperature, and depth) data were measured, and the sound speed profile of the waveguide was obtained by averaging them.

For range-independent processing, the water depth is taken as the depth at the array. The seafloor consists first of a 23.5 m thick sediment layer with a density of 1.76 g/cm³ and a compressional attenuation of 0.2 dB/(km·Hz); the compressional sound speed is 1572.368 m/s at the top of the sediment layer and 1593.016 m/s at its bottom. Below the sediment lies an 800 m thick mudstone layer with a density of 2.06 g/cm³, an attenuation of 0.06 dB/(km·Hz), and compressional sound speeds of 1881 m/s at the top and 3245 m/s at the bottom. The geoacoustic model is completed by a half-space with a density of 2.66 g/cm³, an attenuation of 0.020 dB/(km·Hz), and a compressional sound speed of 5200 m/s [27].

3.2. Simulation Data Preprocessing

The simulated source and receiver array follow the S5 event of the experiment. The source is 9 m below the sea surface and emits a 109 Hz tone. The receiver is a vertical array of 21 elements with a sampling frequency of 1500 Hz, for which 75 min of complete recordings are available. In this event, a ship towed the source at a constant speed of 2.5 m/s, and the range between the source and the array varied from approximately 900 to 8700 m. The sampling of a moving source is simulated by placing multiple arrays at uniform spacing. A total of 10,000 arrays are placed at ranges from 101 m to 10,100 m from the source with a spacing of 1 m, generating 10,000 frequency-domain acoustic pressure samples as the training set. None of these training samples duplicates a sample in the test set, since duplicates would produce an unrealistically high prediction accuracy for the final model. Another 200 arrays are placed at horizontal ranges between 100 m and 10,100 m with a spacing of approximately 50 m, generating 200 samples that form the test set. Only the source depth is then changed to generate data for three depths, so the training set contains 30,000 samples and the test set 600 samples, that is, 10,000 training samples and 200 test samples per depth. The output of the KRAKEN [26] program is the frequency-domain acoustic pressure.

Gaussian white noise is added to each normalized frequency-domain acoustic pressure sample at each of the three SNRs. For the classification problem, the 10,000 training samples are divided into 100 categories of 100 samples each, that is, one category per 100 m; likewise, the test samples are divided into 100 categories of 2 samples each. The labels are generated according to Equation (4) for the non-neural-network models and Equation (5) for the feedforward neural network model. For the regression problem, the label is the actual source range itself; because these labels span a wide range, they are scaled with min–max normalization.
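The noise injection and label scaling described above can be sketched as follows. Two points are assumptions, since the text does not specify them: the SNR values (2, 5, and 10) are interpreted here in dB, and the noise is circular complex Gaussian with its power set from the mean signal power of the array passed in (call the function once per sample for a per-sample SNR). The function names are hypothetical.

```python
import numpy as np

def add_complex_awgn(x, snr_db, rng):
    """Add circular complex white Gaussian noise to x at the given SNR (assumed to be in dB)."""
    p_signal = np.mean(np.abs(x) ** 2)
    p_noise = p_signal / 10.0 ** (snr_db / 10.0)
    # real and imaginary parts each carry half of the noise power
    noise = np.sqrt(p_noise / 2.0) * (rng.standard_normal(x.shape) + 1j * rng.standard_normal(x.shape))
    return x + noise

def minmax(r, r_min, r_max):
    """Min-max normalization of regression labels (ranges) to [0, 1]."""
    return (np.asarray(r, dtype=float) - r_min) / (r_max - r_min)

def inverse_minmax(z, r_min, r_max):
    """Map normalized model outputs back to ranges in meters."""
    return np.asarray(z, dtype=float) * (r_max - r_min) + r_min
```

Regression outputs must be passed through `inverse_minmax` before computing range errors, otherwise the percentage errors would refer to the normalized scale rather than to meters.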

3.3. Analysis of 1D Localization Results Based on Simulation Data

To quantify the prediction performance of the range estimation methods, the mean absolute percentage error (P_MAPE) over N samples is defined as

$$P_{\mathrm{MAPE}} = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{p_i - T_i}{T_i} \right| \tag{12}$$

where $p_i$ and $T_i$ are the predicted range and the ground-truth range of the $i$th sample, respectively. P_MAPE is preferred as an error measure because it accounts for both the magnitude of the error in incorrect range estimates and the frequency of correct estimates. P_MAPE is known to be an asymmetric error measure, but it is adequate for the small range of outputs considered here [28]. Underwater acoustic source localization is first performed on the simulation data with the DT, RF, SVM, and FNN models, and the 1D localization results under different SNRs are analyzed. Only the results with the best P_MAPE obtained by the DT, RF, SVM, and FNN models at each SNR are shown, starting with Figure 3.

The blue solid line represents the actual range of the simulated source, and the orange points are the ranges predicted by the machine learning algorithms. At an SNR of 2 (Figure 3), the classification accuracy of the DT model is only 0.05, whereas the RF model achieves 0.31; the SVM and FNN models perform significantly better, with accuracies of 0.65 and 0.66, respectively. When the SNR increases to 5 (Figure 4), the classification accuracy of all models improves markedly: the DT model rises to 0.33, the RF model to 0.87, and the SVM and FNN models to 0.93 and 0.95, respectively. At an SNR of 10 (Figure 5), accuracy improves further: the DT model reaches 0.66, while the RF, SVM, and FNN models reach 0.94, 0.95, and 0.96, respectively.

To quantify prediction performance, P_MAPE is calculated with Equation (12) from the ranges predicted by each machine learning model. The reliability was set at 96%, and the confidence interval was set at 92–100% of the true range above and below the actual value to reduce the prediction error. The predicted and ground-truth ranges were obtained from a total of 100 calculations. Table 2 summarizes the P_MAPE results for the different SNRs. The FNN and SVM achieve the lowest P_MAPE, 0%, on the dataset with an SNR of 10. When range estimation is solved as a classification problem, the performance of these machine learning algorithms is comparable, although the DT model is weaker than the RF model, and at relatively low SNR the RF model performs slightly worse than the SVM and FNN models. With many sample features and significant noise, the DT model produces unstable predictions with larger deviations, whereas the SVM and the simple neural network perform better on such high-dimensional, noisy datasets.
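Equation (12) and the classification accuracy quoted above are straightforward to compute; the short sketch below (hypothetical helper names, NumPy assumed) makes the definitions explicit. It does not reproduce the 96% reliability and confidence-interval procedure, whose details are not specified here.

```python
import numpy as np

def mape(pred, truth):
    """Eq. (12): mean absolute percentage error, in percent."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return 100.0 / len(truth) * np.sum(np.abs((pred - truth) / truth))

def accuracy(pred_labels, true_labels):
    """Fraction of samples whose predicted range class equals the true class."""
    return float(np.mean(np.asarray(pred_labels) == np.asarray(true_labels)))
```

For example, predicting 1100 m and 1900 m for true ranges of 1000 m and 2000 m gives per-sample errors of 10% and 5%, so P_MAPE is 7.5%. Because each error is divided by $T_i$, the same absolute error counts more heavily at short range, which is one source of the asymmetry noted above.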

In the regression task, the model outputs are continuous values that correspond directly to the actual source ranges. Figure 6 shows the regression predictions of the machine learning models at an SNR of 5. Neither the DT nor the RF model can accurately predict the source range; both produce numerous outliers and perform significantly worse than in the classification task at the same SNR. The SVM model follows the general trend of the actual ranges, but its predictions fluctuate considerably and the localization results are unsatisfactory.

Figure 7 shows that when the SNR increases to 10, the regression results of the RF model improve markedly and the prediction fluctuations of the SVM are slightly reduced. However, the DT regression remains sensitive to noise even though the noise level is low, and its predictions are still unsatisfactory. Compared with the classification results above, the 1D results show that regression models are noticeably less accurate in localization tasks, and that prediction accuracy increases with SNR.

3.4. Analysis of 2D Localization Results Based on Simulation Data

The 1D results show that the ensemble RF model and the SVM model, leveraging their strong feature representation capabilities, significantly outperform the single decision tree model in key metrics, including the mean squared error (MSE) and the coefficient of determination (R²). Notably, because of the increased complexity of higher dimensionality, 2D localization shows a measurable decline in prediction accuracy compared with 1D localization. Based on these findings, this study uses the RF and SVM classification models for an in-depth investigation of 2D localization.

The classification output of 2D localization consists of two arrays: the predicted labels in the range dimension and the predicted labels in the depth dimension, which makes it a multi-output classification problem. Figure 8 shows the multi-output classification results of the two models (RF and SVM) at an SNR of 2. In multi-output classification, the accuracy in the range dimension is lower than that of range prediction in 1D localization. Because the depth dimension has only three classes and there are many training samples, depth is still localized well even at a low SNR. Figure 9 shows the 2D localization results of the two models at an SNR of 5. With the higher SNR, the SVM predictions improve significantly, the range accuracy of the RF model improves slightly, and the number of outliers decreases. When the SNR increases to 10, the multi-output classification accuracy of both models improves significantly; there are no obvious outliers, and the predictions fit the actual position of the simulated source well (see Figure 10).
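In scikit-learn, the two-array (range, depth) output described above can be handled as follows. This is a sketch under assumptions (hyperparameters and the helper name are illustrative, not the paper's Table 1 settings): `RandomForestClassifier` accepts a two-column label matrix natively, whereas `SVC` does not, so it is wrapped in `MultiOutputClassifier`, which fits one independent SVM per output.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.svm import SVC

def fit_2d_localizers(X, y_range, y_depth, seed=0):
    """Fit multi-output RF and SVM classifiers on (range class, depth class) labels."""
    Y = np.column_stack([y_range, y_depth])                               # (n_samples, 2)
    rf = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, Y)  # native multi-output
    svm = MultiOutputClassifier(SVC(kernel="rbf", C=10.0)).fit(X, Y)            # one SVC per output
    return rf, svm
```

Both fitted models return an `(n_samples, 2)` array from `predict`, whose first column is the range class and second column the depth class, so range and depth accuracy can be scored separately, as in the discussion above.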

Based on the 2D localization results of the two models, P_MAPE is also calculated with Equation (12) to compare their performance, using the same reliability and confidence-interval settings and a total of 100 calculations to obtain the predicted and ground-truth ranges. Table 3 summarizes the results for the different SNRs. The RF model shows weaker predictive performance than the SVM model, performing slightly worse at relatively low SNR, and the SVM model also performs better in 2D localization on the noisy dataset with an SNR of 10.

4. Experimental Results and Analysis

4.1. Experimental Data Preprocessing

This section uses the sea trial data of the SWellEx-96 experiment [27] to perform 1D acoustic source localization with several machine learning models. The S5 event of SWellEx-96 is adopted (Figure 11), and the environmental parameters are the same as those in Section 3. In this event, two acoustic sources were towed by a ship, one at a depth of 9 m and the other at 54 m, and GPS positions were recorded once per minute. The deep source transmitted a comb of tones at five intensity levels with frequencies from 49 Hz to 400 Hz, while the shallow source transmitted a comb of tones of equal intensity with frequencies from 109 Hz to 385 Hz. Four receiver arrays were deployed in the experiment: a vertical linear array (VLA), a tilted linear array (TLA), and two horizontal linear arrays (HLAs) on the north and south sides. The data used here were collected by the VLA; a total of 75 min of signals were recorded at a sampling frequency of 1500 Hz.

After the data were downloaded from the SWellEx-96 website, the mean was removed from the signal received by the 5th array element and its spectrum was estimated. Figure 12 shows an example spectrum from the experimental data; it is broadly consistent with the spectrum published on the website.
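The mean-removal and spectrum step can be sketched as a periodogram of a single element's signal. This is a minimal sketch under assumptions: the paper does not state which spectral estimator was used, so a plain periodogram via a direct DFT is shown; the function name `element_spectrum` and the synthetic test tone are illustrative only. In practice `numpy.fft.rfft` would replace the O(N²) loop.

```python
import cmath
import math


def element_spectrum(x, fs):
    """Return (frequencies, power) for a real signal after removing its mean.

    Plain periodogram computed with a direct DFT over bins 0..N//2.
    """
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]  # remove the DC component first
    freqs, power = [], []
    for k in range(n // 2 + 1):
        acc = 0j
        for i, v in enumerate(xc):
            acc += v * cmath.exp(-2j * math.pi * k * i / n)
        freqs.append(k * fs / n)
        power.append(abs(acc) ** 2 / n)
    return freqs, power


if __name__ == "__main__":
    fs = 1500.0  # VLA sampling frequency quoted in Section 4.1
    n = 750      # 0.5 s of data, giving 2 Hz bin spacing
    # Synthetic 112 Hz tone on a DC offset; not SWellEx-96 data.
    x = [3.0 + math.sin(2 * math.pi * 112.0 * i / fs) for i in range(n)]
    f, p = element_spectrum(x, fs)
    print(f[max(range(len(p)), key=p.__getitem__)])  # 112.0
```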

The horizontal distance between the acoustic source and the VLA first decreases and then increases, reaching its minimum at around 60 min. Data from 40 min to 75 min are used here: the segment from 2400 s to 3580 s forms the training set and the segment from 3580 s to 4500 s forms the test set. Each sample is built from four non-overlapping 0.5 s snapshots. After the mean is subtracted from each snapshot, a Fourier transform is applied, and the frequency-domain acoustic pressures of the four snapshots are averaged to form a 21 × 21 covariance matrix. The real and imaginary parts of the upper triangle of this matrix are extracted and concatenated into a vector that serves as the input to the machine learning model. Both the training and test data are divided into 20 classes according to the actual distance between the acoustic source and the array, with labels generated by Equation (5) with K = 20.
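The feature construction described above can be sketched as follows. This is a minimal sketch under stated assumptions: snapshots are given as lists of per-element sample lists; the frequency bin index `k` stands in for the analysis frequency, which is not stated in this section (k = 56 corresponds to 112 Hz for a 750-sample snapshot at 1500 Hz and is chosen only for illustration); the covariance matrix is taken as the snapshot average of outer products p pᴴ; and each pressure vector is scaled to unit norm first, a common choice in SCM-based localization that is assumed here rather than stated. The function names are hypothetical.

```python
import cmath
import math
import random


def dft_bin(samples, k):
    """DFT coefficient at bin k of one snapshot, after removing its mean."""
    n = len(samples)
    mean = sum(samples) / n
    return sum((v - mean) * cmath.exp(-2j * math.pi * k * i / n)
               for i, v in enumerate(samples))


def scm_features(snapshots, k):
    """Build the ML input vector from a group of snapshots.

    snapshots: list of snapshots, each a list of L per-element sample lists.
    Returns real parts then imaginary parts of the upper triangle (diagonal
    included) of the snapshot-averaged L x L sample covariance matrix.
    """
    n_snap = len(snapshots)
    n_elem = len(snapshots[0])
    c = [[0j] * n_elem for _ in range(n_elem)]
    for snap in snapshots:
        p = [dft_bin(sig, k) for sig in snap]
        # Assumed normalization: scale each pressure vector to unit norm.
        norm = math.sqrt(sum(abs(z) ** 2 for z in p)) or 1.0
        p = [z / norm for z in p]
        for a in range(n_elem):
            for b in range(n_elem):
                c[a][b] += p[a] * p[b].conjugate() / n_snap
    upper = [c[a][b] for a in range(n_elem) for b in range(a, n_elem)]
    return [z.real for z in upper] + [z.imag for z in upper]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic noise: 4 snapshots x 21 elements x 750 samples (0.5 s at 1500 Hz).
    snaps = [[[rng.gauss(0.0, 1.0) for _ in range(750)] for _ in range(21)]
             for _ in range(4)]
    print(len(scm_features(snaps, k=56)))  # 21*22/2 = 231 entries -> 462
```

With the 21-element VLA, the upper triangle holds 231 complex entries, so the input vector has 462 real values; the imaginary parts of the diagonal are always zero and could be dropped without losing information.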

4.2. Localization Results and Analysis of Experiment Data

The input vectors obtained in Section 4.1 are fed into the DT, RF, and SVM classifiers to predict the source range from the experimental data. The prediction results are shown in Figure 13. The data segment used is the latter part of the record, which has a relatively high SNR.

The calculation was repeated 100 times, with the reliability set at 96% and the confidence interval set to 92–100% of the true range above and below the ground truth (blue line) to reduce the prediction error. The final classification accuracies of the DT, RF, and SVM models are 0.44, 0.65, and 0.69, respectively. Comparing these experimental results with the simulation results in Section 3 shows that all three classification algorithms give satisfactory predictions. Although the accuracies are not very high, Figure 13 shows that outliers are relatively few and that the predictions lie roughly along the blue solid line representing the actual distance. Because the DT model is unstable and strongly affected by noise, it produces slightly more outliers than the other two models, but it still reveals the approximate location of the acoustic source. The RF and SVM models are relatively stable and have smaller prediction errors. Another reason for the good performance is that the data segment used is the latter part of the record, which has a relatively high SNR.
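The accuracies quoted above are fractions of test samples whose predicted class equals the true class. Below is a minimal sketch of that metric, together with a hypothetical helper that maps a class index back to the centre of its equal-width range bin, so that class predictions can be plotted against the true range as in Figure 13. The bin limits and function names are illustrative assumptions, and the reliability and confidence-interval procedure described above is not reproduced here.

```python
def accuracy(true_labels, pred_labels):
    """Fraction of samples whose predicted class equals the true class."""
    if not true_labels or len(true_labels) != len(pred_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(true_labels, pred_labels))
    return hits / len(true_labels)


def class_to_range(label, r_min, r_max, k=20):
    """Centre of equal-width bin `label` (0-based) spanning [r_min, r_max]."""
    width = (r_max - r_min) / k
    return r_min + (label + 0.5) * width


if __name__ == "__main__":
    print(accuracy([0, 1, 2, 3], [0, 1, 3, 3]))  # 0.75
    print(class_to_range(3, 0.0, 10.0))          # 1.75
```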

5. Conclusions

In this paper, a machine learning approach to acoustic source localization in ocean waveguides was presented and evaluated on both simulated and experimental datasets. Localization is posed as a supervised learning problem. Time-domain acoustic pressure is transformed into the frequency domain, and the acoustic pressure at specific frequency points is extracted as input to the machine learning algorithms. Multiple sets of frequency-domain acoustic pressure data are averaged to construct the frequency-domain covariance matrix. For the classification task, labels are determined using equal-width binning, whereas for the regression task, the normalized actual source position is used as the label. The results demonstrate the reliability of these data processing methods.
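The two labelling schemes can be sketched as follows. Equation (5) is not reproduced in this section, so this is a minimal sketch assuming that it assigns equal-width bins between the minimum and maximum range (K = 20 in Section 4.1) and that the regression label is min-max normalized to [0, 1]. Both the exact forms and the function names are assumptions.

```python
def equal_width_labels(ranges, k=20, r_min=None, r_max=None):
    """Assign each range to one of k equal-width bins (0-based class labels).

    Bin limits default to the data min/max; pass r_min/r_max explicitly so
    that training and test sets share the same class definitions.
    """
    lo = min(ranges) if r_min is None else r_min
    hi = max(ranges) if r_max is None else r_max
    width = (hi - lo) / k
    if width <= 0:
        raise ValueError("range span must be positive")
    # The maximum falls on the upper edge; clamp labels into [0, k - 1].
    return [min(max(int((r - lo) / width), 0), k - 1) for r in ranges]


def minmax_normalize(ranges):
    """Min-max normalize ranges to [0, 1] for use as regression targets."""
    r_min, r_max = min(ranges), max(ranges)
    span = r_max - r_min
    return [(r - r_min) / span if span else 0.0 for r in ranges]


if __name__ == "__main__":
    ranges = [0.0, 2.5, 5.0, 7.5, 10.0]  # hypothetical ranges in km
    print(equal_width_labels(ranges, k=4))  # [0, 1, 2, 3, 3]
    print(minmax_normalize(ranges))         # [0.0, 0.25, 0.5, 0.75, 1.0]
```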

In the simulation study, datasets with different SNRs were generated using the KRAKEN program. For classification, 1D localization performs slightly better than 2D localization. The SVM and FNN models perform significantly better than the DT and RF models at low SNR, and prediction accuracy increases with SNR. The convergence and accuracy of the regression task are slightly lower than those of the classification task, indicating that classification performs better than regression here; only the SVM regression model achieves satisfactory localization at high SNR. In the 2D simulations, treated as a multi-output classification problem, depth predictions are more accurate than range predictions, and the predictions of the machine learning models improve significantly as the SNR increases.
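How noise was added to the KRAKEN fields is not described in this section. The following is a minimal sketch, assuming that the SNR values (2, 5, and 10) are in dB and that circular complex white Gaussian noise is added to each simulated frequency-domain pressure vector so that the mean signal power per element divided by the noise power per element equals the target SNR. The function name and this SNR definition are assumptions, not the paper's stated procedure.

```python
import math
import random


def add_noise(p, snr_db, rng=random):
    """Return p plus circular complex white Gaussian noise at snr_db.

    SNR is taken here as mean signal power across elements divided by the
    noise power per element, expressed in dB (an assumed definition).
    """
    sig_pow = sum(abs(z) ** 2 for z in p) / len(p)
    noise_pow = sig_pow / 10 ** (snr_db / 10)
    sigma = math.sqrt(noise_pow / 2)  # split power equally between re and im
    return [z + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for z in p]


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy 21-element pressure vector standing in for a KRAKEN-simulated field.
    clean = [complex(math.cos(0.3 * m), math.sin(0.3 * m)) for m in range(21)]
    for snr in (2, 5, 10):
        noisy = add_noise(clean, snr, rng)
        measured = sum(abs(a - b) ** 2 for a, b in zip(noisy, clean)) / len(clean)
        print(snr, round(measured, 3))  # realized noise power (random)
```

Passing an explicit `random.Random` instance keeps each generated dataset reproducible.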

In addition, the experimental data of the S5 event of SWellEx-96 were used to localize an underwater acoustic source with several models. Because the data segment used is the latter part of the record, which has a relatively high SNR, the predictions of all the classification algorithms are satisfactory. Comparing the simulation and experimental results, we conclude that relatively simple machine learning models can achieve satisfactory predictions at high SNR with a small computational cost. In future research, recurrent neural networks will be compared with these machine learning models, and the model parameters will be optimized to extract features at lower SNRs and improve underwater acoustic source localization.

Author Contributions

Methodology, P.Y. and H.Z.; Software, P.Y.; Validation, X.W.; Resources, Z.Z.; Writing—original draft, P.Y.; Writing—review & editing, X.W., J.Z. and H.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China for Young Scientists (grant number 12304535) and a self-developed program of the Naval University of Engineering (No. 2023507020).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The simulated data presented in this study are available upon request from the corresponding author because of legal restrictions; the experimental data are available online: http://swellex96.ucsd.edu/.

Acknowledgments

The authors would like to thank the staff and students at NJU for supporting the project and providing technical assistance.

Conflicts of Interest

The authors declare no conflict of interest.

References

Sørensen, F.F.; Mai, C.; von Benzon, M.; Liniger, J.; Pedersen, S. The Localization Problem for Underwater Vehicles: An Overview of Operational Solutions. Ocean Eng. 2025, 330, 121173. [Google Scholar] [CrossRef]
Thorpe, S.A.; Stubbs, A.R.; Hall, A.J.; Turner, R.J. Wave-produced bubbles observed by side-scan sonar. Nature 1982, 296, 636–638. [Google Scholar] [CrossRef]
Vaccaro, R.J. The past, present, and the future of underwater acoustic signal processing. IEEE Signal Process. Mag. 1998, 15, 21–51. [Google Scholar] [CrossRef]
Bucker, H.P. Use of calculated sound fields and matched-field detection to locate sound sources in shallow water. J. Acoust. Soc. Am. 1976, 59, 368–373. [CrossRef]
Baggeroer, A.B.; Kuperman, W.A. An overview of matched field methods in ocean acoustics. IEEE J. Ocean. Eng. 1993, 18, 401–424. [Google Scholar] [CrossRef]
Baggeroer, A.B. Why Did Applications of MFP Fail, or Did We Not Understand How to Apply MFP? Available online: https://www.mendeley.com/catalogue/b45adab6-b442-34dd-8fde-79327c92a9de/ (accessed on 1 August 2025).
Schmidt, H.; Baggeroer, A.B.; Kuperman, W.A.; Scheer, E.K. Environmentally tolerant beamforming for high-resolution matched field processing: Deterministic mismatch. J. Acoust. Soc. Am. 1990, 88, 1851–1862. [Google Scholar] [CrossRef]
Shwetha, M.; Krishnaveni, S. A Systematic Analysis, Outstanding Challenges, and Future Prospects for Routing Protocols and Machine Learning Algorithms in Underwater Wireless Acoustic Sensor Networks. J. Interconnect. Netw. 2025, 25, 2330001. [Google Scholar] [CrossRef]
Baggeroer, A.B.; Kuperman, W.A. Matched Field Processing in Ocean Acoustics. In Acoustic Signal Processing for Ocean Exploration; Moura, J.M.F., Lourtie, I.M.G., Eds.; NATO ASI Series; Springer: Dordrecht, The Netherlands, 1993; Volume 388. [Google Scholar] [CrossRef]
Lei, Z.; Yang, K.; Ma, Y. Passive localization in the deep ocean based on cross-correlation function matching. J. Acoust. Soc. Am. 2016, 139, EL196–EL201. [Google Scholar] [CrossRef]
Gola, K.K.; Dhingra, M.; Gupta, B.; Rathore, R. An empirical study on underwater acoustic sensor networks based on localization and routing approaches. Adv. Eng. Softw. 2023, 175, 103319. [Google Scholar] [CrossRef]
Ashok, P.; Latha, B. Feature Extraction of Underwater Acoustic Signal Target Using Machine Learning Technique. Trait. Du Signal 2024, 41, 1303. [Google Scholar] [CrossRef]
Yang, H.; Lee, K.; Choo, Y.; Kim, K. Underwater Acoustic Research Trends with Machine Learning: General Background. J. Ocean Eng. Technol. 2020, 34, 147–154. [Google Scholar] [CrossRef]
Steinberg, B.Z.; Beran, M.J.; Chin, S.H.; Howard, J.H., Jr. A neural network approach to source localization. J. Acoust. Soc. Am. 1991, 90, 2081–2090. [Google Scholar] [CrossRef]
Niu, H.; Reeves, E.; Gerstoft, P. Source localization in an ocean waveguide using supervised machine learning. J. Acoust. Soc. Am. 2017, 142, 1176–1188. [Google Scholar] [CrossRef]
Niu, H.; Ozanich, E.; Gerstoft, P. Ship localization in Santa Barbara Channel using machine learning classifiers. J. Acoust. Soc. Am. 2017, 142, EL455–EL460. [Google Scholar] [CrossRef] [PubMed]
Ferguson, E.L.; Williams, S.B.; Jin, C.T. Sound Source Localization in a Multipath Environment Using Convolutional Neural Networks. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018. [Google Scholar] [CrossRef]
Huang, Z.; Xu, J.; Gong, Z.; Wang, H.; Yan, Y. Source localization using deep neural networks in a shallow water environment. J. Acoust. Soc. Am. 2018, 143, 2922–2932. [Google Scholar] [CrossRef] [PubMed]
Howarth, K.; Van Komen, D.F.; Neilsen, T.B.; Knobles, D.P.; Dahl, P.H.; Dall’Osto, D.R. Effect of signal to noise ratio on a convolutional neural network for source ranging and environmental classification. J. Acoust. Soc. Am. 2019, 146, 2961–2962. [Google Scholar] [CrossRef]
Liu, Y.; Niu, H.; Li, Z. A multi-task learning convolutional neural network for source localization in deep ocean. J. Acoust. Soc. Am. 2020, 148, 873–883. [Google Scholar] [CrossRef] [PubMed]
Ozanich, E.; Gerstoft, P.; Niu, H. A feedforward neural network for direction-of-arrival estimation. J. Acoust. Soc. Am. 2020, 147, 2035–2048. [Google Scholar] [CrossRef]
Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106. [Google Scholar] [CrossRef]
Freund, Y.; Schapire, R. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139. [Google Scholar] [CrossRef]
Suykens, J.A.K.; Vandewalle, J. Least Squares Support Vector Machine Classifiers. Neural Process. Lett. 1999, 9, 293–300. [Google Scholar] [CrossRef]
Sanger, T.D. Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Netw. 1989, 2, 459–473. [Google Scholar] [CrossRef]
Porter, M.B. The KRAKEN Normal Mode Program; 1992. Available online: http://oalib.hlsresearch.com/Modes/index.html (accessed on 1 August 2025).
The SWellEx-96 Experiment. Available online: http://swellex96.ucsd.edu/ (accessed on 1 August 2025).
Goodwin, P.; Lawton, R. On the asymmetry of the symmetric MAPE. Int. J. Forecast. 1999, 15, 405–408. [Google Scholar] [CrossRef]

Figure 1. Algorithm flow of machine learning for underwater acoustic source localization.

Figure 2. The simulated SWellEx-96 environment.

Figure 3. The localization results of each algorithm at an SNR of 2.

Figure 4. The localization results of each algorithm at an SNR of 5.

Figure 5. The localization results of each algorithm at an SNR of 10.

Figure 6. The regression localization results of each algorithm at an SNR of 5.

Figure 7. The regression localization results of each algorithm at an SNR of 10.

Figure 8. The 2D localization results of each algorithm at an SNR of 2.

Figure 9. The 2D localization results of each algorithm at an SNR of 5.

Figure 10. The 2D localization results of each algorithm at an SNR of 10.

Figure 11. The experimental environment and the distance between the acoustic source and the array. (a) The travel trajectory and array position of the acoustic source of the S5 event. (b) The distance of the acoustic source from each detector every 5 min.

Figure 12. An example of the signal spectrum recorded by element 5.

Figure 13. The localization results of each model based on experimental data.

Table 1. Comparison of key parameters of different machine learning models.

Models	Influencing Factors	Parameters
DT	Information gain/Gini index	ID3 division strategy/Gini index
RF	Number of base learners	Bagging strategy/Boosting strategy
SVM	Hyperplane/Margin/Kernel function	Radial basis function
FNN	Activation function/Weights	Softmax function/SGD algorithm

Table 2. Best P_MAPE (%) of DT, RF, SVM, and FNN predictions for 1D localization.

Model	Data_1 (SNR of 2)	Data_2 (SNR of 5)	Data_3 (SNR of 10)
DT classifier	86	53	25
RF classifier	52	10	3
SVM classifier	28	3	20
FNN classifier	24	0	1

Table 3. Best P_MAPE (%) of RF and SVM predictions for 2D localization.

Model	Data_4 (SNR of 2)	Data_5 (SNR of 5)	Data_6 (SNR of 10)
RF (distance)	85	77	5
SVM (distance)	58	19	3
RF (depth)	55	47	3
SVM (depth)	54	15	2

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
