Article

Deep Learning-Based Source Localization with Interference Striation of a Towed Horizontal Line Array

School of Ocean Engineering and Technology, Sun Yat-sen University, Daxuelu Road, Tangjiawan Town, Zhuhai 519082, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(15), 3053; https://doi.org/10.3390/electronics14153053
Submission received: 4 July 2025 / Revised: 27 July 2025 / Accepted: 28 July 2025 / Published: 30 July 2025
(This article belongs to the Special Issue Low-Frequency Underwater Acoustic Signal Processing and Applications)

Abstract

The aperture of the towed horizontal line array is limited and the received signal is unstable in a complex ocean environment, making it difficult to determine the location of the sound source. To address this challenge, this paper presents MoELocNet (Mixture of Experts Localization Network) for deep-sea sound source localization, leveraging interference structures in range-frequency domain signals from a towed horizontal line array. Unlike traditional correlation-based methods constrained by time-varying ocean environments and low signal-to-noise ratios, the model employs multi-expert and multi-task learning to extract interference periods from single-frame data, enabling robust estimation of source range and depth. Simulation results demonstrate its superior performance in the deep-sea shadow zone, achieving a range localization error of 0.029 km and a depth error of 0.072 m. The method exhibits strong noise robustness and delivers satisfactory results across diverse deep-sea zones, with optimal performance in shadow zones and secondary effectiveness in the direct arrival zone.

1. Introduction

Underwater source localization has always been a popular and challenging research topic. Conventional physical methods are sensitive to the environment and therefore struggle to maintain good performance, especially in the deep ocean, where the sound speed profile varies greatly with time and range. Furthermore, frequently occurring internal wave phenomena pose great challenges to underwater source localization. Currently, using a horizontal line array for source localization has become a common option [1,2]. It plays a significant role in target direction estimation and can be effectively applied to moving target tracking. Meanwhile, benefiting from the array gain, it can effectively improve the signal-to-noise ratio of the received signal. Particularly at low frequencies, where sound propagation attenuation is relatively slow, a longer detection range can be achieved.
In previous experiments [3], interference striations varying with space and frequency could be observed from the received signals. Chuprov [4] and L.M. Brekhovskikh [5] first described the waveguide invariant in the ocean, a physical quantity used to explain this interference structure of the sound field in the waveguide [6,7]; T.C. Yang [8] further elaborated on the sound intensity interference striations that vary with time and frequency in the ocean and used a two-dimensional Fourier transform to extract the waveguide invariant. Subsequently, interference striations were successfully applied to the positioning and tracking of acoustic sources. Zurk [9], Rouseff [10,11,12], and other researchers have done extensive work on the waveguide invariant in the ocean [13] and established strong connections between interference striation patterns and source range. They also extracted the active waveguide invariant [14] and carried out active target tracking based on the interference fringes.
Striation structures can also be observed in a horizontal line array when it serves as a beamformer [15]. Turgut [10] applied this phenomenon to broadband source localization with horizontal-beam output, with the horizontal line array mounted on the seabed and a quite large element spacing. In the following years, both horizontal and vertical arrays were used to simulate interference striations, and their practical interference structures were analyzed through experiments. The beamformer output provides better striation patterns thanks to the array gain, but it still relies on time accumulation. Later, Zurk obtained the waveguide invariant using the Direction of Arrival (DOA) of the passive received signal and applied striation-based beamforming for active sonar, utilizing the array to facilitate source localization. In terms of neural networks, research has mainly focused on matched field positioning [16], target recognition [17,18,19,20], etc. Liu [21] used a multitask neural network for the depth and range localization of the sound source; Niu [22] used a multi-task convolutional neural network for source localization; and Qian [23] used a multi-task training structure to perform matched field positioning of the sound source in an internal soliton wave environment. The works of the latter two researchers were verified with experimental data and both achieved good results.
However, the practical application of interference striation-based localization in complex deep-sea environments faces several critical challenges. Firstly, the technique exhibits significant sensitivity to dynamic oceanographic conditions. Temporal variations in sound speed profiles, seafloor sediment properties, and internal wave activity [24] can substantially distort striation patterns, compromising their periodicity and undermining the reliability of conventional feature extraction methods. Secondly, technical constraints pose substantial limitations, outlined as follows: the restricted aperture of towed horizontal arrays results in inadequate spatial sampling resolution, producing poorly resolved striation features that become indistinguishable from background noise, particularly in the low signal-to-noise ratio (SNR) conditions characteristic of deep-water operations. Thirdly, the fundamental physics presents inherent difficulties—the relationship between striation characteristics and source parameters (range and depth) demonstrates complex nonlinearity and strong environmental dependence, rendering traditional analytical models inadequate for establishing robust inversion frameworks.
In this context, deep learning emerges as a particularly promising solution, though its potential for interference striation analysis remains largely untapped. The approach offers distinct advantages for overcoming these challenges. Unlike conventional methods reliant on handcrafted features, deep neural networks can autonomously learn hierarchical representations of striation patterns directly from raw range-frequency data. This enables the extraction of subtle, environmentally adaptive features that are inaccessible to manual design approaches, potentially revolutionizing interference-based localization in complex marine environments. Based on the above research background, this paper proposes MoELocNet (Mixture of Experts Localization Network), a deep learning-based source localization method that uses a single frame of the signal received by a towed horizontal line array. The structure of this paper is as follows:
  • In Section 2, the theoretical foundation is established by deriving the relationships among the underwater sound intensity interference period, waveguide invariant, receiving depth, and receiving range. These theoretical derivations are subsequently validated using both simulated and experimental data, providing a solid physical acoustic basis for the subsequent analysis.
  • Section 3 details the training models employed in this study, along with the generation of the dataset and the data input process. The methodology for dataset generation is presented, covering the simulation setup using Bellhop to create acoustic signals under diverse oceanic conditions, such as varying depths, ranges, and incident angles relative to the horizontal array.
  • Section 4 focuses on the comparative evaluation and optimization of neural network models. Leveraging simulation data from the deep-sea shadow zone, a comprehensive comparison of different network architectures is presented. Following this, a full-scale model test is performed to further validate its practical effectiveness. Then we explore the model’s performance under varying Signal-to-Noise Ratios (SNRs). These efforts collectively contribute to a more robust and reliable underwater source localization framework, as discussed in the concluding sections of the paper.

2. Related Work

2.1. Range Frequency Interference

In the ocean waveguide, the intensity of the sound field has a robust interference structure in the two-dimensional plane of range and frequency. The frequency interval of the interference striations is related not only to the range between the transmitting and receiving positions but also to the frequency. From the perspective of normal mode theory, it can be demonstrated that modes of different orders constructively or destructively superimpose, exhibiting alternating bright and dark bands in frequency. From the perspective of ray acoustics, it is evident that for any receiving point in space, sound rays arrive from different directions. Due to the multi-path effect, interference occurs at specific frequencies. For example, in the direct arrival zone, there is an interference structure composed of the direct wave and the sea surface reflected wave; in the shadow zone, there are four dominant acoustic paths, and an interference structure is formed by these four main sound rays: the bottom reflected path, the surface–bottom reflected path, the bottom–surface reflected path, and the surface–bottom–surface reflected path. In the convergence zone, due to the dense sound rays, it is usually difficult to observe an obvious interference structure. In most situations, the sound speed profile, sediment parameters [25], and the source location jointly determine the pattern of interference striation. Since the first two factors are observable, a relationship between the source location and the interference striation can be established.
Figure 1a shows the sound intensity distribution at different receiving ranges (0–200 km) when the transmitting depth is 50 m and the receiving depth is 100 m (the ocean environment is depicted in Figure 1b: the ocean depth is 5000 m, the seabed sound velocity is 1600 m/s, the absorption coefficient is 0.1 dB/λ, and the sound source intensity is 200 dB). It can be seen that the interference structure between the direct sound area and the first convergence zone is extremely obvious. In the first shadow zone, the interference period is positively correlated with the receiving range. In the second shadow zone, the interference period is relatively large, and it is difficult to observe the interference structure within a short spatial range. In the third shadow zone, the sound intensity begins to weaken and thus the striation becomes implicit. In the convergence zones, the sound intensity presents obvious bright fringes, which makes it difficult to extract the interference structure. The interference structure that varies with the receiving range can be explained through the waveguide invariant theory in the ocean waveguide. The waveguide invariant is expressed as $\beta = \frac{r}{\omega}\frac{\Delta\omega}{\Delta r}$, and the value of β is approximately 1 in shallow water and asymptotically infinite in the convergence zone [7]. In a typical passive sonar scenario, after acquiring the received signal with a single hydrophone, the data are processed to derive the sound intensity distribution in the space-frequency domain. The waveguide invariant and interference period can be extracted through a two-dimensional Fourier transform or Radon transform, and the sound source position information (i.e., range and depth) can then be estimated. The interference structure of the horizontal array is also subject to the combined effects of the target's depth and range. By exploiting this phenomenon in combination with deep learning, the sound source can be localized.
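As a hedged illustration of this period-extraction step (all numbers below are assumptions, not the paper's data), a one-dimensional Fourier transform over frequency recovers the interference period of a synthetic two-path intensity pattern:

```python
import numpy as np

# Synthetic two-path interference pattern I(f) ~ 1 - cos(2*pi*f*dt);
# dt is an assumed multipath delay, not a value from the paper.
f = np.linspace(200.0, 400.0, 2048)          # frequency axis (Hz)
dt = 0.025                                    # multipath delay (s)
I = 1.0 - np.cos(2.0 * np.pi * f * dt)

# The interference period Delta_f = 1/dt shows up as a peak in the
# "delay" spectrum obtained by transforming I over frequency.
delays = np.fft.rfftfreq(f.size, d=f[1] - f[0])
spectrum = np.abs(np.fft.rfft(I - I.mean()))
peak_delay = delays[spectrum.argmax()]
print(1.0 / peak_delay)   # estimated interference period, ~40 Hz
```

The same idea generalizes to the two-dimensional Fourier or Radon transform mentioned above, where the striation slope additionally encodes the waveguide invariant.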
In the first convergence zone, a relatively clear interference structure can be extracted with a small array aperture. At a further range, affected by the signal-to-noise ratio and aperture, there is insufficient coverage in the spatial range, resulting in the inability to extract an obvious interference structure. However, by training with a neural network, the sound intensity information can be well extracted, and then the depth and range of the sound source can be robustly estimated.

2.2. Theoretical Analysis

2.2.1. Normal Mode

Initially, it is postulated that the seabed bathymetry varies gradually, so the environment can be considered range independent, and normal mode theory is adopted for the analysis. For a passive sonar, assume that the depth of the sound source is z. Then, at the receiving position ( r , z r ) , the sound pressure at the frequency ω is
$$p_0(r, z; z_r; \omega) = \frac{i}{\rho_0 (8\pi)^{1/2}}\, e^{-i\pi/4} \sum_m \frac{\psi_m(z)\,\psi_m(z_r)\, e^{i k_m r}}{\sqrt{k_m r}} \equiv \sum_m B_m\, e^{i k_m r},$$
where k m is the horizontal wavenumber of the m-th mode, and ψ m ( z ) is the corresponding eigenfunction. According to the definition of sound intensity I ( r , z ; z r ; ω ) = p 0 ( r , z ; z r ; ω ) p 0 * ( r , z ; z r ; ω ) , it can be obtained that
$$I(r, z; z_r; \omega) = \sum_m B_m B_m^{*} + 2\sum_{m,n} B_m B_n^{*} \cos\!\left((k_m - k_n)\, r\right).$$
In this equation, the initial term represents the autocorrelation magnitude of same-order modes, while the subsequent term denotes the cross-correlation among different modes. The coherent term arising from different modes explains the interference effect of the sound intensity in the range-frequency domain. The interference distance between different modes can be expressed as $r = 2\pi q / (k_m - k_n)$, where q is an integer.
Regarding the sound intensity interference striation, under the condition of constant sound intensity, there exists
$$\mathrm{d}I = \frac{\partial I}{\partial \omega}\,\mathrm{d}\omega + \frac{\partial I}{\partial r}\,\mathrm{d}r = 0,$$
where
$$\frac{\partial I}{\partial r} = -2\sum_{m,n} B_m B_n^{*}\,\Delta k_{mn}\sin(\Delta k_{mn}\, r),$$
$$\frac{\partial I}{\partial \omega} = -2r\sum_{m,n} B_m B_n^{*}\,\frac{\partial \Delta k_{mn}}{\partial \omega}\sin(\Delta k_{mn}\, r),$$
with $\Delta k_{mn} = k_m - k_n$. Let u and v denote the group velocity and phase velocity of the modes, respectively, defined by $u = \mathrm{d}\omega/\mathrm{d}k$ and $v = \omega/k$.
The above formulas can then be rewritten as
$$\frac{\partial I}{\partial r} = -2\omega\sum_{m,n} B_m B_n^{*}\left(\frac{1}{v_m}-\frac{1}{v_n}\right)\sin(\Delta k_{mn}\, r),$$
$$\frac{\partial I}{\partial \omega} = -2r\sum_{m,n} B_m B_n^{*}\left(\frac{1}{u_m}-\frac{1}{u_n}\right)\sin(\Delta k_{mn}\, r).$$
Therefore, the waveguide invariant is defined as
$$\beta = -\frac{\mathrm{d}(1/v)}{\mathrm{d}(1/u)} = -\frac{r}{\omega}\,\frac{\partial I/\partial r}{\partial I/\partial \omega} = \frac{r}{\omega}\,\frac{\Delta\omega}{\Delta r}.$$

2.2.2. Ray Theory

For any point in the ocean waveguide environment, the direct sound field can be expressed as the coherent superposition of multi-path sound rays [26]
$$p(f; r, z_r) = \sum_{u=0}^{\infty} A_u(r, z_r)\, e^{i \Phi_u(r, z_r)}.$$
In this formula, $A_u(r, z_r)$ represents the amplitude factor of the eigenray launched at the angle $\alpha_u$, and $\Phi_u(r, z_r)$ is the corresponding phase factor. According to ray acoustics theory, they can be expressed as
$$A_u(r, z) = \sqrt{\frac{\cos\alpha_u}{r\, (\partial r / \partial \alpha)\big|_{\alpha_u} \sin\alpha_u}}\; V_{u1}^{\,n} V_{u2}^{\,m}\, e^{-\alpha S_u},$$
$$\Phi_u(r, z) = k_0 \int_{z_s}^{z_r} \sqrt{n^2(z) - \cos^2\alpha_u}\,\mathrm{d}z + k_0 r \cos\alpha_u + \Phi_u',$$
where r is the horizontal range, $z_s$ is the source depth, $z_r$ is the receiving depth, $k_0$ is the wavenumber, and $V_{u1}$ and $V_{u2}$ represent the surface and bottom reflection coefficients of the ray, respectively. The superscripts n and m denote the numbers of contacts with the corresponding boundaries, and $\Phi_u'$ characterizes the phase change of the ray on reflection at an interface or at a turning point in the water column. For a ray with multiple reflections, the integral term in $\Phi$ should be taken along the ray path, which can be expressed more specifically as
$$\int_{\mathrm{path}} S\,\mathrm{d}z = 2N \int_{z_l}^{z_u} S\,\mathrm{d}z + \mu \int_0^{z_s} S\,\mathrm{d}z + \nu \int_0^{z_r} S\,\mathrm{d}z,$$
where the integrand is $S = \sqrt{n^2(z) - \cos^2\alpha_u}$, N is an integer, $z_l$ and $z_u$ are the lower and upper turning depths of the ray, respectively, $z_s$ and $z_r$ represent the depths of the source and the receiver, and $\mu$ and $\nu$ each take the value +1 or −1. Let $\Phi_u(r, z_r) \triangleq 2\pi f t_u$, where $t_u$ is the propagation time of the ray from the source to the receiving point and f is the source frequency. Then the arriving sound pressure can be expressed in terms of the eigenray amplitudes and arrival times as
$$p(f; r, z_r) = \sum_{u=0}^{\infty} A_u(r, z_r)\, e^{i 2\pi f t_u}.$$
Considering that the sound source is near the surface and the receiving position is in the deep-sea shadow zone, the arrival structure of the sound rays is shown in Figure 2. In the shadow zone, neither the direct ray nor the refracted (reversed) ray can arrive; the field is composed of rays reflected by the bottom and the surface. Under this condition, $z_l = 0$ and $z_u = H$, where H is the ocean depth. Neglecting rays with second- and higher-order bottom reflections, there exist four dominant rays, namely the bottom reflected ray, the surface–bottom reflected ray, the bottom–surface reflected ray, and the surface–bottom–surface reflected ray. When a ray touches the ocean surface, a phase flip of 180° occurs. Let the grazing angles of these four rays at the source be $\alpha_1, \alpha_2, \alpha_3, \alpha_4$, respectively, and their arrival times be $t_1, t_2, t_3, t_4$, respectively. The sound field can then be approximately expressed as
$$p(f; r, z_r) = A_1 e^{i 2\pi f t_1} - A_2 e^{i 2\pi f t_2} - A_3 e^{i 2\pi f t_3} + A_4 e^{i 2\pi f t_4}.$$
For a target near the sea surface, the trajectories of the bottom reflected ray and the surface–bottom reflected ray are similar, so the source grazing angles $\alpha_1$ and $\alpha_2$ are almost equal. Similarly, the trajectories of the bottom–surface reflected ray and the surface–bottom–surface reflected ray are similar, so $\alpha_3$ and $\alpha_4$ are almost equal. Let $\Delta_1 = t_2 - t_1$ and $\Delta_2 = t_3 - t_1$; the sound intensity can then be approximated as
$$I(f; r, z_r) = p(f; r, z_r)\, p^{*}(f; r, z_r) \approx A_1^2 \left[1 - \cos(2\pi f \Delta_1)\right]\left[1 - \cos(2\pi f \Delta_2)\right].$$
This formula indicates that there are two interference periods in the frequency distribution of the sound intensity. They are, respectively,
$$\Delta f_1 = \frac{1}{\Delta t_{12}} \approx \frac{c_0}{2\int_0^{z_s} \sqrt{n^2(z) - \cos^2\alpha_2}\,\mathrm{d}z}$$
and
$$\Delta f_2 = \frac{1}{\Delta t_{13}} \approx \frac{c_0}{2\int_0^{z_r} \sqrt{n^2(z) - \cos^2\alpha_3}\,\mathrm{d}z}.$$
Here, c 0 denotes the sound velocity corresponding to the depth where the sound source is located. The two aforementioned equations demonstrate that the interference period of the first type within the deep-sea shadow zone is independent of the receiving depth but decreases as the sound source depth increases. As the range expands, the value of α 2 decreases, leading to an increase in the interference period with distance. In contrast, the interference period of the second type is unaffected by the sound source depth but decreases with an increase in the receiving depth. Similar to the first type, as the range increases, α 3 decreases, resulting in an increase in the interference period with distance.
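To make these depth dependences concrete, the two periods can be evaluated under a simplifying isovelocity assumption (n(z) = 1), for which each integral reduces to z sin α; the depths, angles, and sound speed below are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Isovelocity sketch of the two interference periods: with n(z) = 1 the
# integral of sqrt(n^2 - cos^2 alpha) over depth reduces to z * sin(alpha).
c0 = 1500.0                 # sound speed at the source depth (m/s), assumed
z_s, z_r = 50.0, 100.0      # source / receiver depths (m), assumed
alpha2 = np.deg2rad(15.0)   # grazing angle of the surface-bottom path, assumed
alpha3 = np.deg2rad(14.0)   # grazing angle of the bottom-surface path, assumed

df1 = c0 / (2.0 * z_s * np.sin(alpha2))   # first period: source depth only
df2 = c0 / (2.0 * z_r * np.sin(alpha3))   # second period: receiver depth only
print(df1, df2)   # deeper receiver -> smaller second-type period
```

Shrinking the grazing angles (as happens when the range grows) increases both periods, matching the trend described above.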
Figure 2. Schematic diagram of the sound ray arrival structure in the deep-sea shadow zone. (a) The ray structure, including the bottom reflected ray, surface–bottom reflected ray, bottom–surface reflected ray, and surface–bottom–surface reflected ray. (b) The emission angles of the different rays: the bottom reflected ray with α 1 , the surface–bottom reflected ray with α 2 , the bottom–surface reflected ray with α 3 , and the surface–bottom–surface reflected ray with α 4 .
The locations where the peak values of the sound intensity emerge satisfy
$$f = \frac{l}{\Delta t_i}, \quad l = 1, 2, 3, \ldots;\ i = 1, 2.$$
According to the definition in Equation (8), it can be concluded that $\beta = -\frac{r}{\Delta t}\frac{\partial \Delta t}{\partial r}$. Finally, the waveguide invariant can be obtained as
β 1 r 2 0 z s n 2 ( z ) cos 2 α 2 d z [ c o s ( α 1 ) c o s ( α 3 ) ]
and
β 2 r 2 0 z r n 2 ( z ) cos 2 α 3 d z [ c o s ( α 1 ) c o s ( α 2 ) ] .

2.2.3. Experiment Analysis

Experimental data from an Indian Ocean survey conducted in 2024 are employed to validate the aforementioned theory. The deployment diagram is shown in Figure 3a. The ocean depth in the experimental area is approximately 5250 m. The sound source is a towed source at a depth of 110 m, transmitting a hyperbolic frequency modulated (HFM) signal with a bandwidth of 250 Hz–350 Hz. A vertical hydrophone array covering the entire ocean depth is used to collect the signal, and the received signals at specific depths (i.e., 50 m, 100 m, 500 m, and 700 m) are selected for data processing. In the experiment, the range between the sound source and the hydrophone array is determined by the ship-borne GPS, and the horizontal range between them gradually increased as the experimental ship sailed. The sound speed profile measured in the experiment is shown in Figure 3b.
After performing matched filtering and power spectral density calculation on the received signals at distinct time intervals and calculating the corresponding sound energy, the received sound intensity as a function of frequency and propagation range can be obtained, as shown in Figure 4. An explicit interference structure can be observed in Figure 4a,b. Taking the case where the receiving point is located at a depth of 50 m, the method described above can be used to extract the waveguide invariant of the interference fringes, which at 30 km is approximately β 2 = 1.7984. Similarly, calculation with the normal mode theory yields β 2 = 1.6238. As depicted in Figure 4a, the interference period steadily expands as the receiving range grows. Second-type interference striations are more readily discernible in the received signals at shallower depths, whereas at greater depths, first-type interference striations stand out more prominently.

3. Materials and Methods

3.1. Range Frequency Interference Striation in Horizontal Line Array

The horizontal array reception also exhibits the same interference structure. Such a structure is typically obtained by performing time-frequency analysis on signals at different distances after acquiring signals through beamforming. The advantage of this method lies in the ability to achieve a higher signal-to-noise ratio (SNR) via array gain, but it also requires a certain tracking time for the target [27,28]. In addition to this beamforming-based method, the towed horizontal array enables the effective observation of interference patterns in a single snapshot. Its spatial resolution capability is attributed to the array aperture, allowing it to capture acoustic field interference structures without the need for prolonged signal accumulation—unlike traditional single-hydrophone techniques that rely on temporal integration. To begin with, the acoustic field distribution received by the towed horizontal array is analyzed to verify the feasibility of the interference structure of the horizontal array for source localization. Suppose that in a deep ocean waveguide environment shown in Figure 5, a towed horizontal array is used to collect acoustic signals. The horizontal array has 100 elements with an element spacing of 0.5 m, and the receiving depth is at 100 m. The distance between the array head and the sound source in the horizontal direction is regarded as the source range. Neglecting the array inclination, the interference structures received by this horizontal array under different sound source depths and ranges are shown in Figure 6. It can be seen that the depth and range jointly affect the interference structure of the signals received by a horizontal array, and this feature is more explicit when the sound source depth is near the surface.
As can be seen from Figure 6, when the sound source depth is 6 m, only the second-type interference striation can be effectively observed, and it is clearly visible at distances ranging from 10 km to 40 km. In contrast, when the sound source depth is 50 m, two types of interference striation can be observed at moderate distances. However, either too large or too small distances will impede the extraction of these interference structures. Considering the case of a sound source depth of 50 m, by extracting the periods and comparing them with the results calculated through Formulas (16) and (17), the result depicted in Figure 7 is obtained. It is evident that for short-range scenarios, the calculation formula can accurately predict the periodic structures of the two types of interference fringes. Nevertheless, as the receiving range increases, the simulated interference period gradually exceeds the predicted period. When the distance becomes even greater, the horizontal array proves inadequate for supporting the extraction of the interference period.
In the previous section, the waveguide invariant and the interference period of the underwater acoustic field were deduced through normal mode theory and ray theory, and the characteristic relationships of their variation with source depth and range were obtained. Under ideal circumstances, these characteristics can be used for sound source localization. However, in practice, the uncertainty of the source movement and the instability of the received signal make it difficult to achieve good performance. Extracting the interference fringes from a single-frame signal of the horizontal array and using them for source localization therefore becomes an attractive option. Stable estimation of the source range depends on the judgment of the source depth, and in some cases misjudging the source depth may lead to relatively large errors. On the other hand, the signal-to-noise ratio of the received signal also greatly affects the positioning accuracy. As the range increases, the aperture of the horizontal array becomes insufficient, making it hard to extract the interference period. For the above reasons, the next section adopts a deep learning-based method to extract the characteristic quantities of the sound intensity interference structure for estimating the depth and range of the sound source.

3.2. Source Localization Method

3.2.1. Input Data Processing

After the sound pressure reaches the array, the complex sound pressure at different frequencies f is expressed as
p ( f ) = S ( f ) g ( f , r ) + ϵ ,
where p ( f ) = [ p 1 ( f ) , p 2 ( f ) , … , p L ( f ) ] T represents the sound pressure received by L hydrophones, S ( f ) represents the signal spectrum of the sound source, g ( f , r ) is the Green's function at a range r, and ϵ is the noise. The ratio of the acoustic signal energy to the noise energy is used as the signal-to-noise ratio, which in logarithmic form is expressed as
$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{l=1}^{L} \sum_{f=1}^{F} |p_l(f)|^2}{F L \sigma^2},$$
where p l ( f ) is the complex sound pressure at the frequency point f of the l-th hydrophone, σ 2 is the variance of the noise, and F is the number of frequency points.
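A minimal numerical sketch of this SNR definition with synthetic complex pressures (the array size, bin count, and noise variance below are assumptions):

```python
import numpy as np

# Synthetic check of the logarithmic array SNR definition above:
# L hydrophones, F frequency bins, noise variance sigma^2.
rng = np.random.default_rng(0)
L, F, sigma2 = 100, 256, 0.5
p = rng.normal(size=(L, F)) + 1j * rng.normal(size=(L, F))  # complex pressure

# SNR = 10*log10( sum_l sum_f |p_l(f)|^2 / (F * L * sigma^2) )
snr_db = 10.0 * np.log10(np.sum(np.abs(p) ** 2) / (F * L * sigma2))
print(round(snr_db, 2))   # ~6 dB for this synthetic case (ratio ~ 2/0.5)
```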
In order to extract the interference structure, the received sound pressure is processed to obtain the sound intensity as follows.
$$I(f) = 10 \log_{10}\!\left(\mathbf{p}(f) \cdot \mathbf{p}^H(f)\right)$$
In actual calculation, since only the interference structure needs to be extracted, a normalization operation based on the standard deviation is carried out for it.
$$\mu_I = \frac{1}{L^2} \sum_{m=1}^{L} \sum_{n=1}^{L} I_{mn}, \qquad \sigma_I = \sqrt{\frac{1}{L^2} \sum_{m=1}^{L} \sum_{n=1}^{L} \left(I_{mn} - \mu_I\right)^2},$$
$$I_{\mathrm{norm},ij} = \frac{I_{ij} - \mu_I}{\sigma_I}, \quad i = 1, \ldots, L;\ j = 1, \ldots, L.$$
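The normalization above can be sketched as follows (the map size follows the L × L form of the equations; the input values are synthetic):

```python
import numpy as np

# Global standard-deviation normalization of the intensity map, matching
# the mu_I / sigma_I definitions above.
def normalize_intensity(I):
    mu = I.mean()        # mu_I = (1/L^2) * sum_mn I_mn
    sigma = I.std()      # sigma_I = sqrt((1/L^2) * sum_mn (I_mn - mu_I)^2)
    return (I - mu) / sigma

I = np.random.default_rng(1).normal(5.0, 2.0, size=(100, 100))
In = normalize_intensity(I)
print(In.mean(), In.std())   # ~0 and ~1 after normalization
```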

3.2.2. Data Augmentation

Considering the direction of arrival and setting it as θ , the sound pressure could be simplified as
$$\mathbf{p}(f) = p_1(f)\left[1,\ e^{-j\frac{2\pi f}{c} d\cos\theta},\ \ldots,\ e^{-j\frac{2\pi f}{c}(N-1) d\cos\theta}\right]^T,$$
where d is the element spacing of the array, N is the number of elements, and c is the sound velocity. In this formula, as θ changes, the phase variation across the array ranges from 0 to ω ( N − 1 ) d cos θ / c . Similarly, the acoustic path length to each element varies from R to R + ( N − 1 ) d cos θ , where R denotes the source range. Therefore, signals arriving at different angles can be approximated as crops at different positions along the array. By performing image data augmentation on the input signals, applying cutout operations within the range ( 0 , N ) along the receiving-range dimension, the tolerance of samples to receiving angles is increased.
Figure 8 shows an example of data cropping. In this case, the arrival angle is 30° and half of the received signal is cropped to form the augmented signal, which is trained together with the original signal.
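A minimal sketch of this cutout-style augmentation along the element dimension (the crop position and width below are illustrative assumptions):

```python
import numpy as np

# Cutout augmentation over the element (range) axis of an (N, F) intensity
# map, mimicking partial array coverage at different arrival angles.
def cutout_elements(x, start, width):
    """Zero out rows [start, start+width) along the element axis."""
    y = x.copy()
    y[start:start + width, :] = 0.0
    return y

x = np.random.default_rng(2).normal(size=(100, 256))   # N elements x F bins
aug = cutout_elements(x, start=50, width=50)            # crop half the array
print(np.count_nonzero(aug[50:, :]))   # masked half is all zeros
```

Both x and aug would then be fed to training, increasing tolerance to receiving angle.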

3.3. Environment Model and Data Sets Generation

3.3.1. MoELocNet

In this section, we present MoELocNet (Mixture of Experts Localization Network), a novel multi-task deep learning architecture that combines the ResNet18 backbone with a mixture of expert mechanisms [29,30] for acoustic source localization.
Figure 9 illustrates the framework and architecture of MoELocNet, with the network parameter settings detailed in Table 1. The framework includes a MoE stem layer (expert layers and gating layer), shared layers, and dual-branch task residual layers specifically designed for acoustic source localization tasks. The model employs a multi-task learning paradigm that jointly optimizes range estimation and depth estimation through a top-k expert selection mechanism and task-specific residual layer processing.
The proposed MoELocNet architecture integrates a specialized MoE stem layer with a ResNet-18 backbone, utilizing a multi-task learning paradigm to address the challenges of deep-sea acoustic source localization from range-frequency domain interference patterns. MoELocNet replaces the standard ResNet input layer with the expert layers of the MoE stem and incorporates multiple experts alongside a unified gating mechanism for adaptive weight learning. Each expert within the stem layer comprises a 7 × 7 convolutional layer with 64 output channels, batch normalization, ReLU activation, and 3 × 3 max pooling, and can process the different interference patterns inherent in deep-sea waveguides, such as multi-path effects, shadow zone characteristics, and direct arrival signals. Each expert has distinct convolutional parameters, enabling it to perceive and acquire different knowledge from various input features.
The gating mechanism implements a sophisticated top-k expert selection strategy where the input undergoes adaptive average pooling and flattening before being processed through two linear layers (3 → 128 → number of experts) with batch normalization and ReLU activation in between. This gating network computes relevance scores for all experts but employs a sparse activation scheme where only the k highest-scoring experts contribute to the final output while others are masked with negative infinity prior to softmax normalization. This top-k selection approach not only improves computational efficiency by reducing the number of active experts but also enhances model interpretability by explicitly identifying the most relevant expert knowledge for each input signal.
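The top-k masking and softmax described here can be sketched as follows (the scores and k are illustrative; in the actual model the gating network is learned):

```python
import numpy as np

# Top-k expert gating: scores for all experts are computed, non-top-k
# entries are masked with -inf before softmax, and only the surviving
# experts receive nonzero fusion weights.
def topk_gate(scores, k):
    masked = np.full_like(scores, -np.inf)
    idx = np.argsort(scores)[::-1][:k]       # indices of the k best experts
    masked[idx] = scores[idx]
    e = np.exp(masked - masked[idx].max())   # stable softmax over masked scores
    return e / e.sum()

scores = np.array([2.0, 0.5, 1.0])           # gating outputs for 3 experts
w = topk_gate(scores, k=2)
print(w)   # only experts 0 and 2 get nonzero weight
```

The expert outputs would then be fused as a weighted sum with these sparse weights.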
Following expert fusion through the weighted combination of top-k selected experts, the unified representations are processed through the first three of the four residual layers following the ResNet [31] architecture as shared layers. These shared layers capture common acoustic patterns beneficial for both range and depth localization tasks through residual connections and progressive feature abstraction, enabling the model to learn fundamental acoustic propagation principles that generalize across different localization objectives.
The architecture culminates in task-specific branches where each task maintains its own residual block, followed by adaptive average pooling and linear regression heads that output continuous values. The range and depth branches process the shared feature representations independently through their respective residual layers, enabling task-specific feature refinement while preserving the advantages of shared low-level acoustic feature extraction, ultimately facilitating robust localization performance in complex deep-sea acoustic environments.
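The expert stem described above can be sketched in PyTorch as follows (a minimal illustration; class and variable names are ours, not from the paper's code, and the input sizes are toy values):

```python
import torch
import torch.nn as nn

class ExpertStem(nn.Module):
    """One expert of the MoE stem: 7x7 conv (64 channels) -> BN -> ReLU -> 3x3 max pool,
    mirroring the ResNet input layer it replaces."""
    def __init__(self, in_ch: int = 3, out_ch: int = 64):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

# three experts process the same 3-channel range-frequency input in parallel
x = torch.randn(2, 3, 64, 128)                    # (B, 3, H, W), toy sizes
experts = nn.ModuleList(ExpertStem() for _ in range(3))
outputs = [e(x) for e in experts]                 # each (B, 64, H/4, W/4)
print(outputs[0].shape)                           # torch.Size([2, 64, 16, 32])
```

Each expert halves the spatial dimensions twice (strided convolution, then pooling), exactly as the standard ResNet stem does.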

3.3.2. Workflow of MoELocNet

This subsection details the workflow of MoELocNet for deep-sea acoustic source localization. The input is a 2D range-frequency domain feature $x \in \mathbb{R}^{B \times 1 \times H \times W}$, which is repeated to 3 channels as $x_{3ch} \in \mathbb{R}^{B \times 3 \times H \times W}$ to match the ResNet input requirements for acoustic signal processing.
The workflow begins with simultaneous processing of the 3-channel input by the 3 expert networks $E_1(\cdot), E_2(\cdot), E_3(\cdot)$ and the gating network, where each expert generates a specialized feature representation $r_i = E_i(x_{3ch}) \in \mathbb{R}^{B \times 64 \times H \times W}$ capturing distinct acoustic propagation characteristics. Concurrently, the gating mechanism applies adaptive average pooling and flattening to obtain compact gating features $s = \mathrm{Flatten}(\mathrm{AdaptiveAvgPool2d}(x_{3ch})) \in \mathbb{R}^{B \times 3}$, which are fed into the gating network $G(\cdot)$ to compute expert relevance scores $g = G(s) \in \mathbb{R}^{B \times 3}$. The top-2 expert selection mechanism then identifies the most relevant experts through
$$g_{top2},\ \mathrm{idx}_{top2} = \mathrm{TopK}(g, 2),$$
followed by the application of a selective mask
$$\mathrm{mask}_{i,j} = \begin{cases} g_{i,j}, & j \in \mathrm{idx}_{top2,i} \\ -\infty, & \text{otherwise,} \end{cases}$$
ensuring that only the 2 highest-scoring experts contribute to the final representation through the normalized weights $w = \mathrm{softmax}(\mathrm{mask}) \in \mathbb{R}^{B \times 3}$.
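The masking and normalization steps can be sketched as follows (a minimal illustration; the function name `top_k_gate` is ours, and the logits are hypothetical):

```python
import torch

def top_k_gate(scores: torch.Tensor, k: int = 2) -> torch.Tensor:
    """Keep the k highest-scoring experts, mask the rest with -inf,
    then softmax so masked experts receive weight exactly 0."""
    topk_vals, topk_idx = scores.topk(k, dim=-1)
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, topk_idx, topk_vals)   # non-selected entries stay -inf
    return torch.softmax(mask, dim=-1)

g = torch.tensor([[2.0, 0.5, 1.0]])          # hypothetical logits for 3 experts
w = top_k_gate(g, k=2)
print(w)                                      # expert 1 is masked out (weight 0)
```

Because exp(−∞) = 0, the softmax assigns exactly zero weight to the unselected expert, while the two selected weights sum to one.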
The expert fusion process stacks all expert outputs into a unified tensor
$$R = \mathrm{stack}([r_1, r_2, r_3]) \in \mathbb{R}^{B \times 3 \times 64 \times H \times W}$$
and computes the weighted combination
$$r_{fused} = \sum_{i=1}^{3} w_i \cdot r_i \in \mathbb{R}^{B \times 64 \times H \times W},$$
effectively integrating the specialized knowledge of the top-2 selected experts according to their computed relevance to the input acoustic signal. This fused representation is subsequently processed through the first three of the four ResNet residual layers, used as shared layers, to extract common acoustic features
$$f_{shared} = \mathrm{SharedLayers}(r_{fused}) \in \mathbb{R}^{B \times 256 \times H \times W}$$
that benefit both range and depth localization tasks. The shared feature extraction stage captures fundamental acoustic propagation patterns such as interference structures and frequency-domain characteristics that are inherent to underwater sound propagation in deep-sea environments.
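Given the stacked expert outputs and the gating weights, the weighted combination above reduces to a broadcasted sum (shapes follow the text; the values here are random placeholders):

```python
import torch

B, E, C, H, W = 2, 3, 64, 16, 32
R = torch.randn(B, E, C, H, W)                 # stacked expert outputs r_1..r_3
w = torch.softmax(torch.randn(B, E), dim=-1)   # gating weights (0 for unselected experts)

# r_fused[b] = sum_i w[b, i] * r_i[b]
r_fused = (w.view(B, E, 1, 1, 1) * R).sum(dim=1)
print(r_fused.shape)                           # torch.Size([2, 64, 16, 32])
```

The broadcast over the channel and spatial axes means a single scalar weight per expert and per sample scales the entire expert feature map.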
For task-specific processing, each localization task passes the shared features through its dedicated residual block and regression head. The range estimation branch computes
$$f_{range} = \mathrm{RangeLayer4}(f_{shared})$$
and
$$\hat{y}_r = \mathrm{RangeHead}(\mathrm{AdaptiveAvgPool2d}(f_{range})) \in \mathbb{R}^{B \times 1},$$
while the depth estimation branch computes
$$f_{depth} = \mathrm{DepthLayer4}(f_{shared})$$
and
$$\hat{y}_d = \mathrm{DepthHead}(\mathrm{AdaptiveAvgPool2d}(f_{depth})) \in \mathbb{R}^{B \times 1}.$$
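The dual-branch heads can be sketched as follows (a simplified stand-in: the paper's task branch is a ResNet residual layer, which we replace here with a single strided convolution for brevity):

```python
import torch
import torch.nn as nn

class TaskHead(nn.Module):
    """One task branch: a placeholder 'layer 4' block, global average pooling,
    and a linear regression head producing one continuous value per sample."""
    def __init__(self, in_ch: int = 256, mid_ch: int = 512):
        super().__init__()
        self.layer4 = nn.Sequential(           # stand-in for the task residual layer
            nn.Conv2d(in_ch, mid_ch, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(mid_ch, 1)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        return self.head(self.pool(self.layer4(f)).flatten(1))   # (B, 1)

f_shared = torch.randn(2, 256, 8, 16)          # shared features from the trunk
y_r = TaskHead()(f_shared)                     # range prediction
y_d = TaskHead()(f_shared)                     # depth prediction
print(y_r.shape, y_d.shape)
```

Both branches consume the same shared representation but carry independent parameters, which is what allows task-specific feature refinement.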
The model optimization employs a joint MTL transfer loss function that combines mean squared error losses from both localization tasks with homoskedastic uncertainty weighting as follows:
$$\mathrm{loss}(y_r, y_d, \hat{y}_r, \hat{y}_d; w) = \frac{1}{2\sigma_r^2}\|y_r - \hat{y}_r\|^2 + \frac{1}{2\sigma_d^2}\|y_d - \hat{y}_d\|^2 + \log \sigma_r \sigma_d = \frac{1}{2\sigma_r^2}L_r(w) + \frac{1}{2\sigma_d^2}L_d(w) + \log \sigma_r \sigma_d,$$
where $y_r$ and $y_d$ represent the ground truth range and depth values, $\hat{y}_r$ and $\hat{y}_d$ are the predicted range and depth values, $\sigma_r$ and $\sigma_d$ are learnable uncertainty parameters for the range and depth tasks, respectively, and $L_r(w)$ and $L_d(w)$ are the individual MSE losses for each task. The gradient of the joint loss is computed and all learnable parameters (including the uncertainty parameters $\sigma_r$ and $\sigma_d$) are updated through backpropagation. This unified expert weighting approach with homoskedastic uncertainty weighting enables the model to leverage complementary information from multiple experts while automatically balancing task importance based on prediction uncertainty, ultimately achieving robust regression-based localization across varying signal-to-noise ratios and complex acoustic propagation conditions in deep-sea waveguides.
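The joint loss above can be implemented by learning log σ rather than σ itself, a common choice for numerical stability (a sketch; the log-parameterization is our assumption, not stated in the paper):

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Joint MSE loss with learnable homoskedastic uncertainty weights.
    Note 1/(2*sigma^2) = 0.5*exp(-2*log_sigma) and
    log(sigma_r * sigma_d) = log_sigma_r + log_sigma_d."""
    def __init__(self):
        super().__init__()
        self.log_sigma_r = nn.Parameter(torch.zeros(()))
        self.log_sigma_d = nn.Parameter(torch.zeros(()))

    def forward(self, y_r, y_d, pred_r, pred_d):
        l_r = torch.mean((y_r - pred_r) ** 2)   # range MSE
        l_d = torch.mean((y_d - pred_d) ** 2)   # depth MSE
        return (0.5 * torch.exp(-2 * self.log_sigma_r) * l_r
                + 0.5 * torch.exp(-2 * self.log_sigma_d) * l_d
                + self.log_sigma_r + self.log_sigma_d)

criterion = UncertaintyWeightedLoss()
loss = criterion(torch.ones(4), torch.ones(4), torch.zeros(4), torch.zeros(4))
print(loss.item())   # with log_sigma = 0: 0.5*1 + 0.5*1 + 0 = 1.0
```

Because the uncertainty parameters are ordinary `nn.Parameter`s, they are updated by the same backpropagation step as the network weights, which is how the task balancing is learned automatically.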

3.3.3. Evaluation Criteria

The performance metrics are the mean absolute error (MAE)
$$E_{MAE} = \frac{1}{N}\sum_{i=1}^{N} |y_i - \hat{y}_i|$$
and the mean absolute percentage error (MAPE)
$$E_{MAPE} = \frac{1}{N}\sum_{i=1}^{N} \frac{|y_i - \hat{y}_i|}{y_i},$$
where $N$ is the number of test samples, $y_i$ denotes the ground truth depth or range, and $\hat{y}_i$ is the corresponding prediction. Both MAE and MAPE are used to quantify the range and depth estimation errors.
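Both metrics can be computed directly (a sketch with toy values, not the paper's data):

```python
import numpy as np

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_true - y_pred) / np.abs(y_true)))

ranges_true = np.array([10.0, 20.0, 40.0])    # km, toy ground truth
ranges_pred = np.array([10.1, 19.8, 40.0])    # km, toy predictions
print(mae(ranges_true, ranges_pred))          # ≈ 0.1 km
print(mape(ranges_true, ranges_pred) * 100)   # ≈ 0.667 %
```

MAPE divides each error by the true value, so it is scale-free and lets the range (km) and depth (m) errors be compared on a common footing.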

3.3.4. Training Parameter Settings

The critical hyperparameters governing convergence, generalization, and task-specific performance are configured in Table 2. The model iterates over the dataset for 200 epochs (balancing feature learning against overfitting risk), uses a batch size of 128 (trading off training efficiency and stability), employs the Adam optimizer (adaptive gradient updates via first- and second-moment estimation), sets a learning rate of $1 \times 10^{-3}$ (calibrated for stable convergence), and applies decoupled weight decay of $1 \times 10^{-4}$ (regularizing parameters to improve generalization without interfering with the optimizer's gradient estimates).
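In PyTorch, decoupled weight decay corresponds to the AdamW optimizer, so a training-loop skeleton under these settings might look like the following (the optimizer class is our assumption; the linear model is a trivial stand-in for MoELocNet):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                        # stand-in for MoELocNet
optimizer = torch.optim.AdamW(                  # Adam with decoupled weight decay
    model.parameters(), lr=1e-3, weight_decay=1e-4)

# one illustrative step of the 200-epoch, batch-size-128 loop
x, y = torch.randn(128, 10), torch.randn(128, 2)
loss = nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(optimizer.param_groups[0]["lr"])          # 0.001
```

Unlike plain L2 regularization folded into the gradient, AdamW applies the decay directly to the weights, which is what "avoiding optimizer-gradient interference" refers to.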

3.4. Dataset

The ocean environment is simulated with Bellhop, and the sound speed profile is shown in Figure 5b. An absolutely soft (pressure-release) sea surface and a fluid half-space basement are applied. The sound source frequency ranges from 200 Hz to 300 Hz with an interval of 1 Hz. The environmental parameters are listed in Table 3: the source depth interval is 0.25 m, the range interval is 0.2 km, the basement sound speed is 1600 m/s, and the attenuation coefficient is 0.2 dB/λ. The towed horizontal array has 100 elements with an element spacing of 0.5 m. After the dataset is generated according to the parameters in the table, it is split by random sampling into a training set (80%) and a test set (20%).
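The 80/20 random split can be reproduced with `torch.utils.data.random_split` (toy tensors stand in for the simulated array data; the sizes below are illustrative, not the paper's actual dataset dimensions):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# stand-in dataset: range-frequency maps and (range, depth) labels
features = torch.randn(1000, 1, 101, 100)   # 101 freq bins (200-300 Hz), 100 elements
labels = torch.rand(1000, 2)
dataset = TensorDataset(features, labels)

n_train = int(0.8 * len(dataset))
train_set, test_set = random_split(
    dataset, [n_train, len(dataset) - n_train],
    generator=torch.Generator().manual_seed(42))
print(len(train_set), len(test_set))        # 800 200
```

Fixing the generator seed makes the split reproducible across runs, which matters when comparing models trained on the same partition.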

4. Results

4.1. Training and Test Results

The MAE (MAPE) curves of range and depth over training are shown in Figure 10. In each epoch, the error over the entire test set is calculated, and the MAE is obtained by averaging over the number of samples; the MAPE is then computed relative to the ground-truth range and depth values. The model stabilizes after about 115 epochs. The final range MAE is 0.029 km and the depth MAE is 0.072 m, while the range MAPE is 0.297% and the depth MAPE is 0.267%.
A combined error over different depths and ranges is defined as follows; for simplicity, the two summands are treated as dimensionless:
$$\mathrm{Error} = \frac{1}{2}\left(\mathrm{MAE}_{Range} + \mathrm{MAE}_{Depth}\right)$$
As shown in Figure 11, in most cases, the combined error is quite low, hovering around 0.04. Figure 11a visually represents the error as a function of distance and depth, with predominantly low-error regions indicated by the darker shades in the color scale. Figure 11b further quantifies this, showing that the majority of error values are less than 0.2.
Based on the established simulation environment and the relationship between receiving range, depth, and interference period derived in Section 2, the localization results of the traditional algorithm can be obtained by exploiting the interference period. However, owing to the small aperture of the towed horizontal array, the traditional algorithm exhibits poor depth resolution; therefore, only range localization accuracy is compared here. The interference period is extracted with the 2D-FFT method described earlier in this paper. The resulting error comparison between the traditional algorithm and the neural network is depicted in Figure 12. The traditional algorithm's MAE within 50 km is 10.9 km with a MAPE of 35.95%, far larger than the results achieved by MoELocNet. Moreover, as the range increases, the localization error of the traditional algorithm keeps growing, whereas the neural network's localization remains comparatively accurate.

4.2. Comparison Experiments Between Different Networks

To rigorously assess the performance of the MoELocNet architecture, we conducted a comparative analysis against ResNext, SwinTransformer, and VisionTransformer. Two key metrics, mean absolute error (MAE) and mean absolute percentage error (MAPE), were employed to evaluate localization accuracy for both the range (unit: km) and depth (unit: m) tasks, enabling a comprehensive comparison of model robustness. All models were trained under identical conditions with the same dataset and hyperparameters, ensuring a fair comparison. The test results of the different models are shown in Figure 13; MoELocNet performs best on both the depth and range tasks.
For the MAE metric, which quantifies the average magnitude of absolute deviations between predictions and ground truth, MoELocNet achieved the lowest range MAE of 0.029 km (blue bars in Figure 13a). In comparison, ResNext showed a range MAE of 0.041 km (41.4% higher error), SwinTransformer recorded 0.0545 km (87.9% higher error), and VisionTransformer reached 0.034 km (17.2% higher error). For depth localization (orange bars in Figure 13a), MoELocNet yielded a depth MAE of 0.072 m, while SwinTransformer had 0.188 m (161.1% higher error) and VisionTransformer had 0.075 m (4.2% higher error). Statistically, MoELocNet demonstrates superior precision in minimizing absolute errors.
For the MAPE metric, which normalizes errors to account for scale differences and provides a relative performance measure, MoELocNet achieved a range MAPE of 0.297% (blue bars in Figure 13b); ResNext showed 0.5% (68.3% higher error), SwinTransformer 0.328% (10.4% higher error), and VisionTransformer 0.3576% (20.4% higher error). For depth localization (orange bars in Figure 13b), MoELocNet achieved a MAPE of 0.263%; ResNext showed 0.6% (128.1% higher error), SwinTransformer 0.29% (10.2% higher error), and VisionTransformer 0.274% (4.2% higher error). The balanced and low MAPE of MoELocNet (≤0.3% for both tasks) indicates consistent performance across the range and depth scales.
The superiority of MoELocNet can be attributed to its multi-expert architecture. This design allows for the adaptive extraction of interference features from single-frame data. In contrast to ResNet18, which uses simpler residual blocks, and SwinTransformer, which focuses on hierarchical aggregation, MoELocNet leverages specialized experts to resolve complex interference patterns in deep-sea waveguides. This architectural advantage enables MoELocNet to minimize both absolute (MAE) and relative (MAPE) errors, ensuring robust performance across diverse oceanic conditions. Practically, the 0.029 km range error and 0.072 m depth error of MoELocNet meet the precision requirements of deep-sea localization and outperform baseline models in both accuracy and consistency. These results validate the effectiveness of integrating multi-expert learning into residual networks for interference structure extraction, confirming that MoELocNet is the best-performing model in this comparative analysis.
To address the concern regarding model interpretability, we further visualize the expert activation patterns within the MoELocNet framework using an expert activation map. The map demonstrates the activation intensity of each expert across different interference patterns, revealing which experts are prioritized for specific signal characteristics. By quantifying and visualizing expert contributions, we aim to shed light on the model's decision-making logic and thereby enhance trust in its predictions.
The detailed activation patterns are presented in Figure 14. The categories in the map are defined based on depth and range as follows: depth is divided into ‘Shallow’ (less than 50 m) and ‘Deep’ (50 m or more); range is categorized as ‘Near’ (1 km–15 km), ‘Mid’ (15 km–35 km), and ‘Far’ (35 km–50 km). The map shows the activation strength of each expert (with Expert ID 0, 1, and 2) for different depth–range combinations.
As depicted in Figure 14, expert 1 shows consistent yet moderate activation across all scenarios, acting as a complementary specialist for transitional acoustic conditions. Regarding physical consistency, the expert selection patterns agree with underwater acoustic theory: far-field sources require specialized processing due to their complex multi-path structures and frequency-dependent attenuation, while near-field sources benefit from direct-path emphasis and early reflection analysis. The fact that expert selection is independent of depth implies that, in our acoustic environment, range-dependent effects dominate over depth-dependent effects.
In terms of real-time interpretability implementation, our training framework includes comprehensive interpretability features. These features consist of real-time monitoring of expert activation during the inference process and systematic logging of expert selection patterns throughout all training epochs. For robustness analysis, the distinct boundaries of expert activation, where the activation strengths of dominant experts exceed 0.8, demonstrate the robustness of decision making. Meanwhile, the moderate activations in transitional scenarios indicate proper handling of uncertainty. This interpretability is of great importance for building operators’ trust in autonomous underwater acoustic localization systems.
To further validate the efficacy of each component within MoELocNet, we conducted systematic ablation experiments on different routers, loss types, and the multi-expert model; the results are shown in Table 4 (where Conv denotes the convolutional stem). We compared the simulation results under different routers, under different loss types, and with or without the MoE stem, respectively.
Firstly, concerning the impact of the expert architecture, we compare the MoE stem structure with the standard convolutional stem. It is evident that the MoE approach outperforms the convolutional stem. Specifically, the MAE-R is reduced from 0.037 km to 0.029 km, achieving a 21.6% improvement, and the MAE-D is improved from 0.077 m to 0.072 m, with a 6.5% enhancement.
Secondly, we assess the contribution of the multi-task loss. The single-task-loss experiments, including range-only and depth-only experiments, exhibit degraded performance when compared to the multi-task-loss learning. This validates the effectiveness of our joint optimization strategy.
Finally, we analyze the router mode by comparing the learned gating and fixed expert selection. The learned gating consistently demonstrates superior performance over the fixed selection, with MAE-R values of 0.029 km and 0.042 km, respectively.
In summary, these systematic ablation results indicate that the performance improvement is not solely due to the increased model complexity. Instead, it may be attributed to the MoE architecture’s capability to learn specialized representations for different acoustic scenarios.

4.3. Direction of Arrival Test

The test dataset construction follows a systematic design to simulate real-world deep-sea waveguide conditions. Targets are randomly positioned across a depth range of 6–67 m and a horizontal range of 1–50 km, covering typical deep-sea propagation scenarios. To account for directional variability, signals are simulated for incident angles spanning 0°–45° relative to the horizontal array (100 random sources per angle, yielding 4500 test samples). Acoustic propagation modeling is performed with Bellhop; the sound speed profile is shown in Figure 5b and the sediment properties are listed in Table 3. Test signals are then processed with the MoELocNet model trained in Section 4.1.
Under these conditions, the range error is 0.1919 km and the depth error 0.4649 m. As can be seen from Figure 15, the error differences across angles are not significant. The error varies greatly at short ranges because the approximate formula for the arriving sound pressure no longer satisfies the phase condition there; in this case, the data augmentation method cannot be used, even approximately, for synchronous training. Within the shadow zone, the overall error is small.

4.4. SNR

In this section, the influence of the signal-to-noise ratio (SNR) of the received signal on the model output is considered. When the SNR is low, the received signal is weak and the sound intensity interference structure appears blurred. The definition of the SNR is given in Equation (22), and Figure 16 presents input signals at different SNRs. The model trained on the noiseless training set is applied to test sets with SNRs ranging from −10 dB to 50 dB.
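Assuming the standard power-ratio definition of SNR in dB (Equation (22) itself is not reproduced in this section, so this is our assumption), noise can be added to a clean signal at a target SNR as follows:

```python
import numpy as np

def add_noise_at_snr(signal: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) = snr_db."""
    rng = np.random.default_rng() if rng is None else rng
    p_signal = np.mean(np.abs(signal) ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise

sig = np.sin(2 * np.pi * 0.05 * np.arange(10_000))
noisy = add_noise_at_snr(sig, snr_db=0.0, rng=np.random.default_rng(0))
realized = 10 * np.log10(np.mean(sig**2) / np.mean((noisy - sig)**2))
print(realized)   # close to 0 dB for a long signal
```

The same routine, called with a random `snr_db` drawn from [−10, 0], also covers the noisy-training augmentation described later in this section.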
Figure 17a illustrates the MAE for range localization against the SNR, while Figure 17b shows the corresponding MAE for depth localization. Evidently, range localization exhibits relatively low sensitivity to SNR variations and performs satisfactorily across all tests. When the SNR exceeds 5 dB, the range MAE gradually diminishes, dropping below 0.15 km once the SNR reaches 20 dB. Conversely, for depth localization, the MAE is notably high at low SNR levels; a sharp decline occurs around −5 dB, and once the SNR surpasses 0 dB, the depth MAE stabilizes.
On this basis, training on a noisy training set is added: during the first 10 epochs, noise with random SNR values between −10 dB and 0 dB is added to the training set. The performance is then tested on test sets with different SNRs, with results shown in Figure 17a,b. Compared to the noiseless condition, the MAE for both range and depth localization decreases. Once the SNR exceeds 20 dB, the MAE in both scenarios remains essentially constant as the SNR increases further.

4.5. Marine Environmental Adaptability

Despite the model’s favorable performance with simulation data, the simulated towed horizontal array data are overly idealized and fail to account for the real ocean environment, including factors such as sea depth, variations in sound velocity profiles, changes in internal wave environments, and even array tilt effects caused by the ocean environment. Therefore, in this section, the trained model will be used to conduct tests in three distinct environments as illustrated in Table 5.
Figure 18a depicts the sound speed profiles (SSPs) utilized in different test environments. The SSPs of the Indian Ocean and Munk are both typical deep-sea sound speed profiles, with slight differences in sea depth, while the Gulf SSP represents a typical double-channel axis sound speed. The micro-disturbance in this context is characterized by a mechanical model, which manifests as minor variations in the receiving range and depth for each receiver. These variations can be attributed to vehicle movement or ocean fluctuations. Here, we adopted an array tilt structure as illustrated in Figure 18b. The aforementioned receiving data are all generated using Bellhop, and parameters that are not explicitly mentioned, such as the seabed parameters, remain consistent with those used in Table 3.
Figure 19 presents a comparative analysis of target localization outcomes across various oceanic environment tests. Specifically, Figure 19a–c illustrates the range localization results, while Figure 19d–f depicts the depth localization results. As observed from the figures, when the SSP types are analogous, the localization results remain relatively accurate, with minor errors in depth localization but consistently high efficiency in range localization. Conversely, when there are substantial variations in the SSP, as exemplified in Figure 19b,e, the dual-channel axis in the Gulf SSP amplifies the influence of the sound source depth. The acoustic field variations induced by slight changes in the sound source are considerably more pronounced compared to the other two SSPs, and this effect escalates with increasing range. Additionally, as indicated in Figure 19c,f, slight array disturbances exert minimal impact on the overall localization performance. These findings are further corroborated by the MAE and MAPE results presented in Table 6.

4.6. Long-Range Localization

After being trained and tested on the deep-sea shadow zone datasets, the MoELocNet model is trained and tested in a long-range scenario. The marine environment used to generate the data is shown in Table 7. As in the deep-sea shadow zone test set, the sound speed profile is that of Figure 5b. In the test environment, the sound source depth ranges from 1 m to 200 m with an interval of 0.25 m, and the sound source range from 1 km to 200 km with an interval of 0.2 km. The receiving depth is fixed at 100 m. Figure 20 shows the two-dimensional transmission loss and the division of the sound field regions. Within the 200 km receiving range, there are three convergence zones (centered at 63.8 km, 127.6 km, and 191.4 km, respectively), three shadow zones, and one direct arrival zone.
As shown in Figure 21, the model performs well in this long-range scenario. The final range MAE is 0.0609 km and the depth MAE is 0.074 m, while the range MAPE is 0.169% and the depth MAPE is 0.297%. The figure shows that the network as a whole trains effectively in this setting.
As shown in Figure 22, there are obvious bright spots at the convergence zone ranges (i.e., 63.8 km, 127.6 km, and 191.4 km), meaning that the majority of the larger errors occur when the sound source is located in a convergence zone. In this relatively complex region, the sound ray convergence effect is significant: a large number of sound rays interact with one another, making it impossible to obtain the actual interference period or structure. Moreover, the received signal energy in the convergence zone is quite strong, which smooths out the energy differences across ranges and frequencies and fails to provide the corresponding characteristic quantities. The model performs well in the first and second shadow zones. In the direct arrival zone, the error is still relatively large due to the short range. Through this test, the network structure originally designed for the deep-sea shadow zone is extended to other deep-sea regions.

5. Discussion

This research leverages the interference effects in the range-frequency domain of array-received signals to estimate target range and depth, addressing a critical challenge in underwater acoustics. Traditional localization methods, which rely on establishing relationships between target range, transceiver depth, and waveguide characteristics for correlation searches, are inherently constrained by dynamic ocean environments and target geometry. For example, beamforming with horizontal arrays has been widely used to exploit interference patterns [10], but its performance deteriorates significantly in low signal-to-noise ratio (SNR) conditions or time-varying environments, where interference fringes become indistinct.
The proposed method overcomes these limitations through two key innovations. First, by analyzing the received signals from individual array elements, it mitigates the impact of time-varying environmental factors, offering a more stable foundation for feature extraction. Second, the integration of MoELocNet addresses the critical challenge of extracting interference periods from weak signals. Unlike conventional algorithms, which struggle to identify periodic patterns under noisy conditions, the multi-expert and multi-task design of the neural network enables it to robustly capture subtle interference features. This aligns with emerging trends in machine learning for signal processing, where deep neural networks have demonstrated superior capabilities in handling complex, nonlinear relationships within acoustic data.
The implications of these findings extend beyond mere methodological improvement. By decoupling interference analysis from traditional beamforming, the proposed approach significantly enhances localization reliability, particularly in deep-sea shadow zones where acoustic energy is sparse and conventional methods often fail. Moreover, MoELocNet's ability to learn implicit associations between interference structures and target parameters bridges the gap between physical models and data-driven approaches. This synergy not only improves localization accuracy but also paves the way for data-efficient solutions in environments where precise waveguide modeling is difficult, such as during oceanic frontal passages or in regions with complex seafloor topographies.
Regarding the limitations of MoELocNet, the model is primarily tailored to the deep-sea shadow zone structure. When extending its application to other localization ranges, we find that the localization performance remains favorable in multiple shadow zones, but there are certain errors in the direct arrival zone and significant errors in the convergence zone. Combined with the theoretical derivation of the interference structure in the previous section, we conclude that the model performs optimally in regions where the interference effect is sufficiently pronounced. In the convergence zone, the excessive number of arrival structures and the strong convergence of sound intensity make it difficult to extract the interference structure. In the direct arrival zone, when receiving at a large depth, the time-delay difference between the sea surface reflection and the direct arrival is typically sufficient to yield a good interference structure; in this paper, however, the receivers form a towed array slightly beneath the ocean surface, so the arrival delays are too close together to form a good interference structure. Extending the model to this zone could rely on measurement data from a seabed horizontal array.
Nevertheless, there are still several aspects of the current framework that warrant further exploration. The model does not account for the potential impact of signal disorder caused by marine phenomena such as internal waves. In real-world deployments, internal waves can distort interference patterns and introduce systematic errors in target localization. Incorporating mechanisms to mitigate the effects of internal wave-induced signal disorder into the neural network architecture would enhance its practical applicability. Additionally, the study does not fully encompass heterogeneous ocean environments, including underwater mountains, sloping seafloors, or dynamic processes like internal waves, all of which can further distort interference patterns. Expanding the training dataset to encompass these diverse scenarios is crucial for improving the model’s versatility. Finally, while the current approach focuses on passive sonar applications, exploring its integration with active sonar systems presents a promising avenue. By leveraging the known spectral characteristics of transmitted signals, the method could potentially achieve even higher localization precision, opening new possibilities for joint estimation of target parameters and environmental characteristics.
In summary, this research demonstrates the potential of combining array signal analysis with deep neural networks for underwater source localization, providing a more robust alternative to traditional methods in challenging environments. Future research should concentrate on refining the model to accommodate complex geometries and environmental conditions, as well as exploring its compatibility with active sonar modalities, to fully realize its practical potential in oceanic applications.

6. Conclusions

This research investigates how the interference structure of received signals in the range-frequency domain within a deep-sea waveguide correlates with target depth and range. The features of the interference period contained in the signals collected by the towed horizontal line array are then extracted with a deep learning-based method to localize the sound source. For this purpose, a multi-expert coupling model based on ResNet is proposed. A multi-task and multi-expert training method is adopted to localize the target, and the approach is validated through rigorous simulation experiments.
(a)
Model Design
The MoELocNet model overcomes the representational limitations of single networks for complex marine acoustic signals through a collaborative architecture combining a ResNet backbone, multi-expert fine-grained feature mining, and strongly correlated multi-task learning. This makes it better suited to sound source localization in the time-varying, interference-rich environments of deep-sea waveguides and lays a precise feature foundation for subsequent localization inference.
(b)
Experimental Validation
MoELocNet's performance was validated on a horizontal array reception dataset constructed with sound sources at varying depths and ranges in the deep-sea shadow zone. Comparisons with baseline models such as SwinTransformer and ResNet18 demonstrate significant improvements in localization accuracy for both depth and range. Contrasting the localization results of signals with different signal-to-noise ratios (SNRs) shows the model's noise resistance. Further evaluation on long-range localization datasets shows that the model achieves optimal performance in the shadow zone, secondary effectiveness in the direct arrival zone, and is generally inapplicable in the convergence zone.
Overall, this research provides valuable insights for underwater source localization with a horizontal line array. The MoELocNet framework proves efficient and practical, demonstrating significant potential in the shadow zone. Future work may focus on enhancing the MoELocNet architecture to improve its applicability in the direct arrival zone and its robustness across diverse ocean waveguides.

Author Contributions

Conceptualization, Z.H. and P.Q.; methodology, Z.H., P.Q. and Y.D.; software, Z.H., P.Q. and Y.D.; validation, Z.H. and P.Q.; formal analysis, Z.H. and P.Q.; investigation, none specified; resources, Z.L. and P.X.; data curation, none specified; writing—original draft preparation, Z.H.; writing—review and editing, Z.H., Y.D., P.Q., Z.L. and P.X.; visualization, none specified; supervision, Z.L.; project administration, Z.L.; funding acquisition, none specified. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers 12374428 and 12474446; the Guangdong Basic and Applied Basic Research Foundation, grant number 2024A1515030149; the Science and Technology on Sonar Laboratory (2024-JCJQ-LB-32/08); and the Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai) (SML2024SP006).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Gong, Z.; Tran, D.D.; Ratilal, P. Comparing passive source localization and tracking approaches with a towed horizontal receiver array in an ocean waveguide. J. Acoust. Soc. Am. 2013, 134, 3705–3720. [Google Scholar] [CrossRef]
  2. Heaney, K.D.; Campbell, R.L.; Murray, J.J.; Baggeroer, A.B.; Scheer, E.K.; Stephen, R.A.; Mercer, J.A. Deep water towed array measurements at close range. J. Acoust. Soc. Am. 2013, 134, 3230–3241. [Google Scholar] [CrossRef] [PubMed]
  3. D’Spain, G.L.; Kuperman, W.A. Application of waveguide invariants to analysis of spectrograms from shallow water environments that vary in range and azimuth. J. Acoust. Soc. Am. 1999, 106, 2454–2468. [Google Scholar] [CrossRef]
  4. Chuprov, S.D. Interference structure of a sound field in a layered ocean. In Ocean Acoustics. Current State; Brekhovskikh, L.M., Andreeva, I.B., Eds.; Nauka: Moscow, Russia, 1982; pp. 71–91. [Google Scholar]
  5. Brekhovskikh, L.M.; Lysanov, Y.P. Fundamentals of Ocean Acoustics, 3rd ed.; Springer: New York, NY, USA, 2002. [Google Scholar]
  6. Emmetière, R.; Bonnel, J.; Géhant, M.; Cristol, X.; Chonavel, T. Understanding Deep-Water Striation Patterns and Predicting the Waveguide Invariant as a Distribution Depending on Range and Depth. J. Acoust. Soc. Am. 2018, 143, 3444–3454. [Google Scholar] [CrossRef] [PubMed]
  7. Harrison, C.H. The Relation between the Waveguide Invariant, Multipath Impulse Response, and Ray Cycles. J. Acoust. Soc. Am. 2011, 129, 2863–2877. [Google Scholar] [CrossRef]
  8. Yang, T.C. Beam Intensity Striations and Applications. J. Acoust. Soc. Am. 2003, 113, 1342–1352. [Google Scholar] [CrossRef]
  9. Zurk, L.M.; Ou, H.H.; Campbell, R.L. Effect of the Target Scattering Function on Active Waveguide Invariance Striations. In Proceedings of the OCEANS 2010 MTS/IEEE SEATTLE, Seattle, WA, USA, 20–23 September 2010; pp. 1–5. [Google Scholar] [CrossRef]
  10. Turgut, A.; Orr, M.; Rouseff, D. Broadband source localization using horizontal-beam acoustic intensity striations. J. Acoust. Soc. Am. 2010, 127, 73–83. [Google Scholar] [CrossRef]
  11. Rouseff, D. Modeling the Waveguide Invariant as a Distribution. AIP Conf. Proc. 2002, 621, 137–150. [Google Scholar] [CrossRef]
  12. Rouseff, D.; Zurk, L.M. Striation-Based Beamforming for Estimating the Waveguide Invariant with Passive Sonar. J. Acoust. Soc. Am. 2011, 130, EL76–EL81. [Google Scholar] [CrossRef]
  13. Cockrell, K.L.; Schmidt, H. Robust Passive Range Estimation Using the Waveguide Invariant. J. Acoust. Soc. Am. 2010, 127, 2780–2789. [Google Scholar] [CrossRef]
  14. Quijano, J.E.; Zurk, L.M.; Rouseff, D. Demonstration of the Invariance Principle for Active Sonar. J. Acoust. Soc. Am. 2008, 123, 1329–1337. [Google Scholar] [CrossRef]
  15. Rouseff, D.; Leigh, C.V. Using the waveguide invariant to analyze Lofargrams. In Proceedings of the OCEANS ’02 MTS/IEEE, Biloxi, MS, USA, 29–31 October 2002; Volume 4, pp. 2239–2243. [Google Scholar] [CrossRef]
  16. Liu, W.; Yang, Y.; Xu, M.; Lü, L.; Liu, Z.; Shi, Y. Source Localization in the Deep Ocean Using a Convolutional Neural Network. J. Acoust. Soc. Am. 2020, 147, EL314–EL319. [Google Scholar] [CrossRef]
  17. Zhou, X.; Yang, K.; Duan, R. Deep Learning Based on Striation Images for Underwater and Surface Target Classification. IEEE Signal Process. Lett. 2019, 26, 1378–1382. [Google Scholar] [CrossRef]
  18. Ferguson, E.L.; Williams, S.B.; Jin, C.T. Sound Source Localization in a Multipath Environment Using Convolutional Neural Networks. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, USA, 15–20 April 2018; pp. 2386–2390. [Google Scholar] [CrossRef]
  19. Neupane, D.; Seok, J. A review on deep learning-based approaches for automatic sonar target recognition. Electronics 2020, 9, 1972. [Google Scholar] [CrossRef]
  20. Yao, Q.; Wang, Y.; Yang, Y. Underwater acoustic target recognition based on data augmentation and residual CNN. Electronics 2023, 12, 1206. [Google Scholar] [CrossRef]
  21. Liu, Y.; Niu, H.; Li, Z. A Multi-Task Learning Convolutional Neural Network for Source Localization in Deep Ocean. J. Acoust. Soc. Am. 2020, 148, 873–883. [Google Scholar] [CrossRef]
  22. Niu, H.; Gong, Z.; Ozanich, E.; Gerstoft, P.; Wang, H.; Li, Z. Deep-Learning Source Localization Using Multi-Frequency Magnitude-Only Data. J. Acoust. Soc. Am. 2019, 146, 211–222. [Google Scholar] [CrossRef]
  23. Qian, P.; Gan, W.; Niu, H.; Ji, G.; Li, Z.; Li, G. A Feature-Compressed Multi-Task Learning U-net for Shallow-Water Source Localization in the Presence of Internal Waves. Appl. Acoust. 2023, 211, 109530. [Google Scholar] [CrossRef]
  24. Rouseff, D. Effect of shallow water internal waves on ocean acoustic striation patterns. Waves Random Media 2001, 11, 377. [Google Scholar] [CrossRef]
  25. Hou, Z.; Tang, D.; Liu, J.; Li, Z.; Xiao, P. Distribution and influencing factors of acoustic characteristics of seafloor sediment in the Sunda Shelf. J. Oceanol. Limnol. 2024, 42, 1486–1492. [Google Scholar] [CrossRef]
  26. Jensen, F.B.; Kuperman, W.A.; Porter, M.B.; Schmidt, H. Computational Ocean Acoustics (Chapter 3); Springer: Berlin/Heidelberg, Germany, 1994. [Google Scholar]
  27. Song, H.C.; Cho, C. The relation between the waveguide invariant and array invariant. J. Acoust. Soc. Am. 2015, 138, 899–903. [Google Scholar] [CrossRef]
  28. Lee, S.; Makris, N.C. The array invariant. J. Acoust. Soc. Am. 2006, 119, 336–351. [Google Scholar] [CrossRef] [PubMed]
  29. Xie, Y.; Ren, J.; Li, J.; Xu, J. Advancing robust underwater acoustic target recognition through multitask learning and multi-gate mixture of experts. J. Acoust. Soc. Am. 2024, 156, 244–255. [Google Scholar] [CrossRef] [PubMed]
  30. Zhang, L.; Zhang, S.; Zhang, X.; Zhao, Y. A Multimodal Artificial Intelligence Model for Depression Severity Detection Based on Audio and Video Signals. Electronics 2025, 14, 1464. [Google Scholar] [CrossRef]
  31. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
Figure 1. Simulation environment. (a) Spatial frequency striation in deep ocean. (b) Sound speed profile.
Figure 3. Experiment setup scheme. (a) Experiment setup. (b) Sound speed profile.
Figure 4. Range-frequency domain interference striation observed in the experiment. The emission depth is 110 m, and the receiving depths are, respectively, (a) 50 m, (b) 100 m, (c) 500 m, and (d) 700 m.
Figure 5. Simulation environment. (a) Ocean waveguide and setup. The signal is collected by a horizontal line array towed by an AUV, so the receiving depth can vary. (b) Sound speed profile: a typical Munk profile.
Figure 6. Striation versus range when the source lies at different depths and ranges. (a) Depth: 6 m; range: 1 km. (b) Depth: 6 m; range: 10 km. (c) Depth: 6 m; range: 20 km. (d) Depth: 6 m; range: 40 km. (e) Depth: 50 m; range: 1 km. (f) Depth: 50 m; range: 10 km. (g) Depth: 50 m; range: 20 km. (h) Depth: 50 m; range: 40 km.
Figure 7. Striation cycle comparison between prediction and simulation, z_s = 50 m, z_r = 100 m.
Figure 8. Data augmentation through cutout. The cutout data space is filled with 0. (a) Original input data. (b) Data augmentation result.
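The cutout augmentation shown in Figure 8 zero-fills a rectangular patch of the input striation map. A minimal sketch, assuming the map is a 2D array; the function and parameter names are illustrative, not the authors' implementation.

```python
import numpy as np

def cutout(striation_map, patch_h, patch_w, rng=None):
    """Zero-fill one randomly placed rectangular patch (cutout augmentation)."""
    rng = np.random.default_rng() if rng is None else rng
    out = striation_map.copy()
    h, w = out.shape
    top = int(rng.integers(0, h - patch_h + 1))
    left = int(rng.integers(0, w - patch_w + 1))
    out[top:top + patch_h, left:left + patch_w] = 0.0  # cutout region filled with 0
    return out

x = np.ones((64, 64))                                  # stand-in for a striation map
aug = cutout(x, 16, 16, rng=np.random.default_rng(0))  # one 16x16 patch zeroed
```

Zeroing a patch forces the network to localize from partial striation evidence, which is consistent with the noise-robustness goal discussed in the evaluation.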
Figure 9. An overview of the structure of (a) MoELocNet model. (b) Detailed architecture components.
Figure 10. Training and testing results. (a) Range MAE curve. (b) Range MAPE curve. (c) Range comparison. (d) Depth MAE curve. (e) Depth MAPE curve. (f) Depth comparison.
Figure 11. Combined error versus range and depth. (a) The two-dimensional combined error heatmap. (b) Combined error distribution histogram.
Figure 12. Source localization with interference striation when the interference interval is extracted by 2D-DFT: The black solid line represents ground truth, red cross marks represent results acquired by traditional sonar methods, and yellow circle marks represent results acquired by MoELocNet.
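The traditional extraction referenced in Figure 12 reads the interference interval off a 2D-DFT of the range-frequency map: the dominant spatial-frequency bin along the range axis is converted back to a period. A sketch on a synthetic striation map with a known period; all values below are illustrative, not the paper's data.

```python
import numpy as np

# Synthetic range-frequency intensity with a known striation period along range
ranges = np.linspace(0.0, 10.0, 256)               # km
freqs = np.linspace(100.0, 200.0, 128)             # Hz
R, F = np.meshgrid(ranges, freqs, indexing="ij")
period_km = 2.0                                     # true interference period
intensity = np.cos(2.0 * np.pi * R / period_km)

# 2D-DFT: locate the dominant non-DC bin, then map it back to a period
spec = np.abs(np.fft.fft2(intensity))
spec[0, 0] = 0.0                                    # discard the DC component
i, j = np.unravel_index(np.argmax(spec), spec.shape)
k = i if i <= intensity.shape[0] // 2 else intensity.shape[0] - i
dr = ranges[1] - ranges[0]
estimated_period = (intensity.shape[0] * dr) / k    # ≈ 2.0 km
```

On measured data the peak is broadened by noise and by the range dependence of the waveguide invariant, which is the fragility MoELocNet is designed to avoid.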
Figure 13. Comparison of training results of different visual neural network models. (a) MAE of the range and depth outputs; the blue bars represent range (unit: km) and the red bars represent depth (unit: m). (b) MAPE of the range and depth outputs, presented as percentages.
Figure 14. Statistical chart of expert activation.
Figure 15. MAE for different direction-of-arrival tests. (a) Two-dimensional combined error. (b) MAE versus direction of arrival, where the blue line refers to range (units: km) and the orange line refers to depth (units: m).
Figure 16. Input data with different signal-to-noise ratios. (a) SNR: −10 dB. (b) SNR: −5 dB. (c) SNR: 0 dB. (d) SNR: 5 dB.
Figure 17. Performance metrics versus SNR. (a) MAE for the range estimation by MoELocNet. (b) MAE for the depth estimation by MoELocNet.
Figure 18. Environment setup for model evaluation. (a) Sound speed profile for different environments. (b) Schematic diagram of array tilt, with the geometric center of the array as the coordinate origin.
Figure 19. Testing results for different environments. (a) Range comparison for environment type 1. (b) Range comparison for environment type 2. (c) Range comparison for environment type 3. (d) Depth comparison for environment type 1. (e) Depth comparison for environment type 2. (f) Depth comparison for environment type 3.
Figure 20. The two-dimensional transmission loss and the division of the sound field regions. (a) The direct arrival zone. (b1–b3) The shadow zone. (c1–c3) The convergence zone. The red dashed line marks the range where the convergence zone lies.
Figure 21. Training and testing results. (a) Range MAE curve. (b) Range MAPE curve. (c) Range comparison. (d) Depth MAE curve. (e) Depth MAPE curve. (f) Depth comparison.
Figure 22. Combined error versus range and depth. (a) The two-dimensional combined error heatmap. (b) Combined error distribution histogram.
Table 1. Detailed network architectures for MoELocNet.
Module | Detailed Network Architecture
Expert layer | Conv2d(in_channels = 3, out_channels = 64, kernel_size = 7, stride = 2, padding = 3); BatchNorm2d(num_features = 64); ReLU(); MaxPool2d(kernel_size = 3, stride = 2, padding = 1)
Gating layer | Linear(in_features = 3, out_features = 128); BatchNorm1d(num_features = 128); ReLU(); Linear(in_features = 128, out_features = expert_num)
Basic block 1 (in_dim, out_dim) | Conv2d(in_dim, out_dim, kernel_size = 3, padding = 1); BatchNorm2d(out_dim); ReLU(); Conv2d(out_dim, out_dim, kernel_size = 3, padding = 1); BatchNorm2d(out_dim)
Basic block 2 (in_dim, out_dim) | Conv2d(in_dim, out_dim, kernel_size = 3, padding = 1); BatchNorm2d(out_dim); ReLU(); Conv2d(out_dim, out_dim, kernel_size = 3, padding = 1); BatchNorm2d(out_dim); downsample: Sequential(Conv2d(in_dim, out_dim, kernel_size = 1, stride = 2); BatchNorm2d(out_dim))
Shared layer | Basic block 1 (64, 64); basic block 1 (64, 64); basic block 2 (64, 128); basic block 1 (128, 128); basic block 2 (128, 256); basic block 1 (256, 256)
Task residual layer | Basic block 2 (256, 512); basic block 1 (512, 512)
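A minimal numerical sketch of the gating path in Table 1 followed by the expert mixture it implies. This is not the authors' code: the weights are random stand-ins for learned parameters, BatchNorm is omitted for brevity, and the expert count of 4 is an assumption (Table 1 leaves expert_num unspecified).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

num_experts = 4                                   # assumed; not given in Table 1
W1 = rng.normal(size=(3, 128)) * 0.1              # Linear(in_features=3, out_features=128)
W2 = rng.normal(size=(128, num_experts)) * 0.1    # Linear(128, expert_num)

x = rng.normal(size=(8, 3))                       # batch of 8 gating inputs
h = np.maximum(x @ W1, 0.0)                       # Linear -> ReLU (BatchNorm omitted)
gate = softmax(h @ W2)                            # per-sample expert weights, rows sum to 1

# Mixture: weighted sum of (here, dummy) expert feature maps
expert_outs = rng.normal(size=(num_experts, 8, 16))   # [expert, batch, feature]
mixed = np.einsum("be,ebf->bf", gate, expert_outs)    # gated combination per sample
```

The softmax gate is what lets each sample emphasize the experts best suited to its interference structure, rather than averaging all experts uniformly.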
Table 2. Training parameter settings.
Parameter Name | Parameter Value
Epochs (training rounds) | 200
Batch size | 128
Optimizer | Adam
Learning rate | 1 × 10⁻³
Decoupled weight decay regularization rate | 1 × 10⁻⁴
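The "decoupled weight decay" entry in Table 2 suggests an AdamW-style update, in which the decay term is applied directly to the parameters rather than folded into the gradient. A single sketched step with the listed learning rate (1 × 10⁻³) and decay rate (1 × 10⁻⁴); this is a generic optimizer sketch, not the authors' training code, and the parameter, gradient, and moment-default values are illustrative.

```python
import numpy as np

lr, wd = 1e-3, 1e-4                    # values from Table 2
beta1, beta2, eps = 0.9, 0.999, 1e-8   # common Adam defaults (assumed)

theta = np.array([0.5, -0.3])          # illustrative parameters
grad = np.array([0.2, -0.1])           # illustrative gradient of the loss
m = np.zeros_like(theta)               # first-moment estimate
v = np.zeros_like(theta)               # second-moment estimate
t = 1                                  # step counter

m = beta1 * m + (1 - beta1) * grad
v = beta2 * v + (1 - beta2) * grad**2
m_hat = m / (1 - beta1**t)             # bias correction
v_hat = v / (1 - beta2**t)
# Decoupled weight decay: subtracted from theta directly, outside the Adam ratio
theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps) - lr * wd * theta
```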
Table 3. Environmental parameters for training and testing sets’ generation.
Parameters | Units | Lower Bound | Upper Bound | No. of Discrete Values
Data set:
Source range | km | 1.0 | 50.0 | 246
Source depth | m | 6.0 | 67.0 | 245
Sediment parameters:
Sediment thickness | m | 30.0 | 30.0 |
P-wave speed | m/s | 1600 | 1600 |
Density | g/cm³ | 1.5 | 1.5 |
P-wave attenuation | dB/λ | 0.2 | 0.2 |
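The source-position grid implied by Table 3 can be reconstructed directly. The uniform step sizes below (0.2 km in range, 0.25 m in depth) are implied by the bounds and counts, not stated explicitly in the table.

```python
import numpy as np

# 246 ranges spanning 1.0-50.0 km and 245 depths spanning 6.0-67.0 m (Table 3)
ranges_km = np.linspace(1.0, 50.0, 246)
depths_m = np.linspace(6.0, 67.0, 245)

step_r = ranges_km[1] - ranges_km[0]           # 0.2 km spacing
step_d = depths_m[1] - depths_m[0]             # 0.25 m spacing
n_samples = len(ranges_km) * len(depths_m)     # 60,270 source positions in total
```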
Table 4. Ablation experiment.
Stem Type | Router Mode | Loss | MAE-R (km) | MAE-D (m) | MAPE-R (%) | MAPE-D (%)
Conv | - | MTL | 0.037 | 0.077 | 0.4 | 0.5
MoE | Learned | MTL | 0.029 | 0.072 | 0.297 | 0.267
MoE | Learned | Range | 0.035 | - | 0.265 | -
MoE | Learned | Depth | - | 0.132 | - | 0.48
MoE | Fixed | MTL | 0.042 | 0.077 | 0.343 | 0.280
Table 5. Test environments for model evaluation.
Environment | Sound Speed Profile | Sea Depth (m) | Array Micro-Disturbance
Type 1 | Indian Ocean (Measured) | 5223 | No
Type 2 | Gulf (Ideal) | 5000 | No
Type 3 | Munk (Ideal) | 5000 | Yes
Table 6. Test results for model evaluation.
Environment | MAE-R (km) | MAE-D (m) | MAPE-R (%) | MAPE-D (%)
Type 1 | 0.4447 | 0.6755 | 7.25 | 10.48
Type 2 | 2.6302 | 2.1862 | 3.72 | 2.25
Type 3 | 0.1500 | 0.3062 | 0.83 | 1.08
Table 7. Long-range simulation parameter settings.
Parameters | Units | Lower Bound | Upper Bound | No. of Discrete Values
Data set:
Source range | km | 1.0 | 200.0 | 996
Source depth | m | 6.0 | 67.0 | 245
Sediment parameters:
Sediment thickness | m | 30.0 | 30.0 |
P-wave speed | m/s | 1600 | 1600 |
Density | g/cm³ | 1.5 | 1.5 |
P-wave attenuation | dB/λ | 0.2 | 0.2 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Huang, Z.; Deng, Y.; Qian, P.; Li, Z.; Xiao, P. Deep Learning-Based Source Localization with Interference Striation of a Towed Horizontal Line Array. Electronics 2025, 14, 3053. https://doi.org/10.3390/electronics14153053