Underwater Time Delay Estimation Based on Meta-DnCNN with Frequency-Sliding Generalized Cross-Correlation

Ji, Meiqi; Cui, Xuerong; Li, Juan; Li, Lei; Jiang, Bin

doi:10.3390/jmse13050919



by Meiqi Ji ¹,*, Xuerong Cui ¹,*, Juan Li ², Lei Li ¹ and Bin Jiang ¹

¹ College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao 266580, China

² Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China

* Authors to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2025, 13(5), 919; https://doi.org/10.3390/jmse13050919

Submission received: 4 April 2025 / Revised: 26 April 2025 / Accepted: 28 April 2025 / Published: 7 May 2025

(This article belongs to the Section Ocean Engineering)


Abstract

In underwater signal processing, accurate time delay estimation (TDE) is crucial for ensuring the reliability of data transmission. However, the complex propagation of sound waves and strong noise interference in the underwater environment make this task extremely challenging. Especially at low signal-to-noise ratio (SNR), existing methods based on cross-correlation and deep learning struggle to meet requirements. To address this core issue, this paper proposes a novel solution. First, a multi-sub-window reconstruction is performed on the frequency-sliding generalized cross-correlation (FS-GCC) matrix between signals to capture time delay characteristics from different frequency bands and to enhance and extract features. Then, the FS-GCC matrix is converted into a grayscale image, and multi-level noise features are extracted by the multi-layer convolution of a denoising convolutional neural network (DnCNN), effectively suppressing noise and improving estimation accuracy. Finally, the model-agnostic meta-learning (MAML) framework is introduced. By training on tasks under various SNR conditions, the model acquires the ability to adapt quickly to new environments and can achieve the desired estimation accuracy even when the number of underwater training samples is limited. Simulations were conducted under the NOF and NCS underwater acoustic channels, and the results demonstrate that the proposed approach exhibits lower estimation errors and greater stability than existing methods under the same conditions. This method enhances the practicality and robustness of the model in complex underwater environments, providing strong support for the efficient and stable operation of underwater sensor networks.

Keywords:

underwater TDE; FS-GCC; DnCNN; MAML

1. Introduction

Underwater sensor networks [1,2], a key technology for obtaining ocean information, are essential for ocean detection, environmental monitoring, and military applications. Accurate underwater signal time delay estimation (TDE) is the foundation for reliable data transmission, as it determines whether a network can acquire and transmit ocean information accurately and promptly. In the complex underwater environment, acoustic wave propagation is affected by factors such as water temperature, salinity, and depth, along with various noise sources such as marine organisms, ships, and waves. These conditions pose challenges to TDE that are more pronounced than in terrestrial environments [3,4,5]. Therefore, improving the accuracy of underwater TDE is essential for the efficient and stable operation of underwater sensor networks.

The generalized cross-correlation (GCC) algorithm [6,7,8] is a vital tool for underwater TDE and is widely employed in the detection, localization, and tracking of underwater targets. It estimates the time delay between two signals in the frequency domain by exploiting the phase information of the cross-power spectrum. Jang et al. [9] employed the GCC algorithm to extract time difference of arrival (TDOA) information. By assigning specific weights to different frequency components, they effectively suppressed noise interference and thus achieved the positioning and tracking of marine organisms. The distinctive feature of this method is its frequency weighting, which suppresses noise selectively according to its frequency characteristics. However, it adapts poorly to complex and changeable noise environments: when the frequency distribution of the noise is wide and irregular, the suppression effect declines. Jia et al. [10] proposed a zero-padding GCC algorithm for direct and reflected waves, accurately estimating time delays from the unknown emission waveforms of underwater high-level explosive sound sources. However, this method is only applicable to specific sound sources, and its applicability to non-explosive sources needs further improvement. Additionally, Pang et al. [11] proposed a TDE algorithm based on quadratic GCC for underwater positioning, which applies a second GCC on top of traditional weighting functions. Although it improves TDE accuracy under low signal-to-noise ratio (SNR) conditions, it still cannot effectively resolve the false and spurious peaks caused by traditional weighting functions.

Although the GCC algorithm and its derivatives have met the requirements of some underwater TDE applications, their susceptibility to noise remains a significant concern. In particular, at low SNRs the GCC algorithm is prone to spurious peaks, which can severely degrade TDE accuracy and cause significant deviations between the estimated and actual values, failing to meet the high-precision requirements of practical underwater applications. In recent years, with the rapid development of deep learning, its applications in fields such as cross-correlation TDE and sound source localization have been increasingly explored [12,13]. Yao et al. [14] used parallel long short-term memory (LSTM) neural networks to estimate parameters in the GCC_PHAT model. By leveraging the LSTM's strong ability to process time series data, this method effectively reduced the variance of TDE at low SNRs. However, its training process is rather complicated, requiring a large amount of training data and a long training time. Salvati et al. [15] constructed a GCC feature matrix using PHAT filters with different parameters $β$, which was then input into a CNN to directly estimate the TDOA for speaker localization. The innovation of this method lies in the construction of the GCC feature matrix. Nevertheless, because underwater data are limited, an insufficient dataset may lead to unsatisfactory training results. Comanducci et al. [16] employed GCCs as inputs and introduced a ray space transformation (RST) as an intermediate representation between the CNN input and output, enhancing interpretability through feature visualization. However, this method has relatively high computational complexity, and in complex underwater environments the stability of the RST intermediate representation may be affected.

Although methods combining correlation functions with deep learning have shown strong generalization in speech signal processing, underwater environments often suffer from low SNR due to environmental noise and reverberation, which limits the efficacy of these methods. To further enhance the robustness of TDE, the frequency-sliding generalized cross-correlation (FS-GCC) technique [17,18] has emerged. By applying subband low-pass filtering to the phase spectrum of the cross-power spectrum, this method provides a well-organized two-dimensional representation of time delay information across frequency bands. Song et al. [19] extracted the principal eigenvectors of the FS-GCC through constructing window groups, pruning, deconvolution, and merging operations. Extensive experiments verified that this algorithm outperforms other algorithms, with a lower probability of anomalous estimates, lower estimation error, and a higher first-to-second peak ratio (FSPR). However, its adaptability to extremely complex noise environments still needs further verification. Overall, this method provides new ideas and directions for TDE, and subsequent research can build on it to better address the challenges of practical applications. Recent research has combined FS-GCC with deep learning to address the TDE problem. For instance, Comanducci et al. [20], inspired by deep learning image denoising approaches, proposed using U-Net to denoise images constructed from FS-GCC matrices, resulting in enhanced time delay information. However, the feature extraction ability of the U-Net model on such images still leaves room for improvement.

To address these challenges, this paper proposes a meta-learning-based denoising convolutional neural network (DnCNN) FS-GCC method, named Meta-DnCNN_FS-GCC. This method takes the grayscale image generated by the feature-enhanced FS-GCC matrices as input to estimate the FS-GCC under noise-free conditions and optimize the accuracy of TDE. It aims to enhance signal processing capabilities in underwater environments, providing more reliable and accurate time delay information for underwater communication nodes.

The main contributions of this paper are as follows:

We reconstruct FS-GCC matrices for multiple subwindow groups. Each subwindow captures time delay-related signal features from a different frequency band, thereby enriching the information dimension of the FS-GCC matrices and enhancing the key feature vectors. Compared with the traditional FS-GCC procedure, this reconstruction extracts the time delay information in the signals more fully, improves TDE accuracy, and reduces the probability of abnormal estimates.
Recognizing the effectiveness of the DnCNN in image denoising and feature extraction, this paper introduces it into the field of underwater TDE. The FS-GCC matrix is transformed into a grayscale image, and the DnCNN architecture is utilized to strengthen time delay features reconstructed from the FS-GCC. The network automatically learns and extracts refined time delay features through multi-layer convolution operations, effectively suppressing noise interference and significantly improving estimation accuracy. Compared with other deep learning methods, DnCNN has stronger noise extraction capabilities and can better adapt to the complex underwater noise environment.
Considering the variable SNR in underwater acoustic environments and the scarcity of high-quality training samples due to equipment limitations, we train the DnCNN model using a model-agnostic meta-learning (MAML) framework [21,22,23]. MAML simulates training tasks under diverse SNR conditions, enabling the model to adapt swiftly to new environments. This approach facilitates efficient learning with limited samples and promotes knowledge sharing across tasks, consequently increasing the model’s generalization ability in varying SNR scenarios and improving its practicality and robustness in complex underwater environments.

2. Basic Theory

2.1. Signal Model

A commonly used signal model for the classical TDE problem [19] is as in Equations (1) and (2):

x_{1}(t) = s_{1}(t) + n_{1}(t) = α_{1} s(t - τ_{1}) + n_{1}(t)

(1)

x_{2}(t) = s_{2}(t) + n_{2}(t) = α_{2} s(t - τ_{2}) + n_{2}(t)

(2)

where $x_{1}(t)$ and $x_{2}(t)$ represent the signals received by hydrophones 1 and 2, respectively; $s_{1}(t)$ and $s_{2}(t)$ are the noise-free received signals, while $s(t)$ is the source signal; $α_{1}$ and $α_{2}$ denote the attenuation factors due to propagation distance and material absorption; and $n_{1}(t)$ and $n_{2}(t)$ represent the additive environmental noise. It is assumed that $s(t)$ and the noise are uncorrelated. $τ_{1}$ and $τ_{2}$ denote the propagation delays from the source to the two hydrophones, and their difference $τ_{1} - τ_{2}$ is the TDOA to be estimated.
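To make Equations (1) and (2) concrete, the following minimal NumPy sketch simulates the two-hydrophone model in discrete time. It assumes non-negative integer-sample delays (applied as zero-padded shifts) and white Gaussian noise scaled per channel to a target SNR; the function name `received_pair` and its parameters are illustrative and not taken from the paper.

```python
import numpy as np

def received_pair(s, alpha1, alpha2, tau1, tau2, snr_db, rng):
    """Discrete-time version of Eqs. (1)-(2) (illustrative sketch).

    Assumptions: tau1 and tau2 are non-negative integer sample delays, and
    each channel's noise is white Gaussian, scaled so that the power of the
    noise-free component s_m over the whole record is snr_db above the
    noise power.
    """
    n = len(s) + max(tau1, tau2)

    def delayed(alpha, tau):
        out = np.zeros(n)
        out[tau:tau + len(s)] = alpha * s          # alpha * s(t - tau)
        return out

    def add_noise(clean):
        p_noise = np.mean(clean ** 2) / 10 ** (snr_db / 10)
        return clean + rng.normal(scale=np.sqrt(p_noise), size=n)

    s1, s2 = delayed(alpha1, tau1), delayed(alpha2, tau2)
    return add_noise(s1), add_noise(s2), s1, s2
```

For example, `received_pair(s, 1.0, 0.5, 30, 10, -10.0, np.random.default_rng(0))` returns two noisy records whose TDOA is $τ_{1} - τ_{2} = 20$ samples, together with their noise-free counterparts.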

The aforementioned equation can be converted into the discrete-time Fourier transform domain:

X_{1} (ω) = S_{1} (ω) + N_{1} (ω) = α_{1} S (ω) e^{- j ω τ_{1}} + N_{1} (ω)

(3)

X_{2} (ω) = S_{2} (ω) + N_{2} (ω) = α_{2} S (ω) e^{- j ω τ_{2}} + N_{2} (ω)

(4)

where $X_{1}(ω)$, $X_{2}(ω)$, $S_{1}(ω)$, $S_{2}(ω)$, $S(ω)$, $N_{1}(ω)$, and $N_{2}(ω)$ are the Fourier transforms of $x_{1}(t)$, $x_{2}(t)$, $s_{1}(t)$, $s_{2}(t)$, $s(t)$, $n_{1}(t)$, and $n_{2}(t)$, respectively; $ω$ is the angular frequency normalized by the sampling rate; and $e^{-jωτ_{i}}$, $i = 1, 2$, is the phase factor introduced by the time delay $τ_{i}$.
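The delay-to-phase relation in Equations (3) and (4) can be checked numerically with the DFT. The sketch below assumes a circular delay (`np.roll`), for which the DFT identity holds exactly; a linear delay gives the same result only when the record is zero-padded long enough to avoid wrap-around.

```python
import numpy as np

N = 256
rng = np.random.default_rng(1)
s = rng.normal(size=N)
tau = 7
shifted = np.roll(s, tau)                    # circular delay: s[(n - tau) mod N]
omega = 2 * np.pi * np.arange(N) / N         # discrete frequencies omega_k
lhs = np.fft.fft(shifted)
rhs = np.fft.fft(s) * np.exp(-1j * omega * tau)   # S(omega) e^{-j omega tau}
print(np.allclose(lhs, rhs))                 # True
```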

To effectively distinguish multi-path components, the transmitted signal needs to exhibit high time resolution. This paper employs Linear Frequency Modulation (LFM) signals, which feature a frequency that changes linearly with time. LFM signals possess substantial energy and bandwidth, making them resistant to noise interference [24]. The LFM signal is defined as

s (t) = A sin (2 π (f_{0} t + \frac{1}{2} k t^{2}))

(5)

where $A$ represents the signal amplitude, $k = B/T$ represents the modulation rate of the LFM signal, $B$ represents the bandwidth, $T$ represents the time width, and $f_{0}$ represents the center frequency. The time-domain and frequency-domain representations of the signal are shown in Figure 1.
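A minimal NumPy sketch of Equation (5) is given below. Because $f_{0}$ is described as the center frequency, it assumes the time axis spans $[-T/2, T/2)$, so that the instantaneous frequency $f_{0} + kt$ sweeps from $f_{0} - B/2$ to $f_{0} + B/2$; the function name `lfm` is hypothetical. The parameter values are those of the simulation settings in Section 4.2.

```python
import numpy as np

def lfm(A, f0, B, T, fs):
    """LFM pulse following Eq. (5) (illustrative sketch).

    Assumption: t spans [-T/2, T/2) so that f0 is the centre frequency and
    the instantaneous frequency f0 + k t sweeps from f0 - B/2 to f0 + B/2.
    """
    k = B / T                                   # modulation rate, Hz/s
    t = np.arange(int(round(T * fs))) / fs - T / 2
    return t, A * np.sin(2 * np.pi * (f0 * t + 0.5 * k * t ** 2))

t, s = lfm(A=1.0, f0=11e3, B=8e3, T=0.01, fs=50e3)
```

With these values the pulse has 500 samples, and its spectrum is concentrated between roughly 7 and 15 kHz, well below the 25 kHz Nyquist frequency.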

2.2. Generalized Cross-Correlation

The cross-power spectrum $G(ω)$ between $X_{1}(ω)$ and $X_{2}(ω)$ is defined as

G (ω) = X_{1} (ω) X_{2}^{*} (ω)

(6)

where $(\cdot)^{*}$ denotes the complex conjugate. The weighting function of GCC_PHAT is defined as

ψ (ω) = \frac{1}{| G (ω) |}

(7)

GCC is the inverse Fourier transform of the weighted cross-power spectrum, represented as

R [τ] = \frac{1}{2 π} \int_{- π}^{π} G (ω) ψ (ω) e^{j ω τ} d ω

(8)

where $τ$ denotes the time delay. Thus, the TDE based on GCC_PHAT is represented as

\hat{τ} = \arg\max_{τ} R[τ]

(9)

When noise and reverberation are present, the GCC exhibits spurious peaks, which complicate TDE.
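Equations (6) to (9) can be sketched with the DFT as follows. The sketch zero-pads both signals to avoid circular wrap-around and adds a small constant `eps` (an assumption, not in the paper) to the PHAT denominator to avoid division by zero; `gcc_phat` is a hypothetical helper name.

```python
import numpy as np

def gcc_phat(x1, x2, eps=1e-12):
    """GCC-PHAT following Eqs. (6)-(9), computed with the DFT.

    Returns the lags -(n//2) .. n - n//2 - 1, the correlation R[tau] in the
    same (fftshift) order, and the lag of its maximum. A positive lag means
    x1 is delayed relative to x2.
    """
    n = len(x1) + len(x2)               # zero-pad to avoid circular wrap-around
    X1 = np.fft.fft(x1, n)
    X2 = np.fft.fft(x2, n)
    G = X1 * np.conj(X2)                # cross-power spectrum, Eq. (6)
    psi = 1.0 / (np.abs(G) + eps)       # PHAT weighting, Eq. (7)
    r = np.fft.ifft(G * psi).real       # weighted inverse transform, Eq. (8)
    r = np.fft.fftshift(r)              # move zero lag to the centre
    lags = np.arange(n) - n // 2
    return lags, r, lags[np.argmax(r)]  # peak picking, Eq. (9)
```

For two copies of a broadband signal delayed by 30 and 10 samples, `gcc_phat` returns a peak at lag 20, matching $τ_{1} - τ_{2}$ in the signal model.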

2.3. Frequency-Sliding Generalized Cross-Correlation

FS-GCC extends the GCC method by analyzing how different frequency bands contribute to the TDE of the direct path. The spectral window size is $B$ and the hop length is $M$; both should be selected according to the channel and signal characteristics. The subband GCC for frequency band $l$ is defined as

R [l, τ] = \frac{1}{2 π} \int_{- π}^{π} Ψ (ω + ω_{l}) Φ (ω) e^{j ω τ} d ω

(10)

where $Ψ(ω) = G(ω)ψ(ω)$, $ω_{l}$ represents the frequency offset of subband $l$, and $Φ(ω) \in \mathbb{R}$ is a symmetric frequency-domain window centered at $ω = 0$ with a bandwidth of $B_{Φ} \in [0, π]$.

The frequency-sliding subband GCCs are obtained by scanning the cross-power spectrum phase $∠G(ω)$ over potentially overlapping frequency bands:

ω_{l} = l M_{Φ}, l = 0, \dots, L - 1

(11)

where $M_{Φ}$ is the frequency hop. The number of subbands $L$ must be chosen so that the subbands cover all frequencies without exceeding the Nyquist limit:

L = \left\lfloor \frac{π - B_{Φ} + M_{Φ}}{M_{Φ}} \right\rfloor

(12)

In practice, the subband GCCs are extracted using the discrete Fourier transform (DFT) of the underwater LFM signals. The DFT of the received signal $x_{m}[n]$ is represented as

X_{m} = [X_{m}[0], X_{m}[1], \dots, X_{m}[N - 1]]^{T}, \quad m = 1, 2

(13)

where the vector $X_{m} \in \mathbb{C}^{N}$ collects the coefficients $X_{m}[k]$ at the discrete frequencies $ω_{k} = k\frac{2π}{N}$, and $N$ is the DFT length. The symmetric frequency-domain window vector $Φ$ contains the discrete frequency samples $Φ[k] = Φ(ω_{k})$ and is represented as

Φ = {[Φ [0], Φ [1], \dots, 0, 0, \dots, Φ [N - 1]]}^{T}

(14)

where $Φ \in \mathbb{R}^{N}$ is symmetrically zero-padded so that it contains only $B = \lfloor 2B_{Φ}\frac{N}{2π} \rfloor$ non-zero elements.

The subband GCC vector $r_{l} \in \mathbb{C}^{N}$ is obtained through the inverse DFT of the windowed PHAT spectrum:

r_{l}[n] = \frac{1}{N} \sum_{k = 0}^{N - 1} \frac{X_{1}[k + lM] X_{2}^{*}[k + lM]}{|X_{1}[k + lM] X_{2}^{*}[k + lM]|} Φ[k] e^{j \frac{2π}{N} kn}

(15)

where $M = \lfloor M_{Φ}\frac{N}{2π} \rfloor$ is the discrete frequency hop.

The FS-GCC matrix is constructed by stacking the subband GCC vectors:

R = [r_{0}, r_{1}, \dots, r_{L - 1}]^{T} \in \mathbb{C}^{L \times N}

(16)

In the absence of noise, the maximum of each row $R[l, τ]$ lies at the true TDOA. However, noise and reverberation distort the FS-GCC, making TDE more challenging. Figure 2 shows examples of the FS-GCC for underwater LFM signals at different SNRs.
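Equations (13) to (16) can be sketched in NumPy as below. The window shape is not specified here, so the sketch assumes a Hann window with `B` (even) strictly positive bins split symmetrically around $k = 0$, with `B` and `M` in DFT bins. The subband count uses an assumed discrete counterpart of Equation (12), `L = (N//2 - B//2)//M + 1`, which keeps the last band below Nyquist. The helper name `fs_gcc` is hypothetical.

```python
import numpy as np

def fs_gcc(x1, x2, B, M, N=None, eps=1e-12):
    """FS-GCC matrix following Eqs. (13)-(16) (illustrative sketch).

    Assumptions: B (even) non-zero Hann-window bins centred on k = 0, hop M
    in bins, and L = (N//2 - B//2)//M + 1 subbands. Returns complex R of
    shape (L, N); column n corresponds to lag n mod N.
    """
    N = N or len(x1) + len(x2)                  # zero-padded DFT length
    X1, X2 = np.fft.fft(x1, N), np.fft.fft(x2, N)
    G = X1 * np.conj(X2)                        # cross-power spectrum
    Psi = G / (np.abs(G) + eps)                 # PHAT-weighted spectrum
    half = B // 2
    w = np.hanning(B + 2)[1:-1]                 # B strictly positive taps
    Phi = np.zeros(N)
    Phi[:half] = w[half:]                       # non-negative frequencies
    Phi[N - half:] = w[:half]                   # wrapped negative frequencies
    L = (N // 2 - half) // M + 1                # subbands below Nyquist
    rows = [np.fft.ifft(np.roll(Psi, -l * M) * Phi)   # Psi[k + lM] * Phi[k], Eq. (15)
            for l in range(L)]
    return np.stack(rows)                       # R in C^{L x N}, Eq. (16)
```

In the noise-free case, every row of `np.abs(R)` peaks at the same lag, which is the property the text describes; noise and reverberation break this agreement across rows.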

3. Proposed Method

The proposed method consists of two key steps:

FS-GCC Matrix Feature Enhancement: This step focuses on reconstructing multiple subwindow groups of the FS-GCC matrix to capture time delay-related signal features from various frequency band perspectives, enriching the information dimension and strengthening the key feature vectors.
MAML-based DnCNN Network Optimization Training: The FS-GCC matrix is first converted into a grayscale image, and DnCNN’s multi-layer convolution is applied to automatically learn and extract the time delay features reconstructed from the FS-GCC, effectively suppressing noise interference. Then, we incorporate the MAML framework to facilitate training under various SNR conditions. This integration allows the model to rapidly adapt to new environments and learn efficiently with a limited number of samples, thereby improving its generalization capabilities and robustness across different SNR scenarios.

3.1. FS-GCC Matrix Feature Enhancement

Following [19], we design a group of spectral windows $Φ_{\mathrm{group}} = \{Φ_{1}, Φ_{2}, Φ_{3}, \dots, Φ_{N}\}$ of size $N$. The FS-GCC is then computed for each window function $Φ_{i}$ ($i \in \{1, 2, \dots, N\}$) in $Φ_{\mathrm{group}}$. The spectral window sizes $B$ and hop lengths $M$ are defined as $B_{\mathrm{group}} = \{B_{1,Φ}, B_{2,Φ}, B_{3,Φ}, \dots, B_{N,Φ}\}$ and $M_{\mathrm{group}} = \{M_{1,Φ}, M_{2,Φ}, M_{3,Φ}, \dots, M_{N,Φ}\}$, respectively. Across the full frequency range, window $Φ_{i}$ is slid to partition the cross-power spectrum phase into $L_{i}$ subbands. The FS-GCC for the $i$-th window $Φ_{i}$ is defined as follows:

R_{i} (l, τ) = \frac{1}{2 π} \int_{- π}^{π} Ψ (ω + ω_{i, l}) Φ_{i} (ω) e^{j ω τ} d ω

(17)

where the frequency offset $ω_{i,l}$ corresponds to window $i$ and frequency band $l$. In the following expressions, $R_{i}(l, τ)$, $τ \in \{0, 1, \dots, N_{t} - 1\}$, is denoted as $R_{i,l}$.

Figure 3 shows a schematic diagram of two spectral windows sliding across the frequency domain, where $∠(\cdot)$ denotes the phase angle of a complex value. The subband GCC can be interpreted as the product of $Ψ(ω)$ and the window $Φ(ω)$ shifted to $ω_{l}$, with the result shifted back to zero frequency before the inverse Fourier transform is applied.

Sliding each window $Φ_{i}$ from $Φ_{\mathrm{group}}$ across the GCC frequency domain produces $N$ subband matrices $R_{i} = [R_{i,0}, R_{i,1}, \dots, R_{i,L_{i}-1}] \in \mathbb{C}^{L_{i} \times N_{t}}$, as shown in Figure 4a. Because the dimensions of $R_{i}$ depend on the values of $B$ and $M$, they must be standardized for the subsequent CNN training. In addition, the time delay information encoded in the FS-GCC matrices of different spectral windows exhibits distinct characteristics, and integrating this information effectively can improve the feature representation of the FS-GCC matrices. Therefore, to reconstruct the $N$ FS-GCC matrices obtained from multiple windows, the $N$ subband matrices are concatenated row-wise into a joint matrix $R_{\mathrm{joint}} = [R_{1}, R_{2}, \dots, R_{N}]$, as shown in Figure 4b. Let $m_{i,l}$ be the maximum value of each row of $R_{\mathrm{joint}}$. Sorting the rows of $R_{\mathrm{joint}}$ in descending order of $m_{i,l}$ yields $R_{\mathrm{sort}} = [R_{1\_\mathrm{sort}}, R_{2\_\mathrm{sort}}, \dots, R_{(\sum_{i=1}^{N} L_{i})\_\mathrm{sort}}]$. The top $L_{i-1}$ rows are then retained to form the reconstructed FS-GCC matrix $R_{\mathrm{reco}} = [R_{1\_\mathrm{sort}}, R_{2\_\mathrm{sort}}, \dots, R_{L_{i-1}\_\mathrm{sort}}]$. Sorting rows by their maximum values $m_{i,l}$ prioritizes the subbands with the most prominent time delay peaks, which tend to correspond to the components least corrupted by noise. Retaining only these subbands yields a compact representation that directly improves the quality of the DnCNN input. As shown in Figure 4c, the time delay information in the reconstructed FS-GCC matrix is significantly enhanced.

3.2. Meta-DnCNN Network Optimization Training

3.2.1. MAML Framework

MAML is a meta-learning method based on parameter initialization. Its core idea is to refine the model's initialization parameters by training across a range of related tasks, so that the training of a new task can build on prior training and adjustment, which significantly improves training efficiency and convergence speed. Rather than modifying the structure of the neural network, MAML focuses on optimizing the initialization parameters, which substantially reduces the computational burden. From the perspective of the loss function, MAML can be interpreted as maximizing the sensitivity of the new task's loss function to the model parameters: when this sensitivity is high, small parameter adjustments can produce large changes in the task loss. MAML's training strategy employs inner and outer parameter updates, as shown in Figure 5. The inner update processes each task in the batch individually: each task is fed into the network, the inner parameters are initialized from the outer parameters, the network weights are updated using the support set, and the task loss is evaluated on the query set. The outer update is based on the performance of all tasks in the batch: the losses of all tasks are summed, and this sum is used to update the outer parameters of the network. The advantage of this strategy is that it keeps local training tasks consistent with test tasks while also broadening the training data distribution, thereby improving the model's generalization ability.

The mathematical framework of the MAML algorithm is as follows. Suppose a neural network model $f_{θ}$ is initialized with parameters $θ$, and the training task distribution is denoted as $p(T)$. The inner-loop update of the parameters $θ_{i}^{'}$ for task $T_{i}$ is given by

θ_{i}^{'} = θ - α \nabla_{θ} L_{T_{i}} (f_{θ})

(18)

where $α$ is the inner-loop learning rate, i.e., the step size of gradient descent. To ensure that the network performs well on all tasks in the current batch, the losses of all tasks are summed to form a batch loss, which is then minimized. The outer-loop parameter update is

θ \leftarrow θ - β \nabla_{θ} \sum_{T_{i} \sim p (T)} L_{T_{i}} (f_{θ_{i}^{'}})

(19)

where $β$ is the outer-loop learning rate. Note that although the loss $L$ is computed using $θ_{i}^{'}$, the update is applied not to $θ_{i}^{'}$ but to $θ$, the initialization parameters of the outer loop. Algorithm 1 outlines the training stage of MAML.

Algorithm 1 The training stage of MAML

Input: Set of tasks $T = \{T_{1}, T_{2}, \dots, T_{n}\}$, $α$, $β$
Output: $θ$
1: Initialize $θ$ of the model $f_{θ}$
2: while not done do
3: for each $T_{i} \sim p(T)$ do
4: Initialize the network with $θ$
5: Gradient descent using $α$
6: Obtain $θ_{i}^{'}$ according to Equation (18)
7: end for
8: Gradient descent using $β$
9: Update $θ$ according to Equation (19)
10: end while
11: return $θ$
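The update rules in Equations (18) and (19) can be illustrated without a deep learning framework by using a linear model, for which the second-order MAML gradient has a closed form: $\nabla_{θ} L_{q}(θ_{i}^{'}) = (I - αH_{s})\nabla L_{q}(θ_{i}^{'})$, with $H_{s}$ the support-set Hessian. This is a minimal sketch under that assumption, not the paper's Meta-DnCNN training code: toy regression tasks with related weight vectors stand in for the per-SNR tasks, every task is used in every outer step, and the names `loss_grad_hess`, `maml_step`, and `make_task` are hypothetical.

```python
import numpy as np

def loss_grad_hess(theta, X, y):
    """Loss ||X theta - y||^2 / (2n) of a linear model, its gradient and Hessian."""
    n = len(y)
    r = X @ theta - y
    return r @ r / (2 * n), X.T @ r / n, X.T @ X / n

def maml_step(theta, tasks, alpha, beta):
    """One outer iteration of MAML (Eqs. (18)-(19)) for f_theta(x) = x @ theta.

    Each task is (X_sup, y_sup, X_que, y_que). Returns the updated theta and
    the summed query loss evaluated before the update.
    """
    meta_grad = np.zeros_like(theta)
    meta_loss = 0.0
    for X_s, y_s, X_q, y_q in tasks:
        _, g_s, H_s = loss_grad_hess(theta, X_s, y_s)
        theta_i = theta - alpha * g_s                  # inner step, Eq. (18)
        l_q, g_q, _ = loss_grad_hess(theta_i, X_q, y_q)
        meta_grad += (np.eye(theta.size) - alpha * H_s) @ g_q   # exact chain rule
        meta_loss += l_q                               # summed task losses
    return theta - beta * meta_grad, meta_loss         # outer step, Eq. (19)

# Toy usage: related regression tasks stand in for the per-SNR tasks.
rng = np.random.default_rng(0)
d = 5
w_common = rng.normal(size=d)

def make_task():
    w = w_common + 0.05 * rng.normal(size=d)   # related but distinct tasks
    X = rng.normal(size=(40, d))
    y = X @ w
    return X[:20], y[:20], X[20:], y[20:]      # disjoint support / query sets

tasks = [make_task() for _ in range(8)]
theta = np.zeros(d)
history = []
for _ in range(200):
    theta, meta_loss = maml_step(theta, tasks, alpha=0.1, beta=0.05)
    history.append(meta_loss)
```

For a network such as DnCNN, the matrix $(I - αH_{s})$ is never formed explicitly; automatic differentiation through the inner step computes the same product.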

3.2.2. Meta-DnCNN Model

In this paper, we apply the MAML training strategy to the DnCNN model and name the result Meta-DnCNN. The objective is to leverage meta-learning to enhance the model's adaptability and robustness across various SNR environments. Even with a limited training dataset, the model effectively suppresses noise interference and extracts delay features from the FS-GCC matrix, significantly reducing the network's reliance on large-scale training data. The specific implementation steps are illustrated in Figure 6. FS-GCC matrices obtained under different SNR conditions are converted into grayscale images to construct multi-task datasets, which are then fed into the MAML model. With DnCNN serving as the base model for MAML, continuous iterations of the MAML training strategy $Ω_{\mathrm{Meta}}$ ultimately yield the predicted FS-GCC matrix.
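The paper does not state how an FS-GCC matrix is mapped to a grayscale image or restored from one. The sketch below assumes min-max normalization of the magnitude $|R|$ to 8-bit gray levels, keeping the range so that a predicted image can be mapped back to an approximate matrix. The helper names `to_gray` and `from_gray` are hypothetical.

```python
import numpy as np

def to_gray(R):
    """Map an FS-GCC matrix to an 8-bit grayscale image (assumed scheme).

    Assumption: min-max normalisation of |R|. The returned (lo, hi) range
    allows an approximate inverse.
    """
    mag = np.abs(R)
    lo, hi = mag.min(), mag.max()
    img = np.round(255 * (mag - lo) / (hi - lo + 1e-12)).astype(np.uint8)
    return img, (lo, hi)

def from_gray(img, lo_hi):
    """Approximate inverse of to_gray (quantisation error up to half a grey level)."""
    lo, hi = lo_hi
    return lo + img.astype(float) / 255 * (hi - lo)
```

Under this scheme only the magnitude survives the round trip; a phase-preserving variant would need a second image channel.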

DnCNN Model:

The DnCNN model is designed based on the deep neural network architecture of the VGG (Visual Geometry Group), which utilizes repeated convolutional blocks [25]. Typically, deep learning-based methods aim to establish a mapping from noisy images to denoised images. However, unlike other deep learning-based denoising algorithms, the DnCNN model employs residual learning, where the network focuses on learning the noise characteristics rather than the image features. It predicts the noise component, and the denoised image is then obtained by subtracting the predicted noise component from the noisy image. The model’s principle is illustrated in the lower part of Figure 6.

During training, the output of the DnCNN model is the predicted residual image, and the target is the difference between the noisy and noise-free images. The discrepancy between the predicted and target residuals defines the loss, which is backpropagated to adjust the network parameters. The DnCNN model uses the mean square error (MSE) as the loss function, as given in Equation (20):

\ell(θ) = \frac{1}{2N} \sum_{i = 1}^{N} \| R(y_{i}, θ) - (y_{i} - x_{i}) \|^{2}

(20)

where $R(y_{i}, θ)$ represents the predicted residual output, $y_{i}$ and $x_{i}$ denote the noisy and noise-free images, respectively, $N$ is the number of samples, and $θ$ refers to the network parameters.
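Equation (20) and the residual-learning rule (denoised image = noisy image minus predicted noise) can be written directly in NumPy. In this minimal sketch the predicted residual is simply an input array standing in for the network output $R(y_{i}, θ)$; the helper names are illustrative.

```python
import numpy as np

def residual_loss(pred_residual, noisy, clean):
    """Eq. (20): MSE between predicted residual R(y_i, theta) and y_i - x_i.

    Inputs are stacks of images with shape (N, H, W).
    """
    target = noisy - clean                      # true noise component
    n = noisy.shape[0]
    return np.sum((pred_residual - target) ** 2) / (2 * n)

def denoise(noisy, pred_residual):
    """Residual learning: the clean estimate is y - R(y, theta)."""
    return noisy - pred_residual
```

If the network predicts the noise exactly, the loss is zero and `denoise` returns the noise-free image, which is why the target in Equation (20) is the residual rather than the clean image.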

In this paper, the DnCNN network model consists of 13 layers. The first layer combines convolution with the ReLU function. The middle layers consist of convolution, BN (Batch Normalization), and ReLU functions to accelerate convergence. The model ends with a convolutional layer. The convolutional kernels are of size 3 × 3, with both stride and padding set to 1.
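The 13-layer layout described above can be listed programmatically, together with two quantities that follow from it: a 3 × 3 convolution with stride 1 and padding 1 preserves the spatial size, and a stack of $d$ such convolutions has a $(2d + 1) \times (2d + 1)$ receptive field. The sketch assumes a single-channel (grayscale) input and 64 feature channels; the channel count and helper names are assumptions, not values given in the paper.

```python
def dncnn_layers(depth=13, channels=64):
    """Layer list for the DnCNN described above (channels=64 is assumed)."""
    layers = [("conv3x3", 1, channels), ("relu",)]           # first layer
    for _ in range(depth - 2):
        layers += [("conv3x3", channels, channels), ("bn",), ("relu",)]
    layers.append(("conv3x3", channels, 1))                  # output layer
    return layers

def conv_out_size(n, k=3, stride=1, pad=1):
    """Spatial output size of a convolution."""
    return (n + 2 * pad - k) // stride + 1

def receptive_field(depth, k=3):
    """Each stride-1 k x k convolution widens the field by k - 1 pixels."""
    return 1 + depth * (k - 1)
```

With `depth=13`, each pixel of the predicted residual depends on a 27 × 27 neighbourhood of the input grayscale image, and the output image has the same size as the input.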

MAML Training Process:

Step 1. Dataset Construction: Since the inner loop of the MAML algorithm is designed for datasets with consistent distributions, the experimental data consist of grayscale images derived from FS-GCC matrices under varying SNR conditions, and these images serve as the MAML training tasks for the experimental model. The grayscale images $F_{p}$ generated from noisy FS-GCC matrices are used as inputs to the DnCNN, while the corresponding grayscale images $F$ derived from noise-free FS-GCC matrices serve as the target labels. The task set for MAML can be expressed as

T = \{{(F_{p}, F)}_{1}, {(F_{p}, F)}_{2}, \dots, {(F_{p}, F)}_{n}\}

(21)

Within each task, the dataset is further divided into a support set $Ψ_{i}^{\mathrm{sup}} = (F_{p}, F)_{i}$ and a query set $Ξ_{i}^{\mathrm{que}} = (F_{p}, F)_{i}$, which are analogous to the training and test sets of a conventional deep learning model. The two sets are mutually exclusive.
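A minimal sketch of this task construction follows, assuming the per-SNR data are already arrays of paired noisy and noise-free grayscale images. The dictionary layout, the random split, and the names `make_tasks` and `n_support` are hypothetical choices for illustration.

```python
import numpy as np

def make_tasks(datasets, n_support, rng):
    """Build MAML tasks, Eq. (21), from per-SNR datasets (illustrative).

    datasets: dict mapping SNR (dB) -> (F_p, F), arrays of shape (n, H, W).
    Each task is split at random into disjoint support and query sets.
    """
    tasks = {}
    for snr, (Fp, F) in datasets.items():
        idx = rng.permutation(len(Fp))
        sup, que = idx[:n_support], idx[n_support:]   # mutually exclusive
        tasks[snr] = {"support": (Fp[sup], F[sup]),   # inner loop, Eq. (18)
                      "query": (Fp[que], F[que])}     # outer loop, Eq. (19)
    return tasks
```

Indexing `Fp` and `F` with the same permutation keeps each noisy image paired with its noise-free label.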

Step 2. MAML Training Process: After the task sets are constructed, MAML trains through an inner-loop learner and an outer-loop meta-learner. The inner-loop learner trains on samples from the support sets $Ψ_{i}^{\mathrm{sup}} = (F_{p}, F)_{i}$, as described in Equation (18). The outer-loop learner iterates over samples from the query sets $Ξ_{i}^{\mathrm{que}} = (F_{p}, F)_{i}$ across different tasks according to Equation (19). By learning across multiple tasks, the model acquires initialization parameters suitable for all tasks, enabling it to “learn to learn”. This ability allows the model to converge rapidly and adjust its parameters when encountering new tasks, achieving fast learning.

Step 3. Fine-Tuning Process: After MAML training, the model is equipped with a set of initialization parameters. When faced with a new task, these parameters can be further adjusted by fine-tuning, which converges faster and requires fewer samples. First, a new task dataset $T_{\mathrm{new}} = \{(F_{p}, F)_{1}, (F_{p}, F)_{2}, \dots, (F_{p}, F)_{n}\}$ is constructed, and its support set data $Ψ_{\mathrm{new}}^{\mathrm{sup}} = (F_{p}, F)_{i}$ are used to update the model's parameters. A relatively small number of samples is sufficient to obtain an optimized Meta-DnCNN model that better adapts to the data distribution of the new task.

Step 4. MAML Testing Process: Following training and fine-tuning, the DnCNN model is saved and employed for FS-GCC estimation in new SNR environments. Its performance is evaluated by comparing the results with grayscale images of FS-GCC matrices under noise-free conditions. This comparison assesses the generalization performance of the proposed algorithm in unfamiliar environments, validating the model’s adaptability and accuracy in practical applications across a range of SNR conditions.

4. Simulation Results and Discussion

This section presents the simulation results, verifies the effectiveness of the proposed method, and discusses the results.

4.1. Evaluation Metrics

To assess the effectiveness of TDE, the following metrics are used: mean absolute error (MAE), percentage of abnormal points (PAP), and FSPR.

MAE is a metric that quantifies the average absolute difference between estimated delays and actual delays. It serves as a measure of the overall accuracy of an algorithm in estimating delay values. The MAE is calculated using the following formula:

\mathrm{MAE} = \frac{1}{N_{\mathrm{total}}} \sum_{i = 1}^{N_{\mathrm{total}}} |D_{i} - D|,

(22)

where $D$ is the true value of the time delay, $D_{i}$ is the TDE of the $i$-th test, and $N_{\mathrm{total}}$ is the total number of estimated samples.

PAP is the ratio of abnormal estimates to the total number of estimates. This metric helps to detect significant discrepancies or errors in the TDE process. It is computed as

\mathrm{PAP} = \frac{N_{n}}{N_{\mathrm{total}}}

(23)

where $N_{n}$ is the number of abnormal estimates. A TDE error exceeding 5 sampling points is considered an abnormal estimation result.

FSPR is defined as the mean ratio, in decibels, of the highest GCC peak to the second-highest peak:

\mathrm{FSPR} = \frac{1}{N_{\mathrm{total}}} \sum_{i = 1}^{N_{\mathrm{total}}} 20 \log_{10} \left( \frac{p_{1}^{i}}{p_{2}^{i}} \right)

(24)

where $p_{1}^{i}$ and $p_{2}^{i}$ represent the highest and second-highest peak values of the $i$-th estimate, respectively.
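Equations (22) to (24) can be sketched in NumPy as follows. The 5-sample threshold for PAP comes from the text. For FSPR, the sketch assumes that a "peak" is a strict local maximum of a correlation function with positive peak values (for example, magnitudes), and that each function has at least two such peaks; the helper names are illustrative.

```python
import numpy as np

def mae(estimates, true_delay):
    """Eq. (22): mean absolute error, in samples."""
    return np.mean(np.abs(np.asarray(estimates) - true_delay))

def pap(estimates, true_delay, threshold=5):
    """Eq. (23): fraction of estimates whose error exceeds `threshold` samples."""
    return np.mean(np.abs(np.asarray(estimates) - true_delay) > threshold)

def top_two_peaks(r):
    """Highest and second-highest strict local maxima of r (assumption)."""
    r = np.asarray(r, dtype=float)
    is_peak = (r[1:-1] > r[:-2]) & (r[1:-1] > r[2:])
    peaks = np.sort(r[1:-1][is_peak])[::-1]
    return peaks[0], peaks[1]

def fspr(correlations):
    """Eq. (24): mean first-to-second peak ratio in dB (positive peaks assumed)."""
    ratios = [20 * np.log10(p1 / p2) for p1, p2 in map(top_two_peaks, correlations)]
    return np.mean(ratios)
```

For estimates of 10, 12, 8, and 30 samples against a true delay of 10, the MAE is 6 samples and the PAP is 0.25, since only the last estimate exceeds the 5-sample threshold.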

4.2. Simulation Settings

We consider an LFM signal with a center frequency of 11 kHz, a sampling rate of 50 kHz, $B = 8$ kHz, and $T = 0.01$ s for TDE, as shown in Figure 1. The underwater channels are extracted from the NOF and NCS channels of the Watermark underwater channel dataset [26]. The extracted NOF and NCS channel impulse responses used in this experiment are shown in Figure 7a,b.

We define a frequency window group Φ_{group} with

B_{group} = \{32, 64, 128, 256\}, \quad M_{group} = \{B/8 : B \in B_{group}\} = \{4, 8, 16, 32\},

where B and M are the coefficients of the spectral window function. White Gaussian noise is added to the received signal, with the SNR ranging from −20 dB to 10 dB in steps of 5 dB. At each SNR level, 200 Monte Carlo simulations are conducted. The parameters of the Meta-DnCNN network are listed in Table 1. The SNRs of the training tasks are also spaced 5 dB apart.
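To make the roles of B and M concrete, the sketch below builds one sub-band GCC-PHAT row per frequency window. It is a simplified illustration and not necessarily the exact FS-GCC definition of [18] or of this paper. It makes these assumptions:

- B is the window width and M the hop between successive windows, both in FFT bins;
- each window is a Hann taper over the PHAT-weighted cross-power spectrum;
- only positive-frequency bins are used.

The name `fs_gcc_matrix` is hypothetical.

```python
import numpy as np


def fs_gcc_matrix(x1, x2, B, M):
    """Sub-band GCC-PHAT magnitudes, one row per frequency window.

    Assumptions: B = window width and M = hop, both in FFT bins; Hann taper;
    positive-frequency bins only. Column n corresponds to a circular lag of n samples.
    """
    N = len(x1)
    G = np.fft.fft(x2) * np.conj(np.fft.fft(x1))   # cross-power spectrum
    G = G / np.maximum(np.abs(G), 1e-12)           # PHAT weighting keeps only the phase
    w = np.hanning(B)
    rows = []
    for start in range(0, N // 2 - B + 1, M):       # slide the window along frequency
        band = np.zeros(N, dtype=complex)
        band[start:start + B] = w * G[start:start + B]
        rows.append(np.abs(np.fft.ifft(band)))      # lag-domain response of this sub-band
    return np.array(rows)


# One matrix per window of the group, with M = B / 8
rng = np.random.default_rng(0)
x1 = rng.standard_normal(1024)
x2 = np.roll(x1, 37)                                # circular delay of 37 samples
mats = {B: fs_gcc_matrix(x1, x2, B, B // 8) for B in (32, 64, 128, 256)}
```

In this noise-free toy case, every row of every matrix peaks at lag 37. Wider windows give fewer rows (9 rows for B = 256 versus 121 for B = 32), which is one reason the matrices of a window group differ in size.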

4.3. TDE Results and Analysis

Figure 8 shows a single prediction result of the proposed method for the FS-GCC matrix under the NOF channel at an SNR of −10 dB. Figure 8a depicts the original (noisy) FS-GCC matrix, Figure 8b the FS-GCC matrix under noise-free conditions, and Figure 8c the predicted FS-GCC matrix. The matrix in Figure 8c is restored from the grayscale image predicted by the Meta-DnCNN network. The comparison shows that Meta-DnCNN effectively recovers the time delay characteristics of the noise-free FS-GCC matrix.
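This section does not spell out how an FS-GCC matrix is converted to the grayscale image processed by the network. One plausible choice, assumed here, is min-max scaling to 8-bit intensities. The scaling parameters are kept so that a predicted image can be mapped back to matrix values. The helpers `to_grayscale` and `from_grayscale` are hypothetical.

```python
import numpy as np


def to_grayscale(R):
    """Min-max scale a real matrix to uint8 [0, 255]; also return the parameters to undo it."""
    lo, hi = float(R.min()), float(R.max())
    scale = hi - lo if hi > lo else 1.0              # avoid division by zero for flat input
    img = np.round(255.0 * (R - lo) / scale).astype(np.uint8)
    return img, (lo, scale)


def from_grayscale(img, params):
    """Map an 8-bit image (e.g. a network prediction) back to the original value range."""
    lo, scale = params
    return img.astype(np.float64) / 255.0 * scale + lo


R = np.abs(np.random.default_rng(0).standard_normal((57, 1024)))
img, params = to_grayscale(R)
R_restored = from_grayscale(img, params)
```

The round trip is exact up to quantization, at most half an intensity step (scale/510) per entry. A denoised image predicted by the network would be passed through `from_grayscale` in the same way.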

Figure 9a–d show the peaks of single TDE results for GCC_PHAT, SVD_FS-GCC [18], WSVD_FS-GCC [18], and Meta-DnCNN_FS-GCC, respectively. The red line marks the sampling point of the correct time delay. The peak of the traditional GCC_PHAT method is not prominent in the low-SNR underwater channel environment, which tends to cause a high anomaly rate. Both SVD_FS-GCC and WSVD_FS-GCC effectively mitigate the impact of clutter peaks. The proposed method enhances the true peak further and is more stable during prediction.

Figures 10 and 11 compare the TDE performance of the Meta-DnCNN_FS-GCC method with single-sub-window and multi-sub-window reconstruction. Both figures use white Gaussian noise at different SNRs; Figure 10 covers the NOF channel and Figure 11 the NCS channel. The single sub-window is the frequency window with B = 256 and M = 32. The multi-sub-window variant outperforms the single-sub-window variant on all performance indicators. We attribute this to the multi-sub-window reconstruction, which enhances the features of the FS-GCC matrix and yields a more pronounced cross-correlation peak. This improves both the accuracy and the stability of the estimate. The trend holds in both channel environments. Performance metrics are slightly worse for the NCS channel, which is more complex than the NOF channel.
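This section does not give the exact rule for combining sub-window matrices of different sizes; Figure 4 shows sub-band matrices being joined and then reconstructed. The Python sketch below is one hypothetical way to do this, assumed purely for illustration:

1. Stack the sub-band rows of all windows into a joint matrix.
2. Resample each lag column to a fixed number of rows by linear interpolation.
3. Normalize the result.

Each resampled row is a convex combination of two original rows. So a lag that peaks in every sub-band still peaks after reconstruction.

```python
import numpy as np


def reconstruct_multi_window(mats, n_rows=64):
    """Hypothetical multi-sub-window reconstruction.

    mats: list of FS-GCC magnitude matrices (rows = sub-bands, columns = lags),
    possibly with different row counts but the same number of lag columns.
    """
    joint = np.vstack(mats)                            # joined matrix: all sub-band rows
    src = np.linspace(0.0, 1.0, joint.shape[0])
    dst = np.linspace(0.0, 1.0, n_rows)
    out = np.empty((n_rows, joint.shape[1]))
    for j in range(joint.shape[1]):                    # resample every lag column
        out[:, j] = np.interp(dst, src, joint[:, j])
    return out / out.max()                             # normalize to a peak value of 1


# Toy sub-band matrices with 5, 9 and 17 rows, all peaking at lag 30
profile = np.full(100, 0.1)
profile[30] = 1.1
mats = [np.outer(np.linspace(0.5, 1.0, k), profile) for k in (5, 9, 17)]
rec = reconstruct_multi_window(mats, n_rows=32)
```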

Figures 12 and 13 present the MAE, FSPR, and PAP for the NOF and NCS channels, respectively. Five methods are compared under white Gaussian noise at different SNRs: GCC_PHAT, SVD_FS-GCC [18], WSVD_FS-GCC [18], U-Net_FS-GCC [20], and Meta-DnCNN_FS-GCC. The U-Net network of [20] was also trained with MAML.

As shown in Figures 12a and 13a, the MAE decreases as the SNR increases. At low SNR, the deep learning-based methods achieve a markedly lower MAE than the conventional methods. Meta-DnCNN_FS-GCC further improves accuracy relative to U-Net_FS-GCC.

Figures 12b and 13b show how the FSPR varies with SNR; a higher FSPR indicates a more pronounced peak. The traditional GCC method exhibits weak peaks at low SNR, whereas the SVD-based and deep learning-based methods produce more prominent peaks. Meta-DnCNN_FS-GCC accentuates the peak further by enhancing the FS-GCC matrix features.

Figures 12c and 13c show the variation of PAP with SNR. As the SNR decreases, the probability of anomalous TDE increases. Meta-DnCNN_FS-GCC nevertheless remains stable and effectively suppresses such abnormal estimates.

5. Discussion

This paper analyses the performance of the Meta-DnCNN_FS-GCC method through a series of experiments. The results show that the method offers clear advantages in complex and variable underwater channel environments, especially at low SNR. It achieves a lower MAE, a higher FSPR, and an effectively controlled PAP.

The peak of the traditional GCC_PHAT method is not prominent. This agrees with the existing literature, which finds that GCC_PHAT has limited ability to capture signal features in complex channel environments. Because it relies on the cross-correlation principle, GCC_PHAT performs well under ideal conditions but is severely degraded by multipath propagation and noise interference underwater. As noted in [17], when many reflections and strong noise are present in the channel, GCC_PHAT struggles to distinguish the true time delay peak from false peaks induced by interference. This leads to a high anomaly rate and large estimation errors.

In contrast, the SVD_FS-GCC and WSVD_FS-GCC methods apply singular-value decomposition to the FS-GCC matrix and suppress clutter peaks more effectively. They decompose the matrix into a signal subspace, which contains the primary signal features, and a noise subspace. Removing the noise components makes the time delay peak more distinct. This agrees with the noise-reduction and feature-enhancement properties of singular-value decomposition reported in [18]. However, the SVD methods remain susceptible to spurious peaks, so their TDE accuracy and stability in underwater channels are still not ideal.
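The subspace idea behind SVD_FS-GCC can be illustrated with a generic rank-1 approximation. If every sub-band row of the FS-GCC matrix shares the same delay peak, that common lag profile dominates the first right singular vector. Independent noise spreads across the remaining singular components. The Python sketch below assumes a real-valued magnitude matrix and uses NumPy's `np.linalg.svd`. It is a simplified illustration, not necessarily the exact SVD_FS-GCC or WSVD_FS-GCC estimator of [18]. The name `svd_enhance` is hypothetical.

```python
import numpy as np


def svd_enhance(R):
    """Rank-1 enhancement: keep the dominant singular triplet of the sub-band x lag matrix."""
    _, s, vt = np.linalg.svd(R, full_matrices=False)
    return s[0] * np.abs(vt[0])          # abs() removes the arbitrary sign of the singular vector


# Toy matrix: 30 sub-bands sharing a peak at lag 50, with different gains, plus noise
rng = np.random.default_rng(0)
clean = np.zeros(200)
clean[50] = 1.0
R = np.outer(rng.uniform(0.5, 1.5, 30), clean) + 0.1 * rng.standard_normal((30, 200))
profile = svd_enhance(R)
```

The delay estimate is the index of the largest value of `profile` (lag 50 in this toy case). A spurious peak that is consistent across sub-bands also lies in the dominant subspace, so this projection cannot remove it.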

At low SNR, some frequency bands of the signal can be severely corrupted by noise, which makes it difficult for traditional methods to extract effective time delay information. Multi-sub-window reconstruction captures time delay features from different frequency bands, with each sub-window acting as an analysis window over a specific frequency range. Even if some bands are heavily corrupted, other bands may still carry relatively clear time delay features. This avoids an overall estimation failure caused by losing part of the band information. As Figure 4 shows, extracting features jointly from multiple frequency bands enriches the information in the FS-GCC matrix and strengthens its key feature vectors. Our experiments confirm that this approach supports robust TDE even at low SNR. It provides a stronger basis for subsequent accurate estimation and reduces the probability of abnormal estimates. This agrees with the advantages of multi-band processing for capturing key time delay features reported in [19].

Building on this, the proposed Meta-DnCNN_FS-GCC method sharpens the peak further. To exploit the denoising and feature-extraction capabilities of DnCNN, the method converts the FS-GCC matrix into a grayscale image. A multi-layer convolutional network then learns the noise features automatically. This extracts useful time delay information from noisy signals more accurately and restores the FS-GCC matrix to a state closer to the noise-free one. As a result, time delay estimation remains stable even at low SNR, with fewer prediction anomalies. In addition, the MAML framework lets the model perform well with only 150 training samples per task. For a new task at an SNR of −20 dB, the model quickly adjusts its parameters using the feature-extraction and noise-suppression strategies learned under the previous SNR conditions. This allows accurate TDE despite limited data. This knowledge sharing significantly improves the model's practicality and robustness in dynamic underwater environments.
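The fast-adaptation mechanism can be illustrated with first-order MAML (FOMAML), a common approximation of MAML that drops the second-order term of the meta-gradient. The toy Python sketch below meta-trains a single scalar parameter over six regression tasks, loosely mirroring the six training tasks of Table 1. It then adapts the parameter to an unseen task, which plays the role of the −20 dB case. The tasks, learning rates, and the names `loss_and_grad`, `adapt`, and `fomaml` are hypothetical. The paper trains a DnCNN with Adam and learning rates of 0.001 (Table 1), not this scalar model.

```python
import numpy as np


def loss_and_grad(theta, x, y):
    """MSE of the linear model y_hat = theta * x, and its derivative w.r.t. theta."""
    err = theta * x - y
    return float(np.mean(err ** 2)), float(2.0 * np.mean(err * x))


def adapt(theta, x, y, inner_lr=0.1, steps=1):
    """Inner loop: a few plain gradient steps on one task's support data."""
    for _ in range(steps):
        _, g = loss_and_grad(theta, x, y)
        theta -= inner_lr * g
    return theta


def fomaml(task_slopes, meta_lr=0.05, inner_lr=0.1, epochs=200, seed=0):
    """Outer loop of first-order MAML: the d(phi)/d(theta) term is ignored."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    for _ in range(epochs):
        meta_grad = 0.0
        for slope in task_slopes:
            x = rng.uniform(-1.0, 1.0, 20)
            y = slope * x
            phi = adapt(theta, x[:10], y[:10], inner_lr)   # adapt on the support set
            _, g_query = loss_and_grad(phi, x[10:], y[10:])  # evaluate on the query set
            meta_grad += g_query
        theta -= meta_lr * meta_grad / len(task_slopes)
    return theta


theta = fomaml([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
rng = np.random.default_rng(1)
x_new = rng.uniform(-1.0, 1.0, 10)
y_new = 7.0 * x_new                                          # unseen task
loss_before, _ = loss_and_grad(theta, x_new, y_new)
loss_after, _ = loss_and_grad(adapt(theta, x_new, y_new, steps=5), x_new, y_new)
```

The meta-learned initialization settles near the average task slope. A few inner-loop steps on 10 samples of the new task then lower its loss. Full (second-order) MAML would additionally differentiate through `adapt`.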

However, the method still leaves room for improvement. Meta-DnCNN_FS-GCC performs well under the current experimental settings, but real underwater environments may be more complex than the simulated scenarios. Future research could investigate how to improve TDE under more extreme conditions, such as strong currents and severe multipath effects caused by complex geological structures. This would help meet the evolving needs of underwater engineering and scientific research.

6. Conclusions

This paper addresses the accuracy and stability challenges of time delay estimation at low SNR in underwater signal processing. Existing cross-correlation-based and deep learning-based methods fail to meet practical requirements in this setting. The proposed solution combines multi-sub-window reconstruction of the FS-GCC matrix, multi-layer convolutional denoising with DnCNN, and the MAML framework:

- Multi-sub-window reconstruction captures time delay characteristics from different frequency bands and enhances their extraction.
- DnCNN extracts multi-level noise features for noise suppression.
- MAML lets the model adapt quickly to new environments across various SNR conditions and reach the desired accuracy even with few underwater training samples.

Together, these components improve the accuracy and stability of underwater time delay estimation, as well as the model's practicality and robustness in complex underwater environments. We conducted simulation experiments using measured underwater acoustic channels (the Watermark NOF and NCS channels). Analyzing the results, we examined the limitations of existing methods and the advantages of the proposed method for underwater time delay estimation. Compared with existing methods under the same conditions, the proposed method reduces the MAE, increases the FSPR, and effectively controls the PAP, yielding lower estimation errors and higher stability. This work supports the efficient and stable operation of underwater sensor networks. It also suggests new ways of applying time delay estimation methods developed for speech signals to the underwater environment.

Author Contributions

Conceptualization, M.J. and X.C.; methodology, M.J.; validation, M.J., X.C., and J.L.; formal analysis, M.J. and J.L.; investigation, M.J. and L.L.; data management, M.J. and B.J.; writing (original draft preparation), M.J.; writing (review and editing), X.C.; visualization, M.J.; supervision, J.L. and B.J.; project management, L.L. and B.J.; funding acquisition, X.C. and J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 52171341 and 62471494).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Luo, J.; Yang, Y.; Wang, Z.; Chen, Y. Localization algorithm for underwater sensor network: A review. IEEE Internet Things J. 2021, 8, 13126–13144.
2. Li, S.; Qu, W.; Liu, C.; Qiu, T.; Zhao, Z. Survey on high reliability wireless communication for underwater sensor networks. J. Netw. Comput. Appl. 2019, 148, 102446.
3. Fan, H.; Nie, W.; Yao, S.; An, L.; Yu, F.; Zhang, Y.; Wu, Q. A high-order time-delay difference estimation method for signal enhancement in the distorted towed hydrophone array. J. Acoust. Soc. Am. 2024, 156, 1996–2008.
4. Wang, X.; Xu, B.; Guo, Y. Minimum error entropy robust delay filter for multi-auv cooperative localization. IEEE/ASME Trans. Mechatronics 2024, 30, 1567–1577.
5. Xia, Z.; Li, X.; Meng, X. High resolution time-delay estimation of underwater target geometric scattering. Appl. Acoust. 2016, 114, 111–117.
6. Lowes, G.J.; Neasham, J. PADAL—Passive acoustic detection and localisation: Low energy underwater wireless vessel tracking network. Comput. Netw. 2024, 241, 110216.
7. Jo, M.J.; Choi, J.W.; Han, D.G. Estimation of Source Range and Location Using Ship-Radiated Noise Measured by Two Vertical Line Arrays with a Feed-Forward Neural Network. J. Mar. Sci. Eng. 2024, 12, 1665.
8. Hu, X.; Zhang, L.; Hu, B.; Wang, J.; Guo, L.; Zhang, H. Position estimation of acoustic elements based on improved delay estimation algorithm. Appl. Acoust. 2025, 228, 110286.
9. Jang, J.; Meyer, F.; Snyder, E.R.; Wiggins, S.M.; Baumann-Pickering, S.; Hildebrand, J.A. Bayesian detection and tracking of odontocetes in 3-D from their echolocation clicks. J. Acoust. Soc. Am. 2023, 153, 2690.
10. Jia, L.; Zhang, G.; Liu, Y.; Bai, Z.; Geng, Y.; Wu, Y.; Zhang, J.; Zhang, W. Sonar buoy active detection and localization for underwater targets using high-level sound sources and MEMS hydrophone. Measurement 2025, 241, 115740.
11. Pang, X.; Jiang, F. Generalized quadratic correlation delay estimation algorithm based on Phase Transform weighting function. In Proceedings of the 2023 7th International Conference on Electronic Information Technology and Computer Engineering, Xiamen, China, 20–22 October 2023; pp. 1069–1074.
12. Ferguson, E.L. Multitask convolutional neural network for acoustic localization of a transiting broadband source using a hydrophone array. J. Acoust. Soc. Am. 2021, 150, 248–256.
13. Whitaker, S.; Barnard, A.; Anderson, G.D.; Havens, T.C. Through-ice acoustic source tracking using vision transformers with ordinal classification. Sensors 2022, 22, 4703.
14. Yao, S.; Meng, Q.; Chen, C.; Tariq, I.; Zhou, C.; Liu, W. High-precision time delay estimation of narrowband radio signal by PHAT-LSTM. Meas. Sci. Technol. 2021, 32, 075001.
15. Salvati, D.; Drioli, C.; Foresti, G.L. Time delay estimation for speaker localization using CNN-based parametrized GCC-PHAT features. In Proceedings of the Interspeech, Brno, Czechia, 30 August–3 September 2021; pp. 1479–1483.
16. Comanducci, L.; Borra, F.; Bestagini, P.; Antonacci, F.; Tubaro, S.; Sarti, A. Source localization using distributed microphones in reverberant environments based on deep learning and ray space transform. IEEE/ACM Trans. Audio Speech Lang. Process. 2020, 28, 2238–2251.
17. Wang, S.; Zhou, Y.; Yang, X.; Liu, H. A robust blind source separation algorithm based on non-negative matrix factorization and frequency-sliding generalized cross-correlation. In Proceedings of the 2021 IEEE Statistical Signal Processing Workshop (SSP), Rio de Janeiro, Brazil, 11–14 July 2021; pp. 231–235.
18. Cobos, M.; Antonacci, F.; Comanducci, L.; Sarti, A. Frequency-sliding generalized cross-correlation: A sub-band time delay estimation approach. IEEE/ACM Trans. Audio Speech Lang. Process. 2020, 28, 1270–1281.
19. Song, Q.; Ou, Z. Modified frequency-sliding generalized cross correlation for time delay difference estimation of microphone array. IEEE Sensors J. 2023, 23, 31038–31049.
20. Comanducci, L.; Cobos, M.; Antonacci, F.; Sarti, A. Time difference of arrival estimation from frequency-sliding generalized cross-correlations using convolutional neural networks. In Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 4945–4949.
21. Yang, N.; Zhang, B.; Ding, G.; Wei, Y.; Wei, G.; Wang, J.; Guo, D. Specific emitter identification with limited samples: A model-agnostic meta-learning approach. IEEE Commun. Lett. 2021, 26, 345–349.
22. Lin, W.; Mak, M.W. Model-agnostic meta-learning for fast text-dependent speaker embedding adaptation. IEEE/ACM Trans. Audio Speech Lang. Process. 2023, 31, 1866–1876.
23. Yang, F.; Liu, J.; Hua, C.; Liu, W.; Dong, D. Early fault diagnosis strategy for high-speed train suspension systems based on model-agnostic meta-learning. Veh. Syst. Dyn. 2024, 62, 2510–2532.
24. Wu, J.; Li, M.; Fang, X.; Ramaccia, D.; Toscano, A.; Bilotti, F.; Ding, D. Anti-interference DoA estimation for LFM radar signals using space-time-modulated metasurfaces. IEEE Trans. Microw. Theory Tech. 2024, 73, 1460–1472.
25. Mehdizadeh, M.; MacNish, C.; Xiao, D.; Alonso-Caneiro, D.; Kugelman, J.; Bennamoun, M. Deep feature loss to denoise OCT images using deep neural networks. J. Biomed. Opt. 2021, 26, 046003.
26. Van Walree, P.A.; Socheleau, F.X.; Otnes, R.; Jenserud, T. The watermark benchmark for underwater acoustic modulation schemes. IEEE J. Ocean. Eng. 2017, 42, 1007–1018.

Figure 1. Time-domain and frequency-domain of LFM signal: (a) Time-domain. (b) Frequency-domain.

Figure 2. Examples of FS-GCC at different SNRs: (a) SNR = −10 dB. (b) SNR = 0 dB. (c) SNR = 10 dB.

Figure 3. Explanation of different spectral windows in FS-GCC: (a) Spectral window 1. (b) Spectral window 2.

Figure 4. Reconstruction process of the multi-window FS-GCC matrix at SNR = −10 dB: (a) Sub-band matrices. (b) Joined matrix. (c) Reconstructed matrix.

Figure 5. The parameter update mechanism of MAML.

Figure 6. The Meta-DnCNN model.

Figure 7. Watermark channel impulse response: (a) NOF channel. (b) NCS channel.

Figure 8. Single prediction results: (a) Original FS-GCC matrix. (b) FS-GCC matrix under noise-free conditions. (c) Predicted FS-GCC matrix.

Figure 9. Single estimated peak comparison: (a) GCC_PHAT. (b) SVD_FS-GCC. (c) WSVD_FS-GCC. (d) Meta-DnCNN_FS-GCC.

Figure 10. TDE performance for multi-window and single-window Meta-DnCNN_FS-GCC under NOF channel: (a) MAE. (b) FSPR. (c) PAP.

Figure 11. TDE performance for multi-window and single-window Meta-DnCNN_FS-GCC under NCS channel: (a) MAE. (b) FSPR. (c) PAP.

Figure 12. TDE performance for methods under NOF channel: (a) MAE. (b) FSPR. (c) PAP.

Figure 13. TDE performance for methods under NCS channel: (a) MAE. (b) FSPR. (c) PAP.

Table 1. Network simulation parameters of Meta-DnCNN.

Parameter	Value
Number of tasks	6
SNR of training tasks	−15 to 10 dB (5 dB step)
Training samples per task	150
Test samples per task	20
Outer-loop learning rate	0.001
Inner-loop learning rate	0.001
Epochs	200
Optimizer	Adam


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
