Small Target Detection in Forward-Looking Sonar Images via LoG5S-LAD Framework

Wei, Yuhang; Wang, Jian; Wen, Jiani; Zhang, Zengming; Li, Haisen

doi:10.3390/rs18101518

Open Access Article


by Yuhang Wei ^1,2,3, Jian Wang ^4, Jiani Wen ^1,2,3, Zengming Zhang ^1,2,3 and Haisen Li ^1,2,3,4,5,*

^1 National Key Laboratory of Underwater Acoustic Technology, Harbin Engineering University, Harbin 150001, China
^2 Key Laboratory of Marine Information Acquisition and Security, Ministry of Industry and Information Technology, Harbin Engineering University, Harbin 150001, China
^3 College of Underwater Acoustic Engineering, Harbin Engineering University, Harbin 150001, China
^4 Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Chengdu 610054, China
^5 Nanhai Institute of Harbin Engineering University, Sanya 572024, China
^* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(10), 1518; https://doi.org/10.3390/rs18101518

Submission received: 26 March 2026 / Revised: 29 April 2026 / Accepted: 8 May 2026 / Published: 12 May 2026

(This article belongs to the Special Issue Satellite Remote Sensing for Ocean and Coastal Environment Monitoring (Second Edition))


Highlights

What are the main findings?

A physically adaptable G5S model is proposed to accurately fit the anisotropic energy of sonar echoes.
The LoG5S-LAD algorithm achieves superior background suppression and mitigates the trade-off between detection rate and false alarms.

What are the implications of the main findings?

The physical generality of the G5S model enables accurate characterization of diverse target energy distributions.
Validation on real sea-trial data confirms that the LoG5S-LAD algorithm enables efficient detection of weak targets.

Abstract

In maritime search and rescue and underwater surveillance missions employing forward-looking sonar, strong reverberation and complex underwater environments often substantially degrade the target signal-to-clutter ratio (SCR), presenting significant challenges for target detection. Existing algorithms typically simplify the point spread function (PSF) into an ideal isotropic model, thereby overlooking the inherent anisotropy induced by its sidelobe structures. This physical model mismatch leads to target energy leakage and severely limits detection performance in complex backgrounds. To overcome the limitations of current target models and detection algorithms, this paper introduces a Gaussian 5 Superposition (G5S) model to accurately characterize the physical features of the PSF and proposes a Laplacian-of-G5S-based Local Adaptive Detection (LoG5S-LAD) method through the construction of a LoG5S filtering operator. Initially, a high-SCR target likelihood map is generated using Hessian-matrix-based geometric gating and LoG5S matched filtering techniques. Subsequently, robust background suppression and the effective preservation of faint targets are achieved through morphological artifact suppression, connected component screening, and a high-energy exemption mechanism. The effectiveness of the proposed framework is validated through model fitting experiments, as well as comprehensive simulations and detection tests across various sonar configurations. Experimental results indicate that the G5S model demonstrates precise fitting capabilities and strong physical adaptability. Furthermore, the proposed LoG5S-LAD algorithm significantly enhances the SCR while maintaining robust detection performance for faint and small-scale targets.

Keywords:

forward-looking sonar; target detection; model fitting; point spread function

1. Introduction

Imaging sonar serves as an essential technology in contemporary ocean engineering, with widespread applications in underwater engineering, target detection, maritime search and rescue, and marine scientific resource exploration [1,2,3]. However, current target detection methods utilized for imaging sonar primarily rely on optical image processing algorithms, which are ill-suited for the challenging conditions of underwater acoustic imaging [4,5]. Additionally, complex marine environmental noise, reverberation interference, and the physical resolution limits associated with long-range detection often result in sonar images characterized by minuscule target dimensions, a lack of textural detail, and a high vulnerability to background clutter [6,7,8]. As a result, the investigation of effective methods for faint and small target detection in sonar imagery has emerged as a significant research focus within the domain of sonar image processing.

Existing algorithms for target detection in single-frame sonar images primarily focus on background suppression by leveraging the local saliency features of targets. These approaches can be broadly classified into two categories: detection methods based on local statistical differences and filtering-based enhancement techniques utilizing Gaussian-like kernels.

Methods based on local statistical differences focus on exploiting the statistical feature differences between the target and the background in terms of gray level, contrast, or gradient. Classical background-suppression algorithms in this category include Max-Mean [9], Max-Median [9], and Morphology Opening [10]. These methods feature high computational efficiency, but they often yield high false alarm rates when background estimation is inaccurate. The Top-Hat algorithm [11] enhances bright targets smaller than the structuring element by subtracting a morphological-opening-based background estimate from the original image; however, it remains relatively sensitive to noise. To further improve the signal-to-clutter ratio (SCR), approaches based on local contrast have been proposed. The Local Contrast Measure (LCM) [12] effectively boosts high-intensity regions but simultaneously amplifies single-pixel salt-and-pepper noise. To address this issue, Human Visual System (HVS)-inspired refinements have been developed, including the Improved Local Contrast Measure (ILCM) [13] and the New Local Contrast Measure (NLCM) [14]. The Multi-Scale Patch-based Contrast Measure (MPCM) proposed in [15] accommodates targets of varying sizes by using multi-scale image patches, but it imposes a substantial computational burden. In contrast, the Directional Gradient (DGRAD) method [16] enhances candidate targets by extracting multi-directional gradient information from the target neighborhood; this facilitates the separation of small targets from structural background clutter and achieves robust detection in complex, low-SCR scenarios. The Average Absolute Gray Difference (AAGD) algorithm [17] reduces background noise and highlights targets through local averaging and subtraction; however, its use of a single fixed window size limits its adaptability to targets of varying dimensions. 
The Multi-Scale AAGD (MS-AAGD) [18] addresses this limitation by computing average absolute gray differences across multiple window scales, enabling the robust detection of faint and small-scale targets of different sizes. However, these approaches rely primarily on pixel-level statistical saliency and often ignore the morphological characteristics of sonar targets as physical objects within the imaging system; this omission commonly elevates false alarm rates in complex maritime environments. While incorporating micro-Doppler signatures can effectively mitigate these false alarms [19], such techniques inherently necessitate multi-frame coherent processing, rendering them inapplicable to strictly single-frame sonar applications.
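As a point of reference for the local-statistics baselines reviewed above, the Top-Hat idea can be sketched in a few lines. This is a minimal illustration that assumes a grayscale image stored as a NumPy array and uses SciPy's `scipy.ndimage.white_tophat`. The helper name `top_hat_enhance` and the 9 × 9 flat structuring element are illustrative choices, not values taken from [11].

```python
import numpy as np
from scipy import ndimage


def top_hat_enhance(image: np.ndarray, size: int = 9) -> np.ndarray:
    """White Top-Hat: the image minus its morphological opening.

    The opening (erosion then dilation with a size x size flat element)
    removes bright structures smaller than the element, so subtracting it
    keeps only those small bright structures.
    """
    return ndimage.white_tophat(image.astype(float), size=size)


# Synthetic scene: slowly varying bright clutter plus one small 3 x 3 target.
yy, xx = np.mgrid[0:64, 0:64]
background = 0.5 + 0.3 * np.sin(xx / 20.0)
image = background.copy()
image[30:33, 40:43] += 1.0
enhanced = top_hat_enhance(image, size=9)
```

Because the background varies slowly relative to the structuring element, it is largely removed, while the 3 × 3 target survives almost unchanged. This also shows why the method is noise-sensitive: any isolated bright pixel survives in exactly the same way.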

Filtering methods that employ Gaussian-like kernels implement the matched-filtering principle, which maximizes the SCR by shaping filter kernels to resemble the target signature. The Laplacian of Gaussian (LoG) operator [20] assumes that targets have an isotropic Gaussian profile and discriminates them from background clutter by selecting an optimal scale. In [21], this idea was extended by applying second-order differential operators to obtain scale-adaptive enhancement for point-like targets. The Difference of Gaussian (DoG) algorithm, proposed in [22,23], serves as an efficient approximation of the LoG operator for detecting faint and small-scale targets. To mitigate the spurious edge responses produced by DoG filters, the Difference of Gabor (DoGb) and Improved Difference of Gabor (IDoGb) algorithms were introduced in [24].
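The DoG-LoG relationship can be checked numerically. The identity $\partial G / \partial \sigma = \sigma \nabla^{2} G$ gives $G(k\sigma) - G(\sigma) \approx (k - 1)\sigma^{2} \nabla^{2} G$, so a DoG divided by $(k - 1)$ approximates the scale-normalized LoG. The sketch below assumes SciPy's `gaussian_laplace` and `gaussian_filter`. The blob width, `sigma = 2.0`, and the ratio `k = 1.6` are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage


def log_response(image, sigma):
    """Scale-normalized Laplacian of Gaussian: sigma^2 * (LoG * image)."""
    return sigma**2 * ndimage.gaussian_laplace(image, sigma)


def dog_response(image, sigma, k=1.6):
    """Difference of Gaussians scaled to approximate the normalized LoG:
    (G(k*sigma) - G(sigma)) * image / (k - 1)."""
    wide = ndimage.gaussian_filter(image, k * sigma)
    narrow = ndimage.gaussian_filter(image, sigma)
    return (wide - narrow) / (k - 1.0)


# Synthetic isotropic Gaussian blob: the target model both filters assume.
yy, xx = np.mgrid[-32:32, -32:32].astype(float)
image = np.exp(-(xx**2 + yy**2) / (2 * 2.0**2))

log_map = log_response(image, sigma=2.0)
dog_map = dog_response(image, sigma=2.0, k=1.6)

# Normalized correlation between the two response maps.
corr = np.corrcoef(log_map.ravel(), dog_map.ravel())[0, 1]
```

Both maps are negative at a bright blob's center and positive on the surrounding ring. For this isotropic blob, their correlation is high. Both filters share the isotropy assumption that the following paragraph questions.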

Classical filtering algorithms rely on the mathematical assumption of ideal target isotropy [25]: they treat the target's energy as a symmetric circular spot with no directional variation in the two-dimensional (2D) spatial domain. However, the energy distribution of real sonar targets contradicts this assumption. The 2D spatial resolution of a sonar system is governed by two mutually independent physical mechanisms. The range resolution depends on the bandwidth of the transmitted acoustic pulse, whereas the azimuth resolution is strictly constrained by the spatial directivity of the transducer array's physical aperture. Because these mechanisms differ, the point spread function (PSF) is inherently and severely asymmetric, so real targets exhibit pronounced directional variations, namely anisotropic structures [26]. Such a severe mismatch between the theoretical model and the actual energy distribution prevents the filter from fully capturing the mainlobe energy of the target. Substantial target energy then leaks into the sidelobes, drastically degrading target detection performance in complex backgrounds [27].

The main contributions of this paper are as follows:

(1): Grounded in the physical mechanisms of sonar imaging, this work investigates the representation of anisotropic target features. To this end, the G5S model is established to precisely characterize the authentic PSF, fundamentally resolving the inherent mismatch between traditional filtering operators and the actual target energy distribution.
(2): We derive and implement an anisotropic LoG5S operator customized for the G5S model, which precisely concentrates energy on faint and small-scale targets. Leveraging this operator, we propose the LoG5S-LAD algorithm. By integrating Hessian-based geometric gating with LoG5S matched filtering, the algorithm effectively suppresses elongated reverberation artifacts and background clutter, followed by local adaptive thresholding to achieve refined target detection.
(3): Comprehensive validation using both simulation and field experiments shows that the G5S model achieves significantly lower fitting errors in the azimuth and range dimensions than traditional models, confirming its physical generality across different sonar parameters. Comparative evaluations reveal that the proposed LoG5S-LAD method outperforms mainstream algorithms in key metrics, including the SCR and Background Suppression Factor (BSF), demonstrating its robust detection performance in complex, noisy environments.

The remainder of this paper is organized as follows: Section 2 analyzes the physical mechanisms of sonar imaging and presents the theoretical formulation of the G5S model and the LoG5S-LAD algorithm. Section 3 discusses the selection of the model order using both simulated and measured sea trial data, and comprehensively evaluates the target fitting accuracy of the model as well as the detection performance of the proposed algorithm. Finally, Section 4 provides an in-depth discussion of the work presented in this paper and concludes the study.

2. Materials and Methods

2.1. Analysis of Sonar Imaging Mechanisms and Target Spatial Distribution Characteristics

In classical signal processing theory, the matched filter has been proven to be the optimal linear filter for detecting known signals in the presence of Gaussian white noise. It works by matching the filter's transfer function $H(f)$ with the complex conjugate of the target signal's spectrum, $X^{*}(f)$, to maximize the output signal-to-noise ratio (SNR) [28]. Similarly, the problem of target detection in 2D sonar imagery can be formulated as the search for a spatial matched filter. In the imaging process, this optimal template corresponds to the system's PSF. Because of finite transmission and reception bandwidths and a limited physical array aperture, an ideal point target does not appear as a single pixel. Instead, convolution with the PSF spreads its energy into a localized blob with a characteristic spatial distribution [29]. This imaging process is illustrated in Figure 1.
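The optimality claim can be illustrated with the Cauchy-Schwarz form of the output SNR. Consider a template $h$ correlated with a known target signature $s$ in white noise of variance $\sigma_n^{2}$. The peak output SNR is $(h \cdot s)^{2} / (\sigma_n^{2} \lVert h \rVert^{2})$, which is maximized when $h \propto s$. The sketch below compares the matched template with an isotropic one. It uses an anisotropic Gaussian blob as a stand-in for the PSF; the widths and helper names are illustrative assumptions.

```python
import numpy as np


def gaussian_blob(shape, sigma_x, sigma_y):
    """Unit-peak 2D Gaussian centred in an array of the given shape."""
    rows, cols = shape
    y = np.arange(rows) - (rows - 1) / 2.0
    x = np.arange(cols) - (cols - 1) / 2.0
    xx, yy = np.meshgrid(x, y)
    return np.exp(-(xx**2 / (2 * sigma_x**2) + yy**2 / (2 * sigma_y**2)))


def output_snr(template, signal, noise_var=1.0):
    """Peak output SNR of a linear filter in white noise:
    <h, s>^2 / (noise_var * ||h||^2)."""
    return np.sum(template * signal) ** 2 / (noise_var * np.sum(template**2))


# Anisotropic "PSF": wide in azimuth (x, columns), narrow in range (y, rows).
psf = gaussian_blob((41, 41), sigma_x=6.0, sigma_y=1.5)

snr_matched = output_snr(psf, psf)                              # h = s
snr_isotropic = output_snr(gaussian_blob((41, 41), 3.0, 3.0), psf)
snr_ratio = snr_isotropic / snr_matched                         # < 1 by Cauchy-Schwarz
```

Any template that differs in shape from the target signature, such as an isotropic one applied to an anisotropic blob, gives a ratio strictly below 1. This is the SNR loss the paper attributes to model mismatch.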

To develop an optimal 2D spatial energy fitting template, it is essential to conduct a thorough analysis of the morphological characteristics of the observed energy blob and the physical mechanisms underlying its formation. Studies indicate that target spatial distributions in sonar imagery are markedly anisotropic, contrasting sharply with the near-isotropic patterns typical of infrared and visible-light images [30]. This difference arises from the distinct imaging physics and resolution properties of sonar systems.

Sonar image formation depends on two physically independent processes: time-domain sampling along the range dimension and spatial beamforming along the azimuth dimension. Since these processes are orthogonal, it is theoretically justifiable to analyze their resolution characteristics separately. In classical acoustic imaging theory, under the far-field condition, the 2D system response can be modeled as a separable function across the range and azimuth dimensions [31]. Based on this spatial separability assumption, and to mitigate the computational bottleneck of high-dimensional parameter estimation, we approximate the 2D PSF response $H(x, y)$ as the product of the azimuth response $f_{A}(x)$ and the range response $f_{R}(y)$:

$$H(x, y) \approx f_{A}(x)\, f_{R}(y) \tag{1}$$

where $f_{A}(x)$ and $f_{R}(y)$ represent the decoupled one-dimensional impulse responses along the azimuth and range directions, respectively.
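A practical consequence of Equation (1) is that filtering with a separable PSF can be done as two 1D convolutions instead of one 2D convolution. For an $M \times N$ kernel, this reduces the per-pixel cost from $O(MN)$ to $O(M + N)$. The sketch below checks this equivalence with `scipy.signal.convolve2d`. The Gaussian profiles chosen for $f_{A}$ and $f_{R}$ are illustrative assumptions.

```python
import numpy as np
from scipy.signal import convolve2d


def gaussian_1d(sigma, radius):
    """Unit-peak 1D Gaussian sampled on [-radius, radius]."""
    t = np.arange(-radius, radius + 1, dtype=float)
    return np.exp(-t**2 / (2 * sigma**2))


f_A = gaussian_1d(sigma=4.0, radius=12)   # azimuth response along x (columns)
f_R = gaussian_1d(sigma=1.0, radius=3)    # range response along y (rows)

# Separable PSF H(x, y) = f_A(x) * f_R(y); rows index y, columns index x.
H = np.outer(f_R, f_A)

rng = np.random.default_rng(0)
image = rng.random((64, 64))

# A single 2D convolution with the full kernel ...
full_2d = convolve2d(image, H, mode="full")
# ... equals a range (column) pass followed by an azimuth (row) pass.
two_pass = convolve2d(convolve2d(image, f_R[:, None], mode="full"),
                      f_A[None, :], mode="full")
```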

The morphology of the range response $f_{R}(y)$ is governed by the frequency-domain characteristics of the transmitted signal. The range resolution of a sonar system is predominantly determined by the bandwidth of the transmitted signal. For active sonar systems employing pulse compression, the theoretical range resolution $\Delta R$ is defined as the −3 dB width of the main lobe of the impulse response, also referred to as the Full Width at Half Maximum (FWHM):

$$\Delta R = \frac{c}{2B} \tag{2}$$

where $c$ is the speed of sound in water and $B$ is the signal bandwidth.

Owing to practical windowing and the Central Limit Theorem, the energy envelope of $f_{R}(y)$ can be accurately approximated by a one-dimensional Gaussian distribution $\mathcal{N}(0, \sigma_{y}^{2})$. For a Gaussian distribution, the FWHM and the standard deviation $\sigma_{y}$ satisfy a fixed relationship:

$$\mathrm{FWHM} = 2\sqrt{2\ln 2}\,\sigma_{y} \approx 2.355\,\sigma_{y} \tag{3}$$

Since $\Delta R = \mathrm{FWHM}$, the physical resolution maps onto the statistical parameter as follows:

$$\sigma_{y} = \frac{\Delta R}{2.355} = \frac{c}{4.71 B} \tag{4}$$

To fully characterize the target energy within the detection model, the effective physical scale $H_{T}$ in the range dimension is defined as the $6\sigma_{y}$ width of the Gaussian kernel, following the $3\sigma$ rule of the Gaussian distribution. This scale encompasses approximately 99.7% of the total signal energy, yielding:

$$H_{T} = 6\sigma_{y} = 6 \cdot \frac{c}{4.71 B} \approx \frac{1.27 c}{B} \tag{5}$$

The range-dimension scale $H_{T}$ is therefore inversely proportional to the signal bandwidth $B$ alone and is independent of the detection range $R$. Hence, the physical width of the target in the range dimension of the sonar image remains constant across different ranges.
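As a worked example of Equations (2) to (5), the short function below computes $\Delta R$, $\sigma_{y}$, and $H_{T}$ from the sound speed and bandwidth. The values `c = 1500` m/s and `B = 30` kHz are illustrative assumptions, not the parameters of the sonar used in this paper. The helper name `range_scale` is hypothetical.

```python
import math


def range_scale(c: float, bandwidth: float):
    """Return (delta_r, sigma_y, h_t) in metres, following Eqs. (2)-(5)."""
    delta_r = c / (2.0 * bandwidth)                      # Eq. (2): -3 dB width
    fwhm_factor = 2.0 * math.sqrt(2.0 * math.log(2.0))   # Eq. (3): ~2.355
    sigma_y = delta_r / fwhm_factor                      # Eq. (4)
    h_t = 6.0 * sigma_y                                  # Eq. (5): 6-sigma width
    return delta_r, sigma_y, h_t


delta_r, sigma_y, h_t = range_scale(c=1500.0, bandwidth=30e3)
print(f"dR = {delta_r*100:.2f} cm, sigma_y = {sigma_y*100:.2f} cm, "
      f"H_T = {h_t*100:.2f} cm (1.27c/B = {1.27*1500/30e3*100:.2f} cm)")
# dR = 2.50 cm, sigma_y = 1.06 cm, H_T = 6.37 cm (1.27c/B = 6.35 cm)
```

The small gap between 6.37 cm and 6.35 cm comes only from rounding $6/4.71 \approx 1.274$ down to 1.27 in Equation (5). No range term appears anywhere in the computation.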

Conversely, the morphology of the azimuthal response $f_{A}(x)$ is constrained by the spatial filtering capability of the array aperture. The beamforming process is mathematically equivalent to the spatial convolution of the array pattern with the target's spatial distribution function. According to the Rayleigh criterion, two adjacent point targets whose angular separation is less than the beam width $\theta_{BW}$ produce echo responses that merge into a single peak, and the system cannot resolve them spatially [32]. Hence, the beam width $\theta_{BW}$ directly defines the system's minimum resolvable unit in the azimuthal direction. For the 360° uniform circular array primarily investigated in this paper, $\theta_{BW}$ is jointly determined by the operating wavelength $\lambda$ and the effective aperture $D_{eff}$ involved in beamforming:

$$\theta_{BW} \approx \frac{\lambda}{D_{eff}} \tag{6}$$

In Cartesian-coordinate imagery, the physical width $W_{T}$ occupied by a point target in the azimuthal dimension is the product of the angular resolution and the detection range $R$:

$$W_{T} \approx R \cdot \theta_{BW} = \frac{\lambda R}{D_{eff}} \tag{7}$$

The azimuthal width $W_{T}$ is thus linearly proportional to the detection range $R$. As the target distance increases, the acoustic beam diverges angularly, which significantly broadens the target energy in the azimuthal direction.
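Equations (5) to (7) together show how the target's aspect ratio grows with range. The sketch below prints $W_{T}$, $H_{T}$, and their ratio at several ranges. It assumes a hypothetical configuration: a 150 kHz carrier, a 0.2 m effective aperture, and a 30 kHz bandwidth. The helper names are illustrative.

```python
import math


def azimuth_width(c, freq, d_eff, r):
    """W_T = R * theta_BW with theta_BW ~ lambda / D_eff (Eqs. (6)-(7))."""
    wavelength = c / freq
    theta_bw = wavelength / d_eff          # beam width in radians
    return r * theta_bw


def range_width(c, bandwidth):
    """H_T = 6 * sigma_y with sigma_y = (c / 2B) / (2 sqrt(2 ln 2)) (Eqs. (2)-(5))."""
    return 6.0 * c / (2.0 * bandwidth) / (2.0 * math.sqrt(2.0 * math.log(2.0)))


c, freq, d_eff, bandwidth = 1500.0, 150e3, 0.2, 30e3   # illustrative values
h_t = range_width(c, bandwidth)                        # the same at every range
for r in (10.0, 50.0, 100.0):
    w_t = azimuth_width(c, freq, d_eff, r)
    print(f"R = {r:5.0f} m: W_T = {w_t:.2f} m, H_T = {h_t:.3f} m, "
          f"W_T/H_T = {w_t / h_t:.1f}")
```

Under these assumptions, doubling the range doubles $W_{T}$ while $H_{T}$ stays fixed. An isotropic kernel sized for one range is therefore mismatched at every other range.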

Therefore, based on the spatiotemporal orthogonal decomposition of the sonar PSF, the range response $f_{R}(y)$ is governed by the signal bandwidth $B$ and is spatially invariant. In contrast, the azimuthal response $f_{A}(x)$, constrained by the spatial directivity of the aperture, spreads linearly with range $R$. This discrepancy in physical scales gives rise to a distinct, range-dependent anisotropic energy distribution of the target. Owing to the Central Limit Theorem and aperture weighting, Gaussian kernels are commonly employed as foundational models for energy fitting [33]. However, traditional isotropic models remain insufficient for accurately characterizing such complex dynamic features.

2.2. LoG5S-Based Local Adaptive Target Detection Algorithm

Section 2.1 demonstrates that the sonar PSF has a pronounced anisotropic structure. Reducing this structure to a single Gaussian (G1) or an ideal Bessel function therefore produces a substantial model mismatch. This mismatch not only leads to energy leakage during fine-grained target detection but also increases false-alarm rates. Our goals are to derive the optimal matched filter for point targets in sonar systems, maximize the output SCR, and accurately reconstruct the physical imaging characteristics of the targets. To this end, we first introduce a G5S model that captures both the anisotropic core geometry and the sidelobe structures. Treating this model as the matched-filter kernel, we derive its Laplacian operator and develop LoG5S-LAD. The overall architecture and components of the proposed method are summarized in Figure 2.

As illustrated in Figure 2, the proposed algorithmic framework comprises two main components: physics-driven target enhancement and adaptive thresholding processing.

Initially, the raw sonar image undergoes a physics-driven, dual-stream enhancement scheme operating in parallel. In the first stream, the Hessian matrix is employed to extract local second-order curvature information and construct a geometric gating mechanism, which effectively suppresses non-blob-like strip noise commonly observed in forward-looking sonar (FLS) images from a morphological perspective. Concurrently, the second stream applies the derived anisotropic LoG5S operator as a matched filter, enabling the accurate aggregation and enhancement of target energy. The outputs of these two streams are then fused through element-wise multiplication, resulting in effective background suppression while preserving genuine target responses.

Subsequently, a local adaptive thresholding strategy is applied to address non-stationary background conditions and SCR variations inherent in sonar imagery. To further improve detection reliability, a cascaded post-processing procedure is adopted, including artifact suppression, connected component screening, and a high-energy exemption mechanism. This combination ensures the robust preservation of genuine targets and enables accurate target detection under complex acoustic environments.
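The second stage can be outlined in code. This is a hypothetical illustration only: this section does not specify the exact rules, so every choice below is an assumption.

- **Fusion rule:** the gate multiplies the sign-flipped, clipped filter response, because a Laplacian-type kernel responds negatively at a bright blob.
- **Threshold:** a pixel becomes a candidate if it exceeds the local mean plus `k` local standard deviations.
- **Parameters:** the window size, minimum area, and exemption ratio are placeholders.
- **Omitted step:** artifact suppression is left out.

The helper names `fuse` and `local_adaptive_detect` are illustrative.

```python
import numpy as np
from scipy import ndimage


def fuse(gate, filter_response):
    """Element-wise fusion of the two streams (assumed sign convention:
    bright blobs give negative Laplacian-type responses, so flip and clip)."""
    return gate * np.clip(-filter_response, 0.0, None)


def local_adaptive_detect(fused, window=15, k=2.0, min_area=4, exempt_ratio=0.5):
    """Local threshold, connected-component screening, high-energy exemption."""
    # Local mean and standard deviation over a window x window neighbourhood.
    mean = ndimage.uniform_filter(fused, size=window)
    mean_sq = ndimage.uniform_filter(fused**2, size=window)
    std = np.sqrt(np.maximum(mean_sq - mean**2, 0.0))
    # The fused map is non-negative; the zero floor also guards against
    # tiny negative round-off in the local statistics.
    candidates = fused > np.maximum(mean + k * std, 0.0)

    # Label 8-connected candidate regions and screen them one by one.
    labels, n = ndimage.label(candidates, structure=np.ones((3, 3)))
    keep = np.zeros_like(candidates)
    exempt_level = exempt_ratio * fused.max()
    for idx in range(1, n + 1):
        region = labels == idx
        area = int(region.sum())
        peak = fused[region].max()
        # Keep regions that are large enough, or small but very strong.
        if area >= min_area or peak >= exempt_level:
            keep |= region
    return keep


fused = np.zeros((32, 32))
fused[10:15, 10:15] = 1.0      # extended target blob (area 25): kept
fused[2, 2] = 0.5              # isolated weak spike: screened out by area
fused[25, 25] = 5.0            # isolated but strong return: kept by exemption
mask = local_adaptive_detect(fused)
```

The exemption branch shows the intended effect of a high-energy exemption: a compact but very strong return survives the area screen that removes weak single-pixel spikes.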

2.2.1. Hessian-Based Shape Gating for Background Suppression

In FLS images, genuine targets often appear amid high-intensity background clutter, such that the clutter energy may match or exceed the target energy. Consequently, intensity alone is usually insufficient for reliable discrimination. Owing to the modulation of the sonar system’s PSF, true target echoes appear as Gaussian-like elliptical blobs in the image. These blobs produce strong second-order derivative responses along both orthogonal principal axes, yielding bidirectional curvature features. In contrast, structured interference manifests as elongated patterns that extend along either the azimuth or range direction. These patterns exhibit curvature only in the direction perpendicular to their extension, while curvature along the extension direction is negligible; thus, they exhibit effectively unidirectional curvature.

To capture and exploit this difference in local curvature, we employ the Hessian matrix. This square matrix of second-order partial derivatives precisely characterizes a function's local curvature. For a 2D sonar image $I(x, y)$, the Hessian matrix $H(x, y; \sigma)$ at pixel $(x, y)$ is defined as:

$$H(x, y; \sigma) = \begin{bmatrix} I(x, y) * \dfrac{\partial^{2} G}{\partial x^{2}} & I(x, y) * \dfrac{\partial^{2} G}{\partial x \partial y} \\ I(x, y) * \dfrac{\partial^{2} G}{\partial y \partial x} & I(x, y) * \dfrac{\partial^{2} G}{\partial y^{2}} \end{bmatrix} \tag{8}$$

where $*$ denotes 2D convolution and $G$ denotes the 2D isotropic Gaussian kernel used for multi-scale representation and noise suppression. The Gaussian kernel is given by:

$$G(x, y; \sigma) = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right) \tag{9}$$

where $(x, y)$ are the spatial coordinates along the azimuth and range dimensions, and $\sigma$ is the scale parameter, i.e., the standard deviation of the Gaussian distribution. In the Hessian matrix of Equation (8), $\sigma$ sets the physical size of the local neighborhood over which the second-order derivatives are computed. By tuning $\sigma$, the convolution selectively enhances target blobs of corresponding sizes while smoothing out fine-grained background noise. In this study, to accommodate the physical dimensions of the target echoes, the discrete scale parameters are set to $\sigma \in [1.0, 1.2]$.
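Equation (8) can be evaluated without building the derivative kernels explicitly. SciPy's `scipy.ndimage.gaussian_filter` accepts a per-axis derivative `order`, so convolving with $\partial^{2} G / \partial x^{2}$ becomes `order=(0, 2)` when rows index $y$ and columns index $x$. The sketch below assumes that axis convention and uses $\sigma = 1.0$, the lower end of the stated range. The helper name is illustrative.

```python
import numpy as np
from scipy import ndimage


def hessian_components(image, sigma):
    """Return (I_xx, I_yy, I_xy) of Eq. (8) at scale sigma.

    Rows are the range axis (y, axis 0) and columns the azimuth axis
    (x, axis 1); `order` gives the Gaussian-derivative order per axis.
    """
    image = np.asarray(image, dtype=float)
    i_xx = ndimage.gaussian_filter(image, sigma, order=(0, 2))
    i_yy = ndimage.gaussian_filter(image, sigma, order=(2, 0))
    i_xy = ndimage.gaussian_filter(image, sigma, order=(1, 1))
    return i_xx, i_yy, i_xy


# A bright isotropic blob: both second derivatives are negative at its peak,
# and the mixed derivative vanishes by symmetry.
yy, xx = np.mgrid[-16:17, -16:17].astype(float)
blob = np.exp(-(xx**2 + yy**2) / (2 * 1.5**2))
i_xx, i_yy, i_xy = hessian_components(blob, sigma=1.0)
centre = (16, 16)
```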

By performing an eigen-decomposition of the constructed Hessian matrix, we extract the principal curvatures of the local spatial structures. Let $I_{xx}$, $I_{yy}$, and $I_{xy}$ denote the respective elements of $H(x, y; \sigma)$. The raw eigenvalues $\mu_{1}$ and $\mu_{2}$ can then be computed explicitly as:

$$\mu_{1,2} = \frac{(I_{xx} + I_{yy}) \pm \sqrt{(I_{xx} - I_{yy})^{2} + 4 I_{xy}^{2}}}{2} \tag{10}$$

The principal-curvature magnitudes $|\mu_{max}|$ and $|\mu_{min}|$ used for discrimination are then obtained from the absolute values of these eigenvalues:

$$|\mu_{max}| = \max(|\mu_{1}|, |\mu_{2}|), \quad |\mu_{min}| = \min(|\mu_{1}|, |\mu_{2}|) \tag{11}$$

where $|\mu_{max}|$ and $|\mu_{min}|$ are the magnitudes of the two principal curvatures of the local image surface. Specifically, $|\mu_{max}|$ quantifies the strongest curvature, which lies along the principal direction of greatest change, whereas $|\mu_{min}|$ measures the curvature in the orthogonal direction.

Figure 3 summarizes the results of a sampling analysis of representative targets and background interference in sonar imagery. The left panel shows the raw sonar image, and the right panel compares shape isotropy. In the right panel, the horizontal axis denotes the blobness ratio, calculated as $R_{blob} = |\mu_{min}| / |\mu_{max}|$. Values near 1 indicate similar curvature in all directions, corresponding to a quasi-circular morphology. Values near 0 indicate an elongated structure. The vertical axis represents the probability density: the red histogram corresponds to the targets and the yellow histogram to the noise.
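Equations (10) and (11) and the blobness ratio vectorize directly over whole Hessian maps. The sketch below is an illustrative implementation. The helper names and the small constant `eps`, which avoids division by zero in flat regions, are assumptions not specified in the text.

```python
import numpy as np


def principal_curvatures(i_xx, i_yy, i_xy):
    """Closed-form eigenvalues of the symmetric 2x2 Hessian (Eq. (10))."""
    half_trace = 0.5 * (i_xx + i_yy)
    root = 0.5 * np.sqrt((i_xx - i_yy) ** 2 + 4.0 * i_xy**2)
    return half_trace + root, half_trace - root      # mu_1 >= mu_2


def blobness_ratio(i_xx, i_yy, i_xy, eps=1e-12):
    """R_blob = |mu_min| / |mu_max| using the magnitudes of Eq. (11)."""
    mu_1, mu_2 = principal_curvatures(i_xx, i_yy, i_xy)
    abs_1, abs_2 = np.abs(mu_1), np.abs(mu_2)
    mu_max = np.maximum(abs_1, abs_2)
    mu_min = np.minimum(abs_1, abs_2)
    return mu_min / (mu_max + eps)


# A blob-like point (equal curvature in both directions) vs a ridge-like
# point (flat along one direction), evaluated in one vectorized call.
example = blobness_ratio(np.array([-1.0, 0.0]),      # I_xx
                         np.array([-1.0, -1.0]),     # I_yy
                         np.array([0.0, 0.0]))       # I_xy
```

The blob-like point gives a ratio of essentially 1 and the ridge-like point gives 0. These are the two ends of the horizontal axis in Figure 3.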

This visualization reveals a clear statistical divergence. The core pixels of genuine targets cluster at higher $R_{blob}$ values, indicating higher isotropy, whereas structured stripe noise is distributed predominantly at lower values. This trend indicates that genuine targets can be separated from background clutter at the morphological level through an appropriate geometric threshold $T_{shape}$, independently of the absolute pixel intensity. To implement this geometric discrimination, we formulate a soft-gating mask using a sigmoid function:

$$P_{Gate}(x, y) = M_{Gate}(x, y) = \frac{1}{1 + \exp\left(-g \cdot \left(R_{blob}(x, y) - T_{shape}\right)\right)} \tag{12}$$

where $M_{Gate}(x, y)$ denotes the resulting soft-gating mask, and $T_{shape}$ is the geometric threshold that defines the system's tolerance baseline for target morphology. When $R_{blob} \geq T_{shape}$, the structure is identified as a potential target, and the gating value rises toward 1; it equals exactly 0.5 at $R_{blob} = T_{shape}$. Otherwise, the structure is classified as background, and the value falls toward 0. The parameter $g$ is a gain coefficient that controls the steepness of the function. A larger $g$ yields a response closer to a hard threshold and sharpens the discrimination between targets and background clutter, but it also increases the risk of edge truncation artifacts. In this study, the optimal parameters are set to $T_{shape} = 0.40$ and $g = 5$; a detailed analysis of these parameter selections is provided in Section 3.3.3. The geometric probability map $P_{Gate}(x, y)$ is numerically identical to the soft-gating mask. The resulting map $P_{Gate}(x, y) \in [0, 1]$ serves as a spatial weight in subsequent processing stages, where values approaching 1 indicate higher geometric confidence that a target is present.
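Equation (12) translates directly into code, with the stated $T_{shape} = 0.40$ and $g = 5$ as defaults. The helper name `soft_gate` is illustrative. Printing the gate for two gains shows the steepness trade-off described above; the alternative gain of 20 is included only for comparison.

```python
import numpy as np


def soft_gate(r_blob, t_shape=0.40, g=5.0):
    """Sigmoid soft-gating mask of Eq. (12); returns values in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-g * (np.asarray(r_blob, dtype=float) - t_shape)))


r = np.array([0.0, 0.2, 0.4, 0.7, 1.0])
for gain in (5.0, 20.0):
    print(gain, np.round(soft_gate(r, g=gain), 3))
```

With $g = 5$, a perfectly elongated structure ($R_{blob} = 0$) still receives a weight of about 0.12, so clutter is attenuated rather than zeroed. With $g = 20$, the gate behaves almost like a hard threshold at 0.40.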

2.2.2. LoG5S-Based Local Adaptive Detection

Standard detection algorithms commonly utilize isotropic LoG filters based on the assumption that targets appear as perfectly circular blobs. However, due to the spatial directivity limitations of a finite array aperture, genuine FLS targets inherently manifest as anisotropic structures accompanied by pronounced sidelobes. A standard G1 model is inadequate for representing the energy dispersed within these sidelobes; the filter misclassifies this structural energy as background noise, leading to target signal degradation. According to matched-filtering principles, this inherent energy loss fundamentally prevents the maximization of the image SCR.

Consequently, we introduce a Gaussian $N$-Superposition (GNS) model in this study. By linearly superposing multiple Gaussian components in both the range and azimuth directions, the GNS strategy characterizes complex energy distribution features, including the mainlobe and sidelobes, with high flexibility and precision. Such high-fidelity parametric modeling not only maximizes the SCR gain but also effectively suppresses noise interference resulting from sidelobe energy leakage.

This study investigates an FLS system built on a uniform circular array (UCA). According to the directivity theory of circular arrays, the far-field beam pattern of the target follows the first-order Bessel function and therefore appears as an Airy disk [34]:

$$B(\theta) \propto \left[\frac{2 J_{1}\left(\frac{1}{2} k D \sin\theta\right)}{\frac{1}{2} k D \sin\theta}\right]^{2} \tag{13}$$

In Equation (13):

- $B(\theta)$ denotes the beam pattern function, i.e., the far-field spatial distribution of beam energy generated by the sonar array as a function of angle;
- $J_{1}(\cdot)$ is the first-order Bessel function of the first kind;
- $D$ is the diameter of the circular array;
- $k$ is the wavenumber of the transmitted signal;
- $\theta$ is the deviation angle relative to the normal direction of the sonar array.

Analogous to the $\mathrm{sinc}$ directivity pattern produced by linear arrays, this Bessel-based distribution yields a beam with a dominant central lobe and clearly defined sidelobes that decay asymptotically with angle. Figure 4 illustrates the resulting discrepancy: the solid blue curve plots the theoretical Bessel model, whereas the green dashed curve represents the standard G1 model fit. The G1 model completely fails to reproduce the Bessel sidelobes. A more sophisticated parametric model is therefore strictly required to faithfully characterize the target's spatial energy distribution.
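The sidelobe structure in Equation (13), and how far a single Gaussian falls short of it, can be quantified with `scipy.special.j1`. The sketch below works in the normalized variable $u = \frac{1}{2} k D \sin\theta$. Matching the Gaussian to the Airy mainlobe FWHM is an illustrative choice, not the G1 fitting procedure behind Figure 4. The helper name `airy` is hypothetical.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import j1, jn_zeros


def airy(u):
    """Normalized beam pattern [2 J1(u) / u]^2 of Eq. (13), with B(0) = 1."""
    u = np.asarray(u, dtype=float)
    safe = np.where(u == 0.0, 1.0, u)                 # avoid 0/0 at boresight
    return np.where(u == 0.0, 1.0, (2.0 * j1(safe) / safe) ** 2)


first_null, second_null = jn_zeros(1, 2)              # zeros of J1: ~3.832, ~7.016

# Half-power point of the mainlobe and an FWHM-matched Gaussian (G1).
u_half = brentq(lambda v: float(airy(v)) - 0.5, 0.1, first_null)
sigma_g1 = 2.0 * u_half / (2.0 * np.sqrt(2.0 * np.log(2.0)))

# First sidelobe: maximum of the pattern between the first two nulls.
u = np.linspace(first_null, second_null, 20001)
pattern = airy(u)
i_peak = np.argmax(pattern)
u_side, level_side = u[i_peak], pattern[i_peak]
g1_at_side = np.exp(-u_side**2 / (2.0 * sigma_g1**2))

print(f"first sidelobe at u = {u_side:.3f}: Airy {10*np.log10(level_side):.1f} dB, "
      f"G1 {10*np.log10(g1_at_side):.1f} dB")
```

The Airy pattern's first sidelobe sits at roughly −17.6 dB, while the FWHM-matched Gaussian is more than an order of magnitude lower at the same position. This is the sidelobe energy that a G1 template cannot collect.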

Directly using Bessel functions to build convolution kernels suffers from truncation artifacts, high computational cost, and limited flexibility. To overcome these limitations, the proposed GNS model balances physical fidelity against computational efficiency. It approximates the physical distribution of the Airy disk in Equation (13) through a linear superposition of $N$ Gaussian components, as expressed in Equation (14):

$$P_{GNS}(x, y) = \sum_{i=1}^{N} A_{i} \cdot \exp\left(-\left(\frac{(x - \mu_{x,i})^{2}}{2\sigma_{x,i}^{2}} + \frac{(y - \mu_{y,i})^{2}}{2\sigma_{y,i}^{2}}\right)\right) \tag{14}$$

The symbols in Equation (14) are defined as follows:

- $P_{GNS}(x, y)$ denotes the normalized energy intensity of the PSF at 2D image coordinates within the FLS imagery;
- $N$ is the total number of superimposed Gaussian components, defined as the model order;
- $i = 1, 2, \dots, N$ is the component index;
- $(x, y)$ are the discrete pixel coordinates of the image;
- $A_{i}$ is the amplitude coefficient of the $i$-th Gaussian component and physically characterizes the energy weight of that lobe. $A_{1}$ represents the acoustic energy concentrated within the mainlobe, whereas $A_{i}$ for $i \geq 2$ denotes the residual acoustic energy dispersed into the sidelobes by the finite array aperture;
- $(\mu_{x,i}, \mu_{y,i})$ is the center of the $i$-th Gaussian component, where $\mu_{x,i}$ and $\mu_{y,i}$ determine the spatial offsets in the azimuth and range directions, respectively;
- $(\sigma_{x,i}, \sigma_{y,i})$ are the standard deviations of the $i$-th Gaussian component and physically reflect the spatial resolution constraints of the sonar system: $\sigma_{x,i}$ controls the beamwidth in the azimuth direction, and $\sigma_{y,i}$ governs the pulse length in the range direction.
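Equation (14) translates directly into an array expression. In this sketch, each component is a tuple `(A, mu_x, mu_y, sigma_x, sigma_y)`, and the helper name `gns_psf` is illustrative. The five-component parameter values in the usage example (a mainlobe plus two symmetric azimuth sidelobe pairs) are hypothetical placeholders, not fitted values from this paper.

```python
import numpy as np


def gns_psf(x, y, components):
    """Evaluate P_GNS(x, y) of Eq. (14) on coordinate arrays x, y.

    components: iterable of (A, mu_x, mu_y, sigma_x, sigma_y), one per
    Gaussian; A > 0 is the lobe amplitude, (mu_x, mu_y) its centre, and
    (sigma_x, sigma_y) its azimuth and range spreads.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    total = np.zeros(np.broadcast(x, y).shape)
    for a, mu_x, mu_y, s_x, s_y in components:
        total += a * np.exp(-((x - mu_x) ** 2 / (2 * s_x**2)
                              + (y - mu_y) ** 2 / (2 * s_y**2)))
    return total


# Hypothetical G5S parameters: mainlobe + two sidelobe pairs along azimuth (x).
g5s = [
    (1.000, 0.0, 0.0, 3.0, 1.0),                                  # mainlobe, i = 1
    (0.020, -8.0, 0.0, 1.5, 1.0), (0.020, 8.0, 0.0, 1.5, 1.0),    # sidelobe pair 1
    (0.005, -13.0, 0.0, 1.5, 1.0), (0.005, 13.0, 0.0, 1.5, 1.0),  # sidelobe pair 2
]
yy, xx = np.mgrid[-10:11, -20:21]      # rows: range (y), columns: azimuth (x)
psf = gns_psf(xx, yy, g5s)
```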

Based on matched-filtering theory, and to effectively extract the desired target structures from the noisy background, this paper constructs the Laplacian-of-GNS (LoGNS) filtering kernel by applying the Laplacian operator to the physical model $P_{GNS}$:

$$K_{LoGNS}(x, y) = \nabla^{2} P_{GNS}(x, y) = \sum_{i=1}^{N} \left(\frac{\partial^{2} P_{i}}{\partial x^{2}} + \frac{\partial^{2} P_{i}}{\partial y^{2}}\right) \tag{15}$$

where $P_{i}(x, y) = A_{i} G_{i}(x, y)$ is the $i$-th term of Equation (14), with $G_{i}$ given in Equation (17). Expanding this equation yields:

$$K_{LoGNS}(x, y) = \sum_{i=1}^{N} A_{i} \left[\frac{1}{\sigma_{x,i}^{2}}\left(\frac{(x - \mu_{x,i})^{2}}{\sigma_{x,i}^{2}} - 1\right) + \frac{1}{\sigma_{y,i}^{2}}\left(\frac{(y - \mu_{y,i})^{2}}{\sigma_{y,i}^{2}} - 1\right)\right] G_{i}(x, y) \tag{16}$$

where

$$G_{i}(x, y) = \exp\left(-\left(\frac{(x - \mu_{x,i})^{2}}{2\sigma_{x,i}^{2}} + \frac{(y - \mu_{y,i})^{2}}{2\sigma_{y,i}^{2}}\right)\right) \tag{17}$$
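Equation (16) can be checked against a finite-difference Laplacian of Equation (14). The sketch below is a self-contained illustration with hypothetical three-component parameters (a mainlobe plus one symmetric sidelobe pair) on a fine grid. The helper names are illustrative.

```python
import numpy as np


def gns_component(x, y, a, mu_x, mu_y, s_x, s_y):
    """A_i * G_i(x, y) with G_i from Eq. (17)."""
    return a * np.exp(-((x - mu_x) ** 2 / (2 * s_x**2) + (y - mu_y) ** 2 / (2 * s_y**2)))


def log_gns_kernel(x, y, components):
    """Analytic LoGNS kernel of Eq. (16)."""
    kernel = np.zeros(np.broadcast(x, y).shape)
    for a, mu_x, mu_y, s_x, s_y in components:
        g = gns_component(x, y, 1.0, mu_x, mu_y, s_x, s_y)
        term_x = ((x - mu_x) ** 2 / s_x**2 - 1.0) / s_x**2
        term_y = ((y - mu_y) ** 2 / s_y**2 - 1.0) / s_y**2
        kernel += a * (term_x + term_y) * g
    return kernel


# Hypothetical parameters: mainlobe + one sidelobe pair offset in azimuth.
comps = [(1.0, 0.0, 0.0, 1.5, 1.0), (0.05, -3.0, 0.0, 0.8, 1.0), (0.05, 3.0, 0.0, 0.8, 1.0)]
h = 0.05
axis = np.arange(-6.0, 6.0 + h / 2, h)
xx, yy = np.meshgrid(axis, axis)          # x varies along columns, y along rows
analytic = log_gns_kernel(xx, yy, comps)

# Numerical Laplacian of P_GNS (Eq. (14)) by repeated central differences.
p = sum(gns_component(xx, yy, *c) for c in comps)
numeric = (np.gradient(np.gradient(p, h, axis=1), h, axis=1)
           + np.gradient(np.gradient(p, h, axis=0), h, axis=0))
interior = (slice(5, -5), slice(5, -5))   # skip one-sided edge differences
max_err = np.abs(analytic[interior] - numeric[interior]).max()
```

The agreement confirms the expansion term by term. The kernel is negative at the mainlobe center, so a bright target produces a negative matched-filter response, as with the standard LoG.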

The general analytical expression of the LoGNS filter kernel has now been fully derived. In practical detection scenarios, two properties of the operator depend entirely on the model order $N$ of the GNS model and on the key parameter set $\Theta = \{A_{i}, \mu_{x,i}, \mu_{y,i}, \sigma_{x,i}, \sigma_{y,i}\}_{i=1}^{N}$: its efficiency in focusing the mainlobe energy of the target, and its precision in fitting complex sidelobe structures. To turn the theoretical model into a practical operator that precisely characterizes the authentic PSF, this study uses nonlinear least squares to estimate the model parameters from actual imaging data. First, a high-SCR empirical PSF, denoted $I_{emp}$, is constructed by performing c and superposition averaging on real target samples. This empirical PSF serves as the "ground truth" for parameter estimation. Parameter estimation is formulated as a nonlinear optimization problem that minimizes the sum of squared residuals between the image generated by the GNS model and the empirical PSF. The objective function $J(\Theta)$ is constructed as follows:

$$J(\Theta) = \sum_{x, y} \left(I_{emp}(x, y) - P_{GNS}(x, y \mid \Theta)\right)^{2} \tag{18}$$

Because the objective function is non-convex with respect to the parameter set $Θ$, this study employs the Levenberg–Marquardt (LM) algorithm for iterative optimization. We impose the positivity constraint $A_{i} > 0$ for two reasons: to keep the parameters physically meaningful, and to reduce the risk of the optimization converging to spurious local minima. Furthermore, although the GNS model mathematically comprises $N$ independent Gaussian components, the physical symmetry of the sonar beam couples them. Consequently, the optimization does not solve for all parameters independently. Specifically:

▪ For the mainlobe ($i = 1$), the parameters $A_{1}$, $μ_{1}$, and $σ_{1}$ are solved independently;

▪ The remaining $N - 1$ components ($i \geq 2$) are treated as symmetric sidelobe pairs. For each pair, symmetry constraints enforce equal amplitude and width, as well as symmetric positions with respect to the mainlobe. That is, the following conditions are strictly satisfied:

$\begin{cases} A_{2l} = A_{2l+1} \\ σ_{2l} = σ_{2l+1} \\ μ_{2l} = μ_{1} - Δμ_{l} \\ μ_{2l+1} = μ_{1} + Δμ_{l} \end{cases} \quad (l = 1, 2, \ldots, \frac{N - 1}{2})$

(19)

where $l$ denotes the index of the sidelobe pair, and $Δμ_{l}$ is the spatial offset of the $l$-th sidelobe pair from the target center. This constraint reduces the degrees of freedom. It enables the model to reconstruct the spatial energy distribution of the PSF robustly while remaining consistent with the symmetry of the physical echoes.
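Below is a minimal sketch of the constrained fit of Eqs. (18) and (19) for a single 1D profile. It rests on three choices of our own, none of which is described in the text:

- The nine free parameters per dimension are packed in a hypothetical order, `[A1, mu1, s1, A_p1, s_p1, dmu1, A_p2, s_p2, dmu2]`.
- Amplitudes are fitted in log space, so that $A_{i} > 0$ holds automatically. The paper does not state how it imposes this constraint; this is one possible mechanism.
- The optimizer is a hand-written Levenberg-Marquardt loop with a forward-difference Jacobian, not any particular library solver.

The synthetic profile and starting point are invented for illustration.

```python
import numpy as np

AMP = [0, 3, 6]  # positions of the three free amplitudes in the parameter vector


def expand_symmetric(theta):
    """Map 9 free parameters to 5 (A, mu, sigma) components following Eq. (19)."""
    a1, mu1, s1, a_p1, s_p1, d1, a_p2, s_p2, d2 = theta
    return [
        (a1, mu1, s1),                                    # mainlobe
        (a_p1, mu1 - d1, s_p1), (a_p1, mu1 + d1, s_p1),   # sidelobe pair l = 1
        (a_p2, mu1 - d2, s_p2), (a_p2, mu1 + d2, s_p2),   # sidelobe pair l = 2
    ]


def profile(t, theta):
    """1D G5S profile: sum of the five constrained Gaussian components."""
    return sum(a * np.exp(-(t - m) ** 2 / (2 * s ** 2)) for a, m, s in expand_symmetric(theta))


def to_theta(u):
    """Amplitudes are optimized as logarithms, so exp() keeps them positive."""
    theta = np.array(u, dtype=float)
    theta[AMP] = np.exp(theta[AMP])
    return theta


def lm_fit(t, y, u0, n_iter=100, lam=1e-2, eps=1e-6):
    """Minimize J = sum (y - profile)^2 (Eq. (18)) with a basic LM loop."""
    u = np.array(u0, dtype=float)

    def res(v):
        return y - profile(t, to_theta(v))

    r = res(u)
    cost = r @ r
    for _ in range(n_iter):
        jac = np.empty((t.size, u.size))
        for k in range(u.size):                 # forward-difference Jacobian of r
            du = np.zeros_like(u)
            du[k] = eps
            jac[:, k] = (res(u + du) - r) / eps
        hess = jac.T @ jac
        damping = lam * np.diag(np.diag(hess) + 1e-12)
        step = np.linalg.solve(hess + damping, -jac.T @ r)
        r_new = res(u + step)
        cost_new = r_new @ r_new
        if cost_new < cost:                     # accept: shift toward Gauss-Newton
            u, r, cost = u + step, r_new, cost_new
            lam *= 0.3
        else:                                   # reject: shift toward gradient descent
            lam *= 10.0
    return to_theta(u), cost


t = np.arange(-32, 33, dtype=float)             # 65 samples along one axis
true_theta = np.array([1.0, 0.0, 3.0, 0.25, 2.0, 8.0, 0.08, 2.0, 16.0])
y = profile(t, true_theta)
u0 = np.array([0.0, 1.0, 4.0, np.log(0.1), 2.5, 7.0, np.log(0.05), 2.5, 15.0])
initial_cost = float(np.sum((y - profile(t, to_theta(u0))) ** 2))
theta_hat, final_cost = lm_fit(t, y, u0)
print(f"cost: {initial_cost:.3e} -> {final_cost:.3e}")
```

Only improving steps are accepted, so the cost never increases. Whether the fit reaches the global minimum still depends, in general, on the starting point.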

The selection of the model order $N$ requires a trade-off between fitting accuracy and computational complexity. In this study, $N$ is set to 5, which specializes the general GNS framework to the G5S model; the rationale for this choice is detailed in Section 3.1. With the optimization strategy and symmetry constraints described above, the number of independent parameters to be solved for each dimension falls from 15 (three per component) to 9 when $N = 5$. These are three for the mainlobe ($A_{1}$, $μ_{1}$, $σ_{1}$) and three (amplitude, width, and $Δμ_{l}$) for each of the two sidelobe pairs. Upon convergence of the LM iteration, the resulting optimal parameter set is presented in Table 1.

Once the LoG5S filter kernel has been constructed, the raw sonar image is filtered to generate the target likelihood map, denoted as $E_{G5S}$. This operation is defined as the convolution between the input sonar image $I(x, y)$ and the filter kernel, with negative responses clipped to zero:

E_{G 5 S} (x, y) = \max (0, I (x, y) * K_{L o G 5 S} (x, y))

(20)

With this filter, the multi-peak structure of $K_{LoG5S}$ is aligned with the target's sidelobes, so sidelobe energy is largely preserved rather than discarded. This maximizes the response magnitude at the target location and significantly enhances the SCR of the image. At the same time, because $K_{LoG5S}$ acts as a zero-mean band-pass filter, it suppresses the low-frequency (DC) components of the background reverberation. Together, these two properties make $E_{G5S}$ a high-contrast map that accurately reflects the target likelihood.
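The filtering step of Eq. (20) can be sketched with NumPy as follows. Two choices in this sketch are assumptions rather than details given in the text:

- For typical parameters, $\nabla^{2} P_{G5S}$ is negative at the mainlobe center (see Eq. (16)). The sketch therefore convolves the image with $-K_{LoG5S}$, so that a bright echo produces a positive peak that survives the $\max(0, \cdot)$ clipping. The paper's kernel may use an equivalent sign or normalization convention.
- The sampled kernel is shifted to exactly zero mean and applied as a circular (FFT) convolution, so that a constant background maps to zero response.

The kernel parameters and the synthetic scene are illustrative only. The additive Gaussian noise is a simplification of speckle.

```python
import numpy as np

# Illustrative (A, mu_x, mu_y, sigma_x, sigma_y) components, NOT the Table 1 values.
PARAMS = [(1.0, 0, 0, 1.5, 4.0), (0.2, -4, 0, 1.0, 4.0), (0.2, 4, 0, 1.0, 4.0),
          (0.05, -8, 0, 1.0, 4.0), (0.05, 8, 0, 1.0, 4.0)]


def g5s(x, y, laplacian=False):
    """Sample P_G5S, or its analytical Laplacian (Eq. (16)), on a grid."""
    out = np.zeros(np.broadcast(x, y).shape)
    for a, mx, my, sx, sy in PARAMS:
        g = np.exp(-((x - mx) ** 2 / (2 * sx ** 2) + (y - my) ** 2 / (2 * sy ** 2)))
        if laplacian:
            g = g * (((x - mx) ** 2 / sx ** 2 - 1) / sx ** 2
                     + ((y - my) ** 2 / sy ** 2 - 1) / sy ** 2)
        out += a * g
    return out


def circ_conv2(img, ker):
    """Circular 2D convolution via FFT; the kernel centre is moved to index (0, 0)."""
    pad = np.zeros(img.shape)
    kh, kw = ker.shape
    pad[:kh, :kw] = ker
    pad = np.roll(pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(pad)))


# Kernel support: 33 rows (range, y) x 25 columns (azimuth, x).
ky, kx = np.mgrid[-16:17, -12:13]
kernel = -g5s(kx, ky, laplacian=True)   # sign flip (assumption): bright echo -> positive peak
kernel -= kernel.mean()                 # force an exactly zero-mean sampled kernel

# Synthetic scene: constant reverberation level + G5S-shaped target + Gaussian noise.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64]
image = 5.0 + 50.0 * g5s(xx - 32, yy - 32) + rng.normal(0.0, 1.0, (64, 64))

e_g5s = np.maximum(0.0, circ_conv2(image, kernel))   # Eq. (20) under the assumptions above
peak = np.unravel_index(np.argmax(e_g5s), e_g5s.shape)
flat = circ_conv2(np.full((64, 64), 5.0), kernel)    # constant background gives ~0 response
print("peak at", peak, "| max |flat response| =", np.abs(flat).max())
```

The constant reverberation level of 5.0 contributes nothing to the response, because the kernel sums to zero. The target's matched response therefore dominates the map.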

Finally, the map $E_{G5S}$ generated in this section is fused with the geometric probability map $P_{Gate}$, obtained via the Hessian-based shape gating described previously, to yield the final target likelihood map, denoted as $E_{Fuse}$. It is formulated as:

E_{F u s e} (x, y) = P_{G a t e} (x, y) \cdot E_{G 5 S} (x, y)

(21)

This fusion strategy combines the geometric and energy features for joint screening. This joint evaluation largely eliminates large-area reverberation stripes and isolated strong speckle noise. Consequently, the algorithm preserves only responses that satisfy both the isotropic geometric characteristics and the matched energy pattern, which significantly suppresses false alarms originating from clutter. The result is a robust target-likelihood representation for the final detection stage.

2.2.3. Adaptive Thresholding Processing for Noise Suppression

Following the enhancement procedures described in the previous two sections, we obtain a relatively clean target likelihood map, $E_{Fuse}$. However, because reverberation varies substantially across complex underwater environments, this map may still contain sparse but high-intensity anomalous artifacts. To remove these residual artifacts, we apply local adaptive thresholding followed by morphological filtering, which eliminates isolated noise points while preserving genuine targets.

First, a sliding window performs a local statistical analysis on $E_{Fuse}$ to assess the background clutter dynamically. Specifically, for any pixel at coordinates $(x, y)$ in $E_{Fuse}$, a local window $W_{xy}$ centered at that pixel is defined. The local mean $μ_{local}(x, y)$ within this neighborhood is then computed as the estimate of the background intensity:

μ_{l o c a l} (x, y) = \frac{1}{S^{2}} \sum_{(i, j) \in W_{x y}} E_{F u s e} (i, j)

(22)

where $S$ denotes the side length of the sliding window, which the algorithm adjusts according to the image resolution. A strong target can improperly elevate the local mean and thereby suppress its own detection. To mitigate this “self-masking” effect, we set $S$ to four times the support size of the G5S filter kernel. This ensures that the window contains a sufficient number of background samples relative to the target footprint. In this study, the window size is set to $61 \times 61$ pixels (i.e., $S = 61$).

After estimating the local background, we introduce a sensitivity coefficient $η$ to define a local adaptive threshold $T_{adapt}(x, y)$ for robust segmentation of the target region:

T_{a d a p t} (x, y) = \frac{μ_{l o c a l} (x, y)}{1 - η}

(23)

where the sensitivity coefficient is set to $η = 0.90$ in this study (a detailed justification is provided in Section 3.3.3); with this value, $T_{adapt}(x, y) = 10\,μ_{local}(x, y)$. The final target extraction is formulated as a pixel-wise decision in which $T_{adapt}$ estimates the upper energy limit of the local background clutter. Pixels of $E_{Fuse}$ whose intensities exceed $T_{adapt}$ are identified as potential targets, whereas those at or below this threshold are classified as noise. Based on this criterion, the binary detection mask $B(x, y)$ is generated as follows:

B (x, y) = \begin{cases} 1, & \text{if } E_{F u s e} (x, y) > T_{a d a p t} (x, y) \\ 0, & \text{otherwise} \end{cases}

(24)
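Eqs. (22) to (24) amount to a box-filtered local mean followed by a scaled comparison. The sketch below computes the $S \times S$ local mean with a summed-area (integral) image. Replicating border pixels is our assumption, since the text does not state how windows that cross the image border are handled.

```python
import numpy as np


def local_mean(e, s=61):
    """Eq. (22): mean of e over the s x s window centred on each pixel (s odd).
    Border handling by edge replication is an assumption."""
    r = s // 2
    p = np.pad(e.astype(float), r, mode="edge")
    c = np.pad(p.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))  # c[i, j] = sum of p[:i, :j]
    window_sum = c[s:, s:] - c[:-s, s:] - c[s:, :-s] + c[:-s, :-s]
    return window_sum / s ** 2


def adaptive_mask(e_fuse, s=61, eta=0.90):
    """Eqs. (23)-(24): B = 1 where E_Fuse exceeds mu_local / (1 - eta)."""
    t_adapt = local_mean(e_fuse, s) / (1.0 - eta)
    return e_fuse > t_adapt, t_adapt


# Toy likelihood map: unit background with one bright target pixel.
e = np.ones((101, 101))
e[50, 50] = 50.0
mask, t_adapt = adaptive_mask(e)
print(mask.sum(), round(t_adapt[50, 50], 3))   # 1 pixel; threshold is about 10 x local mean
```

With $η = 0.90$, every pixel must exceed roughly ten times its local mean. In this toy map, that is the single bright pixel.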

Furthermore, strong sidelobes and beam leakage frequently produce horizontal and vertical artifact noise. To eliminate these residual artifacts, this study proposes a dynamic morphological filter. Such noise is strongly directional and has a large aspect ratio. Exploiting these spatial characteristics, we construct adaptive linear structuring elements $S_{azimuth}$ and $S_{range}$ along the azimuth and range directions, respectively. The length of these elements is dynamically set to twice the estimated physical size of the target. The final refined detection map $D_{final}$ is obtained as follows:

D_{f i n a l} = B \setminus [(B \circ S_{a z i m u t h}) \cup (B \circ S_{r a n g e})]

(25)

where $B$ denotes the foreground pixel set of the original binary mask; $S_{azimuth}$ and $S_{range}$ represent the structuring elements in the azimuth and range directions, respectively; the operator $\circ$ denotes morphological opening, defined as $X \circ S = (X ⊖ S) \oplus S$, where $⊖$ and $\oplus$ denote erosion and dilation, respectively; $\cup$ denotes set union; and $\setminus$ denotes set difference.
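A NumPy-only sketch of Eq. (25) follows. It makes three assumptions:

- Image columns run along azimuth and rows along range.
- $S_{azimuth}$ and $S_{range}$ are centered line segments of odd length $2r + 1$. Here $r$ is a hypothetical estimate of the target size, so the segment is about twice that size.
- Pixels outside the image count as background during erosion.

```python
import numpy as np


def _shifted(b, r, axis):
    """All shifts of b by -r..r along one axis, padded with False."""
    pad = [(0, 0), (0, 0)]
    pad[axis] = (r, r)
    p = np.pad(b, pad, constant_values=False)
    n = b.shape[axis]
    return [np.take(p, np.arange(k, k + n), axis=axis) for k in range(2 * r + 1)]


def line_opening(b, r, axis):
    """Binary opening (erosion then dilation) with a centred line of length 2r + 1."""
    eroded = np.logical_and.reduce(_shifted(b, r, axis))
    return np.logical_or.reduce(_shifted(eroded, r, axis))


def remove_stripes(b, target_size):
    """Eq. (25): D_final = B minus (opening along azimuth OR opening along range)."""
    r = int(target_size)                          # segment length 2r + 1, about 2x target size
    stripes = line_opening(b, r, axis=1) | line_opening(b, r, axis=0)
    return b & ~stripes


b = np.zeros((30, 30), dtype=bool)
b[5, :] = True          # azimuth (horizontal) stripe artifact
b[10:, 25] = True       # range (vertical) stripe artifact
b[14:17, 10:13] = True  # compact 3 x 3 target
d_final = remove_stripes(b, target_size=4)
print(d_final.sum())    # only the 3 x 3 target (9 pixels) survives
```

An opening keeps only the runs at least as long as the segment, so it isolates the stripes. Subtracting them from $B$ leaves the compact target.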

Genuine targets and discrete speckle noise differ markedly in pixel connectivity. To exploit this difference and eliminate residual isolated noise, the algorithm performs connected component analysis on the binary mask, decomposing the refined map $D_{final}$ into a collection of independent connected sets $\{O_{l}\}$. An area-based filtering step then discards every component whose pixel count is smaller than the threshold $τ_{min}$, which further suppresses speckle noise. In this implementation, $τ_{min}$ is empirically set to 5.
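The area filter can be sketched with a breadth-first labelling pass, as below. The text does not specify the pixel connectivity, so 8-connectivity is an assumption, exposed here as a parameter. With 4-connectivity, diagonal chains of pixels split into single-pixel components and are removed.

```python
from collections import deque

import numpy as np


def area_filter(mask, tau_min=5, connectivity=8):
    """Keep only connected components of `mask` with at least tau_min pixels."""
    h, w = mask.shape
    if connectivity == 8:
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    else:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    seen = np.zeros((h, w), dtype=bool)
    keep = np.zeros((h, w), dtype=bool)
    for y0, x0 in zip(*np.nonzero(mask)):
        if seen[y0, x0]:
            continue
        seen[y0, x0] = True
        queue, component = deque([(y0, x0)]), []
        while queue:                                  # breadth-first flood fill
            y, x = queue.popleft()
            component.append((y, x))
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if len(component) >= tau_min:                 # discard components smaller than tau_min
            rows, cols = zip(*component)
            keep[list(rows), list(cols)] = True
    return keep


d = np.zeros((16, 16), dtype=bool)
d[0:2, 0:2] = True                      # 4-pixel speckle: removed
d[5:8, 5:8] = True                      # 9-pixel blob: kept
for k in range(11, 16):
    d[k, k] = True                      # 5-pixel diagonal chain
print(area_filter(d).sum(), area_filter(d, connectivity=4).sum())  # 14 9
```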

Although the preceding steps effectively mitigate stripe artifacts and speckle noise, exceptionally strong genuine targets may still suffer morphological distortion owing to intense reverberation coupling. To prevent such targets from being erroneously deleted during geometric filtering, this study sets a safety threshold, based on global statistics, that enforces the preservation of high-energy pixels. The final detection result $M_{out}$ is calculated as follows:

M_{o u t} = D_{f i n a l}^{'} \cup \{(x, y) \mid E_{F u s e} (x, y) > μ_{g} + α \cdot σ_{g}\}

(26)

where $M_{out}(x, y)$ is the final binary detection result; $D_{final}^{'}$ denotes the binary mask obtained after connected component analysis; $μ_{g}$ and $σ_{g}$ are the global mean and global standard deviation of $E_{Fuse}$, respectively; and $α$ is the high-energy exemption coefficient. This formulation guarantees that any pixel whose intensity exceeds the global mean by more than $α$ times the global standard deviation is exempted from morphological suppression and preserved directly in the output. In this study, $α = 10$.
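The exemption rule in Eq. (26) is a global $μ_{g} + ασ_{g}$ test OR-ed into the filtered mask. The toy map below is purely illustrative. It shows that with $α = 10$, only pixels far in the tail of the global distribution are exempted.

```python
import numpy as np


def high_energy_exemption(d_prime, e_fuse, alpha=10.0):
    """Eq. (26): M_out = D'_final OR (E_Fuse > mu_g + alpha * sigma_g)."""
    mu_g, sigma_g = e_fuse.mean(), e_fuse.std()      # global statistics of E_Fuse
    return d_prime | (e_fuse > mu_g + alpha * sigma_g)


e_fuse = np.ones((100, 100))
e_fuse[10, 10] = 5.0      # moderately bright pixel
e_fuse[50, 50] = 500.0    # exceptionally strong return
d_prime = np.zeros_like(e_fuse, dtype=bool)   # suppose geometric filtering removed both
m_out = high_energy_exemption(d_prime, e_fuse)
print(m_out.sum(), m_out[50, 50], m_out[10, 10])   # 1 True False
```

Here the global threshold is about 51, so the 500-level return is restored while the 5-level pixel stays suppressed.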

3. Experiments

This section presents three experimental evaluations: optimal model-order selection, G5S model validation, and detection-performance assessment. First, the optimal GNS model order is determined by balancing fitting accuracy and computational efficiency using an empirical PSF. Second, the physical consistency and robustness of the G5S model are validated through comparisons with classical models and multi-system simulations. Finally, using multiple field-measured datasets, the proposed method is compared with state-of-the-art methods and evaluated in terms of SCR, BSF, F1-score, and Area Under the Curve (AUC).

The experiments presented in this section were conducted on a computing platform equipped with a 12th-generation Intel Core™ i7-12700H processor and 32 GB of RAM, operating under a 64-bit Windows 11 environment. All algorithms were implemented in MATLAB R2021b. The experimental data were acquired from field trials carried out by Harbin Engineering University in the Dalijia coastal area of Dalian, China, in March 2025.

The sonar system employed in the experiments was a custom-built UCA developed by our laboratory at Harbin Engineering University, with a diameter of 250 mm, providing full 360° omnidirectional coverage. A continuous wave (CW) pulse was transmitted, with a center frequency of 100 kHz, a bandwidth of 15 kHz, and a pulse width of 1 ms. During deployment, both the sonar transducer and the target—an iron sphere with a scattering strength of −15 dB—were fixed at a depth of 2.5 m below the water surface. The effective detection range extended from 50 m to 100 m. The environmental conditions of the experimental sea area, along with the target deployment and system configuration, are illustrated in Figure 5.

3.1. Determination of the GNS Model Order

To identify the Gaussian superposition model that most accurately characterizes the target PSF, fitting experiments are conducted for various model orders $N$. As established in Section 2, the empirical PSF $I_{emp}$ is constructed by averaging multiple authentic target samples extracted from the sea trial data, after spatial alignment, background suppression, and intensity normalization. The Mean Absolute Error (MAE) is adopted as the evaluation metric, both to quantify the impact of the model order $N$ on fitting accuracy and to justify the need for higher-order components:

M A E = \frac{1}{H \times W} \sum_{x = 1}^{H} \sum_{y = 1}^{W} |P_{G N S} (x, y) - I_{e m p} (x, y)|

(27)

where $H$ and $W$ are the height and width of the target's spatial bounding box, respectively. In this experiment, they are empirically set to $H = 65$ and $W = 15$ pixels, so that $I_{emp} \in \mathbb{R}^{65 \times 15}$. The variables $(x, y)$ denote the spatial pixel coordinates. A smaller MAE indicates that the model more precisely reconstructs the spatial energy distribution of the authentic target.

The fitting results are presented in Figure 6, where pixel intensity is rendered as height to visualize the spatial distribution of echo energy. The empirical PSF shows a pronounced anisotropic structure: the energy peak extends significantly along the range direction while remaining relatively narrow in the azimuth direction. In addition, secondary energy peaks are clearly visible on both sides of the mainlobe in the azimuth direction. This structure reflects the directivity limitations inherent to the sonar array, which a standard G1 model cannot adequately characterize.

As a result, the G1 model yields the largest MAE and the poorest fit. Although it roughly approximates the main energy peak, it fails to capture the essential sidelobe structures, leading to severe underfitting. As the model order $N$ increases, the MAE gradually decreases, and the fitted model begins to represent the energy spreading around the mainlobe. However, for lower values of $N$, the fine peak–valley structures of the sidelobes remain blurred, preventing accurate reconstruction of the true beam shape.

To quantify the trade-off between computational cost and fitting performance, and to justify the model selection for practical deployment, a runtime analysis was conducted, as shown in the dual-Y-axis plot in Figure 6. As $N$ increases from 1 to 5, the MAE decreases sharply, converging at $N = 5$. For $N \geq 5$, the MAE curve plateaus, indicating that further increasing the model order yields negligible improvements in geometric fitting accuracy. Beyond this point, $N > 5$ incurs a higher computational burden without improving the reconstructed morphology. In practical maritime deployment, computational resources and latency budgets are strictly constrained. Therefore, $N = 5$ is identified as the optimal inflection point. It balances high-fidelity spatial reconstruction against the low-latency processing required for real-time engineering applications, and the model order is accordingly set to $N = 5$ in this study.

3.2. Model Fitting Experiments

To evaluate the proposed G5S model’s fitting performance on authentic targets, this study compares it against three classical models: the G1, $\mathrm{sinc}^{2}$, and Bessel models. The comparison uses both 1D profiles and 2D reconstructions, and multiple quantitative metrics are employed to assess the overall fitting accuracy.

First, the Normalized Error for Spatial Scale (NESS) between the empirical PSF and the different fitting models is calculated based on the profiles along the range and azimuth directions. The NESS is defined to evaluate the discrepancy in spatial distribution as follows:

N E S S = \frac{\sum_{i = 1}^{K} {[y (i) - \hat{y} (i)]}^{2}}{K \cdot {(m a x (y) - m i n (y))}^{2}}

(28)

where $y(i)$ is the normalized profile of the empirical PSF along either the range or azimuth direction, obtained by averaging the target regions from multiple sets of actual sonar images; $\hat{y}(i)$ is the corresponding model-fitted profile; $K$ is the total number of pixels along the respective 1D direction; and $\max(y) - \min(y)$ is the dynamic range of the profile.
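For reference, the two fitting metrics, Eqs. (27) and (28), reduce to a few lines of NumPy. This sketch assumes that the model and empirical arrays are already normalized and cropped to the same support.

```python
import numpy as np


def mae(p_model, i_emp):
    """Eq. (27): mean absolute error over the H x W bounding box."""
    return float(np.mean(np.abs(np.asarray(p_model, float) - np.asarray(i_emp, float))))


def ness(y, y_hat):
    """Eq. (28): squared error normalized by K and the squared profile range."""
    y = np.asarray(y, float)
    y_hat = np.asarray(y_hat, float)
    return float(np.sum((y - y_hat) ** 2) / (y.size * (y.max() - y.min()) ** 2))


print(mae([[1, 2], [3, 4]], [[0, 2], [3, 6]]))   # (1 + 0 + 0 + 2) / 4 = 0.75
print(ness([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]))   # 4 * 0.25 / (4 * 1) = 0.25
```

Dividing by the squared dynamic range makes NESS comparable across range and azimuth profiles with different peak levels.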

Figure 7 illustrates both the quantitative metrics and the 2D visual reconstructions of the different PSF fitting models. Figure 7a compares the NESS of the different PSF fitting models along the range and azimuth directions. In the range direction, the G5S model reduces the error by approximately a factor of two relative to the standard G1 model, owing to its precise characterization of the pulse sidelobes. In the azimuth direction, traditional models such as G1, $\mathrm{sinc}^{2}$, and Bessel cannot simultaneously account for beam anisotropy and sidelobe features, resulting in severe underfitting. By contrast, the G5S model decreases the azimuth error from approximately $12.58 \times 10^{-3}$ to $0.28 \times 10^{-3}$, a reduction of more than 40-fold.

To further characterize the fitting efficacy, we reconstructed the 2D PSF by using optimal parameters (Figure 7b). The empirical PSF exhibits pronounced anisotropy and distinct sidelobe structures. Traditional models produce over-smoothed, single-kernel elliptical spots, failing to capture these structural edge details. The proposed G5S model accurately restores the high-intensity central mainlobe and successfully reconstructs the faint range-direction sidelobes. This comprehensive spatial reconstruction achieves physics-level morphological fidelity consistent with experimental data.

The absolute residual maps and MAE values (Figure 7c) further corroborate this observation. Traditional single-kernel models all exhibit high-intensity annular error bands, indicating a significant loss of structural target energy. The G5S residual map, in contrast, contains mainly low-amplitude, noise-like residuals, and its MAE falls to 0.0183. This confirms that the model captures the anisotropic features precisely and overcomes the model mismatch inherent in traditional methods. It thus provides a more accurate mathematical representation of sonar targets.

To demonstrate the generalizability and training-free capability of the G5S model, this study conducted four sets of simulation experiments configured according to the specifications of actual sonar systems. To ensure a rigorous comparison, a UCA and standard time-domain beamforming were uniformly applied across all groups to generate point target responses under ideal high-SCR conditions, serving as the physical ground truth. We use the 2D Pearson correlation coefficient (PCC) as the evaluation metric to quantify the predictive accuracy of the G5S model for the PSFs of different sonar systems. The PCC is defined as follows:

P C C = \frac{C o v (I_{G T}, I_{p r e d})}{σ_{G T} \cdot σ_{p r e d}}

(29)
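Eq. (29) is the ordinary Pearson coefficient computed over all pixels of the two 2D responses. The minimal version below assumes both images share the same grid. The normalization convention of the covariance and standard deviations cancels in the ratio.

```python
import numpy as np


def pcc(i_gt, i_pred):
    """Eq. (29): Cov(I_GT, I_pred) / (sigma_GT * sigma_pred) over all pixels."""
    a = np.asarray(i_gt, float).ravel()
    b = np.asarray(i_pred, float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


rng = np.random.default_rng(1)
gt = rng.random((65, 15))
print(round(pcc(gt, 2 * gt + 3), 6), round(pcc(gt, -gt), 6))   # 1.0 -1.0
```

Because PCC is invariant to gain and offset, it measures agreement in shape rather than in absolute intensity.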

Figure 8 presents the results of these experiments. Each row lists the specific hardware parameters as labels and displays the 2D target responses alongside the 1D comparative profiles in both the azimuth and range directions.

In the range dimension, a detailed comparison shows that as the bandwidth $B$ increases from 15 kHz to 60 kHz, the G5S model accurately predicts the narrowing of the mainlobe width. This behavior agrees with the theoretical relation derived in Equation (5), in which the physical size $H_{T}$ is inversely proportional to the bandwidth. Similarly, in the azimuth dimension, the comparative profiles show that as the aperture diameter $D$ increases from 250 mm to 510 mm, the model precisely reconstructs the compression of the beamwidth that accompanies the larger array aperture. The PCC scores given in the labels of Figure 8 confirm this visual consistency quantitatively. Across all four configurations, the PCC between the G5S predictions and the simulated ground truth consistently exceeds 0.969, reaching a peak of 0.980 for Parameter 3. This confirms that the G5S model functions not merely as a high-precision fitting tool but as a general-purpose sonar imaging operator with strong physical interpretability.
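The trends in Figure 8 follow the usual first-order resolution scalings. The rough check below rests on these assumptions, which are order-of-magnitude estimates rather than the simulation settings behind Figure 8:

- A sound speed of 1500 m/s.
- The standard range resolution $c/(2B)$, which may differ by a constant factor from the exact form of Equation (5).
- An aperture-limited beamwidth of order $λ/D$.
- The 100 kHz center frequency of the field system.

```python
C = 1500.0          # assumed sound speed in water (m/s)
F0 = 100e3          # assumed centre frequency (Hz), taken from the field system


def range_resolution(bandwidth_hz, c=C):
    """First-order range resolution c / (2B), in metres."""
    return c / (2.0 * bandwidth_hz)


def beamwidth_rad(aperture_m, f0=F0, c=C):
    """Order-of-magnitude aperture-limited beamwidth lambda / D, in radians."""
    return (c / f0) / aperture_m


for b in (15e3, 60e3):
    print(f"B = {b / 1e3:.0f} kHz -> range resolution ~ {100 * range_resolution(b):.2f} cm")
for d in (0.25, 0.51):
    print(f"D = {1e3 * d:.0f} mm -> beamwidth ~ {beamwidth_rad(d):.4f} rad")
```

A fourfold increase in bandwidth shrinks the range cell fourfold (5.00 cm to 1.25 cm). Roughly doubling the aperture roughly halves the beamwidth (0.0600 rad to 0.0294 rad). Both match the qualitative narrowing described above.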

3.3. Target Detection Experiments

Following the validation of the physical consistency of the model, this section focuses on the practical engineering performance of the algorithm in real-world underwater environments. The primary objective is to comprehensively verify the detection effectiveness of the proposed LoG5S-LAD method.

3.3.1. Evaluation Metrics

To enable an objective and quantitative assessment of detection algorithms, this study explicitly utilizes SCR, BSF, F1-Score, and AUC as objective metrics to comprehensively evaluate and compare the detection accuracy and background suppression capabilities across different algorithms.

(1): Signal-to-Clutter Ratio (SCR)

To quantify the target enhancement capability of the algorithm on sonar images, we employ the SCR as the primary evaluation metric. The specific calculation is formulated as follows:

S C R = \frac{|μ_{t} - μ_{b}|}{σ_{b}}

(30)

where $μ_{t}$ and $μ_{b}$ are the average grayscale intensities of the target region and the local background region, respectively, and $σ_{b}$ is the standard deviation of the grayscale intensities within the local background region. A higher SCR indicates a more distinct target against the background clutter and thus better enhancement performance.

(2): Background Suppression Factor (BSF)

The BSF quantifies the algorithm’s capability to suppress pervasive background clutter. A larger BSF indicates a stronger suppression effect and a clearer distinction between the target signal and the background. The BSF is calculated as follows:

B S F = \frac{σ_{i n}}{σ_{o u t}}

(31)

where $σ_{in}$ is the standard deviation of the input image and $σ_{out}$ is the standard deviation of the image processed by the algorithm.
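Eqs. (30) and (31) can be computed as below. This section does not specify how the target and local-background regions are delimited. The sketch therefore takes them as caller-supplied boolean masks (a hypothetical interface) and uses population standard deviations.

```python
import numpy as np


def scr(img, target_mask, background_mask):
    """Eq. (30): |mu_t - mu_b| / sigma_b."""
    mu_t = img[target_mask].mean()
    background = img[background_mask]
    return float(abs(mu_t - background.mean()) / background.std())


def bsf(img_in, img_out):
    """Eq. (31): sigma_in / sigma_out over the whole image."""
    return float(np.std(img_in) / np.std(img_out))


img = np.tile([0.0, 2.0], (10, 5))        # background alternating 0, 2 (mean 1, std 1)
target = np.zeros(img.shape, dtype=bool)
target[4:6, 4:6] = True
img[target] = 5.0
print(scr(img, target, ~target))          # |5 - 1| / 1 = 4.0
print(bsf(img, img / 2))                  # halving the output spread gives 2.0
```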

(3): F1-Score

The F1-Score is defined as the harmonic mean of precision and recall. Given the complexity of underwater environments, relying exclusively on a single metric—either precision or recall—is insufficient to fully characterize algorithmic robustness. Therefore, the F1-Score is employed to effectively balance the trade-off between missed detections and false alarms, providing a comprehensive assessment of detection performance. The specific calculation is given by:

\text{F1-Score} = 2 \times \frac{\mathrm{Precision} \times P_{d}}{\mathrm{Precision} + P_{d}}

(32)

P_{d} = \frac{T P}{T P + F N}

(33)

\mathrm{Precision} = \frac{T P}{T P + F P}

(34)

where $TP$ is the number of correctly identified targets, $FN$ is the number of actual targets that were missed, and $FP$ is the number of false positives. An F1-Score closer to 1 indicates that the algorithm controls the false alarm rate while maintaining a high detection rate.
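The counts in Eqs. (32) to (34) combine as follows. The guard clauses that return 0 for empty denominators are our own convention and are not specified in the text.

```python
def detection_scores(tp, fp, fn):
    """Eqs. (32)-(34): detection probability, precision, and F1-score from counts."""
    pd = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * pd / (precision + pd) if precision + pd else 0.0
    return pd, precision, f1


print([round(v, 3) for v in detection_scores(tp=8, fp=2, fn=2)])   # [0.8, 0.8, 0.8]
print([round(v, 3) for v in detection_scores(tp=9, fp=9, fn=1)])   # [0.9, 0.5, 0.643]
```

The second case shows how many false alarms pull the F1-Score down despite a high detection probability.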

(4): Area Under the Curve (AUC)

AUC represents the area under the ROC curve, reflecting the comprehensive detection performance of the algorithm across different sensitivities. AUC is calculated as follows:

A U C = \int_{0}^{1} P_{d} (F_{a}) d F_{a}

(35)

where $P_{d}$ is the detection probability and $F_{a}$ is the false alarm probability. An AUC closer to 1 indicates a stronger ability to distinguish the target region from background clutter, implying greater robustness to threshold variations.
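In practice, Eq. (35) is evaluated numerically: a threshold is swept over a score map and the resulting empirical ROC curve is integrated. The sketch below assumes a pixel-level ground-truth mask, since the exact labelling protocol is not given here. It groups tied scores and applies the trapezoidal rule.

```python
import numpy as np


def roc_auc(scores, labels):
    """Empirical ROC (P_d versus F_a) over all thresholds, integrated by trapezoids."""
    s = np.asarray(scores, float).ravel()
    y = np.asarray(labels, bool).ravel()
    order = np.argsort(-s, kind="mergesort")             # descending score
    s, y = s[order], y[order]
    last = np.r_[np.nonzero(np.diff(s))[0], s.size - 1]  # last index of each tied group
    tp = np.cumsum(y)[last]
    fp = np.cumsum(~y)[last]
    pd = np.r_[0.0, tp / y.sum()]                        # detection probability
    fa = np.r_[0.0, fp / (~y).sum()]                     # false alarm probability
    return float(np.sum(np.diff(fa) * (pd[1:] + pd[:-1]) / 2.0))


print(roc_auc([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))   # 0.75
```

This equals the Mann-Whitney estimate of the probability that a randomly chosen target pixel scores higher than a randomly chosen background pixel, with ties counted as one half.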

The F1-Score quantifies the detection performance at a specific, optimal threshold. In contrast, the AUC provides a threshold-independent assessment, objectively reflecting the algorithm’s robustness and global detection capability across all potential operating points. Given their complementary nature, both metrics are employed in this section to ensure a comprehensive and rigorous evaluation of the proposed algorithm.

3.3.2. Experimental Performance Analysis

To evaluate the proposed algorithm, comparative experiments were conducted against five representative traditional methods (Top-hat, MS-LoG, LIG, DGRAD, and MS-AAGD) and a lightweight Convolutional Neural Network (CNN) based on the You Only Look Once version 2 (YOLOv2) architecture. The traditional methods were selected due to their proven effectiveness in small target detection under high-noise conditions, while YOLOv2 was adopted as a representative data-driven approach suitable for edge deployment under size, weight, and power constraints.

More complex deep learning architectures, such as Transformer-based models, were not considered in this forward-looking sonar (FLS) scenario. This is primarily due to limited availability of high-quality annotated sonar data, which may lead to overfitting in data-intensive models, as well as their high computational cost, which restricts real-time deployment on marine platforms. In addition, the proposed method is grounded in acoustic imaging principles, whereas data-driven approaches rely on statistical learning, making direct comparison less straightforward. Therefore, a lightweight CNN baseline is included to ensure a balanced and practically relevant evaluation.

Prior to the comparative experiments, all sonar images were resampled to a fixed $500 \times 500$ spatial grid using bilinear interpolation to improve computational efficiency. Compared with direct decimation, this approach better preserves local neighborhood information and reduces small-target energy loss and visual aliasing, while maintaining target saliency and background statistical characteristics [35].

To facilitate intuitive visualization, we use three-dimensional (3D) surface plots for both the global images and the local target regions. In the first column, the red bounding box indicates the magnified target region; in the subsequent columns, red ellipses denote the corresponding target responses produced by the different algorithms. To evaluate performance across diverse underwater scenarios, three high-SCR and two low-SCR FLS images were selected for comparative analysis. The corresponding SCR values for Images 1 through 5 are 2.037, 2.350, 2.062, 0.323, and 0.126, respectively.

As the 3D intensity plots in Figure 9 show, the raw FLS data are heavily affected by speckle noise. Among the comparative methods, Top-hat, MS-LoG, and MS-AAGD exhibit incomplete background suppression with noticeable residual clutter. In contrast, LIG and DGRAD produce relatively flat backgrounds but tend to attenuate target energy. The CNN baseline (YOLOv2) also retains significant high-frequency noise, indicating limited robustness to speckle interference.

A detailed analysis of individual cases further highlights these differences. In Images 1–3, Top-hat and MS-LoG fail to remove local residual noise, while DGRAD and MS-AAGD show unstable target preservation, with DGRAD missing the target in Image 2 due to over-smoothing. LIG suppresses clutter effectively in Image 1 but produces a false response in Image 3 due to structural interference. The CNN approach remains heavily cluttered with high-frequency spikes across all views, indicating severe overfitting to speckle that easily triggers false alarms. In contrast, LoG5S-LAD maintains stable performance, achieving strong background suppression while preserving distinct target responses across varying conditions.

Figure 10 presents two sonar frames with extremely low SCR to evaluate algorithmic robustness against severe reverberation. In Image 4, which is affected by vertical noise and interference, Top-hat and MS-LoG fail to suppress peri-target clutter, while MS-AAGD amplifies the background reverberation. DGRAD misses the target, whereas LIG achieves effective background suppression by exploiting local intensity contrast. Image 5 presents a speckle-dominated environment. Most traditional methods struggle to isolate the target, increasing the risk of false alarms. Although LIG suppresses background noise, it distorts the target structure, producing fragmented responses. DGRAD again fails under these conditions. The data-driven CNN baseline misses the target entirely in both cases: severe reverberation submerges the physical target energy, and the network fails to extract distinct spatial contours. The proposed method effectively suppresses background clutter in both cases and restores high-contrast target peaks within the ROIs.

To objectively evaluate detection performance, SCR, BSF, AUC, and F1-score were computed for five sonar images. The CNN-based method was excluded from this quantitative analysis due to fundamental differences in output representation. Specifically, the proposed and traditional methods operate on pixel-level intensity enhancement, enabling direct evaluation of background suppression and spatial structure, whereas the CNN produces bounding boxes and confidence scores, making SCR and BSF computation inapplicable and introducing bias in pixel-level metrics such as AUC and F1-score.

Figure 11 shows the SCR and BSF results; the proposed method (highlighted in red) consistently outperforms the comparison algorithms. This advantage is particularly evident in Image 4 and Image 5, where methods such as DGRAD and LIG degrade significantly. In contrast, the proposed method achieves strong target enhancement and near-complete background suppression, demonstrating robust performance in complex underwater environments.

The quantitative results of AUC and F1-score are presented in Figure 12. In terms of F1-score, LoG5S-LAD shows clear advantages in critical scenarios, particularly in Image 2 and Image 3, indicating robust detection under high-SCR conditions. In contrast, under low-SCR conditions (Image 4 and Image 5), the performance of DGRAD, MS-AAGD, and LIG degrades significantly, with values below 0.01. Although MS-LoG achieves relatively competitive results, the proposed method maintains consistently high F1-scores with negligible degradation, demonstrating superior overall robustness. In terms of AUC, DGRAD drops below 0.1, confirming its failure in these scenarios. Top-hat and LIG yield generally low AUC values, indicating limited separability for threshold-based segmentation. MS-LoG achieves the highest AUC across all frames, with the proposed method consistently ranking second. This is attributed to the linear nature of MS-LoG, which preserves continuous grayscale variations and pixel ranking under threshold changes.

Although the high AUC of MS-LoG suggests theoretical separability of targets, its low BSF indicates that targets remain embedded in strong background residuals, making practical detection highly dependent on adaptive thresholding. In contrast, the proposed method employs a nonlinear Sigmoid gating mechanism to achieve strong background suppression, generating a high-contrast target-likelihood map. While this nonlinear process slightly reduces the theoretical AUC and limits the maximum F1-score, it enables detection in a nearly clutter-free environment. Consequently, the proposed method avoids complex threshold optimization and allows stable target extraction using a fixed threshold. Overall, the method trades a minor loss in theoretical metrics for substantial gains in robustness and practical applicability in complex underwater environments.

To further evaluate the algorithm in complex maritime environments, a continuous-frame experiment was conducted using a sequence of 94 sonar images acquired during field operations. Platform motion and environmental variations introduce dynamic noise with varying characteristics, making the dataset representative of realistic conditions.

As illustrated in Figure 13, the severe performance fluctuations of traditional methods directly reflect the challenging nature of this dynamic environment. Figure 13a,b present the SCR and BSF. Under continuous seabed reverberation, the BSF values of MS-LoG and Top-Hat remain on the order of $10^{0}$, whereas those of LIG, DGRAD, and MS-AAGD fluctuate sharply between $10^{0}$ and $10^{3}$. This indicates that these methods are highly sensitive to seabed heterogeneity and varying noise types and cannot reliably suppress dynamic background interference. In contrast, the proposed LoG5S-LAD exhibits strong environmental robustness: its BSF consistently remains on the order of $10^{3}$ to $10^{4}$, and its SCR also maintains a clear lead. Figure 13c,d show the overall detection capability of the algorithms. Throughout this experiment, the F1-Score of LoG5S-LAD leads in almost every frame, and the AUC results are consistent with the previous analysis.
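For reference, here is a minimal sketch of how SCR and BSF are commonly computed in the small-target literature: SCR $= |\mu_t - \mu_b| / \sigma_b$ and BSF $= \sigma_{b,\mathrm{in}} / \sigma_{b,\mathrm{out}}$. These definitions and the choice of target and background masks are assumptions here. The definitions used for Figure 13 are given in Section 3.3.1 and may differ in detail, for example in the size of the background neighborhood. The toy example shows why a detector that shrinks background fluctuation by a factor of one thousand reports a BSF on the order of $10^{3}$.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero on perfectly flat backgrounds

def scr(img, target_mask, bg_mask):
    """SCR = |mu_t - mu_b| / sigma_b over caller-supplied target/background masks."""
    img = np.asarray(img, dtype=float)
    mu_t = img[target_mask].mean()
    mu_b = img[bg_mask].mean()
    return abs(mu_t - mu_b) / (img[bg_mask].std() + EPS)

def bsf(img_in, img_out, bg_mask):
    """BSF = sigma_b(input) / sigma_b(output): factor by which background fluctuation is reduced."""
    s_in = np.asarray(img_in, dtype=float)[bg_mask].std()
    s_out = np.asarray(img_out, dtype=float)[bg_mask].std()
    return s_in / (s_out + EPS)

# Toy example: +/-1 checkerboard background with a 2x2 target of amplitude 5.
img_in = np.tile(np.array([[1.0, -1.0], [-1.0, 1.0]]), (8, 8))
target = np.zeros(img_in.shape, dtype=bool)
target[7:9, 7:9] = True
img_in[target] = 5.0
background = ~target

img_out = img_in * 1e-3   # a detector that shrinks background fluctuation 1000x...
img_out[target] = 5.0     # ...while leaving the target amplitude unchanged

print(scr(img_in, target, background), scr(img_out, target, background), bsf(img_in, img_out, background))
```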

This continuous-frame evaluation indicates that the proposed algorithm is robust and generalizes well in complex, unconstrained real-world maritime environments, supporting its reliability for practical engineering applications.

3.3.3. Parameter Settings and Analysis

This section details the configuration and sensitivity analysis of the core hyperparameters of the LoG5S-LAD model: $T_{shape}$, $g$, and $\eta$. Theoretically, $T_{shape}$ dictates the discrimination between targets and noise, $g$ regulates contrast enhancement via the steepness of the mapping, and $\eta$ modulates the detection recall.
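To make the role of $g$ concrete, the sketch below assumes a logistic mapping $1/(1+e^{-g(x-c)})$ applied to a response normalized to [0, 1], with a hypothetical center `c = 0.5`. The paper's exact gating formula is given in Section 2.2 and may include other terms. Under this assumption the 10% to 90% transition width is exactly $2\ln 9 / g$, so increasing $g$ moves the mapping from soft toward hard thresholding. `sigmoid_map` and `transition_width` are illustrative names.

```python
import numpy as np

def sigmoid_map(x, g, c=0.5):
    """Hypothetical sigmoid contrast mapping 1 / (1 + exp(-g (x - c))) on a [0, 1] response."""
    z = np.clip(g * (np.asarray(x, dtype=float) - c), -500.0, 500.0)  # avoid exp overflow
    return 1.0 / (1.0 + np.exp(-z))

def transition_width(g):
    """Width in x over which the map rises from 0.1 to 0.9: 2 ln 9 / g (independent of c)."""
    return 2.0 * np.log(9.0) / g

for g in (5.0, 25.0, 45.0):
    print(f"g={g:>4}: 10-90% width = {transition_width(g):.3f}")
```

Over the search interval $g \in [5, 45]$ used in the grid search, this width shrinks from about 0.88 to about 0.098 of the normalized response range.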

To determine the optimal settings, a joint grid search is performed within a search space empirically bounded by the statistical priors from Figure 3. Specifically, the search intervals are defined as $T_{shape} \in [0.40, 0.90]$ and $g \in [5, 45]$. These ranges are chosen to capture both the morphological transition from noise-dominant to target-dominant regions and the functional evolution from soft to hard thresholding.
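The joint search can be sketched as an exhaustive sweep over the two intervals. The step sizes (0.05 for $T_{shape}$, 5 for $g$), the `evaluate` callback, and the reduction to a single scalar score are assumptions for illustration. The paper selects parameters by weighing SCR, BSF, F1-Score, and AUC together.

```python
import itertools
import numpy as np

# Search intervals from the text; the step sizes are assumptions.
T_SHAPE_GRID = np.round(np.arange(0.40, 0.90 + 1e-9, 0.05), 2)  # 0.40, 0.45, ..., 0.90
G_GRID = np.arange(5, 46, 5)                                    # 5, 10, ..., 45

def grid_search(evaluate, t_values=T_SHAPE_GRID, g_values=G_GRID):
    """Score every (T_shape, g) pair; return the best (t, g, score) and the full score table.

    `evaluate(t_shape, g) -> float` is a hypothetical callback that would run the
    detector on validation data and return a figure of merit (higher is better).
    """
    table = {}
    best = None
    for t, g in itertools.product(t_values, g_values):
        key = (float(t), int(g))
        score = float(evaluate(*key))
        table[key] = score
        if best is None or score > best[2]:
            best = (key[0], key[1], score)
    return best, table

def toy_score(t_shape, g):
    # Stand-in surface that peaks at (0.40, 5), loosely mimicking the F1-Score trend.
    return 1.0 - (t_shape - 0.40) ** 2 - 1e-3 * (g - 5) ** 2

best, table = grid_search(toy_score)
print(best, len(table))
```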

The experimental results presented in Figure 14 validate the aforementioned theoretical analysis. Across three distinct parameter combinations of $g$ and $T_{shape}$, the SCR and BSF metrics remain largely stable as $\eta$ ranges from 0.5 to 0.9. However, once $\eta$ exceeds 0.9, these metrics decline noticeably. This drop occurs because an excessively low adaptive threshold causes severe noise leakage: the resulting residual reverberation and speckle raise the background energy variance, which directly reduces both SCR and BSF. While the AUC fluctuates only marginally, between 0.865 and 0.870, the F1-Score increases gradually and consistently with $\eta$, reaching a distinct global peak at $\eta = 0.90$ before declining sharply. Based on a comprehensive evaluation of these four metrics, $\eta$ is set to 0.90 in this study to maintain an optimal balance between background suppression and detection accuracy.
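The adaptive threshold formula is not restated in this section (it is defined in Section 2.2.3). The sketch below therefore uses a hypothetical form, $T(x,y) = \mu_W + (1-\eta)(\max_W - \mu_W)$, computed over a $W_{xy}$ window. This form was chosen only because it reproduces the qualitative behavior described above: as $\eta \to 1$ the threshold falls toward the local mean, which raises recall but lets residual reverberation leak through. `local_adaptive_threshold` and its `tol` argument are illustrative.

```python
import numpy as np
from scipy import ndimage

def local_adaptive_threshold(img, eta=0.90, window=61, tol=1e-9):
    """Hypothetical local threshold T = mu_W + (1 - eta) * (max_W - mu_W).

    Larger eta pulls T down toward the local mean (higher recall, more leakage);
    smaller eta pushes T up toward the local maximum (stricter detection).
    """
    x = np.asarray(img, dtype=float)
    mu = ndimage.uniform_filter(x, size=window, mode="reflect")  # local mean over the window
    mx = ndimage.maximum_filter(x, size=window, mode="reflect")  # local maximum over the window
    thr = mu + (1.0 - eta) * (mx - mu)
    # tol absorbs round-off from the running-sum box filter on flat regions.
    return x > thr + tol, thr

img = np.zeros((64, 64))
img[30:33, 30:33] = 1.0  # a compact bright response on an otherwise clean likelihood map
mask, thr = local_adaptive_threshold(img, eta=0.90, window=15)
print(mask.sum())
```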

Figure 15 illustrates the impact of the parameters $g$ and $T_{shape}$ on the detection performance of the proposed algorithm. The SCR and BSF metrics exhibit distinct sensitivities to the parameter space: the SCR displays a sharp, isolated maximum at $g = 5$ and $T_{shape} = 0.40$, whereas the BSF peaks at a higher gain of $g = 25$. Away from these peaks, the response remains relatively flat along the $T_{shape}$ axis, confirming that the morphological threshold functions primarily as a geometric filter without substantially altering the overall output energy level. The F1-Score likewise reaches its maximum at $g = 5$ and $T_{shape} = 0.40$ and then declines continuously as $g$ increases. This occurs because an excessively large $g$, while potentially providing stronger background suppression, also erodes the edges of weak targets, reducing the effective target pixel area or even causing missed detections. Notably, the AUC is highly stable across the entire parameter search space, with fluctuations below 1%, which further supports the robustness of the proposed algorithm. Based on this analysis, $g$ is set to 5 to achieve an order-of-magnitude improvement in target contrast while mitigating the risk of target erosion, and $T_{shape}$ is set to 0.40 to balance the suppression of stripe interference against the preservation of target morphology.
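As an illustration of how a Hessian shape threshold such as $T_{shape}$ can separate compact echoes from stripes, the sketch below scores local isotropy as the eigenvalue ratio $\lambda_1/\lambda_2$ of the scale-normalized Hessian, where both eigenvalues are negative for a bright blob. It keeps pixels whose best score over the scales listed in Table 2 reaches $T_{shape}$. This isotropy measure, the use of the Table 2 scale values directly as Gaussian $\sigma$, and the function name `shape_gate` are assumptions. The paper's gating criterion (Section 2.2.1, Figure 3) may use a different curvature measure.

```python
import numpy as np
from scipy import ndimage

def shape_gate(img, t_shape=0.40, scales=(1.0, 1.2), eps=1e-12):
    """Hypothetical Hessian shape gate based on an eigenvalue isotropy ratio.

    For each scale s, the scale-normalized Hessian is built from Gaussian-derivative
    filters and its eigenvalues l1 >= l2 are computed. Isotropy is l1 / l2 when both
    are negative (bright, compact blob): close to 1 for point-like echoes, close to 0
    for stripes. The maximum score over scales is compared with t_shape.
    """
    x = np.asarray(img, dtype=float)
    best = np.zeros_like(x)
    for s in scales:
        hrr = ndimage.gaussian_filter(x, s, order=(2, 0)) * s**2  # d2/d(row)^2
        hcc = ndimage.gaussian_filter(x, s, order=(0, 2)) * s**2  # d2/d(col)^2
        hrc = ndimage.gaussian_filter(x, s, order=(1, 1)) * s**2  # mixed derivative
        half_trace = 0.5 * (hrr + hcc)
        disc = np.sqrt((0.5 * (hrr - hcc)) ** 2 + hrc**2)
        l1, l2 = half_trace + disc, half_trace - disc  # closed-form 2x2 eigenvalues, l1 >= l2
        # Ratio in [0, 1] only where the stronger curvature is negative (bright peak).
        ratio = np.where(l2 < -eps, np.clip(l1 / np.minimum(l2, -eps), 0.0, 1.0), 0.0)
        best = np.maximum(best, ratio)
    return best >= t_shape, best

# Toy scene: a compact Gaussian blob and a horizontal stripe.
yy, xx = np.mgrid[0:64, 0:64].astype(float)
blob = np.exp(-((yy - 20) ** 2 + (xx - 20) ** 2) / (2 * 1.5**2))
stripe = np.exp(-((yy - 45) ** 2) / (2 * 1.5**2))
gate, score = shape_gate(blob + stripe)
print(gate[20, 20], gate[45, 32])
```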

In addition to the core parameters optimized through the aforementioned experiments, the remaining statistical and morphological parameters of the proposed algorithm are set according to the physical characteristics of sonar imaging. Specifically, the local adaptive sliding window size $W_{xy}$ is calculated from the 100 kHz sonar data selected for this experiment, ensuring that the window contains sufficient samples while preserving target integrity. The minimum connected-component area threshold $\tau_{min}$ is determined from the size of real point targets, filtering out discrete speckle noise while safeguarding weak and small targets. The high-energy exemption coefficient $\alpha$ is determined by the $10\sigma$ statistical significance criterion [36], ensuring that only strong targets with extremely high confidence are forcibly retained. The parameter settings used in this framework are summarized in Table 2.
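The post-processing rules described above can be sketched with connected-component labelling. Components smaller than $\tau_{min}$ pixels are discarded unless their peak exceeds the background mean by $\alpha$ standard deviations. Two choices here are assumptions: the background statistics are estimated from all pixels outside the detection mask, and 4-connectivity (SciPy's default) is used. The paper may instead estimate the statistics locally or robustly. `screen_components` is an illustrative name.

```python
import numpy as np
from scipy import ndimage

def screen_components(img, mask, tau_min=5, alpha=10.0):
    """Drop components with area < tau_min unless their peak exceeds mean_b + alpha * std_b."""
    x = np.asarray(img, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    labels, n = ndimage.label(mask)                   # 4-connected components
    if n == 0:
        return np.zeros_like(mask)
    areas = np.bincount(labels.ravel())[1:]           # pixel count per component
    peaks = np.asarray(ndimage.maximum(x, labels, np.arange(1, n + 1)))
    bg = x[~mask]
    exempt_level = bg.mean() + alpha * bg.std()       # hypothetical alpha-sigma exemption level
    keep = (areas >= tau_min) | (peaks > exempt_level)
    lut = np.concatenate(([False], keep))             # label 0 is background
    return lut[labels]

# Toy scene on a +/-1 checkerboard background (mean 0, std 1).
img = np.tile(np.array([[1.0, -1.0], [-1.0, 1.0]]), (32, 32))
mask = np.zeros(img.shape, dtype=bool)
mask[5, 5:8] = True; img[5, 5:8] = 2.0                 # small and weak: removed
mask[20, 20:22] = True; img[20, 20:22] = 50.0          # small but strong: exempted
mask[40:43, 40:43] = True; img[40:43, 40:43] = 3.0     # large enough: kept
out = screen_components(img, mask)
print(out.sum())
```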

4. Discussion

This study conducts a comprehensive analysis of the anisotropic characteristics of target energy in FLS images. It introduces a physics-driven G5S model aimed at accurately fitting the target energy distribution. Based on this physical framework, the LoG5S-LAD algorithm is developed to facilitate the efficient detection of weak underwater targets.

4.1. Method Importance

The proposed LoG5S-LAD algorithm establishes a novel physics-driven paradigm for complex underwater target detection. Under severe speckle noise interference, traditional Gaussian-based detection algorithms are highly prone to signal leakage stemming from inadequate energy matching. In contrast, the G5S model, constructed from a comprehensive analysis of the physical imaging mechanisms of sonar targets, achieves a highly accurate representation of the acoustic energy distribution. The LoG5S filtering operator, formulated from this model, further concentrates the target energy. Furthermore, to address the stripe noise and spatial reverberation heterogeneity characteristic of sonar images, the combination of the Hessian matrix and adaptive thresholding robustly suppresses background clutter while preserving the morphology of weak targets. Together, these components significantly enhance both detection efficacy and algorithmic robustness.

4.2. Algorithm Limitations

Despite achieving robust experimental performance in sonar image background suppression and target detection, the proposed method is primarily limited by its high computational complexity. First, high-precision physical modeling inherently demands substantial computational resources; although exploiting the spatial symmetry of the model optimizes parameter solving, the computation of critical parameters remains intensive. Furthermore, the noise suppression process requires calculating second-order partial derivatives to construct the Hessian matrix, while the adaptive thresholding necessitates intensive sliding-window operations across the entire image. This compounding computational load, characterized by extensive floating-point and matrix operations, may pose a major bottleneck for practical engineering deployment in edge sonar systems with strictly constrained hardware resources.

Consequently, our future research will primarily focus on exploring fast approximate solution strategies for the G5S model to alleviate this algorithmic overhead. Concurrently, efforts will be directed toward integrating heterogeneous computing hardware platforms, such as Field-Programmable Gate Arrays (FPGAs) or embedded GPUs. By achieving low-level acceleration via highly parallel processing and leveraging algorithm-hardware co-optimization, we aim to facilitate the real-time practical deployment of the proposed LoG5S-LAD framework.

5. Conclusions

This study addressed the critical challenge of low detection efficiency for weak and small targets in FLS images under strong reverberation and low SCR. Thorough analysis of the sonar imaging mechanism yields a novel physics-driven G5S model for accurate target energy characterization. Compared with conventional Gaussian-based models, the proposed model provides a more faithful description of sidelobe structures, thereby achieving superior physical consistency.

Based on this physical framework, the LoG5S-LAD target detection algorithm was developed. The method integrates a Hessian-matrix-based geometric gating mechanism with a custom LoG5S filtering kernel to achieve precise target energy aggregation. These processes are fused into a unified target likelihood map, which, refined by robust morphological post-processing, enables strong reverberation suppression while effectively preserving the morphological responses of weak targets.

Extensive experiments validated both the modeling accuracy and the detection performance of the framework. Model fitting results demonstrated that the G5S model reduces fitting errors by factors of approximately 2 and 40 in the azimuth and range directions, respectively, compared with classical models. Simulation results across multiple FLS configurations yielded PCC values consistently above 0.9, indicating highly stable target characterization. Furthermore, comparative experiments on real-world datasets showed that the proposed method achieves order-of-magnitude improvements in SCR and BSF over conventional baselines. The F1-Score and AUC metrics further confirmed that the method effectively mitigates performance degradation under strong reverberation and outperforms purely data-driven approaches in terms of interpretability and false alarm control.

Overall, the G5S model and the LoG5S-LAD framework provide highly reliable target modeling capabilities and exceptionally stable detection performance in complex underwater acoustic environments. The proposed methodology offers a robust, physics-grounded, and practical approach for weak and small target detection in engineering FLS applications under severe low-SCR conditions.

Author Contributions

Conceptualization, Y.W. and H.L.; methodology, Y.W. and J.W. (Jian Wang); software, Y.W.; validation, Y.W. and H.L.; formal analysis, Y.W. and J.W. (Jian Wang); investigation, J.W. (Jian Wang) and Z.Z.; resources, H.L.; data curation, H.L.; writing—original draft preparation, Y.W.; writing—review and editing, Y.W., J.W. (Jian Wang) and H.L.; visualization, Y.W.; supervision, J.W. (Jiani Wen) and H.L.; project administration, J.W. (Jiani Wen) and H.L.; funding acquisition, H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant No. 42576186.

Data Availability Statement

The datasets presented in this article are not readily available because they contain proprietary sonar signal parameters from an ongoing research project and are subject to pending intellectual property rights. Requests to access the datasets should be directed to the corresponding author at hsli@hrbeu.edu.cn.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Wang, J.; Li, H.; Dong, C.; Wang, J.; Zheng, B.; Xing, T. An underwater side-scan sonar transfer recognition method based on crossed point-to-point second-order self-attention mechanism. Remote Sens. 2023, 15, 4517.
2. Zhou, T.; Si, J.; Wang, L.; Xu, C.; Yu, X. Automatic detection of underwater small targets using forward-looking sonar images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4207912.
3. Yang, H.; Zhou, T.; Jiang, H.; Yu, X.; Xu, S. A lightweight underwater target detection network for forward-looking sonar images. IEEE Trans. Instrum. Meas. 2024, 73, 2525113.
4. Wang, K.; Liu, P.; Zhang, C. ProNet: Underwater forward-looking sonar images target detection network based on progressive sensitivity capture. Comput. Mater. Contin. 2025, 82, 4931–4948.
5. Zou, L.; Liang, B.; Cheng, X.; Li, S.; Lin, C. Sonar image target detection for underwater communication system based on deep neural network. Comput. Model. Eng. Sci. 2023, 137, 2641–2659.
6. He, J.; Chen, J.; Xu, H.; Ayub, M.S. Small target detection method based on low-rank sparse matrix factorization for side-scan sonar images. Remote Sens. 2023, 15, 2054.
7. Sun, Y.; Zheng, H.; Zhang, G.; Ren, J.; Xu, H.; Xu, C. DP-ViT: A dual-path vision transformer for real-time sonar target detection. Remote Sens. 2022, 14, 5807.
8. Guo, Q.; Xie, K.; Ye, W.; Zhou, T.; Xu, S. A sparse Bayesian learning method for moving target detection and reconstruction. IEEE Trans. Instrum. Meas. 2025, 74, 4505413.
9. Deshpande, S.D.; Er, M.H.; Venkateswarlu, R.; Chan, P. Max-mean and max-median filters for detection of small targets. In Signal and Data Processing of Small Targets 1999, Proceedings of the SPIE’s International Symposium on Optical Science, Engineering, and Instrumentation, Denver, CO, USA, 4 October 1999; Drummond, O.E., Ed.; SPIE: Bellingham, WA, USA, 1999; pp. 74–83.
10. Zeng, M.; Li, J.; Peng, Z. The design of Top-Hat morphological filter and application to infrared target detection. Infrared Phys. Technol. 2006, 48, 67–76.
11. Haralick, R.M.; Sternberg, S.R.; Zhuang, X. Image analysis using mathematical morphology. IEEE Trans. Pattern Anal. Mach. Intell. 1987, PAMI-9, 532–550.
12. Chen, C.L.P.; Li, H.; Wei, Y.; Xia, T.; Tang, Y.Y. A local contrast method for small infrared target detection. IEEE Trans. Geosci. Remote Sens. 2014, 52, 574–581.
13. Han, J.; Ma, Y.; Zhou, B.; Fan, F.; Liang, K.; Fang, Y. A robust infrared small target detection algorithm based on human visual system. IEEE Geosci. Remote Sens. Lett. 2014, 11, 2168–2172.
14. Qin, Y.; Li, B. Effective infrared small target detection utilizing a novel local contrast method. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1890–1894.
15. Wei, Y.; You, X.; Li, H. Multiscale patch-based contrast measure for small infrared target detection. Pattern Recognit. 2016, 58, 216–226.
16. Fan, X.; Li, J.; Min, L.; Feng, L.; Yu, L.; Xu, Z. Dim and small target detection based on energy sensing of local multi-directional gradient information. Remote Sens. 2023, 15, 3267.
17. Wang, G. Efficient method for multiscale small target detection from a natural scene. Opt. Eng. 1996, 35, 761.
18. Moradi, S.; Moallem, P.; Sabahi, M.F. A false-alarm aware methodology to develop robust and efficient multi-scale infrared small target detection algorithm. Infrared Phys. Technol. 2018, 89, 387–397.
19. Lu, Y.; Kou, S.; Wang, X. Micro-Doppler effect and sparse representation analysis of underwater targets. Sensors 2023, 23, 8066.
20. Kim, S.; Lee, J. Scale invariant small target detection by optimizing signal-to-clutter ratio in heterogeneous background for infrared search and track. Pattern Recognit. 2012, 45, 393–406.
21. Fotin, S.V.; Yankelevitz, D.F.; Henschke, C.I.; Reeves, A.P. A multiscale Laplacian of Gaussian (LoG) filtering approach to pulmonary nodule detection from whole-lung CT scans. arXiv 2019, arXiv:1907.08328.
22. Wang, X.; Lv, G.; Xu, L. Infrared dim target detection based on visual attention. Infrared Phys. Technol. 2012, 55, 513–521.
23. Dong, X.; Huang, X.; Zheng, Y.; Shen, L.; Bai, S. Infrared dim and small target detecting and tracking method inspired by human visual system. Infrared Phys. Technol. 2014, 62, 100–109.
24. Han, J.; Ma, Y.; Huang, J.; Mei, X.; Ma, J. An infrared small target detecting algorithm based on human visual system. IEEE Geosci. Remote Sens. Lett. 2016, 13, 452–456.
25. Qi, H.; Tan, S.; Li, Z. Anisotropic weighted total variation feature fusion network for remote sensing image denoising. Remote Sens. 2022, 14, 6300.
26. Dillon, J.; Charron, R. Resolution measurement for synthetic aperture sonar. In Proceedings of the OCEANS 2019 MTS/IEEE SEATTLE, Seattle, WA, USA, 27–31 October 2019; pp. 1–6.
27. Fortunati, S.; Greco, M.S.; Gini, F. Asymptotic robustness of Kelly’s GLRT and adaptive matched filter detector under model misspecification. arXiv 2017, arXiv:1709.08667.
28. Turin, G. An introduction to matched filters. IEEE Trans. Inf. Theory 1960, 6, 311–329.
29. Sun, D.; Ma, C.; Mei, J.; Shi, W. Improving the resolution of underwater acoustic image measurement by deconvolution. Appl. Acoust. 2020, 165, 107292.
30. Peng, Y.; Li, H.; Zhang, W.; Zhu, J.; Liu, L.; Zhai, G. Underwater sonar image classification with image disentanglement reconstruction and zero-shot learning. Remote Sens. 2025, 17, 134.
31. Taxt, T. Restoration of medical ultrasound images using two-dimensional homomorphic deconvolution. IEEE Trans. Ultrason. Ferroelectr. Freq. Control. 1995, 42, 543–554.
32. Yang, C.; Liu, C.; Bai, M.; Zhao, Y.; Ma, Y.; Liu, S. Weighted sparse image quality restoration algorithm for small-pixel high-resolution remote sensing data. Remote Sens. 2025, 17, 2979.
33. Belmonte, A.; Riefolo, C.; Buttafuoco, G.; Castrignanò, A. An approach for spatial statistical modelling remote sensing data of land cover by fusing data of different types. Remote Sens. 2025, 17, 123.
34. Wang, S.; Lin, Q.; Zhao, D.; Chen, Q. How to get airy disc from Airy pattern. Opt. Laser Technol. 1999, 31, 437–441.
35. Getreuer, P. Linear methods for image interpolation. Image Process. Line 2011, 1, 238–259.
36. Badhan, A.; Ganpati, A. Overview of outlier detection methods and evaluation metrics: A review. In Challenges in Information, Communication and Computing Technology; CRC Press: London, UK, 2024; pp. 736–741. ISBN 978-1-003-55909-2.

Figure 1. Schematic illustration of the point target imaging mechanism in sonar system.

Figure 2. The LoG5S-LAD target detection framework.

Figure 3. Geometric curvature contrast: Targets vs. Noise.

Figure 4. Comparison between the theoretical Bessel beam pattern and the standard G1 model fit.

Figure 5. Schematic diagram of the experimental setup and equipment connections.

Figure 6. Comparison between the real sonar target and the GNS model.

Figure 7. Comprehensive evaluation of the proposed G5S model against traditional methods. (a) Quantitative NESS comparison in the azimuth and range directions; (b) 2D reconstructed PSFs illustrating morphological fidelity; (c) Absolute residual maps with corresponding MAE metrics.

Figure 8. Comparison of simulated targets and G5S model results under different parameters.

Figure 9. Sonar image experiments under high SCR conditions. The numbers 1–3 denote the three different test cases (Images 1–3) detailed in the text.

Figure 10. Sonar image experiments under low SCR conditions. The numbers 4 and 5 denote the two test cases (Images 4 and 5) detailed in the text.

Figure 11. SCR and BSF values of the test images: (a) SCR values; (b) BSF values.

Figure 12. F1-Score and AUC values of the test images: (a) F1-Score values; (b) AUC values.

Figure 13. Results of continuous image sequences: (a) SCR; (b) BSF; (c) F1-Score; (d) AUC.

Figure 14. Impact of the sensitivity coefficient $\eta$ on detection performance.

Figure 15. Comprehensive 3D response surfaces illustrating the joint impact of the gain coefficient $g$ and geometric threshold $T_{shape}$ on detection performance: (a) SCR; (b) BSF; (c) F1-Score; and (d) AUC. Black lines on the surfaces represent contour lines of equal value.

Table 1. Optimal parameter set for the G5S model employed in this study.

Dimension	Component	$A_{i}$	$σ_{i}$	$Δ μ_{i}$	Physical Interpretation
Range	$i = 1$	0.894	13.43	0	Mainlobe
	$i = 2,3$	0.118	3.07	13.99	Near Sidelobes
	$i = 4,5$	0.125	5.88	21.98	Far Sidelobes
Azimuth	$i = 1$	0.856	1.47	0	Mainlobe
	$i = 2,3$	0.160	2.56	2.93	Near Sidelobes
	$i = 4,5$	0.072	8.33	8.25	Far Sidelobes
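Table 1 gives one amplitude, width, and offset per mainlobe or sidelobe pair. The sketch below reads this, as an assumption, as a symmetric five-Gaussian profile per dimension, $f(u) = \sum_{i=1}^{5} A_i \exp\!\left(-(u-\mu_i)^2/(2\sigma_i^2)\right)$ with $\mu_1 = 0$, $\mu_{2,3} = \pm\Delta\mu_2$, and $\mu_{4,5} = \pm\Delta\mu_4$. It evaluates the range and azimuth profiles and forms a LoG5S-style kernel as the negated, zero-mean Laplacian of their separable product. Several choices are assumptions rather than the authors' implementation: separability, pixel units for $\sigma_i$ and $\Delta\mu_i$, the kernel half-sizes, and all function names.

```python
import numpy as np

# Table 1 rows as (A_i, sigma_i, delta_mu_i): mainlobe, near-sidelobe pair, far-sidelobe pair.
G5S_RANGE = [(0.894, 13.43, 0.0), (0.118, 3.07, 13.99), (0.125, 5.88, 21.98)]
G5S_AZIMUTH = [(0.856, 1.47, 0.0), (0.160, 2.56, 2.93), (0.072, 8.33, 8.25)]

def g5s_components(params):
    """Expand table rows into five (A, sigma, mu) Gaussians, mirroring each sidelobe pair at +/- delta_mu."""
    comps = []
    for amp, sigma, dmu in params:
        if dmu == 0.0:
            comps.append((amp, sigma, 0.0))
        else:
            comps.extend([(amp, sigma, -dmu), (amp, sigma, dmu)])
    return comps

def g5s(u, params):
    """1D G5S profile: sum of the five Gaussian components evaluated at u."""
    u = np.asarray(u, dtype=float)
    return sum(a * np.exp(-((u - mu) ** 2) / (2.0 * s**2)) for a, s, mu in g5s_components(params))

def g5s_dd(u, params):
    """Analytic second derivative: d2/du2 of exp(-(u-mu)^2 / (2 s^2)) is ((u-mu)^2 - s^2) / s^4 times the Gaussian."""
    u = np.asarray(u, dtype=float)
    return sum(
        a * ((u - mu) ** 2 - s**2) / s**4 * np.exp(-((u - mu) ** 2) / (2.0 * s**2))
        for a, s, mu in g5s_components(params)
    )

def log5s_kernel(half_r, half_a):
    """Negated, zero-mean Laplacian of the separable range x azimuth G5S PSF (rows = range)."""
    r = np.arange(-half_r, half_r + 1, dtype=float)[:, None]
    a = np.arange(-half_a, half_a + 1, dtype=float)[None, :]
    lap = g5s_dd(r, G5S_RANGE) * g5s(a, G5S_AZIMUTH) + g5s(r, G5S_RANGE) * g5s_dd(a, G5S_AZIMUTH)
    k = -lap             # bright compact echoes give a positive response
    return k - k.mean()  # zero mean: a flat background gives zero response

kernel = log5s_kernel(half_r=40, half_a=20)
print(kernel.shape, kernel[40, 20] > 0)
```

Because each dimension contains only Gaussians, the Laplacian is available in closed form, so no finite-difference step is needed when building the kernel.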

Table 2. Simulation parameters.

Parameter	Value	Comment
$σ$	$[1.0, 1.2]$	Scale multipliers for multi-scale Hessian matrix analysis
$T_{s h a p e}$	0.40	Hessian geometric threshold
$g$	5	Sigmoid gain coefficient
$η$	0.90	Adaptive sensitivity coefficient
$W_{x y}$	$61 \times 61$	Locally adaptive sliding window size
$τ_{m i n}$	5	Component area threshold
$α$	10	High-energy exemption factor

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wei, Y.; Wang, J.; Wen, J.; Zhang, Z.; Li, H. Small Target Detection in Forward-Looking Sonar Images via LoG5S-LAD Framework. Remote Sens. 2026, 18, 1518. https://doi.org/10.3390/rs18101518

AMA Style

Wei Y, Wang J, Wen J, Zhang Z, Li H. Small Target Detection in Forward-Looking Sonar Images via LoG5S-LAD Framework. Remote Sensing. 2026; 18(10):1518. https://doi.org/10.3390/rs18101518

Chicago/Turabian Style

Wei, Yuhang, Jian Wang, Jiani Wen, Zengming Zhang, and Haisen Li. 2026. "Small Target Detection in Forward-Looking Sonar Images via LoG5S-LAD Framework" Remote Sensing 18, no. 10: 1518. https://doi.org/10.3390/rs18101518

APA Style

Wei, Y., Wang, J., Wen, J., Zhang, Z., & Li, H. (2026). Small Target Detection in Forward-Looking Sonar Images via LoG5S-LAD Framework. Remote Sensing, 18(10), 1518. https://doi.org/10.3390/rs18101518

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
