Object Detection in Single SAR Images via a Saliency Framework Integrating Bayesian Inference and Adaptive Iteration

Li, Haixiang; Ren, Haohao; Zhou, Yun; Zou, Lin; Wang, Xuegang

doi:10.3390/rs17172939

Open AccessArticle

Object Detection in Single SAR Images via a Saliency Framework Integrating Bayesian Inference and Adaptive Iteration

by

Haixiang Li

,

Haohao Ren

^*,

Yun Zhou

,

Lin Zou

and

Xuegang Wang

School of Information Communication and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China

^*

Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(17), 2939; https://doi.org/10.3390/rs17172939

Submission received: 9 July 2025 / Revised: 18 August 2025 / Accepted: 22 August 2025 / Published: 24 August 2025

(This article belongs to the Special Issue Synthetic Aperture Radar (SAR) Image Object Detection and Information Extraction: Methods and Applications (Second Edition))

Download

Browse Figures

Versions Notes

Abstract

Object detection in single synthetic aperture radar (SAR) imagery has always been essential for SAR interpretation. Over the years, the saliency-based detection method is considered as a strategy that can overcome some inherent deficiencies in traditional SAR detection and arouses widespread attention. Considering that the conventional saliency method usually suffers performance loss in saliency map generation from lacking specific task priors or highlighted non-object regions, this paper is devoted to achieving excellent salient object detection in single SAR imagery via a two-channel framework integrating Bayesian inference and adaptive iteration. Our algorithm firstly utilizes the two processing channels to calculate the object/background prior without specific task information and extract four typical features that can enhance the object presence, respectively. Then, these two channels are fused to generate an initial saliency map by Bayesian inference, in which object areas are assigned with high saliency values. After that, we develop an adaptive iteration mechanism to further modify the saliency map, during which object saliency is progressively enhanced while the background is continuously suppressed. Thus, in the final saliency map, there will be a distinct difference between object components and the background, allowing object detection to be realized easily by global threshold segmentation. Extensive experiments on real SAR images from the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset and SAR Ship Detection Dataset (SSDD) qualitatively and quantitatively demonstrate that our saliency map is superior to those of four classical benchmark methods, and final detection results of the proposed algorithm present better performance than several comparative methods across both ground and maritime scenarios.

Keywords:

synthetic aperture radar; object detection; saliency; two channels; Bayesian inference; adaptive iteration

1. Introduction

As active microwave equipment, synthetic aperture radar (SAR) operates in all weather and day–night, with the capability of penetrating dry soil and vegetation canopies, leading to its widespread adoption in both military and civilian applications [1,2,3,4,5]. Considering that objects in SAR images generally convey essential scene information that is crucial for interpretation, object detection has long been a fundamental research challenge in SAR image analysis.

In practical tasks, we usually need to realize object extraction based on single SAR imagery. Scholars have investigated various solutions to this problem. For example, Shirvany et al. proposed a method anchored in polarimetric features for maritime target detection [1,2,3], references [6,7,8,9] achieved ship detection by leveraging the physical principles of SAR imaging, and song et al. exploited object segmentation from the perspective of statistical modeling [10].

Among various detection algorithms, the Constant False Alarm Rate (CFAR) detector [11,12] has been extensively studied and widely applied. CFAR-based methods essentially treat targets as statistical anomalies distinct from background clutter and conduct detection with adaptive thresholds derived from specific clutter distribution models. Under homogeneous clutter conditions and well-matched statistical models, these techniques can demonstrate satisfactory detection performance. However, with the expansion of SAR applications and improvements in imaging resolution, monitoring scenarios have become complicated. This evolution substantially increases the difficulty of accurate clutter modeling, consequently limiting the effectiveness of CFAR in practice. Although researchers have developed enhanced frameworks [13,14], these modifications still fail to fully address the demanding requirements of contemporary detection tasks. Meanwhile, CFAR processing employs a local contrast strategy, making it inherently sensitive to object sizes [15] and often ineffective for those exhibiting high global but low local contrast. Moreover, CFAR methods rely exclusively on pixel-wise backscatter intensity while failing to exploit other rich information in SAR imagery, rendering them particularly vulnerable to multiplicative texture noise. These limitations collectively impair the performance of CFAR-based algorithms in increasingly complex detection scenarios.

Recently, the rapid development of computer vision (CV) techniques and their demonstrated success in optical image detection have inspired researchers to explore CV-based approaches for SAR image analysis. Saliency-based methods and deep learning (DL)-based techniques represent two prominent directions in this trend.

DL-based methods usually leverage the powerful automatic feature extraction capability of neural networks to capture high-level semantic information from images and then realize an end-to-end object detection. With enough samples for the training of neural networks, DL-based methods can achieve excellent detection performance and generalization ability. For instance, zhou et al. propose an anchor-free Convolutional Neural Network (CNN) framework, which integrates multi-level feature refinement and can outperform several typical competitors in SAR ship detection tasks [16]; Jie Zhou et al. design a novel network by introducing diffusion models to SAR target detection and successfully locate aircrafts [17]. Although DL-based methods have demonstrated their performance in various application scenes, this work focuses on detection based on a single SAR image. In this case, there is a lack of sufficient data and prior knowledge to support the evolution of CNN models. Therefore, DL-based methods are not the research subject of this study. It should be noted that while simulation-based and synthetic data augmentation techniques have been adopted to build SAR image datasets, as exemplified by the Synthetic and Measured Paired and Labeled Experiment (SAMPLE) dataset [18,19], two critical challenges still persist: insufficient fidelity in synthetic data representations and limited generalization of trained models when applied to real-world measured SAR data. These issues require further in-depth research for reliable SAR interpretation.

On another hand, when performing tasks such as scene understanding, the human visual system can rapidly focus attention on regions of interest while ignoring the background. This selective process is known as the visual attention mechanism [20]. In recent years, many scholars introduced visual attention into SAR object detection, developing numerous saliency models which have demonstrated their effectiveness. For example, Lai et al. designed a mechanism for weak target detection by referring to the ITTI saliency model [21]; Zhang et al. proposed an oil tank detection method based on saliency analysis [22]; Jin et al. achieved hurricane eye extraction by utilizing a classic saliency framework which integrates brightness and orientation features [23].

Compared with CFAR detection, saliency-based methods often do not require pre-processing for noise reduction and are not reliant on specific clutter models. Meanwhile, saliency detection can integrate multiple features, including echo power, local contrast, global contrast, and so on, fully exploiting the various information carried by SAR images. It can be said that saliency-based methods successfully circumvent several major obstacles that are difficult to overcome by traditional algorithms represented by CFAR detectors.

Currently, research on SAR saliency detection primarily progresses along two main directions. The first approach integrates saliency with learning-based techniques, particularly CNNs. In such methods, saliency serves as crucial information and is typically involved in model construction in the form of features or weights. Similar to DL-based processing, if there are enough SAR images for model training, learning-based saliency can achieve excellent detection performance. However, this work focuses on the interpretation of single SAR imagery, and learning-based processing is not applicable to the discussion herein. The second approach derives from saliency definition, conducting detection based on the understanding of saliency. For example, Zhai et al. argued that salient objects in SAR images should present both high local and global contrast, thereby realizing inshore ship detection based on this understanding [24]. Ni et al. considered regions that significantly differ from the background as salient targets and then designed a two-stage detection framework accordingly [25]. Gao et al. believed that saliency should consider local contrast, edge density, and boundary connectivity comprehensively, and based on this, they constructed a saliency algorithm for river detection [26]. Generally, definition-based methods firstly construct a saliency map (SM) to highlight objects and suppress the background from the perspective of saliency understanding; then, they achieve object extraction by segmenting the SM. Unlike learning-based processing, these methods do not rely on sample datasets but directly operate on single SAR images. Therefore, definition-based saliency is more appropriate for the discussion in this paper.

Traditional definition-based algorithms predominantly employ a bottom-up mechanism to construct saliency maps. In recent years, many researchers have attempted to incorporate top-down strategies to further enhance the accuracy of saliency detection. For instance, references [27,28] determined the scale of the Gaussian pyramid based on specific tasks, ensuring that the SM has an excellent highlighting effect on the target. Reference [29] utilized morphological operations to filter out interference regions in the SM according to object sizes, thereby more effectively suppressing background noise. However, similar processing depends on the specific detection task and requires corresponding prior knowledge. To address the limitation, this work proposes a two-channel framework by referring to guided search theory (GST) [30]. Within this framework, the prior acquisition processing channel simulates the “where” pathway in GST, automatically computing object prior indication based on edge information and assigning more saliency to potential object regions. Meanwhile, the feature extraction processing channel simulates the “what” pathway in GST, extracting four typical features: brightness, frequency, global contrast, and local contrast, to further reinforce the object presence. Finally, the outputs of these two channels are fused via Bayesian inference to generate an initial SM.

An ideal SM should thoroughly suppress background regions while exclusively highlighting objects. However, due to the complexity of practical scenes, some non-object areas may also exhibit high saliency levels, which will inevitably hinder subsequent object extraction based on the SM. Current research often designs complicated segmentation strategies, or employs discrimination processing, to ensure the final performance of saliency detection. For example, Zhang et al. proposed a localization algorithm based on the active contour model, which accurately identified oil tanks from the SM [31]. Wang et al. applied morphological filtering and clustering to eliminate false alarms after obtaining preliminary detection results through SM binarization, thereby improving the accuracy of the final output [27]. However, most of these methods have high computational complexity and may be subject to certain limitations in practical tasks. Correspondingly, an adaptive iteration mechanism is designed for SM improvement in this work. Through multiple cycles of iteration, it continuously reinforces the presence of objects and suppresses the saliency of the background. In the final SM, there will be a clear distinction between objects and the background; thus, object extraction can be realized with a simple global threshold.

Integrating the prior calculation and the SM modification, this paper proposes a two-channel Bayesian framework with adaptive iteration for salient object detection in single SAR imagery. The main contributions of this work are summarized as follows:

To improve the performance of object detection in single SAR images, this paper proposes a two-channel saliency framework with Bayesian inference and adaptive iteration. The qualitative and quantitative experiments on real SAR datasets demonstrate that our method can present better detection results than several classic competitors.
To acquire an ideal SM which assigns potential object regions with high saliency values, we develop a two-channel framework with the top-down mechanism by imitating the guided search theory in the human vision system. It first calculates the prior and feature information with each channel, respectively, then utilizes Bayesian inference to integrate these two, and finally generates an initial SM.
To further rectify the results of SM generation, we provide an adaptive iteration mechanism. Through iterative processing, object areas are progressively highlighted while background clutter is continuously suppressed. Ultimately, a distinct contrast between the object and background emerges in the SM, allowing for straightforward object extraction via simple threshold segmentation.

2. Methodology

The structure of the proposed algorithm is illustrated in Figure 1. The prior acquisition processing channel firstly calculates an improved object indication using edge strength and standard deviation, subsequently generating object and background priors. Meanwhile, the feature extraction processing channel derives four features: brightness, frequency (count of pixel value occurrence), local contrast, and global contrast. The outputs from these channels are then fused through Bayesian inference to produce an initial SM. After that, an iterative mechanism is implemented where, if the SM fails to meet the termination condition, it serves as a new improved object indication to update the priors for Bayesian inference in the next round, thereby producing a new SM. Upon satisfying the termination condition, the SM generated by the current iteration becomes the final SM of the proposed algorithm. At this stage, there is a distinction between objects and the background in the SM, and the detection results can be obtained through simple OTSU segmentation [32].

2.1. Acquisition of Object/Background Prior

The prior acquisition processing channel operates in a task-independent manner, automatically computing improved object indication to quantify the probability of corresponding areas being objects, and finally outputting the prior for Bayesian inference.

2.1.1. Edge Strength Index

Edges can effectively delineate the structure of scene distribution and reveal potential locations of objects. Therefore, we utilize edge information to estimate the probability of specific regions belonging to object components. The most classical approach for SAR edge detection is the Ratio of Average (ROA) operator [33]. Here in this section, we firstly introduce this operator.

(1) ROA Operator

The ROA operator enables CFAR detection in four directions

{0, π / 4, π / 2, 3 π / 4}

, and its overall structure is illustrated in Figure 2, in which the blue point represents the current pixel, while the gray regions

r_{θ, 1}

and

r_{θ, 2}

denote the reference windows established around the current pixel in a specified direction

θ

. When the ROA operator is applied to an image, it firstly calculates the gray-level mean of corresponding pixels within each reference window, as expressed in Equation (1).

R_{θ, k} = r_{θ, k} * I θ \in \{0, π / 4, π / 2, 3 π / 4\} k \in {1,2}

(1)

where I denotes the input image;

r_{θ, k}

represents the reference window in the direction of

θ

; “*” means convolution operation here;

R_{θ, k}

is the gray-level weighted mean of pixels within the reference window

r_{θ, k}

. Subsequently, the edge strength response of the current pixel, referred to as ES_ROA, can be calculated by Equation (2),

{E S}_{R O A} = \max_{θ \in \{0 °, 45 °, 90 °, 135 °\}} (1 - m i n (\frac{R_{θ, 1}}{R_{θ, 2}}, \frac{R_{θ, 2}}{R_{θ, 1}}))

(2)

where max(•) and min(•) represent the maximum and minimum operators, respectively.

(2) Directional Derivative Function of Anisotropic Gaussian kernels

It should be noted that the ROA operator only employs kernels in four directions. This may lead to compromised detection performance when handling edges with finer directional variations. To address this limitation, we augment the ROA operator by incorporating directional derivatives of anisotropic Gaussian kernels. These kernels not only provide enhanced directional sensitivity but also exhibit inherent noise suppression capabilities, thereby improving the overall robustness of edge detection.

The two-dimensional Gaussian kernel function is given by Equation (3),

g_{σ, ρ} (x) = \frac{1}{2 π σ^{2}} \exp (- \frac{1}{2 σ^{2}} x^{T} [\begin{matrix} ρ^{2} & 0 \\ 0 & ρ^{2} \end{matrix}] x) σ > 0, ρ \geq 1

(3)

where

σ

and

ρ

, respectively, indicate the scale factor and anisotropic factor;

x = {(i, j)}^{T}

denotes one specific cell in the kernel. According to Reference [34], the anisotropic Gaussian kernel can be derived by applying a rotation matrix to the 2D Gaussian kernel, as shown in Equation (4), where

R_{θ}

is the rotation matrix.

g_{σ, ρ, θ} (x) = \frac{1}{2 π σ^{2}} \exp (- \frac{1}{2 σ^{2}} x^{T} R_{- θ} [\begin{matrix} ρ^{2} & 0 \\ 0 & ρ^{2} \end{matrix}] R_{θ} x) σ > 0, ρ \geq 1 R_{θ} = [\begin{matrix} \cos θ & \sin θ \\ - \sin θ & \cos θ \end{matrix}]

(4)

Taking the derivate of

g_{σ, ρ, θ} (x)

with respect to

θ

, the directional derivative function of the anisotropic Gaussian kernel along the

θ

direction is expressed as shown in Equation (5). As illustrated in Figure 3, the pair of reference windows in the kernel have opposite signs, which results in a high sensitivity to grayscale variations. Therefore,

φ_{σ, ρ, θ} (x)

can be regarded as an edge detector along the

θ

direction.

φ_{σ, ρ, θ} (x) = \frac{ρ^{2} [\cos θ \sin θ] x}{σ^{2}} g_{σ, ρ, θ} (x)

(5)

Unlike the ROA operator,

θ

does not impose specific value constraints (given the symmetry of

φ_{σ, ρ, θ} (x)

,

θ

is typically selected from [0,

π

]). This allows for edge detection in arbitrary directions. In this work,

θ

is set to

{0,1, 2, \dots, 7} \times π / 8

, and the corresponding kernels are illustrated in Figure 3.

The edge strength response (ES_G) of this detector can be computed from Equation (6) by applying

φ_{σ, ρ, θ} (x)

to the image, where “*” denotes the convolution operation.

{E S}_{G} = \max_{k = 0,1, \dots, 7} (|φ_{σ, ρ, θ} (x) * I|) θ = k π / 8 k = 0,1, 2, \dots, 7

(6)

(3) Edge Strength Index

In the above process, we utilized the ROA operator and the directional derivative function of the anisotropic Gaussian kernel to obtain ES_ROA and ES_G, respectively. The final Edge Strength Index (ESI) for the original SAR image can be calculated using Equation (7), where “

⨀

” denotes element-wise multiplication of two matrices.

E S I = {E S}_{R O A} ⨀ {E S}_{G}

(7)

Figure 4 illustrates an example of ESI calculation. It can be seen that ES_ROA exhibits significant interference caused by speckle noise. ES_G appears overly sensitive to variations in grayscale intensity, resulting in fragmented detections in regions where the edge direction changes, as well as strong responses within the object interior. In contrast, ESI effectively suppresses such interferences while providing a more complete delineation of the object contour, thereby facilitating subsequent processes.

2.1.2. Improved Object Indication

In this subsection, we firstly extract single-pixel-width edges from the ESI using Non-maxima Suppression (NMS) [35] and a straightforward global threshold segmentation. Empirically, the threshold is set as

μ_{N M S} + σ_{N M S}

, where

μ_{N M S}

and

σ_{N M S}

are, respectively, the mean and standard deviation of the NMS-processed image. Based on the edge information, we further adopt a computational approach inspired by visual filling to generate the object indication (OI), which quantifies the probability of specified regions belonging to object components [36].

Given the abundant speckle noise and possible shadows inherent in SAR images, edge detection results are prone to false alarms, causing certain non-object regions in the OI map also exhibiting high indication values, as exemplified by the areas enclosed in red boxes in Figure 5.

References [37,38] demonstrate that local variance can effectively characterize the edge and shape information of objects. Additionally, in SAR images, the grayscale intensity of objects is typically higher than that of the background and shadows, and the grayscale fluctuations caused by multiplicative speckle noise are more pronounced on objects. It implies that local variance can also reflect the presence of objects. Therefore, we incorporate the local variance feature to suppress indication values of non-object regions in the OI map. The calculation of local variance

f_{v a r} (i, j)

is illustrated in Equation (8),

f_{v a r} (i, j) = \frac{1}{N^{2}} (\sum_{i - M}^{i + M} \sum_{j - M}^{j + M} {I (i, j)}^{2} - \frac{1}{N^{2}} {(\sum_{i - M}^{i + M} \sum_{j - M}^{j + M} I (i, j))}^{2}) N = 2 M + 1

(8)

where N represents the size of the kernel and is empirically set as 7 (that is, the size of local-variance window is 7 × 7); I denotes the input image; (i, j) indicates the coordinates of the current pixel. As exemplified in Figure 5c, compared with the background and shadows, edges and edge-surrounded regions are obviously highlighted in the variance feature map.

Furthermore, we normalize local variance feature and fuse it with the OI, further enhancing the presence of objects while suppressing other components. This process is detailed in Equation (9),

I O I = N (f_{v a r}) ⨀ O I

(9)

where

N (•)

denotes the linear normalization operator (that is, min–max normalization, which will linearly scale all elements to [0, 1]); “

⨀

” represents element-wise multiplication of two matrices; IOI stands for improved object indication. As exemplified in Figure 5d, after being refined, only the object regions in the IOI map maintain high indication values. It can be said that under the combined effect of the OI and local variance feature, the potential locations of objects are estimated rather appropriately.

2.1.3. Object/Background Prior

A higher indication value in the IOI map corresponds to a greater probability of the location belonging to object components. We normalize IOI to the range [0, 1] and utilize it as the object prior, as specified in Equation (10),

P (O) = N (I O I)

(10)

where

N (•)

denotes the normalization operator;

P (O)

represents the prior probability of a region belonging to a salient object. Correspondingly, the prior probability of it belonging to the background is

P (B)

, where

P (B) = 1 - P (O)

.

2.2. Feature Extraction

The feature extraction channel is designed for acquiring feature information that enhances the saliency of objects, thereby enabling more accurate determination of their existence. In SAR images, the object saliency can be considered from three perspectives: (1) objects generally exhibit strong backscatter intensity; (2) whether viewed in the context of the entire image or compared to local surrounding regions, objects are expected to demonstrate uniqueness and distinctiveness in terms of intensity, shape, or some other attributes; (3) the human visual system instinctively focuses on rare or anomalous regions within a scene [39]; compared with the abundant background elements, objects usually occupy a smaller proportion, making them more likely to attract visual attention. Accordingly, we employ four features—brightness, frequency, local contrast, and global contrast—to characterize the object saliency.

(1) Brightness

In SAR images, the intensity of echo power is reflected by grayscale levels. We directly utilize the original image, after linear function normalization, as the brightness feature

f_{L}

, as shown in Equation (11).

f_{L} = \frac{I - m i n (I)}{m a x (I) - m i n (I)} \times 255

(11)

(2) Frequency

In SAR images, particularly those capturing large scenes, objects usually occupy a minor proportion of the image. Thus, the corresponding grayscale components will appear infrequently across the image. We employ a frequency feature to characterize this sparsity. Here, “frequency” refers to the occurrence rate of pixel values, which differs from the conventional concept in image processing (typically computed via Fourier transform) that describes the rate of grayscale variations.

The frequency feature can be calculated by Equation (12),

f_{f r e} (i, j) = \frac{λ}{f_{i, j}} f_{i, j} = \frac{n_{i, j}}{N_{a l l}}

(12)

where

N_{a l l}

denotes the total number of pixels in the input image. If a pixel at coordinates

(i, j)

has a grayscale level

g_{i, j}

, then

n_{i, j}

represents the count of all pixels in the SAR image with grayscale level

g_{i, j}

. The parameter

λ

is an empirical constant, set to 1 in the experiments presented in this work. Due to the infrequent occurrence of the corresponding grayscale components, objects will exhibit higher frequency feature values, while the background is relatively suppressed.

(3) Local contrast

Objects usually exhibit various differences from their surrounding elements. We quantify this local dissimilarity using the method introduced in reference [40]. The detailed steps are described as below:

(S₁) A reference window of size

9 \times 9

is centered at the current pixel. This window is uniformly divided into 9 sub-windows with the size of

3 \times 3

, which are numbered from 0 to 8, just as illustrated in Figure 6.

(S₂) The maximum grayscale value of pixels within sub-window 0 is denoted as L₀.

(S₃) Calculate the mean grayscale values of sub-window 1 to 8, respectively. The maximum of these 8 mean values is denoted as m_max.

(S₄) The local contrast feature

f_{l o c} (i, j)

of the current pixel

(i, j)

is calculated using Equation (13), where

n

is an empirical parameter and set to 5 in this work.

f_{l o c} (i, j) = \frac{L_{0}^{n}}{m_{m a x}}

(13)

(4) Global contrast

From the perspective of an entire SAR scene, objects typically attract visual attention due to their distinctiveness. We employ the global contrast feature to quantify this object-background discriminability, as mathematically formulated in Equation (14),

G_{g l o b} (i, j) = I (i, j) - I_{m e a n} I_{m e a n} = \frac{1}{N_{a l l}} \sum_{i, j} I (i, j) f_{g l o b} (i, j) = \{\begin{array}{r} G_{g l o b} (i, j), G_{g l o b} (i, j) \geq α \\ 0, G_{g l o b} (i, j) < α \end{array}

(14)

where

f_{g l o b} (i, j)

denotes the global contrast corresponding to the pixel at coordinates

(i, j)

; I represents the input image,

I_{m e a n}

is its mean grayscale value,

N_{a l l}

is the total number of pixels in the input image I; parameter

α

is an empirical threshold. In this paper, we adopt an adaptive approach to set

α

, as shown in Equation (15), where

σ

represents the standard deviation of the input image.

α = \frac{I_{m e a n}^{2}}{σ}

(15)

2.3. Saliency Map Generation

Based on IOI revealing potential locations of objects and four features determining the object presence, this section first utilizes Bayesian inference to fuse channel outputs, generating an initial SM. Furthermore, an adaptive iteration mechanism is utilized to refine the SM progressively. In the final SM, object regions will exhibit high saliency values, while other components are substantially suppressed.

According to Bayesian principles, the posterior probability that a certain pixel x in an image belongs to object components can be expressed as Equation (16),

P (O| x) = \frac{P (O) P (x | O)}{P (O) P (x| O) + P (B) P (x | B)}

(16)

where

P (O)

and

P (B)

are the prior probabilities mentioned above in Section 2.1.3;

P (x| O)

and

P (x| B)

represent the likelihood probabilities of observing the sample x in the object and background regions, respectively.

Calculating likelihood probabilities requires object and background sample sets. Here, the threshold selection method described in reference [41] is introduced to determine the optimal threshold T_opt. Then, we use T_opt to binarize the IOI map, roughly estimating the object/background sample set. Specifically, pixels with indication values higher than T_opt form the object sample set S_obj, while the other areas constitute the background sample set S_back.

2.3.1. The Calculation of Weight for Feature Fusion

The Bayesian inference process, when utilized for integrating two channel outputs, requires us to fuse multiple features. Many current saliency algorithms based on feature integration often treat different features as equally important. For instance, reference [24] directly combines local contrast and global contrast under the assumption that they contribute equally. However, such processing fails to consider that some features may have a stronger descriptive capability and should have been assigned more weight, which might further enhance the SM performance.

Accordingly, we assign fusion weights based on the descriptive performance of the features. The calculation of the weight is shown in Equation (17),

w_{f} = \frac{1}{W} |m e a n (S_{o b j}^{f}) - m e a n (S_{b a c k}^{f})| W = \sum_{f} w_{f} f \in {f_{L}, f_{f r e}, f_{l o c}, f_{g l o b}}

(17)

where

w_{f}

is the fusion weight for feature f;

S_{o b j}^{f}

(or

S_{b a c k}^{f}

) denotes a set in feature map f, which corresponds to the sample set S_obj (or S_back);

m e a n (•)

is utilized to calculate the average value. Thus, features that demonstrate significant differences between the object and the background are assigned higher fusion weights, whereas features with minor differences receive lower weights. Compared to an SM generated by equal-weight fusion, our SM will exhibit superior object highlighting effects and improved background suppression capabilities.

2.3.2. The Calculation of Likelihood Probabilities

The estimation of likelihood probabilities relies on the ‘feature conditional independence assumption’ [42], which presumes statistical independence among features. Actually, it remains challenging to mathematically prove this assumption rigorously. The independence between brightness and its derivative contrast features has always been controversial [43]. On the other hand, numerous methods that directly utilize this assumption have achieved excellent results. For instance, the classic Naive Bayes Classifier, widely applied in pattern recognition, operates under this assumption [42]. Therefore, our method still adheres to this assumption.

Based on the independence, the likelihood probabilities of one pixel

x

in the input image I,

P (x| O)

and

P (x| B)

, can be, respectively, calculated by Equations (18) and (19),

P (x| O) = \prod_{f \in {f_{L}, f_{f r e}, f_{l o c}, f_{g l o b}}} {(P (x_{f}| O))}^{w_{f}}

(18)

P (x| B) = \prod_{f \in {f_{L}, f_{f r e}, f_{l o c}, f_{g l o b}}} {(P (x_{f}| B))}^{w_{f}}

(19)

where

P (x_{f}| O)

represents the probability that

x_{f}

in the feature map f (

f \in {f_{L}, f_{f r e}, f_{l o c}, f_{g l o b}}

) belongs to the object regions; similarly,

P (x_{f}| B)

) represents the probability that

x_{f}

belongs to the background regions.

x_{f}

and

x

correspond to each other, which indicates that the feature map f has the same size as the input image I, and the pixel

x_{f}

in f has the same position as the pixel

x

in I.

P (x_{f}| O)

and

P (x_{f}| B)

can be, respectively, calculated by Equations (20) and (21),

P (x_{f}| O) = \frac{N (x_{f}| O)}{N (S_{o b j}^{f})}

(20)

P (x_{f}| B) = \frac{N (x_{f}| B)}{N (S_{b a c k}^{f})}

(21)

where

N (S_{o b j}^{f})

represents the total number of pixels in

S_{o b j}^{f}

(i.e., the number of pixels in the object set S_obj);

N (x_{f}| O)

represents the number of pixels in

S_{o b j}^{f}

that have the same feature value as

x_{f}

(similarly,

N (S_{b a c k}^{f})

and

N (x_{f}| B)

are defined for the background).

By substituting the derived likelihood probabilities into Equation (16), we obtain the posterior probability, which represents the saliency value of pixel x in the current iteration.

2.3.3. The Adaptive Iteration Mechanism

To more thoroughly highlight salient objects and suppress the background, we design an adaptive iteration mechanism to further improve the SM performance. Assuming the iteration index is t, SM (t) represents the SM generation result of the current iteration, where the saliency value of pixel x is denoted as

P_{t} (O| x)

. If the termination condition remains unsatisfied, SM(t) undergoes simple smoothing processing (median filtering is employed in our experiments, and the size of the median window is empirically set as 7 × 7) and subsequently serves as the new IOI for the next iteration. Then, the object/background sample sets, prior probabilities, and likelihood probabilities will be re-calculated, thereby generating a new saliency map, SM (t + 1). This process continues iteratively until the termination condition is satisfied, and the SM provided by the current iteration becomes the final SM of our method.

Here, we utilize Mean Absolute Error (MAE) [44] to establish the termination condition for the iterative process, as shown in Equations (22) and (23),

M A E = \frac{\sum_{x} |P_{t + 1} (O| x) - P_{t} (O| x)|}{N_{a l l}}

(22)

M A E \leq T_{M A E}

(23)

where

T_{M A E}

is an empirical threshold. If

T_{M A E}

is set too large, the iterative optimization will be insufficient, making it difficult to achieve ideal performance. Conversely, if

T_{M A E}

is set too small, the iteration rounds will increase, significantly raising the overall computational complexity of the algorithm. Moreover, some small or weak objects may be continuously weakened during the iteration process, thereby affecting the local performance of the SM. In our experiments,

T_{M A E}

is set to 0.25.

When Equation (23) holds, we consider that the variation between SM (t + 1) and SM (t) is negligible and the iteration process has approached convergence. At this stage, the current SM (t + 1) can be adopted as the final SM of our algorithm.

2.4. Saliency Map Segmentation

In the final SM of our proposed method, there is a distinct difference between the object and the background. We can easily extract the salient objects using simple OTSU threshold segmentation.

3. Experiments

This section comprehensively evaluates the performance of our proposed algorithm in terms of SM and detection results with real SAR images. The ground-object scenes used in experiments are selected from the Moving and Stationary Target Acquisition and Recognition (MSTAR) [45,46] dataset, as shown in Figure 7a. The maritime-object scenes are derived from the SAR Ship Detection Dataset (SSDD) [47], as illustrated in Figure 8a.

3.1. Data Description

The MSTAR dataset is jointly developed by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL) in the United States. The corresponding SAR system operates in the X-band with spotlight mode and HH polarization, achieving an imaging resolution of 0.3 m × 0.3 m. The images employed in the experiments are derived from the BTR-60(test) subset, wherein each individual image has dimensions of 128 × 128 pixels and the number of total images is 195.

The SSDD is manually constructed and specifically designed for ship detection in SAR imagery. The images are acquired under four polarization modes, with spatial resolutions ranging from 1 m to 15 m. The ships, located either in open seas or inshore areas, are typically positioned at the center of images. We randomly select 200 images from the SSDD train dataset for experiments. Among them, the majority are open-sea scenes, while a small portion are inshore scenes. In the inshore scenes, ships may be joined with some harbor facilities with strong echoes, making it impossible to distinguish between them, as illustrated in Figure 8a(i,ii). Since object recognition and identification are not within the scope of this study, these facilities are treated as object regions in the following experiments.

3.2. Experimental Settings

In our experiments, some important parameters were configured as below: λ utilized for calculating frequency is set to 1; α utilized for generating global contrast is calculated with Equation (14); T_MAE utilized for constructing termination condition is set to 0.25.

All experiments were conducted on a personal computer equipped with Intel Core i5-9600 CPU and 32GB RAM. The software used here is Matlab R2022a.

3.3. Experimental Analysis of Saliency Map

The essential step in our proposed method is SM generation, and the quality of the SM can indirectly reflect the final detection results. Hence, we firstly evaluate the performance of our method from the perspective of the SM.

To comprehensively evaluate our SM, on one hand, we conducted experiments using both ground and maritime scenes from the MSTAR and SSDD datasets, respectively. On the other hand, we introduced four classical saliency models—ITTI [48], Region Contrast (RC) [49], Spectral Residual (SR) [50], and Contour-Guided Visual Search (CGVS) [41]—as benchmarks. Among them, the ITTI model, based on the feature integration theory and local contrast strategy, represents the most classical visual saliency algorithm. The RC model employs a global contrast strategy, and its performance has been successfully validated in optical applications. Unlike the ITTI and RC models, which compute saliency values in the color (or grayscale) space, the SR model is the most classical saliency algorithm that models visual attention in the frequency domain. The CGVS model, similar to the proposed algorithm, also utilizes Bayesian inference.

3.3.1. Qualitative and Quantitative Analysis of SM Generation Results

The SM generation results of the above algorithms based on ground and maritime scenarios are illustrated in Figure 7 and Figure 8. Within one figure, row (a) presents several original SAR images used in the experiments; row (b) shows the SM generation results of our proposed method corresponding to SAR images in row (a); rows (c–f), respectively, provide the SM generation results of four benchmark algorithms, which are ITTI, SR, RC, and CGVS in sequence; row (g) presents the ground truth (GT) that is manually annotated through human observation.

As illustrated in Figure 7c(iv,vi), although ITTI roughly highlights object areas, it fails to effectively suppress the background regions. This will potentially complicate the subsequent segmentation processing. Perhaps due to the adoption of Gaussian pyramid and cross-scale difference, ITTI cannot uniformly enhance the entire object. Instead, it tends to emphasize the edge regions, resulting in low saliency values in the central region of the object, as indicated by the areas marked with red boxes in Figure 7c(i) and Figure 8c(i). In contrast, the SR algorithm demonstrates better background suppression, while its effect on enhancing objects is slightly insufficient (comparing Figure 7d(i) with Figure 7c(i)), which can easily increase the risk of missed detections in the SM-based segmentation. The RC model achieves acceptable performance in both object enhancement and background inhibition. Its SM primarily exhibits two limitations: (1) under high-power clutter conditions, certain background regions still present undesirably high saliency values, as demonstrated in Figure 7e(vi); (2) owing to utilizing a global contrast mechanism, shadow areas—which typically exhibit lower grayscale levels than both background clutter and objects—are often assigned elevated saliency values, as indicated by the blue-boxed areas in Figure 7e. These shortcomings may adversely affect subsequent salient object extraction. The CGVS model can assign high saliency values to object areas, and there is a clear distinction between regions with high and low saliency values in its SM. However, it simultaneously enhances numerous background regions and presents a more pronounced shadow enhancement effect. As shown in Figure 7f(v), shadow and object regions show comparable saliency levels.

By comparison, our algorithm achieves uniform enhancement across entire object regions, with high-saliency areas precisely matching object regions. It also demonstrates superior background inhibition relative to benchmark methods. A representative interference area exhibiting strong echo (green box, Figure 8a(iii)) exemplifies this advantage: while comparative methods assign it saliency levels comparable to actual objects, our SM maintains near-zero values in such region, thereby minimally impacting downstream processing. Moreover, our SM presents an intrinsic binarization property, enabling direct extraction of high-saliency regions through computationally efficient threshold segmentation. This characteristic significantly simplifies the subsequent object segmentation.

Furthermore, we utilize the P-R curve (precision–recall curve) [51,52] to quantitatively analyze the SM generation results. The definitions of precision and recall are illustrated in Equation (24),

p r e c i s i o n = \frac{T P}{T P + F P} r e c a l l = \frac{T P}{T P + F N}

(24)

where TP, FP, and FN, respectively, represent the numbers of true positive, false positive, and false negative samples. Theoretically, if an SM presents excellent performance, its corresponding P-R curve will approach the upper-right corner and have a large BEP (break-even point) value [53]. The break-event point in a P-R curve indicates the position in which precision equals recall.

The P-R curves corresponding to SMs based on ground and maritime scenes are presented in Figure 9a,b, respectively, and BEP values of these curves are recorded in Table 1. It can be seen that blue curves exhibit the most obvious tendency towards the upper-right corner and demonstrate the best results at the BEP, indicating that the SMs provided by our method present superior overall performance. This finding is consistent with the qualitative analysis presented above.

3.3.2. Analysis of the Binarization Property of SMs

An ideal SM should effectively highlight objects while thoroughly suppressing the background, leading to a clear distinction between areas of high and low saliency values. This property significantly facilitates subsequent object extraction based on the SM. This subsection evaluates the segmentation potential of SMs from the perspective of the binarization property and elaborates its implications for object extraction. Experiments were conducted on the maritime scenes. For consistency, the Otsu threshold—employed in our proposed algorithm for object extraction—was uniformly applied to segment objects from the SMs produced by other benchmark algorithms.

Figure 10 presents partial experimental results. The corresponding original SAR images and GT are shown in Figure 8a and 8g, respectively. Figure 10a–e sequentially display segmentation examples based on the proposed SMs, ITTI SMs, SR SMs, RC SMs, and CGVS SMs. Experimental observations indicate that while CGVS SMs exhibit favorable binary property with high correspondence between segmentation results and regions of high saliency values, they simultaneously activate extensive non-object areas, leading to severe false alarms that render CGVS ineffective in this scenario. ITTI and SR SMs achieve better segmentation quality, with results largely covering object regions and much fewer false alarms than CGVS + OTSU. However, residual false alarms persisting in Figure 10b,c suggest that simple OTSU segmentation struggles to precisely extract objects from ITTI and SR SMs, and more advanced post-processing strategies would be required for these algorithms to achieve excellent detection performance. Results based on RC SMs demonstrate marked improvement over those of ITTI and SR, though noticeable false alarms remain (e.g., the red-boxed region in Figure 10d(iii)). By contrast, the proposed method generates SMs with precise high-saliency localization in object regions and effective background suppression. Correspondingly, its segmentation results exhibit superior control over both false alarms and missed detections, outperforming all four benchmark algorithms. For instance, the high-intensity clutter region marked with a green box in Figure 8a(iii) is thoroughly suppressed in our segmentation, while all other methods produce distinct false alarms in this area.

Quantitative evaluation using precision and recall metrics (Table 2) corroborates the qualitative analysis. CGVS, ITTI, and SR methods achieve low precision scores, indicating substantial false alarms that limit their practical utility. RC maintains relatively balanced precision and recall scores, while the proposed algorithm significantly surpasses RC in both metrics. Notably, although CGVS and ITTI achieve higher recall than our method, this comes at the cost of severely degraded precision, compromising their detection reliability. Comparative analysis of Table 1 (BEP scores) and Table 2 reveals a strong positive correlation between SM quality (BEP) and segmentation performance. This indicates that improving the SM quality facilitates to achieve better object detection results through more concise segmentation processes, such as basic threshold processing.

In summary, the proposed algorithm simultaneously achieves strong object activation and effective background suppression. Its SMs exhibit exceptional binary characteristics, enabling high-quality object extraction via simple threshold segmentation. Experimental validation confirms that our method delivers superior detection performance with minimal post-processing requirements, fulfilling the practical demands of efficient and accurate SAR object detection.

3.4. The Effect of Adaptive Iterative Mechanism

In contrast to conventional saliency detection algorithms that directly segment objects based on the initial SM, the proposed framework incorporates an adaptive iteration mechanism to progressively refine the preliminary SM, thereby ensuring improved performance of the final output. In this subsection, maritime scenes were utilized and all images which undergo five iteration rounds were selected as illustrative examples to analyze the impact of the proposed adaptive iteration mechanism on SM generation.

Figure 11 provides an exemplification of SM evolution during the adaptive iteration. The iterative process conducts five rounds of Bayesian inference, and the SMs generated from each round are sequentially shown in Figure 11b–f. In the SM corresponding to the first round (Figure 11b), objects appear relatively blurred, and some background regions exhibit high saliency levels. If this SM were directly applied to segmentation processing, as is common with conventional saliency algorithms, the final detection results would inevitably suffer accuracy loss. As the iteration process progresses, object components in the SM gradually become clearer, while the background regions are increasingly suppressed. In the final SM (Figure 11f), there is a distinct difference between object areas and the background, allowing us to conduct segmentation with a simple global threshold. Therefore, the adaptive iteration mechanism can effectively improve the SM generated from a single round of Bayesian inference, and according to the analysis in Section 3.3.2, it will indirectly enhance the overall detection performance of our method.

The P-R curves corresponding to each round are presented in Figure 12. As the number of iterations increases, the P-R curves progressively shift towards the upper-right corner, quantitatively revealing the enhancement of SM performance through the iteration processing. On the other hand, the variation from Figure 11e to Figure 11f is minimal, and correspondingly, the blue curve in Figure 12 nearly overlaps with the pink curve. It indicates that the SM reaches a stable state where further iterations provide negligible improvement, prompting the adaptive mechanism to automatically terminate the process. These results confirm that the iteration mechanism not only effectively enhances SM quality but also possesses the capability to terminate the process at an appropriate stage, demonstrating excellent self-adaptability.

3.5. Experimental Analysis of Detection Results

Object extraction results directly reflect the comprehensive performance of detection algorithms. This subsection systematically evaluates the final detection of the proposed saliency algorithm. We employed both ground scenes (BTR-60 (test)) and maritime scenes (SSDD (train)). For ground object detection, the Two Parameter-CFAR (TP-CFAR) detector [11] and the Background-Context-Aware Saliency (BCAsaliency) method [25] were selected as benchmark algorithms. The former is a classical approach in SAR image detection and has been widely applied in practical detection, while the latter—a recently proposed saliency algorithm for SAR ground targets by Ni et al.—demonstrates outstanding object highlighting performance on the MSTAR dataset. Since reference [25] did not provide an extraction scheme for BCAsaliency SMs, the OTSU threshold was adopted to segment its SMs for fair comparison with our proposed method. Thus, the actual benchmark method is denoted as BCAsaliency + OTSU, hereafter abbreviated as BCAsaliency. For maritime object detection, TP-CFAR, Density Censoring-CFAR (DC-CFAR) [54], and Superpixel and Gamma-CFAR (SPG-CFAR) [55] were chosen as baseline algorithms. The latter two are improved CFAR strategies proposed in recent years for SAR maritime targets: DC-CFAR employs superpixel-level processing to mitigate speckle noise and incorporates a screening mechanism to reduce computational redundancy; SPG-CFAR similarly utilizes superpixel-level processing for noise suppression and specifically employs the Gamma distribution for clutter modeling tailored to maritime background.

Figure 13 and Figure 14 presents several experimental examples.

In the ground scenes, the TP-CFAR detector fails to completely extract objects, revealing significant limitations in detection performance. While the BCAsaliency approach generally preserves object integrity and demonstrates robust clutter suppression in homogeneous backgrounds, it exhibits three critical shortcomings. First, as shown in Figure 13d(i,ii), it tends to misclassify background components adjacent to objects as regions of interest, causing detection results to substantially exceed true object boundaries. Second, its clutter suppression capability deteriorates sharply in heterogeneous backgrounds, leading to pronounced false alarms (seeing Figure 13d(iii,vi)); notably, in a strong-clutter scenario depicted in Figure 13d(vi), the object becomes entirely indistinguishable from background noise. Third, BCAsaliency usually generates false alarms in shadow regions (Figure 13d(v,vii)), which is a consequence of its global contrast mechanism. Shadow areas exhibit extremely low backscattered energy, creating strong contrast against typical SAR backgrounds and thereby triggering false positives. In contrast, our proposed algorithm overcomes these limitations by fully segmenting entire objects while maintaining effective clutter suppression across both homogeneous and heterogeneous environments. By integrating multiple features—including echo power (brightness feature)—it successfully circumvents shadow-induced interference, ultimately achieving superior comprehensive detection performance.

In the maritime scenes, all evaluated algorithms achieved satisfactory detection results in relatively clean backgrounds, as shown in Figure 14(i). However, when processing those containing complex content, our method demonstrates superior detection performance. As exemplified in Figure 14(ii), TP-CFAR confronts severe missed detections; both DC-CFAR and SPG-CFAR yield marginally inferior object extraction with non-negligible false alarms; comparatively, the proposed method achieves optimal detection results. Furthermore, our algorithm exhibits more outstanding suppression effects against high-power interference. As illustrated in Figure 14(iii), where a strong interference which is marked by a green box exits, the proposed algorithm suppresses this interference thoroughly, while results from the other three methods all retains conspicuous components corresponding to this interference.

Comprehensively considering experiments in both ground and maritime scenes, the proposed algorithm achieves better results than other classic benchmark methods, presenting excellent detection performance.

We also employ the F_β-measure, which comprehensively considers both precision and recall, to quantitatively analyze detection results. The F_β-measure is defined as the weighted harmonic mean of precision and recall [16], as shown in Equation (25),

F_{β} - m e a s u r e = \frac{(1 + β^{2}) \times p r e c i s i o n \times r e c a l l}{(β^{2} \times p r e c i s i o n) + r e c a l l}

(25)

where the parameter β characterizes the relative importance of recall with respect to precision. If β > 1, recall exerts a more dominant influence on the F_β-measure; conversely, if β < 1, precision contributes more significantly to the metric. In this experiment, we consider precision and recall to be of equal importance and set β to 1.

Figure 15a,b present the F_β-measure values for detection results in ground and maritime scenarios, respectively. In the ground scenes, our proposed algorithm achieves a much higher F_β-measure than those of two classical benchmarks (TP-CFAR and BCAsaliency), demonstrating optimal detection performance. In the maritime scene, although two improved CFAR variants (DC-CFAR and SPG-CFAR) outperform the traditional TP-CFAR, our method still surpasses all three CFAR detectors, yielding the most competitive results.

Comparative analysis of Figure 15a,b reveals that the F_β-measure of TP-CFAR in the ground scene is markedly higher than that in the maritime scene, indicating its sensitivity to scene variations and significant performance instability. In contrast, the proposed algorithm maintains robust performance across both scenarios, with closely aligned F_β-measure values, denoting its consistent efficacy in heterogeneous tasks. These results confirm the superior robustness and generalization capability of our algorithm.

Based on the above analysis, our method exhibits superior detection capabilities compared with the classic SAR detection algorithms. It achieves reliable results across different scenarios, demonstrating excellent detection performance.

3.6. Experimental Analysis of the Setting of Parameter T_MAE

The parameter T_MAE plays a critical role in our algorithm by determining the termination condition for adaptive iteration, which directly influences final detection results. To systematically evaluate its operational mechanism, experiments were conducted on ground scenes with T_MAE values sequentially set to {0.10, 0.15, 0.20, 0.25, 0.30, 0.35}. And then we quantitatively analyzed the detection results under such configurations via the metric of F_β-measure to reveal performance variations. The corresponding F_β-measure scores for each parameter setting are listed in Table 3.

Experimental results demonstrate minor fluctuations in detection performance when T_MAE varies within [0.1, 0.35], with the proposed algorithm consistently maintaining high effectiveness. This phenomenon confirms its low sensitivity to T_MAE parameter settings of our method, indicating its strong parameter robustness. Moreover, these findings further verify the stability and reliability of the proposed algorithm.

4. Discussions

In addition, supplementary experiments are conducted utilizing the MiniSAR dataset [56]. Figure 16 presents a detection exemplification based on the scene of multiple ground objects from MiniSAR dataset. It reveals that the performance of our method is limited when detecting objects with similar intensity to that of the background. As shown in Figure 16a, although objects and background present comparable grayscale levels, they exhibit significant differences in terms of shape and structural characteristics. These observations suggest that the current algorithm, which primarily relies on intensity-based features, may benefit from incorporating additional structural information. In future work, we intend to enhance the current algorithm by integrating structural features. Then, in low-contrast scenes where intensity-based features become less discriminative, the modified algorithm might achieve more reliable detection results.

5. Conclusions

This work presents saliency-based object detection in single SAR imagery. The innovations of the proposed algorithm can be summarized as follows: (1) We propose a framework that initially employs two processing channels to, respectively, capture the object/background prior and four typical features. Subsequently, these two are fused based on Bayesian inference to generate an initial saliency map (SM) to highlight object regions. (2) We introduce an adaptive iteration mechanism to continuously enhance the SM quality. During the iteration process, the object saliency is gradually improved while the background is progressively suppressed. In the final SM, there is a distinct difference between objects and the background, enabling object segmentation with a simple global threshold. Experiments conducted across both ground and maritime scenarios demonstrate that compared with several classical saliency models, our SM exhibits superior performance. Moreover, the object detection results of the proposed method significantly outperform the classical benchmark algorithms. In the future work, we will further improve the current algorithm by incorporating some structural information.

Author Contributions

Conceptualization, H.L.; methodology, H.L.; software, H.L.; validation, H.L. and H.R.; formal analysis, L.Z. and H.R.; investigation, H.L. and H.R.; resources, H.R., Y.Z. and X.W.; data curation, L.Z.; writing—original draft preparation, H.L.; writing—review and editing, H.R. and H.L.; visualization, H.L.; supervision, H.R. and X.W.; project administration, H.R.; funding acquisition, X.W. and H.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers 42027805 and 62201124.

Data Availability Statement

Data has been contained within the article.

Acknowledgments

The authors would like to thank the anonymous reviewers and editors for their valuable comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Shirvany, R.; Chabert, M.; Tourneret, J.-Y. Ship and oil-spill detection using the degree of polarization in linear and hybrid/compact dual-pol SAR. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2012, 5, 885–892. [Google Scholar] [CrossRef]
Margarit, G.; Mallorqui, J.J.; Fortuny-Guasch, J.; Lopez-Martinez, C. Exploitation of ship scattering in polarimetric SAR for an improved classification under high clutter conditions. IEEE Trans. Geosci. Remote Sens. 2009, 47, 1224–1235. [Google Scholar] [CrossRef]
Velotto, D.; Soccorsi, M.; Lehner, S. Azimuth ambiguities removal for ship detection using full polarimetric X-band SAR data. IEEE Trans. Geosci. Remote Sens. 2014, 52, 76–88. [Google Scholar] [CrossRef]
Zhou, Q.; Yuan, Y.; Venturino, L.; Yi, W. Direct Target Localization for Distributed Passive Radars with Direct-Path Interference Suppression. IEEE Trans. Signal Process. 2024, 72, 3611–3625. [Google Scholar] [CrossRef]
Zhou, Q.; Yuan, Y.; Li, H.; Lai, Y.; Greco, M.; Gini, F.; Yi, W. Target Localization for Distributed Hybrid Active–Passive Radars. IEEE Trans. Aerosp. Electron. Syst. 2025, 61, 8953–8971. [Google Scholar] [CrossRef]
Nunziata, F.; Migliaccio, M.; Brown, C.E. Reflection symmetry for polarimetric observation of man-made metallic targets at sea. IEEE J. Ocean. Eng. 2012, 37, 384–394. [Google Scholar] [CrossRef]
Marino, A.; Hajnsek, I. Statistical tests for a ship detector based on the polarimetric notch filter. IEEE Trans. Geosci. Remote Sens. 2015, 53, 4578–4595. [Google Scholar] [CrossRef]
Iervolino, P.; Guida, R.; Whittaker, P. A model for the backscattering from a canonical ship in SAR imagery. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2016, 9, 1163–1175. [Google Scholar] [CrossRef]
Iervolino, P.; Guida, R. A novel ship detector based on the generalized-likelihood ratio test for SAR imagery. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2017, 10, 3616–3630. [Google Scholar] [CrossRef]
Song, S.; Xu, B.; Li, Z.; Yang, J. Ship detection in SAR imagery via variational Bayesian inference. IEEE Geosci. Remote Sens. Lett. 2016, 13, 319–323. [Google Scholar] [CrossRef]
Novak, L.M.; Burl, M.C.; Irving, W.W.; Owirka, G.J. Optimal polarimetric processing for enhanced target detection. IEEE Trans. Aerosp. Electron. Syst. 1993, 29, 234–244. [Google Scholar] [CrossRef]
Gao, G.; Liu, L.; Zhao, L.; Shi, G.; Kuang, G. An adaptive and fast CFAR algorithm based on automatic censoring for target detection in high-resolution SAR images. IEEE Trans. Geosci. Remote Sens. 2009, 47, 1685–1697. [Google Scholar] [CrossRef]
Cui, Y.; Zhou, G.; Yang, J.; Yamaguchi, Y. On the iterative censoring for target detection in SAR images. IEEE Geosci. Remote Sens. Lett. 2011, 8, 641–645. [Google Scholar] [CrossRef]
An, W.; Xie, C.; Yuan, X. An improved iterative censoring scheme for CFAR ship detection with SAR imagery. IEEE Trans. Geosci. Remote Sens. 2014, 52, 4585–4595. [Google Scholar] [CrossRef]
Xiang, D.; Tang, T.; Ni, W.; Zhang, H.; Lei, W. Saliency Map Generation for SAR Images with Bayes Theory and Heterogeneous Clutter Model. Remote Sens. 2017, 9, 1290. [Google Scholar] [CrossRef]
Zhou, Y.; Wang, S.; Ren, H.; Hu, J.; Zou, L.; Wang, X. Multi-Level Feature-Refinement Anchor-Free Framework with Consistent Label-Assignment Mechanism for Ship Detection in SAR Imagery. Remote Sens. 2024, 16, 975. [Google Scholar] [CrossRef]
Zhou, J.; Xiao, C.; Peng, B.; Liu, Z.; Liu, L.; Liu, Y. DiffDet4SAR: Diffusion-Based Aircraft Target Detection Network for SAR Images. IEEE Geosci. Remote Sens. Lett. 2024, 21, 1–5. [Google Scholar] [CrossRef]
Lewis, B.; Scarnati, T.; Sudkamp, E.; Nehrbass, J.; Rosencrantz, S.; Zelnio, E. A SAR dataset for ATR development: The Synthetic and Measured Paired Labeled Experiment (SAMPLE). In Algorithms for Synthetic Aperture Radar Imagery XXVI; SPIE: Pune, India, 2019; Volume 10987, pp. 39–54. [Google Scholar] [CrossRef]
Shi, Y.; Du, L.; Li, C.; Guo, Y.; Du, Y. Unsupervised domain adaptation for SAR target classification based on domain- and class-level alignment: From simulated to real data. ISPRS J. Photogramm. Remote Sens. 2024, 207, 1–13. [Google Scholar] [CrossRef]
Ma, F.; Sun, X.; Zhang, F.; Zhou, Y.; Li, H. What Catches Your Attention in SAR Images: Saliency Detection Based on Soft-Superpixel Lacunarity Cue. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–17. [Google Scholar] [CrossRef]
Lai, D.; Xiong, B.; Kuang, G. Weak target detection in SAR images via improved itti visual saliency model. In Proceedings of the IEEE International Conference on Signal and Image Processing (ICSIP), Beijing, China, 14–16 August 2017. [Google Scholar] [CrossRef]
Zhang, L.; Liu, C. Oil Tank Extraction Based on Joint-Spatial Saliency Analysis for Multiple SAR Images. IEEE Geosci. Remote Sens. Lett. 2020, 17, 998–1002. [Google Scholar] [CrossRef]
Jin, S.; Li, X.; Shuang, W. Hurricane eye extraction from SAR image using saliency-based visual attention algorithm. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Melbourne, Australia, 21–26 July 2013. [Google Scholar] [CrossRef]
Zhai, L.; Li, Y.; Su, Y. Inshore ship detection via saliency and context information in high-resolution SAR images. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1870–1874. [Google Scholar] [CrossRef]
Ni, W.; Ma, L.; Yan, W.; Zhang, H.; Wu, J. Background Context-Aware-Based SAR Image Saliency Detection. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1392–1396. [Google Scholar] [CrossRef]
Gao, F.; Wang, J.; Sun, J.; Yang, E.; Zhou, H. Visual Saliency Modeling for River Detection in High-Resolution SAR Imagery. IEEE Access 2017, 5, 1000–1014. [Google Scholar] [CrossRef]
Wang, Z.; Du, L.; Zhang, P.; Li, L.; Wang, F.; Xu, S.; Su, H. Visual Attention-Based Target Detection and Discrimination for High-Resolution SAR Images in Complex Scenes. IEEE Trans. Geosci. Remote Sens. 2018, 56, 1855–1872. [Google Scholar] [CrossRef]
Wang, Z.; Du, L.; Wang, F.; Su, H.; Zhou, Y. Multi-Scale Target Detection in SAR Image Based on Visual Attention Model. In Proceedings of the IEEE 5th Asia-Pacific Conference on Synthetic Aperture Radar (APSAR), Singapore, 1–3 September 2015. [Google Scholar] [CrossRef]
Wang, Z.; Du, L.; Su, H. Target Detection via Bayesian-Morphological Saliency in High-Resolution SAR Images. IEEE Trans. Geosci. Remote Sens. 2017, 55, 6346–6356. [Google Scholar] [CrossRef]
Wolfe, J.M.; Võ, M.L.-H.; Evans, K.K.; Greene, M.R. Visual Search in Scenes Involves Selective and Non-Selective Pathways. Trends Cogn. Sci. 2011, 15, 77–84. [Google Scholar] [CrossRef]
Zhang, L. Saliency-Driven Oil Tank Detection Based on Multidimensional Feature Vector Clustering for SAR Images. IEEE Geosci. Remote Sens. Lett. 2019, 16, 653–657. [Google Scholar] [CrossRef]
Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66. [Google Scholar] [CrossRef]
Touzi, R.; Lopes, A.; Bousquet, P. A Statistical and Geometrical Edge Detector for SAR Images. IEEE Trans. Geosci. Remote Sens. 1988, 26, 764–773. [Google Scholar] [CrossRef]
Shui, P.-L.; Zhang, W.-C. Noise-Robust Edge Detector Combining Isotropic and Anisotropic Gaussian Kernels. Pattern Recognit. 2012, 45, 806–820. [Google Scholar] [CrossRef]
Zeng, C.; Li, Y.; Li, C. Center–Surround Interaction with Adaptive Inhibition: A Computational Model for Contour Detection. NeuroImage 2011, 55, 49–66. [Google Scholar] [CrossRef]
Anstis, S. Visual filling-in. Curr. Biol. 2010, 20, R664–R666. [Google Scholar] [CrossRef]
Kong, H.S.; Yao, N.; Vetro, A.; Sun, H.; Barner, K.E. Coding Artifacts Reduction Using Edge Map Guided Adaptive and Fuzzy Filtering. In Proceedings of the IEEE International Conference on Multimedia and Expo, Taipei, Taiwan, 27–30 June 2004; p. 1394417. [Google Scholar] [CrossRef]
Snorrason, M.; Ruda, H.; Hoffman, J. Image-based model for visual search and target acquisition. In Proceedings of the Ground Target Modeling and Validation ’99, Houghton, MI, USA, 17–19 August 1999. [Google Scholar]
Zhang, Q.; Cao, Z. A Feature Fusion-Based Visual Attention Method for Target Detection in SAR Images. In The Proceedings of the Third International Conference on Communications, Signal Processing, and Systems; Springer Nature: Cham, Switzerland, 2015; pp. 159–166. [Google Scholar] [CrossRef]
Yuan, X. Algorithm Research of Ship Target Detection for Synthetic Aperture Radar. Master’s Thesis, Dalian Maritime University, Dalian, China, January 2018. [Google Scholar]
Yang, K.-F.; Li, H.; Li, C.-Y.; Li, Y.-J. A Unified Framework for Salient Structure Detection by Contour-Guided Visual Search. IEEE Trans. Image Process. 2016, 25, 3475–3488. [Google Scholar] [CrossRef] [PubMed]
Domingos, P.; Pazzani, M. On the Optimality of the Simple Bayesian Classifier under Zero-One Loss. Mach. Learn. 1997, 29, 103–130. [Google Scholar] [CrossRef]
Mante, V.; Frazor, R.A.; Bonin, V.; Geisler, W.S.; Carandini, M. Independence of Luminance and Contrast in Natural Scenes and in the Early Visual System. Nat. Neurosci. 2005, 8, 1690–1697. [Google Scholar] [CrossRef]
Zhao, J. The Study of Salient Object Detection Algorithm Based On the Visual Perception Mechanism. Master’s Thesis, University of Electronic Science and Technology of China, Chengdu, China, April 2017. [Google Scholar]
Keydel, E.R.; Lee, S.W.; Moore, J.T. MSTAR Extended Operating Conditions: A Tutorial. Algorithms Synth. Aperture Radar Imag. III 1996, 2757, 228–242. [Google Scholar] [CrossRef]
Ren, H.; Zhou, R.; Zou, L.; Tang, H. Hierarchical Distribution-Based Exemplar Replay for Incremental SAR Automatic Target Recognition. IEEE Trans. Aerosp. Electron. Syst. 2025, 61, 6576–6588. [Google Scholar] [CrossRef]
Li, J.; Qu, C.; Shao, J. Ship Detection in SAR Images Based on an Improved Faster R-CNN. In Proceedings of the 2017 SAR in Big Data Era: Models, Methods and Applications (BIGSARDATA), Beijing, China, 13–14 November 2017; pp. 1–6. [Google Scholar] [CrossRef]
Itti, L.; Koch, C.; Niebur, E. A Model of Saliency-Based Visual Attention for Rapid Scene Analysis. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 1254–1259. [Google Scholar] [CrossRef]
Cheng, M.-M.; Mitra, N.J.; Huang, X.; Torr, P.H.S.; Hu, S.-M. Global Contrast Based Salient Region Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 569–582. [Google Scholar] [CrossRef] [PubMed]
Hou, X. Saliency Detection: A Spectral Residual Approach. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, MN, USA, 18–23 June 2007; pp. 1–8. [Google Scholar] [CrossRef]
Zhang, X.; Zhang, S.; Sun, Z. Cross-Sensor SAR Image Target Detection Based on Dynamic Feature Discrimination and Center-Aware Calibration. IEEE Trans. Geosci. Remote Sens. 2025, 63. [Google Scholar] [CrossRef]
Sun, Z.; Leng, X.; Zhang, X.; Zhou, Z.; Xiong, B.; Ji, K.; Kuang, G. Arbitrary-Direction SAR Ship Detection Method for Multi-Scale Imbalance. IEEE Trans. Geosci. Remote Sens. 2025, 63. [Google Scholar] [CrossRef]
Zhou, Z. Machine Learning, 2nd ed.; Tsinghua University Press: Beijing, China, 2016; pp. 30–35. [Google Scholar]
Wang, X.; Li, G.; Zhang, X.; He, Y. A Fast CFAR Algorithm Based on Density-Censoring Operation for Ship Detection in SAR Images. IEEE Geosci. Remote Sens. Lett. 2021, 28, 1085–1089. [Google Scholar] [CrossRef]
Li, T.; Peng, D.; Chen, Z.; Guo, B. Superpixel-Level CFAR Detector Based on Truncated Gamma Distribution for SAR Images. IEEE Geosci. Remote Sens. Lett. 2021, 18, 1421–1425. [Google Scholar] [CrossRef]
MiniSAR Complex Imagery. Available online: https://www.sandia.gov/radar/pathfinder-radar-isr-and-synthetic-aperture-radar-sar-systems/complex-data/ (accessed on 15 August 2025).

Figure 1. The structure of the proposed method.

Figure 2. ROA operator illustration. Blue point represents the current pixel; gray regions are reference windows established around the current pixel in specified directions. (a) Kernel in the horizontal direction; (b) kernel in the direction of

π / 4

; (c) kernel in the vertical direction; (d) kernel in the direction of

3 π / 4

.

Figure 2. ROA operator illustration. Blue point represents the current pixel; gray regions are reference windows established around the current pixel in specified directions. (a) Kernel in the horizontal direction; (b) kernel in the direction of

π / 4

; (c) kernel in the vertical direction; (d) kernel in the direction of

3 π / 4

.

Figure 3. Illustration of the directional derivative function of the anisotropic Gaussian kernel.

Figure 4. Exemplification of ESI. (a) Original SAR image; (b) ES_ROA; (c) ES_G; (d) ESI.

Figure 5. Exemplification of IOI: (a) original SAR image; (b) OI, in which red boxes indicate some non-object region with high indication values; (c) feature map of standard deviation; (d) IOI.

Figure 6. Reference window in local contrast calculation.

Figure 7. Exemplifications of SM experiment based on the ground scenes. Row (a): seven original SAR images randomly selected from the dataset used in the experiment. Row (b): SMs generated by our method corresponding to row (a). Row (c): the corresponding ITTI SMs. Row (d): the corresponding SR SMs. Row (e): the corresponding RC SMs. Row (f): the corresponding CGVS SMs. Row (g): the object ground truth corresponding to row (a), which are manually annotated.

Figure 8. Exemplifications of SM experiment based on the maritime scenes. Row (a): three original SAR images randomly selected from the dataset used in the experiment. Row (b): SMs generated by our method corresponding to row (a). Row (c): the corresponding ITTI SMs. Row (d): the corresponding SR SMs. Row (e): the corresponding RC SMs. Row (f): the corresponding CGVS SMs. Row (g): the object ground truth corresponding to row (a), which are manually annotated.

Figure 9. P-R curves corresponding to SMs of different algorithms on ground and maritime scenes. The intersection of a P-R curve with the diagonal dashed line represents its corresponding BEP, where precision equals recall. Subfigures (a) and (b), respectively, correspond to Figure 7 and Figure 8.

Figure 10. Exemplifications of segmenting SM with OTSU threshold based on the maritime scenes. The original SAR images chosen for exemplification are listed in Figure 8a, and the corresponding GT is shown in Figure 8g. Row (a): segmentation results based on SMs of our proposed method. Row (b): segmentation results based on ITTI SMs. Row (c): segmentation results based on SR SMs. Row (d): segmentation results based on RC SMs. Row (e): segmentation results based on CGVS SMs.

Figure 11. Exemplification of SM evolution during the adaptive iteration. (a) is one original SAR image; (b–f), respectively, correspond to the outputs of the first to fifth Bayesian Inference.

Figure 12. P-R curves corresponding to SMs obtained from the first to fifth Bayesian inference during the iteration process.

Figure 13. Exemplifications of detection results based on ground scenes. Row (a): several original SAR images selected from BTR-60(test) subset; row (b): detection results corresponding to row (a) of our proposed method; row (c): detection results corresponding to row (a) of TP-CFAR; row (d): detection results corresponding to row (a) of BCAsaliency; row (e): GT (ground truth) corresponding to row (a). Column (i) displays an original SAR image, object detection results of three methods on it, and the corresponding GT; columns (ii)~(vii) have similar meanings.

Figure 14. Exemplifications of detection results based on maritime scenes. Row (a): several original SAR images selected from SSDD; row (b): detection results corresponding to row (a) of our proposed method; row (c): detection results corresponding to row (a) of TP-CFAR; row (d): detection results corresponding to row (a) of DC-CFAR; row (e): detection results corresponding to row (a) of SPG-CFAR; row (f): GT (ground truth) corresponding to row (a). Column (i) displays an original SAR image, object detection results of four methods on it, and the corresponding GT; columns (ii)~(iii) have similar meanings. The green box in a(iii) indicates an inference with strong power.

Figure 15. F_β-measure of detection results. (a) and (b), respectively, correspond to ground scenes and maritime scenes.

Figure 16. Exemplification of the experiment based on MiniSAR dataset: (a) an original SAR image; (b) the detection result; (c) GT.

Table 1. BEP values of P-R curves for different algorithms.

	Our Method	ITTI	SR	RC	CGVS
Ground scene	0.8323	0.6494	0.7659	0.7930	\
Ocean scene	0.8482	0.5522	0.6864	0.7987	\

Table 2. Quantitative analysis of segmenting SMs via OTSU.

	Our Method	ITTI	SR	RC	CGVS
precision	0.8634	0.3966	0.5489	0.7596	0.0682
recall	0.8315	0.8708	0.7960	0.7679	0.9627

Table 3. F_β-measure scores of detection results corresponding to different T_MAE values.

T_MAE	0.10	0.15	0.20	0.25	0.30	0.35
F_β-measure	0.8279	0.8345	0.8384	0.8399	0.8233	0.7984

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Li, H.; Ren, H.; Zhou, Y.; Zou, L.; Wang, X. Object Detection in Single SAR Images via a Saliency Framework Integrating Bayesian Inference and Adaptive Iteration. Remote Sens. 2025, 17, 2939. https://doi.org/10.3390/rs17172939

AMA Style

Li H, Ren H, Zhou Y, Zou L, Wang X. Object Detection in Single SAR Images via a Saliency Framework Integrating Bayesian Inference and Adaptive Iteration. Remote Sensing. 2025; 17(17):2939. https://doi.org/10.3390/rs17172939

Chicago/Turabian Style

Li, Haixiang, Haohao Ren, Yun Zhou, Lin Zou, and Xuegang Wang. 2025. "Object Detection in Single SAR Images via a Saliency Framework Integrating Bayesian Inference and Adaptive Iteration" Remote Sensing 17, no. 17: 2939. https://doi.org/10.3390/rs17172939

APA Style

Li, H., Ren, H., Zhou, Y., Zou, L., & Wang, X. (2025). Object Detection in Single SAR Images via a Saliency Framework Integrating Bayesian Inference and Adaptive Iteration. Remote Sensing, 17(17), 2939. https://doi.org/10.3390/rs17172939

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Object Detection in Single SAR Images via a Saliency Framework Integrating Bayesian Inference and Adaptive Iteration

Abstract

1. Introduction

2. Methodology

2.1. Acquisition of Object/Background Prior

2.1.1. Edge Strength Index

2.1.2. Improved Object Indication

2.1.3. Object/Background Prior

2.2. Feature Extraction

2.3. Saliency Map Generation

2.3.1. The Calculation of Weight for Feature Fusion

2.3.2. The Calculation of Likelihood Probabilities

2.3.3. The Adaptive Iteration Mechanism

2.4. Saliency Map Segmentation

3. Experiments

3.1. Data Description

3.2. Experimental Settings

3.3. Experimental Analysis of Saliency Map

3.3.1. Qualitative and Quantitative Analysis of SM Generation Results

3.3.2. Analysis of the Binarization Property of SMs

3.4. The Effect of Adaptive Iterative Mechanism

3.5. Experimental Analysis of Detection Results

3.6. Experimental Analysis of the Setting of Parameter T_MAE

4. Discussions

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Article Menu

Object Detection in Single SAR Images via a Saliency Framework Integrating Bayesian Inference and Adaptive Iteration

Abstract

1. Introduction

2. Methodology

2.1. Acquisition of Object/Background Prior

2.1.1. Edge Strength Index

2.1.2. Improved Object Indication

2.1.3. Object/Background Prior

2.2. Feature Extraction

2.3. Saliency Map Generation

2.3.1. The Calculation of Weight for Feature Fusion

2.3.2. The Calculation of Likelihood Probabilities

2.3.3. The Adaptive Iteration Mechanism

2.4. Saliency Map Segmentation

3. Experiments

3.1. Data Description

3.2. Experimental Settings

3.3. Experimental Analysis of Saliency Map

3.3.1. Qualitative and Quantitative Analysis of SM Generation Results

3.3.2. Analysis of the Binarization Property of SMs

3.4. The Effect of Adaptive Iterative Mechanism

3.5. Experimental Analysis of Detection Results

3.6. Experimental Analysis of the Setting of Parameter TMAE

4. Discussions

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

3.6. Experimental Analysis of the Setting of Parameter T_MAE