1. Introduction
As an active microwave remote-sensing technology, Synthetic Aperture Radar (SAR) exploits signal processing and aperture synthesis to achieve high-resolution imaging and offers all-weather, all-day, and strong anti-interference observation capabilities [1]. By transmitting and receiving multi-polarization and multi-band electromagnetic waves, SAR is unaffected by illumination and meteorological conditions, efficiently probes surface structures and targets, and produces high-resolution images. SAR image target detection, a core step in SAR data interpretation, has been widely applied in maritime management, military reconnaissance, disaster monitoring, urban planning, and traffic navigation [2,3,4,5,6].
In recent years, ship detection in SAR imagery has become a research focus. Nevertheless, targets such as ships exhibit complex backscattering characteristics and are easily affected by geometric distortion, speckle noise, and background clutter, posing severe challenges for detection. Traditional methods, including Constant False Alarm Rate (CFAR) algorithms [7,8] and texture- or structure-based techniques [9,10], are simple and widely adopted in engineering, yet their limited feature-representation capacity makes it difficult to suppress speckle noise and handle complex backgrounds, resulting in constrained detection performance.
The success of deep learning, especially Convolutional Neural Networks (CNNs), in computer vision has driven innovation in SAR target detection. Two-stage detectors, such as the Region-based Convolutional Neural Networks (R-CNN) family [11,12,13], markedly improve accuracy through region-proposal generation and refined classification–regression, but their performance heavily relies on large-scale annotated data. Influenced by speckle noise and complex backgrounds, they often produce redundant or false proposals, limiting efficiency and generalization. Single-stage detectors, represented by You Only Look Once (YOLO) variants [14,15], directly regress category and location in an end-to-end manner, significantly enhancing efficiency and reducing training and inference costs. Nevertheless, both two-stage and single-stage frameworks depend strongly on annotated source-domain data, whereas annotating SAR images is labor-intensive, which hampers real-world deployment; simulated, synthetic, or augmented data can partially alleviate this burden [16,17,18].
In practice, different SAR detection tasks involve diverse sensors, imaging parameters, and environmental conditions, leading to significant distribution discrepancies between source and target domains. Directly applying a source-trained model to the target domain results in drastic performance degradation, making it imperative to enhance cross-domain generalization.
Traditional domain-adaptation techniques comprise feature-transformation and distribution-alignment approaches. The former maps source and target domains into a unified subspace to reduce distribution gaps, exemplified by Subspace Alignment (SA) [19] and Transfer Component Analysis (TCA) [20]. The latter directly minimizes the distribution distance, such as through Maximum Mean Discrepancy (MMD) [21] and sample-reweighting methods [22]. In addition, several studies have concentrated on super-resolution domain-adaptation techniques [23], which can be grouped into three main categories: feature-driven, model-driven, and semi-model-driven approaches. Feature-driven methods rely on transformations in feature space; typical implementations include template matching and Bayesian classifiers. Model-driven methods achieve domain transfer through physical modeling, such as scattering-center parameterization. Semi-model-driven methods combine the advantages of the first two by using quasi-invariant local feature modeling. Lane et al. [24] proposed a Bayesian super-resolution algorithm based on Markov chain Monte Carlo that estimates the full posterior probability distribution of a target’s scattering field within a Bayesian framework, thereby quantifying the uncertainty introduced by the super-resolution process and markedly improving classification accuracy in low-resolution scenarios. Shi et al. [25] presented an automatic SAR target-recognition method that integrates a Super-Resolution Generative Adversarial Network (SRGAN) with a Deep Convolutional Neural Network (DCNN); SRGAN first enhances the visual resolution and feature representation of low-resolution images, and DCNN then performs high-accuracy classification, significantly boosting recognition performance under complex conditions.
To overcome these limitations, recent studies have focused on Unsupervised Domain Adaptation (UDA) based on deep learning, which requires no target-domain labels and leverages source-domain annotations for knowledge transfer, thereby enhancing target-domain performance. Current mainstream UDA techniques, including adversarial learning [26,27,28,29,30,31,32,33], self-training with pseudo-labels [34,35,36], and distribution-metric approaches [37,38], have greatly advanced cross-domain detection. However, cross-sensor SAR scenarios exhibit enormous inter-domain gaps due to differing imaging mechanisms, parameter configurations, and background environments. Existing UDA methods still struggle with refined distribution alignment and cross-domain feature representation, leaving domain shift unresolved.
To tackle these challenges, this paper introduces a few-shot SAR target-detection framework for cross-sensor scenarios (CS-FSDet), aiming to enable efficient source-domain knowledge transfer and boost target-domain generalization. The main contributions are as follows.
We propose a novel cross-sensor few-shot SAR target-detection framework that integrates two innovative modules, Multi-scale Uncertainty-aware Bayesian Distribution Alignment (MUBDA) and Adaptive Cross-Domain Interactive Coordinate Attention (ACICA), significantly improving detection under sample scarcity and distribution discrepancy.
We introduce MUBDA, which models SAR features as posterior Gaussian distributions and employs an MMD loss incorporating multi-scale Gaussian kernels and uncertainty-driven dynamic weighting, achieving fine-grained alignment of source and target distributions across scales and classes and substantially mitigating domain shift.
We design the ACICA module, which fuses cross-domain attention information by computing spatial-attention similarity and learning interaction weights adaptively, thereby suppressing domain-specific noise and enhancing domain-shared target-feature representation, leading to improved robustness and detection accuracy.
The remainder of this paper is organized as follows.
Section 2 provides a detailed overview of related research areas, including few-shot SAR target detection and representative works in cross-domain scenarios.
Section 3 formalizes the task definition and systematically describes the overall methodological framework, with a focus on the technical details of the Multi-scale Uncertainty-aware Bayesian Distribution Alignment module (MUBDA) and the Adaptive Cross-Domain Interactive Coordinate Attention module (ACICA).
Section 4 presents the complete experimental setup and comparison protocols, followed by a comprehensive analysis and discussion of the experimental results. Finally, Section 5 concludes the paper and discusses possible directions for future research in cross-sensor few-shot SAR target detection.
3. Materials and Methods
3.1. Overall Architecture
To systematically address inter-domain feature discrepancies in cross-sensor SAR target detection—stemming from differences in resolution, noise distribution, and imaging conditions—we propose an end-to-end few-shot SAR detection framework. Its core innovations are the Multi-scale Uncertainty-aware Bayesian Distribution Alignment module (MUBDA) and the Adaptive Cross-domain Interactive Coordinate Attention module (ACICA). As shown in Figure 1, the workflow comprises the following key stages.
First, SAR images from the source and target domains are fed into a parameter-shared feature extractor, yielding multi-scale features in a unified representation space. To capture predictive uncertainty and achieve joint alignment of mean and covariance distribution structures, the features pass through a Bayesian Neural Network (BNN) that represents each spatial location as a Gaussian posterior with mean and covariance, facilitating downstream distribution alignment.
For feature-level alignment, the MUBDA module employs multi-scale, multi-bandwidth kernel fusion and entropy-based uncertainty weighting to finely calibrate source and target feature distributions under varying resolutions, noise intensities, and class conditions, thereby reducing cross-domain shift. In parallel, the ACICA module introduces cross-domain spatial-attention similarity and adaptive interaction-weight learning, explicitly enhancing domain-shared feature representation while suppressing domain-specific interference.
The cross-domain fused features output by these two modules are forwarded to a detection head, where a regression branch and a classification branch perform bounding-box localization and category prediction, respectively, achieving end-to-end cross-domain SAR target detection. The feature extractor and BNN parameters are shared between source and target domains, maximizing the exploitation of domain-shared information and improving generalization.
In summary, the proposed architecture integrates dynamic distribution alignment (MUBDA) with explicit attention-based interaction (ACICA) to resolve inter-domain distribution inconsistencies at their source, enabling efficient and robust cross-sensor SAR target detection.
The architecture jointly leverages shared feature extraction and Bayesian uncertainty modeling for both source and target domains. The integration of the MUBDA and ACICA modules enables dynamic probabilistic alignment and interactive attention fusion, respectively. These enhanced features are passed to unified detection heads for precise localization and classification, effectively bridging cross-domain discrepancies under diverse sensor conditions.
3.2. Multi-Scale Uncertainty-Aware Bayesian Distribution Alignment
To address the significant shifts in scale and noise distribution between high-resolution source-domain and low-resolution target-domain SAR images, this paper proposes a resolution-adaptive Bayesian uncertainty-weighted distribution alignment method (MUBDA). As shown in Figure 2, this approach integrates several innovative mechanisms, including Bayesian neural network posterior distribution modeling, mean–covariance adaptive kernel alignment, a multi-kernel fusion strategy, uncertainty-based dynamic weighting, and multi-scale category-conditioned alignment. Together, these modules enable fine-grained calibration of distribution shifts between the source and target domains at the feature level, providing a solid theoretical foundation and practical support for improved cross-domain SAR target detection.
3.2.1. Posterior Distribution Retrieval for Bayesian Neural Networks
First, to quantify the model’s predictive uncertainty at each spatial location, we use a Bayesian neural network (BNN) to explicitly model the posterior distribution of features as a Gaussian. Specifically, let the shared feature extraction backbone be denoted as $F_{\theta}$. For an input image $x$, at the $s$-th level of the feature pyramid network (FPN) and spatial position (or detection box index) $i$, the network outputs the corresponding posterior Gaussian distribution parameters as follows:

$$ q_{s,i}(z \mid x) = \mathcal{N}\left(z;\, \mu_{s,i}(x),\, \Sigma_{s,i}(x)\right), \tag{1} $$

where $\mu_{s,i}(x) \in \mathbb{R}^{d}$ is the mean vector of the latent feature at that position, and $\Sigma_{s,i}(x) \in \mathbb{R}^{d \times d}$ is the corresponding covariance matrix. This posterior distribution not only provides an expectation representation for the feature but, more importantly, captures the model’s predictive uncertainty at this position through $\Sigma_{s,i}(x)$, which can be quantified by the entropy:

$$ H\left(q_{s,i}\right) = \frac{1}{2}\,\ln\left[(2\pi e)^{d}\,\det \Sigma_{s,i}(x)\right]. \tag{2} $$
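To make this concrete, the following is a minimal PyTorch sketch of a Gaussian posterior head and the entropy of Equation (2). It is an illustration rather than the released implementation: the class and function names are our own choices, and a diagonal covariance $\Sigma_{s,i}=\mathrm{diag}(\sigma^{2})$ is assumed so that the entropy reduces to a per-dimension sum.

```python
import math

import torch
import torch.nn as nn

class GaussianPosteriorHead(nn.Module):
    """Predict a diagonal Gaussian posterior N(mu, diag(exp(log_var))) per location."""

    def __init__(self, in_channels: int, latent_dim: int):
        super().__init__()
        self.mu = nn.Conv2d(in_channels, latent_dim, kernel_size=1)
        # Predict log-variance so the variance stays positive after exp().
        self.log_var = nn.Conv2d(in_channels, latent_dim, kernel_size=1)

    def forward(self, feat: torch.Tensor):
        return self.mu(feat), self.log_var(feat)   # each (B, d, H, W)

def gaussian_entropy(log_var: torch.Tensor) -> torch.Tensor:
    """Entropy of a diagonal Gaussian, Eq. (2): 0.5 * sum_k log(2*pi*e*sigma_k^2)."""
    return 0.5 * (math.log(2 * math.pi * math.e) + log_var).sum(dim=1)  # (B, H, W)
```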
3.2.2. Mean-Covariance Adaptive Distribution Alignment Kernel
To align both the means and uncertainties in the feature distribution space, we further introduce a closed-form Gaussian–Gaussian kernel that fuses mean and covariance information:

$$ K_{\sigma}(P,Q) = \frac{\sigma^{d}}{\sqrt{\det\left(\Sigma_{p}+\Sigma_{q}+(\sigma^{2}+\epsilon)I\right)}}\,\exp\left(-\frac{1}{2}\,(\mu_{p}-\mu_{q})^{\top}\left(\Sigma_{p}+\Sigma_{q}+(\sigma^{2}+\epsilon)I\right)^{-1}(\mu_{p}-\mu_{q})\right), \tag{3} $$

where $\epsilon I$ is a numerical stability term, and $P=\mathcal{N}(\mu_{p},\Sigma_{p})$ and $Q=\mathcal{N}(\mu_{q},\Sigma_{q})$ represent two Gaussian distributions. This kernel function is equivalent to an exponential mapping of the Mahalanobis distance defined by the parameters $\mu$ and $\Sigma$ of the two Gaussians, and the determinant term $\det\left(\Sigma_{p}+\Sigma_{q}+(\sigma^{2}+\epsilon)I\right)^{-1/2}$ adaptively adjusts the uncertainty weighting, reflecting the “volume” of the fused distribution and balancing coverage and difference.
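As an illustration, the closed form of Equation (3) can be computed as below for the diagonal-covariance case, matching how the posterior head above is parameterized; this is a sketch under that assumption, not a definitive implementation.

```python
import math
import torch

def gaussian_gaussian_kernel(mu_p, var_p, mu_q, var_q, sigma, eps=1e-6):
    """Closed-form Gaussian-Gaussian kernel of Eq. (3) for diagonal covariances.

    mu_p, var_p: (Np, d) means/variances; mu_q, var_q: (Nq, d).
    Returns an (Np, Nq) kernel matrix, computed in log space for stability.
    """
    d = mu_p.shape[1]
    # Sigma_p + Sigma_q + (sigma^2 + eps) I, broadcast over all pairs -> (Np, Nq, d)
    fused = var_p[:, None, :] + var_q[None, :, :] + sigma**2 + eps
    diff = mu_p[:, None, :] - mu_q[None, :, :]
    maha = (diff**2 / fused).sum(-1)       # Mahalanobis distance in the fused metric
    log_det = torch.log(fused).sum(-1)     # log-determinant of the fused covariance
    return torch.exp(d * math.log(sigma) - 0.5 * log_det - 0.5 * maha)
```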
3.2.3. Multi-Kernel Fusion Strategy
Furthermore, considering the significant variations in noise and texture scales of SAR images at different resolutions, a single fixed-bandwidth kernel cannot accommodate diverse scale differences. Therefore, this paper proposes a multi-bandwidth kernel fusion strategy with adaptive bandwidths and weights, allowing dynamic coverage of a broader range of scale shifts. Specifically, the multi-kernel fusion kernel is expressed as

$$ K(P,Q) = \sum_{m=1}^{M} w_{m}\, K_{\sigma_{m}}(P,Q), \tag{4} $$

where $K_{\sigma_{m}}$ is the Gaussian–Gaussian kernel with bandwidth $\sigma_{m}$, and $\sum_{m=1}^{M} w_{m}=1$. Unlike traditional approaches, both the bandwidths $\{\sigma_{m}\}$ and the weights $\{w_{m}\}$ in this paper are adaptively learned in an end-to-end manner. For each kernel component $m$, the bandwidth $\sigma_{m}$ is generated by feeding the statistical features $\phi_{m}$ (such as global mean, variance, maximum, entropy, etc.) of the corresponding scale into a lightweight neural network (such as a Multilayer Perceptron), followed by a softplus activation to ensure positivity:

$$ \sigma_{m} = \mathrm{softplus}\left(g_{m}(\phi_{m})\right), \tag{5} $$

where $g_{m}(\cdot)$ is a learnable mapping function, $\phi_{m}=\left[\mathrm{GAP}(F_{m}),\,\mathrm{Var}(F_{m}),\,\max(F_{m}),\,H(F_{m})\right]$, and $F_{m}$ is the feature tensor at the $m$-th scale; GAP denotes Global Average Pooling. At the same time, the weights $w_{m}$ for each kernel are obtained by applying a softmax normalization to a set of learnable parameters $\{\alpha_{m}\}_{m=1}^{M}$:

$$ w_{m} = \frac{\exp(\alpha_{m})}{\sum_{m'=1}^{M}\exp(\alpha_{m'})}. \tag{6} $$
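The bandwidth and weight learning of Equations (5) and (6) can be sketched as follows. The module below is illustrative: the statistics vector, hidden width, and the entropy proxy (Shannon entropy of a softmax over flattened activations) are our own assumptions rather than a prescription from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiKernelParams(nn.Module):
    """Adaptive bandwidths sigma_m (Eq. (5)) and fusion weights w_m (Eq. (6))."""

    def __init__(self, num_kernels: int, stat_dim: int = 4, hidden: int = 16):
        super().__init__()
        # One lightweight MLP g_m per kernel, mapping scale statistics to a bandwidth.
        self.g = nn.ModuleList(
            nn.Sequential(nn.Linear(stat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(num_kernels)
        )
        self.alpha = nn.Parameter(torch.zeros(num_kernels))  # logits for w_m

    def forward(self, feats):
        """feats: list of M feature tensors, each (B, C, H, W)."""
        sigmas = []
        for g_m, f_m in zip(self.g, feats):
            p = F.softmax(f_m.flatten(), dim=0)              # activation distribution
            phi = torch.stack([f_m.mean(), f_m.var(), f_m.max(),
                               -(p * torch.log(p + 1e-12)).sum()])  # entropy proxy
            sigmas.append(F.softplus(g_m(phi)) + 1e-3)       # positive bandwidth
        w = torch.softmax(self.alpha, dim=0)                 # sum_m w_m = 1
        return torch.cat(sigmas), w
```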
3.2.4. Uncertainty-Driven Dynamic Weighting
In addition, considering that high-noise regions or object boundaries in SAR images often exhibit higher uncertainty, assigning equal weights to all regions during alignment may introduce errors. To address this, for any pair of Gaussian distributions $(P,Q)$, we design an inverse entropy-based dynamic weighting mechanism to suppress the influence of high-uncertainty regions during alignment:

$$ w(P,Q) = \exp\left(-\lambda\left[H(P)+H(Q)\right]\right), \tag{7} $$

where $H(\cdot)$ is the entropy of the Gaussian distribution, and $\lambda$ controls the attenuation strength of entropy on the weight. This design assigns smaller weights to distributions with higher uncertainty, thereby enhancing the robustness of alignment.
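In code, the weighting of Equation (7) is a one-liner over precomputed entropies; the value of $\lambda$ below is illustrative.

```python
import torch

def uncertainty_weight(entropy_p: torch.Tensor, entropy_q: torch.Tensor, lam: float = 0.1):
    """Eq. (7): down-weight pairs whose posteriors are uncertain (high entropy).

    entropy_p: (Np,), entropy_q: (Nq,) -> pairwise weights (Np, Nq).
    """
    return torch.exp(-lam * (entropy_p[:, None] + entropy_q[None, :]))
```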
3.2.5. Multi-Scale Category-Conditioned Distribution Alignment
To further improve domain adaptation precision, MUBDA aligns distributions across multiple feature scales and target categories. Let the set of posterior Gaussians for the source domain at scale $s$ and category $c$ be $\mathcal{D}_{s,c}^{S}=\{P_{i}\}_{i=1}^{n_{s,c}}$, and the set for the target domain be $\mathcal{D}_{s,c}^{T}=\{Q_{j}\}_{j=1}^{m_{s,c}}$. The weighted uncertainty maximum mean discrepancy (WU-MMD) is then defined as:

$$ \mathrm{WU\text{-}MMD}^{2}(s,c) = \sum_{i,i'} \tilde{w}_{ii'}\, K(P_{i},P_{i'}) + \sum_{j,j'} \tilde{w}_{jj'}\, K(Q_{j},Q_{j'}) - 2\sum_{i,j} \tilde{w}_{ij}\, K(P_{i},Q_{j}), \tag{8} $$

where $\tilde{w}_{ii'}$, $\tilde{w}_{jj'}$, and $\tilde{w}_{ij}$ are the pairwise uncertainty weights of Equation (7), normalized to sum to one within each term.

To account for differences across scales and categories, we further introduce scale weights $\beta_{s}$ and category weights $\eta_{c}$ and define the final distribution alignment loss function as

$$ \mathcal{L}_{\mathrm{align}} = \sum_{s=1}^{S} \beta_{s} \sum_{c=1}^{C} \eta_{c}\, \mathrm{WU\text{-}MMD}^{2}(s,c), \tag{9} $$

where $\beta_{s}$ denotes the importance of the $s$-th scale, typically set to $1/S$ or determined by cross-validation. $\eta_{c}$ can be set according to sample frequency or the inverse of average uncertainty to balance the contribution of each category. By combining multi-scale and category-aware adaptive weighting strategies, this method enables precise calibration of the feature distribution shift between the source and target domains. When $\mathcal{L}_{\mathrm{align}}$ approaches zero, it indicates that the posterior Gaussian distributions (including both means and uncertainties) of the source and target domains have been well aligned across all scales and categories.

Moreover, the cross-domain generalization ability of MUBDA under few-shot conditions is reinforced by three cooperative mechanisms. First, its core lies in using a Bayesian Neural Network to model features as Gaussian distributions, explicitly aligning the means and covariances of source- and target-domain features. Compared with conventional approaches that align only means, this higher-order distribution alignment captures the overall structure and uncertainty of feature distributions more completely, making effective use of source-domain statistics to fill distribution gaps when target-domain samples are extremely scarce. Second, MUBDA introduces a multi-scale, multi-bandwidth fusion mechanism designed to accommodate statistical differences in resolution, scale, and class among cross-domain data. By covering global, class-level, and local-detail statistics, this mechanism greatly alleviates the problem of extremely sparse distribution statistics caused by a lack of target-domain samples, thereby avoiding feature mismatches and information loss and enhancing the generality of domain alignment. Finally, by means of the uncertainty-aware dynamic weighting strategy (Equation (7)), the model automatically assigns lower loss weights to high-noise, high-entropy samples according to the entropy of their Gaussian distributions. This strategy suppresses the ineffective transfer of noise and errors when target-domain annotations are sparse, ensuring robust model training. Together, these higher-order distribution-modeling and uncertainty-driven adaptive-alignment components improve the robustness and generalization of few-shot cross-domain detection. To facilitate understanding, the overall procedure of the proposed Multi-scale Uncertainty-aware Bayesian Distribution Alignment is summarized in Algorithm 1.
Algorithm 1 Multi-scale Uncertainty-aware Bayesian Distribution Alignment (MUBDA)

Require: Source-domain samples with full labels; target-domain samples with few-shot labels; feature extraction network $F_{\theta}$; number of feature scales $S$; number of kernels $M$
Ensure: Trained feature extractor and detection network
 1: for each training batch do
 2:   for each image $x$ in the batch do
 3:     for each scale $s = 1$ to $S$ do
 4:       Extract feature map $F_{s}(x)$
 5:       for each position or region $i$ do
 6:         Estimate Gaussian posterior $\mathcal{N}(\mu_{s,i}, \Sigma_{s,i})$ via BNN
 7:         Compute entropy $H(q_{s,i})$
 8:       end for
 9:     end for
10:   end for
11:   for each kernel $m = 1$ to $M$ do
12:     Compute statistical descriptor $\phi_{m}$ from $F_{m}$ (mean, var, max, entropy, …)
13:     Set bandwidth $\sigma_{m} = \mathrm{softplus}(g_{m}(\phi_{m}))$
14:   end for
15:   Compute kernel weights $w_{m} = \mathrm{softmax}(\alpha_{m})$ for $m = 1, \dots, M$
16:   for each pair of Gaussians $(P, Q)$ do
17:     for each kernel $m$ do
18:       Compute $K_{\sigma_{m}}(P, Q)$ with bandwidth $\sigma_{m}$
19:     end for
20:     Compute fused kernel: $K(P, Q) = \sum_{m} w_{m} K_{\sigma_{m}}(P, Q)$
21:     Compute uncertainty-aware weight: $w(P, Q) = \exp(-\lambda[H(P) + H(Q)])$
22:   end for
23:   for each scale $s = 1$ to $S$ do
24:     for each category $c$ do
25:       Collect posteriors for source: $\mathcal{D}_{s,c}^{S}$
26:       Collect posteriors for target: $\mathcal{D}_{s,c}^{T}$
27:       Compute $\mathrm{WU\text{-}MMD}^{2}(s, c)$ using $K$ and $w$
28:     end for
29:   end for
30:   Compute final alignment loss: $\mathcal{L}_{\mathrm{align}} = \sum_{s} \beta_{s} \sum_{c} \eta_{c}\, \mathrm{WU\text{-}MMD}^{2}(s, c)$
31:   Update network parameters by minimizing $\mathcal{L}_{\mathrm{align}}$ and the detection loss
32: end for
33: return Trained model
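Putting the pieces together, the WU-MMD of Equation (8) for one (scale, category) group can be sketched as below, reusing `gaussian_gaussian_kernel` from the Section 3.2.2 sketch; the within-term normalization of the pairwise weights is our assumption.

```python
import torch

def wu_mmd2(mu_s, var_s, ent_s, mu_t, var_t, ent_t, sigmas, w, lam=0.1):
    """WU-MMD^2 of Eq. (8) for the posteriors of one (scale, category) group.

    mu_*, var_*: (N, d) diagonal Gaussian parameters; ent_*: (N,) entropies;
    sigmas: (M,) bandwidths; w: (M,) fusion weights.
    """
    def fused_k(mu_a, var_a, mu_b, var_b):                        # Eq. (4)
        return sum(w_m * gaussian_gaussian_kernel(mu_a, var_a, mu_b, var_b, s_m.item())
                   for w_m, s_m in zip(w, sigmas))

    def term(K, ent_a, ent_b):
        W = torch.exp(-lam * (ent_a[:, None] + ent_b[None, :]))   # Eq. (7)
        return (W * K).sum() / W.sum().clamp_min(1e-8)            # normalized weights

    return (term(fused_k(mu_s, var_s, mu_s, var_s), ent_s, ent_s)
            + term(fused_k(mu_t, var_t, mu_t, var_t), ent_t, ent_t)
            - 2.0 * term(fused_k(mu_s, var_s, mu_t, var_t), ent_s, ent_t))
```

The alignment loss of Equation (9) is then the $\beta_{s}$- and $\eta_{c}$-weighted sum of this quantity over all scales and categories.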
3.3. Adaptive Cross-Domain Interactive Coordinate Attention
Given that SAR images acquired from different sensors exhibit substantial cross-domain feature shifts due to inherent differences in imaging conditions such as operating frequency, polarization, and spatial resolution, we propose an Adaptive Cross-Domain Interactive Coordinate Attention (ACICA) module to significantly improve the generalization performance of cross-domain SAR target detection models. As shown in Figure 3, building upon the traditional coordinate attention mechanism, this module introduces cross-domain spatial attention similarity calculation and adaptive interaction weight learning, thereby enabling explicit feature interaction between the source and target domains, suppressing domain-specific interference, enhancing the expression of domain-shared target features, and ultimately improving the accuracy and robustness of cross-domain detection.
3.3.1. Dual-Domain Independent Coordinate Attention Feature Extraction
Specifically, for the source domain feature $F_{S}\in\mathbb{R}^{C\times H\times W}$ and the target domain feature $F_{T}\in\mathbb{R}^{C\times H\times W}$, spatial coordinate attention is independently applied to extract spatial attention features for both domains. Taking the source domain as an example, the feature $F_{S}$ is subjected to adaptive average pooling along the height ($H$) and width ($W$) dimensions to obtain

$$ z_{S}^{h}(h) = \frac{1}{W}\sum_{w=1}^{W} F_{S}(:,h,w), \tag{10} $$

$$ z_{S}^{w}(w) = \frac{1}{H}\sum_{h=1}^{H} F_{S}(:,h,w), \tag{11} $$

where $z_{S}^{h}$ and $z_{S}^{w}$ denote the features pooled along the height and width, respectively. These are then concatenated along the spatial dimension to form

$$ z_{S} = \mathrm{Concat}\left(z_{S}^{h}, z_{S}^{w}\right). \tag{12} $$

This concatenated feature is passed through convolution, batch normalization, and an h-swish activation function for nonlinear feature transformation, resulting in

$$ f_{S} = \delta\left(\mathrm{BN}\left(\mathrm{Conv}(z_{S})\right)\right), \tag{13} $$

where h-swish [48], denoted $\delta(\cdot)$, is the activation function. The transformed feature $f_{S}$ is then split again and normalized by the Sigmoid function to obtain height and width attention weights:

$$ a_{S}^{h} = \mathrm{Sigmoid}\left(\mathrm{Conv}(f_{S}^{h})\right), \tag{14} $$

$$ a_{S}^{w} = \mathrm{Sigmoid}\left(\mathrm{Conv}(f_{S}^{w})\right). \tag{15} $$

Finally, the attention weights along height and width are fused to yield the spatial coordinate attention map for the source domain:

$$ A_{S} = a_{S}^{h} \otimes a_{S}^{w}. \tag{16} $$

The extraction of the target domain attention feature $A_{T}$ follows the same procedure.
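A compact PyTorch rendering of Equations (10)–(16), in the style of the original coordinate attention design, is given below; the reduction ratio and layer names are illustrative choices rather than the paper's fixed configuration.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Per-domain coordinate attention, Eqs. (10)-(16): pool along H and W,
    transform jointly, then produce a fused (B, C, H, W) attention map."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        mid = max(channels // reduction, 8)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()                                   # h-swish
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        z_h = x.mean(dim=3, keepdim=True)                           # (B, C, H, 1)
        z_w = x.mean(dim=2, keepdim=True)                           # (B, C, 1, W)
        # Concatenate both pooled maps along the spatial dimension, Eq. (12).
        z = torch.cat([z_h, z_w.permute(0, 1, 3, 2)], dim=2)        # (B, C, H+W, 1)
        f = self.act(self.bn(self.conv1(z)))                        # Eq. (13)
        f_h, f_w = torch.split(f, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(f_h))                       # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(f_w.permute(0, 1, 3, 2)))   # (B, C, 1, W)
        return a_h * a_w                     # broadcast product, Eq. (16)
```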
3.3.2. Cross-Domain Spatial Attention Similarity Computation
To quantify and utilize the spatial correlation between domain attention features, we further compute the pixel-wise cosine similarity of spatial attention between source and target domains. For spatial location $(i,j)$, the similarity is defined as

$$ \mathrm{Sim}_{S\to T}(i,j) = \frac{\left\langle A_{S}(:,i,j),\, A_{T}(:,i,j)\right\rangle}{\left\lVert A_{S}(:,i,j)\right\rVert_{2}\,\left\lVert A_{T}(:,i,j)\right\rVert_{2}}. \tag{17} $$

Similarly, the similarity from target to source, $\mathrm{Sim}_{T\to S}$, is equivalent. In implementation, these similarity measures are extended to match the feature dimension, i.e., broadcast along the channel dimension from $\mathbb{R}^{1\times H\times W}$ to $\mathbb{R}^{C\times H\times W}$.
3.3.3. Cross-Domain Interaction Parameter
To achieve adaptive adjustment of interaction weights, we employ a data-driven strategy to determine the cross-domain interaction parameter $\gamma$. First, global average pooling is performed on the spatial attention maps $A_{S}$ and $A_{T}$ for the source and target domains, yielding global attention descriptors:

$$ g_{S} = \mathrm{GAP}(A_{S}), \tag{18} $$

$$ g_{T} = \mathrm{GAP}(A_{T}). \tag{19} $$

These global attention descriptors are concatenated and input into a multilayer perceptron (such as a $1\times 1$ convolution or fully connected layer), followed by a Sigmoid activation to yield the interaction strength $\gamma\in(0,1)$, allowing dynamic adjustment based on the inter-domain feature relationship:

$$ \gamma = \mathrm{Sigmoid}\left(\mathrm{MLP}\left(\mathrm{Concat}(g_{S}, g_{T})\right)\right). \tag{20} $$
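Equations (17)–(20) then reduce to a cosine similarity along the channel dimension plus a small gating network; the sketch below uses illustrative layer sizes and is not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def attention_similarity(attn_a: torch.Tensor, attn_b: torch.Tensor, eps: float = 1e-8):
    """Eq. (17): pixel-wise cosine similarity along channels, kept broadcastable."""
    sim = F.cosine_similarity(attn_a, attn_b, dim=1, eps=eps)  # (B, H, W)
    return sim.unsqueeze(1)                                    # (B, 1, H, W)

class InteractionWeight(nn.Module):
    """Eqs. (18)-(20): global-pool both attention maps and gate with a small MLP."""

    def __init__(self, channels: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, channels), nn.ReLU(), nn.Linear(channels, 1)
        )

    def forward(self, attn_s: torch.Tensor, attn_t: torch.Tensor) -> torch.Tensor:
        g_s = attn_s.mean(dim=(2, 3))                # GAP, Eq. (18)
        g_t = attn_t.mean(dim=(2, 3))                # GAP, Eq. (19)
        return torch.sigmoid(self.mlp(torch.cat([g_s, g_t], dim=1)))  # gamma: (B, 1)
```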
3.3.4. Cross-Domain Attention Interaction Fusion Strategy
Finally, integrating the above features, we obtain the enhanced source and target domain feature representations. The enhanced source domain feature is

$$ \tilde{F}_{S} = F_{S}\odot A_{S} + \gamma\left(\mathrm{Sim}_{T\to S}\odot A_{T}\right)\odot F_{S}, \tag{21} $$

and the enhanced target domain feature is obtained symmetrically:

$$ \tilde{F}_{T} = F_{T}\odot A_{T} + \gamma\left(\mathrm{Sim}_{S\to T}\odot A_{S}\right)\odot F_{T}, \tag{22} $$

where ⊙ denotes element-wise multiplication. This fusion strategy preserves the spatial structure of the original features while introducing cross-domain attention interaction, thereby achieving feature enhancement and semantic alignment. This significantly improves the generalization and detection accuracy of the model in cross-domain SAR target detection tasks. Under few-shot conditions, the ACICA module further alleviates the lack of spatial-semantic representation caused by scarce target-domain annotations through adaptive fusion of cross-domain spatial-attention information. Its operation consists of two main stages. (i) Adaptive interaction of attention features: the cosine similarity of spatial coordinate attention between the source and target domains is computed at each spatial location to enable interaction and initial alignment of cross-domain spatial structures; the dynamically generated interaction weight $\gamma$ (Equation (20)) then adjusts the supplementation strength of source-domain information to the target domain on the basis of the spatial-feature differences in the current batch. (ii) Feature enhancement and fusion (Equation (22)): target-domain features explicitly incorporate the similarity-weighted source-domain spatial attention, allowing effective transfer of spatial-structure information and thus compensating for incomplete spatial-semantic representation due to sparse annotations in the target domain. Through this efficient cross-domain transfer of spatial-structure information, ACICA markedly improves feature-transfer effectiveness under few-shot conditions.
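Combining the previous sketches, the interactive fusion of Equations (21) and (22) can be written as follows; as throughout, this is an illustrative reconstruction under our stated assumptions rather than the authors' released code.

```python
import torch

def acica_fuse(feat_s, feat_t, attn_s, attn_t, sim, gamma):
    """Eqs. (21)-(22): each domain keeps its own coordinate attention and adds
    similarity-gated attention from the other domain, scaled by gamma."""
    g = gamma.view(-1, 1, 1, 1)                              # (B, 1, 1, 1) broadcast
    out_s = feat_s * attn_s + g * (sim * attn_t) * feat_s    # Eq. (21)
    out_t = feat_t * attn_t + g * (sim * attn_s) * feat_t    # Eq. (22)
    return out_s, out_t

# Illustrative wiring: attn_s/attn_t from CoordinateAttention, sim from
# attention_similarity(attn_s, attn_t), gamma from InteractionWeight, as above.
```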
5. Discussion
Although the proposed CS-FSDet method demonstrates a marked performance advantage in cross-sensor few-shot SAR target detection, several scientific phenomena and engineering challenges merit further investigation.
First, the MUBDA module leverages Bayesian uncertainty modeling and multi-scale kernel fusion to refine feature-distribution alignment, effectively mitigating cross-domain shifts induced by sample scarcity and resolution discrepancies. Nonetheless, the experimental results show that model stability and robustness remain vulnerable under extreme data imbalance or when noise levels greatly exceed those of typical scenes. This finding suggests that, in practical applications, an alignment mechanism based solely on a Gaussian assumption may not fully capture the more complex distribution structures of the target domain, indicating a need for more universally applicable distribution-modeling strategies.
Second, the ACICA module enhances features through explicit spatial-attention interaction; however, when the imaging mechanisms of the source and target domains differ substantially, the effectiveness of spatial-structure transfer diminishes. In several test cases, detection performance for densely occluded or very small targets still has room for improvement, implying that the current attention-interaction mechanism requires stronger generalization in complex structural scenes. It is also notable that the two modules yield synergistic gains in most scenarios, yet the theoretical limits and underlying mechanisms of their interaction have not been fully revealed. Future work should probe their cooperative dynamics through theoretical analyses and interpretability studies.
From an application standpoint, all experiments in this study employ idealized cross-domain few-shot settings and do not account for factors common in real SAR imagery, such as anomalous samples, open-set target categories, or extreme variations in imaging conditions. Furthermore, data acquisition and annotation costs continue to constrain progress in the SAR field. Although the proposed method performs well under very limited data, reducing dependence on manual annotation in extremely low-label or label-free environments remains an important avenue for future exploration.
6. Conclusions
This study proposes a few-shot SAR target-detection framework for cross-sensor scenarios (CS-FSDet) to counteract the severe performance degradation that occurs when models are transferred across sensors with scarce target-domain data. The key contributions are as follows:
We introduce the Multi-scale Uncertainty-aware Bayesian Distribution Alignment (MUBDA) method, which fuses multi-scale features, class information, and uncertainty weighting to align feature distributions between high-resolution source data and low-resolution target data.
We design the Adaptive Cross-domain Interactive Coordinate Attention (ACICA) module, which explicitly models source–target relationships via spatial-attention similarity and an adaptive interaction mechanism, thereby suppressing inter-domain discrepancies and enhancing shared features.
The synergy between MUBDA and ACICA markedly improves cross-domain knowledge transfer efficiency.
On the HRSID→SSDD and SSDD→HRSID tasks, the proposed method consistently outperforms state-of-the-art approaches by 1.27%–2.78% and 1.26%–2.17% mAP, respectively, across all shot settings. In particular, under the 5-shot HRSID→SSDD setting, our method raises the mAP of the baseline AsyFOD by 5.82%. These results strongly validate the effectiveness of the proposed approach.
Although our framework enhances cross-sensor few-shot SAR detection, it still faces limitations in open-set scenarios. Future work will focus on alleviating the representation gap for unseen classes under cross-sensor domain shift to further improve generalization and robustness in open-set target detection.