Article

SAR-DRBNet: Adaptive Feature Weaving and Algebraically Equivalent Aggregation for High-Precision Rotated SAR Detection

1 College of Physics, Donghua University, Shanghai 201620, China
2 Department of Space Microwave Remote Sensing System, Aerospace Information Research Institute, Beijing 100190, China
3 College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China
* Author to whom correspondence should be addressed.
Remote Sens. 2026, 18(4), 619; https://doi.org/10.3390/rs18040619
Submission received: 20 December 2025 / Revised: 31 January 2026 / Accepted: 12 February 2026 / Published: 16 February 2026

Highlights

What are the main findings?
  • Proposed SAR-DRBNet, a high-precision rotated object detection method based on YOLOv13, integrating three novel modules: DEOBB (Detail-Enhanced Oriented Bounding Box Detect Head), CkDRB (Ck-MultiDilated Reparam Block), and DynWeave (Dynamic Feature Weaving Module).
  • DEOBB enhances small target perception and rotation invariance via multi-branch enhanced convolution, while CkDRB utilizes re-parameterization and dilated convolutions to efficiently extract multi-scale features and suppress SAR speckle noise.
  • Extensive experiments on HRSID, RSDD-SAR, and DSSDD datasets demonstrate that SAR-DRBNet outperforms state-of-the-art OBB detectors, achieving an optimal balance between accuracy and efficiency with strong cross-dataset generalization.
What are the implications of the main findings?
  • Demonstrates the effectiveness of “Algebraically Equivalent Aggregation” (via CkDRB) in resolving the conflict between inference speed and noise suppression capability in complex SAR imagery.
  • Validates that dynamic feature weaving (DynWeave), through global–local dual attention, significantly improves robustness against scale diversity and complex backgrounds, providing a stable and efficient technical solution for rotated SAR detection.

Abstract

Synthetic aperture radar (SAR) imagery is widely used for target detection in complex backgrounds and adverse weather conditions. However, high-precision detection of rotated small targets remains challenging due to severe speckle noise, significant scale variations, and the need for robust rotation-aware representations. To address these issues, we propose SAR-DRBNet, a high-precision rotated small-target detection framework built upon YOLOv13. First, we introduce a Detail-Enhanced Oriented Bounding Box detection head (DEOBB), which leverages multi-branch enhanced convolutions to strengthen fine-grained feature extraction and improve oriented bounding box regression, thereby enhancing rotation sensitivity and localization accuracy for small targets. Second, we design a Ck-MultiDilated Reparameterization Block (CkDRB) that captures multi-scale contextual cues and suppresses speckle interference via multi-branch dilated convolutions and an efficient reparameterization strategy. Third, we propose a Dynamic Feature Weaving module (DynWeave) that integrates global–local dual attention with dynamic large-kernel convolutions to adaptively fuse features across scales and orientations, improving robustness in cluttered SAR scenes. Extensive experiments on three widely used SAR rotated object detection benchmarks (HRSID, RSDD-SAR, and DSSDD) demonstrate that SAR-DRBNet achieves a strong balance between detection accuracy and computational efficiency compared with state-of-the-art oriented bounding box detectors, while exhibiting superior cross-dataset generalization. These results indicate that SAR-DRBNet provides an effective and reliable solution for rotated small-target detection in SAR imagery.

1. Introduction

Synthetic Aperture Radar (SAR) enables high-resolution imaging under all-weather and all-day conditions, and has demonstrated significant application value in remote sensing monitoring, military reconnaissance, environmental observation, and disaster emergency response [1,2,3,4]. Compared with optical imaging, SAR is insensitive to illumination and weather conditions and can penetrate clouds, smoke, and even vegetation to a certain extent, providing unique advantages for target detection in complex environments [5]. However, SAR images are inherently affected by strong speckle noise, large target scale variations, and complex backgrounds [6,7,8], which severely limit the performance of traditional target detection methods, particularly for small targets and rotated objects. In practical airborne/spaceborne SAR systems, image quality can be further degraded by intentional interference (e.g., deceptive jamming) and system-induced ambiguities (e.g., range ambiguity) in multichannel and high-resolution wide-swath (HRWS) imaging, which can complicate downstream target interpretation and detection; recent studies have investigated jammer localization/suppression and ambiguity suppression schemes based on (under)determined blind source separation [9,10,11,12].
Early SAR target detection methods were mainly based on handcrafted feature extraction, template matching, or filtering techniques, whose detection logic typically relied on simple intensity differences or fixed templates [13,14]. Studies have shown that such methods suffer from high false alarm rates in complex cluttered backgrounds and lack adaptability to targets of varying sizes, resulting in limited robustness to multi-scale, rotated targets and complex scenes. With the introduction of machine learning techniques, detection models gained the ability to automatically learn target feature distributions through training, leading to improved detection performance. Nevertheless, their capabilities in rotation invariance, multi-scale adaptability, and speckle noise suppression remain insufficient.
In recent years, deep learning has achieved remarkable progress in object detection and has demonstrated superior performance in complex scenarios that are difficult for traditional methods to handle. According to detection strategies, existing approaches can be broadly categorized into single-stage detectors [15], two-stage detectors [16], and Transformer-based detection methods [17]. Single-stage detectors perform object localization and classification simultaneously within a unified framework, enabling end-to-end inference with high computational efficiency and fast inference speed. As a result, they have been widely adopted in real-time detection applications. However, because these methods rely heavily on global feature representations for direct prediction of bounding boxes and categories, their ability to distinguish small targets, densely distributed objects, and targets in complex backgrounds remains limited, often leading to missed detections or false alarms.
In contrast, two-stage detectors employ a coarse-to-fine detection strategy, where potential target regions are first generated as proposals and then refined through fine-grained feature analysis and classification. This two-step process significantly improves detection accuracy but introduces higher computational complexity and slower inference speed, making such methods less suitable for real-time applications and resource-constrained environments. More recently, Transformer-based detection methods have incorporated global self-attention mechanisms to model long-range dependencies among objects, enabling a more comprehensive understanding of global scene context. These methods exhibit inherent advantages in handling multi-scale targets, rotated objects, and complex backgrounds, resulting in improved detection accuracy and robustness. However, Transformer architectures typically involve substantial computational overhead and memory consumption, which pose challenges for practical deployment.
Despite these advances, rotated small-target detection in SAR imagery still faces several critical challenges. First, in terms of rotated object representation and regression, targets often appear at arbitrary orientations, and the accuracy of angle regression in oriented bounding boxes has a decisive impact on detection performance. Even slight angular deviations can lead to a significant reduction in the intersection-over-union (IoU) between predicted and ground-truth boxes, thereby degrading detection stability and reliability. Second, for small and densely distributed targets, the extremely limited pixel occupancy of small targets in SAR images results in weak feature representations that are easily overwhelmed by speckle noise and complex backgrounds, leading to missed detections. Meanwhile, densely packed targets may exhibit overlap or adhesion, increasing the difficulty of feature separation and localization, and causing false detections or confusion. Finally, speckle noise, which is an inherent artifact caused by coherent interference during SAR imaging, exhibits strong randomness and high-frequency characteristics that severely disrupt texture and edge information, further complicating accurate target detection.
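The angle sensitivity of oriented-box IoU can be checked numerically. The NumPy sketch below (our own illustration, not part of the paper; `clip_poly` implements Sutherland–Hodgman convex polygon clipping and all names are ours) rotates an elongated, ship-like box by only 5 degrees and measures the IoU against the original:

```python
import numpy as np

def cross2(a, b):
    """z-component of the 2-D cross product."""
    return a[0] * b[1] - a[1] * b[0]

def clip_poly(subject, clipper):
    """Sutherland-Hodgman clipping of one convex CCW polygon by another."""
    out = [np.asarray(p, float) for p in subject]
    n = len(clipper)
    for i in range(n):
        a = np.asarray(clipper[i], float)
        b = np.asarray(clipper[(i + 1) % n], float)
        inp, out = out, []
        if not inp:
            break
        for j in range(len(inp)):
            p, q = inp[j], inp[(j + 1) % len(inp)]
            pin = cross2(b - a, p - a) >= 0   # "inside" = left of edge a->b
            qin = cross2(b - a, q - a) >= 0
            if qin:
                if not pin:
                    t = cross2(a - p, b - a) / cross2(q - p, b - a)
                    out.append(p + t * (q - p))
                out.append(q)
            elif pin:
                t = cross2(a - p, b - a) / cross2(q - p, b - a)
                out.append(p + t * (q - p))
    return out

def area(poly):
    """Shoelace area of a polygon given as a vertex list."""
    if len(poly) < 3:
        return 0.0
    pts = np.array(poly)
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

def obb(w, h, theta):
    """Centered w x h rectangle rotated CCW by theta, vertices in CCW order."""
    c, s = np.cos(theta), np.sin(theta)
    base = np.array([[-w/2, -h/2], [w/2, -h/2], [w/2, h/2], [-w/2, h/2]])
    return base @ np.array([[c, s], [-s, c]])

a, b = obb(40, 4, 0.0), obb(40, 4, np.deg2rad(5))   # ship-like aspect ratio
inter = area(clip_poly(list(a), list(b)))
iou = inter / (2 * 40 * 4 - inter)                  # both boxes have area 160
print(iou)   # noticeably below 1 despite only a 5-degree deviation
```

For a high-aspect-ratio box, a 5-degree angular error already removes roughly a third of the IoU, which is why stable angle regression matters so much for SAR ship targets.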
To address these challenges, this study adopts YOLOv13 [18] as the baseline model and introduces targeted improvements within its framework. The YOLO family of models provides an effective foundation for small-target detection due to its single-stage architecture, end-to-end training paradigm, and efficient feature fusion mechanisms. However, the standard YOLOv13 still suffers from insufficient detection accuracy when dealing with rotated small targets, speckle noise, and complex backgrounds in SAR imagery. Therefore, while preserving the high inference efficiency of YOLOv13, this work enhances the model’s capability to perceive rotated target features, improves multi-scale adaptability, and suppresses noise interference through the introduction of novel modules and optimization strategies. These improvements enable high-precision rotated small-target detection and provide a solid theoretical basis for subsequent experimental evaluation and performance validation.
Specifically, the main contributions of this work are summarized as follows:
(1)
To enhance feature perception for rotated small targets in SAR imagery, a DEOBB detection head is proposed. By incorporating multi-branch enhanced convolutions, the detection head performs multi-channel feature extraction and detail enhancement, enabling accurate regression of rotated object boundaries and high-fidelity feature representation. This design significantly improves detection accuracy and rotation invariance for small rotated targets.
(2)
To address multi-scale target representation and speckle noise interference in SAR images, the CkDRB module is introduced. This module combines multi-branch dilated convolutions with a reparameterization mechanism to efficiently extract features of targets at different scales while suppressing noise, achieving a favorable balance between detection performance and computational efficiency.
(3)
To further enhance feature representation for rotated small targets, the DynWeave module is designed by integrating global–local dual attention mechanisms with dynamic large-kernel convolutions. This module adaptively fuses features across different scales and orientations, effectively improving rotation robustness and feature discrimination capability in complex scenes, thereby enhancing overall detection accuracy and stability.

2. Related Work

In recent years, significant progress has been made in small-target detection using synthetic aperture radar (SAR) imagery. Early studies mainly relied on traditional methods, such as constant false alarm rate (CFAR) detection [19], feature matching based on HOG [20] or SIFT [21], and statistical filtering techniques. These approaches are largely dependent on handcrafted features and threshold-based decision mechanisms, making them sensitive to speckle noise, multi-scale targets, and complex backgrounds, which often limits their detection accuracy and robustness. With the rapid development of deep learning, convolutional neural networks (CNNs) and Transformer-based architectures have been gradually introduced into SAR target detection, leading to substantial improvements in detection accuracy and efficiency. From a methodological perspective, recent advances can be viewed along two complementary axes: (i) detector-level architectural designs and training strategies, and (ii) representation- and feature-level robustness, especially those leveraging polarimetric scattering cues. Current mainstream research can be broadly categorized into three groups: two-stage detection methods, single-stage detection methods, and Transformer-based approaches, all of which have contributed to the advancement of this field.
In addition to detector-centric improvements, fully polarimetric SAR (PolSAR) provides richer scattering information and has motivated studies on interpretable target characterization and robust representation learning. Li and Chen [22] proposed a general polarimetric correlation pattern as a visualization and characterization tool for investigating joint-domain scattering mechanisms. Li and Chen [23] further explored multidomain joint characterization for canonical scatterers (e.g., polyhedral corner reflectors) using fully polarimetric radar, strengthening the physical interpretability of polarimetric representations. Moreover, Li et al. [24] investigated PolSAR ship characterization and robust detection under different grazing angles using polarimetric roll-invariant features, highlighting the importance of geometry-robust representations when imaging conditions vary. These findings provide complementary insights for SAR detection; however, this paper focuses on downstream rotated small-target detection in SAR imagery and develops a high-efficiency detector based on YOLOv13 with dedicated modules to address rotation regression, multi-scale representation, and speckle-clutter interference.

2.1. Two-Stage Detection Methods

Two-stage detection methods typically adopt a coarse-to-fine strategy consisting of region proposal generation followed by refined classification. Their main advantage lies in the ability to first identify potential target regions and then perform detailed feature analysis and classification for each proposal, resulting in high detection accuracy. In SAR ship detection tasks, such methods are often combined with multi-scale feature fusion and attention mechanisms to address small targets, rotated objects, and complex background interference. For example, Lin et al. [25] introduced a Squeeze-and-Excitation mechanism into Faster R-CNN to recalibrate multi-scale features, significantly improving detection performance. Cui et al. [26] proposed a Dense Attention Pyramid Network (DAPN) that incorporates the CBAM module to further enhance multi-scale feature representation. Zhang et al. [27] employed a multi-task learning strategy to jointly learn ship texture and shape features, effectively suppressing speckle noise.
Wu et al. [28] developed a dual-branch network named ISASDNet, which integrates instance segmentation with object detection and achieves accurate recognition of multi-scale and multi-orientation ships through global relational reasoning and mask-assisted modules. Zhang et al. [29] proposed Quad-FPN, which enhances small-target detection in complex backgrounds via multi-branch feature pyramids and attention mechanisms. Chai et al. [30] introduced Res2Net and a spatial enhancement module into Cascade R-CNN, combined with a bidirectional feature pyramid and generalized focal loss to optimize dense target boundary detection. Zhan et al. [31] proposed MFT-Reasoning RCNN, which improves multi-scale ship recognition through adaptive global reasoning and cross-dataset knowledge transfer. Kamirul et al. [32] designed a Sparse R-CNN OBB method based on sparse proposal learning, achieving strong performance in oriented object detection tasks. Jiang et al. [33] proposed the FFCV method, which combines Faster R-CNN, fast non-local means filtering, and the Chan–Vese model to extract fine ship contours.

2.2. Single-Stage Detection Methods

Single-stage detection methods directly predict object bounding boxes and categories in an end-to-end manner, offering advantages such as fast inference speed and simplified training procedures. These characteristics make them widely adopted for SAR small-target detection. For instance, Sun et al. [2] proposed BiFA-YOLO, which integrates bidirectional feature aggregation and angle classification, along with random rotation mosaic augmentation, to enhance detection of arbitrarily oriented and densely distributed ships. Chen J. et al. [34] proposed YOLO-SAD to improve detection performance by enhancing key feature extraction. Chen P. et al. [35] introduced GCN-YOLO, which employs graph convolution to model object relationships, while Chen Z. et al. [36] proposed CSD-YOLO, focusing on multi-scale feature fusion to further improve detection accuracy.
Guan et al. [37] enhanced small-target detection accuracy by introducing reparameterization techniques, reducing downsampling loss, and optimizing bounding box regression. Tan et al. [38] proposed the YOLO-RC network, which leverages amplitude gradients and geometric features in the range-compressed domain. Yan et al. [39] developed DA-YOLO, incorporating a dynamic adaptive module to enhance feature extraction flexibility. Jiang et al. [40] and Zhang et al. [41] proposed lightweight models, namely DWSC-YOLO and an edge-optimized lightweight YOLO, respectively, achieving a balance between detection accuracy and computational efficiency. He et al. [42] proposed PPDM-YOLO, which effectively suppresses complex sea clutter through PCA-based feature extraction and noise enhancement modules. Huang et al. [43] introduced NST-YOLO11, combining Neural Swin Transformer and spatial–channel attention to further improve multi-scale detection performance. Chen et al. [44] proposed RSNet, achieving lightweight and high-accuracy detection through global feature extraction and efficient fusion modules. Beyond bounding-box detection, YOLO-style frameworks have also been extended to instance segmentation for small-scale ship targets in SAR imagery; for example, Zhang et al. proposed SegS-YOLO to improve fine-grained ship delineation under complex SAR backgrounds [45].

2.3. Transformer-Based Detection Methods

Transformer-based approaches have recently gained increasing attention in SAR small-target detection due to their ability to capture global dependencies through self-attention mechanisms, thereby improving robustness to multi-scale and rotated targets. Xia et al. [46] proposed CRTransSar, which integrates Swin Transformer and CNN features to enhance multi-scale representations. Zhou et al. [47] introduced PVT-SAR, leveraging a pyramid vision Transformer architecture with overlapping patch embedding and multi-scale fusion modules to improve small-target detection performance. Zhao et al. [48] proposed ViT-FRCNN, which employs domain adaptation strategies to achieve improvements in both accuracy and training efficiency across multi-source datasets.
Feng et al. [49] proposed OEGR-DETR based on the DETR framework, introducing orientation enhancement modules and contrastive loss to optimize rotated target feature representation. Fang et al. [50] developed FEVT-SAR, which combines local feature suppression with global dependency modeling for high-precision multi-class SAR ship detection. In addition to rotation and scale variations, multi-category SAR ship detection is often hampered by severe class imbalance and inter-class confusion induced by heterogeneous imaging mechanisms. To address this issue, Sun et al. [51] proposed KFIA-Net, which injects domain knowledge via knowledge tokens and cross-attention fusion and introduces an imbalance-aware loss to improve classification reliability while maintaining localization performance. Yang et al. [52] proposed LPST-Det, integrating CNN and Transformer features with multi-scale enhancement modules to improve detection of arbitrarily oriented targets. Zhou et al. [53] introduced EGTB-Net, which combines a lightweight backbone with Transformer modules to balance feature extraction efficiency and long-range dependency modeling. Sivapriya et al. [54] applied Vision Transformer for edge detection, improving the clarity of small-target contours. Qin et al. [55] proposed RDB-DINO, which significantly enhances small-scale target detection through denoising training and iterative bounding box refinement. Li et al. [56] applied Transformer architectures to visual grounding tasks and proposed TACMT to improve semantic alignment and localization accuracy. Zhang et al. [57] introduced DenSe-AdViT, which incorporates a density-aware mechanism and leverages both convolutional and Transformer strengths to achieve accurate perception of densely distributed multi-scale targets. Luo et al. proposed dense dual-branch cross-attention aggregation [58], dual-domain Transformer modeling across spatial and channel dimensions [59], and weakly supervised contrastive learning under limited annotations [60], which enhance feature discrimination and robustness in complex scenes by combining attention- and Transformer-based multi-scale context modeling with contrastive constraints.
Despite the notable progress achieved by these methods in SAR small-target detection, existing approaches still struggle to achieve an ideal balance among detection accuracy, inference speed, and rotation robustness for rotated small-target detection. This challenge is particularly pronounced in scenarios involving extremely small targets, arbitrary orientations, and strong background noise. Therefore, there remains a strong demand for an efficient, accurate, and robust detection framework. To this end, this work introduces a series of improvements to YOLOv13 by incorporating the DEOBB detection head, CkDRB module, and DynWeave module, aiming to enhance rotated small-target detection performance while maintaining high efficiency.

3. Materials and Methods

YOLOv13 [18] is a state-of-the-art object detection framework that significantly improves the recognition and localization of small targets while preserving the high inference efficiency characteristic of the YOLO family. Its architectural innovations can be summarized in three aspects. First, the HyperACE mechanism is introduced to capture high-order many-to-many semantic relationships through adaptive hypergraph computation, overcoming the limitation of traditional models that only model local pairwise relationships. Second, the FullAD architecture is proposed to enable feature enhancement strategies across the entire pipeline, including the Backbone, Neck, and Head. Third, the DSC3k2 module is employed to construct efficient feature extraction units.
In the original architecture, the Backbone uses convolutional layers to extract low-level features, followed by DSConv and DSC3k2 modules for multi-scale abstraction, while the A2C2f module compensates for accuracy loss caused by downsampling. The HyperACE module is positioned between the Backbone and Neck, where a global high-order perception branch (C3AH) and a local low-order branch (DS-C3k) collaboratively perform cross-spatial semantic aggregation. The Neck and Head further complete feature fusion and multi-scale prediction through cross-scale mechanisms and oriented bounding box (OBB) modules, respectively.
To address the specific requirements of small-target detection in SAR imagery, this paper proposes an improved network named SAR-DRBNet based on YOLOv13, as illustrated in Figure 1. The main modifications include replacing the original A2C2f and DSC3k2 modules with the proposed CkDRB to enhance feature extraction capability; substituting the Concat module with DynWeave to achieve more effective local cross-scale feature fusion; and replacing the original OBB module with DEOBB to improve multi-scale oriented target prediction accuracy.
In the improved architecture, the Backbone employs convolutional layers for low-level feature extraction, while CkDRB enables efficient multi-scale feature learning. The HyperACE module remains between the Backbone and Neck, where adaptive hypergraph computation is combined with the local CkDRB branch to enhance perception of small SAR targets. The Neck utilizes CkDRB to optimize feature fusion and incorporates cross-scale mechanisms for deep feature interaction. In the Head, convolutional layers and CkDRB further strengthen feature representations, DynWeave performs local cross-scale fusion, and the DEOBB module outputs three detection branches corresponding to small, medium, and large targets, predicting object coordinates, categories, and confidence scores.
Through these targeted improvements, the proposed SAR-DRBNet significantly enhances small-target detection performance on SAR datasets, providing an effective and practical solution for real-world SAR applications.

3.1. Detail-Enhanced Oriented Bounding Box Detection Head (DEOBB)

In synthetic aperture radar (SAR) object detection, rotated targets often exhibit complex geometric structures, weak scattering responses, and pronounced orientation variability. Owing to SAR’s side-looking geometry and coherent imaging mechanism, small targets are typically represented by a few sparse scattering centers with anisotropic spatial distributions, and their signatures are further obscured by speckle noise and background clutter. These factors jointly make accurate rotated bounding box localization and orientation estimation particularly challenging.
From the standpoint of detection head design, rotated SAR detection requires not only precise boundary regression but also robust perception of orientation-dependent structural details. However, conventional convolution-based heads usually aggregate features in an isotropic manner, resulting in limited sensitivity to directional gradients and rotation-induced feature variations—especially for small targets under low signal-to-noise ratio (SNR) conditions.
To this end, we propose a Detail-Enhanced Oriented Bounding Box detection head (DEOBB Detect Head) built upon Group-Normalized Detail-Enhanced Convolution (GNDEConv), as shown in Figure 2. By adopting a unified multi-branch, difference-based feature modeling strategy, the proposed head explicitly strengthens local texture, directional gradients, and rotational cues, thereby improving the representation and localization accuracy of rotated targets.
(1)
Group-Normalized Detail-Enhanced Convolution
The GNDEConv module adopts a multi-branch parallel convolutional architecture to extract complementary features from multiple structural perspectives. Let $F_i$ denote the output of the $i$-th convolutional branch. The aggregated feature representation is obtained by summing all branch responses, followed by group normalization (GN) and a nonlinear activation:

$$F = \phi\left(\mathrm{GN}\left(\sum_i F_i\right)\right),$$

where $\mathrm{GN}(\cdot)$ denotes group normalization and $\phi(\cdot)$ represents a nonlinear activation function. Group normalization is particularly suitable for SAR detection tasks, where batch sizes are often limited by high-resolution inputs.
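As a concrete reading of this aggregation rule, the following NumPy sketch (our own minimal illustration, not the authors' implementation; ReLU is one possible choice for the activation) sums several branch outputs, applies group normalization over channel groups, and then the activation:

```python
import numpy as np

def group_norm(x, groups, eps=1e-5):
    """Group normalization over a (C, H, W) tensor; no batch-size dependence."""
    C, H, W = x.shape
    g = x.reshape(groups, C // groups, H, W)
    mu = g.mean(axis=(1, 2, 3), keepdims=True)
    var = g.var(axis=(1, 2, 3), keepdims=True)
    return ((g - mu) / np.sqrt(var + eps)).reshape(C, H, W)

def aggregate_branches(branch_outputs, groups=2):
    """F = phi(GN(sum_i F_i)), with ReLU standing in for phi."""
    summed = np.sum(branch_outputs, axis=0)   # sum the parallel branch responses
    return np.maximum(group_norm(summed, groups), 0.0)

rng = np.random.default_rng(0)
branch_outputs = rng.normal(size=(4, 8, 5, 5))  # 4 branches, C=8, 5x5 maps
F = aggregate_branches(branch_outputs)
print(F.shape)  # (8, 5, 5)
```

Because the statistics are computed per channel group rather than per batch, the normalization behaves identically for batch size 1, which matches the high-resolution, small-batch regime described above.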
Within GNDEConv, the difference-based convolutional branches are designed to explicitly model local intensity variations and directional gradients induced by anisotropic scattering and target rotation. Central difference convolution (CDC) enhances local texture and edge information by computing the difference between the central pixel response and its neighboring responses:
$$F_{\mathrm{CDC}} = F_{\mathrm{center}} - F_{\mathrm{neighbor}},$$
This formulation ensures that both central and neighboring pixels receive effective gradients during backpropagation, thereby improving sensitivity to weak boundaries and small-scale structures. Such a property is particularly important for SAR small targets whose contours are often blurred by speckle noise.
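A minimal CDC can be sketched as follows (our illustration, assuming a single channel and a `valid`-mode window; the key property is that each neighbor is differenced against the window center before weighting, so homogeneous regions produce exactly zero response):

```python
import numpy as np

def central_diff_conv2d(x, w):
    """Valid-mode cross-correlation on center-subtracted patches."""
    kh, kw = w.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i+kh, j:j+kw]
            center = patch[kh // 2, kw // 2]
            out[i, j] = np.sum(w * (patch - center))  # weight the differences
    return out

w = np.random.default_rng(0).normal(size=(3, 3))
flat = np.full((6, 6), 5.0)                 # homogeneous region, no texture
edge = np.zeros((6, 6)); edge[:, 3:] = 1.0  # vertical step edge

print(np.allclose(central_diff_conv2d(flat, w), 0.0))   # True: flat -> zero
print(np.abs(central_diff_conv2d(edge, w)).max() > 0.0) # True: edge responds
```

The zero response on constant patches is what suppresses slowly varying clutter while preserving weak boundaries.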
To explicitly capture orientation-dependent characteristics, angular difference convolution (ADC) is introduced to model feature variations across different directions. The output of the ADC branch is defined as
$$F_{\mathrm{ADC}} = \sum_{\theta} w_{\theta}\left(F_{\theta} - F_{\theta+\Delta}\right),$$

where $F_{\theta}$ denotes the feature response along direction $\theta$, $\Delta$ represents an angular offset, and $w_{\theta}$ is a learnable weight. This angular differencing operation makes feature gradients highly sensitive to rotation variations, thereby enhancing the network’s capability to perceive orientation changes and improving rotation-invariant representation learning.
In addition to angular differences, horizontal difference convolution (HDC) and vertical difference convolution (VDC) are employed to enhance gradient responses along horizontal and vertical directions, respectively. These directional difference operations are particularly effective for elongated or strip-like targets that exhibit dominant structural orientations. Together with ADC, HDC and VDC provide complementary directional cues at different angular resolutions.
From a unified perspective, the CDC, ADC, HDC, and VDC branches can be interpreted as a set of directional difference operators acting on the input feature map. Their collective effect can be expressed in a generalized form as
$$F_{\Delta} = \sum_{k} \alpha_{k}\left(F - T_{k}(F)\right),$$

where $T_{k}(\cdot)$ denotes a spatial or angular transformation corresponding to a particular difference direction, and $\alpha_{k}$ represents the contribution weight of each difference branch. This formulation reveals that GNDEConv performs multi-directional gradient enhancement by aggregating difference responses across multiple orientations and spatial directions.
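To make the generalized form concrete, the toy NumPy sketch below (ours; circular shifts via `np.roll` stand in for the transformations, so borders wrap) shows that a horizontal difference operator fires on a horizontal gradient while a vertical one stays silent:

```python
import numpy as np

def directional_diff(F, transforms, alphas):
    """F_delta = sum_k alpha_k * (F - T_k(F)) for a set of transforms T_k."""
    return sum(a * (F - T(F)) for a, T in zip(alphas, transforms))

# Toy feature map: a ramp that varies only along the horizontal axis.
F = np.tile(np.arange(8.0), (8, 1))

T_h = lambda z: np.roll(z, 1, axis=1)   # horizontal-neighbor transform (HDC-like)
T_v = lambda z: np.roll(z, 1, axis=0)   # vertical-neighbor transform (VDC-like)

resp_h = directional_diff(F, [T_h], [1.0])
resp_v = directional_diff(F, [T_v], [1.0])
print(np.allclose(resp_v, 0.0))      # True: no vertical gradient to detect
print(np.abs(resp_h).max() > 0.0)    # True: horizontal gradient detected
```

Each difference branch thus acts as a selective gradient detector for one direction, and the weighted sum pools these complementary directional cues.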
To further characterize the rotation-aware property of the proposed design, a rotation sensitivity metric can be defined as
$$S_{\mathrm{rot}} = \left\lVert \frac{\partial F}{\partial \theta} \right\rVert,$$

which measures the variation in feature responses with respect to the rotation angle. Due to the explicit angular differencing in ADC, the proposed GNDEConv exhibits higher rotation sensitivity under this metric compared with standard convolution, enabling more accurate orientation perception for rotated targets.
By jointly leveraging CDC, ADC, HDC, and VDC, GNDEConv realizes a multi-directional and multi-scale difference-based feature modeling mechanism, which enhances sensitivity to local texture, directional gradients, and rotation-induced variations while maintaining effective gradient propagation. After fusing features from all branches, the resulting representation exhibits improved rotation invariance, clearer boundary delineation, and stronger robustness to background clutter.
(2)
Oriented Bounding Box Prediction
In the overall network architecture, the DEOBB detection head is connected to multi-scale feature maps output by the backbone network. The enhanced features produced by GNDEConv are used for oriented bounding box regression and classification. The final prediction can be expressed as
$$\hat{y} = \left(b, c, \theta\right),$$

where $b$ denotes the bounding box coordinates, $c$ represents the class probabilities, and $\theta$ denotes the predicted rotation angle. Distribution-based regression (DFL) is adopted to improve localization accuracy. Through multi-branch detail enhancement, group-normalized feature aggregation, and explicit rotation-aware modeling, the proposed DEOBB detection head provides a robust and theoretically grounded solution for rotated SAR target detection under complex backgrounds and low-SNR conditions.
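DFL-style decoding can be illustrated in a few lines (our sketch; the bin count and logit values are made up). The head predicts logits over discrete offset bins, and the decoded coordinate is the expectation of the resulting softmax distribution, which allows sub-bin continuous localization:

```python
import numpy as np

def dfl_decode(logits):
    """Softmax over discrete bins, then the expectation as a continuous offset."""
    p = np.exp(logits - logits.max(axis=-1, keepdims=True))  # stable softmax
    p /= p.sum(axis=-1, keepdims=True)
    bins = np.arange(logits.shape[-1], dtype=float)
    return float((p * bins).sum(axis=-1))

logits = np.zeros(16)
logits[6] = logits[7] = 5.0    # probability mass shared by bins 6 and 7
offset = dfl_decode(logits)
print(offset)                  # close to 6.5, between the two peaked bins
```

Because the target is a distribution rather than a single scalar, ambiguous boundaries (common for speckle-blurred SAR targets) can be represented by spreading mass across adjacent bins.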
Overall, DEOBB is specifically designed to address the key bottleneck of small rotated SAR targets—insufficient perception of directional details—by strengthening local texture and directional gradient representations while improving sensitivity to rotational cues, thereby achieving more accurate OBB localization and more stable angle regression.

3.2. Ck-MultiDilated Reparameterization Block (CkDRB)

In synthetic aperture radar (SAR) target detection, affected by imaging mechanisms and observation conditions, targets often exhibit pronounced multi-scale scattering characteristics and anisotropic structural distributions, while being disturbed by speckle noise. As a result, scattering responses and spatial structures at different scales show strong instability.
From the perspective of SAR imaging physics, the observed intensity distribution of a target can be regarded as the superposition of electromagnetic scattering responses originating from multiple spatial scales and scattering centers. Variations in target geometry, orientation, observation angle, and radar wavelength lead to heterogeneous spatial distributions of scattering energy, making it difficult for a single fixed receptive field to simultaneously capture both localized strong scatterers and spatially extended scattering structures.
Traditional single-scale convolutions have difficulty modeling both local strong scattering details and overall target structures, while directly introducing multi-scale or multi-branch convolution structures significantly increases inference-stage computational complexity, which is unfavorable for efficient deployment in remote sensing scenarios.
To address these issues, we propose a Ck-MultiDilated Reparameterization Block (CkDRB), as shown in Figure 3. By introducing parallel multi-scale dilated depthwise convolutions during training, CkDRB enhances feature modeling capability. Each dilated convolution branch corresponds to a linear filtering operation with a distinct effective receptive field, enabling the network to explicitly model scattering responses at different spatial scales within a unified feature representation. During inference, structural reparameterization is employed to equivalently fold the multi-branch structure into a single convolution operator, thereby achieving multi-scale feature modeling without increasing inference cost.
CkDRB adopts the Dilated Reparameterization Block (DRB) [61] as its core feature extraction unit and is embedded into the C3k2 framework of YOLO-series networks. Given an input feature map $X \in \mathbb{R}^{C \times H \times W}$, DRB employs a parallel multi-branch structure during training. The main branch uses a large-kernel depthwise convolution to model the main target structure and continuous scattering regions, while the remaining branches use depthwise dilated convolutions with different kernel sizes and dilation rates to capture scattering responses and spatial dependencies at different scales.
Formally, a depthwise dilated convolution in the $i$-th branch can be expressed as
$\left(W_i^{(r_i)} * X\right)_c(p) = \sum_{q \in K_i} W_{i,c}(q)\, X_c\left(p + r_i \cdot q\right),$
where $c$ denotes the channel index, $p$ represents a spatial position, $K_i$ is the spatial support of the convolution kernel, and $r_i$ is the dilation rate of the $i$-th branch.
The equivalent receptive field size of this branch is given by
$R_i = r_i (k_i - 1) + 1,$
where $k_i$ denotes the kernel size of the $i$-th branch, which allows each branch to focus on scattering patterns at a specific spatial scale.
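The receptive-field relation above can be checked directly; a minimal helper (the function name is ours):

```python
def effective_receptive_field(kernel_size: int, dilation: int) -> int:
    """R_i = r_i * (k_i - 1) + 1: spatial extent covered by a dilated kernel."""
    return dilation * (kernel_size - 1) + 1

# A 3x3 kernel with dilation 2 covers the same extent as a dense 5x5 kernel.
print(effective_receptive_field(3, 2))  # 5
```

This is why a few small dilated branches can cheaply emulate a range of large dense kernels during training.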
Each branch consists of a depthwise convolution and batch normalization, and the overall output can be expressed as
$Y = \mathrm{BN}\left(W_{\text{main}} * X\right) + \sum_{i=1}^{N} \mathrm{BN}\left(W_i^{(r_i)} * X\right),$
where $W_{\text{main}}$ denotes the convolution kernel of the main branch, and $W_i^{(r_i)}$ denotes the convolution kernel of the $i$-th dilated branch.
To avoid additional computational burden during inference, structural reparameterization is further introduced to transform the multi-branch structure of the training phase into an equivalent single-branch structure for the inference phase. For a convolution followed by batch normalization, the equivalent fused parameters can be written as
$\hat{W} = \frac{\gamma}{\sqrt{\sigma^2 + \epsilon}} W, \quad \hat{b} = \beta - \frac{\gamma \mu}{\sqrt{\sigma^2 + \epsilon}},$
where $\gamma$, $\beta$, $\mu$, and $\sigma^2$ denote the BN scale, shift, running mean, and running variance, respectively, and $\epsilon$ is a small stabilizing constant.
Then, each BN-fused dilated convolution kernel is equivalently expanded into a non-dilated dense large-kernel representation:
$\tilde{W}_i(u, v) = \begin{cases} \hat{W}_i(u / r_i,\, v / r_i), & u \equiv 0 \text{ and } v \equiv 0 \pmod{r_i}, \\ 0, & \text{otherwise}. \end{cases}$
Finally, the equivalent convolution kernels are aggregated to obtain
$W_{\text{final}} = \hat{W}_{\text{main}} + \sum_{i=1}^{N} \tilde{W}_i.$
Through the above reparameterization process, CkDRB degenerates into a standard depthwise convolution during inference:
$Y_{\text{infer}} = W_{\text{final}} * X + b_{\text{final}}.$
Therefore, this module introduces no additional branches, parameter accesses, or computational overhead during inference, while retaining the multi-scale feature modeling capability learned during training. The module is further embedded into the CSP-style bottleneck structure of the network to replace traditional feature extraction units, thereby providing stable and effective feature support for multi-scale SAR target detection without changing the overall network topology.
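The fold described above (BN fusion, dilated-kernel expansion, center padding, and aggregation) can be sketched in NumPy for depthwise per-channel kernels. All function names and shapes here are illustrative, not the released implementation:

```python
import numpy as np

def fuse_bn(W, gamma, beta, mu, var, eps=1e-5):
    """Fold BatchNorm into a preceding depthwise conv: per-channel W_hat, b_hat."""
    scale = gamma / np.sqrt(var + eps)
    return W * scale[:, None, None], beta - mu * scale

def expand_dilated(W, r):
    """Insert r-1 zeros between taps: a k x k dilated kernel -> dense kernel."""
    C, k, _ = W.shape
    K = r * (k - 1) + 1
    D = np.zeros((C, K, K))
    D[:, ::r, ::r] = W                      # taps land where u, v = 0 mod r
    return D

def center_pad(W, K):
    """Zero-pad a kernel symmetrically to K x K (odd sizes assumed)."""
    C, k, _ = W.shape
    p = (K - k) // 2
    out = np.zeros((C, K, K))
    out[:, p:p + k, p:p + k] = W
    return out

# Toy fold: one 5x5 main depthwise kernel plus a 3x3 branch with dilation 2.
C, rng = 2, np.random.default_rng(0)
W_main = rng.standard_normal((C, 5, 5))
W_br = rng.standard_normal((C, 3, 3))
Wm, bm = fuse_bn(W_main, np.ones(C), np.zeros(C), np.zeros(C), np.ones(C))
Wb, bb = fuse_bn(W_br, np.ones(C), np.zeros(C), np.zeros(C), np.ones(C))
W_final = Wm + center_pad(expand_dilated(Wb, 2), 5)
b_final = bm + bb
print(W_final.shape)  # (2, 5, 5)
```

Because every step is linear, the single deployable kernel produces exactly the summed multi-branch response, which is the core of the "algebraically equivalent aggregation" in the title.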
Following the above design, the pseudocode of the Ck-MultiDilated Reparameterization Block (CkDRB) is summarized in Algorithm 1.
Algorithm 1: Ck-MultiDilated Reparameterization Block (CkDRB)
  1:Input: $X \in \mathbb{R}^{C \times H \times W}$; main-branch depthwise kernel $W_{\text{main}}$ (large-kernel DWConv) and BN parameters $(\gamma, \beta, \mu, \sigma^2, \epsilon)$; auxiliary dilated depthwise kernels $\{W_i^{(r_i)}\}_{i=1}^{N}$ with dilation rates $\{r_i\}_{i=1}^{N}$ and branch-wise BN parameters $\{(\gamma_i, \beta_i, \mu_i, \sigma_i^2, \epsilon)\}_{i=1}^{N}$.
  2://Training-time forward (code-consistent): branch-wise BN then summation
  3: $Y \leftarrow \mathrm{BN}\left(W_{\text{main}} * X\right)$
  4:for $i = 1$ to $N$ do
  5: $Y \leftarrow Y + \mathrm{BN}\left(W_i^{(r_i)} * X\right)$
  6:end for
  7://Inference-time re-parameterization
  8: $\hat{W}_{\text{main}} \leftarrow \frac{\gamma}{\sqrt{\sigma^2 + \epsilon}} W_{\text{main}}$, $\hat{b}_{\text{main}} \leftarrow \beta - \frac{\gamma \mu}{\sqrt{\sigma^2 + \epsilon}}$
  9: $W_{\text{final}} \leftarrow \hat{W}_{\text{main}}$, $b_{\text{final}} \leftarrow \hat{b}_{\text{main}}$
10:for $i = 1$ to $N$ do
11: $\hat{W}_i \leftarrow \frac{\gamma_i}{\sqrt{\sigma_i^2 + \epsilon}} W_i^{(r_i)}$, $\hat{b}_i \leftarrow \beta_i - \frac{\gamma_i \mu_i}{\sqrt{\sigma_i^2 + \epsilon}}$
12: $\tilde{W}_i(u, v) \leftarrow \begin{cases} \hat{W}_i(u / r_i,\, v / r_i), & u \equiv 0 \text{ and } v \equiv 0 \pmod{r_i} \\ 0, & \text{otherwise} \end{cases}$
13: $\tilde{W}_i \leftarrow \mathrm{CenterPad}\left(\tilde{W}_i,\ \text{to size of } W_{\text{main}}\right)$
14: $W_{\text{final}} \leftarrow W_{\text{final}} + \tilde{W}_i$, $b_{\text{final}} \leftarrow b_{\text{final}} + \hat{b}_i$
15:end for
16: $Y_{\text{infer}} \leftarrow W_{\text{final}} * X + b_{\text{final}}$
17:Output: training output $Y$ or inference output $Y_{\text{infer}}$
Note. $*$ denotes depthwise convolution (DWConv, groups = C) with same padding; $W_i^{(r_i)} * X$ indicates depthwise convolution with dilation rate $r_i$. In training, the output is obtained by summing the branch-wise BN-normalized responses from the main and dilated branches. In inference, BN is fused into each branch to produce $(\hat{W}, \hat{b})$; each dilated kernel is expanded to an equivalent dense kernel and center-aligned/padded to the main kernel size, after which all kernels and biases are accumulated into a single deployable $(W_{\text{final}}, b_{\text{final}})$.

3.3. Dynamic Feature Weaving Module (Complete Formulation)

In rotated synthetic aperture radar (SAR) object detection, targets usually exhibit multi-scale and multi-orientation distributions under complex backgrounds and low signal-to-noise ratio (SNR) conditions. Due to the side-looking imaging geometry and coherent imaging mechanism of SAR, target scattering responses are often anisotropic and spatially unstable, while background clutter and speckle noise further degrade feature discriminability. Under such conditions, effective integration of deep semantic features and skip-connection features containing fine spatial details is critical for robust detection of rotated targets.
From a feature modeling perspective, deep features provide strong semantic abstraction and robustness to noise, whereas skip features preserve spatial details and boundary information. However, their relative importance varies across channels, spatial locations, and target scales, especially under rotation and low-SNR conditions. Therefore, feature fusion in rotated SAR object detection can be regarded as a conditional aggregation problem, where heterogeneous features should be dynamically integrated according to the input content.
To address this issue, we propose the Dynamic Feature Weaving Module (DynWeave), which dynamically integrates deep and skip features through adaptive modulation along channel, spatial, and scale dimensions. Instead of performing fixed fusion, DynWeave explicitly models cross-level interactions via a hierarchical weaving strategy. The overall architecture is illustrated in Figure 4.
(1)
Channel Unification and Spatial Alignment
Given a deep feature map $F_d$ and a skip feature map $F_s$, DynWeave first aligns them in the channel space using a pointwise convolution:
$F_d' = \mathrm{Conv}_{1 \times 1}(F_d), \quad F_s' = \mathrm{Conv}_{1 \times 1}(F_s),$
where $\mathrm{Conv}_{1 \times 1}(\cdot)$ denotes the $1 \times 1$ convolution. If the input channel dimension already matches the target dimension, an identity mapping is applied.
To resolve the spatial resolution mismatch caused by different network depths, the skip feature is resized via bilinear interpolation:
$F_s^{\text{align}} = \mathrm{Interp}(F_s'),$
yielding spatially aligned features for subsequent joint modeling.
(2)
Global Channel Attention with Joint–Split Weaving
After alignment, the deep and skip features are concatenated along the channel dimension:
$F_{\text{cat}} = \left[F_d',\, F_s^{\text{align}}\right].$
Global channel attention is generated by applying global average pooling (GAP) followed by a $1 \times 1$ convolution and sigmoid activation:
$w_c = \sigma\left(\mathrm{Conv}_{1 \times 1}\left(\mathrm{GAP}(F_{\text{cat}})\right)\right),$
where $w_c \in \mathbb{R}^{2C}$.
To explicitly model cross-level interactions, the channel weights are split into two parts:
$w_c = \left[w_c^d,\, w_c^s\right], \quad w_c^d, w_c^s \in \mathbb{R}^{C},$
which are applied to the deep and skip features, respectively:
$F_d^c = w_c^d \odot F_d', \quad F_s^c = w_c^s \odot F_s^{\text{align}}.$
This joint–split channel attention enables adaptive cross-level channel recalibration conditioned on both feature streams.
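A single-sample NumPy sketch of this joint–split channel weaving, with the $1 \times 1$ convolution on the pooled vector modeled as a dense matrix; shapes and names are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def joint_split_channel_attention(F_d, F_s, W, b):
    """Joint-split channel weaving sketch (NumPy, single sample).

    F_d, F_s: (C, H, W) aligned deep / skip features.
    W, b: parameters of the 1x1 conv acting on the pooled 2C-vector,
          modeled here as a (2C, 2C) matrix (an illustrative assumption).
    """
    C = F_d.shape[0]
    F_cat = np.concatenate([F_d, F_s], axis=0)         # (2C, H, W)
    g = F_cat.mean(axis=(1, 2))                        # GAP -> (2C,)
    w_c = sigmoid(W @ g + b)                           # weights in (0, 1)
    w_d, w_s = w_c[:C], w_c[C:]                        # joint -> split
    return w_d[:, None, None] * F_d, w_s[:, None, None] * F_s

rng = np.random.default_rng(1)
C, H, Wd = 4, 8, 8
Fd, Fs = rng.standard_normal((C, H, Wd)), rng.standard_normal((C, H, Wd))
Wm, bv = rng.standard_normal((2 * C, 2 * C)) * 0.1, np.zeros(2 * C)
Fd_c, Fs_c = joint_split_channel_attention(Fd, Fs, Wm, bv)
print(Fd_c.shape, Fs_c.shape)  # (4, 8, 8) (4, 8, 8)
```

The key point is that the weights for each stream are conditioned on the pooled statistics of both streams, which is what distinguishes this from two independent SE-style blocks.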
(3)
Local Spatial Attention with Dual Spatial Weaving
While channel attention captures global semantic relevance, rotated SAR targets exhibit strong spatial variability due to orientation changes and anisotropic scattering. To model such spatial instability, DynWeave introduces a local spatial attention mechanism based on depthwise convolution:
$w_s = \sigma\left(\mathrm{DWConv}\left(\left[F_d^c,\, F_s^c\right]\right)\right),$
where $\mathrm{DWConv}(\cdot)$ denotes depthwise convolution.
The spatial attention map is further decomposed into two branch-specific masks:
$w_s = \left[w_s^d,\, w_s^s\right], \quad w_s^d, w_s^s \in \mathbb{R}^{1 \times H \times W},$
which are applied as
$F_d^{cs} = w_s^d \odot F_d^c, \quad F_s^{cs} = w_s^s \odot F_s^c.$
The joint use of channel-wise and spatial-wise weaving can be interpreted as a factorized approximation of full spatio-channel attention, achieving adaptive modulation with reduced computational complexity.
(4)
Dynamic Large-Kernel Enhancement and Gated Output
The spatially and channel-modulated features are fused as
$F^{cs} = F_d^{cs} + F_s^{cs}.$
To capture multi-scale contextual information, DynWeave employs a dynamic large-kernel enhancement module. Multiple convolution branches with different kernel sizes are used, and their outputs are dynamically weighted:
$F_k = \sum_{i=1}^{N} \alpha_i\, \mathrm{Conv}_{k_i}\left(F^{cs}\right),$
where the weighting coefficients are generated as
$\alpha = \mathrm{Softmax}\left(\mathrm{Conv}_{1 \times 1}\left(\mathrm{GAP}(F^{cs})\right)\right), \quad \sum_{i=1}^{N} \alpha_i = 1.$
From a signal processing perspective, this module can be regarded as a scale-adaptive filter bank, where the network selects appropriate receptive fields conditioned on the input feature distribution, effectively alleviating scale–orientation coupling caused by target rotation.
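The scale-adaptive weighting can be sketched as follows; the linear map standing in for the $1 \times 1$ convolution on the pooled vector is an illustrative assumption:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def branch_weights(F, W, b):
    """Scale-adaptive branch weights: alpha = Softmax(W @ GAP(F) + b).

    F: (C, H, W) fused feature; W: (N, C), b: (N,) stand in for the
    1x1 conv on the pooled vector (shapes are illustrative assumptions).
    Returns convex weights over the N kernel branches.
    """
    g = F.mean(axis=(1, 2))          # global average pooling -> (C,)
    return softmax(W @ g + b)        # non-negative, sums to one

rng = np.random.default_rng(0)
F = rng.standard_normal((8, 16, 16))
alpha = branch_weights(F, rng.standard_normal((3, 8)), np.zeros(3))
print(alpha.sum())
```

The weighted sum $F_k = \sum_i \alpha_i\, \mathrm{Conv}_{k_i}(F)$ then mixes the branch outputs with these convex coefficients, so the effective receptive field follows the input content.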
Finally, a gated unit is applied to generate the output feature
$F_{\text{out}} = \sigma(G) \odot F_k,$
where $G$ denotes the output of the gating branch; the gating operation further suppresses residual background noise and highlights discriminative target features.
Based on the above design, the overall execution flow of the DynWeave module is summarized in Algorithm 2.
Algorithm 2: Dynamic Feature Weaving (DynWeave)
  1:Input: Deep features X , Skip connection features S
  2:Initialize target dimension C
  3: $X \leftarrow \mathrm{Conv}_{1 \times 1}(X)$, $S \leftarrow \mathrm{Conv}_{1 \times 1}(S)$
  4:if $\mathrm{Size}(X) \neq \mathrm{Size}(S)$ then $S \leftarrow \mathrm{Upsample}(S)$
  5://Attention & Weaving
  6: $F_{cat} \leftarrow \mathrm{Concat}([X, S])$
  7: $W_{global} \leftarrow \sigma(\mathrm{MLP}(\mathrm{AvgPool}(F_{cat})))$
  8: $W_{local} \leftarrow \sigma(\mathrm{DWConv}(F_{cat}))$
  9: $X \leftarrow X \odot W_{global}[0] \odot W_{local}[0]$
10: $S \leftarrow S \odot W_{global}[1] \odot W_{local}[1]$
11://Fusion & Dynamic Enhancement
12: $F_{fused} \leftarrow \mathrm{DynamicKernel}(X + S)$
13: $Y \leftarrow \mathrm{Conv}_{gate}(F_{fused})$
14:Output: Fused feature map Y
Note: $\sigma(\cdot)$ represents the Sigmoid function; $\odot$ denotes element-wise multiplication; $\mathrm{DWConv}$ denotes depthwise convolution used for local spatial context.
Overall, DynWeave can be viewed as a hierarchical conditional feature weaving framework that dynamically integrates deep and shallow features through channel-wise, spatial-wise, and scale-wise modulation. By explicitly modeling cross-level interactions and input-dependent feature importance, DynWeave provides a theoretically grounded and robust solution for rotated SAR object detection under complex backgrounds and low-SNR conditions.

4. Results

4.1. Experimental Platform and Evaluation Metrics

To validate the effectiveness of the proposed method, all experiments were conducted on a unified experimental platform. The operating system is Ubuntu 18.04, and the implementation is based on PyTorch 2.1.1 with Python 3.8. We adopt YOLOv13n as the baseline model. The detailed software/hardware configuration is summarized in Table 1. For fair comparison, all ablation studies and comparative experiments were carried out under the same training protocol. In particular, we use a fixed input resolution of 640 × 640 for all datasets, where images are preprocessed by the default YOLOv13 letterbox strategy (aspect-ratio-preserving resizing followed by padding), and the corresponding labels are transformed consistently. Moreover, unless otherwise stated, the same set of training hyperparameters is applied across HRSID, RSDD-SAR, and DSSDD to ensure strict comparability. The complete hyperparameter configuration is reported in Table 2 to facilitate reproducibility.
Unless explicitly stated otherwise, identical hyperparameter settings, including the optimizer and learning schedule, were applied to all experiments on the three datasets; the specific values are listed in Table 2.
In this study, Precision (P), Recall (R), and mean Average Precision (mAP@0.5) were selected as evaluation metrics to assess the performance of the improved network. The corresponding calculation formulas are given as follows.
$\mathrm{IoU} = \frac{A \cap B}{A \cup B}$
$\mathrm{Precision} = \frac{TP}{TP + FP}$
$\mathrm{Recall} = \frac{TP}{TP + FN}$
$AP = \sum_{i=1}^{n-1} \left(r_{i+1} - r_i\right) P_{\mathrm{inter}}\left(r_{i+1}\right)$
$F1\text{-}\mathrm{Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
$mAP = \frac{1}{k} \sum_{i=1}^{k} AP_i$
In Equation (26), the ground-truth bounding box is denoted by $A$ and the predicted bounding box by $B$; the intersection area between $A$ and $B$ is represented by $A \cap B$, and the union area by $A \cup B$. In Equations (27) and (28), TP denotes the number of positive samples correctly predicted as positive, FP the number of negative samples incorrectly predicted as positive, and FN the number of positive samples incorrectly predicted as negative. In Equation (29), $n$ denotes the number of recall sample points, $r_i$ is the recall value at the $i$-th rank, and $P_{\mathrm{inter}}(r_{i+1})$ is the interpolated precision at the next recall level, defined as the maximum precision measured at any recall value exceeding $r_{i+1}$; this summation effectively computes the area under the Precision–Recall (PR) curve. In Equation (31), $k$ represents the total number of object classes; the mean Average Precision (mAP) is calculated by averaging the AP values of all $k$ categories to comprehensively evaluate the detection performance of the network.
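The metric definitions above can be exercised on toy counts; this is a generic all-point-interpolation sketch, not the exact evaluation code used in the experiments:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, Recall, and F1 from confusion counts (Eqs. 27, 28, 30)."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

def average_precision(recalls, precisions):
    """Interpolated AP: sum over recall steps of (r_{i+1} - r_i) * P_inter,
    where P_inter(r) is the max precision at any recall >= r (Eq. 29).
    `recalls` must be sorted ascending, paired with `precisions`.
    """
    ap, prev_r = 0.0, 0.0
    for i, r in enumerate(recalls):
        ap += (r - prev_r) * max(precisions[i:])
        prev_r = r
    return ap

p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
ap = average_precision([0.5, 1.0], [1.0, 0.8])
```

With 8 true positives and 2 each of false positives and negatives, precision, recall, and F1 all equal 0.8; the two-point PR curve above yields AP = 0.9, i.e., the area under the interpolated curve.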

4.2. Datasets

Experiments are conducted on three representative SAR ship detection datasets, namely HRSID, RSDD-SAR, and DSSDD, which differ in imaging resolution, polarization modes, annotation formats, and scene complexity. The dataset overview is provided in Table 3, and the detailed train/val partitioning with subset statistics is reported in Table 4.
Partitioning strategy: We do not use an independent test set. Instead, we adopt a train/validation evaluation protocol, where the validation subset is used for quantitative evaluation. For HRSID, we employ a custom 8:2 train/val split; for RSDD-SAR and DSSDD, we follow the official splits. The same partitions are used throughout all experiments to ensure fair comparison and reproducibility.
(1)
High-Resolution SAR Images Dataset (HRSID)
HRSID is a large-scale high-resolution benchmark dataset for SAR ship detection [62]. It contains 5604 images and approximately 17k ship instances, supporting both object detection and instance segmentation tasks. The data are collected from platforms such as Sentinel-1B, TerraSAR-X, and TanDEM-X, with spatial resolutions ranging from 0.5 m to 5 m, and multiple polarization channels (HH, VV, HV). Due to its complex scenes and fine-grained annotations, HRSID is particularly valuable for multi-scale detection, small-target detection, and robustness evaluation under speckle noise and clutter. For reproducibility, we use a custom 8:2 train/val split, resulting in 4483 training images (13,344 instances) and 1121 validation images (3607 instances).
(2)
Rotated Ship Detection Dataset in SAR Images (RSDD-SAR)
RSDD-SAR is constructed using SAR data acquired by GF-3 and TerraSAR-X [63]. It consists of 127 original SAR images, which are cropped into approximately 7000 image patches of size 512 × 512 pixels, forming a patch-level dataset. The dataset features arbitrary target orientations, a high proportion of small targets, and large variations in aspect ratios, making it well-suited for rotated object detection. The annotations adopt the OpenCV long-side representation, including the target center point, long side, short side, and rotation angle. In our experiments, we follow the official train/val split, which contains 5600 training patches (8144 instances) and 1400 validation patches (2119 instances).
(3)
Dual-Polarimetric SAR Ship Detection Dataset (DSSDD)
DSSDD is a dual-polarimetric (VV, VH) SAR ship detection dataset designed for deep learning applications [64]. It is constructed from Sentinel-1 SLC images, and after preprocessing (e.g., via SNAP), a total of 1236 patches of size 256 × 256 pixels are generated. The dataset provides both rotated bounding boxes (RBox) and horizontal bounding boxes (BBox), and is dominated by small targets. The dual-polarization information improves target separability, especially in near-shore environments with complex backgrounds. We follow the official train/val split, with 856 training patches (2277 instances) and 380 validation patches (1139 instances).
Small-target definition and statistics. To provide a clear and quantitative definition of small targets, we categorize objects according to the pixel area of their oriented bounding boxes (OBBs). Specifically, the OBB area $A$ is computed in the image plane from the four corner points, and an object is defined as small if $A < 1024$ pixels² (equivalent to a 32 × 32 square). Under this criterion, small targets account for 78.64% of instances in HRSID, 81.18% in RSDD-SAR, and 100.00% in DSSDD (Table 5), indicating that these benchmarks are dominated by small ships, with DSSDD being the most extreme. To characterize the effective pixel-size range while reducing the influence of outliers, we report percentile statistics (P5/P50/P95) for both $A$ and the OBB side lengths $(w, h)$ in Table 5.
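The small-target criterion can be reproduced from OBB corner points with the shoelace formula; a minimal sketch (function names are ours):

```python
def obb_area(corners):
    """Pixel area of an oriented bounding box from its 4 corners (shoelace)."""
    area = 0.0
    n = len(corners)
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def is_small_target(corners, thresh=1024.0):
    """Small-target criterion used here: OBB area A < 1024 px^2 (strict)."""
    return obb_area(corners) < thresh

axis_aligned = [(0, 0), (32, 0), (32, 32), (0, 32)]
rotated = [(16, 0), (32, 16), (16, 32), (0, 16)]   # 45-degree square
print(obb_area(axis_aligned), is_small_target(axis_aligned))  # 1024.0 False
print(obb_area(rotated), is_small_target(rotated))            # 512.0 True
```

Computing the area from the rotated corners, rather than from an axis-aligned envelope, avoids inflating the size of tilted ships with large aspect ratios.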

4.3. Ablation Study and Comparison Experiments

4.3.1. Ablation Experiments on the HRSID Dataset

To validate the effectiveness of the proposed modules in multi-directional and high-density ship scenarios, we conduct stepwise ablation studies on the HRSID dataset using YOLOv13 as the baseline. Specifically, the DEOBB detection head, CkDRB feature enhancement block, and DynWeave adaptive fusion module are progressively integrated into the baseline model. Performance is evaluated using Precision, Recall, F1-score, mAP@50, and mAP@50–95, while computational efficiency is additionally reported in terms of parameters (params), GFLOPs, and inference speed (FPS). The results are summarized in Table 6.
As shown in Table 6, the baseline model achieves Precision, Recall, F1-score, mAP@50, and mAP@50–95 of 0.91573, 0.85303, 0.88327, 0.93792, and 0.66690, respectively, indicating that rotated localization and background suppression remain challenging under dense and complex scenes.
After introducing DEOBB, the model obtains a noticeable improvement in localization-sensitive metrics: Recall increases by 1.41 percentage points and F1 rises by 0.62 percentage points, while mAP@50 and mAP@50–95 increase by 0.18 percentage points and 0.75 percentage points, respectively. Although Precision slightly decreases by 0.28 percentage points, DEOBB simultaneously reduces model complexity (from 2.5 M to 2.1 M params; from 6.4 to 5.9 GFLOPs) with only a marginal FPS drop (from 40.5 to 39.6), suggesting that the proposed detail-enhanced branches effectively strengthen rotated box regression with a lightweight design.
Building upon DEOBB, adding CkDRB further enhances both accuracy and robustness in cluttered SAR scenes: Precision and Recall increase by 0.59 and 0.75 percentage points, respectively, and the F1-score improves to 0.89619 (an increase of 0.67 percentage points). Meanwhile, mAP@50 and mAP@50–95 reach 0.94455 (an increase of 0.48 percentage points) and 0.69224 (an increase of 1.79 percentage points), confirming that multi-scale dilated context modeling and speckle-noise suppression significantly improve discriminative feature extraction. This gain is accompanied by increased computation (6.4 M params, 18.2 GFLOPs), yet the measured inference speed remains acceptable (47.6 FPS) under the current implementation and test settings.
Finally, integrating DynWeave yields consistent additional gains, pushing the performance to the best overall results (Precision/Recall/F1 = 0.91953/0.87795/0.89826, mAP@50 = 0.94872, mAP@50–95 = 0.69768). Compared with the CkDRB-enhanced model, DynWeave brings an increase of 0.07 percentage points in Precision, 0.33 percentage points in Recall, and 0.21 percentage points in F1-score, together with increases of 0.42 percentage points in mAP@50 and 0.54 percentage points in mAP@50–95, at a modest overhead (7.0 M params; 19.3 GFLOPs) and a slight FPS reduction (from 47.6 to 43.8). Overall, relative to the baseline, the full model improves Recall by 2.49 percentage points, F1-score by 1.50 percentage points, and mAP@50–95 by 3.08 percentage points, demonstrating the advantage of the proposed framework in adaptive fusion of multi-directional and multi-scale features for dense rotated ship detection.
The comparative training curves of multiple performance metrics shown in Figure 5 illustrate that, compared with the baseline model, the proposed model achieves notable performance gains in Precision, Recall, mAP@50, and mAP@50–95, with all models converging after approximately 250 training epochs. Overall, the three proposed modules effectively enhance the model’s rotational robustness and feature representation capability from different perspectives, resulting in consistent and significant improvements over the baseline across all evaluation metrics.
To intuitively validate the effectiveness of the proposed modules, Grad-CAM is employed to visualize the feature maps under different model configurations, as shown in Figure 6. As illustrated in the second column, the baseline model exhibits limited feature extraction capability in complex backgrounds. In particular, in near-shore scenarios (Figure 6b), the heatmaps show strong responses in non-target regions, such as land areas and sea clutter, leading to imprecise localization of small targets.
After incorporating the DEOBB module (third column), the model’s perception of rotated small targets is noticeably enhanced. As observed in Figure 6a, the activation maps become more concentrated around the ship centers, indicating that the multi-branch enhanced convolution effectively improves feature perception.
With the further introduction of the CkDRB module (fourth column), the coherent speckle noise and land background interference in SAR images are effectively suppressed, benefiting from the re-parameterization mechanism and multi-branch dilated convolutions. Consequently, background noise is significantly reduced, and the boundaries of the activation maps become clearer.
Finally, after integrating the DynWeave module (fifth column), the model achieves optimal performance across all scenarios. Through the global–local dual-attention mechanism, the feature heatmaps accurately cover targets of different scales and orientations, such as the large ship in Figure 6c, while maintaining a high signal-to-noise ratio. These results demonstrate that DynWeave successfully realizes adaptive feature fusion and substantially enhances the robustness of the model.

4.3.2. Ablation Experiments on the DSSDD Dataset

To evaluate the adaptability and effectiveness of the proposed modules in dual-polarization SAR ship detection scenarios, we conduct stepwise ablation experiments on the DSSDD dataset using YOLOv13 as the baseline. In addition to detection performance metrics (Precision, Recall, F1-score, mAP@50, and mAP@50–95), computational efficiency is further reported in terms of parameters (params), GFLOPs, and inference speed (FPS) for a comprehensive assessment. The results are summarized in Table 7.
As shown in Table 7, the baseline model already achieves strong performance on DSSDD, with Precision, Recall, F1-score, mAP@50, and mAP@50–95 values of 0.94675, 0.95088, 0.94878, 0.98543, and 0.70081, respectively. In terms of efficiency, it requires 2.5 M parameters and 6.4 GFLOPs, and runs at 60.8 FPS. Nevertheless, there remains room for improvement in rotated target representation and the utilization of polarization-related features.
After introducing DEOBB, the model achieves consistent improvements across all metrics. Specifically, Precision increases by 0.66 percentage points, rising from 0.94675 to 0.95335; Recall increases by 1.49 percentage points, rising from 0.95088 to 0.96576; and F1-score increases by 1.07 percentage points, rising from 0.94878 to 0.95951. Meanwhile, mAP@50 increases by 0.12 percentage points, rising from 0.98543 to 0.98664, and mAP@50–95 improves substantially by 9.27 percentage points, rising from 0.70081 to 0.79347. These results indicate that DEOBB effectively strengthens rotated bounding box regression and enhances fine-grained structural feature capture. More importantly, DEOBB reduces model complexity, with the parameter count decreasing from 2.5 M to 2.1 M and GFLOPs decreasing from 6.4 to 5.9, while the inference speed only slightly decreases from 60.8 to 60.1 FPS, demonstrating a favorable accuracy–efficiency trade-off.
With the further incorporation of CkDRB, the feature representation capability on dual-polarization SAR imagery is further enhanced. Compared with the DEOBB-enhanced model, Precision increases by 0.66 percentage points to 0.95992, Recall increases by 0.15 percentage points to 0.96722, and F1-score increases by 0.40 percentage points to 0.96356. Moreover, mAP@50 and mAP@50–95 rise to 0.98687 and 0.80305, corresponding to gains of 0.02 and 0.96 percentage points, respectively. These results suggest that CkDRB can effectively capture multi-scale contextual textures and suppress speckle-like interference, thereby improving discriminative feature extraction under complex ship morphologies and cluttered backgrounds. This improvement comes with increased computation, with 6.4 M parameters and 18.2 GFLOPs. Under the current implementation and test settings, the inference speed reaches 67.0 FPS.
Finally, integrating DynWeave further improves the model performance and yields the best overall results. Precision, Recall, and F1-score reach 0.96414, 0.96783, and 0.96598, respectively, while mAP@50 rises to 0.98725, and mAP@50–95 achieves the highest value of 0.81167. Compared with the CkDRB-enhanced model, DynWeave increases Precision, Recall, and F1-score by 0.42, 0.06, and 0.24 percentage points, respectively, and further improves mAP@50 and mAP@50–95 by 0.04 and 0.86 percentage points, validating the effectiveness of its adaptive multi-scale feature fusion in dual-polarization scenarios. In terms of efficiency, the full model incurs a moderate overhead, requiring 7.0 M parameters and 19.3 GFLOPs, and runs at 57.7 FPS, indicating that improved robustness and localization quality are achieved at a controllable computational cost.
As illustrated in Figure 7, the multi-metric training curves show that, compared with the baseline model, the improved model achieves higher performance in Precision, Recall, mAP@50, and mAP@50–95, leading to a significant enhancement in overall detection capability. Meanwhile, all models tend to converge after approximately 100 training epochs, indicating stable training behavior and consistent optimization trends.
To further evaluate the generalization capability of the proposed method under different data distributions, the same Grad-CAM visualization analysis is conducted on the DSSDD dataset, and the results are shown in Figure 8. As illustrated in the second column, the baseline model exhibits evident limitations when handling complex scenes in the DSSDD dataset. In particular, in the multi-target dispersed scenario shown in Figure 8b, the feature heatmaps appear highly diffused, making it difficult to distinguish adjacent ship targets. In addition, in the near-shore scenario shown in Figure 8c, strong false activations are produced by land backgrounds.
After introducing the DEOBB module (third column), the model demonstrates enhanced attention to target features. As shown in Figure 8b, the previously diffused feature regions begin to converge toward the specific locations of individual ships, indicating the effectiveness of this module in enhancing rotated feature perception.
Subsequently, the incorporation of the CkDRB module (fourth column) significantly improves the signal-to-noise ratio. In the dark-background scenario of Figure 8a and the riverbank region in Figure 8c, the originally cluttered background noise is markedly suppressed, as evidenced by darker activation colors. This observation confirms that the multi-branch dilated convolutions and re-parameterization mechanism effectively mitigate background clutter and coherent speckle noise.
Finally, the complete model integrated with the DynWeave module (fifth column) achieves the best detection performance. In Figure 8b, the model successfully realizes clear separation of dispersed small targets, with well-defined boundaries for each object. In the highly complex near-shore environment shown in Figure 8c, the model accurately focuses on ships within the waterway while maximally suppressing interference from onshore buildings. These results convincingly demonstrate that DynWeave substantially enhances the robustness of the model in complex SAR scenarios through adaptive feature fusion.

4.3.3. Ablation Experiments on the RSDD Dataset

To further evaluate the generalization capability of the proposed modules under complex rotation scenarios, we conduct stepwise ablation experiments on the RSDD dataset using YOLOv13 as the baseline. In addition to detection accuracy metrics (Precision, Recall, F1-score, mAP@50, and mAP@50–95), we also report computational efficiency in terms of Params, GFLOPs, and inference speed (FPS). The results are summarized in Table 8. Although the baseline already performs strongly on RSDD, further improvements are still achievable, particularly for rotated bounding box representation and fine-grained feature modeling.
After introducing DEOBB, all metrics exhibit consistent (though modest) improvements. Recall increases by 0.28 percentage points, rising from 0.91076 to 0.91360, and the F1-score increases by 0.17 percentage points to 0.92987. mAP@50 and mAP@50–95 improve by 0.08 and 0.14 percentage points, reaching 0.97178 and 0.73934, respectively. Meanwhile, the model becomes lighter, with Params decreasing from 2.5 M to 2.1 M and GFLOPs decreasing from 6.4 to 5.9, while FPS slightly decreases from 61.3 to 59.4. These results indicate that DEOBB can further refine rotated bounding box regression accuracy with a favorable accuracy–efficiency trade-off.
With the subsequent incorporation of CkDRB, the performance gains become more pronounced. Compared with the DEOBB-enhanced model, Recall increases by 0.80 percentage points to 0.92162, the F1-score increases by 0.43 percentage points to 0.93412, and mAP@50–95 increases by 0.70 percentage points to 0.74636. mAP@50 also improves to 0.97245. These results demonstrate the effectiveness of CkDRB in multi-scale contextual modeling and robust feature extraction under complex backgrounds. This improvement is accompanied by increased computation, with Params rising to 6.4 M and GFLOPs to 18.2. Under the current implementation and test settings, the inference speed remains high at 65.8 FPS.
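The aggregation principle behind CkDRB's re-parameterization can be illustrated with a minimal 1-D sketch (the kernels and signal below are hypothetical, not the module's actual weights): parallel convolution branches with different dilation rates are linear operators, so their summed outputs equal a single convolution with an algebraically merged kernel, which is what allows the multi-branch training-time structure to collapse into one branch at inference.

```python
def conv1d(x, w, pad):
    # 1-D cross-correlation with symmetric zero padding, stride 1
    xp = [0.0] * pad + list(x) + [0.0] * pad
    k = len(w)
    return [sum(w[j] * xp[i + j] for j in range(k))
            for i in range(len(xp) - k + 1)]

def dilate(w, d):
    # insert d-1 zeros between taps: kernel size k -> (k-1)*d + 1
    out = [0.0] * ((len(w) - 1) * d + 1)
    for i, v in enumerate(w):
        out[i * d] = v
    return out

def pad_center(w, size):
    # zero-pad a kernel symmetrically to a common effective size
    extra = (size - len(w)) // 2
    return [0.0] * extra + list(w) + [0.0] * extra

x  = [0.5, 2.0, -1.0, 3.0, 0.0, 1.5, -2.0, 0.7]   # toy feature row
w1 = [0.2, 0.5, 0.2]                               # local branch, dilation 1
w2 = [0.1, 0.3, 0.1]                               # context branch, dilation 2

# training-time view: run the branches separately and sum their outputs
branch_sum = [a + b for a, b in zip(conv1d(x, w1, 1),
                                    conv1d(x, dilate(w2, 2), 2))]

# inference-time view: fold both branches into one equivalent 5-tap kernel
merged = [a + b for a, b in zip(pad_center(w1, 5), dilate(w2, 2))]
fused = conv1d(x, merged, 2)

# the two views agree up to floating-point error
assert all(abs(a - b) < 1e-12 for a, b in zip(branch_sum, fused))
```

The same linearity argument extends to 2-D kernels and to bias terms, which is why re-parameterized blocks add no inference-time cost relative to a single convolution of the merged effective size.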
Finally, integrating DynWeave further strengthens the overall detection performance. Precision increases to 0.94985, Recall increases to 0.92218, and the F1-score reaches the highest value of 0.93581. mAP@50 remains improved at 0.97246, while mAP@50–95 stays at a comparable level (0.74612) relative to the CkDRB-enhanced model. These results suggest that DynWeave effectively enhances feature fusion across different orientations, contributing to improved classification–localization consistency. In terms of efficiency, the full model introduces a moderate overhead (7.0 M Params, 19.3 GFLOPs) and runs at 60.2 FPS, indicating that the performance gains are obtained at a controllable computational cost.
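As a consistency check on these figures, the F1-score is the harmonic mean of Precision and Recall; a minimal sketch using the reported values for the full model recovers the tabulated F1-score:

```python
def f1_score(precision, recall):
    # harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# reported Precision and Recall of the full SAR-DRBNet model on RSDD
p, r = 0.94985, 0.92218
print(round(f1_score(p, r), 5))  # matches the reported 0.93581
```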
As illustrated in Figure 9, the comparison of multi-metric training curves shows that, compared with the baseline model, the proposed improved model achieves notable performance gains in Precision, Recall, mAP@50, and mAP@50–95. Moreover, all models converge after approximately 150 training epochs, indicating stable training behavior and consistent convergence trends.
To comprehensively validate the robustness of the proposed method across different data domains, further visualization analysis is conducted on the RSDD dataset, and the results are shown in Figure 10. As illustrated in the second column, the baseline model exhibits severe feature diffusion when facing typical near-shore scenarios in the RSDD dataset. In particular, in the dense dock scenario shown in Figure 10b, the activation maps merge into large continuous regions, making it difficult to distinguish closely arranged ship instances. In addition, in Figure 10c, extensive land buildings generate strong background noise responses, which severely interfere with target extraction.
After introducing the DEOBB module (third column), the model’s geometric perception of rotated targets is significantly enhanced. As observed in Figure 10a, for extremely small targets, the activation regions become more compact and precise, indicating improved localization capability.
Subsequently, the incorporation of the CkDRB module (fourth column) substantially purifies the feature maps. The most notable improvement appears in the highly interfered regions of Figure 10c, where the previously high-intensity land background activations (red/yellow) are effectively suppressed to cooler tones (blue). This result indicates that the re-parameterization mechanism and multi-branch dilated convolutions successfully filter high-frequency land scattering interference, leading to a significant improvement in the signal-to-noise ratio.
Finally, the complete model integrated with the DynWeave module (fifth column) demonstrates superior performance in dense scenarios. In Figure 10b, the model not only suppresses near-shore background clutter but also achieves instance-level separation of densely packed ships, with each ship corresponding to an independent activation peak. Benefiting from the global–local dual-attention mechanism, the model in Figure 10c accurately localizes targets in corner regions while maintaining a clean background. These observations convincingly demonstrate the superiority of DynWeave in handling complex near-shore and densely populated scenarios.

4.3.4. Model Comparison Experiments on the HRSID Dataset

As summarized in Table 9, the proposed SAR-DRBNet maintains strong overall detection performance on the HRSID dataset. It achieves a Precision of 0.91953 and the highest Recall of 0.87795 among all compared methods, resulting in the best F1-score of 0.89826. These results indicate that the proposed approach improves target discovery capability in dense and multi-directional ship scenes while preserving a low false-positive rate, thereby achieving a favorable balance between precision and recall.
In terms of detection accuracy, SAR-DRBNet attains the highest mAP@50 of 0.94872, surpassing all competing models and improving upon the second-best YOLOv10n-OBB (0.94374). Under the more stringent mAP@50–95 metric, SAR-DRBNet achieves 0.69768, slightly below the best-performing YOLOv10n-OBB (0.70387) but competitive overall. This suggests that the proposed method substantially improves detection sensitivity and overall matching at IoU = 0.5, leaves only a small margin under higher IoU thresholds, and generalizes robustly to multi-scale and multi-orientation ship targets in complex SAR backgrounds.
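For reference, mAP@50–95 follows the COCO-style protocol of averaging AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05, which is why it penalizes loose localization more heavily than mAP@50. A toy sketch with a hypothetical, linearly decaying AP curve:

```python
def map_50_95(ap_at):
    # COCO-style average of AP over IoU thresholds 0.50:0.05:0.95
    thresholds = [0.50 + 0.05 * i for i in range(10)]
    return sum(ap_at(t) for t in thresholds) / len(thresholds)

# hypothetical AP curve that decays as the IoU threshold tightens
def toy_ap(t):
    return max(0.0, 0.95 - (t - 0.5))

print(round(map_50_95(toy_ap), 4))  # 0.725 for this toy curve
```

A detector whose boxes are well placed but not tightly fitted can therefore lead on mAP@50 while trailing slightly on mAP@50–95, consistent with the pattern observed here.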
Importantly, Table 9 also reports computational efficiency. Compared with lightweight YOLO-based OBB detectors (e.g., YOLOv10n-OBB with 2.3 M parameters and 6.8 GFLOPs), SAR-DRBNet introduces a moderate computational overhead (7.0 M parameters and 19.3 GFLOPs) due to the synergistic integration of DEOBB, CkDRB, and DynWeave. Despite this increase, the proposed model still runs at 43.8 FPS, which is comparable to the baseline YOLOv13n-OBB (40.5 FPS) and demonstrates that the performance gains are achieved at a controllable inference cost.
Overall, benefiting from the complementary effects of the DEOBB detection head (enhanced rotated regression), CkDRB (multi-scale contextual enhancement with noise suppression), and DynWeave (adaptive multi-scale feature fusion), SAR-DRBNet delivers consistent improvements in Recall, F1-score, and mAP@50 on HRSID, validating the effectiveness of the proposed model design.
We conduct a qualitative comparison of different object detection models on the HRSID dataset, as illustrated in Figure 11. In the sparse scenario shown in Figure 11a, all models are able to accurately detect the targets. However, in the dense multi-target scenario shown in Figure 11b, conventional YOLO-based models, although capable of identifying all targets, exhibit insufficient bounding box localization accuracy when handling closely arranged ships. In contrast, the proposed SAR-DRBNet demonstrates superior multi-object separation capability, with the predicted bounding boxes showing the highest alignment with the ground-truth annotations. This result provides strong evidence that precise target localization in dense scenarios is significantly improved.
Furthermore, in the most challenging near-shore port scenario shown in Figure 11c, the proposed method exhibits strong robustness against clutter, enabling the model to effectively detect ships that are either obscured by or located in close proximity to high-intensity land clutter. This capability effectively alleviates the missed detection issues commonly encountered by conventional methods in complex backgrounds.
Overall, the qualitative results consistently indicate that the proposed SAR-DRBNet achieves comprehensive optimization across key performance aspects on the HRSID dataset.

4.3.5. Model Comparison Experiments on the DSSDD Dataset

Table 10 presents a comprehensive comparison on the DSSDD dataset. Overall, most oriented bounding box (OBB) detectors achieve strong performance under this dual-polarization SAR ship detection benchmark, indicating that the task is relatively mature and performance tends to saturate. Under such competitive conditions, the proposed SAR-DRBNet still delivers consistently competitive results across multiple metrics.
Specifically, SAR-DRBNet achieves Precision, Recall, and F1-score values of 0.96414, 0.96783, and 0.96598, respectively, which are comparable to representative high-performing YOLO-based OBB detectors (e.g., YOLOv6-OBB and YOLOv12n-OBB). This demonstrates that the proposed design maintains stable detection capability in dual-polarization SAR scenarios with complex ship morphologies and cluttered backgrounds.
In terms of detection accuracy, SAR-DRBNet attains the highest mAP@50 of 0.98725 among all evaluated methods, slightly surpassing YOLOv5n-OBB (0.98724), YOLOv6-OBB (0.98720), and YOLOv10n-OBB (0.98684). Under the more stringent mAP@50–95 metric, SAR-DRBNet reaches 0.81167, which remains highly competitive and is close to other top-performing methods such as YOLOv5n-OBB (0.81165) and YOLOv12n-OBB (0.81117). Although the best mAP@50–95 is achieved by YOLOv8n-OBB (0.81533) and YOLOv6-OBB (0.81368), the proposed method stays within a narrow margin, suggesting robust generalization under multi-scale, multi-orientation, and polarization-sensitive detection settings.
Moreover, Table 10 reports computational efficiency in terms of Params, GFLOPs, and FPS. Compared with lightweight YOLO-based OBB models, SAR-DRBNet introduces a moderate increase in model complexity, with 7.0 M parameters and 19.3 GFLOPs, while maintaining an inference speed of 57.7 FPS. These results indicate that the performance improvements—particularly the leading mAP@50—are achieved at a controllable computational cost, which is meaningful for practical deployment.
Overall, even when performance differences across strong baselines become small on DSSDD, the proposed multi-module collaborative design still provides stable gains and competitive accuracy, supporting its effectiveness in exploiting polarization-aware cues and handling complex scattering characteristics in dual-polarization SAR imagery.
Based on the DSSDD dataset, Figure 12 clearly illustrates the performance advantages of the proposed SAR-DRBNet in addressing the inherent challenges of SAR imagery, particularly in terms of improved target confidence and detection stability.
In the weak-target scenario shown in Figure 12a, the background contains a high level of noise interference, and the target signal is extremely faint. Although YOLOv11 and YOLOv12 are able to detect the target, they assign relatively low confidence scores (e.g., 0.46 for YOLOv12). In contrast, the proposed SAR-DRBNet not only consistently detects the target but also outputs the highest confidence score (0.81), providing strong evidence of its robustness and superior target discrimination capability under adverse environmental conditions.
For the multi-target and missed-detection scenario shown in Figure 12b, which typically occurs in near-shore or densely clustered settings where some targets are located near image boundaries, YOLOv11 and YOLOv12 suffer from evident missed detections, resulting in insufficient recall. Although both YOLOv13 and the proposed method successfully identify two targets, SAR-DRBNet assigns consistently high confidence scores to both detections (0.87 and 0.86), indicating improved detection stability and enhanced target separation capability.
Finally, in the complex near-shore small-target scenario shown in Figure 12c, which combines intricate coastline backgrounds with multiple small-scale targets, the proposed SAR-DRBNet successfully detects all targets while providing higher and more consistent confidence scores for each instance (e.g., 0.89, 0.88, 0.86, and 0.79). Compared with other models that exhibit unstable confidence levels for edge targets, these results demonstrate that SAR-DRBNet achieves more accurate recognition and stronger interference resistance for small targets in cluttered environments.

4.3.6. Model Comparison Experiments on the RSDD Dataset

Table 11 reports the comparison results on the RSDD dataset. Since RSDD is relatively well-curated, most advanced OBB detectors exhibit strong and generally comparable performance, suggesting that the benchmark is close to saturation. Under such a competitive setting, SAR-DRBNet still demonstrates consistent advantages across multiple metrics.
Specifically, SAR-DRBNet achieves a Precision of 0.94985 and a Recall of 0.92218, leading to the highest F1-score of 0.93581 among all compared methods. This indicates improved detection stability and a more favorable precision–recall trade-off for rotated ship detection.
Regarding detection accuracy, SAR-DRBNet attains the highest mAP@50 of 0.97246, slightly surpassing other strong YOLO-based OBB baselines. Notably, as mAP@50 becomes highly saturated on RSDD, the mAP@50–95 metric is more indicative of localization quality under stricter IoU thresholds. Under this criterion, SAR-DRBNet achieves 0.74612, which remains highly competitive and stays within a narrow margin of the best-performing methods (e.g., YOLOv6-OBB with 0.74710 and YOLOv5n-OBB with 0.74665). These results suggest that the proposed method maintains robust localization capability under both loose and strict evaluation protocols while achieving superior overall balance reflected by the F1-score.
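Under these stricter thresholds, IoU for oriented boxes must be computed on the rotated rectangles themselves rather than on axis-aligned envelopes. A self-contained sketch of this computation via convex polygon clipping (Sutherland–Hodgman); this is an illustrative implementation, not the evaluation code used for the tables:

```python
def cross(o, a, b):
    # z-component of (a - o) x (b - o)
    return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])

def line_intersection(a, b, p, q):
    # intersection point of the infinite lines through ab and pq
    a1, b1 = b[1]-a[1], a[0]-b[0]
    c1 = a1*a[0] + b1*a[1]
    a2, b2 = q[1]-p[1], p[0]-q[0]
    c2 = a2*p[0] + b2*p[1]
    d = a1*b2 - a2*b1
    return ((b2*c1 - b1*c2)/d, (a1*c2 - a2*c1)/d)

def clip_polygon(subject, clipper):
    # Sutherland–Hodgman clipping; both polygons convex, counter-clockwise
    output = list(subject)
    n = len(clipper)
    for i in range(n):
        a, b = clipper[i], clipper[(i+1) % n]
        inp, output = output, []
        for j in range(len(inp)):
            p, q = inp[j], inp[(j+1) % len(inp)]
            p_in, q_in = cross(a, b, p) >= 0, cross(a, b, q) >= 0
            if p_in:
                output.append(p)
            if p_in != q_in:
                output.append(line_intersection(a, b, p, q))
        if not output:
            break
    return output

def polygon_area(poly):
    # shoelace formula
    if len(poly) < 3:
        return 0.0
    s = sum(poly[i][0]*poly[(i+1) % len(poly)][1]
            - poly[(i+1) % len(poly)][0]*poly[i][1]
            for i in range(len(poly)))
    return abs(s) / 2.0

def obb_iou(p1, p2):
    inter = polygon_area(clip_polygon(p1, p2))
    union = polygon_area(p1) + polygon_area(p2) - inter
    return inter / union if union > 0 else 0.0

# sanity check with two overlapping squares: intersection 1, union 7
sq1 = [(0, 0), (2, 0), (2, 2), (0, 2)]
sq2 = [(1, 1), (3, 1), (3, 3), (1, 3)]
print(obb_iou(sq1, sq2))  # 1/7
```

The same routine applies unchanged to rotated rectangles, since any oriented box is a convex quadrilateral once its four corners are enumerated counter-clockwise.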
In addition, Table 11 includes computational efficiency in terms of Params, GFLOPs, and FPS. SAR-DRBNet introduces a moderate computational overhead, with 7.0 M parameters and 19.3 GFLOPs, compared with lightweight YOLO-based OBB models (typically around 2–4 M parameters and 6–12 GFLOPs). Despite this increase, SAR-DRBNet still runs at 60.2 FPS, which is comparable to the baseline YOLOv13n-OBB (61.3 FPS), indicating that the accuracy gains are achieved at a controllable inference cost.
Overall, SAR-DRBNet exhibits robust advantages in F1-score and mAP@50, while maintaining top-tier mAP@50–95 performance on RSDD, validating its enhanced feature representation capability and generalization for rotated ship detection.
Figure 13 presents a comparison of experimental results on the RSDD dataset. Across all scenarios, the proposed SAR-DRBNet consistently demonstrates clear advantages over the competing models. For example, in Figure 13a, which involves strong clutter caused by port facilities, YOLOv12 exhibits unstable detection results, as highlighted by the red circles, whereas SAR-DRBNet accurately distinguishes targets from background clutter while maintaining high confidence scores.
In Figure 13b, which focuses on densely arranged offshore ships, all models are able to detect the targets; however, the bounding boxes produced by SAR-DRBNet are more compact, and the associated confidence scores are generally higher than those of YOLOv11, demonstrating its superiority in decoupling features of densely distributed targets.
Figure 13c represents the most challenging case, characterized by severe land echo interference and extremely small ship targets. As indicated by the red circles, both YOLOv11 and YOLOv12 fail to extract effective features, resulting in serious missed detections. Although YOLOv13 detects the target, the confidence score remains relatively low (0.79). In contrast, SAR-DRBNet not only successfully captures all small targets but also assigns high confidence scores (0.87).
Overall, this visual comparison strongly demonstrates that SAR-DRBNet exhibits superior robustness and detection accuracy in handling the complexity of SAR imagery and the diversity of ship targets.

4.3.7. Visualization Analysis of the Speckle Noise Suppression Effect of CkDRB

To intuitively verify the speckle-noise suppression capability of CkDRB in SAR imagery, we select three representative complex-scene examples and compare the feature-response differences between the baseline model (w/o CkDRB) and the model equipped with CkDRB (w/ CkDRB) at a key intermediate layer. As shown in Figure 14a–c, the three examples correspond to (Figure 14a) a single-target open-sea scene with weak clutter, (Figure 14b) a single-target near-shore scene with strong clutter and texture interference, and (Figure 14c) a densely distributed multi-target scene in a harbor area with dense background scatterers. For each example, the first column shows the original SAR image, the second column shows the feature-response heatmap extracted by the baseline model at a key layer (e.g., P3/P4 in the neck or mid-to-late layers in the backbone), and the third column shows the corresponding result after introducing CkDRB. In Figure 14a, the baseline still exhibits scattered high responses over sea-speckle regions, whereas adding CkDRB substantially weakens background responses and yields more concentrated target activation with a more continuous contour. In Figure 14b, the baseline tends to produce spurious activations along shorelines and textured structures, while CkDRB significantly reduces non-target high responses and makes the target boundary response clearer and more stable. In Figure 14c, the baseline responses are prone to diffusion or adhesion under interference from dense scatterers, whereas CkDRB improves the separability of responses among different targets and suppresses interference activations induced by the dense background.
These comparisons demonstrate that CkDRB can effectively suppress non-target activations caused by speckle noise while preserving target saliency, thereby enhancing target–background separability at the feature level and providing more stable discriminative cues for subsequent localization and classification in the detection head, which helps reduce false alarms and missed detections under noisy and complex-background conditions.

5. Discussion

The ablation results reported in Table 6, Table 7 and Table 8 provide strong evidence for the effectiveness of SAR-DRBNet in addressing the key challenges of rotated small-object detection in SAR imagery. Specifically, the experiments disentangle the individual contributions of each proposed component and reveal their complementary roles in enhancing detection performance under diverse SAR scattering conditions.
The DEOBB detection head substantially improves the quality of rotated bounding box regression by employing a multi-branch enhanced convolutional design. This advantage is particularly evident on the DSSDD dataset, where the introduction of DEOBB leads to a remarkable increase of 9.27 percentage points in mAP@50–95 (from 70.08% to 79.35%). Such improvement indicates that DEOBB effectively alleviates the ambiguity in angle-sensitive regression, which is a common bottleneck in densely distributed ship targets with arbitrary orientations.
The CkDRB module contributes primarily by suppressing speckle noise inherent to SAR imaging through its re-parameterization mechanism. This capability is reflected in the improved recall on the HRSID dataset, where strong coherent noise and background clutter are prevalent. By enhancing the signal-to-noise ratio of discriminative features, CkDRB enables the detector to recover weak ship responses that are otherwise easily submerged by noise, thereby improving target completeness.
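As background on why this is difficult, SAR speckle is conventionally modeled as multiplicative, unit-mean gamma-distributed noise, so its strength scales with the local reflectivity rather than adding a fixed floor. A small simulation (independent of CkDRB's learned filtering; the reflectivity value and sample size are arbitrary) illustrates the model and shows how averaging independent looks lowers the noise variance:

```python
import random

random.seed(0)

def speckled(reflectivity, looks):
    # L-look intensity speckle: mean of L unit-mean exponential draws,
    # applied multiplicatively, so E[pixel] = reflectivity for any L
    noise = sum(random.gammavariate(1.0, 1.0) for _ in range(looks)) / looks
    return reflectivity * noise

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

n = 2000
single = [speckled(1.0, 1) for _ in range(n)]
multi = [speckled(1.0, 4) for _ in range(n)]

# 4-look averaging shrinks the speckle variance roughly fourfold
print(variance(single), variance(multi))
```

Classical multi-look averaging trades resolution for this variance reduction; a learned module such as CkDRB instead has to suppress the same multiplicative fluctuations while preserving spatial detail.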
Meanwhile, the DynWeave module plays a crucial role in complex coastal scenarios. On the RSDD dataset, which contains strong near-shore interference and heterogeneous backgrounds, DynWeave elevates the F1-score to 0.9358 by adaptively integrating multi-scale contextual information. This suggests that dynamic feature weaving is particularly effective in mitigating false alarms caused by land clutter while preserving high recall for small and elongated ship targets.
Building upon the validated effectiveness of individual components, SAR-DRBNet is further evaluated against a broad range of state-of-the-art detectors, including classical anchor-free and anchor-based methods (e.g., Rotated FCOS and R-Faster R-CNN) as well as recent YOLOv5–v13 OBB variants, as summarized in Table 9, Table 10 and Table 11. Across all three datasets, SAR-DRBNet consistently achieves the highest F1-scores, demonstrating a superior balance between precision and recall.
On the HRSID dataset (Table 9), characterized by complex backgrounds and severe speckle noise, SAR-DRBNet attains the highest recall of 0.87795, outperforming YOLOv13n-OBB (0.85303), and achieves a final F1-score of 0.89826. This performance gain can be largely attributed to the noise-aware design of the CkDRB module. On the densely populated DSSDD dataset (Table 10), the model exhibits strong feature decoupling capability, achieving an mAP@50 of 0.98725, slightly surpassing YOLOv5n-OBB while maintaining an exceptionally high F1-score of 0.96598. In the presence of strong coastal interference on the RSDD dataset (Table 11), SAR-DRBNet achieves the highest precision (0.94985) and an F1-score of 0.93581, outperforming Hyper-YOLO-OBB and YOLOv11n-OBB.
Although certain models, such as YOLOv8n-OBB, demonstrate marginal advantages in specific mAP@50–95 metrics, SAR-DRBNet exhibits greater robustness in application-oriented SAR scenarios where detection completeness and balanced discrimination are of higher priority. This robustness stems from the incorporation of modules explicitly designed to accommodate SAR-specific scattering characteristics rather than relying solely on generic CNN architectures.
The reported quantitative results are obtained under diverse acquisition conditions across the HRSID, RSDD-SAR, and DSSDD datasets, ranging from relatively simple offshore scenes to highly cluttered near-shore environments. The consistent performance gains across datasets indicate strong generalization capability. Complementary qualitative analyses (Figure 6, Figure 8 and Figure 10) further reveal that the progressive integration of DEOBB, CkDRB, and DynWeave leads to more focused target responses while effectively suppressing speckle noise and land clutter. Notably, the clear instance separation observed in dense ship distributions highlights the model's effectiveness in mitigating feature diffusion, a critical challenge in high-density SAR target detection.
Despite these advantages, several limitations remain. The performance of SAR-DRBNet under extreme environmental conditions beyond the current datasets, such as high sea states with intense wave-induced clutter, warrants further investigation. In addition, while DynWeave significantly enhances feature fusion, the introduction of global–local attention mechanisms inevitably increases training-time computational cost compared to purely lightweight CNN architectures. Future work will explore more efficient attention designs and adaptive inference strategies to further balance accuracy and efficiency.

6. Conclusions

This paper proposes a high-precision rotated small-target detection method, termed SAR-DRBNet, based on the YOLOv13 architecture, aiming to address the key challenges of rotated small-target detection in synthetic aperture radar (SAR) images under complex backgrounds and adverse environmental conditions. These challenges mainly include severe noise interference, large variations in target scale, and the need for rotation invariance. To enhance detection performance and robustness, three core innovations are integrated into the proposed framework. First, the Detail-Enhanced Oriented Bounding Box Detection Head (DEOBB) introduces GNDEConv-based multi-branch enhanced convolutions, enabling accurate feature perception and oriented bounding box (OBB) regression for rotated small targets, thereby significantly improving rotational invariance. Second, the CkDRB module, which leverages multi-branch dilated convolutions and a re-parameterization mechanism, is demonstrated to efficiently extract multi-scale features from SAR images while effectively suppressing noise. Third, the Dynamic Feature Weaving Module (DynWeave) combines a global–local dual-attention mechanism with dynamic large-kernel convolutions to achieve adaptive fusion of features across different scales and orientations, substantially enhancing robustness and detection accuracy in complex scenarios. Comprehensive experiments are conducted on three widely used SAR rotated object detection benchmarks, namely HRSID, RSDD-SAR, and DSSDD. The experimental results consistently demonstrate that the proposed method outperforms existing state-of-the-art OBB-based detection algorithms while achieving a favorable balance between detection accuracy and inference efficiency. Notably, SAR-DRBNet exhibits excellent generalization capability in cross-dataset evaluations, highlighting its strong potential for practical applications under diverse SAR imaging conditions. 
In summary, the proposed high-precision rotated small-target detection framework provides an efficient, stable, and highly generalizable solution for target recognition and localization in complex SAR scenes. Looking ahead, future research may focus on further model lightweighting to facilitate deployment, as well as exploring unsupervised or weakly supervised learning strategies to alleviate the high annotation cost associated with SAR imagery.

Author Contributions

Conceptualization, L.L.; methodology, L.L.; software, L.L.; validation, S.C., Z.S., X.Z., C.L., W.W. and L.M.; formal analysis, L.L.; investigation, L.L.; resources, P.Z.; data curation, L.L.; writing—original draft preparation, L.L.; writing—review and editing, S.C., Z.S. and P.Z.; visualization, L.L.; supervision, P.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

  41. Zhang, C.; Yu, R.; Wang, S.; Zhang, F.; Ge, S.; Li, S.; Zhao, X. Edge-Optimized Lightweight YOLO for Real-Time SAR Object Detection. Remote Sens. 2025, 17, 2168. [Google Scholar] [CrossRef]
  42. He, H.; Hu, T.; Xu, S.; Xu, H.; Song, L.; Sun, Z. PPDM-YOLO: A Lightweight Algorithm for SAR Ship Image Target Detection in Complex Environments. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 22690–22705. [Google Scholar] [CrossRef]
  43. Huang, Y.; Wang, D.; Wu, B.; An, D. NST-YOLO11: ViT merged model with neuron attention for arbitrary-oriented ship detection in SAR images. Remote Sens. 2024, 16, 4760. [Google Scholar] [CrossRef]
44. Chen, H.; Chen, C.; Wang, F.; Shi, Y.; Zeng, W. RSNet: A Light Framework for SAR Ship Detection. arXiv 2024, arXiv:2410.23073. [Google Scholar]
  45. Zhang, Y.; Sun, Z.; Chang, S.; Tang, B.; Hou, B. SegS-YOLO: A YOLO-Based Instance Segmentation Approach for Small-Scale Ship Targets in SAR Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2026, 19, 2658–2679. [Google Scholar] [CrossRef]
  46. Xia, R.; Chen, J.; Huang, Z.; Wan, H.; Wu, B.; Sun, L.; Yao, B.; Xiang, H.; Xing, M. CRTransSar: A visual transformer based on contextual joint representation learning for SAR ship detection. Remote Sens. 2022, 14, 1488. [Google Scholar] [CrossRef]
  47. Zhou, Y.; Jiang, X.; Xu, G.; Yang, X.; Liu, X.; Li, Z. PVT-SAR: An arbitrarily oriented SAR ship detector with pyramid vision transformer. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 16, 291–305. [Google Scholar] [CrossRef]
  48. Zhao, S.; Luo, Y.; Zhang, T.; Guo, W.; Zhang, Z. A domain specific knowledge extraction transformer method for multisource satellite-borne SAR images ship detection. ISPRS J. Photogramm. Remote Sens. 2023, 198, 16–29. [Google Scholar] [CrossRef]
  49. Feng, Y.; You, Y.; Tian, J.; Meng, G. OEGR-DETR: A novel detection transformer based on orientation enhancement and group relations for SAR object detection. Remote Sens. 2023, 16, 106. [Google Scholar] [CrossRef]
  50. Fang, M.; Gu, Y.; Peng, D. FEVT-SAR: Multi-category oriented SAR ship detection based on feature enhancement vision transformer. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 18, 2704–2717. [Google Scholar] [CrossRef]
51. Sun, Z.; Zhang, X.; Leng, X.; Wu, X.; Xiong, B.; Ji, K.; Kuang, G. KFIA-Net: A knowledge fusion and imbalance-aware network for multi-category SAR ship detection. Int. J. Appl. Earth Obs. Geoinf. 2026, 146, 105127. [Google Scholar] [CrossRef]
  52. Yang, Z.; Xia, X.; Liu, Y.; Wen, G.; Zhang, W.E.; Guo, L. LPST-Det: Local-perception-enhanced swin transformer for SAR ship detection. Remote Sens. 2024, 16, 483. [Google Scholar] [CrossRef]
  53. Zhou, S.; Zhang, M.; Wu, L.; Yu, D.; Li, J.; Fan, F.; Zhang, L.; Liu, Y. Lightweight sar ship detection network based on transformer and feature enhancement. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 4845–4858. [Google Scholar] [CrossRef]
  54. Sivapriya, M.; Suresh, S. ViT-DexiNet: A vision transformer-based edge detection operator for small object detection in SAR images. Int. J. Remote Sens. 2023, 44, 7057–7084. [Google Scholar] [CrossRef]
  55. Qin, C.; Zhang, L.; Wang, X.; Li, G.; He, Y.; Liu, Y. RDB-DINO: An improved end-to-end transformer with refined de-noising and boxes for small-scale ship detection in SAR images. IEEE Trans. Geosci. Remote Sens. 2024, 63, 1–17. [Google Scholar] [CrossRef]
  56. Li, T.; Wang, C.; Tian, S.; Zhang, B.; Wu, F.; Tang, Y.; Zhang, H. TACMT: Text-Aware Cross-Modal Transformer for Visual Grounding on High-Resolution SAR Images. ISPRS J. Photogramm. Remote Sens. 2025, 222, 152–166. [Google Scholar] [CrossRef]
  57. Zhang, Y.; Cao, J.; You, Y.; Qiao, Y. DenSe-AdViT: A novel Vision Transformer for Dense SAR Object Detection. arXiv 2025, arXiv:2504.13638. [Google Scholar]
  58. Luo, Z.; Zeng, Z.; Tang, W.; Wan, J.; Xie, Z.; Xu, Y. Dense dual-branch cross attention network for semantic segmentation of large-scale point clouds. IEEE Trans. Geosci. Remote Sens. 2023, 62, 1–16. [Google Scholar] [CrossRef]
  59. Luo, Z.; Zeng, Z.; Wan, J.; Tang, W.; Jin, Z.; Xie, Z.; Xu, Y. D2T-Net: A dual-domain transformer network exploiting spatial and channel dimensions for semantic segmentation of urban mobile laser scanning point clouds. Int. J. Appl. Earth Obs. Geoinf. 2024, 132, 104039. [Google Scholar] [CrossRef]
  60. Luo, Z.; Zeng, T.; Jiang, X.; Peng, Q.; Ma, Y.; Xie, Z.; Pan, X. Dense Supervised Dual-Aware Contrastive Learning for Airborne Laser Scanning Weakly Supervised Semantic Segmentation. IEEE Trans. Geosci. Remote Sens. 2025, 63, 1–15. [Google Scholar] [CrossRef]
  61. Zhang, Y.; Ding, X.; Yue, X. Scaling up your kernels: Large kernel design in convnets towards universal representations. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 11692–11707. [Google Scholar] [CrossRef]
  62. Wei, S.; Zeng, X.; Qu, Q.; Wang, M.; Su, H.; Shi, J. HRSID: A high-resolution SAR images dataset for ship detection and instance segmentation. IEEE Access 2020, 8, 120234–120254. [Google Scholar] [CrossRef]
63. Xu, C.; Su, H.; Li, J.; Liu, Y.; Yao, L.; Gao, L.; Yan, W.; Wang, T. RSDD-SAR: Rotated ship detection dataset in SAR images. J. Radars 2022, 11, 581–599. [Google Scholar]
  64. Hu, Y.; Li, Y.; Pan, Z. A dual-polarimetric SAR ship detection dataset and a memory-augmented autoencoder-based detection method. Sensors 2021, 21, 8478. [Google Scholar] [CrossRef] [PubMed]
  65. Li, Z.; Hou, B.; Wu, Z.; Ren, B.; Yang, C. FCOSR: A simple anchor-free rotated detector for aerial object detection. Remote Sens. 2023, 15, 5499. [Google Scholar] [CrossRef]
  66. Yang, S.; Pei, Z.; Zhou, F.; Wang, G. Rotated faster R-CNN for oriented object detection in aerial images. In Proceedings of the 2020 3rd International Conference on Robot Systems and Applications, Chengdu, China, 13 July 2020; pp. 35–39. [Google Scholar]
  67. Zhang, Y.; Chen, C.; Hu, R.; Yu, Y. ESarDet: An efficient SAR ship detection method based on context information and large effective receptive field. Remote Sens. 2023, 15, 3018. [Google Scholar] [CrossRef]
  68. He, R.; Han, D.; Shen, X.; Han, B.; Wu, Z.; Huang, X. AC-YOLO: A lightweight ship detection model for SAR images based on YOLO11. PLoS ONE 2025, 20, e0327362. [Google Scholar] [CrossRef]
  69. Hu, R.; Lin, H.; Lu, Z.; Xia, J. Despeckling Representation for Data-Efficient SAR Ship Detection. IEEE Geosci. Remote Sens. Lett. 2024, 22, 1–5. [Google Scholar] [CrossRef]
  70. Feng, Y.; Huang, J.; Du, S.; Ying, S.; Yong, J.-H.; Li, Y.; Ding, G.; Ji, R.; Gao, Y. Hyper-yolo: When visual object detection meets hypergraph computation. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 47, 2388–2401. [Google Scholar] [CrossRef]
  71. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976. [Google Scholar] [CrossRef]
  72. Wang, C.-Y.; Yeh, I.-H.; Mark Liao, H.-Y. Yolov9: Learning what you want to learn using programmable gradient information. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; pp. 1–21. [Google Scholar]
  73. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J. Yolov10: Real-time end-to-end object detection. Adv. Neural Inf. Process. Syst. 2024, 37, 107984–108011. [Google Scholar]
  74. Tian, Y.; Ye, Q.; Doermann, D. Yolov12: Attention-centric real-time object detectors. arXiv 2025, arXiv:2502.12524. [Google Scholar]
Figure 1. Schematic illustration of the SAR-DRBNet architecture. The framework consists of three components: Backbone, Neck, and Head. Specifically, the light blue regions denote the CkDRB modules (feature extraction stage), the light brown regions represent the DynWeave structure (feature fusion stage), and the yellow regions correspond to the DEOBB module (object detection stage).
Figure 2. Detail-Enhanced Oriented Bounding Box Detect Head. The head enhances multi-scale SAR features via shared GNDEConv modules for oriented target detection.
Figure 3. CkDRB module. This figure illustrates the two structural variants of the proposed CkDRB feature extraction module, along with its core component, the DRB.
Figure 4. Dynamic Feature Weaving Module. The module performs feature fusion through attention weighting and dynamic multi-scale convolution.
Figure 5. Comparison of multi-metric training curves on the HRSID dataset.
Figure 6. Grad-CAM visualization results of the ablation experiments on the HRSID dataset. (a) Sparse small-target scenario; (b) near-shore dense-target scenario; (c) large-scale single-target scenario.
Figure 7. Comparison of multi-metric training curves on the DSSDD dataset.
Figure 8. Grad-CAM visualization results of the ablation experiments on the DSSDD dataset. (a) Single-target scenario with a simple background; (b) offshore dispersed multi-target scenario; (c) complex near-shore scenario.
Figure 9. Comparison of multi-metric training curves on the RSDD dataset.
Figure 10. Grad-CAM visualization results of the ablation experiments on the RSDD dataset. (a) Small-target scenario with a simple background; (b) dense docked-ship scenario; (c) strong land-clutter interference scenario.
Figure 11. Comparison of detection results on the HRSID dataset. (a) Sparse target scenario; (b) dense target scenario; (c) near-shore port scenario.
Figure 12. Comparison of detection results on the DSSDD dataset. (a) Weak-target scenario; (b) multi-target scenario; (c) complex near-shore small-target scenario.
Figure 13. Comparison of detection results on the RSDD dataset. (a) Near-shore scenario with strong background clutter; (b) offshore scenario with densely distributed targets; (c) complex scenario with land interference and extremely small targets.
Figure 14. Comparison of key-layer feature-response heatmaps without CkDRB and with CkDRB. (a) Single-target scene over open sea with weak clutter; (b) single-target scene in near-shore areas with strong clutter/texture interference; (c) densely distributed multi-target scene in a harbor area or under a background of dense scatterers.
Table 1. Experimental Environment Configuration.
| Environmental Parameter | Value |
|---|---|
| Operating system | Ubuntu 18.04 |
| Deep learning framework | PyTorch 2.1.1 |
| Programming language | Python 3.8 |
| CPU | Intel Xeon Platinum 8358 |
| GPU | NVIDIA A100 (SXM4, 80 GB) |
| RAM | 256 GB |
Table 2. Training Hyperparameters.
| Hyperparameter | Value |
|---|---|
| Learning rate | 0.01 |
| Image size | 640 × 640 |
| Batch size | 32 |
| Optimizer | SGD |
| Weight decay | 0.0005 |
| Epochs | 300 |
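The SGD configuration above can be made concrete with a single parameter update. The sketch below folds the L2 weight-decay penalty into the gradient before the momentum step, using the learning rate and weight decay from Table 2; the momentum coefficient of 0.9 is illustrative, not a value reported in the paper.

```python
def sgd_step(w, grad, velocity, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One SGD-with-momentum update for a single scalar parameter."""
    g = grad + weight_decay * w          # L2 weight decay folded into the gradient
    velocity = momentum * velocity + g   # momentum accumulation
    return w - lr * velocity, velocity

w, v = 1.0, 0.0
w, v = sgd_step(w, grad=0.2, velocity=v)  # w -> 1.0 - 0.01 * 0.2005 = 0.997995
```

This matches the common convention in deep learning frameworks where weight decay is applied as an additive gradient term rather than a separate shrink step.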
Table 3. Detailed Information Concerning the Datasets.
| Dataset | HRSID | RSDD-SAR | DSSDD |
|---|---|---|---|
| Images | 5604 | 7000 | 1236 |
| Image size / resolution | 800 × 800 px / 0.5–3 m | 512 × 512 px / multi-resolution | 256 × 256 px / ~9 m × 14 m |
| Polarization | VV, HV, HH | Multiple polarization modes (GF-3, TerraSAR-X) | VV, VH |
| Key strengths | Large, high-resolution, fine-grained; well suited to training complex models | Arbitrary object orientations, large aspect ratios, high proportion of small targets, diverse scenes | Dual-polarization pseudo-color fusion; 98% of targets are small targets |
Table 4. Train/Val Partitioning and Subset Statistics.
| Dataset | Split Strategy | Train (Images/Patches) | Train Instances | Val (Images/Patches) | Val Instances |
|---|---|---|---|---|---|
| HRSID | Custom (8:2) | 4483 images | 13,344 | 1121 images | 3607 |
| RSDD-SAR | Official | 5600 patches | 8144 | 1400 patches | 2119 |
| DSSDD | Official | 856 patches | 2277 | 380 patches | 1139 |
Table 5. Small-target statistics in pixel space.
| Dataset | Total Inst. | Small Inst. (%) (A < 1024) | Area A (px²) P5/P50/P95 | l (px) P5/P50/P95 | w (px) P5/P50/P95 |
|---|---|---|---|---|---|
| HRSID | 16,951 | 13,330 (78.64%) | 78.00/540.19/2315.29 | 13.42/45.12/99.67 | 5.30/12.02/25.61 |
| RSDD-SAR | 10,263 | 8331 (81.18%) | 101.75/506.95/2052.88 | 16.00/43.61/97.65 | 6.06/11.38/22.40 |
| DSSDD | 3416 | 3416 (100.00%) | 59.49/138.23/305.34 | 10.00/18.03/30.00 | 5.63/7.66/10.71 |
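Statistics of this kind follow a standard recipe: small targets are those with box area below the COCO threshold of 32 × 32 = 1024 px², and P5/P50/P95 are percentiles of the area and side-length distributions. The sketch below shows the computation on synthetic box dimensions; it is not the authors' script.

```python
import numpy as np

def small_target_stats(lengths, widths, thresh=1024.0):
    """Fraction of small targets and P5/P50/P95 area percentiles."""
    areas = lengths * widths                        # oriented-box areas in px^2
    small_frac = float(np.mean(areas < thresh))     # COCO small-object criterion
    p5, p50, p95 = np.percentile(areas, [5, 50, 95])
    return small_frac, (p5, p50, p95)

rng = np.random.default_rng(0)
l = rng.uniform(10, 100, size=1000)   # synthetic box lengths (px)
w = rng.uniform(5, 25, size=1000)     # synthetic box widths (px)
frac, (p5, p50, p95) = small_target_stats(l, w)
```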
Table 6. Ablation experiments on the HRSID dataset.
| Model | Precision ↑ | Recall ↑ | F1 ↑ | mAP50 ↑ | mAP50–95 ↑ | Params | GFLOPs | FPS |
|---|---|---|---|---|---|---|---|---|
| Baseline | 0.91573 | 0.85303 | 0.883269 | 0.93792 | 0.66690 | 2.5 M | 6.4 | 40.5 |
| +DEOBB | 0.91292 | 0.86713 | 0.889436 | 0.93975 | 0.67436 | 2.1 M | 5.9 | 39.6 |
| +CkDRB | 0.91884 | 0.87462 | 0.896185 | 0.94455 | 0.69224 | 6.4 M | 18.2 | 47.6 |
| +DynWeave | 0.91953 | 0.87795 | 0.898259 | 0.94872 | 0.69768 | 7.0 M | 19.3 | 43.8 |
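The F1 column in the ablation and comparison tables is the harmonic mean of precision and recall, so each entry can be sanity-checked directly:

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

f1 = f1_score(0.91573, 0.85303)   # HRSID baseline row -> ~0.88327
```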
Table 7. Ablation experiments on the DSSDD dataset.
| Model | Precision ↑ | Recall ↑ | F1 ↑ | mAP50 ↑ | mAP50–95 ↑ | Params | GFLOPs | FPS |
|---|---|---|---|---|---|---|---|---|
| Baseline | 0.94675 | 0.95088 | 0.94878 | 0.98543 | 0.70081 | 2.5 M | 6.4 | 60.8 |
| +DEOBB | 0.95335 | 0.96576 | 0.95951 | 0.98664 | 0.79347 | 2.1 M | 5.9 | 60.1 |
| +CkDRB | 0.95992 | 0.96722 | 0.96356 | 0.98687 | 0.80305 | 6.4 M | 18.2 | 67.0 |
| +DynWeave | 0.96414 | 0.96783 | 0.96598 | 0.98725 | 0.81167 | 7.0 M | 19.3 | 57.7 |
Table 8. Ablation experiments on the RSDD dataset.
| Model | Precision ↑ | Recall ↑ | F1 ↑ | mAP50 ↑ | mAP50–95 ↑ | Params | GFLOPs | FPS |
|---|---|---|---|---|---|---|---|---|
| Baseline | 0.94623 | 0.91076 | 0.92817 | 0.9710 | 0.73793 | 2.5 M | 6.4 | 61.3 |
| +DEOBB | 0.94674 | 0.9136 | 0.92987 | 0.97178 | 0.73934 | 2.1 M | 5.9 | 59.4 |
| +CkDRB | 0.94695 | 0.92162 | 0.93412 | 0.97245 | 0.74636 | 6.4 M | 18.2 | 65.8 |
| +DynWeave | 0.94985 | 0.92218 | 0.93581 | 0.97246 | 0.74612 | 7.0 M | 19.3 | 60.2 |
Table 9. Results of model comparison experiments on the HRSID dataset.
| Model | Precision ↑ | Recall ↑ | F1 ↑ | mAP50 ↑ | mAP50–95 ↑ | Params | GFLOPs | FPS |
|---|---|---|---|---|---|---|---|---|
| Rotated FCOS [65] | 0.91070 | 0.87100 | 0.89039 | 0.8040 | – | 32.7 M | 207.7 | 29.7 |
| R-FasterRCNN [66] | 0.91085 | 0.85100 | 0.87994 | 0.80120 | – | 36.4 M | 215.9 | 16.5 |
| ESarDet-OBB [67] | 0.91161 | 0.71429 | 0.80098 | 0.81159 | 0.59072 | 3.5 M | 9.7 | 69.4 |
| AC-YOLO-OBB [68] | 0.91915 | 0.84201 | 0.87889 | 0.93524 | 0.6732 | 1.8 M | 5.5 | 98.8 |
| DS-YOLO-OBB [69] | 0.91233 | 0.87278 | 0.89212 | 0.94732 | 0.68996 | 9.4 M | 25.6 | 124.3 |
| Hyper-YOLO-OBB [70] | 0.91771 | 0.85998 | 0.88791 | 0.94183 | 0.68258 | 2.8 M | 7.9 | 103.9 |
| YOLOv5n-OBB | 0.91761 | 0.87046 | 0.89341 | 0.94314 | 0.68418 | 2.6 M | 7.3 | 138.4 |
| YOLOv6-OBB [71] | 0.90892 | 0.83874 | 0.87242 | 0.92795 | 0.66048 | 4.3 M | 11.8 | 157.9 |
| YOLOv8n-OBB | 0.89378 | 0.84170 | 0.86695 | 0.91702 | 0.65310 | 3.1 M | 8.3 | 142.5 |
| YOLOv9t-OBB [72] | 0.92244 | 0.85354 | 0.89125 | 0.94346 | 0.69362 | 2.0 M | 7.8 | 64.7 |
| YOLOv10n-OBB [73] | 0.9154 | 0.84577 | 0.88379 | 0.94374 | 0.70387 | 2.3 M | 6.8 | 122.4 |
| YOLOv11n-OBB | 0.91783 | 0.85298 | 0.88421 | 0.93830 | 0.67669 | 2.7 M | 6.6 | 114.8 |
| YOLOv12n-OBB [74] | 0.91185 | 0.85076 | 0.88024 | 0.93349 | 0.66740 | 2.6 M | 6.1 | 60.4 |
| YOLOv13n-OBB [18] | 0.91573 | 0.85303 | 0.88327 | 0.93792 | 0.66690 | 2.5 M | 6.4 | 40.5 |
| Ours (SAR-DRBNet) | 0.91953 | 0.87795 | 0.89826 | 0.94872 | 0.69768 | 7.0 M | 19.3 | 43.8 |
Table 10. Comparison with other advanced methods on the DSSDD dataset.
| Model | Precision ↑ | Recall ↑ | F1 ↑ | mAP50 ↑ | mAP50–95 ↑ | Params | GFLOPs | FPS |
|---|---|---|---|---|---|---|---|---|
| Rotated FCOS [65] | 0.9177 | 0.8710 | 0.8012 | 0.9177 | – | 32.7 M | 207.7 | 26.9 |
| R-FasterRCNN [66] | 0.91085 | 0.8510 | 0.8012 | 0.91085 | – | 36.4 M | 215.9 | 18.1 |
| ESarDet-OBB [67] | 0.94962 | 0.5338 | 0.68343 | 0.71286 | 0.58991 | 3.5 M | 9.7 | 70.0 |
| AC-YOLO-OBB [68] | 0.95428 | 0.95292 | 0.95360 | 0.98683 | 0.81065 | 1.8 M | 5.5 | 132.2 |
| DS-YOLO-OBB [69] | 0.9519 | 0.9640 | 0.95791 | 0.98614 | 0.80953 | 9.4 M | 25.6 | 161.2 |
| Hyper-YOLO-OBB [70] | 0.9646 | 0.9569 | 0.96073 | 0.98605 | 0.80755 | 2.8 M | 7.9 | 130.7 |
| YOLOv5n-OBB | 0.95714 | 0.95314 | 0.95514 | 0.98724 | 0.81165 | 2.6 M | 7.3 | 171.9 |
| YOLOv6-OBB [71] | 0.96658 | 0.96481 | 0.96569 | 0.98720 | 0.81368 | 4.3 M | 11.8 | 197.5 |
| YOLOv8n-OBB | 0.95465 | 0.96839 | 0.96147 | 0.98602 | 0.81533 | 3.1 M | 8.3 | 188.7 |
| YOLOv9t-OBB [72] | 0.95027 | 0.97542 | 0.96268 | 0.98633 | 0.80597 | 2.0 M | 7.8 | 81.2 |
| YOLOv10n-OBB [73] | 0.95583 | 0.96905 | 0.96239 | 0.98684 | 0.81099 | 2.3 M | 6.8 | 157.5 |
| YOLOv11n-OBB | 0.96152 | 0.96531 | 0.96341 | 0.98644 | 0.80976 | 2.7 M | 6.6 | 144.9 |
| YOLOv12n-OBB [74] | 0.96197 | 0.96839 | 0.96516 | 0.98632 | 0.81117 | 2.6 M | 6.1 | 86.3 |
| YOLOv13n-OBB [18] | 0.94675 | 0.95088 | 0.94878 | 0.98543 | 0.70081 | 2.5 M | 6.4 | 60.8 |
| Ours (SAR-DRBNet) | 0.96414 | 0.96783 | 0.96598 | 0.98725 | 0.81167 | 7.0 M | 19.3 | 57.7 |
Table 11. Results of model comparison experiments on the RSDD dataset.
| Model | Precision ↑ | Recall ↑ | F1 ↑ | mAP50 ↑ | mAP50–95 ↑ | Params | GFLOPs | FPS |
|---|---|---|---|---|---|---|---|---|
| Rotated FCOS [65] | 0.92300 | 0.89610 | 0.90970 | 0.95900 | – | 32.7 M | 207.7 | 27.2 |
| R-FasterRCNN [66] | 0.93440 | 0.85700 | 0.89410 | 0.92700 | – | 36.4 M | 215.9 | 17.4 |
| ESarDet-OBB [67] | 0.94696 | 0.81681 | 0.87708 | 0.88508 | 0.69041 | 3.5 M | 9.7 | 75.0 |
| AC-YOLO-OBB [68] | 0.93996 | 0.8966 | 0.91777 | 0.96573 | 0.74055 | 1.8 M | 5.5 | 133.2 |
| DS-YOLO-OBB [69] | 0.92813 | 0.91643 | 0.92224 | 0.96702 | 0.74531 | 9.4 M | 25.6 | 152.0 |
| Hyper-YOLO-OBB [70] | 0.93475 | 0.93107 | 0.93291 | 0.97244 | 0.74615 | 2.8 M | 7.9 | 134.7 |
| YOLOv5n-OBB | 0.94689 | 0.92635 | 0.93551 | 0.97156 | 0.74665 | 2.6 M | 7.3 | 174.0 |
| YOLOv6-OBB [71] | 0.92541 | 0.9239 | 0.92465 | 0.97158 | 0.74710 | 4.3 M | 11.8 | 202.6 |
| YOLOv8n-OBB | 0.93544 | 0.92493 | 0.93016 | 0.97199 | 0.73964 | 3.1 M | 8.3 | 187.3 |
| YOLOv9t-OBB [72] | 0.9388 | 0.92257 | 0.93061 | 0.97195 | 0.73469 | 2.0 M | 7.8 | 80.4 |
| YOLOv10n-OBB [73] | 0.94487 | 0.92351 | 0.93407 | 0.97213 | 0.74174 | 2.3 M | 6.8 | 155.6 |
| YOLOv11n-OBB | 0.94886 | 0.91785 | 0.93309 | 0.97140 | 0.74577 | 2.7 M | 6.6 | 140.1 |
| YOLOv12n-OBB [74] | 0.9392 | 0.92631 | 0.93271 | 0.97217 | 0.74198 | 2.6 M | 6.1 | 84.0 |
| YOLOv13n-OBB [18] | 0.94623 | 0.91076 | 0.92816 | 0.97100 | 0.73793 | 2.5 M | 6.4 | 61.3 |
| Ours (SAR-DRBNet) | 0.94985 | 0.92218 | 0.93581 | 0.97246 | 0.74612 | 7.0 M | 19.3 | 60.2 |

Share and Cite

MDPI and ACS Style

Lei, L.; Chang, S.; Sun, Z.; Zheng, X.; Liao, C.; Wei, W.; Ma, L.; Zhong, P. SAR-DRBNet: Adaptive Feature Weaving and Algebraically Equivalent Aggregation for High-Precision Rotated SAR Detection. Remote Sens. 2026, 18, 619. https://doi.org/10.3390/rs18040619
