Article

DWTF-DETR: A DETR-Based Model for Inshore Ship Detection in SAR Imagery via Dynamically Weighted Joint Time–Frequency Feature Fusion

1 State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
2 School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
3 National Key Laboratory of Science and Technology on Space-Born Intelligent Information Processing, Beijing Institute of Technology, Beijing 100081, China
4 Hubei Geospatial Information Technology Group Co., Ltd., Wuhan 430079, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(19), 3301; https://doi.org/10.3390/rs17193301
Submission received: 17 July 2025 / Revised: 17 September 2025 / Accepted: 19 September 2025 / Published: 25 September 2025

Highlights

What are the main findings?
  • A Dual-Domain Feature Fusion Module (DDFM) is proposed to jointly extract spatial and frequency-domain features, enhancing sensitivity to ship backscatter in cluttered inshore environments.
  • A Dual-Path Attention Fusion Module (DPAFM) combines shallow detail and deep semantic features via attention-based reweighting, improving robustness against blurred boundaries.
What is the implication of the main finding?
  • The dual-domain and dual-path fusion strategies validate the effectiveness of combining time–frequency information with attention-guided enhancement for SAR ship detection.
  • The findings provide insights for transformer-based detection models, with potential applications in real-time monitoring of harbors and nearshore maritime surveillance.

Abstract

Inshore ship detection in synthetic aperture radar (SAR) imagery poses significant challenges due to the high density and diversity of ships. In particular, low inter-object backscatter contrast and the blurred boundaries of docked ships often degrade the performance of traditional object detection methods, especially under complex backgrounds and low signal-to-noise ratio (SNR) conditions. To address these issues, this paper proposes a novel detection framework, the Dynamic Weighted Joint Time–Frequency Feature Fusion DEtection TRansformer (DETR) Model (DWTF-DETR), specifically designed for SAR-based ship detection in inshore areas. The proposed model integrates a Dual-Domain Feature Fusion Module (DDFM) to extract and fuse features from both SAR images and their frequency-domain representations, enhancing sensitivity to both high- and low-frequency target features. Subsequently, a Dual-Path Attention Fusion Module (DPAFM) is introduced to dynamically weight and fuse shallow detail features with deep semantic representations. By leveraging an attention mechanism, the module adaptively adjusts the importance of different feature paths, thereby enhancing the model’s ability to perceive targets with ambiguous structural characteristics. Experiments conducted on a self-constructed inshore SAR ship detection dataset and the public HRSID dataset demonstrate that DWTF-DETR achieves superior performance compared to the baseline RT-DETR. Specifically, the proposed method improves mAP@50 by 1.60% and 0.72%, and F1-score by 0.58% and 1.40%, respectively. Moreover, comparative experiments show that the proposed approach outperforms several state-of-the-art SAR ship detection methods. The results confirm that DWTF-DETR is capable of achieving accurate and robust detection in diverse and complex maritime environments.

1. Introduction

Ship detection plays a critical role in various maritime applications, including vessel traffic management, maritime safety, and marine environmental protection, and has received increasing attention from both academia and industry [1,2]. In oceanic regions, high humidity and frequent atmospheric disturbances—such as sea–land breezes and precipitation—greatly limit the effectiveness of optical and thermal infrared sensors due to cloud cover and rain interference [3]. In contrast, synthetic aperture radar (SAR) imagery, with its capability to penetrate clouds and operate under all-weather and day-and-night conditions [4], offers significant advantages for maritime target monitoring. Consequently, SAR-based ship detection has become a prominent research focus [5], and numerous studies have been conducted to address the specific challenges and application needs of ship detection in SAR imagery, particularly under complex background conditions and using deep learning-based methods [6].
In recent years, ship detection methods in SAR imagery have generally been categorized into traditional approaches and deep learning-based approaches. With the introduction of public SAR ship detection datasets such as SSDD [7] and HRSID [8], and the rapid development of convolutional neural networks (CNNs), deep learning-based methods have gradually become the mainstream. These methods do not rely on handcrafted modeling and are less susceptible to environmental interference in feature extraction, making them more robust and adaptive in complex maritime scenes [9].
According to the detection paradigm, deep learning-based methods can be further divided into single-stage detectors (e.g., SSD [10], YOLO [11]) and two-stage detectors (e.g., R-CNN [12], Faster R-CNN [13]). In 2020, Facebook AI proposed the DEtection TRansformer (DETR) [14], which captures global contextual information and long-range dependencies via cross-attention mechanisms. By eliminating the reliance on non-maximum suppression (NMS), DETR addresses common issues of incorrect bounding box retention and deletion in YOLO and CNN-based frameworks [15], thereby improving the accuracy of small object detection. As a result, researchers have started exploring its potential in SAR image target detection, and transformer-based architectures have begun to attract increasing attention in this domain [16].
An analysis of the scene composition in existing datasets reveals a growing proportion of inshore and docked ship samples in SAR ship detection benchmarks. For instance, in the SSDD dataset proposed in 2017 [7], inshore scenes accounted for 19.80%; and in the SRSDD-v1.0 dataset introduced in 2021 [17], the proportion of inshore scenes increased significantly to 63.10%. This trend reflects the increasing attention paid by researchers to ship detection in port and coastal environments. From an application perspective, ships in inshore and port regions are generally more numerous and densely distributed. Their activities have a substantial impact on inland economic development, making ship detection in these areas of greater practical significance in real-world maritime applications.
Unlike ship detection in open-sea scenarios, inshore and dockside environments present unique challenges due to the presence of man-made structures such as buildings, docks, and port facilities [18,19]. In these scenes, ships often exhibit backscattering characteristics similar to surrounding coastal infrastructure, which introduces significant interference from visually and radiometrically similar objects. Such interference can mislead the network during feature learning, making it difficult to accurately distinguish ship contours and structural details. As a result, feature extraction networks may fail to generate reliable ship representations, leading to a higher rate of false alarms and missed detections in port scenes [20].
Considering the specific characteristics of ships within port environments, the major challenges of SAR-based ship detection in inshore scenarios (as shown in Figure 1) can be summarized in the following two points:
Challenge 1. Low backscatter contrast between ship targets and surrounding objects in port areas: Facilities and structures commonly found in ports, such as containers, gantry cranes, and warehouses, often share similar metallic materials and structural forms with ships [21]. Due to the imaging principles of SAR, which is highly sensitive to the reflective characteristics of objects, these man-made structures tend to produce strong backscatter signals, appearing as bright spots in the SAR imagery. Traditional methods based on thresholding or template matching are prone to falsely classify these facilities as ship targets, thereby reducing the overall detection precision and recall. See Figure 1a for an example.
To address the aforementioned challenges, existing studies have primarily explored three directions: land interference removal methods, context-aware enhancement methods, and attention mechanism-based methods.
Land interference removal methods typically perform sea–land segmentation as a preprocessing step to eliminate land regions from port SAR images before applying object detection. A representative method combines threshold-based sea–land segmentation with a CNN-based detection framework. However, such methods often incur high computational cost and are sensitive to segmentation errors in complex terrains, which may lead to the misclassification of land areas as potential ship targets, ultimately reducing detection accuracy [22]. Context-aware enhancement methods aim to improve target feature representation by modeling spatial relationships between ships and their surrounding environment, thereby suppressing background interference. Typical architectures such as GCAM [23] and GC-FBP [24] employ context modules to capture long-range dependencies between ships and background objects, achieving competitive performance on public datasets. However, these methods heavily rely on diverse and well-annotated training data. In complex port scenes, the integration of noisy contextual information may increase semantic ambiguity and degrade model performance. Attention mechanism-based methods are designed to guide the network’s focus toward regions relevant to ships [25]. Notable examples include pyramid pooling attention [26] and anchor-free attention mechanisms [27,28]. These approaches enhance salient features and suppress interference from coastal structures through adaptive weighting, thereby improving detection accuracy to a certain extent. Nevertheless, most of these methods focus primarily on time-domain features and have yet to fully exploit the frequency-domain characteristics that are unique to SAR imagery.
Challenge 2. Difficulty in extracting structural features of docked ships: In port environments, docked ships are often tightly adjacent to quay-side facilities. The radar backscatter characteristics of these facilities are highly similar to those of the ships, resulting in the visual merging of ship contours with surrounding port structures in SAR imagery. Consequently, the connected regions between ships and dock structures cannot be clearly separated, which reduces the distinctiveness of structural features [29]. This often leads to detection boxes that are excessively large—partially enclosing port infrastructure—or to missed detections due to the incomplete recognition of ship targets. See Figure 1b for an illustration.
At present, relatively few approaches have been proposed to address the above-mentioned issues, and most rely on edge feature enhancement methods [30]. Yan Feng et al. [31] proposed an edge self-attention mechanism that incorporates edge gradient matrices—extracted via edge detection algorithms—into the feature pyramid structure to effectively enhance ship boundary representation. Guoqing Wu et al. [32] employed classical edge detection operators combined with adaptive edge extraction algorithms to delineate ship contours from optimized SAR images. Canbin Hu et al. [33] designed a multi-scattering intensity difference (MSID) edge component that highlights ship boundaries by leveraging the differences in scattering intensity between ships and the surrounding sea.
However, these approaches primarily target scenes in which ship contours are clearly visible. In cases where docked ships have no obvious boundary with adjacent port infrastructure—i.e., when edge features are missing or incomplete—the effectiveness of these methods is significantly limited.
Motivated by the two key challenges identified above and inspired by existing research on time-domain feature extraction and frequency-domain image denoising [34], this paper proposes a novel SAR-based ship detection method for inshore scenes—the Dynamic Weighted Joint Time–Frequency Feature Fusion DETR Model (DWTF-DETR). The model builds upon the latest transformer-based detection architecture, RT-DETR, and incorporates both spatial–frequency fusion and edge feature enhancement to address the limitations of conventional detection approaches in complex inshore environments.
The proposed DWTF-DETR framework consists of two synergistic feature enhancement modules designed to improve SAR ship detection performance under challenging conditions. These modules are described as follows:
Module 1: Dual-Domain Feature Fusion Module (DDFM): This module performs synchronous feature extraction in both the spatial domain (original SAR image) and the frequency domain (obtained via Fourier transform of the SAR image). The features from the two branches are subsequently combined through a dynamic weighting mechanism to produce a joint representation that captures both spatial and spectral characteristics. This enhances the model’s sensitivity to ship backscatter features, especially in cluttered inshore environments.
Module 2: Dual-Path Attention Fusion Module (DPAFM): This module integrates high-resolution detail features from shallow layers with low-resolution semantic features from deeper layers. Through attention-based channel reweighting and residual connections, the module enhances discriminative feature learning while preserving important spatial and semantic information. As a result, it improves the network’s ability to extract structural characteristics of ships, particularly in cases with blurred or ambiguous boundaries.
The main contributions of this paper are summarized as follows:
  • Building upon the DETR framework, the proposed DWTF-DETR enhances the backscatter feature representation of ship targets by incorporating a Dual-Domain Feature Fusion Module (DDFM), which implements a joint time–frequency domain feature extraction scheme. Unlike the baseline RT-DETR, which primarily relies on spatial-domain features, DDFM explicitly integrates both spatial and spectral information, thereby improving the model’s ability to capture and utilize high- and low-frequency characteristics specific to SAR imagery.
  • To address the challenge of blurred or incomplete ship boundaries in nearshore scenes, DWTF-DETR introduces a Dual-Path Attention Fusion Module (DPAFM). This attention-guided feature reorganization strategy goes beyond the standard feature aggregation in existing DETR-based models by dynamically weighting and combining shallow detail features with deep semantic representations, thus enhancing sensitivity to structural characteristics of ships under complex backgrounds.
  • Extensive experiments conducted on both a self-constructed inshore SAR ship detection dataset and the public HRSID dataset demonstrate that the proposed method achieves more accurate ship detection in port scenarios compared to the baseline RT-DETR and other mainstream deep learning-based approaches.
The remainder of this paper is organized as follows: Section 2 describes related methods. Section 3 presents the proposed approach and its innovations. Section 4 introduces the experimental design, datasets, environment, and evaluation metrics. Section 5 analyzes the experimental results. Section 6 concludes the paper.

2. Related Methods

2.1. Origin and Principles of RT-DETR

With the continuous advancement of deep learning, single-stage and two-stage object detection frameworks such as R-CNN and YOLO have become mainstream approaches in the field. However, these methods typically rely on a one-to-many label assignment strategy during training, which necessitates a non-maximum suppression (NMS) step during inference [35]. NMS generally filters overlapping bounding boxes based on an IoU threshold, which can result in the erroneous suppression of true positives, especially in densely populated object scenes, thereby affecting detection accuracy.
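To make this failure mode concrete, the greedy IoU-threshold filtering that NMS performs can be sketched in a few lines. The snippet below is a generic, illustrative implementation (not tied to any particular detector); when two docked ships overlap heavily, the lower-scoring true positive is discarded as soon as its IoU with the kept box exceeds the threshold.

```python
import torch

def iou_matrix(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Pairwise IoU between two box sets in (x1, y1, x2, y2) format."""
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    lt = torch.max(a[:, None, :2], b[None, :, :2])  # intersection top-left
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])  # intersection bottom-right
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def greedy_nms(boxes: torch.Tensor, scores: torch.Tensor, iou_thr: float = 0.5) -> list:
    """Keep the highest-scoring box, suppress neighbours above the IoU threshold."""
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        best = order[0]
        keep.append(int(best))
        if order.numel() == 1:
            break
        ious = iou_matrix(boxes[best].unsqueeze(0), boxes[order[1:]])[0]
        # In dense inshore scenes this step can delete overlapping true positives.
        order = order[1:][ious <= iou_thr]
    return keep
```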
The DEtection TRansformer (DETR) reformulates object detection as a set prediction problem [14]. By leveraging the self-attention mechanism of the Transformer, DETR directly predicts object instances in an image, eliminating the need to generate numerous candidate boxes and apply NMS as in R-CNN and YOLO. Instead, DETR uses the Hungarian matching algorithm to align predictions with ground truth annotations, fundamentally avoiding errors introduced by NMS. This approach significantly improves recall and precision, particularly in complex scenes involving closely packed targets.
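As a sketch of this set-matching step, the one-to-one assignment DETR solves can be reproduced with SciPy’s linear_sum_assignment. The cost below is deliberately simplified (L1 box distance minus confidence); DETR’s actual matching cost combines classification probability, L1 distance, and a generalized-IoU term.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_boxes: np.ndarray, pred_scores: np.ndarray,
                    gt_boxes: np.ndarray):
    """One-to-one prediction/ground-truth assignment, DETR-style.

    pred_boxes: (P, 4), pred_scores: (P,), gt_boxes: (G, 4).
    Returns index arrays of matched predictions and ground truths.
    """
    l1 = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(axis=-1)
    cost = l1 - pred_scores[:, None]  # cheaper to assign confident, close boxes
    pred_idx, gt_idx = linear_sum_assignment(cost)  # globally optimal, no NMS needed
    return pred_idx, gt_idx
```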
In 2023, Baidu introduced RT-DETR [36], a real-time variant of DETR, which addressed the latency issues of traditional Transformer-based detectors. By integrating a hybrid encoder and an IoU-aware query selection mechanism, RT-DETR achieved a substantial improvement in inference speed without compromising accuracy. It is considered a major breakthrough in real-time object detection and has accelerated the practical deployment of Transformer-based detectors in time-sensitive applications.

2.1.1. Origin of RT-DETR

Building upon the original DETR framework, RT-DETR introduces a Hybrid Encoder to enhance the model’s capability in detecting objects at various scales. The hybrid encoder adopts a multi-scale feature fusion strategy that combines low-level detailed features with high-level semantic features. This enables the model to better capture fine-grained information while maintaining semantic consistency, thereby improving overall detection performance.
The main architecture of RT-DETR is illustrated in Figure 2.

2.1.2. Application of RT-DETR in SAR Ship Detection

Given the efficiency and accuracy of the RT-DETR model in processing visual targets, several researchers have recently begun exploring its application in SAR-based ship detection tasks [37,38]. This direction aims to address the inherent challenges in SAR imagery, such as complex backgrounds and highly variable object scales. However, it is important to note that the direct application of RT-DETR to SAR scenarios still encounters several limitations, including insufficient feature representation, limited capacity for modeling long-range dependencies, and inadequate multi-scale feature extraction.
To overcome these challenges, recent studies have primarily focused on three directions to improve the RT-DETR framework for SAR ship detection:
  • Introducing feature enhancement modules to improve feature expressiveness;
  • Optimizing feature interaction mechanisms to strengthen long-range information propagation;
  • Designing multi-scale modeling strategies to better adapt to variations in target sizes.
For example, Chushi Yu et al. [39] incorporated a DualConv module into the RT-DETR framework, leveraging shared information across convolution layers to reduce computational cost while enhancing feature representation. Their model demonstrated effective performance on the HRSID dataset. Yue Guo et al. [40] proposed a lightweight SAR ship detection algorithm by replacing the self-attention encoder in RT-DETR with a lightweight CNN backbone. This modification reduced model parameters and computation by approximately 30% without significant loss in detection accuracy, achieving competitive results across multiple public SAR ship detection benchmarks. Jie Cao et al. [41] introduced a Feature Selection Module combined with Multi-scale Feature Focusing (MFF), and validated the improved performance of their method on the HRSID and SSDD datasets, showing a clear advantage over the original RT-DETR baseline.
Although the aforementioned studies demonstrate that the RT-DETR model exhibits good adaptability and scalability for SAR ship detection tasks, most existing improvements remain focused on architectural modifications—such as attention mechanism replacements or the integration of feature aggregation modules. However, there is a lack of systematic modeling and mechanism-level interpretation tailored to the unique imaging characteristics of SAR data. In particular, the potential of Transformer-based frameworks in capturing spectral representations and structural awareness has not been fully explored, especially in addressing challenges related to multi-scale ship detection and complex background interference.
Therefore, it is necessary to design an improved framework that integrates both physical properties of SAR imaging and the spatial distribution characteristics of ship targets. Such a framework should offer interpretable mechanisms and dual-domain perception capabilities to further enhance detection performance in inshore scenes.

2.2. Spatial-Domain and Frequency-Domain Feature Extraction

Current mainstream approaches in SAR ship detection are predominantly based on spatial-domain analysis. Researchers have widely adopted deep convolutional neural networks (CNNs) and Transformer-based architectures to model the spatial structural features of ship targets [42]. These methods exploit the local spatial correlation of image pixels and utilize multi-layer convolutional networks to extract edge, texture, and contour features of targets, thereby achieving accurate ship detection.
However, spatial-domain-based detection methods are often sensitive to noise and background interference in SAR imagery. In scenarios with low signal-to-noise ratio (SNR) or low contrast between ships and background, the probability of false negatives and false positives tends to increase significantly. Due to the inherent characteristics of SAR imaging—such as severe speckle noise, low contrast, and complex background—methods that rely solely on spatial-domain features in SAR images often exhibit limited robustness in real-world applications.
The frequency domain can reveal underlying periodic structures, edge patterns, and directional information in SAR imagery, and it exhibits strong resistance to noise and interference. As a result, some researchers have begun to explore feature extraction in the frequency domain for ship targets, aiming to enhance global image representation and suppress local noise [43,44]. This is typically achieved by designing frequency-domain attention mechanisms and frequency response enhancement modules to highlight the spectral characteristics of ships.
However, the modeling approaches based on frequency-domain features are still in their early stages. Existing methods lack a unified framework and systematic design, and their integration with the physical imaging principles of SAR remains relatively superficial.
Building upon the above analyses of spatial and frequency domains, some researchers have recently begun to explore joint modeling approaches that leverage the complementary characteristics of both domains to address target detection challenges under low signal-to-noise ratios and complex background conditions.
Several researchers have recently attempted to exploit cross-modal feature fusion of optical and SAR images to strengthen target feature representation. For example, Zheng Zhou et al. [45] proposed a domain-adaptive few-shot detection framework based on SSD, which transfers knowledge from optical to SAR images through a distance metric, a lossy branching mechanism, and a dual-stream alignment network, achieving strong performance under limited SAR samples. Chenfang Liu et al. [46] proposed OSHFNet, which adopts a heterogeneous dual-branch design that combines CNN and VMamba, a more lightweight alternative to Transformers, to extract complementary spatial-domain features from optical and SAR modalities. Although designed for cross-modal classification, this strategy illustrates how lightweight architectures can effectively capture modality-specific advantages and provide a foundation for feature fusion.
However, in practical detection tasks, it is often difficult to obtain optical and SAR images of the same scene at the same time. Therefore, some researchers rely solely on synthetic aperture radar images to perform joint spatial–frequency modeling. Yong Wang et al. [47] introduced a Spatial–Frequency Interaction (SFI) module that exploits the complementarity between low-frequency and high-frequency components. By decomposing input images into frequency components and extracting features critical for classification and precise localization, their method significantly improves small-sample target detection performance in remote sensing imagery. Ning Gao et al. [34] proposed the Fused Fourier Convolution Mixer (FFCM) architecture, which fuses frequency-domain features with multi-scale spatial features to enable efficient global modeling. In addition, a residual channel prior is introduced to enhance the restoration of local details. During training, a contrastive space is constructed using the Discrete Fourier Transform (DFT) to fully exploit frequency-domain information from negative samples. Xingyu Jiang et al. [48] integrated the Fast Fourier Transform (FFT) mechanism into a Transformer architecture by designing the LGPM module—a hybrid structure combining spatial and frequency domains—which performs local and global modeling in both domains. Additionally, an MCFN module is employed to aggregate multi-scale features, effectively recovering fine-grained image representations.
It is worth noting that although current spatial–frequency joint modeling approaches have shown promising results in feature enhancement and multi-scale perception, most existing efforts remain focused on local module-level improvements. There is still a lack of a unified modeling framework and mechanism-level interpretability specifically tailored to SAR ship detection tasks. In scenarios characterized by complex background interference, low signal-to-noise ratios, and blurred features of docked ships, relying solely on spatial-domain or frequency-domain modeling is often insufficient to achieve robust detection performance. Therefore, integrating both spatial and frequency domain information to enable collaborative feature enhancement has emerged as a critical direction for advancing SAR ship detection.

2.3. Attention-Guided Feature Reorganization

Due to the unique imaging mechanism of SAR, the resulting imagery is highly susceptible to speckle noise and interference from cluttered ocean backgrounds. Moreover, the extreme variability in ship aspect ratios, shapes, and orientations further undermines the consistency of feature representations. Attention-guided feature fusion modules can adaptively emphasize feature channels and spatial regions relevant to ships, allocate attention resources across different regions, and suppress irrelevant noise, thereby improving detection accuracy. Currently, mainstream attention-guided feature reorganization modules can be divided into those guided by channel attention mechanisms, those guided by spatial attention mechanisms, and those guided by self-attention mechanisms [49], each of which contributes to more effective and adaptive feature learning for ship detection in complex SAR scenarios.
For example, Zhiyuan Zhao et al. [23] developed a feature enhancement module based on depthwise separable convolution (DSC) and the Convolutional Block Attention Module (CBAM), which effectively improves target recognition performance in SAR images by enhancing the feature representation of both target and shadow regions. Jianming Lv et al. [23] incorporated the BiFPN (Bidirectional Feature Pyramid Network) structure into the CenterNet2 framework to enable efficient multi-scale feature fusion. By introducing learnable weights, their method adaptively balances the importance of features at different scales, thereby improving overall detection accuracy. Liang Chen et al. [50] pioneered the introduction of an image–text multimodal fusion paradigm for SAR ship recognition, marking a significant breakthrough in remote sensing target interpretation. Their proposed PGMNet fully leverages the unique SAR imaging mechanism to generate high-reliability textual descriptions of targets and implements an efficient image–text fusion module that imposes deep semantic constraints on visual features.
Although existing attention mechanisms have demonstrated some effectiveness in enhancing feature representation for SAR image target detection, most approaches remain focused on channel- or scale-level weight allocation. They often lack sufficient modeling of ship targets’ structural information, morphological characteristics, and spatial composition relationships. Moreover, these methods are predominantly built upon CNN-based architectures, and there has been limited exploration of their integration with Transformer-based frameworks. As a result, they fail to fully leverage the advantages of modeling long-range dependencies inherent in Transformer architectures. Therefore, it is necessary to design a fusion module that incorporates structural awareness of targets and attention mechanisms, while maintaining strong compatibility with Transformer networks. Such a module would enhance the model’s ability to perceive structurally ambiguous targets in complex SAR scenes.

3. Proposed Method

Based on the analysis presented in the Introduction and Related Work sections, although RT-DETR has achieved remarkable progress in generic object detection tasks, its application in SAR-based ship detection within port scenarios still faces several challenges, including limited structural adaptability and insufficient feature representation capabilities. Meanwhile, current SAR ship detection methods exhibit inadequate utilization of information in both spatial and frequency domains. This limitation becomes particularly evident in complex environments characterized by low signal-to-noise ratios and overlapping dockside targets, where detection performance significantly degrades.
Furthermore, while mainstream attention mechanisms have improved the representation of channel and scale features, they generally lack the ability to model target structural awareness and support effective multi-scale fusion. In addition, most are not well-integrated with Transformer-based architectures, thereby limiting their full potential in long-range dependency modeling.
To address these challenges, this paper proposes DWTF-DETR, an SAR ship detection model for inshore scenes that incorporates content-guided dual-domain feature fusion and structural enhancement mechanisms. The model is specifically designed to tackle two key issues in inshore-based SAR ship detection: inconspicuous target features and difficult structural extraction. The remainder of this section presents a detailed discussion of the two proposed modules, including their design principles and technical implementations.

3.1. Overview of the Network Architecture

The proposed DWTF-DETR model is built upon the real-time, end-to-end RT-DETR detection framework as the base architecture. It introduces a Dual-Domain Feature Fusion Module (DDFM), which performs synchronous feature extraction from both the spatial domain (original SAR image) and the frequency domain (Fourier spectrum). By adaptively fusing spatial and frequency features, the model generates dual-domain representations that enhance sensitivity to variations in ship backscatter characteristics while suppressing noise and background interference.
In addition, a Dual-Path Attention Fusion Module (DPAFM) is designed to integrate fine-grained details from shallow layers with high-level semantic features from deeper layers. Through attention-based reweighting, this module strengthens the structural feature representation of ship targets, improving the model’s robustness to boundary ambiguity and indistinct contours—particularly for docked vessels in complex port scenes.
As illustrated in Figure 3, by combining dual-domain fusion, structure-aware enhancement, and lightweight Transformer adaptation, DWTF-DETR improves detection accuracy in complex SAR scenarios while maintaining high efficiency and strong generalization capability.

3.2. Dual-Domain Feature Fusion Module (DDFM)

In inshore areas, densely distributed metal-structured facilities and buildings often generate strong repeated backscatter in SAR imagery, producing periodic textures that can obscure ship targets. This results in blurred edges and indistinct features, making ships difficult to detect and distinguish from the background.
To address this issue, inspired by SFHformer [34], we design a Dual-Domain Feature Fusion Module (DDFM) that integrates spatial- and frequency-domain information to enhance ship feature representation. The module consists of three components:
Spatial-domain branch: The Scharr convolution operator (ScharrConv) is applied to emphasize ship boundary gradients, improving target–background separation.
Frequency-domain branch: A real-valued fast Fourier transform (rFFT2) and its inverse (irFFT2) are used for spectral transformation and reconstruction. Convolutions on the real and imaginary parts capture global periodic textures while suppressing repeated backscatter from metallic port structures.
Dual-domain fusion: The extracted spatial and frequency features are fused through a lightweight CSP (C2f) backbone, which balances local detail preservation with global context modeling.
This structured design improves sensitivity to ship-specific features, reduces false alarms and missed detections, and maintains high inference efficiency.
The network structure of DDFM, as shown in Figure 4, comprises the spatial-domain branch, frequency-domain branch, and the dual-domain fusion component.

3.2.1. Spatial-Domain Branch

The primary goal of the spatial-domain branch is to achieve edge awareness and enhance the local features of ship targets. This branch leverages Scharr convolution (ScharrConv) by applying deep convolutions with Scharr operators in two orthogonal directions (X and Y) to directly extract image gradients. This highlights intensity transitions along ship hull contours and berth structures.
The Scharr operator, proposed by H. Scharr [51] as an improvement over the Sobel operator, is an image gradient operator designed to enhance the rotational symmetry of gradient estimation, ensuring more consistent responses regardless of edge orientation. Due to its high sensitivity to edge features, the Scharr operator has become increasingly prominent in modern edge detection algorithms [52].
The computation of the Scharr operator is similar to that of the Sobel operator and includes convolution kernels in both the X and Y directions. Compared to traditional operators such as Sobel, Prewitt, and Roberts, the Scharr operator places greater emphasis on central weights, which enhances its ability to suppress noise while preserving edge details [53]. This makes it particularly well-suited for applications in SAR ship detection, where edge definition is often blurred by noise. The specific formulation is given in Equations (1) and (2) [54].
$$G_x = \begin{bmatrix} -3 & 0 & +3 \\ -10 & 0 & +10 \\ -3 & 0 & +3 \end{bmatrix} \qquad (1)$$

$$G_y = \begin{bmatrix} -3 & -10 & -3 \\ 0 & 0 & 0 \\ +3 & +10 & +3 \end{bmatrix} \qquad (2)$$
In this work, we directly extract gradient edges using Scharr operators in both directions to enhance the edge features of ship targets in the spatial domain. This approach emphasizes the intensity transitions between ship hull contours and berth structures, thereby achieving more complete preservation of docked ship features.
After performing feature convolution in both directions, the resulting gradient magnitudes are subjected to a weighted summation. The corresponding formulation is provided in Equation (3).
$$E(x, y) = \alpha \,\bigl(I_{\mathrm{origin}} * G_x\bigr) + \beta \,\bigl(I_{\mathrm{origin}} * G_y\bigr), \qquad \alpha = \beta = 0.5 \qquad (3)$$
After extracting bidirectional gradient features, the proposed method incorporates a residual connection with the original feature map. The combined result—obtained by adding the gradient map to the original features—is further refined through two successive processing steps to focus on the local geometric structure of the ship body. This design also helps alleviate the issues of vanishing and exploding gradients commonly encountered in deep networks.
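A minimal PyTorch sketch of this branch is given below, assuming fixed (non-learnable) depthwise Scharr kernels, the weighted fusion of Equation (3) with α = β = 0.5, and a two-step convolutional refinement after the residual addition. Module and layer names are illustrative, not the authors’ implementation.

```python
import torch
import torch.nn as nn

class ScharrConv(nn.Module):
    """Spatial-domain branch: depthwise Scharr gradients in X and Y, fused per Eq. (3)."""
    def __init__(self, channels: int, alpha: float = 0.5, beta: float = 0.5):
        super().__init__()
        gx = torch.tensor([[-3., 0., 3.], [-10., 0., 10.], [-3., 0., 3.]])
        gy = gx.t()  # the Scharr Y kernel is the transpose of the X kernel
        self.alpha, self.beta = alpha, beta
        # Fixed depthwise convolutions: one Scharr kernel applied per channel.
        self.conv_x = nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False)
        self.conv_y = nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False)
        self.conv_x.weight.data.copy_(gx.view(1, 1, 3, 3).repeat(channels, 1, 1, 1))
        self.conv_y.weight.data.copy_(gy.view(1, 1, 3, 3).repeat(channels, 1, 1, 1))
        self.conv_x.weight.requires_grad_(False)
        self.conv_y.weight.requires_grad_(False)
        # Two successive refinement steps applied after the residual addition.
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        edges = self.alpha * self.conv_x(x) + self.beta * self.conv_y(x)  # Eq. (3)
        return self.refine(x + edges)  # residual connection with the original features
```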

3.2.2. Frequency-Domain Branch

The primary objective of the frequency-domain branch is to capture global periodic textures and suppress repetitive backscatter caused by metal facilities in port areas. The input to this branch is the frequency-domain representation of the original image, obtained via Fast Fourier Transform (FFT). FFT is an efficient algorithm for computing the Discrete Fourier Transform (DFT), and its core formulation is presented in Equation (4).
$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-i \frac{2\pi k}{N} n}, \qquad k = 0, 1, \ldots, N-1 \qquad (4)$$
After applying the FFT, the frequency-domain features are separated into real and imaginary components. Two-dimensional convolutional kernels are then applied independently to the real and imaginary parts. In the frequency domain, the real component is typically associated with the symmetric characteristics of the signal, while the imaginary component relates to its antisymmetric properties. By convolving them separately, the network can more precisely extract these distinct attributes, thereby enhancing its sensitivity to fine-grained image details.
Once the real and imaginary components have been processed through their respective convolutions, the features are fused using an addition operation, which effectively captures and leverages the phase-related information of SAR ship targets.
The fused frequency-domain features are then transformed back into the spatial domain using the Inverse Fast Fourier Transform (IFFT) to prepare for subsequent dual-domain fusion. IFFT is an efficient algorithm for computing the Inverse Discrete Fourier Transform (IDFT), and its formulation is given in Equation (5).
$$x_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k \, e^{\,i \frac{2\pi k}{N} n}, \qquad n = 0, 1, \ldots, N-1 \qquad (5)$$
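The branch can be sketched in PyTorch as below, using torch.fft.rfft2/irfft2 for the spectral transform and 1 × 1 convolutions on the real and imaginary parts. Recombining the two processed parts into a complex spectrum is one reading of the “addition” fusion described above; the design is an assumption rather than the authors’ exact code.

```python
import torch
import torch.nn as nn

class FrequencyBranch(nn.Module):
    """Frequency-domain branch: rFFT2 -> convs on real/imag parts -> irFFT2."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv_real = nn.Conv2d(channels, channels, 1)  # symmetric component
        self.conv_imag = nn.Conv2d(channels, channels, 1)  # antisymmetric component

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        spec = torch.fft.rfft2(x, norm="ortho")  # Eq. (4), real-input FFT
        real = self.conv_real(spec.real)
        imag = self.conv_imag(spec.imag)
        fused = torch.complex(real, imag)  # recombine; preserves phase information
        return torch.fft.irfft2(fused, s=(h, w), norm="ortho")  # Eq. (5)
```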

3.2.3. Dual-Domain Fusion Component

After processing through the spatial-domain and frequency-domain branches, the resulting feature maps are jointly fed into the dual-domain fusion module. This module employs an addition operation to merge the features from both branches. By combining information from different feature levels, this fusion enables effective interaction and complementarity between multi-level features, facilitating joint modeling of edge, texture, and global structural information. As a result, it significantly improves the model’s ability to distinguish between ships and port facilities that exhibit similar backscatter characteristics.
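Putting the pieces together, the fusion component reduces to an element-wise addition of the two branch outputs followed by the C2f stage. The sketch below reuses the ScharrConv and FrequencyBranch classes defined above and substitutes a single convolution block for the C2f backbone, so it should be read as a structural outline only.

```python
import torch.nn as nn

class DDFM(nn.Module):
    """Dual-Domain Feature Fusion: spatial + frequency branches, additive merge."""
    def __init__(self, channels: int):
        super().__init__()
        self.spatial = ScharrConv(channels)    # edge-aware branch (Section 3.2.1)
        self.freq = FrequencyBranch(channels)  # spectral branch (Section 3.2.2)
        # Stand-in for the lightweight CSP (C2f) fusion backbone.
        self.fuse = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.SiLU())

    def forward(self, x):
        return self.fuse(self.spatial(x) + self.freq(x))  # element-wise addition
```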

3.3. Dual-Path Attention Fusion Module (DPAFM)

Due to their limited scale, inshore ship targets are highly susceptible to being overwhelmed by background clutter in noisy environments. Deep features alone often fail to retain precise localization information, while shallow features struggle to effectively filter background interference. Inspired by the Bidirectional Feature Pyramid Network (BiFPN) structure used in the EfficientDet architecture [55], we propose a novel module that combines context–detail mutual guidance with channel-wise adaptive weight allocation, termed the DPAFM. The detailed network architecture of the DPAFM is illustrated in Figure 5.
This module first employs an adjust_conv operation to align the channel dimensions of the two input features, x0 (shallow, high-resolution detail features) and x1 (deep, low-resolution semantic features). This step ensures channel consistency and enables the fusion of multi-scale information into a unified feature map, providing a solid foundation for subsequent attention operations.
The fused feature map is then processed by a Squeeze-and-Excitation (SE) attention mechanism [56], which adaptively assigns weights to each channel. This mechanism emphasizes features relevant to ship targets while suppressing background noise and irrelevant information, thereby improving the discriminative power of the feature representation.
The attention-enhanced feature map is subsequently split into two branches, each of which undergoes element-wise multiplication with the original inputs (x0 and x1, respectively), enabling feature reweighting. Next, the reweighted features are added to the original features via residual connections, further strengthening feature representation while preserving original information. Finally, the two enhanced feature maps are concatenated along the channel dimension to form the output of the module. This design not only retains the original feature information but also enhances its discriminability through the combined effect of attention mechanisms and residual connections.
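The data flow just described translates into the following sketch: channel alignment via 1 × 1 “adjust” convolutions, SE-based channel reweighting of the stacked features, a split back into two paths, element-wise reweighting with residual addition, and a final concatenation. Channel sizes, the reduction ratio, and the assumption that x1 has been upsampled to x0’s resolution beforehand are all illustrative.

```python
import torch
import torch.nn as nn

class SEAttention(nn.Module):
    """Squeeze-and-Excitation channel reweighting [56]."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 4)
        self.fc = nn.Sequential(
            nn.Linear(channels, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, channels), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)  # squeeze -> excite
        return x * w

class DPAFM(nn.Module):
    """Dual-Path Attention Fusion: align, reweight, split, enhance, concatenate."""
    def __init__(self, c0: int, c1: int, c_mid: int):
        super().__init__()
        self.adjust0 = nn.Conv2d(c0, c_mid, 1)  # adjust_conv, shallow detail path
        self.adjust1 = nn.Conv2d(c1, c_mid, 1)  # adjust_conv, deep semantic path
        self.se = SEAttention(2 * c_mid)

    def forward(self, x0: torch.Tensor, x1: torch.Tensor) -> torch.Tensor:
        # Assumes x1 has already been upsampled to x0's spatial resolution.
        f0, f1 = self.adjust0(x0), self.adjust1(x1)
        attn = self.se(torch.cat([f0, f1], dim=1))  # SE over the stacked paths
        a0, a1 = attn.chunk(2, dim=1)               # split back into two branches
        out0 = f0 + f0 * a0                         # reweight + residual (detail)
        out1 = f1 + f1 * a1                         # reweight + residual (semantic)
        return torch.cat([out0, out1], dim=1)       # channel-wise concatenation
```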

4. Materials

4.1. Test Datasets

The test dataset used in this study is adapted from the SARDet-100K dataset, which includes data collected from sensors such as Sentinel-1B, TerraSAR-X, and GF-3. The dataset consists of 12,646 image patches with a resolution of 256 × 256 pixels, among which 3046 patches correspond to inshore scenes, accounting for 24.09% of the total. A total of 15,746 ships are annotated in the dataset, with an average target size of 804.12 pixels.
In addition, to evaluate the generalizability of the proposed method, ablation experiments are also conducted on the HRSID dataset. This dataset contains imagery from sensors including TerraSAR-X and TanDEM-X, with a total of 5604 image patches, each of size 800 × 800 pixels. It includes 16,951 ship targets, with spatial resolution ranging from 0.5 to 3 m.
Both datasets are split into training, validation, and test sets in a ratio of 8:1:1, and multiple rounds of experiments are performed for evaluation. Examples of image patches from the two datasets are shown in Figure 6.
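An 8:1:1 random split can be reproduced with a few lines; the fixed seed and helper name below are illustrative, since the paper does not specify the splitting procedure.

```python
import random

def split_dataset(items: list, seed: int = 0, ratios=(0.8, 0.1, 0.1)):
    """Shuffle and split a list of samples into train/val/test at 8:1:1."""
    items = items[:]  # avoid mutating the caller's list
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train, n_val = int(ratios[0] * n), int(ratios[1] * n)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```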

4.2. Experimental Details

4.2.1. Experimental Environment

All experiments were conducted on a workstation equipped with an Nvidia (Santa Clara, CA, USA) GeForce RTX 3090 (24 GB) GPU and an Intel (Santa Clara, CA, USA) Xeon Gold 6134 CPU. The initial learning rate (lr0) was set to 0.0001. All SAR images were resampled into patches of 256 × 256 pixels, and the number of worker threads for data loading was set to 4, with a batch size of 32. Given that the data volume ratio between the self-constructed dataset and the HRSID dataset is approximately 1:2.25, the number of training epochs was adjusted accordingly to ensure training balance: 200 epochs were used for the self-constructed dataset, and 500 epochs for the HRSID dataset.
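For orientation, the reported hyper-parameters can be collected as below. The argument names follow common Ultralytics-style trainer conventions and are an assumption, not the authors’ actual training script.

```python
# Hyper-parameters from Section 4.2.1, gathered into one illustrative config.
train_cfg = dict(
    imgsz=256,   # SAR patches resampled to 256 x 256 pixels
    lr0=1e-4,    # initial learning rate
    batch=32,    # batch size
    workers=4,   # data-loading threads
    epochs=200,  # self-constructed dataset; 500 epochs were used for HRSID
)
```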

4.2.2. Experimental Design

The experimental evaluation is divided into three main parts:
  • Operator Comparison Experiment:
To demonstrate the necessity of selecting the Scharr operator for the spatial-domain branch of the proposed DDFM, we substitute it with three widely used alternatives—Sobel, Prewitt and Laplacian—while keeping the overall architecture unchanged. Each variant is embedded into the baseline RT-DETR model and evaluated on our inshore ship dataset.
The operator comparison experiment in this section aims to validate the effectiveness and superiority of selecting the Scharr operator as the bi-directional image gradient extractor within the DDFM.
  • Ablation Experiment:
Based on the RT-DETR framework, we conduct a series of controlled experiments by progressively integrating the two proposed modules: the DDFM and the DPAFM. The evaluated configurations include: RT-DETR (baseline), RT-DETR + DDFM, RT-DETR + DPAFM, RT-DETR + DDFM + DPAFM (DWTF-DETR)
This ablation experiment aims to validate the individual and combined effectiveness of the proposed modules.
  • Comparative Experiment:
To further evaluate the performance of our method, we compare it against several mainstream object detection approaches, including single-stage, two-stage, and Transformer-based detectors, as well as various SAR-specific ship detection algorithms referenced in the Introduction. Both quantitative results (across multiple evaluation metrics) and qualitative visualizations are presented to comprehensively verify the accuracy and effectiveness of the proposed approach.
Each group of experiments was conducted five times, and all the reported results in the table represent the average values across these repeated runs.

4.2.3. Comparison Metrics

The effectiveness of the proposed detection method is evaluated using four commonly used metrics: Precision, Recall, F1 Score, and mAP@50.
Precision reflects the accuracy of the detection results, indicating the proportion of correctly predicted ship targets among all predicted positive samples. It is defined as
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
where TP is the number of true positives (correctly detected ships), and FP is the number of false positives (incorrectly detected non-ship targets).
Recall measures the ability of the detection method to identify actual ship targets in the image. It is defined as
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
where TP is the number of true positives (correctly detected ships), and FN is the number of false negatives (missed ship targets).
F1 Score is the harmonic mean of Precision and Recall, providing a balanced measure that considers both aspects. It reflects the overall stability and reliability of the model in SAR ship target detection. The F1 Score is defined as
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
mAP@50 refers to the mean Average Precision calculated at an IoU threshold of 0.50, which evaluates the average precision of ship target detection at this fixed overlap threshold. For IoU = 0.50, the detection results are sorted in descending order by confidence score, and corresponding Precision–Recall points (P(r),r) are recorded at different confidence levels. The mAP@50 is computed as the integral of the area under the Precision–Recall curve, defined as
$$\mathrm{mAP@50} = \int_0^1 P(r)\, dr, \qquad \mathrm{IoU} = 0.50$$
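These metrics follow directly from the detection counts; the helpers below compute them, with AP approximated by trapezoidal integration over the recorded Precision–Recall points (an illustrative numerical stand-in for the integral above).

```python
def detection_metrics(tp: int, fp: int, fn: int):
    """Precision, Recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def average_precision(precisions, recalls):
    """Area under the Precision-Recall curve via trapezoidal integration."""
    pts = sorted(zip(recalls, precisions))  # sort P-R points by ascending recall
    return sum((r1 - r0) * (p0 + p1) / 2
               for (r0, p0), (r1, p1) in zip(pts, pts[1:]))
```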

5. Results

5.1. Operator Experiment

To validate the necessity of operator selection within the DDFM, we successively adopt Sobel, Prewitt, Scharr and Laplacian kernels as its core bi-directional gradient extractors, integrate each variant into the RT-DETR framework, and conduct comparative experiments on our self-constructed inshore ship dataset and the HRSID dataset; the results are reported in Table 1.
The quantitative comparison of different edge detection operators in Table 1 reveals distinct strengths and trade-offs.
On our self-constructed dataset, Scharr achieves the highest Precision (88.56%), outperforming Sobel, Prewitt, and Laplacian by 0.35%, 0.32%, and 1.03%, respectively. In terms of F1-score, which balances Precision and Recall, Scharr performs best at 82.50%, improving on Prewitt by 0.77% and on Laplacian by 0.28% while remaining essentially level with Sobel (−0.01%). Scharr’s most prominent advantage is in mAP@50, where it achieves 83.39%, surpassing Sobel by 1.81%, Prewitt by 1.32%, and Laplacian by 0.11%.
On the HRSID dataset, Scharr achieves the highest mAP@50 (84.95%), outperforming Sobel, Prewitt, and Laplacian by 0.62%, 0.77%, and 0.07%, respectively. In terms of F1-score, Scharr reaches 83.52%, improving on Prewitt by 1.82% and on Laplacian by 0.15% while trailing Sobel by 0.56%. For Recall, Scharr reaches 78.58%, exceeding Sobel by 2.10% and Prewitt by 2.00%, though falling 0.67% short of Laplacian. Sobel, however, achieves the highest Precision, outperforming Scharr, Prewitt, and Laplacian by 1.52%, 3.10%, and 2.70%, respectively.

5.2. Ablation Experiment

To validate the effectiveness of the proposed modules for ship detection in port scenes, we conduct ablation experiments by selectively integrating the DDFM and the DPAFM into the baseline model. Two datasets are used for evaluation: the self-constructed dataset and the HRSID dataset. A total of four comparative configurations are tested, as shown in Table 2.
Figure 7 illustrates the mAP@50 convergence curves of different models on the test dataset, providing an intuitive comparison of detection performance throughout the training process. All four models exhibit a steadily increasing trend followed by convergence, indicating effective training. Among them, the baseline model M1 shows the lowest overall performance with the slowest convergence. M2, which incorporates the Dual-Domain Feature Module (DDFM), demonstrates a noticeably faster convergence rate in the early stages, validating its effectiveness in enhancing target feature extraction. M3, which integrates the Dual-Path Attention Fusion Module (DPAFM), surpasses M2 in the later training stages, indicating improved capability in handling blurred ship targets near the shoreline. The proposed M4 (DWTF-DETR) model, combining both DDFM and DPAFM, consistently achieves the best performance throughout training, ultimately reaching the highest mAP@50 value.
From the ablation experiments conducted on the self-constructed dataset, incorporating the A1 module (DDFM) yields a 0.33% increase in F1-score and a 1.05% improvement in mAP@50 over the original RT-DETR. Introducing the A2 module (DPAFM) further raises the F1-score by 0.45% and mAP@50 by 0.54%. When A1 and A2 are integrated, the proposed model attains a 0.58% gain in F1-score and a 1.60% boost in mAP@50 relative to the baseline.
On the HRSID dataset, the complete model similarly improves the F1-score by 0.61% and mAP@50 by 0.31% compared with RT-DETR. These quantitative results show consistent advantages in Precision, Recall, F1-score, and mAP@50.
From the visualization results and heatmaps in Table 3, it can be observed that the baseline method (M1) is easily disturbed by strong clutter in harbor areas and shows scattered responses in heatmaps. With the introduction of the DDFM (M2), the detection of ship contours improves slightly, and the responses begin to concentrate on target regions. After incorporating the DPAFM (M3), the detector focuses more explicitly on ships in nearshore scenes, effectively suppressing background interference and enhancing the perception of blurred and low-contrast targets. Finally, the integration of both modules (M4, DWTF-DETR) achieves the most stable detection, with accurate bounding boxes and heatmaps highly concentrated on ship targets.

5.3. Comparative Experiment

To better demonstrate the superiority of the proposed DWTF-DETR, we compare it with various state-of-the-art (SOTA) methods on our test datasets. The comparison methods are divided into three categories:
  • Mainstream object detection models, including: YOLOv11, YOLOv12, YOLO-DETR [57], SWIN-DETR [58], Sparse-DETR [59] and RT-DETR;
  • Methods based on feature enhancement, including: YOLO-BiFPN, RT-DETR-BiFPN, SEAttention, CBAM [48], and EMA [60];
  • Methods utilizing frequency-domain features, including: FFCM, SFHF, and FocalNet [61].
From the quantitative comparison presented in Table 4, it can be observed that, compared with the general detectors, the feature enhancement-based methods, and the frequency-domain fusion methods, the proposed DWTF-DETR achieves the best performance on the two core metrics. Specifically, it attains the highest F1-score among all methods, with an improvement of 0.30% over the second-best method, and the highest mAP@50, outperforming the next best by 0.61%. Its Recall, while not the highest, is second-best and only 0.39% lower than that of the leading method, SFHF. Overall, DWTF-DETR demonstrates a balanced performance advantage across Precision, Recall, F1-score, and mAP@50.
Furthermore, the comparison of parameters and computational efficiency in Table 5 reveals significant differences in model complexity among various detection methods. The YOLO series is overall lightweight, with GFLOPs controlled below 10, making it suitable for real-time detection but with limited accuracy. RT-DETR and its variants contain approximately 20 million parameters, with GFLOPs ranging from 57 to 66, reflecting higher computational overhead but stronger feature modeling capability. Frequency-domain-based methods (e.g., FocalNet) tend to have deeper structures but maintain lower GFLOPs, thus achieving higher efficiency. In contrast, DWTF-DETR introduces only a marginal increase in parameters and GFLOPs compared to RT-DETR, while attaining higher detection accuracy, thereby achieving a favorable balance between performance and efficiency.
The visualization results in Table 6 further support the quantitative findings. In three representative scenarios, the proposed DWTF-DETR effectively distinguishes ships from regular strong-scattering structures in port areas, maintaining stable perception of weak boundaries and ship–shore connected regions. Only two false detections occur in nearshore open water, mainly due to confusion between bright mirror reflections or wake echoes and local strong scattering.

6. Discussion

The proposed DWTF-DETR demonstrates clear advantages in addressing the challenges of ship detection in complex nearshore SAR scenes. The three experiments collectively verify the effectiveness of the proposed DWTF-DETR framework in SAR ship detection.
Operator experiments show that Scharr achieves consistently higher F1 score and mAP@50 than other edge operators across two datasets, as shown in Table 1. This indicates that Scharr has a stronger ability to maintain boundary continuity and suppress pseudo-edges. Unlike Laplacian or Sobel, Scharr adopts a rotationally symmetric gradient operator, which enhances sensitivity to fine structural details while reducing directional bias. This property is particularly valuable in SAR imagery, where ship contours are often blurred by speckle noise and irregular scattering. Embedding Scharr into the DDFM therefore provides more reliable structural cues, enabling the detector to better distinguish true ship boundaries from interfering textures in harbor environments.
Ablation studies further verify the complementary roles of DDFM and DPAFM. As shown in Table 2, integrating the two modules achieves statistically significant improvements in F1-score and mAP@50 across both the self-constructed dataset and the HRSID benchmark. The visualization results, as shown in Table 3, illustrate a progressive concentration of heatmap responses on ship regions: the baseline model (M1) shows scattered activations in harbor scenes, while the inclusion of DDFM (M2) improves contour delineation. With DPAFM (M3), background interference is further suppressed and responses are more focused on targets. Finally, their integration (M4) achieves the most stable detection, with bounding boxes tightly aligned to actual ship contours. These observations align with the model design: DDFM enhances representation of structured textures and frequency-domain priors, while DPAFM adaptively strengthens attention to blurred boundaries and low-contrast targets. Together, they yield robust detection and precise localization in complex nearshore environments.
Comparative experiments demonstrate that DWTF-DETR achieves both high accuracy and efficiency. As shown in Table 4, the model obtains the highest F1-score and mAP@50 among all competing methods, while incurring only marginal increases in parameter count and GFLOPs compared to RT-DETR, as shown in Table 5. This favorable trade-off results from the dual-domain fusion strategy, which maximizes feature discriminability without substantially increasing network complexity. Furthermore, visual comparisons, as shown in Table 6, highlight the robustness of DWTF-DETR in distinguishing ships from strong scattering structures such as port facilities. Taken together, the quantitative and qualitative evidence confirms that DWTF-DETR not only enhances detection accuracy but also maintains computational efficiency, making it well-suited for real-world SAR-based maritime surveillance.
Overall, DWTF-DETR strikes a favorable balance between accuracy and efficiency, demonstrating strong generalization to diverse nearshore scenarios. The dual-domain fusion and feature reorganization strategies underpin both its quantitative improvements and its robustness in practical applications.
Despite these advantages, several directions remain open. First, more extensive validation on larger-scale and multi-sensor SAR datasets is needed to further confirm the generalization ability of the model. Second, incorporating temporal information from multi-frame SAR sequences may help mitigate false alarms caused by transient clutter such as wakes or reflections. Third, exploring lightweight variants of DWTF-DETR could enhance its suitability for real-time maritime surveillance applications. Finally, future work will also focus on integrating explicit ship structural information into the model design, aiming to further improve detection robustness and interpretability.

7. Conclusions

This paper proposes DWTF-DETR, a novel detection framework designed to address missed detections and inaccurate bounding boxes for ships in SAR port scenes. The method integrates a Dual-Domain Feature Fusion Module (DDFM), which combines spatial-domain edge enhancement with frequency-domain global context analysis, and a Dual-Path Attention Fusion Module (DPAFM), which dynamically reorganizes features through an attention mechanism to improve structural sensitivity.
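To make the dual-domain idea concrete, the sketch below pairs a depthwise convolution (a stand-in for the edge-enhancement branch) with an FFT branch that applies a learnable per-channel spectral gain before transforming back; the layer widths, names, and 1 × 1 fusion are illustrative assumptions, not the released DDFM.

```python
# Sketch of a dual-domain block: a spatial branch plus an FFT-based branch
# with a learnable per-channel spectral gain, fused by a 1x1 convolution.
import torch
import torch.nn as nn

class DualDomainBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.freq_gain = nn.Parameter(torch.ones(channels, 1, 1))  # spectral reweighting
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local = self.spatial(x)                         # local structural cues
        spec = torch.fft.rfft2(x, norm="ortho")         # per-channel 2-D spectrum
        glob = torch.fft.irfft2(spec * self.freq_gain,  # reweight, then invert
                                s=x.shape[-2:], norm="ortho")
        return self.fuse(torch.cat([local, glob], dim=1))

block = DualDomainBlock(channels=64)
print(block(torch.randn(1, 64, 40, 40)).shape)  # torch.Size([1, 64, 40, 40])
```

Because every spectral bin mixes information from the whole image, even this simple gain gives the block a global receptive field, which is the intuition behind pairing frequency-domain context with local edge cues.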
Comparative experiments demonstrate that DWTF-DETR consistently outperforms the baseline RT-DETR and other state-of-the-art methods across multiple metrics. These results confirm the effectiveness and robustness of the proposed approach for accurate ship detection in challenging nearshore SAR scenarios. In future work, we plan to enhance the current algorithm by integrating explicit ship structural information to further improve detection robustness and interpretability.

Author Contributions

Conceptualization, T.W. and D.L.; Methodology, T.D.; Formal analysis, Y.H.; Investigation, G.Z.; Data curation, Y.P.; Writing—original draft, T.D.; Writing—review & editing, T.W. and Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

Author Yuan Peng was employed by the company Hubei Geospatial Information Technology Group Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figure 1. Illustration of Detection Challenges. (a) Low reflectivity contrast between ships and surrounding objects; (b) Difficulty in extracting structural features of dockside ships. Green boxes indicate ship targets. Yellow boxes indicate the magnified region where the problem is relatively obvious.
Figure 2. RT-DETR Architecture Diagram.
Figure 3. DWTF-DETR Architecture Diagram.
Figure 4. DDFM Architecture Diagram.
Figure 5. DPAFM Architecture Diagram.
Figure 6. Sample Image Patches from the Dataset.
Figure 7. Comparison of mAP@50 Convergence Curves on two Datasets. (a) self-constructed dataset; (b) HRSID dataset.
Table 1. Quantitative Testing Results of Operator Experiments.

| Datasets                 | Method    | Precision (%) | Recall (%) | F1 Score (%) | mAP@50 (%) |
|--------------------------|-----------|---------------|------------|--------------|------------|
| self-constructed dataset | Sobel     | 88.21         | 77.50      | 82.51        | 81.58      |
|                          | Prewitt   | 88.24         | 76.11      | 81.73        | 82.07      |
|                          | Laplacian | 87.53         | 77.51      | 82.22        | 83.28      |
|                          | Scharr    | 88.56         | 77.22      | 82.50        | 83.39      |
| HRSID                    | Sobel     | 90.64         | 76.48      | 82.96        | 84.33      |
|                          | Prewitt   | 87.54         | 76.58      | 81.70        | 84.18      |
|                          | Laplacian | 87.94         | 79.25      | 83.37        | 84.88      |
|                          | Scharr    | 89.12         | 78.58      | 83.52        | 84.95      |

The results are color-coded, with red highlighting the top performance and green for the second best.
Table 2. Quantitative Testing Results of Ablation Experiment.

| Datasets                 | Method | A1 | A2 | Precision (%)  | Recall (%)     | F1 Score (%)   | mAP@50 (%)     |
|--------------------------|--------|----|----|----------------|----------------|----------------|----------------|
| self-constructed dataset | M1     | ×  | ×  | 87.86          | 77.18          | 82.17          | 82.34          |
|                          | M2     | ✓  | ×  | 88.56 (+0.70)  | 77.22 (+0.04)  | 82.50 (+0.33)  | 83.39 (+1.05)  |
|                          | M3     | ×  | ✓  | 89.71 (+1.85)  | 76.57 (−0.61)  | 82.62 (+0.45)  | 82.88 (+0.54)  |
|                          | M4     | ✓  | ✓  | 88.46 (+0.60)  | 77.73 (+0.55)  | 82.75 (+0.58)  | 83.94 (+1.60)  |
| HRSID                    | M1     | ×  | ×  | 89.05          | 78.13          | 83.23          | 84.88          |
|                          | M2     | ✓  | ×  | 89.12 (+0.07)  | 78.58 (+0.45)  | 83.52 (+0.28)  | 84.95 (+0.08)  |
|                          | M3     | ×  | ✓  | 89.45 (+0.40)  | 79.22 (+1.09)  | 84.02 (+0.79)  | 85.28 (+0.40)  |
|                          | M4     | ✓  | ✓  | 91.37 (+2.31)  | 78.83 (+0.70)  | 84.64 (+1.40)  | 85.59 (+0.72)  |

A1 refers to the DDFM, and A2 refers to the DPAFM. The values of the most accurate method are shown in bold. Green numbers indicate improvements over the baseline method, while red numbers represent decreases. × indicates that the module is not included in the algorithm; ✓ indicates that the module is included in the algorithm.
Table 3. Visualization of the detection results of different methods on two datasets.
[Image grid not reproduced: for each ablation variant (M1–M4), a detection result and the corresponding heatmap are shown for two scene types: ships in complex land–object environments, and moored and near-shore ships.]
Green boxes indicate correctly detected targets, red boxes indicate false detections, and yellow boxes indicate missed detections.
Table 4. Quantitative Testing Results of Different Methods.

| Category                             | Methods       | Precision (%) | Recall (%) | F1 Score (%) | mAP@50 (%) |
|--------------------------------------|---------------|---------------|------------|--------------|------------|
| General Detector                     | YOLOv8        | 85.75         | 68.30      | 76.04        | 77.15      |
|                                      | YOLOv11       | 83.76         | 69.63      | 76.04        | 76.49      |
|                                      | YOLOv12       | 85.98         | 68.27      | 76.11        | 75.97      |
|                                      | YOLO-DETR     | 87.24         | 73.49      | 79.78        | 79.56      |
|                                      | SWIN-DETR     | 86.64         | 76.88      | 81.47        | 82.16      |
|                                      | Sparse-DETR   | 87.31         | 76.85      | 81.74        | 82.68      |
|                                      | RT-DETR       | 87.86         | 77.18      | 82.17        | 82.34      |
| Methods based on feature enhancement | YOLO-BiFPN    | 89.51         | 75.75      | 82.06        | 82.92      |
|                                      | RT-DETR-BiFPN | 87.73         | 77.62      | 82.36        | 83.06      |
|                                      | SEAttention   | 87.54         | 77.18      | 82.03        | 82.90      |
|                                      | CBAM          | 88.63         | 76.41      | 82.07        | 82.94      |
|                                      | EMA           | 88.16         | 77.16      | 82.29        | 82.60      |
| Methods utilizing multi-domain fusion| FFCM          | 88.05         | 77.16      | 82.24        | 82.72      |
|                                      | SFHF          | 87.29         | 78.12      | 82.45        | 83.31      |
|                                      | FocalNet      | 86.97         | 76.41      | 81.34        | 82.27      |
| Our method                           | DWTF-DETR     | 88.46         | 77.73      | 82.75        | 83.94      |

The results are color-coded, with red highlighting the top performance and green for the second best.
Table 5. Parameters of Different Methods.

| Category                             | Methods       | Layers | Parameters (M) | GFLOPs |
|--------------------------------------|---------------|--------|----------------|--------|
| General Detector                     | YOLOv8        | 225    | 3.01           | 8.2    |
|                                      | YOLOv11       | 319    | 2.59           | 6.4    |
|                                      | YOLOv12       | 465    | 2.57           | 6.5    |
|                                      | YOLO-DETR     | 228    | 6.19           | 12.0   |
|                                      | SWIN-DETR     | 402    | 36.41          | 97.3   |
|                                      | Sparse-DETR   | 473    | 19.73          | 54.0   |
|                                      | RT-DETR       | 295    | 19.97          | 57.3   |
| Methods based on feature enhancement | YOLO-BiFPN    | 369    | 1.93           | 6.4    |
|                                      | RT-DETR-BiFPN | 315    | 20.40          | 64.6   |
|                                      | SEAttention   | 323    | 20.06          | 57.3   |
|                                      | CBAM          | 343    | 20.06          | 57.3   |
|                                      | EMA           | 325    | 22.34          | 66.4   |
| Methods utilizing multi-domain fusion| FFCM          | 445    | 16.66          | 50.9   |
|                                      | SFHF          | 493    | 18.01          | 54.9   |
|                                      | FocalNet      | 535    | 14.56          | 48.7   |
| Our method                           | DWTF-DETR     | 331    | 25.09          | 66.7   |
Table 6. Visualization of the detection results of different methods on the self-constructed dataset.
[Image grid not reproduced: detection results of the sixteen methods compared in Tables 4 and 5, including DWTF-DETR, across the three scene types E1–E3.]
Green boxes indicate correctly detected targets, red boxes indicate false detections, and yellow boxes indicate missed detections. E1: Moored and Near-Shore Ships; E2: Dockside Ships with Blurred Boundaries; E3: Ships in Complex Land–Object Environments.