Article

FAWT-Net: Attention-Matrix Despeckling and Haar Wavelet Reconstruction for Small-Scale SAR Ship Detection

1
Beijing Institute of Remote Sensing Equipment, the Second Institute of China Aerospace Science and Technology Corporation, Beijing 100854, China
2
College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China
3
Department of Space Microwave Remote Sensing System, Aerospace Information Research Institute, Beijing 100190, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(20), 3460; https://doi.org/10.3390/rs17203460
Submission received: 10 September 2025 / Revised: 9 October 2025 / Accepted: 14 October 2025 / Published: 16 October 2025
(This article belongs to the Section Remote Sensing Image Processing)

Highlights

What are the main findings?
  • FAWT-Net introduces a unified framework integrating attention-matrix despeckling, Haar wavelet-based feature reconstruction, and aspect-ratio-aware WH-IoU loss for small-scale ship detection in SAR images.
  • Compared with existing methods, FAWT-Net increases AP50 by 1.3% and APS by 0.8% on the SSDD dataset and raises the recall rate by 2.2% on the LS-SSDD dataset, a significant improvement in small-target detection under low signal-to-noise-ratio conditions.
What is the implication of the main finding?
  • The attention-matrix despeckling module effectively suppresses coherent speckle noise interference, enabling the network to focus on target locations even when noise occupies more pixels than the ship itself.
  • The Haar wavelet reconstruction during upsampling preserves detailed ship contours, significantly enhancing the detection of small targets in complex maritime scenes with minimal computational overhead.

Abstract

To address the challenges of detecting small-scale ship targets in Synthetic Aperture Radar (SAR) images, this paper proposes FAWT-Net, a novel deep learning network based on attention-matrix despeckling and Haar wavelet reconstruction. The network collaboratively optimizes detection performance through three core modules. First, in the feature-transfer stage from the backbone to the neck, a filtering module based on an attention matrix is designed to suppress speckle noise. Then, in the feature-upsampling stage, a wavelet-transform upsampling method that reconstructs image details is designed to enhance the distinguishability of target boundaries and textures. In addition, the network combines sub-image feature-splicing downsampling to avoid losing key details of small targets and adopts a scale-sensitive detection head that, by adaptively adjusting the shape constraints of prediction boxes, effectively resolves the regression deviation of ship targets with inconsistent aspect ratios. Experiments on SSDD and LS-SSDD show that the proposed method improves AP50 by 1.3% and APS by 0.8% on SSDD; on LS-SSDD it achieves higher precision and recall, with the recall rate increased by 2.2%.

1. Introduction

Synthetic Aperture Radar (SAR) provides high-resolution imagery under all-weather and day–night conditions. It is indispensable for maritime surveillance, illegal fishing monitoring, search and rescue operations, and national security [1]. Among various SAR applications, ship detection is a critical and widely studied task due to the increasing demand for automated maritime situational awareness. Over the past decade, significant progress has been made in SAR ship detection, transitioning from traditional thresholding and constant false-alarm rate (CFAR) methods [2,3] to advanced deep learning frameworks that leverage convolutional neural networks (CNNs) and Transformer architectures [4,5,6,7,8].
Despite these advances, detecting ships in SAR imagery remains a highly challenging task, primarily due to the unique physical imaging mechanisms of SAR systems and the complex maritime environment. One of the most prominent challenges is the presence of speckle noise, a granular interference inherent in coherent imaging systems. Unlike the additive Gaussian noise commonly found in optical images, speckle noise is multiplicative and signal-dependent, severely degrading image quality and obscuring fine structural details of ships. This noise not only reduces the contrast between targets and background but also introduces false responses that can be easily mistaken for real ships, leading to a high false alarm rate, especially in nearshore regions with complex textures such as harbor structures, reefs, and waves [9,10,11]. The manifestation of speckle noise in SAR images mainly depends on the echo signal intensity and the surface of the scattering target. According to existing research [12,13], the power of the noise can be approximated by a gamma approximation [14]. Compounding this issue is the strong sea clutter interference, particularly in low-wind or nearshore scenarios, where the backscatter from the sea surface becomes spatially heterogeneous and non-stationary. Considering that the interference in SAR images can be divided into speckle components and clutter components, to describe the interference in SAR images in more detail, the K distribution and its extended models are widely used in the description of actual scenario models [15]. The K+Noise distribution [16], as a classic model of high-resolution radar sea clutter, has been studied and proven to be effective. In the actual data of X-band sea clutter, the variation law of the K distribution parameters with the radar resolution has been systematically analyzed [17], further verifying the applicability of this model. Traditional detection algorithms often assume a homogeneous background, which fails under such conditions, resulting in poor generalization and degraded detection performance. Even state-of-the-art deep learning models, which rely on clean feature representations for accurate localization and classification, struggle when trained on noisy data without effective preprocessing or noise-robust feature learning mechanisms.
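A minimal simulation makes the multiplicative, signal-dependent nature of speckle concrete. The sketch below, written against NumPy only, draws unit-mean gamma-distributed speckle (a common L-look intensity model consistent with the gamma approximation cited above) and multiplies it onto a toy scene; the function name and parameters are illustrative and not taken from any released code.

```python
import numpy as np

def add_gamma_speckle(intensity: np.ndarray, looks: int = 4, seed: int = 0) -> np.ndarray:
    """Multiply a clean SAR intensity image by unit-mean gamma speckle.

    For L-look intensity imagery, fully developed speckle is commonly modeled
    as s ~ Gamma(shape=L, scale=1/L), so E[s] = 1 and the noise is multiplicative.
    """
    rng = np.random.default_rng(seed)
    speckle = rng.gamma(shape=looks, scale=1.0 / looks, size=intensity.shape)
    return intensity * speckle

# Toy example: a weak point target on a homogeneous sea background.
clean = np.full((64, 64), 0.2)
clean[32, 32] = 5.0                        # bright scatterer (ship-like)
noisy = add_gamma_speckle(clean, looks=1)  # single-look: heaviest speckle
print(noisy.mean(), noisy.max())
```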
Furthermore, a significant portion of detectable ships in large-scale SAR scenes are small targets, often occupying only a few pixels in the image. These include small fishing vessels, yachts, or distant cargo ships, whose weak radar cross-section (RCS) makes them difficult to distinguish from background fluctuations. The limited spatial resolution and low signal-to-noise ratio (SNR) associated with such targets exacerbate the problem, causing standard detection networks—designed primarily for larger objects in natural images—to miss these instances or produce inaccurate bounding boxes. The lack of sufficient high-resolution features after multiple downsampling layers in CNNs further diminishes the model’s sensitivity to small-scale ships. In addition to these physical and perceptual challenges, the arbitrary orientation of ships in SAR images necessitates the use of oriented bounding boxes (OBBs) [18,19,20] rather than axis-aligned ones (HBBs). While OBB-based detection improves localization accuracy, it introduces new difficulties such as boundary discontinuity in angle regression and increased model complexity, especially when combined with the aforementioned noise and scale challenges. To address these issues, recent studies have explored various strategies, including feature-enhancement modules guided by partial differential equations [21], feature fusion based on wavelets [22], and multi-scale feature fusion networks [23]. However, most existing methods treat noise suppression, feature extraction, and small-object detection as separate stages, often leading to suboptimal integration and information loss. There remains a pressing need for a unified framework that simultaneously tackles speckle noise suppression, robust feature learning under sea clutter, and accurate detection of small, arbitrarily oriented ships in a coherent and end-to-end trainable manner.
In this paper, we propose a novel architecture specifically designed to address these intertwined challenges. By integrating noise-suppressed feature enhancement, wavelet-based detail feature reconstruction, and a scale-sensitive detection head, our method aims to improve the performance of detecting small ships in complex SAR maritime monitoring scenarios. This approach integrates Sub-image Feature Splicing Downsampling (SPD), Filtering with Attention Matrix (FAM), the Haar Wavelet Transform (HWT), and the Boundary Width-to-Height Ratio Loss (WH-IoU) as plug-and-play modules, and their respective advantages are verified in the experiments. Extensive experiments on the public benchmarks SSDD and LS-SSDD demonstrate the effectiveness and robustness of the proposed approach, particularly under small-object conditions. The specific contributions are as follows:
  • Aiming at the widely existing speckle noise in SAR images, inspired by the transformer structure, we proposed a speckle noise-filtering method using the attention matrix. This method can effectively reduce the interference of speckle noise on ship recognition and enable the network to focus on the target location.
  • In response to the problem of easy feature loss in small-scale ship target detection, we designed a feature-enhancement method based on the discrete wavelet transform. This method is applied in the upsampling stage and can enable the network to effectively retain ship features.
  • The WH-IoU loss that can take into account the height-to-width ratio of ships is effectively integrated, which further enables the detection box to conform to the target.
  • A sub-image feature splicing downsampling method is integrated to retain more features during the downsampling stage of the backbone, further enhancing the effectiveness of the filtering and feature-enhancement methods.
The remainder of this paper is organized as follows: Section 2 discusses the related works; Section 3 elaborates on the core structure of FAWT-Net; Section 4 introduces the experimental setup; Section 5 presents the results of the various experiments; Section 6 provides an in-depth discussion of this work and proposes directions for future work; Section 7 concludes the paper.

2. Related Works

2.1. General Deep Learning-Based Architectures for SAR Ship Detection

With the increasing availability of SAR imagery, automatic ship detection has become a critical component in maritime surveillance, environmental monitoring, and defense-related applications. Conventional detection approaches have traditionally relied on handcrafted features and heuristic thresholding strategies [24,25,26], which often suffer from limited generalization capability and high false alarm rates, particularly in heterogeneous maritime environments. The rapid advancement of deep learning has revolutionized object detection, enabling significant improvements in accuracy and robustness. The emergence of benchmark datasets such as SSDD and HRSID [27,28,29], as well as more detailed research on the ship scattering mechanism [30], has further accelerated the transition from classical feature-based pipelines to end-to-end trainable deep neural networks.
Among the various deep architectures, Faster R-CNN [31] has been extensively adopted in SAR analysis because of its well-defined two-stage framework and high detection precision. However, the original Faster R-CNN was primarily designed for natural images and encounters several challenges when applied to SAR data. SAR imagery lacks chromatic information, exhibits low signal-to-noise ratios (SNR), and often represents ship targets as isolated bright spots or weak scatterers embedded in complex and non-stationary backgrounds. To address these domain-specific issues, numerous adaptations of Faster R-CNN have been proposed. One notable extension is Meta R-CNN [32], which integrates meta-learning into the detection pipeline to enable effective recognition of novel categories with only a few annotated examples. By establishing a similarity matching mechanism between support and query sets, this framework enhances generalization in low-data regimes, making it particularly suitable for detecting rare or infrequently observed vessel types in SAR scenes, where large annotated datasets are difficult to obtain. The incorporation of a meta-learning module during Region of Interest (RoI) feature extraction strengthens the model’s adaptability to unseen classes, offering a promising direction for few-shot learning in remote sensing.
Further advancing this line of work, Qiao et al. introduced DeFRCN (Decoupled Faster R-CNN) [33], which decouples the feature learning and detection tasks within the two-stage framework. In conventional Faster R-CNN, shared backbone features are used for both region proposal generation and final detection, potentially leading to conflicting optimization objectives and suboptimal representations. DeFRCN addresses this by employing two distinct branches: one dedicated to region proposal generation and another to detection head refinement. This architectural separation improves feature consistency and has demonstrated superior performance in few-shot scenarios. In the context of SAR, this decoupling strategy enhances sensitivity to faint targets and reduces false alarms, proving especially beneficial for long-range detection under low-SNR conditions. To improve computational efficiency and streamline the detection process, one-stage detectors have gained prominence. These models perform direct dense prediction on feature maps without requiring a separate region proposal stage. Notable examples include the YOLO series [34,35], SSD [36], and FCOS [37,38]. Among these, FCOS [39] eliminates the need for predefined anchor boxes, instead adopting a per-pixel prediction paradigm where each spatial location is treated as a potential object center, offering greater flexibility in handling objects of arbitrary scale and aspect ratio. This anchor-free design avoids the need for anchor hyperparameter tuning and improves adaptability to sparse and irregularly shaped targets.

2.2. Anchor-Free and Transformer Architectures for Small-Scale SAR Ship Detection

Building on the above, anchor-free detectors tailored for SAR have shown promise on small and poorly resolved vessels. AFSar [40] employs ResNet/ResNeXt backbones with FPN to aggregate multi-scale cues, and introduces a multi-scale enhancement module that adaptively fuses shallow high-resolution details with deep semantics. A customized loss mitigates low-SNR and sea-clutter effects, improving robustness. To stabilize localization when ship edges are ambiguous, AFSar treats the ship center as a keypoint and estimates scale around it, reducing boundary-regression instability. Experiments indicate advantages over both two-stage (e.g., Faster R-CNN) and one-stage (e.g., YOLOv5) baselines, particularly in cluttered sea states. Transformer-based models further extend receptive fields and capture long-range dependencies often underexploited by CNNs. CRTransSar [41] utilizes a Swin-Transformer backbone with a Contextual Enhancement Module (CEM) to compute spatial affinities and suppress background responses, while cross-stage interactions blend fine-grained and high-level cues. This global attention improves discrimination among closely spaced or partially occluded vessels (e.g., nearshore anchorages). To curb the computational overhead common to Transformers on large SAR scenes, CRTransSar adopts windowed self-attention, which constrains computation yet preserves sufficient context for accurate detection. The SAD-Det proposed by Chen et al. [42] combines Transformer and adaptive feature-extraction techniques, effectively enhancing the context awareness ability of ships in SAR images and the robustness to noise, thus achieving high-precision ship detection. Chen et al. [43] proposed an end-to-end detection method based on transformers, which incorporates the Perceptual Enhancement Transformer and the sparse attention mechanism, effectively improving the detection accuracy of multi-scale and sparse ship targets in the presence of noise and complex backgrounds.
Small ships remain difficult due to weak backscatter, low resolution, and strong resemblance to sea clutter. Multi-scale fusion is therefore pivotal. Standard FPNs propagate semantics top–down but may underutilize shallow discriminative details. HRLE-SARDet [44] addresses this with a Hybrid Representation Learning Enhancement and bidirectional feature enhancement, coupling top–down semantic enrichment with bottom–up detail injection to form a closed-loop refinement that better preserves structure relevant to small targets. Beyond architecture, recent advances target small-object fidelity through improved queries, context, and geometry. RDB-DINO [45] leverages end-to-end Transformer detection with refined denoising queries and box refinement for small ships. Semantic-context enhancement [46] expands the effective receptive field to strengthen discrimination amidst complex backgrounds. For oriented detection, the BurgsVO encoding scheme [47] alleviates boundary discontinuities and yields more precise localization. From an application perspective, SAR has been exploited to observe submesoscale oceanic phenomena related to upwelling [48] and to monitor small riverine dredges [49]; the latter reported an overall accuracy of ~72%, underscoring the difficulty of distinguishing tiny, mobile targets from clutter and the need for better generalization.
Physics-informed and statistical perspectives have also been explored. By combining electromagnetic scattering characteristics with image features, Zhang et al. [50] improved interpretability, while Deng et al. [51] enhanced ship–background contrast via statistics of pixel sets coupled with scattering mechanisms. Nevertheless, many of these designs incur high computation/memory footprints—hindering real-time or resource-limited deployment—and some treat speckle noise only implicitly, leaving robustness gaps in practical scenarios.
The above literature suggests that (i) explicit speckle-aware representation learning, (ii) detail-preserving upsampling for small targets, and (iii) scale/aspect-ratio sensitivity remain bottlenecks. Our FAWT-Net targets these gaps via attention-matrix–based despeckling during feature transfer, Haar wavelet–based detail reconstruction in upsampling, and a scale-sensitive head that adapts to varying aspect ratios—aiming for improved recall and precision on small ships under low-SNR, heterogeneous backgrounds.

3. Methodology

3.1. Architecture of FAWT-Net

SAR images inherently contain a large amount of coherent speckle noise and sidelobe interference, which degrades the quality of the imaging results. In small-target recognition in particular, under low signal-to-noise-ratio conditions, the pixel area occupied by coherent speckle noise and pulse-compression sidelobes may exceed that of the ship target itself. When ships sail near the coast, complex coastal and island targets also appear in the observation scene; once coherent speckle noise is superimposed, the target features are further weakened, which seriously affects the recognition accuracy of the network. Starting from the network structure, this paper therefore designs FAWT-Net, a network that can both filter coherent speckle noise and enhance the edge features of small-scale targets.
Figure 1 shows the architecture of FAWT-Net. We use the amplitude image after SAR imaging as the input signal of the network. In the figure, FAM (filtering with attention matrix) denotes the attention-matrix-based filtering method, and HWT (Haar wavelet transform) denotes the wavelet-transform-based method for enhancing target edge features. At the output of the network, we adopt a bounding-box estimation method that focuses on the aspect ratio of the target. As can be seen from the figure, the filtering operations are concentrated in the lateral transmission paths at the different stages. The reason for this design is that, whether the network is performing the initial feature extraction or the upsampling and downsampling of the feature-fusion stage, it receives feature information from the preceding layers, and a network that retains more detailed features also carries more noise interference. Considering that adding FAM during the downsampling process may smooth the image and cause the loss of too many feature details, FAM is instead added in the lateral transmission step to suppress the propagation of noise and make the network focus more on target features. In the vertical, deep-to-shallow structure of the network, where the priority is retaining more details of the ship contour, we use HWT to enhance the ship contour during the upsampling process.
It should be particularly emphasized that, for small-target detection, the recognition accuracy of the network is related to a certain extent to the amount of information carried in the feature flow. When more information is retained at each stage, more target information is available for the network to refer to. Guided by this principle, this paper introduces a widely verified sub-image feature splicing downsampling method [52,53,54] that retains more pixels of the ship contour during the downsampling stage. Different from the traditional approach of enlarging the receptive field with large-kernel convolutions, this method splits the SAR feature map of size $H \times W \times C$ pixel by pixel along the horizontal and vertical directions, reorganizes the samples into sub-feature maps, and performs feature downsampling by stacking the sub-feature maps along the channel dimension. As shown in Figure 2, each sub-image is obtained by sampling every other pixel along the h and w dimensions, respectively. The expressions are as follows:
$$F = \begin{bmatrix} f_{1,1} & f_{1,2} & \cdots & f_{1,w} \\ f_{2,1} & f_{2,2} & \cdots & f_{2,w} \\ \vdots & \vdots & \ddots & \vdots \\ f_{h,1} & f_{h,2} & \cdots & f_{h,w} \end{bmatrix}_{h \times w}$$
$$F_1 = \begin{bmatrix} f_{1,1} & f_{1,3} & \cdots & f_{1,w-1} \\ f_{3,1} & f_{3,3} & \cdots & f_{3,w-1} \\ \vdots & \vdots & \ddots & \vdots \\ f_{h-1,1} & f_{h-1,3} & \cdots & f_{h-1,w-1} \end{bmatrix}_{h/2 \times w/2} \quad
F_2 = \begin{bmatrix} f_{1,2} & f_{1,4} & \cdots & f_{1,w} \\ f_{3,2} & f_{3,4} & \cdots & f_{3,w} \\ \vdots & \vdots & \ddots & \vdots \\ f_{h-1,2} & f_{h-1,4} & \cdots & f_{h-1,w} \end{bmatrix}_{h/2 \times w/2}$$
$$F_3 = \begin{bmatrix} f_{2,1} & f_{2,3} & \cdots & f_{2,w-1} \\ f_{4,1} & f_{4,3} & \cdots & f_{4,w-1} \\ \vdots & \vdots & \ddots & \vdots \\ f_{h,1} & f_{h,3} & \cdots & f_{h,w-1} \end{bmatrix}_{h/2 \times w/2} \quad
F_4 = \begin{bmatrix} f_{2,2} & f_{2,4} & \cdots & f_{2,w} \\ f_{4,2} & f_{4,4} & \cdots & f_{4,w} \\ \vdots & \vdots & \ddots & \vdots \\ f_{h,2} & f_{h,4} & \cdots & f_{h,w} \end{bmatrix}_{h/2 \times w/2}$$
Here, $F$ denotes the input feature map; $F_1$, $F_2$, $F_3$, and $F_4$ denote the four discretized and reorganized downsampled feature maps; and $f$ denotes each pixel in the map. After downsampling, a convolution with a $1 \times 1$ kernel is used to adjust the channel dimension. This convolution perceives the channel dimension corresponding to each pixel and extracts, element-wise, the pixel information of the newly formed feature elements.
Different from a simple channel-shuffling operation, SPD divides the input image into sub-images along the pixel dimensions and then concatenates them along the channel dimension. This module contains no learnable parameters. Each pixel position in the output is obtained by concatenating the 4 sub-images along the channel dimension, and each sub-image requires $h/2 \times w/2$ pixel concatenation operations in the height and width dimensions. The total number of pixel concatenation operations in the downsampling result is therefore $h/2 \times w/2 \times c \times 4 = h \times w \times c$.
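The following is a minimal PyTorch sketch of the sub-image splicing downsampling described above; the class name and channel sizes are illustrative assumptions rather than the released implementation.

```python
import torch
import torch.nn as nn

class SPDDownsample(nn.Module):
    """Sub-image splicing downsampling: a minimal sketch of the SPD idea.

    The input (B, C, H, W) is split into four sub-images by taking every other
    pixel along H and W (the F1..F4 matrices above), concatenated along the
    channel dimension, and a 1x1 convolution then mixes the stacked channels.
    No pixels are discarded.
    """
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.proj = nn.Conv2d(4 * in_channels, out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f1 = x[..., 0::2, 0::2]   # odd rows, odd columns
        f2 = x[..., 0::2, 1::2]   # odd rows, even columns
        f3 = x[..., 1::2, 0::2]   # even rows, odd columns
        f4 = x[..., 1::2, 1::2]   # even rows, even columns
        return self.proj(torch.cat([f1, f2, f3, f4], dim=1))

# (B, C, H, W) -> (B, out_channels, H/2, W/2)
y = SPDDownsample(64, 128)(torch.randn(1, 64, 256, 256))
print(y.shape)  # torch.Size([1, 128, 128, 128])
```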

3.2. Filtering with Attention Matrix (FAM)

The spatial clustering distribution characteristics of speckle noise require the algorithm to simultaneously focus on the texture details of the local neighborhood and the global structural continuity, which is precisely the advantage of the FAM module. Figure 3 shows the detailed structure of the FAM module. At the local level, we draw on the convolution method of 3D convolution from the two-dimensional plane to the channel dimension and set up a 3 × 3 × 3 convolution kernel. After the channel shuffle operation, this method convolves the images of three adjacent channels and captures the local statistical characteristics of the noise cluster through the joint modeling of the spatial-echo signal spectrum. For example, it can suppress high-frequency noise by using a small convolution kernel while maintaining the clarity of edges. At the global level, the self-attention mechanism can reconstruct the long-distance spatial associations broken by noise interference by explicitly modeling the long-range dependencies between pixels, thereby restoring the continuity of the contours of ships, coasts, islands or terrain textures in SAR images.
One of FAM’s branches is the local branch for obtaining local texture and noise suppression. It extracts local spatial-echo signal spectrum features through 1 × 1 convolution and channel shuffling. The input tensor is represented as $f \in \mathbb{R}^{H \times W \times C}$. First, the 1 × 1 convolution is used to read target pixel features and adjust the channel dimension, and then channel shuffling is performed:
$$F_1 = \mathrm{Shuffle}\left(W_{1 \times 1} * f\right)$$
To further improve the noise suppression effect of the local branch, we introduce the 3D convolution commonly used in hyperspectral image processing to capture the local correlation of the SAR echo signal. Since speckle noise is a typical multiplicative noise, we denote it by $N$ and denote the input, i.e., the echo signal after pulse compression, by $X$. Then, according to the statistical characteristics of speckle noise:
$$F_{\mathrm{conv}} = W * F_1 \approx W * X \cdot \mathbb{E}\left[1 + N\right] \approx W * X, \quad N \sim \mathcal{N}\left(0, \sigma^2\right)$$
Here, it is assumed that the speckle noise has zero mean, $\mathbb{E}[N] = 0$, so on average the noise is suppressed while the signal itself is retained. The local branch mainly accounts for the fact that the energy of the speckle noise present in different channels of the feature stream varies. After channel shuffling, the dependence between channels is further reduced, and the subsequent multi-dimensional convolution promotes cross-channel information fusion. At the same time, the pooling effect of the convolution operation helps suppress the noise in the feature stream. However, when the signal-to-noise ratio is low, it is difficult to suppress the noise effectively by relying on the pooling effect of convolution alone; therefore, an attention mechanism also needs to be introduced.
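Before turning to the global branch, the local branch just described can be sketched as follows. This is a schematic PyTorch interpretation (the module layout, group count, and the single-volume 3D convolution are assumptions), intended only to show how the 1 × 1 convolution, channel shuffle, and 3 × 3 × 3 convolution compose.

```python
import torch
import torch.nn as nn

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """ShuffleNet-style channel shuffle on a (B, C, H, W) tensor (C divisible by groups)."""
    b, c, h, w = x.shape
    return x.view(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)

class FAMLocalBranch(nn.Module):
    """Minimal sketch of the FAM local branch (assumed layout, not the released code).

    A 1x1 conv reads per-pixel features, channel shuffle decorrelates channels,
    and a 3x3x3 convolution jointly filters three adjacent channels and the
    3x3 spatial neighbourhood, averaging out zero-mean multiplicative speckle.
    """
    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        self.pw = nn.Conv2d(channels, channels, kernel_size=1)
        self.groups = groups
        self.conv3d = nn.Conv3d(1, 1, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f1 = channel_shuffle(self.pw(x), self.groups)
        # treat the channel axis as the depth axis of a single-channel 3D volume
        return self.conv3d(f1.unsqueeze(1)).squeeze(1)

out = FAMLocalBranch(64)(torch.randn(1, 64, 80, 80))
print(out.shape)  # torch.Size([1, 64, 80, 80])
```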
For the other, global branch, the core purpose is to filter out long-range speckle noise while constructing dependency relationships between different polarization modes. Here, two parallel QKV attention generation modules [55,56,57] are used. They follow the same estimation principle; the only difference is that different convolution methods are used to obtain target features at different scales, which is crucial for the network’s attention to ships and coastal targets in SAR images. The branch with 3 × 3 depthwise separable convolution enables the network to perceive ships. After the Q and K matrices are obtained, spatial-dimension flattening and transposing operations are performed on them, respectively, to obtain:
$$Q \in \mathbb{R}^{\frac{H \times W}{\mathrm{stride}} \times C}, \quad K \in \mathbb{R}^{C \times \frac{H \times W}{\mathrm{stride}}}$$
Further, the matrix A obtained through the activation function is expressed as:
$$A = \mathrm{Softmax}\left(\frac{K \cdot Q}{\alpha}\right) \in \mathbb{R}^{C \times C}$$
where $\alpha$ is a learnable scaling parameter. Similar to a covariance matrix, the newly obtained matrix $A$ mainly reflects the mutual relationships between different channels. This is crucial for detection results obtained by processing multiple polarization modes simultaneously in SAR applications, and it strengthens the network’s ability to perceive targets across different polarization modes. The reshaped matrix $V'$ and the subsequent calculation can be expressed as:
$$\mathrm{Context} = V' \cdot A$$
This indicates that the features at each spatial location fuse information from other channels through the attention weights A, enhancing the modeling of global dependencies. Another set of parallel QKV attention modules is exactly the same in the subsequent calculation method, except that the depthwise separable convolution is replaced by the dynamic snake convolution. This is carried out to enable the network to simultaneously notice the coastal contour information, which is helpful for identifying nearshore targets overlapping with the coast. The output of the global module is expressed as:
$$F_{\mathrm{att}} = W_{1 \times 1}\left(\mathrm{Context}_1\right) + W_{1 \times 1}\left(\mathrm{Context}_2\right) + f$$
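A minimal sketch of one global branch is given below, assuming a Restormer-style channel attention in which A is C × C. The depthwise 3 × 3 convolution stands in for the first branch; the dynamic snake convolution of the second branch and the summation of the two contexts in F_att are omitted for brevity, so this illustrates the attention-matrix idea rather than reproducing the exact module.

```python
import torch
import torch.nn as nn

class FAMGlobalBranch(nn.Module):
    """Minimal sketch of one FAM global (channel-attention) branch.

    Q, K, V are produced by a 1x1 conv followed by a depthwise 3x3 conv.
    The attention matrix A = Softmax(K Q / alpha) is C x C, i.e. it models
    inter-channel dependencies rather than pixel-to-pixel affinities.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.qkv = nn.Sequential(
            nn.Conv2d(channels, 3 * channels, kernel_size=1),
            nn.Conv2d(3 * channels, 3 * channels, kernel_size=3, padding=1, groups=3 * channels),
        )
        self.alpha = nn.Parameter(torch.ones(1))   # learnable temperature
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        b, c, h, w = f.shape
        q, k, v = self.qkv(f).chunk(3, dim=1)
        q = q.flatten(2).transpose(1, 2)                     # (B, HW, C)
        k = k.flatten(2)                                     # (B, C, HW)
        v = v.flatten(2).transpose(1, 2)                     # (B, HW, C)
        attn = torch.softmax((k @ q) / self.alpha, dim=-1)   # (B, C, C)
        context = (v @ attn).transpose(1, 2).reshape(b, c, h, w)
        return self.proj(context) + f                        # residual, as in F_att

print(FAMGlobalBranch(64)(torch.randn(1, 64, 40, 40)).shape)
```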

3.3. Haar Wavelet Transform (HWT)

The Discrete Wavelet Transform (DWT) can decompose a signal according to frequency scales, which offers strong theoretical advantages for separating information of different frequencies in SAR images [58]. The pixel-level details of small ships mainly exhibit high-frequency characteristics, so DWT can effectively enhance their detailed texture features and prevent feature loss during upsampling.
DWT sets the scaling factor in a discrete manner and has a smaller computational load than the Continuous Wavelet Transform, an indispensable advantage for SAR target recognition applications with strict real-time inference requirements. Figure 4 shows the detailed structure of HWT. It can be seen that DWT is applied after the input of shallow features. Here, we consider that upper-level features often contain more complex target contour details; in signal-processing terms, this information can be understood as high-frequency signals, while the semantic features representing the target position are mainly composed of low-frequency signals. Because of the filtering effect of the downsampling process, such low-frequency signals are mainly concentrated in the deep features of the network.
In the DWT operation, the choice of wavelet is crucial for the effect after transformation. To restore the ships in the image more clearly, the Daubechies db4 and Haar wavelets are compared here in terms of their boundary sparsity. The scaling function $\varphi(x)$ and wavelet function $\phi(x)$ of Daubechies db4 satisfy $\int x^m \phi(x)\,\mathrm{d}x = 0$ for $m = 0, 1$. For the step signal of a target edge in the image, its coefficient distribution is:
$$d_{j,k} = \sum_{i=0}^{3} h_i\, x_{k-i}$$
Since the filter length is 4, the 4 pixels near an edge pixel are involved in the operation, which results in 3–4 non-zero coefficients in the computed result. For the Haar wavelet, its wavelet function is expressed as:
$$\phi(x) = \begin{cases} 1, & 0 \le x < 0.5 \\ -1, & 0.5 \le x < 1 \\ 0, & \text{otherwise} \end{cases}$$
After convolution, the Haar wavelet’s representation of step edges is exactly sparse (only 1 non-zero coefficient), whereas db4/coiflet wavelets require more coefficients and thus have poorer sparsity. Considering the filter length as well, the Haar wavelet has the shortest filter (L = 2), while the filters of Daubechies (e.g., L = 4 for db4) and Coiflet (e.g., L = 6 for coif2) are significantly longer; especially for wide images, the computational cost increases markedly. We therefore use the Haar wavelet for the DWT [59], whose scaling (low-pass) and wavelet (high-pass) filters are:
$$h_0 = \left[\tfrac{1}{\sqrt{2}},\ \tfrac{1}{\sqrt{2}}\right], \quad h_1 = \left[\tfrac{1}{\sqrt{2}},\ -\tfrac{1}{\sqrt{2}}\right]$$
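To illustrate the boundary-sparsity argument above, the short script below (using the PyWavelets package) decomposes a one-dimensional step edge with Haar and db4; the step position and signal length are arbitrary choices for the example.

```python
import numpy as np
import pywt

# A step edge, like a ship/sea boundary along one image row.
row = np.concatenate([np.zeros(15), np.ones(17)])

for wavelet in ("haar", "db4"):
    _, detail = pywt.dwt(row, wavelet)            # single-level 1-D DWT
    nonzero = int(np.sum(np.abs(detail) > 1e-10))
    print(f"{wavelet}: {nonzero} non-zero detail coefficient(s)")
# haar -> 1 non-zero coefficient; db4 -> several, i.e. a less sparse edge code
```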
According to the definition of wavelet transform, first perform Haar transform on each row of the image f h , w to obtain the low-frequency and high-frequency decomposition results of each row. The expressions are as follows:
$$cA_{\mathrm{row}}(h, w) = \frac{f(h, 2w) + f(h, 2w+1)}{\sqrt{2}}$$
$$cD_{\mathrm{row}}(h, w) = \frac{f(h, 2w) - f(h, 2w+1)}{\sqrt{2}}$$
On this basis, we further perform the Haar transform in the column direction on the results after row decomposition, and obtain:
$$LL(h, w) = \frac{cA_{\mathrm{row}}(2h, w) + cA_{\mathrm{row}}(2h+1, w)}{\sqrt{2}}$$
$$LH(h, w) = \frac{cA_{\mathrm{row}}(2h, w) - cA_{\mathrm{row}}(2h+1, w)}{\sqrt{2}}$$
$$HL(h, w) = \frac{cD_{\mathrm{row}}(2h, w) + cD_{\mathrm{row}}(2h+1, w)}{\sqrt{2}}$$
$$HH(h, w) = \frac{cD_{\mathrm{row}}(2h, w) - cD_{\mathrm{row}}(2h+1, w)}{\sqrt{2}}$$
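The four sub-band equations can be transcribed directly into a few lines of PyTorch; the sketch below assumes a (B, C, H, W) feature map with even H and W and follows the orthonormal (1/√2) convention used above.

```python
import torch

def haar_dwt2d(x: torch.Tensor):
    """Single-level 2-D Haar DWT of a (B, C, H, W) feature map (H, W even).

    Direct transcription of the row/column equations above: each sub-band has
    spatial size H/2 x W/2 and the channel count is unchanged.
    """
    s2 = 2.0 ** 0.5
    # row transform (along W)
    ca = (x[..., 0::2] + x[..., 1::2]) / s2
    cd = (x[..., 0::2] - x[..., 1::2]) / s2
    # column transform (along H)
    ll = (ca[..., 0::2, :] + ca[..., 1::2, :]) / s2
    lh = (ca[..., 0::2, :] - ca[..., 1::2, :]) / s2
    hl = (cd[..., 0::2, :] + cd[..., 1::2, :]) / s2
    hh = (cd[..., 0::2, :] - cd[..., 1::2, :]) / s2
    return ll, lh, hl, hh

ll, lh, hl, hh = haar_dwt2d(torch.randn(1, 64, 80, 80))
print(ll.shape)  # torch.Size([1, 64, 40, 40])
```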
Our design idea is to apply low-pass and high-pass filters horizontally and vertically, respectively, to the upper-layer feature map of size H × W × C. After DWT decomposition, the map is divided into four sub-bands, each of size H/2 × W/2 × C: HL is high-frequency in the horizontal direction and low-frequency in the vertical direction; LH is low-frequency horizontally and high-frequency vertically; HH is high-frequency in both directions; and LL is low-frequency in both directions [60]. LL is directly concatenated along the channel dimension with the features of the lower-layer network, yielding a low-frequency feature map with enhanced position and semantic information. The other three sub-bands, HH, LH, and HL, which contain the high-frequency signals, are concatenated along the channel dimension after a channel shuffle operation. A residual connection is added at the end of this branch to help the network fit its parameters to the features, which is expressed as follows:
$$F = \mathrm{Concat}\left(\mathrm{Shuffle}\left(HH, HL, LH\right)\right)$$
$$F_H = \mathrm{Conv}_{3 \times 3}\left(\mathrm{Conv}_{3 \times 3}\left(F\right)\right) + F$$
When performing IDWT, it is necessary to merge the sub-bands by column and further obtain the results of the inverse column transformation as follows:
$$cA_{\mathrm{col}}(2h, w) = \frac{LL(h, w)}{\sqrt{2}}, \quad cA_{\mathrm{col}}(2h+1, w) = \frac{LH(h, w)}{\sqrt{2}}$$
$$cD_{\mathrm{col}}(2h, w) = \frac{HL(h, w)}{\sqrt{2}}, \quad cD_{\mathrm{col}}(2h+1, w) = \frac{HH(h, w)}{\sqrt{2}}$$
$$cA_{\mathrm{row}}(h, w) = \frac{cA_{\mathrm{col}}(h, 2w) + cA_{\mathrm{col}}(h, 2w+1)}{\sqrt{2}}, \quad cD_{\mathrm{row}}(h, w) = \frac{cD_{\mathrm{col}}(h, 2w) - cD_{\mathrm{col}}(h, 2w+1)}{\sqrt{2}}$$
After that, it is necessary to further perform the inverse transformation in the row direction to obtain the reconstructed feature image of each channel:
$$\tilde{f}(h, 2w) = \frac{cA_{\mathrm{row}}(h, w) + cD_{\mathrm{row}}(h, w)}{\sqrt{2}}, \quad \tilde{f}(h, 2w+1) = \frac{cA_{\mathrm{row}}(h, w) - cD_{\mathrm{row}}(h, w)}{\sqrt{2}}$$
It is worth noting that the final upsampling result is obtained by concatenating the IDWT results of each channel in the channel dimension.
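For completeness, the following sketch shows a standard orthonormal Haar inverse that exactly undoes the forward decomposition sketched above; it omits the high-frequency processing branch, so it only demonstrates that the transform itself is lossless rather than reproducing the full HWT module.

```python
import torch

def haar_idwt2d(ll, lh, hl, hh):
    """Inverse of haar_dwt2d: reconstructs the original (B, C, H, W) map."""
    s2 = 2.0 ** 0.5
    b, c, h2, w2 = ll.shape
    # inverse column transform: interleave rows
    ca = ll.new_zeros(b, c, 2 * h2, w2)
    cd = ll.new_zeros(b, c, 2 * h2, w2)
    ca[..., 0::2, :] = (ll + lh) / s2
    ca[..., 1::2, :] = (ll - lh) / s2
    cd[..., 0::2, :] = (hl + hh) / s2
    cd[..., 1::2, :] = (hl - hh) / s2
    # inverse row transform: interleave columns
    x = ll.new_zeros(b, c, 2 * h2, 2 * w2)
    x[..., 0::2] = (ca + cd) / s2
    x[..., 1::2] = (ca - cd) / s2
    return x

x = torch.randn(1, 64, 80, 80)
rec = haar_idwt2d(*haar_dwt2d(x))   # haar_dwt2d from the forward sketch above
print(torch.allclose(rec, x, atol=1e-6))  # True: lossless round trip
```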

3.4. The Boundary Width-to-Height Ratio (WH-IoU) Loss

Since ships in SAR usually have slender or irregular geometric shapes, and are affected by speckle noise, resulting in blurred boundaries, the traditional IoU is not sensitive enough to the deviation of the center point and shape differences. For example, a slight deviation of a slender ship in the short-axis direction can significantly reduce the IoU value, while a large deviation in the long-axis direction may be overlooked, leading to directional biases in the regression results.
$$\mathrm{IoU} = \frac{\left|B \cap B^{gt}\right|}{\left|B \cup B^{gt}\right|}$$
In addition, small targets are prone to blend into the background. To solve this problem, we introduce the WH-IoU loss to improve detection accuracy by explicitly modeling the shape and scale characteristics of the target itself [61]. First, WH-IoU calculates a dynamic weight factor based on the aspect ratio of the target, assigns a higher deviation penalty weight to the short-side direction, and suppresses shape asymmetry.
$$ww = \frac{2 \cdot \left(w^{gt}\right)^{scale}}{\left(w^{gt}\right)^{scale} + \left(h^{gt}\right)^{scale}}$$
$$hh = \frac{2 \cdot \left(h^{gt}\right)^{scale}}{\left(w^{gt}\right)^{scale} + \left(h^{gt}\right)^{scale}}$$
In the formula, $w^{gt}$ and $h^{gt}$ represent the width and height of the GT box, respectively, and $scale$ is set to 1. Secondly, the loss sensitivity of targets of different sizes is balanced through the scale-normalization term $S$, ensuring balanced optimization of the center-point localization and size regression of small targets.
$$S = \frac{\sum_{i=1}^{M}\sum_{j=1}^{N_i} w_{ij} \cdot h_{ij}}{\sum_{i=1}^{M} N_i}$$
Here, $w_{ij}$ and $h_{ij}$ represent the width and height of the $j$-th box in the $i$-th image, and $N_i$ represents the number of bounding boxes in the $i$-th image. Finally, the shape-difference loss term $\Omega_{\mathrm{shape}}$ and the shape-weighted distance term $D_{\mathrm{shape}}$ are introduced; by weighting the normalized aspect-ratio error, the geometric similarity is further constrained:
$$\Omega_{\mathrm{shape}} = \sum_{t = w, h} \left(1 - e^{-\omega_t}\right)^{4}, \quad \omega_w = \frac{\left|w - w^{gt}\right|}{\max\left(w, w^{gt}\right)}, \quad \omega_h = \frac{\left|h - h^{gt}\right|}{\max\left(h, h^{gt}\right)}$$
$$D_{\mathrm{shape}} = hh \cdot \frac{\left(x_c - x_c^{gt}\right)^2}{c^2} + ww \cdot \frac{\left(y_c - y_c^{gt}\right)^2}{c^2}$$
In the formula, $(x_c, y_c)$ is the center point of the predicted box, and $(x_c^{gt}, y_c^{gt})$ is the center point of the ground truth box. The final loss function is obtained by considering the above factors, and its specific expression is:
$$L_{\mathrm{Shape\text{-}IoU}} = 1 - \mathrm{IoU} + D_{\mathrm{shape}} + \frac{1}{2}\,\Omega_{\mathrm{shape}}$$
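Assembling the equations above gives the following sketch of the WH-IoU loss for axis-aligned (x_c, y_c, w, h) boxes. The function signature and the ε stabilizer are our own illustrative choices; the scale exponent is set to 1 as in the text, and the dataset-level normalization term S is omitted for simplicity.

```python
import torch

def wh_iou_loss(pred, gt, scale: float = 1.0, eps: float = 1e-7):
    """Sketch of the WH/Shape-IoU loss; pred, gt are (N, 4) boxes as (x_c, y_c, w, h)."""
    px, py, pw, ph = pred.unbind(-1)
    gx, gy, gw, gh = gt.unbind(-1)

    # plain IoU of axis-aligned boxes
    inter_w = (torch.min(px + pw / 2, gx + gw / 2) - torch.max(px - pw / 2, gx - gw / 2)).clamp(0)
    inter_h = (torch.min(py + ph / 2, gy + gh / 2) - torch.max(py - ph / 2, gy - gh / 2)).clamp(0)
    inter = inter_w * inter_h
    iou = inter / (pw * ph + gw * gh - inter + eps)

    # squared diagonal of the smallest enclosing box
    cw = torch.max(px + pw / 2, gx + gw / 2) - torch.min(px - pw / 2, gx - gw / 2)
    ch = torch.max(py + ph / 2, gy + gh / 2) - torch.min(py - ph / 2, gy - gh / 2)
    c2 = cw ** 2 + ch ** 2 + eps

    # aspect-ratio weights taken from the GT box
    ww = 2 * gw ** scale / (gw ** scale + gh ** scale)
    hh = 2 * gh ** scale / (gw ** scale + gh ** scale)

    # shape-weighted centre distance and shape-difference penalty
    d_shape = hh * (px - gx) ** 2 / c2 + ww * (py - gy) ** 2 / c2
    omega_w = (pw - gw).abs() / torch.max(pw, gw)
    omega_h = (ph - gh).abs() / torch.max(ph, gh)
    omega_shape = (1 - torch.exp(-omega_w)) ** 4 + (1 - torch.exp(-omega_h)) ** 4

    return 1 - iou + d_shape + 0.5 * omega_shape

# slender ship: a small short-axis shift is penalised through the weighted terms
loss = wh_iou_loss(torch.tensor([[50.0, 50.0, 40.0, 10.0]]),
                   torch.tensor([[52.0, 50.0, 38.0, 10.0]]))
print(loss)
```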
When the predicted box completely coincides with the ground truth box in the traditional IoU, the IoU reaches the maximum value of 1, and at this time, the gradient is 0. When the predicted box has no intersection or a very small intersection with the ground truth box, the IoU is close to 0, and the gradient also approaches 0, which makes the optimization process difficult. GIoU introduces the concept of the covering bounding box (convex hull):
$$\mathrm{GIoU} = \mathrm{IoU} - \frac{\left|C \setminus \left(B \cup B^{gt}\right)\right|}{\left|C\right|}$$
where C represents the smallest enclosing rectangle of the ground truth (GT) box and the predicted box. Even when the IoU is 0, it can provide a non-zero gradient by predicting the “non-overlapping” area between the predicted box and the ground truth box. This solves the problem of gradient disappearance of IoU when there is no intersection. However, it still mainly relies on the area ratio and has limited ability to handle shape differences (such as aspect ratio), especially in the short-side offset of slender objects. DIoU explicitly adds a penalty term for the distance between the center points, which can more directly guide the center point of the predicted box to approach the ground truth center:
$$\mathrm{DIoU} = \mathrm{IoU} - \frac{\rho^2\left(b, b^{gt}\right)}{c^2}$$
Here, $b$ and $b^{gt}$ are the center points of the predicted box and the GT box, respectively, $\rho(\cdot)$ denotes the Euclidean distance, and $c$ is the diagonal length of the smallest enclosing box of the predicted and GT boxes. For short-side offsets, the gradient of the center-point distance is more sensitive. However, DIoU mainly focuses on the center-point distance and the size of the bounding box and lacks a direct penalty for changes in shape (especially aspect ratio). CIoU combines the penalty terms of IoU, center-point distance, and aspect ratio, but it does not penalize shape differences as directly as WH-IoU:
$$\mathrm{CIoU} = \mathrm{IoU} - \frac{\rho^2\left(b, b^{gt}\right)}{c^2} - \alpha \nu$$
$$\alpha = \frac{\nu}{\left(1 - \mathrm{IoU}\right) + \nu}$$
$$\nu = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2$$
In contrast, the processing flow of WH-IoU can be expressed by Figure 5.
This mechanism effectively alleviates the problem of gradient disappearance of traditional IoU in small target detection. Through shape-adaptive weight assignment, it significantly improves the localization robustness of slender or irregular ship targets in SAR images. Especially, it can still maintain sensitivity to subtle geometric changes under noise interference.

4. Experiment

4.1. Dataset and Training Details

Table 1 summarizes the key information of the SSDD and LS-SSDD datasets. The SAR images in the SSDD dataset are sourced from mainstream SAR satellite platforms such as Gaofen-3 (GF-3) and Sentinel-1. It covers a variety of typical maritime scenarios, including open ocean waters, port berthing areas, and the surroundings of islands and reefs. The dataset incorporates images with two common polarization modes (VV and VH) and varying resolutions. The ship targets within the dataset exhibit significant size differences: the smallest target occupies merely 28 pixels, while the largest one takes up 62,878 pixels. In terms of target statistics, 172 images in the dataset contain small ships, and the total number of such targets exceeds 1000. The statistical results of the target scale are shown in Figure 6; a target whose pixel area is less than 1/100 of the image area is regarded as a small-scale target and is represented by a blue scatter point in the figure. To construct a usable dataset, we select 80% of the images as the training set.
The LS-SSDD dataset is a high-resolution SAR image dataset constructed specifically for small-scale ship target detection tasks. It has a more diverse data source, covering not only satellite SAR data from Sentinel-1 and RADARSAT-2 but also images collected by airborne SAR platforms. The scenarios include both daytime and nighttime acquisitions, as well as sea-surface environments under various wind and wave conditions. The prominent features of this dataset lie in its target characteristics and scenario complexity: the ship targets are generally tiny, with an average size of 12–15 pixels and the smallest target being only 5 pixels. Additionally, the aspect ratios of the targets vary significantly, and their forms include hulls, masts, and deck structures. Furthermore, the images are strongly affected by SAR speckle noise and sea clutter. In terms of data scale and division, LS-SSDD contains more than 10,000 multi-polarization (HH, HV, and VV) SAR images with spatial resolutions ranging from 0.3 to 1.5 m per pixel, which fully supports evaluating the generalization ability of the algorithm across different sea areas, meteorological conditions, and ship types. Since the original data likewise provides no predefined training split, we again use 80% of the images for training.
All the networks described in this study were trained on a 12th Gen Intel Core i9-12900HX CPU and an NVIDIA GeForce RTX 4060 laptop GPU. The training framework was Ultralytics; the other models in the comparison were trained with the MMDetection and Ultralytics frameworks. For each model, the number of training epochs is set to 50 and the batch size to 4. Training uses a dynamic learning rate with an initial value of 0.01 and a final value of 0.001, and the optimizer is Stochastic Gradient Descent (SGD). The warm-up period is set to 3 epochs, the non-maximum suppression threshold to 0.6, the random image scaling gain to 0.5, and the random image flipping probability to 0.5.
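For reference, the training setup described above maps onto the public Ultralytics training API roughly as follows; the model and dataset YAML file names are placeholders (a custom FAWT-Net definition would additionally require its modules to be registered in the framework).

```python
from ultralytics import YOLO

# Sketch of the training configuration described above, using the public
# Ultralytics API; "fawt-net.yaml" and "ssdd.yaml" are placeholder names for
# the custom model definition and dataset config, not files shipped with the paper.
model = YOLO("fawt-net.yaml")
model.train(
    data="ssdd.yaml",      # dataset config (80/20 train/val split)
    epochs=50,             # training epochs
    batch=4,               # images per batch
    optimizer="SGD",
    lr0=0.01,              # initial learning rate
    lrf=0.1,               # final LR = lr0 * lrf = 0.001
    warmup_epochs=3,
    iou=0.6,               # NMS IoU threshold used during validation
    scale=0.5,             # random scaling gain
    fliplr=0.5,            # random flip probability
)
```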

4.2. Evaluation Criteria

Precision measures the proportion of detections produced by the model that correspond to actual labeled ships, and is expressed as:
$$P = \frac{TP}{TP + FP}$$
Here, TP is the number of correctly detected ships and FP is the number of false alarms, i.e., detections that do not correspond to any labeled ship. Recall R measures the proportion of labeled ships that are detected:
$$R = \frac{TP}{TP + FN}$$
FN is the number of labeled ships that are missed, i.e., misjudged as non-ship targets. The F1 score balances the importance of P and R:
$$F_1 = \frac{2 P R}{P + R}$$
AP measures the accuracy of the network in detecting ship targets across different thresholds; it depends on both the recall rate and the detection precision:
$$AP = \int_{0}^{1} P(R)\,\mathrm{d}R$$
As an extended indicator of AP, APS is mainly used to measure the detection accuracy of small-scale objects with a pixel ratio less than 30. AP50 represents the detection accuracy under the 0.5 IoU threshold; similarly, AP75 measures the detection accuracy under the 0.75 IoU threshold.
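The precision, recall, and F1 definitions above reduce to a few lines of code; the counts in the example are arbitrary toy values.

```python
def detection_metrics(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from matched detections, as defined above."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# e.g. 90 correct detections, 10 false alarms, 20 missed ships
print(detection_metrics(90, 10, 20))  # (0.9, 0.818..., 0.857...)
```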

5. Results

5.1. Quantitative Results

To evaluate the detection performance of FAWT-Net for ship targets, we applied the trained weights to the SSDD and LS-SSDD datasets, respectively, to obtain the final evaluation results. We compared seven commonly used networks on both datasets, as well as four networks specifically for remote sensing, PKINet, CRTransSar, FESAR and SAR-CNN. Table 2 and Table 3 show the comparison results, respectively.
Compared with the sub-optimal method, the test results of FAWT-Net on the SSDD dataset show that AP under different IoU thresholds has improved. In particular, the APS index for small target detection has increased by 0.8%, which further proves the advantage of our method in small target detection. From the Flops and Params parameters, it can be seen that this method has fewer parameters and requires less computing power. Especially compared with the SAR-CNN dedicated to SAR target detection, when the detection performance is comparable, this method occupies fewer hardware resources, which is conducive to further deployment on embedded platforms with low computing power in the future.
Table 3 mainly shows the test results on the LS-SSDD dataset. As a lightweight network, FAWT-Net has a higher recall rate R. This improvement is crucial when detecting small ship targets. Especially in most of the observation scenarios in LS-SSDD dataset, the proportion of land is much larger than that of ships, and for nearshore ship targets, there is often pixel overlap with islands. This kind of detection is quite difficult, and it is very likely to cause missed detections for other networks. The reduction in the missed detection rate indicates that this network is suitable for this kind of detection scenario.

5.2. Qualitative Results

To demonstrate the advantages of FAWT-Net in detecting SAR ships, we present the test results of the above networks on SSDD and LS-SSDD, respectively. As shown in Figure 7, the first row shows the actual ship annotations in each image; Figure 7b shows the results of FAWT-Net; the rest are the comparison results. Under noise interference, our method maintains a good recall rate: it extracts the pixel positions of ships from interference-laden images and prevents false alarms caused by large-scale clutter. In complex nearshore scenarios, our method still maintains good detection accuracy; even when ships coincide with the coastline, it can effectively extract the required ship targets from the coastal boundary.
Figure 8 compares the results of different methods on LS-SSDD. Similarly to the images above, the first row shows the ground truth annotations and the second row presents the detection results of our method. Different from the SSDD dataset, the LS-SSDD dataset focuses more on small targets; the scale of the ship targets is therefore smaller and the detection difficulty is higher. For example, Figure 8a shows the detection of a large number of ships in a nearshore scenario, where our method achieves a low missed-detection rate. Even for ships that largely coincide with the coastline, our network can identify them by extracting their contour characteristics. When detecting small ships, noise interference seriously affects detection performance, and small targets are likely to be submerged in the noise and missed. Our network reduces the impact of noise interference through multi-level filtering, ensuring the recall rate.
To further demonstrate the detection performance of this network under complex noise and high sea state conditions, we selected the images shown in Figure 9 for comparison. It can be seen that FAWT-Net can still maintain a low missed detection rate under conditions of more speckle noise and high sea states. At the same time, compared with other methods, it can also reduce the false alarm probability to a certain extent.

5.3. Ablation Experiment Results

To confirm the contribution of each module of FAWT-Net to ship recognition in SAR images, we carried out the following ablation experiments; the results are shown in Table 4. We measured the detection metrics after adding each module in turn. As the different modules are stacked, the detection accuracy improves steadily and the recall rate also increases. Since the premise for FAWT-Net to achieve effective filtering is that the network maintains a good feature-sampling effect, the more detailed features are obtained during downsampling, the more features can be retained after filtering. The role of the SPD module therefore cannot be ignored: after adding it, the network performance is further improved. It can be considered that, precisely because SPD retains more target information, the network’s attention to detailed features is enhanced after filtering even though SPD also retains the noise in the image. As a result, the overall recognition accuracy and recall rate of the network are improved.
To further illustrate the improvement effect of WH-IoU, we counted the aspect ratios of the object annotation bounding boxes in the SSDD dataset. The results are shown in Figure 10. We consider an object with an aspect ratio greater than 4 or less than 0.2 to be a slender object. We further selected the images containing such objects and conducted ablation experiments. The comparison results are shown in Table 5. It can be seen that WH-IoU has a relatively higher detection accuracy.
To verify the contribution of the HWT module, its effect is demonstrated through feature-flow visualization, as shown in Figure 11. Under heavy noise interference, the feature flow fed to the three detection heads after HWT upsampling is further refined; in particular, for the features fed to the deep network shown in the third column, the target positions become clearer and more intuitive. This balances the weights of the three detection heads when producing the detection results, enabling the deep network, which originally contributed little to small-target detection, to also participate.
We used the same comparison method to visualize the contribution of the FAM module. It can be seen that whether in images with noise interference or in complex scenarios, the noise-filtering capability of FAM can boost the attention of the feature flow within the network towards the real target. It is obvious from Figure 12c,d that after using FAM, the feature-enhancement part of the network is mainly concentrated on the target position. Even though a small amount of clutter interference can still be seen in the image after filtering, the intensity of these clutter signals is very weak and no longer affects the network’s attention to the target.

6. Discussion

The series of experiments described above served to validate the performance of FAWT-Net under interference and in complex environments. Taking into account the actual application scenarios, including spaceborne and airborne applications, SAR images are characterized by a wide coverage area and are frequently employed to acquire ship targets over large regions. From a practical application perspective, we conducted further tests on the segmentation efficacy of this network in wide-format SAR images. As depicted in Figure 13, when dealing with regions within the image where ships are relatively concentrated, this method is still capable of attaining a low missed-detection rate, thereby establishing a solid foundation for practical applications.
Verified on two commonly used public datasets for small-target detection, FAWT-Net outperforms other networks in detecting small ships. The main modules this method relies on are the FAM filtering module and the wavelet-transform-based upsampling module. The design of the FAM module takes into account the limitations of current SAR observation scenarios. Unlike optical imaging, SAR relies on actively emitting electromagnetic signals and receiving their echoes with dedicated equipment, so image quality depends more on the polarization mode of the electromagnetic signals and on the quality of the receiving equipment, which inevitably introduces more noise. Considering that most current networks focus on mining features at different depths of the network and pay less attention to the lateral feature flow, we take image filtering as the starting point and optimize the feature information at the level of the lateral feature flow. It has been verified that this method helps extract the edge contours and semantic information of ships from interference-laden SAR images. The HWT module mainly incorporates the idea, drawn from the wavelet transform, of extracting signals at various frequencies, enabling it to effectively integrate information with distinct characteristics within the image. This is mainly useful when different polarization modes in multi-polarization SAR data cause differences in how targets are represented; using this method allows the network to fuse the differences introduced by the various polarization modes and thereby improves its anti-interference performance. This paper mainly focuses on small-ship detection. It is worth considering applying the ideas of this research to multi-scale target recognition in the future. Additionally, more detailed data with rotated-box annotations can be incorporated, and the application of this method to ship pose estimation merits further exploration. Building on the design concept of this method, measures that enhance the receptive field of the network and the efficiency of feature fusion could further increase the accuracy and the application scope of the network.

7. Conclusions

Our work in this paper mainly consists of building a network dedicated to small-scale ship target detection under the Ultralytics framework. The work includes optimizing feature downsampling with the SPD module, enabling more features to be retained in the backbone network. Then, in the transmission stage from the backbone to the neck, we designed the FAM filtering module to filter out the coherent speckle noise interference unique to SAR images. In the upsampling stage of the FPN neck, the HWT module, designed by combining the wavelet transform with the Transformer concept, further optimizes the feature distribution during upsampling. Finally, the WH-IoU loss is used to better fit the target shape: starting from the height and width of the ship, it introduces the aspect-ratio weight into the bounding-box regression, enabling the final target boxes to better fit the ships. The results show that FAWT-Net is capable of handling the task of small-ship detection and has certain advantages in accuracy and recall rate.

Author Contributions

Conceptualization, Y.Z.; methodology, Y.Z.; software, Y.Z.; validation, Y.Z.; formal analysis, Y.Z.; investigation, Y.Z.; resources, Y.Z.; data curation, Y.Z.; writing—original draft preparation, Y.Z.; writing—review and editing, Z.S. and S.C.; visualization, Y.Z.; supervision, Z.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Acknowledgments

Thanks to Zifen Chen (Z.C.) for providing the experimental site and experimental equipment.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhang, M.; Ouyang, Y.; Yang, M.; Guo, J.; Li, Y. ORPSD: Outer Rectangular Projection-Based Representation for Oriented Ship Detection in SAR Images. Remote Sens. 2025, 17, 1511. [Google Scholar] [CrossRef]
  2. Xie, T.; Liu, M.; Zhang, M.; Qi, S.; Yang, J. Ship Detection Based on a Superpixel-Level CFAR Detector for SAR Imagery. Int. J. Remote Sens. 2022, 43, 3412–3428. [Google Scholar] [CrossRef]
  3. Gao, G.; Ouyang, K.; Luo, Y.; Liang, S.; Zhou, S. Scheme of Parameter Estimation for Generalized Gamma Distribution and Its Application to Ship Detection in SAR Images. IEEE Trans. Geosci. Remote Sens. 2017, 55, 1812–1832. [Google Scholar] [CrossRef]
  4. Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019), Seoul, Republic of Korea, 27 October–2 November 2019; IEEE: New York, NY, USA, 2019; pp. 9626–9635. [Google Scholar]
  5. Zhao, J.; Ding, Z.; Zhou, Y.; Zhu, H.; Du, W.-L.; Yao, R.; El Saddik, A. RQFormer: Rotated Query Transformer for End-to-End Oriented Object Detection. Expert Syst. Appl. 2025, 266, 126034. [Google Scholar] [CrossRef]
  6. Xu, X.; Zhang, X.; Zeng, T.; Shi, J.; Shao, Z.; Zhang, T. Group-Wise Feature Fusion R-CNN for Dual Polarization SAR Ship Detection. In Proceedings of the 2023 IEEE Radar Conference, Radarconf23, San Antonio, TX, USA, 1–5 May 2023; IEEE: New York, NY, USA, 2023. [Google Scholar]
  7. Shao, Z.; Zhang, X.; Xu, X.; Zeng, T.; Zhang, T.; Shi, J. CFAR-Guided Convolution Neural Network for Large Scale Scene SAR Ship Detection. In Proceedings of the 2023 IEEE Radar Conference, Radarconf23, San Antonio, TX, USA, 1–5 May 2023; IEEE: New York, NY, USA, 2023. [Google Scholar]
  8. Xu, X.; Zhang, X.; Zhang, T.; Zeng, T. Group-Wise Shuffle Attention R-CNN for Ship Detection in Dual-Polarization SAR Images. In Proceedings of the IGARSS 2023—2023 IEEE International Geoscience and Remote Sensing Symposium, Pasadena, CA, USA, 16–21 July 2023; IEEE: New York, NY, USA, 2023; pp. 6410–6413. [Google Scholar]
  9. Zhou, Z.; Chen, J.; Huang, Z.; Lv, J.; Song, J.; Luo, H.; Wu, B.; Li, Y.; Diniz, P.S.R. HRLE-SARDet: A Lightweight SAR Target Detection Algorithm Based on Hybrid Representation Learning Enhancement. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5203922. [Google Scholar] [CrossRef]
  10. Duan, L.; Lu, X.; Yang, J.; Yang, H.; Zhang, S.; Tong, G.; Tan, K.; Dai, Z.; Gu, H. Saturated Interference in SAR: Theoretical Analysis and Suppression Solutions. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2025, 18, 9244–9261. [Google Scholar] [CrossRef]
  11. Xiao, Z.; He, F.; Sun, Z.; Zhang, Z. Mitigation of Suppressive Interference in AMPC SAR Based on Digital Beamforming. Remote Sens. 2024, 16, 2812. [Google Scholar] [CrossRef]
  12. Singh, P.; Diwakar, M.; Shankar, A.; Shree, R.; Kumar, M. A Review on SAR Image and Its Despeckling. Arch. Comput. Method Eng. 2021, 28, 4633–4653. [Google Scholar] [CrossRef]
  13. Aksoy, G.; Nar, F. Multiplicative-Additive Despeckling in SAR Images. Turk. J. Electr. Eng. Comput. Sci. 2020, 28, 1871–1885. [Google Scholar] [CrossRef]
  14. de Medeiros, D.d.S.; Garcia, F.D.A.; Alves, D.I.; da Costa, R.F.; Machado, R.; Santos Filho, J.C.S. CA-CFAR Detection for SAR Systems Over Correlated Gamma-Distributed Clutter. IEEE Geosci. Remote Sens. Lett. 2024, 21, 8003605. [Google Scholar] [CrossRef]
  15. Huang, P.; Zou, Z.; Xia, X.-G.; Liu, X.; Liao, G. A Statistical Model Based on Modified Generalized-K Distribution for Sea Clutter. IEEE Geosci. Remote Sens. Lett. 2022, 19, 8015805. [Google Scholar] [CrossRef]
  16. Angelliaume, S.; Rosenberg, L.; Ritchie, M. Modeling the Amplitude Distribution of Radar Sea Clutter. Remote Sens. 2019, 11, 319. [Google Scholar] [CrossRef]
  17. Ritchie, M.A.; Woodbridge, K.; Stove, A.G. Analysis of Sea Clutter Distribution Variation with Doppler Using the Compound K-Distribution. In Proceedings of the 2010 IEEE Radar Conference, Arlington, VA, USA, 10–14 May 2010; IEEE: New York, NY, USA, 2010; pp. 495–499. [Google Scholar]
  18. Sun, Z.; Leng, X.; Zhang, X.; Zhou, Z.; Xiong, B.; Ji, K.; Kuang, G. Arbitrary-Direction SAR Ship Detection Method for Multiscale Imbalance. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5208921. [Google Scholar] [CrossRef]
  19. Sun, Z.; Leng, X.; Lei, Y.; Xiong, B.; Ji, K.; Kuang, G. BiFA-YOLO: A Novel YOLO-Based Method for Arbitrary-Oriented Ship Detection in High-Resolution SAR Images. Remote Sens. 2021, 13, 4209. [Google Scholar] [CrossRef]
  20. Guan, T.; Chang, S.; Deng, Y.; Xue, F.; Wang, C.; Jia, X. Oriented SAR Ship Detection Based on Edge Deformable Convolution and Point Set Representation. Remote Sens. 2025, 17, 1612. [Google Scholar] [CrossRef]
  21. Lei, S.; Qiu, X.; Ding, C.; Lei, S. A Feature Enhancement Method Based on the Sub-Aperture Decomposition for Rotating Frame Ship Detection in SAR Images. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 12–16 July 2021; IEEE: New York, NY, USA, 2021; pp. 3573–3576. [Google Scholar]
  22. Wu, F.; Hu, T.; Xia, Y.; Ma, B.; Sarwar, S.; Zhang, C. WDFA-YOLOX: A Wavelet-Driven and Feature-Enhanced Attention YOLOX Network for Ship Detection in SAR Images. Remote Sens. 2024, 16, 1760. [Google Scholar] [CrossRef]
  23. Cui, Z.; Li, Q.; Cao, Z.; Liu, N. Dense Attention Pyramid Networks for Multi-Scale Ship Detection in SAR Images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8983–8997. [Google Scholar] [CrossRef]
  24. Gu, D.; Xu, X. Multi-Feature Extraction of Ships From SAR Images. In Proceedings of the 2013 6th International Congress on Image and Signal Processing (CISP), Hangzhou, China, 16–18 December 2013; Yuan, Z., Wang, L., Xu, W., Yu, K., Eds.; IEEE: New York, NY, USA, 2013; Volume 1, pp. 454–458. [Google Scholar]
  25. Hou, B.; Chen, X.; Jiao, L. Multilayer CFAR Detection of Ship Targets in Very High Resolution SAR Images. IEEE Geosci. Remote Sens. Lett. 2015, 12, 811–815. [Google Scholar]
  26. Charalampidis, D. Target Detection Based on Multiresolution Fractal Analysis. Int. Soc. Opt. Photonics 2007, 6567, 65671B. [Google Scholar]
  27. Zhang, T.; Zhang, X.; Li, J.; Xu, X.; Wang, B.; Zhan, X.; Xu, Y.; Ke, X.; Zeng, T.; Su, H.; et al. SAR Ship Detection Dataset (SSDD): Official Release and Comprehensive Data Analysis. Remote Sens. 2021, 13, 3690. [Google Scholar] [CrossRef]
  28. Zhang, T.; Zhang, X.; Ke, X.; Zhan, X.; Shi, J.; Wei, S.; Pan, D.; Li, J.; Su, H.; Zhou, Y.; et al. LS-SSDD-v1.0: A Deep Learning Dataset Dedicated to Small Ship Detection from Large-Scale Sentinel-1 SAR Images. Remote Sens. 2020, 12, 2997. [Google Scholar] [CrossRef]
  29. Wei, S.; Zeng, X.; Qu, Q.; Wang, M.; Su, H.; Shi, J. HRSID: A High-Resolution SAR Images Dataset for Ship Detection and Instance Segmentation. IEEE Access 2020, 8, 120234–120254. [Google Scholar] [CrossRef]
  30. Wang, J.; Quan, S.; Xing, S.; Li, Y.; Wu, H.; Meng, W. PSO-Based Fine Polarimetric Decomposition for Ship Scattering Characterization. ISPRS J. Photogramm. Remote Sens. 2025, 220, 18–31. [Google Scholar] [CrossRef]
  31. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; IEEE: New York, NY, USA, 2015; pp. 1440–1448. [Google Scholar]
  32. Yan, X.; Chen, Z.; Xu, A.; Wang, X.; Liang, X.; Lin, L. Meta R-CNN: Towards General Solver for Instance-Level Low-Shot Learning. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019), Seoul, Republic of Korea, 27 October–2 November 2019; IEEE: New York, NY, USA, 2019; pp. 9576–9585. [Google Scholar]
  33. Qiao, L.; Zhao, Y.; Li, Z.; Qiu, X.; Wu, J.; Zhang, C. DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), Montreal, QC, Canada, 10–17 October 2021; IEEE: New York, NY, USA, 2021; pp. 8661–8670. [Google Scholar]
  34. Hu, R.; Lin, H.; Lu, Z.; Xia, J. Despeckling Representation for Data-Efficient SAR Ship Detection. IEEE Geosci. Remote Sens. Lett. 2025, 22, 4002005. [Google Scholar] [CrossRef]
  35. Tang, X.; Zhang, X.; Shi, J.; Wei, S. A Moving Target Detection Method Based on YOLO for Dual-Beam SAR. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 12–16 July 2021; IEEE: New York, NY, USA, 2021; pp. 5315–5318. [Google Scholar]
  36. Wu, S.; Zhang, L. Using Popular Object Detection Methods for Real Time Forest Fire Detection. In Proceedings of the 2018 11th International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, China, 8–9 December 2018; IEEE: New York, NY, USA, 2018; Volume 1, pp. 280–284. [Google Scholar]
  37. Xu, X.; Liang, W.; Zhao, J.; Gao, H. Tiny FCOS: A Lightweight Anchor-Free Object Detection Algorithm for Mobile Scenarios. Mobile Netw. Appl. 2021, 26, 2219–2229. [Google Scholar] [CrossRef]
  38. Liu, S.; Chi, J.; Wu, C. FCOS-Lite: An Efficient Anchor-Free Network for Real-Time Object Detection. In Proceedings of the 33rd Chinese Control and Decision Conference (CCDC 2021), Kunming, China, 22–24 May 2021; IEEE: New York, NY, USA, 2021; pp. 1519–1524. [Google Scholar]
  39. Zhu, M.; Hu, G.; Li, S.; Zhou, H.; Wang, S.; Feng, Z. A Novel Anchor-Free Method Based on FCOS plus ATSS for Ship Detection in SAR Images. Remote Sens. 2022, 14, 2034. [Google Scholar] [CrossRef]
  40. Wan, H.; Chen, J.; Huang, Z.; Xia, R.; Wu, B.; Sun, L.; Yao, B.; Liu, X.; Xing, M. AFSar: An Anchor-Free SAR Target Detection Algorithm Based on Multiscale Enhancement Representation Learning. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5219514. [Google Scholar] [CrossRef]
  41. Xia, R.; Chen, J.; Huang, Z.; Wan, H.; Wu, B.; Sun, L.; Yao, B.; Xiang, H.; Xing, M. CRTransSar: A Visual Transformer Based on Contextual Joint Representation Learning for SAR Ship Detection. Remote Sens. 2022, 14, 1488. [Google Scholar] [CrossRef]
  42. Chen, B.; Yu, C.; Zhao, S.; Song, H. An Anchor-Free Method Based on Transformers and Adaptive Features for Arbitrarily Oriented Ship Detection in SAR Images. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2024, 17, 2012–2028. [Google Scholar] [CrossRef]
  43. Chen, Y.; Xia, Z.; Liu, J.; Wu, C. TSDet: End-to-End Method with Transformer for SAR Ship Detection. In Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18–23 July 2022; IEEE: New York, NY, USA, 2022. [Google Scholar]
  44. Zhou, Z.; Cui, Z.; Zang, Z.; Meng, X.; Cao, Z.; Yang, J. UltraHi-PrNet: An Ultra-High Precision Deep Learning Network for Dense Multi-Scale Target Detection in SAR Images. Remote Sens. 2022, 14, 5596. [Google Scholar] [CrossRef]
  45. Qin, C.; Zhang, L.; Wang, X.; Li, G.; He, Y.; Liu, Y. RDB-DINO: An Improved End-to-End Transformer with Refined De-Noising and Boxes for Small-Scale Ship Detection in SAR Images. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5200517. [Google Scholar] [CrossRef]
  46. Zou, B.; Qin, J.; Zhang, L. Vehicle Detection Based on Semantic-Context Enhancement for High-Resolution SAR Images in Complex Background. IEEE Geosci. Remote Sens. Lett. 2022, 19, 4503905. [Google Scholar] [CrossRef]
  47. Zhang, M.; Li, Y.; Guo, J.; Li, Y.; Gao, X. BurgsVO: Burgs-Associated Vertex Offset Encoding Scheme for Detecting Rotated Ships in SAR Images. Remote Sens. 2025, 17, 388. [Google Scholar] [CrossRef]
  48. Alpers, W.; Bignami, F. Small-Scale and Sub-Mesoscale Phenomena Associated with Upwelling Studied by SAR. In Proceedings of the IGARSS 2020—2020 IEEE International Geoscience and Remote Sensing Symposium, Waikoloa, HI, USA, 26 September–2 October 2020; IEEE: New York, NY, USA, 2020; pp. 3537–3540. [Google Scholar]
  49. Alessi, M.A.; Chirico, P.G.; Sunder, S.; O’Pry, K.L. Detection and Monitoring of Small-Scale Diamond and Gold Mining Dredges Using Synthetic Aperture Radar on the Kadei (Sangha) River, Central African Republic. Remote Sens. 2023, 15, 913. [Google Scholar] [CrossRef]
  50. Zhang, H.; Wang, W.; Deng, J.; Guo, Y.; Liu, S.; Zhang, J. MASFF-Net: Multiazimuth Scattering Feature Fusion Network for SAR Target Recognition. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 19425–19440. [Google Scholar] [CrossRef]
  51. Deng, J.; Wang, W.; Zhang, H.; Zhang, T.; Zhang, J. PolSAR Ship Detection Based on Superpixel-Level Contrast Enhancement. IEEE Geosci. Remote Sens. Lett. 2024, 21, 4008805. [Google Scholar] [CrossRef]
  52. Zhao, J.; Yang, J.; Yuan, Z.; Lin, Q. A Novel Fusion Framework without Pooling for Noisy SAR Image Classification. In Proceedings of the 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Toronto, ON, Canada, 11–14 October 2020; IEEE: New York, NY, USA, 2020; pp. 3531–3536. [Google Scholar]
  53. Sunkara, R.; Luo, T. No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects. In Proceedings of the Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2022, Part III, Grenoble, France, 19–23 September 2022; Amini, M.R., Canu, S., Fischer, A., Guns, T., Novak, P.K., Tsoumakas, G., Eds.; Springer International Publishing Ag: Cham, Switzerland, 2023; Volume 13715, pp. 443–459. [Google Scholar]
  54. Guan, T.; Chang, S.; Wang, C.; Jia, X. SAR Small Ship Detection Based on Enhanced YOLO Network. Remote Sens. 2025, 17, 839. [Google Scholar] [CrossRef]
  55. Hu, S.; Gao, F.; Zhou, X.; Dong, J.; Du, Q. Hybrid Convolutional and Attention Network for Hyperspectral Image Denoising. IEEE Geosci. Remote Sens. Lett. 2024, 21, 5504005. [Google Scholar] [CrossRef]
  56. Liu, X.; Wu, Y.; Hu, X.; Li, Z.; Li, M. A Novel Lightweight Attention-Discarding Transformer for High-Resolution SAR Image Classification. IEEE Geosci. Remote Sens. Lett. 2023, 20, 4006405. [Google Scholar] [CrossRef]
  57. Wang, Z.; Wang, L.; Wang, W.; Tian, S.; Zhang, Z. WAFormer: Ship Detection in SAR Images Based on Window-Aware Swin-Transformer. In Proceedings of the Pattern Recognition and Computer Vision, Proceedings of the Chinese Conference, PRCV 2022, Pt III, Urumqi, China, 18–20 October 2024; Yu, S., Zhang, Z., Yuen, P.C., Han, J., Tan, T., Guo, Y., Lai, J., Zhang, J., Eds.; Springer International Publishing Ag: Cham, Switzerland, 2022; Volume 13536, pp. 524–536. [Google Scholar]
  58. Hou, X.; Han, M.; Gong, C.; Qian, X. SAR Complex Image Data Compression Based on Quadtree and Zerotree Coding in Discrete Wavelet Transform Domain: A Comparative Study. Neurocomputing 2015, 148, 561–568. [Google Scholar] [CrossRef]
  59. Kanagaraj, H.; Muneeswaran, V. Image Compression Using HAAR Discrete Wavelet Transform. In Proceedings of the 2020 5th International Conference on Devices, Circuits and Systems (ICDCS’ 20), Coimbatore, India, 5–6 March 2020; IEEE: New York, NY, USA, 2020; pp. 271–274. [Google Scholar]
  60. Li, W.; Guo, H.; Liu, X.; Liang, K.; Hu, J.; Ma, Z.; Guo, J. Efficient Face Super-Resolution via Wavelet-Based Feature Enhancement Network. In Proceedings of the 32nd ACM International Conference on Multimedia, Melbourne, Australia, 28 October–1 November 2024. [Google Scholar]
  61. Zhang, H.; Zhang, S. Shape-IoU: More Accurate Metric Considering Bounding Box Shape and Scale. arXiv 2024, arXiv:2312.17663. [Google Scholar]
Figure 1. The overall structure of our proposed FAWT-Net.
Figure 2. Diagram of sub-image feature splicing downsampling (SPD).
Figure 3. The detailed structure of FAM.
Figure 4. The detailed structure of HWT.
Figure 5. The collaborative effect of WH-IoU and the detection head. The blue boxes indicate the identified ship targets.
Figure 6. The statistical results of target scales in the SSDD data, where blue points represent small-scale targets.
Figure 7. Visualization results on the SSDD dataset: (a) Ground Truth; (b) Ours; (c) Faster RCNN; (d) PKINet; (e) Yolov8m; (f) CenterNet. The blue and red boxes indicate the identified ship targets, green circles indicate missed detections, and yellow boxes indicate false alarms.
Figure 8. Visualization results on the LS-SSDD dataset: (a) Ground Truth; (b) Ours; (c) Faster RCNN; (d) PKINet; (e) Yolov8m; (f) CenterNet. The blue and red boxes indicate the identified ship targets, green circles indicate missed detections, and yellow boxes indicate false alarms.
Figure 9. Visualization results under interference: (a) Ground Truth; (b) Ours; (c) Faster RCNN; (d) PKINet; (e) CenterNet; (f) Yolov8m. The blue and red boxes indicate the identified ship targets, green circles indicate missed detections, and yellow boxes indicate false alarms.
Figure 10. Statistics of the aspect ratio of the target box.
Figure 11. Feature visualization of the ablation experiment of the HWT module: (a,b) Network input images; (c,e) Feature flow without HWT; (d,f) Feature flow with HWT. The red boxes highlight the features optimized after adding the module, and the blue boxes indicate the identified ship targets.
Figure 12. Feature visualization of the ablation experiment of the FAM module: (a,b) Network input images; (c,e) Feature flow without FAM; (d,f) Feature flow with FAM. The red boxes highlight the features optimized after adding the module, and the blue boxes indicate the identified ship targets.
Figure 13. Detection results of wide-format images. The red boxes highlight the enlarged image regions, while the blue boxes indicate the identified ship targets.
Table 1. Comparison of relevant information of the two experimental datasets.

Dataset | Size (pixel) | Image Numbers | Resolution (m) | Polarization Mode
SSDD | 390 × 205–600 × 500 | 1160 | 1–15 | VV VH VH HV
LS-SSDD | 800 × 800 (sub-images); 24,000 × 16,000 (wide-format) | 9064 sub-images; 15 wide-format images | 0.5–15 |
Table 2. Comparative experiments on the SSDD dataset.

Method | AP | AP50 | AP75 | APS | Flops (G) | Params (M) | FPS
Faster RCNN | 66.3 | 96.2 | 76.5 | 65.3 | 105.4 | 43.9 | 7.48
Yolov8m | 55.3 | 92.5 | 63.5 | 53.7 | 10.2 | 2.8 | 25.12
Centernet | 66.7 | 96.1 | 78.9 | 61.5 | - | - | -
Solov2 | 39.7 | 72.4 | 43.2 | 32.1 | - | - | -
HRNet | 59.0 | 90.6 | 69.2 | 56.5 | - | - | -
PKINet | 56.9 | 92.9 | 71.7 | 56.7 | 12.9 | 4.21 | 21.18
HTC | 60.5 | 92.7 | 69.1 | 56.3 | - | - | 5.52
CRTransSar | - | 97.0 | 76.2 | - | - | - | -
FESAR | 64.8 | 96.7 | 77.1 | 62.0 | 46.8 | 3.5 | -
SAR-CNN | 63.2 | 92.1 | 75.2 | 63.8 | 101.9 | - | -
Ours | 67.6 | 97.5 | 80.1 | 66.1 | 9.0 | 3.16 | 26.52
Table 3. Comparative experiments on the LS-SSDD dataset.

Method | AP | AP50 | AP75 | R | Flops (G) | Params (M) | FPS
Faster RCNN | 26.3 | 73.1 | 50.3 | 68.9 | 105.4 | 43.9 | -
Yolov8m | 21.6 | 68.4 | 48.8 | 74.0 | 10.2 | 2.8 | 19.10
Centernet | 26.9 | 72.5 | 55.4 | 65.6 | - | - | -
Solov2 | 16.3 | 47.1 | 27.2 | 35.2 | - | - | -
HRNet | 20.5 | 63.6 | 40.5 | 63.7 | - | - | -
PKINet | 27.1 | 73.3 | 53.9 | 72.1 | 12.9 | 4.21 | 18.47
HTC | 22.3 | 66.5 | 47.6 | 67.9 | - | - | -
FESAR | - | 72.0 | - | 60.4 | 46.8 | 3.5 | -
Ours | 29.8 | 74.7 | 56.1 | 76.2 | 9.0 | 3.16 | 20.28
Table 4. Ablation experiments on the SSDD dataset. The hyphen represents not adding the corresponding module, and √ represents adding the corresponding module.

SPD / FAM / HWT / WH-IoU | P | R | AP50 | F1
- - - | 87.8 | 81.7 | 91.4 | 0.85
- - - | 88.5 | 82.5 | 91.2 | 0.85
- - | 91.9 | 83.7 | 93.6 | 0.88
- | 93.7 | 84.2 | 95.1 | 0.89
√ √ √ √ | 94.8 | 86.2 | 97.5 | 0.90
Table 5. Ablation experiments for slender objects. × and √ indicate that the WH-IoU module is not used and used, respectively.

WH-IoU | P | R | AP50 | F1
× | 90.2 | 83.7 | 93.1 | 0.87
√ | 92.9 | 84.5 | 95.6 | 0.89
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
