A CFAR-Enhanced Ship Detector for SAR Images Based on YOLOv5s

Abstract: Ship detection and recognition in Synthetic Aperture Radar (SAR) images are crucial for maritime surveillance and traffic management. The limited availability of high-quality datasets hinders in-depth exploration of ship features in complex SAR images. Most existing SAR ship research is based on Convolutional Neural Networks (CNNs); although deep learning has advanced SAR image interpretation, it often prioritizes recognition over computational efficiency and underutilizes the prior information in SAR images. This paper therefore proposes a YOLOv5s-based ship detection method for SAR images. First, we adopt the lightweight YOLOv5s model as the baseline. Second, we introduce a sub-net into YOLOv5s that learns traditional Constant False Alarm Rate (CFAR) features to augment the ship feature representation. Additionally, we incorporate frequency-domain information into the channel attention mechanism to further improve detection. Extensive experiments on the Ship Recognition and Detection Dataset (SRSDDv1.0) in complex SAR scenarios show that our method achieves 68.04% detection accuracy and 60.25% recall with a compact 18.51 M model size. Our network surpasses peer methods in mAP, F1 score, model size, and inference speed, and it is robust across diverse complex scenes.


Introduction
Synthetic Aperture Radar (SAR) leverages the synthetic aperture principle to achieve high-resolution microwave imaging, offering all-weather capability, high resolution, and extensive coverage. Unlike optical remote sensing, SAR can observe the Earth's surface regardless of weather conditions. Ships are significant subjects in remote sensing images and play a crucial role in applications including military surveillance, combating illegal resource exploitation, and waterway management [1][2][3]. Ship detection and identification from SAR imagery has therefore become a prominent research focus.
Because remote sensing images are acquired from a top-down view, have large dimensions, and contain highly complex scenes, strong clutter from rough sea surfaces interferes with SAR ship detection and significantly degrades its performance [4]. Various methods have been proposed for SAR ship target detection, including traditional approaches such as CFAR, template matching, and trailing edge detection. These methods often rely on manually designed features and generalize poorly. Among them, CFAR algorithms, known for their adaptability and false alarm reduction, are the most widely applied in ship detection. The algorithm was initially introduced by An et al. [5] and has its roots in ship detection work at the Ottawa Defense Research Center in 2001 [6]. Researchers have since improved CFAR detection from various perspectives, resulting in numerous CFAR-based detection algorithms. However, given the rising volume of satellite remote sensing data and the escalating complexity of SAR scenes, traditional algorithms relying on manually designed features, such as CFAR operators, can no longer meet the demands for detection speed and accuracy.
To address the limitations of traditional methods in complex scenarios, deep learning-based approaches have emerged and achieved remarkable success. Notably, methods based on Convolutional Neural Networks (CNNs) have demonstrated significant potential in computer vision tasks, giving rise to precise and robust SAR ship target detection techniques. Compared with traditional model-driven approaches, deep learning-based methods offer high automation, fast processing speed, and strong model transferability [7]. According to network architecture, deep learning-based ship detection methods can be categorized into single-stage and two-stage methods. Single-stage methods transform target localization in a frame into a regression task, eliminating the need for target proposal generation. Two-stage methods first generate a series of target proposals and subsequently determine the presence of ship targets. In general, single-stage algorithms tend to be less accurate than two-stage algorithms, but they often feature lighter networks and faster detection speeds.
CNN-based SAR ship detection models have achieved significant performance improvements [8], but they have largely abandoned traditional handcrafted features. We believe that blindly abandoning these features is unwise: they are functionally elegant and, with fine-tuning, remain useful across multiple sensors and diverse scenes. Their interpretability also provides decision transparency for SAR target recognition, which helps mitigate decision risks and gain user trust in high-risk applications such as military reconnaissance and precision strikes. Moreover, despite the success of CNNs in computer vision, SAR ship detection remains difficult because SAR images exhibit rich noise and complex backgrounds, while ship targets are often densely distributed, varied in size, and arbitrarily oriented. The "black box" nature of neural networks limits their reliability and credibility, particularly in high-risk scenarios where user understanding of and trust in decisions are crucial. Additionally, most deep learning methods use only the spatial domain information of SAR images and neglect frequency domain information, which degrades detection performance in sea clutter scenarios [9]. Consequently, despite the encouraging results of current deep learning-based SAR ship detection methods, there remains substantial room for improvement in detection accuracy and efficiency.
To enhance ship target detection in complex SAR scenes, this paper proposes a rotated ship detection method based on the YOLOv5s [10] framework. The method addresses challenges such as ship target rotation, noise, and signal interference in SAR images while maintaining detection speed. Our approach combines deep learning with traditional handcrafted features by introducing a sub-net into YOLOv5s that learns CFAR features, thereby enhancing the representation of ship targets. Additionally, we incorporate frequency domain information into the channel attention mechanism using FcaNet [11] to further improve ship detection performance. We also enable detection of ship targets in arbitrary orientations, which improves detection precision. We conducted extensive experiments on a publicly available SAR ship detection dataset, and the results demonstrate significant improvements in accuracy and recall compared to existing methods. Our method is robust in various complex scenarios. These findings suggest the potential practical value of our approach in SAR ship detection.
In summary, the main contributions of this paper are as follows:
(1) Aiming at the complex backgrounds encountered in SAR images, we propose an end-to-end network structure based on a single-stage object detection algorithm. This network achieves high ship detection accuracy while maintaining a fast speed. We incorporate handcrafted feature extraction and attention mechanisms into the network, ensuring the effectiveness of ship detection in SAR images.
(2) We design a sub-net that supervises feature extraction in the main network, helping our model learn more handcrafted features and highlighting the differences between ships and backgrounds, thereby overcoming the challenges of ship detection in complex backgrounds.
The remaining sections of this paper are organized as follows. Section 2 introduces related work. The method is described in Section 3. Experimental results and ablation studies are presented in Section 4. Finally, Section 5 summarizes the paper.

Handcraft Feature-Based Methods
Handcraft feature-based methods can be classified into three categories: using polarization features, geometric features, and backscatter features.
Methods utilizing polarization characteristics [12][13][14] distinguish targets from the background by exploiting the scattering difference between ships and sea clutter. However, this approach requires accurate scattering models, which poses significant challenges. It is also highly sensitive to the detection environment and prone to interference from sea clutter, leading to severe performance degradation.
Geometric feature-based detection methods [15] rely on manually designed descriptors of the shape, size, texture, and other characteristics of ship targets, from which a template library is constructed for template matching and ship detection. However, this approach involves pixel-wise matching between the entire SAR image and the designed templates, resulting in high computational costs and slow detection speed. Furthermore, its performance relies heavily on the quality of the designed templates, making it expensive and highly dependent on expert experience, and ultimately not robust. The presence of sea clutter also greatly affects the accuracy of matching between templates and ship targets.
Methods based on backscatter features widely employ CFAR detection. These algorithms are reliable across diverse environments and applications and require minimal human intervention. The CFAR algorithm was initially introduced by An et al. [5] and has its roots in ship detection work at the Ottawa Defense Research Center in 2001 [6]. Researchers have since improved CFAR detection from various perspectives, resulting in numerous CFAR-based detection algorithms. Leng et al. [16] introduced a bilateral CFAR algorithm that combines SAR image intensity and spatial distribution, reducing the blurring caused by the SAR platform and sea clutter. In simple scenarios, CFAR methods yield favorable results. However, for small ships and complex maritime scenes, modeling challenges result in higher false alarm rates and inferior detection performance. With the increasing volume of satellite remote sensing data and the growing complexity of SAR scenes, traditional ship detection methods based on handcrafted features can only leverage simple low-level features of SAR images. They lack generalization, which makes them unsuitable for complex SAR image detection tasks.

Deep Learning-Based Methods
To address the challenge of insufficient generalization, deep learning-based methods have emerged and achieved significant success. In particular, methods based on Convolutional Neural Networks (CNNs) have shown substantial potential in computer vision tasks, thereby advancing the precise and robust detection of SAR ship targets. Compared to traditional model-driven approaches, deep learning-based methods offer high automation, rapid processing speed, and strong model transferability. Based on network architecture, deep learning-based ship detection methods can be categorized into two-stage and one-stage detection methods.
Two-stage ship detection methods generate target proposals before confirming ship presence, involving classification, regression, and segmentation tasks with a connected segmentation sub-network, which enhances accuracy. R-CNN [17] paved the way, followed by Fast R-CNN [18] in 2015 and Faster R-CNN [19] in the same year. In 2016, Dai et al. introduced the region-based fully convolutional network (R-FCN) [20] with position-sensitive RoI pooling.
Despite these advantages, two-stage methods are burdened by proposal generation and multiple tasks, which lead to complexity and computational overhead. In response, single-stage methods such as SSD [21] emerged; they detect targets directly from densely sampled anchor points, using uniform dense sampling over multiple aspect ratios and scales. The YOLO series [10,22,23] exemplifies classic single-stage algorithms and has consistently improved in performance.
Deep learning-based SAR ship detectors, whether two-stage or single-stage, often feature large models and deep architectures [24]. Zhang et al. [25] proposed a lightweight SAR ship detector, "ShipDeNet-20", which is several tens to even hundreds of times lighter than other detectors. This contributes to real-time SAR applications and future hardware implementations. Despite these advances, existing methods predominantly use spatial domain information and neglect the potential of frequency domain information. In addition, Liu et al. [26] highlighted challenges in feature extraction from the horizontal region of interest (HRoI) in remote sensing images. While these methods show promise, there is room for improvement in accuracy and efficiency.

Fusion-Based Methods
In recent years, the integration of handcrafted traditional features with deep learning has become a crucial research direction [27,28]. In SAR ship detection, three primary methods can be distinguished by their fusion approach: the two-stage fusion method, the direct embedding method, and the multi-branch fusion method.
The two-stage fusion method employs a two-stage processing pipeline that combines traditional operators with neural networks and has proven effective [29]. In an initial preprocessing stage, traditional operators such as CFAR are applied to SAR images for feature extraction. However, this method may lose information in complex electromagnetic scattering scenarios. It also introduces computational overhead, which limits the feasibility of real-time applications.
The direct embedding method embeds traditional features directly into neural networks, either at the input layer or at intermediate layers. This method offers advantages but also presents challenges. For instance, introducing traditional features at the input layer, as demonstrated by MSRIHL-CNN [30], enhances the combination of low-level texture and deep features but may face difficulties in complex pattern recognition of high-level features. Embedding traditional features into intermediate layers, as shown by HOG-ShipCLSNet [31], requires a delicate balance between traditional and deep features for optimal performance.
The multi-branch fusion method designs a multi-branch structure to incorporate traditional features into the network. Such methods aim to fuse traditional and deep learning features organically, with each branch focusing on a different type of feature, which allows the network to capture target features comprehensively. However, there is still room for improvement. Practices like the collaborative tasks in MTL-Det [32] may introduce complexity during training and require a substantial amount of labeled data. Similarly, the feature fusion method used in a single-stage ship detection network [33], although effective, could be further optimized to reduce computational demands and enhance real-time applicability.
To balance the speed and accuracy of ship detection in SAR images, this study builds on the YOLOv5s single-stage network. At the same time, to harness the comprehensive feature-capturing capability of the multi-branch approach, a sub-net is attached to the backbone network to integrate traditional CFAR features into the deep learning method.

Method
In this section, we provide a detailed exposition of the architecture of the proposed network designed for achieving fast and accurate ship detection in SAR images.

The Overall Framework
Figure 1 illustrates the overall structure of the proposed method. We chose the YOLOv5s model as the base network. Unlike other detection networks in the YOLO series, YOLOv5 has not been formally introduced in the literature. Nevertheless, its detection speed is notably faster than that of previous versions, and it performs robustly on multi-scale and small objects. The model comes in four versions (s, m, l, and x), of which the s version is the most commonly used; its relatively concise architecture allows faster execution and inference while maintaining accuracy. Our network consists of three main components: the backbone network, the CFAR feature constraint sub-net (CFAR-FCN), and the FcaNet Channel Attention Module (Fca-Neck).
First, the input image is resized to an appropriate size (e.g., 512). We employ the backbone of YOLOv5 as the main structure of the overall network and incorporate the FcaNet bottleneck into the neck to enhance low-frequency domain features. Furthermore, we establish a sub-net mechanism to constrain and incentivize feature learning in the backbone network. In particular, we treat the backbone of the main network as the "contracting path", where feature map sizes decrease layer by layer while channel counts increase. We then connect it to an "expanding path" consisting of multiple deconvolutional layers that output feature maps, which are supervised using CFAR feature maps. CFAR supervision allows us to obtain more meaningful semantic information from upsampled feature maps and finer-grained information from early traditional feature maps.
Our network predicts bounding boxes at three different scales. After feature extraction by the backbone network, we further extract features from the fused feature map using Fca-Neck with the channel attention mechanism. Finally, multiple convolutional layers are connected after Fca-Neck in each branch, and the three output scales are used to predict rotated boxes, including the center point, height, width, and rotation angle. In this method, we treat angle prediction as a classification output with 180 categories. Therefore, for four bounding box offsets, one object confidence, six categories, and 180 angles, the predicted tensor size is N × M × [3 × (4 + 1 + 6 + 180)], where N and M represent the height and width of the tensor, respectively. The final detection results are obtained after applying Non-Maximum Suppression (NMS) to the predicted boxes.
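The size of the prediction tensor can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the strides of 8, 16, and 32 are an assumption based on common YOLOv5 configurations rather than a value stated in this paper.

```python
def head_channels(num_anchors=3, box_params=4, num_classes=6, num_angles=180):
    """Channels per grid cell: anchors x (box offsets + objectness + classes + angle bins)."""
    per_anchor = box_params + 1 + num_classes + num_angles
    return num_anchors * per_anchor


def output_shapes(input_size=512, strides=(8, 16, 32)):
    """(N, M, channels) for each prediction scale.

    The strides are an assumption (typical YOLOv5 values), not taken from the paper.
    """
    channels = head_channels()
    return [(input_size // s, input_size // s, channels) for s in strides]


if __name__ == "__main__":
    print(head_channels())   # 3 * (4 + 1 + 6 + 180)
    print(output_shapes())
```

Under these assumptions, each grid cell carries 3 × 191 = 573 values, most of which come from the 180 angle bins.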

CFAR-FCN
Ship target detection in SAR images primarily aims for high detection rates and low false alarm rates. The CFAR algorithm detects targets by comparing pixel grayscale values with a threshold computed within a specific region, effectively representing the electromagnetic scattering characteristics of targets in image form and providing rich target information. In this paper, we construct a multi-task network structure with two branches. One serves as the main branch for ship target detection and regression, while the other acts as a sub-net that constrains and supervises the backbone network, enabling it to learn the spatial features of ship targets present in CFAR feature maps. The CFAR-FCN is not involved in inference.
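To make the thresholding idea concrete, the following is a minimal sketch of a cell-averaging CFAR (CA-CFAR) detector in plain Python. It is not necessarily the CFAR variant used in this paper: the window sizes, the false alarm probability, and the exponential clutter model behind the threshold factor are all assumptions chosen for illustration.

```python
def ca_cfar(image, guard=1, train=2, pfa=1e-2):
    """Cell-averaging CFAR on a 2D list of non-negative intensities.

    guard: half-width of the guard window around the cell under test.
    train: thickness of the training ring outside the guard window.
    pfa:   desired false alarm probability (assumed exponential clutter).
    Returns a binary map of the same size (1 = detection).
    """
    h, w = len(image), len(image[0])
    outer = guard + train
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total, count = 0.0, 0
            for rr in range(max(0, r - outer), min(h, r + outer + 1)):
                for cc in range(max(0, c - outer), min(w, c + outer + 1)):
                    # Skip the guard window, which includes the cell under test.
                    if abs(rr - r) <= guard and abs(cc - c) <= guard:
                        continue
                    total += image[rr][cc]
                    count += 1
            if count == 0:
                continue
            # Threshold factor for the CA-CFAR with `count` training cells.
            alpha = count * (pfa ** (-1.0 / count) - 1.0)
            if image[r][c] > alpha * total / count:
                out[r][c] = 1
    return out


if __name__ == "__main__":
    # Flat clutter with one bright point target in the middle.
    img = [[1.0] * 9 for _ in range(9)]
    img[4][4] = 50.0
    for row in ca_cfar(img):
        print(row)
```

The binary map produced this way plays the role of a CFAR feature map: it is computed without any learned parameters and can serve as a supervision target for a learned branch.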
As a typical image segmentation network, U-Net [34] has been widely applied to water body segmentation in SAR images in recent years. Given the cost and constraints of SAR image acquisition, obtaining a large-scale dataset is relatively challenging. Through skip connections, U-Net propagates shallow feature information to deeper layers and merges the relevant features, which makes the model more suitable for segmentation tasks on small-sample SAR datasets. Therefore, we follow the overall architecture of U-Net and construct a sub-net for enhancing ship target features, as shown in Figure 1.
This sub-net consists of a contracting path and an expanding path. The contracting path corresponds to the backbone network and follows the typical architecture of a convolutional network. Each block in the expanding path includes a 2 × 2 convolution ("deconvolution") that halves the number of feature channels. It is connected with the corresponding module in the contracting path through convolution, followed by an activation function. In the final layer, a 1 × 1 2D convolution reduces the number of output channels to match the number of channels in the feature map. Ultimately, this branch outputs a feature map of size 512 × 512 × 3 (taking an input image size of 512 as an example). The CFAR operator is applied to each SAR image, and the filtered results form the CFAR feature maps. During training, the loss is the Euclidean distance between the CFAR feature maps and the output of this branch, and it is backpropagated to update the model parameters. Through the gradients of this branch, the backbone network learns CFAR features, and the branch itself becomes able to simulate CFAR feature maps.
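The shape bookkeeping of the expanding path and the supervision loss can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the starting feature map of 16 × 16 × 512 (stride 32 on a 512 input) and the five upsampling blocks are assumptions chosen so that the final output matches the 512 × 512 × 3 size given above, and both function names are hypothetical.

```python
import math


def expanding_path_shapes(size=16, channels=512, num_blocks=5, out_channels=3):
    """Trace (height, width, channels) through the expanding path.

    Each block doubles the spatial size and halves the channel count;
    a final 1 x 1 convolution maps to `out_channels`.
    The starting size and channel count are assumptions.
    """
    shapes = [(size, size, channels)]
    for _ in range(num_blocks):
        size *= 2
        channels //= 2
        shapes.append((size, size, channels))
    shapes.append((size, size, out_channels))
    return shapes


def euclidean_loss(pred, target):
    """Euclidean (L2) distance between two equally shaped 2D maps."""
    squared = 0.0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            squared += (p - t) ** 2
    return math.sqrt(squared)


if __name__ == "__main__":
    for shape in expanding_path_shapes():
        print(shape)
    print(euclidean_loss([[0.0, 0.0], [0.0, 0.0]], [[3.0, 0.0], [0.0, 4.0]]))
```

Because the branch is only needed to produce gradients for the backbone, it can be dropped at inference time without changing the detector's outputs, which is consistent with the statement that the CFAR-FCN is not involved in inference.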

Fca-Neck
Unlike conventional optical images, the imaging of objects in SAR images depends solely on their radar signal reflectivity, resulting in relatively monotonous information content, low resolution, and a strong correlation with the image signal-to-noise ratio. SAR images also exhibit strong speckle noise, which interferes with target identification and detection. Furthermore, attention mechanisms in typical neural networks tend to overlook the extraction and use of frequency domain information. We therefore attempt to exploit the frequency domain features of SAR images.
We use the two-dimensional Discrete Cosine Transform (2D-DCT) as weights to aggregate frequency information within an attention mechanism. This yields the frequency band with the highest energy aggregation, which helps refine dense multi-target feature maps, reducing false alarms and enhancing the accuracy of dense multi-target detection.
The typical basis functions of the two-dimensional (2D) DCT [35] are
$$B_{h,w}^{i,j} = \cos\left(\frac{\pi h}{H}\left(i + \frac{1}{2}\right)\right) \cos\left(\frac{\pi w}{W}\left(j + \frac{1}{2}\right)\right),$$
and the 2D-DCT can be represented as
$$f_{h,w}^{2d} = \sum_{i=0}^{H-1} \sum_{j=0}^{W-1} x_{i,j}^{2d} B_{h,w}^{i,j}, \quad h \in \{0, 1, \ldots, H-1\}, \; w \in \{0, 1, \ldots, W-1\},$$
where $f^{2d} \in \mathbb{R}^{H \times W}$ is the 2D-DCT spectrum, $x^{2d} \in \mathbb{R}^{H \times W}$ is the input, $H$ is the height of $x^{2d}$, and $W$ is the width of $x^{2d}$.
Channel attention mechanisms are widely used in CNNs. They employ scalars to represent and assess the importance of each channel. Let $X \in \mathbb{R}^{C \times H \times W}$ be the image feature tensor in the network, where $C$ is the number of channels, $H$ is the height of the feature, and $W$ is the width of the feature. Computing the scalar in channel attention is often seen as a compression problem, because a single scalar must represent the entire channel. Thus, the attention mechanism can be written as
$$att = \mathrm{sigmoid}\left(f_c\left(\mathrm{compress}(X)\right)\right),$$
where $att \in \mathbb{R}^{C}$ is the attention vector, sigmoid is the sigmoid function, $f_c$ is a mapping function such as a fully connected layer or 1D convolution, and $\mathrm{compress}: \mathbb{R}^{C \times H \times W} \to \mathbb{R}^{C}$ is a compression method. After the attention vector for all $C$ channels is obtained, each channel of the input $X$ is scaled by the corresponding attention value,
$$\tilde{X}_{:,i,:,:} = att_i X_{:,i,:,:}, \quad i \in \{0, 1, \ldots, C-1\},$$
where $\tilde{X}$ is the output of the attention mechanism, $att_i$ is the $i$th element of the attention vector, and $X_{:,i,:,:}$ is the $i$th input channel.
In this paper, we employ FcaNet [11] for weight allocation, as detailed in Figure 1. In contrast to the global average pooling (GAP) used in typical networks, FcaNet extends GAP to the 2D discrete cosine transform (2D-DCT). In FcaNet, GAP is considered a special case of the 2D-DCT, with its result being directly proportional to the lowest-frequency component of the 2D-DCT. Therefore, FcaNet enables more efficient aggregation of frequency domain information.
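The relation between GAP and the lowest-frequency DCT component can be verified numerically. The sketch below is a simplified illustration in plain Python, not the FcaNet implementation: the mapping $f_c$ is replaced by the identity (an assumption made to keep the example short), and all function names are hypothetical.

```python
import math


def dct_basis(h, w, height, width):
    """2D-DCT basis B[i][j] for frequency index (h, w)."""
    return [
        [
            math.cos(math.pi * h / height * (i + 0.5))
            * math.cos(math.pi * w / width * (j + 0.5))
            for j in range(width)
        ]
        for i in range(height)
    ]


def dct_component(x, h, w):
    """Single 2D-DCT coefficient f[h][w] of a 2D map x (list of lists)."""
    height, width = len(x), len(x[0])
    basis = dct_basis(h, w, height, width)
    return sum(x[i][j] * basis[i][j] for i in range(height) for j in range(width))


def sigmoid(z):
    """Numerically stable sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def channel_attention(channels, freqs):
    """Scale each channel by sigmoid(DCT coefficient at its assigned frequency).

    channels: list of 2D maps, one per channel.
    freqs:    one (h, w) frequency index per channel.
    The learned mapping f_c is replaced by the identity here (assumption).
    """
    att = [sigmoid(dct_component(x, h, w)) for x, (h, w) in zip(channels, freqs)]
    scaled = [[[a * v for v in row] for row in x] for a, x in zip(att, channels)]
    return att, scaled


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]
    gap = sum(sum(row) for row in x) / 4
    # The (0, 0) coefficient equals H * W * GAP.
    print(dct_component(x, 0, 0), 4 * gap)
```

With frequency index (0, 0) every basis value equals 1, so the coefficient is simply the sum of the map, that is, H × W times the GAP value; other indices weight the map by cosine patterns and capture higher-frequency content.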

Loss Function
This network employs a multi-task loss to optimize the supervisory branch and the final prediction network. The loss function consists of three parts: the CFAR-FCN branch loss, the original YOLO loss function, and the angle classification prediction loss. The overall loss is
$$L = \lambda_1 L_{box} + \lambda_2 L_{obj} + \lambda_3 L_{cls} + \lambda_4 L_{CFAR} + \lambda_5 L_{angle},$$
where $L_{box}$, $L_{obj}$, $L_{cls}$, $L_{CFAR}$, and $L_{angle}$ represent the detection box regression loss, the confidence loss, the classification loss, the CFAR branch loss, and the angle classification loss, respectively. $\lambda_n$ are the corresponding weight coefficients that control the importance of these losses, with a default setting of 1.
For the regression of the detection box parameters, the complete-IoU loss (CIoU loss) [36] is used:
$$L_{box} = \sum_{i=1}^{N_p} \left(1 - \mathrm{CIoU}\left(P_{box}, T_{box}\right)\right),$$
where $N_p$ represents the number of prediction layers (with a default value of 3), $P_{box} \in \mathbb{R}^{N_t \times (x_c, y_c, w, h)}$ represents the bounding boxes predicted by the model, $T_{box} \in \mathbb{R}^{N_t \times (x_c, y_c, w, h)}$ represents the corresponding ground truth bounding boxes, and $N_t$ represents the number of ship targets.
The losses $L_{obj}$, $L_{cls}$, and $L_{angle}$ are computed using the binary cross-entropy (BCE) with logits loss, BCEWithLogits, denoted $L_{BCEL}$ and defined for a prediction $x$ and a target $y$ as
$$L_{BCEL}(x, y) = -\left[y \log \sigma(x) + (1 - y) \log\left(1 - \sigma(x)\right)\right],$$
where $\sigma$ is the sigmoid function. The specific formulas for $L_{obj}$, $L_{cls}$, and $L_{angle}$ are as follows:
$$L_{obj} = L_{BCEL}\left(P_{obj}, T_{obj}\right), \quad L_{cls} = L_{BCEL}\left(P_{cls}, T_{cls}\right), \quad L_{angle} = L_{BCEL}\left(P_{\theta}, T_{\theta}\right),$$
where $P_{obj} \in \mathbb{R}^{N_p \times W_i \times H_i}$ represents the predicted offset vector, $T_{obj} \in \mathbb{R}^{N_p \times W_i \times H_i}$ is the ground truth vector, $W_i$ $(i = 1, 2, 3)$ is the width of the predicted layer feature map, and $H_i$ $(i = 1, 2, 3)$ is the height of the predicted layer feature map. $P_{cls} \in \mathbb{R}^{N_t \times N_c}$ represents the probability distribution of the class predictions, $T_{cls} \in \mathbb{R}^{N_t \times N_c}$ is the ground truth probability distribution, and $N_c$ is the number of ship categories (default value is 1). In typical angle classification, angles are treated as independent categories when calculating the loss, which neglects the relationship between neighboring angles. In this paper, $T_{\theta} \in \mathbb{R}^{N_t \times L_{angle}}$ and $P_{\theta} \in \mathbb{R}^{N_t \times L_{angle}}$ represent the CSL-adjusted angle labels and the angle predictions, respectively. Furthermore, to encourage the sub-net to learn features from the CFAR maps effectively and thus provide supervision to the detection backbone network, the loss function for the CFAR-FCN branch is calculated using the Euclidean distance.
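As a rough illustration of how these terms combine, the sketch below implements a numerically stable BCE-with-logits loss and the weighted sum of the five loss terms in plain Python. The mean reduction over elements and the function names are assumptions for illustration; the paper does not specify the reduction.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy on raw logits.

    Uses the stable form max(x, 0) - x * y + log(1 + exp(-|x|)),
    which equals -[y log(sigmoid(x)) + (1 - y) log(1 - sigmoid(x))].
    Mean reduction is an assumption.
    """
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def total_loss(l_box, l_obj, l_cls, l_cfar, l_angle, weights=(1.0,) * 5):
    """Weighted sum of the five loss terms; all weights default to 1."""
    terms = (l_box, l_obj, l_cls, l_cfar, l_angle)
    return sum(w * t for w, t in zip(weights, terms))


if __name__ == "__main__":
    # A logit of 0 corresponds to probability 0.5, so the loss is log(2).
    print(bce_with_logits([0.0], [1.0]), math.log(2))
    print(total_loss(1.0, 2.0, 3.0, 4.0, 5.0))
```

Working on logits rather than probabilities avoids taking the logarithm of values that round to 0 or 1, which matters for the 180-way angle output where most targets are close to zero.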

Experiments and Analysis
In this section, we evaluate the detection performance of the proposed method through experiments. First, we introduce the dataset, the experimental settings, and the evaluation criteria. Then, we compare this method with other CNN-based rotated ship detection methods to demonstrate its superiority.

Dataset and Experimental Settings

Dataset
Lei et al. [37] released the SRSDD-v1.0 dataset in 2021. It was constructed from more than 30 large-scene images captured at five locations by China's GF-3 satellite, with an image resolution of 1 m. These large-scene images were cropped into patches of 1024 × 1024 pixels. The dataset contains a total of 666 images, all cut from the original large-scene images. Of these, 420 images include land cover and contain 2275 ships, and 246 images have only sea in the background and contain 609 ships. It is worth noting that this dataset is the first publicly available SAR ship dataset with varying resolutions and classifications. Therefore, we conduct ship detection and recognition experiments based on this dataset. Table 1 provides additional details about SRSDD. The ships in SRSDD are annotated with rotated bounding boxes and categorized into six classes: oil tankers, bulk carriers, fishing boats, law enforcement vessels, dredgers, and container ships. Experts inspected the corresponding SAR and optical images and provided rotated bounding boxes and class labels for each ship target, which ensures the authenticity and accuracy of the annotations. Figure 2 displays the quantity for each class. Bulk carriers constitute the majority of the dataset, while law enforcement vessels number only about one-tenth as many as bulk carriers. The overall distribution of the training and testing datasets is consistent. Additionally, because existing SAR datasets contain more offshore scenes than nearshore scenes, SRSDD emphasizes nearshore scenes during sampling: nearshore scenes account for 63.1% and offshore scenes for 36.9%. To ensure the fairness of experimental results, our experiments follow the same protocol as in the literature [37], which uses 532 images for training and 134 images for testing. In both the training and testing sets, the categories, widths, heights, and angle distributions of the ships are similar, which ensures the validity of the test results. The specific distribution of ship quantities can be seen in Figure 2.

Implementation
All experiments in this paper were conducted on a PC with an Intel® Core™ i7-10875H CPU @ 2.30 GHz × 16 and a GeForce RTX 2060 Mobile GPU, using PyTorch for implementation. The operating system was Ubuntu, and the CUDA version was 11.7. We employed the stochastic gradient descent (SGD) algorithm as the optimizer, with each minibatch containing 20 images. Weight updates for the network used an initial learning rate of 1 × 10⁻², a weight decay of 5 × 10⁻⁴, and a momentum of 0.937. Lastly, we utilized rotation non-maximum suppression (R-NMS) with an IoU threshold of 0.5 to remove duplicate detections and set the confidence threshold to 0.3. The computer and deep learning environment configurations for our experiments are presented in Table 2.
To evaluate the detection performance of our method, we employed four evaluation metrics: precision, recall, F1 score, and average precision (AP). Precision and recall are defined as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$
where TP represents the number of true-positive detections, FP represents the number of false positives, and FN represents the number of ships that were not detected. Precision is the proportion of true ship detections among all positive predictions made by the algorithm, and recall is the proportion of ground-truth ships that are detected. Precision and recall typically trade off against each other: as one goes up, the other tends to go down. A higher precision indicates fewer false positives, while a higher recall means fewer false negatives. The F1 score provides a balanced measure of both precision and recall and is defined as
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
Furthermore, we can obtain the precision-recall curve and calculate the mean Average Precision (mAP):
$$\mathrm{mAP} = \frac{1}{k} \sum_{i=1}^{k} \int_{0}^{1} P_i(R)\, dR$$
where k is the total number of target categories, and P_i(R) is the precision-recall curve of the i-th category, which is obtained from the four components used in information retrieval: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Additionally, a detection is considered a correct ship when the IoU between its bounding box and a single ground truth is greater than 0.5.
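The metrics described above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used in the paper: the function names are hypothetical, matching of detections to ground truth at IoU > 0.5 is assumed to have been done beforehand, and the AP integral is approximated with all-point interpolation over the precision envelope, which is one common convention among several.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for one class.

    scores: confidence of each detection.
    is_true_positive: whether each detection matched a ground truth (assumed given).
    num_ground_truth: number of ground-truth ships of this class (must be > 0).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for j in range(len(precisions) - 2, -1, -1):
        precisions[j] = max(precisions[j], precisions[j + 1])
    # Sum rectangle areas between consecutive recall values.
    ap, previous_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - previous_recall) * p
        previous_recall = r
    return ap


def mean_average_precision(per_class_ap):
    """Mean of the per-class AP values (k = number of classes)."""
    return sum(per_class_ap) / len(per_class_ap)


# F1 from the precision and recall reported in the abstract.
print(round(100 * f1_score(0.6804, 0.6025), 2))  # 63.91
```

Plugging in the precision (68.04%) and recall (60.25%) reported in the abstract gives an F1 score of 63.91, which agrees with the value quoted for the proposed model.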

Quantitative Analysis of Results
To evaluate the comprehensive performance of our method, we compared the detection results of our proposed model with those of eight other rotation detectors, as shown in Table 3. Labels C1-C6 correspond to bulk carriers, fishing vessels, law enforcement vessels, dredgers, general cargo ships, and container ships, respectively. The detection results of the other methods are sourced from [37]. Among the two-stage detection networks, O-RCNN [38] exhibits the best overall detection performance on the SRSDD dataset, with an mAP of 56.23 and an F1 score of 60.64. Among the single-stage detection models, BBAVectors [39] and R-FCOS [37,40] show the best overall performance, but their mAP values are both below 50 and their F1 scores are below 45. In contrast, our proposed model achieves an mAP of 61.07 and an F1 score of 63.91, significantly outperforming BBAVectors and R-FCOS. Moreover, our model size is only 18.51 M, notably smaller than the 244 M of the original lightweight model R-FCOS [40]. The results indicate that our model can effectively detect ship targets in complex SAR images.
In terms of operational efficiency, two-stage detection models achieve a maximum FPS of only 8.38 when processing images from the SRSDD dataset. Among the other single-stage detection models, R-RetinaNet [37] is the fastest, but its FPS is only 10.53. In contrast, as shown in Table 3, our proposed model achieves an FPS of 56.18, more than five times that of the fastest competing detector, demonstrating a significant advantage in processing efficiency over the other single-stage and two-stage detection models.

Qualitative Analysis of Results
Furthermore, we visualize the detection and recognition results of our proposed model in nearshore and offshore scenarios in the SRSDD dataset, as shown in Figures 3-5.
Figure 3 presents the detection and classification results for ships far from the shore. The SAR images show that our network suppresses false alarms, including interference noise that closely resembles ship characteristics. The detection results highlight our network's strong adaptability to different scenes. Because our model uses the CFAR-FCN, it enhances ship features while suppressing noise, which makes the network more robust and reduces the impact of noise on detection and classification results to some extent.
Figure 4 displays the detection and classification results for ships in nearshore scenarios. The images show that our network successfully detects and classifies ship targets of different categories, even when complex backgrounds dominate most of the picture. This is attributed to our network's ability to suppress land features, which enhances the accuracy of nearshore ship detection.
Figure 5 presents the detection and classification results for densely arranged ship scenarios. Typically, because of complex backgrounds, false alarms are common in the detection results for nearshore ships. Furthermore, the dense arrangement of nearshore ships poses a challenge to detection and classification. The detection results show that our network accurately detects and classifies densely arranged ships in nearshore scenes while suppressing false alarms. This is attributed to our adoption of a rotation detection mechanism, which yields improved regression results for ship detection boxes in nearshore settings.

Ablation Experiments
In this section, we conducted a series of experiments to validate the effectiveness of the key improvements in the network. The comparative results for each module are shown in Table 4. These improvements include the CFAR-FCN branch and the frequency-domain Fca-Neck module. Furthermore, we qualitatively explain the improvements brought about by these modules based on the test results. The results show that adding these improvements progressively enhances the network's detection accuracy.
• Effectiveness of CFAR-FCN
Table 5 presents the results of the ablation study on the CFAR-FCN network branch. We compared three networks that were identical except for the CFAR-FCN part. The first, YOLOv5s+CFAR, used feature maps filtered by the CFAR operator as the output labels for the CFAR-FCN branch. The second, YOLOv5s+shipseg, used segmentation maps, in which ship targets were segmented and all other areas were assigned 0, as the output labels for the branch. The third was our base network, the normal YOLOv5s with a rotated detector. The results indicate that using the CFAR-FCN network can improve the accuracy of detection and classification to a certain extent, effectively enhancing the overall detection performance of the network. Leveraging the CFAR-FCN branch allows the fusion of CFAR ship features from traditional handcrafted operators, highlighting ship information and thus improving detection and classification accuracy. Additionally, the experimental results show that CFAR feature maps outperform regular ship segmentation maps in terms of detection performance. This is because handcrafted operators enhance effective ship features.
Additionally, as Figure 6 shows, the sub-net of this network exhibits a suppressive effect on land. This arises because, in this experiment, the CFAR feature map is obtained after land and water segmentation: land is assigned 0, and the CFAR operator is then applied for the filtering calculation. In other words, the land portions of the CFAR map input to the network are all 0, which significantly reduces the interference of strong reflection points on land with the network's ship detection.
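The way such a land-masked CFAR feature map can be produced is sketched below. The text does not specify which CFAR variant or window sizes were used, so this example assumes a simple cell-averaging CFAR with exponentially distributed sea clutter; the function name `ca_cfar` and the parameters `guard`, `train`, and `pfa` are hypothetical and chosen only for illustration.

```python
def ca_cfar(image, land_mask, guard=1, train=2, pfa=1e-3):
    """Cell-averaging CFAR on a 2-D intensity image (list of lists).

    Land pixels (land_mask[r][c] is True) are set to 0 first and excluded
    from the clutter estimate. The result keeps the intensity of detected
    pixels and is 0 everywhere else.
    """
    rows, cols = len(image), len(image[0])
    # Step 1: apply the land-water segmentation by zeroing land pixels.
    sea = [[0.0 if land_mask[r][c] else float(image[r][c]) for c in range(cols)]
           for r in range(rows)]
    half = guard + train
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if land_mask[r][c]:
                continue
            # Step 2: gather training cells outside the guard window.
            samples = []
            for rr in range(max(0, r - half), min(rows, r + half + 1)):
                for cc in range(max(0, c - half), min(cols, c + half + 1)):
                    in_guard = abs(rr - r) <= guard and abs(cc - c) <= guard
                    if not in_guard and not land_mask[rr][cc]:
                        samples.append(sea[rr][cc])
            if not samples:
                continue
            n = len(samples)
            # Step 3: CA-CFAR threshold factor for exponential clutter (assumed model).
            alpha = n * (pfa ** (-1.0 / n) - 1.0)
            threshold = alpha * sum(samples) / n
            if sea[r][c] > threshold:
                out[r][c] = sea[r][c]
    return out


# Toy scene: uniform sea clutter, one bright ship pixel, bright land on the left.
size = 9
image = [[1.0] * size for _ in range(size)]
land = [[c < 2 for c in range(size)] for _ in range(size)]
for r in range(size):
    for c in range(2):
        image[r][c] = 100.0  # strong reflectors on land
image[4][4] = 50.0  # ship
cfar_map = ca_cfar(image, land)
```

In this toy scene only the ship pixel survives in `cfar_map`, while the bright land reflectors are zeroed before filtering, which mirrors the land-suppression effect described for the CFAR-FCN branch.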
• Effectiveness of Fca-Neck
Table 6 presents the results of the ablation study on the frequency-domain module. In this experiment, we compared the detection results with and without the Fca module. The networks used in the experiment were identical except for the presence of the FcaNet module. The first three networks in the table incorporate the FcaNet module, and each of them uses a different frequency component: low-frequency (low), top-performing frequency component (top), and the optimal frequency component (bot) obtained through the neural architecture search [11].
From Table 6, it can be observed that networks with the FcaNet module exhibit higher detection accuracy, as this module enhances the features. Furthermore, the module that aggregates low-frequency information shows the best detection performance. This is attributed to the Fca-low module's ability to aggregate low-frequency information that is often overlooked in convolutional neural networks, enabling the network to learn more useful features.
As shown in Figure 7, the integration of high-frequency features in SAR images enables rapid and accurate localization of ship targets. However, high-frequency information is susceptible to interference from echo noise, leading to errors in the detected rotation direction. Conversely, the use of low-frequency information effectively mitigates this issue, allowing the network to comprehensively learn the characteristic information of ship targets across different frequency bands. This enhances the network's resistance to interference.
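To make the notion of a frequency component concrete, the sketch below computes the scalar descriptor that a frequency-based channel attention module would extract from one feature-map channel, assuming (as in the FcaNet formulation, to our understanding) that each channel is projected onto a 2-D DCT basis function. It is not the Fca-Neck implementation; the function names are hypothetical, and the learned layers that turn descriptors into channel weights are omitted.

```python
import math


def dct_basis(u, v, height, width):
    """2-D DCT-II basis function for frequency index (u, v), without normalization."""
    return [[math.cos(math.pi * u * (h + 0.5) / height)
             * math.cos(math.pi * v * (w + 0.5) / width)
             for w in range(width)]
            for h in range(height)]


def frequency_descriptor(channel, u, v):
    """Project one feature-map channel onto the (u, v) DCT basis function."""
    height, width = len(channel), len(channel[0])
    basis = dct_basis(u, v, height, width)
    return sum(channel[h][w] * basis[h][w]
               for h in range(height) for w in range(width))


channel = [[1.0, 2.0], [3.0, 4.0]]
# The lowest frequency (0, 0) is the plain sum, i.e. global average pooling up to scale.
lowest = frequency_descriptor(channel, 0, 0)
# A higher horizontal frequency responds to left-right variation instead.
horizontal = frequency_descriptor(channel, 0, 1)
```

Under this assumption, the lowest-frequency descriptor is proportional to global average pooling, while higher indices respond to spatial variation within the channel, which is the kind of information that different frequency settings would emphasize.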

Conclusions
This study introduced a network model for ship detection and recognition in complex SAR images. Firstly, the CFAR-FCN branch integrates CFAR features in the form of a sub-net, allowing the detection network to learn CFAR features while concurrently providing it with supervisory learning and feature enhancement capabilities. The Fca-Neck module aggregates useful frequency-domain information in the image, especially low-frequency information, which is often overlooked in standard convolutional neural networks. Additionally, the rotation anchors used in this study effectively reduce interference from complex backgrounds in SAR ship recognition while mitigating the problem of ground truth suppression caused by NMS in densely clustered ship scenarios.
Ablation experiments and comparative studies conducted on the latest SRSDD-v1.0 dataset demonstrated the effectiveness of each module proposed in this paper. The experimental results indicate that the model achieves an F1 score of 63.91, an mAP of 61.07, and an FPS of 56.18 on the SRSDD dataset, with a model size of only 18.51 M. Our method outperforms several other approaches on the SRSDD dataset. It is well suited to practical devices, maintaining accuracy while meeting the real-time requirements of future SAR ship detection and recognition.

Figure 1. The Overall Framework. CFAR-FCN stands for the CFAR feature constraint sub-net; Backbone represents the feature extractor; Fca-Neck is the FcaNet channel attention module.

Figure 2. The quantity distributions of the six categories of ships in the SRSDD-v1.0 dataset.

Figure 3. Detection results in offshore scenes. The first row shows the original SAR images, the second row the ground truth, and the third row the OBB prediction results.

Figure 4. Detection results in inshore scenes. The first row shows the original SAR images, the second row the ground truth, and the third row the OBB prediction results.

Remote Sens. 2024, 16, x FOR PEER REVIEW 13 of 18

Figure 5. Detection results in dense array scenes. The first row shows the original SAR images, the second row the ground truth, and the third row the OBB prediction results.
Deep learning-based SAR ship detection methods offer clear advantages. With extensive training data, deep learning methods can explore features that traditional algorithms cannot, resulting in improved SAR ship detection. When training data are limited, incorporating traditional handcrafted operators, such as feature extraction techniques, can enhance the network's detection performance. However, through extensive comparative experiments, we have observed that, owing to the influence of image quality and resolution, our method still has noticeable limitations in ship target recognition. The detection accuracy is not yet adequate, especially in few-shot scenarios. Therefore, we intend to conduct further research in the following areas: (1) improving the accuracy and quality of SAR ship detection in few-shot scenarios; (2) rapid detection and segmentation of ship targets in SAR images.

Table 1. The basic parameters of SRSDD.

Table 2. Experimental setup and environment.
* With the best results highlighted in bold.

Table 4. Comparative analysis of the effects of each module in ablation experiments. YOLOv5s means the normal YOLOv5s with rotated detector; CFAR means CFAR-FCN with CFAR feature map; low means low-frequency information. The best results are highlighted in bold.