Article

EFCNet: Expert Feature-Based Convolutional Neural Network for SAR Ship Detection

The Key Laboratory of Intelligent Perception and Image Understanding of the Ministry of Education, School of Artificial Intelligence, Xidian University, Xi’an 710071, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(7), 1239; https://doi.org/10.3390/rs17071239
Submission received: 23 February 2025 / Revised: 18 March 2025 / Accepted: 26 March 2025 / Published: 31 March 2025

Abstract

Due to their special properties, synthetic aperture radar (SAR) images are widely used in maritime applications such as detecting ships at sea. To perform ship detection in SAR images, existing algorithms commonly employ convolutional neural networks (CNNs). However, the difficulty of acquiring SAR images and the noise inherent in SAR imaging hinder CNNs in this task. In this paper, we revisit the relationship between SAR expert features and abstract network features and propose an expert-feature-based convolutional neural network (EFCNet). Specifically, we exploit the inherent physical properties of SAR images by manually extracting a range of expert features, including electromagnetic scattering, geometric structure, and grayscale statistics. These expert features are then adaptively integrated with abstract CNN features through a newly designed multi-source features association module, which improves the ability of a standard CNN to recognize ship targets. Experimental results on the SSDD dataset demonstrate that EFCNet outperforms general CNN approaches. Furthermore, EFCNet achieves detection performance comparable to the baseline while using only 70% of the data, highlighting its efficiency. This work aims to reignite interest in leveraging expert features in remote sensing tasks and offers promising avenues for improved SAR image interpretation.

Graphical Abstract

1. Introduction

Synthetic aperture radar (SAR) is an active sensing system that coherently processes radar signals to generate fine-detail imagery. One of SAR’s key advantages is its ability to provide continuous, multi-angle, and long-range surveillance, which is crucial for monitoring maritime activities [1,2,3,4,5], particularly ship detection at sea. This capability is vital for maritime safety, enabling efficient management of rescue operations and navigation. Consequently, SAR ship detection [6,7,8,9] has attracted increasing attention in the remote sensing community, as it plays a pivotal role in improving the effectiveness of maritime surveillance systems.
Traditional SAR ship-detection methods [10,11,12] are often tailored to specific conditions and employ predefined feature-extraction techniques. For instance, some approaches use the histogram of oriented gradients (HOG) [10], which captures geometric properties by computing gradient histograms over local regions. Similarly, methods based on the scale-invariant feature transform (SIFT) [11] focus on identifying key points that remain stable under noise and geometric transformations. Although these methods have proven effective in certain contexts, they are limited by their dependence on manually designed features, which can reduce their robustness and ability to generalize across different scenarios.
With the rapid advancement of deep learning in recent decades [13,14,15,16,17,18], convolutional neural networks (CNNs) have achieved remarkable success in a variety of image tasks [19,20,21,22,23,24,25,26,27,28,29,30], and SAR ship detection has benefited from them as well [31,32,33,34,35]. Current SAR ship-detection frameworks can generally be divided into two categories: two-stage detectors [36,37,38] and single-stage detectors [39,40,41]. The Faster R-CNN [37] framework, widely used in two-stage detection, integrates deep learning with region proposal techniques to improve detection efficiency. Building on the strong performance of Faster R-CNN on optical images, Li et al. [42] adapted it to SAR images and introduced a lightweight version of Faster R-CNN. Zhang et al. [43] released a publicly available SAR ship dataset, named SSDD, and then enhanced the Faster R-CNN framework through a combination of feature fusion and transfer learning strategies. Additionally, Gui et al. [44] built a multi-layer, lightweight-head detector within the Faster R-CNN architecture, optimizing multi-scale detection performance and computational efficiency. As for single-stage approaches, Zhang et al. [45] employed a depthwise separable convolutional network for high-speed SAR ship detection, building on the YOLOv3 framework [39] with multi-scale detection and a cascade-anchor mechanism to boost accuracy. Wang et al. [46] developed a feature-integration network that considers texture variations between objects and their backgrounds to improve SAR ship detection. It is worth mentioning that Zhang et al. [47] proposed HOG-ShipCLSNet, which combines HOG features with a CNN for ship classification: the extracted HOG features are processed with principal component analysis to obtain feature vectors, which are then stacked with the CNN feature vectors in the final fully connected layer and used as the input for the final ship classification.
While CNNs have achieved success in SAR ship detection, certain challenges remain that demand careful consideration.
(1)
CNN-based methods rely heavily on the amount of training data. To enable the model to effectively capture discriminative features of SAR ship targets, substantial training data are required. Nevertheless, acquiring SAR images is both challenging and expensive, so the amount of available data is limited. This data scarcity creates a significant bottleneck that makes it difficult for CNNs to learn generalizable features and prevents them from realizing their full potential in SAR ship detection. In other words, it is worth exploring how to enable the model to fully extract robust, target-related features from limited data to achieve better SAR ship detection.
(2)
Existing CNN-based approaches are notably vulnerable to interference from multiple sources, such as noise perturbations, changes in target orientation, and variations in imaging angle. As shown in Figure 1, unlike optical images, SAR images are prone to substantial background clutter, which can obscure the discernible features of the target. In addition, the relative motion between the radar and the target may induce azimuth and range blurring, introducing uncertainty about the target’s spatial position and shape. These intrinsic attributes make detecting ship targets in SAR images a notably intricate task. If SAR ship detection is treated directly as an optical target-detection task, it may not achieve the same performance.
In response to these challenges, we propose to explore in depth how to combine the expert features of SAR ships with deep neural networks. By incorporating SAR-specific features such as electromagnetic scattering, geometric structure, and grayscale statistics into deep-learning models, it becomes possible to develop a more robust ship-detection system. This approach can help overcome the inherent challenges of SAR imaging and data limitations. We believe this research direction holds significant potential for improving the reliability and generalization of CNN-based SAR ship-detection techniques.
To address this research gap, we analyze key expert features of SAR images, including electromagnetic scattering, geometric structure, and grayscale statistics, and propose an expert feature-based convolutional neural network (EFCNet) that explores the integration of expert features with deep neural networks. EFCNet consists of two separate branches: one learns deep abstract features and the other processes expert features. The outputs of the two branches are integrated, guiding the model toward more accurate detection decisions. Our method also incorporates a feature-association module that enables EFCNet to capture the intrinsic characteristics of SAR ship targets even with limited data. We conduct experiments on the SSDD dataset, and the results highlight the effectiveness of EFCNet, showing that it outperforms traditional CNN approaches and achieves performance comparable to the baseline while using only 70% of the available data.

2. Method

2.1. Overview

As illustrated in Figure 2, our proposed EFCNet contains three main components: (1) deep-network feature extraction, (2) SAR ship expert-feature extraction, and (3) multi-source features association module. In (1), the backbone and feature pyramid network (FPN) [48] are employed for feature extraction, producing deep feature maps. A region proposal network (RPN) is then used to produce candidate boxes, which undergo non-maximum suppression to retain N candidate boxes. In (2), expert-feature extraction is performed using multiple specific operators. Finally, in (3), the deep-network features are fused with the expert features using a multi-source features association module. By feeding the fused features into the fully connected layer, we can further refine the final prediction boxes.

2.2. Deep-Feature Extraction

We take Faster R-CNN as the basic architecture to obtain high-quality abstract features. Faster R-CNN is a well-established deep-learning model known for its accuracy and efficiency in various object-detection tasks, including SAR ship detection. We choose FPN for its ability to detect objects at multiple scales, which is crucial for handling the variability of ship sizes and orientations in SAR images; FPN enhances Faster R-CNN by providing a rich, multi-scale feature representation, improving the model’s robustness and accuracy. Specifically, as shown in Figure 3, Faster R-CNN divides the detection task into two stages: candidate-box generation and candidate-box refinement. In the first stage, candidate boxes are filtered and optimized by the RPN to obtain boxes that contain targets. In the second stage, these candidate boxes are further refined to produce the final, more accurate detection results.
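For readers who want a concrete starting point, the snippet below sketches such a Faster R-CNN + FPN baseline using torchvision; it is a minimal illustration with our own placeholder settings, not the authors’ exact configuration (recent torchvision versions use the `weights` argument, while older ones use `pretrained`).

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Faster R-CNN with a ResNet-50 + FPN backbone; weights=None builds an untrained model.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None)

# SAR ship detection is a two-class problem: background and ship.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

model.eval()
with torch.no_grad():
    dummy = [torch.rand(3, 512, 512)]   # one 512 x 512 image, as in Section 3.1
    outputs = model(dummy)              # list of dicts: 'boxes', 'labels', 'scores'
print(outputs[0]["boxes"].shape)
```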

2.3. Expert Feature Extraction

Given the complexity of SAR images compared with optical images, conventional CNNs require more data samples to learn features that generalize well. Because SAR images are also difficult to acquire, we propose to utilize SAR expert features to further improve CNN performance.
We take full account of the physical features of SAR images shown in Figure 4, especially the electromagnetic properties [49]. SAR images also contain Contour [50], HOG [51], Gray-Level Co-occurrence Matrix (GLCM) [52], Canny [53], and Harris [54] features, just like optical images [55,56]. Considering the specificity of the electromagnetic features of SAR images, this section focuses on electromagnetic scattering characteristics, and three related feature-extraction algorithms are detailed: strong scattering point extraction, peak features [57], and constant false alarm rate (CFAR) [58].

2.3.1. Strong Scattering Point Extraction

During SAR imaging, ship targets exhibit multiple scattering patterns, owing to variations in the incidence angle and azimuth angle. The SIFT algorithm [59], widely used in computer vision, is effective for detecting and describing local features in images. It identifies distinct points at different spatial scales, making it well-suited for extracting prominent scattering features from SAR ship images. The SIFT method consists of two main phases: scale-space extrema detection and localization of key points with strong scattering. For scale-space extrema detection, the Gaussian function serves as the fundamental kernel for multi-scale transformations. Also known as the normal distribution, the Gaussian function is a continuous, symmetric curve defined by its mean and standard deviation. Its smooth nature and ability to model natural phenomena and noise make it particularly useful across various fields. The original image undergoes convolution with a Gaussian function, resulting in smoothed images at multiple scales. Next, the Gaussian smoothed images at adjacent scales are subtracted to find stable extreme value points on the Difference of Gaussian (DoG) image.
$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}}\, e^{-(x^{2}+y^{2})/(2\sigma^{2})},$$
$$D(x, y, \sigma) = \big(G(x, y, k\sigma) - G(x, y, \sigma)\big) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma),$$
where the scale-space image $L(x, y, \sigma)$ is generated by convolving the original image $I(x, y)$ with the Gaussian function $G(x, y, \sigma)$, and the DoG function then produces $D(x, y, \sigma)$, from which the stable extrema are identified. A higher $\sigma$ yields a lower effective resolution and a more blurred image after Gaussian smoothing, so larger scales capture the contour features of the image while smaller scales focus on finer details. The parameter $k$ is the multiplicative scale factor, which typically takes small values. During the strong-scattering keypoint localization phase, the algorithm encounters numerous low-contrast points and unstable edge points within the extremal regions, which are particularly susceptible to noise, and the localization of these points may lack precision. To mitigate these issues, a median filter is applied before further processing to suppress noise and improve precision. Subsequently, the location of each extremum is refined by fitting a three-dimensional quadratic function with a Taylor series expansion around the extremum:
$$D(p) = D + \frac{\partial D^{T}}{\partial p} p + \frac{1}{2} p^{T} \frac{\partial^{2} D}{\partial p^{2}} p,$$
$$D(\hat{p}) = D + \frac{1}{2} \frac{\partial D^{T}}{\partial p} \hat{p},$$
where $p$ and $\hat{p}$ represent the pixel points before and after median filtering, respectively. Points with $|D(\hat{p})| < 0.03$, that is, low-contrast points, are removed, and the remaining points are retained as strong scattering points. The feature-extraction result for strong scattering points is shown in Figure 5.
These points mark explicit, high-reflectivity regions that are often indicative of ship structures. While CNNs excel at learning complex, abstract patterns from data, they may not always emphasize such explicit high-reflectivity regions. By pinpointing these key points, the model can concentrate on the discriminative regions, enhancing detection accuracy.
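As an illustration of this procedure, the following sketch detects candidate strong scattering points with a median filter, a single DoG layer, and the 0.03 contrast threshold mentioned above. It is a simplified stand-in for the full SIFT pipeline (no multi-octave pyramid, orientation assignment, or sub-pixel refinement), and parameters such as `sigma` and the window sizes are assumptions.

```python
import numpy as np
from scipy import ndimage

def strong_scattering_points(img, sigma=1.6, k=2 ** 0.5, contrast_thr=0.03):
    """Toy DoG-based strong-scattering-point detector (illustrative only)."""
    img = img.astype(np.float32)
    img = img / (img.max() + 1e-8)                  # normalize to [0, 1]
    img = ndimage.median_filter(img, size=3)        # suppress speckle before DoG

    # Difference of Gaussians: D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)
    dog = ndimage.gaussian_filter(img, k * sigma) - ndimage.gaussian_filter(img, sigma)

    # Local maxima of |DoG| in a 3x3 neighbourhood, keeping only high-contrast points.
    local_max = ndimage.maximum_filter(np.abs(dog), size=3)
    keypoints = (np.abs(dog) == local_max) & (np.abs(dog) > contrast_thr)
    return np.argwhere(keypoints)                   # (row, col) coordinates

# points = strong_scattering_points(sar_image)  # sar_image: 2-D numpy array
```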

2.3.2. Peak Features

In the SAR imaging process, differences in target type, imaging angle, and orientation cause the position and strength of strong scattering points to vary across ships. The peak feature [57] is defined as a local extremum in the SAR image; it essentially corresponds to a strong scattering center of the target and results from the convolution of the point-scatterer response with the SAR system impulse response during imaging. The specific rule is as follows:
$$p_{ij} = \begin{cases} 1, & \operatorname{mean}\big(a_{ij} - a_{N(i,j)}\big) > \sigma, \\ 0, & \text{otherwise}, \end{cases}$$
where $p_{ij}$ is the peak indicator at point $(i, j)$ and $a$ denotes the magnitude of the SAR image. As the window slides, $(i, j)$ is the window center and $N(i, j)$ denotes the coordinates of the remaining neighboring points in the window; the differences between the center amplitude $a_{ij}$ and the neighboring amplitudes $a_{N(i,j)}$ are computed and averaged. If this mean difference exceeds $\sigma$, the standard deviation of the image background, the point is marked as a peak. The extraction result of the peak feature is shown in Figure 6.
Peak features represent the highest intensity values, often associated with the edges and corners of ship structures. While CNNs excel at capturing complex patterns and hierarchical features, they might not always emphasize absolute intensity peaks as strongly. By incorporating peak features, we ensure that the model attends to these critical high-intensity regions, improving the precision of ship localization and distinguishing ships from background clutter.
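A minimal sketch of this peak rule is given below; the window size and the global estimate of the background standard deviation are our own simplifying assumptions.

```python
import numpy as np
from scipy import ndimage

def peak_features(img, win=5):
    """Mark a pixel as a peak when the mean difference between it and its
    window neighbours exceeds the background standard deviation (estimated
    globally here, which is a simplification)."""
    a = img.astype(np.float32)
    n = win * win - 1
    # Mean of neighbours = (window sum - centre value) / (window size - 1).
    neigh_mean = (ndimage.uniform_filter(a, size=win) * win * win - a) / n
    sigma = a.std()                      # crude background-std estimate (assumption)
    return (a - neigh_mean) > sigma      # boolean peak map p_ij

# peaks = peak_features(sar_image)
```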

2.3.3. CFAR

Traditional methods based on specific features detect ships using handcrafted features, most commonly backscattering features: ships are made of metal, have strong backscattering, and therefore appear with high gray values. The CFAR algorithm [58] is a method for adaptive threshold detection. It estimates the statistical distribution around a target, based on an assumed model of the background’s probability density function, to maintain a constant false alarm rate. The typical operation flow of the CFAR algorithm is shown in Figure 7. This approach compares the gray value of each pixel against a threshold that is adjusted in real time to determine whether the pixel belongs to the target area. To calculate this threshold, three components are required: a predefined false alarm rate, a statistical model of the background clutter, and a CFAR sliding-window detector.
The CFAR detector scans each pixel through a sliding window to carry out target detection on a pixel-by-pixel basis. The most frequently employed sliding window for CFAR detection is the hollow window, which consists of three regions, arranged from the inside out: the target region, the protection region, and the background clutter region, as illustrated in Figure 8.
Based on the principle of CFAR detection, the target region can be approximately identified using the following formula:
$$C(x) = \begin{cases} \text{foreground}, & x > \beta, \\ \text{background}, & \text{otherwise}, \end{cases}$$
where x represents the intensity of the detected pixel and β denotes the threshold. We show the extraction result of CFAR in Figure 9.
CFAR adapts to varying noise conditions by estimating local noise levels and adjusting detection thresholds dynamically. This provides robustness in noisy environments, which CNNs might not explicitly account for as they learn generalized feature representations. Integrating CFAR-based features ensures that regions deviating significantly from the background noise are emphasized, maintaining consistent detection performance across different noise conditions.
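The sketch below implements a basic cell-averaging CFAR with the hollow window of Figure 8; the Gaussian clutter model, window sizes, and false-alarm rate are illustrative assumptions rather than the authors’ exact settings.

```python
import numpy as np
from scipy import ndimage
from scipy.stats import norm

def ca_cfar(img, guard=4, train=8, pfa=1e-3):
    """Minimal cell-averaging CFAR with a hollow sliding window (sketch)."""
    a = img.astype(np.float32)

    # Hollow window: ones over the background ring, zeros over target + guard cells.
    outer = 2 * (guard + train) + 1
    inner = 2 * guard + 1
    kernel = np.ones((outer, outer), dtype=np.float32)
    kernel[train:train + inner, train:train + inner] = 0.0
    kernel /= kernel.sum()

    # Local clutter statistics estimated from the background ring.
    clutter_mean = ndimage.convolve(a, kernel, mode="reflect")
    clutter_sq = ndimage.convolve(a ** 2, kernel, mode="reflect")
    clutter_std = np.sqrt(np.maximum(clutter_sq - clutter_mean ** 2, 0.0))

    # Threshold factor from the desired false-alarm rate under a Gaussian assumption.
    factor = norm.isf(pfa)
    beta = clutter_mean + factor * clutter_std
    return a > beta          # foreground mask C(x)

# mask = ca_cfar(sar_image)
```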

2.4. Multi-Source Features Association Module

Expert features such as strong scattering points, peak features, and CFAR provide explicit, interpretable information that is robust to noise, complementing the rich hierarchical representations learned by the CNN. Combining the two leverages the strengths of both, ensuring that the model captures both global and local characteristics of SAR ships and thereby promising better overall detection performance. Specifically, as illustrated in Figure 10, the abstract network feature vector $f_{A}^{S_i}$ is associated with the SAR ship expert feature vector $f_{G}^{S_i}$. The abstract features are extracted from the corresponding coordinate positions on the deep feature maps, and a feature vector of shape (256, 7, 7) is obtained by ROI-Pooling: the candidate regions are mapped to the corresponding positions on the feature maps, the mapped area is divided into cells of equal size, and max pooling is applied to each cell, so that fixed-size feature vectors are obtained from candidate regions of different sizes. ROI-Pooling also significantly improves processing efficiency. Since each detection map corresponds to a specific set of SAR ship expert feature maps, a scaling operation transforms the expert feature maps of varying sizes into a consistent shape of (C, 7, 7), where C denotes the channel number. The multi-source features association module then captures the relationships between these vectors. We adopt scaled dot-product attention to associate the two sets of features, which can be formulated as:
$$f_{R}^{S_i} = \operatorname{softmax}\!\left(\frac{\big(W_{Q} f_{A}^{S_i}\big)\big(W_{K} f_{G}^{S_i}\big)^{T}}{\sqrt{d_k}}\right) W_{V} f_{A}^{S_i},$$
where the input deep-network feature vector $f_{A}^{S_i}$ provides the query $Q$ and the value $V$ through two parallel branches, the input SAR ship expert feature vector $f_{G}^{S_i}$ provides the key $K$, and $W$ denotes the corresponding projection weights. The query and key are combined by scaled dot-product attention to compute attention scores, which indicate the relevance of different regions of the abstract CNN features conditioned on the expert features. Our multi-source features association module thus integrates feature information from different sources and adaptively models the relationships and relative importance between features, enabling the network to learn discriminative ship-related features from a limited training set.
To evaluate the impact of different fusion strategies, we also implement a basic feature-association method: we directly stack the expert features and the model’s abstract features along the channel dimension and feed the stacked features as the query, key, and value into a self-attention mechanism.
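To make the two-branch variant concrete, the following PyTorch sketch implements the association described above (deep ROI features as query and value, expert ROI features as key); the channel count, projection dimension, and output projection are our own assumptions.

```python
import torch
import torch.nn as nn

class MultiSourceAssociation(nn.Module):
    """Sketch of the association module in Figure 10: the deep ROI feature
    provides the query and value, the expert ROI feature provides the key."""

    def __init__(self, channels=256, d_k=64):
        super().__init__()
        self.d_k = d_k
        self.w_q = nn.Linear(channels, d_k)
        self.w_k = nn.Linear(channels, d_k)
        self.w_v = nn.Linear(channels, d_k)
        self.out = nn.Linear(d_k, channels)

    def forward(self, f_a, f_g):
        # f_a: deep-branch ROI features   (N, C, 7, 7)
        # f_g: expert-branch ROI features (N, C, 7, 7), already scaled to the same shape
        n, c, h, w = f_a.shape
        tokens_a = f_a.flatten(2).transpose(1, 2)      # (N, 49, C)
        tokens_g = f_g.flatten(2).transpose(1, 2)      # (N, 49, C)

        q = self.w_q(tokens_a)                         # query from abstract features
        k = self.w_k(tokens_g)                         # key from expert features
        v = self.w_v(tokens_a)                         # value from abstract features

        attn = torch.softmax(q @ k.transpose(1, 2) / self.d_k ** 0.5, dim=-1)
        fused = self.out(attn @ v)                     # (N, 49, C)
        return fused.transpose(1, 2).reshape(n, c, h, w)

# Usage with ROI-pooled features of shape (N, 256, 7, 7):
# fuse = MultiSourceAssociation()
# f_r = fuse(f_abstract, f_expert)
```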

3. Results

3.1. Datasets and Experimental Details

Experimental environment. All algorithms are implemented with PyTorch 1.10.0 and Python 3.7 and run on Ubuntu 16.04. The system is equipped with an NVIDIA RTX 3080 Ti GPU with 12 GB of graphics memory (NVIDIA, Santa Clara, CA, USA).
Datasets. SSDD dataset [43]: the dataset consists of 1160 SAR images and 2456 ship targets and covers multi-scale ship targets in different locations, such as nearshore and offshore. Figure 11 shows sample images of various scenarios.
Experimental details. To ensure a consistent input size, all images are resized to 512 × 512 pixels for training and testing. During training, we apply horizontal flipping, random cropping, and color jittering for data augmentation. A weight decay of 0.0005 is used for all networks. The Adam optimizer is used with a momentum term of 0.9, an initial learning rate of 0.001, and a decay factor of 0.9. To benchmark the performance of our proposed approach, we use Faster R-CNN as the baseline model for comparison, which lets us isolate the improvements gained by integrating expert features with deep learning.
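For reference, a minimal sketch of this optimizer configuration is shown below; the placeholder network, the per-epoch application of the 0.9 decay, and the number of epochs are assumptions, since the text does not specify them.

```python
import torch
import torch.nn as nn

# Illustrative training configuration matching the hyperparameters listed above.
# `model` is a stand-in module; in practice it would be the EFCNet detector.
model = nn.Conv2d(3, 16, kernel_size=3)  # placeholder network (assumption)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001,
                             betas=(0.9, 0.999),   # beta1 = 0.9 ("momentum" above)
                             weight_decay=0.0005)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

for epoch in range(12):          # the number of epochs is not specified in the text
    # ... one pass over the SSDD training split would go here ...
    scheduler.step()             # apply the 0.9 learning-rate decay per epoch (assumption)
```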
Evaluation Metric. To quantitatively assess the detection performance of various methods, we utilize common evaluation metrics from object-detection tasks [60,61,62,63]: recall and average precision (AP). Recall indicates the likelihood that all ship targets are accurately detected, reflecting the coverage of the ground truth, and can be computed as:
$$\mathrm{Recall} = \frac{TP}{TP + FN},$$
where TP denotes the number of ships correctly detected and FN denotes the number of ship targets that are missed.
The AP is calculated as the area under the precision–recall (PR) curve, which provides a comprehensive evaluation of the trade-off between precision and recall. Mathematically, AP is defined as follows:
$$AP = \int_{0}^{1} P(R)\, \mathrm{d}R.$$
AP@0.5 represents the AP calculated at a single intersection over union (IoU) threshold of 0.5. This means that a detected bounding box is considered correct if its IoU with the ground truth is at least 0.5. AP@0.5:0.95 (or AP across multiple IoUs) is the mean AP calculated at ten different IoU thresholds, ranging from 0.5 to 0.95, in increments of 0.05. This metric provides a more rigorous and comprehensive evaluation of detection performance, as it accounts for localization accuracy at various levels of IoU.
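As a worked example of the AP definition above, the following helper integrates a precision-recall curve using all-point interpolation (a simplified sketch; the PR points themselves would come from IoU-based matching of detections to ground truth).

```python
import numpy as np

def average_precision(recall, precision):
    """Area under the precision-recall curve with all-point interpolation.
    Assumes `recall` is sorted ascending with matching `precision` values."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # Make precision monotonically non-increasing before integrating.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    changed = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[changed + 1] - r[changed]) * p[changed + 1]))

# Example: perfect precision up to recall 0.5, then precision 0.5 -> AP = 0.75.
print(average_precision(np.array([0.5, 1.0]), np.array([1.0, 0.5])))
```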

3.2. Ablation Experiment and Analysis

Results of single features. To investigate the effect of each expert feature on the model, we first conduct single-feature fusion experiments, integrating each available SAR ship feature in turn: strong scattering points [59], peak features [57], CFAR [58], Contour [50], HOG [51], GLCM [52], Canny [53], and Harris [54]. In our experiments, we evaluate two feature-association schemes built on the association module. The first stacks the candidate regions extracted from the abstract and expert feature maps and then applies the multi-source features association module to weight the different features and activate the distinctive ones. The second feeds the set of vectors from the expert-feature module as the key, while the deep network’s features serve as both the query and the value.
The detection results of the SAR ship single-feature association networks are shown in Table 1. We find that simply stacking the two types of features and applying the multi-source features association module does not effectively improve detection performance. In contrast, feeding the expert features and the model’s abstract features as separate input branches to the multi-source features association module makes the model more sensitive to SAR ship targets, driven by the expert features. In addition, we observe that fusing expert and abstract features indeed improves the performance of the CNN, especially when using CFAR and peak features. We suggest that scattering-related features (e.g., CFAR) outperform other features (e.g., Contour and Canny) because general CNNs are capable of learning contour and texture features from the dataset but struggle to learn scattering features. This also validates the complementarity between certain expert features and abstract features.
Results of multiple features. The results of combining multiple expert features in our experiments reveal some interesting insights regarding the fusion of these features. As indicated in Table 2, the proposed multi-source features association module consistently outperforms other fusion methods. This reinforces the idea that splitting the feature sets into separate branches for processing allows the model to learn more effectively from the distinct characteristics of each feature. The separate processing helps prevent the features from conflicting or diluting each other’s impact on detection performance, which is a common issue in feature fusion tasks.
However, we also observed that in some cases, the combination of multiple features did not always lead to a performance improvement compared to using a single feature. This phenomenon suggests that there may be representational inconsistencies between the features at certain dimensions. For example, features such as Contour and Harris, which focus more on structural and edge information, might not align well with texture-based features like GLCM or Peak, which capture different types of patterns in the image. This misalignment could lead to a situation where the combined features do not provide as much complementary information as expected. When features do not complement each other properly, their combination can result in a weaker representation for the model, thus negatively impacting detection accuracy.
Further analysis of the single-feature and combined-feature results highlights some key insights into the relative effectiveness of specific features in improving the detection performance of the CNN. Among the conventional handcrafted features, Strong Scattering Points, HOG, and Peak features consistently show significant improvements in detection performance. Strong Scattering Points provide robust signals that help highlight potential target areas, making them particularly useful for detecting ships in cluttered environments. HOG features, which capture shape and edge information, significantly improve the detection of objects with distinct outlines, such as ships with clear hull boundaries. Similarly, Peak features, which are sensitive to the brightest regions in the image, also aid in detecting ships that have high contrast against the background.
Ablation study on data capacity. To explore the impact of expert features on CNN, we also evaluate the performance by testing with different training set capacity, and the experimental results are reported in Table 3. The results show that our proposed EFCNet achieves comparable AP to Faster R-CNN when using only 70% of the data capacity. To provide a deeper comparison and further demonstrate the advantage of our approach, we also analyzed the performance of Faster R-CNN at varying data capacities. Our experiments revealed that as the dataset size decreased, Faster R-CNN’s performance deteriorated significantly, whereas the performance gap between EFCNet and Faster R-CNN became even more pronounced. In fact, the improvement of EFCNet over Faster R-CNN was more evident with smaller data sizes, highlighting the robustness of our model in situations where data is scarce.
Additionally, we perform a qualitative evaluation to assess different methods. As shown in Figure 12 and Figure 13, we compare the detection results of our proposed EFCNet with those of the benchmark model across several scenarios. In the first and second rows, the detection boxes show that our approach is capable of precisely locating SAR ship targets, significantly reducing the likelihood of missed or incorrect detections. In the more complex scenarios in the remaining rows, our approach still outperforms the benchmark, despite a few occasional errors. This analysis highlights the effectiveness of our feature association module in enhancing the performance of traditional CNNs, even if limited by the data capacity.

3.3. Comparison Experiments and Analysis

To further assess the effectiveness of our proposed method, we compared it with several advanced approaches in the field. As shown in Table 4, our method, EFCNet, outperforms all other tested models in both AP@0.5:0.95 and AP@0.5. The results demonstrate that EFCNet consistently achieves superior performance compared to established techniques, including EfficientNet [64], YOLOv3 [39], SSD [65], RetinaNet [66], Faster R-CNN [37], Cascade R-CNN [67], Grid R-CNN [68], Double-Head R-CNN [69], Sparse-RCNN [70], CRTransSar [71], YOLO-Lite [72], and SD-YOLO [73]. These findings highlight the advantages of our approach in SAR ship detection, showcasing its robustness and effectiveness in comparison to other advanced methods.

4. Discussion

Since our proposed EFCNet currently only considers ship targets in calm sea state scenarios, we use default settings when extracting expert features from SAR images. In fact, even for the same target, the backscattering of electromagnetic intensity can change with different incidence angles. Similarly, even for the same target and the same incidence angle, the CFAR threshold will vary because the returned electromagnetic intensity changes with different sea states. Even in low sea states, high-intensity spikes can appear, confusing the strong returns from ships. For these characteristics, we need to consider more imaging mechanisms and other factors to obtain higher quality expert features.
In certain scenarios, ship wake and ocean waves can generate strong reflection signals [74,75,76]. Ship wake appears in SAR images as diffuse patterns with gradually weakening intensity. We suggest that these typical wake characteristics assist the model in identifying ship targets, as the convergence point of the diffuse wake is the ship’s location. Ocean waves, on the other hand, generate background clutter, which may affect the quality of SAR expert features. However, as rigid bodies, ships produce reflection signals with higher intensity and more regular contours, helping to distinguish ships from background clutter. Overall, ship wake and ocean waves are factors that may impact the model’s detection performance. Therefore, our method will require specific analysis for better performance when applied to these particular scenarios.

5. Conclusions

In this paper, we introduce an advanced convolutional neural network named EFCNet for SAR ship detection. The primary innovation of EFCNet lies in its multi-source features association mechanism, which adaptively combines expert features, such as electromagnetic scattering, geometric structure, and grayscale statistics, with abstract network features. Extensive comparative experiments on the SSDD dataset validate that EFCNet outperforms other common convolutional neural networks. Furthermore, we demonstrate that our approach delivers performance on par with the baseline even when using only 70% of the available data.
As a foundational exploration, the proposed EFCNet may not be optimal in some scenarios. For example, in nearshore scenes where multiple ship targets are closely packed together, our method, like other existing approaches, may struggle to distinguish adjacent targets effectively. Our method may also perform less well in complex sea conditions than in general scenarios. In future research, we will focus on solutions for more complex scenarios. On the one hand, we will search for more useful SAR expert features in various complex scenes, which will allow us to develop more robust feature representations that mitigate the effects of imaging differences and irrelevant background interference. On the other hand, we will incorporate more types of feature information, such as ship wake [75,76], into our feature association module. Specifically, when a ship is sailing, it usually generates a wake, and this information can indicate the ship’s spatial position and travelling direction, which may be helpful for continuous monitoring of the ship. Meanwhile, exploring the combination of other data sources (e.g., optical images) and advanced techniques (e.g., multi-task learning) is also a valuable research direction; for example, combining object segmentation with detection could alleviate our limitation of poor performance for multi-target detection in nearshore scenarios. Overall, we hope that our core idea of multi-source feature association can provide inspiration for tasks in other fields. Lastly, exploring adaptability to different SAR sensors and enhancing detection in extreme weather conditions could further improve its practical impact.

Author Contributions

Conceptualization, J.B. and B.H.; methodology, Z.C.; software, Y.Z.; validation, Z.C. and Y.Z.; formal analysis, Z.C.; investigation, Z.C.; resources, J.B.; data curation, Z.C.; writing, Z.C. and Y.Z.; visualization, Y.Z.; supervision, J.B. and B.H.; project administration, J.B.; funding acquisition, J.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the National Natural Science Foundation of China under Grant 62276206, Grant U24A20247, and Grant 62176196; in part by the Aeronautical Science Foundation of China under Grant 2023Z071081001.

Data Availability Statement

Data associated with this research are available online. The SSDD dataset is available at https://github.com/TianwenZhang0825/Official-SSDD (accessed on 1 August 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Reigber, A.; Scheiber, R.; Jager, M.; Prats-Iraola, P.; Hajnsek, I.; Jagdhuber, T.; Papathanassiou, K.P.; Nannini, M.; Aguilera, E.; Baumgartner, S.; et al. Very-high-resolution airborne synthetic aperture radar imaging: Signal processing and applications. Proc. IEEE 2012, 101, 759–783. [Google Scholar]
  2. Li, J.; Xu, C.; Su, H.; Gao, L.; Wang, T. Deep learning for SAR ship detection: Past, present and future. Remote Sens. 2022, 14, 2712. [Google Scholar] [CrossRef]
  3. Yasir, M.; Jianhua, W.; Mingming, X.; Hui, S.; Zhe, Z.; Shanwei, L.; Colak, A.T.I.; Hossain, M.S. Ship detection based on deep learning using SAR imagery: A systematic literature review. Soft Comput. 2023, 27, 63–84. [Google Scholar]
  4. Bi, H.; Deng, J.; Yang, T.; Wang, J.; Wang, L. CNN-Based Target Detection and Classification When Sparse SAR Image Dataset is Available. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2021, 14, 6815–6826. [Google Scholar] [CrossRef]
  5. Li, J.; Chen, J.; Cheng, P.; Yu, Z.; Yu, L.; Chi, C. A Survey on Deep-Learning-Based Real-Time SAR Ship Detection. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2023, 16, 3218–3247. [Google Scholar]
  6. Zhang, T.; Zhang, X.; Shi, J.; Wei, S. High-speed ship detection in SAR images by improved YOLOv3. In Proceedings of the 2019 16th International Computer Conference on Wavelet Active Media Technology and Information Processing, Chengdu, China, 14–15 December 2019; IEEE: New York, NY, USA, 2019; pp. 149–152. [Google Scholar]
  7. Li, D.; Liang, Q.; Liu, H.; Liu, Q.; Liu, H.; Liao, G. A novel multidimensional domain deep learning network for SAR ship detection. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5203213. [Google Scholar]
  8. Pan, D.; Wu, Y.; Dai, W.; Miao, T.; Zhao, W.; Gao, X.; Sun, X. TAG-Net: Target Attitude Angle-Guided Network for Ship Detection and Classification in SAR Images. Remote Sens. 2024, 16, 944. [Google Scholar] [CrossRef]
  9. Du, Y.; Du, L.; Guo, Y.; Shi, Y. Semisupervised SAR ship detection network via scene characteristic learning. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5201517. [Google Scholar]
  10. Song, S.; Xu, B.; Yang, J. SAR target recognition via supervised discriminative dictionary learning and sparse representation of the SAR-HOG feature. Remote Sens. 2016, 8, 683. [Google Scholar] [CrossRef]
  11. Agrawal, A.; Mangalraj, P.; Bisherwal, M.A. Target detection in SAR images using SIFT. In Proceedings of the 2015 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Abu Dhabi, United Arab Emirates, 7–10 December 2015; pp. 90–94. [Google Scholar]
  12. Zhang, T.; Ji, J.; Li, X.; Yu, W.; Xiong, H. Ship Detection From PolSAR Imagery Using the Complete Polarimetric Covariance Difference Matrix. IEEE Trans. Geosci. Remote Sens. 2019, 57, 2824–2839. [Google Scholar]
  13. Bai, J.; Lu, J.; Xiao, Z.; Chen, Z.; Jiao, L. Generative adversarial networks based on transformer encoder and convolution block for hyperspectral image classification. Remote Sens. 2022, 14, 3426. [Google Scholar] [CrossRef]
  14. Bai, J.; Yu, W.; Xiao, Z.; Havyarimana, V.; Regan, A.C.; Jiang, H.; Jiao, L. Two-stream spatial–temporal graph convolutional networks for driver drowsiness detection. IEEE Trans. Geosci. Remote Sens. 2021, 52, 13821–13833. [Google Scholar]
  15. Bai, J.; Ding, B.; Xiao, Z.; Jiao, L.; Chen, H.; Regan, A.C. Hyperspectral image classification based on deep attention graph convolutional network. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5504316. [Google Scholar]
  16. Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A survey on vision transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 87–110. [Google Scholar] [PubMed]
  17. Li, Y.; Wang, H.; Jin, Q.; Hu, J.; Chemerys, P.; Fu, Y.; Wang, Y.; Tulyakov, S.; Ren, J. Snapfusion: Text-to-image diffusion model on mobile devices within two seconds. Adv. Neural Inf. Process. Syst. 2024, 36, 20662–20678. [Google Scholar]
  18. Wang, Y.; Bai, J.; Xiao, Z.; Chen, Z.; Xiong, Y.; Jiang, H.; Jiao, L. AutoSMC: An Automated Machine Learning Framework for Signal Modulation Classification. IEEE Trans. Inf. Forensics Secur. 2024, 19, 6225–6236. [Google Scholar] [CrossRef]
  19. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar]
  20. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  21. Bai, J.; Huang, S.; Xiao, Z.; Li, X.; Zhu, Y.; Regan, A.C.; Jiao, L. Few-shot hyperspectral image classification based on adaptive subspaces and feature transformation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5523917. [Google Scholar]
  22. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  23. He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 16000–16009. [Google Scholar]
  24. Bai, J.; Shi, W.; Xiao, Z.; Regan, A.C.; Ali, T.A.A.; Zhu, Y.; Zhang, R.; Jiao, L. Hyperspectral image classification based on superpixel feature subdivision and adaptive graph structure. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5524415. [Google Scholar]
  25. Bai, J.; Yuan, A.; Xiao, Z.; Zhou, H.; Wang, D.; Jiang, H.; Jiao, L. Class incremental learning with few-shots based on linear programming for hyperspectral image classification. IEEE Trans. Cybern. 2020, 52, 5474–5485. [Google Scholar] [CrossRef] [PubMed]
  26. Liu, Y.; Bai, J.; Sun, F. Visual localization method for unmanned aerial vehicles in urban scenes based on shape and spatial relationship matching of buildings. Remote Sens. 2024, 16, 3065. [Google Scholar] [CrossRef]
  27. Bai, J.; Zhou, Z.; Chen, Z.; Xiao, Z.; Wei, E.; Wen, Y.; Jiao, L. Cross-dataset model training for hyperspectral image classification using self-supervised learning. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5538017. [Google Scholar] [CrossRef]
  28. Bai, J.; Shi, W.; Xiao, Z.; Ali, T.A.A.; Ye, F.; Jiao, L. Achieving better category separability for hyperspectral image classification: A spatial–spectral approach. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 9621–9635. [Google Scholar] [CrossRef]
  29. Bai, J.; Liu, R.; Zhao, H.; Xiao, Z.; Chen, Z.; Shi, W.; Xiong, Y.; Jiao, L. Hyperspectral image classification using geometric spatial–spectral feature integration: A class incremental learning approach. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5531215. [Google Scholar] [CrossRef]
  30. Bai, J.; Wen, Z.; Xiao, Z.; Ye, F.; Zhu, Y.; Alazab, M.; Jiao, L. Hyperspectral image classification based on multibranch attention transformer networks. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5535317. [Google Scholar] [CrossRef]
  31. Gao, G.; Chen, Y.; Feng, Z.; Zhang, C.; Duan, D.; Li, H.; Zhang, X. R-LRBPNet: A Lightweight SAR Image Oriented Ship Detection and Classification Method. Remote Sens. 2024, 16, 1533. [Google Scholar] [CrossRef]
  32. Zhang, T.; Zhang, X. Injection of traditional hand-crafted features into modern CNN-based models for SAR ship classification: What, why, where, and how. Remote Sens. 2021, 13, 2091. [Google Scholar] [CrossRef]
  33. Yasir, M.; Liu, S.; Mingming, X.; Wan, J.; Pirasteh, S.; Dang, K.B. ShipGeoNet: SAR image-based geometric feature extraction of ships using convolutional neural networks. IEEE Trans. Geosci. Remote. Sens. 2024, 62, 5202613. [Google Scholar] [CrossRef]
  34. Chang, S.; Deng, Y.; Zhang, Y.; Zhao, Q.; Wang, R.; Zhang, K. An advanced scheme for range ambiguity suppression of spaceborne SAR based on blind source separation. IEEE Trans. Geosci. Remote. Sens. 2022, 60, 5230112. [Google Scholar] [CrossRef]
  35. Liangjun, Z.; Feng, N.; Yubin, X.; Gang, L.; Zhongliang, H.; Yuanyang, Z. MSFA-YOLO: A Multi-Scale SAR Ship Detection Algorithm Based on Fused Attention. IEEE Access 2024, 12, 24554–24568. [Google Scholar]
  36. Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  37. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [PubMed]
  38. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar]
  39. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  40. Wu, K.; Zhang, Z.; Chen, Z.; Liu, G. Object-Enhanced YOLO Networks for Synthetic Aperture Radar Ship Detection. Remote Sens. 2024, 16, 1001. [Google Scholar] [CrossRef]
  41. Liu, Y.; Ma, Y.; Chen, F.; Shang, E.; Yao, W.; Zhang, S.; Yang, J. YOLOv7oSAR: A Lightweight High-Precision Ship Detection Model for SAR Images Based on the YOLOv7 Algorithm. Remote Sens. 2024, 16, 913. [Google Scholar] [CrossRef]
  42. Li, Y.; Zhang, S.; Wang, W.Q. A lightweight faster R-CNN for ship detection in SAR images. IEEE Geosci. Remote. Sens. Lett. 2020, 19, 4006105. [Google Scholar]
  43. Zhang, T.; Zhang, X.; Li, J.; Xu, X.; Wang, B.; Zhan, X.; Xu, Y.; Ke, X.; Zeng, T.; Su, H.; et al. SAR ship detection dataset (SSDD): Official release and comprehensive data analysis. Remote Sens. 2021, 13, 3690. [Google Scholar] [CrossRef]
  44. Gui, Y.; Li, X.; Xue, L. A multilayer fusion light-head detector for SAR ship detection. Sensors 2019, 19, 1124. [Google Scholar] [CrossRef]
  45. Zhang, T.; Zhang, X.; Shi, J.; Wei, S. Depthwise separable convolution neural network for high-speed SAR ship detection. Remote Sens. 2019, 11, 2483. [Google Scholar] [CrossRef]
  46. Wang, S.; Cai, Z.; Yuan, J. Automatic SAR Ship Detection Based on Multifeature Fusion Network in Spatial and Frequency Domains. IEEE Trans. Geosci. Remote Sens. 2023, 61, 4102111. [Google Scholar] [CrossRef]
  47. Zhang, T.; Zhang, X.; Ke, X.; Liu, C.; Xu, X.; Zhan, X.; Wang, C.; Ahmad, I.; Zhou, Y.; Pan, D.; et al. HOG-ShipCLSNet: A Novel Deep Learning Network with HOG Feature Fusion for SAR Ship Classification. IEEE Trans. Geosci. Remote. Sens. 2022, 60, 5210322. [Google Scholar] [CrossRef]
  48. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  49. Zhang, J.; Xing, M.; Xie, Y. FEC: A feature fusion framework for SAR target recognition based on electromagnetic scattering features and deep CNN features. IEEE Trans. Geosci. Remote Sens. 2020, 59, 2174–2187. [Google Scholar]
  50. Suzuki, S.; Abe, K. Topological structural analysis of digitized binary images by border following. Comput. Vision Graph. Image Process. 1985, 30, 32–46. [Google Scholar]
  51. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–26 June 2005; Volume 1, pp. 886–893. [Google Scholar] [CrossRef]
  52. Haralick, R.M.; Shanmugam, K.; Dinstein, I.H. Textural features for image classification. IEEE Trans. Syst. Man Cybern. 1973, SMC-3, 610–621. [Google Scholar]
  53. Canny, J. A Computational Approach to Edge Detection. IEEE Trans. Pattern Anal. Mach. Intell. 1986, PAMI-8, 679–698. [Google Scholar] [CrossRef]
  54. Harris, C.G.; Stephens, M.J. A Combined Corner and Edge Detector. In Proceedings of the Alvey Vision Conference, Manchester, UK, 31 August–2 September 1988. [Google Scholar]
  55. Zhang, T.; Zhang, X. A polarization fusion network with geometric feature embedding for SAR ship classification. Pattern Recognit. 2022, 123, 108365. [Google Scholar]
  56. Kuiying, Y.; Lin, J.; Changchun, Z.; Jin, J. SAR automatic target recognition based on shadow contour. In Proceedings of the 2013 Fourth International Conference on Digital Manufacturing & Automation, ICDMA, Shinan, China, 29–30 June 2013; IEEE: New York, NY, USA, 2013; pp. 1179–1183. [Google Scholar]
  57. Finn, H.M.; Johnson, R.S. Adaptive detection mode with threshold control as a function of spatially sampled clutter-level estimates. RCA Rev. 1968, 29, 414–465. [Google Scholar]
  58. Steenson, B.O. Detection performance of a mean-level threshold. IEEE Trans. Aerosp. Electron. Syst. 1968, AES-4, 529–534. [Google Scholar]
  59. Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; IEEE: New York, NY, USA, 1999; Volume 2, pp. 1150–1157. [Google Scholar]
  60. Bai, J.; Ren, J.; Yang, Y.; Xiao, Z.; Yu, W.; Havyarimana, V.; Jiao, L. Object detection in large-scale remote-sensing images based on time-frequency analysis and feature optimization. IEEE Trans. Geosci. Remote. Sens. 2021, 60, 5405316. [Google Scholar]
  61. Szegedy, C.; Toshev, A.; Erhan, D. Deep neural networks for object detection. Adv. Neural Inf. Process. Syst. 2013, 26, 2553–2561. [Google Scholar]
  62. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  63. Chen, S.; Sun, P.; Song, Y.; Luo, P. Diffusiondet: Diffusion model for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 19830–19843. [Google Scholar]
  64. Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
  65. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the 14th European Conference Computer Vision—ECCV, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
  66. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  67. Cai, Z.; Vasconcelos, N. Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6154–6162. [Google Scholar]
  68. Lu, X.; Li, B.; Yue, Y.; Li, Q.; Yan, J. Grid r-cnn. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7363–7372. [Google Scholar]
  69. Wu, Y.; Chen, Y.; Yuan, L.; Liu, Z.; Wang, L.; Li, H.; Fu, Y. Rethinking classification and localization for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 10186–10195. [Google Scholar]
  70. Sun, P.; Zhang, R.; Jiang, Y.; Kong, T.; Xu, C.; Zhan, W.; Tomizuka, M.; Li, L.; Yuan, Z.; Wang, C.; et al. Sparse r-cnn: End-to-end object detection with learnable proposals. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 14454–14463. [Google Scholar]
  71. Xia, R.; Chen, J.; Huang, Z.; Wan, H.; Wu, B.; Sun, L.; Yao, B.; Xiang, H.; Xing, M. CRTransSar: A Visual Transformer Based on Contextual Joint Representation Learning for SAR Ship Detection. Remote Sens. 2022, 14, 1488. [Google Scholar] [CrossRef]
  72. Ren, X.; Bai, Y.; Liu, G.; Zhang, P. YOLO-Lite: An Efficient Lightweight Network for SAR Ship Detection. Remote Sens. 2023, 15, 3771. [Google Scholar] [CrossRef]
  73. Zhang, Y.; Hao, L.Y.; Li, Y. SD-YOLO: An Attention Mechanism Guided YOLO Network for Ship Detection. In Proceedings of the 2024 14th International Conference on Information Science and Technology (ICIST), Chengdu, China, 6–9 December 2024; IEEE: New York, NY, USA, 2024; pp. 769–776. [Google Scholar]
  74. Bayındır, C.; Altintas, A.A.; Ozaydin, F. Self-localized solitons of a q-deformed quantum system. Commun. Nonlinear Sci. Numer. Simul. 2021, 92, 105474. [Google Scholar] [CrossRef]
  75. Fujimoto, W.; Miratsu, R.; Ishibashi, K.; Zhu, T. Estimation and Use of Wave Information for Ship Monitoring. ClassNK Tech. J. 2022, 2022, 79–92. [Google Scholar]
  76. Wang, H.; Nie, D.; Zuo, Y.; Tang, L.; Zhang, M. Nonlinear ship wake detection in SAR images based on electromagnetic scattering model and YOLOv5. Remote Sens. 2022, 14, 5788. [Google Scholar] [CrossRef]
Figure 1. Differences in ship target representation between optical and SAR images.
Figure 2. Overview framework. The proposed expert feature-based convolutional neural network, mainly containing deep-network-feature extraction, SAR ship expert-feature extraction, and multi-source features association module.
Figure 3. Basic pipeline of detecting SAR ship targets using Faster R-CNN. The backbone is built to capture abstract features, FPN to extract target features for different sized targets, RPN to generate candidate boxes, and heads to make final detection.
Figure 4. Some of the expert features accessible in SAR images.
Figure 5. Feature maps of strong scattering points extracted from SAR images of different scenarios.
Figure 6. Peak feature maps extracted from SAR images of different scenarios.
Figure 7. General form of CFAR detection algorithms.
Figure 8. Schematic of CFAR detector sliding window.
Figure 9. CFAR feature maps extracted from SAR images of different scenarios.
Figure 10. Our proposed multi-source features association module. The expert features serve as the key, and the model’s abstract features serve as the query and value.
Figure 11. Some samples from the SSDD dataset. The first row shows offshore scenarios and the second row shows nearshore scenarios.
Figure 12. Detection results in simple scenarios. (Left) ground truth; (middle) Faster R-CNN; (right) our EFCNet.
Figure 13. Detection results in complex scenarios. (Top) ground truth; (middle) Faster R-CNN; (bottom) our EFCNet.
Table 1. SAR ship single-feature association network-detection results on SSDD dataset.
Feature                         Recall   AP@0.5:0.95   AP@0.5
Faster R-CNN (Baseline)         0.661    0.609         0.970

Feature association type: stack channel & self-attention mechanism
(1) Scattering                  0.648    0.529         0.974
(2) Peak                        0.666    0.610         0.973
(3) CFAR                        0.642    0.583         0.960
(4) Contour                     0.647    0.592         0.973
(5) HOG                         0.662    0.608         0.974
(6) GLCM                        0.614    0.540         0.952
(7) Canny                       0.652    0.599         0.969
(8) Harris                      0.642    0.580         0.959

Feature association type: multi-source features association module
(1) Scattering                  0.683    0.631         0.976
(2) Peak                        0.670    0.613         0.970
(3) CFAR                        0.662    0.611         0.974
(4) Contour                     0.662    0.607         0.975
(5) HOG                         0.670    0.619         0.975
(6) GLCM                        0.664    0.607         0.973
(7) Canny                       0.656    0.604         0.974
(8) Harris                      0.667    0.613         0.974
Table 2. SAR ship multiple-features association network-detection results on SSDD dataset.
Feature combination             Recall   AP@0.5:0.95   AP@0.5
Faster R-CNN (Baseline)         0.661    0.609         0.970

Feature association type: stack channel & self-attention mechanism
4 5                             0.643    0.589         0.964
1 3                             0.643    0.575         0.966
1 2 3                           0.633    0.574         0.966
4 5 8                           0.637    0.573         0.959
1 4 5 8                         0.640    0.578         0.959
1 3 4 7 8                       0.612    0.542         0.935

Feature association type: multi-source features association module
4 5                             0.665    0.613         0.961
1 3                             0.664    0.612         0.973
1 2 3                           0.667    0.611         0.971
4 5 8                           0.657    0.606         0.975
1 4 5 8                         0.666    0.608         0.974
1 3 4 7 8                       0.661    0.619         0.975

(1) Scattering, (2) Peak, (3) CFAR, (4) Contour, (5) HOG, (6) GLCM, (7) Canny, (8) Harris
Table 3. Comparison of model performance with different training set capacity.
                 Faster R-CNN (Baseline)               EFCNet (Ours)
Training data    Recall   AP@0.5:0.95   AP@0.5         Recall   AP@0.5:0.95   AP@0.5
10%              0.473    0.352         0.722          0.507    0.495         0.880
20%              0.498    0.409         0.777          0.517    0.543         0.891
30%              0.515    0.421         0.795          0.612    0.554         0.914
40%              0.566    0.475         0.833          0.631    0.579         0.925
50%              0.595    0.524         0.862          0.645    0.591         0.927
60%              0.609    0.534         0.889          0.627    0.576         0.919
70%              0.636    0.565         0.905          0.664    0.612         0.947
80%              0.638    0.568         0.907          0.670    0.615         0.958
90%              0.676    0.618         0.936          0.676    0.623         0.960
100%             0.661    0.609         0.970          0.661    0.619         0.975
Table 4. Results compared with other advanced methods. The best results are in bold.
Method                      AP@0.5:0.95   AP@0.5
EfficientNet [64]           0.507         0.866
YOLOv3 [39]                 0.563         0.915
SSD [65]                    0.558         0.948
RetinaNet [66]              0.585         0.900
Faster R-CNN [37]           0.609         0.970
Cascade R-CNN [67]          0.624         0.945
Grid R-CNN [68]             0.531         0.958
Double-Head R-CNN [69]      0.605         0.944
Sparse-RCNN [70]            0.612         0.932
CRTransSar [71]             -             0.970
YOLO-Lite [72]              -             0.944
SD-YOLO [73]                0.623         0.961
EFCNet (ours)               0.631         0.976
