1. Introduction
Ship detection is a critical task in the field of marine affairs. As a key enabling technology, it supports the development of national marine economies, safeguards maritime rights and interests, and facilitates the intelligent transformation of the shipping industry. It has wide-ranging applications, including ensuring shipping safety, maintaining maritime traffic, supporting marine environmental protection, providing rescue services, and enhancing national maritime security control [1,2,3,4].
In the field of ship detection, high-frequency surface wave radars (HFSWRs) [5,6,7] have long served as a widely used ground-based sensing technology. They offer several advantages, including real-time monitoring capability, rich motion information, and relatively low operational costs. However, their limited spatial resolution prevents the extraction of detailed image features, their positioning accuracy is modest, and their detection range does not extend to deep-sea targets. These limitations significantly constrain their applicability across broader maritime scenarios. Traditional satellite-based ship detection methods have relied on optical, infrared, and microwave remote sensing modalities. Initially, optical approaches dominated, using visible or infrared imagery to exploit color, morphology, and texture cues [8,9,10]. Despite their interpretability, these techniques are inherently vulnerable to atmospheric disturbances, and inclement weather causes marked declines in performance. Moreover, optical sensors cannot provide uninterrupted, all-weather surveillance. The continuous advancement of synthetic aperture radar (SAR) technology has made SAR the modality of choice for ship detection [11,12]. Unlike optical data, SAR is resilient to meteorological and illumination variability, penetrating clouds and fog to enable persistent, round-the-clock monitoring. Moreover, maritime ships, especially those constructed from metallic materials, exhibit distinct radiometric characteristics in microwave remote sensing, often showing strong backscattering coefficients. These properties effectively overcome the limitations of optical imaging systems.
Pioneering detection paradigms hinged on manually designed features, such as the Histogram of Oriented Gradients (HOG) [13] and the Deformable Part Model (DPM) [14], which suffered from limited generalization ability and poor adaptability to complex scenes. Subsequently, object detection was advanced by models such as the Region-based Convolutional Neural Network (R-CNN) [15], the Fast Region-based Convolutional Network (Fast R-CNN) [16], and the Faster Region-based Convolutional Network (Faster R-CNN) [17], which adopted a two-stage detection framework. While these models brought significant advances, they are relatively slow in computation and involve large numbers of parameters. Additionally, owing to their reliance on fixed anchor boxes, their localization accuracy is limited, and they struggle to detect small targets effectively. The introduction of the You Only Look Once (YOLO) series, YOLOv1–YOLOv3 [18,19,20], marked a shift toward one-stage algorithms, significantly improving computational speed compared with earlier models. Subsequently, with the release of YOLOv5–YOLOv7 [21,22,23], dynamic anchor boxes were incorporated, enhancing the boundary accuracy of target detection. The latest iterations, YOLOv8–YOLOv12 [24,25,26,27,28], have adopted an anchor-free design, which further improves the detection accuracy of small targets. Building upon these YOLO models, advanced methods for small target detection have emerged, including feature enhancement, fusion, and context-aware YOLO (FFCA-YOLO) [29] and faster and better for real-time YOLO (FBRT-YOLO) [30]. In recent years, alongside the YOLO series, transformer-based models [31] utilizing global attention mechanisms, such as the real-time detection transformer (RT-DETR) [32], and models like efficient object detection V2 (EfficientDet-V2) [33], which offer self-supervised pre-training capabilities, have been developed. Some researchers have employed pruning techniques together with knowledge distillation to achieve both compact model size and low computational cost [34]. These models provide high accuracy and advanced features. However, despite their strengths, they are characterized by large parameter counts and high computational demands, and their ability to capture small targets remains limited, making them less suitable for processing the vast amounts of data generated by maritime satellite monitoring.
Therefore, this study leverages SAR imagery to develop a lightweight, all-weather ship detection framework, prioritizing computational efficiency over mere parameter reduction, as 1M–3M parameters pose minimal burden on satellite-edge devices. We designed the Multi-Feature Channel Convolution (MFC-Conv) module to build an efficient backbone, enabling multi-scale feature propagation, lightweight residual approximation for improved gradient flow, and reparameterization into a branch-free dual-layer convolution for seamless edge deployment. The Multi-Feature Attention (MFA) module was incorporated to boost localization and classification with low overhead. Exploiting the limited scale variation in SAR vessel targets, the decoder was optimized by eliminating redundant heads, reducing computations and mitigating background noise. The main contributions are as follows:
We proposed MFC-Conv, a module that simultaneously reduces parameters and FLOPs, improving operational efficiency, and that can be reparameterized into dual convolutions for edge-device compatibility.
An efficient emulation of residual blocks was introduced, balancing lightweight design with improved training dynamics.
A novel multi-feature attention (MFA) module was proposed, which enhances the model’s localization and recognition capabilities, improving overall detection accuracy.
Exploiting SAR-specific characteristics, the decoder was streamlined, yielding reductions in parameters and computations while suppressing clutter and sharpening target responses.
The remainder of this paper proceeds as follows:
Section 2 presents the proposed method, elaborating on the structure of the key modules and the loss function used for training;
Section 3 describes the experimental environment and parameter settings, and reports comparative tests on three datasets evaluating the proposed method against other recent methods;
Section 4 discusses the effectiveness of each proposed module and examines the impact of different parameter settings on the results; and finally,
Section 5 summarizes the above results and presents the final conclusion.
2. Methodology
Our framework, MFCNet, as illustrated in Figure 1, builds upon the YOLOv8n model as the baseline, with pivotal enhancements tailored to satellite SAR imagery. First, we modified the decoder to better accommodate the characteristics of satellite SAR images, in which ship targets are typically small and may comprise only a few dozen pixels. In target detection tasks, such targets are classified as small objects. Consequently, the detect modules responsible for identifying targets of other sizes can be pruned. By removing these unnecessary detect modules, we reduce the model’s computational cost while allowing it to focus more effectively on learning the features of small ship targets. Second, we introduce MFC-Conv to construct a new backbone. This modification further reduces the model’s computational cost, with only a slight trade-off in accuracy. Lastly, we integrate the MFA module on top of MFC-Conv. By leveraging the diverse feature information provided by MFC-Conv, MFA enhances both the spatial and channel representations, significantly improving the model’s ability to recognize and localize targets. The overall architecture of the model is shown in Figure 1.
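As a schematic of the decoder pruning described above, the sketch below keeps only the high-resolution detection path of a YOLOv8-style model; the three-scale (P3/P4/P5) head structure is standard in YOLOv8, but the class and argument names here are illustrative assumptions, not the authors’ exact implementation.

```python
import torch.nn as nn

class PrunedDecoder(nn.Module):
    """Schematic: of the three standard YOLOv8 detection scales
    (P3 small, P4 medium, P5 large objects), only the high-resolution
    P3 path is kept, since SAR ship targets spanning a few dozen pixels
    all fall within the small-object regime."""
    def __init__(self, p3_head: nn.Module):
        super().__init__()
        self.p3_head = p3_head  # the single remaining small-object detect head

    def forward(self, p3_feat):
        # The P4/P5 heads and their fusion branches are removed entirely,
        # saving their parameters and FLOPs while avoiding background
        # responses from coarse scales.
        return self.p3_head(p3_feat)
```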
In the encoder section (as depicted in Figure 1a), all 3 × 3 convolutions within MFCNet are followed by a batch normalization layer and the SiLU activation function [35] to yield the resultant features. In this section, all independent 3 × 3 convolutions are configured with a stride of two, which serves to reduce the spatial resolution of the feature maps, thereby decreasing subsequent computational costs while simultaneously enlarging the receptive field of the deep model. In the decoder section (as depicted in Figure 1b), the upsampling module doubles the spatial resolution to align the feature maps with the subsequent concatenation layer in terms of spatial scale. In the detect section (as depicted in Figure 1c), the 3 × 3 convolutions across the dual branches emulate those in the primary pathway, with their outputs subsequently normalized via batch normalization and activated using SiLU. The terminal 1 × 1 convolution layer undergoes no additional post-processing.
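To make the detect section concrete, here is a minimal PyTorch sketch of the dual-branch head described above; the channel widths and the reg_max = 16 DFL bin count follow common YOLOv8 conventions and are assumptions rather than the paper’s exact settings.

```python
import torch.nn as nn

class ConvBNSiLU(nn.Module):
    """3 x 3 convolution followed by BatchNorm and SiLU, as used throughout MFCNet."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class DetectHead(nn.Module):
    """Dual-branch detect section: box-regression and classification branches,
    each ending in a plain 1 x 1 convolution with no post-processing."""
    def __init__(self, c_in, num_classes, reg_max=16):
        super().__init__()
        self.box_branch = nn.Sequential(
            ConvBNSiLU(c_in, c_in), ConvBNSiLU(c_in, c_in),
            nn.Conv2d(c_in, 4 * reg_max, 1))   # DFL bins for the 4 box sides
        self.cls_branch = nn.Sequential(
            ConvBNSiLU(c_in, c_in), ConvBNSiLU(c_in, c_in),
            nn.Conv2d(c_in, num_classes, 1))   # raw class logits

    def forward(self, x):
        return self.box_branch(x), self.cls_branch(x)
```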
2.1. Multi-Feature Channel Convolution
As shown in Figure 1e, the MFC-Conv module incorporates multiple feature channels, allowing several sets of feature information to be processed simultaneously within a single layer. In each feature channel, a 3 × 3 convolution is employed to further extract features with high parameter and computational efficiency. This design ensures the overall computational efficiency of the model while maintaining compatibility with convolution acceleration algorithms. Furthermore, MFC-Conv includes an approximation of the residual structure, enabling the model to retain essential shallow features while also facilitating the extraction of deeper features. Like a typical residual structure, it also allows gradients to propagate efficiently during training, thereby enhancing the training performance of deep models. Finally, after feature extraction, a linear projection is applied along the channel direction to filter the information requiring further extraction and to preserve the necessary shallow features. The equations are expressed as follows:
$$\left[ x_1, x_2, x_3, x_4 \right] = \mathrm{Split}(X) \qquad (1)$$
$$y_i = \mathrm{SiLU}\left(\mathrm{BN}\left(W_i * x_i + b_i\right)\right), \quad i = 1, 2, 3 \qquad (2)$$
$$Y = \mathrm{SiLU}\left(\mathrm{BN}\left(W_p * \mathrm{Concat}\left(y_1, y_2, y_3, x_4\right) + b_p\right)\right) \qquad (3)$$
where $X$ represents the model’s input and $*$ denotes the convolution operation. Through the split operation, the input is evenly divided into four parts, $x_1$ to $x_4$, along the channel dimension. The first three components are processed according to Equation (2), where the subscript $i$ denotes the index corresponding to each input branch, and $W_i$ and $b_i$ represent the weights and biases of the corresponding convolutional layers, respectively. BN denotes the batch normalization layer, and SiLU is the activation function. Finally, in Equation (3), the outputs $y_1$ to $y_3$ obtained from Equation (2), together with $x_4$ from Equation (1), are concatenated along the channel dimension for fusion. The merged result is then processed through a 1 × 1 convolutional layer with weights $W_p$ and bias $b_p$, followed by batch normalization and the SiLU activation function, to produce the final output $Y$ of the module.
The MFC-Conv module enhances parameter utilization efficiency by introducing multiple feature channels, thereby mitigating feature interference and reducing both the number of parameters and the computational cost while maintaining model performance. MFC-Conv adopts an approximation of the residual structure, which facilitates the downward transmission of shallow features and the upward propagation of error gradients during training. This design not only lowers computational costs during inference but also minimizes memory consumption by avoiding the overhead typically introduced by complex branching structures.
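The following is a minimal PyTorch sketch of MFC-Conv as given by Equations (1)–(3); the four-way split and branch structure follow the text, while the class name and channel handling are illustrative.

```python
import torch
import torch.nn as nn

class MFCConv(nn.Module):
    """Multi-Feature Channel Convolution (sketch). The input is split into
    four channel groups: the first three pass through 3x3 Conv-BN-SiLU
    branches (Eq. 2), the fourth is carried through unchanged as the
    residual approximation, and a 1x1 Conv-BN-SiLU projection fuses the
    concatenated result (Eq. 3)."""
    def __init__(self, channels: int):
        super().__init__()
        assert channels % 4 == 0, "channels must split evenly into four groups"
        c = channels // 4
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(c, c, 3, padding=1, bias=False),
                nn.BatchNorm2d(c),
                nn.SiLU(),
            )
            for _ in range(3)
        )
        self.project = nn.Sequential(
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
        )

    def forward(self, x):
        x1, x2, x3, x4 = torch.chunk(x, 4, dim=1)                   # Eq. (1)
        ys = [b(xi) for b, xi in zip(self.branches, (x1, x2, x3))]  # Eq. (2)
        return self.project(torch.cat([*ys, x4], dim=1))            # Eq. (3)
```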
Furthermore, certain edge computing devices may provide only limited support for the split operation. To overcome this limitation, the first three convolutional branches in the MFC-Conv module can be seamlessly merged into a grouped convolution. The approximation of the residual structure can also be equivalently transformed into a standard convolution by introducing a set of manually designed fixed parameters. As a result, the MFC-Conv module can be reconfigured, when necessary, into a simplified structure consisting of a 3 × 3 grouped convolution and a 1 × 1 standard convolution. This reconfiguration greatly enhances deployment efficiency on resource-constrained edge devices. The reconfiguration process is illustrated in Figure 2.
The upper part of Figure 2 illustrates the structural evolution of the MFC-Conv module during the parameter reconstruction process, while the lower part shows the corresponding changes in the parameter structure. In this figure, the orange branch represents the multi-feature channel branch, and its parameters are depicted as orange squares. The blue branch denotes the approximation of the residual structure, with the reconstructed parameters represented by blue squares, each assigned a value of one. The symbol N refers to the channel dimension of the input and output feature matrices. In the final step, the parameters from the four branches are integrated to form a grouped convolution with four groups. This design enables better adaptability to various types of deep learning edge computing devices.
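As a sketch of this merging step, the snippet below assembles the weights of the three trained 3 × 3 branch convolutions and a fixed identity kernel into a single four-group grouped convolution; it assumes batch normalization has already been folded into the branch weights, which is standard practice but not spelled out here, and all names are illustrative.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def reparameterize_mfc(branch_convs, channels):
    """Merge three per-group 3x3 convs plus an identity pass-through into
    one 3x3 grouped convolution with four groups (sketch; assumes BatchNorm
    has already been folded into the branch weights and biases)."""
    c = channels // 4
    merged = nn.Conv2d(channels, channels, 3, padding=1, groups=4, bias=True)
    weights, biases = [], []
    for conv in branch_convs:                  # trained branches 1-3
        weights.append(conv.weight)            # each: (c, c, 3, 3)
        biases.append(conv.bias if conv.bias is not None else torch.zeros(c))
    identity = torch.zeros(c, c, 3, 3)          # branch 4: fixed identity kernel
    for j in range(c):
        identity[j, j, 1, 1] = 1.0              # center tap = 1 passes x4 through
    weights.append(identity)
    biases.append(torch.zeros(c))
    merged.weight.copy_(torch.cat(weights, dim=0))   # (4c, c, 3, 3)
    merged.bias.copy_(torch.cat(biases, dim=0))
    return merged
```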
2.2. Multi-Feature Attention
To enhance the model’s spatial localization and feature recognition capabilities, this study designs the MFA module on top of the backbone constructed from the proposed MFC-Conv, leveraging its ability to transmit multi-level feature information. As illustrated in Figure 1d, the MFA module uses the diverse feature channels propagated through the backbone network to generate both spatial and channel attention weights. These weights are then cross-fused, each complementing the set of features that lacks the corresponding information. The resulting features are subsequently aggregated and fused to produce an enhanced output that simultaneously strengthens both spatial and channel representations, thereby improving the overall performance of the model. The corresponding equations are expressed as follows:
$$\left[ X_s, X_c \right] = \mathrm{Split}(X) \qquad (4)$$
$$A_s = \mathrm{Softmax}\left(\mathrm{BN}\left(W_s * X_s + b_s\right)\right) \qquad (5)$$
$$F_c = W_c \circledast X_c + b_c \qquad (6)$$
$$A_c = \mathrm{Softmax}\left(\mathrm{AvgPool}\left(F_c\right)\right) \qquad (7)$$
$$Y = A_c \odot X_s + A_s \odot X_c \qquad (8)$$
The final 1 × 1 convolutional projection layer in the backbone’s MFC-Conv maps the output to twice the required number of channels. As shown in Equation (4), the input feature $X$ is divided into two independent data blocks, $X_s$ and $X_c$, which are used to extract spatial and channel attention weights, respectively. This design prevents a single set of features from simultaneously containing spatial, channel, and object-level information, thereby providing dedicated and independent transmission pathways for spatial and channel features. As a result, the learning difficulty of the model is reduced. The computation of the spatial weights $A_s$ is expressed in Equation (5), where $W_s$ and $b_s$ denote the weights and biases of a 1 × 1 convolution; the output is then processed by a batch normalization (BN) layer and a Softmax function to obtain the spatial attention weights. The calculation of the channel weights is shown in Equations (6) and (7), where $W_c$ and $b_c$ represent the weights and biases of a 3 × 3 depthwise convolution and $\circledast$ denotes the depthwise convolution operation. Each channel’s features are independently extracted through the depthwise convolution, followed by channel-wise average pooling and normalization using the Softmax activation function to produce the channel attention weights $A_c$. Finally, the output of the MFA module is computed as shown in Equation (8), where $\odot$ denotes element-wise multiplication: the channel weights are applied to the data block $X_s$ to refine spatial features, while the spatial weights are applied to $X_c$ to enhance channel features, and the two results are combined through element-wise addition and fusion to generate the final output. This cross-attention mechanism compensates for the missing spatial or channel information in each feature subset, ultimately producing outputs with precise spatial localization and salient feature representation.
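A minimal PyTorch sketch of the MFA cross-attention of Equations (4)–(8) follows; the softmax axes (spatial positions for $A_s$, channels for $A_c$) and the single-channel spatial map reflect our reading of the text, and all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MFA(nn.Module):
    """Multi-Feature Attention (sketch). The input carries twice the target
    channel count and is split into a spatial block X_s and a channel block X_c."""
    def __init__(self, channels):
        super().__init__()
        self.spatial_conv = nn.Conv2d(channels, 1, 1)        # Eq. (5): 1x1 conv
        self.spatial_bn = nn.BatchNorm2d(1)
        self.channel_dw = nn.Conv2d(channels, channels, 3,    # Eq. (6): 3x3 depthwise
                                    padding=1, groups=channels)

    def forward(self, x):
        x_s, x_c = torch.chunk(x, 2, dim=1)                   # Eq. (4)
        b, c, h, w = x_s.shape
        # Spatial attention: softmax over spatial positions
        a_s = F.softmax(self.spatial_bn(self.spatial_conv(x_s)).view(b, 1, -1),
                        dim=-1).view(b, 1, h, w)              # Eq. (5)
        # Channel attention: per-channel average pooling, softmax over channels
        f_c = self.channel_dw(x_c)                            # Eq. (6)
        a_c = F.softmax(f_c.mean(dim=(2, 3)), dim=1).view(b, c, 1, 1)  # Eq. (7)
        # Cross-fusion: channel weights refine X_s, spatial weights refine X_c
        return a_c * x_s + a_s * x_c                          # Eq. (8)
```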
2.3. Loss Functions
In this paper, three loss functions [19] are employed to achieve accurate ship target detection. To determine the existence of a target within a predicted bounding box, the binary cross-entropy loss, referred to as the classification loss (Cls loss), is utilized. Its equation is as follows:
$$L_{cls} = -\frac{1}{N} \sum_{n=1}^{N} \left[ t_n \log \sigma(p_n) + \left(1 - t_n\right) \log\left(1 - \sigma(p_n)\right) \right] \qquad (9)$$
where $t_n$ represents the true value in one-hot encoding, $p_n$ is the category logit output by the classification branch of the detect module of the model, $\sigma$ is the sigmoid function, and $N$ represents the number of candidate regions. The remaining two losses are the box loss and the DFL loss. These two are collectively referred to as the bounding box loss (Bbox loss) and are calculated from the output of the other branch of the detect module. The box loss is used to enhance the overlap between the predicted box and the ground truth box, while the DFL loss optimizes the prediction of boundary positions and improves the robustness of the model. Their equations are as follows:
$$L_{box} = 1 - \mathrm{IoU}\left(B, B^{gt}\right) \qquad (10)$$
$$L_{DFL} = -\left[ \left(y_{i+1} - y\right) \log p_i + \left(y - y_i\right) \log p_{i+1} \right] \qquad (11)$$
The box loss is based on the intersection over union (IoU), where $B$ represents the predicted box and $B^{gt}$ represents the ground truth box. For the DFL loss, $y$ represents the value obtained by scaling the distance between one of the four boundaries and the center point of the true bounding box to the scale of the output feature; $y_i = \lfloor y \rfloor$ and $y_{i+1} = y_i + 1$ are the two adjacent integers enclosing $y$; and $p_i$ and $p_{i+1}$ are the distribution probability values predicted by the model for these two adjacent integer intervals. The overall loss is as follows:
$$L = \lambda_{box} L_{box} + \lambda_{cls} L_{cls} + \lambda_{DFL} L_{DFL} \qquad (12)$$
In this paper, the weighting parameters $\lambda_{box}$, $\lambda_{DFL}$, and $\lambda_{cls}$ are set to 7.5, 1.5, and 0.5, respectively.
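For illustration, the snippet below sketches the three losses and their weighted sum from Equations (9)–(12), assuming a YOLOv8-style DFL with discrete bins; it mirrors the formulas rather than reproducing the authors’ training code.

```python
import torch
import torch.nn.functional as F

def detection_loss(cls_logits, cls_targets, iou, dfl_logits, y,
                   lambda_box=7.5, lambda_dfl=1.5, lambda_cls=0.5):
    """Combined loss of Eq. (12).
    cls_logits/cls_targets: (N, num_classes) logits and one-hot labels.
    iou: (N,) IoU between predicted and ground-truth boxes.
    dfl_logits: (N, reg_max) distribution logits for one box side.
    y: (N,) target side distance on the feature-map scale."""
    cls_loss = F.binary_cross_entropy_with_logits(        # Eq. (9)
        cls_logits, cls_targets, reduction="mean")
    box_loss = (1.0 - iou).mean()                         # Eq. (10)
    reg_max = dfl_logits.size(-1)
    y = y.clamp(0, reg_max - 1 - 1e-4)                    # keep both bins in range
    y_lo = y.floor().long()                               # adjacent integer bins
    y_hi = y_lo + 1
    log_p = dfl_logits.log_softmax(dim=-1)                # log p over bins
    dfl_loss = -((y_hi.float() - y) * log_p.gather(1, y_lo.unsqueeze(1)).squeeze(1)
                 + (y - y_lo.float()) * log_p.gather(1, y_hi.unsqueeze(1)).squeeze(1)
                ).mean()                                   # Eq. (11)
    return lambda_box * box_loss + lambda_cls * cls_loss + lambda_dfl * dfl_loss
```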
4. Discussion
4.1. Ablation Experiment
To compare the effectiveness of each module of the model, we conducted ablation experiments on the decoder, the MFC-Conv backbone, and the MFA attention modules in each block, and verified them on the three datasets used in this paper. First, the parameter counts and computational costs under different configurations are summarized in Table 7.
It can be observed that improving the decoder and introducing the proposed MFC-Conv module significantly decrease the number of parameters and the computational cost of the model. The subsequently added MFA attention module only slightly increases the number of parameters and the computational volume. The final model reduces the parameter count by 57.8% and the computational cost by 42.7% compared with the baseline model. Table 8 presents the ablation experiment results on the three datasets.
From the ablation results on the three datasets, it can be observed that the effect of each module is consistent with expectations. The decoder, simplified according to the resolution of the satellite SAR data, reduces the computational load without any negative impact on the results. The backbone composed of MFC-Conv avoids branch structures and further reduces the computational load, causing only a slight negative effect on the results. Finally, the added MFA module enhances the localization and recognition capabilities, adding a small amount of computational cost to improve the overall performance of the model.
4.2. Efficiency of MFA
To verify the effectiveness of the MFA attention module, we compared it with several other low-computation attention modules, including channel attention, spatial attention, and CBAM [39]. The comparison results on the three datasets are shown in Table 9.
From the above results, it can be seen that the proposed MFA module makes more effective use of the information transmitted by the backbone composed of MFC-Conv modules, improving both the localization accuracy and the feature representation of targets, thereby enhancing the overall accuracy of the model. Overall, the performance of MFA is superior to that of the other low-computation attention modules compared.
4.3. Feature Channels Experiment
The MFC-Conv module proposed in this study serves as the backbone of the model, and the choice of its internal feature channel number has a substantial influence on both the model’s overall performance and its computational cost. Therefore, this subsection presents a comparative analysis of the model’s performance and computational cost under different channel configurations of the MFC-Conv module. The results across the three datasets are summarized in Table 10.
From the data presented above, it can be observed that the model achieves higher overall performance when the number of feature channels is relatively small. However, as the number of channels increases, the limited amount of information contained within each channel becomes insufficient for effective feature transmission, which negatively impacts performance. For the backbone of a lightweight model, it is essential not only to consider the final performance but also to account for variations in computational cost.
Table 11 presents the number of parameters and computational load of the model under different feature channel configurations.
From the table above, it can be observed that when the number of feature channels is set to two, four, or eight, the differences in computational cost between adjacent configurations are approximately 5.9% and 2%, respectively. This indicates that as the number of feature channels increases, the computational efficiency gains gradually diminish. Moreover, when the number of feature channels is set to two or four, the variations in model accuracy across all evaluation metrics remain within 1%. Therefore, considering the trade-off between accuracy and computational efficiency, the configuration with four feature channels is ultimately selected as the standard setting for the MFC-Conv module.
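Readers can reproduce this kind of block-level comparison directly. The sketch below generalizes the four-way split of MFC-Conv to a hypothetical k-group variant and counts its weights for k = 2, 4, 8; note that the whole-model costs in Table 11 also include layers unaffected by this choice, so these block-level counts are not directly comparable to the table.

```python
import torch
import torch.nn as nn

def count_params(module: nn.Module) -> int:
    """Total trainable parameter count of a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

class MFCConvK(nn.Module):
    """Hypothetical k-group variant of the MFC-Conv sketch from Section 2.1:
    (k - 1) branch convolutions on channels // k channels each, one identity
    pass-through group, and a 1x1 projection (BN layers omitted)."""
    def __init__(self, channels: int, k: int):
        super().__init__()
        c = channels // k
        self.branches = nn.ModuleList(
            nn.Conv2d(c, c, 3, padding=1, bias=False) for _ in range(k - 1))
        self.project = nn.Conv2d(channels, channels, 1, bias=False)

    def forward(self, x):
        xs = torch.chunk(x, len(self.branches) + 1, dim=1)
        ys = [b(xi) for b, xi in zip(self.branches, xs)]
        return self.project(torch.cat([*ys, xs[-1]], dim=1))

for k in (2, 4, 8):
    print(f"k={k}: {count_params(MFCConvK(256, k)):,} weights")
```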
4.4. The Current Shortcomings and Future Tasks
In the current implementation of the proposed model, all MFC-Conv modules use the same number of feature channels. However, the amount and complexity of feature information vary across network depths. To optimize feature transmission and further reduce computational cost, it would be beneficial to assign different channel configurations to MFC-Conv modules at different depths. Future work will investigate this aspect more comprehensively, analyzing the effects of various channel allocation strategies on ship target recognition performance on SAR data, with the goal of achieving higher detection accuracy while maintaining an optimal balance between model efficiency and computational cost.
Regarding the influence of weather conditions on model accuracy, the physical properties of the microwave signals used by satellite SAR ensure that fog and cloud cover cause virtually no interference, resulting in negligible impact on detection performance. However, heavy precipitation can scatter the microwave signal, weakening the returned echoes and slightly reducing recognition accuracy. In addition, high wind speeds can generate strong sea-surface background clutter, which may obscure the signatures of small vessels and make them difficult to detect.