Article

LWSARDet: A Lightweight SAR Small Ship Target Detection Network Based on a Position–Morphology Matching Mechanism

1 Aulin College, Northeast Forestry University, Harbin 150040, China
2 Computer and Control Engineering College, Northeast Forestry University, Harbin 150040, China
3 China Railway Tunnel Group Co., Ltd., Changchun 130022, China
4 Civil Engineering College, Shijiazhuang Tiedao University, Shijiazhuang 050043, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(14), 2514; https://doi.org/10.3390/rs17142514
Submission received: 8 June 2025 / Revised: 11 July 2025 / Accepted: 17 July 2025 / Published: 19 July 2025

Abstract

The all-weather imaging capability of synthetic aperture radar (SAR) confers unique advantages for maritime surveillance. However, ship detection under complex sea conditions still faces challenges, such as high-frequency noise interference and the limited computational power of edge computing platforms. To address these challenges, we propose a lightweight SAR small ship detection network, LWSARDet, which mitigates feature redundancy and reduces computational complexity in existing models. Specifically, based on the YOLOv5 framework, a dual lightweight strategy is adopted as follows: On the one hand, to address the limited nonlinear representation ability of the original network, a global channel attention mechanism is embedded and a feature extraction module, GCCR-GhostNet, is constructed, which effectively enhances the network’s feature extraction capability and high-frequency noise suppression while reducing computational cost. On the other hand, to reduce feature dilution and computational redundancy in traditional detection heads when focusing on small targets, we replace conventional convolutions with simple linear transformations and design a lightweight detection head, LSD-Head. Furthermore, we propose a Position–Morphology Matching IoU loss function, P-MIoU, which integrates center distance constraints and morphological penalty mechanisms to more precisely capture the spatial and structural differences between predicted and ground truth bounding boxes. Extensive experiments conducted on the High-Resolution SAR Image Dataset (HRSID) and the SAR Ship Detection Dataset (SSDD) demonstrate that LWSARDet achieves superior overall performance compared to existing state-of-the-art (SOTA) methods.

1. Introduction

With the rapid development of remote sensing technology, SAR achieves high resolution and robust performance in complex environments due to its all-weather, all-day imaging capabilities and strong penetration ability. It has been widely applied in various fields, including agricultural monitoring [1], disaster assessment [2], military reconnaissance [3], environmental monitoring [4], and resource exploration [5]. Compared with optical remote sensing, SAR can overcome meteorological limitations such as cloud cover, rain, and snow, enabling continuous and reliable monitoring under complex weather conditions. Small ship detection is one of the critical yet challenging tasks in SAR image interpretation. A small ship typically refers to a maritime target with a length less than 50 m, which occupies only a few pixels in SAR images [6]. The weak scattering characteristics and high sensitivity to speckle noise of such targets further increase the difficulty of detection. Moreover, on edge computing platforms, it is necessary to balance efficiency and accuracy while addressing issues such as weak feature extraction and limited precision in small target localization.
Ship detection represents a critical technology for ocean exploration utilizing SAR images. Most traditional ship target detection algorithms are designed for specific scenarios. They typically rely on modeling and simulating sea clutter, combined with Constant False Alarm Rate (CFAR) detection to achieve target recognition [7,8]. Such methods are highly dependent on features and rules designed based on empirical knowledge, resulting in limited generalization capabilities [9,10]. This dependency limits adaptability to the wide variability of sea surface conditions encountered in practical SAR applications. In contrast, neural network SAR ship target detection methods, which eliminate the need for manual feature design, have demonstrated superior performance. However, conventional neural network target detection methods still face several challenges. Firstly, detecting small ship targets is difficult. As shown in Figure 1a,b, against the vast ocean background, ships in SAR images usually appear small with sparse features, lacking significant texture and shape cues. Secondly, high-frequency noise interference in complex environments poses a problem. As shown in Figure 1c,d, SAR images contain not only the ocean surface but also land, islands, and other topographic features. This complex background information is often confused with the target ship, increasing the false alarm rate and missed detection rate of the detection algorithm. Thirdly, deploying models on edge computing platforms is challenging. Most general-purpose detection models pursue accuracy, which results in large parameter counts and makes real-time deployment difficult on edge computing platforms such as spaceborne and airborne systems. In summary, achieving the accurate localization of small targets while ensuring a lightweight model and real-time deployment capability has become a key research direction in the current field.
Under the condition of comprehensively dealing with the multiple challenges of small target feature sparseness, high-frequency noise interference, and limited edge computing resources, the key to solving these problems is to enhance the feature extraction capability [11] and achieve a lightweight model [12]. In terms of feature extraction ability enhancement, the main approaches include multi-scale feature fusion, attention mechanisms, and the mutual fusion of the two. Wang et al. [13] achieved accurate recognition of targets with different sizes and distances by extracting features from different network layers and fusing receptive fields of different sizes. Similarly, Liu et al. [14] proposed a feature pyramid network (FPN), which ensures effective representation of multi-scale targets by generating multi-level feature maps at different layers and fusing them. Integrating the attention mechanism into the multi-scale feature fusion module can further improve the focusing ability on small target features and enhance the discrimination between target and background. Accordingly, both Yang et al. [15] and Yang et al. [16] improve small target detection accuracy by integrating regional focusing with attention mechanisms. Li [17] introduces the CBAM attention mechanism into the shallow layers of the YOLOv7 backbone so that the model can better localize small targets. He et al. [18] proposed a target detector that introduces multi-head self-attention and multi-scale fusion, which can better extract the internal features of the target. Although methods such as attention mechanisms and multi-scale feature fusion have achieved remarkable results in small target detection, they also face some challenges. Especially when dealing with high-resolution data, the Query-Key-Value (QKV) computation in multi-scale feature extraction and attention mechanisms not only consumes substantial memory but also requires a large amount of computational resources, which leads to a complex model training process and inefficient inference, limiting the feasibility of real-time detection applications [19].
Target detection in resource-constrained environments requires a careful trade-off between accuracy and computational efficiency. Existing lightweight methods generally predict the target position and category directly through regression, which eliminates the step of generating pre-selected target boxes and significantly improves detection efficiency while maintaining high detection accuracy. Therefore, one-stage lightweight target detection methods [20,21] have emerged as a promising solution for reducing model complexity through architectural optimization while maintaining detection performance. For example, Li et al. [22] effectively achieve structural lightweighting and computing resource optimization by adopting a lightweight MobileNet-V2 backbone network and introducing dynamic semantic matching and edge self-alignment modules. Peng et al. [23] achieved excellent detection accuracy and operation speed while significantly reducing the number of parameters and computational complexity by introducing an enhanced ShuffleNetV2 backbone, a structurally re-parameterized feature pyramid, and a simplified detection head. Although lightweight target detection models are computationally efficient, in small ship target detection they often suffer from a significant loss of accuracy and robustness due to the loss of semantic information in feature representation. This makes them struggle to balance efficiency and performance in complex environments.
In order to better solve the problem of feature redundancy and computational inefficiency in SAR small ship detection models, we propose a lightweight network structure, LWSARDet. Regarding the lightweight detection network structure, we make improvements from two perspectives. For the feature extraction component, the limited nonlinear representation capability of the original GhostNet is tackled by developing GCCR-GhostNet, which leverages channel attention mechanisms to enhance feature extraction performance and high-frequency noise suppression while maintaining low computational cost. For the detection component, the feature dilution and computational redundancy problems encountered by traditional detection heads in small target scenarios are resolved through LSD-Head, a lightweight detection head that replaces conventional convolutions with simple linear transformations to achieve improved efficiency. In addition, we design a novel Position–Morphology Matching loss function, P-MIoU, which constrains ship positional features through center distance penalties and regulates ship shape features via aspect ratio penalties and angular constraint mechanisms, thereby improving the positioning accuracy of SAR small ship targets under complex conditions.
The main contributions of this paper are summarized as follows:
  • To address the limited nonlinear expression capability of the original network, we construct a feature extraction module (GCCR-GhostNet) by embedding a global channel attention mechanism, which significantly reduces the number of training parameters while enhancing the modeling of multi-scale correlations and spatial semantic relationships in the feature representation, achieving a better balance between model capability and computational efficiency.
  • To address the feature dilution and high computational cost of traditional detection heads when dealing with small targets, we design a lightweight detection head (LSD-Head) that replaces traditional convolutions with simple linear transformations, further improving network efficiency.
  • To address the challenges of low localization accuracy and shape mismatch, we propose a Position–Morphology Matching loss function (P-MIoU) that integrates center distance constraints, aspect ratio penalties, and angular limitation mechanisms to accurately reflect positional and morphological deviations, improving the localization accuracy of small targets.
  • Extensive experiments conducted on the High-Resolution SAR Image Dataset (HRSID) and the SAR Ship Detection Dataset (SSDD) demonstrate that LWSARDet achieves superior overall performance compared to existing state-of-the-art (SOTA) methods.

2. Related Work

2.1. SAR Small Target Detection Methods

In recent years, target detection has shown great potential in SAR image recognition. However, targets in SAR datasets usually have low-resolution, small size, and sparse features, making them susceptible to high-frequency noise interference, which poses challenges for effective feature extraction by traditional detection algorithms.
Given the background of the continuous evolution of target detection technology, the detection method based on feature enhancement has gradually become an important direction to improve detection performance. Among the classical algorithms, the CFAR algorithm [24,25,26] is widely used in high-resolution SAR image detection due to its advantages of simple calculation, constant false alarm probability, and fast detection speed, but it performs poorly in complex clutter environments. Subsequent research mainly focuses on eliminating the estimated clutter environment. Gao et al. [27] used an index matrix to adaptively determine the clutter environment. However, the parameter settings of this method require knowledge of the target information, making it difficult to determine the optimal values. Tian et al. [28] proposed a segmentation-based global iterative filtering algorithm. Although their detection rate performs well, it usually detects more candidate target pixels, which may lead to an increase in the false alarm rate. With the rapid development of deep learning technology, it has gradually become the main method in the field of target detection due to its powerful feature extraction capabilities [29,30,31,32]. Advanced target detection networks are typically categorized into one-stage and two-stage methods. Two-stage methods first generate region proposals to localize potential targets and then perform classification on these proposals. This kind of algorithm mainly includes R-CNN [33], Faster R-CNN [11], and Mask R-CNN [34]. However, this method obviously leads to the increase of computational complexity, which makes it difficult to satisfy detection tasks with time-sensitive requirements. In contrast, one-stage target detection algorithms eliminate the step of generating region proposals by directly predicting target locations and classes, so that the detection efficiency is significantly improved while maintaining high detection accuracy. The YOLO series of algorithms [35,36,37] are typical representatives of such algorithms. Based on YOLOv5, YOLO-Former [38] enhances feature extraction ability by introducing attention mechanisms, combining the Transformer head with an improved CBAM attention module. YOLOv8 adopts a more complex attention mechanism design. For example, UAV-YOLOv8 [39] strengthens contextual modeling by embedding the BiFormer attention module and the FFNB feature processing structure.
Although the above methods have achieved significant progress in feature extraction and enhancement, they generally suffer from high model complexity and large computational cost, which makes it challenging to balance efficiency and deployment requirements. This has motivated researchers to further explore design strategies for lightweight detection networks.

2.2. Model Lightweight Method

SAR image interpretation has a long history, and current research mainly focuses on deep learning. Although YOLO-based detectors reach speeds of up to 120 FPS [37], anchor-free detection of arbitrarily oriented ships still depends on high-resolution SAR images and degrades under complex sea conditions, which limits generalization and hinders deployment on edge devices [40]. Therefore, reducing computational complexity while maintaining high accuracy has become a key research focus.
Lightweight convolutional neural networks have gradually replaced traditional networks. Yang et al. [41] proposed a quantization method (FT-INQ) that effectively reduces the computational complexity of convolutional weights. Zheng et al. [42] adopted EfficientNet-B0 as a false-alarm elimination network, significantly reducing computational cost. Long et al. [43] designed a bi-directional densely connected module to reduce complexity while maintaining detection performance. Lan et al. [44] achieved this by designing a densely connected lightweight network, while Yang et al. [45] employed a hybrid lightweight framework that combines the advantages of CNNs and Transformers to further reduce model complexity. In small target detection, it is critical to maintain the ability to capture fine-grained features while compressing computational resources. Deng et al. [46] optimized the network structure based on ShuffleNetV2, which reduces the number of parameters and computation and improves generalization ability. Yu et al. [47] proposed a multi-scale target context feature extraction module that can effectively reduce computational complexity. Wang et al. [39] improved target detection performance by designing lightweight feature processing modules, Zhang et al. [48] further developed a combination scheme of three lightweight modules, and Kou et al. [49] also optimized small target feature extraction by integrating multiple convolutional modules.
Among many lightweight design strategies, GhostNet [50] dynamically generates redundant feature maps through cheap and efficient linear transformations, which significantly reduces computational cost while maintaining or even enhancing feature expression ability, providing a new idea for balancing model efficiency and representational power. It is widely regarded as a representative work in lightweight neural network structures. Based on the GhostNet model, many studies have carried out rich improvements and applications around the Ghost module. For example, Han et al. [51] combined GhostNet with attention mechanisms in Light-YOLOv7, effectively compressing the model size while maintaining good detection performance. Lv et al. [52] designed a feature extraction module that integrates C3 GhostV2 with the SE attention mechanism and combined it with a novel XIOU loss function to improve detection accuracy. Building upon this work, Misbah et al. [53] further introduced GhostConv and C3Ghost modules into the head and neck of YOLOv5, achieving improved detection accuracy while reducing the number of parameters. Luo et al. [54] incorporated GhostConv and WIOU improvement strategies based on YOLOv8n, significantly optimizing the model’s parameter count and computational overhead. However, despite the significant improvements achieved by GhostNet and similar architectures, YOLOv5 remains a leading solution for real-time SAR ship detection tasks due to its robust performance, fast inference speed, and modular architecture that allows for the easy integration of custom lightweight modules [55]. Compared with more recent versions like YOLOv8 and YOLOv10, YOLOv5 maintains a stable and lightweight foundation that makes it particularly well-suited for deployment in resource-constrained edge environments such as UAVs and spaceborne platforms. Although these methods have achieved notable results in structural lightweighting and efficiency improvement, most of them ignore the potential of nonlinear feature representation. Weakened nonlinear modeling limits the network’s ability to capture detailed features in complex backgrounds, in particular causing spatial offset errors at target edges and in small target localization. This lack of nonlinear expressiveness in feature representation poses a significant challenge for existing lightweight improvements, making it difficult to strike a balance between high-precision positioning requirements and low computational cost.
Therefore, to achieve a lightweight network, we replace the standard C3 structure in the backbone with our self-developed GCCR-GhostNet module, thereby reducing the computational cost of the model. By constructing the global context channel recalibration module (GCCR) and embedding it into each feature transformation stage of GhostNetv2, the nonlinear expression ability of the network is enhanced. In parallel, we substitute traditional convolutions with simple linear transformations and design a lightweight LSD-Head detection head, further boosting overall inference efficiency. Moreover, to tackle the inaccurate localization of small targets, we introduce the Position–Morphology Matching IoU loss function (P-MIoU), which constrains ship positioning deviation through the center distance and constrains ship shape characteristics through the aspect ratio and angle, thereby more accurately reflecting the spatial and shape deviations between the predicted box and the ground truth box. Consequently, the proposed method significantly improves the localization accuracy of SAR small ship targets under complex sea conditions.

3. Proposed Methods

To address the challenges of small ship target detection in SAR images, this paper proposes lightweight improvements based on the YOLOv5 structure, introducing a novel lightweight network LWSARDet to achieve a balance between accuracy and efficiency, and the structure of LWSARDet is shown in Figure 2. The structure involves feature extraction of SAR images through the GCCR-GhostNet module in the Backbone, followed by feature fusion in the neck network, and finally outputting target detection results through the LSD-Head detection head.
In the backbone of LWSARDet, we develop a feature extraction module, GCCR-GhostNet, which is specifically designed to reduce model computational cost while enhancing high-frequency noise suppression and improving feature representation efficiency. By integrating channel attention mechanisms, GCCR-GhostNet addresses the original network’s limitations in nonlinear expressiveness, effectively improving the network’s feature extraction capability and its ability to suppress high-frequency noise. Meanwhile, we design a lightweight LSD-Head detection head that employs simple linear transformations to replace conventional convolutions, further improving feature-processing efficiency and reducing computational cost when handling small targets. Furthermore, we introduce a morphological matching loss function, P-MIoU, to improve localization accuracy for small-scale targets, which incorporates center distance constraints, aspect ratio penalties, and angular restriction mechanisms. These components enable LWSARDet to effectively balance detection accuracy and computational efficiency in complex SAR imaging environments.

3.1. GhostNet with Global Channel Recalibration (GCCR-GhostNet)

YOLOv5 offers a great balance between speed and accuracy, making it ideal for real-time SAR target detection. Its modular structure allows easy integration of custom components. In SAR small ship target detection, traditional lightweight networks struggle with weak features and background clutter. The C3 module in YOLOv5 causes small target feature attenuation and noise amplification due to deep stacking and residual connections. At the same time, the linear operations in GhostNetv2 cannot capture nonlinear inter-channel dependencies, thereby limiting the network’s ability to adaptively emphasise weak target features in complex backgrounds. To overcome this shortcoming, we propose a novel feature extraction module, GCCR-GhostNet, as illustrated in Figure 3. By incorporating dynamic channel attention mechanisms and lightweight structural reconstruction strategies, GCCR-GhostNet significantly enhances high-frequency noise suppression and improves feature representation efficiency while maintaining a low parameter count suitable for deployment on resource-constrained platforms.
In GCCR-GhostNet, the GCCR module is integrated into each transformation stage of GhostNetv2, which enhances focus on weak targets via channel attention. The GCCR module functions as follows: global channel information is first extracted via global average pooling (GAP) to generate channel descriptors. This is followed by two successive 1 × 1 convolutions with a ReLU activation in between, capturing nonlinear inter-channel dependencies. Finally, a Sigmoid function produces attention weights to recalibrate the channel features.
Given an input feature map $X \in \mathbb{R}^{C \times H \times W}$, global average pooling (GAP) is applied to generate a channel descriptor vector $Z \in \mathbb{R}^{C \times 1 \times 1}$ as follows:
Z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} X_c(i, j)
The descriptor $Z$ is then passed through two successive 1 × 1 convolutional layers with a ReLU activation in between to generate the channel attention weights $W \in \mathbb{R}^{C \times 1 \times 1}$, which can be expressed as follows:
W = \sigma\left( \mathrm{Conv}_{1 \times 1}^{(2)}\left( \mathrm{ReLU}\left( \mathrm{Conv}_{1 \times 1}^{(1)}(Z) \right) \right) \right)
Finally, the output feature map $Y_c$ is obtained by channel-wise multiplication of the input feature map with the attention weights as follows:
Y_c = W_c \times X_c, \quad c = 1, 2, \ldots, C
The integration of the GCCR module significantly enhances GhostNetv2’s feature extraction capability, improving sensitivity and accuracy in detecting weak targets in SAR images.
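The recalibration described above can be summarized in a few lines of PyTorch. The following is a minimal sketch, assuming an illustrative class name (GCCR) and a channel reduction ratio r between the two 1 × 1 convolutions, neither of which is specified in the text:

```python
# Minimal sketch of the GCCR channel recalibration: GAP -> 1x1 conv -> ReLU -> 1x1 conv -> Sigmoid -> scale.
import torch
import torch.nn as nn

class GCCR(nn.Module):
    def __init__(self, channels: int, r: int = 4):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)             # Z: global average pooling per channel
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // r, 1),     # first 1x1 convolution
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, 1),     # second 1x1 convolution
            nn.Sigmoid(),                              # attention weights W in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.fc(self.gap(x))                       # (B, C, 1, 1) channel weights
        return x * w                                   # Y_c = W_c * X_c


if __name__ == "__main__":
    feat = torch.randn(2, 64, 40, 40)                  # dummy SAR feature map
    print(GCCR(64)(feat).shape)                        # torch.Size([2, 64, 40, 40])
```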

3.2. Detection Head Structure Integrating Attention Mechanism and Lightweight Convolution (LSD-Head)

In SAR small target detection tasks, the structure of the detection head plays a crucial role in maintaining and enhancing weak features. Traditional detection algorithms typically employ fully convolutional stacking approaches for feature processing in their detection heads. Although these possess multi-scale prediction capabilities, they present the following problems when handling small targets: First, continuous convolutional operations tend to dilute and lose small target feature information, particularly in deeper network layers. As semantic information becomes increasingly abstracted, fine-grained spatial details are often lost. Second, traditional detection heads lack modeling of importance differences between feature channels, failing to effectively highlight key feature channels, resulting in insensitive responses to weak small targets in complex SAR scenarios. Furthermore, fully convolutional structures often involve high computational cost and parameter redundancy, which is unfavorable for practical deployment of models in resource-constrained environments.
To further improve the expressive ability and computational efficiency of the detection head, this paper proposes a lightweight detection head module, LSD-Head, which features a simple structure and enhanced perception. The overall structure of the module is illustrated in Figure 4. In order to simplify the parameters, this module adopts a simple linear operation with a direct parameter multiplication strategy. Specifically, it integrates a linear transform with the Squeeze-and-Excitation (SE) channel attention mechanism and depthwise separable convolution, implementing direct weighted modulation of feature channels through element-wise multiplication. This approach avoids complex nonlinear transformations and redundant parameter mapping, ensuring a lightweight design while improving the response capability to weak small target features.
First, the input feature map $X \in \mathbb{R}^{C \times W \times H}$ is processed through a channel-wise 1 × 1 convolution to aggregate feature points, generating $m$ intrinsic feature maps $Y \in \mathbb{R}^{m \times W \times H}$, where $m \le n$ and $n$ is the number of output channels. The input feature map and output feature map maintain the same spatial correlation. This process can be expressed as follows:
Y = X * f
where $f \in \mathbb{R}^{m \times C \times 1 \times 1}$ is the set of 1 × 1 convolution kernels used to project the input from $C$ channels to $m$ channels, and $*$ denotes the convolution operation.
Then, each intrinsic feature map $y_i$ in $Y$ is processed by a series of cheap linear transformations $\Phi_{i,j}$ to generate $s$ linearly transformed feature maps $y_{i,j}$ as follows:
y_{i,j} = \Phi_{i,j}(y_i), \quad i = 1, \ldots, m, \quad j = 1, \ldots, s
where $y_i$ is the $i$-th intrinsic feature map and $\Phi_{i,j}$ is the $j$-th linear transformation. The final transformations $\Phi_{i,s}$ are identity mappings used to retain the original feature maps. Specifically, each $\Phi_{i,j}$ is implemented as a spatially local linear mapping without inter-channel mixing and can be mathematically expressed as a 1 × 1 convolution as follows:
\Phi_{i,j}(y_i) = w_{i,j} * y_i
where $w_{i,j}$ denotes a learnable 1 × 1 convolution kernel applied only to the $i$-th channel. This operation satisfies the properties of additivity and homogeneity, thereby preserving the linearity of the transformation while incurring minimal computational overhead.
Finally, the $m$ intrinsic feature maps and the feature maps generated by the linear transformations are concatenated to obtain the following output feature map:
Y' = \mathrm{Concat}\left[ \, Y, \; \{ y_{i,j} \} \, \right]
This linear transformation approach can significantly reduce computational cost while expanding the feature map’s expressive capability, fully capturing internal correlations between feature dimensions.
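A minimal PyTorch sketch of this cheap feature generation is given below; the class name, the expansion ratio s, and the use of grouped 1 × 1 convolutions for the per-channel transforms Φ are illustrative assumptions rather than the authors' implementation:

```python
# Sketch of Ghost-style expansion: primary 1x1 conv -> per-channel cheap linear maps -> concatenation.
import torch
import torch.nn as nn

class CheapLinearExpand(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, s: int = 2):
        super().__init__()
        m = out_ch // s                                                 # number of intrinsic maps
        self.primary = nn.Conv2d(in_ch, m, kernel_size=1, bias=False)  # Y = X * f
        # Phi_{i,j}: linear maps applied per channel (groups=m), no inter-channel mixing
        self.cheap = nn.Conv2d(m, out_ch - m, kernel_size=1, groups=m, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.primary(x)                            # intrinsic feature maps
        y_ghost = self.cheap(y)                        # cheaply generated feature maps
        return torch.cat([y, y_ghost], dim=1)          # Y' = Concat[Y, {y_ij}]


if __name__ == "__main__":
    x = torch.randn(2, 32, 40, 40)
    print(CheapLinearExpand(32, 64)(x).shape)          # torch.Size([2, 64, 40, 40])
```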
The feature map $Y'$ is then fed into the SE module for adaptive channel attention weighting. First, the information of each channel is compressed through Global Average Pooling (GAP), calculated as follows:
Z_c = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} Y'_c(i, j)
where $W \times H$ is the spatial dimension of the feature map. Through this global pooling, $\mathbf{z} = [z_1, \ldots, z_C]^{T}$ transforms the response of each channel into a real number, embodying the global semantic information and importance distribution of the channels.
Next, $\mathbf{z}$ is input into two fully connected layers for dimensionality reduction, expansion, and nonlinear mapping to generate the attention weight vector $\mathbf{s}$:
\mathbf{s} = \sigma\left( W_2 \cdot \delta\left( W_1 \cdot \mathbf{z} \right) \right)
where $W_1 \in \mathbb{R}^{d \times C}$ and $W_2 \in \mathbb{R}^{C \times d}$ are learnable weight matrices with $d = C / r$, where $r$ is the reduction ratio. Both matrices are initialized using the Kaiming uniform strategy and jointly optimized during training. Here, $\delta(\cdot)$ and $\sigma(\cdot)$ denote the ReLU and Sigmoid functions. The feature map of each channel is scaled by its corresponding attention weight $s_c$ to achieve feature channel reweighting. The specific expression is as follows:
\tilde{X}_c = s_c \cdot Y'_c
Finally, the recalibrated feature map is fed into a depthwise separable convolution module to further extract discriminative features and reduce model complexity. First, depthwise convolution is applied to each channel $\tilde{X}_c$ separately using an independent convolution kernel $k_c^{(dw)}$ for local spatial feature extraction as follows:
M_c = \tilde{X}_c * k_c^{(dw)}
where $k_c^{(dw)}$ denotes the depthwise convolution kernel applied to the $c$-th input channel. Subsequently, a pointwise 1 × 1 convolution performs channel-wise fusion of the depthwise outputs, forming the final output feature map $O \in \mathbb{R}^{C_{\mathrm{out}} \times W \times H}$. The calculation formula is:
O_{c'} = \sum_{c=1}^{C} w_{c, c'}^{(pw)} \cdot M_c
where $w_{c, c'}^{(pw)}$ represents the fusion weight from the $c$-th input channel to the $c'$-th output channel. These weights form a learnable parameter matrix $W^{(pw)} \in \mathbb{R}^{C_{\mathrm{out}} \times C}$ corresponding to a 1 × 1 pointwise convolution, initialized using the Kaiming uniform strategy and jointly optimized during network training through backpropagation.
In summary, the LSD-Head structure utilizes a simple linear transformation to replace the traditional convolution to extract spatial structural information, employs SE attention mechanism for channel attention reweighting, and uses depthwise separable convolution to enhance efficiency and expressive capability, achieving efficient and accurate detection of SAR small targets, particularly suitable for resource-constrained edge deployment scenarios.
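The SE recalibration and depthwise separable convolution stages described above can be sketched as follows in PyTorch; the class names, the reduction ratio r, and the 3 × 3 depthwise kernel size are assumptions, and in the full LSD-Head these blocks would operate on the cheaply expanded feature map Y′:

```python
# Sketch of the remaining LSD-Head stages: SE channel reweighting, then depthwise separable convolution.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)                            # z_c
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r), nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels), nn.Sigmoid(),         # s = sigma(W2 . delta(W1 . z))
        )

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = y.shape
        s = self.fc(self.gap(y).view(b, c)).view(b, c, 1, 1)
        return y * s                                                  # X~_c = s_c * Y'_c


class DWSeparable(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.dw = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)  # M_c = X~_c * k_c^(dw)
        self.pw = nn.Conv2d(in_ch, out_ch, 1)                          # O_c' = sum_c w_(c,c')^(pw) M_c

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pw(self.dw(x))


if __name__ == "__main__":
    y_prime = torch.randn(2, 64, 40, 40)                               # cheaply expanded features
    out = DWSeparable(64, 128)(SEBlock(64)(y_prime))
    print(out.shape)                                                   # torch.Size([2, 128, 40, 40])
```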

3.3. Position–Morphology Matching IoU(P-MIoU)

Traditional IoU-based methods face significant limitations in small target detection. Due to the high sensitivity of small targets to positional deviations, IoU values tend to exhibit substantial instability, resulting in poor model convergence. Furthermore, the lack of sufficient position constraints may cause misdetections, even when bounding boxes achieve relatively high IoU scores but remain misaligned, particularly for small targets. In addition, traditional IoU metrics are insensitive to differences in aspect ratio, making it difficult to distinguish targets with varied shapes (such as slender and compact), which is particularly important in tasks such as abnormal target detection.
To address these issues, this paper proposes a Position–Morphology Matching penalty IoU (P-MIoU) by incorporating geometric structural constraints along three dimensions: center point offset, aspect ratio discrepancy, and angular deviation, building upon the traditional overlap degree notion.

3.3.1. Loss Constraint Based on Center Point Position

From an intuitive geometric perspective, the distance between the center points of the predicted box and the ground truth box is a crucial indicator reflecting localization accuracy. Thus, this study first quantifies center offset using a normalized Euclidean distance metric as follows:
\rho^2 = \frac{(x_{c1} - x_{c2})^2 + (y_{c1} - y_{c2})^2}{4}
where $(x_{c1}, y_{c1})$ and $(x_{c2}, y_{c2})$ denote the centroids of the predicted box and the ground truth box, respectively.
To ensure scale invariance, we normalize using the minimum enclosing rectangle’s diagonal length as follows:
c^2 = w_c^2 + h_c^2
where $w_c$ and $h_c$ denote the average width and height of the predicted bounding box and the ground truth box, respectively.
Based on the above definitions, the center point position penalty term in this study can be defined as follows:
P_{pos} = \frac{\rho^2(b, b^{gt})}{c^2}
where $b$ represents the center point of the predicted box $B$, and $b^{gt}$ represents the center point of the ground truth box $B^{gt}$.
The position-based penalty term $P_{pos}$ effectively quantifies the degree of center point offset of the predicted box relative to the ground truth box. The larger the center point offset, the greater the impact of the penalty term on the loss, thus prompting the model to regress the position more accurately.

3.3.2. Loss Constraint Based on Morphological Matching

Beyond the positional offset of bounding boxes, this paper further introduces an aspect ratio difference term and an angular offset term, decomposing the shape inconsistency between the predicted box and the ground truth box into two independent structural components. Instead of directly measuring the scale ratio difference, the squared difference of their width-to-height ratios is used as follows:
P_{scale} = \lambda \cdot \frac{4}{\pi^2} \left( \frac{w}{h + \epsilon} - \frac{w^{gt}}{h^{gt} + \epsilon} \right)^2
where $\lambda$ is the loss term weight coefficient; $w$ and $h$ are the width and height of the predicted box; $w^{gt}$ and $h^{gt}$ are the corresponding dimensions of the ground truth box; and $\epsilon$ is a small constant to prevent division by zero.
Additionally, for angular inconsistency, an angular difference penalty is introduced to enable the model to better handle the skewness of bounding boxes in the orientation dimension. The aspect ratio term $P_{aspect}$ is defined as follows:
P_{aspect} = \lambda \cdot \frac{4}{\pi^2} \left( \arctan\frac{w}{h + \epsilon} - \arctan\frac{w^{gt}}{h^{gt} + \epsilon} \right)^2
where $\frac{w}{h + \epsilon}$ and $\frac{w^{gt}}{h^{gt} + \epsilon}$ represent the predicted and ground truth aspect ratios, respectively.
By incorporating the above term, the predicted aspect ratio is effectively aligned with the ground truth aspect ratio through the comparison of width and height in both dimensions. This alignment reduces structural aspect ratio errors and enhances the accuracy of the predicted structure. Finally, the P-MIoU term is proposed to standardize the evaluation metric and normalize the outputs to prevent exceeding the maximum permissible value. It is defined as follows:
P\text{-}MIoU = IoU - P_{pos} - P_{aspect} - P_{scale}
To ensure stability and constrain the value within valid bounds, the following normalization is applied:
P\text{-}MIoU = \max\left( 0, \min\left( 1, \; P\text{-}MIoU \right) \right)
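For illustration, the following is a minimal PyTorch sketch of the P-MIoU score for axis-aligned boxes in (x1, y1, x2, y2) format; the weight λ, the constant ε, and the exact normalization of the center distance are assumptions where the text leaves them open, and the corresponding regression loss would be 1 − P-MIoU:

```python
# Sketch of P-MIoU: IoU minus center-distance, scale, and aspect-ratio penalties, clamped to [0, 1].
import torch

def p_miou(pred: torch.Tensor, gt: torch.Tensor, lam: float = 1.0, eps: float = 1e-7) -> torch.Tensor:
    # Standard IoU for boxes of shape (..., 4)
    ix1 = torch.max(pred[..., 0], gt[..., 0])
    iy1 = torch.max(pred[..., 1], gt[..., 1])
    ix2 = torch.min(pred[..., 2], gt[..., 2])
    iy2 = torch.min(pred[..., 3], gt[..., 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    w, h = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    wg, hg = gt[..., 2] - gt[..., 0], gt[..., 3] - gt[..., 1]
    iou = inter / (w * h + wg * hg - inter + eps)

    # P_pos: normalized center-distance penalty
    cx, cy = (pred[..., 0] + pred[..., 2]) / 2, (pred[..., 1] + pred[..., 3]) / 2
    cxg, cyg = (gt[..., 0] + gt[..., 2]) / 2, (gt[..., 1] + gt[..., 3]) / 2
    rho2 = ((cx - cxg) ** 2 + (cy - cyg) ** 2) / 4
    wc, hc = (w + wg) / 2, (h + hg) / 2                  # average width and height
    p_pos = rho2 / (wc ** 2 + hc ** 2 + eps)

    # P_scale and P_aspect: morphology penalties on width-to-height ratios
    coef = lam * 4 / torch.pi ** 2
    p_scale = coef * (w / (h + eps) - wg / (hg + eps)) ** 2
    p_aspect = coef * (torch.atan(w / (h + eps)) - torch.atan(wg / (hg + eps))) ** 2

    # Combine and clamp to [0, 1]
    return (iou - p_pos - p_aspect - p_scale).clamp(0.0, 1.0)


if __name__ == "__main__":
    pred = torch.tensor([[10.0, 10.0, 22.0, 18.0]])
    gt = torch.tensor([[12.0, 10.0, 24.0, 20.0]])
    print(p_miou(pred, gt))
```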
As illustrated in Figure 5, conventional CIoU loss without positional confidence constraints exhibits significant deviation between the target centroid and predicted bounding box center, failing to accurately align with the actual target center. In contrast, our proposed P-MIoU effectively corrects this deviation by incorporating a center distance penalty term within its loss formulation. In addition, the further added shape loss term also enhances the matching degree between the frame shape and the real target, and avoids the problem of boundary tilt or mismatch caused by the lack of shape constraints. Therefore, the method achieves a better balance between center positioning accuracy and shape fitting, which is especially suitable for edge computing scenarios with high matching consistency requirements such as small target detection.

4. Results

In this section, we evaluate the detection performance of LWSARDet through a series of experiments. Firstly, the dataset, experimental setup, and evaluation metrics are introduced. Subsequently, we conduct comparative and ablation studies to validate the effectiveness of the proposed model.

4.1. Dataset

This study adopts the following two representative SAR ship detection datasets: HRSID [56] and SSDD [57]. Both are grayscale SAR datasets containing a large number of small-scale targets, which makes them highly representative benchmarks for evaluating small object detection performance in SAR images, particularly under real-time and lightweight constraints.
(1) The HRSID dataset consists of SAR images with diverse resolutions (0.5–3 m), polarization modes, sea states, maritime regions, and coastal port scenes. It includes 5604 high-resolution SAR images (800 × 800 pixels) with 16,951 annotated ship instances. Following the standard protocol, the dataset is split into a training set of 3642 images and a test set of 454 images, with the remaining images forming the validation set.
(2) The SSDD dataset is specifically designed for ship detection in SAR images. It comprises 1160 images with resolutions ranging from 1 to 15 m, containing a total of 2456 annotated ship instances and averaging 2.12 targets per image. The dataset is divided into training, validation, and test sets in a 7:2:1 ratio, making it well-suited for SAR small ship target detection studies under controlled experimental settings.

4.2. Experimental Environment

All experiments in this paper are conducted using PyTorch v2.10, CUDA v10.1, cuDNN v7.6, and Python 3.8. The hardware environment includes Ubuntu 20.04, an NVIDIA GeForce RTX A5000 GPU, and 128 GB of RAM. The training is performed for 500 epochs with a batch size of 16 and an input resolution of 640 × 640 pixels. The SGD optimizer is employed with an initial learning rate of 0.01, momentum of 0.937, and weight decay of 0.0005. A warm-up strategy is applied for the first 3 epochs, during which the learning rate and momentum are linearly ramped from initial values of 0.1 and 0.8, respectively, to their nominal settings.
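For reference, a minimal sketch of the optimizer and warm-up schedule described above is given below; the warm-up end values (the nominal learning rate and momentum) and the per-step linear interpolation are assumptions, since only the starting values are stated:

```python
# Sketch of the SGD configuration and linear warm-up described in the text.
import torch

def build_optimizer(model: torch.nn.Module) -> torch.optim.SGD:
    # SGD with the hyperparameters listed above
    return torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.937, weight_decay=0.0005)

def warmup_step(optimizer: torch.optim.SGD, progress: float,
                lr_start: float = 0.1, lr_end: float = 0.01,
                m_start: float = 0.8, m_end: float = 0.937) -> None:
    # progress in [0, 1] over the first 3 warm-up epochs; the start values follow
    # the text, the end values are assumed to be the nominal lr and momentum.
    p = min(max(progress, 0.0), 1.0)
    for group in optimizer.param_groups:
        group["lr"] = lr_start + (lr_end - lr_start) * p
        group["momentum"] = m_start + (m_end - m_start) * p
```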

4.3. Evaluation Criteria

Average precision (AP) is widely used in target detection to reflect the average accuracy of the network’s overall performance under different confidence thresholds. Mean AP (mAP) is the average AP value of all categories. In order to evaluate the performance of the target detector, we use mAP to evaluate the proposed framework.
For each category in the validation set, there are many ground truth boxes and many detection boxes produced by the network. To judge whether a detection box is consistent with the ground truth, the complete intersection over union (CIoU) between the detection box and the ground truth is used to test the sample. True Positive (TP) is the number of correctly classified targets among the positive samples. False Positive (FP) is the number of incorrectly classified targets among the positive samples. False Negative (FN) is the number of actual targets that are not detected by the network.
Precision is the ratio of TP to the sum of TP and FP. It can be formulated as follows:
Precision = \frac{TP}{TP + FP}
Recall is the ratio of TP to the sum of TP and FN. It can be formulated as follows:
Recall = \frac{TP}{TP + FN}
Hence, AP can be defined as follows:
AP = \sum_{n} \left( r_{n+1} - r_{n} \right) \max_{\tilde{r} : \, \tilde{r} \ge r_{n+1}} p(\tilde{r})
where $p(\tilde{r})$ is the measured precision at recall $\tilde{r}$.
Then, mAP is formulated as follows:
mAP = \frac{\sum_{i=1}^{m} AP_i}{m}
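The following is a small sketch of how precision, recall, and the interpolated AP above can be computed for a single class, assuming detections are sorted by confidence and IoU matching has already produced a binary TP indicator per detection:

```python
# Sketch of single-class AP with all-point interpolation, matching the formulas above.
import numpy as np

def average_precision(tp: np.ndarray, n_gt: int) -> float:
    fp = 1 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / max(n_gt, 1)                           # TP / (TP + FN)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-9)   # TP / (TP + FP)
    # add sentinel points and take the precision envelope: max precision at recall >= r
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([1.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))


if __name__ == "__main__":
    # 5 detections sorted by confidence, 4 ground truth ships
    print(average_precision(np.array([1, 1, 0, 1, 0]), n_gt=4))   # 0.6875
```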

4.4. Performance Comparison

4.4.1. Performance on HRSID Datasets

Based on Table 1, we present two model variants on the HRSID dataset that achieve an excellent balance between accuracy and efficiency within a lightweight structure. The LWSARDet-Small model requires only 12.8 GFLOPs while achieving 94.2% mAP, outperforming YOLOv5-Small by 3.4 percentage points in accuracy while reducing computational cost by 19.0%. Additionally, it achieves superior performance compared to CenterNet with 50.9% higher mAP using only 18.2% of its computational resources. Compared to LWSARDet, YOLOv12-Nano achieves a lower mAP (88.6%) and mAP50-90 (62.4%), despite being lightweight. Yue et al. [58] deliver the highest mAP (91.3%) but at a high computational cost (105.6 GFLOPs), and Guan et al. [59] offer a competitive mAP (91.0%) with better efficiency, but their model still falls behind LWSARDet in recall and localization performance. LWSARDet-Nano achieves a mAP50-90 of 67.0%, outperforming YOLOv9-Tiny (65.8%), YOLOv8 (65.3%), and YOLOv10-Nano (65.1%), while being the lightest model with only 3.4 GFLOPs and 1.63M parameters. LWSARDet-Small also maintains a competitive mAP50-90 of 66.9%, slightly behind YOLOv5-Small (67.8%) and YOLOv3 (67.6%). We demonstrate our model’s precise localization ability, which is particularly beneficial for detecting small and densely distributed targets and offers a better trade-off between accuracy and efficiency. In addition, LWSARDet-Small achieves the highest recall (89.3%) among all models, surpassing YOLOv5-Small (83.5%), YOLOv3 (78.0%), and YOLOv9-Tiny (83.5%) by a noticeable margin. LWSARDet-Nano also achieves a high recall of 87.9%, comparable to the best-performing baselines, despite its ultra-lightweight design.
At the same time, we present the confidence details of the relevant algorithms on the HRSID dataset, further validating the detection advantages of the proposed lightweight SAR small target ship detection network in complex scenarios. The detection results under different scenarios and noise conditions are shown in Figure 6. In scenarios (a) and (b), CenterNet’s detection boxes are relatively small, with some targets not fully covered and poor boundary integrity. SSD and YOLOv3-Tiny are severely affected by noise, resulting in many false alarms or complete misses. Although YOLOv8 and YOLOv10 maintain good recognition performance on large targets, they still exhibit some target positioning deviations when dealing with closely adjacent areas and complex backgrounds. Meanwhile, in terms of small target detection confidence, the proposed LWSARDet-Small and LWSARDet-Nano show remarkable improvements. In scenario (c), YOLO series and CenterNet methods tend to misidentify the background as targets and are insensitive to potentially overlapping targets or miss real targets close to the shore. In contrast, LWSARDet-Small and LWSARDet-Nano demonstrate stronger anti-interference ability, accurately distinguishing multiple adjacent and overlapping small targets, showing excellent detection accuracy and discrimination ability. Notably, LWSARDet-Small and LWSARDet-Nano can comprehensively capture all small-scale targets without any misses, with clear prediction box boundaries. Especially in cases with multiple closely adjacent targets, their discrimination ability and spatial separation performance are significantly better than other methods, reflecting outstanding small target modeling capability.

4.4.2. Performance on SSDD Datasets

As shown in Table 2, our method achieves an excellent balance between accuracy and efficiency in lightweight structures. The LWSARDet-Small model requires only 12.8 GFLOPs, reducing computational cost by 96.4% compared to the traditional detector SSD. This model achieves a mAP of 92.1%, surpassing all comparative methods. Compared to other lightweight models, LWSARDet-Small outperforms YOLOv9-Tiny by 2.9%. In terms of parameter efficiency, this model has only 6.45M parameters, an 89.5% reduction compared to YOLOv3. More notably, the LWSARDet-Nano model achieves 90.4% mAP with only 3.4 GFLOPs. Among similar lightweight models, it reduces computation by 58.5% compared to YOLOv10-Nano while improving accuracy by 4.3%. This model has only 1.63M parameters, which is the lightest structure in the table. While YOLOv12-Nano is lightweight with only 5.8 GFLOPs, it achieves a lower mAP (86.5%) and mAP50-90 (62.4%) compared to LWSARDet. Yue et al. [58] achieve the highest mAP among the comparison methods (91.8%), but their model comes at a high computational cost (105.6 GFLOPs), making it less efficient. On the other hand, the model of Guan et al. [59] provides a competitive mAP (90.0%) with better efficiency, but it still lags behind LWSARDet in recall (90.5%) and localization precision. Our method performs well in terms of recall (LWSARDet-Small reaches 90.5%, LWSARDet-Nano reaches 89.7%), reflecting sensitive detection of small SAR targets. Meanwhile, the precision values (81.2% and 78.2%) reflect the feature representation characteristics of lightweight structures in complex SAR scenarios. Although precision decreases on the SSDD dataset, this performance distribution reflects the optimization focus on small ship target detection tasks. Under resource constraints, priority is given to ensuring comprehensive target capture capability, which is crucial for SAR applications where missing key targets is unacceptable. In addition, LWSARDet-Small achieves the best mAP50-90 score of 70.4%, surpassing YOLOv3-CSP (69.5%) and YOLOv3 (68.9%), and representing a substantial improvement over YOLOv5-Small (67.3%) and YOLOv8 (65.5%), which reflects robustness across different IoU thresholds and is critical for evaluating small target localization precision. Even LWSARDet-Nano reaches a competitive 67.7%, outperforming YOLOv9-Tiny (66.6%) while maintaining ultra-low computational complexity. The significant improvement in overall mAP performance fully validates the proposed method’s outstanding performance in ultra-lightweight scenarios.
Similarly, as shown in Figure 7, we conducted a performance comparison of various models on SSDD. In scenario (a), CenterNet and SSD only detected some targets, with obvious missed detections. YOLOv3-Tiny completely missed all targets. In contrast, although YOLOv8 and YOLOv10 detected some targets, their confidence scores were generally low. Our proposed LWSARDet-Small and LWSARDet-Nano not only successfully detected multiple targets but also significantly improved confidence scores, which fully proves the sensitivity of the algorithm to small targets. In particular, scenario (b) is a representative case of low-resolution ship detection, where the target only has sparse features. Each algorithm can detect the main target, but the confidence is different. The confidence of the traditional method and the early version of the YOLO series is generally between 0.52 and 0.63, while our LWSARDet-Small and LWSARDet-Nano both reach a high confidence of 0.92. The excellent performance of our model in this setting underscores its superior sensitivity and robustness for extremely low-resolution target detection tasks. In scenario (c), although YOLOv8 and YOLOv10 showed improved detection performance, their confidence scores remained low. LWSARDet-Small and LWSARDet-Nano not only accurately located the main targets but also demonstrated excellent target separation ability and anti-interference performance in dense target areas.
Overall, the performance of the LWSARDet series models on the HRSID and SSDD datasets verifies their effectiveness in SAR image ship detection tasks, especially in small target detection, complex background suppression, and high-confidence output.

4.4.3. Attention Visualization Analysis

To more intuitively reflect the advantages of LWSARDet, in Figure 8, we generated attention mechanism heat maps to visualize the detection results of LWSARDet-Small and YOLOv5-Small. The green boxes indicate the actual positions of the ships, and the darker colors correspond to higher heat values.
Figure 8a,b show isolated scenarios under sea surface conditions with high-frequency noise and noise-free background. Compared to YOLOv5-Small, our method selects center points that are closer to the true centroid of the target, which demonstrates superior robustness to high-frequency interference and ensures more accurate responses around small targets. Figure 8c shows a complex background scenario with mixed docks and ships. YOLOv5-Small generates considerable redundant attention in the background regions, while LWSARDet precisely focuses on the ship hull edges, better conforming to the true target boundaries with more stable shape assessment. This demonstrates excellent small ship target detection and target separation capabilities, especially in cluttered scenes where targets may be partially occluded or visually fused with background structures. In contrast, YOLOv5s exhibits a broader detection focus region that is less precise than LWSARDet. This highlights the excellent balance achieved by P-MIoU between target localization accuracy and shape selection, which makes it particularly suitable for edge computing platforms that require high matching stability, such as those used for small ship target detection tasks.

4.4.4. Computational Efficiency Analysis

Figure 9 presents the per-epoch training time and overall testing time for CenterNet, SSD, the YOLO series, and the proposed model on both datasets. It can be observed that LWSARDet-Small and LWSARDet-Nano significantly reduce both training and testing time while maintaining detection ability, demonstrating superior efficiency. From an overall trend perspective, as the model structures become progressively lighter, both the training and testing time consumption are significantly reduced. Compared to traditional heavyweight models like CenterNet, the YOLO series has gradually enhanced efficiency through its iterations yet still presents opportunities for further refinement. The LWSARDet series of models we proposed demonstrates remarkable advantages in time efficiency. On the HRSID dataset, LWSARDet-Small achieved a training time of 124 s and a testing time of only 21 s. LWSARDet-Nano further compressed training time to 63 s and reduced testing time to 14 s, achieving significant time reductions of 82.5% and 83.1% compared to CenterNet. On the SSDD dataset, the efficiency improvement is even more significant: our approach demonstrates clear advantages even when compared to a similarly lightweight model such as YOLOv10-Nano. On the HRSID dataset, LWSARDet-Nano achieves a 55.0% reduction in training time and a 61.1% reduction in testing time compared to YOLOv10-Nano. These results fully demonstrate that our method significantly improves training and inference efficiency while maintaining detection accuracy, making it particularly suitable for resource-constrained edge deployment scenarios.

4.5. Ablation Study

4.5.1. Ablation on Attention Network

To evaluate the effectiveness of different attention mechanisms, we conducted ablation studies on the HRSID dataset, as summarized in Table 3. The baseline model without attention achieves a mAP of 92.8% and a recall of 84.9%. Introducing attention consistently improves performance across all variants. The LSD-Head module delivers the best overall results, increasing mAP by 1.4%, mAP50-90 by 3.2%, and recall by 5.1%, with nearly no increase in parameters, demonstrating excellent efficiency. In comparison, the CA module raises mAP to 93.4%, but shows limited improvement in recall. The MCA module balances recall and parameters, though it slightly reduces mAP50-90. The PSA module achieves the highest precision, but at the cost of increased complexity, with 7.8% higher GFLOPs and 10.7% more parameters, limiting its deployment efficiency. Overall, the results highlight the superiority of lightweight channel attention mechanisms, especially the LSD-Head, in achieving a better trade-off between accuracy and efficiency. This makes them more suitable for practical applications. In particular, the notable improvements in mAP50-90 and recall demonstrate the enhanced capability of LSD-Head to accurately localize and capture small-scale ship targets.

4.5.2. Ablation Experiments on IoU Loss Function

In this section, the influence of different bounding box regression loss functions on target detection performance is systematically evaluated through the IoU ablation study. As shown in Table 4, experimental results demonstrate that P-MIoU achieves the best overall performance, with a mAP of 0.942, which is 1.7 % higher than the baseline EIoU. Notably, it achieves a 6.0% increase in the high-threshold-sensitive mAP50-90 metric, indicating that its incorporation of center distance constraints and aspect ratio penalty effectively enhances localization robustness. It is worth noting that while XIoU outperforms SIoU and EIoU in basic mAP, it exhibits a clear trade-off between recall and precision. In contrast, P-MIoU leverages the integration of center distance constraints, aspect ratio penalties, and angular limitations to accurately reflect positional and morphological deviations, thereby improving the localization accuracy of small targets.

4.5.3. Overall Impact of Components

Sensitivity Analysis on LWSARDet-Small
The ablation study conducted on the HRSID dataset proves the effectiveness of our model design. Multiple experimental settings were evaluated to assess the impact of individual components, and the results are summarized in Table 5. Integrating the GCCR-GhostNet module into the backbone significantly improves detection performance. While reducing the computational cost by 18.99%, mAP is increased by 1.98%, highlighting its advantages in capturing SAR small ship targets. Further enhancement is achieved by introducing the LSD-Head module, which enhances the feature dependence between channels and increases mAP to 93.2%. Replacing the standard IoU loss with the proposed P-MIoU yields additional gains: mAP improves by 1.07%, with simultaneous increases in recall and precision, confirming its effectiveness in balancing detection accuracy and missed detections, particularly beneficial in small target detection tasks. In conclusion, these components synergistically contribute to the development of a highly efficient and accurate small ship target detection model.
As shown in Table 6, under consistent computational costs, the improved model demonstrates significant performance advantages. GCCR-GhostNet achieves a mAP increase from 93.7 to 94.2 and mAP50-90 improvement from 65.0 to 66.9. Notably, these performance gains are achieved with no additional parameter cost.
Sensitivity Analysis on LWSARDet-Nano
We further conducted ablation studies on the Nano version using the SSDD dataset. As shown in Table 7, integrating the GCCR-GhostNet module into the backbone improves the mAP by 1.8% while maintaining the same computational cost. The mAP50-90 significantly increases to 92.3%, and precision improves from 81.8% to 87.0%, confirming the enhanced feature extraction capability of GCCR-GhostNet, particularly for small ship target detection. However, the model still did not fully meet lightweight constraints. By replacing the standard IoU with P-MIoU, the computational load is significantly reduced by 17.07%, though the mAP temporarily declines. Introducing the LSD-Head module compensates for this loss by enhancing channel-wise feature interactions, resulting in a recovery of mAP to 67.7%, an increase in mAP50-90 to 90.4%, and stable recall at 89.7%. These results demonstrate the LSD-Head module’s effectiveness in restoring performance in lightweight models. The combined use of GCCR-GhostNet, LSD-Head, and P-MIoU establishes a more robust and efficient small ship target detection model under the Nano configuration, offering a promising solution for lightweight deployment in small target detection tasks.
Table 8 presents the results of LWSARDet-Small and LWSARDet-Nano on the HRSID and SSDD datasets. LWSARDet-Nano significantly reduces computational complexity and parameter count, while its detection performance remains close to that of LWSARDet-Small; in particular, on the HRSID dataset the mAP drops by only 0.1%. In addition, LWSARDet-Small shows a decline in precision on the SSDD dataset, whereas LWSARDet-Nano achieves a higher recall on SSDD than on HRSID. Overall, LWSARDet-Nano maintains competitive detection performance even under constrained computational resources.

5. Discussion

To further validate the effectiveness of the proposed P-MIoU loss for small target detection, we analyze its penalization behavior under controlled conditions. As illustrated in Figure 10, for a fixed positional deviation (e.g., a 5-pixel shift), the resulting IoU drop for small targets is considerably greater than for large targets. This stems from the limited spatial extent of small objects, where even minor localization errors substantially reduce the overlap area. P-MIoU strengthens this penalization by incorporating center distance, aspect ratio, and angular constraints, thereby providing stronger and more geometry-aware supervision. This design not only compensates for the inherent instability of IoU for small objects but also improves localization accuracy in limited-resolution SAR imagery.
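A short worked example, using assumed axis-aligned boxes of 10 × 10 and 100 × 100 pixels (illustrative sizes, not measurements from the datasets), makes this size dependence explicit for a prediction of the correct size shifted horizontally by 5 pixels:

```latex
\mathrm{IoU}_{10\times10}
  = \frac{(10-5)\cdot 10}{2\cdot 10\cdot 10-(10-5)\cdot 10}
  = \frac{50}{150} \approx 0.33,
\qquad
\mathrm{IoU}_{100\times100}
  = \frac{(100-5)\cdot 100}{2\cdot 100\cdot 100-(100-5)\cdot 100}
  = \frac{9500}{10500} \approx 0.90.
```

The same 5-pixel error removes roughly two thirds of the overlap for the small box but only about 10% for the large one, which is exactly the asymmetry that the additional position and morphology penalties are intended to counteract.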
Although LWSARDet achieves favorable results in both detection accuracy and computational complexity, there remains room for further optimization. In future work, we plan to refine the model structure, loss function, and detection head, with deeper customization tailored specifically to the characteristics of ships in SAR images.
Furthermore, the LSD-Head module introduced in LWSARDet aims to enhance overall performance by strengthening inter-channel feature dependencies. However, for SAR images with extremely complex backgrounds, its effectiveness may be suboptimal; in computationally constrained environments in particular, the limits imposed by the computational budget can compromise overall performance. This limitation is also reflected in the performance gap observed across datasets: the proposed model shows a difference of about 10% in detection accuracy between the HRSID and SSDD datasets. This gap is mainly caused by differences in the imaging conditions of the two datasets, such as resolution, noise level, background complexity, and sensor parameters, which affect our model and other detection algorithms alike. This result highlights the importance of robustness and versatility in practical SAR applications and points to the need to improve adaptability across different SAR domains and acquisition environments.

6. Conclusions

In this paper, we introduced LWSARDet, a lightweight target detection network designed explicitly for small target detection in SAR images on satellite-based edge computing platforms. The proposed method introduces the GCCR-GhostNet feature extraction module, which enhances discriminative feature extraction through an efficient global channel attention mechanism. Meanwhile, a lightweight LSD-Head detection module is presented, which replaces standard convolutions with simple linear transformations to reduce computational cost. In addition, we develop the P-MIoU loss function, which improves spatial localization precision through position–morphology matching. Extensive experiments validate the effectiveness and efficiency of LWSARDet. Compared with conventional and state-of-the-art methods, our approach demonstrates enhanced detection accuracy, particularly for small and densely distributed targets, while maintaining lower computational demands in terms of floating-point operations and parameter size. The model also produces more confident predictions, with closer alignment between predicted and actual target locations. Overall, LWSARDet achieves superior performance compared to existing SOTA methods, demonstrating its potential as a practical solution for real-time monitoring on satellite-based edge computing platforms.

Author Contributions

Conceptualization, Y.Z. and Y.D.; Data curation, Y.Z., Q.W. and C.L.; Formal analysis, Y.Z.; Funding acquisition, Y.D.; Investigation, Y.Z., Q.W., C.L., T.W. and X.S.; Methodology, Y.Z.; Project administration, Y.Z. and Y.D.; Resources, Y.M. and T.W.; Software, Y.Z.; Supervision, Y.D., Y.M. and X.S.; Validation, Y.Z.; Visualization, Y.Z., Q.W. and C.L.; Writing—original draft, Y.Z.; Writing—review and editing, Y.Z. and Y.D.; Y.D., Y.M. and X.S. provided valuable suggestions for the overall concept of the paper and algorithm model. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Fundamental Research Funds for the Central Universities (2572025BR14), National Natural Science Foundation of China (42401166), and the Polar Environment Monitoring and Public Governance Key Laboratory of Ministry of Education Open Fund (202405).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

Author Tengfei Wang was employed by the company China Railway Tunnel Group Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Liu, C.A.; Chen, Z.; Hao, P.; Li, K.; Wang, X. LAI Retrieval of Winter Wheat using Simulated Compact SAR Data through GA-PLS Modeling. In Proceedings of the IGARSS 2018–2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 3840–3843. [Google Scholar]
  2. Chen, Y.; Liu, X. Research on Methods of Quick Monitoring and Evaluating of Flood Disaster in Poyang Lake Area Based on RS and GIS. In Proceedings of the 2008 IEEE International Symposium on Knowledge Acquisition and Modeling Workshop, Wuhan, China, 21–22 December 2008; pp. 1105–1108. [Google Scholar]
  3. Gu, Y.; Tao, J.; Feng, L.; Wang, H. Using VGG16 to Military Target Classification on MSTAR Dataset. In Proceedings of the 2021 2nd China International SAR Symposium (CISS), Shanghai, China, 3–5 November 2021; pp. 1–3. [Google Scholar]
  4. Moroni, D.; Pieri, G.; Salvetti, O.; Tampucci, M. Proactive marine information system for environmental monitoring. In Proceedings of the OCEANS 2015-Genova, Genova, Italy, 18–21 May 2015; pp. 1–5. [Google Scholar]
  5. Arguedas, V.F. Texture-based vessel classifier for electro-optical satellite imagery. In Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada, 27–30 September 2015; pp. 3866–3870. [Google Scholar]
  6. Jiang, Z.; Wang, Y.; Zhou, X.; Chen, L.; Chang, Y.; Song, D.; Shi, H. Small-Scale Ship Detection for SAR Remote Sensing Images Based on Coordinate-Aware Mixed Attention and Spatial Semantic Joint Context. Smart Cities 2023, 6, 1612–1629. [Google Scholar] [CrossRef]
  7. El-Darymli, K.; Gill, E.W.; Mcguire, P.; Power, D.; Moloney, C. Automatic target recognition in synthetic aperture radar imagery: A state-of-the-art review. IEEE Access 2016, 4, 6014–6058. [Google Scholar] [CrossRef]
  8. Qian, G.; Haipeng, W.; Feng, X. Research progress on aircraft detection and recognition in SAR imagery. J. Radars 2020, 9, 497–513. [Google Scholar]
  9. Gong, S.; Xu, S.; Zhou, L.; Zhu, J.; Zhong, S. Deformable atrous convolution nearshore SAR small ship detection incorporating mixed attention. J. Image Graph. 2022, 27, 3663–3676. [Google Scholar] [CrossRef]
  10. Ruan, C.; Guo, H.; An, J. SAR Inshore Ship Detection Algorithm in Complex Background. J. Image Graph. 2021, 26, 1058–1066. [Google Scholar] [CrossRef]
  11. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  12. Wang, Y.; Qian, Y.; He, X. Design and Implementation of Lightweight Neural Network Inference Accelerator Based on FPGA. In Proceedings of the 2024 International Conference on Control, Electronic Engineering and Machine Learning (CEEML), Kuala Lumpur, Malaysia, 22–24 November 2024; pp. 98–104. [Google Scholar]
  13. Wang, P.; Wang, W.; Wang, H. Infrared unmanned aerial vehicle targets detection based on multi-scale filtering and feature fusion. In Proceedings of the 2017 3rd IEEE International Conference on Computer and Communications (ICCC), Chengdu, China, 13–16 December 2017; pp. 1746–1750. [Google Scholar]
  14. Liu, D.; Liang, J.; Geng, T.; Loui, A.; Zhou, T. Tripartite Feature Enhanced Pyramid Network for Dense Prediction. IEEE Trans. Image Process. 2023, 32, 2678–2692. [Google Scholar] [CrossRef] [PubMed]
  15. Yang, F.; Fan, H.; Chu, P.; Blasch, E.; Ling, H. Clustered Object Detection in Aerial Images. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
  16. Yang, X.; Yang, J.; Yan, J.; Zhang, Y.; Zhang, T.; Guo, Z.; Sun, X.; Fu, K. SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8231–8240. [Google Scholar]
  17. Li, R. Improved YOLOv7 Aerial Small Target Detection Algorithm Based on Hole Convolutional ASPP. In Proceedings of the 2023 3rd International Conference on Electronic Information Engineering and Computer Communication (EIECC), Wuhan, China, 22–24 December 2023; pp. 663–666. [Google Scholar]
  18. He, Y.; Zhang, X.; Zheng, S.; Peng, L.; Chen, Y. Object Detector with Multi-head Self-attention and Multi-scale Fusion. In Proceedings of the 2022 International Conference on Algorithms, Data Mining, and Information Technology (ADMIT), Xi’an, China, 23–25 September 2022; pp. 147–154. [Google Scholar]
  19. Chung, W.Y.; Lee, I.H.; Park, C.G. Lightweight Infrared Small Target Detection Network Using Full-Scale Skip Connection U-Net. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [Google Scholar] [CrossRef]
  20. Ou, J.; Li, X.; Sun, Y.; Shi, Y. A Configurable Hardware Accelerator Based on Hybrid Dataflow for Depthwise Separable Convolution. In Proceedings of the 2022 4th International Conference on Advances in Computer Technology, Information Science and Communications (CTISC), Suzhou, China, 22–24 April 2022; pp. 1–5. [Google Scholar]
  21. Romero, A.; Gatta, C.; Camps-Valls, G. Unsupervised deep feature extraction of hyperspectral images. In Proceedings of the 2014 6th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), Lausanne, Switzerland, 24–27 June 2014; pp. 1–4. [Google Scholar]
  22. Li, G.; Liu, Z.; Zhang, X.; Lin, W. Lightweight Salient Object Detection in Optical Remote-Sensing Images via Semantic Matching and Edge Alignment. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–11. [Google Scholar] [CrossRef]
  23. Peng, W.; Zhang, L.; Zhao, L.; Li, X. Lite-ODNet: A Lightweight Object Detection Network. In Proceedings of the 2024 3rd International Conference on Artificial Intelligence, Human-Computer Interaction and Robotics (AIHCIR), Hong Kong, China, 15–17 November 2024; pp. 229–236. [Google Scholar]
  24. Li, L.; Du, L.; Wang, Z. Target Detection Based on Dual-Domain Sparse Reconstruction Saliency in SAR Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 4230–4243. [Google Scholar] [CrossRef]
  25. Gan, R.; Wang, J. Distribution-based CFAR detectors in SAR images. J. Syst. Eng. Electron. 2006, 17, 717–721. [Google Scholar] [CrossRef]
  26. Li, J.x.; Chen, H. SAR image preprocessing based on the CFAR and ROA algorithm. In Proceedings of the IET International Radar Conference 2013, Xi’an, China, 14–16 April 2013; pp. 1–4. [Google Scholar]
  27. Gao, G.; Liu, L.; Zhao, L.; Shi, G.; Kuang, G. An Adaptive and Fast CFAR Algorithm Based on Automatic Censoring for Target Detection in High-Resolution SAR Images. IEEE Trans. Geosci. Remote Sens. 2009, 47, 1685–1697. [Google Scholar] [CrossRef]
  28. Tian, S.; Wang, C.; Zhang, H. A segmentation based global iterative censoring scheme for ship detection in synthetic aperture radar image. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 6513–6516. [Google Scholar]
  29. Nercessian, S.; Panetta, K.; Agaian, S. Improving edge-based feature extraction using feature fusion. In Proceedings of the 2008 IEEE International Conference on Systems, Man and Cybernetics, Singapore, 12–15 October 2008; pp. 679–684. [Google Scholar]
  30. Shin, H.C.; Roth, H.R.; Gao, M.; Lu, L.; Xu, Z.; Nogues, I.; Yao, J.; Mollura, D.; Summers, R.M. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med. Imaging 2016, 35, 1285–1298. [Google Scholar] [CrossRef] [PubMed]
  31. Xue, Z.; Chen, W.; Li, J. Enhancement and Fusion of Multi-Scale Feature Maps for Small Object Detection. In Proceedings of the 2020 39th Chinese Control Conference (CCC), Shenyang, China, 27–29 July 2020; pp. 7212–7217. [Google Scholar]
  32. Yang, Z.; Liu, Y.; Gao, Z.; Wen, G.; Emma Zhang, W.; Xiao, Y. Deep Convolutional Feature Enhancement for Remote Sensing Object Detection. IEEE Geosci. Remote Sens. Lett. 2024, 21, 1–5. [Google Scholar] [CrossRef]
  33. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  34. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  35. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar] [CrossRef]
  36. Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 2778–2788. [Google Scholar]
  37. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475. [Google Scholar]
  38. Dai, Y.; Liu, W.; Wang, H.; Xie, W.; Long, K. YOLO-Former: Marrying YOLO and Transformer for Foreign Object Detection. IEEE Trans. Instrum. Meas. 2022, 71, 1–14. [Google Scholar] [CrossRef]
  39. Wang, G.; Chen, Y.; An, P.; Hong, H.; Hu, J.; Huang, T. UAV-YOLOv8: A small-object-detection model based on improved YOLOv8 for UAV aerial photography scenarios. Sensors 2023, 23, 7190. [Google Scholar] [CrossRef] [PubMed]
  40. Li, X.; Chen, P.; Yang, J.; An, W.; Zheng, G.; Luo, D.; Lu, A.; Wang, Z. TKP-Net: A Three Keypoint Detection Network for Ships Using SAR Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 364–376. [Google Scholar] [CrossRef]
  41. Yang, C.; Li, B.; Wang, Y. A Fully Quantitative Scheme with Fine-grained Tuning Method for Lightweight CNN Acceleration. In Proceedings of the 2019 26th IEEE International Conference on Electronics, Circuits and Systems (ICECS), Genoa, Italy, 27–29 November 2019; pp. 125–126. [Google Scholar]
  42. Zheng, X.; Feng, Y.; Shi, H.; Zhang, B.; Chen, L. Lightweight convolutional neural network for false alarm elimination in SAR ship detection. In Proceedings of the IET International Radar Conference (IET IRC 2020), Online, 4–6 November 2020; Volume 2020, pp. 287–291. [Google Scholar] [CrossRef]
  43. Zhou, L.; Wei, S.; Cui, Z.; Fang, J.; Yang, X.; Ding, W. Lira-YOLO: A lightweight model for ship detection in radar images. J. Syst. Eng. Electron. 2020, 31, 950–956. [Google Scholar] [CrossRef]
  44. Lan, R.; Sun, L.; Liu, Z.; Lu, H.; Pang, C.; Luo, X. MADNet: A Fast and Lightweight Network for Single-Image Super Resolution. IEEE Trans. Cybern. 2021, 51, 1443–1453. [Google Scholar] [CrossRef] [PubMed]
  45. Yang, X.; Zhang, S.; Duan, S.; Yang, W. An Effective and Lightweight Hybrid Network for Object Detection in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–11. [Google Scholar] [CrossRef]
  46. Deng, L.; Bi, L.; Li, H.; Chen, H.; Duan, X.; Lou, H.; Zhang, H.; Bi, J.; Liu, H. Lightweight aerial image object detection algorithm based on improved YOLOv5s. Sci. Rep. 2023, 13, 7817. [Google Scholar] [CrossRef] [PubMed]
  47. Ma, T.; Yang, Z.; Liu, B.; Sun, S. A Lightweight Infrared Small Target Detection Network Based on Target Multiscale Context. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [Google Scholar] [CrossRef]
  48. Zhang, Y.; Ye, M.; Zhu, G.; Liu, Y.; Guo, P.; Yan, J. FFCA-YOLO for Small Object Detection in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–15. [Google Scholar] [CrossRef]
  49. Kou, R.; Wang, C.; Yu, Y.; Peng, Z.; Yang, M.; Huang, F.; Fu, Q. LW-IRSTNet: Lightweight Infrared Small Target Segmentation Network and Application Deployment. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–13. [Google Scholar] [CrossRef]
  50. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. Ghostnet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 1580–1589. [Google Scholar]
  51. Ye, T.; Qin, W.; Zhao, Z.; Gao, X.; Deng, X.; Ouyang, Y. Real-time object detection network in UAV-vision based on CNN and transformer. IEEE Trans. Instrum. Meas. 2023, 72, 1–13. [Google Scholar] [CrossRef]
  52. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
  53. Misbah, M.; Khan, M.U.; Kaleem, Z.; Muqaibel, A.; Alam, M.Z.; Liu, R.; Yuen, C. MSF-GhostNet: Computationally Efficient YOLO for Detecting Drones in Low-Light Conditions. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 3840–3851. [Google Scholar] [CrossRef]
  54. Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. Shufflenet v2: Practical guidelines for efficient cnn architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131. [Google Scholar]
  55. Cao, C.; Chen, S.; Zhang, W.; Tian, W.; Miao, H. A Real-Time SAR Ship Detection Method Based on Improved Yolov5. In Proceedings of the 2023 Cross Strait Radio Science and Wireless Technology Conference (CSRSWTC), Guilin, China, 10–13 November 2023; pp. 1–3. [Google Scholar] [CrossRef]
  56. Wei, S.; Zeng, X.; Qu, Q.; Wang, M.; Su, H.; Shi, J. HRSID: A High-Resolution SAR Images Dataset for Ship Detection and Instance Segmentation. IEEE Access 2020, 8, 120234–120254. [Google Scholar] [CrossRef]
  57. Zhang, T.; Zhang, X.; Li, J.; Xu, X.; Wang, B.; Zhan, X.; Xu, Y.; Ke, X.; Zeng, T.; Su, H.; et al. SAR Ship Detection Dataset (SSDD): Official Release and Comprehensive Data Analysis. Remote Sens. 2021, 13, 3690. [Google Scholar] [CrossRef]
  58. Yue, T.; Zhang, Y.; Liu, P.; Xu, Y.; Yu, C. A Generating-Anchor Network for Small Ship Detection in SAR Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 7665–7676. [Google Scholar] [CrossRef]
  59. Guan, T.; Chang, S.; Wang, C.; Jia, X. SAR Small Ship Detection Based on Enhanced YOLO Network. Remote Sens. 2025, 17, 839. [Google Scholar] [CrossRef]
  60. Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint Triplets for Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6568–6577. [Google Scholar]
  61. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
  62. Deng, L.; Li, H.; Liu, H.; Gu, J. A lightweight YOLOv3 algorithm used for safety helmet detection. Sci. Rep. 2022, 12, 10981. [Google Scholar] [CrossRef] [PubMed]
  63. Rani, E. LittleYOLO-SPP: A delicate real-time vehicle detection algorithm. Optik 2021, 225, 165818. [Google Scholar]
  64. Zheng, Y.; Zhang, Y.; Qian, L.; Zhang, X.; Diao, S.; Liu, X.; Cao, J.; Huang, H. A lightweight ship target detection model based on improved YOLOv5s algorithm. PLoS ONE 2023, 18, 1–23. [Google Scholar] [CrossRef] [PubMed]
  65. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976. [Google Scholar] [CrossRef]
  66. Varghese, R.; Sambath, M. YOLOv8: A Novel Object Detection Algorithm with Enhanced Performance and Robustness. In Proceedings of the 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), Chennai, India, 18–19 April 2024; pp. 1–6. [Google Scholar]
  67. Jha, A.; Tiwari, S.K. Glioma Detection Using YOLO V9: A Deep Learning Framework. In Proceedings of the 2025 3rd International Conference on Smart Systems for Applications in Electrical Sciences (ICSSES), Tumakuru, India, 7–8 March 2025; pp. 1–6. [Google Scholar]
  68. Tasin, M.A.U.; Faiyaz, G.M.F.; Uddin, M.N. Deep Learning for Brain Tumor Detection Leveraging YOLOv10 for Precise Localization. In Proceedings of the 2024 IEEE 3rd International Conference on Robotics, Automation, Artificial-Intelligence and Internet-of-Things (RAAICON), Dhaka, Bangladesh, 29–30 November 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 207–212. [Google Scholar]
  69. Tian, Y.; Ye, Q.; Doermann, D. YOLOv12: Attention-Centric Real-Time Object Detectors. arXiv 2025, arXiv:2502.12524. [Google Scholar]
  70. Han, G.; Huang, S.; Zhao, F.; Tang, J. SIAM: A parameter-free, Spatial Intersection Attention Module. Pattern Recognit. 2024, 153, 110509. [Google Scholar] [CrossRef]
  71. Jiang, Y.; Jiang, Z.; Han, L.; Huang, Z.; Zheng, N. MCA: Moment channel attention networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 26–27 February 2024; Volume 38, pp. 2579–2588. [Google Scholar]
  72. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar] [CrossRef]
  73. Liu, H.; Liu, F.; Fan, X.; Huang, D. Polarized self-attention: Towards high-quality pixel-wise regression. arXiv 2021, arXiv:2107.00782. [Google Scholar]
  74. Zhang, Y.F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 2022, 506, 146–157. [Google Scholar] [CrossRef]
  75. Gevorgyan, Z. SIoU loss: More powerful learning for bounding box regression. arXiv 2022, arXiv:2205.12740. [Google Scholar] [CrossRef]
  76. Xin, Z.; Lu, T.; Li, X. Detection of Train Bottom Parts Based on XIoU. In Proceedings of the 2019 International Conference on Robotics Systems and Vehicle Technology, New York, NY, USA, 27 June–1 July 2019; pp. 91–96. [Google Scholar]
Figure 1. (a,b) Open water and port operation scenarios from the HRSID dataset, respectively, with small ship targets marked by green arrows; (c,d) coastal zone and bay composite scenarios from the SSDD dataset, featuring prominent complex background noise.
Figure 2. Overall workflow based on the YOLOv5 structure: the GCCR-GhostNet module enhances weak feature capture via embedded channel attention mechanisms while reducing parameters. The LSD-Head detector replaces the traditional convolution with a simple linear transformation to further improve the network efficiency. P-MIoU optimizes shape matching accuracy. These technologies work together to achieve a balance between efficiency and accuracy in small ship target detection.
Figure 3. Overall workflow of the GCCR-GhostNet module: global channel information is extracted via GAP, nonlinear inter-channel dependencies are captured using two 1 × 1 convolutions with ReLU, and adaptive attention weights are generated through a Sigmoid function to recalibrate the feature channels.
Figure 4. Overall workflow of the LSD-Head module: A detection head structure that integrates attention mechanisms with lightweight convolutions.
Figure 5. Performance comparison between CIoU and P-MIoU.
Figure 6. (a–c) show the performance of representative target detection algorithms in three different test scenarios on the HRSID dataset. (a,b) contain multiple vessels with different target morphologies, some close to the shoreline against complex background textures. (c) shows a typical land–sea interface scene with dense small targets and obvious texture interference from land features. Red arrows indicate false detections, and green rectangles mark missed targets.
Figure 7. (a–c) show the performance of representative target detection algorithms in three different test scenarios on the SSDD dataset. (a) displays a multi-target scene containing multiple scattered small ship targets. (b) shows a nearshore single-target scenario. (c) presents a complex port scenario containing densely docked ship targets. Red arrows indicate false detections, and green rectangles mark missed targets.
Figure 8. Comparison of the attention visualization of YOLOv5-Small and LWSARDet. (a) displays the attention map for a scenario with a small ship target in a low-resolution image. (b) shows the attention map for a nearshore single-target scenario. (c) presents the attention map for a complex port scenario with multiple docked ship targets. Green rectangles highlight the ground truth target positions, and areas of high attention are highlighted in the heatmaps.
Figure 9. Comparison of the training and testing times of each detection model on the HRSID and SSDD datasets.
Figure 10. Impact of different target sizes on localization loss.
Table 1. Comparative experiments on HRSID.

| Methods | GFLOPs/G ↓ | mAP/% ↑ | mAP50-90/% ↑ | Recall/% ↑ | Precision/% ↑ | Params/M ↓ |
|---|---|---|---|---|---|---|
| CenterNet [60] | 70.2 | 62.5 | 30.2 | 42.0 | 97.9 | 3.27 |
| SSD [61] | 360.7 | 50.0 | 22.4 | 22.7 | 89.3 | 50.21 |
| YOLOv3 [35] | 154.5 | 94.1 | 67.6 | 78.0 | 91.8 | 61.50 |
| YOLOv3-CSP [62] | 155.4 | 93.7 | 67.1 | 88.6 | 92.1 | 62.55 |
| YOLOv3-Tiny [63] | 12.9 | 83.7 | 56.2 | 76.5 | 90.1 | 8.67 |
| YOLOv5-Small [64] | 15.8 | 91.1 | 67.8 | 83.5 | 90.2 | 7.01 |
| YOLOv6 [65] | 11.8 | 88.9 | 63.3 | 81.1 | 89.7 | 4.23 |
| YOLOv8 [66] | 8.1 | 90.9 | 65.3 | 84.1 | 90.4 | 3.01 |
| YOLOv9-Tiny [67] | 7.6 | 91.6 | 65.8 | 83.5 | 91.3 | 1.97 |
| YOLOv10-Nano [68] | 8.2 | 90.2 | 65.1 | 79.3 | 91.6 | 2.69 |
| YOLOv12-Nano [69] | 5.8 | 88.6 | 62.4 | 85.1 | 89.0 | 2.50 |
| Yue et al. [58] | 105.6 | 91.3 | 66.5 | 87.2 | 91.7 | 43.42 |
| Guan et al. [59] | 19.2 | 91.0 | 66.3 | 83.7 | 90.4 | 8.50 |
| LWSARDet-Small | 12.8 | 94.2 | 66.9 | 89.3 | 92.5 | 6.45 |
| LWSARDet-Nano | 3.4 | 94.1 | 67.0 | 87.9 | 92.0 | 1.63 |

↑ Indicates a higher value that signifies better performance; ↓ indicates a lower value that signifies better performance. Red highlights the best result, orange highlights the second-best.
Table 2. Comparative experiments on SSDD.

| Methods | GFLOPs/G ↓ | mAP/% ↑ | mAP50-90/% ↑ | Recall/% ↑ | Precision/% ↑ | Params/M ↓ |
|---|---|---|---|---|---|---|
| CenterNet [60] | 70.2 | 73.5 | 40.7 | 57.7 | 83.5 | 3.27 |
| SSD [61] | 360.7 | 51.3 | 22.9 | 53.4 | 83.1 | 50.21 |
| YOLOv3 [35] | 154.5 | 90.5 | 68.9 | 81.0 | 83.5 | 61.50 |
| YOLOv3-CSP [62] | 155.4 | 90.7 | 69.5 | 86.6 | 79.9 | 62.55 |
| YOLOv3-Tiny [63] | 12.9 | 90.3 | 68.0 | 85.3 | 81.4 | 8.67 |
| YOLOv5-Small [64] | 15.8 | 89.1 | 67.3 | 85.1 | 81.8 | 7.01 |
| YOLOv6 [65] | 11.8 | 89.1 | 65.7 | 85.2 | 82.1 | 4.23 |
| YOLOv8 [66] | 8.1 | 88.7 | 65.5 | 86.5 | 80.1 | 3.01 |
| YOLOv9-Tiny [67] | 7.6 | 89.2 | 66.6 | 84.1 | 81.4 | 1.97 |
| YOLOv10-Nano [68] | 8.2 | 86.1 | 62.5 | 80.6 | 81.1 | 2.69 |
| YOLOv12-Nano [69] | 5.8 | 86.5 | 62.4 | 85.1 | 81.2 | 2.50 |
| Yue et al. [58] | 105.6 | 91.8 | 64.5 | 88.1 | 81.7 | 43.42 |
| Guan et al. [59] | 19.2 | 90.0 | 66.3 | 81.3 | 77.4 | 8.50 |
| LWSARDet-Small | 12.8 | 92.1 | 70.4 | 90.5 | 81.2 | 6.45 |
| LWSARDet-Nano | 3.4 | 90.4 | 67.7 | 89.7 | 78.2 | 1.63 |

↑ Indicates a higher value that signifies better performance; ↓ indicates a lower value that signifies better performance. Red highlights the best result, orange highlights the second-best.
Table 3. Ablation experiments on attention module.

| Metric | – | Siam2 [70] | MCA [71] | CA [72] | PSA [73] | LSD-Head |
|---|---|---|---|---|---|---|
| GFLOPs/G ↓ | 12.8 | 12.8 | 12.8 | 12.8 | 13.8 | 12.8 |
| mAP/% ↑ | 92.8 | 92.9 | 93.2 | 93.4 | 92.8 | 94.2 |
| mAP50-90/% ↑ | 63.7 | 64.7 | 64.3 | 64.4 | 64.5 | 66.9 |
| Recall/% ↑ | 84.9 | 85.7 | 86.1 | 86.1 | 84.7 | 89.2 |
| Precision/% ↑ | 91.2 | 92.0 | 92.3 | 93.4 | 94.5 | 92.4 |
| Params/M ↓ | 6.45 | 6.45 | 6.45 | 6.49 | 7.14 | 6.45 |

↑ Indicates a higher value that signifies better performance; ↓ indicates a lower value that signifies better performance. Red highlights the best result, orange highlights the second-best.
Table 4. Ablation experiments on IoU loss function on HRSID.

| Metric | – | EIoU [74] | SIoU [75] | XIoU [76] | P-MIoU |
|---|---|---|---|---|---|
| mAP/% ↑ | 92.8 | 92.6 | 92.8 | 93.1 | 94.2 |
| mAP50-90/% ↑ | 63.7 | 63.1 | 63.5 | 64.6 | 66.9 |
| Recall/% ↑ | 84.9 | 85.5 | 86.3 | 86.8 | 89.2 |
| Precision/% ↑ | 91.2 | 93.7 | 91.8 | 90.9 | 92.4 |

↑ Indicates a higher value that signifies better performance; Red highlights the best result, and orange highlights the second-best.
Table 5. Ablation experiments for small version on HRSID.

| Baseline | GCCR-GhostNet | LSD-Head | P-MIoU | GFLOPs/G ↓ | mAP/% ↑ | mAP50-90/% ↑ | Recall/% ↑ | Precision/% ↑ | Params/M ↓ |
|---|---|---|---|---|---|---|---|---|---|
| ✓ | | | | 15.8 | 91.1 | 67.8 | 83.5 | 90.2 | 7.01 |
| ✓ | ✓ | | | 12.8 | 92.9 | 63.7 | 84.9 | 91.2 | 6.45 |
| ✓ | ✓ | ✓ | | 12.8 | 93.2 | 63.7 | 85.6 | 92.2 | 6.45 |
| ✓ | ✓ | ✓ | ✓ | 12.8 | 94.2 | 66.9 | 89.3 | 92.5 | 6.45 |

↑ Indicates a higher value that signifies better performance; ↓ indicates a lower value that signifies better performance.
Table 6. Ablation experiments on HRSID.

| Methods | GFLOPs/G ↓ | mAP/% ↑ | mAP50-90/% ↑ | Params/M ↓ |
|---|---|---|---|---|
| GhostNetV2 | 12.8 | 93.7 | 65.0 | 6.45 |
| GCCR-GhostNet | 12.8 | 94.2 | 66.9 | 6.45 |

↑ Indicates a higher value that signifies better performance; ↓ indicates a lower value that signifies better performance. Red highlights the best result.
Table 7. Ablation experiment table for the Nano version on SSDD.

| Baseline | GCCR-GhostNet | LSD-Head | P-MIoU | GFLOPs/G ↓ | mAP/% ↑ | mAP50-90/% ↑ | Recall/% ↑ | Precision/% ↑ | Params/M ↓ |
|---|---|---|---|---|---|---|---|---|---|
| ✓ | | | | 15.8 | 89.1 | 67.3 | 85.1 | 81.8 | 7.01 |
| ✓ | ✓ | | | 3.4 | 92.3 | 69.1 | 86.2 | 87.0 | 1.63 |
| ✓ | ✓ | | ✓ | 3.4 | 88.3 | 63.3 | 89.9 | 73.6 | 1.63 |
| ✓ | ✓ | ✓ | ✓ | 3.4 | 90.4 | 67.7 | 89.7 | 78.2 | 1.63 |

↑ Indicates a higher value that signifies better performance; ↓ indicates a lower value that signifies better performance.
Table 8. Performance comparison of LWSARDet on the HRSID and SSDD datasets.

| Methods | Dataset | GFLOPs/G ↓ | mAP/% ↑ | mAP50-90/% ↑ | Recall/% ↑ | Precision/% ↑ | Params/M ↓ |
|---|---|---|---|---|---|---|---|
| LWSARDet-Small | HRSID | 12.8 | 94.2 | 66.9 | 89.3 | 92.5 | 6.45 |
| LWSARDet-Small | SSDD | 12.8 | 92.1 | 70.4 | 90.5 | 81.2 | 6.45 |
| LWSARDet-Nano | HRSID | 3.4 | 94.1 | 67.0 | 87.9 | 92.0 | 1.63 |
| LWSARDet-Nano | SSDD | 3.4 | 90.4 | 67.7 | 89.7 | 78.2 | 1.63 |

↑ Indicates a higher value that signifies better performance; ↓ indicates a lower value that signifies better performance. Red highlights the best result in each column.
