A Lightweight Network Based on One-Level Feature for Ship Detection in SAR Images

Abstract: Recently, deep learning has greatly promoted the development of detection methods for ship targets in synthetic aperture radar (SAR) images. However, existing detection networks are mostly based on large-scale models and high-cost computations, which require high-performance computing equipment to realize real-time processing and limit their hardware transplantation to onboard platforms. To address this problem, a lightweight ship detection network based on YOLOX-s is proposed in this paper. Firstly, we remove the computationally heavy pyramidal structure and build a streamlined network based on a one-level feature for higher detection efficiency. Secondly, to expand the limited receptive field and enhance the semantic information of a single feature map, a residual asymmetric dilated convolution (RADC) block is proposed. Through four branches with different dilation rates, the RADC block can help the detector capture various ships in complex backgrounds. Finally, to tackle the imbalance problem between ships of different scales in the training stage, we put forward a balanced label assignment strategy called center-based uniform matching. To verify the effectiveness of the proposed method, we conduct extensive experiments on the SAR Ship Detection Dataset (SSDD) and the High-Resolution SAR Images Dataset (HRSID). The results show that our method can achieve performance comparable to general detection networks with much less computational cost.


Introduction
Synthetic aperture radar (SAR) is an active microwave earth observation system that can stably provide high-resolution images in all weather conditions at any time of day. With the rapid development of SAR technology, the quantity and diversity (i.e., different resolutions, scenarios, imaging platforms, etc.) of SAR images are increasing year by year, which promotes research on SAR image interpretation algorithms [1][2][3][4]. Among the interpretation tasks of SAR images, ship detection is a fundamental task in marine monitoring and national defense. However, due to complex backgrounds and resource-constrained onboard environments, real-time ship detection with high-resolution SAR images is still a challenging task.
Traditional ship detection methods are mainly based on prior knowledge, such as statistical models and hand-crafted features. One of the most representative methods is the constant false alarm rate (CFAR) [5] detection algorithm, which models the background clutter using a statistical distribution and computes an adaptive threshold to determine whether a pixel belongs to the target region. To deal with different sea conditions, researchers have designed many novel CFAR detectors by adopting different statistical models to fit complex sea clutter and designing new sliding window structures to estimate model parameters. However, the scattering-based CFAR heavily relies on sea conditions and cannot deal with multitarget situations and nonhomogeneous backgrounds. To solve this problem, other features are also utilized to detect ships, such as extended fractal features [6], the scale-invariant feature transform (SIFT) [7], reflection symmetry properties [8], and saliency information [9]. The performance of these methods mainly depends on manually designed features and is not stable when ships have various shapes or appear against different backgrounds. Moreover, sea-land segmentation is usually required to decrease false alarms in inshore areas, which increases the complexity of the detection algorithm.
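To make the CFAR idea above concrete, the following is a minimal sketch of a two-dimensional cell-averaging CFAR detector. It is an illustrative simplification, not the specific detector of [5]: it assumes exponentially distributed clutter power, and the window sizes and false alarm rate are arbitrary example values.

```python
import numpy as np

def ca_cfar(image, guard=2, train=8, pfa=1e-3):
    """Minimal cell-averaging CFAR sketch: for each pixel, estimate the local
    clutter level from a ring of training cells (excluding the inner guard
    cells) and flag the pixel if it exceeds an adaptive threshold."""
    h, w = image.shape
    half = guard + train
    # Number of training cells = outer window minus guard window.
    n_train = (2 * half + 1) ** 2 - (2 * guard + 1) ** 2
    # Threshold factor for exponentially distributed clutter.
    alpha = n_train * (pfa ** (-1.0 / n_train) - 1.0)
    detections = np.zeros_like(image, dtype=bool)
    for i in range(half, h - half):
        for j in range(half, w - half):
            window_sum = image[i - half:i + half + 1, j - half:j + half + 1].sum()
            guard_sum = image[i - guard:i + guard + 1, j - guard:j + guard + 1].sum()
            clutter = (window_sum - guard_sum) / n_train
            detections[i, j] = image[i, j] > alpha * clutter
    return detections
```

The limitations discussed in the text are visible here: the threshold depends entirely on the local clutter estimate, so a second ship inside the training window inflates the estimate and can mask the target under test.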
Object detection in optical images is one of the major tasks in computer vision and great breakthroughs have been made in recent years. Benefitting from their automatic learning ability and powerful feature extracting ability, convolutional neural networks (CNNs) can precisely locate targets in the image. According to the development process of CNN-based detectors, they can be roughly grouped into three categories: two-stage methods [10][11][12], one-stage anchor-based methods [13][14][15][16][17][18], and one-stage anchor-free methods [19][20][21]. In the recently presented YOLOX [22], the YOLO network is integrated with an anchor-free mechanism and SimOTA label matching strategy, which achieves state-of-the-art performance.
The breakthrough in computer vision has also promoted the rapid development of SAR image processing. Attracted by their simplicity and high accuracy, many scholars have tried to introduce these CNN-based detectors to SAR image interpretation tasks [2,[23][24][25][26][27]. However, certain problems remain to be solved for ship detection in SAR images. First, due to the active imaging mechanism of SAR, there inevitably exists coherent speckle noise, which differs greatly from the noise in optical images and leads to a more complex background. In addition, despite the various image resolutions and ship sizes, most ships are small compared to the large-scene background, as shown in Figure 1a. Small ships take up only a few pixels in the image, making them more likely to be missed by the network. To address these problems, scholars have proposed many novel models to improve ship detection accuracy. Kang et al. [28] combined three feature layers for region generation and proposed a contextual CNN detector. Jiao et al. [29] densely connected all feature maps from the top down to achieve multiscale and multiscene SAR ship detection. To detect multiscale ships with different orientations, Zhao et al. [30] designed an attention-receptive pyramid network. Fu et al. [31] proposed level-based attention to better fuse features across different pyramidal levels. Zhang et al. [32] integrated four unique FPNs to constitute the Quad-FPN and significantly boosted ship detection performance. Gao et al. [33] replaced the path aggregation network (PANet) [34] in YOLOv4 [16] with the scale-equalizing pyramid convolution (SEPC) [35] to better extract semantic characteristics at different scales. Among these methods, inserted modules, e.g., the attention mechanism, and new feature fusion approaches are common solutions to complex backgrounds and small target scales. However, the model parameters and computational complexity are also increased by these extra structures, causing a decline in detection speed.
In practical maritime surveillance, the timeliness of detection results is important in addition to accuracy. Since data transmitting between satellite and ground station could be time-consuming, it is of crucial importance to realize real-time on-satellite ship detection. However, the weight and volume of the processing system are limited due to the finite load of the satellite, leading to constrained computation resources. Therefore, lightness and efficiency of the detection model are critical to algorithm deployment and achieving real-time detection. For neural network-based detectors, the network architecture which decides how features are propagated and represented is of vital importance to the efficiency of detection algorithms. In consideration of this, some researchers have put forward novel lightweight models to improve detection speed. Zhang et al. [37] proved that SAR ship detection is relatively easier than optical target detection by proposing "ShipDeNet-20", which combines feature fusion, feature enhance and scale share feature pyramid modules to build a lightweight and precise model. To realize low-equipment-required real-time ship detection, Jiang et al. [38] constructed an end-to-end detection network based on YOLOv4, achieving a smaller model size and higher detection speed. Feng et al. [39] redesigned the backbone of YOLOX and proposed position-enhanced attention to improve the accuracy and speed of SAR ship detection from a more balanced perspective.
Moreover, general detection networks designed for optical images become redundant when they are directly applied to SAR ship detection. Compared to optical images, SAR images have a relatively low resolution as well as a low signal-to-noise ratio, which means the amplitude information of SAR images is easier for a CNN to interpret than that of optical targets. Hence, there is room for simplification of ship detection networks. For instance, the feature pyramid structure is widely adopted in recently developed detection networks [22,40,41] to achieve multiscale detection. As shown in Figure 2a, the key idea of such pyramidal structures is to combine semantically weak but spatially strong low-level feature maps with spatially coarse but semantically strong high-level feature maps to create a feature pyramid that has strong semantic information at all scales. In optical images with complex contexts, the target scale distribution is more even; therefore, the feature pyramid is essential for balancing targets of different scales. It can utilize multiple feature maps to split the optimization process of the complex detection problem into multiple subproblems according to object scales, thus improving detection efficiency. However, the situation is different for SAR ship detection, for two main reasons. First, although the size of a ship target varies by ship type and imaging resolution, most ships are small and cover only a few pixels in SAR images. Because the downsampling operation in CNNs loses detail information and is affected by speckle noise and background interference, these small ships tend to be more easily overlooked in high-level features. Intuitively, it is questionable whether a ship with a length and width of no more than 20 pixels retains valid target information after 32× downsampling. As a result, the fusion of high-level features into low-level features could be of little help to the latter.
Second, compared to optical images containing multiple color channels, single-channel SAR images have a relatively low texture level. Therefore, the semantic information of SAR ship targets is not as complex as that of optical image targets, leading to a relative reduction in the required network depth [36]. The semantic level of low-level features could be discriminative enough to distinguish between ships and interference, making the localization information in low-level features more important than the semantic information from high-level features. Therefore, feature pyramids are inefficient for SAR ship detection due to their equal focus on high-level features and low-level features. In summary, although the feature pyramid can deal with multiscale target problems, it is not efficient enough for fast ship detection.
Based on the analysis above, a lightweight SAR ship detection method using a one-level feature is proposed. On account of its state-of-the-art performance, we chose the small version of YOLOX, i.e., YOLOX-s, as our baseline and further simplified the network structure. Different from current detection methods, the proposed network replaces the computationally heavy feature pyramid structure with neat convolution blocks and detects objects based on a one-level feature. It can be seen from Figure 2b that the proposed network has a shallower depth and a simpler structure, showing greater portability. The detailed contributions of this paper are summarized as follows: (1) Inspired by the idea of utilizing a one-level feature in YOLOF [42], the feature representation ability of one-level feature maps for ship detection in SAR images is verified and a novel ship detector is proposed. Different from mainstream ship detection methods, the proposed method offers an alternative to complex pyramidal structures by detecting ships using a one-level feature, which is valid and efficient. (2) In order to expand the receptive field and enrich the semantic information of the one-level feature map, a residual asymmetric dilated convolution (RADC) block is proposed. By stacking convolutional blocks with four different dilated branches, ships with various shapes can be captured by the network efficiently. (3) Since the proposed network detects objects on a single scale, large targets take up more pixels whereas small targets are easily ignored when calculating losses. To deal with this imbalance problem, center-based uniform matching, which assigns labels based on their center locations, is employed during the training stage.
To validate the effectiveness and robustness of the proposed method, extensive experiments on the SAR Ship Detection Dataset (SSDD) [36] and the High-Resolution SAR Images Dataset (HRSID) [43] were conducted. The results show that the proposed method achieves performance comparable to the baseline while requiring fewer model parameters and less computational cost, proving its high efficiency and reliability.
The rest of the paper is arranged as follows. In Section 2, the overall structure and detailed improvements of the proposed method are described. Section 3 provides the experiment results as well as corresponding analysis. Then, the experiment results are discussed and problems are analyzed in Section 4. Lastly, the conclusion is drawn in Section 5.

Materials and Methods
The proposed method can be mainly divided into four parts: a baseline network using a backbone for feature extraction, a neck constructed from a projector and stacked RADC modules, a decoupled head for final detection, and a label assignment strategy for model training. First of all, the overall framework, as well as the backbone of the proposed network, is described. Next, the structure of the proposed RADC block is illustrated. After that, the structure of the decoupled detection head is described. Then, the drawbacks of traditional label assignment strategies on a single detection head are analyzed and center-based uniform matching is presented in detail. Finally, details of the output mapping and loss function calculation are given.

Overall Scheme of the Proposed Network
In recently developed one-stage detection networks, the most commonly used structure is the combination of backbone, neck, and head [16]. Backbones, such as VGG [44], ResNet [45] and DenseNet [46], are the key part for feature extraction. Following YOLOX, the Cross-Stage Partial Darknet (CSPDarknet) [47] was adopted to construct our backbone. During the forward propagation process, the image is gradually downsampled through convolution layers with stride 2, and higher-level feature maps are obtained. The feature map after l times of downsampling is denoted as C_l ∈ R^((W/s_l) × (H/s_l) × c_in) in this paper, where W × H is the size of the input image, c_in is the channel number, and s_l = 2^l is the corresponding downsampling rate of C_l. In order to validate the effectiveness of the one-level feature map for ship detection, we only adopted one feature map from the backbone, as shown in Figure 3. Different from multilevel feature maps that require sequential resampling and fusion, the single feature map is enhanced by a streamlined neck. First, the feature map is adjusted by a projector which consists of a 1 × 1 convolution layer and a 3 × 3 convolution layer, where both convolutions are followed by a batch normalization (BN) operation. Additionally, the channel number of the feature map is changed to c_out. Then, the feature map is enhanced by n consecutive RADC blocks. With multiple asymmetric dilated convolution branches, an RADC block can expand the receptive field and can efficiently discover strip-shaped ship targets. The processed feature P_l is transformed into the detection output by a decoupled head. Finally, to generate the final detection results, non-maximum suppression (NMS) is used to remove repetitive predictions.
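The projector described above can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' released code; the channel numbers (c_in = 128, c_out = 256) and the 640 × 640 input size are assumptions chosen for the example.

```python
import torch
import torch.nn as nn

class Projector(nn.Module):
    """Sketch of the neck projector: a 1x1 convolution that changes the
    channel number to c_out, then a 3x3 convolution, each followed by
    batch normalization, as described in the text."""

    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=1, bias=False),
            nn.BatchNorm2d(c_out),
            nn.Conv2d(c_out, c_out, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(c_out),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)

# A 640x640 input downsampled l = 3 times (s_l = 2^3 = 8) yields an
# 80x80 feature map C_3; the projector preserves the spatial size and
# changes only the channel dimension.
c3 = torch.randn(1, 128, 80, 80)        # C_3 with c_in = 128 (assumed)
p = Projector(c_in=128, c_out=256)(c3)  # shape (1, 256, 80, 80)
```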


Figure 3. The overall framework of the proposed method. In this figure, "CSPDarknet*" represents the backbone CSPDarknet truncated at a middle stage.

The structure of CSPDarknet, as well as the backbone of the proposed method, is shown in Figure 4. As a note, Conv Block denotes a convolutional layer followed by a BN layer and a SiLU activation function. When the image is input into CSPDarknet, several feature maps with different downsampling rates are generated at specific stages, which are the input of the neck part. In mainstream detection networks, C3, C4, and C5 are utilized to realize feature fusion across different scales, which is also the case for YOLOX-s. To verify the effectiveness of one-level features, an appropriate feature level associated with target characteristics is crucial. On account of the small size of ship targets and the influence of coherent noise, shallow feature maps with higher resolution tend to retain more effective characteristics of small ship targets than deep feature maps. In view of this, some researchers additionally added a shallower level C2 into their feature pyramid [48,49] to build an effective ship detector, which demonstrates the importance of shallow feature maps in SAR ship detection. Therefore, we set l = 3 and chose C3 as the only output of the backbone.


Residual Asymmetric Dilated Convolution Block
As the network goes deeper, high-level feature maps with large receptive fields contain stronger semantic information and are suitable for detecting large ships, whereas semantically weak but spatially strong low-level feature maps with small receptive fields are favorable for the detection of small ship targets. Thus, in addition to the fusion and enhancement of features, the other important role of multiscale detection is that the network has multiple receptive fields to detect targets at all scales. However, when it comes to the single-feature-map situation, the receptive field of the output map is constant, greatly limiting the network's generalization ability. On the one hand, if the scale of a ship target is much larger than this receptive field, it is difficult for the network to fully extract target features, and the ship thus becomes problematic to detect. On the other hand, a ship that is significantly smaller than the receptive field can easily be ignored by the network, making it hard to locate precisely. To detect ships of different scales within a single feature map, we were inspired by SC-EADNet [50] and propose the RADC block to expand the receptive field of the one-level feature and improve the network's ability to detect ship targets of various scales.

As shown in Figure 5, four branches with different asymmetric dilation rates were used to enrich the receptive field of the one-level feature map. The input and output of the RADC block are of the same size, including both feature map scale and channel number. When the feature map with c_out channels is fed into the RADC block, the processing schedule can be divided into four steps. First, to reduce computational complexity, the channel number of the feature map is reduced to c_m by a convolution block with kernel size 1 × 1. Second, the compressed feature map is processed in parallel by four dilated convolution blocks with kernel size 3 × 3. Since dilated convolution can effectively enlarge receptive fields, the covered scale range of the feature map is also expanded. The output channel number of each branch is a quarter of c_m, as there are four branches. Then, the four outputs are concatenated and followed by a 1 × 1 convolution to restore the channel number to c_out. Finally, we add the input feature to the processed feature through a residual connection, resulting in an output feature map with multiple receptive fields.
The dilation rates of these four branches are set as (1, 1), (d, d), (2 × d, d), and (d, 2 × d), respectively. The parameter d is a predefined base dilation rate that indicates the receptive field level of the RADC block, which is set to 2 in our method. Generally speaking, a larger d can greatly increase the receptive field. However, the 3 × 3 convolution may damage feature extraction if the dilation rate is too large, as pixels far from the center are less likely to lie on the same target. To enhance the receptive field effectively, we stack identical RADC blocks to expand the receptive field gradually.
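The four steps above can be sketched as a PyTorch module. This is an illustrative reconstruction from the text, not the authors' implementation: the concrete values of c_out and c_m in the example, and the use of Conv + BN + SiLU for each convolution block, are assumptions.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out, k, dilation=(1, 1)):
    """Convolution + BN + SiLU, padded so the spatial size is preserved."""
    pad = tuple(d * (k - 1) // 2 for d in dilation)
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, padding=pad, dilation=dilation, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )

class RADCBlock(nn.Module):
    """Sketch of the RADC block: 1x1 reduction to c_m channels, four parallel
    3x3 branches with dilation rates (1,1), (d,d), (2d,d), (d,2d), a 1x1
    convolution restoring c_out channels, and a residual connection."""

    def __init__(self, c_out: int, c_m: int, d: int = 2):
        super().__init__()
        self.reduce = conv_block(c_out, c_m, 1)
        rates = [(1, 1), (d, d), (2 * d, d), (d, 2 * d)]
        # Each branch outputs a quarter of c_m channels.
        self.branches = nn.ModuleList(
            conv_block(c_m, c_m // 4, 3, dilation=r) for r in rates
        )
        self.restore = conv_block(c_m, c_out, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.reduce(x)
        y = torch.cat([b(y) for b in self.branches], dim=1)
        return x + self.restore(y)  # residual connection

block = RADCBlock(c_out=256, c_m=128, d=2)
out = block(torch.randn(1, 256, 80, 80))  # same shape as the input
```

Because input and output shapes match, n such blocks can be chained directly, gradually enlarging the receptive field as described.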
There are several benefits of applying the RADC block as a basic component in the neck part. (1) Dilated convolutions are used to enlarge the receptive field of the one-level feature map, which contributes to the detection of large-scale ship targets. Meanwhile, the residual connection can effectively preserve the original information, generating a feature map with multiple receptive fields covering all object scales. (2) Considering the slender shape and arbitrary orientation of ship targets in SAR images, as shown in Figure 6a, standard square dilation is not suitable for the convolution kernel to capture ship target features. Thus, two branches in the RADC block have asymmetric dilation rates, i.e., one horizontally longer and the other vertically longer. The receptive field of the RADC block is demonstrated in Figure 6b, where it can be seen that ships of various shapes can be covered properly, contributing to the feature extraction of large targets. (3) Due to the four parallel dilated convolution branches, the low-level feature from the backbone is refined and semantically reinforced with more context information while adding only a little computational burden. By stacking RADC blocks, the receptive field and semantic level of C_l can be gradually enhanced.

Decoupled Head
To make the proposed network simple and streamlined, we dealt with the feature map at a single scale and used only one detection head, which produces one output map O ∈ R^((W/s_l) × (H/s_l) × (5+c)), where c is the number of target categories. The structure of the decoupled head is shown in Figure 7. The output of the RADC blocks, P_l, is sent into the detection head for final detection. The channel number of P_l is first changed to 128 through a 1 × 1 convolution block. After that, the feature map is divided into two branches with two convolution blocks, one for classification and the other for regression. The regression branch is further separated to predict the coordinates and the quality of the predicted box, i.e., the coordinate branch and the IoU (intersection over union) branch. At the end of each branch, a 1 × 1 convolution is performed to compress the channel dimension and form the output map.
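The head structure described above can be sketched as follows. This is a hedged reconstruction from the text, not the authors' code: the depth of each branch (one 3 × 3 convolution block here) and the Conv + BN + SiLU composition are assumptions, and with one ship class (c = 1) the output has 5 + c = 6 channels per grid point.

```python
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    """Sketch of the decoupled head: a 1x1 block changes P_l to 128 channels,
    then separate classification and regression branches; the regression
    branch splits into a 4-channel coordinate output and a 1-channel IoU
    (box quality) output, giving 5 + c output channels in total."""

    def __init__(self, c_in: int, num_classes: int):
        super().__init__()

        def block(ci, co, k):
            return nn.Sequential(
                nn.Conv2d(ci, co, k, padding=k // 2, bias=False),
                nn.BatchNorm2d(co),
                nn.SiLU(),
            )

        self.stem = block(c_in, 128, 1)        # 1x1 channel adjustment
        self.cls_branch = block(128, 128, 3)   # classification branch
        self.reg_branch = block(128, 128, 3)   # regression branch
        self.cls_out = nn.Conv2d(128, num_classes, 1)  # class scores
        self.coord_out = nn.Conv2d(128, 4, 1)          # box coordinates
        self.iou_out = nn.Conv2d(128, 1, 1)            # IoU / quality score

    def forward(self, p: torch.Tensor) -> torch.Tensor:
        s = self.stem(p)
        c = self.cls_out(self.cls_branch(s))
        r = self.reg_branch(s)
        return torch.cat([self.coord_out(r), self.iou_out(r), c], dim=1)

head = DecoupledHead(c_in=256, num_classes=1)
o = head(torch.randn(1, 256, 80, 80))  # O with 5 + c = 6 channels
```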


Figure 7. Structure of the decoupled detection head.


Center-Based Uniform Matching
For CNN-based object detectors, dense prior settings (e.g., preset anchor boxes and grid points) are essential to cover as many potential targets as possible. Most of these prior settings are redundant and invalid for the final detection result. During the training process, the label assignment strategy determines which predictions are positive samples and which predictions are negative samples and is of great importance for the optimization of the network. To ensure the effectiveness of loss function, a good assignment strategy should consider the measurement of the similarity between label and prediction and have a universal rule to separate positive and negative samples. The IoU threshold was widely adopted as the assignment criterion and, since the arrival of anchor-free detectors, a lot of strategies that are based on the location of labels and predictions were presented [20,21]. To make the label assigning procedure more adaptive, scholars are also trying to design dynamic strategies to improve detection performance [51,52].
As for YOLOX, SimOTA can assign a different number of positive predictions to each ground truth based on their matching similarity, where higher similarity corresponds to more assignments. Such a dynamic mechanism is sensitive to the suitability of predictions for labels and can adaptively adjust positive samples. For multiscale situations, in which targets at different scales are assigned predictions at different scales, SimOTA is good at dividing positives and negatives. However, it cannot balance targets at different scales when all predictions are on the same scale. Larger targets tend to have a higher IoU with predictions and obtain more assignments, whereas small targets are easily neglected. Moreover, in the late stage of network training, too many positive samples (e.g., 6-8) are assigned to each target. The network might be misled by low-quality predictions, leading to a risk of a high false alarm rate.
To deal with the problems of scale imbalance and low-quality matching, a plain matching strategy, namely, center-based uniform matching, is proposed in this paper. As the only output map is subsampled, every location of the output corresponds to a grid in the input image and represents the predicted result around that grid. Considering that a CNN extracts features in a local manner, pixels in the output map closer to a target are more representative of it. Thus, according to the center location of the ground truth box, a fixed number of grids around the center point are selected as positive samples. The matching of the ground truth g_n and the grid that corresponds to pixel (i, j) in the output map can be summarized as follows:

p*_{ij,n} = 1 if (i, j) ∈ N_k(g_n), and p*_{ij,n} = 0 otherwise,

where N_k(g_n) represents the region of the k nearest grids around the center point of target g_n and k is a constant designed manually. The case of k = 4 is shown in Figure 8, where the four nearest grids are selected as positive samples. It can be seen that the assigned positive pixels are distributed in the central region of the target, and the target can be properly covered by their receptive field. Additionally, if a pixel lies in the positive neighborhood of multiple targets, it is assigned to the target with the largest IoU.
The hyperparameter k represents the number of positive samples for every target and determines the overall assignment level. When k = 1, only the center grid is assigned to the target, which is similar to CenterNet [20]. Generally speaking, more positives can improve training efficiency while generating more low-quality predictions. Therefore, considering the small size of ship targets, we set k = 4.
Center-based uniform matching assigns an equal number of positive samples to each ground truth regardless of their scale, which solves the imbalance problem of positive samples for a one-level feature map situation. Different from the uniform matching in [41], the proposed method measures the distances between grids and labels with central locations and avoids the hyperparameter design required by the anchor mechanism. Since the location of all target boxes is determined, assigned positives and the optimization objective are invariable during the training process, which is more stable and prevents low-quality predictions caused by too many positive assignments.

Output Mapping and Loss Function
The proposed network is an anchor-free detector. It directly outputs the detection result, and every pixel in the output map produces one predicted box. Concretely, for a pixel located at position (i, j) (where i = 0, 1, 2, . . . , W/s_l − 1 and j = 0, 1, 2, . . . , H/s_l − 1) in the output map, its dimension (i.e., the channel number of the output map) is 4 + 1 + c, corresponding to four output coordinates t^o_{ij} = (x^o_{ij}, y^o_{ij}, w^o_{ij}, h^o_{ij}), the predicted confidence p_{ij}, and the classification result c_{ij}. Additionally, the coordinates of the predicted bounding box t^p_{ij} can be obtained by decoding the output coordinates through:

x^p_{ij} = (i + x^o_{ij}) · s_l,  y^p_{ij} = (j + y^o_{ij}) · s_l,  w^p_{ij} = e^{w^o_{ij}} · s_l,  h^p_{ij} = e^{h^o_{ij}} · s_l,

where (x^p_{ij}, y^p_{ij}) is the center and (w^p_{ij}, h^p_{ij}) are the width and height of the predicted box. While training the network, the predicted boxes are first divided into positive targets and negative backgrounds using the proposed center-based uniform matching strategy. Then, for an image with N ship targets, the total loss function L can be calculated by:

L = (1/N_pos) Σ_{i,j} [ Σ_{n=1}^{N} p*_{ij,n} ( L_cls(c_{ij}, c_n) + α L_reg(t^p_{ij}, t_n) ) + L_obj(p_{ij}) ],

where N_pos is the number of positive assignments, α is a weighting parameter set as 5.0, c_n and t_n represent the category and bounding box of the n-th target, respectively, and p*_{ij,n} is an indicator that equals 1 when the prediction at (i, j) is assigned to the n-th target and 0 otherwise. The classification loss L_cls and objectness loss L_obj adopt binary cross entropy with sigmoid normalization, and the bounding box regression loss L_reg adopts IoU loss.
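As a sketch, decoding one output map could look like the following, assuming the common anchor-free parameterization (grid offsets added to the grid index for the center, an exponential mapping for width and height); the paper's exact transform may differ in detail.

```python
import numpy as np

def decode_outputs(out_map, stride):
    """Decode a (H, W, 5 + c) output map into boxes in input-image pixels.

    Assumes the common anchor-free parameterization, which is an
    illustrative assumption rather than the paper's verbatim transform.
    """
    h, w = out_map.shape[:2]
    js, is_ = np.mgrid[0:h, 0:w]               # j: row index, i: column index
    x = (is_ + out_map[..., 0]) * stride       # predicted box center x
    y = (js + out_map[..., 1]) * stride        # predicted box center y
    bw = np.exp(out_map[..., 2]) * stride      # predicted box width
    bh = np.exp(out_map[..., 3]) * stride      # predicted box height
    conf = 1 / (1 + np.exp(-out_map[..., 4]))  # sigmoid objectness
    return np.stack([x, y, bw, bh], axis=-1), conf
```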

Data Sets
Experiments on the SSDD and HRSID were conducted to verify the effectiveness of the proposed method. As the first open dataset for SAR ship detection, the SSDD is composed of 1160 images with 2456 ship targets. It contains samples with different resolutions, sizes, and sea conditions, providing abundant diversity to build a reliable detection model. Following the official scheme [42], 928 images were used for model training, and the remaining 232 images were used for testing. Considering the distribution of image size, these images were resized to 352 × 512 before being sent into the model. Additionally, a larger dataset called the HRSID was adopted to validate the generalization ability of our method, which comprises 5604 cropped SAR images and 16,951 ships. They were divided into a training set with 3642 images and a test set with 1962 images. The HRSID has a uniform image size of 800 × 800, displaying the characteristics of large detection scenes. In Figure 9, the ship size distribution of both datasets is given. It can be seen that both datasets are composed mainly of small ship targets.


Implementation Details
The experiments were conducted in the PyTorch 1.7.1, CUDA 11.0 framework based on a single NVIDIA Quadro P6000 GPU and the Ubuntu 20.04 system. The network was trained from scratch by the stochastic gradient descent (SGD) algorithm for 120 epochs with 0.9 momentum and 0.0005 weight decay. The learning rate followed a linear warmup and cosine decay schedule, with a maximum of 1 × 10^−4 at the 5th epoch and a minimum of 5 × 10^−6 after 100 epochs. Additionally, the batch sizes for SSDD and HRSID were set as 64 and 16, respectively. To make a fair comparison with other detectors, we canceled the mosaic and mixup enhancement strategies in YOLOX, and only employed random flip and crop as data augmentation for all models.
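The described schedule can be sketched as a simple function of the epoch index. The exact interpolation is not stated in the text, so linear warmup and a standard cosine decay between the given endpoints are assumed.

```python
import math

def lr_at_epoch(epoch, warmup_epochs=5, decay_end=100,
                lr_max=1e-4, lr_min=5e-6):
    """Linear warmup to lr_max at epoch 5, cosine decay to lr_min at
    epoch 100, then constant (peak and floor match the text above)."""
    if epoch <= warmup_epochs:
        return lr_max * epoch / warmup_epochs        # linear warmup
    if epoch >= decay_end:
        return lr_min                                # constant floor
    t = (epoch - warmup_epochs) / (decay_end - warmup_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))
```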

Evaluation Metrics
In order to evaluate the detection performance of different methods, we followed the evaluation criteria of MS COCO [53] and used AP, AP 50, AP 75, AP S, AP M, and AP L as evaluation metrics. By calculating the area under the precision-recall curve, the average precision (AP) achieves a comprehensive representation of the detection performance. In addition to AP, the precision rate (P), recall rate (R), and F1-score were used to indicate the performance on whole-scene images. These metrics are defined as follows:

P = TP / (TP + FP),  R = TP / (TP + FN),  F1 = 2 · P · R / (P + R),

where TP, FP, and FN represent the numbers of true positives, false positives, and false negatives, respectively. The most commonly used metric, AP 50, was based on an IoU threshold of 0.5. Correspondingly, AP 75 was based on a higher IoU threshold of 0.75. Additionally, AP was calculated across IoU thresholds from 0.5 to 0.95 with an interval of 0.05. AP S, AP M, and AP L represented AP for objects of small, medium, and large scales, respectively. In addition, floating-point operations (FLOPs) and the number of model parameters were adopted to measure the complexity of the model.
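The counting-based metrics can be computed directly from TP, FP, and FN, as in this small helper:

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1-score from detection counts.
    Zero denominators are mapped to 0.0 (a convention assumed here)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```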

Model Analysis
In order to verify the effectiveness of a ship detector with a one-level feature map and analyze the effect of each proposed component, we conducted a series of experiments. They were conducted on the SSDD with the same training setting to make a fair comparison.

Feature Level Selection
In YOLOF [41], C 5 , the spatially smallest feature map with the largest receptive field, was adopted to construct a SiSo (single-in-single-out) encoder because it contains sufficient context information for optical image targets. However, the most suitable feature level for ship targets in SAR images might change because of different target characteristics. To study the influence of different feature maps and find the best feature layer for ship detection, we trained the network with different one-level features, i.e., C 2 , C 3 , C 4 , and C 5 . It is worth noting that when a shallow feature map is adopted, the depth of the backbone needs to be reconsidered as those layers after the selected feature are not involved in the forward propagation anymore. In consideration of this, those redundant layers were removed for the simplicity of the network. All networks simply adopted one projector followed by a decoupled head and were trained with the SimOTA matching strategy.
The results are given in Table 1. It can be seen that the detection performance is sensitive to the adopted one-level feature. From C 5 to C 3, AP S and the overall performance show an increasing trend whereas AP L decreases, which proves that the scale range matched to the receptive field is of great importance to detection performance. Low-level features with small receptive fields are more suitable for small targets whereas high-level features with large receptive fields are fit for large targets. Surprisingly, the detection results of the one-level feature for scale-matched targets, e.g., AP L for C 5 and AP S for C 3, are a little better than the baseline. This result reveals that the feature representation ability of the one-level feature is sufficient for specific ship targets, and the role the feature pyramid plays is to make a balance between different scales. When the lowest feature C 2 is adopted, the performance is decreased compared to C 3, which demonstrates that the context information of C 2 is not strong enough to distinguish small ships from backgrounds since its receptive field is too small.

Meanwhile, the model parameter number and computational complexity for most one-level situations are greatly reduced. This reduction mainly comes from two aspects. First, the original neck, i.e., PAFPN, with a complicated pyramidal structure is replaced with a simple projector. Second, the backbone is truncated as deep layers are not necessary anymore. The only exception, i.e., the FLOPs of C 2, is because the feature map size is too large. Additionally, it can be seen that as the feature level becomes shallower, the model size decreases because the shallow part of the backbone has a smaller channel number. In consideration of the performance on small targets, we chose C 3 as the default feature of the proposed method.

Ablation Study of the Proposed Method
Though C 3 achieves comparable AP to the baseline, it has extremely poor performance in detecting large targets due to the limited scale range. We added RADC blocks and center-based uniform matching to C 3 to deal with the problems brought by adopting the one-level feature. As shown in Table 2, the detection accuracy at all scales is improved after adding RADC blocks. Particularly, AP M is increased by 7.85% and AP L is increased by 23.56%, proving the effectiveness of the RADC block in expanding the receptive field and enriching the context information of the feature. Furthermore, when center-based uniform matching is applied, the overall precision AP gains an improvement of 2.62% with no extra computational cost. Additionally, it can be seen that this improvement mainly comes from the increase in AP S and AP M. On the contrary, the performance for large targets is decreased, which is because center-based uniform matching treats all targets equally regardless of their scales and the network is more focused on small targets than before. From an overall perspective, the proposed method achieves better performance compared to YOLOX-s while less computation is needed, revealing the effectiveness and high efficiency of ship detection using a one-level feature map.

Table 2. Effect of RADC block and center-based uniform matching strategy.

Some visualized results are shown in Figure 10. The confidence threshold is set as 0.5 and the IoU threshold for NMS is set as 0.65. As a note, green, red, and yellow rectangles represent true positives, false alarms, and missed ship targets, respectively. As shown in Figure 10b, YOLOX-s has a poor detection rate on small ship targets. In the first three rows of Figure 10, there are a lot of small ships missed by YOLOX, including in both inshore and offshore areas. When the one-level feature C 3 is adopted to focus on small ships, the number of missed small ships in offshore areas is reduced.
Furthermore, it can be seen that by adding RADC blocks and training with center-based uniform matching, the proposed method captured those inshore ships in the third row, which indicates that the proposed method is qualified to detect small ships in various backgrounds. Though C 3 discovers more small ships, its ability to deal with large targets and complex inshore areas is significantly reduced, which can be proven by the results in the last two rows of Figure 10. Meanwhile, the proposed method can precisely locate large ships compared to C 3, which benefits from the increased receptive field brought by the RADC blocks. Specifically, the missed target in the last row shows that the proposed method is not capable of discovering targets of extreme shapes. This deficiency is unsurprising given that we abandoned the multiscale structure in pursuit of detection efficiency.


Ablation Study of RADC Block
In the RADC block, the residual connection and asymmetric dilated convolutions are designed to expand the receptive field while maintaining the information of small targets. To verify the effectiveness of these two components, we canceled the residual structure and used standard dilated convolutions to replace the four branches. The results in Table 3 show that the residual connection, which preserves original information, is essential to the detection of small targets. Additionally, the multibranch design with asymmetric dilated convolutions can effectively improve the detection performance of large targets, which means large ship targets with various shapes are properly covered by the four branches.

As the stacking of RADC blocks is the key to the construction of our neck, the number of stacked blocks is important to the receptive field and feature representation of the output map. The results in Table 4 show that as the number of stacked blocks increases, AP gradually increases, which manifests the effectiveness of the RADC block in improving the feature representation of the network. Though more blocks can boost the accuracy even further, we adopted four RADC blocks by default to keep the network lightweight and fast.

Positive Samples of Center-Based Uniform Matching
The number of assigned positive samples for each ground truth has a great impact on the calculation of the loss function and, further, on network optimization. Intuitively, in one-stage detectors with dense priors, more positive assignments can benefit the training process. On this account, we conducted experiments with different numbers of positive samples per target. As shown in Table 5, the performance of center-based uniform matching is quite robust across different k values, except for the k = 1 situation, in which target information is not learned efficiently. The best result is achieved with four positive assignments per target, indicating that four pixels on the final output map attain the best representation for most ship targets. Additionally, the performance begins to decline when more positive samples are assigned. This is because these samples are far from the ship center and hold inadequate information for target locating.

Extended Experiments on HRSID
In order to verify the generalization ability of the proposed method, we conducted the same experiments on the HRSID. The results are shown in Table 6. After removing the feature pyramid structure, the one-level feature C3 achieves decreased accuracy compared to the baseline, which is consistent with the results on the SSDD. By adding the proposed components, AP and AP 50 of C3 are increased by 2.25% and 2.36%, respectively. It is worth noting that the target scale of the HRSID is distributed over a wider range, which can be seen in Figure 9b, and there are many ships with extremely small or large sizes, which increases the difficulty of target detection. As a result, the proposed method achieves slightly lower overall accuracy compared to the baseline, whereas the model size and computational cost are both reduced substantially.

Figure 11 shows some detection results of the proposed method on the HRSID. It can be seen that small ships in both inshore and offshore areas can be properly detected, proving the effective feature learning of the proposed network to capture ship targets. Additionally, the few false alarms are all within the water area, which indicates the superiority of CNNs for automatically distinguishing land interference without sea-land segmentation. At the same time, the second row of Figure 11 demonstrates that ships adjacent to each other are easily missed by the proposed method. This phenomenon is related to the postprocessing operation and can be further alleviated with soft-NMS [54].


Comparison with Other CNN Detectors
The proposed method was compared with several representative detection networks, including Faster R-CNN [12], RetinaNet [18], FCOS [21], and the tiny version of YOLOX [22]. Furthermore, the original YOLOF [41] which adopts a one-level feature C 5 was also performed on SAR ship datasets for comparison. Except for YOLOX, all other networks adopted ResNet-50 [45] as backbone. To make a fair comparison, we conducted experiments using the same input image size and data augmentation methods. The results are given in Table 7, where for every column the best value is bolded and the second-best value is underlined.
According to the results, the model size of the proposed method is 10.3 MB, which is much smaller than that of other detectors and is conducive to deploying the detection algorithm on resource-constrained platforms. Compared with other anchor-free detectors, the model size of the proposed lightweight model is 4.0% of FCOS, 14.3% of YOLOX-s, and 25.4% of YOLOX-tiny. Additionally, the inference speed of the proposed method is also qualified for real-time ship detection. At the same time, the proposed method achieves the highest AP 50 on the SSDD and the second-best AP on the HRSID, which proves the adequacy of the one-level feature for ship detection in SAR images. With the proposed RADC block and center-based uniform matching, single-scale detection can achieve competitive accuracy. Specifically, it can be noticed that the performance of YOLOF on SAR ship detection is not as satisfying as that on natural object detection. This difference mainly results from the different target scales of natural targets and ship targets.

* Inference time is measured on HRSID with an input size of (800, 800).

Detection Results of Large-Scale Image
In order to further evaluate the effectiveness of the proposed method on large-scene SAR images, an additional dataset, namely, the large-scale SAR ship detection dataset-v1.0 (LS-SSDD-v1.0) [55], was used, in which a large-scale image (i.e., 16,000 × 24,000 pixels) was tested by the proposed method. The large image was cropped into image chips of 800 × 800 pixels with no overlap before being sent into the network, and a confidence threshold of 0.2 was used for the detection results, which are given in Table 8 and Figure 12. It can be drawn from Table 8 that, compared with YOLOX-s, the proposed method can detect more ships with fewer false alarms. At the same time, the shorter inference time is proof of its high detection efficiency. In the visualized result of Figure 12, most ships in both offshore and inshore backgrounds can be detected properly, indicating the stability of the proposed method in large-scene images.

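The chip-cropping step can be sketched as follows. Clamping chips at the image border is an assumption for images whose sides are not multiples of 800; the tested 16,000 × 24,000 image divides evenly, yielding 20 × 30 = 600 chips.

```python
def crop_to_chips(img_h, img_w, chip=800):
    """Yield (top, left) offsets of non-overlapping chips covering the image.

    Edge chips are clamped to stay inside the image; the paper only states
    800x800 chips with no overlap, so border handling is an assumption.
    """
    for top in range(0, img_h, chip):
        for left in range(0, img_w, chip):
            yield min(top, img_h - chip), min(left, img_w - chip)
```

Detections from each chip can then be mapped back to the full scene by adding the chip's (top, left) offset to the predicted box coordinates.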

Discussion
Based on the small scale of SAR ship targets and the relatively weak texture level of SAR images, we designed a lightweight detection network by removing the low-efficiency high-level features. Therefore, the detection accuracy of our one-level feature-based method is related to the target scale distribution of the dataset. In our experiments, we first verified the influence of adopting features of different levels on the SSDD and found that a shallow layer, C3, performs well in locating small ships. On this basis, we added the proposed RADC blocks and center-based uniform matching to C3 and boosted the performance significantly. Furthermore, when trained on a larger dataset, the HRSID, the proposed method can also achieve comparable accuracy to the baseline while having a much smaller model size. The comparison results with other CNN-based detectors are visualized in Figure 13. It can be seen that the proposed method has low model complexity and high detection speed. From an overall perspective, the proposed method achieves comparable performance to other CNN-based detectors with much lower computational cost, which proves the superiority of our method.
By building a lightweight detector within a single scale, we proved the validity of the one-level feature for detecting ships in SAR images. Nevertheless, there are still some problems to be solved. First, the proposed method is not capable of detecting ships with extreme shapes, which means the robustness of the proposed method still needs to be improved. Second, from the detection results in Figure 11, it can be seen that the detection ability of our method for adjacent ships is limited. In addition, the adopted target representation, i.e., bounding box, is not the most appropriate form for oriented ship targets and more precise forms can be used for better ship representation. These problems are to be solved in our future work.

Conclusions
In this paper, we proposed a lightweight network using a one-level feature to achieve high-efficiency ship detection in SAR images. We replaced the feature pyramid structure with a streamlined neck and designed RADC blocks to detect ships of various scales. With RADC blocks, both the limited receptive field and weak semantic level of the one-level feature can be improved effectively. Furthermore, to deal with the imbalance problem between different scales in the training stage, we proposed center-based uniform matching, which assigns a fixed number of positive samples to each target. Experiments on the SSDD and HRSID showed that the proposed components can effectively improve the performance of the one-level feature. Compared with mainstream CNN-based detectors, the proposed method is fast and accurate. Additionally, the detection results on a large-scene image also prove the effectiveness of the proposed method.
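The core idea of center-based uniform matching, assigning every target the same fixed number of positive samples regardless of its scale, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the pure-Python data layout, and the default k = 4 are assumptions made here for clarity.

```python
import math


def center_uniform_matching(anchor_centers, gt_centers, k=4):
    """For each ground-truth center, mark the k nearest anchor points
    as positive samples. Because k is fixed, small and large ships
    contribute equally many positives during training (a sketch of a
    balanced label assignment strategy; not the authors' exact code).

    anchor_centers: list of (x, y) feature-map anchor points
    gt_centers:     list of (x, y) ground-truth box centers
    Returns a dict mapping each target index to its k positive anchors.
    """
    positives = {}
    for gi, g in enumerate(gt_centers):
        # Sort anchor indices by Euclidean distance to this target center.
        order = sorted(range(len(anchor_centers)),
                       key=lambda ai: math.dist(anchor_centers[ai], g))
        positives[gi] = order[:k]
    return positives


# Usage: a 3x3 grid of anchor points and a single target at (1, 1).
anchors = [(float(x), float(y)) for x in range(3) for y in range(3)]
matches = center_uniform_matching(anchors, [(1.0, 1.0)], k=4)
print(matches[0][0])  # 4 -- the anchor at (1, 1) is nearest
```

Fixing k per target avoids the situation in IoU-based assignment where large ships dominate the positive set simply because they overlap more anchors.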

Funding: This work was supported in part by the National Natural Science Foundation of China (61971026).

Data Availability Statement: No new data were created or analyzed in this study. Data sharing is not applicable to this article.