SAR Ship Detection Dataset (SSDD): Official Release and Comprehensive Data Analysis

The SAR Ship Detection Dataset (SSDD) is the first open dataset to be widely used for researching state-of-the-art deep learning (DL)-based ship detection from Synthetic Aperture Radar (SAR) imagery. According to our investigation, 46.59% of the 161 public reports on DL-based SAR ship detection select SSDD as their study data source. This situation undoubtedly reveals the popularity and great influence of SSDD in the SAR remote sensing community. Nevertheless, the coarse annotations and ambiguous usage standards of its initial version hinder both fair methodological comparisons and effective academic exchanges. Additionally, its single-function horizontal rectangle bounding box (BBox) labels can no longer satisfy current research needs for the rotatable bounding box (RBox) task and the pixel-level polygon segmentation task. Therefore, to address these two dilemmas, in this review, advocated by the publisher of SSDD, we make an official release of SSDD based on its initial version. The official release covers three types: (1) a bounding box SSDD (BBox-SSDD), (2) a rotatable bounding box SSDD (RBox-SSDD), and (3) a polygon segmentation SSDD (PSeg-SSDD). We relabel the ships in SSDD more carefully and finely, and then explicitly formulate strict usage standards, e.g., (1) the training-test division, (2) the inshore-offshore protocol, (3) a reasonable definition of ship size, (4) the determination of densely distributed small-ship samples, and (5) the determination of samples with ships densely berthed in parallel at ports. These usage standards are all formulated objectively based on the differences in usage across the existing 75 (161 × 46.59%) public reports. They will be beneficial for fair method comparisons and effective academic exchanges in the future. Most notably, we conduct a comprehensive data analysis of BBox-SSDD, RBox-SSDD, and PSeg-SSDD.
Our analysis results can provide valuable suggestions for future scholars seeking to design DL-based SAR ship detectors with higher accuracy and stronger robustness on SSDD.


Deep Learning for SAR Ship Detection before SSDD
Taking SAR ship detection as an example: since the first public report was presented by C.P. Schwegmann et al. [23] at the IEEE International Geoscience and Remote Sensing Symposium (IGARSS) in Beijing, China on 3 November 2016, a large number of DL-based studies have emerged for SAR ship detection. According to our statistics, before 1 December 2017, there were four public reports in the SAR remote sensing community applying DL to SAR ship detection, including (1) three conference papers from IGARSS and the International Workshop on Remote Sensing with Intelligent Processing (RSIP) and (2) one journal paper from Remote Sensing. See their summary in Table 1. In the report of C.P. Schwegmann et al. [23], inspired by the work of Srivastava et al. [27] from the computer vision community, a very deep Highway CNN was established to achieve SAR ship discrimination, i.e., ruling out false alarms; DL was thus applied only to the last step of the traditional SAR ship detection workflow. Similarly, on 26 January 2017, Liu et al. [24] designed a CNN to achieve ship discrimination. Their network contains two convolution layers, two max-pooling layers, and three fully connected layers, and their workflow is inspired by R-CNN, proposed by Girshick et al. [15] in 2014. Although the above two reports drew support from DL, both still designed their detection methods according to the standard four-step process; that is, the CNN is only used to accomplish the ship-background binary classification task. From their reports, the binary classification accuracy of the proposed CNNs achieved a marked improvement over several traditional classifiers, e.g., decision trees, AdaBoost, and the support vector machine (SVM). Nevertheless, end-to-end detection was not achieved in their work, and their pipelines remained cumbersome.
In the report of Kang et al. [25], the famous Faster R-CNN [17] was first applied to SAR ship detection. Moreover, the detection outputs of Faster R-CNN were modified according to their classification confidences (scores): bounding boxes with scores lower than 0.2 were passed to a CFAR detector for re-detection to avoid missed detections. They held the view that Faster R-CNN always missed small ships because its deeper layers carried little small-ship information due to multiple max-pooling operations. Without exaggeration, this work has had a significant impact on follow-up scholars' research. Setting aside the CFAR post-processing, their work achieves fully end-to-end training and testing, avoiding the traditional four-step process. In particular, the exemption from the sea-land segmentation step reflects the greatest advantage of DL. Later, Kang et al. [26] further improved the network structure of Faster R-CNN, i.e., adding contextual information in region of interest (ROI) pooling and generating proposals from multiple layers with different depths. Their experimental results on Sentinel-1 SAR images revealed that the modified version could detect more ships and remove more false alarms. This work reaches the goal of end-to-end SAR ship detection.
Regretfully, none of the four reports in Table 1 provides a public training dataset for later scholars. This hindered the development of DL in the field of SAR ship detection: it has been extensively shown that a huge amount of data is needed for DL to be effective, and only big training data can ensure that DL networks learn target features deeply and accurately.

Initial Release of SSDD
Fortunately, on 1 December 2017, Jianwei Li, a co-author of this article, made his own collected dataset, the SAR Ship Detection Dataset (SSDD), publicly available at the conference SAR in Big Data Era: Models, Methods and Applications (BIGSARDATA) in Beijing, China [28]. SSDD is the first open dataset in this community. See the summary of SSDD in Table 2 and its detailed descriptions in Table 3. In his report [28], he applied the classical and famous two-stage detector Faster R-CNN to complete the SAR ship detection task. In addition, he proposed four strategies to further improve the performance of the standard Faster R-CNN algorithm on SAR ship detection: feature fusion, transfer learning, hard negative mining, and other optimized implementation details. Finally, his Improved-Faster R-CNN enhanced detection performance by ~8% mean average precision (mAP) on SSDD.

In fact, the report of Li et al. [28] merely released an initial, coarse version of SSDD. Surprisingly, this first open dataset gained unprecedented attention from a large number of scholars, beyond the author's imagination. According to our investigation covering 2016 to 25 August 2021 (the completion time of this manuscript), after the release of SSDD, 46.59% of the 161 public reports on DL-based SAR ship detection chose SSDD as their study data source. Obviously, this situation reveals the popularity and great influence of SSDD in the SAR remote sensing community.
These 75 (161 × 46.59%) public reports that used SSDD are listed in Section 2; only reports in English are recorded. See the pie chart in Figure 1, where "others" covers (1) self-collected databases and (2) the other five open datasets in Table 4. In total, there are 161 public reports using DL for SAR ship detection; among them, 75 (46.6%) used SSDD as their study data source.

Success of SSDD
We hold the view that the tremendous success of SSDD is due to the following seven factors:
1. SSDD was released the earliest, ~1.5 years before the second open dataset, SAR-Ship-Dataset. When no other datasets were available, SSDD was the only option.
2. Many countries and organizations have launched SAR satellites. Frequently used satellites for SAR ship detection include Sentinel-1 from the European Space Agency (ESA) [107], Gaofen-3 from China [108], TerraSAR-X from Germany [109], COSMO-SkyMed from Italy [110], ALOS from Japan [111], and Kompsat-5 from South Korea [112]. Except for Sentinel-1, these are commercial satellites whose data users must pay to download, increasing research and development costs. However, the resolutions of Sentinel-1 are modest: ships in Sentinel-1 SAR images are universally small, with unclear geometric features. The emergence of SSDD resolves this dilemma.
3. The publisher of SSDD is active in the SAR remote sensing research community, and several public media platforms have promoted the dissemination of this dataset.
4. The SAR image samples in SSDD are diverse: resolutions from 1 m to 15 m, sensors including RadarSat-2, TerraSAR-X, and Sentinel-1, polarizations of HH, VV, VH, and HV, different sea conditions, different ship scenes (inshore and offshore), and different ship sizes. Data diversity is one of the major issues in building reliable detection models. See Table 3.
5. Once several reports using SSDD had appeared, follow-up scholars chose to experiment on SSDD in order to compare their methods with those of previous scholars. As a result, the number of public reports using SSDD has gradually increased.
6. In the early stage, the GPU computing power available to most scholars in the SAR remote sensing community was limited. The sample number of SSDD is relatively moderate, i.e., 1160 images, compared with large-scale datasets in the computer vision community, e.g., ~9000 images in the PASCAL VOC dataset [113] and ~200,000 images in the COCO dataset [114]. This reduces equipment costs and enables ordinary researchers with general-performance GPUs to carry out research and development, making the community studying DL-based SAR ship detection on SSDD rather active; the increase in researchers in turn leads to an increase in research results. Moreover, a moderate sample number also facilitates model debugging and improves work efficiency, avoiding long training waits. Of course, when using SSDD, some few-shot strategies, e.g., data augmentation and transfer learning, should be considered to avoid overfitting.
7. There are typical hard-to-detect samples in SSDD. These samples all need special consideration in practical SAR ship detection applications, e.g., (1) small ships with inconspicuous features, (2) ships densely berthed in parallel at ports with overlapping hulls, (3) ships with large scale differences, (4) ship detection under severe speckle noise, (5) ship detection in complex backgrounds, and (6) multiple types of sea clutter. Ship detection on such difficult samples is a research hotspot, for both traditional hand-crafted methods and modern DL-based methods. Therefore, SSDD provides a possible data source for studying these focus issues.

Motivations of This Review
Nevertheless, the coarse annotations and ambiguous usage standards of SSDD's initial version have hindered fair methodological comparisons and effective academic exchanges. Firstly, there are coarse annotations in the initial version, e.g., missed ship annotations, false ship annotations, and non-compact bounding boxes. The initial version is therefore "dirty". The phenomenon of dirty data is widespread in the field of computer vision. With huge data, deep networks can reduce the negative influence of dirty data through batch training, improving the generalization ability of models. However, for few-shot SAR data, training oscillation may occur in deep networks, which lowers detection performance. Therefore, it is necessary to correct these annotations. Some scholars [51,52] have partly corrected them, but their revised labels are not publicly available. Secondly, in the original conference report of SSDD [28], the usage standards are ambiguous, even unreasonable. For example, the training-test division is random, yet the composition of the test set affects the resulting accuracy because there are too few samples [90]. This results in unfair methodological comparisons between different scholars. Moreover, the inshore-offshore protocol was not provided in the raw report, leading to unfair detection accuracy comparisons between inshore and offshore ships by later scholars. More importantly, there is still a lack of comprehensive data analysis of this dataset, which is not conducive to further research by other scholars.
Moreover, the single-function horizontal rectangle bounding box (BBox) labels of SSDD's initial version can no longer satisfy current research needs for the rotatable bounding box (RBox) task and the pixel-level polygon segmentation (PSeg) task. A horizontal BBox is not suitable for ships with large aspect ratios and arbitrary orientations. Furthermore, ships at ports are too closely packed to be effectively distinguished, resulting in missed detections [62], and there is much background clutter inside a BBox, reducing the benefits of ship feature learning. In contrast, an RBox better describes the true shape of the target while providing better ship detection accuracy. Therefore, some scholars [31,36,46,56,59,62,79,80,91] have employed RBoxes to detect ships in SSDD; the rotatable bounding box ground truths were labeled by themselves, but these RBox labels are not publicly available. Moreover, PSeg is the highest-level task because it operates at the pixel level. Obviously, SAR ship detection using PSeg is the most ideal because PSeg can almost completely suppress background clutter. Up to now, several scholars [52,54,98,99] have drawn support from it to achieve SAR ship detection; their PSeg ground truths were also labeled by themselves and are not publicly available either. Therefore, to address the above two dilemmas, advocated by the publisher of SSDD (a co-author of this review, Jianwei Li), we make an official release of SSDD based on its initial version [28]. The official release covers three types: (1) a bounding box SSDD (BBox-SSDD), (2) a rotatable bounding box SSDD (RBox-SSDD), and (3) a polygon segmentation SSDD (PSeg-SSDD). These re-released ground truth labels will be convenient for future scholars to use according to different task requirements. With the participation of many researchers, we relabel the SAR ships in SSDD more carefully and finely.
Furthermore, we explicitly formulate strict usage standards for the sake of fair methodological comparisons and effective academic exchanges, including (1) the training-test division, (2) the inshore-offshore protocol, (3) a reasonable definition of ship size, (4) the determination of densely distributed small-ship samples, and (5) the determination of samples with ships densely berthed in parallel at ports. To be clear, these usage standards are determined based on the differences in usage across the existing 75 public reports.
Most notably, we also conduct a comprehensive data analysis of BBox-SSDD, RBox-SSDD, and PSeg-SSDD, which was missing in the initial release. Our analysis results can provide valuable suggestions for future scholars seeking to design DL-based SAR ship detectors with higher accuracy and stronger robustness on SSDD. We expect that this review will be useful for scholars studying DL-based SAR ship detection.
The main contributions of this review are summarized as follows:
1. The official version of SSDD is released, covering three types: BBox-SSDD, RBox-SSDD, and PSeg-SSDD. This will be convenient for future scholars to use according to different task requirements.
2. A comprehensive data analysis of BBox-SSDD, RBox-SSDD, and PSeg-SSDD is conducted. It provides valuable suggestions for future scholars seeking to design DL-based SAR ship detectors with higher accuracy and stronger robustness on SSDD.
3. More reasonable and stricter usage standards are formulated objectively, based on the differences in usage across the existing 75 (161 × 46.59%) public reports. We also provide potential solutions for improving detection performance on hard-to-detect ship samples.
The rest of this review is arranged as follows. Section 2 provides a summary of public reports using SSDD, convenient for scholars to consult, analyze, and summarize trends. Section 3 shows the official ground truth ship labels in SSDD, including BBox, RBox, and PSeg, providing potential scholars with a comprehensive overview without querying the data directory. Section 4 introduces the data directory of SSDD, where we also provide some useful tools in the Python language for user-friendly service. Data analysis is presented in Section 5, along with some valuable suggestions for using SSDD. Usage standards are listed in Section 6; these will contribute to reasonable method comparisons in the future. Finally, Section 7 summarizes the whole article.
Last but not least, the official release version of SSDD (i.e., BBox-SSDD, RBox-SSDD, and PSeg-SSDD) is available at https://github.com/TianwenZhang0825/Official-SSDD (accessed on 25 August 2021). Table 5 shows the summary list of the 75 public reports using SSDD, sorted by publication time. To facilitate readers' browsing, public reports from the same publication year are marked with uniform color blocks.

Summary of Public Reports Using SSDD
Moreover, we provide some statistical analysis in Figure 2. From Figure 2a, the number of public publications has increased year by year, with the highest growth rate in 2019. The growth rate slowed slightly in 2020, possibly because of the successive releases of other datasets. By 25 August 2021 (the completion time of this manuscript), there had already been 24 public reports using SSDD in 2021; one can expect the full-year total to exceed the 28 reports of 2020. From Figure 2b, studies using SSDD are active in both peer-reviewed journals and conferences, showing SSDD's great influence. From Figure 2c, except for one conference report from the Indian scholars Anil Raj et al. [87], all reports are from Chinese scholars. This may be because (1) the publisher of SSDD, Jianwei Li, is from China; (2) SSDD was first released at the BIGSARDATA conference in Beijing, China; and (3) the first report applying DL to SAR ship detection, by C.P. Schwegmann et al. [23], was presented at the 2016 IGARSS conference in Beijing, China, although they did not use SSDD. China has therefore become the most active country or region in this research.
From Figure 2d, SSDD appears in a variety of mainstream remote sensing journals, e.g., MDPI Remote Sensing (RS), the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS), IEEE Transactions on Geoscience and Remote Sensing (TGRS), and IEEE Geoscience and Remote Sensing Letters (GRSL). This shows that SSDD is widely accepted by academics in the remote sensing community.
Finally, we also counted the label types used in these 75 public reports, i.e., BBox, RBox, and PSeg, in Figure 3. As shown in Figure 3, 62 papers use BBox, only 9 use RBox, and only 4 use PSeg. The initial release of SSDD only provided BBox labels, which is why BBox accounts for the majority (82.7%). However, a BBox is not compact enough for ship detection: it contains much background clutter, resulting in insufficient ship feature extraction. Therefore, Wang et al. [31], An et al. [36,80], Chen et al. [46], Pan et al. [56], Yang et al. [59,79], Chen et al. [62], and He et al. [91] later used RBoxes to detect ships in SAR images, with methods inspired by rotational text detection in the deep learning community, e.g., R2CNN [115] and EAST [116]. Compared with RBox, PSeg is better because it operates at the pixel level; obviously, SAR ship detection using PSeg is the most ideal because PSeg can almost completely suppress background clutter. So far, only four papers adopt such labels: Su et al. [52], Mao et al. [54], Sun et al. [98], and Wu et al. [99]. It should be noted that only one of the four, Su et al. [52], realizes ship segmentation authentically; the other three merely used PSeg labels to guide BBox ship detection. Specifically, Mao et al. [54] generated semantic feature maps to produce attention scores so as to obtain better ship BBoxes. Sun et al. [98] designed a semantic attention-based network for inshore ship detection, in which PSeg labels were used to suppress cruciform sidelobes. Wu et al. [99] established an interaction between instance segmentation and object detection, greatly improving BBox detection accuracy in inshore scenes. The fact that rather few scholars use RBox and PSeg labels for SAR ship detection stems from the lack of ground truths.
The initial release of SSDD did not provide these two label types. This defect hinders further research by relevant scholars using RBox and PSeg. Motivated by this, this article addresses the defect.

SSDD
In this section, we display the ship ground truths and describe the label formats of BBox-SSDD, RBox-SSDD, and PSeg-SSDD in Section 3.1, Section 3.2, and Section 3.3, respectively. To facilitate the reader's overall browsing, all images in the test set (i.e., images with a name suffix of 1 or 9) are displayed. Figure A3 in Appendix A shows the bounding box ground truths in the test set of BBox-SSDD; the inshore samples are marked in magenta. Figure 4 shows the label format of BBox-SSDD, explained using the PASCAL VOC format, which the initial release of SSDD adopted. One BBox is described by two points in the x-y image coordinates, i.e., A(xmin, ymin) and B(xmax, ymax). Thus, the width (w) and height (h) of the BBox are w = xmax − xmin and h = ymax − ymin.

BBox-SSDD
Some CNN-based detectors adopt other descriptions of a BBox, e.g., the normalized (x_center, y_center, w, h) format of YOLO [19]. Users can flexibly convert the labels we provide; of course, we also provide conversion tools in the Python language. In fact, this review can be used as a detailed tutorial of SSDD for beginners. To be clear, in Figure 4a, the image channel number defaults to 3 (RGB). Since SAR images have only a single gray-level channel, we copy this channel twice to obtain 3-channel RGB images. This facilitates the use of detectors from the computer vision field; moreover, current ImageNet pre-training weights are all based on 3-channel inputs, and in our experience, using pre-training weights accelerates network convergence and reduces the risk of overfitting during fine-tuning. Of course, one can extract the data of a single channel to design "specialized" SAR ship detectors from scratch; this practice is worth advocating, since it can eliminate learning bias between different data domains. Figure A1 in Appendix A shows the rotational bounding box ship ground truths in the test set of RBox-SSDD; the inshore samples are marked in magenta. In Figure A1, the boxes are rectangles, not parallelograms; some do not look right-angled because images are scaled (without maintaining the aspect ratio) for tidiness. Figure 5 shows the label format of RBox-SSDD; the initial release of SSDD did not provide such labels. One RBox can be described by four vertices, i.e., A(x1, y1), B(x2, y2), C(x3, y3), and D(x4, y4); we call this scheme the four-point type. Here, the four corners of the quadrilateral are right angles, i.e., ∠A = ∠B = ∠C = ∠D = 90°. Thus, the w and h of the RBox are the side lengths w = |AB| = sqrt((x2 − x1)^2 + (y2 − y1)^2) and h = |BC| = sqrt((x3 − x2)^2 + (y3 − y2)^2).
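As an illustrative sketch (not the official conversion tools shipped with SSDD), the conversions above can be written in a few lines of Python; the coordinates and helper names here are our own:

```python
import math

def corners_to_center(xmin, ymin, xmax, ymax, img_w, img_h):
    """VOC two-point BBox -> normalized (cx, cy, w, h), YOLO-style."""
    w, h = xmax - xmin, ymax - ymin
    cx, cy = xmin + w / 2.0, ymin + h / 2.0
    return cx / img_w, cy / img_h, w / img_w, h / img_h

def gray_to_rgb(gray):
    """Replicate a single gray channel into 3 identical channels."""
    return [[(v, v, v) for v in row] for row in gray]

def rbox_wh(a, b, c):
    """Four-point RBox: w and h are the lengths of sides AB and BC."""
    return math.dist(a, b), math.dist(b, c)

corners_to_center(100, 50, 300, 150, 400, 200)  # -> (0.5, 0.5, 0.5, 0.5)
```

Real pipelines would operate on NumPy arrays rather than nested lists, but the arithmetic is identical.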

RBox-SSDD
The above representation may be the most intuitive, but some detectors adopt different coding modes. Therefore, we also provide another description of an RBox, i.e., (x, y, w, h, θ) in Figure 5a, where (x, y) is the coordinate of the center P and θ is the direction angle. We call this scheme the center-angle type; it is equivalent to the four-point type. In RBox-SSDD, θ ∈ [0°, 90°] is expressed in degrees rather than radians; of course, one can convert it to radians according to application requirements. Furthermore, θ is the angle between the principal ship axis and the negative y-axis, measured counter-clockwise, which is more in line with human visual habits. One can also use 90° − θ to represent the ship direction angle.
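A minimal sketch of the equivalence between the two schemes, assuming θ is applied as a counter-clockwise rotation of the box axes (the reference axis should follow the dataset definition above); the function name is our own:

```python
import math

def center_angle_to_points(x, y, w, h, theta_deg):
    """Center-angle RBox (x, y, w, h, θ in degrees) -> four vertices A, B, C, D.
    θ is applied as a counter-clockwise rotation of the w/h half-extents."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    corners = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    # rotate each corner offset around the center, then translate to (x, y)
    return [(x + dx * c - dy * s, y + dx * s + dy * c) for dx, dy in corners]

center_angle_to_points(10, 10, 4, 2, 0)
# -> [(8.0, 9.0), (12.0, 9.0), (12.0, 11.0), (8.0, 11.0)]
```

The inverse mapping (four points back to center-angle form) follows from the midpoint of the diagonal and the atan2 of side AB.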
It should be noted that Wang et al. [31], An et al. [36,80], Chen et al. [46], Pan et al. [56], Yang et al. [59,79], Chen et al. [62], and He et al. [91] used RBoxes to detect ships in SAR images, but their angle estimation results are discrete, e.g., 30°, 60°, 90°, and so on. Such an angular interval is too coarse to describe the ship direction finely. In contrast, the angle labels we provide are continuous float-type rather than discrete int-type, which allows better direction estimation accuracy. Obviously, RBox-SSDD has the potential to promote greater progress in ship direction estimation; scholars will need to design stronger regression models to achieve such fine angle regression.
Given the above, scholars can use RBox-SSDD flexibly according to the labels we provide. Figure A2 in Appendix A shows the ship polygon segmentation labels in the test set of PSeg-SSDD; the inshore samples are marked in magenta. In Figure A2, different ships are covered with different colors for clarity, and the outline of each ship polygon is marked in green. From Figure A2, it is obviously most satisfactory to adopt polygon segmentation labels to detect ships; this task is at the pixel level, similar to traditional CFAR detection. Figure 6 shows the label format of PSeg-SSDD; the initial release of SSDD did not provide such labels. From Figure 6, a series of points describes the outline of the ship, and these points can be connected into a closed polygon. We used the famous open annotation tool LabelMe [117] to obtain these points. To be clear, due to the scale differences of ships in SSDD, ships of different sizes result in different numbers of outline points: large ships have more points while small ones have fewer. Moreover, in Figure 6b, the order of these points is counter-clockwise.

PSeg-SSDD
Based on PSeg-SSDD, one can study the semantic segmentation task using a fully convolutional network (FCN) [118] or U-Net [119]. Semantic segmentation refers to classifying each pixel in the image; for SAR ship detection, it is a ship-background binary pixel-level classification task. If a pixel is predicted with a "1" label, it is regarded as a ship pixel; with a "0" label, it is background. Taking the PASCAL VOC dataset format as an example, we provide the semantic segmentation mask in Figure 7. In Figure 7b, the black regions of the mask are background pixels while the green ones are ship pixels; because there is only one ship category, the mask colors are only black and green, unlike the 20 colors of the PASCAL VOC dataset. Furthermore, based on PSeg-SSDD, one can study the instance segmentation task using Mask R-CNN [120] or PANet [121]. Instance segmentation means that the machine automatically frames different instances in the image with an object detection method and then marks the pixels of each instance area with a semantic segmentation method.
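To make the polygon-to-mask relationship concrete, here is a minimal, dependency-free sketch that rasterizes one outline into a 0/1 semantic mask via even-odd ray casting; real pipelines would use a library rasterizer (e.g., from OpenCV or PIL), and the names and coordinates here are illustrative:

```python
def point_in_polygon(px, py, poly):
    """Even-odd ray casting: is (px, py) inside the closed polygon `poly`?"""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            # x-coordinate where this edge crosses the horizontal ray
            xc = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < xc:
                inside = not inside
    return inside

def polygon_to_mask(poly, width, height):
    """Rasterize one ship outline into a binary (0/1) semantic mask,
    sampling each pixel at its center."""
    return [[1 if point_in_polygon(x + 0.5, y + 0.5, poly) else 0
             for x in range(width)] for y in range(height)]

# A small rectangular "ship" outline in an 8 x 6 image
mask = polygon_to_mask([(1, 1), (5, 1), (5, 4), (1, 4)], 8, 6)
```

A multi-ship instance mask would repeat this per ship, writing a distinct instance id instead of 1.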
In fact, the polygon segmentation labels can act as labels for BBox and RBox. Suppose the polygon segmentation label of a ship is the set L_PSeg of n outline points (x_i, y_i). Then the BBox label L_BBox can be obtained by taking the extrema of these points: xmin = min_i x_i, ymin = min_i y_i, xmax = max_i x_i, ymax = max_i y_i. This formula is obvious because a BBox is exactly the smallest circumscribed horizontal rectangle of a PSeg. Therefore, for the instance segmentation task, the training label is shown in Figure 8: each ship has a polygon outline label and a horizontal rectangular box at the same time. For the same ship, the polygon outline and the rectangular box share the same color, showing that the two label types represent the same ship; different ships have different colors because they are different instances, as determined by the definition of instance segmentation.
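The extrema rule above, as a one-function Python sketch (the function name and sample points are ours):

```python
def pseg_to_bbox(points):
    """Smallest circumscribed horizontal rectangle of a polygon outline:
    returns (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)

pseg_to_bbox([(3, 2), (7, 4), (5, 9), (2, 6)])  # -> (2, 2, 7, 9)
```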
Finally, the RBox label L_RBox can be obtained from the PSeg label L_PSeg: the former is the smallest circumscribed (rotated) rectangle of the latter, i.e., L_RBox = cv2.minAreaRect(L_PSeg), where cv2 is the python-opencv package and minAreaRect is the function that calculates the smallest circumscribed rectangle. The input of minAreaRect is a point set, and the output is a list whose first element is a tuple (x, y), the center coordinate of the box; whose second element is a tuple (w, h), the width and height of the box; and whose last element is a float-type number θ, the ship direction angle ranging from 0° to 90°. Figure 9 is a diagrammatic sketch of the data directory. In the root directory (Official-SSDD), there are four folders: BBox-SSDD (labels of BBox), RBox-SSDD (labels of RBox), PSeg-SSDD (labels of PSeg), and BBox-RBox-PSeg-SSDD (labels of BBox, RBox, and PSeg, provided to study "all-in-one" ship detection). We provide two types of annotations: the PASCAL VOC format (voc_type) [113] and the COCO format (coco_type) [114].

Data Directory
In Figure 9, due to limited pages, we only expand the specific contents of the first folder (BBox-SSDD). From Figure 9, we also provide the SAR images with ship ground truths in JPEGImages_BBox_GT. It should be noted that, after further updates and maintenance in the future, the data directory may change slightly.

Data Analysis
The initial reports on SSDD lack comprehensive data analysis, which is not conducive to further research by follow-up scholars. Therefore, this review complements it. First, we analyze the image size (width and height) distribution.
The sizes of the sample images in a dataset have a significant impact on the final detection performance. Generally, a deep network needs a fixed image input size to maintain the unity of feature dimensions, e.g., SSD-300's input size is 300 × 300, SSD-512's is 512 × 512 [20], and YOLO's is 416 × 416 or another multiple of 32 [19]. When the images in the dataset are not uniform, one usually needs image interpolation methods to resize the original images. The resized scale is therefore rather important and should preferably be determined from the image size distribution of the dataset. For example, in the computer vision community, the default input size for the COCO dataset is 1333 × 800, an empirical optimum adopted by a wide range of scholars. If the resized scale is too small, the information of many ships will be lost after a series of convolutions, resulting in missed detections. If it is too large, the network's computations will increase rapidly, resulting in slower convergence, longer training time, and higher computational cost.
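For instance, a common way to resize without distorting ship aspect ratios is letterboxing: scale the longer side to the target size and pad the remainder. A minimal sketch (the function name and numbers are illustrative, not part of the SSDD tools):

```python
def letterbox_dims(img_w, img_h, target):
    """Scale so the longer side equals `target`, preserving the aspect
    ratio, then report the padding needed for a square target x target input."""
    scale = target / max(img_w, img_h)
    new_w, new_h = round(img_w * scale), round(img_h * scale)
    pad_w, pad_h = target - new_w, target - new_h
    return new_w, new_h, pad_w, pad_h

letterbox_dims(500, 350, 512)  # e.g. a typical SSDD-sized image -> (512, 358, 0, 154)
```

Padding (rather than stretching) keeps the width-height ratio of ships intact, which matters for the aspect-ratio analysis below.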
Nowadays, when using SSDD, scholars in the SAR ship detection community adopt different input sizes in their networks, e.g., 160 × 160 in Zhang et al. [65], 300 × 300 in Wang et al. [31], 500 × 500 in Jian et al. [89], 512 × 512 in Zhang et al. [73], 600 × 600 in Yu et al. [97], 600 × 1000 in Wei et al. [51], and so on. This leads to unreasonable accuracy comparisons between methods: for the same network, the accuracy at 300 × 300 is often inferior to that at 500 × 500 or larger. Therefore, a detailed analysis of the image sizes is necessary. Moreover, an appropriate input size is also conducive to designing better data augmentation methods, e.g., the work of Yang et al. [122]. Figure 10 shows the SAR image sample size statistics in SSDD; the three dataset types share the same analysis results. In Figure 10, the test set consists of the images whose file-number suffix is 1 or 9 (the samples marked in magenta), and the rest constitute the training set. As a result, the training-test ratio is 8:2, i.e., 928 training samples and 232 test samples. We investigate the width-height distribution of images in Figure 10a and the ratio between width and height in Figure 10b.
From Figure 10a, the following conclusions can be drawn:
1. The sample sizes in SSDD differ greatly. Taking the entire dataset as an example, the smallest width is 214 pixels while the largest is 668 pixels, more than a threefold difference; the smallest height is 160 pixels while the largest is 526 pixels, also more than a threefold difference.
2. From the green lines, the widths of images are generally larger than the heights. Therefore, we hold the view that it may be unreasonable to directly stretch the height of an image to its width: the aspect ratio of the original ship in the SAR image would change, which violates the SAR imaging mechanism.
3. Many samples share the same 500-pixel width (a strong cluster at width = 500), but they do not share the same height; their heights range from ~200 to ~500 pixels. It is noteworthy that the mean, median, and mode are all located at width = 500.
From Figure 10b, the following conclusions can be drawn:
1. The ratio between the image width and height follows a normal distribution, with the highest-frequency ratio at ~1.4. Therefore, we hold the view that it is better to maintain this ratio during image pre-processing, because this minimizes the information loss caused by pre-processing.
2. The aspect ratio of images has an extreme tailing effect at both ends of the histogram. Therefore, scholars can consider cropping images with extremely differentiated aspect ratios so as to normalize the network input. In fact, this practice can also serve as data augmentation.
Furthermore, from Figure 10, the image scale distribution on the entire dataset, the training set, and the test set remains roughly unchanged. This shows that the dataset partition mechanism we customized is reasonable. In essence, DL fits the sample distribution of the training set; consequently, the model can generalize to a test set that shares a similar distribution. Finally, we suggest a resized image scale of 500 × 350 pixels, because it is close to the median values of the width and the height in Figure 10a, and its ratio of ~1.4 (500/350) is in line with the mean of the Gaussian distribution in Figure 10b.

Figure 11 shows the data statistics results on the BBox-SSDD dataset. We investigate the distribution of the BBox width and height in Figure 11a, the distribution of the ratio between the BBox width and height in Figure 11b, the distribution of the BBox area in Figure 11c, and the distribution of the BBox center coordinates (x, y) in Figure 11d, where x = (xmin + xmax)/2 and y = (ymin + ymax)/2 from Figure 4a.
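The suggested 500 × 350 rescaling, keeping the ~1.4 aspect ratio by padding rather than stretching, might be sketched as follows; `letterbox` is a hypothetical helper, the nearest-neighbour sampling and zero-padding are simplifications of ours:

```python
import numpy as np

def letterbox(img, out_w=500, out_h=350):
    """Scale a 2-D image to fit inside out_h x out_w without changing
    its aspect ratio (nearest-neighbour sampling for brevity), then
    zero-pad the remainder; 500 x 350 is the resized scale suggested above."""
    in_h, in_w = img.shape
    scale = min(out_w / in_w, out_h / in_h)       # fit both dimensions
    new_h = int(round(in_h * scale))
    new_w = int(round(in_w * scale))
    rows = (np.arange(new_h) / scale).astype(int).clip(0, in_h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, in_w - 1)
    resized = img[rows[:, None], cols]
    canvas = np.zeros((out_h, out_w), dtype=img.dtype)
    top = (out_h - new_h) // 2                    # centre the image
    left = (out_w - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas
```

Because the scale factor is shared by both axes, the ship aspect ratios in the original SAR image are preserved, in line with conclusion 2 from Figure 10a.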

Data Analysis on BBox-SSDD
From Figure 11a, the following conclusions can be drawn:
1. Ships in SSDD are universally small. The width-height distribution of BBoxes presents a symmetrical funnel shape, with more ships at the top of the funnel and fewer ships at its center. This shows that SAR ships are rarely square (the green line), which is reasonable because ships are always flat. The average ship size is only ~35 × 35 pixels; detecting such small ships is extremely difficult, so scholars should pay special attention to this phenomenon.
2. The ship size distribution is symmetrical about the diagonal because the breadth and length of a ship are not fully distinguished in the image coordinate system; the BBox width and height are sometimes interchanged.

From Figure 11b, the aspect ratio of BBox ships is seriously imbalanced: most ships have an aspect ratio of less than 1. Ships with extreme aspect ratios are easily missed because of the scarcity of such training data. Scholars can use 90° rotation data augmentation to alleviate this problem. Moreover, one can also selectively augment only the ships with a large length-width ratio so as to balance the learning of the network.
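The 90° rotation augmentation mentioned above can be sketched as follows, assuming horizontal BBoxes in (xmin, ymin, xmax, ymax) pixel coordinates with the max edges exclusive; `rotate90` is a hypothetical helper:

```python
import numpy as np

def rotate90(img, boxes):
    """Rotate an image 90 degrees counter-clockwise together with its
    horizontal BBoxes, given as (xmin, ymin, xmax, ymax) with the max
    edges exclusive. Each box's width and height swap, which rebalances
    the aspect-ratio statistics discussed above."""
    h, w = img.shape[:2]
    rotated = np.rot90(img)  # 90 degrees CCW; output shape is (w, h)
    new_boxes = [(ymin, w - xmax, ymax, w - xmin)
                 for xmin, ymin, xmax, ymax in boxes]
    return rotated, new_boxes
```

Applying this only to ships with a large length-width ratio would realize the selective augmentation suggested above.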
From Figure 11c, we can see even more clearly that SAR ships are small. However, the upper limit of the BBox areas is rather large; in other words, a few ships are extremely large, but their number is tiny: only 4 BBoxes in the entire dataset have an area > 50,000 pixels. These ships probably come from the highest-resolution images, because the BBox area is counted in pixels and the same ship covers more pixels in a high-resolution image than in a low-resolution one. Such huge cross-scale ship detection is a challenging task. To realize multi-scale ship detection under multi-resolution conditions, scholars can appropriately down-sample large ships to improve the multi-scale feature learning of the network, and up-sample too-small ships by interpolation to avoid their information loss in the network. A further potential solution to small ship detection is to design a network that super-resolution-reconstructs small ships at low resolution so as to enrich their characteristics.
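The BBox statistics analyzed in Figure 11 (width, height, area, and center) can be collected from the voc_type labels with a short script; the tag names below follow the standard PASCAL VOC layout and may need adjusting to SSDD's actual annotation files:

```python
import xml.etree.ElementTree as ET

def bbox_stats(voc_xml_path):
    """Parse one PASCAL-VOC-style annotation file (the voc_type labels)
    and return (width, height, area, (cx, cy)) for every ship BBox,
    with cx = (xmin + xmax)/2 and cy = (ymin + ymax)/2 as in Figure 11d."""
    stats = []
    for obj in ET.parse(voc_xml_path).getroot().iter("object"):
        box = obj.find("bndbox")
        xmin = float(box.findtext("xmin"))
        ymin = float(box.findtext("ymin"))
        xmax = float(box.findtext("xmax"))
        ymax = float(box.findtext("ymax"))
        w, h = xmax - xmin, ymax - ymin
        stats.append((w, h, w * h, ((xmin + xmax) / 2, (ymin + ymax) / 2)))
    return stats
```

Aggregating these tuples over all annotation files yields the histograms of Figure 11a-d.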
From Figure 11d, the positions of ships in images are completely random, so scholars need to be careful when using random slicing for data augmentation.

Figure 12 shows the data statistics results on the RBox-SSDD dataset. We investigate the distribution of the RBox width and height in Figure 12a, the distribution of the ratio between the RBox width and height in Figure 12b, the distribution of the RBox area in Figure 12c, the distribution of the RBox center coordinates (x, y) in Figure 12d (see Figure 5a), and the distribution of the RBox angle θ in Figure 12e.

Data Analysis on RBox-SSDD
From Figure 12a, compared with the BBox size distribution in Figure 11a, the size distribution of ships is closer to the edge of the funnel. In other words, almost no ships are square, which is in line with common sense. BBox sizes appear on the diagonal when a ship's direction is around 45°, because then the BBox width equals its height. This indicates that RBox is more advanced than BBox, as the former better depicts the real ship.
From Figure 12b, the distribution of the width-height ratio of RBox differs from that of BBox. When the aspect ratio is about 3, a small crest appears. This is because RBox successfully identifies the long and short sides of ships, so the aspect ratio is mostly greater than 1. However, RBox is still powerless for very small ships because their total number of pixels is too small. From Figure 12c, conclusions similar to Figure 11c can be drawn; from Figure 12d, conclusions similar to Figure 11d can be drawn. Because of limited pages, we do not repeat them.
Only RBox-SSDD provides the direction angle θ distribution, shown in Figure 12e. From Figure 12e, the angle distribution of ships presents a bowl shape: high at both ends and low in the middle. In other words, most ships maintain a roughly vertical or horizontal orientation in SAR images, whereas some ships parked at ports may have large direction angles. Based on this phenomenon, we can roughly identify inshore ships and then specifically design the network to improve their detection accuracy. Moreover, if scholars want to improve the multi-directional detection performance of detectors, we hold the view that ships with strongly identifiable angles (the bowl bottom in Figure 12e) should be rotated to generate more samples so as to avoid imbalanced angle learning.

Figure 13 shows the data statistics results on the PSeg-SSDD dataset. We investigate the distribution of the PSeg area in Figure 13a, the distribution of the PSeg perimeter in Figure 13b, and the distribution of the proportion of the PSeg area within the whole image in Figure 13c.
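The angle-balancing rotation suggested above for RBox samples can be sketched as follows, assuming labels of the form (cx, cy, w, h, θ in degrees) with θ taken modulo 180°; `rotate_rbox` is a hypothetical helper, and the image pixels must of course be rotated in the same way:

```python
import math

def rotate_rbox(rbox, delta_deg, img_w, img_h):
    """Rotate one RBox label (cx, cy, w, h, theta in degrees) by delta_deg
    about the image centre. theta is kept modulo 180 because a ship's long
    axis is direction-ambiguous; the sign conventions may need flipping
    depending on how the image y-axis is defined."""
    cx, cy, w, h, theta = rbox
    a = math.radians(delta_deg)
    ox, oy = img_w / 2, img_h / 2
    dx, dy = cx - ox, cy - oy
    new_cx = ox + dx * math.cos(a) - dy * math.sin(a)   # rotate the centre
    new_cy = oy + dx * math.sin(a) + dy * math.cos(a)
    return new_cx, new_cy, w, h, (theta + delta_deg) % 180
```

Applying random `delta_deg` values only to the under-represented angles (the bowl bottom in Figure 12e) would flatten the angle histogram.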

Data Analysis on PSeg-SSDD
In Figure 13a, in line with previous statements, the number of large ships is far smaller than that of small ships. This quantitative imbalance inevitably makes it difficult for networks to learn the characteristics of large ships effectively; as a result, the detection accuracy of large ships is lower than that of small ships. Quantitative comparison results can be found in the work of Mao et al. [54]. This phenomenon seems contrary to common sense because, in fact, small ships are usually harder to detect. One can down-sample large-ship images to improve the detection performance of large ships: in this way, the rare large ships become medium-sized ships, avoiding hugely scale-imbalanced learning in the network.
In Figure 13b, the distribution of the PSeg perimeter differs somewhat from that of the PSeg area, judging visually from the highest values. In other words, a larger ship area does not necessarily mean a larger perimeter. This is caused by the flat shape of ships. Moreover, speckle noise may affect the statistics of the actual ship perimeter, because the noise makes the ship's edge more uneven and thus increases the contact length between the ship and the ocean. Therefore, when using SSDD, one had better consider suppressing speckle noise. Previous work by Zhang et al. [55] and Chen et al. [39] considered this problem.
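The PSeg area and perimeter statistics of Figure 13a,b can be reproduced from the polygon vertices with the shoelace formula; this sketch assumes polygons given as ordered (x, y) vertex lists:

```python
import math

def polygon_area_perimeter(pts):
    """Shoelace area and perimeter of a PSeg polygon given as an ordered
    list of (x, y) vertices. A speckle-roughened contour mainly inflates
    the perimeter, not the area, matching the observation above."""
    n = len(pts)
    area = 0.0
    perim = 0.0
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]        # wrap around to close the polygon
        area += x1 * y2 - x2 * y1        # shoelace cross term
        perim += math.hypot(x2 - x1, y2 - y1)
    return abs(area) / 2, perim
```

For example, a 10 × 4 rectangular hull and a jagged contour enclosing the same pixels share roughly the same area but can have very different perimeters.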
From Figure 13c, SAR ships always account for a very small proportion of the whole image, mostly <4%. This vividly shows the "bird's-eye" characteristic of SAR. Therefore, SAR remote sensing images differ from the optical images of natural scenes, which have a "person's-eye" view. This shows that directly applying deep learning detectors from the computer vision field to SAR ship detection is not feasible. Scholars should design networks according to the characteristics of SAR so as to achieve purposeful ship detection rather than generic object detection [123].

Usage Standards
So far, 75 public reports have used SSDD to study DL-based SAR ship detection, but there are no unified standards for the way they use it, because the initial open report of SSDD did not provide any. This situation hinders fair methodological comparison and effective academic exchanges. Therefore, we explicitly formulate some strict usage standards, i.e., (1) the training-test division determination in Section 6.1, (2) the inshore-offshore protocol in Section 6.2, (3) the reasonable ship-size definition in Section 6.3, (4) the determination of the densely distributed small ship samples in Section 6.4, and (5) the determination of the ship samples densely parallel-berthed at ports in Section 6.5.

Training-Test Division Determination
The original report of SSDD [28] adopted a random 7:1:2 ratio to divide the dataset into a training set, a validation set, and a test set. However, this random division mechanism leads to great uncertainty in the test samples: training and testing the same detector multiple times under different random divisions yields different accuracy results. This is because the number of samples in SSDD is small, only 1160; in this case, a random partition may destroy the distribution consistency between the training set and the test set.
Later, some scholars also adopted other ratios for training, validation, and testing, e.g., 7:2:1 in the work of Chen et al. [62], about 5:1 in the work of Yu et al. [97], 7:3 in the work of Wu et al. [99], 8:2 in the work of Chen et al. [90], and so on. Obviously, these diverse dataset division mechanisms will lead to unfair methodological comparison, which is not conducive to academic exchanges. This problem was also revealed by Zhang et al. [57,73] and Chen et al. [90].
In fact, in the field of computer vision, the two well-known object detection datasets, PASCAL VOC [113] and COCO [114], both provide uniquely determined training, validation, and test sets, which also ensures the fairness of their competitions. Inspired by this practice, the publisher of the AIR-SARShip-1.0 dataset [104] provided unique training and test set files. Therefore, inspired by these works, we make strict regulations on the division of the training and test sets of SSDD, as in Table 6. From Table 6, the images whose file numbers end in 1 or 9 are uniquely determined as the test set, and the rest are regarded as the training set. Such a rule also maintains the distribution consistency of the training and test sets, which is conducive to network feature learning. More information on distribution consistency can be found in the work of Han et al. [66][67][68]. Moreover, the officially released SSDD does not provide a unique validation set; scholars can extract some images from the training set to form a validation set according to their own needs. We only care about fair accuracy comparison when the test set is exactly the same.
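The Table 6 rule (file numbers ending in 1 or 9 go to the test set) is deterministic and easy to implement; `split_ssdd` is a hypothetical helper operating on image file names:

```python
def split_ssdd(names):
    """Apply the Table 6 rule to a list of SSDD image file names:
    images whose file number ends in 1 or 9 form the test set,
    and the remaining images form the training set."""
    test = [n for n in names if n.split(".")[0][-1] in "19"]
    train = [n for n in names if n.split(".")[0][-1] not in "19"]
    return train, test
```

Because two of the ten possible last digits land in the test set, this reproduces the 8:2 training-test ratio (928 vs. 232 samples) regardless of who runs the split.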
Finally, according to our experience, we suggest that scholars not set up a validation set, because it sacrifices the learning gain of the network. In short, the number of samples in SSDD is very small, so we should cherish each sample and let every one contribute to the gradient descent of training. Samples held out for validation do not participate in gradient descent, which inevitably leads to insufficient ship feature learning. Of course, if researchers want to monitor whether the model overfits during training, they can use cross-validation with multiple overlapping validation folds.

Inshore-Offshore Protocol
Many previous reports focused on inshore ship detection, e.g., Wei et al. [51], Su et al. [52], Yang et al. [59], Zhang et al. [73], and so on. Ships landing on the shore are easily interfered with by port facilities, and the land backgrounds in the image are more complex. Nowadays, inshore SAR ship detection is a research hotspot. In order to ensure the fairness of accuracy in these two different scenarios, we also uniquely determined the inshore and offshore files tested. Inshore images are marked in magenta in Figure A1.
According to statistics, among the 232 test images, there are 186 offshore scene images and only 46 inshore scene images. The proportion between offshore and inshore scenes is shown in Figure 14. Similar to HRSID [63] and LS-SSDD-v1.0 [105], we regard images containing land as inshore samples and the others as offshore samples. From Figure 14, the numbers of inshore and offshore samples are hugely imbalanced (19.8% vs. 80.2%). This phenomenon accords with the fact that the ocean area of the Earth is much larger than the land area. However, DL needs a lot of data to learn features: more data often brings better learning benefits, while less data is bound to cause inadequate learning. Thus, the sample-number imbalance between offshore and inshore scenes brings a huge imbalance in the model's learned representation capacity between the two. Networks become trapped in the many easy offshore samples: the detection performance on inshore ships becomes poor due to fewer training samples, while that on offshore ships becomes excellent due to more samples.
In the future, scholars should pay special attention to the above problem when designing detectors. Several reports, e.g., the balance scene learning mechanism of Zhang et al. [73] and the visual attention mechanism of Chen et al. [46], can provide valuable suggestions. We hold the view that one can design a classifier for scene recognition and then carry out selective-scene data augmentation to achieve balanced scene learning. Moreover, an interesting report from Chen et al. [90] proposed mix-up, which stitches multiple rotated ships into one image, and mosaic, which combines four original images into one; both can improve detection performance in inshore scenes. This work is inspired by YOLOv4 [124]. The method is rather useful and can also prevent network training from falling into a large number of useless pure-background negative samples, i.e., images without ships [105].

Ship-Size Definition
Multi-scale ship detection is challenging because different types of ships have different sizes, and the same ship imaged at different resolutions yields a different total number of pixels in the image. However, so far, there is still no clear definition of which ships in SAR images are small and which are large. Some scholars regard ships smaller than 40 pixels as small ships, but they do not take the actual image resolution into account. Moreover, determining the ship-size definition simply by the number of pixels is not consistent with the consensus in the computer vision community.
In the SAR ship detection community, Wei et al. [51], Su et al. [52], and Mao et al. [54] followed the standard of the COCO dataset to classify ship sizes, i.e., a BBox area < 32² means a small ship, a BBox area between 32² and 96² means a medium ship, and a BBox area > 96² means a large ship. However, this definition is tailored to the COCO dataset, and it may be problematic to use it on SSDD: it does not match the BBox area distribution very well, as shown in Figures 11c, 12c and 13a. Therefore, it is better to specify the ship-size definition according to the SSDD dataset itself, and also according to the different label types.
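The COCO-style size rule cited above can be written as a simple bucket function; to follow our SSDD-specific standard instead, one would only replace the two thresholds with the Table 7 values:

```python
def coco_size_bucket(area):
    """COCO-style ship-size rule used in [51,52,54]: a ship is small if
    its BBox area is below 32**2 pixels, medium below 96**2 pixels, and
    large otherwise. Substitute the SSDD-specific Table 7 thresholds as
    needed for each label type."""
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"
```

Under this rule, the average ~35 × 35-pixel SSDD ship (area ≈ 1225) already falls into the "medium" bucket, which illustrates why COCO's thresholds fit SSDD poorly.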
Finally, according to Figure 15, we define the ship-size standard in Table 7. Here, according to the statistical histograms of the label areas, we respectively define the large, medium, and small ship categories.
Densely Distributed Small Ship Samples
Moreover, for the densely distributed small ship samples, we provide several potential solutions:
1. One can use random crop data augmentation to increase the proportion of small ships in the whole image.
2. One can detect small ships in the shallow layers of the deep network, where the feature loss is low.
3. One can combine CFAR with the deep network, because CFAR is more pixel-sensitive.
4. One can combine visual saliency theory to generate a saliency map that guides the deep network's feature learning, because these small ships are very salient to human-eye observation.
5. One can design a deep network to super-resolution-reconstruct small ships; in this way, the features of small ships become richer.

Densely Parallel Berthing at Ports Ship Samples
Ships densely parallel-berthed at ports are also rather difficult to detect. On the one hand, the very complex land background reduces the training efficiency, because a large number of negative samples are generated during training. On the other hand, ships moored side by side produce hull-overlap effects because of SAR's special imaging mechanism and limited resolution. In order to facilitate accuracy evaluation in this specific scenario, we specify the test-set samples of ships parallel-berthed at ports, as in Figure 17. Moreover, for these kinds of difficult samples, we provide several potential solutions:

1. One can use the attention mechanism to suppress land interference so as to focus on the ship regions.
2. One can use generative adversarial networks (GANs) [125] to generate more samples of such scenes so as to increase the learning proportion of these ships, e.g., the work of Jiang et al. [95].
3. One can use the soft-NMS post-processing algorithm [126] to avoid missed detections, e.g., the reports of Wei et al. [51] and Zhang et al. [101].
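Gaussian soft-NMS [126], which decays the scores of overlapping boxes instead of deleting them, can be sketched as follows; this is a plain re-implementation of the published algorithm for illustration, not code from [51] or [101]:

```python
import math

def iou(a, b):
    """IoU of two horizontal boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian soft-NMS: decay the score of each remaining box by
    exp(-IoU^2 / sigma) w.r.t. the currently selected box, instead of
    deleting it outright as hard NMS does. Returns the kept indices."""
    s = list(scores)
    candidates = list(range(len(boxes)))
    keep = []
    while candidates:
        best = max(candidates, key=lambda i: s[i])   # highest remaining score
        keep.append(best)
        candidates.remove(best)
        for i in candidates:
            s[i] *= math.exp(-iou(boxes[best], boxes[i]) ** 2 / sigma)
        candidates = [i for i in candidates if s[i] > score_thresh]
    return keep
```

For two parallel-berthed ships whose boxes strongly overlap, hard NMS would delete the lower-scored detection outright, whereas soft-NMS only lowers its score, so both ships can survive the post-processing.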

Conclusions
This article reviews the current usage status of SSDD, the first open dataset in the SAR ship detection community. We release the official version of SSDD, covering three types: BBox-SSDD, RBox-SSDD, and PSeg-SSDD, and make a detailed analysis of the three datasets as applied to their different tasks. We comprehensively summarize the differences between ship detection in the SAR remote sensing community and general object detection in the computer vision community, which will help future scholars design more purposeful detectors that account for the characteristics of SAR. Furthermore, we explicitly formulate strict usage standards for the sake of fair methodological comparisons and effective academic exchanges, including (1) the training-test division determination, (2) the inshore-offshore protocol, (3) the reasonable ship-size definition, (4) the determination of the densely distributed small ship samples, and (5) the determination of the ship samples densely parallel-berthed at ports. We also put forward many valuable suggestions for improving the detection accuracy of difficult samples. We expect that this review will be useful for scholars studying DL-based SAR ship detection, and we hope it can also serve as a careful and useful introductory tutorial for beginners preparing to study the field.
Finally, we will develop an online evaluation system for benchmarks on SSDD. Researchers in this field will be able to submit their results for a completely fair evaluation.