TISD: A Three Bands Thermal Infrared Dataset for All Day Ship Detection in Spaceborne Imagery

Abstract: The development of infrared remote sensing technology improves the ability of night target observation, and thermal imaging systems (TIS) play a key role in the military field. Ship detection using thermal infrared (TI) remote sensing images (RSIs) has aroused great interest for fishery supervision, port management, and maritime safety. However, due to the high secrecy level of infrared data, thermal infrared ship datasets are lacking. In this paper, a new three-bands thermal infrared ship dataset (TISD) is proposed to evaluate all-day ship target detection algorithms. All images are from SDGSAT-1 satellite TIS three-bands RSIs of the real world. Based on the TISD, we use the state-of-the-art algorithm as a baseline to do the following. (1) Common ship detection methods and existing ship datasets from synthetic aperture radar, visible, and infrared images are briefly summarized. (2) The standard deviation of single bands, the correlation coefficient of combined bands, and the optimum index factor features of the three-bands datasets are analyzed. Combined with this theoretical analysis, the influence of the band information input on the detection accuracy of a neural network model is explored. (3) We construct a lightweight network based on Yolov5 to reduce the number of floating-point operations, which helps reduce the inference time. (4) By utilizing up-sampling and registration pre-processing methods, TI images are fused with glimmer RSIs to verify the detection accuracy at night. In practice, the proposed datasets are expected to promote the research and application of all-day ship detection.


Introduction
Ship detection is of great value in marine traffic management, navigation safety supervision, fishery management, ship rescue, ocean monitoring, and other civil fields. Timely acquisition of ship location, size, heading, and speed information is of great significance to ensure maritime safety. Due to the complexity of the ocean environment, high labor cost, experience dependence, and unreliable manual observation, automatic ship detection using remote sensing images (RSIs) has attracted more and more interest.
At present, remote sensing satellite images mainly include visible, infrared, and synthetic aperture radar (SAR). With the limited number of SAR satellites and long revisiting period, the applications based on SAR images cannot achieve real-time ship monitoring. Due to the great variation of weather and wind speed, there is high non-uniformity of sea surface clutter in SAR images [1], which is not conducive to ship detection based on SAR images. Ship monitoring based on spaceborne optical images works well, except when there is heavy cloud cover and light restriction. Infrared imaging systems can record the radiation, reflection, and scattering information of the object to overcome some of the negative effects of thin clouds, mist, and dark light. Therefore, target detection based on thermal infrared remote sensing images has become one of the important means of all-day Earth observation.
Ship detection comprises hull and wake detection [2]. However, ship wakes do not always exist; therefore, hull detection is more widely used. In recent years, researchers have proposed a variety of ship detection algorithms based on RSIs. In general, ship target features are extracted by traditional or intelligent methods. Computer-assisted ship detection methods typically involve feature map extraction and automatic location by classifiers, thereby freeing human resources. Traditional detection methods extract middle- or low-level features containing the color, texture, and shape of targets. Intensity distribution differences between ships and waters are helpful to distinguish ship candidates from sea, but the effectiveness varies across different sea types and states. Since the sea surface is more uniform than the target, Yang et al. [3] defined intensive metrics to distinguish anomalies from relatively similar backgrounds. Zhu et al. [4] first segmented images to obtain simple shapes, and then extracted shape and texture features from ship candidates. Finally, three classification strategies were used to classify ship candidates. In calm seas, the results of the above method are stable. However, the algorithm based on low-level features has poor robustness when waves, clouds, rain, fog, or reflections occur. In addition, manual feature selection is time consuming and strongly depends on the expertise of the user.
Consequently, later research has focused on how to extract and incorporate more ship features to detect ships more accurately and quickly. In recent years, convolutional neural networks (CNNs) have made many breakthroughs. Through a series of convolutional and pooling layers, more distinguishable features can be extracted by a CNN. However, the accuracy of data-driven CNN detection methods largely depends on large-scale and high-quality training datasets. Driven by CNNs, intelligent methods based on advanced features are mainly divided into two categories. The two-stage algorithms first utilize the region proposal network to select the approximate object regions, and then the target detection network classifies the candidate regions to obtain more accurate boundaries. Two-stage models mainly include R-CNN [5], Fast R-CNN [6], and Faster R-CNN [7]. The one-stage detection methods include SSD [8], RFBNet [9], YOLOv1 [10], YOLO9000 [11], and YOLOv3 [12]. One-stage algorithms omit the region proposal process and directly regress the bounding box and assign the relevant class probability.
The accuracy of the supervised algorithm is closely related to the quality of the datasets. Although various public datasets such as ImageNet [13], PASCAL VOC [14], COCO [15], and DOTA [16] can be used to identify multiple general targets, they are not specifically meant for ship detection. Some large remote sensing target datasets, such as FAIR1M [17], include geographical information containing latitude, longitude, and resolution attributes to provide abundant fine-grained classification information. Qi et al. [18] designed the MLRSNet dataset for multi-label scene classification and image retrieval visual recognition tasks. Zhou et al. proposed a large-scale Patternnet-Google Maps/API [19] dataset which is suitable for deep learning-based image retrieval methods. The open large datasets have greatly accelerated the development of target detection. However, public datasets specific to maritime vessel detection are still not available.
To sum up, there are three main challenges for space-based thermal infrared all-sky automatic ship detection research: (1) Due to the high security level of infrared data, training datasets of thermal infrared remote sensing images for ship detection are scarce. (2) During heat source imaging, the target and boundary may be too indistinct to distinguish, which may lead to false alarms or missed detection. (3) Due to the lack of a clear connection between network parameters and approximate mathematical functions, the interpretability of CNN is poor. The neural network can find as many ships as possible and predict the accurate target position, but it is not known which input information is useful.
In this paper, we label a new three-bands thermal infrared ship dataset (TISD) to solve the above challenges. All images are from the SDGSAT-1 thermal imaging system (TIS) real remote sensing images. SDGSAT-1 is designed for a sun-synchronous orbit at an altitude of 1.

1. To the best of our knowledge, we are the first to annotate a three-bands thermal infrared ship dataset. All images are from the SDGSAT-1 TIS three-bands real remote sensing images. To enrich the proposed datasets, the selected images contain features of different target sizes and illumination levels in a variety of complex environments. TISD Website: https://pan.baidu.com/s/1a9_iT-pdaSZ-hkBYU2Qciw?pwd=fgcq (accessed on 14 October 2022).

2. Due to the lack of a clear connection between network parameters and approximate mathematical functions, it is not known which input information is useful. In order to explore the relationship between input information and detection accuracy, the optimum index factor (OIF), which is related to the key information and redundancy between different band images, is used to evaluate the useful features in our dataset.

3. Based on TISD, we used the state-of-the-art detector that we proposed before, namely, the improved Yolov5s [20], as the baseline to train different models by utilizing different spectral band datasets. Combined with the above theoretical analysis, the influence of combined bands on detection accuracy is explored.

4. The difficulties of the existing ship detection methods based on datasets are summarized. By using up-sampling and registration pre-processing methods, glimmer images are combined with thermal infrared remote sensing images to verify the all-day ship detection capability.
The organizational structure of this study is as follows. In Section 2, the related work on publicly available ship datasets is elaborated in detail. In Section 3, the description of the datasets is outlined. In Section 4, we present experiments and discuss the experimental results to validate the effectiveness of the proposed research. Finally, in Section 5, we summarize the content of this study.

Related Works
At present, RSIs from radar, optical, reflective infrared, and thermal infrared are mainly used for ship detection, as shown in Figure 1. As an active microwave sensor, SAR can obtain high-resolution data under various weather conditions, and has been widely used in ocean surveillance [21,22]. With the development of deep learning and imaging technology, many automatic target algorithms for RSIs have been proposed to detect different targets. To capture the features of ships with large aspect ratios, Zhao et al. [23] proposed a new attentional reception pyramid network, which has asymmetric kernel sizes and various dilation rates. Due to the different local clutter and low signal-to-clutter ratio existing in SAR images, Wang et al. [24] used a variance-weighted information entropy method to measure the local difference between a target and its neighborhood. Then, an optimal window selection mechanism based on multi-scale local contrast measures is utilized to enhance the target against the complex background. Considering the difference in gray distribution and shape between ship and clutter, Ai et al. [25] modeled the ship target's gray correlation and the joint gray intensity distribution of a strong clutter pixel and its adjacent pixel with a two-dimensional joint lognormal distribution, which greatly reduced the false positives caused by speckle and local background non-uniformity. Gong et al. [26] presented a novel neighborhood-based ratio operator to produce a difference image for change detection in SAR images. Zhang et al. [27] proposed an unsupervised change detection method using saliency extraction; however, this method is not suitable for object detection in a single-frame image. Song et al. [28] generated proven robust training datasets by using synthetic SAR images and automatic identification system data; however, the acquisition of the above data requires the establishment of ground base stations, which is limited by region and lacks real-time capability. Rostami et al. [29] proposed a new semi-supervised domain adaptive algorithm based on existing optical image labels to transfer features learned from optical images to SAR. To be more intuitive, the existing general multi-target detection datasets with ship targets and the proposed TISD are summarized, as shown in Table 1.

The resolution of SDGSAT-1 TIS is 30 m; the images of TISD are up-sampled to 10 m.
As the only available fine-grained ship dataset, HRSC2016 [35] has been used as a baseline in many studies. Using the public ship dataset HRSC2016, Wang et al. [39] validated an improved encoder-decoder structure which added a batch normalization (BN) processing layer to speed up model training and introduced extended convolution at different rates to fuse features of different scales. However, some subcategories of HRSC2016 contain no more than ten ship instances, and some small ships are neglected during marking. Given the lack of diversity in public datasets, Cui et al. [36] established HPDM-OSOD and proposed a novel anchor-free rotating ship detection framework, SKNet. The ship target center key points and shape dimensions, including width, height, and rotation angle, are utilized during modeling to avoid many predefined anchors in the rotating ship detector. To address the limitation of fine-grained datasets, Han et al. [37] established a new twenty-class three-level directional ship recognition dataset (DOSR). Li et al. [40] combined the classic saliency estimation algorithm and deep CNN object detection to ensure the extraction of large ships from multi-scale ships in high-resolution RSIs. Yao et al. [41] used a region proposal regression algorithm to identify ships in panchromatic images, but the large number of network parameters led to long prediction times. Due to the large size of the remote sensing images, Zhang et al. [42] first utilized a support vector machine to classify the water and non-water areas. However, ships close to shore are difficult to classify by the preprocessing separation method.
As opposed to SAR or spaceborne optical images, ground-based visual images can achieve better accuracy and real-time processing for ship detection, which can be widely used in port management, cross-border ship detection, autonomous shipping, and safe navigation. Li et al. [43] introduced the attention module to the YOLOv3 network to achieve a good application for real-time ship detection in a real scenario. Shao et al. [44] used the Seaships dataset [38] to train a CNN to predict the approximate position of ships, and then used significant area detection and coastline information based on global contrast to correct the position of ships. For continuous video detection tasks, accuracy should be sacrificed to ensure real-time processing.
Due to the high secrecy of the infrared remote sensing data, the supply of images is very limited; therefore, it is difficult to collect many positive samples of ships. Transfer learning is helpful when the amount of data is insufficient. Wang et al. [45] used optical panchromatic images to assist limited infrared data during auxiliary training; however, there is a great difference between infrared and panchromatic images in imaging principle. Song et al. [46] collected dark light boat images from the infrared cameras on the ships. The datasets contain 3352 marked images of a variety of ship navigation states and interference scenarios. Li et al. [47] utilized MarDCT videos and images from fixed, mobile, and pan-tilt-zoom cameras [48], as well as the PETS2016 dataset [49] for visual performance evaluation. It must be noted that the above studies are not based on real spaceborne infrared remote sensing data [50,51], and infrared RSIs have irreplaceable value in the field of ship detection. Therefore, to make up for the lack of a spaceborne thermal infrared public dataset, we annotated a thermal infrared ship dataset in three bands based on SDGSAT-1 TIS images.

Datasets Analysis
In this paper, we propose a new dataset which consists of 2190 images of 768 × 768 pixels, covering day and night, and 12,774 targets selected from a 4.23-7.53 [20] aspect ratio. All images are from SDGSAT-1 TIS three-bands RSIs of the real world. Each image in our dataset is accurately annotated using labels and bounding boxes. There are three bands in the TISD: B1: 11.5~12.5 µm, B2: 8~10.5 µm, and B3: 10.3~11.3 µm. To enrich the TISD, the images are selected to cover a set of features including different sizes, lightings, and scenes. The following are detailed steps for labeling the datasets and for data analysis.

Movement Correction Based on Cross Correlation Method
Unlike general geometric alignment [52], the offset between different channels in TISD is purely horizontal and vertical. Therefore, a cross-correlation procedure is chosen to calculate the offset between channels. The cross-correlation function represents the degree of correlation between two random or deterministic signals at any two different times. It is assumed that image f2 is obtained by translation of f1, as shown in Equation (1), and then it is transformed according to the Fourier shift theorem in Equation (2). The Fourier transform is computed for the cross-correlation function, as in Equations (3) and (4). Finally, Equation (4) is transformed by the inverse Fourier transform to get Equation (5).
According to the cross-correlation function, the peak value of F(x, y) is at the origin and the peak value of R(x, y) is at (∆x, ∆y), which is the offset of f2(x, y). The horizontal displacement deviation of image blocks with rich texture can be calculated quickly and intuitively by the cross-correlation registration method. The offsets of images with 30 m and 10 m resolutions are calculated, and the difference between normalized offsets is negligible, as shown in Table 2. The horizontal and vertical offsets of two adjacent band images with a resolution of 30 m are 30 pixels and 2 pixels, respectively, as shown in Figure 2. Here, ∆x B1−B2 is the horizontal offset of bands B1 and B2, ∆x B2−B3 is the horizontal offset of bands B2 and B3, ∆y B1−B2 is the vertical offset of bands B1 and B2, and ∆y B2−B3 is the vertical offset of bands B2 and B3 with the resolution of 30 m or 10 m (m represents meters).
In this paper, the B1 channel is 11.5~12.5 µm, the B2 channel is 8~10.5 µm, and the B3 channel is 10.3~11.3 µm. After B2 is translated 30 pixels to the left and B3 is translated 60 pixels to the left, image fusion can be carried out after registration with B1, with the three bands corresponding to the R, G, and B channels of the fused images, as shown in Figure 2.
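The offset search described above can be illustrated with a short sketch. It is not the authors' program: it assumes NumPy, a purely translational (circular) shift, and two equally sized single-band arrays, and the function name `estimate_offset` is hypothetical. It relies on the standard cross-correlation theorem, in which the inverse transform of the conjugate spectrum of one image multiplied by the spectrum of the other peaks at the displacement.

```python
import numpy as np

def estimate_offset(ref, moved):
    """Return (dy, dx) such that moved is close to np.roll(ref, (dy, dx), axis=(0, 1))."""
    # Cross-correlation theorem: corr = IFFT(conj(FFT(ref)) * FFT(moved)).
    spectrum = np.conj(np.fft.fft2(ref)) * np.fft.fft2(moved)
    corr = np.fft.ifft2(spectrum).real
    # The location of the correlation peak is the (wrapped) displacement.
    peak_y, peak_x = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    # Peaks beyond the midpoint correspond to negative shifts.
    dy = peak_y - h if peak_y > h // 2 else peak_y
    dx = peak_x - w if peak_x > w // 2 else peak_x
    return int(dy), int(dx)

# Synthetic example: a random "band" shifted 2 pixels down and 30 pixels left.
rng = np.random.default_rng(0)
band_a = rng.random((64, 128))
band_b = np.roll(band_a, (2, -30), axis=(0, 1))
offset = estimate_offset(band_a, band_b)
```

On real band images the shift is not circular and the bands differ radiometrically, so in practice texture-rich blocks and some normalization would likely be needed before the peak is reliable.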


Labeling Process
The length of aircraft carriers or cruisers is about 200-350 m, which can account for 7-12 pixels in a 30 m resolution image. However, the standard length of common marine fishing vessels is generally less than 100 m, which takes up fewer pixels. To facilitate the annotation, the image should be preprocessed by up-sampling, mainly including nearest neighbor interpolation, bilinear interpolation, and bicubic interpolation. In order to avoid sacrificing image quality, we adopted the time-consuming bicubic interpolation method, shown in Equation (6), where S(x) is the interpolation kernel and f(M) is the interpolation calculation formula of pixel values of corresponding scaled matrix coordinate points, as shown in Equation (7), where A, B, and C are matrices and Im is the original gray matrix.
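To make the role of the interpolation kernel S(x) concrete, the following one-dimensional sketch uses the widely used cubic convolution kernel with a free parameter a. The value a = -0.5 and the function names are assumptions for illustration; the kernel coefficients in the paper's Equation (6) may differ. Bicubic interpolation of an image applies the same weights along rows and then along columns.

```python
import math

def cubic_kernel(x, a=-0.5):
    """Cubic convolution kernel; a = -0.5 is an assumed, commonly used setting."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def interp_1d(samples, x, a=-0.5):
    """Interpolate a 1-D signal at fractional position x from its 4 nearest samples."""
    base = math.floor(x)
    total = 0.0
    for k in range(base - 1, base + 3):
        idx = min(max(k, 0), len(samples) - 1)  # clamp indices at the borders
        total += samples[idx] * cubic_kernel(x - k, a)
    return total

# Up-sampling 30 m pixels to 10 m means evaluating at steps of 1/3 pixel.
row = [0.0, 1.0, 2.0, 3.0]
fine = [interp_1d(row, i / 3) for i in range(10)]
```

The kernel equals 1 at x = 0 and 0 at the other integer positions, so original pixel values are preserved at their own locations.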
Remote Sens. 2022, 14, 5297
After registration and up-sampling, the LabelImg software is used to annotate triple-channel pseudo-color patches with a resolution of 10 m. Specifically, in the PASCAL VOC XML annotation format, bndbox represents the four coordinate values of the upper left and lower right corners of the annotation box. Additionally, it should be noted that the coordinate origin is the upper left corner of the picture, as shown in Figure 3.
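As an illustration of the annotation layout, the sketch below reads the bndbox fields from a PASCAL VOC style XML string with Python's standard library. The sample file name, class label, and coordinate values are invented for the example and are not taken from TISD.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation; the values are illustrative only.
SAMPLE_XML = """<annotation>
  <filename>patch_0001.png</filename>
  <size><width>768</width><height>768</height><depth>3</depth></size>
  <object>
    <name>ship</name>
    <bndbox><xmin>120</xmin><ymin>300</ymin><xmax>155</xmax><ymax>318</ymax></bndbox>
  </object>
</annotation>"""

def read_boxes(xml_text):
    """Return a list of (label, xmin, ymin, xmax, ymax) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        # (xmin, ymin) is the upper left corner, (xmax, ymax) the lower right;
        # the origin is the upper left corner of the image.
        coords = tuple(int(bb.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.findtext("name"), *coords))
    return boxes

boxes = read_boxes(SAMPLE_XML)
```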

Statistical Analysis of Dataset
In the TISD, band B1 contains 545 images and 2927 ships, with an average of 5.37 ships per image. After statistical analysis of the target bounding boxes, the length of the anchor boxes is 9 to 87 pixels, namely 90 to 870 m in the image with 10 m resolution. The width of the anchor boxes ranges from 7 to 67 pixels, that is, 70 to 670 m in the image with 10 m resolution, as shown in Figure 4. The aspect ratio in the TISD is widely distributed, mainly from 0.3 to 3.5, as shown in Figure 5. When designing potential target areas, candidate boxes of different sizes and aspect ratios were considered. The minimum temperature difference between ship and sea in the TISD is 0.3226 K.
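The arithmetic behind these statistics is simple enough to sketch. The helper name `box_size_in_metres` and the example box are hypothetical; the 10 m pixel size and the image and ship counts come from the text above.

```python
def box_size_in_metres(xmin, ymin, xmax, ymax, metres_per_pixel=10.0):
    """Convert a pixel bounding box to ground size and aspect ratio."""
    width_px = xmax - xmin
    height_px = ymax - ymin
    return (width_px * metres_per_pixel,
            height_px * metres_per_pixel,
            width_px / height_px)

# Largest reported box extents: 87 x 67 pixels at 10 m resolution.
width_m, height_m, aspect = box_size_in_metres(100, 200, 187, 267)

# Average number of ships per image in band B1.
ships_per_image = 2927 / 545
```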


Dataset Feature Analysis
For the images with the same quantization level, there is a direct relationship between the standard deviation and the quantity of information. The standard deviation reflects the total dispersion between the gray value of each pixel and the mean of the image. To a certain extent, the larger the standard deviation, the greater the information content contained. The minimum, maximum, mean, and standard deviation of the three bands in our datasets are summarized in Table 3. The TISD contains day and night images of clouds, rivers, and sea scenes, as shown in Figure 6.

The correlation coefficient is related to the redundancy between different band images. If it approaches or is 0, there is no correlation between bands. The correlation coefficient between B1: 11.5-12.5 µm and B3: 10.3-11.3 µm is larger than that of B1: 11.5-12.5 µm and B2: 8-10.5 µm, indicating that the images fused from the B1 and B2 bands have less redundant information when they are input into the CNN, as shown in Table 4. Combining standard deviation and correlation coefficient, Chavez proposed the optimum index factor (OIF) in 1982, as shown in Equation (9), where S_i is the standard deviation of band i, R_ij is the correlation coefficient of the i and j channels, and n represents the combination of n bands. The larger the OIF, the greater the information content of the combined n-band image. The band combination corresponding to a larger OIF is the optimal scheme, as shown in Table 5.
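A minimal sketch of the OIF idea follows, using the commonly cited form: the sum of band standard deviations divided by the sum of absolute pairwise correlation coefficients. Equation (9) is not reproduced in this text, so its exact normalization may differ from this form, and the sketch is not expected to reproduce the values in Table 5. Bands are represented as flat lists of gray values, and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def std(values):
    """Population standard deviation of one band's gray values."""
    mean = sum(values) / len(values)
    return sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def corr(a, b):
    """Pearson correlation coefficient between two bands."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return cov / norm

def oif(bands):
    """Assumed form: sum of std devs / sum of |pairwise correlations|."""
    spread = sum(std(b) for b in bands)
    redundancy = sum(abs(corr(a, b)) for a, b in combinations(bands, 2))
    return spread / redundancy

# Toy bands: b2 is perfectly correlated with b1, b3 only weakly.
b1 = [1.0, 2.0, 3.0, 4.0]
b2 = [2.0, 4.0, 6.0, 8.0]
b3 = [4.0, 1.0, 3.0, 2.0]
```

In this toy case `oif([b1, b3])` exceeds `oif([b1, b1])`, since the two pairs have equal spread but the second is fully redundant.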

Experimental Analysis
In this section, evaluation criteria, the proposed network (including experimental details and architecture of our method), comparative experiments (containing quantitative and qualitative results), and the fusion of glimmer and thermal infrared results are described in detail.

Evaluation Criteria
By using the proposed TISD dataset, the advanced algorithm is utilized for evaluation to establish relevance to the dataset feature analysis and a baseline for future research in the field. Precision is the number of correctly detected positives divided by all positives found, as shown in Equation (10). Recall is the number of correctly detected positives divided by all the positives that should have been found, as in Equation (11). To be more comprehensive, the mean average precision (mAP) is the area under the curve with Precision and Recall as the two coordinate axes, as shown in Equation (12). Additionally, in mAP@0.5, the number after @ is the threshold of intersection over union (IOU). The missing alarm (MA) measures how many positive cases are missed, as shown in Equation (13). The false alarm (FA) measures the number of negative cases misjudged to be positive cases, as shown in Equation (14).
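The sketch below shows how IOU, Precision, and Recall are usually computed from box coordinates and detection counts. It follows the standard definitions rather than the paper's exact Equations (10) and (11), and the function names and example numbers are illustrative. MA and FA are left out because their exact normalization in Equations (13) and (14) is not recoverable from this text.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# With mAP@0.5, a detection counts as a true positive when IOU >= 0.5.
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
is_match = overlap >= 0.5
p, r = precision_recall(tp=8, fp=2, fn=2)
```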

The Proposed Network
The one-stage algorithms omit the region proposal step of the two-stage models and directly predict spatially separated bounding boxes and related class probabilities. To achieve real-time ship detection in this paper, an advanced one-stage target detection framework is chosen. To verify the reliability of the datasets and the feature analysis, the improved YOLO-based algorithm is used to train and generate corresponding models by utilizing the TISD datasets of different bands. The architecture of the proposed all-day ship detection method is shown in Figure 7. Our experiments run on a personal computer with the 64-bit Ubuntu 20.04.1 operating system. The software consists of Python, Torch 1.9.0, Conda 4.12.0, CUDA 11.
Our model is mainly divided into backbone, neck, and head. Based on our previous work [20], Dilated Conv is added in the backbone to extract ships of different sizes. In the neck, SElayer is added to extract more important feature maps. Additionally, the details of these modules, including Focus, Ghost Bottleneck, and CSP1_X, are shown at the bottom of Figure 7. To achieve real-time ship detection on the satellite, we further lighten the network by replacing ordinary convolution with depth-wise separable convolution in the head of the network. Compared with the state-of-the-art models, the number of floating-point operations (FLOPs) and parameters is greatly reduced in our model. GFLOPs is used to measure the complexity of an algorithm or model. The GFLOPs of our model is 8.2, far lower than that of Faster R-CNN [7] (46.7), SSD [8] (19.6), and Yolov5s [20] (17.1). Additionally, our model has 390 layers, 3,244,653 parameters, and 3,244,653 gradients. The memory size of the saved model is 6.5 MB. The number of parameters of our model is 3.2 M, compared with 31.3 M for Faster R-CNN [7], 138.0 M for SSD [8], and 7.3 M for Yolov5s [20]. The lower number of FLOPs required to process the same image on the same hardware allows for more images to be processed in the same amount of time. In general, the lower the number of network layers and parameters, the smaller the memory required to save the model, and the lower the hardware memory requirements. Therefore, compared with the mainstream methods, our model can be more easily deployed on the embedded platform.
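To see why replacing ordinary convolution with depth-wise separable convolution reduces parameters, the following sketch counts weights for a single layer. The channel counts and kernel size are illustrative assumptions, not the actual head configuration of the proposed network, and bias terms are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def dw_separable_params(c_in, c_out, k):
    """Depth-wise k x k convolution followed by a 1 x 1 point-wise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise

# Hypothetical layer: 128 input channels, 256 output channels, 3 x 3 kernel.
standard = conv_params(128, 256, 3)
separable = dw_separable_params(128, 256, 3)
ratio = separable / standard  # equals 1 / c_out + 1 / k**2
```

For a fixed output feature map size, the multiply-accumulate count scales with these weight counts, so the same ratio applies approximately to the layer's FLOPs.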

Comparative Experiments
Using the same test set, the Precision, Recall, and mAP@0.5 are compared. According to the dataset feature analysis in Section 3.3, the OIF of B12, B13, and B23 are 35.5952, 34.1856, and 35.7666, respectively. To a certain extent, the OIF is related to the available information, that is, the data of band combination B23 contain more information than those of B12 and B13.
After training, the Precision and Recall curve (PR curve) of the B23 model completely encloses the PR curves of the B12 and B13 models; therefore, it can be asserted that the performance of B23 is better than that of B12 and B13. The PR curves of B12 and B13 intersect, so their performances can be compared based on the area under the curve. As shown in Figure 8, the model obtained by B23 is significantly better than those of B12 and B13, which is consistent with the analysis results of OIF.
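The comparison rule used here (one PR curve enclosing another, otherwise falling back to the area under the curve) can be sketched as follows. The curves are assumed to be sampled at the same recall values, the precision numbers are invented for illustration, and the function names are hypothetical.

```python
def dominates(prec_a, prec_b):
    """True if curve A is at least as precise as curve B at every recall level."""
    return all(pa >= pb for pa, pb in zip(prec_a, prec_b))

def area_under_curve(recalls, precisions):
    """Trapezoidal area under a PR curve sampled at increasing recall values."""
    return sum((r1 - r0) * (p0 + p1) / 2
               for r0, r1, p0, p1 in zip(recalls, recalls[1:], precisions, precisions[1:]))

def better_model(recalls, prec_a, prec_b):
    """Return 'a' or 'b': enclosure decides first, area under the curve otherwise."""
    if dominates(prec_a, prec_b):
        return "a"
    if dominates(prec_b, prec_a):
        return "b"
    area_a = area_under_curve(recalls, prec_a)
    area_b = area_under_curve(recalls, prec_b)
    return "a" if area_a >= area_b else "b"

# Invented curves: curve_x encloses curve_y, while curve_z crosses curve_x.
recalls = [0.0, 0.5, 1.0]
curve_x = [1.0, 0.9, 0.6]
curve_y = [1.0, 0.8, 0.5]
curve_z = [1.0, 0.95, 0.3]
```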
Remote Sens. 2022, 14, x FOR PEER REVIEW 14 of 22 ware allows for more images to be processed in the same amount of time.In general, the lower the number of network layers and parameters, the smaller the memory required to save the model, and the lower the hardware memory requirements.Therefore, compared with the mainstream methods, our model can be more easily deployed on the embedded platform.

Comparative Experiments
Using the same test set, the Precision, Recall, and mAP@0.5 are compared.According to the datasets feature analysis in Section 3.3, the OIF of B12, B13, and B23 are 35.5952,34.1856, and 35.7666, respectively.To a certain extent, the OIF is related to the available information, that is, the data of band B23 contains more information than that of B12 and B13.
In the lower left corner of Figure 9, the comprehensive evaluation index mAP@0.5 over 200 epochs is selected for comparison. Moreover, the curves for epochs 160~190 are magnified, and the mean average precision of the combined band B23 is better than that of B12 and B13. In conclusion, there is a positive correlation between the band information content and the detection accuracy, and the trend of the theoretical OIF analysis is consistent with that of mAP@0.5.
The standard deviations of the B1, B2, and B3 images with the same quantization level are 33.7011, 36.0583, and 34.1300, respectively. Theoretically, the information content of B2 is higher than that of B1 and B3. Empirically, the more channels are input into the same CNN model, the higher the possibility of extracting richer features. As shown in Figure 9, among all single-band and combined-band inputs, the B23 dataset yields the highest mAP@0.5 for ship detection. In addition, the OIF of the combined band B23 is also the highest, which suggests that adding spectral channels is conducive to improving the target detection accuracy. Interestingly, the mAP@0.5 of B2 is slightly lower than that of B23 but almost equal to that of B12, which means that input spectral channels should not be added blindly when training CNN models.
In the binary classification experiment, if a candidate sample is predicted to be a ship target, it is classified as a ship; otherwise, it is classified as a non-ship. Single-band and combined-band data are used to train the optimal model for testing, and the evaluation criteria are shown in Table 6. The Precision and Recall of B23, B2, and B123 rank in the top three. By analyzing the standard deviation, correlation coefficients, and OIF of the dataset, the best combination of spectral channels can be selected for training, which is conducive to improving the target detection accuracy.
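The Precision and Recall used in this binary ship/non-ship setting follow the usual definitions. A minimal sketch is given below; the function names and the toy labels are hypothetical and are not taken from Table 6.

```python
def count_outcomes(predicted, actual):
    """Count true positives, false positives, and false negatives.

    `predicted` and `actual` are equally long sequences of booleans,
    where True means "ship" and False means "non-ship".
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    return tp, fp, fn


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical predictions for four candidate samples.
    predicted = [True, True, False, True]
    actual = [True, False, True, True]
    print(precision_recall(*count_outcomes(predicted, actual)))
```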
Evaluation on the TISD shows that the detection accuracy in cloudy images is lower than that in cloud-free images, which indicates that broken clouds are the main source of false alarms during ship detection, as shown in Table 7. Among the cloud, river, and sea scenes, the detection accuracy on the sea surface is the highest, up to 81.15%.
Based on the proposed datasets, a round-the-clock ship detection model can be obtained. The prediction results during day and night are shown in Figures 10 and 11, and the quantitative evaluation summary is shown in Table 8.

Fusion of Glimmer and Thermal Infrared
Glimmer sensing is an active application area in remote sensing; under cloudless conditions at night, a glimmer sensor can capture the visible light emitted from the surface. Most of the information at night is related to human activities, such as city lights and ship lights. Compared with daytime images, nighttime information can be directly captured by glimmer sensors to depict fine-grained human activities. Because the imaging technologies differ, the detection results of glimmer sensors can increase the reliability of thermal infrared detection results. As shown in Figure 12a, the positive ship detection results in the thermal infrared image are marked with yellow boxes, totaling 91. The ships observed in the glimmer data are marked with blue boxes in Figure 12b, totaling 42. In Figure 12c, the yellow boxes mark the ships observed in the thermal infrared image but not in the glimmer data, and the blue boxes mark the ships observed by both.
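One simple way to separate thermal detections into those confirmed by the glimmer data and those seen only in the thermal image is to match bounding boxes by intersection over union (IoU). The sketch below assumes that both images have already been registered to the same pixel grid and that a greedy one-to-one match with an IoU threshold of 0.3 is acceptable. The matching rule, the threshold, and all names are assumptions and are not described in the text above.

```python
def iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0


def split_by_confirmation(thermal_boxes, glimmer_boxes, threshold=0.3):
    """Greedily match each thermal box to at most one unused glimmer box."""
    confirmed, thermal_only = [], []
    unused = list(glimmer_boxes)
    for box in thermal_boxes:
        best = max(unused, key=lambda g: iou(box, g), default=None)
        if best is not None and iou(box, best) >= threshold:
            confirmed.append(box)      # observed by both sensors
            unused.remove(best)
        else:
            thermal_only.append(box)   # observed only in thermal infrared
    return confirmed, thermal_only


if __name__ == "__main__":
    # Hypothetical boxes in a shared pixel grid.
    thermal = [(0, 0, 10, 10), (50, 50, 60, 60)]
    glimmer = [(1, 1, 11, 11)]
    both, only_thermal = split_by_confirmation(thermal, glimmer)
    print(len(both), "confirmed,", len(only_thermal), "thermal only")
```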

Discussion
Ships are important military targets, so real-time ship detection throughout the day has great military significance. Many scholars have studied the effectiveness and generalization of models using public datasets. However, due to the lack of infrared images, few thermal infrared ship datasets are available. In this paper, a three-channel thermal infrared ship dataset is proposed, and a complete ship detection network model is designed based on a regression algorithm. Unlike visible remote sensing data, our dataset contains ships at night. Unlike simulated data, the proposed dataset uses real remote sensing images, which is more conducive to real-time target detection on the satellite. The TISD is based on three-channel thermal infrared images from the SDGSAT-1 thermal imaging system, whereas the Landsat-8 thermal infrared sensor has only two channels. The TISD has an additional band, namely 8~10.5 µm; therefore, the proposed dataset contains rich spectral information. The dataset feature analysis in Section 3.4 and the experimental results in Table 6 show that additional spectral information is conducive to target detection. Instead of a two-stage algorithm, our model is based on the one-stage Yolov5s, which is more conducive to fast prediction. In our model, dilated convolution can extract finer features for smaller ships, and the SElayer can select the more important features. As shown in Table 7, the accuracy of the proposed model is higher than that of other advanced models in sea scenes. In complex scenes, at the cost of a slight decrease in accuracy, the parameters and floating-point operations of our model are greatly reduced: its FLOPs are only 47.95% of those of the original Yolov5s. Thus, it is possible to detect ships in thermal infrared remote sensing images with the lightweight Yolov5s model.
However, the following aspects can be further studied. First, an object on land is misidentified as a ship, as shown by the yellow box in Figure 11a. Because the land surface is complex, future work can perform sea-land segmentation as a preprocessing step before detecting targets at sea. Second, in Table 7, the detection accuracy at nighttime is lower than during the daytime. A possible reason is that the temperature difference between the ship and the water is too small at night, resulting in weak contrast between the intensity of the target and that of the background. In Figure 13a3, during the day, the intensity of the ship is much higher than that of the water. As shown in Figure 13b3, at night, the intensity of the ship is slightly lower than that of the water, resulting in a low signal-to-background ratio, which is not conducive to target detection. Later work should enlarge the nighttime ship dataset and enhance the target signal. Third, because the imaging technologies differ, glimmer datasets are a good complement to thermal infrared images. Our future work will focus on expanding the glimmer datasets and labeling ship wakes to promote accurate ship detection at night.
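The day and night contrast described above can be quantified with simple statistics over target and background pixels. The text does not define the signal-to-background ratio, so the sketch below assumes two common forms: the ratio of mean target intensity to mean background intensity, and the absolute mean difference divided by the background standard deviation. The pixel values are hypothetical and only mimic the qualitative behavior seen in Figure 13.

```python
from statistics import mean, pstdev


def local_contrast(target_pixels, background_pixels):
    """Return (signal-to-background ratio, signal-to-clutter ratio).

    Both definitions are assumptions made for illustration:
      SBR = mean(target) / mean(background)
      SCR = |mean(target) - mean(background)| / std(background)
    """
    mu_t = mean(target_pixels)
    mu_b = mean(background_pixels)
    sigma_b = pstdev(background_pixels)
    sbr = mu_t / mu_b
    scr = abs(mu_t - mu_b) / sigma_b if sigma_b else float("inf")
    return sbr, scr


if __name__ == "__main__":
    water = [80, 82, 78, 80]        # hypothetical sea-surface intensities
    ship_day = [120, 125, 130]      # ship much brighter than water
    ship_night = [78, 79, 77]       # ship slightly darker than water
    print("day:  ", local_contrast(ship_day, water))
    print("night:", local_contrast(ship_night, water))
```

With these toy values the nighttime ratio is close to one and the clutter-normalized contrast is far smaller than during the day, which illustrates why the nighttime targets are harder to separate from the background.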

Conclusions
In this paper, the difficulties of existing ship detection datasets are summarized. Due to the high secrecy level of infrared data, thermal infrared ship datasets are lacking. Moreover, both detection accuracy and speed need to be considered for ship detection.

−
is the vertical offset of bands B1 and B2, and offset of bands B2 and B3 with the resolution of 30 m or 10 m. (m represents meters).

Figure 2. The intensity images of B1: 11.5~12.5 µm, B2: 8~10.5 µm, B3: 10.3~11.3 µm. The brightness temperature on the pupil (T/K) is marked in the color bar, and the three-bands pseudo-color image with the resolution of 30 m in the Yellow Sea of China from SDGSAT-1 TIS is shown in the bottom left. The digital number (DN) and the brightness temperature on the pupil (T/K) of the ship and sea surface are shown in the table on the right. The horizontal pixel offset of B1-B2 and B2-B3 is thirty pixels, and the vertical pixel offset of B1-B2 and B2-B3 is two pixels, as shown on the bottom right.

Figure 3. PASCAL VOC XML annotation for 768 × 768 three-bands pseudo-color images with the resolution of 10 m. (351, 243) and (388, 283) are, respectively, the coordinates of the top left and bottom right of the bounding boxes of the left ships.

Figure 4. Statistical results of ship target bounding box length and width in the TISD dataset.

Figure 5. Statistical results of (a) aspect ratio of ship target bounding box and (b) the brightness temperature on the pupil (T/K) between the ships and sea surface in 8~10.5 µm in the TISD.

Figure 6. Parts of day and night images in the cloud, river, and sea scenes of the TISD.

Figure 7. The architecture of the proposed all-day ship detection method (the details of the modules, including Focus, Ghost Bottleneck, and CSP1_X, are shown at the bottom).

Figure 10. The results of nighttime ship detection by using the TISD in (a) Shanghai Port and (b) sea near Pudong Airport (the red boxes show the correctly detected vessels, the yellow boxes show false alarms, and the blue boxes show missed detections).

Figure 11. The results of daytime ship detection by using the TISD in (a) Tianjin Port and (b) Partial Sea of Bohai (the red boxes show the correctly detected vessels, the yellow box shows the false alarm, and the blue boxes show missed detections).

Figure 12. Night images in Mumbai, India with (a) thermal infrared image of 11.5~12.5 µm, 8~10.5 µm, and 10.3~11.3 µm at night (the positive ship detection results are marked in yellow boxes), (b) glimmer image of R: 615~690 nm, G: 520~615 nm, and B: 430~520 nm at night (the observed ships are marked in blue boxes), (c) fusion image of 0.615~0.69 µm, 8~10.5 µm, and 10.3~11.3 µm with the resolution of 10 m (ships observed in the thermal infrared image but not in the glimmer data are marked in yellow boxes, and ships observed by both are marked in blue boxes), (d) an enlarged image of the green box in c.

Figure 13. (a1) The daytime image at Bohai port at 38°10′50.61″N, 118°4′39.27″E, (a2) an enlarged image of the green box in a1, (a3) the intensity distribution of a2, (b1) the image at night in the same area as a1, (b2) an enlarged image of the green box in b1, and (b3) the intensity distribution of b2 (the red boxes are the ships).

Table 1. The summary of ship datasets.

Table 2. The horizontal and vertical pixel offsets of two adjacent band images.

Table 3. The statistics of digital numbers in the TISD's images.

Table 4. The summary of correlation coefficients.

Table 5. The summary of dataset feature analysis.

Table 6. Detection evaluation criteria of single band and combined bands images in the TISD by several models (the bold data is the best result of each model).

Table 7. The evaluation criteria of different methods by using the TISD (the bold data are the best and second-best results of different models).

Table 8. Ship detection results for different scenarios and times.