Lightweight Small Ship Detection Algorithm Combined with Infrared Characteristic Analysis for Autonomous Navigation

Abstract: Merchant ships sometimes fail to detect small ships at night and in poor visibility, leading to urgent situations and even collisions. Infrared (IR) cameras have inherent advantages in small target detection and have become essential environmental awareness equipment on unmanned ships. Existing target detection models are complex and difficult to deploy on small devices, and lightweight detection algorithms are needed as the number of shipborne cameras increases. Therefore, herein, a lightweight model for small IR ship detection was selected as the research object. IR videos were collected in the Bohai Strait, the image sampling interval was calculated, and an IR dataset of small ships was constructed. Based on an analysis of the characteristics of the IR ship images, the gamma transform was used to preprocess the images, which increased the gray difference between the target and the background. The backbone of YOLOv5 was replaced with that of MobileNetV3 to improve computing efficiency. The results showed that the parameters of the proposed model were reduced by 83% compared with those of the YOLOv5m model, while the detection performance was almost the same.


Introduction
Collisions between large and small ships usually cause severe loss of life and marine pollution. The detection of small ships has thus become a crucial issue in navigation. Statistics show that ships smaller than 100 m have a greater collision probability than large ships [1] owing to their poor radar reflection. Small ships, such as pirate ships, are mostly made of wood or fiberglass and reflect radar waves poorly. Moreover, small ships, such as fishing boats, sometimes turn off their automatic identification system (AIS), which broadcasts information to surrounding ships and should always be on during sailing. Finally, the use of fog signals is unregulated on small ships: small ships are usually busy with their operations and do not sound the fog horn in poor visibility.
Small ships burn fuel during sailing, so their temperature is higher than that of the background. IR cameras convert this temperature difference into a grayscale difference, so small ships remain clear in IR images at night and in poor visibility. IR equipment therefore has considerable advantages for the detection of small ships and can compensate for the deficiencies of radar and AIS in this respect. As a result, IR cameras have become essential surveillance sensors for unmanned ships [2][3][4].
Recently, researchers have designed many high-precision object detection models, but these models are usually computationally intensive. The need for lightweight ship detection models grows stronger with the increasing number of onboard cameras. Detection accuracy is usually proportional to algorithm complexity, making it challenging to deploy high-precision algorithms on unmanned ships. Herein, the contradiction between algorithm accuracy and complexity is investigated, and a lightweight deep learning model for small ship detection in IR images is proposed.
Marine IR images are collected in the Bohai Strait, and the sampling interval of the IR images is analyzed. The coefficient of the gamma transform is then determined by analyzing the characteristics of small ships and their backgrounds. Finally, experiments confirm that the gamma transform can improve the detection of small ships.
The main contributions of this study are as follows:
1. The dataset built in this study is the first IR image dataset for small ship detection in complex scenarios.
2. The backbone of YOLOv5 is replaced by that of MobileNetV3 to make the detection model lightweight. The proposed model is conducive to autonomous navigation in practical scenarios.
3. This study analyzes the characteristics of IR ship images and proposes the use of the gamma transform to preprocess the images. Small IR ships are evident in the transformed images, assisting the algorithms in detecting small ships.

Maritime Video Surveillance
More attention has been paid to maritime video surveillance with the development of unmanned ships in recent years, and IR cameras have become essential equipment on unmanned ships [5][6][7]. The Maritime Unmanned Navigation through Intelligence in Networks (MUNIN) project combined IR cameras with visible cameras, radars, and AIS. The zero-emission unmanned ship Yara Birkeland, developed by KONGSBERG, was equipped with IR and visible cameras [8]. In addition, video surveillance is increasingly being employed on ordinary ships to assist crews in locating small ships during sailing.
Maritime video surveillance is more complicated than the recognition of vehicles and pedestrians. First, it is challenging to obtain enough data, particularly because deep learning requires numerous training samples. Compared with visible ship images [9], IR datasets are scarcer, limiting research on IR ship detection, recognition, and tracking; maritime video collection is also more time-consuming and expensive than land traffic video acquisition. Second, the size of the ships in the image changes drastically. The length of ships varies considerably, from over 400 m down to less than 10 m, and distance variations further change the scale of the ships in the image. Finally, IR images lack rich colors and textures, which prevents many neural network units from learning effective parameters, leading to poor performance.

Ship Detection Based on Traditional Methods
Traditional ship detection methods mainly focus on mining the unique attributes of ships. These techniques generally fall under background modeling and subtraction, human visual attention, and edge features. A Gaussian mixture model [10,11] and its modifications [12] are often used to model the sea surface, followed by background subtraction. Background modeling usually takes a certain amount of time and is suitable for target detection in fixed scenarios; background subtraction may fail when the ship is swinging. Mumtaz et al. [13] used a graph-based visual saliency algorithm to compute saliency maps of the input IR images and determined the salient regions by multilevel thresholding of the maps. Fefilatyev et al. [14] exploited gradient information for ship detection and tracking.
Sea-sky line (SSL) detection is also an important component of traditional ship detection methods. The methods for SSL detection can be broadly categorized into three types: transformation [15], region attributes [16], and semantic segmentation [17]. Determining the SSL can reduce the search area and the interference of waves and clouds [18]. However, the SSL in visible images cannot be detected in poor visibility.

Ship Detection Based on Deep Learning
Recently, object detection based on deep learning has attracted considerable research attention and achieved remarkable performance. Unlike object detection based on manually extracted features, deep learning can automatically optimize parameters based on big data and computing power. Deep learning object detection algorithms are categorized into one-stage and two-stage methods. One-stage detectors directly predict the class probability and coordinate offset of the object; representative one-stage algorithms include you only look once (YOLO) [19,20], the single shot multibox detector (SSD) [21], and RetinaNet [22]. Two-stage methods instead generate a set of proposed regions and take these regions as input to detect and classify objects. Regions with convolutional neural network features (R-CNN) [23] was the first two-stage detection algorithm; Fast R-CNN [24], Faster R-CNN [25], and the feature pyramid network (FPN) [26] are improved R-CNN algorithms.
Both one-stage and two-stage detection methods are used in marine target detection. Zhou et al. [27] improved YOLOv5s by expanding the receptive field and optimizing the loss function, increasing ship detection mAP to 98.6%. Ma et al. [28] built a low-illumination ship dataset containing 1258 images and used RetinaNet to detect ships in dim light. Shao et al. [9] constructed a large visible image dataset containing six types of ships and tested Fast R-CNN, Faster R-CNN, SSD, and YOLOv2 on it. To solve the problem of insufficient samples of small ships, Chen et al. [29] used a generative adversarial network to generate training samples and YOLOv2 to detect small ships in visible images. However, the ships in these studies are relatively large in the images; therefore, their detection is not considerably difficult. Li et al. [30] used DenseNet and spatially separable convolution to replace the backbone and standard convolutions of YOLOv3; the modified network was lightweight and exhibited better ship detection performance on visible images than the original YOLOv3.
Recently, ship detection methods based on SAR and IR images have become increasingly popular. In complex-scenario SAR images, Chen et al. [31] combined the region proposal network with the FPN to detect small ships. To make the detection network lightweight, Xiong et al. [32] used YOLOv5n as the baseline, added an attention mechanism, and simplified the spatial pyramid pooling structure to detect ships in SAR images. In terms of IR ship detection, Wang et al. [33] generated an enhancement map to suppress the background and segment maritime targets from IR images. To eliminate the interference of sun-glint clutter, Li et al. [34] designed a high-dimensional feature based on time fluctuation and spatial structure to distinguish ships from clutter. Wang et al. [18] analyzed the spectral bands commonly used in ship detection and pointed out that IR images perform better than visible images in harsh maritime environments. Farahnakian et al. [35] investigated the influence of different fusion architectures on ship detection. Chang et al. [36] detected ships on self-built datasets by modifying the network structure of YOLOv3; however, the IR images in their study do not conform to the characteristics of thermal imaging and may have been captured in low light.

Image Collection
On 28 October 2022, FLIR M617CS, a multispectral marine camera with gyro stabilization, was installed on the Sinorail Bohai Train Ferry to collect maritime videos.The ship sailed from Lushun Port at 10:30 am and arrived at Yantai Port at 5:00 pm.The ship crossed the Bohai Strait from north to south and passed through the Laotieshan Channel, one of the busiest waterways in China.The collected videos covered typical navigation scenarios, such as docking, departure, and sailing (Figure 1).

Image Sampling
Ships usually keep a vast distance from obstacles (e.g., other ships, islands, and buoys) to avoid collisions. In addition, compared with land traffic, the speed of ships is relatively slow; therefore, the maritime image changes slowly. Almost no difference is observed between images at an interval of 1 s, as shown in Figure 2. In the image with an interval of 10 s, the nearby ship exhibits a substantial azimuth change, but the distant mountain changes only marginally. The original data are in video format; for the convenience of follow-up research, an 8-bit PNG image sequence is extracted from the videos, and the choice of time interval is an important issue. There is a contradiction between the sampling interval and image diversity. When the sampling interval is small, the similarity between images is high, and using these images may lead to overfitting during neural network training. Increasing the sampling interval produces more diverse images but reduces their number. It is therefore necessary to derive the sampling interval theoretically.
The acquisition ship sails along route AB, as shown in Figure 3. When the own ship is at point A, the distance between the own ship and the target is D_A. When the ship sails to point B, the azimuth of the target has changed by Δθ.
By the sine rule in the triangle formed by A, B, and the target, the distance from point A to point B is

AB = D_A · sin(Δθ) / sin(α + Δθ),  (1)

where α is the relative bearing of the target at A. Assuming the speed of the ship is V, the travel time from point A to point B can be obtained:

t = AB / V = D_A · sin(Δθ) / (V · sin(α + Δθ)).  (2)

As shown in Equation (2), the travel time is related to the target distance, sailing speed, heading, and minimum azimuth change. In this study, the speed of the ship is ~13 knots. Assuming that the distance of the acquisition target is 1 nautical mile and the minimum change angle of the target in the image is 2°, the minimum sampling interval (taking the worst case sin(α + Δθ) ≈ 1, i.e., a target near abeam) is approximately 9.67 s. In addition, ship swaying and platform rotation may cause severely blurred images; therefore, such frames are eliminated in the preprocessing stage.
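The interval calculation above can be sketched in code. This is a minimal illustration, not code from the paper; the function name is invented, and the worst-case assumption that the sine term in Equation (2) equals 1 is made explicit.

```python
import math

def min_sampling_interval(distance_nmi: float, speed_kn: float,
                          delta_azimuth_deg: float) -> float:
    """Minimum sampling interval (s) from Equation (2), taking the worst
    case sin(alpha + delta) = 1, so t = D * sin(delta) / V."""
    hours = distance_nmi / speed_kn  # time to cover the full distance, in hours
    return hours * 3600.0 * math.sin(math.radians(delta_azimuth_deg))

# 1 nautical mile to the target, 13 knots, 2 degree minimum azimuth change
t = min_sampling_interval(1.0, 13.0, 2.0)
print(round(t, 2))  # ~9.66 s with these rounded inputs, matching the ~9.67 s in the text
```

With the stated parameters, the sketch reproduces the paper's minimum sampling interval of roughly 10 s.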


Annotation
In this study, LabelMe [37,38] software was used to annotate ships with rectangular boxes, and the annotations were converted to the PASCAL VOC [39] format. Because small ships occupy only dozens of pixels in thermal images, it is impossible to distinguish ship types; therefore, all ships were labeled "ship." After selection and annotation, 390 images containing 668 ships were retained in the dataset. Figure 4 shows the size of the ships in the dataset. According to the definition of target size in MS COCO [40], most ships in the dataset are small. The background becomes complex in narrow waters and harbors. The brightness of mountains and buildings in the IR images is high, affecting the detection of ships in these areas, as shown in Figure 5b,c.
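The MS COCO size convention referenced above can be stated concretely. The helper below is an illustrative sketch (the function name is not from the paper); the thresholds of 32² and 96² pixels of box area are the standard COCO small/medium/large cutoffs.

```python
def coco_size_category(width_px: float, height_px: float) -> str:
    """Classify a bounding box by area using the MS COCO convention:
    small < 32^2 px, medium < 96^2 px, large otherwise."""
    area = width_px * height_px
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"

# a ship covering only a few dozen pixels falls in the "small" category
print(coco_size_category(20, 15))  # "small"
```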
The grayscale values of the IR images were counted to obtain the grayscale probability map: the number of pixels with each gray value is divided by the total number of pixels in the image, as shown in Figure 6a. As can be seen, the grayscale distribution of the maritime IR image lies between 30 and 250 and approximately follows a normal distribution. The overall grayscale of the image is low because the sky and sea occupy most of the image.
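The probability map described above is simply a normalized histogram. A minimal sketch with NumPy (function name and toy image are illustrative):

```python
import numpy as np

def gray_probability(image: np.ndarray) -> np.ndarray:
    """Normalized grayscale histogram of an 8-bit image: for each of the
    256 gray levels, the pixel count divided by the total pixel count."""
    counts = np.bincount(image.ravel(), minlength=256)
    return counts / image.size

# toy 8-bit "image": mostly dark sea/sky with one bright target pixel
img = np.full((4, 4), 40, dtype=np.uint8)
img[0, 0] = 250
p = gray_probability(img)
print(p[40], p[250])  # 0.9375 0.0625
```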

Analysis of IR Ship Images
The statistics of the ship regions were calculated according to the target coordinates in the annotations (Figure 6b). The brightness of the ship areas is higher than that of the background: ships emit considerable heat during sailing, resulting in high target brightness in the IR image. The grayscale of 90% of the target pixels is >150, and the maximum probability lies between 250 and 255. A few pixels in the target areas are low in brightness because the annotation boxes contain some background.

Principle
As shown in Figure 6, a large gap is observed between the ships and the background in the grayscale probability distribution. To increase the grayscale difference between foreground and background, in this study, the gamma transform is used to preprocess the IR image. The gamma transform enhances the image through a nonlinear transformation and corrects substantially bright or dark images:

s = c · r^γ,  (3)

where r represents the input value, s represents the output value, c represents a constant, and γ (gamma) represents a coefficient. To ensure that the output value is within a reasonable range, the input image needs to be normalized to [0, 1]:

r = I / 255,  (4)

where I is the gray value of the 8-bit input image. Figure 7 shows the gamma transform with the input and output on the horizontal and vertical axes, respectively. When gamma is <1, the overall brightness of the transformed image is improved; the brightness and contrast of the dark areas are considerably improved, which is conducive to displaying details. When gamma is >1, the contrast of high-brightness areas increases.
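The power-law transform and normalization described above can be sketched as follows. This is an illustrative implementation with c = 1 (the paper does not state its value of c); the function name is invented.

```python
import numpy as np

def gamma_transform(image: np.ndarray, gamma: float, c: float = 1.0) -> np.ndarray:
    """Apply s = c * r**gamma to an 8-bit image.

    The input is normalized to [0, 1] before the power law and scaled
    back to [0, 255] afterwards."""
    r = image.astype(np.float64) / 255.0  # normalization step, Equation (4)
    s = c * np.power(r, gamma)            # power-law transform, Equation (3)
    return np.clip(s * 255.0, 0, 255).astype(np.uint8)

# gamma > 1 pushes mid and low grays toward black while bright targets
# stay bright, widening the target-background gap
img = np.array([[60, 200, 250]], dtype=np.uint8)
print(gamma_transform(img, 4.0).tolist())  # [[0, 96, 235]]
```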

Transformation Coefficient
The selection of the coefficient is crucial in the transform, as different coefficients yield different improvements [41]. This paper uses the signal-to-clutter ratio (SCR) and SCR gain to select an appropriate gamma coefficient. In small IR target detection, the SCR is usually used to measure the contrast between the target and the background, and the SCR gain is used to measure the ability of the algorithm to suppress background noise [42]. The SCR is defined as

SCR = |µ_t − µ_b| / σ_b,  (5)

where µ_t represents the mean pixel value of the target and µ_b and σ_b represent the mean pixel value and the standard deviation of the background, respectively.
The SCR gain is defined as

SCR Gain = SCR_out / SCR_in,  (6)

where SCR_in and SCR_out represent the SCR of the image before and after the gamma transform, respectively. As presented in Table 1, the mean value of the background decreases sharply after the transform, but the mean value of the target region decreases slowly, so the difference between the target and background becomes significant. The transform also improves the standard deviation (std), SCR, and SCR gain of the images. Figure 8 displays the images after the gamma transform from the perspective of subjective perception. When gamma is set to 4, the contrast between foreground and background increases and the clarity of the distant targets improves. When gamma is set to ≥5, some ships with low thermal radiation may disappear from the image. Therefore, comprehensively considering the influence of the gamma transform on subjective and objective observation, gamma was set to 4 in this study.
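Equations (5) and (6) can be computed directly from pixel samples of the target and background regions. A minimal sketch (function names and the toy pixel values are illustrative, not from the paper):

```python
import numpy as np

def scr(target: np.ndarray, background: np.ndarray) -> float:
    """Signal-to-clutter ratio, Equation (5): |mean(t) - mean(b)| / std(b)."""
    return float(abs(target.mean() - background.mean()) / background.std())

def scr_gain(scr_in: float, scr_out: float) -> float:
    """SCR gain of a preprocessing step, Equation (6): SCR_out / SCR_in."""
    return scr_out / scr_in

# toy pixel samples: a bright target region over a darker, noisy background,
# before and after a hypothetical transform that darkens the background more
target_in, bg_in = np.array([240.0, 250.0]), np.array([40.0, 60.0])
target_out, bg_out = np.array([230.0, 250.0]), np.array([10.0, 20.0])

scr_in, scr_out = scr(target_in, bg_in), scr(target_out, bg_out)
print(scr_in, scr_out, round(scr_gain(scr_in, scr_out), 2))  # 19.5 45.0 2.31
```

A gain above 1 indicates that the transform increased the target-background contrast relative to the clutter, which is the selection criterion used for gamma in the text.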

Analysis of IR Characteristics after Transformation
The statistics of the collected images and targets were calculated, as shown in Figure 9. Compared with the distribution map before transformation, the gray values of the entire image and of the target region shift in different directions. Figure 10 shows typical navigation images after the gamma transform. The targets become clearer after transformation, which is beneficial for target detection.

Lightweight Small Ship Detection Neural Network

4.1. Evaluation Indices
According to the definitions of spectral bands in ISO 20473 [43], IR radiation has three categories: near, mid, and far IR. In this study, the IR camera band is 7.5-13.5 µm, corresponding to the mid-IR category. Furthermore, the mid-IR images in the ship detection dataset at open sea [44] were used to increase the diversity of images and avoid overfitting. The combined dataset retained 1434 images; 80% of the images were used for training and validation, and the rest were used as the test set.
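The 80/20 split described above can be sketched as follows. The exact split procedure is not specified in the text; this sketch assumes a seeded random shuffle, and the function name and file-naming pattern are invented for illustration.

```python
import random

def split_dataset(filenames, train_frac: float = 0.8, seed: int = 0):
    """Shuffle the filenames deterministically and split them into a
    training/validation portion and a test portion."""
    names = list(filenames)
    random.Random(seed).shuffle(names)
    k = int(len(names) * train_frac)
    return names[:k], names[k:]

# 1434 images, as in the combined dataset
files = [f"img_{i:04d}.png" for i in range(1434)]
train_val, test = split_dataset(files)
print(len(train_val), len(test))  # 1147 287
```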
The evaluation indices in the experiments included average precision (AP), recall rate (R), and precision rate (P):

P = TP / (TP + FP),  R = TP / (TP + N),  (7)

where TP, FP, and N represent the number of ships correctly detected, falsely detected, and undetected, respectively.
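The two rates in Equation (7) follow directly from the three counts. A minimal sketch (the function name is illustrative):

```python
def precision_recall(tp: int, fp: int, n_missed: int) -> tuple:
    """Precision and recall from the counts in Equation (7):
    TP correct detections, FP false detections, N undetected ships."""
    p = tp / (tp + fp)        # fraction of detections that are correct
    r = tp / (tp + n_missed)  # fraction of ships that were found
    return p, r

# e.g., 90 correct detections, 10 false alarms, 30 missed ships
print(precision_recall(90, 10, 30))  # (0.9, 0.75)
```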

Performance of YOLOv5
As a typical version of the current YOLO series, the YOLOv5 [45] model has been widely used since its release, exhibiting excellent performance in target detection [27,46]. According to the width and depth of the network, YOLOv5 can be divided into five models: YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. YOLOv5n has the fewest parameters, while YOLOv5x has the most.
The YOLOv5 models were trained on the dataset and verified on the test set for small IR ship detection. The experiments were conducted on a Linux server with PyTorch 1.7.1 and CUDA 11.0; the GPU was a GeForce 3070 Ti with 8 GB of memory. Tables 2 and 3 present the parameters and comparison results of the different models. As presented in Table 3, the detection accuracy of the models gradually improved with increasing network depth and width. YOLOv5n had the shallowest and narrowest features and could be easily deployed on mobile devices; however, its performance was worse than that of the larger models and could not meet the requirements of autonomous navigation. The YOLOv5x model exhibited the best detection accuracy, but it was too complex to deploy on mobile devices.
In addition to the YOLOv5 models, Faster R-CNN [25], SSD [21], and RetinaNet [22] were also trained and tested. In terms of the four evaluation indices, Faster R-CNN exhibited the worst target detection performance, which can be attributed to poor region proposals. SSD achieved the best ship detection precision but performed poorly on the other three indices. The performance of RetinaNet was moderate with respect to all indices, but lower than that of YOLOv5s.

YOLOv5-Mobile
The structure of YOLOv5 can be divided into input, backbone, neck, and head. The backbone is the foundation of the entire network and is used for image feature extraction; the subsequent processes are based on the features extracted from the backbone. The backbone is the most computationally intensive part of YOLOv5; therefore, it is the focus of the lightweight improvement.
The proportion of parameters in the different YOLOv5 models was statistically analyzed (Table 4). Among the four parts, the backbone occupied the largest proportion of parameters, approximately 60%. In the case of YOLOv5x, the backbone contained approximately 46 million parameters; this large backbone is why YOLOv5x performed best in small IR ship detection. Therefore, making the backbone more concise and efficient is an attractive topic. To resolve the contradiction between model precision and complexity, in this study, the backbone of MobileNetV3 [47] was used to replace that of YOLOv5. MobileNet is a high-performance network designed to run on mobile devices: using 1 × 1 and 3 × 3 convolutions instead of 5 × 5 convolutions reduces the parameters and backpropagation delay, and adopting the h-swish activation and an attention mechanism improves computing speed and accuracy. The YOLOv5 backbone was replaced with that of MobileNetV3-Small, and the modified model was named YOLOv5-mobile.
YOLOv5-mobile can also be divided into backbone, neck, and head (Figure 11). The depth and complexity of the backbone are reduced, which lowers the overall computational load of the model. The PANet structure is retained in the neck to enhance the detection of ships of different sizes; it is a bottom-up path enhancement that shortens the information propagation path and exploits the precise positioning information in low-level features. In the head, YOLOv5-mobile retains the YOLOv5 structure, using three heads to detect ships of different sizes.

Experiment and Discussion
Table 5 presents the small IR ship detection results on the test set. The first five algorithms detected ships in the original images, while the last worked on gamma-transformed images. Compared with YOLOv5n, the mAP@0.5 and mAP@0.5:0.9 of YOLOv5-mobile increased by 12.7% and 25.1%, respectively; the change in the backbone improved the feature extraction ability. YOLOv5-mobile exhibited almost the same detection performance as YOLOv5s with only half the parameters. For the transformed images, the objective indices of YOLOv5-mobile were further improved, indicating that the gamma transform was helpful for small ship detection. The performance of YOLOv5-mobile on the transformed images was almost the same as that of YOLOv5m, while the number of parameters decreased by 83%; the complexity of the model was thus considerably reduced, which is conducive to practical application. Three typical images were selected for IR small ship detection (Figure 12). As shown in Figure 12a, YOLOv5n, YOLOv5s, and YOLOv5-mobile failed to detect a small ship in the sea-sky area; a possible reason is that the ship was extremely small and the gray difference between the ship and the background was slight, so the networks could not extract its features. As shown in Figure 12b, YOLOv5n did not detect any ship, possibly owing to the narrow width of the network. As shown in Figure 12c, YOLOv5-mobile (gamma) detected five small ships, whereas YOLOv5-mobile detected only two, confirming that the gamma transform indeed helps improve small ship detection in IR images.

Conclusions
Owing to the particular size and materials of small ships, it is difficult for shipborne radar, AIS, and crew members to detect them, which threatens the automatic navigation of ships. IR cameras can convert the temperature difference between small ships and the background into a gray difference; therefore, small IR ship detection is an essential technology in poor visibility and at night.
Herein, a lightweight small ship detection algorithm combined with IR characteristic analysis was proposed to reduce model complexity while ensuring detection performance. First, a small IR ship dataset was constructed, and its design was described in detail, including the acquisition procedure, annotation method, and sampling interval calculation. Second, the gamma transform was used to preprocess the input images based on an analysis of the characteristics of IR ships. Furthermore, YOLOv5-mobile was proposed to reduce the complexity of existing detection models. The detection performance of the proposed model was almost the same as that of YOLOv5m, while the model parameters were reduced by 83%. Two lessons can be drawn from this study, which can also be applied to image segmentation and target tracking. First, in the era of deep learning, traditional image preprocessing is still helpful in improving neural network performance. Second, for a simple task, the performance improvement of complex models is limited, and lightweight models are suitable.
Future research can be focused on the following aspects:


Figure 1 .
Figure 1. Maritime image acquisition. (a) Data acquisition on the ship; (b) ship route.

Figure 2 .
Figure 2. Marine IR images at different intervals. (a) Marine IR image; (b) marine IR image with a 1 s interval from (a); (c) marine IR image with a 10 s interval from (a).

Figure 3 .
Figure 3. Schematic of data collection at sea.

Figure 4 .
Figure 4. Size of ships in the dataset.


3.2. Characteristic Analysis of Marine IR Images

3.2.1. Analysis of IR Background
Unmanned ships can navigate not only in open waters but also in narrow waters, and can complete berthing and departing automatically. This enormous environmental variation poses a considerable challenge to ship detection algorithms. Ships maintain vast distances between each other in open water; therefore, small targets usually appear near the sea-sky line. A slight brightness difference is observed between the sea-sky line and the ship, as seen in Figure 5a.

Table 1 .
Image evaluation after gamma transform.

Table 3 .
Performance comparison table.

Table 4 .
Parameter statistics of different parts of YOLOv5.

Table 5 .
Comparison of the improved model evaluation.