1. Introduction
Synthetic aperture radar (SAR) is an active side-looking radar that can overcome weather interference and provide high-resolution images. SAR images are therefore considered more suitable for ship detection than optical images. SAR ship detection has important applications in the field of marine surveillance and has received much attention recently [1,2].
In recent years, a growing number of scholars have begun to study SAR ship recognition methods based on neural networks. Some scholars use two-stage methods to detect ships. Cui et al. [3] proposed a dense attention pyramid network to detect multiscale ships, and Lin et al. [4] proposed a squeeze-and-excitation Faster R-CNN [5] to improve detection accuracy. Zhao et al. [6] applied the fast region-based convolutional neural network (Fast R-CNN) [7] to ship detection in SAR images. These two-stage methods can often achieve higher detection accuracy, but their detection speed is often slower than that of one-stage methods. Therefore, in order to ensure real-time recognition, some scholars use one-stage methods to detect ships. Wei et al. [8] designed HR-SDNet, a high-resolution SAR ship detection network. Wang et al. [9] applied transfer learning based on SSD [10] to improve accuracy. Wang et al. [11] proposed a RetinaNet-based [12] detector for ships in GaoFen-3 images. Mao et al. [13] first used a simplified U-Net to extract features and proposed an anchor-free SAR ship detection framework. These one-stage methods have faster detection speed, but they are not effective in detecting small targets (targets that occupy a very small proportion of the image area). You Only Look Once (YOLO) [14] is also a one-stage method with fast recognition speed, and its latest version, YOLOv5, has made targeted improvements for the problem of varying target sizes. Therefore, in order to obtain faster recognition speed, this paper proposes a SAR ship recognition method based on YOLOv5.
Compared with offshore ships, the detection of inshore ships is more challenging. Although the pixel values of a ship are much higher than those of the ocean, its texture features and gray level are very similar to those of coastal areas and shore buildings. Most recent work on inshore ship detection is aimed at optical remote sensing images. Xu et al. [15] presented a detection method based on a robust invariant generalized Hough transform to detect and segment inshore ships in high-resolution remote sensing imagery. Lin et al. [16] presented a fully convolutional network in which shape and context information are utilized to detect inshore ships. Liu et al. [17] used Harris corner detection for ship foredeck detection. These methods carry out inshore detection based on the extraction of contours and edges. However, for SAR images, it is hard to extract contour and edge features accurately due to the intrinsic multiplicative speckle noise. Zhai et al. [18] proposed an inshore ship recognition method based on a superpixel generation algorithm, including salient region detection and a final false alarm elimination mechanism. This method can achieve better detection results, but the overall structure is slightly complex, and there is still room for real-time improvement. Cui et al. [19] preprocessed SAR images using similar pixels according to the different scattering mechanisms of shore and ship, and finally applied threshold processing and morphological filtering. This method has a certain effect on the recognition of inshore ships, but the computation is somewhat heavy, and the final threshold processing and filtering lose small target ships. Fu et al. [20] proposed a ship detection method based on FBR-Net, and Cui et al. [21] proposed a ship detection method based on CenterNet; both are anchor-free detection methods intended to reduce the influence of the background environment. These methods can reduce the interference of the background environment on ship detection to a certain extent, but they depend on the accuracy of regression. Because the basic candidate box is abandoned, the regression error may be very large.
When considering SAR images, noise is a persistent problem for ship recognition, and a large amount of noise interferes with recognition. CFAR is a commonly used preprocessing method for ship recognition in SAR images, employed to overcome the interference of background clutter and noise [22]. Common CFAR detection algorithms in recent years include cell averaging CFAR (CA-CFAR) [23], greatest of CFAR (GO-CFAR) [24], smallest of CFAR (SO-CFAR) [25] and order statistic CFAR (OS-CFAR) [26]. The CA-CFAR detector is used in homogeneous clutter backgrounds, and the commonly used two-parameter CFAR algorithm is based on the CA-CFAR detector with a normal distribution. The GO-CFAR and SO-CFAR detectors were proposed to handle clutter edges. The OS-CFAR detector is designed based on the sort-processing technique of digital image processing and performs well in the presence of strong interfering targets. However, these detectors cannot be used alone to deal with complex and changeable backgrounds. Some scholars have experimented with other filters. Liu et al. [27] proposed a ship detection method based on a whitening filter, which can improve the contrast between ship and background clutter, thus improving the accuracy of ship recognition. However, when a speckle is very bright and close to the ship, this method cannot separate the ship from the speckle. Liu et al. [28] proposed a ship recognition method based on adaptive bandwidth, which obtains a small bandwidth in the ship area and a large bandwidth in the background area, thus smoothing the image background. However, this method must first extract the ship target area using a local mean, and that extraction is affected when the ship is inshore or exhibits obvious speckle. Other researchers have chosen a notch filter to filter SAR images [29,30,31], because a notch filter can deal with multiple interference components or periodic noises at the same time. However, one of the most important parameters of a notch filter is the size of the domain with the same weight. If this parameter is too small, it is not conducive to noise equalization over a wider range; if it is too large, image details cannot be preserved. Since SAR images differ greatly in noise distribution level and noise type, it is almost impossible to set one parameter that yields good results for all images. Consequently, such filter-based methods share the same disadvantage when processing SAR images: they cannot handle the noise in all images well.
The research developed in this paper introduces a new SAR ship detection method called N-YOLO, which is based on the classification of noise levels and the processing of noise. It consists of three parts: a noise level classifier (NLC), a SAR target potential area extraction (STPAE) module, and an identification module based on YOLOv5. By applying the NLC, images are divided into three levels according to their noise and sent to different modules. Images affected by high-level noise are sent to YOLOv5 for detection, and other images are sent to the STPAE module. In the STPAE module, CA-CFAR is used to detect the preliminary target area in order to extract the potential target area. To prevent dark pixels on the target from being missed by CA-CFAR, a dilation operation is used to fill and expand the target area acquired by CA-CFAR. In the YOLOv5-based recognition module, the image extracted by the STPAE module is first combined with the original image to obtain a new image. In the new image, there is less noise and the ship and coast are highlighted, thus reducing the impact of coast and noise on ship detection. The new image is then sent to YOLOv5 for recognition. To evaluate the performance of N-YOLO, we conducted several experiments on the GaoFen-3 dataset, whose images were taken by China's GaoFen-3 satellite. The detection results show that our method is efficient for detecting multiscale ships in SAR images compared with several CNN-based methods, e.g., YOLOv5 and G-YOLOv5. The major contributions of this article are summarized as follows:
1) A novel detection method, called N-YOLO, for detecting ships in SAR images.
2) A three-step framework: first, an NLC module to distinguish images with different noise levels; second, an STPAE module to extract the complete potential target area; and third, a YOLOv5-based module to identify ships from images with highlighted targets and less noise.
3) Experiments on the reference GaoFen-3 dataset demonstrating that N-YOLO detects ships with competitive results in comparison with some classical and specialized CNN-based methods.
2. Methods
Let us successively introduce the three components of our N-YOLO approach: the NLC module, the STPAE module and YOLOv5-based target discrimination.
The architecture of N-YOLO is shown in Figure 1. The influence of noise on SAR images varies greatly, and the objective of the NLC module is to classify the noise level. If the image is affected by medium-level or low-level noise, the original image is sent via path1 to two processes. On the one hand, the image is sent to the STPAE module, in which it is prescreened with CA-CFAR and the whole potential target area is then obtained by a dilation operation. On the other hand, the other branch retains and outputs the original image. The images obtained from the two branches are then combined: if the pixel values at a given position in both images are not null, the pixel value at that point in the combined image is assigned 1; otherwise, it is assigned null. The combined image is sent to the YOLOv5 network for ship detection. If the image is affected by high-level noise, it is sent to YOLOv5 for detection through path2.
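The routing just described can be sketched as follows. This is an illustrative outline only: `stpae_stub` and `detect_stub` are hypothetical placeholders for the real STPAE module and YOLOv5 detector, and the pixel-mean test stands in for the NLC.

```python
import numpy as np

def stpae_stub(image: np.ndarray) -> np.ndarray:
    # Placeholder for CA-CFAR prescreening + dilation: returns a binary
    # potential-target map (here, pixels brighter than twice the image mean).
    return (image > 2 * image.mean()).astype(np.uint8)

def detect_stub(image: np.ndarray) -> list:
    # Placeholder for the YOLOv5 detector; would return bounding boxes.
    return []

def n_yolo(image: np.ndarray, T: float = 80.0) -> list:
    """Route an image through the N-YOLO pipeline (sketch)."""
    if image.mean() > T:                # NLC: high-level noise -> path2
        return detect_stub(image)
    mask = stpae_stub(image)            # STPAE: potential target area
    # Combine branches: keep points bright in the original AND in the mask.
    fused = ((image > T) & (mask == 1)).astype(np.uint8)
    return detect_stub(fused)           # path1
```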
2.1. Classify the Noise Level
When considering the GaoFen-3 dataset, images are affected by noises of different levels and kinds. Among them, salt and pepper noise is the most common and has the greatest influence on ship identification. Salt and pepper noise, also known as impulse noise, randomly changes some pixel values and is produced by the image sensor, the transmission channel and decoding processing. In order to better deal with the influence of salt and pepper noise, we divided the noise into three grades according to its influence. The average pixel value is calculated as follows:

$$\bar{p} = \frac{1}{N}\sum_{i}\sum_{j} p(i,j)$$

in which $\bar{p}$ is the average pixel value of the whole image, $p(i,j)$ is the pixel value at coordinates $(i,j)$ in the picture and $N$ is the total number of pixels in the image.
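As a minimal sketch (assuming 8-bit grayscale images held in a NumPy array), the average pixel value reduces to a single mean:

```python
import numpy as np

def average_pixel_value(image: np.ndarray) -> float:
    """Average gray value over all N pixels of the image,
    i.e. (1/N) * sum of p(i, j) over all coordinates (i, j)."""
    return float(image.mean())
```

An all-black image therefore scores 0, while a frame dominated by bright salt and pepper noise scores much higher.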
In order to improve the detection of ships affected by high-level noise, we introduced an NLC module to classify and process images, as shown in Figure 2. Images affected by low-level and medium-level noise are sent to the STPAE module for processing, while images affected by high-level noise are sent to YOLOv5 for detection.
The threshold value T is selected empirically. According to the images affected by different noise levels and the results obtained by CA-CFAR processing, we applied an empirical method to obtain the intervals of the different noise levels. We set the average pixel value range of images affected by low-level noise to be less than 30. Accordingly, the average pixel values of images affected by medium-level noise and those affected by high-level noise are between 30 and 80, and greater than 80, respectively. Therefore, we set the threshold T to 80. If the threshold were higher than 80, some images affected by high-level noise would be sent to STPAE, which would affect the overall training results and increase the missed detection rate. If the threshold were lower than 80, some images affected by medium-level noise could not have their noise interference removed, and some images affected by shore interference could not have that interference removed.
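Using these empirically chosen intervals, the NLC decision can be sketched as a simple threshold test (a sketch assuming the 30/80 boundaries stated above; the function name is illustrative):

```python
import numpy as np

def classify_noise_level(image: np.ndarray) -> str:
    """NLC decision: route by the average pixel value of the image.
    < 30 -> low, 30-80 -> medium (both go to STPAE via path1),
    > 80 -> high (sent straight to YOLOv5 via path2)."""
    p = image.mean()
    if p < 30:
        return "low"
    if p <= 80:
        return "medium"
    return "high"
```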
2.1.1. Low-Level Noise
The images affected by low-level noise are shown in Figure 3. This kind of image has little, sparsely distributed noise, which has little influence on the ship recognition task. The average pixel value of such images is less than 30.
The average pixel values of the images in Figure 3 are listed in Table 1. As can be seen from Table 1, the average pixel values of these four images are all less than 30, so they all belong to images affected by low-level noise. There is uniformly distributed salt and pepper noise in these four images, but its influence is slight and hardly affects ship identification.
2.1.2. Medium-Level Noise
The images affected by medium-level noise are shown in Figure 4. The noise density of this kind of image is moderate and its distribution is not too dense, so it has some influence on the ship recognition task. The average pixel value of this kind of image is between 30 and 80.
The average pixel values of the images in Figure 4 are listed in Table 2. As can be seen from Table 2, the average pixel values of these four images are between 30 and 80, so they all belong to images affected by medium-level noise. There is uniformly distributed and dense salt and pepper noise in this kind of image, which has some influence on ship recognition. However, the potential target region extraction module and the YOLOv5-based recognition module can filter out the noise and improve the recognition accuracy.
2.1.3. High-Level Noise
The images shown in Figure 5 are disturbed by severe noise; the noise in such images is very dense and uniform, which brings great challenges to ship recognition. The average pixel value of this kind of image is greater than 80.
The average pixel values of the images in Figure 5 are listed in Table 3. As can be seen from Table 3, the average pixel values of these four images are all greater than 80, so they all belong to images affected by high-level noise. Such images are greatly affected by noise, and if the potential target extraction module and the YOLOv5-based recognition module are applied to them directly, the results are poor: not only is the missed detection rate high, but the training effect also suffers.
2.2. Extract the Complete Target Area
In order to extract the complete target area from SAR images, this paper introduces an STPAE module, which consists of CA-CFAR and a dilation operation.
In SAR images, the gray intensity of a ship is higher than that of the surrounding sea clutter. CA-CFAR can generate a local threshold to detect bright pixels via a sliding window. CA-CFAR divides the local area into three windows: the central region of interest (ROI) window, the guard window and the background clutter window, as shown in Figure 6.
CA-CFAR first calculates the average pixel value of the region of interest ($\mu_{ROI}$) and the average pixel value of the clutter window ($\mu_c$), and then multiplies the clutter average by a coefficient $\alpha$; the obtained value is the adaptive threshold $T = \alpha \mu_c$. Finally, $\mu_{ROI}$ is compared with the threshold $T$: if $\mu_{ROI}$ is greater than $T$, the ROI pixel is marked as a bright pixel in an output binary image $J$; otherwise, it is marked as a dark pixel. Assuming that the dimensions of the input SAR image $I$ and the output binary image $J$ are both $X \times Y$, with $1 \le x \le X$ and $1 \le y \le Y$, the CA-CFAR binary pixel $J(x,y)$ can be calculated as

$$J(x,y) = \begin{cases} 1, & \mu_{ROI}(x,y) > \alpha\,\mu_c(x,y) \\ 0, & \text{otherwise} \end{cases}$$

The pixels for which $J(x,y) = 1$ are extracted and sent to the next stage for the dilation operation. The proposed prescreening greatly reduces the workload of the subsequent recognition stage while maintaining a constant false alarm rate, and it does not miss possible ships in the image.
The flow chart of the STPAE module is shown in Figure 7. After the original SAR image is input, we first calculate the adaptive threshold as the sliding window traverses each point of the image. The adaptive threshold can be defined as

$$T = \alpha z$$

where $z$ is the average value of the surrounding pixels and $\alpha$ is the adaptive coefficient; the value of $\alpha$ depends on the size of the clutter window. Then, the pixel value of each point is compared with its adaptive threshold. If the pixel value of the point is greater than its adaptive threshold, 1 is assigned to the corresponding position of the prescreened picture; otherwise, 0 is assigned. Next, the prescreened picture is sent to the dilation operation. Through dilation, the highlighted pixels are expanded outward, so that the potential target areas extracted in the previous step are filled and enlarged; this prevents parts of some targets from being lost in the previous operation due to low pixel values. Finally, the obtained image covering the complete target area is sent to the next stage.
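The two STPAE steps above can be sketched as follows. This is a simplified, unoptimized CA-CFAR with square windows (the ROI reduced to a single pixel) plus a cross-shaped binary dilation; the window sizes and α used here are illustrative, not the paper's settings.

```python
import numpy as np

def ca_cfar(image: np.ndarray, guard: int = 1, clutter: int = 2,
            alpha: float = 2.0) -> np.ndarray:
    """Simplified CA-CFAR: compare each pixel against alpha times the mean z
    of the clutter ring (full window minus the guard area); emit binary J."""
    X, Y = image.shape
    J = np.zeros((X, Y), dtype=np.uint8)
    r = guard + clutter                      # half-size of the full window
    for x in range(X):
        for y in range(Y):
            x0, x1 = max(0, x - r), min(X, x + r + 1)
            y0, y1 = max(0, y - r), min(Y, y + r + 1)
            window = image[x0:x1, y0:y1].astype(float)
            gx0, gx1 = max(0, x - guard), min(X, x + guard + 1)
            gy0, gy1 = max(0, y - guard), min(Y, y + guard + 1)
            guard_sum = float(image[gx0:gx1, gy0:gy1].sum())
            n = window.size - (gx1 - gx0) * (gy1 - gy0)
            z = (window.sum() - guard_sum) / n if n > 0 else 0.0
            J[x, y] = 1 if image[x, y] > alpha * z else 0
    return J

def dilate(binary: np.ndarray, k: int = 1) -> np.ndarray:
    """Binary dilation with a cross-shaped structuring element, applied k
    times: grows bright regions to recover dark target pixels missed above."""
    out = binary.copy()
    for _ in range(k):
        p = np.pad(out, 1)
        out = (p[:-2, 1:-1] | p[2:, 1:-1] |
               p[1:-1, :-2] | p[1:-1, 2:] | p[1:-1, 1:-1])
    return out
```

A library implementation (e.g. `scipy.ndimage.binary_dilation`) would normally replace the hand-rolled dilation in practice.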
2.3. Ship Identification Based on YOLOv5
In the recognition stage, the extracted potential target area image is first combined with the original image to obtain a preprocessed image with bright targets and fewer noise points. The pixels at the same position in the two images are compared: if the pixel in the original image is greater than the threshold value T and the pixel in the image obtained by the STPAE module is greater than 0, the corresponding point in the new image is assigned 1; otherwise, it is assigned 0.
The process of combining the two images is shown in Figure 8. If both conditions are met, that is, the pixel value in the original image is greater than the threshold value T and the pixel value at the corresponding position in the extracted potential target image is 1, then the pixel value of this point in the new image is 1, as shown by point 2 in Figure 8. Otherwise, even if one of the conditions is met, the pixel value of this point in the new image is null. As shown by point 1 in Figure 8, the pixel value in the original image is greater than the threshold value T, but the pixel value at the corresponding position in the extracted potential target image is null, so the pixel value of this point in the new image is set to null. By analogy, we obtain a new image combining the two images. Compared with the original image, most of the noise is filtered out and the targets are highlighted and enhanced. Finally, the new image is sent to YOLOv5 for ship identification.
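The combination rule reduces to a pointwise AND and can be sketched in a few lines (assuming the original image and the binary STPAE output share the same shape; T is the NLC threshold from Section 2.1, and the function name is illustrative):

```python
import numpy as np

def combine_images(original: np.ndarray, target_mask: np.ndarray,
                   T: float = 80.0) -> np.ndarray:
    """A point is 1 in the new image only if it is brighter than T in the
    original AND marked 1 in the extracted potential target image;
    otherwise it is 0 (the 'null' value in the text)."""
    return ((original > T) & (target_mask == 1)).astype(np.uint8)
```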
4. Discussion
It can be concluded from Table 4 that the highest precision is obtained when training images are sent directly to the STPAE and YOLOv5-based detection modules without passing through the NLC module. Compared with training directly on YOLOv5, its precision is 7% higher, but its recall rate is 12.75% lower. This is because images affected by high-level noise produce a mass of noise in the middle of the image and lose the target after being sent to STPAE, which not only raises the missed detection rate but also degrades the overall training results. In contrast, using the method proposed in this paper (classification by the NLC module), the recall rate is greatly improved. The recall rate for images affected by high-level noise after training is as high as 92.36%, very close to the 92.65% of YOLOv5, and the recall rate for images affected by low-level noise after training also reaches 86.42%. Compared with YOLOv5, the precision of the proposed method is greatly improved: the precision for images affected by medium- and low-level noise after training reaches 76.5%, which is 5.7% higher than that of YOLOv5. The precision for images affected by high-level noise after training is 67.46%, which is 3.34% lower than that of the first method. Among the 12,000 images in the training set, 1744 are affected by high-level noise and 10,256 are affected by medium- and low-level noise. Given the ratio of the two, N-YOLO improves the precision and decreases the false detection rate.
Experiments show that using the NLC not only improves detection precision but also limits the increase in the missed detection rate, thus improving the overall detection performance. At the same time, images affected by different noise levels are prevented from interfering with each other during training.
It can be seen from Table 5 that the precision of the last two methods is improved to varying degrees compared with the first method, and the precision of the method proposed in this paper is improved the most. In terms of recall rate, the first two methods are almost the same and superior to the latter two, with YOLOv5 the best. Because the latter two methods preprocess the images, the details of small targets are destroyed, resulting in missed detections.
Figure 13 shows the PR curves of the CNN-based methods. The navy blue line is the PR curve obtained by training with YOLOv5. The light blue line is the PR curve obtained without the NLC (non-NLC). The green line and the yellow line are the PR curves of images affected by high-level noise and medium/low-level noise, respectively, trained with our method. The red line is the PR curve obtained from the contrast experiment, in which the images are first filtered with a Gaussian filter and then sent to YOLOv5 for training.
The PR curve of non-NLC decreases sharply as the recall rate increases, compared with YOLOv5. This might be because the features extracted without the NLC are insufficient, leading to weak discrimination for ships. Furthermore, the PR curve of non-NLC is lower than those of the other methods when the recall rate exceeds about 0.5. In addition, the PR curve of the high-level group is higher than the others when the recall rate is greater than 0.9.
Figure 14 shows the detection results of the different methods as applied to four different ship situations: offshore ships affected by medium/low-level noise (the first row of Figure 14), offshore ships affected by high-level noise (the second row), inshore ships affected by high-level noise (the third row), and inshore ships affected by medium/low-level noise (the fourth row). It can be seen from the first row of Figure 14 that the four detection methods perform almost identically in the first situation. Compared with the original method, the detection accuracy of the latter two methods is slightly improved: G-YOLOv5 by 1% and N-YOLO by 2%. For the second situation, the detection accuracy of G-YOLOv5 is equal to that of the original method, while N-YOLO improves by 4%. For the third situation, the detection accuracy of G-YOLOv5 is reduced to a certain extent, and G-YOLOv5 also produces a false detection; in this image, the detection accuracy of N-YOLO improves by 7% on average compared with the original method. For the last situation, G-YOLOv5 not only fails to reduce the noise interference but also blurs the targets, so its detection accuracy drops significantly and there are four missed detections. For this image, the detection accuracy of N-YOLO is slightly improved compared with the original method; notably, the detection accuracy for the ship in the lower left corner increases by 15%. However, although N-YOLO has no missed detections, it mistakenly identifies a ship in the lower right corner.