N-YOLO: A SAR Ship Detection Using Noise-Classifying and Complete-Target Extraction

Abstract: High-resolution images provided by synthetic aperture radar (SAR) play an increasingly important role in the field of ship detection. Numerous algorithms have been proposed so far, and relatively competitive results have been achieved in detecting different targets. However, ship detection using SAR images is still challenging because these images are affected by different degrees of noise, while inshore ships are affected by shore image contrasts. To solve these problems, this paper introduces a ship detection method called N-YOLO, which is based on You Only Look Once (YOLO). N-YOLO includes a noise level classifier (NLC), a SAR target potential area extraction (STPAE) module, and a YOLOv5-based detection module. First, the NLC derives and classifies the noise level of SAR images. Second, the STPAE module, composed of CA-CFAR and a dilation operation, extracts the complete region of potential targets. Third, the YOLOv5-based detection module combines the potential target area with the original image to obtain a new image. To evaluate the effectiveness of N-YOLO, experiments are conducted on a reference GaoFen-3 dataset. The detection results show that N-YOLO achieves competitive performance in comparison with several CNN-based algorithms.


Introduction
Synthetic aperture radar (SAR) is an active side-looking radar that can overcome weather interference and provide high-resolution images. SAR images are therefore considered more suitable for ship detection than optical images. SAR ship detection has important applications in the field of marine surveillance and has received much attention recently [1,2].
In recent years, an increasing number of scholars have begun to study SAR ship recognition methods based on neural networks. Some scholars use two-stage methods to detect ships. Cui et al. [3] proposed a dense attention pyramid network to detect multiscale ships, and Lin et al. [4] proposed a squeeze-and-excitation Faster R-CNN [5] to improve detection accuracy. Zhao et al. [6] applied fast region convolutional neural network (R-CNN) [7] to ship detection in SAR images. These two-stage methods can often achieve higher detection accuracy, but their detection speed is often slower than that of one-stage methods. Therefore, to ensure real-time recognition, some scholars use one-stage methods. Wei et al. [8] designed a high-resolution SAR ship detection network, HR-SDNet. Wang et al. [9] applied transfer learning based on SSD [10] to improve accuracy. Wang et al. [11] proposed a RetinaNet-based [12] detector for ships in GaoFen-3 images. Mao et al. [13] first used a simplified U-Net to extract features and proposed an anchor-free SAR ship detection method. Besides network design, filtering-based preprocessing has also been considered, since a notch filter can deal with multiple interference components or periodic noises at the same time. However, one of the most important parameters of a notch filter is the size of the domain with the same weight. If this parameter is too small, it is not conducive to noise equalization over a wider range; if it is too large, image details cannot be preserved. Because the noise levels and types of SAR images differ greatly, it is almost impossible to set one parameter that is applicable to all images, so such filtering methods share the same disadvantage when processing SAR images: they cannot handle the noise in all images well.
The research developed in this paper introduces a new SAR ship detection method called N-YOLO, which is based on noise level classification and noise processing. It consists of three parts: a noise level classifier (NLC), a SAR target potential area extraction (STPAE) module, and a YOLOv5-based identification module. By applying the NLC, images are divided into three levels according to their noise level and sent to different modules. Images affected by high-level noise are sent to YOLOv5 for detection, while the other images are sent to the STPAE module. In the STPAE module, CA-CFAR is used to detect the preliminary target area in order to extract the potential target area. To prevent dark pixels on a target from being missed by CA-CFAR, a dilation operation fills and expands the target area acquired by CA-CFAR. In the YOLOv5-based recognition module, the image extracted by the STPAE module is first combined with the original image to obtain a new image. The new image contains less noise, and ships and coasts are highlighted, thus reducing the impact of coast and noise on ships. The new image is then sent to YOLOv5 for recognition. To evaluate the performance of N-YOLO, we conducted several experiments on the GaoFen-3 dataset, whose images were taken by the Chinese GaoFen-3 satellite. The detection results show that our method is efficient for detecting multiscale ships in SAR images compared with several CNN-based methods, e.g., YOLOv5 and G-YOLOv5. The major contributions of this article are summarized as follows: (1) A novel detection method called N-YOLO is proposed for detecting ships in SAR images.
(2) A three-step framework that contains, first, an NLC module to distinguish images with different noise levels; second, a STPAE module to extract the complete potential target area; and third, a YOLOv5-based module to identify ships from images with highlighted targets and less noise.
(3) Experiments on the reference GaoFen-3 dataset demonstrate that N-YOLO detects ships with competitive results in comparison with some classical and specialized CNN-based methods.

Methods
Let us successively introduce the three components of our N-YOLO approach: the NLC module, the STPAE module, and the YOLOv5-based target discrimination.
The architecture of N-YOLO is shown in Figure 1. The influence of noise on SAR images varies greatly, and the objective of the NLC module is to classify the noise level. If an image is affected by medium-level or low-level noise, it is sent via path 1 to two processes. On the one hand, the image is sent to the STPAE module, where it is prescreened with CA-CFAR, and the whole potential target area is then obtained by a dilation operation. On the other hand, the other branch retains and outputs the original image. The images obtained from the two branches are then combined: if the pixel values at a given position in both images are nonzero, the pixel value at that point in the combined image is set to 1; otherwise, it is set to 0.
The combined image is then sent to the YOLOv5 network for ship detection. If the image is affected by high-level noise, it is sent directly to YOLOv5 for detection through path 2.
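As a rough illustration, the two-path routing just described can be sketched as follows. The helper callables `run_stpae`, `combine_images`, and `run_yolov5` are placeholders standing in for the modules described in the following sections, not the authors' code, and the mean-pixel test with T = 80 anticipates the empirical NLC threshold introduced later.

```python
import numpy as np

# Hedged sketch of N-YOLO's two-path routing; the three callables are
# stand-ins for the STPAE, combination, and YOLOv5 modules.
T = 80  # empirical noise-level threshold used by the NLC

def n_yolo_route(image, run_stpae, combine_images, run_yolov5):
    if image.mean() > T:                  # high-level noise -> path 2
        return run_yolov5(image)
    mask = run_stpae(image)               # path 1: potential target area
    merged = combine_images(image, mask)  # keep pixels bright in both
    return run_yolov5(merged)
```

With stub modules, a dark (low-noise) image takes path 1 through STPAE, while a uniformly bright (high-noise) image bypasses it via path 2.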

Classify the Noise Level
When considering the GaoFen-3 dataset, images are affected by different levels and kinds of noise. Among them, salt-and-pepper noise is the most common and has the greatest influence on ship identification. Salt-and-pepper noise, also known as impulse noise, randomly changes some pixel values and is produced by the image sensor, the transmission channel, and decoding processing. To better deal with the influence of salt-and-pepper noise, we divided the noise into three grades according to its influence. The average pixel value is calculated as follows:

V = (1/n^2) ∑_i ∑_j v_ij,

in which V is the average pixel value of the whole image, v_ij is the pixel value at coordinates (i, j) in the picture, and n^2 is the total number of pixels in the image. To improve ship detection affected by high-level noise, we introduce an NLC module to classify and process images, as shown in Figure 2. Images affected by low-level and medium-level noise are sent to the STPAE module for processing, while images affected by high-level noise are sent to YOLOv5 for detection.
The threshold value T is selected empirically. According to the images affected by different noise levels and the results obtained by CA-CFAR processing, we determined the intervals of the different noise levels: the average pixel value range of images affected by low-level noise is [0, 30), and the average pixel values of images affected by medium-level and high-level noise are [30, 80] and (80, 255], respectively. Therefore, we set the threshold T to 80. If the threshold were higher than 80, some images affected by high-level noise would be sent to STPAE, which would affect the overall training results and increase the missed detection rate. If the threshold were lower than 80, some images affected by medium-level noise would bypass the STPAE module, so their noise interference could not be removed, and shore interference could not be removed from some inshore images.
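A minimal sketch of the NLC decision, assuming a grayscale image given as a NumPy array; the function name is our own, and the interval boundaries follow the empirical values above.

```python
import numpy as np

# Noise level classifier (NLC) sketch. The intervals [0, 30), [30, 80],
# and (80, 255] follow the empirical thresholds given in the text.
def classify_noise_level(image: np.ndarray) -> str:
    """Classify a grayscale SAR image by its average pixel value V."""
    v = image.mean()  # V = (1/n^2) * sum of v_ij over all pixels
    if v < 30:
        return "low"
    elif v <= 80:
        return "medium"
    return "high"

# A uniformly dark image averages well below 30, so it is "low" noise.
dark = np.full((64, 64), 10, dtype=np.uint8)
print(classify_noise_level(dark))  # -> low
```

Images classified as "low" or "medium" would then be routed to the STPAE module, and "high" images directly to YOLOv5.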

Low-Level Noise
The images affected by low-level noise are shown in Figure 3. This kind of image contains little, sparsely distributed noise, which has little influence on the ship recognition task; the average pixel value of such images is less than 30. The average pixel values of the images in Figure 3 are listed in Table 1. As Table 1 shows, the average pixel values of these four images are all less than 30, so they all belong to images affected by low-level noise. Uniformly distributed salt-and-pepper noise is present in these four images, but its influence is slight and hardly affects ship identification.

Medium-Level Noise
The images affected by medium-level noise are shown in Figure 4. The noise density of this kind of image is moderate and its distribution is not too dense, but it has some influence on the ship recognition task; the average pixel value of such images is between 30 and 80. The average pixel values of the images in Figure 4 are listed in Table 2. As Table 2 shows, the average pixel values of these four images are all between 30 and 80, so they all belong to images affected by medium-level noise. Uniformly distributed and dense salt-and-pepper noise is present in this kind of image, which has some influence on ship recognition. However, the STPAE module and the YOLOv5-based recognition module can filter out the noise and improve recognition accuracy.

High-Level Noise
The images shown in Figure 5 are disturbed by severe noise that is very dense and uniform, which brings great challenges to ship recognition; the average pixel value of such images is greater than 80. The average pixel values of the images in Figure 5 are listed in Table 3. As Table 3 shows, the average pixel values of these four images are all greater than 80, so they all belong to images affected by high-level noise. Such images are greatly affected by noise; if the potential target extraction module and the YOLOv5-based recognition module are used directly on them, the results are poor: not only is the missed detection rate high, but the training effect is also degraded.

Extract the Complete Target Area
In order to extract the complete target area from SAR images, this paper introduces a STPAE module, which consists of CA-CFAR and a dilation operation.
In SAR images, the gray intensity of a ship is higher than that of the surrounding sea clutter. CA-CFAR can generate a local threshold to detect bright pixels via a sliding window. It divides the local area into three windows: the center region of interest (ROI) window, the guard window, and the background clutter window, as shown in Figure 6.
CA-CFAR first calculates the average pixel value of the region of interest (µ_ROI) and the average pixel value of the clutter (µ_c), and then multiplies the clutter average by a coefficient α; the obtained value is the adaptive threshold T = α · µ_c. Finally, µ_ROI is compared with the threshold T: if µ_ROI is greater than T, the ROI pixel is marked as a bright pixel in an output binary image J; otherwise, it is marked as a dark pixel. Assuming that the dimensions of the input SAR image I and the output binary image J are both X × Y, where x = {0, 1, · · · , X − 1} and y = {0, 1, · · · , Y − 1}, I and J can be defined as I = {I(x, y)} and J = {J(I, x, y, T)}. The CA-CFAR binary pixel J(I, x, y, T) can be calculated as

J(I, x, y, T) = true if µ_ROI(x, y) > T, and false otherwise.

The SAR image regions for which J(I, x, y, T) = true are extracted and sent to the next stage for the dilation operation. The proposed prescreening can greatly reduce the workload of the subsequent recognition work and maintain a constant false alarm rate; at the same time, it will not miss possible ships in the image.
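A toy CA-CFAR implementation following the scheme above; the window half-sizes and the coefficient α here are illustrative choices, not values reported in the paper, and the clutter mean is taken over the background window minus the guard window.

```python
import numpy as np

# Minimal CA-CFAR sketch over a grayscale image. roi/guard/bg are
# half-sizes of the three concentric windows; alpha is illustrative.
def ca_cfar(img, roi=1, guard=2, bg=4, alpha=1.5):
    img = img.astype(float)
    X, Y = img.shape
    out = np.zeros((X, Y), dtype=np.uint8)
    for x in range(X):
        for y in range(Y):
            def patch(r):
                return img[max(0, x - r):x + r + 1, max(0, y - r):y + r + 1]
            roi_vals, guard_vals, bg_vals = patch(roi), patch(guard), patch(bg)
            # clutter mean mu_c: background ring outside the guard window
            clutter_sum = bg_vals.sum() - guard_vals.sum()
            clutter_n = bg_vals.size - guard_vals.size
            mu_c = clutter_sum / max(clutter_n, 1)
            t = alpha * mu_c              # adaptive threshold T = alpha * mu_c
            out[x, y] = 1 if roi_vals.mean() > t else 0
    return out

# A single bright target on a dark background is marked as a bright pixel.
scene = np.zeros((16, 16)); scene[8, 8] = 200.0
mask = ca_cfar(scene)
print(mask[8, 8], mask[0, 0])  # -> 1 0
```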
The flow chart of the STPAE module is shown in Figure 7. After the original SAR image is input, we first calculate the adaptive threshold as the sliding window traverses each point of the image. The adaptive threshold can be defined as T = α · z, where z is the average value of the surrounding pixels and α is the adaptive coefficient, whose size depends on the size of the clutter window. Then, the pixel value of each point is compared with its adaptive threshold: if the pixel value is greater than the threshold, 1 is assigned to the corresponding position of the prescreened picture; otherwise, 0 is assigned. Next, the prescreened pictures are sent to the dilation operation. Through dilation, the highlighted pixels are expanded outward, so the potential target areas extracted in the previous step are filled and enlarged, avoiding the loss of partial target areas whose pixel values are low. Finally, the obtained image covering the complete target area is sent to the next stage.
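The dilation (expansion) step can be sketched with plain NumPy as below; production code would more likely use `cv2.dilate` or `scipy.ndimage.binary_dilation`, and the 3×3 square structuring element is an assumption of ours.

```python
import numpy as np

# Dilation of the binary CA-CFAR mask with a 3x3 square structuring
# element, implemented with NumPy shifts over a zero-padded copy.
def dilate(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    out = mask.astype(bool)
    for _ in range(iterations):
        padded = np.pad(out, 1)           # pad with False on all sides
        grown = np.zeros_like(out)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                grown |= padded[1 + dx:1 + dx + out.shape[0],
                                1 + dy:1 + dy + out.shape[1]]
        out = grown
    return out.astype(np.uint8)

m = np.zeros((5, 5), dtype=np.uint8); m[2, 2] = 1
print(dilate(m).sum())  # -> 9 (one bright pixel grows into a 3x3 block)
```

Each iteration grows every marked pixel into its 8-neighborhood, which is exactly the filling-and-expanding behavior described above.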

Ship Identification Based on YOLOv5
In the recognition stage, the extracted image of the potential target area is first combined with the original image to obtain a preprocessed image with bright targets and fewer noise points. The pixels at the same position in the two images are compared: if the pixel in the original image is greater than the threshold value T and the pixel in the image obtained by the STPAE module is greater than 0, the point in the new image is assigned 1; otherwise, it is assigned 0.
The process of combining the two images is shown in Figure 8. If two conditions are met, that is, the pixel value in the original image is greater than the threshold value T and the pixel value at the corresponding position in the extracted potential target image is 1, then the pixel value of this point in the new image is 1, as shown by point 2 in Figure 8. Otherwise, even if only one of the conditions is met, the pixel value of this point in the new image is 0. As shown by point 1 in Figure 8, the pixel value in the original image is greater than the threshold value T, but the pixel value at the corresponding position in the extracted potential target image is 0, so the pixel value of this point in the new image is set to 0. By analogy, we obtain a new image combining the two: compared with the original image, most of the noise is filtered out and the target is highlighted and enhanced. Finally, the new image is sent to YOLOv5 for ship identification.
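Under the two conditions just described, the combination reduces to an element-wise AND; a minimal sketch, assuming grayscale NumPy arrays and the same empirical threshold T = 80.

```python
import numpy as np

# Combining the original image with the STPAE output: a pixel in the new
# image is 1 only if the original pixel exceeds T AND the extracted
# potential-target mask is nonzero at that position.
T = 80  # empirical threshold from the NLC section

def combine(original: np.ndarray, target_mask: np.ndarray) -> np.ndarray:
    return ((original > T) & (target_mask > 0)).astype(np.uint8)

orig = np.array([[200, 200], [10, 200]], dtype=np.uint8)
mask = np.array([[1, 0], [1, 1]], dtype=np.uint8)
print(combine(orig, mask))  # only (0,0) and (1,1) satisfy both conditions
```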


Experiments
This section evaluates the function of the NLC module and the impact of the whole method through several experiments.

Experimental Datasets
We carried out a series of experiments on the GaoFen-3 dataset to verify the proposed YOLOv5-based ship detection method. Some samples of inshore ships and of ships in images affected by noise are shown in Figure 9.
The 12,000 images extracted from the GaoFen-3 dataset are randomly divided into a training set and a testing set in a 6:1 ratio. All experiments are implemented with the TensorFlow framework on Windows, supported by an NVIDIA Quadro P5000 graphics card.
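The random 6:1 split might be reproduced as follows; the file-naming scheme is purely illustrative.

```python
import random

# Random 6:1 train/test split of 12,000 GaoFen-3 image IDs, matching the
# experimental setup; the gf3_* names are made up for illustration.
ids = [f"gf3_{i:05d}.png" for i in range(12000)]
random.seed(0)               # fix the seed for a reproducible split
random.shuffle(ids)
n_train = len(ids) * 6 // 7  # 6 parts train, 1 part test
train, test = ids[:n_train], ids[n_train:]
print(len(train), len(test))  # -> 10285 1715
```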
Some images from the GaoFen-3 dataset have three channels, while others have only one, so the experiments use a single channel for all images.


Evaluation Criteria
Experimental results are shown in Tables 4 and 5, respectively. Several indices are used to evaluate the experimental results of the different methods: recall rate, precision rate, F1 score, and average precision (AP). The following equations define these indices:

Recall rate (R) = TP / (TP + FN),
Precision rate (P) = TP / (TP + FP),
F1 = (2 × P × R) / (P + R),

where TP, FP, and FN represent true positives, false positives, and false negatives, respectively. Precision rate refers to the proportion of ground-truth ships among all predictions made by the network, while recall rate refers to the proportion of ground-truth ships predicted by the network among all ground-truth ships. F1 is a comprehensive indicator for judging the performance of different networks by combining precision and recall. AP is the area under the precision-recall (PR) curve and also illustrates the comprehensive performance of the different methods.
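A minimal sketch of the count-based indices, with made-up counts purely to exercise the formulas (AP would additionally require integrating the PR curve over detection thresholds).

```python
# Recall, precision, and F1 from detection counts; the counts below are
# invented numbers, not results from the paper.
def metrics(tp: int, fp: int, fn: int):
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1

r, p, f1 = metrics(tp=80, fp=10, fn=20)
print(round(r, 3), round(p, 3), round(f1, 3))  # -> 0.8 0.889 0.842
```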

Noise Level Classifier Impact
When the STPAE module processes images affected by different noise levels, the effects are quite different.
The images affected by low-level noise are shown in Figure 10. This kind of image has little, sparsely distributed noise, which has little influence on the ship recognition task; with our method, this slight noise is filtered out well. The images affected by medium-level noise are shown in Figure 11. The noise distribution in this kind of image is dense and uniform and has a certain influence on ship recognition; using our method, most of the noise in such images is removed, greatly improving the accuracy of ship recognition. The images shown in Figure 12 are disturbed by high-level noise that is very dense and uniform, which brings great challenges to ship recognition. The STPAE module is not efficient when dealing with such images: after combining the extracted potential target area with the original image, an image with noise in the center and almost no ship is obtained.
It can be seen from Figures 10 and 11 that if the noise interference in the original image is not severe, the proposed method obtains better results; if the noise interference is too severe, however, the target is lost in the new image. This not only reduces the recall rate and causes many missed detections, but also has a negative impact on training when such images are sent to YOLOv5, resulting in low overall recognition accuracy. Therefore, the NLC module can be applied to classify SAR images according to their noise level: images affected by high-level noise are sent directly to YOLOv5 for detection, while the other images are first processed by STPAE and then sent to YOLOv5 for recognition.
To verify the effectiveness of the NLC module, a set of comparative experiments is carried out. We compare recognition with YOLOv5 alone, recognition with STPAE and the YOLOv5-based detection module but without NLC classification, and recognition of images affected by different noise levels with N-YOLO. The test results are shown in Table 4.
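The routing logic described above can be sketched as follows. The noise-level estimator and its thresholds are hypothetical stand-ins for the NLC module, not the paper's actual classifier; only the routing rule (high-level noise bypasses STPAE) follows the text:

```python
import numpy as np

def estimate_noise_level(img: np.ndarray) -> str:
    """Hypothetical stand-in for the NLC module: classify an image's
    noise level from its global intensity standard deviation.
    The thresholds are illustrative, not the paper's values."""
    std = img.astype(np.float32).std()
    if std > 60.0:
        return "high"
    elif std > 30.0:
        return "medium"
    return "low"

def route_image(img: np.ndarray, stpae, yolo):
    """Route an image as described in the text: high-noise images go
    straight to the detector; medium/low-noise images pass through
    STPAE (potential-target extraction) first."""
    if estimate_noise_level(img) == "high":
        return yolo(img)
    return yolo(stpae(img))
```

Here `stpae` and `yolo` are placeholders for the STPAE module and the YOLOv5-based detector; the sketch only shows how the NLC decision gates the pipeline.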

Comparison with Other CNN-Based Methods
To demonstrate the filtering effect of N-YOLO, we used a typical filtering method (Gaussian filtering) as preprocessing before YOLOv5 detection, denoted G-YOLOv5. As shown in Table 5, we conducted experiments using our method, YOLOv5, and G-YOLOv5. Figure 13 shows the PR curves of the different CNN-based methods tested on several ships.
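For context, the G-YOLOv5 baseline's preprocessing amounts to a plain Gaussian blur applied before detection. A minimal separable implementation is sketched below; the kernel size and sigma are illustrative choices, since the paper does not state the filter parameters:

```python
import numpy as np

def gaussian_kernel(size: int = 5, sigma: float = 1.0) -> np.ndarray:
    """1-D Gaussian kernel, normalized to sum to 1."""
    x = np.arange(size) - size // 2
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img: np.ndarray, size: int = 5, sigma: float = 1.0) -> np.ndarray:
    """Separable Gaussian blur: convolve each row, then each column."""
    k = gaussian_kernel(size, sigma)
    out = np.apply_along_axis(
        lambda r: np.convolve(r, k, mode="same"), 1, img.astype(np.float32))
    out = np.apply_along_axis(
        lambda c: np.convolve(c, k, mode="same"), 0, out)
    return out
```

This illustrates the trade-off discussed in the paper: the blur suppresses speckle-like noise but also smears the fine details of small ship targets.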



Discussion
Table 4 shows that the highest precision is obtained when images are sent directly to the STPAE and YOLOv5-based detection modules without passing through the NLC module. Compared with training directly on YOLOv5, its precision is 7% higher, but its recall rate is 12.75% lower. This is because images affected by high-level noise produce a mass of noise in the middle of the image and lose the target after passing through STPAE, which not only raises the missed-detection rate but also degrades the overall training results. In contrast, with the method proposed in this paper (classification by the NLC module), the recall rate is greatly improved: the recall rate on images affected by high-level noise reaches 92.36%, very close to the 92.65% of YOLOv5, and the recall rate on images affected by low-level noise reaches 86.42%. Compared with YOLOv5, the precision of the proposed method is also greatly improved. The precision on images affected by medium- and low-level noise reaches 76.5%, 5.7% higher than that of YOLOv5; the precision on images affected by high-level noise is 67.46%, 3.34% lower than that of YOLOv5. Among the 12,000 images in the training set, 1744 are affected by high-level noise and 10,256 by medium- and low-level noise. Given this ratio, N-YOLO improves overall precision and reduces the false-detection rate.
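The ratio argument in the last sentence can be made concrete with a simple image-count-weighted average of the per-subset precisions. This is an illustrative calculation based on the counts and percentages quoted above, not a figure reported in the paper:

```python
def weighted_precision(counts, precisions):
    """Image-count-weighted average of per-subset precision,
    mirroring the 1744 high-noise vs. 10,256 medium/low-noise
    split of the 12,000 training images."""
    total = sum(counts)
    return sum(c * p for c, p in zip(counts, precisions)) / total

# High-level noise subset: 1744 images at 67.46% precision;
# medium/low-level subset: 10,256 images at 76.5% precision.
overall = weighted_precision([1744, 10256], [67.46, 76.5])
```

Because the medium/low-noise subset dominates the dataset, the weighted overall precision lands well above the high-noise subset's 67.46%, which is the sense in which N-YOLO improves precision "according to the ratio of the two".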
Experiments show that using NLC not only improves detection precision but also adds little to the missed-detection rate, thus improving overall detection performance. At the same time, it prevents images affected by different noise levels from interfering with each other during training.
Table 5 shows that the precision of the last two methods improves to varying degrees over the first method, with the method proposed in this paper improving the most. In terms of recall, the first two methods are almost identical and superior to the latter two, with YOLOv5 the best. Because the latter two methods preprocess the images, details of small targets are destroyed, resulting in missed detections. Figure 13 shows the PR curves of the CNN-based methods: the navy blue line is the PR curve of YOLOv5; the light blue line is that of the non-NLC variant; the green and yellow lines are the PR curves of our method on images affected by high-level and medium/low-level noise, respectively; and the red line is the PR curve of the contrast experiment, in which images are first Gaussian-filtered and then sent to YOLOv5 for training.
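As a reference for how such PR curves are produced, the standard precision/recall sweep over confidence-ranked detections can be sketched as follows (a generic evaluation sketch, not the authors' code):

```python
import numpy as np

def precision_recall(scores, labels):
    """Precision/recall pairs swept over confidence thresholds,
    as used to draw a PR curve. `labels` are 1 for true ship
    detections and 0 for false alarms; detections are ranked by
    descending confidence score."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)           # true positives at each cutoff
    fp = np.cumsum(1 - labels)       # false positives at each cutoff
    precision = tp / (tp + fp)
    recall = tp / labels.sum()
    return precision, recall
```

Plotting `precision` against `recall` for each method yields curves of the kind shown in Figure 13; a curve that stays high as recall grows indicates fewer false alarms at a given detection rate.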
The PR curve of non-NLC drops sharply as the recall rate increases, compared with YOLOv5. This may be because the features extracted by non-NLC are insufficient, leading to weak discrimination of ships. Furthermore, the PR curve of non-NLC falls below those of the other methods once the recall rate exceeds about 0.5, while the PR curve of h-level is higher than the others when the recall rate is greater than 0.9. Figure 14 shows the detection results of the different methods in four ship situations: offshore ships affected by medium/low-level noise (first row of Figure 14), offshore ships affected by high-level noise (second row), inshore ships affected by high-level noise (third row), and inshore ships affected by medium/low-level noise (fourth row). The first row of Figure 14 shows that the detection methods perform almost identically in the first situation. Compared with the original method, the detection accuracy of the latter two methods is slightly improved: G-YOLOv5 by 1% and N-YOLO by 2%. In the second situation, G-YOLOv5 matches the original method, while N-YOLO improves by 4%. In the third situation, the detection accuracy of G-YOLOv5 drops to a certain extent, and G-YOLOv5 also produces a false detection; in this image, N-YOLO improves detection accuracy by 7% on average over the original method. In the last situation, G-YOLOv5 not only fails to reduce the noise interference but also blurs the target, so its detection accuracy drops significantly and it misses four detections. For this image, the detection accuracy of N-YOLO is slightly improved compared with the original method.
Among them, the detection accuracy of the ship in the lower left corner increases by 15%. However, although N-YOLO misses no detection, it mistakenly identifies a ship in the lower right corner. Figure 14. Visual detection results of CNN-based methods on offshore ships. The first column shows the detection results of YOLOv5, the second column the results of YOLOv5 after Gaussian-filter preprocessing, and the third column the results of N-YOLO.

Conclusions
This paper introduced a new ship detection method for the maritime environment in SAR imagery, consisting of an NLC module, an STPAE module, and a YOLOv5-based detection module. The NLC module classifies images according to noise level: images affected by high-level noise are sent directly to YOLOv5 for detection, while the rest are sent to the STPAE module. The STPAE module uses CA-CFAR and an expansion operation to extract, expand, and fill the potential target region. In the recognition stage, the extracted potential-target image is first combined with the original image to obtain an image with bright targets and little noise, which is then sent to YOLOv5 for recognition. Compared with sending images directly to a classical detection network (such as YOLOv5), N-YOLO achieves better detection performance. Experiments show that N-YOLO performs well on ship recognition in SAR images. The proposed method reduces the interference of noise and shore clutter in ship identification and has broad application prospects in marine monitoring. N-YOLO still partially damages ship edge information, and future work will focus on better preserving it.
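For readers unfamiliar with the two operations inside STPAE, the following is a minimal sketch of cell-averaging CFAR followed by a binary expansion (dilation). The window sizes, scale factor, and 3x3 structuring element are illustrative choices, not the paper's settings:

```python
import numpy as np

def ca_cfar_2d(img, guard=2, ref=4, scale=3.0):
    """Minimal 2-D cell-averaging CFAR: for each pixel, average the
    reference ring outside a guard window and flag the pixel if it
    exceeds `scale` times that local background mean."""
    img = img.astype(np.float32)
    h, w = img.shape
    half = guard + ref
    out = np.zeros((h, w), dtype=bool)
    for i in range(half, h - half):
        for j in range(half, w - half):
            window = img[i - half:i + half + 1, j - half:j + half + 1].copy()
            # Exclude the guard cells and the cell under test from the mean.
            window[ref:ref + 2 * guard + 1, ref:ref + 2 * guard + 1] = 0.0
            n_ref = (2 * half + 1) ** 2 - (2 * guard + 1) ** 2
            mean_bg = window.sum() / n_ref
            out[i, j] = img[i, j] > scale * mean_bg
    return out

def dilate(mask, iterations=1):
    """Crude binary dilation (the 'expansion operation') with a 3x3
    structuring element, implemented by OR-ing shifted copies."""
    m = mask.copy()
    h, w = mask.shape
    for _ in range(iterations):
        padded = np.pad(m, 1)
        grown = np.zeros_like(m)
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                grown |= padded[1 + di:1 + di + h, 1 + dj:1 + dj + w]
        m = grown
    return m
```

Dilating the CFAR mask grows each detected seed into a connected region, which is how STPAE recovers the complete extent of a potential target rather than isolated bright pixels.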