To evaluate the performance of the proposed model, we compare it with other models. This section describes the datasets used in the evaluation, the evaluation criteria, the comparison methods, and the comparison results.
4.4. Comparative Experiment
To test the effectiveness of TWC-Net, we used RetinaNet, YOLO, SSD, and Faster RCNN as comparison methods. TWC-Net is a one-stage detector; among the comparison methods, RetinaNet, YOLO, and SSD are also one-stage detectors, while Faster RCNN is a two-stage detector. To verify the validity of the model, we first tested it on common (non-SAR) images.
We used the VOC2007 dataset to validate the model’s detection ability on common datasets, as shown in
Table 1. Our model is designed primarily for small-target detection in remote sensing images, so strong scores on this dataset are not our primary goal.
RetinaNet is a detector based on focal loss, which mainly addresses the problem of sample imbalance. RetinaNet uses ResNet50 (Res50) + FPN as the backbone, and its classification subnetwork uses focal loss as the loss function, which effectively alleviates sample imbalance and encourages the model to mine hard examples.
YOLO is a one-stage object detection network that formulates object detection as a regression problem: a single pass over the input image outputs object locations together with the corresponding categories and confidence scores. Through continuous development and improvement, YOLO has several versions, of which YOLOv4 [
31] has better overall performance. Therefore, YOLOv4 was also used as a comparison model in this experiment.
SSD runs faster than many other models. SSD uses feature-pyramid detection to predict targets on feature maps with different receptive fields. SSD variants use different backbones, mainly VGG19 [
32] and ResNet50; the ResNet50 variant generally performs better. The SSD used in this experiment was therefore SSD + ResNet50.
Faster RCNN is a two-stage detector proposed by Ren et al. and an upgrade of the earlier RCNN-series detectors. Faster RCNN combines feature extraction, proposal generation, and bounding-box regression and classification in one network to improve overall performance. Faster RCNN also supports several backbones, including ResNet50 with FPN; this experiment used Faster RCNN + ResNet50 + FPN.
We compare TWC-Net with RetinaNet, YOLOv4, SSD, and Faster RCNN both quantitatively and visually.
Table 2 and
Table 3 show the precision, recall, and F-measure for all methods, and
Figure 7 shows the visual detection results for different methods on the test sets.
In terms of recall, TWC-Net scored highest at 95.29%, followed by SSD at 95.21%. In terms of precision, RetinaNet achieved the highest score of 92.11%, followed by YOLOv4 at 91.90%; the precision of TWC-Net was 91.44%, close to that of YOLOv4. In terms of F-measure, TWC-Net achieved the highest score of 93.32%, followed by SSD at 92.01%, RetinaNet at 86.36%, and Faster RCNN at 83.80%. In short, TWC-Net and SSD outperform the other models, but SSD has lower precision than TWC-Net.
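The three metrics are related by the standard definitions: precision = TP/(TP + FP), recall = TP/(TP + FN), and the F-measure is their harmonic mean. A minimal sketch (our own illustration, not the paper’s evaluation code; the TP/FP/FN counts below are hypothetical):

```python
# Illustrative only: precision, recall, and F-measure from detection counts.
# TP = true positives, FP = false positives, FN = missed ground-truth targets.

def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical counts for a single test run:
p, r, f = precision_recall_f(tp=905, fp=85, fn=45)
print(f"precision={p:.4f} recall={r:.4f} F-measure={f:.4f}")
```

The F-measure rewards models that balance the two scores, which is why a detector with high recall but mediocre precision can still rank below one with both scores high.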
Figure 7 shows the visual detection results of the different detection methods. The other methods exhibit some degree of missed detection for small targets or in the complex backgrounds near the shore, which are the difficult cases in SAR ship detection. Small-target information is easily submerged during successive feature extraction, and the high noise of SAR images makes small-target detection harder still. Nearshore targets are larger, but the image features around them are more complex, so detectors are prone to false and missed detections due to interference from irrelevant nearshore information. In the figure, RetinaNet missed inshore ship targets, while SSD performed poorly on small targets and missed small ship targets in open sea areas. Taken together, the detection results show that TWC-Net has better overall detection performance.
For a more comprehensive comparison of the detection performance of each model, we set the IOU threshold to 0.75 and re-evaluated the models. An IOU threshold of 0.75 requires a larger overlap between the detection box and the ground-truth box, so the model must localize each detection more accurately. From
Table 3, we can see that TWC-Net achieved the highest scores in recall and F-measure while remaining competitive in precision; RetinaNet and SSD came next. By comparison, TWC-Net localizes targets more accurately and can more effectively locate ship targets in SAR images.
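The intersection-over-union test underlying this stricter evaluation can be sketched as follows (an assumed minimal implementation for axis-aligned boxes, not the paper’s code):

```python
# Illustrative only: IOU of two axis-aligned boxes given as (x1, y1, x2, y2)
# corner coordinates.

def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (zero area if the boxes do not intersect).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# At the stricter threshold, a detection counts as a true positive only if
# iou(detection, ground_truth) >= 0.75.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # half-width overlap: 50/150
```

Raising the threshold from 0.5 to 0.75 discards loosely fitting boxes that previously counted as hits, which is why precision and recall both drop for weakly localizing detectors.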
In practical application environments, the complexity and memory footprint of a model matter. Highly complex models require more demanding operating conditions and are more prone to over-fitting, so models with a smaller memory footprint and lower complexity are advantageous. For a more comprehensive assessment, we also compared model size, FLOPs, and parameter counts. Model size is the memory occupied by the model after training. FLOPs count floating-point operations and measure the computational complexity of the model. The parameter count is the number of variables needed to define the model and is another measure of its complexity.
Table 4 and
Table 5 show the model size, FLOPs, and parameters for all methods.
Table 4 compares TWC-Net’s backbone with other mainstream backbones. By comparison, TWC-Net’s two-way convolution module has fewer FLOPs and parameters and a smaller model size. The backbone of TWC-Net therefore has lower complexity and is suitable for lightweight, lower-performance devices.
TWC-Net has the smallest memory footprint in terms of model size, followed by SSD. For FLOPs, TWC-Net has the lowest model complexity, followed by RetinaNet. SSD performed best in parameter count, followed by TWC-Net. Because Faster RCNN is a two-stage detector with more modules in its structure, it performs worst in memory footprint and model complexity. Overall, TWC-Net performs better in terms of model complexity.
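To make the FLOPs and parameter figures concrete, note that convolution layers dominate both quantities in these backbones. A back-of-the-envelope sketch (our own illustration with hypothetical layer sizes, not the measurement tool used in the paper):

```python
# Illustrative only: parameter and FLOP counts of a single k x k 2-D
# convolution layer, the dominant cost in CNN backbones.

def conv2d_cost(c_in, c_out, k, h_out, w_out, bias=True):
    """Return (parameters, FLOPs) of a k x k convolution.

    FLOPs counts one multiply and one add per weight per output element.
    """
    params = c_out * (c_in * k * k + (1 if bias else 0))
    flops = 2 * c_in * k * k * c_out * h_out * w_out
    return params, flops

# Hypothetical layer: 3x3 conv from 64 to 128 channels on a 56x56 output map.
params, flops = conv2d_cost(64, 128, 3, 56, 56)
print(params, flops)
```

The formula shows why reducing channel counts or kernel sizes in a backbone, as a lightweight module aims to do, shrinks FLOPs and parameters multiplicatively rather than additively.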
To assess precision and recall jointly, we used precision–recall curves to compare the different models. A precision–recall curve shows a model’s overall recall and precision, with precision on the vertical axis and recall on the horizontal axis; the curve is traced by ranking detections by confidence and computing precision and recall at each threshold. As shown in
Figure 8, the precision–recall curves show that RetinaNet and TWC-Net perform better.
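The threshold sweep described above can be sketched as follows (an assumed illustration with hypothetical detections, not the authors’ plotting code):

```python
# Illustrative only: trace a precision-recall curve by sorting detections by
# confidence and accumulating true/false positive counts.

def pr_curve(detections, num_gt):
    """detections: list of (confidence, is_true_positive) pairs.
    num_gt: number of ground-truth targets.
    Returns (recalls, precisions) as the threshold sweeps downward."""
    recalls, precisions = [], []
    tp = fp = 0
    for _, is_tp in sorted(detections, key=lambda d: -d[0]):
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    return recalls, precisions

# Hypothetical scored detections against 4 ground-truth ships:
dets = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.5, False)]
r, p = pr_curve(dets, num_gt=4)
print(list(zip(r, p)))
```

A curve that stays near the top-right corner, i.e. high precision maintained as recall grows, indicates the stronger detector, which is the behavior the figure shows for RetinaNet and TWC-Net.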
4.5. Generating Heatmap
Generation of a heatmap can be used to interpret models. Gradient-weighted class activation mapping (Grad-CAM) [
34] uses the gradients of the target concept flowing into the final convolutional layer to produce a coarse localization map that highlights the image regions used for the prediction. Grad-CAM overcomes the drawback of requiring a global average pooling (GAP) layer in the class activation mapping (CAM) [
35] network architecture and achieves visualization without modifying the network structure. In this experiment, Grad-CAM was used to generate heatmaps to determine the effect of different image regions on the output. The heatmaps generated by TWC-Net using Grad-CAM are shown in
Figure 9.
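Numerically, Grad-CAM reduces to a gradient-weighted sum over the channels of the last convolutional layer, followed by a ReLU. A minimal pure-Python sketch with toy 2×2 maps (our own illustration, not the paper’s implementation):

```python
# Illustrative only: Grad-CAM heatmap from activation maps and gradients.
# Each channel weight is the spatial mean of that channel's gradient map;
# the heatmap is the ReLU of the weighted channel sum.

def grad_cam(activations, gradients):
    """activations, gradients: lists of K HxW maps (nested lists).
    Returns the HxW Grad-CAM heatmap."""
    h, w = len(activations[0]), len(activations[0][0])
    # Channel weights: global average of each gradient map.
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    cam = [[0.0] * w for _ in range(h)]
    for a_k, w_k in zip(activations, weights):
        for i in range(h):
            for j in range(w):
                cam[i][j] += w_k * a_k[i][j]
    # ReLU: keep only regions with a positive influence on the class score.
    return [[max(0.0, v) for v in row] for row in cam]

# Toy example: two 2x2 channels, one with positive and one with negative
# gradients for the target class.
acts = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 1.0], [1.0, 0.0]]]
grads = [[[0.4, 0.4], [0.4, 0.4]], [[-0.2, -0.2], [-0.2, -0.2]]]
print(grad_cam(acts, grads))
```

In practice the activations and gradients come from a forward and backward pass through the trained network; only positively contributing regions survive the ReLU, which is what the bright areas of the heatmaps indicate.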
From
Figure 9, it can be observed how TWC-Net responds on each output feature map for both open-sea images and nearshore images. Open-sea images have simple backgrounds and small targets; by design, TWC-Net uses high-resolution shallow features to detect them. Columns 2, 3, and 4 of
Figure 9 show that the shallow output features of TWC-Net respond effectively to small targets. Nearshore images have complex backgrounds and more interfering information, so the network must extract deeper features for detection, and TWC-Net outputs deep low-resolution features for this purpose. As shown in columns 5 and 6 of
Figure 9, it is easier for the network to learn and respond to the characteristics of larger targets from deep features. Overall, the heatmap visualization shows that TWC-Net can effectively use its multiscale structure to learn ship characteristics and detect ship targets at different scales.
4.6. Generalized Performance Test
Generalization ability evaluates a model’s adaptability to fresh samples and is of great significance in practical application scenarios. In this experiment, we used other SAR ship datasets to test the generalization performance of TWC-Net, mainly HRSID [
36] and SAR-Ship-Dataset [
37]. HRSID contains 5604 high-resolution SAR images and 16,951 ship targets, covering SAR images with different resolutions (0.5, 1, and 3 m), polarizations, and marine environments. The SAR-Ship-Dataset contains 43,819 SAR images, constructed from 102 Gaofen-3 and 108 Sentinel-1 images and covering multiple imaging modes with resolutions of 3, 5, 8, 10, and 25 m; the SAR images that make up the sample library are therefore multi-source and multi-mode. Three images each from HRSID and the SAR-Ship-Dataset were selected for detection, as shown in
Figure 10.
Figure 10 shows that TWC-Net performs well on small-target detection and can detect most vessels effectively. The SAR-Ship-Dataset contains some images with strong noise interference, and few such images appeared during training; nevertheless, TWC-Net can still detect some high-noise targets in the SAR-Ship-Dataset.
To quantitatively assess generalization capability, we used 10,955 images from the SAR-Ship-Dataset as a test set. The models compared were TWC-Net, RetinaNet, SSD, and Faster RCNN, and the metrics were recall, precision, and F-measure. The test results are shown in
Table 6.
Table 6 shows that TWC-Net achieved the best recall, followed by SSD; RetinaNet achieved the best precision, followed by SSD; and TWC-Net achieved the best F-measure, again followed by SSD. Overall, TWC-Net has better generalization performance.