Author Contributions
Funding acquisition, J.Z.; Investigation, H.G. and S.Z.; Methodology, H.G.; Project administration, J.X. and Z.J.; Resources, Z.J.; Software, Y.Y.; Supervision, J.X.; Visualization, S.Z.; Writing—original draft, Y.Y.; Writing—review & editing, J.Z. All authors have read and agreed to the published version of the manuscript.
Figure 1. The network structure of the proposed method can be divided into four parts: (a) input image, (b) feature pyramid network, (c) feature selection module, (d) multi-task subnets.
Figure 2. The network structure of the feature fusion network. The red dotted line denotes the bottom-up path along which shallow information is transmitted to the high level; the yellow dotted line denotes the new bottom-up path; the remaining legend symbols denote the convolution operation, the double upsampling operation by bilinear interpolation, the convolution operation with a stride of 2, and the convolution operation with a stride of 1, respectively.
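The exact legend symbols of Figure 2 did not survive extraction, so the following is only an illustrative PyTorch sketch of one common way such a bidirectional fusion can be wired: a top-down pass with bilinear upsampling, followed by a new stride-2 bottom-up pass. The channel widths, kernel sizes, and layer counts here are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BidirectionalFPN(nn.Module):
    """Sketch: top-down FPN pass followed by an extra bottom-up pass."""
    def __init__(self, in_channels=(512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convolutions reduce backbone channels to a common width
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        # 3x3 stride-1 convolutions smooth the fused top-down features
        self.smooth = nn.ModuleList([nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                     for _ in in_channels])
        # 3x3 stride-2 convolutions carry shallow detail back up the pyramid
        self.down = nn.ModuleList([nn.Conv2d(out_channels, out_channels, 3, stride=2, padding=1)
                                   for _ in in_channels[:-1]])

    def forward(self, c3, c4, c5):
        # Top-down path: double upsampling (bilinear) plus lateral connection
        p5 = self.lateral[2](c5)
        p4 = self.lateral[1](c4) + F.interpolate(p5, scale_factor=2, mode="bilinear", align_corners=False)
        p3 = self.lateral[0](c3) + F.interpolate(p4, scale_factor=2, mode="bilinear", align_corners=False)
        p3, p4, p5 = (s(p) for s, p in zip(self.smooth, (p3, p4, p5)))
        # New bottom-up path: stride-2 convolutions pass shallow information to higher levels
        n3 = p3
        n4 = p4 + self.down[0](n3)
        n5 = p5 + self.down[1](n4)
        return n3, n4, n5
```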
Figure 3. ResNet50 network structure; the red arrow indicates the path from to .
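For context, the multi-scale backbone features that feed the fusion network of Figure 2 can be taken from a standard torchvision ResNet50 as sketched below. This is an illustrative extraction under assumed settings (random weights, 800 × 800 input), not the paper's exact implementation.

```python
import torch
from torchvision.models import resnet50

backbone = resnet50(weights=None)  # the paper presumably uses pretrained weights

def extract_pyramid(x: torch.Tensor):
    """Return the stride-8/16/32 feature maps (C3, C4, C5) from ResNet50."""
    x = backbone.conv1(x)
    x = backbone.bn1(x)
    x = backbone.relu(x)
    x = backbone.maxpool(x)
    c2 = backbone.layer1(x)   # stride 4,  256 channels
    c3 = backbone.layer2(c2)  # stride 8,  512 channels
    c4 = backbone.layer3(c3)  # stride 16, 1024 channels
    c5 = backbone.layer4(c4)  # stride 32, 2048 channels
    return c3, c4, c5

c3, c4, c5 = extract_pyramid(torch.randn(1, 3, 800, 800))
print(c3.shape, c4.shape, c5.shape)  # (1, 512, 100, 100), (1, 1024, 50, 50), (1, 2048, 25, 25)
```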
Figure 4. Multi-feature selection of multi-scale feature maps.
Figure 5. Detailed information on the multi-feature selection module. CNNs: four convolution layers; ⊙: Hadamard product; ⊕: matrix addition.
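As an illustration of the operations named in the Figure 5 caption, the minimal PyTorch sketch below stacks four convolutions to produce an attention map, re-weights the input feature with a Hadamard product, and fuses the result back by element-wise addition. The channel width, activations, and sigmoid gating are assumptions rather than the paper's exact design.

```python
import torch.nn as nn

class MultiFeatureSelection(nn.Module):
    """Sketch of the Figure 5 idea: out = F + (A ⊙ F), with A from four conv layers."""
    def __init__(self, channels=256):
        super().__init__()
        self.cnns = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid(),  # attention map in [0, 1]
        )

    def forward(self, feat):
        attn = self.cnns(feat)      # CNNs: four convolution layers
        return feat + attn * feat   # ⊕ addition of F and (A ⊙ F)
```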
Figure 6. Visualization results of the multi-scale feature maps. From top to bottom: the multi-scale feature maps, the multi-scale feature maps used for the classification task, and the multi-scale feature maps used for the regression task.
Figure 7. Regression inaccuracy of the five-parameter method, with RetinaNet as the base model. The cars and ships in the red boxes are not accurately detected, and the angles of the predicted boxes differ from those of the ground truth.
Figure 8. Visual detection results of some typical objects based on the proposed classification method.
Figure 9. Detection results on the DOTA dataset (first row: the proposed method; second row: the RetinaNet method).
Figure 10. Detection results for ships under cloud and fog interference on images from the DOTA-GF dataset (left: RetinaNet; right: the proposed method).
Figure 11. Comparison of detection results for small ships on DOTA-GF dataset images (left: RetinaNet; right: the proposed method).
Figure 12. Comparison of detection results for large-scale targets on DOTA-GF dataset images (left: RetinaNet; right: the proposed method).
Table 1. Experimental results of the bidirectional multi-scale feature fusion network.
| Method | PL | SH | BG | SV | LV | ST | mAP (%) |
|---|---|---|---|---|---|---|---|
| FPN | 83.4 | 62.2 | 32.3 | 65.7 | 48.3 | 74.9 | 61.1 |
| our-FPN | | | | | | | |

All per-category values are AP (%).
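For reference, the mAP column is the arithmetic mean of the six per-category APs; for the FPN baseline row of Table 1 this works out to

$$\text{mAP} = \frac{83.4 + 62.2 + 32.3 + 65.7 + 48.3 + 74.9}{6} = \frac{366.8}{6} \approx 61.1\%.$$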
Table 2. Experimental results of different attention mechanisms.
| Method | PL | SH | BG | SV | LV | ST | mAP (%) |
|---|---|---|---|---|---|---|---|
| Baseline | 83.4 | 62.2 | 32.3 | 65.7 | 48.3 | 74.9 | 61.1 |
| SE | 83.6 | 64.3 | 33.4 | 66.1 | | 74.1 | 61.9 |
| CBAM | 84.4 | | | 67.0 | 49.1 | 75.2 | 62.3 |
| MFSM | | 63.4 | 33.6 | | 49.5 | | |

All per-category values are AP (%).
Table 3. Experimental results of RetinaNet with regression-based and classification-based angle prediction.
| Method | PL | SH | BG | SV | LV | ST | mAP (%) |
|---|---|---|---|---|---|---|---|
| Regression | 83.4 | 62.2 | 32.3 | 65.7 | 48.3 | 74.9 | 61.1 |
| Classification | | | | | | | |

All per-category values are AP (%).
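For intuition on the classification-based angle prediction compared in Table 3, the sketch below discretizes a continuous box angle into angle bins and decodes a predicted bin back to an angle. The bin width and angle range used here are assumptions, not necessarily the settings adopted in the paper.

```python
OMEGA = 1.0          # assumed bin width in degrees
NUM_BINS = 180       # assumed angle range of [0, 180) degrees

def angle_to_class(theta_deg: float) -> int:
    """Discretize a continuous box angle into a classification label."""
    return int(theta_deg % 180.0 // OMEGA)

def class_to_angle(label: int) -> float:
    """Recover a representative angle (bin center) from the predicted class."""
    return (label + 0.5) * OMEGA

# Example: an oriented box at 37.6 degrees falls into bin 37 and decodes to 37.5 degrees.
print(angle_to_class(37.6), class_to_angle(angle_to_class(37.6)))
```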
Table 4. Comparison results of different algorithms on the DOTA dataset.
| Category | CSL | RRPN | RetinaNet | Xiao | Proposed |
|---|---|---|---|---|---|
| PL | 84.2 | 83.9 | 83.4 | 78 | |
| SH | 64.9 | 47.2 | 62.2 | 65 | |
| BG | 34.5 | 32.3 | 32.3 | 38 | |
| LV | 51.5 | 49.7 | 48.3 | 59 | 54.2 |
| SV | 67.6 | 34.7 | 65.7 | 37 | |
| ST | 75.8 | 48.8 | 74.9 | 50 | |
| mAP (%) | 63.1 | 48.0 | 61.1 | 55 | |
Table 5. Comparison results of different algorithms on the DOTA-GF dataset.
| Category | CSL | RRPN | RetinaNet | R3Det | Proposed |
|---|---|---|---|---|---|
| PL | 83.6 | 81.7 | 83.2 | | 84.6 |
| SH | 64.1 | 46.8 | 61.0 | 66.1 | |
| BG | 35.3 | 34.8 | 32.5 | 35.5 | |
| LV | 50.4 | 48.2 | 50.2 | | 53.8 |
| SV | 64.7 | 33.8 | 64.5 | 59.8 | |
| ST | 72.9 | 48.6 | 72.7 | 70.5 | |
| mAP (%) | 56.5 | 49.0 | 60.7 | 63.1 | |
Table 6. Comparison with different methods on the HRSC2016 dataset.
| Method | Size | mAP (%) |
|---|---|---|
| R2CNN | 800 × 800 | 73.7 |
| RRPN | 800 × 800 | 79.1 |
| RetinaNet | 800 × 800 | 81.7 |
| RoI Transformer | 512 × 800 | 86.2 |
| Proposed | 800 × 800 | |