Figure 1.
The overall architecture of the proposed MAF-Net. A dedicated small-object detection layer is appended to the YOLOv7 head at the low-level feature map stage, and a coordinate attention mechanism is integrated into the detection head. The detection anchors are clustered using the KMeans++ algorithm to optimize small-object detection performance.
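The caption names k-means++ anchor clustering without giving the configuration, so the following is a minimal sketch of that step under stated assumptions: ground-truth (width, height) pairs are clustered with scikit-learn's k-means++ initialization, and the 12 anchors (three for each of the four detection layers), the plain Euclidean distance, and the helper name `cluster_anchors` are illustrative choices, not the paper's confirmed settings.

```python
# Hedged sketch of anchor clustering with k-means++; the anchor count,
# distance metric, and helper name are assumptions for illustration.
import numpy as np
from sklearn.cluster import KMeans

def cluster_anchors(box_wh: np.ndarray, n_anchors: int = 12) -> np.ndarray:
    """Cluster ground-truth (width, height) pairs into anchor box sizes."""
    km = KMeans(n_clusters=n_anchors, init="k-means++", n_init=10, random_state=0)
    km.fit(box_wh)
    centers = km.cluster_centers_
    # Sort anchors by area so they can be assigned from the finest
    # (160 x 160) to the coarsest (20 x 20) detection layer.
    return centers[np.argsort(centers.prod(axis=1))]
```

In practice, `box_wh` would be collected from the training annotations (e.g., the AI-TOD labels); variants that cluster on a 1 − IoU distance instead of Euclidean distance are also common for anchor design.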
Figure 2.
Block diagram of the MAF-Net architecture.
Figure 3.
Schematic diagram of the receptive field.
Figure 4.
Structure diagram of the hybrid-attention encoder channel attention module.
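As a rough companion to Figure 4, the sketch below shows one plausible channel-attention branch: global average, max, and variance pooling feed a shared MLP, mirroring the "With Variance Pooling" variant named in the ablation tables (Tables 8–10). The reduction ratio of 16 and the summed fusion of the three pooled responses are assumptions, not confirmed details of the paper's module.

```python
# Hedged PyTorch sketch of a channel-attention branch with average, max,
# and variance pooling; ratio and fusion scheme are illustrative assumptions.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(  # shared MLP applied to every pooled vector
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        flat = x.flatten(2)            # (B, C, H*W)
        avg = flat.mean(dim=2)         # global average pooling
        mx = flat.amax(dim=2)          # global max pooling
        var = flat.var(dim=2)          # global variance pooling
        attn = torch.sigmoid(self.mlp(avg) + self.mlp(mx) + self.mlp(var))
        return x * attn.view(b, c, 1, 1)  # channel-wise reweighting
```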
Figure 5.
Structure diagram of the hybrid-attention encoder spatial attention module.
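Likewise, a minimal sketch of the spatial-attention branch in Figure 5, following the common pattern of channel-wise average and max maps fused by a large-kernel convolution; the 7 × 7 kernel size is an assumption.

```python
# Hedged PyTorch sketch of a spatial-attention branch; kernel size assumed.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        # Two input channels: the stacked channel-wise average and max maps.
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)  # (B, 1, H, W) average over channels
        mx = x.amax(dim=1, keepdim=True)   # (B, 1, H, W) max over channels
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn                    # position-wise reweighting
```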
Figure 6.
Model structure of Hybrid-Attention Encoder.
Figure 7.
Model structure of attention-guided decoder.
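Figure 1 mentions a coordinate attention mechanism, and Tables 8–10 single out an AGD variant "With Coordinate Decoupling". The sketch below shows a standard coordinate-attention block that decouples attention into separate height-wise and width-wise components as one plausible reading; the decoder's actual wiring and the reduction ratio are assumptions.

```python
# Hedged sketch of coordinate attention (directional pooling along H and W);
# this follows the standard layout, not necessarily MAF-Net's exact decoder.
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        mid = max(8, channels // reduction)
        self.shared = nn.Sequential(
            nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True)
        )
        self.to_h = nn.Conv2d(mid, channels, 1)
        self.to_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        pool_h = x.mean(dim=3, keepdim=True)      # (B, C, H, 1): pool along width
        pool_w = x.mean(dim=2, keepdim=True)      # (B, C, 1, W): pool along height
        y = self.shared(torch.cat([pool_h, pool_w.transpose(2, 3)], dim=2))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.to_h(y_h))                  # (B, C, H, 1)
        a_w = torch.sigmoid(self.to_w(y_w.transpose(2, 3)))  # (B, C, 1, W)
        return x * a_h * a_w   # position-aware reweighting
```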
Figure 8.
(a) Target size distribution in the AI-TOD dataset: small targets account for 97.96% of instances, medium targets for 2.04%, and large targets are absent. (b) Quantitative distribution of object categories in the AI-TOD dataset.
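Size distributions like those in Figures 8–10 can be recomputed from the annotation files in a few lines of NumPy. The captions' pixel thresholds did not survive extraction, so the COCO-style cutoffs of 32 and 96 pixels below are assumptions to be replaced with the paper's actual values.

```python
# Hedged sketch for reproducing the size-class percentages; the 32/96-pixel
# thresholds are COCO-style assumptions, not the paper's confirmed cutoffs.
import numpy as np

def size_distribution(box_wh: np.ndarray, small: float = 32, large: float = 96) -> dict:
    """Percentage of small/medium/large boxes by equivalent square side."""
    side = np.sqrt(box_wh[:, 0] * box_wh[:, 1])  # sqrt of box area
    return {
        "small": 100.0 * np.mean(side < small),
        "medium": 100.0 * np.mean((side >= small) & (side < large)),
        "large": 100.0 * np.mean(side >= large),
    }
```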
Figure 9.
(a) Target size distribution in the DOTA dataset: small targets account for 35.75% of instances, medium targets for 53.49%, and large targets for 10.77%. (b) Quantitative distribution of object categories in the DOTA dataset.
Figure 10.
(a) Target size distribution in the RSOD dataset: small targets account for 13.30% of instances, medium targets for 61.30%, and large targets for 25.40%. (b) Quantitative distribution of object categories in the RSOD dataset.
Figure 11.
Detection results when the feature maps of the three YOLOv7 detection heads are (a) 80 × 80, (b) 40 × 40, and (c) 20 × 20, respectively. Only the highest-resolution detection head detects small targets effectively, while the other two struggle to detect them.
Figure 12.
Each row of four images, (a–d), (e–h), and (i–l), shows the attention heatmaps of one detected target under the four detection layers. Columns (a,e,i), (b,f,j), (c,g,k), and (d,h,l) correspond to the 20 × 20, 40 × 40, 80 × 80, and 160 × 160 feature maps, respectively.
Figure 13.
The left image (a) shows the detection results with the previous anchor box configuration, where no targets were detected; the right image (b) shows the results after the anchor box sizes were adjusted. Optimizing the anchor box sizes significantly improves detection accuracy.
Figure 14.
The left image (a) shows the detection results before applying the Dual-path Attention, and the right image (b) the results after applying it: 205 vehicles are detected in (a) versus 300 in (b). This demonstrates that the Dual-path Attention improves the detection of small targets in regions where they are densely packed.
Figure 15.
The left heatmap (a) is produced before the Dual-path Attention, and the right heatmap (b) after it. The Dual-path Attention visibly suppresses attention on non-car regions and strengthens attention on the target car.
Figure 16.
Visualization of attention heatmaps. (a) Original image. (b) Heatmap produced by YOLOv7, exhibiting widespread background activation (attention noise). (c) Heatmap produced by the proposed model, where responses are concentrated on the target regions and background activations are suppressed.
Figure 17.
The DJI Mavic Mini 2 unmanned aerial vehicle (UAV) employed as the physical experimental platform for the field tests.
Figure 18.
The mobile control, data reception, and image processing terminal.
Figure 19.
Experimental results in real-world environments.
Figure 20.
Sample images of the IGPWG dataset under diverse environmental conditions.
Figure 21.
Sample images of the dataset under different altitude conditions. (a) High-altitude aircraft UAV image; (b) Medium-altitude aircraft UAV image; (c) Low-altitude aircraft UAV image; (d) High-altitude helicopter UAV image; (e) Medium-altitude helicopter UAV image; (f) Low-altitude helicopter UAV image.
Figure 22.
Sample images under different illumination conditions.
Figure 23.
Sample images of the Water dataset under different blur conditions.
Figure 24.
Sample images of the Indoor dataset under different blur conditions.
Table 1.
Given an input image size of 640 × 640 (implied by the 20 × 20 feature map at 32× downsampling), the parameters of the different detection heads.
| Feature Map Size | Downsampling Multiple | Receptive Field |
|---|---|---|
| 20 × 20 | 32 times downsampling | 32 × 32 |
| 40 × 40 | 16 times downsampling | 16 × 16 |
| 80 × 80 | 8 times downsampling | 8 × 8 |
| 160 × 160 | 4 times downsampling | 4 × 4 |
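A quick sanity check of Table 1: each detection stride s maps the 640 × 640 input to a (640/s) × (640/s) feature map in which every cell corresponds to an s × s patch of the input, the per-cell footprint the table lists as the receptive field.

```python
# Reproduce Table 1 for a 640 x 640 input (the size implied by the
# 20 x 20 feature map at 32x downsampling).
for stride in (4, 8, 16, 32):
    fmap = 640 // stride
    print(f"stride {stride:>2}: feature map {fmap} x {fmap}, cell footprint {stride} x {stride}")
```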
Table 2.
Number of different target types in the AI-TOD dataset.
| Object Class | Train Number | Test Number | Proportion (%) |
|---|---|---|---|
| person | 14127 | 3841 | 5.00 |
| vehicle | 248077 | 59915 | 87.78 |
| ship | 13541 | 3791 | 4.79 |
| airplane | 623 | 170 | 0.22 |
| storage-tank | 5278 | 2479 | 1.87 |
| bridge | 512 | 140 | 0.18 |
| wind-mill | 176 | 67 | 0.06 |
| pool | 293 | 34 | 0.10 |
Table 3.
Number of different target types in the DOTA dataset.
| Object Class | Train Number | Test Number | Proportion (%) |
|---|---|---|---|
| plane | 8055 | 2531 | 8.77 |
| large-vehicle | 16969 | 4387 | 15.20 |
| small-vehicle | 26126 | 5438 | 18.85 |
| ship | 28068 | 8960 | 31.05 |
| harbor | 5983 | 2090 | 7.24 |
| ground-track-field | 325 | 144 | 0.50 |
| soccer-ball-field | 326 | 153 | 0.53 |
| tennis-court | 2367 | 760 | 2.63 |
| baseball-diamond | 415 | 214 | 0.74 |
| swimming-pool | 1736 | 440 | 1.52 |
| roundabout | 399 | 179 | 0.62 |
| basketball-court | 515 | 132 | 0.46 |
| storage-tank | 5029 | 2888 | 10.01 |
| bridge | 2047 | 464 | 1.61 |
| helicopter | 630 | 73 | 0.25 |
Table 4.
Number of different target types in the RSOD dataset.
| Object Class | Image Number | Entity Number | Proportion (%) |
|---|---|---|---|
| aircraft | 446 | 4993 | 71.84 |
| oiltank | 189 | 191 | 2.75 |
| overpass | 176 | 180 | 2.59 |
| playground | 165 | 1586 | 22.82 |
Table 5.
Class-wise object detection results of MAF-Net on the AI-TOD dataset.
| Class | Images | Labels | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|---|---|
| all | 2804 | 70437 | 0.667 | 0.553 | 0.558 | 0.245 |
| person | 2804 | 3841 | 0.743 | 0.264 | 0.356 | 0.118 |
| vehicle | 2804 | 59915 | 0.766 | 0.745 | 0.754 | 0.315 |
| ship | 2804 | 3791 | 0.779 | 0.667 | 0.722 | 0.348 |
| airplane | 2804 | 170 | 0.802 | 0.794 | 0.826 | 0.396 |
| storage-tank | 2804 | 2479 | 0.831 | 0.83 | 0.864 | 0.475 |
| bridge | 2804 | 140 | 0.683 | 0.462 | 0.52 | 0.204 |
| wind-mill | 2804 | 67 | 0.3 | 0.284 | 0.183 | 0.0394 |
| pool | 2804 | 34 | 0.432 | 0.382 | 0.242 | 0.064 |
Table 6.
Class-wise object detection results of MAF-Net on the DOTA dataset.
| Class | Images | Labels | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|---|---|
| all | 458 | 28853 | 0.718 | 0.474 | 0.49 | 0.28 |
| plane | 458 | 2531 | 0.812 | 0.727 | 0.739 | 0.461 |
| large-vehicle | 458 | 4387 | 0.718 | 0.775 | 0.764 | 0.499 |
| small-vehicle | 458 | 5438 | 0.579 | 0.6 | 0.569 | 0.309 |
| ship | 458 | 8960 | 0.799 | 0.584 | 0.583 | 0.312 |
| harbor | 458 | 2090 | 0.683 | 0.755 | 0.713 | 0.319 |
| ground-track-field | 458 | 144 | 0.774 | 0.309 | 0.396 | 0.169 |
| soccer-ball-field | 458 | 153 | 0.601 | 0.333 | 0.298 | 0.171 |
| tennis-court | 458 | 760 | 0.839 | 0.909 | 0.921 | 0.773 |
| baseball-diamond | 458 | 214 | 0.763 | 0.491 | 0.578 | 0.308 |
| swimming-pool | 458 | 440 | 0.646 | 0.55 | 0.487 | 0.188 |
| roundabout | 458 | 179 | 0.685 | 0.0615 | 0.103 | 0.0366 |
| basketball-court | 458 | 132 | 0.586 | 0.402 | 0.408 | 0.289 |
| storage-tank | 458 | 2888 | 0.68 | 0.346 | 0.357 | 0.174 |
| bridge | 458 | 464 | 0.602 | 0.235 | 0.257 | 0.0748 |
| helicopter | 458 | 73 | 1 | 0.0323 | 0.182 | 0.123 |
Table 7.
Class-wise object detection results of MAF-Net on the RSOD dataset.
| Class | Images | Labels | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|---|---|
| all | 253 | 777 | 0.347 | 0.955 | 0.349 | 0.238 |
| aircraft | 253 | 546 | 0.33 | 0.973 | 0.337 | 0.224 |
| oiltank | 253 | 197 | 0.372 | 0.954 | 0.373 | 0.306 |
| overpass | 253 | 19 | 0.347 | 0.895 | 0.321 | 0.14 |
| playground | 253 | 15 | 0.338 | 1 | 0.364 | 0.28 |
Table 8.
Fine-grained ablation experiment results on the AI-TOD dataset.
| Algorithm | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|
| YOLOv7 (Baseline) | 0.664 | 0.243 | 0.256 | 0.104 |
| YOLOv7 + HAE (Average/Max Pooling Only) | 0.612 | 0.357 | 0.328 | 0.142 |
| YOLOv7 + HAE (With Variance Pooling) | 0.635 | 0.402 | 0.360 | 0.165 |
| YOLOv7 + AGD (Without Coordinate Decoupling) | 0.587 | 0.396 | 0.341 | 0.151 |
| YOLOv7 + AGD (With Coordinate Decoupling) | 0.603 | 0.448 | 0.385 | 0.179 |
| YOLOv7 + HAE + AGD (Dual-path Attention) | 0.648 | 0.501 | 0.433 | 0.208 |
| YOLOv7 + 160 × 160 Detection Layer | 0.537 | 0.401 | 0.420 | 0.181 |
| YOLOv7 + Density-adaptive Anchor | 0.628 | 0.327 | 0.301 | 0.132 |
| YOLOv7 + Hierarchical Feature Aggregation | 0.641 | 0.315 | 0.304 | 0.135 |
| YOLOv7 + Joint Optimization of Three Components | 0.605 | 0.473 | 0.489 | 0.215 |
| MAF-Net (Complete Model) | 0.667 | 0.553 | 0.558 | 0.245 |
Table 9.
Fine-grained ablation experiment results on the DOTA dataset.
| Algorithm | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|
| YOLOv7 (Baseline) | 0.413 | 0.465 | 0.260 | 0.156 |
| YOLOv7 + HAE (Average/Max Pooling Only) | 0.389 | 0.521 | 0.315 | 0.173 |
| YOLOv7 + HAE (With Variance Pooling) | 0.407 | 0.558 | 0.343 | 0.189 |
| YOLOv7 + AGD (Without Coordinate Decoupling) | 0.376 | 0.513 | 0.302 | 0.168 |
| YOLOv7 + AGD (With Coordinate Decoupling) | 0.398 | 0.572 | 0.338 | 0.185 |
| YOLOv7 + HAE + AGD (Dual-path Attention) | 0.425 | 0.614 | 0.379 | 0.207 |
| YOLOv7 + 160 × 160 Detection Layer | 0.412 | 0.513 | 0.279 | 0.166 |
| YOLOv7 + Density-adaptive Anchor | 0.435 | 0.498 | 0.297 | 0.169 |
| YOLOv7 + Hierarchical Feature Aggregation | 0.429 | 0.487 | 0.291 | 0.167 |
| YOLOv7 + Joint Optimization of Three Components | 0.468 | 0.572 | 0.336 | 0.189 |
| MAF-Net (Complete Model) | 0.718 | 0.474 | 0.490 | 0.280 |
Table 10.
Fine-grained ablation experiment results on the RSOD dataset.
| Algorithm | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|
| YOLOv7 (Baseline) | 0.056 | 0.142 | 0.027 | 0.017 |
| YOLOv7 + HAE (Average/Max Pooling Only) | 0.189 | 0.423 | 0.156 | 0.098 |
| YOLOv7 + HAE (With Variance Pooling) | 0.214 | 0.487 | 0.182 | 0.117 |
| YOLOv7 + AGD (Without Coordinate Decoupling) | 0.197 | 0.456 | 0.163 | 0.102 |
| YOLOv7 + AGD (With Coordinate Decoupling) | 0.226 | 0.532 | 0.195 | 0.124 |
| YOLOv7 + HAE + AGD (Dual-path Attention) | 0.253 | 0.601 | 0.227 | 0.143 |
| YOLOv7 + 160 × 160 Detection Layer | 0.223 | 0.612 | 0.208 | 0.131 |
| YOLOv7 + Density-adaptive Anchor | 0.164 | 0.385 | 0.134 | 0.089 |
| YOLOv7 + Hierarchical Feature Aggregation | 0.172 | 0.398 | 0.141 | 0.093 |
| YOLOv7 + Joint Optimization of Three Components | 0.276 | 0.712 | 0.264 | 0.162 |
| MAF-Net (Complete Model) | 0.347 | 0.955 | 0.349 | 0.238 |
Table 11.
Comparison of object detection results among different models on the AI-TOD dataset.
| Model Name | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|
| MAF-Net | 0.667 | 0.553 | 0.558 | 0.245 |
| YOLOv7-SCD | 0.5789 | 0.4543 | 0.4677 | 0.2045 |
| YOLOv7-UWSC | 0.741 | 0.3491 | 0.372 | 0.162 |
| YOLOv7 Improved | 0.588 | 0.4874 | 0.4893 | 0.211 |
| YOLOv7-tiny | 0.7602 | 0.2715 | 0.2892 | 0.1161 |
| YOLOv7-MH | 0.6304 | 0.4299 | 0.4417 | 0.1917 |
Table 12.
Comparison of object detection results among different models on the DOTA dataset.
| Model Name | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|
| MAF-Net | 0.718 | 0.474 | 0.49 | 0.28 |
| YOLOv7-SCD | 0.6633 | 0.433 | 0.4234 | 0.2209 |
| YOLOv7-UWSC | 0.405 | 0.51 | 0.281 | 0.166 |
| YOLOv7 Improved | 0.472 | 0.625 | 0.32 | 0.264 |
| YOLOv7-tiny | 0.381 | 0.35 | 0.193 | 0.094 |
| YOLOv7-MH | 0.406 | 0.463 | 0.257 | 0.146 |
Table 13.
Comparison of object detection results among different models on the RSOD dataset.
| Model Name | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|
| MAF-Net | 0.347 | 0.955 | 0.349 | 0.238 |
| YOLOv7-SCD | 0.305 | 0.829 | 0.311 | 0.21 |
| YOLOv7-UWSC | 0.0396 | 0.0962 | 0.0205 | 0.0129 |
| YOLOv7 Improved | 0.582 | 0.224 | 0.0734 | 0.0121 |
| YOLOv7-tiny | 0.248 | 0.695 | 0.243 | 0.147 |
| YOLOv7-MH | 0.149 | 0.5 | 0.126 | 0.0779 |
Table 14.
Comparison of model parameters, computational complexity, and inference speed.
| Model Name | Parameters (M) | FLOPs (G) | FPS |
|---|---|---|---|
| MAF-Net | 39.123 | 59.773 | 21.277 |
| YOLOv7-SCD | 41.564 | 49.453 | 9.346 |
| YOLOv7-UWSC | 42.437 | 112.139 | 14.286 |
| YOLOv7 Improved | 37.829 | 59.942 | 11.96 |
| YOLOv7-tiny | 6.270 | 6.697 | 38.46 |
| YOLOv7-MH | 37.532 | 52.608 | 23.81 |
Table 15.
Detection accuracy comparison of various models for small-object detection on the AI-TOD dataset.
| Model Name | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|
| MAF-Net | 0.667 | 0.553 | 0.558 | 0.245 |
| YOLOv7-UWSC [50] | 0.741 | 0.3491 | 0.372 | 0.162 |
| YOLOv7-tiny [51] | 0.7602 | 0.2715 | 0.2892 | 0.1161 |
| KANs-DETR [39] | 0.638 | 0.412 | 0.427 | 0.186 |
| YOLO-CC [45] | 0.605 | 0.437 | 0.443 | 0.197 |
Table 16.
Detailed specifications of the DJI Mavic Mini 2 UAV.
| Item | Specification |
|---|---|
| Folded dimensions (without propellers) | mm |
| Unfolded dimensions (without propellers) | mm |
| Diagonal wheelbase | 213 mm |
| Maximum horizontal flight speed (near sea level, no wind) | 16 m/s (Sport mode), 10 m/s (Normal mode), 6 m/s (Cine mode) |
| Maximum ascent speed | 6 m/s (Normal mode), 8 m/s (Sport mode) |
| Maximum descent speed | 6 m/s |
| Maximum hover time | 38 min |
| Maximum flight time | 45 min |
| Battery capacity | 5000 mAh |
| Gimbal pitch range | to |
| Gimbal roll range | to |
| Gimbal yaw range | to |
Table 17.
Detailed distribution of the test dataset across different groups.
| Category | Indoor | Grid | Pavement | Water | Grass |
|---|---|---|---|---|---|
| Images | 208 | 203 | 848 | 367 | 154 |
| AEW Aircraft | 14 | 49 | 155 | 0 | 17 |
| Aircraft | 27 | 15 | 128 | 0 | 27 |
| Fighter | 18 | 31 | 173 | 0 | 17 |
| Helicopter | 54 | 58 | 184 | 0 | 22 |
| Hummer | 19 | 59 | 226 | 0 | 50 |
| Missile | 40 | 46 | 127 | 0 | 0 |
| Tank | 42 | 36 | 199 | 0 | 49 |
| Truck | 10 | 53 | 94 | 0 | 5 |
| Warship | 0 | 0 | 0 | 339 | 0 |
| Yacht | 0 | 0 | 0 | 227 | 0 |
Table 18.
Class-wise object detection performance on the Indoor dataset.
| Class | Images | Labels | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|---|---|
| all | 26 | 28 | 0.679 | 0.9 | 0.859 | 0.742 |
| aew | 26 | 6 | 0.888 | 1 | 0.995 | 0.85 |
| aircraft | 26 | 2 | 0.179 | 1 | 0.995 | 0.995 |
| fighter | 26 | 6 | 0.885 | 1 | 0.995 | 0.826 |
| helicopter | 26 | 4 | 0.992 | 0.25 | 0.579 | 0.457 |
| hummer | 26 | 1 | 0.547 | 1 | 0.995 | 0.896 |
| missile | 26 | 3 | 0.362 | 0.95 | 0.456 | 0.328 |
| tank | 26 | 1 | 0.841 | 1 | 0.995 | 0.896 |
| truck | 26 | 5 | 0.734 | 1 | 0.862 | 0.686 |
Table 19.
Class-wise object detection performance on the Grid dataset.
| Class | Images | Labels | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|---|---|
| all | 20 | 35 | 0.745 | 0.648 | 0.711 | 0.542 |
| aew | 20 | 3 | 0.767 | 1 | 0.995 | 0.807 |
| aircraft | 20 | 1 | 1 | 0 | 0.199 | 0.139 |
| fighter | 20 | 4 | 0.931 | 0.25 | 0.459 | 0.34 |
| helicopter | 20 | 5 | 0.866 | 1 | 0.995 | 0.699 |
| hummer | 20 | 7 | 0.649 | 1 | 0.889 | 0.619 |
| missile | 20 | 5 | 0.558 | 0.6 | 0.662 | 0.526 |
| tank | 20 | 6 | 0.498 | 0.332 | 0.491 | 0.407 |
| truck | 20 | 4 | 0.693 | 1 | 0.995 | 0.796 |
Table 20.
Class-wise object detection performance on the Pavement dataset.
| Class | Images | Labels | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|---|---|
| all | 84 | 123 | 0.974 | 0.989 | 0.994 | 0.791 |
| aew | 84 | 12 | 0.916 | 1 | 0.995 | 0.769 |
| aircraft | 84 | 13 | 0.953 | 1 | 0.995 | 0.782 |
| fighter | 84 | 14 | 0.983 | 1 | 0.995 | 0.816 |
| helicopter | 84 | 20 | 0.983 | 1 | 0.995 | 0.822 |
| hummer | 84 | 20 | 0.985 | 1 | 0.995 | 0.809 |
| missile | 84 | 14 | 0.99 | 1 | 0.995 | 0.792 |
| tank | 84 | 19 | 0.982 | 1 | 0.995 | 0.873 |
| truck | 84 | 11 | 1 | 0.909 | 0.988 | 0.664 |
Table 21.
Class-wise object detection performance on the Water dataset.
| Class | Images | Labels | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|---|---|
| all | 36 | 60 | 0.992 | 0.985 | 0.995 | 0.632 |
| warship | 36 | 33 | 1 | 0.97 | 0.995 | 0.73 |
| yacht | 36 | 27 | 0.984 | 1 | 0.995 | 0.533 |
Table 22.
Class-wise object detection performance on the Grass dataset.
| Class | Images | Labels | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|---|---|
| all | 15 | 19 | 0.857 | 0.75 | 0.829 | 0.603 |
| aew | 15 | 2 | 0.993 | 0.5 | 0.828 | 0.679 |
| aircraft | 15 | 2 | 0.695 | 1 | 0.995 | 0.721 |
| helicopter | 15 | 4 | 0.719 | 1 | 0.995 | 0.821 |
| hummer | 15 | 5 | 0.845 | 1 | 0.995 | 0.706 |
| tank | 15 | 5 | 0.888 | 1 | 0.995 | 0.577 |
| truck | 15 | 1 | 1 | 0 | 0.166 | 0.116 |
Table 23.
Class-wise object detection performance on the IGPWG dataset.
| Class | Images | Labels | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|---|---|
| all | 178 | 262 | 0.963 | 0.987 | 0.99 | 0.766 |
| aew | 178 | 20 | 0.854 | 1 | 0.995 | 0.8 |
| aircraft | 178 | 14 | 0.973 | 1 | 0.995 | 0.828 |
| fighter | 178 | 27 | 0.963 | 0.969 | 0.994 | 0.806 |
| helicopter | 178 | 39 | 0.998 | 1 | 0.995 | 0.787 |
| hummer | 178 | 35 | 0.99 | 1 | 0.995 | 0.792 |
| missile | 178 | 24 | 0.953 | 1 | 0.995 | 0.814 |
| tank | 178 | 33 | 0.97 | 1 | 0.995 | 0.829 |
| truck | 178 | 18 | 0.976 | 0.944 | 0.99 | 0.741 |
| warship | 178 | 31 | 0.996 | 1 | 0.996 | 0.739 |
| yacht | 178 | 21 | 0.952 | 0.953 | 0.952 | 0.525 |