Figure 1.
Analysis of grey-value characteristics of UAV target. A comparison between (c,d) demonstrates that the features of the UAV target in complex backgrounds are prone to being obscured by background noise. (a) Original image. (b) SCR calculation region. (c) Global 3D grayscale image. (d) 3D grayscale image of the target region.
Figure 2.
Line chart of the Signal-to-Clutter Ratio (SCR) across the AntiUAV410 dataset, with the lowest-SCR region marked and corresponding image examples shown.
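The SCR values summarised in Figure 2 measure target-versus-local-background contrast. As a point of reference, the snippet below is a minimal sketch assuming the common definition SCR = |μ_t − μ_b| / σ_b, where μ_t is the mean grey value inside the target box and μ_b, σ_b are computed over a surrounding background ring; the function name and the `margin` width of the ring are illustrative choices, not values from the paper.

```python
import numpy as np

def signal_to_clutter_ratio(image, target_box, margin=20):
    """SCR = |mu_t - mu_b| / sigma_b over a target box and a background ring.

    target_box: (x, y, w, h) in pixel coordinates.
    margin: width of the background ring around the target (assumed value).
    """
    x, y, w, h = target_box
    target = image[y:y + h, x:x + w].astype(np.float64)

    # Background ring: an enlarged box around the target with the target cut out.
    y0, y1 = max(0, y - margin), min(image.shape[0], y + h + margin)
    x0, x1 = max(0, x - margin), min(image.shape[1], x + w + margin)
    region = image[y0:y1, x0:x1].astype(np.float64)
    mask = np.ones(region.shape, dtype=bool)
    mask[y - y0:y - y0 + h, x - x0:x - x0 + w] = False
    background = region[mask]

    mu_t = target.mean()
    mu_b, sigma_b = background.mean(), background.std()
    return abs(mu_t - mu_b) / (sigma_b + 1e-8)
```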
Figure 3.
Illustration of challenging scenarios in anti-UAV tracking. # indicates frame number.
Figure 4.
The general framework of FSTC-DiMP.
Figure 5.
Perceptual capabilities of CNN (a), self-attention (b), and LSK (c).
Figure 6.
Overall framework of spatio-temporal consistency-guided re-detection.
Figure 7.
When peak interference occurs, the system first retrieves the previous-frame result (denoted by a red bounding box). The primary peak corresponds to the target region marked by either a blue or a yellow bounding box, while the remaining box indicates the secondary peak’s target region. "?" indicates that the target is lost.
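The behaviour illustrated in Figure 7 can be read as a consistency check between candidate peaks and the previous-frame result. The sketch below is only an illustrative reading of the figure, not the exact decision rule of the spatio-temporal consistency module: the `score_ratio_thresh` threshold and the peak dictionary layout are assumptions.

```python
import numpy as np

def resolve_peak_interference(primary_peak, secondary_peak, prev_center,
                              score_ratio_thresh=0.8):
    """Illustrative peak disambiguation: when the secondary peak's score is
    close to the primary peak's, prefer the peak nearest to the previous-frame
    result (spatio-temporal consistency).

    primary_peak / secondary_peak: dicts with 'score' and 'center' (x, y).
    prev_center: centre (x, y) of the previous-frame bounding box.
    """
    if secondary_peak["score"] < score_ratio_thresh * primary_peak["score"]:
        # No real interference: keep the primary peak.
        return primary_peak

    # Peak interference: pick the candidate closest to the previous-frame centre.
    def dist(peak):
        return np.hypot(peak["center"][0] - prev_center[0],
                        peak["center"][1] - prev_center[1])

    return min((primary_peak, secondary_peak), key=dist)
```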
Figure 8.
Enhanced Feature Learning Based on Background Augmentation.
Figure 9.
The overall success plots (a) and precision plots (b) of FSTC-DiMP and other trackers on the AntiUAV410 test set, along with the overall success plots (c) and precision plots (d) on the AntiUAV600 validation set.
Figure 10.
Qualitative comparison of five trackers on the AntiUAV410 dataset, where we selected six challenging sequences: DBC (dynamic background clutter), FM (fast motion), OC (occlusion), TC (thermal crossover), OV (out of view), and SV (scale variation).
Figure 11.
Evaluation of FSTC-DiMP and other trackers on the AntiUAV410 test set in terms of target size, including normal size, medium size, small size, and tiny size. Precision plots (a–d) and success plots (e–h).
Figure 12.
Attribute evaluation on the AntiUAV410 test set. In the precision plots (a–f), the legend values indicate the precision scores of corresponding trackers, while in the success plots (g–l), the legend values represent the success AUC scores of respective trackers.
Figure 13.
Attribute-specific evaluation on the AntiUAV410 test set, with trackers ranked by their AUC scores.
Figure 14.
The IoU curves of six representative test sequences reflect the tracking quality. In the figure, the ground truth is indicated by green boxes, the red boxes represent the prediction results of FSTC-DiMP, and the blue boxes denote the prediction results of Super-DiMP. Super-DiMP performs poorly in scenarios involving camera motion, target leaving the field of view, and occlusion. In contrast, FSTC-DiMP effectively overcomes these challenges through a spatio-temporal consistency-aware re-detection mechanism.
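The per-frame IoU plotted in Figure 14 is the standard overlap between the predicted and ground-truth boxes. A small sketch, assuming boxes in (x, y, w, h) format:

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x, y, w, h)."""
    xa, ya, wa, ha = box_a
    xb, yb, wb, hb = box_b
    ix = max(0.0, min(xa + wa, xb + wb) - max(xa, xb))
    iy = max(0.0, min(ya + ha, yb + hb) - max(ya, yb))
    inter = ix * iy
    union = wa * ha + wb * hb - inter
    return inter / union if union > 0 else 0.0

def iou_curve(predictions, ground_truth):
    """Per-frame IoU sequence of the kind plotted in Figure 14."""
    return np.array([iou(p, g) for p, g in zip(predictions, ground_truth)])
```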
Figure 15.
Failure cases illustrating the limitations of our method (green boxes indicate the ground truth; red boxes represent the prediction results of FSTC-DiMP).
Table 1.
Comparison of feature extraction methods.
| Method | Strategy | Limitations |
| --- | --- | --- |
| Dual-Semantic Feature Extraction | Parallel extraction of “matching” semantics and “foreground” semantics | When target–background semantics are similar, the dual-branch structure amplifies misclassification |
| Centre-Prediction Feature Extraction | One-step mapping of CNN features into a single-channel centre-confidence heatmap | Heatmap robustness degrades in complex scenes |
| GE-AP | Multi-feature similarity matrix construction | Computational overhead from texture feature processing |
| Our Method | Dynamic adjustment of receptive-field range and selectivity | Elevated computational complexity |
Table 2.
Detailed explanation of image data augmentation methods.
| Method | Content |
| --- | --- |
| Horizontal flipping | Mirror transformation about the vertical central axis to enhance the perspective diversity of samples. |
| Scale variation | Scaling factors of 0.7 and 0.9 simulate the visual variation when the UAV target moves away from the camera. |
| Viewpoint translation | Slight viewpoint offsets occurring during shooting are simulated. |
| Rotation | Fixed rotations of ±45° simulate pitch variations induced by rapid target motion in real-world scenarios. |
| Gaussian blurring | Gaussian blurring at three strengths simulates optical degradation caused by UAV movement and sensor noise: slight defocus, image blurring, and moderate out-of-focus effects. |
| Salt-and-pepper noise | Random cluster nodes aggregate discrete noise into randomly distributed rectangular patches, replicating realistic background noise interference in actual scenarios. |
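For concreteness, the sketch below implements illustrative versions of the augmentations in Table 2. The scale factors (0.7 and 0.9) and the ±45° rotation come from the table; the blur kernel sizes, translation offsets, and noise patch sizes are assumed values chosen only for demonstration.

```python
import cv2
import numpy as np

def augment(image, rng=None):
    """Generate augmented variants of an input frame (illustrative only)."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    out = []

    # Horizontal flipping about the vertical central axis.
    out.append(cv2.flip(image, 1))

    # Scale variation with factors 0.7 and 0.9 (from Table 2).
    for s in (0.7, 0.9):
        out.append(cv2.resize(image, (int(w * s), int(h * s))))

    # Viewpoint translation: small pixel offsets (magnitudes assumed).
    dx, dy = rng.integers(-10, 11, size=2)
    M = np.float32([[1, 0, dx], [0, 1, dy]])
    out.append(cv2.warpAffine(image, M, (w, h)))

    # Rotation by ±45° to mimic pitch variation under fast target motion.
    for angle in (45, -45):
        M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        out.append(cv2.warpAffine(image, M, (w, h)))

    # Gaussian blurring at three strengths (kernel sizes assumed).
    for k in (3, 5, 7):
        out.append(cv2.GaussianBlur(image, (k, k), 0))

    # Salt-and-pepper noise aggregated into random rectangular patches.
    noisy = image.copy()
    for _ in range(rng.integers(3, 8)):            # number of patches (assumed)
        ph, pw = rng.integers(5, 20, size=2)       # patch size (assumed)
        y, x = rng.integers(0, h - ph), rng.integers(0, w - pw)
        shape = noisy[y:y + ph, x:x + pw].shape
        noisy[y:y + ph, x:x + pw] = rng.choice([0, 255], size=shape).astype(image.dtype)
    out.append(noisy)

    return out
```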
Table 3.
Quantitative comparison of FSTC-DiMP with single-object trackers on AntiUAV410 test set and AntiUAV600 validation set (best results in bold, second-best in red, and third-best in blue).
| Method | Source | AntiUAV410 AUC | AntiUAV410 P | AntiUAV410 SA | AntiUAV600 AUC | AntiUAV600 P | AntiUAV600 SA |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ATOM | CVPR19 | 50.4 | 70.1 | 51.4 | 41.2 | 61.7 | 41.9 |
| DiMP50 | ICCV19 | 55.5 | 75.9 | 56.7 | 47.5 | 70.7 | 48.3 |
| PrDiMP50 | CVPR20 | 53.6 | 75.1 | 54.7 | 51.0 | 76.4 | 51.9 |
| Super-DiMP | - | 59.6 | 81.8 | 60.8 | 49.3 | 74.3 | 50.2 |
| KYS | ECCV20 | 44.1 | 63.9 | 44.9 | 39.5 | 61.8 | 39.9 |
| SiamCAR | CVPR20 | 46.0 | 64.7 | 46.9 | 33.8 | 53.5 | 34.3 |
| SiamBAN | CVPR20 | 46.5 | 67.3 | 47.3 | 28.2 | 47.0 | 28.5 |
| Stark-ST101 | ICCV21 | 56.1 | 78.6 | 57.2 | 49.1 | 74.3 | 49.8 |
| AiATrack | ECCV22 | 58.4 | 83.4 | 59.6 | 47.7 | 73.4 | 48.5 |
| ToMP50 | CVPR22 | 54.0 | 74.0 | 55.1 | 46.3 | 69.8 | 47.1 |
| ToMP101 | CVPR22 | 54.0 | 75.2 | 55.1 | 50.6 | 75.2 | 51.5 |
| DropTrack | CVPR23 | 59.0 | 82.3 | 60.2 | 50.0 | 77.1 | 50.7 |
| SiamDT | PAMI24 | 66.8 | 89.5 | 68.2 | 54.0 | 82.3 | 54.8 |
| Ours | - | 67.7 | 91.3 | 69.1 | 53.6 | 79.9 | 54.4 |
Table 4.
Confidence interval analysis.
| Dataset | Sequences | AUC (95% CI) | P (95% CI) |
| --- | --- | --- | --- |
| AntiUAV410 Test | 120 | 67.7 [64.4, 71.0] | 91.3 [87.8, 94.8] |
| AntiUAV600 Validation | 49 | 53.6 [45.7, 61.4] | 79.9 [72.0, 87.8] |
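The intervals in Table 4 can be reproduced from per-sequence scores. The sketch below assumes the usual normal approximation mean ± 1.96·s/√n over per-sequence AUC or precision values; the paper's exact interval procedure may differ.

```python
import numpy as np

def confidence_interval_95(per_sequence_scores):
    """Mean and 95% confidence interval for per-sequence scores,
    using the normal approximation mean ± 1.96 * s / sqrt(n)."""
    scores = np.asarray(per_sequence_scores, dtype=np.float64)
    n = scores.size
    mean = scores.mean()
    half_width = 1.96 * scores.std(ddof=1) / np.sqrt(n)
    return mean, (mean - half_width, mean + half_width)
```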
Table 5.
Comparison results of ablation experiments for each module, with the optimal performance metrics highlighted in bold.
| Dataset | Baseline | LSK | STR | ELB | AUC | ΔAUC |
| --- | --- | --- | --- | --- | --- | --- |
| AntiUAV410 Validation | √ | | | | 63.9 | - |
| | √ | √ | | | 64.6 | +0.7 |
| | √ | √ | √ | | 67.4 | +3.5 |
| | √ | √ | √ | √ | 68.1 | +4.2 |
| AntiUAV600 Validation | √ | | | | 49.3 | - |
| | √ | √ | | | 49.9 | +0.6 |
| | √ | √ | √ | | 53.1 | +3.8 |
| | √ | √ | √ | √ | 53.6 | +4.3 |
Table 6.
Analysis of GPU memory usage and inference time across different image resolutions.
| Model | Image Resolution | Memory Usage | Inference Time (s) |
| --- | --- | --- | --- |
| FSTC-DiMP | 512 × 512 | 2.15 GB | 0.152 |
| | 1024 × 1024 | 5.86 GB | 0.396 |
| | 2048 × 2048 | 20.38 GB | 1.451 |
| | 4096 × 4096 | Out of Memory | - |
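The figures in Table 6 can be obtained with a profiling loop of the following form. This is a generic PyTorch sketch, not the paper's protocol: the warm-up count, iteration count, batch size, and input layout are assumptions.

```python
import time
import torch

def profile_inference(model, resolution, device="cuda", warmup=5, iters=20):
    """Measure peak GPU memory (GB) and per-frame inference time (s)
    for a square input of the given resolution (illustrative protocol)."""
    model = model.to(device).eval()
    x = torch.randn(1, 3, resolution, resolution, device=device)

    torch.cuda.reset_peak_memory_stats(device)
    with torch.no_grad():
        for _ in range(warmup):          # warm-up iterations excluded from timing
            model(x)
        torch.cuda.synchronize(device)
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize(device)

    seconds_per_frame = (time.perf_counter() - start) / iters
    peak_gb = torch.cuda.max_memory_allocated(device) / 1024 ** 3
    return peak_gb, seconds_per_frame
```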
Table 7.
Comparison results of experiments with different attention mechanisms, with the optimal performance metrics highlighted in bold.
| Dataset | Baseline | Attention | AUC | P |
| --- | --- | --- | --- | --- |
| AntiUAV410 Test | Super-DiMP | - | 59.6 | 81.8 |
| | | CBAM | 52.6 | 72.2 |
| | | ECA | 58.4 | 80.7 |
| | | EMA | 58.6 | 80.6 |
| | | LSK | 61.4 | 83.1 |
| AntiUAV410 Validation | Super-DiMP | - | 63.9 | 85.5 |
| | | CBAM | 55.5 | 74.6 |
| | | ECA | 62.3 | 83.4 |
| | | EMA | 64.1 | 86.2 |
| | | LSK | 64.6 | 86.5 |
Table 8.
Performance comparison under different background fusion data parameters, with the best results highlighted in bold.
| Dataset | Model | Parameter | AUC | P |
| --- | --- | --- | --- | --- |
| AntiUAV410 Test | FSTC-DiMP | 40 | 67.0 | 90.2 |
| | | 60 | 67.4 | 90.8 |
| | | 80 | 67.7 | 91.3 |
| | | 100 | 67.4 | 90.8 |
| AntiUAV600 Validation | FSTC-DiMP | 40 | 52.8 | 77.7 |
| | | 60 | 52.6 | 78.2 |
| | | 80 | 53.6 | 79.9 |
| | | 100 | 53.4 | 79.7 |