Article

Closed-Loop Aerial Tracking with Dynamic Detection-Tracking Coordination

1 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
2 Department of Science and Engineering, The Open University of Zhangzhou, Zhangzhou 363000, China
* Author to whom correspondence should be addressed.
Drones 2025, 9(7), 467; https://doi.org/10.3390/drones9070467
Submission received: 13 May 2025 / Revised: 22 June 2025 / Accepted: 24 June 2025 / Published: 30 June 2025
(This article belongs to the Section Drone Design and Development)

Abstract

Aerial tracking is an important service for many Unmanned Aerial Vehicle (UAV) applications. Existing work fails to provide robust solutions for handling target disappearance, viewpoint changes, and tracking drift in practical scenarios with limited UAV resources. In this paper, we propose a closed-loop framework integrating three key components: (1) a lightweight adaptive detection module with multi-scale feature extraction, (2) spatiotemporal motion modeling through Kalman-filter-based trajectory prediction, and (3) autonomous decision-making through composite scoring of detection confidence, appearance similarity, and motion consistency. By implementing dynamic detection-tracking coordination with quality-aware feature preservation, our system enables real-time operation through performance-adaptive frequency modulation. Evaluated on the VOT-ST2019 and OTB100 benchmarks, the proposed method yields marked improvements over baseline trackers, achieving a 27.94% increase in Expected Average Overlap (EAO) and a 10.39% reduction in failure rate, while sustaining a frame rate of 23–95 FPS on edge hardware. The framework achieves rapid target reacquisition during prolonged occlusions through optimized protocols, outperforming conventional methods in sustained aerial surveillance tasks.

1. Introduction

Single-target long-term UAV visual tracking aims to persistently locate and maintain the identity of a specific object in aerial video streams, which often span thousands of frames. This task begins with a single annotation (bounding box or segmentation mask) in the first frame only. The tracking system must adapt to dynamic aerial views and consistently provide accurate positional information, even as targets disappear and reappear, undergo deformations, or encounter changing illumination. Crucially, such systems require strict temporal constraints to fulfill real-time operational requirements in airborne platforms. Long-term single-target UAV visual tracking, as an open and compelling research domain, serves critical roles in numerous practical applications, including environmental monitoring [1], search and rescue [2], aerial vision tracking [3], sports analytics [4], and beyond.
In autonomous UAV tracking applications, the system architecture is typically divided into two core components: (1) persistent target tracking for continuous acquisition of positioning data, and (2) UAV flight control based on the obtained positioning information [5]. This paper primarily focuses on the former challenge, specifically investigating methodologies for maintaining persistent visual-based target tracking and acquiring continuous positioning data during long-term operations. The adoption of vision-centric solutions is necessitated by UAV payload limitations and stringent power constraints, which preclude the deployment of large-scale sensing modalities like LiDAR or millimeter-wave radar for target localization [6].
Based on our comprehensive investigation into single-target real-time persistent tracking in UAV systems, we have identified several critical challenges that remain underexplored or inadequately addressed in existing works. These challenges represent fundamental research gaps in achieving reliable long-duration aerial tracking:
  • Single-module trackers exhibit limited adaptability in target disappearance scenarios. However, augmenting these systems with redundant verification modules introduces significant computational overhead, thereby compromising real-time operational feasibility.
  • Validating tracking success remains particularly challenging under UAV-borne dynamic observation conditions, where substantial appearance variations and trajectory deviations induced by viewpoint-related geometric distortions frequently occur.
  • Existing trackers lack efficient mechanisms for rapid and robust re-acquisition confidence estimation after disappearance, particularly under the severe viewpoint changes and scale variations characteristic of UAV tracking.
The proposed single-target long-term UAV visual tracking system in this paper comprises three principal modules: (1) a lightweight object detection and feature extraction module, (2) a multi-criteria assessment module, and (3) a motion-aware tracking module. In response to these core challenges, our work delivers the following key contributions:
  • By incorporating a lightweight feature extraction network and efficient target information caching mechanisms, our approach achieves real-time detection of target disappearance and re-emergence while maintaining computational efficiency.
  • Spatiotemporal feature encoding with motion pattern analysis preserves high-quality historical representations to mitigate appearance drift.
  • The closed-loop feedback architecture among modules facilitates autonomous decision-making through performance-driven adaptation.
  • In our architecture, the object detection and feature extraction module alternates with the motion-aware tracking module. This alternating execution enables performance enhancement via dynamic feature banks and historical trajectory queues, while also operating within computational constraints.

2. Related Work

This section reviews three distinct categories of long-term UAV visual tracking methodologies: correlation filter-based trackers, deep learning-based trackers, and hybrid trackers that leverage the complementary strengths of both approaches. We systematically analyze each category’s operational mechanisms and performance characteristics in aerial scenarios [7].

2.1. Correlation Filter-Based Trackers

Correlation filter-based trackers achieve efficient target localization through frequency-domain computations. The central concept involves training a filter template on the initial frame of a video sequence; this filter then generates maximal response values when correlated with target regions in subsequent frames. By performing convolutional operations between image patches and the filter in the frequency domain, these algorithms predict target positions within millisecond-level latency, making them particularly suitable for real-time mobile platforms (e.g., UAVs) with stringent computational constraints. Fu et al. [8] proposed a temporal regularization approach for multi-frame response maps, optimizing model updating through temporal correlation constraints across historical response maps, effectively mitigating error accumulation inherent in conventional single-frame update mechanisms. Zhang et al. [9] devised a global-target dual regression filter. This approach integrates global contextual information with local target features, achieving synergistic modeling. It effectively mitigates tracking drift caused by complex background interference. However, the dual-channel architecture increases computational complexity. The computational efficiency of correlation filter algorithms facilitates their deployment on UAV platforms. Despite notable advancements, existing methods have key limitations in long-term UAV tracking, especially when handling target disappearance/reappearance and appearance drift [7].

2.2. Deep Learning-Based Trackers

Researchers have increasingly employed deep learning-based methodologies to address long-term single-target UAV tracking challenges, spurred by rapid advancements in the field. This research thrust has yielded a proliferation of sophisticated deep learning-powered tracking algorithms that demonstrate notable performance improvements. Bertinetto et al. [10] learn generic similarity maps through cross-correlation between target template and search region feature representations and conduct online adaptive updates. Sun et al. [11] integrate Siamese networks with Transformer architectures and employ drift suppression to achieve efficient, robust real-time UAV tracking. Deep learning-based trackers demonstrate superior adaptability to rapid viewpoint changes and occlusion scenarios in UAV perspectives compared to correlation filter-based approaches. However, practical deployments reveal inherent limitations, including complex computational pipelines and numerous model parameters that require substantial computational resources, coupled with dependencies on large-scale training datasets and extended training periods to achieve operational efficacy.

2.3. Hybrid Trackers

Tracking algorithms have advanced substantially by incorporating object detection methodologies [12], enabling more robust long-term tracking. Cross-domain innovations like region proposal networks and feature pyramids further leverage these techniques to address target disappearance and scale changes. Huang et al. [13] integrated detection techniques to develop a two-stage detection architecture: the first stage employs a query-guided region proposal network to generate candidate regions, while the second stage utilizes a region-based CNN for precise localization. Several studies have adopted multi-tracker frameworks for performance enhancement. For instance, FGLT [14] employs both MDNet [15] and SiamRPN++ [16] trackers to verify tracking results, while a dedicated detection module rectifies tracking failures through error correction. Although existing studies [13,14] effectively mitigate single-tracker failures through error detection and recovery modules, their redundant architectures exhibit critical limitations. Specifically, multi-network parallelization exponentially increases model complexity, hindering compliance with UAV platforms’ lightweight deployment requirements. Furthermore, these approaches insufficiently address the balance between computational resources and real-time constraints. Nevertheless, the innovative approaches in heterogeneous feature fusion and failure compensation mechanisms provide valuable theoretical references for designing long-term UAV tracking algorithms.

3. Methodology

This section systematically elucidates the core architectural modules of the proposed framework, followed by a comprehensive exposition of the algorithmic workflow and the interdependencies among constituent elements. As shown in Figure 1, the closed-loop architecture integrates real-time detection-tracking-control pipelines with self-correction mechanisms, where multi-criteria scoring enables dynamic switching between detection and tracking branches based on environmental feedback.

3.1. System Overview

To achieve persistent aerial pursuit in long-term drone-following scenarios, we propose a real-time tracking system architecture comprising three synergistic modules.
The detection and feature extraction module employs a dual-phase operation: During initialization, the system accepts either a ground-truth bounding box for target specification or leverages gesture recognition to dynamically localize human subjects. In active tracking mode, this component generates candidate bounding boxes through continuous scene parsing. These detection outputs undergo rigorous evaluation in our multi-criteria scoring module, which quantifies target confidence through spatial-temporal consistency metrics and appearance similarity measures.
The tracking and motion perception module strategically integrates these assessments with intersection-over-union (IoU) analysis between optimal candidates and current tracking windows, implementing an adaptive tracker reinitialization protocol when divergence thresholds are exceeded. This core processor simultaneously maintains a trajectory buffer T hist that archives historical motion patterns, enabling predictive movement analysis. Simultaneously, the degree of matching derived from the intersection-over-union (IoU) analysis inversely regulates the detection frequency, thus mitigating the excessive computational latency and energy consumption overhead caused by frequent detection.
The integrated framework establishes a closed-loop processing chain where detection validation informs tracking adjustments, while motion history enhances future detection reliability, significantly enhancing system robustness and tracking continuity in dynamic environments.

3.2. Motion-Aware Tracking Module

This module is built around a lightweight correlation filter-based tracker. We selected CSR-DCF (Discriminative Correlation Filter with Channel and Spatial Reliability) [17], an advanced short-term tracker, as the tracking backbone. By incorporating channel and spatial reliability, it effectively addresses occlusion and boundary effects while maintaining computational efficiency. This method has achieved state-of-the-art performance on multiple benchmark datasets, including VOT 2016 [18], VOT 2015 [18], and OTB-100 [19]. Correlation filter-based trackers, including CSR-DCF, often underperform in long-term tracking scenarios due to unresolved model drift and the absence of long-term memory mechanisms. To mitigate these limitations, the proposed system integrates adaptively scheduled object detection and feature extraction modules to supervise the tracking process.
Whenever the multi-criteria scoring module produces an output (the optimal bounding box that meets the threshold criteria), the system computes $\mathrm{IoU}(B_{\text{opt}}, B_{\text{trk}})$ between this optimal bounding box and the tracker's current bounding box by:
$$\mathrm{IoU}(B_{\text{opt}}, B_{\text{trk}}) = \frac{\left| B_{\text{opt}} \cap B_{\text{trk}} \right|}{\left| B_{\text{opt}} \cup B_{\text{trk}} \right|}$$
where $B_{\text{opt}}$ denotes the optimal bounding box from the multi-criteria scoring module, and $B_{\text{trk}}$ represents the tracker's current prediction. $\mathrm{IoU}(B_{\text{opt}}, B_{\text{trk}})$ is dynamically leveraged to regulate the detection interval $\Delta_{\text{frame}}$ by
$$\Delta_{\text{frame}} = \begin{cases} \left\lceil 10 \cdot \mathrm{IoU}(B_{\text{opt}}, B_{\text{trk}}) \cdot k \right\rceil, & \mathrm{IoU}(B_{\text{opt}}, B_{\text{trk}}) \geq \tau_{\text{track}} \\ \Delta_{\min}, & \mathrm{IoU}(B_{\text{opt}}, B_{\text{trk}}) < \tau_{\text{track}} \end{cases}$$
where $\lceil \cdot \rceil$ denotes the ceiling operator for discrete interval quantization, $k \in \mathbb{Z}^{+}$ is the base interval multiplier (default: 5 frames), $\tau_{\text{track}}$ is the tracking confidence threshold (empirically set to 0.6), and $\Delta_{\min}$ defines the minimum detection interval (10 frames).
This control strategy implements dual operational modes:
  • $\mathrm{IoU}(B_{\text{opt}}, B_{\text{trk}}) \geq \tau_{\text{track}}$: The detection frequency is adaptively modulated, achieving a latency-energy trade-off that scales with tracking confidence.
  • $\mathrm{IoU}(B_{\text{opt}}, B_{\text{trk}}) < \tau_{\text{track}}$: The tracker is reinitialized using the optimal candidate bounding box $B_{\text{opt}}$, as sub-threshold IoU indicates potential tracking failure. The minimum interval $\Delta_{\min}$ guarantees prompt re-evaluation of the reinitialized tracker. A minimal code sketch of this interval-control rule follows.
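To make the interval-control rule concrete, the following minimal Python sketch implements its two branches. The function and parameter names (detection_interval, k, tau_track, delta_min) are illustrative rather than taken from the authors' code; the defaults match the values quoted in the text.

```python
import math

def detection_interval(iou, k=5, tau_track=0.6, delta_min=10):
    """Adaptive detection interval (in frames) driven by tracker-detector IoU.

    Confident tracking (IoU >= tau_track) stretches the interval to save
    computation; sub-threshold IoU falls back to the minimum interval so the
    tracker can be re-evaluated (and, if needed, reinitialized) quickly.
    """
    if iou >= tau_track:
        return math.ceil(10 * iou * k)   # ceiling operator for discrete frames
    return delta_min
```

For example, an IoU of 0.8 yields an interval of 40 frames, while any IoU below 0.6 forces the 10-frame minimum.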
This module employs the Kalman filter (KF) algorithm [20] to model target trajectories. The KF establishes a linear motion model integrating trajectory buffer T hist and system dynamics for motion perception. This approach effectively addresses trajectory prediction challenges during short-term target occlusion or detection failures.

3.2.1. CSR-DCF Tracker

The Channel and Spatial Reliability Discriminative Correlation Filter (CSR-DCF) [17] enhances traditional correlation filtering with dual reliability mechanisms. Its core formulation minimizes the objective:
$$E(\mathbf{h}) = \left\| \sum_{k=1}^{K} \omega_k \left( \mathbf{x}_k \star \mathbf{h}_k \right) - \mathbf{y} \right\|^2 + \lambda \sum_{k=1}^{K} \left\| \mathbf{h}_k \right\|^2$$
where $\mathbf{h}_k$ denotes the filter for channel $k$, $\mathbf{x}_k$ the feature map, and $\omega_k$ the channel reliability weights. The spatial reliability component is implemented through a binary mask $m \in \{0, 1\}^{W \times H}$ that excludes non-target regions:
$$m(i, j) = \begin{cases} 1, & \text{if } (i, j) \in R_{\text{target}} \\ 0, & \text{otherwise} \end{cases}$$
This algorithm enhances tracking robustness through three technical innovations. First, a channel reliability weighting mechanism based on feature response consistency dynamically evaluates confidence levels across feature channels. Second, a spatial reliability adaptation strategy employs binary region segmentation to handle partial occlusion via mask updating. Finally, a Fourier-domain optimization scheme reduces computational complexity to $O(N \log N)$, achieving real-time performance while maintaining tracking precision and robustness to deformation and illumination changes. The filter is updated online with the rule
$$\mathbf{h}_t = (1 - \eta)\, \mathbf{h}_{t-1} + \eta\, \mathbf{h}_{\text{new}}$$
where $\eta \in [0.01, 0.05]$ controls the model adaptation rate.
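As a usage illustration, OpenCV's CSRT tracker is a public implementation of CSR-DCF and can serve as the short-term tracking backbone described here. The sketch below assumes the opencv-contrib-python package; on some OpenCV 4.5+ builds the factory function lives under cv2.legacy instead. The helper names and the iou_fn hook are ours, not the authors' API.

```python
import cv2

def make_tracker(frame, bbox):
    """Create and initialize a CSR-DCF tracker (OpenCV's CSRT implementation).

    bbox is an axis-aligned box (x, y, w, h) with (x, y) the top-left corner.
    """
    tracker = cv2.TrackerCSRT_create()
    tracker.init(frame, tuple(int(v) for v in bbox))
    return tracker

def supervised_step(tracker, frame, best_detection=None, iou_fn=None, tau_track=0.6):
    """One tracking step with optional supervision by the detection module.

    If the detector's best candidate disagrees with the tracker (IoU below
    tau_track) or the tracker reports failure, the tracker is reinitialized
    on that candidate, mirroring the reinitialization protocol above.
    """
    ok, bbox = tracker.update(frame)
    if best_detection is not None and iou_fn is not None:
        if not ok or iou_fn(best_detection, bbox) < tau_track:
            tracker = make_tracker(frame, best_detection)
            bbox = tuple(best_detection)
    return tracker, bbox
```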

3.2.2. Motion Awareness

The motion awareness system combines a historical trajectory buffer with Kalman filtering to achieve robust motion prediction. A double-ended queue maintains an $N_t$-frame trajectory buffer $\mathcal{T}_{\text{hist}}$ containing motion states:
$$\mathcal{T}_{\text{hist}} = \left\{ \left( x_t, y_t, v_x(t), v_y(t) \right) \right\}_{t = -(N_t - 1)}^{0}$$
where $(x_t, y_t)$ represents the target coordinates and $(v_x(t), v_y(t))$ denotes instantaneous velocities calculated through frame differencing:
$$v_x(t) = x_t - x_{t-1}, \qquad v_y(t) = y_t - y_{t-1}$$
The buffer implements FIFO replacement when new entries exceed its capacity, marks invalid detections with a null four-tuple placeholder, and preserves velocity continuity through differential calculations.
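A minimal sketch of the trajectory buffer is given below, using Python's collections.deque for the double-ended queue with automatic FIFO eviction. The class and method names are illustrative; the default capacity of seven frames follows Section 5.1.2.

```python
from collections import deque

class TrajectoryBuffer:
    """Fixed-length motion-state buffer T_hist with FIFO replacement.

    Each entry is (x, y, vx, vy); invalid detections are stored as a null
    four-tuple so the time axis of the buffer stays aligned with the video.
    """
    def __init__(self, capacity=7):
        self.buffer = deque(maxlen=capacity)   # deque evicts the oldest entry

    def push(self, x=None, y=None):
        if x is None or y is None:             # invalid / missing detection
            self.buffer.append((None, None, None, None))
            return
        vx = vy = 0.0
        last_valid = next((s for s in reversed(self.buffer) if s[0] is not None), None)
        if last_valid is not None:             # frame-differencing velocity
            vx, vy = x - last_valid[0], y - last_valid[1]
        self.buffer.append((x, y, vx, vy))

    def reset(self):
        """Clear the buffer, e.g., when target disappearance is confirmed."""
        self.buffer.clear()
```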
For physical motion modeling, an eight-dimensional Kalman filter with state vector
$$\mathbf{x} = \left[ x, y, v_x, v_y, \Delta x, \Delta y, \Delta v_x, \Delta v_y \right]^{T}$$
employs a constant velocity transition model:
$$F = \begin{bmatrix}
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}$$
The observation model $H = \left[\, I_2 \;\; 0_{2 \times 6} \,\right]$ focuses on positional measurements, with adaptive noise covariance matrices $Q = 0.01\, I_8$ and $R = I_2$. Velocity estimation derives from averaged historical displacements:
$$\bar{v}_x = \frac{1}{N_t - 1} \sum_{t = -(N_t - 2)}^{0} \left( x_t - x_{t-1} \right)$$
This integrated approach enables continuous motion tracking through temporary detection failures while maintaining physical plausibility constraints.
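The constant-velocity model above can be reproduced with a few lines of NumPy. The following class is a sketch under the stated noise settings ($Q = 0.01 I_8$, $R = I_2$) rather than the authors' implementation; the initial covariance $P = I_8$ is our assumption.

```python
import numpy as np

class ConstantVelocityKF:
    """8-state Kalman filter matching the transition model F above.

    State x = [x, y, vx, vy, dx, dy, dvx, dvy]^T; only (x, y) is observed.
    """
    def __init__(self):
        self.F = np.eye(8)
        for i in range(6):
            # x <- x + vx, y <- y + vy, vx <- vx + dx, ..., dy <- dy + dvy
            self.F[i, i + 2] = 1.0
        self.H = np.hstack([np.eye(2), np.zeros((2, 6))])  # position-only observation
        self.Q = 0.01 * np.eye(8)      # process noise covariance
        self.R = np.eye(2)             # measurement noise covariance
        self.x = np.zeros((8, 1))
        self.P = np.eye(8)             # initial state covariance (assumed)

    def predict(self):
        """Propagate the state one frame; returns the predicted (x, y)."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2].ravel()

    def update(self, z):
        """Correct the state with a position measurement z = (x, y)."""
        z = np.asarray(z, dtype=float).reshape(2, 1)
        innovation = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)           # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(8) - K @ self.H) @ self.P
```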

3.3. Detection and Feature Extraction Module

The detection and feature extraction module synergistically integrates three operational components: a YOLO-based detector generating geometrically validated candidate boxes $D_t = \{(x_i, y_i, w_i, h_i, c_i)\}$, a MobileNetV2-powered feature extractor producing illumination-robust $\ell_2$-normalized embeddings $f_i \in \mathbb{R}^{1280}$ via depthwise separable convolutions [21], and a quality-aware feature repository $\mathcal{F}$ maintaining discriminative templates through composite-score-driven FIFO renewal with configurable capacity $N_f$. This integrated pipeline achieves real-time processing while enabling metric-aware matching and persistent identity preservation essential for occlusion-resilient tracking across variable scenarios.

3.3.1. Object Detection Framework

The detection system employs a multi-scale convolutional architecture integrated with cascaded prediction heads for precise human target localization. The framework incorporates dual filtering mechanisms to ensure operational reliability: confidence thresholding (retaining proposals with detection scores $c_i > 0.5$) and geometric constraints (discarding regions with abnormal aspect ratios, i.e., $w/h \notin [0.2, 5]$). The detector generates candidate bounding box outputs formalized as $D_t = \{(x_i, y_i, w_i, h_i, c_i)\}$, where $(x, y)$ denotes the center coordinates, $(w, h)$ the box dimensions, and $c$ the detection confidence score.
The selection of object detection frameworks was rigorously evaluated against four criteria: computational efficiency, detection accuracy, deployment flexibility, and real-time performance. The YOLO [22] series was ultimately adopted due to its optimal balance between latency-sensitive operations and precision-critical detection tasks.
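The dual filtering step can be expressed compactly; the sketch below applies the confidence and aspect-ratio checks to raw detector outputs. Function and parameter names are illustrative, with thresholds taken from the text above.

```python
def filter_detections(raw_boxes, conf_thresh=0.5, ar_range=(0.2, 5.0)):
    """Keep candidates passing confidence and geometric (aspect-ratio) checks.

    raw_boxes: iterable of (x, y, w, h, c) tuples as produced by the detector.
    """
    kept = []
    for (x, y, w, h, c) in raw_boxes:
        if c <= conf_thresh or h <= 0:
            continue                      # low confidence or degenerate box
        if not (ar_range[0] <= w / h <= ar_range[1]):
            continue                      # abnormal aspect ratio
        kept.append((x, y, w, h, c))
    return kept
```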

3.3.2. Deep Feature Representation

The feature extraction framework employs MobileNetV2's [21] depthwise separable convolution architecture to achieve efficient discriminative feature learning. Spatial normalization is accomplished through affine transformations that align input regions to canonical perspectives, followed by an embedding projection that generates L2-normalized 1280-dimensional feature vectors ($f \in \mathbb{R}^{1280}$). Adversarial training with illumination perturbation samples enhances robustness to lighting variations by minimizing feature-space sensitivity to controlled brightness fluctuations. The inverted residual structure with linear bottlenecks enables effective feature reuse while maintaining computational efficiency, particularly through expansion ratios of six in critical layers. The feature mapping process is formalized as
$$f_i = \Phi\left( I(B_i); \theta \right), \quad \text{where } \left\| f_i \right\|_2 = 1$$
where $\Phi(\cdot\,;\theta)$ denotes the parametric feature extractor and $I(B_i)$ represents the image region within the bounding box $B_i$.
While our implementation utilizes MobileNetV2 for its optimal balance between accuracy and latency, the modular design permits substitution with alternative architectures, including EfficientNet [23], SqueezeNet [24], and GhostNet [25]. These candidate networks could be integrated through the standardized feature interface while maintaining system compatibility.
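For illustration, the embedding step can be sketched with torchvision's pretrained MobileNetV2 (torchvision >= 0.13 API): the classifier head is dropped, the final 1280-channel feature map is global-average-pooled, and the result is L2-normalized. This omits the affine alignment and illumination-perturbation training described above and is not the authors' exact pipeline.

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms

# MobileNetV2 backbone without the classifier head (outputs 1280 channels).
backbone = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT).features.eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),                                   # HWC uint8 -> CHW float
    transforms.Resize((224, 224), antialias=True),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_feature(crop_rgb):
    """crop_rgb: H x W x 3 uint8 image region inside a candidate bounding box."""
    x = preprocess(crop_rgb).unsqueeze(0)                    # 1 x 3 x 224 x 224
    fmap = backbone(x)                                       # 1 x 1280 x 7 x 7
    f = fmap.mean(dim=(2, 3)).squeeze(0)                     # global average pool
    return F.normalize(f, p=2, dim=0)                        # ||f||_2 = 1
```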

3.3.3. Quality-Aware Feature Repository

The system maintains a quality-aware feature repository $\mathcal{F} \in \mathbb{R}^{N_f \times 1280}$ to preserve discriminative representations of the target's appearance, where $N_f$ denotes the repository capacity controlling the maximum number of stored templates. This circular buffer stores L2-normalized feature vectors extracted by the MobileNetV2 backbone [21] from Section 3.3.2.
The feature repository update mechanism uses the composite score $S_{\text{total}}$ from the multi-criteria scoring module (Section 3.4) to determine template quality. The update policy follows:
$$U(f_t) = \begin{cases} \text{Replace } f_{\min}, & \text{if } S_{\text{total}}^{t} > \min\left( S_{\text{total}}^{\mathcal{F}} \right) \\ \text{Retain}, & \text{otherwise} \end{cases}$$
where $S_{\text{total}}^{\mathcal{F}}$ denotes the historical composite scores of the stored features. The replacement process maintains a fixed repository capacity $|\mathcal{F}| = N_f$ through First-In-First-Out (FIFO) replacement, where $N_f = 20$ in our implementation. Update operations are triggered only when the composite score also exceeds a quality threshold $\tau_q$. The system prioritizes replacing the feature with the minimal historical score $\min(S_{\text{total}}^{\mathcal{F}})$, establishing a dynamic equilibrium between feature freshness and representation quality.
The feature repository maintains N f discriminative target appearance representations to enhance system stability through dynamic template management. By preserving high-fidelity features, the repository enables robust detection of target disappearance/re-emergence events while mitigating long-term appearance drift via quality-controlled template renewal. This dual mechanism of persistence and adaptability ensures continuous identity preservation under occlusion/reappearance scenarios without compromising discrimination capability.
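A possible realization of the quality-aware repository is sketched below. It follows the replace-the-weakest policy described above; the class name, the default quality-threshold value, and the dot-product similarity (valid because stored features are L2-normalized) are our assumptions.

```python
import numpy as np

class FeatureRepository:
    """Quality-aware feature bank F with capacity N_f (default 20)."""

    def __init__(self, capacity=20, quality_thresh=0.5):
        self.capacity = capacity
        self.quality_thresh = quality_thresh   # tau_q (value assumed here)
        self.features, self.scores = [], []

    def update(self, feature, score):
        """Insert a template if its composite score S_total is high enough."""
        if score < self.quality_thresh:
            return False
        if len(self.features) < self.capacity:      # fill phase
            self.features.append(feature)
            self.scores.append(score)
            return True
        worst = int(np.argmin(self.scores))
        if score > self.scores[worst]:              # replace the weakest template
            self.features[worst] = feature
            self.scores[worst] = score
            return True
        return False

    def max_similarity(self, feature):
        """Appearance score s_i: max cosine similarity to stored templates."""
        if not self.features:
            return 0.0
        return float(max(np.dot(f, feature) for f in self.features))
```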

3.4. Multi-Criteria Scoring Module

The module establishes a composite scoring mechanism to evaluate candidate bounding boxes through four complementary criteria, ensuring reliable target validation in dynamic aerial scenarios. For each candidate $B_i \in D_t$ with feature $f_i$, the total confidence score is computed as
$$S_{\text{total}}(i) = \lambda_1 c_i + \lambda_2 s_i + \lambda_3 t_i + \lambda_4 m_i$$
where $\lambda_1$–$\lambda_4$ denote criterion weights ($\sum_{k=1}^{4} \lambda_k = 1$).
The detection confidence component $c_i$ directly adopts the detector's output score from $D_t$, enhanced through sigmoidal compression to emphasize high-confidence proposals while suppressing ambiguous detections.
The appearance similarity is quantified via the maximum cosine similarity between the candidate feature $f_i$ and the reference templates in repository $\mathcal{F}$:
$$s_i = \max_{f_j^{(\mathrm{ref})} \in \mathcal{F}} \; f_i^{\top} f_j^{(\mathrm{ref})}$$
The temporal consistency score $t_i \in [0, 1]$ is defined as
$$t_i = \mathrm{IoU}\left( B_i, B_{t-\tau} \right) \cdot e^{-\tau / k}$$
where $\mathrm{IoU}(B_i, B_{t-\tau})$ denotes the intersection-over-union with the tracking box from $\tau$ frames earlier, $\tau \in \mathbb{Z}^{+}$ represents the elapsed frames since the last validated tracking update, and $k$ controls the temporal reliability decay rate (empirically $k = 15$ frames). This formulation introduces an exponential decay term $e^{-\tau/k}$ to address temporal reliability degradation, where increasing $\tau$ induces progressive score attenuation. The decay constant $k$ governs the attenuation gradient; a smaller $k$ accelerates score reduction over long intervals. When $\tau > 3k$, the term $e^{-\tau/k} < 0.05$ effectively nullifies reliance on temporal consistency, mandating appearance similarity dominance. The compound metric preserves short-term spatial coherence ($\tau < k$) while preventing erroneous propagation of outdated positional priors in long-term tracking scenarios.
The motion consistency score evaluates positional alignment with the Kalman-filtered predictions from Section 3.2:
$$m_i = \frac{1}{1 + \left\| p_i - \hat{p} \right\|_2 / \sigma_m}$$
where $p_i$ denotes the candidate box center, $\hat{p}$ the predicted position, and $\sigma_m$ a distance normalization constant.
The optimal candidate $B_{\text{opt}}$ is selected through thresholded maximization:
$$B_{\text{opt}} = \arg\max_{B_i \in D_t} \; S_{\text{total}}(i) \cdot \mathbb{I}\left( c_i > 0.5 \right)$$
where $\mathbb{I}(\cdot)$ enforces the minimum detection confidence. The existence confidence of the optimal candidate is evaluated through threshold comparison:
$$\mathrm{target\_exists} = \mathbb{I}\left( S_{\text{total}}^{(\mathrm{opt})} > \tau_{\text{exist}} \right)$$
where $\tau_{\text{exist}} \in (0, 1)$ denotes the existence confirmation threshold. For high-confidence matches ($\mathrm{target\_exists} = 1$), the feature repository $\mathcal{F}$ undergoes FIFO-based template renewal using $B_{\text{opt}}$ as detailed in Section 3.3.3, while the persistence counter of the trajectory buffer $\mathcal{T}_{\text{hist}}$ is simultaneously reset to maintain temporal constraints. Conversely, when $S_{\text{total}}^{(\mathrm{opt})} \leq \tau_{\text{exist}}$, indicating potential target disappearance, the system initiates emergency protocols: (1) a complete reset of $\mathcal{T}_{\text{hist}}$ to prevent error propagation, and (2) scheduling of the next detection interval as $\Delta_{\text{frame}} = \Delta_{\min}$.
This unified scoring framework achieves robust target validation in aerial tracking scenarios through static weighting of spatial, temporal, and appearance cues, combining detection confidence, template similarity, decayed spatial consistency, and motion alignment in fixed proportions to address occlusion resilience, motion blur tolerance, and long-term tracking stability without requiring dynamic weight adaptation.
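The scoring formulas above translate directly into code. The sketch below uses the default weights reported in Section 5.1.2; the normalization scale sigma_m is not specified in the paper, so its default here is a placeholder, and the function names are ours.

```python
import math

def temporal_consistency(iou_with_last, tau_frames, k=15):
    """t_i = IoU(B_i, B_{t-tau}) * exp(-tau/k): decays as the last validated
    tracking update becomes stale."""
    return iou_with_last * math.exp(-tau_frames / k)

def motion_consistency(p_i, p_hat, sigma_m=1.0):
    """m_i = 1 / (1 + ||p_i - p_hat|| / sigma_m); p_hat from the Kalman filter.
    sigma_m is an assumed normalization constant."""
    dist = math.hypot(p_i[0] - p_hat[0], p_i[1] - p_hat[1])
    return 1.0 / (1.0 + dist / sigma_m)

def composite_score(c_i, s_i, t_i, m_i, weights=(0.4, 0.3, 0.2, 0.1)):
    """S_total(i) = lambda_1*c_i + lambda_2*s_i + lambda_3*t_i + lambda_4*m_i."""
    l1, l2, l3, l4 = weights
    return l1 * c_i + l2 * s_i + l3 * t_i + l4 * m_i
```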

3.5. Integrated Algorithmic Workflow

The proposed tracking framework operates through three cascaded decision layers, as formalized in Algorithm 1:
Algorithm 1: Enhanced tracking algorithm with existence check.
Triggered at intervals of $\Delta_{\text{frame}}$ frames, the detection and scoring layer executes
$$D_t = \left\{ B_i \;\middle|\; c_i > 0.5, \; w_i / h_i \in [0.2, 5] \right\}$$
where geometric constraints filter non-human proposals. Each candidate $B_i$ is then scored by
$$S_{\text{total}}(i) = \sum_{k=1}^{4} \lambda_k \cdot \mathrm{metric}_k$$
with $\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 = 1$. The optimal candidate $B_{\text{opt}}$ is selected through
$$B_{\text{opt}} = \arg\max_{B_i \in D_t} \; S_{\text{total}}(i) \cdot \mathbb{I}\left( c_i > 0.5 \right)$$
Target persistence is validated via threshold comparison:
$$\mathrm{target\_exists} = \mathbb{I}\left( S_{\text{total}}^{(\mathrm{opt})} > \tau_{\text{exist}} \right)$$
where $\tau_{\text{exist}}$ defines the existence confidence threshold. Negative verification triggers
$$\mathcal{T}_{\text{hist}} \leftarrow \varnothing, \qquad \Delta_{\text{frame}} \leftarrow \Delta_{\min}$$
The tracker-detector consensus check, which operates only under the $\mathrm{target\_exists} = 1$ condition, is evaluated by
$$\mathrm{IoU} = \frac{\left| B_{\text{opt}} \cap B_{\text{trk}} \right|}{\left| B_{\text{opt}} \cup B_{\text{trk}} \right|}$$
with adaptive control rules
$$\Delta_{\text{frame}} = \begin{cases} \left\lceil 10 \cdot \mathrm{IoU} \cdot k \right\rceil, & \mathrm{IoU} \geq \tau_{\text{trk}} \\ \Delta_{\min}, & \mathrm{IoU} < \tau_{\text{trk}} \end{cases}$$
where $\tau_{\text{trk}}$ establishes the tracking confidence threshold. Tracker reinitialization occurs when $\mathrm{IoU} < \tau_{\text{trk}}$.
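To show how the three layers interact, the sketch below wires the pieces into a single loop: per-frame CSR-DCF updates, a detection-plus-scoring pass every $\Delta_{\text{frame}}$ frames, an existence check, and IoU-driven interval control. The helper objects (detect_and_score, make_tracker, kf, trajectory) stand for the hypothetical components sketched in the previous sections, and the box convention assumed here is (x, y, w, h) with a top-left origin; this is an illustrative reconstruction, not the authors' code.

```python
import math

def iou(a, b):
    """Axis-aligned IoU for boxes (x, y, w, h) with (x, y) the top-left corner."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def tracking_loop(frames, init_bbox, detect_and_score, make_tracker, kf, trajectory,
                  tau_exist=0.5, tau_trk=0.6, delta_min=10, k=5):
    """Closed-loop workflow sketch: tracking, scheduled detection, existence
    verification, and adaptive detection-interval control."""
    tracker = make_tracker(frames[0], init_bbox)
    bbox, delta_frame = tuple(init_bbox), delta_min
    for t, frame in enumerate(frames):
        ok, trk_box = tracker.update(frame)          # cheap per-frame tracking
        if ok:
            bbox = trk_box
        kf.predict()
        if t % delta_frame == 0:                     # scheduled detection pass
            b_opt, s_opt = detect_and_score(frame)   # best candidate and its S_total
            if b_opt is None or s_opt <= tau_exist:  # target presumed absent
                trajectory.reset()
                delta_frame = delta_min
            else:
                overlap = iou(b_opt, bbox)
                if overlap < tau_trk:                # drift: reinitialize tracker
                    tracker = make_tracker(frame, b_opt)
                    bbox = tuple(b_opt)
                    delta_frame = delta_min
                else:                                # confident: stretch the interval
                    delta_frame = math.ceil(10 * overlap * k)
        cx, cy = bbox[0] + bbox[2] / 2, bbox[1] + bbox[3] / 2
        trajectory.push(cx, cy)
        kf.update((cx, cy))
        yield bbox
```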

4. Implementation

We deployed our tracking algorithm within a closed-loop aerial tracking pipeline (Figure 2 and Figure 3) designed for autonomous target pursuit. Conceptually, this pipeline
  • Captures real-time video via UAV-mounted cameras.
  • Processes frames on edge nodes to execute tracking algorithms.
  • Generates dynamic control commands for UAV trajectory adjustment.
This section details the hardware implementation and quantifies the system’s latency characteristics across these stages.

4.1. Hardware Platform

The system's hardware architecture comprises three key components, shown in Figure 4. The UAV transmits video streams via the RTSP protocol to a custom decoder, which sends processed YUV data through WebSocket to the edge server. Target coordinates (x, y, w, h) from the detection algorithm generate three-axis control commands: horizontal adjustment uses the bounding box center deviation, longitudinal control calculates compensation from target area changes, while altitude remains fixed for safety. Commands are queued and dispatched via a dedicated WebSocket thread. Control communication is implemented through a DJI MSDK-based Android app, shown in Figure 5.
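To illustrate how the tracked box is mapped to flight commands, the sketch below implements a simple proportional rule: yaw from horizontal center deviation, forward speed from the change in apparent target size, and a fixed altitude. The gains, the reference area ratio, and the function name are hypothetical; they are not taken from the paper or from the DJI MSDK.

```python
def compute_control(bbox, frame_w, frame_h, ref_area_ratio=0.05,
                    k_yaw=1.0, k_forward=2.0):
    """Map a tracked box (x, y, w, h) to (yaw_rate, forward_speed, vertical_speed)."""
    x, y, w, h = bbox
    cx = x + w / 2.0
    # Horizontal: steer toward the box center (normalized deviation in [-1, 1]).
    yaw_rate = k_yaw * (cx - frame_w / 2.0) / (frame_w / 2.0)
    # Longitudinal: hold the target's apparent size near a reference ratio;
    # a smaller box means the target is farther away, so speed up.
    area_ratio = (w * h) / float(frame_w * frame_h)
    forward_speed = k_forward * (ref_area_ratio - area_ratio) / ref_area_ratio
    # Altitude is held fixed for safety.
    vertical_speed = 0.0
    return yaw_rate, forward_speed, vertical_speed
```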

4.1.1. UAV Platform

The DJI Mini 3 UAV (SZ DJI Technology Co., Ltd., Shenzhen, China) served as the primary aerial sensing platform, equipped with a 1/1.3-inch CMOS sensor supporting 4K/30 fps video acquisition. Configured to stream 1080p (1920 × 1080) YUV 4:2:0 video via its three-axis mechanical gimbal, the system achieves stabilized dynamic scene capture. A hybrid positioning system integrating dual-frequency GPS and a six-axis IMU provided geodetic coordinates under the WGS84 datum with ±1.5 m horizontal positioning accuracy.

4.1.2. Remote Controller

The remote control system consisted of a DJI RC231 controller and an Android application developed with DJI MSDK. The RC231 received the UAV’s real-time 1080p@30fps video feed via OcuSync 2.0 and transmitted the video stream to the Android terminal through USB-C. The application employed the WebSocket protocol to send compressed video frames to the edge server, which performed tracking analysis and returned control commands (e.g., yaw adjustment, speed modulation). These commands are executed on the UAV through DJI-specific APIs like sendVirtualStickCommand.

4.1.3. Edge Computing Unit

The edge computing platform utilized the NVIDIA Jetson Orin Developer Kit (NVIDIA Corporation, Santa Clara, CA, USA) running Ubuntu 20.04.6 LTS (Focal Fossa) with Linux kernel 5.10.65. Its heterogeneous architecture combines a 12-core ARM Cortex-A78AE CPU @ 2.0 GHz with a 2048-core Ampere GPU delivering over 4 TFLOPS of computational capacity, supported by 32 GB of high-bandwidth LPDDR5 RAM (204 GB/s) and 64 GB of eMMC storage. The software stack utilized CUDA 11.4 for GPU acceleration, OpenCV 4.5.5 for computer vision routines, and Python 3.8.10 for algorithmic implementation.

4.2. Latency Characteristics

The end-to-end latency stems from three physical processes as detailed in Table 1.
The round-trip latency of the control pipeline totals 764 ms ± 107 ms, decomposed into three sequential stages. Video acquisition contributes 183 ms ± 27 ms for sensor readout and OcuSync transmission. Data processing introduces 50 ms ± 12 ms for YUV 4:2:0 to RGB conversion, while edge communication accounts for 154 ms ± 58 ms, dominated by network variability. The total uncertainty derives from root-sum-of-squares (RSS) propagation of the individual stage deviations.

5. Evaluation

We evaluated the performance of the proposed tracking-detection-scoring framework, focusing on filter-based trackers. These conventional trackers are enhanced with low computational overhead to enable effective real-time UAV deployment. Experimental analysis follows.

5.1. Experimental Setup

This subsection systematically describes the benchmark datasets and algorithmic parameter configurations, with hardware specifications detailed in Section 4.1.3.

5.1.1. Datasets

VOT-ST2019 [26] contains 60 short-term sequences annotated with rotated bounding boxes, explicitly incorporating six critical UAV challenges: occlusion, motion blur, viewpoint changes, scale variations, target disappearance/reappearance, and illumination changes.
OTB100 [27] provides 100 long-term sequences with axis-aligned bounding boxes and 11 challenge tags (e.g., scale and illumination changes). Its extended temporal span (500+ frames per sequence) validates tracking consistency under simulated UAV communication delays (200–500 ms).

5.1.2. Parameter Settings

The algorithmic hyperparameters were empirically validated on the benchmark datasets, with critical thresholds configured as follows. The threshold $\tau_{\text{exist}} = 0.5$ governs target validation, consistent with tracking standards where IoU > 0.4 typically indicates correct localization. The threshold $\tau_{\text{trk}} = 0.6$ filters low-quality trajectories, enforcing strict positioning to keep targets near bounding box centroids for stable tracking.
A sliding temporal window $\mathcal{T}_{\text{hist}}$ spanning seven consecutive frames enforces short-term motion consistency, and the feature repository $\mathcal{F}$ maintains 20 prototypical feature vectors through FIFO replacement to optimize the memory-representation trade-off. The sliding window requires strong temporal correlation and hence discards outdated motion data, while the feature repository stores long-term high-confidence features to mitigate appearance drift; its size is constrained to avoid excessive computational burden.
For the composite scoring metric in Equation (13), grid search optimization yielded the following weight coefficients: $\lambda_1 = 0.4$ for the detection confidence component $c_i$, $\lambda_2 = 0.3$ for appearance similarity $s_i$, $\lambda_3 = 0.2$ for temporal consistency $t_i$, and $\lambda_4 = 0.1$ for motion consistency $m_i$. The optimal parameter settings vary across datasets and scenarios; the values provided represent empirically determined defaults that typically yielded the best overall accuracy in our experimental evaluations.
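For reference, the hyperparameters reported in this subsection and in Section 3 can be collected into a single configuration, as sketched below; the dictionary keys are our naming, and sigma_m is omitted because the paper does not report it.

```python
# Consolidated defaults from Sections 3 and 5.1.2 (key names are illustrative).
CONFIG = {
    "tau_exist": 0.5,        # existence confirmation threshold
    "tau_trk": 0.6,          # tracking confidence / reinitialization threshold
    "delta_min": 10,         # minimum detection interval (frames)
    "interval_k": 5,         # base interval multiplier in the Delta_frame rule
    "decay_k": 15,           # temporal-consistency decay constant (frames)
    "hist_len": 7,           # sliding trajectory window T_hist (frames)
    "repo_capacity": 20,     # feature repository size N_f
    "weights": (0.4, 0.3, 0.2, 0.1),   # (lambda_1, lambda_2, lambda_3, lambda_4)
}
```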

5.2. Evaluation Metrics

Expected Average Overlap (EAO): Quantifies the expected average intersection-over-union (IoU) between predicted and ground-truth bounding boxes over the entire sequence, measuring long-term tracking robustness.
Failure Rate: Defined as the ratio of frames with IoU < 0.3. The absence of reset protocols makes this metric more rigorous, as temporary tracking failures cannot be artificially recovered.
Frames Per Second (FPS): The FPS quantifies the computational throughput of tracking algorithms by measuring processed frames per unit time.
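The failure rate and FPS metrics are straightforward to compute; a minimal sketch is shown below (EAO follows the VOT protocol and is omitted here). Function names are illustrative.

```python
import time

def failure_rate(ious, thresh=0.3):
    """Fraction of frames whose IoU with ground truth falls below the threshold."""
    return sum(1 for v in ious if v < thresh) / max(len(ious), 1)

def measure_fps(process_frame, frames):
    """Throughput: processed frames per second of wall-clock time."""
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    return len(frames) / (time.perf_counter() - start)
```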

5.3. Quantitative Comparison

To rigorously evaluate our enhancement framework's generalization capability, we conducted cross-dataset validations on both the VOT-ST2019 [26] and OTB100 [27] benchmarks, applying our methodology to four baseline trackers: CSR-DCF [17], the Kernelized Correlation Filter (KCF) [28], the Multiple Instance Learning tracker (MIL) [29], and Generic Object Tracking Using Regression Networks (GOTURN) [30]. The following analysis systematically quantifies performance improvements across different tracking paradigms.

5.3.1. Experimental Analysis on VOT-ST2019

Our framework achieved a 27.94% EAO gain and 10.39% failure reduction on VOT-ST2019 by dynamically weighting detection-tracking outputs and integrating Kalman-filtered motion vectors. This enabled sustained 23–95 FPS tracking during occlusions. The success plots in Figure 6 reveal critical insights into tracking precision enhancement under our proposed methodology.
The success rate curves exhibit systematic improvements across all overlap thresholds (0.2–0.8), with particularly notable gains in high-precision regimes (>0.5 IoU). For the improved CSR-DCF, the success rate at an IoU threshold of 0.3 increases from 45.5% to 53.1% (a relative improvement of 16.7%), indicating an enhanced capability to maintain precise target localization. The performance enhancement stems from the synergistic integration between the detection-feature extraction module and the multi-criteria scoring architecture. The detection module employs categorical identification to discern target classes, while the feature extraction submodule generates discriminative embeddings through deep representation learning. These visual signatures are subsequently combined with motion consistency metrics and temporal correlation analyses in our multi-dimensional scoring module (Section 3.4) to compute composite confidence scores.
Our experimental validation on VOT-ST2019 (Table 2) demonstrates statistically significant improvements in both Expected Average Overlap (EAO gains of 19.17–60.47%) and failure rate (reductions of 3.39–22.04%), while incurring minimal frame-rate penalties of merely 0.64% to 8.88% across different tracker configurations. The framework with CSR-DCF achieves these advancements while maintaining real-time operational capability (>23 FPS), establishing an optimal trade-off between tracking precision, computational efficiency, and long-term robustness in aerial surveillance scenarios.

5.3.2. Experimental Analysis on OTB100

As demonstrated in Figure 7 and Table 3, our framework exhibits systematic performance improvements on the OTB100 benchmark. Experimental results reveal significant enhancements in Expected Average Overlap (EAO) ranging from 5.88% to 62.39% across improved trackers. Particularly noteworthy is the deep learning-based GOTURN tracker, which achieves a remarkable 62.39% EAO improvement (0.117 → 0.190), attributed to the synergistic optimization between motion consistency scoring mechanisms and deep feature representations. This enhancement maintains practical applicability through controlled frame rate reduction of merely 5.14% via dynamic detection scheduling, while simultaneously demonstrating the framework’s compatibility with deep learning paradigms.
Crucially, all enhanced algorithms maintain real-time operation (>14 FPS), with CSR-DCF and KCF preserving deployment feasibility at 30.83 FPS and 95.89 FPS, respectively, while achieving their respective EAO improvements. These results collectively validate the framework’s capability to enhance both traditional and deep learning-based tracking paradigms while maintaining operational practicality for aerial platforms.

5.4. Visualizing Algorithmic Gains

This section presents a visual comparative analysis conducted on the VOT-ST2019 dataset [26] to demonstrate our framework’s tracking enhancement capabilities. The evaluation focuses on two representative sequences exhibiting distinct challenges.

5.4.1. Pedestrian Interaction Scenario (Girl Sequence)

Figure 8 illustrates tracking performance under occlusion and scale variation. Our method exhibits significantly enhanced target re-acquisition capability during occlusion events compared to baseline methods. This performance advantage is empirically demonstrated through the sequential frames presented in Figure 8. When a white-clad male completely occludes the target (girl) in the test sequence, the baseline CSR-DCF method erroneously locks onto the distractor, as evidenced by bounding box drift toward the occluder—a failure attributed to insufficient discriminative feature persistence. Our framework addresses occlusion by maintaining a high-fidelity feature bank through discriminative scoring. Upon target loss, the detection mechanism rapidly identifies tracking inconsistencies and initiates re-detection to recover the true target. This architectural integration enables real-time identification of tracker failure states through continuous confidence metric analysis. Upon occlusion resolution (target fully re-emerging at Frame 121), the system demonstrates rapid recovery capability, achieving correct target reacquisition within 5 frames (Frame 126), thereby confirming the efficacy of our reinitialization protocol.

5.4.2. Dynamic Sports Scenario (iceskater2 Sequence)

As illustrated in Figure 9, our enhanced tracker maintains exceptional motion consistency during rapid rotations while preserving positional accuracy under high-speed motion. Comparative analysis reveals critical improvements in positional precision under high-speed motion, target retention under strict overlap constraints, and failure recovery robustness during abrupt motion discontinuities. This capability is rigorously validated in rotational entanglement scenarios where two ice-skaters perform prolonged embracing spins—under sustained heavy occlusion conditions, our enhanced CSR-DCF maintains continuous tracking of the target female skater, while the baseline method catastrophically drifts to the distractor. Crucially, during separation phases, our system demonstrates discriminative identification precision, successfully reacquiring the designated target through quality-driven feature prioritization as detailed in Section 3.3.3. This discriminative capacity originates from our multi-dimensional scoring mechanism that adaptively weights high-fidelity features while suppressing transient interference patterns during repository updates.

6. Conclusions

Our framework achieves a 27.94% EAO gain and a 10.39% failure rate reduction (Table 2 and Table 3) through three core innovations. First, adaptive detection-tracking coordination enables automatic failure recovery with <10% latency overhead: detection modules identify tracking failures to trigger target re-acquisition, while tracking quality-regulated detection frequency optimizes computational efficiency. Second, motion-aware spatiotemporal modeling (Section 3.2) maintains continuity during occlusions by preserving motion cues for predictive position estimation. Third, quality-driven feature banks mitigate long-term appearance drift through discriminative template management; high-fidelity templates preserve multi-angle representations and enable continuous temporal renewal, as illustrated in Section 5.4.2.
Experimental validation confirms our framework's capability to enhance various trackers (correlation filter-based and deep learning-based) across challenging aerial scenarios. Real-time performance varies (23–30 FPS) due to computational surges during target re-acquisition, triggered by tracking interruptions in visually distorted scenes. Future work will extend this architecture to multi-target scenarios and integrate domain adaptation mechanisms for adverse weather operations.

Author Contributions

Conceptualization, Y.W.; formal analysis, Y.W.; investigation, Y.W.; methodology, Y.W.; software, Y.W. and J.H.; supervision, H.H. and Z.Z.; validation, Y.W. and D.H.; writing—original draft, Y.W.; writing—review and editing, H.H. and Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 62372094) and the Natural Science Foundation of Sichuan Province (2025YFHZ0197).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. De Smedt, F.; Hulens, D.; Goedeme, T. On-Board Real-Time Tracking of Pedestrians on a UAV. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Boston, MA, USA, 7–12 June 2015. [Google Scholar]
  2. Scherer, J.; Yahyanejad, S.; Hayat, S.; Yanmaz, E.; Andre, T.; Khan, A.; Vukadinovic, V.; Bettstetter, C.; Hellwagner, H.; Rinner, B. An Autonomous Multi-UAV System for Search and Rescue. In Proceedings of the First Workshop on Micro Aerial Vehicle Networks, Systems, and Applications for Civilian Use, Florence, Italy, 18 May 2015; DroNet ’15. Association for Computing Machinery: New York, NY, USA, 2015; pp. 33–38. [Google Scholar] [CrossRef]
  3. Hao, J.; Zhou, Y.; Zhang, G.; Lv, Q.; Wu, Q. A Review of Target Tracking Algorithm Based on UAV. In Proceedings of the 2018 IEEE International Conference on Cyborg and Bionic Systems (CBS), Shenzhen, China, 25–27 October 2018; pp. 328–333. [Google Scholar] [CrossRef]
  4. Yang, C.; Yang, M.; Li, H.; Jiang, L.; Suo, X.; Mao, L.; Meng, W.; Li, Z. A survey on soccer player detection and tracking with videos. Vis. Comput. 2024, 41, 815–829. [Google Scholar] [CrossRef]
  5. Carli, R.; Cavone, G.; Epicoco, N.; Di Ferdinando, M.; Scarabaggio, P.; Dotoli, M. Consensus-Based Algorithms for Controlling Swarms of Unmanned Aerial Vehicles. In Ad-Hoc, Mobile, and Wireless Networks: 19th International Conference on Ad-Hoc Networks and Wireless, ADHOC-NOW 2020, Bari, Italy, 19–21 October 2020; Proceedings; Springer: Berlin/Heidelberg, Germany, 2020; pp. 84–99. [Google Scholar] [CrossRef]
  6. Yang, J.; Tang, W.; Ding, Z. Long-Term Target Tracking of UAVs Based on Kernelized Correlation Filter. Mathematics 2021, 9, 3006. [Google Scholar] [CrossRef]
  7. Sun, N.; Zhao, J.; Shi, Q.; Liu, C.; Liu, P. Moving Target Tracking by Unmanned Aerial Vehicle: A Survey and Taxonomy. IEEE Trans. Ind. Inform. 2024, 20, 7056–7068. [Google Scholar] [CrossRef]
  8. Fu, C.; Ye, J.; Xu, J.; He, Y.; Lin, F. Disruptor-Aware Interval-Based Response Inconsistency for Correlation Filters in Real-Time Aerial Tracking. IEEE Trans. Geosci. Remote Sens. 2021, 59, 6301–6313. [Google Scholar] [CrossRef]
  9. Zhang, F.; Ma, S.; Qiu, Z.; Qi, T. Learning target-aware background-suppressed correlation filters with dual regression for real-time UAV tracking. Signal Process. 2022, 191, 108352. [Google Scholar] [CrossRef]
  10. Bertinetto, L.; Valmadre, J.; Henriques, J.F.; Vedaldi, A.; Torr, P.H.S. Fully-Convolutional Siamese Networks for Object Tracking. In Proceedings of the Computer Vision—ECCV 2016 Workshops, Amsterdam, The Netherlands, 8–10 and 15–16 October 2016; Hua, G., Jégou, H., Eds.; Springer: Cham, Switzerland, 2016; pp. 850–865. [Google Scholar]
  11. Sun, X.; Wang, Q.; Xie, F.; Quan, Z.; Wang, W.; Wang, H.; Yao, Y.; Yang, W.; Suzuki, S. Siamese Transformer Network: Building an autonomous real-time target tracking system for UAV. J. Syst. Archit. 2022, 130, 102675. [Google Scholar] [CrossRef]
  12. Marvasti-Zadeh, S.M.; Cheng, L.; Ghanei-Yakhdan, H.; Kasaei, S. Deep Learning for Visual Tracking: A Comprehensive Survey. IEEE Trans. Intell. Transp. Syst. 2022, 23, 3943–3968. [Google Scholar] [CrossRef]
  13. Huang, L.; Zhao, X.; Huang, K. GlobalTrack: A Simple and Strong Baseline for Long-Term Tracking. Proc. AAAI Conf. Artif. Intell. 2020, 34, 11037–11044. [Google Scholar] [CrossRef]
  14. Wu, H.; Yang, X.; Yang, Y.; Liu, G. Flow Guided Short-Term Trackers with Cascade Detection for Long-Term Tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Seoul, Republic of Korea, 27–28 October 2019. [Google Scholar]
  15. Nam, H.; Han, B. Learning Multi-Domain Convolutional Neural Networks for Visual Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  16. Li, B.; Wu, W.; Wang, Q.; Zhang, F.; Xing, J.; Yan, J. SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  17. Lukezic, A.; Vojir, T.; Cehovin Zajc, L.; Matas, J.; Kristan, M. Discriminative Correlation Filter with Channel and Spatial Reliability. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  18. Kristan, M.; Matas, J.; Leonardis, A.; Felsberg, M.; Cehovin, L.; Fernandez, G.; Vojir, T.; Hager, G.; Nebehay, G.; Pflugfelder, R. The Visual Object Tracking VOT2015 Challenge Results. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, Santiago, Chile, 7–13 December 2015. [Google Scholar]
  19. Wu, Y.; Lim, J.; Yang, M.H. Online Object Tracking: A Benchmark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, 23–28 June 2013. [Google Scholar]
  20. Kalman, R.E. A New Approach to Linear Filtering and Prediction Problems. J. Basic Eng. 1960, 82, 35–45. [Google Scholar] [CrossRef]
  21. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  22. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  23. Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Proceedings of Machine Learning Research. Chaudhuri, K., Salakhutdinov, R., Eds.; PMLR: New York, NY, USA, 2019; Volume 97, pp. 6105–6114. [Google Scholar]
  24. Koonce, B. SqueezeNet. In Convolutional Neural Networks with Swift for Tensorflow: Image Recognition and Dataset Categorization; Apress: Berkeley, CA, USA, 2021; pp. 73–85. [Google Scholar] [CrossRef]
  25. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features From Cheap Operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020. [Google Scholar]
  26. Kristan, M.; Matas, J.; Leonardis, A.; Felsberg, M.; Pflugfelder, R.; Kamarainen, J.K.; Cehovin Zajc, L.; Drbohlav, O.; Lukezic, A.; Berg, A.; et al. The Seventh Visual Object Tracking VOT2019 Challenge Results. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Seoul, Republic of Korea, 27–28 October 2019. [Google Scholar]
  27. Wu, Y.; Lim, J.; Yang, M.H. Object Tracking Benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1834–1848. [Google Scholar] [CrossRef] [PubMed]
  28. Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. High-Speed Tracking with Kernelized Correlation Filters. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 583–596. [Google Scholar] [CrossRef] [PubMed]
  29. Babenko, B.; Yang, M.H.; Belongie, S. Visual tracking with online Multiple Instance Learning. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 983–990. [Google Scholar] [CrossRef]
  30. Held, D.; Thrun, S.; Savarese, S. Learning to Track at 100 FPS with Deep Regression Networks. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer: Cham, Switzerland, 2016; pp. 749–765. [Google Scholar]
Figure 1. Closed-loop real-time detection-tracking-control system architecture.
Figure 2. Real-time UAV visual tracking.
Figure 3. Implementation.
Figure 4. System hardware components.
Figure 5. Custom Android application developed using DJI Mobile SDK (MSDK).
Figure 6. Success plots of trackers with IoU metrics on VOT-ST2019 [26].
Figure 7. Performance evaluation on OTB100 benchmark. (a) Precision plot comparison on OTB100 dataset. (b) Success plot comparison on OTB100 dataset.
Figure 8. Tracking performance on VOT-ST2019 girl sequence: comparative analysis of CSR-DCF and enhanced CSR-DCF. Green (ground truth), red (CSR-DCF-improved), and blue (baseline CSR-DCF).
Figure 9. Tracking performance on VOT-ST2019 iceskater2 sequence: comparative analysis of CSR-DCF and enhanced CSR-DCF. Green (ground truth), red (CSR-DCF-improved), and blue (baseline CSR-DCF).
Table 1. Latency profiling of operational pipeline.

Stage | Latency (ms) | Uncertainty (±ms)
Video Acquisition | 183 | 27
Data Processing | 50 | 12
Edge Communication | 154 | 58
Total (Round-Trip) | 764 | 107
Table 2. Performance comparison on VOT-ST2019 [26] with improvement rates.

Tracker | EAO (Base) | EAO (Imp.) | EAO Δ (%) | Failure Rate (Base) | Failure Rate (Imp.) | Failure Rate Δ (%) | FPS (Base) | FPS (Imp.) | FPS Δ (%)
CSR-DCF | 0.292 | 0.348 | +19.17% | 0.545 | 0.469 | −13.94% | 24.98 | 23.45 | −6.12%
KCF | 0.117 | 0.136 | +16.24% | 0.827 | 0.799 | −3.39% | 90.25 | 89.67 | −0.64%
GOTURN | 0.095 | 0.123 | +29.47% | 0.867 | 0.805 | −7.15% | 6.16 | 6.07 | −1.46%
MIL | 0.172 | 0.276 | +60.47% | 0.744 | 0.580 | −22.04% | 15.32 | 13.96 | −8.88%
Table 3. Performance comparison on OTB100 [27] with improvement rates.

Tracker | EAO (Base) | EAO (Imp.) | EAO Δ (%) | Failure Rate (Base) | Failure Rate (Imp.) | Failure Rate Δ (%) | FPS (Base) | FPS (Imp.) | FPS Δ (%)
CSR-DCF | 0.510 | 0.540 | +5.88% | 0.253 | 0.231 | −8.70% | 34.88 | 30.83 | −11.61%
KCF | 0.251 | 0.292 | +16.33% | 0.655 | 0.586 | −10.53% | 96.78 | 95.89 | −0.92%
GOTURN | 0.117 | 0.190 | +62.39% | 0.764 | 0.724 | −5.24% | 6.22 | 5.90 | −5.14%
MIL | 0.348 | 0.395 | +13.51% | 0.445 | 0.391 | −12.13% | 16.04 | 14.39 | −10.29%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
