Search Results (89)

Search Parameters:
Keywords = Siamese tracker

19 pages, 1563 KiB  
Article
Small Object Tracking in LiDAR Point Clouds: Learning the Target-Awareness Prototype and Fine-Grained Search Region
by Shengjing Tian, Yinan Han, Xiantong Zhao and Xiuping Liu
Sensors 2025, 25(12), 3633; https://doi.org/10.3390/s25123633 - 10 Jun 2025
Viewed by 686
Abstract
Light Detection and Ranging (LiDAR) point clouds are an essential perception modality for artificial intelligence systems like autonomous driving and robotics, where the ubiquity of small objects in real-world scenarios substantially challenges the visual tracking of small targets amidst the vastness of point cloud data. Current methods predominantly focus on developing universal frameworks for general object categories, often sidelining the persistent difficulties associated with small objects. These challenges stem from a scarcity of foreground points and a low tolerance for disturbances. To this end, we propose a deep neural network framework that trains a Siamese network for feature extraction and innovatively incorporates two pivotal modules: the target-awareness prototype mining (TAPM) module and the regional grid subdivision (RGS) module. The TAPM module utilizes the reconstruction mechanism of the masked auto-encoder to distill prototypes within the feature space, thereby enhancing the salience of foreground points and aiding in the precise localization of small objects. To heighten the tolerance of disturbances in feature maps, the RGS module is devised to retrieve detailed features of the search area, capitalizing on Vision Transformer and pixel shuffle technologies. Furthermore, beyond standard experimental configurations, we have meticulously crafted scaling experiments to assess the robustness of various trackers when dealing with small objects. Comprehensive evaluations show our method achieves a mean Success of 64.9% and 60.4% under original and scaled settings, outperforming benchmarks by +3.6% and +5.4%, respectively. Full article
(This article belongs to the Special Issue AI-Based Computer Vision Sensors & Systems)
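The RGS module above retrieves fine-grained search-region detail via pixel shuffle. That operation is the standard sub-pixel rearrangement of channel-packed features into a higher-resolution grid; a minimal numpy sketch (shapes and names are illustrative, not taken from the paper):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) feature map into (C, H*r, W*r).

    This is the standard sub-pixel upsampling used to recover a
    fine-grained spatial grid from channel-packed features.
    """
    c_r2, h, w = x.shape
    assert c_r2 % (r * r) == 0, "channel count must be divisible by r^2"
    c = c_r2 // (r * r)
    # (C, r, r, H, W) -> (C, H, r, W, r) -> (C, H*r, W*r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)
    return x.reshape(c, h * r, w * r)

feat = np.arange(16, dtype=np.float32).reshape(4, 2, 2)  # C*r^2 = 4, r = 2
up = pixel_shuffle(feat, 2)
print(up.shape)  # (1, 4, 4)
```

Each output 2x2 cell interleaves one value from each of the four input channels, so channel capacity is traded for spatial resolution without losing information.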

30 pages, 9702 KiB  
Article
SiamCTCA: Cross-Temporal Correlation Aggregation Siamese Network for UAV Tracking
by Qiaochu Wang, Faxue Liu, Bao Zhang, Jinghong Liu, Fang Xu and Yulong Wang
Drones 2025, 9(4), 294; https://doi.org/10.3390/drones9040294 - 10 Apr 2025
Viewed by 651
Abstract
In aerial target-tracking research, complex scenarios place extremely high demands on the precision and robustness of tracking algorithms. Although existing target-tracking algorithms achieve good performance in general scenarios, they all ignore, to some extent, the correlation between contextual information, and the operations applied between features exacerbate information loss, degrading precision and robustness; this is especially pronounced in UAV target tracking. In response, we propose a new lightweight Siamese-based tracker, SiamCTCA, built around an innovative cross-temporal aggregation strategy and three feature-correlation fusion networks. Within these networks, a multistage Transformer embedding achieves cross-branch information fusion through intertemporal correlation interactive vision Transformer modules, efficiently integrating features at different levels; a feed-forward residual multidimensional fusion edge mechanism reduces information loss by introducing residuals to cope with dynamic changes in the search region; and a response significance filter aggregation network suppresses the shallow-noise amplification problem of neural networks. Ablation and comparison experiments confirm that these modules are effective: the tracker exhibits excellent tracking performance at faster speeds than competing trackers, making it well suited to deployment on UAV platforms. Full article
(This article belongs to the Special Issue Detection, Identification and Tracking of UAVs and Drones)

17 pages, 12394 KiB  
Article
TensorTrack: Tensor Decomposition for Video Object Tracking
by Yuntao Gu, Pengfei Zhao, Lan Cheng, Yuanjun Guo, Haikuan Wang, Wenjun Ding and Yu Liu
Mathematics 2025, 13(4), 568; https://doi.org/10.3390/math13040568 - 8 Feb 2025
Viewed by 927
Abstract
Video Object Tracking (VOT) is a critical task in computer vision. While Siamese-based and Transformer-based trackers are widely used in VOT, they struggle to perform well on the OTB100 benchmark due to the lack of dedicated training sets. This challenge highlights the difficulty of effectively generalizing to unknown data. To address this issue, this paper proposes an innovative method that utilizes tensor decomposition, an underexplored concept in object-tracking research. By applying L1-norm tensor decomposition, video sequences are represented as four-mode tensors, and a real-time background subtraction algorithm is introduced, allowing for effective modeling of the target–background relationship and adaptation to environmental changes, leading to accurate and robust tracking. Additionally, the paper integrates an improved multi-kernel correlation filter into a single frame, locating and tracking the target by comparing the correlation between the target template and the input image. To further enhance localization precision and robustness, the paper also incorporates Tucker2 decomposition to integrate appearance and motion patterns, generating composite heatmaps. The method is evaluated on the OTB100 benchmark dataset, showing significant improvements in both performance and speed compared to traditional methods. Experimental results demonstrate that the proposed method achieves a 15.8% improvement in AUC and a ten-fold increase in speed compared to typical deep learning-based methods, providing an efficient and accurate real-time tracking solution, particularly in scenarios with similar target–background characteristics, high-speed motion, and limited target movement. Full article
(This article belongs to the Special Issue Advanced Research in Image Processing and Optimization Methods)
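The tensor machinery TensorTrack builds on — representing frames as a multi-mode tensor, unfolding it along a mode, and a Tucker2 factorization of the spatial modes — can be sketched in numpy. This is a generic truncated-HOSVD illustration under a toy 3-mode (H, W, frames) tensor, not the authors' L1-norm algorithm:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def tucker2(T, ranks):
    """Tucker2 via truncated HOSVD: factor the first two (spatial) modes
    of a 3-mode tensor, leaving the frame mode uncompressed."""
    U = []  # orthonormal factor matrices for modes 0 and 1
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        U.append(u[:, :r])
    # core = T x_0 U0^T x_1 U1^T
    core = np.einsum('hwf,hi,wj->ijf', T, U[0], U[1])
    return core, U

rng = np.random.default_rng(0)
video = rng.standard_normal((8, 8, 5))   # toy H x W x frames stack
core, (U0, U1) = tucker2(video, (4, 4))
recon = np.einsum('ijf,hi,wj->hwf', core, U0, U1)
print(core.shape)  # (4, 4, 5)
```

Projecting back through the factors gives a low-rank spatial approximation per frame, which is the sense in which such decompositions separate a smooth background from target motion.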

26 pages, 5609 KiB  
Article
DSiam-CnK: A CBAM- and KCF-Enabled Deep Siamese Region Proposal Network for Human Tracking in Dynamic and Occluded Scenes
by Xiangpeng Liu, Jianjiao Han, Yulin Peng, Qiao Liang, Kang An, Fengqin He and Yuhua Cheng
Sensors 2024, 24(24), 8176; https://doi.org/10.3390/s24248176 - 21 Dec 2024
Viewed by 889
Abstract
Despite the accuracy and robustness attained in the field of object tracking, algorithms based on Siamese neural networks often over-rely on information from the initial frame, neglecting necessary updates to the template; furthermore, in prolonged tracking situations, such methodologies encounter challenges in efficiently addressing issues such as complete occlusion or instances where the target exits the frame. To tackle these issues, this study enhances the SiamRPN algorithm by integrating the convolutional block attention module (CBAM), which enhances spatial channel attention. Additionally, it integrates the kernelized correlation filters (KCFs) for enhanced feature template representation. Building on this, we present DSiam-CnK, a Siamese neural network with dynamic template updating capabilities, facilitating adaptive adjustments in tracking strategy. The proposed algorithm is tailored to elevate the Siamese neural network’s accuracy and robustness for prolonged tracking, all the while preserving its tracking velocity. In our research, we assessed the performance on the OTB2015, VOT2018, and LaSOT datasets. Our method, when benchmarked against established trackers, including SiamRPN on OTB2015, achieved a success rate of 92.1% and a precision rate of 90.9%. On the VOT2018 dataset, it excelled, with a VOT-A (accuracy) of 46.7%, a VOT-R (robustness) of 135.3%, and a VOT-EAO (expected average overlap) of 26.4%, leading in all categories. On the LaSOT dataset, it achieved a precision of 35.3%, a normalized precision of 34.4%, and a success rate of 39%. The findings demonstrate enhanced precision in tracking performance and a notable increase in robustness with our method. Full article
(This article belongs to the Section Intelligent Sensors)
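The KCF component folded into DSiam-CnK rests on Fourier-domain correlation filtering. A minimal linear (MOSSE-style) sketch of that core step — train a ridge-regression filter on the template, correlate it with the search patch, read the displacement off the response peak. Names and the toy target are illustrative, and this omits the kernel trick the actual KCF uses:

```python
import numpy as np

def cf_response(template, search, lam=1e-2):
    """Minimal linear correlation-filter step in the Fourier domain:
    fit a filter mapping `template` to a Gaussian label via ridge
    regression, then correlate it with `search`. The response peak
    gives the predicted displacement of the target."""
    h, w = template.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    y = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * 2.0 ** 2))
    y = np.roll(y, (-cy, -cx), axis=(0, 1))       # desired peak at (0, 0)
    T, Y = np.fft.fft2(template), np.fft.fft2(y)
    H = Y * np.conj(T) / (T * np.conj(T) + lam)   # regularized filter
    return np.real(np.fft.ifft2(np.fft.fft2(search) * H))

tpl = np.zeros((32, 32))
tpl[14:18, 14:18] = 1.0                           # toy target patch
srch = np.roll(tpl, (3, 5), axis=(0, 1))          # target shifted by (3, 5)
resp = cf_response(tpl, srch)
peak = np.unravel_index(np.argmax(resp), resp.shape)
print(peak)
```

Because training and correlation are element-wise products in the Fourier domain, the whole update runs in O(HW log HW), which is why correlation filters suit online template updating.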

28 pages, 14547 KiB  
Article
A Contrastive-Augmented Memory Network for Anti-UAV Tracking in TIR Videos
by Ziming Wang, Yuxin Hu, Jianwei Yang, Guangyao Zhou, Fangjian Liu and Yuhan Liu
Remote Sens. 2024, 16(24), 4775; https://doi.org/10.3390/rs16244775 - 21 Dec 2024
Cited by 2 | Viewed by 1092
Abstract
With the development of unmanned aerial vehicle (UAV) technology, the threat of UAV intrusion is no longer negligible. Therefore, drone perception, especially anti-UAV tracking technology, has gathered considerable attention. However, both traditional Siamese and transformer-based trackers struggle in anti-UAV tasks due to the small target size, clutter backgrounds and model degradation. To alleviate these challenges, a novel contrastive-augmented memory network (CAMTracker) is proposed for anti-UAV tracking tasks in thermal infrared (TIR) videos. The proposed CAMTracker conducts tracking through a two-stage scheme, searching for possible candidates in the first stage and matching the candidates with the template for final prediction. In the first stage, an instance-guided region proposal network (IG-RPN) is employed to calculate the correlation features between the templates and the searching images and further generate candidate proposals. In the second stage, a contrastive-augmented matching module (CAM), along with a refined contrastive loss function, is designed to enhance the discrimination ability of the tracker under the instruction of contrastive learning strategy. Moreover, to avoid model degradation, an adaptive dynamic memory module (ADM) is proposed to maintain a dynamic template to cope with the feature variation of the target in long sequences. Comprehensive experiments have been conducted on the Anti-UAV410 dataset, where the proposed CAMTracker achieves the best performance compared to advanced tracking algorithms, with significant advantages on all the evaluation metrics, including at least 2.40%, 4.12%, 5.43% and 5.48% on precision, success rate, success AUC and state accuracy, respectively. Full article

17 pages, 6857 KiB  
Article
Lightweight Siamese Network with Global Correlation for Single-Object Tracking
by Yuxuan Ding and Kehua Miao
Sensors 2024, 24(24), 8171; https://doi.org/10.3390/s24248171 - 21 Dec 2024
Viewed by 1098
Abstract
Recent advancements in the field of object tracking have been notably influenced by Siamese-based trackers, which have demonstrated considerable progress in their performance and application. Researchers frequently emphasize the precision of trackers, yet they tend to neglect the associated complexity. This oversight can restrict real-time performance, rendering these trackers inadequate for specific applications. This study presents a novel lightweight Siamese network tracker, termed SiamGCN, which incorporates global feature fusion alongside a lightweight network architecture to improve tracking performance on devices with limited resources. MobileNet-V3 was chosen as the backbone network for feature extraction, with modifications made to the stride of its final layer to enhance extraction efficiency. A global correlation module, which was founded on the Transformer architecture, was developed utilizing a multi-head cross-attention mechanism. This design enhances the integration of template and search region features, thereby facilitating more precise and resilient tracking capabilities. The model underwent evaluation across four prominent tracking benchmarks: VOT2018, VOT2019, LaSOT, and TrackingNet. The results indicate that SiamGCN achieves high tracking performance while simultaneously decreasing the number of parameters and computational costs. This results in significant benefits regarding processing speed and resource utilization. Full article
(This article belongs to the Section Sensing and Imaging)
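The multi-head cross-attention at the heart of SiamGCN's global correlation module lets every search-region token attend to all template tokens. A numpy sketch with random stand-ins for the learned projection weights; all names and shapes are illustrative:

```python
import numpy as np

def cross_attention(q_feats, kv_feats, num_heads, rng):
    """One multi-head cross-attention step: queries come from the
    search region, keys/values from the template, so each search
    location aggregates template context globally.

    Projection weights are random here; in a tracker they are learned.
    """
    n_q, d = q_feats.shape
    d_h = d // num_heads
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

    def split(x, W):  # (n, d) -> (heads, n, d_h)
        return (x @ W).reshape(-1, num_heads, d_h).transpose(1, 0, 2)

    Q, K, V = split(q_feats, Wq), split(kv_feats, Wk), split(kv_feats, Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_h)  # (heads, n_q, n_kv)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)          # softmax over template
    out = attn @ V                                    # (heads, n_q, d_h)
    return out.transpose(1, 0, 2).reshape(n_q, d)

rng = np.random.default_rng(0)
search = rng.standard_normal((64, 32))    # 8x8 search tokens, d = 32
template = rng.standard_normal((16, 32))  # 4x4 template tokens
fused = cross_attention(search, template, num_heads=4, rng=rng)
print(fused.shape)  # (64, 32)
```

Unlike a sliding cross-correlation, the attention weights are input-dependent and global, which is the property the abstract credits for more resilient template/search integration.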

24 pages, 10105 KiB  
Article
SiamRhic: Improved Cross-Correlation and Ranking Head-Based Siamese Network for Object Tracking in Remote Sensing Videos
by Afeng Yang, Zhuolin Yang and Wenqing Feng
Remote Sens. 2024, 16(23), 4549; https://doi.org/10.3390/rs16234549 - 4 Dec 2024
Viewed by 1167
Abstract
Object tracking in remote sensing videos is a challenging task in computer vision. Recent advances in deep learning have sparked significant interest in tracking algorithms based on Siamese neural networks. However, many existing algorithms fail to deliver satisfactory performance in complex scenarios due to challenging conditions and limited computational resources. Thus, enhancing tracking efficiency and improving algorithm responsiveness in complex scenarios are crucial. To address tracking drift caused by similar objects and background interference in remote sensing image tracking, we propose an enhanced Siamese network based on the SiamRhic architecture, incorporating a cross-correlation and ranking head for improved object tracking. We first use convolutional neural networks for feature extraction and integrate the CBAM (Convolutional Block Attention Module) to enhance the tracker’s representational capacity, allowing it to focus more effectively on the objects. Additionally, we replace the original depth-wise cross-correlation operation with asymmetric convolution, enhancing both speed and performance. We also introduce a ranking loss to reduce the classification confidence of interference objects, addressing the mismatch between classification and regression. We validate the proposed algorithm through experiments on the OTB100, UAV123, and OOTB remote sensing datasets. Specifically, SiamRhic achieves success, normalized precision, and precision rates of 0.533, 0.786, and 0.812, respectively, on the OOTB benchmark. The OTB100 benchmark achieves a success rate of 0.670 and a precision rate of 0.892. Similarly, in the UAV123 benchmark, SiamRhic achieves a success rate of 0.621 and a precision rate of 0.823. These results demonstrate the algorithm’s high precision and success rates, highlighting its practical value. Full article
(This article belongs to the Section Remote Sensing Image Processing)
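The depth-wise cross-correlation that SiamRhic replaces with asymmetric convolution is the standard per-channel sliding correlation between template and search features. A naive numpy sketch of that baseline operation (illustrative, not the paper's code — real trackers run this as a grouped convolution on GPU):

```python
import numpy as np

def depthwise_xcorr(search, kernel):
    """Depth-wise cross-correlation: each channel of the template
    `kernel` (C, kh, kw) slides over the matching channel of the
    search feature (C, H, W), yielding a (C, H-kh+1, W-kw+1) map."""
    c, kh, kw = kernel.shape
    _, H, W = search.shape
    out = np.empty((c, H - kh + 1, W - kw + 1))
    for ch in range(c):                      # channels never mix
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[ch, i, j] = np.sum(
                    search[ch, i:i + kh, j:j + kw] * kernel[ch])
    return out

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 6, 6))    # template features
x = rng.standard_normal((8, 22, 22))  # search-region features
resp = depthwise_xcorr(x, z)
print(resp.shape)  # (8, 17, 17)
```

Keeping channels separate is what makes the operation cheap, but it also means no cross-channel interaction — the gap that cross-correlation variants like the one proposed here try to close.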

21 pages, 19996 KiB  
Article
UAV Visual Object Tracking Based on Spatio-Temporal Context
by Yongxiang He, Chuang Chao, Zhao Zhang, Hongwu Guo and Jianjun Ma
Drones 2024, 8(12), 700; https://doi.org/10.3390/drones8120700 - 22 Nov 2024
Viewed by 1432
Abstract
To balance the real-time and robustness of UAV visual tracking on a single CPU, this paper proposes an object tracker based on spatio-temporal context (STCT). STCT integrates the correlation filter and Siamese network into a unified framework and introduces the target’s motion model, enabling the tracker to adapt to target scale variations and effectively address challenges posed by rapid target motion, etc. Furthermore, a spatio-temporal regularization term based on the dynamic attention mechanism is proposed, and it is introduced into the correlation filter to suppress the aberrance of the response map. The filter solution is provided through the alternating direction method of multipliers (ADMM). In addition, to ensure efficiency, this paper proposes the average maximum response value-related energy (AMRE) for adaptive tracking state evaluation, which considers the time context of the tracking process in STCT. Experimental results show that the proposed STCT tracker can achieve a favorable balance between tracking robustness and real-time performance for UAV object tracking while running at ∼38 frames/s on a low-cost CPU. Full article
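AMRE's exact definition is the authors'; the general idea of scoring a response map's reliability is conveyed by the standard APCE (average peak-to-correlation energy) measure, sketched here in plain Python under that assumption:

```python
def apce(response):
    """Average peak-to-correlation energy: a common confidence score
    for a correlation-filter response map. High values mean one sharp
    peak (reliable tracking); low values mean a flat or multi-modal
    map. `response` is a 2-D list of floats."""
    flat = [v for row in response for v in row]
    fmax, fmin = max(flat), min(flat)
    energy = sum((v - fmin) ** 2 for v in flat) / len(flat)
    return (fmax - fmin) ** 2 / (energy + 1e-12)

sharp = [[0.0] * 5 for _ in range(5)]
sharp[2][2] = 1.0                      # one clean peak
noisy = [[0.0] * 5 for _ in range(5)]  # several near-equal distractor peaks
for r, c in [(0, 0), (1, 3), (2, 2), (3, 1), (4, 4)]:
    noisy[r][c] = 1.0
print(apce(sharp) > apce(noisy))  # True
```

A tracker can compare such a score against its running average to decide, frame by frame, whether the current estimate is trustworthy enough to update the model — the role AMRE plays in STCT's adaptive state evaluation.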

17 pages, 3344 KiB  
Article
Interval Spatio-Temporal Constraints and Pixel-Spatial Hierarchy Region Proposals for Abrupt Motion Tracking
by Daxiang Suo and Xueling Lv
Electronics 2024, 13(20), 4084; https://doi.org/10.3390/electronics13204084 - 17 Oct 2024
Viewed by 793
Abstract
The RPN-based Siamese tracker has achieved remarkable performance with real-time speed but suffers from a lack of robustness in complex motion tracking. Especially when the target comes into an abrupt motion scenario, the assumption of motion smoothness may be broken, which will further compromise the reliability of tracking results. Therefore, it is important to develop an adaptive tracker that can maintain robustness in complex motion scenarios. This paper proposes a novel tracking method based on the interval spatio-temporal constraints and a region proposal method over a pixel-spatial hierarchy. Firstly, to cope with the limitations of a fixed-constraint strategy for abrupt motion tracking, we propose a question-guided interval spatio-temporal constraint strategy. Based on the consideration of tracking status and the degree of penalty expansion, it enables the dynamic adjustment of the constraint weights, which ensures a match between response scores and true confidence values. Secondly, to guarantee the coverage of a target using candidate proposals in extreme motion scenarios, we propose a region proposal method over the pixel-spatial hierarchy. By combining visual common sense with reciprocal target-distractor information, our method implements a careful refinement of the primary proposals. Moreover, we introduce a discriminative-enhanced memory updater designed to ensure effective model adaptation. Comprehensive evaluations on five benchmark datasets: OTB100, UAV123, LaSOT, VOT2016, and VOT2018 demonstrate the superior performance of our proposed method in comparison to several state-of-the-art approaches. Full article
(This article belongs to the Special Issue Deep Perception in Autonomous Driving)

19 pages, 9450 KiB  
Article
Spatial-Temporal Contextual Aggregation Siamese Network for UAV Tracking
by Qiqi Chen, Xuan Wang, Faxue Liu, Yujia Zuo and Chenglong Liu
Drones 2024, 8(9), 433; https://doi.org/10.3390/drones8090433 - 26 Aug 2024
Viewed by 840
Abstract
In recent years, many studies have used Siamese networks (SNs) for UAV tracking. However, SNs face two problems in this setting. Firstly, their information sources are an invariable template patch and the current search frame. The static template lacks any perception of the dynamic flow of feature information, and shallow feature extraction with linear sequential mapping severely limits how expressive the mined features can be. This makes it difficult for many existing SNs to cope with the challenges of UAV tracking, such as scale variation and viewpoint change caused by changes in the UAV's height and angle, and the background clutter and occlusion caused by complex aerial backgrounds. Secondly, SN trackers for UAV tracking still struggle to extract lightweight yet effective features; a tracker with a heavyweight backbone is unwelcome given the limited computing power of the UAV platform. We therefore propose a lightweight spatial-temporal contextual Siamese tracking system for UAV tracking (SiamST). SiamST improves UAV tracking performance by augmenting horizontal spatial information and introducing vertical temporal information into the Siamese network. Specifically, a high-order multiscale spatial module extracts multiscale, long-range, high-order spatial information, and a temporal template transformer introduces temporal contextual information for dynamic template updating. Comparisons of SiamST with many state-of-the-art trackers on three UAV benchmarks show that it is both efficient and lightweight. Full article

20 pages, 17698 KiB  
Article
Contextual Enhancement–Interaction and Multi-Scale Weighted Fusion Network for Aerial Tracking
by Bo Wang, Xuan Wang, Linglong Ma, Yujia Zuo and Chenglong Liu
Drones 2024, 8(8), 343; https://doi.org/10.3390/drones8080343 - 24 Jul 2024
Cited by 1 | Viewed by 1406
Abstract
Siamese-based trackers have been widely utilized in UAV visual tracking due to their outstanding performance. However, UAV visual tracking encounters numerous challenges, such as similar targets, scale variations, and background clutter. Existing Siamese trackers face two significant issues: firstly, they rely on single-branch features, limiting their ability to achieve long-term and accurate aerial tracking. Secondly, current tracking algorithms treat multi-level similarity responses equally, making it difficult to ensure tracking accuracy in complex airborne environments. To tackle these challenges, we propose a novel UAV tracking Siamese network named the contextual enhancement–interaction and multi-scale weighted fusion network, which is designed to improve aerial tracking performance. Firstly, we designed a contextual enhancement–interaction module to improve feature representation. This module effectively facilitates the interaction between the template and search branches and strengthens the features of each branch in parallel. Specifically, a cross-attention mechanism within the module integrates the branch information effectively. The parallel Transformer-based enhancement structure improves the feature saliency significantly. Additionally, we designed an efficient multi-scale weighted fusion module that adaptively weights the correlation response maps across different feature scales. This module fully utilizes the global similarity response between the template and the search area, enhancing feature distinctiveness and improving tracking results. We conducted experiments using several state-of-the-art trackers on aerial tracking benchmarks, including DTB70, UAV123, UAV20L, and UAV123@10fps, to validate the efficacy of the proposed network. The experimental results demonstrate that our tracker performs effectively in complex aerial tracking scenarios and competes well with state-of-the-art trackers. Full article

13 pages, 2487 KiB  
Article
SiamSMN: Siamese Cross-Modality Fusion Network for Object Tracking
by Shuo Han, Lisha Gao, Yue Wu, Tian Wei, Manyu Wang and Xu Cheng
Information 2024, 15(7), 418; https://doi.org/10.3390/info15070418 - 19 Jul 2024
Viewed by 1561
Abstract
The existing Siamese trackers have achieved increasingly successful results in visual object tracking. However, the interactive fusion among multi-layer similarity maps after cross-correlation has not been fully studied in previous Siamese network-based methods. To address this issue, we propose a novel Siamese network for visual object tracking, named SiamSMN, which consists of a feature extraction network, a multi-scale fusion module, and a prediction head. First, the feature extraction network is used to extract the features of the template image and the search image, which is calculated by a depth-wise cross-correlation operation to produce multiple similarity feature maps. Second, we propose an effective multi-scale fusion module that can extract global context information for object search and learn the interdependencies between multi-level similarity maps. In addition, to further improve tracking accuracy, we design a learnable prediction head module to generate a boundary point for each side based on the coarse bounding box, which can solve the problem of inconsistent classification and regression during the tracking. Extensive experiments on four public benchmarks demonstrate that the proposed tracker has a competitive performance among other state-of-the-art trackers. Full article

16 pages, 6425 KiB  
Article
A Robust AR-DSNet Tracking Registration Method in Complex Scenarios
by Xiaomei Lei, Wenhuan Lu, Jiu Yong and Jianguo Wei
Electronics 2024, 13(14), 2807; https://doi.org/10.3390/electronics13142807 - 17 Jul 2024
Viewed by 1053
Abstract
A robust AR-DSNet (Augmented Reality method based on DSST and SiamFC networks) tracking registration method for complex scenarios is proposed to improve the ability of AR (Augmented Reality) tracking registration to distinguish the target foreground from semantically interfering background, and to address registration failures caused by similar-target drift when scale information is obtained from predicted target positions. Firstly, the pre-trained SiamFC (Siamese Fully-Convolutional) network is used to obtain the response map of a larger search area, and a threshold is set to filter out initial candidate positions of the target. Then, exploiting the DSST (Discriminative Scale Space Tracking) tracker's ability to update its template online, a new scale filter is trained from multi-scale images collected at the candidate positions to infer the target's scale change, and linear interpolation updates the correlation coefficient to determine the final tracked position from the difference between two frames. Finally, ORB (Oriented FAST and Rotated BRIEF) feature detection and matching are performed on the accurately localized target image, and the registration matrix computed from the matches overlays the virtual model onto the real scene, achieving augmentation of the real world. Simulation experiments show that in complex scenarios such as similar-object interference, target occlusion, and local deformation, the proposed AR-DSNet method completes target registration for AR 3D tracking, ensuring real-time performance while improving the robustness of the AR tracking registration algorithm. Full article
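The first stage described here — thresholding a SiamFC response map to keep initial candidate target positions — can be sketched as follows. The relative-threshold rule and all names are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def candidate_positions(response, rel_thresh=0.8):
    """First-stage candidate filtering: keep every location whose
    response exceeds `rel_thresh` times the global maximum. With
    semantic distractors the map is multi-modal, so this yields
    several candidates instead of committing to a single peak."""
    mask = response >= rel_thresh * response.max()
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(mask))]

resp = np.zeros((17, 17))
resp[4, 6] = 1.0    # true target peak
resp[12, 3] = 0.9   # similar-object distractor, above threshold
resp[8, 8] = 0.3    # background clutter, below threshold
print(candidate_positions(resp))  # [(4, 6), (12, 3)]
```

Deferring the commitment to a single position is what lets the later DSST scale-filter stage disambiguate the true target from look-alike distractors.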

16 pages, 1385 KiB  
Article
SiamDCFF: Dynamic Cascade Feature Fusion for Vision Tracking
by Jinbo Lu, Na Wu and Shuo Hu
Sensors 2024, 24(14), 4545; https://doi.org/10.3390/s24144545 - 13 Jul 2024
Viewed by 867
Abstract
Establishing an accurate and robust feature fusion mechanism is key to enhancing the tracking performance of single-object trackers based on a Siamese network. However, the output features of the depth-wise cross-correlation feature fusion module in fully convolutional trackers based on Siamese networks cannot establish global dependencies on the feature maps of a search area. This paper proposes a dynamic cascade feature fusion (DCFF) module by introducing a local feature guidance (LFG) module and dynamic attention modules (DAMs) after the depth-wise cross-correlation module to enhance the global dependency modeling capability during the feature fusion process. In this paper, a set of verification experiments is designed to investigate whether establishing global dependencies for the features output by the depth-wise cross-correlation operation can significantly improve the performance of fully convolutional trackers based on a Siamese network, providing experimental support for rational design of the structure of a dynamic cascade feature fusion module. Secondly, we integrate the dynamic cascade feature fusion module into the tracking framework based on a Siamese network, propose SiamDCFF, and evaluate it using public datasets. Compared with the baseline model, SiamDCFF demonstrated significant improvements. Full article
(This article belongs to the Section Sensing and Imaging)

17 pages, 687 KiB  
Article
Enhancing Embedded Object Tracking: A Hardware Acceleration Approach for Real-Time Predictability
by Mingyang Zhang, Kristof Van Beeck and Toon Goedemé
J. Imaging 2024, 10(3), 70; https://doi.org/10.3390/jimaging10030070 - 13 Mar 2024
Cited by 1 | Viewed by 2516
Abstract
While Siamese object tracking has witnessed significant advancements, its hard real-time behaviour on embedded devices remains inadequately addressed. In many application cases, an embedded implementation should not only have a minimal execution latency, but this latency should ideally also have zero variance, i.e., be predictable. This study aims to address this issue by meticulously analysing real-time predictability across different components of a deep-learning-based video object tracking system. Our detailed experiments not only indicate the superiority of Field-Programmable Gate Array (FPGA) implementations in terms of hard real-time behaviour but also unveil important time predictability bottlenecks. We introduce dedicated hardware accelerators for key processes, focusing on depth-wise cross-correlation and padding operations, utilizing high-level synthesis (HLS). Implemented on a KV260 board, our enhanced tracker exhibits not only a speed up, with a factor of 6.6, in mean execution time but also significant improvements in hard real-time predictability by yielding 11 times less latency variation as compared to our baseline. A subsequent analysis of power consumption reveals our approach’s contribution to enhanced power efficiency. These advancements underscore the crucial role of hardware acceleration in realizing time-predictable object tracking on embedded systems, setting new standards for future hardware–software co-design endeavours in this domain. Full article
