Transformer Applications in Target Tracking

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: 31 October 2025 | Viewed by 2964

Special Issue Editors


Prof. Dr. Fengping An
Guest Editor
School of Automation and Software Engineering, Shanxi University, Taiyuan 030006, China
Interests: image processing; artificial intelligence; deep learning; target detection; pattern recognition; target recognition

Prof. Dr. Haitao Xu
Guest Editor
Department of Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
Interests: wireless resource allocation and management; wireless communications and networking; dynamic game and mean field game theory; big data analysis; security

Dr. Chuyang Ye
Guest Editor
School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing 100811, China
Interests: medical image processing; deep learning

Special Issue Information

Dear Colleagues,

A convolutional neural network (CNN) is a neural network architecture for processing spatial data, such as images and videos. Given their translational invariance and local receptive fields, CNNs have been widely used in target classification and target tracking. However, CNNs cannot model long-range dependencies, and so they cannot effectively extract long-range feature information about the target to be tracked, which limits the efficiency and accuracy of target tracking. Since the release of the transformer-based GPT-3 on June 11, 2020, transformer architectures have demonstrated a powerful capability to process sequence data.
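The contrast above can be made concrete: a convolution mixes only positions inside its kernel, whereas a single self-attention layer lets every position attend to every other. The following minimal sketch (illustrative only, with projection weights omitted for brevity) shows the global mixing that self-attention performs:

```python
import numpy as np

def self_attention(x):
    """Simplified single-head self-attention: each output row is a
    softmax-weighted mixture of *all* input rows, i.e., a global
    receptive field in one layer, unlike a convolution whose
    receptive field is bounded by its kernel size."""
    scores = x @ x.T / np.sqrt(x.shape[-1])            # (T, T) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over positions
    return weights @ x                                 # every output mixes all inputs

T, d = 8, 4
x = np.random.randn(T, d)
out = self_attention(x)
assert out.shape == (T, d)
```

In a full transformer, `x` would first be projected into separate query, key, and value matrices; the sketch collapses these to highlight only the long-range mixing property discussed above.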

Although CNN models have achieved great success in the field of target tracking over the years, many problems remain when target tracking is applied in practice to complex scenes. This indicates a non-negligible gap between theoretical progress in related fields and practical applications. We therefore invite papers on theoretical research and practical applications of transformer architectures in the field of target tracking.

Prof. Dr. Fengping An
Prof. Dr. Haitao Xu
Dr. Chuyang Ye
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • transformers
  • target tracking
  • target recognition
  • deep learning
  • CNN
  • sensors

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (4 papers)


Research

20 pages, 11785 KiB  
Article
IRFNet: Cognitive-Inspired Iterative Refinement Fusion Network for Camouflaged Object Detection
by Guohan Li, Jingxin Wang, Jianming Wei and Zhengyi Xu
Sensors 2025, 25(5), 1555; https://doi.org/10.3390/s25051555 - 3 Mar 2025
Viewed by 562
Abstract
Camouflaged Object Detection (COD) aims to identify objects that are intentionally concealed within their surroundings through appearance, texture, or pattern adaptations. Despite recent advances, extreme object–background similarity causes existing methods to struggle with accurately capturing discriminative features and effectively modeling multiscale patterns while preserving fine details. To address these challenges, we propose the Iterative Refinement Fusion Network (IRFNet), a novel framework that mimics human visual cognition through progressive feature enhancement and iterative optimization. Our approach incorporates the following: (1) a Hierarchical Feature Enhancement Module (HFEM) coupled with a dynamic channel-spatial attention mechanism, which enriches multiscale feature representations through bilateral and trilateral fusion pathways; and (2) a Context-guided Iterative Optimization Framework (CIOF) that combines transformer-based global context modeling with iterative refinement through dual-branch supervision. Extensive experiments on three challenging benchmark datasets (CAMO, COD10K, and NC4K) demonstrate that IRFNet consistently outperforms fourteen state-of-the-art methods, achieving improvements of 0.9–13.7% across key metrics. Comprehensive ablation studies validate the effectiveness of each proposed component and demonstrate how our iterative refinement strategy enables progressive improvement in detection accuracy. Full article
(This article belongs to the Special Issue Transformer Applications in Target Tracking)

22 pages, 12001 KiB  
Article
A Study on Systematic Improvement of Transformer Models for Object Pose Estimation
by Jungwoo Lee and Jinho Suh
Sensors 2025, 25(4), 1227; https://doi.org/10.3390/s25041227 - 18 Feb 2025
Viewed by 498
Abstract
Transformer architecture, initially developed for natural language processing and time series analysis, has been successfully adapted to various generative models in several domains. Object pose estimation, which uses images to determine the 3D position and orientation of an object, is essential for tasks such as robotic manipulation. This study introduces a transformer-based deep learning model for object pose estimation in computer vision, which determines the 3D position and orientation of objects from images. A baseline model derived from an encoder-only transformer faces challenges with high GPU memory usage when handling multiple objects. To improve training efficiency and support multi-object inference, the improved model reduces memory consumption by adjusting the transformer’s attention layer and incorporates low-rank weight decomposition to decrease parameters. Additionally, grouped-query attention (GQA) and RMS normalization enhance multi-object pose estimation performance, resulting in reduced memory usage and improved training accuracy. The improved model implementation with an extended matrix dimension reduced the GPU memory usage to only 2.5% of the baseline model, although it increased the number of model weight parameters. To mitigate this, the number of weight parameters was reduced by 28% using low-rank weight decomposition in the linear layer of attention. In addition, a 17% improvement in rotation training accuracy over the baseline model was achieved by applying GQA and RMS normalization. Full article
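The parameter savings from low-rank weight decomposition follow from simple counting: a dense d×d linear layer is replaced by two factors of rank r. The sizes below are illustrative, not taken from the paper:

```python
# Parameter count of a dense d x d linear layer vs. a rank-r factorization
# W ~= A @ B, with A: (d, r) and B: (r, d). Sizes are hypothetical.
d, r = 1024, 128
dense_params = d * d                  # full weight matrix
low_rank_params = d * r + r * d       # two thin factors
reduction = 1 - low_rank_params / dense_params
print(f"dense: {dense_params}, low-rank: {low_rank_params}, saved: {reduction:.0%}")
# prints "dense: 1048576, low-rank: 262144, saved: 75%"
```

The saving grows as r shrinks relative to d; the 28% figure reported above reflects applying such a factorization only to the attention's linear layers rather than to every weight in the model.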
(This article belongs to the Special Issue Transformer Applications in Target Tracking)

23 pages, 4920 KiB  
Article
Robust Tracking Method for Small and Weak Multiple Targets Under Dynamic Interference Based on Q-IMM-MHT
by Ziqian Yang, Hongbin Nie, Yuxuan Liu and Chunjiang Bian
Sensors 2025, 25(4), 1058; https://doi.org/10.3390/s25041058 - 10 Feb 2025
Cited by 1 | Viewed by 542
Abstract
In complex environments, traditional multi-target tracking methods often encounter challenges such as strong clutter interference and interruptions in target trajectories, which can result in insufficient tracking accuracy and robustness. To address these issues, this paper presents an improved multi-target tracking algorithm, termed Q-IMM-MHT. This method integrates Multiple Hypothesis Tracking (MHT) with Interactive Multiple Model (IMM) and introduces a Q-learning-based adaptive model switching strategy to dynamically adjust model selection in response to variations in the target’s motion patterns. Furthermore, the algorithm utilizes Support Vector Machines (SVMs) for anomaly detection and trajectory recovery, thereby enhancing the accuracy of data association and the overall robustness of the system. Experimental results indicate that under high noise conditions, the Root Mean Square Error (RMSE) of position estimation decreases to 0.74 pixels, while the RMSE of velocity estimation falls to 0.04 pixels/frame. Compared to traditional methods such as the Unscented Kalman Filter (UKF), IMM, and CIMM, the RMSE is reduced by at least 10.84% and 42.86%, respectively. In scenarios characterized by target trajectory interruptions and clutter interference, the algorithm maintains an association accuracy exceeding 46.3% even after 30 frames of interruption, significantly outperforming other methods. These findings demonstrate that the Q-IMM-MHT algorithm offers substantial performance improvements in multi-target tracking tasks within complex environments, effectively enhancing both tracking accuracy and stability, with considerable application value and extensive potential for future use. Full article
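The position and velocity figures quoted above are Root Mean Square Error (RMSE) values; for reference, the metric itself is a minimal sketch away (illustrative helper, not code from the paper):

```python
import numpy as np

def rmse(estimates, truth):
    """Root Mean Square Error, the metric reported above for
    position (pixels) and velocity (pixels/frame) estimates."""
    err = np.asarray(estimates, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

# A perfect track has zero error; any deviation raises the RMSE.
assert rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) == 0.0
```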
(This article belongs to the Special Issue Transformer Applications in Target Tracking)

20 pages, 8117 KiB  
Article
Enhancing the Transformer Model with a Convolutional Feature Extractor Block and Vector-Based Relative Position Embedding for Human Activity Recognition
by Xin Guo, Young Kim, Xueli Ning and Se Dong Min
Sensors 2025, 25(2), 301; https://doi.org/10.3390/s25020301 - 7 Jan 2025
Viewed by 1095
Abstract
The Transformer model has received significant attention in Human Activity Recognition (HAR) due to its self-attention mechanism that captures long dependencies in time series. However, for Inertial Measurement Unit (IMU) sensor time-series signals, the Transformer model does not effectively utilize the a priori information of strong complex temporal correlations. Therefore, we proposed using multi-layer convolutional layers as a Convolutional Feature Extractor Block (CFEB). CFEB enables the Transformer model to leverage both local and global time series features for activity classification. Meanwhile, the absolute position embedding (APE) in existing Transformer models cannot accurately represent the distance relationship between individuals at different time points. To further explore positional correlations in temporal signals, this paper introduces the Vector-based Relative Position Embedding (vRPE), aiming to provide more relative temporal position information within sensor signals for the Transformer model. Combining these innovations, we conduct extensive experiments on three HAR benchmark datasets: KU-HAR, UniMiB SHAR, and USC-HAD. Experimental results demonstrate that our proposed enhancement scheme substantially elevates the performance of the Transformer model in HAR. Full article
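The distinction between absolute and relative position embedding can be seen from the index structure alone: an absolute embedding keys on a position i, while a relative one keys on the offset i − j between two time steps. A minimal sketch of the relative offset table (illustrative only; the paper's vRPE construction may differ):

```python
import numpy as np

# Relative position offsets for a sequence of length T: entry (i, j) = i - j.
# A relative embedding looks up a learned vector per offset, so the model
# encodes "how far apart" two time steps are rather than where each one
# sits in absolute terms.
T = 5
idx = np.arange(T)
rel = idx[:, None] - idx[None, :]     # (T, T) offsets in [-(T-1), T-1]
assert rel.shape == (T, T)
assert rel[0, T - 1] == -(T - 1) and rel[T - 1, 0] == T - 1
```

Because the lookup depends only on the offset, the same embedding applies at every absolute position, which matches the distance-relationship motivation described in the abstract.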
(This article belongs to the Special Issue Transformer Applications in Target Tracking)
