Transformer Applications in Target Tracking

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: 31 October 2025 | Viewed by 2964

Special Issue Editors


Prof. Dr. Fengping An
Guest Editor
School of Automation and Software Engineering, Shanxi University, Taiyuan 030006, China
Interests: image processing; artificial intelligence; deep learning; target detection; pattern recognition; target recognition

Prof. Dr. Haitao Xu
Guest Editor
Department of Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
Interests: wireless resource allocation and management; wireless communications and networking; dynamic game and mean field game theory; big data analysis; security

Dr. Chuyang Ye
Guest Editor
School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing 100811, China
Interests: medical image processing; deep learning

Special Issue Information

Dear Colleagues,

A convolutional neural network (CNN) is a neural network architecture for processing spatial data, such as images and videos. Given their translational invariance and local receptive fields, CNNs have been widely used in target classification and target tracking. However, CNNs cannot model long-range dependencies, and so they cannot effectively extract long-range feature information about the target to be tracked, which limits the efficiency and accuracy of target tracking. Since the release of the transformer-based GPT-3 on June 11, 2020, transformer architectures have demonstrated a powerful capability to process sequence data.
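The contrast above can be made concrete: a convolution mixes only positions inside its kernel, whereas a single self-attention layer lets every position attend to every other. The following minimal sketch (illustrative only, with projection weights omitted for brevity) shows the global mixing that self-attention performs:

```python
import numpy as np

def self_attention(x):
    """Simplified single-head self-attention: each output row is a
    softmax-weighted mixture of *all* input rows, i.e., a global
    receptive field in one layer, unlike a convolution whose
    receptive field is bounded by its kernel size."""
    scores = x @ x.T / np.sqrt(x.shape[-1])            # (T, T) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over positions
    return weights @ x                                 # every output mixes all inputs

T, d = 8, 4
x = np.random.randn(T, d)
out = self_attention(x)
assert out.shape == (T, d)
```

In a full transformer, `x` would first be projected into separate query, key, and value matrices; the sketch collapses these to highlight only the long-range mixing property discussed above.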

Although CNN models have achieved great success in the field of target tracking over the years, many problems remain when target tracking is applied in practice to complex scenes. This indicates a non-negligible gap between theoretical progress in related fields and practical applications. We therefore invite papers on theoretical research and practical applications of transformer architectures in the field of target tracking.

Prof. Dr. Fengping An
Prof. Dr. Haitao Xu
Dr. Chuyang Ye
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • transformers
  • target tracking
  • target recognition
  • deep learning
  • CNN
  • sensors

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (4 papers)


Research

20 pages, 11785 KiB  
Article
IRFNet: Cognitive-Inspired Iterative Refinement Fusion Network for Camouflaged Object Detection
by Guohan Li, Jingxin Wang, Jianming Wei and Zhengyi Xu
Sensors 2025, 25(5), 1555; https://doi.org/10.3390/s25051555 - 3 Mar 2025
Viewed by 562
Abstract
Camouflaged Object Detection (COD) aims to identify objects that are intentionally concealed within their surroundings through appearance, texture, or pattern adaptations. Despite recent advances, extreme object–background similarity causes existing methods to struggle with accurately capturing discriminative features and effectively modeling multiscale patterns while preserving fine details. To address these challenges, we propose the Iterative Refinement Fusion Network (IRFNet), a novel framework that mimics human visual cognition through progressive feature enhancement and iterative optimization. Our approach incorporates the following: (1) a Hierarchical Feature Enhancement Module (HFEM) coupled with a dynamic channel-spatial attention mechanism, which enriches multiscale feature representations through bilateral and trilateral fusion pathways; and (2) a Context-guided Iterative Optimization Framework (CIOF) that combines transformer-based global context modeling with iterative refinement through dual-branch supervision. Extensive experiments on three challenging benchmark datasets (CAMO, COD10K, and NC4K) demonstrate that IRFNet consistently outperforms fourteen state-of-the-art methods, achieving improvements of 0.9–13.7% across key metrics. Comprehensive ablation studies validate the effectiveness of each proposed component and demonstrate how our iterative refinement strategy enables progressive improvement in detection accuracy. Full article
(This article belongs to the Special Issue Transformer Applications in Target Tracking)

22 pages, 12001 KiB  
Article
A Study on Systematic Improvement of Transformer Models for Object Pose Estimation
by Jungwoo Lee and Jinho Suh
Sensors 2025, 25(4), 1227; https://doi.org/10.3390/s25041227 - 18 Feb 2025
Viewed by 498
Abstract
Transformer architecture, initially developed for natural language processing and time series analysis, has been successfully adapted to various generative models in several domains. Object pose estimation, which uses images to determine the 3D position and orientation of an object, is essential for tasks such as robotic manipulation. This study introduces a transformer-based deep learning model for object pose estimation in computer vision, which determines the 3D position and orientation of objects from images. A baseline model derived from an encoder-only transformer faces challenges with high GPU memory usage when handling multiple objects. To improve training efficiency and support multi-object inference, the improved model reduces memory consumption by adjusting the transformer’s attention layer and incorporates low-rank weight decomposition to decrease parameters. Additionally, grouped-query attention (GQA) and RMS normalization enhance multi-object pose estimation performance, resulting in reduced memory usage and improved training accuracy. The improved model implementation with an extended matrix dimension reduced the GPU memory usage to only 2.5% of the baseline model, although it increased the number of model weight parameters. To mitigate this, the number of weight parameters was reduced by 28% using low-rank weight decomposition in the linear layer of attention. In addition, a 17% improvement in rotation training accuracy over the baseline model was achieved by applying GQA and RMS normalization. Full article
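The parameter savings from low-rank weight decomposition follow from simple counting: a dense d×d linear layer is replaced by two factors of rank r. The sizes below are illustrative, not taken from the paper:

```python
# Parameter count of a dense d x d linear layer vs. a rank-r factorization
# W ~= A @ B, with A: (d, r) and B: (r, d). Sizes are hypothetical.
d, r = 1024, 128
dense_params = d * d                  # full weight matrix
low_rank_params = d * r + r * d       # two thin factors
reduction = 1 - low_rank_params / dense_params
print(f"dense: {dense_params}, low-rank: {low_rank_params}, saved: {reduction:.0%}")
# prints "dense: 1048576, low-rank: 262144, saved: 75%"
```

The saving grows as r shrinks relative to d; the 28% figure reported above reflects applying such a factorization only to the attention's linear layers rather than to every weight in the model.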
(This article belongs to the Special Issue Transformer Applications in Target Tracking)

23 pages, 4920 KiB  
Article
Robust Tracking Method for Small and Weak Multiple Targets Under Dynamic Interference Based on Q-IMM-MHT
by Ziqian Yang, Hongbin Nie, Yuxuan Liu and Chunjiang Bian
Sensors 2025, 25(4), 1058; https://doi.org/10.3390/s25041058 - 10 Feb 2025
Cited by 1 | Viewed by 542
Abstract
In complex environments, traditional multi-target tracking methods often encounter challenges such as strong clutter interference and interruptions in target trajectories, which can result in insufficient tracking accuracy and robustness. To address these issues, this paper presents an improved multi-target tracking algorithm, termed Q-IMM-MHT. This method integrates Multiple Hypothesis Tracking (MHT) with Interactive Multiple Model (IMM) and introduces a Q-learning-based adaptive model switching strategy to dynamically adjust model selection in response to variations in the target’s motion patterns. Furthermore, the algorithm utilizes Support Vector Machines (SVMs) for anomaly detection and trajectory recovery, thereby enhancing the accuracy of data association and the overall robustness of the system. Experimental results indicate that under high noise conditions, the Root Mean Square Error (RMSE) of position estimation decreases to 0.74 pixels, while the RMSE of velocity estimation falls to 0.04 pixels/frame. Compared to traditional methods such as the Unscented Kalman Filter (UKF), IMM, and CIMM, the RMSE is reduced by at least 10.84% and 42.86%, respectively. In scenarios characterized by target trajectory interruptions and clutter interference, the algorithm maintains an association accuracy exceeding 46.3% even after 30 frames of interruption, significantly outperforming other methods. These findings demonstrate that the Q-IMM-MHT algorithm offers substantial performance improvements in multi-target tracking tasks within complex environments, effectively enhancing both tracking accuracy and stability, with considerable application value and extensive potential for future use. Full article
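The position and velocity figures quoted above are Root Mean Square Error (RMSE) values; for reference, the metric itself is a minimal sketch away (illustrative helper, not code from the paper):

```python
import numpy as np

def rmse(estimates, truth):
    """Root Mean Square Error, the metric reported above for
    position (pixels) and velocity (pixels/frame) estimates."""
    err = np.asarray(estimates, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

# A perfect track has zero error; any deviation raises the RMSE.
assert rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) == 0.0
```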
(This article belongs to the Special Issue Transformer Applications in Target Tracking)

20 pages, 8117 KiB  
Article
Enhancing the Transformer Model with a Convolutional Feature Extractor Block and Vector-Based Relative Position Embedding for Human Activity Recognition
by Xin Guo, Young Kim, Xueli Ning and Se Dong Min
Sensors 2025, 25(2), 301; https://doi.org/10.3390/s25020301 - 7 Jan 2025
Viewed by 1095
Abstract
The Transformer model has received significant attention in Human Activity Recognition (HAR) due to its self-attention mechanism that captures long dependencies in time series. However, for Inertial Measurement Unit (IMU) sensor time-series signals, the Transformer model does not effectively utilize the a priori information of strong complex temporal correlations. Therefore, we proposed using multi-layer convolutional layers as a Convolutional Feature Extractor Block (CFEB). CFEB enables the Transformer model to leverage both local and global time series features for activity classification. Meanwhile, the absolute position embedding (APE) in existing Transformer models cannot accurately represent the distance relationship between individuals at different time points. To further explore positional correlations in temporal signals, this paper introduces the Vector-based Relative Position Embedding (vRPE), aiming to provide more relative temporal position information within sensor signals for the Transformer model. Combining these innovations, we conduct extensive experiments on three HAR benchmark datasets: KU-HAR, UniMiB SHAR, and USC-HAD. Experimental results demonstrate that our proposed enhancement scheme substantially elevates the performance of the Transformer model in HAR. Full article
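The distinction between absolute and relative position embedding can be seen from the index structure alone: an absolute embedding keys on a position i, while a relative one keys on the offset i − j between two time steps. A minimal sketch of the relative offset table (illustrative only; the paper's vRPE construction may differ):

```python
import numpy as np

# Relative position offsets for a sequence of length T: entry (i, j) = i - j.
# A relative embedding looks up a learned vector per offset, so the model
# encodes "how far apart" two time steps are rather than where each one
# sits in absolute terms.
T = 5
idx = np.arange(T)
rel = idx[:, None] - idx[None, :]     # (T, T) offsets in [-(T-1), T-1]
assert rel.shape == (T, T)
assert rel[0, T - 1] == -(T - 1) and rel[T - 1, 0] == T - 1
```

Because the lookup depends only on the offset, the same embedding applies at every absolute position, which matches the distance-relationship motivation described in the abstract.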
(This article belongs to the Special Issue Transformer Applications in Target Tracking)
