Emerging Research in Object Tracking and Image Segmentation

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 31 March 2026

Special Issue Editors


Guest Editor
Department of Science and Technology, University of Naples Parthenope, 80133 Napoli, Italy
Interests: machine learning; kernel methods; clustering; intrinsic dimension estimation; gesture recognition; handwriting recognition; time series prediction; dimensionality reduction

Special Issue Information

Dear Colleagues,

Visual object tracking and segmentation are both essential components of perception, and together they have been an active research topic in the computer vision community for decades. Tracking and segmentation algorithms have developed rapidly thanks to the massive amount of available video data, which in turn creates high demand for the speed and accuracy of tracking algorithms. Researchers are motivated to design faster and better methods despite the challenges that persist in visual object tracking and segmentation, especially robustness under heavy occlusion, fast motion, multi-object settings, and low resolution, as well as the need for accurate localization. Despite success in addressing numerous challenges under a wide range of circumstances, the core problems remain complex and challenging.

The main aim of this Special Issue is to present the most recent advancements and trends in visual object tracking (VOT) and video object segmentation (VOS). Methods such as those reported in the formulation of Siamese networks and spatio-temporal memory for VOT and VOS may be further explored to improve performance (a minimal Siamese sketch follows the topic list below). We invite original research involving novel techniques, innovative methods, and useful applications that lead to significant advances in VOT and VOS. We also welcome reviews and surveys of state-of-the-art methods. Topics of interest include, but are not limited to:

  1. Object detection, identification, recognition, tracking, and segmentation.
  2. Video analysis and tracking.
  3. Image and video enhancement algorithms to improve the quality of video object tracking.
  4. Computational photography and imaging for advanced object detection and tracking.
  5. Depth estimation and three-dimensional reconstruction for augmented reality (AR) and/or advanced driver assistance systems (ADAS).
  6. Learning data representation from video based on supervised, unsupervised, and semi-supervised learning.
  7. Dataset and performance evaluation, person re-identification, and vehicle re-identification.
  8. Human behavior detection, human pose estimation, and tracking.
  9. Visual surveillance and monitoring.
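As a pointer to the Siamese formulation mentioned above, the sketch below shows the core of a fully convolutional Siamese tracker in the style of SiamFC: a shared backbone embeds an exemplar (template) patch and a larger search patch, and their cross-correlation produces a response map whose peak localizes the target. The tiny backbone and the 127/255 patch sizes here are illustrative assumptions, not a prescription; this is a minimal orientation sketch, not a reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiamFCSketch(nn.Module):
    """Minimal SiamFC-style tracker: shared embedding + cross-correlation."""

    def __init__(self):
        super().__init__()
        # Tiny placeholder backbone; published trackers use AlexNet/ResNet variants.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=1),
        )

    def forward(self, exemplar, search):
        z = self.backbone(exemplar)          # (B, C, Hz, Wz) template features
        x = self.backbone(search)            # (B, C, Hx, Wx) search features
        b, c, hz, wz = z.shape
        # Cross-correlation: each template acts as a conv kernel over its own
        # search region (grouped conv), then channels are summed.
        x = x.reshape(1, b * c, x.shape[-2], x.shape[-1])
        resp = F.conv2d(x, z.reshape(b * c, 1, hz, wz), groups=b * c)
        resp = resp.reshape(b, c, resp.shape[-2], resp.shape[-1]).sum(1, keepdim=True)
        return resp                          # peak ~ target location in search patch

# Usage with SiamFC's conventional 127x127 exemplar and 255x255 search sizes.
tracker = SiamFCSketch().eval()
with torch.no_grad():
    response = tracker(torch.randn(1, 3, 127, 127), torch.randn(1, 3, 255, 255))
print(response.shape)  # torch.Size([1, 1, 33, 33])
```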

Prof. Dr. Kaihua Zhang
Dr. Francesco Camastra
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website; once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • visual object tracking
  • image segmentation
  • pose estimation
  • image and video enhancement

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (4 papers)


Research

24 pages, 14300 KiB  
Article
Quantum Edge Detection and Convolution Using Paired Transform-Based Image Representation
by Artyom Grigoryan, Alexis Gomez, Sos Agaian and Karen Panetta
Information 2025, 16(4), 255; https://doi.org/10.3390/info16040255 - 21 Mar 2025
Abstract
Classical edge detection algorithms often struggle to process large, high-resolution image datasets efficiently. Quantum image processing offers a promising alternative, but current implementations face significant challenges, such as time-consuming data acquisition, complex device requirements, and limited real-time processing capabilities. This work presents a novel paired transform-based quantum representation for efficient image processing. This representation enables the parallelization of convolution operations, simplifies gradient calculations, and facilitates the processing of one-dimensional and two-dimensional signals. We demonstrate that our approach achieves improved processing speed compared to classical methods while maintaining comparable accuracy. The successful implementation on real-world images highlights the potential of this research for large-scale quantum image processing, architecture-specific optimizations, and applications beyond edge detection.
(This article belongs to the Special Issue Emerging Research in Object Tracking and Image Segmentation)
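The paired-transform quantum representation itself is beyond a short snippet, but the classical convolution-based baseline that such work is benchmarked against is easy to state. The sketch below is illustrative NumPy/SciPy code of my own, not the authors' method: it computes Sobel gradients by convolution and thresholds the gradient magnitude to form an edge map.

```python
import numpy as np
from scipy.signal import convolve2d

def sobel_edges(image, threshold=0.25):
    """Classical gradient-magnitude edge map (baseline, not the quantum method)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    gx = convolve2d(image, kx, mode="same", boundary="symm")  # horizontal gradient
    gy = convolve2d(image, ky, mode="same", boundary="symm")  # vertical gradient
    mag = np.hypot(gx, gy)
    mag /= mag.max() + 1e-12       # normalize magnitude to [0, 1]
    return mag > threshold         # boolean edge map

edges = sobel_edges(np.random.rand(64, 64))
```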

21 pages, 10628 KiB  
Article
Thermal Video Enhancement Mamba: A Novel Approach to Thermal Video Enhancement for Real-World Applications
by Sargis Hovhannisyan, Sos Agaian, Karen Panetta and Artyom Grigoryan
Information 2025, 16(2), 125; https://doi.org/10.3390/info16020125 - 9 Feb 2025
Abstract
Object tracking in thermal video is challenging due to noise, blur, and low contrast. We present TVEMamba, a Mamba-based enhancement framework with near-linear complexity that improves tracking in these conditions. Our approach uses a State Space 2D (SS2D) module integrated with Convolutional Neural Networks (CNNs) to filter, sharpen, and highlight important details. Key components include (i) a denoising module to reduce background noise and enhance image clarity, (ii) an optical flow attention module to handle complex motion and reduce blur, and (iii) entropy-based labeling to create a fully labeled thermal dataset for training and evaluation. TVEMamba outperforms existing methods (DCRGC, RLBHE, IE-CGAN, BBCNN) across multiple datasets (BIRDSAI, FLIR, CAMEL, Autonomous Vehicles, Solar Panels) and achieves higher scores on standard quality metrics (EME, BDIM, DMTE, MDIMTE, LGTA). Extensive tests, including ablation studies and convergence analysis, confirm its robustness. Real-world examples, such as tracking humans, animals, and moving objects for self-driving vehicles and remote sensing, demonstrate the practical value of TVEMamba.
(This article belongs to the Special Issue Emerging Research in Object Tracking and Image Segmentation)
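The entropy-based labeling step in the abstract can be pictured in a few lines of NumPy. The sketch below scores a thermal frame by the Shannon entropy of its intensity histogram; higher-entropy frames carry more usable detail. The 256-bin histogram and the ranking use are illustrative assumptions, not the authors' exact criterion.

```python
import numpy as np

def frame_entropy(frame, bins=256):
    """Shannon entropy (bits) of a frame's intensity histogram."""
    hist, _ = np.histogram(frame, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]                          # drop empty bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())

# Score every frame in a (T, H, W) thermal clip, e.g. to rank frames for labeling.
clip = np.random.rand(10, 120, 160).astype(np.float32)
scores = sorted(enumerate(frame_entropy(f) for f in clip), key=lambda t: -t[1])
```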

23 pages, 32729 KiB  
Article
PLC-Fusion: Perspective-Based Hierarchical and Deep LiDAR Camera Fusion for 3D Object Detection in Autonomous Vehicles
by Husnain Mushtaq, Xiaoheng Deng, Fizza Azhar, Mubashir Ali and Hafiz Husnain Raza Sherazi
Information 2024, 15(11), 739; https://doi.org/10.3390/info15110739 - 19 Nov 2024
Cited by 2
Abstract
Accurate 3D object detection is essential for autonomous driving, yet traditional LiDAR models often struggle with sparse point clouds. To address this, we propose perspective-aware hierarchical vision-transformer-based LiDAR-camera fusion (PLC-Fusion), an efficient, multi-modal 3D object detection framework that integrates LiDAR and camera data for improved performance. First, our method enhances LiDAR data by projecting them onto a 2D plane, enabling the extraction of object perspective features from a probability map via the Object Perspective Sampling (OPS) module. It incorporates a lightweight perspective detector, consisting of interconnected 2D and monocular 3D sub-networks, to extract image features and generate object perspective proposals by predicting and refining top-scored 3D candidates. Second, it leverages two independent transformers—CamViT for 2D image features and LidViT for 3D point cloud features. These ViT-based representations are fused via the Cross-Fusion module for hierarchical and deep representation learning, improving performance and computational efficiency. These mechanisms enhance the utilization of semantic features in a region of interest (ROI) to obtain more representative point features, leading to a more effective fusion of information from both LiDAR and camera sources. PLC-Fusion outperforms existing methods, achieving a mean average precision (mAP) of 83.52% and 90.37% for 3D and BEV detection, respectively. Moreover, PLC-Fusion maintains a competitive inference time of 0.18 s. Our model addresses computational bottlenecks by eliminating the need for dense BEV searches and global attention mechanisms while improving detection range and precision.
(This article belongs to the Special Issue Emerging Research in Object Tracking and Image Segmentation)
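The first step the abstract describes, projecting LiDAR points onto a 2D plane, is the standard pinhole projection used by most LiDAR-camera fusion pipelines. The sketch below is a generic version of that step; the calibration matrices are placeholders, and the paper's OPS module does considerably more on top of this.

```python
import numpy as np

def project_lidar_to_image(points, T_cam_lidar, K):
    """Project (N, 3) LiDAR points into pixel coordinates via a pinhole camera.

    points       -- (N, 3) xyz in the LiDAR frame
    T_cam_lidar  -- (4, 4) extrinsic transform, LiDAR frame -> camera frame
    K            -- (3, 3) camera intrinsic matrix
    """
    pts_h = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coords
    cam = (T_cam_lidar @ pts_h.T)[:3]                        # (3, N) in camera frame
    in_front = cam[2] > 0                                    # keep points ahead of camera
    uvw = K @ cam[:, in_front]
    uv = (uvw[:2] / uvw[2]).T                                # (M, 2) pixel coordinates
    return uv, in_front

# Placeholder calibration: identity extrinsics and simple intrinsics.
K = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])
uv, mask = project_lidar_to_image(np.random.randn(1000, 3) + [0, 0, 10], np.eye(4), K)
```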

25 pages, 9121 KiB  
Article
Flying Projectile Attitude Determination from Ground-Based Monocular Imagery with a Priori Knowledge
by Huamei Chen, Zhigang Zhu, Hao Tang, Erik Blasch, Khanh D. Pham and Genshe Chen
Information 2024, 15(4), 201; https://doi.org/10.3390/info15040201 - 4 Apr 2024
Abstract
This paper discusses using ground-based imagery to determine the attitude of a flying projectile assuming prior knowledge of its external geometry. It presents a segmentation-based approach to follow the object and evaluates it quantitatively with simulated data and qualitatively with both simulated and real data. Two experimental cases are considered: one assumes reliable target distance measurement from an auxiliary range sensor, while the other assumes no range information. The results show that in the case of an unknown projectile–camera distance, with projectile dimensions of 1.378 m and 0.08 m in length and diameter, the estimated distance, in-plane location, and pitch angle accuracies are about 50 m, 0.15 m, and 6 degrees, respectively. Yaw angle estimation is ambiguous. In the second case, assuming that the projectile–camera distance is known resolves the ambiguity of yaw estimation, resulting in accuracies of about 0.15 m, 3 degrees, and 20 degrees for in-plane location, pitch, and yaw angles, respectively. These accuracies were normalized to a 1-km projectile–camera distance.
(This article belongs to the Special Issue Emerging Research in Object Tracking and Image Segmentation)
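To give a flavor of how a segmentation mask can drive attitude estimation, the sketch below recovers the in-plane orientation of an elongated object (a proxy for projected pitch) from a binary mask via a principal-axis fit over the mask's pixel coordinates. This is a generic textbook step, not the authors' full pipeline, which additionally exploits the known projectile geometry and range information.

```python
import numpy as np

def principal_axis_angle(mask):
    """In-plane orientation (degrees) of a binary mask's major axis via PCA."""
    ys, xs = np.nonzero(mask)                  # pixel coordinates of the object
    pts = np.stack([xs, ys], axis=1).astype(float)
    pts -= pts.mean(axis=0)                    # center on the mask centroid
    cov = pts.T @ pts / len(pts)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    major = eigvecs[:, -1]                     # eigenvector of the largest eigenvalue
    return float(np.degrees(np.arctan2(major[1], major[0])))

# A synthetic elongated blob: a diagonal bar at 45 degrees.
mask = np.zeros((100, 100), dtype=bool)
for i in range(80):
    mask[10 + i, 10 + i] = True
print(principal_axis_angle(mask))  # ~45.0
```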
