Emerging Research in Object Tracking and Image Segmentation

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 31 March 2026

Special Issue Editors


Guest Editor
Department of Science and Technology, University of Naples Parthenope, 80133 Napoli, Italy
Interests: machine learning; kernel methods; clustering; intrinsic dimension estimation; gesture recognition; handwriting recognition; time series prediction; dimensionality reduction

Special Issue Information

Dear Colleagues,

Visual object tracking and segmentation are both essential components of perception, and together they have been an active research topic in the computer vision community for decades. Tracking and segmentation algorithms have developed rapidly thanks to the massive amount of available video data, which in turn creates high demand for the speed and accuracy of tracking algorithms. Researchers are motivated to design faster and better methods despite the challenges that persist in visual object tracking and segmentation, especially robustness under heavy occlusion, fast motion, multi-object settings, and low resolution, as well as the need for accurate localization. Despite success in addressing numerous challenges under a wide range of circumstances, the core problems remain complex and challenging.

The main aim of this Special Issue is to present the most recent advancements and trends in visual object tracking (VOT) and video object segmentation (VOS). Methods such as those reported in the formulation of Siamese networks and spatio-temporal memory for VOT and VOS may be further explored to improve performance (a minimal Siamese sketch follows the topic list below). We invite original research involving novel techniques, innovative methods, and useful applications that lead to significant advances in VOT and VOS. We also welcome reviews and surveys of state-of-the-art methods. Topics of interest include, but are not limited to:

  1. Object detection, identification, recognition, tracking, and segmentation.
  2. Video analysis and tracking.
  3. Image and video enhancement algorithms to improve the quality of video object tracking.
  4. Computational photography and imaging for advanced object detection and tracking.
  5. Depth estimation and three-dimensional reconstruction for augmented reality (AR) and/or advanced driver assistance systems (ADAS).
  6. Learning data representation from video based on supervised, unsupervised, and semi-supervised learning.
  7. Dataset and performance evaluation, person re-identification, and vehicle re-identification.
  8. Human behavior detection, human pose estimation, and tracking.
  9. Visual surveillance and monitoring.
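As a pointer to the Siamese formulation mentioned above, the sketch below shows the core of a fully convolutional Siamese tracker in the style of SiamFC: a shared backbone embeds an exemplar (template) patch and a larger search patch, and their cross-correlation produces a response map whose peak localizes the target. The tiny backbone and the 127/255 patch sizes here are illustrative assumptions, not a prescription; this is a minimal orientation sketch, not a reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiamFCSketch(nn.Module):
    """Minimal SiamFC-style tracker: shared embedding + cross-correlation."""

    def __init__(self):
        super().__init__()
        # Tiny placeholder backbone; published trackers use AlexNet/ResNet variants.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=1),
        )

    def forward(self, exemplar, search):
        z = self.backbone(exemplar)          # (B, C, Hz, Wz) template features
        x = self.backbone(search)            # (B, C, Hx, Wx) search features
        b, c, hz, wz = z.shape
        # Cross-correlation: each template acts as a conv kernel over its own
        # search region (grouped conv), then channels are summed.
        x = x.reshape(1, b * c, x.shape[-2], x.shape[-1])
        resp = F.conv2d(x, z.reshape(b * c, 1, hz, wz), groups=b * c)
        resp = resp.reshape(b, c, resp.shape[-2], resp.shape[-1]).sum(1, keepdim=True)
        return resp                          # peak ~ target location in search patch

# Usage with SiamFC's conventional 127x127 exemplar and 255x255 search sizes.
tracker = SiamFCSketch().eval()
with torch.no_grad():
    response = tracker(torch.randn(1, 3, 127, 127), torch.randn(1, 3, 255, 255))
print(response.shape)  # torch.Size([1, 1, 33, 33])
```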

Prof. Dr. Kaihua Zhang
Dr. Francesco Camastra
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website; once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • visual object tracking
  • image segmentation
  • pose estimation
  • image and video enhancement

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (4 papers)


Research

24 pages, 14300 KiB  
Article
Quantum Edge Detection and Convolution Using Paired Transform-Based Image Representation
by Artyom Grigoryan, Alexis Gomez, Sos Agaian and Karen Panetta
Information 2025, 16(4), 255; https://doi.org/10.3390/info16040255 - 21 Mar 2025
Abstract
Classical edge detection algorithms often struggle to process large, high-resolution image datasets efficiently. Quantum image processing offers a promising alternative, but current implementations face significant challenges, such as time-consuming data acquisition, complex device requirements, and limited real-time processing capabilities. This work presents a novel paired transform-based quantum representation for efficient image processing. This representation enables the parallelization of convolution operations, simplifies gradient calculations, and facilitates the processing of one-dimensional and two-dimensional signals. We demonstrate that our approach achieves improved processing speed compared to classical methods while maintaining comparable accuracy. The successful implementation on real-world images highlights the potential of this research for large-scale quantum image processing, architecture-specific optimizations, and applications beyond edge detection.
(This article belongs to the Special Issue Emerging Research in Object Tracking and Image Segmentation)
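The paired-transform quantum representation itself is beyond a short snippet, but the classical convolution-based baseline that such work is benchmarked against is easy to state. The sketch below is illustrative NumPy/SciPy code of my own, not the authors' method: it computes Sobel gradients by convolution and thresholds the gradient magnitude to form an edge map.

```python
import numpy as np
from scipy.signal import convolve2d

def sobel_edges(image, threshold=0.25):
    """Classical gradient-magnitude edge map (baseline, not the quantum method)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    gx = convolve2d(image, kx, mode="same", boundary="symm")  # horizontal gradient
    gy = convolve2d(image, ky, mode="same", boundary="symm")  # vertical gradient
    mag = np.hypot(gx, gy)
    mag /= mag.max() + 1e-12       # normalize magnitude to [0, 1]
    return mag > threshold         # boolean edge map

edges = sobel_edges(np.random.rand(64, 64))
```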

21 pages, 10628 KiB  
Article
Thermal Video Enhancement Mamba: A Novel Approach to Thermal Video Enhancement for Real-World Applications
by Sargis Hovhannisyan, Sos Agaian, Karen Panetta and Artyom Grigoryan
Information 2025, 16(2), 125; https://doi.org/10.3390/info16020125 - 9 Feb 2025
Abstract
Object tracking in thermal video is challenging due to noise, blur, and low contrast. We present TVEMamba, a Mamba-based enhancement framework with near-linear complexity that improves tracking in these conditions. Our approach uses a State Space 2D (SS2D) module integrated with Convolutional Neural Networks (CNNs) to filter, sharpen, and highlight important details. Key components include (i) a denoising module to reduce background noise and enhance image clarity, (ii) an optical flow attention module to handle complex motion and reduce blur, and (iii) entropy-based labeling to create a fully labeled thermal dataset for training and evaluation. TVEMamba outperforms existing methods (DCRGC, RLBHE, IE-CGAN, BBCNN) across multiple datasets (BIRDSAI, FLIR, CAMEL, Autonomous Vehicles, Solar Panels) and achieves higher scores on standard quality metrics (EME, BDIM, DMTE, MDIMTE, LGTA). Extensive tests, including ablation studies and convergence analysis, confirm its robustness. Real-world examples, such as tracking humans, animals, and moving objects for self-driving vehicles and remote sensing, demonstrate the practical value of TVEMamba.
(This article belongs to the Special Issue Emerging Research in Object Tracking and Image Segmentation)
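The entropy-based labeling step in the abstract can be pictured in a few lines of NumPy. The sketch below scores a thermal frame by the Shannon entropy of its intensity histogram; higher-entropy frames carry more usable detail. The 256-bin histogram and the ranking use are illustrative assumptions, not the authors' exact criterion.

```python
import numpy as np

def frame_entropy(frame, bins=256):
    """Shannon entropy (bits) of a frame's intensity histogram."""
    hist, _ = np.histogram(frame, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]                          # drop empty bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())

# Score every frame in a (T, H, W) thermal clip, e.g. to rank frames for labeling.
clip = np.random.rand(10, 120, 160).astype(np.float32)
scores = sorted(enumerate(frame_entropy(f) for f in clip), key=lambda t: -t[1])
```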

23 pages, 32729 KiB  
Article
PLC-Fusion: Perspective-Based Hierarchical and Deep LiDAR Camera Fusion for 3D Object Detection in Autonomous Vehicles
by Husnain Mushtaq, Xiaoheng Deng, Fizza Azhar, Mubashir Ali and Hafiz Husnain Raza Sherazi
Information 2024, 15(11), 739; https://doi.org/10.3390/info15110739 - 19 Nov 2024
Cited by 2
Abstract
Accurate 3D object detection is essential for autonomous driving, yet traditional LiDAR models often struggle with sparse point clouds. To address this, we propose perspective-aware hierarchical vision-transformer-based LiDAR-camera fusion (PLC-Fusion), an efficient, multi-modal 3D object detection framework that integrates LiDAR and camera data for improved performance. First, our method enhances LiDAR data by projecting them onto a 2D plane, enabling the extraction of object perspective features from a probability map via the Object Perspective Sampling (OPS) module. It incorporates a lightweight perspective detector, consisting of interconnected 2D and monocular 3D sub-networks, to extract image features and generate object perspective proposals by predicting and refining top-scored 3D candidates. Second, it leverages two independent transformers—CamViT for 2D image features and LidViT for 3D point cloud features. These ViT-based representations are fused via the Cross-Fusion module for hierarchical and deep representation learning, improving performance and computational efficiency. These mechanisms enhance the utilization of semantic features in a region of interest (ROI) to obtain more representative point features, leading to a more effective fusion of information from both LiDAR and camera sources. PLC-Fusion outperforms existing methods, achieving a mean average precision (mAP) of 83.52% and 90.37% for 3D and BEV detection, respectively. Moreover, PLC-Fusion maintains a competitive inference time of 0.18 s. Our model addresses computational bottlenecks by eliminating the need for dense BEV searches and global attention mechanisms while improving detection range and precision.
(This article belongs to the Special Issue Emerging Research in Object Tracking and Image Segmentation)
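The first step the abstract describes, projecting LiDAR points onto a 2D plane, is the standard pinhole projection used by most LiDAR-camera fusion pipelines. The sketch below is a generic version of that step; the calibration matrices are placeholders, and the paper's OPS module does considerably more on top of this.

```python
import numpy as np

def project_lidar_to_image(points, T_cam_lidar, K):
    """Project (N, 3) LiDAR points into pixel coordinates via a pinhole camera.

    points       -- (N, 3) xyz in the LiDAR frame
    T_cam_lidar  -- (4, 4) extrinsic transform, LiDAR frame -> camera frame
    K            -- (3, 3) camera intrinsic matrix
    """
    pts_h = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coords
    cam = (T_cam_lidar @ pts_h.T)[:3]                        # (3, N) in camera frame
    in_front = cam[2] > 0                                    # keep points ahead of camera
    uvw = K @ cam[:, in_front]
    uv = (uvw[:2] / uvw[2]).T                                # (M, 2) pixel coordinates
    return uv, in_front

# Placeholder calibration: identity extrinsics and simple intrinsics.
K = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])
uv, mask = project_lidar_to_image(np.random.randn(1000, 3) + [0, 0, 10], np.eye(4), K)
```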

25 pages, 9121 KiB  
Article
Flying Projectile Attitude Determination from Ground-Based Monocular Imagery with a Priori Knowledge
by Huamei Chen, Zhigang Zhu, Hao Tang, Erik Blasch, Khanh D. Pham and Genshe Chen
Information 2024, 15(4), 201; https://doi.org/10.3390/info15040201 - 4 Apr 2024
Abstract
This paper discusses using ground-based imagery to determine the attitude of a flying projectile assuming prior knowledge of its external geometry. It presents a segmentation-based approach to follow the object and evaluates it quantitatively with simulated data and qualitatively with both simulated and real data. Two experimental cases are considered: one assumes reliable target distance measurement from an auxiliary range sensor, while the other assumes no range information. The results show that in the case of an unknown projectile–camera distance, with projectile dimensions of 1.378 m and 0.08 m in length and diameter, the estimated distance, in-plane location, and pitch angle accuracies are about 50 m, 0.15 m, and 6 degrees, respectively. Yaw angle estimation is ambiguous. In the second case, assuming that the projectile–camera distance is known resolves the ambiguity of yaw estimation, resulting in accuracies of about 0.15 m, 3 degrees, and 20 degrees for in-plane location, pitch, and yaw angles, respectively. These accuracies were normalized to a 1-km projectile–camera distance.
(This article belongs to the Special Issue Emerging Research in Object Tracking and Image Segmentation)
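To give a flavor of how a segmentation mask can drive attitude estimation, the sketch below recovers the in-plane orientation of an elongated object (a proxy for projected pitch) from a binary mask via a principal-axis fit over the mask's pixel coordinates. This is a generic textbook step, not the authors' full pipeline, which additionally exploits the known projectile geometry and range information.

```python
import numpy as np

def principal_axis_angle(mask):
    """In-plane orientation (degrees) of a binary mask's major axis via PCA."""
    ys, xs = np.nonzero(mask)                  # pixel coordinates of the object
    pts = np.stack([xs, ys], axis=1).astype(float)
    pts -= pts.mean(axis=0)                    # center on the mask centroid
    cov = pts.T @ pts / len(pts)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    major = eigvecs[:, -1]                     # eigenvector of the largest eigenvalue
    return float(np.degrees(np.arctan2(major[1], major[0])))

# A synthetic elongated blob: a diagonal bar at 45 degrees.
mask = np.zeros((100, 100), dtype=bool)
for i in range(80):
    mask[10 + i, 10 + i] = True
print(principal_axis_angle(mask))  # ~45.0
```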
