Recent Progress and Challenges in Computer Vision and Machine Learning

A special issue of Journal of Imaging (ISSN 2313-433X). This special issue belongs to the section "Computer Vision and Pattern Recognition".

Deadline for manuscript submissions: 30 November 2026

Special Issue Editor

Dr. Qi Chen
Australian Institute for Machine Learning, The University of Adelaide, Adelaide, SA 5000, Australia
Interests: computer vision; medical image analysis; generative AI; artificial intelligence; multimodal AI

Special Issue Information

Dear Colleagues,

Recent years have witnessed rapid advances in computer vision and machine learning, driven by the emergence of large-scale data, powerful representation learning models, and increasingly integrated multimodal systems. From foundational vision tasks such as recognition, detection, and reconstruction to higher-level reasoning, generation, and decision-making, modern visual intelligence systems are becoming more general, controllable, and applicable to complex real-world scenarios. At the same time, these advances also bring forward new challenges related to robustness, generalization, interpretability, data efficiency, and deployment in domain-specific settings.

This Special Issue aims to provide a comprehensive forum for presenting recent progress, open challenges, and future directions in computer vision and machine learning. We welcome contributions that span theoretical developments, algorithmic innovations, and practical systems, with particular emphasis on the interaction between vision models and modern learning paradigms such as deep learning, generative modeling, and multimodal large language models. Topics of interest include, but are not limited to, controllable and interpretable visual generation, vision–language understanding, multimodal representation learning, robust and trustworthy vision systems, and applications in domains such as medicine, architecture, remote sensing, and the Internet.

By bringing together researchers from both academia and industry, this Special Issue seeks to highlight emerging trends, identify key challenges, and stimulate cross-disciplinary discussions that advance the state of the art in computer vision and machine learning.

Dr. Qi Chen
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Journal of Imaging is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • machine learning
  • multimodal learning
  • vision–language models
  • generative models
  • controllable and interpretable AI
  • large language models
  • robust visual understanding
  • real-world applications

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information is available on MDPI's Special Issue policies page.

Published Papers (1 paper)

Research

23 pages, 3504 KB  
Article
Spatially Time-Based Robust Tracking and Re-Identification of Kindergarten Students: A Hybrid Deep Learning Framework Combining YOLOv8n and Vision Transformer (ViT)
by Md. Rahatul Islam, Yui Kataoka, Keisuke Teramoto and Keiichi Horio
J. Imaging 2026, 12(4), 150; https://doi.org/10.3390/jimaging12040150 - 30 Mar 2026
Abstract
Detection, tracking, and re-identification (ReID) of children wearing similar uniforms in a kindergarten environment is a very complex challenge for computer vision. Traditional surveillance systems or simple convolutional neural network (CNN) models often fail to distinguish children in crowds and under occlusion. To address this challenge, this study proposes a novel hybrid framework combining YOLOv8 and Vision Transformer (ViT). Using YOLOv8 for detection and ViT for global feature extraction, we trained the model on a custom dataset of 31,521 images, achieving an overall accuracy of 93.75%, and on the public benchmark MOT20 dataset of 28,630 images, achieving an overall accuracy of 96.02%. Our system showed remarkable success in tracking performance, where it achieved 86.7% MOTA and 99.7% IDF1 scores. This high IDF1 score proves that the model is highly effective in preventing identity switches. The main novelty of this study is the behavioral analysis of children beyond the boundaries of surveillance, where we measure walking distance, trajectory, and screen time. Finally, through cross-dataset comparison with the MOT20 public benchmark, we demonstrated that our proposed customized model is much more effective than current state-of-the-art methods in overcoming the domain gap in specific environments such as kindergartens.