Recent Trends in Computer Vision with Neural Networks

A special issue of Journal of Imaging (ISSN 2313-433X). This special issue belongs to the section "Computer Vision and Pattern Recognition".

Deadline for manuscript submissions: closed (30 January 2025) | Viewed by 7058

Special Issue Editor


Dr. Mario Molinara
Guest Editor
Department of Electrical and Information Engineering “Maurizio Scarano”, University of Cassino and Southern Lazio, 03043 Cassino, Italy
Interests: machine learning; pattern recognition; IoT; image understanding; biomedical imaging; sensors

Special Issue Information

Dear Colleagues,

We are delighted to announce the forthcoming Special Issue titled "Recent Trends in Computer Vision with Neural Networks" in the Journal of Imaging. This Special Issue aims to explore the cutting-edge advancements in the field of computer vision, particularly focusing on the innovative applications and developments of neural networks. As computer vision continues to revolutionize various sectors—from healthcare to automotive industries—the role of neural networks in enhancing and evolving this technology is more significant than ever.

We invite contributions that address a range of topics including, but not limited to, machine learning algorithms for image and video analysis, deep learning approaches for pattern recognition, and neural network architectures for real-time image processing. Submissions that demonstrate novel applications of AI in computer vision, or that propose innovative solutions to traditional computer vision challenges using neural networks, are highly encouraged.

This Special Issue seeks to provide a platform for researchers and practitioners from around the world to share their insights, discoveries, and advancements in the field. We welcome original research papers, comprehensive reviews, and case studies that contribute to the body of knowledge in applying neural networks to computer vision.

Your submission will contribute to a broader understanding of how neural networks are shaping the future of computer vision and its applications across diverse fields. We look forward to your valuable contributions to this dynamic and rapidly evolving area of research.

Dr. Mario Molinara
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com after registering and logging in to the website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers are published continuously in the journal (as soon as accepted) and are listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Journal of Imaging is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • deep learning
  • computer vision
  • neural networks
  • artificial intelligence
  • pattern recognition

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (3 papers)


Research

21 pages, 3621 KiB  
Article
SAVE: Self-Attention on Visual Embedding for Zero-Shot Generic Object Counting
by Ahmed Zgaren, Wassim Bouachir and Nizar Bouguila
J. Imaging 2025, 11(2), 52; https://doi.org/10.3390/jimaging11020052 - 10 Feb 2025
Viewed by 1066
Abstract
Zero-shot counting is a subcategory of Generic Visual Object Counting, which aims to count objects from an arbitrary class in a given image. While few-shot counting relies on providing exemplars to the model in order to count objects of the same class, zero-shot counting automates the operation for faster processing. This paper proposes a fully automated zero-shot method that outperforms both prior zero-shot and few-shot methods. By exploiting feature maps from a pre-trained detection-based backbone, we introduce a new Visual Embedding Module designed to generate semantic embeddings that incorporate object contextual information. These embeddings are then fed to a Self-Attention Matching Module to generate an encoded representation for the counting head. Our proposed method outperforms recent zero-shot approaches, achieving the best Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) results of 8.89 and 35.83, respectively, on the FSC147 dataset. Additionally, our method demonstrates competitive performance compared to few-shot methods, advancing the capabilities of visual object counting in industrial applications such as tree counting and wildlife counting, and in medical applications such as blood cell counting.
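To make the described pipeline easier to picture, the following is a minimal PyTorch sketch of the general backbone → visual embedding → self-attention → counting-head structure outlined in the abstract. The module names, dimensions, and density-style counting head are illustrative assumptions, not the authors' implementation of SAVE.

```python
# Hypothetical sketch of a zero-shot counting pipeline in the spirit of the abstract
# (backbone features -> visual embedding -> self-attention -> counting head).
# Module names, dimensions, and the density-style head are assumptions.
import torch
import torch.nn as nn
import torchvision

class ZeroShotCounter(nn.Module):
    def __init__(self, embed_dim=256, num_heads=8):
        super().__init__()
        # Pre-trained backbone providing feature maps (stand-in for a detection backbone).
        backbone = torchvision.models.resnet50(weights="IMAGENET1K_V2")
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])  # (B, 2048, H/32, W/32)
        # "Visual Embedding Module": project features into a semantic embedding space.
        self.embed = nn.Conv2d(2048, embed_dim, kernel_size=1)
        # "Self-Attention Matching Module": relate embeddings across spatial positions.
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Counting head: predict a non-negative density whose sum is the object count.
        self.head = nn.Sequential(nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, images):                      # images: (B, 3, H, W)
        feats = self.embed(self.backbone(images))   # (B, C, h, w)
        tokens = feats.flatten(2).transpose(1, 2)   # (B, h*w, C)
        attended, _ = self.attn(tokens, tokens, tokens)
        density = self.head(attended).relu()        # (B, h*w, 1)
        return density.sum(dim=(1, 2))              # predicted count per image

counts = ZeroShotCounter()(torch.randn(2, 3, 512, 512))
print(counts.shape)  # torch.Size([2])
```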

17 pages, 2685 KiB  
Article
Investigating the Sim-to-Real Generalizability of Deep Learning Object Detection Models
by Joachim Rüter, Umut Durak and Johann C. Dauer
J. Imaging 2024, 10(10), 259; https://doi.org/10.3390/jimaging10100259 - 18 Oct 2024
Cited by 1 | Viewed by 1716
Abstract
State-of-the-art object detection models need large and diverse datasets for training. As these are hard to acquire for many practical applications, training images from simulation environments are attracting increasing attention. However, deep learning models trained on simulation images usually generalize poorly to real-world images, which manifests as a sharp performance drop, and the precise causes of this drop have not yet been identified. While previous work has mostly investigated the influence of the data and the use of domain adaptation, this work provides a novel perspective by investigating the influence of the object detection model itself. Against this background, a measure called sim-to-real generalizability is first defined, capturing the capability of an object detection model to generalize from simulation training images to real-world evaluation images. Second, 12 different deep learning-based object detection models are trained and their sim-to-real generalizability is evaluated. The models are trained with varying hyperparameters, resulting in a total of 144 trained and evaluated versions. The results show a clear influence of the feature extractor and offer further insights and correlations. They open up future research on the factors that influence the sim-to-real generalizability of deep learning-based object detection models, as well as on developing feature extractors with better sim-to-real generalizability.
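The evaluation protocol in the abstract, training many detector variants on simulation images and scoring them on both simulated and real images, can be organized roughly as in the sketch below. The gap definition, the `train_detector`/`evaluate_map` helpers, and the hyperparameter grid are hypothetical placeholders; the paper's exact measure and model families are not reproduced here.

```python
# Hedged sketch of a sim-to-real generalizability sweep: train each detector
# variant on simulation images, evaluate it on simulated and real validation
# sets, and record the performance gap. `train_detector` and `evaluate_map`
# are placeholders for whatever training/evaluation code is actually used.
from dataclasses import dataclass
from itertools import product

@dataclass
class RunResult:
    model_name: str
    hyperparams: dict
    map_sim: float   # mAP on held-out simulation images
    map_real: float  # mAP on real-world images

def sim_to_real_gap(result: RunResult) -> float:
    # One possible definition: relative drop when moving from sim to real data.
    return (result.map_sim - result.map_real) / max(result.map_sim, 1e-8)

def run_sweep(model_names, learning_rates, batch_sizes, train_detector, evaluate_map):
    results = []
    for name, lr, bs in product(model_names, learning_rates, batch_sizes):
        hp = {"lr": lr, "batch_size": bs}
        detector = train_detector(name, hp)            # trained on simulation images only
        results.append(RunResult(
            model_name=name,
            hyperparams=hp,
            map_sim=evaluate_map(detector, split="sim_val"),
            map_real=evaluate_map(detector, split="real_test"),
        ))
    # Rank configurations by how little they degrade on real imagery.
    return sorted(results, key=sim_to_real_gap)
```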

15 pages, 6555 KiB  
Article
Video-Based Sign Language Recognition via ResNet and LSTM Network
by Jiayu Huang and Varin Chouvatut
J. Imaging 2024, 10(6), 149; https://doi.org/10.3390/jimaging10060149 - 20 Jun 2024
Viewed by 3307
Abstract
Sign language recognition technology can help people with hearing impairments communicate with hearing people, and deep learning now provides substantial technical support for this task. In sign language recognition, traditional convolutional neural networks used to extract spatio-temporal features from sign language videos suffer from insufficient feature extraction, resulting in low recognition rates. Moreover, large video-based sign language datasets require significant computing resources for training while still ensuring the generalization of the network, which poses a further challenge. In this paper, we present a video-based sign language recognition method based on a Residual Network (ResNet) and a Long Short-Term Memory (LSTM) network. As the number of network layers increases, ResNet effectively mitigates the granularity explosion problem and yields better features, so we use it as the backbone model to extract sign language features. The LSTM uses gates to control its cell state and update the output features of a sequence; the feature space learned by ResNet is fed into the LSTM network to obtain long-sequence temporal features. The combined model effectively extracts the spatio-temporal features in sign language videos and improves the recognition rate of sign language actions. An extensive experimental evaluation demonstrates the effectiveness and superior performance of the proposed method, with an accuracy of 85.26%, an F1-score of 84.98%, and a precision of 87.77% on the Argentine Sign Language (LSA64) dataset.
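The ResNet-to-LSTM pattern described in the abstract, per-frame spatial features followed by temporal modelling, can be sketched in PyTorch as follows. The chosen ResNet depth, hidden size, and 64-class output head (LSA64 contains 64 signs) are assumptions for illustration, not the authors' exact configuration.

```python
# Minimal, hypothetical sketch of the ResNet -> LSTM pattern: per-frame spatial
# features from a ResNet backbone, temporal modelling with an LSTM, then a
# classifier over sign classes. Depths and hidden sizes are assumptions.
import torch
import torch.nn as nn
import torchvision

class ResNetLSTMClassifier(nn.Module):
    def __init__(self, num_classes=64, hidden_size=512):
        super().__init__()
        backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")
        feat_dim = backbone.fc.in_features            # 512 for resnet18
        backbone.fc = nn.Identity()                   # keep pooled per-frame features
        self.backbone = backbone
        self.lstm = nn.LSTM(feat_dim, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, clips):                         # clips: (B, T, 3, H, W)
        B, T = clips.shape[:2]
        frames = clips.flatten(0, 1)                  # (B*T, 3, H, W)
        feats = self.backbone(frames).view(B, T, -1)  # (B, T, feat_dim)
        _, (h_n, _) = self.lstm(feats)                # h_n: (1, B, hidden_size)
        return self.classifier(h_n[-1])               # logits: (B, num_classes)

logits = ResNetLSTMClassifier()(torch.randn(2, 16, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 64])
```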
