Convolutional Neural Networks and Computer Vision

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 September 2025 | Viewed by 4115

Special Issue Editors


Guest Editor
Computer Science Department, Universidad Carlos III de Madrid, Avenida Gregorio Peces-Barba Martínez, 22, 28270 Colmenarejo, Madrid, Spain
Interests: machine learning; computer vision; data mining; neural networks; IoT

Guest Editor
Computer Science Department, Universidad de Alcalá, Ctra. Madrid-Barcelona, 28805 Alcalá de Henares, Spain
Interests: cognitive science; computer vision; evolutionary computation

Special Issue Information

Dear Colleagues,

Convolutional neural networks (CNNs) have revolutionized the field of computer vision and become a fundamental tool in modern AI research and applications. CNNs are designed to automatically and adaptively learn spatial hierarchies of features from input images, making them highly effective across a wide range of computer vision tasks, including image classification, object detection, semantic segmentation, instance segmentation, pose estimation, and image generation. With the exponential growth of digital imagery and video data, CNNs offer a scalable solution for analyzing and interpreting visual information at unprecedented levels of accuracy.
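
As a minimal illustration of this feature-hierarchy idea (a generic sketch, not the method of any paper in this Issue), the following PyTorch snippet stacks two convolution-pooling stages ahead of a linear classifier; all layer widths and the 32x32 input size are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # low-level edges and textures
            nn.ReLU(),
            nn.MaxPool2d(2),                               # downsample: 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),   # mid-level motifs built from edges
            nn.ReLU(),
            nn.MaxPool2d(2),                               # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)                               # learned spatial hierarchy
        return self.classifier(torch.flatten(x, 1))        # class scores

logits = TinyCNN()(torch.randn(1, 3, 32, 32))  # one 32x32 RGB image -> 10 class scores
```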

Computer vision, an interdisciplinary field that combines artificial intelligence, machine learning, and signal processing, has advanced significantly due to CNNs' capability to extract meaningful features from complex image data.

Research areas may include (but are not limited to) the following:

  1. Novel CNN architectures for computer vision applications.
  2. Image segmentation (object detection, instance segmentation, semantic segmentation, and panoptic segmentation).
  3. Video object segmentation.
  4. Object and multi-object tracking.
  5. Vision-based navigation.
  6. Autonomous vehicle perception and control systems.
  7. Remote sensing and environmental monitoring.
  8. Real-time image processing and analysis.
  9. Foundation vision models.
  10. Fusion of CNNs with other machine learning techniques.
  11. Medical imaging and diagnostics using CNNs.
  12. Industrial automation and robotics applications.
  13. Image and video generation using deep learning models.
  14. CNNs for augmented and virtual reality systems.

Contributions that explore new architectures, training methodologies, applications, or integration of CNNs with emerging technologies are especially welcome. All submitted papers will undergo rigorous peer review and will be selected based on their relevance, originality, and alignment with the theme of this Special Issue.

We look forward to receiving your contributions.

Dr. Miguel Angel Patricio
Prof. Dr. Luis Usero Aragonés
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • CNN
  • image segmentation
  • video segmentation
  • object tracking
  • multi-object tracking
  • vision-based navigation
  • visual SLAM
  • foundation vision models

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (4 papers)


Research

17 pages, 3023 KiB  
Article
SEM-Net: A Social–Emotional Music Classification Model for Emotion Regulation and Music Literacy in Individuals with Special Needs
by Yu-Chi Chou, Shan-Ken Chien, Pen-Chiang Chao, Yuan-Jin Lin, Chih-Yun Chen, Kuang-Kai Yeh, Yen-Chia Peng, Chen-Hao Tsao, Shih-Lun Chen and Kuo-Chen Li
Appl. Sci. 2025, 15(8), 4191; https://doi.org/10.3390/app15084191 - 10 Apr 2025
Viewed by 292
Abstract
This study aims to establish an innovative AI-based social–emotional music classification model named SEM-Net, specifically designed to integrate three core positive social–emotional elements—positive outlook, empathy, and problem-solving—into classical music, facilitating accurate emotional classification of musical excerpts related to emotional states. SEM-Net employs a convolutional neural network (CNN) architecture composed of 17 meticulously structured layers to capture complex emotional and musical features effectively. To further enhance the precision and robustness of the classification system, advanced social–emotional music feature preprocessing and sophisticated feature extraction techniques were developed, significantly improving the model’s predictive performance. Experimental results demonstrate that SEM-Net achieves an impressive final classification accuracy of 94.13%, substantially surpassing the baseline method by 54.78% and outperforming other widely used deep learning architectures, including conventional CNN, LSTM, and Transformer models, by at least 27%. The proposed SEM-Net system facilitates emotional regulation and meaningfully enhances emotional and musical literacy, social communication skills, and overall quality of life for individuals with special needs, offering a practical, scalable, and accessible tool that contributes significantly to personalized emotional growth and social–emotional learning. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Computer Vision)
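
For readers unfamiliar with this class of pipeline, the hypothetical sketch below shows how a small CNN can classify precomputed mel-spectrogram excerpts into the three social-emotional classes named in the abstract. The abstract does not disclose SEM-Net's actual 17-layer specification, so every layer, width, and name here is an assumption, not the authors' model.

```python
import torch
import torch.nn as nn

CLASSES = ["positive_outlook", "empathy", "problem_solving"]  # classes from the abstract

class EmotionCNN(nn.Module):
    """Hypothetical stand-in classifier over mel-spectrogram excerpts (not SEM-Net)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),     # global pooling over time/frequency
            nn.Linear(32, len(CLASSES)),               # three social-emotional classes
        )

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        return self.net(mel)  # (batch, 1, n_mels, frames) -> class logits

# e.g. a 128-mel, 256-frame excerpt (sizes are illustrative):
probs = EmotionCNN()(torch.randn(1, 1, 128, 256)).softmax(dim=-1)
```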

18 pages, 7671 KiB  
Article
Automated Gluten Detection in Bread Images Using Convolutional Neural Networks
by Aviad Elyashar, Abigail Paradise Vit, Guy Sebbag, Alex Khaytin and Avi Zakai
Appl. Sci. 2025, 15(4), 1737; https://doi.org/10.3390/app15041737 - 8 Feb 2025
Viewed by 822
Abstract
Celiac disease and gluten sensitivity affect a significant portion of the population and require adherence to a gluten-free diet. Dining in social settings, such as family events, workplace gatherings, or restaurants, makes it difficult to ensure that certain foods are gluten-free. Although portable gluten testing devices are available, these instruments are costly, require disposable capsules, depend on user preparation and technique, and cannot analyze an entire meal or detect gluten levels below the legal thresholds, potentially leading to inaccurate results. In this study, we propose RGB (Recognition of Gluten in Bread), a novel deep learning-based method for automatically detecting gluten in bread images. RGB is a decision-support tool to help individuals with celiac disease make informed dietary choices. To develop this method, we curated and annotated three unique datasets of bread images: one collected from Pinterest, one from Instagram, and a custom dataset containing information about flour types. Fine-tuning pre-trained convolutional neural networks (CNNs) on the Pinterest dataset, our best-performing model, ResNet50V2, achieved 77% accuracy and recall. Transfer learning was subsequently applied to adapt the model to the Instagram dataset, resulting in 78% accuracy and 77% recall. Finally, further fine-tuning the model on a significantly different dataset, the custom bread dataset, substantially improved performance, achieving an accuracy of 86%, precision of 87%, recall of 86%, and F1-score of 86%. Our analysis further revealed that the model performed better on gluten-free flours, achieving higher accuracy scores for these types. This study demonstrates the feasibility of image-based gluten detection in bread and highlights its potential to provide a cost-effective, non-invasive alternative to traditional testing methods by allowing individuals with celiac disease to receive immediate feedback on the potential gluten content of their meals through simple food photography. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Computer Vision)
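
The staged transfer-learning recipe the abstract describes can be sketched as follows. Here torchvision's resnet50 stands in for the Keras ResNet50V2 the authors used, and the two-class head, learning rate, and staging are illustrative assumptions rather than the paper's settings.

```python
import torch
import torch.nn as nn
from torchvision import models

# Stage 1: freeze the ImageNet-pretrained backbone and train only a new head.
# (resnet50 is a stand-in; the paper fine-tunes ResNet50V2, a Keras model.)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)  # gluten vs. gluten-free

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)  # assumed hyperparameter
loss_fn = nn.CrossEntropyLoss()

def fine_tune_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step on a batch from the current bread dataset."""
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Later stages would unfreeze deeper layers at a lower learning rate before
# adapting in turn to the Instagram and custom flour-type datasets.
```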

17 pages, 5264 KiB  
Article
Automated Road Extraction from Satellite Imagery Integrating Dense Depthwise Dilated Separable Spatial Pyramid Pooling with DeepLabV3+
by Arpan Mahara, Md Rezaul Karim Khan, Liangdong Deng, Naphtali Rishe, Wenjia Wang and Seyed Masoud Sadjadi
Appl. Sci. 2025, 15(3), 1027; https://doi.org/10.3390/app15031027 - 21 Jan 2025
Cited by 1 | Viewed by 1064
Abstract
Road extraction is a sub-domain of remote sensing applications and a subject of extensive and ongoing research. Automatically extracting roads from satellite imagery poses significant challenges due to the multi-scale and diverse structures of roads, leaving clear room for improvement in this field. Convolutional neural networks (CNNs), especially the DeepLab series known for its proficiency in semantic segmentation thanks to its efficiency in interpreting multi-scale objects' features, address some of the challenges caused by the varying nature of roads. The present work builds on DeepLabV3+, the latest version of the DeepLab series, by introducing an innovative Dense Depthwise Dilated Separable Spatial Pyramid Pooling (DenseDDSSPP) module and integrating it in place of the conventional Atrous Spatial Pyramid Pooling (ASPP) module. This modification enhances the extraction of complex road structures from satellite images. This study hypothesizes that integrating DenseDDSSPP with a CNN backbone network and a Squeeze-and-Excitation block will generate an efficient dense feature map by focusing on relevant features, leading to more precise and accurate road extraction from remote sensing images. The Results Section compares our model's performance against state-of-the-art models, demonstrating superior results that highlight the effectiveness of the proposed approach. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Computer Vision)
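
Working from the abstract alone, the sketch below shows one plausible reading of the DenseDDSSPP idea: densely connected depthwise-separable convolutions at growing dilation rates, reweighted by a squeeze-and-excitation block. The authors' exact module, channel widths, and dilation rates may differ.

```python
import torch
import torch.nn as nn

class SepDilatedConv(nn.Module):
    """Depthwise (dilated) convolution followed by a pointwise convolution."""
    def __init__(self, in_ch: int, out_ch: int, dilation: int):
        super().__init__()
        self.dw = nn.Conv2d(in_ch, in_ch, 3, padding=dilation,
                            dilation=dilation, groups=in_ch)
        self.pw = nn.Conv2d(in_ch, out_ch, 1)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.pw(self.dw(x)))

class DenseDilatedPyramid(nn.Module):
    """Assumed DenseDDSSPP-style module; rates and widths are illustrative."""
    def __init__(self, in_ch=256, growth=64, dilations=(1, 3, 6, 12)):
        super().__init__()
        self.branches = nn.ModuleList()
        ch = in_ch
        for d in dilations:                      # dense connectivity: each branch
            self.branches.append(SepDilatedConv(ch, growth, d))  # sees all earlier outputs
            ch += growth
        self.se = nn.Sequential(                 # squeeze-and-excitation reweighting
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // 16, 1), nn.ReLU(),
            nn.Conv2d(ch // 16, ch, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        feats = [x]
        for branch in self.branches:
            feats.append(branch(torch.cat(feats, dim=1)))
        out = torch.cat(feats, dim=1)            # dense multi-scale feature map
        return out * self.se(out)                # channel-wise recalibration

y = DenseDilatedPyramid()(torch.randn(1, 256, 32, 32))  # -> (1, 512, 32, 32)
```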

22 pages, 8466 KiB  
Article
A Comparative Study of Convolutional Neural Network and Transformer Architectures for Drone Detection in Thermal Images
by Gian Gutierrez, Juan P. Llerena, Luis Usero and Miguel A. Patricio
Appl. Sci. 2025, 15(1), 109; https://doi.org/10.3390/app15010109 - 27 Dec 2024
Cited by 2 | Viewed by 1312
Abstract
The widespread growth of drone technology is creating new security challenges, especially with regard to unauthorized UAV activity in restricted or sensitive areas, as well as illegal or illicit activities and attacks. Among the various UAV detection technologies, vision systems operating in different spectral bands stand out owing to their distinctive advantages over alternative technologies. However, drone detection in thermal imaging is a challenging task due to factors such as thermal noise, temperature variability, and cluttered environments. This study addresses these challenges through a comparative evaluation of contemporary neural network architectures, specifically convolutional neural networks (CNNs) and transformer-based models, for UAV detection in infrared imagery. The research focuses on real-world conditions and examines the performance of YOLOv9, GELAN, DETR, and ViTDet across different scenarios of the Anti-UAV Challenge 2023 dataset. The results show that YOLOv9 stands out for its real-time detection speed, while GELAN provides the highest accuracy under varying conditions and DETR performs reliably in thermally complex environments. The study contributes to the advancement of state-of-the-art UAV detection techniques and highlights the need for further development of specialized models for specific detection scenarios. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Computer Vision)
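
The detectors compared in this study (YOLOv9, GELAN, DETR, ViTDet) ship in their own repositories; as a neutral illustration of the thermal-detection pipeline itself, the sketch below runs an off-the-shelf torchvision detector on a single-channel infrared frame replicated to three channels. The frame and the 0.5 confidence threshold are arbitrary stand-ins, not values from the paper.

```python
import torch
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights,
)

# Off-the-shelf RGB-pretrained detector used purely for illustration.
model = fasterrcnn_resnet50_fpn(
    weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT
).eval()

thermal = torch.rand(1, 512, 640)      # stand-in for a normalized IR frame in [0, 1]
rgb_like = thermal.repeat(3, 1, 1)     # pretrained weights expect 3 input channels

with torch.no_grad():
    detections = model([rgb_like])[0]  # dict with boxes, labels, scores

keep = detections["scores"] > 0.5      # discard low-confidence detections
print(detections["boxes"][keep])       # candidate bounding boxes
```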
