Convolutional Neural Networks and Vision Applications, 4th Edition

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 20 December 2025 | Viewed by 5005

Special Issue Information

Dear Colleagues,

Processing speed is critical for visual inspection automation and mobile visual computing applications. Many powerful and sophisticated computer vision algorithms produce accurate results but require substantial computational power or resources, making them unsuitable for real-time vision applications. On the other hand, there are vision algorithms and convolutional neural networks that perform at camera frame rates with moderately reduced accuracy, which makes them arguably better suited to real-time vision applications. This Special Issue is reserved for research related to the design, optimization, and implementation of machine-learning-based vision algorithms or convolutional neural networks that are suitable for real-time vision applications.

General topics covered in this Special Issue include the following:

  • Optimization of software-based vision algorithms;
  • CNN architecture optimizations for real-time performance;
  • CNN acceleration through approximate computing;
  • CNN applications that require real-time performance;
  • Tradeoff analysis between speed and accuracy in CNNs;
  • GPU-based implementations for real-time CNN performance;
  • FPGA-based implementations for real-time CNN performance;
  • Embedded vision systems for applications that require real-time performance;
  • Machine vision applications that require real-time performance.

Prof. Dr. D. J. Lee
Prof. Dr. Dong Zhang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • convolutional neural networks
  • vision applications
  • CNN architecture

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (7 papers)


Research

14 pages, 1746 KB  
Article
Network Splitting Techniques and Their Optimization for Lightweight Ternary Neural Networks
by Hasna Nur Karimah, Novi Prihatiningrum, Young-Ho Gong, Jonghoon Jin and Yeongkyo Seo
Electronics 2025, 14(18), 3651; https://doi.org/10.3390/electronics14183651 - 15 Sep 2025
Abstract
To run a high-performing deep convolutional neural network (CNN), substantial memory and computational resources are typically required. To address this, we propose an optimization method for ternary neural networks (TNNs) that applies network splitting techniques to achieve an even more lightweight model. A TNN offers a favorable trade-off between accuracy and computational savings compared to a binary quantized network, which often suffers from higher accuracy loss due to extreme quantization. Our network splitting technique combines grouped convolution and pointwise convolution: the convolution operations are computed in separate groups, and the features are then fused together in a later step. The proposed technique has the advantage of being easy to implement in lightweight hardware designs. For example, when implementing Processing-In-Memory (PIM) hardware, each convolution layer can be set to the same size, enabling the design of lightweight neural network accelerators by eliminating the need for analog-to-digital conversion. Our experiments show that the proposed method achieves up to 4.53× memory compression with minimal impact on accuracy.
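
For readers unfamiliar with the pattern, the sketch below illustrates the splitting structure the abstract describes: a grouped convolution computes features in separate channel groups, and a pointwise (1×1) convolution fuses them afterward. This is a minimal PyTorch sketch; the channel counts, group count, and kernel size are illustrative assumptions, not the authors' configuration, and the ternary quantization itself is omitted.

```python
# Minimal sketch of a network-splitting block: grouped convolution followed
# by pointwise fusion. All layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class SplitConvBlock(nn.Module):
    def __init__(self, in_ch=64, out_ch=64, groups=4):
        super().__init__()
        # Grouped convolution: channels are processed in independent groups.
        self.grouped = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1,
                                 groups=groups, bias=False)
        # Pointwise (1x1) convolution fuses features across the groups.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.grouped(x))

x = torch.randn(1, 64, 32, 32)
print(SplitConvBlock()(x).shape)  # torch.Size([1, 64, 32, 32])
```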

26 pages, 5004 KB  
Article
Effectiveness of Modern Models Belonging to the YOLO and Vision Transformer Architectures in Dangerous Items Detection
by Zbigniew Omiotek
Electronics 2025, 14(17), 3540; https://doi.org/10.3390/electronics14173540 - 5 Sep 2025
Viewed by 549
Abstract
The effectiveness of recently developed tools for detecting dangerous items is overestimated due to the low quality of the datasets used to build the models. The main drawbacks of these datasets include the unrepresentative range of conditions in which the items are presented, the limited number of classes representing the items to be detected, and the small number of instances of items in individual classes. To fill this gap, a comprehensive dataset dedicated to detecting the items most commonly used in acts that violate public security has been built. The dataset includes items such as machetes, knives, baseball bats, rifles, and guns, presented in varying quality and under different environmental conditions. The specificity of the constructed dataset allows for more reliable results, which give a better idea of the effectiveness of item detection in real-world conditions. The collected dataset was used to build and compare the effectiveness of modern detection models belonging to the YOLO and Vision Transformer (ViT) architectures. A comprehensive analysis of the results, taking both accuracy and performance into account, showed that the best results were achieved by the YOLOv11m model, with Recall = 88.2%, Precision = 89.6%, mAP@50 = 91.8%, mAP@50–95 = 73.7%, and an inference time of 1.9 ms. The test results make it possible to recommend this model for use in public security monitoring systems aimed at detecting potentially dangerous items.
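
As a point of reference, the snippet below sketches how a YOLO-family detector of this kind is typically run on a frame using the ultralytics Python package; the checkpoint name and the blank test image are assumptions and do not correspond to the paper's trained dangerous-item model.

```python
# Minimal sketch of YOLO inference with the ultralytics package; the
# checkpoint and input are placeholders, not the paper's model or data.
import numpy as np
from ultralytics import YOLO

model = YOLO("yolo11m.pt")                       # pretrained checkpoint (assumed name)
image = np.zeros((640, 640, 3), dtype=np.uint8)  # stand-in for a camera frame
results = model(image)
for box in results[0].boxes:
    label = results[0].names[int(box.cls)]       # predicted class name
    print(label, float(box.conf), box.xyxy.tolist())
```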

15 pages, 2578 KB  
Article
Effects of Composite Cross-Entropy Loss on Adversarial Robustness
by Ning Ding and Knut Möller
Electronics 2025, 14(17), 3529; https://doi.org/10.3390/electronics14173529 - 4 Sep 2025
Viewed by 399
Abstract
Convolutional neural networks (CNNs) can efficiently extract image features and perform the corresponding classification. Typically, a CNN architecture uses a softmax layer to map the extracted features to classification probabilities, and the cost function used for training is the cross-entropy loss. In this paper, we evaluate the influence of several representative composite cross-entropy loss functions on the feature space learned at the fully connected layer when a target classification is introduced into a multi-class classification task. In addition, the accuracy and robustness of CNN models trained with different composite cross-entropy loss functions are investigated. Improved robustness is achieved by changing the loss between the input and the target classification. Preliminary experiments were conducted using ResNet-50 on the Cholec80 dataset for surgical tool recognition. The model trained with the proposed composite cross-entropy loss, which incorporates an additional all-one target classification, demonstrates a peak improvement of 31% in adversarial robustness. Adversarial training with targeted adversarial samples yields 80% robustness against PGD attacks. This investigation shows that a careful choice of loss function can improve the robustness of CNN models.
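
To make the idea concrete, here is a minimal sketch of a composite cross-entropy loss that adds a term pulling the output distribution toward an "all-one" target normalized to a uniform distribution, which is one possible reading of the target classification described in the abstract; the KL-divergence formulation and the weighting factor alpha are assumptions, not the authors' exact formulation.

```python
# Minimal sketch of a composite cross-entropy loss; the KL term toward a
# uniform "all-one" target and the weight alpha are assumptions.
import torch
import torch.nn.functional as F

def composite_ce_loss(logits, labels, alpha=0.1):
    ce = F.cross_entropy(logits, labels)  # standard cross-entropy term
    # All-one target, normalized to a uniform distribution over classes.
    uniform = torch.full_like(logits, 1.0 / logits.size(1))
    target_term = F.kl_div(F.log_softmax(logits, dim=1), uniform,
                           reduction="batchmean")
    return ce + alpha * target_term

logits = torch.randn(8, 7, requires_grad=True)  # e.g., 7 surgical tool classes
labels = torch.randint(0, 7, (8,))
print(composite_ce_loss(logits, labels))
```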

23 pages, 3506 KB  
Article
Evaluation of Vision Transformers for Multi-Organ Tumor Classification Using MRI and CT Imaging
by Óscar A. Martín and Javier Sánchez
Electronics 2025, 14(15), 2976; https://doi.org/10.3390/electronics14152976 - 25 Jul 2025
Viewed by 671
Abstract
Neural networks have become the standard technique for medical diagnostics, especially in cancer detection and classification. This work evaluates the performance of Vision Transformer architectures, including the Swin Transformer and MaxViT, on several datasets of magnetic resonance imaging (MRI) and computed tomography (CT) scans. We used three training sets of images with brain, lung, and kidney tumors. Each dataset included different classification labels, from brain gliomas and meningiomas to benign and malignant lung conditions and kidney anomalies such as cysts and cancers. This work aims to analyze the behavior of the neural networks on each dataset and the benefits of combining different image modalities and tumor classes. We designed several experiments by fine-tuning the models on combined and individual datasets. The results revealed that the Swin Transformer achieved the highest accuracy, with an average of 99.0% on the individual datasets and 99.43% on the combined dataset. This research highlights the adaptability of Transformer-based models to various human organs and image modalities. The main contribution lies in evaluating multiple ViT architectures across multi-organ tumor datasets, demonstrating their generalization to multi-organ classification. Integrating these models across diverse datasets could mark a significant advance in precision medicine, paving the way for more efficient healthcare solutions.
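
The fine-tuning setup the abstract describes can be sketched in a few lines; the snippet below uses the timm library to load a pretrained Swin Transformer and run one training step. The model name, class count, learning rate, and the random tensors standing in for MRI/CT batches are all assumptions, not the authors' configuration.

```python
# Minimal sketch of fine-tuning a pretrained Swin Transformer for tumor
# classification; all hyperparameters and data are placeholder assumptions.
import timm
import torch

model = timm.create_model("swin_base_patch4_window7_224",
                          pretrained=True, num_classes=4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
criterion = torch.nn.CrossEntropyLoss()

images = torch.randn(2, 3, 224, 224)   # stand-in for an MRI/CT batch
labels = torch.randint(0, 4, (2,))     # stand-in tumor-class labels
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```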

21 pages, 3250 KB  
Article
Deploying Optimized Deep Vision Models for Eyeglasses Detection on Low-Power Platforms
by Henrikas Giedra, Tomyslav Sledevič and Dalius Matuzevičius
Electronics 2025, 14(14), 2796; https://doi.org/10.3390/electronics14142796 - 11 Jul 2025
Viewed by 777
Abstract
This research addresses the optimization and deployment of convolutional neural networks for eyeglasses detection on low-power edge devices. Multiple convolutional neural network architectures were trained and evaluated using the FFHQ dataset, which contains annotated eyeglasses in the context of faces with diverse facial features and eyewear styles. Several post-training quantization techniques, including Float16, dynamic-range, and full-integer quantization, were applied to reduce model size and computational demand while preserving detection accuracy. The impact of model architecture and quantization method on detection accuracy and inference latency was systematically evaluated. The optimized models were deployed and benchmarked on the Raspberry Pi 5 and NVIDIA Jetson Orin Nano platforms. Experimental results show that full-integer quantization reduces model size by up to 75% while maintaining competitive detection accuracy. Among the evaluated models, the MobileNet architectures achieved the most favorable balance between inference speed and accuracy, demonstrating their suitability for real-time eyeglasses detection in resource-constrained environments. These findings enable efficient on-device eyeglasses detection, supporting applications such as virtual try-ons and IoT-based facial analysis systems.
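
Full-integer post-training quantization, one of the techniques evaluated here, follows a standard recipe in TensorFlow Lite; a minimal sketch is shown below. The MobileNetV2 stand-in model and the random representative dataset are assumptions, not the authors' pipeline.

```python
# Minimal sketch of full-integer post-training quantization with TensorFlow
# Lite; the model and representative dataset are placeholder assumptions.
import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights=None)  # stand-in model

def representative_data():
    # Calibration samples; real deployments would use actual images.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()
open("model_int8.tflite", "wb").write(tflite_model)
```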

24 pages, 985 KB  
Article
Attention-Based Deep Feature Aggregation Network for Skin Lesion Classification
by Siddiqui Muhammad Yasir and Hyun Kim
Electronics 2025, 14(12), 2364; https://doi.org/10.3390/electronics14122364 - 9 Jun 2025
Viewed by 1012
Abstract
Early and accurate detection of dermatological conditions, particularly melanoma, is critical for effective treatment and improved patient outcomes. In medical image processing, misclassifications may lead to delayed diagnosis, disease progression, and severe complications. Hence, robust and reliable classification techniques are essential to enhance diagnostic precision in clinical practice. This study presents a deep-learning-based framework designed to improve feature representation while maintaining computational efficiency. The proposed architecture integrates multi-level feature aggregation with a squeeze-and-excitation attention mechanism to effectively extract salient patterns from dermoscopic images. The model is rigorously evaluated on five publicly available benchmark datasets (ISIC-2019, ISIC-2020, SKINL2, MED-NODE, and HAM10000) covering a diverse spectrum of dermatological disorders. Experimental results demonstrate that the proposed method consistently outperforms existing approaches in classification performance, achieving accuracy rates of 94.41% and 97.45% on the MED-NODE and HAM10000 datasets, respectively. These results underscore the method's potential for real-world deployment in automated skin lesion analysis and clinical decision support.
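
The squeeze-and-excitation attention mechanism mentioned in the abstract can be sketched compactly; the PyTorch block below shows the standard squeeze (global pooling) and excitation (channel reweighting) steps. The channel count and reduction ratio are illustrative assumptions, and the paper's multi-level feature aggregation is not shown.

```python
# Minimal sketch of a squeeze-and-excitation (SE) attention block; the
# channel count and reduction ratio are illustrative assumptions.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global spatial context
        self.fc = nn.Sequential(             # excitation: channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # reweight each feature map by its learned importance

x = torch.randn(1, 64, 28, 28)
print(SEBlock()(x).shape)  # torch.Size([1, 64, 28, 28])
```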

25 pages, 9742 KB  
Article
Autism Spectrum Disorder Detection Using Skeleton-Based Body Movement Analysis via Dual-Stream Deep Learning
by Jungpil Shin, Abu Saleh Musa Miah, Manato Kakizaki, Najmul Hassan and Yoichi Tomioka
Electronics 2025, 14(11), 2231; https://doi.org/10.3390/electronics14112231 - 30 May 2025
Viewed by 1098
Abstract
Autism Spectrum Disorder (ASD) poses significant challenges in diagnosis due to its diverse symptomatology and the complexity of early detection. Atypical gait and gesture patterns, prominent behavioural markers of ASD, hold immense potential for facilitating early intervention and optimising treatment outcomes. These patterns can be captured efficiently and non-intrusively using modern computational techniques, making them valuable for ASD recognition. Prior research has sought to detect ASD through deep learning, including facial feature analysis, eye gaze analysis, and movement and gesture analysis. In this study, we optimise a dual-stream architecture that combines image classification and skeleton recognition models to analyse video data for body motion analysis. The first stream processes Skepxels, spatial representations derived from skeleton data, using ConvNeXt-Base, a robust image recognition model that efficiently captures aggregated spatial embeddings. The second stream encodes angular features, embedding relative joint angles into the skeleton sequence and extracting spatiotemporal dynamics using the Multi-Scale Graph 3D Convolutional Network (MSG3D), a combination of Graph Convolutional Networks (GCNs) and Temporal Convolutional Networks (TCNs). We replace the ViT model from the original architecture with ConvNeXt-Base to evaluate the efficacy of CNN-based models in capturing gesture-related features for ASD detection. We also experimented with a Stack Transformer in the second stream instead of MSG3D but found that it yielded lower accuracy, highlighting the importance of GCN-based models for motion analysis. The integration of these two streams ensures comprehensive feature extraction, capturing both global and detailed motion patterns. A pairwise Euclidean distance loss is employed during training to enhance the consistency and robustness of feature representations. Our experiments demonstrate that the two-stream approach combining ConvNeXt-Base and MSG3D offers a promising method for effective autism detection. This approach not only enhances accuracy but also contributes valuable insights into optimising deep learning models for gesture-based recognition. By integrating image classification and skeleton recognition, we can better capture both global and detailed motion patterns, which are crucial for improving early ASD diagnosis and intervention strategies.
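
The pairwise Euclidean distance loss used to align the two streams can be sketched as follows; the feature dimensions and batch size are assumptions, and the snippet shows only the alignment term, not the full dual-stream training objective.

```python
# Minimal sketch of a pairwise Euclidean distance loss aligning the two
# streams' embeddings; dimensions and inputs are placeholder assumptions.
import torch
import torch.nn.functional as F

def pairwise_distance_loss(feat_a, feat_b):
    # Mean Euclidean distance between corresponding feature vectors from
    # the image-classification stream and the skeleton stream.
    return F.pairwise_distance(feat_a, feat_b, p=2).mean()

feat_img = torch.randn(8, 256)   # stand-in for ConvNeXt-Base embeddings
feat_skel = torch.randn(8, 256)  # stand-in for MSG3D embeddings
print(pairwise_distance_loss(feat_img, feat_skel))
```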
