Computer Vision and Pattern Recognition Based on Machine Learning

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 15 April 2026

Special Issue Editor


Dr. Tao Wang
Guest Editor
School of Computer Science, Sichuan University, Chengdu, China
Interests: deep learning; pattern recognition; computer vision

Special Issue Information

Dear Colleagues,

Computer vision plays a pivotal role in modern intelligent systems, with widespread applications in medical diagnostics, industrial automation, remote sensing, autonomous driving, and beyond. Tasks such as image recognition, object detection, semantic segmentation, and 3D pose estimation are fundamental to these domains. However, the increasing complexity of vision models and the growing demand for real-time, resource-efficient solutions necessitate advancements in data-efficient and model-efficient learning.

This Special Issue focuses on cutting-edge research in efficient computer vision algorithms, covering both data efficiency and model efficiency. In data-efficient learning, we invite techniques that reduce reliance on large-scale labelled datasets, including few-shot learning, transfer learning, domain adaptation, and self-supervised pre-training for downstream task adaptation. In model-efficient learning, the emphasis lies on lightweight architectures, neural architecture search (NAS), model compression (e.g., pruning, quantization, and distillation), and the efficient design of convolutional neural networks (CNNs), vision transformers (ViTs), and hybrid architectures.
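
Model compression is a good example of where a small amount of code conveys the idea. Below is a minimal knowledge-distillation sketch in PyTorch (our own illustrative choice of framework, loss weighting, and temperature; this Special Issue does not prescribe any particular implementation): a compact student network is trained to match the softened output distribution of a larger teacher alongside the usual hard-label loss.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend a soft-target KL term (temperature T) with hard-label cross-entropy.

    T and alpha are illustrative hyperparameters, not values from any
    paper in this Special Issue.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),   # student log-probabilities
        F.softmax(teacher_logits / T, dim=1),       # softened teacher targets
        reduction="batchmean",
    ) * (T * T)                                     # rescale gradient to CE magnitude
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

In a typical loop, the teacher's logits are computed under torch.no_grad() and only the student's parameters are updated, yielding a smaller model that retains much of the teacher's accuracy.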

This Special Issue welcomes original research and review articles addressing efficiency challenges in computer vision. Potential application areas include, but are not limited to, the following:

• Medical imaging (e.g., low-data lesion detection);

• Industrial inspection (e.g., defect recognition with limited samples);

• Autonomous systems (e.g., real-time object tracking);

• Remote sensing (e.g., efficient land-cover segmentation). 

In summary, we welcome studies on the following topics:

1. Data-Efficient Learning

   - Few-shot/zero-shot learning for vision tasks;

   - Transfer learning and domain adaptation;

   - Self-supervised and weakly supervised learning;

   - Active learning and annotation-efficient methods.

2. Model-Efficient Design

   - Efficient CNN and transformer architectures;

   - Neural architecture search (NAS) for efficient models;

   - Model compression (pruning, quantization, knowledge distillation);

   - Dynamic or adaptive inference for computational savings.

3. Efficient Vision Tasks

   - Real-time object detection and segmentation;

   - Efficient depth estimation and 3D reconstruction;

   - Low-latency video analysis (e.g., action recognition);

   - Energy-efficient deployment on edge devices.

4. Applications and Case Studies

   - Efficient vision systems for healthcare, robotics, or agriculture;

   - Benchmarks and datasets for evaluating efficiency;

   - Hardware-aware algorithm design (e.g., for mobile/embedded devices).

Dr. Tao Wang
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • architecture design
  • data-efficient learning
  • image processing

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (3 papers)


Research

26 pages, 18413 KB  
Article
Improving Texture Recognition via Multi-Layer Feature Aggregation from Pre-Trained Vision Architectures
by Nikolay Neshov, Krasimir Tonchev, Ivaylo Bozhilov, Radostina Petkova and Agata Manolova
Electronics 2025, 14(23), 4779; https://doi.org/10.3390/electronics14234779 - 4 Dec 2025
Abstract
Texture recognition is a fundamental task in computer vision, with diverse applications in material sciences, medicine, and agriculture. The ability to analyze complex patterns in images has been greatly enhanced by advancements in Deep Neural Networks and Vision Transformers. To address the challenging nature of texture recognition, this paper investigates the performance of several pre-trained vision architectures for texture recognition, including both CNN- and transformer-based models. For each architecture, multi-level features are extracted from early, intermediate, and final layers, concatenated, and fed into a trainable Multi-Layer Perceptron (MLP) classifier. The architecture is thoroughly evaluated using five publicly available texture datasets, KTH-TIPS2-b, FMD, GTOS-Mobile, DTD, and Soil, with MLP hyperparameters determined through an exhaustive grid search on one of the datasets to ensure optimal performance. Extensive experiments highlight the comparative performance of each architecture and demonstrate that aggregating features from different hierarchical levels improves texture recognition in most cases, outperforming even architectures that require substantially higher computational resources. The study also shows the particular effectiveness of transformer-based models, such as BEiTv2, in achieving state-of-the-art results on four of the five examined datasets.
(This article belongs to the Special Issue Computer Vision and Pattern Recognition Based on Machine Learning)
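
As a hedged illustration of the multi-layer aggregation strategy the abstract describes, the sketch below freezes a torchvision ResNet-50 as a stand-in backbone, pools features from an early, an intermediate, and the final stage, concatenates them, and trains only an MLP head. The stage selection and MLP sizes are our assumptions for illustration; the paper's exact configuration (and its transformer backbones such as BEiTv2) is not reproduced here.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

class MultiLayerTextureClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.backbone = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
        for p in self.backbone.parameters():
            p.requires_grad = False      # frozen backbone; only the MLP head trains
        self.pool = nn.AdaptiveAvgPool2d(1)
        # ResNet-50 stage widths: layer1 -> 256, layer3 -> 1024, layer4 -> 2048
        self.mlp = nn.Sequential(
            nn.Linear(256 + 1024 + 2048, 512),
            nn.ReLU(),
            nn.Linear(512, num_classes),
        )

    def forward(self, x):
        b = self.backbone
        x = b.maxpool(b.relu(b.bn1(b.conv1(x))))
        f_early = b.layer1(x)                 # early, fine-grained texture cues
        f_mid = b.layer3(b.layer2(f_early))   # intermediate features
        f_late = b.layer4(f_mid)              # final, semantic features
        feats = [self.pool(f).flatten(1) for f in (f_early, f_mid, f_late)]
        return self.mlp(torch.cat(feats, dim=1))  # concatenate, then classify
```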

22 pages, 1773 KB  
Article
ACE-Net: A Fine-Grained Deepfake Detection Model with Multimodal Emotional Consistency
by Shaoqian Yu, Xingyu Chen, Yuzhe Sheng, Han Zhang, Xinlong Li and Sijia Yu
Electronics 2025, 14(22), 4420; https://doi.org/10.3390/electronics14224420 - 13 Nov 2025
Abstract
The alarming realism of Deepfake presents a significant challenge to digital authenticity, yet its inherent difficulty in synchronizing the emotional cues between facial expressions and speech offers a critical opportunity for detection. However, most existing approaches rely on general-purpose backbones for unimodal feature extraction, resulting in an inadequate representation of fine-grained dynamic emotional expressions. Although a limited number of studies have explored cross-modal emotional consistency of deepfake detection, they typically employ shallow fusion techniques which limit latent expressiveness. To address this, we propose ACE-Net, a novel framework that identifies forgeries via multimodal emotional inconsistency. For the speech modality, we design a bidirectional cross-attention mechanism to fuse acoustic features from a lightweight CNN-based model with textual features, yielding a representation highly sensitive to fine-grained emotional dynamics. For the visual modality, a MobileNetV3-based perception head is proposed to adaptively select keyframes, yielding a representation focused on the most emotionally salient moments. For multimodal emotional consistency discrimination, we develop a multi-dimensional fusion strategy to deeply integrate high-level emotional features from different modalities within a unified latent space. For unimodal emotion recognition, both the audio and visual branches outperform baseline models on the CREMA-D dataset. Building on this, the complete ACE-Net model achieves a state-of-the-art AUC of 0.921 on the challenging DFDC benchmark.
(This article belongs to the Special Issue Computer Vision and Pattern Recognition Based on Machine Learning)
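
The bidirectional cross-attention fusion mentioned in the abstract can be sketched with two standard multi-head attention blocks, one per direction. The module below is our minimal reading of the idea, with assumed feature dimensions and residual LayerNorm connections; ACE-Net's actual design may differ in depth and detail.

```python
import torch
import torch.nn as nn

class BidirectionalCrossAttention(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.a2t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.t2a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)

    def forward(self, acoustic: torch.Tensor, text: torch.Tensor):
        # acoustic, text: (batch, seq_len, dim) token sequences
        a_att, _ = self.a2t(acoustic, text, text)       # acoustic queries attend to text
        t_att, _ = self.t2a(text, acoustic, acoustic)   # text queries attend to acoustic
        fused_a = self.norm_a(acoustic + a_att)         # residual + norm per direction
        fused_t = self.norm_t(text + t_att)
        return fused_a, fused_t
```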

22 pages, 5609 KB  
Article
Lightweight Algorithm for Steel Surface Defect Detection Based on PPY-YOLO
by Jue Zhao, Yufa Peng, Sheng Zhang and Xiaolong Li
Electronics 2025, 14(17), 3401; https://doi.org/10.3390/electronics14173401 - 26 Aug 2025
Abstract
We propose an improved steel surface defect detection algorithm based on YOLOv8, named PPY-YOLO. First, we improve the neck architecture of YOLOv8. We add upsampling and feature extraction fusion layers in the neck for more thorough multi-scale feature interaction in the model, effectively integrating fine-grained with semantic features. Second, we introduce an improved GAM-B attention mechanism before the SPPF layer. This enhances the model’s ability to focus on key features and suppress non-key features, thus improving the model’s detection accuracy. Third, we introduce the C2f_RVB module, boosting computational efficiency and enhancing its representation ability. Fourth, we redesign the detection head with weight sharing and group convolution, further boosting the model’s computational efficiency and detection accuracy. Experimental results show that on the NEU-DET dataset, the PPY-YOLO model has a 4.8% increase in mAP@0.5 and a 1.7% increase in mAP@0.5:0.95 compared to the baseline. On the GC10-DET dataset, it has a 6.6% increase in mAP@0.5 and a 5.3% increase in mAP@0.5:0.95. While improving the detection accuracy, we reduce the number of parameters by 30.0% and the computational cost by 30.8%. Experimental results prove that the PPY-YOLO model proposed in this paper has higher detection accuracy and computational efficiency. It is more suitable for deployment on resource-constrained mobile detection devices and has good generalization ability.
(This article belongs to the Special Issue Computer Vision and Pattern Recognition Based on Machine Learning)
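
Of the abstract's four changes, the redesigned detection head is the easiest to sketch. The block below shows a head whose convolution stack is shared across pyramid scales and uses group convolution to cut parameters and FLOPs; the channel width, group count, and layer layout are our assumptions, not PPY-YOLO's published configuration.

```python
import torch
import torch.nn as nn

class SharedGroupConvHead(nn.Module):
    """One conv stack reused across all feature-pyramid scales."""

    def __init__(self, in_ch: int = 256, num_classes: int = 6, groups: int = 8):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=groups, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.SiLU(),
        )
        self.cls = nn.Conv2d(in_ch, num_classes, 1)   # per-location class scores
        self.reg = nn.Conv2d(in_ch, 4, 1)             # per-location box offsets

    def forward(self, pyramid):
        # pyramid: list of (B, in_ch, H_i, W_i) maps; the same weights serve every scale
        outs = []
        for feat in pyramid:
            h = self.stem(feat)
            outs.append((self.cls(h), self.reg(h)))
        return outs
```

Sharing one head across scales reduces the parameter count roughly in proportion to the number of pyramid levels, which matches the paper's stated goal of deployment on resource-constrained mobile devices.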
