Deep/Machine Learning in Visual Recognition and Anomaly Detection

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Electronic Multimedia".

Deadline for manuscript submissions: 15 August 2026 | Viewed by 16573

Special Issue Editors


Guest Editor
School of Telecommunications Engineering, Xidian University, Xi’an 710071, China
Interests: computer vision; pattern recognition; machine learning

Guest Editor
School of Computer Science, Shaanxi Normal University, Xi’an 710062, China
Interests: computer vision; artificial intelligence

Special Issue Information

Dear Colleagues,

The success of machine learning and deep neural networks has facilitated advances in understanding the high-level semantics of visual content. Conventional learning-based visual semantic recognition approaches rely heavily on large-scale training data with dense annotations and consistently fail to estimate accurate semantic labels for unseen categories. The emergence and rapid progress of few-/zero-shot learning make it possible to learn unseen categories from a few labeled samples, or even none, which brings these methods closer to practical applications. This Special Issue aims to demonstrate (1) how machine learning algorithms have contributed, and are contributing, to new theories, models, and datasets for few-/zero-shot learning and (2) how few-/zero-shot learning can facilitate other tasks such as visual recognition and anomaly detection. The editors hope to collect research results that report recent developments in these topics. In addition, researchers can exchange innovative ideas on few-/zero-shot learning in visual recognition and anomaly detection by submitting manuscripts to this Special Issue.

In this Special Issue, original research articles and reviews are welcome. Research areas may include (but are not limited to) the following:

(1) Theoretical advances and algorithm developments in few-/zero-shot learning;

(2) Useful applications of few-/zero-shot learning in visual recognition and anomaly detection;

(3) New datasets and benchmarks for few-/zero-shot learning in visual recognition and anomaly detection.

We look forward to receiving your contributions.

Dr. Yang Liu
Dr. Jin Li
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • few-/zero-shot learning
  • visual recognition
  • anomaly detection

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.


Published Papers (7 papers)


Research

39 pages, 1521 KB  
Article
Illumination-Decoupled Transformer Learning for Shadow-Robust Crop Disease Diagnosis Under Structured Cast Shadows
by Zuoming Yin, Yifei Zhang, Qiangqiang Lei and Fang Feng
Electronics 2026, 15(10), 2165; https://doi.org/10.3390/electronics15102165 - 18 May 2026
Viewed by 118
Abstract
Crop disease diagnosis can be degraded by structured cast shadows, including panel-like strip shadows that motivate applications in agrivoltaic-style farming. This paper presents ShadowFormer-AV, a transformer-based framework that adapts general shadow-robust visual learning to crop disease classification by separating disease evidence from illumination interference. The proposed approach combines a soft shadow-prior extractor, an illumination-decoupled dual-stream token encoder, lesion-preserving adaptive attention, and a cross-view consistency objective between original and shadow-perturbed images. The method uses only standard RGB inputs and does not require shadow-free reference images, multispectral sensing, or pixel-level shadow annotation. We evaluated the framework on publicly available plant disease datasets using calibrated panel-like synthetic shadows and a naturally shadowed PlantDoc subset. Because no on-site agrivoltaic disease dataset was used, the conclusions were limited to shadow robustness under these simulated and naturally shadowed test conditions rather than verified performance under real photovoltaic-panel shadows. Within this validation boundary, ShadowFormer-AV improved accuracy, Macro-F1, and calibration over representative convolutional and transformer baselines, suggesting that illumination-aware token learning is useful for crop disease recognition under structured shadow interference.
(This article belongs to the Special Issue Deep/Machine Learning in Visual Recognition and Anomaly Detection)
21 pages, 10362 KB  
Article
U-Net-Based Model Design for Semantic Segmentation of Class-Imbalanced Semi-Synthetic Roads
by Artur Morys-Magiera, Marek Długosz and Paweł Skruch
Electronics 2026, 15(10), 2008; https://doi.org/10.3390/electronics15102008 - 9 May 2026
Viewed by 224
Abstract
Accurate semantic segmentation of roads and overlaid markings is essential for multi-camera multi-robot visual localization systems, yet lane markings occupy a tiny fraction of the image area, making them difficult to segment reliably. This paper presents a U-Net design study for class-imbalanced semantic segmentation of a dominant class and two similar minority classes that occur on top of the dominant class. We analyze the problem of designing a multi-head U-Net for segmenting semi-synthetic Duckietown model road map images into roads, stop-line markings, and lane-line markings. The multi-head design decomposes the task into a binary road segmentation head and a ternary marking segmentation head, connected through a road-aware loss that restricts marking supervision to predicted road regions. Our work assesses nine loss functions for addressing the class imbalance problem in the marking head, including cross-entropy, focal loss, Tversky loss, Lovász-softmax, and a subset of their combinations. These configurations are systematically evaluated on a dataset of semi-synthetic map images generated using an evolutionary algorithm described in the authors' previous work, where road marking classes are a minority. The Tversky–Lovász combination achieves the highest per-class IoU across all segmentation targets and is statistically significantly better than the other configurations. The results demonstrate that the Tversky loss combined with a direct IoU surrogate, Lovász-softmax, is particularly effective for small-object segmentation under severe class imbalance.
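
For readers unfamiliar with the Tversky loss named in this abstract, the following is a minimal sketch of its standard single-class form, 1 - TP / (TP + alpha*FP + beta*FN), written in plain Python. It is not the authors' implementation: the function name, the default weights alpha = 0.3 and beta = 0.7, and the four-pixel example are illustrative assumptions, and the Lovász-softmax term is not shown.

```python
from typing import Sequence


def tversky_loss(
    probs: Sequence[float],
    targets: Sequence[int],
    alpha: float = 0.3,  # assumed weight on false positives
    beta: float = 0.7,   # assumed weight on false negatives
    eps: float = 1e-7,
) -> float:
    """Return 1 - Tversky index for one class over flattened pixels."""
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    # Soft counts: predictions are probabilities, targets are 0 or 1.
    tp = sum(p * g for p, g in zip(probs, targets))
    fp = sum(p * (1 - g) for p, g in zip(probs, targets))
    fn = sum((1 - p) * g for p, g in zip(probs, targets))
    index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - index


# Hypothetical 4-pixel example: two marking pixels, two background pixels.
targets = [1, 1, 0, 0]
missed_pixel = tversky_loss([1.0, 0.0, 0.0, 0.0], targets)  # one false negative
extra_pixel = tversky_loss([1.0, 1.0, 1.0, 0.0], targets)   # one false positive
print(missed_pixel > extra_pixel)
```

With beta larger than alpha, a missed minority-class pixel costs more than a spurious one, which is the usual reason this loss is chosen for small, rare classes. Setting alpha = beta = 0.5 reduces it to the Dice loss.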

22 pages, 481 KB  
Article
PrivAgriVolt: Privacy-Preserving Shadow-Aware Vision for Crop Stress Diagnosis in Agrivoltaic Photovoltaic Systems
by Zuoming Yin, Yifei Zhang, Qiangqiang Lei and Fang Feng
Electronics 2026, 15(8), 1762; https://doi.org/10.3390/electronics15081762 - 21 Apr 2026
Viewed by 275
Abstract
Agrivoltaic systems co-locate photovoltaic (PV) arrays and crops, offering land-use efficiency and potential microclimate benefits, yet they introduce new challenges for computer-vision-based crop monitoring. PV structures produce strong, spatially varying shadows, specular reflections, and periodic occlusions that confound visual cues for diagnosing crop diseases and abiotic stresses. Meanwhile, agrivoltaic deployments are often distributed across farms and operators, making centralized data collection impractical due to privacy, ownership, and regulatory concerns. This paper proposes PrivAgriVolt, a novel privacy-preserving learning framework for agrivoltaic crop issue recognition that explicitly models PV-induced illumination and enables collaborative training without sharing raw images. The core algorithm integrates (i) a PV-geometry-conditioned shadow normalization module that fuses estimated array layout and sun-angle priors into a shadow-aware appearance canonization network, reducing illumination-induced domain shift across times and sites; (ii) a federated contrastive stress learner that aligns stress semantics across farms via prototype-based contrastive objectives while remaining robust to heterogeneous sensors and crop stages; and (iii) an adaptive privacy layer that combines secure aggregation with budget-aware gradient perturbation and client-level clipping to provide formal privacy guarantees while preserving fine-grained diagnostic performance. Extensive experiments on real agricultural vision benchmarks and agrivoltaic shadow variants demonstrate that PrivAgriVolt improves stress recognition and segmentation under PV shading while maintaining strong privacy–utility trade-offs.
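
The combination of client-level clipping and gradient perturbation mentioned in item (iii) can be illustrated with a generic sketch in the style of differentially private federated averaging. This is an assumption-laden toy, not PrivAgriVolt's adaptive privacy layer: the function names, the fixed clipping norm, and the Gaussian noise scale are hypothetical, and secure aggregation and privacy-budget accounting are left out entirely.

```python
import math
import random
from typing import List, Sequence


def clip_update(update: Sequence[float], max_norm: float) -> List[float]:
    """Scale a client update so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = 1.0 if norm <= max_norm else max_norm / norm
    return [v * scale for v in update]


def noisy_average(
    updates: Sequence[Sequence[float]],
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """Average clipped client updates after adding Gaussian noise to their sum."""
    clipped = [clip_update(u, max_norm) for u in updates]
    dim = len(clipped[0])
    # Noise standard deviation is tied to the clipping norm (assumed convention).
    sigma = noise_multiplier * max_norm
    totals = [
        sum(u[i] for u in clipped) + rng.gauss(0.0, sigma)
        for i in range(dim)
    ]
    return [t / len(clipped) for t in totals]


# Hypothetical two-client round with a two-parameter model.
client_updates = [[3.0, 4.0], [0.3, 0.4]]
print(noisy_average(client_updates, max_norm=1.0, noise_multiplier=0.5,
                    rng=random.Random(0)))
```

Clipping bounds how much any single client can move the aggregate, and the noise is scaled to that bound; a real system would also track the cumulative privacy budget across rounds.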

27 pages, 2835 KB  
Article
Textile Defect Detection Using Artificial Intelligence and Computer Vision—A Preliminary Deep Learning Approach
by Rúben Machado, Luis A. M. Barros, Vasco Vieira, Flávio Dias da Silva, Hugo Costa and Vitor Carvalho
Electronics 2025, 14(18), 3692; https://doi.org/10.3390/electronics14183692 - 18 Sep 2025
Cited by 9 | Viewed by 9393
Abstract
Fabric defect detection is essential for quality assurance in textile manufacturing, where manual inspection is inefficient and error-prone. This paper presents a real-time deep learning-based system leveraging YOLOv11 for detecting defects such as holes, color bleeding and creases on solid-colored, patternless cotton and linen fabrics using edge computing. The system runs on an NVIDIA Jetson Orin Nano platform and supports real-time inference, Message Queuing Telemetry Transport (MQTT)-based defect reporting, and optional Real-Time Messaging Protocol (RTMP) video streaming or local recording storage. Each detected defect is logged with class, confidence score, location and unique ID in a Comma-Separated Values (CSV) file for further analysis. The proposed solution operates with two RealSense cameras placed approximately 1 m from the fabric under controlled lighting conditions, tested in a real industrial setting. The system achieves a mean Average Precision (mAP@0.5) exceeding 82% across multiple synchronized video sources while maintaining low latency and consistent performance. The architecture is designed to be modular and scalable, supporting plug-and-play deployment in industrial environments. Its flexibility in integrating different camera sources, deep learning models, and output configurations makes it a robust platform for further enhancements, such as adaptive learning mechanisms, real-time alerts, or integration with Manufacturing Execution System/Enterprise Resource Planning (MES/ERP) pipelines. This approach advances automated textile inspection and reduces dependency on manual processes.
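
The per-defect CSV logging described in this abstract (class, confidence score, location, and unique ID) might look roughly like the sketch below, which uses only Python's standard `csv` and `uuid` modules. The column names, the bounding-box layout, and the helper functions are hypothetical; the paper's actual file schema is not given in the abstract.

```python
import csv
import io
import uuid
from typing import TextIO, Tuple

# Assumed column layout; the real system's schema may differ.
FIELDS = ["defect_id", "class", "confidence", "x", "y", "width", "height"]


def open_defect_log(stream: TextIO) -> csv.DictWriter:
    """Create a CSV writer on the stream and write the header row."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    return writer


def log_defect(
    writer: csv.DictWriter,
    defect_class: str,
    confidence: float,
    box: Tuple[int, int, int, int],
) -> str:
    """Append one detection and return the unique ID assigned to it."""
    defect_id = uuid.uuid4().hex
    x, y, width, height = box
    writer.writerow({
        "defect_id": defect_id,
        "class": defect_class,
        "confidence": f"{confidence:.2f}",
        "x": x, "y": y, "width": width, "height": height,
    })
    return defect_id


# Hypothetical detection written to an in-memory buffer instead of a file.
buffer = io.StringIO()
writer = open_defect_log(buffer)
first_id = log_defect(writer, "hole", 0.91, (120, 48, 32, 30))
print(buffer.getvalue())
```

Writing to any text stream keeps the logger independent of where the file lives; on a device, `buffer` would be replaced by a file opened with `newline=""`, as the `csv` module recommends.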

14 pages, 4005 KB  
Article
A Directional Enhanced Adaptive Detection Framework for Small Targets
by Chao Li, Yifan Chang, Shimeng Yang, Kaiju Li and Guangqiang Yin
Electronics 2024, 13(22), 4535; https://doi.org/10.3390/electronics13224535 - 19 Nov 2024
Cited by 1 | Viewed by 1019
Abstract
Due to the challenges posed by limited size and features, positional and noise issues, and dataset imbalance and simplicity, small object detection is one of the most challenging tasks in the field of object detection. Consequently, an increasing number of researchers are focusing on this area. In this paper, we propose a Directional Enhanced Adaptive (DEA) detection framework for small targets. This framework effectively combines the detection accuracy advantages of two-stage methods with the detection speed advantages of one-stage methods. Additionally, we introduce a Multi-Scale Object Adaptive Slicing (MASA) module and an improved IoU-based aggregation module that integrate with this framework to enhance detection performance. For better comparison, we use the F1 score as one of the evaluation metrics. The experimental results demonstrate that our DEA framework improves the performance of various backbone detection networks and achieves better comprehensive detection performance than other proposed methods, even though our network has not been trained on the test dataset while others have.
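
Two quantities in this abstract, box IoU and the F1 score, have standard definitions that a short sketch makes concrete. The code below shows only those textbook formulas; it does not reproduce the paper's improved IoU-based aggregation module, and the box format (x_min, y_min, x_max, y_max) and function names are assumptions.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, from detection counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical values: two overlapping 2x2 boxes and a small detection tally.
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(f1_score(tp=8, fp=2, fn=2))
```

In detection benchmarks a prediction usually counts as a true positive when its IoU with a ground-truth box exceeds a threshold such as 0.5, and the resulting counts feed the F1 score.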

16 pages, 531 KB  
Article
A Robust Generalized Zero-Shot Learning Method with Attribute Prototype and Discriminative Attention Mechanism
by Xiaodong Liu, Weixing Luo, Jiale Du, Xinshuo Wang, Yuhao Dang and Yang Liu
Electronics 2024, 13(18), 3751; https://doi.org/10.3390/electronics13183751 - 21 Sep 2024
Viewed by 2447
Abstract
In the field of Generalized Zero-Shot Learning (GZSL), the challenge lies in learning attribute-based information from seen classes and effectively conveying this knowledge to recognize both seen and unseen categories during the training process. This paper proposes an innovative approach to enhance the generalization ability and efficiency of GZSL models by integrating a Convolutional Block Attention Module (CBAM). The CBAM blends channel-wise and spatial-wise information to emphasize key features, thereby improving the model’s discriminative and localization capabilities. Additionally, the method employs a ResNet101 backbone for systematic image feature extraction, enhanced contrastive learning, and a similarity map generator with attribute prototypes. This comprehensive framework aims to achieve robust visual–semantic embedding for classification tasks. The proposed method demonstrates significant improvements in performance metrics on benchmark datasets, showcasing its potential in advancing GZSL applications.

18 pages, 1072 KB  
Article
Leveraging Self-Distillation and Disentanglement Network to Enhance Visual–Semantic Feature Consistency in Generalized Zero-Shot Learning
by Xiaoming Liu, Chen Wang, Guan Yang, Chunhua Wang, Yang Long, Jie Liu and Zhiyuan Zhang
Electronics 2024, 13(10), 1977; https://doi.org/10.3390/electronics13101977 - 18 May 2024
Viewed by 1803
Abstract
Generalized zero-shot learning (GZSL) aims to simultaneously recognize both seen classes and unseen classes by training only on seen class samples and auxiliary semantic descriptions. Recent state-of-the-art methods infer unseen classes based on semantic information or synthesize unseen classes using generative models based on semantic information, all of which rely on the correct alignment of visual–semantic features. However, they often overlook the inconsistency between original visual features and semantic attributes. Additionally, due to the existence of cross-modal dataset biases, the visual features extracted and synthesized by the model may also mismatch with some semantic features, which could hinder the model from properly aligning visual–semantic features. To address this issue, this paper proposes a GZSL framework that enhances the consistency of visual–semantic features using a self-distillation and disentanglement network (SDDN). The aim is to utilize the self-distillation and disentanglement network to obtain semantically consistent refined visual features and non-redundant semantic features to enhance the consistency of visual–semantic features. Firstly, SDDN utilizes self-distillation technology to refine the extracted and synthesized visual features of the model. Subsequently, the visual–semantic features are disentangled and aligned using a disentanglement network to enhance the consistency of the visual–semantic features. Finally, the consistent visual–semantic features are fused to jointly train a GZSL classifier. Extensive experiments demonstrate that the proposed method achieves more competitive results on four challenging benchmark datasets (AWA2, CUB, FLO, and SUN).