Deep Learning in Image Processing and Pattern Recognition, 2nd Edition

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 30 November 2025 | Viewed by 16166

Special Issue Editors

Prof. Dr. Yuji Iwahori

Guest Editor

Department of Computer Science, Chubu University, 1200 Matsumoto-cho, Kasugai 487-8501, Aichi, Japan
Interests: computer vision; neural networks; machine learning; medical image analysis

Prof. Dr. Aili Wang

Guest Editor

Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China
Interests: remote sensing image processing; deep learning

Prof. Dr. Haibin Wu

Guest Editor

Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China
Interests: machine vision; visual detection and image processing; medical virtual reality

Dr. Xiaoming Sun

Guest Editor

Special Issue Information

Dear Colleagues,

People rely primarily on images to acquire and exchange information, so image processing touches nearly every aspect of human life and work. Image processing technology already plays an important role in aerospace, public security, biomedicine, industrial engineering, and business communication. In recent years, image processing based on deep learning has developed rapidly and has become one of the most successfully applied intelligent technologies. Pattern recognition is an important research field within image processing and includes image preprocessing, feature extraction and selection, classifier design, and classification decisions.

In this context, for this Special Issue on “Deep Learning in Image Processing and Pattern Recognition”, we invite original research and comprehensive reviews on topics that include, but are not limited to, the following:

Advances in image preprocessing;
Advances in feature selection in images;
Advances in pattern recognition in image processing technology;
Image processing in intelligent transportation;
Hyperspectral image processing;
Biomedical image processing;
Image processing in intelligent monitoring;
Deep learning for image processing;
AI-based image processing, understanding, recognition, compression, and reconstruction.

Prof. Dr. Yuji Iwahori
Dr. Aili Wang
Prof. Dr. Haibin Wu
Dr. Xiaoming Sun
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

deep learning
image processing
pattern recognition

Benefits of Publishing in a Special Issue

Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Related Special Issue

Deep Learning in Image Processing and Pattern Recognition in Electronics (36 articles)

Published Papers (10 papers)

Research

22 pages, 18934 KB

Open Access Article

A Graph-Aware Color Correction and Texture Restoration Framework for Underwater Image Enhancement

by Jin Qian, Bin Zhang, Hui Li and Xiaoshuang Xing

Electronics 2025, 14(20), 4079; https://doi.org/10.3390/electronics14204079 - 17 Oct 2025

Viewed by 44

Abstract

Underwater imagery exhibits markedly more severe visual degradation than its terrestrial counterpart, manifesting as pronounced color aberration, diminished contrast and luminosity, and spatially non-uniform haze. To surmount these challenges, we propose a graph-aware framework for underwater image enhancement (GA-UIE), a unified framework that integrates specialized modules for color correction and texture restoration and explicitly utilizes the intrinsic graph information of underwater images to achieve high-fidelity color restoration and texture enhancement. The proposed algorithm is architected in three synergistic stages: (1) graph feature generation, which distills color and texture graph feature priors from the underwater image; (2) graph-aware enhancement, performing joint color restoration and texture sharpening under explicit graph priors; and (3) graph-aware fusion, harmoniously aggregating the graph-aware color and texture joint representations to yield the final visually coherent output. Comprehensive quantitative evaluations show that the output of our framework achieves strong scores across a broad spectrum of metrics, including PSNR, SSIM, LPIPS, UCIQE, and UIQM, on the UIEB and U45 datasets. These results exceed those of all existing benchmark techniques, thereby validating the method’s efficacy in the enhancement of underwater imagery.
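Of the metrics named in this abstract, PSNR is the simplest to state exactly. The following is a minimal sketch of the standard definition, not the authors' evaluation code; the function name `psnr`, the flat-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(reference, enhanced, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images.

    Both images are given as flat sequences of pixel values of equal length
    (an assumption of this sketch; real pipelines usually work on arrays).
    """
    if len(reference) != len(enhanced) or len(reference) == 0:
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, enhanced)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate that the enhanced image is closer to the reference. PSNR needs a reference image, unlike UCIQE and UIQM, which are no-reference underwater quality measures.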

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition, 2nd Edition)

23 pages, 10648 KB

Open Access Article

Meta-Learning-Integrated Neural Architecture Search for Few-Shot Hyperspectral Image Classification

by Aili Wang, Kang Zhang, Haibin Wu, Haisong Chen and Minhui Wang

Electronics 2025, 14(15), 2952; https://doi.org/10.3390/electronics14152952 - 24 Jul 2025

Viewed by 541

Abstract

To address the scarcity of labeled samples in practical classification scenarios, and the overfitting and insufficient generalization ability that arise in Few-Shot Learning (FSL) for hyperspectral image classification (HSIC), this paper designs and implements a few-shot HSIC method that combines neural architecture search (NAS) with meta learning. Firstly, a multi-source domain learning framework is constructed to integrate heterogeneous natural images and homogeneous remote sensing images, broadening the information available for few-sample learning and enabling the final network to enhance its generalization ability under limited labeled samples by learning the similarity between different data sources. Secondly, by constructing precise and robust search spaces and deploying different units at different locations, the classification accuracy and model transfer robustness of the final network are improved. The method fully utilizes the spatial texture information and rich category information of multi-source data and transfers the learned meta knowledge to the optimal architecture for HSIC through precise and robust search space design, accomplishing HSIC tasks with limited samples. Experimental results show that the proposed method achieves an overall accuracy (OA) of 98.57%, 78.39%, and 98.74% on the Pavia Center, Indian Pines, and WHU-Hi-LongKou datasets, respectively. These results demonstrate that exploiting the spatial texture and category information of multi-source data, together with a precise and robust search space design, allows the learned meta knowledge to be transferred effectively to the optimal architecture for few-shot HSIC.

16 pages, 2365 KB

Open Access Article

Fast Inference End-to-End Speech Synthesis with Style Diffusion

by Hui Sun, Jiye Song and Yi Jiang

Electronics 2025, 14(14), 2829; https://doi.org/10.3390/electronics14142829 - 15 Jul 2025

Viewed by 1753

Abstract

In recent years, deep learning-based end-to-end Text-To-Speech (TTS) models have made significant progress in enhancing speech naturalness and fluency. However, existing Variational Inference Text-to-Speech (VITS) models still face challenges such as insufficient pitch modeling, inadequate contextual dependency capture, and low inference efficiency in the decoder. To address these issues, this paper proposes an improved TTS framework named Q-VITS. Q-VITS incorporates Rotary Position Embedding (RoPE) into the text encoder to enhance long-sequence modeling, adopts a frame-level prior modeling strategy to optimize one-to-many mappings, and designs a style extractor based on a diffusion model for controllable style rendering. Additionally, the proposed decoder, ConfoGAN, integrates explicit F0 modeling, Pseudo-Quadrature Mirror Filter (PQMF) multi-band synthesis, and a Conformer structure. The experimental results demonstrate that Q-VITS outperforms VITS in terms of speech quality, pitch accuracy, and inference efficiency in both subjective Mean Opinion Score (MOS) and objective Mel-Cepstral Distortion (MCD) and Root Mean Square Error (RMSE) evaluations on a single-speaker dataset, achieving performance close to ground-truth audio. These improvements provide an effective solution for efficient and controllable speech synthesis.
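The abstract does not say how Q-VITS wires RoPE into its text encoder, so the sketch below shows only the generic rotary formulation: each consecutive pair of vector components is rotated by an angle proportional to the token position. The function name `rope`, the interleaved pairing, and the base of 10000 are assumptions for illustration and may differ from the paper's implementation.

```python
import math


def rope(vec, position, base=10000.0):
    """Apply a rotary position embedding to one query or key vector.

    Components (vec[i], vec[i + 1]) for even i are rotated by the angle
    position * base ** (-i / d), where d is the (even) vector length.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        angle = position * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        # Standard 2-D rotation of the pair (x, y).
        out.extend([x * c - y * s, x * s + y * c])
    return out
```

Because each pair is rotated, the dot product between a rotated query at position m and a rotated key at position n depends only on the offset m - n, which is the property that makes the scheme attractive for long sequences.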

24 pages, 12224 KB

Open Access Article

Roadside Perception Applications Based on DCAM Fusion and Lightweight Millimeter-Wave Radar–Vision Integration

by Xiaoyu Yu, Tao Hu and Haozhen Zhu

Electronics 2025, 14(8), 1576; https://doi.org/10.3390/electronics14081576 - 13 Apr 2025

Cited by 2 | Viewed by 898

Abstract

With the advancement of intelligent transportation systems, single-sensor perception solutions face inherent limitations. To address the constraints of monocular vision detection, this study presents a vehicle road detection system that integrates millimeter-wave radar and visual information. By generating mask maps from millimeter-wave radar point clouds, radar data transition from a global assistance role to localized guidance, identifying vehicle target positions within RGB images. These mask maps, along with RGB images, are processed by a Dual Cross-Attention Module (DCAM), where the fused features are fed into an enhanced YOLOv5 network, improving target localization accuracy. The proposed dual-input DCAM enables dynamic feature fusion, allowing the model to adjust its reliance on visual and radar data according to environmental conditions. To optimize the network architecture, ShuffleNetv2 replaces the YOLOv5 Backbone, while the Ghost Module is incorporated into the Neck, creating a lightweight design. Pruning techniques are applied to reduce model complexity, making it suitable for embedded applications and real-time detection scenarios. The experimental results demonstrate that this fusion scheme effectively improves vehicle detection accuracy and robustness compared to YOLOv5, with accuracy increasing from 59.4% to 67.2%. The number of parameters is reduced from 7.05 M to 2.52 M, providing a precise and reliable solution for intelligent transportation and roadside perception.

20 pages, 1969 KB

Open Access Article

SlantNet: A Lightweight Neural Network for Thermal Fault Classification in Solar PV Systems

by Hrach Ayunts, Sos Agaian and Artyom Grigoryan

Electronics 2025, 14(7), 1388; https://doi.org/10.3390/electronics14071388 - 30 Mar 2025

Cited by 3 | Viewed by 959

Abstract

The rapid growth of solar photovoltaic (PV) installations worldwide has increased the need for the effective monitoring and maintenance of these vital renewable energy assets. PV systems are crucial in reducing greenhouse gas emissions and diversifying electricity generation. However, they often experience faults and damage during manufacturing or operation, significantly impacting their performance. While thermal infrared imaging provides a promising non-invasive method for detecting common defects such as hotspots, cracks, and bypass diode failures, current deep learning approaches for fault classification generally rely on computationally intensive architectures or closed-source solutions, constraining their practical use in real-time situations involving low-resolution thermal data. To tackle these challenges, we introduce SlantNet, a lightweight neural network crafted to classify thermal PV defects efficiently and accurately. At its core, SlantNet incorporates an innovative Slant Convolution (SC) layer that utilizes slant transformation to enhance directional feature extraction and capture subtle thermal gradient variations essential for fault detection. We complement this architectural advancement with a thermal-specific image enhancement augmentation strategy that employs adaptive contrast adjustments to bolster model robustness under the noisy and class-imbalanced conditions typically encountered in field applications. Extensive experimental validation on a comprehensive solar panel defect detection benchmark dataset showcases SlantNet’s exceptional performance. Our method achieves a 95.1% classification accuracy while reducing computational overhead by approximately 60% compared to leading models.

19 pages, 685 KB

Open Access Article

Orientation Detection in Color Images Using a Bio-Inspired Artificial Visual System

by Tianqi Chen, Zeyu Zhang, Yuki Todo, Zheng Tang and Huiran Zhang

Electronics 2025, 14(2), 239; https://doi.org/10.3390/electronics14020239 - 8 Jan 2025

Viewed by 1089

Abstract

In this study, we propose a biologically inspired artificial visual system (AVS) for efficient orientation detection. The AVS begins by processing multi-channel red, green and blue (RGB) inputs using cone cells, which is followed by the preprocessing of visual signals through on–off response mechanisms in bipolar and horizontal cells. Local dendritic neurons detect orientation and generate feature maps, which are then integrated in a lateral geniculate nucleus (LGN)-like process to capture global features. Inspired by the Koch, Poggio, and Torre framework, the dendritic model employs nonlinear multiplicative operations for feature selection, while backpropagation optimizes parameters for accurate motion direction analysis. Compared to traditional convolutional neural networks (CNNs), and especially to complex models like ResNet and EfficientNet, our system reduces learning time and computational cost, cutting training duration and RAM usage by over 50%. Evaluations on various noise conditions and real-world datasets demonstrate the AVS’s robustness, high accuracy, and efficiency, even when trained with limited data. The biologically plausible design, coupled with the system’s ability to process RGB images, makes the AVS a promising solution for industrial and medical applications, such as defect detection and medical image analysis.

22 pages, 8517 KB

Open Access Article

Insulator Defect Detection Based on YOLOv5s-KE

by Guozhi Fang, Xin An, Qi Fang and Shengpan Gao

Electronics 2024, 13(17), 3483; https://doi.org/10.3390/electronics13173483 - 2 Sep 2024

Cited by 4 | Viewed by 1409

Abstract

To tackle the issue of low detection accuracy in insulator images caused by intricate backgrounds and small defect sizes, as well as the requirement for real-time detection on embedded and mobile devices, this research introduces the YOLOv5s-KE model. Integrating multiple strategies, YOLOv5s-KE aims to boost detection accuracy significantly. Initially, an enhanced anchor generation method utilizing the K-means++ algorithm is proposed to generate more appropriate anchor boxes for insulator defects. Moreover, an attention mechanism is integrated into both the backbone and neck networks to enhance the model’s capacity to focus on defect features and resist interference. To improve the detection of small defects, the EIoU loss function is implemented in place of the original CIoU loss function. In order to meet the real-time detection needs on embedded and mobile devices, the model is further refined through the integration of Ghost convolution for lightweight feature extraction and a linear transformation to reduce the computational burden of standard convolution. A channel pruning strategy is deployed to optimize the sparsely trained network, diminishing redundancy and improving model generalization. Additionally, the CARAFE operator replaces the original upsampling operator to minimize model parameters and elevate detection speed. Experimental outcomes demonstrate that YOLOv5s-KE achieves a detection accuracy of 92.3% on the Chinese transmission line insulator dataset, marking a 5.2% enhancement over the original YOLOv5s. The streamlined version of YOLOv5s-KE achieves a detection speed of 94.3 frames per second, an improvement of 30.1 frames per second over the original model. Its model parameters are condensed to 9.6 M, with a detection accuracy of 91.1%. This study underscores the precision and efficiency of the proposed approach, suggesting that the advanced strategies explored introduce novel possibilities for insulator defect detection.
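For readers unfamiliar with the EIoU loss mentioned in this abstract, here is a minimal sketch of its commonly published form for a single pair of axis-aligned boxes. It is not taken from the YOLOv5s-KE code; the function name `eiou_loss`, the `(x1, y1, x2, y2)` box layout, and the small `eps` stabilizer are assumptions for illustration.

```python
def eiou_loss(pred, target, eps=1e-9):
    """EIoU loss between two boxes given as (x1, y1, x2, y2).

    loss = 1 - IoU
           + center_distance^2 / enclosing_diagonal^2
           + width_difference^2 / enclosing_width^2
           + height_difference^2 / enclosing_height^2
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Smallest box enclosing both inputs.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between the two box centers.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)
    return 1.0 - iou + center_term + width_term + height_term
```

Unlike CIoU, which penalizes a mismatch in aspect ratio, this form penalizes width and height differences separately.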

16 pages, 13049 KB

Open Access Article

Image Databases with Features Augmented with Singular-Point Shapes to Enhance Machine Learning

by Nikolay Metodiev Sirakov and Adam Bowden

Electronics 2024, 13(16), 3150; https://doi.org/10.3390/electronics13163150 - 9 Aug 2024

Viewed by 1791

Abstract

The main objective of this paper is to present a repository of image databases whose features are augmented with embedded vector field (VF) features. The repository is designed to provide the user with image databases that enhance machine learning (ML) classification. Also, six VFs are provided, and users can embed them into their own image databases with the help of software named ELPAC. Three of the VFs generate real-shaped singular points (SPs): springing, sinking, and saddle. The other three VFs generate seven kinds of SPs, which include the real-shaped SPs and four complex-shaped SPs: repelling and attracting (out and in) spirals and clockwise and counterclockwise orbits (centers). Using the repository, this work defines the locations of the SPs according to the image objects and the mappings between the SPs’ shapes if separate VFs are embedded into the same image. Next, this paper produces recommendations for the user on how to select the most appropriate VF to be embedded in an image database so that the augmented SP shapes enhance ML classification. Examples of images with embedded VFs are shown in the text to illustrate, support, and validate the theoretical conclusions. Thus, the contributions of this paper are the derivation of the SP locations in an image; mappings between the SPs of different VFs; and the definition of an imprint of an image and an image database in a VF. The advantage of classifying an image database with an embedded VF is that the new database enhances and improves the ML classification statistics, which motivates the design of the repository so that it contains image features augmented with VF features.
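The seven SP shapes listed in this abstract correspond to the textbook classification of a singular point of a planar vector field by the trace and determinant of its Jacobian. The sketch below illustrates that general classification only; it is not the ELPAC software, and the function name `classify_singular_point`, the tolerance `tol`, and the mapping of "springing" and "sinking" to source and sink nodes are assumptions.

```python
def classify_singular_point(a, b, c, d, tol=1e-12):
    """Classify the singular point at the origin of the linear field
    (x, y) -> (a*x + b*y, c*x + d*y), i.e. Jacobian [[a, b], [c, d]].
    """
    trace = a + d
    det = a * d - b * c
    if abs(det) <= tol:
        return "degenerate"
    if det < 0:
        return "saddle"  # real eigenvalues of opposite sign
    disc = trace * trace - 4.0 * det
    if disc >= 0:
        # Real eigenvalues of the same sign: "real-shaped" node.
        return "springing" if trace > 0 else "sinking"
    # Complex eigenvalues: "complex-shaped" spiral or center.
    if abs(trace) <= tol:
        # At the point (1, 0) the field is (a, c); c > 0 means
        # the flow turns counterclockwise.
        return "counterclockwise orbit" if c > 0 else "clockwise orbit"
    return "repelling spiral" if trace > 0 else "attracting spiral"
```

For example, the rotation field (-y, x) has Jacobian [[0, -1], [1, 0]] and is classified as a counterclockwise orbit, while the field (x, -y) is a saddle.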

12 pages, 2213 KB

Open Access Article

Transforming Color: A Novel Image Colorization Method

by Hamza Shafiq and Bumshik Lee

Electronics 2024, 13(13), 2511; https://doi.org/10.3390/electronics13132511 - 26 Jun 2024

Cited by 5 | Viewed by 4504

Abstract

This paper introduces a novel method for image colorization that utilizes a color transformer and generative adversarial networks (GANs) to address the challenge of generating visually appealing colorized images. Conventional approaches often struggle with capturing long-range dependencies and producing realistic colorizations. The proposed method integrates a transformer architecture to capture global information and a GAN framework to improve visual quality. In this study, a color encoder that utilizes a random normal distribution to generate color features is applied. These features are then integrated with grayscale image features to enhance the overall representation of the images. Our method demonstrates superior performance compared with existing approaches by combining the capacity of the transformer to capture long-range dependencies with the ability of the GAN to generate realistic colorizations. Experimental results show that the proposed network significantly outperforms other state-of-the-art colorization techniques, highlighting its potential for image colorization. This research opens new possibilities for precise and visually compelling image colorization in domains such as digital restoration and historical image analysis.

20 pages, 22781 KB

Open Access Article

Multi-Scale Residual Spectral–Spatial Attention Combined with Improved Transformer for Hyperspectral Image Classification

by Aili Wang, Kang Zhang, Haibin Wu, Yuji Iwahori and Haisong Chen

Electronics 2024, 13(6), 1061; https://doi.org/10.3390/electronics13061061 - 13 Mar 2024

Cited by 5 | Viewed by 1690

Abstract

To address two problems, namely that different spectral bands and spatial pixels contribute differently to hyperspectral image (HSI) classification and that sparse connectivity limits the ability of convolutional neural networks to capture global dependencies, we propose an HSI classification model that combines multi-scale residual spectral–spatial attention with an improved transformer. First, in order to efficiently highlight discriminative spectral–spatial information, we propose a multi-scale residual spectral–spatial feature extraction module that preserves the multi-scale information in a two-layer cascade structure, and the spectral–spatial features are refined by residual spectral–spatial attention for the feature-learning stage. In addition, to further capture the sequential spectral relationships, we combine the advantages of Cross-Attention and Re-Attention to alleviate computational burden and attention collapse issues, and propose the Cross-Re-Attention mechanism to achieve an improved transformer, which can efficiently alleviate the heavy memory footprint and huge computational burden of the model. The experimental results show that the overall accuracy of the proposed model reaches 98.71%, 99.33%, and 99.72% on the Indian Pines, Kennedy Space Center, and XuZhou datasets, respectively. The proposed method was verified to have high accuracy and effectiveness compared to the state-of-the-art models, which shows that the concept of the hybrid architecture opens a new window for HSI classification.
