Image Processing Based on Convolution Neural Network: 2nd Edition

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 15 March 2026

Special Issue Information

Dear Colleagues,

The advent of Convolutional Neural Networks (CNNs) has revolutionized image processing, leading to breakthroughs in fields such as facial recognition, autonomous vehicles, and medical imaging. This success can be attributed to their capacity for processing large-scale image data both efficiently and dependably. While CNN-based image processing techniques have already played a significant role in feature extraction, information fusion, and the processing of static, dynamic, color, and grayscale images, they still hold immense potential for further advancement. Researchers are increasingly applying CNN-based image processing to fields such as medical imaging, biometric identification, entertainment media, and public safety, offering users ever more refined and novel visual capabilities while providing greater convenience.

Nevertheless, key challenges arise when CNNs are applied to image processing. These include the difficulty of handling complex, large-scale data, as well as the models' sensitivity to geometric transformations such as deformation and rotation, which can destabilize predictions (a toy illustration of this sensitivity follows below). In addition, the black-box nature of CNNs obscures their decision-making process, making it difficult to understand and interpret. Moreover, CNNs require vast amounts of annotated data for training, which can be difficult to obtain in fields such as medical image processing, limiting their application in those areas. Finally, just as federated learning has enhanced data security in computing networks, similar security and privacy concerns, and corresponding solutions, apply to CNN-based image processing.
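
To make the sensitivity point concrete, here is a minimal, self-contained PyTorch sketch, not taken from any paper in this issue, in which a small, randomly initialized CNN shifts its class probabilities when its input is rotated by 15 degrees; the architecture and tensor sizes are purely illustrative:

```python
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

torch.manual_seed(0)

cnn = nn.Sequential(                      # toy classifier over 3-channel images
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 10),
)

x = torch.rand(1, 3, 64, 64)              # stand-in for a real image
x_rot = TF.rotate(x, angle=15.0)          # geometric transformation

with torch.no_grad():
    drift = (cnn(x).softmax(-1) - cnn(x_rot).softmax(-1)).abs().max()
print(f"max class-probability drift under 15-degree rotation: {drift:.4f}")
```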

This Special Issue aims to provide a platform for researchers to present innovative and effective image processing technologies based on CNNs. This includes addressing the following specific topics:

  • Advancements in CNN-based image processing techniques;
  • Integration of CNNs with other AI techniques for image processing;
  • CNN architecture optimization for image processing;
  • Mathematical models for CNN-based image processing;
  • Security and privacy in image processing;
  • Resource allocation optimization for CNNs in image processing tasks;
  • Modeling, analysis, and measurement of computational requirements for CNN-based image processing;
  • Interpretable image processing with CNNs.

Prof. Dr. Shaozhang Niu
Dr. Jiwei Zhang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website; once registered, use the online submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • convolutional neural networks
  • deep learning
  • image processing
  • machine learning
  • information security
  • privacy-preserving
  • architecture optimization
  • multimedia

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (7 papers)

Research

12 pages, 6063 KB  
Article
Prex-NetII: Attention-Based Back-Projection Network for Light Field Reconstruction
by Dong-Myung Kim and Jae-Won Suh
Electronics 2025, 14(20), 4117; https://doi.org/10.3390/electronics14204117 - 21 Oct 2025
Abstract
We propose an attention-based back-projection network that enhances light field reconstruction quality by modeling inter-view dependencies. The network uses pixel shuffle to efficiently extract initial features. Spatial attention focuses on important regions while capturing inter-view dependencies. Skip connections in the refinement network improve stability and reconstruction performance. In addition, channel attention within the projection blocks enhances structural representation across views. The proposed method reconstructs high-quality light field images not only in general scenes but also in complex scenes containing occlusions and reflections. The experimental results show that the proposed method outperforms existing approaches.
(This article belongs to the Special Issue Image Processing Based on Convolution Neural Network: 2nd Edition)
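
As a rough illustration of the back-projection-with-channel-attention idea in this abstract, the sketch below implements a DBPN-style up-projection block gated by squeeze-and-excitation channel attention in PyTorch. The module names, kernel sizes, and reduction ratio are assumptions for illustration, not the authors' Prex-NetII implementation:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style gate over feature channels."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(ch, ch // reduction), nn.ReLU(),
            nn.Linear(ch // reduction, ch), nn.Sigmoid(),
        )
    def forward(self, x):
        return x * self.gate(x).view(x.size(0), -1, 1, 1)

class UpProjection(nn.Module):
    """Up-projection: upsample, re-downsample, and correct the residual."""
    def __init__(self, ch, scale=2):
        super().__init__()
        k, s, p = scale * 2, scale, scale // 2
        self.up1 = nn.ConvTranspose2d(ch, ch, k, stride=s, padding=p)
        self.down = nn.Conv2d(ch, ch, k, stride=s, padding=p)
        self.up2 = nn.ConvTranspose2d(ch, ch, k, stride=s, padding=p)
        self.attn = ChannelAttention(ch)
    def forward(self, lr):
        h0 = self.up1(lr)            # first guess at high-resolution features
        l0 = self.down(h0)           # project the guess back to low resolution
        h1 = self.up2(l0 - lr)       # correct using the low-resolution residual
        return self.attn(h0 + h1)    # channel attention on the combined result

feat = torch.rand(1, 32, 16, 16)
print(UpProjection(32)(feat).shape)  # -> torch.Size([1, 32, 32, 32])
```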

20 pages, 803 KB  
Article
The Effective Highlight-Detection Model for Video Clips Using Spatial–Perceptual
by Sungshin Kwak, Jaedong Lee and Sohyun Park
Electronics 2025, 14(18), 3640; https://doi.org/10.3390/electronics14183640 - 15 Sep 2025
Abstract
With the rapid growth of video platforms such as YouTube, Bilibili, and Dailymotion, an enormous amount of video content is being shared worldwide. In this environment, content providers are increasingly adopting methods that restructure videos around highlight scenes and distribute them in short-form formats to encourage more efficient content consumption by viewers. As a result of this trend, the importance of highlight extraction technologies capable of automatically identifying key scenes from large-scale video datasets has been steadily increasing. To address this need, this study proposes SPOT (Spatial Perceptual Optimized TimeSformer), a highlight extraction model. The proposed model enhances spatial perceptual capability by integrating a CNN encoder into the internal structure of the existing Transformer-based TimeSformer, enabling simultaneous learning of both the local and global features of a video. The experiments were conducted using Google’s YT-8M video dataset along with the MR.Hisum dataset, which provides organized highlight information. The SPOT model adopts a regression-based highlight prediction framework. Experimental results on video datasets of varying complexity showed that, in the high-complexity group, the SPOT model achieved a reduction in mean squared error (MSE) of approximately 0.01 (from 0.090 to 0.080) compared to the original TimeSformer. Furthermore, the model outperformed the baseline across all complexity groups in terms of mAP, Coverage, and F1-Score metrics. These results suggest that the proposed model holds strong potential for diverse multimodal applications such as video summarization, content recommendation, and automated video editing. Moreover, it is expected to serve as a foundational technology for advancing video-based artificial intelligence systems in the future.
(This article belongs to the Special Issue Image Processing Based on Convolution Neural Network: 2nd Edition)
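
The authors' SPOT code is not reproduced here; the following is a generic PyTorch sketch of the pattern the abstract describes: a CNN encodes each frame locally, a Transformer models the sequence globally, and a regression head predicts per-frame highlight scores trained with MSE. All layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class CNNTransformerHighlight(nn.Module):
    """Toy hybrid: CNN for local frame features, Transformer for global
    temporal context, linear head for regression-based highlight scores."""
    def __init__(self, dim=128):
        super().__init__()
        self.frame_enc = nn.Sequential(           # local (spatial) features
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)  # global context
        self.head = nn.Linear(dim, 1)             # per-frame highlight score

    def forward(self, clip):                      # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        f = self.frame_enc(clip.flatten(0, 1)).view(b, t, -1)
        return self.head(self.temporal(f)).squeeze(-1)   # (B, T) scores

scores = CNNTransformerHighlight()(torch.rand(2, 8, 3, 64, 64))
loss = nn.MSELoss()(scores, torch.rand(2, 8))     # regression target, as in SPOT
```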

21 pages, 904 KB  
Article
Ensemble-Based Knowledge Distillation for Identification of Childhood Pneumonia
by Grega Vrbančič and Vili Podgorelec
Electronics 2025, 14(15), 3115; https://doi.org/10.3390/electronics14153115 - 5 Aug 2025
Abstract
Childhood pneumonia remains a key cause of global morbidity and mortality, highlighting the need for accurate and efficient diagnostic tools. Ensemble methods have proven to be among the most successful approaches for identifying childhood pneumonia from chest X-ray images. However, deploying large, complex convolutional neural network models in resource-constrained environments presents challenges due to their high computational demands. Therefore, we propose a novel ensemble-based knowledge distillation method for identifying childhood pneumonia from X-ray images, which utilizes an ensemble of classification models to distill knowledge into a more efficient student model. Experiments conducted on a chest X-ray dataset show that the distilled student model achieves comparable (statistically not significantly different) predictive performance to that of the Stochastic Gradient with Warm Restarts ensemble method (F1-score on average 0.95 vs. 0.96, respectively), while significantly reducing inference time and decreasing FLOPs by a factor of 6.5. Based on the obtained results, the proposed method highlights the potential of knowledge distillation to enhance the efficiency of complex methods, making them more suitable for environments with limited computational resources.
(This article belongs to the Special Issue Image Processing Based on Convolution Neural Network: 2nd Edition)
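
A minimal sketch of the ensemble-to-student distillation the abstract outlines: the student is trained against the averaged soft predictions of several teachers plus the hard labels. The temperature and mixing weight below are conventional defaults, not values from the paper:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list, labels,
                      T=4.0, alpha=0.7):
    """KL divergence to the averaged ensemble soft targets, blended with
    ordinary cross-entropy on the ground-truth labels."""
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)                                   # ensemble soft targets
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    teacher_probs, reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, labels)  # hard-label term
    return alpha * soft + (1 - alpha) * hard

# toy usage: 3 teachers, pneumonia-vs-normal logits for a batch of 4 X-rays
teachers = [torch.randn(4, 2) for _ in range(3)]
loss = distillation_loss(torch.randn(4, 2), teachers, torch.tensor([0, 1, 1, 0]))
```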

21 pages, 16441 KB  
Article
Video Compression Using Hybrid Neural Representation with High-Frequency Spectrum Analysis
by Jian Hua Zhao, Xue Jun Li and Peter Han Joo Chong
Electronics 2025, 14(13), 2574; https://doi.org/10.3390/electronics14132574 - 26 Jun 2025
Abstract
Recent advancements in implicit neural representations have shown substantial promise in various domains, particularly in video compression and reconstruction, due to their rapid decoding speed and high adaptability. Building upon the state-of-the-art Neural Representations for Videos, the Expedite Neural Representation for Videos and the Hybrid Neural Representation for Videos primarily enhance performance by optimizing and expanding the embedded input of the Neural Representations for Videos network. However, the core module of the Neural Representations for Videos network, responsible for video reconstruction, has garnered comparatively less attention. This paper introduces a novel High-frequency Spectrum Hybrid Network, which leverages high-frequency information from the frequency domain to generate detailed image reconstructions. The central component of this approach is the High-frequency Spectrum Hybrid Network block, an innovative extension of the core module that integrates the High-frequency Spectrum Convolution Module into the original framework. The High-frequency Spectrum Convolution Module emphasizes the extraction of high-frequency features through a frequency-domain attention mechanism, significantly enhancing both performance and the recovery of local details in video images. As an enhanced module of the Neural Representations for Videos network, it demonstrates exceptional adaptability and versatility, enabling seamless integration into a wide range of existing architectures without requiring substantial modifications. In addition, this work introduces the High-frequency Spectrum loss function and the Multi-scale Feature Reuse Path to further mitigate the blurriness caused by the loss of high-frequency details during image generation. Experimental evaluations confirm that the proposed High-frequency Spectrum Hybrid Network surpasses the Neural Representations for Videos, the Expedite Neural Representation for Videos, and the Hybrid Neural Representation for Videos, achieving improvements of +5.75 dB, +4.53 dB, and +1.05 dB in peak signal-to-noise ratio, respectively.
(This article belongs to the Special Issue Image Processing Based on Convolution Neural Network: 2nd Edition)
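
As a rough sketch of the high-frequency-spectrum idea, the function below compares only the high-frequency portion of the 2D Fourier spectra of a reconstruction and its target; the circular low-frequency cutoff and its radius are illustrative assumptions, not the paper's exact loss:

```python
import torch

def high_frequency_loss(pred, target, radius=8):
    """L1 distance between the high-frequency parts of the 2D spectra."""
    h, w = pred.shape[-2:]
    fp = torch.fft.fftshift(torch.fft.fft2(pred), dim=(-2, -1))
    ft = torch.fft.fftshift(torch.fft.fft2(target), dim=(-2, -1))
    yy, xx = torch.meshgrid(torch.arange(h, device=pred.device),
                            torch.arange(w, device=pred.device), indexing="ij")
    dist = ((yy - h // 2) ** 2 + (xx - w // 2) ** 2).float().sqrt()
    mask = (dist > radius).float()           # 1 only at high frequencies
    return ((fp - ft) * mask).abs().mean()   # penalize high-frequency error

loss = high_frequency_loss(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
print(loss)
```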

28 pages, 17488 KB  
Article
Attentive Multi-Scale Features with Adaptive Context PoseResNet for Resource-Efficient Human Pose Estimation
by Ali Zakir, Sartaj Ahmed Salman, Gibran Benitez-Garcia and Hiroki Takahashi
Electronics 2025, 14(11), 2107; https://doi.org/10.3390/electronics14112107 - 22 May 2025
Abstract
Human Pose Estimation (HPE) remains challenging due to scale variation, occlusion, and high computational costs. Standard methods often struggle to capture detailed spatial information when keypoints are obscured, and they typically rely on computationally expensive deconvolution layers for upsampling, making them inefficient for real-time or resource-constrained scenarios. We propose AMFACPose (Attentive Multi-scale Features with Adaptive Context PoseResNet) to address these limitations. Specifically, our architecture incorporates Coordinate Convolution 2D (CoordConv2d) to retain explicit spatial context, alleviating the loss of coordinate information in conventional convolutions. To reduce computational overhead while maintaining accuracy, we utilize Depthwise Separable Convolutions (DSCs), separating spatial and pointwise operations. At the core of our approach is an Adaptive Feature Pyramid Network (AFPN), which replaces costly deconvolution-based upsampling by efficiently aggregating multi-scale features to handle diverse human poses and body sizes. We further introduce Dual-Gate Context Blocks (DGCBs) that refine global context to manage partial occlusions and cluttered backgrounds. The model integrates Squeeze-and-Excitation (SE) blocks and the Spatial–Channel Refinement Module (SCRM) to emphasize the most informative feature channels and spatial regions, which is particularly beneficial for occluded or overlapping keypoints. For precise keypoint localization, we replace dense heatmap predictions with coordinate classification using Multi-Layer Perceptron (MLP) heads. Experiments on the COCO and CrowdPose datasets demonstrate that AMFACPose surpasses the existing 2D HPE methods in both accuracy and computational efficiency. Moreover, our implementation on edge devices achieves real-time performance while preserving high accuracy, confirming the suitability of AMFACPose for resource-constrained pose estimation in both benchmark and real-world environments.
(This article belongs to the Special Issue Image Processing Based on Convolution Neural Network: 2nd Edition)
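
Of the components listed in this abstract, CoordConv2d is the easiest to show in isolation: two extra channels of normalized x/y coordinates are appended before an ordinary convolution so the filters see explicit spatial position. A minimal sketch, not the authors' code:

```python
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    """Conv2d whose input is augmented with normalized coordinate channels."""
    def __init__(self, in_ch, out_ch, **kw):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + 2, out_ch, **kw)

    def forward(self, x):
        b, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, ys, xs], dim=1))  # position-aware conv

out = CoordConv2d(3, 16, kernel_size=3, padding=1)(torch.rand(2, 3, 32, 32))
print(out.shape)  # -> torch.Size([2, 16, 32, 32])
```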

17 pages, 380 KB  
Article
Multi-Head Hierarchical Attention Framework with Multi-Level Learning Optimization Strategy for Legal Text Recognition
by Ke Zhang, Yufei Tu, Jun Lu, Zhongliang Ai, Zhonglin Liu, Licai Wang and Xuelin Liu
Electronics 2025, 14(10), 1946; https://doi.org/10.3390/electronics14101946 - 10 May 2025
Cited by 1
Abstract
Owing to the rapid increase in the amount of legal text data and the growing demand for intelligent processing, multi-label legal text recognition is becoming increasingly important in practical applications such as legal information retrieval and case classification. However, traditional methods have limitations in handling the complex semantics and multi-label characteristics of legal texts, making it difficult to accurately extract features and effective category information. Therefore, this study proposes a novel multi-head hierarchical attention framework suitable for multi-label legal text recognition tasks. This framework comprises a feature extraction module and a hierarchical module: the former extracts multi-level semantic representations of text, while the latter obtains multi-label category information. In addition, this study proposes a novel hierarchical learning optimization strategy that balances the learning needs of multi-level semantic representation and multi-label category information through data preprocessing, loss calculation, and weight updating, effectively accelerating the convergence of framework training. We conducted comparative experiments on the legal-domain dataset CAIL2021 and the general multi-label recognition datasets AAPD and Web of Science (WOS). The results indicate that the proposed method is significantly superior to mainstream methods in both legal and general scenarios, demonstrating excellent performance. The findings are expected to be widely applied in the intelligent processing of legal information, improving the accuracy of intelligent classification of judicial cases and further promoting the digitalization and intelligent transformation of the legal industry.
(This article belongs to the Special Issue Image Processing Based on Convolution Neural Network: 2nd Edition)
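
The framework itself is not reproduced here, but the multi-label setup it addresses is easy to sketch: each text may carry several category labels at once, so training uses one sigmoid per class (binary cross-entropy) rather than a single softmax. A toy PyTorch example with made-up dimensions:

```python
import torch
import torch.nn as nn

num_classes, dim = 5, 64
head = nn.Linear(dim, num_classes)         # stand-in for the full framework
features = torch.randn(3, dim)             # 3 encoded legal documents
targets = torch.tensor([[1, 0, 1, 0, 0],   # multi-hot label vectors
                        [0, 1, 0, 0, 1],
                        [1, 1, 0, 1, 0]], dtype=torch.float)

logits = head(features)
loss = nn.BCEWithLogitsLoss()(logits, targets)   # one binary task per class
predicted = (logits.sigmoid() > 0.5).int()       # independent per-class decisions
```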

25 pages, 5122 KB  
Article
Detection of Exoplanets in Transit Light Curves with Conditional Flow Matching and XGBoost
by Stefano Fiscale, Alessio Ferone, Angelo Ciaramella, Laura Inno, Massimiliano Giordano Orsini, Giovanni Covone and Alessandra Rotundi
Electronics 2025, 14(9), 1738; https://doi.org/10.3390/electronics14091738 - 24 Apr 2025
Cited by 3
Abstract
NASA's space-based telescopes Kepler and the Transiting Exoplanet Survey Satellite (TESS) have detected billions of potential planetary signatures, typically classified with Convolutional Neural Networks (CNNs). In this study, we introduce a hybrid model that combines deep learning, dimensionality reduction, decision trees, and diffusion models to distinguish planetary transits from astrophysical false positives and instrumental artifacts. Our model consists of three main components: (i) feature extraction using the CNN VGG19, (ii) dimensionality reduction through t-Distributed Stochastic Neighbor Embedding (t-SNE), and (iii) classification using Conditional Flow Matching (CFM) and XGBoost. We evaluated the model on two Kepler datasets and one TESS dataset, achieving F1-scores of 98% and 100%, respectively. Our results demonstrate the effectiveness of VGG19 in extracting discriminative patterns from data, of t-SNE in projecting features into a lower-dimensional space where they can be most effectively classified, and of CFM with XGBoost in enabling robust classification with minimal computational cost. This study highlights that a hybrid approach leveraging deep learning and dimensionality reduction can achieve state-of-the-art performance in exoplanet detection while maintaining a low computational cost. Future work will explore adaptive dimensionality reduction methods and application to data from upcoming missions such as ESA's PLATO mission.
(This article belongs to the Special Issue Image Processing Based on Convolution Neural Network: 2nd Edition)
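
A minimal end-to-end sketch of the three named stages, (i) VGG19 features, (ii) t-SNE, (iii) XGBoost, on random stand-in data; it omits the Conditional Flow Matching component and all real light-curve preprocessing. Note that t-SNE cannot embed unseen samples, so this toy version embeds the whole set at once:

```python
import numpy as np
import torch
from torchvision.models import vgg19
from sklearn.manifold import TSNE
from xgboost import XGBClassifier

images = torch.rand(20, 3, 224, 224)        # stand-in light-curve images
labels = np.random.randint(0, 2, 20)        # 1 = planet, 0 = false positive

backbone = vgg19(weights=None).features     # (i) CNN feature extractor
with torch.no_grad():
    feats = backbone(images).flatten(1).numpy()

embedded = TSNE(n_components=2, perplexity=5).fit_transform(feats)   # (ii)

clf = XGBClassifier(n_estimators=50).fit(embedded, labels)           # (iii)
print(clf.predict(embedded[:3]))
```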
