
AI-Driven Image Processing: Theory, Methods, and Applications

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 31 March 2026 | Viewed by 6121

Special Issue Editors


Guest Editor
School of Computer Science and Mathematics, Central South University of Forestry and Technology, Changsha 410004, China
Interests: image processing; machine learning and applications; artificial intelligence

Special Issue Information

Dear Colleagues,

This Special Issue aims to explore the transformative role of artificial intelligence (AI) in advancing image processing across theoretical, methodological, and applied domains. Rapid advancements in AI, particularly in deep learning, generative models, and computer vision, have revolutionized traditional image processing paradigms, enabling unprecedented accuracy, efficiency, and scalability. This issue seeks to showcase cutting-edge research addressing fundamental challenges—such as interpretability, robustness, and computational efficiency—while fostering innovative solutions for real-world applications.

The scope encompasses three core themes. Theory focuses on foundational AI frameworks, including novel architectures (e.g., CNNs, transformers, diffusion models), learning paradigms (self-supervised, few-shot learning), and theoretical insights into model generalization and adversarial robustness. Methods emphasize algorithmic innovations, such as lightweight models for edge computing, federated learning for privacy preservation, and multimodal fusion techniques. Submissions may also address dataset curation, ethical AI practices, and evaluation metrics tailored to diverse imaging contexts. Applications highlight AI-driven breakthroughs in domains like medical imaging (e.g., disease diagnosis, surgical planning), autonomous systems (object detection, scene understanding), environmental monitoring (satellite/remote sensing), and creative industries (image restoration, style transfer).

This Special Issue encourages interdisciplinary contributions bridging AI, computer vision, and domain-specific challenges. Researchers are invited to submit original articles, reviews, and case studies that advance the state of the art, address scalability and fairness concerns, or demonstrate transformative impacts. Research areas may include (but are not limited to) the following:

  1. Image enhancement;
  2. Image recovery;
  3. Image super-resolution;
  4. Image denoising;
  5. Image deblurring;
  6. Image fusion;
  7. Image segmentation;
  8. Image classification;
  9. Object detection;
  10. Computational imaging;
  11. Polarization imaging;
  12. Infrared imaging;
  13. Hyperspectral imaging.

Dr. Junchao Zhang
Prof. Dr. Chuanli Wang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image processing
  • image recovery
  • image enhancement
  • object detection
  • image fusion
  • computational imaging

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (5 papers)


Research

28 pages, 1521 KB  
Article
Image–Text Sentiment Analysis Based on Dual-Path Interaction Network with Multi-Level Consistency Learning
by Zhi Ji, Chunlei Wu, Qinfu Xu and Yixiang Wu
Electronics 2026, 15(3), 581; https://doi.org/10.3390/electronics15030581 - 29 Jan 2026
Viewed by 360
Abstract
With the continuous evolution of social media, users are increasingly inclined to express their personal emotions on digital platforms by integrating information presented in multiple modalities. Within this context, research on image–text sentiment analysis has garnered significant attention. Prior research efforts have made notable progress by leveraging shared emotional concepts across visual and textual modalities. However, existing cross-modal sentiment analysis methods face two key challenges: previous approaches often focus excessively on fusion, so the learned features may not achieve emotional alignment, and traditional fusion strategies are not optimized for sentiment tasks, leading to insufficient robustness in the final sentiment discrimination. To address these issues, this paper proposes a Dual-path Interaction Network with Multi-level Consistency Learning (DINMCL). It employs a multi-level feature representation module to decouple the global and local features of both text and image. These decoupled features are then fed into the Global Congruity Learning (GCL) and Local Crossing-Congruity Learning (LCL) modules, respectively. GCL models global semantic associations using Crossing Prompter, while LCL captures local consistency in fine-grained emotional cues across modalities through cross-modal attention mechanisms and adaptive prompt injection. Finally, a CLIP-based adaptive fusion layer integrates the multi-modal representations in a sentiment-oriented manner. Experiments on the MVSA_Single, MVSA_Multiple, and TumEmo datasets against baseline models such as CTMWA and CLMLF demonstrate that DINMCL significantly outperforms mainstream models in sentiment classification accuracy and F1-score and exhibits strong robustness when handling samples containing highly noisy symbols. Full article
(This article belongs to the Special Issue AI-Driven Image Processing: Theory, Methods, and Applications)
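
To make the cross-modal attention step concrete, here is a minimal PyTorch sketch of the kind of block in which text tokens query image patches. It illustrates the general technique only, not the authors' code: DINMCL's Crossing Prompter, adaptive prompt injection, and consistency losses are not reproduced, and all class names and dimensions below are assumptions.

```python
# Minimal sketch of cross-modal attention: text tokens attend to image
# patches so each word can focus on the regions carrying matching
# emotional cues. Illustrative only; not DINMCL's actual LCL module.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens, image_patches):
        # text queries, image keys/values; the residual keeps the text stream intact
        fused, _ = self.attn(query=text_tokens, key=image_patches, value=image_patches)
        return self.norm(text_tokens + fused)

# toy usage: batch of 2, 16 text tokens, 49 image patches, 512-dim features
txt = torch.randn(2, 16, 512)
img = torch.randn(2, 49, 512)
print(CrossModalAttention()(txt, img).shape)  # torch.Size([2, 16, 512])
```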

19 pages, 2336 KB  
Article
A Lightweight Upsampling and Cross-Modal Feature Fusion-Based Algorithm for Small-Object Detection in UAV Imagery
by Jianglei Gong, Zhe Yuan, Wenxing Li, Weiwei Li, Yanjie Guo and Baolong Guo
Electronics 2026, 15(2), 298; https://doi.org/10.3390/electronics15020298 - 9 Jan 2026
Cited by 1 | Viewed by 509
Abstract
Small-object detection in UAV remote sensing faces common challenges such as tiny target size, blurred features, and severe background interference. Furthermore, single imaging modalities exhibit limited representation capability in complex environments. To address these issues, this paper proposes CTU-YOLO, a UAV-based small-object detection algorithm built upon cross-modal feature fusion and lightweight upsampling. The algorithm incorporates a dynamic and adaptive cross-modal feature fusion (DCFF) module, which achieves efficient feature alignment and fusion by combining frequency-domain analysis with convolutional operations. Additionally, a lightweight upsampling module (LUS) is introduced, integrating dynamic sampling and depthwise separable convolution to enhance the recovery of fine details for small objects. Experiments on the DroneVehicle and LLVIP datasets demonstrate that CTU-YOLO achieves 73.9% mAP on DroneVehicle and 96.9% AP on LLVIP, outperforming existing mainstream methods. Meanwhile, the model has only 4.2 MB of parameters and a computational cost of 13.8 GFLOPs, with inference speeds reaching 129.9 FPS on DroneVehicle and 135.1 FPS on LLVIP, exhibiting a lightweight design and real-time performance while maintaining high accuracy. Ablation studies confirm that both the DCFF and LUS modules contribute significantly to the performance gains. Visualization analysis further indicates that the proposed method accurately preserves the structure of small objects even under nighttime, low-light, and multi-scale background conditions, demonstrating strong robustness. Full article
(This article belongs to the Special Issue AI-Driven Image Processing: Theory, Methods, and Applications)
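
The depthwise separable convolution mentioned in the abstract is where most of the parameter savings in such a module would come from: a 3x3 depthwise plus 1x1 pointwise pair costs roughly 9C + C^2 multiply-accumulates per pixel instead of 9C^2 for a standard convolution. Below is a minimal sketch of such a block under stated assumptions; the paper's dynamic sampling is replaced here by plain bilinear interpolation, and all names and settings are illustrative, not the published LUS design.

```python
# Lightweight upsampling sketch: depthwise separable refinement followed
# by a fixed 2x bilinear upsample (the real LUS uses dynamic sampling).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightweightUpsample(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # depthwise: one 3x3 filter per channel (cheap spatial mixing)
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        # pointwise: 1x1 convolution mixes information across channels
        self.pointwise = nn.Conv2d(channels, channels, 1)
        self.act = nn.SiLU()

    def forward(self, x):
        x = self.act(self.pointwise(self.depthwise(x)))
        return F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)

x = torch.randn(1, 64, 40, 40)
print(LightweightUpsample(64)(x).shape)  # torch.Size([1, 64, 80, 80])
```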

34 pages, 3029 KB  
Article
A Functionally Guided U-Net for Chronic Kidney Disease Assessment: Joint Structural Segmentation and eGFR Prediction with a Structure–Function Consistency Loss
by Omar Al-Salman and Mesut Cevik
Electronics 2026, 15(1), 176; https://doi.org/10.3390/electronics15010176 - 30 Dec 2025
Viewed by 663
Abstract
An accurate assessment of chronic kidney disease (CKD) requires understanding both renal morphology and functional decline, yet most deep learning approaches treat segmentation and eGFR prediction as separate tasks. This paper proposes the Functionally Guided CKD U-Net (FG-CKD-UNet), a dual-headed multitask architecture that integrates multi-class kidney segmentation with end-to-end eGFR prediction using a structure–function consistency loss. The model incorporates a morphological biomarker extractor to derive cortical thickness, kidney volume, and cortex–medulla ratios, enabling explicit coupling between anatomy and physiology. Experiments on T2-weighted MRI and colorized CT datasets demonstrate that the proposed method surpasses state-of-the-art segmentation baselines, achieving a Dice score of 0.94 and an HD95 of 9.8 mm. For functional prediction, the model achieves an MAE of 0.039, an RMSE of 0.058, and a Pearson correlation of 0.92, outperforming CNN, MLP, and ResNet baselines. The structure–function consistency mechanism reduces the consistency error from 0.071 to 0.042, confirming coherent physiological modeling. The results indicate that the FG-CKD-UNet provides a reliable, interpretable, and physiologically grounded framework for comprehensive CKD assessment. Full article
(This article belongs to the Special Issue AI-Driven Image Processing: Theory, Methods, and Applications)
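
The structure–function coupling described above follows a common multitask pattern: a segmentation loss, a regression loss, and a penalty tying the regression head's eGFR to an eGFR estimate derived from segmentation-based biomarkers. The sketch below illustrates that pattern only; the paper's exact consistency loss, biomarker computation, and loss weights are not given in the abstract, so every function and weight here is an assumption.

```python
# Sketch of a dual-headed multitask objective with a structure-function
# consistency term. Illustrative pattern only, not FG-CKD-UNet's loss.
import torch
import torch.nn.functional as F

def dice_loss(pred, target, eps: float = 1e-6):
    # soft Dice over probability maps; approaches 0 as overlap improves
    inter = (pred * target).sum()
    return 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)

def multitask_loss(seg_prob, seg_gt, egfr_pred, egfr_gt, egfr_from_biomarkers,
                   w_seg=1.0, w_reg=1.0, w_cons=0.5):
    l_seg = dice_loss(seg_prob, seg_gt)                # structure head
    l_reg = F.l1_loss(egfr_pred, egfr_gt)              # function head
    # consistency: the two routes to eGFR should agree
    l_cons = F.mse_loss(egfr_pred, egfr_from_biomarkers)
    return w_seg * l_seg + w_reg * l_reg + w_cons * l_cons
```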

16 pages, 3360 KB  
Article
Diffusion Preference Alignment via Attenuated Kullback–Leibler Regularization
by Xinjian Zhang and Wei Xiang
Electronics 2025, 14(15), 2939; https://doi.org/10.3390/electronics14152939 - 23 Jul 2025
Cited by 1 | Viewed by 2328
Abstract
Direct preference optimization (DPO) has been successfully applied to align large language models (LLMs) with human preferences. In recent years, DPO has also been used to improve the generation quality of text-to-image diffusion models. However, existing techniques often rely on a single type of reward model and are prone to overfitting to inaccurate reward signals, so model quality cannot be continuously improved. To address these limitations, we propose xDPO, which introduces a novel regularization approach that implicitly defines reward functions for both preferred and non-preferred samples. This design greatly enhances the flexibility of reward modeling. The experimental results show that, after fine-tuning Stable Diffusion v1.5, xDPO achieves significant improvements in human preference evaluations compared to previous DPO methods and improves training efficiency by approximately 1.5 times, while maintaining image–text alignment performance comparable to the original model. Full article
(This article belongs to the Special Issue AI-Driven Image Processing: Theory, Methods, and Applications)
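
For orientation, existing DPO adaptations to diffusion models typically score each preferred/non-preferred image pair by how much the fine-tuned model reduces the denoising error relative to a frozen reference model, then apply a logistic loss to that margin. The sketch below shows this baseline form only; the attenuation of the KL weight is rendered as a simple exponential decay, which is an assumption made for illustration and not the paper's formulation.

```python
# Sketch of a diffusion-DPO-style preference loss. err_* are per-sample
# denoising MSEs, shape (batch,); _w = preferred image, _l = non-preferred;
# _model = fine-tuned network, _ref = frozen reference network.
import math
import torch
import torch.nn.functional as F

def dpo_diffusion_loss(err_w_model, err_w_ref, err_l_model, err_l_ref,
                       step: int, beta0: float = 2000.0, decay: float = 1e-4):
    # attenuated KL weight: assumed exponential decay over training steps
    beta = beta0 * math.exp(-decay * step)
    # margin > 0 when the model improves the preferred sample more than
    # the non-preferred one, relative to the reference
    margin = (err_w_ref - err_w_model) - (err_l_ref - err_l_model)
    return -F.logsigmoid(beta * margin).mean()
```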

27 pages, 11612 KB  
Article
FACDIM: A Face Image Super-Resolution Method That Integrates Conditional Diffusion Models with Prior Attributes
by Jianhua Ren, Yuze Guo and Qiangkui Leng
Electronics 2025, 14(10), 2070; https://doi.org/10.3390/electronics14102070 - 20 May 2025
Viewed by 1676
Abstract
Facial image super-resolution seeks to reconstruct high-quality details from low-resolution inputs, yet traditional methods, such as interpolation, convolutional neural networks (CNNs), and generative adversarial networks (GANs), often fall short, suffering from insufficient realism, loss of high-frequency details, and training instability. Furthermore, many existing models inadequately incorporate facial structural attributes and semantic information, leading to semantically inconsistent generated images. To overcome these limitations, this study introduces an attribute-prior conditional diffusion implicit model that enhances the controllability of super-resolution generation and improves detail restoration capabilities. Methodologically, the framework consists of four components: a pre-super-resolution module, a facial attribute extraction module, a global feature encoder, and an enhanced conditional diffusion implicit model. Specifically, low-resolution images are subjected to preliminary super-resolution and attribute extraction, followed by adaptive group normalization to integrate feature vectors. Additionally, residual convolutional blocks are incorporated into the diffusion model to utilize attribute priors, complemented by self-attention mechanisms and skip connections to optimize feature transmission. Experiments conducted on the CelebA and FFHQ datasets demonstrate that the proposed model achieves an increase of 2.16 dB in PSNR and 0.08 in SSIM under an 8× magnification factor compared to SR3, with the generated images displaying more realistic textures. Moreover, manual adjustment of attribute vectors allows for directional control over generation outcomes (e.g., modifying facial features or lighting conditions), ensuring alignment with anthropometric characteristics. This research provides a flexible and robust solution for high-fidelity face super-resolution, offering significant advantages in detail preservation and user controllability. Full article
(This article belongs to the Special Issue AI-Driven Image Processing: Theory, Methods, and Applications)
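
The "adaptive group normalization to integrate feature vectors" step admits a compact illustration: a GroupNorm whose per-channel scale and shift are predicted from the conditioning vector, which is the standard way diffusion U-Nets inject side information. The sketch below shows that standard mechanism under assumed dimensions; FACDIM's exact module and naming may differ.

```python
# Adaptive group normalization (AdaGN) sketch: normalize, then modulate
# with a scale and shift predicted from the attribute/feature vector.
import torch
import torch.nn as nn

class AdaGroupNorm(nn.Module):
    def __init__(self, channels: int, cond_dim: int, groups: int = 8):
        super().__init__()
        self.norm = nn.GroupNorm(groups, channels, affine=False)
        self.to_scale_shift = nn.Linear(cond_dim, 2 * channels)

    def forward(self, x, cond):
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        scale = scale[:, :, None, None]  # (B, C) -> (B, C, 1, 1)
        shift = shift[:, :, None, None]
        return self.norm(x) * (1 + scale) + shift

x = torch.randn(2, 64, 32, 32)   # U-Net feature map
c = torch.randn(2, 128)          # fused attribute + global feature vector
print(AdaGroupNorm(64, 128)(x, c).shape)  # torch.Size([2, 64, 32, 32])
```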