Deep Learning Technologies and Their Applications in Image Processing, Computer Vision, and Computational Intelligence

A special issue of AI (ISSN 2673-2688).

Deadline for manuscript submissions: 15 May 2026

Special Issue Editors

College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China
Interests: image/video restoration; image/video coding; machine learning; image segmentation

Special Issue Information

Dear Colleagues,

Deep learning has emerged as a pivotal technology across diverse domains, including image processing, computer vision, natural language processing, speech recognition, and beyond. With rapid advancements in artificial intelligence, deep learning, and high-performance computing, image, vision, and computing technologies have been widely implemented in autonomous driving, medical imaging, smart cities, augmented reality, and other cutting-edge fields.

These technological breakthroughs and expanded applications not only offer novel tools and methodologies for scientific research but also enhance industrial innovation. In the era of intelligence and digitization, deep learning technologies and their applications in image, vision, and computing are accelerating societal progress while providing critical support for future talent cultivation and disciplinary development.

This Special Issue will showcase the latest advances in deep learning, encompassing fundamental technologies and interdisciplinary applications in image processing, computer vision, intelligent computing, and related domains. We invite original research papers and comprehensive literature reviews addressing the aforementioned topics. In particular, extended versions of papers accepted at ICIVC 2025 (https://icivc.org/) and ICDLT 2025 (https://www.icdlt.org/) are highly encouraged.

You may choose our Joint Special Issue in Sensors.

Dr. Honggang Chen
Dr. Chao Ren
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. AI is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning models and algorithms
  • machine learning theory and technology
  • image processing theory and applications
  • computer graphics and computational photography
  • computer vision techniques and applications
  • multimedia technology

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (7 papers)


Research


26 pages, 16800 KB  
Article
Automated Anatomical Feature Analysis and Scoring for Draw-a-Person Test Drawings via ResNet-Based Multi-Label Detection and Classification
by Asma Abdullah Alwadai and Emad Sami Jaha
AI 2026, 7(4), 130; https://doi.org/10.3390/ai7040130 - 2 Apr 2026
Abstract
The process of manually scoring drawings for the Goodenough–Harris Draw-a-Person (DAP) test is time-consuming, labor-intensive, and prone to inconsistencies due to subjective interpretation. To address these drawbacks, this study introduces a hybrid model for the automated analysis and scoring of DAP test results that combines deep learning with rule-based reasoning. The proposed model has two modules: a convolutional neural network (CNN) that predicts ten visual anatomical features of drawings, and a set of six rules representing geometric and spatial relationships. The CNN output is binarized by thresholding and concatenated with the results of the heuristic rules to obtain a final set of sixteen features. The model was evaluated using five-fold cross-validation and a separate hold-out test set containing 948 labeled drawings. Five-fold cross-validation shows consistent performance, with average F1-scores above 0.90 for all primary anatomical features, while the hold-out evaluation yielded a macro-average accuracy of 91.78% across all sixteen features, indicating strong generalization within the problem domain. The approach achieves near-perfect scores for structurally prominent anatomical features such as the head, limbs, and trunk-related relationships, and for all heuristic-based features. Nevertheless, it performs poorly on less visually distinguishable features such as the ears (average F1-scores ≈ 0.09–0.12) and the neck (average F1-scores ≈ 0.75). The evaluation results show that the approach efficiently approximates expert-level scoring with a considerable reduction in human effort. Some limitations remain: the approach is less robust for subtle anatomical features, relies on heuristic thresholds for feature extraction, and weighs all sixteen features equally, which may not exactly match the actual DAP scoring system.
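The scoring pipeline described above lends itself to a compact illustration: sigmoid CNN outputs are binarized by a threshold and concatenated with heuristic rule flags to form the sixteen-feature vector. The sketch below is a reconstruction under stated assumptions, not the authors' code; the ResNet-18 backbone depth, the 0.5 threshold, and the rule names are all placeholders.

```python
# Illustrative sketch of the hybrid scoring pipeline (assumptions:
# ResNet-18 depth, 0.5 threshold, hypothetical rule names).
import torch
import torch.nn as nn
from torchvision.models import resnet18

class AnatomicalFeatureNet(nn.Module):
    """Multi-label classifier for ten visual anatomical features."""
    def __init__(self, n_features: int = 10):
        super().__init__()
        self.backbone = resnet18(weights=None)
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, n_features)

    def forward(self, x):
        return torch.sigmoid(self.backbone(x))  # per-feature probabilities

def heuristic_rules(meta: dict) -> torch.Tensor:
    """Stand-in for the six geometric/spatial rules; real rules would
    inspect detected component geometry (e.g. head above trunk)."""
    return torch.tensor([float(meta.get(k, 0)) for k in
                         ("r1", "r2", "r3", "r4", "r5", "r6")])

model = AnatomicalFeatureNet()
drawing = torch.randn(1, 3, 224, 224)            # dummy input drawing
cnn_binary = (model(drawing)[0] > 0.5).float()   # threshold to binary
final_16 = torch.cat([cnn_binary, heuristic_rules({"r1": 1})])
print(final_16.shape)                            # torch.Size([16])
```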

21 pages, 18953 KB  
Article
Evaluating AI-Based Image Inpainting Techniques for Facial Components Restoration Using Semantic Masks
by Hussein Sharadga, Abdullah Hayajneh and Erchin Serpedin
AI 2026, 7(4), 119; https://doi.org/10.3390/ai7040119 - 30 Mar 2026
Abstract
This paper presents a comparative analysis of advanced AI-based techniques for human face inpainting using semantic masks that fully occlude targeted facial components. The primary objective is to evaluate the ability of image inpainting methods to accurately restore semantically meaningful facial features. Our results show that existing inpainting models face significant challenges when semantic masks completely obscure the underlying facial structures. In contrast to random masks, which leave partial visual cues, semantic masks remove all structural information, making reconstruction substantially more difficult. We assess the performance of generative adversarial networks (GANs), transformer-based models, and diffusion models in restoring fully occluded facial components. To address these challenges, we explore three retraining strategies: using semantic masks, using random masks, and a hybrid approach combining both. While the hybrid strategy leverages the complementary strengths of each mask type and improves contextual understanding, fully accurate reconstruction remains challenging. These findings demonstrate that inpainting under fully occluding semantic masks is a critical yet underexplored problem, offering opportunities for developing new AI architectures and strategies for advanced facial reconstruction.
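The hybrid retraining strategy invites a small sketch: each training sample receives either a fully occluding semantic mask or a random free-form mask. The snippet below assumes a 50/50 mixing ratio and circular random strokes; both choices, and the function names, are illustrative rather than taken from the paper.

```python
# Hypothetical mask sampler for hybrid retraining (assumed 50/50 mix).
import numpy as np

def random_stroke_mask(h, w, n_strokes=4, rng=None):
    """Random free-form occlusion built from filled circles."""
    rng = rng or np.random.default_rng()
    mask = np.zeros((h, w), dtype=np.float32)
    for _ in range(n_strokes):
        cy, cx = rng.integers(0, h), rng.integers(0, w)
        r = rng.integers(max(h // 16, 1), max(h // 4, 2))
        yy, xx = np.ogrid[:h, :w]
        mask[(yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2] = 1.0
    return mask

def hybrid_mask(semantic_mask, p_semantic=0.5, rng=None):
    """Pick a fully occluding semantic-component mask or a random mask."""
    rng = rng or np.random.default_rng()
    if rng.random() < p_semantic:
        return semantic_mask           # e.g. mask covering the whole nose
    return random_stroke_mask(*semantic_mask.shape, rng=rng)

m = hybrid_mask(np.ones((256, 256), dtype=np.float32))
```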

18 pages, 10421 KB  
Article
A Deep Learning Framework with Multi-Scale Texture Enhancement and Heatmap Fusion for Face Super Resolution
by Bing Xu, Lei Wang, Yanxia Wu, Xiaoming Liu and Lu Gan
AI 2026, 7(1), 20; https://doi.org/10.3390/ai7010020 - 9 Jan 2026
Abstract
Face super-resolution (FSR) has made great progress thanks to deep learning and facial priors. However, many existing methods do not fully exploit landmark heatmaps and lack effective multi-scale texture modeling, which often leads to texture loss and artifacts under large upscaling factors. To address these problems, we propose a Multi-Scale Residual Stacking Network (MRSNet), which integrates multi-scale texture enhancement with multi-stage heatmap fusion. MRSNet is built upon Residual Attention-Guided Units (RAGUs) and incorporates a Face Detail Enhancer (FDE), which applies edge, texture, and region branches to achieve differentiated enhancement across facial components. Furthermore, we design a Multi-Scale Texture Enhancement Module (MTEM) that employs progressive average pooling to construct hierarchical receptive fields and applies heatmap-guided attention for adaptive texture refinement. In addition, we introduce a multi-stage heatmap fusion strategy that injects landmark priors into multiple phases of the network, including feature extraction, texture enhancement, and detail reconstruction, enabling deep sharing and progressive integration of prior knowledge. Extensive experiments on CelebA and Helen demonstrate that the proposed method achieves superior detail recovery and generates perceptually realistic high-resolution face images. Both quantitative and qualitative evaluations confirm that our approach outperforms state-of-the-art methods.
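As a rough sketch of the multi-scale idea, the block below builds hierarchical receptive fields with progressive average pooling and gates the fused response with an upsampled landmark heatmap. Channel counts, the scale set, and the residual form are assumptions; the paper's actual MTEM may be organized differently.

```python
# Hedged sketch of an MTEM-like block (PyTorch); sizes are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleTextureBlock(nn.Module):
    def __init__(self, channels: int, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.convs = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in scales])
        self.fuse = nn.Conv2d(channels * len(scales), channels, 1)

    def forward(self, feat, heatmap):
        h, w = feat.shape[-2:]
        branches = []
        for s, conv in zip(self.scales, self.convs):
            x = F.avg_pool2d(feat, s) if s > 1 else feat   # coarser field
            x = conv(x)
            branches.append(F.interpolate(x, size=(h, w), mode="bilinear",
                                          align_corners=False))
        fused = self.fuse(torch.cat(branches, dim=1))
        attn = torch.sigmoid(F.interpolate(heatmap, size=(h, w),
                                           mode="bilinear",
                                           align_corners=False))
        return feat + fused * attn     # heatmap-guided residual refinement

block = MultiScaleTextureBlock(64)
out = block(torch.randn(1, 64, 32, 32), torch.randn(1, 1, 32, 32))
```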

27 pages, 5157 KB  
Article
Remote Sensing Scene Classification via Multi-Feature Fusion Based on Discriminative Multiple Canonical Correlation Analysis
by Shavkat Fazilov, Ozod Yusupov, Yigitali Khandamov, Erali Eshonqulov, Jalil Khamidov and Khabiba Abdieva
AI 2026, 7(1), 5; https://doi.org/10.3390/ai7010005 - 23 Dec 2025
Abstract
Scene classification in remote sensing images is a demanding task that requires improved recognition accuracy in the face of complex spatial structures and high inter-class similarity. Although feature extraction using convolutional neural networks provides high efficiency, combining deep features obtained from different architectures in a semantically consistent manner remains an important scientific problem. In this study, a DMCCA + SVM model is proposed, in which Discriminative Multiple Canonical Correlation Analysis (DMCCA) is applied to fuse multi-source deep features, and final classification is performed using a Support Vector Machine (SVM). Unlike conventional fusion methods, DMCCA projects heterogeneous features into a unified low-dimensional latent space by maximizing within-class correlation and minimizing between-class correlation, resulting in a more separable and compact feature space. The proposed approach was evaluated on three widely used benchmark datasets (NWPU-RESISC45, AID, and PatternNet), achieving accuracy scores of 92.75%, 93.92%, and 99.35%, respectively. The results showed that the model outperforms modern individual CNN architectures, and its stability and generalization capability were confirmed through K-fold cross-validation. Overall, the DMCCA + SVM model was experimentally validated as an effective and reliable solution for the high-accuracy classification of remote sensing scenes.
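For readers unfamiliar with CCA-style fusion, the snippet below illustrates the fuse-then-classify pipeline with scikit-learn's plain two-view CCA standing in for DMCCA (which additionally exploits class labels and supports more than two feature sets) and an RBF SVM as the final classifier; the deep features are random stand-ins for CNN descriptors.

```python
# Simplified stand-in: two-view CCA fusion + SVM (DMCCA itself adds
# discriminative, multi-set terms not shown here).
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d1, d2, k = 200, 512, 256, 32
X1 = rng.standard_normal((n, d1))       # e.g. features from one CNN
X2 = rng.standard_normal((n, d2))       # e.g. features from another CNN
y = rng.integers(0, 5, size=n)          # scene-class labels

cca = CCA(n_components=k)
Z1, Z2 = cca.fit_transform(X1, X2)      # shared low-dimensional space
fused = np.hstack([Z1, Z2])             # fused descriptor

clf = SVC(kernel="rbf").fit(fused, y)   # final SVM classifier
print(clf.score(fused, y))
```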

21 pages, 899 KB  
Article
Gated Fusion Networks for Multi-Modal Violence Detection
by Bilal Ahmad, Mustaqeem Khan and Muhammad Sajjad
AI 2025, 6(10), 259; https://doi.org/10.3390/ai6100259 - 3 Oct 2025
Abstract
Public safety and security require effective monitoring systems to detect violence through visual, audio, and motion data. However, current methods often fail to exploit the complementary strengths of the visual and auditory modalities, reducing their overall effectiveness. To enhance violence detection, this paper presents a novel multimodal method that draws on motion, audio, and visual information to recognize violence. We designed a framework comprising two specialized components, a gated fusion module and a multi-scale transformer, which together enable the efficient detection of violence in multimodal data. To ensure seamless and effective integration of features, the gated fusion module dynamically adjusts the contribution of each modality, while the transformer employs multiple instance learning (MIL) to identify violent behaviors more accurately by capturing complex temporal correlations. Using these techniques, our model fully integrates multi-modal information and improves the accuracy of violence detection. The approach outperformed state-of-the-art methods with an accuracy of 86.85% on the XD-Violence dataset, demonstrating the potential of multi-modal fusion for violence detection.
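The gating idea is simple enough to sketch: a learned sigmoid gate assigns each modality a weight before the features are combined. The parameterization below (one linear layer over concatenated features) and the dimensions are illustrative assumptions, not the paper's exact module.

```python
# Minimal gated-fusion sketch (assumed gate parameterization).
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim: int, n_modalities: int = 3):
        super().__init__()
        self.gate = nn.Linear(dim * n_modalities, n_modalities)

    def forward(self, feats):                        # list of (B, dim)
        stacked = torch.stack(feats, dim=1)          # (B, M, dim)
        weights = torch.sigmoid(self.gate(torch.cat(feats, dim=-1)))
        return (stacked * weights.unsqueeze(-1)).sum(dim=1)

fusion = GatedFusion(dim=128)
visual, audio, motion = (torch.randn(2, 128) for _ in range(3))
fused = fusion([visual, audio, motion])              # (2, 128)
```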

15 pages, 4635 KB  
Article
GLNet-YOLO: Multimodal Feature Fusion for Pedestrian Detection
by Yi Zhang, Qing Zhao, Xurui Xie, Yang Shen, Jinhe Ran, Shu Gui, Haiyan Zhang, Xiuhe Li and Zhen Zhang
AI 2025, 6(9), 229; https://doi.org/10.3390/ai6090229 - 12 Sep 2025
Abstract
In modern computer vision, pedestrian detection holds significant importance in applications such as intelligent surveillance, autonomous driving, and robot navigation. However, single-modal images struggle to support high-precision detection in complex environments. To address this, the study proposes GLNet-YOLO, a framework based on cross-modal deep feature fusion that improves pedestrian detection in complex environments by fusing feature information from visible-light and infrared images. Extending the YOLOv11 architecture, the framework adopts a dual-branch network structure to process the visible-light and infrared inputs separately, and introduces the FM module for global feature fusion and enhancement as well as the DMR module for local feature separation and interaction. Experimental results show that on the LLVIP dataset, compared to the single-modal YOLOv11 baseline, the fused model improves mAP@50 by 9.2% over the visible-light-only model and by 0.7% over the infrared-only model. This significantly improves detection accuracy under low-light and complex background conditions and enhances the robustness of the algorithm; its effectiveness is further verified on the KAIST dataset.
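A minimal dual-branch sketch consistent with the description above: separate visible-light and infrared backbones whose feature maps are fused, with a plain concatenation-plus-1x1-convolution standing in for the FM and DMR modules, whose internals the abstract does not detail.

```python
# Dual-branch RGB/IR fusion sketch; the fusion layer is a placeholder
# for the paper's FM/DMR modules.
import torch
import torch.nn as nn

def tiny_backbone():
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.SiLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.SiLU())

class DualBranchFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_branch = tiny_backbone()   # visible-light stream
        self.ir_branch = tiny_backbone()    # infrared stream
        self.fuse = nn.Conv2d(128, 64, 1)   # stand-in global fusion

    def forward(self, rgb, ir):
        f = torch.cat([self.rgb_branch(rgb), self.ir_branch(ir)], dim=1)
        return self.fuse(f)                 # fused map for a detect head

model = DualBranchFusion()
out = model(torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256))
print(out.shape)                            # torch.Size([1, 64, 64, 64])
```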

Review


23 pages, 2541 KB  
Review
Artificial Intelligence in Endometriosis Imaging: A Scoping Review
by Rawan AlSaad, Thomas Farrell, Ali Elhenidy, Shima Albasha and Rajat Thomas
AI 2026, 7(2), 43; https://doi.org/10.3390/ai7020043 - 29 Jan 2026
Abstract
Endometriosis is a chronic gynecological condition characterized by endometrium-like tissue outside the uterus. In clinical practice, diagnosis and anatomical mapping rely heavily on imaging, yet performance remains operator- and modality-dependent. Artificial intelligence (AI) has been increasingly applied to endometriosis imaging. We conducted a PRISMA-ScR-guided scoping review of primary machine learning and deep learning studies using endometriosis-related imaging. Five databases (MEDLINE, Embase, Scopus, IEEE Xplore, and Google Scholar) were searched from 2015 to 2025. Of 413 records, 32 studies met the inclusion criteria; most were single-center, retrospective investigations in reproductive-age cohorts. Ultrasound predominated (50%), followed by laparoscopic imaging (25%) and MRI (22%); ovarian endometrioma and deep infiltrating endometriosis were the most commonly modeled phenotypes. Classification was the dominant AI task (78%), typically using convolutional neural networks (often ResNet-based), whereas segmentation (31%) and object detection (3%) were less explored. Nearly all studies relied on internal validation (97%), most frequently simple hold-out splits with heterogeneous, accuracy-focused performance reporting. A minimal AI-method quality appraisal identified frequent methodological gaps across key domains, including limited reporting of patient-level separation, leakage safeguards, calibration, and data and code availability. Overall, AI-enabled endometriosis imaging is rapidly evolving but remains early-stage; multi-center and prospective validation, standardized reporting, and clinically actionable detection–segmentation pipelines are needed before routine clinical integration.