AI-Driven Medical Image/Video Processing

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 31 January 2026 | Viewed by 13093

Special Issue Editor


Prof. Dr. Jitae Shin
Guest Editor
School of Electronic and Electrical Engineering, Sungkyunkwan University, Suwon 440-746, Republic of Korea
Interests: deep learning algorithms; image/video signal processing; medical image processing and systems; image/video communication and systems

Special Issue Information

Dear Colleagues,

Artificial intelligence (AI) has emerged as a transformative force in medical image and video processing, offering unprecedented opportunities to improve diagnosis, treatment planning, and patient outcomes. As AI-driven image processing continues to shape modern technology, addressing challenges such as computational efficiency, interpretability, and ethical considerations remains crucial. There is also a need to foster interdisciplinary collaboration and knowledge exchange between clinical professionals and AI experts, enabling more accurate diagnosis and treatment through an integrated approach to disease. This Special Issue aims to explore the latest theoretical advancements, methodologies, and practical applications of AI in medical imaging and video analysis. We invite contributions that highlight innovative uses of AI-driven techniques to address challenges in medical image interpretation, video analysis, multimodal imaging, and clinical decision support. Through this issue, we seek to foster a deeper understanding of AI's role in revolutionizing healthcare and its potential to enhance the precision, efficiency, and accessibility of medical practice worldwide.

Prof. Dr. Jitae Shin
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • AI techniques in medical imaging
  • medical image/video processing
  • multimodal imaging
  • diagnostic imaging
  • data augmentation and synthetic data
  • explainable AI in clinical decision support
  • deep learning for image/video analysis
  • medical image and video enhancement
  • applications of large language models (LLMs) in the medical domain
  • AI-driven smart devices for medical image/video
  • clinical decision support

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (8 papers)


Research

14 pages, 2106 KB  
Article
A Hierarchical Multi-Modal Fusion Framework for Alzheimer’s Disease Classification Using 3D MRI and Clinical Biomarkers
by Ting-An Chang, Chun-Cheng Yu, Yin-Hua Wang, Zi-Ping Lei and Chia-Hung Chang
Electronics 2026, 15(2), 367; https://doi.org/10.3390/electronics15020367 - 14 Jan 2026
Viewed by 209
Abstract
Accurate and interpretable staging of Alzheimer’s disease (AD) remains challenging due to the heterogeneous progression of neurodegeneration and the complementary nature of imaging and clinical biomarkers. This study implements and evaluates an optimized Hierarchical Multi-Modal Fusion Framework (HMFF) that systematically integrates 3D structural MRI with clinical assessment scales for robust three-class classification of cognitively normal (CN), mild cognitive impairment (MCI), and AD subjects. A standardized preprocessing pipeline, including N4 bias field correction, nonlinear registration to MNI space, ANTsNet-based skull stripping, voxel normalization, and spatial resampling, was employed to ensure anatomically consistent and high-quality MRI inputs. Within the proposed framework, volumetric imaging features were extracted using a 3D DenseNet-121 architecture, while structured clinical information was modeled via an XGBoost classifier to capture nonlinear clinical priors. These heterogeneous representations were hierarchically fused through a lightweight multilayer perceptron, enabling effective cross-modal interaction. To further enhance discriminative capability and model efficiency, a hierarchical feature selection strategy was incorporated to progressively refine high-dimensional imaging features. Experimental results demonstrated that performance consistently improved with feature refinement and reached an optimal balance at approximately 90 selected features. Under this configuration, the proposed HMFF achieved an accuracy of 0.94 (95% Confidence Interval: [0.918, 0.951]), a recall of 0.91, a precision of 0.94, and an F1-score of 0.92, outperforming unimodal and conventional multimodal baselines under comparable settings. Moreover, Grad-CAM visualization confirmed that the model focused on clinically relevant neuroanatomical regions, including the hippocampus and medial temporal lobe, enhancing interpretability and clinical plausibility. These findings indicate that hierarchical multimodal fusion with interpretable feature refinement offers a promising and extensible solution for reliable and explainable automated AD staging. Full article
(This article belongs to the Special Issue AI-Driven Medical Image/Video Processing)
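
As a rough, non-authoritative illustration of the late-fusion idea described in the abstract, the Python sketch below feeds a selected subset of imaging features together with clinical-branch class probabilities into a small MLP. The feature counts, the top-90 selection, and every module name here are assumptions for the sketch, not the authors' implementation.

    # Illustrative late-fusion head: hypothetical dimensions and module names.
    import torch
    import torch.nn as nn

    class FusionHead(nn.Module):
        """Fuse selected imaging features with clinical-branch class probabilities."""
        def __init__(self, n_img_feats=90, n_clinical=3, n_classes=3):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(n_img_feats + n_clinical, 64),
                nn.ReLU(),
                nn.Dropout(0.3),
                nn.Linear(64, n_classes),
            )

        def forward(self, img_feats, clinical_probs):
            return self.mlp(torch.cat([img_feats, clinical_probs], dim=1))

    def select_top_k(features, importance, k=90):
        # Keep the k imaging features with the highest importance scores.
        idx = torch.topk(importance, k).indices
        return features[:, idx]

    # Toy example: 4 subjects, 1024 imaging features, 3 clinical class probabilities.
    img = torch.randn(4, 1024)                       # e.g. pooled 3D-CNN features
    imp = torch.rand(1024)                           # e.g. from a feature-ranking step
    clin = torch.softmax(torch.randn(4, 3), dim=1)   # e.g. clinical classifier output
    logits = FusionHead()(select_top_k(img, imp), clin)
    print(logits.shape)                              # torch.Size([4, 3]) -> CN / MCI / AD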

20 pages, 2026 KB  
Article
Unified Adult–Pediatric Glioma Segmentation via Synergistic MAE Pretraining and Boundary-Aware Refinement
by Moldir Zharylkassynova, Jaepil Ko and Kyungjoo Cheoi
Electronics 2026, 15(2), 329; https://doi.org/10.3390/electronics15020329 - 12 Jan 2026
Viewed by 162
Abstract
Accurate brain tumor segmentation in both adult and pediatric populations remains a challenge due to substantial differences in brain anatomy, tumor distribution, and subregion size. This study proposes a unified segmentation framework based on nnU-Net, integrating encoder-level self-supervised pretraining with a lightweight, boundary-aware decoder. The encoder is initialized using a large-scale 3D masked autoencoder pretrained on brain MRI, while the decoder is trained with a hybrid loss function that combines region-overlap and boundary-sensitive terms. A harmonized training and evaluation protocol is applied to both the BraTS-GLI (adult) and BraTS-PED (pediatric) cohorts, enabling fair cross-cohort comparison against baseline and advanced nnU-Net variants. The proposed method improves mean Dice scores from 0.76 to 0.90 for adults and from 0.64 to 0.78 for pediatric cases, while reducing HD95 from 4.42 to 2.24 mm and from 9.03 to 6.23 mm, respectively. These results demonstrate that combining encoder-level pretraining with decoder-side boundary supervision significantly enhances segmentation accuracy across age groups without adding inference-time computational overhead. Full article
(This article belongs to the Special Issue AI-Driven Medical Image/Video Processing)
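
The combination of a region-overlap term with a boundary-sensitive term can be sketched as below; the soft Dice formulation, the morphological-gradient boundary map, and the weighting alpha are generic stand-ins rather than the paper's exact loss.

    # Minimal hybrid segmentation loss: soft Dice plus a boundary-sensitive term.
    import torch
    import torch.nn.functional as F

    def soft_dice_loss(pred, target, eps=1e-6):
        # pred, target: (B, C, D, H, W); pred already passed through sigmoid/softmax.
        inter = (pred * target).sum(dim=(2, 3, 4))
        union = pred.sum(dim=(2, 3, 4)) + target.sum(dim=(2, 3, 4))
        return 1.0 - ((2 * inter + eps) / (union + eps)).mean()

    def soft_boundary(mask, k=3):
        # Morphological gradient (dilation minus erosion) as a soft boundary map.
        pad = k // 2
        dilated = F.max_pool3d(mask, k, stride=1, padding=pad)
        eroded = -F.max_pool3d(-mask, k, stride=1, padding=pad)
        return dilated - eroded

    def hybrid_loss(pred, target, alpha=0.5):
        # Region overlap on full masks plus overlap restricted to boundary bands.
        region = soft_dice_loss(pred, target)
        boundary = soft_dice_loss(soft_boundary(pred), soft_boundary(target))
        return (1 - alpha) * region + alpha * boundary

    pred = torch.sigmoid(torch.randn(2, 1, 32, 32, 32))
    gt = (torch.rand(2, 1, 32, 32, 32) > 0.5).float()
    print(hybrid_loss(pred, gt).item())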

15 pages, 2713 KB  
Article
Deep Learning-Based Segmentation for Digital Epidermal Microscopic Images: A Comparative Study of Overall Performance
by Yeshun Yue, Qihang He and Yaobin Zou
Electronics 2025, 14(19), 3871; https://doi.org/10.3390/electronics14193871 - 29 Sep 2025
Viewed by 446
Abstract
Digital epidermal microscopic (DEM) images offer the potential to quantitatively analyze skin aging at the microscopic level. However, stochastic complexity, local highlights, and low contrast in DEM images pose significant challenges to accurate segmentation. This study evaluated eight deep learning models to identify methods capable of accurately segmenting complex DEM images while meeting diverse performance requirements. To this end, this study first constructed a manually labeled DEM image dataset. Then, eight deep learning models (FCN-8s, SegNet, UNet, ResUNet, NestedUNet, DeepLabV3+, TransUNet, and AttentionUNet) were systematically evaluated for their performance in DEM image segmentation. Our experimental findings show that AttentionUNet achieves the highest segmentation accuracy, with a DSC of 0.8696 and an IoU of 0.7703. In contrast, FCN-8s is a better choice for efficient segmentation due to its lower parameter count (18.64 M) and efficient inference speed (GPU time 37.36 ms). FCN-8s and NestedUNet show a better balance between accuracy and efficiency when assessed across metrics like segmentation accuracy, model size, and inference time. Through a systematic comparison of eight deep learning models, this study identifies superior methods for segmenting skin furrows and ridges in DEM images. This work lays the foundation for subsequent applications, such as analyzing skin aging through furrow and ridge features. Full article
(This article belongs to the Special Issue AI-Driven Medical Image/Video Processing)
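
For readers who want to reproduce this kind of comparison, the sketch below computes the reported quantities (DSC, IoU, parameter count in millions, and average forward-pass time); the stand-in network and input shape are placeholders.

    # Segmentation metrics and a simple accuracy-vs-efficiency profile.
    import time
    import torch

    def dice_and_iou(pred_mask, gt_mask, eps=1e-6):
        # pred_mask, gt_mask: boolean tensors of identical shape.
        inter = (pred_mask & gt_mask).sum().float()
        dsc = (2 * inter + eps) / (pred_mask.sum() + gt_mask.sum() + eps)
        iou = (inter + eps) / ((pred_mask | gt_mask).sum() + eps)
        return dsc.item(), iou.item()

    def profile_model(model, input_shape=(1, 3, 256, 256), warmup=3, runs=10):
        # Parameter count (millions) and average forward-pass time (milliseconds).
        params_m = sum(p.numel() for p in model.parameters()) / 1e6
        x = torch.randn(*input_shape)
        model.eval()
        with torch.no_grad():
            for _ in range(warmup):
                model(x)
            t0 = time.perf_counter()
            for _ in range(runs):
                model(x)
        ms = (time.perf_counter() - t0) / runs * 1000
        return params_m, ms

    # Toy usage with a stand-in one-layer "segmenter".
    net = torch.nn.Sequential(torch.nn.Conv2d(3, 1, 3, padding=1), torch.nn.Sigmoid())
    print(profile_model(net))
    print(dice_and_iou(torch.rand(256, 256) > 0.5, torch.rand(256, 256) > 0.5))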

18 pages, 775 KB  
Article
Better with Less: Efficient and Accurate Skin Lesion Segmentation Enabled by Diffusion Model Augmentation
by Peng Yang, Zhuochao Chen, Xiaoxuan Sun and Xiaodan Deng
Electronics 2025, 14(17), 3359; https://doi.org/10.3390/electronics14173359 - 24 Aug 2025
Viewed by 1495
Abstract
Automatic skin lesion segmentation is essential for early melanoma diagnosis, yet the scarcity and limited diversity of annotated training data hinder progress. We introduce a two-stage framework that first employs a denoising diffusion probabilistic model (DDPM) enhanced with dilated convolutions and self-attention to synthesize unseen, high-fidelity dermoscopic images. In the second stage, segmentation models—including a dilated U-Net variant that leverages dilated convolutions to enlarge the receptive field—are trained on the augmented dataset. Experimental results demonstrate that this approach not only enhances segmentation accuracy across various architectures with an increase in DICE of more than 0.4, but also enables compact and computationally efficient segmentation models to achieve performance comparable to or even better than that of models with 10 times the parameters. Moreover, our diffusion-based data augmentation strategy consistently improves segmentation performance across multiple architectures, validating its effectiveness for developing accurate and deployable clinical tools. Full article
(This article belongs to the Special Issue AI-Driven Medical Image/Video Processing)
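
A minimal sketch of the two-stage pipeline, with a placeholder sampler standing in for the trained diffusion model: synthetic image–mask pairs are simply pooled with the real training set before the segmentation network is trained. Shapes, sample counts, and function names are assumptions.

    # Stage-1 output (synthetic pairs) pooled with real data for stage-2 training.
    import torch
    from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

    def fake_ddpm_sample(n):
        # Placeholder for drawing n image/mask pairs from a trained DDPM;
        # a real implementation would run the reverse diffusion process.
        return torch.rand(n, 3, 128, 128), (torch.rand(n, 1, 128, 128) > 0.7).float()

    # Assumed real training data (here random tensors for the sketch).
    real_imgs = torch.rand(100, 3, 128, 128)
    real_masks = (torch.rand(100, 1, 128, 128) > 0.7).float()
    synth_imgs, synth_masks = fake_ddpm_sample(300)

    train_set = ConcatDataset([TensorDataset(real_imgs, real_masks),
                               TensorDataset(synth_imgs, synth_masks)])
    loader = DataLoader(train_set, batch_size=8, shuffle=True)
    images, masks = next(iter(loader))        # feed these to any segmentation model
    print(images.shape, masks.shape)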

16 pages, 1422 KB  
Article
Prototype-Guided Promptable Retinal Lesion Segmentation from Coarse Annotations
by Qinji Yu and Xiaowei Ding
Electronics 2025, 14(16), 3252; https://doi.org/10.3390/electronics14163252 - 15 Aug 2025
Viewed by 1082
Abstract
Accurate segmentation of retinal lesions is critical for the diagnosis and management of ophthalmic diseases, but pixel-level annotation is labor-intensive and demanding in clinical scenarios. To address this, we introduce a promptable segmentation approach based on prototype learning that enables precise retinal lesion segmentation from low-cost, coarse annotations. Our framework treats clinician-provided coarse masks (such as ellipses) as prompts to guide the extraction and refinement of lesion and background feature prototypes. A lightweight U-Net backbone fuses image content with spatial priors, while a superpixel-guided prototype weighting module is employed to mitigate background interference within coarse prompts. We simulate coarse prompts from fine-grained masks to train the model, and extensively validate our method across three datasets (IDRiD, DDR, and a private clinical set) with a range of annotation coarseness levels. Experimental results demonstrate that our prototype-based model significantly outperforms fully supervised and non-prototypical promptable baselines, achieving more accurate and robust segmentation, particularly for challenging and variable lesions. The approach exhibits excellent adaptability to unseen data distributions and lesion types, maintaining stable performance even under highly coarse prompts. This work highlights the potential of prompt-driven, prototype-based solutions for efficient and reliable medical image segmentation in practical clinical settings. Full article
(This article belongs to the Special Issue AI-Driven Medical Image/Video Processing)
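
The prototype mechanism can be pictured as masked average pooling inside and outside the coarse prompt followed by per-pixel cosine similarity, as in the sketch below; the temperature, tensor shapes, and function names are assumptions, and the paper's superpixel-guided weighting and U-Net backbone are not reproduced.

    # Prototype-based refinement of a coarse prompt (illustrative only).
    import torch
    import torch.nn.functional as F

    def masked_avg_pool(feats, mask):
        # feats: (B, C, H, W); mask: (B, 1, h, w) in {0, 1} -> (B, C) prototype.
        mask = F.interpolate(mask, size=feats.shape[-2:], mode="nearest")
        return (feats * mask).sum(dim=(2, 3)) / (mask.sum(dim=(2, 3)) + 1e-6)

    def prototype_segment(feats, coarse_prompt, tau=20.0):
        # Lesion prototype from inside the prompt, background prototype from outside,
        # then per-pixel cosine similarity picks the more likely class.
        p_fg = masked_avg_pool(feats, coarse_prompt)
        p_bg = masked_avg_pool(feats, 1.0 - coarse_prompt)
        f = F.normalize(feats, dim=1)
        sims = torch.stack([
            (f * F.normalize(p_bg, dim=1)[..., None, None]).sum(1),
            (f * F.normalize(p_fg, dim=1)[..., None, None]).sum(1),
        ], dim=1)                                        # (B, 2, H, W)
        return torch.softmax(tau * sims, dim=1)[:, 1:]   # per-pixel lesion probability

    feats = torch.randn(1, 64, 64, 64)               # e.g. backbone feature map
    prompt = torch.zeros(1, 1, 256, 256)
    prompt[:, :, 80:160, 80:160] = 1.0               # coarse box standing in for an ellipse
    print(prototype_segment(feats, prompt).shape)    # torch.Size([1, 1, 64, 64])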

17 pages, 3807 KB  
Article
2AM: Weakly Supervised Tumor Segmentation in Pathology via CAM and SAM Synergy
by Chenyu Ren, Liwen Zou and Luying Gui
Electronics 2025, 14(15), 3109; https://doi.org/10.3390/electronics14153109 - 5 Aug 2025
Viewed by 1327
Abstract
Tumor microenvironment (TME) analysis plays an extremely important role in computational pathology. Deep learning shows tremendous potential for tumor tissue segmentation on pathological images, which is an essential part of TME analysis. However, fully supervised segmentation methods based on deep learning usually require a large number of manual annotations, which is time-consuming and labor-intensive. Recently, weakly supervised semantic segmentation (WSSS) works based on the Class Activation Map (CAM) have shown promising results to learn the concept of segmentation from image-level class labels but usually have imprecise boundaries due to the lack of pixel-wise supervision. On the other hand, the Segment Anything Model (SAM), a foundation model for segmentation, has shown an impressive ability for general semantic segmentation on natural images, while it suffers from the noise caused by the initial prompts. To address these problems, we propose a simple but effective weakly supervised framework, termed as 2AM, combining CAM and SAM for tumor tissue segmentation on pathological images. Our 2AM model is composed of three modules: (1) a CAM module for generating salient regions for tumor tissues on pathological images; (2) an adaptive point selection (APS) module for providing more reliable initial prompts for the subsequent SAM by designing three priors of basic appearance, space distribution, and feature difference; and (3) a SAM module for predicting the final segmentation. Experimental results on two independent datasets show that our proposed method boosts tumor segmentation accuracy by nearly 25% compared with the baseline method, and achieves more than 15% improvement compared with previous state-of-the-art segmentation methods with WSSS settings. Full article
(This article belongs to the Special Issue AI-Driven Medical Image/Video Processing)
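
One way to picture the CAM-to-SAM hand-off is to choose a few well-separated, high-activation points from the CAM heatmap as positive point prompts; the threshold, spacing rule, and function name below are illustrative and do not reproduce the paper's appearance, space-distribution, and feature-difference priors.

    # Pick spatially spread high-activation CAM points as positive prompts.
    import numpy as np

    def select_point_prompts(cam, k=5, thresh=0.6, min_dist=32):
        """Return up to k (x, y) points with strong CAM activation that are at
        least min_dist pixels apart, for use as point prompts to a promptable
        segmenter such as SAM."""
        ys, xs = np.where(cam >= thresh * cam.max())
        order = np.argsort(cam[ys, xs])[::-1]        # strongest activations first
        points = []
        for i in order:
            p = np.array([xs[i], ys[i]])
            if all(np.linalg.norm(p - q) >= min_dist for q in points):
                points.append(p)
            if len(points) == k:
                break
        return np.array(points)

    cam = np.zeros((256, 256))
    cam[60:120, 60:120] = np.random.rand(60, 60)     # toy activation blob
    prompts = select_point_prompts(cam)
    print(prompts.shape)
    # In SAM's predictor these would typically be passed as point_coords
    # together with all-positive point_labels.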

24 pages, 2119 KB  
Article
Multimodal Medical Image Fusion Using a Progressive Parallel Strategy Based on Deep Learning
by Peng Peng and Yaohua Luo
Electronics 2025, 14(11), 2266; https://doi.org/10.3390/electronics14112266 - 31 May 2025
Cited by 3 | Viewed by 5467
Abstract
Multimodal medical image fusion plays a critical role in enhancing diagnostic accuracy by integrating complementary information from different imaging modalities. However, existing methods often suffer from issues such as unbalanced feature fusion, structural blurring, loss of fine details, and limited global semantic modeling, particularly in low signal-to-noise modalities like PET. To address these challenges, we propose PPMF-Net, a novel progressive and parallel deep learning framework for PET–MRI image fusion. The network employs a hierarchical multi-path architecture to capture local details, global semantics, and high-frequency information in a coordinated manner. Specifically, it integrates three key modules: (1) a Dynamic Edge-Enhanced Module (DEEM) utilizing inverted residual blocks and channel attention to sharpen edge and texture features, (2) a Nonlinear Interactive Feature Extraction module (NIFE) that combines convolutional operations with element-wise multiplication to enable cross-modal feature coupling, and (3) a Transformer-Enhanced Global Modeling module (TEGM) with hybrid local–global attention to improve long-range dependency and structural consistency. A multi-objective unsupervised loss function is designed to jointly optimize structural fidelity, functional complementarity, and detail clarity. Experimental results on the Harvard MIF dataset demonstrate that PPMF-Net outperforms state-of-the-art methods across multiple metrics—achieving SF: 38.27, SD: 96.55, SCD: 1.62, and MS-SSIM: 1.14—and shows strong generalization and robustness in tasks such as SPECT–MRI and CT–MRI fusion, indicating its promising potential for clinical applications. Full article
(This article belongs to the Special Issue AI-Driven Medical Image/Video Processing)
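
A toy version of a parallel multi-path fusion block, with a detail path, an element-wise interaction path, and a 1x1 fusion layer over the concatenated paths; channel sizes and module names are assumptions, and none of the paper's DEEM, NIFE, or TEGM internals are reproduced.

    # Three-path fusion of PET and MRI feature maps (illustrative only).
    import torch
    import torch.nn as nn

    class ParallelFusionBlock(nn.Module):
        def __init__(self, ch=32):
            super().__init__()
            self.detail = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
            self.proj_a = nn.Conv2d(ch, ch, 1)
            self.proj_b = nn.Conv2d(ch, ch, 1)
            self.fuse = nn.Conv2d(3 * ch, ch, 1)

        def forward(self, feat_pet, feat_mri):
            detail = self.detail(feat_mri)                              # edge/texture path
            interact = self.proj_a(feat_pet) * self.proj_b(feat_mri)   # multiplicative coupling
            return self.fuse(torch.cat([feat_pet, detail, interact], dim=1))

    pet = torch.randn(1, 32, 64, 64)
    mri = torch.randn(1, 32, 64, 64)
    print(ParallelFusionBlock()(pet, mri).shape)      # torch.Size([1, 32, 64, 64])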

22 pages, 2133 KB  
Article
Classification of Whole-Slide Pathology Images Based on State Space Models and Graph Neural Networks
by Feng Ding, Chengfei Cai, Jun Li, Mingxin Liu, Yiping Jiao, Zhengcan Wu and Jun Xu
Electronics 2025, 14(10), 2056; https://doi.org/10.3390/electronics14102056 - 19 May 2025
Cited by 2 | Viewed by 2316
Abstract
Whole-slide images (WSIs) pose significant analytical challenges due to their large data scale and complexity. Multiple instance learning (MIL) has emerged as an effective solution for WSI classification, but existing frameworks often lack flexibility in feature integration and underutilize sequential information. To address these limitations, this work proposes a novel MIL framework: Dynamic Graph and State Space Model-Based MIL (DG-SSM-MIL). DG-SSM-MIL combines graph neural networks and selective state space models, leveraging the former’s ability to extract local and spatial features and the latter’s advantage in comprehensively understanding long-sequence instances. This enhances the model’s performance in diverse instance classification, improves its capability to handle long-sequence data, and increases the precision and scalability of feature fusion. We propose the Dynamic Graph and State Space Model (DynGraph-SSM) module, which aggregates local and spatial information of image patches through directed graphs and learns global feature representations using the Mamba model. Additionally, the directed graph structure alleviates the unidirectional scanning limitation of Mamba and enhances its ability to process pathological images with dispersed lesion distributions. DG-SSM-MIL demonstrates superior performance in classification tasks compared to other models. We validate the effectiveness of the proposed method on features extracted from two pretrained models across four public medical image datasets: BRACS, TCGA-NSCLC, TCGA-RCC, and CAMELYON16. Experimental results demonstrate that DG-SSM-MIL consistently outperforms existing MIL methods across four public datasets. For example, when using ResNet-50 features, our model achieves the highest AUCs of 0.936, 0.785, 0.879, and 0.957 on TCGA-NSCLC, BRACS, CAMELYON16, and TCGA-RCC, respectively. Similarly, with UNI features, DG-SSM-MIL reaches AUCs of 0.968, 0.846, 0.993, and 0.990, surpassing all baselines. These results confirm the effectiveness and generalizability of our approach in diverse WSI classification tasks. Full article
(This article belongs to the Special Issue AI-Driven Medical Image/Video Processing)
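
To make the graph-plus-sequence MIL idea concrete, the sketch below averages each patch feature over its k nearest spatial neighbors, runs a GRU as a stand-in for the selective state-space (Mamba) block, and attention-pools to a slide-level prediction; every component and dimension here is a simplified placeholder.

    # Toy MIL head: k-NN neighbor averaging + sequence model + attention pooling.
    import torch
    import torch.nn as nn

    class ToyGraphSeqMIL(nn.Module):
        def __init__(self, dim=256, n_classes=2, k=4):
            super().__init__()
            self.k = k
            self.seq = nn.GRU(dim, dim, batch_first=True)   # stand-in for Mamba/SSM
            self.attn = nn.Linear(dim, 1)
            self.cls = nn.Linear(dim, n_classes)

        def forward(self, feats, coords):
            # feats: (N, dim) patch features; coords: (N, 2) patch positions on the slide.
            d = torch.cdist(coords, coords)                     # pairwise patch distances
            nbrs = d.topk(self.k + 1, largest=False).indices    # each patch + k neighbors
            feats = feats[nbrs].mean(dim=1)                     # graph-style aggregation
            h, _ = self.seq(feats.unsqueeze(0))                 # (1, N, dim)
            w = torch.softmax(self.attn(h), dim=1)              # attention over patches
            return self.cls((w * h).sum(dim=1))                 # slide-level logits

    feats = torch.randn(100, 256)                 # e.g. frozen-backbone patch features
    coords = torch.rand(100, 2) * 1000            # patch coordinates
    print(ToyGraphSeqMIL()(feats, coords).shape)  # torch.Size([1, 2])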
