Topic Editors

Prof. Dr. Jyh-Cheng Chen
Department of Biomedical Imaging and Radiological Sciences, National Yang-Ming Chiao-Tung University, Taipei 112, Taiwan
Prof. Dr. Kuangyu Shi
Department of Nuclear Medicine, Bern University Hospital, University of Bern, 3010 Bern, Switzerland

Applications of Image and Video Processing in Medical Imaging

Abstract submission deadline: closed (28 February 2026)
Manuscript submission deadline: 30 April 2026
Viewed by 28269

Topic Information

Dear Colleagues,

We invite submissions of original research articles and reviews on topics such as image reconstruction, enhancement, anomaly detection, segmentation, motion correction, modelling, and computer-aided diagnosis. Emphasis is placed on the role of artificial intelligence, machine learning, and deep learning in improving the safety, practicality, and efficacy of medical imaging in clinical applications.

This Topic seeks interdisciplinary contributions spanning areas such as radiology, nuclear medicine, ultrasound, interventional imaging, and telemedicine. We welcome work presenting novel algorithms and new applications, as well as work addressing challenges in big data handling, privacy, and security.

Prof. Dr. Jyh-Cheng Chen
Prof. Dr. Kuangyu Shi
Topic Editors

Keywords

  • image and video processing
  • medical imaging
  • artificial intelligence
  • machine learning
  • deep learning

Participating Journals

Journal (abbreviation) | Impact Factor | CiteScore | Launched | First Decision (median) | APC
Applied Sciences (applsci) | 2.5 | 5.5 | 2011 | 16 days | CHF 2400
Big Data and Cognitive Computing (BDCC) | 4.4 | 9.8 | 2017 | 23.1 days | CHF 1800
Electronics (electronics) | 2.6 | 6.1 | 2012 | 16.4 days | CHF 2400
Information (information) | 2.9 | 6.5 | 2010 | 20.9 days | CHF 1800
Journal of Imaging (jimaging) | 3.3 | 6.7 | 2015 | 18 days | CHF 1800
Machine Learning and Knowledge Extraction (make) | 6.0 | 9.9 | 2019 | 27 days | CHF 1800
Signals (signals) | 2.6 | 4.6 | 2020 | 21.8 days | CHF 1200

Preprints.org is a multidisciplinary platform offering a preprint service designed to facilitate the early sharing of your research. It supports and empowers your research journey from the very beginning.

MDPI Topics is collaborating with Preprints.org and has established a direct connection between MDPI journals and the platform. Authors are encouraged to take advantage of this opportunity by posting their preprints at Preprints.org prior to publication:

  1. Share your research immediately: disseminate your ideas prior to publication and establish priority for your work.
  2. Safeguard your intellectual contribution: protect your ideas with a time-stamped preprint that serves as proof of your research timeline.
  3. Boost visibility and impact: increase the reach and influence of your research by making it accessible to a global audience.
  4. Gain early feedback: receive valuable input and insights from peers before submitting to a journal.
  5. Ensure broad indexing: preprints are indexed by Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit, and Europe PMC.

Published Papers (13 papers)

24 pages, 4292 KB  
Article
KhayyamNet: A Parallel Multiscale Feature Fusion Framework for Accurate Diagnosis of Multiple Sclerosis and Myelitis
by Mahshid Dehghanpour, Mansoor Fateh, Zeynab Mohammadpoory and Saideh Ferdowsi
Mach. Learn. Knowl. Extr. 2026, 8(3), 62; https://doi.org/10.3390/make8030062 - 5 Mar 2026
Viewed by 118
Abstract
Multiple Sclerosis (MS) and Myelitis are serious inflammatory spinal cord disorders with overlapping clinical symptoms and radiological characteristics, making accurate differentiation challenging yet clinically essential. Early and precise diagnosis is critical for guiding treatment strategies and improving patient outcomes. In this study, we propose KhayyamNet, a novel hybrid deep learning architecture designed to fuse complementary local and global representations for the accurate diagnosis of MS and Myelitis using spinal MRI. To improve robustness and generalization capability, a comprehensive preprocessing strategy including data augmentation and intensity normalization is also applied to reduce noise and address data variability. The proposed architecture combines three complementary feature extractors: Xception for high-level semantic features, Convolutional Neural Networks (CNNs) for fine-grained local patterns, and Vision Transformers (ViTs) for global contextual representations via attention mechanisms. Extracted features are then fused and refined using the Minimum Redundancy Maximum Relevance (MRMR) algorithm to eliminate redundancy and retain the most informative signals. Finally, a Random Forest (RF) classifier uses the optimized feature set to differentiate between MS, Myelitis, and control spinal MRIs. Experimental results demonstrate that KhayyamNet outperforms existing methods, achieving an average classification accuracy of 98.15 ± 0.80%. While these findings highlight the potential of KhayyamNet for automated MRI interpretation, its evaluation is limited to a single-center dataset, and further validation on external multi-center data is required.
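
The fuse-select-classify stage described in this abstract can be sketched compactly. Below is a minimal illustration on synthetic stand-in features, using a greedy correlation-based approximation of mRMR; the dimensions, selection criterion, and forest size are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a fuse -> mRMR-select -> Random Forest pipeline on synthetic
# stand-ins for Xception / CNN / ViT features. Illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 300
f_xception = rng.normal(size=(n, 64))   # high-level semantic features
f_cnn = rng.normal(size=(n, 32))        # fine-grained local patterns
f_vit = rng.normal(size=(n, 48))        # global contextual features
y = rng.integers(0, 3, size=n)          # MS / Myelitis / control labels

fused = np.hstack([f_xception, f_cnn, f_vit])

def mrmr(X, y, k):
    """Greedy mRMR: maximize |corr(feature, label)| minus the mean
    |correlation| with already-selected features."""
    rel = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    C = np.abs(np.corrcoef(X.T))          # feature-feature |correlation|
    selected = [int(np.argmax(rel))]
    while len(selected) < k:
        red = C[:, selected].mean(axis=1) # redundancy vs. selected set
        score = rel - red
        score[selected] = -np.inf         # never re-pick a feature
        selected.append(int(np.argmax(score)))
    return selected

idx = mrmr(fused, y, k=40)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, fused[:, idx], y, cv=5).mean())
```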

20 pages, 1379 KB  
Article
Hybrid Vision Transformer–CNN Framework for Alzheimer’s Disease Cell Type Classification: A Comparative Study with Vision–Language Models
by Md Easin Hasan, Md Tahmid Hasan Fuad, Omar Sharif and Amy Wagler
J. Imaging 2026, 12(3), 98; https://doi.org/10.3390/jimaging12030098 - 25 Feb 2026
Viewed by 235
Abstract
Accurate identification of Alzheimer’s disease (AD)-related cellular characteristics from microscopy images is essential for understanding neurodegenerative mechanisms at the cellular level. While most computational approaches focus on macroscopic neuroimaging modalities, cell type classification from microscopy remains relatively underexplored. In this study, we propose a hybrid vision transformer–convolutional neural network (ViT–CNN) framework that integrates DeiT-Small and EfficientNet-B7 to classify three AD-related cell types—astrocytes, cortical neurons, and SH-SY5Y neuroblastoma cells—from phase-contrast microscopy images. We perform a comparative evaluation against conventional CNN architectures (DenseNet, ResNet, InceptionNet, and MobileNet) and prompt-based multimodal vision–language models (GPT-5, GPT-4o, and Gemini 2.5-Flash) using zero-shot, few-shot, and chain-of-thought prompting. Experiments conducted with stratified fivefold cross-validation show that the proposed hybrid model achieves a test accuracy of 61.03% and a macro F1 score of 61.85, outperforming standalone CNN baselines and prompt-only LLM approaches under data-limited conditions. These results suggest that combining convolutional inductive biases with transformer-based global context modeling can improve generalization for cellular microscopy classification. While constrained by dataset size and scope, this work serves as a proof of concept and highlights promising directions for future research in domain-specific pretraining, multimodal data integration, and explainable AI for AD-related cellular analysis.
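
A toy version of the two-branch idea: a small CNN stands in for EfficientNet-B7 and a small transformer encoder over patch embeddings stands in for DeiT-Small, with pooled features concatenated before a shared classification head. All layer sizes are illustrative assumptions, not the paper's architecture.

```python
# Toy hybrid ViT-CNN classifier: local CNN features concatenated with
# globally pooled transformer features. Illustrative stand-in only.
import torch
import torch.nn as nn

class HybridViTCNN(nn.Module):
    def __init__(self, num_classes=3, dim=64):
        super().__init__()
        # CNN branch: local texture features from phase-contrast images.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # ViT branch: 16x16 patch embedding -> transformer -> mean pool.
        self.patch = nn.Conv2d(1, dim, kernel_size=16, stride=16)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(2 * dim, num_classes)

    def forward(self, x):
        local = self.cnn(x)                                # (B, dim)
        tokens = self.patch(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        globl = self.encoder(tokens).mean(dim=1)           # (B, dim)
        return self.head(torch.cat([local, globl], dim=1))

logits = HybridViTCNN()(torch.randn(2, 1, 224, 224))
print(logits.shape)  # torch.Size([2, 3])
```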

22 pages, 7547 KB  
Article
AuraViT-FL: A Resource-Efficient 2D Hybrid Transformer Framework for Federated Lung Tumor Segmentation
by Mohamed A. Abdelhamed, Hana M. Nassef, Sara Abdelnasser, Sahar Selim and Lobna A. Said
Mach. Learn. Knowl. Extr. 2026, 8(2), 34; https://doi.org/10.3390/make8020034 - 3 Feb 2026
Viewed by 284
Abstract
Accurate lung tumor segmentation using computed tomography (CT) scans is needed for efficient tumor treatment. However, the development of deep learning models is often constrained by strict patient privacy regulations that limit direct data sharing. This work presents a system that enables multi-institutional collaboration while training high-quality lung tumor segmentation models without requiring access to sensitive patient data. The proposed framework features the AuraViT suite, which includes the standard AuraViT—a hybrid model with 136 million parameters that combines a Vision Transformer (ViT) encoder, Atrous Spatial Pyramid Pooling (ASPP), and attention-gated residual connections—and the Lightweight AuraViT (LAURA) family (Small, Tiny, and Mobile). These variants are designed for resource-constrained environments and potential edge deployment scenarios. Training is conducted on publicly available datasets (MSD Lung and NSCLC) in a simulated five-client federated learning setup that emulates collaboration among institutions while ensuring patient privacy. The framework uses a federated learning setup with FedProx, adaptive weighted aggregation, and a dynamic virtual client strategy to handle data and system differences. The framework is further evaluated through ablation studies on model architecture and feature importance. The results show that the standard AuraViT-FL achieves a global mean Dice score of 80.81%, while maintaining performance close to centralized training. Additionally, the LAURA variants show a better trade-off between accuracy and efficiency. Notably, the Mobile variant with ∼5 M parameters reduces model complexity by over 96% while maintaining competitive performance (82.96% Dice on MSD Lung).
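
The FedProx objective used by the framework can be illustrated in a few lines: each client minimizes its task loss plus a proximal term (mu/2)·||w − w_global||² that limits drift from the global model. The model, data, and mu below are placeholder assumptions, not the paper's configuration.

```python
# Minimal FedProx local update: task loss plus a proximal penalty that
# keeps client weights near the frozen global weights. Illustrative only.
import torch
import torch.nn as nn

def fedprox_local_step(model, global_params, batch, opt, mu=0.01):
    x, y = batch
    loss = nn.functional.cross_entropy(model(x), y)
    # Proximal regularizer against the global model's parameters.
    prox = sum(((w - wg) ** 2).sum()
               for w, wg in zip(model.parameters(), global_params))
    (loss + 0.5 * mu * prox).backward()
    opt.step()
    opt.zero_grad()
    return loss.item()

model = nn.Linear(10, 2)                      # placeholder client model
global_params = [p.detach().clone() for p in model.parameters()]
opt = torch.optim.SGD(model.parameters(), lr=0.1)
batch = (torch.randn(8, 10), torch.randint(0, 2, (8,)))
print(fedprox_local_step(model, global_params, batch, opt))
```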

17 pages, 3892 KB  
Article
Transformer-Driven Semi-Supervised Learning for Prostate Cancer Histopathology: A DINOv2–TransUNet Framework
by Rubina Akter Rabeya, Jeong-Wook Seo, Nam Hoon Cho, Hee-Cheol Kim and Heung-Kook Choi
Mach. Learn. Knowl. Extr. 2026, 8(2), 26; https://doi.org/10.3390/make8020026 - 23 Jan 2026
Viewed by 466
Abstract
Prostate cancer is diagnosed through a comprehensive study of histopathology slides, which takes time and requires professional interpretation. To reduce this load, we developed a semi-supervised learning technique that combines transformer-based representation learning with a custom TransUNet classifier. To capture a wide range of morphological structures without manual annotation, our method pretrains DINOv2 on 10,000 unlabeled prostate tissue patches. A bespoke CNN-based decoder then takes the transformer-derived features and uses residual upsampling and carefully constructed skip connections to merge information from multiple spatial scales. Expert pathologists annotated only 20% of the patches in the whole dataset; the remaining unlabeled samples contributed through a consistency-driven learning method that promotes reliable predictions across varied augmentations. The model achieved precision and recall scores of 91.81% and 89.02%, respectively, and an accuracy of 93.78% on an additional test set. These results exceed the performance of a conventional U-Net and a baseline encoder–decoder network. Overall, the combination of localized CNN (Convolutional Neural Network) decoding and global transformer attention provides a reliable method for prostate cancer classification in situations with little annotated data.
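
A minimal sketch of a consistency-driven objective of the kind this abstract describes: confident predictions under one augmentation serve as pseudo-labels for another. The confidence threshold and the noise-based "augmentations" here are illustrative assumptions, not the paper's method.

```python
# Sketch of a consistency loss for unlabeled patches: pseudo-label from a
# weak view, train the strong view against it where confidence is high.
import torch
import torch.nn.functional as F

def consistency_loss(model, x_unlabeled, tau=0.95):
    weak = x_unlabeled + 0.01 * torch.randn_like(x_unlabeled)
    strong = x_unlabeled + 0.10 * torch.randn_like(x_unlabeled)
    with torch.no_grad():
        probs = F.softmax(model(weak), dim=1)
        conf, pseudo = probs.max(dim=1)
    mask = (conf >= tau).float()          # keep only confident predictions
    loss = F.cross_entropy(model(strong), pseudo, reduction="none")
    return (mask * loss).mean()

model = torch.nn.Linear(32, 2)            # placeholder classifier
print(consistency_loss(model, torch.randn(16, 32)).item())
```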

20 pages, 11517 KB  
Article
M3-TransUNet: Medical Image Segmentation Based on Spatial Prior Attention and Multi-Scale Gating
by Zhigao Zeng, Jiale Xiao, Shengqiu Yi, Qiang Liu and Yanhui Zhu
J. Imaging 2026, 12(1), 15; https://doi.org/10.3390/jimaging12010015 - 29 Dec 2025
Viewed by 825
Abstract
Medical image segmentation presents substantial challenges arising from the diverse scales and morphological complexities of target anatomical structures. Although existing Transformer-based models excel at capturing global dependencies, they encounter critical bottlenecks in multi-scale feature representation, spatial relationship modeling, and cross-layer feature fusion. To address these limitations, we propose the M3-TransUNet architecture, which incorporates three key innovations: (1) MSGA (Multi-Scale Gate Attention) and MSSA (Multi-Scale Selective Attention) modules to enhance multi-scale feature representation; (2) ME-MSA (Manhattan Enhanced Multi-Head Self-Attention) to integrate spatial priors into self-attention computations, thereby overcoming spatial modeling deficiencies; and (3) MKGAG (Multi-kernel Gated Attention Gate) to optimize skip connections by precisely filtering noise and preserving boundary details. Extensive experiments on public datasets—including Synapse, CVC-ClinicDB, and ISIC—demonstrate that M3-TransUNet achieves state-of-the-art performance. Specifically, on the Synapse dataset, our model outperforms recent TransUNet variants such as J-CAPA, improving the average DSC to 82.79% (compared to 82.29%) and significantly reducing the average HD95 from 19.74 mm to 10.21 mm.
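
The spatial-prior idea behind ME-MSA can be illustrated by biasing attention logits with the Manhattan distance between patch positions, so nearby patches attend more strongly. This is a simplified illustrative variant, not the paper's exact formulation.

```python
# Sketch of distance-biased self-attention: subtract a scaled Manhattan
# distance between patch positions from the attention logits.
import torch
import torch.nn.functional as F

def manhattan_bias(h, w):
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    pos = torch.stack([ys.flatten(), xs.flatten()], dim=1).float()  # (N, 2)
    return torch.cdist(pos, pos, p=1)     # (N, N) Manhattan distances

def biased_attention(q, k, v, dist, slope=0.1):
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores - slope * dist        # spatial prior on the logits
    return F.softmax(scores, dim=-1) @ v

h = w = 7                                 # 7x7 grid of patch tokens
q = k = v = torch.randn(1, h * w, 32)
out = biased_attention(q, k, v, manhattan_bias(h, w))
print(out.shape)  # torch.Size([1, 49, 32])
```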

12 pages, 529 KB  
Article
Long-Term Prognostic Value in Nuclear Cardiology: Expert Scoring Combined with Automated Measurements vs. Angiographic Score
by George Angelidis, Stavroula Giannakou, Varvara Valotassiou, Emmanouil Panagiotidis, Ioannis Tsougos, Chara Tzavara, Dimitrios Psimadas, Evdoxia Theodorou, Charalampos Ziangas, John Skoularigis, Filippos Triposkiadis and Panagiotis Georgoulias
J. Imaging 2026, 12(1), 6; https://doi.org/10.3390/jimaging12010006 - 25 Dec 2025
Viewed by 355
Abstract
The evaluation of myocardial perfusion imaging (MPI) studies is based on the visual interpretation of the reconstructed images, while the measurements obtained through software packages may contribute to the investigation, mainly in cases of ambiguous scintigraphic findings. We aimed to investigate the long-term prognostic value of expert reading of Summed Stress Score (SSS), Summed Rest Score (SRS), and Summed Difference Score (SDS), combined with the automated measurements of these parameters, in comparison to the prognostic ability of the angiographic score for soft and hard cardiac events. The study was conducted at the Nuclear Medicine Laboratory of the University of Thessaly, in Larissa, Greece. Overall, 378 consecutive patients with known or suspected coronary artery disease (CAD) were enrolled. Automated measurements of SSS, SRS, and SDS were obtained using the Emory Cardiac Toolbox, Myovation, and Quantitative Perfusion SPECT software packages. Coronary angiographies were scored according to a four-point scoring system (angiographic score). Follow-up data were recorded via phone contact, as well as through review of hospital records. All participants were followed up for at least 36 months. Soft and hard cardiac events were recorded in 31.7% and 11.6% of the sample, respectively, while any cardiac event was recorded in 36.5%. For hard cardiac events, the prognostic value of expert scoring, combined with the prognostic value of the automated measurements, was significantly greater compared to the prognostic ability of the angiographic score (p < 0.001). For any cardiac event, the prognostic value of expert scoring, combined with the prognostic value of the automated analyses, was significantly greater compared to the prognostic ability of the angiographic score (p < 0.001). According to our results, in patients with known or suspected CAD, the combination of expert reading and automated measurements of SSS, SRS, and SDS shows superior prognostic ability in comparison to the angiographic score.
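
For readers unfamiliar with the summed scores: in the standard 17-segment model, each segment is graded from 0 (normal) to 4 (absent uptake) at stress and at rest, SSS and SRS are the respective sums, and SDS = SSS − SRS indexes reversible ischemia. A toy computation follows; the segment values are made up.

```python
# Toy computation of SSS, SRS, and SDS from 17-segment perfusion scores.
import numpy as np

rng = np.random.default_rng(1)
stress = rng.integers(0, 5, size=17)   # 17-segment model, scores 0-4
# Rest defects are at most as severe as stress defects in this toy example.
rest = np.minimum(stress, rng.integers(0, 5, size=17))

sss = int(stress.sum())                # Summed Stress Score
srs = int(rest.sum())                  # Summed Rest Score
sds = sss - srs                        # Summed Difference Score
print(f"SSS={sss}, SRS={srs}, SDS={sds}")
```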

24 pages, 2447 KB  
Article
Augmented Gait Classification: Integrating YOLO, CNN–SNN Hybridization, and GAN Synthesis for Knee Osteoarthritis and Parkinson’s Disease
by Houmem Slimi, Ala Balti, Mounir Sayadi and Mohamed Moncef Ben Khelifa
Signals 2025, 6(4), 64; https://doi.org/10.3390/signals6040064 - 7 Nov 2025
Cited by 1 | Viewed by 1227
Abstract
We propose a novel hybrid deep learning framework that synergistically integrates Convolutional Neural Networks (CNNs), Spiking Neural Networks (SNNs), and Generative Adversarial Networks (GANs) for robust and accurate classification of high-resolution frontal and sagittal human gait video sequences—capturing both lower-limb kinematics and upper-body posture—from subjects with Knee Osteoarthritis (KOA), Parkinson’s Disease (PD), and healthy Normal (NM) controls, classified into three disease-type categories. Our approach first employs a tailored CNN backbone to extract rich spatial features from fixed-length clips (e.g., 16 frames resized to 128 × 128 px), which are then temporally encoded and processed by an SNN layer to capture dynamic gait patterns. To address class imbalance and enhance generalization, a conditional GAN augments rare severity classes with realistic synthetic gait sequences. Evaluated on the controlled, marker-based KOA-PD-NM laboratory public dataset, our model achieves an overall accuracy of 99.47%, a sensitivity of 98.4%, a specificity of 99.0%, and an F1-score of 98.6%, outperforming baseline CNN, SNN, and CNN–SNN configurations by over 2.5% in accuracy and 3.1% in F1-score. Ablation studies confirm that GAN-based augmentation yields a 1.9% accuracy gain, while the SNN layer provides critical temporal robustness. Our findings demonstrate that this CNN–SNN–GAN paradigm offers a powerful, computationally efficient solution for high-precision, gait-based disease classification, achieving a 48.4% reduction in FLOPs (1.82 GFLOPs to 0.94 GFLOPs) and 9.2% lower average power consumption (68.4 W to 62.1 W) on Kaggle P100 GPU compared to CNN-only baselines. The hybrid model demonstrates significant potential for energy savings on neuromorphic hardware, with an estimated 13.2% reduction in energy per inference based on FLOP-based analysis, positioning it favorably for deployment in resource-constrained clinical environments and edge computing scenarios.
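
The temporal encoding step can be illustrated with a toy leaky integrate-and-fire (LIF) layer in plain PyTorch: per-frame CNN features are integrated into a membrane potential that emits spikes across the clip. The decay and threshold values are illustrative assumptions rather than the paper's settings.

```python
# Toy LIF layer: accumulate per-frame features into a membrane potential
# and emit spikes over the clip. Illustrative, not the paper's SNN.
import torch

def lif_over_clip(features, beta=0.9, threshold=1.0):
    """features: (T, B, D) per-frame feature vectors for a gait clip."""
    mem = torch.zeros_like(features[0])
    spikes = []
    for frame in features:               # temporal encoding, frame by frame
        mem = beta * mem + frame         # leaky integration
        spk = (mem >= threshold).float() # fire where threshold is crossed
        mem = mem - spk * threshold      # soft reset after a spike
        spikes.append(spk)
    return torch.stack(spikes)           # (T, B, D) spike trains

clip_feats = torch.rand(16, 2, 64)       # 16 frames, batch of 2
spikes = lif_over_clip(clip_feats)
print(spikes.mean().item())              # overall firing rate
```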

21 pages, 4354 KB  
Article
Exploring the Application and Characteristics of Homomorphic Encryption Based on Pixel Scrambling Algorithm in Image Processing
by Tieyu Zhao
Big Data Cogn. Comput. 2025, 9(10), 250; https://doi.org/10.3390/bdcc9100250 - 30 Sep 2025
Viewed by 1062
Abstract
Homomorphic encryption is well known to researchers, yet its application in image processing is scarce. The diversity of image processing algorithms makes homomorphic encryption difficult to implement. Current research often uses the CKKS algorithm, but it faces core bottlenecks in image encryption, such as the mismatch between image data and the homomorphic operation mechanism, high costs induced by the 2D structure of images, noise-related damage to visual quality, and poor support for nonlinear operations. This study, based on image pixel characteristics, analyzes homomorphic encryption via pixel scrambling algorithms. Using magic square, Arnold, Henon map, and Hilbert curve transformations as starting points, it reveals their homomorphic properties in image processing. It further explores the homomorphic encryption properties of general pixel scrambling algorithms, offering valuable insights for applying homomorphic encryption in image processing.
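
The core property the paper builds on is easy to demonstrate: a pixel scrambling is a permutation of pixel positions, so it commutes with any pixel-wise operation f, i.e. scramble(f(img)) = f(scramble(img)). The random permutation below is a stand-in for the magic square, Arnold, Henon, or Hilbert scramblings.

```python
# Demonstration that pixel scrambling (a permutation) commutes with any
# pixel-wise operation, the homomorphic property discussed above.
import numpy as np

rng = np.random.default_rng(2)
img = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)

perm = rng.permutation(img.size)         # the "encryption" key

def scramble(a):
    return a.flatten()[perm].reshape(a.shape)

def f(a):                                # any pixel-wise operation
    return np.clip(a.astype(int) + 50, 0, 255).astype(np.uint8)

assert np.array_equal(scramble(f(img)), f(scramble(img)))
print("pixel-wise ops commute with scrambling")
```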

27 pages, 57533 KB  
Article
Assessing the Influence of Feedback Strategies on Errors in Crowdsourced Annotation of Tumor Images
by Jose Alejandro Libreros, Edwin Gamboa, Erik Henke and Matthias Hirth
Big Data Cogn. Comput. 2025, 9(9), 220; https://doi.org/10.3390/bdcc9090220 - 26 Aug 2025
Viewed by 1300
Abstract
Crowdsourcing enables the scalable acquisition of distributed human intelligence for tasks that require human judgment. However, the crowdworkers completing these tasks may have limited or no background knowledge about them, given the wide variety of tasks on offer. Even micro-scale tasks therefore need to include appropriate training that enables crowdworkers to complete them successfully. Yet training crowdworkers efficiently, in a short time, for complex tasks remains an unresolved challenge. This paper addresses it by empirically comparing different training strategies for crowdworkers and evaluating their impact on task results. We compare a basic training strategy, a strategy based on previous errors made by other crowdworkers, and the addition of instant feedback during training and task completion. Our results show that adding instant feedback during both the training phase and the task itself yields more attention from workers on difficult tasks, and hence reduces errors and improves results. We conclude that attention is better retained when the instant feedback includes information about mistakes previously made by other crowdworkers.

21 pages, 4400 KB  
Article
BFLE-Net: Boundary Feature Learning and Enhancement Network for Medical Image Segmentation
by Jiale Fan, Liping Liu and Xinyang Yu
Electronics 2025, 14(15), 3054; https://doi.org/10.3390/electronics14153054 - 30 Jul 2025
Viewed by 1310
Abstract
Multi-organ medical image segmentation is essential for accurate clinical diagnosis, effective treatment planning, and reliable prognosis, yet it remains challenging due to complex backgrounds, irrelevant noise, unclear organ boundaries, and wide variations in organ size. To address these challenges, we propose the Boundary Feature Learning and Enhancement Network (BFLE-Net). The model integrates a dedicated boundary learning module with an auxiliary loss function to strengthen the semantic correlations between boundary pixels and regional features, thus reducing category mis-segmentation. Additionally, channel and positional compound attention mechanisms selectively filter features and minimize background interference. To further enhance multi-scale representation, a dynamic scale-aware context module dynamically selects and fuses multi-scale features, significantly improving the model's adaptability. The model achieves average Dice similarity coefficients of 81.67% on the Synapse dataset and 90.55% on the ACDC dataset, outperforming state-of-the-art methods. By emphasizing boundary accuracy, noise reduction, and multi-scale adaptability, the network significantly improves segmentation, supporting clinical diagnostics and treatment planning.
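
One common way to realize a boundary-focused auxiliary loss, sketched here on binary masks: extract a boundary band from the ground truth via morphological dilation minus erosion (both implemented with max-pooling) and up-weight the per-pixel loss there. The band width and weight are illustrative assumptions, not BFLE-Net's exact loss.

```python
# Sketch of a boundary-weighted auxiliary segmentation loss.
import torch
import torch.nn.functional as F

def boundary_map(mask, k=3):
    """mask: (B, 1, H, W) binary ground truth; returns 1 on the boundary."""
    dilated = F.max_pool2d(mask, k, stride=1, padding=k // 2)
    eroded = -F.max_pool2d(-mask, k, stride=1, padding=k // 2)
    return dilated - eroded               # morphological gradient

def boundary_weighted_bce(logits, mask, w=4.0):
    per_pixel = F.binary_cross_entropy_with_logits(
        logits, mask, reduction="none")
    weight = 1.0 + w * boundary_map(mask) # emphasize boundary pixels
    return (weight * per_pixel).mean()

mask = (torch.rand(2, 1, 64, 64) > 0.5).float()
logits = torch.randn(2, 1, 64, 64)
print(boundary_weighted_bce(logits, mask).item())
```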

49 pages, 5692 KB  
Review
Artificial Intelligence-Empowered Embryo Selection for IVF Applications: A Methodological Review
by Lazaros Moysis, Lazaros Alexios Iliadis, George Vergos, Sotirios P. Sotiroudis, Achilles D. Boursianis, Achilleas Papatheodorou, Konstantinos-Iraklis D. Kokkinidis, Mohammad Abdul Matin, Panagiotis Sarigiannidis, Ilias Siniosoglou, Vasileios Argyriou and Sotirios K. Goudos
Mach. Learn. Knowl. Extr. 2025, 7(2), 56; https://doi.org/10.3390/make7020056 - 16 Jun 2025
Cited by 3 | Viewed by 12847
Abstract
In vitro fertilization (IVF) is a well-established and efficient assisted reproductive technology (ART). However, it requires a series of costly and non-trivial procedures, and the success rate still needs improvement. Thus, increasing the success rate, simplifying the process, and reducing costs are all essential challenges of IVF. These can be addressed by integrating artificial intelligence techniques, like deep learning (DL), with several aspects of the IVF process. DL techniques can help extract important features from the data, support decision making, and perform several other tasks, as architectures can be adapted to different problems. The emergence of AI in the medical field has seen a rise in DL-supported tools for embryo selection. In this work, recent advances in the use of AI and DL-based embryo selection for IVF are reviewed. The different architectures that have been considered so far for each task are presented. Furthermore, future challenges for artificial intelligence-based ARTs are outlined.

24 pages, 1224 KB  
Article
MDFormer: Transformer-Based Multimodal Fusion for Robust Chest Disease Diagnosis
by Xinlong Liu, Fei Pan, Hainan Song, Siyi Cao, Chunping Li and Tanshi Li
Electronics 2025, 14(10), 1926; https://doi.org/10.3390/electronics14101926 - 9 May 2025
Cited by 2 | Viewed by 3797
Abstract
The growing richness of medical images and clinical data provides abundant support for multimodal chest disease diagnosis. However, traditional multimodal fusion methods are often relatively simple, leaving crossmodal complementary information underexploited. At the same time, existing multimodal chest disease diagnosis methods usually focus on two modalities and scale poorly to three or more. Moreover, in practical clinical scenarios, missing-modality problems often arise due to equipment limitations or incomplete data acquisition. To address these issues, this paper proposes a novel multimodal chest disease classification model, MDFormer. The model designs a crossmodal attention fusion mechanism, MFAttention, and combines it with the Transformer architecture to construct a multimodal fusion module, MFTrans, which effectively integrates medical imaging, clinical text, and vital-sign data. When extended to multiple modalities, MFTrans significantly reduces model parameters. This paper also proposes a two-stage masked enhancement classification and contrastive learning training framework, MECCL, which significantly improves the model's robustness and transferability. Experimental results show that MDFormer achieves a classification precision of 0.8 on the MIMIC dataset, and when 50% of the modality data are missing, its AUC reaches 85% of that obtained with complete data, outperforming models trained without the two-stage framework.
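
The crossmodal attention idea can be sketched with a single attention layer in which image tokens query the concatenated clinical-text and vital-sign tokens, letting each modality borrow complementary evidence. Token counts and dimensions are illustrative assumptions; MFAttention and MFTrans are not reproduced here.

```python
# Sketch of crossmodal attention fusion across three modalities.
import torch
import torch.nn as nn

dim = 64
img = torch.randn(2, 49, dim)      # image patch tokens
txt = torch.randn(2, 20, dim)      # clinical text tokens
vitals = torch.randn(2, 6, dim)    # vital-sign tokens

cross = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
context = torch.cat([txt, vitals], dim=1)   # keys/values from other modalities
fused, _ = cross(query=img, key=context, value=context)
print(fused.shape)                 # torch.Size([2, 49, 64])
```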

25 pages, 6904 KB  
Article
A Weighted Facial Expression Analysis for Pain Level Estimation
by Parkpoom Chaisiriprasert and Nattapat Patchsuwan
J. Imaging 2025, 11(5), 151; https://doi.org/10.3390/jimaging11050151 - 9 May 2025
Cited by 1 | Viewed by 2682
Abstract
Accurate assessment of pain intensity is critical, particularly for patients who are unable to verbally express their discomfort. This study proposes a novel weighted analytical framework that integrates facial expression analysis through action units (AUs) with a facial feature-based weighting mechanism to enhance the estimation of pain intensity. The proposed method was evaluated on a dataset comprising 4084 facial images from 25 individuals and demonstrated an average accuracy of 92.72% using the weighted pain level estimation model, in contrast to 83.37% achieved using conventional approaches. The observed improvements are primarily attributed to the strategic utilization of AU zones and expression-based weighting, which enable more precise differentiation between pain-related and non-pain-related facial movements. These findings underscore the efficacy of the proposed model in enhancing the accuracy and reliability of automated pain detection, especially in contexts where verbal communication is impaired or absent.
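
A minimal sketch of AU-based weighted scoring: intensities of pain-relevant action units are combined with per-zone weights and binned into pain levels. The AU set is the commonly used pain-related group (brow lowering, orbit tightening, levator contraction, eye closure); the weights and bin edges are illustrative assumptions, not the paper's values.

```python
# Toy weighted AU scoring for pain level estimation.
import numpy as np

au_intensity = {"AU4": 2.5, "AU6": 1.8, "AU7": 2.0,
                "AU9": 0.5, "AU10": 0.8, "AU43": 1.0}   # 0-5 intensities
zone_weight = {"AU4": 0.25, "AU6": 0.20, "AU7": 0.20,
               "AU9": 0.15, "AU10": 0.10, "AU43": 0.10} # assumed weights

score = sum(zone_weight[au] * v for au, v in au_intensity.items())
levels = np.array([0.5, 1.5, 2.5, 3.5])      # bin edges -> levels 0-4
print("pain level:", int(np.digitize(score, levels)))
```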
