Multimodal Artificial Intelligence in Healthcare

A special issue of AI (ISSN 2673-2688). This special issue belongs to the section "Medical & Healthcare AI".

Deadline for manuscript submissions: 30 September 2025 | Viewed by 2548

Special Issue Editors

Division of Computer Science and Mathematics, University of Stirling, Stirling FK9 4LA, UK
Interests: artificial intelligence (AI); deep learning; generative AI; medical AI; machine learning; medical imaging; speaker recognition; image processing

Guest Editor
Department of Computer Science, Munster Technological University, Cork, Ireland
Interests: artificial intelligence; natural language processing; AI and NLP applications in smart cities

Special Issue Information

Dear Colleagues,

Artificial intelligence (AI) is rapidly transforming healthcare, with applications in disease diagnosis, prognosis, and healthcare data analytics. AI can help physicians manage their workload more efficiently and reduce the time spent on data analysis, and it can support better-informed decisions by incorporating multimodal healthcare data. However, the majority of AI methods still rely on single-modality data. Many recent studies have demonstrated that multimodal data tend to enhance the predictive performance of AI models in medical imaging, e.g., by leveraging the diverse features available in different modalities. Multimodal AI models that fuse modalities early (at the data level) or late (at the inference level) can therefore exploit complex features efficiently, resulting in better decisions. Exploring new methods for combining different data types also often leads to new protocols and strategies for multimodal data collection, cleaning, pre-processing, and integration.
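
As a rough illustration of the early/late fusion distinction mentioned above, the following minimal sketch contrasts the two on toy numbers. The function names, feature values, and equal weights are assumptions made for illustration only; they are not taken from any paper in this Special Issue.

```python
import numpy as np

def early_fusion(image_feats, text_feats):
    """Early (data-level) fusion: concatenate per-patient feature vectors
    so that a single downstream model sees both modalities at once."""
    return np.concatenate([image_feats, text_feats], axis=1)

def late_fusion(image_probs, text_probs, weights=(0.5, 0.5)):
    """Late (inference-level) fusion: combine the predicted probabilities
    of two separately trained models with a weighted average."""
    w_img, w_txt = weights
    return w_img * image_probs + w_txt * text_probs

# Hypothetical toy data for three patients (values are made up).
image_feats = np.array([[0.2, 0.7], [0.9, 0.1], [0.4, 0.4]])
text_feats = np.array([[1.0], [0.0], [0.5]])
fused = early_fusion(image_feats, text_feats)  # one row per patient, 3 columns

# Hypothetical per-modality model outputs (probability of disease).
image_probs = np.array([0.8, 0.3, 0.6])
text_probs = np.array([0.6, 0.1, 0.2])
combined = late_fusion(image_probs, text_probs)
```

Early fusion lets one model learn interactions between modalities, whereas late fusion keeps the per-modality models independent, which can be easier to handle when one modality is missing for some patients.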

This Special Issue aims to cover recent advancements in multimodal AI for healthcare applications. Its focus is on new methods and applications of AI in healthcare that incorporate multiple modalities of data, such as images, text, and electronic health records. This research topic will benefit both AI and clinical researchers looking for new developments in multimodal AI methods in the medical data domain.

Dr. Hazrat Ali
Dr. Kashif Ahmad
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts are available on the Instructions for Authors page. AI is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • medical AI
  • generative AI
  • multimodal AI
  • medical imaging
  • electronic health record
  • radiology
  • clinical notes
  • data analytics

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (3 papers)

Research

22 pages, 1390 KiB  
Article
Emotion-Aware Embedding Fusion in Large Language Models (Flan-T5, Llama 2, DeepSeek-R1, and ChatGPT 4) for Intelligent Response Generation
by Abdur Rasool, Muhammad Irfan Shahzad, Hafsa Aslam, Vincent Chan and Muhammad Ali Arshad
AI 2025, 6(3), 56; https://doi.org/10.3390/ai6030056 - 13 Mar 2025
Viewed by 150
Abstract
Empathetic and coherent responses are critical in automated chatbot-facilitated psychotherapy. This study addresses the challenge of enhancing the emotional and contextual understanding of large language models (LLMs) in psychiatric applications. We introduce Emotion-Aware Embedding Fusion, a novel framework integrating hierarchical fusion and attention mechanisms to prioritize semantic and emotional features in therapy transcripts. Our approach combines multiple emotion lexicons, including NRC Emotion Lexicon, VADER, WordNet, and SentiWordNet, with state-of-the-art LLMs such as Flan-T5, Llama 2, DeepSeek-R1, and ChatGPT 4. Therapy session transcripts, comprising over 2000 samples, are segmented into hierarchical levels (word, sentence, and session) using neural networks, while hierarchical fusion combines these features with pooling techniques to refine emotional representations. Attention mechanisms, including multi-head self-attention and cross-attention, further prioritize emotional and contextual features, enabling the temporal modeling of emotional shifts across sessions. The processed embeddings, computed using BERT, GPT-3, and RoBERTa, are stored in the Facebook AI similarity search vector database, which enables efficient similarity search and clustering across dense vector spaces. Upon user queries, relevant segments are retrieved and provided as context to LLMs, enhancing their ability to generate empathetic and contextually relevant responses. The proposed framework is evaluated across multiple practical use cases to demonstrate real-world applicability, including AI-driven therapy chatbots. The system can be integrated into existing mental health platforms to generate personalized responses based on retrieved therapy session data. 
The experimental results show that our framework enhances empathy, coherence, informativeness, and fluency, surpassing baseline models while improving LLMs’ emotional intelligence and contextual adaptability for psychotherapy. Full article
(This article belongs to the Special Issue Multimodal Artificial Intelligence in Healthcare)
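
The retrieval step described in the abstract above (embed session segments, find the ones most similar to a user query, and pass them to an LLM as context) can be sketched roughly as follows. This is a minimal sketch under stated assumptions: it uses brute-force cosine similarity in NumPy as a stand-in for the vector database the authors report using, and the function name `top_k_cosine`, the three-dimensional embeddings, and the segment labels are all hypothetical placeholders, not the authors' implementation or data.

```python
import numpy as np

def top_k_cosine(query, segment_embeddings, k=2):
    """Return the indices (best first) and scores of the k segments whose
    embeddings have the highest cosine similarity with the query."""
    q = query / np.linalg.norm(query)
    s = segment_embeddings / np.linalg.norm(
        segment_embeddings, axis=1, keepdims=True
    )
    scores = s @ q                      # cosine similarity per segment
    order = np.argsort(-scores)[:k]     # highest similarity first
    return order.tolist(), scores[order].tolist()

# Hypothetical placeholders: real embeddings would come from a language model.
segments = ["segment A (work stress)", "segment B (sleep)", "segment C (deadlines)"]
embeddings = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 0.2, 0.9],
    [0.8, 0.3, 0.1],
])
query_embedding = np.array([1.0, 0.2, 0.0])

indices, scores = top_k_cosine(query_embedding, embeddings, k=2)

# The retrieved segments are joined into a context string for the LLM prompt.
context = "\n".join(segments[i] for i in indices)
prompt = f"Context:\n{context}\n\nUser query goes here."
```

A dedicated similarity-search library would replace the brute-force scoring once the number of stored segments grows; the ranking logic stays the same.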

34 pages, 4483 KiB  
Article
A Fused Multi-Channel Prediction Model of Pressure Injury for Adult Hospitalized Patients—The “EADB” Model
by Eba’a Dasan Barghouthi, Amani Yousef Owda, Majdi Owda and Mohammad Asia
AI 2025, 6(2), 39; https://doi.org/10.3390/ai6020039 - 18 Feb 2025
Viewed by 339
Abstract
Background: Pressure injuries (PIs) are increasing worldwide, and there has been no significant improvement in preventing them. Traditional assessment tools are widely used to identify a patient at risk of developing a PI. This study aims to construct a novel fused multi-channel prediction model of PIs in adult hospitalized patients using machine learning algorithms (MLAs). Methods: A multi-phase quantitative approach involving a case–control experimental design was used. A first-hand dataset was collected retrospectively between March/2022 and August/2023 from the electronic medical records of three hospitals in Palestine. Results: The total number of patients was 49,500. A balanced dataset was utilized with a total number of 1110 patients (80% training and 20% testing). The models that were developed utilized eight MLAs, including linear regression and support vector regression (SVR), logistic regression (LR), random forest (RF), gradient boosting (GB), K-nearest neighbor (KNN), decision tree (DT), and extreme gradient boosting (XG boosting) and validated with five-fold cross-validation techniques. The best model was RF, for which the accuracy was 0.962, precision was 0.942, recall was 0.922, F1 was 0.931, area under curve (AUC) was 0.922, false positive rate (FPR) was 0.155, and true positive rate (TPR) was 0.782. Conclusions: The predictive factors were age, moisture, activity, length of stay (LOS), systolic blood pressure (BP), and albumin. A novel fused multi-channel prediction model of pressure injury was developed from different datasets. Full article
(This article belongs to the Special Issue Multimodal Artificial Intelligence in Healthcare)

24 pages, 7886 KiB  
Article
AdaptiveSwin-CNN: Adaptive Swin-CNN Framework with Self-Attention Fusion for Robust Multi-Class Retinal Disease Diagnosis
by Imran Qureshi
AI 2025, 6(2), 28; https://doi.org/10.3390/ai6020028 - 6 Feb 2025
Viewed by 845
Abstract
Retinal diseases account for a large fraction of global blinding disorders, requiring sophisticated diagnostic tools for early management. In this study, the author proposes a hybrid deep learning framework in the form of AdaptiveSwin-CNN that combines Swin Transformers and Convolutional Neural Networks (CNNs) for the classification of multi-class retinal diseases. In contrast to traditional architectures, AdaptiveSwin-CNN utilizes a brand-new Self-Attention Fusion Module (SAFM) to effectively combine multi-scale spatial and contextual options to alleviate class imbalance and give attention to refined retina lesions. Utilizing the adaptive baseline augmentation and dataset-driven preprocessing of input images, the AdaptiveSwin-CNN model resolves the problem of the variability of fundus images in the dataset. AdaptiveSwin-CNN achieved a mean accuracy of 98.89%, sensitivity of 95.2%, specificity of 96.7%, and F1-score of 97.2% on RFMiD and ODIR benchmarks, outperforming other solutions. An additional lightweight ensemble XGBoost classifier to reduce overfitting and increase interpretability also increased diagnostic accuracy. The results highlight AdaptiveSwin-CNN as a robust and computationally efficient decision-support system. Full article
(This article belongs to the Special Issue Multimodal Artificial Intelligence in Healthcare)
