AI, Volume 6, Issue 9 (September 2025) – 34 articles

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • PDF is the official format for papers, which are published in both HTML and PDF forms. To view a paper in PDF format, click its "PDF Full-text" link and open the file with the free Adobe Reader.
15 pages, 4633 KB  
Article
GLNet-YOLO: Multimodal Feature Fusion for Pedestrian Detection
by Yi Zhang, Qing Zhao, Xurui Xie, Yang Shen, Jinhe Ran, Shu Gui, Haiyan Zhang, Xiuhe Li and Zhen Zhang
AI 2025, 6(9), 229; https://doi.org/10.3390/ai6090229 - 12 Sep 2025
Abstract
In the field of modern computer vision, pedestrian detection technology holds significant importance in applications such as intelligent surveillance, autonomous driving, and robot navigation. However, single-modal images struggle to achieve high-precision detection in complex environments. To address this, this study proposes a GLNet-YOLO framework based on cross-modal deep feature fusion, aiming to improve pedestrian detection performance in complex environments by fusing feature information from visible light and infrared images. By extending the YOLOv11 architecture, the framework adopts a dual-branch network structure to process visible light and infrared modal inputs, respectively, and introduces the FM module to realize global feature fusion and enhancement, as well as the DMR module to accomplish local feature separation and interaction. Experimental results show that on the LLVIP dataset, compared to the single-modal YOLOv11 baseline, our fused model improves the mAP@50 by 9.2% over the visible-light-only model and 0.7% over the infrared-only model. This significantly improves the detection accuracy under low-light and complex background conditions and enhances the robustness of the algorithm, and its effectiveness is further verified on the KAIST dataset. Full article
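The FM and DMR modules are specific to this paper, but the underlying cross-modal idea can be illustrated generically. The sketch below fuses a visible-light and an infrared feature vector with a sigmoid gate; here the gate is a fixed function of the inputs, whereas in the actual network it would be produced by trained layers, so treat this only as a conceptual stand-in:

```python
import math

def gated_fusion(vis_feat, ir_feat):
    """Fuse visible-light and infrared feature vectors with a sigmoid gate.
    The gate value decides, per dimension, how much each modality contributes."""
    assert len(vis_feat) == len(ir_feat)
    fused = []
    for v, r in zip(vis_feat, ir_feat):
        gate = 1.0 / (1.0 + math.exp(-(v + r)))   # sigmoid gate in (0, 1)
        fused.append(gate * v + (1.0 - gate) * r)  # convex combination
    return fused

feats = gated_fusion([0.5, -1.0, 2.0], [1.5, 0.0, -0.5])
```

Because the gate yields a convex combination, each fused value stays between the two modality responses, which is one simple way a network can trade off modalities per feature dimension.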
33 pages, 5048 KB  
Article
Beyond DOM: Unlocking Web Page Structure from Source Code with Neural Networks
by Irfan Prazina, Damir Pozderac and Vensada Okanović
AI 2025, 6(9), 228; https://doi.org/10.3390/ai6090228 - 12 Sep 2025
Abstract
We introduce a code-only approach for modeling web page layouts directly from their source code (HTML and CSS only), bypassing rendering. Our method employs a neural architecture with specialized encoders for style rules, CSS selectors, and HTML attributes. These encodings are then aggregated in another neural network that integrates hierarchical context (sibling and ancestor information) to form rich representational vectors for each web page’s element. Using these vectors, our model predicts eight spatial relationships between pairs of elements, focusing on edge-based proximity in a multilabel classification setup. For scalable training, labels are automatically derived from the Document Object Model (DOM) data for each web page, but the model operates independently of the DOM during inference. During inference, the model does not use bounding boxes or any information found in the DOM; instead, it relies solely on the source code as input. This approach facilitates structure-aware visual analysis in a lightweight and fully code-based way. Our model demonstrates alignment with human judgment in the evaluation of web page similarity, suggesting that code-only layout modeling offers a promising direction for scalable, interpretable, and efficient web interface analysis. The evaluation metrics show our method yields similar performance despite relying on less information. Full article
19 pages, 325 KB  
Review
Artificial Intelligence in Medical Education: A Narrative Review on Implementation, Evaluation, and Methodological Challenges
by Annalisa Roveta, Luigi Mario Castello, Costanza Massarino, Alessia Francese, Francesca Ugo and Antonio Maconi
AI 2025, 6(9), 227; https://doi.org/10.3390/ai6090227 - 11 Sep 2025
Abstract
Artificial Intelligence (AI) is rapidly transforming medical education by enabling adaptive tutoring, interactive simulation, diagnostic enhancement, and competency-based assessment. This narrative review explores how AI has influenced learning processes in undergraduate and postgraduate medical training, focusing on methodological rigor, educational impact, and implementation challenges. The literature reveals promising results: large language models can generate didactic content and foster academic writing; AI-driven simulations enhance decision-making, procedural skills, and interprofessional communication; and deep learning systems improve diagnostic accuracy in visually intensive tasks such as radiology and histology. Despite promising findings, the existing literature is methodologically heterogeneous. A minority of studies use controlled designs, while the majority focus on short-term effects or are confined to small, simulated cohorts. Critical limitations include algorithmic opacity, generalizability concerns, ethical risks (e.g., GDPR compliance, data bias), and infrastructural barriers, especially in low-resource contexts. Additionally, the unregulated use of AI may undermine critical thinking, foster cognitive outsourcing, and compromise pedagogical depth if not properly supervised. In conclusion, AI holds substantial potential to enhance medical education, but its integration requires methodological robustness, human oversight, and ethical safeguards. Future research should prioritize multicenter validation, longitudinal evaluation, and AI literacy for learners and educators to ensure responsible and sustainable adoption. Full article
(This article belongs to the Special Issue Exploring the Use of Artificial Intelligence in Education)
29 pages, 651 KB  
Systematic Review
Retrieval-Augmented Generation (RAG) in Healthcare: A Comprehensive Review
by Fnu Neha, Deepshikha Bhati and Deepak Kumar Shukla
AI 2025, 6(9), 226; https://doi.org/10.3390/ai6090226 - 11 Sep 2025
Abstract
Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by integrating external knowledge retrieval to improve factual consistency and reduce hallucinations. Despite growing interest, its use in healthcare remains fragmented. This paper presents a Systematic Literature Review (SLR) following PRISMA guidelines, synthesizing 30 peer-reviewed studies on RAG in clinical domains, focusing on three of its most prevalent and promising applications in diagnostic support, electronic health record (EHR) summarization, and medical question answering. We synthesize the existing architectural variants (naïve, advanced, and modular) and examine their deployment across these applications. Persistent challenges are identified, including retrieval noise (irrelevant or low-quality retrieved information), domain shift (performance degradation when models are applied to data distributions different from their training set), generation latency, and limited explainability. Evaluation strategies are compared using both standard metrics and clinical-specific metrics, FactScore, RadGraph-F1, and MED-F1, which are particularly critical for ensuring factual accuracy, medical validity, and clinical relevance. This synthesis offers a domain-focused perspective to guide researchers, healthcare providers, and policymakers in developing reliable, interpretable, and clinically aligned AI systems, laying the groundwork for future innovation in RAG-based healthcare solutions. Full article
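The retrieve-then-generate pattern the review surveys can be shown in miniature. The sketch below retrieves the record with the largest token overlap and composes a template answer grounded in it; real clinical systems use dense retrievers and an LLM generator, and the record texts here are invented:

```python
# Minimal RAG loop: retrieve the most relevant note, then ground the answer
# in it. Corpus texts and ids are illustrative, not real clinical data.
notes = {
    "ehr_001": "patient reports chest pain relieved by rest",
    "ehr_002": "metformin prescribed for type 2 diabetes",
    "ehr_003": "mri shows no acute intracranial abnormality",
}

def retrieve(query, corpus):
    """Return the document id with the largest token overlap with the query."""
    q = set(query.lower().split())
    return max(corpus, key=lambda doc_id: len(q & set(corpus[doc_id].split())))

def generate(query, corpus):
    """'Generate' an answer grounded in the retrieved passage (template stub)."""
    doc_id = retrieve(query, corpus)
    return f"Based on {doc_id}: {corpus[doc_id]}"

answer = generate("what was prescribed for diabetes", notes)
```

Grounding the output in a retrieved passage is exactly what makes factuality auditable: the cited `doc_id` lets a reviewer check the source, which the plain LLM alternative cannot offer.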
45 pages, 2364 KB  
Systematic Review
Advances and Optimization Trends in Photovoltaic Systems: A Systematic Review
by Luis Angel Iturralde Carrera, Gendry Alfonso-Francia, Carlos D. Constantino-Robles, Juan Terven, Edgar A. Chávez-Urbiola and Juvenal Rodríguez-Reséndiz
AI 2025, 6(9), 225; https://doi.org/10.3390/ai6090225 - 10 Sep 2025
Abstract
This article presents a systematic review of optimization methods applied to enhance the performance of photovoltaic (PV) systems, with a focus on critical challenges such as system design and spatial layout, maximum power point tracking (MPPT), energy forecasting, fault diagnosis, and energy management. The emphasis is on the integration of classical and algorithmic approaches. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines (PRISMA) methodology, 314 relevant publications from 2020 to 2025 were analyzed to identify current trends, methodological advances, and practical applications in the optimization of PV performance. The principal novelty of this review lies in its integrative critical analysis, which systematically contrasts the applicability, performance, and limitations of deterministic classical methods with emerging stochastic metaheuristic and data-driven artificial intelligence (AI) techniques, highlighting the growing dominance of hybrid models that synergize their strengths. Traditional techniques such as analytical modeling, numerical simulation, linear and dynamic programming, and gradient-based methods are examined in terms of their efficiency and scope. In parallel, the study evaluates the growing adoption of metaheuristic algorithms, including particle swarm optimization, genetic algorithms, and ant colony optimization, as well as machine learning (ML) and deep learning (DL) models applied to tasks such as MPPT, spatial layout optimization, energy forecasting, and fault diagnosis. A key contribution of this review is the identification of hybrid methodologies that combine metaheuristics with ML/DL models, demonstrating superior results in energy yield, robustness, and adaptability under dynamic conditions. The analysis highlights both the strengths and limitations of each paradigm, emphasizing challenges related to data availability, computational cost, and model interpretability. 
Finally, the study proposes future research directions focused on explainable AI, real-time control via edge computing, and the development of standardized benchmarks for performance evaluation. The findings contribute to a deeper understanding of current capabilities and opportunities in PV system optimization, offering a strategic framework for advancing intelligent and sustainable solar energy technologies. Full article
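To make the metaheuristic side of the review concrete, here is a minimal particle swarm search for the maximum power point of a toy P–V curve. The panel model and the PSO hyperparameters are illustrative only; a real MPPT controller would track a measured, temperature- and irradiance-dependent curve:

```python
import random

def pv_power(v):
    """Toy photovoltaic P-V curve (not a real panel model): current falls
    linearly with voltage, so power peaks at v = 10 V with P = 40 W."""
    return v * max(0.0, 8.0 - 0.4 * v)

def pso_mppt(n_particles=20, iters=60, seed=1):
    """Particle swarm search for the maximum power point on [0, 20] V."""
    rng = random.Random(seed)
    pos = [rng.uniform(0.0, 20.0) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                    # per-particle best position
    g_best = max(pos, key=pv_power)      # swarm-wide best position
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (0.7 * vel[i]                          # inertia
                      + 1.5 * r1 * (best_pos[i] - pos[i])   # cognitive pull
                      + 1.5 * r2 * (g_best - pos[i]))       # social pull
            pos[i] = min(20.0, max(0.0, pos[i] + vel[i]))   # stay in range
            if pv_power(pos[i]) > pv_power(best_pos[i]):
                best_pos[i] = pos[i]
            if pv_power(pos[i]) > pv_power(g_best):
                g_best = pos[i]
    return g_best

v_mpp = pso_mppt()
```

On this smooth unimodal curve the swarm settles near the true optimum at 10 V; the review's point is that the same derivative-free search keeps working when partial shading makes the curve multimodal and gradient methods stall on local peaks.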
24 pages, 5198 KB  
Article
A Markerless Vision-Based Physical Frailty Assessment System for the Older Adults
by Muhammad Huzaifa, Wajiha Ali, Khawaja Fahad Iqbal, Ishtiaq Ahmad, Yasar Ayaz, Hira Taimur, Yoshihisa Shirayama and Motoyuki Yuasa
AI 2025, 6(9), 224; https://doi.org/10.3390/ai6090224 - 10 Sep 2025
Abstract
The geriatric syndrome known as frailty is characterized by diminished physiological reserves and heightened susceptibility to unfavorable health consequences. As the world’s population ages, it is crucial to detect frailty early and accurately in order to reduce hazards, including falls, hospitalization, and death. In particular, functional tests are frequently used to evaluate physical frailty. However, current evaluation techniques are limited in their scalability and are prone to inconsistency due to their heavy reliance on subjective interpretation and manual observation. In this paper, we provide a completely automated, impartial, and comprehensive frailty assessment system that employs computer vision techniques for assessing physical frailty tests. Machine learning models have been specifically designed to analyze each clinical test. In order to extract significant features, our system analyzes the depth and joint coordinate data for important physical performance tests such as the Walking Speed Test, Timed Up and Go (TUG) Test, Functional Reach Test, Seated Forward Bend Test, Standing on One Leg Test, and Grip Strength Test. The proposed system offers a comprehensive system with consistent measurements, intelligent decision-making, and real-time feedback, in contrast to current systems, which lack real-time analysis and standardization. Strong model accuracy and conformity to clinical benchmarks are demonstrated by the experimental outcomes. The proposed system can be considered a scalable and useful tool for frailty screening in clinical and distant care settings by eliminating observer dependency and improving accessibility. Full article
(This article belongs to the Special Issue Multimodal Artificial Intelligence in Healthcare)
15 pages, 889 KB  
Article
Transformer Models Enhance Explainable Risk Categorization of Incidents Compared to TF-IDF Baselines
by Carlos Ramon Hölzing, Patrick Meybohm, Oliver Happel, Peter Kranke and Charlotte Meynhardt
AI 2025, 6(9), 223; https://doi.org/10.3390/ai6090223 - 9 Sep 2025
Abstract
Background: Critical Incident Reporting Systems (CIRS) play a key role in improving patient safety but face limitations due to the unstructured nature of narrative data. Systematic analysis of such data to identify latent risk patterns remains challenging. While artificial intelligence (AI) shows promise in healthcare, its application to CIRS analysis is still underexplored. Methods: This study presents a transformer-based approach to classify incident reports into predefined risk categories and support clinical risk managers in identifying safety hazards. We compared a traditional TF-IDF/logistic regression model with a transformer-based German BERT (GBERT) model using 617 anonymized CIRS reports. Reports were categorized manually into four classes: Organization, Treatment, Documentation, and Consent/Communication. Models were evaluated using stratified 5-fold cross-validation. Interpretability was ensured via Shapley Additive Explanations (SHAP). Results: GBERT outperformed the baseline across all metrics, achieving a macro-averaged F1 of 0.44 and a weighted F1 of 0.75, versus 0.35 and 0.71 for the TF-IDF baseline. SHAP analysis revealed clinically plausible feature attributions. Conclusions: In summary, transformer-based models such as GBERT improve classification of incident report data and enable interpretable, systematic risk stratification. These findings highlight the potential of explainable AI to enhance learning from critical incidents. Full article
(This article belongs to the Special Issue Adversarial Learning and Its Applications in Healthcare)
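The TF-IDF baseline the study compares against is simple enough to write out. The sketch below computes TF-IDF weights from scratch for a tiny invented corpus; a production pipeline would use scikit-learn's vectorizer and feed the vectors to logistic regression:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF for a tiny corpus: term frequency within a document times the
    log inverse document frequency across the corpus. Rare, document-specific
    terms get the largest weights."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(t for toks in tokenized for t in set(toks))  # document freq.
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (tf[t] / len(toks)) * math.log(n / df[t])
                        for t in tf})
    return vectors

# Invented mini-reports, loosely in the spirit of incident narratives.
reports = [
    "wrong drug dose documented in chart",
    "consent form missing before procedure",
    "wrong patient chart opened",
]
vecs = tfidf_vectors(reports)
```

Note how "dose" (one document) outweighs "wrong" (two documents) in the first report: IDF down-weights terms shared across reports, which is precisely the signal a bag-of-words classifier gets — and precisely what it lacks compared with GBERT's contextual representations.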
25 pages, 4660 KB  
Article
Dual-Stream Former: A Dual-Branch Transformer Architecture for Visual Speech Recognition
by Sanghun Jeon, Jieun Lee and Yong-Ju Lee
AI 2025, 6(9), 222; https://doi.org/10.3390/ai6090222 - 9 Sep 2025
Abstract
This study proposes Dual-Stream Former, a novel architecture that integrates a Video Swin Transformer and Conformer designed to address the challenges of visual speech recognition (VSR). The model captures spatiotemporal dependencies, achieving a state-of-the-art character error rate (CER) of 3.46%, surpassing traditional convolutional neural network (CNN)-based models, such as 3D-CNN + DenseNet-121 (CER: 5.31%), and transformer-based alternatives, such as vision transformers (CER: 4.05%). The Video Swin Transformer captures multiscale spatial representations with high computational efficiency, whereas the Conformer back-end enhances temporal modeling across diverse phoneme categories. Evaluation of a high-resolution dataset comprising 740,000 utterances across 185 classes highlighted the effectiveness of the model in addressing visually confusing phonemes, such as diphthongs (/ai/, /au/) and labiodental sounds (/f/, /v/). Dual-Stream Former achieved phoneme recognition error rates of 10.39% for diphthongs and 9.25% for labiodental sounds, surpassing those of CNN-based architectures by more than 6%. Although the model’s large parameter count (168.6 M) poses resource challenges, its hierarchical design ensures scalability. Future work will explore lightweight adaptations and multimodal extensions to increase deployment feasibility. These findings underscore the transformative potential of Dual-Stream Former for advancing VSR applications such as silent communication and assistive technologies by achieving unparalleled precision and robustness in diverse settings. Full article
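The CER figures quoted above are edit-distance based. As a reference for how the metric is computed, here is a minimal implementation: Levenshtein distance between hypothesis and reference transcripts, normalized by reference length (the example strings are illustrative):

```python
def cer(reference, hypothesis):
    """Character error rate: Levenshtein edit distance between the strings,
    divided by the reference length. Uses the standard two-row DP table."""
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        cur = [i]
        for j, h in enumerate(hypothesis, start=1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,        # deletion
                           cur[j - 1] + 1,     # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1] / len(reference)

score = cer("lipreading", "lipreeding")  # one substitution over 10 chars
```

A CER of 3.46% thus means roughly 3.5 character edits per 100 reference characters; note the metric can exceed 1.0 when the hypothesis is much longer than the reference.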
18 pages, 495 KB  
Article
Optimizing NFL Draft Selections with Machine Learning Classification
by Akshaj Enaganti and George Pappas
AI 2025, 6(9), 221; https://doi.org/10.3390/ai6090221 - 9 Sep 2025
Abstract
The National Football League draft is one of the most important events in the creation of a successful franchise in professional American football. Selecting players as part of the draft process, however, is difficult, as a multitude of factors affect decisions to opt for one player over another; a few of these include collegiate statistics, team need and fit, and physical potential. In this paper, we utilize a machine learning approach, with various types of models, to optimize the NFL draft and, in turn, enhance team performances. We compare the selections made by the system to the real athletes selected, and assess which of the picks would have been more impactful for the respective franchise. The specific investigation allows for further research by altering the weighting of specific factors and their significance in this decision-making process to land on the ideal player based on what a specific team desires. Using artificial intelligence in this process can produce more consistent results than high-risk traditional methods. Our approach extends beyond a basic Random Forest classifier by simulating complete draft scenarios with player attributes and team needs weighted. This allows comparison of different draft strategies (best-player-available vs. need-based) and demonstrates improved prediction accuracy over conventional methods. Full article
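The idea of re-weighting factors to match a team's priorities can be sketched as a need-weighted score. Everything below — attribute names, weights, and prospects — is hypothetical, and stands in for the paper's full Random Forest pipeline only at the level of the weighting idea:

```python
def draft_score(player, need_weights):
    """Score a prospect as a need-weighted sum of normalized attributes.
    Attributes and weights here are invented for illustration."""
    return sum(need_weights.get(attr, 0.0) * value
               for attr, value in player["attrs"].items())

def best_pick(prospects, need_weights):
    """Need-based strategy: take the highest-scoring available prospect."""
    return max(prospects, key=lambda p: draft_score(p, need_weights))

prospects = [
    {"name": "QB A", "attrs": {"passing": 0.9, "speed": 0.6}},
    {"name": "WR B", "attrs": {"catching": 0.8, "speed": 0.9}},
]
needs = {"passing": 1.0, "speed": 0.2, "catching": 0.3}  # team needs a QB
pick = best_pick(prospects, needs)
```

Swapping the `needs` vector for uniform weights turns the same routine into a best-player-available strategy, which is exactly the comparison of draft strategies the paper simulates.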
17 pages, 4523 KB  
Article
Self-Emotion-Mediated Exploration in Artificial Intelligence Mirrors: Findings from Cognitive Psychology
by Gustavo Assuncao, Miguel Castelo-Branco and Paulo Menezes
AI 2025, 6(9), 220; https://doi.org/10.3390/ai6090220 - 9 Sep 2025
Abstract
Background: Exploration of the physical environment is an indispensable precursor to information acquisition and knowledge consolidation for living organisms. Yet, current artificial intelligence models lack these autonomy capabilities during training, hindering their adaptability. This work proposes a learning framework for artificial agents to obtain an intrinsic exploratory drive, based on epistemic and achievement emotions triggered during data observation. Methods: This study proposes a dual-module reinforcement framework, where data analysis scores dictate pride or surprise, in accordance with psychological studies on humans. A correlation between these states and exploration is then optimized for agents to meet their learning goals. Results: Causal relationships between states and exploration are demonstrated by the majority of agents. A 15.4% mean increase is noted for surprise, with a 2.8% mean decrease for pride. Resulting correlations of ρ_surprise = 0.461 and ρ_pride = 0.237 are obtained, mirroring previously reported human behavior. Conclusions: These findings lead to the conclusion that bio-inspiration for AI development can be of great use. This can incur benefits typically found in living beings, such as autonomy. Further, it empirically shows how AI methodologies can corroborate human behavioral findings, showcasing major interdisciplinary importance. Ramifications are discussed. Full article
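The ρ values reported above are correlation coefficients. For reference, the sample Pearson correlation between an emotion-state series and an exploration series would be computed as below (the toy data is invented):

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation coefficient: covariance of the two series
    divided by the product of their standard deviations, in [-1, 1]."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rho = pearson([1, 2, 3, 4], [2, 4, 6, 8])  # perfectly correlated toy data
```

A ρ of 0.461, as reported for surprise, indicates a moderate positive association between the emotion signal and exploration, not a deterministic link.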
20 pages, 2020 KB  
Article
MST-DGCN: Multi-Scale Temporal–Dynamic Graph Convolutional with Orthogonal Gate for Imbalanced Multi-Label ECG Arrhythmia Classification
by Jie Chen, Mingfeng Jiang, Xiaoyu He, Yang Li, Jucheng Zhang, Juan Li, Yongquan Wu and Wei Ke
AI 2025, 6(9), 219; https://doi.org/10.3390/ai6090219 - 8 Sep 2025
Abstract
Multi-label arrhythmia classification from 12-lead ECG signals is a challenging problem, involving spatiotemporal feature extraction, feature fusion, and class imbalance. To address these issues, a multi-scale temporal–dynamic graph convolutional with orthogonal gates method, termed MST-DGCN, is proposed for ECG arrhythmia classification. In this method, a temporal–dynamic graph convolution with dynamic adjacency matrices is used to learn spatiotemporal patterns jointly, and an orthogonal gated fusion mechanism is used to eliminate redundancy, so as to strengthen their complementarity and independence by dynamically adjusting the significance of features. Moreover, a multi-instance learning strategy is proposed to alleviate class imbalance by adjusting the proportion of minority arrhythmia samples through adaptive label allocation. After validation on the St Petersburg INCART dataset under stringent inter-patient settings, the experimental results show that the proposed MST-DGCN method achieves the best classification performance with an F1-score of 73.66% (+6.2% over prior baseline methods), with concurrent improvements in AUC (70.92%) and mAP (85.24%), while maintaining computational efficiency. Full article
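The notion of a dynamic adjacency matrix can be illustrated without the paper's learned components. In the heavily simplified sketch below, the adjacency is a row-softmax of pairwise feature similarities, and one propagation step mixes each node's features with those of similar nodes; the real MST-DGCN learns the adjacency and applies trained weight matrices:

```python
import math

def dynamic_adjacency(features):
    """Row-softmax of pairwise dot products: each node attends most to
    nodes with similar features. A stand-in for a learned adjacency."""
    n = len(features)
    adj = []
    for i in range(n):
        scores = [sum(a * b for a, b in zip(features[i], features[j]))
                  for j in range(n)]
        exps = [math.exp(s) for s in scores]
        total = sum(exps)
        adj.append([e / total for e in exps])   # each row sums to 1
    return adj

def graph_conv(features):
    """One propagation step: new node features are adjacency-weighted
    sums of neighbour features (no learned weight matrix in this sketch)."""
    adj = dynamic_adjacency(features)
    return [[sum(adj[i][j] * features[j][d] for j in range(len(features)))
             for d in range(len(features[0]))]
            for i in range(len(features))]

leads = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy per-lead ECG features
out = graph_conv(leads)
```

Because the adjacency is recomputed from the current features, lead-to-lead interactions adapt to the input — the property that distinguishes a dynamic graph from one with a fixed, anatomy-based adjacency.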
15 pages, 1304 KB  
Article
Conv-ScaleNet: A Multiscale Convolutional Model for Federated Human Activity Recognition
by Xian Wu Ting, Ying Han Pang, Zheng You Lim, Shih Yin Ooi and Fu San Hiew
AI 2025, 6(9), 218; https://doi.org/10.3390/ai6090218 - 8 Sep 2025
Abstract
Background: Artificial Intelligence (AI) techniques have been extensively deployed in sensor-based Human Activity Recognition (HAR) systems. Recent advances in deep learning, especially Convolutional Neural Networks (CNNs), have advanced HAR by enabling automatic feature extraction from raw sensor data. However, these models often struggle to capture multiscale patterns in human activity, limiting recognition accuracy. Additionally, traditional centralized learning approaches raise data privacy concerns, as personal sensor data must be transmitted to a central server, increasing the risk of privacy breaches. Methods: To address these challenges, this paper introduces Conv-ScaleNet, a CNN-based model designed for multiscale feature learning and compatibility with federated learning (FL) environments. Conv-ScaleNet integrates a Pyramid Pooling Module to extract both fine-grained and coarse-grained features and employs sequential Global Average Pooling layers to progressively capture abstract global representations from inertial sensor data. The model supports federated learning by training locally on user devices, sharing only model updates rather than raw data, thus preserving user privacy. Results: Experimental results demonstrate that the proposed Conv-ScaleNet achieves approximately 98% and 96% F1-scores on the WISDM and UCI-HAR datasets, respectively, confirming its competitiveness in FL environments for activity recognition. Conclusions: The proposed Conv-ScaleNet model addresses key limitations of existing HAR systems by combining multiscale feature learning with privacy-preserving training. Its strong performance, data protection capability, and adaptability to decentralized environments make it a robust and scalable solution for real-world HAR applications. Full article
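The server-side step of the federated setup described above — combining client updates without touching raw sensor data — is usually federated averaging. A minimal sketch, with flattened weight lists standing in for full model parameters:

```python
def fed_avg(client_weights, client_sizes):
    """Federated averaging: the server combines client model parameters
    weighted by local dataset size. Raw sensor data never leaves a device;
    only these parameter lists are transmitted."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [sum(w[k] * s for w, s in zip(client_weights, client_sizes)) / total
            for k in range(n_params)]

# Two clients with unequal data volumes; the larger client dominates.
global_w = fed_avg([[1.0, 2.0], [3.0, 4.0]], [100, 300])
```

Weighting by dataset size makes the aggregate equivalent to training on the pooled data under IID assumptions; the privacy benefit is that pooling never actually happens.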
23 pages, 904 KB  
Article
Unplugged Activities for Teaching Decision Trees to Secondary Students—A Case Study Analysis Using the SOLO Taxonomy
by Konstantinos Karapanos, Vassilis Komis, Georgios Fesakis, Konstantinos Lavidas, Stavroula Prantsoudi and Stamatios Papadakis
AI 2025, 6(9), 217; https://doi.org/10.3390/ai6090217 - 5 Sep 2025
Abstract
The integration of Artificial Intelligence (AI) technologies in students’ lives necessitates the systematic incorporation of foundational AI literacy into educational curricula. Students are challenged to develop conceptual understanding of computational frameworks such as Machine Learning (ML) algorithms and Decision Trees (DTs). In this context, unplugged (i.e., computer-free) pedagogical approaches have emerged as complementary to traditional coding-based instruction in AI education. This study examines the pedagogical effectiveness of an instructional intervention employing unplugged activities to facilitate conceptual understanding of DT algorithms among 47 9th-grade students within a Computer Science (CS) curriculum in Greece. The study employed a quasi-experimental design, utilizing the Structure of Observed Learning Outcomes (SOLO) taxonomy as the theoretical framework for assessing cognitive development and conceptual mastery of DT principles. Quantitative analysis of pre- and post-intervention assessments demonstrated statistically significant improvements in student performance across all evaluated SOLO taxonomy levels. The findings provide empirical support for the hypothesis that unplugged pedagogical interventions constitute an effective and efficient approach for introducing AI concepts to secondary education students. Based on these outcomes, the authors recommend the systematic implementation of developmentally appropriate unplugged instructional interventions for DTs and broader AI concepts across all educational levels, to optimize AI literacy acquisition. Full article
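The kind of decision tree such unplugged activities target is small enough to trace by hand. The example below encodes the classic "play outside?" tree as nested conditionals — the standard textbook illustration, not the study's actual classroom materials:

```python
def classify_weather(outlook, humidity, windy):
    """A hand-traceable decision tree: each if-branch is one internal node,
    each return is a leaf. Students can walk the same splits on paper."""
    if outlook == "sunny":
        return "play" if humidity == "normal" else "stay in"
    if outlook == "rainy":
        return "stay in" if windy else "play"
    return "play"  # overcast: always play

decision = classify_weather("sunny", "high", windy=False)
```

Tracing an input through the branches mirrors the unplugged exercise: the learner follows one root-to-leaf path per example, which is exactly how a trained DT classifies.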
45 pages, 990 KB  
Review
Large Language Models in Cybersecurity: A Survey of Applications, Vulnerabilities, and Defense Techniques
by Niveen O. Jaffal, Mohammed Alkhanafseh and David Mohaisen
AI 2025, 6(9), 216; https://doi.org/10.3390/ai6090216 - 5 Sep 2025
Abstract
Large Language Models (LLMs) are transforming cybersecurity by enabling intelligent, adaptive, and automated approaches to threat detection, vulnerability assessment, and incident response. With their advanced language understanding and contextual reasoning, LLMs surpass traditional methods in tackling challenges across domains such as the Internet of Things (IoT), blockchain, and hardware security. This survey provides a comprehensive overview of LLM applications in cybersecurity, focusing on two core areas: (1) the integration of LLMs into key cybersecurity domains, and (2) the vulnerabilities of LLMs themselves, along with mitigation strategies. By synthesizing recent advancements and identifying key limitations, this work offers practical insights and strategic recommendations for leveraging LLMs to build secure, scalable, and future-ready cyber defense systems. Full article
21 pages, 471 KB  
Review
Long Short-Term Memory Networks: A Comprehensive Survey
by Moez Krichen and Alaeddine Mihoub
AI 2025, 6(9), 215; https://doi.org/10.3390/ai6090215 - 5 Sep 2025
Abstract
Long Short-Term Memory (LSTM) networks have revolutionized the field of deep learning, particularly in applications that require the modeling of sequential data. Originally designed to overcome the limitations of traditional recurrent neural networks (RNNs), LSTMs effectively capture long-range dependencies in sequences, making them suitable for a wide array of tasks. This survey aims to provide a comprehensive overview of LSTM architectures, detailing their unique components, such as cell states and gating mechanisms, which facilitate the retention and modulation of information over time. We delve into the various applications of LSTMs across multiple domains, including natural language processing (NLP), where they are employed for language modeling, machine translation, and sentiment analysis; time series analysis, where they play a critical role in forecasting tasks; and speech recognition, where they significantly enhance the accuracy of automated systems. By examining these applications, we illustrate the versatility and robustness of LSTMs in handling complex data types. Additionally, we explore several notable variants and improvements of the standard LSTM architecture, such as Bidirectional LSTMs, which enhance context understanding, and Stacked LSTMs, which increase model capacity. We also discuss the integration of attention mechanisms with LSTMs, which has further advanced their performance in various tasks. Despite their strengths, LSTMs face several challenges, including high computational complexity, extensive data requirements, and difficulties in training, which can hinder their practical implementation. This survey addresses these limitations and provides insights into ongoing research aimed at mitigating these issues. In conclusion, we highlight recent advances in LSTM research and propose potential future directions that could lead to enhanced performance and broader applicability of LSTM networks.
This survey serves as a foundational resource for researchers and practitioners seeking to understand the current landscape of LSTM technology and its future trajectory. Full article
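The cell states and gating mechanisms highlighted in this abstract can be sketched as a single LSTM cell step. This is a minimal NumPy illustration, not any surveyed implementation; the weight shapes, gate ordering, and toy dimensions are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM cell step. W: (4H, D), U: (4H, H), b: (4H,).
    Assumed gate order in the stacked weights: input, forget, candidate, output."""
    z = W @ x + U @ h_prev + b
    H = h_prev.shape[0]
    i = sigmoid(z[0*H:1*H])   # input gate: how much new information enters
    f = sigmoid(z[1*H:2*H])   # forget gate: how much old cell state is kept
    g = np.tanh(z[2*H:3*H])   # candidate cell update
    o = sigmoid(z[3*H:4*H])   # output gate: how much state is exposed
    c = f * c_prev + i * g    # cell state carries long-range information
    h = o * np.tanh(c)        # hidden state passed to the next time step
    return h, c

rng = np.random.default_rng(0)
D, H = 3, 4
W, U, b = rng.normal(size=(4*H, D)), rng.normal(size=(4*H, H)), np.zeros(4*H)
h, c = np.zeros(H), np.zeros(H)
for t in range(5):            # unroll over a short synthetic sequence
    h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)
print(h.shape, c.shape)
```

Because the hidden state is an output gate times a tanh of the cell state, every entry of `h` stays strictly inside (-1, 1), while `c` itself is unbounded and can accumulate over time.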

26 pages, 3073 KB  
Article
From Detection to Decision: Transforming Cybersecurity with Deep Learning and Visual Analytics
by Saurabh Chavan and George Pappas
AI 2025, 6(9), 214; https://doi.org/10.3390/ai6090214 - 4 Sep 2025
Viewed by 334
Abstract
Objectives: The persistent evolution of software vulnerabilities—spanning novel zero-day exploits to logic-level flaws—continues to challenge conventional cybersecurity mechanisms. Static rule-based scanners and opaque deep learning models often lack the precision and contextual understanding required for both accurate detection and analyst interpretability. This paper presents a hybrid framework for real-time vulnerability detection that improves both robustness and explainability. Methods: The framework integrates semantic encoding via Bidirectional Encoder Representations from Transformers (BERT), structural analysis using Deep Graph Convolutional Neural Networks (DGCNNs), and lightweight prioritization through Kernel Extreme Learning Machines (KELMs). The architecture incorporates Minimum Intermediate Representation (MIR) learning to reduce false positives and fuses multi-modal data (source code, execution traces, textual metadata) for robust, scalable performance. Explainable Artificial Intelligence (XAI) visualizations—combining SHAP-based attributions and CVSS-aligned pair plots—serve as an analyst-facing interpretability layer. The framework is evaluated on benchmark datasets, including VulnDetect and the NIST Software Reference Library (NSRL, version 2024.12.1, used strictly as a benign baseline for false positive estimation). Results: Evaluation across precision, recall, AUPRC, MCC, and calibration (ECE/Brier score) demonstrated improved robustness and reduced false positives compared to baselines. An internal interpretability validation was conducted to align SHAP/GNNExplainer outputs with known vulnerability features; formal usability testing with practitioners is left as future work. Conclusions: Designed with DevSecOps integration in mind, the framework is packaged in containerized modules (Docker/Kubernetes) and outputs SIEM-compatible alerts, enabling potential compatibility with Splunk, GitLab CI/CD, and similar tools.
While full enterprise deployment was not performed, these deployment-oriented design choices support scalability and practical adoption. Full article
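The kernel extreme learning machine (KELM) used for lightweight prioritization above is often realized as a closed-form kernel ridge solution. The sketch below shows that formulation on invented toy data; the RBF kernel, the regularization constant C, and the synthetic "risk" labels are assumptions, not this paper's configuration:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise squared distances -> Gaussian (RBF) kernel matrix
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kelm_fit(X, y, C=10.0, gamma=1.0):
    # Closed-form output weights: beta = (K + I/C)^-1 y
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + np.eye(len(X)) / C, y)

def kelm_predict(X_train, beta, X_new, gamma=1.0):
    return rbf_kernel(X_new, X_train, gamma) @ beta

# Toy binary task: label = 1 when a synthetic "risk score" sum is positive
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X.sum(axis=1) > 0).astype(float)
beta = kelm_fit(X, y)
pred = (kelm_predict(X, beta, X) > 0.5).astype(float)
print("train accuracy:", (pred == y).mean())
```

The appeal for prioritization is that training is a single linear solve with no iterative optimization, which keeps the stage lightweight relative to the BERT/DGCNN components.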

50 pages, 2995 KB  
Review
A Survey of Traditional and Emerging Deep Learning Techniques for Non-Intrusive Load Monitoring
by Annysha Huzzat, Ahmed S. Khwaja, Ali A. Alnoman, Bhagawat Adhikari, Alagan Anpalagan and Isaac Woungang
AI 2025, 6(9), 213; https://doi.org/10.3390/ai6090213 - 3 Sep 2025
Viewed by 415
Abstract
To cope with the increasing global demand for energy and the significant energy wastage caused by the use of different home appliances, smart load monitoring is considered a promising solution to promote proper activation and scheduling of devices and reduce electricity bills. Instead of installing a sensing device on each electric appliance, non-intrusive load monitoring (NILM) enables the monitoring of each individual device using the total power reading of the home smart meter. However, high-accuracy load monitoring requires efficient artificial intelligence (AI) and deep learning (DL) approaches. To that end, this paper thoroughly reviews traditional AI and DL approaches, as well as emerging AI models proposed for NILM. Unlike existing surveys, which are usually limited to a specific approach or a subset of approaches, this review presents a comprehensive survey of an ensemble of topics and models, including deep learning, generative AI (GAI), emerging attention-enhanced GAI, and hybrid AI approaches. Another distinctive feature of this work compared to existing surveys is that it also reviews actual cases of NILM system design and implementation, covering a wide range of technical enablers including hardware, software, and AI models. Furthermore, a range of future research directions and challenges are discussed, such as the heterogeneity of energy sources, data uncertainty, privacy and safety, cost and complexity reduction, and the need for standardized comparison. Full article
(This article belongs to the Section AI Systems: Theory and Applications)
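The core NILM idea in this abstract — recovering per-appliance states from a single aggregate smart-meter reading — can be illustrated with a brute-force combinatorial disaggregator. The appliance names and wattages below are invented for illustration; real NILM systems use the learned models this survey reviews rather than exhaustive search:

```python
from itertools import product

# Hypothetical appliance power ratings in watts (illustrative values only)
appliances = {"fridge": 150, "kettle": 2000, "tv": 100, "heater": 1200}

def disaggregate(total_watts):
    """Pick the on/off assignment whose summed rating best matches the
    smart-meter total; exhaustive over 2^N appliance states."""
    names = list(appliances)
    best, best_err = None, float("inf")
    for states in product([0, 1], repeat=len(names)):
        est = sum(s * appliances[n] for s, n in zip(states, names))
        err = abs(total_watts - est)
        if err < best_err:
            best, best_err = states, err
    return {n: bool(s) for n, s in zip(names, best)}

print(disaggregate(2150))  # a 2150 W total is best explained by fridge + kettle
```

Exhaustive search scales exponentially in the number of appliances and ignores temporal context, which is precisely why the DL and GAI approaches surveyed here replace it in practice.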

38 pages, 13994 KB  
Article
Post-Heuristic Cancer Segmentation Refinement over MRI Images and Deep Learning Models
by Panagiotis Christakakis and Eftychios Protopapadakis
AI 2025, 6(9), 212; https://doi.org/10.3390/ai6090212 - 2 Sep 2025
Viewed by 587
Abstract
Lately, deep learning methods have greatly improved the accuracy of brain-tumor segmentation, yet slice-wise inconsistencies still limit reliable use in clinical practice. While volume-aware 3D convolutional networks achieve high accuracy, their memory footprint and inference time may limit clinical adoption. This study proposes a resource-conscious pipeline for lower-grade glioma delineation in axial FLAIR MRI that combines a 2D Attention U-Net with a guided post-processing refinement step. Two segmentation backbones, a vanilla U-Net and an Attention U-Net, are trained on 110 TCGA-LGG axial FLAIR patient volumes under various loss and activation functions. The Attention U-Net, optimized with Dice loss, delivers the strongest baseline, achieving a mean Intersection-over-Union (mIoU) of 0.857. To mitigate the slice-wise inconsistencies inherent to 2D models, a White-Area Overlap (WAO) voting mechanism quantifies the tumor footprint shared by neighboring slices. The WAO curve is smoothed with a Gaussian filter to locate its peak, after which a percentile-based heuristic selectively relabels the most ambiguous softmax pixels. Cohort-level analysis shows that removing merely 0.1–0.3% of ambiguous low-confidence pixels lifts the post-processing mIoU above the baseline while improving segmentation for two-thirds of patients. The proposed refinement strategy holds great potential for further improvement, offering a practical route for integrating deep learning segmentation into routine clinical workflows with minimal computational overhead. Full article
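The WAO voting and Gaussian-smoothed peak location described in this abstract can be sketched on a synthetic mask volume. The masks, kernel width, and volume geometry below are invented stand-ins, not the paper's data or parameters:

```python
import numpy as np

def wao_curve(masks):
    """White-Area Overlap per slice pair: tumor pixels shared between
    neighboring slices, one value per adjacent pair."""
    return np.array([np.logical_and(masks[i], masks[i + 1]).sum()
                     for i in range(len(masks) - 1)])

def gaussian_smooth(x, sigma=2.0):
    # Explicit Gaussian kernel + same-length convolution (no SciPy needed)
    r = int(3 * sigma)
    k = np.exp(-np.arange(-r, r + 1) ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    return np.convolve(x, k, mode="same")

# Synthetic volume: a circular "tumor" that grows then shrinks over 20 slices
masks = []
for z in range(20):
    m = np.zeros((64, 64), bool)
    radius = max(0, 10 - abs(z - 10))        # largest near the middle slice
    yy, xx = np.ogrid[:64, :64]
    m[(yy - 32) ** 2 + (xx - 32) ** 2 <= radius ** 2] = True
    masks.append(m)

curve = gaussian_smooth(wao_curve(masks))
peak = int(np.argmax(curve))                 # slice pair with strongest overlap
print("peak around slice:", peak)
```

In the paper's pipeline, the slices near this peak anchor the percentile-based relabeling of low-confidence softmax pixels; here the peak simply falls where the synthetic tumor is largest.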

22 pages, 47099 KB  
Article
Deciphering Emotions in Children’s Storybooks: A Comparative Analysis of Multimodal LLMs in Educational Applications
by Bushra Asseri, Estabrag Abaker, Maha Al Mogren, Tayef Alhefdhi and Areej Al-Wabil
AI 2025, 6(9), 211; https://doi.org/10.3390/ai6090211 - 2 Sep 2025
Viewed by 555
Abstract
Emotion recognition capabilities in multimodal AI systems are crucial for developing culturally responsive educational technologies yet remain underexplored for Arabic language contexts, where culturally appropriate learning tools are critically needed. This study evaluated the emotion recognition performance of two advanced multimodal large language models, GPT-4o and Gemini 1.5 Pro, when processing Arabic children’s storybook illustrations. We assessed both models across three prompting strategies (zero-shot, few-shot, and chain-of-thought) using 75 images from seven Arabic storybooks, comparing model predictions with human annotations based on Plutchik’s emotional framework. GPT-4o consistently outperformed Gemini across all conditions, achieving the highest macro F1-score of 59% with chain-of-thought prompting compared to Gemini’s best performance of 43%. Error analysis revealed systematic misclassification patterns, with valence inversions accounting for 60.7% of errors, while both models struggled with culturally nuanced emotions and ambiguous narrative contexts. These findings highlight fundamental limitations in current models’ cultural understanding and emphasize the need for culturally sensitive training approaches to develop effective emotion-aware educational technologies for Arabic-speaking learners. Full article
(This article belongs to the Special Issue Exploring the Use of Artificial Intelligence in Education)

21 pages, 1406 KB  
Article
Neural Network-Based Weight Loss Prediction: Behavioral Integration of Stress and Sleep in AI Decision Support
by Mayra Cruz Fernandez, Francisco Antonio Castillo-Velásquez, Omar Rodriguez-Abreo, Enriqueta Ortiz-Moctezuma, Luis Angel Iturralde Carrera, Adyr A. Estévez-Bén, José M. Álvarez-Alvarado and Juvenal Rodríguez-Reséndiz
AI 2025, 6(9), 210; https://doi.org/10.3390/ai6090210 - 2 Sep 2025
Viewed by 504
Abstract
This study evaluates the effect of incorporating behavioral variables, sleep quality (SQ) and stress level (SL), into neural network models for predicting weight loss. An artificial neural network (ANN) was trained using data from 100 adults aged 18 to 60, integrating demographic, physiological, and behavioral inputs. The findings emphasize that weight change is a multifactorial process influenced not only by caloric intake, basal metabolic rate, and physical activity, but also by psychological and behavioral factors such as sleep and stress. From a medical perspective, the inclusion of SQ and SL aligns with the biopsychosocial model of obesity, acknowledging the metabolic consequences of chronic stress and poor sleep. This integration allows for the development of low-cost, non-invasive, and personalized weight management tools based on self-reported data, especially valuable in resource-limited healthcare settings. Behavioral-aware AI systems such as the one proposed have the potential to support clinical decision-making, enable early risk detection, and guide the development of digital therapeutics. Quantitative results demonstrate that the best-performing architecture achieved a Root Mean Square Error (RMSE) of 1.98%; when SQ was excluded, the RMSE increased to 4.39% (2.2-fold), when SL was excluded it rose to 4.69% (2.4-fold), and when both were removed, the error reached 6.02% (3.0-fold), confirming the substantial predictive contribution of these behavioral variables. Full article

27 pages, 520 KB  
Article
QiMARL: Quantum-Inspired Multi-Agent Reinforcement Learning Strategy for Efficient Resource Energy Distribution in Nodal Power Stations
by Sapthak Mohajon Turjya, Anjan Bandyopadhyay, M. Shamim Kaiser and Kanad Ray
AI 2025, 6(9), 209; https://doi.org/10.3390/ai6090209 - 1 Sep 2025
Viewed by 1031
Abstract
The coupling of quantum computing with multi-agent reinforcement learning (MARL) provides an exciting direction to tackle intricate decision-making tasks in high-dimensional spaces. This work introduces a new quantum-inspired multi-agent reinforcement learning (QiMARL) model, utilizing quantum parallelism to achieve learning efficiency and scalability improvement. The QiMARL model is tested on an energy distribution task, which optimizes power distribution between generating and demanding nodal power stations. We compare the convergence time, reward performance, and scalability of QiMARL with traditional Multi-Armed Bandit (MAB) and Multi-Agent Reinforcement Learning methods, such as Greedy, Upper Confidence Bound (UCB), Thompson Sampling, MADDPG, QMIX, and PPO methods with a comprehensive ablation study. Our findings show that QiMARL yields better performance in high-dimensional systems, decreasing the number of training epochs needed for convergence while enhancing overall reward maximization. We also compare the algorithm’s computational complexity, indicating that QiMARL is more scalable to high-dimensional quantum environments. This research opens the door to future studies of quantum-enhanced reinforcement learning (RL) with potential applications to energy optimization, traffic management, and other multi-agent coordination problems. Full article
(This article belongs to the Special Issue Advances in Quantum Computing and Quantum Machine Learning)
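Among the baselines this abstract compares against, Upper Confidence Bound (UCB) is simple enough to sketch directly. The Gaussian reward means below are invented stand-ins for the payoff of supplying different nodal stations; this is the generic UCB1 rule, not the QiMARL algorithm:

```python
import math
import random

def ucb1(means, horizon=5000, seed=0):
    """UCB1: try each arm once, then pick argmax of mean + sqrt(2 ln t / n)."""
    rng = random.Random(seed)
    n = [0] * len(means)        # pull counts per arm
    s = [0.0] * len(means)      # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= len(means):
            a = t - 1           # initialization: pull every arm once
        else:
            a = max(range(len(means)),
                    key=lambda i: s[i] / n[i] + math.sqrt(2 * math.log(t) / n[i]))
        r = rng.gauss(means[a], 0.1)   # noisy reward from the chosen option
        n[a] += 1
        s[a] += r
    return n

# Three hypothetical supply options with different expected payoffs
counts = ucb1([0.2, 0.5, 0.8])
print(counts)   # the highest-mean arm should receive most of the pulls
```

The exploration bonus shrinks as an arm accumulates pulls, so the count vector concentrates on the best arm; the paper's comparison is over how quickly such concentration (and hence cumulative reward) is achieved in high-dimensional settings.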

26 pages, 6078 KB  
Article
Handling Missing Air Quality Data Using Bidirectional Recurrent Imputation for Time Series and Random Forest: A Case Study in Mexico City
by Lorena Díaz-González, Ingrid Trujillo-Uribe, Julio César Pérez-Sansalvador and Noureddine Lakouari
AI 2025, 6(9), 208; https://doi.org/10.3390/ai6090208 - 1 Sep 2025
Viewed by 510
Abstract
Accurate imputation of missing data in air quality monitoring is essential for reliable environmental assessment and modeling. This study compares two imputation methods, namely Random Forest (RF) and Bidirectional Recurrent Imputation for Time Series (BRITS), using data from the Mexico City air quality monitoring network (2014–2023). The analysis focuses on stations with less than 30% missingness and includes both pollutant (CO, NO, NO2, NOx, SO2, O3, PM10, PM2.5, and PMCO) and meteorological (relative humidity, temperature, wind direction and speed) variables. Each station’s data was split into 80% for training and 20% for validation, with 20% artificial missingness. Performance was assessed through two perspectives: local accuracy (MAE and RMSE) on masked subsets and distributional similarity on complete datasets (Two One-Sided Tests and Wasserstein distance). RF achieved lower errors on masked subsets, whereas BRITS better preserved the complete distribution. Both methods struggled with highly variable features. On complete time series, BRITS produced more realistic imputations, while RF often generated extreme outliers. These findings demonstrate the advantages of deep learning for handling complex temporal dependencies and highlight the need for robust strategies for stations with extensive gaps. Enhancing the accuracy of imputations is crucial for improving forecasting, trend analysis, and public health decision-making. Full article
(This article belongs to the Section AI Systems: Theory and Applications)
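The masked-evaluation protocol described above (hide a fraction of known values, impute them, score only on the hidden entries) can be sketched with a deliberately simple column-mean imputer standing in for RF or BRITS. The synthetic "sensor" data and the 20% masking fraction mirror the protocol in spirit only:

```python
import numpy as np

def mask_random(X, frac=0.2, seed=0):
    """Hide a fraction of observed entries to create artificial missingness."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) < frac
    X_missing = X.copy()
    X_missing[mask] = np.nan
    return X_missing, mask

def mean_impute(X):
    # Column-mean imputation: a baseline any serious imputer should beat
    col_means = np.nanmean(X, axis=0)
    out = X.copy()
    rows, cols = np.where(np.isnan(out))
    out[rows, cols] = col_means[cols]
    return out

# Synthetic station data: three "channels" with different scales
rng = np.random.default_rng(42)
X = rng.normal(loc=[20.0, 50.0, 1.5], scale=[5.0, 10.0, 0.5], size=(500, 3))
X_missing, mask = mask_random(X)
X_hat = mean_impute(X_missing)

mae = np.abs(X_hat[mask] - X[mask]).mean()
rmse = np.sqrt(((X_hat[mask] - X[mask]) ** 2).mean())
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}")   # scored only on the masked entries
```

Scoring only the masked entries measures local accuracy (the paper's MAE/RMSE perspective); comparing the full imputed distribution against the original addresses the complementary distributional-similarity perspective.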

24 pages, 2357 KB  
Article
From Vision-Only to Vision + Language: A Multimodal Framework for Few-Shot Unsound Wheat Grain Classification
by Yuan Ning, Pengtao Lv, Qinghui Zhang, Le Xiao and Caihong Wang
AI 2025, 6(9), 207; https://doi.org/10.3390/ai6090207 - 29 Aug 2025
Viewed by 543
Abstract
Precise classification of unsound wheat grains is essential for crop yields and food security, yet most existing approaches rely on vision-only models that demand large labeled datasets, which is often impractical in real-world, data-scarce settings. To address this few-shot challenge, we propose UWGC, a novel vision-language framework designed for few-shot classification of unsound wheat grains. UWGC integrates two core modules: a fine-tuning module based on Adaptive Prior Refinement (APE) and a text prompt enhancement module that incorporates Advancing Textual Prompt (ATPrompt) and the multimodal model Qwen2.5-VL. The synergy between the two modules, leveraging cross-modal semantics, enhances generalization of UWGC in low-data regimes. It is offered in two variants: UWGC-F and UWGC-T, in order to accommodate different practical needs. Across few-shot settings on a public grain dataset, UWGC-F and UWGC-T consistently outperform existing vision-only and vision-language methods, highlighting their potential for unsound wheat grain classification in real-world agriculture. Full article

37 pages, 2412 KB  
Systematic Review
Unlocking the Potential of the Prompt Engineering Paradigm in Software Engineering: A Systematic Literature Review
by Irdina Wanda Syahputri, Eko K. Budiardjo and Panca O. Hadi Putra
AI 2025, 6(9), 206; https://doi.org/10.3390/ai6090206 - 28 Aug 2025
Viewed by 851
Abstract
Prompt engineering (PE) has emerged as a transformative paradigm in software engineering (SE), leveraging large language models (LLMs) to support a wide range of SE tasks, including code generation, bug detection, and software traceability. This study conducts a systematic literature review (SLR) combined with a co-citation network analysis of 42 peer-reviewed journal articles to map key research themes, commonly applied PE methods, and evaluation metrics in the SE domain. The results reveal four prominent research clusters: manual prompt crafting, retrieval-augmented generation, chain-of-thought prompting, and automated prompt tuning. These approaches demonstrate notable progress, often matching or surpassing traditional fine-tuning methods in terms of adaptability and computational efficiency. Interdisciplinary collaboration among experts in AI, machine learning, and software engineering is identified as a key driver of innovation. However, several research gaps remain, including the absence of standardized evaluation protocols, sensitivity to prompt brittleness, and challenges in scalability across diverse SE applications. To address these issues, a modular prompt engineering framework is proposed, integrating human-in-the-loop design, automated prompt optimization, and version control mechanisms. Additionally, a conceptual pipeline is introduced to support domain adaptation and cross-domain generalization. Finally, a strategic research roadmap is presented, emphasizing future work on interpretability, fairness, and collaborative development platforms. This study offers a comprehensive foundation and practical insights to advance prompt engineering research tailored to the complex and evolving needs of software engineering. Full article
(This article belongs to the Topic Challenges and Solutions in Large Language Models)

20 pages, 2409 KB  
Article
Brainwave Biometrics: A Secure and Scalable Brain–Computer Interface-Based Authentication System
by Mashael Aldayel, Nouf Alsedairy and Abeer Al-Nafjan
AI 2025, 6(9), 205; https://doi.org/10.3390/ai6090205 - 28 Aug 2025
Viewed by 669
Abstract
This study introduces a promising authentication framework utilizing brain–computer interface (BCI) technology to enhance both security protocols and user experience. A key strength of this approach lies in its reliance on objective, physiological signals—specifically, brainwave patterns—which are inherently difficult to replicate or forge, thereby providing a robust foundation for secure authentication. The authentication system was developed and implemented in four sequential stages: signal acquisition, preprocessing, feature extraction, and classification. Objective feature extraction methods, including Fisher’s Linear Discriminant (FLD) and Discrete Wavelet Transform (DWT), were employed to isolate meaningful brainwave features. These features were then classified using advanced machine learning techniques, with Quadratic Discriminant Analysis (QDA) and Convolutional Neural Networks (CNN) achieving accuracy rates exceeding 99%. These results highlight the effectiveness of the proposed BCI-based system and underscore the value of objective, data-driven methodologies in developing secure and user-friendly authentication solutions. To further address usability and efficiency, the number of BCI channels was systematically reduced from 64 to 32, and then to 16, resulting in accuracy rates of 92.64% and 80.18%, respectively. This reduction streamlined the authentication process, demonstrating that objective methods can maintain high performance even with simplified hardware and pointing to future directions for practical, real-world implementation. Additionally, we developed a real-time application using our custom dataset, reaching 99.75% accuracy with a CNN model. Full article
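The DWT feature-extraction stage named in this abstract can be illustrated with a hand-rolled single-level Haar transform applied recursively to a synthetic EEG-like trace. The signal, sampling rate, and relative-energy features are illustrative assumptions, not the paper's pipeline:

```python
import numpy as np

def haar_dwt(signal):
    """One-level Haar DWT: approximation (low-pass) and detail (high-pass)."""
    x = np.asarray(signal, float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)
    detail = (even - odd) / np.sqrt(2)
    return approx, detail

def band_features(signal, levels=4):
    """Relative energy per sub-band: a compact, common EEG descriptor."""
    feats, current = [], signal
    for _ in range(levels):
        current, detail = haar_dwt(current)
        feats.append((detail ** 2).sum())
    feats.append((current ** 2).sum())      # final approximation band
    total = sum(feats)
    return [f / total for f in feats]

# Synthetic 2-second "EEG" trace at 128 Hz: 10 Hz rhythm plus noise
t = np.arange(256) / 128.0
eeg = np.sin(2 * np.pi * 10 * t) + 0.1 * np.random.default_rng(7).normal(size=256)
features = band_features(eeg)
print([round(f, 3) for f in features])      # energies sum to 1 across sub-bands
```

Each decomposition level halves the frequency band, so the relative energies act as a coarse spectral signature; a classifier such as the paper's QDA or CNN would then consume vectors like these.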

19 pages, 5636 KB  
Article
Complete Workflow for ER-IHC Pathology Database Revalidation
by Md Hadayet Ullah, Md Jahid Hasan, Wan Siti Halimatul Munirah Wan Ahmad, Mohammad Faizal Ahmad Fauzi, Zaka Ur Rehman, Jenny Tung Hiong Lee, See Yee Khor and Lai-Meng Looi
AI 2025, 6(9), 204; https://doi.org/10.3390/ai6090204 - 27 Aug 2025
Viewed by 1086
Abstract
Computer-aided systems can assist doctors in detecting cancer at an early stage using medical image analysis. In estrogen receptor immunohistochemistry (ER-IHC)-stained whole-slide images, automated cell identification and segmentation are helpful in the prediction scoring of hormone receptor status, which aids pathologists in determining whether to recommend hormonal therapy or other therapies for a patient. Accurate scoring can be achieved with accurate segmentation and classification of the nuclei. This paper pursues two main objectives: first, to identify the top three models for this classification task and establish an ensemble model, both using a 10-fold cross-validation strategy; second, to detect recurring misclassifications within the dataset, identifying "misclassified nuclei" or "incorrectly labeled nuclei" in the nuclei class ground truth. The classification task is carried out using 32 pre-trained deep learning models from Keras Applications, focusing on their effectiveness in classifying negative, weak, moderate, and strong nuclei in the ER-IHC histopathology images. An ensemble-learning approach with logistic regression is employed for the three best models. The analysis reveals that the top three performing models are EfficientNetB0, EfficientNetV2B2, and EfficientNetB4, with accuracies of 94.37%, 94.36%, and 94.29%, respectively, while the ensemble model reaches 95%. We also developed a web-based platform for pathologists to rectify the "faulty-class" nuclei in the dataset. The complete workflow can benefit the field of medical image analysis, especially when dealing with intra-observer variability across a large number of images during ground truth validation. Full article
(This article belongs to the Section Medical & Healthcare AI)
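The logistic-regression ensemble over the top three backbones is a stacking step, which can be sketched as follows. The three synthetic "base model probability" columns stand in for EfficientNet outputs, the task is reduced to binary for brevity, and the tiny gradient-descent fitter is an assumption (the paper's solver is not specified here):

```python
import numpy as np

def fit_logreg(X, y, lr=0.5, epochs=300):
    """Minimal logistic-regression stacker trained by batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * (p - y).mean()
    return w, b

rng = np.random.default_rng(3)
y = rng.integers(0, 2, size=600).astype(float)
# Each "base model" emits a probability correlated with the true label,
# with different noise levels standing in for stronger/weaker backbones
base_probs = np.column_stack([
    np.clip(y + rng.normal(0, s, 600), 0, 1) for s in (0.35, 0.4, 0.5)
])
w, b = fit_logreg(base_probs, y)
p = 1.0 / (1.0 + np.exp(-(base_probs @ w + b)))
acc = ((p > 0.5) == y.astype(bool)).mean()
print(f"stacked accuracy: {acc:.3f}")
```

The learned weights act as data-driven trust in each backbone, which is why stacking can edge out the best single model, as the 95% ensemble figure above illustrates.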

15 pages, 2190 KB  
Article
Multi-Objective Optimization Model for Emergency Evacuation Based on Adaptive Ant Colony Algorithm
by Jiacheng Yuan and Baiqing Sun
AI 2025, 6(9), 203; https://doi.org/10.3390/ai6090203 - 26 Aug 2025
Viewed by 726
Abstract
Evacuation of public places under emergency situations represents a significant area of management research. With the rapid development of the railway industry, the evacuation of railway stations has gradually attracted attention. This article employs the minimization of congestion degree and total evacuation time as primary objectives. In addition, the psychological behavior of individuals and the impact of congestion are explicitly considered, and an adaptive Cauchy mutation operator is adopted to maintain population diversity. As a result, a multi-objective optimization model for evacuation paths is established using an improved adaptive quantum ant colony algorithm, and the model is compared against the traditional ant colony model. Full article
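The pheromone-guided route choice at the core of any ant colony variant can be sketched on a toy evacuation graph. The network, travel times, and update constants below are invented for illustration and do not reflect the paper's quantum-adaptive model:

```python
import random

# Toy evacuation network: edge -> travel time (lower is better)
edges = {("hall", "exit_A"): 4.0, ("hall", "exit_B"): 9.0, ("hall", "exit_C"): 6.0}

def run_aco(n_ants=200, evaporation=0.1, alpha=1.0, beta=2.0, seed=5):
    rng = random.Random(seed)
    pheromone = {e: 1.0 for e in edges}
    for _ in range(n_ants):
        # Probabilistic choice weighted by pheromone^alpha * (1/time)^beta
        weights = [pheromone[e] ** alpha * (1.0 / edges[e]) ** beta for e in edges]
        chosen = rng.choices(list(edges), weights=weights)[0]
        # Evaporate everywhere, then reinforce the chosen edge by its quality
        for e in pheromone:
            pheromone[e] *= (1.0 - evaporation)
        pheromone[chosen] += 1.0 / edges[chosen]
    return pheromone

ph = run_aco()
best = max(ph, key=ph.get)
print("preferred route:", best)   # the fast exit accumulates the most pheromone
```

Evaporation prevents early choices from locking in permanently; the paper's adaptive Cauchy mutation plays a similar diversity-preserving role at the population level.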

22 pages, 3435 KB  
Article
An Explainable AI Framework for Stroke Classification Based on CT Brain Images
by Serra Aksoy, Pinar Demircioglu and Ismail Bogrekci
AI 2025, 6(9), 202; https://doi.org/10.3390/ai6090202 - 25 Aug 2025
Viewed by 713
Abstract
Stroke is a major global cause of death and disability, necessitating quick diagnosis and treatment within narrow windows of opportunity. CT scanning is still the first-line imaging modality in the acute phase, but correct interpretation may not always be readily available, particularly in under-resourced and rural health systems. Automated stroke classification systems can offer useful diagnostic assistance, but clinical application demands high accuracy and explainable decision-making to maintain physician trust and patient safety. In this paper, a ResNet-18 model was trained on 6653 CT brain scans (hemorrhagic stroke, ischemia, normal) with two-phase fine-tuning and transfer learning, XRAI explainability analysis, and integration into a web-based clinical decision support system. The model achieved 95% test accuracy with good performance across all classes. This system has great potential for emergency rooms and resource-poor environments, offering quick stroke evaluation when specialists are not available, particularly by rapidly excluding hemorrhagic stroke and assisting in the identification of ischemic stroke, critical steps when considering tissue plasminogen activator (tPA) administration within therapeutic windows in eligible patients. The combination of classification, explainability, and a clinical interface offers a complete framework for medical AI implementation. Full article
(This article belongs to the Special Issue AI in Bio and Healthcare Informatics)

19 pages, 1118 KB  
Article
Fine-Grained Open-Vocabulary Object Detection via Attribute Decomposition and Aggregation
by Bei Dou, Tao Wu and Zhiwei Guo
AI 2025, 6(9), 201; https://doi.org/10.3390/ai6090201 - 25 Aug 2025
Viewed by 823
Abstract
Open-vocabulary object detection (OVOD) aims to localize and recognize objects in images by leveraging category-specific textual inputs, including both known and novel categories. While existing methods excel in general scenarios, their performance deteriorates significantly in domain-specific fine-grained detection because of their heavy reliance on high-quality textual descriptions. In specialized domains, such descriptions are often affected by newly introduced terms or subjective human biases, limiting their applicability. In this paper, we propose an attribute decomposition–aggregation approach for OVOD to address these challenges. By decomposing categories into fine-grained attributes and learning them in a multi-label manner, our method mitigates the text quality issues caused by novel terms and human bias. During inference, unseen fine-grained category texts can be effectively represented for detection by combining the decomposed attributes. Even when a model learns these attributes, however, current methods utilize textual attributes insufficiently. To mitigate this issue, we propose an attribute-aggregation module that enhances discriminative capability by emphasizing the attributes critical for distinguishing target objects from foreground elements. To demonstrate the effectiveness of our OVOD framework, we evaluate our method on both our newly constructed military dataset and the public LAD dataset. Experimental results demonstrate that our method outperforms existing methods in domain-specific fine-grained open-vocabulary detection tasks. Full article
14 pages, 2206 KB  
Article
Predicting Clinical Outcomes and Symptom Relief in Uterine Fibroid Embolization Using Machine Learning on MRI Features
by Sepehr Janghorbani, Alexandre Caprio, Laya Sam, Benjamin C. Lee, Mert R. Sabuncu, Nicole A. Lamparello, Marc Schiffman and Bobak Mosadegh
AI 2025, 6(9), 200; https://doi.org/10.3390/ai6090200 - 25 Aug 2025
Abstract
Uterine fibroids are one of the leading health concerns for women worldwide, affecting up to 80% of women by the age of 50. While recent advancements have improved the diagnosis and treatment of fibroids, the current standard of care still faces important limitations due to the need for a personalized approach to treatment. Uterine fibroid embolization (UFE) has emerged as a promising minimally invasive alternative to traditional surgery, offering advantages such as shorter recovery times, fewer complications, and the preservation of the uterus. However, despite its widely reported effectiveness, only about 1% of eligible patients are offered UFE. This drastic underutilization is partially due to limited physician confidence in predicting patient-specific outcomes. To address this challenge, we present an objective analysis of the factors influencing UFE success and introduce a scalable and interpretable machine learning (ML) system designed to support clinical decision-making. We curated a dataset of 74 patients with a total of 311 fibroids for our analysis and developed two sets of ML models for predicting UFE procedure success from a pre-operative MRI scan. The first model predicts overall procedure success and the likelihood of relieving specific symptoms, achieving an accuracy of 75% (AUC = 0.74) for procedure outcome and 81–88% (AUC = 0.81–0.87) across individual symptoms. The second set of models predicts whether each individual fibroid will respond to treatment, achieving 76% accuracy and a 75% F1 score. The AI models in this study can potentially provide patient-specific predictions of procedure effectiveness at both the patient and fibroid levels, enhancing procedure referral accuracy. Full article
(This article belongs to the Section Medical & Healthcare AI)
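The fibroid-level prediction task described above is, structurally, a binary classification from imaging-derived features. The following minimal sketch trains a logistic-regression classifier with plain gradient descent on synthetic stand-ins for pre-operative MRI features; the feature names, data, and model here are illustrative assumptions, not the authors' actual pipeline or dataset.

```python
import numpy as np

# Synthetic stand-ins for per-fibroid MRI-derived features
# (e.g. volume, location code, T2 signal intensity) -- hypothetical names.
rng = np.random.default_rng(42)
n = 311                                   # one row per fibroid, as in the study
X = rng.standard_normal((n, 3))
# Synthetic "treatment response" label correlated with two features
y = (X[:, 0] + 0.5 * X[:, 2] + 0.5 * rng.standard_normal(n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Logistic regression fit by batch gradient descent
w = np.zeros(3)
b = 0.0
for _ in range(500):
    p = sigmoid(X @ w + b)                # predicted response probability
    w -= 0.5 * (X.T @ (p - y) / n)        # gradient step on weights
    b -= 0.5 * (p - y).mean()             # gradient step on bias

acc = float(((sigmoid(X @ w + b) > 0.5) == (y > 0.5)).mean())
```

An interpretable linear model of this kind exposes per-feature weights, which is one way a system could support the clinical decision-making the abstract emphasizes; the authors' reported accuracies come from their own models and data, not from this sketch.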