AI | An Open Access Journal from MDPI

23 pages, 3532 KB

Open AccessReview

Generative Artificial Intelligence in Healthcare: A Bibliometric Analysis and Review of Potential Applications and Challenges

by Vanita Kouomogne Nana and Mark T. Marshall

AI 2025, 6(11), 278; https://doi.org/10.3390/ai6110278 - 23 Oct 2025

Abstract

The remarkable progress of artificial intelligence (AI) in recent years has significantly extended its application possibilities within the healthcare domain. AI has become more accessible to a wider range of healthcare personnel and service users, in particular due to the proliferation of Generative [...] Read more.

The remarkable progress of artificial intelligence (AI) in recent years has significantly extended its application possibilities within the healthcare domain. AI has become more accessible to a wider range of healthcare personnel and service users, in particular due to the proliferation of Generative AI (GenAI). This study presents a bibliometric analysis of GenAI in healthcare. By analysing the Scopus database academic literature, our study explores the knowledge structure, emerging trends, and challenges of GenAI in healthcare. The results showed that GenAI is increasingly being adoption in developed countries, with major US institutions leading the way, and a large number of papers are being published on the topic in top-level academic venues. Our findings also show that there is a focus on particular areas of healthcare, with medical education and clinical decision-making showing active research, while areas such as emergency medicine remain poorly explored. Our results also show that while there is a focus on the benefits of GenAI for the healthcare industry, its limitations need to be acknowledged and addressed to facilitate its integration in clinical settings. The findings of this study can serve as a foundation for understanding the field, allowing academics, healthcare practitioners, educators, and policymakers to better understand the current focus within GenAI for healthcare, as well as highlighting potential application areas and challenges around accuracy, privacy, and ethics that must be taken into account when developing healthcare-focused GenAI applications. Full article

(This article belongs to the Topic Artificial Intelligence in Public Health: Current Trends and Future Possibilities, 2nd Edition)

► Show Figures

Figure 1

19 pages, 4001 KB

Open AccessArticle

ConvNeXt with Context-Weighted Deep Superpixels for High-Spatial-Resolution Aerial Image Semantic Segmentation

by Ziran Ye, Yue Lin, Muye Gan, Xiangfeng Tan, Mengdi Dai and Dedong Kong

AI 2025, 6(11), 277; https://doi.org/10.3390/ai6110277 - 22 Oct 2025

Abstract

Semantic segmentation of high-spatial-resolution (HSR) aerial imagery is critical for applications such as urban planning and environmental monitoring, yet challenges, including scale variation, intra-class diversity, and inter-class confusion, persist. This study proposes a deep learning framework that integrates convolutional networks (CNNs) with context-enhanced [...] Read more.

Semantic segmentation of high-spatial-resolution (HSR) aerial imagery is critical for applications such as urban planning and environmental monitoring, yet challenges, including scale variation, intra-class diversity, and inter-class confusion, persist. This study proposes a deep learning framework that integrates convolutional networks (CNNs) with context-enhanced superpixel generation, using ConvNeXt as the backbone for feature extraction. The framework incorporates two key modules, namely, a deep superpixel module (Spixel) and a global context modeling module (GC-module), which synergistically generate context-weighted superpixel embeddings to enhance scene–object relationship modeling, refining local details while maintaining global semantic consistency. The introduced approach achieves mIoU scores of 84.54%, 90.59%, and 64.46% on diverse HSR aerial imagery benchmark datasets (Vaihingen, Potsdam, and UV6K), respectively. Ablation experiments were conducted to further validate the contributions of the global context modeling module and deep superpixel modules, highlighting their synergy in improving segmentation results. This work facilitates precise spatial detail preservation and semantic consistency in HSR aerial imagery interpretation tasks, particularly for small objects and complex land cover classes. Full article

► Show Figures

Figure 1

20 pages, 2894 KB

Open AccessArticle

End-to-End Swallowing Event Localization via Blue-Channel-to-Depth Substitution in RGB-D: GRNConvNeXt-Modified AdaTAD with KAN-Chebyshev Decoder

by Derek Ka-Hei Lai, Zi-An Zhao, Andy Yiu-Chau Tam, Jing Li, Jason Zhi-Shen Zhang, Duo Wai-Chi Wong and James Chung-Wai Cheung

AI 2025, 6(11), 276; https://doi.org/10.3390/ai6110276 - 22 Oct 2025

Abstract

Background: Swallowing is a complex biomechanical process, and its impairment (dysphagia) poses major health risks for older adults. Current diagnostic methods such as videofluoroscopic swallowing (VFSS) and fiberoptic endoscopic evaluation of swallowing (FEES) are effective but invasive, resource-intensive, and unsuitable for continuous [...] Read more.

Background: Swallowing is a complex biomechanical process, and its impairment (dysphagia) poses major health risks for older adults. Current diagnostic methods such as videofluoroscopic swallowing (VFSS) and fiberoptic endoscopic evaluation of swallowing (FEES) are effective but invasive, resource-intensive, and unsuitable for continuous monitoring. This study proposes a novel end-to-end RGB–D framework for automated swallowing event localization in continuous video streams. Methods: The framework enhances the AdaTAD backbone through three key innovations: (i) finding the optimal strategy to integrate depth information to capture subtle neck movements, (ii) examining the best adapter design for efficient temporal feature adaptation, and (iii) introducing a Kolmogorov–Arnold Network (KAN) decoder that leverages Chebyshev polynomials for non-linear temporal modeling. Evaluation on a proprietary swallowing dataset comprising 641 clips and 3153 annotated events demonstrated the effectiveness of the proposed framework. We analysed and compared the modification strategy across designs of adapters, decoders, input channel combinations, regression methods, and patch embedding techniques. Results: The optimized configuration (VideoMAE + GRNConvNeXtAdapter + KAN + RGD + boundary regression + sinusoidal embedding) achieved an average mAP of 83.25%, significantly surpassing the baseline I3D + RGB + MLP model (61.55%). Ablation studies further confirmed that each architectural component contributed incrementally to the overall improvement. Conclusions: These results establish the feasibility of accurate, non-invasive, and automated swallowing event localization using depth-augmented video. The proposed framework paves the way for practical dysphagia screening and long-term monitoring in clinical and home-care environments. Full article

(This article belongs to the Special Issue Artificial Intelligence in Biomedical Engineering: Challenges and Developments)

► Show Figures

Figure 1

23 pages, 2701 KB

Open AccessArticle

Grad-CAM-Assisted Deep Learning for Mode Hop Localization in Shearographic Tire Inspection

by Manuel Friebolin, Michael Munz and Klaus Schlickenrieder

AI 2025, 6(10), 275; https://doi.org/10.3390/ai6100275 - 21 Oct 2025

Abstract

In shearography-based tire testing, so-called “Mode Hops”, abrupt phase changes caused by laser mode changes, can lead to significant disturbances in the interference image analysis. These artifacts distort defect assessment, lead to retesting or false-positive decisions, and, thus, represent a significant hurdle for [...] Read more.

In shearography-based tire testing, so-called “Mode Hops”, abrupt phase changes caused by laser mode changes, can lead to significant disturbances in the interference image analysis. These artifacts distort defect assessment, lead to retesting or false-positive decisions, and, thus, represent a significant hurdle for the automation of the shearography-based tire inspection process. This work proposes a deep learning workflow that combines a pretrained, optimized ResNet-50 classifier with Grad-CAM, providing a practical and explainable solution for the reliable detection and localization of Mode Hops in shearographic tire inspection images. We trained the algorithm on an extensive, cross-machine dataset comprising more than 6.5 million test images. The final deep learning model achieves a classification accuracy of 99.67%, a false-negative rate of 0.48%, and a false-positive rate of 0.24%. Applying a probability-based quadrant-repeat decision rule within the inspection process effectively reduces process-level false positives to zero, with an estimated probability of repetition of ≤0.084%. This statistically validated approach increases the overall inspection accuracy to 99.83%. The method allows the robust detection and localization of relevant Mode Hops and represents a significant contribution to explainable, AI-supported tire testing. It fulfills central requirements for the automation of shearography-based tire testing and contributes to the possible certification process of non-destructive testing methods in safety-critical industries. Full article

► Show Figures

Figure 1

36 pages, 2714 KB

Open AccessReview

Artificial Intelligence-Based Epileptic Seizure Prediction Strategies: A Review

by Andrea V. Perez-Sanchez, Martin Valtierra-Rodriguez, J. Jesus De-Santiago-Perez, Carlos A. Perez-Ramirez, Arturo Garcia-Perez and Juan P. Amezquita-Sanchez

AI 2025, 6(10), 274; https://doi.org/10.3390/ai6100274 - 21 Oct 2025

Abstract

Epilepsy, a chronic neurological disorder marked by recurrent and unpredictable seizures, poses significant risks of injury and compromises patient quality of life. The accurate forecasting of seizures is paramount for enabling timely interventions and improving safety. Since the 1970s, research has increasingly focused [...] Read more.

Epilepsy, a chronic neurological disorder marked by recurrent and unpredictable seizures, poses significant risks of injury and compromises patient quality of life. The accurate forecasting of seizures is paramount for enabling timely interventions and improving safety. Since the 1970s, research has increasingly focused on analyzing bioelectrical signals for this purpose. In recent years, artificial intelligence (AI), particularly machine learning (ML) and deep learning (DL), has emerged as a powerful tool for seizure prediction. This review, conducted by PRISMA guidelines, analyzes studies from 2020 to August 2025. It explores the evolution from traditional ML classifiers toward advanced DL architecture, including convolutional and recurrent neural networks and transformer-based frameworks, applied to bioelectrical signals. While these approaches show promising performance, significant challenges persist in patient generalization, standardized evaluation, and clinical validation. This review synthesizes current advancements, provides a critical analysis of methodological limitations, and outlines future directions for developing robust, clinically relevant seizure prediction systems to enhance patient autonomy and outcomes. Full article

(This article belongs to the Special Issue Artificial Intelligence in Biomedical Engineering: Challenges and Developments)

► Show Figures

Figure 1

33 pages, 4831 KB

Open AccessArticle

A General-Purpose Knowledge Retention Metric for Evaluating Distillation Models Across Architectures and Tasks

by Arjay Alba and Jocelyn Villaverde

AI 2025, 6(10), 273; https://doi.org/10.3390/ai6100273 - 21 Oct 2025

Abstract

Background: Knowledge distillation (KD) compresses deep neural networks by transferring knowledge from a high-capacity teacher model to a lightweight student model. However, conventional evaluation metrics such as accuracy, mAP, IoU, or RMSE focus mainly on task performance and overlook how effectively the [...] Read more.

Background: Knowledge distillation (KD) compresses deep neural networks by transferring knowledge from a high-capacity teacher model to a lightweight student model. However, conventional evaluation metrics such as accuracy, mAP, IoU, or RMSE focus mainly on task performance and overlook how effectively the student internalizes the teacher’s knowledge. Methods: This study introduces the Knowledge Retention Score (KRS), a composite metric that integrates intermediate feature similarity and output agreement into a single interpretable score to quantify knowledge retention. KRS was primarily validated in computer vision (CV) through 36 experiments covering image classification, object detection, and semantic segmentation using diverse datasets and eight representative KD methods. Supplementary experiments were conducted in natural language processing (NLP) using transformer-based models on SST-2, and in time series regression with convolutional teacher–student pairs. Results: Across all domains, KRS correlated strongly with standard performance metrics while revealing internal retention dynamics that conventional evaluations often overlook. By reporting feature similarity and output agreement separately alongside the composite score, KRS provides transparent and interpretable insights into knowledge transfer. Conclusions: KRS offers a stable diagnostic tool and a complementary evaluation metric for KD research. Its generality across domains demonstrates its potential as a standardized framework for assessing knowledge retention beyond task-specific performance measures. Full article

► Show Figures

Figure 1

34 pages, 8070 KB

Open AccessArticle

AI-Enhanced Rescue Drone with Multi-Modal Vision and Cognitive Agentic Architecture

by Nicoleta Cristina Gaitan, Bianca Ioana Batinas and Calin Ursu

AI 2025, 6(10), 272; https://doi.org/10.3390/ai6100272 - 20 Oct 2025

Abstract

In post-disaster search and rescue (SAR) operations, unmanned aerial vehicles (UAVs) are essential tools, yet the large volume of raw visual data often overwhelms human operators by providing isolated, context-free information. This paper presents an innovative system with a novel cognitive–agentic architecture that [...] Read more.

In post-disaster search and rescue (SAR) operations, unmanned aerial vehicles (UAVs) are essential tools, yet the large volume of raw visual data often overwhelms human operators by providing isolated, context-free information. This paper presents an innovative system with a novel cognitive–agentic architecture that transforms the UAV from an intelligent tool into a proactive reasoning partner. The core innovation lies in the LLM’s ability to perform high-level semantic reasoning, logical validation, and robust self-correction through internal feedback loops. A visual perception module based on a custom-trained YOLO11 model feeds the cognitive core, which performs contextual analysis and hazard assessment, enabling a complete perception–reasoning–action cycle. The system also incorporates a physical payload delivery module for first-aid supplies, which acts on prioritized, actionable recommendations to reduce operator cognitive load and accelerate victim assistance. This work, therefore, presents the first developed LLM-driven architecture of its kind, transforming a drone from a mere data-gathering tool into a proactive reasoning partner and demonstrating a viable path toward reducing operator cognitive load in critical missions. Full article

(This article belongs to the Special Issue Application of Artificial Intelligence on Structures Subjected to Natural Hazards)

► Show Figures

Figure 1

17 pages, 1775 KB

Open AccessArticle

AI-Driven Analysis for Real-Time Detection of Unstained Microscopic Cell Culture Images

by Kathrin Hildebrand, Tatiana Mögele, Dennis Raith, Maria Kling, Anna Rubeck, Stefan Schiele, Eelco Meerdink, Avani Sapre, Jonas Bermeitinger, Martin Trepel and Rainer Claus

AI 2025, 6(10), 271; https://doi.org/10.3390/ai6100271 - 18 Oct 2025

Abstract

Staining-based assays are widely used for cell analysis but are invasive, alter physiology, and prevent longitudinal monitoring. Label-free, morphology-based approaches could enable real-time, non-invasive drug testing, yet detection of subtle and dynamic changes has remained difficult. We developed a deep learning framework for [...] Read more.

Staining-based assays are widely used for cell analysis but are invasive, alter physiology, and prevent longitudinal monitoring. Label-free, morphology-based approaches could enable real-time, non-invasive drug testing, yet detection of subtle and dynamic changes has remained difficult. We developed a deep learning framework for stain-free monitoring of leukemia cell cultures using automated bright-field microscopy in a semi-automated culture system (AICE3, LABMaiTE, Augsburg, Germany). YOLOv8 models were trained on images from K562, HL-60, and Kasumi-1 cells, using an NVIDIA DGX A100 GPU for training and tested on GPU and CPU environments for real-time performance. Comparative benchmarking with RT-DETR and interpretability analyses using Eigen-CAM and radiomics (RedTell) was performed. YOLOv8 achieved high accuracy (mAP@0.5 > 98%, precision/sensitivity > 97%), with reproducibility confirmed on an independent dataset from a second laboratory and an AICE3 setup. The model distinguished between morphologically similar leukemia lines and reliably classified untreated versus differentiated K562 cells (hemin-induced erythroid and PMA-induced megakaryocytic; >95% accuracy). Incorporation of decitabine-treated cells demonstrated applicability to drug testing, revealing treatment-specific and intermediate phenotypes. Longitudinal monitoring captured culture- and time-dependent drift, enabling separation of temporal from drug-induced changes. Radiomics highlighted interpretable features such as size, elongation, and texture, but with lower accuracy than the deep learning approach. To our knowledge, this is the first demonstration that deep learning resolves subtle, drug-induced, and time-dependent morphological changes in unstained leukemia cells in real time. This approach provides a robust, accessible framework for label-free longitudinal drug testing and establishes a foundation for future autonomous, feedback-driven platforms in precision oncology. Ultimately, this approach may also contribute to more precise and adaptive clinical decision-making, advancing the field of personalized medicine. Full article

(This article belongs to the Special Issue AI in Bio and Healthcare Informatics)

► Show Figures

Figure 1

25 pages, 7385 KB

Open AccessArticle

Reducing Annotation Effort in Semantic Segmentation Through Conformal Risk Controlled Active Learning

by Can Erhan and Nazim Kemal Ure

AI 2025, 6(10), 270; https://doi.org/10.3390/ai6100270 - 18 Oct 2025

Abstract

Modern semantic segmentation models require extensive pixel-level annotations, creating a significant barrier to practical deployment as labeling a single image can take hours of human effort. Active learning offers a promising way to reduce annotation costs through intelligent sample selection. However, existing methods [...] Read more.

Modern semantic segmentation models require extensive pixel-level annotations, creating a significant barrier to practical deployment as labeling a single image can take hours of human effort. Active learning offers a promising way to reduce annotation costs through intelligent sample selection. However, existing methods rely on poorly calibrated confidence estimates, making uncertainty quantification unreliable. We introduce Conformal Risk Controlled Active Learning (CRC-AL), a novel framework that provides statistical guarantees on uncertainty quantification for semantic segmentation, in contrast to heuristic approaches. CRC-AL calibrates class-specific thresholds via conformal risk control, transforming softmax outputs into multi-class prediction sets with formal guarantees. From these sets, our approach derives complementary uncertainty representations: risk maps highlighting uncertain regions and class co-occurrence embeddings capturing semantic confusions. A physics-inspired selection algorithm leverages these representations with a barycenter-based distance metric that balances uncertainty and diversity. Experiments on Cityscapes and PascalVOC2012 show CRC-AL consistently outperforms baseline methods, achieving 95% of fully supervised performance with only 30% of labeled data, making semantic segmentation more practical under limited annotation budgets. Full article

(This article belongs to the Section AI Systems: Theory and Applications)

► Show Figures

Graphical abstract

30 pages, 6035 KB

Open AccessArticle

Bio-Inspired Optimization of Transfer Learning Models for Diabetic Macular Edema Classification

by A. M. Mutawa, Khalid Sabti, Bibin Shalini Sundaram Thankaleela and Seemant Raizada

AI 2025, 6(10), 269; https://doi.org/10.3390/ai6100269 - 17 Oct 2025

Abstract

Diabetic Macular Edema (DME) poses a significant threat to vision, often leading to permanent blindness if not detected and addressed swiftly. Existing manual diagnostic methods are arduous and inconsistent, highlighting the pressing necessity for automated, accurate, and personalized solutions. This study presents a [...] Read more.

Diabetic Macular Edema (DME) poses a significant threat to vision, often leading to permanent blindness if not detected and addressed swiftly. Existing manual diagnostic methods are arduous and inconsistent, highlighting the pressing necessity for automated, accurate, and personalized solutions. This study presents a novel methodology for diagnosing DME and categorizing choroidal neovascularization (CNV), drusen, and normal conditions from fundus images through the application of transfer learning models and bio-inspired optimization methodologies. The methodology utilizes advanced transfer learning architectures, including VGG16, VGG19, ResNet50, EfficientNetB7, EfficientNetV2-S, InceptionV3, and InceptionResNetV2, for analyzing both binary and multi-class Optical Coherence Tomography (OCT) datasets. We combined the OCT datasets OCT2017 and OCTC8 to create a new dataset for our study. The parameters, including learning rate, batch size, and dropout layer of the fully connected network, are further adjusted using the bio-inspired Particle Swarm Optimization (PSO) method, in conjunction with thorough preprocessing. Explainable AI approaches, especially Shapley additive explanations (SHAP), provide transparent insights into the model’s decision-making processes. Experimental findings demonstrate that our bio-inspired optimized transfer learning Inception V3 significantly surpasses conventional deep learning techniques for DME classification, as evidenced by enhanced metrics including the accuracy, precision, recall, F1-score, misclassification rate, Matthew’s correlation coefficient, intersection over union, and kappa coefficient for both binary and multi-class scenarios. The accuracy achieved is approximately 98% in binary classification and roughly 90% in multi-class classification with the Inception V3 model. The integration of contemporary transfer learning architectures with nature-inspired PSO enhances diagnostic precision to approximately 95% in multi-class classification, while also improving interpretability and reliability, which are crucial for clinical implementation. This research promotes the advancement of more precise, personalized, and timely diagnostic and therapeutic strategies for Diabetic Macular Edema, aiming to avert vision loss and improve patient outcomes. Full article

(This article belongs to the Special Issue Artificial Intelligence in Biomedical Engineering: Challenges and Developments)

► Show Figures

Figure 1

23 pages, 3017 KB

Open AccessArticle

Improving Forecasting Accuracy of Stock Market Indices Utilizing Attention-Based LSTM Networks with a Novel Asymmetric Loss Function

by Shlok Sagar Rajpal, Rajesh Mahadeva, Amit Kumar Goyal and Varun Sarda

AI 2025, 6(10), 268; https://doi.org/10.3390/ai6100268 - 17 Oct 2025

Abstract

This study presents a novel approach to financial time series forecasting by introducing asymmetric loss functions. This is specifically designed to enhance directional accuracy across major stock indices (S&P 500, DJI, and NASDAQ Composite) over a 33-year time period. We integrate these loss [...] Read more.

This study presents a novel approach to financial time series forecasting by introducing asymmetric loss functions. This is specifically designed to enhance directional accuracy across major stock indices (S&P 500, DJI, and NASDAQ Composite) over a 33-year time period. We integrate these loss functions into an attention-based Long Short-Term Memory (LSTM) framework. The proposed loss functions are evaluated against traditional metrics such as Mean Squared Error (MSE), Mean Absolute Error (MAE), and other recent research-based losses. Our approach consistently achieves superior test-time directional accuracy, with gains of 3.4–6.1 percentage points over MSE/MAE and 2.0–4.5 percentage points over prior asymmetric losses, which are either non-differentiable or require extensive hyperparameter tuning. Furthermore, proposed models also achieve an F1 score of up to 0.74, compared to 0.63–0.68 for existing methods, and maintain competitive MAE values within 0.01–0.03 of the baseline. The optimized asymmetric loss functions improve specificity to above 0.62 and ensure a better balance between precision and recall. These results underscore the potential of directionally aware loss design to enhance AI-driven financial forecasting systems. Full article

(This article belongs to the Special Issue AI in Finance: Leveraging AI to Transform Financial Services)

► Show Figures

Figure 1

22 pages, 3339 KB

Open AccessArticle

An AutoML Algorithm: Multiple-Steps Ahead Forecasting of Correlated Multivariate Time Series with Anomalies Using Gated Recurrent Unit Networks

by Ying Su and Morgan C. Wang

AI 2025, 6(10), 267; https://doi.org/10.3390/ai6100267 - 14 Oct 2025

Abstract

Multiple time series forecasting is critical in domains such as energy management, economic analysis, web traffic prediction and air pollution monitoring to support effective resource planning. Traditional statistical learning methods, including Vector Autoregression (VAR) and Vector Autoregressive Integrated Moving Average (VARIMA), struggle with [...] Read more.

Multiple time series forecasting is critical in domains such as energy management, economic analysis, web traffic prediction and air pollution monitoring to support effective resource planning. Traditional statistical learning methods, including Vector Autoregression (VAR) and Vector Autoregressive Integrated Moving Average (VARIMA), struggle with nonstationarity, temporal dependencies, inter-series correlations, and data anomalies such as trend shifts, seasonal variations, and missing data. Furthermore, their effectiveness in multi-step ahead forecasting is often limited. This article presents an Automated Machine Learning (AutoML) framework that provides an end-to-end solution for researchers who lack in-depth knowledge of time series forecasting or advanced programming skills. This framework utilizes Gated Recurrent Unit (GRU) networks, a variant of Recurrent Neural Networks (RNNs), to tackle multiple correlated time series forecasting problems, even in the presence of anomalies. To reduce complexity and facilitate the AutoML process, many model parameters are pre-specified, thereby requiring minimal tuning. This design enables efficient and accurate multi-step forecasting while addressing issues including missing values and structural shifts. We also examine the advantages and limitations of GRU-based RNNs within the AutoML system for multivariate time series forecasting. Model performance is evaluated using multiple accuracy metrics across various forecast horizons. The empirical results confirm our proposed approach’s ability to capture inter-series dependencies and handle anomalies in long-range forecasts. Full article

(This article belongs to the Special Issue Machine Learning in Action: Practical Applications and Emerging Trends)

► Show Figures

Figure 1

24 pages, 1699 KB

Open AccessArticle

Efficient Sparse MLPs Through Motif-Level Optimization Under Resource Constraints

by Xiaotian Chen, Hongyun Liu and Seyed Sahand Mohammadi Ziabari

AI 2025, 6(10), 266; https://doi.org/10.3390/ai6100266 - 9 Oct 2025

Abstract

We study motif-based optimization for sparse multilayer perceptrons (MLPs), where weights are shared and updated at the level of small neuron groups (‘motifs’) rather than individual connections. Building on Sparse Evolutionary Training (SET), our approach reduces the number of unique parameters and redundant [...] Read more.

We study motif-based optimization for sparse multilayer perceptrons (MLPs), where weights are shared and updated at the level of small neuron groups (‘motifs’) rather than individual connections. Building on Sparse Evolutionary Training (SET), our approach reduces the number of unique parameters and redundant multiply–accumulate operations by exploiting block-structured sparsity. Across Fashion-MNIST and a lung X-ray dataset, our Motif-SET improves training/inference efficiency with modest accuracy trade-offs, and we provide a principled recipe to choose motif size based on accuracy and efficiency budgets. We further compare against representative modern sparse training and compression methods, analyze failure modes such as overly large motifs, and outline real-world constraints on mobile/embedded targets. Our results and ablations indicate that motif size

m = 2

often offers a strong balance between compute and accuracy under resource constraints. Full article

► Show Figures

Figure 1

20 pages, 3126 KB

Open AccessArticle

Few-Shot Image Classification Algorithm Based on Global–Local Feature Fusion

by Lei Zhang, Xinyu Yang, Xiyuan Cheng, Wenbin Cheng and Yiting Lin

AI 2025, 6(10), 265; https://doi.org/10.3390/ai6100265 - 9 Oct 2025

Abstract

Few-shot image classification seeks to recognize novel categories from only a handful of labeled examples, but conventional metric-based methods that rely mainly on global image features often produce unstable prototypes under extreme data scarcity, while local-descriptor approaches can lose context and suffer from [...] Read more.

Few-shot image classification seeks to recognize novel categories from only a handful of labeled examples, but conventional metric-based methods that rely mainly on global image features often produce unstable prototypes under extreme data scarcity, while local-descriptor approaches can lose context and suffer from inter-class local-pattern overlap. To address these limitations, we propose a Global–Local Feature Fusion network that combines a frozen, pretrained global feature branch with a self-attention based multi-local feature fusion branch. Multiple random crops are encoded by a shared backbone (ResNet-12), projected to Query/Key/Value embeddings, and fused via scaled dot-product self-attention to suppress background noise and highlight discriminative local cues. The fused local representation is concatenated with the global feature to form robust class prototypes used in a prototypical-network style classifier. On four benchmarks, our method achieves strong improvements: Mini-ImageNet 70.31% ± 0.20 (1-shot)/85.91% ± 0.13 (5-shot), Tiered-ImageNet 73.37% ± 0.22/87.62% ± 0.14, FC-100 47.01% ± 0.20/64.13% ± 0.19, and CUB-200-2011 82.80% ± 0.18/93.19% ± 0.09, demonstrating consistent gains over competitive baselines. Ablation studies show that (1) naive local averaging improves over global-only baselines, (2) self-attention fusion yields a large additional gain (e.g., +4.50% in 1-shot on Mini-ImageNet), and (3) concatenating global and fused local features gives the best overall performance. These results indicate that explicitly modeling inter-patch relations and fusing multi-granularity cues produces markedly more discriminative prototypes in few-shot regimes. Full article

► Show Figures

Figure 1

15 pages, 3254 KB

Open AccessArticle

Rodent Social Behavior Recognition Using a Global Context-Aware Vision Transformer Network

by Muhammad Imran Sharif, Doina Caragea and Ahmed Iqbal

AI 2025, 6(10), 264; https://doi.org/10.3390/ai6100264 - 8 Oct 2025

Abstract

Animal behavior recognition is an important research area that provides insights into areas such as neural functions, gene mutations, and drug efficacy, among others. The manual coding of behaviors based on video recordings is labor-intensive and prone to inconsistencies and human error. Machine [...] Read more.

Animal behavior recognition is an important research area that provides insights into areas such as neural functions, gene mutations, and drug efficacy, among others. The manual coding of behaviors based on video recordings is labor-intensive and prone to inconsistencies and human error. Machine learning approaches have been used to automate the analysis of animal behavior with promising results. Our work builds on existing developments in animal behavior analysis and state-of-the-art approaches in computer vision to identify rodent social behaviors. Specifically, our proposed approach, called Vision Transformer for Rat Social Interactions (ViT-RSI), leverages the existing Global Context Vision Transformer (GC-ViT) architecture to identify rat social interactions. Experimental results using five behaviors of the publicly available Rat Social Interaction (RatSI) dataset show that the ViT-RatSI approach can accurately identify rat social interaction behaviors. When compared with prior results from the literature, the ViT-RatSI approach achieves best results for four out of five behaviors, specifically for the “Approaching”, “Following”, “Moving away”, and “Solitary” behaviors, with F1 scores of 0.81, 0.81, 0.86, and 0.94, respectively. Full article

(This article belongs to the Special Issue AI in Bio and Healthcare Informatics)

► Show Figures

Figure 1

20 pages, 1740 KB

Open AccessArticle

Cross-Modal Alignment Enhancement for Vision–Language Tracking via Textual Heatmap Mapping

by Wei Xu, Gu Geng, Xinming Zhang and Di Yuan

AI 2025, 6(10), 263; https://doi.org/10.3390/ai6100263 - 8 Oct 2025

Abstract

Single-object vision–language tracking has become an important research topic due to its potential in applications such as intelligent surveillance and autonomous driving. However, existing cross-modal alignment methods typically rely on contrastive learning and struggle to effectively address semantic ambiguity or the presence of [...] Read more.

Single-object vision–language tracking has become an important research topic due to its potential in applications such as intelligent surveillance and autonomous driving. However, existing cross-modal alignment methods typically rely on contrastive learning and struggle to effectively address semantic ambiguity or the presence of multiple similar objects. This study aims to explore how to achieve more robust vision–language alignment under these challenging conditions, thereby achieving accurate object localization. To this end, we propose a text heatmap mapping (THM) module that enhances the spatial guidance of textual cues in tracking. The THM module integrates visual and language features and generates semantically aware heatmaps, enabling the tracker to focus on the most relevant regions while suppressing distractors. This framework, developed based on UVLTrack, combines a visual transformer with a pre-trained language encoder. The proposed method is evaluated on benchmark datasets such as OTB99, LaSOT, and TNL2K. The main contribution of this paper is the introduction of a novel spatial alignment mechanism for multimodal tracking and its effectiveness on various tracking benchmarks. Results demonstrate that the THM-based tracker improves robustness to semantic ambiguity and multi-instance interference, outperforming baseline frameworks. Full article

(This article belongs to the Special Issue Artificial Intelligence-Based Object Detection and Tracking: Theory and Applications)

► Show Figures

Figure 1

27 pages, 9738 KB

Open AccessArticle

Machine Learning Recognition and Phase Velocity Estimation of Atmospheric Gravity Waves from OI 557.7 nm All-Sky Airglow Images

by Rady Mahmoud, Moataz Abdelwahab, Kazuo Shiokawa and Ayman Mahrous

AI 2025, 6(10), 262; https://doi.org/10.3390/ai6100262 - 7 Oct 2025

Abstract

Atmospheric gravity waves (AGWs) are treated as density structure perturbations of the atmosphere and play an important role in atmospheric dynamics. Utilizing All-Sky Airglow Imagers (ASAIs) with OI-Filter 557.7 nm, AGW phase velocity and propagation direction were extracted using classified images by visual [...] Read more.

Atmospheric gravity waves (AGWs) are treated as density structure perturbations of the atmosphere and play an important role in atmospheric dynamics. Utilizing All-Sky Airglow Imagers (ASAIs) with OI-Filter 557.7 nm, AGW phase velocity and propagation direction were extracted using classified images by visual inspection, where airglow images were collected from the OMTI network at Shigaraki (34.85 E, 134.11 N) from October 1998 to October 2002. Nonetheless, a large dataset of airglow images are processed and classified for studying AGW seasonal variation in the middle atmosphere. In this article, a machine learning-based approach for image recognition of AGWs from ASAIs is suggested. Consequently, three convolutional neural networks (CNNs), namely AlexNet, GoogLeNet, and ResNet-50, are considered. Out of 13,201 deviated images, 1192 very weak/unclear AGW signatures were eliminated during the quality control process. All networks were trained and tested by 12,007 classified images which approximately cover the maximum solar cycle during the time-period mentioned above. In the testing phase, AlexNet achieved the highest accuracy of 98.41%. Consequently, estimation of AGW zonal and meridional phase velocities in the mesosphere region by a cascade forward neural network (CFNN) is presented. The CFNN was trained and tested based on AGW and neutral wind data. AGW data were extracted from the classified AGW images by event and spectral methods, where wind data were extracted from the Horizontal Wind Model (HWM) as well as the middle and upper atmosphere radar in Shigaraki. As a result, the estimated phase velocities were determined with correlation coefficient (R) above 0.89 in all training and testing phases. Finally, a comparison with the existing studies confirms the accuracy of our proposed approaches in addition to AGW velocity forecasting. Full article

► Show Figures

Figure 1

19 pages, 1858 KB

Open AccessArticle

Color Space Comparison of Isolated Cervix Cells for Morphology Classification

by Irari Jiménez-López, José E. Valdez-Rodríguez and Marco A. Moreno-Armendáriz

AI 2025, 6(10), 261; https://doi.org/10.3390/ai6100261 - 7 Oct 2025

Abstract

Cervical cytology processing involves the morphological analysis of cervical cells to detect abnormalities. In recent years, machine learning and deep learning algorithms have been explored to automate this process. This study investigates the use of color space transformations as a preprocessing technique to [...] Read more.

Cervical cytology processing involves the morphological analysis of cervical cells to detect abnormalities. In recent years, machine learning and deep learning algorithms have been explored to automate this process. This study investigates the use of color space transformations as a preprocessing technique to reorganize visual information and improve classification performance using isolated cell images. Twelve color space transformations were compared, including RGB, CMYK, HSV, Grayscale, CIELAB, YUV, the individual RGB channels, and combinations of these channels (RG, RB, and GB). Two classification strategies were employed: binary classification (normal vs. abnormal) and five-class classification. The SIPaKMeD dataset was used, with images resized to

256 \times 256

pixels via zero-padding. Data augmentation included random flipping and ±10° rotations applied with a 50% probability, followed by normalization. A custom CNN architecture was developed, comprising four convolutional layers followed by two fully connected layers and an output layer. The model achieved average precision, recall, and F1-score values of 91.39%, 91.34%, and 91.31% for the five-class case, respectively, and 99.69%, 96.68%, and 96.89% for the binary classification, respectively; these results were compared with a VGG-16 network. Furthermore, CMYK, HSV, and the RG channel combination consistently outperformed other color spaces, highlighting their potential to enhance classification accuracy. Full article

(This article belongs to the Special Issue AI in Bio and Healthcare Informatics)

► Show Figures

Figure 1

80 pages, 7623 KB

Open AccessSystematic Review

From Illusion to Insight: A Taxonomic Survey of Hallucination Mitigation Techniques in LLMs

by Ioannis Kazlaris, Efstathios Antoniou, Konstantinos Diamantaras and Charalampos Bratsas

AI 2025, 6(10), 260; https://doi.org/10.3390/ai6100260 - 3 Oct 2025

Abstract

Large Language Models (LLMs) exhibit remarkable generative capabilities but remain vulnerable to hallucinations—outputs that are fluent yet inaccurate, ungrounded, or inconsistent with source material. To address the lack of methodologically grounded surveys, this paper introduces a novel method-oriented taxonomy of hallucination mitigation strategies [...] Read more.

Large Language Models (LLMs) exhibit remarkable generative capabilities but remain vulnerable to hallucinations—outputs that are fluent yet inaccurate, ungrounded, or inconsistent with source material. To address the lack of methodologically grounded surveys, this paper introduces a novel method-oriented taxonomy of hallucination mitigation strategies in text-based LLMs. The taxonomy organizes over 300 studies into six principled categories: Training and Learning Approaches, Architectural Modifications, Input/Prompt Optimization, Post-Generation Quality Control, Interpretability and Diagnostic Methods, and Agent-Based Orchestration. Beyond mapping the field, we identify persistent challenges such as the absence of standardized evaluation benchmarks, attribution difficulties in multi-method systems, and the fragility of retrieval-based methods when sources are noisy or outdated. We also highlight emerging directions, including knowledge-grounded fine-tuning and hybrid retrieval–generation pipelines integrated with self-reflective reasoning agents. This taxonomy provides a methodological framework for advancing reliable, context-sensitive LLM deployment in high-stakes domains such as healthcare, law, and defense. Full article

(This article belongs to the Section AI Systems: Theory and Applications)

► Show Figures

Graphical abstract

Journal Description

AI

Latest Articles

Journal Menu

Journal Browser

Highly Accessed Articles

Latest Books

E-Mail Alert

News

Topics

Conferences

Special Issues

Further Information

Guidelines

MDPI Initiatives

Follow MDPI