AI | April 2026 - Browse Articles

16 pages, 5098 KB

Open Access Article

Etch-ViGen: A Video Generation Model for Etching Simulation

by Li Ding, Hua Shao, Zhiqiang Li, Nan Liu, Rui Chen and Zhenjie Yao

AI 2026, 7(4), 149; https://doi.org/10.3390/ai7040149 - 21 Apr 2026

Viewed by 1574

With the scaling down of integrated circuit dimensions and the increasing complexity of transistor structures, the role of etching in manufacturing has become increasingly critical. We propose an etching simulation approach based on a video generation model, which models the evolution of the etching process as a video generation task. By embedding frames into quantized latent codeword representations using VQ-VAE (Vector Quantized Variational Autoencoder), injecting physical conditions with a CLIP projection layer, and leveraging a temporal autoregressive prediction model, we propose a generative model of the etching process. We validate the effectiveness of our model on both simulated and experimental data. Our approach achieves a 6000× speedup over the Monte Carlo method while reducing the simulation MAE (Mean Absolute Error) by 14.4% compared with the state-of-the-art video model. Furthermore, results generated by our video-based model show strong agreement with experimental data.
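
The quantization step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a toy two-dimensional codebook and squared Euclidean distance; the function name `quantize`, the codebook values, and the latent vectors are hypothetical and are not taken from the article.

```python
def quantize(vector, codebook):
    """Return (index, codeword) of the codebook entry nearest to `vector`.

    Nearest is measured by squared Euclidean distance, the usual choice
    for vector quantization. Names and values here are illustrative only.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
    return index, codebook[index]


# Hypothetical codebook with three 2-D codewords.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]

# Hypothetical continuous latents, e.g. encoder outputs for three patches.
latents = [[0.1, -0.2], [0.9, 1.2], [0.2, 1.9]]

# Each latent is replaced by the index of its nearest codeword.
codes = [quantize(z, codebook)[0] for z in latents]
```

In a setup of this kind, a sequence model would operate on the discrete indices in `codes` rather than on raw pixels; how the article's model does this is not specified in the abstract.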

(This article belongs to the Section AI Systems: Theory and Applications)

27 pages, 1901 KB

Open Access Article

Comparative Forecasting and Misclassification Analysis Using Health Survey Data

by Ermioni Traka, George Papageorgiou, Georgios Mantzavinis and Christos Tjortjis

AI 2026, 7(4), 148; https://doi.org/10.3390/ai7040148 - 20 Apr 2026

Viewed by 1649

Abstract

Background: Accurate mortality prediction remains a major challenge in public health due to the complex interactions among demographic, socioeconomic, behavioral, and medical factors. This problem is particularly relevant for identifying high-risk groups and improving preventive healthcare strategies. While existing studies demonstrate strong predictive performance, they mainly rely on clinically structured data and focus on model performance. Challenges such as misclassification and atypical cases remain less explored. Methods: Using the Integrated Public Use Microdata Series National Health Interview Survey (IPUMS-NHIS) 2010 and 2015 datasets (193,765 records, 104 features), this study investigates mortality prediction through comparative Machine Learning. Data preprocessing included feature engineering, categorical encoding, and removal of missing entries. Class imbalance was addressed using SMOTE and SMOTE-ENN resampling, followed by hyperparameter tuning. Three models (Logistic Regression, Random Forest, and XGBoost) were trained to classify mortality, with recall prioritized to ensure accurate identification of deceased cases. Results: XGBoost achieved the best performance (Recall = 69%, F1 = 0.39, AUC = 0.92), outperforming other models in balancing sensitivity and specificity. Feature importance and permutation analyses highlighted age, employment status, self-reported health, and lifestyle indicators as key predictors. Misclassification analysis combined with Isolation Forest revealed atypical profiles not captured by standard models. Conclusions: The findings underscore XGBoost’s effectiveness and demonstrate the value of integrating anomaly detection with classification to improve mortality prediction and inform public health planning.

22 pages, 2108 KB

Open Access Review

A Short Review of Arabic Aspect-Based Sentiment Analysis: Methods, Challenges and Future Directions

by Hamza Youseef, Luis Gonzaga Baca Ruiz, David Criado Ramón and María del Carmen Pegalajar Jimenez

AI 2026, 7(4), 147; https://doi.org/10.3390/ai7040147 - 19 Apr 2026

Viewed by 1886

Abstract

The need for Arabic Aspect-Based Sentiment Analysis (ABSA) has grown steadily alongside the expansion of digital content, while the linguistic complexity of Modern Standard Arabic and its diverse dialects introduces significant challenges. However, progress in the field remains constrained by methodological fragmentation, inconsistent task definitions, heterogeneous datasets, and non-standardized evaluation practices. Based on a systematic analysis of 57 studies, this work presents an analytical and interpretive review that moves beyond performance-oriented surveys to examine the methodological foundations of Arabic ABSA research. The review follows a rigorous and transparent study selection process and applies a structured analytical framework to analyze task formulations, dataset characteristics, modeling approaches and evaluation strategies. Our findings reveal persistent challenges, including ambiguous aspect definitions, insufficiently documented annotation protocols, structural annotation biases, and limited robustness across domains and dialects. A heavy reliance on Transformer-based architectures and new Arabic foundation models can create an illusion of progress. Researchers often evaluate these models on small and homogeneous datasets. Consequently, strong in-domain performance obscures limited cross-domain and cross-dialectal generalizability. This study concludes by outlining actionable research directions, emphasizing clearer task standardization, more rigorous annotation guidelines, unified evaluation, and broader dialectal coverage to enhance reproducibility and scalability in Arabic ABSA systems.

23 pages, 2954 KB

Open Access Article

VGPO-MCTS: Distilling Step-Level Supervision from Value-Guided Tree Search for Mathematical Reasoning

by Pin Wu, Yufei Zhu and Huiyan Wang

AI 2026, 7(4), 146; https://doi.org/10.3390/ai7040146 - 17 Apr 2026

Viewed by 1290

Abstract

Large language models (LLMs) are increasingly used in applied intelligent systems, but mid-sized models still lag on mathematical reasoning, partly because reliable step-level supervision is scarce. Many existing remedies rely on costly human annotation, stronger teacher models, or heavy training pipelines, which limits practical adoption. We propose VGPO-MCTS (Value-Guided Group-wise Policy Optimization over Monte Carlo Tree Search), a search-and-distillation framework that constructs reusable step-level supervision from datasets that provide only problems and final answers. VGPO-MCTS augments a frozen backbone with (i) a lightweight value model that scores candidate reasoning states formed by a reasoning prefix and its candidate next step, and (ii) a policy updated with parameter-efficient adaptation. During search, the value model guides tree expansion and selection, while verified outcomes are propagated backward to correct node utilities. The corrected search trees are then distilled into two complementary datasets: a value regression dataset for value learning and group-wise sibling candidate sets for GRPO-style policy optimization. Experiments on GSM8K and the MATH dataset with ChatGLM3-6B and SciGLM-6B show stable round-wise improvements in final-answer exact match under a lightweight adaptation setting. After three rounds of self-training, the proposed framework improves performance by about 6.3 percentage points on GSM8K and about 3.9 percentage points on MATH across the two backbones.

58 pages, 2450 KB

Open Access Article

Quantum-Inspired Hybrid Bald Eagle-Ukari Algorithm with Reinforcement Learning for Performance Optimization of Conical Solar Distillers with Sand-Filled Copper Fins: A Novel Bio-Inspired Approach

by Mohamed Loey, Mostafa Elbaz, Hanaa Salem Marie and Heba M. Khalil

AI 2026, 7(4), 145; https://doi.org/10.3390/ai7040145 - 17 Apr 2026

Cited by 1 | Viewed by 1334

Abstract

This study introduces a novel Quantum-Inspired Hybrid Bald Eagle-Ukari Algorithm with Reinforcement Learning (QI-HBEUA-RL) for comprehensive optimization of conical solar distillers equipped with sand-filled copper conical fins. The proposed algorithm synergistically combines quantum computing principles (superposition and entanglement), bio-inspired metaheuristics (Bald Eagle Search and Ukari Algorithm), and reinforcement learning mechanisms to achieve unprecedented optimization performance in complex thermal-hydraulic systems. The QI-HBEUA-RL framework employs quantum-encoded population representation, enabling simultaneous exploration of multiple solution states, while reinforcement learning dynamically adjusts algorithmic parameters based on search landscape characteristics and historical performance data. Experimental validation tested seven distiller configurations in El-Oued, Algeria, under controlled conditions (7.85 kWh/m²/day solar radiation, 42.2 °C ambient temperature). The optimal configuration of copper conical fins with 14 g sand at 0 cm spacing achieved: daily productivity of 7.75 L/m²/day (+61.46% improvement over conventional design), thermal efficiency of 61.9%, exergy efficiency of 4.02%, and economic payback period of 5.8 days. Comprehensive algorithm comparison against six state-of-the-art multi-objective optimizers (NSGA-II, MOEA/D, MOPSO, MOGWO, MOHHO) across 30 independent runs demonstrated statistically significant superiority (p < 0.001, Wilcoxon test). QI-HBEUA-RL achieved 7.42% improvement in hypervolume indicator, 29.35% reduction in inverted generational distance, and 19.49% better solution spacing. Generalization validation on seven benchmark problems (ZDT1-6, DTLZ2, DTLZ7) and three renewable energy applications confirmed algorithm robustness across diverse problem types. Three real-world case studies (remote village water supply with a 238:1 benefit–cost ratio, an industrial facility with 100% energy reduction, and emergency relief with 740× cost savings) validate practical implementation viability. This research advances solar thermal desalination technology and multi-objective optimization methodologies, providing validated solutions for sustainable freshwater production in water-scarce regions.

15 pages, 629 KB

Open Access Article

Tiny Neural Receiver: Enabling On-Device Learning for Scalable and Adaptive 6G Devices

by Iñigo Bilbao, Eneko Iradier, Jon Montalban, Marta Fernández, Iñaki Eizmendi and Pablo Angueira

AI 2026, 7(4), 144; https://doi.org/10.3390/ai7040144 - 17 Apr 2026

Viewed by 1232

Abstract

The evolution toward 6G communications requires integrating Tiny Machine Learning (TinyML) principles to enable intelligent, energy-efficient, and adaptable signal processing at the network edge. However, current receiver architectures face a fundamental trade-off: classical model-driven designs, while naturally efficient due to their basis in communication theory, lack the flexibility to adapt to varying channel conditions. Meanwhile, fully data-driven deep-learning-based approaches exceed the stringent resource constraints of TinyML. This paper introduces the tiny neural receiver (TNR), a pioneering architecture that bridges these paradigms by integrating model-based signal processing with lightweight neural optimization to overcome this challenge. The TNR’s primary contribution is its unique hybrid design, which combines the efficiency and interpretability of traditional theory-based receivers with the ability to adapt to different contexts using trainable neural components. This integration occurs within resource budgets that align with TinyML specifications. Experimental results show that the TNR achieves a 5 dB SNR reduction at a target block error rate of $10^{-4}$. The reported 5 dB SNR gain is a direct result of our resource-aware design framework, which selectively applies lightweight neural optimization to only the most impactful receiver blocks (channel estimation and decoding) to maximize gain without exceeding TinyML complexity limits. This achievement is further supported by an end-to-end training protocol that uses 15,000 iterations of over-the-air data to fine-tune these parameters for the specific static 3.5 GHz propagation channel and OFDM configuration evaluated. Furthermore, the TNR’s modular design enables flexible deployment across a range of 6G scenarios, from mobile broadband to mission-critical IoT. This establishes the TNR as a promising framework for AI-native 6G receivers.

(This article belongs to the Special Issue Deep Learning Approaches for PHY/MAC Wireless Communication and AI Integration)

26 pages, 1442 KB

Open Access Article

Hybrid Loss-Based Deep Learning Framework Using EfficientNet-B3 for Multi-Class Colorectal Cancer Detection

by Anusha Nallamalla and Chandrakanta Mahanty

AI 2026, 7(4), 143; https://doi.org/10.3390/ai7040143 - 16 Apr 2026

Viewed by 1249

Abstract

Diagnosis of colorectal cancer (CRC) primarily relies on histopathological examination of hematoxylin and eosin-stained tissue sections; however, manual interpretation is time-consuming, subjective, and increasingly impractical given the rapid growth of digital pathology data. We introduce a hybrid loss-based learning framework for multi-class colorectal histopathology image classification that improves class-balanced performance without increasing model complexity. Several EfficientNet variants were first evaluated to establish a strong baseline, and EfficientNet-B3 was selected based on the validation Matthews Correlation Coefficient (MCC). Building on this backbone, we propose a hybrid loss function that combines weighted cross-entropy and focal loss, addressing global class imbalance while also focusing on hard-to-classify samples. Experiments on a large-scale colorectal histopathology dataset show that the proposed Hybrid-B3 model significantly improves on the baseline settings. Hybrid-B3 achieves a test accuracy of 99.83% and strong class-balanced performance, with a balanced accuracy and G-Mean of 99.85%. Statistical validation using bootstrap confidence intervals and paired significance tests confirms that the improvements are not due to chance. The proposed solution shows that loss-function optimization alone can improve robustness and reliability in computational pathology, yielding a practical and scalable solution for real-world colorectal cancer diagnostic support.
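
A hybrid of weighted cross-entropy and focal loss, as described in this abstract, might look like the following sketch for a single sample. The abstract does not say how the two terms are combined, so the convex mixing weight `alpha`, the focusing parameter `gamma`, and the per-class weights are all assumptions made for illustration.

```python
import math


def hybrid_loss(probs, target, class_weights, alpha=0.5, gamma=2.0):
    """Per-sample mix of weighted cross-entropy and focal loss.

    probs         -- predicted class probabilities (sum to 1)
    target        -- index of the true class
    class_weights -- per-class weights, e.g. inverse class frequency
    alpha         -- mixing weight between the two terms (assumed, not from the article)
    gamma         -- focal focusing parameter (assumed, not from the article)
    """
    p_t = probs[target]
    # Weighted cross-entropy: rescales the loss for rare classes.
    weighted_ce = -class_weights[target] * math.log(p_t)
    # Focal loss: down-weights samples the model already classifies well.
    focal = -((1.0 - p_t) ** gamma) * math.log(p_t)
    return alpha * weighted_ce + (1.0 - alpha) * focal


# A confidently correct sample versus a hard one, with uniform class weights.
easy = hybrid_loss([0.9, 0.1], target=0, class_weights=[1.0, 1.0])
hard = hybrid_loss([0.6, 0.4], target=0, class_weights=[1.0, 1.0])
```

With `gamma=0` and unit class weights, both terms reduce to plain cross-entropy, which is a convenient sanity check for an implementation of this form.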

(This article belongs to the Special Issue AI in Bio and Healthcare Informatics)

23 pages, 5230 KB

Open Access Review

Mapping the LLM Landscape: A Cross-Family Survey of Architectures, Alignment Methods, and Benchmark Performance

by Deepshikha Bhati, Fnu Neha, Devi Sri Bandaru, Matthew Weber and Ishan Dilipbhai Gajera

AI 2026, 7(4), 142; https://doi.org/10.3390/ai7040142 - 16 Apr 2026

Viewed by 3321

Abstract

Large Language Models (LLMs) have become foundational to modern Artificial Intelligence (AI), enabling advanced reasoning, multimodal understanding, and scalable human-AI interaction across diverse domains. This survey provides a comprehensive review of major proprietary and open-source LLM families, including GPT, LLaMA 2, Gemini, Claude, DeepSeek, Falcon, and Qwen. It systematically examines architectural advancements such as transformer refinements, mixture-of-experts paradigms, attention optimization, long-context modeling, and multimodal integration. The paper further analyzes alignment and safety mechanisms, encompassing instruction tuning, reinforcement learning from human feedback, and constitutional frameworks, and discusses their implications for controllability, reliability, and responsible deployment. Comparative analysis of training strategies, data curation practices, efficiency optimizations, and application settings highlights key trade-offs among scalability, performance, interpretability, and ethical considerations. Beyond synthesis, the survey introduces a structured taxonomy and a feature-driven comparative study of over 50 reconstructed LLM architectures, complemented by an interactive visualization interface and an open-source implementation to support transparency and reproducibility. Finally, it outlines open challenges and future research directions related to transparency, computational cost, data governance, and societal impact, offering a unified reference for researchers and practitioners developing large-scale AI systems.

34 pages, 6632 KB

Open Access Article

SPICD-Net: A Siamese PointNet Framework for Autonomous Indoor Change Detection in 3D LiDAR Point Clouds

by Dalibor Šeljmeši, Vladimir Brtka, Velibor Ilić, Dalibor Dobrilović, Eleonora Brtka and Višnja Ognjenović

AI 2026, 7(4), 141; https://doi.org/10.3390/ai7040141 - 15 Apr 2026

Viewed by 988

Abstract

Reliable change detection in indoor environments remains a challenge for autonomous robotic systems using 3D LiDAR. Existing methods often require manual annotation, computationally intensive architectures, or focus on outdoor scenes. This paper presents SPICD-Net, a lightweight Siamese PointNet framework for indoor 3D change detection trained exclusively on synthetically generated anomalies, eliminating manual labeling. The framework offers three deployment-oriented contributions: a three-class Siamese formulation separating no-change, changed, and geometrically inconsistent tile pairs; a pre-FPS anomaly injection strategy that aligns synthetic training with inference-time preprocessing; and a stochastic-gated Chamfer-statistics branch that complements learned embeddings with explicit geometric cues under consumer-grade hardware constraints. Evaluated on 14 controlled simulation experiments in an indoor corridor dataset, SPICD-Net achieved aggregated Precision = 0.86, Recall = 0.82, F1-score = 0.84, and Accuracy = 0.96, with zero false positives in the no-change baseline and mean inference time of 22.4 s for a 172-tile map on a single consumer GPU. Additional robustness experiments identified registration accuracy as the main operational prerequisite. A limited real-world validation in one unseen room (four scans, 67 tiles) achieved Precision = 0.583, Recall = 1.000, and F1 = 0.737.
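
The F1 values reported in this abstract follow from the stated precision and recall through the harmonic mean, which the short sketch below recomputes. It works from the rounded figures printed in the abstract, so it only checks consistency to the reported number of decimals; the function name `f1_score` is an illustrative choice.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Aggregated simulation results quoted in the abstract.
simulated = f1_score(0.86, 0.82)

# Limited real-world validation quoted in the abstract.
real_world = f1_score(0.583, 1.000)
```

The gap between the two settings comes entirely from precision: recall stays at 1.000 in the real-world room while precision falls to 0.583, meaning every true change was flagged but a sizeable share of the flagged tiles were false alarms.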

(This article belongs to the Special Issue Artificial Intelligence for Robotic Perception and Planning)

19 pages, 1305 KB

Open Access Article

AI-Driven Identification of Candidate Peptides for Immunotherapy in Non-Obese Diabetic Mice: An In Silico Study

by Irini Doytchinova, Ivan Dimitrov, Mariyana Atanasova, Nikolina M. Mihaylova and Andrey Tchorbanov

AI 2026, 7(4), 140; https://doi.org/10.3390/ai7040140 - 15 Apr 2026

Viewed by 1235

Abstract

Type 1 diabetes (T1D) is an autoimmune disease characterized by T-cell-mediated destruction of pancreatic β-cells. Antigen-specific peptide immunotherapy represents a promising strategy to restore immune tolerance. Reliable identification of relevant T-cell epitopes requires accurate prediction of peptide binding to disease-associated major histocompatibility complex (MHC) molecules. In this study, we developed and validated artificial intelligence (AI)-driven machine learning (ML) predictive models for peptides binding to the NOD mouse-specific MHC class I molecules H-2D^b and H-2K^d and the class II molecule I-A^g7. Balanced datasets of experimentally validated binders and non-binders were compiled, divided into training and test sets, and used to construct position-specific logo models and supervised ML classifiers based on z-scale physicochemical descriptors. External validation demonstrated moderate predictive performance for the logo models (ROC AUC 0.685–0.738), whereas AI models, including Random Forest, Support Vector Machine, and Gradient Boosting, achieved substantially improved discrimination (ROC AUC 0.888–0.906). The validated models were applied to the major T1D autoantigens glutamic acid decarboxylase 65, insulin-1, insulin-2 and zinc transporter 8 and predicted multiple binders, with some overlapping with previously reported immunodominant regions. Selected binders were prioritized for further synthesis and in vivo immunogenicity testing in NOD mice.

(This article belongs to the Special Issue AI in Bio and Healthcare Informatics)

15 pages, 3008 KB

Open Access Article

Leveraging LLMs for Collaborative Ontology Engineering in Parkinson Disease Monitoring and Alerting

by Georgios Bouchouras, Dimitrios Doumanas, Andreas Soularidis, Konstantinos Kotis and George Vouros

AI 2026, 7(4), 139; https://doi.org/10.3390/ai7040139 - 14 Apr 2026

Viewed by 972

Abstract

Ontology engineering plays a critical role in clinical decision support systems for Parkinson’s Disease (PD) monitoring and alerting. While Large Language Models (LLMs) have shown promise in knowledge modeling tasks, their effectiveness in autonomously constructing comprehensive ontologies for complex clinical domains remains unclear. This study investigates four ontology engineering methodologies for PD monitoring and alerting: One-shot (OS) prompting, Decomposed Sequential Prompting (DSP), X-HCOME, and SimX-HCOME+. Multiple LLMs were evaluated across these methodologies. Generated ontologies were assessed against a reference PD ontology using structural evaluation metrics focused on classes and object properties. Expert review was additionally conducted to analyze knowledge extensions beyond the gold standard. LLMs were able to autonomously generate syntactically valid and semantically meaningful ontologies using OS and DSP prompting; however, these ontologies exhibited limited conceptual coverage. Incorporating human expertise through X-HCOME significantly improved ontology completeness and evaluation metrics. Expert review further validated clinically relevant concepts absent from the reference ontology. SimX-HCOME+ demonstrated that iterative, supervised collaboration supports ontology refinement, although challenges persisted in natural language-to-rule formalization. The findings suggest that LLMs are more effective as collaborative assistants than as standalone ontology engineers in the PD domain. Structured human–LLM collaboration is associated with improved ontology coverage and facilitates the identification of potential knowledge extensions in clinical monitoring applications. While the present evaluation focuses primarily on structural ontology elements, the proposed methodologies provide useful insights for LLM-assisted ontology engineering in complex healthcare domains.

14 pages, 531 KB

Open Access Article

Hybrid Sentiment Analysis in Financial Markets: Multi-Stage LLM Integration for Market-Neutral Alpha Generation

by Johannes Stübinger and Luis Wöhner

AI 2026, 7(4), 138; https://doi.org/10.3390/ai7040138 - 13 Apr 2026

Cited by 1 | Viewed by 3141

Abstract

This study addresses the challenge of high signal-to-noise ratios in financial sentiment analysis by introducing a hybrid, multi-stage AI framework. We combine the high-throughput capabilities of FinBERT with the deep contextual reasoning of Google Gemini to extract actionable intelligence from over 9,000,000 data points, including U.S. Securities and Exchange Commission (SEC) filings and financial news. By applying our rigorous “Data Funnel” logic, we filter out noise from the massive dataset and surface a small set of high-conviction signals. These signals are executed on a historically dynamic universe of top S&P 500 constituents within a dollar-neutral long/short framework, integrated with macro-regime filters and technical trend confirmation. Our results over a 16-year testing period demonstrate a mean excess return of 51.02% per annum net of transaction costs, while achieving a Sharpe ratio of 1.06 and a Sortino ratio of 2.61. The significant divergence between Sharpe and Sortino ratios highlights the strategy’s positive skewness, effectively capturing upside volatility while limiting downside risk. Statistical robustness is confirmed by a Newey–West adjusted t-statistic of 4.01, indicating that the generated alpha is highly significant. This research provides a proof-of-concept for the use of Large Language Models (LLMs) as qualitative gatekeepers in quantitative finance, effectively bridging the gap between statistical NLP and human-like contextual understanding.
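
The divergence between Sharpe and Sortino ratios noted in this abstract can be illustrated with a small sketch. It assumes one common convention (sample standard deviation for Sharpe, root-mean-square of below-target returns for Sortino, no annualization); the abstract does not state which convention the authors used, and the return series below is invented for illustration.

```python
import math


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    variance = sum((e - mean) ** 2 for e in excess) / (len(excess) - 1)
    return mean / math.sqrt(variance)


def sortino_ratio(returns, target=0.0):
    """Mean excess return divided by downside deviation.

    Only returns below `target` contribute to the denominator, so upside
    volatility is not penalized. Returns math.inf if there is no downside.
    """
    excess = [r - target for r in returns]
    mean = sum(excess) / len(excess)
    downside = math.sqrt(sum(min(0.0, e) ** 2 for e in excess) / len(excess))
    return math.inf if downside == 0 else mean / downside


# Invented, positively skewed per-period returns: large gains, small losses.
returns = [0.10, -0.02, 0.15, -0.01, 0.08]

sharpe = sharpe_ratio(returns)
sortino = sortino_ratio(returns)
```

For a series like this one, most of the volatility sits on the upside, so the Sortino ratio comes out well above the Sharpe ratio, which is the pattern the abstract attributes to positive skewness.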

15 pages, 2849 KB

Open Access Article

Empowering Rural Livestock Health: AI-Powered Early Detection of Cattle Diseases

by Dammavalam Srinivasa Rao, P. Chandra Sekhar Reddy, Annam Revathi, Vangipuram Sravan Kiran, Nuvvusetty Rajasekhar, Nadella Sandhya, Pulipati Venkateswara Rao, Adla Sai Karthik and Puvvala Jogeeswara Venkata Naga Sai

AI 2026, 7(4), 137; https://doi.org/10.3390/ai7040137 - 9 Apr 2026

Viewed by 1566

Abstract

This paper presents a novel approach for the early detection of cattle diseases. We present a uniquely integrated image classification-based project for real-time cattle disease diagnosis that combines image classification models to identify diseases accurately; a seamless, user-friendly dashboard for real-time monitoring with data visualization and instant predictions; and a mobile application that acts as a data source. The mobile application enables real-time collection of farmer and cattle-related data, including age, number of cattle, vaccination cycles, cattle images, and location metadata. Our AI-based cattle health monitoring project enables the early, efficient, scalable, and timely detection of Lumpy Skin Disease (LSD) and Foot and Mouth Disease (FMD) in cattle with high accuracy. A dataset of approximately 1600 LSD/non-LSD images and 840 FMD images was used to train multiple classification networks such as EfficientNetB0, ResNet50, VGG16, EfficientNetV2B0, and EfficientNetV2S, along with a soft-voting ensemble at inference. The proposed framework achieved a maximum testing accuracy of 98.36% for LSD classification and 99.84% for FMD classification under internal validation. These results indicate strong disease recognition capability, with ensemble-based prediction improving robustness, particularly for FMD classification. The proposed system enables practical, early, efficient, and scalable applications of AI research to improve livestock health monitoring and support the early prevention of widespread disease outbreaks.
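
Soft voting at inference, as mentioned in this abstract, averages the class probabilities from several models and picks the class with the highest mean. The sketch below assumes equal model weights and a two-class output; the probability values and the function name `soft_vote` are hypothetical and do not come from the article.

```python
def soft_vote(prob_lists):
    """Average per-class probabilities across models and return (label, averages).

    prob_lists -- one probability vector per model, all of the same length.
    Equal model weights are assumed; the article may weight models differently.
    """
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    averages = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    label = max(range(n_classes), key=lambda c: averages[c])
    return label, averages


# Hypothetical outputs of three classifiers for one image: [healthy, diseased].
model_outputs = [
    [0.6, 0.4],  # model A leans healthy
    [0.3, 0.7],  # model B leans diseased
    [0.2, 0.8],  # model C leans diseased
]

label, averages = soft_vote(model_outputs)
```

Unlike hard majority voting, this scheme lets a confident model outweigh a hesitant one, since the decision is based on averaged probabilities rather than on counted labels.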

28 pages, 2765 KB

Open Access Article

Machine Learning-Based Approach for Malicious Node Security and Trust Provision in 5G-Enabled VANET

by Samuel Kofi Erskine

AI 2026, 7(4), 136; https://doi.org/10.3390/ai7040136 - 9 Apr 2026

Viewed by 651

Abstract

This research uses machine learning (ML)-based malicious node detection to incorporate security and trustworthiness into fifth-generation (5G)-enabled Vehicular Ad hoc Network (VANET) systems, in contrast to traditional methods that do not employ such techniques. VANETs may be vulnerable due to vehicle mobility, network openness, and the conventional network architecture. Security and trust management using modern methodologies, such as ML approaches, is therefore essential for 5G-enabled VANET integration and has become a paramount concern. Traditional security methods cannot completely identify malicious nodes in VANETs and incur longer processing delays. This research therefore uses the VANET malicious-node dataset designed for real-time malicious node/attack detection in VANET. The proposed ML methodology uses a Random Forest (RF) and an optimized ensemble ML classifier, such as XGBoost and LightGBM, which require a security and trustworthiness solution provided by the RF Trust Extended Authentication (TEA). We simulate vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) mobility, communication behaviors, and trust metrics to assess the accuracy of malicious-vehicular-node (MVN) features for the identification and detection of attacks, including False Injection, Sybil, blackhole, and Denial-of-Service (DoS). The proposed ML methodology also identifies these attack patterns, providing a realistic dataset for Intelligent Transportation System (ITS) research, whereas traditional VANET methods do not. We compared the performance of the proposed ML method with other literature-standard ML and RF methods using accuracy, confusion matrices, precision, recall, and F1-score. The proposed ML method achieves 99% accuracy in classifying MVNs and predicting both the attack classes (False Injection, Sybil, blackhole, and DoS) and the benign class, with precision, recall, and F1-score of 100% each, and establishes a trustworthiness score of 100%. By contrast, standard models, such as other VANET methods, achieved an accuracy of only 95%, with precision, recall, and F1-score of 98%, and without a confusion matrix to confirm the model’s performance.

19 pages, 9603 KB

Open AccessArticle

Understanding Modality-Specific Vulnerabilities in Vision–Language Models Under Adversarial Attacks

by Maisha Binte Rashid and Pablo Rivas

AI 2026, 7(4), 135; https://doi.org/10.3390/ai7040135 - 9 Apr 2026

Viewed by 901

Abstract

Vision–language models (VLMs), such as Contrastive Language–Image Pretraining (CLIP), are increasingly deployed in real-world applications, including content moderation, misinformation detection, and fraud analysis, making their robustness to adversarial attacks a critical concern. While adversarial robustness has been widely studied in unimodal models, modality-specific vulnerabilities in multimodal models remain underexplored. In this work, we analyze CLIP by applying gradient-based adversarial attacks to its vision and language modalities, both independently and jointly, and evaluating performance on two multimodal classification benchmarks: the Facebook Hateful Memes dataset and a large-scale Suspicious Car Parts dataset. Using Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) attacks along with multiple adversarial retraining strategies, we show that adversarial perturbations on the image modality consistently cause the most severe and unstable performance degradation. These results demonstrate that the vision modality is the primary vulnerability in CLIP, highlighting the need for modality-specific defense strategies that focus more on the weaker modality in multimodal systems. Full article
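As a rough illustration of the FGSM attack named in the abstract above, the sketch below applies a single FGSM step to a tiny logistic-regression classifier rather than to CLIP. The weights, input, and `eps` value are made-up assumptions chosen only so the effect is visible; nothing here reproduces the authors' experimental setup.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    # Returns -1, 0, or 1.
    return (v > 0) - (v < 0)

def predict(x, w, b):
    """Binary prediction of a logistic-regression model (hypothetical stand-in for a real classifier)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if sigmoid(z) >= 0.5 else 0

def fgsm(x, y, w, b, eps):
    """One FGSM step: x_adv = x + eps * sign(gradient of the loss w.r.t. x)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    # For binary cross-entropy with a sigmoid output, dLoss/dx = (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Illustrative values only (assumptions, not taken from the paper).
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1
x_adv = fgsm(x, y, w, b, eps=0.5)
```

With these toy values the clean input is classified as 1 and the perturbed input as 0. PGD can be viewed as repeating such a step several times with a smaller step size and projecting back into the allowed perturbation range after each step.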

(This article belongs to the Section AI Systems: Theory and Applications)

22 pages, 6080 KB

Open AccessArticle

A Conceptual Framework for Simulated Self-Assessment and Meta-Evaluation of Generative AI Models

by Kostadin Yotov, Stanka Hadzhikoleva, Emil Hadzhikolev, Mariyan Milev and Todor Rachovski

AI 2026, 7(4), 134; https://doi.org/10.3390/ai7040134 - 7 Apr 2026

Viewed by 715

Abstract

The increasing integration of generative artificial intelligence (GenAI) into scientific research raises the question of whether such systems can be evaluated not only through external benchmarks but also through structured analysis of their own meta-evaluative responses. This study introduces a conceptual framework for simulated self-assessment of GenAI models, formalized through a multidimensional self-assessment profile and a metacognitive self-assessment index (MSI). The proposed framework integrates quantitative criteria capturing hallucination propensity, knowledge currency, formal-structure handling, source validity, and terminological precision. To evaluate the reliability of model-generated self-assessments, psychometric instruments traditionally used in human metacognition research—MAI, SRIS, and SDQ—are adapted for large language models. Experimental results across multiple GPT models indicate that, despite the absence of genuine introspective mechanisms, GenAI systems can produce internally consistent and moderately calibrated meta-evaluative responses. These findings suggest that simulated self-assessment, when interpreted within a rigorous methodological framework and combined with external validation, can serve as a complementary quantitative tool for trust analysis and reliability assessment of generative models. Full article

(This article belongs to the Section AI Systems: Theory and Applications)

14 pages, 2118 KB

Open AccessArticle

AI Method for Classification of Diagnosis of Near-Infrared Breast Lesion Images

by Kaiquan Chen, Fangyang Shen, Honggang Wang, Zhengchao Dong, Jizhong Xiao, Ming Ma, Afroza Aktar, Christopher Chow and Wenxiong Zhang

AI 2026, 7(4), 133; https://doi.org/10.3390/ai7040133 - 7 Apr 2026

Viewed by 583

Abstract

In near-infrared optical breast lesion screening and diagnosis systems, high-speed four-dimensional scanners can dynamically acquire tens of thousands of lesion images within a five-minute period. Currently, manual computer annotation is required to generate standard samples from these scanned breast lesion images, a process that depends heavily on physicians with clinical expertise. On average, a single physician can annotate only approximately ten samples per working day. As a result, this process is time-consuming and labor-intensive, and the collected samples often suffer from low accuracy, large variability, and limited diagnostic reliability. Several AI-based annotation tools, such as QuPath, HALO AI™, and X-AnyLabeling, have been developed to assist this process. However, these tools are primarily manual or semi-automated and are unable to provide rapid and high-precision recognition. To address these limitations, this study proposes a new AI-based method for the rapid, accurate, and fully automated detection and diagnosis of breast lesions. The proposed approach complements existing AI-based annotation and diagnostic methods by enabling automated detection and classification of breast lesion samples. The proposed system employs a deep learning–based classification framework to construct a professional-level AI diagnostic model. The system automatically generates diagnostic outputs based on the annotation criteria used by professional physicians, including positive/negative classification and accuracy metrics. Compared with conventional manual diagnostic methods, the proposed approach provides faster and more reliable diagnostic estimates for new patients. These results demonstrate the potential of the proposed AI-based method to advance automated breast lesion screening and diagnosis and to contribute to future research and clinical applications in this field. Full article

(This article belongs to the Section AI Systems: Theory and Applications)

23 pages, 2601 KB

Open AccessArticle

Can Modern Vision Models Understand the Difference Between an Object and a Look-Alike?

by Itay Cohen, Ethan Fetaya and Amir Rosenfeld

AI 2026, 7(4), 132; https://doi.org/10.3390/ai7040132 - 4 Apr 2026

Viewed by 967

Abstract

Recent advances in computer vision have yielded models with strong performance on recognition benchmarks; however, significant gaps remain in comparison with human perception. One subtle ability is to judge whether an image looks like a given object without being an instance of that object. We study whether vision–language models such as CLIP capture this distinction. We curated a dataset named RoLA (Real or LookAlike) of real and look-alike exemplars (e.g., toys, statues, drawings, pareidolia) across multiple categories, and first evaluate a prompt-based baseline with paired “real”/“look-alike” prompts. We then estimate a direction in CLIP’s embedding space that moves representations between real and look-alike. Applying this direction to image and text embeddings improves discrimination in cross-modal retrieval on Conceptual 12M, and also enhances captions produced by a CLIP prefix captioner. Full article

(This article belongs to the Section AI Systems: Theory and Applications)

12 pages, 284 KB

Open AccessArticle

LLM-Based Control for Simulated Physical Reasoning: Modular Evaluation in the NeurIPS Embodied Agent Interface Challenge

by Hilmi Demirhan and Wlodek Zadrozny

AI 2026, 7(4), 131; https://doi.org/10.3390/ai7040131 - 3 Apr 2026

Viewed by 868

Abstract

Benchmark-driven evaluation helps distinguish between planning quality and interface reliability when large language models are used for embodied reasoning in simulation. Our submission to the Embodied Agent Interface Challenge (EAI) is evaluated across four stages of the pipeline: goal interpretation, subgoal decomposition, action sequencing, and transition modeling. The tasks run in the BEHAVIOR and VirtualHome simulators, which use constrained action vocabularies, fixed object inventories, and symbolic state representations within a standard evaluation protocol. Our system accesses the OpenAI API using GPT-4.1 for BEHAVIOR, GPT-4.1-mini for VirtualHome, and GPT-5-mini in later exploratory experiments across both environments. The schemas for each task determine how the outputs are structured, and outputs are regenerated when they do not follow the specification. On the final public leaderboard, our system ranked eighteenth overall with a score of 57.92, achieving 68.88 on BEHAVIOR and 46.96 on VirtualHome. In this paper, we describe our approach and discuss what these observations suggest about the strengths and limitations of current language models when used for embodied reasoning. Full article

(This article belongs to the Special Issue Integrating Large Language Models into Robotic Autonomy)

26 pages, 16800 KB

Open AccessArticle

Automated Anatomical Feature Analysis and Scoring for Draw-a-Person Test Drawings via ResNet-Based Multi-Label Detection and Classification

by Asma Abdullah Alwadai and Emad Sami Jaha

AI 2026, 7(4), 130; https://doi.org/10.3390/ai7040130 - 2 Apr 2026

Cited by 1 | Viewed by 1058

Abstract

Manually scoring drawings for the Goodenough–Harris Draw-a-Person (DAP) test is time-consuming and labor-intensive, and it is prone to inconsistencies due to subjective interpretation. To address these drawbacks, this study introduces a hybrid model for automated analysis and scoring of DAP test drawings that combines deep learning with rule-based reasoning. The model has two modules: a convolutional neural network (CNN) that predicts ten visual anatomical features of a drawing, and a set of six heuristic rules that represent geometric and spatial relationships. The CNN output is binarized by thresholding and concatenated with the results of the heuristic rules to obtain a final set of sixteen features. The model was evaluated using five-fold cross-validation and a separate hold-out test set of 948 labeled drawings. Under five-fold cross-validation, performance is consistent, with average F1-scores above 0.90 for all primary anatomical features. On the hold-out test set, the approach achieved a macro-average accuracy of 91.78% across all sixteen features, which implies a high degree of generalization within the problem domain. It achieves almost-perfect scores for structurally prominent anatomical features such as the head, limbs, and trunk-related relationships, and for all heuristic-based features. It performs poorly, however, on less visually distinguishable features such as the ears (average F1-scores ≈ 0.09–0.12) and the neck (average F1-scores ≈ 0.75). The evaluation results show that the approach approximates expert-level scoring efficiently, with a considerable reduction in human effort.
The approach nevertheless has limitations. First, it is less robust for subtle anatomical features. Second, it relies on heuristic thresholds for feature extraction. Third, it weights all sixteen features equally, which may not exactly match the actual DAP scoring system. Full article
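The feature-assembly step described in this abstract (ten thresholded CNN outputs concatenated with six rule results) can be sketched roughly as follows. The 0.5 threshold, the example probabilities, the rule flags, and the idea of summing the sixteen flags into an equally weighted score are all assumptions for illustration; the abstract does not give the authors' actual thresholds or scoring formula.

```python
def binarize(probabilities, threshold=0.5):
    """Turn per-feature CNN probabilities into 0/1 presence flags."""
    return [1 if p >= threshold else 0 for p in probabilities]

def score_drawing(cnn_probabilities, rule_flags, threshold=0.5):
    """Concatenate 10 thresholded CNN features with 6 rule flags.

    Returns the 16-element feature vector and an equally weighted score
    (the plain sum of the flags, which is an assumed scoring scheme).
    """
    if len(cnn_probabilities) != 10 or len(rule_flags) != 6:
        raise ValueError("expected 10 CNN outputs and 6 rule flags")
    features = binarize(cnn_probabilities, threshold) + list(rule_flags)
    return features, sum(features)

# Hypothetical outputs for one drawing.
cnn_probs = [0.98, 0.91, 0.12, 0.87, 0.95, 0.40, 0.77, 0.66, 0.93, 0.55]
rule_flags = [1, 1, 0, 1, 1, 1]
features, total = score_drawing(cnn_probs, rule_flags)
```

A weighted score would only require replacing the final `sum` with a dot product against per-feature weights, which is one way the equal-weighting limitation noted in the abstract could be relaxed.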

(This article belongs to the Special Issue Deep Learning Technologies and Their Applications in Image Processing, Computer Vision, and Computational Intelligence)

42 pages, 12119 KB

Open AccessArticle

AI-FRS: An Ensemble-Based AI Decision-Support System for Fetal Risk Prediction in a Mexican Clinical Setting

by Abimael Guzman-Pando, Bernardo O. Enriquez-Guillen, Graciela Ramirez-Alonso, Javier Camarillo-Cisneros, Cesar R. Aguilar-Torres and Luis C. Hinojos-Gallardo

AI 2026, 7(4), 129; https://doi.org/10.3390/ai7040129 - 1 Apr 2026

Viewed by 1211

Abstract

Nearly 2 million stillbirths occur globally each year. These outcomes are often driven by disparities in healthcare access, especially in low- and middle-income countries, where limited resources and shortages of trained medical personnel further increase preventable risks. Addressing these challenges requires not only strengthening healthcare systems but also enhancing intervention strategies. In this context, the development of decision-support systems becomes essential to dynamically identify at-risk pregnancies and improve fetal outcomes. Therefore, we propose AI-FRS (Artificial Intelligence–Fetal Risk Prediction System), a decision support tool for fetal risk prediction, designed to classify fetal conditions as healthy or at risk, using clinical data from Mexican obstetric patients. AI-FRS is built upon seven distinct machine learning models, systematically evaluated through 127 first-order ensemble combinations using hard voting. To further enhance predictive performance, we assessed 32,752 second-order ensembles, constructed by combining top-performing first-order ensembles across recall, precision, and F1-score metrics. The final selected model, called BSOEM, achieved a robust F1-score of 0.812, providing a more balanced and robust decision-making framework than individual models or simple ensembles. Additionally, we conducted an interpretability analysis to identify the clinical variables with the greatest contribution to model predictions, strengthening the system’s transparency and potential clinical trust. AI-FRS features a user-friendly interface specifically designed to facilitate adoption by healthcare professionals. This provides a fast and clinically applicable AI tool for intrapartum and peripartum risk detection in obstetrics, supporting clinical decision-making and improving fetal health outcomes. Full article
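As a small worked example of the ensemble search mentioned above: seven base models give 2^7 − 1 = 127 non-empty subsets, which matches the stated number of first-order combinations if single-model "ensembles" are counted (an inference, not something the abstract states). The sketch below enumerates those subsets and applies hard voting; the model names and the tie-breaking rule (ties resolve to "at risk") are hypothetical.

```python
from itertools import combinations

def hard_vote(predictions):
    """Majority vote over binary predictions (1 = at risk, 0 = healthy).

    Ties resolve to 1; this tie-breaking rule is an assumption.
    """
    return 1 if 2 * sum(predictions) >= len(predictions) else 0

def all_ensembles(model_names):
    """Every non-empty subset of the given models."""
    return [
        subset
        for size in range(1, len(model_names) + 1)
        for subset in combinations(model_names, size)
    ]

# Placeholder names; the abstract does not list the seven models.
models = ["m1", "m2", "m3", "m4", "m5", "m6", "m7"]
ensembles = all_ensembles(models)
n_ensembles = len(ensembles)

# Votes from a hypothetical three-model ensemble for one patient.
decision = hard_vote([1, 0, 1])
```

Each subset would then be scored on held-out data by recall, precision, and F1-score so that the best-performing ones can be selected.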

(This article belongs to the Special Issue Intelligent Data-Driven Approaches for Next-Generation Medical Diagnosis and Healthcare Systems)

50 pages, 986 KB

Open AccessReview

A Survey and Taxonomy of Loss Functions in Machine Learning

by Lorenzo Ciampiconi, Adam Elwood, Marco Leonardi, Ashraf Mohamed and Alessandro Rozza

AI 2026, 7(4), 128; https://doi.org/10.3390/ai7040128 - 1 Apr 2026

Cited by 2 | Viewed by 2356

Abstract

Most state-of-the-art machine learning techniques revolve around the optimization of loss functions, making the choice of an objective critical to model performance and reliability. Although recent reviews discuss loss functions in specific domains or in deep learning settings, there is still no single reference that presents widely used losses across major task families within a unified formal setting and with consistent optimization-relevant property annotations. In this survey, we compile and systematize the most widely adopted loss functions for regression, classification, generative modeling, ranking, energy-based modeling, and relational learning. Our selection procedure combines seeding from foundational textbooks and prior surveys with cross-checking of highly cited literature and common implementations in mainstream machine learning frameworks. We introduce 52 loss functions and organize them into an intuitive taxonomy, summarizing their theoretical motivation, key mathematical properties, and typical application contexts, with compact appendix tables for quick lookup. This survey is intended as a resource for undergraduate, graduate, and Ph.D. students, as well as researchers seeking a structured reference for selecting and comparing loss functions. Full article

(This article belongs to the Special Issue Advances and Applications in Graph Neural Networks (GNNs))

19 pages, 712 KB

Open AccessArticle

Federated Learning-Driven Protection Against Adversarial Agents in a ROS2 Powered Edge-Device Swarm Environment

by Brenden Preiss and George Pappas

AI 2026, 7(4), 127; https://doi.org/10.3390/ai7040127 - 1 Apr 2026

Viewed by 1097

Abstract

Federated learning (FL) enables collaborative model training across distributed devices and robotic systems while preserving data privacy, making it well-suited for swarm robotics and edge-device-powered intelligence. However, FL remains vulnerable to adversarial behaviors such as data and model poisoning, particularly in real-world deployments where detection methods must operate under strict computational and communication constraints. This paper presents a practical, real-world federated learning framework that enhances robustness to adversarial agents in a ROS2-based edge-device swarm environment. The proposed system integrates the Federated Averaging (FedAvg) algorithm with a lightweight average cosine similarity-based filtering method to detect and suppress harmful model updates during aggregation. Unlike prior work that primarily evaluates poisoning defenses in simulated environments, this framework is implemented and evaluated on physical hardware, consisting of a laptop-based aggregator and multiple Raspberry Pi worker nodes. A convolutional neural network (CNN) based on the MobileNetV3-Small architecture is trained on the MNIST dataset, with one worker executing a sign-flipping model poisoning attack. Experimental results show that FedAvg alone fails to maintain meaningful model accuracy under adversarial conditions, resulting in near-random classification performance with a final global model accuracy of 11% and a loss of 2.3. In contrast, adding cosine similarity filtering effectively detects sign-flipping model poisoning in the evaluated ROS2 swarm experiment: despite the presence of an attacker, the global model maintains an accuracy of around 90% and a loss of around 0.37, close to the 93% baseline accuracy of FedAvg alone under no attack and with only a minimal increase in loss.
In the presence of an attacker, the proposed method also maintains a global-model false positive rate (FPR) of around 0.01 and a false negative rate (FNR) of around 0.10, only slightly above the baseline FedAvg-only results of around 0.008 for FPR and 0.07 for FNR. Additionally, FedAvg with cosine similarity filtering has a computational profile similar to that of baseline FedAvg with no attacker: the baseline shows an average runtime of about 34 min, while the proposed method shows about 35 min. The average size of the global model shared among workers also remains consistent at around 7.15 megabytes, showing little to no increase in message payload size between the baseline and the proposed method. These results demonstrate that computationally lightweight cosine similarity-based detection methods can be effectively deployed in real-world, resource-constrained robotic swarm environments, providing a practical path toward improving robustness in real-world federated learning deployments beyond simulation-based evaluation. Full article
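To make the aggregation idea above concrete, here is a minimal sketch of FedAvg with an average-cosine-similarity filter. It assumes each worker's update is a flat list of floats and that an update is dropped when its mean cosine similarity to the other updates is not above a threshold of 0.0; the threshold, the toy update vectors, and the function names are assumptions, not details from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def fedavg(updates):
    """Plain FedAvg with equal worker weights: element-wise mean of the updates."""
    return [sum(u[k] for u in updates) / len(updates) for k in range(len(updates[0]))]

def filtered_fedavg(updates, threshold=0.0):
    """Average only the updates whose mean cosine similarity to the others exceeds the threshold."""
    if len(updates) < 2:
        raise ValueError("need at least two updates to compare")
    kept = []
    for i, u in enumerate(updates):
        sims = [cosine(u, v) for j, v in enumerate(updates) if j != i]
        if sum(sims) / len(sims) > threshold:
            kept.append(u)
    if not kept:
        raise ValueError("all updates were filtered out")
    return fedavg(kept), len(kept)

# Three honest workers and one sign-flipped update (toy values).
updates = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [-1.0, -2.0]]
naive = fedavg(updates)
robust, n_kept = filtered_fedavg(updates)
```

In this toy case the sign-flipped update points almost exactly opposite to the others, so its mean similarity is close to −1 and it is dropped, while the honest updates keep a positive mean similarity. The filter adds only pairwise dot products per round, which is consistent with the small runtime difference reported in the abstract.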

21 pages, 56996 KB

Open AccessArticle

Comprehensive Analysis of Multimodal Fusion Techniques for Ocular Disease Detection

by Veena K. M., Pragya Gupta, Ruthvik Avadhanam, Rashmi Naveen Raj, Sulatha V. Bhandary, Varadraj Gurupur and Veena Mayya

AI 2026, 7(4), 126; https://doi.org/10.3390/ai7040126 - 1 Apr 2026

Viewed by 1389

Abstract

Accurate and early identification of ocular diseases is essential to prevent vision impairment and enable timely medical intervention. In routine clinical practice, ophthalmologists rely on a structured diagnostic workflow that incorporates multiple imaging modalities to manually assess and diagnose ocular diseases. However, interpreting each modality requires significant clinical experience and can be time-consuming. These limitations can be effectively addressed through the application of artificial intelligence (AI)-driven multimodal fusion techniques. In this study, we conducted an empirical investigation to assess the impact of different fusion strategies (early, intermediate, and late fusion) on diagnostic performance, training requirements, and interpretability. The proposed methodology was evaluated using three publicly available datasets: FFA-Fundus (fundus fluorescein angiography), GAMMA (Glaucoma Analysis and Multi-Modal Assessment), and OLIVES (Ophthalmic Labels to Investigate Visual Eye Semantics). Experimental results demonstrate that multimodal feature fusion improves disease detection performance. Although fused models typically required more training parameters than single-modality models, they provided interpretability on par with that of individual single-modal networks. However, inference time increased by approximately 50% for multimodal architectures. These findings underscore the value of integrating diverse ophthalmic imaging modalities to enhance diagnostic accuracy in automated disease detection systems. At the same time, the results highlight that unimodal models containing highly discriminative features can also perform competitively, particularly when a single modality is sufficient for disease identification. Multimodal fusion provides the greatest benefit in scenarios where complementary information across modalities contributes distinct and non-redundant features.
Furthermore, fusing all available modalities may not be optimal due to increased computational cost and reduced inference efficiency; thus, selective modality integration and lightweight fusion strategies are essential to balance accuracy, interpretability, and efficiency in clinical deployment. Full article

33 pages, 16801 KB

Open AccessArticle

A GNSS–Vision Integrated Autonomous Navigation System for Trellis Orchard Transportation Robots

by Huaiyang Liu, Haiyang Gu, Yong Wang, Tianjiao Zhong, Tong Tian and Changxing Geng

AI 2026, 7(4), 125; https://doi.org/10.3390/ai7040125 - 1 Apr 2026

Cited by 1 | Viewed by 1112

Abstract

Autonomous navigation is essential for orchard transportation robots to support automated operations and precision orchard management. However, in trellis orchards, dense vegetation and complex canopy structures often degrade the stability of GNSS-based navigation in in-row environments. To address this issue, this study proposes a GNSS–vision integrated navigation framework for orchard transportation robots. The performance of GNSS-based navigation in out-of-row environments and vision-based navigation in in-row environments was experimentally evaluated under representative orchard operating conditions. In out-of-row areas, the robot employs GNSS-based path planning and trajectory tracking to achieve reliable navigation in relatively open, lightly occluded environments. During in-row navigation, a deep learning-based real-time object detection approach is used to detect tree trunks and trellis supporting structures. By integrating corner-point selection with temporal RANSAC-based line fitting, a stable orchard row structure is constructed to generate robust navigation references. The visual perception module serves as the front-end sensing component of the navigation system and is designed to be independent of specific object detection architectures, allowing flexible integration with different real-time detection models. Field experiments were conducted under various orchard layouts and growth stages. The average lateral deviation of GNSS-based navigation in out-of-row scenarios ranged from 0.093 to 0.221 m, while the average heading deviation of in-row visual navigation was approximately 5.23° at a robot speed of 0.6 m/s. These results indicate that the proposed perception and navigation methods can maintain stable navigation performance within their respective applicable scenarios in trellis orchard environments. 
The experimental findings provide a practical and engineering-oriented basis for future research on automatic navigation mode switching and system-level integration of orchard transportation robots. Full article
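The row-fitting step mentioned in this abstract relies on RANSAC line fitting. The sketch below shows plain single-frame RANSAC on 2D points only; it does not implement the temporal variant or the corner-point selection the authors describe, and the tolerance, iteration count, and sample points are assumed values.

```python
import math
import random

def ransac_line(points, n_iter=200, tol=0.05, seed=0):
    """Fit a line a*x + b*y + c = 0 (unit normal) by RANSAC.

    Returns the best line and its inliers. The normal form is used so
    that near-vertical lines cause no division by zero.
    """
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        a, b = y2 - y1, x1 - x2          # normal to the segment direction
        norm = math.hypot(a, b)
        if norm == 0:                    # identical points, skip this sample
            continue
        a, b = a / norm, b / norm
        c = -(a * x1 + b * y1)
        # Points within tol of the candidate line count as inliers.
        inliers = [p for p in points if abs(a * p[0] + b * p[1] + c) <= tol]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = (a, b, c), inliers
    return best_line, best_inliers

# Five detections along y = 2x + 1 plus two spurious detections (toy data).
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0),
          (1.0, 8.0), (3.0, 0.0)]
line, inliers = ransac_line(points)
slope = -line[0] / line[1]
```

In a navigation setting the points would be image or ground-plane coordinates of detected trunks and trellis posts, and the fitted line would serve as the row reference. A temporal version could, for example, pool points from recent frames before fitting, though that is only one possible reading of the abstract.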

24 pages, 5084 KB

Open AccessArticle

Real-Time Constrained Visual Servoing for Agricultural Harvesting Robots via MPC-Guided Reinforcement Learning

by Liangzheng Gao, Qingchun Feng, Shiqi Chen, Zhijie Yang, Fengcui Fan, Lin Chen and Chunjiang Zhao

AI 2026, 7(4), 124; https://doi.org/10.3390/ai7040124 - 1 Apr 2026

Viewed by 1414

Abstract

With the intensifying global agricultural labor shortage and the scaled-up development of facility agriculture, autonomous precision harvesting robots for unstructured greenhouse environments have become an urgent need. For cluster-picking crops such as tomatoes, visual servoing enables real-time closed-loop control of the end-effector pose, addressing the challenges of random fruit distribution and variable stem orientations. However, existing methods struggle to balance constraint handling with real-time efficiency. This paper proposes an MPC-Guided Reinforcement Learning visual servoing framework that combines the planning capability of optimal control with the adaptive learning ability and real-time inference advantages of reinforcement learning. The approach adopts a teacher–student paradigm: expert trajectories from the MPC controller warm-start the reinforcement learning policy through behavior cloning, followed by PPO-based fine-tuning with adaptive gain regulation and stagnation-enhanced exploration mechanisms. Simulation experiments demonstrate a 95% success rate with average positioning and orientation errors of 13.6 mm and 0.009 rad, respectively. Compared with the MPC baseline, task steps are reduced by 53.4%; compared with standard PPO, the success rate improves by 6%. Greenhouse field validation achieves an 85.3% picking success rate and an operation time of 5.63 s per fruit, confirming the framework’s balance among control precision, robustness, and efficiency for high-precision robotic harvesting in unstructured agricultural environments. Full article

(This article belongs to the Special Issue Harvesting the Future: Transforming Agricultural Practices Through AI Application)

31 pages, 2539 KB

Open AccessArticle

Design and Evaluation of an AI-Based Conversational Agent for Travel Agencies: Enhancing Training, Assistance, and Operational Efficiency

by Pablo Vicente-Martínez, Emilio Soria-Olivas, Inés Esteve-Mompó, Manuel Sánchez-Montañés, María Ángeles García Escrivà and Edu William-Secin

AI 2026, 7(4), 123; https://doi.org/10.3390/ai7040123 - 1 Apr 2026

Cited by 1 | Viewed by 2167

Abstract

The tourism industry faces increasing pressure for agile, personalized services, yet travel agencies struggle with fragmented knowledge scattered across isolated systems and legacy formats. While Large Language Models (LLMs) are widely applied in customer-facing roles, their potential to enhance internal operational efficiency remains largely underexplored. This study presents the design and evaluation of an intelligent assistant built specifically for travel agency operations on a Retrieval-Augmented Generation (RAG) architecture using Gemini 2.0 Flash. The system integrates heterogeneous data sources, including structured product catalogs and unstructured documentation processed via Optical Character Recognition (OCR), into a unified interface comprising work assistance, interactive training, and evaluation modules. Results demonstrate information retrieval times of no more than 45 s, which supports daily use, while maintaining 95% accuracy. Furthermore, the system democratizes tacit senior expertise and accelerates new employee onboarding. This research validates RAG architectures as a powerful solution to knowledge fragmentation, shifting the strategic AI focus from customer automation to employee empowerment and operational optimization. Full article

29 pages, 2627 KB

Open Access Article

Building-Level Energy Disaggregation Using AI-Based NILM Techniques in Heterogeneous Environments

by Ana Rubio-Bustos, Gloria Calleja-Rodríguez, Jorge De-La-Torre-García, Unai Fernandez-Gamiz and Ekaitz Zulueta

AI 2026, 7(4), 122; https://doi.org/10.3390/ai7040122 - 1 Apr 2026

Viewed by 1336

Abstract

Non-Intrusive Load Monitoring (NILM) represents a powerful approach for energy disaggregation, which enables detailed insights into energy consumption patterns without requiring extensive sensor deployment. While significant advances have been achieved in residential NILM applications, commercial and industrial buildings remain largely underexplored despite their substantial contribution to global energy consumption. This study addresses this gap by developing and evaluating multiple artificial intelligence approaches for energy disaggregation across residential, commercial, and industrial buildings under a unified experimental protocol. We implement and compare several AI-based models, including Vision Transformer (ViT), Variational Autoencoder (VAE), Random Forest (RF), and custom architectures inspired by TimeGPT and Prophet, alongside traditional baseline methods. The proposed framework is validated using three benchmark datasets representing residential (AMPds), commercial (COmBED), and industrial (IMDELD) environments. Experimental results demonstrate that architecture–load interactions, rather than model complexity alone, are the primary determinants of disaggregation accuracy: the ViT-small configuration achieves superior performance for complex industrial loads with R² values exceeding 0.94, Random Forest proves most effective for finite-state commercial HVAC systems with R² up to 0.97, and the Prophet-inspired model excels in capturing seasonal patterns in residential appliances. These findings provide evidence-based guidelines for selecting appropriate AI models based on load characteristics, signal-to-noise ratio, and building type, contributing to the practical deployment of NILM in heterogeneous building environments.

32 pages, 2463 KB

Open Access Review

Artificial Intelligence and Youth: Cognitive, Educational, and Behavioral Impacts

by Daniele Giansanti and Claudia Cosenza

AI 2026, 7(4), 121; https://doi.org/10.3390/ai7040121 - 1 Apr 2026

Cited by 1 | Viewed by 6846

Abstract

Background: Artificial Intelligence (AI) and Generative AI (GenAI) are increasingly integrated into educational and professional settings, offering personalized learning, productivity gains, and enhanced engagement. However, excessive reliance may compromise critical thinking, autonomous problem-solving, and emotional regulation among youth (i.e., adolescents and young adults) and early-career professionals. Aim: This review examines the cognitive, educational, and behavioral impacts of AI and GenAI use in youth, highlighting implications for their responsible integration in learning and professional development. Methods: A narrative review was conducted, synthesizing empirical studies, psychometric instruments, and international policy frameworks addressing AI engagement. Emphasis was placed on cognitive, behavioral, educational, and ethical dimensions across youth and early-career professionals. Results: AI enhances learning efficiency, creativity, and professional decision-making but may also foster cognitive offloading, dependency, and addiction-like behaviors. Instruments such as the Conversational AI Dependence Scale (CAIDS) and the Problematic ChatGPT Use Scale (PCGUS) help identify maladaptive patterns. Effective strategies include structured pedagogy, human oversight, reflective practice, AI literacy, and ethical guidance. Paradoxically, higher AI competence and trust may increase reliance, underscoring the need for guided and balanced engagement. Conclusions: Responsible AI integration requires multidimensional approaches combining instructional scaffolding, metacognitive strategies, supervision, and governance to preserve autonomy, professional judgment, and cognitive development in youth.

(This article belongs to the Special Issue How Is AI Transforming Education?)

25 pages, 3662 KB

Open Access Article

Evaluating the Perception, Understanding, and Forgetting of Progressive Neural Networks: A Quantitative and Qualitative Analysis

by Lucía Güitta-López, Jaime Boal and Álvaro J. López-López

AI 2026, 7(4), 120; https://doi.org/10.3390/ai7040120 - 31 Mar 2026

Viewed by 837

Abstract

The use of virtual environments to collect the experience required by deep reinforcement learning models is accelerating the deployment of these algorithms in industrial environments. However, once the experience-gathering problem is solved, it is necessary to address how to efficiently transfer the knowledge from the virtual scenario to reality. This paper focuses on examining Progressive Neural Networks (PNNs) as a promising transfer learning technique. The analyses carried out range from studying the capabilities and limits of the layers responsible for learning the state representation from a pixel space, which could arguably be the convolutional blocks, to the forgetting agents suffer when learning a new task. Introducing controlled visual changes in the environment scene can lead to a performance degradation of 50.3% in the worst-case scenario. These visual discrepancies significantly impact the agent’s learning time and accuracy when using a PNN architecture. Regarding the PNN forgetting assessment, partial forgetting occurs in two of the three environments analyzed, those where the agent masters its new task. This could be due to a balance between the relevance of the new features learned and the ones inherited from the teacher agent.

(This article belongs to the Special Issue The Future of Robotics: AI Algorithms, Ethics, and Real-World Applications)

AI, Volume 7, Issue 4 (April 2026) – 34 articles
