Search Results (371)

Search Parameters:
Keywords = domain expertise

38 pages, 1194 KiB  
Review
Transforming Data Annotation with AI Agents: A Review of Architectures, Reasoning, Applications, and Impact
by Md Monjurul Karim, Sangeen Khan, Dong Hoang Van, Xinyue Liu, Chunhui Wang and Qiang Qu
Future Internet 2025, 17(8), 353; https://doi.org/10.3390/fi17080353 - 2 Aug 2025
Abstract
Data annotation serves as a critical foundation for artificial intelligence (AI) and machine learning (ML). Recently, AI agents powered by large language models (LLMs) have emerged as effective solutions to longstanding challenges in data annotation, such as scalability, consistency, cost, and limitations in domain expertise. These agents facilitate intelligent automation and adaptive decision-making, thereby enhancing the efficiency and reliability of annotation workflows across various fields. Despite the growing interest in this area, a systematic understanding of the role and capabilities of AI agents in annotation is still lacking. This paper seeks to fill that gap by providing a comprehensive review of how LLM-driven agents support advanced reasoning strategies, adaptive learning, and collaborative annotation efforts. We analyze agent architectures, integration patterns within workflows, and evaluation methods, along with real-world applications in sectors such as healthcare, finance, technology, and media. Furthermore, we evaluate current tools and platforms that support agent-based annotation, addressing key challenges such as quality assurance, bias mitigation, transparency, and scalability. Lastly, we outline future research directions, highlighting the importance of federated learning, cross-modal reasoning, and responsible system design to advance the development of next-generation annotation ecosystems.

16 pages, 1873 KiB  
Systematic Review
A Systematic Review of GIS Evolution in Transportation Planning: Towards AI Integration
by Ayda Zaroujtaghi, Omid Mansourihanis, Mohammad Tayarani, Fatemeh Mansouri, Moein Hemmati and Ali Soltani
Future Transp. 2025, 5(3), 97; https://doi.org/10.3390/futuretransp5030097 - 1 Aug 2025
Abstract
Previous reviews have examined specific facets of Geographic Information Systems (GIS) in transportation planning, such as transit-focused applications and open-source geospatial tools. Addressing this gap, this study offers the first systematic, PRISMA-guided longitudinal evaluation of GIS integration in transportation planning, spanning thematic domains, data models, methodologies, and outcomes from 2004 to 2024. By conducting a mixed-methods analysis of 241 peer-reviewed articles, this study delineates major trends, such as increased emphasis on sustainability, equity, stakeholder involvement, and the incorporation of advanced technologies. Prominent domains include land use–transportation coordination, accessibility, artificial intelligence, real-time monitoring, and policy evaluation. Expanded data sources, such as real-time sensor feeds and 3D models, alongside sophisticated modeling techniques, enable evidence-based, multifaceted decision-making. However, challenges like data limitations, ethical concerns, and the need for specialized expertise persist, particularly in developing regions. Future geospatial innovations should prioritize the responsible adoption of emerging technologies, inclusive capacity building, and environmental justice to foster equitable and efficient transportation systems. This review highlights GIS’s evolution from a supplementary tool to a cornerstone of data-driven, sustainable urban mobility planning, offering insights for researchers, practitioners, and policymakers to advance transportation strategies that align with equity and sustainability goals.

12 pages, 1346 KiB  
Article
A Language Vision Model Approach for Automated Tumor Contouring in Radiation Oncology
by Yi Luo, Hamed Hooshangnejad, Xue Feng, Gaofeng Huang, Xiaojian Chen, Rui Zhang, Quan Chen, Wil Ngwa and Kai Ding
Bioengineering 2025, 12(8), 835; https://doi.org/10.3390/bioengineering12080835 - 31 Jul 2025
Abstract
Background: Lung cancer ranks as the leading cause of cancer-related mortality worldwide. The complexity of tumor delineation, crucial for radiation therapy, requires expertise often unavailable in resource-limited settings. Artificial Intelligence (AI), particularly with advancements in deep learning (DL) and natural language processing (NLP), offers potential solutions yet is challenged by high false positive rates. Purpose: The Oncology Contouring Copilot (OCC) system is developed to leverage oncologist expertise for precise tumor contouring using textual descriptions, aiming to increase the efficiency of oncological workflows by combining the strengths of AI with human oversight. Methods: Our OCC system initially identifies nodule candidates from CT scans. Employing Language Vision Models (LVMs) like GPT-4V, OCC then effectively reduces false positives with clinical descriptive texts, merging textual and visual data to automate tumor delineation, designed to elevate the quality of oncology care by incorporating knowledge from experienced domain experts. Results: The deployment of the OCC system resulted in a 35.0% reduction in the false discovery rate, a 72.4% decrease in false positives per scan, and an F1-score of 0.652 across our dataset for unbiased evaluation. Conclusions: OCC represents a significant advance in oncology care, particularly through the use of the latest LVMs, improving contouring results by (1) streamlining oncology treatment workflows by optimizing tumor delineation and reducing manual processes; (2) offering a scalable and intuitive framework to reduce false positives in radiotherapy planning using LVMs; (3) introducing novel medical language vision prompt techniques to minimize LVM hallucinations with ablation study; and (4) conducting a comparative analysis of LVMs, highlighting their potential in addressing medical language vision challenges. Full article
(This article belongs to the Special Issue Novel Imaging Techniques in Radiotherapy)
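
The detection metrics reported in this abstract (false discovery rate, false positives per scan, F1-score) follow directly from candidate-level counts before and after the LVM filtering step. Below is a minimal sketch with invented counts, not the authors' data or code; the real system reports a 35.0% FDR reduction and an F1-score of 0.652.

```python
# Illustrative detection metrics for a candidate-filtering pipeline (synthetic counts).
def detection_metrics(tp, fp, fn, n_scans):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "false_discovery_rate": fp / (tp + fp),      # 1 - precision
        "false_positives_per_scan": fp / n_scans,
        "f1": 2 * precision * recall / (precision + recall),
    }

before = detection_metrics(tp=80, fp=160, fn=30, n_scans=50)
after = detection_metrics(tp=75, fp=45, fn=35, n_scans=50)   # LVM rejects many false candidates
print("before filtering:", before)
print("after filtering: ", after)
```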

16 pages, 628 KiB  
Article
Beyond the Bot: A Dual-Phase Framework for Evaluating AI Chatbot Simulations in Nursing Education
by Phillip Olla, Nadine Wodwaski and Taylor Long
Nurs. Rep. 2025, 15(8), 280; https://doi.org/10.3390/nursrep15080280 - 31 Jul 2025
Abstract
Background/Objectives: The integration of AI chatbots in nursing education, particularly in simulation-based learning, is advancing rapidly. However, there is a lack of structured evaluation models, especially to assess AI-generated simulations. This article introduces the AI-Integrated Method for Simulation (AIMS) evaluation framework, a dual-phase approach adapted from the FAITA model, designed to evaluate both prompt design and chatbot performance in the context of nursing education. Methods: This simulation-based study explored the application of an AI chatbot in an emergency planning course. The AIMS framework was developed and applied, consisting of six prompt-level domains (Phase 1) and eight performance criteria (Phase 2). These domains were selected based on current best practices in instructional design, simulation fidelity, and emerging AI evaluation literature. To assess the chatbot’s educational utility, the study employed a scoring rubric for each phase and incorporated a structured feedback loop to refine both prompt design and chatbot interaction. To demonstrate the framework’s practical application, the researchers configured an AI tool referred to in this study as “Eval-Bot v1”, built using OpenAI’s GPT-4.0, to apply Phase 1 scoring criteria to a real simulation prompt. Insights from this analysis were then used to anticipate Phase 2 performance and identify areas for improvement. Participants (three individuals)—all experienced healthcare educators and advanced practice nurses with expertise in clinical decision-making and simulation-based teaching—reviewed the prompt and Eval-Bot’s score to triangulate findings. Results: Simulated evaluations revealed clear strengths in the prompt’s alignment with course objectives and its capacity to foster interactive learning. Participants noted that the AI chatbot supported engagement and maintained appropriate pacing, particularly in scenarios involving emergency planning decision-making. However, challenges emerged in areas related to personalization and inclusivity. While the chatbot responded consistently to general queries, it struggled to adapt tone, complexity, and content to reflect diverse learner needs or cultural nuances. To support replication and refinement, a sample scoring rubric and simulation prompt template are provided. When evaluated using the Eval-Bot tool, moderate concerns were flagged regarding safety prompts and inclusive language, particularly in how the chatbot navigated sensitive decision points. These gaps were linked to predicted performance issues in Phase 2 domains such as dialog control, equity, and user reassurance. Based on these findings, revised prompt strategies were developed to improve contextual sensitivity, promote inclusivity, and strengthen ethical guidance within chatbot-led simulations. Conclusions: The AIMS evaluation framework provides a practical and replicable approach for evaluating the use of AI chatbots in simulation-based education. By offering structured criteria for both prompt design and chatbot performance, the model supports instructional designers, simulation specialists, and developers in identifying areas of strength and improvement. The findings underscore the importance of intentional design, safety monitoring, and inclusive language when integrating AI into nursing and health education. As AI tools become more embedded in learning environments, this framework offers a thoughtful starting point for ensuring they are applied ethically, effectively, and with learner diversity in mind.
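
The dual-phase structure described above (six prompt-level domains, eight performance criteria) maps naturally onto a simple rubric record. The sketch below is purely illustrative: criteria other than the ones the abstract names (safety prompts, inclusivity, dialog control, equity, user reassurance) are hypothetical placeholders, and the 1-5 scale is an assumption rather than the published rubric.

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class PhaseScores:
    """Rubric scores for one AIMS evaluation phase (criterion names partly hypothetical)."""
    scores: dict = field(default_factory=dict)   # criterion -> rating on an assumed 1-5 scale

    def total(self) -> float:
        return mean(self.scores.values())

# Phase 1: prompt-design domains; Phase 2: chatbot-performance criteria.
phase1 = PhaseScores({"objective_alignment": 5, "simulation_fidelity": 4, "interactivity": 4,
                      "inclusivity": 2, "safety_prompts": 3, "pacing": 4})
phase2 = PhaseScores({"dialog_control": 3, "equity": 2, "user_reassurance": 3,
                      "accuracy": 4, "engagement": 5, "adaptivity": 2,
                      "cultural_sensitivity": 2, "ethical_guidance": 3})

print("Phase 1 (prompt design) mean:", round(phase1.total(), 2))
print("Phase 2 (performance) mean:  ", round(phase2.total(), 2))
print("flag for revision:", [k for k, v in {**phase1.scores, **phase2.scores}.items() if v <= 2])
```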

24 pages, 739 KiB  
Article
CPEL: A Causality-Aware, Parameter-Efficient Learning Framework for Adaptation of Large Language Models with Case Studies in Geriatric Care and Beyond
by Jinzhong Xu, Junyi Gao, Xiaoming Liu, Guan Yang, Jie Liu, Yang Long, Ziyue Huang and Kai Yang
Mathematics 2025, 13(15), 2460; https://doi.org/10.3390/math13152460 - 30 Jul 2025
Abstract
Adapting Large Language Models (LLMs) to specialized domains like geriatric care remains a significant challenge due to the limited availability of domain-specific data and the difficulty of achieving efficient yet effective fine-tuning. Current methods often fail to effectively harness domain-specific causal insights, which are crucial for understanding and solving complex problems in low-resource domains. To address these challenges, we propose Causality-Aware, Parameter-Efficient Learning (CPEL), a novel framework that leverages domain-specific causal relationships to guide a multi-layer, parameter-efficient fine-tuning process for more effective domain adaptation. By embedding causal reasoning into the model’s adaptation pipeline, CPEL enables efficient specialization in the target domain while maintaining strong task-specific performance. Specifically, the Causal Prompt Generator of CPEL extracts and applies domain-specific causal structures, generating adaptive prompts that effectively guide the model’s learning process. Complementing this, the MPEFT module employs a dual-adapter mechanism to balance domain-level adaptation with downstream task optimization. This cohesive design ensures that CPEL achieves resource efficiency while capturing domain knowledge in a structured and interpretable manner. Based on this framework, we delved into its application in the field of geriatric care and trained a specialized large language model (Geriatric Care LLaMA) tailored for the aged-care domain, leveraging its capacity to efficiently integrate domain expertise. Experimental results from question-answering tasks demonstrate that CPEL improves ROUGE scores by 9–14% compared to mainstream LLMs and outperforms frontier models by 1–2 points in auto-scoring tasks. In summary, CPEL demonstrates robust generalization and cross-domain adaptability, highlighting its scalability and effectiveness as a transformative solution for domain adaptation in specialized, resource-constrained fields.
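
The abstract does not give implementation details of the MPEFT dual-adapter mechanism or the Causal Prompt Generator, so the following is only a generic sketch of parameter-efficient fine-tuning with a single LoRA adapter plus a causal-knowledge prompt prefix. The base checkpoint, LoRA hyperparameters, and the example causal relations are all assumptions for illustration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base = "meta-llama/Llama-2-7b-hf"   # placeholder base model, not necessarily the one used in the paper
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Generic LoRA adapter (the paper's MPEFT module uses a dual-adapter design instead).
config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16,
                    lora_dropout=0.05, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, config)
model.print_trainable_parameters()

# "Causal prompt": a prefix derived from domain causal relations, prepended to each training example.
causal_prefix = ("Known causal relations: reduced mobility -> higher fall risk; "
                 "polypharmacy -> adverse drug interactions.\n")
example = causal_prefix + "Question: How should fall risk be managed for this patient? Answer:"
inputs = tokenizer(example, return_tensors="pt")   # fed to a standard fine-tuning loop
```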

11 pages, 15673 KiB  
Article
Automating GIS-Based Cloudburst Risk Mapping Using Generative AI: A Framework for Scalable Hydrological Analysis
by Alexander Adiyasa, Andrea Niccolò Mantegna and Irma Kveladze
Hydrology 2025, 12(8), 196; https://doi.org/10.3390/hydrology12080196 - 23 Jul 2025
Abstract
Accurate dynamic hydrological models are often too complex and costly for the rapid, broad-scale screening necessitated for proactive land-use planning against increasing cloudburst risks. This paper demonstrates the use of GPT-4 to develop a GUI-based Python 3.13.2 application for geospatial flood risk assessments. The study used instructive prompt techniques to script a traditional stream and catchment delineation methodology, further embedding it with a custom GUI. The resulting application demonstrates high performance, processing a 29.63 km2 catchment at a 1 m resolution in 30.31 s, and successfully identifying the main upstream contributing areas and flow paths for a specified area of interest. While its accuracy is limited by terrain data artifacts causing stream breaks, this study demonstrates how human–AI collaboration, with the LLM acting as a coding assistant guided by domain expertise, can empower domain experts and facilitate the development of advanced GIS-based decision-support systems. Full article
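
The stream and catchment delineation workflow that the GPT-4-assisted application scripts rests on standard terrain routines such as D8 flow-direction analysis. Below is a minimal, illustrative NumPy version on a tiny synthetic DEM; it is not the application's code, and the DEM values are invented.

```python
import numpy as np

def d8_flow_direction(dem: np.ndarray) -> np.ndarray:
    """For each interior cell, return the index (0-7) of the steepest downslope
    neighbour, or -1 for pits and flat cells (illustrative D8 routing only)."""
    offsets = [(-1, -1), (-1, 0), (-1, 1),
               (0, -1),           (0, 1),
               (1, -1),  (1, 0),  (1, 1)]
    rows, cols = dem.shape
    direction = np.full((rows, cols), -1, dtype=int)
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            drops = []
            for dr, dc in offsets:
                dist = np.hypot(dr, dc)               # 1 for cardinal, sqrt(2) for diagonal
                drops.append((dem[r, c] - dem[r + dr, c + dc]) / dist)
            k_max = int(np.argmax(drops))
            if drops[k_max] > 0:                      # only route flow downhill
                direction[r, c] = k_max
    return direction

# Tiny synthetic DEM sloping toward the lower-right corner
dem = np.add.outer(np.arange(5, 0, -1), np.arange(5, 0, -1)).astype(float)
print(d8_flow_direction(dem))
```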

13 pages, 388 KiB  
Article
Benchmarking ChatGPT-3.5 and OpenAI o3 Against Clinical Pharmacists: Preliminary Insights into Clinical Accuracy, Sensitivity, and Specificity in Pharmacy MCQs
by Esraa M. Alsaudi, Sireen A. Shilbayeh and Rana K Abu-Farha
Healthcare 2025, 13(14), 1751; https://doi.org/10.3390/healthcare13141751 - 19 Jul 2025
Abstract
Objective: This proof-of-concept study aimed to evaluate and compare the clinical performance of two AI language models (ChatGPT-3.5 and OpenAI o3) in answering clinical pharmacy multiple-choice questions (MCQs), benchmarked against responses from specialist clinical pharmacists in Jordan, including academic preceptors and hospital-based clinicians. Methods: A total of 60 clinical pharmacy MCQs were developed based on current guidelines across four therapeutic areas: cardiovascular, endocrine, infectious, and respiratory diseases. Each item was reviewed by academic and clinical experts and then pilot-tested with five pharmacists to determine clarity and difficulty. Two ChatGPT models—GPT-3.5 and OpenAI o3—were tested using a standardized prompt for each MCQ, entered in separate sessions to avoid memory retention. Their answers were classified as true/false positives or negatives and retested after two weeks to assess reproducibility. Simultaneously, 25 licensed pharmacists (primarily from one academic institution and several hospitals in Amman) completed the same MCQs using validated references (excluding AI tools). Accuracy, sensitivity, specificity, and Cohen’s Kappa were used to compare AI and human performance, with statistical analysis conducted using appropriate tests at a significance level of p ≤ 0.05. Results: OpenAI o3 achieved the highest accuracy (83.3%), sensitivity (90.0%), and specificity (70.0%), outperforming GPT-3.5 (70.0%, 77.5%, 55.0%) and pharmacists (69.7%, 77.0%, 55.0%). AI performance declined significantly with increasing question difficulty. OpenAI o3 showed the highest accuracy in the cardiovascular domain (93.3%), while GPT-3.5 performed best in infectious diseases (80.0%). Reproducibility was higher for GPT-3.5 (81.6%, κ = 0.556) than OpenAI o3 (76.7%, κ = 0.364). Over two test rounds, GPT-3.5’s accuracy remained stable, whereas OpenAI o3’s accuracy decreased from 83.3% to 70.0%, indicating some variability. Conclusions: OpenAI o3 shows strong promise as a clinical decision-support tool in pharmacy, especially for low- to moderate-difficulty questions. However, inconsistencies in reproducibility and limitations in complex cases highlight the importance of cautious, supervised integration alongside human expertise. Full article
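
The reported accuracy, sensitivity, specificity, and Cohen's kappa can be reproduced from binary answer classifications with a few lines of scikit-learn. The sketch below uses randomly generated responses rather than the study's 60 MCQs, so the numbers it prints are meaningless; it only shows how the metrics fit together.

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=60)                     # hypothetical keyed answers (1 = "positive" option)
round1 = np.where(rng.random(60) < 0.8, y_true, 1 - y_true)   # model answers, session 1
round2 = np.where(rng.random(60) < 0.8, y_true, 1 - y_true)   # model answers, session 2

tn, fp, fn, tp = confusion_matrix(y_true, round1, labels=[0, 1]).ravel()
print("accuracy:   ", round(accuracy_score(y_true, round1), 3))
print("sensitivity:", round(tp / (tp + fn), 3))
print("specificity:", round(tn / (tn + fp), 3))
# Reproducibility across the two sessions as chance-corrected agreement
print("kappa:      ", round(cohen_kappa_score(round1, round2), 3))
```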

33 pages, 2593 KiB  
Article
Methodological Exploration of Ontology Generation with a Dedicated Large Language Model
by Maria Assunta Cappelli and Giovanna Di Marzo Serugendo
Electronics 2025, 14(14), 2863; https://doi.org/10.3390/electronics14142863 - 17 Jul 2025
Abstract
Ontologies are essential tools for representing, organizing, and sharing knowledge across various domains. This study presents a methodology for ontology construction supported by large language models (LLMs), with an initial application in the automotive sector. Specifically, a user preference ontology for adaptive interfaces in autonomous machines was developed using ChatGPT-4o. Based on this case study, the results were generalized into a reusable methodology. The proposed workflow integrates classical ontology engineering methodologies with the generative and analytical capabilities of LLMs. Each phase follows well-established steps: domain definition, term elicitation, class hierarchy construction, property specification, formalization, population, and validation. A key innovation of this approach is the use of a guiding table that translates domain knowledge into structured prompts, ensuring consistency across iterative interactions with the LLM. Human experts play a continuous role throughout the process, refining definitions, resolving ambiguities, and validating outputs. The ontology was evaluated in terms of logical consistency, structural properties, semantic accuracy, and inferential completeness, confirming its correctness and coherence. Additional validation through SPARQL queries demonstrated its reasoning capabilities. This methodology is generalizable to other domains, if domain experts adapt the guiding table to the specific context. Despite the support provided by LLMs, domain expertise remains essential to guarantee conceptual rigor and practical relevance. Full article
(This article belongs to the Special Issue Role of Artificial Intelligence in Natural Language Processing)
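
Validation through SPARQL queries, as mentioned above, can be scripted with rdflib. The sketch below is a generic competency-style check (listing classes that lack an rdfs:label); the file name and the query are assumptions, not the authors' ontology or validation suite.

```python
from rdflib import Graph

g = Graph()
g.parse("user_preference_ontology.ttl", format="turtle")   # hypothetical ontology file

# Generic validation query: every owl:Class should carry a human-readable label.
query = """
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?cls
WHERE {
    ?cls a owl:Class .
    FILTER NOT EXISTS { ?cls rdfs:label ?lbl }
}
"""
for row in g.query(query):
    print("class without a label:", row.cls)
```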

38 pages, 5791 KiB  
Article
Hybrid Gaussian Process Regression Models for Accurate Prediction of Carbonation-Induced Steel Corrosion in Cementitious Mortars
by Teerapun Saeheaw
Buildings 2025, 15(14), 2464; https://doi.org/10.3390/buildings15142464 - 14 Jul 2025
Abstract
Steel corrosion prediction in concrete infrastructure remains a critical challenge for durability assessment and maintenance planning. This study presents a comprehensive framework integrating domain expertise with advanced machine learning for carbonation-induced corrosion prediction. Four Gaussian Process Regression (GPR) variants were systematically developed: Baseline GPR with manual optimization, Expert Knowledge GPR employing domain-driven dual-kernel architecture, GPR with Automatic Relevance Determination (GPR-ARD) for feature selection, and GPR-OptCorrosion featuring specialized multi-component composite kernels. The models were trained and validated using 180 carbonated mortar specimens with 15 systematically categorized variables spanning mixture, material, environmental, and electrochemical parameters. GPR-OptCorrosion achieved superior performance (R2 = 0.9820, RMSE = 1.3311 μA/cm2), representing 44.7% relative improvement in explained variance over baseline methods, while Expert Knowledge GPR and GPR-ARD demonstrated comparable performance (R2 = 0.9636 and 0.9810, respectively). Contrary to conventional approaches emphasizing electrochemical indicators, automatic relevance determination revealed supplementary cementitious materials (silica fume and fly ash) as dominant predictive factors. All advanced models exhibited excellent generalization (gaps < 0.02) and real-time efficiency (<0.006 s), with probabilistic uncertainty quantification enabling risk-informed infrastructure management. This research contributes to advancing machine learning applications in corrosion engineering and provides a foundation for predictive maintenance strategies in concrete infrastructure. Full article
(This article belongs to the Section Building Materials, and Repair & Renovation)
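
For readers unfamiliar with composite kernels and automatic relevance determination (ARD) in Gaussian Process Regression, the scikit-learn sketch below shows the general pattern on synthetic data: one RBF length scale per input dimension, a noise term, and probabilistic predictions. It does not reproduce the paper's GPR-OptCorrosion kernel, features, or dataset.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel, ConstantKernel as C
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
X = rng.random((180, 15))                                  # 180 specimens x 15 synthetic predictors
y = 5 * X[:, 0] - 3 * X[:, 3] + rng.normal(0, 0.3, 180)    # synthetic corrosion-current target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# ARD: one RBF length scale per feature; small learned length scales mark influential inputs.
kernel = C(1.0) * RBF(length_scale=np.ones(X.shape[1])) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0).fit(X_tr, y_tr)

mean, std = gpr.predict(X_te, return_std=True)             # predictive mean and uncertainty
print("R2:", round(r2_score(y_te, mean), 3), "| mean predictive std:", round(float(std.mean()), 3))
print("learned ARD length scales:", gpr.kernel_.k1.k2.length_scale.round(2))
```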

18 pages, 1760 KiB  
Article
Integrating 68Ga-PSMA-11 PET/CT with Clinical Risk Factors for Enhanced Prostate Cancer Progression Prediction
by Joanna M. Wybranska, Lorenz Pieper, Christian Wybranski, Philipp Genseke, Jan Wuestemann, Julian Varghese, Michael C. Kreissl and Jakub Mitura
Cancers 2025, 17(14), 2285; https://doi.org/10.3390/cancers17142285 - 9 Jul 2025
Abstract
Background/Objectives: This study evaluates whether combining 68Ga-PSMA-11-PET/CT derived imaging biomarkers with clinical risk factors improves the prediction of early biochemical recurrence (eBCR) or clinical progress in patients with high-risk prostate cancer (PCa) after primary treatment, using machine learning (ML) models. Methods: We analyzed data from 93 high-risk PCa patients who underwent 68Ga-PSMA-11 PET/CT and received primary treatment at a single center. Two predictive models were developed: a logistic regression (LR) model and an ML derived probabilistic graphical model (PGM) based on a naïve Bayes framework. Both models were compared against each other and against the CAPRA risk score. The models’ input variables were selected based on statistical analysis and domain expertise including a literature review and expert input. A decision tree was derived from the PGM to translate its probabilistic reasoning into a transparent classifier. Results: The five key input variables were as follows: binarized CAPRA score, maximal intraprostatic PSMA uptake intensity (SUVmax), presence of bone metastases, nodal involvement at common iliac bifurcation, and seminal vesicle infiltration. The PGM achieved superior predictive performance with a balanced accuracy of 0.73, sensitivity of 0.60, and specificity of 0.86, substantially outperforming both the LR (balanced accuracy: 0.50, sensitivity: 0.00, specificity: 1.00) and CAPRA (balanced accuracy: 0.59, sensitivity: 0.20, specificity: 0.99). The decision tree provided an explainable classifier with CAPRA as a primary branch node, followed by SUVmax and specific PET-detected tumor sites. Conclusions: Integrating 68Ga-PSMA-11 imaging biomarkers with clinical parameters, such as CAPRA, significantly improves models to predict progression in patients with high-risk PCa undergoing primary treatment. The PGM offers superior balanced accuracy and enables risk stratification that may guide personalized treatment decisions. Full article
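
The paper's probabilistic graphical model is a naïve Bayes formulation; as a rough stand-in, the sketch below compares logistic regression with a naïve Bayes classifier on synthetic versions of the five input variables and reports balanced accuracy, sensitivity, and specificity. The feature distributions and labels are invented purely for illustration and do not reflect the cohort of 93 patients.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import balanced_accuracy_score, confusion_matrix

rng = np.random.default_rng(2)
n = 93
X = np.column_stack([
    rng.integers(0, 2, n),        # binarized CAPRA score
    rng.lognormal(2.0, 0.5, n),   # intraprostatic SUVmax
    rng.integers(0, 2, n),        # bone metastases on PET
    rng.integers(0, 2, n),        # nodal involvement at the common iliac bifurcation
    rng.integers(0, 2, n),        # seminal vesicle infiltration
])
y = rng.integers(0, 2, n)         # synthetic eBCR / progression label

for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                  ("naive Bayes        ", GaussianNB())]:
    pred = cross_val_predict(clf, X, y, cv=5)
    tn, fp, fn, tp = confusion_matrix(y, pred, labels=[0, 1]).ravel()
    print(name, "| balanced acc:", round(balanced_accuracy_score(y, pred), 2),
          "| sens:", round(tp / (tp + fn), 2), "| spec:", round(tn / (tn + fp), 2))
```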

32 pages, 6788 KiB  
Article
Knee Osteoarthritis Detection and Classification Using Autoencoders and Extreme Learning Machines
by Jarrar Amjad, Muhammad Zaheer Sajid, Ammar Amjad, Muhammad Fareed Hamid, Ayman Youssef and Muhammad Irfan Sharif
AI 2025, 6(7), 151; https://doi.org/10.3390/ai6070151 - 8 Jul 2025
Abstract
Background/Objectives: Knee osteoarthritis (KOA) is a prevalent disorder affecting both older adults and younger individuals, leading to compromised joint function and mobility. Early and accurate detection is critical for effective intervention, as treatment options become increasingly limited as the disease progresses. Traditional diagnostic methods rely heavily on the expertise of physicians and are susceptible to errors. The demand for utilizing deep learning models in order to automate and improve the accuracy of KOA image classification has been increasing. In this research, a unique deep learning model is presented that employs autoencoders as the primary mechanism for feature extraction, providing a robust solution for KOA classification. Methods: The proposed model differentiates between KOA-positive and KOA-negative images and categorizes the disease into its primary severity levels. Levels of severity range from “healthy knees” (0) to “severe KOA” (4). Symptoms range from typical joint structures to significant joint damage, such as bone spur growth, joint space narrowing, and bone deformation. Two experiments were conducted using different datasets to validate the efficacy of the proposed model. Results: The first experiment used the autoencoder for feature extraction and classification, which reported an accuracy of 96.68%. Another experiment using autoencoders for feature extraction and Extreme Learning Machines for actual classification resulted in an even higher accuracy value of 98.6%. To test the generalizability of the Knee-DNS system, we utilized the Butterfly iQ+ IoT device for image acquisition and Google Colab’s cloud computing services for data processing. Conclusions: This work represents a pioneering application of autoencoder-based deep learning models in the domain of KOA classification, achieving remarkable accuracy and robustness. Full article
(This article belongs to the Special Issue AI in Bio and Healthcare Informatics)
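
As a rough illustration of the feature-extraction-plus-ELM pipeline, the sketch below trains an Extreme Learning Machine (random hidden projection followed by a closed-form ridge readout) on stand-in features of the kind an autoencoder bottleneck would produce. The data, dimensions, and hyperparameters are invented; this is not the Knee-DNS system.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(3)
Z = rng.normal(size=(1000, 64))            # stand-in autoencoder bottleneck features of knee X-rays
y = rng.integers(0, 5, size=1000)          # KOA severity grades 0 ("healthy") to 4 ("severe")
Z_tr, Z_te, y_tr, y_te = train_test_split(Z, y, random_state=0)

# Extreme Learning Machine: fixed random hidden layer + linear readout solved in closed form.
n_hidden = 512
W = rng.normal(size=(Z.shape[1], n_hidden))
b = rng.normal(size=n_hidden)

def hidden(Z):
    return np.tanh(Z @ W + b)

readout = Ridge(alpha=1.0).fit(hidden(Z_tr), np.eye(5)[y_tr])   # one-hot targets
pred = readout.predict(hidden(Z_te)).argmax(axis=1)
print("accuracy:", accuracy_score(y_te, pred))
```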

27 pages, 13752 KiB  
Article
Robust Watermarking of Tiny Neural Networks by Fine-Tuning and Post-Training Approaches
by Riccardo Adorante, Alessandro Carra, Marco Lattuada and Danilo Pietro Pau
Symmetry 2025, 17(7), 1094; https://doi.org/10.3390/sym17071094 - 8 Jul 2025
Abstract
Because neural networks pervade many industrial domains and are increasingly complex and accurate, the trained models themselves have become valuable intellectual properties. Developing highly accurate models demands increasingly higher investments of time, capital, and expertise. Many of these models are commonly deployed in cloud services and on resource-constrained edge devices. Consequently, safeguarding them is critically important. Neural network watermarking offers a practical solution to address this need by embedding a unique signature, either as a hidden bit-string or as a distinctive response to specially crafted “trigger” inputs. This allows owners to subsequently prove model ownership even if an adversary attempts to remove the watermark through attacks. In this manuscript, we adapt three state-of-the-art watermarking methods to “tiny” neural networks deployed on edge platforms by exploiting symmetry-related properties that ensure robustness and efficiency. In the context of machine learning, “tiny” is broadly used as a term referring to artificial intelligence techniques deployed in low-energy systems in the mW range and below, e.g., sensors and microcontrollers. We evaluate the robustness of the selected techniques by simulating attacks aimed at erasing the watermark while preserving the model’s original performances. The results before and after attacks demonstrate the effectiveness of these watermarking schemes in protecting neural network intellectual property without degrading the original accuracy. Full article
(This article belongs to the Section Computer)
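
One of the watermarking families the paper adapts relies on trigger inputs that elicit owner-chosen outputs. The PyTorch sketch below shows only the verification side on an untrained toy model; the architecture, trigger set, and decision threshold are placeholders rather than the paper's schemes.

```python
import torch
import torch.nn as nn

# Tiny classifier standing in for an edge-deployed model (illustrative only).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))

# Trigger set: out-of-distribution inputs paired with owner-chosen target labels.
# After watermark embedding (fine-tuning on these pairs), the owner's model should
# reproduce the target labels; an unrelated model should not.
trigger_inputs = torch.rand(32, 1, 28, 28)
trigger_labels = torch.randint(0, 10, (32,))

@torch.no_grad()
def watermark_match_rate(model, inputs, labels):
    preds = model(inputs).argmax(dim=1)
    return (preds == labels).float().mean().item()

rate = watermark_match_rate(model, trigger_inputs, trigger_labels)
print("trigger agreement:", rate, "-> claim ownership only above a preset threshold, e.g. 0.9")
```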

28 pages, 1987 KiB  
Article
LLM-as-a-Judge Approaches as Proxies for Mathematical Coherence in Narrative Extraction
by Brian Keith
Electronics 2025, 14(13), 2735; https://doi.org/10.3390/electronics14132735 - 7 Jul 2025
Abstract
Evaluating the coherence of narrative sequences extracted from large document collections is crucial for applications in information retrieval and knowledge discovery. While mathematical coherence metrics based on embedding similarities provide objective measures, they require substantial computational resources and domain expertise to interpret. We propose using large language models (LLMs) as judges to evaluate narrative coherence, demonstrating that their assessments correlate with mathematical coherence metrics. Through experiments on two data sets—news articles about Cuban protests and scientific papers from visualization conferences—we show that the LLM judges achieve Pearson correlations up to 0.65 with mathematical coherence while maintaining high inter-rater reliability (ICC > 0.92). The simplest evaluation approach achieves performance comparable to the more complex approaches, even outperforming them on focused data sets and retaining over 90% of their performance on the more diverse data sets, while using fewer computational resources. Our findings indicate that LLM-as-a-judge approaches are effective as a proxy for mathematical coherence in the context of narrative extraction evaluation.
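
A common embedding-based coherence measure of the kind the judges are compared against is the average cosine similarity between consecutive items in an extracted narrative, correlated with judge ratings via Pearson's r. The sketch below uses random embeddings and random judge scores as placeholders, so the printed correlation is meaningless; it only shows the computation.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics.pairwise import cosine_similarity

def narrative_coherence(embeddings: np.ndarray) -> float:
    """Mean cosine similarity of consecutive documents in an extracted narrative."""
    sims = [cosine_similarity(embeddings[i:i + 1], embeddings[i + 1:i + 2])[0, 0]
            for i in range(len(embeddings) - 1)]
    return float(np.mean(sims))

rng = np.random.default_rng(4)
narratives = [rng.normal(size=(6, 384)) for _ in range(20)]   # stand-ins for sentence-embedding matrices
math_scores = [narrative_coherence(e) for e in narratives]
judge_scores = rng.uniform(1, 5, size=20)                     # stand-in 1-5 LLM judge ratings

r, p = pearsonr(math_scores, judge_scores)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
```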

15 pages, 4430 KiB  
Article
A Comprehensive Approach to Instruction Tuning for Qwen2.5: Data Selection, Domain Interaction, and Training Protocols
by Xungang Gu, Mengqi Wang, Yangjie Tian, Ning Li, Jiaze Sun, Jingfang Xu, He Zhang, Ruohua Xu and Ming Liu
Computers 2025, 14(7), 264; https://doi.org/10.3390/computers14070264 - 5 Jul 2025
Abstract
Instruction tuning plays a pivotal role in aligning large language models with diverse tasks, yet its effectiveness hinges on the interplay of data quality, domain composition, and training strategies. This study moves beyond qualitative assessment to systematically quantify these factors through extensive experiments on data selection, data mixture, and training protocols. By quantifying performance trade-offs, we demonstrate that the implicit method SuperFiltering achieves an optimal balance, whereas explicit filters can induce capability conflicts. A fine-grained analysis of cross-domain interactions quantifies a near-linear competition between code and math, while showing that tool use data exhibits minimal interference. To mitigate these measured conflicts, we compare multi-task, sequential, and multi-stage training strategies, revealing that multi-stage training significantly reduces Conflict Rates while preserving domain expertise. Our findings culminate in a unified framework for optimizing instruction tuning, offering actionable, data-driven guidelines for balancing multi-domain performance and enhancing model generalization, thus advancing the field by providing a methodology to move from intuition to systematic optimization. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Large Language Modelling)

22 pages, 3183 KiB  
Article
Surrogate Modeling for Building Design: Energy and Cost Prediction Compared to Simulation-Based Methods
by Navid Shirzadi, Dominic Lau and Meli Stylianou
Buildings 2025, 15(13), 2361; https://doi.org/10.3390/buildings15132361 - 5 Jul 2025
Abstract
Designing energy-efficient buildings is essential for reducing global energy consumption and carbon emissions. However, traditional physics-based simulation models require substantial computational resources, detailed input data, and domain expertise. To address these limitations, this study investigates the use of three machine learning-based surrogate models—Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Multilayer Perceptron (MLP)—trained on a synthetic dataset of 2000 EnergyPlus-simulated building design scenarios to predict both energy use intensity (EUI) and cost estimates for midrise apartment buildings in the Toronto area. All three models exhibit strong predictive performance, with R2 values exceeding 0.9 for both EUI and cost. XGBoost achieves the best performance in cost prediction on the testing dataset with a root mean squared error (RMSE) of 5.13 CAD/m2, while MLP outperforms others in EUI prediction with a testing RMSE of 0.002 GJ/m2. In terms of computational efficiency, the surrogate models significantly outperform a physics-based simulation model, with MLP running approximately 340 times faster and XGBoost and RF achieving over 200 times speedup. This study also examines the effect of training dataset size on model performance, identifying a point of diminishing returns where further increases in data size yield minimal accuracy gains but substantially higher training times. To enhance model interpretability, SHapley Additive exPlanations (SHAP) analysis is used to quantify feature importance, revealing how different model types prioritize design parameters. A parametric design configuration analysis further evaluates the models’ sensitivity to changes in building envelope features. Overall, the findings demonstrate that machine learning-based surrogate models can serve as fast, accurate, and interpretable alternatives to traditional simulation methods, supporting efficient decision-making during early-stage building design. Full article
(This article belongs to the Section Building Energy, Physics, Environment, and Systems)
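
To make the surrogate-modeling workflow concrete, the sketch below trains an XGBoost cost surrogate and an MLP EUI surrogate on synthetic design samples and inspects feature importance with SHAP. The feature set, targets, and hyperparameters are assumptions for illustration, not the study's 2000 EnergyPlus-simulated scenarios.

```python
import numpy as np
import shap
from xgboost import XGBRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(5)
X = rng.random((2000, 8))                                   # stand-in envelope/design parameters
eui = 0.5 - 0.3 * X[:, 1] + 0.1 * X[:, 0] + rng.normal(0, 0.01, 2000)   # synthetic EUI (GJ/m2)
cost = 900 + 400 * X[:, 1] + rng.normal(0, 10, 2000)                    # synthetic cost (CAD/m2)

X_tr, X_te, e_tr, e_te, c_tr, c_te = train_test_split(X, eui, cost, random_state=0)

xgb = XGBRegressor(n_estimators=300, max_depth=4).fit(X_tr, c_tr)                    # cost surrogate
mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(X_tr, e_tr)       # EUI surrogate

cost_pred = xgb.predict(X_te)
print("cost R2:", round(r2_score(c_te, cost_pred), 3),
      "| RMSE:", round(mean_squared_error(c_te, cost_pred) ** 0.5, 2))
print("EUI R2: ", round(r2_score(e_te, mlp.predict(X_te)), 3))

# Feature importance for the tree-based surrogate via SHAP
shap_values = shap.TreeExplainer(xgb).shap_values(X_te)
print("mean |SHAP| per feature:", np.abs(shap_values).mean(axis=0).round(1))
```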
