MDPI - Publisher of Open Access Journals

21 pages, 708 KB

Open Access Article

Assessing Comprehensive Spatial Ability and Specific Attributes Through Higher-Order LLM

by Jujia Li, Kaiwen Man, Mehdi Rajeb, Andrew Krist and Joni M. Lakin

J. Intell. 2025, 13(10), 127; https://doi.org/10.3390/jintelligence13100127 - 5 Oct 2025

Spatial reasoning ability plays a critical role in predicting academic outcomes, particularly in STEM (science, technology, engineering, and mathematics) education. According to the Cattell–Horn–Carroll (CHC) theory of human intelligence, spatial reasoning is a general ability including various specific attributes. However, most spatial assessments focus on testing one specific spatial attribute or a limited set (e.g., visualization, rotation, etc.), rather than general spatial ability. To address this limitation, we created a mixed spatial test that includes mental rotation, object assembly, and isometric perception subtests to evaluate both general spatial ability and specific attributes. To understand the complex relationship between general spatial ability and mastery of specific attributes, we used a higher-order linear logistic model (HO-LLM), which is designed to simultaneously estimate higher-order ability and sub-attributes. Additionally, this study compares four spatial ability classification frameworks, using each to construct Q-matrices that define the relationships between test items and spatial reasoning attributes within the HO-LLM framework. Our findings indicate that HO-LLMs improve model fit and show distinct patterns of attribute mastery, highlighting which spatial attributes contribute most to general spatial ability. The results suggest that higher-order LLMs can offer a deeper and more interpretable assessment of spatial ability and support tailored training by identifying areas of strength and weakness in individual learners. Full article

(This article belongs to the Section Contributions to the Measurement of Intelligence)

20 pages, 5435 KB

Open Access Article

Do LLMs Offer a Robust Defense Mechanism Against Membership Inference Attacks on Graph Neural Networks?

by Abdellah Jnaini and Mohammed-Amine Koulali

Computers 2025, 14(10), 414; https://doi.org/10.3390/computers14100414 - 1 Oct 2025

Abstract

Graph neural networks (GNNs) are deep learning models that process structured graph data. By leveraging their graph/node classification and link prediction capabilities, they have been effectively applied in multiple domains such as community detection, location sharing services, and drug discovery. These powerful applications and the vast availability of graphs in diverse fields have facilitated the adoption of GNNs in privacy-sensitive contexts (e.g., banking systems and healthcare). Unfortunately, GNNs are vulnerable to the leakage of sensitive information through well-defined attacks. Our main focus is on membership inference attacks (MIAs) that allow the attacker to infer whether a given sample belongs to the training dataset. To prevent this, we introduce three LLM-guided defense mechanisms applied at the posterior level: posterior encoding with noise, knowledge distillation, and secure aggregation. Our proposed approaches not only successfully reduce MIA accuracy but also maintain the model’s performance on the node classification task. Our findings, validated through extensive experiments on widely used GNN architectures, offer insights into balancing privacy preservation with predictive performance. Full article

20 pages, 1909 KB

Open Access Article

RecGen: No-Coding Shell of Rule-Based Expert System with Digital Twin and Capability-Driven Approach Elements for Building Recommendation Systems

by Sergejs Kodors, Ilmars Apeinans, Imants Zarembo and Jelena Lonska

Appl. Sci. 2025, 15(19), 10482; https://doi.org/10.3390/app151910482 - 27 Sep 2025

Abstract

Translating knowledge into formal representation for the purpose of building an expert system is a daunting task for domain experts and requires information technology (IT) competence and software developer support. The availability of open and robust expert system shells is a way to solve this task. A new architecture of a rule-based expert system combining the digital twin paradigm and a capability-driven approach is presented in this study. The aim of the architecture is to provide a user-friendly framework for domain experts to build upon without the need to delve into technical aspects. To support this architecture, an open-source no-coding shell RecGen has been developed (Python and Django framework). RecGen was validated on a use case of an expert system for providing recommendations to reduce plate waste in schools. In addition, the article presents experiments with large language models (LLMs) by implementing a question-answering functionality in an attempt to improve the user experience while working with large expert system knowledge bases. A mean classification accuracy of 74.1% was achieved experimentally using the injection method with language prefixes. The ablation test was applied in order to investigate the effect of augmentation, injection, a linear layer size, and lowercase text on LLM accuracy. However, the analysis of the results showed that clustering algorithms would be a more suitable solution for future improvements of the expert system shell RecGen. Full article

(This article belongs to the Section Computing and Artificial Intelligence)

19 pages, 912 KB

Open Access Article

Large Language Model and Knowledge Graph-Driven AJCC Staging of Prostate Cancer Using Pathology Reports

by Eunbeen Jo, Tae Il Noh and Hyung Joon Joo

Diagnostics 2025, 15(19), 2474; https://doi.org/10.3390/diagnostics15192474 - 27 Sep 2025

Abstract

Background/Objectives: To develop an automated American Joint Committee on Cancer (AJCC) staging system for radical prostatectomy pathology reports using large language model-based information extraction and knowledge graph validation. Methods: Pathology reports from 152 radical prostatectomy patients were used. Five additional parameters (Prostate-specific antigen (PSA) level, metastasis stage (M-stage), extraprostatic extension, seminal vesicle invasion, and perineural invasion) were extracted using GPT-4.1 with zero-shot prompting. A knowledge graph was constructed to model pathological relationships and implement rule-based AJCC staging with consistency validation. Information extraction performance was evaluated using a local open-source large language model (LLM) (Mistral-Small-3.2-24B-Instruct) across 16 parameters. The LLM-extracted information was integrated into the knowledge graph for automated AJCC staging classification and data consistency validation. The developed system was further validated using pathology reports from 88 radical prostatectomy patients in The Cancer Genome Atlas (TCGA) dataset. Results: Information extraction achieved an accuracy of 0.973 and an F1-score of 0.986 on the internal dataset, and 0.938 and 0.968, respectively, on external validation. AJCC staging classification showed macro-averaged F1-scores of 0.930 and 0.833 for the internal and external datasets, respectively. Knowledge graph-based validation detected data inconsistencies in 5 of 150 cases (3.3%). Conclusions: This study demonstrates the feasibility of automated AJCC staging through the integration of large language model information extraction and knowledge graph-based validation. The resulting system enables privacy-protected clinical decision support for cancer staging applications with extensibility to broader oncologic domains. Full article

(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

15 pages, 3463 KB

Open Access Article

LLM-Enhanced Multimodal Framework for Drug–Drug Interaction Prediction

by Song Im and Younhee Ko

Biomedicines 2025, 13(10), 2355; https://doi.org/10.3390/biomedicines13102355 - 26 Sep 2025

Abstract

Background: Drug–drug interactions (DDIs) involve pharmacokinetic or pharmacodynamic changes that occur when multiple drugs are co-administered, potentially leading to reduced efficacy or adverse effects. As polypharmacy becomes more prevalent, especially among patients with chronic diseases, scalable and accurate DDI prediction has become increasingly important. Although numerous computational approaches have been proposed to predict DDIs using various modalities such as chemical structure and biological networks, the intrinsic heterogeneity of these data complicates unified modeling. Methods: We address this challenge with a multimodal deep learning framework that integrates three complementary, heterogeneous modalities: (i) chemical structure, (ii) semantic embeddings derived from BioBERT (a domain-specific large language model, LLM), and (iii) pharmacological mechanisms through the CTET proteins. To incorporate indirect biological pathways within the PPI network, we apply a random walk with restart (RWR) algorithm. Results: Across feature combinations, fusing structural features with BioBERT embeddings achieved the highest classification accuracy (0.9655), highlighting the value of readily available data and the capacity of domain-specific language models to encode pharmacological semantics from unstructured texts. Conclusions: BioBERT embeddings were particularly informative, capturing subtle pharmacological relationships between drugs and improving prediction of potential DDIs. Beyond predictive performance, the framework is readily applicable to real-world clinical workflows, providing rapid DDI references to support polypharmacy decision-making. Full article
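The random walk with restart (RWR) step named in the abstract can be illustrated with a minimal sketch. The abstract gives neither the restart probability nor the network, so the function name, the `restart` default, and the toy four-node path graph below are all assumptions chosen for illustration, not the authors' implementation.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until it stops changing.

    adjacency: square list-of-lists (symmetric 0/1 weights in this toy case).
    seeds: set of node indices where the walker restarts (hypothetical drug targets).
    """
    n = len(adjacency)
    # Column-normalise so each node spreads its probability evenly over its neighbours.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    W = [[adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2 - 3 standing in for a protein-protein interaction network.
toy_ppi = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(toy_ppi, seeds={0}, restart=0.3)
print([round(s, 4) for s in scores])
```

Nodes closer to the seed receive higher scores, which is how RWR lets proteins that are only indirectly connected to a drug's targets still contribute to its feature vector.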

(This article belongs to the Special Issue Advances in Drug Discovery and Development Using Mass Spectrometry)

25 pages, 2375 KB

Open Access Article

Evaluating the Effectiveness of Large Language Models (LLMs) Versus Machine Learning (ML) in Identifying and Detecting Phishing Email Attempts

by Saed Tarapiah, Linda Abbas, Oula Mardawi, Shadi Atalla, Yassine Himeur and Wathiq Mansoor

Algorithms 2025, 18(10), 599; https://doi.org/10.3390/a18100599 - 25 Sep 2025

Abstract

Phishing emails remain a significant concern and a growing cybersecurity threat in online communication. They often bypass traditional filters due to their increasing sophistication. This study presents a comparative evaluation of machine learning (ML) models and transformer-based large language models (LLMs) for phishing email detection, with embedded URL analysis. This study assessed ML training and LLM fine-tuning on both balanced and imbalanced datasets. We evaluated multiple ML models, including Random Forest, Logistic Regression, Support Vector Machine, Naïve Bayes, Gradient Boosting, Decision Tree, and K-Nearest Neighbors, alongside transformer-based LLMs DistilBERT, ALBERT, BERT-Tiny, ELECTRA, MiniLM, and RoBERTa. To further enhance realism, phishing emails generated by LLMs were included in the evaluation. Across all configurations, both the ML models and the fine-tuned LLMs demonstrated robust performance. Random Forest achieved over 98% accuracy in both email detection and URL classification. DistilBERT obtained almost as high scores on emails and URLs. Balancing the dataset led to slight accuracy gains in ML models but minor decreases in LLMs, likely due to their sensitivity to majority class reductions during training. Overall, LLMs are highly effective at capturing complex language patterns, while traditional ML models remain efficient and require low computational resources. Combining both approaches through a hybrid or ensemble method could enhance phishing detection effectiveness. Full article

(This article belongs to the Section Evolutionary Algorithms and Machine Learning)

17 pages, 918 KB

Open Access Article

Criteria and Protocol: Assessing Generative AI Efficacy in Perceiving EULAR 2019 Lupus Classification

by Gerald H. Lushington, Sandeep Nair, Eldon R. Jupe, Bernard Rubin and Mohan Purushothaman

Diagnostics 2025, 15(18), 2409; https://doi.org/10.3390/diagnostics15182409 - 22 Sep 2025

Viewed by 184

Abstract

Background/Objectives: In clinical informatics, the term ‘information overload’ is increasingly used to describe the operational impediments of excessive documentation. While electronic health records (EHRs) are growing in abundance, many medical records (MRs) remain in legacy formats that impede efficient, systematic processing, contributing to the extenuating challenges of care fragmentation. Thus, there is a growing interest in using generative AI (genAI) for automated MR summarization and characterization. Methods: MRs for a set of 78 individuals were digitized. Some were known systemic lupus erythematosus (SLE) cases, while others were under evaluation for possible SLE classification. A two-pass genAI assessment strategy was implemented using the Claude 3.5 large language model (LLM) to mine MRs for information relevant to classifying SLE vs. undifferentiated connective tissue disorder (UCTD) vs. neither via the 22-criteria EULAR 2019 model. Results: Compared to clinical determination, the antinuclear antibody (ANA) criterion (whose results are crucial for classifying SLE-negative cases) exhibited favorable sensitivity 0.78 ± 0.09 (95% confidence interval) and a positive predictive value 0.85 ± 0.08 but a marginal performance for specificity 0.60 ± 0.11 and uncertain predictivity for the negative predictive value 0.48 ± 0.11. Averaged over the remaining 21 criteria, these four performance metrics were 0.69 ± 0.11, 0.87 ± 0.04, 0.54 ± 0.10, and 0.93 ± 0.03. Conclusions: ANA performance statistics imply that genAI yields confident assessments of SLE negativity (per high sensitivity) but weaker positivity. The remaining genAI criterial determinations support (per specificity) confident assertions of SLE-positivity but tend to misclassify a significant fraction of clinical positives as UCTD. Full article
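The four performance metrics reported in this abstract all derive from a 2×2 confusion table of genAI determinations against clinical determinations. The sketch below shows the standard definitions; the counts are hypothetical, since the abstract does not publish the underlying table.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-table counts."""
    return {
        "sensitivity": _ratio(tp, tp + fn),  # share of clinical positives that were found
        "specificity": _ratio(tn, tn + fp),  # share of clinical negatives that were found
        "ppv": _ratio(tp, tp + fp),          # share of predicted positives that are correct
        "npv": _ratio(tn, tn + fn),          # share of predicted negatives that are correct
    }


# Hypothetical counts for one criterion; not taken from the study.
example = diagnostic_metrics(tp=8, fp=2, tn=6, fn=4)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```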

(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

17 pages, 1622 KB

Open Access Article

On Measuring Large Language Models Performance with Inferential Statistics

by Jesús M. Fraile-Hernández and Anselmo Peñas

Information 2025, 16(9), 817; https://doi.org/10.3390/info16090817 - 20 Sep 2025

Viewed by 184

Abstract

Measuring the reliability of performance evaluations is particularly important when we evaluate non-deterministic models. This is the case of using large language models (LLMs) in classification tasks, where different runs generate different outputs. This fact raises the question about how reliable the evaluation of a solution is. Previous work relies on executing several runs and then taking some kind of average together with confidence intervals. However, confidence intervals themselves may not be reliable if the number of executions is not large enough. Therefore, more effective and robust methods are needed for their estimation. In this work, we propose a methodology that estimates model performance while capturing the intra-run variability by leveraging instance-level predictions across multiple runs, enabling the computation of more reliable confidence intervals when the gold standard is available. Our method also offers greater computational efficiency by reducing the number of full model executions required to estimate performance variability. Compared against existing state-of-the-art evaluation methods, our approach achieves full empirical coverage (100%) of plausible performance outcomes using as few as three runs, whereas traditional methods reach at most 63% coverage, even with eight runs. Full article

(This article belongs to the Section Artificial Intelligence)

18 pages, 2229 KB

Open Access Article

Large Language Models for Construction Risk Classification: A Comparative Study

by Abdolmajid Erfani and Hussein Khanjar

Buildings 2025, 15(18), 3379; https://doi.org/10.3390/buildings15183379 - 18 Sep 2025

Viewed by 349

Abstract

Risk identification is a critical concern in the construction industry. In recent years, there has been a growing trend of applying artificial intelligence (AI) tools to detect risks from unstructured data sources such as news articles, social media, contracts, and financial reports. The rapid advancement of large language models (LLMs) in text analysis, summarization, and generation offers promising opportunities to improve construction risk identification. This study conducts a comprehensive benchmarking of natural language processing (NLP) and LLM techniques for automating the classification of risk items into a generic risk category. Twelve model configurations are evaluated, ranging from classical NLP pipelines using TF-IDF and Word2Vec to advanced transformer-based models such as BERT and GPT-4 with zero-shot, instruction, and few-shot prompting strategies. The results reveal that LLMs, particularly GPT-4 with few-shot prompts, achieve a competitive performance (F1 = 0.81) approaching that of the best classical model (BERT + SVM; F1 = 0.86), all without the need for training data. Moreover, LLMs exhibit a more balanced performance across imbalanced risk categories, showcasing their adaptability in data-sparse settings. These findings contribute theoretically by positioning LLMs as scalable plug-and-play alternatives to NLP pipelines, offering practical value by highlighting how LLMs can support early-stage project planning and risk assessment in contexts where labeled data and expert resources are limited. Full article

(This article belongs to the Special Issue Next-Gen Risk Management: AI-Driven Solutions for Engineering and Construction Projects)

18 pages, 3997 KB

Open Access Article

A Novel Multimodal Large Language Model-Based Approach for Urban Flood Detection Using Open-Access Closed Circuit Television in Bandung, Indonesia

by Tsun-Hua Yang, Obaja Triputera Wijaya, Sandy Ardianto and Albert Budi Christian

Water 2025, 17(18), 2739; https://doi.org/10.3390/w17182739 - 16 Sep 2025

Viewed by 248

Abstract

Monitoring urban pluvial floods remains a challenge, particularly in dense city environments where drainage overflows are localized, and sensor-based systems are often impractical. Physical sensors can be costly, prone to theft, and difficult to maintain in areas with high human activity. To address this, we developed an innovative flood detection framework that utilizes publicly accessible CCTV imagery and large language models (LLMs) to classify flooding conditions directly from images using natural language prompts. The system was tested in Bandung, Indonesia, across 340 CCTV locations over a one-year period. Four multimodal LLMs, ChatGPT-4.1, Gemini 2.5 Pro, Mistral Pixtral, and DeepSeek-VL Janus, were evaluated based on classification accuracy and operational cost. ChatGPT-4.1 achieved the highest overall accuracy at 85%, with higher performance during the daytime (89%) and lower accuracy at night (78%). A cost analysis showed that deploying GPT-4.1 every 15 min across all locations would require approximately USD 59,568 per year. However, using compact models like GPT-4 nano could reduce costs by up to seven times, with minimal loss of accuracy. These results highlight the trade-off between performance and affordability, especially in developing regions. This approach offers a scalable, passive flood monitoring solution that can be integrated into early warning systems. Future improvements may include multi-frame image analysis, automated confidence filtering, and multi-level flood classification for enhanced situational awareness. Full article
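The annual cost figure can be sanity-checked with simple arithmetic. The abstract does not state a per-image price, so the sketch below back-calculates one under the assumptions of exactly one image per location per 15-minute interval, 365 days a year, and no failed or skipped calls; the variable names are illustrative.

```python
# Figures quoted in the abstract.
LOCATIONS = 340
INTERVAL_MIN = 15
ANNUAL_COST_USD = 59_568
MAX_REDUCTION = 7  # "up to seven times" cheaper with a compact model


def calls_per_year(locations, interval_min, days=365):
    """Number of image classifications per year under the stated assumptions."""
    per_day = (24 * 60) // interval_min
    return locations * per_day * days


total_calls = calls_per_year(LOCATIONS, INTERVAL_MIN)
implied_cost_per_call = ANNUAL_COST_USD / total_calls
compact_model_floor = ANNUAL_COST_USD / MAX_REDUCTION  # best case for the compact model

print(f"calls per year:        {total_calls:,}")
print(f"implied cost per call: USD {implied_cost_per_call:.4f}")
print(f"compact-model floor:   USD {compact_model_floor:,.0f} per year")
```

Under these assumptions the quoted total corresponds to roughly half a US cent per classified image, and the "up to seven times" reduction would bring the yearly bill to about USD 8,500 at best.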

(This article belongs to the Special Issue Machine Learning Models for Hydrological Inference: A Case Study for Flood Events)

22 pages, 785 KB

Open Access Article

Detection of Fake News in Romanian: LLM-Based Approaches to COVID-19 Misinformation

by Alexandru Dima, Ecaterina Ilis, Diana Florea and Mihai Dascalu

Information 2025, 16(9), 796; https://doi.org/10.3390/info16090796 - 13 Sep 2025

Viewed by 365

Abstract

The spread of misinformation during the COVID-19 pandemic raised widespread concerns about public health communication and media reliability. In this study, we focus on these issues as they manifested in Romanian-language media and employ Large Language Models (LLMs) to classify misinformation, with a particular focus on super-narratives—broad thematic categories that capture recurring patterns and ideological framings commonly found in pandemic-related fake news, such as anti-vaccination discourse, conspiracy theories, or geopolitical blame. While some of the categories reflect global trends, others are shaped by the Romanian cultural and political context. We introduce a novel dataset of fake news centered on COVID-19 misinformation in the Romanian geopolitical context, comprising both annotated and unannotated articles. We experimented with multiple LLMs using zero-shot, few-shot, supervised, and semi-supervised learning strategies, achieving the best results with an LLaMA 3.1 8B model and semi-supervised learning, which yielded an F1-score of 78.81%. Experimental evaluations compared this approach to traditional Machine Learning classifiers augmented with morphosyntactic features. Results show that semi-supervised learning substantially improved classification results in both binary and multi-class settings. Our findings highlight the effectiveness of semi-supervised adaptation in low-resource, domain-specific contexts, as well as the necessity of enabling real-time misinformation tracking and enhancing transparency through claim-level explainability and fact-based counterarguments. Full article

(This article belongs to the Special Issue Advances in Information Quality: Fact-Checking and AI in the Era of Fake News)

59 pages, 3482 KB

Open Access Feature Paper Article

Empirical Evaluation of Reasoning LLMs in Machinery Functional Safety Risk Assessment and the Limits of Anthropomorphized Reasoning

by Padma Iyenghar

Electronics 2025, 14(18), 3624; https://doi.org/10.3390/electronics14183624 - 12 Sep 2025

Cited by 1 | Viewed by 341

Abstract

Transparent reasoning and interpretability are essential for AI-supported risk assessment, yet it remains unclear whether large language models (LLMs) can provide reliable, deterministic support for safety-critical tasks or merely simulate reasoning through plausible outputs. This study presents a systematic, multi-model empirical evaluation of reasoning-capable LLMs applied to machinery functional safety, focusing on Required Performance Level (PL_r) estimation as defined by ISO 13849-1 and ISO 12100. Six state-of-the-art models (Claude-opus, o3-mini, o4-mini, GPT-5-mini, Gemini-2.5-flash, DeepSeek-Reasoner) were evaluated across six prompting strategies and two dataset variants: canonical ISO-style hazards (Variant 1) and engineer-authored free-text scenarios (Variant 2). Results show that rule-grounded prompting consistently stabilizes performance, achieving ceiling-level accuracy in Variant 1 and restoring reliability under lexical variability in Variant 2. In contrast, unconstrained chain-of-thought reasoning (CoT) and CoT together with Retrieval-Augmented Generation (RAG) introduce volatility, overprediction biases, and model-dependent degradations. Safety-critical coverage was quantified through per-class F1 and recall of PL_r class e, confirming that only rule-grounded prompts reliably captured rare but high-risk hazards. Latency analysis demonstrated that rule-only prompts were both the most accurate and the most efficient, while CoT strategies incurred 2–10× overhead. A confusion/rescue analysis of retrieval interactions further revealed systematic noise mechanisms such as P-inflation and F-drift, showing that retrieval can either destabilize or rescue cases depending on model family. Intermediate severity/frequency/possibility (S/F/P) reasoning steps were found to diverge from ISO-consistent logic, reinforcing critiques that LLM “reasoning” reflects surface-level continuation rather than genuine inference. 
All reported figures include 95% confidence intervals, t-intervals across runs (r = 5) for accuracy and timing, and class-stratified bootstrap CIs for Micro/Macro/Weighted-F₁ and per-class metrics. Overall, this study establishes a rigorous benchmark for evaluating LLMs in functional safety workflows such as PL_r determination. It shows that deterministic, safety-critical classification requires strict rule-constrained prompting and careful retrieval governance, rather than reliance on assumed model reasoning abilities. Full article
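A t-interval across r = 5 runs, as mentioned in the abstract, can be computed as in the sketch below. The per-run accuracies are hypothetical, and the critical values are the standard two-sided 95% Student-t quantiles hardcoded for small sample sizes so that the example needs only the standard library; this is an illustration of the general technique, not the study's code.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (runs - 1).
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776}


def t_interval_95(values):
    """Return (mean, half_width) of a 95% t-interval over independent runs."""
    n = len(values)
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)  # sample std. dev. (n - 1)
    return mean, T_CRIT_95[n - 1] * std_err


# Hypothetical accuracies from five repeated runs of one model/prompt pair.
run_accuracies = [0.90, 0.92, 0.91, 0.93, 0.89]
mean_acc, half_width = t_interval_95(run_accuracies)
print(f"accuracy = {mean_acc:.3f} ± {half_width:.3f} (95% CI, r = {len(run_accuracies)})")
```

With only five runs the critical value (2.776) is well above the large-sample 1.96, so the interval is noticeably wider than a normal approximation would suggest.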

(This article belongs to the Special Issue New Insights into Natural Language Processing and Large Language Models)

26 pages, 24511 KB

Open Access Article

VTLLM: A Vessel Trajectory Prediction Approach Based on Large Language Models

by Ye Liu, Wei Xiong, Nanyu Chen and Fei Yang

J. Mar. Sci. Eng. 2025, 13(9), 1758; https://doi.org/10.3390/jmse13091758 - 11 Sep 2025

Viewed by 439

Abstract

In light of the rapid expansion of maritime trade, the maritime transportation industry has experienced burgeoning growth and complexity. The deployment of trajectory prediction technology is paramount in safeguarding navigational safety. Due to limitations in design complexity and the high costs of data fusion, current deep learning methods struggle to effectively integrate high-level semantic cues, such as vessel type, geographical identifiers, and navigational states, within predictive frameworks. Yet, these data contain abundant information regarding vessel categories or operational scenarios. Inspired by the robust semantic comprehension exhibited by large language models (LLMs) in natural language processing, this study introduces a trajectory prediction method leveraging LLMs. Initially, Automatic Identification System (AIS) data undergoes processing to eliminate incomplete entries, thereby selecting trajectories of high quality. Distinct from prior research that concentrated solely on vessel position and velocity, this study integrates ship identity, spatiotemporal trajectory, and navigational information through prompt engineering, empowering the LLM to extract multidimensional semantic features of trajectories from comprehensive natural language narratives. Thus, the LLM can amalgamate multi-source semantics with zero marginal cost, significantly enhancing its understanding of complex maritime environments. Subsequently, a supervised fine-tuning approach rooted in Low-Rank Adaptation (LoRA) is applied to train the chosen LLMs. This enables rapid adaptation of the LLM to specific maritime areas or vessel classifications by modifying only a limited subset of parameters, thereby appreciably diminishing both data requirements and computational costs. Finally, representative metrics are utilized to evaluate the efficacy of the model training and to benchmark its performance against prevailing advanced models for ship trajectory prediction. 
The results indicate that the model demonstrates notable performance in short-term predictions (for instance, with a prediction step of 1 h, the average distance errors for VTLLM and TrAISformer are 5.26 nmi and 6.12 nmi, respectively, resulting in a performance improvement of approximately 14.05%), having identified certain patterns and features, such as linear movements and turns, from the training data. Full article
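The quoted improvement follows from the two distance errors if it is read as the relative error reduction with TrAISformer as the baseline. That reading is an assumption, since the abstract does not define the formula; the short check below reproduces the figure under it.

```python
# Average distance errors at a 1 h prediction step, as quoted in the abstract (nmi).
VTLLM_ERROR_NMI = 5.26
TRAISFORMER_ERROR_NMI = 6.12


def relative_reduction_pct(baseline, candidate):
    """Percentage by which `candidate` lowers the error relative to `baseline`."""
    return (baseline - candidate) / baseline * 100


improvement_pct = relative_reduction_pct(TRAISFORMER_ERROR_NMI, VTLLM_ERROR_NMI)
print(f"relative error reduction: {improvement_pct:.2f}%")
```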

(This article belongs to the Section Ocean Engineering)
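The improvement quoted in the abstract above can be checked with a short calculation. The sketch below assumes the figure is the relative reduction in average distance error with TrAISformer as the baseline; the abstract does not state the formula, and the function and variable names are illustrative only.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Percentage reduction in error relative to the baseline (assumed formula)."""
    return (baseline_error - new_error) / baseline_error * 100.0


# Average distance errors at a 1 h prediction step, as quoted in the abstract.
traisformer_nmi = 6.12
vtllm_nmi = 5.26

improvement = relative_improvement(traisformer_nmi, vtllm_nmi)
print(f"{improvement:.2f}%")  # prints "14.05%"
```

Under that assumption, the two reported errors reproduce the stated 14.05%.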

29 pages, 3929 KB

Open AccessArticle

Large Language Model-Based Autonomous Agent for Prognostics and Health Management

by Minhyeok Cha, Sang-il Yoon, Seongrae Kim, Daeyoung Kang, Keonwoo Nam, Teakyong Lee and Joon-Young Kim

Machines 2025, 13(9), 831; https://doi.org/10.3390/machines13090831 - 9 Sep 2025

Viewed by 610

Abstract

Prognostics and Health Management (PHM), including fault diagnosis and Remaining Useful Life (RUL) prediction, is critical for ensuring the reliability and efficiency of industrial equipment. However, traditional AI-based methods require extensive expert intervention in data preprocessing, model selection, and hyperparameter tuning, making them less scalable and accessible in real-world applications. To address these limitations, this study proposes an autonomous agent powered by Large Language Models (LLMs) to automate predictive modeling for fault diagnosis and RUL prediction. The proposed agent processes natural language queries, extracts key parameters, and autonomously configures AI models while integrating an iterative optimization mechanism for dynamic hyperparameter tuning. Under identical settings, we compared GPT-3.5 Turbo, GPT-4, GPT-4o, GPT-4o-mini, Gemini-2.0-Flash, and LLaMA-3.2 on accuracy, latency, and cost, using GPT-4 as the baseline. The most accurate model is GPT-4o with an accuracy of 0.96, a gain of six percentage points over GPT-4. It also reduces end-to-end time to 1.900 s and cost to $0.00455 per 1 k tokens, which correspond to reductions of 32% and 59%. For speed and cost efficiency, Gemini-2.0-Flash reaches 0.964 s and $0.00021 per 1 k tokens with accuracy 0.94, an improvement of four percentage points over GPT-4. The agent operates through interconnected modules, seamlessly transitioning from query analysis to AI model deployment while optimizing model selection and performance. Experimental results confirmed that the developed agent achieved stable performance under ideal configurations, attaining accuracy 0.97 on FordA for binary fault classification, accuracy 0.95 on CWRU for multi-fault classification, and an asymmetric score of 380.74 on C-MAPSS FD001 for RUL prediction, while significantly reducing manual intervention. 
By bridging the gap between domain expertise and AI-driven predictive maintenance, this study advances industrial automation, improving efficiency, scalability, and accessibility. The proposed approach paves the way for the broader adoption of autonomous AI systems in industrial maintenance. Full article

(This article belongs to the Section Automation and Control Systems)
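The abstract above reports an "asymmetric score" on C-MAPSS FD001 without defining it. A minimal sketch follows, assuming the score is the scoring function commonly used with the C-MAPSS benchmark (from the PHM08 data challenge), which penalizes late RUL predictions more heavily than early ones. Whether the authors used exactly this definition is an assumption, and the function name is hypothetical.

```python
import math


def asymmetric_rul_score(true_rul, predicted_rul):
    """Sum of asymmetric exponential penalties over all test units.

    d = predicted - true. Early predictions (d < 0) use a time constant of 13,
    late predictions (d >= 0) use 10, so late predictions cost more.
    This is the commonly used C-MAPSS definition, assumed here.
    """
    total = 0.0
    for t, p in zip(true_rul, predicted_rul):
        d = p - t
        if d < 0:
            total += math.exp(-d / 13.0) - 1.0
        else:
            total += math.exp(d / 10.0) - 1.0
    return total


# The same 10-cycle error is penalized differently depending on its direction.
early = asymmetric_rul_score([100.0], [90.0])
late = asymmetric_rul_score([100.0], [110.0])
print(early < late)  # prints "True"
```

Because the penalties are summed rather than averaged, lower is better and the value depends on the number of test units, so scores are only comparable on the same test subset.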

24 pages, 6133 KB

Open AccessArticle

A Smart System for Continuous Sitting Posture Monitoring, Assessment, and Personalized Feedback

by David Faith Odesola, Janusz Kulon, Shiny Verghese, Adam Partlow and Colin Gibson

Sensors 2025, 25(18), 5610; https://doi.org/10.3390/s25185610 - 9 Sep 2025

Viewed by 926

Abstract

Prolonged sitting and unhealthy sitting postures have become common among adults and the working population in recent years. This has contributed to an alarming rise in health issues such as musculoskeletal disorders and a range of long-term health conditions. Hence, this study proposes a novel smart-sensing chair system designed to analyze sitting posture and provide actionable insights that encourage better postural habits and promote well-being. The proposed system was equipped with two 32 × 32 pressure sensor mats, which were integrated into an office chair to facilitate the collection of postural data. Unlike traditional approaches that rely on generalized datasets collected from multiple healthy participants to train machine learning models, this study adopts a user-tailored methodology, collecting data from a single individual to account for their unique physiological characteristics and musculoskeletal conditions. Five machine learning models, Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Convolutional Neural Networks (CNN), were trained on the dataset to classify 19 distinct sitting postures. Overall, CNN achieved the highest accuracy, with 98.29%. To facilitate user engagement and support long-term behavior change, we developed SitWell, an intelligent postural feedback platform comprising both mobile and web applications. The platform’s core features include sitting posture classification, posture duration analytics, and sitting quality assessment. Additionally, the platform integrates OpenAI’s GPT-4o Large Language Model (LLM) to deliver personalized insights and recommendations based on users’ historical posture data. Full article

(This article belongs to the Special Issue Advanced Non-Invasive Sensors: Methods and Applications—2nd Edition)
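The abstract above lists posture duration analytics among the platform's features but does not say how it is computed. One simple way to derive such figures from a stream of classified postures is sketched below. It assumes one label per fixed sampling interval; the label names, the sampling period, and the function name are all hypothetical, not taken from SitWell.

```python
from collections import Counter


def posture_durations(labels, sample_period_s=1.0):
    """Total seconds spent in each posture, assuming one label per fixed interval."""
    counts = Counter(labels)
    return {posture: n * sample_period_s for posture, n in counts.items()}


# Hypothetical classifier output, one label every 2 seconds.
labels = ["upright", "upright", "slouch", "upright", "lean_left", "slouch"]
durations = posture_durations(labels, sample_period_s=2.0)
print(durations)
```

A per-posture total like this could feed a sitting quality summary or be passed as context to a language model, though the abstract does not describe how the platform does either.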

Search Results (175)
