Adversarial attacks in Natural Language Processing (NLP) present a critical challenge, particularly in sentiment analysis, where subtle input modifications can significantly alter model predictions. In search of more robust defenses against adversarial attacks on sentiment analysis, this work introduces two novel defense mechanisms: the Lexicon-Based Random Substitute Model (LRSM) and the Word-Variant Voting Model (WVVM). LRSM employs randomized substitutions from a dataset-specific lexicon to generate diverse input variations, disrupting adversarial strategies by introducing unpredictability. Unlike traditional defenses that require synonym dictionaries or precomputed semantic relationships, LRSM directly substitutes words with random lexicon alternatives, reducing overhead while maintaining robustness. Notably, LRSM not only neutralizes adversarial perturbations but occasionally surpasses the original accuracy by correcting inherent model misclassifications. Building on LRSM, WVVM integrates LRSM, Frequency-Guided Word Substitution (FGWS), and Synonym Random Substitution and Voting (RS&V) in an ensemble framework that adaptively combines their outputs. Logistic Regression (LR) emerged as the optimal ensemble configuration, using its regularization parameters to balance the contributions of the individual defenses. WVVM consistently outperformed the standalone defenses, achieving higher restored accuracy and F1 scores across adversarial scenarios. The proposed defenses were evaluated on two well-known sentiment analysis benchmarks: the IMDB Sentiment Dataset, comprising 50,000 labeled movie reviews, and the Yelp Polarity Dataset, containing labeled business reviews. Together, they provide diverse linguistic challenges for assessing adversarial robustness.
Both datasets were tested using 4,000 adversarial examples generated by established attacks, including Probability Weighted Word Saliency, TextFooler, and BERT-based Adversarial Examples. WVVM and LRSM demonstrated superior performance in restoring accuracy and F1 scores across both datasets, with WVVM excelling through its ensemble learning framework. LRSM raised restored accuracy to 83.7%, compared with 75.66% for the second-best individual model, RS&V, while the Support Vector Classifier variant of WVVM raised it further to 93.17%. The Logistic Regression variant of WVVM achieved an F1 score of 86.26%, compared with 76.80% for RS&V. These findings establish LRSM and WVVM as robust frameworks for defending against adversarial text attacks in sentiment analysis.
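The abstract describes LRSM only at a high level. The following is a minimal Python sketch of randomized lexicon substitution combined with majority voting, assuming that each variant replaces tokens independently with probability `sub_rate` and that predictions on the variants are aggregated by majority vote. The names `lrsm_predict`, `num_variants`, `sub_rate`, and `keyword_classifier` are hypothetical stand-ins, not the authors' implementation, and the WVVM ensemble layer is not shown.

```python
import random
from collections import Counter
from typing import Callable, List, Optional, Sequence


def lrsm_predict(
    tokens: Sequence[str],
    lexicon: Sequence[str],
    classify: Callable[[List[str]], int],
    num_variants: int = 10,
    sub_rate: float = 0.2,
    seed: Optional[int] = None,
) -> int:
    """Hypothetical LRSM-style defense: majority vote over randomly substituted copies."""
    if not lexicon:
        raise ValueError("lexicon must not be empty")
    rng = random.Random(seed)
    votes: Counter = Counter()
    for _ in range(num_variants):
        # Each token is swapped for a random lexicon word with probability sub_rate;
        # no synonym dictionary or semantic lookup is involved.
        variant = [rng.choice(lexicon) if rng.random() < sub_rate else tok for tok in tokens]
        votes[classify(variant)] += 1
    # most_common(1) -> [(label, count)]; ties go to the label counted first.
    return votes.most_common(1)[0][0]


# Toy usage: a keyword "classifier" standing in for a real sentiment model.
def keyword_classifier(words: List[str]) -> int:
    return 1 if "great" in words else 0


lexicon = ["plot", "actor", "scene", "film", "story"]
print(lrsm_predict(["a", "great", "film"], lexicon, keyword_classifier, seed=0))
```

Because the substitutions are drawn at random at inference time, an attacker cannot know exactly which words the classifier will see, which matches the intuition the abstract gives for LRSM's robustness.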

(This article belongs to the Special Issue When Natural Language Processing Meets Machine Learning—Opportunities, Challenges and Solutions)

22 pages, 2498 KiB

Open Access Article

SceEmoNet: A Sentiment Analysis Model with Scene Construction Capability

by Yi Liang, Dongfang Han, Zhenzhen He, Bo Kong and Shuanglin Wen

Appl. Sci. 2025, 15(15), 8588; https://doi.org/10.3390/app15158588 (registering DOI) - 2 Aug 2025

Abstract

How do humans analyze the sentiments embedded in text? When reading a text, humans use their imagination to construct a “scene” in their minds, generating a vague image. They then synthesize the text and the mental image to derive the final analysis result. However, current sentiment analysis models lack such imagination; they can analyze only the information explicitly present in the text, which limits their classification accuracy. To address this issue, we propose the SceEmoNet model. This model endows text classification models with imagination through Stable Diffusion, enabling the model to generate corresponding visual scenes from the input text and thus introducing a new modality of visual information. We then use the Contrastive Language-Image Pre-training (CLIP) model, a multimodal feature extraction model, to extract aligned features from the different modalities, preventing significant feature differences caused by data heterogeneity. Finally, we fuse information from the different modalities using late fusion to obtain the final classification result. Experiments on six datasets with different classification tasks show improvements of 9.57%, 3.87%, 3.63%, 3.14%, 0.77%, and 0.28%, respectively. Additionally, we conducted further experiments to analyze the model’s advantages and limitations in depth, providing a new technical path for follow-up research.
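The abstract does not specify SceEmoNet's late-fusion rule. A common choice is a weighted average of the class probabilities produced separately for each modality; the sketch below assumes that rule, with a hypothetical `text_weight` parameter and made-up probability vectors standing in for the text-branch and generated-image-branch outputs.

```python
from typing import List, Sequence, Tuple


def late_fusion(
    text_probs: Sequence[float],
    image_probs: Sequence[float],
    text_weight: float = 0.5,
) -> Tuple[int, List[float]]:
    """Weighted average of per-modality class probabilities (an assumed fusion rule)."""
    if len(text_probs) != len(image_probs):
        raise ValueError("both modalities must score the same classes")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    fused = [text_weight * t + (1.0 - text_weight) * v for t, v in zip(text_probs, image_probs)]
    # Predicted class = index of the highest fused probability.
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused


# Text alone leans positive (index 1); the imagined scene leans negative (index 0).
label, fused = late_fusion([0.4, 0.6], [0.8, 0.2])
print(label, [round(p, 2) for p in fused])  # 0 [0.6, 0.4]
```

Late fusion keeps the two branches independent until the final decision, so the image branch can override the text branch only when its evidence is stronger.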

(This article belongs to the Special Issue Advanced Technologies and Applications of Emotion Recognition)

25 pages, 1138 KiB

Open Access Article

Quality over Quantity: An Effective Large-Scale Data Reduction Strategy Based on Pointwise V-Information

by Fei Chen and Wenchi Zhou

Electronics 2025, 14(15), 3092; https://doi.org/10.3390/electronics14153092 (registering DOI) - 1 Aug 2025

Abstract

Data reduction is essential to data-centric Artificial Intelligence (AI): it increases the effectiveness of model training by locating the most instructive examples in massive datasets. The main difficulty is selecting the best examples, rather than training on the complete datasets, so as to improve data quality and training efficiency. In this paper, we propose an effective data reduction strategy based on Pointwise 𝒱-Information (PVI). First, as a static method, we use PVI to quantify instance difficulty and remove instances with low difficulty. Experiments show that classifier performance is maintained, with only a 0.0001% to 0.76% decline in accuracy, when 10–30% of the data is removed. Second, we train the classifiers using a progressive learning strategy on examples sorted by increasing PVI, accelerating convergence and achieving a 0.8% accuracy gain over conventional training. Our findings suggest that training a classifier on the selected optimal subset, combined with an efficient data reduction strategy, may improve model performance and increase training efficiency. Furthermore, we have adapted the PVI framework, previously limited to English datasets, to a variety of Chinese Natural Language Processing (NLP) tasks and base models, yielding insightful results for faster training and cross-lingual data reduction.
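In the usable-information literature, PVI compares how much probability a model trained on real inputs assigns to the gold label with how much a model trained on empty inputs assigns to it: PVI(x → y) = −log₂ g∅(y) + log₂ gₓ(y). Low PVI marks hard instances and high PVI marks easy ones. The sketch below computes PVI from two hard-coded probabilities (in practice they would come from two fine-tuned models) and drops the easiest fraction of the data; `pvi` and `drop_easiest` are hypothetical helper names, not the authors' code.

```python
import math
from typing import List, Sequence


def pvi(p_null: float, p_input: float) -> float:
    """PVI(x -> y) = -log2 g_null(y) + log2 g_x(y).

    p_null:  probability of the gold label y from a model trained with empty inputs.
    p_input: probability of the gold label y from a model trained with the real input x.
    """
    if not (0.0 < p_null <= 1.0 and 0.0 < p_input <= 1.0):
        raise ValueError("probabilities must lie in (0, 1]")
    return -math.log2(p_null) + math.log2(p_input)


def drop_easiest(scores: Sequence[float], fraction: float) -> List[int]:
    """Remove the `fraction` of instances with the highest PVI (the easiest ones).

    Returns the kept indices sorted by increasing PVI, i.e. hardest first.
    """
    n_drop = int(len(scores) * fraction)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return order[: len(scores) - n_drop]


scores = [pvi(0.5, 1.0), pvi(0.5, 0.25), pvi(0.5, 0.5), pvi(0.25, 1.0)]
print(scores)                      # [1.0, -1.0, 0.0, 2.0]
print(drop_easiest(scores, 0.25))  # [1, 2, 0]
```

Returning the kept indices in increasing-PVI order means the same list can feed the progressive training schedule described in the abstract.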

(This article belongs to the Special Issue Data Retrieval and Data Mining)

25 pages, 2860 KiB

Open Access Review

Multimodal Sensing-Enabled Large Language Models for Automated Emotional Regulation: A Review of Current Technologies, Opportunities, and Challenges

by Liangyue Yu, Yao Ge, Shuja Ansari, Muhammad Imran and Wasim Ahmad

Sensors 2025, 25(15), 4763; https://doi.org/10.3390/s25154763 (registering DOI) - 1 Aug 2025

Abstract

Emotion regulation is essential for mental health. However, many people neglect their own emotional regulation or are deterred by the high cost of psychological counseling, which makes it difficult to make effective support widely available. This review systematically examines the convergence of multimodal sensing technologies and large language models (LLMs) for the development of Automated Emotional Regulation (AER) systems. The review draws on a comprehensive analysis of the existing literature, encompassing research papers, technical reports, and relevant theoretical frameworks. Key findings indicate that multimodal sensing can provide rich, contextualized data on emotional states, while LLMs offer improved capabilities for interpreting these inputs and generating nuanced, empathetic, and actionable regulatory responses. The integration of these technologies, including physiological sensors, behavioral tracking, and advanced LLM architectures, promises improved applications, moving AER beyond simpler rule-based systems toward more adaptive, context-aware, and human-like interventions. Opportunities for personalized interventions, real-time support, and novel applications in mental healthcare and other domains are considerable. However, these prospects are counterbalanced by significant challenges and limitations. In summary, this review synthesizes current technological advancements, identifies substantial opportunities for innovation and application, and critically analyzes the multifaceted technical, ethical, and practical challenges inherent in this domain. It concludes that while the integration of multimodal sensing and LLMs holds significant potential for AER, the field is nascent and requires concerted research efforts to realize its full capacity to enhance human well-being.

(This article belongs to the Section Intelligent Sensors)

16 pages, 1651 KiB

Open Access Article

Modular Pipeline for Text Recognition in Early Printed Books Using Kraken and ByT5

by Yahya Momtaz, Lorenza Laccetti and Guido Russo

Electronics 2025, 14(15), 3083; https://doi.org/10.3390/electronics14153083 (registering DOI) - 1 Aug 2025

Abstract

Early printed books, particularly incunabula, are invaluable archives of the beginnings of modern educational systems. However, their complex layouts, antique typefaces, and page degradation caused by bleed-through and ink fading pose significant challenges for automatic transcription. In this work, we present a modular pipeline that addresses these problems by combining modern layout analysis and language modeling techniques. The pipeline begins with historical layout-aware text segmentation using Kraken, a neural network-based tool tailored for early typographic structures. Initial optical character recognition (OCR) is then performed with Kraken’s recognition engine, followed by post-correction using a fine-tuned ByT5 transformer model trained on manually aligned line-level data. By learning to map noisy OCR outputs to verified transcriptions, the model substantially improves recognition quality. The pipeline also integrates a preprocessing stage based on our previous work on bleed-through removal using robust statistical filters, including non-local means, Gaussian mixtures, biweight estimation, and Gaussian blur. This step enhances the legibility of degraded pages prior to OCR. The entire solution is open, modular, and scalable, supporting long-term preservation and improved accessibility of cultural heritage materials. Experimental results on 15th-century incunabula show a reduction in the Character Error Rate (CER) from around 38% to around 15% and an increase in the Bilingual Evaluation Understudy (BLEU) score from 22 to 44, confirming the effectiveness of our approach. This work demonstrates the potential of integrating transformer-based correction with layout-aware segmentation to enhance OCR accuracy in digital humanities applications.
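CER is conventionally the character-level edit distance between the OCR output and the reference transcription, divided by the number of reference characters. The sketch below implements that standard definition; the abstract does not say how the authors normalize whitespace, abbreviations, or historical glyphs, so those details are left out, and `levenshtein` and `cer` are illustrative helper names.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Character Error Rate = edit distance / number of reference characters."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(hypothesis, reference) / len(reference)


print(levenshtein("kitten", "sitting"))    # 3
print(round(cer("kitten", "sitting"), 3))  # 0.429
```

Under this definition, a drop from about 38% to about 15% CER means roughly 23 fewer character edits per 100 reference characters after post-correction.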

(This article belongs to the Special Issue Electronics and Computer Science for Cultural Heritage: Advancements, Preservation, and Applications, 2nd Edition)

15 pages, 1515 KiB

Open Access Article

Ontology-Based Data Pipeline for Semantic Reaction Classification and Research Data Management

by Hendrik Borgelt, Frederick Gabriel Kitel and Norbert Kockmann

Computers 2025, 14(8), 311; https://doi.org/10.3390/computers14080311 (registering DOI) - 1 Aug 2025

Abstract

Catalysis research is complex and interdisciplinary, involving diverse physical effects and challenging data practices. Research data often capture only selected aspects, such as specific reactants and products, limiting their utility for machine learning and for implementing FAIR (Findable, Accessible, Interoperable, Reusable) workflows. To improve this, semantic structuring through ontologies is essential. This work extends established ontologies by refining logical relations and integrating semantic tools such as the Web Ontology Language (OWL) and the Shapes Constraint Language (SHACL). It incorporates application programming interfaces from chemical databases, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the National Institutes of Health’s PubChem database. A key innovation lies in automatically decomposing chemical substances, via database entries and chemical identifier representations, to identify functional groups, enabling more generalized reaction classification. Using new semantic functionality, functional groups can be addressed flexibly, improving the classification of reactions such as saponification and ester cleavage with simultaneous oxidation. A graphical user interface (GUI) supports user interaction with the knowledge graph, enabling ontological reasoning and querying. This approach demonstrates improved specificity of the newly established ontology over its predecessors and offers a more user-friendly interface for engaging with structured chemical knowledge. Future work will focus on expanding ontology coverage to support a wider range of reactions in catalysis research.

20 pages, 562 KiB

Open Access Article

Effectiveness of a Post-Acute-Care Rehabilitation Program in Patients with Stroke: A Retrospective Cohort Study

by Yi-Pang Lo, Mei-Chen Wang, Yao-Hsiang Chen, Shang-Lin Chiang and Chia-Huei Lin

Life 2025, 15(8), 1216; https://doi.org/10.3390/life15081216 - 1 Aug 2025

Abstract

Early rehabilitation is essential for functional recovery in patients with stroke, particularly during the early phase of post-acute care (PAC), or the subacute stage. We aimed to evaluate the effectiveness of a 7-week PAC rehabilitation program in improving muscle strength, physical performance, and functional recovery. A total of 219 inpatients with stroke in the subacute stage were initially recruited from the PAC ward of a regional teaching hospital in Northern Taiwan, and 79 eligible patients (within 1 month of an acute stroke) were included in the analysis. The program was delivered 5 days per week, with 3–4 sessions daily (20–30 min each, up to 120 min daily), comprising physical, occupational, and speech–language therapies. Sociodemographic data, muscle strength, physical performance (Berg Balance Scale [BBS], gait speed, and 6-minute walk test [6MWT]), and functional recovery (modified Rankin Scale [mRS], Barthel Index [BI], Instrumental Activities of Daily Living [IADL], and Fugl–Meyer assessment: sensory and upper extremity) were collected at baseline, 3 weeks, and 7 weeks. Program effectiveness was analyzed using generalized estimating equations. Among the 56 patients (70.9%) who completed the program, significant improvements were observed in the muscle strength of both the affected upper (B = 0.93, p < 0.001) and lower limbs (B = 0.88, p < 0.001), as well as in their corresponding unaffected limbs; in physical performance, including balance (BBS score: B = 9.70, p = 0.003) and gait speed (B = 0.23, p = 0.024); and in functional recovery, including BI (B = 19.5, p < 0.001), IADL (B = 1.48, p < 0.001), and mRS (B = −0.13, p = 0.028). These findings highlight the 7-week PAC rehabilitation program as an effective strategy during the critical recovery phase for patients with stroke.

(This article belongs to the Special Issue Advances in the Rehabilitation of Stroke)

12 pages, 1346 KiB

Open Access Article

A Language Vision Model Approach for Automated Tumor Contouring in Radiation Oncology

by Yi Luo, Hamed Hooshangnejad, Xue Feng, Gaofeng Huang, Xiaojian Chen, Rui Zhang, Quan Chen, Wil Ngwa and Kai Ding

Bioengineering 2025, 12(8), 835; https://doi.org/10.3390/bioengineering12080835 (registering DOI) - 31 Jul 2025

Abstract

Background: Lung cancer ranks as the leading cause of cancer-related mortality worldwide. The complexity of tumor delineation, crucial for radiation therapy, requires expertise often unavailable in resource-limited settings. Artificial Intelligence (AI), particularly with advancements in deep learning (DL) and natural language processing (NLP), offers potential solutions but is challenged by high false positive rates. Purpose: The Oncology Contouring Copilot (OCC) system is developed to leverage oncologist expertise for precise tumor contouring using textual descriptions, aiming to increase the efficiency of oncological workflows by combining the strengths of AI with human oversight. Methods: Our OCC system first identifies nodule candidates from CT scans. Employing Language Vision Models (LVMs) such as GPT-4V, OCC then reduces false positives using clinical descriptive texts, merging textual and visual data to automate tumor delineation; the system is designed to elevate the quality of oncology care by incorporating knowledge from experienced domain experts. Results: The deployment of the OCC system resulted in a 35.0% reduction in the false discovery rate, a 72.4% decrease in false positives per scan, and an F1-score of 0.652 on our dataset for unbiased evaluation. Conclusions: OCC represents a significant advance in oncology care, particularly through the use of the latest LVMs, improving contouring results by (1) streamlining oncology treatment workflows by optimizing tumor delineation and reducing manual processes; (2) offering a scalable and intuitive framework to reduce false positives in radiotherapy planning using LVMs; (3) introducing novel medical language vision prompt techniques to minimize LVM hallucinations, validated with an ablation study; and (4) conducting a comparative analysis of LVMs, highlighting their potential in addressing medical language vision challenges.
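The three reported results follow from standard detection counts: the false discovery rate is FP / (TP + FP), false positives per scan is FP divided by the number of scans, and F1 is the harmonic mean of precision and recall. The sketch below computes them from hypothetical counts (the abstract does not report raw counts), and the `DetectionCounts` type is an illustrative assumption.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DetectionCounts:
    tp: int         # true-positive nodule detections
    fp: int         # false-positive detections
    fn: int         # missed tumors
    num_scans: int  # number of CT scans evaluated


def detection_metrics(c: DetectionCounts) -> Dict[str, float]:
    predicted = c.tp + c.fp
    actual = c.tp + c.fn
    precision = c.tp / predicted if predicted else 0.0
    recall = c.tp / actual if actual else 0.0
    return {
        "fdr": c.fp / predicted if predicted else 0.0,  # false discovery rate = 1 - precision
        "fp_per_scan": c.fp / c.num_scans if c.num_scans else 0.0,
        "f1": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
    }


# Made-up counts, for illustration only.
print(detection_metrics(DetectionCounts(tp=6, fp=4, fn=2, num_scans=5)))
```

Filtering false-positive candidates, as the LVM stage does, lowers both FDR and false positives per scan; F1 rises only if true detections are not discarded along with them.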

(This article belongs to the Special Issue Novel Imaging Techniques in Radiotherapy)

23 pages, 7266 KiB

Open Access Article

Intelligent ESG Evaluation for Construction Enterprises in China: An LLM-Based Model

by Binqing Cai, Zhukai Ye and Shiwei Chen

Buildings 2025, 15(15), 2710; https://doi.org/10.3390/buildings15152710 (registering DOI) - 31 Jul 2025

Abstract

Environmental, social, and governance (ESG) evaluation has become increasingly critical for company sustainability assessments, especially for enterprises in the construction industry, which carries a high environmental burden. However, existing methods face limitations in subjective evaluation, inconsistent ratings across agencies, and a lack of industry-specificity. To address these limitations, this study proposes a large language model (LLM)-based intelligent ESG evaluation model specifically designed for construction enterprises in China. The model integrates three modules: (1) an ESG report information extraction module utilizing natural language processing and Chinese pre-trained language models to identify and classify ESG-relevant statements; (2) an ESG rating prediction module employing XGBoost regression with SHAP analysis to predict company ratings and quantify individual statement contributions; and (3) an ESG intelligent evaluation module combining knowledge graph construction with fine-tuned Qwen2.5 language models using Chain-of-Thought (CoT). Empirical validation demonstrates that the model achieves 93.33% accuracy in ESG rating classification and an R² score of 0.5312. SHAP analysis reveals that environmental factors contribute most significantly to rating predictions (38.7%), followed by governance (32.0%) and social dimensions (29.3%). The fine-tuned LLM integrated with a knowledge graph shows improved evaluation consistency, achieving 65% accuracy compared to 53.33% for standalone LLM approaches, a relative improvement of 21.88%. This study contributes to ESG evaluation methodology by providing an objective, industry-specific, and interpretable framework that enhances rating consistency and provides actionable insights for enterprise sustainability improvement. This research provides guidance for automated and intelligent ESG evaluations for construction enterprises while addressing critical gaps in current ESG practices.
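The 21.88% figure is a relative improvement, not a difference in percentage points. The short check below reproduces it from the two accuracies reported in the abstract; `relative_improvement` is an illustrative helper name.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative change of `new` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


# Accuracies reported in the abstract: 65% (LLM + knowledge graph) vs. 53.33% (standalone LLM).
print(round(65.0 - 53.33, 2))                       # 11.67 (percentage points)
print(round(relative_improvement(65.0, 53.33), 2))  # 21.88 (percent)
```

In other words, the knowledge graph adds about 11.67 percentage points of accuracy, which is about 21.88% more than the standalone baseline.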

(This article belongs to the Topic Improving Nature-Smart Policies through Innovative Resilient Evaluations)

15 pages, 415 KiB

Open Access Article

Enhancing MusicGen with Prompt Tuning

by Hohyeon Shin, Jeonghyeon Im and Yunsick Sung

Appl. Sci. 2025, 15(15), 8504; https://doi.org/10.3390/app15158504 (registering DOI) - 31 Jul 2025

Abstract

Generative AI has been gaining attention across various creative domains. In particular, MusicGen stands out as a representative approach capable of generating music based on text or audio inputs. However, it has limitations in producing high-quality outputs for specific genres and fully reflecting user intentions. This paper proposes a prompt tuning technique that effectively adjusts the output quality of MusicGen without modifying its original parameters and optimizes its ability to generate music tailored to specific genres and styles. Experiments were conducted to compare the performance of the traditional MusicGen with the proposed method and evaluate the quality of generated music using the Contrastive Language-Audio Pretraining (CLAP) and Kullback–Leibler Divergence (KLD) scoring approaches. The results demonstrated that the proposed method significantly improved the output quality and musical coherence, particularly for specific genres and styles. Compared with the traditional model, the CLAP score was increased by 0.1270, and the KLD score was increased by 0.00403 on average. The effectiveness of prompt tuning in optimizing the performance of MusicGen validated the proposed method and highlighted its potential for advancing generative AI-based music generation tools.
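KLD scores compare two probability distributions over the same categories. The abstract does not say which distributions are compared, the direction of the divergence, or the log base, so the sketch below simply implements the textbook D_KL(P ‖ Q) = Σᵢ pᵢ ln(pᵢ / qᵢ) in nats, with a hypothetical `eps` floor to avoid division by zero.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """D_KL(P || Q) = sum_i p_i * ln(p_i / q_i), in nats.

    Terms with p_i = 0 contribute nothing; q_i is floored at eps to avoid division by zero.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same support")
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(kl_divergence([1.0, 0.0], [0.5, 0.5]))  # 0.6931471805599453 (= ln 2)
```

The divergence is zero only when the two distributions match, and it is not symmetric, so the argument order must be fixed consistently when comparing models.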

(This article belongs to the Special Issue Recent Advances in AI Convergence: Innovations at the Crossroads of Disciplines)

23 pages, 4379 KiB

Open Access Article

Large Vision Language Model: Enhanced-RSCLIP with Exemplar-Image Prompting for Uncommon Object Detection in Satellite Imagery

by Taiwo Efunogbon, Abimbola Efunogbon, Enjie Liu, Dayou Li and Renxi Qiu

Electronics 2025, 14(15), 3071; https://doi.org/10.3390/electronics14153071 (registering DOI) - 31 Jul 2025

Abstract

Large Vision Language Models (LVLMs) have shown promise in remote sensing applications, yet struggle with “uncommon” objects that lack sufficient public labeled data. This paper presents Enhanced-RSCLIP, a novel dual-prompt architecture that combines text prompting with exemplar-image processing for cattle herd detection in satellite imagery. Our approach introduces a key innovation where an exemplar-image preprocessing module using crop-based or attention-based algorithms extracts focused object features which are fed as a dual stream to a contrastive learning framework that fuses textual descriptions with visual exemplar embeddings. We evaluated our method on a custom dataset of 260 satellite images across UK and Nigerian regions. Enhanced-RSCLIP with crop-based exemplar processing achieved 72% accuracy in cattle detection and 56.2% overall accuracy on cross-domain transfer tasks, significantly outperforming text-only CLIP (31% overall accuracy). The dual-prompt architecture enables effective few-shot learning and cross-regional transfer from data-rich (UK) to data-sparse (Nigeria) environments, demonstrating a 41% improvement over baseline approaches for uncommon object detection in satellite imagery.
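One simple way to realize a dual prompt in a CLIP-style model is to blend the normalized text-prompt embedding with the normalized exemplar-image embedding into a single class prototype, then classify an image by cosine similarity. The abstract does not give the actual fusion rule, so the weighted blend, the `alpha` parameter, and the toy 2-D vectors below are all assumptions standing in for real encoder outputs.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def fuse_prompts(text_emb: Sequence[float], exemplar_emb: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Blend a text-prompt embedding with an exemplar-image embedding (hypothetical rule)."""
    t, e = l2_normalize(text_emb), l2_normalize(exemplar_emb)
    return l2_normalize([alpha * a + (1.0 - alpha) * b for a, b in zip(t, e)])


def classify(image_emb: Sequence[float], prototypes: Dict[str, List[float]]) -> str:
    """Pick the class whose fused prototype has the highest cosine similarity."""
    img = l2_normalize(image_emb)
    scores = {label: sum(a * b for a, b in zip(img, proto)) for label, proto in prototypes.items()}
    return max(scores, key=scores.__getitem__)


# Toy 2-D embeddings; a real system would use CLIP-style image and text encoders.
prototypes = {
    "cattle": fuse_prompts([1.0, 0.0], [0.9, 0.1]),
    "background": fuse_prompts([0.0, 1.0], [0.1, 0.9]),
}
print(classify([0.8, 0.3], prototypes))  # cattle
```

Setting `alpha=1.0` recovers a text-only prompt, which corresponds to the text-only CLIP baseline the abstract compares against.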

(This article belongs to the Topic Next-Generation IoT and Smart Systems for Communication and Sensing)

21 pages, 1750 KiB

Open Access Article

Predictive Analytics Leveraging a Machine Learning Approach to Identify Students’ Reasons for Dropping out of University

by Asmaa El Mahmoudi, Nour El Houda Chaoui and Habiba Chaoui

Appl. Sci. 2025, 15(15), 8496; https://doi.org/10.3390/app15158496 (registering DOI) - 31 Jul 2025

Abstract

In today’s fast-changing world, the higher education system must evolve to enhance the quality of learning and teaching. Fulfilling the role of a university is a major challenge. Universities must implement strategies that place students at the center of their concerns, and these strategies must be designed for and by students. However, the high dropout rate is one of the current problems faced by many universities, suggesting that some issues hinder the learning process. Several studies have highlighted the advantages of artificial intelligence (AI) technologies in providing exploratory and predictive analyses that explain why students drop out, with the aim of improving the quality of teaching and providing an integrated learning environment. This paper proposes a framework that predicts student dropout rates using machine learning techniques, based on data collected from various sources between 2022 and 2024. We used a quantitative approach based on a questionnaire distributed to 120 students (aged 18–26) from open-access faculties of a Moroccan public university to identify the factors that increase university dropout rates. We discuss the impact of the selected variables; the findings show that several factors are related to dropout, such as social background, psychological and health problems, insufficient motivation of professors, a limited perspective on educational programs, changes in language and teaching methodologies, absenteeism, student attitude, and a lack of interaction between professors and students.

(This article belongs to the Special Issue ICT in Education, 2nd Edition)

13 pages, 564 KiB

Open Access Article

Enhanced Semantic Retrieval with Structured Prompt and Dimensionality Reduction for Big Data

by Donghyeon Kim, Minki Park, Jungsun Lee, Inho Lee, Jeonghyeon Jin and Yunsick Sung

Mathematics 2025, 13(15), 2469; https://doi.org/10.3390/math13152469 - 31 Jul 2025

Abstract

The exponential increase in textual data generated across sectors such as healthcare, finance, and smart manufacturing has intensified the need for effective Big Data analytics. Large language models (LLMs) have become critical tools because of their advanced language processing capabilities. However, their static nature limits their ability to incorporate real-time and domain-specific knowledge. Retrieval-augmented generation (RAG) addresses these limitations by enriching LLM outputs through external content retrieval. Nevertheless, traditional RAG systems remain inefficient, often exhibiting high retrieval latency, redundancy, and diminished response quality when scaled to large datasets. This paper proposes a structured RAG framework specifically designed for large-scale Big Data analytics. The framework transforms unstructured partial prompts into structured, semantically coherent partial prompts, using element-specific embedding models and dimensionality reduction techniques such as principal component analysis. To further improve retrieval accuracy and computational efficiency, we introduce a multi-level filtering approach that integrates semantic constraints and redundancy elimination. In the experiments, the proposed method was compared with structured-format RAG. After prompts were generated with both methods, silhouette scores were computed to assess the quality of the embedding clusters. The proposed method outperformed the baseline, improving clustering quality by 32.3%. These results demonstrate the effectiveness of the framework in enhancing LLMs for accurate, diverse, and efficient decision-making in complex Big Data environments.
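The silhouette score used for the comparison averages, over all points, s = (b − a) / max(a, b), where a is a point's mean distance to its own cluster and b is its lowest mean distance to any other cluster; values near 1 indicate tight, well-separated clusters. The sketch below implements that standard definition with Euclidean distance (an assumption; the abstract does not state the metric), using toy 2-D points in place of the reduced prompt embeddings.

```python
import math
from typing import Dict, List, Sequence


def silhouette_score(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient with Euclidean distance.

    For each point: a = mean distance to its own cluster, b = lowest mean distance
    to another cluster, s = (b - a) / max(a, b); singleton clusters score 0.
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have equal length")
    if len(set(labels)) < 2:
        raise ValueError("at least two clusters are required")
    total = 0.0
    for i, p in enumerate(points):
        dists: Dict[int, List[float]] = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.get(labels[i])
        if not own:
            continue  # singleton cluster: s = 0 by convention
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for lab, d in dists.items() if lab != labels[i])
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


# Two well-separated toy clusters in 2-D (stand-ins for reduced prompt embeddings).
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(round(silhouette_score(pts, [0, 0, 1, 1]), 3))  # 0.9
```

A mismatched labeling of the same points, such as `[0, 1, 1, 0]`, yields a negative score, which is why the metric can rank two prompt-generation methods by how cleanly their embeddings cluster.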

(This article belongs to the Special Issue Big Data Analysis, Computing and Applications)

16 pages, 628 KiB

Open Access Article

Beyond the Bot: A Dual-Phase Framework for Evaluating AI Chatbot Simulations in Nursing Education

by Phillip Olla, Nadine Wodwaski and Taylor Long

Nurs. Rep. 2025, 15(8), 280; https://doi.org/10.3390/nursrep15080280 (registering DOI) - 31 Jul 2025

Abstract

Background/Objectives: The integration of AI chatbots in nursing education, particularly in simulation-based learning, is advancing rapidly. However, there is a lack of structured evaluation models, especially for assessing AI-generated simulations. This article introduces the AI-Integrated Method for Simulation (AIMS) evaluation framework, a dual-phase framework adapted from the FAITA model and designed to evaluate both prompt design and chatbot performance in the context of nursing education. Methods: This simulation-based study explored the application of an AI chatbot in an emergency planning course. The AIMS framework was developed and applied, consisting of six prompt-level domains (Phase 1) and eight performance criteria (Phase 2). These domains were selected based on current best practices in instructional design, simulation fidelity, and the emerging AI evaluation literature. To assess the chatbot’s educational utility, the study employed a scoring rubric for each phase and incorporated a structured feedback loop to refine both prompt design and chatbot interaction. To demonstrate the framework’s practical application, the researchers configured an AI tool, referred to in this study as “Eval-Bot v1” and built using OpenAI’s GPT-4.0, to apply the Phase 1 scoring criteria to a real simulation prompt. Insights from this analysis were then used to anticipate Phase 2 performance and identify areas for improvement. Three participants, all experienced healthcare educators and advanced practice nurses with expertise in clinical decision-making and simulation-based teaching, reviewed the prompt and Eval-Bot’s score to triangulate findings. Results: Simulated evaluations revealed clear strengths in the prompt’s alignment with course objectives and its capacity to foster interactive learning. Participants noted that the AI chatbot supported engagement and maintained appropriate pacing, particularly in scenarios involving emergency planning decision-making.
However, challenges emerged in areas related to personalization and inclusivity. While the chatbot responded consistently to general queries, it struggled to adapt tone, complexity, and content to reflect diverse learner needs or cultural nuances. To support replication and refinement, a sample scoring rubric and simulation prompt template are provided. When evaluated using the Eval-Bot tool, moderate concerns were flagged regarding safety prompts and inclusive language, particularly in how the chatbot navigated sensitive decision points. These gaps were linked to predicted performance issues in Phase 2 domains such as dialog control, equity, and user reassurance. Based on these findings, revised prompt strategies were developed to improve contextual sensitivity, promote inclusivity, and strengthen ethical guidance within chatbot-led simulations. Conclusions: The AIMS evaluation framework provides a practical and replicable approach for evaluating the use of AI chatbots in simulation-based education. By offering structured criteria for both prompt design and chatbot performance, the model supports instructional designers, simulation specialists, and developers in identifying areas of strength and improvement. The findings underscore the importance of intentional design, safety monitoring, and inclusive language when integrating AI into nursing and health education. As AI tools become more embedded in learning environments, this framework offers a thoughtful starting point for ensuring they are applied ethically, effectively, and with learner diversity in mind.

16 pages, 2647 KiB

Open Access Article

“Habari, Colleague!”: A Qualitative Exploration of the Perceptions of Primary School Mathematics Teachers in Tanzania Regarding the Use of Social Robots

by Edger P. Rutatola, Koen Stroeken and Tony Belpaeme

Appl. Sci. 2025, 15(15), 8483; https://doi.org/10.3390/app15158483 (registering DOI) - 30 Jul 2025

Abstract

The education sector in Tanzania faces significant challenges, especially in public primary schools. Unmanageably large classes and critical teacher–pupil ratios hinder the provision of tailored tutoring, impeding pupils’ educational growth. However, artificial intelligence (AI) could provide a way forward. Advances in generative AI can be leveraged to create interactive and effective intelligent tutoring systems, which have recently been built into embodied systems such as social robots. Motivated by the pivotal influence of teachers’ attitudes on the adoption of educational technologies, this study undertakes a qualitative investigation of Tanzanian primary school mathematics teachers’ perceptions of contextualised intelligent social robots. Thirteen teachers from six schools in both rural and urban settings observed pupils learning with a social robot. They reported their views during qualitative interviews. The results, analysed thematically, reveal a generally positive attitude towards using social robots in schools. While commended for their effective teaching and suitability for one-to-one tutoring, concerns were raised about incorrect and inconsistent feedback, language code-switching, response latency, and the lack of support infrastructure. We suggest actionable steps towards adopting tutoring systems and social robots in schools in Tanzania and similar low-resource countries, paving the way for their adoption to redress teachers’ workloads and improve educational outcomes.

(This article belongs to the Special Issue Advances in Human–Machine Interaction)
