Search Results (12,397)

Search Parameters:
Keywords = texts

28 pages, 1874 KiB  
Article
Lexicon-Based Random Substitute and Word-Variant Voting Models for Detecting Textual Adversarial Attacks
by Tarik El Lel, Mominul Ahsan and Majid Latifi
Computers 2025, 14(8), 315; https://doi.org/10.3390/computers14080315 (registering DOI) - 2 Aug 2025
Abstract
Adversarial attacks in Natural Language Processing (NLP) present a critical challenge, particularly in sentiment analysis, where subtle input modifications can significantly alter model predictions. In search of more robust defenses against adversarial attacks on sentiment analysis, this research work introduces two novel defense mechanisms: the Lexicon-Based Random Substitute Model (LRSM) and the Word-Variant Voting Model (WVVM). LRSM employs randomized substitutions from a dataset-specific lexicon to generate diverse input variations, disrupting adversarial strategies by introducing unpredictability. Unlike traditional defenses requiring synonym dictionaries or precomputed semantic relationships, LRSM directly substitutes words with random lexicon alternatives, reducing overhead while maintaining robustness. Notably, LRSM not only neutralizes adversarial perturbations but occasionally surpasses the original accuracy by correcting inherent model misclassifications. Building on LRSM, WVVM integrates LRSM, Frequency-Guided Word Substitution (FGWS), and Synonym Random Substitution and Voting (RS&V) in an ensemble framework that adaptively combines their outputs. Logistic Regression (LR) emerged as the optimal ensemble configuration, leveraging its regularization parameters to balance the contributions of individual defenses. WVVM consistently outperformed standalone defenses, demonstrating superior restored accuracy and F1 scores across adversarial scenarios. The proposed defenses were evaluated on two well-known sentiment analysis benchmarks: the IMDB Sentiment Dataset and the Yelp Polarity Dataset. The IMDB dataset, comprising 50,000 labeled movie reviews, and the Yelp Polarity dataset, containing labeled business reviews, provided diverse linguistic challenges for assessing adversarial robustness. Both datasets were tested using 4000 adversarial examples generated by established attacks, including Probability Weighted Word Saliency, TextFooler, and BERT-based Adversarial Examples. WVVM and LRSM demonstrated superior performance in restoring accuracy and F1 scores across both datasets, with WVVM excelling through its ensemble learning framework. LRSM improved restored accuracy to 83.7%, compared with 75.66% for the second-best individual model, RS&V, while the Support Vector Classifier WVVM variation further improved restored accuracy to 93.17%. Logistic Regression WVVM achieved an F1 score of 86.26% compared to 76.80% for RS&V. These findings establish LRSM and WVVM as robust frameworks for defending against adversarial text attacks in sentiment analysis. Full article
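
The core LRSM mechanism described in this abstract (random lexicon substitutions followed by voting over perturbed copies) can be illustrated with a short sketch. This is a minimal, hypothetical rendering: the `classify` callable, substitution rate, and number of variants are assumptions, not the authors' implementation.

```python
# Minimal sketch of a lexicon-based random-substitute-and-vote defense (assumed parameters).
import random
from collections import Counter
from typing import Callable, List, Sequence

def lrsm_predict(
    text: str,
    classify: Callable[[str], int],      # black-box sentiment classifier (hypothetical interface)
    lexicon: Sequence[str],               # dataset-specific vocabulary
    num_variants: int = 11,
    substitution_rate: float = 0.15,
) -> int:
    """Classify `text` by majority vote over randomly perturbed copies."""
    tokens = text.split()
    votes: List[int] = []
    for _ in range(num_variants):
        variant = tokens.copy()
        for i in range(len(variant)):
            if random.random() < substitution_rate:
                variant[i] = random.choice(lexicon)   # random lexicon substitute
        votes.append(classify(" ".join(variant)))
    # Majority vote dilutes perturbations concentrated on a few adversarial words.
    return Counter(votes).most_common(1)[0][0]
```
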
22 pages, 2498 KiB  
Article
SceEmoNet: A Sentiment Analysis Model with Scene Construction Capability
by Yi Liang, Dongfang Han, Zhenzhen He, Bo Kong and Shuanglin Wen
Appl. Sci. 2025, 15(15), 8588; https://doi.org/10.3390/app15158588 (registering DOI) - 2 Aug 2025
Abstract
How do humans analyze the sentiments embedded in text? When attempting to analyze a text, humans construct a “scene” in their minds through imagination based on the text, generating a vague image. They then synthesize the text and the mental image to derive the final analysis result. However, current sentiment analysis models lack such imagination; they can only analyze based on existing information in the text, which limits their classification accuracy. To address this issue, we propose the SceEmoNet model. This model endows text classification models with imagination through Stable Diffusion, enabling the model to generate corresponding visual scenes from input text, thus introducing a new modality of visual information. We then use the Contrastive Language-Image Pre-training (CLIP) model, a multimodal feature extraction model, to extract aligned features from different modalities, preventing significant feature differences caused by data heterogeneity. Finally, we fuse information from different modalities using late fusion to obtain the final classification result. Experiments on six datasets with different classification tasks show improvements of 9.57%, 3.87%, 3.63%, 3.14%, 0.77%, and 0.28%, respectively. Additionally, we set up experiments to deeply analyze the model’s advantages and limitations, providing a new technical path for follow-up research. Full article
(This article belongs to the Special Issue Advanced Technologies and Applications of Emotion Recognition)
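
A hedged sketch of the pipeline the abstract outlines: synthesize a "scene" for the input text with Stable Diffusion, embed both modalities with CLIP, and late-fuse two lightweight classification heads. The checkpoint names, head sizes, and equal-weight fusion are illustrative assumptions, not SceEmoNet's actual configuration.

```python
# Text -> imagined scene -> aligned CLIP features -> late fusion (illustrative only).
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
sd = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

num_classes = 2
# Per-modality heads (untrained here; trained jointly in the real model).
text_head = torch.nn.Linear(clip.config.projection_dim, num_classes).to(device)
image_head = torch.nn.Linear(clip.config.projection_dim, num_classes).to(device)

def late_fusion_probs(text: str) -> torch.Tensor:
    scene = sd(text, num_inference_steps=20).images[0]          # imagined "scene"
    batch = proc(text=[text], images=scene, return_tensors="pt",
                 padding=True, truncation=True).to(device)
    out = clip(**batch)                                          # aligned CLIP features
    # Late fusion: average the per-modality class probabilities.
    p_text = text_head(out.text_embeds).softmax(-1)
    p_image = image_head(out.image_embeds).softmax(-1)
    return (p_text + p_image) / 2

probs = late_fusion_probs("The ending of this film left me heartbroken.")
```
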
20 pages, 1253 KiB  
Article
Multimodal Detection of Emotional and Cognitive States in E-Learning Through Deep Fusion of Visual and Textual Data with NLP
by Qamar El Maazouzi and Asmaa Retbi
Computers 2025, 14(8), 314; https://doi.org/10.3390/computers14080314 (registering DOI) - 2 Aug 2025
Abstract
In distance learning environments, learner engagement directly impacts attention, motivation, and academic performance. Signs of fatigue, negative affect, or critical remarks can warn of growing disengagement and potential dropout. However, most existing approaches rely on a single modality, visual or text-based, without providing a general view of learners’ cognitive and affective states. We propose a multimodal system that integrates three complementary analyses: (1) a CNN-LSTM model augmented with warning signs such as PERCLOS and yawning frequency for fatigue detection, (2) facial emotion recognition by EmoNet and an LSTM to handle temporal dynamics, and (3) sentiment analysis of feedback by a fine-tuned BERT model. The system was evaluated on three public benchmarks: DAiSEE for fatigue, AffectNet for emotion, and MOOC Review (Coursera) for sentiment analysis. The results show a precision of 88.5% for fatigue detection, 70% for emotion detection, and 91.5% for sentiment analysis. Aggregating these cues enables an accurate identification of disengagement periods and triggers individualized pedagogical interventions. These results, although based on independently sourced datasets, demonstrate the feasibility of an integrated approach to detecting disengagement and open the door to emotionally intelligent learning systems with potential for future work in real-time content personalization and adaptive learning assistance. Full article
(This article belongs to the Special Issue Present and Future of E-Learning Technologies (2nd Edition))
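
As a rough illustration of the aggregation idea (textual sentiment plus visual fatigue and emotion cues), the sketch below scores feedback with an off-the-shelf sentiment checkpoint standing in for the paper's fine-tuned BERT and combines it with two visual indicators under assumed thresholds; the fusion rule is hypothetical.

```python
from transformers import pipeline

# Stand-in checkpoint for the paper's fine-tuned BERT feedback model (assumption).
sentiment = pipeline("text-classification",
                     model="distilbert-base-uncased-finetuned-sst-2-english")

def disengagement_alert(feedback: str, perclos: float,
                        negative_emotion_ratio: float) -> bool:
    """Flag an interval when at least two of the three cues look negative (hypothetical rule)."""
    negative_text = sentiment(feedback)[0]["label"] == "NEGATIVE"
    fatigued = perclos > 0.30                       # assumed PERCLOS threshold
    negative_face = negative_emotion_ratio > 0.50   # share of frames with negative affect
    return sum([negative_text, fatigued, negative_face]) >= 2

print(disengagement_alert("The lectures are confusing and I feel lost.", 0.42, 0.35))
```
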
18 pages, 638 KiB  
Article
The Influence of Teaching Songs with Text and a Neutral Syllable on 4-to-9-Year-Old Portuguese Children’s Vocal Performance
by Ana Isabel Pereira and Helena Rodrigues
Educ. Sci. 2025, 15(8), 984; https://doi.org/10.3390/educsci15080984 (registering DOI) - 1 Aug 2025
Abstract
Research on children’s singing development is extensive. Different ages, approaches, and variables have been taken into consideration. However, research on singing with text or a neutral syllable is scarce, and findings are inconclusive. This study investigated the influence of singing with text and a neutral syllable on children’s vocal performance. Children aged 4 to 9 (n = 135) participated in two periods of instruction and assessment. In Period One, Song 1 was taught with text and Song 2 with a neutral syllable, and in Period Two, the text was added to Song 2. In each period, children were individually audio-recorded singing both songs. Three independent raters scored the songs’ vocal performances using two researcher-designed rating scales, one for each song, which included the assessment of tonal and rhythm dimensions. Before data analysis, the validity and reliability of the rating scales used to assess vocal performance were examined and assured. The results revealed that 4-, 5-, and 7-year-olds sang Song 1 significantly better in Period One, and 4- and 5-year-olds sang Song 1 significantly better in Period Two. Thus, singing with text seems to favour younger children’s vocal performance. Findings also revealed that girls scored significantly higher than boys for Song 1 in both periods, but not for Song 2 in Period One. The implications of incorporating songs with text and neutral syllables into music programs, as well as the instruments used to assess vocal performances, are discussed. Full article
(This article belongs to the Special Issue Contemporary Issues in Music Education: International Perspectives)
24 pages, 1855 KiB  
Article
AI-Driven Panel Assignment Optimization via Document Similarity and Natural Language Processing
by Rohit Ramachandran, Urjit Patil, Srinivasaraghavan Sundar, Prem Shah and Preethi Ramesh
AI 2025, 6(8), 177; https://doi.org/10.3390/ai6080177 (registering DOI) - 1 Aug 2025
Abstract
Efficient and accurate panel assignment is critical in expert and peer review processes. Traditional methods—based on manual preferences or heuristic rules—often introduce bias, inconsistency, and scalability challenges. We present an automated framework that combines transformer-based document similarity modeling with optimization-based reviewer assignment. Using the all-mpnet-base-v2 model (version 3.4.1), our system computes semantic similarity between proposal texts and reviewer documents, including CVs and Google Scholar profiles, without requiring manual input from reviewers. These similarity scores are then converted into rankings and integrated into an Integer Linear Programming (ILP) formulation that accounts for workload balance, conflicts of interest, and role-specific reviewer assignments (lead, scribe, reviewer). The method was tested across 40 researchers in two distinct disciplines (Chemical Engineering and Philosophy), each with 10 proposal documents. Results showed high self-similarity scores (0.65–0.89), strong differentiation between unrelated fields (−0.21 to 0.08), and comparable performance between reviewer document types. The optimization consistently prioritized top matches while maintaining feasibility under assignment constraints. By eliminating the need for subjective preferences and leveraging deep semantic analysis, our framework offers a scalable, fair, and efficient alternative to manual or heuristic assignment processes. This approach can support large-scale review workflows while enhancing transparency and alignment with reviewer expertise. Full article
(This article belongs to the Section AI Systems: Theory and Applications)
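
The two stages the abstract describes (embedding-based similarity, then constrained assignment) map naturally onto sentence-transformers and an off-the-shelf ILP solver. The sketch below is an assumption-laden miniature: the documents, conflict list, load limits, and the use of PuLP are illustrative, and reviewer roles are omitted.

```python
# Similarity scoring with all-mpnet-base-v2, then a small ILP assignment (illustrative).
import pulp
from sentence_transformers import SentenceTransformer, util

proposals = ["Proposal on reaction kinetics ...", "Proposal on epistemology ..."]
reviewer_docs = ["CV of reviewer A ...", "CV of reviewer B ...", "CV of reviewer C ..."]
conflicts = {(0, 1)}                      # (reviewer, proposal) pairs to exclude (hypothetical)
reviewers_per_proposal, max_load = 2, 2

model = SentenceTransformer("all-mpnet-base-v2")
sim = util.cos_sim(model.encode(reviewer_docs, convert_to_tensor=True),
                   model.encode(proposals, convert_to_tensor=True))

prob = pulp.LpProblem("panel_assignment", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", (range(len(reviewer_docs)), range(len(proposals))), cat="Binary")
# Maximize total reviewer-proposal similarity.
prob += pulp.lpSum(float(sim[r][p]) * x[r][p]
                   for r in range(len(reviewer_docs)) for p in range(len(proposals)))
for p in range(len(proposals)):                       # each proposal fully staffed
    prob += pulp.lpSum(x[r][p] for r in range(len(reviewer_docs))) == reviewers_per_proposal
for r in range(len(reviewer_docs)):                   # balanced reviewer workload
    prob += pulp.lpSum(x[r][p] for p in range(len(proposals))) <= max_load
for r, p in conflicts:                                # conflicts of interest
    prob += x[r][p] == 0
prob.solve(pulp.PULP_CBC_CMD(msg=False))
assignment = [(r, p) for r in range(len(reviewer_docs))
              for p in range(len(proposals)) if x[r][p].value() == 1]
print(assignment)
```
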
18 pages, 1811 KiB  
Article
A Multimodal Deep Learning Framework for Consistency-Aware Review Helpfulness Prediction
by Seonu Park, Xinzhe Li, Qinglong Li and Jaekyeong Kim
Electronics 2025, 14(15), 3089; https://doi.org/10.3390/electronics14153089 (registering DOI) - 1 Aug 2025
Abstract
Multimodal review helpfulness prediction (MRHP) aims to identify the most helpful reviews by leveraging both textual and visual information. However, prior studies have primarily focused on modeling interactions between these modalities, often overlooking the consistency between review content and ratings, which is a key indicator of review credibility. To address this limitation, we propose CRCNet (Content–Rating Consistency Network), a novel MRHP model that jointly captures the semantic consistency between review content and ratings while modeling the complementary characteristics of text and image modalities. CRCNet employs RoBERTa and VGG-16 to extract semantic and visual features, respectively. A co-attention mechanism is applied to capture the consistency between content and rating, and a Gated Multimodal Unit (GMU) is adopted to integrate consistency-aware representations. Experimental results on two large-scale Amazon review datasets demonstrate that CRCNet outperforms both unimodal and multimodal baselines in terms of MAE, MSE, RMSE, and MAPE. Further analysis confirms the effectiveness of content–rating consistency modeling and the superiority of the proposed fusion strategy. These findings suggest that incorporating semantic consistency into multimodal architectures can substantially improve the accuracy and trustworthiness of review helpfulness prediction. Full article
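
A Gated Multimodal Unit of the kind CRCNet adopts can be written in a few lines of PyTorch. The dimensions (RoBERTa-sized text features, VGG-16-sized image features) and the regression head below are illustrative assumptions rather than the paper's exact architecture.

```python
# Standard Gated Multimodal Unit: a learned gate blends projected text and image features.
import torch
import torch.nn as nn

class GatedMultimodalUnit(nn.Module):
    def __init__(self, text_dim: int, image_dim: int, hidden_dim: int):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        self.gate = nn.Linear(text_dim + image_dim, hidden_dim)

    def forward(self, text_feat: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
        h_text = torch.tanh(self.text_proj(text_feat))
        h_image = torch.tanh(self.image_proj(image_feat))
        z = torch.sigmoid(self.gate(torch.cat([text_feat, image_feat], dim=-1)))
        return z * h_text + (1 - z) * h_image   # gate decides each modality's contribution

# Example: fuse 768-d text features with 4096-d image features, then predict helpfulness.
gmu = GatedMultimodalUnit(text_dim=768, image_dim=4096, hidden_dim=256)
helpfulness_head = nn.Linear(256, 1)
fused = gmu(torch.randn(8, 768), torch.randn(8, 4096))
score = helpfulness_head(fused)
```
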
22 pages, 4480 KiB  
Article
MGMR-Net: Mamba-Guided Multimodal Reconstruction and Fusion Network for Sentiment Analysis with Incomplete Modalities
by Chengcheng Yang, Zhiyao Liang, Tonglai Liu, Zeng Hu and Dashun Yan
Electronics 2025, 14(15), 3088; https://doi.org/10.3390/electronics14153088 (registering DOI) - 1 Aug 2025
Abstract
Multimodal sentiment analysis (MSA) faces key challenges such as incomplete modality inputs, long-range temporal dependencies, and suboptimal fusion strategies. To address these, we propose MGMR-Net, a Mamba-guided multimodal reconstruction and fusion network that integrates modality-aware reconstruction with text-centric fusion within an efficient state-space modeling framework. MGMR-Net consists of two core components: the Mamba-collaborative fusion module, which utilizes a two-stage selective state-space mechanism for fine-grained cross-modal alignment and hierarchical temporal integration, and the Mamba-enhanced reconstruction module, which employs continuous-time recurrence and dynamic gating to accurately recover corrupted or missing modality features. The entire network is jointly optimized via a unified multi-task loss, enabling simultaneous learning of discriminative features for sentiment prediction and reconstructive features for modality recovery. Extensive experiments on CMU-MOSI, CMU-MOSEI, and CH-SIMS datasets demonstrate that MGMR-Net consistently outperforms several baseline methods under both complete and missing modality settings, achieving superior accuracy, robustness, and generalization. Full article
(This article belongs to the Special Issue Application of Data Mining in Decision Support Systems (DSSs))
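
The unified multi-task objective mentioned above (a sentiment term plus a reconstruction term for recovered modality features) can be sketched as follows; the loss weighting, masking scheme, and L1 regression target are assumptions, and the Mamba modules themselves are omitted.

```python
# Joint loss: sentiment regression + masked reconstruction of missing modality features.
import torch
import torch.nn.functional as F

def mgmr_multitask_loss(pred_sentiment: torch.Tensor,
                        true_sentiment: torch.Tensor,
                        reconstructed: torch.Tensor,
                        original: torch.Tensor,
                        missing_mask: torch.Tensor,
                        recon_weight: float = 0.5) -> torch.Tensor:
    """Combine a regression term with reconstruction penalized only where features were missing."""
    task_loss = F.l1_loss(pred_sentiment, true_sentiment)        # CMU-MOSI-style regression target
    recon_loss = ((missing_mask * (reconstructed - original) ** 2).sum()
                  / missing_mask.sum().clamp(min=1))
    return task_loss + recon_weight * recon_loss

# Example shapes: batch of 8, 64-d recovered features, boolean missingness mask.
loss = mgmr_multitask_loss(torch.randn(8, 1), torch.randn(8, 1),
                           torch.randn(8, 64), torch.randn(8, 64),
                           (torch.rand(8, 64) > 0.7).float())
```
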
20 pages, 865 KiB  
Review
Barriers and Facilitators to Artificial Intelligence Implementation in Diabetes Management from Healthcare Workers’ Perspective: A Scoping Review
by Giovanni Cangelosi, Andrea Conti, Gabriele Caggianelli, Massimiliano Panella, Fabio Petrelli, Stefano Mancin, Matteo Ratti and Alice Masini
Medicina 2025, 61(8), 1403; https://doi.org/10.3390/medicina61081403 (registering DOI) - 1 Aug 2025
Abstract
Background and Objectives: Diabetes is a global public health challenge, with increasing prevalence worldwide. The implementation of artificial intelligence (AI) in the management of this condition offers potential benefits in improving healthcare outcomes. This study primarily investigates the barriers and facilitators perceived by healthcare professionals in the adoption of AI. Secondarily, by analyzing both quantitative and qualitative data collected, it aims to support the potential development of AI-based programs for diabetes management, with particular focus on a possible bottom-up approach. Materials and Methods: A scoping review was conducted following PRISMA-ScR guidelines for reporting and registered in the Open Science Framework (OSF) database. The study selection process was conducted in two phases—title/abstract screening and full-text review—independently by three researchers, with a fourth resolving conflicts. Data were extracted and assessed using Joanna Briggs Institute (JBI) tools. The included studies were synthesized narratively, combining both quantitative and qualitative analyses to ensure methodological rigor and contextual depth. Results: The adoption of AI tools in diabetes management is influenced by several barriers, including perceived unsatisfactory clinical performance, high costs, issues related to data security and decision-making transparency, as well as limited training among healthcare workers. Key facilitators include improved clinical efficiency, ease of use, time-saving, and organizational support, which contribute to broader acceptance of the technology. Conclusions: The active and continuous involvement of healthcare workers represents a valuable opportunity to develop more effective, reliable, and well-integrated AI solutions in clinical practice. Our findings emphasize the importance of a bottom-up approach and highlight how adequate training and organizational support can help overcome existing barriers, promoting sustainable and equitable innovation aligned with public health priorities. Full article
(This article belongs to the Special Issue Advances in Public Health and Healthcare Management for Chronic Care)
17 pages, 1907 KiB  
Systematic Review
Pilomatricoma in Syndromic Contexts: A Literature Review and a Report of a Case in Apert Syndrome
by Gianmarco Saponaro, Elisa De Paolis, Mattia Todaro, Francesca Azzuni, Giulio Gasparini, Antonio Bosso, Giuliano Ascani, Angelo Minucci and Alessandro Moro
Dermatopathology 2025, 12(3), 24; https://doi.org/10.3390/dermatopathology12030024 (registering DOI) - 1 Aug 2025
Abstract
Pilomatricomas are benign tumors originating from hair follicle matrix cells and represent the most common skin tumors in pediatric patients. Pilomatricomas may be associated with genetic syndromes such as myotonic dystrophy, familial adenomatous polyposis (FAP), Turner syndrome, Rubinstein–Taybi syndrome, Kabuki syndrome, and Sotos syndrome. This study reviews the literature on pilomatricomas occurring in syndromic contexts and presents a novel case linked to Apert syndrome. A systematic review was conducted using PubMed and Cochrane databases, focusing on case reports, case series, and reviews describing pilomatricomas associated with syndromes. A total of 1272 articles were initially screened; after removing duplicates and excluding articles without syndromic diagnoses or lacking sufficient data, 81 full-text articles were reviewed. Overall, 96 cases of pilomatricomas associated with genetic syndromes were identified. Reports of patients with Apert syndrome who develop pilomatricomas are absent in the literature. Pilomatricomas predominantly affect pediatric patients, with a slight female predominance, and are often the first manifestation of underlying genetic syndromes. Our study highlights previously unreported associations of pilomatricoma with Apert syndrome, providing molecular insights. This study contributes to understanding the clinical and molecular features of pilomatricomas in syndromic contexts and underscores the importance of genetic analysis for accurate diagnosis and management. Full article
24 pages, 6260 KiB  
Article
Transforming Product Discovery and Interpretation Using Vision–Language Models
by Simona-Vasilica Oprea and Adela Bâra
J. Theor. Appl. Electron. Commer. Res. 2025, 20(3), 191; https://doi.org/10.3390/jtaer20030191 (registering DOI) - 1 Aug 2025
Abstract
In this work, the utility of multimodal vision–language models (VLMs) for visual product understanding in e-commerce is investigated, focusing on two complementary models: ColQwen2 (vidore/colqwen2-v1.0) and ColPali (vidore/colpali-v1.2-hf). These models are integrated into two architectures and evaluated across various product interpretation tasks, including image-grounded question answering, brand recognition and visual retrieval based on natural language prompts. ColQwen2, built on the Qwen2-VL backbone with LoRA-based adapter hot-swapping, demonstrates strong performance, allowing end-to-end image querying and text response synthesis. It excels at identifying attributes such as brand, color or usage based solely on product images and responds fluently to user questions. In contrast, ColPali, which utilizes the PaliGemma backbone, is optimized for explainability. It delivers detailed visual-token alignment maps that reveal how specific regions of an image contribute to retrieval decisions, offering transparency ideal for diagnostics or educational applications. Through comparative experiments using footwear imagery, it is demonstrated that ColQwen2 is highly effective in generating accurate responses to product-related questions, while ColPali provides fine-grained visual explanations that reinforce trust and model accountability. Full article
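
Retrieval in the ColPali/ColQwen2 family rests on late interaction between query-token embeddings and image-patch embeddings (MaxSim). The sketch below shows that scoring rule in isolation, with random tensors standing in for real model outputs; the per-token maxima are also what ColPali-style alignment maps visualize.

```python
# Generic MaxSim late-interaction scoring over multi-vector embeddings (stand-in tensors).
import torch

def maxsim_score(query_tokens: torch.Tensor, doc_patches: torch.Tensor) -> torch.Tensor:
    """query_tokens: (q, d); doc_patches: (p, d). Sum over query tokens of best patch similarity."""
    q = torch.nn.functional.normalize(query_tokens, dim=-1)
    d = torch.nn.functional.normalize(doc_patches, dim=-1)
    sim = q @ d.T                         # (q, p) token-to-patch similarities
    return sim.max(dim=1).values.sum()    # best patch per query token, summed

# Rank three product images (as patch-embedding matrices) for one natural-language query.
query = torch.randn(12, 128)                       # e.g., "red running shoes with white sole"
products = [torch.randn(1024, 128) for _ in range(3)]
ranking = sorted(range(3), key=lambda i: maxsim_score(query, products[i]), reverse=True)
print(ranking)
```
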
25 pages, 659 KiB  
Systematic Review
Mechanical and Physical Properties of Durable Prosthetic Restorations Printed Using 3D Technology in Comparison with Hybrid Ceramics and Milled Restorations—A Systematic Review
by Bettanapalya. V. Swapna, B. Shivamurthy, Vinu Thomas George, Kavishma Sulaya and Vaishnavi M Nayak
Prosthesis 2025, 7(4), 90; https://doi.org/10.3390/prosthesis7040090 (registering DOI) - 1 Aug 2025
Abstract
Background/Objectives: Additive manufacturing (AM) technology has emerged as an innovative approach in dentistry. Recently, manufacturers have developed permanent resins engineered explicitly for the fabrication of definitive prostheses using AM techniques. This systematic review evaluated the mechanical and physical properties of 3D-printed permanent resins in comparison to milled resins and hybrid ceramics for the fabrication of indirect dental restorations. Methods: Three electronic databases—Scopus, Web of Science, and PubMed—were searched for English-language articles. Two independent researchers conducted study selection, data extraction, quality assessment, and the evaluation of the certainty of evidence. In vitro studies assessing the mechanical and physical properties of the permanent resins were included in this review. Results: A total of 1779 articles were identified through electronic databases. Following full-text screening and eligibility assessment, 13 studies published between 2023 and 2024 were included in this qualitative review. The investigated outcomes included physical properties (surface roughness, color changes, water sorption/solubility) and mechanical properties (flexural strength, elastic modulus, microhardness). Conclusions: Three-dimensionally printed permanent resins show promising potential for fabricating indirect dental restorations. However, the current evidence regarding their mechanical and physical properties remains limited and inconsistent, mainly due to variability in study methodologies. Full article
(This article belongs to the Section Prosthodontics)
37 pages, 642 KiB  
Article
The Goddess of the Flaming Mouth Between India and Tibet
by Arik Moran and Alexander Zorin
Religions 2025, 16(8), 1002; https://doi.org/10.3390/rel16081002 - 1 Aug 2025
Abstract
This article examines the evolution and potential cross-cultural adaptations of the “Goddess of the Flaming Mouth”, Jvālāmukhī (Skt.) or Kha ‘bar ma (Tib.), in Indic and Tibetan traditions. A minor figure in medieval Hindu Tantras, Jvālāmukhī is today best known through her tangible manifestation as natural flames in a West Himalayan temple complex in the valley of Kangra, Himachal Pradesh, India. The gap between her sparse portrayal in Tantric texts and her enduring presence at this local “seat of power” (śakti pīṭha) raises questions regarding her historical development and sectarian affiliations. To address these questions, we examine mentions of Jvālāmukhī’s Tibetan counterpart, Kha ‘bar ma, across a wide range of textual sources: canonical Buddhist texts, original Tibetan works of the Bön and Buddhist traditions, and texts on sacred geography. Regarded as a queen of ghost spirits (pretas) and field protector (kṣetrapāla) in Buddhist sources, her portrayal in Bön texts contains archaic motifs that hint at autochthonous and/or non-Buddhist origins. The assessment of Indic material in conjunction with Tibetan texts points to possible transformations of the goddess across these culturally proximate Himalayan settings. In presenting and contextualizing these transitions, this article contributes critical data to ongoing efforts to map the development, adaptation, and localization of Tantric deities along the Indo-Tibetan interface. Full article
16 pages, 1651 KiB  
Article
Modular Pipeline for Text Recognition in Early Printed Books Using Kraken and ByT5
by Yahya Momtaz, Lorenza Laccetti and Guido Russo
Electronics 2025, 14(15), 3083; https://doi.org/10.3390/electronics14153083 (registering DOI) - 1 Aug 2025
Abstract
Early printed books, particularly incunabula, are invaluable archives of the beginnings of modern educational systems. However, their complex layouts, antique typefaces, and page degradation caused by bleed-through and ink fading pose significant challenges for automatic transcription. In this work, we present a modular pipeline that addresses these problems by combining modern layout analysis and language modeling techniques. The pipeline begins with historical layout-aware text segmentation using Kraken, a neural network-based tool tailored for early typographic structures. Initial optical character recognition (OCR) is then performed with Kraken’s recognition engine, followed by post-correction using a fine-tuned ByT5 transformer model trained on manually aligned line-level data. By learning to map noisy OCR outputs to verified transcriptions, the model substantially improves recognition quality. The pipeline also integrates a preprocessing stage based on our previous work on bleed-through removal using robust statistical filters, including non-local means, Gaussian mixtures, biweight estimation, and Gaussian blur. This step enhances the legibility of degraded pages prior to OCR. The entire solution is open, modular, and scalable, supporting long-term preservation and improved accessibility of cultural heritage materials. Experimental results on 15th-century incunabula show a reduction in the Character Error Rate (CER) from around 38% to around 15% and an increase in the Bilingual Evaluation Understudy (BLEU) score from 22 to 44, confirming the effectiveness of our approach. This work demonstrates the potential of integrating transformer-based correction with layout-aware segmentation to enhance OCR accuracy in digital humanities applications. Full article
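
The ByT5 post-correction stage can be approximated with the Hugging Face seq2seq API: fine-tune a byte-level checkpoint on (noisy OCR, verified transcription) line pairs, then decode corrected lines. The base checkpoint, the sample Latin pair, and the bare training loop below are illustrative assumptions, not the authors' exact setup; the Kraken segmentation and recognition stages are not shown.

```python
# Fine-tune a byte-level ByT5 model to map noisy OCR lines to verified transcriptions.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tok = AutoTokenizer.from_pretrained("google/byt5-small")
model = T5ForConditionalGeneration.from_pretrained("google/byt5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Manually aligned line-level pairs: (raw OCR output, ground-truth transcription). Hypothetical example.
pairs = [("Incipit liber primus de conzolatione", "Incipit liber primus de consolatione")]

model.train()
for noisy, clean in pairs:
    inputs = tok(noisy, return_tensors="pt")
    labels = tok(clean, return_tensors="pt").input_ids
    loss = model(**inputs, labels=labels).loss     # seq2seq cross-entropy
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Inference: correct a freshly recognized line.
model.eval()
out = model.generate(**tok("Incipit liber primus de conzolatione", return_tensors="pt"),
                     max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```
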
13 pages, 1003 KiB  
Article
Evaluation of an Artificial Intelligence-Generated Health Communication Material on Bird Flu Precautions
by Ayokunle A. Olagoke, Comfort Tosin Adebayo, Joseph Ayotunde Aderonmu, Emmanuel A. Adeaga and Kimberly J. Johnson
Zoonotic Dis. 2025, 5(3), 22; https://doi.org/10.3390/zoonoticdis5030022 - 1 Aug 2025
Abstract
The 2025 avian influenza A(H5N1) outbreak has highlighted the urgent need for rapidly generated health communication materials during public health emergencies. Artificial intelligence (AI) systems offer transformative potential to accelerate content development pipelines while maintaining scientific accuracy and impact. We evaluated an AI-generated health communication material on bird flu precautions among 100 U.S. adults. The material was developed using ChatGPT for text generation based on CDC guidelines and Leonardo.AI for illustrations. Participants rated perceived message effectiveness, quality, realism, relevance, attractiveness, and visual informativeness. The AI-generated health communication material received favorable ratings across all dimensions: perceived message effectiveness (3.83/5, 77%), perceived message quality (3.84/5, 77%), realism (3.72/5, 74%), relevance (3.68/5, 74%), attractiveness (3.62/5, 74%), and visual informativeness (3.35/5, 67%). Linear regression analysis revealed that all features significantly predicted perceived message effectiveness in unadjusted and adjusted models (p < 0.0001), e.g., multivariate analysis of outcome on perceived visual informativeness showed β = 0.51, 95% CI: 0.37–0.66, p < 0.0001. Also, mediation analysis revealed that visual informativeness accounted for 23.8% of the relationship between material attractiveness and perceived effectiveness. AI tools can enable real-time adaptation of prevention guidance during epidemiological emergencies while maintaining effective risk communication. Full article
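
The 23.8% figure comes from a mediation decomposition: the indirect effect of attractiveness on perceived effectiveness through visual informativeness, divided by the total effect. A minimal sketch with statsmodels and stand-in ratings (the real study used 100 respondents and additional covariates) looks like this:

```python
# Simple mediation via the product-of-coefficients approach with stand-in survey ratings.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({               # hypothetical 1-5 ratings, not the study data
    "attractiveness":  [3, 4, 5, 2, 4, 3, 5, 4],
    "informativeness": [3, 3, 5, 2, 4, 3, 4, 4],
    "effectiveness":   [3, 4, 5, 2, 5, 3, 4, 4],
})

total = smf.ols("effectiveness ~ attractiveness", df).fit()                       # path c
a_path = smf.ols("informativeness ~ attractiveness", df).fit()                    # path a
b_path = smf.ols("effectiveness ~ informativeness + attractiveness", df).fit()    # path b (+ c')

indirect = a_path.params["attractiveness"] * b_path.params["informativeness"]
proportion_mediated = indirect / total.params["attractiveness"]
print(f"Proportion of the effect mediated: {proportion_mediated:.1%}")
```
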
23 pages, 1192 KiB  
Article
Multi-Model Dialectical Evaluation of LLM Reasoning Chains: A Structured Framework with Dual Scoring Agents
by Catalin Anghel, Andreea Alexandra Anghel, Emilia Pecheanu, Ioan Susnea, Adina Cocu and Adrian Istrate
Informatics 2025, 12(3), 76; https://doi.org/10.3390/informatics12030076 (registering DOI) - 1 Aug 2025
Abstract
(1) Background and objectives: Large language models (LLMs) such as GPT, Mistral, and LLaMA exhibit strong capabilities in text generation, yet assessing the quality of their reasoning—particularly in open-ended and argumentative contexts—remains a persistent challenge. This study introduces Dialectical Agent, an internally developed modular framework designed to evaluate reasoning through a structured three-stage process: opinion, counterargument, and synthesis. The framework enables transparent and comparative analysis of how different LLMs handle dialectical reasoning. (2) Methods: Each stage is executed by a single model, and final syntheses are scored via two independent LLM evaluators (LLaMA 3.1 and GPT-4o) based on a rubric with four dimensions: clarity, coherence, originality, and dialecticality. In parallel, a rule-based semantic analyzer detects rhetorical anomalies and ethical values. All outputs and metadata are stored in a Neo4j graph database for structured exploration. (3) Results: The system was applied to four open-weight models (Gemma 7B, Mistral 7B, Dolphin-Mistral, Zephyr 7B) across ten open-ended prompts on ethical, political, and technological topics. The results show consistent stylistic and semantic variation across models, with moderate inter-rater agreement. Semantic diagnostics revealed differences in value expression and rhetorical flaws not captured by rubric scores. (4) Originality: The framework is, to our knowledge, the first to integrate multi-stage reasoning, rubric-based and semantic evaluation, and graph-based storage into a single system. It enables replicable, interpretable, and multidimensional assessment of generative reasoning—supporting researchers, developers, and educators working with LLMs in high-stakes contexts. Full article
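
The three-stage chain and dual scoring described above can be reduced to a small, backend-agnostic sketch; the `generate` callables, prompts, and rubric wording are assumptions, and the semantic analyzer and Neo4j storage are not shown.

```python
# Opinion -> counterargument -> synthesis, then two independent rubric scorers.
from typing import Callable, Dict

def dialectical_chain(topic: str, generate: Callable[[str], str]) -> Dict[str, str]:
    opinion = generate(f"State and defend a clear position on: {topic}")
    counter = generate(f"Write the strongest counterargument to this position:\n{opinion}")
    synthesis = generate("Synthesize the two views into a reasoned conclusion:\n"
                         f"Position:\n{opinion}\nCounterargument:\n{counter}")
    return {"opinion": opinion, "counterargument": counter, "synthesis": synthesis}

RUBRIC = "Rate the synthesis from 1 to 10 for clarity, coherence, originality, and dialecticality."

def dual_score(synthesis: str, evaluators: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    # Two independent evaluator models score the same synthesis for inter-rater comparison.
    return {name: judge(f"{RUBRIC}\n\n{synthesis}") for name, judge in evaluators.items()}
```
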