30 Results Found

  • Article
  • Open Access
2 Citations
2,906 Views
28 Pages

Evaluating the coherence of narrative sequences extracted from large document collections is crucial for applications in information retrieval and knowledge discovery. While mathematical coherence metrics based on embedding similarities provide objec...

  • Article
  • Open Access
629 Views
16 Pages

Evaluating Medical Text Summaries Using Automatic Evaluation Metrics and LLM-as-a-Judge Approach: A Pilot Study

  • Yuriy Vasilev,
  • Irina Raznitsyna,
  • Anastasia Pamova,
  • Tikhon Burtsev,
  • Tatiana Bobrovskaya,
  • Pavel Kosov,
  • Anton Vladzymyrskyy,
  • Olga Omelyanskaya and
  • Kirill Arzamasov

Background: Electronic health records (EHRs) remain a vital source of clinical information, yet processing these heterogeneous data is extremely labor-intensive. Summarization of these data using Large Language Models (LLMs) is considered a promising...

  • Review
  • Open Access
320 Views
15 Pages

Artificial Authority: The Promise and Perils of LLM Judges in Healthcare

  • Ariana Genovese,
  • Lars Hegstrom,
  • Srinivasagam Prabha,
  • Cesar A. Gomez-Cabello,
  • Syed Ali Haider,
  • Bernardo Collaco,
  • Nadia G. Wood and
  • Antonio Jorge Forte

Background: Large language models (LLMs) are increasingly integrated into clinical documentation, decision support, and patient-facing applications across healthcare, including plastic and reconstructive surgery. Yet, their evaluation remains bottlen...

  • Article
  • Open Access
4 Citations
4,680 Views
12 Pages

Multifaceted Assessment of Responsible Use and Bias in Language Models for Education

  • Ishrat Ahmed,
  • Wenxing Liu,
  • Rod D. Roscoe,
  • Elizabeth Reilley and
  • Danielle S. McNamara

Large language models (LLMs) are increasingly being utilized to develop tools and services in various domains, including education. However, due to the nature of the training data, these models are susceptible to inherent social or cognitive biases,...

  • Article
  • Open Access
907 Views
14 Pages

LLM-Based Pipeline for Structured Knowledge Extraction from Scientific Literature on Heavy Metal Hyperaccumulation

  • Kiril Makrinsky,
  • Valery Shendrikov,
  • Anna Makhonko,
  • Dmitry Merkushkin and
  • Oleg V. Batishchev

The rapid growth of the body of literature on heavy metal hyperaccumulation in plants has created a critical bottleneck in data synthesis. Manual curation is slow, labor-intensive, and not scalable. To address this issue, we developed an artificial i...

  • Article
  • Open Access
179 Views
34 Pages

Literary Language Mashup: Curating Fictions with Large Language Models

  • Gerardo Aleman Manzanarez,
  • Raul Monroy,
  • Jorge Garcia Flores and
  • Hiram Calvo

6 January 2026

The artificial generation of text by computers has been a field of study in computer science since the beginning of the twentieth century, from Markov chains to Turing tests. This has evolved into automatic summarization and marketing chatbots. The g...

  • Article
  • Open Access
2,679 Views
31 Pages

Evaluating Faithfulness in Agentic RAG Systems for e-Governance Applications Using LLM-Based Judging Frameworks

  • George Papageorgiou,
  • Vangelis Sarlis,
  • Manolis Maragoudakis,
  • Ioannis Magnisalis and
  • Christos Tjortjis

As Large Language Models (LLMs) are core components in Retrieval-Augmented Generation (RAG) systems for knowledge-intensive tasks, concerns regarding hallucinations, redundancy, and unverifiable outputs have intensified, particularly in high-stakes d...

  • Article
  • Open Access
308 Views
21 Pages

This paper presents the design, implementation and evaluation of an agentic virtual assistant (VA) for a medical clinic, combining large language models (LLMs) with retrieval-augmented generation (RAG) technology and multi-agent artificial intelligen...

  • Article
  • Open Access
2,063 Views
21 Pages

8 August 2025

We present a novel hybrid approach to literature-based discovery (LBD) which exploits large language models (LLMs) to enhance traditional LBD methodologies. We explore the use of LLMs to address significant LBD challenges: (1) the extraction of factu...

  • Article
  • Open Access
16 Citations
9,458 Views
36 Pages

Large Language Models as Evaluators in Education: Verification of Feedback Consistency and Accuracy

  • Hyein Seo,
  • Taewook Hwang,
  • Jeesu Jung,
  • Hyeonseok Kang,
  • Hyuk Namgoong,
  • Yohan Lee and
  • Sangkeun Jung

11 January 2025

The recent advancements in large language models (LLMs) have brought significant changes to the field of education, particularly in the generation and evaluation of feedback. LLMs are transforming education by streamlining tasks like content creation...

  • Article
  • Open Access
534 Views
21 Pages

KGEval: Evaluating Scientific Knowledge Graphs with Large Language Models

  • Vladyslav Nechakhin,
  • Jennifer D’Souza,
  • Steffen Eger and
  • Sören Auer

3 January 2026

This paper explores the novel application of large language models (LLMs) as evaluators for structured scientific summaries, a task where traditional natural language evaluation metrics may not readily apply. Leveraging the Open Research Knowled...

  • Article
  • Open Access
1 Citation
3,623 Views
19 Pages

The rapid progress of Large Language Models (LLMs) has intensified the demand for flexible evaluation frameworks capable of accommodating diverse user needs across a growing variety of applications. While numerous standardized benchmarks exist for ev...

  • Article
  • Open Access
1,732 Views
25 Pages

Multi-Agent Multimodal Large Language Model Framework for Automated Interpretation of Fuel Efficiency Analytics in Public Transportation

  • Zhipeng Ma,
  • Ali Rida Bahja,
  • Andreas Burgdorf,
  • André Pomp,
  • Tobias Meisen,
  • Bo Nørregaard Jørgensen and
  • Zheng Grace Ma

30 October 2025

Enhancing fuel efficiency in public transportation requires the integration of complex multimodal data into interpretable, decision-relevant insights. However, traditional analytics and visualization methods often yield fragmented outputs that demand...

  • Article
  • Open Access
481 Views
34 Pages

A Safety and Security-Centered Evaluation Framework for Large Language Models via Multi-Model Judgment

  • Jinxin Zhang,
  • Yunhao Xia,
  • Hong Zhong,
  • Weichen Lu,
  • Qingwei Deng and
  • Changsheng Wan

26 December 2025

The pervasive deployment of large language models (LLMs) has given rise to mounting concerns regarding the safety and security of the content generated by these models. Nevertheless, the absence of comprehensive evaluation methods constitutes a subst...

  • Article
  • Open Access
181 Views
27 Pages

SteadyEval: Robust LLM Exam Graders via Adversarial Training and Distillation

  • Catalin Anghel,
  • Marian Viorel Craciun,
  • Adina Cocu,
  • Andreea Alexandra Anghel and
  • Adrian Istrate

14 January 2026

Large language models (LLMs) are increasingly used as rubric-guided graders for short-answer exams, but their decisions can be unstable across prompts and vulnerable to answer-side prompt injection. In this paper, we study SteadyEval, a guardrailed e...

  • Article
  • Open Access
904 Views
31 Pages

Common Weakness Enumerations (CWEs) and Common Vulnerabilities and Exposures (CVEs) are open knowledge bases that provide definitions, descriptions, and samples of code vulnerabilities. The combination of Large Language Models (LLMs) with vulnerabili...

  • Article
  • Open Access
1 Citation
4,748 Views
42 Pages

5 September 2025

A learning management system (LMS) plays a crucial role in supporting students’ educational activities by centralized platforms for course delivery, communication, and student support. Recently, many universities have integrated chatbots into t...

  • Feature Paper
  • Article
  • Open Access
1,062 Views
32 Pages

15 October 2025

Integrating emotional intelligence into AI systems is essential for developing empathetic chatbots, yet deploying fully empathetic models is often constrained by business, ethical, and computational factors. We propose an innovative solution: a dedic...

  • Article
  • Open Access
2 Citations
3,123 Views
16 Pages

Background: Large Language Models (LLMs) have demonstrated strong performances in clinical question-answering (QA) benchmarks, yet their effectiveness in addressing real-world consumer medical queries remains underexplored. This study evaluates the c...

  • Article
  • Open Access
6 Citations
4,831 Views
20 Pages

Diagnosing Bias and Instability in LLM Evaluation: A Scalable Pairwise Meta-Evaluator

  • Catalin Anghel,
  • Andreea Alexandra Anghel,
  • Emilia Pecheanu,
  • Adina Cocu,
  • Adrian Istrate and
  • Constantin Adrian Andrei

31 July 2025

The evaluation of large language models (LLMs) increasingly relies on other LLMs acting as automated judges. While this approach offers scalability and efficiency, it raises serious concerns regarding evaluator reliability, positional bias, and ranki...

  • Article
  • Open Access
1 Citation
5,496 Views
21 Pages

1 August 2025

The justice system has increasingly applied AI techniques for legal judgment to enhance efficiency. However, most AI techniques focus on decision-making outcomes, failing to capture the deliberative nature of the real-world judicial process. To addre...

  • Article
  • Open Access
3 Citations
1,792 Views
30 Pages

CourseEvalAI: Rubric-Guided Framework for Transparent and Consistent Evaluation of Large Language Models

  • Catalin Anghel,
  • Marian Viorel Craciun,
  • Emilia Pecheanu,
  • Adina Cocu,
  • Andreea Alexandra Anghel,
  • Paul Iacobescu,
  • Calina Maier,
  • Constantin Adrian Andrei,
  • Cristian Scheau and
  • Serban Dragosloveanu

11 October 2025

Background and objectives: Large language models (LLMs) show promise in automating open-ended evaluation tasks, yet their reliability in rubric-based assessment remains uncertain. Variability in scoring, feedback, and rubric adherence raises concerns...

  • Article
  • Open Access
1,462 Views
22 Pages

RCEGen: A Generative Approach for Automated Root Cause Analysis Using Large Language Models (LLMs)

  • Rubel Hassan Mollik,
  • Arup Datta,
  • Anamul Haque Mollah and
  • Wajdi Aljedaani

7 November 2025

Root cause analysis (RCA) identifies the faults and vulnerabilities underlying software failures, informing better design and maintenance decisions. Earlier approaches typically framed RCA as a classification task, predicting coarse categories of roo...

  • Article
  • Open Access
3,050 Views
42 Pages

13 October 2025

Large language models (LLMs) judge three pairs of architectural design proposals which have been independently surveyed by opinion polls: department store buildings, sports stadia, and viaducts. A tailored prompt instructs the LLM to use specific emo...

  • Review
  • Open Access
667 Views
14 Pages

28 November 2025

Helicobacter pylori infects about half of the global population and is a major cause of peptic ulcer disease and gastric cancer. Improving patient education can increase screening participation, enhance treatment adherence, and help reduce gastric ca...

  • Article
  • Open Access
1 Citation
3,500 Views
30 Pages

Enhancing Online Learning Through Multi-Agent Debates for CS University Students

  • Jing Du,
  • Guangtao Xu,
  • Wenhao Liu,
  • Dibin Zhou and
  • Fuchang Liu

23 May 2025

As recent advancements in large language models enhance reasoning across various domains, educators are increasingly exploring their use in conversation-based tutoring systems. However, since LLMs are black-box models to users and lack human-like pro...

  • Article
  • Open Access
191 Views
16 Pages

Public Opinion Reports are essential tools for crisis management, yet their evaluation remains a critical bottleneck that often delays response actions. Recently, dominant Large Language Model (LLM)-based evaluators often overlook a critical challeng...

  • Systematic Review
  • Open Access
1 Citation
3,873 Views
90 Pages

Background: Retrieval-augmented generation (RAG) aims to reduce hallucinations and outdated knowledge by grounding LLM outputs in retrieved evidence, but empirical results are scattered across tasks, systems, and metrics, limiting cumulative insight....

  • Article
  • Open Access
2 Citations
2,342 Views
32 Pages

Smart City Ontology Framework for Urban Data Integration and Application

  • Xiaolong He,
  • Xi Kuai,
  • Xinyue Li,
  • Zihao Qiu,
  • Biao He and
  • Renzhong Guo

Rapid urbanization and the proliferation of heterogeneous urban data have intensified the challenges of semantic interoperability and integrated urban governance. To address this, we propose the Smart City Ontology Framework (SMOF), a standards-drive...

  • Article
  • Open Access
395 Views
13 Pages

AI Decision-Making Performance in Maternal–Fetal Medicine: Comparison of ChatGPT-4, Gemini, and Human Specialists in a Cross-Sectional Case-Based Study

  • Matan Friedman,
  • Amit Slouk,
  • Noa Gonen,
  • Laura Guzy,
  • Yael Ganor Paz,
  • Kira Nahum Sacks,
  • Amihai Rottenstreich,
  • Eran Weiner,
  • Ohad Gluck and
  • Ilia Kleiner

24 December 2025

Background/Objectives: Large Language Models (LLMs), including ChatGPT-4 and Gemini, are increasingly incorporated into clinical care; however, their reliability within maternal–fetal medicine (MFM), a high-risk field in which diagnostic and ma...