Search Results (49)

Search Parameters:
Keywords = BM25 retriever

32 pages, 1673 KB  
Article
InspectCL: A Contrastive Learning Assistant for Similar Case Retrieval in Organizational Audit and Compliance
by Jianfeng Liu, Yuetian Huang, Changhua Hu, Kangheng Feng, Suining Zhu, Qingguo Shi and Yi Su
Electronics 2026, 15(11), 2495; https://doi.org/10.3390/electronics15112495 (registering DOI) - 5 Jun 2026
Abstract
In large-scale state-owned enterprise audit and compliance tasks, ensuring that similar violations receive consistent disciplinary decisions is essential for procedural fairness and institutional credibility. However, existing retrieval methods face three major challenges: lexical matching methods fail to recognize semantically equivalent violation descriptions, general-purpose semantic encoders lack knowledge of inspection-specific terminology and regulatory distinctions, and retrieved precedents are often not directly transformed into actionable disciplinary references. To address these problems, this paper proposes InspectCL, a domain-enhanced contrastive learning and Retrieval-Augmented Generation framework for similar case retrieval, validated on audit data from a provincial power grid company. First, to provide task-specific supervision that is unavailable in existing benchmarks, we construct InspectCase, a de-identified dataset of 4200 audit and compliance cases across 12 violation categories, with expert-validated positive pairs and hard negative pairs. Second, to overcome the weak domain awareness of generic encoders, we design a domain-enhanced contrastive learning model. Specifically, terminology-masking augmentation improves robustness to specialized inspection expressions, regulatory semantic injection incorporates disciplinary rules to distinguish factually similar but legally different cases, and hierarchical contrastive optimization strengthens both case-level similarity learning and category-level boundary separation. Third, to convert retrieved precedents into practical decision support, the Top-K similar cases are used as evidence for a large language model to generate structured disciplinary recommendation summaries, including violation classification, penalty references, applicable regulations, and rectification measures. 
Experimental results on InspectCase show that InspectCL substantially outperforms BM25, BERT-base, SimCSE, and Legal-BERT baselines, achieving 56.9% ± 0.7% Recall@5 and an 87.6% ± 0.4% Penalty Consistency Score (PCS). These results demonstrate that the proposed problem-driven modules jointly improve semantic retrieval accuracy and disciplinary decision consistency, offering a practical reference for similar power-grid audit scenarios, with broader applicability to be validated in future cross-domain studies.
(This article belongs to the Special Issue AI-Powered Natural Language Processing Applications)
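The abstract names case-level and category-level contrastive objectives with expert-validated positives and hard negatives, but not the exact loss. As a generic illustration only, the sketch below computes a standard InfoNCE loss for one anchor case, one positive, and a list of hard negatives, using plain-Python cosine similarity over toy embedding vectors. The function names and the temperature of 0.05 are assumptions, not details from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector],
             temperature: float = 0.05) -> float:
    """InfoNCE loss: negative log of the softmax probability given to the positive case."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]
```

The loss is near zero when the anchor is much closer to its positive than to every hard negative, and grows as a factually similar but differently adjudicated case moves closer than the true precedent.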
23 pages, 1481 KB  
Article
Rare-Disease Diagnosis on the ZebraMap Multimodal Case Report Dataset: A Hybrid Pipeline with Grounded Explainability
by Md Sanzidul Islam, Amani Jamal and Ali Alkhathlan
Sensors 2026, 26(11), 3582; https://doi.org/10.3390/s26113582 - 4 Jun 2026
Viewed by 152
Abstract
Rare-disease diagnosis is difficult because clinicians must identify plausible conditions from a large, severely imbalanced disease space using evidence distributed across clinical narratives, structured findings, and image-linked descriptions. This paper presents a hybrid pipeline with caption-mediated multimodal fusion for ranked rare-disease diagnosis and grounded explanation, developed and evaluated on the ZebraMap multimodal case-report dataset (69,146 structured cases; 1727 diseases). Grouped train–validation–test splitting by source article was applied to prevent leakage, and a sequential pipeline was constructed combining BM25 lexical retrieval, a class-balanced TF–IDF classifier, MedCPT dense retrieval and cross-encoder reranking, caption-based image-aware late fusion, and post hoc grounded explanation generation. The final pipeline achieved test MRR 0.3905 and Recall@10 0.5507 (nDCG@10 0.4273), while the strongest individual component, the class-balanced TF–IDF classifier, reached MRR 0.4200 and Recall@10 0.6279; the hybrid pipeline therefore integrates ranking with grounded explanation rather than maximizing single-metric diagnostic accuracy. On 256 explained cases, the explanation module achieved citation coverage 0.7334 and usefulness 3.8734, exposing a tradeoff between diagnostic accuracy and explanation richness. These results indicate that a hybrid retrieval-and-classification approach can support ranked rare-disease differential diagnosis and that grounded explanation quality can be evaluated quantitatively, extending computational support for the prolonged rare-disease diagnostic process.
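The ranking metrics reported above (MRR, Recall@10, nDCG@10) are computed per case and then averaged. A minimal sketch, assuming binary relevance and a set of gold diagnoses per case; the paper's own evaluation code is not described here, and the helper names are hypothetical.

```python
import math
from typing import Iterable, Sequence, Set, Tuple

def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """1/rank of the first correct diagnosis, or 0.0 if none appears."""
    for rank, label in enumerate(ranked, start=1):
        if label in gold:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranked list, gold set) pairs."""
    scores = [reciprocal_rank(ranked, gold) for ranked, gold in runs]
    return sum(scores) / len(scores) if scores else 0.0

def recall_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Fraction of gold diagnoses found in the top k."""
    if not gold:
        return 0.0
    return len(set(ranked[:k]) & gold) / len(gold)

def ndcg_at_k(ranked: Sequence[str], gold: Set[str], k: int = 10) -> float:
    """Binary-relevance nDCG@k: DCG of the ranking divided by DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, label in enumerate(ranked[:k], start=1) if label in gold)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(gold), k) + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

With a single gold diagnosis ranked third, the reciprocal rank is 1/3 and nDCG@10 is 1/log2(4) = 0.5, which shows why nDCG@10 sits between MRR and Recall@10 in tables like the one above.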

36 pages, 5874 KB  
Article
Research on Methods for Linking Geoscience Literature and Geoscientific Data Based on Large Language Models
by Xinyu Chen, Yin Ma, Kai Wu, Xing Pang, Guoqing Li, Ruikai Ma, Linhan Yang, Chuang Peng, Jiayu Zhi and Jiabin Yuan
ISPRS Int. J. Geo-Inf. 2026, 15(6), 243; https://doi.org/10.3390/ijgi15060243 - 1 Jun 2026
Viewed by 251
Abstract
Automated linkage between geoscientific literature and datasets is essential for improving data reuse, reproducibility, and knowledge discovery, yet existing methods often struggle with implicit dataset references, heterogeneous spatial–temporal expressions, and inconsistent naming conventions. To address this problem, we propose a literature–data linkage framework that integrates candidate retrieval, large language model (LLM)-based structured extraction, normalization, and knowledge graph construction. The framework first identifies candidate fragments through BM25-based retrieval, regex filtering, and whitelist-assisted scoring, and then applies schema-constrained prompting to extract dataset names and key attributes, including temporal coverage, spatial scope, resolution, provider, and role. The extracted results are subsequently normalized to canonical forms and ingested into a Neo4j-based knowledge graph linking articles, datasets, institutions, and regions. Experiments on a cross-journal benchmark show that the proposed framework achieves 93.79% precision, 90.66% recall, and 92.20% F1-score. Comparative experiments across multiple LLM backbones further indicate that the framework remains effective across both proprietary and open-source models, while ablation results confirm that candidate retrieval and normalization are the two most influential components for balanced extraction performance. The resulting knowledge graph provides a structured representation of literature–data linkages and supports exploration of dataset reuse patterns, provenance relations, and cross-document connections. These results demonstrate that carefully constrained LLM extraction, combined with retrieval and normalization, provides a robust and interpretable pathway for transforming unstructured geoscientific literature into structured and reusable knowledge.
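BM25 is the candidate-retrieval step shared by most articles in this listing. The sketch below is a compact Okapi BM25 scorer over pre-tokenized text, using the common non-negative IDF variant ln(1 + (N − n + 0.5)/(n + 0.5)) and the frequently used defaults k1 = 1.2 and b = 0.75. Neither the IDF variant nor the parameter values are taken from the paper, tokenization is left to the caller, and the class is a hypothetical illustration rather than any library's API.

```python
import math
from collections import Counter
from typing import List, Sequence

class BM25:
    """Okapi BM25 over pre-tokenized documents (minimal sketch; assumes a non-empty corpus)."""

    def __init__(self, corpus: List[List[str]], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.term_freqs = [Counter(doc) for doc in corpus]
        self.doc_lens = [len(doc) for doc in corpus]
        self.avgdl = (sum(self.doc_lens) / len(corpus)) or 1.0  # guard against all-empty docs
        n_docs = len(corpus)
        doc_freq = Counter(term for doc in corpus for term in set(doc))
        # IDF variant that stays non-negative even for very common terms
        self.idf = {t: math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
                    for t, df in doc_freq.items()}

    def score(self, query: Sequence[str], idx: int) -> float:
        tf = self.term_freqs[idx]
        # length normalization: documents longer than average are penalized when b > 0
        norm = self.k1 * (1 - self.b + self.b * self.doc_lens[idx] / self.avgdl)
        total = 0.0
        for term in query:
            f = tf.get(term, 0)
            if f:
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def top_k(self, query: Sequence[str], k: int) -> List[int]:
        """Indices of the k best-scoring documents, ties broken by index."""
        ranked = sorted(range(len(self.term_freqs)), key=lambda i: (-self.score(query, i), i))
        return ranked[:k]
```

Term-frequency saturation (controlled by k1) is what keeps a fragment that repeats a dataset name many times from dominating one that mentions it once alongside other matching attributes.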

24 pages, 5366 KB  
Article
A Three-Tier Hybrid Architecture for an Admissions Dialogue Assistant with Graph-Aware Context Routing
by Nikita Stepanov, Anastasiya Radaeva, Peter Panfilov, Alexander Suleykin and Valery Pyatetsky
Big Data Cogn. Comput. 2026, 10(5), 156; https://doi.org/10.3390/bdcc10050156 - 15 May 2026
Viewed by 211
Abstract
University admissions services must answer large volumes of applicant questions that differ substantially in complexity, ranging from repetitive FAQ-type requests to multi-step questions involving programs, entrance exams, admission rules, passing scores, and temporal comparisons. Ungrounded large language model responses are risky in this domain because answers must be factually correct, source-based, and consistent with official institutional data. This paper presents a three-tier hybrid architecture for an admissions dialogue assistant that combines deterministic FAQ matching, hybrid retrieval-augmented generation, and graph-grounded retrieval for complex queries. The first tier, Hash-FAQ, returns verified answers for frequent intents using normalized keys, hash-based lookup, near-duplicate fingerprinting, and semantic similarity checks. The second tier applies hybrid RAG based on BM25 retrieval, vector search, rank fusion, and optional cross-encoder reranking. The third tier uses GraphRAG to extract a constrained k-hop subgraph from a Neo4j knowledge graph built from relational admissions data and document-derived facts. All tiers are synchronized through a versioned indexing pipeline with shadow collections and atomic switching across lexical, vector, FAQ, relational, and graph stores. The system was evaluated using real admissions-campaign traffic and a labeled subset of applicant queries. Tier 1 resolved 68.7% of requests with low latency, while the GraphRAG branch improved factual accuracy with attribution on multi-step queries from 0.55 to 0.91 compared with the non-graph baseline. The main contribution of the study is a production-oriented, cost-aware retrieval-and-generation architecture that links tiered routing, synchronized knowledge publication, source attribution, and operational evaluation for applicant-facing institutional dialogue systems.
(This article belongs to the Topic Electronic Communications, IOT and Big Data, 2nd Volume)
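Only the first two steps of the Hash-FAQ tier (normalized keys and hash-based lookup) are sketched below; near-duplicate fingerprinting and semantic similarity checks are omitted. The normalization rules (Unicode NFKC, case folding, punctuation stripping, whitespace collapsing), the choice of SHA-256, and the class and function names are assumptions for illustration, not the paper's implementation.

```python
import hashlib
import re
import unicodedata
from typing import Dict, Optional

def normalize(question: str) -> str:
    """Canonical form: NFKC, case-folded, punctuation removed, whitespace collapsed."""
    text = unicodedata.normalize("NFKC", question).casefold()
    text = re.sub(r"[^\w\s]", " ", text)  # \w is Unicode-aware, so non-Latin letters survive
    return " ".join(text.split())

def faq_key(question: str) -> str:
    """Stable hash of the normalized question, used as the lookup key."""
    return hashlib.sha256(normalize(question).encode("utf-8")).hexdigest()

class HashFAQ:
    """Exact-match FAQ tier: returns a verified answer or None (fall through to the next tier)."""

    def __init__(self, entries: Dict[str, str]):
        self._index = {faq_key(q): answer for q, answer in entries.items()}

    def lookup(self, question: str) -> Optional[str]:
        return self._index.get(faq_key(question))
```

A `None` result is the signal to route the query to the hybrid RAG tier; because the lookup is a dictionary access, this tier adds essentially no latency to requests it cannot answer. The answer string in the test below is a placeholder, not real admissions data.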

24 pages, 2105 KB  
Article
A Multi-Stage Hybrid Retrieval Framework for the Scientific Literature with Cross-Encoder Re-Ranking
by Walaa Al-Joofi, Alaa Sagheer and Hala Hamdoun
Appl. Sci. 2026, 16(10), 4813; https://doi.org/10.3390/app16104813 - 12 May 2026
Viewed by 509
Abstract
Effective scientific literature retrieval requires moving beyond surface-level term matching toward structured semantic reasoning. This paper presents a controlled empirical study of multi-stage retrieval for scientific literature, integrating lexical matching, dense semantic modeling, hybrid fusion, and cross-encoder re-ranking within a unified evaluation framework. The study is designed to analyze the interactions, trade-offs, and failure modes of these components in claim-based scientific search. Experiments on the SciFact benchmark demonstrate that dense models capture semantic similarity but remain insufficient when used in isolation. Hybrid fusion broadens the candidate pool but does not consistently outperform the best standalone dense retriever, as RRF-based fusion can dilute strong dense rankings when lexical and semantic signals diverge. Cross-encoder re-ranking proves to be the primary driver of final performance gains, with the best configuration, Hybrid (SciNCL + BM25) + Cross-Encoder, reaching NDCG@10 of 0.523, MAP@10 of 0.479, Recall@10 of 0.642, and MRR@10 of 0.497. Ablation analysis shows that lexical pseudo-relevance feedback (RM3) introduces query drift in claim-focused retrieval, and that passage-level max pooling weakens effectiveness by fragmenting document-level evidence. Cross-domain evaluation on SciFact, PubMedQA, and SciDocs demonstrates that the relative ranking of retrieval paradigms remains stable across datasets with varying difficulty levels, while also revealing that the RRF dilution effect intensifies on harder retrieval tasks. These findings suggest that effective scientific retrieval benefits from integrated multi-stage pipelines, and that understanding component-level interactions is essential for designing robust retrieval systems.
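Reciprocal rank fusion (RRF) scores each document as the sum of 1/(k + rank) over the input rankings. The sketch below uses k = 60, the constant proposed in the original RRF paper by Cormack et al.; the abstract does not state which k the authors used, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several best-first lists of document IDs into one best-first list."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # sort by fused score, breaking exact ties by document ID for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Because only ranks are used, a document ranked first by the dense retriever alone can fall below documents that both retrievers rank moderately: fusing `["a", "b", "c"]` (dense) with `["x", "c", "b"]` (BM25) yields `["b", "c", "a", "x"]`, pushing the dense top hit to third place. This is one concrete way the dilution effect described above can arise.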

21 pages, 1748 KB  
Article
Multi-Route Search and Adaptive Fusion for Power QA with Small Language Model Guidance
by Zhijun Shen, Qian Guo, Lizhou Jiang, Jingkang Huang, Zhenfan Yu, Xinlei Cai, Hailin Pang and Tao Yu
Algorithms 2026, 19(5), 378; https://doi.org/10.3390/a19050378 - 11 May 2026
Viewed by 302
Abstract
Power documentation serves as the core guideline for the safe operation of power systems, and its precise retrieval is crucial for ensuring grid stability and safety. In this context, Retrieval-Augmented Generation (RAG) frameworks have emerged as an effective technique, combining the natural language understanding of large language models (LLMs) with the traceability of retrieval-based models. However, existing RAG frameworks face three main challenges with power-system documents: semantic drift caused by non-standardized industry terminology, increased semantic noise due to fixed-window segmentation, and knowledge conflicts in multi-source retrieval contexts. To address these challenges, we propose a multi-path adaptive fusion retrieval framework based on small language models (SLMs). To map queries to standard terminology, our framework first constructs a common terminology repository for the power industry and a section-structure-aware index that fully preserves the physical hierarchical logic of the source documents. Subsequently, the SLM assigns prior weights based on query features and retrieved context, enabling adaptive fusion of retrieval paths through confidence assessment and consistency verification. Through this fusion process, our method effectively filters retrieval noise and resolves knowledge conflicts. Experimental results on real-world power-document datasets covering dispatch, energy storage, and emergency response show that our framework achieves an average recall of 91%, outperforming dense retrieval and BM25 by 21% and 28%, respectively. Compared with other methods, it yields the highest BERTScore F1 (0.7798) and ROUGE-1/2/L F1 (0.2430, 0.1588, 0.2098) and achieves the best results in the RAGAS evaluation framework, significantly enhancing the rigor and reliability of question answering in the power engineering domain.

27 pages, 1222 KB  
Article
Query-Adaptive Hybrid Search
by Pavel Posokhov, Stepan Skrylnikov, Sergei Masliukhin, Alina Zavgorodniaia, Olesia Koroteeva and Yuri Matveev
Mach. Learn. Knowl. Extr. 2026, 8(4), 91; https://doi.org/10.3390/make8040091 - 5 Apr 2026
Viewed by 1672
Abstract
The modern information retrieval field increasingly relies on hybrid search systems that combine sparse retrieval with dense neural models. However, most existing hybrid frameworks employ static mixing coefficients and train their components independently, failing to account for the specific needs of individual queries and for corpus heterogeneity. In this paper, we introduce an adaptive hybrid retrieval framework featuring query-driven alpha prediction, which dynamically calibrates the mixing weights based on latent query representations. The predictor is instantiated in two configurations, a lightweight low-latency variant and a full-capacity encoder-scale variant, enabling flexible trade-offs between computational efficiency and retrieval accuracy without relying on resource-intensive LLM-based online evaluation. Furthermore, we propose antagonist negative sampling, a novel training paradigm that optimizes the dense encoder to resolve the systematic failures of the lexical retriever by prioritizing hard negatives where BM25 exhibits high uncertainty. Empirical evaluations on large-scale multilingual benchmarks (MLDR and MIRACL) indicate that our approach achieves higher average performance than state-of-the-art models such as BGE-M3 and mGTE, reaching an nDCG@10 of 74.3 on long-document retrieval. Notably, our framework recovers up to 92.5% of the theoretical oracle performance and yields significant improvements in nDCG@10 across 16 languages, particularly in challenging long-context scenarios.
(This article belongs to the Special Issue Trustworthy AI: Integrating Knowledge, Retrieval, and Reasoning)
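The mixing step that the predicted alpha controls is usually a convex combination of dense and sparse scores. The sketch below assumes per-query min-max normalization of each score list, which is one common choice but not confirmed by the abstract; alpha is passed in directly, whereas in the paper it comes from a learned query-level predictor that this sketch does not model. Function names are hypothetical.

```python
from typing import Dict, List

def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one retriever's scores to [0, 1]; a constant score list maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid_rank(dense: Dict[str, float], sparse: Dict[str, float], alpha: float) -> List[str]:
    """Rank documents by alpha * dense + (1 - alpha) * sparse after normalization."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    d, s = min_max(dense), min_max(sparse)
    # a document missing from one retriever's list contributes 0 for that retriever
    fused = {doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
             for doc in set(d) | set(s)}
    return sorted(fused, key=lambda doc: (-fused[doc], doc))
```

When the two retrievers disagree, alpha alone decides the winner: a query whose alpha is predicted near 1 follows the dense ranking, and one near 0 follows BM25, which is the per-query flexibility that a static coefficient cannot provide.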

17 pages, 1283 KB  
Article
LedgerRAG: Governance-Driven Agentic Chain of Retrieval for Dynamic Knowledge Scenarios
by Siwei Wang, Yangsen Zhang, Yalong Guo and Jing Kang
Electronics 2026, 15(7), 1376; https://doi.org/10.3390/electronics15071376 - 26 Mar 2026
Viewed by 801
Abstract
Retrieval-augmented generation (RAG) grounds large language models (LLMs) with external evidence. Dynamic knowledge tasks, however, require systems to decide not only what to retrieve but also when to refresh, how to arbitrate conflicts, and how to preserve an auditable record of the evidence used to answer a query. We present LedgerRAG, a trigger-aware retrieval chain framework that maintains an explicit claim-level evidence ledger and uses coverage, temporal validity, authority, and conflict signals to control retrieval, refresh, and stopping decisions. The evaluation compares LedgerRAG with a query-level BM25 baseline, a dense retriever setting, and task-aligned proxy baselines representing graph-style retrieval, temporal-only retrieval, and conflict-focused retrieval. The results show that LedgerRAG’s clearest advantage lies in conflict governance and auditable evidence control, achieving near-perfect ConFLICT adjudication (CRAcc = 0.993) under authority-aware routing, while yielding more modest gains and explicit trade-offs in regulation-change and streaming settings.
(This article belongs to the Section Computer Science & Engineering)

27 pages, 3012 KB  
Article
Emergency Operation Scheme Generation for Urban Rail Transit Train Door Systems Using Retrieval-Augmented Large Language Models
by Lu Huang, Zhigang Liu, Chengcheng Yu, Tianliang Zhu and Bing Yan
Sensors 2026, 26(6), 2006; https://doi.org/10.3390/s26062006 - 23 Mar 2026
Viewed by 771
Abstract
Urban rail transit (URT) train-door failures are safety-critical and can cause cascading service disruptions, yet existing emergency operation schemes (EOSs) are often static, difficult to adapt to evolving fault patterns, and hard to verify against updated regulations. This study proposes a retrieval-augmented large language model (LLM) framework for executable and evidence-traceable EOS generation. Multi-source heterogeneous incident evidence (structured work orders, operational impact records, and unstructured maintenance/dispatch narratives) is normalized into a structured incident representation, and a hybrid retriever (dense + BM25) with cross-encoder reranking selects compact regulatory clauses and historical cases under a fixed context budget. The generator is fine-tuned with structured objectives to enforce schema compliance, role assignment, and citation grounding. Experiments on 776 passenger-door incidents from Shanghai URT (2019–2024) show that Hybrid + rerank achieves the best retrieval quality (Recall@5 = 0.78; Coverage@B = 0.71; FirstHit/B = 0.46). For generation, the full setting improves operational usability, reaching SchemaPass = 0.88, RoleAcc = 0.91, CiteCov = 0.73, and UsableAns = 0.83, compared with 0.15 UsableAns for a pure LLM baseline and 0.26 for prompting with RAG only. These results indicate that combining high-utility retrieval with structure- and citation-aware fine-tuning substantially improves the executability and verifiability of safety-critical operation schemes.
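The abstract states that reranked regulatory clauses and historical cases are selected under a fixed context budget but does not describe the selection rule. One simple possibility, shown here purely as an assumption, is greedy packing by reranker score, with a crude whitespace token count standing in for the generator's real tokenizer. The `Passage` type and `pack_context` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Passage:
    text: str
    score: float  # reranker relevance score (higher is better)

def pack_context(passages: List[Passage], budget_tokens: int) -> List[Passage]:
    """Greedily keep the highest-scoring passages whose combined length fits the budget."""
    selected: List[Passage] = []
    used = 0
    for passage in sorted(passages, key=lambda p: p.score, reverse=True):
        cost = len(passage.text.split())  # crude whitespace token count (assumption)
        if used + cost <= budget_tokens:
            selected.append(passage)
            used += cost
        # an oversized passage is skipped rather than ending the loop,
        # so shorter lower-scored passages can still fill the remaining budget
    return selected
```

Skipping instead of stopping trades strict score order for better use of the budget; a system that must never include a lower-ranked clause ahead of a higher-ranked one would stop at the first passage that does not fit.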

37 pages, 2886 KB  
Article
A Zero-Touch Vulnerability Remediation Framework Based on OpenVAS, Threat Intelligence, and RAG-Enhanced Large Language Models
by Cheng-Hui Hsieh, Chen-Yi Cheng and Yung-Chung Wang
Mathematics 2026, 14(6), 1072; https://doi.org/10.3390/math14061072 - 22 Mar 2026
Viewed by 1540
Abstract
Vulnerability disclosures are outpacing manual remediation capacity. We present a Zero-Touch Vulnerability Remediation Framework combining OpenVAS scanning, multi-source threat intelligence, and Large Language Models (LLMs) enhanced through Retrieval-Augmented Generation (RAG). The Scanning Layer normalizes findings into structured JSON; the AI Decision Layer applies hybrid FAISS + BM25 retrieval, dual-LLM verification (a primary generator checked by a gpt-4o auxiliary verifier), and confidence-based routing; the Orchestration Layer executes validated patches via CI/CD pipelines with automated rollback. On 350 real-world vulnerability cases across five GPT-family models, the full Prompt + RAG pipeline raised accuracy from 52.0% to 76.7–82.6% (all p < 0.001, Cohen’s h = 0.51–0.68) and reduced hallucination from 23.4% to 7.8%. Confidence-based routing sent 34.9% of cases to the high-confidence auto-execution tier, yielding a 4.1% rollback rate and zero service outages. The framework addresses the most relevant categories of the OWASP LLM Top 10 and lays groundwork for enterprise-scale, Zero-Touch vulnerability management.
(This article belongs to the Section E1: Mathematics and Computer Science)
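Cohen's h, the effect size quoted above, compares two proportions on an arcsine-square-root scale: h = |2·arcsin(√p1) − 2·arcsin(√p2)|. The sketch below reports the absolute value (the statistic is sometimes given with a sign); the function name is an illustrative choice.

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h effect size between two proportions in [0, 1] (absolute value)."""
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie in [0, 1]")
    return abs(2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2)))
```

Plugging in the reported 52.0% baseline and 82.6% best-case accuracy gives h of about 0.67, a medium-to-large effect by Cohen's conventional thresholds (0.2 small, 0.5 medium, 0.8 large).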

17 pages, 566 KB  
Article
Analyst-of-Record: A Proof-of-Concept for Influence-Based Analyst Credit Assignment in Human-Feedback Decision Support
by Devon L. Brown and Danda B. Rawat
Electronics 2026, 15(6), 1210; https://doi.org/10.3390/electronics15061210 - 13 Mar 2026
Viewed by 471
Abstract
The purpose of this study is to examine whether analyst-level credit can be assigned quantitatively in a lightweight human-feedback decision-support pipeline. In intelligence and national security workflows, analysts often provide edits, comments, and evaluative feedback during the production of analytic products, yet these intermediate contributions are usually discarded, leaving no auditable record of how individual feedback shaped the final output. To address this problem, this study proposes a proof-of-concept Analyst-of-Record framework that combines synthetic analyst feedback, a linear ridge reward model, first-order influence functions, and additive Shapley aggregation to estimate both feedback-item and analyst-level contribution scores. The research design uses the Fact Extraction and VERification (FEVER) fact-verification dataset under controlled experimental settings. The pipeline retrieves evidence with Best Matching 25 (BM25), generates a grounded template-based response, derives three synthetic analyst feedback channels from FEVER annotations, trains a reward model on simple claim–answer and analyst-identity features, and aggregates per-feedback influence scores into an Analyst Contribution Index (ACI). The main experiments are conducted on a 500-claim subset across five random seeds, with additional ablation and bootstrap analyses used to assess sensitivity and stability. The findings show that the reward model achieves a mean validation R² of 0.801 ± 0.037, indicating that the synthetic feedback signals are learnable under the selected featurization. The analyst-level contribution scores remain stable across random seeds, with approximately half of the total influence magnitude attributed to the explanation-quality channel and the remainder split across the other two channels.
Ablation results further show that removing the explanation-quality channel collapses validation fit, while bootstrap resampling demonstrates tight concentration of absolute ACI magnitudes. Theoretically, this study extends attribution research beyond document-only grounding by showing how analyst feedback itself can be modeled as an object of contribution analysis. It also demonstrates that influence functions and Shapley-style aggregation can be adapted into a tractable framework for estimating interpretable analyst-level credit in a reproducible experimental setting. Practically, the proposed framework offers an initial foundation for more traceable and accountable decision-support workflows in which intermediate analyst contributions can be preserved rather than lost. The results also provide a feasible implementation path for future systems that incorporate stronger generators, richer evidence representations, and real analyst annotations.
(This article belongs to the Section Computer Science & Engineering)

20 pages, 682 KB  
Article
Semantic Search for System Dynamics Models Using Vector Embeddings in a Cloud Microservices Environment
by Pavel Kyurkchiev, Anton Iliev and Nikolay Kyurkchiev
Future Internet 2026, 18(2), 86; https://doi.org/10.3390/fi18020086 - 5 Feb 2026
Viewed by 1115
Abstract
Efficient retrieval of mathematical and structural similarities in System Dynamics models remains a significant challenge for traditional lexical systems, which often fail to capture the contextual dependencies of simulation processes. This paper presents an architectural approach and implementation of a semantic search module integrated into an existing cloud-based modeling and simulation system. The proposed method serializes graph structures into textual descriptions, generates vector embeddings via local ONNX inference, and indexes them in a vector database (Qdrant). Experimental validation, performed on a diverse corpus of complex dynamic models, compares the proposed approach against traditional information retrieval methods (Full-Text Search, Keyword Search in PostgreSQL, and Apache Lucene with Standard and BM25 scoring). The results demonstrate the distinct advantage of semantic search, which achieves high precision (over 90%) within the scope of the evaluated corpus and effectively eliminates information noise. In comparison, keyword search exhibited only 24.8% precision with a significant rate of false positives, while standard full-text analysis failed to identify relevant models for complex conceptual queries (0 results). Despite a recorded increase in latency (~2 s), the study shows that the vector-based approach is a significantly more robust solution for detecting hidden semantic connections in mathematical model databases, providing a foundation for future developments toward multi-vector indexing strategies.
(This article belongs to the Special Issue Intelligent Agents and Their Application)

26 pages, 403 KB  
Article
How the Representation of Retrieved Context Affects In-Context Prompting for Commit Message Generation
by Dokyeong An and Geunseok Yang
Electronics 2026, 15(3), 652; https://doi.org/10.3390/electronics15030652 - 2 Feb 2026
Viewed by 333
Abstract
High-quality commit messages are essential software artifacts because they succinctly communicate the intent and scope of code changes, yet large language models (LLMs) often fail to reflect project-specific writing conventions when used in a zero-shot setting without contextual signals. This study investigates not whether retrieval helps, but how the same retrieved example, when represented differently in the prompt, quantitatively changes generation outcomes. We implement a retrieve-then-generate framework where the target commit’s diff is used as a query for BM25 (Best Matching 25)-based sparse retrieval over a commit-level database, and the top-1 similar commit is optionally injected as an example context. We compare a no-context condition (K = 0) against a minimal-context condition (K = 1) under three context representations: Diff-only, Message-only, and Diff + Message pair. Using Qwen-7B on 8000 evaluation samples with a fixed prompt skeleton, deterministic decoding, and identical post-processing across conditions, we observe negligible differences at K = 0 (BLEU-4 1.14, ROUGE-L 7.47–7.48, METEOR 4.88–4.91), establishing a stable baseline. At K = 1, the same top-1 retrieved case yields systematically different metric responses depending on how it is represented (Diff-only, Message-only, or Diff + Message), even under an identical prompt skeleton, deterministic decoding, and identical post-processing. This indicates that “context representation” is not a cosmetic formatting choice but a first-class prompt-design variable in retrieval-augmented in-context learning for commit message generation. Accordingly, practitioners should select the representation based on the intended objective (e.g., lexical/style alignment vs. change-intent grounding), rather than assuming a universally optimal format.
(This article belongs to the Special Issue AI-Powered Natural Language Processing Applications)
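To make the retrieve-then-generate loop concrete, here is a minimal Python sketch, assuming a plain Okapi BM25 scorer (k1 = 1.5, b = 0.75, Lucene-style non-negative IDF), a simple regex tokenizer, and a hypothetical prompt skeleton. `Commit`, `tokenize`, `BM25`, and `build_prompt` are illustrative names, not the authors' code. The sketch uses the target diff as the query, retrieves the top-1 commit, and renders it under the three representations (Diff-only, Message-only, Diff + Message); passing `example=None` corresponds to K = 0.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    diff: str
    message: str


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase identifier-like words (the paper's is unspecified).
    return re.findall(r"[a-z_][a-z0-9_]*", text.lower())


class BM25:
    """Okapi BM25 with a non-negative (Lucene-style) IDF; k1 and b are common defaults."""

    def __init__(self, docs: list[list[str]], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(d) for d in docs]
        self.lens = [len(d) for d in docs]
        self.avgdl = sum(self.lens) / len(docs) if docs else 0.0
        df = Counter(t for d in docs for t in set(d))
        n = len(docs)
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1.0) for t, f in df.items()}

    def score(self, query: list[str], i: int) -> float:
        tf, dl = self.tfs[i], self.lens[i]
        total = 0.0
        for t in query:
            f = tf.get(t, 0)
            if f == 0:
                continue
            # Length normalization discounts matches in diffs longer than average.
            denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[t] * f * (self.k1 + 1) / denom
        return total

    def top1(self, query: list[str]) -> int:
        # Index of the highest-scoring commit (ties go to the earliest index).
        return max(range(len(self.tfs)), key=lambda i: self.score(query, i))


def build_prompt(target_diff: str, example: Optional[Commit], representation: str) -> str:
    # Hypothetical fixed skeleton; only the example block varies between conditions.
    parts = ["Write a concise commit message for the following change."]
    if example is not None:  # K = 1
        if representation == "diff":
            parts.append(f"Example diff:\n{example.diff}")
        elif representation == "message":
            parts.append(f"Example message:\n{example.message}")
        elif representation == "pair":
            parts.append(f"Example diff:\n{example.diff}\nExample message:\n{example.message}")
        else:
            raise ValueError(f"unknown representation: {representation}")
    parts.append(f"Target diff:\n{target_diff}\nCommit message:")
    return "\n\n".join(parts)


db = [
    Commit("- return a + b\n+ return a - b", "Fix subtraction in calc"),
    Commit("+ import logging\n+ logger = logging.getLogger()", "Add logging setup"),
]
index = BM25([tokenize(c.diff) for c in db])
target = "+ import logging\n+ log = logging.getLogger(__name__)"
example = db[index.top1(tokenize(target))]  # K = 1: the top-1 neighbour
print(build_prompt(target, example, "pair"))
```

Only the `representation` argument changes between the three K = 1 conditions, so the retrieved case itself is identical across them. This mirrors the controlled comparison described in the abstract.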

23 pages, 1237 KB  
Article
Enhancing Medical Question Answering with LLMs via a Hybrid Retrieval-Augmented Generation Framework
by Bushra Aljohani and Tawfeeq Alsanoosy
Information 2026, 17(2), 133; https://doi.org/10.3390/info17020133 - 1 Feb 2026
Cited by 1 | Viewed by 1545
Abstract
Given the knowledge-intensive and rapidly expanding nature of the medical field, accurately synthesizing and interpreting findings remains a major challenge for clinicians and medical students. Although Large Language Models (LLMs) have advanced automated summarization and response generation, their deployment is limited by hallucinations, outdated knowledge, and insufficient domain adaptation. Retrieval-Augmented Generation (RAG) addresses these issues by grounding LLMs in external knowledge bases. However, as the document corpus scales, maintaining RAG accuracy becomes increasingly difficult, which makes the retriever critical for contextual relevance. In this paper, we examine the efficiency of a modular RAG framework with a hybrid retrieval strategy that combines sparse retrieval (BM25) and dense retrieval (MedCPT) to extract the most relevant documents from the corpus, thereby providing contextual grounding that improves the LLM's medical responses. Evaluation was conducted on three benchmark healthcare datasets (PubMedQA, MedMCQA, and MedQA-US) using two LLMs, GPT-4o and BioGPT. Performance was assessed using retrieval metrics (context precision, context recall, F1-score) and generation metrics (BERTScore, RAG Assessment Score). The hybrid retriever achieved 92.14% recall, 74.36% precision, and an F1-score of 82.30%. GPT-4o with hybrid retrieval reached 89.4% faithfulness, 82.7% answer relevancy, and an F1BERT of 88.0% on PubMedQA. These results demonstrate that hybrid retrieval within a modular architecture substantially improves retrieval effectiveness and response quality. The proposed work offers a scalable, generalizable solution for high-stakes healthcare applications, supporting flexible retriever integration and robust evaluation to advance transparent QA systems.
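The abstract does not say how the BM25 and MedCPT results are combined. The following is a minimal sketch, assuming reciprocal rank fusion (RRF, k = 60), a common fusion choice that is swapped in here for illustration, with toy scores and toy 2-d vectors standing in for real BM25 output and MedCPT embeddings. It also includes an `f1` helper (harmonic mean). Applied to the reported figures, that helper gives 2 · 0.7436 · 0.9214 / (0.7436 + 0.9214) ≈ 0.8230, which is consistent with the stated 82.30% F1-score.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    # Cosine similarity between a query embedding and a document embedding.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(scores: dict[str, float]) -> list[str]:
    # Order document ids by descending score; ties are broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    # Reciprocal rank fusion: each retriever adds 1 / (k + rank) for every document it ranks.
    fused: dict[str, float] = {}
    for ranking in rankings:
        for pos, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + pos)
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


def f1(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Toy inputs (hypothetical): BM25 scores, plus vectors standing in for MedCPT embeddings.
sparse_scores = {"d1": 12.0, "d2": 9.5, "d3": 1.0}
query_vec = [1.0, 0.0]
doc_vecs = {"d1": [0.0, 1.0], "d2": [1.0, 0.1], "d3": [1.0, 1.0]}
dense_scores = {d: cosine(query_vec, v) for d, v in doc_vecs.items()}

fused = rrf_fuse([rank(sparse_scores), rank(dense_scores)])
print([doc_id for doc_id, _ in fused])  # ['d2', 'd1', 'd3']
print(round(f1(0.7436, 0.9214), 4))     # 0.823
```

Fusing ranks rather than raw scores avoids calibrating BM25 scores (unbounded) against cosine similarities (bounded). In this toy example, `d2` wins because both retrievers rank it highly, even though neither ranks it alone at the top.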

18 pages, 1672 KB  
Article
Mitigating Hallucinations in Discipline Inspection QA: A Two-Stage RAG Framework with Late Interaction and Reranking
by Changhua Hu, Yuetian Huang, Jiexin Kuang, Bozhi Dai, Yun Peng, Yuxin Xiao and Yi Su
Electronics 2026, 15(3), 541; https://doi.org/10.3390/electronics15030541 - 27 Jan 2026
Viewed by 970
Abstract
The automation of precise discipline inspection consultation requires question-answering (QA) systems that are both semantically nuanced and factually grounded. To address the limitations of keyword-based retrieval and the hallucination tendencies of generative language models in high-stakes discipline inspection domains, we propose a two-stage Retrieval-Augmented Generation (RAG) framework designed for Chinese discipline inspection text. Our approach combines token-level late interaction with cross-encoder reranking to achieve high-precision evidence retrieval. First, we employ ColBERTv2 to perform efficient, fine-grained semantic matching between queries and lengthy discipline inspection documents. We then refine the initial candidate set with a cross-encoder that performs deep pairwise relevance scoring on a shortlist of passages, so that its more expensive computation is focused on a small candidate set. The retrieved evidence strictly conditions the answer generation of a large language model (DeepSeek-chat). Through rigorous evaluation on a curated corpus of real Chinese discipline inspection documents and expert-annotated queries, we demonstrate that our pipeline significantly outperforms strong baselines, including BM25, single-stage dense retrieval (BGE), and a simplified ColBERT variant, in both retrieval metrics (Recall@k, Precision@k) and answer faithfulness. Our work provides a robust, reproducible blueprint for building reliable, evidence-based discipline inspection AI systems and highlights the critical role of hierarchical retrieval in mitigating hallucinations for domain-specific QA.
(This article belongs to the Special Issue AI-Driven Natural Language Processing Applications)
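The two-stage retrieve-then-rerank idea can be sketched in Python as follows. This is a toy illustration, not the authors' pipeline. It assumes hand-made 2-d token embeddings in place of ColBERTv2 outputs, an exhaustive stage-1 scan (a real ColBERTv2 deployment uses an index), and a hypothetical `cross_score(query, passage)` callable standing in for the cross-encoder. Stage 1 applies ColBERT-style MaxSim late interaction. Stage 2 rescores only the shortlist.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> list[float]:
    # L2-normalize so that dot products equal cosine similarities.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def maxsim(query_tokens: list[Vector], doc_tokens: list[Vector]) -> float:
    # Late interaction: each query token takes its best match among the
    # passage's token embeddings, and these per-token maxima are summed.
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(max(sum(a * b for a, b in zip(qt, dt)) for dt in d) for qt in q)


def two_stage(
    query_text: str,
    query_tokens: list[Vector],
    corpus: dict[str, tuple[str, list[Vector]]],
    cross_score: Callable[[str, str], float],
    shortlist: int = 2,
    top_k: int = 1,
) -> list[str]:
    # Stage 1: MaxSim over every passage (exhaustive here for simplicity), keep a shortlist.
    stage1 = sorted(corpus, key=lambda pid: (-maxsim(query_tokens, corpus[pid][1]), pid))
    candidates = stage1[:shortlist]
    # Stage 2: rescore only the shortlist with the more expensive pairwise scorer.
    reranked = sorted(candidates, key=lambda pid: (-cross_score(query_text, corpus[pid][0]), pid))
    return reranked[:top_k]


# Toy corpus: passage id -> (text, token embeddings); all values are hypothetical.
corpus = {
    "a": ("passage A", [[1.0, 0.0]]),
    "b": ("passage B", [[1.0, 0.0], [0.0, 1.0]]),
    "c": ("passage C", [[1.0, 1.0]]),
}
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
toy_cross = {"passage A": 0.9, "passage B": 0.2, "passage C": 0.8}
result = two_stage("toy query", query_tokens, corpus, lambda q, p: toy_cross[p])
print(result)  # ['c']
```

Stage 1 keeps `b` (MaxSim 2.0) and `c` (about 1.41). The stand-in cross-encoder then promotes `c` over `b`. Passage `a` has the highest toy cross-encoder score but never reaches stage 2 because it misses the shortlist, which shows that stage-1 recall bounds what reranking can recover.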
