Machine Learning and Knowledge Extraction

35 pages, 903 KB

Open Access Article

BRAG: Bayesian Retrieval-Augmented Generation; A Methodological Framework for Evidence-Governed Decision Support

by Lebede Ngartera, Saralees Nadarajah, Rodoumta Koina and Youssou Gningue

Mach. Learn. Knowl. Extr. 2026, 8(6), 151; https://doi.org/10.3390/make8060151 - 1 Jun 2026

In high-stakes settings, the most consequential failure of a language model is not a wrong answer but an answer it was not entitled to give. Existing retrieval-augmented generation (RAG) pipelines retrieve context, generate text, and perhaps add citations, but they do not decide whether the evidence justifies answering, how uncertain the answer is, or at what level the system should intervene. We argue that LLMs should not only generate answers; they should be embedded inside a selective decision architecture that jointly estimates answerability, quantifies uncertainty, verifies structural validity, and chooses among direct response, escalation, abstention, or failure. We introduce BRAG (Bayesian Retrieval-Augmented Generation), a framework that operationalises this shift from answer generation to evidence-governed decision support. BRAG estimates an answerability posterior, decomposes uncertainty into epistemic and aleatoric components, and applies a structural validity gate prior to answer emission. Evaluation is conducted using controlled Monte Carlo simulation (n = 2400 queries) and a calibrated statistical pilot (N = 500), both parametric models of the pipeline’s output distribution, together with a governed operational validation that executes the full released pipeline end-to-end on independently generated MIMIC-IV-schema records (N = 100; not credentialed patient records), expert adjudication on a stratified subset (N = 200), and secondary transfer experiments on SEC EDGAR and CUAD. In simulation, BRAG reduces hallucination from 0.257 to 0.016 (93.8%) and achieves the highest coverage-adjusted utility (0.632) among five systems. In the synthetic MIMIC-IV-schema pilot, hallucination decreases from 0.292 to 0.020 (93.2%), with utility 0.538 at 89.6% coverage and an answerability AUROC of 0.692, which is moderate in absolute terms and is therefore positioned as a routing signal that operates jointly with the deterministic validity gate rather than as a stand-alone clinical classifier. Expert adjudication yields substantial agreement (Cohen’s κ = 0.778) and 93.5% concordance with BRAG decisions. Cross-domain transfer demonstrates 96–97% hallucination reduction without retriever modification, while ablation identifies the structural validity gate as the primary safety mechanism and the answerability posterior as the primary coverage and routing-precision mechanism. These results show that combining answerability estimation with structural validity enforcement can substantially reduce unsupported outputs. All findings are methodological rather than clinical: every evaluation tier uses synthetic or schema-conformant data, and validation on credentialed de-identified patient records remains necessary before any clinical deployment.
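The four-way choice described in the abstract (direct response, escalation, abstention, or failure) can be illustrated with a minimal Python sketch. This is not the authors' released pipeline: the `Evidence` fields, the `route` function, the threshold values, and the order of the checks are all hypothetical, chosen only to show how an answerability posterior and a structural validity gate might jointly govern whether an answer is emitted.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ANSWER = "answer"      # direct response
    ESCALATE = "escalate"  # hand off to a human or a stronger process
    ABSTAIN = "abstain"    # decline: evidence does not support answering
    FAIL = "fail"          # inputs are unusable


@dataclass
class Evidence:
    # All three fields are illustrative assumptions, not names from the paper.
    p_answerable: float       # answerability posterior, expected in [0, 1]
    epistemic: float          # epistemic uncertainty score, expected in [0, 1]
    structurally_valid: bool  # outcome of the structural validity gate


def route(ev: Evidence,
          answer_threshold: float = 0.8,   # hypothetical threshold
          abstain_threshold: float = 0.4,  # hypothetical threshold
          max_epistemic: float = 0.3) -> Decision:  # hypothetical threshold
    """Pick one of four decisions from the evidence summary (assumed logic)."""
    # A posterior outside [0, 1] (including NaN) means the estimate is broken.
    if not 0.0 <= ev.p_answerable <= 1.0:
        return Decision.FAIL
    # Too little support for answering at all: abstain.
    if ev.p_answerable < abstain_threshold:
        return Decision.ABSTAIN
    # The validity gate blocks emission regardless of how confident we are.
    if not ev.structurally_valid:
        return Decision.ESCALATE
    # Plausibly answerable but uncertain: escalate rather than answer.
    if ev.p_answerable < answer_threshold or ev.epistemic > max_epistemic:
        return Decision.ESCALATE
    return Decision.ANSWER
```

In this sketch the gate is checked before the confidence thresholds that permit an answer, so a high posterior alone can never produce a direct response. That ordering reflects the abstract's statement that the gate is applied prior to answer emission, but the mapping of a gate failure to escalation is an assumption made here for illustration.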

(This article belongs to the Section Data)

31 pages, 7713 KB

Open Access Article

Temporal Knowledge Extraction Through BayeStack with Multi-Level Explainability for Optimal Sepsis Classification

by Anjana Geetha, K. L. Nisha, Arun Sankar Muttathu Sivasankara Pillai and Sreenath Rajeev

Mach. Learn. Knowl. Extr. 2026, 8(6), 150; https://doi.org/10.3390/make8060150 - 1 Jun 2026
