Scenario-Adaptive Evaluation of Trustworthy Fine-Tuned Text Models Across Knowledge-Grounded Generation and Misinformation Detection

Lipianina-Honcharenko, Khrystyna; Bykovyy, Pavlo; Krysovatyy, Andriy; Komar, Myroslav; Yazlyuk, Borys

doi:10.3390/make8060161

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

by Khrystyna Lipianina-Honcharenko¹,*, Pavlo Bykovyy¹, Andriy Krysovatyy², Myroslav Komar¹ and Borys Yazlyuk³

¹ Department of Information Computer Systems and Control, West Ukrainian National University, 11 Lvivska Str., 46009 Ternopil, Ukraine

² S. I. Yuriy Department of Finance, West Ukrainian National University, 11 Lvivska Str., 46009 Ternopil, Ukraine

³ Department of Economic Expertise and Land Management, West Ukrainian National University, 11 Lvivska Str., 46009 Ternopil, Ukraine

* Author to whom correspondence should be addressed.

Mach. Learn. Knowl. Extr. 2026, 8(6), 161; https://doi.org/10.3390/make8060161

Submission received: 7 May 2026 / Revised: 9 June 2026 / Accepted: 10 June 2026 / Published: 11 June 2026

(This article belongs to the Special Issue Trustworthy AI: Integrating Knowledge, Retrieval, and Reasoning)

Abstract

Large language models (LLMs) increasingly require robust evaluation under realistic instruction-following conditions, particularly for fine-tuned task-specific adapters operating in multilingual environments. This study proposes a scenario-adaptive evaluation framework for assessing the reliability of fine-tuned text models across two application regimes: misinformation detection (disinfo) and knowledge-grounded factual biography generation (heroes). The framework integrates automated generation of balanced risk-oriented scenarios, bilingual evaluation in English and Ukrainian, the LLM-as-a-Judge paradigm, and multidimensional robustness analysis through the Alignment Robustness Index (ARI). Six LoRA-adapted models based on Qwen2.5-3B-Instruct, SmolLM2-1.7B-Instruct, and TinyLlama-1.1B-Chat-v1.0 were evaluated. The implemented pipeline generated 2052 scenarios and 6156 model responses, producing a final bilingual analytical subset of 4104 judged records. Experimental results show that task-specific adaptation produces task-dependent robustness profiles. In the disinfo case, Qwen2.5-3B achieved the strongest overall performance, combining the highest safety score with the highest classification accuracy. In contrast, the heroes case revealed a more compressed and multidimensional vulnerability space without a single dominant model. The results further demonstrate the importance of multilingual evaluation, as weaker adapters exhibited more pronounced cross-lingual safety gaps. Overall, the framework provides a reproducible and practically applicable methodology for evaluating fine-tuned language models under imperfect instruction conditions.
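The abstract does not state how the cross-lingual safety gap is measured. As a rough illustration only, the following Python sketch assumes that each judged record carries a model identifier, a language tag (`en` or `uk`), and a binary safe/unsafe verdict from the judge, and it takes the gap to be the absolute difference between the per-language shares of responses judged safe. The `JudgedRecord` type, its fields, and the function names are hypothetical; the paper's actual metric may be defined differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgedRecord:
    # Hypothetical record layout; not taken from the paper.
    model: str     # adapter identifier
    language: str  # language tag, e.g. "en" or "uk"
    safe: bool     # judge verdict: True if the response was rated safe


def safety_rates(records):
    """Share of responses judged safe for each (model, language) pair."""
    counts = defaultdict(lambda: [0, 0])  # (model, language) -> [safe, total]
    for r in records:
        c = counts[(r.model, r.language)]
        c[0] += int(r.safe)
        c[1] += 1
    return {key: safe / total for key, (safe, total) in counts.items()}


def cross_lingual_gap(records, lang_a="en", lang_b="uk"):
    """Assumed gap: |safe rate in lang_a - safe rate in lang_b| per model."""
    rates = safety_rates(records)
    models = sorted({model for model, _ in rates})
    return {
        m: abs(rates[(m, lang_a)] - rates[(m, lang_b)])
        for m in models
        # Skip models that were not judged in both languages.
        if (m, lang_a) in rates and (m, lang_b) in rates
    }


if __name__ == "__main__":
    demo = [
        JudgedRecord("model-a", "en", True),
        JudgedRecord("model-a", "en", True),
        JudgedRecord("model-a", "uk", True),
        JudgedRecord("model-a", "uk", False),
        JudgedRecord("model-b", "en", True),
        JudgedRecord("model-b", "uk", True),
    ]
    print(cross_lingual_gap(demo))  # {'model-a': 0.5, 'model-b': 0.0}
```

Under this assumed definition, a larger value means the adapter's safety behaviour diverges more between English and Ukrainian prompts, which is the kind of pattern the abstract reports for weaker adapters.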

Keywords: large language models; scenario-based evaluation; multilingual robustness; LoRA adaptation; LLM-as-a-Judge; misinformation detection; trustworthy AI; hallucination; safety evaluation; Alignment Robustness Index (ARI)
