This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Scenario-Adaptive Evaluation of Trustworthy Fine-Tuned Text Models Across Knowledge-Grounded Generation and Misinformation Detection

by Khrystyna Lipianina-Honcharenko 1,*, Pavlo Bykovyy 1, Andriy Krysovatyy 2, Myroslav Komar 1 and Borys Yazlyuk 3
1 Department of Information Computer Systems and Control, West Ukrainian National University, 11 Lvivska Str., 46009 Ternopil, Ukraine
2 S. I. Yuriy Department of Finance, West Ukrainian National University, 11 Lvivska Str., 46009 Ternopil, Ukraine
3 Department of Economic Expertise and Land Management, West Ukrainian National University, 11 Lvivska Str., 46009 Ternopil, Ukraine
* Author to whom correspondence should be addressed.
Mach. Learn. Knowl. Extr. 2026, 8(6), 161; https://doi.org/10.3390/make8060161
Submission received: 7 May 2026 / Revised: 9 June 2026 / Accepted: 10 June 2026 / Published: 11 June 2026
(This article belongs to the Special Issue Trustworthy AI: Integrating Knowledge, Retrieval, and Reasoning)

Abstract

Large language models (LLMs) increasingly require robust evaluation under realistic instruction-following conditions, particularly for fine-tuned task-specific adapters operating in multilingual environments. This study proposes a scenario-adaptive evaluation framework for assessing the reliability of fine-tuned text models across two application regimes: misinformation detection (disinfo) and knowledge-grounded factual biography generation (heroes). The framework integrates automated generation of balanced risk-oriented scenarios, bilingual evaluation in English and Ukrainian, the LLM-as-a-Judge paradigm, and multidimensional robustness analysis through the Alignment Robustness Index (ARI). Six LoRA-adapted models based on Qwen2.5-3B-Instruct, SmolLM2-1.7B-Instruct, and TinyLlama-1.1B-Chat-v1.0 were evaluated. The implemented pipeline generated 2052 scenarios and 6156 model responses, producing a final bilingual analytical subset of 4104 judged records. Experimental results show that task-specific adaptation produces task-dependent robustness profiles. In the disinfo case, Qwen2.5-3B achieved the strongest overall performance, combining the highest safety and classification accuracy. In contrast, the heroes case revealed a more compressed and multidimensional vulnerability space without a single dominant model. The results further demonstrate the importance of multilingual evaluation, as weaker adapters exhibited more pronounced cross-lingual safety gaps. Overall, the framework provides a reproducible and practically applicable methodology for evaluating fine-tuned language models under imperfect instruction conditions.
Keywords: large language models; scenario-based evaluation; multilingual robustness; LoRA adaptation; LLM-as-a-Judge; misinformation detection; trustworthy AI; hallucination; safety evaluation; Alignment Robustness Index (ARI)