This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Open Access Article
ADL-KG: Diacritic-Aware Knowledge Graph Prompting for Arabic LLM Question Answering
by Narimene Ayat 1, Fouzi Harrag 1,*, Nassir Harrag 1 and Khaled Shaalan 2
1 LRSD Laboratory, Farhat Abbas University of Setif 1, Setif 19000, Algeria
2 Department of Informatics, The British University in Dubai, Dubai 345015, United Arab Emirates
* Author to whom correspondence should be addressed.
Computation 2026, 14(6), 121; https://doi.org/10.3390/computation14060121 (registering DOI)
Submission received: 6 March 2026 / Revised: 13 May 2026 / Accepted: 19 May 2026 / Published: 24 May 2026
Abstract
Arabic’s complex morphological system and the optional use of short vowels (tashkīl) introduce substantial lexical ambiguity, posing significant challenges for Large Language Models (LLMs). While diacritics enhance linguistic precision, LLMs trained predominantly on undiacritized corpora often exhibit performance degradation when processing fully diacritized inputs due to representation shifts and tokenization inconsistencies. To address this limitation, we propose the Arabic Diacritic Lexical Knowledge Graph (ADL-KG), a structured framework that links diacritized and undiacritized forms through integrated lexical, morphological, and semantic knowledge. Building upon this resource, we introduce Diacritic-Aware Knowledge Graph Prompting (DA-KGP), a prompt augmentation strategy that injects explicit linguistic features into LLM inputs to facilitate robust interpretation of diacritized Arabic text. The framework is evaluated on the Arabic Reading Comprehension Dataset under zero-shot and few-shot question answering across AraGPT2-base, BLOOMZ-560M, SILMA-v1, and LLaMA 3.1-8B. Performance is assessed using Exact Match, BLEU, ROUGE-1, and BERTScore-F1. Experimental results show that fully diacritized prompts significantly degrade baseline performance, whereas DA-KGP consistently mitigates this effect by improving semantic alignment across diverse architectures. For AraGPT2-base, KG augmentation improves average BERTScore-F1 by +5.96 points. SILMA-v1 achieves the strongest lexical improvements, reaching 21.57 BLEU and 81.31% BERTScore-F1 in the KG-enhanced two-shot configuration. LLaMA 3.1-8B achieves the highest overall semantic performance with 82.54% BERTScore-F1 under KG-enhanced prompting, while BLOOMZ-560M also demonstrates statistically significant semantic gains through structured augmentation. 
These findings demonstrate that morphologically informed prompting and structured lexical grounding provide an effective and parameter-efficient strategy for improving the robustness and semantic fidelity of Arabic LLMs under fully diacritized input conditions.
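To make the idea of linking diacritized and undiacritized forms concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical toy graph (`TOY_KG`) keyed by the undiacritized form, a hypothetical helper `augment_prompt`, and an illustrative prompt layout; the actual ADL-KG schema and DA-KGP template are not given in the abstract. Only the Unicode range for tashkīl marks (U+064B to U+0652, plus U+0670) is standard.

```python
# Tashkil marks: fathatan..sukun (U+064B..U+0652) plus superscript alef (U+0670).
TASHKIL = {chr(c) for c in range(0x064B, 0x0653)} | {"\u0670"}

def strip_diacritics(text: str) -> str:
    """Remove tashkil marks, leaving the base letters untouched."""
    return "".join(ch for ch in text if ch not in TASHKIL)

# Escapes are used so the exact code points are unambiguous.
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kaf-fatha, ta-fatha, ba-fatha: "he wrote"
KUTUB = "\u0643\u064F\u062A\u064F\u0628"         # kaf-damma, ta-damma, ba: "books"

# Hypothetical toy graph (assumption): undiacritized form -> diacritized variants
# with illustrative lexical features.
TOY_KG = {
    strip_diacritics(KATABA): {
        KATABA: {"pos": "verb", "gloss": "he wrote"},
        KUTUB: {"pos": "noun", "gloss": "books"},
    }
}

def augment_prompt(question: str, kg=TOY_KG) -> str:
    """Prepend KG features for each diacritized token found in the toy graph."""
    notes = []
    for token in question.split():
        bare = strip_diacritics(token)
        entry = kg.get(bare, {}).get(token)
        if entry:
            notes.append(f"{token} -> {bare} ({entry['pos']}: {entry['gloss']})")
    if not notes:
        return question  # nothing to add; leave the prompt unchanged
    return "Lexical hints:\n" + "\n".join(notes) + "\n\nQuestion: " + question

if __name__ == "__main__":
    print(augment_prompt(KATABA))
```

In this sketch the undiacritized key groups competing readings of the same consonant skeleton, and the diacritized token selects one of them, so the model receives both the surface form it was mostly trained on and an explicit disambiguating hint.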