ADL-KG: Diacritic-Aware Knowledge Graph Prompting for Arabic LLM Question Answering

Ayat, Narimene; Harrag, Fouzi; Harrag, Nassir; Shaalan, Khaled

doi:10.3390/computation14060121

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

Open Access Article

¹ LRSD Laboratory, Farhat Abbas University of Setif 1, Setif 19000, Algeria

² Department of Informatics, The British University in Dubai, Dubai 345015, United Arab Emirates

* Author to whom correspondence should be addressed.

Computation 2026, 14(6), 121; https://doi.org/10.3390/computation14060121 (registering DOI)

Submission received: 6 March 2026 / Revised: 13 May 2026 / Accepted: 19 May 2026 / Published: 24 May 2026

(This article belongs to the Special Issue Recent Advances on Computational Linguistics and Natural Language Processing)

Abstract

Arabic’s complex morphological system and the optional use of short vowels (tashkīl) introduce substantial lexical ambiguity, posing significant challenges for Large Language Models (LLMs). While diacritics enhance linguistic precision, LLMs trained predominantly on undiacritized corpora often exhibit performance degradation when processing fully diacritized inputs due to representation shifts and tokenization inconsistencies. To address this limitation, we propose the Arabic Diacritic Lexical Knowledge Graph (ADL-KG), a structured framework that links diacritized and undiacritized forms through integrated lexical, morphological, and semantic knowledge. Building upon this resource, we introduce Diacritic-Aware Knowledge Graph Prompting (DA-KGP), a prompt augmentation strategy that injects explicit linguistic features into LLM inputs to facilitate robust interpretation of diacritized Arabic text. The framework is evaluated on the Arabic Reading Comprehension Dataset under zero-shot and few-shot question answering across AraGPT2-base, BLOOMZ-560M, SILMA-v1, and LLaMA 3.1-8B. Performance is assessed using Exact Match, BLEU, ROUGE-1, and BERTScore-F1. Experimental results show that fully diacritized prompts significantly degrade baseline performance, whereas DA-KGP consistently mitigates this effect by improving semantic alignment across diverse architectures. For AraGPT2-base, KG augmentation improves average BERTScore-F1 by +5.96 points. SILMA-v1 achieves the strongest lexical improvements, reaching 21.57 BLEU and 81.31% BERTScore-F1 in the KG-enhanced two-shot configuration. LLaMA 3.1-8B achieves the highest overall semantic performance with 82.54% BERTScore-F1 under KG-enhanced prompting, while BLOOMZ-560M also demonstrates statistically significant semantic gains through structured augmentation. 
These findings demonstrate that morphologically informed prompting and structured lexical grounding provide an effective and parameter-efficient strategy for improving the robustness and semantic fidelity of Arabic LLMs under fully diacritized input conditions.
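
The core idea of linking diacritized and undiacritized forms and injecting lexical features into a prompt can be illustrated with a minimal sketch. This is not the authors' implementation: the two-entry lexicon `TOY_KG`, the function names `strip_diacritics` and `augment_prompt`, and the prompt layout are all hypothetical, chosen only to show how two distinct diacritized words collapse to the same undiacritized string and how explicit features might disambiguate them.

```python
# Toy sketch only: the names and the two-entry lexicon are hypothetical
# and are NOT the ADL-KG resource or the DA-KGP prompt format.

# Core Arabic short-vowel marks (U+064B..U+0652) plus superscript alef (U+0670).
TASHKEEL = {chr(c) for c in range(0x064B, 0x0653)} | {"\u0670"}


def strip_diacritics(text):
    """Return the text with the diacritic marks listed in TASHKEEL removed."""
    return "".join(ch for ch in text if ch not in TASHKEEL)


# Two words that share the same undiacritized spelling (k-t-b).
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kataba, "he wrote" (verb)
KUTUB = "\u0643\u064F\u062A\u064F\u0628"         # kutub, "books" (noun)

# Hypothetical lexicon keyed by diacritized form.
TOY_KG = {
    KATABA: {"pos": "verb", "gloss": "he wrote"},
    KUTUB: {"pos": "noun", "gloss": "books"},
}


def augment_prompt(question, kg):
    """Prepend one feature line per token that has an entry in the lexicon."""
    notes = []
    for token in question.split():
        entry = kg.get(token)
        if entry:
            notes.append(
                f"- {token} (undiacritized: {strip_diacritics(token)}): "
                f"{entry['pos']}, '{entry['gloss']}'"
            )
    if not notes:
        # No known tokens: leave the question unchanged.
        return question
    return "Linguistic notes:\n" + "\n".join(notes) + "\n\nQuestion: " + question


if __name__ == "__main__":
    print(strip_diacritics(KATABA) == strip_diacritics(KUTUB))
    print(augment_prompt(KUTUB, TOY_KG))
```

In this sketch the lookup is an exact match on whitespace-separated tokens; a real system would presumably need morphological analysis to handle clitics and inflected forms.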

Keywords: Arabic Natural Language Processing (ANLP); Large Language Models (LLMs); knowledge graph prompting; Arabic diacritization (Tashkeel); question answering; morphological analysis
