1. Introduction
In recent years, Artificial Intelligence (AI) has had a powerful influence on human lives. It has achieved remarkable results in many fields of technology and computing and now interacts with various branches of science. Among these, natural language processing (NLP) [1] stands out, referring to the ability of computers to understand and process human language, enabling seamless human–computer interaction. Numerous applications of NLP have emerged, especially chatbots [2], which serve as virtual assistants capable of understanding user needs and providing relevant information or responses based on their knowledge.
Most existing work on chatbot development has focused on the English language, while Arabic chatbots remain scarce due to the complexity and variability of the Arabic language [3]. Spoken Arabic differs from one country to another, giving rise to several dialects, each with its own characteristics. Additionally, the development of Arabic chatbots is hindered by a lack of resources, particularly the scarcity of annotated datasets needed to train NLP models. Although interest in Arabic dialects has grown in recent years, these efforts still face many challenges, and the results remain far from those achieved for English.
In this paper, we present a chatbot that supports Moroccan Arabic (Darija). This dialect, like others across the Arab world, remains one of the most effective means of communicating with citizens in their daily lives. We therefore propose a chatbot architecture that can be adapted to any Arabic dialect to provide domain-specific responses. Our system was tested on the Moroccan fiscal system, where users often need assistance with procedures such as making payments or finding documents related to tax regulations. To evaluate the chatbot’s performance, we used a dataset of 500 question–answer pairs.
The remainder of this paper is organized as follows:
Section 2 presents related work and existing Arabic chatbots.
Section 3 describes the methodology used to develop our chatbot, including the system architecture, data collection, and the integration of the Darija dialect.
Section 4 details the experimental setup, including data labeling, the AI models, and the results. Finally,
Section 5 concludes this paper.
2. Related Works
The growing demand for chatbot systems in fields such as healthcare, education, and public services has driven significant advances in natural language processing (NLP) and conversational AI. Large language models (LLMs) such as ChatGPT (GPT-4 Turbo), Gemini (Gemini 2.5), and LLaMA (LLaMA 2) now demonstrate strong performance across various domains and in multiple languages. Despite this, support for Arabic chatbots, especially for dialectal Arabic, is still limited. In this section, we describe studies, research, and applications related to our work.
2.1. Multilingual Language Models and Chatbots
Multilingual models like XLM-R [4] and mBERT [5] have enabled cross-lingual understanding. However, the majority of Arabic chatbots handle only Standard Arabic, a limitation that excludes a significant portion of native speakers who use regional dialects in daily conversation.
2.2. NLP Challenges for Moroccan Darija
Arabic dialects, including Moroccan Darija, present a unique linguistic challenge because they are informal, highly oral, and non-standardized. Recent research by Gaanoun et al. [6] presents DarijaBERT, a transformer-based language model optimized and fine-tuned on written Moroccan Arabic. In parallel, Shang et al. [3] proposed Atlas Chat, adapting LLM architectures for Darija processing and highlighting both the promise and the challenges of integrating dialect-specific capabilities into modern language models. This work offers a crucial basis for NLP applications in this under-resourced language.
2.3. Semantic Similarity Models for Dialogue Systems
To improve chatbot response accuracy, it is important to combine text generation with retrieval mechanisms. Sentence-BERT [7] and MiniLM [8] show strong performance in semantic similarity search, enabling efficient information retrieval. These approaches are often paired with vector databases and scalable vector search libraries such as FAISS [9] to enhance chatbot accuracy by matching user input to semantically similar responses stored in a predefined dataset.
2.4. Chatbots in Government and Public Sector Applications
In the public sector, conversational agents play a crucial role by reducing workload and improving service accessibility. Many countries, such as the USA, the UAE, and the Scandinavian countries, have deployed AI-powered chatbots to answer citizen inquiries. In Arab countries, however, and especially in North Africa, public-sector chatbots are limited to Standard Arabic and French, which does not reflect daily communication patterns. Our work addresses this gap with an approach that matches Moroccan-dialect queries at a similarity level comparable to that achieved for French, enhancing access for Moroccan citizens.
3. Methodology
In this section, we detail the methodology used for the development of the chatbot, which consists of five stages: collecting data from online sources within the context of the tax domain, structuring the data into a JSON file, creating equivalent data in the Moroccan dialect, designing and developing the chatbot algorithm, and testing the chatbot to ensure accuracy and relevance.
3.1. Dataset
To collect and prepare the data, we analyzed the official websites of Moroccan public entities operating in the fiscal domain. We extracted frequently asked questions (FAQs), citizen guides, and various documents to cover the most frequently asked questions related to public services.
The data collected was mainly in French. We gathered a total of 500 question–answer pairs, each associated with a specific class or intent. To structure the data, we used a JSON file (Figure 1) consisting of a list of intents. Each intent (or tag) is associated with several elements:
tag: A unique identifier representing the intent or category of the question.
patterns: A set of example user questions that represent different ways of asking the same thing. These are used to train the chatbot to recognize various formulations.
responses: The predefined response(s) that the chatbot returns when this intent is detected. In the French version, this field contains responses in French. In the Darija dataset, this field holds responses in the Moroccan dialect.
context_set: An optional context label that helps place the user’s question within a specific conversation flow or situation to manage the conversation more effectively.
Figure 1. Structure of a JSON intent element in the dataset.
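Based on the field descriptions above, a single intent element can be sketched as follows. The tag, patterns, and responses shown here are invented illustrations, not entries from the actual dataset:

```python
import json

# Illustrative intent element following the structure described above;
# the tag, patterns, and responses are invented examples, not entries
# from the actual fiscal-domain dataset.
intent = {
    "tag": "paiement_taxe",
    "patterns": [
        "Comment payer ma taxe ?",
        "Quelles sont les étapes pour payer la taxe ?",
    ],
    "responses": [
        "Vous pouvez payer votre taxe en ligne via le portail officiel.",
    ],
    "context_set": "",
}

# The dataset file holds a list of such intents.
dataset = {"intents": [intent]}
print(json.dumps(dataset, ensure_ascii=False, indent=2))
```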
To enhance the dataset, we supplemented each question with multiple rephrasings to improve the robustness of the model. These rephrasings simulate the variety of ways users may express the same intent (Table 1).
After structuring the French dataset, we created an equivalent dataset in the Moroccan dialect (Darija) (Figure 2), using the same structure and the same number of question–answer pairs. This Darija dataset was also formatted as a JSON file.
3.2. System Design
Our system supports multiple languages: French, English, Spanish, Standard Arabic, and Moroccan dialect (Darija) (Table 2). The design is based on a pivot architecture in which French serves as the main language for intent classification and response generation. The system relies on two similarity techniques to identify the most relevant intent and response: a static method based on lexical matching, and a semantic similarity model, specifically all-MiniLM-L6-v2 [10] (Table 3), used to compute deep semantic similarity between user queries and dataset patterns. Depending on the selected language, a specific processing flow combines these methods to return the correct response.
3.2.1. Similarity Computation Approaches
Static Similarity: This method compares the input query $q$ with each pattern $p$ using a lexical similarity measure based on the normalized Levenshtein distance:

$$\mathrm{sim}_{\text{static}}(q, p) = 1 - \frac{\mathrm{lev}(q, p)}{\max(|q|, |p|)}$$

Here, $\mathrm{lev}(q, p)$ represents the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform $q$ into $p$, and $|\cdot|$ denotes the length of a string. If the similarity score exceeds a threshold $\tau$, the system considers $p$ a close match and directly returns the associated response.
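A minimal sketch of this static check; the threshold value below is an illustrative assumption, since the paper does not specify $\tau$:

```python
def levenshtein(q: str, p: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn q into p (two-row DP)."""
    if len(q) < len(p):
        q, p = p, q
    prev = list(range(len(p) + 1))
    for i, cq in enumerate(q, start=1):
        curr = [i]
        for j, cp in enumerate(p, start=1):
            cost = 0 if cq == cp else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def static_similarity(q: str, p: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    if not q and not p:
        return 1.0
    return 1.0 - levenshtein(q, p) / max(len(q), len(p))

THRESHOLD = 0.9  # illustrative value for tau; not specified in the paper

print(static_similarity("comment payer la taxe", "comment payer la taxe ?"))
```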
Semantic Similarity: If no adequate static match is found, the system resorts to semantic similarity, which captures meaning beyond exact lexical matches. Both the user query and the patterns are embedded into vector representations using a sentence embedding model. The similarity between vectors $u$ and $v$ is then computed using cosine similarity:

$$\mathrm{sim}_{\text{cos}}(u, v) = \frac{u \cdot v}{\|u\| \, \|v\|}$$

where $u \cdot v$ is the dot product of the vectors and $\|\cdot\|$ denotes the vector norm. The pattern with the highest semantic similarity score determines the best-matching intent and response.
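The cosine computation itself is straightforward; the toy 3-dimensional vectors below stand in for real all-MiniLM-L6-v2 embeddings, which have 384 dimensions:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors: the dot
    product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy "embeddings"; real sentence embeddings come from the model.
u = [1.0, 0.0, 1.0]
v = [1.0, 1.0, 0.0]
print(cosine_similarity(u, v))  # 0.5 for these toy vectors
```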
3.2.2. French Input
When the user submits a question in French, the processing flow follows the structure illustrated in Figure 3. First, we check for an exact match between the user input and the dataset using static similarity, matching the input against stored patterns character by character. Although the probability of the user entering a question identical to one stored in the dataset is low, our system includes an intelligent suggestion mechanism (Figure 4). This module compares the current user input to stored questions and displays, in real time, a list of suggested queries. These suggestions are exact questions already present in the dataset, encouraging users to select or refine their query accordingly; in such cases, only static similarity is required. If no exact match is found, the system computes the semantic similarity between the input and all available intent patterns using the semantic model [10]. A ranked list of semantically similar questions from various classes is then presented to the user, who can confirm the intended question, after which the corresponding response is returned immediately.
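The suggestion mechanism can be sketched with Python's difflib; the stored questions, the similarity cutoff, and the limit of three suggestions below are illustrative assumptions, not the system's actual configuration:

```python
import difflib

# Stored dataset questions (illustrative examples, not actual entries).
stored_questions = [
    "Comment payer ma taxe d'habitation ?",
    "Comment obtenir une attestation fiscale ?",
    "Quels documents faut-il pour une réclamation ?",
]

def suggest(partial_input, n=3):
    """Return up to n stored questions closest to the user's current
    input, using difflib's ratio as a lightweight lexical similarity."""
    return difflib.get_close_matches(partial_input, stored_questions,
                                     n=n, cutoff=0.3)

print(suggest("comment payer la taxe"))
```

In the real system, this function would be called on every keystroke so suggestions update as the user types.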
3.2.3. Other Languages: English, Standard Arabic, and Spanish
For these standard languages, we use a translation-based strategy built on Google Translate (Figure 5). The user input is translated into French to retrieve the response as described in Section 3.2.2; the response is then translated back into the original language, again using Google Translate.
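A minimal sketch of this pivot flow; translate() and answer_in_french() are toy placeholders standing in for the Google Translate calls and the French matching pipeline of Section 3.2.2, and the lookup tables cover only the demo input:

```python
def translate(text, src, dst):
    # Placeholder: a real implementation would call Google Translate.
    # These toy tables only cover the demonstration strings below.
    forward = {("en", "fr", "How do I pay my tax?"): "Comment payer ma taxe ?"}
    backward = {("fr", "en", "Payez en ligne sur le portail officiel."):
                "Pay online on the official portal."}
    return forward.get((src, dst, text)) or backward.get((src, dst, text), text)

def answer_in_french(question_fr):
    # Placeholder for the French intent-matching pipeline.
    faq = {"Comment payer ma taxe ?": "Payez en ligne sur le portail officiel."}
    return faq.get(question_fr, "Intention non reconnue.")

def answer(user_input, lang):
    question_fr = translate(user_input, src=lang, dst="fr")  # pivot to French
    response_fr = answer_in_french(question_fr)              # match intent
    return translate(response_fr, src="fr", dst=lang)        # back-translate

print(answer("How do I pay my tax?", "en"))
```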
3.2.4. Moroccan Dialect (Darija) Handling
For the Darija dialect, we use the Gemini model, specifically Gemini-1.5-flash [11], to translate the input into French. After testing several LLMs for Darija-to-French translation, we found that Gemini provided the highest accuracy. Once the input is translated into French, it becomes easier to identify the corresponding pattern and retrieve the match from the aligned Darija dataset, as detailed in Algorithm 1.
Algorithm 1: Darija query handling algorithm with static and semantic matching.
1:  Start
2:  Input:
3:      q_d – user input in Moroccan Darija
4:      D_fr – French dataset (intents, patterns, responses)
5:      D_da – aligned Darija dataset (intents, patterns, responses)
6:      M_sem – semantic similarity model (all-MiniLM-L6-v2)
7:      M_tr – translation model from Darija to French (Gemini)
8:  Output: response in Darija
9:  Step 1: Static Matching in Darija
10:     for each intent i in D_da do
11:         for each pattern p in i do
12:             if StaticMatch(q_d, p) then    /* static match found */
13:                 retrieve the response r associated with p
14:                 Return r; End
15:             end if
16:         end for
17:     end for
18:     /* else: no static match found */
19: Step 2: Translate Input
20:     q_fr ← M_tr(q_d)
21: Step 3: Semantic Similarity Matching
22:     for each intent i in D_fr do
23:         for each pattern p in i do
24:             compute the similarity score s(q_fr, p) using M_sem
25:         end for
26:     end for
27:     select the intent i* with the highest average similarity
28: Step 4: Retrieve Response
29:     retrieve the response r from D_da using intent i*
30: Return r
31: End
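Algorithm 1 can be sketched in Python as follows. The matching, translation, and scoring components are injected as placeholders for the real static matcher, the Gemini translator, and the all-MiniLM-L6-v2 model, and the toy datasets are illustrative:

```python
def handle_darija_query(input_darija, darija_data, french_data,
                        static_match, translate_darija_to_french,
                        semantic_score):
    """Sketch of Algorithm 1: static matching on the Darija dataset,
    then Darija-to-French translation and semantic matching as a
    fallback. All callables are placeholders for real components."""
    # Step 1: static matching directly against Darija patterns.
    for intent in darija_data:
        for pattern in intent["patterns"]:
            if static_match(input_darija, pattern):
                return intent["responses"][0]

    # Step 2: no static match; translate to French (Gemini in the paper).
    input_fr = translate_darija_to_french(input_darija)

    # Step 3: semantic similarity against French patterns; keep the
    # intent whose patterns score highest on average.
    best_tag, best_avg = None, -1.0
    for intent in french_data:
        scores = [semantic_score(input_fr, p) for p in intent["patterns"]]
        avg = sum(scores) / len(scores)
        if avg > best_avg:
            best_tag, best_avg = intent["tag"], avg

    # Step 4: return the aligned Darija response for the best intent.
    for intent in darija_data:
        if intent["tag"] == best_tag:
            return intent["responses"][0]
    return None

# Toy data and stand-in components for demonstration.
french = [{"tag": "pay", "patterns": ["comment payer la taxe"],
           "responses": ["Payez en ligne."]}]
darija = [{"tag": "pay", "patterns": ["kifach nkhelles dariba"],
           "responses": ["Khelles online."]}]

resp = handle_darija_query(
    "kifach ndir bach nkhelles", darija, french,
    static_match=lambda q, p: q == p,                  # exact match only
    translate_darija_to_french=lambda q: "comment payer la taxe",
    semantic_score=lambda q, p: 1.0 if q == p else 0.0)
print(resp)
```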
3.3. Audio Input and Output Handling
To enhance the accessibility of our chatbot, we added audio input and output functionality for all supported languages.
Speech Recognition (Input): For French, English, Spanish, and Standard Arabic, we use WebKit Speech Recognition to transcribe speech to text and retrieve the answer, as described in Section 3.2.2 and Section 3.2.3. We handle Darija with the same API, but treat the speech-to-text output as Standard Arabic.
Speech Synthesis (Output) For spoken responses, we integrate ResponsiveVoice to ensure that the chatbot can deliver answers audibly and fluently.
Darija Audio Processing
For Darija TTS, we created a JSON file (Listing 1) with the same architecture as the Darija and French datasets, matching the number of responses and patterns, but storing the path to a recorded audio file instead of text. When the Darija answer is retrieved, its corresponding audio path is retrieved as well, so the response can be played back (Figure 6).
Listing 1. Darija Audio JSON Structure.
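Since this file mirrors the text datasets, an audio-mapping entry might be sketched as follows; the file path, tag, and lookup helper are assumptions for illustration and may differ from the paper's actual Listing 1:

```python
# Hypothetical audio-mapping intent: same tag/patterns layout as the
# text datasets, but responses hold paths to pre-recorded audio files.
# The path and tag are invented for illustration.
audio_intent = {
    "tag": "paiement_taxe",
    "patterns": ["kifach nkhelles dariba"],
    "responses": ["audio/darija/paiement_taxe_01.mp3"],
}

def audio_path_for(tag, audio_data):
    """Look up the recorded-audio path aligned with a Darija answer."""
    for intent in audio_data:
        if intent["tag"] == tag:
            return intent["responses"][0]
    return None

print(audio_path_for("paiement_taxe", [audio_intent]))
```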
4. Experiments and Evaluation
This section presents the evaluation methodology and performance metrics used to assess our multilingual chatbot system. We focus on two key aspects: response accuracy and computational efficiency.
4.1. Evaluation Metrics
We employ the following quantitative metrics to evaluate system performance.
4.1.1. Accuracy Metrics
Top-1 Accuracy: Measures the percentage of queries where the correct answer appears as the first suggestion returned by the system. This reflects the system’s precision in immediate response generation.
Top-3 Accuracy: Measures the percentage of queries where the correct answer appears among the top three suggestions. This indicates the system’s robustness when multiple candidate responses are considered.
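Both metrics can be computed directly from ranked predictions; the toy ranked lists and gold labels below are illustrative:

```python
def top_k_accuracy(predictions, gold, k):
    """Fraction of queries whose gold intent appears among the first k
    ranked predictions. predictions: one ranked intent list per query."""
    hits = sum(1 for ranked, g in zip(predictions, gold) if g in ranked[:k])
    return hits / len(gold)

# Toy example: 4 queries with ranked intent predictions.
ranked = [["pay", "docs"], ["docs", "pay"], ["pay", "claim"], ["claim", "pay"]]
gold = ["pay", "pay", "claim", "pay"]

print(top_k_accuracy(ranked, gold, k=1))  # 0.25: only the first query hits at rank 1
print(top_k_accuracy(ranked, gold, k=2))  # 1.0: every gold intent is in the top two
```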
4.1.2. Performance Metrics
Average Response Time: The mean duration (in seconds) required for:
- Speech-to-text conversion (when applicable);
- Language translation (for non-French queries);
- Semantic similarity computation;
- Response retrieval.
Standard Deviation of Response Time: Measures the variability in processing times across different queries. A low standard deviation indicates consistent performance regardless of input complexity.
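Both timing statistics follow directly from the per-query measurements; the times listed below are invented illustrative values, not measurements from our experiments:

```python
import statistics

# Hypothetical per-query response times in seconds (illustrative only).
times = [0.42, 0.51, 0.39, 0.47, 0.44]

mean_t = statistics.mean(times)
std_t = statistics.stdev(times)  # sample standard deviation

print(f"avg = {mean_t:.3f}s, std = {std_t:.3f}s")
```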
4.2. Experimental Setup
We conducted our experiments using a pre-trained sentence transformer model for semantic similarity, combined with a FAISS index to retrieve the most similar patterns. All code was written in Python 3.11, using the sentence-transformers, scikit-learn, and faiss libraries.
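The retrieval step can be illustrated with a pure-Python stand-in for the FAISS search; the toy 2-dimensional vectors below replace real all-MiniLM-L6-v2 embeddings (384 dimensions), and FAISS's IndexFlatIP performs the same exhaustive inner-product search at scale:

```python
import math

def normalize(v):
    """L2-normalize a vector so inner product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

# Toy patterns and embeddings; real embeddings come from the model.
patterns = ["comment payer la taxe", "obtenir une attestation",
            "faire une réclamation"]
embeddings = [normalize(v) for v in ([0.9, 0.1], [0.1, 0.9], [0.5, 0.5])]

def search(query_emb, k=3):
    """Exhaustive inner-product search over all pattern embeddings,
    as FAISS's IndexFlatIP does."""
    q = normalize(query_emb)
    scores = [sum(a * b for a, b in zip(q, e)) for e in embeddings]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(patterns[i], scores[i]) for i in ranked[:k]]

print(search([1.0, 0.0], k=3)[0][0])
```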
The test sets consist of manually curated utterances in French, Moroccan dialect (Darija), English, Spanish, and modern standard Arabic, each associated with a predefined intent tag. Some non-French inputs were translated into French using Google Translate or a custom Darija-to-French module before similarity computation. For each input, we measured whether the expected intent was returned in the Top-1 or Top-3 predictions based on semantic similarity. We also recorded the average response time and its standard deviation across the test set.
4.3. Results
Table 4 summarizes the performance of our chatbot across the five languages. The system shows the highest Top-1 and Top-3 accuracy in French, which is expected since it is the primary language used during training. Notably, despite the lack of standardized orthography and the linguistic complexity of Moroccan Darija, the system achieves a respectable Top-1 accuracy of 70% and a Top-3 accuracy of 90%. These results outperform those of other languages, such as Standard Arabic, demonstrating the effectiveness of our approach in low-resource, dialectal settings. Furthermore, the average response time remains within acceptable limits, confirming the efficiency of our lightweight method.
5. Conclusions
In this work, we presented a lightweight and efficient chatbot system designed specifically for the Darija dialect, leveraging a similarity-based retrieval approach combined with light translation techniques. Our experiments show that, despite using only around 500 question–answer pairs, the system achieves strong accuracy and low latency, outperforming many heavier multilingual models that require significantly more computational resources. This makes our solution particularly suitable for real-time deployment in resource-constrained environments, such as public service applications.
Future work will focus on expanding the dataset to cover more diverse intents and dialects, improving translation quality, and integrating the chatbot into practical platforms to evaluate user experience in real-world scenarios.
Author Contributions
Conceptualization, O.E., B.E.B., Y.B.M.; methodology, O.E., B.E.B.; software, O.E.; validation, O.E., B.E.B., Y.B.M.; formal analysis, O.E.; investigation, O.E.; resources, B.E.B.; data curation, O.E.; writing—original draft preparation, O.E.; writing—review and editing, O.E., B.E.B., Y.B.M.; visualization, O.E.; supervision, B.E.B., Y.B.M.; project administration, O.E. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding. The APC was funded by Harmony Technology.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The dataset supporting the results reported in this study is not publicly available due to privacy constraints, but can be provided upon reasonable request.
Conflicts of Interest
Oumaima Ennasri conducted this research as an intern at Harmony Technology under the supervision of Brahim El Bhiri, with academic guidance from Yann Ben Maissa at INPT. The research was carried out in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
References
- Kang, Y.; Cai, Z.; Tan, C.W.; Huang, Q.; Liu, H. Natural language processing (NLP) in management research: A literature review. J. Manag. Anal. 2020, 7, 139–172.
- Lalwani, T.; Bhalotia, S.; Pal, A.; Rathod, V.; Bisen, S. Implementation of a Chatbot System Using AI and NLP. Int. J. Innov. Res. Comput. Sci. Technol. (IJIRCST) 2018, 6, 26–30.
- Saoudi, Y.; Gammoudi, M.M. Trends and challenges of Arabic Chatbots: Literature review. Jordanian J. Comput. Inf. Technol. (JJCIT) 2023, 9, 1.
- Conneau, A.; Khandelwal, K.; Goyal, N.; Chaudhary, V.; Wenzek, G.; Guzmán, F.; Grave, E.; Ott, M.; Zettlemoyer, L.; Stoyanov, V. Unsupervised cross-lingual representation learning at scale. arXiv 2019, arXiv:1911.02116.
- Pires, T.; Schlinger, E.; Garrette, D. How multilingual is multilingual BERT? arXiv 2019, arXiv:1906.01502.
- Gaanoun, K.; Naira, A.M.; Allak, A.; Benelallam, I. DarijaBERT: A step forward in NLP for the written Moroccan dialect. Int. J. Data Sci. Anal. 2024, 20, 917–929.
- Reimers, N.; Gurevych, I. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv 2019, arXiv:1908.10084.
- Wang, W.; Wei, F.; Dong, L.; Bao, H.; Yang, N.; Zhou, M. MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers. Adv. Neural Inf. Process. Syst. 2020, 33, 5776–5788.
- Douze, M.; Guzhva, A.; Deng, C.; Johnson, J.; Szilvasy, G.; Mazaré, P.E.; Lomeli, M.; Hosseini, L.; Jégou, H. The Faiss library. arXiv 2024, arXiv:2401.08281.
- Hugging Face. sentence-transformers/all-MiniLM-L6-v2. April 2023. Available online: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 (accessed on 1 March 2025).
- Gemini Team; Georgiev, P.; Lei, V.I.; Burnell, R.; Bai, L.; Gulati, A.; Tanzer, G.; Vincent, D.; Pan, Z.; Wang, S.; et al. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv 2024, arXiv:2403.05530.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).