Review

Toward Transparent Modeling: A Scoping Review of Explainability for Arabic Sentiment Analysis

1 Information Systems Department, King Abdul Aziz University, Jeddah 21589, Saudi Arabia
2 Computer Science Department, College of Science, Northern Border University, Arar 91431, Saudi Arabia
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(19), 10659; https://doi.org/10.3390/app151910659
Submission received: 22 August 2025 / Revised: 24 September 2025 / Accepted: 29 September 2025 / Published: 2 October 2025

Abstract

The increasing prevalence of Arabic text in digital media offers significant potential for sentiment analysis. However, challenges such as linguistic complexity and limited resources make Arabic sentiment analysis (ASA) particularly difficult. In addition, explainable artificial intelligence (XAI) has become crucial for improving the transparency and trustworthiness of artificial intelligence (AI) models. This paper addresses the integration of XAI techniques in ASA through a scoping review of recent developments. This study critically identifies trends in model usage, examines explainability methods, and explores how these techniques enhance the explainability of model decisions. This review is crucial for consolidating fragmented efforts, identifying key methodological trends, and guiding future research in this emerging area. Online databases (IEEE Xplore, ACM Digital Library, Scopus, Web of Science, ScienceDirect, and Google Scholar) were searched to identify papers published between 1 January 2016 and 31 March 2025. The last search across all databases was conducted on 1 April 2025. From these, 19 peer-reviewed journal articles and conference papers focusing on ASA with explicit use of XAI techniques were selected for inclusion. This time frame was chosen to capture the most recent decade of research, reflecting advances in deep learning, transformer-based models, and explainable AI methods. The findings indicate that transformer-based models and deep learning approaches dominate ASA, achieving high accuracy, and that local interpretable model-agnostic explanations (LIME) is the most widely used explainability tool. However, challenges such as dialectal variation, small or imbalanced datasets, and the black-box nature of advanced models persist. To address these challenges, future research directions should include the creation of richer Arabic sentiment datasets, the development of hybrid explainability models, and the enhancement of adversarial robustness.

1. Introduction

Artificial intelligence (AI) techniques aim to simulate aspects of human cognition in machines, enabling them to learn, reason, and make decisions. Early AI systems relied on rule-based programming. However, the emergence of machine learning (ML) methods in the late 20th century marked a significant shift, allowing systems to improve their performance on the basis of data rather than explicit programming. Over time, deep learning (DL) technology has further revolutionized the AI field by enabling models to extract complex patterns from vast datasets, leading to advancements in computer vision, speech recognition, and natural language processing (NLP). One of the most impactful breakthroughs in the field of AI has been the development of large language models (LLMs). LLMs are deep learning-based architectures trained on massive text corpora [1]. These architectures support human-like text generation, context-aware understanding, and NLP task execution. Transformer-based models have significantly improved the accuracy and efficiency of sentiment analysis and other NLP applications. These include pretrained transformer models such as bidirectional encoder representations from transformers (BERT) and their Arabic counterparts like AraBERT and MARBERT, which are primarily designed for language understanding tasks. More recently, LLMs, such as GPT, have extended these capabilities by enabling both language understanding and text generation.
Sentiment analysis (also known as opinion mining) is a fundamental task in NLP that involves identifying and classifying sentiment polarity (positive, negative, or neutral) expressed in text [2]. With the exponential growth of Arabic content on social media, e-commerce, and other platforms, Arabic sentiment analysis (ASA) has gained significant importance for businesses and policy-makers to gauge public opinion and customer feedback. However, ASA remains challenging due to the complexity of the Arabic language, which is characterized by rich morphology, diverse dialects, and context-dependent semantics. In recent years, DL methods, especially those based on neural networks and transformer-based LLMs, have greatly improved sentiment classification performance by automatically learning feature representations from data. Notably, the development of Arabic pretrained language models such as AraBERT and MARBERT has contributed to narrowing the gap between Arabic and high-resource languages in terms of modeling accuracy.
Although modern DL and LLMs are accurate, they are often perceived as black boxes with unclear reasoning and are hard to interpret. This lack of transparency can reduce user trust and impede the adoption of AI systems, especially in sensitive or critical domains. In response to these concerns, the field of explainable artificial intelligence (XAI) has emerged as a critical area of research aimed at enhancing the explainability and transparency of AI model decisions. The formal recognition and momentum of XAI research were significantly catalyzed by a seminal 2016 presentation by David Gunning [3]. The aim of XAI is to develop techniques that clarify the internal mechanisms and decision-making processes, offering insights into why a model behaves in a certain way or produces a specific output [4].
In light of these developments, this review explores the intersection of performance and transparency in Arabic sentiment analysis. The aim is to systematically examine how XAI techniques are applied to Arabic models and evaluate the extent to which these methods improve explainability without compromising accuracy. This review highlights two research domains that have largely evolved in parallel: the advancement of Arabic language modeling and the growing need for responsible XAI. This review is guided by two main objectives:
  • To examine the AI methods and XAI techniques that have been applied to enhance explainability in ASA models.
  • To identify the key challenges in this area and explore future research directions aimed at improving both the explainability and overall performance of ASA systems.
By surveying the current literature, our goal is to provide a scoping overview of the state of the art in explainable ASA, highlight common trends and gaps, and suggest avenues for future research. To the best of our knowledge, this review represents an early attempt to systematically synthesize research on explainable sentiment analysis in the Arabic language. Our findings aim to accelerate the development of transparent and effective sentiment analysis tools for Arabic, bridging the gap between advanced AI technologies and the needs of Arabic-speaking users.
ASA has evolved from early lexicon-based methods to sophisticated deep learning and transformer models, leading to notable increases in accuracy across diverse domains. While these advancements help bring the performance of ASA closer to that of high-resource languages, they often reduce explainability, making AI decisions unclear and difficult to trust. To address this issue, the integration of XAI into ASA has become essential to enhance transparency and foster trust in these systems. The following points summarize the key stages in the development of ASA techniques:
  • Foundations and Early Developments in Arabic Sentiment Analysis:
Early efforts in ASA relied on lexicon-based and rule-driven approaches [5]. These included adapting SentiWordNet to Modern Standard Arabic (MSA) and constructing small polarity lexicons for specific dialects [5]. While transparent, these methods suffered from sparse coverage and morphological complexity. They also showed weak adaptability across different dialects and text styles. The release of annotated corpora such as LABR (63K book reviews) marked a shift toward data-driven analysis [6]. However, data-driven approaches continued to face challenges with generalizability, diacritic omission, and code-switching.
  • Deep Learning Breakthroughs and the Rise of Transformer Models:
The emergence of neural architectures has facilitated the learning of distributed text representations directly from data, significantly improving performance. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) capture subword morphology and contextual patterns more effectively. Nonetheless, these models still struggle with dialectal variation. The adoption of transformer-based models pretrained on extensive Arabic corpora, such as AraBERT [7], and specialized variants such as FinBERTAR has led to substantial gains, often exceeding 90% accuracy on public benchmarks. While enhancing performance, modern Arabic sentiment models have introduced opacity, making their predictions harder to interpret and raising concerns about bias, fairness, and compliance.
  • Enhancing Performance Through Preprocessing Methods, Hybrid Models, and Domain Adaptation Techniques:
To address performance gaps, researchers have explored various strategies, including advanced preprocessing methods, feature engineering approaches, and hybrid deep learning architectures. These enhancements have led to success in diverse domains [8,9,10,11,12]. Moreover, pretrained models such as AraBERT, MARBERT, and QARIB have been instrumental in narrowing the performance gap between Arabic and high-resource languages. When fine-tuned for specific tasks, such as aspect-based sentiment analysis or dialect-specific classification, these models delivered robust results across domains [13,14,15].
Recent work [16] suggests that robustness should be assessed not only by accuracy or F1 scores, but also through the stability of explanations under small, meaning-preserving perturbations, such as synonym substitutions or character-level noise. This can be approached by generating local explanations (e.g., local interpretable model-agnostic explanations (LIME)/Shapley additive explanations (SHAP)) before and after minor perturbations and then measuring how consistent the salient features remain using metrics such as rank correlation and overlap scores. Low stability would indicate fragile decision signals that rely on spurious lexical cues, while stable explanations reflect semantically grounded features. Future studies should report both performance and explanation-stability, and they should also use adversarial tests to better evaluate model reliability in Arabic settings.
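To make this stability protocol concrete, the following minimal Python sketch (not taken from the reviewed studies) compares LIME explanations before and after a small perturbation using top-k overlap and Spearman rank correlation; the prediction function and the perturbed text are hypothetical placeholders.

```python
# Minimal sketch: explanation stability under a meaning-preserving perturbation.
# `predict_proba` is any callable mapping a list of texts to class probabilities
# (e.g., an sklearn pipeline's predict_proba); it is a placeholder here.
import numpy as np
from scipy.stats import spearmanr
from lime.lime_text import LimeTextExplainer

def lime_weights(text, predict_proba, num_features=10):
    explainer = LimeTextExplainer(class_names=["negative", "positive"])
    exp = explainer.explain_instance(text, predict_proba, num_features=num_features)
    return dict(exp.as_list())  # token -> signed importance weight

def explanation_stability(text, perturbed_text, predict_proba, k=5):
    w_orig = lime_weights(text, predict_proba)
    w_pert = lime_weights(perturbed_text, predict_proba)
    # Overlap@k: share of common tokens among the k most influential features
    top = lambda w: set(sorted(w, key=lambda t: abs(w[t]), reverse=True)[:k])
    overlap = len(top(w_orig) & top(w_pert)) / k
    # Spearman rank correlation over tokens explained in both versions
    common = [t for t in w_orig if t in w_pert]
    rho = (spearmanr([abs(w_orig[t]) for t in common],
                     [abs(w_pert[t]) for t in common]).correlation
           if len(common) > 2 else float("nan"))
    return overlap, rho
```

Low overlap or low correlation under a synonym substitution would flag reliance on spurious lexical cues, matching the criterion described above.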
This lack of transparency hinders trust and responsible AI use, highlighting the urgent need for XAI methods. This review addresses critical gaps in the literature and sets the stage for future work focused on interpretable and accountable AI systems in Arabic-speaking contexts.

2. Materials and Methods

Scoping reviews are designed to map the breadth and depth of literature on a given topic, identifying key concepts, theoretical frameworks, research gaps, and the range of available evidence [17]. This approach is particularly well-suited for examining complex, emerging, or under-researched fields where methodologies and findings are heterogeneous. Owing to the uncertainty regarding the volume and diversity of relevant studies, a scoping review methodology was chosen. This study explores the intersection of ASA and XAI. In this section, we briefly introduce the scoping review method employed in this study. This scoping review was conducted in accordance with the PRISMA-ScR guidelines (Tricco et al., 2018) to ensure methodological rigor and transparency [18]. This scoping review was retrospectively registered on the Open Science Framework (OSF) on 14 September 2025 (https://doi.org/10.17605/OSF.IO/RYDMP), covering the review protocol, the PRISMA flow diagram, and the PRISMA-ScR checklist. The extraction form and the PRISMA-ScR checklist have also been added to the Supplementary Materials.
Data selection and extraction were initially conducted by the researcher, and all records were subsequently verified through discussions with the two co-authors. Then, titles and abstracts were screened by the primary researcher, with full-text eligibility assessments conducted for the remaining records. Any uncertainties were resolved in discussion with the co-authors to ensure consistency and minimize errors across all included studies. A formal critical appraisal of individual sources was not performed, consistent with the nature of a scoping review. Instead, inclusion was limited to peer-reviewed journal articles and conference papers to ensure reliability.
The studies included in this review focus specifically on the application of explainability techniques within the domain of Arabic sentiment analysis. Only papers published in peer-reviewed journals or presented at reputable conferences were considered to ensure academic rigor and credibility; preprints and other non-peer-reviewed platforms were excluded. Furthermore, the review is limited to studies conducted between 1 January 2016 and 31 March 2025 to capture the most recent advancements in the field. The last search across all databases was conducted on 1 April 2025.
Studies were excluded if they focused on ASA without incorporating any explainability aspects or if they addressed explainability in sentiment analysis without involving the Arabic language. Additionally, papers published on non-peer-reviewed or preprint platforms were not considered. Preprints refer to manuscripts in repositories prior to peer review, while non-peer-reviewed platforms include non-academically reviewed sources. These were excluded to maintain the highest standards of scholarly rigor and reliability. Studies conducted outside the 1 January 2016 to 31 March 2025 timeframe were also excluded.
A broad search strategy was employed across multiple academic databases, including IEEE Xplore, ACM Digital Library, Scopus, Web of Science, ScienceDirect, and Google Scholar. The primary search query was formulated as follows: (“Arabic sentiment analysis” OR “Arabic opinion mining”) AND (“Explainability” OR “XAI” OR “Interpretability” OR “Explainable AI”). The selection of studies was guided by the predefined inclusion and exclusion criteria to ensure the relevance and quality of the reviewed literature.
The initial search identified 203 candidate articles. These articles were filtered through title/abstract screening and then full-text screening on the basis of predefined inclusion criteria: studies had to involve sentiment analysis on Arabic text and incorporate some form of model explainability technique (either through post hoc XAI methods or inherently interpretable models). Studies that did not focus on Arabic data, did not include any explainability component, or were not primary research (e.g., pure surveys without experiments) were excluded. After removing duplicates and ineligible papers, 19 studies met all the criteria and were included in this review. The process flow diagram is shown in Figure 1.
For each included study, the following details were extracted:
  • The datasets and domain used for sentiment analysis (e.g., tweets, reviews, and specific topics).
  • The AI methodology or model employed (e.g., traditional ML classifier, deep neural network, transformer-based model, and hybrid ensemble).
  • The explainability techniques applied (e.g., LIME, SHAP, attention visualization, and rule-based explanations).
  • Key performance results (e.g., accuracy and F1 score).
  • Advantages or limitations concerning explainability or model performance.
These attributes are tabulated to facilitate cross-study comparisons. Figure 2 shows the distribution of selected studies by publication year, highlighting the growing research interest in explainable ASA over time.
The bar chart shows the publication trend of explainable ASA studies between 2016 and 2025. No relevant publications were identified from 2016 to 2019, reflecting the nascency of both explainability research and the focused application of sentiment analysis in Arabic at that time. A single study appeared in 2020, signaling the early stages of interest, and none were identified in 2021. However, starting in 2022, the field began to gain momentum, with two studies published in 2022, three in 2023, and a significant surge to 12 in 2024. A single study is already recorded for 2025, indicating ongoing activity. This sharp increase after 2022 highlights a rising awareness of the importance of transparency, trust, and explainability in Arabic NLP applications. The absence of studies in earlier years emphasizes how explainability only recently became a mainstream concern within ASA, propelled by broader trends in AI ethics and model accountability.

3. Results

The findings of the scoping review reveal the breadth and diversity of artificial intelligence techniques employed in ASA, spanning from conventional machine learning methods to cutting-edge deep learning and transformer-based architectures specifically adapted for the Arabic language. A key focus of this review is the growing integration of XAI techniques, particularly focusing on tools such as LIME, SHAP, attention mechanisms, and rule-based explainability strategies. This section presents a structured synthesis of the studies analyzed, highlighting emerging trends, dominant methodologies, and the extent to which explainability has been incorporated into ASA systems. It also highlights notable gaps, most significantly, the limited number of works that address both ASA and XAI jointly. This synthesis provides a clearer understanding of the current research landscape, setting the stage for identifying future opportunities and challenges in building more interpretable, robust, and linguistically aware sentiment analysis systems for Arabic text.

3.1. Dataset Diversity and Domain Coverage

Before delving into AI methodologies, it is essential to acknowledge the diversity of datasets employed in ASA research. This breadth is critical for enhancing model robustness, domain adaptability, and generalizability. By drawing on corpora from health, finance, hospitality, literature, media, and cybersecurity, researchers better capture the real-world complexity of sentiment expression in Arabic.
In the environmental and financial sphere, Saxena et al. put forward a specialized dataset designed to capture sentiment signals related to sustainability and their influence on Environmental, Social, and Governance (ESG) metrics [15]. Within healthcare, the Arabic Health Services (AHS) corpus, compiled by Awadallah and co-authors, integrates both MSA and regional dialects [19]. Sweidan’s group contributed the ASAVACT dataset, which reflects public opinion on vaccines, whereas Hossain and colleagues examined Arabic tweets on asthma, identifying themes such as treatment recommendations and personal experiences [20,21]. Similarly, LASIK surgery feedback has been analyzed by Abdelwahab et al., who drew on tweets in Egyptian and Saudi dialects in addition to MSA [22].
To address the spread of misinformation and security threats, Ibrahim and Umer made use of the Arabic Fake News Detection (AFND) dataset to classify fabricated news and assess media credibility [23]. In a related effort, the ArabFake dataset, developed by Shehata and collaborators, was sourced from AraNews and annotated by both topical category (e.g., politics, economics, satire) and societal risk, such as anxiety induction or misinformation potential [24].
Cross-domain studies also shed light on model generalization. Elbasiony’s team integrated three distinct datasets: COVID-19 vaccine tweets (health), the Arabic 100k Reviews dataset (general reviews), and SS2030 (national reform sentiment), the latter reflecting public opinion on reforms such as women driving and cinema reopenings in Saudi Arabia [25]. In the hospitality and literary fields, Berrimi et al. and Atabuzzaman et al. employed resources including the Hotel Arabic Reviews Dataset (HARD), the Hotel Review (HTL) dataset, and literary corpora such as LABR and BRAD [26,27].
Further work has examined sentiment in public opinion, social media discourse, and online safety. Aftan et al. utilized the Saudi Twitter Dataset (STD) to analyze societal issues, while Hossain’s research group investigated Arabic news articles from Assabah and Hespress across categories like politics, sports, and economics [13,28]. For broader sentiment mining, Rahab et al. provided the Opinion Corpus for Arabic (OCA), a foundational benchmark dataset [29].
Efforts to counter harmful online behavior are also notable. Azzeh and collaborators developed the Arabic Cyberbullying Tweets Corpus (ArCybC) to identify cyberbullying patterns [30], whereas Alhumoud addressed religiously sensitive hate speech linked to the Ashura event [31]. Most notably, Abdelaty and Lazem worked with the OSACT5 dataset, a large-scale, manually annotated collection of Arabic tweets containing offensive, hateful, and violent content, gathered using an emoji-based, language-independent strategy [16]. Their study revealed nuanced linguistic patterns, deep cultural markers, and interpretability challenges including sarcasm and contextual ambiguity. Table 1 presents an overview of the datasets used in explainable ASA and their source references.
Collectively, these datasets represent a comprehensive spectrum across health, hospitality, cybersecurity, public policy, finance, education, and social media, offering a rich foundation for developing and evaluating interpretable, domain-aware Arabic sentiment analysis systems.

3.2. AI Methods for ASA on Explainability

The reviewed studies span a diverse array of artificial intelligence approaches, ranging from traditional ML models to more sophisticated DL models, LLMs, and hybrid architectures. The primary focus of this section is on classification performance and model effectiveness in sentiment detection, without delving into explainability techniques, which are detailed separately in the next section.
Traditional ML models have demonstrated competitive performance in Arabic sentiment analysis. For example, Awadallah et al. employed random forest (RF) and decision tree (DT) classifiers across various social media topics, with RF outperforming DT using 10-fold cross-validation [19]. Similarly, Elbasiony et al. compared support vector machines (SVMs) and logistic regression (LR) on COVID-19-related Arabic text and reported that both models are highly accurate, outperforming the DT and RF classifiers across several datasets [25]. Rahab et al. introduced a rule-based classification method using a binary equilibrium optimization algorithm, offering interpretable results and robust accuracy across Arabic documents [29].
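As an illustration of this classical setup, the sketch below builds a TF-IDF pipeline and evaluates logistic regression and random forest classifiers with 10-fold cross-validation; it is an assumption-laden example, not code from the cited studies, and the dataset variables are placeholders.

```python
# Minimal sketch of a classical ASA baseline: TF-IDF features with LR and RF classifiers.
# `texts` is a list of Arabic strings and `labels` their sentiment classes (placeholders).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

def evaluate_classical_baselines(texts, labels, folds=10):
    results = {}
    for name, clf in [("LogisticRegression", LogisticRegression(max_iter=1000)),
                      ("RandomForest", RandomForestClassifier(n_estimators=200))]:
        pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2), clf)
        scores = cross_val_score(pipe, texts, labels, cv=folds, scoring="f1_macro")
        results[name] = scores.mean()  # macro-F1 averaged over folds
    return results
```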
Deep learning approaches have become prominent because of their ability to capture complex linguistic structures. Sweidan et al. extended BiLSTM with XLNet embeddings and a multi self-attention mechanism, significantly enhancing performance on Arabic tweets related to COVID-19 vaccines [20]. Abdelwahab et al. applied an attention-based LSTM, which achieved high performance in handling multidialect Arabic [22]. Ibrahim and Umer demonstrated the efficacy of a CNN-LSTM model using Embeddings from Language Models (ELMo), which outperformed other embedding techniques, such as GloVe and BERT, in capturing contextual sentiment signals in Arabic [23]. Although primarily developed for misinformation analysis, this model demonstrated strong performance in capturing sentiment signals across multi-label tasks, validating the potential of multitask DL-based approaches in Arabic sentiment classification. Berrimi et al. integrated BiGRU and BiLSTM networks trained with FastText and learnable embeddings, achieving superior results on benchmark datasets [26]. Atabuzzaman et al. employed BiLSTM and CNN-BiLSTM models with a noise-injection layer to mitigate overfitting, increasing classification accuracy [27]. Azzeh et al. introduced a BiLSTM-CNN architecture tailored for cyberbullying detection that was effective in learning contextual patterns specific to Arabic discourse [30]. Alhumoud proposed CNN-BiGRU-Focus, which combines convolutional layers and bidirectional recurrent units to capture both local patterns and sequential dependencies [31]. Almani and Tang applied a deep attention-based ANN that effectively captured salient sentiment cues without relying on external resources and achieved excellent accuracy on review datasets [47].
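For readers unfamiliar with these architectures, the following Keras sketch shows a representative BiLSTM sentiment classifier; the layer sizes and the binary output are illustrative assumptions rather than a reproduction of any reviewed model.

```python
# Minimal BiLSTM sentiment classifier sketch; inputs are padded integer token sequences.
from tensorflow.keras import layers, models

def build_bilstm(vocab_size=50000, seq_len=100, embed_dim=128):
    model = models.Sequential([
        layers.Input(shape=(seq_len,)),
        layers.Embedding(vocab_size, embed_dim),   # learnable embeddings (pretrained FastText vectors could be loaded instead)
        layers.Bidirectional(layers.LSTM(64)),      # bidirectional context over the sequence
        layers.Dropout(0.3),                        # regularization against overfitting on small corpora
        layers.Dense(1, activation="sigmoid"),      # binary polarity; use softmax for three-way sentiment
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```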
Several studies have adopted transformer-based large language models for sentiment classification. Aftan et al. fine-tuned AraBERT for Saudi dialect sentiment analysis, leveraging minimal labeled data and generative augmentation to improve performance [13]. AZZEM et al. evaluated AraBERT and AraGPT across sentiment and semantic similarity tasks and reported strong outcomes [14]. Saxena et al. used FinBERT for nested sentiment analysis in eco-product reviews, achieving high precision [15]. Abdelaty and Lazem examined the vulnerability of transformer-based classifiers to adversarial perturbations, highlighting a 30% success rate in label flipping with minimal word edits, thus revealing robustness gaps despite high baseline performance [16].
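A typical fine-tuning workflow for such transformer models is sketched below with the Hugging Face transformers library, using an Arabic checkpoint and a sequence-classification head; the checkpoint name, dataset objects, and hyperparameters are assumptions for illustration, not details taken from the reviewed papers.

```python
# Minimal fine-tuning sketch for an Arabic pretrained transformer on sentiment classification.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

checkpoint = "aubmindlab/bert-base-arabertv2"   # assumed AraBERT checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

def tokenize(batch):
    # Truncate/pad Arabic texts to a fixed length for batching
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

args = TrainingArguments(output_dir="asa-arabert", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)

# Assuming `train_ds` and `eval_ds` are Hugging Face `datasets.Dataset` objects
# with "text" and "label" columns:
# train_ds, eval_ds = train_ds.map(tokenize, batched=True), eval_ds.map(tokenize, batched=True)
# Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds).train()
```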
Hybrid models that integrate elements of traditional ML, DL, and transformer models have also shown promising results. Hossain et al. proposed the attention-based transformer model (ABTM), which combines deep learning and traditional preprocessing pipelines to improve classification reliability [28]. Also, Hossain et al. introduced TransNet, a hybrid architecture that integrates GRU, LSTM, and transformer blocks and significantly outperformed baselines in classification [21]. Shehata et al. proposed ArabFake, a deep learning-based multitask framework built on the MARBERTv2 transformer model. This framework is designed for fake news detection, content categorization, and risk assessment in Arabic text [24]. Aljrees proposed a tri-ensemble model that integrates ML and DL techniques with contextual embeddings for robust sentiment and fake news classification [48]. Although primarily developed for misinformation analysis, this model demonstrated strong performance in capturing sentiment signals across multi-label tasks, validating the potential of multitask LLM-based approaches in Arabic sentiment classification. Table 2 summarizes the main approaches, datasets, and ASA results of each study.
To further clarify the strongest outcomes, we emphasize the top-performing models across the reviewed studies. The CNN-BiGRU-Focus model [31] achieved the highest reported accuracy (99.89%) on Arabic hate speech tweets, while the tri-ensemble stacking model [48] consistently achieved 99% across all measures (Accuracy, F1, Recall, Precision) on the AFND news dataset. Likewise, the fine-tuned AraBERT with GPT-augmented Saudi tweets [13] produced robust results, with F1, Recall, and Precision all reaching 98% on a large-scale Twitter corpus. In the financial domain, the FinBERT model [15] achieved 99% accuracy and 98% AUC on ESG reviews, confirming the adaptability of BERT-based architectures to specialized domains. Similarly, the ABTM [28] reported strong results on Moroccan news articles (Accuracy = 97.69%), underscoring the effectiveness of attention-based transformer models for Arabic text classification. Collectively, these findings demonstrate that while the highest accuracy values are often achieved by deep learning or hybrid models on large datasets, large language models also deliver competitive performance when adapted to domain-specific tasks.
Figure 3 illustrates the distribution of primary AI approaches employed across the reviewed ASA studies. Most studies utilize deep learning architectures such as RNNs and CNNs or leverage transformer-based LLMs. A smaller portion relies solely on traditional machine learning methods, while several adopt hybrid strategies that integrate ML, DL, and/or LLM components. This distribution is further depicted in the following figure.
These results reveal a clear trend: transformer-based models and deep neural networks have largely replaced earlier approaches in Arabic sentiment analysis, mirroring global NLP trends. Studies that combine multiple model types (e.g., using ensembles or multiphase training) often do so to leverage complementary strengths, such as the robustness of ML ensembles and the contextual learning of DL/LLMs. Rahab et al. presented a notable exception by applying an inherently interpretable rule-based approach, achieving slightly lower accuracy (84%) than that of black-box models but offering full transparency [29]. This finding highlights the well-known accuracy–explainability trade-off often observed in AI.
In addressing objective 1, the review reveals that ASA has progressed from using traditional classifiers to predominantly using advanced deep learning and transformer architectures, with some innovative hybrid solutions. The best performing models in the recent literature are typically transformer-based (especially when fine-tuned with large Arabic corpora), although simpler models can still be effective in certain contexts or when coupled with ensemble strategies. An examination of how researchers have incorporated explainability into these various models follows.
Overall, the results from these studies demonstrate the strong performance of modern AI models in Arabic sentiment analysis. Classical ML algorithms maintain their relevance for simpler tasks, whereas deep and hybrid models offer increased robustness for more complex or noisy data environments. LLMs such as AraBERT and FinBERT stand out for their pre-trained language understanding, especially when fine-tuned based on domain-specific or dialect-rich datasets. This wide methodological landscape sets the stage for evaluating how these models explain their predictions, which will be the focus of the next section.

3.3. Explainability Techniques in ASA

A central focus of this review is how the authors of the reviewed ASA studies have approached the task of making their model outputs explainable or interpretable. A wide array of XAI techniques have been applied, ranging from post hoc explanation methods (which treat the model as a black box and explain its outputs) to intrinsically interpretable model design. XAI methods have been increasingly integrated into ASA research to address black box concerns and improve stakeholder trust.
This review identifies several widely used XAI methods. Gradient-based techniques, such as saliency maps, input × gradient (InputGrad), and integrated gradients (IGs), highlight influential input features by analyzing how small input changes affect outputs. Feature importance methods, such as coefficient-based and Gini importance (Coef./Gini) methods, are commonly used in linear and tree-based models to rank features. Perturbation-based approaches, including LIME, SHAP, and Shapley value sampling (SHAP_VS), offer model-agnostic interpretations by estimating the impact of feature perturbations on predictions.
Together, these methods reflect a growing commitment in ASA research to balance predictive power with explainability, ensuring that AI systems remain both accurate and accountable.
Figure 4 highlights the predominant reliance on LIME for explainability in Arabic sentiment analysis. While LIME provides local explainability, its susceptibility to perturbation inconsistencies suggests a need for complementary techniques such as SHAP. The adoption of attention mechanisms underscores the importance of intrinsic explainability within transformer-based models. The presence of hybrid XAI approaches signals a shift toward more robust, multifaceted explainability solutions. LIME is preferred over SHAP due to its simplicity, computational efficiency, and adaptability to small or imbalanced datasets common in Arabic NLP [14,19,20]. Conversely, SHAP, despite its stronger theoretical foundations and consistency in feature attribution, imposes significantly higher computational demands, emphasizing the ongoing challenge of balancing transparency with predictive performance.
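The following sketch illustrates the typical LIME workflow reported in these studies, assuming a fitted classifier that exposes predict_proba (for example, the classical TF-IDF pipeline sketched earlier); the example text, pipeline name, and class names are placeholders.

```python
# Minimal LIME usage sketch: explain one Arabic prediction as a list of weighted tokens.
from lime.lime_text import LimeTextExplainer

def explain_prediction(text, predict_proba, class_names=("negative", "positive"), num_features=8):
    explainer = LimeTextExplainer(class_names=list(class_names))
    exp = explainer.explain_instance(text, predict_proba, num_features=num_features)
    # Each pair is (token, signed weight); the sign indicates the direction of influence
    return exp.as_list()

# Example (placeholder Arabic text and a fitted pipeline `pipe`):
# explain_prediction("الخدمة كانت ممتازة والتوصيل سريع", pipe.predict_proba)
```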
While LIME remains a widely adopted method, its explanations are vulnerable to perturbative instability and continue to raise questions about their faithfulness to the underlying model. Moreover, attention weights should not be directly interpreted as explanations, as they do not necessarily reflect causal reasoning. Recent studies have therefore emphasized the need to evaluate the stability and trustworthiness of interpretability methods to ensure their reliability in practice.
These findings have important implications for Arabic NLP tool developers, suggesting that integrating lightweight explainability methods like LIME can enhance model transparency without imposing prohibitive computational costs. For production systems, the scalability of XAI methods should be carefully considered, as more computationally intensive techniques like SHAP may limit deployment in real-time applications. For industry practitioners, adopting hybrid explainability strategies can improve trust and facilitate informed decision-making when deploying sentiment analysis models in domains such as social media monitoring, customer feedback analysis, and digital marketing.
In the health and medical domain, XAI techniques have been instrumental in revealing sentiment patterns associated with medical discourse and public health communication. Abdelwahab et al. applied LIME to analyze LASIK surgery-related tweets, successfully identifying symptom-specific language that validated the model’s sentiment predictions [22]. Similarly, Hossain et al. leveraged LIME to uncover emotional expressions and key health-related terms within asthma-related tweets, shedding light on the linguistic cues driving the classification outcomes [21]. Extending this approach, Sweidan et al. adopted a more comprehensive strategy by combining LIME with SHAP and attention mechanisms to interpret COVID-19-related sentiment in tweets, producing multidimensional explanations that enriched the understanding of how models process public health narratives [20]. Additionally, Elbasiony et al. addressed bias detection and stop-word refinement in health-related social media sentiment by combining local (e.g., LIME) and global (e.g., logistic coefficients and Gini importance) feature importance techniques [25]. This hybrid approach contributed to auditing model behavior and refining linguistic features, promoting more equitable and interpretable outcomes in Arabic health sentiment analysis.
In the domain of cybersecurity, particularly for tasks involving fake news detection and misinformation, explainability plays a pivotal role in building trust and ensuring the credibility of AI systems. Ibrahim and Umer employed LIME to highlight modality verbs and reference sources that influence fake versus real news classification, aligning model outputs with known linguistic indicators [23]. Shehata et al. enhanced model explainability by integrating LIME with expert rationales and valence scoring, focusing on emotionally charged content as a key signal in misinformation detection [24]. Aljrees applied a tri-ensemble approach within a cybersecurity context for Arabic fake news detection, using explanation tools to validate model coherence and highlight markers of credibility or deception [48]. In a related effort, Abdelaty and Lazem, also within the cybersecurity domain, leveraged LIME not only for explainability but also for adversarial vulnerability analysis, identifying the most impactful words that models used to detect deception [16]. Collectively, these studies underscore the critical role of explainability in safeguarding the reliability and accountability of fake news detection systems.
In the context of social media, the informal, dialect-rich nature of Arabic content has led to a wide variety of XAI implementations aimed at improving model transparency and explainability. Awadallah et al. utilized LIME to extract influential n-grams associated with sentiment polarity, validating model outputs against known colloquial expressions and surfacing ambiguous cases [19]. Similarly, Hossain et al. applied LIME and attention mechanisms to improve Arabic news classification on social media, helping explain misclassifications between domains such as politics and economics while maintaining strong performance in clearer categories such as sports [28]. Alhumoud introduced a CNN-BiGRU-Focus model with attention layers that highlighted key words and emojis in hate speech detection, enhancing explainability in socially sensitive tasks [31]. Azzeh et al. advanced this approach by visualizing multihead attention weights to reveal abusive language cues for cyberbullying detection, offering insight into harmful discourse [30]. Rahab et al. developed a rule-based sentiment analysis system that delivers fully transparent, human-readable decision logic, trading some predictive flexibility for explainability [29]. These studies collectively demonstrate the critical role of XAI in making sentiment predictions over social platforms more comprehensible and trustworthy, particularly in linguistically complex and dynamic environments.
In the linguistics domain, explainability has been pivotal in clarifying how ASA models interpret nuanced language phenomena, such as dialectal variation and morphological complexity. Atabuzzaman et al. introduced a noise-injection layer with LIME, which not only improved the generalizability of their model but also enhanced its explainability by emphasizing stable and meaningful sentiment features [27]. Aftan et al. improved the transparency of transformer-based ASA systems by using word-level heatmaps via LIME to highlight the model’s dependence on Saudi dialectal phrases and emojis, offering interpretable cues that aligned with human expectations and increased trust [13]. AZZEM et al. conducted a comprehensive benchmarking of XAI methods (saliency maps, InputGrad, IG, LIME, SHAP_VS, and random baselines) for the AraBERT and AraGPT2 models [14]. Their findings indicated that gradient-based techniques offered higher fidelity to model behavior, whereas perturbation-based methods such as SHAP_VS provided more intuitively understandable explanations.
Moreover, Berrimi et al. explored additive attention mechanisms in a hybrid BiGRU-BiLSTM model, revealing which sentence components most strongly influenced sentiment predictions and thereby shedding light on the inner decision-making process of complex neural architectures [26]. Collectively, these studies underscore the importance of linguistic sensitivity in designing interpretable ASA systems and validate the role of XAI approaches in aligning model behavior with linguistic reasoning.
In the business and sustainability domains, XAI has been applied to promote transparency and trust in decision-making processes related to economic and environmental outcomes. Almani and Tang focused on explaining the outputs of deep learning models in business sentiment analysis by visualizing attention weights [47]. Their approach highlighted the key sentiment-driving words within financial texts, allowing stakeholders to better understand how the model arrived at specific conclusions—effectively opening the black box of deep learning. Similarly, Saxena et al. applied XAI techniques in the context of ESG investment analysis [15]. By integrating explainability into sentiment classification, their model enables investors and analysts to trace sentiment outcomes back to textual indicators of ESG performance, supporting more transparent and informed decision-making in sustainable finance. Table 3 provides an overview of the explainability approaches applied in Arabic sentiment analysis, highlighting the contribution of each to model interpretability.
In addressing objective 1, the findings revealed that explainability in ASA has been approached through both post hoc techniques and built-in model features. Post hoc approaches, especially LIME and SHAP, are widely used and are effective at explaining individual predictions in terms of word importance. In addition, these methods are easy to apply and understand, which likely explains their widespread adoption. However, these local explanations can sometimes be unstable (LIME results may vary with perturbations) and need to be complemented with global insights.
Intrinsic explainability via attention or transparent models offers another approach. When these approaches maintain high performance (as with attention-based RNNs), they offer a convenient solution. However, not all high-performing models are interpretable by design (e.g., ensemble or complex transformer approaches require integration with XAI tools).
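As a concrete illustration of attention-based inspection, the sketch below extracts last-layer attention from a BERT-style Arabic model via the transformers library; the checkpoint name and input sentence are illustrative assumptions, and, as noted earlier, attention weights are only a rough signal rather than a faithful explanation.

```python
# Minimal sketch: inspecting [CLS] attention over input tokens in a BERT-style Arabic model.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "aubmindlab/bert-base-arabertv2"   # assumed checkpoint name
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2,
                                                           output_attentions=True)

inputs = tok("الخدمة كانت ممتازة", return_tensors="pt")   # placeholder Arabic sentence
with torch.no_grad():
    out = model(**inputs)

last_layer = out.attentions[-1]                      # shape: (batch, heads, seq_len, seq_len)
cls_attention = last_layer[0, :, 0, :].mean(dim=0)   # average over heads, attention paid by [CLS]
for token, weight in zip(tok.convert_ids_to_tokens(inputs["input_ids"][0]), cls_attention.tolist()):
    print(f"{token}\t{weight:.3f}")
```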
A few innovative strategies, such as adversarial analysis for explainability, can also be included in the development process (e.g., simplifying the models or applying targeted perturbations to reveal internal decision patterns). These findings are promising for future work on balancing accuracy and explainability.
Our review also indicates that multiple studies emphasize the need to combine explainability methods to obtain a more complete understanding of model behavior. For example, using both attention visualization and LIME provided complementary perspectives in some cases (one highlighting model internals and the other providing an external approximation of feature importance). The use of human annotations or domain knowledge (as in ArabFake) to validate model explanations is another best practice that has emerged.
In conclusion, while accuracy is crucial, explainability is becoming a standard component of ASA research. Next, the broader implications of these results, the challenges identified, and future directions proposed across the literature are discussed.

4. Discussion

Owing to its profound linguistic characteristics, the Arabic language presents unique challenges in terms of computational explainability, particularly in tasks requiring nuanced understanding, such as sentiment analysis. These complexities become even more pronounced when they intersect with the demand for model transparency through explainability. In this discussion, a clearer understanding of the current research landscape is provided, establishing a foundation for identifying future opportunities and challenges in developing more interpretable, robust, and linguistically aware sentiment analysis systems for Arabic text. The integration of ASA with XAI has clear industrial relevance. For example, in the healthcare sector, hospitals and clinics analyze Arabic patient feedback to evaluate satisfaction with medical services, where XAI explanations help identify the aspects driving positive or negative sentiment, guiding targeted improvements in patient experience. In the banking sector, financial institutions monitor Arabic customer feedback on social media, with XAI explanations clarifying why a comment is classified as negative (“transfer issues”) or positive (“excellent service”), thereby supporting proactive customer care and building trust. This section concludes by outlining future research pathways to address current gaps and advance the boundaries of explainability in Arabic NLP. This review highlights the technical, linguistic, and methodological dimensions essential for developing more transparent and effective ASA models.

4.1. Trends and Insights

The synthesis of methods and XAI techniques from the reviewed studies highlights several notable trends in explainable Arabic sentiment analysis. First, there has been a clear shift toward the use of advanced language models (transformers) in ASA, which mirrors their success in other languages. This shift has been driven by the significant performance gains that these models offer, i.e., many studies have reported state-of-the-art accuracies in Arabic sentiment tasks when models such as AraBERT or MARBERT are fine-tuned with relevant data.
This approach is promising, although it requires careful tuning and still somewhat sacrifices the full power of complex models (since the student model may not capture all nuances of the teacher model). The combination of multiple XAI methods in a single study suggests that researchers are aware of each method’s limitations. For example, attention weights are sometimes criticized as explanations because they can be manipulated without changing model outputs (thus, they do not always faithfully indicate importance). Moreover, LIME can suffer from instability. By using both methods, as [20] did, one can cross-verify the output explanations: if both attention and LIME highlight the same word in a tweet as important, one gains more confidence that the model truly uses that word to make its decision. This multifaceted explainability is a good practice that has emerged from the literature. A significant insight from these studies is how explainability can reveal dataset or model biases. Several studies listed in Table 3 reported the use of XAI approaches to diagnose misclassifications or limitations. Hossain et al. discovered via LIME that errors in news categorization occurred when articles shared vocabulary (e.g., politics vs. economics), indicating a limitation in the model’s ability to discern context; thus, they suggested a need for more distinctive features or targeted training data [28]. Also, reference [16] used an XAI approach to show that a single word change could lead an offensive content classifier to flip its output, indicating potential vulnerability in the model.
These examples not only expose a weakness but also implicitly hint at bias: models can be over-reliant on certain keywords, to the extent that removing them degrades performance.

4.2. Challenges in ASA Explainability

Despite progress, several challenges remain in explainable ASA. A recurring theme is the linguistic diversity of the Arabic language. Many studies acknowledge that their models do not cover all Arabic dialects or forms. For example, a model trained on Egyptian tweets might not generalize to Gulf Arabic or to formal news text. This issue is compounded when explaining models: an explanation is only as good as the model’s understanding. If the model lacks knowledge of a particular dialectal expression, it might misclassify it, and any explanation reflects that misclassification (possibly confusing users). Addressing this challenge requires richer datasets. As noted in multiple future work suggestions, the lack of large, diverse, and well-annotated Arabic sentiment datasets is a bottleneck. More comprehensive datasets would not only improve model performance but also allow for more reliable explanations.
In addition to data limitations, the issue of incomparability across studies is critical. Many of the included studies report only accuracy without specifying macro-F1 or confidence intervals, limiting interpretability and reducing comparability. There is also a risk of inflated performance in cases of unbalanced classes, where high accuracy may simply reflect the dominance of majority classes. Furthermore, the specificity of tweet-based datasets, which are often short, noisy, and context-dependent, adds another layer of caution when comparing them with more structured review or news corpora. These limitations reinforce the need for standardized evaluation practices in future research on XAI in Arabic sentiment analysis.
Another challenge is the trade-off between model complexity and explainability. Within Arabic sentiment analysis, rule-based methods typically achieve accuracies of around 84%, offering high interpretability since their outcomes are directly tied to predefined linguistic rules and lexicons [29]. Yet, this simplicity constrains their ability to account for the linguistic diversity and contextual nuances of Arabic, resulting in comparatively lower predictive performance. Conversely, black-box models, particularly transformer-based architectures, often exceed 90% accuracy by capturing complex and context-dependent patterns [28]. However, this gain in predictive power comes at the expense of transparency.
These explainability challenges observed in ASA are closely aligned with broader trends in the XAI/NLP landscape [49]. Across languages, black-box models such as transformers consistently achieve high accuracy but remain difficult to interpret, highlighting the universal accuracy–explainability trade-off. Some studies have attempted to make black-box models more transparent (e.g., using SHAP with BERT), but no perfect solution exists to date. The field will likely continue exploring hybrid models that maintain high accuracy while being more transparent. For example, one could imagine a system that uses a transformer to obtain an initial prediction and then uses an explainable model to adjust or validate that prediction, providing a human-understandable rule if possible. Explanation evaluation is also challenging. In most cases, the utility of explanations was discussed qualitatively, suggesting a need for better metrics and frameworks to assess whether an explanation in ASA is “good.” Human user studies can be useful, e.g., showing a user model outputs with explanations and assessing whether it improves their trust or their ability to correct the model. Evidence of extensive user studies in the reviewed papers was not found, likely because of their focus on technical contributions. However, for broader adoption, understanding how end-users perceive these explanations (e.g., Are they satisfying? Do they expose biases?) will be crucial.
Additionally, several domain-specific challenges have emerged. In a hate speech detection scenario [31], even with explainability, authors noted difficulty in interpreting emojis and sarcasm. An attention heatmap might highlight an emoji, but the model might misinterpret its meaning: a laughing emoji could mean genuine amusement or derision. Therefore, explainability does not automatically equate to understanding, especially when the model itself struggles with a nuanced phenomenon. Future studies could incorporate external knowledge (e.g., an emoji sentiment lexicon) into the explainability pipeline to better handle such cases. Another challenge is adversarial robustness and its interplay with explainability.
Abdelaty and Lazem [16] show that models can be fragile, and current explainability methods might inadvertently reveal these fragilities (which is a double-edged sword: good for improving the model but risky if malicious actors exploit them). Future ASA systems may need to integrate adversarial training so that their explanations remain valid under slight input perturbations. Otherwise, an explanation might highlight a specific word (e.g., “word X was important”), but if an attacker replaces that word with a synonym and the model changes its output, the explanation no longer holds. Some researchers have suggested that truly robust explanations should highlight more fundamental features (such as concepts) rather than specific tokens.

4.3. Limitations

This review includes 19 studies, which provide only a snapshot of a rapidly evolving field. It is also important to note the possibility of publication bias, as studies with positive results are more likely to be published than those with negative or non-significant findings. This scoping review acknowledges several limitations, which can be categorized into review-level and field-level limitations.
While this review provides a comprehensive synthesis of XAI in ASA, several limitations must be acknowledged at the review level. First, the pool of 19 studies, although comprehensive in the context of XAI-focused ASA research up to early 2025, may not capture some relevant studies that addressed Arabic sentiment without explicitly mentioning XAI, or very recent publications. As the field is rapidly evolving, new models (e.g., Arabic GPT-style generative models) and new XAI methods (e.g., those leveraging inner transformer representations) are continuously emerging. Second, the synthesis is based on reported measures, which may not be directly comparable due to variations in datasets and evaluation protocols. Accordingly, rather than identifying a single best model, our focus was on overarching trends.
In addition, several limitations exist at the field level. The evaluation of explainability remains inherently subjective, often relying on authors’ interpretations and limited quantitative evidence. In addition to these qualitative assessments, future research should incorporate quantitative measures such as faithfulness, stability, completeness, and human-centered metrics (e.g., task performance or time to decision) [16,50]. Combining qualitative insights with quantitative evaluation frameworks would allow for a more rigorous and balanced assessment of explanation quality, particularly in the context of Arabic sentiment analysis. Furthermore, dataset diversity and dialectal coverage continue to constrain generalizability. A complementary line of work that emphasizes the direct measurement of explanation quality could yield different insights.
Despite these limitations, the key messages remain as follows: XAI has become a fundamental aspect of advanced ASA research, and the community is actively addressing the dual challenges of achieving high accuracy while ensuring strong explainability. Continued interdisciplinary collaboration across NLP, machine learning, and human–computer interaction will be essential to achieve transparent and reliable sentiment analysis systems.

4.4. Future Directions

Considering the above challenges, the reviewed literature collectively suggests several future research directions for explainable ASA.
  • Development of richer datasets and benchmarks:
Several studies [15,22,23,28] emphasize the urgent need for large-scale, diverse Arabic sentiment datasets. These should span various dialects and genres, such as social media, news, and movie reviews, and include multilabel sentiment or emotion annotations. To support transparency, dataset documentation should highlight potential biases and challenging linguistic phenomena. Including human rationale annotations could further aid in evaluating explanation quality.
Collaborative efforts, like shared tasks or competitions (e.g., an Explainable Arabic Sentiment Analysis challenge), could drive this initiative by evaluating models on both performance and the quality of their explanations.
  • Integration of hybrid XAI frameworks:
Recent studies [23,28] suggest combining various explainability methods, such as attention mechanisms, feature attribution, and example-based reasoning, within a single model. For example, a hybrid XAI model might highlight important words through attention mechanisms, retrieve similar cases from the training data using case-based reasoning, and generate human-readable rules that approximate the model’s decision logic.
Such multi-faceted explanations could satisfy a broader audience, from developers to end-users. Advanced XAI methods from computer vision, such as Concept Activation Vectors (CAVs), may be adapted for text to extract semantically meaningful concepts tied to sentiment.
  • Addressing the accuracy–explainability trade-off
Future research could focus on embedding explainability constraints into training objectives without severely compromising accuracy. One approach involves regularizing attention weights for sharper focus [26]. Another direction is designing inherently interpretable architectures—e.g., networks that first extract a set of sentiment-related factors (topics/aspects) and base predictions on these interpretable features [47].
  • Explainability beyond individual predictions:
Rahab et al. and others highlight the need for corpus-level explanations that identify systematic model behaviors [29]. For instance, a finding such as “the model often misclassifies inputs containing word X when word Y is present” can reveal model weaknesses and moves the field toward explainable evaluation for ASA debugging. While some studies identified such issues manually (e.g., misclassifications between closely related categories or dialects), future work could leverage automated techniques such as association rule mining or dataset-wide SHAP value aggregation to detect these patterns; a simple aggregation sketch is given after this list.
  • User-centric evaluation of ASA systems:
Finally, a user-centered approach is essential to assess how well XAI explanations serve real-world needs [16]. End-users, including analysts and native speakers, should evaluate explanation clarity, usefulness, and actionability. For example, can users revise content based on an explanation highlighting a sentiment-driving word? Closing the feedback loop may improve explanation delivery through natural language outputs or visualizations, as demonstrated in references [24,50].
  • Broader adoption in real-world applications:
Insights from this scoping review also point to a further direction: expanding the adoption of explainable sentiment analysis systems in Arabic-speaking contexts. Such systems could be highly valuable for businesses and government institutions where understanding public sentiment is essential, but they must prioritize transparency to gain user trust and support widespread deployment. Developing culturally and linguistically sensitive explainability mechanisms can play a vital role in fostering acceptance and usability across real-world scenarios.
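To illustrate the first direction (richer datasets and benchmarks), the snippet below sketches what a single rationale-annotated Arabic sentiment record could look like. The field names and values are hypothetical and are not drawn from any published schema.

```python
# Hypothetical structure for a rationale-annotated Arabic sentiment record;
# all field names and values are illustrative only.
record = {
    "text": "الفندق ممتاز بس الأسعار غالية مرة",  # "The hotel is great, but the prices are very high"
    "dialect": "Gulf",                              # supports dialectal-coverage audits
    "genre": "review",
    "labels": ["positive", "negative"],             # multilabel sentiment annotation
    "rationales": [                                 # human-marked spans justifying each label
        {"span": "ممتاز", "label": "positive"},
        {"span": "غالية مرة", "label": "negative"},
    ],
    "annotation_notes": "no sarcasm; no code-switching",
}
print(record["labels"], [r["span"] for r in record["rationales"]])
```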
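For the second direction (hybrid XAI frameworks), the following is a minimal sketch of a hybrid explanation that pairs LIME word attributions with example-based evidence retrieved from the training set. It assumes a TF-IDF plus logistic regression pipeline as a stand-in classifier, toy Arabic training texts, and the scikit-learn and lime packages; none of these choices are taken from the reviewed studies.

```python
# Hybrid explanation sketch: feature attribution (LIME) + example-based
# reasoning (nearest training cases) for a single Arabic input.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import NearestNeighbors
from lime.lime_text import LimeTextExplainer

train_texts = ["الخدمة ممتازة", "التجربة سيئة جدا", "المنتج رائع", "التوصيل بطيء ومزعج"]
train_labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_labels)

# Example-based component: index the training texts for neighbour retrieval.
vectorizer = clf.named_steps["tfidfvectorizer"]
nn = NearestNeighbors(n_neighbors=2).fit(vectorizer.transform(train_texts))

def hybrid_explain(text):
    # Feature-attribution component: local word weights from LIME.
    explainer = LimeTextExplainer(class_names=["negative", "positive"])
    exp = explainer.explain_instance(text, clf.predict_proba, num_features=3)
    # Example-based component: the closest training cases to the input.
    _, idx = nn.kneighbors(vectorizer.transform([text]))
    neighbours = [train_texts[i] for i in idx[0]]
    return exp.as_list(), neighbours

weights, cases = hybrid_explain("الخدمة بطيئة جدا")
print("word attributions:", weights)
print("similar training cases:", cases)
```

When the underlying model exposes attention weights, an attention-based view could be added as a third component in the same function.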
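For the accuracy–explainability trade-off, the sketch below shows one way to embed an explainability constraint directly in the training objective: an entropy penalty on attention weights that encourages sharper, more interpretable focus, in the spirit of the regularization discussed in [26]. The tiny attention-pooling classifier, dummy batch, and hyperparameter values are illustrative assumptions; PyTorch is required.

```python
# Attention-sharpening training objective: task loss + entropy penalty on
# the attention distribution (lower entropy = more peaked, easier to read).
import torch
import torch.nn as nn

class AttnPoolClassifier(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=64, num_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.attn = nn.Linear(emb_dim, 1)          # one score per token
        self.out = nn.Linear(emb_dim, num_classes)

    def forward(self, token_ids):
        h = self.emb(token_ids)                    # (batch, seq, emb)
        scores = self.attn(h).squeeze(-1)          # (batch, seq)
        alpha = torch.softmax(scores, dim=-1)      # attention distribution
        pooled = (alpha.unsqueeze(-1) * h).sum(1)  # attention-weighted average
        return self.out(pooled), alpha

model = AttnPoolClassifier()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
lambda_sharp = 0.05                                # strength of the sharpness penalty

token_ids = torch.randint(0, 5000, (8, 20))        # dummy batch: 8 texts, 20 tokens
labels = torch.randint(0, 2, (8,))

optimizer.zero_grad()
logits, alpha = model(token_ids)
task_loss = criterion(logits, labels)
# Entropy of the attention weights; minimizing it encourages peaked attention.
attn_entropy = -(alpha * torch.log(alpha + 1e-9)).sum(-1).mean()
loss = task_loss + lambda_sharp * attn_entropy
loss.backward()
optimizer.step()
```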
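Finally, for corpus-level explanation, the snippet below sketches how per-instance attributions (from LIME or SHAP) might be aggregated over misclassified examples to surface tokens systematically associated with errors, such as dialectal negation particles. The attribution values and examples are hypothetical placeholders for outputs an XAI method would produce.

```python
# Corpus-level aggregation sketch: sum absolute attributions over
# misclassified examples to expose tokens repeatedly implicated in errors.
from collections import defaultdict

# (token attributions, gold label, predicted label) per explained example
explained_examples = [
    ({"ما": 0.4, "عجبني": 0.3, "الفيلم": 0.05}, "negative", "positive"),
    ({"مو": 0.5, "حلو": 0.2, "المطعم": 0.02}, "negative", "positive"),
    ({"رائع": 0.6, "جدا": 0.1}, "positive", "positive"),
]

error_weight = defaultdict(float)
error_count = defaultdict(int)
for attributions, gold, pred in explained_examples:
    if gold != pred:                      # focus on systematic failures
        for token, weight in attributions.items():
            error_weight[token] += abs(weight)
            error_count[token] += 1

# Tokens with high accumulated attribution across errors hint at patterns,
# e.g., dialectal negation particles ("ما", "مو") flipping polarity.
for token, total in sorted(error_weight.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{token}: total |attribution| = {total:.2f} across {error_count[token]} errors")
```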

5. Conclusions

This paper reviewed the use of XAI techniques in Arabic sentiment analysis, covering studies published between 1 January 2016 and 31 March 2025. A total of 19 studies were identified and examined, and for each we analyzed the modeling methods, the explainability techniques, and the specific challenges addressed.
The review revealed that the ASA landscape has evolved to incorporate deep learning and transformer-based models because of their superior performance in handling Arabic linguistic complexity. Concurrently, there is a strong and growing emphasis on making these models explainable using tools such as LIME, SHAP, and attention visualization, as well as innovative strategies such as rule-based systems. The comparative analysis showed that while many ASA models achieve accuracies above 90% across various datasets, it is their explainability that will drive user adoption and trust. For example, a sentiment classifier for Arabic tweets that can highlight exactly which words led to a negative label is far more useful for practical insights than one that outputs a label with no context. The predominance of LIME in the literature underscores the demand for intuitive, model-agnostic explanation tools, although it also points to the importance of developing explanations that are stable and faithful to model behavior.
Key challenges, including handling Arabic dialectal diversity, balancing model complexity with explainability, evaluating the quality of explanations, and ensuring the robustness of both models and explanations to adversarial or rare inputs, are identified. These challenges inform recommendations for future research, emphasizing the development of richer Arabic sentiment datasets with ground-truth explanations, the design of hybrid models and multimethod XAI frameworks, and human-centered evaluations of explainable ASA systems. The ultimate goal is to construct sentiment analysis models that are not only accurate across the breadth of the Arabic language’s variations but also transparent in their decision-making process, thereby enabling stakeholders to confidently act on their outputs.
In conclusion, XAI is essential for progress in ASA, not just an optional addition. By integrating explainability into model development and evaluation, researchers and practitioners can ensure that sentiment analysis tools are accountable and aligned with user needs and ethical considerations. This review serves as a valuable resource for academics and engineers striving to enhance explainable sentiment analysis in Arabic and other languages, driving the development of AI systems that are both transparent and effective.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app151910659/s1, Table S1: Extraction Sheets Form; Table S2: Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) Checklist.

Author Contributions

Conceptualization, A.A., A.B. and D.A.; methodology, A.A.; formal analysis, A.A.; investigation, A.A.; data curation, A.A.; writing—original draft preparation, A.A.; writing—review and editing, A.B. and D.A.; visualization, A.A.; supervision, A.B. and D.A.; project administration, A.B. and D.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Deanship of Scientific Research at Northern Border University, Arar, KSA, through the project number “NBU-SAFIR-2025”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Kumar, P. Large Language Models (LLMs): Survey, Technical Frameworks, and Future Challenges. Artif. Intell. Rev. 2024, 57, 260. [Google Scholar] [CrossRef]
  2. Pang, B.; Lee, L. Opinion Mining and Sentiment Analysis. Found. Trends. Inf. Retr. 2008, 2, 148. [Google Scholar] [CrossRef]
  3. Främling, K. Decision Theory Meets Explainable AI. In Proceedings of the International Workshop on Explainable, Transparent Autonomous Agents and Multi-Agent Systems, Auckland, New Zealand, 9–13 May 2020; pp. 57–74. [Google Scholar]
  4. Salih, A.M.; Galazzo, I.B.; Gkontra, P.; Rauseo, E.; Lee, A.M.; Lekadir, K.; Radeva, P.; Petersen, S.E.; Menegaz, G. A Review of Evaluation Approaches for Explainable AI with Applications in Cardiology. Artif. Intell. Rev. 2024, 57, 240. [Google Scholar] [CrossRef]
  5. Nabil, M.; Aly, M.; Atiya, A. Astd: Arabic Sentiment Tweets Dataset. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 2515–2519. [Google Scholar]
  6. Aly, M.; Atiya, A. Labr: A Large Scale Arabic Book Reviews Dataset. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, 4–9 August 2013; (Volume 2: Short Papers). pp. 494–498. [Google Scholar]
  7. Antoun, W.; Baly, F.; Hajj, H. Arabert: Transformer-Based Model for Arabic Language Understanding. arXiv 2020, arXiv:2003.00104. [Google Scholar]
  8. Aladeemy, A.A.; Aldhyani, T.H.H.; Alzahrani, A.; Alzahrani, E.M.; Khalaf, O.I.; Alsubari, S.N.; Deshmukh, S.N.; Al-Adhaileh, M.H. Machine Learning Algorithms for Predicting and Analyzing Arabic Sentiment. SN Comput. Sci. 2024, 5, 1132. [Google Scholar] [CrossRef]
  9. Gharaibeh, H.; Al Mamlook, R.E.; Samara, G.; Nasayreh, A.; Smadi, S.; Nahar, K.M.O.; Aljaidi, M.; Al-Daoud, E.; Gharaibeh, M.; Abualigah, L. Arabic Sentiment Analysis of Monkeypox Using Deep Neural Network and Optimized Hyperparameters of Machine Learning Algorithms. Soc. Netw. Anal. Min. 2024, 14, 30. [Google Scholar] [CrossRef]
  10. Abdelhady, N.; Hassan, A.; Soliman, T.; Farghally, M.F. Stacked-CNN-BiLSTM-COVID: An Effective Stacked Ensemble Deep Learning Framework for Sentiment Analysis of Arabic COVID-19 Tweets. J. Cloud Comput. 2024, 13, 85. [Google Scholar]
  11. Alsehaimi, A.A.A.; Abi Sen, A.A.; Srinivas, R.; Bahbouh, N.M.; Alsubhy, A.M.; al Masabi, M.M.M. A Smart Framework to Analyze Hotel Services after COVID-19. In Proceedings of the 2022 9th International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 23–25 March 2022; pp. 415–419. [Google Scholar]
  12. Diwali, A.; Dashtipour, K.; Saeedi, K.; Gogate, M.; Cambria, E.; Hussain, A. Arabic Sentiment Analysis Using Dependency-Based Rules and Deep Neural Networks. Appl Soft Comput. 2022, 127, 109377. [Google Scholar] [CrossRef]
  13. Aftan, S.; Zhuang, Y.; Aseeri, A.O.; Shah, H. Steering a Standard Arab Language Processing Model Towards Accurate Saudi Dialect Sentiment Analysis Using Generative AI. In Proceedings of the 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA, 15–18 December 2024; pp. 5891–5900. [Google Scholar]
  14. Azzem, Y.C.H.; Harrag, F.; Bellatreche, L. Exploring Explainability in Arabic Language Models: An Empirical Analysis of Techniques. Procedia Comput. Sci. 2024, 244, 212–219. [Google Scholar] [CrossRef]
  15. Saxena, A.; Santhanavijayan, A.; Shakya, H.K.; Kumar, G.; Balusamy, B.; Benedetto, F. Nested Sentiment Analysis for ESG Impact: Leveraging FinBERT to Predict Market Dynamics Based on Eco-Friendly and Non-Eco-Friendly Product Perceptions with Explainable AI. Mathematics 2024, 12, 3332. [Google Scholar] [CrossRef]
  16. Abdelaty, M.; Lazem, S. Investigating the Robustness of Arabic Offensive Language Transformer-Based Classifiers to Adversarial Attacks. In Proceedings of the 2024 Intelligent Methods, Systems, and Applications (IMSA), Giza, Egypt, 13–14 July 2024; pp. 109–114. [Google Scholar]
  17. Campbell, F.; Tricco, A.C.; Munn, Z.; Pollock, D.; Saran, A.; Sutton, A.; White, H.; Khalil, H. Mapping Reviews, Scoping Reviews, and Evidence and Gap Maps (EGMs): The Same but Different the “Big Picture” Review Family. Syst. Rev. 2023, 12, 45. [Google Scholar] [CrossRef]
  18. Tricco, A.C.; Lillie, E.; Zarin, W.; O’Brien, K.K.; Colquhoun, H.; Levac, D.; Moher, D.; Peters, M.D.J.; Horsley, T.; Weeks, L.; et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Ann. Intern. Med. 2018, 169, 467–473. [Google Scholar] [CrossRef] [PubMed]
  19. Awadallah, M.S.; de Arriba-Pérez, F.; Costa-Montenegro, E.; Kholief, M.; El-Bendary, N. Investigation of Local Interpretable Model-Agnostic Explanations (LIME) Framework with Multi-Dialect Arabic Text Sentiment Classification. In Proceedings of the 2022 32nd International Conference on Computer Theory and Applications (ICCTA), Alexandria, Egypt, 17–19 December 2022; pp. 116–121. [Google Scholar]
  20. Sweidan, A.H.; El-Bendary, N.; Taie, S.A.; Idrees, A.M.; Elhariri, E. Explainable Deep Learning for COVID-19 Vaccine Sentiment in Arabic Tweets Using Multi-Self-Attention BiLSTM with XLNet. Big Data Cogn. Comput. 2025, 9, 37. [Google Scholar] [CrossRef]
  21. Hossain, M.M.; Hossain, M.S.; Hossain, M.S.; Mridha, M.F.; Safran, M.; Alfarhood, S. TransNet: Deep Attentional Hybrid Transformer for Arabic Posts Classification. IEEE Access 2024, 12, 111070–111096. [Google Scholar] [CrossRef]
  22. Abdelwahab, Y.; Kholief, M.; Sedky, A.A.H. Justifying Arabic Text Sentiment Analysis Using Explainable Ai (Xai): Lasik Surgeries Case Study. Information 2022, 13, 536. [Google Scholar] [CrossRef]
  23. Ibrahim Aboulola, O.; Umer, M. Novel Approach for Arabic Fake News Classification Using Embedding from Large Language Features with CNN-LSTM Ensemble Model and Explainable AI. Sci. Rep. 2024, 14, 30463. [Google Scholar] [CrossRef]
  24. Shehata, A.; Al-Suqri, M.N.; Elshaiekh, N.E.; Hamad, F.; Alhusaini, Y.N.; Mahfouz, A. ArabFake: A Multitask Deep Learning Framework for Arabic Fake News Detection, Categorization, and Risk Prediction. IEEE Access 2024, 12, 191345–191360. [Google Scholar] [CrossRef]
  25. Elbasiony, A.; El-Hasnony, I.M.; Abdelrazek, S. Xai-Based Sentiment Analysis Using Machine Learning Approaches. Mansoura J. Comput. Inf. Sci. 2024, 19, 23–42. [Google Scholar] [CrossRef]
  26. Berrimi, M.; Oussalah, M.; Moussaoui, A.; Saidi, M. Attention Mechanism Architecture for Arabic Sentiment Analysis. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2023, 22, 1–26. [Google Scholar] [CrossRef]
  27. Atabuzzaman, M.; Shajalal, M.; Baby, M.B.; Boden, A. Arabic Sentiment Analysis with Noisy Deep Explainable Model. In Proceedings of the 2023 7th International Conference on Natural Language Processing and Information Retrieval, Seoul, Republic of Korea, 15–17 December 2023; pp. 185–189. [Google Scholar]
  28. Hossain, M.M.; Hossain, M.S.; Safran, M.; Alfarhood, S.; Alfarhood, M.; Mridha, M.F. A Hybrid Attention-Based Transformer Model for Arabic News Classification Using Text Embedding and Deep Learning. IEEE Access 2024, 12, 198046–198066. [Google Scholar] [CrossRef]
  29. Rahab, H.; Haouassi, H.; Laouid, A. Rule-Based Arabic Sentiment Analysis Using Binary Equilibrium Optimization Algorithm. Arab. J. Sci. Eng. 2023, 48, 2359–2374. [Google Scholar] [CrossRef]
  30. Azzeh, M.; Alhijawi, B.; Tabbaza, A.; Alabboshi, O.; Hamdan, N.; Jaser, D. Arabic Cyberbullying Detection System Using Convolutional Neural Network and Multi-Head Attention. Int. J. Speech Technol. 2024, 27, 521–537. [Google Scholar] [CrossRef]
  31. Alhumoud, S.O. CNN-BiGRU-Focus: A Hybrid Deep Learning Classifier for Sentiment and Hate Speech Analysis of Ashura-Arabic Content for Policy Makers. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 11. [Google Scholar] [CrossRef]
  32. Alayba, A.M.; Palade, V.; England, M.; Iqbal, R. Arabic Language Sentiment Analysis on Health Services. In Proceedings of the 2017 1st International Workshop on Arabic Script Analysis and Recognition (ASAR), Nancy, France, 3–5 April 2017; pp. 114–118. [Google Scholar]
  33. Elnagar, A.; Khalifa, Y.S.; Einea, A. Hotel Arabic-Reviews Dataset Construction for Sentiment Analysis Applications. In Intelligent Natural Language Processing: Trends and Applications; Springer International Publishing: Cham, Switzerland, 2017; pp. 35–52. [Google Scholar]
  34. Khalil, A.; Jarrah, M.; Aldwairi, M.; Jaradat, M. AFND: Arabic Fake News Dataset for the Detection and Classification of Articles Credibility. Data Brief 2022, 42, 108141. [Google Scholar] [CrossRef] [PubMed]
  35. Elnagar, A.; Einea, O. BRAD 1.0: Book Reviews in Arabic Dataset. In Proceedings of the 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA), Agadir, Morocco, 29 November–2 December 2016; pp. 1–8. [Google Scholar]
  36. ElSahar, H.; El-Beltagy, S.R. Building Large Arabic Multi-Domain Resources for Sentiment Analysis. In Proceedings of the International Conference on Intelligent Text Processing and Computational Linguistics, Cairo, Egypt, 15–21 March 2015; Springer: Cham, Switzerland; pp. 23–34. [Google Scholar]
  37. Marseille, F. Arabic Hate Speech 2022 Shared Task! Available online: https://sites.google.com/view/arabichate2022/home (accessed on 21 February 2025).
  38. Shannag, F.; Hammo, B.H.; Faris, H. The Design, Construction and Evaluation of Annotated Arabic Cyberbullying Corpus. Educ. Inf. Technol. (Dordr.) 2022, 27, 10977–11023. [Google Scholar] [CrossRef]
  39. Rushdi-Saleh, M.; Martin-Valdivia, M.T.; Ureña-López, L.A.; Perea-Ortega, J.M. OCA: Opinion Corpus for Arabic. J. Am. Soc. Inf. Sci. Technol. 2011, 62, 2045–2054. [Google Scholar] [CrossRef]
  40. Alhumoud, S.; Al Wazrah, A.; Alhussain, L.; Alrushud, L.; Aldosari, A.; Altammami, R.N.; Almukirsh, N.; Alharbi, H.; Alshahrani, W. ASAVACT: Arabic Sentiment Analysis for Vaccine-Related COVID-19 Tweets Using Deep Learning. PeerJ Comput. Sci 2023, 9, e1507. [Google Scholar] [CrossRef]
  41. Almamlouk, M. Data Tweet-s. Available online: https://www.kaggle.com/datasets/mahdimahdi55/data-tweet-s?resource=download (accessed on 22 January 2025).
  42. Alyami, S. Arabic Sentiment Analysis Dataset SS2030 Dataset. Available online: https://www.kaggle.com/datasets/snalyami3/arabic-sentiment-analysis-dataset-ss2030-dataset (accessed on 15 February 2025).
  43. Khooli, A. Arabic 100k Reviews. Available online: https://www.kaggle.com/datasets/abedkhooli/arabic-100k-reviews (accessed on 1 February 2025).
  44. Abdelfattah, Y. Lasik Surgery Arabic Text Dataset. Available online: https://www.kaggle.com/datasets/youmnahabdelfattah/lasik-surgery-arabic-text-dataset (accessed on 3 January 2025).
  45. Mohamed, B. DataSet for Arabic Classification. Available online: https://data.mendeley.com/datasets/v524p5dhpj/2 (accessed on 2 February 2025).
  46. Alotaibi, M.; Ahmed, O. An Investigation of Asthma Experiences in Arabic Communities through Twitter Discourse. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 5. [Google Scholar] [CrossRef]
  47. Almani, N.M.; Tang, L.H. Deep Attention-Based Review Level Sentiment Analysis for Arabic Reviews. In Proceedings of the 2020 6th Conference on Data Science and Machine Learning Applications (CDMA), Riyadh, Saudi Arabia, 4–5 March 2020; pp. 47–53. [Google Scholar]
  48. Aljrees, T. Improving Prediction of Arabic Fake News Using ELMO’s Features-Based Tri-Ensemble Model and LIME XAI. IEEE Access 2024, 12, 63066–63076. [Google Scholar] [CrossRef]
  49. Ali, S.; Abuhmed, T.; El-Sappagh, S.; Muhammad, K.; Alonso-Moral, J.M.; Confalonieri, R.; Guidotti, R.; Del Ser, J.; Díaz-Rodríguez, N.; Herrera, F. Explainable Artificial Intelligence (XAI): What We Know and What Is Left to Attain Trustworthy Artificial Intelligence. Inf. Fusion 2023, 99, 101805. [Google Scholar] [CrossRef]
  50. El Beggar, O.; Ramdani, M.; Kissi, M. Design and Development of a Fuzzy Explainable Expert System for a Diagnostic Robot of COVID-19. Int. J. Electr. Comput. Eng. 2023, 13, 6940–6951. [Google Scholar] [CrossRef]
Figure 1. PRISMA flow diagram of the review process.
Figure 2. Publication trend of explainable ASA studies by year (1 January 2016–31 March 2025).
Figure 3. Distribution of primary AI methods among ASA studies.
Figure 4. Distribution of explainability methods used across ASA studies. Categories are defined as follows: LIME only (perturbation-based local interpretability using LIME), LIME + Other Methods (hybrid approaches combining LIME with additional explainability techniques), Attention Mechanism (interpretability derived from neural attention weights), and Valence Scoring + Expert Annotations (rule-based or expert-driven sentiment assignment).
Table 1. Overview of the datasets used in the reviewed ASA studies.
Reference | Dataset Name | Shortcut
[15] | Environmental, Social, and Governance | ESG
[32] | Arabic health services dataset | AHS
[33] | Hotel Arabic Reviews dataset | HARD
[34] | Arabic fake news detection | AFND
[6] | Large-scale Arabic Book Reviews dataset | LABR
[35] | Book reviews in Arabic dataset | BRAD
[36] | Hotel Review dataset | HTL
[13] | Saudi Twitter Dataset | STD
[37] | Open-Source Arabic Corpora and shared task | OSACT5
[38] | Arabic cyber bullying tweets dataset | ArCybC
[39] | Opinion corpus for Arabic dataset | OCA
[40] | Arabic sentiment analysis for the vaccine-related COVID-19 tweet dataset | ASAVACT
[41] | COVID-19 vaccine tweets | N/A 1
[42] | SS2030 Saudi Twitter commentary | N/A
[43] | 100k reviews dataset | N/A
[31] | Arabic hate speech-Ashura event posts | N/A
[44] | LASIK surgery feedback | N/A
[24] | ArabFake | N/A
[45] | News articles from (Assabah, Hespress, and Akhbarona) | N/A
[46] | Asthma tweets | N/A
1 N/A = Not Applicable.
Table 2. Overview of reviewed studies: AI methods and performance in Arabic sentiment analysis.
Reference | ASA Model | ASA Model Type | Additional Techniques | Dataset Name (Size) | Dataset Type | Performance Measures | Best Results
[19] | RF, DT | ML | Combined n-gram (UniGram, Bigram, Trigram) | HARD (105,698), AHS (2026) | Reviews, Tweets | Accuracy, F1-Score, Recall, Precision | 93.40%, 97.20%, 95.00%, 97.50%
[25] | SVM, LR, DT, RF | ML | With FastText embeddings | Arabic COVID-19 vaccine (32,476), 100k reviews dataset (100,000), SS2030 (4252) | Reviews, Tweets | Accuracy, F1-Score, Recall, Precision | 91.10%, N/A 1, 89.50%, 89.00%
[29] | Rule-based classifier | ML | Using 13 rules | OCA (500) | Reviews | Accuracy, F1-Score, Recall, Precision | 84.0%, N/A, N/A, 100%
[20] | BiLSTM | DL | With XLNet embeddings, multi-self-attention mechanism | ASAVACT (32,476) | Tweets | Accuracy, F1-Score, Recall, Precision | 93.20%, 92.00%, 92.30%, 92.00%
[22] | LSTM | DL | N/A | LASIK surgery feedback (4202) | Tweets | Accuracy, F1-Score, Recall, Precision | 79.10%, 71.00%, 76.00%, 71.00%
[23] | Voting ensemble (CNN-LSTM) | DL | With ELMo embeddings | AFND (606,912) | News | Accuracy, F1-Score, Recall, Precision | 98.42%, 98.93%, 99.50%, 98.54%
[26] | BiGRU-BiLSTM | DL | FastText, trainable embeddings | LABR (63,257), HARD (105,698), BRAD (510,600) | Reviews | Accuracy, F1-Score, Recall, Precision | 96.29%, 96.28%, 96.14%, 97.03%
[27] | BiLSTM, CNN-BiLSTM | DL | With noise layer (to regularize) | LABR (63,257), HTL (15,572) | Reviews | Accuracy, F1-Score, Recall, Precision | 88.00%, 93.00%, 95.00%, 91.00%
[30] | ArCB | DL | CNN, Multi-Head Attention, ResNet (Word2Vec) | ArCybC (4.5000) | Tweets | Accuracy, F1-Score, Recall, Precision, AUC | 82.60%, 74.40%, 72.50%, 76.40%, 88.50%
[31] | CNN-BiGRU-Focus model | DL | N/A | Arabic hate speech-Ashura event posts (428,210) | Tweets | Accuracy, F1-Score, Recall, Precision, AUC 2 | 99.89%, 98.00%, 98.00%, 96.00%, 99.00%
[47] | DARLSA | DL | GRU with Transfer Learning, and additive attention | LABR (16,486) | Reviews | Accuracy, F1-Score, Recall, Precision | 92.0%, 85.0%, N/A, N/A
[13] | AraBERT | LLM | Fine-tuned with GPT-augmented Saudi tweets | STD (50,000), GPTsynth (19,251) | Tweets | Accuracy, F1-Score, Recall, Precision | 97.00%, 98.00%, 98.00%, 98.00%
[14] | AraGPT2, AraBERT | LLM | Fine-tuned per task | HARD (490,587) | Reviews | Accuracy, F1-Score, Recall, Precision | 96.00%, 95.90%, N/A, N/A
[15] | FinBERT | LLM | BERT-based with nested sentiment classification | ESG | Reviews | Accuracy, F1-Score, Recall, Precision, AUC | 99.00%, 91.00%, 94.00%, 94.00%, 98.00%
[16] | Transformer-based offensive language classifier | LLM | N/A | OSACT5 (10,540) | Tweets | N/A | 30% success rate in fooling the model
[21] | TransNet | DL, LLM | Hybrid Transformer, BiLSTM, GRU | Asthma tweets | Tweets | Accuracy, F1-Score, Recall, Precision | 97.87%, 97.86%, 97.86%, 97.86%
[24] | ArabFake on MARBERTv2 | DL, LLM | Multitask learning (fake news detection, topic, risk) | ArabFake (2495) | News | Accuracy, F1-Score, Recall, Precision | 94.07%, 94.12%, 94.08%, 94.17%
[28] | ABTM (Attention-Based Transformer Model) | DL, LLM | Transformer encoder, TF-IDF/BOW features | Assabah + Hespress + Akhbarona (111,728) | News | Accuracy, F1-Score, Recall, Precision | 97.69%, 97.11%, 97.13%, 97.10%
[48] | Tri-ensemble stacking (bagging, boosting, baseline) | ML, DL | ELMO embeddings, textual feature extraction | AFND (606,000) | News | Accuracy, F1-Score, Recall, Precision | 99.00%, 99.00%, 99.00%, 99.00%
1 N/A = Not Applicable. 2 AUC = Area Under the Curve.
Table 3. Explainability methods used in each study with the insight.
Reference | ASA Model Type | Study Domain | XAI Method | XAI Aim | Results
[22] | DL | Health | LIME | Highlight key emotion/symptom words in LASIK reviews | Boosted trust in health monitoring applications.
[21] | DL, LLM | Health | LIME | Clarify TransNet’s predictions on Arabic asthma posts | Clarified model behavior, revealing asthma-related terms.
[20] | DL | Health | LIME, SHAP, Attention | COVID-19 vaccine sentiment analysis and interpretation of model focus | Attention revealed emotion-bearing phrases.
[25] | ML | Health/Social Media | LIME, Coef./Gini | Blend global/local feature importance | Aided bias detection and stop-word pruning.
[23] | DL | Cybersecurity | LIME | Arabic fake news detection and identification of key dialectal cues | Helped refine data.
[24] | DL, LLM | Cybersecurity | Valence, Expert Annotations | Fake news/risk assessment | Matched model decisions to emotional scoring and human annotations.
[48] | ML, DL | Cybersecurity | LIME | Fake news detection | Validated ensemble consistency, highlighted deceptive language.
[16] | LLM | Cybersecurity | LIME | Vulnerability detection | Revealed impactful adversarial tokens.
[19] | ML | Social media | LIME | Improve transparency in ASA | Identified influential n-grams and validated polarity cues.
[28] | DL, LLM | Social media | LIME, Attention | Improve Arabic news classification | LIME and attention clarified misclassifications.
[31] | DL | Social media | Attention mechanism (intrinsic) | Hate speech detection | Highlighted emojis and hate-indicative terms.
[30] | DL | Social media | Attention mechanism (intrinsic) | Bullying detection | Attention revealed abusive terms in tweets.
[29] | ML | Social media | Attention mechanism (intrinsic) | Improve transparency in ASA | Fully interpretable but lower accuracy.
[27] | DL | Linguistics | LIME | Enhance transparency in ASA | Noise layer and LIME boosted interpretability.
[13] | LLM | Linguistics | LIME | Improve ASA transparency | Heatmaps validated dialect features.
[14] | LLM | Linguistics | Saliency, InputGrad, IG, LIME, SHAP_VS, Random | Transparency in Arabic models | Gradient methods were more faithful, SHAP more human-friendly.
[26] | DL | Linguistics | Attention mechanism (intrinsic) | Explain BiGRU-BiLSTM decisions | Attention revealed key sentences.
[47] | DL | Business | Attention mechanism (intrinsic) | Visualize attention weights | Showed sentiment-driving words.
[15] | LLM | ESG | LIME | Improve transparency for investment sentiment | Explained sentiment–ESG links for stakeholders.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
