Article

Stance Detection in Arabic Tweets: A Machine Learning Framework for Identifying Extremist Discourse

Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(18), 2965; https://doi.org/10.3390/math13182965
Submission received: 21 July 2025 / Revised: 1 September 2025 / Accepted: 5 September 2025 / Published: 13 September 2025
(This article belongs to the Special Issue Machine Learning Theory and Applications)

Abstract

Terrorism remains a critical global challenge, and the proliferation of social media has created new avenues for monitoring extremist discourse. This study investigates stance detection as a method to identify Arabic tweets expressing support for or opposition to specific organizations associated with extremist activities, using Hezbollah as a case study. Thousands of relevant Arabic tweets were collected and manually annotated by expert annotators. After extensive preprocessing and feature extraction using term frequency–inverse document frequency (tf-idf), we implemented traditional machine learning (ML) classifiers—Support Vector Machines (SVMs) with multiple kernels, Multinomial Naïve Bayes, and Weighted K-Nearest Neighbors. ML models were selected over deep learning (DL) approaches due to (1) limited annotated Arabic data availability for effective DL training; (2) computational efficiency for resource-constrained environments; and (3) the critical need for interpretability in counterterrorism applications. While interpretability is not a core focus of this work, the use of traditional ML models (rather than DL) makes the system inherently more transparent and readily adaptable for future integration of interpretability techniques. Comparative experiments using FastText word embeddings and tf-idf with supervised classifiers revealed superior performance with the latter approach. Our best result achieved a macro F-score of 78.62% using SVMs with the RBF kernel, demonstrating that interpretable ML frameworks offer a viable and resource-efficient approach for monitoring extremist discourse in Arabic social media. These findings highlight the potential of such frameworks to support scalable and explainable counterterrorism tools in low-resource linguistic settings.

1. Introduction

Social media has become an integral part of modern life, serving not only as a primary platform for content sharing but also as a dominant source of information. As of April 2025, over 5.3 billion people—nearly 65% of the global population—use social media, with the typical user engaging across about 6.9 platforms monthly [1]. Platforms such as Facebook, YouTube, Instagram, TikTok, and Twitter encompass diverse communication forms, including personal updates, news, and educational content. Studies report that users often engage with multiple platforms simultaneously—such as browsing Instagram or YouTube while interacting on Twitter—highlighting the multi-modal and concurrent nature of social media consumption [2,3].
Twitter, one of the most prominent microblogging platforms, allows users to share short posts (tweets) limited to 280 characters. Although the platform has been renamed X, we retain the term “Twitter” throughout this study for clarity and consistency with prior studies. Tweets are published to the user’s “followers”—subscribers to their account—who can interact by replying, retweeting, or marking them as favorites [4].
While sentiment analysis and stance detection are closely related, they address distinct objectives. Sentiment analysis determines the polarity of a text (positive, negative, or neutral) by identifying emotional cues such as “love” or “hate.” However, sentiment alone is often insufficient; understanding a user’s position toward a predefined topic—known as stance detection—is equally crucial [5]. As an emerging research area, which is particularly underexplored for Arabic-language content, stance detection seeks to classify a text as favoring, opposing, or remaining neutral toward a pre-determined target.
A key challenge in stance detection lies in the implicitness of targets. Often, the target is not explicitly mentioned within the text, requiring systems to infer stance from indirect references, rhetorical devices, or context-specific cues that assume shared knowledge with the audience. For instance, a synthetic tweet such as “They say progress is being made, but the streets tell a different story” does not explicitly name the government, yet it may reflect opposition to its policies. Conversely, “Finally, someone is taking bold steps to fix years of neglect” might imply support. Addressing such complexities, where stance must be inferred rather than directly observed, is a central objective of stance detection systems [5].
The term terrorism originates from the French word terrorisme, which was derived from the Latin term terror (“great fear” or “dread”) and is related to terrere (“to frighten”). While historically applied to state-imposed terror during revolutionary periods (e.g., the Reign of Terror), contemporary usage more commonly describes subnational violence against states [6], though state-sponsored violence against civilian populations also falls within many scholarly definitions. Despite its widespread use, no universal legal or scientific consensus exists, with over 109 distinct definitions documented [7]. Governments and legal systems employ varying interpretations, and political sensitivities have prevented international agreement (see https://en.wikipedia.org/wiki/Definition_of_terrorism, accessed on 9 June 2025). Attempts such as Alex Schmid’s 1992 proposal to define terrorism as “peacetime equivalents of war crimes” based on international humanitarian law failed to gain adoption.
Nevertheless, terrorism remains a pressing global issue, with governments dedicating substantial resources to monitoring and countering terrorist activities. Motivated by this, we focus on Hezbollah as the central target of this study due to its classification as a terrorist organization by the Gulf Cooperation Council (GCC) and multiple international bodies. Hezbollah’s prominent role in regional conflicts, particularly in Syria and Lebanon, and its extensive discussion on Arabic social media make it an ideal case for examining stance detection in highly polarized and sensitive discourse.
Critically, we contend that this focused approach is a methodological strength. Unlike prior research, which often examines stance detection in broad or unspecified contexts, our deliberate emphasis on Hezbollah enables a deeper, contextually grounded analysis of the group’s unique discourse patterns, rhetorical strategies, and regional linguistic nuances. To analyze public sentiment, we compiled thousands of Arabic tweets—concise texts by nature—and developed a robust system for detecting stances specifically regarding Hezbollah’s activities in Syria. For brevity, we refer to this subject simply as “Hezbollah” throughout the paper. This specificity not only enhances the validity and reliability of our methodology for Hezbollah-related content but also provides a clear, replicable framework that can be adapted to other organizations in future work. To the best of our knowledge, this is the first stance detection study tailored to a specific organization, setting a precedent for more precise and comparable analyses in this domain.
In this work, we employed traditional machine learning (ML) classifiers—including Support Vector Machines (SVMs), Multinomial Naïve Bayes (MNB), and weighted K-Nearest Neighbors—rather than deep learning (DL) approaches. While state-of-the-art performance in Arabic stance detection is currently achieved using DL models like AraBERT, our decision was driven by three specific objectives critical to counterterrorism contexts: (i) model interpretability, as opaque “black box” predictions are unsuitable for high-stakes scenarios requiring validation and justification, with serious consequences for decision-making; (ii) computational efficiency, given many real-world applications operate in resource-constrained environments; and (iii) effective operation with limited training data, since high-quality annotated Arabic datasets remain scarce and DL models are prone to overfitting.
We recognize this choice may sacrifice some absolute performance compared to modern DL methods. However, it ensures broader applicability in low-resource or explainability-critical settings while establishing a clear, well-documented baseline. This foundation enables future studies to precisely measure improvements when applying DL models while currently providing policymakers with the transparent classification logic essential for counterterrorism analysis.

1.1. Problem Definition

The computational analysis of terrorist organizations on social media involves two complementary tasks: content detection and stance detection. Content detection identifies posts containing extremist material—such as propaganda, recruitment messages, hate speech, or violent imagery—using binary or multi-class classification. This supports content moderation and counterterrorism monitoring by flagging material that explicitly promotes or supports terrorist activities.
Stance detection, in contrast, provides a finer-grained analysis by determining the author’s position (supporting, opposing, or neutral) toward a specific organization like Hezbollah. Unlike content detection, which focuses on the presence of extremist themes, stance detection examines ideological alignment and rhetorical patterns to assess public sentiment. Such analysis is critical for understanding polarization, mapping societal attitudes, and informing policy decisions in conflict-prone regions.
The distinction lies in analytical granularity: content detection broadly scans for terrorism-related material, while stance detection targets individual attitudes toward predefined entities. Together, these tasks offer a comprehensive framework for monitoring terrorist organizations’ digital presence—from operational content identification to public opinion mapping.

1.2. Our Contributions

The main contributions of this work are as follows:
  • Creation of a specialized Arabic dataset: A collection of 7053 tweets focused on Hezbollah’s activities was compiled and annotated, addressing the scarcity of labeled data for Arabic stance detection research. We made the dataset available for non-commercial use.
  • Development of a practical stance detection system: An efficient ML-based framework for classifying stances in Arabic tweets about Hezbollah’s involvement in Syria was developed.
While interpretability is not a core focus of this work, the use of traditional ML models (rather than deep learning) makes the system inherently more transparent and easily expandable to incorporate interpretability techniques for stance analysis in future work.
The remainder of this paper is structured as follows: Section 2 provides the necessary background. Section 3 reviews related work, beginning with research in stance detection and proceeding to studies on terrorism detection. Section 4 details the proposed methodology for stance detection in Arabic tweets. Section 5 presents experimental setups and discusses the results. Finally, Section 6 concludes the study and suggests promising directions for future research.

2. Background

In this section, we provide essential background on the Arabic language, outline its inherent linguistic challenges, and present an overview of stance detection along with the key difficulties it entails.

2.1. Challenges in Handling Arabic Tweets

Twitter has emerged as a vital platform for studying contemporary public discourse, offering researchers unprecedented access to real-time public opinion and diverse societal voices [8]. While the platform’s big data capabilities complement traditional survey methods [9], scholars must navigate significant methodological challenges, including algorithmic biases, echo chambers, and the performative nature of online self-presentation. There is a growing consensus about the limitations of hashtag-centric studies and platform-specific analyses, and the field is witnessing a shift toward more robust theoretical frameworks and interdisciplinary approaches to move beyond descriptive analyses toward a deeper understanding of mediated public discourse [10].
From a computational perspective, this work is hampered by the unique nature of Twitter data itself. The platform is characterized by brevity and inherent noise, and this short length severely restricts the available word co-occurrence statistics and contextual information, which are crucial for deriving reliable semantic similarity measures [11].
These challenges are profoundly exacerbated by the intrinsic complexities of the Arabic language. Modern Standard Arabic (MSA) is typically written without diacritical markings (harakat), leading to pervasive lexical and semantic ambiguity. This issue is compounded by the language’s rich morphology, its extensive use of synonyms, and its highly inflectional and derivational structure.
The problem is further complicated by the diglossic reality of Arabic social media, where tweets are composed in a blend of MSA and various regional colloquial dialects [12]. These dialects exhibit greater variation than those typically found in European languages and lack standardized spelling rules or formal dictionaries [12]. Consequently, a single word can possess contradictory meanings across different dialects; for instance, the word بلش can mean “finished” in one dialect but “started” in another.
The core of the disambiguation problem lies in the absence of short vowel diacritics. These markings are essential for resolving a word’s precise meaning and grammatical function. While a human reader can use context to infer the correct interpretation of an unvowelized word like علم (which can mean “science,” “flag,” “taught,” “knew,” or “knowledge”), this task is exceptionally difficult for automated systems [12]. This ambiguity even extends to vowelized texts, where a word like عَام can mean either “year” or “public,” requiring deep contextual understanding for accurate disambiguation.
The intertwined challenges of ambiguity, dialectal variation, and sparse contexts are not peripheral concerns for social media analysis; they represent a core bottleneck to the progress of Arabic NLP at large. Addressing them demands not incremental refinements but domain-specific innovation and substantial resource investment. For example, the TaSbeeb judicial decision support system [13] illustrates the scale of effort required to adapt NLP to the rigor of legal discourse, while the Arabic essay evaluator in [14,15] underscores the difficulty of simultaneously assessing linguistic proficiency and semantic depth in educational settings. These cases demonstrate that overcoming the inherent linguistic complexity of Arabic is indispensable for building robust and socially impactful AI systems. Nowhere is this more critical than in computational analysis of Arabic Twitter data, where meaningful societal insights can only emerge if models effectively navigate the dense interplay between dialect, implicitness, and limited contexts.

2.2. Stance Detection

Stance detection is the task of automatically determining whether the author of a text is in favor, neutral, or against a given target (e.g., person, organization, movement, or topic). While progress has been made in detecting stance toward explicitly mentioned targets, a persistent challenge arises when the target is implicit—that is, not directly mentioned in the text but only implied through the context, allusion, or shared knowledge. This implicitness, also referred to as target latency, transforms stance detection from a surface-level classification problem into a complex reasoning task that integrates natural language understanding, commonsense inference, and knowledge retrieval.
Several interconnected factors make implicit targets particularly difficult to handle:
  • Disambiguation Problem: A stance-bearing statement may not identify its target explicitly but instead reference related sub-events, policies, or entities. For instance, a tweet stating, “They finally lifted the ban!” conveys a positive stance, but without an explicit target, it could refer to (1) women’s driving rights in Saudi Arabia, (2) COVID-19 travel restrictions, or (3) the reinstatement of a suspended player. Correct interpretation requires deep contextual modeling and extensive world knowledge to resolve that the term “they” refers to a specific authority and “the ban” to a specific policy.
  • Limits of Lexical and Syntactic Patterns: Traditional supervised and feature-based approaches (e.g., relying on tf-idf, n-grams, or syntactic dependencies) struggle when surface lexical cues are absent. For a tweet that implicitly takes a stance on climate change, if words such as “climate,” “global,” or “warming” never appear, these models lack anchors to tether stance predictions to the correct topic. This illustrates why implicit targets cannot be resolved by shallow statistical associations alone.
  • Knowledge Gap for LLMs: Modern large language models (LLMs) encapsulate vast world knowledge, yet they are not immune to errors with implicit targets. The challenge lies in retrieval and alignment: the model must (a) recall the relevant event (e.g., that lifting the ban on Saudi women driving occurred in 2018), (b) align this knowledge to ambiguous references such as “they” and “ban,” and (c) filter this event from countless similar “ban-lifting” instances. This process is highly sensitive to context length, prompt phrasing, and model biases, which often results in inconsistent stance predictions.
  • Context and Figurative Language: Implicit stance frequently co-occurs with figurative language such as irony, sarcasm, or metaphor, where the literal form diverges from the intended meaning. Statements like “What a brilliant idea!” could signal sarcasm and convey a negative stance, requiring pragmatic inference beyond lexical cues.
  • Data Scarcity and Annotation Difficulty: Constructing datasets for implicit stance detection is particularly challenging. Human annotators can rely on commonsense reasoning to infer implicit targets, but for machine learning, this requires datasets labeled not only with stance but also with supporting evidence and often explicit target resolution. This is expensive, time-intensive, and hard to scale. Consequently, most benchmarks (e.g., SemEval-2016) underrepresent implicit targets, leaving models poorly trained for real-world ambiguity.
  • Evaluation Dilemma: Assessing stance detection performance on implicit targets is also non-trivial. For example, if a model labels a post as “against” a policy, but the gold label associates the stance with the political party responsible for the policy, should this be considered correct? Simple accuracy or F-scores are insufficient, and evaluation frameworks must account for reasoning chains and conceptual proximity to capture model competence fairly.
The implicitness of targets thus magnifies the difficulty of stance detection: it requires resolving hidden references, integrating discourse-level and contextual information, and applying external world knowledge. Although LLMs have propelled the field forward, reliably inferring implicit meaning from explicit utterances remains a fundamental unsolved problem. This challenge is further amplified by dialectal variation when shifting focus across regional groups. For instance, while this work concentrates on Hezbollah—which is associated with a specific dialect—applying these methods to groups using other Arabic dialects (e.g., Maghrebi or Gulf Arabic) [16] introduces additional complexity due to significant linguistic variation. Effective cross-dialect stance detection thus requires more than additional training data; it demands dialect-specific lexical resources to address contradictory word senses, large-scale annotated corpora capturing unique morphological and syntactic features, and dialect-aware models capable of dynamically adapting to the blended MSA–dialect continuum prevalent in social media.

3. Related Works

Our review of related work follows two key strands of research: (1) studies focusing on stance detection in social media, which provide methodological foundations for analyzing user perspectives and (2) research on terrorism-related content detection, which offers domain-specific insights relevant to our study of Hezbollah-related discourse. This dual perspective allows us to situate our work at the intersection of computational methods and security applications.

3.1. Stance Detection

Stance detection, the computational task of determining a text’s position toward a specific target, has become a cornerstone of computational linguistics. The field gained significant momentum with the establishment of standardized benchmarks through the SemEval-2016 Task 6 competition [17], which continues to influence contemporary methodologies. Prior foundational work by Somasundaran and Wiebe [18] combined sentiment and argument features in supervised classifiers, achieving 63.93% accuracy on contentious topics. Similarly, Addawood et al. [19] leveraged syntactic, lexical, and argumentative features in an SVM model to analyze public reaction to a specific event, attaining a 90.4% F-score.
The SemEval-2016 shared task [17] provided a formal definition and a dataset of 4163 tweets, catalyzing further research. The best score attained in the shared task was an F-score of 67.8%. This benchmark was later advanced by Al-Ghadir et al. [5], whose pipeline combining tf-idf with sentiment features achieved a state-of-the-art macro F-score of 76.45%. The IberEval-2017 shared task [20] expanded the scope to include multilingual tweets on Catalan independence, combining stance detection with author profiling.
The advent of LLMs has transformed the field, as analyzed in the taxonomy introduced by Pangtey et al. [21], which categorizes approaches by learning methods, data modalities, and target relationships. Methodological progress has since advanced along multiple fronts. Architecturally, Garg and Caragea [22] proposed Stanceformer, which integrates a target awareness matrix to amplify attention to target-relevant terms. In the multimodal domain, Barel et al. [23] introduced TASTE, a model that fuses transformer-based textual embeddings with a structural social context using a gated residual network (GRN). A key innovation in prompting is the “Chain of Stance” (CoS) method by Ma et al. [24], which decomposes the detection process into intermediate reasoning steps for superior zero-shot and few-shot performance.
Substantial work has also focused on improving generalization. Zhao et al. [25] designed a framework using multi-expert cooperative learning, where a gating mechanism fuses diverse semantic features to enhance transfer to unseen targets. Similarly, Ma et al. [26] pioneered a zero-shot multi-agent debate (ZSMD) framework, where opposing agents debate a text’s stance with access to external knowledge for more robust predictions.
Research in Arabic stance detection has progressed from initial studies to rich shared tasks. Abdelhade et al. [27] developed a deep neural network to classify Arabic tweets in financial and sports domains, achieving a 90.68% F-score. A significant large-scale analysis was conducted by Azmi and Al-Ghadir [8], who examined Twitter as a real-time barometer of public opinion during Saudi Arabia’s 2016–2017 reform period. Their study collected approximately 200 million Arabic tweets, identifying 50,000 unique hashtags. From this corpus, they performed a two-tiered analysis: first, a behavioral study clustered 4000 prevalent hashtags into 12 primary categories, from which five key topics—four related to gender relations (e.g., women’s driving and male guardianship)—were selected for in-depth stance detection. In the second phase, they employed a dual-labeling process to classify tweets as supportive, neutral, or opposed to these social changes. Among multiple classifiers evaluated, a weighted 9-NN model achieved the highest performance, with an F-score of 72.45%. This work highlighted Twitter’s utility as a complementary tool to traditional surveys for capturing nuanced public sentiment during socio-political transformations.
The creation of new resources like the ArabicStanceX dataset by Alkhathlan et al. [28] and the multi-dialectal corpus by Charfi et al. [29] provided valuable benchmarks for dialect-aware research. This progression culminated in StanceEval 2024 [30], the first Arabic shared task for stance detection, where fine-tuned AraBERT by Badran et al. [31] achieved the highest F-score (84.38%), underscoring the transformative impact of LLMs in Arabic NLP.
The foundational landscape of the field is thoroughly mapped in the comprehensive survey by Zhang et al. [32], which details core definitions, datasets, and the methodological progression from traditional models to LLM-based approaches, while outlining emergent challenges. For a more focused examination of recent algorithmic advances, the survey by Gera and Neal [33] provides a specialized overview of deep learning techniques within stance detection.

3.2. Terrorism Detection

Recent studies have developed various computational approaches to identify terrorism-related content on social media. Omer [34] proposed a machine learning framework to classify radical vs. non-radical tweets collected using certain ideological hashtags, accepting both Arabic and non-Arabic content. After preprocessing (removing retweet tags, URLs, and annotations), the text was lemmatized and tokenized. Experiments with stylometric, sentiment-based, and time-based features showed that AdaBoost outperformed SVM and NB classifiers when features were combined.
Alsaedi and Burnap [35] developed a robust framework for detecting disruptive events through real-time Twitter analysis. Their methodology leveraged Twitter’s streaming API to collect geolocated data using region-specific keywords and hashtags, with MongoDB employed for efficient data storage. The preprocessing pipeline incorporated stop word removal and content filtering, followed by feature extraction using tf-idf and dual-language stemming (Khoja for Arabic and Porter for English). The system achieved notable performance (85.43% F-score) through a novel combination of NB classification in R and online clustering for topic identification, culminating in a voting-based summarization mechanism. This approach demonstrated particular effectiveness in handling multilingual social media data during crisis events.
For terrorist affiliation detection, Ali [36] collected tweets via Twitter API and preprocessed them by eliminating numbers, stop words, and punctuation. An unsupervised clustering algorithm grouped the data, with cluster numbers determined by frequency histograms. Visualization tools (R, NodeXL, and Gephi) revealed network patterns, demonstrating that combining basic data mining tools with linguistic analysis effectively identifies terrorist affiliations.
The ALT-TERROS system [37] is an AI-enabled framework designed to detect terrorist behavior in Arabic social media. It analyzed 10,000 tweets, assigning suspicion probabilities based on hostile emotions, links to terrorist organizations, and violent imagery. Microsoft Cognitive Services supported semantic tagging, while a risk-scoring mechanism generated alerts. By integrating heuristic rules with advanced analytics and visualizations, ALT-TERROS offers an end-to-end workflow for extracting and evaluating textual and visual data. The system also addresses challenges such as limited Arabic API support and the need for robust language resources to enhance sentiment analysis.
Al-Shawakfa et al. [38] proposed RADAR#, a system combining two neural network approaches—a CNN-BiLSTM model and the AraBERT transformer—using an ensemble technique to identify radicalization indicators in tweets. When tested on nearly 90,000 Arabic tweets, the system achieved high detection accuracy (98%) while revealing important insights through component analysis. Key findings show that the model struggles most with sarcastic posts (a third of the errors) and regional dialect variations (one-fifth of errors), highlighting areas needing improvement for real-world counterterrorism applications. The technical approach demonstrates how combining different AI methods can improve detection of harmful online content in Arabic language contexts.
Another model to detect extremist content in Arabic social media using AraBERT was proposed in [39]. The model combined AraBERT with sentiment features (SFs) and multilayer perceptrons (MLPs). The study evaluated the model on three datasets: (1) 89,816 Arabic tweets, 56% of which were extremist, collected via X’s APIs; (2) 24,078 ISIS-related Arabic tweets, of which 10,000 were extremist, analyzed via a 3400-tweet subset; and (3) 100 professionally translated Kazakh texts, half of them extremist, for cross-lingual validation. These were, respectively, sourced from [40,41,42]. The highest performance achieved was an F-score of 96% on the first dataset using AraBERT with SFs and an MLP.

3.3. Concluding Remarks

This review examined two interrelated research areas: stance detection and terrorism-related content identification in social media. Current work in both domains demonstrates a clear progression toward deep learning methods, particularly LLMs, as evidenced by recent surveys of advances in stance detection [32,33] and prominent NLP tasks such as abstractive summarization [43] and question answering [44]. However, our analysis reveals a significant research gap in stance detection specifically applied to terrorism-related Arabic tweets. The comprehensive survey by Sabri and Abdullah [45] on Arabic extremism content detection highlights two key findings: (1) most studies focus on MSA despite the widespread use of regional dialects in social media, and (2) among traditional machine learning approaches (excluding deep learning), SVM classifiers consistently outperform other algorithms (e.g., Naïve Bayes and Random Forests) in detection accuracy.

4. Our Proposed System

In this section, we present the key aspects of our proposed stance detection system for terrorism-related Arabic tweets. The discussion begins with the processes of data collection and annotation and concludes with the classification phase. This study focuses on terrorism as the target of interest. Recognizing that the definition of terrorism varies across individuals and governments, we adopt the organizations designated as terrorist by the Gulf Cooperation Council (GCC) as our targets of interest. Specifically, this research centers on Hezbollah-related terrorism in Syria. As noted earlier, stance detection is an emerging area of research, and this work represents one of the few studies addressing stance detection in the Arabic language. Our proposed system comprises four main components: data collection and annotation, preprocessing, feature extraction and selection, and stance classification. These components are illustrated in Figure 1.

4.1. Dataset and Annotation

We collected over 8000 tweets using Twitter’s API; after filtering redundant and irrelevant content, about 7000 usable tweets remained. The final dataset comprises 7053 tweets discussing Hezbollah, categorized as 3201 supporting, 2352 opposing, and 1500 neutral.
For supervised classification, we employed manual annotation by two native Arabic speakers, who labeled tweets as “favor,” “against,” or “neither” (following the combined neutral/no-stance category of [46]). Although labor-intensive, this approach ensured reliable and trustworthy annotations given the task’s sensitivity. Crowdsourcing alternatives like ClickWorker or Amazon Mechanical Turk proved problematic for Arabic texts due to quality control issues [47], making professional annotation essential for this high-stakes domain. Table 1 shows some samples of tweets and their stance.
The dataset was partitioned into the training (70%) and testing (30%) subsets, preserving the original distribution of all stance categories (see Table 2). This standard split ratio, consistent with SemEval-2016’s methodology [17], ensures balanced model development and evaluation.
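A minimal sketch of this stratified partitioning with scikit-learn is shown below; the variable names and the fixed random seed are our assumptions, not part of the original pipeline:

```python
from sklearn.model_selection import train_test_split

# tweets: list of tweet strings; labels: "favor" / "against" / "neither"
X_train, X_test, y_train, y_test = train_test_split(
    tweets, labels,
    test_size=0.30,    # 70/30 split, consistent with SemEval-2016 [17]
    stratify=labels,   # preserve the original distribution of stance categories
    random_state=42,   # fixed seed for reproducibility (our assumption)
)
```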

4.2. Tweet Preprocessing

The preprocessing phase cleans the dataset by removing noise, reducing features, and improving classification performance. We first eliminate non-Arabic content, including special symbols (“@” and “#”), URLs, retweet markers (RTs), and end-of-line characters. Next, we normalize Arabic words to their standard forms (e.g., مرررحبا to مرحبا) and remove Arabic stop words using Python’s NLTK 3.6.5 library. Finally, we tokenize the text into individual word units using standard tokenization methods [47,48].
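The following is a minimal sketch of this cleaning pipeline, assuming NLTK 3.6.5 with the stopwords and punkt resources downloaded; the regular expressions are illustrative approximations rather than the exact rules used in our system:

```python
import re
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

ARABIC_STOPWORDS = set(stopwords.words("arabic"))  # requires nltk.download("stopwords")

def preprocess_tweet(text):
    text = re.sub(r"http\S+", " ", text)             # remove URLs
    text = re.sub(r"\bRT\b", " ", text)              # remove retweet markers
    text = re.sub(r"[@#]\w+", " ", text)             # remove mentions and hashtags
    text = re.sub(r"[^\u0600-\u06FF\s]", " ", text)  # keep Arabic characters only
    text = re.sub(r"(.)\1{2,}", r"\1", text)         # collapse elongations, e.g., مرررحبا -> مرحبا
    tokens = word_tokenize(text)                     # requires nltk.download("punkt")
    return [t for t in tokens if t not in ARABIC_STOPWORDS]
```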

4.3. Stemming

Stemming reduces words to their base or root form [48]. For instance, various Arabic verb conjugations like قرأ (“he read”), قرأت (“she read”), and سيقرأ (“he will read”) all share the same root meaning of reading. A stemming system consolidates these variants into the single-root form قرأ, significantly simplifying text processing.
This study employs the ISRI stemmer from Python’s NLTK library, an Arabic root-based stemmer developed by the Information Science Research Institute [49]. The stemmer attempts to extract the word’s root; when no root can be found, it returns a normalized form of the word rather than the original, unmodified word.
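A minimal usage sketch with NLTK follows; the sample tokens are ours, and the exact output forms depend on the stemmer's heuristics:

```python
from nltk.stem.isri import ISRIStemmer

stemmer = ISRIStemmer()
# Conjugated forms of "to read"; the stemmer reduces them toward the root قرأ
tokens = ["قرأ", "قرأت", "سيقرأ"]
roots = [stemmer.stem(token) for token in tokens]
print(roots)
```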

5. Experiments and Discussion

In this section, we present the experimental setup and discuss the results obtained. Two distinct experimental configurations were designed for the dataset: (i) classification without the application of SMOTE and (ii) classification with SMOTE applied to address class imbalance. Across all experiments, we evaluated the performance of four SVM kernels (linear, RBF, polynomial, and sigmoid), alongside Multinomial Naive Bayes (MNB) and weighted K-Nearest Neighbors (K-NN) classifiers.
The synthetic minority over-sampling technique (SMOTE) is a widely used method for addressing class imbalance in datasets [50]. It generates synthetic samples for the minority class by interpolating between existing minority instances and their nearest neighbors. For example, consider two minority-class instances $x_1 = (2, 3)$ and $x_2 = (4, 5)$ in a two-dimensional feature space. SMOTE can create a new synthetic instance $x'$ by interpolating between them as $x' = x_1 + \delta \cdot (x_2 - x_1)$, where $\delta \in [0, 1]$ is a random number. If, for example, $\delta = 0.5$, then $x' = (3, 4)$. This process reduces the bias toward the majority class during training and enhances classifier performance, particularly in imbalanced classification tasks. First, we cover the evaluation metrics.
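As a concrete sketch, SMOTE can be applied with the imbalanced-learn library as follows; the variable names and seed are our assumptions, and oversampling is applied to the training split only so that no synthetic samples leak into the test set:

```python
from imblearn.over_sampling import SMOTE

# X_train_vec: tf-idf matrix of the training split; y_train: stance labels
smote = SMOTE(random_state=42)  # default k_neighbors=5; seed is our choice
X_train_bal, y_train_bal = smote.fit_resample(X_train_vec, y_train)
```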

5.1. Evaluation Metrics

We evaluated classification performance using standard metrics, namely accuracy, precision (P), recall (R), and the F-score, which are derived from fundamental prediction outcomes. True positives (TPs) and true negatives (TNs) represent correct classifications of positive and negative samples, respectively, while false positives (FPs) and false negatives (FNs) indicate misclassified negative and positive samples. These measures form the basis for all subsequent performance calculations, with FPs and FNs being particularly critical for assessing model reliability in imbalanced datasets.
Precision represents the fraction of correctly predicted positives among all predicted positives, $P = \mathrm{TP}/(\mathrm{TP} + \mathrm{FP})$, while recall indicates the fraction of correctly identified positives among all actual positives, $R = \mathrm{TP}/(\mathrm{TP} + \mathrm{FN})$. The F-score provides their harmonic mean:
$$F_1 = \frac{2PR}{P + R}.$$
Following the evaluation protocol established in the SemEval-2016 stance detection shared task [17], we employed the macro-averaged F-score, focusing exclusively on the “favor” and “against” classes while treating “neutral” as negative. The $F_{\mathrm{avg}}$ metric is computed as
$$F_{\mathrm{avg}} = \frac{F_{\mathrm{favor}} + F_{\mathrm{against}}}{2},$$
where
$$F_{\mathrm{favor}} = \frac{2\,P_{\mathrm{favor}}\,R_{\mathrm{favor}}}{P_{\mathrm{favor}} + R_{\mathrm{favor}}}, \qquad F_{\mathrm{against}} = \frac{2\,P_{\mathrm{against}}\,R_{\mathrm{against}}}{P_{\mathrm{against}} + R_{\mathrm{against}}},$$
and $P_{\mathrm{favor}}$ and $P_{\mathrm{against}}$ denote the precision values for the classes “favor” and “against”, respectively, while $R_{\mathrm{favor}}$ and $R_{\mathrm{against}}$ represent the corresponding recall values.
This evaluation framework aligns with established stance detection competitions and enables direct performance comparisons across systems [17].
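This metric is straightforward to reproduce with scikit-learn by restricting the macro average to the two stance-bearing classes; a minimal sketch follows, where the label strings are our assumptions:

```python
from sklearn.metrics import f1_score

def f_avg(y_true, y_pred):
    """SemEval-2016 style F-average over the 'favor' and 'against' classes.

    'neither' tweets still act as negatives when scoring each class,
    but their own F-score is excluded from the average.
    """
    return f1_score(y_true, y_pred, labels=["favor", "against"], average="macro")

# Example: perfect on 'favor', one missed 'against' instance
y_true = ["favor", "against", "neither", "against"]
y_pred = ["favor", "against", "neither", "neither"]
print(f_avg(y_true, y_pred))
```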

5.2. Word Embedding Using FastText

The word embedding approach represents words as real-valued vectors in a predefined space, capturing semantic relationships by assigning similar vectors to words with related meanings [51].
We employed Facebook’s FastText library, which offers pretrained models for over 290 languages and supports both supervised and unsupervised word vector generation [51]. For our experiment, we trained a supervised FastText model on the datasets, using the default vector size of 300.
Model tuning focused on two key parameters:
  • Epochs: This parameter specifies the number of times each training instance is seen by FastText. The standard range is 5–50 epochs.
  • Learning rate (LR): This parameter controls the magnitude of the model update after processing each instance. The standard range for the LR is 0.1–1.0.
Our experimental results show that the optimal configuration (LR = 0.1 and 25 epochs) yielded peak performance, with an F-score of 70.86%, as detailed in Table 3.
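A minimal sketch of this setup with the fasttext Python package is given below; the file names are hypothetical, and each line of the training file uses FastText's standard supervised format (a __label__ prefix followed by the tweet text):

```python
import fasttext

# train.txt / test.txt: hypothetical files, one example per line,
# e.g., "__label__favor <preprocessed tweet text>"
model = fasttext.train_supervised(
    input="train.txt",
    lr=0.1,    # best learning rate found in tuning (standard range 0.1-1.0)
    epoch=25,  # best epoch count found in tuning (standard range 5-50)
    dim=300,   # default vector size used in this study
)
n, precision_at_1, recall_at_1 = model.test("test.txt")
```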

5.3. Supervised Classifiers

Our feature selection process began with term frequency–inverse document frequency (tf-idf) weighting, which evaluates word importance through a balance of two factors: term frequency (how often a word appears in a document) and document frequency (how many documents across the collection contain it) [52,53]. The weight of word $i$ in document $j$ is calculated as
$$w_{ij} = \mathrm{tf}_{ij} \cdot \log\left(D/\mathrm{df}_i\right),$$
where $\mathrm{tf}_{ij}$ denotes the number of occurrences of word $i$ in document $j$, $\mathrm{df}_i$ is the number of documents containing word $i$, and $D$ is the total document count. This formula automatically boosts distinctive terms while suppressing overly common ones.
When we cross-validated these tf-idf features using Chi-square selection, we found complete agreement—both methods identified the same set of meaningful features. This gave us confidence to proceed with these optimized word representations.
To determine the optimal number of features, we evaluated our system using 100, 500, 1000, 1500, and 2000 features, measuring the impact on $F_{\mathrm{avg}}$ across three classifiers: Multinomial Naïve Bayes (MNB), an SVM with the RBF kernel, and weighted K-NN (with $K = 1$). The dataset was partitioned into 70% for training and 30% for testing, adhering to the established evaluation framework used in the SemEval-2016 stance detection shared task [17]. We assessed all combinations of unigrams, bigrams, and trigrams across these feature set sizes, leveraging their established effectiveness for Arabic text processing.
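A sketch of this feature extraction setup with scikit-learn appears below; the unigram-plus-trigram combination mirrors the best-performing configuration in Table 4, while the per-vectorizer feature caps and the Chi-square cross-check are illustrative assumptions rather than the exact budgeting used in our experiments:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import chi2
from sklearn.pipeline import FeatureUnion

# Unigram + trigram tf-idf features (the N = 1 + 3 setting); capping each
# vectorizer at 500 terms approximates a 1000-feature budget (our assumption).
features = FeatureUnion([
    ("uni", TfidfVectorizer(ngram_range=(1, 1), max_features=500)),
    ("tri", TfidfVectorizer(ngram_range=(3, 3), max_features=500)),
])
X_train_vec = features.fit_transform(train_texts)  # train_texts: preprocessed tweets

# Cross-check the tf-idf features against Chi-square relevance scores
chi2_scores, p_values = chi2(X_train_vec, y_train)
```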
The results, as detailed in Table 4, reveal a clear performance trend. The SVM-RBF classifier achieved its highest $F_{\mathrm{avg}}$ of 0.7243 using a combination of 1000 unigram and trigram features. A close second, $F_{\mathrm{avg}} = 0.7238$, was attained with 1000 unigram and bigram features for the same classifier. Both values appear as 0.724 in the table due to rounding. MNB performed fairly well, but weighted 1NN yielded the least favorable performance.
Figure 2 presents the confusion matrix heatmaps for the top-performing feature configuration (1000 unigram and trigram features, N = 1 + 3 from Table 4) under imbalanced data conditions. Several observations can be made. SVM-RBF achieves the most favorable trade-off, yielding both higher TP for the favor class (454) and higher TN for the against class (317). MNB performs reasonably well but suffers from a larger number of false positives (157 cases). In contrast, W-1NN exhibits the weakest performance, misclassifying a substantial portion of both favor and against examples.
Interestingly, the number of TPs for the favor class is almost identical for MNB and SVM-RBF, despite the latter being the stronger classifier overall. The key advantage of SVM-RBF lies not in identifying favor instances but in avoiding the misclassification of against examples as favor. Specifically, MNB produces 157 FPs (against predicted as favor), whereas SVM-RBF reduces this to only 107 FPs. Consequently, although both models capture approximately 79% of the favor class (similar recall), SVM-RBF achieves superior precision by minimizing false positives.
In summary, the strength of SVM-RBF is evident in its effective management of true negatives and false positives rather than true positives alone. This capability leads to consistently higher precision and $F_{\mathrm{avg}}$ scores, underscoring its advantages over MNB in handling imbalanced stance detection tasks.

5.4. Hyper-Parameter Optimization

The goal of a learning algorithm $A$ is to find a function $f$ that minimizes the expected loss $L(x; f)$ over samples $x$ drawn from the true data distribution $G_x$. The algorithm $A$ maps the training set $X^{(\mathrm{train})}$ to $f$ and depends on hyper-parameters $\lambda$, which must be tuned to produce the best-performing model. Thus, the key challenge is selecting the optimal $\lambda^{*}$ that minimizes the generalization error:
$$\lambda^{*} = \operatorname*{argmin}_{\lambda \in \Lambda} \, \mathbb{E}_{x \sim G_x}\!\left[L\!\left(x;\, A_{\lambda}\left(X^{(\mathrm{train})}\right)\right)\right].$$
This selection process is known as hyper-parameter optimization [54].
Among the available optimization methods, grid search is particularly effective in low-dimensional spaces due to its simplicity [54]. In this study, we employed grid search to tune the hyper-parameters of the SVM classifier.
The SVM’s performance is highly sensitive to its hyper-parameters, particularly C (regularization parameter) and γ (kernel coefficient). The C parameter controls the trade-off between maximizing training accuracy and ensuring a smooth decision boundary—higher values enforce strict classification of training samples, while lower values prioritize smoother decision surfaces [55]. The γ parameter determines the influence range of a single training instance; higher values mean only nearby points significantly affect the decision boundary [56].
To optimize $C$ and $\gamma$, we employed a grid search strategy combined with 10-fold cross-validation on the training dataset. The search space for both parameters was defined as $\{0.01, 0.1, 1, 10, 100\}$. The results, presented in Table 5, were evaluated in terms of classification accuracy [54].
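A sketch of this search using scikit-learn's GridSearchCV is given below; the grid matches the search space stated above, accuracy scoring mirrors the evaluation in Table 5, and the parallelization setting is our choice:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Search space for C and gamma, matching the grid stated above
param_grid = {
    "C": [0.01, 0.1, 1, 10, 100],
    "gamma": [0.01, 0.1, 1, 10, 100],
}

search = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid,
    cv=10,               # 10-fold cross-validation on the training set
    scoring="accuracy",  # evaluated in terms of classification accuracy
    n_jobs=-1,           # parallelize over available cores (our choice)
)
search.fit(X_train_vec, y_train)  # tf-idf features and labels of the training split
print(search.best_params_, search.best_score_)
```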
Table 6 presents a comparative analysis of stance detection performance on the Hezbollah dataset, evaluating results both without and with the application of SMOTE. The table details the precision, recall, and F-score metrics for the “favor” and “against” classes, in addition to the overall macro-average F-score ($F_{\mathrm{avg}}$). Complementary to these results, Figure 3 and Figure 4 illustrate the comparative $F_{\mathrm{avg}}$ performance across different classifiers and n-gram feature combinations under both experimental conditions.
Employing a combination of unigram and trigram features, the SVM-RBF classifier yielded the strongest performance in our experiments, achieving an $F_{\mathrm{avg}}$ of 72.43% without SMOTE (Figure 3e; see also Table 4) and improving further to 78.62% with SMOTE (Figure 4e).
The analysis of feature combinations revealed that integrating multiple n-grams consistently equaled or exceeded the performance of any single feature type. For instance, without SMOTE, combining unigram and bigram features increased the MNB classifier’s $F_{\mathrm{avg}}$ from 68.85% (unigrams alone) and 63.92% (bigrams alone) to 69.41%, demonstrating a clear synergistic effect. Among the classifiers we evaluated—MNB, the SVM (with multiple kernels), and weighted K-NN—MNB delivered competitive results approaching the SVM’s performance while requiring significantly less computation time. The weighted K-NN, while inefficient for our dataset, demonstrated good performance on the SemEval-2016 English dataset, as reported in [5]. In kernel comparisons, RBF consistently outperformed the polynomial and sigmoid kernels on our Hezbollah dataset.

5.5. Discussion

In the context of stance detection on extremist groups, recall is a more critical metric than precision. The primary objective is to ensure that supportive or sympathetic stances are comprehensively detected, as overlooking such content (FNs) may result in missed signals of radicalization, recruitment, or propaganda dissemination, with direct implications for national security. While precision remains important for reducing false alarms and avoiding misallocation of resources, its trade-off is less consequential than recall in high-stakes domains. From an operational standpoint, it is preferable to flag more potential threats—even at the cost of some FPs—than to miss genuine supportive expressions. Thus, models for this task must be optimized for high recall while maintaining an acceptable level of precision to ensure feasibility in deployment.
The application of SMOTE underscores a critical divergence in how classifiers balance the precision–recall trade-off in stance detection, as evidenced by the confusion matrices in Figure 5. When compared to the non-SMOTE performance (Figure 2), both MNB and SVM-RBF exhibit improved specificity against the “against” class, with FPs decreasing from 157 to 103 and from 107 to 77, respectively. However, their behavior diverges sharply in terms of recall. MNB undergoes a conservative shift that severely compromises its sensitivity: TPs for the “favor” class drop from 453 to 405, while FNs increase from 123 to 171. This significant loss of recall—wherein numerous genuine supportive stances are overlooked—renders MNB unsuitable for applications such as counter-extremism, where detecting subtle or early signals of radicalization depends on high recall. In such contexts, failing to capture authentic “favor” instances poses a far greater risk than misclassifying a limited number of negative examples.
At the same time, the ethical implications of FPs cannot be ignored. Each misclassification of an “against” stance as “favor” constitutes a false accusation, potentially stigmatizing individuals as sympathizers of extremist groups when in fact they are not. This risk of creating “false victims” underscores the necessity of maintaining strong precision to protect individuals from wrongful suspicion and to preserve trust in automated systems. In this respect, SVM-RBF demonstrates a more desirable profile: it maintains high recall with only a marginal decrease in TPs (454 to 441) while simultaneously achieving greater precision by substantially reducing FPs and increasing TNs (317 to 347). By balancing the need to capture supportive stances (high recall) with the responsibility to avoid wrongful accusations (high precision), SVM-RBF emerges as the more robust and socially responsible classifier. This dual consideration highlights that effective stance detection in sensitive domains must optimize not only for security imperatives but also for ethical safeguards.
We observed an interesting performance gap between FastText ($F_{\mathrm{avg}} = 70.86\%$, Table 3) and SVM-RBF ($F_{\mathrm{avg}} = 78.62\%$). While we included FastText for its distinct neural approach and proven effectiveness with morphologically rich languages like Arabic, two main factors contributed to this difference: (1) our dataset size was relatively limited for optimal FastText performance, which typically benefits from larger corpora or pretrained embeddings, whereas SVM-RBF excels with smaller datasets when using well-engineered features like tf-idf and n-grams; and (2) FastText’s shallow architecture is less capable of modeling complex decision boundaries compared to SVM-RBF’s kernel-based approach.
We maintained FastText in our comparisons to provide a comprehensive methodological perspective and demonstrate how different approaches perform under identical conditions. Although it underperformed relative to SVM-RBF in this specific context, FastText remains a valuable baseline that could potentially achieve better results with larger-scale pretraining or enhanced preprocessing techniques.

6. Conclusions

This study aims to develop an automated system for stance detection in Arabic tweets, focusing on Hezbollah’s activities in Syria as a case study. The proposed algorithm addresses the unique challenges of Arabic tweets, including dialectal variations in stance identification, as noted by the annotators. Despite these complexities, the system demonstrates strong performance by first preprocessing tweets using the ISRI stemmer, then converting them into feature vectors via tf-idf vectorization. These vectors serve as input for multiple classifiers, including MNB, SVMs, and weighted K-NN.
The experiments were conducted on a dataset of approximately 7000 tweets from Hezbollah-related discussions. The results indicate that the system performs effectively, particularly when the dataset is balanced. We achieved an $F_{\mathrm{avg}}$ score of 78.62%—a competitive result compared to state-of-the-art performance in English tweet stance detection, where the highest reported F-average score for the SemEval-2016 dataset was 76.45% [5]. This underscores the system’s capability to handle the linguistic nuances of Arabic social media content.
For future work, we plan to incorporate a profiler similar to the one in [5]. This profiler would enable the system to infer demographic attributes of users, such as gender and age group, thereby providing richer context for stance detection and content analysis.

Author Contributions

Conceptualization, A.K.A. and A.M.A.; methodology, A.K.A.; software, A.K.A.; validation, A.K.A.; formal analysis, A.M.A. and A.K.A.; investigation, A.K.A.; resources, A.M.A.; data curation, A.K.A.; writing—original draft preparation, A.K.A.; writing—review and editing, A.M.A.; supervision, A.M.A.; funding acquisition, A.M.A. All authors have read and agreed to the published version of the manuscript.

Funding

The authors would like to thank the Ongoing Research Funding Program (ORFFT-2025-006-4), King Saud University, Riyadh, Saudi Arabia for financial support.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author. They are available for non-commercial research purposes.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Backlinko Team. Social Media Users & Growth Statistics. 2024. Available online: https://backlinko.com/social-media-users (accessed on 9 June 2025).
  2. DataReportal/Kepios. Global Social Media Statistics. 2025. Available online: https://datareportal.com/social-media-users (accessed on 12 June 2025).
  3. Boczkowski, P.J.; Matassi, M.; Mitchelstein, E. How Young Users Deal With Multiple Platforms: The Role of Meaning-Making in Social Media Repertoires. J. Comput.-Mediat. Commun. 2018, 23, 245–259. [Google Scholar] [CrossRef]
  4. Zubiaga, A.; Aker, A.; Bontcheva, K.; Liakata, M.; Procter, R. Detection and resolution of rumours in social media: A survey. ACM Comput. Surv. (CSUR) 2018, 51, 1–36. [Google Scholar] [CrossRef]
  5. Al-Ghadir, A.I.; Azmi, A.M.; Hussain, A. A novel approach to stance detection in social media tweets by fusing ranked lists and sentiments. Inf. Fusion 2020, 67, 29–40. [Google Scholar] [CrossRef]
  6. Williamson, M. Terrorism, War and International Law: The Legality of the Use of Force Against Afghanistan in 2001; Routledge: London, UK, 2016. [Google Scholar]
  7. Kruglanski, A.W.; Fishman, S. Terrorism between “syndrome” and “tool”. Curr. Dir. Psychol. Sci. 2006, 15, 45–48. [Google Scholar] [CrossRef]
  8. Azmi, A.M.; Al-Ghadir, A.I. Using Twitter as a digital insight into public stance on societal behavioral dynamics. J. King Saud Univ.-Comput. Inf. Sci. 2024, 36, 102078. [Google Scholar] [CrossRef]
  9. Callegaro, M.; Yang, Y. The role of surveys in the era of “big data”. In The Palgrave Handbook of Survey Research; Palgrave Macmillan: Cham, Switzerland, 2017; pp. 175–192. [Google Scholar]
  10. Bruns, A.; Stieglitz, S. Twitter Data: What do they represent? IT—Inf. Technol. 2014, 56, 240–245. [Google Scholar] [CrossRef]
  11. Alruily, M.; Manaf Fazal, A.; Mostafa, A.M.; Ezz, M. Automated Arabic long-tweet classification using transfer learning with BERT. Appl. Sci. 2023, 13, 3482. [Google Scholar] [CrossRef]
  12. Azmi, A.M.; Aljafari, E.A. Universal web accessibility and the challenge to integrate informal Arabic users: A case study. Univers. Access Inf. Soc. 2018, 17, 131–145. [Google Scholar] [CrossRef]
  13. Almuzaini, H.A.; Azmi, A.M. TaSbeeb: A judicial decision support system based on deep learning framework. J. King Saud Univ.-Comput. Inf. Sci. 2023, 35, 101695. [Google Scholar] [CrossRef]
  14. Al-Jouie, M.F.; Azmi, A.M. Automated Evaluation of School Children Essays in Arabic. In Proceedings of the 3rd International Conference on Arabic Computational Linguistics (ACLing 2017), Dubai, United Arab Emirates, 5–6 November 2017; Volume 117, pp. 19–22. [Google Scholar]
  15. Alqahtani, A.; Al-Saif, A. Automated Arabic essay evaluation. In Proceedings of the 17th International Conference on Natural Language Processing (ICON), Patna, India, 18–21 December 2020; pp. 181–190. [Google Scholar]
  16. AlShenaifi, N.; Azmi, A. Arabic dialect identification using machine learning and transformer-based models. In Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP 2022), Abu Dhabi, United Arab Emirates, 8 December 2022; pp. 464–467. [Google Scholar]
  17. Mohammad, S.; Kiritchenko, S.; Sobhani, P.; Zhu, X.; Cherry, C. Semeval-2016 Task 6: Detecting stance in tweets. In Proceedings of the SemEval, San Diego, CA, USA, 16–17 June 2016; pp. 31–41. [Google Scholar]
  18. Somasundaran, S.; Wiebe, J. Recognizing stances in ideological on-line debates. In Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, Los Angeles, CA, USA, 1–6 June 2010; pp. 116–124. [Google Scholar]
  19. Addawood, A.; Schneider, J.; Bashir, M. Stance Classification of Twitter Debates: The Encryption Debate as A Use Case. In Proceedings of the 8th International Conference on Social Media & Society, Toronto, ON, Canada, 28–30 July 2017. [Google Scholar]
  20. Taulé, M.; Martí, A.; Rangel, F.; Rosso, P.; Bosco, C.; Patti, V. Overview of the Task on Stance and Gender Detection in Tweets on Catalan Independence at IberEval 2017. In Proceedings of the 2nd Workshop on Evaluation of Human Language Technologies for Iberian Languages, Murcia, Spain, 19 September 2017. [Google Scholar]
  21. Pangtey, L.; Choudhary, R.; Mehta, P. Large Language Models Meet Stance Detection: A Survey of Tasks, Methods, Applications, Challenges and Future Directions. arXiv 2025, arXiv:2505.08464. [Google Scholar] [CrossRef]
  22. Garg, K.; Caragea, C. Stanceformer: Target-Aware Transformer for Stance Detection. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, Miami, FL, USA, 12–16 November 2024; pp. 4969–4984. [Google Scholar]
  23. Barel, G.; Tsur, O.; Vilenchik, D. Acquired TASTE: Multimodal Stance Detection with Textual and Structural Embeddings. In Proceedings of the 31st International Conference on Computational Linguistics, Abu Dhabi, United Arab Emirates, 19–24 January 2025; pp. 6492–6504. [Google Scholar]
  24. Ma, J.; Wang, C.; Xing, H.; Zhao, D.; Zhang, Y. Chain of stance: Stance detection with large language models. In Natural Language Processing and Chinese Computing (NLPCC 2024); Lecture Notes in Computer Science; Wong, D.F., Wei, Z., Yang, M., Eds.; Springer: Singapore, 2024; Volume 15363, pp. 82–94. [Google Scholar]
  25. Zhao, X.; Li, F.; Wang, H.; Zhang, M.; Liu, Y. Zero-Shot Stance Detection Based on Multi-Expert Collaboration. Sci. Rep. 2024, 14, 15923. [Google Scholar] [CrossRef]
  26. Ma, J.; Qian, Y.; Yang, J. Exploring Multi-Agent Debate for Zero-Shot Stance Detection: A Novel Approach. Appl. Sci. 2025, 15, 4612. [Google Scholar] [CrossRef]
  27. Abdelhade, N.; Soliman, T.; Ibrahim, H. Detecting Twitter users’ opinions of Arabic comments during various time episodes via deep neural network. In Proceedings of the International Conference on Advanced Intelligent Systems and Informatics, Cairo, Egypt, 1–3 September 2018. [Google Scholar]
  28. Alkhathlan, A.; Alahmadi, F.; Kateb, F.; Al-Khalifa, H. Constructing and evaluating ArabicStanceX: A social media dataset for Arabic stance detection. Front. Artif. Intell. 2025, 8, 1615800. [Google Scholar] [CrossRef]
  29. Charfi, A.; Khene, M.; Belguith, L.H. Stance Detection in Arabic with a Multi-Dialectal Cross-Domain Stance Corpus. Soc. Netw. Anal. Min. 2024, 14, 69. [Google Scholar] [CrossRef]
  30. Alturayeif, N.; Luqman, H.; Alyafeai, Z.; Yamani, A. StanceEval 2024: The first Arabic stance detection shared task. In Proceedings of the Second Arabic Natural Language Processing Conference, Bangkok, Thailand, 16 August 2024; pp. 774–782. [Google Scholar]
  31. Badran, M.; Hamdy, M.; Torki, M.; El-Makky, N.M. AlexUNLP-BH at StanceEval2024: Multiple contrastive losses ensemble strategy with multi-task learning for stance detection in Arabic. In Proceedings of the Second Arabic Natural Language Processing Conference, Bangkok, Thailand, 16 August 2024; pp. 823–827. [Google Scholar]
  32. Zhang, B.; Jiang, Y.; Wang, L.; Wang, Z.; Li, X. A Survey of Stance Detection on Social Media: New Directions and Perspectives. arXiv 2024, arXiv:2409.15690. [Google Scholar] [CrossRef]
  33. Gera, P.; Neal, T. Deep Learning in Stance Detection: A Survey. ACM Comput. Surv. 2025, 58, 1–37. [Google Scholar] [CrossRef]
  34. Omer, E. Using Machine Learning to Identify Jihadist Messages on Twitter. In Proceedings of the 2015 European Intelligence and Security Informatics Conference, Manchester, UK, 7–9 September 2015. [Google Scholar]
  35. Alsaedi, N.; Burnap, P. Arabic event detection in social media. In Computational Linguistics and Intelligent Text Processing (CICLing 2015); Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2015; Volume 9041, pp. 384–401. [Google Scholar]
  36. Ali, G. Identifying Terrorist Affiliations through Social Network Analysis Using Data Mining Techniques. Master’s Thesis, Valparaiso University, Valparaiso, IN, USA, 2016. [Google Scholar]
  37. Alhalabi, W.; Jussila, J.; Jambi, K.; Visvizi, A.; Qureshi, H.; Lytras, M.; Malibari, A.; Adham, R. Social mining for terroristic behavior detection through Arabic tweets characterization. Future Gener. Comput. Syst. 2021, 116, 132–144. [Google Scholar] [CrossRef]
  38. Al-Shawakfa, E.M.; Alsobeh, A.M.; Omari, S.; Shatnawi, A. RADAR#: An Ensemble Approach for Radicalization Detection in Arabic Social Media Using Hybrid Deep Learning and Transformer Models. Information 2025, 16, 522. [Google Scholar] [CrossRef]
  39. Himdi, H.; Alhayan, F.; Shaalan, K. Neural Networks and Sentiment Features for Extremist Content Detection in Arabic Social Media. Int. Arab J. Inf. Technol. (IAJIT) 2025, 22, 522–534. [Google Scholar] [CrossRef]
  40. Aldera, S.; Emam, A.; Al-Qurishi, M.; Alrubaian, M.; Alothaim, A. Annotated Arabic Extremism Tweets. IEEE Dataport, 8 August 2021. [Google Scholar] [CrossRef]
  41. Fraiwan, M. Identification of Markers and Artificial Intelligence-Based Classification of Radical Twitter Data. Appl. Comput. Inform 2022. [Google Scholar] [CrossRef]
  42. Mussiraliyeva, S.; Bolatbek, M.; Omarov, B.; Bagitova, K. Detection of extremist ideation on social media using machine learning techniques. In Proceedings of the International Conference on Computational Collective Intelligence (ICCCI 2020); Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2020; Volume 12496, pp. 743–752. [Google Scholar]
  43. Almohaimeed, N.; Azmi, A.M. Abstractive text summarization: A comprehensive survey of techniques, systems, and challenges. Comput. Sci. Rev. 2025, 57, 100762. [Google Scholar] [CrossRef]
  44. Abdel-Nabi, H.; Awajan, A.; Ali, M.Z. Deep learning-based question answering: A survey. Knowl. Inf. Syst. 2023, 65, 1399–1485. [Google Scholar] [CrossRef]
  45. Sabri, R.F.; Abdullah, N.A. A Review for Arabic Extremism Detection Using Machine Learning. Iraqi J. Sci. 2024, 65, 6617–6630. [Google Scholar] [CrossRef]
  46. Mohammad, S.; Kiritchenko, S.; Sobhani, P.; Zhu, X.; Cherry, C. A dataset for detecting stance in tweets. In Proceedings of the 10th Edition of the the Language Resources and Evaluation Conference (LREC), Portorož, Slovenia, 23–28 May 2016. [Google Scholar]
  47. Almuzaini, H.A.; Azmi, A.M. An unsupervised annotation of Arabic texts using multi-label topic modeling and genetic algorithm. Expert Syst. Appl. 2022, 203, 117384. [Google Scholar] [CrossRef]
  48. Al-Zyoud, A.; Al-Rabayah, W. Arabic Stemming Techniques: Comparisons and New Vision. In Proceedings of the 8th IEEE GCC Conference and Exhibition, Muscat, Oman, 1–4 February 2015. [Google Scholar]
  49. Taghva, K.; Elkhoury, R.J.; Coombs, J. Arabic stemming without a root dictionary. In Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC’05)-Volume II, Las Vegas, NV, USA, 4–6 April 2005; Volume 1, pp. 152–157. [Google Scholar]
  50. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
  51. Bojanowski, P.; Grave, E.; Joulin, A.; Mikolov, T. Enriching Word Vectors with Subword Information. Trans. Assoc. Comput. Linguist. 2017, 5, 135–146. [Google Scholar] [CrossRef]
  52. Sparck Jones, K. A Statistical Interpretation of Term Specificity and Its Application in Retrieval. J. Doc. 1972, 28, 11–21. [Google Scholar] [CrossRef]
  53. Salton, G.; Buckley, C. Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 1988, 24, 513–523. [Google Scholar] [CrossRef]
  54. Bergstra, J.; Bengio, Y. Random Search for Hyper-Parameter Optimization. J. Mach. Learn. Res. 2012, 13, 281–305. [Google Scholar]
  55. Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  56. Schölkopf, B.; Burges, C.J.; Vapnik, V.N. Comparing Support Vector Machines with Gaussian Kernels to Radial Basis Function Classifiers. In Proceedings of the 9th International Conference on Artificial Neural Networks (ICANN), Edinburgh, UK, 7–10 September 1997; pp. 63–72. [Google Scholar]
Figure 1. Our proposed terrorism stance detection system.
Figure 2. The confusion matrix heatmap for 1000 features (unigram and trigram) for three different classifiers.
Figure 3. The behavior of $F_{avg}$ without SMOTE across different n-gram combinations and classifiers: (a) unigram, (b) bigram, (c) trigram, (d) unigram + bigram, (e) unigram + trigram, and (f) bigram + trigram.
Figure 4. The behavior of $F_{avg}$ with SMOTE (Synthetic Minority Over-sampling Technique) for the same n-gram combinations and classifiers as in the previous figure. To enable direct comparison, the y-axis limits are kept identical.
Figure 5. The confusion matrix heatmaps for unigram and trigram features using two classifiers trained with SMOTE. For comparability with the earlier results, the dataset size is normalized to 1000 instances.
Table 1. Sample tweets from the dataset, accompanied by their translations and associated stance labels.

Tweet: أصلا حزب الله وداعش حلفاء ولم يتعرض حزب الله لداعش في سوريا أبدا وكلاهما كان يقتل في الشعب السوري وكلاهما صناعة إسرائيلية وما وراء ذلك كله تمثيل
Translation: In fact, Hezbollah and ISIS are allies, and Hezbollah never confronted ISIS in Syria; both were killing the Syrian people, both are an Israeli creation, and everything beyond that is mere theatrics.
Stance: Against

Tweet: قَسَم لو ما قَسَم مايفرق.. بعد لو يحط ايد عالقرآن وإيد على عيونه عساها العمى ويحلف حلف مايفيده،حزب الله بيرد بيرد والسيد ما يثنّي كلمته، بس اعطوا هالنتن كلينكس لايصيح
Translation: Whether he swears an oath or not, it makes no difference. Even if he puts one hand on the Qur’an and the other over his eyes—may they go blind—and swears the strongest oath, it won’t help him. Hezbollah will definitely respond, and “the Sayyid” doesn’t go back on his word. Just give that stinker a Kleenex so he doesn’t start crying.
Stance: Favor

Tweet: والله مو كاسر خاطري الا الايتام الي بالدول العربيه الي فقدوا اهلهم بالربيع العربي تخيل فجاه تصير بدون عائله وبدون وطن وعمرك صغير وعندك اخوات كنت اتمنى من سفارات دول الخليج انها تتبنى الايتام وتوفر لهم سكن وملبس وتعليم خصوصا دول الخليج مقتدره ماديا #سوريا
Translation: By God, what breaks my heart most are the orphans in Arab countries who lost their families during the Arab Spring. Imagine suddenly being without a family or a homeland, being so young and having siblings. I wish the embassies of the Gulf states would take in these orphans and provide housing, clothing, and education, especially the Gulf countries, which are financially capable. #Syria
Stance: Neutral

Tweet: هههههه تمثيلياتكم وأفلامكم أنتهت والجيش السوري بالتعاون مع روسيا وإيران يحررون سوريا من ما تبقي من جرذان الغرب و المدن التي يحررها دائما ما يمهل فترة للمدنيين ومن يبقون هم الخونة والعملاء فقط وعلي أساس الخونة مسالمون وليسوا بلطجية وإرهابيين ومسلحين
Translation: Hahaha—your theatrics and films are over. The Syrian army, in cooperation with Russia and Iran, is liberating Syria from what remains of the West’s “rats.” In the cities it liberates, it always gives civilians a grace period, and those who remain are only the traitors and collaborators. As if those traitors were “peaceful” and not thugs, terrorists, and armed men.
Stance: Favor

Tweet: الاف القتلى المسلمين السنة في سوريا و تهجير الملايين المسلمين السنة السوريين بمساعدة الحرس الثوري الايراني و الميليشيات الشيعية العنصرية الارهابية من حشد شعبي و ما يسمى حزب الله و حتى الجيش الروسي و انت تستغرب !!
Translation: Thousands of Sunni Muslims have been killed in Syria, and millions of Syrian Sunni Muslims have been displaced with the help of the Iranian Revolutionary Guard, sectarian terrorist Shiite militias from the Popular Mobilization Forces, the so-called Hezbollah, and even the Russian army, and you’re surprised?!
Stance: Against
Table 2. The distribution of tweets in our Hezbollah dataset across the three stance categories: in favor, against, and neutral.

Stance     Training (70%)   Testing (30%)   Total
Favor      2241             960             3201
Against    1646             706             2352
Neither    1050             450             1500
Total      4937             2116            7053
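As a rough illustration of the split in Table 2, the following minimal Python sketch produces a stratified 70/30 partition that preserves the per-class proportions. The tiny in-line dataset and all variable names are illustrative stand-ins, not the study's corpus or code.

# A minimal sketch of a stratified 70/30 split (toy data; the real dataset has 7053 labeled tweets).
from sklearn.model_selection import train_test_split

tweets = [f"tweet {i}" for i in range(9)]
labels = ["favor", "favor", "favor",
          "against", "against", "against",
          "neutral", "neutral", "neutral"]

# stratify=labels keeps the favor/against/neutral ratios equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    tweets, labels, test_size=0.30, stratify=labels, random_state=42)

print(len(X_train), len(X_test))  # 6 train, 3 test with this toy data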
Table 3. Results of word embedding using FastText. The best entry is marked with an asterisk.

Learning Rate   Epochs        F-Score
0 (default)     5 (default)   0.1361
0.1             25            0.7086*
0.2             25            0.7006
0.3             25            0.6957
1.0             25            0.6851
1.0             50            0.6847
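The learning-rate/epoch sweep in Table 3 can be approximated with the official fasttext Python package, whose train_supervised function exposes both parameters. The sketch below is illustrative only: the generated file, its contents, and the label set are our assumptions, not the authors' pipeline.

# A minimal sketch of FastText supervised training (pip install fasttext).
import fasttext

# Write a tiny illustrative training file in FastText's __label__ format;
# in the study, each line would be a preprocessed Arabic tweet with its stance label.
with open("train.txt", "w", encoding="utf-8") as f:
    f.write("__label__favor we fully support the party\n" * 3)
    f.write("__label__against the party commits crimes daily\n" * 3)

# lr and epoch correspond to the "Learning Rate" and "Epochs" columns of Table 3.
model = fasttext.train_supervised(input="train.txt", lr=0.1, epoch=25)

# model.test returns (number of samples, precision@1, recall@1).
n, p_at_1, r_at_1 = model.test("train.txt")
print(f"P@1={p_at_1:.4f}  R@1={r_at_1:.4f}")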
Table 4. Performance tuples $(P_{favor}, R_{favor}, F_{favor}, P_{against}, R_{against}, F_{against}, F_{avg})$ for MNB, the SVM with the RBF kernel, and W-1NN across different feature counts and n-gram configurations. The first six components are designated $P_+$, $R_+$, $F_+$, $P_-$, $R_-$, and $F_-$, respectively. N refers to the size of the n-gram, with 1 referring to unigram, 1+3 referring to unigram and trigram (combined), and so on. Within each classifier block, columns follow the order $P_+$, $R_+$, $F_+$, $P_-$, $R_-$, $F_-$, $F_{avg}$. For each classifier, the best $F_{avg}$ is marked with an asterisk (ties both marked).

N      #Feat | MNB: P+ R+ F+ P- R- F- F_avg              | SVM-RBF: P+ R+ F+ P- R- F- F_avg           | W-1NN: P+ R+ F+ P- R- F- F_avg
1      100   | 0.675 0.806 0.735 0.546 0.559 0.552 0.644 | 0.802 0.771 0.786 0.643 0.606 0.624 0.705  | 0.664 0.683 0.673 0.502 0.451 0.475 0.574*
1      500   | 0.714 0.796 0.753 0.573 0.663 0.615 0.684 | 0.814 0.779 0.796 0.639 0.640 0.639 0.718  | 0.527 0.638 0.577 0.388 0.447 0.415 0.496
1      1000  | 0.735 0.788 0.761 0.562 0.682 0.616 0.689 | 0.807 0.789 0.798 0.650 0.641 0.646 0.722  | 0.518 0.557 0.537 0.354 0.502 0.415 0.476
1      1500  | 0.745 0.781 0.763 0.562 0.713 0.629 0.696 | 0.799 0.782 0.790 0.644 0.656 0.650 0.720  | 0.928 0.157 0.269 0.356 0.977 0.522 0.395
1      2000  | 0.741 0.779 0.759 0.549 0.712 0.620 0.690 | 0.804 0.781 0.792 0.642 0.656 0.649 0.720  | 0.949 0.152 0.262 0.356 0.984 0.523 0.393
2      100   | 0.578 0.875 0.696 0.588 0.288 0.387 0.542 | 0.595 0.845 0.698 0.543 0.416 0.471 0.585  | 0.672 0.307 0.422 0.342 0.717 0.463 0.442
2      500   | 0.659 0.803 0.724 0.551 0.501 0.525 0.624 | 0.655 0.790 0.716 0.559 0.537 0.548 0.632  | 0.679 0.469 0.555 0.410 0.616 0.492 0.523
2      1000  | 0.680 0.785 0.729 0.553 0.546 0.550 0.639 | 0.672 0.780 0.722 0.553 0.568 0.561 0.641  | 0.652 0.588 0.619 0.419 0.504 0.457 0.538
2      1500  | 0.695 0.776 0.733 0.546 0.575 0.560 0.647 | 0.677 0.777 0.723 0.556 0.599 0.577 0.650  | 0.649 0.620 0.634 0.438 0.490 0.463 0.548
2      2000  | 0.714 0.772 0.742 0.544 0.611 0.576 0.659 | 0.687 0.778 0.729 0.552 0.603 0.577 0.653  | 0.625 0.640 0.633 0.435 0.435 0.435 0.534
3      100   | 0.481 0.962 0.641 0.500 0.082 0.141 0.391 | 0.482 0.961 0.642 0.496 0.095 0.160 0.401  | 0.769 0.135 0.230 0.332 0.922 0.488 0.359
3      500   | 0.506 0.949 0.660 0.583 0.186 0.282 0.471 | 0.506 0.951 0.661 0.601 0.196 0.296 0.478  | 0.801 0.200 0.320 0.344 0.900 0.497 0.409
3      1000  | 0.513 0.932 0.662 0.571 0.228 0.326 0.494 | 0.513 0.935 0.663 0.586 0.224 0.324 0.493  | 0.765 0.241 0.367 0.353 0.881 0.504 0.435
3      1500  | 0.527 0.921 0.670 0.582 0.269 0.368 0.519 | 0.524 0.925 0.669 0.599 0.253 0.356 0.513  | 0.768 0.256 0.384 0.353 0.855 0.499 0.441
3      2000  | 0.534 0.913 0.674 0.591 0.290 0.389 0.531 | 0.531 0.916 0.672 0.603 0.280 0.382 0.527  | 0.770 0.279 0.410 0.360 0.849 0.506 0.458
1+2    100   | 0.678 0.804 0.736 0.553 0.555 0.554 0.645 | 0.796 0.772 0.784 0.636 0.593 0.614 0.699  | 0.667 0.688 0.677 0.497 0.442 0.468 0.572
1+2    500   | 0.716 0.797 0.755 0.585 0.657 0.619 0.687 | 0.819 0.779 0.798 0.641 0.646 0.643 0.721  | 0.540 0.659 0.594 0.410 0.470 0.438 0.516
1+2    1000  | 0.742 0.790 0.765 0.578 0.676 0.623 0.694 | 0.807 0.787 0.797 0.653 0.649 0.651 0.724* | 0.533 0.589 0.559 0.367 0.504 0.424 0.492
1+2    1500  | 0.759 0.787 0.773 0.578 0.704 0.635 0.704 | 0.809 0.782 0.795 0.645 0.656 0.650 0.723  | 0.915 0.164 0.279 0.356 0.969 0.520 0.400
1+2    2000  | 0.760 0.781 0.770 0.579 0.728 0.645 0.708 | 0.803 0.787 0.795 0.644 0.650 0.647 0.721  | 0.935 0.161 0.275 0.356 0.977 0.522 0.399
1+3    100   | 0.678 0.804 0.736 0.553 0.555 0.554 0.645 | 0.796 0.772 0.784 0.636 0.593 0.614 0.699  | 0.667 0.688 0.677 0.497 0.442 0.468 0.572
1+3    500   | 0.717 0.793 0.753 0.582 0.659 0.618 0.686 | 0.821 0.780 0.800 0.643 0.649 0.646 0.723  | 0.541 0.656 0.593 0.411 0.476 0.441 0.517
1+3    1000  | 0.743 0.787 0.764 0.577 0.678 0.624 0.694 | 0.809 0.789 0.799 0.653 0.647 0.650 0.724* | 0.526 0.590 0.556 0.359 0.485 0.412 0.484
1+3    1500  | 0.758 0.786 0.772 0.577 0.703 0.634 0.703 | 0.808 0.783 0.795 0.645 0.654 0.650 0.722  | 0.917 0.167 0.283 0.356 0.969 0.521 0.402
1+3    2000  | 0.766 0.780 0.773 0.582 0.735 0.649 0.711* | 0.805 0.788 0.797 0.642 0.652 0.647 0.722 | 0.936 0.163 0.278 0.356 0.977 0.522 0.400
2+3    100   | 0.578 0.867 0.693 0.569 0.302 0.394 0.544 | 0.593 0.843 0.696 0.535 0.420 0.471 0.584  | 0.656 0.298 0.410 0.339 0.723 0.461 0.436
2+3    500   | 0.659 0.800 0.723 0.550 0.493 0.520 0.621 | 0.651 0.792 0.715 0.557 0.514 0.535 0.625  | 0.693 0.459 0.552 0.411 0.625 0.496 0.524
2+3    1000  | 0.679 0.773 0.723 0.540 0.545 0.542 0.632 | 0.672 0.771 0.718 0.546 0.568 0.557 0.637  | 0.660 0.560 0.606 0.413 0.527 0.463 0.534
2+3    1500  | 0.690 0.766 0.726 0.542 0.571 0.556 0.641 | 0.681 0.772 0.724 0.555 0.599 0.576 0.650  | 0.664 0.587 0.623 0.510 0.417 0.459 0.541
2+3    2000  | 0.704 0.770 0.735 0.534 0.592 0.562 0.648 | 0.690 0.779 0.732 0.549 0.600 0.573 0.653  | 0.658 0.617 0.637 0.453 0.511 0.480 0.558
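As a rough illustration of the feature pipeline behind Table 4, the sketch below builds a combined unigram + trigram tf-idf representation and fits the three classifier families. A FeatureUnion is one way to realize the 1+3 combination, since a plain ngram_range=(1, 3) would also include bigrams; the toy corpus, the 500/500 feature split, and all names are our assumptions, not the published configuration.

# A minimal sketch of tf-idf (unigram + trigram) features with MNB, SVM-RBF, and weighted 1-NN.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import FeatureUnion
from sklearn.svm import SVC

# Toy corpus; the study uses preprocessed Arabic tweets.
X_train = ["strongly support the party", "the party defends us",
           "the party kills civilians", "crimes by the party continue"]
y_train = ["favor", "favor", "against", "against"]

# Unigram + trigram features (the 1+3 row of Table 4), capped at 1000 total.
features = FeatureUnion([
    ("uni", TfidfVectorizer(ngram_range=(1, 1), max_features=500)),
    ("tri", TfidfVectorizer(ngram_range=(3, 3), max_features=500)),
])
X_tr = features.fit_transform(X_train)

classifiers = {
    "MNB": MultinomialNB(),
    "SVM-RBF": SVC(kernel="rbf", C=1.0, gamma=1.0),  # best C/gamma per Table 5
    "W-1NN": KNeighborsClassifier(n_neighbors=1, weights="distance"),
}
for name, clf in classifiers.items():
    clf.fit(X_tr, y_train)
    print(name, clf.predict(features.transform(["the party defends civilians"])))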
Table 5. Hyper-parameter tuning results for our dataset for the SVM classifier.

Kernel        Best C   Best γ
Linear        1.0      0.01
RBF           1.0      1.0
Polynomial    0.01     10
Sigmoid       10       0.01
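The per-kernel search summarized in Table 5 can be reproduced in spirit with scikit-learn's cross-validated search; we show RandomizedSearchCV below since the paper cites random search [54]. The sampling ranges, fold count, and toy inputs are illustrative assumptions rather than the exact protocol used in the study.

# A minimal sketch of randomized hyper-parameter search over C and gamma for an RBF SVM.
from scipy.stats import loguniform
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Toy data: eight short documents, two balanced classes.
docs = ["love it", "great stuff", "really good", "nice work",
        "hate it", "awful stuff", "really bad", "poor work"]
labels = ["favor"] * 4 + ["against"] * 4
X = TfidfVectorizer().fit_transform(docs)

# Sample C and gamma log-uniformly, in the spirit of random search [54].
param_dist = {"C": loguniform(1e-2, 1e2), "gamma": loguniform(1e-2, 1e1)}
search = RandomizedSearchCV(SVC(kernel="rbf"), param_dist, n_iter=20,
                            scoring="f1_macro", cv=2, random_state=0)
search.fit(X, labels)
print(search.best_params_, round(search.best_score_, 4))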
Table 6. Results for our dataset without SMOTE and after applying it. N refers to the size of the n-gram combinations (same as in Table 4). Within each half, columns follow the order $P_{favor}$, $R_{favor}$, $F_{favor}$, $P_{against}$, $R_{against}$, $F_{against}$, $F_{avg}$. For each N, the best $F_{avg}$ is marked with an asterisk.

N      Model        | Without SMOTE: P_f R_f F_f P_a R_a F_a F_avg | With SMOTE: P_f R_f F_f P_a R_a F_a F_avg
1      MNB          | 0.735 0.788 0.7606 0.562 0.682 0.6164 0.6885 | 0.789 0.688 0.7354 0.649 0.716 0.6809 0.7081
1      SVM linear   | 0.794 0.777 0.7852 0.638 0.622 0.6300 0.7076 | 0.847 0.718 0.7772 0.720 0.698 0.7089 0.7430
1      SVM RBF      | 0.807 0.789 0.7979 0.650 0.641 0.6455 0.7217 | 0.845 0.769 0.8053 0.774 0.749 0.7613 0.7833*
1      SVM poly     | 0.625 0.890 0.7347 0.653 0.565 0.6059 0.6703 | 0.973 0.329 0.4916 0.563 0.932 0.7017 0.5966
1      SVM sigmoid  | 0.728 0.820 0.7712 0.600 0.657 0.6275 0.6993 | 0.860 0.703 0.7734 0.723 0.661 0.6906 0.7320
1      W-1NN        | 0.518 0.557 0.5366 0.354 0.502 0.4150 0.4758 | 0.918 0.172 0.2891 0.574 0.762 0.6549 0.4720
1      W-5NN        | 0.641 0.779 0.7032 0.509 0.590 0.5468 0.6250 | 0.911 0.104 0.1869 0.517 0.733 0.6067 0.3968
1      W-10NN       | 0.664 0.780 0.7173 0.551 0.603 0.5758 0.6465 | 0.959 0.073 0.1348 0.519 0.767 0.6186 0.3767
2      MNB          | 0.680 0.785 0.7288 0.553 0.546 0.5497 0.6392 | 0.787 0.540 0.6408 0.505 0.691 0.5837 0.6122
2      SVM linear   | 0.642 0.817 0.7193 0.604 0.488 0.5397 0.6295 | 0.768 0.565 0.6509 0.683 0.538 0.6016 0.6263
2      SVM RBF      | 0.672 0.780 0.7220 0.553 0.568 0.5606 0.6413 | 0.794 0.564 0.6594 0.678 0.588 0.6298 0.6446*
2      SVM poly     | 0.539 0.877 0.6679 0.526 0.250 0.3392 0.5036 | 0.821 0.430 0.5643 0.745 0.427 0.5427 0.5535
2      SVM sigmoid  | 0.503 0.961 0.6608 0.597 0.204 0.3034 0.4821 | 0.812 0.538 0.6474 0.714 0.418 0.5274 0.5874
2      W-1NN        | 0.652 0.588 0.6186 0.419 0.504 0.4574 0.5380 | 0.753 0.494 0.5967 0.672 0.546 0.6026 0.5997
2      W-5NN        | 0.613 0.636 0.6241 0.414 0.460 0.4358 0.5299 | 0.609 0.641 0.6248 0.663 0.508 0.5749 0.5998
2      W-10NN       | 0.584 0.676 0.6265 0.406 0.406 0.4058 0.5162 | 0.602 0.621 0.6113 0.675 0.457 0.5448 0.5781
3      MNB          | 0.513 0.932 0.6621 0.571 0.228 0.3263 0.4942 | 0.839 0.234 0.3658 0.360 0.920 0.5172 0.4415
3      SVM linear   | 0.507 0.943 0.6593 0.591 0.201 0.2994 0.4793 | 0.850 0.231 0.3630 0.606 0.224 0.3267 0.3449
3      SVM RBF      | 0.513 0.935 0.6625 0.586 0.224 0.3241 0.4933 | 0.809 0.251 0.3834 0.615 0.227 0.3315 0.3574
3      SVM poly     | 0.505 0.943 0.6574 0.574 0.176 0.2690 0.4632 | 0.842 0.207 0.3327 0.647 0.199 0.3042 0.3185
3      SVM sigmoid  | 0.467 0.994 0.6353 0.412 0.010 0.0200 0.3276 | 0.979 0.097 0.1765 0.341 0.985 0.5069 0.3417
3      W-1NN        | 0.765 0.241 0.3670 0.353 0.881 0.5039 0.4354 | 0.379 0.902 0.5336 0.587 0.189 0.2861 0.4099
3      W-5NN        | 0.757 0.243 0.3683 0.352 0.889 0.5041 0.4362 | 0.747 0.253 0.3783 0.358 0.896 0.5119 0.4451*
3      W-10NN       | 0.510 0.921 0.6567 0.563 0.221 0.3175 0.4871 | 0.763 0.249 0.3756 0.357 0.886 0.5087 0.4422
1+2    MNB          | 0.742 0.790 0.7652 0.578 0.676 0.6230 0.6941 | 0.798 0.705 0.7483 0.670 0.714 0.6913 0.7198
1+2    SVM linear   | 0.795 0.776 0.7854 0.638 0.631 0.6343 0.7099 | 0.854 0.730 0.7874 0.734 0.717 0.7253 0.7564
1+2    SVM RBF      | 0.807 0.787 0.7967 0.653 0.649 0.6509 0.7238 | 0.850 0.766 0.8060 0.775 0.755 0.7647 0.7853*
1+2    SVM poly     | 0.634 0.894 0.7419 0.648 0.568 0.6053 0.6736 | 0.976 0.339 0.5034 0.565 0.924 0.7012 0.6023
1+2    SVM sigmoid  | 0.731 0.820 0.7734 0.601 0.643 0.6209 0.6972 | 0.864 0.699 0.7724 0.713 0.659 0.6849 0.7286
1+2    W-1NN        | 0.533 0.589 0.5594 0.367 0.504 0.4244 0.4919 | 0.932 0.181 0.3028 0.566 0.752 0.6460 0.4744
1+2    W-5NN        | 0.642 0.778 0.7037 0.518 0.596 0.5541 0.6289 | 0.913 0.118 0.2097 0.529 0.753 0.6211 0.4154
1+2    W-10NN       | 0.674 0.789 0.7268 0.547 0.597 0.5710 0.6489 | 0.962 0.078 0.1436 0.513 0.766 0.6145 0.3791
1+3    MNB          | 0.743 0.787 0.7641 0.577 0.678 0.6235 0.6938 | 0.797 0.704 0.7472 0.672 0.711 0.6907 0.7189
1+3    SVM linear   | 0.795 0.773 0.7839 0.638 0.630 0.6337 0.7088 | 0.848 0.730 0.7848 0.733 0.718 0.7256 0.7552
1+3    SVM RBF      | 0.809 0.789 0.7987 0.653 0.647 0.6500 0.7243 | 0.852 0.765 0.8062 0.775 0.758 0.7663 0.7862*
1+3    SVM poly     | 0.635 0.894 0.7422 0.645 0.565 0.6026 0.6724 | 0.974 0.338 0.5018 0.568 0.926 0.7042 0.6030
1+3    SVM sigmoid  | 0.733 0.822 0.7747 0.599 0.638 0.6180 0.6963 | 0.859 0.696 0.7686 0.712 0.653 0.6808 0.7247
1+3    W-1NN        | 0.526 0.590 0.5561 0.359 0.485 0.4124 0.4843 | 0.933 0.184 0.3071 0.567 0.749 0.6453 0.4762
1+3    W-5NN        | 0.644 0.778 0.7046 0.518 0.597 0.5547 0.6296 | 0.923 0.123 0.2164 0.528 0.749 0.6192 0.4178
1+3    W-10NN       | 0.677 0.787 0.7279 0.553 0.606 0.5786 0.6532 | 0.965 0.084 0.1541 0.513 0.762 0.6133 0.3837
2+3    MNB          | 0.679 0.773 0.7226 0.540 0.545 0.5422 0.6324 | 0.780 0.528 0.6297 0.496 0.684 0.5753 0.6025
2+3    SVM linear   | 0.645 0.809 0.7176 0.589 0.480 0.5290 0.6233 | 0.751 0.557 0.6392 0.674 0.519 0.5865 0.6129
2+3    SVM RBF      | 0.672 0.771 0.7179 0.546 0.568 0.5566 0.6373 | 0.786 0.556 0.6511 0.682 0.597 0.6364 0.6437*
2+3    SVM poly     | 0.548 0.863 0.6700 0.539 0.274 0.3631 0.5165 | 0.790 0.446 0.5704 0.756 0.439 0.5551 0.5627
2+3    SVM sigmoid  | 0.506 0.958 0.6624 0.594 0.217 0.3175 0.4900 | 0.795 0.530 0.6360 0.712 0.435 0.5403 0.5881
2+3    W-1NN        | 0.660 0.560 0.6059 0.413 0.527 0.4630 0.5344 | 0.590 0.673 0.6288 0.688 0.547 0.6095 0.6191
2+3    W-5NN        | 0.636 0.603 0.6191 0.414 0.488 0.4478 0.5335 | 0.590 0.652 0.6194 0.670 0.503 0.5749 0.5971
2+3    W-10NN       | 0.597 0.641 0.6183 0.393 0.419 0.4056 0.5120 | 0.593 0.650 0.6201 0.675 0.467 0.5518 0.5859
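To mirror the "With SMOTE" columns, oversampling must be applied to the training data only. The imbalanced-learn pipeline does exactly that, resampling inside fit and leaving the test set untouched; the sketch below also shows one way to compute the $F_{avg}$ of Tables 4 and 6, i.e., the macro F over the favor and against classes only. The k_neighbors value, the toy data, and all names are assumptions for illustration.

# A minimal sketch of SMOTE inside a pipeline (pip install imbalanced-learn).
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # applies SMOTE during fit only
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.svm import SVC

X_train = ["support them"] * 3 + ["oppose them"] * 6 + ["no opinion here"] * 4
y_train = ["favor"] * 3 + ["against"] * 6 + ["neutral"] * 4
X_test = ["support them", "oppose them"]
y_test = ["favor", "against"]

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("smote", SMOTE(k_neighbors=2, random_state=42)),  # small k for the toy data
    ("svm", SVC(kernel="rbf", C=1.0, gamma=1.0)),
])
pipe.fit(X_train, y_train)

# F_avg: mean of F_favor and F_against; the neutral class is excluded.
pred = pipe.predict(X_test)
f_avg = f1_score(y_test, pred, labels=["favor", "against"], average="macro")
print(round(f_avg, 4))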
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
