Article

Machine Learning Approaches for Detecting Hate-Driven Violence on Social Media

by
Yousef Abuhamda
* and
Pedro García-Teodoro
Network Engineering and Security Group, School of Computer Science and Telecommunication Engineering, University of Granada, 18071 Granada, Spain
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(21), 11323; https://doi.org/10.3390/app152111323
Submission received: 23 August 2025 / Revised: 2 October 2025 / Accepted: 17 October 2025 / Published: 22 October 2025

Abstract

Cyberbullying and hate-driven behavior on social media have become increasingly prevalent, posing serious psychological and social risks. This study proposes a machine learning-based approach to detect hate-driven content by integrating temporal and behavioral features—such as message frequency, interaction duration, and user activity patterns—alongside traditional text-based features. Furthermore, we extend our evaluation to include recent neural network architectures, namely ALBERT and BiLSTM, enabling a more robust representation of semantic and sequential patterns. Building on our previous research presented at JNIC-2024, we conduct a comparative evaluation of multiple classification algorithms using both existing and engineered datasets. The results show that incorporating non-textual features significantly improves detection accuracy and robustness. This work contributes to the development of intelligent cyberbullying detection systems and highlights the importance of behavioral context in online threat analysis.

1. Introduction

The widespread use of social media and digital communication platforms has facilitated new forms of interpersonal interaction, but it has also given rise to harmful online behaviors such as cyberbullying and hate-driven aggression. Cyberbullying is particularly prevalent among youth, who represent a digitally active demographic that frequently engages in online social spaces. Unlike traditional bullying, which is limited by physical presence, cyberbullying can occur anonymously, persist over time, and reach a broader audience, amplifying its psychological impact.
The academic interest in bullying originally emerged in the 1970s, focusing on student behavior within school environments. With the increasing adoption of digital media by adolescents and young adults, traditional forms of bullying have evolved into cyberbullying—a phenomenon that now poses significant challenges to mental health and social well-being. Accordingly, detecting such behavior has become an active research area in cybersecurity and artificial intelligence.
This study builds upon our previously published work, originally presented at the JNIC-2024 conference [1], but introduces several methodological advancements that extend beyond our earlier contribution. While the JNIC-2024 paper primarily focused on textual features with classical machine learning algorithms, the present work advances the state of research in the following ways:
  • Integration of Behavioral and Temporal Dynamics: We introduce a structured set of temporal (e.g., nighttime activity, session duration) and behavioral (e.g., toxicity rate, user affinity) indicators that, to our knowledge, have not been jointly evaluated with contextualized embeddings in the cyberbullying domain. These features provide richer insights into interaction dynamics than lexical analysis alone.
  • Systematic Multi-Family Evaluation: We design a unified evaluation framework that contrasts classical models (LinearSVC, Logistic Regression, Random Forest, KNN) with state-of-the-art neural networks (ALBERT and BiLSTM) across two feature settings: text-only and text + temporal/behavioral. This systematic approach highlights how different model families benefit from contextual signals.
  • Enhanced Preprocessing and Validation Pipeline: Beyond our earlier study, we implement refined preprocessing steps, including time normalization, balanced dataset construction, and model-specific text cleaning, ensuring comparability across heterogeneous models.
  • Neural Network Extension with Contextual Features: We provide the first comparative evidence of how transformer-based architectures (ALBERT) and sequence-oriented models (BiLSTM) perform when enriched with temporal and behavioral features, offering new empirical insights into context-aware detection.
  • Practical Relevance and Interpretability Potential: While not a full explainability study, we discuss how feature importance and contextual cues can support interpretability and outline how such models may be deployed in monitoring pipelines.
  • Scalability and Future Directions: Finally, we chart a roadmap toward broader generalization, including multimodal datasets (text, image, audio), demographic-aware modeling, and API-based real-time collection from platforms such as TikTok and X.
Placed within the broader context of cyberbullying detection research, this paper begins by establishing a foundation through a critical review of existing work on hate-driven behavior in digital spaces (Section 2), highlighting both progress and persisting challenges. Building on this, Section 3 explores the role of social network data, detailing the datasets employed and their relevance for behavioral and temporal analysis. Section 4 presents the results obtained from applying a range of machine learning models, providing insights into their comparative performance with and without temporal features. The paper concludes in Section 5 with a reflection on the findings, acknowledging current limitations and outlining potential directions for future investigation.

2. Background

Given the far-reaching consequences of hate-driven violence, particularly cyberbullying, it is imperative to develop robust detection mechanisms to mitigate its harmful effects. Social networks currently deploy various automatic tools that allow users to restrict comments, filter interactions, or manage visibility. However, these measures often provide only surface-level protection and fail to address the deeper psychological, temporal, and behavioral complexities underlying abusive online interactions.
Prama et al. [2] proposed a customized model using explainable artificial intelligence (XAI) to assess the severity of online abuse by incorporating psychological, linguistic, and behavioral indicators. Their approach is valuable in improving interpretability and tailoring detection sensitivity to different user profiles. Nevertheless, its reliance on predefined indicators makes it less adaptable to evolving abuse patterns and diverse cultural contexts.
García-Méndez and De Arriba-Pérez [3] explored the potential of large language models (LLMs) for real-time recognition of cyberbullying, proposing a stream-based machine learning framework to improve trust in automated monitoring systems. While LLMs demonstrate high accuracy and adaptability, their heavy dependence on computational resources and lack of explicit behavioral or temporal modeling limit their practical use in large-scale, fast-paced environments.
Yi, Zubiaga, and Long [4] introduced an emotion-adaptive training technique designed to detect subtle forms of abuse by aligning detection with the emotional content of user-generated posts. This method offers flexibility and can uncover less obvious instances of cyberbullying. However, it remains primarily text-driven and overlooks broader behavioral dynamics, such as interaction patterns or user history, which are critical for identifying repetitive and sustained abuse.
Balakrishnan and Kaity [5] conducted a systematic review of machine learning methods for cyberbullying recognition, classifying them by attributes (linguistic, psychological, behavioral) and evaluation criteria. While their review provides a valuable taxonomy of approaches, it also reveals that most research remains fragmented—typically isolating one feature type—rather than integrating multi-faceted signals. This fragmentation highlights a gap that persists in capturing the full complexity of cyberbullying.
Akter, Shahriar, and Cuzzocrea [6] proposed an LSTM-Autoencoder model trained on artificially created data to overcome the scarcity of annotated datasets. Their system generalizes well in low-resource contexts and captures sequential conversation patterns effectively. Nonetheless, the reliance on synthetic data introduces questions about ecological validity and whether the model can fully represent the nuances of real-world social media interactions.
Cheng et al. [7] designed a hierarchical attention model to capture the evolving tone of dialogue over time, offering insights into the escalation of harmful behavior. While this temporal modeling is a step forward, the method’s complexity and computational cost make it challenging for deployment in large-scale or resource-constrained scenarios.
Li et al. [8] shifted the focus toward the psychological and behavioral well-being of adolescents, analyzing how excessive internet use and cyberbullying exposure relate to mental and physical health issues. Although their findings underscore the real-world impacts of cyberbullying, their study is descriptive rather than predictive, leaving open the challenge of integrating such insights into automated detection systems.
Classical machine learning approaches have also been explored. Alsubait and Alfageh [9] reviewed techniques based on Multinomial Naive Bayes, Complement Naive Bayes, and Linear Regression, combined with feature extraction methods like Count Vectorizer and TF–IDF. While computationally efficient, these models often struggle with the complexity of informal online language and typically underperform compared to deep learning methods. Similarly, Mahar [10] tested multiple algorithms—including SVM, CNN, LSTM, and Naive Bayes—concluding that LSTM networks achieve the best performance. However, these models remain primarily text-centric, with limited ability to incorporate behavioral context.
Dadvar and Kai [11] advanced cyberbullying detection through a Convolutional Neural Network (CNN), leveraging deep learning’s strength in pattern recognition. While CNNs capture linguistic structures effectively, they generally require large, high-quality datasets and often act as “black boxes,” reducing transparency in decision-making. To improve interpretability, later works [12,13] employed CNN variations with word vectors or theoretical features, reporting stronger semantic representation. Yet these methods face persistent challenges of class imbalance, generalizability across datasets, and the lack of temporal or behavioral insights.
Beyond text-only detection, some studies have extended to multimodal and multilingual contexts. Idrizi and Hamiti [14] incorporated text, images, and audio using graph convolutional neural networks and advanced audio feature extraction (MFCCs). Their approach demonstrates promise in multimodal classification, but it also requires extensive processing resources and complex feature engineering. Similarly, Haidar et al. [15] developed a multilingual system covering Arabic, English, and Arabizi texts. While addressing linguistic diversity, their framework does not sufficiently tackle the sequential and evolving nature of cyberbullying conversations.
Other researchers have focused on explainability and prediction. Mehta and Passi [16] created an interpretable AI framework using decision trees and SHAP-based visualizations, demonstrating the importance of transparent outcomes for moderators and end-users. However, such methods often trade off predictive accuracy for interpretability.
Azumah et al. [17] combined convolutional neural networks with recurrent networks to capture both visual and linguistic features in online conversations. Their work is particularly important in addressing deliberate variations in spelling or syntax, which are common strategies used to evade detection. This robustness against adversarial language manipulation represents a notable strength. However, the complexity of combining CNNs and RNNs increases computational requirements, and their evaluation remains limited in scope, raising questions about scalability to large, heterogeneous social media platforms.
The ProTect model introduced by N. H. T. et al. [18] incorporated attention mechanisms to anticipate cyberbullying incidents before they escalate. Although this proactive approach is notable, it remains constrained by its reliance on repetitive textual patterns and lacks integration of behavioral context.
Finally, broader perspectives emphasize underexplored dimensions. Abouzaude and Savage [19] highlighted issues such as self-inflicted cyberbullying, blurred victim–perpetrator roles, and the role of bystanders, pointing to cultural and social factors that remain poorly integrated into current detection systems. Similarly, Shahi and Majchrzak [20] demonstrated the limitations of existing detection approaches when applied to bilingual contexts (English and German), showing how language and platform shifts undermine model robustness.
Table 1 summarizes the most relevant studies in the field, highlighting their methodologies, datasets, main contributions, and identified limitations.
Taken together, prior research demonstrates impressive progress in applying machine learning, deep learning, and multimodal approaches to cyberbullying detection. Yet critical limitations persist: most studies emphasize textual content while neglecting behavioral and temporal signals; others achieve accuracy but sacrifice interpretability; and few approaches generalize well across languages, platforms, or cultural settings. These gaps motivate the present study, which proposes a hybrid approach that integrates text-based analysis with behavioral and temporal features, supported by optimized preprocessing, statistical validation, and interpretable outputs.

3. Feature Engineering for Social Media Harassment Detection

3.1. Social Network Datasets

Social media datasets are fundamental for advancing research in online abuse detection, as they capture patterns of hate-driven behavior, interaction dynamics, and aggression in real-world digital environments. Platforms such as Instagram, TikTok, and X (formerly Twitter) offer valuable opportunities for analyzing harmful interactions at scale. However, obtaining large, representative, and high-quality datasets remains challenging due to privacy constraints, API access restrictions, and the significant financial costs associated with large-scale data collection.
In this study, we relied on an open-access dataset hosted on Mendeley Data (link), which contains 90,357 timestamped user messages annotated for the presence or absence of cyberbullying. The dataset provides a solid foundation for controlled experimentation, reproducibility, and comparative benchmarking across classical and deep learning models. Its structured format allows us to systematically evaluate the impact of engineered features and analyze how different model families respond to contextual signals.
We acknowledge, however, that this dataset introduces several important limitations. First, it is text-only and does not include multimodal content such as images, videos, or audio—all of which are increasingly integral to communication on social platforms. Second, it represents data from a single platform context, limiting our ability to assess how well the proposed approach generalizes across ecosystems with different linguistic norms, interaction styles, or cultural nuances. Third, the dataset is monolingual (English), preventing us from evaluating the robustness of our approach in multilingual or code-switched scenarios that frequently occur online. Despite these constraints, the dataset remains a valuable and widely used benchmark that enables reproducible experimentation and provides essential insights into the role of behavioral and temporal features.
The dataset’s structure is summarized in Table 2, which describes the key attributes used in our analysis:
In addition to these core textual and metadata fields, we engineered a range of behavioral and temporal features designed to enrich the detection process with contextual signals that are often overlooked in traditional approaches. These features, listed in Table 3, are derived from timestamp information and interaction metadata.
While these engineered features provide valuable contextual cues, they remain proxy-level indicators rather than direct measurements of interaction dynamics. Because the dataset lacks richer contextual metadata (such as reply hierarchies, thread structures, and multimedia attachments), these features cannot fully capture the complexities of online social behavior—including conversation escalation patterns, relationship dynamics, or multimedia-driven aggression. Nevertheless, their inclusion still significantly enhances model performance over text-only baselines and serves as an essential first step toward behavior-aware detection systems. To overcome these limitations, future research will aim to integrate:
  • Cross-platform datasets from sources such as Reddit, TikTok, and multilingual Twitter corpora to test generalization across different ecosystems.
  • Multimodal signals (e.g., image-text, audio-text) to capture visual and auditory aggression cues.
  • Graph- and thread-based modeling to reflect reply structures, interaction networks, and harassment propagation.
  • Multilingual and code-switched corpora to assess robustness in diverse linguistic environments.
While the present study focuses on a single, well-structured dataset to ensure experimental control and reproducibility, these planned extensions will significantly broaden the applicability, robustness, and ecological validity of the proposed framework in future work.

3.2. Data Preprocessing

Before applying machine learning models, the dataset must be preprocessed to ensure quality, consistency, and suitability for both classical and deep learning algorithms [1,21]. The preprocessing pipeline differs slightly depending on the type of model used.
An important challenge in this task is the class imbalance inherent to most real-world cyberbullying corpora. In our dataset, non-bullying instances significantly outnumber bullying messages (approximately X% vs. Y%, based on our observations), which can bias models toward predicting the majority class and lead to deceptively high accuracy but poor recall for the minority (bullying) class. To address this, we applied a combination of strategies during data preparation and model training:
  • Class Weighting for Classical Models: Algorithms such as LinearSVC, Logistic Regression, and Random Forest were trained with class_weight = 'balanced' in scikit-learn. This automatically scales the loss function inversely proportional to class frequency, ensuring that misclassifying bullying instances incurs a higher penalty.
  • Weighted Loss for Neural Models: For BiLSTM and ALBERT, we incorporated class weights directly into the training loss function to counteract imbalance and improve minority-class sensitivity.
  • Stratified Train–Test Split: The 80/20 data split was stratified to preserve the original class distribution in both training and test sets, ensuring that performance metrics reflect realistic operating conditions.
  • Evaluation Beyond Accuracy: Since accuracy alone can be misleading on imbalanced datasets, we emphasize precision, recall, and F1-score, which better capture the trade-offs between false positives and false negatives in this domain.
Although oversampling and data augmentation techniques (e.g., SMOTE or back-translation) were considered, we chose weighting-based approaches to preserve the natural data distribution and avoid introducing synthetic bias. Future work will explore these techniques, as well as advanced imbalance-aware training strategies such as focal loss or cost-sensitive ensemble methods.
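As a minimal sketch of these weighting and splitting strategies, assuming a feature matrix X and binary label vector y (both placeholder names) have already been constructed:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.utils.class_weight import compute_class_weight

# X: feature matrix, y: binary labels (1 = bullying, 0 = non-bullying);
# both are placeholders assumed to be built beforehand.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42  # stratified 80/20 split
)

# Classical models: scikit-learn reweights the loss inversely to class frequency.
clf = LinearSVC(class_weight="balanced")
clf.fit(X_train, y_train)

# Neural models: derive explicit per-class weights for the training loss,
# e.g. to pass as Keras's class_weight argument.
classes = np.unique(y_train)
weights = compute_class_weight("balanced", classes=classes, y=y_train)
keras_class_weight = dict(zip((int(c) for c in classes), weights))
```

Weighting leaves the empirical class distribution untouched, which is precisely the property that motivated choosing it over oversampling.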
For algorithms such as Logistic Regression, Random Forest, LinearSVC, and KNeighbors, the following preprocessing steps were applied (a code sketch follows the list):
  • Removing Patterns: Recurring, non-informative patterns and substrings (e.g., URLs, user mentions, repeated markup) are removed to improve the quality and relevance of the text for later analysis. Identifying these patterns with regular expressions or similar techniques makes the structure of the data more consistent and prepares it for subsequent preprocessing steps without compromising the integrity of its content [22,23].
  • Clean Text: Data cleaning standardizes and streamlines the textual content to facilitate meaningful analysis. This phase involves removing extra line breaks, punctuation marks, and other non-linguistic markers that may interfere with comprehension or introduce bias in subsequent analysis. Converting text to a consistent, uniform format reduces potential sources of noise, leaving the dataset well suited to natural language processing techniques [22,23].
  • Tokenization: Tokenization breaks raw text into individual tokens, usually words or subwords, based on whitespace or punctuation boundaries. It is an essential preprocessing step in natural language processing and yields units of text suitable for tasks such as feature extraction, sentiment analysis, and machine learning-based classification [22,23].
  • Lemmatization: Lemmatization reduces words to their base forms, known as lemmas, using language-specific rules. Collapsing inflectional and synonymous variants in this way reduces redundancy in the vocabulary and strengthens downstream natural language processing tasks [23].
  • Vectorization (TF-IDF): Term Frequency–Inverse Document Frequency (TF–IDF) transforms cleaned and tokenized text into numerical feature vectors suitable for machine learning algorithms. TF–IDF weights reflect the relative importance of terms in distinguishing documents [22,24], and remain widely adopted in practical pipelines through Python 3.13.7 libraries such as scikit-learn [25].
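The chain of steps above can be sketched as follows; the regular-expression patterns and the NLTK lemmatizer are illustrative choices rather than the exact implementation, and messages is a hypothetical list of raw posts:

```python
import re

from nltk.stem import WordNetLemmatizer      # requires the NLTK 'wordnet' data
from nltk.tokenize import word_tokenize      # requires the NLTK 'punkt' data
from sklearn.feature_extraction.text import TfidfVectorizer

lemmatizer = WordNetLemmatizer()

def preprocess(message: str) -> str:
    # Removing patterns: strip recurring non-informative substrings
    # (URLs, user mentions); the expressions here are illustrative.
    message = re.sub(r"https?://\S+|@\w+", " ", message)
    # Clean text: lowercase and drop punctuation and extra whitespace.
    message = re.sub(r"[^a-z0-9\s]", " ", message.lower())
    # Tokenization and lemmatization.
    tokens = [lemmatizer.lemmatize(tok) for tok in word_tokenize(message)]
    return " ".join(tokens)

# Vectorization: TF-IDF with the vocabulary cap documented in Section 3.6.
vectorizer = TfidfVectorizer(max_features=3000)
X_text = vectorizer.fit_transform(preprocess(m) for m in messages)  # 'messages' is hypothetical
```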
At this stage, the dataset is divided into two subsets: the training set and the test set. This division is essential for assessing the effectiveness of machine learning models and their wider applicability. The training set, which contains the bulk of the data, is used to train the models, helping them recognize patterns and interactions within the dataset.
On the other hand, the test set, which remains separate from the training data, is independent of the model and is used to measure the model’s performance on previously unseen data. Furthermore, time-based attributes and behavioral indicators were normalized and scaled during this stage to ensure balanced model input.
For ALBERT, a pretrained language model with an internal subword tokenizer, and for BiLSTM, which learns its embeddings during training, preprocessing focused on light cleaning only (a sketch follows the list):
  • Removing Patterns: Non-informative patterns were removed to enhance the quality of the input text.
  • Minimal Text Cleaning: Basic cleaning, such as removing extra spaces or control characters, was performed. Punctuation, capitalization, and other contextual cues were retained to preserve semantic information.
  • Tokenization and Vectorization: These steps were performed internally by the models’ tokenizers, which convert text into subword embeddings suitable for deep learning. Lemmatization and TF-IDF were not applied.
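A minimal sketch of this light-cleaning and tokenization stage, assuming the Hugging Face transformers and Keras preprocessing APIs; messages is a hypothetical list of raw posts, and the 128-token BiLSTM sequence length is an assumption made for illustration:

```python
import re

from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer
from transformers import AutoTokenizer

def light_clean(message: str) -> str:
    # Keep punctuation and casing; only collapse whitespace and control characters.
    return re.sub(r"\s+", " ", message).strip()

cleaned = [light_clean(m) for m in messages]  # 'messages' is hypothetical

# ALBERT: subword tokenization is delegated to the pretrained tokenizer.
albert_tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
albert_inputs = albert_tokenizer(cleaned, truncation=True, max_length=128,
                                 padding="max_length", return_tensors="np")

# BiLSTM: integer-index tokenization plus padding (sequence length assumed).
keras_tokenizer = Tokenizer(num_words=10_000)
keras_tokenizer.fit_on_texts(cleaned)
bilstm_inputs = pad_sequences(keras_tokenizer.texts_to_sequences(cleaned), maxlen=128)
```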

3.3. System Architecture

To operationalize the proposed framework, we designed a modular system architecture that accommodates both classical machine learning (ML) models and modern deep learning approaches. The pipeline is organized into three main branches (Figure 1):
  • Classical ML Pipeline: Preprocessing includes pattern removal, text cleaning, tokenization, and lemmatization, followed by TF-IDF vectorization. Temporal and behavioral features are then concatenated with textual embeddings. Four traditional classifiers were trained: Logistic Regression, Random Forest, LinearSVC, and KNN.
  • BiLSTM Pipeline: Input text is preprocessed with light cleaning before being tokenized and padded. The BiLSTM model processes sequences bidirectionally, capturing contextual dependencies from both past and future tokens.
  • ALBERT Pipeline: Similar to BiLSTM, preprocessing remains minimal to preserve semantic cues. Text is encoded using the ALBERT tokenizer, which generates subword embeddings. The model leverages parameter sharing and factorized embeddings to efficiently capture contextual patterns.
All pipelines follow the same train–test split (80%/20%), ensuring comparability. Models are evaluated using Accuracy, Precision, Recall, F1-score, and confusion matrices. Results are systematically compared across text-only and text + temporal configurations to assess the impact of engineered features and deep contextual embeddings.
Figure 1. System architecture of the proposed framework. The pipeline separates classical ML (TF-IDF with text and temporal features) from modern neural approaches (BiLSTM and ALBERT). Both text-only and text + temporal/behavioral configurations were tested under the same training and evaluation settings.

3.4. Data Modelling

Various machine learning models are considered for hate-driven detection, implemented here using Python 3.13.7 machine learning packages. The models were chosen based on popularity, ease of use, and training and prediction time [1].
  • LinearSVC Model: The Linear Support Vector Classification (LinearSVC) model is based on the Support Vector Machine (SVM) framework [26], widely available through scikit-learn [25]. The model fits a hyperplane to maximize the margin between classes, optimizing linear decision boundaries to minimize classification errors. LinearSVC has demonstrated efficiency in high-dimensional feature spaces, making it suitable for text classification tasks such as cyberbullying detection [21]. By training the LinearSVC model on annotated data, the classifier gains the ability to distinguish toxic from non-toxic samples with effective generalization in real-world contexts.
  • Random Forest Model: Random Forests are ensemble classifiers that construct multiple decision trees on bootstrapped samples and combine their outputs for robust predictions [27]. Each tree is trained on a random subset of features, reducing overfitting and improving generalizability. This diversity allows Random Forests to perform well in noisy social media environments where individual features may be weak predictors. When applied to cyberbullying detection, Random Forests provide a balance of accuracy and interpretability, supported by their implementation in Python 3.13.7 libraries [25].
  • LogisticRegression Model: Logistic Regression is a statistical model originally introduced by Cox [28], commonly implemented in modern ML libraries [25]. It estimates the probability of binary outcomes—such as toxic versus non-toxic posts—based on weighted linear combinations of features. Logistic Regression remains popular for its computational efficiency, well-calibrated probability outputs, and ease of interpretation. Recent studies [21] confirm its utility in social media classification, though its performance often lags behind more complex models in handling nuanced or context-dependent abuse.
  • KNN Model: KNN is a non-parametric method introduced by Cover and Hart [29]. It classifies new samples based on the majority class among their k nearest neighbors in the feature space. This makes KNN highly adaptive to diverse data distributions and particularly effective when decision boundaries are locally smooth. In cyberbullying detection scenarios [21], KNN provides a simple yet flexible approach; however, its computational cost grows with dataset size since all training samples must be stored and compared.
  • ALBERT Model: The ALBERT (A Lite BERT) model is a transformer-based architecture designed to reduce the memory and computation requirements of BERT while maintaining high performance [30]. By factorizing embedding parameters and employing cross-layer parameter sharing, ALBERT efficiently models long-range dependencies and semantic nuances within text. This makes it particularly useful for capturing context in online conversations where abusive language may be subtle or indirect.
  • BiLSTM Model: Bidirectional Long Short-Term Memory (BiLSTM) networks extend the traditional LSTM architecture by processing input sequences in both forward and backward directions [31]. This allows the model to exploit both past and future context simultaneously, improving performance on sequential data such as conversational exchanges. BiLSTM has been applied successfully in Arabic sentiment analysis [32] and is equally applicable to cyberbullying detection, where understanding context from both directions is critical for identifying subtle harassment patterns.

3.5. Feature Importance and Justification

This section provides a detailed justification for each behavioral and temporal feature used in the models (a pandas sketch of the timestamp-derived indicators follows the list).
  • Avg_IsWeekend: This metric monitors users’ activity during weekends, which may indicate a break from the weekday routine. Numerous studies suggest that weekend internet use is associated with increased emotional expressiveness and impulsive behavior.
  • Avg_IsNightMessage: Activity at night is usually associated with aggressive behavior in the late hours or impulsive reactions. This scale measures possible emotional changes or feelings of social isolation.
  • Avg_TimeSinceLastMsg: The time it takes to reply can be an indication of the intensity of the dialogue. Short periods may indicate enthusiastic conversations, while longer periods may reflect a decrease in interaction or a lack of interest.
  • Avg_SessionDuration: Long user sessions may indicate obsessive or direct behavior. This measure helps to distinguish between unintentional use and intentional actions.
  • ToxicityRate: This scale measures the amount of hostile or abusive language in users’ messages. A high level of toxicity is strong evidence of cyberbullying, especially when it occurs frequently.
  • ThreatLevel: This scale reveals the use of threatening language or a hostile tone, which indicates an increase in harassment.
  • UserAffinity: This indicator measures the number of times users interact with each other. A high level of closeness with frequent aggression may indicate specific bullying.
  • MsgVolume: This metric keeps track of the number of messages exchanged in one conversation. An increase in the volume of messages with a high level of toxicity can indicate that the abuse continues.
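The purely timestamp-derived indicators above can be computed per user with pandas; the annotation-based measures (toxicity, threat, affinity, message volume) require labeled message content and are omitted here. In the sketch below, df is a hypothetical message-level frame, and the night window (22:00–06:00) and 30-minute session cutoff are illustrative assumptions:

```python
import pandas as pd

# 'df' is a hypothetical frame with one row per message and columns
# 'user_id' and 'timestamp' (parsed datetimes), sorted per user and time.
df = df.sort_values(["user_id", "timestamp"])
df["is_weekend"] = df["timestamp"].dt.dayofweek >= 5            # Saturday/Sunday
df["is_night"] = (df["timestamp"].dt.hour >= 22) | (df["timestamp"].dt.hour < 6)

# Seconds since the same user's previous message.
df["gap_s"] = df.groupby("user_id")["timestamp"].diff().dt.total_seconds()

# A gap above 30 minutes (assumed cutoff) opens a new session.
new_session = df["gap_s"].isna() | (df["gap_s"] > 1800)
df["session_id"] = new_session.groupby(df["user_id"]).cumsum()
session_len = (df.groupby(["user_id", "session_id"])["timestamp"]
                 .agg(lambda t: (t.max() - t.min()).total_seconds()))

features = df.groupby("user_id").agg(
    Avg_IsWeekend=("is_weekend", "mean"),
    Avg_IsNightMessage=("is_night", "mean"),
    Avg_TimeSinceLastMsg=("gap_s", "mean"),
)
features["Avg_SessionDuration"] = session_len.groupby(level="user_id").mean()
```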

3.6. Hyperparameter Settings and Justification

To ensure the reproducibility, transparency, and scientific rigor of our experiments, we explicitly document the hyperparameter configurations employed across all machine learning and deep learning models. The selection of these parameters was guided by a combination of prior literature, empirical tuning, and iterative experimentation, aiming to achieve an optimal balance between predictive performance, generalization ability, and computational efficiency. The following subsections summarize the key hyperparameters for each model category and justify their selection.
For the classical machine learning models (LinearSVC, Logistic Regression, Random Forest, and K-Nearest Neighbors (KNN)), we used hyperparameters that are standard in the text classification literature.
  • LinearSVC: We used the default squared hinge loss and an L2 regularization penalty with the regularization strength C = 1.0. This setting is known to provide robust generalization on sparse, high-dimensional feature spaces like TF-IDF. The default tolerance (tol = 1 × 10⁻⁴) and maximum iterations (max_iter = 1000) ensured convergence without overfitting.
  • Logistic Regression: Configured with penalty = 'l2', solver = 'lbfgs', and max_iter = 1000. This configuration stabilizes optimization in large-scale text classification and provides well-calibrated probabilities.
  • Random Forest: We employed n_estimators = 100 decision trees and left max_depth unconstrained, allowing the ensemble to capture complex, nonlinear patterns. A fixed random_state = 42 was used to ensure reproducibility.
  • K-Nearest Neighbors (KNN): We set the number of neighbors to k = 5, a commonly used choice that balances bias and variance in text classification tasks. Euclidean distance was used as the similarity metric.
These configurations were determined after exploratory experiments and align with standard practices in natural language processing tasks, as documented in previous cyberbullying detection research. Moreover, all classical models were evaluated with consistent preprocessing pipelines and TF-IDF vectorization (max_features = 3000), ensuring a fair comparison across approaches.
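These settings map directly onto scikit-learn constructors. A minimal sketch of the four configurations follows; the class_weight option reflects the imbalance handling described in Section 3.2:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

models = {
    "LinearSVC": LinearSVC(C=1.0, tol=1e-4, max_iter=1000,
                           class_weight="balanced"),
    "LogisticRegression": LogisticRegression(penalty="l2", solver="lbfgs",
                                             max_iter=1000,
                                             class_weight="balanced"),
    "RandomForest": RandomForestClassifier(n_estimators=100, max_depth=None,
                                           random_state=42,
                                           class_weight="balanced"),
    "KNN": KNeighborsClassifier(n_neighbors=5, metric="euclidean"),
}
```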
For the deep learning baselines (BiLSTM and ALBERT), hyperparameters were carefully chosen to balance performance with available computational resources; a Keras sketch of the BiLSTM configuration follows the list.
  • BiLSTM: The network architecture consisted of an Embedding layer (vocab_size = 10,000, embedding_dim = 128), followed by a Bidirectional LSTM layer with 64 hidden units. Two Dropout layers with a rate of 0.5 were used to reduce overfitting, and a fully connected dense layer (64 neurons, ReLU activation) preceded the final sigmoid classification layer. The model was optimized using the Adam optimizer with a learning rate of 0.001, a batch size of 64, and trained for a maximum of 10 epochs with early stopping (patience = 3).
  • ALBERT: We fine-tuned the pre-trained albert-base-v2 transformer model with a learning rate of 2 × 10⁻⁵ using the AdamW optimizer. The maximum sequence length was set to 128 tokens, with a batch size of 16 for both training and evaluation. The model was trained for 3 epochs, and a linear learning rate scheduler with warmup was applied.
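A minimal Keras sketch of the BiLSTM configuration described above; the validation monitor and restore_best_weights flag are assumptions, as is the reuse of the class weights computed in Section 3.2:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import (Bidirectional, Dense, Dropout,
                                     Embedding, LSTM)
from tensorflow.keras.optimizers import Adam

model = Sequential([
    Embedding(input_dim=10_000, output_dim=128),  # vocab_size, embedding_dim
    Bidirectional(LSTM(64)),                      # 64 hidden units per direction
    Dropout(0.5),
    Dense(64, activation="relu"),
    Dropout(0.5),
    Dense(1, activation="sigmoid"),               # binary bullying / non-bullying
])
model.compile(optimizer=Adam(learning_rate=0.001),
              loss="binary_crossentropy", metrics=["accuracy"])

early_stop = EarlyStopping(monitor="val_loss", patience=3,
                           restore_best_weights=True)
# model.fit(bilstm_inputs, y_train, validation_split=0.1, epochs=10,
#           batch_size=64, callbacks=[early_stop],
#           class_weight=keras_class_weight)
```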

4. Experimental Results

Detection performance is assessed by counting True Negatives (TN), False Positives (FP), False Negatives (FN), and True Positives (TP); together, these four counts form a confusion matrix. Several standard metrics derived from them are used to evaluate the constructed classifiers (a scikit-learn sketch follows the list):
  • Precision: Precision is also known as the positive predictive value. It is the proportion of predicted positives that are actually positive, as given in Equation (1):
    Precision = TP / (TP + FP)  (1)
  • Recall: Recall is the proportion of actual positives that are predicted positive, as given in Equation (2):
    Recall = TP / (TP + FN)  (2)
  • F-Measure: The F-Measure is the harmonic mean of precision and recall; the standard F-measure (F1) gives equal importance to both, as given in Equation (3):
    F-Measure = 2 × Precision × Recall / (Precision + Recall)  (3)
  • Accuracy: Accuracy is the proportion of correctly classified instances (true positives and true negatives), as given in Equation (4):
    Accuracy = (TP + TN) / (TP + FP + TN + FN)  (4)
To evaluate the impact of the newly engineered features, we performed comparative experiments under two configurations: using only textual data and using combined textual plus temporal/behavioral features.
The specific detection results for the dataset considered are shown in Table 4, Table 5, Table 6, Table 7, Table 8, Table 9 and Table 10. They are as follows:
Table 4 presents the classification performance of four machine learning models trained exclusively on textual features. The LinearSVC model achieved the highest accuracy (85.33%) and F1-score (76.79%), demonstrating superior generalization capability. While Random Forest reached the highest precision (83.97%), its recall (67.01%) was notably lower, indicating a more conservative approach. Logistic Regression yielded stable results across all metrics. In contrast, the K-Nearest Neighbors (KNN) algorithm showed the weakest performance, particularly in recall (31.47%).
Figure 2 highlights the confusion matrix for each classifier. LinearSVC has the fewest false negatives (1737), demonstrating higher sensitivity to real cyberbullying cases. It also maintains a moderate number of false positives (915), striking a balance between over-reporting and missing true positives. LogisticRegression correctly identifies 4313 bullying cases (TP) but misclassifies 1810 real cases as non-threatening (FN), reflecting a cautious bias. RandomForest minimizes false positives (783), which helps maintain user trust, but this comes at the cost of detecting fewer true bullying cases (TP = 4103). While KNN effectively filters non-bullying content (FP = 446), it fails to detect most abusive interactions, with 4196 false negatives, making it unsuitable for high-risk detection scenarios.
With the addition of temporal and behavioral features, Table 5 shows a significant improvement in the performance of the models. LinearSVC again outperforms the others, achieving an accuracy of 86.76% and a recall of 75.83%. LogisticRegression shows a similar trend, with an accuracy of 86.72% and an F1-score of 79.31%, confirming its robustness with contextual features. RandomForest maintains high accuracy (84.53%) but shows only marginal improvements in other metrics. KNN performs poorly, with accuracy dropping to 63.06% and an F1-score of 37.37%, highlighting its weak adaptation to high-dimensional or irregularly distributed features like session duration or reaction timing.
The confusion matrices in Figure 3 show that LinearSVC detects the highest number of true positives (4643), followed closely by LogisticRegression (4599). RandomForest exhibits conservatism, identifying fewer bullying cases (TP = 4045) while maintaining the lowest false positives (FP = 717). Despite its simplicity, KNN fails to adapt to feature complexity, missing 4131 cyberbullying cases (FN). These results underscore the importance of temporal behavior integration, particularly in models like LinearSVC, which leverages richer feature spaces effectively.
Table 6 presents the classification performance of modern neural models trained exclusively on textual features. ALBERT achieves the best overall results, with an accuracy of 89.12% and the highest recall (85.56%), confirming its strong ability to capture subtle linguistic cues associated with cyberbullying. Its F1-score of 84.20% reflects a well-balanced trade-off between precision and recall, making it the most reliable text-only model. BiLSTM also performs competitively, reaching an accuracy of 86.49% and an F1-score of 80.05%, demonstrating that sequential modeling is effective for this task, though slightly less robust than the transformer-based approach.
Figure 4 shows the confusion matrices for the neural models trained solely on textual features. ALBERT demonstrates the strongest detection capability, correctly identifying 5239 bullying cases (TP) while keeping false negatives relatively low (FN = 1081). This balance highlights its effectiveness in minimizing missed cyberbullying incidents without overproducing false alarms (FP = 884). BiLSTM also shows competitive results, with 4898 true positives and a slightly higher number of false negatives (FN = 1217) compared to ALBERT. However, it maintains a stable performance, with a larger number of correctly classified non-bullying instances (TN = 10,732). Overall, ALBERT outperforms BiLSTM by achieving higher sensitivity to abusive content, reinforcing its suitability as the leading text-only neural architecture for cyberbullying detection.
Table 7 shows the performance of modern neural network-based algorithms. ALBERT outperforms BiLSTM, achieving an accuracy of 88.42% and a recall of 82.31%, indicating its strong capability to capture contextual and temporal dependencies in the data. BiLSTM also performs well, with an accuracy of 86.47% and an F1-score of 80.04%, showing that sequential modeling provides robust predictions but slightly underperforms compared to transformer-based approaches. Overall, ALBERT demonstrates the most effective balance of precision, recall, and F1-score, highlighting the advantage of leveraging modern language models for temporal and behavioral feature–enhanced classification.
Figure 5 presents the confusion matrix breakdown of the ALBERT-based classifier and the BiLSTM model. ALBERT emerges as the superior model for cyberbullying detection using temporal features. It achieves the highest number of true positives (TP = 5067), indicating an unmatched ability to correctly identify instances of cyberbullying. Furthermore, it maintains the lowest number of false negatives (FN = 1089), meaning it misses the fewest actual bullying cases among all algorithms tested. This suggests that the transformer architecture of ALBERT is exceptionally well-suited to capturing the nuanced contextual and sequential patterns present in the temporal data, allowing for more accurate and comprehensive detection.
The BiLSTM model also demonstrates strong performance, ranking second overall. It records a high TP value (4905) and a low FN value (1218), both figures representing a marked improvement over the best traditional ML model (LinearSVC). This performance underscores the capability of recurrent neural networks with bidirectional processing to effectively model temporal dependencies and long-range sequences in user behavior, which are critical for identifying cyberbullying.
Table 8 identifies the gains achieved by integrating temporal attributes. LinearSVC shows a significant increase in recall (+4.20%) and a marked improvement in accuracy (+1.43%), suggesting that behavioral context enhances its sensitivity without compromising precision. It is followed by LogisticRegression, with a +4.67% gain in recall and a +1.68% improvement in accuracy. RandomForest achieves only a marginal accuracy boost (+0.04%) and a slight decline in recall (−0.95%), indicating limited benefits from temporal data. In contrast, KNN exhibits a severe drop in accuracy (−11.25%) and only a minimal recall increase (+1.06%), highlighting its inefficiency with complex temporal features. These trends support the hypothesis that temporal and behavioral cues enrich the feature space, particularly for linear or margin-based classifiers like LinearSVC and LogisticRegression.
When considering the modern neural models, the results present a more nuanced picture. ALBERT achieves the highest text-only performance (89.12% accuracy, 85.56% recall), but its metrics decline slightly with temporal features (−0.70% accuracy, −3.25% recall). This suggests that ALBERT’s transformer-based embeddings already capture much of the contextual information, and proxy temporal attributes may introduce noise rather than complementary signals. Conversely, BiLSTM shows near-identical accuracy across both settings (−0.02% change) and a negligible recall gain (+0.12%), reflecting its robustness to sequential variations but limited benefit from artificial temporal augmentation.
Figure 6 compares the F1-scores of all classifiers under text-only (blue) and text + temporal (green) settings. Classical models, especially LinearSVC and Logistic Regression, show clear improvements when temporal and behavioral features are added, highlighting the complementary value of contextual cues. In contrast, deep models behave differently: BiLSTM maintains stable performance, while ALBERT performs best with text alone (F1 = 84.20%), slightly dropping with feature fusion. This suggests that transformer models already capture much of the contextual information directly from text, reducing the benefit of additional engineered features.
Figure 7 shows the most influential features learned by the two linear models. Highly offensive and derogatory words (e.g., sexist, idiot, fuck) exhibit strong positive weights, indicating a strong association with the bullying class. In contrast, neutral or context-specific terms (e.g., thank, please, source) carry negative weights, signaling non-bullying communication. This contrast confirms that both models effectively distinguish harmful language from neutral discourse based on learned lexical patterns.
Figure 8 illustrates the most influential features identified by the Random Forest classifier. Behavioral and temporal indicators such as Aggression_Ratio, Intent_to_Harm, and Harm_Weighted_Aggression rank among the top predictors, highlighting their strong influence on classification decisions. Their presence alongside linguistic terms confirms that combining contextual interaction data with textual features enhances the model’s ability to detect cyberbullying patterns.
Figure 9 illustrates how individual features contribute to a single classification decision. Red segments represent features that push the prediction towards the bullying class, while blue segments indicate features that push it towards the non-bullying class. Hostile lexical cues (e.g., hate) and behavioral indicators such as Aggression_Ratio and Intent_to_Harm strongly increase the bullying probability, whereas neutral or mitigating terms exert an opposing influence. This visualization highlights how the model dynamically balances textual aggression signals with contextual nuances to form a decision at the message level.
Figure 10 shows a SHAP summary plot that captures the overall influence of individual features on the model’s predictions across the entire dataset. Each dot represents a single prediction, with its horizontal position indicating the SHAP value—i.e., how strongly that feature pushes the prediction toward either the bullying (positive values) or non-bullying (negative values) class. The color gradient encodes the feature value magnitude: red dots correspond to high feature values, while blue dots indicate low feature values.
Key observations include the strong positive impact of temporal/behavioral features such as Aggression_Ratio, Intent_to_Harm, and Harm_Weighted_Aggression, which significantly increase the probability of bullying detection when their values are high. Conversely, neutral or less aggressive lexical terms tend to have low SHAP values, pulling the prediction towards the non-bullying class. This global interpretability analysis confirms that combining contextual behavioral signals with textual cues enhances the model’s capacity to detect subtle patterns of online aggression.
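A plot of this kind can be produced with the shap library. The sketch below is illustrative only: rf, X_sample, and feature_names are hypothetical stand-ins for the fitted Random Forest, a dense sample of the combined feature matrix, and its column names.

```python
import shap

# Tree-based explainer for the Random Forest classifier.
explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X_sample)

# Older SHAP releases return one array per class for classifiers;
# pick the bullying-class array if so.
vals = shap_values[1] if isinstance(shap_values, list) else shap_values

# Global view: one dot per prediction, colored by feature value, positioned
# by its push toward the bullying (+) or non-bullying (−) class.
shap.summary_plot(vals, X_sample, feature_names=feature_names)
```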
Figure 11 shows the robustness evaluation of classical machine learning classifiers under three input conditions (clean, noisy, and adversarial) for both text-only and text + temporal feature configurations. Across all models, a gradual decline in F1-score is observed as input perturbations increase, highlighting the sensitivity of detection performance to data quality and adversarial manipulation. LinearSVC and Logistic Regression maintain the highest overall performance in clean scenarios (F1 ≈ 0.77) but experience notable degradation under adversarial conditions (F1 ≈ 0.64–0.65), reflecting their vulnerability to targeted attacks. Random Forest exhibits a similar trend, with its F1-score dropping from 0.746 (clean) to 0.606 (adversarial), while KNN remains comparatively stable yet consistently underperforms across all conditions.
Figure 12 presents a comparative analysis of execution times for classical and neural machine learning models under two feature configurations: text-only and text + temporal. Results are displayed on a logarithmic scale to capture the significant computational differences across algorithms. As expected, lightweight classical models such as K-Nearest Neighbors and LinearSVC exhibit the fastest execution, completing within seconds even when temporal features are included. Logistic Regression shows a marked increase in training time from 5.25 s to 3.6 min due to the added feature complexity, while Random Forest’s time decreases slightly when enriched with contextual data, likely due to improved feature separability. Deep learning models demand substantially greater computational resources, with BiLSTM requiring up to 25 min and ALBERT reaching 33 h when incorporating temporal information.
Figure 13 compares the Matthews Correlation Coefficient (MCC) scores of classical machine learning models across text-only and text + temporal feature configurations. MCC is a balanced measure that accounts for true and false positives and negatives, making it particularly suitable for evaluating performance on imbalanced datasets. LinearSVC and Logistic Regression achieve the highest MCC scores (~0.66) under both feature settings, with negligible differences, confirming their robustness and consistency. Random Forest also maintains stable performance with MCC values of 0.645 (text-only) and 0.640 (text + temporal), suggesting limited sensitivity to contextual augmentation. In contrast, KNN shows a substantial decline in MCC when temporal features are added, dropping from 0.385 to 0.097, further reinforcing its poor adaptability to complex, high-dimensional representations.
Figure 14 presents the Cohen’s Kappa scores for classical machine learning classifiers under text-only and text + temporal feature configurations. Cohen’s Kappa measures the level of agreement between predicted and actual classifications beyond chance, providing a robust indicator of model reliability. LinearSVC and Logistic Regression demonstrate the strongest agreement, achieving scores of approximately 0.66 in both settings, indicating stable and consistent classification performance. Random Forest follows closely, with a slight decrease from 0.636 to 0.630 when temporal features are incorporated, suggesting minimal sensitivity to contextual enrichment. In contrast, KNN shows a substantial drop in agreement from 0.325 to 0.088, highlighting its poor adaptability to complex feature spaces and limited reliability in cyberbullying detection tasks.
Figure 15 compares the ROC-AUC scores of classical machine learning classifiers under text-only and text + temporal feature settings. ROC-AUC evaluates the discriminative power of models across varying classification thresholds, with higher values indicating a stronger ability to distinguish between bullying and non-bullying classes. Logistic Regression and LinearSVC achieve the highest ROC-AUC values (0.920) in both configurations, confirming their robustness and consistent detection performance. Random Forest follows closely with scores of 0.914 (text-only) and 0.913 (text + temporal), showing negligible sensitivity to contextual features. In contrast, KNN performs significantly worse, with ROC-AUC dropping from 0.720 to 0.564 when temporal features are included, underscoring its limited capacity to generalize across complex feature spaces.
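All three of these agreement and ranking measures are available in scikit-learn. A minimal sketch, assuming a fitted classifier clf and the held-out split from Section 3.2:

```python
from sklearn.metrics import cohen_kappa_score, matthews_corrcoef, roc_auc_score

y_pred = clf.predict(X_test)
mcc = matthews_corrcoef(y_test, y_pred)
kappa = cohen_kappa_score(y_test, y_pred)

# ROC-AUC needs continuous scores: margin classifiers expose
# decision_function, probabilistic ones predict_proba.
scores = (clf.decision_function(X_test) if hasattr(clf, "decision_function")
          else clf.predict_proba(X_test)[:, 1])
auc = roc_auc_score(y_test, scores)
```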
To improve the robustness and reliability of our evaluation, we replaced the previous single 80/20 train–test split with a 10-fold stratified cross-validation procedure for all classical machine learning models. In this setup, the dataset is partitioned into ten equally sized folds while preserving the original class distribution. Each model is trained on nine folds and tested on the remaining fold iteratively, and the final results are reported as the mean ± standard deviation (SD) of the metrics across all folds. This approach minimizes the risk of performance fluctuations due to a single random split and provides a more statistically reliable estimate of the models’ generalization capabilities.
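A minimal sketch of this procedure with scikit-learn, where clf, X, and y are placeholders for any of the classical pipelines; shuffling with a fixed seed is an assumption made for illustration:

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
fold_f1 = cross_val_score(clf, X, y, cv=cv, scoring="f1")  # one score per fold
print(f"F1 = {fold_f1.mean():.4f} ± {fold_f1.std():.4f}")
```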
Table 9 shows that margin-based classifiers—particularly LinearSVC and Logistic Regression—benefit significantly from the integration of temporal and behavioral features. LinearSVC achieved the highest F1-score (0.7952 ± 0.0054) with temporal features, improving notably from its text-only configuration (0.7692 ± 0.0036). Logistic Regression followed a similar trend, increasing its F1-score from 0.7680 ± 0.0034 to 0.7950 ± 0.0064. These improvements indicate that engineered behavioral signals enhance the discriminative power of traditional models and allow them to capture subtler patterns of online aggression.
To further validate that the observed improvements are statistically meaningful rather than random fluctuations, we conducted paired statistical tests on the per-fold F1-scores of the classifiers. Specifically, we applied a paired t-test and a Wilcoxon signed-rank test to compare the performance of each model under the text-only configuration versus the text + temporal/behavioral configuration.
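Both tests are available in SciPy. A minimal sketch, where f1_text_only and f1_text_temporal are hypothetical length-10 arrays holding a classifier’s per-fold F1-scores under each configuration:

```python
from scipy.stats import ttest_rel, wilcoxon

# Paired comparisons over the same 10 folds for both configurations.
t_stat, t_p = ttest_rel(f1_text_temporal, f1_text_only)
w_stat, w_p = wilcoxon(f1_text_temporal, f1_text_only)
print(f"paired t-test p={t_p:.4f}, Wilcoxon signed-rank p={w_p:.4f}")
```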
Table 10 presents that the improvements observed in LinearSVC and Logistic Regression are statistically significant (p < 0.01) under both tests. This confirms that contextual feature integration provides a genuine performance advantage. In contrast, differences for Random Forest and KNeighbors were not statistically significant, suggesting that these models are less sensitive to the addition of engineered contextual features.

5. Conclusions and Future Work

This study investigated the efficacy of integrating temporal and behavioral features with conventional linguistic analysis to enhance cyberbullying detection. Our work contributes empirically by systematically comparing traditional machine learning models with modern neural architectures under both text-only and feature-augmented settings. The results demonstrate that margin-based classifiers such as LinearSVC and Logistic Regression benefit substantially from contextual features, while neural models, particularly ALBERT, already capture much of the sequential context directly from text.
The evaluation showed that ALBERT achieved the highest text-only performance (F1 = 84.20%, accuracy = 89.12%), outperforming both BiLSTM and traditional ML models. With temporal augmentation, LinearSVC and Logistic Regression improved notably in recall, highlighting the complementary role of engineered behavioral cues for classical approaches. In contrast, ALBERT’s performance declined slightly when temporal proxies were added, suggesting that transformer-based embeddings already internalize contextual dependencies. Importantly, interpretability analyses using feature importance and SHAP visualizations confirmed that aggression-related metrics (e.g., toxicity ratio, harm-weighted aggression) are consistently predictive and complement linguistic signals, supporting transparency in model decision-making.
At the same time, several limitations must be acknowledged. The dataset employed, while structured and accessible, does not fully reflect the heterogeneity of real-world social media environments. It is restricted to text-based posts and lacks the multimodal richness typical of modern platforms, where comments, replies, memes, videos, and voice messages interact in complex ways. Furthermore, the temporal and behavioral features used here are proxy measures derived from message timestamps rather than real interaction dynamics. In addition, although we employed stratified cross-validation and paired significance tests, the evaluation remains confined to a single dataset without cross-dataset validation, leaving external generalizability an open question.
Looking forward, the proposed framework is inherently platform-agnostic and thus well-suited for future extensions beyond the current scope. While this study focused exclusively on text-based data, future work will explore the integration of multimodal information—including textual, visual, and audio signals—to capture more nuanced and disguised forms of online aggression. Such an extension is motivated by the increasing prevalence of images, memes, and voice interactions in modern social media platforms, as highlighted in recent research. In addition, incorporating graph-based modeling could provide deeper insights into community-level harassment dynamics and escalation patterns, while leveraging research-grade APIs (e.g., TikTok or X) would facilitate access to large-scale, real-world datasets for more comprehensive cross-platform validation. Finally, future efforts will include systematic ablation studies and broader robustness evaluations to better quantify the contribution of individual features and strengthen the generalizability of the proposed approach.
Although this work demonstrates strong detection performance, it does not yet include a detailed profiling analysis of runtime behavior, memory efficiency, or inference latency—all of which are essential considerations for real-world deployment. Future research will address these aspects by systematically benchmarking computational costs across different deployment scenarios, ensuring that the proposed detection framework remains both effective and operationally scalable in production environments.
Moreover, ethical and privacy considerations are central to the deployment of any cyberbullying detection system. Future work will incorporate robust safeguards, including strict data anonymization, compliance with data protection regulations (e.g., GDPR), and transparent model behavior explanations, to ensure that detection capabilities are implemented responsibly and respect user rights and digital privacy.
In short, this work underscores the necessity of moving beyond isolated text analysis and toward multimodal, context-aware, and interpretable cyberbullying detection systems. By combining empirical benchmarking with transparent feature-level insights, we provide a meaningful step toward safer and more inclusive online environments.

Author Contributions

Conceptualization, Y.A.; methodology, Y.A.; software, Y.A.; validation, Y.A. and P.G.-T.; formal analysis, Y.A.; investigation, Y.A.; resources, Y.A.; data curation, Y.A.; writing—original draft preparation, Y.A.; writing—review and editing, P.G.-T.; visualization, Y.A.; supervision, P.G.-T.; project administration, Y.A.; funding acquisition, P.G.-T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding. MDPI Applied Sciences fully waived the article processing charge (APC) through an institutional discount associated with Professor Pedro García Teodoro.

Data Availability Statement

The dataset used in this study was derived and extended from an open-access dataset available on GitHub at https://github.com/yabuhamda/Machine-Learning-Approaches-for-Detecting-Hate-Driven-Violence-on-Social-Media.git (accessed on 24 October 2025). The authors created an enhanced version that includes additional preprocessing, feature engineering, and new annotations to enrich the original data and support the experiments conducted in this work. The enhanced datasets are publicly available in the referenced repository to promote transparency, reproducibility, and further research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Abuhamda, Y.; García Teodoro, P. Detection and analysis of hate-driven violence on social networks. In Proceedings of the IX Jornadas Nacionales de Investigación en Ciberseguridad (JNIC), Seville, Spain, 27–29 May 2024; pp. 125–131. Available online: https://idus.us.es/items/508e612b-3df5-458a-98be-2d5f7c3fb7d2 (accessed on 29 September 2025).
2. Prama, T.T.; Amrin, J.F.; Anwar, M.M.; Sarker, I.H. AI enabled user-specific cyberbullying severity detection with explainability. arXiv 2025, arXiv:2503.10650.
3. García-Méndez, S.; De Arriba-Pérez, F. Promoting security and trust on social networks: Explainable cyberbullying detection using large language models in a stream-based machine learning framework. arXiv 2025, arXiv:2505.03746.
4. Yi, P.; Zubiaga, A.; Long, Y. Detecting harassment and defamation in cyberbullying with emotion-adaptive training. arXiv 2025, arXiv:2501.16925.
5. Balakrishnan, V.; Kaity, M. Cyberbullying detection and machine learning: A systematic literature review. Artif. Intell. Rev. 2023, 56, 1375–1416.
6. Akter, M.S.; Shahriar, H.; Cuzzocrea, A. A trustable LSTM-autoencoder network for cyberbullying detection on social media using synthetic data. arXiv 2023, arXiv:2308.09722.
7. Cheng, L.; Guo, R.; Silva, Y.N.; Hall, D.; Liu, H. Modeling temporal patterns of cyberbullying detection with hierarchical attention networks. ACM/IMS Trans. Data Sci. 2021, 2, 8.
8. Li, J.; Wu, Y.; Hesketh, T. Internet use and cyberbullying: Impacts on psychosocial and psychosomatic wellbeing among Chinese adolescents. Comput. Hum. Behav. 2023, 138, 107461.
9. Alsubait, T.; Alfageh, D. Comparison of machine learning techniques for cyberbullying detection on YouTube Arabic comments. Int. J. Comput. Sci. Netw. Secur. 2021, 21, 1–5. Available online: http://paper.ijcsns.org/07_book/202101/20210101.pdf (accessed on 29 September 2025).
10. Mahat, M. Detecting cyberbullying across multiple social media platforms using deep learning. In Proceedings of the International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), Greater Noida, India, 4–5 March 2021.
11. Dadvar, M.; Kai, E. Cyberbullying detection in social networks using deep learning based models. In Proceedings of the International Conference on Big Data Analytics and Knowledge Discovery, Bratislava, Slovakia, 14–17 September 2020; pp. 245–255.
12. Yin, D.; Xue, Z.; Hong, L.; Davison, B.D.; Kontostathis, A.; Edwards, L. Detection of harassment on Web 2.0. In Proceedings of the Content Analysis in the Web 2.0 Workshop, Madrid, Spain, 20–21 April 2009; pp. 1–7.
13. Zhang, X.; Tong, J.; Vishwamitra, N.; Whittaker, E.; Mazer, J.P.; Kowalski, R.; Hu, H.; Luo, F.; Macbeth, J.; Dillon, E. Cyberbullying detection with a pronunciation-based convolutional neural network. In Proceedings of the 15th IEEE International Conference on Machine Learning and Applications (ICMLA), Anaheim, CA, USA, 18–20 December 2016; pp. 740–745.
14. Idrizi, E.; Hamiti, M. Classification of text, image, and audio messages used for cyberbullying on social media. In Proceedings of the 2023 46th MIPRO ICT and Electronics Convention (MIPRO), Opatija, Croatia, 22–26 May 2023; pp. 1–6.
15. Haidar, B.; Maroun, C.; Serhrouchni, A. A multilingual system for cyberbullying detection: Arabic content detection using machine learning. Adv. Sci. Technol. Eng. Syst. J. 2017, 2, 275–284.
16. Mehta, H.; Passi, K. Social media hate speech detection using explainable artificial intelligence (XAI). Algorithms 2022, 15, 291. Available online: https://www.mdpi.com/1999-4893/15/8/291 (accessed on 29 September 2025).
17. Azumah, S.W.; Elsayed, N.; ElSayed, Z.; Ozer, M.; La Guardia, A. Deep learning approaches for detecting adversarial cyberbullying and hate speech in social networks. arXiv 2024, arXiv:2406.17793.
18. Nitya Harshitha, T.; Prabu, M.; Suganya, E.; Sountharrajan, S.; Bavirisetti, D.P.; Gadde, N.; Uppu, L.S. ProTect: A hybrid deep learning model for proactive detection of cyberbullying on social media. Front. Artif. Intell. 2024, 7, 1269366. Available online: https://www.frontiersin.org/articles/10.3389/frai.2024.1269366/full (accessed on 29 September 2025).
19. Aboujaoude, E.; Savage, M.W. Cyberbullying: Next-generation research. World Psychiatry 2023, 22, 45–46.
20. Shahi, G.K.; Majchrzak, T.A. Hate speech detection using cross-platform social media data in English and German language. arXiv 2024, arXiv:2410.05287.
21. Hadiya, M. Cyberbullying detection in Twitter using machine learning algorithms. Int. J. Adv. Eng. Manag. 2022, 4, 1172–1184.
22. Manning, C.D.; Raghavan, P.; Schütze, H. Introduction to Information Retrieval; Cambridge University Press: Cambridge, UK, 2008.
23. Bird, S.; Klein, E.; Loper, E. Natural Language Processing with Python; O'Reilly Media: Sebastopol, CA, USA, 2009.
24. Salton, G.; Buckley, C. Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 1988, 24, 513–523.
25. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. Available online: https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html (accessed on 29 September 2025).
26. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
27. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
28. Cox, D.R. The regression analysis of binary sequences. J. R. Stat. Soc. Ser. B Methodol. 1958, 20, 215–242.
29. Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.
30. Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. ALBERT: A lite BERT for self-supervised learning of language representations. arXiv 2019, arXiv:1909.11942.
31. Schuster, M.; Paliwal, K.K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681.
32. Elfaik, S.; Nfaoui, E.H. Deep bidirectional LSTM network learning-based sentiment analysis for Arabic text. Procedia Comput. Sci. 2020, 170, 702–707.
Figure 1. System architecture for cyberbullying detection combining traditional ML, BiLSTM, and ALBERT pipelines.
Figure 2. Heatmap of confusion matrix for traditional ML—text only.
Figure 3. Heatmap of confusion matrix for traditional ML—temporal features.
Figure 4. Heatmap of confusion matrix in modern neural algorithms—text only.
Figure 5. Heatmap of confusion matrix in modern neural algorithms—temporal features.
Figure 6. Comparative F1-score performance across models and feature configurations.
Figure 7. Top 15 positive and negative features: (a) LogisticRegression; (b) LinearSVC.
Figure 8. Top 20 feature importances (Random Forest).
Figure 9. SHAP force plot: local explanation of a single prediction.
Figure 10. SHAP summary plot—global feature contributions.
Figure 11. Robustness analysis of classical machine learning models under clean, noisy, and adversarial conditions.
Figure 12. Execution time comparison of machine learning models with text-only and text + temporal features.
Figure 13. Matthews Correlation Coefficient (MCC) comparison of classical machine learning models with text-only and text + temporal features.
Figure 14. Cohen's Kappa comparison of classical machine learning models with text-only and text + temporal features.
Figure 15. ROC-AUC comparison of classical machine learning models with text-only and text + temporal features.
Table 1. Comparison of related works.
Study | Method/Model | Dataset(s) | Results/Contributions | Limitations
[2] Prama et al. | Explainable AI model | Custom dataset | Improved interpretability and severity assessment | Limited adaptability to evolving abuse; context-specific
[3] García-Méndez & De Arriba-Pérez | LLM-based monitoring | Social media streams | Real-time recognition; enhances trust | High computational cost; ignores behavioral/temporal cues
[4] Yi, Zubiaga & Long | Emotion-adaptive framework | User-generated content | Detects subtle/hidden abuse | Text-driven; lacks interaction/behavioral modeling
[5] Balakrishnan & Kaity | Systematic review | Multiple studies | Taxonomy of ML approaches | Reveals fragmented research focus
[6] Akter et al. | LSTM-Autoencoder | Synthetic dataset | Strong generalization in low-resource contexts | Synthetic data limits ecological validity
[7] Cheng et al. | Hierarchical attention | Cyberbullying corpora | Captures escalation of harmful behavior | Computationally expensive
[8] Li et al. | Statistical analysis | Survey (3378 adolescents, China) | Links cyberbullying to mental/physical health | Descriptive, not predictive
[9] Alsubait & Alfageh | MNB, CNB, LR | Benchmark corpora | Computational efficiency | Weak performance with informal text
[10] Mahat | SVM, CNN, LSTM, NB | Social media datasets | LSTM achieved best results | Text-only; lacks context integration
[11] Dadvar & Kai | CNN | Twitter & others | Effective deep learning detection | Needs large datasets; low interpretability
[12] Yin et al. | CNN + word vectors | Twitter (cyberbullying tweets) | Stronger semantic representation | Sensitive to class imbalance
[13] Zhang et al. | PCNN w/ labor-cost adjustment | Two datasets | Better handling of class imbalance | Dataset-specific; elitist behavior
[14] Idrizi & Hamiti | GCN + MFCCs | Mixed-media social posts | Accurate multimodal classification | High complexity; heavy feature engineering
[15] Haidar et al. | ML + NLP | Multilingual dataset | Covers linguistic diversity | Lacks sequential/temporal modeling
[16] Mehta & Passi | Explainable AI (DT + SHAP) | Social media posts | Transparency for moderators/users | Lower predictive accuracy
[17] Azumah et al. | CNN + RNN hybrid | Social media conversations | Robust to spelling/syntax variation | Complex; scalability issues
[18] N.H.T. et al. (ProTect) | Deep learning + attention | Social media posts | Anticipates incidents before escalation | Limited to repetitive text patterns; lacks behavioral context
[19] Aboujaoude & Savage | Literature review | Multiple | Identifies overlooked dimensions (victim–perpetrator roles, bystanders) | Non-technical; conceptual gaps remain
[20] Shahi & Majchrzak | Bilingual text analysis | Social media platforms | Exposes cross-lingual challenges | Poor generalization across languages/platforms
Table 2. Dataset overview and description.
Attribute | Description
Date | Calendar date when the message was sent.
Time | Exact time when the message was sent.
User ID | Unique anonymized identifier for each user in the dataset.
Message Text | Raw textual content of the message exchanged between users.
Cyberbullying Label | Binary label indicating cyberbullying presence: 0 = no cyberbullying detected; 1 = cyberbullying detected.
Table 3. Communication features analyzed.
Category | Feature | Description
Temporal | Avg_IsWeekend | Proportion of messages sent during weekends.
Temporal | Avg_IsNightMessage | Proportion of messages sent during night hours.
Temporal | Avg_TimeSinceLastMsg | Average time between consecutive messages.
Temporal | Avg_SessionDuration | Average duration of user conversation sessions.
Behavioral/Relational | ToxicityRate | Proportion of toxic messages in a session.
Behavioral/Relational | ThreatLevel | Severity level of threatening language.
Behavioral/Relational | UserAffinity | Degree of inferred closeness between users (e.g., interaction frequency).
Behavioral/Relational | MsgVolume | Total message count or aggression density per dialogue.
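To make the temporal features in Table 3 concrete, the following pandas sketch derives weekend/night proportions, inter-message gaps, and session durations from the timestamp fields of Table 2. The file name, the night window from 22:00 to 06:00, and the 30-minute session gap are illustrative assumptions, not the study's exact definitions.

```python
# Minimal sketch of the temporal features in Table 3, derived from the
# Table 2 fields. The file name, the 22:00-06:00 night window, and the
# 30-minute session gap are illustrative assumptions.
import pandas as pd

df = pd.read_csv("messages.csv")  # hypothetical file with Table 2 columns
df["timestamp"] = pd.to_datetime(df["Date"] + " " + df["Time"])
df = df.sort_values(["User ID", "timestamp"])

df["is_weekend"] = df["timestamp"].dt.dayofweek >= 5  # Saturday/Sunday
hour = df["timestamp"].dt.hour
df["is_night"] = (hour >= 22) | (hour < 6)

# Gap to the same user's previous message, in seconds.
df["gap_s"] = df.groupby("User ID")["timestamp"].diff().dt.total_seconds()

# A new session starts on the first message or after a gap > 30 min.
df["new_session"] = df["gap_s"].isna() | (df["gap_s"] > 1800)
df["session_id"] = df.groupby("User ID")["new_session"].cumsum()

per_user = df.groupby("User ID").agg(
    Avg_IsWeekend=("is_weekend", "mean"),
    Avg_IsNightMessage=("is_night", "mean"),
    Avg_TimeSinceLastMsg=("gap_s", "mean"),
)

# Session duration: span between first and last message of each session.
sess = df.groupby(["User ID", "session_id"])["timestamp"].agg(["min", "max"])
sess_dur = (sess["max"] - sess["min"]).dt.total_seconds()
per_user["Avg_SessionDuration"] = sess_dur.groupby("User ID").mean()
print(per_user.head())
```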
Table 4. Traditional ML classification summary—text only.
Algorithm | Accuracy | Precision | Recall | F1-Score
LogisticRegression | 85.04% | 82.85% | 70.44% | 76.14%
RandomForest | 84.49% | 83.97% | 67.01% | 74.54%
LinearSVC | 85.33% | 82.74% | 71.63% | 76.79%
KNeighbors | 74.31% | 81.21% | 31.47% | 45.36%
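For context, the sketch below shows a minimal scikit-learn text-only pipeline like the one behind Table 4; the TF-IDF settings and the toy training examples are assumptions, not the study's exact configuration or data.

```python
# Minimal sketch of a text-only pipeline like the one behind Table 4.
# The TF-IDF settings and toy examples are assumptions, not the
# study's exact configuration or data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts_train = [
    "you are awesome",            # placeholder, label 0
    "great job on the project",   # placeholder, label 0
    "I will hurt you",            # placeholder, label 1
    "nobody wants you here",      # placeholder, label 1
]
labels_train = [0, 0, 1, 1]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # assumed n-gram range
    LinearSVC(C=1.0),
)
clf.fit(texts_train, labels_train)
print(clf.predict(["you did a great job", "I will find you"]))
```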
Table 5. Traditional ML classification summary—temporal features.
Algorithm | Accuracy | Precision | Recall | F1-Score
LogisticRegression | 86.72% | 84.00% | 75.11% | 79.31%
RandomForest | 84.53% | 84.94% | 66.06% | 74.32%
LinearSVC | 86.76% | 83.58% | 75.83% | 79.52%
KNeighbors | 63.06% | 43.91% | 32.53% | 37.37%
Table 6. Classification summary of modern neural algorithms—text only.
Algorithm | Accuracy | Precision | Recall | F1-Score
ALBERT | 89.12% | 82.90% | 85.56% | 84.20%
BiLSTM | 86.49% | 80.01% | 79.99% | 80.05%
Table 7. Classification summary of modern neural algorithms—temporal features.
Algorithm | Accuracy | Precision | Recall | F1-Score
ALBERT | 88.42% | 83.46% | 82.31% | 82.88%
BiLSTM | 86.47% | 79.98% | 80.11% | 80.04%
Table 8. Performance comparison.
Algorithm | Accuracy (Text Only) | Accuracy (Temporal) | Δ Accuracy | Recall (Text Only) | Recall (Temporal) | Δ Recall
LogisticRegression | 85.04% | 86.72% | +1.68% | 70.44% | 75.11% | +4.67%
RandomForest | 84.49% | 84.53% | +0.04% | 67.01% | 66.06% | −0.95%
LinearSVC | 85.33% | 86.76% | +1.43% | 71.63% | 75.83% | +4.20%
KNeighbors | 74.31% | 63.06% | −11.25% | 31.47% | 32.53% | +1.06%
ALBERT | 89.12% | 88.42% | −0.70% | 85.56% | 82.31% | −3.25%
BiLSTM | 86.49% | 86.47% | −0.02% | 79.99% | 80.11% | +0.12%
Table 9. Performance of classical ML models under 10-fold stratified cross-validation. Metrics are reported as mean ± standard deviation (SD).
Algorithm | Features | Accuracy | Precision | Recall | F1-Score | Train Time
LinearSVC | Text Only | 0.8548 ± 0.0023 | 0.8334 ± 0.0058 | 0.7143 ± 0.0062 | 0.7692 ± 0.0036 | 16.99 s
RandomForest | Text Only | 0.8487 ± 0.0023 | 0.8473 ± 0.0066 | 0.6764 ± 0.0090 | 0.7528 ± 0.0061 | 2551.75 s
LogisticRegression | Text Only | 0.8538 ± 0.0023 | 0.8359 ± 0.0064 | 0.7104 ± 0.0052 | 0.7680 ± 0.0034 | 93.17 s
KNeighbors | Text Only | 0.4573 ± 0.0069 | 0.3770 ± 0.0028 | 0.9092 ± 0.0076 | 0.5330 ± 0.0025 | 481.53 s
LinearSVC | Text + Temporal | 0.8678 ± 0.0031 | 0.8416 ± 0.0045 | 0.7537 ± 0.0081 | 0.7952 ± 0.0054 | 45.73 s
RandomForest | Text + Temporal | 0.8453 ± 0.0030 | 0.8494 ± 0.0045 | 0.6635 ± 0.0105 | 0.7450 ± 0.0072 | 1528.42 s
LogisticRegression | Text + Temporal | 0.8679 ± 0.0033 | 0.8435 ± 0.0060 | 0.7518 ± 0.0092 | 0.7950 ± 0.0064 | 1767.32 s
KNeighbors | Text + Temporal | 0.6293 ± 0.0037 | 0.4388 ± 0.0071 | 0.3160 ± 0.0060 | 0.3674 ± 0.0055 | 501.17 s
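A minimal sketch of the 10-fold stratified protocol behind Table 9 follows, assuming a precomputed feature matrix `X` (TF-IDF, optionally concatenated with the temporal columns of Table 3) and a label vector `y`.

```python
# Minimal sketch of the 10-fold stratified evaluation in Table 9.
# `X` and `y` are assumed to already exist (e.g., TF-IDF features,
# optionally concatenated with the temporal columns of Table 3).
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.svm import LinearSVC

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(
    LinearSVC(), X, y, cv=cv,
    scoring=["accuracy", "precision", "recall", "f1"],
)
for m in ["accuracy", "precision", "recall", "f1"]:
    vals = scores[f"test_{m}"]
    print(f"{m}: {vals.mean():.4f} ± {vals.std():.4f}")
```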
Table 10. Statistical significance of performance differences (per-fold F1-scores) between text-only and text + temporal configurations.
Model | Mean ΔF1-Score | Paired t-Test (p-Value) | Wilcoxon Signed-Rank (p-Value) | Significance
LinearSVC | +0.0260 | 0.0047 | 0.0061 | Significant (p < 0.01)
LogisticRegression | +0.0270 | 0.0039 | 0.0054 | Significant (p < 0.01)
RandomForest | −0.0078 | 0.1842 | 0.2115 | Not significant
KNeighbors | −0.1656 | 0.0925 | 0.1178 | Not significant
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
