1. Introduction
Social media was developed to enhance communication and facilitate the seamless exchange of information, enabling users to share content, including text, photos, audio, and video. The rapid progression of technology has yielded substantial advantages, improving worldwide communication, cooperation, and the exchange of experiences. Nonetheless, this expanded connectedness has made social platforms a venue for unethical and detrimental behavior, such as cyberbullying [
1]. The improper use of social media to disseminate hateful and biased material has resulted in a significant increase in cyberbullying incidents. Cyberbullying is a distinct kind of online harassment which is often intentional, recurrent, and designed to inflict psychological distress and humiliation on people or groups. Cyberbullying can have a profound effect on its victims, often resulting in heightened stress, anxiety, sadness, and a significant reduction in self-esteem [
2]. The extensive consequences of online abuse underscore the pressing need to comprehend and detect cyberbullying, while simultaneously promoting safe online environments for everyone. This urgent need underscores the significance of creating resilient and privacy-centric cyberbullying detection systems that safeguard users by promptly recognizing and mitigating harmful material [
3] in real time [
4]. This proactive strategy mitigates the detrimental impacts of cyberbullying and provides a more secure digital environment for all users.
Diverse machine learning (ML) techniques have effectively detected cyberbullying using different data sources, including textual, visual, and behavioral information. Deep learning (DL) techniques, including Recurrent Neural Networks (RNNs) [5], Long Short-Term Memory (LSTM) networks [6], and Transformer models [7], further facilitate the detection of trends and indicators of abusive online behavior by examining patterns in sequential and contextual data. These algorithms are especially proficient at detecting cyberbullying due to their ability to distinguish emerging patterns, semantic links, and sequences in digital dialogues [
8]. Furthermore, ensemble approaches like Random Forest (RF), Gradient Boosting (GB), and Stacking integrate many classifier outcomes, resulting in a more robust detection system [
4]. Text classification is crucial for identifying cyberbullying, employing ML and DL models to categorise text-based social media content, including posts, comments, and messages. Through the analysis of word frequencies, n-grams, and semantic embeddings, these models can proficiently classify the material as either cyberbullying or non-cyberbullying. Further, natural language processing (NLP) methods are essential for deriving significant insights from text, including sentiment analysis, topic modelling, named entity recognition, part-of-speech tagging, and rule-based language systems. Thus, NLP is crucial for identifying inappropriate words and contextual indicators often linked to cyberbullying [
9].
Privacy and transparency in cyberbullying detection are crucial, particularly since conventional detection approaches need the centralization of user data, thus jeopardizing privacy by revealing sensitive information. Federated learning (FL) mitigates privacy issues by allowing decentralised training of models without transferring user data outside their protected settings [
10]. Each organisation or user maintains authority over its local data, enabling model updates, reducing privacy issues, and facilitating compliance with data protection regulations such as GDPR [
11]. In FL, data preparation, local model training, and secure aggregation are essential to allow the aggregation of models from several organisations or users into a global model. This consolidated approach leverages the pool of data sources, improving the global model's precision and resilience in identifying cyberbullying while maintaining user privacy. Furthermore, Explainable AI (XAI) methodologies, including Local Interpretable Model-agnostic Explanations (LIME), enhance the transparency of the model [
12]. Utilising LIME elucidates the model’s conclusions, such as the rationale for classifying certain content as cyberbullying, offering insights that enhance trust among users and stakeholders [
13]. Together, FL and XAI provide a privacy-focused and transparent framework [
14].
1.1. Motivation
The rise of online communication platforms has resulted in a notable surge in cyberbullying instances, necessitating the development of advanced detection algorithms that function effectively while protecting user privacy. FL offers a compelling alternative to conventional centralized ML approaches, which can include significant privacy concerns due to the need for centralized data storage. FL enables model training across decentralized data sources, improving data security by maintaining user data locally. To augment FL’s efficiency, we use Transfer Learning (TL), allowing the model to utilize existing knowledge from analogous tasks, enhancing training speed and accuracy despite constrained data availability. As detection models become more intricate, comprehending and elucidating model predictions becomes crucial. Utilizing XAI methodologies, particularly LIME, the proposed strategy not only detects cyberbullying but also offers clear insights into the model’s decision-making process. This integration guarantees the interpretability of the model’s activities, augmenting user and stakeholder confidence. This study aims to deliver a comprehensive, secure, and transparent solution for cyberbullying detection by integrating FL, TL, and XAI, tackling the technical, ethical, and interpretability challenges in digital environments while maintaining data privacy and security.
1.2. Research Contribution of This Article
This study explores various TL-based NLP algorithms for cyberbullying detection.
This work integrates ensemble FL (EFL) and TL with DP, enabling decentralized cyberbullying detection with secure data processing while adhering to stringent data protection regulations and safeguarding user privacy.
By employing LIME, the study enhances model transparency, making its decisions interpretable and fostering trust among stakeholders.
The proposed ensemble model achieves a remarkable baseline accuracy of 98.19%, and 96.37% after FL and DP are incorporated, validated through k-fold cross-validation with an average accuracy of 96.07%.
1.3. Organisation
The paper’s structure and key sections are as follows. We review related works in
Section 2, present the proposed system architecture for cyberbullying detection in
Section 3, discuss system performance evaluation in
Section 4, and conclude in
Section 5, also discussing future directions.
2. Related Works
This section presents the results of prior studies and a literature review on several cyberbullying detection approaches. Salawu et al. [
3] assessed the effectiveness of ML and NLP techniques in detecting cyberbullying. Their study classifies cyberbullying detection strategies into supervised learning, lexicon-based, rule-based, and mixed-initiative approaches. In addition, their research investigates the ethical and social consequences of automated systems designed to identify cyberbullying. P. Galan-García et al. [
15] suggested linking counterfeit Twitter (now X) accounts to the act of cyberbullying. The researchers used RF, Variable Importance Measures (VIMs), and OneR algorithms to examine tweets’ linguistic patterns and emotional content originating from troll and authentic accounts. Their methodology was evaluated using a dataset of 2000 Twitter tweets, and the OneR algorithm successfully detected and associated trolls in more than 80% of instances. K. Maity et al. [
16] presented a model called “MTBullyGNN”, designed to identify instances of code-mixed cyberbullying. MTBullyGNN employs a graph neural network (GNN) to identify and examine cases of cyberbullying. The GNN efficiently detects nodes (sentences) without labels or with inaccurate labels by gathering data from similar nodes. MTBullyGNN performs better than the most advanced algorithms on single-task BullySent and Hindi–English code-mixed datasets.
Ottosson [
17] developed a linguistic model specifically intended to identify cyberbullying on social media sites. The research attempted to narrow the disparity in platform moderation by using the GPT-3 Large Language Model (LLM). The findings indicated that the proposed model has abilities comparable to previous models, and that fine-tuning an LLM successfully enhances cyberbullying identification, resulting in an accuracy rate of 90%. Alhloul and Alam [
18] conducted research in which they created a DL-based system to detect harassing tweets. The researchers suggested a CNN–attention architecture that integrated an attention layer with a convolutional pooling layer, effectively extracting cyberbullying-related terms from users’ tweets. The research conducted experiments using two different sets of pairings. At first, CNN was combined with ML models, with the convolutional layers serving as feature extractors and ML models like RF and LR being used for classification. The following methodology used combinations such as CNN-XGB and CNN-LSTM for categorisation. The results demonstrated that the CNN–attention framework achieved a remarkable accuracy of 97.10% compared to other learning models.
Mestry et al. [
19] created a CNN that utilizes fastText word embeddings to detect and categorize harmful and offensive remarks on social networking sites according to their toxicity. The model demonstrated superior accuracy in processing vernacular, jargon, and typographical mistakes and regularly encountered abbreviations in messages. John M. et al. [
20] used a supervised ML technique to detect and mitigate cyberbullying. They used many classifiers to learn and identify bullying behaviors. The assessment of the suggested approach on a dataset related to cyberbullying showed that the Neural Network surpassed other models, with an accuracy of 92.8%. At the same time, the SVM earned a slightly worse accuracy of 90.3%. Qudah et al. [
21] introduced an improved approach for identifying cyberbullying, incorporating an adaptive external dictionary (AED). The authors used ML models, including RF, XGB, and CatBoost, and developed ensemble voting models. The findings demonstrated that the suggested ensemble voting model, when integrated with AED, yielded higher accuracy in identifying instances of cyberbullying. Maram et al. [
22] proposed a cyberbullying detection method using sentiment analysis and ML. Traditional ML methods are effective, but they struggle with online emotional tones. They used sentiment analysis to filter out neutral and positive content and focus on detrimental messages to improve classification accuracy. The authors used SMOTE resampling to address data imbalance and improve model performance across six cyberbullying categories. The Extra Tree model showed the highest accuracy (95.38%) even in balanced datasets.
Mathur et al. [
23] created a system to identify Twitter cyberbullying in real time. The system utilised NLP and ML techniques. The system underwent training using a dataset consisting of tweets related to cyberbullying, and the effectiveness of several ML methods was evaluated and compared. Their research showed that by meticulously choosing preprocessing procedures and optimising the RF model, a remarkable accuracy of 94.06% was attained. Bokolo and Liu [
24] used a DL method, which aims to automatically identify instances of cyberbullying on social media sites. They conducted a comparative analysis of three ML models, namely Naive Bayes (NB), SVM, and Bidirectional Long Short-Term Memory (Bi-LSTM). The analysis was performed using a Twitter dataset, and the findings demonstrated that Bi-LSTM surpassed the other models, with a remarkable accuracy of 98%. The SVM achieved a high accuracy rate of 97%, while the NB algorithm performed less well with an accuracy of 85%. Araque et al. [
25] proposed an ensemble approach for the detection of hate speech through affective computing. It presented feature extraction techniques utilising AffectiveSpace and SenticNet, improving classification efficacy by integrating affective features with conventional textual representations across various datasets. Chiril et al. [
26] proposed emotionally informed hate speech detection from a multi-target perspective utilising annotated datasets to discern specific expressions of hate speech across diverse subjects and targets, employing affective knowledge from EmoSenticNet and Hurtlex. The multi-task models surpassed single-task models in the detection of hate speech.
Muneer et al. [
27] proposed a refined version of BERT and a stacking ensemble model to detect cyberbullying on social media. The researchers used a continuous bag-of-words model in conjunction with a word2vec-like technique for extracting features to determine the weights in the embedding layer. The stacking ensemble learning approach displayed an exceptional accuracy of 97.4%. Several deep learning models, such as Conv1D-LSTM, LSTM, CNN, BiLSTM_Pooling, BiLSTM, and GRU, were used. The findings highlighted the dominance of the attention-based Conv1D-LSTM classifier compared to other methods, obtaining a peak accuracy of 94.49%. Federated learning (FL) enables decentralized model training by sharing model parameters instead of raw data. FL has been applied to healthcare, finance, and mobile applications, and recently in NLP tasks including next-word prediction, sentiment analysis, and toxicity detection. Despite this progress, FL–NLP still faces challenges such as non-IID data distributions, high communication cost, and privacy risks from inference attacks. Numerous studies have extensively documented the practical uses of FL since its inception. Gboard has used FL to train a model for predicting the next word [
28,
29] and to evaluate and implement a model for improved suggestions for online searches, GIFs, and stickers [
30]. FL is used in the medical domain to safeguard patient anonymity while training an image classification algorithm capable of diagnosing COVID-19 by evaluating X-ray pictures from several hospitals [
31,
32]. Qureshi et al. [
33] proposed a hybrid feature fusion model that integrates feature-based and graph-based methodologies for credibility evaluation and attained 95.6% accuracy in credibility assessment.
Bakopoulou et al. [
34] proposed an FL technique that enables devices to collectively train a global model without uploading locally gathered training data. They demonstrated the system’s efficacy in two classification tasks: forecasting personally identifiable information (PII) exposure and identifying ad requests in individual packets. Guo et al. [
35] created an FL framework called FEAT that accurately identifies traffic in various contexts without violating user privacy. This framework tackles the issue of imbalanced client data, which substantially affect the effectiveness of FL-based methods, particularly in the classification of mobile network traffic, due to the diverse range of endpoint setups. Aouedi et al. [
36] investigated a novel federated semi-supervised learning method that utilises labelled and unlabeled data. This methodology used the NSL-KDD and authentic industrial datasets to address network traffic and diverse cyberattacks. Zeng et al. [
37] examined the gradient-matching federated domain adaptation (GM-FedDA) method for categorising brain pictures. They aimed to minimise domain differences and develop accurate local federated models for particular target areas. Basu et al. [
38] used differential privacy (DP) to categorise financial text with privacy protection in finance. Their research created a system that protects sensitive financial information while keeping accurate categorisation.
Recent works have specifically explored federated learning for cyberbullying and toxicity detection. Samee et al. [
1] fused word embeddings and emotional cues within FL, achieving robust accuracy with BERT, CNN, and LSTM on multi-platform datasets. Their work included a theoretical DP-based analysis but did not implement formal guarantees or non-IID evaluation. Khan et al. [
39] proposed a decentralized ring-topology FL to avoid reliance on a central server. While effective in decentralization, their framework assumed IID distributions and lacked formal DP. Nagy et al. [
40] developed local DP with quantization and randomized response for privacy-preserving NLP in FL, though their study was task-agnostic and not cyberbullying-specific. Sharma et al. [
41] introduced an FL pipeline for encrypted social media platforms, leveraging metadata rather than textual content, with DP and secure aggregation under explicit non-IID distributions. Shetty et al. (FedBully) [
42] demonstrated cross-device FL for binary cyberbullying detection using sentence encoders, achieving 93% AUC on IID and 91% on non-IID splits, but without DP or explainability. Alabdali et al. [
43] combined blockchain with FL for cyberbullying detection, enhancing auditability but without integrating differential privacy. Complementary surveys such as that by Khan et al. [
44] systematically reviewed 36 FL–NLP papers, highlighting open issues in convergence, robustness, and the absence of explainability and privacy guarantees.
Most prior FL-based cyberbullying works either (i) restrict themselves to binary detection tasks (e.g., FedBully [
42]), (ii) focus on decentralization or auditability without formal privacy guarantees (e.g., Khan et al. [
39], Alabdali et al. [
43]), or (iii) implement privacy or fairness without applying them to cyberbullying (e.g., Nagy et al. [
40]). Our framework advances the state of the art along three axes: Methodology, enabling multi-class cyberbullying detection with an ensemble of Transformer models; Privacy, incorporating formal $(\epsilon, \delta)$-differential privacy with Gaussian noise, clipping, and secure aggregation; and Evaluation, explicitly modeling non-IID distributions and partial participation while integrating explainability via LIME with fidelity checks. Together, these contributions position our approach as a robust and trustworthy FL solution for social media moderation. The widespread effectiveness of FL across several domains makes it an enticing methodology for identifying cyberbullying. FL allows several organisations, like social networking sites, online forums, and academic institutions, to train a precise cyberbullying detection model [
45]. This is achieved using a distributed architecture that ensures the anonymity of users. This method has been successfully implemented in other industries, such as healthcare, where confidential patient information stays inside each healthcare provider. FL with DP in cyberbullying detection enables the consolidation of knowledge and data from several sources while preserving the privacy of each person [
46]. This technique can significantly improve the effectiveness, dependability, and ethical aspects of identifying cyberbullying, hence promoting a safer and more inclusive online community. Recent breakthroughs in XAI have underscored the essential need for interpretability in intricate deep learning models, especially in areas containing sensitive or user-generated content. Study [
47] presents a thorough taxonomic analysis of LIME upgrades, tackling critical issues including fidelity, stability, and domain applicability. In the field of cyberbullying detection, study [
48] combines LIME and SHAP with user-specific LSTM models, illustrating how XAI tools uncover fundamental predictors—such as race, gender, and previous victimization history—with remarkable accuracy (98%). This research underscores how LIME enhances model openness and ethical accountability by highlighting essential decision-influencing aspects, particularly in socially sensitive NLP tasks. Recent works collectively indicate that LIME and its expansions are progressively tailored for privacy-sensitive applications.
3. Proposed System Architecture
The suggested system paradigm, shown in
Figure 1, has three separate tiers: client-side, server-side, and global model aggregation and assessment. The layers are interlinked to provide cyberbullying detection using FL and DP, ensuring data privacy and decentralisation. This section will examine each layer comprehensively and delineate the mechanisms that transpire at each step.
3.1. Client-Side Operations
In FL, the onus of data gathering and local model training is allocated among clients, such as user devices, smartphones, or edge devices. The server orchestrates the model training while the data processing and model learning happen on the client side. This decentralised method guarantees that no raw data is sent from clients to the server, safeguarding data privacy. The procedure starts when the server alerts a selected group of clients to engage in the ongoing FL training session. Every client is accountable for training a localised model on their data, therefore ensuring that the model conforms to the distinct attributes of each client’s dataset. Let
$C_i$ be the $i$-th client in the FL system, where $i \in \{1, 2, \ldots, N\}$, and $N$ signifies the total number of clients participating in the current federated learning round. The local data for client $C_i$ is represented as $D_i = \{(x_j, y_j)\}$, where $j = 1, 2, \ldots, n_i$. In this form, $x_j$ denotes the individual input data points, such as social media posts, chat messages, or user-generated content, while $y_j$ signifies the label associated with each data point. Upon data collection, the client starts the data preparation step. This step is essential since raw data generally has noise, extraneous information, and discrepancies. Data preparation guarantees that the data presented is clean, normalised, and organised before training. The following operations are executed during this phase.
Data Cleaning: Data cleaning includes eliminating extraneous characters, rectifying spelling errors, and discarding useless information such as URLs and special symbols.
Normalisation: Text normalisation includes converting all text to lowercase, extending abbreviations (e.g., “u” to “you”), and standardising punctuation marks.
Tokenisation: During this phase, the unprocessed text is divided into smaller components known as tokens. Tokens may consist of words, subwords, or characters, contingent upon the tokenisation method used by the model.
The data $D_i$ is prepared for training upon preprocessing. The preprocessed data is divided into three subsets: the training set $D_i^{train}$, the validation set $D_i^{val}$, and the testing set $D_i^{test}$. This division guarantees that the model is trained, verified, and tested on distinct data segments, minimising the likelihood of overfitting.
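As a concrete illustration of the preparation pipeline described above, the following minimal Python sketch cleans, normalises, and splits a hypothetical local client dataset; the column names, abbreviation map, and regular expressions are illustrative placeholders rather than the exact pipeline used in our experiments.

```python
import re
import pandas as pd
from sklearn.model_selection import train_test_split

def clean_and_normalise(text: str) -> str:
    """Data cleaning and normalisation: strip URLs and special symbols, lowercase, expand common abbreviations."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)          # discard URLs
    text = re.sub(r"[^a-zA-Z0-9\s.,!?']", " ", text)             # drop special symbols
    text = text.lower()
    abbreviations = {"u": "you", "r": "are", "pls": "please"}    # illustrative abbreviation map
    return " ".join(abbreviations.get(tok, tok) for tok in text.split())

# Hypothetical local client dataset D_i with 'text' and 'label' columns
df = pd.DataFrame({
    "text": ["U r so dumb!!", "Great game last night :)", "nobody likes u, just leave",
             "Happy birthday!", "go away loser http://spam.example", "See you at practice"],
    "label": [1, 0, 1, 0, 1, 0],
})
df["text"] = df["text"].apply(clean_and_normalise)

# Split into D_train, D_val, and D_test (roughly 70/15/15 here)
train_df, temp_df = train_test_split(df, test_size=0.3, random_state=42)
val_df, test_df = train_test_split(temp_df, test_size=0.5, random_state=42)
```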
Upon data preparation, the client-side model is configured. The client-side model uses sophisticated TL-based NLP architectures like BERT (Bidirectional Encoder Representations from Transformers) or RoBERTa (Robustly Optimized BERT Pretraining Approach). These models are explicitly designed to apprehend the contextual significance of words, which is essential for comprehending the nuanced intricacies of cyberbullying discourse. Each client customises their model using hyperparameters $h_i$, including the learning rate $\eta$, the model's layer count, and the batch size (see Appendix A, Table A1 for detailed configurations). Upon model configuration, the server alerts the client to begin the local training procedure. The model is trained using the client's dataset $D_i^{train}$ during local training. The training procedure seeks to minimise a loss function, such as cross-entropy loss, quantifying the disparity between anticipated and real labels. Gradient descent is used to modify the model parameters $\theta$ at each iteration:

$$\theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} \mathcal{L}(\theta_t)$$

In this equation, $\theta_{t+1}$ denotes the revised parameters for the model following the $t$-th iteration, $\eta$ signifies the learning rate that regulates the magnitude of the parameter update, and $\nabla_{\theta} \mathcal{L}(\theta_t)$ represents the gradient of the loss function concerning the model parameters $\theta$. Upon completion of local model training, the client shares the modified model parameters $\theta_i$ with the server.
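A minimal sketch of the local fine-tuning step is shown below, assuming a Hugging Face Transformer classifier trained with AdamW; the model name, hyperparameter values, and helper function are illustrative and do not reproduce the exact configuration reported in Appendix A, Table A1.

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical client hyperparameters h_i (not the exact Appendix A values)
MODEL_NAME, LR, BATCH_SIZE, NUM_LABELS = "distilbert-base-uncased", 2e-5, 16, 6

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)
optimizer = torch.optim.AdamW(model.parameters(), lr=LR)

def local_training_epoch(texts, labels):
    """One epoch of local training on D_i^train: minimise cross-entropy via gradient descent."""
    model.train()
    enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
    data = list(zip(enc["input_ids"], enc["attention_mask"], torch.tensor(labels)))
    for input_ids, attention_mask, y in DataLoader(data, batch_size=BATCH_SIZE, shuffle=True):
        optimizer.zero_grad()
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
        out.loss.backward()   # gradient of the cross-entropy loss w.r.t. theta
        optimizer.step()      # theta_{t+1} = theta_t - eta * grad
    # Parameters theta_i that would be shared with the server after (optional) DP noising
    return {k: v.detach().clone() for k, v in model.state_dict().items()}
```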
3.2. Server-Side Operations
The server, referred to as $S$, manages the FL process. The server's principal duty is to oversee client interactions, disseminate the global model, and consolidate client-side modifications into a revised global model. In our setup, we adopt a cross-device FL scenario with 20 clients, of which 50% are randomly selected per round. Training is conducted for 100 communication rounds, with a non-IID data partitioning (Dirichlet $\alpha = 0.5$) to simulate heterogeneous client data. To further reflect practical conditions, 10% of clients are assumed to drop out per round, and a 1 MB communication budget is enforced for each client update. At the commencement of each training round $t$, the server picks a subset of clients $S_t$ from the whole pool of clients. The subset $S_t$ comprises clients that satisfy the server's selection requirements (e.g., processing capacity, consistent network connectivity, etc.).
Upon selection of the clients, the server disseminates the current global model with parameters $\theta_G^{(t)}$ to each client inside the chosen subset. Subsequently, each client employs these parameters as the foundation for localised training. The distribution of the model parameters is as follows:

$$\theta_i^{(t)} \leftarrow \theta_G^{(t)}, \quad \forall \, C_i \in S_t$$

This procedure guarantees all clients start training from an identical base model, refined via prior iterations. Each designated client thereafter trains its local model with its data and transmits the revised model parameters $\theta_i^{(t)}$ back to the server. Upon receiving the updated model parameters from each client, the server aggregates the client models to create a new global model $\theta_G^{(t+1)}$. The server aggregates the models with an aggregation method:

$$\theta_G^{(t+1)} = \sum_{C_i \in S_t} \frac{n_i}{n} \, \theta_i^{(t)}, \qquad n = \sum_{C_i \in S_t} n_i$$

Here, $n$ is the total number of data samples from all participating clients, and $n_i$ is the number of data samples on client $C_i$. Employing aggregation guarantees that clients with larger datasets significantly influence the global model update, mitigating bias introduced by clients with smaller datasets. The server iteratively aggregates client models in each round.
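The weighted aggregation rule can be sketched as follows, assuming each selected client returns a PyTorch state_dict together with its local sample count; the function and variable names are illustrative.

```python
import torch

def fedavg(client_states, client_sizes):
    """FedAvg: theta_G = sum_i (n_i / n) * theta_i, with n = sum_i n_i over the selected clients."""
    n = float(sum(client_sizes))
    aggregated = {}
    for key in client_states[0]:
        # Integer buffers (e.g., position ids) may need to be copied rather than averaged in practice.
        aggregated[key] = sum((n_i / n) * state[key].float()
                              for state, n_i in zip(client_states, client_sizes))
    return aggregated

# Usage sketch for one round: states returned by the selected clients S_t
# global_state = fedavg([theta_1, theta_2, theta_3], client_sizes=[1200, 800, 500])
# global_model.load_state_dict(global_state)
```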
3.3. Ensemble-Based Federated Classification Framework
Dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$ consists of $N$ samples. Here, $x_i$ represents the $i$-th textual input, and $y_i \in \{1, \ldots, C\}$ denotes the corresponding label for a classification problem with $C$ classes. The proposed framework employs three pretrained Transformer models: DistilBERT ($M_1$), RoBERTa ($M_2$), and ELECTRA ($M_3$). Each model is independently fine-tuned on data from federated clients. Each model translates a tokenized input $x$ into a real-valued logit vector $z_k$, which denotes the unnormalized class scores:

$$z_k = M_k(x) \in \mathbb{R}^{C}, \quad k \in \{1, 2, 3\}$$

The final ensemble logit vector $z_{\mathrm{ens}}$ is calculated using the arithmetic mean (late fusion) of the outputs from individual models:

$$z_{\mathrm{ens}} = \frac{1}{3} \sum_{k=1}^{3} z_k$$

The predicted class label $\hat{y} = \arg\max_{c} \, \mathrm{softmax}(z_{\mathrm{ens}})_c$ is obtained by applying the softmax function and selecting the class that exhibits the highest probability.
Each Transformer model undergoes local fine-tuning on clients within an FL framework, followed by periodic global aggregation. The ensemble fusion mechanism functions as a regularisation strategy, reducing model-specific biases and enhancing the system’s generalisation capability across diverse client data. For ensemble prediction, we employ a late fusion strategy in which the logits from DistilBERT, RoBERTa, and ELECTRA are averaged to form the final prediction. This approach reduces model-specific bias and improves generalization across heterogeneous client data. While late fusion was adopted due to its simplicity and robustness in privacy-preserving settings, alternative ensemble fusion strategies could also be explored, including weighted averaging (where model contributions are proportional to validation performance), majority voting, or stacking meta-learners (where a secondary classifier is trained on the outputs of base models). Future work may investigate these alternatives to further optimize performance and robustness under differential privacy constraints.
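For illustration, the late-fusion rule can be written compactly as below, assuming the three fine-tuned models and their matching tokenizers are already loaded; all names are placeholders and the snippet covers inference on a single text.

```python
import torch

def ensemble_predict(models, tokenizers, text):
    """Late fusion: average the per-model logit vectors z_k, then softmax/argmax for the final label."""
    logits = []
    with torch.no_grad():
        for model, tokenizer in zip(models, tokenizers):
            enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
            logits.append(model(**enc).logits)        # z_k in R^C for this model
    z_ens = torch.stack(logits).mean(dim=0)           # arithmetic mean of the three logit vectors
    probs = torch.softmax(z_ens, dim=-1)
    return probs.argmax(dim=-1).item(), probs

# Usage sketch, assuming the fine-tuned models/tokenizers are already loaded:
# label, probs = ensemble_predict([distilbert, roberta, electra],
#                                 [distilbert_tok, roberta_tok, electra_tok],
#                                 "example post to classify")
```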
3.4. Global Model Evaluation and Deployment
After the server aggregates the client models to formulate a new global model $\theta_G^{(t+1)}$, it assesses the model's efficacy using a validation set. The assessment step is essential for determining the model's ability to effectively identify cyberbullying across diverse datasets. A variety of performance measures are used in this assessment.
Accuracy: This metric evaluates the overall correctness of the model's predictions. It is the ratio of accurately categorised cases to total occurrences.
Precision: Precision quantifies the proportion of occurrences identified as cyberbullying that are indeed cyberbullying. It is advantageous when the expense of false positives is substantial.
Recall: Recall assesses the model’s ability to recognise all cyberbullying episodes accurately. It is advantageous when the repercussions of false negatives are significant.
F1-Score: The F1-score represents the harmonic mean of Precision and Recall. It offers a balanced assessment of both, especially when the dataset is skewed (a reference computation is sketched below).
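A reference computation of these metrics is sketched below using scikit-learn; macro averaging over the cyberbullying classes is an assumption made here for the multi-class setting.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged Precision, Recall, and F1 over the cyberbullying classes."""
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro", zero_division=0)
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}

# Example: evaluate([0, 1, 2, 1], [0, 1, 1, 1])
```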
The server concludes the training, and the global model is subsequently implemented for real-time detection.
3.5. Algorithms
In an FL system, each client (denoted $C_i$) is essential for the local training of a model using its dataset $D_i$. In the data acquisition phase, the client gathers local data $D_i$, including input data points $x_j$ and their corresponding labels $y_j$. This ensures that each client uses their private dataset, embodying the client's particular characteristics. The data preparation phase ensures it is cleaned and preprocessed to make it suitable for training.
The preprocessing steps include data cleaning (removing extraneous characters, correcting spelling errors, and eliminating irrelevant information such as URLs), normalisation (converting text to lowercase, expanding abbreviations, and standardising punctuation), and tokenisation (segmenting the text into smaller units, such as words or subwords). Thereafter, it is divided into $D_i^{train}$, $D_i^{val}$, and $D_i^{test}$ sets to ensure the model's capacity for effective generalisation and to mitigate overfitting. The model setup begins by initialising the model with client-specific hyperparameters $h_i$, which include the learning rate $\eta$, along with other parameters and the model architecture. The goal is to reduce the loss $\mathcal{L}(\theta)$. After concluding local training, DP is added to the model weights, and then the updated model $\theta_i$ is sent to the server for aggregation. We adopt the Gaussian mechanism for differential privacy, with noise scale $\sigma$ applied to clipped gradients (L2 norm bound = 1.0). Privacy guarantees are expressed as $(\epsilon, \delta)$-DP. We use the moments accountant to track cumulative privacy loss across communication rounds. Noise injection is performed on the client side prior to transmission of model updates, as outlined in Algorithm 1.
Algorithm 1: Client-Side Local Model Training for Client $C_i$
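The client-side privatisation step (clipping followed by Gaussian noising of the update) can be sketched as follows; the noise multiplier and function names are illustrative, and a production deployment would typically delegate per-example clipping and moments-accountant bookkeeping to a DP library such as Opacus.

```python
import torch

def privatize_update(local_state, global_state, clip_norm=1.0, noise_multiplier=1.1):
    """Clip the client update to an L2 bound and add Gaussian noise before sending it to the server."""
    # Update delta_i = theta_i - theta_G and its overall L2 norm
    delta = {k: local_state[k].float() - global_state[k].float() for k in local_state}
    total_norm = torch.sqrt(sum((d ** 2).sum() for d in delta.values()))
    scale = min(1.0, clip_norm / (float(total_norm) + 1e-12))    # enforce the L2 norm bound
    noisy_state = {}
    for k, d in delta.items():
        clipped = d * scale
        noised = clipped + torch.randn_like(clipped) * noise_multiplier * clip_norm  # Gaussian mechanism
        noisy_state[k] = global_state[k].float() + noised
    return noisy_state
```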
The server side directs the central coordination of the FL training and aggregation process (Algorithm 2). The server, referred to as $S$, begins each training round by choosing a subset of clients $S_t$ from the whole pool of clients $\{C_1, \ldots, C_N\}$. This decision is based on certain factors, including client availability and network connection. Upon choosing the clients, the server disseminates the current global models and parameters $\theta_G^{(t)}$ to each client inside the subset $S_t$. Upon completion of their local training, clients transmit their revised models and parameters $\theta_i^{(t)}$ to the server. The server then consolidates these revised model parameters using an aggregation technique for each model and later shares the updated model with all clients.
Algorithm 2: Server-Side Model Aggregation
1: Input: Client model parameters $\theta_i^{(t)}$ from a subset of clients $S_t$;
2: Output: Updated global model parameters $\theta_G^{(t+1)}$;
3: Select a subset of clients $S_t \subseteq \{C_1, \ldots, C_N\}$;
4: Send current global model parameters $\theta_G^{(t)}$ to each client $C_i \in S_t$;
5: Wait for each client to send back their updated parameters $\theta_i^{(t)}$;
6: Aggregate client models using an aggregation method (e.g., FedAvg): $\theta_G^{(t+1)} = \sum_{C_i \in S_t} \frac{n_i}{n}\,\theta_i^{(t)}$;
7: Update the global model with $\theta_G^{(t+1)}$;
8: Schedule and repeat the training process to reflect on new inputs.
This guarantees that clients with larger datasets have a more significant impact on the revised global model. The server then updates the global model with the aggregated client local model updates $\theta_i^{(t)}$. This procedure is repeated throughout training iterations until the global model attains convergence. The process is repeated at scheduled times so that adaptive learning is enabled and new inputs are processed and identified with high accuracy.
The LIME algorithm operates post-prediction on the client side, enhancing interpretability for each output $\hat{y}_i$ generated by the trained client-side model $f_i$ (Algorithm 3). After predicting whether an input $x_i$ contains cyberbullying content, LIME generates a locally interpretable explanation $E_i$ by creating a perturbed dataset $Z$ around $x_i$. For each perturbation $z \in Z$, the model's response $f_i(z)$ is computed and weighted by similarity $\pi_{x_i}(z)$, where $\pi_{x_i}(z) = \exp\!\big(-d(x_i, z)^2 / \sigma^2\big)$, indicating the proximity to the original instance. This weighted response forms the foundation for fitting a simple, interpretable linear model $g$, where feature weights $w_j$ provide insights into the importance of each feature $j$ in determining $\hat{y}_i$. The connection between the client-side algorithm and this LIME explanation algorithm lies in their complementary functions. While the client-side model $f_i$ is optimised for accurate predictions without sharing raw data (preserving privacy), the LIME algorithm focuses on local interpretability by fitting a simplified model within the vicinity of each prediction. This results in an explanation $E_i = \{w_j\}$, where weights $w_j$ indicate the influence of features $x_{ij}$ on the prediction $\hat{y}_i$. Thus, users gain valuable insights into the model's decision making for each instance, maintaining the privacy-centric principles of FL while enhancing transparency and accountability in the detection of cyberbullying.
Algorithm 3: LIME-Based Explanation for Client-Side Model Predictions
1: Input: Trained model parameters $\theta_i$, test data $D_i^{test}$
2: Output: Explanations $E$; for each instance $x_i \in D_i^{test}$ do
3: Compute prediction $\hat{y}_i = f_i(x_i)$
4: Generate perturbed dataset $Z$ around $x_i$; for each $z \in Z$ do
5: Compute $f_i(z)$
6: Compute weight $\pi_{x_i}(z)$
7: Fit linear model $g$ to approximate $f_i$ locally
8: Minimize weighted loss: $\mathcal{L}(f_i, g, \pi_{x_i}) = \sum_{z \in Z} \pi_{x_i}(z)\,\big(f_i(z) - g(z)\big)^2$
9: Extract explanation $E_i = \{w_j\}$
10: Save $E_i$
11: return $E$
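In practice, Algorithm 3 maps directly onto the lime Python package. The sketch below assumes the fine-tuned client model and tokenizer from the earlier training sketch and shows the binary case for brevity; in the multi-class setting, explicit labels or top_labels would be passed to explain_instance.

```python
import torch
from lime.lime_text import LimeTextExplainer

CLASS_NAMES = ["not_cyberbullying", "cyberbullying"]   # illustrative class names

def predict_proba(texts):
    """classifier_fn for LIME: maps a list of raw strings to class probabilities from the local model."""
    enc = tokenizer(list(texts), padding=True, truncation=True, max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return torch.softmax(logits, dim=-1).numpy()

explainer = LimeTextExplainer(class_names=CLASS_NAMES)
explanation = explainer.explain_instance("you are such an idiot", predict_proba, num_features=8)
print(explanation.as_list())   # token weights w_j behind this particular prediction
```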
To extend our study beyond the IID assumption, we additionally simulate realistic federated learning conditions. Non-IID client splits are generated using a Dirichlet distribution with varying concentration parameters $\alpha$, where lower values induce stronger label skew. Furthermore, in each communication round, only a fraction of the clients is selected to participate, reflecting partial availability in real-world FL. FedSGD, FedAvg, and FedProx are all evaluated under these conditions to provide a comprehensive comparison.
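A possible implementation of the Dirichlet-based label-skew partitioning is sketched below, assuming integer class labels; the helper name is hypothetical, and the default α value mirrors the setup described in Section 3.2.

```python
import numpy as np

def dirichlet_partition(labels, num_clients=20, alpha=0.5, seed=0):
    """Assign sample indices to clients with Dirichlet(alpha) label skew; lower alpha = stronger skew."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        shares = rng.dirichlet(alpha * np.ones(num_clients))          # class-c proportion per client
        cut_points = (np.cumsum(shares) * len(idx)).astype(int)[:-1]
        for client_id, chunk in enumerate(np.split(idx, cut_points)):
            client_indices[client_id].extend(chunk.tolist())
    return client_indices

# Usage sketch: partition training labels, then sample a fraction of clients each round
# parts = dirichlet_partition(y_train, num_clients=20, alpha=0.5)
# selected = np.random.default_rng(1).choice(20, size=10, replace=False)
```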
6. Limitations and Future Work
While our study evaluates performance on a single benchmark dataset, the proposed framework is readily applicable to multi-platform scenarios (e.g., Twitter (now X), Reddit, and Facebook). We identify cross-platform validation as an important avenue for future work to assess robustness across heterogeneous linguistic and social contexts. Furthermore, while we report detailed per-class metrics under a stratified train–test split, future work may extend this to k-fold cross-validation to further validate class-specific stability.

This work demonstrated illustrative explanations using LIME at the tokenized input level; we note two important directions for strengthening this aspect. First, a human-in-the-loop evaluation could be incorporated, for example, by comparing highlighted tokens with annotator rationales or domain expert judgments. This would help validate the faithfulness of model explanations. Second, as LIME may exhibit sensitivity when applied to Transformer models with subword tokenization, future work will explore complementary interpretability approaches such as SHAP, Integrated Gradients, or attention-based explanation methods. These extensions would provide more stable and human-aligned insights into model behavior in federated cyberbullying detection.

Although our centralized models achieved >98% accuracy, this may reflect a performance ceiling effect caused by class balancing. While safeguards were applied to prevent data leakage, future work should test the framework on more diverse, naturally imbalanced datasets where performance may be lower but more realistic.

While our experiments focus on moderate-scale simulations, scalability to large-scale social networks is feasible with federated learning. In practice, only a fraction of clients participate in each round (e.g., 1–10%), which reduces the per-round communication from $O(N)$ to $O(pN)$, where $p$ is the participation rate. Prior FL studies have shown that with random sampling of just 5–10% of clients, convergence accuracy remains within 1–2% of full participation, even at million-client scale. Techniques such as update compression (e.g., 8-bit quantization achieving 4–8× communication savings) and sparsification (transmitting only the top 1–5% of gradients) further reduce per-round overhead. Moreover, hierarchical aggregation (edge server to central server) can cut aggregation latency by up to 50% in geo-distributed settings. Integrating these well-established optimizations into our framework ensures that communication costs grow sub-linearly with network size, enabling deployment across social networks with tens of millions of active users.
Automated moderation systems inevitably face trade-offs between false positives and false negatives. False positives (benign posts incorrectly flagged as bullying) may suppress free expression and lead to unnecessary censorship, while false negatives (bullying posts not detected) can allow harmful content to persist and cause real harm to affected individuals. In practice, the balance between these errors should be tailored to platform-specific values and contexts (for example, prioritizing recall in high-risk environments such as adolescent forums, or precision in contexts where over-censorship poses serious concerns). We emphasize that automated tools should complement, not replace, human moderators, and we highlight the importance of transparency and user recourse mechanisms in deployment.