Article

A Hybrid Neural Network Transformer for Detecting and Classifying Destructive Content in Digital Space

by Aleksandr Chechkin 1, Ekaterina Pleshakova 2,* and Sergey Gataullin 2,3

1 Department of Mathematics and Data Analysis, Financial University Under the Government of the Russian Federation, Leningradsky Ave., 49/2, 125167 Moscow, Russia
2 MIREA—Russian Technological University, Vernadsky Ave., 78, Bldg. 4, 119454 Moscow, Russia
3 Social Modeling Lab, Central Economics and Mathematics Institute, Russian Academy of Sciences, Nakhimovsky Pr., 47, 117418 Moscow, Russia
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(12), 735; https://doi.org/10.3390/a18120735
Submission received: 15 October 2025 / Revised: 16 November 2025 / Accepted: 21 November 2025 / Published: 23 November 2025

Abstract

Cybersecurity remains a key challenge in the development of intelligent telecommunications systems and the Internet of Things (IoT). The growing destructive impact of the digital environment, amplified by high-performance computing (HPC), requires effective countermeasures to ensure the security of the digital space. Traditional approaches to detecting destructive content are primarily limited to static text analysis, which ignores the temporal dynamics and evolution of destructive impact scenarios. This is critical for monitoring tasks in the digital environment, where threats rapidly evolve. To overcome this limitation, this study proposes a hybrid architecture, Hyb-TKAN, based on adaptive algorithms that account for the temporal component and nonlinear dependencies. This approach enables not only the classification of destructive messages but also the analysis of their development and transformation over time. Unlike existing studies, which focus on individual aspects of aggressive content, the model utilizes multilayered data analysis to identify hidden relationships and nonlinear patterns in destructive messages. The integration of these components ensures high adaptability and accuracy of text processing. The presented approach was implemented in a multi-class classification task with evaluation based on real text data. The obtained results demonstrate improved classification accuracy. In the Experimental Results Section, the results are compared with the closest modern analogs, confirming the relevance and competitiveness of the proposed hybrid neural network.

1. Introduction

The emergent abilities of generative artificial intelligence (GenAI) and large language models (LLMs) [1] are shifting the cybersecurity paradigm [2]. On the one hand, deep learning (DL) architectures for large neural network structures and LLMs are being developed to solve specific problems, such as the latent consistency model (LCM) [3,4,5], vision language model (VLM) [6], small language model (SLM) [7,8,9,10,11,12], language action model (LAM) [13,14], masked language model (MLM) [15,16,17], segment anything model (SAM) [18,19,20,21,22,23], etc. On the other hand, the computational complexity of modern GenAI systems, including the aforementioned large-scale language and multimodal models, requires a continuous and exponential increase in computing power. These models are characterized by a huge number of parameters and require large-scale parallelization of computations on resource-intensive graphics processing units (GPUs) in cloud or data-centric solutions.
The paper [24] analyzes various computing system (CS) architectures used in recent decades, identifying the most common structures for building cluster CSs. To ensure high accuracy, interpretability, and computational efficiency of such systems, it is advisable to develop new algorithms adaptable to complex, evolving cybersecurity attacks. Kolmogorov-Arnold networks (KANs) [25], based on Kolmogorov's superposition theorem for multidimensional functions, represent a major direction toward compact, interpretable, and high-performance models [26]. Although their practical application remains insufficiently studied, impressive results have already been achieved in text analytics, speech emotion analysis, and digital signal processing [27,28,29,30,31,32,33], which can be leveraged in multimodal data processing tasks in intelligent telecommunication systems and the Internet of Things (IoT) to ensure cybersecurity. This study presents a hybrid neural network model with multi-domain dynamic attention based on Kolmogorov-Arnold networks (Hyb-TKAN), combining a transformer architecture, an attention mechanism, and bidirectional recurrent neural networks. The integration of bidirectional long short-term memory recurrent networks, the transformer architecture, a multi-domain dynamic attention module, and the Kolmogorov-Arnold network framework ensures high adaptability and accuracy in processing text information. The suggested approach is implemented in a multinomial cyberbullying classification task using real-world text data. The obtained results demonstrate an increase in classification accuracy over traditional models and progress toward a balance between accuracy, computational efficiency, and interpretability of solutions. Future work could investigate modifications of KANs [34], for example, KKANs [35], in digital biomedical signal processing tasks. Most existing models in this area focus on architectural compositions, while the algorithmic mechanisms of feature fusion and temporal dependency modeling remain insufficiently formalized. The present work focuses on the algorithmic formulation of hybrid temporal–semantic attention, leading to a reproducible computational scheme rather than an architectural configuration only.

2. Related Works

Despite the progress achieved in hybrid Transformer–RNN architectures, existing approaches remain limited in their ability to capture cross-domain dependencies and dynamic contextual patterns relevant to cybersecurity and social-engineering threat detection. Current works typically focus either on temporal modeling (e.g., BiLSTM) or on contextual embedding (e.g., RoBERTa-based Transformers) and lack an integrated mechanism for multi-domain adaptation and interpretable reasoning.
To address these gaps, this study introduces a Hybrid Temporal Kolmogorov-Arnold Network (Hyb-TKAN) framework that unifies four complementary components: a BiLSTM for modeling sequential syntactic dependencies; a RoBERTa Transformer for extracting high-level contextual embeddings; a Multi-Domain Dynamic Attention Network (MD-DAN) for semantic, contextual, and temporal feature interaction; and a Temporal-KAN functional layer for interpretable nonlinear decomposition based on the Kolmogorov-Arnold representation theorem. The resulting integrated hybrid architecture captures both long-term temporal dependencies and cross-domain contextual relations in a unified learning framework.
Applied research on social media cyberbullying detection and classification employs various approaches to improve the quality of deep learning algorithms. The Experimental Section therefore includes a comparative performance analysis of the proposed hybrid neural network model against several widely used methods on a benchmark dataset. The performance of the proposed method was evaluated with a set of eight modern transformer-type architectures, including BERT (bert-base-uncased), a bidirectional architecture for extracting contextual representations of words [36]; its lightweight version DistilBERT, a compact transformer model that inherits BERT's bidirectional contextual representations at a significantly lower computational cost [37]; and RoBERTa, a retrained version of BERT with modified hyperparameters and expanded data corpora [38]. To test against more recent architectural innovations, we included XLNet [39], which uses permutation language modeling to overcome BERT's assumption of independence among masked tokens; ALBERT, which applies a factorized decomposition of the embedding matrix and a parameter-sharing mechanism to significantly reduce the model size without losing quality [40]; ELECTRA [41], a transformer-based pretrained model that uses a token-substitution (replaced-token detection) task instead of masking; and DeBERTa, a modification of BERT and RoBERTa that uses an improved disentangled attention mechanism to provide more accurate semantic dependency extraction [42].

3. Materials and Methods

3.1. Dataset

To solve the problem of multi-class message classification in the context of offensive and toxic user comments, we used an English-language dataset containing annotated user comments from Twitter, Wikipedia Talk pages, and YouTube. The dataset contains over 150,000 comment texts, labeled according to the following offensive categories: insults, threats, obscene language, sexually explicit content, identity threats, and severe toxicity. The data is divided into six classes accordingly. Figure 1 shows an example of tweets from each category in the dataset.
The dataset used in this study was derived from the publicly available Cyberbullying Detection Dataset on Kaggle, containing approximately 150,000 English-language comments collected from multiple online social-media sources. Each comment was annotated with one or more of the following labels: insult, threat, profanity, religious hate, age discrimination, and neutral/non-toxic content.
To ensure rigorous evaluation and prevent data leakage, the dataset was randomly divided into three non-overlapping subsets using a 70:15:15 ratio for training, validation, and testing, respectively. All partitions preserve the overall class distribution by means of stratified sampling.
The dataset is moderately imbalanced: the neutral and insult categories dominate (approximately 42% and 28% of samples), whereas threat and age discrimination classes occur less frequently (around 6% and 5%, respectively). To mitigate imbalance effects, we applied a class-weighted loss function and oversampling of underrepresented classes within each mini-batch during training.
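A minimal sketch of this partitioning and weighting scheme is given below. It assumes scikit-learn is available and that the labels are encoded as integer class identifiers; the variable and function names are illustrative and not taken from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def stratified_split(texts, labels, seed=42):
    """70:15:15 stratified split into train/validation/test subsets."""
    # First split off 70% for training while preserving the class distribution.
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        texts, labels, test_size=0.30, stratify=labels, random_state=seed)
    # Split the remaining 30% in half: 15% validation, 15% test.
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)

def class_weights(labels):
    """Inverse-frequency weights for the class-weighted cross-entropy loss."""
    labels = np.asarray(labels)
    counts = np.bincount(labels)
    weights = counts.sum() / (len(counts) * counts)
    return {cls: float(w) for cls, w in enumerate(weights)}
```

Oversampling of minority classes within each mini-batch can then be implemented on top of the training subset, for example by sampling indices with probabilities proportional to these weights.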

3.2. Data Preparation and Pre-Processing

At the initial stage, the raw text data underwent a comprehensive cleaning and normalization procedure. This included eliminating extraneous elements such as hyperlinks, HTML tags, numbers, punctuation marks, special symbols, and redundant character repetitions inside words. All tokens were converted to lowercase, and common stop words (e.g., and, or, in) were excluded to reduce noise. Non-standard symbols were discarded, abbreviations were expanded to their full form, and duplicate entries together with incomplete records were removed. In addition, stemming was applied to reduce words to their base form, and text formatting was unified to a standard representation (Figure 2).
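The following sketch illustrates the cleaning steps listed above using plain Python with regular expressions. The stop-word list and abbreviation dictionary are illustrative fragments rather than the sets used in the paper, and stemming (e.g., with a Porter stemmer) would follow as a separate step.

```python
import re

# Illustrative fragments; the full pipeline would use standard resources (e.g., an NLTK stop-word list).
STOP_WORDS = {"and", "or", "in", "the", "a", "an", "of", "to", "is"}
ABBREVIATIONS = {"u": "you", "r": "are", "idk": "i do not know"}

def clean_text(text: str) -> str:
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # remove hyperlinks
    text = re.sub(r"<[^>]+>", " ", text)                  # remove HTML tags
    text = re.sub(r"\d+", " ", text)                      # remove numbers
    text = re.sub(r"[^\w\s]", " ", text)                  # remove punctuation and special symbols
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)            # collapse repeated characters ("soooo" -> "soo")
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    tokens = [tok for tok in tokens if tok not in STOP_WORDS]
    return " ".join(tokens)
```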
After preprocessing, the corpus was tokenized and forwarded into a RoBERTa-based Transformer. This component is responsible for constructing contextual embeddings for each token, enabling effective capture of semantic and syntactic dependencies within the sequence. In this work, RoBERTa was selected as the backbone model for tokenization and labeling due to its robustness and proven efficiency in natural language processing tasks, especially when dealing with complex linguistic structures.
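A hedged example of the tokenization step is shown below. It assumes the Hugging Face transformers library and the roberta-base checkpoint; the paper does not specify the exact tokenizer configuration, so the maximum sequence length and padding strategy are assumptions.

```python
from transformers import RobertaTokenizerFast

# Assuming the publicly available "roberta-base" checkpoint.
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")

encoded = tokenizer(
    ["you are a complete idiot", "have a nice day"],   # illustrative cleaned comments
    padding="max_length",      # pad every comment to the same length
    truncation=True,
    max_length=128,            # illustrative sequence length T
    return_tensors="np",
)
print(encoded["input_ids"].shape)       # (2, 128)
print(encoded["attention_mask"].shape)  # (2, 128)
```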

3.3. Core Algorithmic Components

The proposed Hyb-TKAN architecture integrates several well-established deep learning modules: BiLSTM, Transformer, Multi-Domain Dynamic Attention Network (MD-DAN), and Kolmogorov-Arnold Network (KAN). Each module addresses a distinct type of dependency or representation that is critical for complex multi-domain text classification tasks such as cyberbullying and social engineering detection.
The BiLSTM network captures local and sequential dependencies by processing text in both forward and backward directions. This bidirectional structure provides contextual awareness in temporal sequences, improving the recognition of syntactic patterns such as negations or sequential sentiment cues. Advantages: effective modeling of long-term dependencies with a relatively small number of parameters. Limitations: the sequential nature of recurrence makes parallelization difficult, and modeling long-range context remains limited compared to self-attention mechanisms.
The Transformer module (based on RoBERTa) models global contextual dependencies via multi-head self-attention. It efficiently learns hierarchical semantic representations, regardless of token position. Advantages: Excellent parallelization, scalability, and semantic coherence across long sequences. Limitations: Quadratic complexity with respect to sequence length and limited interpretability of attention weights.
MD-DAN enhances feature fusion by learning to dynamically weigh three complementary subspaces: semantic, contextual, and temporal. This design enables adaptive re-calibration of attention based on domain-specific signal variability. Advantages: Domain-adaptive feature weighting and improved robustness across heterogeneous data sources. Limitations: Increased computational cost and potential redundancy between attention branches if hyperparameters are not well tuned.
Temporal Kolmogorov-Arnold Network (T-KAN). The KAN component implements functional decomposition
$f(x) = \sum_{i=1}^{n} \varphi_i\, g_i(x)$
to achieve interpretable nonlinear mapping between temporal features and decision boundaries. It approximates complex dependencies with fewer parameters while providing interpretability through its functional basis. Advantages: Compact parameterization, interpretability, and smooth functional behavior. Limitations: Sensitivity to choice of basis functions and relatively slower convergence during fine-tuning.
In the proposed Hyb-TKAN, these modules operate synergistically: BiLSTM extracts temporal–syntactic cues, RoBERTa captures global context, MD-DAN aligns heterogeneous domains, and T-KAN provides interpretable functional reasoning. Their integration yields a balanced trade-off between accuracy, interpretability, and computational efficiency.

3.4. The Proposed Approach

This paper presents a hybrid architecture of the Hyb-TKAN neural network transformer, which combines the key advantages of the BiLSTM, Transformer, Multi-Domain Dynamic Attention Network (MD-DAN), and Temporal-KAN models. The proposed structure combines BiLSTM’s ability to effectively capture local relationships within sequences with the Transformer’s ability to analyze global contextual dependencies. The additional MD-DAN module enhances the model’s performance by emphasizing the semantic, contextual, and temporal characteristics of the data.
The hybrid structure allows the advantages of Transformer and BiLSTM to complement each other when processing sequential text data, while Temporal-KAN enhances the model’s ability to account for evolving dependencies across time. This makes it particularly effective in scenarios where destructive content spreads dynamically, such as coordinated disinformation campaigns.
The architecture consists of several modules, including the BiLSTM module, the Transformer module, the Multi-Domain Dynamic Attention Network (MD-DAN), the Temporal-KAN block, and the final classifier. Key features of Hyb-TKAN include multi-level analysis, in which BiLSTM captures local syntactic and sequential dependencies, Transformer processes global contextual information, MD-DAN provides semantic, contextual, and temporal attention, and Temporal-KAN explicitly models nonlinear temporal dependencies, improving the system’s ability to detect dynamic and hidden patterns in the data.
The synergy of BiLSTM, Transformer, MD-DAN, and Temporal-KAN provides flexibility, adaptability, and accuracy in text and sequence analysis. The model can be easily applied across different domains and tasks thanks to its hybrid architecture and the introduction of Temporal-KAN, which enables effective handling of time-dependent and nonlinear relationships. In particular, it can be employed to detect disinformation and fake news in social media streams, monitor the spread of destructive narratives in online communities, identify coordinated bot-driven campaigns, and analyze temporal patterns of cyberbullying or hate speech escalation. The structure of the proposed model is shown in Figure 3.
The proposed hybrid neural network transformer with multi-domain dynamic attention integrates recurrent, transformer, and feature-oriented architectures to extract multidimensional features from text data and then analyze them efficiently for classification tasks. This section presents a formalization of the key components of the architecture.
The proposed Hyb-TKAN model integrates the advantages of BiLSTM, Transformer, a Multi-Domain Dynamic Attention Network (MD-DAN), and a KAN. A formalization of the model’s main modules is presented below.

3.5. Input Data

Consider the input text X as a sequence of embeddings:
$X = \{x_1, x_2, \ldots, x_T\}, \quad x_i \in \mathbb{R}^{d},$
where $T$ is the length of the sequence and $d$ is the embedding dimension.

3.6. Bidirectional LSTM (BiLSTM)

BiLSTM is used to model local syntactic context:
$\overrightarrow{h}_t = \mathrm{LSTM}_f(x_t, \overrightarrow{h}_{t-1}),$
$\overleftarrow{h}_t = \mathrm{LSTM}_b(x_t, \overleftarrow{h}_{t+1}),$
$h_t = [\overrightarrow{h}_t; \overleftarrow{h}_t],$
where $x_t$ is the token embedding at time step $t$, and $h_t$ is the concatenated forward and backward hidden state of the BiLSTM at time step $t$.
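For illustration, the bidirectional encoder can be expressed in the TensorFlow/Keras API used in this work; the dimensions below are assumptions, not values reported in the paper.

```python
import tensorflow as tf

T, d, h = 128, 768, 256   # sequence length, embedding dimension, LSTM hidden size (illustrative)

embeddings = tf.keras.Input(shape=(T, d))   # the sequence x_1, ..., x_T
# return_sequences=True keeps one hidden state per time step; Bidirectional concatenates
# the forward and backward states, giving h_t = [h_t_fwd ; h_t_bwd] of dimension 2h.
bilstm_states = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(h, return_sequences=True))(embeddings)

print(bilstm_states.shape)   # (None, 128, 512) = (batch, T, 2h)
```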

3.7. Transformer Module (RoBERTa)

For global context encoding, a pre-trained RoBERTa transformer is used:
$H_{\mathrm{Trans}} = \mathrm{RoBERTa}(X) = \{r_1, \ldots, r_T\}, \quad r_t \in \mathbb{R}^{d_r},$
where $r_t$ is the contextual embedding of the $t$-th token and $d_r$ is the dimension of the transformer output.
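A short sketch of extracting the contextual embeddings $H_{\mathrm{Trans}}$ is given below, assuming the TensorFlow RoBERTa weights from the transformers library and the encoded batch from the tokenization sketch in Section 3.2.

```python
import tensorflow as tf
from transformers import TFRobertaModel

# Assuming the TensorFlow weights of "roberta-base" (hidden size d_r = 768 for that checkpoint).
roberta = TFRobertaModel.from_pretrained("roberta-base")

# `encoded` is the batch produced in the tokenization sketch above.
outputs = roberta(input_ids=tf.constant(encoded["input_ids"]),
                  attention_mask=tf.constant(encoded["attention_mask"]))
H_trans = outputs.last_hidden_state   # shape (batch, T, d_r): one contextual vector r_t per token
```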

3.8. Multi-Domain Dynamic Attention Network (MD-DAN)

Let $E_i$ be the combined feature vector for the $i$-th token, obtained by concatenating the hidden states:
$E_i = [h_i; r_i] \in \mathbb{R}^{2h + d_r}.$
Semantic attention:
$\alpha_i = \mathrm{softmax}(E_i W_s), \qquad S = \sum_{i=1}^{T} \alpha_i E_i,$
where $\alpha_i$ is the normalized attention coefficient for the $i$-th element, obtained via the softmax operation, such that $\sum_{i=1}^{T} \alpha_i = 1$ and $\alpha_i \geq 0$.
Contextual self-attention:
$Q_i = E_i W_q, \qquad K_j = E_j W_k, \qquad V_j = E_j W_v,$
$A_{ij} = \dfrac{Q_i K_j^{\top}}{\sqrt{d_k}}, \qquad C_i = \sum_{j=1}^{T} \mathrm{softmax}(A_{ij})\, V_j.$
Aggregation of domain features:
$F_{\mathrm{MD\text{-}DAN}} = w_s S + w_c C + w_t T, \qquad w_s, w_c, w_t \in \mathbb{R},$
where $S$, $C$, and $T$ denote the semantic, contextual, and temporal domain features, respectively.
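The following layer sketch mirrors the semantic attention, contextual self-attention, and weighted domain fusion defined above. The pooling of the contextual branch and the source of the temporal feature vector are assumptions made to keep the example self-contained; in the full model the temporal input would come from the Temporal-KAN block of Section 3.9.

```python
import tensorflow as tf

class MDDAN(tf.keras.layers.Layer):
    """Sketch of the multi-domain dynamic attention fusion (Section 3.8).
    All branches are projected to a common dimension d_model so that the
    weighted sum w_s*S + w_c*C + w_t*T is well defined (an assumption)."""

    def __init__(self, d_model):
        super().__init__()
        self.score = tf.keras.layers.Dense(1)        # E_i W_s for semantic attention
        self.q = tf.keras.layers.Dense(d_model)
        self.k = tf.keras.layers.Dense(d_model)
        self.v = tf.keras.layers.Dense(d_model)
        self.proj = tf.keras.layers.Dense(d_model)   # maps E_i to the common dimension
        # Learnable scalar weights w_s, w_c, w_t for the domain fusion.
        self.fusion_w = self.add_weight(name="fusion_w", shape=(3,), initializer="ones")

    def call(self, E, temporal_feat):
        # Semantic attention: alpha_i = softmax_i(E_i W_s), S = sum_i alpha_i E_i.
        alpha = tf.nn.softmax(self.score(E), axis=1)                  # (batch, T, 1)
        S = tf.reduce_sum(alpha * self.proj(E), axis=1)               # (batch, d_model)

        # Contextual scaled dot-product self-attention, mean-pooled over tokens.
        Q, K, V = self.q(E), self.k(E), self.v(E)
        d_k = tf.cast(tf.shape(K)[-1], tf.float32)
        A = tf.matmul(Q, K, transpose_b=True) / tf.sqrt(d_k)          # (batch, T, T)
        C = tf.reduce_mean(tf.matmul(tf.nn.softmax(A, axis=-1), V), axis=1)  # (batch, d_model)

        w_s, w_c, w_t = tf.unstack(self.fusion_w)
        return w_s * S + w_c * C + w_t * temporal_feat                # F_MD-DAN
```

The layer is applied to the concatenated token features $E$ and a temporal vector of the same dimension; how that temporal vector is produced is formalized in the next subsection.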

3.9. Temporal-KAN

To capture nonlinear temporal dependencies, we introduce Temporal-KAN with explicit time embeddings e t :
$g_j(x_{j,t}, e_t) = a_{j0} + \sum_{k=1}^{p} a_{jk}\, x_{j,t}^{k} + \sum_{r=1}^{R} \big[ c_{jr} \sin(2\pi r \tau_t) + d_{jr} \cos(2\pi r \tau_t) \big],$
$y_t^{(\mathrm{tmp})} = \sum_{i=1}^{m} \varphi_i \Big( \sum_{j=1}^{n} \omega_{ij}\, g_j(x_{j,t}, e_t) + b_i \Big),$
where $\tau_t$ is the normalized timestamp and $\varphi_i$ are trainable nonlinear activations.
The temporal output is aggregated:
$F_{\mathrm{T\text{-}KAN}} = \dfrac{1}{T} \sum_{t=1}^{T} y_t^{(\mathrm{tmp})}.$
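A NumPy sketch of a single Temporal-KAN step under the equations above is shown below. The outer activation $\varphi_i$ is taken as tanh here, whereas the paper uses trainable nonlinear activations, and all coefficient shapes are illustrative.

```python
import numpy as np

def temporal_kan_step(x_t, tau_t, params, p=3, R=2):
    """One Temporal-KAN step for a single time index t (illustrative sketch).

    x_t    : feature vector of shape (n,) at time t
    tau_t  : normalized timestamp in [0, 1]
    params : dict with coefficients a (n, p+1), c (n, R), d (n, R),
             omega (m, n), b (m,)
    """
    a, c, d = params["a"], params["c"], params["d"]
    omega, b = params["omega"], params["b"]

    # Polynomial part: a_j0 + sum_k a_jk * x_{j,t}^k  (column 0 of `a` holds a_j0).
    powers = np.stack([x_t ** k for k in range(p + 1)], axis=1)        # (n, p+1)
    poly = np.sum(a * powers, axis=1)                                  # (n,)

    # Fourier time-embedding part: sum_r c_jr sin(2*pi*r*tau) + d_jr cos(2*pi*r*tau).
    r = np.arange(1, R + 1)
    fourier = c @ np.sin(2 * np.pi * r * tau_t) + d @ np.cos(2 * np.pi * r * tau_t)

    g = poly + fourier                                                 # (n,)
    # Outer sum: y_t = sum_i phi_i( sum_j omega_ij g_j + b_i ), with phi = tanh here.
    return np.sum(np.tanh(omega @ g + b))

# F_T-KAN is then the mean of y_t over all T time steps (aggregation equation above).
```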
Algorithm 1 outlines the main computational stages of the Hyb-TKAN learning algorithm, combining BiLSTM-based local modeling, Transformer-based global context extraction, and Temporal-KAN nonlinear processing into a unified optimization pipeline.
The novelty of the Hyb-TKAN lies in the algorithm that adaptively fuses temporal, semantic, and contextual representations. The following pseudocode formalizes this mechanism as an integrated optimization algorithm.
Algorithm 1: Hybrid Temporal–KAN Attention (Hyb–TKAN)
Input: Token sequence $X = \{x_1, \ldots, x_T\}$, time indices $\{t_1, \ldots, t_T\}$, parameters $\Theta$
Output: Predicted class probabilities $\hat{y}$
(The step-by-step body of Algorithm 1 is provided as a figure in the original publication.)
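Since the algorithm body is typeset as a figure in the original publication, the following Keras sketch only approximates the overall pipeline: a generic multi-head attention block stands in for the RoBERTa encoder, a dense time-conditioned branch stands in for the Temporal-KAN block, and simple concatenation replaces MD-DAN. All dimensions and layer sizes are assumptions.

```python
import tensorflow as tf

T, d, h, n_classes = 128, 768, 256, 6   # illustrative dimensions

token_emb = tf.keras.Input(shape=(T, d), name="token_embeddings")
time_feat = tf.keras.Input(shape=(T, 1), name="normalized_timestamps")

# 1. Local syntactic context (BiLSTM over the token embeddings).
h_local = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(h, return_sequences=True))(token_emb)

# 2. Global context: a generic self-attention block stands in for the RoBERTa encoder.
h_global = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=64)(token_emb, token_emb)

# 3. Token-level fusion E_i = [h_i ; r_i].
E = tf.keras.layers.Concatenate(axis=-1)([h_local, h_global])

# 4. Time-conditioned branch standing in for the Temporal-KAN block.
temporal = tf.keras.layers.Dense(64, activation="tanh")(
    tf.keras.layers.Concatenate(axis=-1)([E, time_feat]))

# 5. Pool over tokens and classify (MD-DAN is approximated by concatenation here).
pooled = tf.keras.layers.GlobalAveragePooling1D()(
    tf.keras.layers.Concatenate(axis=-1)([E, temporal]))
probs = tf.keras.layers.Dense(n_classes, activation="softmax")(pooled)

model = tf.keras.Model(inputs=[token_emb, time_feat], outputs=probs)
model.compile(optimizer=tf.keras.optimizers.Adam(3e-5),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```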

4. Experimental Results and Evaluation

This part of the study reports the outcomes of the experiments and provides a comparative evaluation of the proposed approach against several widely used methods.

4.1. Data Processing Tools

The algorithm was implemented in Python 3.9.13 using the TensorFlow machine learning library. The experimental platform was a DEPO Storm 3450T4R server (DATSN.466219.013-03; configuration SMD/2xG6230/1024GBRE16/L9361-8i/2DT480/4T4000G7/2DT960L/2DT960L/8HSDA/DATSN.469535.001/16D/6E/4GLAN/IPMI+/RTX3080/RTX3080/1200W2HS/FP/ONS3S) with two G6230 processors (20 cores, 40 threads, 2.1 GHz, 27.5 MB cache, 125 W each) and 1024 GB of RAM (8 × 128 GB DDR4 ECC REG).

4.2. Model Results

This section presents the results of experiments conducted using the Hyb-TKAN model, which is developed as a dynamic framework for analyzing the evolution of destructive impact scenarios in the digital space and their classification. The proposed model was tested in multi-class classification mode on a real-world cyberbullying dataset.
The model was trained for 20 epochs with the Adam optimizer, an initial learning rate of $3 \times 10^{-5}$, and a batch size of 32. The learning rate was adaptively decreased by a factor of 0.5 if the validation loss did not improve for 3 consecutive epochs. We employed early stopping based on the validation F1-score, with a patience of 5 epochs, retaining the checkpoint with the best validation performance. The loss function was class-weighted cross-entropy, as described in Section 3.1, to compensate for moderate class imbalance. Dropout regularization (p = 0.3) was applied to the BiLSTM and fully connected layers to prevent overfitting. Each experiment was repeated five times with different random seeds (42, 73, 101, 202, 777), and the reported metrics represent the mean ± standard deviation across runs. The following metrics were used to evaluate performance on the test subset: accuracy, precision, recall, F1-score, and AUC (macro-averaged over all classes).
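A hedged reproduction of this training configuration in Keras is shown below. Here `model` is the compiled network from the earlier sketch, the data splits and `class_weights` come from the Section 3.1 sketch, and, for simplicity, early stopping is monitored on the validation loss rather than the F1-score used in the paper.

```python
import tensorflow as tf

callbacks = [
    # Halve the learning rate if validation loss fails to improve for 3 epochs.
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3),
    # Stop after 5 epochs without improvement and restore the best checkpoint.
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True),
]

# X_*_emb / X_*_time / y_* are placeholders for the embedded inputs, time features,
# and labels of the train and validation subsets (illustrative names).
history = model.fit(
    x=[X_train_emb, X_train_time], y=y_train,
    validation_data=([X_val_emb, X_val_time], y_val),
    epochs=20,
    batch_size=32,
    class_weight=class_weights,   # inverse-frequency weights from Section 3.1
    callbacks=callbacks,
)
```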
The developed Hyb-TKAN model demonstrated high performance in solving the multi-class cyberbullying classification problem, providing robust and consistent recognition quality. Thanks to its hybrid architecture, which combines the advantages of BiLSTM, Transformer, MD-DAN, and KAN, the model successfully captures both global and local dependencies in text data. The final model accuracy was 96.3%, F1-score 95.7%, precision 96.6%, and recall 94.9%, indicating balanced performance across all key metrics (Figure 4).
The obtained results confirm the model’s high ability to detect various forms of aggression in online communication, including subtle manifestations of toxicity, and highlight its applicability for automated monitoring of the digital space. The high performance of the Hyb-TKAN model in the task of multi-class classification of toxic comments is confirmed by analysis of the confusion matrix presented in Figure 5. The model demonstrated accurate identification across all key cyberbullying categories. Moreover, the number of misclassifications between adjacent classes (off-diagonal) remains minimal, demonstrating the model’s high selective ability. The overall accuracy reached 96.3%, reflecting the model’s ability to reliably generalize and accurately classify various forms of online aggression. This result demonstrates the robustness of the Hyb-TKAN architecture, capable of effectively extracting and interpreting hidden semantic dependencies in text, making it a reliable tool for monitoring toxic behavior in the digital space.
The integration of the Temporal-KAN module significantly expands the model’s interpretability. Unlike standard neural network architectures, which often operate as a “black box,” Temporal-KAN allows for explicit tracking of the contribution of temporal factors and nonlinear dependencies to the classification process. This is particularly important for our task, as the temporal evolution of destructive content reflects hidden patterns in the spread of threats: a gradual intensification of rhetoric, shifts in categories of aggressive behavior, and the emergence of new lexical patterns. The use of temporal functions and polynomial approximations in Temporal-KAN allows us to quantitatively assess which time intervals and patterns of change most significantly influenced the model’s final decision. This approach increases the reliability of classification results, as it allows cybersecurity experts not only to identify destructive messages but also to understand the dynamics of their transformation and key indicators that determine threat escalation.
To assess the overall dynamics of destructive content, we analyzed aggregated monthly indicators across all categories. As shown in Figure 6, the dynamics reveal distinct peaks of activity, reflecting periods of increased destructive activity followed by declines. This result demonstrates Hyb-TKAN’s ability to detect both spikes and long-term changes in malicious communication.
To provide a more detailed view, Figure 7 depicts the monthly dynamics for each individual category. The results show distinct temporal patterns: insults and obscene language remain relatively stable, whereas threats, sexually explicit content, and identity-based attacks exhibit sharper fluctuations. This confirms that different forms of harmful communication evolve at different paces and that the proposed model successfully captures these heterogeneous temporal trends.
The model proposed by the authors in the study was compared with transformer-type architectures, including BERT, DistilBERT, RoBERTa, XLNet, ALBERT, ELECTRA, and DeBERTa (Table 1). The performance of the proposed model and all baseline models was evaluated on a test set using a set of standard classification metrics: Accuracy, Precision, Recall, and F1-score.
The Hyb-TKAN model demonstrated the highest Accuracy value of 96.3%, which significantly exceeds the accuracy of even such powerful models as RoBERTa-large (93.0%) and DeBERTa-v3-small (92.4%). In terms of F1-score, which reflects the balance between precision and recall, Hyb-TKAN also outperforms all compared architectures, reaching 95.7%, while its closest competitors, RoBERTa-large and RoBERTa-base, demonstrated F1-score values of 94.4% and 94.2%, respectively. Additionally, Hyb-TKAN’s Precision was 96.6%, which exceeds the values of most models, including ELECTRA-base (95.6%) and ALBERT-xxlarge (95.1%). This demonstrates the model’s high ability to avoid false positives. Hyb-TKAN’s recall is also high (94.9%), outperforming DistilBERT (88.7%), BERT-base (89.2%), and even more powerful variations of RoBERTa and ALBERT. This demonstrates that combining different methods leverages the strengths of each architecture to improve model accuracy and robustness.
Thus, the presented Hyb-TKAN model, based on a hybrid architecture, demonstrates the most balanced and high performance in the multi-class classification task of toxic comments, including categories such as insults, threats, obscenity, and discrimination. This confirms its high generalization ability and robustness to various types of online textual aggression.

4.3. Discussion on Ablation Studies

Hybrid KAN-based models have a more complex structure than classical models due to the combination of basis and spline functions. While ablation studies are beyond the scope of this research, they would provide a deeper understanding of each module's contribution to the overall model performance. Such studies have been conducted for both widely used deep learning architectures [43] and KAN-based models [44,45,46], and this remains a promising direction for future research.

4.4. Computational Complexity of the Proposed Algorithm

The computational complexity of the proposed hybrid Hyb-TKAN model is determined by the combined costs of all its main components. It is affected by the length of the input text sequence, the size of the BiLSTM hidden state, the embedding dimension, and the number of Transformer layers. The parameters of the Temporal-KAN module, related to the number of basis and nonlinear functions used, also contribute additionally.
In general, BiLSTM increases the load proportionally to the sequence length, Transformer adds quadratic computations characteristic of self-attention mechanisms, and the MD-DAN and Temporal-KAN modules increase the complexity only moderately. Despite this, the additional computations are relatively small compared to traditional architectures while significantly improving data processing quality.
A detailed analysis of the computational complexity of the proposed Hyb-TKAN model was conducted. Given that the model integrates several computational modules—BiLSTM, Transformer (RoBERTa), Multi-Domain Dynamic Attention Network (MD-DAN), and Temporal-KAN—the overall complexity can be expressed as a combination of their individual contributions.
The runtime complexity combines the cost of the BiLSTM, the Transformer block, the multi-domain attention module, and the Temporal-KAN component. For input sequences of approximately 80–120 tokens, the Transformer attention remains the dominant term, while BiLSTM, MD-DAN and T-KAN introduce only moderate linear or near-linear overhead.
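Under standard per-layer cost estimates (an assumption; the paper does not give a closed-form expression), the per-sequence cost can be summarized as

$C_{\text{total}} \approx \underbrace{\mathcal{O}(T h^{2})}_{\text{BiLSTM}} + \underbrace{\mathcal{O}\big(L\,(T^{2} d + T d^{2})\big)}_{\text{Transformer}} + \underbrace{\mathcal{O}(T^{2} d)}_{\text{MD-DAN}} + \underbrace{\mathcal{O}\big(T\, n\,(p + R)\big)}_{\text{Temporal-KAN}},$

where $T$ is the sequence length, $h$ the BiLSTM hidden size, $d$ the embedding dimension, $L$ the number of Transformer layers, and $n$, $p$, $R$ the Temporal-KAN parameters. For $T \approx 80$–$120$ the quadratic Transformer term dominates, consistent with the observation above.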
The following results were obtained.
Thus, the hybrid modules increase inference overhead by only 18–22%, which is expected given the additional MD-DAN and T-KAN computations (Table 2).
Thus, the final computational load of the model remains comparable to conventional Transformer solutions, while Hyb-TKAN provides higher accuracy and robustness of analysis, making it suitable for practical applications, including scenarios requiring fast text processing.

5. Conclusions

The results of this study confirmed the high effectiveness of the proposed hybrid Hyb-TKAN architecture, which combines BiLSTM for accounting for local dependencies, Transformer modules for extracting global context, a multi-layer attention mechanism (MD-DAN) for analyzing semantic and contextual relationships, and a Temporal-KAN block for modeling temporal dynamics and nonlinear dependencies in the data.
The model demonstrated consistent performance in the multi-class classification of destructive text content, achieving 96.3% accuracy, 96.6% precision, 94.9% recall, and 95.7% F1-score. These results outperform several modern Transformer-based baseline models (including RoBERTa-large and ELECTRA-base), confirming the competitiveness of the proposed approach.
In addition to the high accuracy of detecting toxic content in digital communications, the experimental results demonstrated that the proposed model is applicable to broader cybersecurity challenges in intelligent telecommunications systems and Internet of Things (IoT) infrastructure. The specific characteristics of the IoT environment—heterogeneous data flows, limited node computing resources, and high vulnerability to social engineering attacks—require adaptive and interpretable models.
Integrating the Temporal-KAN module enhanced the model’s interpretability and allowed it to account for the evolution of destructive scenarios over time, as well as identify hidden nonlinear dependencies between features. This is especially important for critical cyber–physical applications, where the explainability of automated system decisions plays a key role in user trust. Furthermore, the model’s modular structure enables adaptation to multimodal data (text, audio, sensors), which is becoming increasingly relevant in next-generation cyber–physical systems.
Thus, the Hyb-TKAN architecture represents not only a scientific contribution to the development of hybrid neural network models but also a practical solution with high potential for application in ensuring cybersecurity in IoT and digital infrastructures.

Author Contributions

Conceptualization, E.P. and S.G.; methodology, A.C.; software, E.P.; validation, S.G., E.P. and A.C.; formal analysis, A.C.; investigation, S.G., E.P. and A.C.; resources, E.P.; data curation, E.P.; writing—original draft preparation, S.G., E.P. and A.C.; writing—review and editing, E.P., S.G.; visualization, E.P.; supervision, S.G.; project administration, E.P.; funding acquisition, E.P. and S.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Russian Science Foundation, grant number 25-71-10012.

Institutional Review Board Statement

This study did not require ethical approval.

Informed Consent Statement

Not applicable.

Data Availability Statement

The cyberbullying datasets can be downloaded from https://www.kaggle.com/datasets/shauryapanpalia/cyberbullying-classification (accessed on 14 September 2025). The code is available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yenduri, G.; Ramalingam, M.; Selvi, G.C.; Supriya, Y.; Srivastava, G.; Maddikunta, P.K.R.; Raj, G.D.; Jhaveri, R.H.; Prabadevi, B.; Wang, W.; et al. GPT (Generative Pre-Trained Transformer)—A Comprehensive Review on Enabling Technologies, Potential Applications, Emerging Challenges, and Future Directions. IEEE Access 2024, 12, 54608–54649. [Google Scholar] [CrossRef]
  2. Pleshakova, E.; Osipov, A.; Gataullin, S.; Gataullin, T.; Vasilakos, A. Next Gen Cybersecurity Paradigm Towards Artificial General Intelligence: Russian Market Challenges and Future Global Technological Trends. J. Comput. Virol. Hacking Tech. 2024, 20, 429–440. [Google Scholar] [CrossRef]
  3. Wang, F.Y.; Huang, Z.; Bergman, A.; Shen, D.; Gao, P.; Lingelbach, M.; Sun, K.; Bian, W.; Song, G.; Liu, Y.; et al. Phased Consistency Models. Adv. Neural Inf. Process. Syst. 2024, 37, 83951–84009. [Google Scholar]
  4. Zheng, J.; Hu, M.; Fan, Z.; Wang, C.; Ding, C.; Tao, D.; Cham, T.J. Trajectory Consistency Distillation. CoRR 2024. Available online: https://openreview.net/forum?id=aDJXCgfkf4 (accessed on 20 November 2025).
  5. Geng, Z.; Pokle, A.; Luo, W.; Lin, J.; Kolter, J.Z. Consistency Models Made Easy. arXiv 2024, arXiv:2406.14548. [Google Scholar] [CrossRef]
  6. Liu, D.; Yang, M.; Qu, X.; Zhou, P.; Cheng, Y.; Hu, W. A Survey of Attacks on Large Vision–Language Models: Resources, Advances, and Future Trends. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 19525–19545. [Google Scholar] [CrossRef] [PubMed]
  7. Zhang, P.; Zeng, G.; Wang, T.; Lu, W. TinyLLaMA: An Open-Source Small Language Model. arXiv 2024, arXiv:2401.02385. [Google Scholar]
  8. Schick, T.; Schütze, H. It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 2339–2352. [Google Scholar]
  9. Magister, L.C.; Mallinson, J.; Adamek, J.; Malmi, E.; Severyn, A. Teaching Small Language Models to Reason. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Toronto, ON, Canada, 9–14 July 2023; pp. 1773–1781. [Google Scholar]
  10. Wang, F.; Zhang, Z.; Zhang, X.; Wu, Z.; Mo, T.; Lu, Q.; Wang, W.; Li, R.; Xu, J.; Tang, X.; et al. A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness. In ACM Transactions on Intelligent Systems and Technology; ACM: New York, NY, USA, 2024. [Google Scholar] [CrossRef]
  11. Hu, S.; Tu, Y.; Han, X.; He, C.; Cui, G.; Long, X.; Zheng, Z.; Fang, Y.; Huang, Y.; Zhao, W.; et al. MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies. In Proceedings of the First Conference on Language Modeling, Philadelphia, PA, USA, 7–9 October 2024. [Google Scholar]
  12. Zhou, Z.; Liu, Z.; Liu, J.; Dong, Z.; Yang, C.; Qiao, Y. Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models. Adv. Neural Inf. Process. Syst. 2024, 37, 4819–4851. [Google Scholar]
  13. Kim, M.J.; Pertsch, K.; Karamcheti, S.; Xiao, T.; Balakrishna, A.; Nair, S.; Rafailov, R.; Foster, E.; Lam, G.; Sanketi, P.; et al. OpenVLA: An Open-Source Vision-Language-Action Model. arXiv 2025, arXiv:2406.09246. [Google Scholar]
  14. Wen, J.; Zhu, Y.; Li, J.; Zhu, M.; Tang, Z.; Wu, K.; Xu, Z.; Liu, N.; Cheng, R.; Shen, C.; et al. TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation. IEEE Robot. Autom. Lett. 2025, 10, 3988–3995. [Google Scholar] [CrossRef]
  15. Salazar, J.; Liang, D.; Nguyen, T.Q.; Kirchhoff, K. Masked Language Model Scoring. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020. [Google Scholar]
  16. Bao, H.; Dong, L.; Wei, F.; Wang, W.; Yang, N.; Liu, X.; Wang, Y.; Gao, J.; Piao, S.; Zhou, M.; et al. UNILMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training. In Proceedings of the 37th International Conference on Machine Learning, Virtual, 13–18 July 2020. [Google Scholar]
  17. Sahoo, S.; Arriola, M.; Schiff, Y.; Gokaslan, A.; Marroquin, E.; Chiu, J.; Rush, A.; Kuleshov, V. Simple and Effective Masked Diffusion Language Models. Adv. Neural Inf. Process. Syst. 2024, 37, 130136–130184. [Google Scholar]
  18. Arriola, M.; Chiu, J.; Gokaslan, A.; Kuleshov, V.; Marroquin, E.; Rush, A.; Sahoo, S.; Schiff, Y. Segment Anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023. [Google Scholar]
  19. Zhang, Y.; Shen, Z.; Jiao, R. Segment Anything Model for Medical Image Segmentation: Current Applications and Future Directions. Comput. Biol. Med. 2024, 171, 108238. [Google Scholar] [CrossRef]
  20. Mazurowski, M.A.; Dong, H.; Gu, H.; Yang, J.; Konz, N.; Zhang, Y. Segment Anything Model for Medical Image Analysis: An Experimental Study. Med. Image Anal. 2023, 89, 102918. [Google Scholar] [CrossRef]
  21. Ren, S.; Luzi, F.; Lahrichi, S.; Kassaw, K.; Collins, L.M.; Bradbury, K.; Malof, J.M. Segment Anything, from Space? In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2024. [Google Scholar]
  22. Ke, L.; Ye, M.; Danelljan, M.; Tai, Y.W.; Tang, C.K.; Yu, F. Segment Anything in High Quality. Adv. Neural Inf. Process. Syst. 2023, 36, 29914–29934. [Google Scholar]
  23. Wang, D.; Zhang, J.; Du, B.; Xu, M.; Liu, L.; Tao, D.; Zhang, L. SAMRS: Scaling-Up Remote Sensing Segmentation Dataset with Segment Anything Model. Adv. Neural Inf. Process. Syst. 2023, 36, 8815–8827. [Google Scholar]
  24. Petushkov, G.V.; Sigov, A.S. Analysis and selection of the structure of a multiprocessor computing system according to the performance criterion. Russ. Technol. J. 2024, 12, 20–25. [Google Scholar] [CrossRef]
  25. Liu, Z.; Wang, Y.; Vaidya, S.; Ruehle, F.; Halverson, J.; Soljačić, M.; Hou, T.Y.; Tegmark, M. KAN: Kolmogorov-Arnold Networks. arXiv 2024, arXiv:2404.19756. [Google Scholar]
  26. Kozyrev, A.N. Data Economy, Neural Network Training, and Multidimensional Geometry. Digit. Econ. 2024, 3, 5–13. (In Russian) [Google Scholar] [CrossRef]
  27. Thant, A.M.; Panitanarak, T. Emotion Recognition Through Advanced Signal Fusion and Kolmogorov-Arnold Networks. IEEE Access 2025, 13, 93259–93270. [Google Scholar] [CrossRef]
  28. Ping, J.; Xu, B.; Wang, X.; Zhang, W.; Gao, Z.; Song, A. KAN-GCNN: EEG-Based Emotion Recognition with a Kolmogorov-Arnold Network-Enhanced Graph Convolutional Neural Network. In Proceedings of the 5th International Conference on Robotics and Control Engineering, Linköping, Sweden, 7–10 May 2025. [Google Scholar]
  29. Ghosh, S.; Saha, S.; Jana, N.D. KANGAN-AVSS: Kolmogorov-Arnold Network Based Generative Adversarial Networks for Audio–Visual Speech Synthesis. In Proceedings of the ICASSP 2025—IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 6–11 April 2025. [Google Scholar]
  30. Wang, Z.; Zainal, A.; Siraj, M.M.; Ghaleb, F.A.; Hao, X.; Han, S. An Intrusion Detection Model Based on Convolutional Kolmogorov-Arnold Networks. Sci. Rep. 2025, 15, 1917. [Google Scholar] [CrossRef] [PubMed]
  31. Zheng, J.; Cao, M.; Zhang, C. ICKAN: A Deep Musical Instrument Classification Model Incorporating Kolmogorov-Arnold Network. Sci. Rep. 2025, 15, 21573. [Google Scholar] [CrossRef]
  32. Alsayed, A.; Li, C.; Abdalsalam, M.; Fat’hAlalim, A. A Hybrid Model for Arabic Character Recognition Using CNN and Kolmogorov-Arnold Networks (KANs). Multimed. Tools Appl. 2025, 1–24. [Google Scholar] [CrossRef]
  33. Rizk, F.; Rizk, R.; Rizk, D.; Rizk, P.; Chu, C.-H.H. KAN-MID: A Kolmogorov-Arnold Networks-Based Framework for Malicious URL and Intrusion Detection in IoT Systems. IEEE Access 2025, 13, 160855–160873. [Google Scholar] [CrossRef]
  34. Pleshakova, E.S.; Gataullin, S.T. KAN-BiLSTM: Hybrid Neural Network Model with Multi-Domain Attention for Threat Analysis in Digital Space; Certificate of State Registration of Computer Program No. 2025619243; EDN ORSBNY; Rospatent Federal Service on Intellectual Property: Moscow, Russia, 2025. (In Russian) [Google Scholar]
  35. Toscano, J.D.; Wang, L.L.; Karniadakis, G.E. KKANs: Kurkova-Kolmogorov-Arnold Networks and Their Learning Dynamics. Neural Netw. 2025, 191, 107831. [Google Scholar] [CrossRef] [PubMed]
  36. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
  37. Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT: A Distilled Version of BERT—Smaller, Faster, Cheaper and Lighter. arXiv 2019, arXiv:1910.01108. [Google Scholar]
  38. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
  39. Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019; Volume 32. [Google Scholar]
  40. Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations. arXiv 2019, arXiv:1909.11942. [Google Scholar]
  41. Clark, K.; Luong, M.T.; Le, Q.V.; Manning, C.D. ELECTRA: Pre-Training Text Encoders as Discriminators Rather Than Generators. arXiv 2020, arXiv:2003.10555. [Google Scholar]
  42. He, P.; Gao, J.; Chen, W. DeBERTaV3: Improving DeBERTa Using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing. arXiv 2021, arXiv:2111.09543. [Google Scholar]
  43. Jin, Z.; Zhang, Y. A Graph Neural Network-Based Context-Aware Framework for Sentiment Analysis Classification in Chinese Microblogs. Mathematics 2025, 13, 997. [Google Scholar] [CrossRef]
  44. Vaca-Rubio, C.J.; Blanco, L.; Pereira, R.; Caus, M. Kolmogorov-Arnold Networks (KANs) for Time Series Analysis. arXiv 2024, arXiv:2405.08790. [Google Scholar]
  45. Chen, K.L.; Ding, J.J. Kolmogorov-Arnold Networks with Trainable Activation Functions for Data Regression and Classification. In Proceedings of the 2025 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Fukuoka, Japan, 18–21 February 2025. [Google Scholar]
  46. Yuan, L. G-KAN: Graph Kolmogorov-Arnold Network for Node Classification Using Contrastive Learning. IEEE Access 2025, 13, 100287–100297. [Google Scholar] [CrossRef]
Figure 1. Example tweets from each dataset category.
Figure 2. Data preparation and preprocessing.
Figure 3. Conceptual architecture of the proposed model.
Figure 4. Key metrics of the Hyb-TKAN model.
Figure 5. Error matrix for detecting cyberbullying.
Figure 6. Temporal dynamics of the total volume of destructive messages aggregated across all categories on a monthly basis.
Figure 7. Monthly temporal dynamics of destructive messages for each individual category.
Table 1. Performance comparison of transformer-based models.

Model            | Accuracy | Precision | Recall | F1-Score
-----------------|----------|-----------|--------|---------
BERT-base        | 90.1%    | 94.7%     | 89.2%  | 91.9%
RoBERTa-base     | 92.8%    | 94.2%     | 94.2%  | 93.2%
RoBERTa-large    | 93.0%    | 94.9%     | 93.9%  | 94.4%
DistilBERT       | 89.0%    | 93.5%     | 88.7%  | 90.1%
ALBERT-xxlarge   | 91.5%    | 95.1%     | 91.3%  | 93.0%
XLNet-large      | 90.5%    | 94.2%     | 90.5%  | 91.7%
ELECTRA-base     | 91.6%    | 95.6%     | 92.0%  | 93.7%
DeBERTa-v3-small | 92.4%    | 96.0%     | 95.7%  | 93.8%
Hyb-TKAN         | 96.3%    | 96.4%     | 94.7%  | 95.3%
Table 2. Training and inference performance of different models.

Model        | Train Time per Epoch (s) | Inference (ms/Sample) | GPU Memory (GB)
-------------|--------------------------|-----------------------|----------------
DistilBERT   | 52–57                    | 2.9–3.4               | 2.4–2.7
BERT-base    | 88–94                    | 4.2–4.8               | 3.9–4.3
RoBERTa-base | 78–84                    | 3.8–4.4               | 4.1–4.6
XLNet-base   | 112–119                  | 5.0–5.6               | 4.8–5.3
ELECTRA-base | 71–76                    | 3.4–3.9               | 3.6–3.9
Hyb-TKAN     | 98–104                   | 4.6–5.2               | 4.4–4.9
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
