LKD: LLM-Assisted Knowledge Distillation for Efficient and Robust Social Bot Detection

Ye, Wenhui; Ye, Wenxi; Wang, Haizhou

doi:10.3390/electronics15102019

by Wenhui Ye ¹, Wenxi Ye ² and Haizhou Wang ¹,*

¹ School of Cyber Science and Engineering, Sichuan University, Chengdu 610207, China
² School of Artificial Intelligence and Advanced Computing, Hunan University of Technology and Business, Changsha 410205, China
* Author to whom correspondence should be addressed.

Electronics 2026, 15(10), 2019; https://doi.org/10.3390/electronics15102019

Submission received: 9 April 2026 / Revised: 4 May 2026 / Accepted: 6 May 2026 / Published: 9 May 2026

(This article belongs to the Topic New Applications of Big Data Technology: Integration of Data Mining and Artificial Intelligence)

Abstract

Social bots significantly threaten online public opinion through manipulation and misinformation, posing detection challenges due to high anthropomorphism and concealment. GNN methods show superior performance but face deployment hurdles on real-world platforms because of their reliance on multi-hop neighbor information during inference. Conversely, pure text-based methods lack collective behavior modeling and robustness against advanced bots. This paper proposes LKD, a social bot detection framework for graph-less deployment. The framework utilizes large language models to summarize historical tweets, compressing long-text information to construct multi-source inputs including metadata, profiles, and tweets. By employing a GNN as the teacher and a pre-trained LM as the student, LKD transfers structural knowledge to a text-based model via dual-objective knowledge distillation across prediction distributions and feature spaces. Experiments on Cresci-2015 and TwiBot-20 datasets show that the graph-less LKD-LM mode outperforms state-of-the-art methods in accuracy and F1-score. It maintains stable performance in label-scarce and sparse-graph scenarios, providing an efficient, robust solution for social media platforms with restricted interfaces or real-time requirements.

Keywords: social bot detection; knowledge distillation; large language models; GNN; graph-less inference

1. Introduction

As the influence of social media on public discourse continues to grow, social bots have become significant tools for information manipulation, the dissemination of misinformation, and public opinion interference. Research indicates that approximately 15% to 20% of accounts on platform X, formerly known as Twitter, are bots. These accounts leverage automated content generation and human behavior camouflage to participate extensively in political guidance, commercial marketing, and the manipulation of social events [1,2]. Traditional detection methods based on rules and simple thresholds struggle to identify new types of bots that evolve toward high anthropomorphism, particularly advanced bots that use large language models to generate semantically coherent content [3]. Consequently, developing efficient, robust, and easily deployable social bot detection methods has become a critical issue for both academia and industry.

In recent years, social bot detection techniques have evolved into several mainstream research paradigms. Feature engineering-based methods rely on manually extracted account metadata and behavioral statistical features, offering strong interpretability but exhibiting limited generalization capability to emerging bot types [4]. Graph-based methods construct heterogeneous graphs of user relationships, including following, reposting, and interaction links, and employ graph neural networks (GNNs) to capture community structures and propagation patterns, achieving state-of-the-art detection performance in existing studies [5,6]. However, such methods face significant limitations during deployment: inference requires querying multi-hop neighbor information of target users, which makes them difficult to apply reliably in real-world scenarios because of platform API constraints, latency overhead, and sampling bias [7]. Text-based detection methods leverage pre-trained language models (PLMs) to model tweets and user profiles, enabling operation without graph information; however, they generally do not incorporate social structural knowledge and fail to fully capture the deeper behavioral semantic patterns in user-generated content [8]. Although LLMs introduce new opportunities for social bot detection, existing approaches still face challenges such as limited long-text input capacity and inefficient knowledge integration.

To address the aforementioned challenges, this paper proposes the LKD framework, namely LLM-assisted knowledge distillation, a social bot detection method designed for graph-free deployment that integrates large language models with knowledge distillation. LKD utilizes LLMs to generate deep semantic summaries of users’ historical posts, extracting behavioral patterns, sentiment consistency, and anomalous posting characteristics, thereby incorporating highly condensed semantic information into the model input. Through a teacher–student knowledge distillation mechanism, structural knowledge learned by graph neural networks is transferred to a lightweight language model, forming a detection paradigm in which graph structure is leveraged to enhance learning during the training phase, while only textual information is required during inference. LKD can be applied to both graph-available and graph-free data scenarios. When graph information is available, the language model is improved through iterative distillation, whereas in the absence of graph information, the summary-enhanced language model is directly used for inference.

The main contributions of this paper are as follows.

(1): The LKD framework is proposed, incorporating LLM-generated semantic summaries into the knowledge distillation process to achieve multimodal fusion of text, summaries, and graph information.
(2): A multi-source structured user input scheme is designed, enabling high-level semantic summaries to enhance domain adaptation and improve the detection of advanced bots with semantic camouflage.
(3): A multi-objective knowledge distillation mechanism is developed to align both prediction distributions and feature representations, improving knowledge transfer efficiency between teacher and student models.
(4): A graph-free deployment paradigm is established for real-world social platforms, maintaining high accuracy and robustness without relying on neighbor information.

2. Related Work

As bot camouflage evolves, social bot detection has shifted from traditional feature engineering to an integrated paradigm combining textual semantics, social graph structures and multimodal fusion. With LLMs now widely used for content generation, conventional detection methods suffer from limited robustness, poor generalization and high deployment costs. Recent research thus focuses on refined feature representation, structural modeling and knowledge transfer.

2.1. Feature-Based Methods

Feature-based methods constitute the earliest and most widely adopted fundamental paradigm in social bot detection, especially in industrial applications. Leveraging statistical analysis and domain expertise, these approaches extract explicit discriminative features from multi-dimensional account information, establish a systematic feature representation, and employ conventional machine learning or deep learning classifiers to perform binary classification. The feature framework mainly consists of the following four core dimensions: user attribute features [9] including account age and follower–friend ratio, user behavioral features [10,11] including posting frequency and temporal patterns, content features [12,13] including topic distribution and sentiment tendency, as well as temporal and network features. In supervised learning, the random forest algorithm achieves superior overall detection performance due to its strong resistance to overfitting [14]. Representative studies include a lightweight detection model based on random forest proposed by Echeverría et al. [15], the effectiveness of multi-dimensional feature fusion validated by Beskow and Carley [16], and a clustering-based unsupervised detection framework introduced by Miller et al. [17] to mitigate the shortage of labeled data.

Feature-based methods exhibit high interpretability and low deployment barriers and remain core basic components for large-scale account risk control on social platforms. However, their inherent limitations have become increasingly prominent: the feature system relies heavily on expert experience with insufficient generalization ability; bot controllers can forge features by tampering with metadata and imitating human behavioral patterns to directly evade detection models [18]; in addition, strong dependence on data integrity restricts their application in scenarios with limited platform data.

2.2. Text-Based Methods

Text-based methods center on natural language processing (NLP) techniques [19,20] and perform deep encoding on textual data such as user profiles and posts to capture implicit differences between bots and genuine users in linguistic expression, semantic logic and writing style. Early studies adopted models including RNN, CNN and Bi-LSTM to extract textual semantic features [11,21]. With the rise of pre-trained language models, BERT, RoBERTa and other architectures have been widely used for deep representation learning on tweets and user descriptions [20,22]. Wu et al. [20] further constructed account-level text embeddings via triplet networks, enhancing semantic consistency modeling for long-term user behavior.

Recently, the rapid development of LLMs has posed disruptive challenges to text-based detection methods. LLMs equip social bots with strong human-like content generation capabilities, and OpenAI's official AI text classifier correctly identified only 26% of AI-written text [23]. Furthermore, most existing text-based methods struggle to capture semantic consistency over an account's full lifecycle and generalize poorly to cross-platform and cross-lingual data.

2.3. Graph-Based Methods

Graph-based methods represent the mainstream paradigm for achieving state-of-the-art performance in social bot detection [24]. By modeling user interaction networks on social media as graph structures, such approaches use graph neural networks (GNNs) to capture structural anomalies, connection patterns and group coordination characteristics of bot accounts. Feng et al. [25] first applied the heterogeneous graph neural network RGCN to bot detection and obtained breakthrough performance. Their subsequent work, the Relational Graph Transformer (RGT) [26], aggregates cross-relation features via semantic attention networks and significantly improves model generalization. Zhou et al. [27] proposed CBD, a contrastive learning-based detection method that effectively alleviates the scarcity of labeled data. In addition, Yang et al. [28] integrated reinforcement learning and self-supervised learning to explore the optimal architecture of graph neural networks and learned fine-grained user subgraph embedding features from heterogeneous networks, thereby further boosting the detection performance of social bots under complex application scenarios.

Despite their outstanding performance, graph-based methods suffer from the following three major bottlenecks in real-world deployment: first, they exhibit extremely limited detection capability on isolated and sparsely connected nodes [29]; second, inference requires multi-hop neighbor information, leading to prohibitive time and financial costs under platform API constraints; third, most graph-based methods only exploit structural information and lack deep fusion with textual semantics.

2.4. Multimodal Fusion and Knowledge Distillation Methods

Multimodal fusion methods have emerged as a mainstream trend in recent years by combining textual, attribute and structural information to enhance detection robustness [30]. BotBuster [8] employs a mixture-of-experts architecture with separate branches for different modalities and decision-level fusion. Feng et al. [7] applied instruction tuning to LLMs and improved detection accuracy by 9.1% over single-modal baselines on multimodal datasets. To address the deployment difficulty of graph models, knowledge distillation has been adopted to transfer structural knowledge into lightweight models: GLNN [31] distills GNN knowledge into a multi-layer perceptron (MLP) to enable graph-free fast inference. Nevertheless, GLNN only transfers knowledge to a simple classifier. It does not exploit the advantages of pre-trained language models in textual understanding and long-sequence modeling, nor does it incorporate LLMs for high-level semantic abstraction, which limits its ability to detect advanced camouflaged bots. The LKD framework proposed in this paper builds on these observations by distilling GNN knowledge into a pre-trained language model whose input is enriched with LLM-generated summaries.

3. Method

This paper proposes the LKD (LLM-assisted knowledge distillation) framework, which adopts a teacher–student knowledge distillation structure as shown in Figure 1. The teacher model is composed of graph neural networks for learning structural knowledge in social relationships, while the student model is a pre-trained language model for high-precision inference under graph-free conditions. On the left side of the framework, an LLM summarizer processes user tweets and generates high-level semantic summaries. In the middle, a multi-source input construction module uniformly encodes metadata, semantic summaries, user profiles and original tweets into structured inputs, and the student language model then completes domain adaptation and iterative multi-objective distillation optimization. On the right side, with the social graph structure as input, the teacher GNN completes structural feature modeling of user relationships. The core of the framework is to use an LLM to compress user tweets into semantic summaries and transfer structural knowledge to a pure text model through multi-objective knowledge distillation, finally realizing a detection paradigm of “graph for training and graph-free for inference”.

3.1. LLM-Based User Post Summarization

In existing text-based detection methods, user historical tweets are usually directly concatenated and fed into pre-trained language models. However, Transformer-based models have a fixed maximum input length, typically 512 or 1024 tokens. When a user has more than 1000 posts, simple concatenation not only loses key information through truncation but also weakens the model's ability to distinguish core behavioral patterns, because noise such as repeated retweets and low-information replies dominates the input.

To address this issue, the LKD framework introduces an LLM-based user post summarization mechanism. Through offline pre-computation, this mechanism maps massive raw user tweets into structured, high-density semantic summaries for information compression and knowledge extraction. Formally, given the tweet set $T_i = \{t_i^1, t_i^2, \dots, t_i^{100}\}$ of user $u_i$, the summarization function is defined as

$$S_i = \mathrm{LLM}(T_i, \mathrm{des}_i), \tag{1}$$

where $\mathrm{des}_i$ denotes user profile information, and LLM represents the prompt-driven generation function of a large language model. The output $S_i$ is a user-level semantic summary covering topic consistency, sentiment stability, behavioral rhythm and anomaly patterns. The core design objectives of this module include the following four aspects: (1) deep semantic condensation to extract stable semantic patterns from long-term behavior trajectories; (2) potential automated behavior recognition to capture anomalies such as fixed templated expressions and frequent homogeneous content; (3) model input enhancement and knowledge injection by embedding summaries as domain meta-knowledge into the student model input sequence; (4) noise filtering and information density improvement by automatically discarding low-information content such as advertisements and meaningless replies.

In this study, the open-source large language model Qwen 2.5 7B Instruct is employed for zero-shot summarization via prompt engineering. For each user, K = 100 tweets are randomly sampled from the posting history for processing. The generated summaries are constrained to 150–300 words and stored in a local database after format verification and post-processing.
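The sampling and prompt-assembly step can be sketched as follows. This is a minimal illustration, not the authors' prompt: the function names `build_summary_prompt` and `summary_length_ok`, the prompt wording, and the fixed seed are all assumptions, and the call to the LLM itself is deliberately left out.

```python
import random

def build_summary_prompt(tweets, description, k=100, seed=0):
    """Sample up to k tweets and assemble a zero-shot summarization prompt.

    The prompt text is a hypothetical example, not the one used in the paper.
    """
    rng = random.Random(seed)  # fixed seed keeps the offline pass reproducible
    sampled = tweets if len(tweets) <= k else rng.sample(tweets, k)
    numbered = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(sampled))
    return (
        "You are analyzing a social media account.\n"
        f"Profile description: {description or '(empty)'}\n"
        f"Tweets ({len(sampled)} sampled):\n{numbered}\n\n"
        "In 150-300 words, summarize topic consistency, sentiment stability, "
        "posting rhythm, and any anomalous or templated patterns."
    )

def summary_length_ok(summary, low=150, high=300):
    """Format check: accept only summaries inside the word budget."""
    return low <= len(summary.split()) <= high
```

A summary that fails `summary_length_ok` could be regenerated or truncated before it is stored; which of the two the original pipeline does is not stated.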

3.2. Multi-Source User Input Construction

To address the issue that single-modal information cannot fully characterize user features in social bot detection, this paper adopts a multi-source information fusion strategy. For each user $u_i$, the following four types of information are available: (1) metadata M including username, account creation time, follower count, following count and verification status; (2) personal description D; (3) historical tweet set T; and (4) LLM-generated summary S.

All structured and unstructured information is uniformly converted into natural language expressions via a textual encoding strategy and concatenated into a single input sequence. Within a segment, individual fields are separated by [SEP]; the metadata segment, for example, takes the form [M] {metadata 1} [SEP] {metadata 2} [SEP] ...

Full user sequence:

$$X_i = [M]\,\{\text{metadata}\}\;[S]\,\{\text{summary}\}\;[D]\,\{\text{description}\}\;[T]\,\{\text{tweet}\}, \tag{2}$$

where [M], [S], [D] and [T] are newly added special tokens that are explicitly incorporated into the vocabulary of the language model tokenizer and trained during fine-tuning. This multi-segment template offers the following three advantages: (1) an explicit hierarchical structure that facilitates the self-attention mechanism in learning inter-segment relationships; (2) priority injection of deep semantics, with the summary segment placed near the front of the sequence, directly after the metadata, so that key information is never truncated and participates in representation construction early; and (3) support for dynamic length expansion, which adjusts the length of the tweet segment under the maximum length constraint $L_{max} = 2048$ tokens while keeping the metadata and summary segments intact. Hashtags, user mentions and URLs in original tweets and descriptions are replaced with the placeholders #HASHTAG, @USER and HTTPURL respectively to reduce interference from sparse vocabulary.
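The template and placeholder rules above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: `normalize` and `build_input` are hypothetical names, the regular expressions are one plausible reading of the placeholder rule, and whitespace-separated words stand in for tokenizer tokens when enforcing the length budget.

```python
import re

def normalize(text):
    """Replace URLs, mentions and hashtags with HTTPURL, @USER and #HASHTAG."""
    text = re.sub(r"https?://\S+", "HTTPURL", text)  # URLs first: they may contain # or @
    text = re.sub(r"@\w+", "@USER", text)
    text = re.sub(r"#\w+", "#HASHTAG", text)
    return text

def build_input(metadata, summary, description, tweets, max_tokens=2048):
    """Assemble [M]..[S]..[D]..[T].. and trim only the tweet segment."""
    meta = " [SEP] ".join(f"{key}: {value}" for key, value in metadata.items())
    head = f"[M] {meta} [S] {summary} [D] {normalize(description)} [T]"
    budget = max_tokens - len(head.split())  # words approximate tokens here
    kept = []
    for tweet in tweets:
        words = normalize(tweet).split()
        cost = len(words) + 1  # +1 reserves room for a [SEP] separator
        if cost > budget:
            break  # metadata and summary stay intact; later tweets are dropped
        kept.append(" ".join(words))
        budget -= cost
    return (head + " " + " [SEP] ".join(kept)) if kept else head
```

In a real pipeline the budget would be measured with the student model's own tokenizer after the four special tokens have been added to its vocabulary.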

3.3. Student Language Model

In the LKD framework, the language model serves as the core inference model for final deployment. To alleviate the significant discrepancy between pre-trained data distribution and social media corpus, namely the representation shift problem, this paper performs domain-adaptive fine-tuning on the language model. Let $X_i$ denote the input text sequence of the i-th user. The encoding process of the language model is defined as

$$H = \mathrm{LM}(X_i) \in \mathbb{R}^{L_i \times d}, \tag{3}$$

where $L_i$ is the token length and $d$ is the hidden dimension. Mean pooling is adopted to aggregate the hidden states and obtain a user-level embedding $h$ of 768 dimensions, which is more stable than using the [CLS] token alone for long-sequence inputs. A multi-layer perceptron (MLP) is then used to output classification logits:

$$z_s = \mathrm{MLP}(h). \tag{4}$$

The model is fine-tuned for domain adaptation using the standard cross-entropy loss:

$$L_{CE} = -\sum_{i} \sum_{c} y_{i,c} \log \delta(z_{s,i,c}) + \lambda_{LM} \|\theta_{LM}\|^2, \tag{5}$$

where $y_{i,c}$ denotes the ground-truth hard label, $\delta$ denotes the Softmax function, and $\lambda_{LM}$ is the coefficient of L2 regularization. Through this training stage, a stable text baseline model for social bot detection is obtained, which acts as the initialization for the subsequent distillation phase.
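The pooling and loss steps of Equations (3)–(5) can be illustrated without any deep learning library. The sketch below is a plain-Python toy, not the training code: the helper names are hypothetical, the padding mask is an assumption about how variable-length inputs are batched, and the MLP head and the L2 term are omitted.

```python
import math

def mean_pool(hidden_states, mask):
    """Average the token vectors whose mask entry is 1, skipping padding."""
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden_states, mask):
        if keep:
            count += 1
            for d in range(dim):
                total[d] += vec[d]
    return [t / max(count, 1) for t in total]

def softmax(logits):
    """Numerically stable Softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, label):
    """Per-sample cross-entropy against a hard label index."""
    return -math.log(softmax(logits)[label])
```

Pooling over all real tokens rather than reading a single position is what makes the embedding less sensitive to where the informative content falls in a long sequence.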

3.4. Teacher Graph Neural Network

In social bot detection, user relationships such as following, retweeting and replying form an explicit social graph, which contains group behavioral patterns unobservable from single text entries (e.g., bot accounts often appear in clusters with abnormally dense mutual subgraphs). To fully exploit structural signals, the LKD framework adopts a GNN as the teacher model.

The teacher model takes the user embedding $h_i$ generated by the student language model as the initial node feature, $h_i^{(0)} = h_i$, and models the graph structure via multi-layer message passing. The general form of the l-th layer is given by:

$$m_{ij}^{(l)} = \mathrm{MSG}^{(l)}\big(h_i^{(l)}, h_j^{(l)}, r_{ji}\big), \tag{6}$$

$$h_i^{(l+1)} = \mathrm{UPDATE}\big(h_i^{(l)}, \mathrm{AGG}(\{m_{ij}^{(l)} \mid j \in N(i)\})\big), \tag{7}$$

where $r_{ji}$ denotes the relation type of the edge from user $u_j$ to user $u_i$ and $N(i)$ is the neighbor set of $u_i$. After $L$ layers of GNN computation, the structure-aware user representation $h_G$ is obtained and fed into a linear classifier to produce logits $z_t$. To improve knowledge transfer, the teacher model outputs soft labels smoothed by a temperature parameter $T$:

$$p_t = \delta(z_t / T). \tag{8}$$

For datasets without explicit graph structure, an MLP is used instead of GNN as the teacher model to maintain framework consistency and applicability.
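To make Equations (6)–(8) concrete, the sketch below runs one simplified message-passing layer and computes temperature-smoothed soft labels. It is a toy under explicit assumptions: the message is a relation-weighted copy of the neighbor state, aggregation is a mean, and the update is a fixed convex mix. The actual teacher backbones (for example RGCN or HGT) use learned transformations instead.

```python
import math

def message_passing(h, edges, self_weight=0.5):
    """One simplified layer over edges given as (source j, target i, relation weight)."""
    dim = len(next(iter(h.values())))
    agg = {i: [0.0] * dim for i in h}
    deg = {i: 0 for i in h}
    for j, i, w in edges:
        deg[i] += 1
        for d in range(dim):
            agg[i][d] += w * h[j][d]  # MSG: relation-weighted neighbor state
    out = {}
    for i in h:
        if deg[i] == 0:
            out[i] = list(h[i])  # isolated node: nothing to aggregate
            continue
        out[i] = [
            self_weight * h[i][d] + (1 - self_weight) * agg[i][d] / deg[i]
            for d in range(dim)  # UPDATE: mix own state with the mean message
        ]
    return out

def soft_labels(logits, temperature=2.0):
    """Softmax of logits divided by a temperature, as in Equation (8)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]
```

A higher temperature flattens the distribution, so the student also sees how confident the teacher is about the non-predicted class rather than only its hard decision.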

3.5. Multi-Objective Knowledge Distillation

After training the teacher model, LKD transfers discriminative information from the structural model to the student language model through a multi-objective knowledge distillation mechanism. Different from conventional distillation that only aligns outputs, this work imposes constraints on both the prediction distribution and intermediate feature layers to achieve the following dual transfer goals: decision alignment and representation alignment.

Prediction-layer distillation aligns the student prediction distribution $p_s$ with teacher soft labels $p_t$ using Kullback–Leibler divergence:

$$L_{KL} = D_{KL}(p_t \,\|\, p_s). \tag{9}$$

Feature-layer distillation maps student representations to the teacher space via a linear projection layer $\phi(\cdot)$ and performs direct feature matching with mean squared error:

$$L_{Feat} = \frac{1}{N} \sum_{i} \|\phi(h_i) - h_{G,i}\|_2^2. \tag{10}$$

The final joint optimization objective for the student language model in the distillation stage is

$$L = L_{CE} + \lambda_1 L_{KL} + \lambda_2 L_{Feat}, \tag{11}$$

where $\lambda_1$ and $\lambda_2$ are hyperparameters balancing different loss terms. Through multi-objective optimization, the text-only student model can simultaneously learn classification knowledge, decision boundary knowledge and structural knowledge, enabling high-precision inference without graph structures.
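The joint objective of Equation (11) can be sketched for a single sample as follows. This is a plain-Python illustration with hypothetical names; it assumes the student distribution is computed at the same temperature as the teacher's (the text does not say), takes the already projected student feature $\phi(h_i)$ as input, and leaves batch averaging and the L2 term to the caller.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable Softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(z_s, z_t, label, h_proj, h_g,
                      temperature=2.0, lam1=1.0, lam2=1.0):
    """Per-sample L = L_CE + lam1 * L_KL + lam2 * L_Feat."""
    ce = -math.log(softmax(z_s)[label])                   # hard-label term
    kl = kl_divergence(softmax(z_t, temperature),         # teacher soft labels p_t
                       softmax(z_s, temperature))         # student distribution p_s
    feat = sum((a - b) ** 2 for a, b in zip(h_proj, h_g)) # squared L2 feature gap
    return ce + lam1 * kl + lam2 * feat
```

When the student already matches the teacher in both logits and projected features, the two distillation terms vanish and only the cross-entropy remains, which is a quick sanity check for an implementation.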

3.6. Iterative Training and Inference

LKD employs an iterative training strategy. First, domain adaptation of the student model is completed using Equation (5). Then, the student model is fixed, and the teacher GNN is trained with the student-generated user embeddings as input (Equations (6)–(8)). Next, the student model is optimized via distillation using the soft labels and features from the teacher model (Equation (11)). The teacher-training and distillation steps are repeated, and iteration terminates once neither model achieves a performance gain on the validation set in a given round.

The inference stage supports the following two modes: (1) LKD-LM, which uses only the language model for inference with user text sequences as input and no graph construction required, suitable for online real-time detection scenarios; (2) LKD-GNN, which constructs a full user relation graph in offline settings and performs inference with the teacher model to enable group analysis using structural information. In practical deployment, LKD-LM is generally preferred as the primary inference model.
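The alternating schedule and its stopping rule can be sketched as a small driver loop. The three callables are hypothetical placeholders for the real training and evaluation routines, `max_rounds` is an added safety cap, and domain adaptation with Equation (5) is assumed to have been run before the loop starts.

```python
def iterative_distillation(train_teacher, distill_student, evaluate, max_rounds=10):
    """Alternate teacher training and student distillation until neither improves.

    evaluate() is assumed to return validation scores such as
    {"teacher": 0.84, "student": 0.85}.
    """
    best = {"teacher": float("-inf"), "student": float("-inf")}
    history = []
    for _ in range(max_rounds):
        train_teacher()      # student frozen; its embeddings are the node features
        distill_student()    # teacher frozen; student minimizes the joint loss
        scores = evaluate()
        history.append(scores)
        improved = False
        for name in best:
            if scores[name] > best[name]:
                best[name] = scores[name]
                improved = True
        if not improved:
            break            # neither model gained on validation this round
    return best, history
```

In practice the checkpoint that produced the best student score would be the one kept for LKD-LM deployment; the sketch only tracks the scores.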

4. Experiments

4.1. Experimental Setup

4.1.1. Datasets

This study adopts the following two widely recognized social bot detection datasets in academia: Cresci-2015 [32] and TwiBot-20 [33]. Cresci-2015 contains account information, tweets, and graph structures with two types of user relations for 5301 users. TwiBot-20 includes account data, tweets and relational graphs for 229,580 users, among which 11,826 labeled users are used in this work. All datasets are randomly split into training, validation and test sets with a ratio of 1:1:8. Each experiment is repeated five times, and results are reported as mean values. Detailed dataset statistics are shown in Table 1.
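A random 1:1:8 partition of this kind can be sketched as below. The function name and the seed handling are assumptions; the paper does not state how the five repetitions are seeded.

```python
import random

def split_1_1_8(user_ids, seed=0):
    """Shuffle user ids and split them into train/validation/test at 1:1:8."""
    ids = list(user_ids)
    random.Random(seed).shuffle(ids)
    n_train = len(ids) // 10
    n_val = len(ids) // 10
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # the remainder, roughly 80% of the users
    return train, val, test
```

With the 5301 Cresci-2015 users this yields 530 training, 530 validation and 4241 test accounts; using a different seed for each of the five runs would give five independent partitions.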

4.1.2. Experimental Configuration

The LKD framework is implemented using PyTorch (2.6.0+cu124) [34], PyTorch Geometric (2.6.1) [35], scikit-learn (1.5.1) [36], and Transformers (4.41.1) [37]. All experiments are conducted on a single Tesla V100 GPU with 32 GB memory. The detailed hyperparameter settings are summarized in Table 2.

4.2. Comparison Experiments

In this paper, we compare LKD with a variety of baselines, including feature-based methods (SGBot [9], BotHunter [16], Kudugunta [38], LOBO [15]), graph structure-based methods (BotRGCN [25], RGT [26], SimpleHGN [39], HGT [40]), multimodal methods (BotBuster [8]), and knowledge distillation methods (GLNN-RGT, GLNN-HGT [31]). In addition, a fine-tuned language model baseline, denoted as LM-finetune, is included for comparison. The detailed introduction of these comparison methods is given in Appendix A.

Experimental results are shown in Table 3, with the main conclusions as follows: (1) LKD achieves the best performance on both datasets. In the graph-free inference mode LKD-LM, the accuracy is improved by 0.37% and 0.97% on Cresci-2015 and TwiBot-20, respectively, compared with the state-of-the-art model, which fully validates the effectiveness of the distillation mechanism. (2) Language models play a core role in text-rich tasks. The GNN enhanced by the language model (LKD-GNN) outperforms the original BotRGCN, indicating that language models can provide higher-quality initial feature representations for GNNs. (3) LKD significantly outperforms GLNN-based baselines on graph knowledge distillation. By adopting an iterative distillation strategy to transfer GNN knowledge to the language model, LKD makes fuller use of the text understanding advantages of pre-trained language models than GLNN, which only transfers knowledge to an MLP.

4.3. Ablation Experiments

To verify the effectiveness of each core component, we conduct the following three groups of ablation studies on the TwiBot-20 dataset:

(1): LM fine-tune only: the GNN module and the LKD iterative distillation are removed, leaving only the language model fine-tuned individually;
(2): GNN training only: the LM fine-tuning and LKD iteration are removed, leaving only the GNN trained individually;
(3): MLP instead of GNN + LKD iteration: the GNN is replaced with an MLP, while the full LKD iterative pipeline is retained.

The ablation results are presented in Table 4.

The ablation results clearly validate the effectiveness of each core module and the rationality of our bimodal fusion design. The LM-only model significantly outperforms the GNN-only counterpart, demonstrating that textual representation is essential for accurate social bot detection. Replacing GNN with MLP results in a slight performance decline, indicating that GNN has unique advantages in modeling user social relations and group collaborative behavior. The full LKD framework transfers structural knowledge from GNN to LM via iterative distillation, enabling joint optimization of textual and structural representations, which further verifies the effectiveness of the proposed design.

4.4. Experiments on Different Model Combinations

To verify the robustness of the LKD framework, we explore the performance of various combinations of GNNs and LMs. Specifically, we select the following five language models: T5, BERT, BART, RoBERTa, and XLM-RoBERTa, as well as the following four graph neural networks: GCN, GAT, SimpleHGN, and RGCN, to construct multiple combined architectures.

As shown in Figure 2, model performance remains high and relatively concentrated across most combinations, suggesting that the multi-objective knowledge distillation mechanism of LKD generalizes well across backbones. XLM-RoBERTa achieves the best performance among LMs when paired with different GNNs, while SimpleHGN and RGCN perform well as GNN backbones across most LM configurations. A few abnormal combinations (e.g., BART/BERT with HGT) are likely caused by unstable convergence during training and do not change the overall conclusion about the robustness of the proposed framework.

4.5. Analysis of Domain Adaptation Effectiveness

To verify the effectiveness of domain-adaptive fine-tuning, we compare the convergence of the full LKD framework with that of LKD without LM domain adaptation on the TwiBot-20 dataset, as shown in Figure 3. During iterative training, the LM and GNN improve each other's performance; the LM ultimately surpasses the GNN in detection performance by absorbing the inductive bias of the GNN. Meanwhile, the language model with domain-adaptive fine-tuning requires significantly fewer iterations to converge to its best performance than the model without fine-tuning, which supports the effectiveness of the domain-adaptation step.

4.6. Analysis of Prediction Consistency

To explore the differences and commonalities between the prediction results of the student model and the teacher model, we visualize the bot class prediction probabilities of both models on the TwiBot-20 test set, with the results shown in Figure 4. Statistics indicate that the prediction consistency rate of the two models on the test set reaches 96.03%, and most data points are distributed in a narrow band along the diagonal. This demonstrates that the student model is not only highly aligned with the teacher model in prediction direction but also exhibits significant consistency in probability structure (relative confidence ranking of different samples). The few inconsistent points (accounting for approximately 3.97%) are almost all concentrated near the decision threshold; these samples are high-uncertainty cases where the textual signal and structural signal are of varying strengths, which is consistent with our expectations.
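The consistency rate reported above can be computed from the two lists of bot-class probabilities. The sketch assumes a decision threshold of 0.5, which the text does not state explicitly, and the function names are hypothetical.

```python
def consistency_rate(student_probs, teacher_probs, threshold=0.5):
    """Fraction of users on which both models make the same hard decision."""
    agree = sum(
        (s >= threshold) == (t >= threshold)
        for s, t in zip(student_probs, teacher_probs)
    )
    return agree / len(student_probs)

def disagreements(student_probs, teacher_probs, threshold=0.5):
    """Indices where the two models fall on opposite sides of the threshold."""
    return [
        idx
        for idx, (s, t) in enumerate(zip(student_probs, teacher_probs))
        if (s >= threshold) != (t >= threshold)
    ]
```

Inspecting how far the probabilities at the returned indices lie from the threshold is one way to check the claim that disagreements cluster around uncertain samples.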

4.7. Analysis of Data Efficiency

To verify the model’s performance in scenarios with scarce labeled resources, we investigate the impact of randomly removing part of the labels and graph structure information on model performance, as shown in Figure 5. The label efficiency experiment demonstrates that when the scale of training labels is reduced to 25%, the model performance degrades gently rather than sharply. This is attributed to the graph structure relationships encoded by the teacher GNN, which serve as implicit supervision signals to compensate for insufficient labeling, and the LLM summaries that significantly improve the information density of each training sample. The graph structure efficiency experiment shows that even when a large number of edges are randomly removed, the model performance does not deteriorate drastically. The architectural design of “structure-text dual-track modeling” endows the model with inherent robustness in scenarios with degraded graph structures.
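The two degradation settings can be simulated with simple random subsampling. This is a sketch with hypothetical helper names; the exact removal ratios and sampling procedure used for Figure 5 are not given in the text.

```python
import random

def keep_fraction(items, keep_ratio, seed=0):
    """Return a random subset containing round(len(items) * keep_ratio) items."""
    rng = random.Random(seed)
    k = round(len(items) * keep_ratio)
    return rng.sample(list(items), k)

def degrade(train_labels, edges, label_ratio=1.0, edge_ratio=1.0, seed=0):
    """Subsample labeled training users and graph edges independently."""
    kept_users = keep_fraction(sorted(train_labels), label_ratio, seed)
    kept_labels = {u: train_labels[u] for u in kept_users}
    kept_edges = keep_fraction(edges, edge_ratio, seed + 1)
    return kept_labels, kept_edges
```

For example, `degrade(labels, edges, label_ratio=0.25)` reproduces the 25% label setting while leaving the graph untouched.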

4.8. Batch Size and Optimizer Sensitivity Analysis

We conduct ablation analyses on the LM training batch size (i.e., {2, 4, 8, 16, and 32}) and optimizer selection (AdamW, Adam, RAdam, and Adadelta) on the TwiBot-20 dataset, with the results shown in Figure 6 and Figure 7. Experiments indicate that: (1) regarding batch size, the performance of the student model shows an obvious downward trend as the batch size increases, and the best performance is achieved at batch size = 2 (Acc = 0.852, F1 = 0.871). A smaller batch size helps the student model fully absorb fine-grained knowledge from the teacher’s soft labels. (2) Regarding the optimizer, AdamW achieves the best performance (Acc = 0.852, F1 = 0.871) owing to its decoupled weight decay mechanism, significantly outperforming Adam (Acc = 0.700), RAdam (Acc = 0.631), and Adadelta (Acc = 0.539). Both experiments verify the synergistic coupling relationship between the student and the teacher in the iterative distillation of LKD.

4.9. Efficiency Evaluation of Deployment Modes

To validate the efficiency and lightweight characteristics of the LKD framework in real-world deployment, we conduct an efficiency evaluation on the TwiBot-20 dataset. We assess four key metrics: total parameters, inference throughput, per-user latency, and peak GPU memory consumption. Following the experimental configuration in Table 2, all evaluations employ T5 as the pre-trained language model and HGT as the graph neural network backbone. Two typical deployment configurations are tested under the same hardware environment with 200 test users: the graph-free LKD-LM mode for real-time online detection, and the end-to-end LKD-GNN mode that supports online structural modeling. Latency is reported as mean, p50, and p95 values to characterize both average and tail performance.

Table 5 illustrates the quantitative efficiency results. The LKD-LM mode contains 109.81 million parameters, achieves an inference speed of 40.67 users per second, records a mean latency of 24.59 ms per user, and consumes 0.487 GB of peak GPU memory. By comparison, the LKD-GNN mode has a slightly larger parameter scale of 110.08 million, reduces the throughput to 25.87 users per second, increases the mean latency to 38.66 ms, and occupies 0.549 GB of peak GPU memory.

The results demonstrate that LKD maintains a lightweight model footprint after knowledge distillation. The structural enhancement module introduces only negligible parameter overhead (merely 0.27 M). The graph-free LKD-LM mode achieves higher throughput and lower latency, making it appropriate for latency-sensitive online detection scenarios. Enabling online graph reasoning improves structural representation ability at a moderate computational cost, which mainly comes from subgraph extraction and neighborhood scheduling operations. Overall, LKD provides flexible and efficient deployment options: the graph-free mode satisfies low-latency requirements, while the optional graph-enhanced mode offers stronger structural modeling when needed.
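The latency statistics in Table 5 can be derived from a list of per-user timings, as in the minimal sketch below. The linear-interpolation percentile rule and the function names are assumptions, since the text does not state how p50 and p95 are computed. The sketch also checks an observation about the reported numbers: the throughput in Table 5 matches 1000 divided by the mean latency in milliseconds, which would be consistent with users being processed one at a time. This is our reading of the figures, not something stated in the text.

```python
from statistics import mean


def percentile(sorted_vals, q):
    """Percentile for q in [0, 100] using linear interpolation (assumed rule)."""
    if not sorted_vals:
        raise ValueError("need at least one value")
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def summarize_latency(latencies_ms):
    """Mean, p50, p95, and implied sequential throughput."""
    vals = sorted(latencies_ms)
    avg = mean(vals)
    return {
        "mean_ms": avg,
        "p50_ms": percentile(vals, 50),
        "p95_ms": percentile(vals, 95),
        "users_per_s": 1000.0 / avg,  # assumes one user at a time
    }


# Toy timings in milliseconds (hypothetical, not measured values).
stats = summarize_latency([10, 20, 30, 40, 100])

# Consistency checks against the values reported in Table 5.
lm_throughput = round(1000 / 24.59, 2)    # LKD-LM mean latency
gnn_throughput = round(1000 / 38.66, 2)   # LKD-GNN mean latency
param_overhead_m = round(110.08 - 109.81, 2)
```

A single slow request pulls the mean and p95 well above the median in the toy data, which is the same pattern as the LKD-GNN row of Table 5, where p95 (84.91 ms) is far above p50 (33.74 ms).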

5. Discussion

This study verifies the effectiveness of the LKD framework for social bot detection through extensive comparative experiments and ablation analyses. Experimental results show that transferring graph structural knowledge to pre-trained language models via dual-objective knowledge distillation can achieve superior detection accuracy over existing mainstream methods under the constraint of graph-free inference. Compared with traditional schemes that distill graph knowledge into simple multi-layer perceptrons, LKD leverages the advantages of pre-trained language models in text representation, sequence modeling, and contextual understanding to more fully absorb and utilize structural knowledge, thereby obtaining stronger discriminative ability when relying only on text input.

In terms of the contribution of information modalities, textual information plays a dominant role in human–bot classification, while graph structural information provides crucial supplementary gains. This phenomenon is consistent with real social scenarios: bot accounts can forge structural features by adjusting follow relationships and interaction patterns, but struggle to fully simulate real human behavioral patterns in long-term posting semantics, linguistic style, behavioral rhythm, and emotional consistency. User post summaries based on LLMs can effectively compress long-sequence information, filter noisy content, significantly improve input information density, and provide a higher-quality feature foundation for the subsequent distillation process.

The LKD framework exhibits remarkable advantages in practical deployment. Its core value lies in breaking the dependence of traditional GNN models on multi-hop neighbor queries, solving practical issues such as API restrictions, excessive latency, and high deployment costs in real social platforms. The model maintains stable performance even under low-resource conditions such as limited labels and sparse graph structures, making it more practical in real-world tasks with high annotation costs and restricted data access.

Despite its outstanding overall performance, the proposed method still has room for improvement. The detection ability of the model is somewhat limited for latent bot accounts with extremely low activity, scarce historical text, and highly sparse social relations. In scenarios with mixed languages, heavy text noise, and highly fragmented content, the quality of LLM-generated summaries decreases, which further affects the overall detection performance. In addition, the knowledge distillation process is sensitive to batch size and optimizer, and the training stability on devices with limited computational resources can be further improved. Future research can be extended based on the above limitations, including enhancing the recognition of sparse nodes with temporal behavior modeling, exploring adaptive knowledge distillation and self-supervised learning strategies, and extending the LKD framework to multi-platform and cross-lingual detection tasks.

6. Conclusions

This paper proposes LKD, a social bot detection framework for graph-free deployment. Through LLM semantic summarization and dual-objective knowledge distillation, it efficiently transfers the structural knowledge of graph neural networks to a pure-text language model, realizing a lightweight detection paradigm that uses graph structures for enhancement in the training phase and relies only on text during inference. Experimental results demonstrate that the LKD-LM graph-free inference mode achieves state-of-the-art detection performance on the standard Cresci-2015 and TwiBot-20 datasets, outperforming existing mainstream methods in terms of accuracy and F1-score, while maintaining stability in low-resource scenarios such as scarce labels and sparse graph structures.

LKD combines the strengths of textual semantic modeling and structural knowledge learning. It retains the advantages of convenient deployment and efficient inference of pure-text models, while achieving discriminative accuracy close to that of graph models, enabling it to effectively handle advanced camouflaged bots driven by LLMs. The proposed strategies of multi-source input construction, LLM summary enhancement, and dual-objective knowledge distillation provide a feasible solution to the dilemma between performance and deployment efficiency in social bot detection. The framework presented in this paper has strong practical engineering value in real social media environments and can run stably under scenarios with limited platform interfaces, high real-time requirements, and restricted data access. Future work will further improve the detection ability for sparse accounts, enhance adaptability to multilingual and high-noise text, and extend the framework to cross-platform and lightweight deployment scenarios.

Author Contributions

Conceptualization, W.Y. (Wenhui Ye); methodology, W.Y. (Wenhui Ye); software, W.Y. (Wenhui Ye); validation, W.Y. (Wenhui Ye); formal analysis, W.Y. (Wenhui Ye) and W.Y. (Wenxi Ye); investigation, W.Y. (Wenhui Ye); resources, H.W.; data curation, W.Y. (Wenxi Ye); writing—original draft preparation, W.Y. (Wenxi Ye); writing—review and editing, W.Y. (Wenhui Ye); visualization, W.Y. (Wenxi Ye); supervision, H.W.; project administration, H.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets used in this study are publicly available benchmark datasets for social bot detection, including Cresci-2015 and TwiBot-20, which can be accessed via the Botometer repository: https://botometer.osome.iu.edu/bot-repository/datasets.html (accessed on 5 May 2026). All data utilized in this work are collected and released by the original authors for academic research purposes. This study does not involve any additional data collection. The LLM-based summarization process operates solely on the textual content already contained in these public datasets. No personally identifiable information is collected, and no user tracking or re-identification is performed.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

In comparative experiments, the proposed model is evaluated against a range of representative social bot detection methods, including feature-based, text-based, and graph-based approaches. The detailed descriptions of the baseline models are as follows:

SGBot [9] leverages user metadata and handcrafted features extracted from tweets and employs a random forest classifier to achieve scalable and generalizable bot detection.
BotHunter [16] extracts features from user metadata and utilizes a random forest classifier for detection.
Kudugunta et al. [38] combine Synthetic Minority Over-sampling Technique (SMOTE) with undersampling strategies and adopt an AdaBoost classifier to obtain optimal detection performance.
LOBO [15] extracts 19 features from user metadata and tweets and applies a random forest algorithm for classification.
BotRGCN [25] encodes user tweets and profile descriptions using the RoBERTa model, while incorporating numerical and categorical features, and performs classification via a Relational Graph Convolutional Network (RGCN).
RGT [26] adopts a Graph Transformer architecture for each relation type and employs a semantic attention network across all relations to learn user representations for bot detection.
SimpleHGN [39] is an extension of the Graph Attention Network (GAT), specifically designed for heterogeneous graph scenarios with multiple edge types.
HGT [40] applies a Graph Transformer on heterogeneous relational graphs and utilizes type-specific semantic attention mechanisms to effectively capture complex social structural information for bot detection.
BotBuster [8] adopts a Mixture-of-Experts (MoE) architecture, where each expert focuses on a specific type of information, enabling cross-platform bot detection.
GLNN [31] distills knowledge from graph neural networks into a Multi-Layer Perceptron (MLP), enabling fast inference without requiring graph structures.

The methods compared in Table A1 reflect several distinct design paradigms in social bot detection, ranging from feature engineering and graph-based modeling to multimodal fusion and knowledge distillation. Early feature-based approaches such as SGBot and LOBO rely on manually constructed features derived from metadata and textual statistics, which offer strong interpretability but limited adaptability to evolving bot behaviors. In contrast, graph-based models including BotRGCN, RGT, SimpleHGN, and HGT focus on capturing structural patterns within user interaction networks, achieving strong empirical performance at the cost of heavy dependence on multi-hop neighborhood information during inference, which significantly constrains their deployment in real-world platforms.

Table A1. Comparison of different social bot detection methods.

Method	Input Modality	Core Model	Structural Modeling	Distillation	Graph at Inference	Efficiency
SGBot	Metadata + Handcrafted Features	Random Forest	✗	✗	✗	High
BotHunter	Metadata	Random Forest	✗	✗	✗	High
Kudugunta et al.	Metadata + Sampling	AdaBoost	✗	✗	✗	High
LOBO	Metadata + Text	Random Forest	✗	✗	✗	High
BotRGCN	Text + Metadata + Graph	RGCN	✓	✗	✓	Low
RGT	Graph (multi-relation)	Graph Transformer	✓	✗	✓	Low
SimpleHGN	Graph	Heterogeneous GAT	✓	✗	✓	Low
HGT	Graph	Graph Transformer	✓	✗	✓	Low
BotBuster	Text + Metadata + Graph	Mixture-of-Experts	✓	✗	✓	Medium
GLNN	Metadata (+Graph in training)	GNN + MLP	✓ (teacher)	✓	✗	High
LKD (Ours)	Text + Metadata (+Graph in training)	GNN + LM	✓ (teacher)	✓	✗	High

✓ indicates the module is adopted; ✗ indicates the module is not adopted.

More recent efforts attempt to bridge these limitations through multimodal learning and model compression. For instance, BotBuster integrates multiple information sources via a mixture-of-experts architecture, yet still requires graph inputs at inference time, thereby inheriting part of the deployment overhead. GLNN takes a step further by distilling structural knowledge from graph neural networks into a graph-free MLP model, enabling efficient inference without explicit graph construction; however, its reliance on a relatively simple student architecture and output-level distillation limits its capacity to fully capture complex semantic patterns.

Compared with GLNN, the proposed LKD framework differs in four key aspects.

Input modality: GLNN takes raw graph node features as input, whereas LKD constructs multi-source text inputs consisting of metadata, user profiles, and LLM-generated tweet summaries, and no graph access is required during inference.
Distillation objective: GLNN only minimizes the prediction-level KL divergence for output alignment, while LKD adopts a dual-objective distillation strategy that jointly aligns prediction distributions and intermediate feature representations, supporting more comprehensive structural knowledge transfer.
Student architecture: GLNN distills structural knowledge into a lightweight MLP, while LKD uses a pre-trained language model as the student model, enabling stronger semantic understanding and long-sequence modeling capabilities.
Deployment efficiency: both GLNN and LKD support graph-free inference, yet the MLP in GLNN consumes fewer resources in deployment, while the LM-based LKD yields stronger representation ability and detection robustness in practical scenarios.
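To make the contrast between output-level and dual-objective distillation concrete, the following minimal sketch computes a loss of this kind for a single user. The temperature (3) and soft label loss weight (0.5) follow Table 2. The use of mean squared error for feature alignment, the T² scaling of the KL term, the `feat_weight` parameter, and the way the three terms are combined are assumptions made for illustration and should not be read as the exact formulation of LKD. The sketch also assumes the student features have already been projected to the teacher's feature dimension.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mse(a, b):
    """Mean squared error between two feature vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def dual_objective_loss(student_logits, teacher_logits,
                        student_feat, teacher_feat, label,
                        temperature=3.0, soft_weight=0.5, feat_weight=1.0):
    """Hard-label loss + prediction-level KD + feature-level alignment.

    temperature and soft_weight follow Table 2; feat_weight and the
    MSE feature term are hypothetical choices for this sketch.
    """
    # Supervised cross-entropy on the ground-truth label.
    ce = -math.log(softmax(student_logits)[label] + 1e-12)
    # Prediction-level term: match temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kd = temperature ** 2 * kl_divergence(p_teacher, p_student)
    # Feature-level term: align intermediate representations.
    feat = mse(student_feat, teacher_feat)
    return (1 - soft_weight) * ce + soft_weight * kd + feat_weight * feat


# Toy two-class example (index 0 = human, index 1 = bot; values hypothetical).
aligned = dual_objective_loss([2.0, -1.0], [2.0, -1.0],
                              [0.1, 0.2], [0.1, 0.2], label=0)
misaligned = dual_objective_loss([-1.0, 2.0], [2.0, -1.0],
                                 [0.5, 0.2], [0.1, 0.2], label=0)
```

Setting `feat_weight=0` reduces this sketch to output-level distillation only, which corresponds to the GLNN-style objective described in the comparison above.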

In comparison, the proposed LKD framework follows a different design trajectory by combining large language models with knowledge distillation in a unified manner. Instead of directly operating on raw textual sequences, LKD introduces LLM-based semantic summarization to compress long-term user behavior into high-density representations, which are then incorporated into a multi-source input schema. More importantly, structural knowledge is transferred from the GNN teacher to the language model student through a dual-objective distillation process that jointly aligns prediction distributions and intermediate feature representations. This design allows LKD to retain the advantages of structural modeling during training while completely removing the dependency on graph information at inference time. As a result, LKD achieves a more favorable balance between detection performance and deployment efficiency compared to existing approaches, particularly in scenarios where graph access is restricted or real-time processing is required.

References

1. Shao, C.; Ciampaglia, G.L.; Varol, O.; Yang, K.-C.; Flammini, A.; Menczer, F. The Spread of Low-Credibility Content by Social Bots. Nat. Commun. 2018, 9, 4787.
2. Cai, M.; Luo, H.; Meng, X.; Cui, Y.; Wang, W. Network Distribution and Sentiment Interaction: Information Diffusion Mechanisms Between Social Bots and Human Users on Social Media. Inform. Process. Manag. 2023, 60, 103197.
3. Li, S.; Yang, J.; Zhao, K.; Jia, D. Understanding Large Language Model Driven Social Bots: A Behavioral Analysis and Impact Assessment. IEEE Trans. Comput. Soc. Syst. 2026, 13, 1455–1469.
4. Sallah, A.; Alaoui, E.A.A.; Agoujil, S. Transformer-Based Models for Detecting Bots on Twitter. In Proceedings of the Advanced Materials for Sustainable Energy and Engineering; Elkhattabi, E.M., Boutahir, M., Termentzidis, K., Nakamura, K., Rahmani, A., Eds.; Springer Nature: Cham, Switzerland, 2024; pp. 122–127.
5. Bibi, M.; Hussain Qaisar, Z.; Aslam, N.; Faheem, M.; Akhtar, P. TL-PBot: Twitter Bot Profile Detection Using Transfer Learning Based on DNN Model. Eng. Rep. 2024, 6, e12838.
6. Ilias, L.; Michail Kazelidis, I.; Askounis, D. Multimodal Detection of Bots on X (Twitter) Using Transformers. IEEE Trans. Inf. Forensics Secur. 2024, 19, 7320–7334.
7. Feng, S.; Wan, H.; Wang, N.; Tan, Z.; Luo, M.; Tsvetkov, Y. What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detection. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Ku, L.-W., Martins, A., Srikumar, V., Eds.; Association for Computational Linguistics: Bangkok, Thailand, 2024; pp. 3580–3601.
8. Ng, L.H.X.; Carley, K.M. BotBuster: Multi-Platform Bot Detection Using a Mixture of Experts. Proc. Int. AAAI Conf. Web Soc. Media 2023, 17, 686–697.
9. Yang, K.-C.; Varol, O.; Hui, P.-M.; Menczer, F. Scalable and Generalizable Social Bot Detection through Data Selection. AAAI 2020, 34, 1096–1103.
10. Sayyadiharikandeh, M.; Varol, O.; Yang, K.-C.; Flammini, A.; Menczer, F. Detection of Novel Social Bots by Ensembles of Specialized Classifiers. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management; Association for Computing Machinery: New York, NY, USA, 2020; pp. 2725–2732.
11. Cai, C.; Li, L.; Zengi, D. Behavior Enhanced Deep Bot Detection in Social Media. In Proceedings of the 2017 IEEE International Conference on Intelligence and Security Informatics (ISI); IEEE Press: Beijing, China, 2017; pp. 128–130.
12. Wang, Y.; Wu, C.; Zheng, K.; Wang, X. Social Bot Detection Using Tweets Similarity. In Proceedings of the Security and Privacy in Communication Networks; Beyah, R., Chang, B., Li, Y., Zhu, S., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 63–78.
13. Heidari, M.; Jones, J.H.J.; Uzuner, O. An Empirical Study of Machine Learning Algorithms for Social Media Bot Detection. In Proceedings of the 2021 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS); IEEE: Piscataway, NJ, USA, 2021; pp. 1–5.
14. Aljabri, M.; Zagrouba, R.; Shaahid, A.; Alnasser, F.; Saleh, A.; Alomari, D.M. Machine Learning-Based Social Media Bot Detection: A Comprehensive Literature Review. Soc. Netw. Anal. Min. 2023, 13, 20.
15. Echeverría, J.; De Cristofaro, E.; Kourtellis, N.; Leontiadis, I.; Stringhini, G.; Zhou, S. LOBO: Evaluation of Generalization Deficiencies in Twitter Bot Classifiers. In Proceedings of the 34th Annual Computer Security Applications Conference; Association for Computing Machinery: New York, NY, USA, 2018; pp. 137–146.
16. Beskow, D.M.; Carley, K.M. Bot-hunter: A Tiered Approach to Detecting & Characterizing Automated Activity on Twitter. In Proceedings of the 11th International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation, Washington, DC, USA, 10–13 July 2018.
17. Miller, Z.; Dickinson, B.; Deitrick, W.; Hu, W.; Wang, A.H. Twitter Spammer Detection Using Data Stream Clustering. Inf. Sci. 2014, 260, 64–73.
18. Cresci, S. A Decade of Social Bot Detection. Commun. ACM 2020, 63, 72–83.
19. Hayawi, K.; Mathew, S.; Venugopal, N.; Masud, M.M.; Ho, P.-H. DeeProBot: A Hybrid Deep Neural Network Model for Social Bot Detection Based on User Profile Data. Soc. Netw. Anal. Min. 2022, 12, 43.
20. Wu, J.; Ye, X.; Man, Y. Bottrinet: A Unified and Efficient Embedding for Social Bots Detection via Metric Learning. In Proceedings of the 2023 11th International Symposium on Digital Forensics and Security (ISDFS); IEEE: Piscataway, NJ, USA, 2023; pp. 1–6.
21. Liu, Y.; Tan, Z.; Wang, H.; Feng, S.; Zheng, Q.; Luo, M. BotMoE: Twitter Bot Detection with Community-Aware Mixtures of Modal-Specific Experts. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval; Association for Computing Machinery: New York, NY, USA, 2023; pp. 485–495.
22. Guo, Q.; Xie, H.; Li, Y.; Ma, W.; Zhang, C. Social Bots Detection via Fusing Bert and Graph Convolutional Networks. Symmetry 2021, 14, 30.
23. New AI Classifier for Indicating AI-Written Text. Available online: https://openai.com/index/new-ai-classifier-for-indicating-ai-written-text/ (accessed on 8 January 2026).
24. Magelinski, T.; Beskow, D.; Carley, K.M. Graph-Hist: Graph Classification from Latent Feature Histograms with Application to Bot Detection. AAAI 2020, 34, 5134–5141.
25. Feng, S.; Wan, H.; Wang, N.; Luo, M. BotRGCN: Twitter Bot Detection with Relational Graph Convolutional Networks. In Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining; Association for Computing Machinery: New York, NY, USA, 2022; pp. 236–239.
26. Feng, S.; Tan, Z.; Li, R.; Luo, M. Heterogeneity-Aware Twitter Bot Detection with Relational Graph Transformers. In Proceedings of the AAAI Conference on Artificial Intelligence; Association for the Advancement of Artificial Intelligence: Palo Alto, CA, USA, 2022; Volume 36, pp. 3977–3985.
27. Zhou, M.; Zhang, D.; Wang, Y.; Geng, Y.-A.; Tang, J. Detecting Social Bot on the Fly Using Contrastive Learning. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management; Association for Computing Machinery: New York, NY, USA, 2023; pp. 4995–5001.
28. Yang, Y.; Yang, R.; Li, Y.; Cui, K.; Yang, Z.; Wang, Y.; Xu, J.; Xie, H. RoSGAS: Adaptive Social Bot Detection with Reinforced Self-Supervised GNN Architecture Search. ACM Trans. Web 2023, 17, 1–31.
29. Zhou, M.; Zhang, D.; Wang, Y.; Geng, Y.; Dong, Y.; Tang, J. LGB: Language Model and Graph Neural Network-Driven Social Bot Detection. IEEE Trans. Knowl. Data Eng. 2025, 37, 4728–4742.
30. Feng, S.; Tan, Z.; Wan, H.; Wang, N.; Chen, Z.; Zhang, B.; Zheng, Q.; Zhang, W.; Lei, Z.; Yang, S.; et al. TwiBot-22: Towards Graph-Based Twitter Bot Detection. In Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS ‘22); Curran Associates Inc.: Red Hook, NY, USA, 2022; Article 2555; pp. 35254–35269.
31. Zhang, S.; Liu, Y.; Sun, Y.; Shah, N. Graph-Less Neural Networks: Teaching Old MLPs New Tricks via Distillation. In Proceedings of the International Conference on Learning Representations (ICLR 2022, Poster), Virtual Conference, 25 April 2022.
32. Cresci, S.; Di Pietro, R.; Petrocchi, M.; Spognardi, A.; Tesconi, M. Fame for Sale: Efficient Detection of Fake Twitter Followers. Decis. Support Syst. 2015, 80, 56–71.
33. Feng, S.; Wan, H.; Wang, N.; Li, J.; Luo, M. TwiBot-20: A Comprehensive Twitter Bot Detection Benchmark. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management; Association for Computing Machinery: New York, NY, USA, 2021; pp. 4485–4494.
34. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proceedings of the 33rd International Conference on Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2019; pp. 8026–8037.
35. Fey, M.; Lenssen, J.E. Fast Graph Representation Learning with PyTorch Geometric. In Proceedings of the International Conference on Learning Representations Workshop on Representation Learning on Graphs and Manifolds (ICLR 2019 RLGM Workshop), New Orleans, LA, USA, 6–9 May 2019.
36. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
37. Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; et al. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations; Liu, Q., Schlangen, D., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 38–45.
38. Kudugunta, S.; Ferrara, E. Deep Neural Networks for Bot Detection. Inf. Sci. 2018, 467, 312–322.
39. Lv, Q.; Ding, M.; Liu, Q.; Chen, Y.; Feng, W.; He, S.; Zhou, C.; Jiang, J.; Dong, Y.; Tang, J. Are We Really Making Much Progress? Revisiting, Benchmarking and Refining Heterogeneous Graph Neural Networks. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining; Association for Computing Machinery: New York, NY, USA, 2021; pp. 1150–1160.
40. Hu, Z.; Dong, Y.; Wang, K.; Sun, Y. Heterogeneous Graph Transformer. In Proceedings of the Web Conference 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 2704–2710.

Figure 1. The architecture design of LKD.

Figure 2. Experimental results of different combinations of graph neural networks and language models on the TwiBot-20 dataset: (a) LM accuracy; (b) LM F1; (c) GNN accuracy; (d) GNN F1.

Figure 3. Comparison of LKD domain-adaptation pre-training effectiveness: (a) convergence curve of the teacher model without LM domain adaptation; (b) convergence curve of the student model without LM domain adaptation; (c) convergence curve of the teacher model with LM domain adaptation; (d) convergence curve of the student model with LM domain adaptation.

Figure 4. Visualization of prediction probability consistency between the teacher and student models (TwiBot-20 test set). The horizontal axis represents the bot class prediction probability of the teacher model, and the vertical axis represents the corresponding probability of the student model; green points indicate consistent prediction categories of the two models, brown points indicate inconsistency; the gray diagonal represents the ideal scenario of complete consistency.

Figure 5. Results of label and graph structure efficiency experiments (TwiBot-20 dataset): (a) label efficiency experiment; (b) graph structure efficiency experiment.

Figure 6. Impact of different LM training batch sizes on model performance (TwiBot-20 dataset).

Figure 7. Impact of different optimizers on student model performance (TwiBot-20 dataset).

Table 1. Statistics of the two datasets.

Statistic	Cresci-2015	TwiBot-20
Total users	5301	229,580
Genuine users	1950	5237
Bots	3351	6589
Total tweets	2,827,757	33,488,192
Graph structure	Available	Available

Table 2. Hyperparameter configuration.

Hyperparameter	Value
Optimizer	AdamW
LM	T5
GNN	HGT
LM lr	5 × 10⁻⁶
GNN lr	5 × 10⁻⁴
LM dropout	0.1
GNN dropout	0.4
LM L2	10⁻²
GNN L2	10⁻⁵
GNN layers	2
GNN hidden dimension	128
temperature	3
Soft label loss weight	0.5
LM fine-tuning epochs	2

Table 3. The results of comparative experiments.

Method	Cresci-2015 ACC (%)	Cresci-2015 F1 (%)	TwiBot-20 ACC (%)	TwiBot-20 F1 (%)
SGBot [9]	97.39 ± 0.20	97.97 ± 0.14	79.61 ± 0.56	83.20 ± 0.43
BotHunter [16]	96.26 ± 0.15	97.00 ± 0.18	73.01 ± 0.73	77.30 ± 0.23
Kudugunta [38]	79.07 ± 0.32	80.14 ± 0.15	59.14 ± 0.53	49.64 ± 0.62
LOBO [15]	97.38 ± 0.10	97.97 ± 0.15	76.14 ± 0.32	80.69 ± 0.18
BotRGCN [25]	98.50 ± 0.12	98.80 ± 0.04	68.02 ± 0.72	75.29 ± 0.64
RGT [26]	98.13 ± 0.09	98.54 ± 0.20	84.26 ± 0.34	86.58 * ± 0.27
SimpleHGN [39]	98.69 ± 0.20	98.95 ± 0.13	84.69 ± 0.14	86.15 ± 0.12
HGT [40]	98.69 ± 0.15	98.98 ± 0.06	84.43 ± 0.13	86.31 ± 0.13
BotBuster [8]	95.32 ± 0.46	96.19 ± 0.12	79.27 ± 0.15	81.65 ± 0.21
GLNN-RGT [31]	97.38 ± 0.11	97.97 ± 0.10	82.99 ± 0.31	85.59 ± 0.19
GLNN-HGT [31]	97.01 ± 0.13	97.60 ± 0.14	82.49 ± 0.21	85.16 ± 0.05
LM-finetune	97.57 ± 0.22	98.04 ± 0.07	84.78 * ± 0.08	86.43 ± 0.03
LKD-GNN	99.07 ^† ± 0.22	99.25 ^† ± 0.04	84.45 ± 0.12	86.30 ± 0.10
LKD-LM	98.87 * ± 0.08	99.12 * ± 0.13	85.21 ^† ± 0.20	87.05 ^† ± 0.15

^† denotes the best result; * denotes the second best result; bold values represent the proposed method in this paper.

Table 4. Ablation study results on the TwiBot-20 dataset.

Method	Acc	F1
LM fine-tune only	0.8478 *	0.8643 *
GNN training only	0.7802	0.8134
MLP instead of GNN	0.8427	0.8616
LKD-GNN	0.8445	0.8630
LKD-LM	0.8521 ^†	0.8705 ^†

^† denotes the best result; * denotes the second best result; bold values represent the proposed method in this paper.

Table 5. Efficiency evaluation of LKD on TwiBot-20.

LKD Setting	Params (M)	Inference Speed (Users/s)	Latency Mean (ms)	Latency p50 (ms)	Latency p95 (ms)	Peak GPU Mem (GB)
LKD-LM	109.81	40.67	24.59	24.31	32.15	0.487
LKD-GNN	110.08	25.87	38.66	33.74	84.91	0.549

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ye, W.; Ye, W.; Wang, H. LKD: LLM-Assisted Knowledge Distillation for Efficient and Robust Social Bot Detection. Electronics 2026, 15, 2019. https://doi.org/10.3390/electronics15102019

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
