Multi-Strategy Improvement and Comparative Research on Data-Driven Social Network Construction in Edge-Deficient Scenarios for Social Bot Account Detection
Abstract
1. Introduction
- (a) It systematically reveals the structural causes of edge deficiency in real social networks and the mechanisms by which edge deficiency distorts synthetic social graph data, and proposes a “complete first, then reference” approach for real reference data, which mitigates the negative impact of incomplete real data on high-quality synthetic social graph construction.
- (b) It systematically compares and improves two mainstream data-driven synthetic social graph generation strategies: node degree-driven models and real incomplete network-driven methods, enhancing their ability to generate high-quality synthetic social graph data for social bot detection in edge-deficient scenarios.
- (c) It designs and verifies a node-label-independent edge completion strategy for edge-deficient real social network data. By integrating interest identification and social association mechanisms grounded in user behavior logic, the strategy infers and supplements potential topological connections, improves the structural authenticity of real reference data, and supports the construction of high-quality synthetic social graphs.
2. Improved Strategies for Social Network Construction in Edge-Deficient Scenarios
2.1. Principle, Defect Analysis, and Improvement of Node Degree-Driven Strategy
2.1.1. Core Principle of the Original Chung-Lu Model
2.1.2. Core Defects and Impacts on Bot Detection
- Randomized edge connection distorts high-order topology and associations. Without a real network as reference, edge connections lack the constraints of association logic: quantities such as the human–bot interaction ratio, community structure, and node attribute associations can only be extracted with a real network as reference, which the node degree-driven strategy by definition lacks. Topological and associative distortion is therefore inherent to this strategy [13].
- A node’s labeled degree mismatches its actual degree in the network, owing to incomplete network topology. Although degree information such as the in-degrees and out-degrees of social network nodes is easy to collect, limits on data collection scope and node-selection rules prevent the target network from covering all associated nodes, so the global association degree recorded in node attributes differs markedly from the local degree actually realized within the network. This has been verified on the TwiBot-22 dataset [14]: the 1 million nodes carry a labeled global total of 43.7 billion follower/followee relationships, while the target network constructed from them contains only 3.74 million effective association edges, an enormous gap in scale.
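The degree mismatch above can be illustrated with a minimal sketch of Chung-Lu-style edge sampling (the function and variable names are ours, not the paper's): nodes carry labeled degrees, but the degrees actually realized in the sampled graph depend on which candidate edges survive de-duplication and self-loop removal.

```python
import random

def chung_lu_directed(out_deg, in_deg, seed=0):
    """Minimal Chung-Lu-style sketch: draw candidate directed edges with
    probability proportional to out_deg[u] * in_deg[v]."""
    rng = random.Random(seed)
    nodes = list(out_deg)
    out_w = [out_deg[n] for n in nodes]
    in_w = [in_deg[n] for n in nodes]
    edges = set()
    for _ in range(sum(out_deg.values())):  # one draw per labeled out-stub
        u = rng.choices(nodes, weights=out_w)[0]
        v = rng.choices(nodes, weights=in_w)[0]
        if u != v:
            edges.add((u, v))  # duplicates collapse; self-loops are dropped
    return edges

# The labeled (global) degrees need not match the degrees realized in the sample:
labels = {i: 5 for i in range(20)}
g = chung_lu_directed(labels, labels)
realized_out = {n: sum(1 for (u, _) in g if u == n) for n in labels}
```

Comparing `realized_out` against `labels` exhibits, in miniature, the same labeled-versus-actual gap the TwiBot-22 statistics show at scale.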
2.1.3. Targeted Improvement Strategies
1. Proportional scaling and fine-tuning of the real degree sequence
2. Integration of human–bot interaction preference weights into the connection probability
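The two improvements above can be sketched as follows; the rounding-based fine-tuning rule and the preference values in the usage note are illustrative assumptions, not the paper's exact formulas.

```python
def scale_degrees(global_degrees, target_edge_count):
    """Proportionally scale a labeled global degree sequence down to the
    edge budget of the target network (fine-tuning via rounding, simplified)."""
    factor = target_edge_count / sum(global_degrees.values())
    return {n: max(1, round(d * factor)) for n, d in global_degrees.items()}

def connection_weight(out_d_u, in_d_v, label_u, label_v, pref):
    """Chung-Lu style connection weight multiplied by an interaction
    preference factor for the (label_u, label_v) pair."""
    return out_d_u * in_d_v * pref[(label_u, label_v)]
```

For example, with a hypothetical preference table estimated from observed interaction ratios, `pref = {('human', 'human'): 0.95, ('human', 'bot'): 0.05, ('bot', 'human'): 0.6, ('bot', 'bot'): 0.4}`, human-to-bot connections are strongly down-weighted relative to human-to-human ones, steering the synthetic graph toward the real interaction structure.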
2.2. Research and Improvement of Real Incomplete Network-Driven Strategy
2.2.1. Original RCMH Sampling + Diffusion Model: Core Logic
2.2.2. Core Defects and Impacts on Bot Detection
2.2.3. Improved RCMH Sampling: Node Importance Weighting and Human–Bot Balance
- Step 1: Degree interval division
- Step 2: Node importance weight assignment
- Step 3: Improved transition acceptance probability
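The three steps above amount to an importance-weighted Metropolis–Hastings walk. A minimal sketch follows, assuming the target sampling distribution is proportional to a per-node importance weight; the paper's exact transition acceptance probability may differ, and all names here are illustrative.

```python
import random

def mh_walk(adj, weight, steps, start, seed=0):
    """Metropolis-Hastings random walk whose stationary distribution is
    proportional to weight[v]; with weight identically 1 this reduces to the
    classic degree-bias-corrected MH random walk."""
    rng = random.Random(seed)
    u, visited = start, [start]
    for _ in range(steps):
        v = rng.choice(adj[u])  # propose a uniform random neighbor
        # Accept with probability min(1, (w(v)/w(u)) * (deg(u)/deg(v))):
        # corrects the degree bias of the walk and biases toward important nodes.
        accept = min(1.0, (weight[v] * len(adj[u])) / (weight[u] * len(adj[v])))
        if rng.random() < accept:
            u = v
        visited.append(u)
    return visited
```

Assigning larger weights to nodes in under-sampled degree intervals (Steps 1–2) then makes the walk visit them proportionally more often, which is the balancing effect the improved RCMH sampling targets.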
2.2.4. Improved Diffusion Model: Subgraph Density Constraint and Human–Bot Interaction Preservation
- Step 1: Subgraph density constraint (Formula (7))
- Step 2: Human–bot interaction weight preservation (Formula (8))
2.3. Research on Edge Completion Strategies for Social Networks with Edge Deficiency
2.3.1. Interest Identification: Edge Completion Based on Topic Participation
2.3.2. Social Association: Edge Completion Based on Mention Relationships
2.3.3. Edge Completion Based on Link Prediction Technology
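A standard unsupervised baseline for link-prediction-based completion scores candidate edges by neighbor-set overlap (cf. Liben-Nowell and Kleinberg); the sketch below uses Jaccard similarity with illustrative names, whereas the paper's supervised, representation-learning, and hybrid variants are more elaborate.

```python
from itertools import combinations

def jaccard_link_scores(adj):
    """Score each non-adjacent node pair by the Jaccard similarity of their
    neighbor sets; adj maps node -> set of neighbors."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # skip pairs that are already connected
        union = adj[u] | adj[v]
        if union:
            scores[(u, v)] = len(adj[u] & adj[v]) / len(union)
    return scores

def top_completions(adj, k):
    """Return the k highest-scoring missing edges as completion candidates."""
    s = jaccard_link_scores(adj)
    return sorted(s, key=s.get, reverse=True)[:k]
```

In an edge-completion pipeline, the top-scoring pairs would be added as inferred edges, optionally after thresholding the score or validating against held-out real edges as in Appendix C.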
3. Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Full Term |
|---|---|
| RCMH | Rejection-Controlled Metropolis–Hastings |
| SVD | Singular Value Decomposition |
| GNN | Graph Neural Network |
| IPTC | International Press Telecommunications Council |
Appendix A. Impact of Isolated Node Proportion on Bot Detection and Reliability Test Method for Synthetic Social Graphs
**Algorithm A1: Edge Deficiency Rate vs. Bot Detection**

```
INPUT:  Edge/label files; Missing rate in [0.24, 0.99] (step = 0.05); RF n_est = 100
OUTPUT: Metrics
 1: Func BuildGraph(edges, rate):
 2:     Return directed graph G
 3: Func ExtractFeat(G, nodes, labels):
 4:     Return 9-dim node features, binary labels
 5: Func TrainEval(X, y):
 6:     Return metrics, feat_imp
 7: MAIN:
 8:     Load edges → real_nodes, labels → label_dict
 9:     For rate in [0.24, 0.29, …, 0.99]:
10:         G = BuildGraph(edges, rate)
11:         X, y = ExtractFeat(G, real_nodes, label_dict)
12:         metrics, feat_imp = TrainEval(X, y)
13:     Save results; Return metrics
```
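Two pieces of Algorithm A1 can be made concrete with a runnable sketch: simulating an edge-missing rate and extracting degree-style features. Only three of the nine feature dimensions are shown, and all names are illustrative rather than the paper's implementation.

```python
import random

def drop_edges(edges, rate, seed=0):
    """Simulate an edge-missing rate by independently removing each edge
    with probability `rate` (corresponds to BuildGraph's rate parameter)."""
    rng = random.Random(seed)
    return [e for e in edges if rng.random() >= rate]

def degree_features(edges, nodes):
    """A subset of Algorithm A1's 9-dimensional node features: in-degree,
    out-degree, and total degree for each node."""
    out_d = {n: 0 for n in nodes}
    in_d = {n: 0 for n in nodes}
    for u, v in edges:
        if u in out_d:
            out_d[u] += 1
        if v in in_d:
            in_d[v] += 1
    return {n: [in_d[n], out_d[n], in_d[n] + out_d[n]] for n in nodes}
```

Sweeping `rate` over [0.24, 0.99] and feeding the resulting features to a classifier (a random forest with 100 estimators in Algorithm A1) reproduces the experiment's outer loop.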
Appendix B. Pseudocode for Two Types of Classic Network Synthesis Methods
**Algorithm A2: Traditional Attribute-Degree Chung-Lu Network**

```
INPUT:  Label file, Degree file, Bot count, Human count
OUTPUT: Synthetic edge set E, Report
1: label_map ← Load(id → label)
2: V, out_deg, in_deg ← Sample Bots and Humans, match labeled degrees
3: Adjust in_deg so that sum(out_deg) = sum(in_deg)
4: rem_out, rem_in ← out_deg, in_deg
5: While sum(rem_out) > 0:
6:     u ← Random sample weighted by rem_out
7:     v ← Random non-u sample with P = (rem_out[u] · rem_in[v]) / total_edges
8:     Add edge (u, v) to E; rem_out[u] -= 1; rem_in[v] -= 1
9: Save E to CSV; Generate report
```
**Algorithm A3: Traditional Real-Degree Network Synthesis**

```
 1: label_map ← Load(user id → label)
 2: real_out, real_in ← Calculate in/out-degree from the real graph
 3: V, out_deg, in_deg ← Sample Bots and Humans, match real degrees
 4: Adjust in_deg to satisfy sum(out_deg) = sum(in_deg)
 5: rem_out, rem_in ← out_deg, in_deg
 6: While sum(rem_out) > 0:
 7:     u ← Random sample weighted by rem_out
 8:     v ← Random non-u node with P = (rem_out[u] · rem_in[v]) / total_edges
 9:     Add edge (u, v) to E; rem_out[u] -= 1; rem_in[v] -= 1
10: Save E to CSV; Generate report
```
**Algorithm A4: Traditional Real-Degree Network Synthesis (Non-Isolated Nodes)**

```
INPUT:  Label set, Real follow graph
OUTPUT: Synthetic edge set E, Network report
1: label_map ← Load(user id → label)
2: real_out, real_in ← Calculate in/out-degree from the real graph
3: V, out_deg, in_deg ← Select non-isolated nodes (out_deg > 0), 1:1 Bots/Humans
4: rem_out, rem_in ← out_deg, in_deg
5: While sum(rem_out) > 0:
6:     u ← Random sample weighted by rem_out
7:     v ← Random non-u node with P = (rem_out[u] · rem_in[v]) / total_edges
8:     Add edge (u, v) to E; rem_out[u] -= 1; rem_in[v] -= 1
9: Save E to CSV; Generate network report
```
Appendix C. Reliability Validation Experiments for the Edge Completion Strategy
| Experimental Method | Number of True Matched Edges | Total Number of Completed Edges | Authenticity Rate (Precision) |
|---|---|---|---|
| Social Association-Based Edge Completion | 476,439 | 1,466,912 | 32.4790% |
| Random Edge Completion | 25 | 1,466,912 | 0.0017% |
| Network Type | Clustering Coefficient |
|---|---|
| Raw Real Follow Network | 0.0618 |
| Network with Interest-Driven Edge Completion | 0.0329 |
| Network with Random Edge Completion | 0.0091 |
| Network Type | Sampled Node Scale | Metric Value | Relative Error vs. Real Network (%) |
|---|---|---|---|
| Raw Real Network | 50,000 | 0.7472 | - |
| Raw Real Network | 100,000 | 0.6859 | - |
| Raw Real Network | 200,000 | 0.6241 | - |
| Interest-Driven Edge Completion Network | 50,000 | 0.6311 | 15.54 |
| Interest-Driven Edge Completion Network | 100,000 | 0.5376 | 21.62 |
| Interest-Driven Edge Completion Network | 200,000 | 0.4605 | 26.21 |
| Random Edge Completion Network | 50,000 | 0.1104 | 85.22 |
| Random Edge Completion Network | 100,000 | 0.1575 | 77.04 |
| Random Edge Completion Network | 200,000 | 0.2332 | 62.63 |
| Network Type | HH | BH | BB | Euclidean Distance |
|---|---|---|---|---|
| No Edge Completion (Benchmark) | 0.8483 | 0.1464 | 0.0053 | - |
| Interest-Driven Edge Completion | 0.8166 | 0.1624 | 0.0021 | 0.0357 |
| Random Edge Completion | 0.7431 | 0.2376 | 0.0193 | 0.2708 |
| Network Type | Power-Law Exponent | Average Clustering Coefficient | Average Path Length | K-S p-Value |
|---|---|---|---|---|
| No Edge Completion (Real Observation) | 1.4053 | 0.1571 | 3.1166 | - |
| Our Interest-Driven Edge Completion | 1.3967 | 0.1635 | 3.0789 | 0.033 |
| Random Edge Completion | 1.4855 | 0.0709 | 2.8148 | 0.000 |
Appendix D. Specific Implementation Details of the Edge Completion Strategy (Based on the Common Identity and Common Bond Theory)
**Algorithm A5: Interest-Based Edge Completion**

```
INPUT:  Node set V, Post data D, Topic categories K = 17, Attribute data Attr
OUTPUT: Supplementary edge set E_add
 1: Func TopicClassify(D):
        // Use Gemma3-12b to generate a 17-dimensional topic probability distribution
        // Return: node → {topic 1: prob, …, topic 17: prob}
        Return topic_dict
 2: Func BuildTopicGroups(topic_dict):  // complete topic mapping (enhanced)
        Initialize node_main_topic = empty dict   // node → its core topic
        Initialize same_topic_nodes = empty dict  // topic → node set
        For each node v in V:
            dist = topic_dict[v]; main_k = argmax(dist)  // select the ONLY core topic
            node_main_topic[v] = main_k; Add v to same_topic_nodes[main_k]
        Return same_topic_nodes, node_main_topic
 3: Func CalcAvgDegree(V_public, topic_dict):
        Calculate average in/out degree for each topic k
        Return avg_out[k], avg_in[k] (∀k ∈ K)
 4: Func AssignTargets(v, main_k, Attr[v], avg_degree):
        Return target_out, target_in
 5: Func GenEdges(v, targets, same_topic_nodes):
        Return E_supp
 6: MAIN:
 7:     topic_dict = TopicClassify(D)
 8:     same_topic_nodes, node_main_topic = BuildTopicGroups(topic_dict)
 9:     avg_degree = CalcAvgDegree(V_public, topic_dict)
10:     For each edge-deficient node v:
11:         main_k = node_main_topic[v]  // directly use the mapped core topic
12:         targets = AssignTargets(v, main_k, Attr[v], avg_degree)
13:         E_add += GenEdges(v, targets, same_topic_nodes[main_k])
14:     Save E_add; Return E_add
```
**Algorithm A6: Social Association Edge Completion**

```
INPUT:  User set V, Mention data M, Edge set E_existing
OUTPUT: Supplementary edge set E_social
 1: # Stage 1: direct mention edges
 2: For (u, v) in M where an @-mention relation exists:
 3:     If (u, v) not in E_existing: E_social.add((u, v))
 4: # Stage 2: similar mention-group edges
 5: mention_sets = {u: set(mentioned_by(u)) for u in V_public}
 6: For u in V where is_edge_deficient(u):
 7:     For v in V_public where v ≠ u:
 8:         sim = jaccard(mention_sets[u], mention_sets[v])
 9:         prob = learn_edge_prob_from_A(sim)
10:         If random() < prob: E_social.add((u, v))
```
Appendix E. Experimental Environment and Hyperparameter Configuration
Appendix F. Feature Engineering Explanation
Appendix G. Feature Importance and Ablation Experiments
| Feature Name | Importance (Pre-Completion) | Importance (Post-Completion) |
|---|---|---|
| In-Degree | 0.378 | 0.245 |
| Total Degree | 0.257 | 0.167 |
| Degree Centrality | 0.161 | 0.160 |
| Proportion of Bot In-Neighbors | 0.070 | 0.087 |
| Proportion of Human In-Neighbors | 0.066 | 0.142 |
| Out-Degree | 0.024 | 0.059 |
| Proportion of Bot Out-Neighbors | 0.023 | 0.069 |
| Proportion of Human Out-Neighbors | 0.022 | 0.071 |
| Node Activity Status | 0.001 | 0.001 |
| Method Name | ACC | Pre | Rec | F1 |
|---|---|---|---|---|
| Degree Scaling Only | 0.5772 [0.5716, 0.5822] | 0.5515 [0.5454, 0.5579] | 0.8262 [0.8206, 0.8322] | 0.6614 [0.6561, 0.6666] |
| Interaction Preference Weight Only | 0.5124 [0.5070, 0.5176] | 0.5459 [0.5316, 0.5600] | 0.1477 [0.1417, 0.1530] | 0.2324 [0.2247, 0.2402] |
| Node Importance Weight Only | 0.5286 [0.5237, 0.5336] | 0.6578 [0.6427, 0.6737] | 0.1191 [0.1143, 0.1237] | 0.2017 [0.1944, 0.2088] |
| Balance + Degree Constraint Only | 0.5255 [0.5207, 0.5305] | 0.6220 [0.6071, 0.6379] | 0.1301 [0.1251, 0.1349] | 0.2152 [0.2079, 0.2223] |
Appendix H. Data-Driven Balanced Node Weight Calculation
**Algorithm A7: Data-Driven Balanced Node Weight Calculation**

```
INPUT:  Graph G, node labels label: V → {bot, human}, degree bins B
OUTPUT: Node weights w
1: For each bin b ∈ B do count bot and human nodes: n_bot[b], n_human[b]
2: For each node v ∈ V do
3:     Find the bin b of v
4:     If label(v) = bot then c ← n_bot[b] else c ← n_human[b]
5:     w(v) ← 1 / c
6: W ← Σ_v w(v)
7: For each node v ∈ V do w(v) ← w(v) / W
8: Return w
```
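Assuming the balancing rule is inverse class frequency within each degree bin, which is our reading of the garbled Algorithm A7 listing (the paper's exact per-step formulas may differ), a runnable sketch:

```python
import bisect

def balanced_node_weights(labels, degrees, bin_edges):
    """Per-bin inverse-frequency weighting: within each degree bin, every bot
    gets weight 1/n_bot and every human 1/n_human, then weights are
    normalized to sum to 1 so the two classes contribute comparably."""
    bin_of = {v: bisect.bisect_right(bin_edges, degrees[v]) for v in labels}
    counts = {}
    for v, lab in labels.items():
        counts[(bin_of[v], lab)] = counts.get((bin_of[v], lab), 0) + 1
    raw = {v: 1.0 / counts[(bin_of[v], labels[v])] for v in labels}
    total = sum(raw.values())
    return {v: w / total for v, w in raw.items()}
```

With two bots and one human in the same bin, each bot receives half the weight of the human, so the minority class is not drowned out when the weights drive sampling or training.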
Appendix I. Data Preprocessing and Division Details
References
- Chu, Z.; Gianvecchio, S.; Wang, H.; Jajodia, S. Detecting automation of twitter accounts: Are you a human, bot, or cyborg? IEEE Trans. Dependable Secur. Comput. 2012, 9, 811–824. [Google Scholar] [CrossRef]
- Al-Na’amneh, Q.; Aljawarneh, M.; Alhazaimeh, A.S.; Hazaymih, R.; Shah, S.M.; Dhifallah, W. Securing trust: Rule-based defense against on/off and collusion attacks in cloud environments. STAP J. Secur. Risk Manag. 2025, 2025, 85–114. [Google Scholar] [CrossRef]
- Abdulateef, O.G.; Joudah, A.; Abdulsahib, M.G.; Alrammahi, H. Designing a robust machine learning-based framework for secure data transmission in internet of things (IoT) environments: A multifaceted approach to security challenges. J. Cyber Secur. Risk Audit. 2025, 2025, 266–275. [Google Scholar] [CrossRef]
- Laprevotte, A.; Lin, R.Y.; Ojha, S. Diffusion-Generated Social Graphs Enhance Bot Detection. In Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: New Perspectives in Graph Machine Learning (NPGML), San Diego, CA, USA, 7 December 2025. [Google Scholar]
- Davies, A.O.; Ajmeri, N.S.; Telmo De Menezes Filho, E.S. A comparative study of gnns and rule-based methods for synthetic social network generation. IEEE Access 2025, 13, 32198–32210. [Google Scholar] [CrossRef]
- Pi, J.; Xian, Y.; Huang, Y.; Xiang, Y.; Song, R.; Yu, Z. Topology-Aware Gated Graph Neural Network for Social Bot Detection. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics; The Asian Federation of Natural Language Processing and The Association for Computational Linguistics: Mumbai, India, 2025; pp. 235–245. [Google Scholar]
- Lingam, G.; Das, S.K. Social bot detection using variational generative adversarial networks with hidden Markov models in Twitter network. Knowl.-Based Syst. 2025, 311, 113019. [Google Scholar] [CrossRef]
- Dehghan, A.; Siuta, K.; Skorupka, A.; Dubey, A.; Betlen, A.; Miller, D.; Xu, W.; Kamiński, B.; Prałat, P. Detecting bots in social-networks using node and structural embeddings. J. Big Data 2023, 10, 119. [Google Scholar] [CrossRef] [PubMed]
- Alkathiri, N.; Slhoub, K. Challenges in machine learning-based social bot detection: A systematic review. Discov. Artif. Intell. 2025, 5, 214. [Google Scholar] [CrossRef]
- Yu, Z.; Bai, L.; Ye, O.; Cong, X. Social Robot Detection Method with Improved Graph Neural Networks. Comput. Mater. Contin. 2024, 78, 1773–1795. [Google Scholar] [CrossRef]
- Li, Y.; Shi, S.; Guo, X.; Zhou, C.; Hu, Q. G-CutMix: A CutMix-based graph data augmentation method for bot detection in social networks. PLoS ONE 2025, 20, e0331978. [Google Scholar] [CrossRef] [PubMed]
- Chung, F.; Lu, L. Connected components in random graphs with given expected degree sequences. Ann. Comb. 2002, 6, 125–145. [Google Scholar] [CrossRef]
- Seshadhri, C.; Pinar, A.; Kolda, T.G. An in-depth study of stochastic Kronecker graphs. In Proceedings of the 2011 IEEE 11th International Conference on Data Mining; IEEE: Piscataway, NJ, USA, 2011; pp. 587–596. [Google Scholar]
- Feng, S.; Tan, Z.; Wan, H.; Wang, N.; Chen, Z.; Zhang, B.; Zheng, Q.; Zhang, W.; Lei, Z.; Yang, S.; et al. Twibot-22: Towards graph-based twitter bot detection. Adv. Neural Inf. Process. Syst. 2022, 35, 35254–35269. [Google Scholar]
- Leskovec, J.; Faloutsos, C. Sampling from large graphs. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2006; pp. 631–636. [Google Scholar]
- Newman, M.E.J. Mixing patterns in networks. Phys. Rev. E 2003, 67, 026126. [Google Scholar] [CrossRef] [PubMed]
- Cresci, S.; Di Pietro, R.; Petrocchi, M.; Spognardi, A.; Tesconi, M. The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race. In Proceedings of the 26th International Conference on World Wide Web Companion; International World Wide Web Conferences Steering Committee: Geneva, Switzerland, 2017; pp. 963–972. [Google Scholar]
- Gjoka, M.; Kurant, M.; Butts, C.T.; Markopoulou, A. Walking in facebook: A case study of unbiased sampling of osns. In Proceedings of the 2010 Proceedings IEEE INFOCOM; IEEE: Piscataway, NJ, USA, 2010; pp. 1–9. [Google Scholar]
- Feng, S.; Wan, H.; Wang, N.; Li, J.; Luo, M. Twibot-20: A comprehensive twitter bot detection benchmark. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management; Association for Computing Machinery: New York, NY, USA, 2021; pp. 4485–4494. [Google Scholar]
- Gao, M.; Du, H.; Wen, W.; Duan, Q.; Wang, X.; Chen, Y. FediData: A Comprehensive Multi-Modal Fediverse Dataset from Mastodon. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management; Association for Computing Machinery: New York, NY, USA, 2025; pp. 6372–6376. [Google Scholar]
- Ren, Y.; Kraut, R.; Kiesler, S. Applying common identity and bond theory to design of online communities. Organ. Stud. 2007, 28, 377–408. [Google Scholar] [CrossRef]
- Postmes, T.; Spears, R.; Lea, M. Breaching or building social boundaries? SIDE-effects of computer-mediated communication. Commun. Res. 1998, 25, 689–715. [Google Scholar] [CrossRef]
- Barbieri, N.; Bonchi, F.; Manco, G. Who to follow and why: Link prediction with explanations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2014; pp. 1266–1275. [Google Scholar]
- Kuzman, T.; Ljubešić, N. LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification. IEEE Access 2025, 13, 35621–35633. [Google Scholar] [CrossRef]
- Adamic, L.A.; Adar, E. Friends and neighbors on the web. Soc. Netw. 2003, 25, 211–230. [Google Scholar] [CrossRef]
- Grover, A.; Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2016; pp. 855–864. [Google Scholar]
- Liben-Nowell, D.; Kleinberg, J. The link prediction problem for social networks. In Proceedings of the Twelfth International Conference on Information and Knowledge Management; Association for Computing Machinery: New York, NY, USA, 2003; pp. 556–559. [Google Scholar]
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2016. [Google Scholar]
- Blondel, V.D.; Guillaume, J.-L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, 2008, P10008. [Google Scholar] [CrossRef]
- Kipf, T.N. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
- Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017); Curran Associates Inc.: Red Hook, NY, USA, 2017. [Google Scholar]
- Zhao, J.; Wen, Q.; Sun, S.; Ye, Y. Multi-view self-supervised heterogeneous graph embedding. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Springer International Publishing: Cham, Switzerland, 2021; pp. 319–334. [Google Scholar]
- Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the International Conference on Machine Learning (PMLR); JMLR.org: Cambridge, MA, USA, 2017; pp. 1126–1135. [Google Scholar]

| Statistical Dimension | Human Proportion (%) | Bot Proportion (%) |
|---|---|---|
| Composition of high-degree nodes (deg > 10⁵) | 97.69 | 2.31 |
| Composition of interaction objects of high-degree human nodes | 95.05 | 4.95 |
| Dataset | Method Name | ACC (95% CI) | Pre (95% CI) | Rec (95% CI) | F1 (95% CI) |
|---|---|---|---|---|---|
| TwiBot-22 | Real Data (train:val:test = 7:2:1) | 0.6170 [0.6115, 0.6224] | 0.6006 [0.5928, 0.6076] | 0.6997 [0.6929, 0.7064] | 0.6463 [0.6404, 0.6521] |
| TwiBot-22 | Attribute-Based Degree | 0.4999 [0.4947, 0.5052] | 0.5000 [0.4947, 0.5048] | 1.0000 [1.0000, 1.0000] | 0.6666 [0.6620, 0.6710] |
| TwiBot-22 | Network-Based Degree | 0.5009 [0.4957, 0.5062] | 0.7734 [0.6557, 0.8724] | 0.0024 [0.0017, 0.0031] | 0.0047 [0.0033, 0.0061] |
| TwiBot-22 | Network-Based Degree (Non-Isolated) | 0.5088 [0.5034, 0.5139] | 0.5054 [0.4998, 0.5105] | 0.8353 [0.8297, 0.8409] | 0.6297 [0.6246, 0.6343] |
| TwiBot-22 | Subgraph Sampling + Diffusion Model | 0.6061 [0.6014, 0.6111] | 0.5760 [0.5703, 0.5821] | 0.8041 [0.7984, 0.8093] | 0.6712 [0.6664, 0.6762] |
| TwiBot-22 | Improved Degree-Based Method | 0.5788 [0.5732, 0.5839] | 0.5529 [0.5468, 0.5595] | 0.8235 [0.8179, 0.8295] | 0.6615 [0.6564, 0.6666] |
| TwiBot-22 | Improved Subgraph Sampling + Diffusion Model | 0.6190 [0.6142, 0.6241] | 0.6042 [0.5975, 0.6108] | 0.6902 [0.6833, 0.6970] | 0.6443 [0.6386, 0.6501] |
| TwiBot-20 | Real Data (train:val:test = 7:2:1) | 0.5750 [0.5532, 0.5978] | 0.5951 [0.5597, 0.6347] | 0.4752 [0.4424, 0.5087] | 0.5286 [0.4974, 0.5565] |
| TwiBot-20 | Attribute-Based Degree | 0.4023 [0.3822, 0.4223] | 0.5077 [0.3709, 0.6305] | 0.0218 [0.0143, 0.0297] | 0.0417 [0.0274, 0.0576] |
| TwiBot-20 | Network-Based Degree | 0.4320 [0.4111, 0.4530] | 0.5542 [0.5160, 0.5929] | 0.2540 [0.2308, 0.2761] | 0.3490 [0.3224, 0.3784] |
| TwiBot-20 | Network-Based Degree (Non-Isolated) | 0.4423 [0.4209, 0.4642] | 0.5438 [0.5123, 0.5743] | 0.4144 [0.3876, 0.4407] | 0.4702 [0.4446, 0.4945] |
| TwiBot-20 | Subgraph Sampling + Diffusion Model | 0.5750 [0.5535, 0.5917] | 0.6596 [0.6384, 0.6902] | 0.6058 [0.5804, 0.6273] | 0.6297 [0.6127, 0.6528] |
| TwiBot-20 | Improved Degree-Based Method | 0.5656 [0.5447, 0.5861] | 0.6425 [0.6146, 0.6686] | 0.6153 [0.5875, 0.6426] | 0.6287 [0.6059, 0.6512] |
| TwiBot-20 | Improved Subgraph Sampling + Diffusion Model | 0.5722 [0.5530, 0.5915] | 0.6497 [0.6237, 0.6742] | 0.6282 [0.6050, 0.6514] | 0.6365 [0.6153, 0.6566] |
| Method Name | ACC (95% CI) | Pre (95% CI) | Rec (95% CI) | F1 (95% CI) |
|---|---|---|---|---|
| Real Data (train:val:test = 7:2:1) | 0.6170 [0.6115, 0.6224] | 0.6006 [0.5928, 0.6076] | 0.6997 [0.6929, 0.7064] | 0.6463 [0.6404, 0.6521] |
| Real Data (Interest-Based Edge Completion) | 0.6833 [0.6784, 0.6881] | 0.6771 [0.6704, 0.6840] | 0.7008 [0.6936, 0.7075] | 0.6888 [0.6833, 0.6945] |
| Real Data (Social Association Edge Completion) | 0.6299 [0.6246, 0.6348] | 0.6151 [0.6078, 0.6212] | 0.6943 [0.6878, 0.7012] | 0.6523 [0.6467, 0.6575] |
| Real Data (Interest + Social Association Edge Completion) | 0.6875 [0.6829, 0.6923] | 0.6675 [0.6619, 0.6735] | 0.7472 [0.7410, 0.7537] | 0.7051 [0.7002, 0.7104] |
| Improved Degree-Based Method (Post Edge Completion) | 0.5908 ↑ [0.5861, 0.5956] | 0.5587 ↑ [0.5530, 0.5643] | 0.8648 ↑ [0.8599, 0.8695] | 0.6788 ↑ [0.6741, 0.6834] |
| Improved Subgraph Sampling + Diffusion (Post Edge Completion) | 0.6427 ↑ [0.6383, 0.6474] | 0.5991 ↓ [0.5934, 0.6049] | 0.8626 ↑ [0.8580, 0.8677] | 0.7071 ↑ [0.7025, 0.7117] |
| Method Name | ACC (95% CI) | Pre (95% CI) | Rec (95% CI) | F1 (95% CI) |
|---|---|---|---|---|
| Real Data (20% Test Set) | 0.6170 [0.6115, 0.6224] | 0.6006 [0.5928, 0.6076] | 0.6997 [0.6929, 0.7064] | 0.6463 [0.6404, 0.6521] |
| Link Prediction (Basic Supervision) | 0.6176 [0.6124, 0.6229] | 0.6007 [0.5937, 0.6076] | 0.7021 [0.6953, 0.7093] | 0.6474 [0.6418, 0.6533] |
| Link Prediction (Enhanced Supervision) | 0.5941 [0.5890, 0.5993] | 0.5743 [0.5675, 0.5811] | 0.7280 [0.7208, 0.7347] | 0.6420 [0.6368, 0.6474] |
| Link Prediction (Representation Learning) | 0.6166 [0.6113, 0.6221] | 0.6000 [0.5928, 0.6069] | 0.7002 [0.6933, 0.7073] | 0.6462 [0.6405, 0.6520] |
| Link Prediction (Hybrid Integration) | 0.6058 [0.6007, 0.6110] | 0.5928 [0.5856, 0.5999] | 0.6770 [0.6701, 0.6841] | 0.6321 [0.6264, 0.6380] |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Wang, J.; Tang, M. Multi-Strategy Improvement and Comparative Research on Data-Driven Social Network Construction in Edge-Deficient Scenarios for Social Bot Account Detection. Information 2026, 17, 360. https://doi.org/10.3390/info17040360
