APT Detection via Hypergraph Attention Network with Community-Based Behavioral Mining
Abstract
1. Introduction
- Behavior Boundary Division (C1): In the provenance graph, there is no clear boundary indicating which behavior a node or edge belongs to. Neither data-driven nor rule-driven methods can separate behaviors accurately, which further aggravates false positives and false negatives.
- High-level Behavioral Modeling (C2): Hyperedges need to integrate spatiotemporal features of multi-entity interactions, such as timestamps, process attributes, and network parameters, but existing methods cannot generalize complex behavioral patterns in dynamic contexts. Traditional rule matching and shallow machine learning struggle to capture deep correlations in cross-modal data.
- Heterogeneous Data Fusion (C3): Kernel logs contain multiple entity types (processes, files, registries, network sockets, etc.) whose interactions are heterogeneous and temporally dependent. Targeted approaches are still needed to efficiently extract key features and map them into hyperedges.
- This paper proposes a hypergraph construction method based on overlapping community discovery. By adapting LFM to the provenance graph to discover behavioral subgraphs, the method integrates multi-dimensional features into weighted hyperedges and models high-order multi-entity behaviors, overcoming the limitation of pairwise (binary) relationships in ordinary graphs.
- This paper improves the overlapping community detection algorithm to mine behavioral communities more efficiently. Because overlapping community discovery is sensitive to its starting point, the seed node is selected by an abnormal node detection method, which avoids the high overhead and the uncertainty in behavioral mining caused by random seeding.
- This paper proposes a HyperGAT model based on a dual attention mechanism. Node-level and edge-level attention focus on key entities and discriminative hyperedges, improving the accuracy of feature aggregation and attack behavior detection; with graph-level classification, the false positive rate is reduced to 0.12%.
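The idea behind the first contribution can be illustrated with a small sketch: each overlapping community becomes one hyperedge, and the hypergraph is stored as a node-by-hyperedge incidence matrix. The entity names, the community contents, and the helper `build_incidence` are hypothetical and chosen only for illustration; the hyperedge weighting used in the paper is not reproduced here.

```python
from typing import Dict, List, Set, Tuple


def build_incidence(
    communities: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (nodes, hyperedges, H) with H[i][j] = 1 if node i is in hyperedge j."""
    hyperedges = sorted(communities)
    nodes = sorted(set().union(*communities.values()))
    index = {name: i for i, name in enumerate(nodes)}
    H = [[0] * len(hyperedges) for _ in nodes]
    for j, edge in enumerate(hyperedges):
        for name in communities[edge]:
            H[index[name]][j] = 1
    return nodes, hyperedges, H


# Hypothetical communities; "powershell.exe" belongs to both, so they overlap.
communities = {
    "c0": {"winword.exe", "powershell.exe", "payload.ps1"},
    "c1": {"powershell.exe", "10.0.0.5:443"},
}
nodes, hyperedges, H = build_incidence(communities)
```

A node that appears in several communities simply has several ones in its row, which is how overlap is preserved where an ordinary graph would need many pairwise edges.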
2. Related Works
2.1. Behavioral Community Mining Techniques
2.2. Graph-Based APT Attack Detection Methods
3. Preliminary
3.1. Provenance Graph
3.2. Hypergraph
3.3. Threat Model
4. System Overview
5. Provenance Graph Construction
6. Abnormal Behavioral Community Detection
6.1. Seed Node Selection
6.2. Overlapping Behavioral Community Detection
- : if and only if the time difference between u and v is less than the threshold t, where the threshold t can be customized by the user.
- : if and only if u and v have the same process id.
- : if and only if the destination IP of u and v are the same.
- : if and only if the ports of u and v are the same.
- to : if and only if the corresponding characteristics of u and v in the socket logs are the same.
- to : if and only if u and v are sibling processes or parent-child processes.
- : if and only if u and v access the same object.
- : if and only if u and v have the same process name.
- and : if and only if the outbound/inbound network requests are related to DNS queries.
- to : if and only if DNS query behaviors are related to web browsing behaviors.
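Several of the correlation conditions listed above can be sketched as simple predicates over a pair of log events. This is a minimal illustration, not the authors' implementation: the `Event` fields, the rule labels, and the default threshold are assumptions, and the DNS-related and socket-log conditions are omitted because they depend on log details not given here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    # Field names are hypothetical; real log schemas differ.
    timestamp: float
    pid: int
    process_name: str
    parent_pid: Optional[int] = None
    dst_ip: Optional[str] = None
    port: Optional[int] = None
    obj: Optional[str] = None  # accessed file or registry path


def matched_rules(u: Event, v: Event, t: float = 5.0) -> List[str]:
    """Return the labels of the correlation conditions that hold between u and v."""
    rules = []
    if abs(u.timestamp - v.timestamp) < t:  # user-defined time threshold t
        rules.append("time")
    if u.pid == v.pid:
        rules.append("same_pid")
    if u.dst_ip is not None and u.dst_ip == v.dst_ip:
        rules.append("same_dst_ip")
    if u.port is not None and u.port == v.port:
        rules.append("same_port")
    siblings = u.parent_pid is not None and u.parent_pid == v.parent_pid
    parent_child = u.parent_pid == v.pid or v.parent_pid == u.pid
    if siblings or parent_child:
        rules.append("related_process")
    if u.obj is not None and u.obj == v.obj:
        rules.append("same_object")
    if u.process_name == v.process_name:
        rules.append("same_process_name")
    return rules


u = Event(100.0, pid=10, process_name="powershell.exe", parent_pid=4, obj="C:\\tmp\\a.txt")
v = Event(102.0, pid=11, process_name="cmd.exe", parent_pid=10, obj="C:\\tmp\\a.txt")
print(matched_rules(u, v))
```

Returning the list of satisfied conditions, rather than a single boolean, keeps open the choice of how many conditions must hold before two events are connected.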
Algorithm 1. Overlapping Behavioral Community Detection Algorithm
Input:
Output:
7. Hypergraph Construction
8. Attack Behaviors Detection
8.1. HyperGAT
8.2. Attention in HyperGAT
8.3. Node-Level Attention
8.4. Edge-Level Attention
8.5. Behavior Detection
9. Evaluation
9.1. Dataset
9.2. Abnormal Nodes Detection
9.3. Ablation Experiment
9.4. Runtime Overhead
10. Conclusions and Future Work
10.1. Conclusions
10.2. Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Node Pair | Edge Type |
---|---|
process - process | create |
process - file | create, read, write, close, delete |
process - registry | open, query, enumerate, modify, close, delete |
process - socket | send, receive, connect, accept, disconnect, reconnect |
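The node-pair and edge-type table above can be read as a schema for the provenance graph. The sketch below, under that assumption, stores events as typed edges and rejects operations the table does not list for a given node pair. The class `ProvenanceGraph`, its method names, and the sample entities are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Allowed operations per (subject type, object type), copied from the table above.
ALLOWED = {
    ("process", "process"): {"create"},
    ("process", "file"): {"create", "read", "write", "close", "delete"},
    ("process", "registry"): {"open", "query", "enumerate", "modify", "close", "delete"},
    ("process", "socket"): {"send", "receive", "connect", "accept", "disconnect", "reconnect"},
}

Node = Tuple[str, str]  # (entity type, identifier)


class ProvenanceGraph:
    """Directed multigraph: each event is an edge (operation, target, timestamp)."""

    def __init__(self) -> None:
        self.adj: DefaultDict[Node, List[Tuple[str, Node, float]]] = defaultdict(list)

    def add_event(self, src: Node, op: str, dst: Node, ts: float) -> None:
        if op not in ALLOWED.get((src[0], dst[0]), set()):
            raise ValueError(f"{op!r} is not defined for {src[0]} - {dst[0]}")
        self.adj[src].append((op, dst, ts))


g = ProvenanceGraph()
proc = ("process", "powershell.exe:4120")
g.add_event(proc, "read", ("file", "C:\\Users\\a\\doc.txt"), 1.0)
g.add_event(proc, "connect", ("socket", "10.0.0.5:443"), 2.5)
```

Keeping the timestamp on every edge preserves the temporal order that later stages, such as time-threshold correlation, rely on.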
Dimensions | Features |
---|---|
Variable and Formula | Definition |
---|---|
C | Represents a community containing multiple nodes. |
 | Community C’s internal fitness, initialized to 0 and incremented as nodes are added. |
 | Community C’s external fitness, initialized to 0. |
 | Internal fitness after node j joins C; for the first node addition, . |
 | External fitness after node j joins C. |
 | Node j’s internal fitness within C, defined as the sum of products of weights of all edges connected to j in C and j’s node weight. |
 | Node j’s external fitness, defined as the sum of products of weights of all edges connected to j outside C and j’s node weight. |
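The node-level definitions in the table above can be sketched directly: node j's internal (external) fitness sums, over its edges into (out of) the community, the product of the edge weight and j's node weight. The function names, the dictionary-based graph representation, and the sample weights are assumptions. `lfm_fitness` is the fitness ratio from the original LFM algorithm; whether the paper combines the two quantities in exactly this way is also an assumption.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def node_fitness(
    j: str,
    community: Set[str],
    edge_w: Dict[Edge, float],
    node_w: Dict[str, float],
) -> Tuple[float, float]:
    """Return (internal, external) fitness of node j relative to the community."""
    internal = external = 0.0
    for (a, b), w in edge_w.items():
        if j not in (a, b):
            continue  # edge does not touch j
        other = b if a == j else a
        if other in community:
            internal += w * node_w[j]
        else:
            external += w * node_w[j]
    return internal, external


def lfm_fitness(k_in: float, k_out: float, alpha: float = 1.0) -> float:
    """Standard LFM community fitness: k_in / (k_in + k_out) ** alpha."""
    total = k_in + k_out
    return k_in / total ** alpha if total > 0 else 0.0


# Hypothetical weighted graph: node "a" is a candidate for community {"b", "c"}.
edge_w = {("a", "b"): 2.0, ("a", "c"): 1.0, ("a", "d"): 0.5}
node_w = {"a": 2.0, "b": 1.0, "c": 1.0, "d": 1.0}
internal, external = node_fitness("a", {"b", "c"}, edge_w, node_w)
```

In LFM-style expansion, a candidate node is typically added when doing so raises the community's fitness, so these two sums are the quantities compared at each step.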
Dataset | Malicious | Entities | Edges | Size (GB) |
---|---|---|---|---|
Unicorn Wget | | 265424 | 975226 | 64 |
Unicorn Wget | Y | 257156 | 949887 | 12.6 |
DARPA E3 Theia | | 1598647 | 2874821 | 17.91 |
DARPA E3 Theia | Y | 25319 | | |
DARPA E3 Cadets | | 1614189 | 3303264 | 18.38 |
DARPA E3 Cadets | Y | 12846 | | |
Method | Accuracy | Recall | F1-Score | FPR | AUC |
---|---|---|---|---|---|
StreamSpot | 0.764 | 0.891 | 0.871 | 0.216 | 0.797 |
ThreaTrace | 0.875 | 0.874 | 0.885 | 0.152 | 0.863 |
Ours | 0.903 | 0.915 | 0.898 | 0.092 | 0.914 |
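For reference, the threshold-based metrics reported in the table above follow from confusion-matrix counts using the standard definitions. The sketch below uses made-up counts and a hypothetical helper name; AUC is omitted because it requires per-sample scores rather than counts.

```python
from typing import Dict


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, recall, precision, F1, and false positive rate."""
    total = tp + fp + tn + fn
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical counts, not taken from the paper's experiments.
m = detection_metrics(tp=90, fp=5, tn=95, fn=10)
```

FPR depends only on the benign samples (fp and tn), which is why a detector can combine a high F1-score with a comparatively poor FPR when benign data dominates.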
Dataset | Model | Accuracy | F1-Score | FPR | AUC |
---|---|---|---|---|---|
Theia | GCN | 0.7520 ± 0.0053 | 0.8050 ± 0.0101 | 0.2280 ± 0.0048 | 0.7580 ± 0.0082 |
Theia | GAT | 0.7890 ± 0.0042 | 0.8320 ± 0.0075 | 0.2050 ± 0.0053 | 0.7810 ± 0.0025 |
Theia | w/o attention | 0.7080 ± 0.0016 | 0.7790 ± 0.0022 | 0.2314 ± 0.0042 | 0.7153 ± 0.0018 |
Theia | w/o behavior semantic | 0.6684 ± 0.0042 | 0.7151 ± 0.0031 | 0.1557 ± 0.0022 | 0.6580 ± 0.0032 |
Theia | HyperGAT (1-layer) | 0.9320 ± 0.0010 | 0.9776 ± 0.0006 | 0.0082 ± 0.0026 | 0.9238 ± 0.0017 |
Theia | HyperGAT | 0.9773 ± 0.0018 | 0.9835 ± 0.0005 | 0.0012 ± 0.0010 | 0.9751 ± 0.0014 |
Cadets | GCN | 0.7750 ± 0.0046 | 0.8420 ± 0.0082 | 0.1920 ± 0.0025 | 0.7760 ± 0.0037 |
Cadets | GAT | 0.8060 ± 0.0051 | 0.8580 ± 0.0046 | 0.1750 ± 0.0057 | 0.8020 ± 0.0092 |
Cadets | w/o attention | 0.7350 ± 0.0037 | 0.8456 ± 0.0021 | 0.1268 ± 0.0016 | 0.7145 ± 0.0007 |
Cadets | w/o behavior semantic | 0.6862 ± 0.0116 | 0.7465 ± 0.0015 | 0.1024 ± 0.0061 | 0.6814 ± 0.0057 |
Cadets | HyperGAT (1-layer) | 0.9056 ± 0.0083 | 0.9340 ± 0.0058 | 0.0614 ± 0.0034 | 0.9258 ± 0.0016 |
Cadets | HyperGAT | 0.9253 ± 0.0006 | 0.9637 ± 0.0023 | 0.0016 ± 0.0011 | 0.9832 ± 0.0038 |
Stage | Time Consumption (s) | Memory Usage (GB) |
---|---|---|
Overlapping Community Detection | 38 | 0.59 |
Training | 1285 | 2.71 |
Testing | 875 | 3.87 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Song, Q.; Chen, T.; Zhu, T.; Lv, M.; Qiu, X.; Zhu, Z. APT Detection via Hypergraph Attention Network with Community-Based Behavioral Mining. Appl. Sci. 2025, 15, 5872. https://doi.org/10.3390/app15115872