Using Causality-Driven Graph Representation Learning for APT Attacks Path Identification
Abstract
1. Introduction
- A dynamic causal weight assignment method based on reinforcement learning is proposed. Through Q-learning, the agent interacts with the graph environment to quantify the causal contribution of each edge, overcoming the limitations of the static association analysis used in traditional provenance (traceability) graphs.
- A causal weighting mechanism is introduced into the graph attention network (GAT) so that attention allocation favors paths of high causal importance, strengthening the model’s ability to capture the causal semantics of attacks.
- Anomaly confidence scores are integrated with attack path reconstruction: the anomaly score filters out the key abnormal elements, and timing and causal dependencies are then combined to generate interpretable attack paths that present the attack’s evolution and risk level intuitively.
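The causal weighting of GAT attention in the second contribution can be illustrated for a single attention head. The exact formulation is not given here, so the form below is an assumption: the standard GAT logit e_ij = LeakyReLU(aᵀ[W h_i ‖ W h_j]) is biased by log w_ij, which makes the attention coefficient α_ij proportional to w_ij · exp(e_ij), where w_ij is the learned causal weight of edge (i, j). The function and argument names are hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def project(W, h):
    """Apply the shared linear projection W to a feature vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def causal_attention(i, neighbors, h, W, a, causal_w):
    """Single-head attention of node i over `neighbors`, biased by causal weights.

    Assumed form: alpha_ij is proportional to w_ij * exp(e_ij), i.e. log(w_ij)
    is added to the standard GAT logit e_ij = LeakyReLU(a . [W h_i || W h_j]).
    """
    zi = project(W, h[i])
    logits = {}
    for j in neighbors:
        e_ij = leaky_relu(sum(ak * xk for ak, xk in zip(a, zi + project(W, h[j]))))
        logits[j] = e_ij + math.log(causal_w[(i, j)])  # causal weights must be > 0
    m = max(logits.values())  # shift by the max for a numerically stable softmax
    exps = {j: math.exp(v - m) for j, v in logits.items()}
    total = sum(exps.values())
    return {j: v / total for j, v in exps.items()}


h = {0: [1.0, 0.0], 1: [0.5, 0.5], 2: [0.5, 0.5]}  # neighbors 1 and 2 look identical
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.1, 0.1, 0.1, 0.1]
att = causal_attention(0, [1, 2], h, W, a, {(0, 1): 0.9, (0, 2): 0.1})
# att[1] is about 0.9 and att[2] about 0.1: the causal weights alone decide the split
```

With equal causal weights the log terms cancel in the softmax and the sketch reduces to ordinary GAT attention, which is one reason an additive log-bias is a convenient way to inject edge weights.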
2. Materials and Methods
2.1. Construction of the Provenance Graph
2.2. Reinforcement Learning and Causality
2.2.1. Edge Feature Extractor
2.2.2. Causal Weight Predictor
2.2.3. Reward Estimator
2.2.4. Experience Replay Buffer
Algorithm 1: Causal weight learning
Input: G = (V, E)
Output: G′ = (V, E, W)
1  Initialize Q, Q_target, reward function R, replay buffer D, optimizer
2  for e ← 1 to E do
     for each edge (u, v) ∈ E do
       a. extract state s
       b. choose action a:
            a = argmax_a Q(s, a) with probability 1 − ε
            a random action with probability ε
       c. r = R(s, a)
       d. get next state s′
       e. save (edge_id, s, a, r, s′, done) to D
       f. sample (s_j, a_j, r_j, s′_j, done_j) from D
       g. target value y_j = r_j + γ(1 − done_j) · max_a′ Q_target(s′_j, a′)
       h. compute the loss between y_j and Q(s_j, a_j) and update Q with the optimizer
       i. update Q_target = Q
3  end for
4  return G′
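Algorithm 1 can be sketched in a few dozen lines of Python. This is a minimal tabular stand-in, not the authors' implementation: the Q function is a lookup table keyed by a hypothetical discrete edge state rather than the learned predictor of Section 2.2.2, the five weight levels in `ACTIONS` and the reward function are invented for illustration, and Q_target is synchronized every `sync_every` steps, a common DQN practice assumed here. For a table, the update in step h is exactly a gradient step on the squared temporal-difference error.

```python
import random
from collections import defaultdict, deque

ACTIONS = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical discrete causal-weight levels


def learn_causal_weights(edges, reward_fn, episodes=200, gamma=0.9, alpha=0.1,
                         eps=0.1, buffer_size=1000, batch_size=16, sync_every=50,
                         seed=0):
    """Tabular stand-in for Algorithm 1.

    edges: list of (edge_id, state) in temporal order; each state is a hashable
    edge description. Returns a dict edge_id -> causal weight.
    """
    rng = random.Random(seed)
    n = len(ACTIONS)
    q = defaultdict(lambda: [0.0] * n)
    q_target = defaultdict(lambda: [0.0] * n)
    buffer = deque(maxlen=buffer_size)  # experience replay buffer D

    def greedy(s):
        return max(range(n), key=lambda k: q[s][k])

    step = 0
    for _ in range(episodes):
        for idx, (edge_id, s) in enumerate(edges):
            # b. epsilon-greedy action selection
            a = rng.randrange(n) if rng.random() < eps else greedy(s)
            r = reward_fn(s, ACTIONS[a])                      # c. reward
            done = idx == len(edges) - 1
            s_next = s if done else edges[idx + 1][1]         # d. next state
            buffer.append((edge_id, s, a, r, s_next, done))   # e. store transition
            batch = rng.sample(list(buffer), min(batch_size, len(buffer)))  # f.
            for _, sj, aj, rj, sj_next, done_j in batch:
                # g. target value computed from the frozen target table
                y = rj if done_j else rj + gamma * max(q_target[sj_next])
                # h. gradient step on 0.5 * (y - Q(s_j, a_j)) ** 2
                q[sj][aj] += alpha * (y - q[sj][aj])
            step += 1
            if step % sync_every == 0:                        # i. periodic sync
                for key, values in list(q.items()):
                    q_target[key] = list(values)
    return {edge_id: ACTIONS[greedy(s)] for edge_id, s in edges}


RELEVANCE = {"exec": 1.0, "read": 0.25, "write": 0.5}  # hypothetical ground truth


def reward(state, weight):
    """Hypothetical reward: penalize distance from the edge type's relevance."""
    return -(weight - RELEVANCE[state]) ** 2


edges = [("e1", "exec"), ("e2", "read"), ("e3", "write"), ("e4", "exec")]
print(learn_causal_weights(edges, reward))
# {'e1': 1.0, 'e2': 0.25, 'e3': 0.5, 'e4': 1.0}
```

Because every edge type has a weight level with zero penalty under this toy reward, the learned weights settle on the hypothetical relevance values; in the paper the reward comes from the reward estimator of Section 2.2.3, whose definition is not reproduced in this section.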
2.3. Masked Graph Autoencoder
2.3.1. Causal Weight Application and the Masking Mechanism
2.3.2. Encoder and Decoder
2.4. Anomaly Detection and Building Attack Paths
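The path-building step named in this section's title, as summarized in the third contribution (filter by anomaly score, then chain the surviving events by time and causal dependency), can be sketched as follows. Everything here is a hypothetical simplification rather than the authors' procedure: the edge schema, both thresholds, the rule that an edge survives only if both endpoints are anomalous and its causal weight is high enough, the requirement that timestamps never decrease along a path, and the use of the highest node score on a path as its risk level.

```python
from collections import defaultdict


def build_attack_paths(edges, scores, score_thr=0.5, weight_thr=0.5):
    """Return (path, risk) pairs reconstructed from anomalous, causally weighted edges.

    edges:  list of (src, dst, timestamp, causal_weight)   (hypothetical schema)
    scores: node -> anomaly confidence score in [0, 1]
    """
    # 1. Keep edges whose endpoints are both anomalous and whose causal weight is high.
    kept = [(s, d, t) for s, d, t, w in edges
            if scores.get(s, 0.0) >= score_thr and scores.get(d, 0.0) >= score_thr
            and w >= weight_thr]
    succ = defaultdict(list)
    has_parent = set()
    for s, d, t in kept:
        succ[s].append((d, t))
        has_parent.add(d)
    # 2. Start from anomalous sources with no kept incoming edge (entry points).
    roots = sorted({s for s, _, _ in kept if s not in has_parent})
    paths = []

    def extend(node, last_t, path):
        # 3. Follow only edges that do not go back in time and do not revisit a node.
        nxt = sorted(((d, t) for d, t in succ[node] if t >= last_t and d not in path),
                     key=lambda dt: dt[1])
        if not nxt:
            paths.append(path)
            return
        for d, t in nxt:
            extend(d, t, path + [d])

    for r in roots:
        extend(r, float("-inf"), [r])
    # 4. Hypothetical risk level: the highest anomaly score on the path.
    return [(p, max(scores[n] for n in p)) for p in paths]


edges = [("firefox", "shell", 1, 0.9), ("shell", "payload", 2, 0.8),
         ("payload", "c2_ip", 3, 0.95), ("shell", "log_file", 4, 0.1),
         ("cron", "payload", 0, 0.9)]
scores = {"firefox": 0.9, "shell": 0.95, "payload": 0.99, "c2_ip": 0.8,
          "log_file": 0.2, "cron": 0.1}
print(build_attack_paths(edges, scores))
# [(['firefox', 'shell', 'payload', 'c2_ip'], 0.99)]
```

In the example, the write to `log_file` and the `cron` edge are discarded because one endpoint is not anomalous, leaving a single time-ordered chain. A component in which every node has a kept incoming edge (a pure cycle) yields no root in this sketch and would need separate handling.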
3. Evaluation and Results
- RQ1: Detection Performance
- RQ2: Parameter Sensitivity
4. Discussion
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Wang, Y.; Liu, H.; Li, Z.; Su, Z.; Li, J. Combating advanced persistent threats: Challenges and solutions. IEEE Netw. 2024, 38, 324–333.
- Kareem, K. A comprehensive analysis of pegasus spyware and its implications for digital privacy and security. arXiv 2024, arXiv:2404.19677.
- Fraunhofer FKIE. SolarMarker (Win32)—Threat Summary. Malpedia. 30 May 2024. Available online: https://malpedia.caad.fkie.fraunhofer.de/details/win.solarmarker (accessed on 9 August 2025).
- McAfee. CLOP Ransomware Exploits MOVEit Software. 2023. Available online: https://www.mcafee.com/blogs/other-blogs/mcafee-labs/clop-ransomware-exploits-moveit-software/ (accessed on 9 August 2025).
- Akbarzadeh, A.; Erdodi, L.; Houmb, S.H.; Soltvedt, T.G. Two-stage advanced persistent threat (APT) attack on an IEC 61850 power grid substation. Int. J. Inf. Secur. 2024, 23, 2739–2758.
- Mahmoud, M.; Mannan, M.; Youssef, A. APTHunter: Detecting advanced persistent threats in early stages. Digit. Threat. Res. Pract. 2023, 4, 1–31.
- Cheng, W.; Yuan, Q.; Zhu, T.; Chen, T.; Ying, J.; Zheng, A.; Ma, M.; Xiong, C.; Lv, M.; Chen, Y. TAGAPT: Towards Automatic Generation of APT Samples with Provenance-level Granularity. IEEE Trans. Inf. Forensics Secur. 2025, 20, 4137–4151.
- Lee, J.S.; Fan, Y.Y.; Cheng, C.H.; Chew, C.-J.; Kuo, C.-W. ML-based intrusion detection system for precise APT cyber-clustering. Comput. Secur. 2025, 149, 104209.
- Xuan, C.D.; Nguyen, T.T. A novel approach for APT attack detection based on an advanced computing. Sci. Rep. 2024, 14, 22223.
- Yue, H.; Li, T.; Wu, D.; Zhang, R.; Yang, Z. Detecting APT attacks using an attack intent-driven and sequence-based learning approach. Comput. Secur. 2024, 140, 103748.
- Liu, H.; Wang, Y.; Su, Z.; Wang, Z.; Pan, Y.; Lit, R. TRACEGADGET: Detecting and Tracing Network Level Attack Through Federal Provenance Graph. In Proceedings of the ICC 2024—IEEE International Conference on Communications, Denver, CO, USA, 9–13 June 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 2713–2718.
- Xu, F.; Zhao, Q.; Liu, X.; Wang, N.; Gao, M.; Wen, X.; Zhang, D. Advanced persistent threat detection via mining long-term features in provenance graphs. Front. Comput. Sci. 2025, 19, 1910809.
- Li, T.; Liu, X.; Qiao, W.; Zhu, X.; Shen, Y.; Ma, J. T-trace: Constructing the APTs provenance graphs through multiple syslogs correlation. IEEE Trans. Dependable Secur. Comput. 2023, 21, 1179–1195.
- Weng, Z.; Zhang, W.; Zhu, T.; Dou, Z.; Sun, H.; Ye, Z.; Tian, Y. RT-APT: A real-time APT anomaly detection method for large-scale provenance graph. J. Netw. Comput. Appl. 2025, 233, 104036.
- Akbar, K.A.; Wang, Y.; Ayoade, G.; Gao, Y.; Singhal, A.; Khan, L.; Thuraisingham, B.; Jee, K. Advanced persistent threat detection using data provenance and metric learning. IEEE Trans. Dependable Secur. Comput. 2022, 20, 3957–3969.
- Yang, F.; Xu, J.; Xiong, C.; Li, Z.; Zhang, K. PROGRAPHER: An anomaly detection system based on provenance graph embedding. In Proceedings of the 32nd USENIX Security Symposium (USENIX Security 23), Anaheim, CA, USA, 9–11 August 2023; pp. 4355–4372.
- Goyal, A.; Han, X.; Wang, G.; Bates, A. Sometimes, you aren’t what you do: Mimicry attacks against provenance graph host intrusion detection systems. In Proceedings of the 30th Network and Distributed System Security Symposium, San Diego, CA, USA, 27 February 2023.
- Zhu, T.; Yu, J.; Xiong, C.; Cheng, W.; Yuan, Q.; Ying, J.; Chen, T.; Zhang, J.; Lv, M.; Chen, Y.; et al. Aptshield: A stable, efficient and real-time apt detection system for linux hosts. IEEE Trans. Dependable Secur. Comput. 2023, 20, 5247–5264.
- Li, S.; Dong, F.; Xiao, X.; Wang, H.; Shao, F.; Chen, J.; Guo, Y.; Chen, X.; Li, D. Nodlink: An online system for fine-grained apt attack detection and investigation. arXiv 2023, arXiv:2311.02331.
- Wang, S.; Wang, Z.; Zhou, T.; Sun, H.; Yin, X.; Han, D.; Zhang, H.; Shi, X.; Yang, J. Threatrace: Detecting and tracing host-based threats in node level through provenance graph learning. IEEE Trans. Inf. Forensics Secur. 2022, 17, 3972–3987.
- Wang, Q.; Hassan, W.U.; Li, D.; Jee, K.; Yu, X.; Zou, K.; Chen, H. You are what you do: Hunting stealthy malware via data provenance analysis. In Proceedings of the NDSS, San Diego, CA, USA, 23–26 February 2020.
- Lv, M.; Gao, H.Z.; Qiu, X.; Chen, T.; Zhu, T.; Chen, J.; Ji, S. TREC: APT tactic/technique recognition via few-shot provenance subgraph learning. In Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, Salt Lake City, UT, USA, 14–18 October 2024; pp. 139–152.
- Ren, J.; Geng, R. Provenance-based APT campaigns detection via masked graph representation learning. Comput. Secur. 2025, 148, 104159.
- Cheng, Z.; Lv, Q.; Liang, J.; Wang, Y.; Sun, D.; Pasquier, T.; Han, X. Kairos: Practical intrusion detection and investigation using whole-system provenance. In Proceedings of the 2024 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 3533–3551.
- Jia, Z.; Xiong, Y.; Nan, Y.; Zhang, Y.; Zhao, J.; Wen, M. MAGIC: Detecting advanced persistent threats via masked graph representation learning. In Proceedings of the 33rd USENIX Security Symposium (USENIX Security 24), Philadelphia, PA, USA, 14–16 August 2024; pp. 5197–5214.
- Aly, A.; Iqbal, S.; Youssef, A.; Mansour, E. MEGR-APT: A Memory-Efficient APT Hunting System Based on Attack Representation Learning. IEEE Trans. Inf. Forensics Secur. 2024, 19, 5257–5271.
- Rehman, M.U.; Ahmadi, H.; Hassan, W.U. Flash: A comprehensive approach to intrusion detection via provenance graph representation learning. In Proceedings of the 2024 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 3552–3570.
Dataset | Nodes | Edges | Attack Nodes | Attack Edges |
---|---|---|---|---|
THEIA-E3 | 327,408 | 597,282 | 18,904 | 25,789 |
CADETS-E3 | 357,173 | 840,299 | 12,846 | 177,849 |
TRACE-E3 | 3,288,676 | 4,080,457 | 68,086 | 457,011 |
Parameter | Default |
---|---|
lr | 0.001 |
Epochs | 500 |
Batch_size | 8 |
Activation | PReLU |
Optimizer | Adam |
Loss_fn | SCE |
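The Loss_fn entry most likely refers to the scaled cosine error (SCE) introduced for masked graph autoencoders by GraphMAE and also used by MAGIC. Assuming that reading, the loss over the set Ṽ of masked nodes is L = (1/|Ṽ|) Σ_{v∈Ṽ} (1 − x_vᵀz_v / (‖x_v‖·‖z_v‖))^γ with γ ≥ 1, where x_v is a node's original feature vector and z_v its reconstruction. The exponent γ is not listed in the table, so `gamma=2.0` below is an assumed value, and the function name is hypothetical.

```python
import math


def scaled_cosine_error(x, z, gamma=2.0):
    """Mean of (1 - cos(x_v, z_v)) ** gamma over the masked nodes.

    x: original features of the masked nodes; z: their reconstructions.
    gamma >= 1 (assumed value here) down-weights nodes that are already
    reconstructed well, focusing training on the hard ones.
    """
    total = 0.0
    for xv, zv in zip(x, z):
        dot = sum(a * b for a, b in zip(xv, zv))
        norm = math.sqrt(sum(a * a for a in xv)) * math.sqrt(sum(b * b for b in zv))
        cos = dot / max(norm, 1e-12)  # guard against zero vectors
        total += max(0.0, 1.0 - cos) ** gamma  # clamp tiny negative rounding error
    return total / len(x)


# One perfectly reconstructed node and one orthogonal reconstruction:
loss = scaled_cosine_error([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
# loss == 0.5
```

Because the error depends only on the angle between x_v and z_v, rescaling a reconstruction does not change the loss, which is one motivation given for SCE over mean squared error in masked feature reconstruction.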
Dataset | TN | FN | TP | FP | AUC | F1 | Precision | Recall |
---|---|---|---|---|---|---|---|---|
THEIA-E3 | 308,061 | 18 | 18,886 | 443 | 99.83% | 98.79% | 97.71% | 99.90% |
CADETS-E3 | 344,091 | 31 | 12,815 | 236 | 98.06% | 98.97% | 98.19% | 99.76% |
TRACE-E3 | 616,010 | 25 | 68,061 | 11 | 99.99% | 99.97% | 99.98% | 99.96% |
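The precision, recall and F1 columns follow directly from the confusion counts (precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = 2PR/(P+R)), so they can be checked mechanically. The helper below is an illustrative sketch with a hypothetical name.

```python
def detection_metrics(tp, fp, tn, fn):
    """Node-level precision, recall, F1 and accuracy from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


theia = detection_metrics(tp=18_886, fp=443, tn=308_061, fn=18)
print({k: round(100 * v, 2) for k, v in theia.items()})
# precision 97.71, recall 99.9, f1 98.79, matching the THEIA-E3 row
```

Run against the THEIA-E3 and TRACE-E3 rows, it reproduces the reported percentages. For CADETS-E3, the reported 98.19% precision, 99.76% recall and 98.97% F1 are obtained with TP = 12,815 (the 12,846 attack nodes minus the 31 false negatives); a TP of 128,151 would instead give about 99.82% precision.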
Dataset | System | Precision | Recall | Accuracy | F1 |
---|---|---|---|---|---|
THEIA-E3 | THREATRACE | 87% | 99% | 99% | 93% |
THEIA-E3 | FLASH | 93% | 99% | 99% | 96% |
THEIA-E3 | CAGE | 97% | 99% | 99% | 98% |
CADETS-E3 | THREATRACE | 90% | 99% | 98% | 95% |
CADETS-E3 | FLASH | 95% | 99% | 99% | 97% |
CADETS-E3 | CAGE | 98% | 99% | 98% | 99% |
TRACE-E3 | THREATRACE | 72% | 99% | 99% | 83% |
TRACE-E3 | FLASH | 95% | 99% | 99% | 97% |
TRACE-E3 | CAGE | 99% | 99% | 99% | 99% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cheng, X.; Kuang, M.; Yang, H. Using Causality-Driven Graph Representation Learning for APT Attacks Path Identification. Symmetry 2025, 17, 1373. https://doi.org/10.3390/sym17091373
Cheng X, Kuang M, Yang H. Using Causality-Driven Graph Representation Learning for APT Attacks Path Identification. Symmetry. 2025; 17(9):1373. https://doi.org/10.3390/sym17091373
Chicago/Turabian Style: Cheng, Xiang, Miaomiao Kuang, and Hongyu Yang. 2025. "Using Causality-Driven Graph Representation Learning for APT Attacks Path Identification" Symmetry 17, no. 9: 1373. https://doi.org/10.3390/sym17091373
APA Style: Cheng, X., Kuang, M., & Yang, H. (2025). Using Causality-Driven Graph Representation Learning for APT Attacks Path Identification. Symmetry, 17(9), 1373. https://doi.org/10.3390/sym17091373