SGMNet: A Supervised Seeded Graph-Matching Method for Cyber Threat Hunting
Abstract
1. Introduction
- Large-Scale Graph: Systems generate millions of events and activities every day, causing log volumes to increase exponentially [7]. Recent research findings [8] indicate that detecting a cyber attack takes more than 200 days on average. As a result, provenance graphs built from these extensive system logs become very large, which degrades the efficiency of threat hunting.
- High False-Positive Rate: Provenance graphs generated from system logs typically contain substantial redundant information. This redundancy can interfere with subgraph matching and increase false positives during threat hunting.
- We propose SGMNet, a novel threat-hunting framework tailored for large-scale provenance graphs. By leveraging indicators of compromise (IOCs) as anchor points, our method extracts compact and behavior-relevant subgraphs, significantly reducing the search space and improving runtime performance.
- We integrate two key mechanisms from seeded graph-matching theory, witness-based neighborhood consistency and percolation-based seed expansion, into a unified, learnable framework. Unlike prior work that relies on static, rule-based expansion (e.g., Poirot), our method learns adaptive expansion strategies from historical attack paths to enhance matching robustness and reduce false positives.
- We conduct comprehensive experiments on four benchmark datasets from the DARPA TC project. The results show that SGMNet improves accuracy by up to 3.1% and reduces the false-positive rate to 0.00% in several scenarios, outperforming baselines such as DeepHunter, SimGNN, ProvG-Searcher, and Poirot.
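The idea of using IOCs as anchor points to shrink the search space can be illustrated with a small sketch. It is not the paper's graph-generation algorithm: it assumes, purely for illustration, that "anchoring" means keeping the region within a fixed number of hops of any IOC node. The function name `extract_ioc_subgraph`, the `max_hops` parameter, and the toy edge list are all hypothetical.

```python
from collections import deque


def extract_ioc_subgraph(edges, ioc_nodes, max_hops=2):
    """Return the edges whose endpoints lie within max_hops of an IOC node.

    Assumption: edges are treated as undirected so that both the causes and
    the effects of an IOC are followed.
    """
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set()).add(src)

    # Multi-source BFS started from every IOC node that occurs in the graph.
    distance = {node: 0 for node in ioc_nodes if node in adjacency}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop budget
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)

    # Keep only edges that fall entirely inside the anchored region.
    return [(s, d) for s, d in edges if s in distance and d in distance]


# Toy provenance log of (subject, object) pairs; all names are made up.
toy_edges = [
    ("nginx", "/tmp/payload"),
    ("/tmp/payload", "sh"),
    ("sh", "10.0.0.5:443"),
    ("cron", "/var/log/syslog"),
    ("sshd", "/etc/passwd"),
]
subgraph = extract_ioc_subgraph(toy_edges, ioc_nodes={"/tmp/payload"}, max_hops=2)
print(subgraph)
```

In this toy log only the three edges connected to the anchored file survive; the unrelated `cron` and `sshd` activity is dropped before any matching takes place.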
2. Related Work
2.1. Graph-Based Modeling for Cybersecurity
2.2. Bigraph Evolution and Architectural Consistency
2.3. Seeded Matching and Threat-Hunting Models
3. System Design and Methodology
3.1. System Overview
3.2. Problem Formulation
3.3. Key Concepts
4. System Components
4.1. Graph Generation
Algorithm 1: POI-based graph abstraction method
4.2. Seeded Graph Model Design
Algorithm 2: Pairwise convolutional propagation and feature refinement
- (1) The propagation operator is constructed via a Kronecker-type product over symmetric sparsified graphs;
- (2) The seed set is symmetric, i.e., if , then for all , it holds that .
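A Kronecker-type propagation operator over node pairs can be applied without ever forming the Kronecker product, using the standard identity that, with row-major flattening, (A1 ⊗ A2) vec(X) equals vec(A1 X A2ᵀ). The sketch below checks this on a toy example. It assumes unweighted 0/1 adjacency matrices and a seed-indicator matrix X; the function names are hypothetical and this is not claimed to be the exact operator used by SGMNet.

```python
def neighbor_lists(adj):
    """Convert a 0/1 adjacency matrix into per-node neighbor lists."""
    return [[k for k, a in enumerate(row) if a] for row in adj]


def propagate_naive(adj1, adj2, x):
    """Apply the Kronecker operator entry by entry: O(n1^2 * n2^2) work."""
    n1, n2 = len(adj1), len(adj2)
    out = [[0] * n2 for _ in range(n1)]
    for i in range(n1):
        for j in range(n2):
            for k in range(n1):
                for l in range(n2):
                    out[i][j] += adj1[i][k] * adj2[j][l] * x[k][l]
    return out


def propagate_factored(nbrs1, nbrs2, x):
    """Compute A1 X A2^T in two passes that only visit existing edges."""
    n1, n2 = len(nbrs1), len(nbrs2)
    # Pass 1: Y = A1 X, summing rows of X over neighbors in the first graph.
    y = [[sum(x[k][l] for k in nbrs1[i]) for l in range(n2)] for i in range(n1)]
    # Pass 2: Z = Y A2^T, summing columns of Y over neighbors in the second graph.
    return [[sum(y[i][l] for l in nbrs2[j]) for j in range(n2)] for i in range(n1)]


# Two identical 3-node paths 0-1-2, with seed pairs (0, 0) and (2, 2).
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [[1, 0, 0], [0, 0, 0], [0, 0, 1]]
naive = propagate_naive(path, path, x)
factored = propagate_factored(neighbor_lists(path), neighbor_lists(path), x)
print(factored)
```

With a seed-indicator input, entry (i, j) of the result counts the seed pairs adjacent to both i and j, so the middle pair (1, 1) receives a count of 2 here while every other pair receives 0.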
4.3. Seeded Graph Matching
4.4. Computational Complexity Analysis
- Convolution Module: The pairwise propagation is implemented using Kronecker-like operations on adjacency matrices. Naively, this would incur complexity. However, we exploit the sparsity of and , and compute:
- Percolation Module: Seed expansion is guided by attention over seed-aligned pairs. At each step, expansion over k seed pairs with average degree d takes time.
- Fusion and MLP Layers: The fusion vector is processed by a fixed-depth MLP. Thus, each layer requires operations.
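To make the seed-expansion step more concrete, the following sketch implements classical threshold-based percolation, in which a candidate pair is accepted once enough already matched neighbor pairs vouch for it. This is a simplified stand-in: the text above describes expansion guided by learned attention, whereas this example uses a fixed `threshold`, and the function name `percolate` and the toy graphs are hypothetical.

```python
from collections import Counter


def percolate(nbrs1, nbrs2, seeds, threshold=2):
    """Grow a seed matching; a pair is accepted after `threshold` marks."""
    matched = dict(seeds)             # node in graph 1 -> node in graph 2
    used2 = set(matched.values())     # graph-2 nodes that are already taken
    marks = Counter()                 # witness count per candidate pair
    frontier = list(matched.items())  # matched pairs that have not spread yet
    while frontier:
        u, v = frontier.pop()
        # A matched pair gives one mark to each unmatched neighboring pair.
        for u2 in nbrs1[u]:
            for v2 in nbrs2[v]:
                if u2 in matched or v2 in used2:
                    continue
                marks[(u2, v2)] += 1
                if marks[(u2, v2)] >= threshold:
                    matched[u2] = v2
                    used2.add(v2)
                    frontier.append((u2, v2))
    return matched


# Two isomorphic toy graphs; node 4 / "e" hangs off a single neighbor.
nbrs1 = {0: [2], 1: [2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}
nbrs2 = {"a": ["c"], "b": ["c", "d"], "c": ["a", "b", "d"],
         "d": ["b", "c", "e"], "e": ["d"]}
result = percolate(nbrs1, nbrs2, seeds={0: "a", 1: "b"}, threshold=2)
print(result)
```

In this sketch each matched pair (u, v) touches at most deg(u) × deg(v) candidate pairs. The pendant pair (4, "e") collects only one mark and stays unmatched, which shows how the threshold trades coverage for fewer spurious matches.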
5. Evaluation
5.1. Overview
5.2. The Running Efficiency of the Graph Generation Strategy
5.3. Comparison with Other Graph-Matching Models
5.4. Parameter Selection
6. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Bardin, J.S. Cyber Warfare. In Computer and Information Security Handbook; Elsevier: Amsterdam, The Netherlands, 2025; pp. 1345–1380.
- Ajmal, A.B.; Shah, M.A.; Maple, C.; Asghar, M.N.; Islam, S.U. Offensive security: Towards proactive threat hunting via adversary emulation. IEEE Access 2021, 9, 126023–126033.
- Bui, H.T.; Aboutorab, H.; Mahboubi, A.; Gao, Y.; Sultan, N.H.; Chauhan, A.; Parvez, M.Z.; Bewong, M.; Islam, R.; Islam, Z.; et al. Agriculture 4.0 and beyond: Evaluating cyber threat intelligence sources and techniques in smart farming ecosystems. Comput. Secur. 2024, 140, 103754.
- Gao, P.; Shao, F.; Liu, X.; Xiao, X.; Qin, Z.; Xu, F.; Mittal, P.; Kulkarni, S.R.; Song, D. Enabling efficient cyber threat hunting with cyber threat intelligence. In Proceedings of the 2021 IEEE 37th International Conference on Data Engineering (ICDE), Chania, Greece, 19–22 April 2021; pp. 193–204.
- Milajerdi, S.M.; Gjomemo, R.; Eshete, B.; Sekar, R.; Venkatakrishnan, V. HOLMES: Real-Time APT Detection through Correlation of Suspicious Information Flows. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2019; pp. 1137–1152.
- Xu, Z.; Fang, P.; Liu, C.; Xiao, X.; Wen, Y.; Meng, D. DEPCOMM: Graph Summarization on System Audit Logs for Attack Investigation. In Proceedings of the 2022 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 23–25 May 2022; pp. 540–557.
- Zhu, J.; He, S.; Liu, J.; He, P.; Xie, Q.; Zheng, Z.; Lyu, M.R. Tools and Benchmarks for Automated Log Parsing. In Proceedings of the 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), Montreal, QC, Canada, 25–31 May 2019; pp. 121–130.
- Saraswat, A.; Tiwari, G. United Nations and Beyond: Legal Strategies for Defending Critical Energy Infrastructure Against Cyber Attacks. In Cybercrime Unveiled: Technologies for Analysing Legal Complexity; Springer: Berlin/Heidelberg, Germany, 2025; pp. 291–307.
- Wei, R.; Cai, L.; Zhao, L.; Yu, A.; Meng, D. Deephunter: A graph neural network based approach for robust cyber threat hunting. In Proceedings of the Security and Privacy in Communication Networks: 17th EAI International Conference, SecureComm 2021, Virtual Event, 6–9 September 2021; Proceedings, Part I 17; Springer: Berlin/Heidelberg, Germany, 2021; pp. 3–24.
- Altinisik, E.; Deniz, F.; Sencar, H.T. Provg-searcher: A graph representation learning approach for efficient provenance graph search. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, Copenhagen, Denmark, 26–30 November 2023; pp. 2247–2261.
- Milajerdi, S.M.; Eshete, B.; Gjomemo, R.; Venkatakrishnan, V. Poirot: Aligning attack behavior with kernel audit records for cyber threat hunting. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, UK, 11–15 November 2019; ACM: New York, NY, USA, 2019; pp. 1795–1812.
- Mukherjee, K.; Wiedemeier, J.; Wang, T.; Kim, M.; Chen, F.; Kantarcioglu, M.; Jee, K. PROVEXPLAINER: Interpretable Explanations for GNN-Based Security Models via Provenance Subgraphs. IEEE Trans. Inf. Forensics Secur. 2023.
- Zhong, M.H.; Lin, M.; Zhang, C.; Xu, Z. A Survey on Graph Neural Networks for Intrusion Detection Systems: Methods, Trends and Challenges. Comput. Secur. 2024, 141, 103821.
- Milner, R. Space and Mobility in Computation: Bigraphs and Mobility; Cambridge University Press: Cambridge, UK, 2009.
- Giese, H.; Schaefer, I. Bigraphs in Modeling Self-Adaptive Software Architectures: A Survey. In Proceedings of the International Conference on Software Engineering, Seoul, Republic of Korea, 27 June–19 July 2020; pp. 45–62.
- Marmsoler, D.; Rydeheard, D. Refinement Checking for Bigraphical Reactive Systems. J. Log. Algebr. Methods Program. 2021, 120, 100640.
- Zhang, Y.; Li, X.; Wang, J. Dynamic Evolution Method and Symmetric Consistency Analysis for Big Data-Oriented Software Architecture Based on Extended Bigraph. Symmetry 2025, 17, 626.
- Yu, L.; Xu, J.; Lin, X. SeedGNN: Graph Neural Networks for Supervised Seeded Graph Matching. arXiv 2023, arXiv:2205.13679.
- Shariatnasab, M.; Shirani, F.; Garg, S.; Erkip, E. On Graph Matching Using Generalized Seed Side-Information. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Melbourne, Australia, 12–20 July 2021; pp. 2726–2731.
- Chao, X.; Kou, G.; Peng, Y.; Herrera-Viedma, E.; Herrera, F. An efficient consensus reaching framework for large-scale social network group decision making and its application in urban resettlement. Inf. Sci. 2021, 575, 499–527.
- Dadush, D.; Milanič, M.; Tamir, T. Introduction: ACM-SIAM Symposium on Discrete Algorithms (SODA) 2022 Special Issue. ACM Trans. Algorithms 2024, 20, 1–2.
- Xu, Q.; Wang, S.; Wei, J.; Jiang, B.; Tao, Z.; Luo, B. Dynamic semantic-geometric guidance and structure transfer network for cross-scene hyperspectral image classification. Neural Netw. 2025, 187, 107374.
- Bai, Y.; Ding, H.; Bian, S.; Chen, T.; Sun, Y.; Wang, W. SimGNN: A Neural Network Approach to Fast Graph Similarity Computation. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, Melbourne, Australia, 11–15 February 2019; ACM: New York, NY, USA, 2019; pp. 384–392.
- Asudani, D.S.; Nagwani, N.K.; Singh, P. Impact of word embedding models on text analytics in deep learning environment: A review. Artif. Intell. Rev. 2023, 56, 10345–10425.
- Guo, L.; Yan, F.; Li, T.; Yang, T.; Lu, Y. An automatic method for constructing machining process knowledge base from knowledge graph. Robot. Comput. Integr. Manuf. 2022, 73, 102222.
- Qureshi, R.J.; Ramel, J.Y.; Cardot, H. Graph based shapes representation and recognition. In Proceedings of the International Workshop on Graph-Based Representations in Pattern Recognition, Anacapri, Italy, 16–18 May 2007; Springer: Berlin/Heidelberg, Germany, 2007; pp. 49–60.
Scenario | Short Description |
---|---|
Q1+CADETS | An Nginx server was exploited and a malicious file was downloaded and executed. The attacker tried to inject into the sshd process, but failed. |
Q2+TRACE | The Firefox process was exploited and established a connection to the attacker’s operator console. The attacker downloaded and executed a malicious file. |
Q3+TRACE | A Firefox extension (a password manager) was exploited. A malicious file was downloaded and executed to connect to the C&C server. |
Q4+Theia | The attacker tried to attack THEIA using an e-mail with a malicious executable attachment. |
Q5+FiveDirections | An attacker exploited Firefox on Windows, loaded malware, performed network reconnaissance, and exfiltrated files, but lost access due to a failed exfiltration. |
Dataset | SGMNet (Ours): Total Time (s) | SGMNet (Ours): Std. Dev. (s) | Poirot: Total Time (s) | Poirot: Std. Dev. (s) |
---|---|---|---|---|
Theia | 59.1 | 3.8 | 160.0 | 11.2 |
Trace | 71.4 | 4.2 | 189.5 | 14.7 |
Cadets | 64.8 | 3.5 | 174.2 | 12.6 |
FiveDirections | 51.2 | 2.9 | 142.8 | 9.5 |
Dataset | Original Graph: Nodes | Original Graph: Edges | Abstract Graph: Nodes | Abstract Graph: Edges | Simplified Graph: Nodes | Simplified Graph: Edges |
---|---|---|---|---|---|---|
Theia | 588K | 86M | 30K | 5M | 186 | 2564 |
Trace | 127K | 171K | 76K | 4.1M | 1077 | 1714 |
Cadets | 319K | 62M | 22K | 4.2M | 547 | 1531 |
FiveDirections | 203K | 12M | 3.2K | 17K | 155 | 349 |
Dataset | Method | Acc. (%) | F1 (%) | Prec. (%) | Recall (%) | FPR (%) | GED |
---|---|---|---|---|---|---|---|
Theia | SimGNN | 83.28 | 84.49 | 78.77 | 91.11 | 24.56 | 0.549 |
Theia | DeepHunter | 83.67 | 84.42 | 80.69 | 88.53 | 21.19 | 0.303 |
Theia | Poirot | 97.38 | 97.44 | 95.16 | 99.84 | 5.07 | 0.129 |
Theia | ProvG-Searcher | 99.83 | 99.84 | 99.98 | 99.69 | 0.02 | 0.105 |
Theia | SGMNet | 99.80 | 99.80 | 100.00 | 99.61 | 0.00 | 0.103 |
Trace | SimGNN | 75.93 | 78.63 | 70.69 | 88.57 | 36.72 | 0.638 |
Trace | DeepHunter | 74.93 | 77.45 | 70.36 | 86.13 | 36.28 | 0.417 |
Trace | Poirot | 97.99 | 98.01 | 97.03 | 99.02 | 3.02 | 0.246 |
Trace | ProvG-Searcher | 99.34 | 99.33 | 99.34 | 98.69 | 0.01 | 0.155 |
Trace | SGMNet | 98.93 | 99.37 | 100.00 | 98.67 | 0.00 | 0.173 |
Cadets | SimGNN | 84.50 | 85.55 | 80.10 | 91.80 | 22.80 | 0.495 |
Cadets | DeepHunter | 94.11 | 85.21 | 79.69 | 91.55 | 23.34 | 0.296 |
Cadets | Poirot | 98.18 | 98.16 | 99.30 | 97.05 | 0.68 | 0.145 |
Cadets | ProvG-Searcher | 99.78 | 99.76 | 99.96 | 99.61 | 0.03 | 0.113 |
Cadets | SGMNet | 99.92 | 99.61 | 100.00 | 99.22 | 0.00 | 0.109 |
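For readers who want to relate the columns of the comparison table to one another, the sketch below computes accuracy, precision, recall, F1, and false-positive rate from confusion-matrix counts using their standard definitions. The counts passed in are hypothetical and are not taken from the paper's experiments; the function name is likewise illustrative, and GED is not covered.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. at least one predicted
    positive, one actual positive, and one actual negative.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),
    }


# Hypothetical counts: no false positives, ten missed attack graphs.
m = classification_metrics(tp=90, fp=0, tn=100, fn=10)
for name, value in m.items():
    print(f"{name}: {100 * value:.2f}%")
```

Under these definitions, a precision of 100% and an FPR of 0% both follow from having no false positives, which is why the two values appear together in the table.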
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, C.; Su, L. SGMNet: A Supervised Seeded Graph-Matching Method for Cyber Threat Hunting. Symmetry 2025, 17, 898. https://doi.org/10.3390/sym17060898