ConLBS: An Attack Investigation Approach Using Contrastive Learning with Behavior Sequence
Abstract
1. Introduction
2. Related Work
2.1. Attack Investigation
2.2. Contrastive Learning Framework
3. Methodology
3.1. Provenance Graphs Construction and Optimization
3.2. Behavior Sequences
3.3. Behavior Sequence Augmentation
- (a) Sequence truncation randomly removes events from the head and tail of a behavior sequence and preserves the contiguous subsequence in the middle. The maximum number of removed events is capped at a value defined in terms of k, the total length of the sequence. Truncation enables the model to learn the intermediate process of a behavior.
- (b) Event deletion randomly selects events in the behavior sequence and replaces them with a special token [DEL]. The percentage of deleted events is 20%. This strategy simulates scenarios in which some system events were not recorded by the monitoring tools or were lost.
- (c) Noise addition inserts random events at random positions in the behavior sequence. This simulates scenarios in which the behavior sequence includes system events that do not belong to that particular behavior. Four insertion locations are selected at random, and a run of events equal to 5% of the sequence length is inserted at each, so the added noise totals around 20% of the sequence length.
- (d) Substitution is used to enhance the robustness of the model. It randomly selects certain events and replaces them with other events that share the same entity. The number of replaced events does not exceed 20% of the sequence.
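The four augmentation strategies above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: events are modeled as plain strings, `max_cut` stands in for the truncation bound (its value is not given in this text), and `noise_pool` and `same_entity` are hypothetical inputs supplying candidate noise events and same-entity alternatives.

```python
import random

DEL_TOKEN = "[DEL]"  # special token named in strategy (b)


def truncate(seq, max_cut, rng):
    """(a) Remove up to max_cut events in total from head and tail.

    max_cut is an assumed parameter; the bound used in the paper
    is not stated in this text.
    """
    max_cut = max(0, min(max_cut, len(seq) - 1))  # keep at least one event
    total = rng.randint(0, max_cut)
    head = rng.randint(0, total)
    tail = total - head
    return seq[head:len(seq) - tail]


def delete_events(seq, rng, ratio=0.20):
    """(b) Replace a fixed share of events with the [DEL] token."""
    n = int(len(seq) * ratio)
    chosen = set(rng.sample(range(len(seq)), n))
    return [DEL_TOKEN if i in chosen else e for i, e in enumerate(seq)]


def add_noise(seq, noise_pool, rng, chunk_ratio=0.05, n_locations=4):
    """(c) Insert a run of random events at each of n_locations positions."""
    chunk = max(1, int(len(seq) * chunk_ratio))
    out = list(seq)
    # Insert from right to left so earlier positions stay valid.
    positions = sorted(
        (rng.randint(0, len(seq)) for _ in range(n_locations)), reverse=True
    )
    for pos in positions:
        out[pos:pos] = [rng.choice(noise_pool) for _ in range(chunk)]
    return out


def substitute(seq, same_entity, rng, max_ratio=0.20):
    """(d) Replace at most max_ratio of events with same-entity events.

    same_entity is a hypothetical lookup: event -> list of alternatives.
    """
    n = rng.randint(0, int(len(seq) * max_ratio))
    out = list(seq)
    for i in rng.sample(range(len(seq)), n):
        candidates = same_entity.get(seq[i], [])
        if candidates:
            out[i] = rng.choice(candidates)
    return out


rng = random.Random(0)
sequence = [f"e{i}" for i in range(100)]
views = [
    truncate(sequence, 20, rng),
    delete_events(sequence, rng),
    add_noise(sequence, ["noise_event"], rng),
    substitute(sequence, {"e0": ["e0_alt"]}, rng),
]
```

Each call returns a new list and leaves the input untouched, so several augmented views of the same behavior sequence can be produced independently, which is what a contrastive objective needs for its positive pairs.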
3.4. Behavior Sequence Representation
3.5. Sequence Classification Training
4. Experiment
4.1. Datasets and Setups
4.2. Attack Investigation Results
4.3. Comparison Analysis
4.4. Runtime Performance of ConLBS
5. Discussion
5.1. Assumption for ConLBS
5.2. Limitation of ConLBS
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- [1] Mirsaraei, A.G.; Barati, A.; Barati, H. A secure three-factor authentication scheme for IoT environments. J. Parallel Distrib. Comput. 2022, 169, 87–105.
- [2] Milajerdi, S.M.; Eshete, B.; Gjomemo, R.; Venkatakrishnan, V.N. Poirot: Aligning attack behavior with kernel audit records for cyber threat hunting. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, UK, 11–15 November 2019; pp. 1795–1812.
- [3] Milajerdi, S.M.; Gjomemo, R.; Eshete, B.; Sekar, R.; Venkatakrishnan, V.N. Holmes: Real-time apt detection through correlation of suspicious information flows. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2019; pp. 1137–1152.
- [4] Zeng, J.; Chua, Z.L.; Chen, Y.; Ji, K.; Liang, Z.; Mao, J. Watson: Abstracting behaviors from audit logs via aggregation of contextual semantics. In Proceedings of the 28th Annual Network and Distributed System Security Symposium, NDSS, Online, 21–25 February 2021.
- [5] Gao, P.; Shao, F.; Liu, X.; Xiao, X.; Qin, Z.; Xu, F.; Mittal, P.; Kulkarni, S.R.; Song, D. Enabling efficient cyber threat hunting with cyber threat intelligence. In Proceedings of the 2021 IEEE 37th International Conference on Data Engineering (ICDE), Chania, Greece, 19–22 April 2021; pp. 193–204.
- [6] Alsaheel, A.; Nan, Y.; Ma, S.; Yu, L.; Walkup, G.; Celik, Z.B.; Zhang, X.; Xu, D. ATLAS: A Sequence-based Learning Approach for Attack Investigation. In Proceedings of the 30th USENIX Security Symposium, Online, 11–13 August 2021; pp. 3005–3022.
- [7] Hassan, W.U.; Noureddine, M.A.; Datta, P.; Bates, A. OmegaLog: High-Fidelity Attack Investigation via Transparent Multi-layer Log Analysis. In Proceedings of the Network and Distributed System Security Symposium 2020, Online, 23–26 February 2020.
- [8] Gao, P.; Xiao, X.; Li, Z.; Xu, F.; Kulkarni, S.R.; Mittal, P. AIQL: Enabling Efficient Attack Investigation from System Monitoring Data. In Proceedings of the 2018 USENIX Annual Technical Conference (USENIX ATC 18), Boston, MA, USA, 11–13 July 2018; pp. 113–126.
- [9] Kwon, Y.; Wang, F.; Wang, W.; Lee, K.H. MCI: Modeling-based Causality Inference in Audit Logging for Attack Investigation. In Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, 18–21 February 2018; Volume 2, p. 4.
- [10] Zhao, J.; Yan, Q.; Liu, X.; Li, B.; Zuo, G. Cyber Threat Intelligence Modeling Based on Heterogeneous Graph Convolutional Network. In Proceedings of the 23rd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2020), San Sebastian, Spain, 14–16 October 2020; pp. 241–256.
- [11] Hossain, M.N.; Sheikhi, S.; Sekar, R. Combating dependence explosion in forensic analysis using alternative tag propagation semantics. In Proceedings of the 2020 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 18–21 May 2020; pp. 1139–1155.
- [12] Zhu, T.; Wang, J.; Ruan, L.; Xiong, C.; Yu, J.; Li, Y.; Chen, Y.; Chen, T. General, Efficient, and Real-time Data Compaction Strategy for APT Forensic Analysis. IEEE Trans. Inf. Forensics Secur. 2021, 16, 3312–3325.
- [13] Yang, R. RATScope: Recording and Reconstructing Missing RAT Semantic Behaviors for Forensic Analysis on Windows. IEEE Trans. Dependable Secur. Comput. 2020, 19, 1621–1638.
- [14] Du, M.; Li, F.; Zheng, G.; Srikumar, V. Deeplog: Anomaly detection and diagnosis from system logs through deep learning. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017.
- [15] Ding, H.; Zhai, J.; Nan, Y. AIRTAG: Towards Automated Attack Investigation by Unsupervised Learning with Log Texts. In Proceedings of the 32nd USENIX Security Symposium (USENIX Security 23), Anaheim, CA, USA, 9–11 August 2023; pp. 373–390.
- [16] Liu, F.; Wen, Y.; Zhang, D.; Jiang, X.; Xing, X.; Meng, D. Log2vec: A heterogeneous graph embedding based approach for detecting cyber threats within enterprise. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, UK, 11–15 November 2019.
- [17] Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the NAACL-HLT, Minneapolis, MN, USA, 3–5 June 2019; pp. 4171–4186.
- [18] Yan, Y.; Li, R.; Wang, S.; Zhang, F.; Wu, W.; Xu, W. Consert: A contrastive framework for self-supervised sentence representation transfer. arXiv 2021, arXiv:2105.11741.
- [19] Wu, Z.; Wang, S.; Gu, J.; Khabsa, M.; Sun, F.; Ma, H. Clear: Contrastive learning for sentence representation. arXiv 2020, arXiv:2012.15466.
- [20] Chen, T.; Kornblith, S.; Norouzi, M. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020; pp. 1597–1607.
- [21] King, S.T.; Chen, P.M. Backtracking intrusions. ACM SIGOPS Oper. Syst. Rev. 2003, 37, 223–236.
- [22] Hassan, W.U.; Guo, S.; Li, D.; Chen, Z.; Jee, K.; Li, Z.; Bates, A. Nodoze: Combatting threat alert fatigue with automated provenance triage. In Proceedings of the Network and Distributed System Security Symposium 2019, San Diego, CA, USA, 24 February 2019.
- [23] Zhang, Y.; He, R.; Liu, Z.; Lim, K.H.; Bing, L. An unsupervised sentence embedding method by mutual information maximization. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Bristol, UK, 6–9 September 2022; pp. 1601–1610.
- [24] Fang, H.; Xie, P. Cert: Contrastive self-supervised learning for language understanding. arXiv 2020, arXiv:2005.12766.
- [25] He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020.
- [26] Carlsson, F.; Sahlgren, M.; Gogoulou, E.; Gyllensten, A.C.; Ylipa, E. Semantic re-tuning with contrastive tension. In Proceedings of the International Conference on Learning Representations, Virtual Event, 3–7 May 2021.
- [27] Giorgi, J.M.; Nitski, O.; Bader, G.D.; Wang, B. Declutr: Deep contrastive learning for unsupervised textual representations. arXiv 2020, arXiv:2006.03659.
- [28] Torrey, J. Transparent Computing Engagement 3 Data Release. 2020. Available online: https://github.com/darpa-i2o/Transparent-Computing/blob/master/README-E3.md (accessed on 15 March 2023).
- [29] Zhang, Y.; Wallace, B.C. A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification. Proc. Int. Jt. Conf. Nat. Lang. Process. 2017, 1, 253–263.
- [30] Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
- [31] Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; Dean, J. Distributed representations of words and phrases and their compositionality. Adv. Neural Inf. Process. Syst. 2013, 2, 3111–3119.
- [32] Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A robustly optimized BERT pretraining approach. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020.
- [33] Gao, P.; Liu, C.; Ayday, E.; Jee, K.; Wang, T.; Ye, Y.; Liu, Z.; Xiao, X. Back-Propagating System Dependency Impact for Attack Investigation. In Proceedings of the 31st USENIX Security Symposium (USENIX Security 22), Boston, MA, USA, 10–12 August 2022.
Type | Node Name | Semantic |
---|---|---|
process | name_PID | name |
network | IP-Port, website | IP address, url |
file | .jpg, .png, .py, .java | picture file, code file |
file | \system32\, \Program files\ | system file, app file |
file | *.html, *.lst | html file, lst file |
Attack Scenario | TP | TN | FP | FN | Precision | Recall | F1-Score |
---|---|---|---|---|---|---|---|
ATLAS. S-1 | 4536 | 78,856 | 28 | 13 | 99.387% | 99.714% | 99.550% |
ATLAS. S-2 | 13,584 | 331,051 | 47 | 10 | 99.655% | 99.926% | 99.791% |
ATLAS. S-3 | 4975 | 109,285 | 22 | 23 | 99.560% | 99.540% | 99.550% |
ATLAS. S-4 | 13,199 | 88,576 | 21 | 4 | 99.841% | 99.970% | 99.905% |
ATLAS. M-1 | 6331 | 171,131 | 13 | 9 | 99.795% | 99.858% | 99.827% |
ATLAS. M-2 | 28,914 | 180,326 | 51 | 17 | 99.824% | 99.941% | 99.883% |
ATLAS. M-3 | 24,728 | 140,347 | 94 | 7 | 99.621% | 99.972% | 99.796% |
ATLAS. M-4 | 5945 | 137,167 | 24 | 22 | 99.598% | 99.631% | 99.615% |
ATLAS. M-5 | 23,526 | 452,354 | 86 | 37 | 99.636% | 99.843% | 99.739% |
ATLAS. M-6 | 6372 | 201,569 | 17 | 22 | 99.734% | 99.656% | 99.695% |
ATLAS. Avg. | 13,211 | 189,066 | 40 | 16 | 99.696% | 99.876% | 99.786% |
CADETS. case-1 | 87,658 | 436,957 | 218 | 76 | 99.752% | 99.913% | 99.833% |
CADETS. case-2 | 53,631 | 472,913 | 175 | 49 | 99.675% | 99.909% | 99.792% |
CADETS. case-3 | 34,097 | 209,681 | 58 | 47 | 99.830% | 99.862% | 99.846% |
CADETS. Avg. | 58,462 | 373,184 | 150 | 57 | 99.744% | 99.902% | 99.823% |
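As a quick check on how the percentage columns relate to the raw counts, the sketch below applies the standard definitions of precision, recall, and F1-score to the ATLAS S-1 row. The formulas are the textbook ones and are assumed here, since this text does not spell them out; the function name `prf` is illustrative.

```python
def prf(tp, fp, fn):
    """Standard precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# ATLAS S-1 row: TP = 4536, FP = 28, FN = 13
p, r, f = prf(tp=4536, fp=28, fn=13)
print(f"{p:.3%} {r:.3%} {f:.3%}")  # prints "99.387% 99.714% 99.550%"
```

The output matches the Precision, Recall, and F1-Score reported for that row, which suggests the table uses these definitions. TN does not enter any of the three metrics.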
Method | Precision | Recall | F1-Score |
---|---|---|---|
RS + BERTBase | 87.782% | 84.333% | 86.023% |
Lem ATLAS + BERTBase | 97.102% | 92.184% | 94.579% |
Lem ConLBS + BERTBase | 99.532% | 98.831% | 99.180% |
RS + BERTRe-train | 93.850% | 89.700% | 91.728% |
Lem ATLAS + BERTRe-train | 99.132% | 99.365% | 99.248% |
Lem ConLBS + BERTRe-train | 99.696% | 99.876% | 99.786% |
Base Models/Method | Recall | Precision | F1-Score |
---|---|---|---|
Word2vec + CNN [29] | 87.425% | 89.379% | 88.391% |
Word2vec + LSTM [30] | 95.854% | 96.412% | 96.132% |
BERT [17] | 98.460% | 98.891% | 98.675% |
RoBERTa [32] | 99.601% | 99.829% | 99.715% |
ConLBS | 99.902% | 99.744% | 99.823% |
Method | Logs Size (/min) | Graph/Sequence Construction | Train Time | Investigation Time (Avg.) |
---|---|---|---|---|
POIROT [2] | 114.5 MB | 1:54:35 | -- | 7.72 s |
ATLAS | 169 MB | 0:30:23 | 0:28:26 | 5.0 s |
ConLBS | 358 MB | 0:23:48 | 0:36:35 | 2.53 s |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, J.; Zhang, R.; Liu, J. ConLBS: An Attack Investigation Approach Using Contrastive Learning with Behavior Sequence. Sensors 2023, 23, 9881. https://doi.org/10.3390/s23249881