An APT Event Extraction Method Based on BERT-BiGRU-CRF for APT Attack Detection
Abstract
1. Introduction
- (1) An APT event schema is proposed based on an analysis of APT attack stages. Event schemas differ from field to field, so APT events need a schema of their own to extract effective information.
- (2) A Chinese APT event dataset is constructed to train models. Although many event datasets exist, none of them covers APT events, so a dedicated dataset is needed to train extraction models.
- (3) An APT event extraction method based on the BERT-BiGRU-CRF model is proposed. The method is intended to help address the problems of insufficient attack sample data and low detection accuracy.
2. Related Works
2.1. APT Attack Detection Method
2.2. CTI Analysis
2.3. Event Extraction
3. Materials and Methods
3.1. Data Source and Preprocess
3.2. APT Attack Stages and Event Schema
3.3. APT Dataset Construction
3.4. APT Attack Event Extraction Based on BERT-BiGRU-CRF
- (1) BERT layer. First, the BERT model is applied to produce pre-trained word vectors. The BERT encoding layer sits at the bottom of the model: it segments the input APT text into tokens and transforms each token into a word vector that carries its semantic features.
- (2) BiGRU layer. Second, the BERT output is connected to a BiGRU to extract APT trigger words and event arguments. The pre-trained word vectors are fed into the BiGRU layer, which extracts further features and produces the emission matrix of the sequence. Its output is the predicted label (an APT-related trigger word or an argument defined in the schema) for each word.
- (3) CRF layer. The BiGRU output is then constrained by the CRF layer, which provides the transition matrix between labels. Finally, the optimal label sequence is output.
3.4.1. BERT Pre-Training Layer
3.4.2. BiGRU Layer
3.4.3. CRF Layer
- (1) A sentence should begin with “B-” or “O”, not “I-”; as shown in Figure 8, a sentence cannot start with “I-Attack Weapon”.
- (2) In a sequence “B-label1 I-label2 I-label3 …”, label1, label2, and label3 should be the same entity category. For example, “B-attacker I-attacker” is correct, while “B-attacker I-attack weapon” is incorrect.
- (3) “O I-Attack Weapon” is incorrect: a named entity should begin with “B-”, not “I-”.
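The three constraints above can be checked mechanically on any label sequence. The following is a minimal sketch, not the CRF layer itself: the function name `is_valid_bio` is hypothetical, and treating the sentence start like an “O” tag is an assumption made so that a leading “I-” is rejected.

```python
from typing import List


def is_valid_bio(labels: List[str]) -> bool:
    """Return True if the label sequence satisfies the three BIO constraints."""
    # Assumption: the sentence start behaves like "O", so a leading "I-" is invalid.
    previous = "O"
    for label in labels:
        if label.startswith("I-"):
            entity = label[2:]
            # "I-X" may only continue a "B-X" or "I-X" of the same entity category.
            if previous not in ("B-" + entity, "I-" + entity):
                return False
        previous = label
    return True


print(is_valid_bio(["B-Attacker", "I-Attacker", "O"]))   # True
print(is_valid_bio(["I-Attack Weapon", "O"]))            # False (constraint 1)
print(is_valid_bio(["B-Attacker", "I-Attack Weapon"]))   # False (constraint 2)
print(is_valid_bio(["O", "I-Attack Weapon"]))            # False (constraint 3)
```

A trained CRF layer learns these constraints as very low transition scores rather than as hard rules; the check is only a convenient way to inspect predicted sequences.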
4. Experimental Results
4.1. Model Construction and Training
4.2. Experimental Results
4.2.1. Comparison with Other Models
- Precision = number of correct “Positive” predictions/number of “Positive” predictions. It measures how accurate the model's positive predictions are. The formula is shown below, where TP, FP, etc. have the meanings given in Table 5:
  $$\mathrm{Precision} = \frac{TP}{TP + FP}$$
- Recall = number of correctly predicted “Positive” items/number of manually annotated “Positive” items. It measures how much the model missed. The formula is shown below:
  $$\mathrm{Recall} = \frac{TP}{TP + FN}$$
- F1 is the harmonic mean of Precision and Recall, calculated as follows:
  $$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
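The three metrics can be computed directly from the confusion-matrix counts. The following is a minimal sketch; the function names are hypothetical, the counts in the first call are made-up illustration values, and returning 0.0 for an empty denominator is an assumption rather than something the paper specifies.

```python
from typing import Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute Precision, Recall, and F1 from TP, FP, and FN counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Hypothetical counts: 8 correct predictions, 2 spurious, 2 missed.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(round(p, 4), round(r, 4), round(f1, 4))  # 0.8 0.8 0.8

# Consistency check against the reported argument-recognition scores
# of BERT-BiGRU-CRF (Precision 0.7013, Recall 0.8011).
print(round(f1_score(0.7013, 0.8011), 4))  # 0.7479
```

The second call reproduces the F1 value listed for BERT-BiGRU-CRF argument recognition from its reported Precision and Recall.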
4.2.2. Performance Analysis of BERT-BiGRU-CRF Model for APT Attack Event Extraction
4.2.3. Case Study
- (1) The input data were preprocessed, including word segmentation, word-to-ID mapping (word2id), truncation of long texts, and padding of short texts.
- (2) The preprocessed data were input to the first BERT-BiGRU-CRF model to extract the trigger word. In this case, the trigger word is “漏洞利用” (“exploit vulnerability”), and the corresponding event type is “攻击实施-漏洞利用” (“Attack implementation-Vulnerability exploitation”).
- (3) The event roles are determined by the APT event type. The data are then input to the second BERT-BiGRU-CRF model to extract the corresponding arguments.
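The preprocessing in step (1) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the text is split into single characters (a common choice for Chinese BERT input), and the names `preprocess`, `PAD_ID`, `UNK_ID`, the toy vocabulary, and the `max_len` values are all hypothetical.

```python
from typing import Dict, List

PAD_ID = 0  # assumed id used to pad short texts
UNK_ID = 1  # assumed id for characters missing from the vocabulary


def preprocess(text: str, word2id: Dict[str, int], max_len: int) -> List[int]:
    """Segment, map to ids, cut long texts, and pad short texts to max_len."""
    tokens = list(text)                               # word cut (character level, assumed)
    ids = [word2id.get(t, UNK_ID) for t in tokens]    # word2id
    ids = ids[:max_len]                               # long text cut
    ids += [PAD_ID] * (max_len - len(ids))            # short text padding
    return ids


# Toy vocabulary covering only the trigger word from the case study.
vocab = {"漏": 2, "洞": 3, "利": 4, "用": 5}
print(preprocess("漏洞利用", vocab, max_len=6))      # [2, 3, 4, 5, 0, 0]
print(preprocess("漏洞利用攻击", vocab, max_len=5))  # [2, 3, 4, 5, 1]
```

Every sequence leaves this step with the same length, which is what allows texts of different lengths to be batched before they reach the first model.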
5. Conclusions and Discussion
Author Contributions
Funding
Conflicts of Interest
Types of Web Sites | Detail Information |
---|---|
Authoritative network security technology center | https://www.cert.org.cn/ *; https://www.cnvd.org.cn/ *; https://cve.mitre.org/; https://nvd.nist.gov/; https://www.cvedetails.com/ |
Major manufacturers | https://www.oracle.com/security-alerts/; https://msrc.microsoft.com/update-guide/ * |
Research institutions | https://www.kaspersky.com.cn/ *; https://www.nsfocus.com.cn/ *; https://www.qianxin.com/ * |
Forum | honker or hacker organizations and forums |
APT dataset | https://github.com/cyber-research/APTMalware |
NO. | Event Category | Event Type | Argument Role1 | Argument Role2 | Argument Role3 | Argument Role4 | Argument Role5 |
---|---|---|---|---|---|---|---|
1 | Preparation | Spear phishing attack | Fake file | True file | Attacker | Target | Attack tactics |
2 | Preparation | Water hole attack | Fake file | True file | Attack weapon | | |
3 | Preparation | Scan | Target | | | | |
4 | Preparation | Steal information | Attacker | Target | Stolen target | Attack weapon | |
5 | Implementation | Trojan | Attacker | Target | Attack weapon | Attack tactics | |
6 | Implementation | Worm | Attacker | Target | Attack weapon | | |
7 | Implementation | Back door | Attacker | Target | Attack weapon | | |
8 | Implementation | Virus | Attacker | Target | Attack weapon | Attack tactics | |
9 | Implementation | Vulnerability exploitation | Attacker | Target | Attack weapon | Attack tactics | |
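The schema table above can be held as a simple lookup structure, so that a detected event type decides which argument roles to extract. The following is a minimal sketch: the names `APT_EVENT_SCHEMA` and `roles_for` are hypothetical, and rows without an explicit event category are assumed to inherit the category of the nearest labeled row above them.

```python
from typing import Dict, List, Tuple

# Event type -> (event category, argument roles), transcribed from the schema table.
# Assumption: rows 2-4 belong to "Preparation" and rows 6-9 to "Implementation".
APT_EVENT_SCHEMA: Dict[str, Tuple[str, List[str]]] = {
    "Spear phishing attack": (
        "Preparation",
        ["Fake file", "True file", "Attacker", "Target", "Attack tactics"],
    ),
    "Water hole attack": ("Preparation", ["Fake file", "True file", "Attack weapon"]),
    "Scan": ("Preparation", ["Target"]),
    "Steal information": (
        "Preparation",
        ["Attacker", "Target", "Stolen target", "Attack weapon"],
    ),
    "Trojan": ("Implementation", ["Attacker", "Target", "Attack weapon", "Attack tactics"]),
    "Worm": ("Implementation", ["Attacker", "Target", "Attack weapon"]),
    "Back door": ("Implementation", ["Attacker", "Target", "Attack weapon"]),
    "Virus": ("Implementation", ["Attacker", "Target", "Attack weapon", "Attack tactics"]),
    "Vulnerability exploitation": (
        "Implementation",
        ["Attacker", "Target", "Attack weapon", "Attack tactics"],
    ),
}


def roles_for(event_type: str) -> List[str]:
    """Return the argument roles to extract for a detected event type."""
    _category, roles = APT_EVENT_SCHEMA[event_type]
    return roles


print(roles_for("Vulnerability exploitation"))
# ['Attacker', 'Target', 'Attack weapon', 'Attack tactics']
```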
Transition Matrix | O | B-Attacker | I-Attacker | B-Attack Weapon | I-Attack Weapon |
---|---|---|---|---|---|
O | 0.8 | 0.07 | 0 | 0.12 | 0 |
B-Attacker | 0 | 0 | 1 | 0 | 0 |
I-Attacker | 0.18 | 0 | 0.85 | 0 | 0 |
B-Attack Weapon | 0 | 0 | 0 | 0 | 1 |
I-Attack Weapon | 1 | 0 | 0 | 0 | 0 |
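To illustrate how such a transition matrix constrains the output, the sketch below runs Viterbi decoding over the values in the table, reading the first label as the outside tag “O”. It is a simplification under stated assumptions: scores are multiplied as probabilities, whereas a real CRF adds unnormalized scores; the sentence start is treated like a preceding “O”; and the emission values and the function name `viterbi` are hypothetical.

```python
from typing import List

LABELS = ["O", "B-Attacker", "I-Attacker", "B-Attack Weapon", "I-Attack Weapon"]

# TRANSITION[i][j]: score of moving from label i to label j (values from the table).
TRANSITION = [
    [0.8, 0.07, 0.0, 0.12, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.18, 0.0, 0.85, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0, 0.0],
]


def viterbi(emissions: List[List[float]]) -> List[str]:
    """Return the highest-scoring label sequence for per-token emission scores."""
    n = len(LABELS)
    start = LABELS.index("O")  # assumption: sentence start acts like a preceding "O"
    score = [TRANSITION[start][j] * emissions[0][j] for j in range(n)]
    back = []  # back[t][j]: best previous label for label j at token t + 1
    for emit in emissions[1:]:
        pointers = [max(range(n), key=lambda i: score[i] * TRANSITION[i][j])
                    for j in range(n)]
        score = [score[pointers[j]] * TRANSITION[pointers[j]][j] * emit[j]
                 for j in range(n)]
        back.append(pointers)
    path = [max(range(n), key=lambda j: score[j])]
    for pointers in reversed(back):  # follow back-pointers to recover the path
        path.append(pointers[path[-1]])
    return [LABELS[i] for i in reversed(path)]


# Hypothetical emissions for three tokens; a per-token argmax would pick the
# invalid sequence "B-Attacker, I-Attack Weapon, O".
emissions = [
    [0.02, 0.90, 0.03, 0.03, 0.02],
    [0.10, 0.05, 0.40, 0.00, 0.45],
    [0.90, 0.02, 0.04, 0.02, 0.02],
]
print(viterbi(emissions))  # ['B-Attacker', 'I-Attacker', 'O']
```

Because the transition from “B-Attacker” to “I-Attack Weapon” has score 0, the decoder prefers “I-Attacker” for the second token even though its emission score is slightly lower.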
Parameter Name | Values |
---|---|
num_epoch (training rounds) | 60 |
learning_rate (learning rate) | 5 × 10^−5 |
weight_decay (weight decay) | 0.01 |
warmup_proportion (warmup proportion) | 0.1 |
gru_hidden_size (GRU hidden size) | 300 |
True/False Examples | Prediction: Positive | Prediction: Negative |
---|---|---|
True | TP | FN |
False | FP | TN |
Model | Trigger Word Detection: Precision | Trigger Word Detection: Recall | Trigger Word Detection: F1 | APT Event Argument Recognition: Precision | APT Event Argument Recognition: Recall | APT Event Argument Recognition: F1 |
---|---|---|---|---|---|---|
ERNIE | 1.00 | 1.00 | 1.00 | 0.5859 | 0.8189 | 0.6831 |
BERT | 1.00 | 1.00 | 1.00 | 0.5812 | 0.8813 | 0.7004 |
BiGRU-CRF | 0.9903 | 1.00 | 0.9951 | 0.5211 | 0.8462 | 0.6451 |
BERT-BiGRU-CRF | 1.00 | 1.00 | 1.00 | 0.7013 | 0.8011 | 0.7479 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Xiang, G.; Shi, C.; Zhang, Y. An APT Event Extraction Method Based on BERT-BiGRU-CRF for APT Attack Detection. Electronics 2023, 12, 3349. https://doi.org/10.3390/electronics12153349