EDAER: Entropy-Driven Approach for Entity and Relation Extraction in Chinese Cyber Threat Intelligence
Abstract
1. Introduction
2. Related Work
2.1. PLMs
2.2. Chinese CTI NER
2.3. Chinese CTI RE
3. EDAER Approach
3.1. Overview
3.2. EDRMRC Model
3.2.1. RoBERTa_wwm-Based Global Semantic Representation
3.2.2. Mamba-Based Efficient Long-Range Dependency Modeling
3.2.3. RDCNN-Based Multi-Scale Local Feature Extraction
3.2.4. CRF-Based Global Sequence Decoding
3.2.5. Entropy-Based Dynamic Gating Mechanism
3.2.6. Uncertainty-Guided Contrastive Learning Mechanism
3.2.7. Joint Loss Function for EDRMRC Training
3.3. Entropy-Driven Relation Extraction Model (EDRRC)
4. Establishment of MyCDTier Dataset
4.1. Data Acquisition, Preprocessing, and Annotation
4.2. Named Entity Types
4.3. Relation Types
5. Experimental Evaluation
5.1. Experimental Setup
5.1.1. Experimental Environment
5.1.2. Dataset
1. CDTier Dataset [6]. An open-source dataset for Chinese CTI, available at https://github.com/MuYu-z/CDTier (accessed on 1 June 2025). For NER, it includes 100 CTI reports, 3744 threat sentences, and 4259 threat knowledge objects, annotated with five entity types (Attacker, Campaign, Industry, Region, and Tools) and their beginning (B-) and inside (I-) tags, yielding 10 unique labels. For RE, it includes 100 CTI reports, 2598 threat sentences, and 2562 knowledge-object relations covering 11 relationship types (e.g., utilize, target) based on standards such as STIX.
2. MyCDTier. Detailed in Section 4, this proprietary dataset was developed in-house for cybersecurity NER and RE and features the 16 entity types defined in Table 3. Its RE task involves extracting nine relationships between these entity types (Control, Occur, Possess, Present, Utilize, Impose, Type-of, Eliminate, and Destroy), which significantly increases task complexity.
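For reference, CDTier's five entity types expand into the 10 B-/I- labels counted above, with a separate O tag for non-entity tokens. A minimal sketch of that label inventory (the helper name `build_bio_labels` is ours, not from the dataset's tooling):

```python
# Sketch (not from the paper's code): building the BIO label set for CDTier.
# Five entity types expand to B-/I- tags (10 labels); the O tag marks
# non-entity tokens and is not counted among the 10 unique labels.
ENTITY_TYPES = ["Attacker", "Campaign", "Industry", "Region", "Tools"]

def build_bio_labels(entity_types):
    """Return the B-/I- label inventory for a list of entity types."""
    labels = []
    for t in entity_types:
        labels.append(f"B-{t}")  # beginning of an entity span
        labels.append(f"I-{t}")  # inside (continuation) of a span
    return labels

labels = build_bio_labels(ENTITY_TYPES)
assert len(labels) == 10          # matches the 10 unique labels noted above
print(labels + ["O"])             # full tag set used at decoding time
```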
5.1.3. Evaluation Metrics
5.1.4. Models and Baselines
- RoBERTa_wwm-RDCNN-CRF. Combines RoBERTa_wwm with RDCNN and CRF. We selected this baseline because RoBERTa_wwm is a state-of-the-art pretrained model optimized for Chinese text, and RDCNN is widely used to capture local contextual features. This combination serves as a strong representative of recent Chinese NER models, allowing us to verify the performance gain from introducing the Mamba module in our architecture.
- BERT-Mamba-RDCNN-CRF. Employs the Chinese version of BERT-base as the PLM, paired with Mamba, RDCNN, and CRF. We included this baseline to compare the impacts of different pretrained language models on performance.
- RoBERTa_wwm-TENER-CRF. Combines RoBERTa_wwm with TENER (Transformer-based) and CRF. We chose this baseline because TENER is a classic Transformer-based architecture for sequence labeling, enabling us to demonstrate the advantages of our Mamba + RDCNN hybrid feature extraction over Transformer-only architectures in CTI scenarios.
- BERT-BiLSTM-GRU-CRF. Uses BERT with BiLSTM and GRU layers, followed by a CRF. This well-established hybrid architecture was widely used in early Chinese NER research; including it highlights the advantages of modern components (e.g., Mamba, RDCNN) over traditional recurrent models in the CTI domain.
5.2. Experimental Results
5.2.1. Named Entity Recognition Results
5.2.2. Relation Extraction Results
6. Conclusions and Future Work
1. In both NER and RE tasks, the PLM RoBERTa_wwm significantly outperforms BERT.
2. In NER tasks, Mamba achieves better results than the alternative sequence encoders evaluated.
3. In both NER and RE tasks, the entropy-based dynamic gating mechanism improves performance.
4. In NER tasks, the uncertainty-guided contrastive learning mechanism improves performance.
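Findings (3) and (4) rest on entropy as a per-token uncertainty signal. The gating mechanism itself is defined in Section 3.2.5; the sketch below only illustrates the general idea of using normalized Shannon entropy of a token's label distribution as a gate weight. The function names and the linear fusion rule are our assumptions, not the paper's formulation.

```python
import math

def shannon_entropy(probs):
    """H(p) = -sum p_i * ln(p_i), in nats; 0 for a one-hot distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_gate(probs, num_labels):
    """Map entropy to a [0, 1] gate weight by normalizing with the
    maximum possible entropy ln(num_labels); higher uncertainty -> higher gate."""
    return shannon_entropy(probs) / math.log(num_labels)

def fuse(global_feat, local_feat, probs, num_labels):
    """Hypothetical fusion rule: lean on local features when uncertain."""
    g = entropy_gate(probs, num_labels)
    return [(1 - g) * a + g * b for a, b in zip(global_feat, local_feat)]

# A confident distribution yields a gate near 0; a uniform one yields 1.
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]
print(entropy_gate(confident, 4), entropy_gate(uncertain, 4))
```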
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- An Overview of 2025 Global APT Attack Landscape. 2025. Available online: https://nsfocusglobal.com/pt-br/an-overview-of-2025-global-apt-attack-landscape/ (accessed on 20 February 2026).
- Wu, X.; Wang, M.; Chang, X.; Li, C.; Wang, Y.; Liang, B.; Deng, S. Resisting Memorization-Based APT Attacks Under Incomplete Information in DDHR Architecture: An Entropy-Heterogeneity-Aware RL-Based Scheduling Approach. Entropy 2025, 27, 1238. [Google Scholar] [CrossRef]
- Incident Response Recommendations and Considerations for Cybersecurity Risk Management. 2025. Available online: https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-61r3.pdf (accessed on 20 February 2026).
- Mouiche, I.; Saad, S. Entity and relation extractions for threat intelligence knowledge graphs. Comput. Secur. 2025, 148, 104120. [Google Scholar] [CrossRef]
- Zhen, Z.; Gao, J. Chinese Cyber Threat Intelligence Named Entity Recognition via RoBERTa-wwm-RDCNN-CRF. Comput. Mater. Contin. 2023, 77, 299–323. [Google Scholar] [CrossRef]
- Zhou, Y.; Ren, Y.; Yi, M.; Xiao, Y.; Tan, Z.; Moustafa, N.; Tian, Z. Cdtier: A Chinese dataset of threat intelligence entity relationships. IEEE Trans. Sustain. Comput. 2023, 8, 627–638. [Google Scholar] [CrossRef]
- Cui, Y.; Che, W.; Liu, T.; Qin, B.; Wang, S.; Hu, G. Revisiting pre-trained models for Chinese natural language processing. arXiv 2020, arXiv:2004.13922. [Google Scholar] [CrossRef]
- Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. In Proceedings of the First Conference on Language Modeling, Philadelphia, PA, USA, 7–9 October 2024. [Google Scholar]
- Demirol, D.; Das, R.; Hanbay, D. A novel approach for cyber threat analysis systems using bert model from cyber threat intelligence data. Symmetry 2025, 17, 587. [Google Scholar] [CrossRef]
- Chen, L.; Deng, H.; Zhang, J.; Zheng, B.; Jiang, R. Threat Intelligence Named Entity Recognition Based on Segment-Level Information Extraction and Similar Semantic Space Construction. Symmetry 2025, 17, 783. [Google Scholar] [CrossRef]
- Liu, L.; Wang, M.; Zhang, M.; Qing, L.; He, X. Uamner: Uncertainty-aware multimodal named entity recognition in social media posts. Appl. Intell. 2022, 52, 4109–4125. [Google Scholar] [CrossRef]
- Han, Y.; Lu, Z.; Jiang, B.; Liu, Y.; Zhang, C.; Jiang, Z.; Li, N. MTLAT: A Multi-Task Learning Framework Based on Adversarial Training for Chinese Cybersecurity NER. In Network and Parallel Computing; NPC 2020; Springer: Cham, Switzerland, 2020; pp. 43–54. [Google Scholar]
- Yang, K.; Yang, Z.; Zhao, S.; Yang, Z.; Zhang, S.; Chen, H. Uncertainty-Aware Contrastive Learning for semi-supervised named entity recognition. Knowl.-Based Syst. 2024, 296, 111762. [Google Scholar] [CrossRef]
- Liu, Y.; Zhang, K.; Tong, R.; Cai, C.; Chen, D.; Wu, X. A Two-Stage Boundary-Enhanced Contrastive Learning approach for nested named entity recognition. Expert Syst. Appl. 2025, 271, 126707. [Google Scholar] [CrossRef]
- Liu, Y.; Jiang, X.; Lv, P.; Lu, Y.; Li, S.; Zhang, K.; Xu, M. Hierarchical symmetric cross entropy for distant supervised relation extraction. Appl. Intell. 2024, 54, 11020–11033. [Google Scholar] [CrossRef]
- Sun, Q.; Huang, K.; Yang, X.; Hong, P.; Zhang, K.; Poria, S. Uncertainty Guided Label Denoising for Document-level Distant Relation Extraction. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Association for Computational Linguistics: Kerrville, TX, USA, 2023. [Google Scholar]
- Jamal, R.; Ourekouch, M.; Erradi, M. UOREX: Towards Uncertainty-Aware Open Relation Extraction. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers); Association for Computational Linguistics: Kerrville, TX, USA, 2025; pp. 6027–6040. [Google Scholar]
- Li, Y.; Yu, X.; Liu, Y.; Chen, H.; Liu, C. Uncertainty-Aware Bootstrap Learning for Joint Extraction on Distantly-Supervised Data. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers); Association for Computational Linguistics: Kerrville, TX, USA, 2023; pp. 1349–1358. [Google Scholar]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers); Association for Computational Linguistics: Kerrville, TX, USA, 2019; pp. 4171–4186. [Google Scholar]
- Tsai, C.; Yang, C.; Chen, C. CTI ANT: Hunting for Chinese threat intelligence. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data); IEEE: Piscataway, NJ, USA, 2020; pp. 1847–1852. [Google Scholar]
- Long, Z.; Tan, L.; Zhou, S.; He, C.; Liu, X. Collecting indicators of compromise from unstructured text of cybersecurity articles using neural-based sequence labelling. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; pp. 1–8. [Google Scholar]
- Feng, X.; He, S.; Wei, X.; Liu, R.; Yue, H.; Wang, X. PROMPT-BART: A Named Entity Recognition Model Applied to Cyber Threat Intelligence. Appl. Sci. 2025, 15, 10276. [Google Scholar] [CrossRef]
- Jarnac, L.; Chabot, Y.; Couceiro, M. Uncertainty Management in the Construction of Knowledge Graphs: A Survey. Trans. Graph Data Knowl. 2025, 3, 48. [Google Scholar]
- Neo4j. 2022. Available online: https://neo4j.com/ (accessed on 6 March 2022).
- Lafferty, J.; McCallum, A.; Pereira, F. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML ‘01: Proceedings of the Eighteenth International Conference on Machine Learning; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 2001. [Google Scholar]
- Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Press: Palo Alto, CA, USA, 2017; Volume 31. [Google Scholar]
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning; JMLR, Inc.: Norfolk, MA, USA, 2015; Volume 37, pp. 448–456. [Google Scholar]
- Glorot, X.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics; JMLR Workshop and Conference Proceedings; JMLR, Inc.: Norfolk, MA, USA, 2011; pp. 315–323. [Google Scholar]
- Tencent Security. Available online: https://security.tencent.com (accessed on 1 June 2025).
- Alibaba Cloud Security. Available online: https://help.aliyun.com/zh/acsg/ (accessed on 1 June 2025).
- 360 Security. Available online: https://360.net/research/analysis/ (accessed on 1 June 2025).
- ThreatBook. Available online: https://www.threatbook.cn (accessed on 1 June 2025).
- FreeBuf. Available online: https://www.freebuf.com (accessed on 1 June 2025).
- Qianxin. Available online: https://ti.qianxin.com/ (accessed on 1 June 2025).
| Ref. | English CTI Dataset | Chinese CTI Dataset | NER | RE | Joint/Pipeline | Entropy-Based Uncertainty Measurement | Contrastive Learning |
|---|---|---|---|---|---|---|---|
| [11] 2022 | ✗ | ✗ | ✓ | ✗ | - | ✓ | ✗ |
| [12] 2020 | ✓ | ✓ | ✓ | ✗ | - | ✓ | ✗ |
| [9] 2025 | ✓ | ✗ | ✓ | ✗ | - | ✓ | ✗ |
| [10] 2025 | ✓ | ✗ | ✓ | ✗ | - | ✓ | ✗ |
| [13] 2024 | ✗ | ✗ | ✓ | ✗ | - | ✓ | ✓ |
| [14] 2025 | ✗ | ✗ | ✓ | ✗ | - | ✓ | ✓ |
| [15] 2024 | ✗ | ✗ | ✗ | ✓ | - | ✓ | ✗ |
| [16] 2023 | ✗ | ✗ | ✗ | ✓ | - | ✓ | ✗ |
| [17] 2025 | ✗ | ✗ | ✗ | ✓ | - | ✓ | ✗ |
| [18] 2023 | ✗ | ✗ | ✓ | ✓ | Joint | ✓ | ✗ |
| [4] 2025 | ✗ | ✗ | ✓ | ✓ | Pipeline | ✗ | ✗ |
| [5] 2023 | ✗ | ✓ | ✓ | ✗ | - | ✗ | ✗ |
| [6] 2023 | ✗ | ✓ | ✓ | ✓ | Pipeline | ✗ | ✗ |
| Ours | ✗ | ✓ | ✓ | ✓ | Pipeline | ✓ | ✓ |
| Masking | Example Sentence |
|---|---|
| Original text | BITTER APT组织是一个长期活跃的境外网络攻击组织。 |
| BERT | BITTER APT组织是一个长期活跃的[MASK][MASK]网络攻击组织。 |
| RoBERTa_wwm | BITTER APT组织是一个长期活跃的[MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK]。 |
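The masking example above contrasts BERT's character-level masking with RoBERTa_wwm's whole-word masking, which masks every character of a segmented word together. A minimal sketch of whole-word masking, assuming a precomputed word segmentation (in practice a Chinese segmenter such as jieba would supply it):

```python
def whole_word_mask(words, target_idx, mask_token="[MASK]"):
    """Mask every character of the word at target_idx (whole-word masking).
    `words` is a pre-segmented sentence; character-level BERT would instead
    mask individual characters independently of word boundaries."""
    out = []
    for i, w in enumerate(words):
        if i == target_idx:
            out.append(mask_token * len(w))  # one [MASK] per character
        else:
            out.append(w)
    return "".join(out)

# The segmentation below is illustrative, not the output of a real tokenizer.
words = ["BITTER APT", "组织", "是", "一个", "长期", "活跃", "的",
         "境外", "网络", "攻击", "组织", "。"]
print(whole_word_mask(words, 7))  # masks both characters of "境外"
```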
| Entity Type | Entity Description |
|---|---|
| Admin | Professionals who manage and protect computer systems |
| Device | Any computing device connected to a network |
| Hardware | All physical components and devices of a computer |
| Software | Programs and data running on a computer |
| Operating System | The fundamental software in a computer |
| Service | An application or process running on a network |
| Network | Multiple interconnected computing devices capable of communicating and exchanging information with each other |
| IP | A unique numerical label assigned to each device for locating and identifying it on a network |
| Port | A logical channel in a network used to distinguish between different services and applications |
| Protector | Individuals or teams responsible for protecting computer systems |
| Countermeasure | Techniques employed to safeguard computer systems |
| Adversary | Individuals or organizations attempting to infiltrate or damage computer systems |
| Attack Vector | Techniques used by adversaries to gain control of computer systems |
| Vulnerability | Security flaws present in devices |
| Malware | Malicious software programs designed to cause harm to computers |
| Worm | Programs that destroy data by infecting computers |
| Model | Macro Precision | Macro Recall | Macro F1 | Micro Precision | Micro Recall | Micro F1 |
|---|---|---|---|---|---|---|
| EDRMRC | 93.39% | 90.12% | 91.72% | 93.48% | 90.11% | 91.76% |
| RMRC | 92.58% | 88.15% | 90.29% | 92.28% | 87.93% | 90.05% |
| RoBERTa_wwm-RDCNN-CRF | 91.05% | 85.96% | 88.40% | 90.94% | 85.61% | 88.19% |
| RoBERTa_wwm-TENER-CRF | 89.78% | 80.76% | 84.87% | 90.15% | 80.89% | 85.27% |
| BERT-Mamba-RDCNN-CRF | 88.65% | 76.82% | 82.21% | 89.03% | 79.79% | 84.16% |
| BERT-BiLSTM-GRU-CRF | 71.10% | 75.37% | 73.09% | 68.50% | 74.24% | 71.25% |
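The macro- and micro-averaged scores reported for NER differ in how per-class counts are aggregated: macro averages the per-class F1 scores (each entity type weighted equally), while micro pools true/false positives and negatives before computing F1 (frequent types dominate). A self-contained sketch; the TP/FP/FN counts are illustrative, not taken from the experiments:

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts, guarding empty denominators."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def macro_micro(per_class):
    """per_class: list of (tp, fp, fn) tuples, one per entity type."""
    # Macro: average the per-class F1 scores.
    macro_f1 = sum(prf(*c)[2] for c in per_class) / len(per_class)
    # Micro: pool the counts first.
    tp = sum(c[0] for c in per_class)
    fp = sum(c[1] for c in per_class)
    fn = sum(c[2] for c in per_class)
    micro_f1 = prf(tp, fp, fn)[2]
    return macro_f1, micro_f1

# Two classes: one frequent and accurate, one rare and weak. Micro F1 sits
# closer to the frequent class; macro F1 is pulled down by the rare one.
print(macro_micro([(90, 10, 10), (5, 5, 5)]))
```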
| Model | Macro Precision | Macro Recall | Macro F1 | Micro Precision | Micro Recall | Micro F1 |
|---|---|---|---|---|---|---|
| EDRMRC | 84.14% | 75.01% | 77.49% | 78.79% | 73.04% | 75.81% |
| RMRC | 83.16% | 64.57% | 71.92% | 79.06% | 64.66% | 71.21% |
| RoBERTa_wwm-RDCNN-CRF | 70.84% | 59.96% | 63.66% | 76.92% | 62.68% | 68.97% |
| Model | Macro Precision | Macro Recall | Macro F1 | Micro Precision | Micro Recall | Micro F1 |
|---|---|---|---|---|---|---|
| EDRRC | 89.38% | 93.68% | 90.84% | 93.12% | 92.26% | 92.40% |
| RRC | 77.04% | 72.54% | 72.90% | 89.28% | 89.23% | 88.98% |
| BERT-based method | 51.00% | 56.00% | 48.00% | 62.00% | 60.00% | 59.00% |
| Relation Type | EDRRC Precision | EDRRC Recall | EDRRC F1 | RRC Precision | RRC Recall | RRC F1 | BERT Precision | BERT Recall | BERT F1 |
|---|---|---|---|---|---|---|---|---|---|
| alias_of | 92.31% | 97.30% | 94.74% | 89.74% | 90.91% | 90.32% | 61.00% | 60.00% | 60.00% |
| cooperate_with | 100.00% | 100.00% | 100.00% | 100.00% | 33.33% | 50.00% | 57.00% | 67.00% | 62.00% |
| related_to | 92.31% | 100.00% | 96.00% | 84.00% | 77.78% | 80.77% | 57.00% | 74.00% | 65.00% |
| uses | 81.82% | 75.00% | 78.26% | 76.56% | 84.48% | 80.33% | 66.00% | 57.00% | 61.00% |
| target_at | 97.37% | 88.10% | 92.50% | 93.50% | 95.90% | 94.68% | 62.00% | 76.00% | 68.00% |
| originated_from | 100.00% | 88.89% | 94.12% | 87.10% | 79.41% | 83.08% | 50.00% | 35.00% | 41.00% |
| launch | 60.00% | 100.00% | 75.00% | 66.67% | 100.00% | 80.00% | 40.00% | 50.00% | 44.00% |
| consist_of | 100.00% | 100.00% | 100.00% | 100.00% | 91.67% | 95.65% | 50.00% | 17.00% | 25.00% |
| operating_in | 100.00% | 100.00% | 100.00% | 96.59% | 94.44% | 95.51% | 73.00% | 42.00% | 54.00% |
| develop | 70.00% | 87.50% | 77.78% | 53.33% | 50.00% | 51.61% | 33.00% | 38.00% | 35.00% |
| Model | CDTier Accuracy | MyCDTier Accuracy |
|---|---|---|
| EDRRC | 92.26% | 94.23% |
| RRC | 89.23% | 92.26% |
| BERT | 60.00% | 82.95% |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Li, Y.; Li, X.; Zhang, Y.; Liu, Z.; Li, X.; Xu, Q.; Chang, X. EDAER: Entropy-Driven Approach for Entity and Relation Extraction in Chinese Cyber Threat Intelligence. Entropy 2026, 28, 261. https://doi.org/10.3390/e28030261