Open Access Article
A Study on Improving the Automatic Classification Performance of Cybersecurity MITRE ATT&CK Tactics Using NLP-Based ModernBERT and BERTopic Models
by Jaehwan Baek 1, Jeonghoon O 2, Seungwoo Jeong 2 and Wooju Kim 3,*
1 Graduate Program in Technology Policy, Yonsei University, Seoul 03722, Republic of Korea
2 Independent Researcher, Seoul 03722, Republic of Korea
3 Department of Industrial Engineering, Yonsei University, Seoul 03722, Republic of Korea
* Author to whom correspondence should be addressed.
Electronics 2025, 14(22), 4434; https://doi.org/10.3390/electronics14224434 (registering DOI)
Submission received: 10 October 2025 / Revised: 31 October 2025 / Accepted: 4 November 2025 / Published: 13 November 2025
Abstract
Cyber Threat Intelligence (CTI) reports are essential resources for identifying the Tactics, Techniques, and Procedures (TTPs) of hackers and cyber threat actors. However, these reports are often lengthy and unstructured, which limits their suitability for automatic mapping to the MITRE ATT&CK framework. This study designs and compares five hybrid classification models that combine statistical features (TF-IDF), transformer-based contextual embeddings (BERT and ModernBERT), and topic-level representations (BERTopic) to automatically classify CTI reports into 12 ATT&CK tactic categories. Experiments using the rcATT dataset, consisting of 1490 public threat reports, show that the model integrating TF-IDF and ModernBERT achieved a micro-precision of 72.25%, reflecting a 10.07-percentage-point improvement in detection precision compared with the baseline. The model combining TF-IDF and BERTopic achieved a micro F0.5 of 67.14% and a macro F0.5 of 63.20%, demonstrating balanced performance across both frequent and rare tactic classes. These findings indicate that integrating statistical, contextual, and semantic representations can improve the balance between precision and recall while enabling clearer interpretation of model outputs in multi-label CTI classification. Furthermore, the proposed model shows potential applicability for improving detection efficiency and reducing analyst workload in Security Operations Center (SOC) environments.
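The abstract reports micro- and macro-averaged F0.5 scores, so a small worked sketch of how those averages differ in a multi-label setting may help. This is not the authors' evaluation code: it assumes the standard F-beta definition with beta = 0.5 (precision weighted more heavily than recall), and the function names and the toy label matrices are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[int]]  # rows = reports, columns = tactic labels (0/1)


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from raw counts; returns 0.0 when the score is undefined."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def counts_per_label(y_true: Matrix, y_pred: Matrix) -> List[Tuple[int, int, int]]:
    """Return (tp, fp, fn) for each label column."""
    counts = []
    for j in range(len(y_true[0])):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    return counts


def micro_f_beta(y_true: Matrix, y_pred: Matrix, beta: float = 0.5) -> float:
    """Pool counts over all labels, then compute one score.
    Frequent labels dominate the result."""
    counts = counts_per_label(y_true, y_pred)
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return f_beta(tp, fp, fn, beta)


def macro_f_beta(y_true: Matrix, y_pred: Matrix, beta: float = 0.5) -> float:
    """Compute one score per label, then take the unweighted mean.
    Rare labels count as much as frequent ones."""
    counts = counts_per_label(y_true, y_pred)
    return sum(f_beta(tp, fp, fn, beta) for tp, fp, fn in counts) / len(counts)


# Toy data (hypothetical): 4 reports, 3 tactic labels.
toy_true = [[1, 0, 1], [1, 1, 0], [0, 1, 0], [1, 0, 0]]
toy_pred = [[1, 0, 0], [1, 1, 0], [0, 0, 0], [1, 1, 0]]

print("micro F0.5:", micro_f_beta(toy_true, toy_pred))
print("macro F0.5:", macro_f_beta(toy_true, toy_pred))
```

In the toy data the third label is rare and never predicted, which pulls the macro average down more than the micro average. That gap is why reporting both figures, as the abstract does, gives a view of performance on frequent and rare tactic classes.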