1. Introduction
In recent years, network attacks have shown an increasing trend [1]. Among them, Advanced Persistent Threats (APTs) stand out as a complex, long-term, and targeted form of cyber attack. Launched by highly skilled attackers, APTs are typically covert, pursue specific targets, and, once successful, can cause significant damage [2]. Defending against APTs therefore relies not only on traditional intrusion detection techniques but also on a deep understanding of the attack chain, attack techniques, and the attacker's intentions [3]. How to trace the origin of an APT attack [4], identify the attacking organization, reveal the attack methods, and summarize its tactics has become an important topic in cybersecurity research.
Cyber Threat Intelligence (CTI) serves as a core means of addressing these challenges. CTI refers to the actionable knowledge derived from collecting, processing, and analyzing raw data related to cyber threats; it is used to uncover attack mechanisms [5], identify threat indicators [6], assess potential impacts, and provide decision support for defense [7]. Unlike traditional security mechanisms that rely on signature matching or rule-based detection, CTI helps security teams gain an advantage at every stage of the attack chain: before the attack, CTI provides intelligence on potential attackers to predict and prevent threats [8]; during the attack, it supplies key indicators such as malicious domains, IP addresses, and file hashes to assist in detecting covert intrusions [9]; after the attack, it supports event tracing and pattern recognition, revealing the attacker's behavioral traits and strengthening subsequent defenses [10]. CTI can thus fill the gaps left by traditional intrusion detection, providing systematic, intelligence-driven support for APT detection and defense.
Despite its undeniable importance, threat intelligence is often disseminated in unstructured formats such as web pages [11], emails [12], and other sources. This unstructured nature makes it difficult to quickly extract relevant information from massive datasets. Named Entity Recognition (NER), the task of identifying and classifying specific entities (such as malware names, IP addresses, attack techniques, and related organizations) in unstructured text, has therefore become a crucial technique for automatically extracting key information related to cyberattacks [13], and NER for threat intelligence plays an important role in maintaining network security [14].
In this context, Natural Language Processing (NLP) technology has gradually become an important tool for threat intelligence processing, with NER serving as its core method for information extraction. With NER, large volumes of dispersed security texts can be transformed into usable structured data, enabling knowledge-based descriptions of APT activities and cross-source correlation analysis. However, owing to the abundance of specialized terms, inconsistent naming, and the scarcity of labeled corpora in the cybersecurity field, existing general-purpose NER methods still face performance bottlenecks in the threat intelligence domain [15]. Domain-specific NER research for threat intelligence has therefore become a key direction for improving APT detection and tracing capabilities.
At present, research on Named Entity Recognition (NER) in the field of Cyber Threat Intelligence (CTI) faces multiple challenges, primarily arising from dataset-level and model-level issues.
First, at the dataset level, existing datasets fail to meet research needs. Most current datasets rely on semi-structured text (such as forums, blogs, and emails) and lack standardization. They do not provide large-scale, uniformly annotated corpora, nor do they ensure consistent data distributions or label definitions. Consequently, there is no unified evaluation benchmark, making it difficult to measure actual performance. Furthermore, high-quality, manually annotated cybersecurity datasets are extremely scarce, especially for domain-specific NER tasks. Existing datasets suffer from incomplete annotations, and most corpora contain limited entity types, often failing to fully cover the complex and diverse entities within the CTI domain. Such dataset deficiencies restrict model training effectiveness and generalization ability, thereby hindering the advancement of CTI-oriented NER research.
Second, at the model level, existing CTI-specific NER models mainly focus on dictionary-based, rule-based, machine learning, and traditional deep learning methods, without keeping pace with the latest paradigm shifts in general NER research. When using general-purpose NER models for CTI-NER tasks, threat intelligence is often treated as ordinary text, overlooking the unique characteristics of cybersecurity reports. As a result, these models fail to effectively leverage domain-specific knowledge, making it difficult to capture the technical details embedded in threat intelligence.
To address the above problems, we propose the following approaches, aiming to solve these two critical challenges.
First, to address dataset limitations, we improve the standards of entity definition and the structure and novelty of the samples. We begin by establishing a generalized standard for entity type definitions, drawing on the STIX 2.1 standard [16], an important framework in the threat intelligence field that provides a structured approach to representing and exchanging cybersecurity threat information. STIX 2.1 enables analysts, organizations, and automated systems to share information efficiently and in a standardized manner, so we adopt it as the foundation of our entity classification scheme. To transform the widely available unstructured and unlabeled semi-structured text into structured training samples, we designed an annotation system and a rigorous annotation methodology, recruiting volunteers with professional backgrounds to perform manual labeling.
Second, to address model limitations, we incorporate recent advances in general-purpose NER research and use Large Language Models (LLMs) as the backbone of our approach. To capture the unique characteristics of cybersecurity reports within CTI, we inject expert knowledge into the model through prompt engineering. In order to provide precise guidance for knowledge injection, we employ three types of prompts (prefix prompts, demonstration prompts, and template prompts) to steer the model toward producing the desired outputs.
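To give a concrete sense of how these three prompt types can steer an LLM, the sketch below shows one plausible way of assembling them. The prompt wording and the build_prompt helper are illustrative assumptions, not the exact construction used by PROMPT-BART, which is specified in Section 4.

```python
def build_prompt(sentence: str) -> str:
    """Illustrative assembly of the three prompt types; the actual
    prompt content used by PROMPT-BART is defined in Section 4."""
    # Prefix prompt: injects domain context (hypothetical wording).
    prefix = ("The following sentence comes from a cyber threat "
              "intelligence report describing an APT campaign.")
    # Demonstration prompt: a labeled in-domain example (hypothetical).
    demonstration = ('Example: in "APT28 deployed Mimikatz", '
                     '"APT28" is a threat actor (ACT) and '
                     '"Mimikatz" is a tool (TOOL).')
    # Template prompt: the slot the model must fill (hypothetical).
    template = f'Sentence: "{sentence}" Entities:'
    return " ".join([prefix, demonstration, template])

print(build_prompt("The group used a spear-phishing email."))
```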
In this paper, we tackle two key challenges in CTI-oriented NER research following the above approaches. Our contributions are as follows:
1. Dataset improvement: We propose a new entity type definition standard based on the STIX 2.1 framework to enhance the structuring and annotation of CTI datasets. We further develop a systematic annotation methodology and recruit domain experts to annotate large-scale datasets, thereby improving data quality and coverage.
2. Model enhancement: We introduce a novel NER model, PROMPT-BART, which leverages large language models (LLMs) and incorporates domain-specific expert knowledge through prompt engineering. We design three types of prompts (prefix, demonstration, and template prompts) to guide the model in effectively capturing technical details in cybersecurity reports.
3. Evaluation and analysis: Using our newly created CTI NER dataset, we conduct a comprehensive evaluation of the proposed model, demonstrating significant improvements in accuracy and generalization compared to traditional NER methods.
The remainder of this paper is organized as follows: Section 2 reviews related work, Section 3 introduces the CTINER dataset, Section 4 presents the PROMPT-BART model based on prompt learning, Section 5 provides experimental evaluations, and Section 6 concludes the paper.
3. Construction of Cyber Threat Intelligence Datasets
Through an investigation and analysis of existing datasets, it has been observed that publicly available NER datasets in the domain of cyber threat intelligence generally suffer from several limitations. First, most datasets are outdated and fail to incorporate emerging entities that have appeared in recent years. Second, their scale is relatively small, which restricts their suitability for large-scale training and applications. Third, many datasets define entity labels without adhering to the STIX 2.1 specification, resulting in difficulties when applying them to downstream research tasks. Finally, data distribution within these datasets is imbalanced, with certain entity categories being significantly under-represented, which compromises completeness and introduces bias. These issues collectively limit the practical applicability and research value of existing datasets.
To address these problems, this study redefines entity categories in accordance with the STIX 2.1 specification and common paradigms of cyber threat intelligence. The focus is placed on entities that contribute to threat understanding and detection, while discarding easily falsified indicators such as file hashes and IP addresses. By conducting a semi-automated annotation process on a large collection of APT (Advanced Persistent Threat) reports, we ultimately construct a domain-specific dataset for entity recognition, named CTINER. Our dataset has been publicly released on GitHub and is available for download.
3.1. Dataset Extraction Methodology and Corresponding Modules
The construction process of the CTINER dataset is illustrated in Figure 1. Specifically, the dataset is built through seven modules: threat intelligence acquisition, data preprocessing, entity definition, annotation and validation, data format conversion, annotated data cleaning, and rare data supplementation. A detailed description of each module is provided below.
3.1.1. Threat Intelligence Acquisition Module
The dataset primarily consists of APT analysis reports, which describe in detail the attack methods employed against specific organizations and their consequences. Since constructing the dataset requires a large volume of raw intelligence corpora, we first collected documents from cybersecurity companies and open-source repositories such as GitHub. The APT reports published by security companies are summarized in Table 1. All collected reports were converted into TXT format. To ensure the high relevance of the corpus to the cyber threat intelligence domain, the keyword "APT" was explicitly specified during the search and selection process. Finally, only the main body of each report was retained as the textual content for further processing.
3.1.2. Data Preprocessing Module
After obtaining the raw corpus, we performed standardization and cleaning procedures to transform the data into valuable training samples.
First, text formatting was conducted. Since some paragraphs in the raw text lacked punctuation at the end, which affected sentence segmentation, we examined the final character of each paragraph. If a punctuation mark was already present, the text was left unchanged; otherwise, an English period was appended.
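A minimal sketch of this formatting step, assuming a simple set of terminal punctuation marks (the function name and the punctuation set are our own choices):

```python
SENTENCE_END = {".", "!", "?", ":", ";"}

def terminate_paragraph(paragraph: str) -> str:
    """Append an English period if the paragraph does not already
    end with a punctuation mark, so sentence segmentation works."""
    paragraph = paragraph.rstrip()
    if paragraph and paragraph[-1] not in SENTENCE_END:
        paragraph += "."
    return paragraph
```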
Next, sentence segmentation and tokenization were applied. Given that the reports are typically presented in long paragraphs while entity annotation is sentence-based, we employed nltk.sent_tokenize(all_text) to split the corpus into individual sentences, ensuring that each sentence occupies one line. Since entity annotation is token-level, we further applied nltk.word_tokenize(sentence) to separate words and punctuation with spaces.
Subsequently, data cleaning was performed. Using automated scripts, we corrected spacing inconsistencies and removed sentences that were either too short (fewer than 10 tokens) or excessively long (more than 50 tokens), as such sentences were found to contain little valuable information or entity content. In addition, we detected a small number of garbled sentences, most of which did not begin with uppercase English letters. Therefore, sentences whose initial character was not within the range “A–Z” were discarded.
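Putting the segmentation, tokenization, and filtering rules together, the pipeline could be sketched as follows. The 10- and 50-token thresholds and the A-Z check follow the description above, while the function itself is an assumed reconstruction:

```python
import nltk

nltk.download("punkt", quiet=True)  # sentence/word tokenizer models

def preprocess(all_text: str) -> list[str]:
    """Segment, tokenize, and filter raw report text as described above."""
    clean_sentences = []
    for sentence in nltk.sent_tokenize(all_text):
        stripped = sentence.lstrip()
        # Discard likely garbled sentences that do not start with A-Z.
        if not stripped or not ("A" <= stripped[0] <= "Z"):
            continue
        tokens = nltk.word_tokenize(sentence)
        # Remove sentences that are too short (<10 tokens) or too long (>50 tokens).
        if len(tokens) < 10 or len(tokens) > 50:
            continue
        # One sentence per line, words and punctuation separated by spaces.
        clean_sentences.append(" ".join(tokens))
    return clean_sentences
```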
After these preprocessing steps, the corpus was transformed into a clean and standardized form suitable for use as training samples, as illustrated in Figure 2.
3.1.3. Entity Pre-Definition Module
In the process of defining entities, the first step is to determine which types of entities should be extracted, namely, which attack indicators in CTI carry substantial analytical value. As shown in Figure 1, the pyramid model [49] illustrates the relative practical value of different attack indicators. We excluded the three lower-value and easily mutable categories at the base of the pyramid (file hashes, IP addresses, and domain names), since adversaries frequently alter such indicators to evade detection.
Among the remaining layers of the pyramid, the following categories were retained: network or host characteristics, including URLs, sample files, and Operating Systems (OSs); attack tools, including benign tools and malware; Tactics, Techniques, and Procedures (TTPs), including vulnerabilities, campaigns, and malicious emails.
Subsequently, we referred to the STIX 2.1 specification, which defines 18 types of domain objects in threat intelligence, such as attack patterns, campaigns, courses of action, intrusion sets, infrastructure, indicators, malware, malware analysis, threat actors, reports, tools, vulnerabilities, and others. Finally, considering practical attack scenarios, we additionally incorporated “attack target” as a crucial entity type that warrants attention.
Based on these considerations, we defined a total of 13 entity labels, as shown in Table 1. These labels not only align with the domain objects specified in widely adopted standards, thereby enabling the extraction of comprehensive attack descriptors, but also reflect realistic attack scenarios. This design ensures standardized transmission of threat intelligence and enhances compatibility between the dataset and both current and future threat analysis systems.
3.1.4. Annotation and Verification Module
Since this dataset is specifically designed for the cyber threat intelligence domain, it contains unique entity types and domain-specific vocabulary. As a result, automated annotation tools are not sufficiently accurate in recognizing the relevant named entities. Therefore, this study adopts a manual annotation approach. Annotators were first provided with preliminary training: each annotator was required to read a certain number of threat intelligence reports and acquire the necessary domain knowledge before being allowed to perform annotations.
The data to be annotated consisted of the TXT corpus produced by the data preprocessing module, while the entity categories followed the 13 labels established in the entity definition module. The annotation process was carried out using the Doccano system, and an example of annotated data is shown in Figure 3.
3.1.5. Data Format Conversion Module
To facilitate processing and visualization, the module first converts the JSONL annotation files exported from Doccano into JSON format, and subsequently into TXT format, adopting the BIO labeling scheme for the named entity recognition task.
The final CTINER dataset consists of two columns, where each line contains a token and its corresponding label, separated by a space. Different sentences are separated by a blank line. The entity label format is {B, I}-{entity type} (e.g., B-MAL, I-MAL): {B, I} are boundary tags marking whether a token begins or continues an entity, while {ACT, TAR, CAM, IDTY, VUL, TOOL, MAL, LOC, TIME, FILE, URL, OS, EML} is the set of predefined entity categories. In this study, the label "O-O" is simplified to "O", indicating non-entity tokens.
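As an illustration of this conversion, the sketch below assumes Doccano's span-offset JSONL export and the space-separated tokens produced by the preprocessing module; the example sentence and its labels are invented for demonstration:

```python
import json

def spans_to_bio(text, spans):
    """Convert character-offset span annotations to token-level BIO tags.
    Assumes tokens are already space-separated by the preprocessing module."""
    tokens, tags, pos = [], [], 0
    for tok in text.split():
        start = text.index(tok, pos)      # character offset of this token
        end = start + len(tok)
        pos = end
        tag = "O"
        for s, e, label in spans:
            if start >= s and end <= e:   # token lies inside an annotated span
                tag = ("B-" if start == s else "I-") + label
                break
        tokens.append(tok)
        tags.append(tag)
    return list(zip(tokens, tags))

# Hypothetical Doccano-style JSONL record (labels invented for illustration):
record = json.loads('{"text": "APT28 used Mimikatz .", '
                    '"label": [[0, 5, "ACT"], [11, 19, "TOOL"]]}')
for tok, tag in spans_to_bio(record["text"], record["label"]):
    print(tok, tag)
# APT28 B-ACT
# used O
# Mimikatz B-TOOL
# . O
```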
3.1.6. Annotation Data Cleaning Module
Cyber threat intelligence exhibits unique characteristics: its sentences are generally long, but the density of entities is relatively low. As a result, a large number of sentences contain no entities. In this study, such sentences were removed in order to increase the overall proportion of entities and to address the problem of imbalanced label distribution. Finally, the dataset was divided into training, testing, and validation sets in a 7:2:1 ratio.
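A minimal sketch of this cleaning and splitting step (the input file name, random seed, and shuffling are assumptions; the 7:2:1 ratio follows the text):

```python
import random

def load_sentences(path: str) -> list[list[str]]:
    """Read a BIO file in which blank lines separate sentences."""
    with open(path, encoding="utf-8") as f:
        return [block.splitlines() for block in f.read().strip().split("\n\n")]

sentences = load_sentences("ctiner_bio.txt")  # hypothetical file name

# Keep only sentences that contain at least one entity token,
# i.e., at least one line whose label is not "O".
with_entities = [s for s in sentences
                 if any(not line.endswith(" O") for line in s)]

# 7:2:1 split into training, testing, and validation sets.
random.seed(42)  # assumed seed for reproducibility
random.shuffle(with_entities)
n = len(with_entities)
train = with_entities[: int(0.7 * n)]
test = with_entities[int(0.7 * n): int(0.9 * n)]
valid = with_entities[int(0.9 * n):]
```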
3.1.7. Rare Data Supplementation Module
Certain entity types are underrepresented in APT reports due to the nature of the reports. To address this, we collected open-source data from the internet to supplement the underrepresented entity types, guided by a statistical analysis of the dataset.
3.2. Dataset Overview and Entity Distribution
The dataset is split into training, testing, and validation sets with a ratio of 7:2:1, as shown in Table 2. The dataset comprises 16,573 sentences, 459,308 words, 42,549 entities, and 60,167 entity tokens. On average, each sentence contains approximately 28 words, indicating that sentences in cyber threat intelligence texts are relatively long yet contain few entities. This characteristic makes it challenging to construct a knowledge-rich named entity recognition dataset for the cyber threat intelligence domain.
The top five most frequent entities in the CTINER dataset are IDTY (6778), TOOL (6544), TAR (6358), TIME (5171), and ACT (4607), which play a critical role in network threat assessment and enhancing cybersecurity.
3.3. Comparison with Other Datasets: DNRTI, CTIReports, and APTNER
To assess the advancement and rationality of the CTINER dataset, we compare it with other open-source datasets in the cyber threat intelligence domain. The comparison covers the rationality of predefined labels, entity density, and overall dataset characteristics. The number of labels and their specific categories for each dataset are shown in Table 3.
3.3.1. Label Rationality
The APTNER dataset [48] defines 21 entity types, which increases annotation complexity and compromises dataset quality. Certain entity types, such as IP addresses and MD5 values, have limited utility due to their time-sensitive nature, as attackers frequently alter these indicators. To address this, CTINER eliminates these time-sensitive entities and merges those with overlapping meanings (e.g., merging security teams and identity authentication into IDTY, domain names and URLs into URL, and vulnerability names and identifiers into VUL).
Compared with the DNRTI dataset [46], CTINER redefines the attack objective as "attack target" in line with the threat information expression model. DNRTI also suffers from overlapping or ambiguously defined entity types, such as spear phishing, and lacks critical entities such as malware.
The CTIReports dataset [47] focuses primarily on IP addresses, malware, and URLs, which is insufficient for downstream tasks such as knowledge graph construction or machine-readable intelligence generation.
In contrast, the CTINER dataset incorporates the STIX 2.1 specification, the pyramid model, and the threat information expression model, defining 13 distinct entity types. These types are independent and better support downstream research in threat intelligence.
3.3.2. Dataset Scale Comparison
An analysis of dataset characteristics (sentence count, word count, and entity token count) reveals that the CTINER dataset contains significantly more sentences, words, and entity tokens than the three comparison datasets. Figure 4 summarizes the CTINER dataset alongside the other cyber threat intelligence datasets, together with the split ratios of their training, testing, and validation sets.
6. Conclusions
In this study, we improve the NER task for CTI in two key areas: dataset improvement and model enhancement. To improve the structure and annotation of Cyber Threat Intelligence (CTI) datasets, we propose a new entity type definition standard based on the STIX 2.1 framework. Additionally, we develop a systematic annotation methodology and collaborate with domain experts to annotate a large-scale dataset, CTINER, significantly enhancing both data quality and coverage. We introduce a novel Named Entity Recognition (NER) model, PROMPT-BART, which integrates Large Language Models (LLMs) with domain-specific knowledge through prompt engineering. We design three distinct types of prompts (prefix, demonstration, and template prompts) to guide the model in effectively extracting technical details from cybersecurity reports.
Based on the experimental results, the PROMPT-BART model demonstrates significant improvements over existing NER models. PROMPT-BART achieves an F1 score improvement ranging from 4.26% to 8.3% over the baseline deep learning models, showcasing the superiority of more advanced architectures in the NER field. It also outperforms the prompt-based Template-NER model by 1.31% in F1 score, highlighting its advantage in capturing technical details from cybersecurity reports compared to generic methods. Furthermore, the ablation study confirms that each prompt component plays a crucial role in maximizing the model’s performance.
In future research, we will further investigate NER in the CTI domain based on more advanced architectures, such as incorporating a multi-agent framework for more accurate identification of error-prone entity types. This will help improve the performance of NER models and contribute more effectively to maintaining cybersecurity.