Article

Threat Intelligence Extraction Framework (TIEF) for TTP Extraction

1 Department of Computer Engineering and Information Technology, Veermata Jijabai Technological Institute, Mumbai 400019, Maharashtra, India
2 Department of Cyber Security, Shah and Anchor Kutchhi Engineering College, Mahavir Education Trust Chowk, Mumbai 400088, Maharashtra, India
3 Department of Electrical Engineering, Veermata Jijabai Technological Institute, Mumbai 400019, Maharashtra, India
* Author to whom correspondence should be addressed.
J. Cybersecur. Priv. 2025, 5(3), 63; https://doi.org/10.3390/jcp5030063
Submission received: 28 June 2025 / Revised: 14 August 2025 / Accepted: 29 August 2025 / Published: 3 September 2025
(This article belongs to the Collection Machine Learning and Data Analytics for Cyber Security)

Abstract

The increasing complexity and scale of cyber threats demand advanced, automated methodologies for extracting actionable cyber threat intelligence (CTI). The automated extraction of Tactics, Techniques, and Procedures (TTPs) from unstructured threat reports remains a challenging task, constrained by the scarcity of labeled data, severe class imbalance, semantic variability, and the complexity of multi-class, multi-label learning for fine-grained classification. To address these challenges, this work proposes the Threat Intelligence Extraction Framework (TIEF), designed to autonomously extract Indicators of Compromise (IOCs) from heterogeneous textual threat reports and represent them according to the STIX 2.1 standard for standardized sharing. TIEF employs the DistilBERT Base-Uncased model as its backbone, achieving an F1 score of 0.933 for multi-label TTP classification, while operating with 40% fewer parameters than traditional BERT-base models and preserving 97% of their predictive performance. Distinguishing itself from existing methodologies such as TTPDrill, TTPHunter, and TCENet, TIEF incorporates a multi-label classification scheme capable of covering 560 MITRE ATT&CK classes comprising techniques and sub-techniques, thus facilitating a more granular and semantically precise characterization of adversarial behaviors. The integration of BERTopic modeling enables the clustering of semantically similar textual segments and captures the variations in threat report narratives. By operationalizing sub-technique-level discrimination, TIEF contributes to context-aware automated threat detection.

1. Introduction

The examination of advanced adversarial tactics and technologies has become imperative for the timely and effective mitigation of sophisticated cyber threats. Achieving accurate detection requires a comprehensive analysis of adversarial artifacts, including malicious binaries, IP addresses, domains, and vulnerabilities. This analysis extends to evaluating correlations among attack vectors, exploited techniques, targeted assets, and the temporal patterns of malicious activity [1]. Cyber threat intelligence (CTI) evaluates historical threat information, including attacker motives, tactics, and potential business impact during an attack, to generate actionable insights. Publicly disclosed threat reports often provide valuable intelligence, encapsulating strategic and technical observations that enable analysts to identify and respond to emerging threats [2]. CTI can also describe adversarial techniques that mount attacks by analyzing traffic patterns, communication frequencies, timing intervals, and message size distributions to infer the operational behaviors of networks [3,4,5].
CTI includes Indicators of Compromise (IOCs) and Tactics, Techniques, and Procedures (TTPs) used by attackers [6]. IOCs are discrete, observable artifacts such as malware signatures, IP addresses, email addresses, or file hashes that serve as forensic evidence of a potential breach. They are inherently reactive, as they can only be identified after a compromise, and they are static in nature, which aids in detecting immediate threats. Adversaries can easily modify or obfuscate IOCs, such as changing file hashes, IP addresses, or domain names, to evade detection and reuse the same underlying tactics in future campaigns. This adaptability diminishes the long-term effectiveness of IOCs, and relying solely on these static indicators fails to reveal the deeper correlations between adversary behavior and attack progression [7].
In contrast, TTPs characterize the operational behaviors, strategic methodologies, and specific tools employed by attackers [7]. IOCs facilitate the detection of system breaches by revealing specific indicators, while TTPs provide a broader understanding of attacker intent and modus operandi, enabling proactive threat hunting and pattern-based correlation across multiple intrusion events. To achieve a more comprehensive defensive posture, threat intelligence must incorporate both IOCs and TTPs in a structured and actionable format. When incorporated into activities such as red teaming or penetration testing, this intelligence enables organizations to proactively identify systemic vulnerabilities and simulate adversarial behavior, thereby strengthening detection and response capabilities.
A case in point is the SolarWinds/SUNBURST attack conducted by APT29, a supply chain compromise in which a backdoor was injected into SolarWinds Orion updates, authentication certificates were stolen or forged, and application layer protocols, particularly web traffic, were manipulated. These techniques were detailed in threat intelligence reports [8,9], which required manual extraction to identify and disseminate defensive measures. A manual process of mapping CTI is inherently error-prone, demands extensive security expertise, and is highly resource-intensive, making it unsustainable given the increasing volume and velocity of CTI sharing [10]. Automating TTP extraction from these reports could bridge the gap between raw intelligence and actionable defense by enabling faster correlation and rule deployment in the Security Information and Event Management System (SIEM), thereby accelerating incident response across global enterprises and preventing widespread compromise. Threat reports published by security firms contain rich information on attacker tools, techniques, and procedures, but are typically present in unstructured, natural language formats. Analyzing these reports to uncover recurring attack patterns enables organizations to derive actionable CTI, particularly in the form of TTPs and IOCs [11].
Effective extraction of TTPs from unstructured threat intelligence reports requires a rich dataset that spans a broad spectrum of adversarial behaviors with adequate frequency across diverse attack classes. MITRE ATT&CK (Adversarial Tactics, Techniques, and Common Knowledge) is a knowledge base of adversary behavior that covers the Enterprise, Industrial Control System (ICS), and Mobile domains [12]. With over 350 techniques and 560 sub-techniques, MITRE ATT&CK offers valuable insights into the tactical objectives of threat actors, serving as a key data source for this research. Incorporating sub-techniques into the threat classification process improves the granularity of attack attribution, rather than relying solely on technique-based classification. For instance, grouping all variations of “credential dumping” into one category obscures the distinctions in adversarial tooling, memory access methods, or privilege escalation strategies. By including a corpus of 560 sub-techniques recorded by MITRE’s worldwide threat intelligence community, comprehensive coverage of the tactical knowledge for the threat extraction framework is ensured. Such depth prevents selection bias, enables precise identification of sub-techniques, and improves detection of both well-known and emerging threat vectors.
This research applies natural language processing (NLP) techniques to extract TTPs from textualized security analysis reports and generate the STIX 2.1 format from the derived threat intelligence. The primary contributions of this work are as follows:
  • Proposes a novel Threat Intelligence Extraction Framework (TIEF) for the automated extraction of TTPs from unstructured threat reports to map IOCs to MITRE ATT&CK sub-techniques and the generation of threat intelligence in STIX 2.1 format for interoperability;
  • Designs and implements a multi-label classification model capable of identifying 560 sub-techniques across the MITRE ATT&CK domains of ICS, Enterprise, and Mobile, enabling precise sub-technique-level identification of attacker behaviors;
  • Constructs a TTP classifier dataset through the application of Large Language Model (LLM)-driven data generation and augmentation techniques that mitigate class imbalance and ensure equitable representation of low-frequency sub-techniques, enhancing the classifier’s robustness and granularity;
  • Introduces a sentence grouping methodology based on semantic correlation rather than simple keyword matching that captures contextual relationships within threat reports, improving adaptability to evolving attack patterns.
The remainder of this paper is organized as follows: Section 2 summarizes the relevant literature. Section 3 briefly describes the preliminaries, i.e., the elucidation of key terms and concepts relevant to the research. Section 4 describes the TIEF used for the TTP extraction process. Section 5 discusses the results of the proposed methods, and Section 6 presents the conclusion and directions for future research.

2. Literature Review

The automated extraction of IOCs and TTPs from unstructured threat reports has garnered significant research attention over recent years. Early efforts, such as TTPDrill, employed term frequency–inverse document frequency (TF–IDF) with BM25 weighting to classify threat actions and construct TTPs in the STIX 2.1 format [13]. Although TTPDrill achieved promising results with 84% precision and 82% recall, its reliance on predefined ontologies limited its generalizability across various CTI sources and hindered its performance in composite actions involving multiple techniques. Similarly, rcATT, an automated tool for extracting TTPs from threat reports in STIX notation, used multi-label text classification with linear Support Vector Machines (SVMs) to classify MITRE ATT&CK techniques, obtaining an F1 score of 80% [14]. However, the absence of sub-technique-level granularity constrained its utility for detailed attribution.
The integration of transformer-based models, such as CTI-BERT and SecureBERT, demonstrated improved masked-word prediction and classification performance in cybersecurity corpora, along with domain-specific pre-training [15,16]. BERT’s contextual embeddings enhanced semantic disambiguation in unstructured text, improving adversary behavior classification [17]. The BERT-Conditional Random Field (CRF) model used Named Entity Recognition (NER), removed intermediate BiLSTM layers, and directly fed BERT embeddings to CRF layers. The model achieved 82.64% accuracy in real-world scenarios and demonstrated superior performance with threat intelligence containing more than 50 characters, achieving 94.93% average precision [18]. Addressing class imbalance in TTP classification, [19] applied oversampling methods, specifically Synthetic Minority Over-sampling Technique (SMOTE) and Easy Data Augmentation (EDA), on a composite corpus derived from TRAM and MITRE ATT&CK datasets. This increased the F1 score from approximately 60% to 80%, improving generalizability across under-represented TTP labels.
To address the challenges of semantic interpretation, TCENet integrated IPs, CVEs, and URLs as structured cues within the threat text to improve context-sensitive TTP classification, achieving 94.1% precision in six TTP categories [7]. The narrow contextual window and limited dataset restricted TCENet’s scalability for complex multi-stage campaigns. TTPHunter extended transformer-based approaches with contextual sentence embeddings and threshold-based filtering to handle noisy CTI text, though it struggled with multi-label associations and under-represented classes [20]. Its successor, TTPXHunter, incorporated SecureBERT and new augmentation strategies, achieving an F1 score of 97.09% [21], yet remained constrained by a sliding-window approach and single-label classification. The sliding-window approach risks missing contextual information distributed across multiple sentences. Building upon transformer foundations, [22] incorporated entity recognition, semantic similarity analysis, and multi-label classification to extract and categorize TTPs using the WAVE-27 dataset, which comprises 27 MITRE techniques and seven tactics, to obtain a 97.00% micro F1 score.
In the domain of topic modeling, U-BERTopic, designed for urgency-aware topic modeling, has demonstrated enhanced capabilities in the identification of cybersecurity events while preserving topic diversity and coherence compared with traditional models [23,24]. Integrating BERT models with knowledge graphs has improved the understanding of cybersecurity semantics and facilitated better prediction of threat actor behaviors [25]. K-CTIAA improved the accuracy of threat action extraction by integrating a pre-trained model with a knowledge graph to capture network security semantics, while simultaneously addressing knowledge noise through the use of a visibility matrix [26]. The LANCE system addressed the challenge of constructing ground truth datasets for IOC extraction, reducing analyst workload by 43% and improving precision through explainable, context-aware labeling. However, its reliance on human validation limited the scalability for fully automated pipelines [10].
Recent benchmarking initiatives, such as CTIBench and AttackSeqBench, have created benchmark datasets and evaluated LLM capabilities for TTP reasoning and sequential attack understanding, highlighting the gaps in the performance of the existing model when handling temporally ordered threat sequences [27,28]. UniTTP, a unified framework consolidating TTP mapping across multiple phases of the cyber kill chain via multi-task learning, improved robustness against unseen threat patterns [29]. More recently, an evolving memory-driven framework has been proposed to enable explainable TTP assignments, enhancing transparency and human oversight in automated extraction pipelines [30].
Despite these advances, most state-of-the-art systems remain focused on high-level technique classification and do not capture sub-technique-level detail essential for precise attribution and mitigation. For example, grouping all credential dumping methods into a single category, T1003, obscures the distinctions between LSASS memory scraping (T1003.001) and DCSync attacks (T1003.006), each requiring different detection and defense strategies. Furthermore, current frameworks often struggle to process reports containing multiple sub-techniques in a single passage, where high textual similarity between related techniques complicates semantic differentiation. The multi-label nature of threat intelligence reports, where single sentences can describe concurrent TTPs, is frequently overlooked by systems optimized for single-label classification. Persistent class imbalance also undermines performance for under-represented classes, limiting operational reliability of automated extraction systems. Similarly, topic modeling approaches relying on fixed sentence windows or basic keyword matching often fail to preserve semantic or contextual coherence across varied narrative structures.
This work addresses these critical gaps by introducing TIEF, a comprehensive pipeline that incorporates LLM-based data augmentation, multi-label classification for all 560 MITRE ATT&CK sub-techniques, sentence grouping via semantic correlation, and end-to-end STIX 2.1 formatting. In contrast with existing models, TIEF achieves improved interpretability, broader TTP coverage, and precise mapping of contextual threat behaviors, thereby advancing the state of the art in automated threat intelligence extraction.

3. Background and Preliminaries

  • Structured Threat Information eXpression (STIX): STIX 2.1 is a standardized language and format for exchanging CTI, facilitating effective sharing, storage, and analysis of cyber threat information. The OASIS Cyber Threat Intelligence Technical Committee develops and maintains STIX. STIX 2.1 structures threat data in a JavaScript Object Notation (JSON) format and includes structured threat information, typically comprising various STIX objects such as Indicator, Attack Pattern, Threat Actor, Report, and their inter-relationships, as illustrated in Figure 1. The adoption of the STIX 2.1 format ensures that threat information is easily shareable and comprehensible across different security platforms, improving collaboration and threat response capabilities within the cybersecurity community.
  • Indicators of Compromise (IOCs): IOCs are digital artifacts that serve as forensic evidence of a system or network breach. Cybersecurity professionals utilize these IOCs to identify and analyze security incidents, such as malware infections, data breaches, and unauthorized access. The specific IOCs used in this research are detailed in Table 1, which provides a basis for the identification of TTPs.
  • Tactics, Techniques, and Procedures (TTPs): TTPs provide a structured framework to describe the behavior of a threat actor during a cyber-attack, detailing the specific methods they employ. Analyzing TTPs enables a shift from reactive defense mechanisms to a more proactive and strategic cybersecurity approach, allowing for improved anticipation and mitigation of potential threats.
    • Tactics: Tactics represent the high-level objectives that an attacker seeks to accomplish, such as gaining initial access, escalating privileges, exfiltrating data, or disrupting operations. Understanding these high-level goals is crucial to developing models that can predict and mitigate potential attack strategies.
    • Techniques: Techniques refer to the specific methods used by attackers to achieve their tactical objectives, analogous to tools in a toolbox. These methods may include exploiting vulnerabilities, employing social engineering tactics, such as phishing emails, or deploying malware. Accurate identification and classification of these techniques are essential to improve cybersecurity defenses.
    • Procedures: Procedures are the detailed step-by-step instructions that attackers follow to execute a specific technique. Representing the most granular level of an attack, these procedures outline the precise actions taken during an attack.
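To make the STIX 2.1 structure concrete, the following is a minimal sketch of how an Indicator, an Attack Pattern, and the Relationship linking them might be assembled into a bundle. Plain dictionaries are used here for clarity; in practice the `stix2` Python library would typically be preferred, and required timestamp fields such as `created`/`modified` are omitted for brevity. All identifier values and the example pattern are illustrative.

```python
import json
import uuid

def stix_id(obj_type):
    # STIX 2.1 identifiers take the form "<type>--<UUID>"
    return f"{obj_type}--{uuid.uuid4()}"

indicator = {
    "type": "indicator",
    "spec_version": "2.1",
    "id": stix_id("indicator"),
    "pattern": "[ipv4-addr:value = '192.168.0.1']",
    "pattern_type": "stix",
    "valid_from": "2025-01-01T00:00:00Z",
}
attack_pattern = {
    "type": "attack-pattern",
    "spec_version": "2.1",
    "id": stix_id("attack-pattern"),
    "name": "System Network Connections Discovery",
    "external_references": [
        {"source_name": "mitre-attack", "external_id": "T1049"}
    ],
}
relationship = {
    "type": "relationship",
    "spec_version": "2.1",
    "id": stix_id("relationship"),
    "relationship_type": "indicates",
    "source_ref": indicator["id"],
    "target_ref": attack_pattern["id"],
}
bundle = {
    "type": "bundle",
    "id": stix_id("bundle"),
    "objects": [indicator, attack_pattern, relationship],
}
# STIX 2.1 content is exchanged as JSON
encoded = json.dumps(bundle, indent=2)
```

The `relationship` object is what ties a discrete observable (the Indicator) to adversary behavior (the Attack Pattern), which mirrors the IOC-to-TTP mapping discussed above.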

4. Threat Intelligence Extraction Framework (TIEF)

The TIEF comprises five key stages: raw report ingestion and text preprocessing, contextual sentence grouping via topic modeling, IOC extraction and soft tagging, multi-label TTP classification, and STIX 2.1 object generation, as illustrated in Figure 2. TIEF operates on two principal sources of data: the first is used to train the machine learning models of the TTP classifier, and the second comprises the threat intelligence reports from which TTP intelligence is extracted. Algorithm 1 describes the general workflow of the TIEF framework. The subsequent sections provide an in-depth explanation of the functionality and design of each module within the framework.
Algorithm 1: Algorithm for TTP extraction
Jcp 05 00063 i001

4.1. Raw Report Ingestion and Text Preprocessing

Raw threat intelligence reports, predominantly text-heavy PDFs detailing various malicious activities, serve as the initial input to the preprocessing stage. Given the inherently unstructured nature of these reports, a dedicated preprocessing pipeline has been developed to convert the raw textual data into a structured and analyzable format. The pipeline begins with thorough text data cleaning, which involves removing special characters, embedded images, excessive whitespace, and other non-textual elements to streamline the content. An illustrative example of the text preprocessing procedure is provided in Figure 3.
Following cleaning, the text undergoes tokenization, segmenting it into discrete units such as words or meaningful phrases. Stop words—common words that contribute little semantic value—are filtered out to improve the relevance of the data. The final step in the pipeline applies lemmatization, reducing words to their base or dictionary forms, ensuring consistency across the dataset.
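The cleaning, tokenization, stop-word filtering, and lemmatization steps described above can be sketched as follows. This is a stdlib-only illustration: the stop-word list is abbreviated, and the suffix-stripping "lemmatizer" is a placeholder for a proper dictionary-based lemmatizer (e.g., WordNet-based) that a production pipeline would use.

```python
import re

# Abbreviated stop-word list for illustration only
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "was"}

def clean(text):
    # Strip non-textual residue (special characters) and collapse whitespace,
    # while keeping characters that matter for IOCs (dots, dashes, slashes).
    text = re.sub(r"[^A-Za-z0-9\s\.\-:/]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def tokenize(text):
    return text.lower().split()

def lemmatize(token):
    # Placeholder: naive suffix stripping stands in for true lemmatization.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    tokens = tokenize(clean(text))
    return [lemmatize(t) for t in tokens if t not in STOP_WORDS]

tokens = preprocess("The malware was exfiltrating   files to 10.0.0.5!!")
```

Note that the cleaning regex deliberately preserves dots, colons, and slashes so that structured artifacts such as IP addresses and URLs survive for the later IOC extraction stage.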

4.2. Contextual Sentence Grouping via Topic Modeling

The extracted text from threat intelligence reports is segmented into variable-length sentences using topic modeling techniques. In this research, three different topic modeling approaches are evaluated to determine their suitability within the TIEF framework. The comparative results are summarized in Table 2. Based on this evaluation, the BERTopic method proved to be the most effective for TIEF, primarily due to the high quality and relevance of the topics it generated. As illustrated in Figure 4, BERTopic produces context-sensitive group embeddings tailored to the cybersecurity domain by clustering sentences containing keywords such as “exfiltrating the information”, “report the connection status”, and “enumerate directories and files”. Furthermore, BERTopic’s approach to forming topic groups considers the structure of phrases, which preserves contextual continuity and effectively handles synonyms, ensuring that semantically related sentences are grouped together. This improves the coherence and precision of subsequent analysis.
BERTopic is a topic modeling technique that combines transformer-based embeddings, dimensionality reduction, and density-based clustering to generate interpretable and contextually relevant topics from text-heavy data. Transformer models, such as BERT, are used to generate high-dimensional sentence embeddings, which are then projected into a lower-dimensional space using UMAP (Uniform Manifold Approximation and Projection). In the implementation, UMAP is configured with parameters optimized for cybersecurity text: three nearest neighbors, five components, and a cosine similarity metric, preserving contextual relationships between sentences while reducing computational complexity. Following dimensionality reduction, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) is applied to identify dense clusters of semantically related sentences. The clustering algorithm utilizes a minimum cluster size of 2 and the “eom” (excess of mass) method for cluster selection, enabling the robust identification of meaningful topic groups even in noisy data. A custom vectorizer, featuring a stemming tokenizer (StemTokenizer) with the Porter Stemmer, processes the text by tokenizing and stemming words, filtering out common and rare terms, and utilizing both bigrams and unigrams for feature extraction, ensuring a relevant feature set for topic representation.
The topic interpretability is refined by integrating Part-of-Speech (POS) tagging with the representation model and defining key linguistic patterns, such as adjective–noun pairs (e.g., “malicious software”) and verb–noun pairs (e.g., “exploit vulnerability”). This linguistic awareness ensures that the topics extracted are not only statistically coherent but also linguistically meaningful, thereby enhancing their clarity and relevance for cybersecurity analysts. By integrating these components (transformer embeddings, UMAP, HDBSCAN, a custom vectorizer, and linguistically informed topic representation), BERTopic effectively transforms unstructured, text-heavy threat intelligence into structured, context-aware topic clusters.
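The configuration described above can be sketched as follows. This is a configuration fragment, not a runnable end-to-end pipeline: it assumes the `bertopic`, `umap-learn`, `hdbscan`, `scikit-learn`, and `nltk` packages (plus a spaCy English model for the POS representation), and only the parameter values stated in the text (three neighbors, five components, cosine metric, minimum cluster size of 2, “eom” selection, unigrams and bigrams) are taken from the paper; the document-frequency cutoffs are illustrative.

```python
from bertopic import BERTopic
from bertopic.representation import PartOfSpeech
from umap import UMAP
from hdbscan import HDBSCAN
from sklearn.feature_extraction.text import CountVectorizer
from nltk.stem import PorterStemmer

class StemTokenizer:
    # Tokenize on whitespace and reduce each token to its Porter stem.
    def __init__(self):
        self.stemmer = PorterStemmer()
    def __call__(self, doc):
        return [self.stemmer.stem(t) for t in doc.split()]

umap_model = UMAP(n_neighbors=3, n_components=5, metric="cosine")
hdbscan_model = HDBSCAN(min_cluster_size=2,
                        cluster_selection_method="eom")
vectorizer = CountVectorizer(
    tokenizer=StemTokenizer(),
    ngram_range=(1, 2),     # unigrams and bigrams
    min_df=2, max_df=0.95,  # drop rare and overly common terms (illustrative)
)
# POS-aware representation favors patterns such as adjective-noun
# ("malicious software") and verb-noun ("exploit vulnerability") pairs.
representation = PartOfSpeech("en_core_web_sm")

topic_model = BERTopic(
    umap_model=umap_model,
    hdbscan_model=hdbscan_model,
    vectorizer_model=vectorizer,
    representation_model=representation,
)
```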

4.3. IOC Extraction and Soft Tagging

From preprocessed threat intelligence reports, 13 IOCs, as detailed in Table 1, were systematically extracted using a hybrid approach that combines Regular Expressions (regex) and a gazetteer. Regex, a pattern-matching technique, enables the identification of structured data elements, such as IP addresses, hashes, and URLs, regardless of whether these elements are obfuscated, which is crucial for identifying structured data elements within inconsistently formatted threat reports. A gazetteer, a curated dictionary of predefined names, terms, or entities, is used to recognize specific non-IOC entries. For example, “communication protocols” were identified using a gazetteer populated with terms such as “http”, “https”, “ftp”, “smtp”, “pop3”, and “dns”. This dual approach improves both the coverage and accuracy of IOC extraction, as regex efficiently captures well-defined patterns, while the gazetteer ensures recognition of domain-specific terms that may not conform to strict syntactic rules.
Once extracted, a soft tagging approach was implemented. Each IOC instance in the text was replaced with a generic token representing its category (e.g., replacing an IP address like “192.168.0.1” with the token “IPv4”). The original IOC values were preserved separately in a dictionary of arrays indexed by their respective categories. This abstraction serves two purposes: reducing data variability and preventing the model from overfitting to specific IOC instances, thus improving generalization during model prediction. The resulting structure, illustrated in Figure 5, is a dictionary where each key corresponds to an IOC type and the associated value is a list of all extracted instances.
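The combined extraction and soft-tagging step can be sketched with the standard library. Only two hypothetical regex categories and one gazetteer category are shown (the full pipeline covers the 13 IOC types of Table 1); the patterns themselves are simplified for illustration.

```python
import re
from collections import defaultdict

# Illustrative subset of the 13 IOC categories
IOC_PATTERNS = {
    "IPv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "MD5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
}
GAZETTEER = {"Protocol": {"http", "https", "ftp", "smtp", "pop3", "dns"}}

def soft_tag(sentence):
    # Replace each IOC instance with its generic category token while
    # preserving the original values in a dictionary of arrays.
    iocs = defaultdict(list)
    for tag, pattern in IOC_PATTERNS.items():
        for match in pattern.findall(sentence):
            iocs[tag].append(match)
            sentence = sentence.replace(match, tag)
    for tag, terms in GAZETTEER.items():
        for word in sentence.split():
            if word.lower() in terms:
                iocs[tag].append(word)
                sentence = sentence.replace(word, tag)
    return sentence, dict(iocs)

tagged, iocs = soft_tag("The implant beacons to 192.168.0.1 over https")
```

After tagging, the classifier sees the abstracted sentence (with tokens like `IPv4` and `Protocol`), while the dictionary retains the concrete values for later STIX 2.1 object generation.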

4.4. Multi-Label TTP Classification

Soft-tagged sentences undergo multi-label TTP classification to identify either the threat category (for example, malware, phishing) or the associated threat actor. The TTP classifier, using a DistilBERT Base-Uncased model, predicts corresponding MITRE ATT&CK TTP IDs. This model, pre-trained on labeled MITRE datasets, is fine-tuned for multi-label classification, enabling the prediction of multiple TTPs per input.
The workflow of the classifier is illustrated in Figure 6, consisting of a DistilBERT Base-Uncased encoder that utilizes 6 transformer layers and 12 attention heads to process text and capture contextual relationships. A linear classification layer with sigmoid activation generates independent probabilities for each TTP label. The predicted TTPs map sentences to MITRE attack patterns, enabling the generation of structured threat intelligence output. The classifier is fine-tuned for TTP extraction using cybersecurity-specific corpora.
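The multi-label decision rule implied by the sigmoid output layer can be illustrated in isolation: each label receives an independent probability, and every label above a decision threshold is emitted, so a single sentence can map to several TTPs at once. The 0.5 threshold and the logit values below are assumptions for illustration; the paper does not state the operating threshold.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_ttps(logits, labels, threshold=0.5):
    # Independent per-label probabilities: unlike softmax, the labels
    # do not compete, so multiple TTPs can be active simultaneously.
    return [lab for lab, z in zip(labels, logits)
            if sigmoid(z) >= threshold]

labels = ["T1049", "T1003.001", "T1557"]
# Hypothetical classifier-head logits for one soft-tagged sentence
predicted = predict_ttps([2.1, -0.3, 0.8], labels)
```

This contrasts with a softmax multi-class head, which would force exactly one TTP per input and could not represent sentences describing concurrent techniques.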

4.4.1. Training Data Construction and Augmentation

The MITRE ATT&CK framework [31] is widely recognized for its thorough documentation of adversary tactics and techniques across Enterprise, Industrial Control Systems (ICSs), and Mobile domains. Based on real-world observations of adversary behavior, MITRE ATT&CK provides a highly relevant and up-to-date foundation for accurate TTP identification and classification.
For this research work, the data were extracted from the MITRE ATT&CK website using the Selenium web scraping tool, resulting in a dataset containing 11,068 rows and 7 columns. These columns were chosen for their direct relevance to adversary behaviors, providing an essential context for the classification of TTPs. Specifically, features such as technique names, sub-techniques, procedure descriptions, and associated tactics were selected as shown in Table 3, as they are critical for the machine learning model’s ability to learn and differentiate between various adversarial techniques. In particular, procedure descriptions provide detailed step-by-step accounts of adversary actions, which capture the operational context required for precise TTP classification. Columns pertaining to defensive strategies and metadata, such as Detection, Mitigations, Version, Created, and Last Modified, were excluded because they are not directly pertinent to the classification of adversarial behaviors. The final dataset thus offers a detailed and focused view of adversary actions, with TTPs as the target variable and procedure descriptions serving as the primary textual feature for model training.
The original MITRE dataset comprises 560 technique classes and exhibits severe class imbalance, with sentence counts per class ranging from 1 to 300, as shown in Figure 7: dominant technique IDs (e.g., T1489) coexist with sparsely represented classes, reflecting bias in the raw CTI data. This imbalance risks biasing model training toward majority classes and compromising generalization for under-represented techniques. The augmented dataset exhibits a more uniform class distribution, achieved through oversampling and text augmentation, which improves learning stability and reduces classifier bias toward high-frequency techniques.
To address both the imbalance in the number of descriptions per class and the limited diversity in technique descriptions, large language models (LLMs) were employed to generate synthetic data. Approximately 100 sentences were created for each under-represented technique label (64,953 in total) using the Claude Sonnet 3.5 API. These additional data improved the representation of minority classes, resulting in a more balanced and diverse dataset, as shown in Figure 8. The distribution of TTP_ids is shown in the heatmap representation in Figure A1 and Figure A2 in the Appendix A. The data generation process was driven by previous incidents taken from the original dataset, ensuring that the newly generated content remained closely aligned with existing descriptions in terms of context.
To ensure the quality and relevance of synthetically generated MITRE ATT&CK descriptions, a structured approach is implemented: (1) structured prompt engineering using a template that explicitly referenced technique IDs, technique names, and tactic names while requiring alignment with existing attack patterns (e.g., “Generate attack descriptions for T1557 (Adversary-in-the-Middle) using SSL/TLS exploitation patterns”); (2) contextual anchoring by including authentic examples (e.g., “Applications encrypting traffic evade Adversary-in-the-Middle”) to guide relevant outputs (e.g., “Attackers exploit vulnerabilities in SSL/TLS implementations to intercept encrypted traffic”); and (3) expert validation through rigorous manual review by graduate and PhD cybersecurity students to verify technical accuracy and MITRE framework consistency. This manual validation mitigated potential risks from over-reliance on synthetic data by ensuring that generated entries preserved both tactical context and technical depth consistent with real-world adversarial behaviors.
For example, in the System Network Connections Discovery technique (T1049) under the discovery tactic, synthetic descriptions were generated by building on real attack scenarios, such as running the command netstat -ano to list active network connections on compromised hosts. The LLM then produced additional procedure-level descriptions emphasizing this specific method and its strategic objective, thereby expanding the dataset with realistic and tactically relevant attack behaviors aligned with the MITRE framework.
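The structured prompt engineering and contextual anchoring steps above can be sketched as a template-building function. The wording of the template is hypothetical; it simply mirrors the described ingredients (explicit technique ID, name, and tactic, plus authentic seed descriptions from the original dataset).

```python
def build_augmentation_prompt(ttp_id, technique, tactic,
                              seed_examples, n=100):
    # Hypothetical prompt template: explicit TTP identifiers provide
    # structure, and real procedure descriptions anchor the output.
    seeds = "\n".join(f"- {s}" for s in seed_examples)
    return (
        f"Generate {n} distinct attack procedure descriptions for "
        f"{ttp_id} ({technique}) under the {tactic} tactic.\n"
        f"Stay consistent with the MITRE ATT&CK framework and with "
        f"these observed examples:\n{seeds}"
    )

prompt = build_augmentation_prompt(
    "T1049", "System Network Connections Discovery", "Discovery",
    ["Ran `netstat -ano` to list active network connections."],
)
```

Each generated batch would then pass through the manual expert-validation step before being merged into the training corpus.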
The MITRE dataset and the synthetic data are merged under unified “Technique” labels. The “Procedure Description” column is retained as a model input for TTP extraction and classification. Overall, around 64,953 new descriptions were generated, significantly expanding and balancing the dataset across all techniques and sub-technique IDs. This augmentation was crucial in minimizing potential biases and improving the performance of TTP extraction models.
Synthetic data do not negatively impact the accuracy of the classifier, as verified in Table 4. The marginal decrease in Receiver Operating Characteristic (ROC) Area under the Curve (AUC) from 97.45% to 96.9% shows that synthetic data preserve the classifier's discriminative ability while mitigating representation biases. The notable increase in F1 score (from 35% to 93%) provides strong evidence that data augmentation successfully mitigates classification bias in under-represented classes without compromising overall accuracy. LLM augmentation thus resolved class imbalance while maintaining taxonomic integrity, enabling robust training of the TTP classifier across all technique categories.

4.4.2. Model Selection and Hyperparameter Optimization

This research conducted a comparative evaluation of three BERT-based architectures, DistilBERT, SecBERT, and SecureBERT, to determine the most suitable model for TTP classification. Automated hyperparameter optimization with Optuna identified optimal training configurations for each model. Performance was assessed using accuracy, ROC AUC, Hamming loss, and validation loss, as summarized in Table 5. DistilBERT proved superior, balancing high accuracy (93.29%), exceptional discriminative power (96.67% ROC AUC), and efficient resource utilization.
The DistilBERT Base-Uncased model was configured to optimize both accuracy and computational efficiency for the task of TTP classification. Input text was processed using a tokenizer that converted sentences into token IDs, with truncation applied to limit sequences to a maximum of 128 tokens and padding used to standardize input lengths. For the model architecture, the num_labels parameter was set to match the total number of unique TTP labels, reflecting the multi-label nature of the classification task. The problem_type parameter was defined as “multi_label_classification” to enable the model to assign multiple TTPs to a single input, aligning with the complexity of real-world threat intelligence scenarios.
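Because problem_type is set to multi-label classification, the model scores each label independently through a sigmoid rather than a softmax over classes. A minimal sketch of the decoding step (the logits, label names, and helper are illustrative):

```python
import math

def decode_multilabel(logits, labels, threshold=0.5):
    """Apply a per-label sigmoid and keep every label whose probability
    clears the threshold, so one input can carry several TTP labels."""
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    return [lab for lab, z in zip(labels, logits) if sigmoid(z) >= threshold]

labels = ["T1049", "T1218.010", "T1557"]  # illustrative TTP labels
logits = [2.1, -3.0, 0.4]                 # illustrative classifier outputs
print(decode_multilabel(logits, labels))  # sigmoid(2.1) ~ 0.89, sigmoid(0.4) ~ 0.60
```

The decision threshold is itself a tunable hyperparameter in this setup, rather than the implicit argmax of single-label classification.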
Training parameters were selected to strike a balance between performance and stability. A batch size of eight was used for both training and evaluation, which provided a good compromise between memory usage and convergence speed. The model was trained for five epochs, a duration determined to be sufficient to achieve robust performance without overfitting. To further safeguard against overfitting and to facilitate optimal model selection, checkpointing was implemented every 1000 steps, with the two most recent checkpoints retained throughout the training process.
For the evaluation phase, a comprehensive dataset was assembled, comprising 1982 threat intelligence reports, including Advanced Persistent Threat (APT) reports, gathered from multiple sources, such as GitHub repositories [32,33], threat intelligence platforms, forums, and other credible online outlets. Threat intelligence reports cover broad threat categories, including APTs, malware, vulnerabilities, and other related threat domains, compiled from leading cybersecurity organizations. These reports contain detailed descriptions of adversary techniques, enriched with valuable contextual information essential for intelligence extraction. This diversity was intentional, ensuring that the evaluation set encompassed a broad spectrum of threat scenarios, attack techniques, and adversarial strategies. Each report was reviewed to ensure technical accuracy and contextual completeness, thus providing authentic real-world examples for model evaluation.

4.5. STIX 2.1 Object Generation

This stage processes the extracted threat intelligence, comprising arrays of IOC elements, keyword-replaced text segments, TTPs, tactic names, and technique names, and converts them into the STIX 2.1 format using the official Python 3.X STIX library. While STIX 2.1 defines 18 Domain Objects and 2 Relationship Objects, this framework utilizes three primary Domain Objects: Attack Pattern, Indicator, and Report, along with the Relationship Object to capture associations between extracted elements. The mapping between TIEF classifier outputs and STIX object properties is detailed below.
  • Attack Pattern Properties
    Figure 9 represents the Attack Pattern STIX object.
    • type: Implicitly set by creating an AttackPattern object;
    • external_references: Mapped to a list containing source_name (set to “mitre-attack”), external_id, and technique_id;
    • name: Combines technique_name and sub_tech_name;
    • description: Mapped to the chunks variable, containing descriptive text for each attack pattern;
    • aliases: Mapped to a list containing technique_name and sub_tech_name for alternative identification;
    • kill_chain_phases: Mapped to the tactic variable (from tactic_name), with phase_name.
  • Indicator Object Properties
    Figure 10 denotes the Indicator STIX object.
    • type: Implicitly set by creating an Indicator object;
    • name: Mapped to name, dynamically generated as “ioc_type Indicator” (e.g., “ipv4 Indicator”);
    • description: Mapped to description, containing ioc_type and ioc_value;
    • indicator_types: Set to (“malicious-activity”), categorizing the Indicator as malicious;
    • pattern: Mapped from ioc_mapping, formatted according to ioc_type and ioc_value;
    • pattern_type: Set to “stix”, indicating that the STIX pattern language is used.
  • Report Object Properties
    Figure 11 denotes the Report STIX object.
    • type: Implicitly set by creating a Report object;
    • name: Mapped to name, dynamically generated as “ioc_type Indicator”;
    • description: Mapped to description, containing ioc_type and ioc_value;
    • report_types: Set to (“threat-report”), indicating the report’s content type as a threat report;
    • published: Automatically set to the current date and time;
    • object_refs: Mapped to all_ids, a list containing the IDs of all Indicator, Relationship, and AttackPattern objects used.
  • Relationship Object Properties
    Figure 12 denotes the Relationship STIX object.
    • type: Implicitly set by creating a Relationship object;
    • relationship_type: Set to “indicates”, indicating the relationship between the Indicator and Attack Pattern;
    • description: Mapped to description, which combines ioc_value, technique_name, and sub_tech_name for descriptive context;
    • source_ref: Mapped to indicator.id, representing the Indicator as the source;
    • target_ref: Mapped to attack_pattern.id, representing the Attack Pattern as the target;
    • start_time: Set to datetime.now(), representing the current time as the start of the relationship.
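Taken together, these mappings produce a bundle of linked objects. The sketch below assembles equivalent JSON by hand for illustration; the framework itself uses the official stix2 library, and all identifiers and values here are illustrative:

```python
import json
import uuid
from datetime import datetime, timezone

def stix_id(obj_type):
    """STIX 2.1 identifiers take the form '<type>--<UUID>'."""
    return f"{obj_type}--{uuid.uuid4()}"

now = datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")

attack_pattern = {
    "type": "attack-pattern",
    "spec_version": "2.1",
    "id": stix_id("attack-pattern"),
    "name": "System Binary Proxy Execution: Regsvr32",
    "external_references": [{"source_name": "mitre-attack", "external_id": "T1218.010"}],
    "kill_chain_phases": [{"kill_chain_name": "mitre-attack", "phase_name": "defense-evasion"}],
}

indicator = {
    "type": "indicator",
    "spec_version": "2.1",
    "id": stix_id("indicator"),
    "name": "ipv4 Indicator",                       # "ioc_type Indicator" naming scheme
    "indicator_types": ["malicious-activity"],
    "pattern": "[ipv4-addr:value = '192.168.1.1']",  # illustrative IOC value
    "pattern_type": "stix",
    "valid_from": now,
}

relationship = {
    "type": "relationship",
    "spec_version": "2.1",
    "id": stix_id("relationship"),
    "relationship_type": "indicates",
    "source_ref": indicator["id"],       # Indicator as the source
    "target_ref": attack_pattern["id"],  # Attack Pattern as the target
    "start_time": now,
}

bundle = {
    "type": "bundle",
    "id": stix_id("bundle"),
    "objects": [attack_pattern, indicator, relationship],
}
print(json.dumps(bundle, indent=2)[:200])
```

The Report object (Figure 11) would then reference all of these IDs through its object_refs list, tying the extracted intelligence back to its source document.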

5. Results and Discussion

The performance of the DistilBERT model was evaluated across four training epochs, with results recorded at each epoch using a set of evaluation metrics, including F1 score, ROC AUC, Hamming loss, and runtime statistics, as shown in Table 6. The F1 score, which balances precision and recall, remained consistently high throughout the training process, averaging around 0.933, underscoring the model’s ability to maintain a strong balance between correctly identifying relevant TTP labels and minimizing false positives and false negatives. ROC AUC values were also stable, averaging approximately 0.964 across epochs, indicating the model’s excellent ability to distinguish between classes in the multi-label classification setting.
The Hamming loss, which measures the fraction of labels that are incorrectly predicted, was observed to be exceptionally low during the evaluation. In the second epoch, the Hamming loss had already reached a minimal value of 0.000393 and maintained this low error rate in subsequent epochs, demonstrating the model’s effectiveness in minimizing misclassifications across the diverse set of TTP categories.
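The Hamming loss reported here is straightforward to reproduce; a minimal sketch over toy label matrices (values illustrative):

```python
def hamming_loss(y_true, y_pred):
    """Fraction of label slots predicted incorrectly across all
    samples and all labels (multi-label setting)."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for t_row, p_row in zip(y_true, y_pred)
        for t, p in zip(t_row, p_row)
    )
    return wrong / total

# Two samples, four binary labels each: one wrong slot out of eight.
y_true = [[1, 0, 0, 1], [0, 1, 0, 0]]
y_pred = [[1, 0, 0, 1], [0, 1, 1, 0]]
print(hamming_loss(y_true, y_pred))  # 0.125
```

With 560 labels per sample, even a handful of flipped labels per report yields very small values, which is why losses on the order of 0.0004 are plausible here.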
The runtime for each epoch was closely monitored to assess computational efficiency. The model completed each epoch with an average runtime of around 22.2 s, processing approximately 540 samples per second. Steps per second were also consistent, with the model processing around 16.8 steps per second. The optimal configuration for the DistilBERT model, as determined in trial 3, included a batch size of 32, a learning rate of 1.023 × 10−5, four training epochs, and a warm-up ratio of 0.0866. The weight decay was set to 0.0475, and the classification threshold was adjusted to 0.391. This set of hyperparameters produced the highest validation performance, with an overall F1 score of 0.9339, confirming the effectiveness of this configuration for the TTP classification task.
The confusion matrix provided in Figure 13 reports 6036 true positives, 6036 true negatives, and 459 false predictions in each error category. The balanced distribution of errors demonstrates the model’s consistent performance across classes, achieving an F1 score of 0.934 with minimal misclassifications.
A broader comparison, as presented in Table 7, situates the TIEF framework within the context of related research and summarizes key aspects such as the classification model used, the number of supported TTP classes, the incorporation of topic modeling for contextual awareness, the ability to output results in the STIX notation, and the achieved F1 scores. The TIEF framework stands out by successfully classifying 560 attack classes and achieving an F1 score of 93.39%, representing a significant improvement over previous approaches. This advancement is particularly notable because it demonstrates the value of incorporating a greater number of sub-techniques into the classification process, thereby enhancing the granularity and practical utility of extracted cybersecurity intelligence.
The TIEF framework improves the precision of threat detection and response in complex attack scenarios by effectively addressing the challenge of identifying sub-techniques. Its multilevel categorization system enables accurate mapping to specific sub-techniques, thereby overcoming the limitations of broader classification approaches. For instance, rather than merely detecting the overarching “System Binary Proxy Execution” technique, TIEF is capable of identifying the specific sub-technique T1218.010 (Regsvr32) when analyzing threats involving the misuse of Regsvr32.exe. This fine-grained classification allows defenders to implement targeted mitigation strategies tailored to the specific binary abuse pattern, rather than relying on generic defenses applicable to the broader technique.

5.1. TIEF Model Validation

For validation, a representative sample of 1500 records was selected from the CTI-Bench dataset [34], which consists of 3115 entries [27]. The CTI-Bench dataset consists of real-world descriptions of cyber incidents and vulnerabilities derived from open-source reports. Since the original dataset lacked the precise MITRE ATT&CK procedure-level labels critical for evaluation, a team of graduate and PhD cybersecurity students manually reviewed and annotated each prediction generated by the model. This manual validation ensured the integrity of the performance assessment despite the unlabeled nature of the original benchmark. The model correctly predicted 831 out of 1500 samples, yielding an accuracy of approximately 55.4%. This moderate accuracy reflects a granularity mismatch: the method is designed to interpret MITRE ATT&CK “procedures”, whereas the CTI-Bench entries provide higher-level, generalized textual summaries that usually mention affected products, the type of vulnerability, and potential impacts, but lack concrete procedural information or step-by-step behavioral context.

5.2. Error Analysis

To analyze the classification errors made by the model, a detailed error analysis was performed on a representative sample of misclassified instances. Of the 669 misclassified cases from the 1500-sample validation dataset, a subset of 60 examples was manually reviewed to identify the causes of error. This sample was chosen to provide a manageable yet representative snapshot for qualitative examination.
From this manually reviewed subset, approximately 58% of the errors were found to be due to semantic overlap between closely related MITRE ATT&CK techniques or sub-techniques. For example, the model frequently confused file upload vulnerabilities with system discovery techniques due to overlapping terminology such as “systemConfig” or “upload”. Approximately 25% of the errors in this sample arose from insufficient procedural detail within the input descriptions, where the textual data lacked explicit attacker actions or the concrete operational context critical for precise classification. The remaining 17% were attributed to label noise or ambiguity, including inconsistencies or vagueness in the underlying reference data. Table 8 presents the first 10 misclassified instances from the reviewed subset, detailing the description, predicted and correct techniques, error categories, and explanations for each misclassification.

5.3. Limitations

While TIEF demonstrates promising performance in automated TTP classification, certain limitations should be considered. The approach is specifically designed to classify MITRE ATT&CK procedures, which are detailed stepwise descriptions of adversarial behaviors, including concrete operational actions such as commands or attacker tactics. The benchmark dataset employed for the evaluation, CTI-Bench, consists predominantly of generalized descriptions of vulnerabilities and incidents. These descriptions often lack explicit procedural context, resulting in a granularity mismatch that prevents the model from fully demonstrating its strengths and yields moderate accuracy results. To date, there are no publicly available benchmark datasets that fully align with procedure-level classification, presenting a broader challenge for evaluating such models in the CTI domain.
The implementation uses DistilBERT, a general-purpose pre-trained language model not specifically trained on cybersecurity corpora. As a result, the model might not fully capture domain-specific terminology and subtle expressions of attack methods or behaviors. Models trained on cybersecurity datasets (such as CyberBERT, SecBERT, and SecureBERT) may improve performance by more accurately representing specialized language and lowering semantic ambiguity.
Addressing these limitations in future work through the development of procedure-aligned benchmark datasets and the use of domain-adapted language models can further improve the effectiveness and applicability of automated TTP extraction approaches.

6. Conclusions and Future Works

This research work presents TIEF, an automated tool that extracts TTPs from unstructured threat intelligence reports and represents them in the STIX 2.1 standardized format. The presented work goes beyond TTP extraction and also identifies IOCs within the relevant textual context. Through the integration of BERTopic-based dynamic sentence grouping, the framework effectively organizes raw report content into different threat contexts, improving the interpretability and utility of extracted intelligence.
A key contribution of this work lies in addressing the challenge of limited and imbalanced training data for TTP classification. Using LLM oversampling, the research work constructed a comprehensive sentence-level TTP dataset comprising 560 technique classes from the MITRE framework. A classifier was developed and trained to handle these 560 distinct labels, representing a significant advancement in this field of research. The fine-tuned DistilBERT Base-Uncased model demonstrated strong performance across the generated dataset, efficiently producing contextualized embeddings and achieving high accuracy in the classification of complex TTPs. By outputting results in STIX 2.1 format, TIEF ensures compatibility and interoperability with a wide range of existing security tools and platforms, thereby facilitating automated threat response and collaborative defense.
Future efforts will focus on refining the grouping of sentences based on the threat context, aiming to further enhance the granularity and relevance of the extracted intelligence. The current implementation of topic modeling within the TIEF framework utilizes BERTopic. Recent advancements have been introduced with U-BERTopic [15], an urgency-aware, BERT-enhanced topic modeling approach designed explicitly for cybersecurity contexts. Given its capability to effectively group data with an improved understanding of cybersecurity-specific texts, integrating U-BERTopic is planned as future work. Additionally, there is potential to expand the framework’s capabilities by incorporating real-time data sources and adaptive learning mechanisms to keep pace with the evolving threat landscape. Ultimately, by automating the extraction and contextualization of TTPs and IOCs, this research empowers security professionals to make informed, timely decisions and strengthens the overall resilience of cybersecurity defenses.

Author Contributions

Conceptualization, A.J.; methodology, A.J.; software, A.J. and Y.N.; validation, A.J., M.C. and Y.N.; formal analysis, A.J. and Y.N.; investigation, A.J. and M.C.; resources, A.J., M.C. and F.K.; data curation, A.J. and Y.N.; writing—original draft preparation, A.J. and Y.N.; writing—review and editing, A.J., M.C., Y.N. and F.K.; visualization, A.J.; supervision, M.C. and F.K.; project administration, F.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

This study relies on the following three data resources: MITRE ATT&CK TTP corpus (Enterprise, Mobile, and ICS domains): The complete list of tactics, techniques, and sub-techniques was downloaded from the publicly accessible MITRE ATT&CK knowledge base. Cyber-threat-intelligence reports: Narrative CTI reports were obtained from the open repository APT_REPORT. Augmented dataset: To correct class imbalance, a synthetically augmented dataset was produced by prompting a large-language-model pipeline with the original ATT&CK technique descriptions. The augmented dataset, together with the preprocessing and labeling scripts, can be obtained from the corresponding author upon reasonable request. The source code, datasets, and associated scripts of the Threat Intelligence Extraction Framework (TIEF) will be publicly released upon publication to support reproducibility and further research. The repository will be made available at https://github.com/NagareYash/TIEF.git (accessed on 28 August 2025).

Acknowledgments

The authors thank the CoE-CNDS (Centre of Excellence-Complex and Non-linear Dynamic Systems) laboratory, VJTI, Mumbai, for providing access to its cutting-edge research facilities. During the preparation of this study, the authors used the Claude Sonnet API (Anthropic) to augment the MITRE ATT&CK corpus and curate synthetic data whose context matches the original dataset. The authors have reviewed and edited all generated content and take full responsibility for the final manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
APT: Advanced Persistent Threat
CTI: Cyber Threat Intelligence
CVE: Common Vulnerabilities and Exposures
IOC: Indicator of Compromise
LLM: Large Language Model
NLP: Natural Language Processing
ROC AUC: Receiver Operating Characteristic Area under the Curve
SIEM: Security Information and Event Management
SVM: Support Vector Machine
TTP: Tactics, Techniques, and Procedures
STIX: Structured Threat Information eXpression
JSON: JavaScript Object Notation
ICS: Industrial Control Systems
PDF: Portable Document Format
UMAP: Uniform Manifold Approximation and Projection
HDBSCAN: Hierarchical Density-Based Spatial Clustering of Applications with Noise
POS: Part-of-Speech

Appendix A. Extra Figure References

Figure A1. Heatmap of dataset distribution before augmentation.
Figure A2. Heatmap of dataset distribution after augmentation.

References

  1. Abdullahi, M.; Baashar, Y.; Alhussian, H.; Alwadain, A.; Aziz, N.; Capretz, L.F.; Abdulkadir, S.J. Detecting cybersecurity attacks in internet of things using artificial intelligence methods: A systematic literature review. Electronics 2022, 11, 198. [Google Scholar] [CrossRef]
  2. Chakraborty, A.; Biswas, A.; Khan, A.K. Artificial intelligence for cybersecurity: Threats, attacks and mitigation. Artif. Intell. Societal Issues 2023, 231, 3–25. [Google Scholar] [CrossRef]
  3. Sun, Z.; Ni, T.; Yang, H.; Liu, K.; Zhang, Y.; Gu, T.; Xu, W. Flora+: Energy-efficient, reliable, beamforming-assisted, and secure over-the-air firmware update in lora networks. ACM Trans. Sens. Netw. 2024, 20, 1–28. [Google Scholar] [CrossRef]
  4. Li, J.; Wu, S.; Zhou, H.; Luo, X.; Wang, T.; Liu, Y.; Ma, X. Packet-level open-world app fingerprinting on wireless traffic. In Proceedings of the 2022 Network and Distributed System Security Symposium (NDSS’22), San Diego, CA, USA, 24–28 April 2022. [Google Scholar]
  5. Ni, T.; Lan, G.; Wang, J.; Zhao, Q.; Xu, W. Eavesdropping mobile app activity via {Radio-Frequency} energy harvesting. In Proceedings of the 32nd USENIX Security Symposium (USENIX Security 23), Anaheim, CA, USA, 9–11 August 2023. [Google Scholar]
  6. Berady, A.; Jaume, M.; Tong, V.V.T.; Guette, G. From TTP to IoC: Advanced persistent graphs for threat hunting. IEEE Trans. Netw. Serv. Manage. 2021, 18, 1321–1333. [Google Scholar] [CrossRef]
  7. You, Y.; Jiang, J.; Jiang, Z.; Yang, P.; Liu, B.; Feng, H.; Wang, X.; Li, N. TIM: Threat context-enhanced TTP intelligence mining on unstructured threat data. Cybersecurity 2022, 5, 3. [Google Scholar] [CrossRef]
  8. Highly Evasive Attacker Leverages SolarWinds Supply Chain to Compromise Multiple Global Victims With SUNBURST Backdoor. Available online: https://cloud.google.com/blog/topics/threat-intelligence/evasive-attacker-leverages-solarwinds-supply-chain-compromises-with-sunburst-backdoor (accessed on 15 April 2025).
  9. Tan, Z.; Parambath, S.P.; Anagnostopoulos, C.; Singer, J.; Marnerides, A.K. Advanced Persistent Threats Based on Supply Chain Vulnerabilities: Challenges, Solutions, and Future Directions. IEEE Internet Things J. 2025, 12, 6371–6395. [Google Scholar] [CrossRef]
  10. Froudakis, E.; Avgetidis, A.; Frankum, S.T.; Perdisci, R.; Antonakakis, M.; Keromytis, A. Uncovering Reliable Indicators: Improving IoC Extraction from Threat Reports. arXiv 2025, arXiv:2506.11325. [Google Scholar] [CrossRef]
  11. Badger, L.; Johnson, C.; Waltermire, D.; Snyder, J.; Skorupka, C. Guide to Cyber Threat Information Sharing; National Institute of Standards and Technology (NIST): Gaithersburg, MD, USA, 2016. [Google Scholar]
  12. MITRE ATT&CK Framework. Available online: https://attack.mitre.org (accessed on 15 November 2024).
  13. Husari, G.; Al-Shaer, E.; Ahmed, M.; Chu, B.; Niu, X. Ttpdrill: Automatic and accurate extraction of threat actions from unstructured text of cti sources. In Proceedings of the 33rd Annual Computer Security Applications Conference, Orlando, FL, USA, 4–8 December 2017. [Google Scholar]
  14. Legoy, V.; Caselli, M.; Seifert, C.; Peter, A. Automated Retrieval of ATT&CK Tactics and Techniques for Cyber Threat Reports. arXiv 2020, arXiv:2004.14322. [Google Scholar] [CrossRef]
  15. Aghaei, E.; Niu, X.; Shadid, W.; Al-Shaer, E. Securebert: A domain-specific language model for cybersecurity. In Proceedings of the International Conference on Security and Privacy in Communication Systems, Kansas City, MO, USA, 17–19 October 2022. [Google Scholar]
  16. Park, Y.; You, W. A Pretrained Language Model for Cyber Threat Intelligence. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, Resorts World Convention Centre, Singapore, 6–10 December 2023. [Google Scholar]
  17. Alves, P.M.; Geraldo Filho, P.; Gonçalves, V.P. Leveraging BERT’s Power to Classify TTP from Unstructured Text. In Proceedings of the 2022 Workshop on Communication Networks and Power Systems (WCNPS), Fortaleza, Brazil, 17–18 November 2022. [Google Scholar]
  18. Chen, S.S.; Hwang, R.H.; Sun, C.Y.; Lin, Y.D.; Pai, T.W. Enhancing cyber threat intelligence with named entity recognition using bert-crf. In Proceedings of the GLOBECOM 2023-2023 IEEE Global Communications Conference, Kuala Lumpur, Malaysia, 4–8 December 2023. [Google Scholar]
  19. Kim, H.; Kim, H. Comparative experiment on TTP classification with class imbalance using oversampling from CTI dataset. Secur. Commun. Netw. 2022, 2022, 5021125. [Google Scholar] [CrossRef]
  20. Rani, N.; Saha, B.; Maurya, V.; Shukla, S.K. TTPHunter: Automated Extraction of Actionable Intelligence as TTPs from Narrative Threat Reports. In Proceedings of the 2023 Australasian Computer Science Week, Melbourne, Australia, 31 January–3 February 2023. [Google Scholar]
  21. Rani, N.; Saha, B.; Maurya, V.; Shukla, S.K. TTPXHunter: Actionable Threat Intelligence Extraction as TTPs from Finished Cyber Threat Reports. Dig. Threats Res. Pract. 2024, 5, 1–19. [Google Scholar] [CrossRef]
  22. Castaño, F.; Gil Lerchundi, A.; Orduna Urrutia, R.; Fernandez, E.F.; Alaiz-Rodríguez, R. Automating Cybersecurity TTP Classification Based on Unstructured Attack Descriptions. In Proceedings of the IX Jornadas Nacionales de Investigación En Ciberseguridad, Sevilla, Spain, 27–29 May 2024. [Google Scholar]
  23. Albarrak, M.; Pergola, G.; Jhumka, A. U-BERTopic: An urgency-aware BERT-Topic modeling approach for detecting cyberSecurity issues via social media. In Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security, Lancaster, UK, 29–30 July 2024. [Google Scholar]
  24. Zhong, X.; Zhang, Y.; Liu, J. PenQA: A Comprehensive Instructional Dataset for Enhancing Penetration Testing Capabilities in Language Models. Appl. Sci. 2025, 15, 2117. [Google Scholar] [CrossRef]
  25. Demirol, D.; Das, R.; Hanbay, D. A Novel Approach for Cyber Threat Analysis Systems Using BERT Model from Cyber Threat Intelligence Data. Symmetry 2025, 17, 587. [Google Scholar] [CrossRef]
  26. Li, Z.X.; Li, Y.J.; Liu, Y.W.; Liu, C.; Zhou, N.X. K-CTIAA: Automatic Analysis of Cyber Threat Intelligence Based on a Knowledge Graph. Symmetry 2023, 15, 337. [Google Scholar] [CrossRef]
  27. Alam, M.T.; Bhusal, D.; Nguyen, L.; Rastogi, N. Ctibench: A benchmark for evaluating llms in cyber threat intelligence. Adv. Neural Inf. Process. Syst. 2024, 37, 50805–50825. [Google Scholar]
  28. Yong, J.; Ma, H.; Ma, Y.; Yusof, A.; Liang, Z.; Chang, E.C. AttackSeqBench: Benchmarking Large Language Models’ Understanding of Sequential Patterns in Cyber Attacks. arXiv 2025, arXiv:2503.03170. [Google Scholar] [CrossRef]
  29. Zhang, J.; Wen, H.; Li, L.; Zhu, H. UniTTP: A Unified Framework for Tactics, Techniques, and Procedures Mapping in Cyber Threats. In Proceedings of the 2024 IEEE 23rd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Sanya, China, 17–21 December 2024. [Google Scholar]
  30. Meng, C.; Jiang, Z.; Wang, Q.; Li, X.; Ma, C.; Dong, F.; Ren, F.; Liu, B. Instantiating Standards: Enabling Standard-Driven Text TTP Extraction with Evolvable Memory. arXiv 2025, arXiv:2505.09261. [Google Scholar] [CrossRef]
  31. MITRE ATT&CK—Techniques. Available online: https://attack.mitre.org/techniques (accessed on 31 July 2025).
  32. APT_REPORT: Collection of APT Campaign Reports. Available online: https://github.com/blackorbird/APT_REPORT.git (accessed on 31 July 2025).
  33. CTI-HAL: Cyber Threat Intelligence—Hierarchical Attention Learning. Available online: https://github.com/dessertlab/CTI-HAL (accessed on 31 July 2025).
  34. CTI-Bench: A Benchmark Dataset for Cyber Threat Intelligence Evaluation. Available online: https://github.com/xashru/cti-bench (accessed on 31 July 2025).
Figure 1. Example representation of a STIX 2.1 bundle.
Figure 2. Threat Intelligence Extraction Framework (TIEF) for automated TTP and IOC extraction.
Figure 3. Preprocessing stages in TIEF.
Figure 4. Sentence clusters formed on applying BERTopic modeling.
Figure 5. Structured representation of extracted IOCs in JSON format.
Figure 6. Architecture of the DistilBERT Base-Uncased model used for multi-label TTP classification.
Figure 7. Distribution of MITRE ATT&CK technique classes in the scraped dataset before augmentation.
Figure 8. Distribution of MITRE ATT&CK technique classes in the scraped dataset after augmentation.
Figure 9. Example of a STIX 2.1 attack-pattern object generated from extracted IOC elements.
Figure 10. Example of a STIX 2.1 indicator object generated from extracted IOC elements.
Figure 11. Example of a STIX 2.1 report object generated from extracted IOC elements.
Figure 12. Example of a STIX 2.1 relationship object generated from extracted IOC elements.
Figure 13. Confusion matrix for TTP classification using the DistilBERT model on the evaluation dataset.
Table 1. IOCs used for TTP identification from threat intelligence reports.
IOC Element | Extraction Technique | Example
IPv4 | Regex | 192.168.1.1
Domain | Regex | example.com
Email | Regex | name@domain.com
URL | Regex | https://example.com
Query ASN | Regex | AS13335
Filename | Regex | document.txt
File Hash | Regex | d41d8cd98f00b204e9800998ecf8427e
File Path | Regex | /home/user/…/document.txt
CVE | Regex | CVE-2023-45678
Registry Key | Regex | HKEY_LOCAL_MACHINE\…
Encryption Algorithm | Gazetteer | AES, DES
Communication Protocols | Gazetteer | TCP, HTTP
Data Object | Gazetteer | Desktop
Table 2. Comparison of topic modeling techniques.
Aspect | HDP | BERTopic | NER
Approach | Bayesian nonparametric model | BERT embeddings + HDBSCAN | Entity-driven topic extraction
Strengths | Adapts topic count dynamically | Rich semantics, dynamic topics | Granular, entity-focused topics
Limitations | Computationally intensive | Demands transformer knowledge | Misses non-entity topics
Topic generation approach | Topics automatically generated | Topics automatically generated | Topic count to be generated has to be given
Table 3. Key features extracted from the MITRE ATT&CK dataset for TTP classifier training.
| Column Name | Explanation |
|---|---|
| Technique Id | A unique identifier assigned to each technique. |
| Sub-Technique Id | A unique identifier assigned to each sub-technique. |
| Technique Name | Represents the methods adversaries use to achieve their goals. |
| Sub-technique Name | Provides a more detailed view of the primary techniques. |
| Tactic Name | A broader category or goal under which each technique falls, explaining the adversary's intent. |
| Platform | The platforms (e.g., Windows, Linux, Mobile) targeted by each technique. |
| Procedure Description | Detailed, step-by-step accounts of how adversaries execute techniques, providing critical operational context. |
Table 4. Classification accuracy and F1 scores for TTP extraction using original versus LLM-augmented datasets.
| Metric | Non-Augmented Data | Augmented Data |
|---|---|---|
| ROC AUC | 97.45% | 96.9% |
| F1 Score | 35% | 93% |
Table 5. Comparative performance of DistilBERT, SecBERT, and SecureBERT models for TTP classification.
| Metric | DistilBERT | SecBERT | SecureBERT |
|---|---|---|---|
| Accuracy | 93.29% | 69.57% | 87.27% |
| ROC AUC | 96.67% | 75.81% | 91.00% |
| Hamming loss | 0.00062 | 0.00500 | 0.00250 |
| Validation loss | 0.00500 | 0.15000 | 0.02300 |
| Learning rate | 1.4690 × 10⁻⁵ | 6.1546 × 10⁻⁵ | 7.0986 × 10⁻⁵ |
| Batch size | 8 | 12 | 12 |
| Epochs | 12 | 10 | 10 |
Table 6. Results of DistilBERT model for TTP classification.
| Epoch | Training Loss | Validation Loss | F1 Score | ROC AUC | Hamming Loss | Runtime (s) | Samples per Sec | Steps per Sec |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.000000 | 0.002694 | 0.933747 | 0.964762 | 0.000394 | 22.8199 | 524.192 | 16.389 |
| 2 | 0.000000 | 0.002744 | 0.933857 | 0.964838 | 0.000393 | 22.2589 | 537.403 | 16.802 |
| 3 | 0.000000 | 0.002877 | 0.933568 | 0.964492 | 0.000395 | 22.1404 | 540.279 | 16.892 |
| 4 | 0.000000 | 0.002899 | 0.933941 | 0.964665 | 0.000393 | 22.2326 | 538.040 | 16.822 |
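The multi-label metrics reported in Tables 5 and 6 (Hamming loss and micro-averaged F1) can be made concrete with a pure-Python sketch. This is not the paper's evaluation code; the toy label matrices below stand in for the 560-class label vectors, and real evaluations would use a library such as scikit-learn.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of all (sample, label) slots predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for row_t, row_p in zip(y_true, y_pred)
                for t, p in zip(row_t, row_p))
    return wrong / total

def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over every label, then score once."""
    tp = fp = fn = 0
    for row_t, row_p in zip(y_true, y_pred):
        for t, p in zip(row_t, row_p):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    return 2 * tp / (2 * tp + fp + fn)

# Toy multi-label matrix: 3 sentences x 4 technique labels.
y_true = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 0]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
print(hamming_loss(y_true, y_pred))  # 2 wrong slots out of 12
print(micro_f1(y_true, y_pred))
```

Micro-averaging is the natural choice here because it weights every label decision equally, which matters under the severe class imbalance of 560 ATT&CK classes.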
Table 7. Comparison of existing TTP extraction approaches.
| Paper Title | Classification Technique | No. of TTP Classes | Topic Modeling Technique | STIX Object Notation | F1 Score | Advantage | Disadvantage |
|---|---|---|---|---|---|---|---|
| TTPHunter [20] | Multi-class classification with linear classifier | 177 | No | No | 88% | Efficient classification with linear model | Limited to narrative reports without standardized format |
| TTPDrill [13] | Multi-class classification using SVM | 187 | No | Yes | 82.98% | STIX integration for standardized sharing | Extracts separate techniques without comprehensive context |
| TIM [7] | Multi-class classification using TCENet | 6 | No | Yes | 94.1% | High accuracy for covered TTPs | Limited to only 6 TTPs, does not reflect the full spectrum of threats |
| TTPXHunter [21] | Multi-class classification using fine-tuned SecureBERT | 193 | No | Yes | 97.09% | Domain-specific language model | One-to-one classifier limits multi-TTP extraction from single sentences |
| TIEF | Multi-label classification using TIEF classifier | 560 | BERTopic | Yes | 93.39% | Handles the full range of TTPs with strong classification accuracy | Lacks domain-specific model benefits |
Table 8. Statistical summarization of error categories for the first 10 misclassified cases.
| Description (CTI-Bench) | Predicted | Correct Technique | Error Category | Explanation of Overlap |
|---|---|---|---|---|
| IBM OpenPages with Watson 8.3 and 9.0 could provide attackers persistence via crafted configs… | t1456, t1398, t1404, t1221 | T1547.001 Registry Run Keys/Startup Folder | Insufficient Detail | The description hints at persistence but does not specify how, so the model guessed multiple persistence techniques. |
| Improper Neutralization of Input during Web Page Generation leads to reflected XSS… | t1489 | T1059.007 JavaScript | Semantic Overlap | XSS payloads are JavaScript, but the model mapped to phishing impact instead. |
| Out-of-Bounds Write in svc1td_x64.dll allows arbitrary code execution | t1221 | T1203 Exploitation for Client Execution | Label Noise | The training label had "defense evasion", but the description clearly describes an exploit. |
| The Essential Addons for Elementor plugin allows privilege escalation via unquoted service path. | t1404, t1221 | T1068 Exploitation for Privilege Escalation | Semantic Overlap | Unquoted service paths are a known privilege-escalation exploit, not a code-execution flaw. |
| Server-Side Request Forgery (SSRF) vulnerability in URL fetch allows data exfiltration | t1404, t1105 | T1573.001 Server-Side Request Forgery | Semantic Overlap | SSRF was lumped under "exfiltration" instead of its own technique. |
| Use of hard-coded credentials in Java app's config file leads to unauthorized account access. | t1136 | T1552.001 Credentials in Files and Locations | Semantic Overlap | The model predicted account creation instead of credentials stolen from a file. |
| DLL search order hijacking in MyApp.exe allows attacker-controlled DLL to load instead of legit library. | t1221 | T1574.001 DLL Search Order Hijacking | Semantic Overlap | Mapped to generic defense evasion rather than the specific DLL hijack. |
| Weak encryption (MD5) used for password hashing in PHP web app enables offline cracking. | t1486 | T1550.002 Unsecured Credentials: Password Hashes | Insufficient Detail | The model guessed ransomware impact (MD5 = malware), but it was really just weak credentials. |
| Container escape via misconfigured Docker socket lets attacker get host-level root. | t1552 | T1611 Escape to Host | Semantic Overlap | The model assigned "credentials" because Docker socket access implies privilege, but it is a container escape. |
| Misconfigured S3 bucket ACL publicly exposes sensitive files. | t1537 | T1537 Transfer Data to Cloud Account | Label Noise | The label was "ingress tool transfer", but a public ACL is simply a misconfiguration. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Joy, A.; Chandane, M.; Nagare, Y.; Kazi, F. Threat Intelligence Extraction Framework (TIEF) for TTP Extraction. J. Cybersecur. Priv. 2025, 5, 63. https://doi.org/10.3390/jcp5030063
