1. Introduction
Accurately identifying Tactics, Techniques, and Procedures (TTPs) from unstructured cyber-attack descriptions is a core task in threat intelligence analysis [1,2]. This task involves three major challenges. First, attack narratives are concrete, diverse, and often colloquial, whereas TTP definitions are highly abstract and standardized [3]; the mismatch in linguistic style and semantic granularity makes direct alignment difficult. Second, a single description may correspond to multiple TTPs with varying semantic intensity, and these associations form a continuous spectrum that cannot be effectively represented by hard-label binary classification frameworks [4]. Third, real-world TTP taxonomies are dynamic and open-ended, while annotated data typically exhibit long-tailed distributions, requiring models to generalize to unseen labels and remain robust under sparse samples [5].
Existing research generally falls into three methodological directions. Classification-based approaches treat TTP identification as a multi-label prediction task using CNNs [6,7], BiLSTMs [8,9,10], or fine-tuned BERT-derived models [11,12,13]. Although easy to deploy, these methods rely on a closed-set assumption and generalize poorly to novel or evolving TTPs. Retrieval-based approaches project attack descriptions and TTP definitions into a shared semantic space and perform similarity-based matching [14,15,16]. However, their symmetric encoding architectures, such as dot-product similarity on single-vector embeddings, limit their ability to capture asymmetric semantic relations between free-form descriptions and standardized TTP labels [17,18]. Large language model (LLM)-based approaches employ GPT- [19,20], T5- [21,22,23], or instruction-tuned models for direct label reasoning [24,25], often with few-shot prompting. While these models provide strong semantic understanding, they incur high inference costs and lack reliable confidence estimation, making large-scale operational deployment difficult [26,27].
Overall, existing methods face fundamental limitations: classification models do not support dynamic TTP taxonomies; retrieval approaches struggle with asymmetric semantic relationships, where the representational mismatch between free-form text and standardized definitions leads to alignment difficulties; and LLM-based systems, though powerful, remain computationally expensive and operationally opaque. These issues become more pronounced when dealing with emerging or evolving attack behaviors, highlighting the need for a lightweight solution that balances inference efficiency with open-set generalization capability.
To address these challenges, we propose DTGBI-TM (Dual-Tower Gated Binary Interaction Model for TTP Matching), a lightweight semantic matching framework that combines soft-label supervision, hierarchical hard-negative sampling, and gated binary interaction modeling. The soft-label supervision incorporates continuous-label learning to preserve fine-grained boundary semantics; hierarchical hard-negative sampling improves robustness under long-tailed distributions; and, most importantly, the gated binary interaction module addresses the asymmetry limitation of retrieval approaches by explicitly learning both shared and differential semantic pathways. This allows DTGBI-TM to capture the complex, asymmetric interactions between attack descriptions and TTP definitions, yielding a more nuanced and accurate semantic representation.
The main contributions of this work are as follows:
Soft–hard collaborative supervision. We introduce a confidence-guided mechanism that extends TTP matching from binary classification to a hybrid discrete-classification + continuous-regression paradigm, preserving fine-grained semantics of boundary samples.
Hierarchical hard-negative sampling. Leveraging ATT&CK hierarchical priors, we design intra-tactic hard-negative sampling via temperature-scaled similarity softmax among sibling techniques, and cross-tactic hard-negative sampling through tactic-level anchoring based on inverse weighted semantic distance, achieving both local confusion and global coverage.
Gated binary interaction modeling. We propose a semantic interaction module that explicitly models deep interactions between attack descriptions and TTP definitions. A dimension-wise gating mechanism adaptively learns shared and differential semantic pathways, balancing similarity aggregation and discrepancy separation to enhance fine-grained discrimination.
Multi-objective optimization with dynamic weighting. We develop a unified optimization framework integrating weighted classification loss [28,29], soft-label semantic regression loss [30,31], and representation-consistency contrastive loss [32,33]. A dynamic weighting strategy further mitigates optimization bias under long-tailed [34] label distributions.
Comprehensive experimental validation. Experiments across multiple task scenarios demonstrate that DTGBI-TM consistently outperforms strong baselines in semantic modeling, TTP prediction, and cold-start recognition, while maintaining high deployment efficiency.
2. Related Work
Research on identifying Tactics, Techniques, and Procedures (TTPs) from unstructured cyber-attack descriptions has progressed substantially, giving rise to three major methodological directions: multi-label classification, retrieval-based semantic matching, and pretrained large-language-model (LLM) approaches.
2.1. Classification-Based Methods
Classification-based approaches typically adopt CNNs, BiLSTMs, or fine-tuned BERT models to output TTP probability distributions. Recent studies have further validated the utility of hybrid deep learning architectures (e.g., combining Autoencoders with Gated Recurrent Units) for enhanced cyber threat detection [35]. Representative efforts include TTPDrill [36], which employs an enhanced BM25-weighted TF–IDF [37] scheme for similarity mapping; BERT–BiLSTM–CRF [38] pipelines for extracting threat actions and aligning them with ontologies [39]; and BERT-based variants evaluated across multiple architectural configurations [40]. Despite their simplicity and efficiency, these models often miss behaviors embedded in complex clause structures, degrade under semantic ambiguity, and struggle with emerging or long-tailed TTPs due to the inherent limitations of closed-set multi-label classification.
2.2. Retrieval-Based Methods
Retrieval-based methods encode attack descriptions and TTP definitions into a shared semantic space and perform matching via similarity metrics. RENet [41] improves technique classification using tactic–technique [42] transition matrices; SeqMask [43] extracts behavioral cues for conditional prediction; SATG [44] enables interpretable, evidence-based [45] TTP classification; and ATHRNN [46] incorporates hierarchical dependencies from MITRE ATT&CK. These approaches enhance robustness under semantic ambiguity and provide varying levels of interpretability. However, they still struggle to bridge the semantic gap between free-form attack narratives and standardized TTP definitions, especially when semantic asymmetry is significant.
2.3. LLM-Based Methods
LLM-based methods leverage GPT-, T5-, or instruction-tuned models for direct TTP inference or few-shot prompting. Examples include AECR [47], which uses fine-tuned LLMs [48] for CTI parsing; TTPFShot [49], which employs retrieval-augmented few-shot prompting; and subsequent evaluations demonstrating advantages of retrieval-augmented LLMs in TTP extraction [50,51]. Although these models provide strong generalization and few-shot capability, hallucination, unstable confidence estimation, and high inference cost significantly limit their reliability and deployability in operational cybersecurity environments [52].
3. Task Definition
In cybersecurity, TTP identification aims to extract standardized MITRE ATT&CK labels from unstructured descriptions of attack behaviors. These labels support alert correlation, attack-chain reconstruction, and automated response by providing a unified semantic representation of adversarial actions. Input sources typically include threat intelligence reports, APT analyses, sandbox logs, SIEM alerts, and other heterogeneous textual data with diverse linguistic styles.
Such texts often contain long and complex sentences, implicit cross-sentence dependencies, loosely structured expressions, and substantial linguistic variability. In contrast, MITRE ATT&CK TTP labels define clear semantic granularity and explicit boundary constraints. This disparity between free-form narratives and standardized label definitions creates a fundamental challenge for semantic alignment and structured prediction.
This work formulates TTP identification from unstructured attack descriptions as a semantic matching problem. We denote the attack-description corpus and the standardized TTP label set as

$$\mathcal{D} = \{d_1, d_2, \ldots, d_N\}, \qquad \mathcal{T} = \{t_1, t_2, \ldots, t_M\}.$$

Here, $d_i$ represents the $i$-th attack-description text, and $t_j$ denotes the $j$-th standardized TTP label. The goal is to learn a matching function $f$ that measures the semantic relevance between a description $d_i$ and a candidate TTP label $t_j$:

$$s_{ij} = f(d_i, t_j).$$

The matching score $s_{ij}$ quantifies the semantic consistency between the text and the candidate TTP. Based on pairwise relevance scores, we rank the label candidates for each text and select the top-$K$ most relevant labels as the prediction result:

$$\hat{\mathcal{T}}_i = \operatorname{Top}\text{-}K\bigl(\{\, s_{ij} \mid t_j \in \mathcal{T} \,\}\bigr).$$

This formulation captures the essence of the TTP matching task: learning fine-grained semantic alignment between heterogeneous, unstructured descriptions of adversarial behaviors and a structured, standardized TTP taxonomy.
4. DTGBI-TM Model
Accurately identifying TTP labels hidden within unstructured attack descriptions requires addressing three core challenges: (1) The linguistic style and abstraction level of attack-description texts differ significantly from those of standardized TTP definitions, making direct semantic alignment difficult. (2) Attack descriptions often contain semantic ambiguity and multi-label characteristics, where boundary cases are common and traditional classification models struggle to capture fine-grained relational cues. (3) The TTP taxonomy is inherently open and exhibits long-tailed data distributions, requiring models to generalize to unseen labels while maintaining robustness under label sparsity.
To address these issues, this work proposes a lightweight gated binary interaction model for semantic matching, termed DTGBI-TM. The model integrates soft–hard collaborative supervision, hierarchical hard-negative sampling, and dual-tower gated interaction modeling to enhance semantic representation and improve the adaptability of TTP identification in open and heterogeneous label spaces. DTGBI-TM is designed to extract deep semantic correlations between attack descriptions and TTP labels while retaining sensitivity to nuanced and complex semantic differences.
Figure 1 illustrates the overall architecture of the proposed approach. The pipeline begins by encoding attack-description texts and TTP label definitions into a shared semantic space. A dual-tower encoder produces independent feature representations, followed by a gated binary interaction mechanism that models both shared and differential semantics between the two modalities. This enables fine-grained alignment between free-form text and standardized TTP labels, even when semantic asymmetry is strong.
Furthermore, the model incorporates hierarchical structural information derived from the ATT&CK framework. A tactic-aware and technique-aware hard-negative sampling mechanism is designed to inject domain priors and enhance contrastive difficulty, thereby improving robustness against semantic ambiguity and structural heterogeneity. This sampling strategy increases both intra-tactic discrimination and cross-tactic coverage, contributing to a more effective modeling of long-tailed TTP distributions.
Overall, DTGBI-TM improves semantic resilience and structural diversity by integrating multi-level ATT&CK knowledge, designing tactic-aware and cross-tactic contrastive mechanisms, and constructing adaptive hard-negative samples that enhance differentiation and fine-grained matching capability in complex TTP identification scenarios.
4.1. Soft–Hard Label Supervision Signal Construction
Traditional TTP mapping approaches typically frame the task as a multi-label classification problem, where each input sample is associated with a multi-hot vector indicating the presence (1) or absence (0) of specific TTP labels. However, due to the semantic heterogeneity between unstructured attack descriptions and standardized TTP definitions, determining precise binary assignments is inherently challenging. For example, the sentence “using PowerShell scripts to remotely download files from a server” is more semantically aligned with T1059.001 (PowerShell) and T1105 (Ingress Tool Transfer) than with T1190 (Exploit Public-Facing Application). Such semantic relationships often lie on a continuum rather than adhering to crisp boundaries. Consequently, traditional binary labeling schemes fail to capture the nuanced differences among “strongly related,” “weakly related,” and “irrelevant” labels.
To address this limitation, we propose a semantic soft–hard label supervision mechanism that (i) retains fine-grained associations between input texts and candidate TTP labels, and (ii) smooths label boundaries for cases involving semantic ambiguity. This approach enables the model to better distinguish between semantically adjacent TTP labels and mitigates overfitting caused by inconsistent or imprecise annotations.
To ensure both the consistency and quality of supervision, we developed a rigorous human-in-the-loop (HITL) annotation pipeline involving three senior threat intelligence analysts. The construction process follows three standardized stages: Candidate Generation, Expert Scoring, and Soft-Label Normalization.
4.1.1. Candidate Generation
To reduce the annotation burden associated with exhaustively reviewing the entire label space, we first narrow the search space using a hybrid retrieval strategy. For each input sample $d_i$, a candidate set $C_i$ is formed by aggregating results from two complementary sources:
Lexical Retrieval: the top-20 TTPs retrieved via BM25-based keyword matching ($C_i^{\mathrm{BM25}}$), ensuring high recall for syntactically related labels.
Semantic Reasoning: the top-5 TTPs generated by an instruction-guided LLM (GPT-3.5) through zero-shot prompting ($C_i^{\mathrm{LLM}}$), aimed at capturing semantic associations beyond lexical overlap.
This hybrid candidate generation strategy substantially reduces the annotation cost: instead of assessing more than 200 techniques per sample, experts focus only on the compact, high-recall set $C_i = C_i^{\mathrm{BM25}} \cup C_i^{\mathrm{LLM}}$. Presenting BM25- and LLM-based candidates side by side also allows experts to cross-check lexical and semantic retrieval quality, identify missing but relevant TTPs, and correct occasional noise introduced by either source.
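As a concrete illustration, the sketch below merges the two candidate sources under the stated budgets (top-20 BM25, top-5 LLM). The `rank_bm25` package provides the BM25 scorer; `llm_top5` is a hypothetical stub standing in for the GPT-3.5 zero-shot call, whose prompt and parsing are not reproduced here.

```python
# Minimal sketch of hybrid candidate generation (BM25 top-20 + LLM top-5).
# Assumes `ttp_ids` and `ttp_definitions` are parallel lists of technique
# IDs and definition texts; `llm_top5` is a hypothetical stand-in for the
# GPT-3.5 zero-shot retrieval step.
from rank_bm25 import BM25Okapi

def bm25_top20(query, ttp_ids, ttp_definitions):
    tokenized_corpus = [doc.lower().split() for doc in ttp_definitions]
    bm25 = BM25Okapi(tokenized_corpus)
    scores = bm25.get_scores(query.lower().split())
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [ttp_ids[i] for i in ranked[:20]]

def build_candidate_set(query, ttp_ids, ttp_definitions, llm_top5):
    lexical = bm25_top20(query, ttp_ids, ttp_definitions)   # C_i^BM25
    semantic = llm_top5(query)                              # C_i^LLM, e.g., ["T1105", ...]
    return list(dict.fromkeys(lexical + semantic))          # union, order-preserving dedup
```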
4.1.2. Expert Scoring Protocol
Each unique TTP label $t_j \in C_i$ is independently evaluated by all three domain experts, who assign a semantic matching score $s_{ij}^{(k)} \in \{0, 1, 2, 3, 4, 5\}$. This 0–5 scoring scheme is adapted from widely used semantic textual similarity benchmarks (e.g., STS-B [53]), where human annotators rate the degree of semantic relatedness on a bounded ordinal scale. To reduce subjectivity and standardize the process, we established formal scoring criteria as summarized in Table 1.
To ensure label consistency, cases with high inter-annotator variance were flagged for joint review, allowing the experts to resolve ambiguities and finalize a consensus score.
4.1.3. Normalization and Soft Label Generation
The resulting consensus score $s_{ij}$ is then transformed into a soft label $\tilde{y}_{ij} \in [0, 1]$ for regression-based supervision. We adopt a piecewise mapping that preserves semantic granularity: clearly irrelevant scores map to 0, clearly relevant scores map to 1, and scores in the ambiguous middle region are linearly scaled to reflect subtle semantic gradations as perceived by experts. This design ensures that the regression objective imposes penalties proportional to the degree of semantic deviation, thereby encouraging more faithful modeling of expert judgments.
This soft–hard supervision framework preserves semantic continuity while still maintaining clear decision boundaries for highly relevant or irrelevant cases. The resulting hybrid labels facilitate both fine-grained representation learning and improved generalization for TTP mapping tasks.
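Since the exact cut-points of the piecewise mapping are not reproduced above, the following sketch uses illustrative thresholds (consensus scores at or below 1 mapped to 0, at or above 4 mapped to 1, linear in between); the paper's actual boundaries may differ.

```python
def to_soft_label(consensus_score, low=1.0, high=4.0):
    """Map a 0-5 consensus score to a soft label in [0, 1].

    `low` and `high` are illustrative cut-points (assumptions, not the
    paper's exact values): scores <= low are treated as irrelevant,
    scores >= high as fully relevant, and the ambiguous region in
    between is linearly scaled.
    """
    if consensus_score <= low:
        return 0.0
    if consensus_score >= high:
        return 1.0
    return (consensus_score - low) / (high - low)
```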
4.2. Hard-Negative Sampling Based on ATT&CK Hierarchical Semantic Priors
The MITRE ATT&CK framework structures cyber-attack behaviors into a two-level hierarchy consisting of Tactics and Techniques, where each tactic represents the adversary’s strategic objective and each technique or sub-technique corresponds to a concrete implementation step. This hierarchical structure encodes rich semantic priors that can guide TTP modeling and enhance label discrimination. To leverage this property, we introduce a hierarchical hard-negative sampling strategy that injects both within-tactic and cross-tactic semantic contrast, improving the quality and diversity of negative samples and thereby strengthening the model’s fine-grained discriminative capability.
4.2.1. ATT&CK Hierarchical Semantics
Let the complete set of TTP labels be defined as $\mathcal{T} = \{t_1, t_2, \ldots, t_M\}$. The 14 high-level tactic categories are denoted as $\mathcal{A} = \{a_1, a_2, \ldots, a_{14}\}$. The mapping from each technique to its corresponding parent tactic is defined as $\phi: \mathcal{T} \rightarrow \mathcal{A}$. This mapping function assigns each technique label $t_j$ to the unique tactic $\phi(t_j)$ under which it resides. For any positive ground-truth label $t^{+}$, the techniques that share the same parent tactic are defined as its sibling techniques. The sibling set is given by

$$S(t^{+}) = \{\, t \in \mathcal{T} \mid \phi(t) = \phi(t^{+}),\ t \neq t^{+} \,\}.$$

Here, $S(t^{+})$ denotes the set of sibling techniques under the same high-level tactic $\phi(t^{+})$; each $t \in S(t^{+})$ denotes an individual sibling technique that shares the same tactic semantics with $t^{+}$.
Although sibling techniques are semantically related at the tactic level, they may differ significantly in their technical implementation. For example, the TTP labels T1059.001 (PowerShell) and T1059.005 (Visual Basic) both belong to the "Execution" tactic. While both represent script-interpreter execution, their operational mechanisms differ across scripting environments. Thus, techniques that appear tactically similar may still constitute different types of hard-negative samples for the model.
To quantify semantic relationships between labels, we use a MiniLM pretrained language model to encode the definition text of each TTP label and obtain corresponding label embeddings. The semantic similarity between a positive label $t^{+}$ and a negative label $t^{-}$ is measured as the cosine similarity between their embeddings $e(t^{+})$ and $e(t^{-})$:

$$\operatorname{sim}(t^{+}, t^{-}) = \frac{e(t^{+}) \cdot e(t^{-})}{\lVert e(t^{+}) \rVert \, \lVert e(t^{-}) \rVert}.$$

This similarity measure provides the basis for constructing hierarchical hard-negative samples under the ATT&CK framework.
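A minimal sketch of the label-embedding step using the sentence-transformers MiniLM encoder; the two definition strings are abbreviated placeholders, not the full ATT&CK texts. With L2-normalized embeddings, the cosine similarity reduces to a dot product.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # d = 384

definitions = {  # abbreviated placeholder texts
    "T1059.001": "Adversaries may abuse PowerShell commands and scripts ...",
    "T1059.005": "Adversaries may abuse Visual Basic for execution ...",
}
ids = list(definitions)
emb = encoder.encode([definitions[t] for t in ids],
                     normalize_embeddings=True)    # shape (M, 384)
sim_matrix = np.asarray(emb) @ np.asarray(emb).T   # pairwise cosine similarity
```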
4.2.2. In-Tactic Hard-Negative Sampling
To enhance the model’s ability to distinguish different techniques under the same tactic, we construct a similarity-based Softmax sampling distribution. This distribution incorporates temperature scaling, allowing techniques that are closer to the positive label in semantic space to receive higher sampling probability.
Given an attack-description text and its corresponding positive technique label $t^{+}$, let its sibling technique set be $S(t^{+})$. Using MiniLM to encode the definition texts of TTP labels, we obtain the embedding vectors $e(t^{+})$ and $e(t_k)$ for each sibling $t_k \in S(t^{+})$. The semantic similarity between $t^{+}$ and a sibling technique $t_k$ is computed as the cosine similarity $\operatorname{sim}(t^{+}, t_k)$ defined above. Within the sibling set $S(t^{+})$, to automatically select more confusing hard-negative samples, the sampling probability of a sibling technique $t_k$ is defined as

$$P(t_k \mid t^{+}) = \frac{\exp\bigl(\operatorname{sim}(t^{+}, t_k)/\tau\bigr)}{\sum_{t_l \in S(t^{+})} \exp\bigl(\operatorname{sim}(t^{+}, t_l)/\tau\bigr)},$$

where the temperature parameter $\tau$ controls the smoothness of the sampling distribution. When $\tau \to 0$, the distribution approaches a one-hot form, meaning the sibling technique most similar to the positive label is almost always selected. When $\tau$ is large, the distribution becomes smoother, enabling broader exploration among sibling techniques.
Finally, based on the probability $P(t_k \mid t^{+})$, a sibling technique is sampled to construct the in-tactic hard-negative set $\mathcal{N}_{\mathrm{intra}}$.
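A sketch of the temperature-scaled softmax sampler described above, written with numpy; `sims` holds the cosine similarities over the sibling set, and the default temperature is illustrative.

```python
import numpy as np

def sample_intra_tactic_negative(sibling_ids, sims, tau=0.1, rng=None):
    """Sample one sibling technique with probability softmax(sim / tau).

    tau=0.1 is an illustrative default; Section 5.3 tunes this value.
    """
    rng = rng or np.random.default_rng()
    logits = np.asarray(sims, dtype=float) / tau
    logits -= logits.max()                         # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return rng.choice(sibling_ids, p=probs)
```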
4.2.3. Cross-Tactic Hard Negative Sampling
To maintain global semantic stability and mitigate class imbalance, this work introduces a cross-tactic hard negative sampling mechanism at the tactic level. The idea is to draw negative samples from tactics different from that of the given positive label, expanding semantic coverage and preventing the model from overly focusing on locally clustered categories.
For a given positive label $t^{+}$ with its tactic defined as $a_p = \phi(t^{+})$, negative samples may be drawn from the set of techniques belonging to tactics $a_q \neq a_p$. We define the average semantic distance between two distinct tactics $a_p$ and $a_q$ as

$$\operatorname{dist}(a_p, a_q) = 1 - \frac{1}{\lvert T_p \rvert \, \lvert T_q \rvert} \sum_{t_i \in T_p} \sum_{t_j \in T_q} \operatorname{sim}(t_i, t_j),$$

where $T_p$ and $T_q$ denote the sets of TTP technique labels belonging to tactics $a_p$ and $a_q$, respectively, and $\lvert \cdot \rvert$ indicates their cardinalities. The function $\operatorname{sim}(\cdot, \cdot)$ measures the semantic cosine similarity between label embeddings. This distance quantifies the average semantic gap between the two tactics: a larger value indicates greater semantic discrepancy.
Based on the average semantic distance across tactics, we construct a tactic-level sampling distribution. By computing the average cosine-similarity gap between all technique pairs across tactics, this measure reflects the degree of semantic overlap or separation among tactics: smaller similarity and larger distance imply stronger semantic divergence.
Given a positive sample associated with tactic $a_p$, the probability of selecting a cross-tactic negative sample from tactic $a_q$ is defined by an inverse-distance Softmax:

$$P(a_q \mid a_p) = \frac{\exp\bigl(1/(\operatorname{dist}(a_p, a_q) + \epsilon)\bigr)}{\sum_{a_r \neq a_p} \exp\bigl(1/(\operatorname{dist}(a_p, a_r) + \epsilon)\bigr)},$$

where $\epsilon$ is a small positive constant introduced to avoid division by zero.
The sampling distribution ensures that (1) larger semantic distance yields a lower sampling probability, indicating that semantically distant tactics contribute less frequently as negative examples; (2) smaller semantic distance yields a higher sampling probability, meaning that more semantically related tactics are preferentially sampled as hard negatives.
This mechanism ensures sufficient cross-tactic negative coverage without deviating from the global semantic space.
After sampling a target tactic $a_q$ according to $P(a_q \mid a_p)$, negative labels are then drawn from the technique label set $T_q$, following the intra-tactic sampling distribution $P(t_k \mid t^{+})$, forming the final cross-tactic hard-negative set $\mathcal{N}_{\mathrm{cross}}$.
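The two-stage cross-tactic sampler can then be sketched as follows, reusing the intra-tactic sampler above. `tactic_dist` holds the average distances to every other tactic, and the epsilon default is an assumption.

```python
import numpy as np

def sample_cross_tactic_negative(other_tactics, tactic_dist,
                                 techniques_by_tactic, sims_by_tactic,
                                 eps=1e-6, rng=None):
    """Two-stage sampling: pick a tactic by inverse-distance softmax,
    then pick a technique inside it via the intra-tactic sampler.

    `other_tactics` excludes the positive label's own tactic a_p;
    eps=1e-6 is an assumed small constant.
    """
    rng = rng or np.random.default_rng()
    inv = 1.0 / (np.asarray(tactic_dist, dtype=float) + eps)  # 1 / (dist + eps)
    inv -= inv.max()                                          # numerical stability
    probs = np.exp(inv) / np.exp(inv).sum()
    q = rng.choice(len(other_tactics), p=probs)
    tactic = other_tactics[q]
    return sample_intra_tactic_negative(techniques_by_tactic[tactic],
                                        sims_by_tactic[tactic], rng=rng)
```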
4.3. Dual-Tower Encoder and Gated Binary Interaction Modeling
Unstructured attack descriptions and standardized TTP labels exhibit substantial semantic asymmetry. The former is linguistically loose, diverse, and easily affected by noise, whereas the latter emphasize conceptual definition and hierarchical structure, with stable abstraction granularity. Directly concatenating and feeding both into a unified encoder tends to blur semantics, introduce noise, and weaken the independence and discrimination capability of the label semantic space.
To address this issue, we propose a dual-tower encoder with gated binary interaction, forming a semantic matching architecture named DTGBI-TM. The model jointly balances semantic comprehension, alignment between similar semantics, and separation of distinct semantics. The overall architecture (Figure 2) consists of three components:
- (1) Dual-tower encoders: independently encode attack descriptions and TTP definitions to ensure semantic consistency within each space;
- (2) Gated binary interaction: models fine-grained semantic similarity and discrepancy through an asymmetric interaction pathway;
- (3) Matching prediction layer: outputs matching confidence, providing supervision signals for multi-objective optimization.
4.3.1. Dual-Tower Encoding Layer
Given an input attack description $x$ and TTP label definition $y$, we first use a MiniLM pretrained model to extract contextualized sentence embeddings. Let $u \in \mathbb{R}^{d}$ and $v \in \mathbb{R}^{d}$ denote the resulting embeddings:

$$u = \operatorname{MiniLM}(x_{1:N}), \qquad v = \operatorname{MiniLM}(y_{1:M}),$$

where $N$ and $M$ denote the input sequence lengths, MiniLM represents the pretrained encoder, and $d$ is the embedding dimension.
The dual-tower structure preserves independent encoding for the two inputs, avoiding semantic interference while maintaining shared model parameters to ensure consistent representation spaces. This encoding strategy retains both the contextual richness of attack descriptions and the abstract stability of label semantics, providing disentangled representations for subsequent interaction layers.
4.3.2. Gated Binary Interaction Semantic Representation Layer
To capture deeper semantic relations between attack descriptions and the definitions of TTP labels, this work introduces a gated binary interaction mechanism. The encoded representations are projected onto complementary shared-semantic and differential-semantic channels, enabling an adjustable semantic fusion between shared-mode and boundary-mode representations.
First, let the attack description vector $u$ and the TTP label vector $v$ serve as the basis units. The binary interaction representation is constructed as

$$z = [\, u \,;\, v \,;\, u \odot v \,;\, \lvert u - v \rvert \,],$$

where $\odot$ denotes element-wise multiplication, $\lvert u - v \rvert$ denotes the element-wise absolute difference, and $[\cdot\,;\,\cdot]$ denotes vector concatenation. The term $u \odot v$ represents the shared-semantic path, used to highlight common semantic activation dimensions of the two vectors; the term $\lvert u - v \rvert$ represents the differential-semantic path, emphasizing dimension-wise semantic discrepancies.
Although static concatenation can encode basic similarity and difference signals, it lacks the capability to dynamically balance various semantic relations. Therefore, a gating mechanism is introduced to adaptively control the fusion ratio between the shared-semantic and differential-semantic channels. A learnable $d$-dimensional gating weight vector $g \in \mathbb{R}^{d}$ is computed from the interaction representation, controlling how the two channels are combined:

$$g = \sigma(W_g z + b_g),$$

where $W_g$ and $b_g$ are trainable parameters, and $\sigma$ is the Sigmoid activation.
The gating weight $g$ acts as a dynamic control signal, determining the fusion direction in semantic space. Using $g$ to weight the two channels, the fused semantic representation is computed as

$$f = g \odot (u \odot v) + (1 - g) \odot \lvert u - v \rvert.$$

This representation jointly models semantic consistency and semantic discrepancy. For positive samples, gradient updates encourage larger gate values on the shared-semantic dimensions $u \odot v$, reinforcing semantic consistency; for negative samples, gradient updates encourage larger weights on the differential dimensions $\lvert u - v \rvert$, improving the model's boundary discrimination capability and enhancing representation separability.
4.3.3. Matching Prediction Layer
Finally, the fused semantic representation $f$ is projected through a two-layer feedforward network to obtain the matching confidence:

$$p = \sigma\bigl(W_2\, \rho(W_1 f + b_1) + b_2\bigr),$$

where $W_1$, $W_2$, $b_1$, and $b_2$ are trainable weights and biases, and $\rho$ is a nonlinear activation (e.g., ReLU). The Sigmoid output $p$ quantifies the matching probability between the attack description and the TTP label and serves as the supervisory signal for multi-objective joint optimization (Section 4.4).
Through the gated interaction mechanism, DTGBI-TM leverages hierarchical priors to achieve dynamic balance between semantic consistency and semantic discrepancy. The model’s semantic space is guided to maintain both sufficient shared semantic activation for positive samples and strengthened differential semantics for negative samples. This provides a solid semantic foundation for the downstream multi-objective optimization process.
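The interaction and prediction layers reconstructed above can be expressed as a compact PyTorch module; the hidden width `h` and the ReLU nonlinearity are assumed choices, as the paper does not restate them here.

```python
import torch
import torch.nn as nn

class GatedBinaryInteraction(nn.Module):
    """Sketch of Sections 4.3.2-4.3.3: gated interaction + prediction head.

    h=256 and ReLU are assumptions, not values confirmed by the paper.
    """
    def __init__(self, d=384, h=256):
        super().__init__()
        self.gate = nn.Linear(4 * d, d)       # W_g, b_g
        self.head = nn.Sequential(            # two-layer feedforward predictor
            nn.Linear(d, h), nn.ReLU(),
            nn.Linear(h, 1), nn.Sigmoid(),
        )

    def forward(self, u, v):
        shared = u * v                        # shared-semantic channel u (.) v
        diff = (u - v).abs()                  # differential channel |u - v|
        z = torch.cat([u, v, shared, diff], dim=-1)
        g = torch.sigmoid(self.gate(z))       # dimension-wise gate
        f = g * shared + (1.0 - g) * diff     # fused representation
        p = self.head(f).squeeze(-1)          # matching confidence
        return p, f, g
```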
4.4. Multi-Objective Joint Optimization Strategy
The multi-objective joint optimization strategy forms the key component of DTGBI-TM training. Robust optimization strategies are essential for deep learning in cybersecurity [54]. In this work, we focus on constraining the decision boundary through loss function design. The model uses the fused semantic representation $f$ and the gating vector $g$ as the basis for optimization. The objective is to jointly minimize classification loss, maintain smoothness in the semantic space, and enhance the model's discrimination capability on fine-grained samples.
Specifically, the optimization strategy integrates three complementary objectives: (1) supervision aligned with hard labels, (2) semantic-level regression, and (3) representation consistency regularization. These objectives together shape the unified semantic space by simultaneously enhancing the separability of positive and negative samples and improving robustness in long-tailed or ambiguous cases.
To this end, the model combines multi-level cues—sentence-level semantics, label-level semantics, and cross-sample structural signals—to avoid relying solely on any single semantic granularity, ensuring stable and balanced semantic optimization.
4.4.1. Classification and Semantic Regression Joint Supervision
First, the classification objective $\mathcal{L}_{\mathrm{cls}}$ adopts a weighted binary cross-entropy loss to determine the matching relation between an attack description and a TTP label:

$$\mathcal{L}_{\mathrm{cls}} = -\frac{1}{B} \sum_{b=1}^{B} w_b \bigl[\, y_b \log p_b + (1 - y_b) \log(1 - p_b) \,\bigr],$$

where $B$ is the batch size, $y_b \in \{0, 1\}$ is the binary hard label of the $b$-th attack description, $p_b$ is the matching confidence output from the gated interaction layer, and $w_b$ is a balancing weight.
The explicit supervision ensures that the model learns a clear class decision boundary, allowing weakly correlated samples to remain separable in the semantic space.
To model the continuous nature of semantic similarity between an attack description and a TTP label, a regression objective $\mathcal{L}_{\mathrm{reg}}$ based on cosine similarity is introduced:

$$\mathcal{L}_{\mathrm{reg}} = \frac{1}{B} \sum_{b=1}^{B} \bigl( \tilde{y}_b - \cos(f_b, v_b) \bigr)^2,$$

where $\tilde{y}_b$ denotes the soft semantic similarity label, and $\cos(f_b, v_b)$ is the cosine similarity between the fused representation $f_b$ of the $b$-th attack description and the label embedding $v_b$.
The regression loss calibrates the model by pulling the prediction closer to the continuous semantic similarity label, enabling the learned representations to preserve semantic continuity.
In summary, the classification objective sharpens decision boundaries, whereas the regression objective smooths the output distribution, jointly promoting both discriminative ability and semantic continuity in the learned space.
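A minimal sketch of the two supervision terms, assuming per-sample weights `w` are supplied (Section 4.4.3) and that the regression term is a mean-squared error on cosine similarity, which is one natural reading of the formulation above.

```python
import torch.nn.functional as F

def cls_and_reg_loss(p, y_hard, y_soft, f, v, w):
    """Weighted BCE plus cosine-similarity regression (Section 4.4.1).

    p: matching confidences; y_hard: binary labels; y_soft: soft labels;
    f: fused representations; v: label embeddings; w: per-sample weights.
    MSE on cosine similarity is an assumed instantiation of L_reg.
    """
    l_cls = F.binary_cross_entropy(p, y_hard, weight=w)
    cos = F.cosine_similarity(f, v, dim=-1)
    l_reg = F.mse_loss(cos, y_soft)
    return l_cls, l_reg
```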
4.4.2. Semantic Margin Regularization and Contrastive Learning
To enhance the model's discriminative capability in the semantic latent space—particularly to reinforce margin constraints for hard negative samples—we introduce a contrastive learning objective based on the InfoNCE principle. The loss is defined as

$$\mathcal{L}_{\mathrm{con}} = -\frac{1}{B} \sum_{b=1}^{B} \log \frac{\exp\bigl(\cos(f_b, v_b^{+})/\tau'\bigr)}{\exp\bigl(\cos(f_b, v_b^{+})/\tau'\bigr) + \sum_{v^{-} \in \mathcal{N}_b} \exp\bigl(\cos(f_b, v^{-})/\tau'\bigr)},$$

where $\tau'$ is a temperature parameter; $\cos(\cdot, \cdot)$ denotes cosine similarity between normalized vectors; and $v_b^{+}$ and the elements of $\mathcal{N}_b$ represent the gating-weighted positive and negative semantic vectors associated with sample $b$. This objective pulls the pair $(f_b, v_b^{+})$ closer while pushing the pairs $(f_b, v^{-})$ apart under a shared normalization framework.
As a result, the dominant direction of $f_b$ becomes aligned with $v_b^{+}$ to strengthen the semantic consistency for positive samples, while negative samples are repelled such that $f_b$ avoids being influenced by false semantic associations encoded in $v^{-}$. This induces a clear margin structure in the semantic space.
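The contrastive term can be sketched as a standard InfoNCE with the positive logit at index 0; tau=0.07 is a common default rather than the paper's tuned value.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(f, v_pos, v_negs, tau=0.07):
    """InfoNCE over one positive and K hard negatives per sample.

    f: (B, d) fused vectors; v_pos: (B, d) positives; v_negs: (B, K, d)
    hard negatives. tau=0.07 is an assumed default (Section 5.3 tunes it).
    """
    pos = F.cosine_similarity(f, v_pos, dim=-1) / tau                 # (B,)
    neg = F.cosine_similarity(f.unsqueeze(1), v_negs, dim=-1) / tau   # (B, K)
    logits = torch.cat([pos.unsqueeze(1), neg], dim=1)                # (B, 1+K)
    target = torch.zeros(f.size(0), dtype=torch.long, device=f.device)
    return F.cross_entropy(logits, target)                            # positive at index 0
```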
4.4.3. Dynamic Class Weighting and Joint Optimization Objective
In annotated datasets, TTP labels commonly exhibit a long-tailed distribution, where certain techniques have extremely scarce strong-positive samples. Under conventional training, the model can be biased toward majority classes, making it difficult to capture rare but critical semantics. To address this issue, we introduce a dynamic reweighting strategy that adaptively adjusts class-specific weights based on the ratio of strong-positive versus negative samples within each training batch.
Let the dynamic class weights be defined in inverse proportion to the within-batch class frequencies,

$$w^{+} \propto \frac{1}{n^{+}}, \qquad w^{-} \propto \frac{1}{n^{-}},$$

where $n^{+}$ and $n^{-}$ denote the numbers of strong-positive and irrelevant samples in the current batch, respectively. The weights $w^{+}$ and $w^{-}$ instantiate the sample-specific balancing term $w_b$ used in the classification loss.
This strategy constructs an adaptive cost-sensitive learning mechanism that dynamically increases the relative weight of minority classes in the loss function, ensuring that the model does not overlook rare but essential TTP patterns, thereby mitigating bias induced by long-tailed distributions.
Finally, the joint optimization objective of DTGBI-TM is defined as

$$\mathcal{L} = \mathcal{L}_{\mathrm{cls}} + \mathcal{L}_{\mathrm{reg}} + \mathcal{L}_{\mathrm{con}},$$

where the combined loss integrates classification boundaries, continuous semantic regression, and contrastive margin regularization. We adopt an equal weighting strategy for these terms: since all three objectives operate on normalized vectors or bounded probability spaces, their magnitudes remain inherently comparable throughout training, rendering additional balancing hyperparameters unnecessary. These three objectives complement each other, allowing the model to form hierarchical semantic sensitivity and achieve smoother and more discriminative decision surfaces in high-dimensional semantic space.
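Putting the pieces together, the sketch below derives inverse-frequency batch weights (one reading of the dynamic weighting rule) and sums the three equally weighted terms.

```python
import torch

def batch_class_weights(y_hard):
    """Per-sample weights inversely proportional to within-batch counts.

    This inverse-frequency form is an assumed instantiation of the
    dynamic weighting rule in Section 4.4.3.
    """
    n_pos = y_hard.sum().clamp(min=1.0)
    n_neg = (1.0 - y_hard).sum().clamp(min=1.0)
    total = float(y_hard.numel())
    w_pos = total / (2.0 * n_pos)
    w_neg = total / (2.0 * n_neg)
    return torch.where(y_hard > 0.5, w_pos, w_neg)

def total_loss(l_cls, l_reg, l_con):
    # Equal weighting: all terms live on bounded or normalized scales.
    return l_cls + l_reg + l_con
```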
4.5. TTP Inference Output Mechanism and Deployment Overhead Analysis
4.5.1. TTP Inference Output Mechanism
After model training is completed, the goal of the inference stage is, for any attack description $d$, to predict from the complete TTP label set $\mathcal{T}$ the subset of labels most semantically relevant to $d$, denoted as $\hat{\mathcal{T}}(d)$. The inference process first computes matching confidence scores and then performs thresholding, ranking, and truncation.
For each attack description $d$ and any candidate label $t_j \in \mathcal{T}$, their semantic relatedness is computed by the trained DTGBI-TM model, producing an estimated confidence score $p(d, t_j)$, which must be converted into a binary relevance prediction. This requires selecting a decision threshold $\theta$. Following prior work, the optimal threshold $\theta^{*}$ is determined by maximizing the Micro-F1 score on validation data:

$$\theta^{*} = \arg\max_{\theta} \ \text{Micro-F1}(\theta).$$

With the optimal threshold, the initial binary relevance prediction is

$$\hat{y}_j = \mathbb{1}\bigl[\, p(d, t_j) \geq \theta^{*} \,\bigr].$$

Considering that a single attack description may correspond to multiple TTPs, all candidate labels satisfying $p(d, t_j) \geq \theta^{*}$ are retained. These labels are then ranked in descending order of their confidence scores, and the top-$K$ labels are selected as the final prediction result.
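The inference procedure maps directly onto a few lines of numpy; `k=3` is an illustrative truncation depth, not the paper's reported setting.

```python
import numpy as np

def predict_ttps(scores, label_ids, theta_star, k=3):
    """Threshold, rank, and truncate candidate labels (Section 4.5.1).

    scores: confidences p(d, t_j) over all candidate labels;
    k=3 is an illustrative top-K depth (an assumption).
    """
    scores = np.asarray(scores, dtype=float)
    keep = np.where(scores >= theta_star)[0]     # thresholding at theta*
    ranked = keep[np.argsort(-scores[keep])]     # descending confidence
    return [(label_ids[i], float(scores[i])) for i in ranked[:k]]
```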
4.5.2. Model Deployment Overhead Analysis
To evaluate the resource efficiency and deployability of the model, we conduct a parameter and computational complexity analysis based on DTGBI-TM. The encoder is all-MiniLM-L6-v2 with an output embedding dimension of $d = 384$. The analysis below focuses on the newly introduced modules above the encoder.
- (1) Parameter Analysis.
DTGBI-TM introduces a gating interaction module and two feed-forward layers on top of the dual-tower encoder. Let the concatenated input to the gating module be $z \in \mathbb{R}^{4d}$, and let the gating projection matrix be $W_g \in \mathbb{R}^{d \times 4d}$ with bias $b_g \in \mathbb{R}^{d}$. The feed-forward layers have weights $W_1 \in \mathbb{R}^{h \times d}$, $W_2 \in \mathbb{R}^{1 \times h}$ and bias terms $b_1 \in \mathbb{R}^{h}$, $b_2 \in \mathbb{R}$, where $h$ denotes the hidden width of the prediction head. Therefore, the total number of additional parameters is

$$N_{\mathrm{add}} = (4d^2 + d) + (hd + h) + (h + 1).$$

Substituting $d = 384$ yields well under one million additional parameters, i.e., only a few megabytes in FP32 and roughly half that in FP16. This is significantly smaller than the parameters of the encoder itself (about 22M), indicating a lightweight add-on module.
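As a quick sanity check of the parameter formula, the sketch below evaluates it for d = 384 with an assumed hidden width h = 256; the paper does not restate h here, so the resulting count (about 0.69M) is illustrative only.

```python
def addon_param_count(d=384, h=256):
    """Evaluate N_add = (4d^2 + d) + (hd + h) + (h + 1).

    h=256 is an assumed hidden width; only d=384 is given by the encoder.
    """
    gating = 4 * d * d + d      # W_g in R^{d x 4d} plus bias b_g
    ffn1 = h * d + h            # W_1 in R^{h x d} plus bias b_1
    ffn2 = h + 1                # W_2 in R^{1 x h} plus bias b_2
    return gating + ffn1 + ffn2

print(addon_param_count())      # 689025, i.e., ~0.69M for d=384, h=256
```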
- (2) Computational Complexity Analysis.
The total computation of DTGBI-TM consists of the encoding stage and the interaction stage. The model employs a dual-tower design where each input attack sequence $x$ and each label definition $y$ are encoded independently. During inference, label embeddings can be pre-computed and cached offline; thus, each online query only requires encoding $x$ once. The encoding complexity is

$$O\bigl(N_{\mathrm{layer}} \cdot (L^2 H + L H^2)\bigr),$$

where $L$ is the input sequence length, $H$ is the hidden dimension, and $N_{\mathrm{layer}}$ is the number of Transformer layers.
After encoding, the system must compute interaction features between $x$ and candidate labels. Let the number of candidate labels be $M$, with each label embedding of dimension $d$. The interaction module comprises element-wise operations with cost $O(d)$ and feed-forward computations with cost $O(d^2 + hd)$ per candidate. Therefore, the total interaction complexity for all $M$ candidates is

$$O\bigl(M \cdot (d^2 + hd)\bigr).$$
During training, DTGBI-TM introduces hierarchical hard-negative sampling. Given cached label embeddings, the computational cost of negative sampling is dominated by similarity computation and weighting:

$$O\bigl((\lvert S(t^{+}) \rvert + \lvert \mathcal{A} \rvert) \cdot d\bigr),$$

where $\lvert S(t^{+}) \rvert$ denotes the size of the sibling technique set and $\lvert \mathcal{A} \rvert$ denotes the number of tactic groups. This overhead is relatively small because the model samples only a constant number of negatives per step.
Overall, the total complexity of DTGBI-TM is

$$O\bigl(N_{\mathrm{layer}} \cdot (L^2 H + L H^2) + M \cdot (d^2 + hd)\bigr).$$

By contrast, a Cross-Encoder concatenates inputs into a sequence of length $L_x + L_y$ and performs self-attention in each Transformer layer with cost

$$O\bigl(N_{\mathrm{layer}} \cdot ((L_x + L_y)^2 H + (L_x + L_y) H^2)\bigr),$$

which must be recomputed for every candidate label and cannot reuse intermediate representations. As the sequence length grows, this incurs quadratic scaling and significantly slows inference. In contrast, DTGBI-TM benefits from a decoupled architecture that caches label embeddings offline and performs efficient vector interaction online, improving deployability and offering clear advantages in scenarios with large candidate label sets or long label-definition sequences.
5. Experimental Results
This section presents comprehensive experiments on semantic alignment and TTP prediction, which are the two core tasks of the proposed DTGBI-TM model. For semantic alignment, the model enhances the consistency between matched samples in the semantic space by enlarging the inter-class margin and tightening the intra-class compactness. For TTP prediction, the model aims to identify the correct technique labels associated with an attack description. We conduct detailed evaluations on both tasks across multiple representative baselines. Furthermore, we introduce an additional cross-domain generalization experiment based on the TRAM benchmark to verify the robustness of the model in realistic heterogeneous intelligence scenarios.
We report precision, recall, and the F1 score as the primary evaluation metrics. The semantic-matching baselines include TF-IDF, BM25/TTPDrill, SecureBERT-based matching, and TTPHunter. For fine-grained similarity analysis, we further include Sentence-BERT and recent LLM-based reasoning methods (e.g., TTPXHunter). These models help assess the ability to distinguish between semantically close yet label-divergent samples. We then compare the proposed DTGBI-TM with these methods to validate its effectiveness in both semantic alignment and technique prediction.
5.1. Datasets and Evaluation Metrics
5.1.1. Datasets
The primary dataset used in the experiments is constructed from publicly available threat intelligence reports, APT analytical bulletins, and email-behavior logs. After manual verification, we follow the "soft label annotation protocol" to obtain sample pairs consisting of attack descriptions and their corresponding TTP definitions. The resulting dataset contains 9760 samples, which are split into training and testing subsets with an 8:2 ratio based on the soft-label score distribution. Specifically, 7808 samples are used for training and 1952 for testing, covering a total of 278 TTP labels.
The soft-label scores primarily concentrate around boundary regions, reflecting the continuous semantic relatedness between attack descriptions and TTP labels. We adopt 5-fold cross-validation on the training set and report the best-performing results on the validation folds.
Table 2 further shows the data distribution across different soft-label intervals.
To evaluate out-of-distribution generalization under cross-domain settings, we employ the TRAM (Threat Report ATT&CK Mapper) dataset as a migration-testing benchmark. TRAM is curated by MITRE’s threat intelligence team and is one of the most widely used public datasets for attack-technique identification. Its original annotation scheme is based on simplified ATT&CK indexing, resulting in a large number of semi-structured samples with incomplete context or missing relational information.
Crucially, TRAM provides only discrete binary hard labels and lacks the fine-grained continuous semantic scores required for the soft-label regression objective ($\mathcal{L}_{\mathrm{reg}}$) central to our framework. This limitation makes it unsuitable for the training phase of DTGBI-TM, which relies on soft supervision to capture boundary semantics. Consequently, we reserve TRAM exclusively for testing, using it to emulate real-world domain-shift scenarios and examine the model's robustness in handling noisy or weakly structured attack descriptions.
Table 3 summarizes the general statistics of both the primary dataset and the TRAM dataset. To provide a more detailed view of the class distribution within the primary dataset, Figure 3 illustrates the exact sample counts for the top-20 TTP labels. The visualization reveals a significant long-tailed distribution, where high-frequency techniques (e.g., T1059.001, PowerShell) dominate, while the prevalence of subsequent classes decays rapidly. This characteristic highlights the necessity of the proposed dynamic class weighting and hard-negative sampling mechanisms to handle label imbalance effectively.
5.1.2. Evaluation Metrics
Both the semantic modeling optimization task and the TTP prediction task are evaluated using Micro-averaged Precision, Recall, and F1-score, which are widely adopted for performance assessment under imbalanced multi-label classification settings. For deployment evaluation, we additionally report model parameter size, inference latency, and GPU memory consumption.
5.2. Baselines and Experimental Configuration
5.2.1. Baseline Implementations
To thoroughly assess the effectiveness of the proposed DTGBI-TM model, we compare it against representative baselines covering four categories: sparse retrieval, deep-learning-based classification, semantic matching, and LLM-based prompt learning. The baselines and their implementation details are as follows:
- (1) Sparse Retrieval Methods.
TF-IDF. Constructs bag-of-words representations using term frequency and inverse document frequency, and computes similarity via cosine similarity.
BM25 (TTPDrill). A probabilistic retrieval algorithm that jointly accounts for term frequency and document length. BM25 is adopted by the TTPDrill system as its core mapping module.
- (2) Deep-Learning-Based Multi-Class Classification.
SecureBERT + Classifier (TTPHunter). TTPHunter leverages SecureBERT—pre-trained on cybersecurity corpora—to extract textual features, which are then fed into a classification layer for multi-class TTP prediction.
SecureBERT-DTGBI-TM + Classifier. Based on SecureBERT, this model integrates the proposed soft–hard label joint supervision mechanism, gated bi-interaction semantic feature modeling, and multi-objective loss optimization. These enhancements improve robustness and discriminative ability before the final classification layer.
- (3) Semantic Matching.
Sentence-BERT. Encodes attack descriptions and TTP label definitions independently with a Siamese dual encoder and performs matching via cosine similarity in the shared embedding space.
- (4) LLM-Based Prompt Learning.
LLM (TTPXHunter). We implemented a few-shot, prompt-based inference strategy consistent with recent works such as TTPXHunter. The input combines the attack description with carefully designed TTP-guided system instructions and demonstrative examples, enabling the model to generate predictions via in-context learning. Specifically, we employed Llama3-70B (via 4-bit quantization) as the execution engine for this baseline. It is important to clarify the model roles: while GPT-3.5 was utilized solely for auxiliary candidate generation during data construction (Section 4.1), Llama3-70B serves as the primary baseline for experimental accuracy comparison. The specific few-shot prompt template is detailed in Appendix A.
5.2.2. DTGBI-TM Inference Settings
For the proposed DTGBI-TM model, the decision threshold is determined using 5-fold cross-validation on the training set by maximizing the Micro-F1 score. This optimal threshold is then fixed during evaluation on the test set.
All input sequences are truncated or padded to a maximum length of 255 tokens. Sparse-retrieval baselines perform matching over the complete TTP label repository, while LLM-based baselines adopt the few-shot prompt template detailed in Appendix A, consistent with TTPXHunter. These unified settings ensure the fairness and reproducibility of cross-model comparisons.
5.3. Experimental Setup
The experimental environment and key hyperparameter settings are summarized in Table 4. Following common practices in task-specific and domain-adapted modeling, we perform a systematic hyperparameter search to ensure fair and optimized configurations.
The backbone encoders include MiniLM and BERT-base, selected for their balance between representational capacity and computational efficiency. The optimizer is set to AdamW to leverage its stability under Transformer-based architectures. Based on preliminary validation, the maximum number of training epochs is fixed to 12.
For sensitivity analysis, the learning rate, batch size, and dropout rate are tuned via a small-scale grid search. The temperature parameter for the InfoNCE loss is selected from the typical range used in the representation-learning literature (e.g., SBERT/Siamese-BERT), and the temperature for hard-negative sampling is tuned over a small grid. Final configurations are chosen based on validation performance and used consistently across all subsequent experiments to ensure reproducibility.
To further analyze the behavior of the hard-negative sampling strategy inspired by ATT&CK hierarchical semantics, we examine how the sampling temperature influences model performance. The temperature controls the sharpness of the semantic similarity distribution when selecting hard negatives: lower values concentrate the distribution, whereas higher values yield smoother weighting.
As shown in Figure 4, when the temperature is very small, the F1 score is relatively low, likely because overly concentrated sampling forces the model to focus on a small number of extreme hard negatives and thus destabilizes optimization. As the temperature increases toward moderate values, performance steadily improves; increasing it further leads to a slight drop, indicating that sampling becomes overly smooth and fails to emphasize challenging negatives. Overall, a moderate temperature achieves the best balance and is adopted as the fixed setting in all later experiments to ensure stable performance and reproducibility.
5.4. Experiments and Result Analysis
This section provides a comprehensive evaluation of the proposed DTGBI-TM model across multiple TTP-related tasks. We assess the model from four perspectives: (1) semantic representation learning; (2) TTP classification under both soft- and hard-label supervision; (3) cold-start generalization; (4) ablation analysis and efficiency evaluation. All metrics are reported in percentage form with two decimal places retained for clarity.
5.4.1. Semantic Representation Learning
This task measures the ability of DTGBI-TM to learn discriminative and boundary-aware representations of attack descriptions and TTP labels. We compare our method with representative baselines, including sparse retrieval (TF-IDF, BM25/TTPDrill), deep-learning classifiers (SecureBERT-DTGBI-TM, SecureBERT + Classifier), and semantic matching models (Sentence-BERT).
Table 5 summarizes the results.
The proposed DTGBI-TM model achieves the best performance across all three core metrics, reaching an F1 score of 98.53%. Compared with Sentence-BERT, DTGBI-TM improves the F1 score by 3.53 percentage points, and surpasses BM25 by 15.19 points. Although SecureBERT-DTGBI-TM also benefits from our optimization strategies, its large parameter scale (∼110 M) leads to substantial inference cost, significantly reducing its deployment feasibility in resource-constrained environments. By contrast, DTGBI-TM (parameter size ∼22 M) achieves competitive accuracy while substantially improving deployment efficiency, particularly for edge devices and latency-sensitive systems.
5.4.2. TTP Prediction Task
This subsection evaluates the capability of each model to accurately predict TTP labels under a fully supervised setting, where the complete label space is available during testing. To ensure fair comparison, we adopt the same baselines used in the semantic modeling task and compute Precision, Recall, and F1 under the Top-1 prediction setting for each attack description. This evaluation setting emphasizes the model’s ability to identify the single most relevant TTP label and reflects its practical utility in automated TTP detection.
The performance comparison results are summarized in Table 6. Under the Top-1 metric, DTGBI-TM achieves the highest F1 score of 79.77%, outperforming Sentence-BERT by 3.74 percentage points and surpassing LLM-based prediction by 5.9 points. These results demonstrate that the proposed semantic matching framework—incorporating hybrid supervision and multi-level interaction modeling—offers superior discriminative capability for TTP prediction.
Compared with traditional multi-class classification models, DTGBI-TM demonstrates stronger robustness when predicting labels within a large and semantically diverse TTP space. Although LLM-based methods exhibit strong linguistic understanding, their reliance on generative reasoning results in unstable confidence and weaker Top-1 accuracy. Thus, DTGBI-TM remains the most practical and reliable choice for TTP prediction.
5.4.3. Unseen-Label Generalization
To systematically evaluate the robustness of DTGBI-TM under open-label conditions and heterogeneous linguistic environments, we design two groups of unseen-label generalization experiments. These experiments assess: (1) the model’s ability to handle newly emerging TTP labels; and (2) its capability to transfer across domains with significant data-distribution shifts. Both the main dataset and the TRAM dataset are used for comprehensive analysis.
First, to assess the model's ability to process completely unseen TTP labels, we randomly select 20% of the labels in the main dataset and designate them as "cold-start" labels. These labels do not appear during training, ensuring that the model has no prior exposure to them and must infer their semantics solely from textual descriptions in the testing phase. This setup better reflects real-world open-label environments in which new attack techniques continuously emerge.
The resulting test set contains 384 samples distributed across 56 unseen labels, with approximately five samples per label. Unlike closed-set classification, the model must match the semantic representations of attack descriptions to candidate labels through similarity computation.
Table 7 reports the Top-1 prediction performance on the main dataset under unseen-label conditions.
Under this cold-start setting, DTGBI-TM achieves an F1 score of 59.27%, outperforming the best multi-class baseline by 8.44 percentage points, and surpassing LLM-based prompting methods by 13.88 percentage points. These results demonstrate that DTGBI-TM effectively models semantic discrepancies across labels and generalizes well to novel semantics. Its superiority can be attributed to three factors: (1) the binary-interaction representation provides finer-grained modeling of semantic relations between attack descriptions and candidate labels; (2) the integration of soft-label alignment and hard-label contrastive constraints enhances robustness against label noise and distribution mismatches; (3) unified multi-objective optimization yields more stable embeddings and mitigates overfitting to seen labels.
To further evaluate generalization under domain-shift conditions, we adopt the TRAM dataset as an out-of-distribution (OOD) test set. As shown in Table 8, DTGBI-TM achieves an F1 score of 47.25%, exceeding the best multi-class baseline by 4.44 percentage points and outperforming Sentence-BERT and LLM-based prompting models. These results confirm the model's enhanced resilience to domain variations and its ability to generalize across heterogeneous CTI corpora. In contrast, traditional sparse retrieval models (BM25 and TF-IDF) show limited adaptability due to insufficient semantic modeling capacity.
5.4.4. Ablation Study
To further validate the contribution of each component within the DTGBI-TM framework to overall semantic modeling performance, we conduct an ablation study by removing four core mechanisms: soft-label supervision, ATT&CK-guided hard-negative sampling, the gated bi-interaction semantic module, and the multi-objective joint loss optimization. Experimental results are presented in Table 9.
The complete DTGBI-TM model achieves the highest performance, with Precision = 98.36%, Recall = 98.70%, and F1 = 98.53%, maintaining strong consistency across all three metrics. Removing soft-label supervision substantially decreases Recall (from 98.70% to 93.32%) and reduces F1 by 3.26 percentage points, indicating that soft-label signals effectively guide the model toward boundary-aware semantic representations.
Similarly, eliminating ATT&CK-informed hard-negative sampling yields a notable decline of 2.92 percentage points in Recall, reflecting the importance of semantically structured hard negatives in enforcing fine-grained separability across confusing TTP labels. Without the gated bi-interaction module, performance decreases by 2.10 percentage points in F1, demonstrating that this module captures cross-level semantic dependencies essential for consistent representation.
Finally, removing the multi-objective loss optimization leads to a noticeable drop in all metrics, showing that combining classification, semantic regression, and contrastive learning objectives strengthens representation smoothness and improves label consistency. Overall, the results confirm that each module plays a critical and complementary role, and jointly they contribute to the performance and robustness of DTGBI-TM.
5.5. Deployment Overhead and Efficiency Evaluation
To assess the deployability of the DTGBI-TM model in engineering scenarios, this section evaluates both computational efficiency and resource consumption. We focus on two aspects: (1) the sampling overhead introduced by the ATT&CK prior–driven hierarchical negative sampling mechanism, which reflects the computational cost during training; (2) the inference-phase efficiency and memory footprint, which indicate the practicality of real-world deployment.
5.5.1. Negative Sampling Efficiency Based on ATT&CK Hierarchical Semantics
All experiments are conducted on an NVIDIA RTX 3090 GPU (24 GB memory) using the FP16 precision mode provided by PyTorch 1.9. The batch size is set to 16. In each training iteration, the model samples one positive example and eight hierarchical negative examples following the ATT&CK prior.
Table 10 summarizes the sampling efficiency.
Averaged over 1000 training steps, semantic similarity computation accounts for approximately 0.18 ms/step, whereas cross-domain negative sampling requires 0.47 ms/step, leading to an overall sampling overhead of roughly 0.65 ms/step. This constitutes only 5.3% of the total per-step computation time of DTGBI-TM, showing that the proposed hierarchical negative sampling mechanism imposes negligible overhead while effectively preserving the semantic diversity of negative examples.
5.5.2. Model Size and Deployment Cost Analysis
We further compare DTGBI-TM with representative baselines (including Sentence-BERT, SecureBERT, and large LLMs such as Llama3-70B) under identical hardware settings. The efficiency evaluation results are summarized in Table 11. Sentence-BERT (all-MiniLM-L6-v2 backbone) and SecureBERT (BERT-base backbone) require significantly larger memory and higher computational cost during inference.
We evaluate each model by processing 256 test samples in batch inference for 100 iterations. Results show that DTGBI-TM, owing to its lightweight architecture, requires only 1.2 GB of GPU memory—substantially lower than SecureBERT and LLM baselines. We recognize that comparing our fine-tuned, task-specific model (DTGBI-TM) with a few-shot general-purpose LLM is not a strictly like-for-like architectural comparison. Nevertheless, this setting is informative in practice, as it highlights the efficiency–accuracy trade-off: in our experiments, a domain-optimized lightweight model (∼22 M parameters) can outperform a much larger general-purpose model (∼70 B parameters) on specialized TTP mapping, suggesting its suitability for resource-constrained deployment scenarios. In addition, the inference latency is reduced by more than 50%, demonstrating its strong deployability in edge environments or large-scale online TTP prediction systems.
Overall, DTGBI-TM achieves an excellent balance between training efficiency, inference latency, and memory usage. The model’s lightweight architecture and hierarchical negative sampling mechanism significantly reduce training cost while maintaining strong accuracy, making DTGBI-TM well-suited for deployment in resource-constrained threat intelligence matching and TTP inference systems.
6. Conclusions
This paper presents a TTP identification framework that integrates soft-label supervision, gated bi-interaction semantic modeling, and multi-objective joint optimization. The method is designed to address several key challenges in threat-technique prediction, including non-standardized attack descriptions, long-tailed label distributions, and blurred semantic boundaries. To this end, we construct a structure-aligned dataset and introduce interaction features together with consistency regularization, enabling the model to achieve stronger generalization on both boundary cases and unseen samples. Experiments across multiple subtasks validate the effectiveness and deployment potential of the proposed approach. Future work will further explore label-structure modeling and semantic relation modeling among multiple TTP labels, with the goal of enhancing model expressiveness in complex cyber-defense scenarios.
Author Contributions
Conceptualization, Z.Q. and F.L.; methodology, Z.Q.; software, Z.Q.; validation, Z.Q., F.L. and Y.Z.; formal analysis, Z.Q.; investigation, Z.Q. and B.L.; resources, M.H.; data curation, M.H.; writing—original draft preparation, Z.Q.; writing—review and editing, F.L. and Y.Z.; visualization, Z.Q.; supervision, F.L.; project administration, F.L.; funding acquisition, Z.Q. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the China Southern Power Grid Major Science and Technology Program (Project No. 037800KC24040002, GDKJXM20240428), titled “Digital Grid Cybersecurity Offense–Defense Simulation and Key Technology Research and Application Program—Multi-Dimensional Active Defense Technology for the Digital Grid”.
Data Availability Statement
The main dataset used in this study originates from internal enterprise systems of China Southern Power Grid. Due to security and confidentiality restrictions, these data cannot be publicly released. However, the TRAM dataset is available at
https://github.com/mitre-attack/tram (accessed on 15 October 2025).
Acknowledgments
The authors thank the China Southern Power Grid Information Center for technical and infrastructure support. During the preparation of this manuscript, the authors used ChatGPT (OpenAI, 2025 version) for language refinement. The authors reviewed and edited the content and take full responsibility for the final version of the publication.
Conflicts of Interest
Authors Zhenghao Qian, Fengzheng Liu, Mingdong He and Bo Li were employed by the company Guangdong Power Grid Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Appendix A. LLM Prompt Templates
To ensure the reproducibility of the LLM baseline results (Llama3-70B), we provide the exact few-shot prompt template used in our experiments. Following the inference strategy of TTPXHunter, we used in-context learning with three demonstration examples to guide the model’s prediction.
[System Instruction]
You are a senior Cyber Threat Intelligence (CTI) analyst. Your task is to map a given attack description to the most relevant MITRE ATT&CK technique.

[Constraints]
Output only the Technique ID (e.g., T1059.001) and Name. Do not provide explanations. If the description is too vague, output “None”.

[Demonstration 1]
Input: “Adversaries may use PowerShell to download payloads from a remote C2 server.”
Output: T1059.001 (PowerShell)

[Demonstration 2]
Input: “The malware copies itself to the startup folder to ensure it runs automatically upon system reboot.”
Output: T1547.001 (Registry Run Keys/Startup Folder)

[Demonstration 3]
Input: “Attackers encode the data using Base64 to conceal the command strings.”
Output: T1027 (Obfuscated Files or Information)

[Target Task]
Input: {attack_text}
Output:
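For completeness, a minimal sketch of how the template can be instantiated is shown below. The PROMPT_TEMPLATE string abbreviates the appendix (demonstrations 2 and 3 are omitted), and the final generate() call is a hypothetical placeholder for whatever serving interface hosts Llama3-70B.

# Hypothetical instantiation of the Appendix A few-shot template.
PROMPT_TEMPLATE = """[System Instruction]
You are a senior Cyber Threat Intelligence (CTI) analyst. Your task is to map a
given attack description to the most relevant MITRE ATT&CK technique.

[Constraints]
Output only the Technique ID (e.g., T1059.001) and Name. Do not provide
explanations. If the description is too vague, output "None".

[Demonstration 1]
Input: "Adversaries may use PowerShell to download payloads from a remote C2 server."
Output: T1059.001 (PowerShell)

[Target Task]
Input: "{attack_text}"
Output:"""

def build_prompt(attack_text):
    # Fill the single slot; the template contains no other brace pairs.
    return PROMPT_TEMPLATE.format(attack_text=attack_text)

prompt = build_prompt("The actor created a scheduled task to launch the implant at logon.")
# answer = generate(prompt)  # placeholder for the deployment's LLM serving call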
References
- Ren, Y.; Xiao, Y.; Zhou, Y.; Zhang, Z.; Tian, Z. CSKG4APT: A cybersecurity knowledge graph for advanced persistent threat organization attribution. IEEE Trans. Knowl. Data Eng. 2022, 35, 5695–5709.
- Qi, R.; Xiang, G.; Zhang, Y.; Yang, Q.; Cheng, M.; Zhang, H.; Ma, M.; Sun, L.; Ma, Z. A trustworthy dataset for APT intelligence with an auto-annotation framework. Electronics 2025, 14, 3251.
- Maymí, F.; Bixler, R.; Jones, R.; Lathrop, S. Towards a definition of cyberspace tactics, techniques and procedures. In Proceedings of the 2017 IEEE International Conference on Big Data, Boston, MA, USA, 11–14 December 2017; pp. 4674–4679.
- Peng, Y.; Ye, Z.; Qi, J.; Zhuo, Y. Unsupervised visual–textual correlation learning with fine-grained semantic alignment. IEEE Trans. Cybern. 2020, 52, 3669–3683.
- Huang, J.; Qiu, Q.; Sapiro, G.; Calderbank, R. Discriminative robust transformation learning. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 1333–1341.
- Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53.
- Chua, L.O. CNN: A Paradigm for Complexity; World Scientific: Singapore, 1998.
- Siami-Namini, S.; Tavakoli, N.; Namin, A.S. The performance of LSTM and BiLSTM in forecasting time series. In Proceedings of the 2019 IEEE International Conference on Big Data, Los Angeles, CA, USA, 9–12 December 2019; pp. 3285–3292.
- Lu, W.; Li, J.; Wang, J.; Qin, L. A CNN-BiLSTM-AM method for stock price prediction. Neural Comput. Appl. 2021, 33, 4741–4753.
- Aslan, M.F.; Unlersen, M.F.; Sabanci, K.; Durdu, A. CNN-based transfer learning–BiLSTM network: A novel approach for COVID-19 infection detection. Appl. Soft Comput. 2021, 98, 106912.
- Sun, C.; Qiu, X.; Xu, Y.; Huang, X. How to fine-tune BERT for text classification? In Proceedings of the China National Conference on Chinese Computational Linguistics, Kunming, China, 18–20 October 2019; pp. 194–206.
- Kula, S.; Kozik, R.; Choraś, M. Implementation of the BERT-derived architectures to tackle disinformation challenges. Neural Comput. Appl. 2022, 34, 20449–20461.
- Souza, F.C.; Nogueira, R.F.; Lotufo, R.A. BERT models for Brazilian Portuguese: Pretraining, evaluation and tokenization analysis. Appl. Soft Comput. 2023, 149, 110901.
- Lahitani, A.R.; Permanasari, A.E.; Setiawan, N.A. Cosine similarity to determine similarity measure: Case study in online essay assessment. In Proceedings of the 2016 4th International Conference on Cyber and IT Service Management, Bandung, Indonesia, 26–27 April 2016; pp. 1–6.
- Euzenat, J.; Valtchev, P. Similarity-based ontology alignment in OWL-Lite. In Proceedings of the 6th European Conference on Artificial Intelligence, Valencia, Spain, 22–27 August 2004; pp. 333–337.
- Rosenberger, C.; Brun, L. Similarity-based matching for face authentication. In Proceedings of the 2008 19th International Conference on Pattern Recognition, Tampa, FL, USA, 8–11 December 2008; pp. 1–4.
- Clark, H.H.; Clark, E.V. Semantic distinctions and memory for complex sentences. Q. J. Exp. Psychol. 1968, 20, 129–138.
- You, Y.; Jiang, J.; Jiang, Z.; Yang, P.; Liu, B.; Feng, H.; Wang, X.; Li, N. TIM: Threat context-enhanced TTP intelligence mining on unstructured threat data. Cybersecurity 2022, 5, 3.
- Zhang, M.; Li, J. A commentary of GPT-3 in MIT Technology Review 2021. Fundam. Res. 2021, 1, 831–833.
- Liu, J.; Shen, D.; Zhang, Y.; Dolan, W.B.; Carin, L.; Chen, W. What makes good in-context examples for GPT-3? In Proceedings of the Deep Learning Inside Out Workshop, Dublin, Ireland, 27 May 2022; pp. 100–114.
- Ni, J.; Abrego, G.H.; Constant, N.; Ma, J.; Hall, K.; Cer, D.; Yang, Y. Sentence-T5: Scalable sentence encoders from pre-trained text-to-text models. In Proceedings of the Findings of the Association for Computational Linguistics, Online, 1–6 August 2022; pp. 1864–1874.
- Wang, J.; Jiang, Y.; Vincent, M.; Sun, Y.; Yu, H.; Wang, J.; Bao, Q.; Kong, H.; Hu, S. Complete genome sequence of bacteriophage T5. Virology 2005, 332, 45–65.
- Fu, M.; Tantithamthavorn, C.; Le, T.; Nguyen, V.; Phung, D. VulRepair: A T5-based automated software vulnerability repair. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Singapore, 14–16 November 2022; pp. 935–947.
- Floridi, L.; Chiriatti, M. GPT-3: Its nature, scope, limits, and consequences. Minds Mach. 2020, 30, 681–694.
- Zhuang, H.; Qin, Z.; Jagerman, R.; Hui, K.; Ma, J.; Lu, J.; Wang, X.; Bendersky, M. RankT5: Fine-tuning T5 for text ranking with ranking losses. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, Taiwan, 23–27 July 2023; pp. 2308–2313.
- Schillaci, Z. On-site deployment of LLMs. In Large Language Models in Cybersecurity; Springer: Cham, Switzerland, 2024; pp. 205–211.
- Li, D.; Jiang, B.; Huang, L.; Beigi, A.; Zhao, C.; Tan, Z.; Bhattacharjee, A.; Jiang, Y.; Chen, C.; Wu, T.; et al. From generation to judgment: Opportunities and challenges of LLM-as-a-judge. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Suzhou, China, 4–9 November 2025; pp. 2757–2791.
- Rezaei-Dastjerdehei, M.R.; Mijani, A.; Fatemizadeh, E. Addressing imbalance in multi-label classification using weighted cross entropy loss. In Proceedings of the 2020 27th National and 5th International Iranian Conference on Biomedical Engineering, Tehran, Iran, 26–28 November 2020; pp. 333–338.
- Sellami, A.; Hwang, H. A robust deep convolutional neural network with batch-weighted loss for heartbeat classification. Expert Syst. Appl. 2019, 122, 75–84.
- Muthukumar, V.; Narang, A.; Subramanian, V.; Belkin, M.; Hsu, D.; Sahai, A. Classification vs regression in overparameterized regimes: Does the loss function matter? J. Mach. Learn. Res. 2021, 22, 1–69.
- Li, D.; Luo, Z. Regression loss in transformer-based supervised neural machine translation. Int. J. Comput. Commun. Control 2021, 16, 4217.
- Wu, H.; Chen, K.; Luo, Y.; Qiao, R.; Ren, B.; Liu, H.; Xie, W.; Shen, L. Scene consistency representation learning for video scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–22 June 2022; pp. 14021–14030.
- Zhao, T.; Zhao, J.; Zhou, W.; Zhou, Y.; Li, H. State representation learning with adjacent state consistency loss for deep reinforcement learning. IEEE MultiMedia 2021, 28, 117–127.
- Liu, Z.; Miao, Z.; Zhan, X.; Wang, J.; Gong, B.; Yu, S.X. Large-scale long-tailed recognition in an open world. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2537–2546.
- Addula, S.R.; Meesala, M.K.; Ravipati, P.; Sajja, G.S. A hybrid autoencoder and gated recurrent unit model optimized by Honey Badger algorithm for enhanced cyber threat detection in IoT networks. Secur. Priv. 2025, 8, e70086.
- Husari, G.; Al-Shaer, E.; Ahmed, M.; Chu, B.; Niu, X. TTPDrill: Automatic and accurate extraction of threat actions from CTI text. In Proceedings of the 33rd Annual Computer Security Applications Conference, Orlando, FL, USA, 4–8 December 2017; pp. 103–115.
- Xiong, S.H.; Wang, Z.H.; Chen, Z.S.; Li, G.; Zhang, H. Text classification of public online messages in civil aviation: A N-BM25 weighted word vectors method. Inf. Sci. 2025, 704, 121956.
- Dai, Z.; Wang, X.; Ni, P.; Li, Y.; Li, G.; Bai, X. Named entity recognition using BERT-BiLSTM-CRF for Chinese electronic health records. In Proceedings of the 2019 12th International Congress on Image and Signal Processing, Biomedical Engineering and Informatics, Suzhou, China, 19–21 October 2019; pp. 1–5.
- Zongxun, L.; Yujun, L.; Haojie, Z.; Juan, L. Construction of TTPs from APT reports using BERT. In Proceedings of the 2021 18th International Computer Conference on Wavelet Active Media Technology and Information Processing, Chengdu, China, 18–20 December 2021; pp. 260–263.
- Alves, P.M.M.R.; Geraldo Filho, P.R.; Gonçalves, V.P. Leveraging BERT to classify TTP from unstructured text. In Proceedings of the 2022 Workshop on Communication Networks and Power Systems, São Paulo, Brazil, 10–12 November 2022; pp. 1–7.
- Ge, W.; Wang, J.; Tang, B. Network Threat Intelligence Tactical Classification Based on Correlation Enhancement. J. Sichuan Univ. 2022, 59, 100–108.
- Al-Janabi, H.H.M. Philosophy: (Strategy–Tactics–Technique) tripartite negotiating. Tikrit J. Polit. Sci. 2014, 1, 57–88.
- Ge, W.; Wang, J. SeqMask: Behavior extraction over cyber threat intelligence via multi-instance learning. Comput. J. 2024, 67, 253–273.
- Ge, W.; Wang, J.; Lin, T.; Tang, B.; Li, X. Explainable cyber threat behavior identification based on self-adversarial topic generation. Comput. Secur. 2023, 132, 103369.
- Titler, M.G. The evidence for evidence-based practice implementation. In Patient Safety and Quality: An Evidence-Based Handbook for Nurses; Agency for Healthcare Research and Quality: Rockville, MD, USA, 2008.
- Liu, C.; Wang, J.; Chen, X. Threat intelligence ATT&CK extraction based on attention transformer hierarchical RNN. Appl. Soft Comput. 2022, 122, 108826.
- Chen, M.; Zhu, K.; Lu, B.; Li, D.; Yuan, Q.; Zhu, Y. AECR: Automatic attack technique intelligence extraction based on fine-tuned LLM. Comput. Secur. 2025, 150, 104213.
- Hu, Y.; Kim, H.; Ye, K.; Lu, N. Applying fine-tuned LLMs for reducing data needs in load profile analysis. Appl. Energy 2025, 377, 124666.
- Hamzic, D.; Skopik, F.; Landauer, M.; Wurzenberger, M.; Rauber, A. TTP classification with minimal labeled data: A retrieval-based few-shot approach. In Proceedings of the International Conference on Availability, Reliability and Security, Vienna, Austria, 25–28 August 2025; pp. 387–408.
- Fayyazi, R.; Taghdimi, R.; Yang, S.J. Advancing TTP analysis with retrieval-augmented LLMs. In Proceedings of the ACSAC Workshops, Austin, TX, USA, 9–13 December 2024; pp. 255–261.
- Rani, N.; Saha, B.; Maurya, V.; Shukla, S.K. TTPXHunter: Actionable threat intelligence extraction as TTPs from cyber threat reports. Digit. Threat. Res. Pract. 2024, 5, 1–19.
- Watters, P.A. The Cyber Operational Environment. In Counterintelligence in a Cyber World; Springer International Publishing: Cham, Switzerland, 2023; pp. 19–29.
- Cer, D.; Diab, M.; Agirre, E.; Lopez-Gazpio, I.; Specia, L. SemEval-2017 Task 1: Semantic Textual Similarity—Multilingual and Cross-Lingual Focused Evaluation. In Proceedings of the 11th International Workshop on Semantic Evaluation, Vancouver, BC, Canada, 3–4 August 2017; Association for Computational Linguistics: Vancouver, BC, Canada, 2017; pp. 1–14.
- Vadakkethil, S.E.; Polimetla, K.; Alsalami, Z.; Pareek, P.K.; Kumar, D. Mayfly optimization algorithm with Bidirectional Long–Short Term Memory for intrusion detection system in Internet of Things. In Proceedings of the 2024 Third International Conference on Distributed Computing and Electrical Circuits and Electronics, New York, NY, USA, 26–27 April 2024; pp. 1–4.