1. Introduction
Attack attribution in networks is a critical challenge in cybersecurity, especially when dealing with multi-step and multi-stage attacks [1], such as those carried out by Advanced Persistent Threat (APT) groups [2]. These attacks differ from isolated attacks due to their complexity and duration. According to the MITRE ATT&CK framework [3], attackers involved in such operations use a wide range of tactics, from initial reconnaissance to advanced techniques in later stages, posing significant risks to the information security triad (confidentiality, integrity, and availability). In real-world environments, these attacks often occur concurrently, further complicating their attribution. Their objectives typically include data theft, sabotage, or espionage, and they tend to follow recurring strategies and patterns proven effective in past incidents. Therefore, accurate attack attribution is crucial for identifying consistent attacker behaviors and patterns, thereby improving threat response and mitigation in complex environments.
The recent literature has explored various approaches to attack attribution. For example, Xiang et al. [4] introduced a model that integrates threat intelligence with network behavior data, achieving 88.89% accuracy in identifying attacker communities. However, this clustering-based approach does not address scenarios involving multiple adversaries, where each attack must be attributed to its specific attacker. Similarly, Li et al. [5] employed behavioral patterns and rough set theory to analyze correlations between APT groups, achieving 90.90% accuracy; their analysis operated at the group level rather than the individual instance level. Kim et al. [6] proposed an automated attribution technique that vectorizes the MITRE ATT&CK matrix and matches Indicators of Compromise (IoCs), achieving 91.48% accuracy. Nonetheless, their approach was limited to mobile malware and did not tackle attribution in multi-adversary environments or within sequences of network alerts. In summary, although significant advances have been made in attacker attribution and group-level correlation, a notable gap remains in the literature regarding precise attribution in complex, multi-adversary network scenarios, particularly in identifying the responsible actor behind each sequence of network alerts. In addition, new approaches have emerged that apply hierarchical frameworks enhanced with large language models to extract structured attribution knowledge from cybersecurity reports. While these frameworks enrich the qualitative understanding of attack attribution, they focus primarily on post-event analysis and macro-level insights, lacking direct application in real-time or sequence-based adversary identification tasks [7]. Furthermore, efforts to support cyberattack attribution through NLP and named entity recognition (NER) techniques have gained momentum, especially with the release of specialized datasets [8]. These resources, designed to extract attacker information from textual cybersecurity reports, reflect the growing interest in augmenting manual attribution workflows with large language models. However, such approaches are inherently limited by their reliance on textual data and their lack of integration with sequential alert data or network behavior patterns.
Transformers [9], particularly their attention mechanisms such as multi-head attention, have demonstrated outstanding capabilities in modeling long-term dependencies and capturing complex relationships within extensive sequences. These abilities enable the detection of relevant patterns and the effective management of multiple concurrent information flows. Motivated by these properties, this work formally investigates the impact of attention-based architectures, particularly multi-head attention, on attack attribution in multi-adversary multi-stage network attack (MSNA) scenarios. Unlike traditional models that struggle with the concurrency and complexity of diverse data streams, Transformers excel by exploiting self-attention and parallel processing, which allow them to manage intricate patterns and simultaneous processes efficiently. As a result, Transformers have seen increasing adoption in cybersecurity applications, particularly in the detection of sophisticated APT activities [10], and have shown promising results in intrusion detection tasks due to their ability to learn temporal dependencies and outperform recurrent models in terms of convergence and efficiency [11]. For instance, RAPID [10] introduced self-supervised sequential learning with iterative embeddings to adapt to dynamic system behaviors. TBDetector [12], a Transformer-based framework, has proven effective at extracting contextual features from provenance graphs. LogShield [13], another Transformer-based detection system, applied training techniques based on RoBERTa, achieving high F1-scores across multiple datasets. Hybrid models have also been explored, combining Transformers with architectures such as Temporal Convolutional Networks (TCNs) to enhance robustness and predictive performance [14]. Recently, Manocchio et al. [15] introduced FlowTransformer, a flexible and efficient framework for the systematic implementation and evaluation of Transformer-based models in network intrusion detection systems (NIDS). FlowTransformer demonstrated that well-designed Transformer architectures can achieve high accuracy (e.g., an F1-score of up to 98% on NSL-KDD) while maintaining computational efficiency, outperforming previous models in network traffic classification tasks. The study also highlighted the importance of component selection, such as the input encoding type and classification head, showing that shallow models can be as effective as larger architectures at lower computational cost.
In this work, we present a novel approach to attacker attribution in complex MSNA environments by leveraging Transformer-based models to move beyond traditional detection and address the multi-adversary challenge. Our hypothesis is that Transformers, owing to their capacity to manage multiple information flows and capture complex relationships, can outperform other architectures in classifying the responsible attacker, even when learning from multiple adversaries simultaneously.
The main contribution of this work to the state of the art lies in the introduction of a Transformer-based model tailored for attacker attribution in complex MSNA scenarios. Unlike prior approaches that focus primarily on detection or classification by attack type, our model is designed to pinpoint the specific attacker among multiple candidates, which is essential for real-world response and mitigation strategies. This approach not only improves attribution accuracy but also opens new avenues for research in complex, adversary-aware attribution systems. Our work aims to bridge a critical gap in the current literature by providing an innovative approach validated in real MSNA scenarios, promising to improve accuracy and effectiveness in identifying attackers in complex environments. Our study employs a dataset that includes multiple attacker entities, enabling us to assess the model’s effectiveness in distinguishing between different adversaries, a challenge that remains not fully addressed in previous work. To the best of our knowledge, this is the first work to achieve supervised, fine-grained attribution of individual attackers using sequential alert data in multi-adversary network environments, establishing a new benchmark for attacker attribution in complex cybersecurity scenarios.
Section 2 reviews the most relevant literature, emphasizing the limitations of existing approaches to attacker attribution. Section 3 outlines the materials and methodology used, including dataset description, preprocessing techniques, model architecture, and training–validation procedures. Section 4 presents the results, featuring exploratory data analysis, model performance metrics, class imbalance evaluation, and interpretability and robustness assessments. Finally, Section 5 discusses the implications of the findings and future research directions, while Section 6 concludes this study and outlines suggestions for future work.
2. Related Work
Attack attribution and adversary identification have been the subject of significant research in recent years, with approaches ranging from behavioral correlation and threat intelligence analysis to, more recently, the use of large language models (LLMs) and natural language processing (NLP) techniques. Much of the literature has focused on APTs and malware analysis, where attribution is often performed at the group or campaign level, leveraging static features, code artifacts, or threat intelligence extracted from textual reports [5,6,8,16,17]. For example, Li et al. [5] analyzed correlations among APT groups using behavioral patterns and rough set theory, but their approach operated at the group level and did not address individual attack instances or sequential modeling. Kim et al. [6] proposed an automated attribution method for mobile malware based on MITRE ATT&CK matrix vectorization and indicator similarity, yet their work was limited to the mobile domain and did not consider network traffic or alert sequences.
Recent advances in NLP and LLMs have enabled the extraction of tactics, techniques, and procedures (TTPs) from threat reports and the mapping of these to known actors using semantic embeddings [8,16]. While these approaches demonstrate the potential of semantic enrichment for attribution, they operate primarily on unstructured textual data and do not address the challenges of network-based, sequential attack attribution. Additionally, the problem of attribution evasion was highlighted by Skopik and Pahi [18], who showed how attackers can manipulate technical artifacts to mislead attribution systems, underscoring the need for robust and resilient models.
A comprehensive survey by Rani et al. [17] categorized existing APT attribution methods, including clustering, behavioral correlation, threat intelligence enrichment, and machine learning-based techniques. The survey identified key challenges such as the scarcity of labeled datasets, the difficulty of modeling adversarial behavior over time, and the lack of generalizable, fine-grained attribution models. However, most of these works focus on persistent, long-term campaigns and do not address the attribution of individual attackers in dynamic, multi-adversary network environments.
Closer to the context of this work, Xiang et al. [4] introduced IPAttributor, a model that combines threat intelligence with network behavior data to cluster malicious IP addresses using similarity matrices and dynamic clustering. Although IPAttributor achieves an attribution accuracy of 88.89%, its approach is unsupervised and limited to IP-level grouping, without leveraging deep learning architectures or Transformers and without addressing individualized attribution in multi-adversary scenarios. Importantly, IPAttributor does not model multi-step network attacks or long sequences of events; its analysis is based on individual events and does not consider the temporal dynamics or complexity of attacks composed of multiple steps. Furthermore, its validation is restricted to a single dataset, without hyperparameter exploration or cross-validation.
In summary, most previous approaches focus on group-level attribution, malware analysis, or semantic enrichment from text and do not directly address supervised, fine-grained attribution of individual attackers in network alert sequences within multi-adversary environments. This work situates itself at the intersection of behavioral modeling and deep learning-based attribution, addressing several of the limitations identified in the literature. Specifically, we propose a Transformer-based model for supervised attribution of individual attackers in multi-adversary environments, using temporally ordered sequences of network alerts. Unlike prior work focused on group-level attribution or static features, our approach captures dynamic, sequential behavior and enables fine-grained attribution even in highly imbalanced and realistic scenarios.
3. Materials and Methods
3.1. Dataset Description and Characterization
The dataset employed in this study originated from the DEFCON Capture The Flag 22 event [19], recognized as one of the most prestigious and technically demanding cybersecurity competitions worldwide. Held in Las Vegas from 8 to 10 August 2014, the event featured 20 finalist teams from countries such as the United States, Taiwan, South Korea, China, Russia, Australia, Germany, Japan, France, Denmark, and Poland, participating in a traditional attack–defense format. This competitive and high-pressure environment enabled the collection of full network traffic, later released in PCAP (packet capture) format, facilitating in-depth forensic analysis. Each team was assigned a distinct CIDR subnet and tasked with defending a designated .2 server IP address within its range. The vulnerable services deployed in the competition included
eliza: a text-based space economy simulation game;
wdub: a web service;
justify: a constraint solver;
imap: an email server based on the IMAP protocol.
The CTF infrastructure was carefully designed to test both offensive and defensive capabilities, requiring teams to secure their own services while simultaneously attempting to exploit vulnerabilities in their opponents’ infrastructure.
The dataset was composed of 272 game rounds, each lasting 5 min, capturing bidirectional traffic in libpcap format, including both packet headers and full payloads. Unlike typical network environments where benign traffic predominates, this dataset was uniquely characterized by a high proportion of malicious traffic, given the adversarial nature of the CTF scenario. Additionally, embedded within the traffic were 26-character validation tokens—comprising letters and digits—which teams were required to extract in order to score points.
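For illustration, scanning a decoded payload for such tokens reduces to matching 26-character alphanumeric runs; a minimal sketch, with a hypothetical payload string, is shown below.

```python
import re

# 26-character validation tokens: letters and digits, per the dataset description.
TOKEN_RE = re.compile(r"\b[A-Za-z0-9]{26}\b")

payload = "OK flag is aB3dE6gH9jK2mN5pQ8sT1vW4yZ end"  # hypothetical decoded payload
print(TOKEN_RE.findall(payload))  # ['aB3dE6gH9jK2mN5pQ8sT1vW4yZ']
```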
Table 1 provides an overview of the participating teams, detailing their assigned subnets, defended servers, and the total number of alerts generated during the competition.
According to Yam et al. [20], a detailed analysis of network traffic and team behaviors during the DEFCON CTF revealed key insights into the strategies that contributed to varying levels of team performance. In terms of offensive tactics, the most successful teams, such as PPP, HITCON, and Blue-Lotus, implemented automated systems to launch exploits at regular intervals. This approach likely enabled a consistent pace of attacks while allowing human team members to focus on tasks requiring adaptive decision-making. Furthermore, high-performing teams exhibited superior situational awareness by dynamically modifying their exploits during the competition in response to evolving defensive strategies. Exploitation rates among teams varied widely, ranging from fewer than 30 to over 400 exploits per round; however, this variation did not show a direct correlation with final rankings.

Defensive strategies also played a critical role. Teams that ranked in the top five managed to patch their services before round 224, whereas four of the bottom five teams only completed patching after round 265. A strong correlation was observed between service uptime and final ranking; the top-ranked teams kept their services operational over 90% of the time throughout the competition. Token exfiltration methods—whether via reusing open connections, targeting specific service ports (e.g., 143, 4444, 6969, 8888), or establishing new callback channels—did not show a significant correlation with final success. Notably, most tokens were extracted from traffic without sophisticated obfuscation, suggesting that stealth in exfiltration was not a decisive factor.

Additional strategic patterns emerged from the analysis. Early offensive action was critical: teams that took less time to develop and deploy initial exploits generally achieved higher rankings, revealing a strong negative correlation between time-to-exploit and performance. Moreover, offensive dominance appeared more consequential than balanced strategies, as final rankings were more strongly associated with the number of tokens captured than with tokens lost. The widespread reuse of exploits among teams further indicated that the protection of offensive tools was less effective than the timely development of new attack vectors. Overall, the most successful teams displayed greater strategic flexibility and responsiveness to the evolving dynamics of the competition.
To convert the raw traffic data into a structured format suitable for analysis, the PCAP files were processed using the Snort intrusion detection system (IDS) [21], following the methodology outlined by Cano and Torres [22]. This process applies a predefined set of detection rules to network packets, generating alerts whenever a rule associated with suspicious or malicious behavior is triggered. Each alert includes essential metadata such as timestamps, source and destination IP addresses, and, most importantly, a unique alert_id corresponding to the specific rule match.
Table 2 summarizes the alert identifiers (alert_id) obtained through this methodology, along with their respective frequencies and brief descriptions. In total, 5,537,232 alerts were generated, covering 30 distinct event types. These alerts formed the basis for reconstructing attack sequences and conducting higher-level analyses such as pattern extraction and attacker attribution, effectively transforming raw network traffic into structured, actionable intelligence for cybersecurity research. Building on this foundation, the alert sequences generated by each attacking team against a specific victim team were grouped by competition round. For each round, we defined an MSNA as an ordered sequence of alerts issued by an attacker targeting a victim.
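For illustration, a minimal pandas sketch of this grouping step is given below; the frame and column names (timestamp, round_id, src_ip, dst_ip, alert_id) are hypothetical stand-ins for the fields produced by the Snort processing.

```python
import pandas as pd

# Hypothetical alert log with the metadata fields described above.
alerts = pd.DataFrame({
    "timestamp": [1, 2, 3, 4],
    "round_id":  [1, 1, 1, 1],
    "src_ip":    ["10.5.7.101"] * 3 + ["10.5.2.101"],  # attacker
    "dst_ip":    ["10.5.1.2"] * 4,                     # victim
    "alert_id":  [0, 2, 13, 5],
})

# One MSNA per (round, attacker, victim): a temporally ordered alert-id sequence.
msna = (
    alerts.sort_values("timestamp")
          .groupby(["round_id", "src_ip", "dst_ip"])["alert_id"]
          .apply(list)
          .reset_index(name="sequence")
)
msna["seq_len"] = msna["sequence"].str.len()
```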
Many of the alert types listed in Table 2 correspond to well-known adversarial tactics observed in real-world cybersecurity incidents. For example, alert_id 2 (“Unknown IMAP Command”) was indicative of reconnaissance or probing of mail services, while alert_id 13 (“Invalid HTTP content”) and alert_id 14 (“consecutive small TCP segments”) reflected evasion techniques and fragmentation attacks targeting web services. Alerts such as 0, 5, and 21 represented classic port scanning behavior, commonly used during the initial stages of an attack to identify open services. These patterns aligned with the tactics described in the MITRE ATT&CK framework [3] and reinforced the realism of the dataset for studying attacker attribution.
As illustrated in Figure 1, the box plot displays the distribution of alert sequence lengths across the dataset. In this graph, the Y-axis corresponds to the number of alerts (i.e., the length of each sequence), while the X-axis represents a single position (labeled “1”), given that only one variable was being analyzed. The box represents the interquartile range (IQR), capturing the middle 50% of the data, with the horizontal line inside the box indicating the median. The whiskers extend to the smallest and largest values not classified as outliers, and individual points beyond the whiskers denote extreme values. An analysis of the figure reveals that the majority of sequences were relatively short, as evidenced by the position of the box near the lower end of the scale. Nevertheless, several extreme outliers were observed, with the longest sequences reaching nearly 100,000 alerts. The most prominent outliers appeared at around 90,000 and 60,000 and between 20,000 and 40,000 alerts. This distribution was markedly skewed, indicating that while most attack sequences were brief, a minority of extraordinarily long sequences deviated significantly from the norm.

Statistical summary metrics supported these observations. The dataset comprised a total of 82,092 sequences, with an average length of 67.28 alerts and a median of 18.00. The shortest sequence contained a single alert, while the longest reached 95,427. The standard deviation, at 532.83, further reflected the high dispersion in sequence lengths. Percentile analysis showed that 95% of sequences were shorter than 185 alerts and 97% were under 298 alerts. This reinforced the notion that extremely long sequences were statistical anomalies, not representative of typical behavior.

To conduct more focused and interpretable analyses, this study restricted attention to sequences with lengths between 9 and 243 alerts. This range was selected to retain the diversity of attack patterns while excluding trivial or excessively long outliers. Applying this filter resulted in a refined subset of 53,830 sequences, which balanced representativeness and tractability. This selection enabled a more targeted investigation of attacker behavior, which was critical for the development of effective and explainable attribution models.
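A brief pandas sketch of this length analysis and the 9–243 filter, reusing the hypothetical msna frame from the grouping step above:

```python
# Summary statistics over MSNA lengths (count, mean, std, min, max, quartiles).
print(msna["seq_len"].describe())
print(msna["seq_len"].quantile([0.50, 0.95, 0.97]))

# Retain sequences of 9 to 243 alerts, excluding trivial and extreme outliers.
subset = msna[msna["seq_len"].between(9, 243)].reset_index(drop=True)
print(f"{len(subset)} sequences retained")  # 53,830 on the full dataset
```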
3.1.1. Complexity and Behavioral Diversity in the CTF Dataset
For the attacker attribution experiments conducted in this study, the aforementioned filtered subset—comprising sequences between 9 and 243 alerts in length—served as the core dataset. This criterion ensured that the sequences included were sufficiently complex to capture meaningful behavioral patterns, while simultaneously removing noise introduced by overly short or anomalously long sequences.
Table 3 presents a statistical summary of this final dataset, which formed the empirical basis for all subsequent analyses and modeling efforts.
This refined subset retained much of the complexity and imbalance present in the original dataset. It exhibited a high proportion of unique sequences and recurring patterns, and an uneven distribution of activity among attacking teams. Furthermore, the low silhouette index values observed during clustering analysis indicated that the attacker attribution task remained highly challenging as the classes were not easily distinguishable in the feature space.
The diversity of attack patterns within the dataset was closely linked to the nature of the vulnerable services and the competitive dynamics inherent to the CTF event. Many of the Snort rules observed in the sequences—such as those related to HTTP, IMAP, and TCP fragmentation—were directly associated with these services, and their frequency highlighted the strategic focus on exploiting them.
The analysis of alert sequences not only revealed behavioral signatures of the attacking teams but also enabled inferences about their tactics and the specific vulnerabilities they targeted during the competition. For instance, the “Unknown IMAP4 command” alert (alert_id 2) was typically associated with reconnaissance or exploitation attempts on the IMAP service, while alerts like “HTTP invalid content-length or chunk size” (alert_id 13) and “Consecutive TCP small segments” (alert_id 14) often indicated attacks targeting web services such as wdub. Recurrent sequences of specific alerts suggested the use of automated tools to exploit known vulnerabilities, whereas more complex sequences—those combining alerts across multiple protocols—reflected evasion techniques, reconnaissance phases, or multi-stage exploitation strategies. These are hallmarks of the adversarial behavior commonly observed in CTF environments, where teams continuously adapt to exploit any available weakness to obtain tokens.
Table 4 summarizes representative alert sequences and their possible interpretations in the context of the CTF competition.
This analysis shows that the dataset contained not only trivial patterns but also complex and realistic behaviors, aligned with attack scenarios observed in competitive and real environments. The presence of both trivial and sophisticated patterns in the sequences was consistent with the dynamics of a CTF, where automated scripts and manual or adaptive attacks coexist. Thus, the richness of patterns, aligned with the malicious and strategic nature of traffic in the CTF, reinforced the relevance of the dataset as a realistic benchmark for attack attribution.
The distribution of sequences by attacker IP and by team, illustrated in Figure 2 and Figure 3, showed a significant imbalance in the number of sequences generated by each entity. This imbalance introduced an additional challenge, as attribution models must avoid overfitting to majority classes and maintain robust performance on minority classes. Regarding the complexity of the sequences, Figure 4 shows the distribution of their lengths and the variability among them. While most sequences were relatively short, there were notable outliers with significantly greater lengths. The distribution revealed a median length of 38, but the upper percentiles (95th through 100th) converged at 243, highlighting the presence of a long tail composed of particularly extensive and complex sequences.
The separability of classes and the inherent difficulty of the attribution task were evaluated using t-SNE visualizations, shown in Figure 5 and Figure 6. These visual tools were crucial for understanding the underlying complexity of the attack attribution problem.
In Figure 5, each point corresponds to an alert sequence generated by a specific team, with colors indicating the team of origin. The absence of well-defined clusters suggested a high degree of overlap between sequences from different teams. In several regions of the plot, sequences from multiple teams appeared intermingled, making it clear that there was no simple separation in the feature space. This overlapping distribution highlights the intrinsic challenge of distinguishing attackers based solely on the characteristics of their alert sequences.
On the other hand, Figure 6 presents the centroids of the alert sequences, offering a more abstract and aggregated perspective of how the sequences were distributed in the feature space. Each point in this figure corresponds to the centroid of a group of sequences attributed to a specific team. The observed dispersion and overlap among these centroids underscored the difficulty of achieving clear class separation. This observation aligns with the silhouette index values reported earlier, which were negative for both attacker IP and team labels. Such negative values confirmed that the clusters were poorly defined and that the sequences did not naturally group according to their true labels. These results emphasize the complexity of the attribution task in this setting and highlight the necessity of employing more sophisticated models capable of capturing subtle distinctions in attacker behavior.
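This analysis can be reproduced along the following lines; the sketch assumes each sequence has already been reduced to a fixed-length feature vector (for example, an alert-type frequency histogram), and X and team_labels are hypothetical stand-ins.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
X = rng.random((500, 30))               # stand-in feature vectors, one per sequence
team_labels = rng.integers(0, 20, 500)  # stand-in team labels

# Negative silhouette values indicate poorly separated classes.
print(f"silhouette index: {silhouette_score(X, team_labels):.3f}")

# 2D projection for visual inspection, as in Figures 5 and 6.
emb = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X)
```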
While the CTF dataset provides a rich and realistic foundation for studying multi-adversary attack attribution, it is essential to acknowledge its inherent limitations and contextualize its characteristics relative to other datasets commonly used in cybersecurity research. First, the dataset spans only 2.17 days across 272 structured rounds, which contrasts with the continuous and long-term monitoring typical of production environments. This temporal constraint may limit the observation of persistent attack behaviors or seasonal variations. Additionally, the controlled nature of the competition—with known participants and predefined objectives—differs significantly from real-world scenarios, where attackers operate with diverse motivations, resources, and levels of sophistication. The attack diversity is also constrained to services specific to the CTF setting, potentially excluding other vectors such as zero-day exploits or social engineering. Moreover, the artificial segmentation of traffic into 5 min rounds introduces boundaries that do not exist in real networks, potentially affecting the model’s ability to generalize to asynchronous or irregular attack patterns. In comparison with other datasets, such as UNSW-NB15, the CTF dataset offers unique advantages and challenges. While UNSW-NB15 includes a mix of benign and malicious traffic, the CTF dataset consists entirely of malicious alerts, making it particularly suitable for attribution tasks. Furthermore, UNSW-NB15 operates in a mono-adversary context, whereas the CTF dataset captures simultaneous multi-adversary interactions, providing a more complex and realistic setting for attribution modeling. The temporal structure also differs, with CTF offering a longer duration and more granular segmentation. Compared to production datasets, the CTF data lacks the unpredictability of unknown attackers, dynamic topologies, and advanced evasion techniques, but it compensates with a high density of strategic and adversarial behavior. These distinctions underscore the relevance of the CTF dataset for studying attribution in competitive environments, while also highlighting the need for methodological adaptations when extending the approach to broader operational contexts.
3.1.2. Justification of Dataset Selection and Comparative Analysis
In this work, we focused on the attacker attribution of MSNAs using a dataset derived from a real CTF competition. After preprocessing, our CTF dataset contained 53,830 MSNA sequences, each representing a temporally ordered series of malicious alerts generated by 50 unique attacker IPs (corresponding to 20 teams) against 20 victim teams, across 272 competitive rounds. This structure provided a rich, realistic, and highly diverse environment for evaluating attribution models in genuine multi-adversary scenarios. To illustrate the uniqueness and suitability of the CTF dataset for this task, we applied the same preprocessing methodology to UNSW-NB15 [23], one of the most widely used benchmark datasets in the cybersecurity literature. UNSW-NB15 is a synthetic dataset designed primarily for intrusion detection and attack classification, not for attribution. When extracting MSNA-type sequences from UNSW-NB15, we found that it contained a maximum of 40 attack sequences (MSNA) originating from only 4 unique source IPs (the “attackers”), targeting 10 different destination IPs (the “victims”). The vast majority of the dataset consisted of benign or non-attributable traffic, and the adversarial interactions were extremely limited in both scale and diversity. Importantly, UNSW-NB15 was never intended or declared as a multi-adversary benchmark; its design and labeling do not reflect the complexity or concurrency of real-world adversarial scenarios. In contrast, the CTF dataset is inherently multi-adversary by nature as it captures simultaneous, competitive interactions among multiple attackers and defenders in a dynamic environment. This stark contrast demonstrates that while UNSW-NB15 and similar datasets are valuable for detection and classification tasks, they are not suitable for studying multi-adversary attribution. Their synthetic nature, lack of temporal structure, and minimal number of attackers and victims make them fundamentally different from the complex, dynamic, and adversarial environment captured in the CTF dataset.
3.2. Method
Multi-stage attack attribution presents distinctive challenges due to the sequential and temporal characteristics of IDS alerts. The architecture proposed in this work addresses these challenges through a tailored adaptation of the Transformer model, specifically designed to capture intricate temporal patterns and long-range dependencies within alert sequences. In contrast to the original Transformer, which is geared toward sequence-to-sequence tasks, our model excludes the decoder component and concentrates on extracting temporal features for classification purposes.
The attack attribution problem was mathematically formulated as a function $f: \mathcal{X} \rightarrow \mathcal{Y}$, where the input space $\mathcal{X}$ consists of triplets $(\mathrm{IP}_v, \mathrm{IP}_a, S)$. Here, $\mathrm{IP}_v$ and $\mathrm{IP}_a$ denote the victim and attacker IP addresses, respectively, and $S = (a_1, a_2, \ldots, a_n)$ is a sequence of $n$ alerts, with each $a_i$ representing a discrete alert type. The output space $\mathcal{Y}$ corresponds to the probability distribution over the set of possible attackers $\mathcal{A}$. This formulation captures both the sequential nature of attacks and the contextual information provided by the involved IPs.
To process this input, we propose a model architecture capable of simultaneously capturing local features at the alert level and global dependencies throughout the sequence. The model’s capacity to handle sequences of varying lengths and learn context-sensitive representations makes it particularly suitable for attribution in MSNA environments.
Discrete entities (IPs and alerts) are mapped to continuous vector representations via embedding layers:
$$\mathrm{Emb}_{\mathrm{IP}}: \mathrm{IP} \mapsto \mathbb{R}^{d_{\mathrm{IP}}}, \qquad \mathrm{Emb}_{\mathrm{alert}}: a \mapsto \mathbb{R}^{d_{\mathrm{alert}}},$$
where $d_{\mathrm{IP}}$ and $d_{\mathrm{alert}}$ denote the embedding dimensions for IP addresses and alerts, respectively. The resulting embeddings are
$$\mathbf{e}_v = \mathrm{Emb}_{\mathrm{IP}}(\mathrm{IP}_v), \quad \mathbf{e}_a = \mathrm{Emb}_{\mathrm{IP}}(\mathrm{IP}_a), \quad \mathbf{E} = \left[\mathrm{Emb}_{\mathrm{alert}}(a_1); \ldots; \mathrm{Emb}_{\mathrm{alert}}(a_n)\right] \in \mathbb{R}^{n \times d_{\mathrm{alert}}},$$
where $\mathbf{e}_v, \mathbf{e}_a \in \mathbb{R}^{d_{\mathrm{IP}}}$ are the victim and attacker IP embeddings and $\mathbf{E}$ is the matrix of alert embeddings.
To encode temporal information, we add a sinusoidal positional encoding to the alert embeddings:
$$PE_{(p,\,2i)} = \sin\!\left(\frac{p}{10000^{2i/d_{\mathrm{model}}}}\right), \qquad PE_{(p,\,2i+1)} = \cos\!\left(\frac{p}{10000^{2i/d_{\mathrm{model}}}}\right),$$
where $d_{\mathrm{model}}$ is the model (embedding) dimension used in the Transformer encoder and $p$ and $i$ are the position and dimension indices, respectively. The encoded sequence is $\mathbf{Z} = \mathbf{E} + \mathbf{PE}$, with $\mathbf{Z} \in \mathbb{R}^{n \times d_{\mathrm{model}}}$.
The core of the model is a multi-head self-attention mechanism, defined for each head as
$$\mathrm{Attention}(\mathbf{Q}, \mathbf{K}, \mathbf{V}) = \mathrm{softmax}\!\left(\frac{\mathbf{Q}\mathbf{K}^{\top}}{\sqrt{d_k}}\right)\mathbf{V},$$
where $\mathbf{Q}$, $\mathbf{K}$, and $\mathbf{V}$ are the query, key, and value matrices, obtained via learned linear projections of the input, and $d_k$ is the dimension of each head. The multi-head output is
$$\mathrm{MultiHead}(\mathbf{Q}, \mathbf{K}, \mathbf{V}) = \left[\mathrm{head}_1 \,\|\, \cdots \,\|\, \mathrm{head}_h\right]\mathbf{W}^{O},$$
where $h$ is the number of attention heads, $\mathbf{W}^{O}$ is a learned output projection matrix, and $\|$ denotes concatenation along the feature dimension.
The encoded sequence is processed by the Transformer encoder, yielding a contextual representation $\mathbf{H} \in \mathbb{R}^{n \times d_{\mathrm{model}}}$. To aggregate the sequence, we apply mean pooling:
$$\mathbf{h} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{H}_i,$$
where $\mathbf{H}_i$ is the $i$-th row of $\mathbf{H}$ and $\mathbf{h} \in \mathbb{R}^{d_{\mathrm{model}}}$.
The final representation is formed by concatenating the pooled sequence and the IP embeddings:
$$\mathbf{z} = \left[\mathbf{h} \,\|\, \mathbf{e}_v \,\|\, \mathbf{e}_a\right],$$
where $\|$ denotes vector concatenation. The output is computed as
$$\hat{\mathbf{y}} = \mathrm{softmax}(\mathbf{W}\mathbf{z} + \mathbf{b}),$$
where $\mathbf{W} \in \mathbb{R}^{|\mathcal{A}| \times \dim(\mathbf{z})}$ and $\mathbf{b} \in \mathbb{R}^{|\mathcal{A}|}$ are the weights and bias of the final linear layer and $|\mathcal{A}|$ is the number of attacker classes.
Notation: Bold symbols (e.g., $\mathbf{E}$, $\mathbf{H}$, $\mathbf{z}$) denote vectors or matrices; normal font denotes scalars or indices. The operator $\|$ indicates vector concatenation. All dimensions are explicitly defined at first use.
The overall architecture and the interaction between its main components are illustrated in Figure 7, which visually summarizes the flow from input embeddings and positional encoding, through the Transformer encoder, to the final classification layer for attacker attribution. This architecture efficiently captures both the local features of each alert and the global dependencies within the sequence. The proposed model integrates the following fundamental components to achieve effective attack attribution:
IP and Alert Embeddings: Dense, learnable vector representations ($\mathbf{e}_v$, $\mathbf{e}_a$, $\mathbf{E}$) enable the model to capture semantic relationships between IPs and alert types, facilitating the generalization of attacker behavior patterns across diverse scenarios.
Positional Encoding: Sinusoidal positional encoding ($\mathbf{PE}$) is added to the alert embeddings to preserve the temporal order of alerts, a critical aspect for modeling the evolution and characteristic patterns of multi-stage attacks.
Transformer Encoder: The core of the architecture, composed of multi-head self-attention and feed-forward layers, models long-range dependencies and complex relationships between alerts, regardless of their temporal distance in the sequence.
Aggregation and Concatenation: Temporal aggregation (mean pooling, $\mathbf{h}$) condenses the sequence into a global representation, which is then concatenated with the contextual information of the involved IPs ($\mathbf{e}_v$, $\mathbf{e}_a$), allowing the model to jointly consider temporal patterns and attack context.
Classification Layer: A final linear layer transforms the aggregated features into a probability distribution over the possible attackers, enabling robust attribution in multi-adversary settings.
This design adheres to the core principles of the Transformer model while specializing in attack attribution. Its modular structure and clear separation between embedding, encoding, and classification stages facilitate efficient adaptation to a wide range of attribution scenarios, while preserving the model’s ability to learn intricate patterns across variable-length alert sequences.
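To make the data flow concrete, a minimal PyTorch sketch of the architecture is given below. It follows the triplet formulation and the hyperparameters reported in Section 4.2 (16-dimensional embeddings, four attention heads, dropout 0.1); the class name, the shared IP embedding table, the padding convention, and the feed-forward width are illustrative assumptions rather than the exact implementation.

```python
import math
import torch
import torch.nn as nn

class AttackAttributor(nn.Module):
    """Transformer encoder for attacker attribution over alert sequences."""
    def __init__(self, n_ips, n_alerts, n_classes,
                 d_model=16, n_heads=4, n_layers=2, max_len=243, pad_idx=0):
        super().__init__()
        # Shared IP embedding table for victim and attacker (an assumption).
        self.ip_emb = nn.Embedding(n_ips, d_model)
        self.alert_emb = nn.Embedding(n_alerts, d_model, padding_idx=pad_idx)
        # Precomputed sinusoidal positional encoding PE(p, i).
        pe = torch.zeros(max_len, d_model)
        pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                        * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=64, dropout=0.1, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Classification head over the concatenation [h || e_v || e_a].
        self.head = nn.Linear(3 * d_model, n_classes)
        self.pad_idx = pad_idx

    def forward(self, victim_ip, attacker_ip, alerts):
        # alerts: (B, n) padded alert ids; victim_ip/attacker_ip: (B,) ip ids.
        mask = alerts.eq(self.pad_idx)                   # True at padded positions
        z = self.alert_emb(alerts) + self.pe[: alerts.size(1)]  # Z = E + PE
        h = self.encoder(z, src_key_padding_mask=mask)   # contextual H
        valid = (~mask).unsqueeze(-1).float()
        pooled = (h * valid).sum(1) / valid.sum(1).clamp(min=1.0)  # masked mean
        z_cat = torch.cat(
            [pooled, self.ip_emb(victim_ip), self.ip_emb(attacker_ip)], dim=-1)
        return self.head(z_cat)  # logits; softmax is applied by CrossEntropyLoss
```

Note that the pooling in this sketch averages only over non-padded positions, a practical refinement of the mean-pooling equation above for variable-length, padded batches.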
4. Experimental Results
In this section, we revisit the central problem addressed in this work: the accurate attribution of attacks in multi-adversary environments, where each alert sequence must be correctly associated with the responsible attacker. As highlighted by the exploratory t-SNE analysis presented in Section 3, the sequence representations exhibited significant overlap and complexity, suggesting that the decision boundaries between attackers were non-trivial. This challenge was further exacerbated by severe class imbalance and the presence of multiple attackers with partially overlapping behavioral patterns.
Our hypothesis was that Transformer-based models, due to their ability to model temporal dependencies and capture complex interactions between entities (such as IPs and alert types), are better suited to this task than traditional architectures like MLPs. We posited that their capacity to manage multiple information flows and learn contextual relationships enables them to generalize more effectively, particularly in the presence of minority classes and contaminated data subsets.
To evaluate this, we first implemented a baseline MLP model to establish a performance reference and then assessed a Transformer-based model using standard multiclass classification metrics (accuracy, macro and weighted F1-score), per-class analysis, training dynamics, and robustness tests. This comparative analysis allowed us to validate whether the modeling advantages of Transformers translate into tangible improvements in attacker attribution performance.
4.1. MLP Model Performance (Baseline)
To establish a clear and quantifiable reference for the task of attack attribution in multi-adversary environments, we implemented an MLP classifier as a baseline. This subsection aims to demonstrate the limitations of traditional approaches and justify the need for more advanced models, such as the Transformer, in scenarios characterized by high complexity and class imbalance.
The MLP was designed to process alert sequences represented as integer vectors, where each alert is transformed into a learned embedding. The entire sequence was flattened and fed into a deep neural network with several hidden layers. The specific configuration of the model was as follows:
Embedding layer: 64-dimensional embedding for each alert.
Network structure: three hidden layers with 1024, 512, and 256 neurons, respectively, each followed by ReLU activation, batch normalization, and dropout (0.2).
Input: sequences of lengths between 9 and 243 alerts, padded to the maximum length.
Loss function: cross-entropy with class weights inversely proportional to class frequency to mitigate imbalance.
Optimizer: Adam, initial learning rate 0.001.
Training: early stopping with patience of 20 epochs; batch size 128; data split into 60% training, 20% validation, and 20% test.
This architecture allowed for a fair and reproducible comparison against more sophisticated models and helped to gauge the challenge posed by attribution in this context.
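A minimal PyTorch sketch of this baseline follows; the layer sizes, activations, and regularization mirror the configuration above, while the class name and padding convention are assumptions.

```python
import torch
import torch.nn as nn

class MLPBaseline(nn.Module):
    """Flattened-embedding MLP baseline for attacker attribution."""
    def __init__(self, n_alerts, n_classes, max_len=243, emb_dim=64):
        super().__init__()
        self.emb = nn.Embedding(n_alerts, emb_dim, padding_idx=0)
        def block(d_in, d_out):
            # Linear -> ReLU -> BatchNorm -> Dropout(0.2), as in the text.
            return nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU(),
                                 nn.BatchNorm1d(d_out), nn.Dropout(0.2))
        self.net = nn.Sequential(
            nn.Flatten(),                      # (B, max_len * emb_dim)
            block(max_len * emb_dim, 1024),
            block(1024, 512),
            block(512, 256),
            nn.Linear(256, n_classes),
        )

    def forward(self, alerts):                 # alerts: (B, max_len) padded ids
        return self.net(self.emb(alerts))
```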
Figure 8 shows the evolution of the F1-score for the model’s five best-classified IPs throughout the training epochs. The MLP quickly learned to identify majority classes, achieving high precision, recall, and F1-score within a few epochs and maintaining them close to 1.0. These IPs—10.5.11.110, 10.5.7.101, 10.5.11.114, 10.5.2.101, and 10.5.19.111—achieved consistently high performance, with F1-scores converging toward 1.0 as training progressed. This highlights the model’s ability to generalize well for frequent or well-represented attacker classes.
The results obtained by the MLP on the test set reflect the limitations of this approach. The macro F1-score was considerably lower than the weighted F1-score, indicating that the model tended to favor majority classes and struggled to generalize to minority ones. The main metrics were an accuracy of 0.8062, a macro F1-score of 0.6505, and a weighted F1-score of 0.8048.
The confusion matrix shown in Figure 9 reinforces this diagnosis. The model correctly identified several of the IPs with the largest number of samples, such as 10.5.1.110, 10.5.8.101, 10.5.10.104, 10.5.17.101, and 10.5.9.102, which showed high values on the diagonal. However, performance dropped significantly for certain classes: 10.5.8.101 was misclassified as 10.5.9.102, 10.5.17.101 as 10.5.16.101, and 10.5.6.107 as 10.5.6.108, cases in which the model frequently confused similar patterns. In these cases, most examples were assigned to neighboring or semantically related classes and were rarely classified correctly. This reflects a systematic bias and the MLP’s difficulty in learning distinctive patterns in overlapping or imbalanced scenarios. The error pattern was not random but concentrated in specific classes (attackers), indicating a bias toward classes with greater support. This phenomenon was consistent with the low macro F1-score, which penalized the inability to generalize to minority classes. Overall, these results show that while the MLP can be effective at identifying frequent attackers, its ability to discriminate among multiple adversaries in imbalanced scenarios is moderate, and it tends to systematically confuse closely related attackers.
4.2. Transformer Model Performance
To address the complexity of attack attribution in multi-adversary environments, we implemented a model based on the Transformer architecture, adapted to process alert sequences and capture long-range contextual dependencies. This subsection describes the specific configuration of the model and justifies its selection over traditional approaches. The Transformer model was designed to represent both IP addresses (victim and attacker) and alert identifiers using learned embeddings. The alert sequence was transformed into an embedding matrix, to which positional encoding was added to preserve temporal order. The core of the model was a multi-layer Transformer encoder, whose output was combined with the IP embeddings and fed into a dense layer for final classification. The specific configuration was as follows:
Alert embedding layer: 16 dimensions.
IP embedding layer: 16 dimensions for each IP (victim and attacker).
Positional encoding: applied to the alert sequence.
Transformer encoder: one or two layers, four attention heads, model dimension 16, dropout 0.1.
Input: sequences of lengths between 9 and 243 alerts, padded to the maximum length.
Loss function: cross-entropy with class weights inversely proportional to class frequency.
Optimizer: Adam, initial learning rate 0.001.
Training: early stopping, batch size 128, stratified cross-validation. Data split into 60% training, 20% validation, and 20% test. However, to ensure a more robust and unbiased evaluation, the experiments with the Transformer model employed a 5-fold stratified cross-validation scheme. In this setup, for each fold, 80% of the data was used for training, 10% for validation, and the remaining 10% for testing, ensuring that all samples were used for testing exactly once across the folds.
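One possible realization of this protocol is sketched below, splitting each fold’s 20% hold-out evenly into validation and test halves to obtain the 80/10/10 proportions; the stand-in data, class weighting, and elided training loop are hypothetical placeholders.

```python
import numpy as np
import torch
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.utils.class_weight import compute_class_weight

rng = np.random.default_rng(0)
X = rng.integers(0, 30, size=(2000, 243))  # stand-in padded alert sequences
y = rng.integers(0, 50, size=2000)         # stand-in attacker-class labels

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, held_idx) in enumerate(skf.split(X, y)):
    # Split each fold's 20% hold-out evenly: 10% validation, 10% test overall.
    val_idx, test_idx = train_test_split(
        held_idx, test_size=0.5, stratify=y[held_idx], random_state=42)

    # Class weights inversely proportional to class frequency, as in the text.
    w = compute_class_weight("balanced", classes=np.unique(y), y=y[train_idx])
    criterion = torch.nn.CrossEntropyLoss(
        weight=torch.tensor(w, dtype=torch.float32))
    # ... build the model, train with early stopping on val_idx,
    #     evaluate on test_idx; metrics are then averaged across folds ...
```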
Training and validation of the Transformer model required approximately 41 min per fold. Considering 5-fold cross-validation, the total time invested was about 206 min (around 3 h and 26 min) to complete the entire experimental process. This contrasted with the 25 min required by the MLP, reflecting the higher computational cost associated with more complex and robust models.
Before analyzing the final performance on the test set, it was essential to observe how the Transformer model’s ability to correctly attribute IPs evolved throughout the training and validation epochs. Figure 10 presents the evolution of the main global metrics of the Transformer model during training and validation: loss, accuracy, weighted F1-score, and macro F1-score. Both the loss and the performance metrics improved rapidly in the first epochs, stabilizing near their optimal values from epoch 6 onwards. The gap between the training and validation curves was minimal, indicating that the model did not exhibit significant overfitting and generalized well to unseen data. These curves reflect efficient and stable learning, with outstanding performance in both training and validation.
Figure 11 shows the evolution of the F1-score for each IP. Most IPs achieved a perfect F1-score within a few epochs, indicating the model’s rapid learning capacity for most classes. However, there was variability: some IPs required more epochs to reach optimal performance, as seen in the bottom right graph. This analysis helped identify possible difficult cases or minority classes that could require special attention or adjustments in the training strategy. This detailed analysis of per-IP evolution complemented the final test evaluation, providing a more granular view of the learning dynamics and the model’s robustness to attacker diversity.
The Transformer model achieved significantly superior metrics on the test set (average over the 5 cross-validation folds):
Accuracy (test): 0.9998 ± 0.0001
F1-score (macro, test): 0.9944 ± 0.0032
F1-score (weighted, test): 0.9998 ± 0.0001
The confusion matrix shown in Figure 12 demonstrates the excellent performance of the Transformer model on the test set. Unlike the MLP, the Transformer achieved almost perfect classification for all IPs, with high values on the diagonal and no systematic errors outside it. All classes, including those with fewer samples, were correctly identified, as reflected in the absence of significant values outside the main diagonal. This result indicates that the model is not only capable of learning distinctive patterns for the most frequent attackers but also generalizes well to minority classes, overcoming the bias observed in the MLP. The homogeneity along the diagonal of the matrix and the absence of relevant confusions demonstrate the robustness and discriminative capacity of the Transformer in multi-adversary and imbalanced scenarios. Together, these results explain the high macro and weighted F1-scores obtained on the test set and confirm that the Transformer can correctly attribute attacks to each IP, regardless of the attacker’s frequency.
Despite the Transformer model’s exceptional performance (99.97% accuracy on the test set), a detailed analysis of the few misclassified sequences provides valuable insights into the model’s behavior and the complexity of the attribution task. Out of 10,766 test sequences across the 5-fold cross-validation, only 3 sequences were incorrectly classified. It is important to note that the confusion matrix shown in Figure 12 corresponds to a single fold of the 5-fold cross-validation; in that particular fold, only one misclassification occurred, and the total of three misclassifications refers to the aggregate across all five folds.
Analysis of Misattributed Sequences
Table 5 presents a detailed analysis of these three misattributed sequences, including their characteristics, the actual and predicted attacker IPs, and potential explanations for the classification errors.
The analysis reveals several important patterns in the misattributed sequences. First, all three errors involved sequences with highly repetitive patterns, suggesting that the model may struggle with sequences that lack sufficient diversity to distinguish between similar attackers. Second, two of the three errors involved IPs from the same subnet (10.5.11.x), indicating potential confusion between attackers from the same team. Third, the misclassified sequences were relatively short (9–20 alerts) and may not have provided enough temporal context for accurate attribution.
These findings highlight the importance of sequence diversity and length in attack attribution, as well as the challenge of distinguishing between attackers from the same team or with similar behavioral patterns. The extremely low error rate (0.028%) suggests that the Transformer successfully captures the majority of distinguishing features, but these edge cases provide valuable insights for future improvements in model design and training strategies.
5. Discussion
The experimental results presented in this work provide clear quantitative and qualitative evidence of the superiority of our Transformer-based model not only over the traditional MLP baseline but also in comparison with existing approaches in the literature.
Quantitatively, the Transformer achieved near-perfect performance across all standard metrics. Specifically, the Transformer attained an average accuracy of 0.99, macro F1-score of 0.99, and weighted F1-score of 0.99 on the test set. In contrast, the MLP baseline only reached an accuracy of 0.8062, macro F1-score of 0.6505, and weighted F1-score of 0.8048. This substantial gap was particularly pronounced in the macro F1-score, which reflected the model’s ability to generalize across all classes, including those with few samples. The low macro F1-score of the MLP indicates a strong bias toward majority classes and a limited capacity to correctly classify minority attackers.
Qualitatively, the confusion matrix for the Transformer (see Figure 12) revealed a dominant diagonal and an absence of systematic misclassifications, even for classes with low support. In contrast, the MLP confusion matrix showed that while several frequent classes were correctly identified, many errors were concentrated in semantically similar or overlapping classes, particularly among minority attackers. The per-IP F1-score evolution further demonstrated that the Transformer rapidly learned to distinguish all attacker classes, stabilizing its performance within a few epochs, while the MLP achieved high performance for frequent classes but continued to struggle with minority or ambiguous classes throughout training.
These improvements were attributed to the Transformer’s ability to model long-range contextual dependencies and handle variable-length sequences, as well as the use of learned embeddings for both IPs and alerts and stratified cross-validation. The MLP, lacking these capabilities, was unable to exploit the temporal and structural information present in the alert sequences.
From a practical perspective, the adoption of the Transformer enables robust and fair attribution of attacks, ensuring high accuracy not only for frequent attackers but also for those that are rarely observed. This is critical in real-world cybersecurity scenarios, where the identification of low-frequency adversaries is essential for effective response and mitigation.
While the Transformer incurs a higher computational cost (206 min of training versus 25 min for the MLP), the significant gains in accuracy, generalization, and robustness justify this investment, especially in high-stakes security applications.
An important methodological aspect of this work is the evolution of the data partitioning strategy. Initially, for the MLP baseline, we adopted a conventional split of 60% for training, 20% for validation, and 20% for testing. However, to ensure a more robust and unbiased evaluation, the experiments with the Transformer model employed a 5-fold stratified cross-validation scheme. In this setup, for each fold, 80% of the data was used for training, 10% for validation, and the remaining 10% for testing, ensuring that all samples were used for testing exactly once across the folds.
This cross-validation approach not only provides a more reliable estimate of the model’s generalization performance but also mitigates the risk of overfitting to a particular data split. The reported test metrics for the Transformer (accuracy, macro F1-score, and weighted F1-score) corresponded to the average and standard deviation across the five test folds, offering a comprehensive and statistically sound assessment of the model’s effectiveness.
It is important to note that while the MLP results were based on a single train/validation/test split, the Transformer results benefited from the increased robustness and representativeness provided by cross-validation. This methodological improvement further highlights the reliability of the conclusions drawn regarding the superiority of the Transformer architecture for attack attribution in multi-adversary scenarios.
Comparative Analysis with Related Work
To contextualize our results and demonstrate the advantages of our approach, we compared our method with several Transformer-based or deep learning-based models for attack detection or attribution. As shown in Table 6, many recent approaches employed deep learning or Transformer-based models for threat detection tasks, such as identifying anomalous or malicious behavior. For instance, Afnan et al. [13] adopted a Transformer-based approach and evaluated its performance on two large-scale DARPA datasets comprising over 17 billion system events. Although its detection metrics are strong, the model did not address attacker attribution, and its evaluation primarily focused on binary classification of malicious behavior within imbalanced log data.
Zhang et al. [24] utilized the CIC-IDS2017 dataset, which includes over 2.8 million network traffic records (2,273,097 normal and 557,646 attack instances) captured in a realistic network environment. However, their method focused on detecting types of malicious activity rather than attributing responsibility to specific actors. While the dataset size allowed for extensive training, the model evaluation was limited to standard classification performance without deeper analyses of attack attribution or the interpretability of the results. Amaru et al. [10] evaluated their RAPID method on three real-world system audit datasets totaling over 160 million events. Their method excelled in detecting APT activity based on provenance analysis but did not address the attribution problem, nor did it infer adversary identities from the data itself.
In contrast, Xiang et al. [4] focused explicitly on attribution, using enriched logs tagged with external threat intelligence. Although they achieved promising results (F1 = 0.90), their reliance on threat intelligence sources limits applicability in environments where such enrichment is unavailable. Moreover, their dataset remained relatively small, with about 25,000 sequences annotated for five actors.
Our approach differs fundamentally in both design and scope. We tackled attribution directly by modeling temporal and structural correlations in raw alert sequences, using no external enrichment or prior knowledge. Our dataset consisted of approximately 53,830 alert sequences, extracted from CTF environments with fifty distinct adversaries. Despite this class imbalance, our model achieved a macro F1-score of 0.99, surpassing prior methods in both detection and attribution.
The strength of our method lies not only in its performance but also in its capacity to generalize across underrepresented actors through learned attention-based representations. This enables precise, interpretable attribution even in imbalanced, noisy, or data-sparse contexts, a key requirement for real-world deployment in operational security environments.
In summary, while prior works excel in APT detection under ideal or enriched conditions, our approach uniquely combines accuracy, scalability, and independence from external intelligence to offer a robust solution for multi-actor attribution from native alert streams.
6. Conclusions and Future Work
This work demonstrated the effectiveness of Transformer-based models for attack attribution in multi-adversary network environments. Unlike traditional detection tasks, attribution requires identifying the specific source of an attack, a challenge compounded by class imbalance and adversarial diversity. Our approach addresses this gap in the literature and establishes a new benchmark for robust and accurate attribution under these conditions.
Experimental results show that the Transformer architecture significantly outperforms conventional models such as the MLP, achieving near-perfect accuracy and F1-scores across all classes, including those with limited representation. This performance is largely due to the Transformer’s ability to model long-range dependencies and leverage contextual information, enabling effective generalization in highly imbalanced scenarios.
Despite these promising results, this study has limitations. It relied on a single dataset derived from a CTF competition, which, while offering authentic adversarial behavior, may not have fully captured the complexity and obfuscation techniques present in real-world attacks.
In particular, the dataset’s temporal scope was limited to 2.17 days and segmented into 272 rounds, which may not have reflected long-term or persistent attack behaviors. Additionally, the controlled nature of the competition—where all participants were known and the attack surface was predefined—differed from open-world environments with unknown adversaries and more diverse threat vectors. The artificial segmentation into 5 min rounds may also have affected the model’s ability to generalize to asynchronous or irregular attack patterns. Furthermore, the dataset contained only malicious alerts, precluding the evaluation of false positives or the model’s ability to distinguish between benign and malicious behavior.
Future work should validate the model using more diverse data sources, such as enterprise network logs or public threat intelligence feeds, and explore its robustness through sensitivity analyses involving synthetic noise or alternative datasets.
We also plan to integrate benign traffic into the input sequences to better reflect real-world conditions and assess the model’s performance in mixed-traffic environments.
Another important limitation is the model’s limited interpretability. In cybersecurity, where misattribution can have serious consequences, understanding the rationale behind each decision is critical. To address this, we plan to incorporate techniques for visualizing attention weights, identifying key alerts that influence attribution, and analyzing misclassifications to improve transparency and trust.
Additionally, we aim to evaluate the model’s resilience against evasion tactics, such as the injection of decoy alerts or the perturbation of alert sequences. Understanding how such manipulations affect attribution performance is essential for real-world deployment and will inform the development of more robust attribution mechanisms.
We also recognize that our current sequences included only alerts associated with malicious activity. However, benign traffic may also form part of an attacker’s behavioral footprint, especially in stealthy or long-term campaigns. Future work will explore the integration of non-malicious events into the input sequences to provide a more complete representation of adversarial behavior. In parallel, we plan to benchmark the model under different hardware configurations and deployment scenarios (e.g., batch vs. streaming inference) to assess its suitability for real-time operation. We also intend to explore lightweight architectural variants, such as distilled Transformers or sparse attention mechanisms, to reduce inference latency without compromising attribution performance.
Finally, we acknowledge that the use of sinusoidal positional encodings may oversimplify the temporal dynamics of multi-step attacks, particularly in asynchronous environments where the timing between steps is irregular. Future iterations of this work will investigate learned or time-aware positional encodings to better capture these temporal patterns and further enhance the model’s ability to reason over complex, temporally variable attack sequences.
Ultimately, we envision integrating the proposed model into a comprehensive intrusion detection and response pipeline, where it would serve as an attribution layer following initial alert generation. This integration would enable security teams not only to detect threats but also to rapidly identify likely adversaries, enhancing situational awareness and enabling more targeted, effective responses. The flexibility of the Transformer architecture to process multiple streams of information and capture complex relationships suggests its potential for broader applications in cybersecurity, including real-time attribution, adaptive defense strategies, and integration with other detection and response systems.