1. Introduction
The emergence of increasingly sophisticated networking threats requires the development of more advanced tools. Prominent networking attacks include distributed denial-of-service (DDoS) attacks such as network flooding [1] (TCP, UDP, and ICMP), the smurf attack [2], and the ping-of-death [3]. Cyber-security researchers often unleash such attacks in controlled or simulated environments when developing detection and mitigation techniques, using tools such as Hping3 [4] and Ostinato [5]. Because these tools lack scripting capabilities, libraries such as libtins [6] and Scapy [7] are often used instead to implement complex network attacks. However, even with a comprehensive knowledge of the essential elements of an attack, developing new attacks or modifying existing ones can be time-consuming.
To rectify these deficiencies, researchers have developed approaches that leverage pre-trained transformer models, specifically large language models (LLMs), to simulate complex network environments. Examples include PAC-GPT [8], NetGPT [9], TrafficFormer [10], and TrafficGPT [11]. Although these approaches have demonstrated encouraging results, they often incur high computational cost and complexity, and they lack the flexibility required to adapt to novel attack patterns in specific network environments and protocols. All of these approaches aim to generate synthetic network traffic directly, hoping to capture the intricacies of (potentially) malicious traffic effectively.
To tackle these challenges and limitations, we propose AgentRed, a novel AI agent-based framework for network traffic generation that produces intermediate network traffic representations corresponding to various traffic patterns, including those associated with attacks. Tools like Scapy can consume these portable representations to reproduce the attack behavior directly, including the sending of packets, without requiring human-written code. In other words, we adopt a different approach: rather than generating synthetic traffic, our framework quickly creates accurate programs/scripts that deploy new or modified versions of attacks. Our method integrates the strengths of reinforcement learning [12], specifically Group Relative Proximal Policy Optimization (GRPO) [13], with the Low-Rank Adaptation (LoRA) [14] parameter-efficient fine-tuning (PEFT) technique during agent-triggered fine-tuning. This combination enables us to adapt lightweight pre-trained LLMs, such as Qwen3 0.6B [15], to specific network traffic patterns and environments while maintaining computational efficiency and flexibility. By enabling the framework to dynamically select and fine-tune adapters based on traffic generation requirements, we create a system that can generate diverse and realistic network traffic patterns without manual implementation of such behavior. This adaptability is crucial for simulating various network scenarios, including benign and malicious traffic types and novel attack patterns.
Our contributions are as follows:
We introduce AgentRed, an AI agent networking tool. Unlike static pre-trained models (e.g., NetGPT) that are limited to their training cutoff, we introduce an autonomous agent workflow capable of creating lightweight LoRA adapters and retrieving real-time context via online search to generate traffic for various attack patterns without model retraining. This framework enables a deterministic network packet generator (Scapy) to be used as the verifier, allowing the fine-tuned model to learn strict protocol constraints without human supervision.
We design a novel portable traffic generation format that can be easily parsed to create network packets. This effectively alleviates the need to manually implement attacks in code, for example, using Scapy scripts. This novel intermediate XML traffic representation bridges the gap between probabilistic LLM generation and deterministic packet construction.
We evaluate the performance of our approach considering six popular network attacks to assess its ability to generate network packets with high accuracy and low latency.
The rest of the paper is structured as follows. Section 2 reviews recent related work in network traffic generation. We provide technical background in Section 3. In Section 4, we detail our proposed methodology, including the proposed framework. Section 5 presents the experimental setup, datasets, results, and analysis. Finally, we conclude the paper in Section 7, summarizing our main findings.
2. Related Work
Several recent works have explored the application of pre-trained transformer models to the understanding and generation of network traffic, as shown in Table 1. Before the advent of large language models, Generative Adversarial Networks (GANs) represented the state of the art in synthetic traffic generation. Anande and Leeson provide a comprehensive survey of this domain, categorizing various GAN architectures such as PAC-GAN for packet-level generation and ITCGAN for addressing class imbalance in traffic datasets [16]. Their analysis highlights the evolution from simple flow-level statistical replication to complex packet-byte generation using adversarial training. However, they also identify key limitations in these architectures, specifically the difficulty in converging on multi-serial network packets for diverse protocols, a challenge our agent-based transformer approach aims to resolve.
Beyond adversarial networks, researchers have recently applied diffusion models to the domain of traffic synthesis. Jiang et al. proposed NetDiffusion, a framework that fine-tunes text-to-image latent diffusion models to generate high-fidelity network traffic [17]. Their approach transforms traffic flows into image-based representations using the nPrint encoding and utilizes ControlNet to enforce protocol constraints and field-level consistency. While NetDiffusion demonstrates superior statistical similarity and utility for data augmentation compared to GAN-based baselines, it relies on converting binary network data into visual representations for processing. In contrast, our work with AgentRed treats network traffic generation as a language modeling task, leveraging the inherent sequential reasoning capabilities of LLMs to generate complex attack vectors without intermediate modality conversions.
In the realm of transformer-based architectures, several works have explored pre-training models for traffic understanding. The authors of TrafficFormer [10] propose a two-stage pre-training approach. This approach, designed explicitly for traffic data, employs Masked Burst Modeling (MBM) and Same Origin-Direction-Flow (SODF). It is followed by supervised fine-tuning with Random Initialization Field Augmentation (RIFA). The evaluation of TrafficFormer shows superior protocol understanding capabilities. Although it achieves nearly 100% F1 scores in packet direction judgment tasks, it mainly focuses on traffic classification rather than generation.
To enable generation capabilities within transformers, researchers designed a large-scale pre-training method, NetGPT [9], for generating and understanding network traffic. Existing encoding schemes used in LLM architectures are often unsuitable for network traffic data. Thus, the authors encode network packets in hex and apply WordPiece tokenization, which better handles the diverse byte patterns in network data. During fine-tuning, the researchers propose header field shuffling as an augmentation strategy to increase data diversity. With these changes, traffic generation performance shows some improvement over baseline GPT-2 models. However, the results still indicate a residual average Jensen-Shannon Divergence (JSD) score of 0.0406 across the packet length, source port, and destination port fields.
To address the token length limitations of previous approaches, the authors in [11] propose TrafficGPT. To increase generation throughput and handle large traffic flows, they extend the maximum token capacity from 512 to 12,032 tokens. Since traditional attention layers are quadratic in the input length, the authors use linear attention mechanisms instead. They also design a novel tokenization strategy that incorporates packet start tokens, link type tokens, and time interval tokens. These changes provide a more comprehensive representation of traffic flow. Although their model shows a 2% improvement in classification tasks and a packet-level JSD score of 0.1605 for traffic generation, it requires 189 GB of traffic data for pre-training.
Most recently, Delgado-Soto et al. introduced a framework utilizing OpenAI's GPT-3.5 Turbo to generate realistic multi-protocol network conversations [18]. Their approach implements a Mixture of Experts (MoE) architecture combined with prompt engineering to specialize the model in generating stateful traffic for specific protocols, including ICMP, ARP, DNS, TCP, and HTTP. Similar to our work, their system produces executable Scapy commands to construct the final network packets. However, their architecture relies on access to large, API-based models and standard fine-tuning or few-shot prompting techniques. In contrast, AgentRed focuses on parameter-efficient adaptation (LoRA) of lightweight, locally hostable models (e.g., Qwen3 0.6B) and integrates an autonomous agent capable of retrieving emerging attack contexts dynamically.
Despite these significant advancements, some drawbacks of existing methods include high computational costs, complex tokenization processes, limited adaptability to specific traffic types, and dependency on large datasets. These challenges hinder their practical application and ease of use.
3. Technical Overview
This section provides an overview of the key concepts discussed in our approach.
3.1. Network Traffic Generation Tools
Computer networking tools are often crucial in network cybersecurity research. They aid in the simulation of various network conditions and behaviors and are typically used for testing and evaluation purposes. They often rely on static networking components, rule-based systems, or statistical models. Consequently, these tools may not accurately capture the complexity and diversity of real-world traffic patterns. For instance, tools such as aircrack-ng [19] and Hping3 are designed to craft specific kinds of packets and network traffic. Frameworks like Metasploit [20] contain various modules, such as aircrack-ng [19] and mdk3 [21], that can simulate different types of network traffic and attacks, but they primarily rely on predefined templates and user-defined parameters. Other tools, such as Scapy [7] and libtins [6], enable the creation and manipulation of network packets. In a nutshell, all these tools require manual and time-consuming configuration and scripting, which can be a barrier for non-expert networking users. Conversely, recent advancements in machine learning, specifically in LLMs, have enabled the development of more advanced tools that can autonomously generate more realistic and contextually relevant network traffic. These models can learn intricate patterns from vast datasets, which makes them well-suited for generating structured data, such as network packets.
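For illustration, even a simple flooding attack requires a hand-written script when using such libraries; the following is a minimal Scapy sketch of a TCP-SYN flood (the target address, port, and packet count are illustrative placeholders):

from scapy.all import IP, TCP, RandShort, send

target_ip = "10.0.0.2"   # illustrative victim address
target_port = 80

# Craft a SYN packet with a randomized source port; Scapy re-evaluates
# RandShort() for every packet sent, so each SYN uses a fresh source port.
pkt = IP(dst=target_ip) / TCP(sport=RandShort(), dport=target_port, flags="S")
send(pkt, count=1000, verbose=False)

Writing, parameterizing, and maintaining such scripts for every attack variant is precisely the manual effort our approach aims to remove.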
3.2. PPO, GRPO, and LoRA
Reinforcement learning (RL) [22] is a paradigm in machine learning where an agent learns to make decisions by interacting with an environment. The agent receives the state of the environment and feedback in the form of rewards for each action taken. RL has recently shown remarkable results in fine-tuning LLMs for specific tasks. Specifically, RL algorithms can be utilized to optimize a model's performance based on task-specific objectives. This is made possible by carefully selecting the reward mechanism, which can include a reward model [23] (typically an LLM), a verifiable reward [13] (a user-defined algorithm), or some form of internal reward.

For instance, PPO [23] is a widely used policy gradient algorithm designed to improve training stability. It constrains policy updates by clipping the probability ratio between the new policy $\pi_\theta$ and the old policy $\pi_{\theta_{\text{old}}}$. The objective function is

$$L^{\text{CLIP}}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right],$$

where $r_t(\theta) = \pi_\theta(a_t \mid s_t)/\pi_{\theta_{\text{old}}}(a_t \mid s_t)$ is the probability ratio between the new and old policies, $\hat{A}_t$ is the advantage estimate, and $\epsilon$ is a hyperparameter controlling the trust region.

GRPO extends PPO with the introduction of a group-relative baseline and KL regularization. These improvements stabilize training in environments with sparse or noisy rewards. The GRPO objective is defined as

$$\mathcal{J}_{\text{GRPO}}(\theta) = \mathbb{E}\left[\frac{1}{G}\sum_{i=1}^{G}\min\left(r_i(\theta)\hat{A}_i,\ \operatorname{clip}\left(r_i(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_i\right) - \beta\, D_{\text{KL}}\left(\pi_\theta \,\|\, \pi_{\text{ref}}\right)\right].$$

Here, the group-normalized advantage is computed as

$$\hat{A}_i = \frac{R_i - \operatorname{mean}\left(\{R_1, \dots, R_G\}\right)}{\operatorname{std}\left(\{R_1, \dots, R_G\}\right)}.$$

Here, $G$ denotes the group size, $R_i$ is the reward of the $i$-th sample in the group, $\pi_{\text{ref}}$ is a reference policy (e.g., the original model), and $\beta$ controls the strength of the KL penalty. The relative formulation encourages the policy to prefer outputs that outperform the group baseline, while the KL term aids in maintaining proximity to the reference policy.
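As a concrete illustration, the group-normalized advantage above can be computed as follows (a minimal numpy sketch; the names are ours, not part of any library):

import numpy as np

def group_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group-normalized advantages A_i = (R_i - mean) / std for one group of G samples."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: a group of G = 4 sampled outputs with verifier rewards.
print(group_advantages(np.array([1.0, 0.25, 0.25, 0.0])))
# Outputs above the group mean receive positive advantage and are reinforced.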
LoRA [14] is a Parameter-Efficient Fine-Tuning (PEFT) [24] method that reduces the number of trainable parameters by injecting trainable low-rank matrices into each layer of a pre-trained model. Rather than updating the full weight matrix $W_0 \in \mathbb{R}^{d \times k}$, LoRA freezes $W_0$ and learns two low-rank matrices $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$:

$$W = W_0 + \Delta W = W_0 + BA.$$

Here $r \ll \min(d, k)$ is the low rank. This significantly reduces the fine-tuning cost while maintaining model performance.
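A minimal sketch of the LoRA decomposition in code (dimensions are illustrative):

import numpy as np

d, k, r = 512, 512, 8            # illustrative dimensions; r << min(d, k)
W0 = np.random.randn(d, k)       # frozen pre-trained weight, never updated
B = np.zeros((d, r))             # trainable; zero-initialized so BA = 0 at the start
A = np.random.randn(r, k) * 0.01 # trainable low-rank factor

x = np.random.randn(k)
h = W0 @ x + B @ (A @ x)         # forward pass: h = (W0 + BA) x

With these dimensions, only d·r + r·k = 8192 parameters are trained per layer instead of d·k = 262,144, a 32× reduction.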
GRPO combined with LoRA provides an efficient and stable framework for fine-tuning LLMs on complex reasoning tasks, such as network traffic generation, where the model must learn to produce structured outputs that meet protocol specifications and satisfy verifiable reward criteria.
4. Methodology
In this work, we propose AgentRed, a novel approach to generating synthetic network traffic using an LLM agent. Our approach, illustrated in Figure 1, leverages an agentic fine-tuning (steps 2–4) and inference (steps 5–6) framework that unites the strengths of reinforcement learning, specifically GRPO, with the LoRA parameter-efficient fine-tuning technique (see Algorithm 1).
To address the limitations of existing LLM-based traffic generators, AgentRed introduces a conceptual shift from Model-Centric to Agent-Centric generation. Previous approaches such as NetGPT [9] and TrafficGPT [11] rely on the "knowledge-in-weights" paradigm, where the model must memorize every protocol and byte pattern during massive pre-training. This results in static models that are computationally expensive and prone to structural hallucinations (e.g., incorrect checksums).
In contrast, AgentRed decouples intent from implementation. The agent handles high-level reasoning and dynamic context retrieval (e.g., searching for specific packet structures via online search), overcoming the static knowledge cutoff of traditional LLMs. Simultaneously, the LoRA Adapters provide lightweight, task-specific conditioning rather than broad general knowledge, drastically reducing computational overhead. Finally, the XML-like traffic representation format acts as a semantic bridge, allowing the LLM to generate logic (e.g., “set TCP flag to SYN”) while offloading the rigorous byte-level construction to a deterministic engine (Scapy).
A critical methodological innovation of our approach is the verifier-guided training methodology, detailed in Figure 2. Unlike standard Reinforcement Learning from Human Feedback (RLHF), which typically treats the reward model as a black box, we integrate a deterministic tool-in-the-loop verifier. This mechanism, embedded within the reward computation of GRPO, executes the generated Scapy code in a sandboxed environment during the training loop (see Algorithms 1 and 2). First, the model-generated text is parsed to validate that it conforms to the correct format, as specified in Figure A1 and Figure A2; invalid formats are penalized while correct formats are rewarded. Second, the packet generation code is extracted from the model's output to reward valid code and penalize syntax errors. Finally, the list of packets is traversed to check that the fields match the expected fields, rewarding valid packet structures and penalizing invalid packets. By applying immediate penalties for parsing failures, the system enforces strict protocol syntax without requiring human intervention, ensuring the model converges to generating executable network traffic rather than hallucinated text.
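For illustration, the first verification stage can be approximated by a simple structural check (a sketch; the exact expression used in our implementation may differ):

import re

# Require the three XML-like sections described above, in order.
FORMAT_RE = re.compile(
    r"<think>.*?</think>\s*<packets>.*?</packets>\s*<execution>.*?</execution>",
    re.DOTALL,
)

def has_valid_format(output: str) -> bool:
    """True if the model output contains the expected XML-like sections."""
    return FORMAT_RE.search(output) is not None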
4.1. Workflow Overview
A workflow of our proposed tool, as illustrated in Figure 1 and Figure 2, has the following steps: (1) user prompt input, (2) traffic pattern identification, (3) adapter selection, (4) adapter creation with GRPO and LoRA, (5) agent online search, and (6) packet generation.
- ①: User Prompt and PCAP Files. The workflow begins with the user providing a set of initial PCAP files and a prompt that specifies the desired characteristics of the network traffic to be generated. The PCAP files are used to bootstrap the creation of LoRA adapters, while the prompt is a natural language description of the traffic generation task. The prompt may include details such as the target protocols, traffic patterns, and any specific requirements.
- ②: Traffic Pattern Identification. The AI agent analyzes the user prompt to identify the specific network traffic to generate. This step involves parsing the input to extract relevant information about protocols, packet structures, and traffic behaviors, and searching online for relevant information (see Figure A1 for an example of agent output). Finally, the agent determines whether an existing adapter can be used or if a new one needs to be created.
- ③: Adapter Selection. Based on the identified traffic pattern, the agent may consult its memory to check for existing adapters that match the requirements. A suitable adapter is one that has been fine-tuned on similar traffic patterns or protocols as specified in the user prompt. If a suitable adapter is found, its weights are merged into the base LLM for immediate use. Otherwise, the process continues with the creation of a new LoRA adapter.
- ④: Adapter Creation with GRPO and LoRA. If no existing adapter is suitable, the agent initiates the creation of a new adapter, as shown in Figure 2. This involves fine-tuning a base LLM (e.g., Qwen3 0.6B) using GRPO and LoRA techniques (see Equations (2) and (4)). The fine-tuning process is guided by a verifiable reward function that evaluates the quality of the generated network traffic based on Algorithm 1 and Equation (3). The adapter(s) are trained on the provided initial PCAPs. The PCAPs may correspond to normal traffic that contains packets of the desired protocol/application, or they may contain packets with examples of the desired network attack.
- ⑤: Agent Online Search. Once the adapter(s) are selected, the agent may perform an online search to gather additional context, for example, information relevant to a specific attack, the protocols involved, packet header fields, etc. This step ensures that the agent includes the latest information in the prompt to send to the adapter(s), enabling adaptation to emerging patterns or threats, such as new attack vectors or zero-day exploits.
- ⑥: Packet Generation. Finally, the agent merges the weights of the adapter(s) into the base LLM and then uses it for traffic generation. The adapter(s) may output an intermediate packet creation format structured with XML-like tags. This format includes the model's thinking process in a thinking tag, the packet creation details in a packets tag, and the Scapy execution script in an execution tag. Additionally, the agent may incorporate information obtained from the online search to further refine the generated intermediate format. Finally, the LLM output is parsed using Scapy to generate packets, which are stored in a PCAP file. An illustrative parsing sketch is shown after this list.
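The sketch below illustrates how such an intermediate representation could be consumed; the tag names follow the description above, while the line-per-packet convention inside the execution tag is an assumption for illustration:

import re
from scapy.all import wrpcap

def generate_pcap(llm_output: str, out_file: str = "generated.pcap") -> None:
    """Evaluate the Scapy expressions inside the <execution> tag and store
    the resulting packets in a PCAP file (hypothetical parsing sketch)."""
    match = re.search(r"<execution>(.*?)</execution>", llm_output, re.DOTALL)
    if match is None:
        raise ValueError("model output is missing the <execution> tag")
    ns = {}
    exec("from scapy.all import *", ns)  # expose Scapy layers (IP, TCP, ICMP, ...)
    packets = [eval(line, ns)            # each non-empty line is one packet expression
               for line in match.group(1).strip().splitlines() if line.strip()]
    wrpcap(out_file, packets)

generate_pcap("<think>...</think><packets>...</packets>"
              "<execution>IP(dst='10.0.0.2')/TCP(dport=80, flags='S')</execution>")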
Algorithm 1 GRPO Fine-Tuning for Network Traffic Generation
Require: Pre-trained LLM $\pi_\theta$, Dataset $\mathcal{D}$, Group size $G$
Ensure: Fine-tuned Policy $\pi_\theta$
1: initialize $\pi_{\theta_{\text{old}}} \leftarrow \pi_\theta$
2: for each training step do
3:  Sample batch of prompts from $\mathcal{D}$
4:  for each prompt do
5:   Generate $G$ outputs from $\pi_{\theta_{\text{old}}}$
6:   for each output $o_i$ do
7:    $R_i \leftarrow$ ComputeReward($o_i$, $y$) ▹ See Algorithm 2
8:   end for
9:   Compute advantage $\hat{A}_i$ using normalized rewards over group $G$
10:  end for
11:  Update $\pi_\theta$ via the GRPO objective (Equation (2))
12: end for
Algorithm 2 ComputeReward: Packet Reward Calculation
Require: Generated Output $o$, Ground Truth Packet $y$
Ensure: Scalar Reward $R$
1: $R \leftarrow 0$
2: if $o$ does not match XML regex <think>...<packets>... then
3:  return $P_{\text{format}}$ ▹ Penalty for format violation
4: end if
5: Extract packet string $s$ from $o$
6: if eval($s$) fails then return $P_{\text{syntax}}$ ▹ Penalty for malformed packet syntax
7: end if
8: Parse generated packet $\hat{p}$ from $s$
9: Parse ground truth packet $p$ from $y$
10: if fields($\hat{p}$) match fields($p$) then
11:  $R \leftarrow R_{\max}$
12: else
13:  $R \leftarrow P_{\text{logic}}$ ▹ Penalty for field mismatch
14: end if
15: $R \leftarrow R / R_{\max}$ ▹ Normalize reward between 0 and 1
16: return $R$
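For concreteness, Algorithm 2 can be sketched in Python as follows. The penalty constants are illustrative placeholders, and the field subset compared here is an example; the full verifier compares all evaluated header fields:

import re
from scapy.all import IP

P_FORMAT, P_SYNTAX, P_LOGIC, R_MAX = 0.0, 0.2, 0.5, 1.0  # illustrative magnitudes

def compute_reward(output: str, ground_truth) -> float:
    """Hierarchical verifier reward mirroring Algorithm 2 (a sketch)."""
    # Stage 1: structural format check.
    m = re.search(r"<think>.*?</think>.*?<packets>(.*?)</packets>", output, re.DOTALL)
    if m is None:
        return P_FORMAT                     # penalty for format violation
    # Stage 2: syntactic validity -- can Scapy evaluate the packet expression?
    try:
        ns = {}
        exec("from scapy.all import *", ns)
        pkt = eval(m.group(1).strip(), ns)  # parse the generated packet
    except Exception:
        return P_SYNTAX                     # penalty for malformed packet syntax
    # Stage 3: field-level comparison against the ground-truth packet.
    ok = (pkt.haslayer(IP) and ground_truth.haslayer(IP)
          and pkt[IP].dst == ground_truth[IP].dst
          and pkt[IP].proto == ground_truth[IP].proto)
    return R_MAX if ok else P_LOGIC         # reward normalized between 0 and 1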
4.2. Fine-Tuning LoRA Adapters
We create and evaluate multiple adapters for various traffic patterns observed in normal ICMP Ping and in different flooding attack patterns, including TCP-SYN, UDP, TCP-ACK, and ICMP floods. Each adapter is fine-tuned using GRPO to optimize the model's performance in generating realistic and diverse network traffic. The fine-tuning process is detailed in Algorithm 1. Moreover, we use a novel verifiable reward function and instruct the model to follow a structured output format. This format includes the model's thought process in a thinking tag, a packets tag that contains representative packets for the specific traffic pattern, and an execution tag that contains the necessary parameters to generate the packets with Scapy (see Figure A2 for an example of LLM response). This structured approach enables easy verification of the traffic generation output.
Our reward function, defined in Algorithm 2, captures the quality of the traffic generation output based on the following deterministic criteria. First, we measure the model's ability to conform to the required XML-like structure; a failure to do so results in an immediate format penalty ($P_{\text{format}}$). Second, we evaluate the semantic validity of the code by attempting to evaluate it with Scapy; malformed syntax that leads to execution errors is penalized ($P_{\text{syntax}}$). Finally, we assess field-level accuracy by comparing the successfully parsed packets against the ground truth; if fields do not match, a logic penalty is applied ($P_{\text{logic}}$). This hierarchical design ensures stability by providing dense supervision, forcing the model to first learn the format and syntax before it can optimize for field accuracy.
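As an illustration, the adapter creation step could be realized with off-the-shelf libraries roughly as follows. This is a hedged sketch using the Hugging Face trl and peft libraries, reusing the compute_reward verifier sketched above; the dataset contents, LoRA targets, and hyperparameters are illustrative, not the values used in our experiments:

from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Hypothetical prompt/reference pairs derived from the captured PCAPs.
train_dataset = Dataset.from_list([
    {"prompt": "Generate one TCP-SYN flood packet toward 10.0.0.2:80.",
     "ground_truth": "IP(dst='10.0.0.2')/TCP(dport=80, flags='S')"},
])

def reward_fn(completions, ground_truth, **kwargs):
    # trl passes the sampled completions and dataset columns; we rebuild the
    # reference packets and apply the Algorithm 2 verifier to each completion.
    ns = {}
    exec("from scapy.all import *", ns)
    refs = [eval(g, ns) for g in ground_truth]
    return [compute_reward(c, y) for c, y in zip(completions, refs)]

peft_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
args = GRPOConfig(output_dir="adapter-tcp-syn", num_generations=8)  # group size G

trainer = GRPOTrainer(
    model="Qwen/Qwen3-0.6B",
    reward_funcs=reward_fn,
    args=args,
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()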
5. Experiments
In this section, we describe the experimental setup used to evaluate our proposed agentic network tool.
5.1. Datasets
We created a simulated container-based test environment for our experiments. We set up two Docker containers, where one acts as the attacker and the other as the victim. The attacker container launches various network attacks, including UDP flood, TCP-SYN flood, TCP-ACK flood, ICMP flood, Ping-of-Death (PoD), ICMP Ping, and Smurf attack, against the victim container, using Hping3 for simple attacks and Scapy for more complex attacks. The traffic from the attacker to the victim is captured using tcpdump [25] and stored in a PCAP file. Each PCAP file contains between 20 (e.g., normal ICMP Ping) and 10,000 packets, with each file having a size of less than 5 MB. The discrepancy in the number of packets per PCAP file is due to the nature of the attack. For example, flooding attacks generate a large number of packets in a short period, while Ping-of-Death and Smurf attacks involve fewer packets with specific characteristics. For each traffic type, we captured 500 PCAP files, resulting in a total of 3500 PCAP files. The dataset was split into training (80%), validation (10%), and test (10%) sets to ensure robust evaluation.
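As an illustration, the capture step can be reproduced with a few lines of Scapy instead of tcpdump (the interface name, victim address, and limits below are placeholders):

from scapy.all import sniff, wrpcap

# Record attacker-to-victim traffic while an attack (e.g., an Hping3
# TCP-SYN flood) is running, then store the capture as a PCAP file.
packets = sniff(iface="eth0", filter="host 10.0.0.2", count=10000, timeout=60)
wrpcap("tcp_syn_flood_001.pcap", packets)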
While this approach can generate a synthetic dataset, we opted for this controlled generation methodology to ensure the availability of high-fidelity code-level labels. Unlike real-world network traces which lack the generating source code, this pipeline provides the exact Scapy scripts required to train the model on valid API usage and protocol syntax. We maintain that for the specific task of learning protocol grammar, this synthetic data is functionally representative of real-world standards, as a packet’s validity is defined by its adherence to deterministic RFC specifications rather than environmental stochasticity.
5.2. Experimental Setup
Our experimental setup involved fine-tuning the Qwen3 0.6B model on the dataset we created. To ensure sufficient computational power for the fine-tuning process, we performed all experiments (training and evaluation) on a machine equipped with two NVIDIA A6000 GPUs.
5.3. Evaluation Metrics
We evaluated our approach using the following key metrics: traffic generation execution time, packet-level accuracy, statistical similarity measures, and field-level precision analysis. Each metric captures distinct aspects of generation quality critical for practical deployment in network security applications. We use standard classification metrics, including accuracy, F1-score, precision, and recall.
5.3.1. Classification Metrics
We evaluate the fundamental ability of our models to generate correct packet structures using standard classification metrics, including accuracy, precision, recall, and F1 score. These metrics use the true positives (TPs) and false positives (FPs) in their calculations. The former (TP) represents correctly generated packet fields that match the target traffic type, while the latter (FP) represents those with incorrectly generated fields. These metrics are computed at both the packet level (overall structure) and field level (individual protocol fields). This enables a granular assessment of generation quality.
5.3.2. Statistical Distribution Similarity
We use the Jensen-Shannon Divergence (JSD) as a score to assess whether the generated traffic exhibits realistic statistical properties. This aids in quantifying the similarity between the generated and original traffic distributions. We compute the JSD across the most critical packet attributes, including ports and IP addresses.
$$\mathrm{JSD}(P \,\|\, Q) = \frac{1}{2} D_{\mathrm{KL}}(P \,\|\, M) + \frac{1}{2} D_{\mathrm{KL}}(Q \,\|\, M), \qquad M = \frac{1}{2}(P + Q)$$

In this formula, $M$ is the average distribution and $D_{\mathrm{KL}}$ represents the Kullback-Leibler divergence. The JSD score provides a symmetric, bounded measure of distributional difference in the range $[0, 1]$ (using base-2 logarithms), with lower values indicating better statistical fidelity.
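A minimal numpy sketch of this computation over discretized field histograms (base-2 logarithms, so values lie in [0, 1]):

import numpy as np

def kl(p: np.ndarray, q: np.ndarray) -> float:
    """Kullback-Leibler divergence D_KL(p || q) in bits, skipping empty bins."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def jsd(p, q) -> float:
    """Jensen-Shannon divergence between two discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)                 # the average distribution M
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Example: destination-port histograms of original vs. generated traffic.
print(jsd([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]))  # small value => high fidelity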
5.3.3. Field-Level Accuracy Analysis
We implement field-level accuracy scoring to assess generation quality at the granular level. Typical network packets are comprised of multiple protocol fields, each requiring precise values for successful transmission and processing.
$$\mathrm{Acc}_{\mathrm{field}} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[f_i^{\mathrm{gen}} = f_i^{\mathrm{ref}}\right]$$

Here $\mathbb{1}[\cdot]$ is the indicator function, and $N$ represents the total number of packets. We evaluate 16 critical header fields, including: IP fields (version, ihl, tos, flags, frag, ttl, proto, src, and dst), TCP fields (sport, dport, flags, and window), UDP fields (sport and dport), and ICMP fields (type and code).
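A sketch of this computation over parsed Scapy packets (the field argument would iterate over the header fields enumerated above):

from scapy.all import IP, rdpcap

def field_accuracy(gen_pcap: str, ref_pcap: str, layer=IP, field: str = "ttl") -> float:
    """Fraction of generated packets whose given header field matches the reference."""
    pairs = list(zip(rdpcap(gen_pcap), rdpcap(ref_pcap)))
    hits = sum(1 for g, r in pairs
               if g.haslayer(layer) and r.haslayer(layer)
               and g[layer].getfieldval(field) == r[layer].getfieldval(field))
    return hits / len(pairs) if pairs else 0.0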
5.4. Experimental Results
We evaluated our tool across three inference modes: (a) base mode, (b) adapter mode, and (c) agent mode. In the first experiment, we evaluate the performance of the base model using the Qwen3 0.6B model with a custom system prompt that instructs the model to generate network traffic in the specified intermediate format without any fine-tuning. The second experiment assesses the performance of the LoRA adapters created using GRPO fine-tuning by the AI agents. Finally, in the third experiment, we evaluate the performance of the AI agent that utilizes online searches to enrich the context of the LoRA adapters during traffic generation. Our evaluation used seven distinct traffic patterns (six attack types plus normal ICMP Ping). The experiments assessed packet generation accuracy, field-level precision, execution efficiency, and statistical similarity to original traffic patterns. The results of our experiments are summarized in Table 2. Table 3 and Figure 3 present a performance comparison across the different inference modes. We further break down the performance evaluation by attack type to highlight the adaptability of our approach to diverse traffic patterns (see Table 4 and Figure 4).
5.4.1. Experiment 1: Evaluation of the base Mode
We evaluate the performance of the base LLM (Qwen3 0.6B) without any fine-tuning on the intermediate traffic generation format. The goal is to assess the model's inherent ability to generate network traffic patterns based solely on its pre-trained knowledge and the provided system prompt. The results indicate that the base model achieves an overall cumulative accuracy of 95.4% (±0.066), with a precision of 0.953, recall of 0.954, and F1-score of 0.954 across all traffic patterns. However, for the ACK Flood, TCP-SYN Flood, and UDP Flood attacks, the accuracy drops significantly to 69.2%, 57.7%, and 72.7%, respectively. This demonstrates the limitations of the base model in capturing complex traffic patterns without fine-tuning and suggests that it may struggle to generalize to specific attack patterns that were not well-represented in its pre-training data.
5.4.2. Experiment 2: Evaluation of the adapter Mode
In this experiment, we assess the performance of the adapter mode, where LoRA adapters fine-tuned using GRPO are employed for traffic generation. The objective is to evaluate how effectively the fine-tuned adapters enhance the model's ability to generate accurate network traffic patterns corresponding to various types of attacks. The adapter mode demonstrates exceptional efficacy across all evaluated metrics. The overall cumulative accuracy reaches 97.9% (±0.043), with a precision of 0.980, recall of 0.979, and F1-score of 0.978. Notably, the adapter mode sustains superior accuracy across all attack types, including those that posed challenges for the base model. For instance, the accuracy for ACK Flood, TCP-SYN Flood, and UDP Flood attacks improves significantly to 97.2%, 97.3%, and 98.4%, respectively. This underscores the effectiveness of the adapter-based approach in capturing attack-specific characteristics through targeted fine-tuning. Furthermore, the adapter mode completed packet generation in an average of 20.8 s, an approximately 36% reduction relative to the base mode (32.3 s).
5.4.3. Experiment 3: Evaluation of the agent Mode
We tested the performance of the agent mode, where an AI agent searches online to enrich the context of LoRA adapters for traffic generation. The aim is to evaluate how the agent's ability to gather additional information impacts the quality of generated network traffic patterns. The agent mode achieves an overall cumulative accuracy of 96.0% (±0.071), with a precision of 0.961, recall of 0.960, and F1-score of 0.959. While the agent mode does not outperform the adapter mode in terms of raw accuracy, it still demonstrates significant improvements over the base model, particularly for complex attack patterns. For example, the accuracy for ACK Flood, TCP-SYN Flood, and UDP Flood attacks improves to 96.7%, 92.7%, and 90.8%, respectively. This indicates that the agent's ability to gather additional context through online searches enhances its capacity to generate accurate traffic patterns, even when faced with novel or complex scenarios.
5.4.4. Attack-Specific Performance
We assess the performance of our framework across different attack types (see Table 4 and Figure 3). The Ping of Death attacks, for example, achieved the highest accuracy (99.5% ± 0.031), followed by ICMP Ping Flood (97.6% ± 0.049). More complex attacks, such as the UDP Flood, exhibited lower but still robust performance (92.5% ± 0.070), potentially due to the greater variability in legitimate UDP traffic patterns. Notably, the adapter mode maintained consistently high performance across all attack types (ranging from 97.2% to 99.1% accuracy), while the base mode showed significant degradation for certain attacks. In particular, the base mode performed poorly on ACK Flood (69.2%), TCP-SYN Flood (57.7%), and UDP Flood (72.7%). This underscores the adapter's ability to capture attack-specific characteristics through targeted fine-tuning.
5.4.5. Field-Level Accuracy Analysis
We further assess the performance of our tools at the header field level (Figure 4 and Figure 5). The experimental results reveal near-perfect performance for critical protocol fields. To rigorously validate these findings, we conducted a statistical error analysis, as detailed in Table 5, which reports the mean accuracy, standard deviation, and 95% confidence intervals for key fields. Our tools demonstrated high accuracy on header fields, including IP (IP.flags, IP.src, IP.dst) and ICMP header fields, achieving 95–100% accuracy across all modes with tight confidence intervals (±0.001–0.007). TCP and UDP port fields demonstrated the superiority of our adapter-based approach over the base model, with the adapter achieving 99% accuracy while the base model dropped to 25.0% for TCP and 0.0% for UDP. The agent mode demonstrated results similar to the adapter-based approach (94–100%), with performance dropping to 89%, 77%, and 65% for the IP protocol field, TCP source port, and UDP source port, respectively. This further confirms the advantage of the fine-tuned approaches over the base model in capturing complex field relationships, particularly when traffic patterns may not have been observed during the training of the base model. Although we observed some minor degradation in complex fields, such as TCP flags, in certain attack scenarios, the adapter mode maintains consistently high accuracy across all fields and attack types, as shown in Figure 5.