Malicious Traffic Detection on Tofino Using Graph Attention Model

Gao, Xichang; Tan, Lizhuang; Chen, Shengpeng; Zhang, Peiying; Wang, Jian

doi:10.3390/app15137179


Xichang Gao ¹, Lizhuang Tan ²,³,*, Shengpeng Chen ⁴,⁵, Peiying Zhang ⁴,⁵ and Jian Wang ⁶

¹ Beijing Shang E-Town Group, Beijing 100191, China

² Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250014, China

³ Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science, Jinan 250014, China

⁴ Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China

⁵ Shandong Key Laboratory of Intelligent Oil & Gas Industrial Software, Qingdao 266580, China

⁶ College of Science, China University of Petroleum (East China), Qingdao 266580, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(13), 7179; https://doi.org/10.3390/app15137179

Submission received: 9 May 2025 / Revised: 22 June 2025 / Accepted: 23 June 2025 / Published: 26 June 2025

(This article belongs to the Topic Advances in Integrative AI, Machine Learning, and Big Data for Transformative Applications)


Abstract

With the surge of malicious traffic in networks, existing detection methods struggle to balance real-time performance and efficiency. Data plane programmability, as an emerging technology, offers rapid control loops that can more effectively detect and mitigate network attacks. This paper presents Maltof, a malicious traffic detection method based on P4 programmable switches, aimed at enhancing the accuracy and efficiency of real-time detection. First, Maltof leverages the P4 programming language to embed a rule-based random forest model into the Tofino data plane of the switch, enabling quick feature matching to efficiently filter out potential malicious traffic. Subsequently, Maltof runs a lightweight Edge-based Graph Attention Network model on the CPU data plane of the switch, performing in-depth analysis on suspicious packets identified in the initial screening, learning and capturing complex relational features in network traffic to further determine whether malicious behavior is present. Through this staged detection mechanism, Maltof significantly improves performance while maintaining high detection accuracy, enabling efficient online detection of malicious traffic and ensuring timely and effective identification and mitigation of network threats.

Keywords:

malicious traffic detection; data plane programmability; P4 programmable switch; Edge-based Graph Attention Network

1. Introduction

With the rapid development of computer networks and communication technologies, the scale of networks and the number of network applications are growing exponentially [1]. At the same time, malicious attacks targeting the Internet are becoming increasingly frequent, posing serious threats to socio-economic development and national security. Therefore, leveraging technological methods to monitor network conditions, detect malicious traffic attacks early, and respond promptly can significantly enhance network resilience and defense capabilities, playing a crucial role in maintaining cybersecurity [2].

Malicious traffic detection technologies are designed to classify network traffic at multiple scales, identifying malicious attacks. Early malicious traffic detection methods primarily relied on port identification [3] and Deep Packet Inspection (DPI) [4], but they faced challenges in handling encrypted traffic and complex data [5]. Traditional machine learning algorithms have partially overcome the limitations of port-based and DPI methods; however, their effectiveness in traffic detection remains heavily dependent on feature engineering, making it difficult to adapt to emerging and dynamic attack patterns. In contrast, deep learning-based approaches provide a new direction for malicious traffic detection by directly feeding pre-processed data into neural networks for training, significantly reducing the reliance on manual feature extraction. Currently, the most widely used deep learning algorithms for this purpose include convolutional neural networks (CNNs) [6], long short-term memory (LSTM) [7], and deep belief networks (DBNs) [8]. These methods offer promising solutions to address the evolving challenges in malicious traffic detection [9].

The emergence of programmable switches has introduced new approaches to malicious traffic detection. Programmable switches, which allow users to customize packet processing logic through programming languages, can handle large-scale network data streams [10]. These switches typically integrate two types of chips: CPU and ASIC [11]. The CPU acts as a local control unit responsible for managing high-level network logic, such as loading configurations, issuing rules, and interacting with network applications. In contrast, the ASIC (such as Tofino) serves as the core of the data plane, enabling ultra-fast packet processing and forwarding, thus executing customized traffic handling tasks. P4 programmable switches use the P4 language to define their data forwarding behavior, allowing for programmable control of the ASIC chip and flexible packet processing. When detecting novel attacks or malicious traffic patterns, P4 switches can quickly update rules to implement custom traffic filtering and classification. More complex analysis and detection tasks, such as running graph neural network models, can be offloaded to the upper-layer CPU processors. This architecture maintains efficient traffic processing while enhancing the accuracy of malicious traffic detection.

Therefore, this paper proposes Maltof, a malicious traffic detection method deployed on a P4 programmable switch, to improve the accuracy and efficiency of real-time malicious traffic detection. Specifically, Maltof embeds a rule-based random forest (RF) model into the Tofino data plane of the switch through the P4 programming language, quickly matching traffic features against preset rules to screen out potentially malicious traffic. This preliminary screening reduces the volume of data that must be analyzed in depth and thus the computational burden on network devices. Maltof then deploys the lightweight Edge-based Graph Attention Network (EGAT) model proposed in this paper on the switch's CPU. The model analyzes the suspicious packets screened in the previous stage in depth, learns the complex correlation features in the network traffic, and verifies whether the packets are malicious. In this way, Maltof balances performance and accuracy, realizes online detection of malicious traffic, and ensures that malicious flows are identified and handled quickly and effectively.

In general, the contributions and innovations of this paper include the following:

(1) A CPU/Tofino collaborative detection architecture is proposed: on the Tofino side, a rule-based RF is used for rapid, parallel pre-filtering, and suspicious traffic is cloned to the CPU side for more refined EGAT inference analysis, balancing real-time performance and accuracy.

(2) An efficient real-time malicious traffic detection method is proposed: in the P4 programmable switch, multiple RF rule tables are deployed to quickly identify malicious traffic, while multiple EGAT model instances perform deeper feature aggregation and classification of suspicious flows, improving detection accuracy.

(3) Through the dynamic feedback and collaborative update mechanism between the CPU and Tofino, once malicious traffic is recognized by EGAT, the switch-side blacklist can be updated immediately to drop subsequent traffic of the same type, greatly reducing the spread latency caused by false positives and false negatives. Experimental results show that this architecture clearly outperforms traditional methods in terms of real-time performance while maintaining high detection accuracy.

The rest of this paper is organized as follows: Section 2 provides background and related works, including malicious traffic detection based on deep learning and P4 programmable switches. Section 3 describes the Maltof architecture and its components in detail. Section 4 presents the experimental results and analysis. Finally, Section 5 concludes the paper and discusses future work.

2. Background and Related Works

2.1. Malicious Traffic Detection Based on Deep Learning

Faced with increasingly diverse and complex network attack strategies, traditional port- and payload-based traffic detection solutions can no longer meet current security needs [12]. Therefore, a growing number of researchers have introduced deep learning (DL) [13] into network security systems; the analysis results are presented in Table 1. As a major milestone in the development of artificial intelligence, deep learning has achieved remarkable results in fields such as pattern recognition, machine translation, and network security, and its ability to extract features automatically makes it highly adaptable to complex data. Wang et al. [14] proposed converting traffic data into images and classifying malware traffic with a CNN. Subsequently, Wang et al. [15] proposed an end-to-end traffic classification model based on a one-dimensional CNN that handles encrypted-protocol classification efficiently and achieves excellent accuracy. Lopez et al. [16] proposed a network traffic classifier (NTC) that combines RNN and CNN, improving the detection of network services such as HTTP and DNS without complex feature engineering. Zhang et al. [17] proposed an autonomous update framework for deep learning traffic identification models that integrates three stages (packet classification, autonomous identification of unknown applications, and automatic labeling of unknown packets) to achieve efficient updates, providing a dynamic update mechanism for traffic detection.

However, most current deep learning models process conventional Euclidean data, such as network traffic represented as images or sequences. Such models capture only spatial and temporal characteristics and ignore the correlations between data items, resulting in low detection rates and high false alarm rates. Graph neural networks (GNNs) [20], a deep learning technique for graph representation learning, have shown significant advantages in mining graph-structured data and identifying malicious traffic, thanks to their ability to integrate graph structure with node attribute information. Sun et al. [18] proposed a method that combines statistical features of traffic tracking graphs with graph convolutional network (GCN) models, achieving performance improvements with fewer labels. Lo et al. [19] developed the E-GraphSAGE algorithm, which detects IoT intrusions by capturing the edge features and topological structure of graphs; experimental results show that it outperforms existing methods on multiple NIDS benchmark datasets, verifying the potential of GNNs for network intrusion detection.

2.2. P4 Programmable Switch

The P4 switch is a programmable network device that lets users customize packet-processing logic in the P4 language and can handle large-scale network traffic flexibly and efficiently. It is equipped with two processors. The first is an embedded CPU, which manages the local data plane, handles the compilation, debugging, and installation of P4 programs, and configures the P4 match rules. The second is the Tofino ASIC, which performs data plane operations and runs the customized P4 program. Its architecture comprises a parser, a match-action pipeline, and a deparser (inverse parser). The parser parses incoming packets, identifies the protocol stack, and extracts the required header fields. Each pipeline stage defines a series of structured tables and actions that together specify the packet-processing logic: a table holds preset rules that match specific packet attributes and executes the corresponding actions, and an action is a statement block, similar to a C function, that modifies the packet or related data structures in switch memory. Finally, the deparser reassembles the modified packet, serializing it and sending it out onto the network.
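To make the parser/match-action/deparser description concrete, the following Python sketch models a single exact-match table in software. It is purely illustrative: the class ExactMatchTable, the field name ipv4_dst, and the actions set_egress and drop are hypothetical and are not P4 or Tofino APIs. On a real switch the table is declared in P4 and its entries are installed by the control plane.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# A parsed packet is modelled as a dict of header/metadata fields,
# i.e. what the parser stage would have extracted.
Packet = Dict[str, int]
Action = Callable[[Packet], None]


@dataclass
class ExactMatchTable:
    """Toy software model of one match-action table (exact match only)."""
    key_fields: Tuple[str, ...]
    default_action: Action
    entries: Dict[Tuple[int, ...], Action] = field(default_factory=dict)

    def add_entry(self, key: Tuple[int, ...], action: Action) -> None:
        # On a real switch, entries are installed by the control plane (embedded CPU).
        self.entries[key] = action

    def apply(self, pkt: Packet) -> None:
        # Build the lookup key from parsed fields, then run the matching action,
        # or the default action on a miss.
        key = tuple(pkt[f] for f in self.key_fields)
        self.entries.get(key, self.default_action)(pkt)


def set_egress(port: int) -> Action:
    """Hypothetical action with one parameter: choose the output port."""
    def action(pkt: Packet) -> None:
        pkt["egress_port"] = port
    return action


def drop(pkt: Packet) -> None:
    """Hypothetical action: mark the packet to be dropped."""
    pkt["drop"] = 1


table = ExactMatchTable(key_fields=("ipv4_dst",), default_action=drop)
table.add_entry((0x0A000002,), set_egress(3))  # 10.0.0.2 -> port 3

hit = {"ipv4_dst": 0x0A000002}
miss = {"ipv4_dst": 0x0A000009}
table.apply(hit)
table.apply(miss)
print(hit, miss)
```

The design mirrors the text: the key is built from parsed fields, a hit runs the entry's action, and a miss falls back to the table's default action.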

Compared with traditional systems, P4 switches can identify and respond to malicious traffic more quickly and update detection rules in real time through flexible programming, thereby adapting quickly to changing network threats [21]. Therefore, many researchers deploy machine learning algorithms on P4 switches to implement traffic detection. Mi et al. [22] proposed the ML-Pushback mechanism, in which the P4 switch collects discarded packets and sends packet summaries to the control plane. The deep learning module on the control plane extracts signatures from the summaries and classifies them with a decision tree algorithm; if traffic is identified as an attack, the control plane limits its rate, effectively mitigating the threat. Bruno et al. [23] proposed a network traffic classification framework that generates simple and efficient machine learning models and deploys them in programmable switches; compared with existing complex models, its performance loss is minimal while accuracy remains high. In Mousika, Xie et al. [24] converted the decision tree into a binary decision tree (BDT) and offloaded the BDT to the P4 switch for traffic analysis, improving the efficiency of traffic detection.

3. Maltof

3.1. Problem Statement

The main technical requirement is to exploit the advantages of programmable switches to detect malicious traffic directly on the data path. Specifically, the key problem Maltof must solve is how to design a combined detection mechanism for the CPU and the Tofino chip, and how to deploy the most suitable detection model on each, so as to achieve high detection accuracy.

3.2. Overall Architecture

Maltof is an efficient online malicious traffic detection architecture that covers two key processes: training and detection, as shown in Figure 1. In the training phase, Maltof combines RF models and EGAT-based deep learning models for training. After training, the model is distributed to the P4 switch to achieve real-time threat detection and response at the edge of the network. In this way, Maltof can effectively improve network security and respond to potential malicious traffic threats in a timely manner.

In the detection phase, the Tofino ASIC in the P4 switch first pre-screens traffic with a five-tuple filtering mechanism: traffic matching an illegal five-tuple is dropped immediately, traffic matching a legal five-tuple passes directly, and packets that match no entry proceed to further screening. Next, the P4-based RF model analyzes this unmatched traffic, filters out potentially suspicious packets, and clones them to the deep detection module, so that deep learning inference is not applied to all traffic. The deep detection module runs on the switch's embedded CPU and manages suspicious packets with a sliding window mechanism [25]. It dynamically adjusts the window size according to the real-time network load to reconcile the data arrival rate with the processing speed and prevent data loss. Multiple EGAT model instances then run in parallel on different CPU cores to process batches of data. When feature extraction is complete or the waiting time reaches a threshold, an idle EGAT instance is selected for deep detection. The detection results are fed back to the filter lists through a fast update mechanism that dynamically adjusts the filtering rules. This mechanism quickly applies detected threat information to the pre-screening stage, strengthening the system's defense against new attacks and reducing potential security risks.
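The pre-screening and feedback loop described above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the switch implementation: FiveTupleFilter, Verdict, and rf_is_suspicious are hypothetical names, the RF stage is replaced by a caller-supplied predicate, and adding EGAT-benign flows to the whitelist is an assumption (the text only says results are fed back to the filter lists).

```python
from enum import Enum
from typing import Callable, Set, Tuple

# (src_ip, dst_ip, src_port, dst_port, protocol)
FiveTuple = Tuple[str, str, int, int, int]


class Verdict(Enum):
    DROP = "drop"                  # matched the blacklist (illegal five-tuple)
    PASS = "pass"                  # matched the whitelist, or RF judged it benign
    CLONE_TO_CPU = "clone_to_cpu"  # RF judged it suspicious: copy to the CPU


class FiveTupleFilter:
    """Software model of the Tofino pre-screen plus CPU feedback (hypothetical names)."""

    def __init__(self, rf_is_suspicious: Callable[[FiveTuple], bool]) -> None:
        self.blacklist: Set[FiveTuple] = set()
        self.whitelist: Set[FiveTuple] = set()
        self.rf_is_suspicious = rf_is_suspicious  # stand-in for the RF stage

    def screen(self, ft: FiveTuple) -> Verdict:
        if ft in self.blacklist:
            return Verdict.DROP
        if ft in self.whitelist:
            return Verdict.PASS
        # No filter entry matched: fall through to the RF screening stage.
        return Verdict.CLONE_TO_CPU if self.rf_is_suspicious(ft) else Verdict.PASS

    def feedback(self, ft: FiveTuple, malicious: bool) -> None:
        """Apply an EGAT verdict to the filter lists.

        Blacklisting malicious flows follows the text; whitelisting flows that
        EGAT judged benign is an assumption of this sketch.
        """
        (self.blacklist if malicious else self.whitelist).add(ft)


# Toy RF stand-in: treat anything sent to destination port 4444 as suspicious.
f = FiveTupleFilter(rf_is_suspicious=lambda ft: ft[3] == 4444)
flow = ("10.0.0.1", "10.0.0.2", 51000, 4444, 6)
print(f.screen(flow))  # Verdict.CLONE_TO_CPU
f.feedback(flow, malicious=True)
print(f.screen(flow))  # Verdict.DROP
```

Once a flow is blacklisted, later packets of that flow are dropped at the filter lookup and never reach the RF stage or the CPU, which is the point of the fast feedback path.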

3.3. RF-Based Screening Module

In this module, the RF-based screening process is carried out on the P4 switch. The module uses a series of decision tables, where each table represents an individual decision tree from the random forest model. The packet features, such as IP protocol type, TCP flags, and flow duration, are used as inputs to match the predefined rules in these tables. When a packet meets the criteria for a rule, the vote_increment action is triggered, incrementing the vote count for the respective tree and updating meta.vote_count.

This process is performed iteratively across all the decision trees, accumulating votes that indicate whether the packet exhibits characteristics of potentially malicious behavior. After processing all the trees, the total meta.vote_count is compared against a predefined threshold. If the vote count exceeds this threshold, the packet is considered suspicious and is sent to the CPU for further analysis via the clone_to_cpu action. If the threshold is not exceeded, the packet is marked as benign and allowed to proceed without additional scrutiny, as shown in Algorithm 1.

Algorithm 1: RF-based Screening Algorithm
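The body of Algorithm 1 is not reproduced in this text. The following Python sketch models the voting logic described above in software, assuming that each decision tree has been flattened into an ordered list of range rules (one rule per leaf), so that the first matching rule plays the role of the table hit. RangeRule, rf_screen, the feature names, and the toy rules are hypothetical; on the switch, the logic is expressed as P4 tables and the vote_increment and clone_to_cpu actions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Features = Dict[str, int]  # e.g. IP protocol, TCP flags, flow duration


@dataclass
class RangeRule:
    """One hypothetical table entry: a decision-tree leaf as per-feature ranges."""
    ranges: Dict[str, Tuple[int, int]]  # feature -> inclusive (low, high)
    malicious: bool                     # leaf label

    def matches(self, x: Features) -> bool:
        # An empty `ranges` dict matches everything (a catch-all entry).
        return all(lo <= x[f] <= hi for f, (lo, hi) in self.ranges.items())


def rf_screen(x: Features, tree_tables: List[List[RangeRule]], threshold: int) -> str:
    vote_count = 0  # plays the role of meta.vote_count
    for table in tree_tables:  # one table per decision tree
        for rule in table:
            if rule.matches(x):  # the first matching entry is the table hit
                if rule.malicious:
                    vote_count += 1  # vote_increment
                break
    # Suspicious only if the vote count exceeds the threshold.
    return "clone_to_cpu" if vote_count > threshold else "benign"


trees = [
    [RangeRule({"proto": (6, 6), "tcp_flags": (2, 2)}, True), RangeRule({}, False)],
    [RangeRule({"flow_duration": (0, 10)}, True), RangeRule({}, False)],
    [RangeRule({"proto": (17, 17)}, True), RangeRule({}, False)],
]
syn_probe = {"proto": 6, "tcp_flags": 2, "flow_duration": 3}
print(rf_screen(syn_probe, trees, threshold=1))  # clone_to_cpu (2 votes > 1)
```

The paper uses a threshold of 38 (Section 4.4); the toy forest here has only three trees, so a threshold of 1 is used for illustration.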

3.4. Deep Detection Module

This module has three components: sliding window management, traffic trajectory map construction, and the EGAT model.

3.4.1. Sliding Window Management

Algorithm 2 describes the sliding window management and data preparation process in the deep detection module. The module first uses the sliding window mechanism to dynamically manage incoming buffered packets. The algorithm initializes an empty sliding_window (line 2) and then adjusts window_size in real time according to the network load to balance data input against processing capacity (line 3). Each incoming packet p is appended to sliding_window (line 5), and if the window grows beyond the current window_size, the oldest packet is removed to prevent overflow (lines 6–7). This mechanism avoids buffer overflow when the incoming traffic rate exceeds the processing speed, thereby preventing data loss. After the packet flow has been managed, all packets in the sliding window are forwarded to the EGAT module for deep detection (line 10). The EGAT model runs in parallel on multiple CPU cores to analyze whether the batch contains potential threats, improving the detection performance and efficiency of the system.

Algorithm 2: Deep Detection Algorithm
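Algorithm 2 itself is not reproduced here. The sketch below follows its described steps (empty window, load-based window size, append, evict oldest, hand the batch to EGAT) in Python. The load-to-size policy in adjust_window_size, its bounds of 64 and 1024, and the egat_detect callback are hypothetical: the text does not specify how load maps to window size.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Sequence


def adjust_window_size(load: float, min_size: int = 64, max_size: int = 1024) -> int:
    """Hypothetical policy: shrink the window linearly as load (0..1) rises."""
    load = min(max(load, 0.0), 1.0)
    return int(max_size - load * (max_size - min_size))


def deep_detection(packets: Iterable[dict], load: float,
                   egat_detect: Callable[[Sequence[dict]], List[int]]) -> List[int]:
    sliding_window: Deque[dict] = deque()        # start with an empty window
    window_size = adjust_window_size(load)       # size chosen from current load
    for p in packets:
        sliding_window.append(p)                 # add the incoming packet
        if len(sliding_window) > window_size:
            sliding_window.popleft()             # evict the oldest packet
    return egat_detect(list(sliding_window))     # hand the batch to EGAT


# Placeholder detector: report the id of the oldest kept packet and the batch size.
result = deep_detection(({"id": i} for i in range(100)), load=1.0,
                        egat_detect=lambda batch: [batch[0]["id"], len(batch)])
print(result)  # [36, 64]
```

A deque gives O(1) appends and evictions at both ends, which suits the append-then-evict-oldest pattern.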

3.4.2. Traffic Trajectory Map Construction

During sliding window management, the system dynamically manages and caches the filtered packets so that network traffic can be processed more efficiently. To achieve accurate traffic detection, the EGAT module designed in this paper uses the switch's CPU to compute complex flow-level features, such as the sum of packet lengths, flow duration, and total bytes. The packets that pass the RF pre-screening are grouped by four-tuple (source IP, source port, destination IP, destination port) and organized into a traffic trajectory graph, in which the flow characteristics originally represented by edges (Figure 2a) are converted into nodes. Malicious traffic is then detected through node embedding, so the detection task becomes a node classification problem (Figure 2b): nodes represent edges of the original traffic graph, and edges represent shared IP–port relationships. This design significantly improves detection efficiency and accuracy.

In Figure 2, S denotes the source (the sender in the network) and D the destination (the receiver), and 1, 2, 3, 4, a, b, c, and d are the eight IP–port pairs in the network. During the conversion, each edge between two nodes in Figure 2a becomes a node in Figure 2b; for example, node 1a in Figure 2b corresponds to the edge between nodes 1 and a in Figure 2a. Two nodes in Figure 2b are connected when the corresponding edges in Figure 2a share an endpoint; for example, edges 1b and 4b both include node b, so 1b and 4b are connected in Figure 2b.
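The edge-to-node conversion described above is a line-graph construction. The following Python sketch shows one way to build it from a list of flows. The names to_line_graph, Endpoint, and Flow are hypothetical, and endpoints are plain strings standing in for IP–port pairs.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Endpoint = str                    # an IP-port pair, e.g. "10.0.0.1:443"
Flow = Tuple[Endpoint, Endpoint]  # one edge of the original traffic graph


def to_line_graph(flows: List[Flow]) -> Dict[int, Set[int]]:
    """Each flow becomes a node; two flow-nodes are linked if the flows share an endpoint."""
    flows_at: Dict[Endpoint, List[int]] = defaultdict(list)
    for idx, (src, dst) in enumerate(flows):
        flows_at[src].append(idx)
        flows_at[dst].append(idx)
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(flows))}
    for members in flows_at.values():
        for a, b in combinations(members, 2):
            if a != b:  # skip self-loops from flows whose src equals dst
                adj[a].add(b)
                adj[b].add(a)
    return adj


# Endpoints labelled as in Figure 2a: flows 1a, 1b and 4b.
flows = [("1", "a"), ("1", "b"), ("4", "b")]
print(to_line_graph(flows))  # {0: {1}, 1: {0, 2}, 2: {1}}
```

Flow 1b (index 1) is linked to 1a through endpoint 1 and to 4b through endpoint b, matching the example in the text. The per-endpoint pairing is quadratic in an endpoint's degree, which is acceptable for windowed batches but worth bounding for very busy endpoints.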

3.4.3. EGAT Module

The EGAT module employs a pre-sampling strategy to enhance the performance of the attack detection model in complex network environments. Specifically, the neighborhood of a given node is uniformly sampled with a fixed number of neighbors, where the neighborhood is defined as the set of source and destination nodes. By uniformly sampling this fixed-size neighborhood, the number of sampled neighbors for each node remains consistent, facilitating batch formation and enabling efficient batch training on the CPU. This approach ensures more structured data handling, improving both training efficiency and model scalability.
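The fixed-size uniform neighbor sampling described above resembles GraphSAGE-style sampling. The Python sketch below draws exactly k neighbors per node. The function name sample_neighbors, sampling with replacement when a node has fewer than k neighbors, and the self-padding of isolated nodes are assumptions of this sketch, not details given in the text.

```python
import random
from typing import Dict, List, Set


def sample_neighbors(adj: Dict[int, Set[int]], node: int, k: int,
                     rng: random.Random) -> List[int]:
    """Uniformly sample exactly k neighbours of `node` so every node has the same fan-in."""
    neigh = sorted(adj[node])        # sort for reproducibility under a fixed seed
    if not neigh:
        return [node] * k            # assumption: an isolated node attends to itself
    if len(neigh) >= k:
        return rng.sample(neigh, k)  # uniform, without replacement
    return [rng.choice(neigh) for _ in range(k)]  # pad by sampling with replacement


adj = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
rng = random.Random(0)
batch = {n: sample_neighbors(adj, n, k=4, rng=rng) for n in adj}
print(batch)  # every node maps to a list of exactly 4 node ids
```

Because every node ends up with the same number of neighbors, the sampled neighborhoods can be stacked into fixed-shape batches, which is what makes batch training on the CPU efficient.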

The input of the GAT aggregation layer consists of the F-dimensional feature representations of the sampled nodes, denoted as

h = \{\vec{h}_{1}, \vec{h}_{2}, \dots, \vec{h}_{N}\}, \quad \vec{h}_{n} \in \mathbb{R}^{F},

where N is the number of nodes in a batch and F is the number of features of a single node. The output node features are denoted as

h' = \{\vec{h}'_{1}, \vec{h}'_{2}, \dots, \vec{h}'_{N}\}, \quad \vec{h}'_{n} \in \mathbb{R}^{F'}.

In this process, a learnable linear transformation W is applied to enhance the input features and map them to higher-level features. Figure 3 shows the core aggregation mechanism of GAT. For node i, the attention correlation coefficient e_{ij} between node i and each of its neighbor nodes j is calculated as follows:

e_{ij} = a\left([W\vec{h}_{i} \,\|\, W\vec{h}_{j}]\right), \quad j \in N_{i},

(1)

where e_{ij} represents the importance of neighbor node j to target node i; [· ‖ ·] denotes the concatenation of W h_i and W h_j; a is the attention mechanism of the hidden layer, parameterized by a 2F′-dimensional weight vector; W ∈ R^{F′×F} is the shared weight matrix; and F and F′ are the numbers of input and output features of the network layer, respectively, between which W establishes the mapping.

To make the coefficients comparable across different nodes, the attention coefficients of all neighboring nodes are normalized with the Softmax function and used as feature weights; the new node features are then computed as the weighted sum of the neighboring node features. In a single-layer GAT, the normalized attention coefficient is

a_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{T}[W\vec{h}_{i} \,\|\, W\vec{h}_{j}]\right)\right)}{\sum_{k \in N_{i}} \exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{T}[W\vec{h}_{i} \,\|\, W\vec{h}_{k}]\right)\right)},

(2)

where ‖ is the concatenation operation, N_i is the set of sampled neighbors of target node i, W is the shared weight matrix, \vec{h}_i and \vec{h}_j are the feature vectors of node i and its neighbor node j, \vec{a} ∈ R^{2F′} is the attention weight vector, and LeakyReLU is the activation function. Applying the nonlinear activation function σ to the attention-weighted sum then yields the new feature vector \vec{h}'_i of target node i:

\vec{h}'_{i} = \sigma\left(\sum_{j \in N_{i}} a_{ij} W\vec{h}_{j}\right).

(3)
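Equations (1)–(3) can be traced end to end with a small pure-Python sketch of one single-head GAT update. It is illustrative only: the function names are hypothetical, and the LeakyReLU slope of 0.2 and the choice of ELU for σ are assumptions (common choices for GAT), not settings taken from this paper.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # W has F' rows and F columns


def matvec(W: Matrix, h: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_layer(h: Dict[int, Vector], neighbors: Dict[int, List[int]],
              W: Matrix, a: Vector) -> Dict[int, Vector]:
    """Single-head GAT update for every node in `neighbors` (each N_i non-empty)."""
    Wh = {n: matvec(W, v) for n, v in h.items()}  # W h_n for all nodes
    out: Dict[int, Vector] = {}
    for i, N_i in neighbors.items():
        # Eqs. (1)/(2): LeakyReLU(a^T [W h_i || W h_j]); `+` concatenates the lists.
        e = [leaky_relu(sum(p * q for p, q in zip(a, Wh[i] + Wh[j]))) for j in N_i]
        # Eq. (2): softmax over N_i, shifted by the max for numerical stability.
        m = max(e)
        exp_e = [math.exp(x - m) for x in e]
        total = sum(exp_e)
        alpha = [x / total for x in exp_e]
        # Eq. (3): attention-weighted sum of W h_j, then sigma (ELU here, an assumption).
        agg = [sum(al * Wh[j][d] for al, j in zip(alpha, N_i)) for d in range(len(W))]
        out[i] = [x if x > 0 else math.expm1(x) for x in agg]
    return out


h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
W = [[1.0, 0.0], [0.0, 1.0]]  # identity, F = F' = 2
a = [0.0, 0.0, 0.0, 0.0]      # zero attention vector -> uniform weights
print(gat_layer(h, {0: [1, 2]}, W, a))  # {0: [0.5, 1.0]}
```

With a zero attention vector, every neighbor gets the same weight, so the output reduces to a mean of the transformed neighbor features; a non-zero \vec{a} shifts weight toward the neighbors it scores higher.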

The output node features are represented as

h' = \{\vec{h}'_{1}, \vec{h}'_{2}, \dots, \vec{h}'_{N}\}, \quad \vec{h}'_{n} \in \mathbb{R}^{F'}.

(4)

To improve the structural discrimination ability of the node embeddings while retaining the learning ability of the original attention mechanism, Maltof modifies how the features of the target node are aggregated, strengthening the new node feature vector output by each network layer:

\vec{h}_{i}^{l} = \sigma^{l}\left(\sum_{j \in N_{i}} a_{ij}^{l-1} \vec{h}_{j}^{l-1} + \vec{h}_{i}^{l-1}\right),

(5)

where l is the layer index, σ^l is the nonlinear activation function of the l-th layer, and \vec{h}_j^{l-1} is the feature vector of neighbor node j in the (l−1)-th layer after multiplication by the weight matrix W (\vec{h}_i^{l-1} is defined analogously for the target node).

When the node features of the next layer are updated, the target node's own features \vec{h}_i^{l-1} are added directly, without attention weighting. This ensures that the basic information carried by the original node features is incorporated into the training of the next layer, thereby improving the performance of the model.
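The following Python sketch implements Eq. (5) under the reading that the un-weighted self term is the target node's own layer-(l−1) feature vector, already multiplied by W as the text defines these vectors. The function name residual_gat_update, the precomputed attention weights, and the choice of ReLU for σ^l are assumptions for illustration.

```python
from typing import Dict, List

Vector = List[float]


def residual_gat_update(Wh_prev: Dict[int, Vector],
                        alpha: Dict[int, Dict[int, float]]) -> Dict[int, Vector]:
    """h_i^l = sigma(sum_j alpha_ij * Wh_j^{l-1} + Wh_i^{l-1}), with sigma = ReLU (assumption).

    Wh_prev[n]: layer-(l-1) feature of node n, already multiplied by W.
    alpha[i][j]: normalized attention weight of neighbour j for target i (Eq. (2)).
    """
    out: Dict[int, Vector] = {}
    for i, weights in alpha.items():
        dim = len(Wh_prev[i])
        attended = [sum(w * Wh_prev[j][d] for j, w in weights.items()) for d in range(dim)]
        # The target node's own features are added without any attention weight.
        pre = [attended[d] + Wh_prev[i][d] for d in range(dim)]
        out[i] = [max(0.0, x) for x in pre]
    return out


Wh_prev = {0: [1.0, -2.0], 1: [3.0, 1.0], 2: [-1.0, 0.5]}
alpha = {0: {1: 0.75, 2: 0.25}}
print(residual_gat_update(Wh_prev, alpha))  # {0: [3.0, 0.0]}
```

The self term acts like a residual connection: even if attention spreads weight thinly over many neighbors, the target node's own signal passes to the next layer intact.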

To stabilize the attention learning process, the graph attention layer uses a multi-head attention mechanism: K independent attention mechanisms update the node features, and Maltof concatenates their outputs to retain the node features:

\vec{h}'_{i} = \mathop{\Big\|}_{k=1}^{K} \sigma\left(\sum_{j \in N_{i}} a_{ij}^{k} W^{k} \vec{h}_{j}\right),

(6)

where ‖ denotes concatenation, K is the number of attention heads, a_{ij}^{k} is the attention score computed by the k-th attention head, and W^{k} is the weight matrix of the k-th head.
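The concatenation in Eq. (6) can be sketched in pure Python by running K independent heads, each with its own W^k and attention vector, and joining their outputs per node. The names attention_head and multi_head_gat are hypothetical, and the LeakyReLU slope of 0.2 and the ELU choice for σ are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def attention_head(h: Dict[int, Vector], neighbors: Dict[int, List[int]],
                   W: Matrix, a: Vector, slope: float = 0.2) -> Dict[int, Vector]:
    """One head: sigma(sum_j alpha_ij W h_j), with sigma = ELU (an assumption)."""
    Wh = {n: [sum(w * x for w, x in zip(row, v)) for row in W] for n, v in h.items()}
    out: Dict[int, Vector] = {}
    for i, N_i in neighbors.items():
        e = [sum(p * q for p, q in zip(a, Wh[i] + Wh[j])) for j in N_i]  # a^T [Wh_i || Wh_j]
        e = [x if x > 0 else slope * x for x in e]                       # LeakyReLU
        m = max(e)
        weights = [math.exp(x - m) for x in e]
        total = sum(weights)
        agg = [sum(wt / total * Wh[j][d] for wt, j in zip(weights, N_i)) for d in range(len(W))]
        out[i] = [x if x > 0 else math.expm1(x) for x in agg]
    return out


def multi_head_gat(h: Dict[int, Vector], neighbors: Dict[int, List[int]],
                   heads: Sequence[Tuple[Matrix, Vector]]) -> Dict[int, Vector]:
    """Eq. (6): run K independent heads (W^k, a^k) and concatenate their outputs."""
    per_head = [attention_head(h, neighbors, W_k, a_k) for W_k, a_k in heads]
    return {i: [x for out in per_head for x in out[i]] for i in neighbors}


h = {0: [1.0, 0.0], 1: [0.0, 1.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
double = [[2.0, 0.0], [0.0, 2.0]]
zero_a = [0.0] * 4
out = multi_head_gat(h, {0: [0, 1]}, [(identity, zero_a), (double, zero_a)])
print(out)  # {0: [0.5, 0.5, 1.0, 1.0]}  (K * F' = 2 * 2 features)
```

Concatenation makes the output width K·F′, so each head's view is preserved for the next layer; averaging the heads instead would keep the width at F′.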

4. Experimental Results and Analysis

4.1. Experimental Environment

We deployed Maltof on a Barefoot Tofino switch equipped with a 4-core Intel J1900 CPU (2.0 GHz) and 8 GB of memory. The switch has 54 Ethernet interfaces and 2 management interfaces; the data interfaces comprise 48 10GbE (SFP+)/25GbE (SFP28) ports and 6 40GbE (QSFP+)/100GbE (QSFP28) ports. The experiment evaluates the overall performance of Maltof for real-time online malicious traffic detection: the Tofino ASIC performs rule matching and traffic filtering in the data plane, while the CPU performs deep identification of potentially malicious traffic. PyTorch 1.13 [26] is used as the deep learning framework.

We use two high-performance servers as the sender and receiver, each equipped with two Intel(R) Xeon(R) E5-2690 v2 CPUs (3.00 GHz, 10 cores), 64 GB of RAM, and two 10 GbE NICs, running Linux kernel 4.4.0. Network traffic is generated with packETH. The dataset is split into a 70% training set and a 30% test set; the batch size is 500, and the model is trained for 2 epochs. Maltof uses a 3-layer EGAT with 6 attention heads per layer and a hidden layer size of 64. Table 2 lists the remaining model parameter settings.
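For reproduction purposes, the training settings stated above can be collected in one configuration object. The following Python sketch does so and adds a 70/30 split helper. TrainConfig, train_test_split, and batches_per_epoch are hypothetical names; a seeded random (non-stratified) split is an assumption, since the text does not say how the split was drawn; and the Table 2 parameters are not reproduced.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class TrainConfig:
    """Settings stated in Section 4.1; Table 2 parameters are not reproduced here."""
    train_fraction: float = 0.7
    batch_size: int = 500
    epochs: int = 2
    egat_layers: int = 3
    heads_per_layer: int = 6
    hidden_size: int = 64


def train_test_split(samples: Sequence[T], cfg: TrainConfig,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Seeded random (non-stratified) split; the paper does not state its split procedure."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = round(cfg.train_fraction * len(samples))
    return [samples[i] for i in idx[:cut]], [samples[i] for i in idx[cut:]]


def batches_per_epoch(n_train: int, batch_size: int) -> int:
    return -(-n_train // batch_size)  # ceiling division


cfg = TrainConfig()
train, test = train_test_split(list(range(1000)), cfg)
print(len(train), len(test), batches_per_epoch(len(train), cfg.batch_size))  # 700 300 2
```

Given the strong class imbalance of the dataset, a stratified split would be a reasonable alternative to this plain random split.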

4.2. Experimental Dataset

We evaluate the online classification performance of Maltof on the P4 switch using the NF-UNSW-NB15 dataset [27] (available at https://staff.itee.uq.edu.au/marius/NIDS_datasets/, accessed on 23 June 2025), a variant of the original UNSW-NB15 dataset converted from PCAP format to NetFlow format. The benefits of NetFlow as a common format include its practicality, its widespread deployment in production networks, and its scalability. NF-UNSW-NB15 contains 1,623,118 flows, of which 72,406 (4.46%) are malicious attack samples and 1,550,712 (95.54%) are benign. The dataset is thus highly class-imbalanced, and normal traffic is downsampled. Table 3 describes the NF-UNSW-NB15 dataset, and Table 4 details the fields of each flow record.
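The class proportions quoted above follow directly from the flow counts, and the short Python check below recomputes them. The downsampling helper is a hypothetical illustration only: the text does not give the downsampling ratio, so keep_fraction is an assumed parameter, and labels 1/0 for malicious/benign are likewise assumed.

```python
import random
from typing import List, Sequence, Tuple


def class_shares(n_malicious: int, n_benign: int) -> Tuple[float, float]:
    """Percentage of malicious and benign flows."""
    total = n_malicious + n_benign
    return 100 * n_malicious / total, 100 * n_benign / total


mal, ben = class_shares(72_406, 1_550_712)
print(f"total={72_406 + 1_550_712:,}  malicious={mal:.2f}%  benign={ben:.2f}%")
# total=1,623,118  malicious=4.46%  benign=95.54%


def downsample_benign(labels: Sequence[int], keep_fraction: float, seed: int = 0) -> List[int]:
    """Indices keeping every malicious row (label 1) and a random share of benign rows (label 0).

    keep_fraction is a hypothetical parameter; the paper does not give the ratio used.
    """
    rng = random.Random(seed)
    return [i for i, y in enumerate(labels) if y == 1 or rng.random() < keep_fraction]
```

Downsampling only the majority class keeps every attack sample while reducing the benign share the model sees during training.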

The NF-UNSW-NB15 dataset has become a benchmark in network intrusion research owing to its comprehensive representation of real-world traffic patterns and diverse attack scenarios. By combining a wide range of contemporary attack vectors with normal background traffic, NF-UNSW-NB15 offers researchers a realistic, high-fidelity platform for training and evaluating machine learning–based detection models. As shown in Table 5, numerous recent studies have used NF-UNSW-NB15 to validate both deep learning and graph-based intrusion detection approaches, demonstrating its robustness and extensibility across different methodologies. We likewise adopt NF-UNSW-NB15 so that our proposed model is tested in a realistic and challenging traffic environment, providing a fair basis for comparison with existing techniques.

4.3. Comparison Method

(1) CNN: CNN is a deep learning model. The convolutional neural network used in this comparative experiment consists of two convolutional modules, a fully connected layer, a dropout layer, and a Softmax layer.

(2) LSTM: LSTM is a special type of recurrent neural network. The LSTM model in this comparison consists of an input layer, two stacked LSTM hidden layers, and an output layer; the stacked structure enhances the learning ability of the model.

(3) E-GraphSAGE [19]: The model updates the feature representation of the target node by aggregating the feature information of neighboring nodes. Specifically, it consists of two aggregation layers and an output layer, and the aggregation function used is average aggregation.

A unified training protocol was adopted with a batch size of 500, 2 training epochs, cross-entropy as the loss function, and the Adam optimizer. Model-specific configurations are as follows. The CNN model consists of two convolutional modules, each comprising a convolutional layer (64 kernels of size 3 × 3, stride 1, padding 1), a BatchNorm layer, and a ReLU activation function. The convolutional blocks are followed by a fully connected layer with 128 units, a dropout layer with a rate of 0.2, and a Softmax output layer; the learning rate is 0.005. The LSTM model uses two stacked LSTM layers (128 units per layer) with tanh as the activation function, followed by a Softmax output layer; the learning rate is 0.01. The E-GraphSAGE model comprises two GraphSAGE layers (64 units each) with the MeanPooling aggregation function and ReLU activation, followed by a Softmax output layer. It uses a learning rate of 0.005, a dropout rate of 0.2, and L2 regularization with a weight decay coefficient of 1 × 10⁻⁵.

This section uses accuracy, precision, recall, and F1-score as the criteria for evaluating the classification performance of the model. Binary classification refers to distinguishing benign from malicious traffic, while multi-class classification refers to determining whether a packet is benign or belongs to a specific attack type.
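For reference, the four criteria can be computed from predicted and true labels as in the Python sketch below. The function names are hypothetical. Macro averaging over classes (the unweighted mean of per-class scores) is an assumption, because the text does not say how the multi-class scores are averaged.

```python
from typing import Dict, Hashable, Sequence


def one_vs_rest_metrics(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                        positive: Hashable) -> Dict[str, float]:
    """Precision, recall and F1 for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def evaluate(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision/recall/F1 (macro averaging is an assumption)."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    classes = sorted(set(y_true) | set(y_pred))
    per_class = [one_vs_rest_metrics(y_true, y_pred, c) for c in classes]
    macro = {k: sum(m[k] for m in per_class) / len(per_class)
             for k in ("precision", "recall", "f1")}
    return {"accuracy": accuracy, **macro}


y_true = ["Benign", "Benign", "Exploits", "Exploits"]
y_pred = ["Benign", "Exploits", "Exploits", "Exploits"]
print(evaluate(y_true, y_pred))
```

On heavily imbalanced data such as NF-UNSW-NB15, accuracy alone is dominated by the benign class, which is why per-class precision, recall, and F1 are reported alongside it.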

4.4. RF Multi-Class Classification Results

In the first stage, Maltof embeds the RF model into the Tofino data plane and pre-screens the massive network traffic through the match-action table mechanism. The voting threshold governs a trade-off: a lower threshold increases sensitivity and reduces the likelihood of false negatives, but raises system overhead because more traffic is forwarded for deep analysis; a higher threshold improves efficiency but may miss subtle malicious patterns. In our implementation, a threshold of 38 was selected to balance this trade-off, preserving the real-time processing capability of the P4 switch while achieving high detection accuracy (see Figure 4). Figure 4 shows the detection metrics of the RF model for the binary and multi-class classification tasks on the NF-UNSW-NB15 dataset. The binary task measures the ability of RF to distinguish normal from malicious traffic, while the multi-class task measures its ability to identify whether traffic is normal or a specific type of attack. The results show that RF performs better on the relatively simple binary classification task.
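The threshold trade-off described above can be quantified by sweeping the threshold over per-packet vote counts. The Python sketch below reports, for a given threshold, the fraction of packets cloned to the CPU and the RF-stage recall on malicious packets. The vote counts and labels are hypothetical toy data, not results from the paper, and threshold_tradeoff is a hypothetical name.

```python
from typing import Sequence, Tuple


def threshold_tradeoff(votes: Sequence[int], labels: Sequence[int],
                       threshold: int) -> Tuple[float, float]:
    """Return (fraction of packets cloned to the CPU, RF-stage recall on malicious packets)."""
    forwarded = [v > threshold for v in votes]
    fwd_fraction = sum(forwarded) / len(votes)
    n_malicious = sum(1 for y in labels if y == 1)
    caught = sum(1 for f, y in zip(forwarded, labels) if f and y == 1)
    recall = caught / n_malicious if n_malicious else 0.0
    return fwd_fraction, recall


# Hypothetical per-packet vote counts and ground-truth labels (1 = malicious).
votes = [10, 40, 45, 20, 39]
labels = [0, 1, 1, 0, 0]
for t in (38, 42):
    print(t, threshold_tradeoff(votes, labels, t))
# 38 (0.6, 1.0)
# 42 (0.2, 0.5)
```

Raising the threshold from 38 to 42 in this toy example cuts CPU load from 60% to 20% of packets but halves the recall of the screening stage, which is exactly the tension the threshold choice has to balance.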

The confusion matrices in Figure 5 show that the RF model performs well in the binary classification task (Figure 5a) but less well in the multi-class task (Figure 5b), where some traffic categories are misclassified as other types of malicious traffic. This indicates that RF still has room for improvement in distinguishing between attack types. However, the main goal of this stage is to exploit the efficiency of the RF model to quickly isolate potentially malicious traffic and screen out suspicious packets, thereby reducing the load on the subsequent deep detection stage and preserving the system’s processing capacity in high-throughput scenarios.
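Figure 5 plots normalized confusion matrices with the actual class on the vertical axis and the predicted class on the horizontal axis. A minimal pure-Python sketch of row normalization, using invented toy labels, is shown below; each diagonal entry is then the recall of its class.

```python
def normalized_confusion(y_true, y_pred, labels):
    """Row-normalized confusion matrix: rows are actual classes, columns predicted."""
    index = {c: i for i, c in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)  # number of samples whose actual class is this row
        matrix.append([v / total if total else 0.0 for v in row])
    return matrix

labels = ["Normal", "Exploits", "Fuzzers"]
y_true = ["Normal"] * 4 + ["Exploits"] * 2 + ["Fuzzers"] * 2
y_pred = ["Normal"] * 4 + ["Exploits", "Fuzzers", "Exploits", "Fuzzers"]
cm = normalized_confusion(y_true, y_pred, labels)
for name, row in zip(labels, cm):
    print(f"{name:9s}", " ".join(f"{v:.2f}" for v in row))
```

The off-diagonal 0.50 entries between Exploits and Fuzzers illustrate the kind of confusion between attack types described above for the RF multi-class task.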

4.5. EGAT Multi-Class Classification Results

In the second stage, Maltof deploys a lightweight EGAT model on the switch’s CPU. The confusion matrix in Figure 6 shows that the EGAT model substantially improves multi-class accuracy, especially in distinguishing specific attack types. In particular, it accurately classifies complex attack types such as Exploits, Reconnaissance, and Shellcode, reflecting its robustness and adaptability in complex network environments.

In addition, the ROC curves [34] in Figure 7 further confirm the performance of the EGAT model. For all traffic classes, the AUC of the EGAT model is close to 1.0; for Normal traffic and for Exploits and Reconnaissance attacks it reaches 0.99. This indicates that Maltof separates normal traffic from attacks very well: there exist operating points at which the false-positive rate is close to zero and the recall is close to one at the same time.
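The one-vs-rest AUC reported for each class equals the probability that a randomly chosen sample of that class receives a higher score than a randomly chosen sample of the other classes, with ties counted as one half. The sketch below computes it directly from that definition; the scores are toy values, and a quadratic pairwise loop is used for clarity rather than speed.

```python
def auc_one_vs_rest(scores, labels, positive):
    """AUC of the scores for class `positive` versus all other classes."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0      # positive ranked above negative
            elif sp == sn:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))

# Toy predicted probabilities for the class "Exploits".
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = ["Exploits", "Exploits", "Normal", "Exploits", "Normal"]
print(round(auc_one_vs_rest(scores, labels, "Exploits"), 4))
```

Because AUC summarizes ranking quality over all thresholds, a value such as 0.99 shows strong separability; the false-positive rate and recall actually obtained in deployment still depend on the operating threshold chosen.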

This two-stage mechanism therefore combines fast pre-filtering in the Tofino data plane, which meets high-throughput requirements, with high-precision attack detection on the switch CPU, making it well suited to network security scenarios that require real-time response.

According to the multi-class comparison in Table 6 and Figure 8, the EGAT model outperforms the baseline models (CNN, LSTM, and E-GraphSAGE). It achieves the highest score on every evaluation metric, with an accuracy of 0.9785 and precision, recall, and F1 scores of 0.9779, 0.9785, and 0.9786, respectively.

Figure 8 also shows that the EGAT model performs consistently well across individual attack types. In particular, for complex attack types such as Shellcode, Worms, and DoS, its F1 score is clearly higher than those of the other models, suggesting that it is more robust and adaptable in multi-type attack detection.

Training takes approximately 282.5 s per epoch on average. Each epoch processes about 108 batches (batch size = 500), giving an average of about 2.62 s per batch and about 5.2 ms per training sample.
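The per-batch and per-sample figures follow from the reported per-epoch time by simple division, as this short check shows; the inputs are the values reported above.

```python
epoch_seconds = 282.5     # average training time per epoch (reported)
batches_per_epoch = 108   # approximate number of batches per epoch (reported)
batch_size = 500

seconds_per_batch = epoch_seconds / batches_per_epoch
ms_per_sample = seconds_per_batch / batch_size * 1000
print(f"{seconds_per_batch:.2f} s/batch, {ms_per_sample:.2f} ms/sample")
# prints: 2.62 s/batch, 5.23 ms/sample
```

Because the batch count is approximate, the per-sample time is best read as roughly 5.2 ms.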

4.6. Discussion

As shown in Figure 8, Maltof achieves a higher multi-class F1 score on the NF-UNSW-NB15 dataset than CNN, LSTM, and E-GraphSAGE. The attention mechanism in Maltof assigns learned weights to network traffic features, enhancing its ability to capture complex patterns. In addition, unlike a conventional LSTM, which extracts temporal features sequentially in a single direction, Maltof exploits the parallel processing provided by the programmable hardware of the Tofino switch to achieve lower latency and higher throughput, thereby improving real-time detection performance.
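Maltof's EGAT module is described in Section 3.4.3 and is not restated here. Purely to illustrate how attention can weight neighbors according to traffic features, the sketch below implements a generic GAT-style rule in which the attention logit also depends on edge (flow) features: e_ij = LeakyReLU(aᵀ[h_i ‖ h_j ‖ f_ij]) and α_ij = softmax over j of e_ij. The learnable projection matrices are omitted (taken as the identity) and the attention vector a is hand-picked; both are assumptions, and this is not necessarily Maltof's exact formulation.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def edge_attention(h_i, neighbors, a):
    """Softmax attention coefficients over the neighbors of node i.

    h_i: features of the target node.
    neighbors: list of (h_j, f_ij) pairs, i.e. neighbor features and edge features.
    a: attention vector of length len(h_i) + len(h_j) + len(f_ij).
    """
    logits = []
    for h_j, f_ij in neighbors:
        z = list(h_i) + list(h_j) + list(f_ij)   # concatenation [h_i || h_j || f_ij]
        if len(z) != len(a):
            raise ValueError("attention vector has the wrong length")
        logits.append(leaky_relu(sum(w * v for w, v in zip(a, z))))
    m = max(logits)                               # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]

def aggregate(neighbors, alphas):
    """Attention-weighted sum of neighbor feature vectors."""
    dim = len(neighbors[0][0])
    return [sum(al * h_j[k] for al, (h_j, _) in zip(alphas, neighbors)) for k in range(dim)]

h_i = [1.0, 0.0]
neighbors = [([1.0, 0.0], [0.5]), ([0.0, 1.0], [2.0])]  # toy node and edge features
a = [0.0, 0.0, 1.0, 0.0, 1.0]                          # hypothetical attention vector
alphas = edge_attention(h_i, neighbors, a)
print([round(x, 3) for x in alphas], aggregate(neighbors, alphas))
```

The neighbor reached over the edge with the larger flow feature receives the larger weight, which is the sense in which attention weights traffic features during aggregation.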

E-GraphSAGE, while strong at extracting graph-structural features, suffers from latency bottlenecks in practical deployment because it is not directly adapted to network hardware such as the Barefoot Tofino. Maltof uses the P4 language to exploit the capabilities of the programmable data plane and pairs it with a GNN implemented in a DL framework on the switch CPU. This approach retains the advantages of GNNs in topological learning while performing line-rate pre-filtering in the data plane.

Experiments have so far been conducted only on the NF-UNSW-NB15 dataset; further validation is needed to assess generalization to other intrusion detection datasets (such as CIC-IDS2017) and to real production traffic. In addition, the energy consumption and switch resource usage of a practical Maltof deployment have not yet been systematically evaluated. Future work will include energy consumption testing and cost–benefit analysis to assess its feasibility in carrier-grade networks. Finally, the severe class imbalance in NF-UNSW-NB15 (for example, 46,521 Normal samples versus 30 Worms samples in Table 3) may lead to false negatives in minority classes. Future work should reduce this risk through methods such as weighted loss or oversampling, and should continue to validate the robustness of the model in real production environments.
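One common form of the weighted loss mentioned above scales each class's cross-entropy term by an inverse-frequency weight w_c = N / (K · n_c), where N is the number of samples, K the number of classes, and n_c the count of class c. The sketch below computes such weights for a subset of the Table 3 counts; the weighting scheme and the choice of classes are illustrative assumptions, not the authors' method.

```python
def inverse_frequency_weights(counts):
    """w_c = N / (K * n_c): rarer classes receive proportionally larger weights."""
    total = sum(counts.values())
    k = len(counts)
    return {c: total / (k * n) for c, n in counts.items()}

# Subset of the class counts from Table 3, used only for illustration.
counts = {"Normal": 46521, "Exploits": 2474, "Shellcode": 137, "Worms": 30}
weights = inverse_frequency_weights(counts)
for name, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} {w:10.3f}")
```

Such weights can be passed, for example, as the weight argument of PyTorch's torch.nn.CrossEntropyLoss. With this scheme every class contributes the same total weight N/K to the loss; oversampling the minority classes is the data-level alternative.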

Control charts are also one of the main tools of statistical process control (SPC). They monitor process indicators in real time and detect a shift from the “in control” state to the “out of control” state [35,36]. Classic methods such as Shewhart charts, exponentially weighted moving average (EWMA) charts, and cumulative sum (CUSUM) charts enable early detection and localization of abnormal signals through continuous observation of one or more indicators [36]. In network traffic monitoring, control charts have attracted researchers’ attention because they are robust to noise and have adjustable thresholds. For dynamic networks, multivariate control charts combine a variety of network features and flexibly select parametric or non-parametric charts to adapt to different network forms; their efficiency and reliability have been verified on simulated and real datasets [37]. For autocorrelated network sequences, the performance of different control charts has been evaluated using separable temporal exponential random graph models; this study found that the detection ability of standard Shewhart, EWMA, and CUSUM charts decreases under high autocorrelation, whereas residual-based control charts significantly improve the anomaly detection rate [38]. In future work, we will consider introducing SPC and control chart methods into network traffic monitoring.
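As an illustration of the EWMA chart mentioned above, the statistic z_t = λ·x_t + (1 - λ)·z_(t-1), with z_0 = μ0, is compared against the time-varying limits μ0 ± L·σ·sqrt(λ/(2 - λ)·(1 - (1 - λ)^(2t))). In the sketch below, λ = 0.2 and L = 3 are common textbook choices, μ0 and σ would in practice be estimated from attack-free traffic, and the per-second packet counts are toy values; none of this is part of Maltof.

```python
import math

def ewma_chart(xs, mu0, sigma, lam=0.2, L=3.0):
    """Return (t, z_t, lower, upper, alarm) for each observation x_t."""
    z = mu0
    rows = []
    for t, x in enumerate(xs, start=1):
        z = lam * x + (1 - lam) * z
        # Exact (time-varying) half-width of the EWMA control limits.
        width = L * sigma * math.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        lower, upper = mu0 - width, mu0 + width
        rows.append((t, z, lower, upper, not lower <= z <= upper))
    return rows

# Toy per-second packet counts: in control around 100, then a sustained surge.
pkt_counts = [101, 99, 100, 102, 98, 130, 135, 140]
alarms = [t for t, z, lo, hi, alarm in ewma_chart(pkt_counts, mu0=100, sigma=5) if alarm]
print("alarms at t =", alarms)
```

The first alarm fires at t = 6, the first second of the surge, while the small fluctuations before it stay within the limits.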

5. Summary

This paper proposes Maltof, an online malicious traffic detection system deployed on programmable switches, aiming to improve the accuracy and efficiency of malicious traffic detection. The system first uses the P4 programming language to integrate a rule-based RF model into the switch’s Tofino chip to quickly screen potentially malicious traffic; it then runs a lightweight EGAT model on the switch CPU to analyze the filtered packets in depth. Through this layered detection mechanism, Maltof reduces the computational burden of deep analysis and can respond quickly to potential threats in high-speed network environments. Experimental results on NF-UNSW-NB15 show that Maltof achieves a multi-class accuracy of 0.9785 and an F1 score of 0.9786, outperforming the CNN, LSTM, and E-GraphSAGE baselines, which indicates broad application prospects.

Maltof also has some limitations. (1) The parallel computing capability of P4 switches is limited; this could be addressed in the future by deploying GPUs on programmable switches. (2) The programming capabilities of P4 need to improve: its syntax and functions are not flexible enough for multi-class traffic detection. An important direction for future work is to implement more powerful malicious traffic detection on the latest programmable hardware. We will also explore integration with control plane frameworks such as ONOS and extensions such as P4 NetFPGA, enabling dynamic policy updates and seamless orchestration in real-world networks.

Author Contributions

Conceptualization, X.G. and L.T.; Methodology, X.G.; Software, X.G. and L.T.; Validation, X.G.; Formal analysis, J.W.; Data curation, P.Z.; Writing—review & editing, S.C.; Visualization, P.Z. and J.W.; Supervision, P.Z. and J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work is partially supported by the Shandong Provincial Natural Science Foundation under Grant Numbers ZR2023LZH017, ZR2022LZH015, ZR2023QF025, and ZR2024MF066; the National Natural Science Foundation of China under Grant Numbers 62471493 and 62402257; and the China University Research Innovation Fund under Grant Number 2023IT207.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Al-Garadi, M.A.; Mohamed, A.; Al-Ali, A.K.; Du, X.; Ali, I.; Guizani, M. A Survey of Machine and Deep Learning Methods for Internet of Things (IoT) Security. IEEE Commun. Surv. Tutor. 2020, 22, 1646–1685.
2. Fu, C.; Li, Q.; Shen, M.; Xu, K. Frequency Domain Feature based Robust Malicious Traffic Detection. IEEE/ACM Trans. Netw. 2022, 31, 452–467.
3. Kumar, R.; Swarnkar, M.; Singal, G.; Kumar, N. IoT Network Traffic Classification using Machine Learning Algorithms: An Experimental Analysis. IEEE Internet Things J. 2021, 9, 989–1008.
4. Azab, A.; Khasawneh, M.; Alrabaee, S.; Choo, K.K.R.; Sarsour, M. Network Traffic Classification: Techniques, Datasets, and Challenges. Digit. Commun. Netw. 2024, 10, 676–692.
5. Fu, C.; Li, Q.; Xu, K.; Wu, J. Point Cloud Analysis for ML-based Malicious Traffic Detection: Reducing Majorities of False Positive Alarms. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, ACM, Copenhagen, Denmark, 26 November 2023; pp. 1005–1019.
6. Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B.; Liu, T.; Wang, X.; Wang, G.; Cai, J.; et al. Recent Advances in Convolutional Neural Networks. Pattern Recognit. 2018, 77, 354–377.
7. Van Houdt, G.; Mosquera, C.; Nápoles, G. A Review on the Long Short-Term Memory Model. Artif. Intell. Rev. 2020, 53, 5929–5955.
8. Zhang, H.; Li, Y.; Lv, Z.; Sangaiah, A.K.; Huang, T. A Real-time and Ubiquitous Network Attack Detection based on Deep Belief Network and Support Vector Machine. IEEE/CAA J. Autom. Sin. 2020, 7, 790–799.
9. Shaji, N.S.; Jain, T.; Muthalagu, R.; Pawar, P.M. Deep-discovery: Anomaly discovery in software-defined networks using artificial neural networks. Comput. Secur. 2023, 132, 103320.
10. Bosshart, P.; Daly, D.; Gibb, G.; Izzard, M.; McKeown, N.; Rexford, J.; Schlesinger, C.; Talayco, D.; Vahdat, A.; Varghese, G.; et al. P4: Programming Protocol-independent Packet Processors. ACM SIGCOMM Comput. Commun. Rev. 2014, 44, 87–95.
11. Pan, T.; Yu, N.; Jia, C.; Pi, J.; Xu, L.; Qiao, Y.; Li, Z.; Liu, K.; Lu, J.; Lu, J.; et al. Sailfish: Accelerating Cloud-scale Multi-tenant Multi-service Gateways with Programmable Switches. In Proceedings of the SIGCOMM’21, ACM, Virtual Event, USA, 27 August 2021; pp. 194–206.
12. Chen, Z.; Cheng, G.; Niu, D.; Qiu, X.; Zhao, Y.; Zhou, Y. WFF-EGNN: Encrypted Traffic Classification based on Weaved Flow Fragment via Ensemble Graph Neural Networks. IEEE Trans. Mach. Learn. Commun. Netw. 2023, 1, 389–411.
13. Keshk, M.; Koroniotis, N.; Pham, N.; Moustafa, N.; Turnbull, B.; Zomaya, A.Y. An Explainable Deep Learning-enabled Intrusion Detection Framework in IoT Networks. Inf. Sci. 2023, 639, 119000.
14. Wang, W.; Zhu, M.; Zeng, X.; Ye, X.; Sheng, Y. Malware Traffic Classification using Convolutional Neural Network for Representation Learning. In Proceedings of the ICOIN’17, IEEE, Da Nang, Vietnam, 11–13 January 2017; pp. 712–717.
15. Wang, W.; Zhu, M.; Wang, J.; Zeng, X.; Yang, Z. End-to-end Encrypted Traffic Classification with One-dimensional Convolution Neural Networks. In Proceedings of the ISI’17, Beijing, China, 22–24 July 2017; pp. 43–48.
16. Lopez-Martin, M.; Carro, B.; Sanchez-Esguevillas, A.; Lloret, J. Network Traffic Classifier with Convolutional and Recurrent Neural Networks for Internet of Things. IEEE Access 2017, 5, 18042–18050.
17. Zhang, J.; Li, F.; Ye, F.; Wu, H. Autonomous Unknown-application Filtering and Labeling for DL-based Traffic Classifier Update. In Proceedings of the INFOCOM’20, IEEE, Virtual Event, USA, 6–9 July 2020; pp. 397–405.
18. Sun, B.; Yang, W.; Yan, M.; Wu, D.; Zhu, Y.; Bai, Z. An Encrypted Traffic Classification Method Combining Graph Convolutional Network and Autoencoder. In Proceedings of the IPCCC’20, Virtual Event, USA, 20 November 2020; pp. 1–8.
19. Lo, W.W.; Layeghy, S.; Sarhan, M.; Gallagher, M.; Portmann, M. E-graphsage: A Graph Neural Network based Intrusion Detection System for IoT. In Proceedings of the NOMS’22, Budapest, Hungary, 25–29 April 2022; pp. 1–9.
20. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The Graph Neural Network Model. IEEE Trans. Neural Networks 2008, 20, 61–80.
21. AlSabeh, A.; Khoury, J.; Kfoury, E.; Crichigno, J.; Bou-Harb, E. A survey on security applications of P4 programmable switches and a STRIDE-based vulnerability assessment. Comput. Netw. 2022, 207, 108800.
22. Mi, Y.; Wang, A. ML-pushback: Machine Learning based Pushback Defense Against DDoS. In Proceedings of the CoNEXT’19, Orlando, FL, USA, 9–12 December 2019; pp. 80–81.
23. Xavier, B.M.; Guimarães, R.S.; Comarela, G.; Martinello, M. Programmable Switches for In-networking Classification. In Proceedings of the INFOCOM’21, Vancouver, BC, Canada, 10–13 May 2021; pp. 1–10.
24. Xie, G.; Li, Q.; Dong, Y.; Duan, G.; Jiang, Y.; Duan, J. Mousika: Enable General In-network Intelligence in Programmable Switches by Knowledge Distillation. In Proceedings of the INFOCOM’22, London, UK, 2–5 May 2022; pp. 1938–1947.
25. Baldini, G.; Amerini, I. Online Distributed Denial of Service (DDoS) Intrusion Detection based on Adaptive Sliding Window and Morphological Fractal Dimension. Comput. Netw. 2022, 210, 108923.
26. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. In NIPS’19; Curran Associates Inc.: Red Hook, NY, USA, 2019.
27. Moustafa, N.; Slay, J. UNSW-NB15: A Comprehensive Data Set for Network Intrusion Detection Systems (UNSW-NB15 Network Data Set). In Proceedings of the MilCIS’15, IEEE, Canberra, ACT, Australia, 10–12 November 2015; pp. 1–6.
28. Alam, K.; Monir, M.F.; Hassan, Z.; Habib, M.T. Optimizing IoT Network Intrusion Detection: A Deep Learning Approach. In Proceedings of the 2024 7th Conference on Cloud and Internet of Things (CIoT), Montreal, QC, Canada, 29–31 October 2024; pp. 1–5.
29. Bhuiyan, M.H.; Alam, K.; Shahin, K.I.; Farid, D.M. A Deep Learning Approach for Network Intrusion Classification. In Proceedings of the 2024 IEEE Region 10 Symposium (TENSYMP), New Delhi, India, 27–29 September 2024; pp. 1–6.
30. Xie, S.; Zhan, C.; Li, J.; Li, Y. Intrusion Detection Method based on Graph Edge Attention and Focal Loss. In Proceedings of the 2025 4th International Conference on Cryptography, Network Security and Communication Technology, New York, NY, USA, 17–19 January 2025; pp. 21–28.
31. Mohy-Eddine, M.; Guezzaz, A.; Benkirane, S.; Azrour, M.; Farhaoui, Y. An Ensemble Learning Based Intrusion Detection Model for Industrial IoT Security. Big Data Min. Anal. 2023, 6, 273–287.
32. Yang, C.; Wu, L.; Xu, J.; Ren, Y.; Tian, B.; Wei, Z. Graph Learning Framework for Data Link Anomaly Detection. IEEE Access 2024, 12, 114820–114828.
33. Wu, C.; Sun, J.; Chen, J.; Alazab, M.; Liu, Y.; Xiang, Y. TCG-IDS: Robust Network Intrusion Detection via Temporal Contrastive Graph Learning. IEEE Trans. Inf. Forensics Secur. 2025, 20, 1475–1486.
34. Park, S.H.; Goo, J.M.; Jo, C.H. Receiver Operating Characteristic (ROC) Curve: Practical Review for Radiologists. Korean J. Radiol. 2004, 5, 11–18.
35. Aykroyd, R.G.; Leiva, V.; Ruggeri, F. Recent developments of control charts, identification of big data sources and future trends of current research. Technol. Forecast. Soc. Chang. 2019, 144, 221–232.
36. Yeganeh, A.; Shadman, A.R.; Triantafyllou, I.S.; Shongwe, S.C.; Abbasi, S.A. Run Rules-Based EWMA Charts for Efficient Monitoring of Profile Parameters. IEEE Access 2021, 9, 38503–38521.
37. Flossdorf, J.; Fried, R.; Jentsch, C. Online monitoring of dynamic networks using flexible multivariate control charts. Soc. Netw. Anal. Min. 2023, 13, 87.
38. Zhou, P.; Lin, D.K.; Niu, X.; He, Z. Performance evaluation method for network monitoring based on separable temporal exponential random graph models with application to the study of autocorrelation effects. Comput. Ind. Eng. 2020, 145, 106507.

Figure 1. The architecture design of Maltof.

Figure 2. The construction of a traffic trajectory map.

Figure 3. Neighborhood feature aggregation based on GAT.

Figure 4. Performance metrics of the RF model on the NF-UNSW-NB15 dataset.

Figure 5. Confusion matrix of the RF model. Note: The horizontal axis represents the predicted category, and the vertical axis represents the actual category. The color intensity of each cell reflects the normalized proportion of samples; the darker the main diagonal, the higher the proportion of samples in the corresponding category that are correctly classified. (a) Binary classification task; (b) multi-classification task.

Figure 6. Confusion matrix of the EGAT model in the multi-classification task.

Figure 7. ROC curve of the EGAT model.

Figure 8. Comparison of multi-classification F1 values on the NF-UNSW-NB15 dataset.

Table 1. Contributions of main malicious traffic detection based on DL.

Ref	Model	Dataset	Contributions
Wang et al. [14]	CNN	-	First application of representation learning to malicious traffic classification; experimental results meet the accuracy requirements of real-world applications.
Wang et al. [15]	CNN	ISCX VPN–non-VPN	Automatically extracts nonlinear features with an end-to-end (E2E) approach; in four experiments, 11 of 12 evaluation metrics outperformed existing methods.
Lopez et al. [16]	CNN + RNN	IoT network traffic	Requires no feature engineering and naturally extends CNNs to traffic classification.
Zhang et al. [17]	Learning Traffic Classifier	ISCX VPN–non-VPN	Effectively filters unknown-category traffic in real time, provides accurate labels, and supports online classifier updates.
Sun et al. [18]	GCN	Open public datasets	Extracts structural features with a GCN and uses an autoencoder to complement the flow data representation; achieves high classification performance even when labeled samples are limited.
Lo et al. [19]	NN	Real network data	Captures all intrusion samples with zero false positives; custom window-based feature extraction for critical infrastructure environments enhances detection reliability.
This work	EGAT	NF-UNSW-NB15	Proposes a malicious traffic detection architecture in which the CPU and Tofino collaborate: rule-based fast pre-filtering on the Tofino side is combined with EGAT-based deep analysis on the CPU side, which performs deeper feature aggregation and classification of suspicious traffic, achieving high-accuracy, low-latency malicious traffic detection.

Table 2. Model parameter settings.

Parameter	Setting
Parameters of Algorithm 1.
n_estimators	64
max_depth	4
max_features	sqrt
min_samples_split	2
min_samples_leaf	1
bootstrap	True
THRESHOLD	38
Parameters of Algorithm 2.
Activation function	ELU
Loss function	Cross-entropy
Optimization algorithm	Adam
Number of training epochs	Epoch = 2
Batch size	Batch_size = 500
Learning rate	Lr = 0.007
Dropout	0.2
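With max_depth = 4 (Table 2), each tree has at most 2^4 = 16 leaves, so the 64-tree forest yields at most 64 × 16 = 1024 root-to-leaf paths. One generic way to place a tree into match-action tables is to flatten each path into a rule of per-feature ranges, as sketched below. The tree, features, and thresholds are hypothetical, and Maltof's actual table encoding (Section 3.3) may differ.

```python
import math

def tree_to_rules(node, bounds=None):
    """Flatten a decision tree into (feature ranges, label) rules.

    Internal nodes are dicts with "feature", "threshold", "left", "right", where
    the left branch means feature <= threshold; leaves are {"label": ...}.
    Ranges are half-open intervals (lo, hi].
    """
    bounds = dict(bounds or {})
    if "label" in node:
        return [(bounds, node["label"])]
    f, thr = node["feature"], node["threshold"]
    lo, hi = bounds.get(f, (-math.inf, math.inf))
    left, right = dict(bounds), dict(bounds)
    left[f] = (lo, min(hi, thr))      # path through feature <= thr
    right[f] = (max(lo, thr), hi)     # path through feature > thr
    return tree_to_rules(node["left"], left) + tree_to_rules(node["right"], right)

def match(rules, packet):
    """Return the label of the first rule whose ranges all contain the packet's values."""
    for ranges, label in rules:
        if all(lo < packet[f] <= hi for f, (lo, hi) in ranges.items()):
            return label
    return None

# Hypothetical depth-2 tree over two Table 4 fields.
tree = {"feature": "IN_BYTES", "threshold": 500,
        "left": {"label": "benign"},
        "right": {"feature": "DST_PORT", "threshold": 1024,
                  "left": {"label": "suspicious"},
                  "right": {"label": "benign"}}}
rules = tree_to_rules(tree)
for ranges, label in rules:
    print(ranges, "->", label)
print(match(rules, {"IN_BYTES": 800, "DST_PORT": 80}))
```

On a switch, each interval would become a range or ternary match on the corresponding header or metadata field. Because the leaves of one tree partition the feature space, exactly one rule per tree matches any packet.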

Table 3. NF-UNSW-NB15 dataset.

Category Name	Meaning	Quantity
Normal	Normal traffic	46,521
Analysis	Attack traffic that intrudes through port scanning, emails, and web script files.	200
Backdoor	Attack traffic that bypasses security mechanisms such as identity authentication to access data illegally.	179
DoS	Attack traffic that consumes large amounts of memory resources so that the network cannot provide normal services.	505
Exploits	Attack traffic that exploits security vulnerabilities in operating systems and other software.	2474
Fuzzers	Attack traffic that feeds random data to a target program to make it overflow.	1948
Generic	Attack traffic that uses hash functions to create collisions against block ciphers.	557
Reconnaissance	Attack traffic that gathers network information to help bypass security mechanisms.	1230
Shellcode	Attack traffic that injects code blocks so that attackers can exploit vulnerabilities and execute arbitrary instructions.	137
Worms	Attack traffic that replicates itself and spreads to other target hosts.	30
Generic	Attack traffic that uses hash functions to create collisions against block ciphers.	550

Table 4. Details of flows within NF-UNSW-NB15.

Name	Description
SRC_ADDR	The source IP address for the data flow.
SRC_PORT	The source port.
DST_ADDR	The destination IP address for the data flow.
DST_PORT	The destination port.
PROTOCOL	The transport-layer protocol.
PROTO	The application-layer protocol.
IN_BYTES	The total number of incoming bytes from the source.
OUT_BYTES	The total number of outgoing bytes to the destination.
IN_PKTS	The total number of incoming packets from the source.
OUT_PKTS	The total number of outgoing packets to the destination.
TCP_FLAGS	TCP control flags set on the connection to track flow state.
MILLISECONDS	The total duration of the flow, measured in milliseconds.
Label	Binary label indicating whether the flow is benign or malicious.
Attack	The traffic category of the flow (Normal or a specific attack type, as listed in Table 3).
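Maltof's traffic trajectory map (Section 3.4.2) is not restated here. As a hedged illustration of how the Table 4 fields can feed an edge-based GNN, the sketch below uses a construction similar to that of E-GraphSAGE [19]: (address, port) endpoints become nodes, and each flow becomes a directed edge carrying its numeric features and its Attack category. The field selection and toy records are assumptions.

```python
from collections import defaultdict

def build_flow_graph(flows, feature_keys):
    """Build an edge-featured graph from NetFlow records.

    Nodes are (address, port) endpoints; each flow becomes a directed edge that
    carries its numeric features and its traffic category.
    """
    node_ids = {}
    edges = []                       # (src_node, dst_node, features, category)
    incoming = defaultdict(list)     # dst_node -> indices of incoming edges

    def node_id(addr, port):
        # Assign consecutive integer ids to endpoints on first sight.
        return node_ids.setdefault((addr, port), len(node_ids))

    for flow in flows:
        src = node_id(flow["SRC_ADDR"], flow["SRC_PORT"])
        dst = node_id(flow["DST_ADDR"], flow["DST_PORT"])
        incoming[dst].append(len(edges))
        edges.append((src, dst, [float(flow[k]) for k in feature_keys], flow["Attack"]))
    return node_ids, edges, incoming

flows = [  # toy records using the Table 4 field names
    {"SRC_ADDR": "10.0.0.1", "SRC_PORT": 40000, "DST_ADDR": "10.0.0.9", "DST_PORT": 80,
     "IN_BYTES": 420, "OUT_BYTES": 1200, "Attack": "Normal"},
    {"SRC_ADDR": "10.0.0.1", "SRC_PORT": 40000, "DST_ADDR": "10.0.0.9", "DST_PORT": 80,
     "IN_BYTES": 9000, "OUT_BYTES": 60, "Attack": "Exploits"},
    {"SRC_ADDR": "10.0.0.5", "SRC_PORT": 51515, "DST_ADDR": "10.0.0.9", "DST_PORT": 80,
     "IN_BYTES": 300, "OUT_BYTES": 900, "Attack": "Normal"},
]
nodes, edges, incoming = build_flow_graph(flows, ["IN_BYTES", "OUT_BYTES"])
print(len(nodes), "nodes,", len(edges), "edges")
```

The incoming-edge index is what an edge-based GNN layer would iterate over when aggregating flow features at each endpoint.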

Table 5. Summary of NF-UNSW-NB15 usage in recent intrusion detection studies.

Ref	Method	Purpose	Task Type	Result
Alam et al. [28]	CNN	Generalization test for IoT traffic detection	Binary-class	High accuracy and F1 (main dataset: F1 = 0.9952); outperforms CNN+LSTM, DNN, etc.
Bhuiyan et al. [29]	DNN	Intrusion type classification and cross-dataset robustness evaluation	Binary-class	Outperforms baseline models on NF datasets; accuracy up to 0.99 on test set
Xie et al. [30]	Edge Attention	Multi-class attack detection under imbalanced data	Multi-class	Accuracy = 97.87% on NF-UNSW-NB15
Mohy-Eddine et al. [31]	RF	Industrial IoT intrusion detection with feature selection	Binary-class	Accuracy = 99.30% on NF-UNSW-NB15-v2; low inference time
Yang et al. [32]	Graph Attention	Link anomaly detection using structural and edge features	Binary-class and multi-class	Outperforms baselines on multiple datasets including NF-UNSW-NB15
Wu et al. [33]	GNN	Multi-type attack detection in zero-trust networks	Binary-class and multi-class	Balanced accuracy = 91.48%, FPR = 3.34% on NF-UNSW-NB15-v2

Table 6. Comparison of multi-classification performance with other algorithms.

Method	Accuracy	Precision	Recall	F1
CNN	0.9751	0.9743	0.9747	0.9748
LSTM	0.9744	0.9681	0.9743	0.9712
E-GraphSAGE	0.9764	0.9727	0.9764	0.9746
EGAT	0.9785	0.9779	0.9785	0.9786

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Gao, X.; Tan, L.; Chen, S.; Zhang, P.; Wang, J. Malicious Traffic Detection on Tofino Using Graph Attention Model. Appl. Sci. 2025, 15, 7179. https://doi.org/10.3390/app15137179

AMA Style

Gao X, Tan L, Chen S, Zhang P, Wang J. Malicious Traffic Detection on Tofino Using Graph Attention Model. Applied Sciences. 2025; 15(13):7179. https://doi.org/10.3390/app15137179

Chicago/Turabian Style

Gao, Xichang, Lizhuang Tan, Shengpeng Chen, Peiying Zhang, and Jian Wang. 2025. "Malicious Traffic Detection on Tofino Using Graph Attention Model" Applied Sciences 15, no. 13: 7179. https://doi.org/10.3390/app15137179

APA Style

Gao, X., Tan, L., Chen, S., Zhang, P., & Wang, J. (2025). Malicious Traffic Detection on Tofino Using Graph Attention Model. Applied Sciences, 15(13), 7179. https://doi.org/10.3390/app15137179

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
