Article

Anomaly Detection in Network Traffic via Cross-Domain Federated Graph Representation Learning

College of Computer Science and Technology, Qingdao University, Qingdao 266071, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(11), 6258; https://doi.org/10.3390/app15116258
Submission received: 3 April 2025 / Revised: 27 May 2025 / Accepted: 30 May 2025 / Published: 2 June 2025

Abstract

With the growing complexity and frequency of network threats, anomaly detection in network traffic has become a vital task for ensuring cybersecurity. Traditional detection approaches typically rely on statistical features while overlooking the interaction patterns and structural dependencies among traffic flows. In addition, network traffic data are distributed across heterogeneous devices and domains, where centralized training methods face significant challenges such as data leakage and data silos. To address these issues, we propose a network traffic anomaly detection method based on cross-domain federated graph representation learning. In this method, network traffic is modeled as a graph, and a feature-structure decoupling design is adopted to separate the encoding and learning of graph topology and node attributes. Only structural information with minimal sensitive content is transmitted to the central server, whereas sensitive node attributes are preserved and processed locally to enhance privacy protection. Furthermore, a cross-gated feature fusion mechanism is introduced to enhance the expressive interaction between features and to generate graph-level embeddings for anomaly classification. To further improve the model’s generalization across domains, a cross-domain structural guidance mechanism is implemented on the server side, which integrates structural information from multiple domains to guide the training of local models. Comparative experiments with other methods demonstrate that the proposed approach achieves superior performance in distributed network traffic anomaly detection scenarios.

1. Introduction

In the era of digitization and intelligence, network traffic anomaly detection has become key to measuring network health. A network traffic anomaly refers to the phenomenon where the behavior of network traffic deviates significantly from its normal state, disrupting normal network operation and leading not only to performance degradation but also potentially to data privacy leakage and other security issues [1]. For example, malware propagation, abnormal port scans, and other network attacks can generate abnormal network traffic, seriously threatening data security and network stability. Therefore, network traffic anomaly detection is one of the important means of ensuring network security.
Network traffic anomaly detection is a crucial technique that identifies irregularities by analyzing network communication data. Among its various approaches, static data analysis stands out as a significant research direction. This method focuses on offline analysis of pre-collected and stored network traffic data, aiming to uncover hidden patterns and features within historical traffic to detect abnormal behavior. Traditional static detection approaches often rely on manual feature extraction and rule-based strategies, such as thresholding, clustering-based anomaly identification, and classical machine learning algorithms like support vector machines (SVMs) and decision trees [2]. Although these approaches have shown a certain level of effectiveness, they exhibit significant limitations when confronting modern advanced attacks. Such attacks often employ complex and multi-stage strategies, such as advanced persistent threats (APTs) and distributed denial of service (DDoS) attacks, which are characterized by distinct structural features [3]. For instance, DDoS attacks often involve a large number of distributed nodes simultaneously communicating with a single target node, resulting in a high-density, star-shaped subgraph centered on the target. In contrast, APT attacks are highly progressive and feature complex attack paths and lateral movements involving multiple hops and communication routes, making them difficult to detect through isolated traffic features alone. Traditional machine learning methods, lacking the ability to model structural relationships among traffic flows effectively, struggle to accurately capture the coordinated patterns underlying such attacks.
Recent studies have demonstrated that abstracting network traffic into a graph structure is an effective modeling approach, where nodes represent entities such as IP addresses or devices and edges correspond to communication events between these entities [4]. This graph-based abstraction enables the explicit capture of topological dependencies among traffic flows and entities, thereby providing graph-based methods with a natural advantage in anomaly detection tasks. Among them, graph neural networks (GNNs) have emerged as powerful tools for analyzing complex interactions and latent structural patterns within graphs due to their superior structural modeling capabilities. However, most GNN-based models are developed under centralized settings, where access to all data is assumed. This assumption raises significant privacy concerns and limits the practicality of such models in real-world applications.
In addition, network traffic data are naturally distributed among different network environments, devices, or organizations. Because of this distributed nature, centralized data collection not only incurs high communication overhead and delay but is also more likely to lead to privacy leakage of sensitive data. Federated learning (FL) [5] provides an effective solution to the above problems by enabling cross-device collaborative model training without sharing original data. However, traditional federated learning frameworks usually rely on a single source of network traffic data local to each client. These data often have obvious regional and local behavior characteristics and can hardly cover complex and diverse attack patterns and communication structures. Therefore, when the model faces non-independent and identically distributed (non-IID) data, the features it learns have strong domain dependence and are difficult to adapt to the differences in traffic characteristics in other network environments, resulting in significantly limited overall generalization capabilities.
To address the above challenges, this study proposes a cross-domain federated graph representation learning framework for network traffic anomaly detection. The goal is to address the limitations of traditional approaches in capturing structural dependencies and protecting sensitive information while enabling effective and privacy-preserving anomaly detection in heterogeneous, cross-domain environments. The proposed method achieves technical advancements through the development of a tripartite collaborative mechanism. First, a dual-channel GNN architecture is employed to process structural knowledge (federated-shared) and attribute knowledge (locally private) in parallel, sharing only low-sensitivity structural information to ensure privacy protection. Second, a cross-gating fusion module is designed to dynamically integrate features, thereby enhancing the representation capacity and improving classification accuracy. Finally, a cross-domain structural guidance mechanism is introduced, wherein a global guidance model is trained on the server side by aggregating knowledge from public domains. This model provides global supervision for local client learning, enabling deep integration of structural information and effective knowledge transfer across domains.

2. Background

2.1. Network Traffic Anomaly Detection

With the continuous expansion of the network scale and the increasing diversification of attack techniques, ensuring the stability and security of network systems has become a core task in the protection of information infrastructure. As a critical component of the cybersecurity defense system, network traffic anomaly detection aims to identify abnormal traffic that deviates from expected communication patterns by analyzing network behavior, thereby uncovering potential malicious activities such as DDoS attacks, port scanning, data breaches, and APTs. In recent years, the rise of data-driven approaches has led to the widespread adoption of machine learning and deep learning techniques in the field of network security, significantly improving the accuracy of anomaly detection. However, the inherent characteristics of network traffic—such as high dimensionality, an unstructured form, and dynamic variability—pose substantial challenges for effective modeling and inference. As a result, developing efficient, accurate, and generalizable anomaly detection mechanisms remains a pressing research focus in contemporary cybersecurity.

2.2. Graph Neural Networks

Graphs are a type of non-Euclidean data characterized by complex topological structures and a richer capacity for information representation. In real-world applications, many forms of data inherently exhibit graph structures, such as social networks, knowledge graphs, and protein–protein interaction networks [6].
GNNs are a class of neural networks specifically designed to process graph-structured data. The fundamental building block of a GNN is the graph convolution layer, which iteratively updates node representations by aggregating features from target nodes and their neighbors. A general update formula is given below:
$$H^{(l+1)} = \sigma\left(\hat{A} H^{(l)} W^{(l)}\right)$$
where $H^{(l)}$ denotes the node feature matrix at layer $l$ and $H^{(0)} = X$ represents the input node features. $\hat{A}$ is the normalized adjacency matrix, typically computed as $\hat{A} = D^{-1/2} A D^{-1/2}$, where $A$ is the adjacency matrix of the original graph and $D$ is the degree matrix. $W^{(l)}$ is the trainable weight matrix at layer $l$, and $\sigma$ is a nonlinear activation function. Stacking multiple graph convolutional layers enables the model to capture deeper structural dependencies. The resulting node representations can be used for downstream tasks such as node classification, graph classification, and link prediction.
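As a concrete illustration, a minimal NumPy sketch of this layer is given below; the toy path graph, the added self-loops, and the ReLU nonlinearity are illustrative choices rather than requirements of the formula above.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution layer: H' = sigma(A_hat @ H @ W), using the symmetrically
    normalized adjacency A_hat = D^{-1/2} (A + I) D^{-1/2} (common self-loop variant)."""
    A_tilde = A + np.eye(A.shape[0])            # add self-loops so each node keeps its own features
    deg = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(deg ** -0.5)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # normalized adjacency
    return np.maximum(A_hat @ H @ W, 0.0)       # ReLU as the nonlinearity sigma

# Toy example: a 4-node path graph with 3-dimensional input features
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H0 = np.random.randn(4, 3)                      # H^(0) = X
W0 = np.random.randn(3, 8)                      # W^(0)
H1 = gcn_layer(A, H0, W0)                       # node embeddings after one layer, shape (4, 8)
```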
The core objective of GNNs is graph representation learning, which aims to map nodes, edges, or subgraphs into low-dimensional vectors by aggregating local neighborhood information. This process ensures that the resulting vector space reflects both the topological structure of the graph and the attribute information of nodes [7]. In the context of computer networks, interactions between hosts can be naturally modeled as graphs, where IP addresses represent nodes and communication flows represent edges. GNNs are well suited to exploiting network traffic data encoded in this format, making their application to traffic anomaly detection a promising research direction.

2.3. Federated Learning

FL is a distributed machine learning framework designed to enable collaborative model training across multiple parties without requiring centralized access to raw data [8]. This approach primarily addresses the challenges faced by centralized training methods, including concerns related to data privacy, data silos, and high communication costs [9]. Within the FL framework, each participating client performs model training locally and only transmits intermediate results, such as model parameters or gradients, to a central server. The server then aggregates these updates to refine the global model.
This decentralized learning paradigm effectively preserves the privacy of network traffic data, as raw data remain on local devices and are never shared externally, thereby mitigating the risk of data leakage. Moreover, FL requires the transmission of only lightweight model updates (e.g., gradient or weight changes), which substantially reduces the communication overhead and network burden associated with data sharing.
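To make this workflow concrete, the following is a minimal FedAvg-style sketch in PyTorch; the helper names `local_update` and `fedavg_aggregate`, the use of Adam, and the sample-weighted averaging are generic illustrations of the paradigm, not the specific protocol proposed later in this paper.

```python
import copy
import torch
import torch.nn as nn

def local_update(model, loader, epochs=1, lr=1e-4):
    """Client-side step: train a copy of the global model on local data only
    and return its parameters (the raw data never leave the client)."""
    model = copy.deepcopy(model)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
    return model.state_dict()

def fedavg_aggregate(client_states, client_sizes):
    """Server-side step: sample-weighted average of the uploaded parameters."""
    total = sum(client_sizes)
    global_state = copy.deepcopy(client_states[0])
    for key in global_state:
        global_state[key] = sum(
            state[key] * (n / total) for state, n in zip(client_states, client_sizes)
        )
    return global_state
```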

3. Related Works

3.1. Network Traffic Anomaly Detection Based on Machine Learning

In recent years, machine learning methods have been widely applied in the field of network traffic detection, aiming to distinguish between malicious and benign traffic by extracting informative features from network flow data. Nguyen and Costa [10] explored the effectiveness of both supervised learning methods (e.g., SVMs and decision trees) and unsupervised learning methods (e.g., clustering) in detecting network anomalies. Ma et al. [11] proposed a hybrid model, SVM-C, which integrates SVMs and clustering analysis to enhance detection performance. Al-Saleh [12] introduced an intrusion detection system that combines SVMs with decision trees, aiming to improve accuracy while reducing the processing time. Vibhute et al. [13] trained and systematically evaluated three different machine learning classifiers—SVMs, logistic regression, and k-nearest neighbors—for their effectiveness in anomaly detection tasks. Machine learning methods, with their ability to model statistical features, have demonstrated broad applicability in network traffic classification without relying on specific payload content. However, these approaches still exhibit limitations when confronted with increasingly sophisticated modern attacks. Such attacks often adopt multi-stage, stealthy, and coordinated infiltration strategies, involving the collaboration of multiple nodes to progressively penetrate the target network. These behaviors exhibit strong structural dependencies and gradual progression. Traditional machine learning models, which typically rely on isolated flow features, struggle to accurately detect such complex attack patterns that involve intricate structural relationships.
Recent studies have also explored feature fusion mechanisms in the context of anomaly detection. Lin et al. [14] proposed MFFusion, a multi-level feature fusion model that improves malicious traffic detection by hierarchically combining deep features. Similarly, Guarino et al. [15] introduced a two-stage fusion framework specifically designed for cyber-physical anomaly detection, demonstrating the advantages of integrating heterogeneous data representations. Inspired by these works, our method employs a cross-gating mechanism that dynamically integrates structural and attribute features, enabling the generation of more expressive graph-level representations and enhancing detection accuracy.

3.2. Network Traffic Anomaly Detection Based on Graph Neural Networks

Network traffic can essentially be abstracted as a graph structure composed of multiple nodes and edges. In this representation, nodes typically denote concrete network entities such as IP addresses, hosts, or routers, while edges characterize the communication relationships between these entities, thereby reflecting the structural dependencies among traffic flows. Such edges are generally constructed based on observed network behaviors—for example, connections between source and destination addresses or interaction paths among different flows—thus forming explicit communication links. This connectivity reveals the underlying topological structure of network communication and captures various latent organizational patterns. Given these inherent graph-structured properties of network traffic, GNNs serve as a natural and effective modeling tool for traffic anomaly detection tasks.
Training GNN models has proven effective in distinguishing between normal and anomalous network traffic. In recent years, an increasing number of studies have leveraged graph structures to model the relationships among network flows, aiming to enhance the detection of complex dependency patterns. Most of these approaches apply GNNs to node-level or edge-level anomaly detection tasks. For example, Zheng et al. [16] realized network flow detection by representing network flows as nodes in a graph via a GCN-TC method and utilizing graph convolutional networks (GCN) for node classification; Chang et al. [17] constructed a bipartite graph and transformed it into a line graph, converting the original edge classification task into a node classification task; Guo et al. [18] leveraged graph attention networks (GAT) to learn the fusion features of topological graphs and finally classify them through fully connected layers and softmax functions to accomplish node-level detection of network traffic; Lo et al. [19] and Caville et al. [20] focused on edge-level classification, integrating edge features through variants of E-GraphSAGE; Altaf et al. [21] augmented the GCN model to learn the host and network features in the graph structure and accomplished edge-level traffic detection through a two-layer GCN; and Marfo et al. [22] proposed a hybrid model combining GraphSAGE and GAT architectures, utilizing dynamic feature extraction mechanisms to build an efficient network state representation system, but the model still generated node-level or edge-level predictions.
These approaches have achieved a certain degree of success. However, they primarily focus on node-level and edge-level tasks, emphasizing local information within the network, such as anomalous behavior of individual nodes or abnormal interaction patterns between node pairs. In real-world complex network environments, attackers often adopt coordinated multi-point strategies or disguise their actions through multi-path behaviors. Such attacks are no longer limited to anomalies at a single node or edge; instead, they manifest as more macroscopic structural patterns with clear graph-level characteristics. Consequently, detection strategies that rely solely on local information at the node or edge level are insufficient for effectively identifying these distributed and structurally organized anomalous behaviors.
To address the aforementioned challenges, it is essential to explore anomaly detection methods based on graph classification. By leveraging a GNN to uncover latent structural information across the entire network, capture global relationships, and extract multi-level features, these approaches can effectively learn complex structural patterns and potential high-order dependencies embedded within the graph. This enables comprehensive graph-level representation learning and offers novel perspectives and technical means for network traffic anomaly detection.

3.3. Network Traffic Anomaly Detection Based on Federated Learning

The widespread deployment of Internet of Things (IoT) and edge computing devices has significantly increased the complexity and decentralization of network environments, leading to the dispersion of data across multiple locations rather than its concentration on a single central server. The rapid proliferation of smart devices generates vast amounts of network traffic data, generally containing sensitive information that is distributed across diverse geographic regions and administrative domains. Traditional centralized data processing methods require aggregating all network traffic data onto a central server for analysis. However, the central server is constrained by limited computational power, storage capacity, and network bandwidth. As the system scales, the workload increases, or malicious attacks occur, the central server may become incapable of meeting system demands, resulting in performance degradation, prolonged response times, or even complete system failure. Moreover, adversaries can potentially infer sensitive private data from the global model aggregated by the central server. Consequently, any security breach targeting the central server poses a severe risk of privacy leakage.
With the growing demand for data privacy protection and the rapid proliferation of distributed devices, federated learning has emerged as a prominent research direction in the field of network anomaly detection. Against this backdrop, an increasing number of studies have explored the application of federated learning to network traffic anomaly detection tasks, aiming to enable distributed and privacy-preserving threat awareness and analysis. Marfo et al. [23] proposed an FL framework based on deep neural networks for network anomaly detection on resource-constrained devices. However, their approach assumes that all clients share similar data distributions and does not adequately address the variability of traffic features in non-IID settings. Karunamurthy et al. [24] applied FL to IoT environments by training deep learning classifiers to detect various types of attacks, yet their method similarly relies on the assumption of consistent data sources across clients. Jianping et al. [25] constructed an attention-based graph neural network within the FL framework to detect cross-level network attacks. Although their approach achieved promising performance, it still remains constrained by the modeling limitations of single-domain data distributions. Dong et al. [26] introduced FADngs, a method that enhances privacy protection by sharing density functions instead of raw data. Nevertheless, the method does not explicitly address the generalization challenges caused by heterogeneous client data distributions. Although these methods perform well in preserving data privacy and enabling distributed detection, they usually assume that client data comes from a single domain or has a similar distribution. This assumption limits the generalization ability of the global model, particularly in the non-IID setting, where the traffic patterns of different clients vary significantly. Recent research [27] has pointed out that different graph domains often share common structural properties, which opens up new opportunities for cross-domain knowledge transfer. However, existing FL approaches have rarely explored this potential. Therefore, our work aims to address this limitation by incorporating cross-domain structural guidance into the federated training process, thereby achieving more robust representation learning in the non-IID setting.

4. Methods

4.1. AD-FG Framework

The overall framework of AD-FG is illustrated in Figure 1. During the local learning phase, each client first converts its local network traffic data into graph-structured representations using a graph construction module. Within the federated collaborative learning module, a dual-channel GNN equipped with decoupled structural and attribute encoders is locally trained to enable parallel learning of structural and attribute knowledge. A cross-gated feature fusion mechanism is employed to integrate node-level information into a global graph-level representation, which is then used for classifying the network traffic graphs. Notably, only the parameters of the structural encoder are uploaded to the server, while node attribute information remains strictly local and is never shared. In the cross-domain knowledge learning phase, the server aggregates the uploaded structural encoder parameters and leverages knowledge from public domains to learn generalized structural features, thereby forming an enhanced global structural model. This updated model is redistributed to the clients in the next round of communication.

4.2. Graph Structure Construction Module

Figure 2 illustrates the process of constructing a graph from network traffic data. Firstly, key information is extracted from each traffic flow in the dataset. This information includes the source IP address, destination IP address, source port, destination port, protocol type, flow duration, total number of packets, and label. The source IP address and destination IP address, which serve as the basis for constructing the graph data, are considered nodes, while information such as the flow duration and total number of packets is considered edge features. For the purpose of a graph classification task using a GNN, node features are generated by aggregating the features of the incoming and outgoing edges associated with that node. In addition, the structural information of the graph is obtained by two encoding methods: degree-based encoding and random walk-based encoding.
In most traditional graph neural network frameworks, node representations are obtained through a feature aggregation process that integrates structural information and node features into each node’s representation. However, in distributed learning scenarios, node features may contain sensitive information (such as traffic patterns or access logs of individual devices). In this paper, the structural knowledge of the graph is independently represented and learned in the form of vectors. This separation of structural knowledge learning from node features effectively prevents sensitive information from being exposed to the global model, thereby enhancing privacy protection.
In AD-FG, structural embeddings containing both local and global structural patterns are constructed, and this combination helps the model understand the structure and function of the graph more comprehensively. For local structural knowledge, degree-based structure embedding (DE) is used, which creates a one-hot encoding based on the degree of a node. For a node $v$ in the graph, its DE is denoted as follows:
$$s_v^{\mathrm{DE}} = [\, e_1, e_2, \ldots, e_{k-1}, e_k \,]$$
where e i is defined by
$$e_i = \begin{cases} 1, & \text{if } i = d_v \text{ and } d_v < k \\ 1, & \text{if } d_v \ge k \text{ and } i = k \\ 0, & \text{otherwise} \end{cases}$$
Here, $d_v$ is the degree of node $v$, representing the number of edges directly connected to it, which reflects the centrality of the node within its local subgraph, while $k$ is the dimension of DE, which is used to generate a fixed-length structural embedding vector. Specifically, when the degree $d_v$ is less than the predefined dimension $k$, the $d_v$th position in the vector is set to one, and all other positions are set to zero. When $d_v \ge k$, the last position (the $k$th position) is set to one, indicating that degrees exceeding the upper limit are grouped into a single category. DE is used because the degree is one of the most basic and intuitive attributes of a node in a graph for expressing its local connectivity. By directly converting the degree of a node into an embedding vector, the local structural characteristics of the node can be captured succinctly and efficiently, and the one-hot degree-based embedding is computationally lightweight, avoiding the high cost of more elaborate embedding initialization.
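A minimal sketch of this degree-based encoding is given below, assuming the graph is provided as a dense NumPy adjacency matrix and every node has degree at least one; the function name and the default k = 32 are illustrative.

```python
import numpy as np

def degree_encoding(A, k=32):
    """One-hot degree embedding s_v^DE: the d_v-th position (1-indexed) is set to 1,
    and all degrees >= k share the last (k-th) position.
    Assumes a dense adjacency matrix and that every node has degree >= 1."""
    deg = A.sum(axis=1).astype(int)     # node degrees d_v
    idx = np.minimum(deg, k) - 1        # cap at the k-th bucket, shift to 0-based indexing
    enc = np.zeros((A.shape[0], k))
    enc[np.arange(A.shape[0]), idx] = 1.0
    return enc
```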
Meanwhile, in order to obtain global structural knowledge, a random walk-based embedding (RWE) is introduced, which is computed based on the random walk diffusion process [28,29]. Specifically, RWE is denoted as follows:
$$s_v^{\mathrm{RWE}} = [\, p_v^1, p_v^2, \ldots, p_v^k \,]$$
where $p_v^k = (T^k)_{vv}$ denotes the probability that node $v$ returns to itself after $k$ steps of the random walk. This probability is calculated using the random walk transition matrix $T = AD^{-1}$, where $A$ is the adjacency matrix of the graph and $D$ is the diagonal degree matrix. The diagonal element $(T^k)_{vv}$ of the matrix $T^k$ represents the probability that node $v$ returns to itself at step $k$. Unlike embeddings that focus only on a node's direct neighborhood, RWE captures the global connectivity of a node by considering its behavior over multi-step random walks, naturally taking into account the graph's path dependencies and the strength of interactions between nodes. As a result, RWE can capture the global structural knowledge of the graph effectively. Finally, the structural embedding is obtained by concatenating DE and RWE as follows:
$$s_v = [\, s_v^{\mathrm{DE}} \,\|\, s_v^{\mathrm{RWE}} \,]$$
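The random walk encoding and the final concatenation can be sketched in the same style, reusing `degree_encoding` from the previous snippet; the dense matrix-power computation is a simple illustration and would typically be replaced by sparse operations on large graphs.

```python
import numpy as np

def random_walk_encoding(A, k=32):
    """Return-probability embedding s_v^RWE = [p_v^1, ..., p_v^k], where
    p_v^t = (T^t)_vv and T = A D^{-1} is the random walk transition matrix."""
    deg = A.sum(axis=1)
    T = A @ np.diag(1.0 / deg)          # transition matrix T = A D^{-1} (degree >= 1 assumed)
    enc = np.zeros((A.shape[0], k))
    T_power = np.eye(A.shape[0])
    for step in range(k):
        T_power = T_power @ T           # T^(step + 1)
        enc[:, step] = np.diag(T_power) # return probabilities (T^t)_vv
    return enc

def structural_embedding(A, k_de=32, k_rwe=32):
    """Concatenate DE and RWE into the structural embedding s_v for every node."""
    return np.concatenate([degree_encoding(A, k_de),
                           random_walk_encoding(A, k_rwe)], axis=1)
```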

4.3. Federated Collaborative Learning Module

As shown in Figure 3, the federated collaborative learning module mainly consists of two parts: a feature-structure decoupled graph neural network and a cross-gated feature fusion mechanism.

4.3.1. Feature-Structure Decoupled Graph Neural Network

In existing graph neural networks, structural information is typically encoded implicitly through aggregation with node features. However, in the context of federated graph learning, directly uploading node features may pose privacy risks for users, especially in scenarios where nodes contain highly sensitive information, such as user behavior patterns, traffic content, or access logs. To explicitly mitigate these privacy risks, our approach separates the encoding and learning processes of the graph structure and node attributes. This separation significantly enhances privacy protection, as it ensures that only structural information is transmitted to the central server, and such information inherently contains minimal sensitive details. The sensitive node attribute information remains securely stored and processed locally on each client, thereby preventing potential privacy leakage caused by centralized data aggregation or exposure. Moreover, in large-scale graph data scenarios, jointly processing both node features and topological structures can be computationally intensive. By decoupling these two components, the system can independently optimize storage and computation strategies. This separation not only enhances privacy protection but also significantly reduces the volume of data exchanged during each communication round. Such reduction is particularly beneficial in low-power IoT environments [30], where devices are typically constrained by limited energy and bandwidth resources.
Therefore, inspired by Dwivedi et al. [28], a feature-structure decoupled graph neural network is proposed. As shown in Figure 3, it learns feature knowledge and structure knowledge through two parallel channels. In the structure-based channel, the structure encoder focuses on learning the structural information of the graph. This encoder operates on initial structural embeddings derived from degree-based and random walk-based encoding. It utilizes the message-passing mechanism of graph neural network layers to extract and propagate structural information, iteratively aggregating the structural properties of each node and its neighbors across multiple layers. By being independent of feature data, this approach enables the model to more accurately capture and leverage the topological properties of the graph. This structurally focused independent channel ensures that the model effectively learns complex network relationships purely from graph connectivity. In the feature-based channel, the feature encoder is responsible for extracting and learning useful information from the raw feature data of the nodes. This process involves aggregating feature information from the nodes themselves and their neighbors at each layer of the network to form an information-rich hidden layer embedding. Formally, for an $l$-layer feature encoder $f_h^{(l)}$, its $l$th-layer hidden feature embedding $h^{(l)}$ can be expressed as follows:
$$h^{(l)} = f_h^{(l)}\left(\left[\, h^{(l-1)} \,\|\, s^{(l-1)} \,\right]\right)$$
where $h^{(l-1)}$ is the feature embedding at layer $l-1$, $s^{(l-1)}$ is the structural embedding at layer $l-1$, and $\|$ represents the concatenation operation, which combines the feature and structural embeddings into a new input vector. In this way, the feature encoder can further utilize the structural knowledge learned by the structure encoder to generate feature embeddings that provide supplementary information for representation learning.
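A hedged PyTorch sketch of the two parallel channels is given below. It uses a simple mean-aggregation layer as a stand-in for the actual encoders (the experiments in Section 5 use a GCN structure encoder and a GIN feature encoder), so the class names and dimensions are illustrative; what it is meant to show is that the structure channel never consumes node features, while the feature channel consumes the concatenation of the previous feature and structural embeddings.

```python
import torch
import torch.nn as nn

class MeanAggLayer(nn.Module):
    """Illustrative message-passing layer: aggregate neighbor states with a
    normalized adjacency matrix, then apply a linear transform and ReLU."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, A_hat, X):
        return torch.relu(self.lin(A_hat @ X))

class DecoupledEncoder(nn.Module):
    """Dual-channel encoder: the structure channel sees only the structural
    embeddings s, while the feature channel consumes [h || s] at every layer."""
    def __init__(self, feat_dim, struct_dim, hidden=64, layers=3):
        super().__init__()
        self.struct_layers = nn.ModuleList(
            [MeanAggLayer(struct_dim if i == 0 else hidden, hidden) for i in range(layers)])
        self.feat_layers = nn.ModuleList(
            [MeanAggLayer((feat_dim if i == 0 else hidden) + (struct_dim if i == 0 else hidden),
                          hidden) for i in range(layers)])

    def forward(self, A_hat, X, S):
        h, s = X, S
        for f_layer, s_layer in zip(self.feat_layers, self.struct_layers):
            h = f_layer(A_hat, torch.cat([h, s], dim=1))  # feature channel: uses h^(l-1) || s^(l-1)
            s = s_layer(A_hat, s)                          # structure channel: never sees node features
        return h, s   # per-node feature embedding h_v (kept private) and structure embedding g_v
```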

4.3.2. Cross-Gated Feature Fusion Mechanism

The decoupled graph neural network extracts feature knowledge and structural knowledge through two separate information flow channels, producing the node-level embeddings $h_v$ and $g_v$; a comprehensive representation of the network flow graph requires establishing a reasonable relationship between them. To this end, a fusion mechanism called the cross-gated feature fusion mechanism is specifically designed, as shown in Figure 3, to fuse $h_v$ and $g_v$ into the final encoded feature vector of the network flow graph.
Two filters are used, each consisting of two linear layers with a ReLU activation function between them. First, the two filters, which do not share parameters, are applied to $h_v$ and $g_v$, and then an element-wise sigmoid function is used to scale each element to the range [0, 1]. The scaled vectors are regarded as gated vectors, and the corresponding $h_v$ and $g_v$ are cross-filtered using them. This mechanism can effectively combine the features (packet sizes, flow rates, etc.) and structural information (e.g., connectivity patterns among the nodes) of a network flow. This fusion not only improves the distinguishing power of the features but also enables the model to capture potential threats and anomalous behaviors from more dimensions. The cross-gated feature fusion mechanism allows each type of information to weight and filter the other. This means that the model can focus on key information that best characterizes and explains the current network situation, thereby filtering out irrelevant or secondary data. For example, if certain feature information is not critical for a particular type of anomaly detection, then this mechanism can reduce the weight of this information and vice versa. The cross-gated feature fusion mechanism can be represented by
$$s_g = \mathrm{Sigmoid}\left(W_{h2}^{T}\,\mathrm{ReLU}\left(W_{h1}^{T} g_v + b_{h1}\right) + b_{h2}\right)$$
$$s_h = \mathrm{Sigmoid}\left(W_{g2}^{T}\,\mathrm{ReLU}\left(W_{g1}^{T} h_v + b_{g1}\right) + b_{g2}\right)$$
$$h_G = \mathrm{READOUT}\left(\mathrm{CONCAT}\left(s_g \odot h_v,\; s_h \odot g_v\right)\right)$$
Here, $W_{h1}, W_{h2}, W_{g1}, W_{g2} \in \mathbb{R}^{d_g \times d_g}$ are the weights of the linear layers, $b_{h1}, b_{h2}, b_{g1}, b_{g2} \in \mathbb{R}^{d_g}$ are the biases of the linear layers, and the symbol $\odot$ denotes the element-wise product. The gated vectors are finally concatenated and passed through a readout function to obtain the graph-level embedding $h_G$. This graph embedding is then fed into a softmax classifier to perform graph classification. In the last step, the cross-entropy loss $\mathcal{L}_m$ of the graph classification task is optimized. The set of learnable parameters for the $m$th client is denoted by $W_m = \{W_{h,m}, W_{g,m}\}$, where $W_{h,m}$ denotes the parameters of the feature encoder and $W_{g,m}$ denotes the parameters of the structure encoder. They are updated during the local training process as follows:
$$W_{h,m}^{*},\, W_{g,m}^{*} = \arg\min_{W_{h,m},\, W_{g,m}} \mathcal{L}_m\left(W_m; D_m\right)$$
where $\mathcal{L}_m(W_m; D_m)$ is the loss function of client $m$ on the local dataset $D_m$.
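The gating and readout described above can be sketched as follows, assuming mean pooling as the READOUT function and a two-class linear-softmax head; the module name and sizes are illustrative, and the sketch operates on the node embeddings of a single graph.

```python
import torch
import torch.nn as nn

class CrossGatedFusion(nn.Module):
    """Each channel is filtered by a gate computed from the other channel; the gated
    node embeddings are concatenated and pooled into a graph-level vector h_G."""
    def __init__(self, dim, num_classes=2):
        super().__init__()
        # Two non-shared filters, each with two linear layers and a ReLU in between
        self.gate_from_g = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.gate_from_h = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, h_v, g_v):
        s_g = torch.sigmoid(self.gate_from_g(g_v))         # gate for the feature channel
        s_h = torch.sigmoid(self.gate_from_h(h_v))         # gate for the structure channel
        fused = torch.cat([s_g * h_v, s_h * g_v], dim=1)   # element-wise cross-filtering
        h_G = fused.mean(dim=0)                            # mean READOUT over the graph's nodes
        return self.classifier(h_G)                        # logits for the softmax classifier
```

During local training, the returned logits would be passed to a cross-entropy loss over the graph labels, corresponding to the local objective above.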

4.4. Cross-Domain Knowledge Learning Module

In different graph domains, feature information exhibits significant heterogeneity, whereas many fundamental structural properties remain common across domains, making them well suited to shared learning. To capitalize on this property, this paper explores a server-side mechanism in which cross-domain public data provide a global view that helps models better understand and learn generic structural features. Although public-domain graph data differ substantially from network traffic graphs in semantics and application scenarios, they exhibit rich and widely observed structural characteristics, such as community structures, degree distribution heterogeneity, and path length patterns. In this study, we do not treat these public graphs as direct substitutes but rather as sources of structural priors to guide the learning of anomaly detection models.
Specifically, the server holds data from other public domains (e.g., data from social networks, small molecule data in chemistry, or protein structure data in biology) and utilizes these data to train a global model. According to Equation (10), the server utilizes the public domain dataset $D_{\mathrm{pub}}$ to train a global model $\theta_{\mathrm{pub}}$, where $L_{\mathrm{pub}}$ denotes the loss function computed over the public data:
$$\theta_{\mathrm{pub}}^{t+1} = \theta_{\mathrm{pub}}^{t} - \eta\, \nabla L_{\mathrm{pub}}\left(\theta_{\mathrm{pub}}^{t}; D_{\mathrm{pub}}\right)$$
During the aggregation phase, as shown in Equation (11), the server aggregates the structure model parameters uploaded by all clients and combines them with the model parameters $\theta_{\mathrm{pub}}$ trained on the public data to generate a new global model $\theta_{\mathrm{global}}$:
$$\theta_{\mathrm{global}}^{t+1} = \lambda\, \theta_{\mathrm{pub}}^{t+1} + (1 - \lambda) \sum_{k=1}^{K} \frac{n_k}{N}\, \theta_k^{t+1}$$
where $\lambda \in [0, 1]$ is a weighting coefficient that adjusts the contribution of the public data to the global model, $K$ denotes the number of clients, $n_k$ is the number of data samples held by client $k$, and $N = \sum_{k=1}^{K} n_k$ is the total number of samples across all clients. During model aggregation, the server utilizes this cross-domain public data to enhance the learning of generalizable structural features, thereby refining the global model that guides local clients. In this case, the server not only performs the conventional role of aggregating model parameters but also acts as a bridge for cross-domain data transfer, enabling knowledge sharing across different domains. This approach enhances the adaptability of the federated learning framework in complex and diverse data environments.
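A minimal sketch of this aggregation step is shown below, assuming the client structure encoders and the public-domain model are exchanged as PyTorch state dictionaries; the function name and the default value λ = 0.3 (the best setting found in Section 5.2.6) are illustrative.

```python
import copy

def cross_domain_aggregate(theta_pub, client_states, client_sizes, lam=0.3):
    """Blend the public-domain structural model with the sample-weighted average of
    the client structure encoders:
    theta_global = lam * theta_pub + (1 - lam) * sum_k (n_k / N) * theta_k."""
    N = sum(client_sizes)
    theta_global = copy.deepcopy(theta_pub)
    for key in theta_global:
        client_avg = sum(state[key] * (n / N) for state, n in zip(client_states, client_sizes))
        theta_global[key] = lam * theta_pub[key] + (1.0 - lam) * client_avg
    return theta_global
```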

5. Experiments and Results

5.1. Experimental Set-Up

5.1.1. Dataset and Parameter Settings

The CIC-IDS 2017 dataset was generated by the Canadian Institute for Cybersecurity. A total of five days of data were collected from 3 July to 7 July 2017, with traffic captured during a portion of each day. The dataset contains 2.83 million data flows, including 550,000 anomalous attack flows and 2.28 million normal data flows. The traffic data needed to be preprocessed in a uniform manner before they were fed into the model. Detailed statistics for the dataset are shown in Table 1. To better understand the distribution of anomalous and benign traffic within the dataset, we further provide a breakdown of the traffic types and their corresponding flow counts in Table 2.
  • Graph construction: Nodes in the graph are identified with IPs, and directed edges are created based on the source and destination IPs of each flow. Since the flow features provided by the dataset are edge features, the features of the incoming and outgoing edges connected to the nodes are accumulated as node features.
  • Normalization: The node features and the adjacency matrix of the graph are normalized: the node features are normalized according to node degree, and the adjacency matrix is normalized using random walk normalization.
  • Graph segmentation: The size of each graph can be determined according to the actual requirements, such as the attack period to be detected, the amount of network communication, and the size of the network. Given the size of the dataset used in the experiments, a topology graph was constructed from every 500 consecutive flows in chronological order (a rough sketch of this step is given after this list).
  • Graph labeling: If there are anomalous flows in the graph, then the graph is labeled as malicious; otherwise, it is labeled as benign.
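A rough sketch of the windowing and labeling steps above, assuming the flows are loaded into a pandas DataFrame sorted by timestamp; the column names "Source IP", "Destination IP", and "Label" (with benign flows labeled "BENIGN") follow the public CIC-IDS 2017 CSVs but may need adjusting, and edge/node feature extraction is omitted for brevity.

```python
import pandas as pd

def build_graphs(flows: pd.DataFrame, window=500):
    """Split flows (sorted by timestamp) into windows of 500, build one graph per
    window, and label the graph malicious if it contains any anomalous flow."""
    graphs = []
    for start in range(0, len(flows), window):
        chunk = flows.iloc[start:start + window]
        nodes = sorted(set(chunk["Source IP"]) | set(chunk["Destination IP"]))
        node_idx = {ip: i for i, ip in enumerate(nodes)}
        # Directed edges from source IP to destination IP; per-flow features would
        # be attached to these edges and accumulated onto the endpoint nodes.
        edges = [(node_idx[s], node_idx[d])
                 for s, d in zip(chunk["Source IP"], chunk["Destination IP"])]
        label = int((chunk["Label"] != "BENIGN").any())   # malicious if any flow is anomalous
        graphs.append({"edges": edges, "num_nodes": len(nodes), "label": label})
    return graphs
```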
A total of 5660 graphs were constructed in the dataset, and 70% of the graph data were randomly selected as the training set, while 15% were the validation set and 15% were the test set. Ten clients were involved in the training, each with 395 training graphs and 84 validation graphs. The server selected the IMDB-BINARY dataset of the social domain for training.
In this paper, we use a three-layer GCN as the structure encoder and a three-layer graph isomorphism network (GIN) as the feature encoder, both with hidden layers of 64 neurons each. The dimensions of both the DE and RWE were 32. The number of local iterations and batch size were 1 and 64, respectively. An Adam [31] optimizer was used with a weight decay of $5 \times 10^{-4}$ and a learning rate of $1 \times 10^{-4}$. The number of communication rounds for all FL methods was 200. To optimize model performance, we performed a small-scale grid search on the validation set for key hyperparameters while keeping the less sensitive ones fixed. Specifically, we searched over the following ranges:
  • Dimension of degree-based structure embedding $k_1$: {8, 16, 32, 64, 128};
  • Dimension of random walk-based structure embedding $k_2$: {8, 16, 32, 64, 128};
  • Learning rate: {$5 \times 10^{-3}$, $1 \times 10^{-3}$, $5 \times 10^{-4}$, $1 \times 10^{-4}$};
  • Weight decay: {$7 \times 10^{-4}$, $5 \times 10^{-4}$, $3 \times 10^{-4}$, $1 \times 10^{-4}$}.
The final selections ($k_1 = k_2 = 32$; learning rate $= 1 \times 10^{-4}$; weight decay $= 5 \times 10^{-4}$) were based on the best validation performance in terms of accuracy and stability. All methods were implemented using PyTorch 2.0.1, and all experiments were performed on a single computer server with NVIDIA GeForce RTX 4090 (24 GB of memory) GPUs, Intel(R) Xeon(R) W-2133 CPUs @ 3.60 GHz, and 32 GB of RAM.

5.1.2. Baseline

In order to verify the effectiveness of the method proposed in this paper, AD-FG was compared with the following eight baselines; the specific results are shown in Table 3:
  • Local: Each client trained local models based on local data and did not communicate with other clients.
  • FedProx [32]: Based on FedAvg, a regularization term was introduced into the original loss function to alleviate the problem of non-IID data distributions among clients.
  • FedAvg [5]: The clients sent all learnable parameters to the server and received aggregated parameters from the server for the next round of training.
  • FedSage [33]: The extraction of graph information was achieved using the two steps of aggregation and splicing combined with the GraphSAGE sampling algorithm.
  • GCFL [34]: A federated learning approach based on graph clustering, which groups clients whose graphs have similar structures and features and trains graph models among clients within the same cluster using FedAvg.
  • MemAE-EIF [35]: An unsupervised detection method based on deep autoencoders and isolation forests.
  • CTGCN [36]: Modeling network devices and communications as graphs to capture changes in communications over time and detect anomalous traffic patterns by aggregating neighbor information through convolutional operations.
  • DyGCN [37]: A GCN was used to capture the graph structure features of each time period to mine the abnormal substructure in the graph and identify network anomalies.
Table 3. Results of different methods.

Methods | Accuracy | Precision | Recall | F1 Score
Local | 85.21 | 83.68 | 61.26 | 70.71
FedAvg | 85.02 | 82.25 | 61.94 | 70.66
FedProx | 85.14 | 85.38 | 59.10 | 69.85
FedSage | 85.37 | 86.82 | 58.70 | 70.04
GCFL | 84.43 | 79.18 | 63.15 | 70.27
MemAE-EIF | 83.50 | 74.31 | 69.72 | 71.46
CTGCN | — | 87.80 | 50.23 | 63.90
DyGCN | — | 77.77 | 66.48 | 71.68
AD-FG (Ours) | 90.20 | 89.63 | 70.94 | 79.19

5.2. Results

5.2.1. Comparative Performance Results

To comprehensively evaluate the performance of the proposed model in network traffic anomaly detection, we adopted four key metrics: accuracy, precision, recall, and F1 score. A comparative analysis between AD-FG and several existing methods was conducted, as summarized in Table 3. Overall, AD-FG demonstrated superior performance across all metrics. This improvement is attributed to its dual-channel feature-structure decoupling architecture, which aggregates only the graph structure encoder parameters during federated learning. In particular, AD-FG achieved an F1 score of 79.19, significantly outperforming baseline approaches such as FedAvg (70.66), which relies on simple parameter averaging, and FedProx (69.85), which incorporates regularization during optimization. This highlights the advantage of the structure-sharing strategy in enhancing classification performance. Although FedSage and GCFL, as GNN-based federated methods, exhibit certain capabilities in structural modeling, their performance in this experiment remained suboptimal, indicating limited adaptability to cross-domain heterogeneous traffic patterns. The conventional method, MemAE-EIF, also showed inferior performance in terms of both accuracy and precision, further emphasizing the necessity of integrating graph modeling with federated learning. In contrast, DyGCN and CTGCN, two GNN-based classification methods, demonstrated certain strengths in specific metrics but still fell short of AD-FG in terms of recall and F1 score. These results validate the effectiveness of the proposed approach, particularly the integration of degree- and random walk-based structural encoding with cross-domain structure sharing. This design enables the model to capture richer topological patterns and latent anomalies in network traffic, making it especially suitable for complex, heterogeneous, and cross-domain detection scenarios.

5.2.2. Convergence Analysis

Figure 4 compares the trends in the accuracy and F1 score of AD-FG with other baselines during multiple communication rounds. When processing distributed network traffic data, AD-FG showed better convergence for both the accuracy and F1 score as the number of communication rounds increased and could achieve high performance metrics within a relatively small number of communication rounds and eventually stabilize at a high level, which is advantageous compared with other methods. This suggests that, unlike the feature-based encoders in other methods, AD-FG can better capture and utilize the structural information in graph data by explicitly encoding the graph structure. Other methods mainly focus on the feature vectors of the nodes, encode structural information implicitly with features, and ignore the topological relationships and structural patterns among the nodes in the graph. This approach often fails to fully exploit the complex dependencies of graph data and tends to lose sensitivity to structural information. In contrast, AD-FG captured the topological information of the graph more accurately by encoding the graph structure using degree and random walk methods, enabling it to explicitly understand the relationships between nodes and effectively learn potential patterns and anomalies. This encoding of structural information is more flexible and powerful than encoding that relies solely on node features, as it is able to deal with connectivity, adjacency, and global topological patterns in the graph, thus enhancing the model’s performance when dealing with complex graph data.

5.2.3. The Effect of the Number of Local Training Rounds

In FL, clients can perform multiple local training stages before global aggregation to reduce communication costs. In Table 4, the results are provided for FedAvg, FedSage, and AD-FG with a total of 10 clients. Overall, as the number of training rounds increased, the accuracy improved significantly in the early stages, with FedSage and AD-FG showing a decline afterward, reaching the best performance at the third training round. FedAvg continued to improve even after the third round. In the early stages of training, the models had weak fitting abilities for the data, but through the local training stages, the clients could quickly learn the patterns in the local data, leading to rapid accuracy improvements. However, in the later stages of training, after several rounds of local training, the client models stabilized, and further optimization became limited. Moreover, excessive local training may cause the client models in the AD-FG algorithm to become overly reliant on local features, which affects the learning effectiveness of the global model under cross-domain guidance from the server.

5.2.4. Ablation Experiment

The ablation study results in Table 5 clearly demonstrate the systematic impact of the cross-gated feature fusion mechanism, cross-domain knowledge learning, and dual-channel GNN architecture on model performance. The experimental data show that the dual-channel GNN served as a critical foundational component. When this module was enabled independently, the F1 score rose significantly from the baseline of 70.66 to 77.76, representing an improvement of 7.1 percentage points. This result confirms the effectiveness of the feature-structure decoupling design. By learning the topological and behavioral characteristics of network traffic in parallel, the model was able to capture anomalous patterns more comprehensively.
Further incorporating the cross-gated feature fusion mechanism led to a slight increase in the F1 score to 77.81. Although the improvement was relatively small, this mechanism introduces nonlinear feature fusion, which proves beneficial in processing complex and heterogeneous network traffic. It enables the model to emphasize important feature channels while suppressing irrelevant noise.
The most substantial performance gain was observed when all three components were enabled together: the dual-channel GNN, the cross-gated feature fusion mechanism, and cross-domain knowledge learning. In this setting, the model achieved its best performance with an F1 score of 79.19, accuracy of 90.20%, and recall of 70.94%. This indicates a strong synergistic effect among the components. The dual-channel GNN provides a rich multi-view representation, the cross-gated feature fusion mechanism facilitates fine-grained feature interaction, and the cross-domain knowledge learning mechanism injects global structural prior knowledge by sharing server-side information. Among these, the cross-domain knowledge learning mechanism contributed most notably to the improvement in recall, raising it from 69.25% to 70.94%. This enhancement demonstrates its effectiveness in helping the model identify rare types of attacks, which is especially critical when dealing with highly imbalanced network attack data.

5.2.5. Client Number Comparison

To verify the scalability and robustness of the AD-FG method in federated environments, we conducted experiments under different numbers of clients: 5, 10, 15, 20, 50, and 100. We compared the performance of AD-FG with FedAvg and FedSage under these conditions. As shown in Figure 5, AD-FG consistently outperformed the other methods across all settings. While all methods showed some performance degradation as the number of clients increased, especially beyond 20 clients, AD-FG maintained a significantly higher accuracy even at 50 and 100 clients (87.85% and 87.01%) compared with FedAvg (82.68% and 79.79%) and FedSage (83.1% and 80.2%, respectively). These results demonstrate that AD-FG not only performs well in small- and medium-scale federated scenarios but also scales effectively in large-scale environments with many clients. This robustness is primarily due to its decoupled structural encoding mechanism and communication-efficient design, which allow AD-FG to capture and aggregate meaningful structural information without being heavily impacted by data heterogeneity or communication overhead.

5.2.6. Effect of λ on Global Model Integration

In the cross-domain knowledge learning module, the parameter λ is used to adjust the influence of public data on the global model. As shown in Figure 6, we investigated the impact of varying λ from 0 to 1 on model performance. When λ = 0 , the server aggregated only the structural parameters from local clients, meaning that the global model relied solely on client-specific structural information without incorporating any cross-domain structural knowledge. Under this setting, the model exhibited slightly worse performance, indicating that training based solely on local data may limit the generalization ability of the global model. As λ increased, the influence of cross-domain knowledge gradually strengthened. The results show that the best performance was achieved when λ = 0.3 , suggesting that moderate incorporation of cross-domain knowledge can enhance the model’s adaptability and improve its robustness across heterogeneous client data distributions. However, when λ = 1 , the global model was entirely derived from cross-domain knowledge provided by the server, diminishing the role of local network topologies in model updates. This led to a significant drop in performance, likely because excessive reliance on cross-domain knowledge hinders the learning of client-specific structures. The experimental results demonstrate that a small λ value may cause the model to overly depend on local information, limiting generalization, whereas a large λ value may bias the model toward public data, thereby neglecting important local topological features.

6. Discussion and Conclusions

This paper proposed AD-FG, a cross-domain federated graph representation learning approach for network traffic anomaly detection that effectively addresses the limitations of traditional methods in structural modeling and privacy preservation. The proposed method integrates degree-based and random walk-based structural encoding with a feature-structure decoupled graph neural network design to explicitly represent and accurately capture structural knowledge, enabling efficient learning of graph structural representations of network traffic in a distributed environment. Unlike traditional approaches, AD-FG shares only structural encoders in a federated learning framework, which avoids the interference of feature information and protects data privacy at the same time. In addition, under the cross-domain knowledge learning module on the server, clients can learn domain-independent structural knowledge, which enhances the global learning capability and improves the generalization ability and robustness of the model. The experimental results demonstrate the effectiveness of the proposed method in distributed network traffic anomaly detection, showing obvious advantages over existing approaches.
Despite its promising results, AD-FG still has certain limitations. First, the experiments utilized the CIC-IDS 2017 dataset for graph construction. Although the dataset contains multiple types of attacks, the traffic data may not fully capture the complexity and diversity of real-world network environments. Second, the current method constructs static graphs using fixed time windows, which may not comprehensively capture temporal dependencies in dynamic network scenarios. Future work may explore dynamic graph modeling to enhance real-time adaptability. Moreover, we acknowledge the potential domain mismatch between public graph datasets and real-world network traffic graphs. While our method demonstrates robustness, future work could explore domain adaptation techniques or leverage synthetic graph generators tailored to network-specific properties to further reduce this gap.

Author Contributions

Conceptualization, Y.Z.; methodology, Y.Z.; software, Y.Z.; validation, Y.Z.; formal analysis, Y.Z.; investigation, Y.Z.; resources, Y.Z.; data curation, Y.Z.; writing—original draft preparation, Y.Z.; writing—review and editing, Y.Z.; visualization, Y.Z.; supervision, Z.L. and J.P.; project administration, J.P.; funding acquisition, J.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by a key project of the National Natural Science Foundation of China (U22B2057).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Roy, S. A Comprehensive Survey on Network Traffic Anomaly Detection Using Deep Learning. Preprints 2024.
  2. Alshamrani, A.; Myneni, S.; Chowdhary, A.; Huang, D. A survey on advanced persistent threats: Techniques, solutions, challenges, and research opportunities. IEEE Commun. Surv. Tutorials 2019, 21, 1851–1877.
  3. Wang, N.; Wen, X.; Zhang, D.; Zhao, X.; Ma, J.; Luo, M.; Nie, S.; Wu, S.; Liu, J. Tbdetector: Transformer-based detector for advanced persistent threats with provenance graph. arXiv 2023, arXiv:2304.02838.
  4. Van Langendonck, L.; Castell-Uroz, I.; Barlet-Ros, P. Towards a graph-based foundation model for network traffic analysis. In Proceedings of the 3rd GNNet Workshop on Graph Neural Networking Workshop, Los Angeles, CA, USA, 9–12 December 2024; pp. 41–45.
  5. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Agüera y Arcas, B. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282.
  6. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. AI Open 2020, 1, 57–81.
  7. Zhao, Y.; Liu, H.; Duan, H. HGNN-GAMS: Heterogeneous Graph Neural Networks for Graph Attribute Mining and Semantic Fusion. IEEE Access 2024, 12, 191603–191611.
  8. Alsamhi, S.H.; Myrzashova, R.; Hawbani, A.; Kumar, S.; Srivastava, S.; Zhao, L.; Curry, E. Federated learning meets blockchain in decentralized data-sharing: Healthcare use case. IEEE Internet Things J. 2024, 11, 19602–19615.
  9. Myakala, P.K.; Jonnalagadda, A.K.; Bura, C. Federated learning and data privacy: A review of challenges and opportunities. Int. J. Res. Publ. Rev. 2024, 5, 1867–1879.
  10. Nguyen, C.; Costa, A. Anomaly Detection in Network Traffic using Machine Learning Techniques. ITSI Trans. Electr. Electron. Eng. 2024, 13, 1–7.
  11. Ma, Q.; Sun, C.; Cui, B. A novel model for anomaly detection in network traffic based on support vector machine and clustering. Secur. Commun. Netw. 2021, 2021, 2170788.
  12. Al-Saleh, A. A balanced communication-avoiding support vector machine decision tree method for smart intrusion detection systems. Sci. Rep. 2023, 13, 9083.
  13. Vibhute, A.D.; Patil, C.H.; Mane, A.V.; Kale, K.V. Towards detection of network anomalies using machine learning algorithms on the NSL-KDD benchmark datasets. Procedia Comput. Sci. 2024, 233, 960–969.
  14. Lin, K.; Xu, X.; Xiao, F. MFFusion: A multi-level features fusion model for malicious traffic detection based on deep learning. Comput. Netw. 2022, 202, 108658.
  15. Guarino, S.; Vitale, F.; Flammini, F.; Faramondi, L.; Mazzocca, N.; Setola, R. A two-level fusion framework for cyber-physical anomaly detection. IEEE Trans. Ind.-Cyber-Phys. Syst. 2023, 2, 1–13.
  16. Zheng, L.; Li, Z.; Li, J.; Li, Z.; Gao, J. AddGraph: Anomaly Detection in Dynamic Graph Using Attention-based Temporal GCN. In Proceedings of the IJCAI, Macao, China, 10–16 August 2019; pp. 4419–4425.
  17. Chang, L.; Branco, P. Graph-based solutions with residuals for intrusion detection: The modified E-GraphSAGE and E-ResGAT algorithms. arXiv 2021, arXiv:2111.13597.
  18. Guo, W.; Qiu, H.; Liu, Z.; Zhu, J.; Wang, Q. GLD-Net: Deep Learning to Detect DDoS Attack via Topological and Traffic Feature Fusion. Comput. Intell. Neurosci. 2022, 2022, 4611331.
  19. Lo, W.W.; Layeghy, S.; Sarhan, M.; Gallagher, M.; Portmann, M. E-GraphSAGE: A graph neural network based intrusion detection system for IoT. In Proceedings of the NOMS 2022–2022 IEEE/IFIP Network Operations and Management Symposium, Budapest, Hungary, 25–29 April 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–9.
  20. Caville, E.; Lo, W.W.; Layeghy, S.; Portmann, M. Anomal-E: A self-supervised network intrusion detection system based on graph neural networks. Knowl.-Based Syst. 2022, 258, 110030.
  21. Altaf, T.; Wang, X.; Ni, W.; Liu, R.P.; Braun, R. NE-GConv: A lightweight node edge graph convolutional network for intrusion detection. Comput. Secur. 2023, 130, 103285.
  22. Marfo, W.; Tosh, D.K.; Moore, S.V. Enhancing network anomaly detection using graph neural networks. In Proceedings of the 2024 22nd Mediterranean Communication and Computer Networking Conference (MedComNet), Nice, France, 11–13 June 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–10.
  22. Marfo, W.; Tosh, D.K.; Moore, S.V. Enhancing network anomaly detection using graph neural networks. In Proceedings of the 2024 22nd Mediterranean Communication and Computer Networking Conference (MedComNet), Nice, France, 11–13 June 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–10. [Google Scholar]
  23. Marfo, W.; Tosh, D.K.; Moore, S.V. Network anomaly detection using federated learning. In Proceedings of the MILCOM 2022-2022 IEEE Military Communications Conference (MILCOM), Rockville, MD, USA, 28 November–2 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 484–489. [Google Scholar]
  24. Karunamurthy, A.; Vijayan, K.; Kshirsagar, P.R.; Tan, K.T. An optimal federated learning-based intrusion detection for IoT environment. Sci. Rep. 2025, 15, 8696. [Google Scholar] [CrossRef]
  25. Jianping, W.; Guangqiu, Q.; Chunming, W.; Weiwei, J.; Jiahe, J. Federated learning for network attack detection using attention-based graph neural networks. Sci. Rep. 2024, 14, 19088. [Google Scholar] [CrossRef]
  26. Dong, B.; Chen, D.; Wu, Y.; Tang, S.; Zhuang, Y. Fadngs: Federated learning for anomaly detection. IEEE Trans. Neural Netw. Learn. Syst. 2024, 36, 2578–2592. [Google Scholar] [CrossRef]
  27. Tan, Y.; Liu, Y.; Long, G.; Jiang, J.; Lu, Q.; Zhang, C. Federated learning on non-iid graphs via structural knowledge sharing. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 9953–9961. [Google Scholar]
  28. Dwivedi, V.P.; Luu, A.T.; Laurent, T.; Bengio, Y.; Bresson, X. Graph neural networks with learnable structural and positional representations. arXiv 2021, arXiv:2110.07875. [Google Scholar]
  29. Liu, Y.; Zheng, Y.; Zhang, D.; Lee, V.C.; Pan, S. Beyond smoothing: Unsupervised graph representation learning with edge heterophily discriminating. In Proceedings of the AAAI conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 4516–4524. [Google Scholar]
  30. D’Addato, M.; Antolini, A.; Renzini, F.; Elgani, A.M.; Perilli, L.; Scarselli, E.F.; Gnudi, A.; Magno, M.; Canegallo, R. Nanowatt Clock and Data Recovery for Ultra-Low Power Wake-Up Based Receivers. In Proceedings of the 2020 International Conference on Embedded Wireless Systems and Networks, Lyon, France, 17–19 February 2020; pp. 224–229. [Google Scholar]
  31. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  32. Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2020, 2, 429–450. [Google Scholar]
  33. Zhang, K.; Yang, C.; Li, X.; Sun, L.; Yiu, S.M. Subgraph federated learning with missing neighbor generation. Adv. Neural Inf. Process. Syst. 2021, 34, 6671–6682. [Google Scholar]
  34. Wang, B.; Li, A.; Pang, M.; Li, H.; Chen, Y. Graphfl: A federated learning framework for semi-supervised node classification on graphs. In Proceedings of the 2022 IEEE International Conference on Data Mining (ICDM), Orlando, FL, USA, 28 November–1 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 498–507. [Google Scholar]
  35. Carrera, F.; Dentamaro, V.; Galantucci, S.; Iannacone, A.; Impedovo, D.; Pirlo, G. Combining unsupervised approaches for near real-time network traffic anomaly detection. Appl. Sci. 2022, 12, 1759. [Google Scholar] [CrossRef]
  36. Liu, J.; Xu, C.; Yin, C.; Wu, W.; Song, Y. K-core based temporal graph convolutional network for dynamic graphs. IEEE Trans. Knowl. Data Eng. 2020, 34, 3841–3853. [Google Scholar] [CrossRef]
  37. Gu, Y.; Zhang, X.; Xu, H.; Wu, T. DyGCN: Dynamic Graph Convolution Network-based Anomaly Network Traffic Detection. In 2024 IEEE 23rd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Sanya, China, 17–21 December 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1838–1843. [Google Scholar]
Figure 1. The overall framework of AD-FG.
Figure 2. Graph structure building module.
Figure 3. Federated collaborative learning module.
Figure 4. Performance metrics during multi-round communication. (a) Accuracy curve during multi-round communication. (b) F1 score curve during multi-round communication.
Figure 5. Results for different numbers of clients.
Figure 6. Experimental results for different λ values.
Table 1. CIC-IDS 2017 dataset introduction.

Dataset | Number of Nodes | Number of Edges | Time Duration | Number of Graphs | Attack Types
CIC-IDS 2017 | 19,211 | 2,824,000 | 5 days | 5650 | 14
Table 2. Dataset details: types of network traffic in CIC-IDS 2017.

Network Traffic Type | Quantity (Flows)
Benign | 2,273,097
DoS Hulk | 231,073
PortScan | 158,930
DDoS | 128,027
DoS GoldenEye | 10,293
FTP-Patator | 7938
SSH-Patator | 5897
DoS Slowloris | 5796
DoS Slowhttptest | 5499
Bot | 1966
Web Attack Brute Force | 1507
Web Attack XSS | 652
Infiltration | 36
Web Attack SQL Injection | 21
Heartbleed | 11
Table 4. Effect of the number of local training rounds on model accuracy (%).

Methods | Epochs = 1 | Epochs = 2 | Epochs = 3 | Epochs = 4
FedAvg | 85.02 | 86.94 | 88.58 | 89.70
FedSage | 85.37 | 87.41 | 89.17 | 88.94
AD-FG | 90.20 | 91.12 | 93.41 | 92.29
Table 5. Impact of dual-channel GNN, cross-gated, and cross-domain mechanisms.

Dual-Channel GNN | Cross-Gated | Cross-Domain | Accuracy | Precision | Recall | F1 Score
× | × | × | 85.02 | 82.25 | 61.94 | 70.66
✓ | × | × | 89.23 | 88.12 | 69.58 | 77.76
✓ | ✓ | × | 89.70 | 88.78 | 69.25 | 77.81
✓ | ✓ | ✓ | 90.20 | 89.63 | 70.94 | 79.19
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
