Article

GLBAD: Online BGP Anomaly Detection Under Partial Observation

by Zheng Wu *, Yaoyu Zhou and Junda Wu
The School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(24), 4940; https://doi.org/10.3390/electronics14244940
Submission received: 27 November 2025 / Revised: 14 December 2025 / Accepted: 14 December 2025 / Published: 16 December 2025
(This article belongs to the Section Networks)

Abstract

The Border Gateway Protocol (BGP) is the core protocol for inter-domain routing on the Internet. However, due to its lack of built-in security authentication mechanisms, BGP is highly vulnerable to misconfigurations or malicious route announcements, which can lead to severe incidents such as route hijacking and information leakage. Existing detection methods face two major bottlenecks: First, as the scale of Autonomous System (AS)-level topology continues to grow, conventional graph neural networks struggle to meet the demands of computational resources and latency. Second, the observational data provided by current monitoring systems are inherently localized. To address these challenges, this paper proposes a Graph Learning-driven framework for BGP Anomaly Detection, named GLBAD. The core design of GLBAD comprises three components: First, to handle BGP’s large-scale network topology, we propose a graph partition method to perform a dedicated topological partitioning on the BGP network. Second, to overcome the limitation of localized observational data, we design a graph autoencoder-based approach for adaptive graph learning, enabling topology inference. Finally, integrating the above components, we develop a comprehensive BGP anomaly detection system to achieve real-time and accurate anomaly detection. We evaluate our approach on 20 real-world BGP anomaly events. Experimental results demonstrate that the proposed GLBAD effectively detects anomalies with less time consumption while achieving a lower false positive rate.

1. Introduction

The Border Gateway Protocol (BGP) is one of the most important protocols of the global Internet; it ensures reachability between Autonomous Systems (ASes) [1]. However, the protocol does not include authentication and validation steps, leaving global networks vulnerable to malicious attacks and misconfigurations, such as prefix hijacks, route leaks, and network outages [2]. Because BGP is global in scope, such incidents generally have much more severe consequences than an ordinary attack and can spread rapidly across the entire network [3]. For example, a Facebook anomaly event caused by a BGP misconfiguration on 4 October 2021 led to a large-scale network outage: Facebook's services were disconnected from the global network for almost six hours, and its market value shrank by nearly 5% (an estimated loss of more than USD 6 billion) [4].
To enhance routing security in large-scale networks, two complementary approaches prevail: proactive defense mechanisms that block threats before they take effect, and passive detection systems that identify ongoing attacks. Proactive defense typically leverages out-of-band trust information to detect abnormal routing messages, such as Route Origin Validation (ROV) [5] and Autonomous System Provider Authorization (ASPA) [6], which are built upon the Resource Public Key Infrastructure (RPKI). However, while these schemes offer robust security, comprehensive deployment will take a significant amount of time. Crucially, their partial deployment may introduce new vulnerabilities [7].
Passive detection leverages AS behavioral patterns—specifically, when an AS detects alterations in its routing information base and propagates updates to neighboring ASes—to identify anomalous routes. This type of method can be further divided into two categories, rule-based [8] and Artificial Intelligence (AI)-based [9]. The rule-based method first constructs its rule base and then matches each incoming update message with the rule base. A well-matched update message is recognized as a valid route announcement.
On closed, labeled datasets, these methods typically perform well. Once deployed in a real-world environment, however, their accuracy and real-time performance are often unsatisfactory. We argue that two major challenges limit existing methods in real deployments:
  • The BGP network is large-scale. Modern Internet topologies encompass approximately 75,000 autonomous systems (ASes) and over 600,000 inter-AS links. Given this scale, anomaly detection systems face substantial challenges, demanding not only high detection accuracy but also low-latency processing to meet the stringent requirements of real-time monitoring.
  • The BGP network observed by collectors is only a partial view. The AS links collected from vantage points do not capture all existing AS-level connections, resulting in incomplete input data for the detection method [10]. This limitation can significantly degrade anomaly detection performance, particularly in terms of detection accuracy and false alarm rates.
In this paper, we present a routing anomaly detection system named GLBAD designed to address the aforementioned problems, aiming to enhance both accuracy and real-time performance. The contributions of our paper can be summarized below:
  • A BGP-dedicated Partition Scheme. To cope with the large topology and the volume of BGP messages, we develop a multi-level graph partition scheme. This partition method leverages the characteristics of the BGP network, specifically its power-law degree distribution and sparsity, to divide the BGP graph while balancing load and minimizing information loss.
  • An Adaptive Structure Inference Method. To overcome the problem of partial view, we propose a topology inference method designed for incomplete structures. Additionally, the adjacency matrix is learned adaptively to obtain better graph and node embeddings.
  • A Complete BGP Anomaly Detection System. We propose a complete BGP anomaly detection system that operates in real time. The system retrieves paths, aligns them, detects anomalies, and locates root causes.
  • Comprehensive Experiments. We evaluate our method and its baselines on closed-world and open-world datasets. The results demonstrate that our method is effective and performs best in terms of accuracy and time performance.

2. Background and Problem Statement

In this section, we present the relevant preliminary knowledge and outline the research problems.

2.1. BGP Hijack

According to the identified vulnerabilities, BGP anomalies can be further categorized into prefix hijacks, route leaks, and network outages. The specific models are shown in Figure 1.
Prefix Hijack. The attacker configures its AS router to advertise prefixes owned by other ASes, thereby attracting traffic destined for the legitimate owner to its own AS. In Figure 1a, attacker AS4 forges ownership of prefix P, which belongs to AS1. As seen from the observation point, the AS path to prefix P shifts from (AS3, AS2, AS1) to (AS3, AS2, AS4).
Route Leak. RFC 7908 [11] defines such BGP anomalies as route announcements that propagate beyond their intended scope. In Figure 1b, attacker AS2 leaks AS1’s prefix to its upstream provider, AS4. This detour from the expected path has a direct impact on network traffic.
Network Outage. ASes exchange reachability information via BGP keep-alive messages at fixed intervals. Misconfigurations, disasters, or political events can disrupt AS connectivity, causing unreachable paths or disconnections. In Figure 1c, AS2 loses connection to AS1 and AS3, rendering path (AS3, AS2, AS1) unreachable. AS1 must then select an alternative path.

2.2. Characteristics of BGP Network

The global BGP network includes approximately 70,000 ASes linked by about 600,000 AS-level connections. As shown in Figure 2a, the Internet’s inter-domain topology has consistently expanded over time, in both the number of ASes and inter-AS links. This growth, combined with the large volume of BGP updates, presents significant challenges for current anomaly detection methods, making real-time analysis increasingly difficult.
We also compute the average degree of the BGP network in Figure 2b. AS-level connectivity remains sparse: the average degree of an AS stays at a very low level (i.e., 0.0003).
Finally, we measure the cumulative distribution of the AS degree for each year in Figure 2c. The BGP network has a highly skewed degree distribution, and the degrees of ASes follow a power-law distribution.
These characteristics of the BGP network negatively affect anomaly detection. The detailed causes are discussed in Section 2.4.

2.3. Partial View of BGP Routing Data

What Is Partial View

Although BGP routing data is extensive, the publicly available data are drawn from collectors that provide only a partial view. BGP collectors are distributed unevenly, with a predominant presence in Europe and North America [12]. As a result, several links and ASes remain unobserved, and the share of unobserved links varies over time. Figure 3 illustrates the calculated percentages of unobserved AS links from 2016 to 2024.
All of the above factors contribute to the inability of existing methods to perform effectively in a real-world environment. Thus, this paper presents a novel BGP anomaly detection framework that aims for accurate and timely detection.

2.4. Problem Statement

The primary focus of this work is to detect anomalies accurately and in a timely manner. The BGP network can be modeled as an attributed graph $G = (V, E, X)$, where $V = \{v_1, v_2, \ldots, v_n\}$ is the set of $n$ nodes, $E$ is the set of $m$ edges, and $X \in \mathbb{R}^{n \times d}$ is the attribute matrix. The structure of the graph can also be denoted by the adjacency matrix $A \in \mathbb{R}^{n \times n}$. Specifically, $A_{ij} = 1$ if nodes $v_i$ and $v_j$ are joined by an edge, and $A_{ij} = 0$ otherwise. The graph Laplacian matrix $L$ is defined as $L = D - A$, where $D$ is the degree matrix. Facing such a large topology, most detection systems based on graph learning consume a significant amount of time, which prevents them from achieving real-time detection. Thus, we aim to partition the BGP graph and then learn on each subgraph separately.
However, graph partition on the BGP graph is difficult due to its inherent characteristics. First, this graph is sparse. An arbitrary split of its AS links therefore causes significant information loss for the affected AS nodes, and the resulting detection results are typically inaccurate. Second, this graph has a highly skewed degree distribution, which leads to load imbalance if existing methods are used to partition it. Third, as discussed above, the observed BGP graph represents only a subset of the actual BGP network. Anomaly detection on this incomplete graph is more likely to result in inaccurate judgments and decisions.
Thus, in this paper, we design a dedicated graph partition method that splits the BGP graph while balancing load and minimizing information loss, in order to achieve real-time anomaly detection. Additionally, we devise a topology structure inference method that restores the partial-view, incomplete graph by learning the attributes and relationships of the ASes.

3. Related Work

In recent years, researchers have conducted extensive studies on BGP anomaly detection. Existing BGP anomaly detection methods can be categorized into three main types: rule-based methods, artificial intelligence-based methods, and active probing-based methods.
Active Probing-Based Detection Methods. This method actively sends traffic into the network and detects whether a certain IP or prefix is reachable by analyzing the traffic reception and stability. Schlamp et al. [13] enhance this approach by utilizing encrypted traffic based on the Secure Sockets Layer/Transport Layer Security (SSL/TLS) protocol to test the reachability of abnormal IP addresses, which reduces the false positive rate. By comparing public key changes before and after abnormal events, they distinguish between benign and suspicious events. The advantage of this method, unlike others, is that it can help assess the extent of an event’s development if the event’s source is known. In contrast, Trinocular [14] introduces an adaptive probing mechanism that reduces the frequency of active traffic injection, lowering network load. However, this approach cannot achieve comprehensive network detection, unlike the broader reach of previous methods.
Rule-Based Detection Methods. These methods detect anomalies by matching against legitimate rules. If the information in a routing update message does not match any valid rule, an alert is triggered. Artemis [8], as a typical routing logic method, has been widely used for prefix hijacking detection. Its advantage lies in its good interpretability, providing a clear logical explanation for each detection decision. However, routing logic methods are heavily dependent on the accuracy of the data. If the input data source is inaccurate or incomplete, the detection results may be significantly affected. Especially when facing partial data loss or inconsistencies, this type of method is prone to false positives or missed detections.
Artificial Intelligence-Based Detection Methods. These methods use artificial intelligence techniques to model network traffic or routing data, identifying abnormal patterns to detect network faults or attack events. Researchers have used unsupervised learning algorithms to detect network anomalies by clustering network behavior into different clusters and identifying behaviors that deviate from the normal cluster as anomalies [1]. Li et al. [15] combined supervised and unsupervised learning methods to propose an ensemble learning-based anomaly detection framework, utilizing multiple trained base classifiers and unsupervised learning for integration, improving detection accuracy and robustness. The advantage of these methods is that they can automatically learn complex patterns of network behavior. However, machine learning methods typically require large amounts of labeled data for training, and their performance is highly dependent on the quality of the data and the effectiveness of feature engineering. Furthermore, the black-box effect of deep learning models makes it difficult to interpret the detection results, thereby limiting their trustworthiness in practical applications [16].
GLBAD is a passive anomaly detection framework. It complements proactive tools such as the Resource Public Key Infrastructure (RPKI) and Autonomous System Provider Authorization (ASPA). It detects suspicious path behavior on partially protected prefixes. GLBAD then supplies candidate events to ARTEMIS-like systems. To handle conflicts from misidentified AS relationships, GLBAD sets conservative thresholds and groups high-risk alerts for operator confirmation. This approach keeps policy conflicts in check and supports the other mechanisms. In a broader context of network security, addressing both path anomalies and covert threats contributes to more comprehensive detection capabilities.
Encrypted traffic and wireless side-channel analyses are closely tied to application and behavior identification for intrusion detection. For example, FOAP [17] targets open-world Android traffic fingerprinting. After filtering irrelevant flows, it can infer fine-grained UI-associated user operations. This demonstrates that encrypted traffic still leaks information about applications and behaviors. Building on this, AppListener [18] shows that, even without packet capture, one can identify applications and their in-app activities using only passive harvesting of Wi-Fi RF energy variations. This broadens the range of exploitable side channels. Together, these studies indirectly show that even with incomplete observations and noise, anomalous behaviors can still be detected via deviations in temporal or structural patterns.

4. Method

In this section, we introduce our proposed method, GLBAD, and show how it overcomes the existing challenges of BGP anomaly detection.

4.1. Overview

The framework of the proposed method is illustrated in Figure 4, comprising route collection, graph construction, graph partition, structure inference, and ultimately, anomaly detection. In this part, we first introduce the process of route collection. Then, we construct the graph from the collected routes. To accommodate such a large BGP network, we propose a graph partition scheme that divides a large graph into several similar and properly sized subgraphs. To address the partial visibility of public vantage points, we propose an adaptive structure inference method to compensate for the missing links. Finally, we develop a comprehensive anomaly detection system to enable real-time detection.

4.2. Route Collection

The BGP route data generally includes the BGP routing information base (RIB) and BGP update messages, which are publicly collected by multiple globally distributed vantage points. These data are stored in the binary MRT format. Thus, before use, they must be decompressed and parsed into a human-readable format. BGP RIB entries and update messages share the same fields. The detailed descriptions of the route fields are listed in Table 1. The RIB data are the route tables of the vantage points, updated every 8 h. The update messages are triggered by changes to the route table and sent to neighbors.
Using this collected data, we can model the BGP network as a graph $G(t)$ at time $t$. Building on this, we detail how the network structure and attributes are constructed.

4.3. Graph Construction

We use the AS links extracted from AS paths to construct the structure of $G(t)$, i.e., the adjacency matrix $A$. Next, the AS attribute matrix is constructed using AS attribute information, which comprises geo-location information, the number of update messages destined for and crossed by the specific AS, and the semantic role extracted by BGPvector [19]. Thus, the vector size of each AS is $d$. To provide more detail, the following table describes each AS attribute.
However, the constructed graph is too large to be processed in a timely manner by the downstream anomaly detection method. Thus, we devise a customized graph partition method below.
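As an illustration of the graph construction step in this subsection, the following minimal sketch derives AS links and a dense adjacency matrix from a list of AS paths. The function name `build_as_graph`, the collapsing of consecutive duplicate AS numbers (AS-path prepending), and the dense list-of-lists matrix are assumptions made for readability, not details taken from the paper; a topology of the size discussed here would require a sparse representation.

```python
def build_as_graph(as_paths):
    """Build an undirected AS-level graph from AS paths.

    as_paths: iterable of AS paths, each a list of AS numbers.
    Returns (nodes, A): the sorted AS numbers and a 0/1 adjacency matrix.
    """
    links = set()
    for path in as_paths:
        # Assumption: collapse consecutive duplicates caused by AS-path prepending.
        dedup = [asn for i, asn in enumerate(path) if i == 0 or asn != path[i - 1]]
        # Every pair of consecutive ASes on a path is an observed AS link.
        for a, b in zip(dedup, dedup[1:]):
            links.add((min(a, b), max(a, b)))

    nodes = sorted({asn for link in links for asn in link})
    index = {asn: i for i, asn in enumerate(nodes)}
    A = [[0] * len(nodes) for _ in nodes]
    for a, b in links:
        A[index[a]][index[b]] = 1
        A[index[b]][index[a]] = 1
    return nodes, A
```

For example, the paths `[3, 2, 1]` and `[3, 2, 2, 4]` yield the links (2, 3), (1, 2), and (2, 4), so AS2 becomes adjacent to the three other ASes.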

4.4. The Proposed Graph Partition on BGP Graph

Due to the scale of the BGP routing topology, graph partitioning is used to divide it into subgraphs, enabling downstream tasks to be computed separately on each subgraph. During partitioning, node counts should be balanced across subgraphs to ensure computational load balancing. To reduce information loss, the number of edges cut should be minimized. When cutting edges, the goal is to cut those with the least information. Thus, BGP topology partitioning is a graph partitioning problem: partition a large graph $G$ into $k$ subgraphs $p = \{G_1, G_2, \ldots, G_k\}$, where the objective is to minimize edge cut weight while balancing node counts among subgraphs. This problem can be formally described as
$$f(p) = \arg\min \sum_{G_i, G_j \in p} W(G_i, G_j) \quad \text{s.t.} \quad 1 - \frac{|V_i|}{|V|} \le \varepsilon, \quad \forall i \in \{1, 2, \ldots, k\}$$
In the equation, $V$ represents the vertex set of $G$, $G_i$ is a subgraph of $G$, $V_i$ is the vertex set of $G_i$, and $W$ represents the edge cut weight between two subgraphs.
This paper uses a multilevel graph partitioning algorithm to efficiently partition the BGP topology. The execution of the algorithm can be summarized in three core steps: coarsening, initial partitioning, and uncoarsening with refinement.
To achieve the optimization objective, this paper proposes an innovative design of edge weights through a weight fusion approach. First, the AS node weight is determined based on the number of Customer Cones associated with each AS, where the size of the Customer Cone [20] is inferred from the BGP paths using CAIDA’s AS relationship inference algorithm. ASes with large Customer Cones play a significant role in the capital and governance structure of the Internet. In the routing table data, we count the number of reachable destination network addresses formed by each pair of ASes in the AS path, representing the traffic of that edge. The traffic weight is designed based on the distribution of traffic size. The number of ASes in each AS’s Customer Cone and the number of network prefixes it owns both follow a power-law distribution. Therefore, we design the weight function in a piecewise manner. Additionally, we aim for the node weight to play a primary role in the edge weight design. Hence, the edge weight is designed as the sum of the weights of the adjacent nodes and the traffic weight, with the node weight being greater than the traffic weight. The designs of node weights and traffic weights are shown in Table 2.
The edge weight calculation formula is as follows:
$$W_{\text{edge}} = W_{\text{node}_1} + W_{\text{node}_2} + W_{\text{traffic}}$$
where $W_{\text{node}_1}$ and $W_{\text{node}_2}$ represent the node weights of the two ASes, and $W_{\text{traffic}}$ is the traffic weight between the two ASes.
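The weight fusion described above can be sketched as follows. The thresholds and weight values in `NODE_STEPS` and `TRAFFIC_STEPS` are hypothetical placeholders (the actual piecewise values are those of Table 2), chosen only so that every node weight exceeds every traffic weight, as the text requires.

```python
from bisect import bisect_right

# Hypothetical piecewise tables: (thresholds, weights). Not the values of Table 2.
NODE_STEPS = ([10, 100, 1000], [10, 20, 30, 40])   # customer-cone size -> node weight
TRAFFIC_STEPS = ([100, 10_000], [1, 2, 3])         # reachable prefixes -> traffic weight


def piecewise(value, steps):
    """Return the weight of the interval that contains `value`."""
    thresholds, weights = steps
    return weights[bisect_right(thresholds, value)]


def edge_weight(cone_u, cone_v, traffic):
    """W_edge = W_node1 + W_node2 + W_traffic."""
    return (piecewise(cone_u, NODE_STEPS)
            + piecewise(cone_v, NODE_STEPS)
            + piecewise(traffic, TRAFFIC_STEPS))
```

A piecewise (step) mapping is a natural fit here because cone sizes and prefix counts follow a power-law distribution: a linear weight would let a handful of very large ASes dominate all edge weights.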

4.4.1. Graph Coarsening

During coarsening, adjacent nodes are merged so that the original graph is gradually reduced to a smaller graph, with $|V_i| < |V_{i-1}|$ at each level. As coarsening proceeds in stages, a set of nodes in graph $G_i$ is represented as a single node in graph $G_{i+1}$. Specifically, let $V_i^v$ be the set of nodes of $G_i$ that are merged into node $v$ of $G_{i+1}$. To ensure load balancing during graph partitioning, the weight of node $v$ in the coarser graph $G_{i+1}$ is set to the sum of the weights of all nodes in $V_i^v$. Moreover, when multiple nodes in $V_i^v$ are adjacent to the same vertex $u$ in $G_{i+1}$, the weight of the edge between $v$ and $u$ in $G_{i+1}$ is the sum of the weights of the corresponding edges from $V_i^v$ to $u$.
Metis provides several edge-matching strategies for coarsening. For the BGP topology partitioning, a heavy-edge matching method is used. It merges the endpoints of heavy edges, so that the edges left to be cut are the less important ones in the BGP network topology, which minimizes the impact on subsequent routing anomaly detection. Heavy-edge matching follows a greedy approach that prioritizes edges with larger weights when constructing the coarse graph. Vertices are visited in random order; each unmatched vertex $u$ is matched with the unmatched adjacent vertex $v \in V_i$ for which the edge weight is maximal:
$$w(u, v) = \max \{\, w(u, v) \mid v \in V_i \,\}$$
Here, $w(u, v)$ represents the weight of the edge connecting nodes $u$ and $v$. The matched vertices $u$ and $v$ are merged into a single node, which becomes a new node in the coarser graph. This process is repeated until no more vertices can be matched or the pre-set coarsening threshold is met.
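One coarsening level as described in this subsection can be sketched as below. This is a simplified illustration, not Metis itself: the dictionary-of-dictionaries graph layout, the function names, and the choice of the smaller endpoint as the coarse node identifier are all assumptions.

```python
import random


def heavy_edge_matching(adj, seed=0):
    """adj: dict node -> {neighbor: weight} for an undirected graph.

    Visits vertices in random order and matches each unmatched vertex with
    its unmatched neighbor of maximal edge weight. Returns node -> mate
    (a vertex with no free neighbor is matched with itself).
    """
    rng = random.Random(seed)
    order = list(adj)
    rng.shuffle(order)
    mate = {}
    for u in order:
        if u in mate:
            continue
        candidates = [(w, v) for v, w in adj[u].items() if v not in mate and v != u]
        if candidates:
            _, v = max(candidates, key=lambda c: c[0])
            mate[u], mate[v] = v, u
        else:
            mate[u] = u
    return mate


def contract(adj, node_weight, mate):
    """Merge matched pairs; coarse node and edge weights are sums of the merged ones."""
    rep = {u: min(u, mate[u]) for u in adj}  # assumption: smaller id names the coarse node
    coarse_w, coarse_adj = {}, {}
    for u in adj:
        cu = rep[u]
        coarse_w[cu] = coarse_w.get(cu, 0) + node_weight[u]
        coarse_adj.setdefault(cu, {})
        for v, w in adj[u].items():
            cv = rep[v]
            if cv != cu:  # edges inside a merged pair disappear
                coarse_adj[cu][cv] = coarse_adj[cu].get(cv, 0) + w
    return coarse_w, coarse_adj
```

On the path graph 1-2-3-4 with edge weights 5, 1, and 7, the two heavy edges are contracted whatever the visiting order, so only the light edge of weight 1 remains between the two coarse nodes and is the one exposed to cutting.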

4.4.2. Graph Partition

After coarsening, the graph enters the initial partitioning phase, which performs a high-quality bisection. The partition $P_m$ is computed on the coarsened graph $G_m$ so that each part contains roughly half of the original vertices. Because vertex and edge weights are accumulated during coarsening, they accurately represent the finer-level graph. Thus, $G_m$ provides enough information to balance the partition and minimize the edge cut cost. To achieve load balancing and reduce edge cut loss, a graph-growing method expands an initial vertex into a partition until the size constraint is met.

4.4.3. Graph Uncoarsening

In this stage, the partition $P_m$ of the coarse graph $G_m$ is mapped back to the original graph $G_0$ through a layer-by-layer traversal of the graphs $G_{m-1}, G_{m-2}, \ldots, G_1$. Each vertex $v$ of the coarser graph $G_{i+1}$ corresponds to the set $V_i^v$ of nodes in the finer graph $G_i$ that were merged into it. The partition of $v$ is therefore assigned to all nodes in $V_i^v$, which allows partition $P_i$ to be derived from partition $P_{i+1}$.
Although $P_{i+1}$ is a locally optimal partition for graph $G_{i+1}$, the projected partition $P_i$ may not be optimal for graph $G_i$. Since $G_i$ is a finer-level graph, it has more degrees of freedom that can be used to further improve $P_i$. Therefore, the partition of $G_i$ is optimized with a local refinement heuristic after each projection. The refinement is primarily based on the Kernighan-Lin algorithm, where boundary vertices are swapped to reduce the number of cut edges. The gain $\Delta_{cut}$ after swapping is given by the following formula:
$$\Delta_{cut} = \sum_{v \in V_1} g(v) - \sum_{v \in V_2} g(v)$$
Here, $g(v)$ is the gain from swapping vertex $v$, measured by the reduction in cut edges.
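The per-vertex gain $g(v)$ can be illustrated with the definition commonly used in Kernighan-Lin and Fiduccia-Mattheyses style refinement: the weight of the edges from $v$ to the other part minus the weight of the edges to its own part. The sketch below assumes a bisection and this definition; the paper does not spell out its exact form, and the function names are hypothetical.

```python
def cut_weight(adj, part):
    """Total weight of the edges whose endpoints lie in different parts."""
    return sum(w for u in adj for v, w in adj[u].items()
               if u < v and part[u] != part[v])


def move_gain(adj, part, v):
    """Reduction in cut weight if v is moved to the other part of a bisection."""
    external = sum(w for u, w in adj[v].items() if part[u] != part[v])
    internal = sum(w for u, w in adj[v].items() if part[u] == part[v])
    return external - internal
```

A positive gain means that moving the vertex lowers the cut weight by exactly that amount, which is why refinement only needs to examine boundary vertices.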

4.4.4. Subgraph Enhancement

To address information loss from edge cuts, each subgraph generated by Metis partitioning is enhanced. For each subgraph, the adjacent nodes of its boundary nodes are added as virtual nodes. The corresponding edge cut information is also retained. This process allows the local subgraph to fully reflect cross-partition structural relationships. Formally, let the original graph be $G = (V, E)$. After partitioning, the subgraphs form the set $\{G_1, G_2, \ldots, G_k\}$, where each $G_i = (V_i, E_i)$ and the set of edge cuts is $E_{cut}$. For any subgraph $G_i$, its boundary node set is $B_i$. For every boundary node $u \in B_i$, its external node set is $N_{ext}(u)$. Within $G_i$, for each $v \in N_{ext}(u)$, a virtual node $\tilde{v}$ is added, resulting in the set of virtual nodes:
$$V_i^{virt} = \{\, \tilde{v} \mid v \in N_{ext}(u) \,\}$$
Retaining the edge cut information, that is
$$E_i^{virt} = \{\, (u, \tilde{v}) \mid (u, v) \in E_{cut},\ u \in V_i \,\}$$
The enhanced subgraph is
$$G_i' = (V_i \cup V_i^{virt},\ E_i \cup E_i^{virt})$$
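The enhancement step can be sketched as follows, assuming the partition is given as a mapping from node to partition id. The function name and the returned dictionary layout are illustrative choices, not part of the paper.

```python
def enhance_subgraphs(edges, part):
    """edges: iterable of undirected edges (u, v); part: dict node -> partition id.

    Returns pid -> {"nodes", "virtual", "edges"}, where "virtual" holds the
    external neighbors of boundary nodes and "edges" includes the cut edges.
    """
    sub = {pid: {"nodes": set(), "virtual": set(), "edges": set()}
           for pid in set(part.values())}
    for node, pid in part.items():
        sub[pid]["nodes"].add(node)

    for u, v in edges:
        pu, pv = part[u], part[v]
        if pu == pv:
            sub[pu]["edges"].add((u, v))
        else:
            # Cut edge: each side keeps the edge plus a virtual copy of the far endpoint.
            sub[pu]["virtual"].add(v)
            sub[pu]["edges"].add((u, v))
            sub[pv]["virtual"].add(u)
            sub[pv]["edges"].add((u, v))
    return sub
```

Each cut edge is thus stored in both subgraphs it connects, so the memory overhead grows with the number of cut edges, which is one more reason to keep the cut small.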

4.5. Structure Inference

The BGP network obtained by the public collectors is incomplete, with missing AS links or even missing AS nodes. This incompleteness can mislead the anomaly detection method into erroneous results. In this part, we reconstruct the structure of the BGP network in order to recover a more complete BGP graph.
Our reconstruction model includes the graph convolutional encoder, the decoder, and the Laplacian structure.

4.5.1. Encoder Module

The encoder module learns a layer-wise transformation by a spectral graph convolutional function $f(Z^{(l)}, A \mid W^{(l)})$, i.e.,
$$f(Z^{(l)}, A \mid W^{(l)}) = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} Z^{(l)} W^{(l)}\right).$$
Here, $\tilde{A} = A + I$ and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, where $I$ is the identity matrix of the same size as $A$, and $\sigma$ is the activation function (we use the $\mathrm{ReLU}(\cdot)$ function in this paper).
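A single encoder layer of this form can be written in a few lines of NumPy. This is a dense, forward-only sketch for illustration; the function name is hypothetical, and a practical implementation would rely on sparse matrices and an automatic differentiation framework.

```python
import numpy as np


def gcn_layer(A, Z, W):
    """One graph convolution: ReLU(D~^{-1/2} (A + I) D~^{-1/2} Z W)."""
    A_tilde = A + np.eye(A.shape[0])          # add self-loops
    degrees = A_tilde.sum(axis=1)             # D~_ii = sum_j A~_ij (always >= 1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(degrees))
    return np.maximum(D_inv_sqrt @ A_tilde @ D_inv_sqrt @ Z @ W, 0.0)
```

The self-loops guarantee that every degree is at least one, so the inverse square root is always defined, even for isolated ASes.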

4.5.2. Decoder Module

The decoder module reconstructs the graph from the learned latent representations $Z^{(l)}$, as follows:
$$\hat{A} = \mathrm{Sigmoid}(Z \cdot Z^{T}),$$
where $Z$ is the latent representation and $Z = \mathrm{encoder}(Z \mid X, A)$.
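The inner-product decoder is equally compact. The sketch below assumes that `Z` holds one embedding per row; the function name is hypothetical.

```python
import numpy as np


def decode(Z):
    """Reconstructed adjacency: element-wise sigmoid of Z Z^T."""
    return 1.0 / (1.0 + np.exp(-(Z @ Z.T)))
```

Each entry of the result lies in (0, 1) and can be read as a link score between two ASes: orthogonal embeddings give 0.5, and embeddings pointing in similar directions give values closer to 1.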

4.5.3. Loss Function

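Our devised loss function comprises a reconstruction term and a latent structure term. First, we impose a greater penalty on the reconstruction error of the non-zero elements than on that of the zero elements:
$$\mathcal{L}_{G1} = \sum_{i=1}^{n} \left\| (a_i - \hat{a}_i) \odot b_i \right\|_2^2 = \left\| (A - \hat{A}) \odot B \right\|_F^2,$$
where $\odot$ denotes the element-wise product and $B$ is the penalty matrix that assigns larger weights to the non-zero elements of $A$.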
We simultaneously consider the latent representation output by the encoder:
$$\mathcal{L}_{L} = \sum_{i,j=1}^{n} \left\| z_i - z_j \right\|_2^2 \cdot a_{ij} + \gamma \sum_{i,j=1}^{n} a_{ij}^2 = \mathrm{tr}(Z^{T} L Z) + \gamma \left\| A \right\|_F^2, \quad \text{s.t.} \quad a_i^{T} \mathbf{1} = 1,\ 0 \le a_i \le 1,$$
where $\gamma$ is the regularization factor. This Laplacian loss helps to learn the graph structure adaptively: it penalizes pairs of vertices that are strongly connected but whose embedding vectors lie far apart in the embedding space. Thus, the Laplacian loss causes vertices linked by an edge to be mapped close to each other in the embedding space.
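The two forms of the Laplacian term can be compared numerically with the sketch below (function names are hypothetical). One detail worth stating: for a symmetric $A$, the double sum over $i, j$ visits every edge twice, so it equals $2\,\mathrm{tr}(Z^{T} L Z)$; this constant factor is commonly absorbed into the weighting of the loss.

```python
import numpy as np


def laplacian_loss(Z, A, gamma):
    """tr(Z^T L Z) + gamma * ||A||_F^2 with L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    return np.trace(Z.T @ L @ Z) + gamma * np.sum(A ** 2)


def pairwise_smoothness(Z, A):
    """sum_{i,j} ||z_i - z_j||^2 * a_ij, computed directly."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sum(A * np.sum(diff ** 2, axis=-1))
```

Both quantities are zero when connected vertices share the same embedding and grow as linked vertices drift apart, which is the behavior the loss is meant to penalize.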
Additionally, we add a regularization term to prevent overfitting:
$$\mathcal{L}_{reg} = \frac{1}{2} \sum_{i} \left\| W^{(i)} \right\|_F^2.$$
The total loss is as follows:
$$\mathcal{L}_{S} = \mathcal{L}_{G1} + \mathcal{L}_{L} + \mathcal{L}_{reg}.$$
We update the adjacency matrix using the learned graph structure and the initial graph structure as follows:
$$A = \alpha A_L + (1 - \alpha) A_0$$
where $A_0$ is the initial adjacency matrix, $A_L$ is the learned adjacency matrix, and $\alpha$ balances their weights. We update the graph every $n$ epochs. We set a threshold $\tau$ so that updates stop once the epoch exceeds $\tau$.
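The update rule and its schedule can be sketched as follows. The function names, and the reading that an update happens whenever the epoch is a multiple of the update interval and does not exceed $\tau$, are assumptions made for illustration.

```python
import numpy as np


def update_adjacency(A0, A_learned, alpha):
    """A = alpha * A_L + (1 - alpha) * A_0."""
    return alpha * A_learned + (1.0 - alpha) * A0


def should_update(epoch, every, tau):
    """Update every `every` epochs, and never once the epoch exceeds tau."""
    return epoch <= tau and epoch % every == 0
```

With `alpha = 0` the observed topology is kept unchanged, while larger values let the inferred links gradually reshape the graph that the encoder sees.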
The whole process of adaptive structure learning is shown in Figure 5 and Algorithm 1.
Algorithm 1: Adaptive Graph Learning Algorithm

4.6. Anomaly Detection

We use the monitor AS and destination prefix as keys to extract paths from routing announcements and RIB snapshots, then identify suspicious routing changes by comparing path differences. To quantify path changes, we employ dynamic time warping (DTW) in conjunction with the mean cosine distance, which effectively measures the overall discrepancy between two ordered sequences of unequal lengths. To formalize this calculation, for a vertex $v_i$ on path $S$ and a vertex $v_j$ on path $S'$, we compute the cosine distance between their embedding vectors.
$$\mathrm{CosineDistance}(v_i, v_j) = 1 - \frac{\langle X_{v_i}, X_{v_j} \rangle}{\left\| X_{v_i} \right\| \left\| X_{v_j} \right\|}.$$
Here, $X_{v_i}$ and $X_{v_j}$ denote the vector representations of the corresponding ASes. DTW employs dynamic programming to identify an alignment (warping path) that minimizes the cumulative cosine distance, which we take as the final path-difference score. If this score exceeds a dynamic threshold, estimated from the empirical distribution of historically normal changes, the routing change is deemed suspicious. The pseudocode for the DTW procedure is presented in Algorithm 2.
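The path-difference score can be sketched in plain Python as below. The embedding table `emb` and the function names are hypothetical, and the sketch returns the cumulative cost of the best alignment without the dynamic thresholding step.

```python
import math


def cosine_distance(x, y):
    """1 - <x, y> / (||x|| * ||y||) for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def dtw_path_score(path_a, path_b, emb):
    """Minimal cumulative cosine distance over all DTW alignments of two AS paths."""
    n, m = len(path_a), len(path_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(emb[path_a[i - 1]], emb[path_b[j - 1]])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

With toy embeddings in which AS1 and AS4 point in opposite directions, replacing the origin AS1 by AS4 on the path (AS3, AS2, AS1) yields a score of 2, whereas an unchanged path scores 0.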
Algorithm 2: Anomaly Detection Algorithm

4.7. Complexity Analysis

The total time complexity of the graph partitioning algorithm is the sum of the complexities of its three phases. In the coarsening phase, the algorithm operates with a time complexity of $O(|E|)$ per coarsening level, repeated $O(\log n)$ times, as the number of vertices is roughly halved at each level. The partitioning phase has a complexity of $O(|E|)$. In the refinement phase, the time complexity is $O(|E| \log |E|)$, where the number of iterations depends on the number of refinement passes. Given these factors, Metis generally operates with a time complexity of $O(|E| \log |E|)$ in typical cases, making it particularly efficient for large, sparse graphs compared to spectral methods.
The topology inference algorithm balances convergence speed and training stability by dynamically adjusting the learning rate. During the iteration process, parameter updates continue within the maximum number of iterations $T$ until the convergence condition is met or the stopping threshold is reached. Regarding the adaptive learning mechanism of the adjacency matrix, we further examine its computational complexity during the training process. The time complexity of this process is $O(d n^2 t)$, where $d$ represents the feature dimension of the nodes, $n$ is the total number of nodes, and $t$ is the number of iterations.
The anomaly detection method is based on the Dynamic Time Warping (DTW) algorithm, which has a time complexity of $O(N \times M)$, where $N$ and $M$ are the lengths of the two sequences. It measures the similarity between the two sequences by computing the distance matrix using cosine distance and performing dynamic programming.

5. Experiments

This section evaluates anomaly detection in real BGP network settings. Experimental details are as follows:
Datasets. To assess detection accuracy, we integrate multiple BGP-related data sources (RIB, UPDATES, and AS business-relationship data), building upon the framework introduced above. We assemble our evaluation datasets as follows. We collected 20 historical routing anomaly reports from 2005 to 2024, including 8 prefix hijacking events, 7 route leak events, and 5 network outage events, as detailed in Table 3. For each anomaly, we retrieved verifiable information from authoritative sources, such as the anomaly time and affected network prefixes. Using this information, we extracted all routing announcements for a 6 h period before and after each anomaly from RIPE RIS, resulting in 20 datasets with a total data volume of 340 GB. In addition, data for May 2025 was collected for the open experiment, resulting in a total data volume of 3.1 TB.
Experimental environment. To ensure experimental consistency, all experiments are run on a server with an Intel(R) Xeon(R) Platinum 8352Y 64-core CPU, Linux OS, and 219 GB RAM. For fair comparison, both our method and all baselines are implemented in Python 3.9.
Baselines. For comparative evaluation, we include four additional inter-domain routing anomaly detection methods—BGPvector [21], ISP-Operated [22], MSLSTM [23], and BGPviewer [24]. Table 4 summarizes their core techniques, and Table 5 presents their hyperparameter settings.

5.1. Research Questions

This section outlines the approach to anomaly detection in real-world BGP networks, presenting the evaluation framework and specifying the major research questions addressed.
RQ1: First, we consider graph partitioning effectiveness. The primary goal is to evaluate the partition quality of Metis when dividing the graph into multiple subgraphs.
RQ2: Next, we examine the accuracy of structure inference. The primary goal is to test the ability of adaptive graph learning to recover edges.
RQ3: Finally, we focus on anomaly detection using real-world datasets. The primary goal is to evaluate the performance of our method during real incidents.

5.2. BGP Topology Partitioning (RQ1)

We build the network from the October 2024 business-relationship table and assign edge weights according to our weight design. We then partition the graph into 2 to 10 subgraphs and record the total cut edge weight and the number of cut edges.
Next, we analyze the partitioning results. As shown in Figure 6a, the total cut edge weight increases with the number of subgraphs, peaking at nine and then stabilizing. In Figure 6b, the red curve shows the subgraph adjacency-matrix size (reflecting memory use), while the blue curve shows the number of cut edges. Since subgraph size determines memory consumption, partitioning must balance memory usage against the information lost through cut edges. Overall, partitioning into four subgraphs yields the best trade-off.
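The two quantities tracked in this experiment can be computed from any partition assignment with a few lines of code. The sketch below is illustrative only: the edge-list format, the function names, and the use of the largest subgraph's dense adjacency matrix as a memory proxy are assumptions for this example, and the partition itself would come from a tool such as Metis.

```python
from collections import Counter


def cut_statistics(edges, part_of):
    """Count cut edges and sum their weights.

    edges: iterable of (u, v, weight) tuples.
    part_of: dict mapping each node to its partition id.
    """
    cut_count = 0
    cut_weight = 0.0
    for u, v, weight in edges:
        # An edge is cut when its endpoints fall into different subgraphs.
        if part_of[u] != part_of[v]:
            cut_count += 1
            cut_weight += weight
    return cut_count, cut_weight


def largest_dense_adjacency_cells(part_of):
    """Cells in the dense adjacency matrix of the largest subgraph (memory proxy)."""
    sizes = Counter(part_of.values())
    return max(sizes.values()) ** 2


# Toy chain graph A-B-C-D split into two subgraphs (hypothetical data).
toy_edges = [("A", "B", 2.0), ("B", "C", 1.0), ("C", "D", 3.0)]
toy_parts = {"A": 0, "B": 0, "C": 1, "D": 1}
toy_cut = cut_statistics(toy_edges, toy_parts)
```

More subgraphs shrink the per-subgraph matrix but tend to cut more edges, which is the trade-off discussed above.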

5.3. Topology Reconstruction (RQ2)

Across datasets, we train all autoencoder models for 100 iterations using Adam with a learning rate of 0.001. The adaptive-learning mixing ratio ( α ) is set to 10%, with the number of adaptive updates t limited to 10–15. The regularization parameter ( λ ) is set to 0.01, and the parameter in the weight matrix W is set to 36. We report AUC for link-recovery quality. Data are split into training, validation, and test sets. Using the 2024 route-leak event as an example, we partition the global BGP topology into four subgraphs and perform link prediction under varying levels of incompleteness for each subgraph.
As shown in Figure 7a, despite changes in topological incompleteness, link-inference AUC remains around 0.72, indicating robust recovery performance. Using the same dataset, we performed link prediction based on GNNs, and the results are shown in Figure 7b. The four subgraphs yield an average AUC of 0.68, indicating that our method achieves superior link prediction accuracy.
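For readers unfamiliar with AUC in the link-prediction setting, it is the probability that a randomly chosen true edge receives a higher predicted score than a randomly chosen non-edge. A minimal sketch of that definition follows; the function name and the toy scores are hypothetical and do not come from the paper's experiments.

```python
import numpy as np


def link_auc(pos_scores, neg_scores):
    """AUC as the fraction of (true edge, non-edge) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return float(wins) / (pos.size * neg.size)


# Hypothetical scores for held-out edges and sampled non-edges.
toy_auc = link_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

An AUC of 0.5 corresponds to random ranking, so values around 0.7 mean that roughly seven out of ten edge/non-edge pairs are ordered correctly.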

5.4. Path Difference Score Analysis (RQ3)

We analyze real routing announcements to quantify both legitimate and anomalous route changes using the path-difference score. For each event, we obtain authoritative ground truth (e.g., time and affected prefixes) and extract all announcements within a ±6 h window, yielding 20 datasets. Route changes cover origin changes (different origin AS) and path changes (same origin AS, different traversed AS path).
From Figure 8, results show anomalous route changes have significantly higher path-difference scores than normal changes, implying anomalies markedly alter AS roles along paths. Our method yields higher anomaly scores than BGPvector on real-world datasets.
To fairly compare separability for normal vs. anomalous BGP messages, we first apply a Fisher transform to cosine-similarity scores to remove dimensional effects, then compute the area between the two empirical CDFs (1D Wasserstein-1 distance) [25]. Results are summarized in Table 6.
The Wasserstein-1 distance (W1) quantifies the difference between the distributions of normal and anomalous routing paths. It can be interpreted as a measure of effect size, with a larger value indicating a greater distinction between the normal and anomalous paths in the overall distribution. According to Table 6, our method yields larger areas between the empirical cumulative distribution functions (ECDFs) than BGPvector on five of the six datasets (Worldstream is the exception), indicating better discrimination between normal and anomalous routes.
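The separability measure described above can be sketched in a few lines. This is an illustrative implementation under stated assumptions: the Fisher transform is taken to be the usual arctanh mapping with clipping to avoid infinities at ±1, and the function names are chosen for this example rather than taken from the paper.

```python
import numpy as np


def fisher_transform(similarity, eps=1e-6):
    """Map similarities in [-1, 1] to the real line via arctanh (clipped)."""
    r = np.clip(np.asarray(similarity, dtype=float), -1.0 + eps, 1.0 - eps)
    return np.arctanh(r)


def wasserstein_1(x, y):
    """Area between the two empirical CDFs of 1-D samples x and y."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    grid = np.sort(np.concatenate([x, y]))
    widths = np.diff(grid)
    # ECDF values that hold on each interval [grid[k], grid[k + 1]).
    cdf_x = np.searchsorted(x, grid[:-1], side="right") / x.size
    cdf_y = np.searchsorted(y, grid[:-1], side="right") / y.size
    return float(np.sum(np.abs(cdf_x - cdf_y) * widths))


def separability(normal_similarity, anomalous_similarity):
    """W1 distance between Fisher-transformed similarity scores."""
    return wasserstein_1(
        fisher_transform(normal_similarity),
        fisher_transform(anomalous_similarity),
    )
```

Because W1 is expressed in the units of the transformed scores, values are comparable across methods only when both are scored on the same transformed scale, which is the role of the Fisher transform here.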

5.5. Detection Results of Closed Dataset (RQ3)

For each dataset, we use the latest routing table snapshot before the incident. We create a dictionary with the observation-point AS and network prefix as keys and the AS path as the value. We then sample BGP announcements inside and outside the incident window: using a one-minute window, we examine all announcements during the incident and within the six hours preceding and following it. Each announcement is matched to the routing-table dictionary by its observation-point AS and prefix. We obtain the embedding vector for each AS in the path, then use dynamic time warping with cosine distance to obtain a path-difference score. If this score exceeds a predetermined threshold, the route is considered anomalous. After detecting anomalous routes, we set the 90th percentile as the first outlier threshold for each anomaly. We then fit the values above this threshold with the Generalized Pareto Distribution (GPD) to better capture extreme points. The final outlier threshold is set at the 98th percentile after GPD fitting. We evaluate BGPvector, ISP-Operated, MSLSTM, BGPviewer, and our method on real-world datasets. The definitions of the reported quantities are given in Table 7, and Table 8 reports the detection and alerting results. For each anomaly detection method, we summarize the false positive rate, alarms, and false alarms over the 20 events.
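The two-stage thresholding described above follows the general peaks-over-threshold pattern. The sketch below is one possible reading, not the authors' implementation: it assumes the GPD is fitted to the exceedances over the 90th percentile with a method-of-moments estimator (chosen here to keep the example dependency-free), and that the "98th percentile after GPD fitting" means the 0.98 quantile of the fitted exceedance distribution. The function name and defaults are hypothetical.

```python
import numpy as np


def pot_threshold(scores, initial_q=0.90, final_q=0.98):
    """Two-stage outlier threshold: empirical quantile, then GPD tail quantile."""
    scores = np.asarray(scores, dtype=float)
    u = float(np.quantile(scores, initial_q))  # first outlier threshold
    excess = scores[scores > u] - u            # exceedances over u
    if excess.size < 2:
        return u                               # too few tail points to fit
    mean = excess.mean()
    var = excess.var(ddof=1)
    if var == 0.0:
        return u + float(mean)
    # Method-of-moments estimates for GPD shape (xi) and scale (sigma).
    ratio = mean * mean / var
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * mean * (ratio + 1.0)
    # Quantile of the fitted GPD; the xi -> 0 limit is the exponential case.
    if abs(xi) < 1e-9:
        tail_q = -sigma * np.log(1.0 - final_q)
    else:
        tail_q = sigma / xi * ((1.0 - final_q) ** (-xi) - 1.0)
    return u + float(tail_q)
```

A maximum-likelihood fit would be a reasonable alternative to the moment estimator; the overall structure of the thresholding stays the same.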
As shown in Table 8, our method and BGPviewer successfully detect all 20 real anomaly events, whereas MSLSTM achieves the lowest success rate (70%). Table 8 further shows that our method has the fewest false positives, whereas BGPviewer produces the most. Considering both detection coverage and false-positive rate, our approach offers the best overall performance and efficiency.

5.6. Statistical Analysis

To evaluate whether our method outperforms the comparison methods in detection performance, we employed the Nemenyi statistical hypothesis test. Each method was ranked by false positive rate and detection success rate across datasets, with the best-performing method ranked first. We then computed the average rank for each method over all datasets. Building on this, the Friedman rank test was used to test the null hypothesis of “no overall performance difference among methods.” Upon rejecting the null hypothesis at a significance level of $\alpha = 0.05$, the Nemenyi post hoc multiple comparison test was applied. In the Nemenyi test, these average ranks are used to analyze differences, producing a critical distance $CD = q_\alpha \sqrt{k(k+1)/(6G)}$, where k is the number of methods, G is the number of datasets, and $q_\alpha$ is derived from the studentized range statistic divided by $\sqrt{2}$. If the difference in average ranks between two methods exceeds the critical distance, their performance difference is statistically significant. The results for false positive rate and detection success rate are shown in Figure 9, where algorithm groups with no significant differences are connected.
The analysis reveals that our method achieves the best performance for both false positive rate and detection success rate, while BGPviewer performs the worst on both metrics.
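As a worked illustration of the critical distance, the sketch below evaluates the formula for k = 5 methods and G = 20 datasets, matching the setup of this section. The $q_\alpha$ values are taken from the standard published Nemenyi table for $\alpha = 0.05$ and are an assumption of this example; the paper does not state which value it used.

```python
import math

# Critical values q_alpha at alpha = 0.05 from the standard Nemenyi table
# (studentized range statistic divided by sqrt(2)), indexed by k.
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850}


def critical_distance(k, g, q_alpha):
    """Nemenyi critical distance for k methods compared on g datasets."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * g))


def significantly_different(avg_rank_a, avg_rank_b, cd):
    """True when two average ranks differ by more than the critical distance."""
    return abs(avg_rank_a - avg_rank_b) > cd


cd_5_methods_20_datasets = critical_distance(5, 20, Q_ALPHA_005[5])
```

Under these assumptions the critical distance is about 1.36, so two methods would need average ranks at least that far apart to be declared significantly different.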

5.7. Detection Result in the Open-World Dataset (RQ3)

We analyzed routing announcements from May 2025 using a 1 min window and counted daily alerts, which totaled 1202 for the month. We also applied the other anomaly detection methods to the May data; the daily alarm counts are presented in Figure 10a. Additionally, the May 2025 alarms are classified into true and false alarms according to an authoritative external report, as shown in Figure 10b.
To illustrate the impact of this incident, we observed that the deviation score between the anomalous AS path and the normal AS path was 0.945, substantially exceeding the threshold of 0.875. This result indicates that our method has strong practical value in operational deployments.
According to Table 9, our method achieves a detection time of 0.04 ms with a memory usage of 537 MB, outperforming the other methods in both computational efficiency and resource consumption. This indicates that our method can process BGP update messages in real time while keeping memory consumption low.
Graph partitioning takes approximately 0.15 s, while training all subgraph models takes a total of around 93 min. To handle BGP network topology updates, a threshold is set for significant network or AS-level changes. If it is exceeded, the data are re-collected, the topology is rebuilt and partitioned, and inference, AS embedding generation, and anomaly detection are run again. These steps ensure an efficient response to ongoing changes in the network.
To help network operators estimate GLBAD deployment cost at Internet scale, we measured resource needs. A standard server (16-core CPU, 64 GB RAM) can train on the partitioned BGP topology and process about 14,000 updates per second during divergence scoring. Model training requires approximately 37 GB of memory, while anomaly detection utilizes around 500 MB. This shows the system is feasible for real-world deployment.

5.8. Ablation Experiment

To assess performance gain in anomaly detection, we partitioned the BGP network topology in two ways: one using Metis without weights and another using our weight design approach. The results appear in Table 10.
Table 10 shows that designing weights lowers the false positive rate in anomaly detection through topology inference.
We conducted ablation experiments on the US_carrier_Sprint and GHOSTnet datasets to evaluate our method. We tested four types of embedding vectors: the initial AS embedding, one from topology inference across the whole graph, another from topology inference on partitioned subgraphs without enhancement, and the embedding from our proposed method. We compared their anomaly detection results, memory usage, and GPU memory consumption during training. Table 11 summarizes the methods, and Figure 11 shows the comparison.
Figure 11 shows that our proposed method achieves the lowest false positives. Enhancing subgraphs does not significantly increase memory or GPU consumption during training. However, the experimental environment cannot meet GPU memory demands for training on the full BGP network graph, and using only system memory requires substantial resources.

5.9. Threshold Sensitivity Analysis

We conduct a threshold sensitivity analysis on the GHOSTnet dataset, with results shown in Figure 12.
Figure 12 illustrates that the false positive rate (FPR) decreases in a stepwise manner with increasing thresholds. The model achieves balanced performance, with its lowest FPR of 0% between thresholds of 0.68 and 0.69. Performance declines and the FPR rises sharply when the threshold drops below 0.61.

5.10. Case Study

Building upon these computational results, we now consider a real-world anomaly event. On 21 June 2022, Cloudflare experienced a service disruption that affected traffic in 19 data centers. In this subsection, we analyze this severe network outage from a BGP perspective using the proposed method. The AS path (17639 13335) represents the normal historical route, while (17639 10099 4657 4637 6461 174 13335) is the AS path observed after the disruption. We analyze the experimental results by calculating the cosine distance between the embedding vectors of the corresponding ASes, and the results are shown in Figure 13.
For the Cloudflare outage, we observe that most of the total cost is attributed to the new transit ASes between AS17639 and AS13335. These ASes are not on the usual short path (17639 13335); instead, they make a long detour in the affected path. This shows their role changed: from being outside the path normally to acting as key transit providers during the incident.
Building on this analysis, such a decomposition makes the path-difference score directly actionable. Rather than merely indicating that “the path has changed significantly,” GLBAD can identify a small set of ASes that contribute most to the deviation. In an operational setting, operators can prioritize examining routing policies and BGP communities for these high-contribution ASes. They can also check RPKI/ASPA configurations and export filters. If needed, they may temporarily increase preference for alternative upstreams or withdraw affected announcements at relevant points. Thus, the same metric that raises an alarm also guides where to begin mitigation and which AS roles are most likely responsible for the abnormal paths.

6. Conclusions

To address the dual challenges of Internet-scale topology and localized vantage points in inter-domain routing anomaly detection, this paper proposes an adaptive graph learning-based anomaly detection framework. The method employs a customized topology partitioning scheme based on METIS, incorporating subgraph augmentation to mitigate the computational and storage demands of modeling Internet-wide BGP topologies. An adaptive topology inference module powered by a graph autoencoder reconstructs latent AS connectivity under incomplete observations, which produces more discriminative node embeddings. A path-comparison strategy combines dynamic time warping (DTW) with mean cosine distance, enabling the rapid quantification of routing changes and the timely identification of anomalies. Experimental results show that the approach detects all 20 real anomalous events while offering better real-time performance than the baselines. It also significantly reduces false positives, with an average reduction of 107 false alarms compared to its competing methods. In the future, we will integrate external information (e.g., RPKI/ROV, operator feedback) to further improve its robustness.

Author Contributions

Conceptualization, Z.W.; Methodology, Z.W. and Y.Z.; Validation, Y.Z. and J.W.; Writing—review & editing, Z.W.; Visualization, Y.Z. and J.W.; Supervision, Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under grant 62402234, and the Natural Science Research Start-up Foundation of Recruiting Talents of Nanjing University of Posts and Telecommunications under grant NY223168.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Al-Musawi, B.; Branch, P.; Armitage, G. BGP anomaly detection techniques: A survey. IEEE Commun. Surv. Tutor. 2016, 19, 377–396.
  2. Mitseva, A.; Panchenko, A.; Engel, T. The state of affairs in BGP security: A survey of attacks and defenses. Comput. Commun. 2018, 124, 45–60.
  3. Chen, Y.; Yin, Q.; Li, Q.; Liu, Z.; Xu, K.; Xu, Y.; Xu, M.; Liu, Z.; Wu, J. Learning with Semantics: Towards a Semantics-Aware Routing Anomaly Detection System. In Proceedings of the 33rd USENIX Security Symposium (USENIX Security 24), Philadelphia, PA, USA, 14–16 August 2024; pp. 5143–5160.
  4. WIKIPEDIA. 2021 Facebook Outage. 2021. Available online: https://en.wikipedia.org/wiki/2021_Facebook_outage (accessed on 5 December 2025).
  5. Heilman, E.; Cooper, D.; Reyzin, L.; Goldberg, S. From the Consent of the Routed: Improving the Transparency of the RPKI. In Proceedings of the 2014 ACM Conference on SIGCOMM, Chicago, IL, USA, 17–22 August 2014; pp. 51–62.
  6. Wirtgen, T.; Rybowski, N.; Pelsser, C.; Bonaventure, O. The Multiple Benefits of a Secure Transport for BGP. Proc. ACM Netw. 2024, 2, 1–23.
  7. Lychev, R.; Goldberg, S.; Schapira, M. BGP security in partial deployment: Is the juice worth the squeeze? In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM, Hong Kong, China, 12–16 August 2013; pp. 171–182.
  8. Sermpezis, P.; Kotronis, V.; Gigis, P.; Dimitropoulos, X.; Cicalese, D.; King, A.; Dainotti, A. ARTEMIS: Neutralizing BGP hijacking within a minute. IEEE/ACM Trans. Netw. 2018, 26, 2471–2486.
  9. Peng, S.; Nie, J.; Shu, X.; Ruan, Z.; Wang, L.; Sheng, Y.; Xuan, Q. A multi-view framework for BGP anomaly detection via graph attention network. Comput. Netw. 2022, 214, 109129.
  10. Alfroy, T.; Holterbach, T.; Krenc, T.; Claffy, K.; Pelsser, C. The next generation of BGP data collection platforms. In Proceedings of the ACM SIGCOMM 2024 Conference, Sydney, Australia, 4–8 August 2024; pp. 794–812.
  11. Sriram, K.; Montgomery, D.; McPherson, D.; Osterweil, E.; Dickson, B. Problem Definition and Classification of BGP Route Leaks; Technical Report; Internet Engineering Task Force (IETF): Wilmington, DE, USA, 2016.
  12. Silva, B.A., Jr.; Mol, P.; Fonseca, O.; Cunha, I.; Ferreira, R.A.; Katz-Bassett, E. Automatic inference of BGP location communities. Proc. ACM Meas. Anal. Comput. Syst. 2022, 6, 1–23.
  13. Schlamp, J.; Holz, R.; Jacquemart, Q.; Carle, G.; Biersack, E.W. HEAP: Reliable assessment of BGP hijacking attacks. IEEE J. Sel. Areas Commun. 2016, 34, 1849–1861.
  14. Cheng, M.; Xu, Q.; Liu, W.; Li, Q.; Wang, J. MS-LSTM: A multi-scale LSTM model for BGP anomaly detection. In Proceedings of the 2016 IEEE 24th International Conference on Network Protocols (ICNP), Singapore, 8–11 November 2016; IEEE: New York, NY, USA, 2016; pp. 1–6.
  15. Li, J.; Dou, D.; Wu, Z.; Kim, S.; Agarwal, V. An Internet routing forensics framework for discovering rules of abnormal BGP events. ACM SIGCOMM Comput. Commun. Rev. 2005, 35, 55–66.
  16. Minh, D.; Wang, H.X.; Li, Y.F.; Nguyen, T.N. Explainable artificial intelligence: A comprehensive review. Artif. Intell. Rev. 2022, 55, 3503–3568.
  17. Li, J.; Zhou, H.; Wu, S.; Luo, X.; Wang, T.; Zhan, X.; Ma, X. FOAP: Fine-Grained Open-World Android App Fingerprinting. In Proceedings of the 31st USENIX Security Symposium (USENIX Security 22), Boston, MA, USA, 10–12 August 2022; pp. 1579–1596.
  18. Ni, T.; Lan, G.; Wang, J.; Zhao, Q.; Xu, W. Eavesdropping mobile app activity via Radio-Frequency energy harvesting. In Proceedings of the 32nd USENIX Security Symposium (USENIX Security 23), Anaheim, CA, USA, 9–11 August 2023; pp. 3511–3528.
  19. Shapira, T.; Shavitt, Y. BGP2Vec: Unveiling the Latent Characteristics of Autonomous Systems. IEEE Trans. Netw. Serv. Manag. 2022, 19, 4516–4530.
  20. Luckie, M.; Huffaker, B.; Dhamdhere, A.; Giotsas, V.; Claffy, K. AS relationships, customer cones, and validation. In Proceedings of the 2013 Conference on Internet Measurement Conference, Barcelona, Spain, 23–25 October 2013; pp. 243–256.
  21. Shapira, T.; Shavitt, Y. AP2Vec: An unsupervised approach for BGP hijacking detection. IEEE Trans. Netw. Serv. Manag. 2022, 19, 2255–2268.
  22. Kamiyama, N.; Mori, T.; Kawahara, R.; Harada, S.; Hasegawa, H. Analyzing influence of network topology on designing ISP-operated CDN. Telecommun. Syst. 2013, 52, 969–977.
  23. Cheng, M.; Li, Q.; Lv, J.; Liu, W.; Wang, J. Multi-scale LSTM model for BGP anomaly classification. IEEE Trans. Serv. Comput. 2018, 14, 765–778.
  24. Papadopoulos, S.; Moustakas, K.; Tzovaras, D. BGPViewer: Using graph representations to explore BGP routing changes. In Proceedings of the 2013 18th International Conference on Digital Signal Processing (DSP), Santorini, Greece, 1–3 July 2013; IEEE: New York, NY, USA, 2013; pp. 1–6.
  25. Dukler, Y.; Li, W.; Lin, A.; Montúfar, G. Wasserstein of Wasserstein loss for learning generative models. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 1716–1725.
Figure 1. Classical BGP anomaly types.
Figure 2. Characteristics of BGP network. (a) The numbers of nodes and edges over the years; (b) the sparsity degree (i.e., degree/#nodes) of the BGP network over the years; (c) the cumulative distribution function of BGP degree.
Figure 3. Unobserved links by BGP collectors with years.
Figure 4. The framework of the proposed anomaly detection method.
Figure 5. The framework of the proposed anomaly detection method.
Figure 6. Influence of the numbers of subgraphs. (a) Sum of cut edge weights vs. number of subgraphs. (b) Number of cut edges and matrix size vs. number of subgraphs.
Figure 7. Results of topology reconstruction experiments. (a) AUC under varying incompleteness. (b) Results of GNN-based link prediction.
Figure 8. Comparisons of path difference scores between anomalous and normal route changes.
Figure 9. Critical difference diagrams for two metrics. (a) Critical difference in FPR. (b) Critical difference in detection accuracy.
Figure 10. Results of open-world experiments. (a) The number of daily alarms. (b) Specific details of alarms.
Figure 11. Results of the ablation study.
Figure 12. Results of the threshold sensitivity experiment.
Figure 13. The case study of Cloudflare outage. (a) Path difference score at each AS pairing step of DTW. (b) Embedding vectors in a 2-D plane.
Table 1. Key fields and their meanings in BGP RIB and update message.
Field | Meaning
AS Path | AS sequence through which an update message passes in turn.
Prefix | The destined IP prefix comprising IPv4 and IPv6.
Peer AS | The AS from where the update message is received.
Operation | The action to the routing information with the update message, which usually is withdrawal or announcement.
Table 2. The detailed design of node weights and traffic weights.
Customer Cone Count | Node Weight | Reachable IP Address Count | Traffic Weight
[0, 10) | 5 | [0, 100) | 0
[10, 100) | 15 | [100, 1000) | 2
[100, 1000) | 25 | [1000, 10,000) | 4
[1000, 5000) | 35 | [10,000, 100,000) | 6
≥5000 | 45 | ≥100,000 | 8
Table 3. The detailed descriptions of the proposed anomaly dataset.
Event | Date | Type | #Message (M)
Google_hijack | 20050507 | hijack | 0.60
Pakistan_Telecom | 20080224 | hijack | 1.22
US_carrier_Sprint | 20140909 | hijack | 28.10
20140910-AS57807 | 20140910 | hijack | 16.47
H3S_median_services | 20141114 | hijack | 9.42
VolumeDrive | 20151204 | hijack | 16.87
GHOSTnet | 20160221 | hijack | 3.85
Bitcanal | 20180629 | hijack | 10.66
Australia_Telstra | 20120223 | leak | 5.37
Malaysian_Telecom | 20150612 | leak | 12.52
Google_leak | 20170825 | leak | 67.51
Level_3 | 20171106 | leak | 4.39
Allegheny_DQE | 20190624 | leak | 28.88
Cablevision_Mexico | 20210211 | leak | 77.50
Worldstream | 20241030 | leak | 79.69
Facebook | 20211004 | outage | 123.28
Comcast_1 | 20211109 | outage | 68.30
Comcast_2 | 20211109 | outage | 67.18
UA_ISP_BGP | 20220223 | outage | 184.96
Cloudflare | 20220621 | outage | 62.18
Table 4. The details of the baseline methods.
Method | Core Technique(s)
BGPvector | Skip-Gram + Continuous Bag Of Words
ISP-Operated | Statistical features + Self-attention + LSTM
MSLSTM | Wavelet transform + LSTM
BGPviewer | Statistical features
Table 5. The hyperparameter settings of the baseline methods.
Method | Hyperparameter(s)
BGPvector | window = 2, epoch = 20, negative = 5
ISP-Operated | window = 10, batch = 128, epoch = 50, learning rate = 0.0001
MSLSTM | window = 16, batch = 128, epoch = 50, learning rate = 0.001
BGPviewer | window = 10, batch = 128, epoch = 50, learning rate = 0.0001
Table 6. Performance comparison between ours and BGPvector.
Dataset | Ours | BGPvector
GHOSTnet | 0.479 | 0.331
Bitcanal | 0.401 | 0.263
Level_3 | 0.080 | 0.047
Worldstream | 0.138 | 0.156
UA_ISP_BGP | 0.176 | 0.131
Cloudflare | 0.203 | 0.123
Table 7. Definition of experimental results.
Experimental Results | Description
Alarms | The number of time windows where anomalies were detected.
False Alarms | Time windows that were falsely identified as anomalies.
Table 8. Detection results on the 20 real-world datasets.
Per-dataset entries are #Alarms (#FalseAlarms).
Dataset | BGPvector | ISP-Operated | MSLSTM | BGPviewer | Ours
Google_hijack | 18(16) | 0(0) | 1(1) | 18(16) | 11(8)
Pakistan_Telecom | 22(21) | 0(0) | 13(13) | 801(796) | 27(22)
US_carrier_Sprint | 51(50) | 63(49) | 33(27) | 88(81) | 36(23)
20140910-AS57807 | 32(32) | 27(26) | 3(2) | 51(50) | 28(24)
H3S_median_services | 19(17) | 0(0) | 8(8) | 1106(1100) | 26(20)
VolumeDrive | 61(61) | 82(73) | 6(6) | 38(37) | 70(69)
GHOSTnet | 22(16) | 14(10) | 0(0) | 36(33) | 5(0)
Bitcanal | 26(25) | 54(52) | 20(15) | 35(34) | 25(21)
Australia_Telstra | 12(11) | 26(19) | 17(8) | 73(59) | 19(17)
Malaysian_Telecom | 18(17) | 165(122) | 148(115) | 141(126) | 7(0)
Google_leak | 12(11) | 82(67) | 16(2) | 91(84) | 17(13)
Level_3 | 16(16) | 196(103) | 134(89) | 178(142) | 16(15)
Allegheny_DQE | 24(0) | 257(142) | 173(90) | 209(140) | 17(0)
Cablevision_Mexico | 4(4) | 0(0) | 69(69) | 126(119) | 19(17)
Worldstream | 25(21) | 15(0) | 56(50) | 107(96) | 18(9)
Facebook | 22(16) | 506(167) | 296(120) | 647(427) | 19(6)
Comcast_1 | 15(15) | 69(32) | 589(526) | 531(493) | 15(12)
Comcast_2 | 11(10) | 96(46) | 644(580) | 523(498) | 16(15)
UA_ISP_BGP | 61(46) | 738(501) | 1707(1352) | 1155(913) | 61(54)
Cloudflare | 13(6) | 83(16) | 190(165) | 574(519) | 13(6)
FPR | 1.74% | 6.19% | 12.23% | 30.22% | 1.45%
Overall detected | 16/20 | 16/20 | 14/20 | 20/20 | 20/20
Overall #Alarms (#FalseAlarms) | 483(410) | 2473(1325) | 3525(2588) | 6522(5658) | 465(351)
Table 9. The resource occupation of the baselines.
Method | Memory (MB) | Detection Time (ms)
BGPvector | 558 | 0.06
ISP-Operated | 620 | 52
MSLSTM | 545 | 53
BGPviewer | 668 | 50
Ours | 537 | 0.04
Table 10. Experimental results with no weights assigned and with weights assigned.
Entries are #Alarms (#FalseAlarms).
Dataset | No Weights Assigned | Weights Assigned
US_carrier_Sprint | 42(31) | 36(23)
GHOSTnet | 20(14) | 5(0)
FPR | 1.34% | 0.51%
Overall | 62(45) | 41(23)
Table 11. The methods of ablation experiments.
Methods | Method Implementation
M0 | Original hypothesis.
M1 | Perform topology inference on the entire graph.
M2 | Perform topology inference directly on the partitioned subgraph without enhancement.
M3 | The method proposed in this paper.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
