1. Introduction
The Border Gateway Protocol (BGP) is one of the most important protocols for the global network, which is applied to ensure Internet reachability between Autonomous Systems (ASes) [
1]. However, the protocol does not include authentication and validation steps, making global networks more vulnerable to malicious attacks and misconfigurations, such as prefix hijacks, route leaks, and network outages [
2]. Because BGP operates at a global scale, these incidents generally have far more severe consequences than ordinary attacks, spreading rapidly and widely throughout the entire network [
3]. For example, a BGP misconfiguration at Facebook on 4 October 2021 led to a large-scale network outage: Facebook’s services were disconnected from the global network for almost six hours, and its market value shrank by nearly 5% (an estimated loss of more than USD 6 billion) [
4].
To enhance routing security in large-scale networks, two complementary approaches prevail: proactive defense mechanisms that prevent threats preemptively, and passive detection systems that identify ongoing attacks. Proactive defense typically leverages out-of-band trust information to detect abnormal routing messages, such as Route Origin Validation (ROV) [
5] and Autonomous System Provider Authorization (ASPA) [
6], which are built upon the Resource Public Key Infrastructure (RPKI). However, while these schemes offer robust security, achieving comprehensive deployment will require a significant amount of time. Crucially, their partial implementation may introduce new vulnerabilities [
7].
Passive detection leverages AS behavioral patterns—specifically, when an AS detects alterations in its routing information base and propagates updates to neighboring ASes—to identify anomalous routes. This type of method can be further divided into two categories, rule-based [
8] and Artificial Intelligence (AI)-based [
9]. The rule-based method first constructs its rule base and then matches each incoming update message with the rule base. A well-matched update message is recognized as a valid route announcement.
On closed, labeled datasets, these methods typically perform well. Once deployed in a real-world environment, however, their accuracy and real-time performance are often unsatisfactory. We argue that two major challenges limit existing methods in real deployments:
The BGP network is large-scale. Modern Internet topologies encompass approximately 75,000 autonomous systems (ASes) and over 600,000 inter-AS links. Given this scale, anomaly detection systems face substantial challenges, demanding not only high detection accuracy but also low-latency processing to meet the stringent requirements of real-time monitoring.
The BGP network observed by collectors is only a partial view. The AS links collected from vantage points do not capture all existing AS-level connections, resulting in incomplete input data for the detection method [
10]. This limitation can significantly degrade anomaly detection performance, particularly in terms of detection accuracy and false alarm rates.
In this paper, we present a routing anomaly detection system named GLBAD designed to address the aforementioned problems, aiming to enhance both accuracy and real-time performance. The contributions of our paper can be summarized below:
A BGP-dedicated Partition Scheme. To cope with the large topology and the high volume of BGP messages, we develop a multi-level graph partition scheme. This partition method leverages the characteristics of the BGP network, specifically its power-law degree distribution and sparsity, to divide the BGP graph while balancing load and minimizing information loss.
An Adaptive Structure Inference Method. To overcome the problem of partial view, we propose a topology inference method designed for incomplete structures. Additionally, the adjacency matrix is learned adaptively during training to obtain better graph and node embeddings.
A Complete BGP Anomaly Detection System. We propose a complete BGP anomaly detection system that targets real-time detection. The system retrieves paths, aligns them, detects anomalies, and locates root causes.
Comprehensive Experiments. We evaluate our method and its baselines on closed-world and open-world datasets. The results demonstrate that our method is effective and performs best in terms of accuracy and time performance.
2. Background and Problem Statement
In this section, we present the relevant preliminary knowledge and outline the research problems.
2.1. BGP Hijack
According to the identified vulnerabilities, BGP anomalies can be categorized into prefix hijacks, route leaks, and network outages. The specific models are shown in
Figure 1.
Prefix Hijack. The attacker configures its AS router to advertise prefixes owned by other ASes, thereby attracting traffic destined for the legitimate owners to its own AS. In
Figure 1a, attacker AS4 forges ownership of prefix P from AS1. As seen from the observation point, the AS path to prefix P shifts from (AS3, AS2, AS1) to (AS3, AS2, AS4).
Route Leak. RFC 7908 [
11] defines such BGP anomalies as route announcements that propagate beyond their intended scope. In
Figure 1b, attacker AS2 leaks AS1’s prefix to its upstream provider, AS4. This detour from the expected path has a direct impact on network traffic.
Network Outage. ASes exchange reachability information via BGP keep-alive messages at fixed intervals. Misconfigurations, disasters, or political events can disrupt AS connectivity, causing unreachable paths or disconnections. In
Figure 1c, AS2 loses connection to AS1 and AS3, rendering path (AS3, AS2, AS1) unreachable. AS1 must then select an alternative path.
2.2. Characteristics of BGP Network
The global BGP network includes approximately 70,000 ASes linked by about 600,000 AS-level connections. As shown in
Figure 2a, the Internet’s inter-domain topology has consistently expanded over time, in both the number of ASes and inter-AS links. This growth, combined with the large volume of BGP updates, presents significant challenges for current anomaly detection methods, making real-time analysis increasingly difficult.
Then, we also calculate the average degree of the BGP network in
Figure 2b. We can observe that the AS-level connectivity remains sparse (the average degree of AS is at a very low level, i.e., 0.0003).
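As a rough consistency check on the sparsity claim, the figures quoted at the start of this subsection (about $n = 70{,}000$ ASes and $m = 600{,}000$ links) can be plugged into the standard definitions of mean degree $\bar{k}$ and graph density $\rho$. Reading the reported 0.0003 as a density-style (normalized) quantity is an assumption here, since the normalization is not stated in the text.

```latex
\bar{k} = \frac{2m}{n} = \frac{2 \times 600{,}000}{70{,}000} \approx 17.1,
\qquad
\rho = \frac{2m}{n(n-1)} = \frac{1{,}200{,}000}{70{,}000 \times 69{,}999} \approx 2.4 \times 10^{-4}.
```

Under this reading, an AS is linked to only about 0.02% of all other ASes on average, which is of the same order as the value quoted above.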
Finally, we measure the cumulative distribution of the AS degree for each year in
Figure 2c. The BGP network has a highly skewed degree distribution, and AS degrees follow a power-law distribution.
These characteristics of the BGP network negatively affect anomaly detection. The detailed causes are discussed in
Section 2.4.
2.3. Partial View of BGP Routing Data
What Is Partial View
Although BGP routing data is extensive, these open-sourced data are drawn from collectors that provide only a partial view. BGP collectors are distributed unevenly, with a predominant presence in Europe and North America [
12]. As a result, some links and ASes remain unobserved, and the proportion that is unobserved varies over time.
Figure 3 illustrates the calculated percentages of unobserved AS links from 2016 to 2024.
All of the above factors contribute to the inability of existing methods to perform effectively in a real-world environment. Thus, this paper presents a novel BGP anomaly detection framework that aims for accurate and timely detection.
2.4. Problem Statement
The primary focus of this work is to detect anomalies accurately and in a timely manner. The BGP network can be modeled as an attributed graph $G = (V, E, X)$, where $V = \{v_1, \dots, v_n\}$ is the set of $n$ nodes, $E$ is the set of $m$ edges, and $X \in \mathbb{R}^{n \times d}$ is the attribute matrix. The structure of the graph can also be denoted by the adjacency matrix $A \in \mathbb{R}^{n \times n}$. Specifically, $A_{ij} = 1$ if node $v_i$ and node $v_j$ are joined by an edge, and $A_{ij} = 0$ if not. The Laplacian matrix $L$ of the graph is defined as $L = D - A$, where $D$ is the degree matrix. Facing such a large topology, most detection systems based on graph learning consume a significant amount of time, which prevents them from achieving real-time detection. Thus, we aim to partition the BGP graph and then learn on each subgraph separately.
However, partitioning the BGP graph is difficult due to its inherent characteristics. First, the graph is sparse, so an arbitrary split of its AS links causes significant information loss for the affected AS nodes, and the resulting detection results are typically inaccurate. Second, the graph has a highly skewed degree distribution, which leads to load imbalance if existing methods are used to partition it. Third, as discussed above, the observed BGP graph represents only a subset of the actual BGP network, and anomaly detection on this incomplete graph is more likely to produce inaccurate judgments and decisions.
Thus, in this paper, we design a dedicated graph partition method for the BGP graph that considers load balance and minimizes information loss, in order to achieve real-time anomaly detection. Additionally, we devise a topology structure inference method that restores the incomplete, partial-view graph by learning the attributes and relationships of the ASes.
3. Related Work
In recent years, researchers have conducted extensive studies on BGP anomaly detection. Existing BGP anomaly detection methods can be categorized into three main types: rule-based methods, artificial intelligence-based methods, and active probing-based methods.
Active Probing-Based Detection Methods. This method actively sends traffic into the network and detects whether a certain IP or prefix is reachable by analyzing the traffic reception and stability. Schlamp et al. [
13] enhance this approach by utilizing encrypted traffic based on the Secure Sockets Layer/Transport Layer Security (SSL/TLS) protocol to test the reachability of abnormal IP addresses, which reduces the false positive rate. By comparing public key changes before and after abnormal events, they distinguish between benign and suspicious events. The advantage of this method, unlike others, is that it can help assess the extent of an event’s development if the event’s source is known. In contrast, Trinocular [
14] introduces an adaptive probing mechanism that reduces the frequency of active traffic injection, lowering network load. However, this approach cannot achieve comprehensive network detection, unlike the broader reach of previous methods.
Rule-Based Detection Methods. These methods detect anomalies by matching against legitimate rules. If the information in a routing update message does not match any valid rule, an alert is triggered. Artemis [
8], as a typical routing logic method, has been widely used for prefix hijacking detection. Its advantage lies in its good interpretability, providing a clear logical explanation for each detection decision. However, routing logic methods are heavily dependent on the accuracy of the data. If the input data source is inaccurate or incomplete, the detection results may be significantly affected. Especially when facing partial data loss or inconsistencies, this type of method is prone to false positives or missed detections.
Artificial Intelligence-Based Detection Methods. These methods use artificial intelligence techniques to model network traffic or routing data, identifying abnormal patterns to detect network faults or attack events. Researchers have used unsupervised learning algorithms to detect network anomalies by clustering network behavior into different clusters and identifying behaviors that deviate from the normal cluster as anomalies [
1]. Li et al. [
15] combined supervised and unsupervised learning methods to propose an ensemble learning-based anomaly detection framework, utilizing multiple trained base classifiers and unsupervised learning for integration, improving detection accuracy and robustness. The advantage of these methods is that they can automatically learn complex patterns of network behavior. However, machine learning methods typically require large amounts of labeled data for training, and their performance is highly dependent on the quality of the data and the effectiveness of feature engineering. Furthermore, the black-box effect of deep learning models makes it difficult to interpret the detection results, thereby limiting their trustworthiness in practical applications [
16].
GLBAD is a passive anomaly detection framework. It complements proactive tools such as RPKI and ASPA. It detects suspicious path behavior on partially protected prefixes. GLBAD then supplies candidate events to ARTEMIS-like systems. To handle conflicts from misidentified AS relationships, GLBAD sets conservative thresholds and groups high-risk alerts for operator confirmation. This approach keeps policy conflicts in check and supports the other mechanisms. In a broader context of network security, addressing both path anomalies and covert threats contributes to more comprehensive detection capabilities.
Encrypted traffic and wireless side-channel analyses are closely tied to application and behavior identification for intrusion detection. For example, FOAP [
17] targets open-world Android traffic fingerprinting. After filtering irrelevant flows, it can infer fine-grained UI-associated user operations. This demonstrates that encrypted traffic still leaks information about applications and behaviors. Building on this, AppListener [
18] shows that, even without packet capture, one can identify applications and their in-app activities using only passive harvesting of Wi-Fi RF energy variations. This broadens the range of exploitable side channels. Together, these studies indirectly show that even with incomplete observations and noise, anomalous behaviors can still be detected via deviations in temporal or structural patterns.
4. Method
In this section, we introduce our proposed method, GLBAD, and show how it addresses the existing challenges of BGP anomaly detection.
4.1. Overview
The framework of the proposed method is illustrated in
Figure 4, comprising route collection, graph construction, graph partition, structure inference, and ultimately, anomaly detection. In this part, we first introduce the process of route collection. Then, we construct the graph through the route. To accommodate such a large BGP network, we propose a graph partition scheme that enables partitioning a large graph into several similar and properly sized subgraphs. To address the issue of partial visibility of public vantages, we propose an adaptive structure inference method to compensate for the missing links. Finally, we develop a comprehensive anomaly detection system to enable real-time detection.
4.2. Route Collection
The BGP route data generally include BGP routing information base (RIB) snapshots and BGP update messages, which are publicly collected by multiple globally distributed vantage points. These data are saved in the binary MRT format. Thus, before use, they must be decompressed and parsed into a human-readable format. The BGP RIB entries and update messages share the same fields. The detailed descriptions of the route fields are listed in
Table 1. The RIB data are the route tables of the vantage points, updated every 8 h. The update messages are triggered by changes to the route table and sent to neighbors.
Using this collected data, we can model the BGP network as a graph at time t as follows. Building on this, we detail how the network structure and attributes are constructed.
4.3. Graph Construction
We use the AS links extracted from AS paths to construct the structure of $G$, i.e., the adjacency matrix $A$. Next, the AS attribute matrix is constructed using AS attribute information, which comprises geo-location information, the number of update messages destined for and crossed by the specific AS, and the semantic role extracted by BGPvector [19]. Thus, the vector size of each AS is $d$. To provide more detail, the following table describes each AS attribute.
However, the constructed graph is too large to be processed in a timely manner by the downstream anomaly detection method. Thus, we devise a customized graph partition method below.
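As a rough illustration of the structure-building step in this subsection, the sketch below extracts undirected AS links from AS paths and accumulates them into an adjacency map. The function names (`links_from_path`, `build_graph`) and the toy paths are hypothetical, and collapsing consecutive duplicate ASNs (AS-path prepending) is an assumption of this sketch rather than a step stated above.

```python
from collections import defaultdict


def links_from_path(as_path):
    """Return the undirected AS links on one AS path."""
    # Assumption: collapse consecutive duplicates caused by AS-path prepending.
    hops = [asn for i, asn in enumerate(as_path) if i == 0 or asn != as_path[i - 1]]
    return {tuple(sorted(pair)) for pair in zip(hops, hops[1:])}


def build_graph(as_paths):
    """Build an adjacency map {asn: set(neighbours)} from a list of AS paths."""
    adjacency = defaultdict(set)
    for path in as_paths:
        for u, v in links_from_path(path):
            adjacency[u].add(v)
            adjacency[v].add(u)
    return dict(adjacency)


# Toy paths loosely modeled on the Figure 1a example (second path is prepended).
paths = [[3, 2, 1], [3, 2, 2, 4]]
graph = build_graph(paths)
```

A dense adjacency matrix can be derived from this map when needed; the sparse map is used here only because the AS-level graph is sparse.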
4.4. The Proposed Graph Partition on BGP Graph
Due to the scale of the BGP routing topology, graph partitioning is used to divide it into subgraphs, enabling downstream tasks to be computed separately on each subgraph. During partitioning, node counts should be balanced across subgraphs to ensure computational load balancing. To reduce information loss, the number of edges cut should be minimized. When cutting edges, the goal is to cut those with the least information. Thus, BGP topology partitioning is a graph partitioning problem: partition a large graph $G$ into $k$ subgraphs $\{G_1, G_2, \dots, G_k\}$, where the objective is to minimize edge cut weight while balancing node counts among subgraphs. This problem can be formally described as
$$\min \sum_{1 \le i < j \le k} W(G_i, G_j) \quad \text{s.t.} \quad \bigcup_{i=1}^{k} V_i = V, \quad V_i \cap V_j = \emptyset \ (i \neq j), \quad |V_i| \approx \frac{|V|}{k}.$$
In the equation, $V$ represents the vertex set of $G$, $G_i$ is a subgraph of $G$, $V_i$ is the vertex set of $G_i$, and $W$ represents the edge cut weight between two subgraphs.
This paper uses a multilevel graph partitioning algorithm to efficiently partition the BGP topology. The execution of the algorithm can be summarized in three core steps: coarsening, initial partitioning, and uncoarsening with refinement.
To achieve the optimization objective, this paper proposes an edge-weight design based on weight fusion. First, the AS node weight is determined by the size of each AS’s Customer Cone, where the size of the Customer Cone [
20] is inferred from the BGP paths using CAIDA’s AS relationship inference algorithm. ASes with large Customer Cones play a significant role in the capital and governance structure of the Internet. In the routing table data, we count the number of reachable destination network addresses formed by each pair of ASes in the AS path, representing the traffic of that edge. The traffic weight is designed based on the distribution of traffic size. The number of ASes in each AS’s Customer Cone and the number of network prefixes it owns both follow a power-law distribution. Therefore, we design the weight function in a piecewise manner. Additionally, we aim for the node weight to play a primary role in the edge weight design. Hence, the edge weight is designed as the sum of the weights of the adjacent nodes and the traffic weight, with the node weight being greater than the traffic weight. The designs of node weights and traffic weights are shown in
Table 2.
The edge weight calculation formula is as follows:
$$w(u, v) = w_u + w_v + w_t(u, v),$$
where $w_u$ and $w_v$ represent the node weights of the two ASes, and $w_t(u, v)$ is the traffic weight between the two ASes.
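The weight fusion described above can be sketched as follows. The breakpoints and weight values are hypothetical placeholders (the actual piecewise design is the one given in Table 2); they are chosen only so that node weights dominate traffic weights, as the text requires.

```python
import bisect

# Hypothetical piecewise buckets; NOT the values from Table 2.
CONE_BREAKS = [10, 100, 1000]     # customer-cone size thresholds
NODE_WEIGHTS = [10, 20, 30, 40]   # node weight for each bucket
TRAFFIC_BREAKS = [100, 10_000]    # reachable-prefix count thresholds
TRAFFIC_WEIGHTS = [1, 2, 3]       # traffic weight for each bucket


def piecewise(value, breaks, weights):
    """Map a heavy-tailed value to a small weight via step buckets."""
    return weights[bisect.bisect_right(breaks, value)]


def edge_weight(cone_u, cone_v, prefix_count):
    """Edge weight = node weight(u) + node weight(v) + traffic weight."""
    w_u = piecewise(cone_u, CONE_BREAKS, NODE_WEIGHTS)
    w_v = piecewise(cone_v, CONE_BREAKS, NODE_WEIGHTS)
    w_t = piecewise(prefix_count, TRAFFIC_BREAKS, TRAFFIC_WEIGHTS)
    return w_u + w_v + w_t
```

Step buckets are used because both inputs are power-law distributed, so a linear scale would let a handful of very large ASes dominate every weight.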
4.4.1. Graph Coarsening
During the coarsening process of the graph, by merging adjacent nodes, the original graph $G_0$ is gradually reduced to a sequence of smaller graphs $G_1, G_2, \dots, G_m$, where it is evident that $|V_0| > |V_1| > \dots > |V_m|$. As the coarsening operation proceeds in stages, a set of nodes in graph $G_i$ will be represented as a single node in graph $G_{i+1}$. Specifically, let $V_i^v$ be the set of nodes of graph $G_i$ merged during coarsening, corresponding to node $v$ in graph $G_{i+1}$. To ensure load balancing during graph partitioning, the weight of node $v$ in the coarser graph $G_{i+1}$ is set as the sum of the weights of all nodes in the set $V_i^v$ from the current level graph $G_i$. Moreover, when multiple nodes in $V_i^v$ are connected to the same vertex $u$, the weight of the edge between $v$ and $u$ in graph $G_{i+1}$ will be the sum of the weights of those edges.
Metis provides several edge-matching strategies for coarsening. For BGP topology partitioning, heavy-edge matching is used: heavier (more important) edges are collapsed inside merged nodes, so that only less important edges remain to be cut, minimizing the impact on subsequent routing anomaly detection. Heavy-edge matching follows a greedy approach, prioritizing edges with larger weights when constructing the coarse graph. Let $M$ denote the (initially empty) matching; unmatched vertices are visited in random order, and for each unmatched vertex $u$, the unmatched adjacent vertex $v$ that maximizes the edge weight is selected and the edge between $u$ and $v$ is added to the matching:
$$v = \arg\max_{v' \in N(u),\ v' \text{ unmatched}} w(u, v').$$
Here, $w(u, v)$ represents the weight of the edge connecting nodes $u$ and $v$, and $N(u)$ denotes the neighbors of $u$. The matched vertices $u$ and $v$ are merged into a single node, which becomes a new node in the coarser graph. This process is repeated until no more vertices can be matched or the pre-set coarsening threshold is met.
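A minimal sketch of one heavy-edge matching pass, written independently of the Metis implementation, is given below. The graph representation (`adj` as a nested dict of edge weights) and the function name are assumptions of this sketch.

```python
import random


def heavy_edge_matching(adj, seed=0):
    """One matching pass over an undirected weighted graph.

    adj: {u: {v: weight}}. Returns a dict mapping each matched vertex
    to its partner; unmatched vertices are absent from the result.
    """
    rng = random.Random(seed)
    order = list(adj)
    rng.shuffle(order)  # visit vertices in random order
    match = {}
    for u in order:
        if u in match:
            continue
        # Among unmatched neighbours, pick the one joined by the heaviest edge.
        candidates = [(w, v) for v, w in adj[u].items() if v not in match and v != u]
        if candidates:
            _, v = max(candidates)
            match[u], match[v] = v, u
    return match


# Toy graph: edges 1-2 (weight 5) and 3-4 (weight 7) are heavy, the rest light.
adj = {1: {2: 5, 3: 1}, 2: {1: 5, 4: 1}, 3: {1: 1, 4: 7}, 4: {2: 1, 3: 7}}
matching = heavy_edge_matching(adj)
```

On this toy graph the heavy edges are always the ones collapsed, whatever the visiting order, so only the two light edges remain candidates for cutting.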
4.4.2. Graph Partition
After coarsening, the graph enters the initial partitioning phase, which performs a high-quality bisection. The partition is computed on the coarsest graph so that each part contains roughly half of the original vertices. During coarsening, vertex and edge weights accurately represent the finer-level graph. Thus, the coarsest graph $G_m$ provides enough information to balance the partition and minimize edge cut cost. To achieve load balancing and reduce edge cut loss, a graph-growing method expands an initial vertex into a partition until the size constraint is met.
4.4.3. Graph Uncoarsening
In this stage, the partition of the coarse graph is mapped back to the original graph through a layer-by-layer traversal of the graphs $G_{m-1}, G_{m-2}, \dots, G_0$. For a vertex $v$ in graph $G_{i+1}$, its partition corresponds to the partition of the set $V_i^v$ in the previous layer graph $G_i$ where the merged nodes are located. Thus, the partition of the merged node $v$ is assigned to all nodes in $V_i^v$, which allows partition $P_i$ to be derived from partition $P_{i+1}$.
Although $P_{i+1}$ is a locally optimal partition for graph $G_{i+1}$, the projected partition $P_i$ may not necessarily be optimal for graph $G_i$. Since $G_i$ is a finer-level graph, it has more degrees of freedom that can be used to further improve the partition $P_i$. Therefore, the partition of $G_i$ can be optimized using a local refinement heuristic. After each projection, the algorithm applies an optimization step to the projected partition. The partition optimization algorithm is primarily based on the Kernighan-Lin algorithm, where boundary vertices are swapped to reduce the cut. The gain $g_v$ after swapping is given by the following formula:
$$g_v = \sum_{(v, u) \in E,\ P[u] \neq P[v]} w(v, u) \; - \sum_{(v, u) \in E,\ P[u] = P[v]} w(v, u).$$
Here, $g_v$ is the gain from swapping vertex $v$, measured by the reduction in cut edge weight.
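The gain used in Kernighan-Lin-style refinement can be illustrated for a bisection as the external minus the internal edge weight of a vertex. The sketch below uses hypothetical helper names and a toy graph; it computes the gain of moving a single boundary vertex and checks it against the change in cut weight.

```python
def move_gain(v, adj, part):
    """Gain of moving v to the other side of a bisection:
    weight of its cut (external) edges minus weight of its internal edges."""
    external = sum(w for u, w in adj[v].items() if part[u] != part[v])
    internal = sum(w for u, w in adj[v].items() if part[u] == part[v])
    return external - internal


def cut_weight(adj, part):
    """Total weight of edges crossing the partition (each edge counted once)."""
    return sum(w for u in adj for v, w in adj[u].items()
               if u < v and part[u] != part[v])


adj = {1: {2: 5, 3: 1}, 2: {1: 5, 4: 1}, 3: {1: 1, 4: 7}, 4: {2: 1, 3: 7}}
part = {1: 0, 2: 1, 3: 0, 4: 1}

before = cut_weight(adj, part)
gain = move_gain(2, adj, part)
part[2] = 0                      # apply the move
after = cut_weight(adj, part)    # equals before - gain
```

A positive gain means the move shrinks the cut; a full refinement pass would also enforce the balance constraint, which this sketch omits.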
4.4.4. Subgraph Enhancement
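To address information loss from edge cuts, each subgraph generated by Metis partitioning is enhanced. For each subgraph, the adjacent nodes of its boundary nodes are added as virtual nodes. The corresponding edge cut information is also retained. This process allows the local subgraph to reflect cross-partition structural relationships. Formally, let the original graph be $G = (V, E)$. After partitioning, the subgraphs form the set $\{G_1, G_2, \dots, G_k\}$, where each $G_i = (V_i, E_i)$ and the set of edge cuts is $E_{cut} = \{(u, v) \in E : u \in V_i,\ v \in V_j,\ i \neq j\}$. For any subgraph $G_i$, its boundary node set is $B_i = \{u \in V_i : \exists\, v \notin V_i,\ (u, v) \in E_{cut}\}$. For every boundary node $u \in B_i$, its external node set is $N_{ext}(u) = \{v \notin V_i : (u, v) \in E_{cut}\}$. Within $G_i$, for each $v \in N_{ext}(u)$, a virtual node $\tilde{v}$ is added, resulting in the set of virtual nodes:
$$\tilde{V}_i = \bigcup_{u \in B_i} \{\tilde{v} : v \in N_{ext}(u)\}.$$
Retaining the edge cut information, that is
$$\tilde{E}_i = \{(u, \tilde{v}) : u \in B_i,\ v \in N_{ext}(u)\}.$$
The enhanced subgraph is
$$\tilde{G}_i = (V_i \cup \tilde{V}_i,\ E_i \cup \tilde{E}_i).$$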
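The enhancement step of this subsection can be sketched as below: every cut edge is kept in both subgraphs it touches, and its far endpoint is recorded as a virtual node. The data layout (a dict with `nodes`, `virtual`, and `edges` sets per subgraph) and the function name are assumptions made for illustration.

```python
def enhance_subgraphs(edges, part):
    """Attach virtual nodes and cut edges to each subgraph.

    edges: iterable of (u, v) pairs; part: {node: subgraph index}.
    Returns {index: {"nodes": set, "virtual": set, "edges": set}}.
    """
    subgraphs = {}
    for node, idx in part.items():
        sg = subgraphs.setdefault(
            idx, {"nodes": set(), "virtual": set(), "edges": set()})
        sg["nodes"].add(node)
    for u, v in edges:
        pu, pv = part[u], part[v]
        if pu == pv:
            subgraphs[pu]["edges"].add((u, v))
        else:
            # Cut edge: keep it on both sides; the far endpoint becomes virtual.
            subgraphs[pu]["edges"].add((u, v))
            subgraphs[pu]["virtual"].add(v)
            subgraphs[pv]["edges"].add((u, v))
            subgraphs[pv]["virtual"].add(u)
    return subgraphs


# Toy chain 1-2-3-4 split into {1, 2} and {3, 4}; edge (2, 3) is cut.
result = enhance_subgraphs([(1, 2), (2, 3), (3, 4)], {1: 0, 2: 0, 3: 1, 4: 1})
```

Virtual nodes are kept in a separate set so that they can be excluded from load-balance counts and from per-subgraph detection outputs if desired.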
4.5. Structure Inference
The BGP network obtained by the public collectors is incomplete, so AS links or even AS nodes are missing. This incompleteness can mislead the anomaly detection method into producing erroneous results. In this part, we reconstruct the structure of the BGP network to recover a more complete BGP graph.
Our reconstruction model includes the graph convolutional encoder, the decoder, and the Laplacian structure.
4.5.1. Encoder Module
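The encoder module learns a layer-wise transformation by a spectral graph convolutional function $f(H^{(l)}, A \mid W^{(l)})$, i.e.,
$$H^{(l+1)} = f(H^{(l)}, A \mid W^{(l)}) = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right).$$
Here, $\tilde{A} = A + I$, $\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}$.
$I$ is the identity matrix of size $n \times n$ and $\sigma(\cdot)$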
is the activation function (we use
function in this paper).
4.5.2. Decoder Module
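The decoder module reconstructs the graph from the learned latent representations $Z$, as follows:
$$\hat{A} = \mathrm{sigmoid}\left(Z Z^{\top}\right),$$
where $Z$ is the latent representation and $\hat{A}$ is the reconstructed adjacency matrix.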
4.5.3. Loss Function
Our devised loss function comprises reconstruction bias and latent structure bias. Firstly, we impose a greater penalty on the reconstruction error of the non-zero elements than that of the zero elements.
We simultaneously consider the latent representation output by the encoder.
where
is the regulation factor. This Laplacian loss helps adaptively learn the graph structure by penalizing vertices that are linked by an edge but lie far apart in the embedding space. Thus, the Laplacian loss causes vertices linked by an edge to be mapped close together in the embedding space.
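The effect described above follows from a standard identity for the unnormalized graph Laplacian $L = D - A$ of a symmetric adjacency matrix. The derivation below assumes the Laplacian term takes the common trace form $\operatorname{tr}(Z^{\top} L Z)$, with $z_i$ denoting the $i$-th row of $Z$; the exact form and scaling used in GLBAD may differ.

```latex
\operatorname{tr}\!\left(Z^{\top} L Z\right)
= \sum_{i} D_{ii}\,\lVert z_i \rVert^{2} - \sum_{i,j} A_{ij}\, z_i^{\top} z_j
= \frac{1}{2} \sum_{i,j} A_{ij}\,\lVert z_i - z_j \rVert^{2}.
```

Each linked pair ($A_{ij} = 1$) contributes its squared embedding distance, so minimizing this term pulls connected ASes together, while unlinked pairs ($A_{ij} = 0$) contribute nothing.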
Additionally, we add a regularization term to prevent overfitting.
The total loss is as follows:
We update the adjacency matrix using the learned graph structure and the initial graph structure as follows:
$$A = (1 - \eta) A_0 + \eta \hat{A},$$
where $A_0$ is the initial adjacency matrix, $\hat{A}$ is the learned adjacency matrix, and $\eta$ balances their weights. We update the graph every $n$ epochs. We set a threshold $\tau$ so that updates stop once the epoch exceeds $\tau$.
The whole process of adaptive structure learning is shown in
Figure 5 and Algorithm 1.
Algorithm 1: Adaptive Graph Learning Algorithm
4.6. Anomaly Detection
We use the monitor AS and destination prefix as keys to extract paths from routing announcements and RIB snapshots, then identify suspicious routing changes by comparing path differences. To quantify path changes, we employ dynamic time warping (DTW) in conjunction with the mean cosine distance, which effectively measures the overall discrepancy between two ordered sequences of unequal lengths. To formalize this calculation, for a vertex $s_i$ on path $S$ and a vertex $s'_j$ on path $S'$, we compute the cosine distance between their embedding vectors:
$$d(s_i, s'_j) = 1 - \frac{\mathbf{z}_{s_i} \cdot \mathbf{z}_{s'_j}}{\lVert \mathbf{z}_{s_i} \rVert \, \lVert \mathbf{z}_{s'_j} \rVert}.$$
Here, $\mathbf{z}_{s_i}$ and $\mathbf{z}_{s'_j}$ denote the vector representations of the corresponding ASes. DTW employs dynamic programming to identify an alignment (warping path) that minimizes the cumulative cosine distance, which we take as the final path-difference score. If this score exceeds a dynamic threshold, estimated from the empirical distribution of historically normal changes, the routing change is deemed suspicious. The pseudocode for the DTW procedure is presented in Algorithm 2.
Algorithm 2: Anomaly Detection Algorithm
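The path-difference score of this subsection can be sketched with a textbook DTW recurrence over cosine distances. This is an illustrative version, not a transcription of Algorithm 2: the embeddings are toy 2D vectors, the function names are hypothetical, and the score returned is the cumulative alignment cost without any length normalization or threshold estimation.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def path_difference(path_a, path_b, emb):
    """DTW cost between two AS paths under cosine distance of AS embeddings."""
    n, m = len(path_a), len(path_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(emb[path_a[i - 1]], emb[path_b[j - 1]])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy embeddings: AS4 points in the opposite direction to AS1.
emb = {1: [1.0, 0.0], 2: [1.0, 1.0], 3: [0.0, 1.0], 4: [-1.0, 0.0]}
same = path_difference([3, 2, 1], [3, 2, 1], emb)
hijack = path_difference([3, 2, 1], [3, 2, 4], emb)
```

In this toy case the unchanged path scores approximately zero, while replacing the origin AS1 with the dissimilar AS4 yields a score close to 2, the maximum cosine distance for a single hop.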
4.7. Complexity Analysis
The total time complexity of the graph partitioning algorithm is the sum of the complexities of its three phases. In the coarsening phase, the algorithm operates with a time complexity of per coarsening level, repeated times, as the number of vertices is roughly halved at each level. The partitioning phase has a complexity of . In the refinement phase, the time complexity is , where the number of iterations depends on the number of refinement passes. Given these factors, Metis generally operates with a time complexity of in typical cases, making it particularly efficient for large, sparse graphs compared to spectral methods.
The topology inference algorithm effectively balances convergence speed and training stability by dynamically adjusting the learning rate. During the iteration process, parameter updates will continue within the maximum number of iterations T until the convergence condition is met or the stopping threshold is reached. Regarding the adaptive learning mechanism of the adjacency matrix, we further examine its computational complexity during the training process. The time complexity of this process is , where d represents the feature dimension of the nodes, n is the total number of nodes, and t is the number of iterations.
The anomaly detection method is based on the Dynamic Time Warping (DTW) algorithm, which has a time complexity of $O(NM)$, where $N$ and $M$ are the lengths of the two sequences. It measures the similarity between the two sequences by computing the distance matrix using cosine distance and performing dynamic programming.
5. Experiments
This section evaluates anomaly detection in real BGP network settings. Experimental details are as follows:
Datasets. To assess detection accuracy, we integrate multiple BGP-related data sources—RIB, UPDATES, and AS business-relationship data—building upon the experimental framework introduced above. Next, we present the process for assembling our evaluation datasets. We collected 20 historical routing anomaly reports from 2005 to 2024, including 8 prefix hijacking events, 7 route leak events, and 5 network outage events, as detailed in
Table 3. For each anomaly, we retrieved verifiable information from authoritative sources, such as the anomaly time and affected network prefixes. Using this information, we extracted all routing announcements for a 6 h period before and after each anomaly from RIPE RIS, resulting in 20 datasets with a total data volume of 340 GB. In addition, data for May 2025 were collected for the open-world experiment, resulting in a total data volume of 3.1 TB.
Experimental environment. To ensure experimental consistency, all experiments are run on a server with an Intel(R) Xeon(R) Platinum 8352Y 64-core CPU, Linux OS, and 219 GB RAM. For fair comparison, both our method and all baselines are implemented in Python 3.9.
Baselines. For comparative evaluation, we include four additional inter-domain routing anomaly detection methods—BGPvector [
21], ISP-Operated [
22], MSLSTM [
23], and BGPviewer [
24].
Table 4 summarizes their core techniques, and
Table 5 presents their hyperparameter settings.
5.1. Research Questions
This section outlines the approach to anomaly detection in real-world BGP networks, presenting the evaluation framework and specifying the major research questions addressed.
RQ1: First, we consider graph partitioning effectiveness. The primary goal is to evaluate the partition quality of Metis when dividing the graph into multiple subgraphs.
RQ2: Next, we examine the accuracy of structure inference. The primary goal is to test the ability of adaptive graph learning to recover edges.
RQ3: Finally, we focus on anomaly detection using real-world datasets. The primary goal is to evaluate the performance of our method during real incidents.
5.2. BGP Topology Partitioning (RQ1)
We build the network from the October 2024 business-relationship table and set edge weights using the weighting scheme of Section 4.4. We split the graph into 2 to 10 parts and record the total cut edge weight and the number of edges cut.
Next, we analyze the partitioning results. From
Figure 6a, the total cut edge weight increases with the number of subgraphs, peaking at nine and then stabilizing.
In Figure 6b, the red curve indicates the subgraph adjacency-matrix size (reflecting memory use), while the blue curve shows the number of cut edges. Since subgraph size relates to memory consumption, partitioning must balance memory usage and information loss due to cuts. Overall, partitioning into four subgraphs yields the best trade-off.
5.3. Topology Reconstruction (RQ2)
Across datasets, we train all autoencoder models for 100 iterations using Adam with a learning rate of 0.001. The adaptive-learning mixing ratio is set to 10%, with the number of adaptive updates t limited to 10–15. The regularization parameter is set to 0.01, and the parameter in the weight matrix W is set to 36. We report AUC for link-recovery quality. Data are split into training, validation, and testing sets. Using the 2024 route-leak event as an example, we partition the global BGP topology into four subgraphs and run link prediction at varying levels of incompleteness per subgraph.
As shown in
Figure 7a, despite changes in topological incompleteness, link-inference accuracy remains around 0.72 AUC, indicating robust recovery performance. Using the same dataset, we performed link prediction based on GNNs, and the results are shown in
Figure 7b. The four subgraphs yield an average AUC of 0.68, compared with roughly 0.72 for our method, indicating that our method achieves superior link prediction accuracy.
5.4. Path Difference Score Analysis (RQ3)
We analyze real routing announcements to quantify both legitimate and anomalous route changes using the path-difference score. For each event, we obtain authoritative ground truth (e.g., time and affected prefixes) and extract all announcements within a ±6 h window, yielding 20 datasets. Route changes cover origin changes (different origin AS) and path changes (same origin AS, different traversed AS path).
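The two kinds of route change distinguished above can be told apart with a simple comparison of the old and new AS paths. The helper below is a hypothetical illustration; it assumes non-empty paths ordered from the monitor AS to the origin AS, so the origin is the last element.

```python
def classify_change(old_path, new_path):
    """Classify a route change by comparing the previous and new AS paths.

    Both paths are non-empty lists of ASNs ordered from monitor to origin.
    """
    if old_path == new_path:
        return "no_change"
    if old_path[-1] != new_path[-1]:
        return "origin_change"   # different origin AS
    return "path_change"         # same origin AS, different traversed path


origin_case = classify_change([3, 2, 1], [3, 2, 4])
path_case = classify_change([3, 2, 1], [3, 5, 1])
```

Only the changed routes are then passed on to path-difference scoring; identical paths can be skipped.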
Figure 8 shows that anomalous route changes have significantly higher path-difference scores than normal changes, implying that anomalies markedly alter AS roles along paths. Our method also yields higher anomaly scores than BGPvector on real-world datasets.
To fairly compare separability for normal vs. anomalous BGP messages, we first apply a Fisher transform to cosine-similarity scores to remove dimensional effects, then compute the area between the two empirical CDFs (1D Wasserstein-1 distance) [
25]. Results are summarized below.
The Wasserstein-1 distance (W1) quantifies the difference between the distributions of normal and anomalous routing paths. It can be interpreted as a measure of effect size, with a larger value indicating a greater distinction between the normal and anomalous paths in the overall distribution. According to Table 6, our method yields substantially larger W1 values, i.e., larger areas between the empirical cumulative distribution functions (ECDFs), than BGPvector, indicating better discrimination between normal and anomalous routes.
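The separability measure described above can be sketched in a few lines. This is an illustrative version under stated assumptions: the clipping constant `eps`, the function names, and the toy similarity values are not taken from our experiments.

```python
import math
from bisect import bisect_right
from typing import Sequence


def fisher_z(r: float, eps: float = 1e-6) -> float:
    """Fisher transform atanh(r); r is clipped so that |r| = 1 stays finite."""
    r = max(-1.0 + eps, min(1.0 - eps, r))
    return math.atanh(r)


def wasserstein_1(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Area between the two empirical CDFs (1D Wasserstein-1 distance)."""
    xs, ys = sorted(xs), sorted(ys)
    grid = sorted(set(xs) | set(ys))
    area = 0.0
    for left, right in zip(grid, grid[1:]):
        fx = bisect_right(xs, left) / len(xs)  # ECDF of xs on [left, right)
        fy = bisect_right(ys, left) / len(ys)  # ECDF of ys on [left, right)
        area += abs(fx - fy) * (right - left)
    return area


# Hypothetical cosine-similarity scores for normal and anomalous route changes.
normal_sims = [0.95, 0.90, 0.92]
anomalous_sims = [0.30, 0.10, 0.45]
w1 = wasserstein_1([fisher_z(s) for s in normal_sims],
                   [fisher_z(s) for s in anomalous_sims])
```

A larger `w1` means that the two transformed score distributions lie further apart.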
5.5. Detection Results of Closed Dataset (RQ3)
For each dataset, we use the latest routing table snapshot before the incident. We create a dictionary with the observation-point AS and network prefix as keys and the AS path as the value. We then sample BGP announcements inside and outside the incident window. Using a one-minute window, we examine all announcements during the incident and within the six hours preceding and following it. Each announcement is matched to the routing-table dictionary by its observation-point AS and prefix. We obtain the embedding vector for each AS in the path, then use dynamic time warping and cosine distance to obtain a path-difference score. If this score exceeds a predetermined limit, the route is considered anomalous. After detecting anomalous routes, we set the 90th percentile as the first outlier threshold for each anomaly. We then fit the values above this threshold using the Generalized Pareto Distribution (GPD) to better capture unusual points. The final outlier threshold is set at the 98th percentile after GPD fitting. We evaluate BGPvector, ISP-Operated, MSLSTM, BGPviewer, and our method on real-world datasets. The definition of experimental results is shown in Table 7.
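The matching and scoring steps described above can be sketched as follows. This is a minimal illustration, not our implementation: the two-dimensional embeddings, the observation-point AS 64500, the prefix 192.0.2.0/24, and the function names are placeholders, and any normalization of the accumulated cost is omitted.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Tuple[float, ...]


def cosine_distance(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def dtw_path_difference(path_a: Sequence[int], path_b: Sequence[int],
                        emb: Dict[int, Vector]) -> float:
    """Accumulated DTW cost between two AS paths, using cosine distance per step."""
    n, m = len(path_a), len(path_b)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(emb[path_a[i - 1]], emb[path_b[j - 1]])
            acc[i][j] = d + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    return acc[n][m]


# Toy embeddings (placeholders, not learned values).
toy_emb = {17639: (1.0, 0.0), 174: (1.0, 1.0), 13335: (0.0, 1.0)}

# Routing-table dictionary keyed by (observation-point AS, prefix).
routing_table = {(64500, "192.0.2.0/24"): (17639, 13335)}

announcement = {"observer": 64500, "prefix": "192.0.2.0/24",
                "as_path": (17639, 174, 13335)}
baseline = routing_table[(announcement["observer"], announcement["prefix"])]
score = dtw_path_difference(baseline, announcement["as_path"], toy_emb)
```

The resulting `score` would then be compared with the detection threshold; an unchanged path scores zero.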
Table 8 reports the detection and alerting results. For each anomaly detection method, we compute the mean false positive rate, number of alarms, and number of false alarms over the 20 events.
As shown in Table 8, our method and BGPviewer successfully detect all 20 real anomaly events, whereas MSLSTM achieves the lowest success rate (70%). Table 8 further shows that our method has the fewest false positives, whereas BGPviewer produces the most. Considering both detection coverage and false-positive rate, our approach offers the best overall performance and efficiency.
5.6. Statistical Analysis
To evaluate whether our method outperforms the comparison methods in detection performance, we employed the Nemenyi statistical hypothesis test. Each method was ranked by false positive rate and detection success rate across datasets, with the best-performing method ranked first. We then computed the average rank for each method over all datasets. Building on this, the Friedman rank test was used to test the null hypothesis of “no overall performance difference among methods.” Upon rejecting the null hypothesis at a significance level of $\alpha$, the Nemenyi post hoc multiple comparison test was applied. In the Nemenyi test, these average ranks are used to analyze differences, producing a critical distance ($CD = q_{\alpha}\sqrt{k(k+1)/(6G)}$), where $k$ is the number of methods, $G$ is the number of datasets, and $q_{\alpha}$ is derived from the studentized range statistic divided by $\sqrt{2}$. If the difference in average ranks between two methods exceeds the critical distance, their performance difference is statistically significant. The results for false positive rate and detection success rate are shown in
Figure 9, where algorithm groups with no significant differences are connected.
The analysis reveals that our method achieves the best performance for both false positive rate and detection success rate, while BGPviewer performs the worst on both metrics.
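The ranking step and the critical distance can be sketched as below. The value 2.728 is the commonly tabulated Nemenyi critical value for five methods at a significance level of 0.05; using it here, together with k = 5 methods and G = 20 events, is an assumption made for illustration, and the method scores are toy numbers.

```python
import math
from typing import Dict, List


def average_ranks(scores_per_dataset: List[Dict[str, float]],
                  lower_is_better: bool = True) -> Dict[str, float]:
    """Average rank of each method over datasets; tied methods share the mean rank."""
    totals: Dict[str, float] = {}
    for scores in scores_per_dataset:
        ordered = sorted(scores, key=lambda m: scores[m], reverse=not lower_is_better)
        i = 0
        while i < len(ordered):
            j = i
            while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
                j += 1
            rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
            for method in ordered[i:j + 1]:
                totals[method] = totals.get(method, 0.0) + rank
            i = j + 1
    g = len(scores_per_dataset)
    return {method: total / g for method, total in totals.items()}


def critical_distance(q_alpha: float, k: int, g: int) -> float:
    """Nemenyi critical distance: q_alpha * sqrt(k (k + 1) / (6 G))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * g))


# Toy false-positive rates for three hypothetical methods on two datasets.
toy_ranks = average_ranks([{"a": 0.1, "b": 0.2, "c": 0.2},
                           {"a": 0.3, "b": 0.1, "c": 0.2}])
cd = critical_distance(q_alpha=2.728, k=5, g=20)  # assumed alpha = 0.05
```

Under these assumptions the critical distance is about 1.36, so two methods would differ significantly only if their average ranks are more than 1.36 apart.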
5.7. Detection Result in the Open-World Dataset (RQ3)
We studied routing announcements in May 2025 using a 1 min window and counted daily alerts, totaling 1202 for the month, as shown in the figure. We also applied other anomaly detection methods to the May data; these results are presented in
Figure 10a. Additionally, an authoritative external report classifies May 2025 alarms, including true and false alarms, as shown in
Figure 10b.
To illustrate the impact of this incident, we observed that the deviation score between the anomalous AS path and the normal AS path was 0.945, substantially exceeding the threshold of 0.875. This result indicates that our method has strong practical value in operational deployments.
According to Table 9, our method achieves a detection time of 0.04 ms with a memory usage of 537 MB, outperforming the other methods in both computational efficiency and resource consumption. This indicates that our method can process BGP update messages in real time while keeping memory consumption low.
Graph partitioning takes approximately 0.15 s, while training all subgraph models takes a total of around 93 min. To handle BGP network topology updates, we set a threshold for significant network or AS-level changes. If the threshold is exceeded, we re-collect data, rebuild and partition the topology, run inference, generate AS embeddings, and perform anomaly detection. These steps ensure an efficient response to ongoing changes in the network.
To help network operators estimate GLBAD deployment cost at Internet scale, we measured resource needs. A standard server (16-core CPU, 64 GB RAM) can train on the partitioned BGP topology and process about 14,000 updates per second during divergence scoring. Model training requires approximately 37 GB of memory, while anomaly detection utilizes around 500 MB. This shows the system is feasible for real-world deployment.
5.8. Ablation Experiment
To assess the performance gain in anomaly detection, we partitioned the BGP network topology in two ways: one using METIS without weights and another using our weight design approach. The results appear in Table 10, which shows that the weight design lowers the false positive rate of anomaly detection based on topology inference.
We conducted ablation experiments on the US_carrier_Sprint and GHOSTnet datasets to evaluate our method. We tested four types of embedding vectors: the initial AS embedding, one from topology inference across the whole graph, another from topology inference on partitioned subgraphs without enhancement, and the embedding from our proposed method. We compared their anomaly detection results, memory usage, and GPU memory consumption during training.
Table 11 summarizes the methods, and
Figure 11 shows the comparison.
Figure 11 shows that our proposed method achieves the lowest false positives. Enhancing subgraphs does not significantly increase memory or GPU consumption during training. However, the experimental environment cannot meet GPU memory demands for training on the full BGP network graph, and using only system memory requires substantial resources.
5.9. Threshold Sensitivity Analysis
We conduct a threshold sensitivity analysis on the GHOSTnet dataset, with results shown in
Figure 12.
Figure 12 illustrates that the false positive rate (FPR) decreases in a stepwise manner with increasing thresholds. The model achieves balanced performance, with its lowest FPR of 0% between thresholds of 0.68 and 0.69. Performance declines and the FPR rises sharply when the threshold drops below 0.61.
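The sweep behind this analysis can be sketched as follows. The scores and thresholds are toy values, and the function names are illustrative; a route is treated as flagged when its path-difference score exceeds the threshold.

```python
from typing import List, Sequence, Tuple


def false_positive_rate(normal_scores: Sequence[float], threshold: float) -> float:
    """Fraction of normal route changes whose score exceeds the threshold."""
    flagged = sum(1 for s in normal_scores if s > threshold)
    return flagged / len(normal_scores)


def sweep_thresholds(normal_scores: Sequence[float],
                     thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (threshold, FPR) pairs for each candidate threshold."""
    return [(t, false_positive_rate(normal_scores, t)) for t in thresholds]


# Toy path-difference scores of normal route changes (hypothetical values).
toy_normal_scores = [0.20, 0.50, 0.62, 0.66]
toy_curve = sweep_thresholds(toy_normal_scores, [0.60, 0.64, 0.68])
```

Because the FPR changes only when the threshold passes an observed score, the resulting curve is a step function, which matches the stepwise decrease noted above.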
5.10. Case Study
Building upon these computational results, we now consider a real-world anomaly event. On 21 June 2022, Cloudflare experienced a service disruption that affected traffic in 19 data centers. In this subsection, we analyze this severe network outage from a BGP perspective using the proposed method. AS paths such as (17639 13335) represent the normal historical routing, while paths such as (17639 10099 4657 4637 6461 174 13335) are those announced during the disruption. We analyze the experimental results by calculating the cosine distance between the embedding vectors of the ASes on these paths, and the results are shown in
Figure 13.
For the Cloudflare outage, we observe that most of the total cost is attributed to the new transit ASes between AS17639 and AS13335. These ASes are not on the usual short path (17639 13335); instead, they make a long detour in the affected path. This shows their role changed: from being outside the path normally to acting as key transit providers during the incident.
Building on this analysis, such a decomposition makes the path-difference score directly actionable. Rather than merely indicating that “the path has changed significantly,” GLBAD can identify a small set of ASes that contribute most to the deviation. In an operational setting, operators can prioritize examining routing policies and BGP communities for these high-contribution ASes. They can also check RPKI/ASPA configurations and export filters. If needed, they may temporarily increase preference for alternative upstreams or withdraw affected announcements at relevant points. Thus, the same metric that raises an alarm also guides where to begin mitigation and which AS roles are most likely responsible for the abnormal paths.
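One way to obtain such a per-AS decomposition is to backtrack the dynamic-time-warping alignment and attribute each step's cosine distance to the AS on the observed path. The sketch below illustrates the idea under stated assumptions: the attribution rule, the function names, and the two-dimensional embeddings are illustrative and are not the learned values or the exact procedure used in our experiments.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Tuple[float, ...]


def cosine_distance(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) *
                        math.sqrt(sum(y * y for y in b)))


def dtw_contributions(normal: Sequence[int], observed: Sequence[int],
                      emb: Dict[int, Vector]) -> Dict[int, float]:
    """Split the DTW cost between two AS paths across the ASes of the observed path."""
    n, m = len(normal), len(observed)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    local = [[0.0] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local[i][j] = cosine_distance(emb[normal[i - 1]], emb[observed[j - 1]])
            acc[i][j] = local[i][j] + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    contrib = {asn: 0.0 for asn in observed}
    i, j = n, m
    while i > 0 and j > 0:  # walk the optimal alignment back to the start
        contrib[observed[j - 1]] += local[i][j]
        _, i, j = min((acc[i - 1][j - 1], i - 1, j - 1),
                      (acc[i - 1][j], i - 1, j),
                      (acc[i][j - 1], i, j - 1))
    return contrib


# Toy embeddings; a shortened detour through AS174 stands in for the long path.
toy_emb = {17639: (1.0, 0.0), 174: (1.0, 1.0), 13335: (0.0, 1.0)}
contrib = dtw_contributions((17639, 13335), (17639, 174, 13335), toy_emb)
top_as = max(contrib, key=contrib.get)
```

In this toy case the entire cost is attributed to the inserted transit AS, which is the kind of ranking an operator could use to decide where to look first.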