GLBAD: Online BGP Anomaly Detection Under Partial Observation

Wu, Zheng; Zhou, Yaoyu; Wu, Junda

doi:10.3390/electronics14244940

by Zheng Wu *, Yaoyu Zhou and Junda Wu

The School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China

* Author to whom correspondence should be addressed.

Electronics 2025, 14(24), 4940; https://doi.org/10.3390/electronics14244940

Submission received: 27 November 2025 / Revised: 14 December 2025 / Accepted: 14 December 2025 / Published: 16 December 2025

(This article belongs to the Section Networks)

Abstract

The Border Gateway Protocol (BGP) is the core protocol for inter-domain routing on the Internet. However, due to its lack of built-in security authentication mechanisms, BGP is highly vulnerable to misconfigurations or malicious route announcements, which can lead to severe incidents such as route hijacking and information leakage. Existing detection methods face two major bottlenecks: First, as the scale of Autonomous System (AS)-level topology continues to grow, conventional graph neural networks struggle to meet the demands of computational resources and latency. Second, the observational data provided by current monitoring systems are inherently localized. To address these challenges, this paper proposes a Graph Learning-driven framework for BGP Anomaly Detection, named GLBAD. The core design of GLBAD comprises three components: First, to handle BGP’s large-scale network topology, we propose a graph partition method to perform a dedicated topological partitioning on the BGP network. Second, to overcome the limitation of localized observational data, we design a graph autoencoder-based approach for adaptive graph learning, enabling topology inference. Finally, integrating the above components, we develop a comprehensive BGP anomaly detection system to achieve real-time and accurate anomaly detection. We evaluate our approach on 20 real-world BGP anomaly events. Experimental results demonstrate that the proposed GLBAD effectively detects anomalies with less time consumption while achieving a lower false positive rate.

Keywords: border gateway protocol; anomaly detection; graph learning; graph partition

1. Introduction

The Border Gateway Protocol (BGP) is one of the most important protocols of the global Internet; it ensures reachability between Autonomous Systems (ASes) [1]. However, the protocol includes no authentication or validation steps, which leaves the global network vulnerable to malicious attacks and misconfigurations such as prefix hijacks, route leaks, and network outages [2]. Because BGP is global in nature, the effects of such events spread rapidly across the entire network and generally cause far more severe consequences than an ordinary attack [3]. For example, on 4 October 2021 a BGP misconfiguration at Facebook caused a large-scale outage: Facebook's services were disconnected from the global network for almost six hours, and its market value shrank by nearly 5% (an estimated loss of more than USD 6 billion) [4].

To enhance routing security in large-scale networks, two complementary approaches prevail: proactive defense mechanisms that prevent threats preemptively, and passive detection systems that identify ongoing attacks. Proactive defense typically leverages out-of-band trust information to detect abnormal routing messages, such as Route Origin Validation (ROV) [5] and Autonomous System Provider Authorization (ASPA) [6], which are built upon the Resource Public Key Infrastructure (RPKI). However, while these schemes offer robust security, achieving comprehensive deployment will require a significant amount of time. Crucially, their partial implementation may introduce new vulnerabilities [7].

Passive detection leverages AS behavioral patterns—specifically, when an AS detects alterations in its routing information base and propagates updates to neighboring ASes—to identify anomalous routes. This type of method can be further divided into two categories, rule-based [8] and Artificial Intelligence (AI)-based [9]. The rule-based method first constructs its rule base and then matches each incoming update message with the rule base. A well-matched update message is recognized as a valid route announcement.

On closed, labeled datasets, these methods typically perform well. Once deployed in a real-world environment, however, their accuracy and real-time performance are often unsatisfactory. We argue that two major challenges limit existing methods in real deployments:

The BGP network is large-scale. Modern Internet topologies encompass approximately 75,000 autonomous systems (ASes) and over 600,000 inter-AS links. Given this scale, anomaly detection systems face substantial challenges, demanding not only high detection accuracy but also low-latency processing to meet the stringent requirements of real-time monitoring.
The BGP network observed by collectors is in partial view. The AS links collected from vantage points do not capture all existing AS-level connections, resulting in incomplete input data for the detection method [10]. This limitation can significantly degrade anomaly detection performance, particularly in terms of detection accuracy and false alarm rates.

In this paper, we present a routing anomaly detection system named GLBAD designed to address the aforementioned problems, aiming to enhance both accuracy and real-time performance. The contributions of our paper can be summarized below:

A BGP-dedicated Partition Scheme. To cope with the large topology and the volume of BGP messages, we develop a multi-level graph partition scheme. It leverages two characteristics of the BGP network, its power-law degree distribution and its sparsity, to divide the BGP graph while balancing load and minimizing information loss.
An Adaptive Structure Inference Method. To overcome the problem of partial view, we propose a topology inference method designed for incomplete structures. In addition, the adjacency matrix is learned adaptively so that better graph and node embeddings are obtained.
A Complete BGP Anomaly Detection System. We build a complete BGP anomaly detection system that operates in real time: it retrieves paths, aligns them, detects anomalies, and locates root causes.
Comprehensive Experiments. We evaluate our method and the baselines on closed-world and open-world datasets. The results show that our method is effective and performs best in terms of accuracy and time.

2. Background and Problem Statement

In this section, we present the relevant preliminary knowledge and outline the research problems.

2.1. BGP Hijack

Based on the vulnerabilities exploited, BGP anomalies can be categorized into prefix hijacks, route leaks, and network outages. The corresponding models are shown in Figure 1.

Prefix Hijack. The attacker configures its AS router to advertise prefixes owned by other ASes, thereby diverting traffic destined for those prefixes to its own AS. In Figure 1a, attacker AS4 forges ownership of prefix P from AS1. As seen from the observation point, the AS path to prefix P shifts from (AS3, AS2, AS1) to (AS3, AS2, AS4).

Route Leak. RFC 7908 [11] defines such BGP anomalies as route announcements that propagate beyond their intended scope. In Figure 1b, attacker AS2 leaks AS1’s prefix to its upstream provider, AS4. This detour from the expected path has a direct impact on network traffic.

Network Outage. Neighboring ASes keep their BGP sessions alive by exchanging keep-alive messages at fixed intervals. Misconfigurations, disasters, or political events can disrupt AS connectivity, causing unreachable paths or disconnections. In Figure 1c, AS2 loses connection to AS1 and AS3, rendering path (AS3, AS2, AS1) unreachable. AS1 must then select an alternative path.

2.2. Characteristics of BGP Network

The global BGP network includes approximately 70,000 ASes linked by about 600,000 AS-level connections. As shown in Figure 2a, the Internet’s inter-domain topology has consistently expanded over time, in both the number of ASes and inter-AS links. This growth, combined with the large volume of BGP updates, presents significant challenges for current anomaly detection methods, making real-time analysis increasingly difficult.

We also calculate the average degree of the BGP network (Figure 2b). AS-level connectivity remains sparse: the average degree of an AS stays at a very low level, i.e., 0.0003.

Finally, we measure the cumulative distribution of the AS degree for each year (Figure 2c). The degree distribution of the BGP network is highly skewed, and the AS degrees follow a power law.

These characteristics of the BGP network negatively affect anomaly detection; the causes are detailed in Section 2.4.

2.3. Partial View of BGP Routing Data

What Is Partial View

Although BGP routing data is extensive, these publicly available data are drawn from collectors that provide only a partial view. BGP collectors are distributed unevenly, with a predominant presence in Europe and North America [12]. As a result, some links and ASes remain unobserved, and the proportion of unobserved links varies over time. Figure 3 illustrates the calculated percentages of unobserved AS links from 2016 to 2024.

All of the above factors prevent existing methods from performing effectively in a real-world environment. Thus, this paper presents a novel BGP anomaly detection framework that achieves accurate and timely detection.

2.4. Problem Statement

The primary focus of this work is the accurate and timely detection of anomalies. The BGP network can be modeled as an attributed graph $G = (V, E, X)$, where $V = \{v_1, v_2, \dots, v_n\}$ is the set of $n$ nodes, $E$ is the set of $m$ edges, and $X \in \mathbb{R}^{n \times d}$ is the attribute matrix. The structure of the graph can also be denoted by the adjacency matrix $A \in \mathbb{R}^{n \times n}$. Specifically, $A_{ij} = 1$ if nodes $v_i$ and $v_j$ are connected by an edge, and $A_{ij} = 0$ otherwise. The graph Laplacian matrix is defined as $L = D - A$, where $D$ is the degree matrix. Facing such a large topology, most detection systems based on graph learning consume so much time that real-time detection is out of reach. Thus, we aim to partition the BGP graph and then learn on each subgraph separately.
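As a small worked example of the notation above, the following sketch builds $A$, $D$, and $L = D - A$ for a toy undirected graph. The AS numbers and links are made up for illustration and are not taken from the paper's data.

```python
import numpy as np

# Toy undirected AS-level graph; the AS numbers are hypothetical.
as_links = [(1, 2), (2, 3), (2, 4), (3, 4)]
nodes = sorted({asn for link in as_links for asn in link})
index = {asn: i for i, asn in enumerate(nodes)}

n = len(nodes)
A = np.zeros((n, n))
for a, b in as_links:
    A[index[a], index[b]] = 1.0
    A[index[b], index[a]] = 1.0  # undirected graph: A is symmetric

D = np.diag(A.sum(axis=1))  # degree matrix
L = D - A                   # graph Laplacian
```

Each row of $L$ sums to zero, which is a quick sanity check when the matrices are built from real AS links.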

However, partitioning the BGP graph is difficult due to its inherent characteristics. First, the graph is sparse, so an arbitrary split of its AS links causes significant information loss for the affected AS nodes, and the resulting detection is typically inaccurate. Second, the graph has a highly skewed degree distribution, which leads to load imbalance if existing methods are used to partition it. Third, as discussed above, the observed BGP graph represents only a subset of the actual BGP network. Anomaly detection on this incomplete graph is more likely to result in inaccurate judgments and decisions.

Thus, in this paper, we design a dedicated graph partition method that splits the BGP graph while balancing load and minimizing information loss, in order to achieve real-time anomaly detection. Additionally, we devise a topology structure inference method that restores the partial-view, incomplete graph by learning the attributes and relationships of the ASes.

3. Related Work

In recent years, researchers have conducted extensive studies on BGP anomaly detection. Existing BGP anomaly detection methods can be categorized into three main types: rule-based methods, artificial intelligence-based methods, and active probing-based methods.

Active Probing-Based Detection Methods. This method actively sends traffic into the network and detects whether a certain IP or prefix is reachable by analyzing the traffic reception and stability. Schlamp et al. [13] enhance this approach by utilizing encrypted traffic based on the Secure Sockets Layer/Transport Layer Security (SSL/TLS) protocol to test the reachability of abnormal IP addresses, which reduces the false positive rate. By comparing public key changes before and after abnormal events, they distinguish between benign and suspicious events. The advantage of this method, unlike others, is that it can help assess the extent of an event’s development if the event’s source is known. In contrast, Trinocular [14] introduces an adaptive probing mechanism that reduces the frequency of active traffic injection, lowering network load. However, this approach cannot achieve comprehensive network detection, unlike the broader reach of previous methods.

Rule-Based Detection Methods. These methods detect anomalies by matching against legitimate rules. If the information in a routing update message does not match any valid rule, an alert is triggered. Artemis [8], as a typical routing logic method, has been widely used for prefix hijacking detection. Its advantage lies in its good interpretability, providing a clear logical explanation for each detection decision. However, routing logic methods are heavily dependent on the accuracy of the data. If the input data source is inaccurate or incomplete, the detection results may be significantly affected. Especially when facing partial data loss or inconsistencies, this type of method is prone to false positives or missed detections.

Artificial Intelligence-Based Detection Methods. These methods use artificial intelligence techniques to model network traffic or routing data, identifying abnormal patterns to detect network faults or attack events. Researchers have used unsupervised learning algorithms to detect network anomalies by clustering network behavior into different clusters and identifying behaviors that deviate from the normal cluster as anomalies [1]. Li et al. [15] combined supervised and unsupervised learning methods to propose an ensemble learning-based anomaly detection framework, utilizing multiple trained base classifiers and unsupervised learning for integration, improving detection accuracy and robustness. The advantage of these methods is that they can automatically learn complex patterns of network behavior. However, machine learning methods typically require large amounts of labeled data for training, and their performance is highly dependent on the quality of the data and the effectiveness of feature engineering. Furthermore, the black-box effect of deep learning models makes it difficult to interpret the detection results, thereby limiting their trustworthiness in practical applications [16].

GLBAD is a passive anomaly detection framework. It complements proactive tools such as the Resource Public Key Infrastructure (RPKI) and Autonomous System Provider Authorization (ASPA). It detects suspicious path behavior on partially protected prefixes. GLBAD then supplies candidate events to ARTEMIS-like systems. To handle conflicts from misidentified AS relationships, GLBAD sets conservative thresholds and groups high-risk alerts for operator confirmation. This approach keeps policy conflicts in check and supports the other mechanisms. In a broader context of network security, addressing both path anomalies and covert threats contributes to more comprehensive detection capabilities.

Encrypted traffic and wireless side-channel analyses are closely tied to application and behavior identification for intrusion detection. For example, FOAP [17] targets open-world Android traffic fingerprinting. After filtering irrelevant flows, it can infer fine-grained UI-associated user operations. This demonstrates that encrypted traffic still leaks information about applications and behaviors. Building on this, AppListener [18] shows that, even without packet capture, one can identify applications and their in-app activities using only passive harvesting of Wi-Fi RF energy variations. This broadens the range of exploitable side channels. Together, these studies indirectly show that even with incomplete observations and noise, anomalous behaviors can still be detected via deviations in temporal or structural patterns.

4. Method

In this section, we introduce our proposed method, GLBAD, and show how it overcomes the existing challenges of BGP anomaly detection.

4.1. Overview

The framework of the proposed method is illustrated in Figure 4, comprising route collection, graph construction, graph partition, structure inference, and ultimately, anomaly detection. In this part, we first introduce the process of route collection. Then, we construct the graph through the route. To accommodate such a large BGP network, we propose a graph partition scheme that enables partitioning a large graph into several similar and properly sized subgraphs. To address the issue of partial visibility of public vantages, we propose an adaptive structure inference method to compensate for the missing links. Finally, we develop a comprehensive anomaly detection system to enable real-time detection.

4.2. Route Collection

BGP route data generally comprises the BGP routing information base (RIB) and BGP update messages, which are publicly collected at multiple globally distributed vantage points. These data are stored in the binary MRT format, so they must be decompressed and parsed into a human-readable format before use. RIB entries and update messages share the same fields, which are described in Table 1. The RIB data are the route tables of the vantage points, updated every 8 h. Update messages are triggered by changes to the route table and sent to neighbors.

Using the collected data, we model the BGP network at time $t$ as a graph $G(t)$. The following subsection details how the network structure and attributes are constructed.

4.3. Graph Construction

We use the AS links extracted from AS paths to construct the structure of $G(t)$, i.e., the adjacency matrix $A$. Next, the AS attribute matrix is constructed using AS attribute information, which comprises geo-location information, the number of update messages destined for and crossed by the specific AS, and the semantic role extracted by BGPvector [19]. Thus, each AS is described by a vector of size $d$. To provide more detail, the following table describes each AS attribute.

However, the constructed graph is too large to be processed in a timely manner by the downstream anomaly detection method. Thus, we devise a customized graph partition method below.
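Before turning to partitioning, the link extraction step of this subsection can be illustrated with a minimal sketch. It assumes that each AS path is already parsed into a list of AS numbers; the function name `as_links_from_paths` and the sample paths are hypothetical. Consecutive duplicates, which arise from AS-path prepending, are collapsed so that they do not produce self-links.

```python
from collections import defaultdict

def as_links_from_paths(as_paths):
    """Collect undirected AS links from AS paths (lists of AS numbers)."""
    adjacency = defaultdict(set)
    for path in as_paths:
        # Collapse consecutive duplicates caused by AS-path prepending.
        hops = [asn for i, asn in enumerate(path) if i == 0 or asn != path[i - 1]]
        for left, right in zip(hops, hops[1:]):
            adjacency[left].add(right)
            adjacency[right].add(left)
    return adjacency

# Hypothetical parsed AS paths, as seen from two vantage points.
paths = [[3, 2, 1], [3, 2, 2, 4], [5, 4]]
adjacency = as_links_from_paths(paths)
```

The resulting neighbor sets can be turned into the adjacency matrix $A$ once the ASes are assigned row indices.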

4.4. The Proposed Graph Partition on BGP Graph

Due to the scale of the BGP routing topology, graph partitioning is used to divide it into subgraphs, enabling downstream tasks to be computed separately on each subgraph. During partitioning, node counts should be balanced across subgraphs to ensure computational load balancing. To reduce information loss, the number of edges cut should be minimized, and the edges that are cut should be those carrying the least information. Thus, BGP topology partitioning is a graph partitioning problem: partition a large graph $G$ into $k$ subgraphs $p = \{G_1, G_2, \dots, G_k\}$, where the objective is to minimize the edge cut weight while balancing node counts among subgraphs. This problem can be formally described as

$$\begin{aligned} f(p) &= \arg\min \sum_{G_i, G_j \in p} |W(G_i, G_j)| \\ \text{s.t.} \quad & \left| 1 - \frac{|V_i|}{|V|} \right| \leq \varepsilon, \quad \forall i \in \{1, 2, \dots, k\} \end{aligned} \tag{1}$$

In the equation, $V$ represents the vertex set of $G$, $G_i$ is a subgraph of $G$, $V_i$ is the vertex set of $G_i$, and $W$ represents the edge cut weight between two subgraphs.

This paper uses a multilevel graph partitioning algorithm to efficiently partition the BGP topology. The algorithm consists of three core steps: coarsening, initial partitioning, and uncoarsening with refinement.

To achieve the optimization objective, this paper designs edge weights through a weight fusion approach. First, the AS node weight is determined by the size of the Customer Cone [20] of each AS, which is inferred from the BGP paths using CAIDA's AS relationship inference algorithm. ASes with large Customer Cones play a significant role in the capital and governance structure of the Internet. Second, in the routing table data, we count the number of reachable destination network addresses carried by each pair of adjacent ASes in the AS paths, which represents the traffic of that edge. The traffic weight is designed based on the distribution of traffic size. The number of ASes in each AS's Customer Cone and the number of network prefixes it owns both follow a power-law distribution. Therefore, we design the weight function in a piecewise manner. Additionally, we want the node weight to play the primary role in the edge weight. Hence, the edge weight is the sum of the weights of the two adjacent nodes and the traffic weight, with the node weight being greater than the traffic weight. The designs of node weights and traffic weights are shown in Table 2.

The edge weight is calculated as follows:

$$W_{edge} = W_{node1} + W_{node2} + W_{traffic}$$

where $W_{node1}$ and $W_{node2}$ represent the node weights of the two ASes, and $W_{traffic}$ is the traffic weight between the two ASes.
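The weight fusion above can be sketched as follows. The breakpoints and weight levels in this example are hypothetical placeholders chosen only so that every node weight exceeds every traffic weight; the values actually used are those of Table 2. The helper names `piecewise` and `edge_weight` are likewise illustrative.

```python
import bisect

# Hypothetical piecewise breakpoints and weight levels (not the Table 2 values).
CONE_BREAKS = [10, 100, 1000]       # customer-cone size thresholds
CONE_WEIGHTS = [10, 20, 30, 40]     # node weight for each interval
TRAFFIC_BREAKS = [100, 10000]       # reachable-address count thresholds
TRAFFIC_WEIGHTS = [1, 2, 3]         # traffic weight for each interval

def piecewise(value, breaks, weights):
    """Map a heavy-tailed quantity onto a small set of weight levels."""
    return weights[bisect.bisect_right(breaks, value)]

def edge_weight(cone_size_1, cone_size_2, address_count):
    """W_edge = W_node1 + W_node2 + W_traffic."""
    w_node1 = piecewise(cone_size_1, CONE_BREAKS, CONE_WEIGHTS)
    w_node2 = piecewise(cone_size_2, CONE_BREAKS, CONE_WEIGHTS)
    w_traffic = piecewise(address_count, TRAFFIC_BREAKS, TRAFFIC_WEIGHTS)
    return w_node1 + w_node2 + w_traffic

# A stub AS (cone size 5) linked to a large transit AS (cone size 5000).
example = edge_weight(5, 5000, 50)
```

A piecewise mapping keeps the few very large ASes from dominating the weights by several orders of magnitude, which is the motivation given above for not using the raw power-law quantities.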

4.4.1. Graph Coarsening

During coarsening, adjacent nodes are merged so that the original graph is gradually reduced to a smaller graph, with $|V_i| < |V_{i-1}|$. As the coarsening proceeds in stages, a set of nodes in graph $G_i$ is represented as a single node in graph $G_{i+1}$. Specifically, let $V_i^v$ be the set of nodes of $G_i$ that are merged into node $v$ of graph $G_{i+1}$. To ensure load balancing during graph partitioning, the weight of node $v$ in the coarser graph $G_{i+1}$ is set to the sum of the weights of all nodes in $V_i^v$ from the current-level graph $G_i$. Moreover, when multiple nodes in $V_i^v$ are connected to the same vertex $u$ in $G_{i+1}$, the weight of the edge between $v$ and $u$ in $G_{i+1}$ is the sum of the weights of the edges from $V_i^v$ that point to $u$.

Metis provides several edge-matching strategies for coarsening. For BGP topology partitioning, heavy-edge matching is used: edges with large weights are collapsed during coarsening, so that the edges eventually cut are the less important ones in the BGP topology, which minimizes the impact on subsequent routing anomaly detection. Heavy-edge matching is a greedy procedure that prioritizes matching edges with larger weights when constructing the coarse graph. Vertices are visited in random order, and each unmatched vertex $u \in V_i$ is matched with the unmatched adjacent vertex $v$ whose connecting edge has the maximum weight:

$$w(u, v) = \max \{ w(u, v') \mid v' \in V_i \} \tag{2}$$

Here, $w(u, v)$ represents the weight of the edge connecting nodes $u$ and $v$. The matched vertices $u$ and $v$ are merged into a single node, which becomes a new node in the coarser graph. This process is repeated until no more vertices can be matched or the preset coarsening threshold is met.
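One coarsening level with heavy-edge matching can be sketched as below. This is an illustrative re-implementation under simplifying assumptions (a dictionary-based graph without self-loops, no node weights), not the Metis code; the function names are hypothetical.

```python
import random

def heavy_edge_matching(adj, seed=0):
    """adj: {u: {v: weight}} for an undirected weighted graph.
    Returns a dict mapping each vertex to the id of its coarse node."""
    rng = random.Random(seed)
    order = list(adj)
    rng.shuffle(order)                      # visit vertices in random order
    coarse_id = {}
    next_id = 0
    for u in order:
        if u in coarse_id:
            continue                        # u was already matched
        coarse_id[u] = next_id
        candidates = [v for v in adj[u] if v not in coarse_id]
        if candidates:
            # Match u with the unmatched neighbour on the heaviest edge.
            v = max(candidates, key=lambda x: adj[u][x])
            coarse_id[v] = next_id
        next_id += 1
    return coarse_id

def contract(adj, coarse_id):
    """Build the coarser graph; parallel edges are merged by summing weights."""
    coarse = {c: {} for c in set(coarse_id.values())}
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            cu, cv = coarse_id[u], coarse_id[v]
            if cu != cv:
                coarse[cu][cv] = coarse[cu].get(cv, 0) + w
    return coarse

# Two heavy edges (a-b, c-d) joined by two light edges (a-c, b-d).
adj = {"a": {"b": 5, "c": 1}, "b": {"a": 5, "d": 1},
       "c": {"a": 1, "d": 4}, "d": {"b": 1, "c": 4}}
coarse_id = heavy_edge_matching(adj)
coarse = contract(adj, coarse_id)
```

In this toy graph the heavy edges are collapsed regardless of the visiting order, and the two light edges merge into a single coarse edge of weight 2, which is the only edge left to cut.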

4.4.2. Graph Partition

After coarsening, the graph enters the initial partitioning phase, which performs a high-quality bisection. The partition $P_m$ is computed on the coarsened graph $G_m$ so that each part contains roughly half of the original vertices. During coarsening, vertex and edge weights accurately represent the finer-level graph. Thus, $G_m$ provides enough information to balance the partition and minimize the edge cut cost. To achieve load balancing and reduce edge cut loss, a graph-growing method expands an initial vertex into a partition until the size constraint is met.
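The graph-growing idea can be illustrated with a breadth-first sketch. It assumes a connected graph given as neighbor lists and a single, arbitrarily chosen start vertex; practical implementations typically try several start vertices and keep the bisection with the smallest cut, which is omitted here.

```python
from collections import deque

def grow_bisection(adj, node_weight, start):
    """Grow one part from `start` by breadth-first search until it holds
    about half of the total vertex weight; the rest forms the other part."""
    half = sum(node_weight.values()) / 2
    part, grown = {start}, node_weight[start]
    queue = deque([start])
    while queue and grown < half:
        u = queue.popleft()
        for v in adj[u]:
            if v not in part and grown < half:
                part.add(v)
                grown += node_weight[v]
                queue.append(v)
    return part, set(adj) - part

# A path graph 0-1-2-3-4-5 with unit vertex weights.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
node_weight = {v: 1 for v in adj}
left, right = grow_bisection(adj, node_weight, start=0)
```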

4.4.3. Graph Uncoarsening

In this stage, the partition $P_m$ of the coarse graph $G_m$ is projected back to the original graph $G_0$ by traversing the graphs $G_{m-1}, G_{m-2}, \dots, G_1$ level by level. For a vertex $v$ in graph $G_{i+1}$, the part to which $v$ belongs is assigned to all nodes in $V_i^v$, the set of nodes of $G_i$ that were merged into $v$. In this way, partition $P_i$ is derived from partition $P_{i+1}$.

Although $P_{i+1}$ is a locally optimal partition for graph $G_{i+1}$, the projected partition $P_i$ may not be optimal for graph $G_i$. Since $G_i$ is a finer-level graph, it has more degrees of freedom that can be used to further improve the partition $P_i$. Therefore, after each projection, the partition of $G_i$ is optimized using a local refinement heuristic. The refinement is primarily based on the Kernighan-Lin algorithm, in which boundary vertices are swapped to reduce the number of cut edges. The gain $\Delta cut$ after swapping is given by the following formula:

$$\Delta cut = \sum_{v \in V_1} g(v) - \sum_{v \in V_2} g(v) \tag{3}$$

Here, $g(v)$ is the gain from swapping vertex $v$, measured by the reduction in cut edges.
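To make the notion of gain concrete, the sketch below computes $g(v)$ for a bisection as the cut weight removed minus the cut weight added when $v$ changes sides, and then applies the single best move. Note that this is a simplified single-vertex move (in the style of Fiduccia-Mattheyses) rather than the pairwise swap of Kernighan-Lin, and that the balance constraint is deliberately ignored; all names are hypothetical.

```python
def vertex_gain(adj, side, v):
    """g(v): reduction of the cut weight if v moves to the other side."""
    external = sum(w for u, w in adj[v].items() if side[u] != side[v])
    internal = sum(w for u, w in adj[v].items() if side[u] == side[v])
    return external - internal

def cut_weight(adj, side):
    # Each undirected edge is stored twice, hence the division by 2.
    return sum(w for u in adj for v, w in adj[u].items()
               if side[u] != side[v]) / 2

def refine_once(adj, side):
    """Move the vertex with the largest positive gain, if there is one."""
    best = max(adj, key=lambda v: vertex_gain(adj, side, v))
    if vertex_gain(adj, side, best) > 0:
        side = {**side, best: 1 - side[best]}
    return side

# Two triangles {0, 1, 2} and {3, 4, 5} joined by the light edge 2-3.
adj = {0: {1: 2, 2: 2}, 1: {0: 2, 2: 2}, 2: {0: 2, 1: 2, 3: 1},
       3: {2: 1, 4: 2, 5: 2}, 4: {3: 2, 5: 2}, 5: {3: 2, 4: 2}}
side = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}   # vertex 2 is misplaced
refined = refine_once(adj, side)
```

Here vertex 2 has gain 3, so moving it lowers the cut weight from 4 to 1 and leaves only the light edge between the two triangles cut.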

4.4.4. Subgraph Enhancement

To address information loss from edge cuts, each subgraph generated by Metis partitioning is enhanced. For each subgraph, the adjacent nodes of its boundary nodes are added as virtual nodes, and the corresponding edge cut information is retained. This process allows the local subgraph to reflect cross-partition structural relationships. Formally, let the original graph be $G = (V, E)$. After partitioning, the subgraphs form the set $\{G_1, G_2, \dots, G_k\}$, where each $G_i = (V_i, E_i)$, and the set of edge cuts is $E_{cut}$. For any subgraph $G_i$, its boundary node set is $B_i$. For every boundary node $u \in B_i$, its external node set is $N_{ext}(u)$. Within $G_i$, a virtual node $\tilde{v}$ is added for each $v \in N_{ext}(u)$, resulting in the set of virtual nodes:

$$V_i^{virt} = \bigcup_{u \in B_i} \{ \tilde{v} \mid v \in N_{ext}(u) \} \tag{4}$$

The edge cut information is retained as

$$E_i^{virt} = \{ (u, \tilde{v}) \mid (u, v) \in E_{cut},\ u \in V_i \} \tag{5}$$

The enhanced subgraph is

$$G'_i = (V_i \cup V_i^{virt},\ E_i \cup E_i^{virt}) \tag{6}$$
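Equations (4) to (6) translate almost directly into set operations. The following sketch assumes an edge list and a node-to-partition mapping as inputs; representing a virtual node as the tuple `("virt", v)` is an arbitrary choice made for this example.

```python
def enhance_subgraph(edges, part_of, i):
    """Return (nodes, edges) of the enhanced subgraph G'_i.
    edges: iterable of (u, v) pairs; part_of: {node: partition id}."""
    nodes_i = {v for v, p in part_of.items() if p == i}
    inner = {(u, v) for u, v in edges if u in nodes_i and v in nodes_i}
    virt_nodes, virt_edges = set(), set()
    for u, v in edges:
        if (u in nodes_i) != (v in nodes_i):      # a cut edge touching G_i
            inside, outside = (u, v) if u in nodes_i else (v, u)
            virt = ("virt", outside)              # virtual copy of the external node
            virt_nodes.add(virt)
            virt_edges.add((inside, virt))
    return nodes_i | virt_nodes, inner | virt_edges

edges = [(1, 2), (2, 3), (3, 4), (2, 4)]
part_of = {1: 0, 2: 0, 3: 1, 4: 1}
nodes0, edges0 = enhance_subgraph(edges, part_of, 0)
```

For partition 0, the boundary node 2 gains two virtual neighbors, so the two cut edges stay visible to the model that later processes this subgraph.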

4.5. Structure Inference

The BGP network obtained from public collectors is incomplete, with missing AS links or even AS nodes. This incompleteness can mislead the anomaly detection method into erroneous results. In this part, we reconstruct the structure of the BGP network to recover a complete BGP graph.

Our reconstruction model includes the graph convolutional encoder, the decoder, and the Laplacian structure.

4.5.1. Encoder Module

The encoder module learns a layer-wise transformation by a spectral graph convolutional function $f(Z^{(l)}, A \mid W^{(l)})$, i.e.,

$$f(Z^{(l)}, A \mid W^{(l)}) = \sigma(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} Z^{(l)} W^{(l)}). \tag{7}$$

Here, $\tilde{A} = A + I$ and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, where $I$ is the identity matrix, and $\sigma$ is the activation function (we use the $\mathrm{ReLU}(\cdot)$ function in this paper).
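A single layer of Equation (7) can be written in a few lines of NumPy. This is a minimal sketch with dense matrices and random weights; the hidden size of 2 and the toy adjacency matrix are assumptions made for the example, and a practical implementation on AS-scale subgraphs would use sparse matrices and a deep learning framework.

```python
import numpy as np

def gcn_layer(Z, A, W):
    """One graph-convolution layer: ReLU(D~^{-1/2} A~ D~^{-1/2} Z W)."""
    A_tilde = A + np.eye(A.shape[0])            # add self-loops
    d = A_tilde.sum(axis=1)                     # diagonal of D~
    d_inv_sqrt = 1.0 / np.sqrt(d)
    A_norm = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ Z @ W, 0.0)      # ReLU activation

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = rng.normal(size=(3, 4))                     # n = 3 nodes, d = 4 attributes
W0 = rng.normal(size=(4, 2))                    # hypothetical hidden size of 2
Z1 = gcn_layer(X, A, W0)                        # first layer takes Z^(0) = X
```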

4.5.2. Decoder Module

The decoder module reconstructs the graph from the learned latent representations $Z^{(l)}$ as follows:

$$\hat{A} = \mathrm{Sigmoid}(Z \cdot Z^T), \tag{8}$$

where $Z$ is the latent representation and $Z = \mathrm{encoder}(Z \mid X, A)$.
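The inner-product decoder of Equation (8) is illustrated below on three hand-picked embeddings (hypothetical values): nodes with similar embeddings receive a link probability above 0.5, while nodes with opposite embeddings receive one below 0.5.

```python
import numpy as np

def decode(Z):
    """Inner-product decoder: A_hat = sigmoid(Z Z^T)."""
    return 1.0 / (1.0 + np.exp(-(Z @ Z.T)))

# Rows are node embeddings; nodes 0 and 1 are similar, node 2 is opposite to node 0.
Z = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]])
A_hat = decode(Z)
```

Entries of $\hat{A}$ that are high for node pairs without an observed link are the candidates for links missing from the partial view.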

4.5.3. Loss Function

Our loss function comprises a reconstruction term and a latent structure term. First, we impose a greater penalty on the reconstruction error of the non-zero elements than on that of the zero elements:

$$L_{G1} = \sum_{i=1}^{n} \| (a_i - \hat{a}_i) \odot b_i \|_2^2 = \| (A - \hat{A}) \odot B \|_F^2, \tag{9}$$

where $a_i$, $\hat{a}_i$, and $b_i$ denote the $i$-th rows of $A$, $\hat{A}$, and the penalty matrix $B$, respectively.

We simultaneously consider the latent representation output by the encoder:

$$\begin{aligned} L_L &= \sum_{i,j=1}^{n} \left( \| z_i - z_j \|_2^2 \cdot a_{ij} + \gamma_i a_{ij}^2 \right) \\ &= \mathrm{tr}(Z^T L Z) + \gamma \| A \|_F^2, \\ & \text{s.t.} \quad a_i^T \mathbf{1} = 1, \quad 0 \leq a_i \leq 1, \end{aligned} \tag{10}$$

where $\gamma$ is the regularization factor. This Laplacian loss helps to learn the graph structure adaptively: it imposes a penalty when strongly connected vertices have distant representations in the embedding space. Thus, the Laplacian loss causes vertices linked by an edge to be mapped close to each other in the embedding space.

Additionally, we add a regularization term to prevent overfitting:

$$L_{reg} = \frac{1}{2} \sum_i \| W^{(i)} \|_F^2. \tag{11}$$

The total loss is as follows:

$$L_S = L_{G1} + L_L + L_{reg}. \tag{12}$$
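The three loss terms can be evaluated numerically as in the sketch below. Several choices here are assumptions for illustration: the penalty matrix $B$ is taken to be `beta` on non-zero entries of $A$ and 1 elsewhere, `beta = 5.0` and `gamma = 0.1` are arbitrary values, and the row-sum constraint of Equation (10) is not enforced. For a symmetric $A$, the pairwise sum in Equation (10) equals $2\,\mathrm{tr}(Z^T L Z)$; the constant factor can be absorbed into $\gamma$, and the code checks this identity on a toy graph.

```python
import numpy as np

def reconstruction_loss(A, A_hat, beta=5.0):
    """||(A - A_hat) * B||_F^2 with B = beta on non-zeros of A, 1 elsewhere.
    This form of B and the value of beta are hypothetical choices."""
    B = np.where(A != 0, beta, 1.0)
    return np.sum(((A - A_hat) * B) ** 2)

def laplacian_loss(Z, A, gamma=0.1):
    """tr(Z^T L Z) + gamma * ||A||_F^2 with L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    return np.trace(Z.T @ L @ Z) + gamma * np.sum(A ** 2)

def weight_decay(weights):
    """L_reg = 1/2 * sum_i ||W^(i)||_F^2."""
    return 0.5 * sum(np.sum(W ** 2) for W in weights)

def total_loss(A, A_hat, Z, weights, beta=5.0, gamma=0.1):
    """L_S = L_G1 + L_L + L_reg."""
    return (reconstruction_loss(A, A_hat, beta)
            + laplacian_loss(Z, A, gamma)
            + weight_decay(weights))

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
Z = np.array([[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]])
# Pairwise form of the Laplacian term, summed over all ordered pairs (i, j).
pairwise = sum(A[i, j] * np.sum((Z[i] - Z[j]) ** 2)
               for i in range(3) for j in range(3))
```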

We update the adjacency matrix using the learned graph structure and the initial graph structure as follows:

$$A = \alpha A_L + (1 - \alpha) A_0 \tag{13}$$

where $A_0$ is the initial adjacency matrix, $A_L$ is the learned adjacency matrix, and $\alpha$ balances their weights. We update the graph every $n$ epochs. We set a threshold $\tau$ so that updates stop once the epoch exceeds $\tau$.
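The update rule of Equation (13) and its schedule can be sketched as follows. The epoch counter is assumed to start at 1, and the concrete values (an update interval of 5, $\tau = 12$, $\alpha = 0.3$) are hypothetical.

```python
import numpy as np

def update_adjacency(A_learned, A_init, alpha):
    """Blend the learned and the initial structure: alpha*A_L + (1-alpha)*A_0."""
    return alpha * A_learned + (1.0 - alpha) * A_init

def should_update(epoch, every, tau):
    """Update every `every` epochs and stop once the epoch exceeds tau."""
    return epoch % every == 0 and epoch <= tau

A_0 = np.array([[0.0, 1.0], [1.0, 0.0]])
A_L = np.array([[0.0, 0.5], [0.5, 0.0]])
A_new = update_adjacency(A_L, A_0, alpha=0.3)

# Epochs (counted from 1) at which the graph is refreshed.
update_epochs = [e for e in range(1, 21) if should_update(e, every=5, tau=12)]
```

Keeping a share $(1 - \alpha)$ of $A_0$ prevents the learned structure from drifting arbitrarily far from the links that were actually observed.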

The whole process of adaptive structure learning is shown in Figure 5 and Algorithm 1.

Algorithm 1: Adaptive Graph Learning Algorithm

4.6. Anomaly Detection

We use the monitor AS and destination prefix as keys to extract paths from routing announcements and RIB snapshots, then identify suspicious routing changes by comparing path differences. To quantify path changes, we employ dynamic time warping (DTW) in conjunction with the mean cosine distance, which effectively measures the overall discrepancy between two ordered sequences of unequal lengths. Formally, for a vertex $v_i$ on path $S$ and a vertex $v'_j$ on path $S'$, we compute the cosine distance between their embedding vectors:

$$\mathrm{CosineDistance}(v_i, v'_j) = 1 - \frac{\langle X_{v_i}, X_{v'_j} \rangle}{\| X_{v_i} \| \, \| X_{v'_j} \|}. \tag{14}$$

Here, $X_{v_i}$ and $X_{v'_j}$ denote the vector representations of the corresponding ASes. DTW employs dynamic programming to identify an alignment (warping path) that minimizes the cumulative cosine distance, which we take as the final path-difference score. If this score exceeds a dynamic threshold, estimated from the empirical distribution of historically normal changes, the routing change is deemed suspicious. The pseudocode for the DTW procedure is presented in Algorithm 2.
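The DTW-based path comparison described above can be illustrated with the following independent sketch, which is not the authors' Algorithm 2. The embeddings are toy two-dimensional vectors, and the thresholding rule (a high quantile of scores from historically normal changes, here `q = 0.99`) is an assumed instantiation of the dynamic threshold.

```python
import numpy as np

def cosine_distance(x, y):
    """1 - cosine similarity of two embedding vectors."""
    return 1.0 - float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def dtw_path_difference(path_a, path_b, emb):
    """Cumulative cosine distance along the optimal DTW alignment of two
    AS paths; emb maps an AS number to its embedding vector."""
    n, m = len(path_a), len(path_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(emb[path_a[i - 1]], emb[path_b[j - 1]])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def is_suspicious(score, normal_scores, q=0.99):
    """Hypothetical dynamic threshold: a high quantile of normal scores."""
    return score > np.quantile(normal_scores, q)

# Toy embeddings; AS4 points in the opposite direction to AS1.
emb = {1: np.array([1.0, 0.0]), 2: np.array([0.0, 1.0]),
       3: np.array([1.0, 1.0]), 4: np.array([-1.0, 0.0])}
same = dtw_path_difference([3, 2, 1], [3, 2, 1], emb)
hijack = dtw_path_difference([3, 2, 1], [3, 2, 4], emb)
```

With these toy vectors, the path change from (AS3, AS2, AS1) to (AS3, AS2, AS4) of the prefix hijack example in Section 2.1 scores 2, whereas an unchanged path scores 0.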

Algorithm 2: Anomaly Detection Algorithm

4.7. Complexity Analysis

The total time complexity of the graph partitioning algorithm is the sum of the complexities of its three phases. In the coarsening phase, the algorithm operates with a time complexity of

O (| E |)

per coarsening level, repeated

O (\log n)

times, as the number of vertices is roughly halved at each level. The partitioning phase has a complexity of

O (| E |)

. In the refinement phase, the time complexity is

O (| E | \log | E |)

, where the number of iterations depends on the number of refinement passes. Given these factors, Metis generally operates with a time complexity of

O (| E | \log | E |)

in typical cases, making it particularly efficient for large, sparse graphs compared to spectral methods.

The topology inference algorithm balances convergence speed and training stability by dynamically adjusting the learning rate. Parameter updates continue until the convergence condition or the stopping threshold is met, or until the maximum number of iterations T is reached. For the adaptive learning of the adjacency matrix, the time complexity of training is $O(d n^{2} t)$, where d is the feature dimension of the nodes, n is the total number of nodes, and t is the number of iterations.
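A back-of-the-envelope comparison suggests why partitioning helps with this quadratic term. The derivation below is only a sketch: it assumes, purely for illustration, that the graph is split into k equally sized subgraphs of n/k nodes each, and it ignores the extra nodes introduced by subgraph enhancement.

```latex
\underbrace{O\!\left(d\,n^{2}\,t\right)}_{\text{whole graph}}
\quad\longrightarrow\quad
\underbrace{k \cdot O\!\left(d\left(\tfrac{n}{k}\right)^{2} t\right)}_{k\ \text{subgraphs}}
= O\!\left(\frac{d\,n^{2}\,t}{k}\right)
```

Under these assumptions, k = 4 would reduce the dominant term by roughly a factor of four, and each subgraph's dense adjacency matrix would hold $(n/4)^{2}$ entries, one sixteenth of the full matrix.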

The anomaly detection method is based on the Dynamic Time Warping (DTW) algorithm, which has a time complexity of $O(N \times M)$, where N and M are the lengths of the two sequences. It measures the similarity between the two sequences by computing a pairwise cosine-distance matrix and then performing dynamic programming over it.

5. Experiments

This section evaluates anomaly detection in real BGP network settings. Experimental details are as follows:

Datasets. To assess detection accuracy, we integrate multiple BGP-related data sources: RIB, UPDATES, and AS business-relationship data. The evaluation datasets were assembled as follows. We collected 20 historical routing anomaly reports from 2005 to 2024, comprising 8 prefix hijacking events, 7 route leak events, and 5 network outage events, as detailed in Table 3. For each anomaly, we retrieved verifiable information, such as the anomaly time and affected network prefixes, from authoritative sources. Using this information, we extracted all routing announcements in the 6 h before and after each anomaly from RIPE RIS, resulting in 20 datasets with a total data volume of 340 GB. In addition, data for May 2025 were collected for the open-world experiment, with a total data volume of 3.1 TB.

Experimental environment. To ensure experimental consistency, all experiments are run on a server with an Intel(R) Xeon(R) Platinum 8352Y 64-core CPU, Linux OS, and 219 GB RAM. For fair comparison, both our method and all baselines are implemented in Python 3.9.

Baselines. For comparative evaluation, we include four additional inter-domain routing anomaly detection methods—BGPvector [21], ISP-Operated [22], MSLSTM [23], and BGPviewer [24]. Table 4 summarizes their core techniques, and Table 5 presents their hyperparameter settings.

5.1. Research Questions

This section outlines the approach to anomaly detection in real-world BGP networks, presenting the evaluation framework and specifying the major research questions addressed.

RQ1: First, we consider graph partitioning effectiveness. The primary goal is to evaluate the partition quality of Metis when dividing the graph into multiple subgraphs.

RQ2: Next, we examine the accuracy of structure inference. The primary goal is to test the ability of adaptive graph learning to recover edges.

RQ3: Finally, we focus on anomaly detection using real-world datasets. The primary goal is to evaluate the performance of our method during real incidents.

5.2. BGP Topology Partitioning (RQ1)

We build the network from the October 2024 business-relationship table and set edge weights according to our weight design. We then split the graph into 2 to 10 parts and record, for each setting, the total cut edge weight and the number of cut edges.
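The two quantities recorded in this experiment can be computed directly from an edge list and a partition assignment. The sketch below assumes the partition has already been produced (for example by Metis) and is supplied as a plain dictionary. The function name, the data layout, and the toy AS numbers and weights are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (AS u, AS v, edge weight)


def cut_statistics(
    edges: Iterable[Edge], part: Dict[Hashable, int]
) -> Tuple[int, float]:
    """Return (number of cut edges, total weight of cut edges).

    An edge is "cut" when its two endpoints fall into different subgraphs.
    """
    cut_edges = 0
    cut_weight = 0.0
    for u, v, weight in edges:
        if part[u] != part[v]:
            cut_edges += 1
            cut_weight += weight
    return cut_edges, cut_weight


# Toy graph with four ASes split into two subgraphs (illustrative values).
toy_edges = [(1, 2, 3.0), (2, 3, 1.0), (3, 4, 2.0), (1, 4, 5.0)]
toy_part = {1: 0, 2: 0, 3: 1, 4: 1}
n_cut, w_cut = cut_statistics(toy_edges, toy_part)
```

Repeating this computation for 2 to 10 parts would produce the kind of curves the experiment records.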

Next, we analyze the partitioning results. As Figure 6a shows, the total cut edge weight increases with the number of subgraphs, peaks at nine, and then stabilizes. In Figure 6b, the red curve shows the subgraph adjacency-matrix size (reflecting memory use), while the blue curve shows the number of cut edges. Because subgraph size determines memory consumption, partitioning must balance memory usage against the information lost to cut edges. Overall, partitioning into four subgraphs yields the best trade-off.

5.3. Topology Reconstruction (RQ2)

Across datasets, we train all autoencoder models for 100 iterations using Adam with a learning rate of 0.001. The adaptive-learning mixing ratio ($\alpha$) is set to 10%, with the number of adaptive updates t limited to 10–15. The regularization parameter ($\lambda$) is set to 0.01, and the parameter in the weight matrix W is set to 36. We report AUC for link-recovery quality. Data are split into training, validation, and testing sets. Using the 2024 route-leak event as an example, we partition the global BGP topology into four subgraphs and evaluate link prediction under varying levels of incompleteness in each subgraph.

As shown in Figure 7a, link-inference accuracy remains around 0.72 AUC despite changes in topological incompleteness, indicating robust recovery performance. Using the same dataset, we also performed link prediction with GNNs; the results are shown in Figure 7b. The four subgraphs yield an average AUC of 0.68, indicating that our method achieves higher link prediction accuracy.
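For readers unfamiliar with AUC in the link-recovery setting, the sketch below shows one common way to compute it: as the probability that a held-out true edge receives a higher predicted score than a sampled non-edge, with ties counted as one half. The function name and the toy scores are assumptions for illustration. The evaluation pipeline used in the experiments is not specified at this level of detail.

```python
from typing import Sequence


def link_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Rank-based AUC: P(score of true edge > score of non-edge), ties = 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical decoder scores for held-out edges and sampled non-edges.
held_out_edges = [0.9, 0.8, 0.4]
sampled_non_edges = [0.7, 0.3, 0.2]
auc = link_auc(held_out_edges, sampled_non_edges)
```

Here eight of the nine edge/non-edge pairs are ordered correctly, so the AUC is 8/9. A value of 0.5 would correspond to random guessing.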

5.4. Path Difference Score Analysis (RQ3)

We analyze real routing announcements to quantify both legitimate and anomalous route changes using the path-difference score. For each event, we obtain authoritative ground truth (e.g., time and affected prefixes) and extract all announcements within a ±6 h window, yielding 20 datasets. Route changes cover origin changes (different origin AS) and path changes (same origin AS, different traversed AS path).

From Figure 8, results show anomalous route changes have significantly higher path-difference scores than normal changes, implying anomalies markedly alter AS roles along paths. Our method yields higher anomaly scores than BGPvector on real-world datasets.

To fairly compare separability for normal vs. anomalous BGP messages, we first apply a Fisher transform to cosine-similarity scores to remove dimensional effects, then compute the area between the two empirical CDFs (1D Wasserstein-1 distance) [25]. Results are summarized in Table 6.

The Wasserstein-1 distance (W1) quantifies the difference between the distributions of normal and anomalous routing paths. It can be interpreted as a measure of effect size, with a larger value indicating a greater distinction between the normal and anomalous paths in the overall distribution. According to Table 6, our method yields larger empirical cumulative distribution function (ECDF) areas than BGPvector on five of the six datasets (Worldstream is the exception, 0.138 vs. 0.156), indicating better overall discrimination between normal and anomalous routes.
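The separability measure described above can be sketched in a few lines of Python. The code assumes that the Fisher transform is the usual $z = \operatorname{artanh}(r)$ applied to each cosine similarity, and that similarities are clipped slightly inside (-1, 1) to keep the transform finite. Both points, along with the function names and toy samples, are assumptions of this sketch rather than details confirmed by the text.

```python
import math
from typing import List, Sequence


def fisher_z(similarity: float, eps: float = 1e-6) -> float:
    """Fisher transform artanh(r); r is clipped so the result stays finite."""
    clipped = max(-1.0 + eps, min(1.0 - eps, similarity))
    return math.atanh(clipped)


def wasserstein_1(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Area between the two empirical CDFs (1-D Wasserstein-1 distance)."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    a: List[float] = sorted(xs)
    b: List[float] = sorted(ys)
    points = sorted(set(a) | set(b))
    area = 0.0
    i = j = 0
    for left, right in zip(points, points[1:]):
        # Advance the counters so i/len(a) and j/len(b) are the ECDFs at `left`.
        while i < len(a) and a[i] <= left:
            i += 1
        while j < len(b) and b[j] <= left:
            j += 1
        area += abs(i / len(a) - j / len(b)) * (right - left)
    return area


# Toy samples (illustrative): two unit-shifted distributions.
w1_shift = wasserstein_1([0.0, 1.0], [1.0, 2.0])
# Hypothetical similarity scores for normal and anomalous changes.
normal_z = [fisher_z(s) for s in (0.95, 0.90, 0.92)]
anomalous_z = [fisher_z(s) for s in (0.40, 0.10, 0.55)]
w1_scores = wasserstein_1(normal_z, anomalous_z)
```

Shifting a sample by one unit gives a distance of exactly one, which matches the interpretation of W1 as an effect size on the transformed score scale.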

5.5. Detection Results of Closed Dataset (RQ3)

For each dataset, we use the latest routing table snapshot before the incident. We create a dictionary with the observation-point AS and network prefix as keys and the AS path as the value. We then sample BGP announcements inside and outside the incident window: using a one-minute window, we examine all announcements during the incident and within the six hours preceding and following it. Each announcement is matched to the routing-table dictionary by its observation-point AS and prefix. We obtain the embedding vector for each AS in the path, then use dynamic time warping and cosine distance to obtain a path-difference score. If this score exceeds a predetermined limit, the route is considered anomalous. After detecting anomalous routes, we set the 90th percentile as the first outlier threshold for each anomaly. We then fit the values above this threshold with the Generalized Pareto Distribution (GPD) to better capture unusual points. The final outlier threshold is set at the 98th percentile after GPD fitting. We evaluate BGPvector, ISP-Operated, MSLSTM, BGPviewer, and our method on real-world datasets. The definitions of the reported quantities are given in Table 7, and Table 8 reports the detection results and the alerting results. For each anomaly detection method, we summarize the alarms and false alarms and compute the mean false positive rate over the 20 events.
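The two-stage thresholding can be illustrated with a small peaks-over-threshold sketch. Several points here are assumptions, not the authors' procedure: the GPD parameters are estimated by the method of moments (a maximum-likelihood fit is a common alternative), the 98th percentile is read as a quantile of the overall score distribution, and all function names and the synthetic scores are hypothetical.

```python
import math
from typing import List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation, q in [0, 100]."""
    s: List[float] = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def gpd_threshold(
    scores: Sequence[float], initial_q: float = 90.0, final_q: float = 98.0
) -> float:
    """Fit a GPD to excesses over the initial_q percentile (method of
    moments) and return the final_q quantile of the fitted tail model."""
    u = percentile(scores, initial_q)
    excesses = [s - u for s in scores if s > u]
    if len(excesses) < 2:
        return percentile(scores, final_q)  # too few points to fit a tail
    mean = sum(excesses) / len(excesses)
    var = sum((e - mean) ** 2 for e in excesses) / (len(excesses) - 1)
    if var == 0.0:
        return percentile(scores, final_q)
    ratio = mean * mean / var
    shape = 0.5 * (1.0 - ratio)          # GPD shape (xi)
    scale = 0.5 * mean * (ratio + 1.0)   # GPD scale (sigma)
    tail_prob = len(excesses) / len(scores)
    # Exceedance probability conditional on being above u.
    target = (1.0 - final_q / 100.0) / tail_prob
    if target >= 1.0:
        return u
    if abs(shape) < 1e-9:  # exponential-tail limit of the GPD
        return u - scale * math.log(target)
    return u + scale / shape * (target ** (-shape) - 1.0)


# Synthetic scores with an exponential tail (illustrative only).
synthetic = [-math.log(1.0 - (i + 0.5) / 1000.0) for i in range(1000)]
final_threshold = gpd_threshold(synthetic)
```

For this synthetic exponential sample the fitted threshold lands close to the theoretical 98th percentile, $-\ln(0.02) \approx 3.91$. A tail model of this kind extrapolates smoothly where a raw empirical percentile would depend on only a handful of extreme scores.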

As shown in Table 8, our method and BGPviewer successfully detect all 20 real anomaly events, whereas MSLSTM achieves the lowest success rate (70%). Table 8 further shows that our method has the fewest false positives, whereas BGPviewer produces the most. Considering both detection coverage and false-positive rate, our approach offers the best overall performance and efficiency.

5.6. Statistical Analysis

To evaluate whether our method outperforms the comparison methods in detection performance, we employed the Friedman test followed by the Nemenyi post hoc test. Each method was ranked by false positive rate and detection success rate across datasets, with the best-performing method ranked first. We then computed the average rank for each method over all datasets. The Friedman rank test was used to test the null hypothesis of “no overall performance difference among methods.” Upon rejecting the null hypothesis at a significance level of $\alpha = 0.05$, the Nemenyi post hoc multiple comparison test was applied. The Nemenyi test compares these average ranks against a critical distance, $CD = q_{\alpha} \sqrt{\frac{k (k + 1)}{6 G}}$, where k is the number of methods, G is the number of datasets, and $q_{\alpha}$ is derived from the studentized range statistic divided by $\sqrt{2}$. If the difference in average ranks between two methods exceeds the critical distance, their performance difference is statistically significant. The results for false positive rate and detection success rate are shown in Figure 9, where algorithm groups with no significant differences are connected.
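To make the ranking and critical-distance computation concrete, the sketch below computes average ranks (with ties sharing their mean rank) and the Nemenyi critical distance. The $q_{\alpha}$ values are the commonly tabulated ones for $\alpha = 0.05$. The function names and the toy score matrix are assumptions for illustration, and the code is not the analysis script behind Figure 9.

```python
import math
from typing import Dict, List, Sequence

# Commonly tabulated q_alpha values for alpha = 0.05
# (studentized range statistic divided by sqrt(2)), keyed by k.
Q_ALPHA_005: Dict[int, float] = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850}


def average_ranks(scores: Sequence[Sequence[float]]) -> List[float]:
    """scores[g][m] is the metric of method m on dataset g (lower is better).

    Returns the mean rank of each method; tied methods share their mean rank.
    """
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda m: row[m])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared_rank = (pos + end) / 2.0 + 1.0
            for m in order[pos : end + 1]:
                totals[m] += shared_rank
            pos = end + 1
    return [t / len(scores) for t in totals]


def critical_distance(k: int, g: int, q_alpha: float) -> float:
    """Nemenyi critical distance CD = q_alpha * sqrt(k (k + 1) / (6 G))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * g))


# Five methods compared on 20 datasets, as in this evaluation.
cd = critical_distance(k=5, g=20, q_alpha=Q_ALPHA_005[5])
# Toy false-positive-rate matrix: 2 datasets x 3 methods (hypothetical).
toy_ranks = average_ranks([[0.01, 0.02, 0.03], [0.01, 0.01, 0.03]])
```

With k = 5 and G = 20 the square-root term equals 0.5, so the critical distance is about 1.364 rank units under these tabulated values.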

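The analysis shows that our method achieves the best average rank for both false positive rate and detection success rate. BGPviewer ranks worst on false positive rate, consistent with Table 8, although it matches our method in detection success (20/20 events), where MSLSTM is lowest (14/20).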

5.7. Detection Result in the Open-World Dataset (RQ3)

We studied routing announcements in May 2025 using a 1 min window and counted daily alerts, totaling 1202 for the month. We also applied the other anomaly detection methods to the May data; the daily alarm counts are presented in Figure 10a. Additionally, the May 2025 alarms are classified as true or false alarms according to an authoritative external report, as shown in Figure 10b.

To illustrate the impact of this incident, we observed that the deviation score between the anomalous AS path and the normal AS path was 0.945, substantially exceeding the threshold of 0.875. This result indicates that our method has strong practical value in operational deployments.

According to Table 9, our method achieves a detection time of 0.04 ms with a memory usage of 537 MB, the lowest among the compared methods on both measures. This indicates that our method can process BGP update messages in real time while keeping memory consumption low.

Graph partitioning takes approximately 0.15 s, while training all subgraph models takes around 93 min in total. To handle BGP network topology updates, we set a threshold for significant network or AS-level changes. When it is exceeded, we re-collect data, rebuild and partition the topology, run inference, regenerate AS embeddings, and resume anomaly detection. These steps ensure an efficient response to ongoing changes in the network.

To help network operators estimate GLBAD deployment cost at Internet scale, we measured resource needs. A standard server (16-core CPU, 64 GB RAM) can train on the partitioned BGP topology and process about 14,000 updates per second during divergence scoring. Model training requires approximately 37 GB of memory, while anomaly detection utilizes around 500 MB. This shows the system is feasible for real-world deployment.

5.8. Ablation Experiment

To assess performance gain in anomaly detection, we partitioned the BGP network topology in two ways: one using Metis without weights and another using our weight design approach. The results appear in Table 10.

Table 10 shows that partitioning with our weight design lowers the false positive rate of the downstream anomaly detection from 1.34% to 0.51%.

We conducted ablation experiments on the US_carrier_Sprint and GHOSTnet datasets to evaluate our method. We tested four types of embedding vectors: the initial AS embedding, one from topology inference across the whole graph, another from topology inference on partitioned subgraphs without enhancement, and the embedding from our proposed method. We compared their anomaly detection results, memory usage, and GPU memory consumption during training. Table 11 summarizes the methods, and Figure 11 shows the comparison.

Figure 11 shows that our proposed method achieves the lowest false positives. Enhancing subgraphs does not significantly increase memory or GPU consumption during training. However, the experimental environment cannot meet GPU memory demands for training on the full BGP network graph, and using only system memory requires substantial resources.

5.9. Threshold Sensitivity Analysis

We conduct a threshold sensitivity analysis on the GHOSTnet dataset, with results shown in Figure 12.

Figure 12 illustrates that the false positive rate (FPR) decreases in a stepwise manner with increasing thresholds. The model achieves balanced performance, with its lowest FPR of 0% between thresholds of 0.68 and 0.69. Performance declines and the FPR rises sharply when the threshold drops below 0.61.

5.10. Case Study

Building upon these computational results, we now consider a real-world anomaly event. On 21 June 2022, Cloudflare experienced a service disruption that affected traffic in 19 data centers. In this subsection, we analyze this severe network outage from a BGP perspective using the proposed method. For example, the AS path (17639 13335) represents normal historical routing, while (17639 10099 4657 4637 6461 174 13335) is an AS path updated due to the disruption. We compute the cosine distance between the embedding vectors of the corresponding ASes, and the results are shown in Figure 13.

For the Cloudflare outage, we observe that most of the total cost is attributed to the new transit ASes between AS17639 and AS13335. These ASes are not on the usual short path (17639 13335); instead, they make a long detour in the affected path. This shows their role changed: from being outside the path normally to acting as key transit providers during the incident.

Building on this analysis, such a decomposition makes the path-difference score directly actionable. Rather than merely indicating that “the path has changed significantly,” GLBAD can identify a small set of ASes that contribute most to the deviation. In an operational setting, operators can prioritize examining routing policies and BGP communities for these high-contribution ASes. They can also check RPKI/ASPA configurations and export filters. If needed, they may temporarily increase preference for alternative upstreams or withdraw affected announcements at relevant points. Thus, the same metric that raises an alarm also guides where to begin mitigation and which AS roles are most likely responsible for the abnormal paths.
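The per-AS decomposition discussed above can be sketched by backtracking the DTW warping path and attributing each step's cosine distance to the AS on the new path. The AS numbers below are those of the Cloudflare example, but the 2-D embeddings are hypothetical values chosen only to make the example run, and the function names are assumptions of this sketch.

```python
import math
from typing import Dict, List, Sequence, Tuple

Step = Tuple[int, int, float]  # (AS on old path, AS on new path, step cost)


def cosine_distance(x: Sequence[float], y: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 if norm == 0.0 else 1.0 - dot / norm


def dtw_steps(
    old: List[int], new: List[int], emb: Dict[int, Sequence[float]]
) -> List[Step]:
    """Return the aligned AS pairs along the optimal DTW warping path."""
    n, m = len(old), len(new)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    local = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            local[i][j] = cosine_distance(emb[old[i]], emb[new[j]])
            acc[i + 1][j + 1] = local[i][j] + min(
                acc[i][j + 1], acc[i + 1][j], acc[i][j]
            )
    steps: List[Step] = []
    i, j = n, m
    while i > 0 and j > 0:  # backtrack from the end of both paths
        steps.append((old[i - 1], new[j - 1], local[i - 1][j - 1]))
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
    steps.reverse()
    return steps


def contribution_per_as(steps: List[Step]) -> Dict[int, float]:
    """Sum the step costs attributed to each AS on the new path."""
    totals: Dict[int, float] = {}
    for _, new_as, cost in steps:
        totals[new_as] = totals.get(new_as, 0.0) + cost
    return totals


# Hypothetical embeddings; real ones would come from the trained model.
toy_emb = {
    17639: [1.0, 0.0], 13335: [0.8, 0.6], 10099: [0.0, 1.0], 4657: [-0.6, 0.8],
    4637: [-1.0, 0.0], 6461: [-0.8, -0.6], 174: [0.0, -1.0],
}
normal_path = [17639, 13335]
outage_path = [17639, 10099, 4657, 4637, 6461, 174, 13335]
steps = dtw_steps(normal_path, outage_path, toy_emb)
contrib = contribution_per_as(steps)
top_as = max(contrib, key=contrib.get)
```

With these toy embeddings the shared endpoints contribute nothing and the whole score is carried by the inserted transit ASes, which is the pattern the case study describes. Sorting `contrib` by value would give an operator a short list of ASes to examine first.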

6. Conclusions

To address the dual challenges of Internet-scale topology and localized vantage points in inter-domain routing anomaly detection, this paper proposes an adaptive graph learning-based anomaly detection framework. The method employs a customized topology partitioning scheme based on METIS, incorporating subgraph augmentation to mitigate the computational and storage demands of modeling Internet-wide BGP topologies. An adaptive topology inference module powered by a graph autoencoder reconstructs latent AS connectivity under incomplete observations, which produces more discriminative node embeddings. A path-comparison strategy combines dynamic time warping (DTW) with mean cosine distance, enabling the rapid quantification of routing changes and the timely identification of anomalies. Experimental results show that the approach detects all 20 real anomalous events while maintaining real-time performance. It also significantly reduces false positives, with an average reduction of 107 false alarms per event compared to the competing methods. In the future, we will integrate external information (e.g., RPKI/ROV, operator feedback) to further improve its robustness.

Author Contributions

Conceptualization, Z.W.; Methodology, Z.W. and Y.Z.; Validation, Y.Z. and J.W.; Writing—review & editing, Z.W.; Visualization, Y.Z. and J.W.; Supervision, Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under grant 62402234, and the Natural Science Research Start-up Foundation of Recruiting Talents of Nanjing University of Posts and Telecommunications under grant NY223168.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Al-Musawi, B.; Branch, P.; Armitage, G. BGP anomaly detection techniques: A survey. IEEE Commun. Surv. Tutor. 2016, 19, 377–396.
2. Mitseva, A.; Panchenko, A.; Engel, T. The state of affairs in BGP security: A survey of attacks and defenses. Comput. Commun. 2018, 124, 45–60.
3. Chen, Y.; Yin, Q.; Li, Q.; Liu, Z.; Xu, K.; Xu, Y.; Xu, M.; Liu, Z.; Wu, J. Learning with Semantics: Towards a Semantics-Aware Routing Anomaly Detection System. In Proceedings of the 33rd USENIX Security Symposium (USENIX Security 24), Philadelphia, PA, USA, 14–16 August 2024; pp. 5143–5160.
4. Wikipedia. 2021 Facebook Outage. 2021. Available online: https://en.wikipedia.org/wiki/2021_Facebook_outage (accessed on 5 December 2025).
5. Heilman, E.; Cooper, D.; Reyzin, L.; Goldberg, S. From the Consent of the Routed: Improving the Transparency of the RPKI. In Proceedings of the 2014 ACM Conference on SIGCOMM, Chicago, IL, USA, 17–22 August 2014; pp. 51–62.
6. Wirtgen, T.; Rybowski, N.; Pelsser, C.; Bonaventure, O. The Multiple Benefits of a Secure Transport for BGP. Proc. ACM Netw. 2024, 2, 1–23.
7. Lychev, R.; Goldberg, S.; Schapira, M. BGP security in partial deployment: Is the juice worth the squeeze? In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM, Hong Kong, China, 12–16 August 2013; pp. 171–182.
8. Sermpezis, P.; Kotronis, V.; Gigis, P.; Dimitropoulos, X.; Cicalese, D.; King, A.; Dainotti, A. ARTEMIS: Neutralizing BGP hijacking within a minute. IEEE/ACM Trans. Netw. 2018, 26, 2471–2486.
9. Peng, S.; Nie, J.; Shu, X.; Ruan, Z.; Wang, L.; Sheng, Y.; Xuan, Q. A multi-view framework for BGP anomaly detection via graph attention network. Comput. Netw. 2022, 214, 109129.
10. Alfroy, T.; Holterbach, T.; Krenc, T.; Claffy, K.; Pelsser, C. The next generation of BGP data collection platforms. In Proceedings of the ACM SIGCOMM 2024 Conference, Sydney, Australia, 4–8 August 2024; pp. 794–812.
11. Sriram, K.; Montgomery, D.; McPherson, D.; Osterweil, E.; Dickson, B. Problem Definition and Classification of BGP Route Leaks; Technical Report; Internet Engineering Task Force (IETF): Wilmington, DE, USA, 2016.
12. Silva, B.A., Jr.; Mol, P.; Fonseca, O.; Cunha, I.; Ferreira, R.A.; Katz-Bassett, E. Automatic inference of BGP location communities. Proc. ACM Meas. Anal. Comput. Syst. 2022, 6, 1–23.
13. Schlamp, J.; Holz, R.; Jacquemart, Q.; Carle, G.; Biersack, E.W. HEAP: Reliable assessment of BGP hijacking attacks. IEEE J. Sel. Areas Commun. 2016, 34, 1849–1861.
14. Cheng, M.; Xu, Q.; Liu, W.; Li, Q.; Wang, J. MS-LSTM: A multi-scale LSTM model for BGP anomaly detection. In Proceedings of the 2016 IEEE 24th International Conference on Network Protocols (ICNP), Singapore, 8–11 November 2016; IEEE: New York, NY, USA, 2016; pp. 1–6.
15. Li, J.; Dou, D.; Wu, Z.; Kim, S.; Agarwal, V. An Internet routing forensics framework for discovering rules of abnormal BGP events. ACM SIGCOMM Comput. Commun. Rev. 2005, 35, 55–66.
16. Minh, D.; Wang, H.X.; Li, Y.F.; Nguyen, T.N. Explainable artificial intelligence: A comprehensive review. Artif. Intell. Rev. 2022, 55, 3503–3568.
17. Li, J.; Zhou, H.; Wu, S.; Luo, X.; Wang, T.; Zhan, X.; Ma, X. FOAP: Fine-Grained Open-World Android App Fingerprinting. In Proceedings of the 31st USENIX Security Symposium (USENIX Security 22), Boston, MA, USA, 10–12 August 2022; pp. 1579–1596.
18. Ni, T.; Lan, G.; Wang, J.; Zhao, Q.; Xu, W. Eavesdropping mobile app activity via Radio-Frequency energy harvesting. In Proceedings of the 32nd USENIX Security Symposium (USENIX Security 23), Anaheim, CA, USA, 9–11 August 2023; pp. 3511–3528.
19. Shapira, T.; Shavitt, Y. BGP2Vec: Unveiling the Latent Characteristics of Autonomous Systems. IEEE Trans. Netw. Serv. Manag. 2022, 19, 4516–4530.
20. Luckie, M.; Huffaker, B.; Dhamdhere, A.; Giotsas, V.; Claffy, K. AS relationships, customer cones, and validation. In Proceedings of the 2013 Conference on Internet Measurement Conference, Barcelona, Spain, 23–25 October 2013; pp. 243–256.
21. Shapira, T.; Shavitt, Y. AP2Vec: An unsupervised approach for BGP hijacking detection. IEEE Trans. Netw. Serv. Manag. 2022, 19, 2255–2268.
22. Kamiyama, N.; Mori, T.; Kawahara, R.; Harada, S.; Hasegawa, H. Analyzing influence of network topology on designing ISP-operated CDN. Telecommun. Syst. 2013, 52, 969–977.
23. Cheng, M.; Li, Q.; Lv, J.; Liu, W.; Wang, J. Multi-scale LSTM model for BGP anomaly classification. IEEE Trans. Serv. Comput. 2018, 14, 765–778.
24. Papadopoulos, S.; Moustakas, K.; Tzovaras, D. BGPViewer: Using graph representations to explore BGP routing changes. In Proceedings of the 2013 18th International Conference on Digital Signal Processing (DSP), Santorini, Greece, 1–3 July 2013; IEEE: New York, NY, USA, 2013; pp. 1–6.
25. Dukler, Y.; Li, W.; Lin, A.; Montúfar, G. Wasserstein of Wasserstein loss for learning generative models. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 1716–1725.

Figure 1. Classical BGP anomaly types.

Figure 2. Characteristics of the BGP network. (a) The numbers of nodes and edges over the years; (b) the sparsity degree (i.e., degree/#nodes) of the BGP network over the years; (c) the cumulative distribution function of BGP degree.

Figure 3. Unobserved links by BGP collectors with years.

Figure 4. The framework of the proposed anomaly detection method.

Figure 5. The framework of the proposed anomaly detection method.

Figure 6. Influence of the numbers of subgraphs. (a) Sum of cut edge weights vs. number of subgraphs. (b) Number of cut edges and matrix size vs. number of subgraphs.

Figure 7. Results of topology reconstruction experiments. (a) AUC under varying incompleteness. (b) Results of GNN-based link prediction.

Figure 8. Comparisons of path difference scores between anomalous and normal route changes.

Figure 9. Critical difference diagrams for two metrics. (a) Critical difference in FPR. (b) Critical difference in detection accuracy.

Figure 10. Results of open-world experiments. (a) The number of daily alarms. (b) Specific details of alarms.

Figure 11. Results of the ablation study.

Figure 12. Results of the threshold sensitivity experiment.

Figure 13. The case study of Cloudflare outage. (a) Path difference score at each AS pairing step of DTW. (b) Embedding vectors in a 2-D plane.

Table 1. Key fields and their meanings in BGP RIB and update message.

Field	Meaning
AS Path	AS sequence through which an update message passes in turn.
Prefix	The destined IP prefix comprising IPv4 and IPv6.
Peer AS	The AS from where the update message is received.
Operation	The action to the routing information with the update message, which usually is withdrawal or announcement.

Table 2. The detailed design of node weights and traffic weights.

Customer Cone Count	Node Weight	Reachable IP Address Count	Traffic Weight
[0, 10)	5	[0, 100)	0
[10, 100)	15	[100, 1000)	2
[100, 1000)	25	[1000, 10,000)	4
[1000, 5000)	35	[10,000, 100,000)	6
≥5000	45	≥100,000	8

Table 3. The detailed descriptions of the proposed anomaly dataset.

Event	Date	Type	#Message (M)
Google_hijack	20050507	hijack	0.60
Pakistan_Telecom	20080224	hijack	1.22
US_carrier_Sprint	20140909	hijack	28.10
20140910-AS57807	20140910	hijack	16.47
H3S_median_services	20141114	hijack	9.42
VolumeDrive	20151204	hijack	16.87
GHOSTnet	20160221	hijack	3.85
Bitcanal	20180629	hijack	10.66
Australia_Telstra	20120223	leak	5.37
Malaysian_Telecom	20150612	leak	12.52
Google_leak	20170825	leak	67.51
Level_3	20171106	leak	4.39
Allegheny_DQE	20190624	leak	28.88
Cablevision_Mexico	20210211	leak	77.50
Worldstream	20241030	leak	79.69
Facebook	20211004	outage	123.28
Comcast_1	20211109	outage	68.30
Comcast_2	20211109	outage	67.18
UA_ISP_BGP	20220223	outage	184.96
Cloudflare	20220621	outage	62.18

Table 4. The details of the baseline methods.

Method	Core Technique(s)
BGPvector	Skip-Gram + Continuous Bag Of Words
ISP-Operated	Statistical features + Self-attention + LSTM
MSLSTM	Wavelet transform + LSTM
BGPviewer	Statistical features

Table 5. The hyperparameter settings of the baseline methods.

Method	Hyperparameter(s)
BGPvector	window = 2, epoch = 20, negative = 5
ISP-Operated	window = 10, batch = 128, epoch = 50, learning rate = 0.0001
MSLSTM	window = 16, batch = 128, epoch = 50, learning rate = 0.001
BGPviewer	window = 10, batch = 128, epoch = 50, learning rate = 0.0001

Table 6. Performance comparison between ours and BGPvector.

Dataset	Ours	BGPvector
GHOSTnet	0.479	0.331
Bitcanal	0.401	0.263
Level_3	0.080	0.047
Worldstream	0.138	0.156
UA_ISP_BGP	0.176	0.131
Cloudflare	0.203	0.123

Table 7. Definition of experimental results.

Experimental Results	Description
Alarms	The number of time windows where anomalies were detected.
False Alarms	Time windows that were falsely identified as anomalies.

Table 8. Detection results on the 20 real-world datasets.

Dataset	Detected					#Alarms (#FalseAlarms)
Dataset	BGPvector	ISP-Operated	MSLSTM	BGPviewer	Ours	BGPvector	ISP-Operated	MSLSTM	BGPviewer	Ours
Google_hijack	✓	✗	✗	✓	✓	18(16)	0(0)	1(1)	18(16)	11(8)
Pakistan_Telecom	✓	✗	✗	✓	✓	22(21)	0(0)	13(13)	801(796)	27(22)
US_carrier_Sprint	✓	✓	✓	✓	✓	51(50)	63(49)	33(27)	88(81)	36(23)
20140910-AS57807	✗	✓	✓	✓	✓	32(32)	27(26)	3(2)	51(50)	28(24)
H3S_median_services	✓	✗	✗	✓	✓	19(17)	0(0)	8(8)	1106(1100)	26(20)
VolumeDrive	✗	✓	✗	✓	✓	61(61)	82(73)	6(6)	38(37)	70(69)
GHOSTnet	✓	✓	✗	✓	✓	22(16)	14(10)	0(0)	36(33)	5(0)
Bitcanal	✓	✓	✓	✓	✓	26(25)	54(52)	20(15)	35(34)	25(21)
Australia_Telstra	✓	✓	✓	✓	✓	12(11)	26(19)	17(8)	73(59)	19(17)
Malaysian_Telecom	✓	✓	✓	✓	✓	18(17)	165(122)	148(115)	141(126)	7(0)
Google_leak	✓	✓	✓	✓	✓	12(11)	82(67)	16(2)	91(84)	17(13)
Level_3	✗	✓	✓	✓	✓	16(16)	196(103)	134(89)	178(142)	16(15)
Allegheny_DQE	✓	✓	✓	✓	✓	24(0)	257(142)	173(90)	209(140)	17(0)
Cablevision_Mexico	✗	✗	✗	✓	✓	4(4)	0(0)	69(69)	126(119)	19(17)
Worldstream	✓	✓	✓	✓	✓	25(21)	15(0)	56(50)	107(96)	18(9)
Facebook	✓	✓	✓	✓	✓	22(16)	506(167)	296(120)	647(427)	19(6)
Comcast_1	✓	✓	✓	✓	✓	15(15)	69(32)	589(526)	531(493)	15(12)
Comcast_2	✓	✓	✓	✓	✓	11(10)	96(46)	644(580)	523(498)	16(15)
UA_ISP_BGP	✓	✓	✓	✓	✓	61(46)	738(501)	1707(1352)	1155(913)	61(54)
Cloudflare	✓	✓	✓	✓	✓	13(6)	83(16)	190(165)	574(519)	13(6)
FPR						1.74%	6.19%	12.23%	30.22%	1.45%
Overall	16/20	16/20	14/20	20/20	20/20	483(410)	2473(1325)	3525(2588)	6522(5658)	465(351)

Table 9. The resource occupation of the baselines.

Method	Memory (MB)	Detection Time (ms)
BGPvector	558	0.06
ISP-Operated	620	52
MSLSTM	545	53
BGPviewer	668	50
Ours	537	0.04

Table 10. Experimental results with no weights assigned and with weights assigned.

Dataset	#Alarms (#FalseAlarms)
Dataset	No Weights Assigned	Weights Assigned
US_carrier_Sprint	42(31)	36(23)
GHOSTnet	20(14)	5(0)
FPR	1.34%	0.51%
Overall	62(45)	41(23)

Table 11. The methods of ablation experiments.

Methods	Method Implementation
M₀	Initial AS embedding (no topology inference).
M₁	Perform topology inference on the entire graph.
M₂	Perform topology inference directly on the partitioned subgraph without enhancement.
M₃	The method proposed in this paper.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wu, Z.; Zhou, Y.; Wu, J. GLBAD: Online BGP Anomaly Detection Under Partial Observation. Electronics 2025, 14, 4940. https://doi.org/10.3390/electronics14244940
