Technologies
  • Article
  • Open Access

26 February 2026

GAT-LA: Graph Attention-Based Locality-Aware Sampling for Modeling the Dynamic Evolution of I2P Routing Topologies

1 Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510555, China
2 Department of New Networks, Pengcheng Laboratory, Shenzhen 518108, China
3 University International College, Macau University of Science and Technology, Macau 999078, China
4 Research Institute, China Telecom Company Ltd., Guangzhou 510660, China

Abstract

Anonymous communication networks such as the Invisible Internet Project (I2P) are essential for safeguarding privacy and ensuring freedom of expression, necessitating robust performance and security evaluation in controlled environments. Network testbeds offer a reliable alternative to real-world testing. This paper proposes a dynamic modeling framework based on Graph Attention Network (GAT). We introduce a Region-Centric Initialization (RCI) strategy to establish an initial observation anchor, followed by a GAT-based Locality-Aware (GAT-LA) sampling mechanism that treats representative node selection as a dynamic learning task. Experimental results demonstrate that the GAT-LA mechanism significantly outperforms static methods in maintaining long-term similarity to real-world I2P performance metrics. The integrated stability penalty mechanism effectively suppresses excessive topological fluctuations, ensuring temporal smoothness across evolutionary cycles. Furthermore, the RCI strategy provides high engineering flexibility by supporting both automated scoring and target-oriented manual configuration. This paper presents a scalable methodology for dynamic network simulation with enhanced statistical alignment, providing a practical reference for security research within resource-constrained anonymous network ranges or testbeds.

1. Introduction

Anonymous communication networks (ACNs) employ mechanisms such as layered encryption and multi-hop forwarding to protect the identities and relationships of communicating parties, which is essential for safeguarding user privacy and promoting freedom of expression [1]. Currently, The Onion Router (Tor) and the Invisible Internet Project (I2P) represent the most prevalent anonymous networks [2]. While the increasing demand for robust anonymity drives the continuous evolution of novel protocols and cryptographic mechanisms [3,4], adversaries are persistently attempting to breach existing defense boundaries through traffic analysis, timing correlation, and de-anonymization [5]. Consequently, security assessment and the verification of offensive and defensive capabilities for anonymous networks have become critical components of contemporary research.
However, the infeasibility of conducting direct technical assessments and offensive-defensive drills within real-world anonymous networks has become increasingly prominent [6]. First, the validation of novel protocols often requires large-scale software updates by users to observe network effects, resulting in a protracted and complex experimental process. Second, the inherent openness and uncontrollability of live anonymous networks make it difficult to establish uniform experimental conditions, leading to poor reproducibility and undermining scientific rigor. Furthermore, conducting offensive testing in live environments inevitably risks the collection of actual user data, thereby violating ethical guidelines and privacy protection requirements. Additionally, aggressive testing may significantly impair overall network performance, potentially leading to service disruptions or systemic collapse.
Developing simulated experimental platforms, such as cyber ranges, has emerged as an effective alternative to direct experimentation in real-world environments [7,8,9,10,11,12,13,14]. These platforms support diverse research areas, including path selection optimization [15,16,17], load balancing [18,19], and Denial-of-Service (DoS) attacks [20,21,22,23]. For simulations to be valid, the modeled networks within the cyber range must maintain high fidelity to their real-world counterparts. However, due to hardware resource constraints, simulated environments are typically operated at a significantly reduced scale. Consequently, a robust modeling strategy is required to ensure that the downscaled topology retains the essential characteristics and behavioral patterns of the target network [24], such as topological properties, performance metrics, and traffic patterns.
Currently, network structure modeling methods for anonymous network cyber ranges can be categorized by their simulation fidelity into three levels: those based on basic network topologies [25,26], those utilizing darknet resource crawling [12], and specialized Tor network modeling [24]. In contrast, structural modeling techniques for I2P focus primarily on obtaining netDb information through node measurements (RouterInfos, which hold the contact information of I2P routers [27,28,29,30], and LeaseSets for hidden services, known as eepsites [31,32]) and on analyzing I2P characteristics from diverse perspectives to better understand its operational mechanisms. These studies have revealed the highly dynamic nature of the I2P network: frequent node churn drives continuous structural drift and evolution at both macroscopic and microscopic levels. However, current measurement findings have not yet been fully integrated into reproducible, time-varying modeling schemes suitable for simulated environments such as cyber ranges.
Therefore, leveraging measurement data from the real I2P network to construct representative structural models capable of reflecting its dynamic evolution is crucial for providing a reliable environment for subsequent experimental validation. Such modeling efforts not only enhance the scientific rigor of anonymous network research, but also strengthen the credibility of empirical findings, thereby ensuring the validity of research outcomes in real-world scenarios.
Against this backdrop, this paper adopts a local-view modeling perspective and proposes a Graph Attention Network [33] (GAT)-based locality-aware sampling strategy (GAT-LA). The global modeling of I2P is significantly hindered by technical complexities, as full-network topology mapping incurs prohibitive resource overhead and poses significant risks regarding the large-scale collection of sensitive data. By focusing on a specific observation domain, local modeling substantially reduces computational requirements. This approach maintains favorable precision and controllability within the target scope, aligning more closely with the ethical imperatives of research and privacy protection.
This paper proposes a dynamic local network structure modeling method for I2P cyber ranges, designed to generate a simulated target network that reflects the temporal evolution of network structures under the constraint of reduced node scales. The design philosophy draws inspiration from the adaptive zooming and regional focusing mechanisms of digital maps, supporting refined modeling of specific areas within a limited observational scope. Specifically, centering on a designated observation domain, the method performs targeted screening of anchor points within this domain and critical nodes within their topological neighborhoods; this captures and characterizes the key network structural and dynamic evolution features of the domain, thereby maintaining the representativeness of the simulated environment under a controlled experimental scale.
We describe the measurement data, which encompasses network nodes and logical topology obtained from the real I2P network, in Section 3.1. Section 3.3 introduces a region-centric initialization (RCI) strategy that identifies optimal entry points within the acquired local observation domain via two modes: performance-aware and manual-configuration. Section 3.4 then describes GAT-LA, which governs the process of dynamic evolution and the screening of representative nodes. In Section 4, we instantiate the proposed model within the cyber range using P-I2Prange [34], an automated framework for constructing I2P application scenarios, and compare its network performance characteristics with data collected from the real I2P network. Experimental results demonstrate that our approach successfully models representative node selection as a dynamic learning task and effectively preserves the performance distribution of the target network throughout the modeling cycle.
Our contributions are listed as follows:
  • We propose a dynamic network modeling framework specifically for I2P cyber ranges. By formulating the selection of representative nodes within a localized observation domain as an adaptive learning task, this framework provides a structured approach to capturing the temporal evolution manifested in the localized observation domain.
  • We design a region-centric initialization (RCI) strategy that provides a cold start for local modeling through an automated scoring mechanism for representative anchors. The strategy also supports a manual configuration mode to accommodate diverse research requirements, such as goal-oriented optimization of regional performance.
  • We implement a locality-aware sampling mechanism, termed GAT-LA, which integrates multi-head attention with stability constraints to enhance the approximation of the observed local network characteristics and temporal smoothness within a downscaled simulation domain.

3. Methodology

Given the decentralized nature of the I2P network and the technical complexity of full-network topology mapping, this section describes a dynamic local network modeling method based on GAT, leveraging local network data obtained from the real I2P network. The method aims to characterize the key structural and dynamic evolution features of the actual local I2P network within a cyber range simulation environment, specifically at a reduced node scale. Its design philosophy draws inspiration from the adaptive zooming and regional focusing mechanisms found in digital maps. Figure 2 illustrates the overall framework of the proposed method.
Figure 2. The overall framework of the proposed dynamic local network modeling method for I2P.

3.1. Data Preparation from the Real I2P Network

To ensure that the modeled I2P network preserves the statistical characteristics of the real I2P network, this section first acquires local topology data from the real I2P network through passive measurement techniques. Since the I2P netDb aggregates fragmented global views from individual routers, we leverage controlled probes to collect local netDb data, enabling an incremental and localized characterization of the I2P topology.
As illustrated in the data preparation module in Figure 2, the measurement process leverages the I2P node discovery mechanism through a passive, Breadth-First Search (BFS)-like probing strategy. Specifically, the probe node $R$ extracts each routerHash, the unique identifier of a router, from its local storage to define the vertex set $V$. Let $V_{local}$ denote the local netDb peer list; for each $v_i \in V_{local}$, we construct a logical edge $(R, v_i) \in E$. To ensure computational efficiency and mitigate redundancy, we introduce an edge filtering mechanism based on a generic hashing strategy. This mechanism computes MD5 digests of edge identifiers and queries a global edge hash table, thereby enabling efficient deduplication and rapid insertion of newly discovered topological edges into the dataset.
Throughout this process, the probe nodes strictly adhere to the principle of passive observation, merely recording the natural diffusion of netDb entries without initiating active queries to external nodes. This non-intrusive approach ensures that the measurement avoids interference with the neighbor selection decisions of real nodes. The measurement task terminates when the observation process no longer yields new logical edges. The resulting undirected topological graph $G = (V, E)$ characterizes the logical adjacency relationships within the local I2P network, providing a foundational dataset for subsequent modeling of dynamic evolution.
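The edge filtering step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the identifier format (`"a|b"` with sorted endpoints, so that an undirected edge hashes identically in both directions), the function names, and the toy router names are all assumptions.

```python
import hashlib

def edge_key(u: str, v: str) -> str:
    """MD5 digest of an undirected edge identifier.

    Assumption: endpoints are sorted before hashing so that (u, v) and
    (v, u) map to the same key; the paper does not specify the format.
    """
    a, b = sorted((u, v))
    return hashlib.md5(f"{a}|{b}".encode("utf-8")).hexdigest()

def merge_local_view(probe, local_peers, vertices, edges, seen):
    """Add logical edges (probe, peer) from one probe's local netDb peer list.

    `seen` plays the role of the global edge hash table. Returns the number
    of newly inserted edges; a return value of 0 for every probe would signal
    that the measurement has stopped yielding new edges.
    """
    added = 0
    vertices.add(probe)
    for peer in local_peers:
        if peer == probe:
            continue  # skip self-references
        vertices.add(peer)
        key = edge_key(probe, peer)
        if key not in seen:
            seen.add(key)
            edges.append((probe, peer))
            added += 1
    return added

# Toy run with hypothetical routerHash values.
vertices, edges, seen = set(), [], set()
new_1 = merge_local_view("R1", ["a", "b", "R2"], vertices, edges, seen)
new_2 = merge_local_view("R2", ["b", "R1"], vertices, edges, seen)
```

In the toy run the second probe contributes only one new edge, because its edge to `R1` was already recorded from the other side.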

3.2. Problem Definition

Building upon the local topological observations obtained in Section 3.1, we define the representative structure modeling of the I2P network as a Dynamic Representative Node Selection (DRNS) problem over spatio-temporal graphs. Specifically, the primary objective of dynamic modeling is to derive a representative topology sequence for the target environment from the locally observed graph sequence $\mathcal{G} = \{G^{(1)}, G^{(2)}, \dots, G^{(t)}, \dots, G^{(T)}\}$, where $T$ denotes the total number of observation intervals encompassed in the modeling scope. Each time step $t$ is configured with a default granularity of a 24-h observation cycle.
For each $G^{(t)} = (V^{(t)}, E^{(t)})$ at time step $t$, $V^{(t)}$ represents the set of $N$ routerHashes observed locally during this interval, and $E^{(t)}$ is the set of logical topological edges derived via the methodology described in Section 3.1. Each node $v \in V^{(t)}$ is associated with a $d$-dimensional normalized feature vector $x_v^{(t)} \in [0, 1]^5$ (here $d = 5$), which integrates the critical performance metrics summarized in Table 1. These metrics are extracted from the I2P local management interface, named viewprofile, which provides real-time performance statistics and peer profiling data.
Table 1. Critical performance metrics and their descriptions for I2P nodes.
The objective of the proposed method is to learn a time-varying mapping function $f: \{G^{(1)}, \dots, G^{(t)}, \dots, G^{(T)}\} \to \{S^{(1)}, \dots, S^{(t)}, \dots, S^{(T)}\}$, where $S^{(t)} \subseteq V^{(t)}$ denotes a subset of $K$ representative nodes to be instantiated within the cyber range ($K \ll N$). To ensure simulation fidelity, the ideal optimization objective of the mapping function $f$ is to minimize the cumulative discrepancy in performance distributions between the modeled environment and the live network, as formulated in Equation (2).
$$\min_{f} \sum_{t=1}^{T} \left[ \mathrm{Dist}\left( P(S^{(t)}), P(V^{(t)}) \right) + \lambda \cdot \Omega\left( S^{(t)}, S^{(t-1)} \right) \right] \tag{2}$$
where $P(\cdot)$ denotes the distribution of performance metrics, such as average Time-to-Last-Byte (TTLB). $\mathrm{Dist}(\cdot)$ represents a distance metric used to quantify the divergence between distributions. $\Omega(\cdot)$ serves as a stability constraint weighted by the penalty coefficient $\lambda$, which is specifically designed to suppress excessive topological fluctuations between consecutive simulation cycles. In practice, since subset selection $S^{(t)}$ is a discrete combinatorial process, we decompose this global objective into a supervised node-level regression task. By accurately predicting individual node scores, as detailed in Section 3.4, the model facilitates the selection of a representative subset $S^{(t)}$ that satisfies these distributional and stability constraints.
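Equation (2) leaves both the distance and the stability term abstract. As a purely illustrative reading, the sketch below takes the distance to be the absolute gap between the subset mean and the population mean of a single metric, and the stability term to be one minus the overlap ratio of consecutive selections (set to 0 for the first step, where no previous selection exists). These choices, the function name `drns_objective`, and the toy data are assumptions, not the paper's definitions.

```python
def drns_objective(observed, selected, lam=0.2):
    """Evaluate an illustrative instance of the Equation (2) objective.

    observed: list over time steps of {node: metric value} for all of V(t)
    selected: list over time steps of sets S(t), each a subset of V(t)
    lam:      penalty coefficient (lambda)
    """
    total = 0.0
    previous = None
    for values, chosen in zip(observed, selected):
        mean_all = sum(values.values()) / len(values)
        mean_sel = sum(values[v] for v in chosen) / len(chosen)
        dist = abs(mean_sel - mean_all)  # assumed stand-in for Dist
        if previous is None:
            omega = 0.0  # assumption: no stability cost at the first step
        else:
            overlap = len(chosen & previous) / len(chosen | previous)
            omega = 1.0 - overlap  # assumed stand-in for Omega
        total += dist + lam * omega
        previous = chosen
    return total

# Two time steps, three nodes, K = 2 (toy values).
observed = [{"a": 1.0, "b": 3.0, "c": 5.0}, {"a": 2.0, "b": 2.0, "c": 2.0}]
selected = [{"a", "c"}, {"a", "b"}]
value = drns_objective(observed, selected, lam=0.2)
```

In this toy case both subsets match the population mean exactly, so the whole cost comes from the stability term of the second step.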

3.3. Region-Centric Initialization

In the modeling of I2P networks, defining the spatial scope involves a strategic trade-off between technical feasibility, computational efficiency, and research ethics. Rather than pursuing global modeling at a network-wide scale, this study deliberately focuses on a specific observation domain. This localized observation paradigm significantly mitigates redundant demands for computational complexity and storage resources inherent in large-scale anonymous network mapping, while effectively minimizing unnecessary interference with irrelevant nodes. From an ethical perspective, localized modeling maximizes the avoidance of privacy leakage risks associated with network-wide probing, thereby aligning with the compliance requirements of anonymity research.
Against this backdrop, we propose the region-centric initialization (RCI) strategy to establish a robust topological reference frame within the defined observation domain. By establishing representative anchor points, RCI fixes the modeling focus, ensuring that when structures are mapped to a further downscaled simulation domain, the topological evolution patterns of the target region can be characterized.
The RCI strategy supports two modes: a manual configuration mode guided by specific research objectives, and a default representative scoring mechanism as described in this section.
By default, the RCI strategy implements an anchor importance scoring mechanism to identify nodes that exert significant structural influence within the observation domain. The selection of anchor points is contingent upon the entire observed sequence $\mathcal{G}$, i.e., $c \in \bigcup_{t=1}^{T} V^{(t)}$. For any candidate node $v$, its importance score, $\mathrm{score}(v)$, is defined in Equation (3), where each constituent parameter represents the time-averaged value of the node's metrics across the entire duration $T$.
$$\mathrm{score}(v) = \hat{\bar{r}}_v + \hat{\bar{c}}_v + \hat{\bar{s}}_v + \hat{\bar{i}}_v + \hat{\bar{b}}_v + f_v + u_v \tag{3}$$
where $r_v = \frac{1}{1 + \mathrm{tunnel\_create\_time}_v}$, $c_v = \mathrm{capacity}_v$, $s_v = \mathrm{speed}_v$, $i_v = \mathrm{integration}_v$, $b_v \in \{1, 2, \dots, 7\}$ denotes the ordinal mapping value of the node's $\mathrm{router\_bandwidth}_v$, and $f_v \in \{0, 1\}$ is a binary indicator signifying whether node $v$ functions as a Floodfill router. The stability metric $u_v$ is quantified as the reachability frequency, defined as the proportion of successful observations of node $v$ across all measurement sub-steps within the duration $T$. $\bar{\cdot}$ denotes the arithmetic mean calculated across all measurement sub-steps, while $\hat{\cdot}$ signifies the Min-Max normalization applied over the vertex set to ensure all metrics are commensurable within the range $[0, 1]$.
The node achieving the highest $\mathrm{score}(v)$ is designated as the anchor point $c$, which is identified at the initialization of each modeling process based on performance data across the duration $T$. Once designated, this anchor remains invariant throughout the subsequent dynamic evolution period $T$ and is not subject to re-selection despite state transitions of other nodes. This fixed topological reference frame is critical for quantifying structural drift and precisely characterizing performance fluctuations within the target region. The anchor is re-evaluated and updated at the commencement of each new modeling cycle $T$ using updated performance metrics.
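The default scoring mode can be sketched as below. The record layout (one dict per successful observation with the keys shown), the function names, and the toy values are hypothetical; only the arithmetic follows Equation (3): time-average each metric, Min-Max normalize across nodes, then add the Floodfill flag and the reachability frequency.

```python
METRICS = ("r", "capacity", "speed", "integration", "bandwidth")

def min_max(values):
    """Min-Max normalize {node: value} across nodes into [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    # Assumption: a metric that is constant across nodes contributes 0.
    return {v: (x - lo) / span if span > 0 else 0.0 for v, x in values.items()}

def rci_scores(history, floodfill, total_steps):
    """Compute score(v) for every node.

    history:     {node: [observation dict, ...]}, at least one observation each;
                 keys: tunnel_create_time, capacity, speed, integration,
                 bandwidth (ordinal value in 1..7)
    floodfill:   set of nodes acting as Floodfill routers
    total_steps: number of measurement sub-steps within the duration T
    """
    averaged = {m: {} for m in METRICS}
    for node, obs in history.items():
        n = len(obs)
        averaged["r"][node] = sum(1.0 / (1.0 + o["tunnel_create_time"]) for o in obs) / n
        for m in METRICS[1:]:
            averaged[m][node] = sum(o[m] for o in obs) / n
    normalized = {m: min_max(averaged[m]) for m in METRICS}
    scores = {}
    for node, obs in history.items():
        u = len(obs) / total_steps                 # reachability frequency u_v
        f = 1.0 if node in floodfill else 0.0      # Floodfill indicator f_v
        scores[node] = sum(normalized[m][node] for m in METRICS) + f + u
    return scores

def select_anchor(scores):
    """Return the node with the highest score as the anchor c."""
    return max(scores, key=scores.get)

strong = {"tunnel_create_time": 1.0, "capacity": 10, "speed": 100, "integration": 5, "bandwidth": 7}
weak = {"tunnel_create_time": 3.0, "capacity": 2, "speed": 20, "integration": 1, "bandwidth": 3}
scores = rci_scores({"n1": [strong, strong], "n2": [weak]}, floodfill={"n1"}, total_steps=2)
anchor = select_anchor(scores)
```

With seven additive terms each bounded by 1, the score lies in $[0, 7]$; in the toy data `n1` attains the maximum on every term.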
To enable the subsequent Graph Neural Network (GNN) to distinguish this focal center, $f_v$, $u_v$, and a binary indicator variable $z_v = \mathbb{I}[v = c]$ are appended to the tail of each node $v$'s feature vector, forming a concatenated input for the Graph Attention Network (GAT) model detailed in Section 3.4. This architectural design allows the model to implicitly capture the structural and functional centrality of the anchor point $c$, thereby bypassing the high computational overhead associated with explicit topological distance calculations.
Although a multi-center configuration could be considered for broader regions, this study adopts a single-center initialization mode as the default setting to ensure the controllability and interpretability of the evaluation framework.

3.4. GAT-Based Locality-Aware Sampling Strategy

To construct the simulation domain $S^{(t)}$ from the observation domain, the node selection process must evolve from static heuristic evaluation toward a dynamic, performance-oriented modeling approach. This section introduces the GAT-based locality-aware sampling strategy (GAT-LA), which employs GAT as a learnable feature aggregator to synthesize multi-dimensional health metrics and spatial constraints. By leveraging the attention mechanism, GAT-LA can adaptively weigh the importance of individual nodes, ensuring that the resulting simulation domain $S^{(t)}$ is structurally representative.
The GAT-LA strategy is executed iteratively at each time step $t \in \{1, \dots, T\}$ to generate a sequence of simulation domains $\{S^{(1)}, S^{(2)}, \dots, S^{(T)}\}$. The objective of GAT-LA is to map the raw observational data of candidate nodes into a set of performance-based sampling probabilities. This mapping ensures that the transition from observation to simulation is guided by learned node importance rather than static rules. The input and output definitions of GAT-LA are detailed as follows:
  • Input: A graph $G^{(t)} = (V^{(t)}, E^{(t)})$ representing the network topology at time $t$, and an augmented feature vector $x_v^{(t)} \in \mathbb{R}^8$ for each node $v \in V^{(t)}$. The vector is constructed as $x_v^{(t)} = [x_1, \dots, x_5, f_v, u_v, z_v]$, where the first five dimensions correspond to the normalized metrics listed in Table 1, while $f_v$, $u_v$, and $z_v$ are the node-specific indicators derived in Section 3.3.
  • Output: A scalar performance score $s_v^{(t)} \in [0, 1]$ for each node $v \in V^{(t)}$, representing its predicted service efficacy within the current observation cycle. This score quantifies the relative capability of each node in maintaining stable service delivery, thereby providing the probabilistic basis for the subsequent sampling process.
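The construction of the 8-dimensional input can be sketched as follows. The helper name `augment_features` and the container types are hypothetical; the ordering of the appended entries follows the input definition above.

```python
def augment_features(base, floodfill, reachability, anchor):
    """Build [x_1, ..., x_5, f_v, u_v, z_v] for every node.

    base:         {node: list of five normalized metrics in [0, 1]}
    floodfill:    set of Floodfill routers (gives f_v)
    reachability: {node: reachability frequency u_v in [0, 1]}
    anchor:       the anchor node c chosen by RCI (gives z_v)
    """
    augmented = {}
    for node, metrics in base.items():
        if len(metrics) != 5:
            raise ValueError(f"expected 5 base metrics for {node!r}")
        f_v = 1.0 if node in floodfill else 0.0
        u_v = reachability.get(node, 0.0)  # assumption: unseen nodes get 0
        z_v = 1.0 if node == anchor else 0.0
        augmented[node] = list(metrics) + [f_v, u_v, z_v]
    return augmented

features = augment_features(
    {"c": [0.9, 0.8, 0.7, 0.6, 1.0], "p": [0.4, 0.5, 0.3, 0.2, 0.5]},
    floodfill={"c"},
    reachability={"c": 1.0, "p": 0.8},
    anchor="c",
)
```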
A two-layer multi-head Graph Attention Network (GAT) was implemented to capture structural interdependencies. Let $h_v^{(l)}$ denote the embedding of node $v$ at the $l$-th layer, where $h_v^{(0)} = x_v^{(t)}$ represents the initial state. The output embedding $h_v^{(l)}$ for node $v$ at layer $l$ is formulated as Equation (4).
$$h_v^{(l)} = \sigma\left( \sum_{u \in \mathcal{N}(v) \cup \{v\}} \alpha_{vu}^{(l)} W^{(l)} h_u^{(l-1)} \right) \tag{4}$$
where $\sigma$ denotes the Exponential Linear Unit (ELU) adopted as the non-linear activation function. $W^{(l)} \in \mathbb{R}^{d \times d_{in}}$ is the weight matrix used to transform input features into $d$-dimensional embeddings, where $d$ is set to 64 by default. For the initial layer ($l = 1$), $d_{in} = 8$ corresponds to the dimension of the augmented feature vector $x_v^{(t)}$. $\alpha_{vu}^{(l)}$ represents the attention weight normalized via the Softmax function, as formulated in Equations (5) and (6).
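$$\alpha_{vu}^{(l)} = \frac{\exp\left( e_{vu}^{(l)} \right)}{\sum_{k \in \mathcal{N}(v) \cup \{v\}} \exp\left( e_{vk}^{(l)} \right)} \tag{5}$$
$$e_{vu}^{(l)} = \mathrm{LeakyReLU}\left( {a^{(l)}}^{\top} \left[ W^{(l)} h_v^{(l-1)} \,\|\, W^{(l)} h_u^{(l-1)} \right] \right) \tag{6}$$
where $e_{vu}^{(l)}$ represents the attention coefficient that quantifies the importance of neighbor $u$ to node $v$ in the $l$-th layer. $a^{(l)}$ denotes the learnable attention weight vector of the $l$-th layer, and $\|$ symbolizes the concatenation operation. The normalization in Equation (5) runs over the same set $\mathcal{N}(v) \cup \{v\}$ as the aggregation in Equation (4), so the attention weights of each node sum to 1.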
The final node embedding is defined as $z_v^{(t)} = h_v^{(2)} \in \mathbb{R}^d$, which is generated by aggregating neighborhood information across multiple layers to capture high-order structural dependencies.
To map the high-dimensional latent embedding $z_v^{(t)}$ into a scalar sampling metric, a lightweight Multi-Layer Perceptron (MLP) is employed as a scorer. The embedding vector $z_v^{(t)}$ is mapped to a raw efficiency score $s_v^{(t)} \in [0, 1]$ via a two-layer MLP, which transforms the 64-dimensional input into a 32-dimensional hidden representation with ReLU activation, followed by a linear projection and a Sigmoid output function. This score serves to quantify the predicted service effectiveness of the node within the current simulation window. All weights and biases involved in this mapping process are integrated into the comprehensive parameter set $\Theta$ for subsequent joint training and optimization.
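The forward pass of Equations (4) to (6) and the MLP scorer can be illustrated with a dependency-free sketch. It deliberately simplifies the described model: a single attention head instead of multiple heads, tiny dimensions (4 and 2 instead of 64 and 32), random untrained weights, no bias in the attention layer, and a LeakyReLU slope of 0.2. All names and values are assumptions made for illustration.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def elu(x):
    return x if x > 0 else math.exp(x) - 1.0

def gat_layer(features, neighbours, W, a):
    """Single-head attention layer following Equations (4)-(6)."""
    projected = {v: matvec(W, h) for v, h in features.items()}
    out = {}
    for v in features:
        hood = sorted(set(neighbours.get(v, ())) | {v})  # N(v) plus v itself
        logits = [
            leaky_relu(sum(ai * zi for ai, zi in zip(a, projected[v] + projected[u])))
            for u in hood
        ]
        top = max(logits)  # subtract the maximum for numerical stability
        weights = [math.exp(e - top) for e in logits]
        norm = sum(weights)
        alphas = [w / norm for w in weights]
        dim = len(projected[v])
        mixed = [sum(al * projected[u][k] for al, u in zip(alphas, hood)) for k in range(dim)]
        out[v] = [elu(x) for x in mixed]
    return out

def mlp_score(z, W1, b1, w2, b2):
    """Two-layer scorer: ReLU hidden layer, linear projection, Sigmoid."""
    hidden = [max(0.0, s + b) for s, b in zip(matvec(W1, z), b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_in, d, d_hidden = 8, 4, 2  # toy sizes; the paper uses 8, 64 and 32
features = {
    "c": [0.9, 0.8, 0.7, 0.6, 1.0, 1.0, 1.0, 1.0],
    "p": [0.4, 0.5, 0.3, 0.2, 0.5, 0.0, 0.8, 0.0],
    "q": [0.1, 0.2, 0.1, 0.3, 0.2, 0.0, 0.5, 0.0],
}
neighbours = {"c": ["p", "q"], "p": ["c"], "q": ["c"]}

h1 = gat_layer(features, neighbours, rand_matrix(rng, d, d_in), rand_matrix(rng, 1, 2 * d)[0])
h2 = gat_layer(h1, neighbours, rand_matrix(rng, d, d), rand_matrix(rng, 1, 2 * d)[0])
W1, b1 = rand_matrix(rng, d_hidden, d), [0.0] * d_hidden
w2, b2 = rand_matrix(rng, 1, d_hidden)[0], 0.0
scores = {v: mlp_score(z, W1, b1, w2, b2) for v, z in h2.items()}
```

A practical implementation would more likely rely on an existing graph learning library; the plain-Python form is used here only to keep each step of the equations visible.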
Subsequently, at the onset of each time step $t$, GAT-LA undergoes a weakly supervised training phase to align the model with the most recent network state. Specifically, the model employs an incremental update strategy: for $t = 1$, the parameter set $\Theta$ is initialized randomly; for all subsequent steps $t > 1$, the model inherits the optimized parameters from the previous step $t - 1$ and performs local fine-tuning. We define the training label $y_v^{(t)}$ as the instantaneous vitality score. While its underlying calculation logic remains consistent with Equation (3) in Section 3.3, it is specifically applied to the real-time health metrics observed at the current time step $t$.
The training objective is to minimize the Mean Squared Error (MSE) between the predicted score $s_v^{(t)}$ and the ground-truth label $y_v^{(t)}$, as formulated in Equation (7). This optimization process ensures that the model captures the instantaneous network dynamics by aligning the predicted efficiency with the observed vitality indicators.
$$\mathcal{L}(\Theta) = \frac{1}{|V^{(t)}|} \sum_{v \in V^{(t)}} \left( s_v^{(t)} - y_v^{(t)} \right)^2 + \gamma \|\Theta\|_2^2 \tag{7}$$
where $\Theta$ denotes the comprehensive set of model parameters, encompassing the weight matrices $W^{(l)}$ and attention vectors $a^{(l)}$ of the GAT, as well as all weights and biases within the MLP. The parameter $\gamma$ serves as the regularization coefficient to prevent overfitting. By minimizing $\mathcal{L}(\Theta)$, the model effectively learns the optimal attention weights and feature mappings required to accurately predict service effectiveness.
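The loss of Equation (7) reduces to a few lines once predictions, labels, and a flattened parameter list are available. The sketch below assumes the labels have already been rescaled to $[0, 1]$ so that they are comparable with the Sigmoid output (for instance by dividing the seven-term sum of Equation (3) by 7); the text does not state how this scaling is done, and the value of `gamma` is likewise a placeholder.

```python
def gat_la_loss(predicted, labels, parameters, gamma=1e-4):
    """Mean squared error over nodes plus an L2 penalty on the parameters.

    predicted:  {node: s_v in [0, 1]}
    labels:     {node: y_v}, assumed to be scaled to [0, 1]
    parameters: flat iterable of all scalar weights and biases in Theta
    gamma:      regularization coefficient (placeholder value)
    """
    mse = sum((predicted[v] - labels[v]) ** 2 for v in predicted) / len(predicted)
    l2 = sum(p * p for p in parameters)
    return mse + gamma * l2

loss = gat_la_loss({"a": 0.5, "b": 1.0}, {"a": 0.0, "b": 1.0}, [1.0, 2.0], gamma=0.1)
```

For the toy inputs the MSE term is 0.125 and the penalty term is 0.1 × 5 = 0.5, giving 0.625.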
To ensure the statistical validity and temporal smoothness of the modeling process, GAT-LA incorporates a stability penalty mechanism to refine the raw scores. The adjusted importance score $\hat{s}_v^{(t)}$ is defined as formulated in Equation (8).
$$\hat{s}_v^{(t)} = s_v^{(t)} \cdot \left( 1 - \lambda \cdot \mathbb{1}_{S^{(t-1)}}(v) \right) \tag{8}$$
where $\mathbb{1}_{S^{(t-1)}}(v)$ denotes the indicator function that represents the node selection status in the previous cycle, and $\lambda$ is the penalty coefficient. This mechanism effectively penalizes previously selected nodes, thereby encouraging the model to explore new high-utility nodes and ensuring temporal diversity in the sampling process.
Finally, based on the selection probability $P_v^{(t)} = \mathrm{Softmax}(\hat{s}_v^{(t)})$, $K$ nodes are selected through weighted sampling without replacement to constitute the simulation domain $S^{(t)}$ for the current cycle. This probabilistic selection mechanism ensures that nodes with higher adjusted importance scores are prioritized while maintaining a degree of stochastic exploration.
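The penalty and sampling steps can be sketched as follows. The sequential draw (pick one node, remove it, renormalize, repeat) is one common way to realize weighted sampling without replacement and is an assumption here, as are the function names and toy scores.

```python
import math
import random

def adjusted_scores(raw, previous, lam=0.2):
    """Equation (8): scale down the score of nodes selected in the last cycle."""
    return {v: s * (1.0 - lam * (1.0 if v in previous else 0.0)) for v, s in raw.items()}

def softmax(scores):
    """Softmax over all candidate nodes, computed stably."""
    top = max(scores.values())
    exps = {v: math.exp(s - top) for v, s in scores.items()}
    total = sum(exps.values())
    return {v: e / total for v, e in exps.items()}

def sample_without_replacement(probs, k, rng):
    """Draw k distinct nodes, each draw weighted by the remaining probabilities."""
    pool = dict(probs)
    chosen = []
    for _ in range(min(k, len(pool))):
        nodes = list(pool)
        # random.choices normalizes the weights, so no explicit renormalization.
        pick = rng.choices(nodes, weights=[pool[n] for n in nodes], k=1)[0]
        chosen.append(pick)
        del pool[pick]
    return chosen

raw = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.5}
adjusted = adjusted_scores(raw, previous={"a"}, lam=0.2)
probabilities = softmax(adjusted)
domain = sample_without_replacement(probabilities, k=2, rng=random.Random(42))
```

Because the adjusted scores lie in $[0, 1]$, a plain Softmax produces fairly flat probabilities; whether a temperature or other sharpening is applied is not stated in the text.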
By integrating the aforementioned components, the GAT-LA strategy operates as a cohesive, iterative pipeline. At each time step $t$, the model first calibrates its internal parameters via weakly supervised learning to match the evolving network dynamics. Subsequently, it reconciles instantaneous performance prediction with temporal stability constraints to yield the refined importance scores. This process culminates in the probabilistic sampling of $K$ nodes, effectively generating a sequence of simulation domains $\{S^{(1)}, \dots, S^{(T)}\}$ that are both statistically representative and topologically stable.
Conceptually, the design of GAT-LA is inspired by the adaptive zooming and region-focusing mechanisms in digital mapping applications. In this paradigm, the model centers on a designated anchor and selectively prioritizes key nodes within its topological neighborhood to sustain a high-fidelity structural representation. Unlike stochastic sampling methods such as Random Walk or Forest Fire—which may fail to maintain local focus or overlook structural importance—GAT-LA functions as a targeted filtering process. By focusing on nodes that define the core structural characteristics and dynamic evolution of the observation domain, GAT-LA ensures that the downscaled simulation environment remains structurally representative while operating within a controlled experimental scale.

4. Experimental Results

We instantiate the proposed dynamic modeling method within the cyber range environment of Pengcheng Laboratory (PCL). This section evaluates the effectiveness of the dynamic evolution modeling approach by quantitatively comparing the performance consistency between the ground-truth I2P network observation domain and various controlled simulation instances.

4.1. Experiment Setup

To obtain empirical performance data, three controlled nodes were deployed within the live I2P network to serve as performance monitoring probes. These probes initiated continuous, serial download requests for a 5 MB static payload hosted on a dedicated eepsite server, thereby emulating representative user access patterns. This setup ensures that the gathered metrics, such as throughput and latency, reflect the actual network conditions encountered by users in I2P.
We instantiated the I2P network model within the PCL environment, utilizing alpine:3.17.1 Docker images to configure the I2P routers. To ensure parity with the live network, we replicated the deployment strategy by designating three virtualized nodes as monitoring probes and one eepsite as a file server hosting an identical 5 MB payload. Each modeling window $t$ corresponds to a 24-h duration. To allow the network to reach a steady state, specifically for the I2P bootstrapping and netDb exploration processes, a 30-min warm-up period was implemented at the onset of each experiment. Performance metrics collected during this interval were discarded, with analysis focusing exclusively on data from the subsequent operational phase.
Automated measurement scripts were developed to collect the performance metrics from the monitoring probes, as detailed in Table 2, to evaluate and validate the proposed method. During the experimental process, the modeling instances underwent high-frequency sampling with a 5-min granularity, whereas the real-world network was sampled at 1-h intervals. To mitigate the impact of instantaneous stochastic fluctuations and achieve temporal alignment, all metric values involved in the subsequent analysis were calculated as the arithmetic mean of the observations recorded within each time step t.
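The temporal alignment described above can be sketched as a simple bucketing step that works for both the 5-min simulation samples and the 1-h live samples. The timestamp convention (seconds since the start of the experiment), the choice to count windows from that start rather than from the end of the warm-up, and the function name are assumptions.

```python
from collections import defaultdict

def window_means(samples, window_seconds=24 * 3600, warmup_seconds=0):
    """Average metric samples per modeling window t = 1, 2, ...

    samples:        iterable of (seconds since experiment start, value)
    window_seconds: window length (24 h by default)
    warmup_seconds: samples earlier than this are discarded
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        if timestamp < warmup_seconds:
            continue  # drop warm-up measurements
        buckets[int(timestamp // window_seconds) + 1].append(value)
    return {t: sum(vs) / len(vs) for t, vs in sorted(buckets.items())}

# Simulation samples every 5 min; the first 30 min are warm-up.
simulated = [(0, 100.0), (300, 100.0), (1800, 10.0), (2100, 20.0), (86400, 50.0)]
per_window = window_means(simulated, warmup_seconds=30 * 60)
```

Applying the same function to both data sources yields one mean per 24-h window and source, which makes the two series directly comparable despite their different sampling rates.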
Table 2. Definition of Performance Metrics for Monitoring Probes.
The total modeling duration $T$ is set to 7 days to encompass a full weekly cycle, capturing potential periodic variations in node behavior between weekdays and weekends. This duration is partitioned into consecutive 24-h modeling windows $t \in \{1, 2, \dots, 7\}$ to capture the temporal dynamics of the network. To ensure statistical reliability, each simulation group consists of multiple independent trials, and the results are reported as ensemble averages. The experimental groups are defined as follows:
  • Reference Group (Ground Truth): Consists of the ground-truth data collected directly from the monitoring probes in the real I2P network.
  • GAT-LA: The proposed approach that executes daily iterative evolution using the methodology described in Section 3.
  • Static GAT-LA (Baseline I): This group is initialized on the first day using the RCI strategy and GAT-based feature fusion but maintains a static topology for the remainder of the period. It is designed to illustrate the performance degradation of static modeling over time.
  • Heuristic-Dynamic (Baseline II): A model that performs daily resampling based exclusively on the heuristic scoring formula discussed in Section 3.3, without GAT-based feature fusion. This baseline verifies the superiority and necessity of the learned attention mechanism in the proposed framework.
The GAT-LA model is configured with two layers and a hidden dimension of 64. The stability penalty coefficient $\lambda$ is set to a moderate value of 0.2, so that the penalty of Equation (8) on previously selected nodes stays mild and does not prematurely evict high-performance backbone nodes, thereby preserving topological continuity throughout the evolution process.
The default sampling scale $K$ is set to 200 nodes, representing a specific local subset within the observational domain. To maximize the simulation scale under constrained hardware resources, we utilized customized Docker containers to construct the I2P nodes. The experimental environment consists of 15 virtual user nodes, each allocated 8 GB of RAM and 40 GB of disk storage. To strike a balance between system load and interactive response speed, each virtual node hosts between 22 and 25 containerized I2P instances. Facilitated by this distributed container architecture, the simulation environment provides stable support for network evolution experiments at a scale of $K = 200$, ensuring high data collection precision for the monitoring probes.

4.2. Evaluation Metrics

To quantitatively evaluate the fidelity of the dynamic simulation domain relative to the live I2P network, we define three high-order evaluation metrics based on the raw performance data collected in Section 4.1.
Performance Deviation ($PD$). The primary objective of the cyber range is to minimize the numerical discrepancy between the simulated environment and the real network. For a specific metric $m$ in Table 2, the performance deviation at time step $t$, denoted as $PD_m(t)$, is defined in Equation (9).
$$PD_m(t) = \frac{1}{N} \sum_{i=1}^{N} \left| P_{m,\mathrm{sim}}(t, i) - P_{m,\mathrm{ref}}(t) \right| \tag{9}$$
where $P_{m,\mathrm{ref}}(t)$ denotes the hourly mean of the reference group at time step $t$, while $P_{m,\mathrm{sim}}(t, i)$ represents the corresponding mean observed in the $i$-th independent simulation trial. $N$ signifies the total number of independent trials within the simulation group. A smaller $PD_m(t)$ value indicates higher numerical fidelity, demonstrating that the simulation environment more closely approximates the performance levels of the real network.
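As an illustration of Equation (9), the following minimal sketch computes the performance deviation for one metric at one time step. The function name `performance_deviation` and the sample values are hypothetical and are not taken from the experiments; the sketch assumes the per-trial means and the reference mean have already been aggregated.

```python
from typing import Sequence


def performance_deviation(sim_trial_means: Sequence[float], ref_mean: float) -> float:
    """Mean absolute deviation of the N trial means from the reference mean."""
    if not sim_trial_means:
        raise ValueError("at least one simulation trial is required")
    n = len(sim_trial_means)
    return sum(abs(p_sim - ref_mean) for p_sim in sim_trial_means) / n


# Hypothetical values in seconds for a single metric at a single time step.
trials = [1.5, 1.75, 2.0]
reference = 1.75
pd_value = performance_deviation(trials, reference)
print(f"PD = {pd_value:.4f}")
```

Because the absolute value is taken per trial before averaging, deviations above and below the reference do not cancel each other out.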
Trend Consistency ($TC$). This metric evaluates whether the modeled network effectively captures the temporal fluctuation characteristics of the live network. For a specific metric $m$, the trend consistency $TC_m$ throughout the observation period is formulated as shown in Equation (10).
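$$TC_m = \frac{\operatorname{cov}\left(\mathbf{P}_{m,\mathrm{sim}}, \mathbf{P}_{m,\mathrm{ref}}\right)}{\sigma_{\mathbf{P}_{m,\mathrm{sim}}} \, \sigma_{\mathbf{P}_{m,\mathrm{ref}}}} \tag{10}$$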
where $\mathbf{P}_{m,\mathrm{sim}}$ and $\mathbf{P}_{m,\mathrm{ref}}$ represent the mean performance vectors of the simulation group and the reference group, respectively, over the seven-day observation period. This metric employs the Pearson correlation coefficient to quantify the consistency between the evolutionary trends of the simulation domain and the live network. A $TC_m$ value closer to 1 signifies that the simulation domain successfully tracks real-world network performance fluctuations, demonstrating high temporal fidelity in capturing the dynamic nature of the environment.
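The Pearson correlation used for trend consistency can be sketched as follows. The function name `trend_consistency` and the seven-point series are illustrative assumptions; the normalization factors of the covariance and the standard deviations cancel, so the sketch works directly with sums of centered products.

```python
import math
from typing import Sequence


def trend_consistency(sim: Sequence[float], ref: Sequence[float]) -> float:
    """Pearson correlation between simulated and reference mean-performance series."""
    if len(sim) != len(ref) or len(sim) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_sim = sum(sim) / len(sim)
    mean_ref = sum(ref) / len(ref)
    # Sums of centered products; the common 1/n factors cancel in the ratio.
    cov = sum((s - mean_sim) * (r - mean_ref) for s, r in zip(sim, ref))
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    var_ref = sum((r - mean_ref) ** 2 for r in ref)
    if var_sim == 0 or var_ref == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_sim * var_ref)


# Hypothetical daily means over seven days (not measured data).
ref_series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
sim_series = [2.0 * r + 1.0 for r in ref_series]  # same trend, different scale
tc_value = trend_consistency(sim_series, ref_series)
print(f"TC = {tc_value:.3f}")
```

Note that the coefficient is insensitive to a constant offset or scale factor between the two series, which is why it is paired with the performance deviation rather than used alone.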
Selection Stability Score ($SS$). To evaluate the impact of the stability penalty mechanism introduced in Section 3.4 on topological evolution, we define the selection stability score $SS(t)$ at time step $t$ as shown in Equation (11).
$$SS(t) = \frac{\left| S(t) \cap S(t-1) \right|}{\left| S(t) \cup S(t-1) \right|} \tag{11}$$
where $S(t)$ and $S(t-1)$ represent the sets of modeled nodes in two adjacent modeling cycles. A stable $SS$ value ensures that while the cyber range adapts to network dynamics, it simultaneously maintains the stability of backbone service nodes, thereby preventing unrealistic routing oscillations during the evolutionary process.
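A minimal sketch of the selection stability score follows, assuming Equation (11) is the Jaccard index (intersection over union) of the node sets from two adjacent cycles. The function name `selection_stability` and the node identifiers are hypothetical.

```python
from typing import AbstractSet


def selection_stability(current: AbstractSet[str], previous: AbstractSet[str]) -> float:
    """Jaccard overlap between the node sets of two adjacent modeling cycles."""
    union = current | previous
    if not union:
        # Two empty selections are treated as identical (an assumption of this sketch).
        return 1.0
    return len(current & previous) / len(union)


# Hypothetical node identifiers: one node is replaced between cycles.
previous_cycle = {"node-a", "node-b", "node-c", "node-d"}
current_cycle = {"node-b", "node-c", "node-d", "node-e"}
ss_value = selection_stability(current_cycle, previous_cycle)
print(f"SS = {ss_value:.2f}")
```

Under this reading, a score of 1 means the two selections are identical and a score of 0 means that no node was retained.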

4.3. Experiment Results

Following the experimental setup detailed in Section 4.1 and the evaluation metrics defined in Section 4.2, this section presents a quantitative evaluation of the fidelity performance of the GAT-LA strategy.
During the 7-day experimental observation, a total of 13,807 observed nodes were identified. Among these, 2993 were transient nodes with a lifespan of only one day, whereas 5937 nodes remained active throughout the entire seven-day duration. Figure 3a illustrates the dynamic evolution trends of the $P_{TTLB}$ metric for each experimental group during this period.
As illustrated in Figure 3a, the GAT-LA group successfully tracked the performance fluctuations of the Reference group throughout the 7-day observation period. The $TC$ of GAT-LA reached 0.81, significantly outperforming the other baseline groups. In contrast, Baseline I exhibited a continuous escalation in $P_{TTLB}$ values over time. This performance degradation stems from its inability to dynamically update the simulation domain as nodes leave the network. Although Baseline II performs daily re-sampling, its fluctuation magnitude is considerably larger than that of the reference group due to the lack of deep feature fusion of neighborhood characteristics via GAT. Furthermore, Baseline II demonstrates significant latency and bias during peak performance periods, such as Day 4, reflecting the limitations of purely heuristic-based sampling in capturing complex network dynamics.
Figure 3. (a) Comparison of $P_{TTLB}$ dynamic tracking performance over a 7-day cycle. (b) Cumulative distribution function (CDF) of $P_{TTLB}$ across different experimental groups.
To verify the capability of the simulation domain to reconstruct the performance distribution of the live network, Figure 3b depicts the Cumulative Distribution Function (CDF) curves of $P_{TTLB}$ over the entire observation period.
Figure 3b illustrates that the CDF curve of GAT-LA is in close proximity to that of the reference group. Within the interval of 1.4 s to 1.8 s, the slope of the GAT-LA CDF closely resembles the reference group’s profile, indicating that by aggregating neighborhood features through the attention mechanism, the model effectively reconstructs the performance distribution characteristics of pivot nodes in the live network. In contrast, Baseline I exhibits a significant rightward shift in its distribution curve, as it is unable to capture the dynamic evolution of node states during the observation period. This divergence underscores that static sampling strategies struggle to maintain the statistical validity of the simulation domain over extended durations.
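The closeness of two CDF curves can also be expressed numerically. The sketch below builds empirical CDFs from latency samples and reports the largest vertical gap between them (the two-sample Kolmogorov-Smirnov statistic). This statistic is an illustrative addition and is not a metric reported in the paper; the function names and sample values are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def ecdf(samples: Sequence[float]) -> Callable[[float], float]:
    """Return the empirical CDF F(x) = fraction of samples <= x."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def max_cdf_gap(sim: Sequence[float], ref: Sequence[float]) -> float:
    """Largest vertical distance between the two empirical CDFs."""
    f_sim, f_ref = ecdf(sim), ecdf(ref)
    # Both CDFs are step functions, so the maximum gap occurs at a sample point.
    points = sorted(set(sim) | set(ref))
    return max(abs(f_sim(x) - f_ref(x)) for x in points)


# Hypothetical latency samples in seconds; the simulated set is shifted right by 0.2 s.
ref_latency = [1.4, 1.6, 1.8, 2.2]
sim_latency = [1.6, 1.8, 2.0, 2.4]
gap = max_cdf_gap(sim_latency, ref_latency)
print(f"max CDF gap = {gap:.2f}")
```

A rightward shift such as the one described for Baseline I would show up as a larger gap, while a curve that overlaps the reference yields a value near 0.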
Regarding the slight rightward shift observed in all simulation curves beyond the 2.2 s threshold, we posit that this deviation reflects the inherent limitations of localized modeling environments in reproducing the heavy-tailed latency distributions. Throughout the 7-day observation, 13,807 unique nodes were recorded, of which 2993 were identified as transient nodes with a survival duration of only one day. These transient nodes introduce significant stochastic uncertainty and extreme latency variance within the live network. Quantitatively, the controlled range environment introduces specific computational overheads: the high-density deployment of 22–25 containers per virtual host leads to CPU scheduling contention and increased I/O virtualization latency. Internal profiling indicates that the Docker network stack and context switching between numerous instances add an estimated 5–8% processing overhead per cryptographic operation. Furthermore, the limited memory allocation per container (approx. 320 MB) triggers more frequent JVM garbage collection cycles during peak loads, contributing an additional 15–30 ms of jitter. Consequently, without the stochastic smoothing effect naturally provided by the massive routing nodes in the global network, the simulation environment exhibits slightly higher latencies than the live network during extreme congestion scenarios.
Although it is technically feasible to remove transient nodes during the pre-processing phase to achieve a closer numerical fit, we deliberately retained them to preserve the original stochastic characteristics of the live I2P network. From a practical standpoint, rather than relying on the idealized removal of nodes, GAT-LA addresses this challenge through its stability-aware sampling mechanism. This ensures that nodes exhibiting high-variance or short-term behavior are progressively assigned lower selection probabilities. Consequently, the framework achieves a robust trade-off: it remains grounded in real-world noise while ensuring the core simulation domain is not destabilized by transient anomalies.
Table 3 summarizes the $PD$ for each experimental group relative to the reference group across the observation period. As indicated, the GAT-LA strategy achieves the minimum deviation in every dimension. Specifically, for the $PD_{SR}$, GAT-LA restricts the error to within 2.1%, significantly outperforming the Heuristic-Dynamic approach (Baseline II), which lacks feature fusion.
Table 3. $PD$ of simulation strategies relative to the reference group across multiple metrics.
Table 4 illustrates the influence of the sampling scale $K$ on the $PD_{TTLB}$ of GAT-LA. Experimental results demonstrate that the simulation fidelity improves non-linearly as $K$ increases. Specifically, once $K$ reaches a threshold of 200, the reduction in $PD_{TTLB}$ begins to plateau, indicating diminishing returns in accuracy for further scale expansions. In resource-constrained cyber range environments, increasing the node scale significantly intensifies the computational burden on the underlying hardware infrastructure. Therefore, guided by the experimental constraints outlined in Section 4.1, we selected a sampling scale of $K = 200$ to achieve an optimal trade-off between modeling fidelity and computational overhead.
Table 4. $PD_{TTLB}$ values under different $K$ sampling scales.
Table 5 evaluates the influence of the stability penalty coefficient $\lambda$ on the continuity of the GAT-LA simulation domain. While $\lambda$ is designed to encourage exploration by penalizing previously selected nodes, it paradoxically enhances Selection Stability ($SS$) at moderate levels (e.g., $\lambda = 0.2$). As shown in Table 5, with $\lambda = 0.2$, GAT-LA sustains a higher average success rate of 65.1% with markedly lower fluctuations ($SS = 0.78$). This is attributed to the fact that $\lambda$ mitigates the rank-oscillation effect, in which minor performance fluctuations in the live I2P network cause nodes to frequently enter and exit the Top-K set. Conversely, without the penalty ($\lambda = 0$), the model over-optimizes for instantaneous gains, resulting in chaotic “churn” and severe jitter in $P_{TSR}$ ($SS = 0.42$). By diversifying the selection, $\lambda$ acts as a temporal buffer that smooths the evolutionary trajectory. However, an excessively high $\lambda$ would eventually risk a “stale” or forced-turnover topology; thus, $\lambda = 0.2$ is chosen to maintain high fidelity while preventing unrealistic routing oscillations.
Table 5. Impact of the stability penalty coefficient λ on GAT-LA simulation continuity.

5. Conclusions

To mitigate the limitations of static modeling in capturing network dynamics, we proposed a dynamic modeling approach utilizing Graph Attention Network (GAT). The methodology consists of a Region-Centric Initialization (RCI) strategy for establishing an initial observation anchor and a GAT-based locality-aware sampling strategy (GAT-LA) that treats node selection as a dynamic learning task. The primary results and contributions of this work are summarized as follows. First, the RCI strategy provides a practical solution for local modeling initialization by supporting both automated scoring and manual configuration based on research needs. Second, experimental evaluations indicate that the GAT-LA mechanism improves the similarity between simulated and real-world performance distributions over extended periods compared to static sampling methods. Third, the stability penalty mechanism contributes to mitigating excessive topological fluctuations, thereby improving the temporal consistency of the simulation domain.
These findings suggest the feasibility of employing graph-based learning methods to handle node selection in dynamic anonymous networks. By considering both node health indicators and neighborhood structures, the proposed approach offers an alternative to traditional heuristic sampling, particularly for scenarios requiring long-term performance observation.
In terms of practical application, the results provide a more realistic baseline for evaluating network protocols in a testbed environment. Specifically, the framework can be used to benchmark the performance of path selection algorithms under simulated node churn. Additionally, it offers a controlled platform for analyzing localized network characteristics while adhering to ethical standards regarding user privacy.
The paper has certain limitations, as it primarily focuses on topological evolution and does not yet integrate flow-level traffic modeling. The accuracy of the model is also contingent on the quality of the initial netDb measurements.
Future research could explore several directions. One objective is to incorporate traffic-level characteristics to provide a more comprehensive simulation environment. Another direction involves investigating multi-center initialization to represent broader segments of the network. Furthermore, considering the inherent diversity of node attributes in I2P, future work will specifically investigate heterophilic graph learning architectures, such as ACM-GCN [40], to further enhance model adaptability to dissimilar node interconnections. Additionally, integrating simple adversarial behaviors into the selection process could help evaluate network resilience. Finally, improving the synchronization efficiency between the live network state and the simulation domain remains a subject for further investigation.

Author Contributions

Conceptualization, R.T.; methodology and validation, R.T.; resources, H.W. and B.H.; writing—original draft, R.T.; writing—review and editing, Q.T., H.W. and P.Z.; visualization, Y.X.; supervision, P.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the State Key Program of the National Natural Science Foundation of China (No. U2336202).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The datasets generated and analyzed during this study are not publicly available due to restrictions imposed by the funding agency and ongoing related work under confidentiality agreements.

Conflicts of Interest

Author Yushun Xie was employed by China Telecom Company Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Hosseini Shirvani, M.; Akbarifar, A. A comparative study on anonymizing networks: ToR, I2P, and riffle networks comparison. J. Electr. Comput. Eng. Innov. (JECEI) 2022, 10, 259–272.
  2. Ali, A.; Khan, M.; Saddique, M.; Pirzada, U.; Zohaib, M.; Ahmad, I.; Debnath, N. TOR vs. I2P: A comparative study. In Proceedings of the 2016 IEEE International Conference on Industrial Technology (ICIT), Taipei, Taiwan, 14–17 March 2016; pp. 1748–1751.
  3. Dahlberg, R.; Pulls, T.; Ritter, T.; Syverson, P. Privacy-preserving & incrementally-deployable support for certificate transparency in tor. Proc. Priv. Enhancing Technol. 2021, 2021, 194–213.
  4. Kim, S.; Han, J.; Ha, J.; Kim, T.; Han, D. Sgx-tor: A secure and practical tor anonymity network with sgx enclaves. IEEE/ACM Trans. Netw. 2018, 26, 2174–2187.
  5. Chao, D.; Xu, D.; Gao, F.; Zhang, C.; Zhang, W.; Zhu, L. A systematic survey on security in anonymity networks: Vulnerabilities, attacks, defenses, and formalization. IEEE Commun. Surv. Tutor. 2024, 26, 1775–1829.
  6. Jansen, R.; Bauer, K.S.; Hopper, N.; Dingledine, R. Methodically Modeling the Tor Network. In Proceedings of the CSET, Bellevue, WA, USA, 6 August 2012.
  7. Chun, B.; Culler, D.; Roscoe, T.; Bavier, A.; Peterson, L.; Wawrzoniak, M.; Bowman, M. Planetlab: An overlay testbed for broad-coverage services. ACM Sigcomm Comput. Commun. Rev. 2003, 33, 3–12.
  8. Hibler, M.; Ricci, R.; Stoller, L.; Duerig, J.; Guruprasad, S.; Stack, T.; Webb, K.; Lepreau, J. Large-scale virtualization in the emulab network testbed. In Proceedings of the 2008 USENIX Annual Technical Conference (USENIX ATC 08), Berkeley, CA, USA, 22–27 June 2008.
  9. Mirkovic, J.; Benzel, T. Teaching cybersecurity with DeterLab. IEEE Secur. Priv. 2012, 10, 73–76.
  10. Cappos, J.; Hemmings, M.; McGeer, R.; Rafetseder, A.; Ricart, G. EdgeNet: A global cloud that spreads by local action. In Proceedings of the ACM Symposium on Edge Computing (SEC), Bellevue, WA, USA, 25 October 2018; pp. 359–360.
  11. Wang, Q.; Cao, W. A tor anonymity attack experiment platform driven by raspberry pi. In Proceedings of the 2020 11th International Conference on Prognostics and System Health Management (PHM-2020 Jinan), Jinan, China, 23–25 October 2020; pp. 569–574.
  12. Wang, P.; Liu, H.; Wang, B.; Dong, K.; Wang, L.; Xu, S. Simulation of dark network scene based on the big data environment. In Proceedings of the International Conference on Information Technology and Electrical Engineering 2018, Xiamen, China, 7–8 December 2018; pp. 1–6.
  13. Jansen, R.; Hopper, N.J. Shadow: Running Tor in a box for accurate and efficient experimentation. In Proceedings of the 19th Annual Network and Distributed System Security Symposium (NDSS 2012), San Diego, CA, USA, 5–8 February 2012.
  14. Bauer, K.; Sherr, M.; Grunwald, D. ExperimenTor: A testbed for safe and realistic tor experimentation. In Proceedings of the 4th Workshop on Cyber Security Experimentation and Test (CSET 11), San Francisco, CA, USA, 8 August 2011.
  15. Barton, A.; Wright, M. Denasa: Destination-naive as-awareness in anonymous communications. Proc. Priv. Enhancing Technol. 2016, 2016, 356–372.
  16. Hanley, H.; Sun, Y.; Wagh, S.; Mittal, P. DPSelect: A differential privacy based guard relay selection algorithm for Tor. Proc. Priv. Enhancing Technol. 2019, 2019, 166–186.
  17. Dahal, S.; Lee, J.; Kang, J.; Shin, S. Analysis on end-to-end node selection probability in tor network. In Proceedings of the 2015 International Conference on Information Networking (ICOIN), Siem Reap, Cambodia, 12–14 January 2015; pp. 46–50.
  18. Geddes, J.; Schliep, M.; Hopper, N. Abra cadabra: Magically increasing network utilization in tor by avoiding bottlenecks. In Proceedings of the 2016 ACM on Workshop on Privacy in the Electronic Society, Vienna, Austria, 24 October 2016; pp. 165–176.
  19. Imani, M.; Barton, A.; Wright, M. Guard sets in tor using as relationships. Proc. Priv. Enhancing Technol. (PoPETs) 2018, 2018, 145–165.
  20. Conrad, B.; Shirazi, F. Analyzing the effectiveness of dos attacks on tor. In Proceedings of the 7th International Conference on Security of Information and Networks, Glasgow, UK, 9–11 September 2014; pp. 355–358.
  21. Jansen, R.; Vaidya, T.; Sherr, M. Point break: A study of bandwidth Denial-of-Service attacks against tor. In Proceedings of the 28th USENIX security symposium (USENIX Security 19), Santa Clara, CA, USA, 14–16 August 2019; pp. 1823–1840.
  22. Jansen, R.; Tschorsch, F.; Johnson, A.; Scheuermann, B. The Sniper Attack: Anonymously Deanonymizing and Disabling the Tor Network. In Proceedings of the NDSS, San Diego, CA, USA, 23–26 February 2014.
  23. Rochet, F.; Pereira, O. Dropping on the Edge: Flexibility and Traffic Confirmation in Onion Routing Protocols. Proc. Priv. Enhancing Technol. 2018, 2018, 27–46.
  24. Jansen, R.; Tracey, J.; Goldberg, I. Once is never enough: Foundations for sound statistical inference in tor network experimentation. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), Virtual, 11–13 August 2021; pp. 3415–3432.
  25. Chakravarty, S.; Stavrou, A.; Keromytis, A.D. Traffic analysis against low-latency anonymity networks using available bandwidth estimation. In Proceedings of the European Symposium on Research in Computer Security, Athens, Greece, 20–22 September 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 249–267.
  26. Zheng, X.; Yan, H.; Wang, R.; Zhang, Z.; Li, H. LUNAR: A Practical Anonymous Network Simulation Platform. Secur. Commun. Netw. 2022, 2022, 5832124.
  27. Liu, P.; Wang, L.; Tan, Q.; Li, Q.; Wang, X.; Shi, J. Empirical measurement and analysis of I2P routers. J. Netw. 2014, 9, 2269.
  28. Hoang, N.P.; Doreen, S.; Polychronakis, M. Measuring I2P censorship at a global scale. In Proceedings of the 9th USENIX Workshop on Free and Open Communications on the Internet (FOCI 19), Santa Clara, CA, USA, 13 August 2019.
  29. Liu, L.; Zhang, H.; Shi, J.; Yu, X.; Xu, H. I2P anonymous communication network measurement and analysis. In Proceedings of the Smart Computing and Communication: 4th International Conference, SmartCom 2019, Birmingham, UK, 11–13 October 2019; Proceedings 4; Springer: Berlin/Heidelberg, Germany, 2019; pp. 105–115.
  30. Hoang, N.P.; Kintis, P.; Antonakakis, M.; Polychronakis, M. An empirical study of the i2p anonymity network and its censorship resistance. In Proceedings of the Internet Measurement Conference 2018, Boston, MA, USA, 31 October–2 November 2018; pp. 379–392.
  31. Magán-Carrión, R.; Abellán-Galera, A.; Maciá-Fernández, G.; García-Teodoro, P. Unveiling the I2P web structure: A connectivity analysis. Comput. Netw. 2021, 194, 108158.
  32. Gao, Y.; Tan, Q.; Shi, J.; Wang, X.; Chen, M. Large-scale discovery and empirical analysis for I2P eepSites. In Proceedings of the 2017 IEEE Symposium on Computers and Communications (ISCC), Heraklion, Greece, 3–6 July 2017; pp. 444–449.
  33. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
  34. Tan, R.; Tan, Q.; Wang, H.; Xie, Y.; Zhang, P. P-I2Prange: An Automatic Construction Architecture for Scenarios in I2P Ranges. In Proceedings of the 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 30 June–5 July 2024; pp. 1–10.
  35. Elahi, T.; Danezis, G.; Goldberg, I. Privex: Private collection of traffic statistics for anonymous communication networks. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, Scottsdale, AZ, USA, 3–7 November 2014; pp. 1068–1079.
  36. Jansen, R.; Johnson, A. Safely measuring tor. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 1553–1567.
  37. Jansen, R.; Traudt, M.; Hopper, N. Privacy-preserving dynamic learning of tor network traffic. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, ON, Canada, 15–19 October 2018; pp. 1944–1961.
  38. Conrad, B.; Shirazi, F. A Survey on Tor and I2P. In Proceedings of the Ninth International Conference on Internet Monitoring and Protection (ICIMP2014), Paris, France, 20–24 July 2014; pp. 22–28.
  39. I2P Official Homepage. Secure Semireliable UDP (SSU). 2025. Available online: https://i2p.net/en/docs/legacy/ssu-overview/ (accessed on 1 June 2025).
  40. Luan, S.; Hua, C.; Lu, Q.; Zhu, J.; Zhao, M.; Zhang, S.; Chang, X.W.; Precup, D. Revisiting heterophily for graph neural networks. Adv. Neural Inf. Process. Syst. 2022, 35, 1362–1375.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
