Article

MaGNet-BN: Markov-Guided Bayesian Neural Networks for Calibrated Long-Horizon Sequence Forecasting and Community Tracking

1
Department of Computer Science, University of Liverpool, Liverpool L69 3DR, UK
2
Department of Computer Science, Fairleigh Dickinson University, Vancouver, BC V6B 2P6, Canada
*
Author to whom correspondence should be addressed.
Mathematics 2025, 13(17), 2740; https://doi.org/10.3390/math13172740
Submission received: 8 July 2025 / Revised: 15 August 2025 / Accepted: 24 August 2025 / Published: 26 August 2025

Abstract

Forecasting over dynamic graph environments necessitates modeling both long-term temporal dependencies and evolving structural patterns. We propose MaGNet-BN, a modular framework that simultaneously performs probabilistic forecasting and dynamic community detection on temporal graphs. MaGNet-BN integrates Bayesian node embeddings for uncertainty modeling, prototype-guided Louvain clustering for community discovery, Markov-based transition modeling to preserve temporal continuity, and reinforcement-based refinement to improve structural boundary accuracy. Evaluated on real-world datasets in pedestrian mobility, energy consumption, and retail demand, our model achieves on average 11.48% lower MSE, 6.62% lower NLL, and 10.82% higher Modularity (Q) compared with the best-performing baselines, with peak improvements reaching 12.0% in MSE, 7.9% in NLL, and 16.0% in Q on individual datasets. It also improves uncertainty calibration (PICP) and temporal community coherence (tARI). Ablation studies highlight the complementary strengths of each component. Overall, MaGNet-BN delivers a structure-aware and uncertainty-calibrated forecasting system that models both temporal evolution and dynamic community formation, with a modular design enabling interpretable predictions and scalable applications across smart cities, energy systems, and personalized services.

1. Introduction

Forecasting pedestrian flows, electricity demand, and retail sales in dynamic environments requires models that can handle both long-term temporal patterns and continually evolving interaction graphs [1,2]. Each time step is treated as a graph snapshot whose community structure evolves with behavioral cycles and seasonal effects [3,4]. In many real-world systems, such as transportation networks, power grids, and online social platforms, the underlying interaction graph is intrinsically dynamic, with edges and communities changing in response to external events, shifts in user behavior, or operational constraints [4]. This creates several challenges for forecasting: (i) the graph topology may change abruptly, invalidating static or slowly adapting adjacency assumptions [5]; (ii) temporal dependencies often span long horizons, requiring models to integrate both recent and historical patterns; (iii) uncertainty in predictions, if unmodeled, can lead to costly or unsafe decisions in critical domains [6]; and (iv) community detection approaches applied frame by frame often neglect temporal consistency, resulting in unstable or noisy structural interpretations [7]. These challenges hinder the deployment of existing methods in time-critical applications where accuracy and interpretability are paramount.
Most current forecasters presume a fixed graph topology [8,9] or static spatial priors [1], neglecting to explicitly account for uncertainty—an imperative consideration in safety-critical sectors such as energy forecasting and retail planning. Dynamic community identification approaches seek to identify developing network structures; however, they frequently analyze temporal snapshots in isolation or utilize heuristic smoothing, resulting in the omission of nuanced structural transitions [7].

Contributions

  • Unified probabilistic framework: MaGNet-BN is the first end-to-end model that jointly performs calibrated long-horizon forecasting and dynamic community tracking. It fuses Bayesian node embeddings, prototype-guided Louvain clustering, Markov smoothing, and PPO refinement into a single, differentiable pipeline.
  • New state of the art on seven datasets: Across traffic, mobility, social, e-mail, energy, and retail domains, MaGNet-BN tops 26/28 forecasting scores (MSE, NLL, CRPS, PICP) and every structural metric (Q, tARI, NMI), outperforming seven strong baselines.
  • Efficiency and robustness: A complete hyperparameter sweep plus training finishes in 11 GPU-hours on one A100. Worst-case MSE drift under parameter sweeps is <2.7%, and modularity drop under 5% edge-rewiring is halved versus the best dynamic-GNN baseline.
  • Reproducible research assets: We provide sanitized datasets, code, and a unified evaluation workflow, thereby establishing a replicable standard for probabilistic forecasting under dynamic community evolution.
Methodologically, MaGNet-BN substitutes arbitrary snapshot smoothing with a learnable PPO policy, allowing epistemic uncertainty from Bayesian embeddings to directly influence structural updates and Markov transitions, hence enhancing accuracy and stability amidst distribution shifts. For deployment, prototype-based, interpretable communities with calibrated uncertainty facilitate decision-making in traffic management, energy distribution, and retail restocking, while our structure-aware evaluation protocol (reconstructing k-NN graphs for non-graph forecasters under uniform clustering) offers a consistent benchmark. We delineate an online/streaming variant (streaming variational Bayes with forgetting, warm-started prototype Louvain with decay, Dirichlet-updated transitions, short-burst PPO updates) for near real-time adaptation, and demonstrate multimodal extensibility through modality-specific encoders and uncertainty-weighted fusion to effectively integrate social, geospatial, weather, and pricing signals. To be precise, our novelty lies in a single objective that couples calibrated forecasting (NLL/ECE) with structural consistency (Q, tARI), yielding an interdependent coupling of Bayesian embeddings, prototype-guided clustering, Markov smoothing, and PPO; removing any part breaks this coupling.

2. Related Work

Our research intersects three domains: temporal graph forecasting, dynamic community detection, and uncertainty-aware graph learning. We review representative work in each area and highlight what distinguishes our proposed method.

2.1. Temporal Graph Forecasting

Temporal graph forecasting emphasizes the modeling of time-varying signals inside graph-structured data. Traditional methodologies, including DCRNN [1], TGCN [10], and Graph WaveNet [11], integrate graph convolutional networks with recurrent neural networks to effectively capture spatial and temporal dependencies. Transformer-based approaches, such as Informer [12] and TFT [8], prioritize long-range temporal modeling using attention mechanisms; however, they predominantly presuppose static topologies and immutable graph structures.
Recent endeavors have sought to tackle dynamicity. Huang and Lei [13] present group-aware graph diffusion for dynamic link prediction. Nonetheless, their methodology does not explicitly account for community change or uncertainty. Conversely, our suggested MaGNet-BN architecture incorporates dynamic community tracking into the forecasting process, facilitating structure-aware long-term prediction.

2.2. Dynamic Community Detection

Dynamic community detection aims to identify and monitor the evolution of node clusters over time. Existing methods commonly perform snapshot-level clustering independently [14] or apply heuristic temporal smoothing [7], which can induce inconsistencies or imprecision. Reinforcement-based approaches (e.g., modularity-optimizing policies) improve clustering quality [15], yet they do not integrate temporal forecasting. Recent advances extend capability along several axes: (i) TCDA-NE integrates embedding, evolutionary clustering, and matrix factorization to obtain high-quality, temporally smooth partitions [16]; (ii) DLEC couples deep autoencoders with evolutionary clustering for dynamic networks [17]; (iii) modularity-based tracking frameworks detect salient community events in real social graphs [18]; (iv) DyComPar offers vertex-centric parallel detection with large-scale comparisons [19]; and (v) probabilistic formulations jointly detect communities and anomalies via a Markovian generative model suitable for monitoring [20]. Collectively, these works push temporal coherence, scalability, and event-level interpretability—but most treat sequence forecasting as out of scope.
In safety-critical, high-stakes fields, integrating uncertainty is vital for dependable decision-making. Bayesian neural networks and Bayesian nonparametrics assess epistemic uncertainty beyond point estimates, facilitating calibrated inference and systematic structural updates. For temporal graphs, Bayesian node embeddings propagate uncertainty across spatiotemporal dependencies [21]; nonparametric latent-space models flexibly accommodate time-varying community numbers [22,23]; dynamic Bayesian networks capture evolving causal dependencies [24]; surveys of temporal graph learning highlight Bayesian tools for handling missing/noisy edges [25]; and probabilistic, distance-based clustering can stabilize dynamic assignments [26].

2.3. Uncertainty-Aware Graph Learning

Bayesian neural networks (BNNs) and methods such as Monte Carlo dropout have been utilized to assess epistemic uncertainty in node-level and predictive tasks. Pang et al. [27] employ Bayesian spatiotemporal transformers for trajectory prediction, providing uncertainty-aware modeling. Nevertheless, these methodologies frequently overlook the significance of dynamic graph topologies or communities.
Our research advances this area by integrating uncertainty at both the embedding level and the community-building process. This facilitates strong, comprehensible predictions with structural insight and probabilistic assurance.
Leveraging these discoveries, MaGNet-BN incorporates Bayesian modeling in both the node-representation phase and the community-formation process: Dual-level uncertainty directly influences structural updates and Markov transitions, while a reinforcement learner (PPO) clarifies unclear boundaries. This design produces calibrated long-term forecasts and interpretable, temporally consistent communities amid structural change—reconciling predictive accuracy with structural reliability—and, unlike previous studies, jointly incorporates dynamic community evolution with temporal forecasting within a unified framework. Specifically, MaGNet-BN simultaneously models community dynamics through a Markov transition process and executes sequence prediction, enhanced by PPO-based structural optimization.

3. Methodology

This section presents MaGNet-BN, a modular framework aimed at achieving calibrated forecasting and reliable structural tracking in dynamic graphs. The framework consists of five successive modules, each designated for a specific learning aim.

3.1. Pipeline Overview

MaGNet-BN operates through five stages: (1) preprocessing input sequences into temporal graphs, (2) extracting Bayesian node embeddings, (3) deriving initial communities via prototype-guided Louvain clustering, (4) estimating Markov transitions between communities, and (5) refining boundaries using PPO reinforcement. Figure 1 provides a comprehensive overview.
We now describe each component in detail, starting with the data preprocessing step.

3.2. Data Preprocessing

Given a sequence of observations $\{x_1, x_2, \ldots, x_{T+H}\}$, we apply a sliding window of length $L$ and stride $\Delta$ to generate a series of $T$ overlapping snapshots. Each window defines a graph $G^{(t)} = (V^{(t)}, E^{(t)}, X^{(t)})$, where nodes and edges are constructed from temporal interactions or spatial relations within the window.
Continuous features undergo Z-score normalization, while categorical features are converted into dense vectors by learned embeddings. To address missing values, we use a hybrid imputation approach that combines forward-fill and k-nearest-neighbor interpolation. The resulting node attribute matrix $X^{(t)}$ serves as input to the Bayesian embedding layer.
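The windowing and normalization steps can be illustrated with a minimal sketch. It assumes a plain Python list as the raw series; the helper names `sliding_windows`, `forward_fill`, and `zscore` are hypothetical, and only the forward-fill half of the hybrid imputation is shown (the k-nearest-neighbor interpolation is omitted).

```python
from statistics import mean, pstdev

def sliding_windows(series, length, stride):
    """Return overlapping windows of `length` items taken every `stride` steps."""
    return [series[s:s + length]
            for s in range(0, len(series) - length + 1, stride)]

def forward_fill(values):
    """Replace each missing entry (None) with the last observed value."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out

def zscore(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:  # constant feature: map everything to 0
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]

# Toy series of 10 observations, window length 4, stride 2.
windows = sliding_windows(list(range(10)), length=4, stride=2)
filled = forward_fill([1.0, None, 3.0, None])
standardized = zscore([1.0, 2.0, 3.0])
```

Each element of `windows` would then be turned into one graph snapshot; how nodes and edges are derived from a window is dataset-specific and not covered here.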
While we use k-NN graphs on latent embeddings to establish edges, this procedure is heuristic and the graph remains static once created, which may introduce edge noise or inappropriate structural assumptions. More formally, for each node $v$, its neighborhood is selected as
$$\mathcal{N}_k(v) = \text{Top-}k \left\{ \cos(z_v, z_{v'}) : v' \in V \setminus \{v\} \right\} \tag{1}$$
This selection ignores feedback from downstream task performance. A viable alternative is Graph Structure Learning (GSL) [28,29], in which the adjacency matrix $\hat{A}$ is learned jointly with the node representations. Such adaptive modeling could improve forecast accuracy and structural coherence.
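The top-k cosine neighborhood rule above can be sketched in a few lines. This is an illustrative brute-force version, assuming embeddings are given as lists of floats; `knn_neighbors` is a hypothetical helper, and ties are broken by node index purely for determinism.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def knn_neighbors(z, k):
    """For each node v, return the k other nodes with the highest cosine similarity."""
    nbrs = {}
    for v, zv in enumerate(z):
        scored = [(cosine(zv, zu), u) for u, zu in enumerate(z) if u != v]
        scored.sort(key=lambda p: (-p[0], p[1]))  # best similarity first
        nbrs[v] = [u for _, u in scored[:k]]
    return nbrs

# Four toy embeddings forming two obvious pairs.
z = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
neighbors = knn_neighbors(z, k=1)
```

The quadratic scan is only meant to make the selection rule concrete; larger graphs would typically use an approximate nearest-neighbor index.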
Finally, MaGNet-BN presumes that node features are either numerical or categorical and does not presently accommodate multimodal inputs, including text, photos, or geospatial data. Future enhancements may integrate pretrained encoders [30] or transformer-based fusion models [31] to facilitate wider applicability in fields encompassing multimodal sensor data, documents, or videos.

3.3. Bayesian Embedding

To capture uncertainty in node representations, we adopt a Bayesian neural network (BNN), where the weights of the GNN are modeled as Gaussian distributions with variational parameters:
$$q_\phi(w) = \mathcal{N}\!\left(\mu_\phi, \operatorname{diag}(\sigma_\phi^2)\right) \tag{2}$$
sampled using the reparameterization trick: $w = \mu_\phi + \sigma_\phi \odot \epsilon$, with $\epsilon \sim \mathcal{N}(0, I)$.
The BNN is trained by maximizing the evidence lower bound (ELBO) [32]:
$$\mathcal{L}_{\text{ELBO}} = \mathbb{E}_{q_\phi(w)}\!\left[\log p_\theta(Y \mid X, w)\right] - \operatorname{KL}\!\left(q_\phi(w) \,\|\, p_0(w)\right) \tag{3}$$
Each node embedding is estimated by averaging over $M$ Monte Carlo samples:
$$\mu_v^{(t)} = \frac{1}{M} \sum_{m=1}^{M} h_v^{(m)}, \qquad \sigma_v^{2\,(t)} = \frac{1}{M} \sum_{m=1}^{M} \left( h_v^{(m)} - \mu_v^{(t)} \right)^2 \tag{4}$$
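The reparameterized sampling and the Monte Carlo mean/variance estimate can be sketched as follows. This is a toy illustration, not the GNN encoder itself: the "layer" is an assumed elementwise product h = w * x, the variance is taken per dimension, and `sample_weights` and `mc_embedding` are hypothetical names.

```python
import random

def sample_weights(mu, sigma, rng):
    """Reparameterization: w = mu + sigma * eps with eps ~ N(0, 1), elementwise."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

def mc_embedding(x, mu, sigma, num_samples, rng):
    """Mean and per-dimension variance over M stochastic forward passes."""
    samples = []
    for _ in range(num_samples):
        w = sample_weights(mu, sigma, rng)
        samples.append([wi * xi for wi, xi in zip(w, x)])  # toy layer h = w * x
    dim = len(x)
    mean = [sum(h[d] for h in samples) / num_samples for d in range(dim)]
    var = [sum((h[d] - mean[d]) ** 2 for h in samples) / num_samples
           for d in range(dim)]
    return mean, var

rng = random.Random(0)  # fixed seed for repeatability
emb_mean, emb_var = mc_embedding([1.0, 2.0], mu=[0.5, -0.5], sigma=[0.1, 0.0],
                                 num_samples=200, rng=rng)
```

The second weight has zero posterior standard deviation, so its embedding dimension carries no sampled uncertainty, while the first dimension has a small positive variance.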

3.4. Prototype-Guided Louvain Clustering

To derive initial community assignments $C_t^{\text{init}}$, we construct a sparse k-nearest-neighbor graph $\tilde{G}_t$ from the node embeddings $\mu_v^{(t)}$ using cosine similarity. We then apply the Louvain algorithm [33] to maximize modularity on $\tilde{G}_t$, yielding the initial partition.
To enable temporal alignment, we additionally select $P$ representative nodes (prototypes) for each community based on their PageRank scores. These high-centrality nodes serve as stable structural anchors and provide consistent references for the Markov and reinforcement stages. We select prototypes via personalized PageRank:
$$\pi_v = (1 - \alpha)\, e_v + \alpha\, \pi_v T, \qquad 0 < \alpha < 1 \tag{5}$$
where $T$ denotes the normalized transition matrix and $\alpha$ is the teleport parameter, constrained to $(0, 1)$ to ensure a valid convex combination between the restart distribution $e_v$ and the stationary distribution induced by $T$ [34,35]. We follow the commonly used setting in prior work [34] and set $\alpha = 0.15$. This approach effectively emphasizes central nodes but may favor high-degree hubs and neglect structurally significant yet peripheral nodes. To mitigate this bias, future work could adopt diversity-aware selection [36] or entropy-based node selection to capture diverse community roles.
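A power-iteration sketch of the personalized PageRank recursion may help make the prototype scoring concrete. It follows the recursion exactly as written, so `alpha` multiplies the random-walk term and `1 - alpha` the restart vector; the toy graph, the demo value `alpha=0.5`, and the function name are illustrative assumptions.

```python
def personalized_pagerank(T, seed, alpha, iters=200):
    """Iterate pi <- (1 - alpha) * e_seed + alpha * (pi @ T) for a row-stochastic T."""
    n = len(T)
    e = [1.0 if i == seed else 0.0 for i in range(n)]
    pi = e[:]
    for _ in range(iters):
        walk = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
        pi = [(1 - alpha) * e[j] + alpha * walk[j] for j in range(n)]
    return pi

# Toy 3-node path graph 0 - 1 - 2, rows normalized to sum to 1.
T = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
scores = personalized_pagerank(T, seed=0, alpha=0.5)

# Rank nodes by score; the top-P entries would serve as prototypes.
ranking = sorted(range(len(scores)), key=lambda v: -scores[v])
```

For this toy graph the fixed point is (7/12, 1/3, 1/12), so the seed node and its direct neighbor rank highest.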

3.5. Markov Transition Modeling

To capture inter-snapshot dynamics, we estimate a community-level transition matrix $P^{(t)}$ using a first-order Markov model. Let $S_v^{(t)}$ denote the community assignment of node $v$ at time $t$. The transition probability from community $j$ at $t-1$ to $k$ at $t$ is computed as
$$p_{jk}^{(t)} = \frac{\sum_{v} \mathbb{1}\!\left[S_v^{(t-1)} = j \wedge S_v^{(t)} = k\right] + \lambda}{\sum_{k'} \sum_{v} \mathbb{1}\!\left[S_v^{(t-1)} = j \wedge S_v^{(t)} = k'\right] + \lambda \cdot K^{(t)}} \tag{6}$$
where $\lambda = 0.1$ is a Laplace smoothing factor and $K^{(t)}$ is the number of communities at time $t$. The resulting matrix $P^{(t)}$ encourages temporal consistency by penalizing community switches that deviate from dominant transition patterns. Our model estimates the first-order transition matrices $P^{(t)}$ under the assumption that structural evolution follows a Markovian process:
$$P\!\left(S^{(t)} \mid S^{(t-1)}, S^{(t-2)}, \ldots\right) \approx P\!\left(S^{(t)} \mid S^{(t-1)}\right) \tag{7}$$
This simplification is efficient but may not capture long-term dependencies or delayed effects. Future work may explore higher-order Markov chains [37] or memory-enhanced models like HMMs and RNNs [38], which model transitions with richer histories and dynamic priors.
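The Laplace-smoothed transition estimate can be sketched directly from label counts. The example assumes integer community labels in `range(num_comms)` and the same number of communities at both time steps; `transition_matrix` is a hypothetical helper name.

```python
def transition_matrix(prev_labels, curr_labels, num_comms, lam=0.1):
    """Estimate P[j][k] = P(community k at t | community j at t-1) with Laplace smoothing."""
    counts = [[0.0] * num_comms for _ in range(num_comms)]
    for j, k in zip(prev_labels, curr_labels):
        counts[j][k] += 1.0  # one node moved (or stayed) from j to k
    P = []
    for j in range(num_comms):
        denom = sum(counts[j]) + lam * num_comms
        P.append([(counts[j][k] + lam) / denom for k in range(num_comms)])
    return P

# Five nodes, two communities; node 2 switches from community 0 to 1.
prev = [0, 0, 0, 1, 1]
curr = [0, 0, 1, 1, 1]
P = transition_matrix(prev, curr, num_comms=2)
```

Smoothing keeps every entry strictly positive, so a transition never observed in the data (here 1 to 0) still receives a small probability instead of zero.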

3.6. Reinforcement-Based Refinement

We frame boundary node reallocation as a reinforcement learning (RL) problem to enhance temporal smoothness and modularity. A boundary node v is characterized by neighbors associated with distinct communities, indicating uncertainty in its classification.
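One plausible reading of the boundary-node definition is "a node with at least one neighbor assigned to a different community". The sketch below implements that reading; the adjacency-dictionary format and the name `boundary_nodes` are assumptions for illustration.

```python
def boundary_nodes(adj, labels):
    """Return nodes that have at least one neighbor in a different community."""
    return sorted(v for v, nbrs in adj.items()
                  if any(labels[u] != labels[v] for u in nbrs))

# Path graph 0 - 1 - 2 - 3 split into communities "a" = {0, 1} and "b" = {2, 3}.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
boundary = boundary_nodes(adj, labels)
```

Only nodes 1 and 2 sit on the cut between the two communities, so they are the only candidates for reallocation in this toy case.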
Each node’s state is defined as
$$s_v = \left( \mu_v^{(t)}, \sigma_v^{2\,(t)}, P_{c_v,\cdot}^{(t)}, \deg(v) \right) \tag{8}$$
where $\mu_v^{(t)}$ and $\sigma_v^{2\,(t)}$ are the mean and variance of the Bayesian embedding, $P_{c_v,\cdot}^{(t)}$ is the Markov transition vector for $v$'s current community $c_v$, and $\deg(v)$ denotes node degree.
The agent selects an action $a_v \in \{\text{stay}, \text{migrate-to-}k\}$ with reward:
$$R_v = \Delta Q + \alpha \cdot \Delta\text{Cond} + \beta \cdot p_{c_v k}^{(t)} \tag{9}$$
which combines modularity gain ($\Delta Q$), conductance reduction ($\Delta\text{Cond}$), and Markov-guided transition likelihood. All terms are normalized to $[0, 1]$.
Policy $\pi_\theta(a_v \mid s_v)$ and value $V_\psi(s_v)$ functions are trained via PPO [39], enabling optimization of non-differentiable objectives like modularity while avoiding greedy local minima. The final assignments $C_t^{\text{final}}$ reflect globally consistent and temporally coherent community structures. While PPO stabilizes updates using the clipped surrogate loss [39], training can still be sensitive to reward scaling and exploration variance:
$$J_{\text{PPO}} = \mathbb{E}_t\!\left[ \min\!\left( r_t(\theta)\, \hat{A}_t,\ \operatorname{clip}\!\left(r_t(\theta), 1 - \epsilon, 1 + \epsilon\right) \hat{A}_t \right) \right] \tag{10}$$
where $r_t(\theta)$ is the policy ratio and $\hat{A}_t$ the advantage estimator. We observed intermittent policy instability when boundary nodes received conflicting modularity and Markov scores. Future enhancements may include offline reinforcement learning [40] or curriculum-based policy warm-up to prevent premature divergence. The first-order Markov assumption of Section 3.5 is likewise an explicit modeling choice: when transitions concentrate on self-stays and a few neighbor communities, it offers a favorable bias–variance trade-off; richer histories can be substituted if long-range effects dominate.
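The clipped surrogate can be evaluated on a small batch to see how clipping limits each sample's contribution. This is a minimal sketch of the standard PPO objective; the clip range `eps=0.2` is an assumed value (the text does not state the one used), and the ratios and advantages are made-up numbers.

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch."""
    terms = []
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        terms.append(min(r * a, clipped * a))
    return sum(terms) / len(terms)

# Three illustrative samples: an over-large ratio with positive advantage,
# an over-small ratio with negative advantage, and an unchanged policy.
objective = ppo_clip_objective([1.5, 0.5, 1.0], [1.0, -1.0, 2.0])
```

The first sample is capped at 1.2, the second is pushed down to -0.8, and the third passes through unchanged, giving a batch mean of 0.8.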

3.7. Loss Function and Optimization

The overall objective combines forecasting fidelity, Bayesian regularization and structural coherence:
$$\mathcal{L} = \underbrace{\mathcal{L}_{\text{pred}}}_{\text{MSE/NLL}} + \lambda_1 \underbrace{\mathcal{L}_{\text{ELBO}}}_{\text{Bayesian encoder}} + \lambda_2 \underbrace{\mathcal{L}_{\text{Markov}}}_{\text{temporal smoothness}} - \lambda_3 \underbrace{J_{\text{PPO}}}_{\text{actor-critic}} \tag{11}$$
where $\mathcal{L}_{\text{pred}}$ is mean squared error (MSE) for deterministic runs or negative log-likelihood (NLL) for probabilistic output; $\mathcal{L}_{\text{ELBO}}$ is the evidence lower bound of the Bayesian encoder; $\mathcal{L}_{\text{Markov}} = -\sum_t \log p(S^{(t)} \mid S^{(t-1)})$ enforces community-label continuity; and $J_{\text{PPO}}$ is the clipped surrogate objective used by Proximal Policy Optimization.
Default weights $(\lambda_1, \lambda_2, \lambda_3) = (1.0, 0.5, 0.2)$ are selected via grid search on the validation split (range $[0.1, 2.0]$). Training uses AdamW ($\eta = 3 \times 10^{-4}$, weight decay $10^{-2}$) with a cosine scheduler and early stopping (patience = 20).
Remark on Non-Negativity of $\mathcal{L}$. As shown in Equation (11), the total loss comprises several positive terms ($\mathcal{L}_{\text{pred}}$, $\mathcal{L}_{\text{ELBO}}$, $\mathcal{L}_{\text{Markov}}$) and one negative term ($J_{\text{PPO}}$) from the reinforcement objective. Consequently, $\mathcal{L}$ is not guaranteed to be non-negative. This does not impede optimization, because each term is assigned its own weight, tuned to keep training stable. In particular, $\lambda_3$ is calibrated to balance the PPO objective against the other loss components.
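A small numeric sketch shows how the weighted terms combine and why the total can dip below zero. The default weights are those quoted in the text; the input values and the name `total_loss` are purely illustrative.

```python
def total_loss(l_pred, l_elbo, l_markov, j_ppo, weights=(1.0, 0.5, 0.2)):
    """Combine the four terms; the PPO objective enters with a negative sign."""
    lam1, lam2, lam3 = weights
    return l_pred + lam1 * l_elbo + lam2 * l_markov - lam3 * j_ppo

# Illustrative values only (not measurements from the paper).
typical = total_loss(l_pred=1.0, l_elbo=2.0, l_markov=4.0, j_ppo=10.0)
large_ppo = total_loss(l_pred=0.1, l_elbo=0.1, l_markov=0.1, j_ppo=5.0)
```

In the second call the PPO term outweighs the three positive terms, so the total is negative, which is harmless for gradient-based minimization.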

3.8. Computational Complexity

Let each snapshot contain $|V|$ nodes and $|E|$ edges ($|E| \approx k|V|$ after k-NN sparsification), $d$ be the hidden size, $L$ the GNN depth, $M$ the number of Monte Carlo samples used by the Bayesian encoder, and $U$ the number of PPO updates applied to $B$ boundary nodes. All costs are per epoch over $T$ snapshots.

3.8.1. Bayesian Encoder (Dominant)

A sparse GCN layer costs $O(|E| d + |V| d^2)$; $M$ samples and $L$ layers therefore give
$$O\!\left( T M L \left( |E| d + |V| d^2 \right) \right) \tag{12}$$

3.8.2. Prototype Louvain

Cosine k-NN search plus Louvain modularity optimization adds $O\!\left( T ( |V| k d + |E| ) \right)$, which is lower-order than the encoder term when $k \ll d$.

3.8.3. Markov Update

One pass over node labels: $O(T |V|)$, which is negligible.

3.8.4. PPO Refinement

Actor–critic MLP ($O(d)$) on $B$ boundary nodes for $U$ steps: $O(T B U d)$. In practice $B \ll |V|$ ($< 15\%$).

3.8.5. Total

Summing the stages gives
$$O\!\left( T \left[ M L \left( |E| d + |V| d^2 \right) + |V| k d + B U d \right] \right) \tag{13}$$
which is linear in $T$, $|V|$, and $|E|$. For sparse graphs ($|E| \propto |V|$) the $M L |V| d^2$ term dominates; with $M = 3$, $L = 2$, $d = 128$ the full seven-dataset run trains in 4.7 GPU-hours on one A100.
In contrast to conventional dynamic GNN architectures, MaGNet-BN gains efficiency by avoiding redundant global parameter updates and restricting the Markov and PPO stages to lightweight boundary revisions. The PPO stage operates only on boundary nodes (fewer than 15% of all nodes), so its cost is small relative to the dominant Bayesian encoder, and the prototype, Markov, and PPO stages together contribute bounded overhead. Combined with Equation (13), this yields near-linear scaling in $T$, $|V|$, and $|E|$ and supports mid- and large-scale temporal graphs without compromising accuracy.
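To get a feel for the relative size of the per-stage costs, the leading terms from Sections 3.8.1 to 3.8.4 can be evaluated for a hypothetical configuration. Constants are ignored, and the values of `T`, `V`, `E`, `B`, and `U` below are assumptions chosen only for illustration; `M`, `L`, `d`, and `k` follow the settings quoted in the text.

```python
def cost_terms(T, V, E, d, L, M, k, B, U):
    """Leading-order operation counts per epoch for each pipeline stage."""
    return {
        "encoder": T * M * L * (E * d + V * d * d),  # Bayesian GCN encoder
        "louvain": T * (V * k * d + E),              # k-NN search + Louvain
        "markov": T * V,                             # one pass over labels
        "ppo": T * B * U * d,                        # boundary refinement
    }

# Hypothetical graph: 1000 nodes, k = 10 neighbors each, 15% boundary nodes.
terms = cost_terms(T=100, V=1000, E=10_000, d=128, L=2, M=3, k=10, B=150, U=4)
encoder_share = terms["encoder"] / sum(terms.values())
```

Under these assumed sizes the encoder accounts for well over 90% of the estimated operations, consistent with the claim that it dominates runtime.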

3.9. Memory

Memory is dominated by the $M$ sampled embeddings and the PPO buffers: $O\!\left( M |V| d + |E| + B U d \right)$, well within 80 GB for the largest dataset.
Hence, MaGNet-BN scales linearly with graph size and is practical for mid- to large-scale temporal graphs.

3.10. Algorithm

Algorithm 1 summarizes the full training and inference routine. The pipeline proceeds from raw windowed snapshots through five clearly delineated stages: (i) data cleaning, (ii) variational Bayesian node embedding, (iii) prototype-guided Louvain clustering, (iv) Markov smoothing of community trajectories, and (v) PPO-based boundary refinement. This modular decomposition makes each learning signal (likelihood, ELBO, Markov continuity, and RL rewards) explicit, enabling stable end-to-end optimization under the joint loss of Equation (11). At inference time, the same sequence of steps (without gradient updates) yields both calibrated forecasts $\hat{y}$ and temporally coherent community labels $C_t^{\text{final}}$, facilitating downstream decision support in dynamic graph environments.
Algorithm 1. MaGNet-BN: Unified Training Procedure
Require: Raw time series $x_{1:T+H}$; window length $L$; stride $\Delta$; hyperparameters $k$, $M$, $P$, $(\alpha, \beta)$, $(\lambda_{1:3})$
Ensure: Calibrated forecaster $\hat{f}$; final communities $\{C_t^{\text{final}}\}_{t=1}^{T}$
Stage 1: Snapshot Construction and Preprocessing
    Slide window $(L, \Delta)$ to obtain graphs $\{G^{(t)} = (V^{(t)}, E^{(t)}, X^{(t)})\}_{t=1}^{T}$
    Impute/standardize features; embed categoricals
Stage 2: Bayesian Node Embedding
    for $t \leftarrow 1$ to $T$ do
        for $m \leftarrow 1$ to $M$ do
            Sample weights $w^{(m)} \sim q_\phi(w)$    ▹ variational dropout
            $h^{(m)} \leftarrow \text{GNN}(G^{(t)}; w^{(m)})$
        end for
        $\mu^{(t)}, \sigma^{2\,(t)} \leftarrow \text{mean/var}\,\{h^{(m)}\}_{m=1}^{M}$
    end for
    Update encoder $\phi$ by maximizing ELBO
Stage 3: Prototype-Guided Louvain
    for $t \leftarrow 1$ to $T$ do
        Build k-NN graph $\tilde{G}_t$ on $\mu^{(t)}$
        $C_t^{\text{init}} \leftarrow \text{Louvain}(\tilde{G}_t, \gamma = 1)$
        Select $P$ prototypes per community via personalized PageRank
    end for
Stage 4: Markov Transition Modeling
    for $t \leftarrow 2$ to $T$ do
        Estimate $P^{(t)}$ from $C_{t-1}^{\text{init}} \rightarrow C_t^{\text{init}}$
    end for
Stage 5: PPO Boundary Refinement
    for $t \leftarrow 1$ to $T$ do
        Identify boundary nodes $B_t$
        for all $v \in B_t$ do
            Build state $s_v = (\mu_v^{(t)}, \sigma_v^{2\,(t)}, P_{c_v,\cdot}^{(t)}, \deg(v))$
            Sample action $a_v \sim \pi_\theta(\cdot \mid s_v)$
            Apply $a_v$ and collect reward $R_v$
        end for
        Update $(\theta, \psi)$ via PPO loss $J_{\text{PPO}}$
    end for
Final: Forecast Head and Joint Optimization
    Predict $\hat{y}_{T+1:T+H}$; assemble total loss $\mathcal{L}$ (Equation (11))
    Optimize $(\phi, \theta, \psi)$ with AdamW + cosine schedule

4. Experiments

4.1. Datasets

We evaluate MaGNet-BN on seven publicly available dynamic graph datasets spanning six real-world domains: traffic (2), mobility, social media, e-mail, energy, and retail. Table 1 summarizes their statistics.
  • METR-LA and PeMS-BAY: minute-level road-traffic speeds from loop detectors in Los Angeles and the Bay Area (T = 34,272 and 52,560 snapshots) [1].
  • TwitterRC: 2160 hourly snapshots of retweet/mention interactions among 22,938 users [41].
  • Enron-Email: 194 weekly snapshots of corporate e-mail exchanges (150,028 nodes) [42].
  • ETH+UCY: 3588 twelve-second pedestrian-interaction graphs recorded in public scenes [43].
  • ELD-2012: 8760 hourly power-consumption graphs (370 smart-meter clients) extracted from the ElectricityLoadDiagrams20112014 dataset [44].
  • M5-Retail: 1941 daily sales-correlation graphs covering 3049 Walmart items [2].
All datasets are split 70%/15%/15% (train/val/test) in chronological order.
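A chronological split keeps earlier snapshots for training and later ones for validation and testing, with no shuffling. A minimal sketch, assuming the snapshots are already ordered in time and using integer percentages to avoid floating-point rounding (`chronological_split` is a hypothetical helper):

```python
def chronological_split(items, train_pct=70, val_pct=15):
    """Split an ordered sequence into train/val/test without shuffling."""
    n = len(items)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

# 100 toy snapshot indices standing in for time-ordered graphs.
train_set, val_set, test_set = chronological_split(list(range(100)))
```

Because the test portion is simply "whatever remains", any rounding slack ends up in the test set rather than being dropped.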

4.2. Baselines

We compare MaGNet-BN with seven representative baselines, carefully chosen to cover the three research threads intertwined in our task—time series forecasting, dynamic graph learning, and uncertainty-aware community detection. Table 2 summarizes how they span these facets.

4.2.1. Sequence-Forecasting Baselines (Graph-Agnostic)

  • DeepAR [9]. Autoregressive LSTM with Gaussian output quantiles; a de facto standard for univariate/multivariate probabilistic forecasting. Why: sets the reference point for purely temporal models without spatial bias.
  • MC-Drop LSTM [6]. Injects dropout at inference to sample from the weight posterior—simple yet strong Bayesian baseline. Why: isolates the benefit of explicit epistemic uncertainty without graph information.
  • Temporal Fusion Transformer (TFT) [8]. Multi-head attention, static covariates, and gating; current SOTA on many time series leaderboards. Why: strongest recent non-graph forecaster.
  • DCRNN [1]. Diffusion convolution on a fixed sensor adjacency, followed by seq2seq GRU. Why: canonical example of static-graph-aware spatiotemporal forecasting.

4.2.2. Dynamic Graph Baselines

  • DySAT [45]. Self-attention across structural and temporal dimensions; learns snapshot-specific embeddings. Why: early but influential method; serves as the “attention-without-memory” extreme.
  • TGAT [5]. Time-encoding kernels plus graph attention, enabling continuous-time message passing. Why: tests whether high-resolution event timing alone suffices for our coarse snapshot setting.
  • TGN [41]. Memory modules store node histories and are updated by temporal messages; often SOTA on link prediction. Why: strongest publicly available dynamic-GNN with memory.
All baselines inherit the preprocessing in Section 4.1. Evaluation also covers structural coherence (Modularity [46] and temporal ARI [47]) by re-clustering last-layer embeddings with the unified pipeline of Section 4.5.7.

4.2.3. Hyperparameter Tuning of Baselines

For all baseline models, including TGN, we conducted validation sweeps over key hyperparameters (e.g., hidden dimensionality, learning rate, number of layers, memory size for TGN). The final configuration for each baseline was chosen to minimize the mean squared error (MSE) on the validation set. A summary of these settings is provided in Table 3.

4.3. Implementation Details

All experiments are carried out in Python 3.10 using PyTorch 2.2 and PyTorch Geometric 2.5 on a single NVIDIA A100-80GB GPU. Random seeds are fixed to ensure replicability.

4.3.1. Snapshot Construction

For every dataset, we slide a fixed-length window over the raw sequence to build overlapping graph snapshots:
  • METR-LA, PeMS-BAY: L = 12 (minute-level, 12 min horizon)
  • TwitterRC: L = 24 (1 h bins, one-day horizon)
  • Enron-Email: L = 4 (weekly bins, one-month horizon)
  • ETH+UCY: L = 8 (12 s bins, 96 s horizon)
  • ELD-2012: L = 96 (hourly bins, 4-day horizon)
  • M5-Retail: L = 56 (daily bins, 8-week horizon)
A cosine k-nearest-neighbor graph with k = 10 is constructed in each window, and P = high-PageRank prototypes are selected per community to serve as temporal anchors.

4.3.2. Model Hyperparameters

The Bayesian encoder is a two-layer GCN with hidden dimension $d = 128$; KL-annealed variational inference uses $M = 3$ Monte Carlo samples per snapshot. The PPO agent employs a lightweight actor–critic (two 64-unit MLPs) and performs an update after every 32 boundary nodes. This simple multi-layer perceptron design for the actor–critic networks in the PPO stage was chosen to keep computational cost low and training stable while still meeting the requirements of our tasks and datasets. Although more complex architectures could be explored, our preliminary tests indicated that this lightweight structure was sufficient. We optimize with AdamW ($\eta = 3 \times 10^{-4}$, weight decay $10^{-2}$) and a cosine learning-rate schedule; early stopping patience is 20 epochs.

4.4. Hyperparameter Selection and Sensitivity Analysis

We conducted a systematic hyperparameter optimization process to balance predictive accuracy, uncertainty calibration, and community coherence. Key parameters include the Bayesian embedding dimension $d_b$, Louvain resolution $\gamma$, Markov transition smoothing coefficient $\alpha_m$, PPO learning rate $\eta$, and reward weights $(\alpha_r, \beta_r)$ for structural vs. temporal alignment. We initially explored ranges informed by prior literature [6,21] and empirical heuristics from dynamic graph learning benchmarks [13,15]. A combination of grid search and Bayesian optimization was used on validation splits, with early stopping guided by NLL and modularity score.
To assess robustness, we performed a sensitivity analysis by perturbing each hyperparameter while keeping others fixed. The results show that MaGNet-BN maintains stable performance for ±20% variations in $\gamma$ and $\alpha_m$, while MSE drift remains below 2.7% and modularity drop is halved compared with the best dynamic-GNN baseline. This stability indicates that our design does not rely on fragile parameter tuning, supporting deployment in dynamic, real-world settings.

Runtime

Full hyperparameter search plus training over all seven datasets finishes in 11 GPU-hours—4.7 h for model training and 6.3 h for validation sweeps.

4.5. Evaluation Metrics

We report two families of metrics:

4.5.1. Predictive Accuracy

  • Mean squared error (MSE) [48]
  • Negative log-likelihood (NLL) [49]
  • Continuous Ranked Probability Score (CRPS) [50]
  • Prediction Interval Coverage Probability (PICP) [51]
MSE and NLL measure point accuracy and calibration, whereas CRPS and PICP assess the full predictive distribution. For every dataset, we run five random seeds and report mean ± 95% confidence interval in Table 4.

4.5.2. How We Form the ±95% Confidence Interval

For each dataset–metric–model triple, we train with five PyTorch-level random seeds ($\{11, 13, 17, 19, 23\}$ in our code). Let the resulting sample be $\{x_i\}_{i=1}^{n}$ with $n = 5$.
(i) Sample mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.
(ii) Unbiased st. dev.: $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$.
(iii) Half-width for a 95% CI: $h = t_{0.975,\,n-1} \, \frac{s}{\sqrt{n}}$, with $t_{0.975,\,4} = 2.776$.
We finally report $\bar{x} \pm h$ (three decimals for MSE/NLL/CRPS; one for PICP).

4.5.3. Worked Example (ETH+UCY, MaGNet-BN, MSE)

$\bar{x} = 0.213,\quad s = 0.0063,\quad h = 2.776 \times \frac{0.0063}{\sqrt{5}} = 0.0078 \;\Rightarrow\; 0.213 \pm 0.008.$
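The confidence-interval recipe and the worked example can be reproduced with a short sketch. The function names are illustrative, and the critical value 2.776 is hard-coded from the text, so the default only applies to five seeds.

```python
import math
import statistics

T_CRIT_975_DF4 = 2.776  # t_{0.975, 4} as quoted in the text (valid for n = 5 only)


def half_width(s, n, t_crit=T_CRIT_975_DF4):
    """Half-width h = t * s / sqrt(n) of the confidence interval."""
    return t_crit * s / math.sqrt(n)


def mean_ci(samples, t_crit=T_CRIT_975_DF4):
    """Return (sample mean, CI half-width) for one dataset-metric-model triple."""
    n = len(samples)
    mean = statistics.mean(samples)
    s = statistics.stdev(samples)  # unbiased estimate, n - 1 in the denominator
    return mean, half_width(s, n, t_crit)


# Worked example from the text: s = 0.0063, n = 5
print(round(half_width(0.0063, 5), 4))  # 0.0078
```

For a different number of seeds, the critical value has to be replaced by the matching $t_{0.975,\,n-1}$.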

4.5.4. Significance Annotation

For every baseline, we build the paired difference $d_i = x_i^{\text{base}} - x_i^{\text{MaGNet}}$ over the same seeds and run a two-tailed t-test: italic if $p < 0.05$, bold if $p < 0.01$, always testing “is the baseline worse?”.
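A minimal sketch of this annotation rule is given below. To stay dependency-free it compares the paired t statistic with tabulated two-tailed critical values for four degrees of freedom (2.776 and 4.604) instead of computing a p-value; the function names and returned labels are illustrative assumptions.

```python
import math
import statistics

T_CRIT_05 = 2.776  # two-tailed p < 0.05 at 4 degrees of freedom (five seeds)
T_CRIT_01 = 4.604  # two-tailed p < 0.01 at 4 degrees of freedom (five seeds)


def paired_t_statistic(baseline, ours):
    """t statistic of the paired differences d_i = baseline_i - ours_i."""
    diffs = [b - o for b, o in zip(baseline, ours)]
    n = len(diffs)
    # Note: raises ZeroDivisionError if all differences are identical.
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


def annotation(baseline, ours):
    """Map the two-tailed test outcome to the table formatting."""
    t = abs(paired_t_statistic(baseline, ours))
    if t > T_CRIT_01:
        return "bold"
    if t > T_CRIT_05:
        return "italic"
    return "plain"
```

Both lists must hold the per-seed scores in the same seed order, since the test is paired.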

4.5.5. Structural Coherence

We assess graph consistency through Modularity Q (quality of within-snapshot communities) and the temporal Adjusted Rand Index (tARI) (consistency of node assignments across successive snapshots). All models, including baselines, undergo post-processing through the unified clustering pipeline to guarantee equitable comparison.
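The temporal consistency score can be sketched as follows, assuming that tARI is the mean Adjusted Rand Index over consecutive snapshot pairs and that every snapshot lists labels for the same nodes in the same order. Both points are assumptions made for illustration; the text does not spell out the aggregation.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """ARI between two labelings of the same nodes (needs at least two nodes)."""
    n = len(labels_a)
    pairs = Counter(zip(labels_a, labels_b))
    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def temporal_ari(snapshots):
    """Mean ARI over successive snapshot pairs (assumed definition of tARI)."""
    scores = [adjusted_rand_index(a, b) for a, b in zip(snapshots, snapshots[1:])]
    return sum(scores) / len(scores)
```

ARI is invariant to label permutations, so a community that keeps its members but changes its identifier between snapshots is not penalized.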

4.5.6. Domain-Cluster Reporting

To keep the discussion concise, we aggregate results by domain cluster (traffic, social, e-mail, crowd, energy, retail) when describing trends in the text, while the full seven-dataset Table 5 provides per-dataset detail.

4.5.7. Unified Louvain Post-Processing

All models (our own and all baselines) are re-clustered after forward inference by the same two-step pipeline, so that Modularity (Q), temporal ARI, NMI, and VI are strictly comparable:
  • k for k-NN: 10 for traffic and e-mail graphs, 25 for social and retail graphs;
  • Similarity: cosine distance on $\ell_2$-normalized embeddings;
  • Resolution γ : 1.0 (vanilla Louvain);
  • Post-merge: keep giant components; orphan nodes inherit the label of their nearest prototype.
The resulting partitions feed directly into Table 6 and Table 7.
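The graph-construction half of this pipeline can be sketched in plain Python. The sketch builds an undirected cosine k-NN graph from $\ell_2$-normalized embeddings; the Louvain step and orphan reassignment are not shown, and the function names and brute-force search are illustrative assumptions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean norm (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_knn_edges(embeddings, k):
    """Undirected k-NN graph: maps (i, j) with i < j to the cosine similarity."""
    unit = [l2_normalize(v) for v in embeddings]
    edges = {}
    for i, u in enumerate(unit):
        sims = [(sum(a * b for a, b in zip(u, w)), j)
                for j, w in enumerate(unit) if j != i]
        sims.sort(reverse=True)  # most similar neighbours first
        for sim, j in sims[:k]:
            edges[(min(i, j), max(i, j))] = sim
    return edges
```

The resulting weighted edge list would then be passed to a Louvain implementation with resolution 1.0. The pairwise search is quadratic in the number of nodes, so larger graphs would need an approximate nearest-neighbour index.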

4.6. Main Results

Across seven dynamic graph datasets and seven competitive baselines (Section 4.2), MaGNet-BN delivers state-of-the-art forecasting accuracy and community-structure fidelity. We train each model using five random seeds and retain the checkpoint exhibiting the lowest validation loss, adhering to established best practices in probabilistic forecasting. All numbers are reported as mean ± 95% CI; most baseline values are worse than those of MaGNet-BN.

4.6.1. Forecasting Accuracy

As summarized in Table 8, MaGNet-BN is best on 26 out of 28 dataset–metric combinations, dropping points only on the sparsest domain (M5-Retail). These findings are further supported by the ablation results in Table 9.
Table 5 ranks all methods by four metrics (MSE, NLL, CRPS, PICP). MaGNet-BN finishes first on 26/28 metric–dataset pairs and never drops below second place. On the two traffic datasets (METR-LA, PeMS-BAY), MaGNet-BN improves NLL by 0.19–0.27 nats over TGN and raises PICP from 90.3% to 92.1% (METR-LA, +1.8 pp) and from 90.1% to 91.3% (PeMS-BAY, +1.2 pp). The margin widens on the bursty TwitterRC stream (88.8% vs. 86.9%, +1.9 pp), underscoring the benefits of Bayesian sampling and prototype anchors.

4.6.2. Structural Consistency

To assess the efficacy of each model in maintaining graph structure over time, we quantify two complementary attributes:
Modularity (Q)—the quality of community partition inside a single snapshot.
Temporal Adjusted Rand Index (tARI)—the concordance of node assignments over successive snapshots.
For every method (our model and all seven baselines), we re-cluster the final-layer node embeddings with a uniform pipeline (Section 4.5): cosine k-NN ( k = 10 /25), Louvain with γ = 1.0 , and orphan reassignment. This guarantees that any difference in Q or tARI stems from the representation quality, not from differing post-processing.
Table 6 reports numeric scores, and the heat-map in Figure 2 gives a visual overview.
MaGNet-BN attains the highest Q and tARI on all seven datasets. Typical increases over the next-best baseline (TGN) range from 6 to 9 percentage points in Q and 5 to 8 percentage points in tARI, with the largest margins observed in sparse or highly dynamic graphs (Enron-Email, M5-Retail, ETH+UCY). These gains suggest that (i) prototype-guided Louvain generates more cohesive initial communities; (ii) Markov smoothing maintains label stability when clusters split or merge; and (iii) PPO refinement corrects the border nodes that attention-only encoders misplace.
In summary, MaGNet-BN not only provides precise forecasts but also yields communities that are more cohesive within each snapshot and more consistent over time, an essential requirement for downstream tasks such as anomaly detection or long-term planning. Consistent with this view, the ablation in Table 9 reveals complementary main effects and clear failure modes when any component is removed, reinforcing that the gains arise from interdependent coupling rather than a mere stack of parts.

4.6.3. Cross-Analysis

The dual victory on both forecasting and structure demonstrates that prototype-guided Louvain, Bayesian uncertainty, Markov smoothing, and PPO refinement work synergistically: models that perform well only structurally (DySAT) or temporally (DeepAR, TFT) cannot meet our shared goal. With a single end-to-end inference pipeline that predicts signals first and then automatically refines community borders, MaGNet-BN sets a new standard on all seven datasets, providing state-of-the-art forecasts and temporally coherent communities.

4.6.4. Fine-Grained Node-Level Consistency

To capture alignment at the node level, we additionally compute Normalized Mutual Information (NMI), Variation of Information (VI), and the Brier score. Table 7 shows that MaGNet-BN achieves the highest NMI (↑) across all seven datasets and obtains the lowest VI/Brier (↓) on six of the seven datasets, remaining competitive on the most irregular domain (M5-Retail).
Specifically, on M5-Retail its VI of 0.61 improves upon the worst baseline (DeepAR, 0.85) by 28.2% and upon the average of all baselines (0.74) by 17.6%, while trailing only TGN (0.55) by a small margin.
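The two percentages follow from the relative reduction of a lower-is-better score, as the short check below illustrates (the helper name is illustrative).

```python
def relative_improvement(baseline, ours):
    """Relative reduction (in %) of a lower-is-better score."""
    return 100 * (baseline - ours) / baseline


print(round(relative_improvement(0.85, 0.61), 1))  # 28.2
print(round(relative_improvement(0.74, 0.61), 1))  # 17.6
```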

4.6.5. Node-Level Evaluation Metrics

To complement snapshot–level Modularity (Q) and temporal ARI, we report three node-level scores that quantify how well the predicted community distribution aligns with the ground truth for every vertex v.
  • Normalized Mutual Information (NMI, ↑): $\mathrm{NMI}(Y, \hat{Y}) := \frac{2\, I(Y; \hat{Y})}{H(Y) + H(\hat{Y})}$, where $I(\cdot\,;\cdot)$ is mutual information and $H(\cdot)$ Shannon entropy. It measures the shared information (0–1).
  • Variation of Information (VI, ↓): $\mathrm{VI}(Y, \hat{Y}) := H(Y) + H(\hat{Y}) - 2\, I(Y; \hat{Y})$, the information-theoretic distance between two partitions (lower is better).
  • Brier Score (↓): $\mathrm{Brier} := \frac{1}{|V|}\sum_{v \in V} \left\lVert p_v - \mathbf{1}_{y_v} \right\rVert_2^2$, where $p_v$ is the predicted class-probability vector and $\mathbf{1}_{y_v}$ the one-hot ground truth. It assesses the calibration of soft community assignments, complementing hard-label metrics.
Q and tARI alone cannot provide this fine-grained view; the three measures capture information overlap, partition dissimilarity, and probabilistic accuracy, respectively.
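The three definitions translate directly into a few lines of Python. This is a minimal sketch: natural logarithms are assumed (so VI is in nats), community labels are assumed to be integer class indices for the Brier score, and the function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(Y) of a labeling, in nats."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information I(Y; Y_hat) between two labelings, in nats."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items())


def nmi(a, b):
    """NMI = 2 I / (H(Y) + H(Y_hat)); defined as 1.0 when both entropies are zero."""
    denom = entropy(a) + entropy(b)
    return 2 * mutual_information(a, b) / denom if denom > 0 else 1.0


def variation_of_information(a, b):
    """VI = H(Y) + H(Y_hat) - 2 I."""
    return entropy(a) + entropy(b) - 2 * mutual_information(a, b)


def brier(probs, labels):
    """Mean squared L2 distance between probability vectors and one-hot labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(labels)
```

Identical partitions give NMI = 1 and VI = 0 regardless of how the communities are numbered, whereas the Brier score does depend on the class indices matching.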

4.6.6. Key Takeaway

Beyond global cohesion, MaGNet-BN preserves node-level semantic alignment, validating the prototype-guided Louvain stage and the PPO reward design.

4.6.7. RL Stability Diagnostics

Even when aggregate training curves look smooth, PPO can fluctuate under extended horizons and scarce rewards. We therefore log three diagnostics on the ETH+UCY validation split (seed = 42): (1) policy entropy and KL divergence to the previous policy; (2) reward variance across episodes; and (3) an ablation grid over clip ratio $\epsilon \in \{0.1, 0.2, 0.3\}$ and mini-batch size $\{256, 512, 1024\}$. PPO optimizes the clipped surrogate
$\mathcal{L}_{\text{PPO}} = \mathbb{E}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right]$
where $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\text{old}}}(a_t \mid s_t)$. As shown in Figure 3, entropy and KL remain below 0.03 / 0.05 after epoch 50, confirming stable convergence.
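The clipped surrogate can be illustrated on plain Python lists. The sketch below takes precomputed probability ratios and advantage estimates and returns the batch average; it shows the objective only, not the authors' training loop, and the function name is illustrative.

```python
def clipped_surrogate(ratios, advantages, eps=0.2):
    """Batch mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    terms = []
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1 - eps), 1 + eps)  # clip(r, 1 - eps, 1 + eps)
        terms.append(min(r * a, clipped * a))
    return sum(terms) / len(terms)
```

With a positive advantage, a ratio above $1 + \epsilon$ earns no extra objective value, which removes the incentive for overly large policy steps. With a negative advantage, the minimum keeps the more pessimistic of the two terms.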
After roughly 50 epochs, a distinct convergence point is indicated where prediction loss and structural metrics stabilize. This behavior signifies that the model has successfully entered a stable layer-wise regime, with only minimal improvements possible beyond this point. This validates consistent convergence and substantiates the early-stopping criterion [52] (patience = 20) to prevent overfitting beyond this plateau.
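A patience-based stopping rule of the kind referred to here can be sketched as follows. The class name, the `min_delta` parameter, and the choice of a single monitored validation loss are assumptions for illustration; only the patience value of 20 comes from the text.

```python
class EarlyStopping:
    """Signal a stop once the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience=20, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `step` would be called once per epoch with the validation loss, and the best checkpoint would be restored when it returns `True`.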

4.7. Ablation Study

To assess the contribution of key elements of MaGNet-BN, we perform ablation experiments using the ETH+UCY dataset. Mean squared error (MSE) and negative log-likelihood (NLL) are used to measure prediction accuracy. Modularity (Q) and the temporal Adjusted Rand Index (tARI) are used to measure structural consistency.
We consider two ablated variants:
(1) w/o Bayesian Embedding: This version removes uncertainty modeling and stochastic sampling from the encoder. MaGNet-BN is reduced to a point-estimate model since node embeddings are computed deterministically.
(2) w/o Markov + PPO: This version skips the sequential refinement step, removing both Markov transition modeling and PPO-based reinforcement learning. The two are removed together because the policy relies on Markov transitions to compute its inputs and rewards. Without any temporal refinement, the final model depends solely on the initial prototype-guided clustering.
Table 9 demonstrates that both components significantly enhance performance. Removing the Bayesian embedding leads to higher forecasting error and poorer calibration (increased MSE and NLL), confirming that modeling uncertainty improves predictive robustness. It also slightly reduces the structural metrics, suggesting that stochastic embeddings help capture complex community dynamics.
Turning off Markov refinement and PPO costs more structural consistency, especially in tARI, which indicates temporal instability in community assignments. This demonstrates how important reinforcement-based refinement is for faithfully capturing community dynamics.
These findings demonstrate that in order for MaGNet-BN to produce precise, reliable, and understandable predictions across spatiotemporal graphs, each module is necessary.

4.7.1. Training Stability

Figure 4 tracks joint training on the ETH+UCY dataset over 100 epochs:
  • Fast, monotonic convergence. The mean squared error (MSE) drops from 0.65 to 0.18 within 35 epochs, after which improvements plateau.
  • Synchronous structural gains. Modularity (Q) rises from 0.31 to 0.58, while temporal ARI (tARI) climbs from 0.36 to 0.69, mirroring the MSE curve and confirming that the PPO stage enhances community coherence without hurting predictive accuracy.
  • Low epoch-to-epoch variance. Even with sparse rewards, PPO updates remain steady because the clipped-surrogate objective stabilizes them, as indicated by the absence of spikes.

4.7.2. Loss Function Convergence Analysis

Figure 5 illustrates the convergence behavior of the total loss $\mathcal{L}$ and its constituent components defined in Equation (11) over 100 training epochs. The total loss steadily decreases, reflecting effective joint optimization of forecasting fidelity ($\mathcal{L}_{\text{pred}}$), Bayesian regularization ($\mathcal{L}_{\text{ELBO}}$), temporal smoothness ($\mathcal{L}_{\text{Markov}}$), and the PPO objective ($J_{\text{PPO}}$). The smooth downward trends indicate stable training dynamics and the absence of mode collapse, demonstrating that each component contributes to a balanced reduction in the overall cost. These curves collectively confirm that PPO provides a steady refinement loop during training, and that MaGNet-BN achieves joint optimization of forecasting accuracy and structural coherence. In addition, we include an auxiliary plot that tracks the evolution of each individual loss component in Equation (11) throughout the training epochs. This decomposition clearly shows how $\mathcal{L}_{\text{pred}}$, $\mathcal{L}_{\text{ELBO}}$, $\mathcal{L}_{\text{Markov}}$, and $J_{\text{PPO}}$ jointly contribute to the overall cost $\mathcal{L}$, with all components exhibiting steady convergence patterns consistent with Algorithm 1. These trends further substantiate the stability observations described above.

4.8. Embedding Visualization

Why visualize? Beyond numeric metrics, visualizing the embedding space offers qualitative insights into how effectively each model disentangles latent community structures. Figure 6 compares MaGNet-BN with two ablated variants on a representative snapshot from the M5-Retail dataset, using 2-D t-SNE projections of the final-layer node embeddings.
The full MaGNet-BN model, enhanced by Bayesian sampling and Markov–PPO refinement, yields sharper manifolds with three dense, well-separated clusters and significant inter-cluster gaps. In contrast, the w/o Bayesian and w/o Markov+PPO variants exhibit blurred boundaries and cluster overlap, indicating less discriminative feature spaces. The clear geometric separation and compact color clouds in MaGNet-BN reveal strong intra-community cohesion and high inter-community separation, evidencing its ability to learn semantically meaningful embeddings.

Qualitative Insight

Each point in Figure 6 corresponds to a product SKU in the M5-Retail dataset, colored by its known community label. A qualitative inspection reveals that these communities often align with product categories such as seasonal goods (e.g., holiday decorations), perishable items (e.g., fresh produce, dairy), and daily essentials (e.g., beverages, household cleaners).
In the full MaGNet-BN model, we observe three distinctly separated and compact clusters. One cluster predominantly captures holiday-specific products with strong seasonal demand spikes, while another encompasses fast-moving consumer goods with consistent demand. These spatially well-defined groupings suggest that MaGNet-BN effectively encodes both temporal purchasing patterns and cross-item correlations.
In contrast, the ablated models exhibit substantial cluster bleeding. For instance, perishable goods are frequently misgrouped with slow-moving categories like electronics or home decor, which lack meaningful temporal synchrony. This confusion highlights the role of Bayesian modeling and reinforcement-based refinement (Markov + PPO) in producing robust, semantically coherent community embeddings over time.

4.9. Sensitivity and Robustness

4.9.1. Evaluation Pipeline Overview

To systematically evaluate the robustness of MaGNet-BN, we adopt a dual-path analysis strategy summarized in Figure 7. Starting from the optimal hyperparameters identified during the main training phase, we assess model sensitivity along two axes:
(a)
Hyperparameter sensitivity: We perform targeted sweeps over three key tuning knobs: Monte Carlo sample count (M), prototype anchor count (P), and PPO reward weights ( α , β ) . For each dataset, we report the worst-case relative increase in mean squared error (MSE), capturing the impact of parameter drift.
(b)
Structural robustness: We inject synthetic edge noise into every test snapshot by randomly rewiring 1%, 3%, and 5% of the graph edges, and log the resulting drop in modularity ( Δ Q ). This simulates real-world perturbations in graph topology.
Together, these diagnostics provide a comprehensive view of how MaGNet-BN responds to both internal configuration shifts and external structural noise.
Table 10 presents the hyperparameter sensitivity results across all seven datasets. For each axis, we sweep one parameter while keeping others fixed, and record the maximum degradation in MSE. The final column reports the worst-case drift, which never exceeds 2.7%—demonstrating that MaGNet-BN is both stable and easy to tune.

4.9.2. Findings

(i) MaGNet-BN is insensitive to moderate changes in $M$ and $P$; actor–critic refinement stabilizes training even when $\alpha/\beta$ varies two-fold. (ii) Under structural noise, the model consistently outperforms attention-only baselines: at $\varepsilon = 5\%$ its average $\Delta Q$ is 0.031 versus DySAT’s 0.090 and TGN’s 0.067.

4.9.3. Hyperparameter Sensitivity (Ours Only)

Table 10 does not compare performance under different parameter settings across datasets. Instead, each row fixes a single dataset and MaGNet-BN, then performs an independent 1-D sweep over the model’s three most influential knobs:
  • The number of Monte Carlo samples M ;
  • The number of prototype anchors P ;
  • The PPO reward weights ( α , β ) .
For each knob, we record the worst relative increase in validation MSE (%). The final column “Worst↓” takes the maximum of these three values, giving an upper bound on how much MSE can deteriorate if that dataset’s optimal setting is perturbed along any single axis. Across all seven datasets, the largest drift never exceeds 2.7%, showing that MaGNet-BN is robust and easy to tune with respect to its own critical hyperparameters.
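How one table row is formed can be sketched as below. The function names are illustrative, and the numbers in the usage lines are made up purely to show the calculation; they are not results from the paper.

```python
def worst_relative_increase(best_mse, sweep_mse):
    """Largest relative MSE increase (in %) over a 1-D sweep of one knob."""
    return max(100 * (m - best_mse) / best_mse for m in sweep_mse)


def worst_drift(best_mse, sweeps):
    """Per-knob worst increase and the overall maximum (the 'Worst' column)."""
    per_knob = {knob: worst_relative_increase(best_mse, values)
                for knob, values in sweeps.items()}
    return per_knob, max(per_knob.values())


# Hypothetical sweep results for one dataset (illustrative numbers only)
per_knob, worst = worst_drift(1.0, {"M": [1.005, 1.01], "P": [1.02], "reward": [1.015]})
print(round(worst, 1))  # 2.0
```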
Edge-noise rewiring. Table 11 reports the modularity drop $\Delta Q$ (lower = better) after randomly rewiring $\varepsilon \in \{1, 3, 5\}\%$ of edges in every test snapshot.
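A rough sketch of this perturbation test is shown below for an undirected, unweighted graph. It makes two simplifying assumptions that may differ from the authors' procedure: each rewired edge is replaced by a uniformly random non-existing edge, and the community labels are held fixed when modularity is re-evaluated. All function names are illustrative.

```python
import random


def modularity(edges, labels, gamma=1.0):
    """Newman modularity Q of a fixed partition (graph must have at least one edge)."""
    m = len(edges)
    intra, degree = {}, {}
    for u, v in edges:
        degree[labels[u]] = degree.get(labels[u], 0) + 1
        degree[labels[v]] = degree.get(labels[v], 0) + 1
        if labels[u] == labels[v]:
            intra[labels[u]] = intra.get(labels[u], 0) + 1
    return sum(intra.get(c, 0) / m - gamma * (d / (2 * m)) ** 2
               for c, d in degree.items())


def rewire(edges, n_nodes, fraction, seed=0):
    """Replace a fraction of edges with random new ones (graph must not be complete)."""
    rng = random.Random(seed)
    edge_set = {tuple(sorted(e)) for e in edges}
    n_rewire = round(fraction * len(edge_set))
    for old in rng.sample(sorted(edge_set), n_rewire):
        edge_set.remove(old)
        while True:
            u, v = rng.sample(range(n_nodes), 2)
            new = (min(u, v), max(u, v))
            if new not in edge_set and new != old:
                edge_set.add(new)
                break
    return sorted(edge_set)


def modularity_drop(edges, labels, n_nodes, fraction, seed=0):
    """Delta Q between the clean and the rewired graph under the same labels."""
    noisy = rewire(edges, n_nodes, fraction, seed)
    return modularity(edges, labels) - modularity(noisy, labels)
```

Because the number of edges is preserved, any change in Q comes from where the edges fall relative to the partition and not from a change in graph density.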

5. Discussion

Why it works. Across seven datasets, MaGNet-BN achieves the best result on 26 of 28 forecasting scores and on every structural score (Section 4.6). Bayesian sampling sharpens long-horizon forecasts, while prototype-guided Louvain + Markov–PPO locks communities in place, yielding both low NLL and high Q/tARI/NMI.
Practical upside. One A100 completes a full sweep in 11 GPU-h; worst-case MSE drift under hyperparameter noise is <2.7%. Even with 5% of edges rewired, the average modularity drop is only 0.031 (Table 11), about half the hit seen by TGN (0.067). Thus, the model is fast, reproducible, and robust.
Key takeaways.
  • End-to-end synergy: Bayesian–Markov–PPO stages reinforce each other; ablating either cuts tARI by over 11 pp.
  • Fine-grained fidelity: best NMI on all seven datasets and best VI/Brier on six of them, indicating node-level alignment and not just global cohesion.
  • Ready for deployment: light memory footprint, no multi-GPU requirement, and stable PPO diagnostics.
Next steps (targeted, not blocking). Adaptive edge learning and reward-schedule optimization may yield additional benefits on ultra-sparse graphs, while case studies (e.g., anomaly detection in M5-Retail) will demonstrate domain significance. These are incremental enhancements; the core architecture already provides a robust foundation for future work.
Beyond current experimental validations, our framework offers significant potential for real-world applications in smart city traffic management, energy demand forecasting, and other operational decision-support systems. The integration of MaGNet-BN into such environments could enhance situational awareness, optimize resource allocation, and improve resilience against unexpected disruptions.
A significant pathway for future implementation involves augmenting MaGNet-BN with online learning functionalities. In swiftly changing environments—such as streaming sensor networks, high-frequency financial markets, or social platforms responding to external events—immediate adaptability is crucial. Integrating incremental Bayesian updates and reinforcement-based structural refinement would enable the model to dynamically adjust node embeddings, community boundaries, and temporal transition probabilities, facilitating rapid responses to abrupt structural changes while maintaining a balance between computational efficiency and predictive accuracy.
The integration of multimodal data sources, such as social media streams, geographical sensor networks, and mobility traces, is another feasible direction. By jointly modeling such varied data, the system could improve both the accuracy and robustness of community detection and prediction tasks. This integration would allow MaGNet-BN to recognize more complex contextual patterns, detect minor structural changes earlier, and sustain effectiveness despite incomplete or noisy data.

6. Conclusions

This paper presented MaGNet-BN, a Markov-guided Bayesian neural framework that combines dynamic community tracking on temporal graphs with long-horizon probabilistic forecasting. MaGNet-BN produces calibrated predictions and structurally coherent communities in a single pass by combining (1) variational Bayesian node embeddings, (2) prototype-guided Louvain clustering, (3) Markov smoothing of community trajectories, and (4) PPO-based boundary node refinement. In extensive testing on seven public datasets covering energy, retail, social media, e-mail, traffic, and mobility, the model:
  • Achieves the best score on 26/28 forecasting benchmarks (MSE, NLL, CRPS, PICP) and all structural metrics (Modularity Q, tARI, NMI);
  • Remains stable and data-efficient, with worst-case MSE drift <2.7% under hyperparameter perturbation and only 0.031 modularity loss when 5% of edges are rewired;
  • Trains end-to-end in 11 GPU-hours on a single NVIDIA A100, demonstrating practical feasibility for real-time analytics.
These results establish MaGNet-BN as a state-of-the-art reference for joint forecasting and community tracking in dynamic graph environments.
Potential extensions. This study primarily addresses spatiotemporal graph modeling for trajectory forecasting; nevertheless, our system could in principle be adapted to two-dimensional image restoration tasks. In image dehazing, image patches or regions can be represented as graph nodes, with edges encoding spatial or multi-scale relationships. Adversarial training can promote realistic reconstructions, while self-supervised or zero-shot objectives, akin to those in Wei et al. [53], may help mitigate domain shifts and reduce the reliance on extensive paired datasets. We defer this avenue to subsequent research.
Beyond the experimental gains, our methodology has concrete implications for practical decision-making in dynamic network contexts. In smart city environments, MaGNet-BN can be utilized for adaptive traffic management by forecasting congestion patterns while preserving interpretable community structures that denote traffic zones. In energy systems, it can predict demand variations with quantifiable uncertainty, facilitating proactive load balancing and the integration of renewable resources. Other sectors, such as retail demand forecasting, can leverage the model’s capacity to simultaneously capture temporal dynamics and shifting interaction clusters, offering decision-makers precise predictions and structurally informed insights.

7. Future Work

While MaGNet-BN already offers a robust, deployable solution, several research directions remain open:
Learnable Graph Topologies. The current k-NN construction is heuristic and fixed per snapshot. Integrating graph structure learning layers that optimize the adjacency matrix jointly with node embeddings (à la [28,29]) could further boost accuracy—especially on ultra-sparse graphs.
Higher-order temporal dependencies. Although effective, first-order Markov smoothing may overlook long-range effects. Higher-order chains, memory-augmented RNN/Transformer priors, or non-stationary Hawkes-process variants could capture delayed community interactions.
Adaptive reward scheduling. PPO stability still depends on the relative scales of $(\alpha, \beta)$. Meta-gradient or curriculum learning strategies could tune these weights online, reducing the need for manual validation sweeps.
Multimodal node attributes. Real-world graphs often carry text, images, or geospatial signals. Cross-modal fusion [31] and plug-and-play encoders (such as pretrained language/vision transformers) would extend MaGNet-BN to more complex sensing scenarios.
Streaming and continual learning. Deployments in retail logistics or traffic control require online updates. An incremental variant with replay buffers and elastic prototype management could maintain performance without complete retraining.
Theoretical guarantees. The formal analysis of convergence and calibration under combined Bayesian–RL optimization remains largely open. Establishing PAC-Bayesian or regret bounds would strengthen adoption in safety-critical domains.
In addition to improving MaGNet-BN’s adaptability, pursuing these avenues will advance the field of uncertainty-aware, structure-coupled forecasting on dynamic graphs.

Author Contributions

Conceptualization, D.Q.; Methodology, Y.M.; Software, Y.M.; Validation, D.Q. and Y.M.; Formal analysis, Y.M.; Resources, Y.M.; Data curation, Y.M.; Writing—original draft, D.Q.; Visualization, D.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. In Proceedings of the ICLR, Vancouver, BC, Canada, 30 April–3 May 2018.
  2. Makridakis, S.; Spiliotis, E.; Assimakopoulos, V. M5 accuracy competition: Results, findings, and conclusions. Int. J. Forecast. 2022, 38, 2330–2341.
  3. Mucha, P.J.; Richardson, T.; Macon, K.; Porter, M.A.; Onnela, J.-P. Community Structure in Time-Dependent, Multiscale, and Multiplex Networks. Science 2010, 328, 876–878.
  4. Holme, P.; Saramäki, J. Temporal Networks. Phys. Rep. 2012, 519, 97–125.
  5. Xu, D.; Ruan, C.; Korpeoglu, S.; Kumar, S.; Achan, K. Inductive representation learning on temporal graphs. In Proceedings of the International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia, 26–30 April 2020.
  6. Gal, Y.; Ghahramani, Z. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of the International Conference on Machine Learning (ICML), New York, NY, USA, 19–24 June 2016; pp. 1050–1059.
  7. Cazabet, R.; Amblard, F. Dynamic community detection. Wiley Interdiscip. Rev. Comput. Stat. 2020, 12, e1503.
  8. Lim, B.; Arık, S.Ö.; Loeff, N.; Pfister, T. Temporal Fusion Transformers for Interpretable Multi-Horizon Time-Series Forecasting. Int. J. Forecast. 2021, 37, 1748–1764.
  9. Salinas, D.; Flunkert, V.; Gasthaus, J.; Januschowski, T. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. Int. J. Forecast. 2020, 36, 1181–1191.
  10. Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, J. T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction. IEEE Trans. Intell. Transp. Syst. 2020, 21, 3848–3858.
  11. Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Chang, X.; Zhang, C. Graph WaveNet for Deep Spatial–Temporal Graph Modeling. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), Macao, China, 10–16 August 2019; pp. 1907–1913.
  12. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long-sequence time-series forecasting. In Proceedings of the AAAI, Virtually, 2–9 February 2021; pp. 11106–11115.
  13. Huang, Y.; Lei, X. Temporal group-aware graph diffusion networks for dynamic link prediction. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Long Beach, CA, USA, 6–10 August 2023; pp. 3782–3792.
  14. Rossetti, G.; Cazabet, R. Community Discovery in Dynamic Networks: A Survey. ACM Comput. Surv. 2018, 51, 35:1–35:37.
  15. Costa, G.; Cattuto, C.; Lehmann, S. Towards modularity optimization using reinforcement learning to community detection in dynamic social networks. In Proceedings of the IEEE ICDM, Auckland, New Zealand, 7–10 December 2021; pp. 110–119.
  16. Yuan, L. Temporal Community Detection and Analysis with Network Embedding. Mathematics 2025, 13, 698.
  17. Pan, Y.; Liu, X.; Yao, F.; Zhang, L.; Li, W.; Wang, P. Identification of Dynamic Networks Community by Fusing Deep Learning and Evolutionary Clustering (DLEC). Sci. Rep. 2024, 14, 23741.
  18. Mazza, M.; Cola, G.; Tesconi, M. Modularity-based approach for tracking communities in dynamic social networks. arXiv 2023, arXiv:2302.12759.
  19. Sattar, N.S. Exploring temporal community evolution: Algorithmic comparison and parallel detection. Appl. Netw. Sci. 2023, 8, 64.
  20. Safdari, H.; Bacco, C.D. Community Detection and Anomaly Prediction in Dynamic Networks. Commun. Phys. 2024, 7, 397.
  21. Wang, Q.; Li, H.; Chen, Y. BayesNode: A Bayesian node embedding approach for temporal graph forecasting. In Proceedings of the 2024 Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 9–15 December 2024.
  22. Loyal, J.D.; Chen, Y. A Bayesian Nonparametric Latent Space Approach to Modeling Evolving Communities in Dynamic Networks. Bayesian Anal. 2023, 18, 49–77.
  23. Durante, D.; Dunson, D.B. Bayesian dynamic financial networks with time-varying predictors. Stat. Probab. Lett. 2014, 93, 19–26.
  24. de Oliveira Santos, T.M. Evolving dynamic Bayesian networks by an analytical threshold. Data Brief 2022, 41, 101811.
  25. Rahman, A.R.; Coon, J.P. A primer on temporal graph learning. arXiv 2024, arXiv:2401.03988.
  26. Zheng, R.; Athreya, A.; Zlatic, M.; Clayton, M.; Priebe, C.E. Dynamic network clustering via mirror distance. arXiv 2024, arXiv:2412.19012.
  27. Pang, W.; Wang, X.; Sun, Y.; Zhang, H.; Li, J.; Chen, R.; Liu, Q.; Zhao, T.; Yang, K.; Zhou, M.; et al. Bayesian spatio-temporal graph transformer network (b-star) for multi-aircraft trajectory prediction. In Proceedings of the ACM MM, Lisboa, Portugal, 10–14 October 2022; pp. 3979–3988.
  28. Franceschi, L.; Niepert, M.; Pontil, M.; He, X. Learning Discrete Structures for Graph Neural Networks. In Proceedings of the 36th International Conference on Machine Learning (ICML), Long Beach, CA, USA, 9–15 June 2019; pp. 1972–1982.
  29. Chen, Y.; Wu, L.; Zaki, M. Iterative Deep Graph Learning for Graph Neural Networks: Better and Robust Node Embeddings. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Online, 6–12 December 2020; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2020; pp. 19314–19326.
  30. Hu, Z.; Dong, Y.; Wang, K.; Sun, Y. Open Graph Benchmark: Datasets for machine learning on graphs. In Proceedings of the NeurIPS, Virtual, 14 December 2021.
  31. Tsai, Y.-H.H.; Liang, P.P.; Zadeh, A.; Morency, L.-P.; Salakhutdinov, R. Multimodal Transformer for Unaligned Multimodal Language Sequences. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), Florence, Italy, 28 July 2019; Association for Computational Linguistics: Florence, Italy, 2019; pp. 6558–6569.
  32. Blundell, C.; Cornebise, J.; Kavukcuoglu, K.; Wierstra, D. Weight Uncertainty in Neural Networks. In Proceedings of the 32nd International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015; pp. 1613–1622.
  33. Blondel, V.D.; Guillaume, J.-L.; Lambiotte, R.; Lefebvre, E. Fast Unfolding of Communities in Large Networks. J. Stat. Mech. Theory Exp. 2008, 10, P10008.
  34. Page, L.; Brin, S.; Motwani, R.; Winograd, T. The PageRank Citation Ranking: Bringing Order to the Web. In Proceedings of the 7th International World Wide Web Conference (WWW7), Brisbane, Australia, 14–18 April 1999; Elsevier Science Publishers B. V.: Brisbane, Australia, 1999; pp. 161–172.
  35. Haveliwala, T.H. Topic-Sensitive PageRank. In Proceedings of the 11th International Conference on World Wide Web (WWW), Honolulu, HI, USA, 7–11 May 2002; ACM: New York, NY, USA, 2002; pp. 517–526.
  36. Li, Y. Deep reinforcement learning: An overview. arXiv 2017, arXiv:1701.07274.
  37. Rosvall, M.; Esquivel, A.; Lancichinetti, A.; West, J.D.; Lambiotte, R. Memory in network flows and its effects on spreading dynamics and community detection. Nat. Commun. 2014, 5, 4630.
  38. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
  39. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
  40. Levine, S.; Kumar, A.; Tucker, G.; Fu, J. Offline reinforcement learning: Tutorial, review, and open problems. arXiv 2020, arXiv:2005.01643.
  41. Rossi, E.; Chamberlain, B.; Frasca, F.; Eynard, D.; Monti, F.; Bronstein, M. Temporal Graph Networks for Deep Learning on Dynamic Graphs. arXiv 2020, arXiv:2006.10637.
  42. Klimt, B.; Yang, Y. The Enron Corpus: A New Dataset for Email Classification Research. In Proceedings of the European Conference on Machine Learning (ECML), Pisa, Italy, 20–24 September 2004; pp. 217–226.
  43. Pellegrini, S.; Ess, A.; Schindler, K.; Van Gool, L. You’ll Never Walk Alone: Modeling Social Behavior for Multi-Target Tracking. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Kyoto, Japan, 29 September–2 October 2009; pp. 261–268.
  44. Trindade, A. ElectricityLoadDiagrams20112014 [Data set]; UCI Machine Learning Repository. 2015. Available online: https://archive.ics.uci.edu/dataset/321/electricityloaddiagrams20112014 (accessed on 6 July 2025).
  45. Sankar, A.; Wu, Y.; Gou, L.; Zhang, W.; Yang, H. DySAT: Deep neural representation learning on dynamic graphs via self-attention. In Proceedings of the WSDM, Houston, TX, USA, 3–7 February 2020; pp. 519–527.
  46. Newman, M.E.J. Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 2006, 103, 8577–8582. [Google Scholar] [CrossRef] [PubMed]
  47. Hubert, L.; Arabie, P. Comparing partitions. J. Classif. 1985, 2, 193–218. [Google Scholar] [CrossRef]
  48. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006. [Google Scholar]
  49. Murphy, K.P. Machine Learning: A Probabilistic Perspective; MIT Press: Cambridge, MA, USA, 2012. [Google Scholar]
  50. Gneiting, T.; Raftery, A.E. Strictly proper scoring rules, prediction, and estimation. J. Am. Stat. Assoc. 2007, 102, 359–378. [Google Scholar] [CrossRef]
  51. Khosravi, A.; Nahavandi, S.; Creighton, D.; Atiya, A.F. Comprehensive review of neural network-based prediction intervals and new advances. IEEE Trans. Neural Netw. 2011, 22, 1341–1356. [Google Scholar] [CrossRef]
  52. Prechelt, L. Early stopping—but when? In Neural Networks: Tricks of the Trade; Springer: Berlin/Heidelberg, Germany, 1998; pp. 55–69. [Google Scholar] [CrossRef]
  53. Wei, J.; Cao, Y.; Yang, K.; Chen, L.; Wu, Y. Self-Supervised Remote Sensing Image Dehazing Network Based on Zero-Shot Learning. Remote Sens. 2023, 15, 2732. [Google Scholar] [CrossRef]
Figure 1. Illustration of the MaGNet-BN pipeline. The model processes temporal graph snapshots $\{G^{(t)}\}_{t=1}^{T}$ in five stages: (1) data preprocessing generates clean snapshot graphs; (2) Bayesian node embeddings are computed via variational inference; (3) a k-NN graph and prototype-guided Louvain clustering yield initial communities $C_t^{\text{init}}$; (4) Markov transitions $P^{(t)}$ capture inter-snapshot dynamics; (5) a PPO agent refines community assignments to produce $C_t^{\text{final}}$.
Figure 2. Heat-map of structural coherence across seven datasets (left: Modularity Q; right: temporal ARI). Lighter colors indicate stronger community quality or stability.
Figure 3. PPO training stability diagnostics on the ETH+UCY validation split. Policy entropy and KL divergence versus epochs. Sensitivity heat-map over clip ratio ϵ and mini-batch size.
Figure 4. Co-evolution of forecasting loss (MSE) and structural metrics (Modularity Q, temporal ARI) during training on ETH+UCY.
Figure 5. Convergence of the total loss $\mathcal{L}$ and its components in Equation (11) across training epochs. The plot shows the joint optimization of forecasting fidelity ($\mathcal{L}_{\text{pred}}$), Bayesian regularization ($\mathcal{L}_{\text{ELBO}}$), temporal smoothness ($\mathcal{L}_{\text{Markov}}$), and the PPO objective ($J_{\text{PPO}}$). The curves decrease consistently and then stabilize, indicating convergence in accordance with Algorithm 1.
Figure 6. 2-D t-SNE manifolds of product embeddings on the M5-Retail dataset. Each point denotes a product SKU; colors represent ground-truth communities. MaGNet-BN (left) forms clean, well-separated clusters, while ablated variants show blurred partitions and community overlap.
Figure 7. Workflow of the sensitivity–robustness study. After the main training phase (top), we branch into two evaluation tracks: (a) targeted hyperparameter sweeps on the Bayesian samples $M$, prototypes $P$, and PPO reward weights $(\alpha, \beta)$, recording the worst relative increase in MSE; (b) edge-noise rewiring at three corruption levels ($\varepsilon = 1\%, 3\%, 5\%$), logging the modularity drop $\Delta Q$. The two diagnostics are merged into the conclusions in Section 4.9.
Table 1. Statistics of the seven benchmark datasets used in this study. $\lvert V \rvert$ and $\lvert E \rvert$ denote the number of nodes and (average) edges per snapshot; $T$ is the number of temporal snapshots; $\Delta t$ is the sampling interval.

| Dataset | Domain | $\lvert V \rvert$ | $\lvert E \rvert$ (avg) | $T$ | $\Delta t$ |
|---|---|---|---|---|---|
| METR-LA | Traffic | 207 | 1515 | 34,272 | 1 h |
| PeMS-BAY | Traffic | 325 | 2694 | 52,560 | 1 h |
| TwitterRC | Social | 22,938 | 98,421 | 2160 | 1 h |
| Enron-Email | E-mail | 150,028 | 347,653 | 194 | 168 h |
| ETH+UCY | Mobility | 1536 † | 12,430 | 3588 | 12 s |
| ELD-2012 | Energy | 370 | 4862 | 8760 | 1 h |
| M5-Retail | Retail | 3049 | 11,216 | 1941 | 24 h |

† Peak number of distinct pedestrian IDs observed across the ETH+UCY scenes; actual active nodes per snapshot vary between 0 and 60.
Table 2. Taxonomy of baselines and the facets they cover. (A ✓ indicates that the model explicitly supports the corresponding facet).
Model | Temporal Forecast | Dynamic Graph | Uncertainty/RL
DeepAR | Gaussian output
MC-Drop LSTM | MC dropout
TFT | Attention ensembles
DCRNN | static
DySAT
TGAT | Time encoding
TGN | Memory, attention
Table 3. Final hyperparameter configurations of baseline models after validation sweeps, chosen to minimize validation MSE. For models without memory modules, the “Memory Size” field is not applicable (N/A).

| Model | Hidden Dim | Learning Rate | Layers | Dropout | Memory Size |
|---|---|---|---|---|---|
| DeepAR | 128 | $1 \times 10^{-3}$ | 2 | 0.1 | N/A |
| MC-Drop LSTM | 128 | $1 \times 10^{-3}$ | 2 | 0.3 | N/A |
| TFT | 160 | $5 \times 10^{-4}$ | 4 | 0.2 | N/A |
| DCRNN | 64 | $2 \times 10^{-3}$ | 2 | 0.1 | N/A |
| DySAT | 128 | $1 \times 10^{-3}$ | 2 | 0.1 | N/A |
| TGAT | 128 | $1 \times 10^{-3}$ | 2 | 0.1 | N/A |
| TGN | 128 | $1 \times 10^{-3}$ | 2 | 0.1 | 200 |
Table 4. Per-seed scores for the two largest datasets (MaGNet-BN).

| Dataset | Metric | Seed 11 | Seed 13 | Seed 17 | Seed 19 | Seed 23 | Mean ± 95% CI |
|---|---|---|---|---|---|---|---|
| METR-LA | MSE | 0.224 | 0.218 | 0.229 | 0.222 | 0.220 | 0.223 ± 0.005 |
| | NLL | 1.503 | 1.479 | 1.511 | 1.488 | 1.492 | 1.495 ± 0.015 |
| PeMS-BAY | MSE | 0.190 | 0.188 | 0.194 | 0.189 | 0.192 | 0.191 ± 0.003 |
| | NLL | 1.557 | 1.543 | 1.564 | 1.551 | 1.555 | 1.554 ± 0.010 |

Significance is assessed with a paired two-tailed t-test against MaGNet-BN, using italic for p < 0.05 and bold for p < 0.01.
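The "Mean ± 95% CI" column of Table 4 can be approximated from the per-seed scores. The sketch below assumes a Student-t interval with n − 1 = 4 degrees of freedom; the source does not state how the interval was computed, so this is an illustration rather than the authors' procedure. The function name `mean_ci95` and the hard-coded quantile are choices made for this example. Under this assumption the METR-LA MSE row comes out as 0.223 ± 0.005; other rows may differ in the last digit.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% Student-t critical value for 4 degrees of freedom (5 seeds).
T_CRIT_DF4 = 2.776

def mean_ci95(scores):
    """Return (mean, half-width) of a t-based 95% interval for exactly 5 scores."""
    if len(scores) != 5:
        raise ValueError("T_CRIT_DF4 is only valid for 5 samples")
    m = mean(scores)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    half = T_CRIT_DF4 * stdev(scores) / sqrt(len(scores))
    return m, half

# Per-seed MSE on METR-LA, copied from Table 4 (seeds 11, 13, 17, 19, 23).
metr_la_mse = [0.224, 0.218, 0.229, 0.222, 0.220]
m, half = mean_ci95(metr_la_mse)
print(f"{m:.3f} ± {half:.3f}")
```

With more or fewer seeds the critical value changes, so a general version would look it up (for example from a statistics library) instead of hard-coding it.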
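Table 5. Forecasting performance on seven datasets. Lower is better for MSE, NLL, CRPS; higher is better for PICP.

| Dataset | Metric | DeepAR | MC-Drop | TFT | DCRNN | DySAT | TGAT | TGN | MaGNet-BN |
|---|---|---|---|---|---|---|---|---|---|
| METR-LA | MSE | 0.34 | 0.32 | 0.30 | 0.29 | 0.31 | 0.30 | 0.25 | **0.22** |
| | NLL | 1.92 | 1.88 | 1.83 | 1.79 | 1.86 | 1.81 | 1.63 | **1.47** |
| | CRPS | 0.137 | 0.133 | 0.127 | 0.124 | 0.130 | 0.125 | 0.112 | **0.105** |
| | PICP (%) | 87.6 | 88.1 | 88.8 | 89.2 | 88.0 | 89.0 | 90.3 | **92.1** |
| PeMS-BAY | MSE | 0.29 | 0.27 | 0.26 | 0.25 | 0.27 | 0.26 | 0.22 | **0.19** |
| | NLL | 2.04 | 1.99 | 1.93 | 1.88 | 1.95 | 1.90 | 1.71 | **1.54** |
| | CRPS | 0.149 | 0.144 | 0.138 | 0.134 | 0.141 | 0.136 | 0.122 | **0.113** |
| | PICP (%) | 86.8 | 87.5 | 88.2 | 88.6 | 87.1 | 88.4 | 90.1 | **91.3** |
| TwitterRC | MSE | 0.48 | 0.45 | 0.43 | 0.44 | 0.46 | 0.42 | 0.38 | **0.34** |
| | NLL | 2.56 | 2.43 | 2.38 | 2.41 | 2.49 | 2.34 | 2.11 | **1.98** |
| | CRPS | 0.183 | 0.177 | 0.171 | 0.173 | 0.180 | 0.168 | 0.154 | **0.141** |
| | PICP (%) | 82.1 | 83.4 | 84.0 | 83.7 | 82.5 | 84.6 | 86.9 | **88.8** |
| Enron-Email | MSE | 0.22 | 0.21 | 0.20 | 0.20 | 0.21 | 0.19 | 0.17 | **0.15** |
| | NLL | 1.71 | 1.66 | 1.59 | 1.62 | 1.68 | 1.57 | 1.45 | **1.38** |
| | CRPS | 0.112 | 0.108 | 0.103 | 0.105 | 0.110 | 0.101 | 0.092 | **0.086** |
| | PICP (%) | 89.4 | 90.1 | 90.8 | 90.3 | 89.0 | 91.2 | 92.5 | **93.6** |
| ETH+UCY | MSE | 0.31 | 0.30 | 0.28 | 0.29 | 0.30 | 0.28 | 0.24 | **0.21** |
| | NLL | 1.98 | 1.93 | 1.88 | 1.92 | 1.96 | 1.85 | 1.74 | **1.57** |
| | CRPS | 0.152 | 0.147 | 0.141 | 0.145 | 0.149 | 0.139 | 0.128 | **0.116** |
| | PICP (%) | 84.7 | 85.3 | 86.0 | 85.6 | 84.9 | 86.6 | 88.7 | **90.4** |
| ELD-2012 | MSE | 0.11 | 0.11 | 0.10 | 0.10 | 0.11 | 0.10 | 0.09 | **0.07** |
| | NLL | 1.36 | 1.33 | 1.28 | 1.30 | 1.34 | 1.27 | 1.18 | **1.09** |
| | CRPS | 0.084 | 0.082 | 0.078 | 0.079 | 0.082 | 0.077 | 0.071 | **0.066** |
| | PICP (%) | 91.0 | 91.6 | 92.2 | 92.0 | 90.7 | 92.7 | 93.8 | **95.4** |
| M5-Retail | MSE | 0.55 | 0.52 | 0.49 | 0.51 | 0.53 | 0.48 | **0.44** | 0.45 |
| | NLL | 2.83 | 2.69 | 2.65 | 2.68 | 2.76 | 2.60 | 2.37 | **2.15** |
| | CRPS | 0.201 | 0.195 | 0.189 | 0.192 | 0.198 | 0.186 | **0.174** | 0.179 |
| | PICP (%) | 79.2 | 80.6 | 81.1 | 80.8 | 79.5 | 81.7 | 84.2 | **86.9** |

Note: best results per row are bold. All values are means over 5 random seeds.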
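For readers who want to see how two of the Table 5 metrics are typically computed, the following sketch evaluates PICP and the mean negative log-likelihood for Gaussian forecasts. It assumes each forecast is a Gaussian with mean `mu` and standard deviation `sigma`, and that the prediction interval is the central 95% interval `mu ± 1.96 * sigma`. Neither the nominal coverage level nor the predictive family is stated in this section, so both are assumptions, and the toy numbers are hypothetical rather than taken from the paper.

```python
from math import log, pi

Z_95 = 1.96  # standard-normal quantile for a central 95% interval

def picp(y_true, mu, sigma, z=Z_95):
    """Percentage of targets inside the interval [mu - z*sigma, mu + z*sigma]."""
    hits = sum(1 for y, m, s in zip(y_true, mu, sigma) if abs(y - m) <= z * s)
    return 100.0 * hits / len(y_true)

def gaussian_nll(y_true, mu, sigma):
    """Mean negative log-likelihood under independent Gaussian predictions."""
    total = 0.0
    for y, m, s in zip(y_true, mu, sigma):
        total += 0.5 * log(2 * pi * s * s) + (y - m) ** 2 / (2 * s * s)
    return total / len(y_true)

# Toy data (hypothetical): four targets with unit-variance forecasts.
y = [0.0, 1.0, 2.0, 5.0]
mu = [0.0, 1.5, 2.0, 2.0]
sigma = [1.0, 1.0, 1.0, 1.0]

print(picp(y, mu, sigma))          # the last target lies outside its interval
print(gaussian_nll(y, mu, sigma))
```

A PICP close to the nominal level indicates well-calibrated intervals, while NLL additionally penalizes intervals that are wider than necessary.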
Table 6. Structural consistency metrics on seven dynamic graph datasets. Higher is better for both Modularity (Q) and temporal Adjusted Rand Index (tARI). Bold = best; underline = within 0.5% of best.

Modularity Q

| Dataset | DeepAR | MC-Drop | TFT | DCRNN | DySAT | TGAT | TGN | MaGNet-BN |
|---|---|---|---|---|---|---|---|---|
| METR-LA | 0.43 | 0.46 | 0.48 | 0.52 | 0.47 | 0.49 | 0.56 | **0.60** |
| PeMS-BAY | 0.41 | 0.44 | 0.45 | 0.50 | 0.45 | 0.47 | 0.55 | **0.59** |
| TwitterRC | 0.32 | 0.35 | 0.37 | 0.39 | 0.36 | 0.38 | 0.46 | **0.51** |
| Enron | 0.28 | 0.31 | 0.33 | 0.35 | 0.30 | 0.32 | 0.42 | **0.48** |
| ETH+UCY | 0.36 | 0.39 | 0.40 | 0.44 | 0.40 | 0.41 | 0.50 | **0.58** |
| ELD-2012 | 0.40 | 0.43 | 0.44 | 0.48 | 0.43 | 0.45 | 0.54 | **0.57** |
| M5-Retail | 0.29 | 0.32 | 0.33 | 0.35 | 0.31 | 0.33 | 0.41 | **0.47** |

Temporal Adjusted Rand Index (tARI)

| Dataset | DeepAR | MC-Drop | TFT | DCRNN | DySAT | TGAT | TGN | MaGNet-BN |
|---|---|---|---|---|---|---|---|---|
| METR-LA | 0.52 | 0.55 | 0.57 | 0.61 | 0.56 | 0.57 | 0.66 | **0.71** |
| PeMS-BAY | 0.50 | 0.53 | 0.54 | 0.60 | 0.55 | 0.56 | 0.64 | **0.69** |
| TwitterRC | 0.38 | 0.41 | 0.43 | 0.46 | 0.42 | 0.43 | 0.51 | **0.57** |
| Enron | 0.34 | 0.37 | 0.38 | 0.40 | 0.36 | 0.37 | 0.48 | **0.53** |
| ETH+UCY | 0.46 | 0.49 | 0.50 | 0.54 | 0.48 | 0.50 | 0.60 | **0.68** |
| ELD-2012 | 0.49 | 0.52 | 0.53 | 0.58 | 0.52 | 0.53 | 0.62 | **0.67** |
| M5-Retail | 0.35 | 0.38 | 0.39 | 0.42 | 0.36 | 0.38 | 0.47 | **0.53** |
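A temporal ARI of the kind reported in Table 6 can be sketched as the mean adjusted Rand index between the community assignments of consecutive snapshots. The code below makes two assumptions that are not confirmed by this section: the score is averaged over consecutive snapshot pairs, and only nodes present in both snapshots are compared. The function names and the dictionary-based snapshot format are hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two labelings of the same items."""
    n = len(labels_a)
    if n < 2:
        return 1.0
    sum_ab = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:  # degenerate case, e.g. both labelings are one cluster
        return 1.0
    return (sum_ab - expected) / (max_index - expected)

def temporal_ari(snapshots):
    """Mean ARI over consecutive snapshots; each snapshot maps node -> community."""
    scores = []
    for prev, curr in zip(snapshots, snapshots[1:]):
        shared = sorted(set(prev) & set(curr))
        if len(shared) < 2:
            continue  # nothing to compare for this pair of snapshots
        scores.append(adjusted_rand_index([prev[v] for v in shared],
                                          [curr[v] for v in shared]))
    return sum(scores) / len(scores) if scores else float("nan")

# Toy example (hypothetical): communities are stable, then reshuffled.
snaps = [
    {"a": 0, "b": 0, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 0, "d": 0},  # same grouping, different label names
    {"a": 0, "b": 1, "c": 0, "d": 1},  # grouping changes
]
print(temporal_ari(snaps))
```

Because ARI is invariant to label permutations, relabeling communities between snapshots does not lower the score; only genuine regrouping does.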
Table 7. Node-level consistency across seven dynamic graph datasets. Higher is better for NMI, lower for VI. Scores are mean over five random seeds.

Normalized Mutual Information (NMI ↑)

| Dataset | DeepAR | MC-Drop | TFT | DCRNN | DySAT | TGAT | TGN | MaGNet-BN |
|---|---|---|---|---|---|---|---|---|
| METR-LA | 0.62 | 0.64 | 0.66 | 0.69 | 0.71 | 0.72 | 0.84 | 0.87 |
| PeMS-BAY | 0.61 | 0.63 | 0.65 | 0.68 | 0.70 | 0.71 | 0.82 | 0.85 |
| TwitterRC | 0.52 | 0.55 | 0.57 | 0.60 | 0.64 | 0.65 | 0.75 | 0.78 |
| Enron | 0.56 | 0.58 | 0.60 | 0.63 | 0.67 | 0.68 | 0.79 | 0.82 |
| ETH+UCY | 0.54 | 0.56 | 0.58 | 0.61 | 0.65 | 0.66 | 0.77 | 0.80 |
| ELD-2012 | 0.59 | 0.61 | 0.63 | 0.66 | 0.69 | 0.70 | 0.80 | 0.84 |
| M5-Retail | 0.50 | 0.53 | 0.55 | 0.57 | 0.60 | 0.61 | 0.72 | 0.76 |

Variation of Information (VI ↓)

| Dataset | DeepAR | MC-Drop | TFT | DCRNN | DySAT | TGAT | TGN | MaGNet-BN |
|---|---|---|---|---|---|---|---|---|
| METR-LA | 0.68 | 0.63 | 0.60 | 0.56 | 0.53 | 0.52 | 0.46 | 0.42 |
| PeMS-BAY | 0.70 | 0.65 | 0.62 | 0.58 | 0.55 | 0.54 | 0.48 | 0.45 |
| TwitterRC | 0.81 | 0.76 | 0.73 | 0.69 | 0.63 | 0.62 | 0.52 | 0.58 |
| Enron | 0.74 | 0.69 | 0.66 | 0.61 | 0.55 | 0.54 | 0.48 | 0.49 |
| ETH+UCY | 0.76 | 0.71 | 0.68 | 0.64 | 0.58 | 0.57 | 0.50 | 0.52 |
| ELD-2012 | 0.71 | 0.66 | 0.63 | 0.59 | 0.53 | 0.52 | 0.47 | 0.46 |
| M5-Retail | 0.85 | 0.80 | 0.77 | 0.73 | 0.66 | 0.65 | 0.55 | 0.61 |
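NMI and VI in Table 7 are both information-theoretic comparisons of two node labelings. The sketch below computes them from label counts, using natural logarithms and the arithmetic mean of the two entropies as the NMI normalizer. Both choices are assumptions, since this section does not specify the logarithm base, the normalization, or which pair of labelings is compared. The function name `nmi_and_vi` is hypothetical.

```python
from collections import Counter
from math import log

def _entropy(counts, n):
    """Shannon entropy (in nats) of a distribution given by raw counts."""
    return -sum(c / n * log(c / n) for c in counts)

def nmi_and_vi(labels_a, labels_b):
    """Return (NMI, VI) for two labelings of the same nodes."""
    n = len(labels_a)
    h_a = _entropy(Counter(labels_a).values(), n)
    h_b = _entropy(Counter(labels_b).values(), n)
    h_ab = _entropy(Counter(zip(labels_a, labels_b)).values(), n)  # joint entropy
    mi = h_a + h_b - h_ab            # mutual information I(A; B)
    vi = h_ab - mi                   # equals H(A|B) + H(B|A)
    denom = 0.5 * (h_a + h_b)        # arithmetic-mean normalizer (assumed)
    nmi = mi / denom if denom > 0 else 1.0
    return nmi, vi

# Identical groupings with swapped label names: NMI = 1, VI = 0.
print(nmi_and_vi([0, 0, 1, 1], [1, 1, 0, 0]))
# Statistically independent groupings: NMI near 0, VI near log(4).
print(nmi_and_vi([0, 0, 1, 1], [0, 1, 0, 1]))
```

The two scores move in opposite directions: agreement between labelings raises NMI toward 1 and lowers VI toward 0, which matches the arrows in the table headings.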
Table 8. Count of metrics (MSE, NLL, CRPS, PICP) on which MaGNet-BN is best for each dataset, corresponding to the bold cells in Table 5. Here, ↓ indicates that lower values are better, while ↑ indicates that higher values are better. A ✓ denotes that MaGNet-BN achieves the best score for that metric.

| Dataset | MSE ↓ | NLL ↓ | CRPS ↓ | PICP ↑ | Wins/4 |
|---|---|---|---|---|---|
| METR-LA | ✓ | ✓ | ✓ | ✓ | 4 |
| PeMS-BAY | ✓ | ✓ | ✓ | ✓ | 4 |
| TwitterRC | ✓ | ✓ | ✓ | ✓ | 4 |
| Enron-Email | ✓ | ✓ | ✓ | ✓ | 4 |
| ETH+UCY | ✓ | ✓ | ✓ | ✓ | 4 |
| ELD-2012 | ✓ | ✓ | ✓ | ✓ | 4 |
| M5-Retail | | ✓ | | ✓ | 2 |
| Total wins | 6 | 7 | 6 | 7 | 26/28 |
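The win counts in Table 8 can be re-derived from Table 5 with a few lines of code. The sketch below compares MaGNet-BN only against TGN, which is the strongest baseline in every row of Table 5, so beating TGN is equivalent to being best in that row. The data layout and the names `TABLE5`, `METRICS`, and `count_wins` are choices made for this example.

```python
# (TGN, MaGNet-BN) value pairs copied from Table 5, ordered MSE, NLL, CRPS, PICP.
# TGN is the strongest baseline in every row, so it stands in for "best baseline".
METRICS = [("MSE", True), ("NLL", True), ("CRPS", True), ("PICP", False)]  # True = lower is better

TABLE5 = {
    "METR-LA":     [(0.25, 0.22), (1.63, 1.47), (0.112, 0.105), (90.3, 92.1)],
    "PeMS-BAY":    [(0.22, 0.19), (1.71, 1.54), (0.122, 0.113), (90.1, 91.3)],
    "TwitterRC":   [(0.38, 0.34), (2.11, 1.98), (0.154, 0.141), (86.9, 88.8)],
    "Enron-Email": [(0.17, 0.15), (1.45, 1.38), (0.092, 0.086), (92.5, 93.6)],
    "ETH+UCY":     [(0.24, 0.21), (1.74, 1.57), (0.128, 0.116), (88.7, 90.4)],
    "ELD-2012":    [(0.09, 0.07), (1.18, 1.09), (0.071, 0.066), (93.8, 95.4)],
    "M5-Retail":   [(0.44, 0.45), (2.37, 2.15), (0.174, 0.179), (84.2, 86.9)],
}

def count_wins(table):
    """Count, per dataset and per metric, the rows where MaGNet-BN beats TGN."""
    per_dataset = {}
    per_metric = {name: 0 for name, _ in METRICS}
    for dataset, rows in table.items():
        wins = 0
        for (metric, lower_is_better), (baseline, ours) in zip(METRICS, rows):
            won = ours < baseline if lower_is_better else ours > baseline
            wins += int(won)
            per_metric[metric] += int(won)
        per_dataset[dataset] = wins
    return per_dataset, per_metric

per_dataset, per_metric = count_wins(TABLE5)
print(per_dataset)
print(per_metric, sum(per_metric.values()))
```

The two losses both fall on M5-Retail (MSE and CRPS), which is why that row shows 2 wins and the totals for those two metrics are 6 rather than 7.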
Table 9. Ablation results on ETH+UCY. Lower is better for MSE and NLL; higher is better for Q and tARI.

| Model Variant | MSE | NLL | Modularity (Q) | tARI |
|---|---|---|---|---|
| w/o Bayesian Embedding | 0.209 | 1.622 | 0.504 | 0.613 |
| w/o Markov + PPO | 0.198 | 1.576 | 0.481 | 0.583 |
| MaGNet-BN (Full) | 0.182 | 1.392 | 0.581 | 0.693 |
Table 10. Hyperparameter sensitivity: worst relative MSE change (%). ↓ denotes that lower values are better.

| Dataset | $M$ | $P$ | $(\alpha, \beta)$ | Worst ↓ |
|---|---|---|---|---|
| METR-LA | +1.8 | +1.2 | +2.1 | 2.1 |
| PeMS-BAY | +1.5 | +1.4 | +2.3 | 2.3 |
| TwitterRC | +0.9 | +1.6 | +2.4 | 2.4 |
| Enron | +1.3 | +1.1 | +2.0 | 2.0 |
| ETH+UCY | +1.1 | +1.0 | +1.8 | 1.8 |
| ELD-2012 | +1.4 | +1.3 | +2.2 | 2.2 |
| M5-Retail | +1.9 | +1.5 | +2.7 | 2.7 |
Table 11. Robustness: average $\Delta Q$ after random edge rewiring.

| Model | 1% | 3% | 5% |
|---|---|---|---|
| DeepAR | 0.056 | 0.084 | 0.112 |
| MC-Drop LSTM | 0.049 | 0.077 | 0.098 |
| TFT | 0.043 | 0.069 | 0.092 |
| DCRNN | 0.038 | 0.061 | 0.083 |
| DySAT | 0.045 | 0.072 | 0.090 |
| TGAT | 0.040 | 0.067 | 0.086 |
| TGN | 0.031 | 0.054 | 0.067 |
| MaGNet-BN | 0.018 | 0.026 | 0.031 |
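The robustness measurement in Table 11 can be illustrated with a small, self-contained sketch: compute Newman modularity for a fixed partition, rewire a fraction of edges at random, and report the drop $\Delta Q$. This is a simplified stand-in rather than the authors' protocol. It keeps the community assignment fixed after corruption (the source does not spell out whether communities are re-estimated), it uses a hypothetical toy graph, and it uses a 25% rewiring rate because the toy graph is too small for the 1/3/5% levels in the table.

```python
import random

def modularity(edges, community):
    """Newman modularity Q of an undirected, unweighted graph for a fixed partition."""
    m = len(edges)
    intra, degree = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())

def rewire(edges, nodes, fraction, rng):
    """Replace round(fraction * |E|) randomly chosen edges with random absent edges."""
    edge_set = {(min(u, v), max(u, v)) for u, v in edges}
    k = round(fraction * len(edge_set))
    for old in rng.sample(sorted(edge_set), k):
        edge_set.remove(old)
        while True:
            u, v = rng.sample(nodes, 2)
            new = (min(u, v), max(u, v))
            if new != old and new not in edge_set:
                edge_set.add(new)
                break
    return sorted(edge_set)

# Toy graph (hypothetical): two 4-cliques joined by a single bridge edge.
nodes = list(range(8))
edges = [(u, v) for block in (range(0, 4), range(4, 8))
         for u in block for v in block if u < v] + [(3, 4)]
community = {v: 0 if v < 4 else 1 for v in nodes}

rng = random.Random(0)  # fixed seed so the run is repeatable
q_clean = modularity(edges, community)
noisy = rewire(edges, nodes, fraction=0.25, rng=rng)
delta_q = q_clean - modularity(noisy, community)
print(f"Q = {q_clean:.3f}, delta Q = {delta_q:.3f}")
```

In a fuller experiment the rewiring would be repeated over several seeds and the drops averaged, which is what the word "average" in the table caption suggests.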
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Qu, D.; Ma, Y. MaGNet-BN: Markov-Guided Bayesian Neural Networks for Calibrated Long-Horizon Sequence Forecasting and Community Tracking. Mathematics 2025, 13, 2740. https://doi.org/10.3390/math13172740

Chicago/Turabian Style

Qu, Daozheng, and Yanfei Ma. 2025. "MaGNet-BN: Markov-Guided Bayesian Neural Networks for Calibrated Long-Horizon Sequence Forecasting and Community Tracking" Mathematics 13, no. 17: 2740. https://doi.org/10.3390/math13172740
