Article

SpeQNet: Query-Enhanced Spectral Graph Filtering for Spatiotemporal Forecasting

Department of Computer and Information Systems, University of Aizu, Aizuwakamatsu 965-8580, Japan
*
Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(3), 1176; https://doi.org/10.3390/app16031176
Submission received: 8 January 2026 / Revised: 21 January 2026 / Accepted: 22 January 2026 / Published: 23 January 2026
(This article belongs to the Special Issue Research and Applications of Artificial Neural Network)

Abstract

Accurate spatiotemporal forecasting underpins high-stakes decision making in smart urban systems, from traffic control and energy scheduling to environmental monitoring. Yet two persistent gaps limit current models: (i) spatial modules are often biased toward low-pass smoothing and struggle to reconcile slow global trends with sharp local dynamics; and (ii) the graph structure required for forecasting is frequently latent, while learned graphs can be unstable when built from temporally derived node features alone. We propose SpeQNet, a query-enhanced spectral graph filtering framework that jointly strengthens node representations and graph construction while enabling frequency-selective spatial reasoning. SpeQNet injects global spatial context into temporal embeddings via lightweight learnable spatiotemporal queries, learns a task-oriented adaptive adjacency matrix, and refines node features with an enhanced ChebNetII-based spectral filtering block equipped with channel-wise recalibration and nonlinear refinement. Across twelve real-world benchmarks spanning traffic, electricity, solar power, and weather, SpeQNet achieves state-of-the-art performance and delivers consistent gains on large-scale graphs. Beyond accuracy, SpeQNet is interpretable and robust: the learned spectral operators exhibit a consistent band-stop-like frequency shaping behavior, and performance remains stable across a wide range of Chebyshev polynomial orders. These results suggest that query-enhanced spatiotemporal representation learning and adaptive spectral filtering form a complementary and effective foundation for spatiotemporal forecasting.

1. Introduction

In an era of smart urban systems [1], accurate spatiotemporal forecasting is essential for avoiding crises and optimizing resources, underpinning applications such as traffic management [2], energy scheduling [3], weather prediction [4], and environmental monitoring [5]. These systems generate multivariate time series (MTS) from spatially distributed sensors, which interact over an underlying spatial structure. This structure is often conveniently modeled as a graph, where each node corresponds to a time series and edges encode physical or functional relationships [6,7]. Graph neural networks (GNNs) are therefore a natural choice for modeling spatial dependencies and have been widely adopted in spatiotemporal forecasting architectures [8,9]. However, despite their theoretical appeal, graph-based models often fall short of strong Transformer-based architectures [10,11]. This shortfall arises from two central limitations. First, spatiotemporal data contain both slowly varying global trends and rapidly changing local patterns [12], dynamics that are well suited for spectral analysis. Yet classic ChebNet [13], despite its strong theoretical expressiveness [14], frequently underperforms spatial GNNs like GCN on real-world datasets [15]. Second, real-world datasets rarely provide high-quality graph structures that faithfully reflect the spatial dependencies needed for forecasting, and existing graph learning methods often struggle to infer these dependencies reliably. Together, these limitations highlight the need for a spatiotemporal forecasting framework that models spectral dependencies more effectively while learning graph structures in a robust and data-driven manner.
A large body of work on graph-based spatiotemporal forecasting is built on GCN [15], where each node aggregates information from its neighbors via a learned local kernel. While simple and effective, such neighbor aggregation behaves primarily as a low-pass filter on the graph [16,17,18]. It promotes smoothness across connected nodes and tends to suppress high-frequency components that encode sharp local variations or heterogeneous interactions. As a result, GCN often struggles to simultaneously capture slowly varying global trends (e.g., flow propagation across a corridor of roads) and rapidly changing local patterns (e.g., sudden congestion at an intersection) [18,19]. In contrast, spectral GNNs offer a more flexible alternative: by parameterizing filters in the eigenbasis of the graph Laplacian, they operate directly in the frequency domain [20], enabling the learning of arbitrary spectral responses. This allows the model to selectively amplify or attenuate specific frequency bands, representing a much richer set of spatial dependencies that span both smooth global structures and localized, fast-changing behaviors. Although classic ChebNet often underperforms on real-world datasets [21] despite its theoretical expressiveness, recent advances such as ChebNetII [22] refine the Chebyshev approximation and demonstrate strong performance on node classification tasks, making spectral filtering a more promising backbone for spatiotemporal forecasting.
However, the expressive power of any GNN is fundamentally limited by the quality of the graph it operates on and the informativeness of the node representations propagated over that graph. Predefined physical graphs [8,23], such as distance- or connectivity-based structures, offer useful geometric priors but fail to capture functional dependencies such as hubs, bottlenecks, and high-demand regions. To overcome these limitations, modern spatiotemporal forecasting models primarily adopt two families of graph learning strategies. Adaptive graph learning [24,25,26] generates a static, input-independent adjacency matrix by parameterizing node affinities through learnable embeddings, which is effective for capturing stable spatial patterns. Attention-based constructions [27,28], in contrast, derive spatial dependencies from the input itself: the query–key interactions implicitly define sample-specific adjacency patterns and can, in principle, capture non-recurring or abrupt spatial changes. Yet recent works employing these strategies often overlook the role of node representations, which are typically derived solely from temporal encoders and thus lack explicit spatial context. In this way, they place the full burden of discovering meaningful spatial structure either on the graph parameters or on attention weights. This disconnect between node representation learning and graph construction leaves room for performance improvement, and calls for a framework where spatiotemporally enriched node features and adaptive graphs are learned in a mutually reinforcing way.
To address the above limitations, we propose SpeQNet, a spatiotemporal forecasting model that jointly learns a frequency-aware spatiotemporal representation and a task-relevant graph structure. Building on ChebNetII, our spectral graph filtering block integrates node-wise feature recalibration and nonlinear transformations to refine graph signal processing. To enhance the spatiotemporal learning, we introduce a learnable spatiotemporal query module that injects global spatial context into temporal node representations. These representations improve the forecasting accuracy while promoting adaptive graph learning with a clearer topology. Extensive experiments on 12 real-world forecasting benchmarks spanning traffic, energy, and climate demonstrate that SpeQNet achieves state-of-the-art performance.
Our contributions are summarized as follows:
  • Frequency-aware spatial modeling—We design an enhanced ChebNetII-based spectral graph filtering block that captures both global and local spatial dependencies, boosting forecasting accuracy.
  • Query-enhanced node representations—We introduce a lightweight spatiotemporal query that injects global spatial context into temporal node representations. The enriched spatiotemporal representations contribute to consistent forecasting improvements.
The remainder of this paper is organized as follows. Section 2 reviews related work on multivariate time-series forecasting. Section 3 introduces the problem formulation and the necessary graph spectral preliminaries. Section 4 presents the proposed SpeQNet framework in detail, describing the query-enhanced spatiotemporal representations, adaptive graph learning, and spectral graph filtering modules. Section 5 reports experimental settings and results on twelve real-world benchmarks, including ablation studies, spectral filter response analysis, sensitivity to the Chebyshev polynomial order, scaling analysis, computational efficiency analysis, graph interpretability analysis, and qualitative forecasting visualization. Finally, Section 6 concludes the paper and discusses limitations and directions for future work.

2. Related Works

Recent progress in multivariate time-series forecasting has been shaped by two contrasting modeling philosophies: treating each variable independently, or explicitly learning relationships across variables. Channel-independent models simplify multivariate forecasting by decomposing the problem into a collection of univariate tasks, an idea popularized by PatchTST [29] and the family of linear models such as DLinear and RLinear [30,31], and recent LLM-based methods like Chronos [32] and TimeLLM [33] that adapt large language models for univariate processing. These methods achieve strong long-term forecasting accuracy by focusing exclusively on temporal dynamics while avoiding the risk of spurious cross-variable correlations. Similar motivations underlie temporal CNN architectures such as TimesNet [34] and CycleNet [35], which extract periodic or multi-scale temporal patterns with minimal reliance on variable interactions. Although effective on long-sequence datasets where intra-variable temporal structures dominate, these approaches inherently discard spatial or functional relationships—an omission that limits their applicability to domains where interactions between variables are essential, such as traffic networks, environmental systems, and energy grids.
Motivated by this limitation, a second line of work develops models that explicitly capture cross-variable dependencies. Transformer-based methods such as iTransformer [10] and Crossformer [36] treat each variable as a token and use attention mechanisms to discover multivariate correlations, while architectures like TimeXer [11] further combine patch-level and variate-level representations to reconcile temporal patterns with cross-series interactions. Recent extensions, such as VCformer [37] incorporating lagged correlations, MOIRAI [38] enabling universal pre-training, and UniTS [39] facilitating multi-task unification, build on these foundations to address diverse forecasting scenarios. These approaches provide flexible relational modeling but operate purely in the input space: attention weights are derived directly from observed time series, making them sensitive to noise, abrupt fluctuations, and nonstationarity [40].
Recent GNN-based architectures advance this paradigm by incorporating structural inductive biases, such as predefined or learned graphs over variables, to better encode domain-specific topologies. Classical spatiotemporal GNNs pioneered the use of spatial graphs to capture interactions in applications like traffic and energy forecasting. Many rely on predefined graphs constructed from spatial distances, connectivity, or correlation matrices, as in STGCN [8], DCRNN [23], AST-GCN [41], and STSGCN [42]. To address scenarios where the true graph is latent, adaptive graph learning has emerged, inferring structures from learnable node embeddings or similarity functions, as demonstrated in MTGNN [26], Graph WaveNet [43], and AGCRN [25]. Building on these foundations, modern models like MSGNet [44] and CrossGNN [45] incorporate multi-scale or relation-aware mechanisms, leveraging adaptive graphs to handle complex, evolving dependencies. While effective, these approaches present opportunities to further integrate frequency-domain insights and enhance representation learning for improved robustness in noisy, nonstationary environments.
These observations highlight a gap in the current landscape: while channel-independent methods excel at temporal modeling and Transformer/GNN-based methods capture variable interactions, few provide a unified framework that combines stable, task-oriented graph structures with frequency-aware spatiotemporal representations.

3. Preliminaries

3.1. Problem Definition

We formulate multivariate time-series forecasting over a set of variables that form a graph. Let $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathbf{A})$ denote a graph with $N = |\mathcal{V}|$ nodes, where each node corresponds to a variate in the multivariate sequence. The edge set $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ captures pairwise spatial dependencies, and $\mathbf{A} \in \mathbb{R}^{N \times N}$ is the weighted adjacency matrix. In many real-world forecasting tasks, the graph structure is latent or partially known. In our SpeQNet, the adjacency matrix is adaptively learned and refined through the query-enhanced spatiotemporal representations (see Section 4). Given a historical sequence:
$$\mathbf{X}_{1:T} \in \mathbb{R}^{N \times T},$$
the goal is to predict the future values:
$$\hat{\mathbf{Y}}_{T+1:T+S} \in \mathbb{R}^{N \times S}.$$
The forecasting model learns a function $f_\theta$ that satisfies
$$\hat{\mathbf{Y}}_{T+1:T+S} = f_\theta(\mathbf{X}_{1:T}; \mathbf{A}_\theta),$$
where $\mathbf{A}_\theta$ denotes the adaptive adjacency matrix learned within the model. In SpeQNet, the adaptive adjacency matrix $\mathbf{A}_\theta$ is learned via a softmax normalization and therefore has non-negative entries with row-wise sums equal to one. The core challenge lies in modeling the multi-scale spatiotemporal dependencies among the $N$ variables over $T$ time steps, capturing both temporal patterns and spatial dependencies. SpeQNet addresses these challenges by jointly modeling the spatiotemporal representations via a frequency-aware spectral graph filtering block.

3.2. Graph Spectral Theory and Filtering

Given an adaptive adjacency matrix $\mathbf{A}_\theta$ and a degree matrix $\mathbf{D}$, we adopt the random-walk normalized Laplacian [46]:
$$\mathbf{L} = \mathbf{I} - \mathbf{D}^{-1}\mathbf{A}_\theta,$$
which models stochastic diffusion dynamics.
Spectral graph theory provides a principled framework for analyzing graph-structured signals through the eigendecomposition of the Laplacian:
$$\mathbf{L} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^\top,$$
where $\mathbf{U}$ contains orthonormal eigenvectors and $\boldsymbol{\Lambda} = \operatorname{diag}(\lambda_1, \ldots, \lambda_N)$ contains eigenvalues that lie in $[0, 2]$ [22]. The eigenvectors form the graph Fourier basis, and the eigenvalues represent frequencies: low frequencies correspond to smooth global modes, whereas high frequencies capture abrupt variations. The formulation of Equation (3) is introduced to convey the classical spectral interpretation of graph signals. In SpeQNet, the adjacency matrix $\mathbf{A}_\theta$ is adaptively learned and is not necessarily symmetric; accordingly, the model does not explicitly rely on an orthonormal eigendecomposition in practice. Instead, spectral filtering is implemented via polynomial approximations, as detailed in the next subsection, which remain well defined for general weighted graphs. Given a graph signal $\mathbf{x} \in \mathbb{R}^N$, its graph Fourier transform and inverse transform are
$$\hat{\mathbf{x}} = \mathbf{U}^\top \mathbf{x}, \qquad \mathbf{x} = \mathbf{U}\hat{\mathbf{x}}.$$
A spectral filter $g(\boldsymbol{\Lambda})$ can modify graph signals by amplifying or suppressing specific frequencies:
$$\mathbf{y} = \mathbf{U}\, g(\boldsymbol{\Lambda})\, \mathbf{U}^\top \mathbf{x}.$$
Learning such filters allows selective modeling of both smooth spatial correlations and localized variations. However, computing Equation (5) directly is computationally expensive for large graphs and is therefore typically approximated in practice.
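To make Equations (4) and (5) concrete, the following NumPy sketch applies a spectral filter to a toy symmetric graph. The adjacency matrix, the signal, and the exponential low-pass filter shape are all illustrative stand-ins, not part of SpeQNet (which learns its graph and filter instead).

```python
import numpy as np

# A small hand-made symmetric graph (hypothetical example).
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
d = A.sum(axis=1)
L = np.eye(4) - np.diag(1.0 / d) @ A          # random-walk Laplacian (Eq. 2)

# For a clean orthonormal eigenbasis we use the symmetric normalization;
# it is similar to the random-walk L above and shares its eigenvalues.
L_sym = np.eye(4) - np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)
lam, U = np.linalg.eigh(L_sym)                 # eigenvalues lie in [0, 2]

x = np.array([1.0, 2.0, 0.5, 3.0])            # a graph signal
x_hat = U.T @ x                                # graph Fourier transform (Eq. 4)

g = np.exp(-2.0 * lam)                         # an example low-pass filter g(lambda)
y = U @ (g * x_hat)                            # filtered signal (Eq. 5)
```

The full eigendecomposition used here is exactly the step that becomes expensive on large graphs, motivating the polynomial approximation of the next subsection.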

3.3. Chebyshev-II Approximation

Classical ChebNet [13] approximates spectral filters using Chebyshev polynomials:
$$\mathbf{y} \approx \sum_{k=0}^{K} w_k\, T_k(\hat{\mathbf{L}})\, \mathbf{x},$$
that satisfy the recurrence:
$$T_0(x) = 1, \qquad T_1(x) = x, \qquad T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x).$$
In Equation (6), $K \in \mathbb{N}$ denotes the Chebyshev polynomial order, which determines the degree of the polynomial approximation and the effective $K$-hop spatial receptive field of the spectral filter. In principle, $K$ can take any positive integer value, trading off spectral expressivity against computational cost. In this study, we fix $K = 5$ for all main experiments, representing a moderate polynomial order that balances spectral expressivity and computational efficiency. The $w_k$ are the trainable Chebyshev coefficients, and the scaled Laplacian $\hat{\mathbf{L}}$, whose eigenvalues lie in $[-1, 1]$, is defined as
$$\hat{\mathbf{L}} = \frac{2}{\lambda_{\max}}\,\mathbf{L} - \mathbf{I}.$$
In practice, $\lambda_{\max}$ is usually set to 2 [15,22], which avoids the need for eigendecomposition.
Although efficient and expressive, ChebNet can learn illegal Chebyshev coefficients when approximating analytic filter functions, causing degraded generalization due to over-fitting [22]. ChebNetII [22] addresses this limitation by replacing direct coefficient learning with Chebyshev interpolation. Instead of directly learning the Chebyshev coefficients $w_k$, ChebNetII learns the filter values at the Chebyshev nodes. These values are then projected into the Chebyshev basis, ensuring stable and theoretically valid spectral coefficients. Specifically, ChebNetII reparameterizes the coefficients as
$$w_k = \frac{2}{K+1} \sum_{j=0}^{K} \gamma_j\, T_k(x_j),$$
where $x_j = \cos\!\left(\frac{j + 1/2}{K+1}\,\pi\right)$ are the Chebyshev nodes, and $\gamma_j$ are learnable parameters representing the filter evaluated at those nodes. Substituting these coefficients into the spectral filtering operation yields the ChebNetII propagation rule:
$$\mathbf{y} \approx \frac{2}{K+1} \sum_{k=0}^{K} \sum_{j=0}^{K} \gamma_j\, T_k(x_j)\, T_k(\hat{\mathbf{L}})\, \mathbf{x}.$$
The spectral filters are able to approximate a wide range of spectral shapes—including low-pass, band-pass, and band-rejection patterns—demonstrating strong expressive power and stable optimization. Therefore, ChebNetII is well suited for spatiotemporal forecasting scenarios where different frequency bands capture distinct spatial interaction patterns.
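The reparameterization in Equations (9) and (10) can be verified numerically. The minimal sketch below uses a hypothetical target filter $h$ and applies the standard halving of the $k = 0$ Chebyshev coefficient (as in common ChebNetII implementations); it checks that projecting the node values $\gamma_j = h(x_j)$ into the Chebyshev basis yields a polynomial that interpolates $h$ exactly at the Chebyshev nodes.

```python
import numpy as np

K = 5
j = np.arange(K + 1)
nodes = np.cos((j + 0.5) * np.pi / (K + 1))        # Chebyshev nodes x_j

def T(k, x):
    """Chebyshev polynomial T_k(x) via the recurrence in Eq. (7)."""
    t_prev, t_cur = np.ones_like(x), x
    if k == 0:
        return t_prev
    for _ in range(k - 1):
        t_prev, t_cur = t_cur, 2 * x * t_cur - t_prev
    return t_cur

h = lambda lam: np.abs(lam)                         # hypothetical target filter shape
gamma = h(nodes)                                    # "learned" values at the nodes

# Eq. (9): project node values into Chebyshev coefficients
w = np.array([2.0 / (K + 1) * np.sum(gamma * T(k, nodes)) for k in range(K + 1)])
w[0] *= 0.5                                         # halve the k = 0 term

poly = lambda x: sum(w[k] * T(k, x) for k in range(K + 1))
print(np.max(np.abs(poly(nodes) - h(nodes))))       # close to 0: exact at the nodes
```

Because the coefficients are derived from function values at the nodes rather than learned freely, they always correspond to a valid Chebyshev interpolant, which is the stability property that motivates ChebNetII.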

4. Methodology

4.1. Overview of SpeQNet

Given a multivariate time-series window X R N × T , where N denotes the number of variates and T is the lookback length, SpeQNet predicts the future horizon of length S for all nodes. Figure 1 shows the overview of SpeQNet, an encoder-only design with six main components: (1) Temporal Encoding: Raw time series are first embedded by a temporal module to capture temporal dependencies, producing node-wise temporal representations. (2) Query-Enhanced Spatial Embedding: A set of lightweight learnable spatiotemporal queries interacts with the input to inject global spatial context, and a linear projection layer then aligns the spatially enhanced signal to the embedding space. The temporal and spatial embeddings are then fused via addition, yielding the spatiotemporal representations. (3) Adaptive Graph Learning: Using two learnable low-rank embeddings, SpeQNet constructs a task-oriented adaptive adjacency matrix that captures latent and dataset-specific spatial dependencies. (4) Node Feature Extraction: Based on the query-enhanced spatiotemporal representations, a two-layer MLP extracts the node representation. (5) Spectral Graph Filtering: Node features are processed by an enhanced Chebyshev-based spectral filtering block, where ChebNetII enables flexible frequency-selective filtering, complemented by channel-wise recalibration and nonlinear refinement to improve expressiveness and stability. (6) Forecasting Head: The refined representations are finally mapped to future predictions via a lightweight prediction head. By integrating query-enhanced spatiotemporal representations with adaptive spectral graph filtering, SpeQNet enables expressive, interpretable, and robust spatiotemporal forecasting under latent graph structures.
Formally, starting from $\mathbf{X}$, SpeQNet computes temporal embeddings $\mathbf{E}_{\text{temp}} \in \mathbb{R}^{N \times E}$ and query-enhanced spatial embeddings $\mathbf{E}_{\text{spatial}} \in \mathbb{R}^{N \times E}$, where $E$ denotes the embedding dimension. These representations are then fused to form a spatiotemporal representation $\mathbf{E}_{\text{st}} \in \mathbb{R}^{N \times E}$. The adaptive graph learning constructs an adjacency matrix $\mathbf{A}_\theta \in \mathbb{R}^{N \times N}$ from learnable low-rank embeddings, and a two-layer MLP initializes node features $\mathbf{H}_0 \in \mathbb{R}^{N \times E}$ from $\mathbf{E}_{\text{st}}$. A stack of spectral graph filtering blocks then refines $\{\mathbf{H}_\ell\}_{\ell=1}^{L}$ using ChebNetII-based filters over the learned graph. Finally, the forecasting head applies a per-node linear projection to obtain $\hat{\mathbf{Y}} \in \mathbb{R}^{N \times S}$.

4.2. Query-Enhanced Spatiotemporal Representation

Given $\mathbf{X}$, SpeQNet first encodes the temporal dynamics of each variate and then injects global spatial context via learnable queries. Figure 2 illustrates the query-enhanced spatiotemporal representation module, where the Temporal Embedding Block (1) performs temporal embedding to encode the historical dynamics of each variate, while the Query-Enhanced Spatial Embedding Block (2) applies spatial multi-head attention to inject global spatial context via learnable spatiotemporal queries. The two embeddings are fused to produce the spatiotemporal representation used for subsequent modeling.

4.2.1. Temporal Embedding

The temporal embedding layer extracts periodicities and trends from each variate's length-$T$ history into a compact representation of dimension $E$ by applying a shared linear projection along the temporal dimension:
$$\mathbf{E}_{\text{temp}} = \mathbf{X}\mathbf{W}_{\text{temp}} + \mathbf{b}_{\text{temp}}, \qquad \mathbf{E}_{\text{temp}} \in \mathbb{R}^{N \times E},$$
where $\mathbf{W}_{\text{temp}} \in \mathbb{R}^{T \times E}$ and $\mathbf{b}_{\text{temp}} \in \mathbb{R}^{E}$ are learnable parameters shared across all variates.

4.2.2. Query-Enhanced Spatial Embedding

While $\mathbf{E}_{\text{temp}}$ summarizes the temporal evolution of each variate independently, it is agnostic to global spatial patterns shared across the dataset. To inject such information, SpeQNet initializes a learnable spatiotemporal query matrix $\mathbf{Z} \in \mathbb{R}^{N \times T}$, where the $i$-th row $\mathbf{z}_i$ acts as a global query pattern associated with the $i$-th variate. We employ multi-head attention (MHA) to let these queries attend over the raw input series:
$$\mathbf{Q}_h = \mathbf{Z}\mathbf{W}_h^{Q}, \qquad \mathbf{K}_h = \mathbf{X}\mathbf{W}_h^{K}, \qquad \mathbf{V}_h = \mathbf{X}\mathbf{W}_h^{V},$$
where $\mathbf{W}_h^{Q}, \mathbf{W}_h^{K}, \mathbf{W}_h^{V} \in \mathbb{R}^{T \times d_h}$ are head-specific projection matrices and $d_h$ is the head dimension. The spatially enriched signal for head $h$ is then
$$\mathbf{X}_{\text{spatial},h} = \operatorname{softmax}\!\left(\frac{\mathbf{Q}_h \mathbf{K}_h^{\top}}{\sqrt{d_h}}\right)\mathbf{V}_h \in \mathbb{R}^{N \times d_h},$$
which aggregates information from all variates according to learned query–key similarities. After concatenating all heads and applying an output projection, we obtain
$$\mathbf{X}_{\text{spatial}} = \operatorname{MHA}(\mathbf{Z}, \mathbf{X}, \mathbf{X}) \in \mathbb{R}^{N \times T}.$$
To align this spatially enriched signal with the temporal embedding space, $\mathbf{X}_{\text{spatial}}$ is passed through a linear projection:
$$\mathbf{E}_{\text{spatial}} = \mathbf{X}_{\text{spatial}}\mathbf{W}_{\text{spatial}} + \mathbf{b}_{\text{spatial}}, \qquad \mathbf{E}_{\text{spatial}} \in \mathbb{R}^{N \times E},$$
where $\mathbf{W}_{\text{spatial}} \in \mathbb{R}^{T \times E}$ and $\mathbf{b}_{\text{spatial}} \in \mathbb{R}^{E}$ are learnable parameters.
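A shape-level NumPy sketch of the query-enhanced embedding path (Equations (11)–(16)) follows. For brevity it uses a single attention head rather than the four heads used in our experiments, and all weight matrices are random stand-ins for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, E, d_h = 6, 96, 32, 16                  # nodes, lookback, embed dim, head dim

X = rng.standard_normal((N, T))               # input window
Z = rng.standard_normal((N, T))               # learnable spatiotemporal queries

W_Q = rng.standard_normal((T, d_h)) * 0.1     # per-head projections (Eq. 12)
W_K = rng.standard_normal((T, d_h)) * 0.1
W_V = rng.standard_normal((T, d_h)) * 0.1
Q, Kmat, V = Z @ W_Q, X @ W_K, X @ W_V

scores = Q @ Kmat.T / np.sqrt(d_h)            # query-key similarities (Eq. 13)
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)       # row-wise softmax
X_spatial_h = attn @ V                        # (N, d_h) per-head output

W_out = rng.standard_normal((d_h, T)) * 0.1   # output projection back to length T
X_spatial = X_spatial_h @ W_out               # (N, T), cf. Eq. (14)

W_spatial = rng.standard_normal((T, E)) * 0.1
E_spatial = X_spatial @ W_spatial             # (N, E), Eq. (15)

W_temp = rng.standard_normal((T, E)) * 0.1
E_temp = X @ W_temp                           # (N, E), Eq. (11)
E_st = E_temp + E_spatial                     # additive fusion, Eq. (16)
```

Note that each row of the attention matrix mixes information from all variates, which is how global spatial context enters an otherwise per-variate temporal representation.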

4.2.3. Spatiotemporal Fusion

Finally, SpeQNet fuses temporal and spatial information via additive integration:
$$\mathbf{E}_{\text{st}} = \mathbf{E}_{\text{temp}} + \mathbf{E}_{\text{spatial}},$$
yielding the query-enhanced spatiotemporal representation $\mathbf{E}_{\text{st}} \in \mathbb{R}^{N \times E}$ that serves as the input to the subsequent adaptive graph learning and spectral filtering blocks.

4.3. Adaptive Graph Learning and Node Feature Extraction

SpeQNet learns a task-specific adjacency matrix using learnable low-rank embeddings and initializes graph node features from the spatiotemporal representations. Figure 3 shows the adaptive graph learning and node feature extraction module, where Block (3) constructs a task-oriented adaptive adjacency matrix from the learnable low-rank embeddings and Block (4) extracts node features through a lightweight feature transformation of the query-enhanced spatiotemporal representations. Specifically, we introduce two learnable embedding matrices:
$$\mathbf{E}_1 \in \mathbb{R}^{N \times r}, \qquad \mathbf{E}_2 \in \mathbb{R}^{N \times r},$$
where $r \ll N$ is a rank hyperparameter. We construct a dense affinity matrix and apply a row-wise softmax to obtain the adjacency matrix:
$$\mathbf{A}_\theta = \operatorname{softmax}\!\left(\operatorname{ReLU}(\mathbf{E}_1 \mathbf{E}_2^{\top})\right) \in \mathbb{R}^{N \times N}.$$
This low-rank factorization provides an efficient, input-independent graph that captures stable spatial patterns, while remaining fully learnable and task-driven. We then obtain the scaled Laplacian $\hat{\mathbf{L}}$ using Equations (2) and (8) with $\lambda_{\max} = 2$.
Simultaneously, given the spatiotemporal representation $\mathbf{E}_{\text{st}}$, SpeQNet initializes the graph node features with a lightweight transformation. We apply a two-layer node-wise MLP with a ReLU activation to refine the representations and map them into the hidden feature space used by the spectral blocks. Denoting this module by $\operatorname{MLP}_{\text{node}}(\cdot)$, the initial graph signal is
$$\mathbf{H}_0 = \operatorname{MLP}_{\text{node}}(\mathbf{E}_{\text{st}}), \qquad \mathbf{H}_0 \in \mathbb{R}^{N \times E}.$$
This step improves feature expressiveness while keeping the architecture simple and computationally efficient.
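The graph construction and Laplacian computation can be sketched as follows, with random embeddings standing in for the learned $\mathbf{E}_1, \mathbf{E}_2$. One useful consequence worth noting: because the row-wise softmax makes every row of $\mathbf{A}_\theta$ sum to one, the degree matrix is the identity, and the scaled Laplacian with $\lambda_{\max} = 2$ reduces to $-\mathbf{A}_\theta$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, r = 8, 4                                   # nodes, low-rank dimension

E1 = rng.standard_normal((N, r))              # stand-ins for learnable embeddings
E2 = rng.standard_normal((N, r))

logits = np.maximum(E1 @ E2.T, 0.0)           # ReLU(E1 E2^T)
expl = np.exp(logits - logits.max(axis=1, keepdims=True))
A = expl / expl.sum(axis=1, keepdims=True)    # row-wise softmax (Eq. 17): rows sum to 1

# Random-walk Laplacian (Eq. 2); since rows of A sum to 1, D = I here.
L = np.eye(N) - np.diag(1.0 / A.sum(axis=1)) @ A
L_hat = (2.0 / 2.0) * L - np.eye(N)           # scaled Laplacian (Eq. 8), lambda_max = 2
```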

4.4. Spectral Graph Filtering Block

To model spatial dependencies across the learned graph, SpeQNet applies $L$ spectral graph filtering blocks repeatedly over the node features, where $\ell \in \{1, \ldots, L\}$ indexes the block. Each block combines (i) ChebNetII spectral filtering, (ii) node-wise feature recalibration, and (iii) a residual MLP, as visualized in Figure 4. The learnable filters enable flexible frequency-selective modulation of node representations, capturing critical spatial dependencies across various frequencies.
Given node features $\mathbf{H}_{\ell-1} \in \mathbb{R}^{N \times E}$ and the scaled Laplacian $\hat{\mathbf{L}}$, the ChebNetII layer implements a spectral filter $g(\cdot)$ as in Equation (10):
$$\tilde{\mathbf{H}}_\ell = g(\hat{\mathbf{L}}, \mathbf{H}_{\ell-1}) \in \mathbb{R}^{N \times E}.$$
This yields flexible and stable spectral responses that can realize low-pass, band-pass, or band-stop behaviors over the learned graph.
To further emphasize informative channels and suppress noisy ones, we apply a squeeze-and-excitation [47] style recalibration on $\tilde{\mathbf{H}}_\ell$. First, we aggregate node-wise statistics:
$$\mathbf{s} = \frac{1}{N} \sum_{i=1}^{N} \tilde{\mathbf{H}}_\ell(i,:) \in \mathbb{R}^{E}.$$
Then, a two-layer bottleneck MLP followed by a sigmoid activation function produces channel-wise gates:
$$\boldsymbol{\alpha} = \sigma\!\left(\operatorname{MLP}_{\text{se}}(\mathbf{s})\right) \in \mathbb{R}^{E}.$$
The recalibrated features are
$$\hat{\mathbf{H}}_\ell(i,:) = \boldsymbol{\alpha} \odot \tilde{\mathbf{H}}_\ell(i,:), \qquad i = 1, \ldots, N.$$
Finally, we refine $\hat{\mathbf{H}}_\ell$ with an MLP:
$$\mathbf{H}_\ell = \operatorname{MLP}_{\text{spec}}(\hat{\mathbf{H}}_\ell + \mathbf{H}_{\ell-1}),$$
producing frequency-aware node representations that encode both global and local spatial dependencies on the learned graph.
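Putting the pieces together, one spectral graph filtering block (Equations (19)–(23)) can be sketched in NumPy as below. The graph, filter values, and MLP weights are random stand-ins, and the ChebNetII filter is expanded via the Chebyshev recurrence with the customary halving of the $k = 0$ coefficient; this is an illustrative sketch, not the released implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
N, E, K = 8, 16, 5

H_prev = rng.standard_normal((N, E))          # H_{l-1}
A = rng.random((N, N)); A /= A.sum(1, keepdims=True)
L_hat = -A                                    # scaled Laplacian when rows of A sum to 1

# ChebNetII filtering (Eq. 10): project gamma_j at the Chebyshev nodes into w_k
j = np.arange(K + 1)
nodes = np.cos((j + 0.5) * np.pi / (K + 1))
gamma = rng.standard_normal(K + 1)            # stand-in for learnable filter values

Tk = [np.ones_like(nodes), nodes]             # T_k evaluated at the nodes
for _ in range(K - 1):
    Tk.append(2 * nodes * Tk[-1] - Tk[-2])
w = np.array([2.0 / (K + 1) * np.sum(gamma * Tk[k]) for k in range(K + 1)])
w[0] *= 0.5                                   # halve the k = 0 term

# Apply sum_k w_k T_k(L_hat) H via the matrix Chebyshev recurrence (Eq. 19)
Tx_prev, Tx = H_prev, L_hat @ H_prev          # T_0(L)H, T_1(L)H
H_tilde = w[0] * Tx_prev + w[1] * Tx
for k in range(2, K + 1):
    Tx_prev, Tx = Tx, 2 * L_hat @ Tx - Tx_prev
    H_tilde += w[k] * Tx

# Squeeze-and-excitation recalibration (Eqs. 20-22)
s = H_tilde.mean(axis=0)                      # node-wise statistics
W1 = rng.standard_normal((E, E // 4))         # two-layer bottleneck MLP
W2 = rng.standard_normal((E // 4, E))
alpha = 1.0 / (1.0 + np.exp(-(np.maximum(s @ W1, 0) @ W2)))   # sigmoid gates
H_hat = H_tilde * alpha                       # channel gates broadcast over nodes

# Residual MLP refinement (Eq. 23)
W3 = rng.standard_normal((E, E)) * 0.1
H_next = np.maximum((H_hat + H_prev) @ W3, 0) # H_l
```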

4.5. Forecasting Head

The forecasting head maps the final node representations to the prediction horizon. Given the output of the last spectral graph filtering block, $\mathbf{H}_L \in \mathbb{R}^{N \times E}$, we apply a shared linear projection along the feature dimension:
$$\hat{\mathbf{Y}} = \mathbf{H}_L \mathbf{W}_{\text{proj}} + \mathbf{b}_{\text{proj}}, \qquad \hat{\mathbf{Y}} \in \mathbb{R}^{N \times S},$$
where $\mathbf{W}_{\text{proj}} \in \mathbb{R}^{E \times S}$ and $\mathbf{b}_{\text{proj}} \in \mathbb{R}^{S}$. The same projection is applied to every node, consistent with the view that each variate shares a common forecasting mechanism but interacts through the learned graph and spectral filters.

5. Experiments

5.1. Experimental Settings

5.1.1. Datasets

We evaluate SpeQNet on twelve widely used multivariate time-series benchmarks that cover both long-term and short-term forecasting. The long-term forecasting group includes four ETT subsets (ETTh1, ETTh2, ETTm1, and ETTm2), Weather, Solar-Energy, Electricity, and Traffic, which span energy, climate, electricity, and transportation domains. The short-term group contains four traffic datasets from the PEMS collection (PEMS03, PEMS04, PEMS07, and PEMS08). All datasets used in this work are publicly available and are accessed following the benchmark configurations and data links provided in the iTransformer repository [10]. For the four PEMS traffic datasets and the four ETT benchmarks, we follow a train/validation/test split of 6/2/2, consistent with prior work. The remaining datasets (Weather, Solar-Energy, Electricity, and Traffic) follow a standard 7/1/2 split. All reported results, including those of SpeQNet and the baselines, are based on these predefined splits. The main statistics are summarized in Table 1.

5.1.2. Baselines

We select baseline methods to provide a comprehensive and representative comparison across the major modeling paradigms for multivariate time-series forecasting, including channel-independent linear models, convolutional architectures, Transformer-based methods, and graph-based approaches; all are widely adopted in the recent spatiotemporal forecasting literature and commonly serve as reference baselines on the evaluated benchmarks. Specifically, we compare SpeQNet with eleven representative baselines: RLinear [31], DLinear [30], TiDE [48], PatchTST [29], TimesNet [34], CycleNet [35], SCINet [49], Crossformer [36], iTransformer [10], MSGNet [44], and TimeXer [11]. For baseline results, we primarily report the numbers provided in the iTransformer paper, which evaluates a wide range of state-of-the-art methods across all considered datasets under a unified experimental protocol. For baselines not reported in iTransformer, such as TimeXer, CycleNet, and MSGNet, we report the results as provided in their original publications.

5.1.3. Experiment Setup

SpeQNet is implemented in PyTorch 2.3.1 using Python 3.11, and its overall training and inference procedure is summarized in Algorithm 1. All experiments are conducted on two NVIDIA RTX 3090 GPUs (NVIDIA Corporation, Santa Clara, CA, USA) with an AMD EPYC 7301 CPU (AMD, Santa Clara, CA, USA). We optimize the model using the Adam optimizer with Huber loss for a maximum of 30 epochs, and apply early stopping based on validation loss with a patience of 3 epochs. A learning rate scheduler with exponential decay is applied, where the learning rate is multiplied by a factor of 0.8 at each epoch. The model checkpoint achieving the lowest validation loss is saved and used for final evaluation. The hyperparameters are dataset-dependent and summarized in Table 2, where $E$ denotes the model embedding dimension, Blocks indicates the number of stacked spectral graph filtering blocks, Batch is the batch size, and LR is the initial learning rate. For all datasets, we set the Chebyshev polynomial order $K$ to 5 and the number of attention heads to 4. We adopt a unified experimental protocol widely used in recent forecasting studies. The input window length is fixed to $T = 96$ time steps for all datasets. Prediction horizons are set to $S \in \{96, 192, 336, 720\}$ for long-term forecasting benchmarks and $S \in \{12, 24, 48, 96\}$ for short-term forecasting benchmarks. This setting evaluates both short-range and long-range forecasting performance under a consistent lookback window, enabling fair and direct comparisons across methods. Forecasting performance is evaluated using Mean Squared Error (MSE) and Mean Absolute Error (MAE) on the test set for each prediction horizon, following standard practice in multivariate time-series forecasting.
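The training schedule described above can be sketched in a few lines. The validation-loss curve below is made up purely to illustrate the early-stopping rule, and the helper names are ours, not part of the released code.

```python
def lr_at_epoch(lr0, epoch, gamma=0.8):
    """Exponential decay: the learning rate is multiplied by 0.8 each epoch."""
    return lr0 * gamma ** epoch

def early_stop_epoch(val_losses, patience=3):
    """Return the epoch index at which training stops (patience non-improving epochs)."""
    best, bad = float("inf"), 0
    for t, v in enumerate(val_losses):
        if v < best:
            best, bad = v, 0
        else:
            bad += 1
            if bad >= patience:
                return t
    return len(val_losses) - 1

val = [0.90, 0.70, 0.65, 0.66, 0.67, 0.68, 0.50]   # hypothetical validation curve
stop = early_stop_epoch(val)                        # stops after 3 non-improving epochs
```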
Algorithm 1 SpeQNet forward pass and training procedure
Input: Historical window X ∈ ℝ^{N×T}, horizon S, Chebyshev order K, number of spectral blocks L
Output: Forecast Ŷ ∈ ℝ^{N×S}
  1:  Temporal embedding: E_temp ← X W_temp + b_temp ▹ Equation (11)
  2:  Query-enhanced spatial embedding: initialize learnable queries Z ∈ ℝ^{N×T}
  3:  X_spatial ← MHA(Z, X, X) ▹ Equations (12)–(14)
  4:  E_spatial ← X_spatial W_spatial + b_spatial ▹ Equation (15)
  5:  Fusion: E_st ← E_temp + E_spatial ▹ Equation (16)
  6:  Adaptive graph learning: A_θ ← softmax(ReLU(E_1 E_2^⊤)) ▹ Equation (17)
  7:  Compute L̂ from A_θ using Equations (2) and (8) with λ_max = 2
  8:  Node feature init: H^(0) ← MLP_node(E_st) ▹ Equation (18)
  9:  for ℓ = 1 to L do
10:      H̃^(ℓ) ← g(L̂, H^(ℓ−1); K) ▹ Equation (19), ChebNetII (Equation (10))
11:      s ← (1/N) ∑_{i=1}^{N} H̃^(ℓ)(i, :) ▹ Equation (20)
12:      α ← σ(MLP_se(s)) ▹ Equation (21)
13:      Ĥ^(ℓ)(i, :) ← α ⊙ H̃^(ℓ)(i, :), ∀i ▹ Equation (22)
14:      H^(ℓ) ← MLP_spec(Ĥ^(ℓ) + H^(ℓ−1)) ▹ Equation (23)
15:  end for
16:  Forecast head: Ŷ ← H^(L) W_proj + b_proj ▹ Equation (24)
17:  Training: minimize ℒ(Ŷ, Y) (Huber loss) with Adam (Section 5.1)
18:  return Ŷ
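As a complementary illustration, the forward pass of Algorithm 1 can be mocked end to end in NumPy. This is a simplified sketch under stated assumptions, not the authors' implementation: single-head dot-product attention stands in for the MHA, `tanh` layers stand in for the MLPs, and all weights are random rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cheb_filter(L_hat, H, coeffs):
    """y = sum_k c_k T_k(L_hat) H via the Chebyshev three-term recurrence."""
    T_prev, T_curr = H, L_hat @ H
    out = coeffs[0] * T_prev + coeffs[1] * T_curr
    for c in coeffs[2:]:
        T_prev, T_curr = T_curr, 2 * L_hat @ T_curr - T_prev
        out += c * T_curr
    return out

def speqnet_forward(X, S=12, E=16, K=5, L_blocks=2, rng=rng):
    """Mock of Algorithm 1 on a window X of shape (N, T); returns (N, S)."""
    N, T = X.shape
    W_temp = rng.normal(0, 0.1, (T, E))
    E_temp = X @ W_temp                                  # temporal embedding
    Z = rng.normal(0, 0.1, (N, T))                       # learnable queries
    X_spatial = softmax(Z @ X.T / np.sqrt(T)) @ X        # single-head stand-in for MHA
    E_spatial = X_spatial @ rng.normal(0, 0.1, (T, E))
    E_st = E_temp + E_spatial                            # fusion
    E1, E2 = rng.normal(0, 0.1, (2, N, E))
    A = softmax(np.maximum(E1 @ E2.T, 0.0), axis=1)      # adaptive adjacency
    d = A.sum(1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-8))
    L_hat = -D_inv_sqrt @ A @ D_inv_sqrt                 # (I - D^-1/2 A D^-1/2) - I, lambda_max = 2
    H = np.tanh(E_st)                                    # MLP_node stand-in
    for _ in range(L_blocks):
        coeffs = rng.normal(0, 0.5, K + 1)
        H_tilde = cheb_filter(L_hat, H, coeffs)          # spectral graph filtering
        s = H_tilde.mean(0)                              # squeeze: mean over nodes
        alpha = 1.0 / (1.0 + np.exp(-s))                 # excitation gate (sigmoid)
        H = np.tanh(alpha * H_tilde + H)                 # channel recalibration + residual
    return H @ rng.normal(0, 0.1, (E, S))                # forecast head
```

Running `speqnet_forward` on an (N, T) window produces an (N, S) forecast, mirroring the input/output contract of Algorithm 1.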

5.2. Main Results

We now present the main empirical results of SpeQNet across all twelve datasets. Detailed quantitative results are reported in Table 3 and Table 4, which list MSE and MAE for each horizon as well as averages across horizons.

5.2.1. Long-Term Forecasting

Table 3 summarizes the long-term forecasting results on eight public benchmarks. Using dataset-level evaluation based on average performance, SpeQNet achieves the highest number of overall wins, ranking first on four datasets, compared with two for TimeXer and one for CycleNet. These results indicate that SpeQNet consistently delivers strong forecasting accuracy across diverse variate numbers and data domains, rather than excelling only at specific horizons or metrics. In particular, SpeQNet achieves the best performance across all metrics on Electricity (321 variates) and Solar-Energy (137 variates), where multivariate interactions play a critical role.
On datasets with simpler spatial structure, such as the ETT subsets (7 variates) and Weather (21 variates), SpeQNet remains highly competitive and often achieves the best or second-best results across individual horizons. Notably, on ETTh1, ETTm1, and Weather, the performance gap between SpeQNet and other top models diminishes, reflecting the inherent noise and weaker spatial coupling of the variables. Even in this setting, SpeQNet consistently ranks among the top-performing models, underscoring its robustness.
The Traffic dataset presents a particularly challenging case. Although it contains the largest number of variates (862), SpeQNet does not exhibit the same level of consistent superiority observed on other high-dimensional benchmarks. We attribute this behavior primarily to the temporal resolution of the dataset. Unlike the PEMS benchmarks, which are sampled at 5 min intervals, the Traffic dataset is recorded at an hourly frequency. This coarser temporal granularity weakens spatiotemporal dependencies and reduces the amount of fine-grained information available to the model. As a result, the benefits of query-enhanced spatiotemporal representations are partially diminished. In contrast, on PEMS07, which has a comparable number of variates (883) but is sampled at a much finer 5 min interval, SpeQNet demonstrates consistent superiority across all prediction horizons and achieves the best average performance (see Table 4). This comparison highlights that SpeQNet is particularly effective in settings where dense spatial interactions are coupled with high-resolution temporal dynamics, a regime that is common in real-world spatiotemporal systems.

5.2.2. Short-Term Forecasting

Table 4 reports the results on four PEMS benchmarks, with variate counts ranging from 170 to 883, under the short-term forecasting setting. SpeQNet achieves dataset-level wins on all four datasets and, more importantly, exhibits consistent superiority across every horizon and both metrics. Overall, these results validate SpeQNet as a strong spatiotemporal forecaster in classical traffic regimes: it not only scales to hundreds of nodes but also delivers uniformly improved accuracy across short and longer prediction horizons.

5.3. Ablation Studies

We conduct ablation studies to validate the core design choices of SpeQNet and to isolate the contribution of each major component. Specifically, we create four variants: (1) w/o ST-Query, which removes the spatiotemporal query module and uses only temporally encoded node representations to construct the graph, evaluating the necessity of query-enhanced spatiotemporal representations; (2) Attn-Graph, which replaces the adaptive graph learning mechanism with attention-based graph construction, where the adjacency matrix is computed via scaled dot-product similarity between node representations, i.e., A = Softmax(QK^⊤/√d), with Q and K obtained from linear projections of the node features, assessing whether purely input-driven attention graphs can substitute task-oriented adaptive graph learning; (3) Corr-Graph, which replaces the adaptive graph learning module with a fixed correlation-based graph constructed from the training data, where the adjacency matrix is obtained by applying softmax normalization to the correlation matrix, evaluating whether static statistical dependencies can substitute task-adaptive graph learning; and (4) GCN, which replaces the spectral graph filtering block with an equal number of spatial GCN layers, evaluating the impact of the proposed spectral graph filtering versus standard low-pass message passing. We evaluate all variants on four datasets spanning all major application domains and graph scales considered in this work: PEMS07 (Transportation, 883 variates), Solar-Energy (Solar Power, 137 variates), Electricity (Electricity, 321 variates), and Weather (Weather, 21 variates). This setup allows us to systematically examine how each design choice affects forecasting performance under varying numbers of variates and domains. Quantitative results are reported in Table 5.
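The two graph-construction baselines used in the ablation (Attn-Graph and Corr-Graph) can be sketched in a few lines of NumPy. This is an illustrative sketch: random projections stand in for the learned Q/K weights, and the input shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attn_graph(H, d=8, rng=np.random.default_rng(1)):
    """Attn-Graph variant: A = Softmax(Q K^T / sqrt(d)) from node features H (N, E)."""
    Wq, Wk = rng.normal(0, 0.1, (2, H.shape[1], d))  # random stand-ins for learned projections
    Q, K = H @ Wq, H @ Wk
    return softmax(Q @ K.T / np.sqrt(d))

def corr_graph(X):
    """Corr-Graph variant: softmax-normalized Pearson correlation of training series X (N, T)."""
    C = np.corrcoef(X)  # fixed, computed once from the training data
    return softmax(C)
```

Both constructions yield row-stochastic adjacency matrices; the key difference is that Attn-Graph is input-driven per sample, whereas Corr-Graph is static.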
The ablation results demonstrate that each design component of SpeQNet contributes meaningfully to forecasting performance, with their impact becoming increasingly pronounced as the number of variates and the complexity of spatial dependencies grow. Removing the spatiotemporal query (w/o ST-Query) consistently degrades performance across all datasets, confirming that query-enhanced spatiotemporal representations are crucial for spatiotemporal forecasting.
Replacing the adaptive graph learning with a fixed correlation-based graph (Corr-Graph) also leads to noticeable performance drops, particularly on datasets with larger graph sizes. Although correlation graphs capture coarse pairwise dependencies, their static nature limits their ability to adapt to task objectives. Similarly, the Attn-Graph variant, which constructs the graph purely from input-driven attention, underperforms the proposed adaptive graph learner. This suggests that the adaptive graph learning is more effective than relying on similarity measures alone.
Among all variants, replacing the spectral graph filtering block with standard GCN layers (GCN) results in the most severe performance degradation across datasets. This finding underscores the limitation of purely low-pass message passing and highlights the importance of spectral graph filtering for modeling heterogeneous spatial interactions. Notably, performance differences between variants are relatively modest on the Weather dataset with limited spatial coupling, but widen substantially on larger graphs such as Solar-Energy and PEMS07. In particular, the pronounced degradation on PEMS07 confirms that SpeQNet’s components are jointly necessary rather than interchangeable, and that their benefits scale with system size and spatial complexity.

5.4. Learned Spectral Filter Response Analysis

To better understand how SpeQNet exploits spectral graph information, we analyze the learned frequency responses of the spectral filtering module. Since SpeQNet employs Chebyshev polynomial approximation, the learned filter can be explicitly characterized by its spectral response
h(λ) = ∑_{k=0}^{K} c_k T_k(λ),
which describes how different graph frequencies, parameterized by Laplacian eigenvalues λ , are amplified or suppressed. Specifically, we examine the induced frequency responses, which provide an interpretable view of how information is propagated and reshaped in the graph spectral domain.
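The spectral response above can be evaluated numerically with the Chebyshev three-term recurrence. The sketch below assumes the standard rescaling λ̂ = λ − 1 so that the Laplacian spectrum [0, 2] (with λ_max = 2, as in Algorithm 1) maps onto the Chebyshev domain [−1, 1]; the coefficient vector `coeffs` stands in for the learned c_k.

```python
import numpy as np

def cheb_response(coeffs, lam):
    """Evaluate h(lambda) = sum_k c_k T_k(lambda_hat) on a grid of eigenvalues,
    with lambda_hat = lambda - 1 mapping [0, 2] to [-1, 1] (lambda_max = 2)."""
    x = np.asarray(lam, dtype=float) - 1.0
    T_prev, T_curr = np.ones_like(x), x          # T_0(x) = 1, T_1(x) = x
    h = coeffs[0] * T_prev + coeffs[1] * T_curr
    for c in coeffs[2:]:
        T_prev, T_curr = T_curr, 2 * x * T_curr - T_prev  # T_k = 2x T_{k-1} - T_{k-2}
        h = h + c * T_curr
    return h
```

Plotting `cheb_response` over λ ∈ [0, 2] for a trained layer's coefficients reproduces the response curves shown in Figure 5.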
Figure 5 visualizes the learned spectral responses from both the first and the final spectral graph filtering layers on the PEMS07, Solar-Energy, and Weather datasets. Despite differences in shape, the learned spectral responses across all three datasets consistently exhibit a band-stop–like behavior, where mid-range graph frequencies are most strongly attenuated while both low- and high-frequency components are preserved. This shared pattern suggests that the proposed spectral filtering mechanism is particularly effective at suppressing intermediate-frequency variations that are less informative for forecasting, while retaining global trends and localized dynamics.
Beyond this common structure, the responses vary notably across datasets and across network depth. On PEMS07, the filters, especially in deeper layers, exhibit sharp transitions and pronounced curvature, indicating a highly expressive and aggressive frequency-selective behavior suited to complex and heterogeneous traffic interactions. In contrast, the Solar-Energy dataset yields a more moderated and piecewise response, where mid-range attenuation is achieved through a combination of steep and smooth variations, resulting in conservative yet structured spectral shaping. For the Weather dataset, the learned responses are considerably smoother across both early and late layers, characterized by gradual attenuation toward mid-range frequencies followed by a gentle and symmetric recovery at higher frequencies.
To complement the qualitative visualizations, Table 6 provides quantitative measurements of the learned spectral responses across predefined frequency bands. Given the Laplacian spectrum λ ∈ [0, 2], we partition the spectrum into low ([0, 0.67]), mid ([0.67, 1.33]), and high ([1.33, 2.0]) frequency regions, and report the average gain, relative energy ratio, and within-band variance for both the first and last spectral filtering layers. Across all datasets, the mid-frequency band consistently exhibits the lowest average gain and energy ratio, numerically confirming the band-stop behavior observed in Figure 5. In contrast, the low- and high-frequency bands retain substantially higher energy, indicating preservation of both global trends and localized dynamics.
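The band statistics reported in Table 6 can be computed from a sampled response curve; the sketch below assumes the same three bands over λ ∈ [0, 2], and the definitions of "average gain", "energy ratio", and "within-band variance" are one plausible reading of those metrics.

```python
import numpy as np

def band_stats(h_fn, bands=((0.0, 0.67), (0.67, 1.33), (1.33, 2.0)), n=600):
    """Per-band average |gain|, relative energy ratio, and within-band variance
    of a spectral response h_fn evaluated on lambda in [0, 2]."""
    lam = np.linspace(0.0, 2.0, n)
    h = np.abs(h_fn(lam))
    total_energy = np.sum(h ** 2)
    stats = []
    for lo, hi in bands:
        m = (lam >= lo) & (lam <= hi) if hi >= 2.0 else (lam >= lo) & (lam < hi)
        stats.append({
            "avg_gain": float(h[m].mean()),
            "energy_ratio": float(np.sum(h[m] ** 2) / total_energy),
            "variance": float(h[m].var()),
        })
    return stats
```

For a band-stop response, the middle entry shows the lowest average gain and energy ratio, matching the pattern reported in Table 6.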
Overall, these quantitative statistics and response visualizations corroborate that SpeQNet learns depth- and domain-adaptive band-stop spectral operators, rather than relying on a fixed low-pass bias. The observed diversity in response curvature across datasets and network depth highlights the expressivity and flexibility of the proposed spectral filtering mechanism, which progressively refines frequency selectivity in alignment with the forecasting objective.

5.5. Sensitivity to Chebyshev Polynomial Order

We further investigate the sensitivity of SpeQNet to the Chebyshev polynomial order K, which controls the spectral expressivity of the graph filtering operation. To this end, we conduct a controlled study on the Solar-Energy dataset by varying K from 1 to 10 while keeping all other settings fixed. Figure 6 reports the corresponding MSE and MAE results. SpeQNet exhibits stable performance across a wide range of K values, with both MSE and MAE fluctuating within a narrow margin. This observation indicates that the proposed model is not overly sensitive to the precise choice of spectral order and does not rely on carefully tuned high-degree polynomials to achieve good performance. Overall, the model demonstrates robustness to the choice of K, confirming that its effectiveness stems from adaptive spectral shaping rather than sensitivity to polynomial depth.

5.6. Scaling Analysis

To examine how SpeQNet scales with increasing spatial complexity, we analyze the relative performance improvement over a strong and consistent baseline (iTransformer) as the number of variates increases across datasets, as shown in Figure 7. For short-term traffic forecasting, SpeQNet exhibits a clear scaling advantage: the relative improvement consistently increases with the number of sensors, reaching nearly 20% on PEMS07 with 883 variates. This pronounced trend indicates that query-guided spectral modeling becomes increasingly beneficial as spatial dependencies grow more complex. For long-term forecasting, SpeQNet maintains stable improvements across datasets with varying dimensionalities, achieving consistent gains on medium- to large-scale settings such as Solar-Energy and Electricity. Although the improvement on the largest Traffic dataset is more modest, this behavior is consistent with our earlier analysis attributing it to the coarse temporal sampling interval (one hour), which limits the amount of exploitable temporal and spatial variation. Overall, these results demonstrate that SpeQNet scales favorably with the number of variates, particularly in settings where complex and latent spatial interactions can be effectively leveraged.

5.7. Computational Efficiency Analysis

We further evaluate the computational efficiency and resource usage of SpeQNet and representative Transformer-based baselines under identical hardware and software settings. Table 7 reports the number of parameters, computational cost (GFLOPs), runtime, and memory consumption on two datasets with contrasting graph scales: Traffic (862 variates), representing large-scale systems, and ETTm1 (7 variates), representing small-scale networks.
On Traffic, SpeQNet incurs higher computational cost than Transformer-based baselines due to adaptive graph learning and spectral filtering over a large number of nodes, while maintaining comparable or lower peak memory usage and practical inference latency. On ETTm1, all models exhibit negligible computational overhead, and the absolute efficiency differences become marginal. Overall, SpeQNet presents a moderate increase in computation in exchange for improved modeling capacity, while remaining practical in terms of memory footprint and inference efficiency.

5.8. Graph Interpretability Analysis

We analyze the structure of the learned adaptive graph across four representative domains. As shown in Figure 8, the learned graphs consistently exhibit pronounced hub structures, visible as vertical stripe patterns in the adjacency matrices, whereas the correlation-based graphs are dominated by block-like connectivity induced by pairwise similarity, confirming that the learned topology differs substantially from the correlation structure.
The strength of the learned hubs varies systematically across domains. Traffic networks exhibit the strongest hub concentration (maximum degree 14.77), followed by Electricity (9.43), Solar-Energy (4.96), and Weather (2.02). This trend reflects dataset complexity: larger systems tend to develop stronger hub-dominated structures, where a small subset of nodes aggregates information from many others. Moreover, the difference maps (learned graph minus correlation graph) reveal systematic deviations between the two structures, with certain edges being emphasized or attenuated relative to correlation-based connectivity. These observations suggest that the adaptive graph encodes structural patterns that differ from simple pairwise similarity and may reflect task-specific relational characteristics present in the data.
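The hub and difference-map analysis above can be sketched as follows. Reading the vertical stripes as high weighted in-degree (column sums of the adjacency matrix) is our assumption here; the reported maximum-degree values are presumably computed from such weighted degrees.

```python
import numpy as np

def hub_analysis(A_learned, A_corr):
    """Weighted in-degree (column sums) exposes hub nodes, which appear as
    vertical stripes in the adjacency matrix; the difference map shows where
    the learned graph departs from correlation-based connectivity."""
    in_degree = A_learned.sum(axis=0)      # total incoming weight per node
    diff = A_learned - A_corr              # learned graph minus correlation graph
    return float(in_degree.max()), int(in_degree.argmax()), diff
```

Applying this to each domain's learned and correlation graphs yields the maximum-degree values and difference maps discussed above.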

5.9. Qualitative Forecasting Visualization

In addition to quantitative metrics, we provide qualitative visualizations of representative forecasting results to illustrate how SpeQNet captures temporal dynamics across different domains. Figure 9 shows example predictions for three randomly selected variables from each dataset (PEMS07, Electricity, Solar-Energy, and Weather), where the prediction horizons correspond to those evaluated in the main experiments (96, 336, 720, and 192, respectively). Across diverse temporal patterns, SpeQNet closely follows the ground-truth trajectories, accurately modeling both smooth trends and sharp periodic variations. These visualizations complement the quantitative results by providing an intuitive illustration of the model’s forecasting behavior.

6. Conclusions

Spatiotemporal forecasting demands models that can simultaneously capture global coordination across sensors and sharp, localized deviations—while operating under uncertain or latent graph structure. In this work, we introduced SpeQNet, a query-enhanced spectral graph filtering framework that couples (i) query-enhanced spatiotemporal representations, (ii) task-oriented adaptive graph construction, and (iii) a frequency-aware ChebNetII-based spectral filtering block for flexible spatial modeling. Extensive experiments on twelve benchmarks across traffic, energy, and climate domains demonstrate that SpeQNet consistently improves forecasting accuracy and scales effectively to large graphs with hundreds of variates.
Our analysis further sheds light on why SpeQNet works. Ablations show that the spatiotemporal query, adaptive graph learning, and spectral filtering are jointly necessary, with performance gaps widening as the number of variates increases. Moreover, SpeQNet learns domain-adaptive spectral operators with a consistent band-stop-like behavior, selectively attenuating mid-range graph frequencies while preserving both low- and high-frequency components—an interpretable pattern aligned with suppressing less informative variations while retaining global trends and localized dynamics. Finally, SpeQNet is robust to the spectral depth hyperparameter: performance remains stable across a broad range of Chebyshev polynomial orders, indicating that its gains arise from adaptive spectral shaping rather than fragile tuning.

Limitations and Future Work

While SpeQNet demonstrates strong performance across a diverse set of spatiotemporal forecasting benchmarks, several limitations define the current scope of the proposed framework. First, SpeQNet assumes regularly sampled time series with a fixed set of nodes, and performs adaptive graph learning and spectral filtering on sample-wise node representations obtained after temporal compression. As a result, the current formulation does not explicitly model time-varying graph structures, irregular sampling patterns, or dynamically evolving node sets. Second, although the evaluated benchmarks span multiple domains, generalization to substantially different application domains or sensing regimes beyond these benchmarks has not been explicitly validated. Broader evaluation under more irregular, partially observed, or domain-shifted conditions therefore remains an important direction for future work. Promising extensions include incorporating explicit mechanisms for dynamic graph evolution, designing hierarchical or time-adaptive query representations, and coupling spectral filter generation more tightly with the query mechanism to enable context-aware frequency adaptation without increasing computation cost.

Author Contributions

Conceptualization, Z.F. and K.M.; methodology, Z.F.; software, Z.F.; validation, Z.F.; formal analysis, Z.F.; investigation, Z.F.; resources, Z.F.; data curation, Z.F.; writing—original draft preparation, Z.F.; writing—review and editing, Z.F. and K.M.; visualization, Z.F. and K.M.; supervision, K.M.; project administration, K.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used in this study are publicly accessible and can be obtained from https://github.com/thuml/iTransformer (accessed on 10 June 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Satpathy, I.; Nayak, A.; Jain, V. The green city: Sustainable and smart urban living through artificial intelligence. In Utilizing Technology to Manage Territories; IGI Global: Hershey, PA, USA, 2025; pp. 273–304. [Google Scholar]
  2. Fang, Y.; Liang, Y.; Hui, B.; Shao, Z.; Deng, L.; Liu, X.; Jiang, X.; Zheng, K. Efficient large-scale traffic forecasting with transformers: A spatial data management perspective. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1, Toronto, ON, Canada, 3–7 August 2025; pp. 307–317. [Google Scholar]
  3. Dong, Q.; Huang, R.; Cui, C.; Towey, D.; Zhou, L.; Tian, J.; Wang, J. Short-Term Electricity-Load Forecasting by deep learning: A comprehensive survey. Eng. Appl. Artif. Intell. 2025, 154, 110980. [Google Scholar] [CrossRef]
  4. Zhang, H.; Liu, Y.; Zhang, C.; Li, N. Machine learning methods for weather forecasting: A survey. Atmosphere 2025, 16, 82. [Google Scholar] [CrossRef]
  5. Kim, S.J.; Cho, Y. Enhancing environmental monitoring of harmful algal blooms with ConvLSTM image prediction. Environ. Res. Commun. 2025, 7, 025012. [Google Scholar] [CrossRef]
  6. Jin, M.; Koh, H.Y.; Wen, Q.; Zambon, D.; Alippi, C.; Webb, G.I.; King, I.; Pan, S. A survey on graph neural networks for time series: Forecasting, classification, imputation, and anomaly detection. arXiv 2023, arXiv:2307.03759. [Google Scholar] [CrossRef] [PubMed]
  7. Cini, A.; Marisca, I.; Zambon, D.; Alippi, C. Graph deep learning for time series forecasting. ACM Comput. Surv. 2025, 57, 321. [Google Scholar] [CrossRef]
  8. Yu, B.; Yin, H.; Zhu, Z. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv 2017, arXiv:1709.04875. [Google Scholar]
  9. Pan, Z.; Liang, Y.; Wang, W.; Yu, Y.; Zheng, Y.; Zhang, J. Urban traffic prediction from spatio-temporal data using deep meta learning. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 1720–1730. [Google Scholar]
  10. Liu, Y.; Hu, T.; Zhang, H.; Wu, H.; Wang, S.; Ma, L.; Long, M. iTransformer: Inverted transformers are effective for time series forecasting. arXiv 2023, arXiv:2310.06625. [Google Scholar]
  11. Wang, Y.; Wu, H.; Dong, J.; Qin, G.; Zhang, H.; Liu, Y.; Qiu, Y.; Wang, J.; Long, M. Timexer: Empowering transformers for time series forecasting with exogenous variables. Adv. Neural Inf. Process. Syst. 2024, 37, 469–498. [Google Scholar]
  12. Cini, A.; Marisca, I.; Zambon, D.; Alippi, C. Taming local effects in graph-based spatiotemporal forecasting. Adv. Neural Inf. Process. Syst. 2023, 36, 55375–55393. [Google Scholar]
  13. Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS’16), Barcelona, Spain, 5–10 December 2016; pp. 3844–3852. [Google Scholar]
  14. Balcilar, M.; Renton, G.; Héroux, P.; Gaüzère, B.; Adam, S.; Honeine, P. Analyzing the expressive power of graph neural networks in a spectral perspective. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 4 May 2021. [Google Scholar]
  15. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
  16. Wu, F.; Souza, A.; Zhang, T.; Fifty, C.; Yu, T.; Weinberger, K. Simplifying graph convolutional networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6861–6871. [Google Scholar]
  17. Xu, B.; Shen, H.; Cao, Q.; Cen, K.; Cheng, X. Graph convolutional networks using heat kernel for semi-supervised learning. arXiv 2020, arXiv:2007.16002. [Google Scholar] [CrossRef]
  18. Nt, H.; Maehara, T. Revisiting graph neural networks: All we have is low-pass filters. arXiv 2019, arXiv:1905.09550. [Google Scholar] [CrossRef]
  19. Meng, G.; Jiang, Q.; Fu, K.; Lin, B.; Lu, C.T.; Chen, Z. Early forecasting of the impact of traffic accidents using a single shot observation. In Proceedings of the 2022 SIAM International Conference on Data Mining (SDM), Alexandria, VA, USA, 28–30 April 2022; pp. 100–108. [Google Scholar]
  20. Wang, X.; Zhang, M. How powerful are spectral graph neural networks. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 23341–23362. [Google Scholar]
  21. He, M.; Wei, Z.; Xu, H. Bernnet: Learning arbitrary graph spectral filters via bernstein approximation. Adv. Neural Inf. Process. Syst. 2021, 34, 14239–14251. [Google Scholar]
  22. He, M.; Wei, Z.; Wen, J.R. Convolutional neural networks on graphs with chebyshev approximation, revisited. Adv. Neural Inf. Process. Syst. 2022, 35, 7264–7276. [Google Scholar]
  23. Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv 2017, arXiv:1707.01926. [Google Scholar]
  24. Shao, Z.; Zhang, Z.; Wei, W.; Wang, F.; Xu, Y.; Cao, X.; Jensen, C.S. Decoupled dynamic spatial-temporal graph neural network for traffic forecasting. arXiv 2022, arXiv:2206.09112. [Google Scholar] [CrossRef]
  25. Bai, L.; Yao, L.; Li, C.; Wang, X.; Wang, C. Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting. arXiv 2020, arXiv:2007.02842. [Google Scholar] [CrossRef]
  26. Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Chang, X.; Zhang, C. Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks. arXiv 2020, arXiv:2005.11650. [Google Scholar] [CrossRef]
  27. Lee, H.; Ko, S. TESTAM: A time-enhanced spatio-temporal attention model with mixture of experts. arXiv 2024, arXiv:2403.02600. [Google Scholar]
  28. Park, C.; Lee, C.; Bahng, H.; Tae, Y.; Jin, S.; Kim, K.; Ko, S.; Choo, J. ST-GRAT: A novel spatio-temporal graph attention networks for accurately forecasting dynamically changing road speed. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Virtual Event, 19–23 October 2020; pp. 1215–1224. [Google Scholar]
  29. Nie, Y. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. arXiv 2022, arXiv:2211.14730. [Google Scholar]
  30. Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are transformers effective for time series forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 11121–11128. [Google Scholar]
  31. Li, Z.; Qi, S.; Li, Y.; Xu, Z. Revisiting long-term time series forecasting: An investigation on linear mapping. arXiv 2023, arXiv:2305.10721. [Google Scholar] [CrossRef]
  32. Ansari, A.F.; Stella, L.; Turkmen, C.; Zhang, X.; Mercado, P.; Shen, H.; Shchur, O.; Rangapuram, S.S.; Arango, S.P.; Kapoor, S.; et al. Chronos: Learning the language of time series. arXiv 2024, arXiv:2403.07815. [Google Scholar] [CrossRef]
  33. Jin, M.; Wang, S.; Ma, L.; Chu, Z.; Zhang, J.Y.; Shi, X.; Chen, P.Y.; Liang, Y.; Li, Y.F.; Pan, S.; et al. Time-llm: Time series forecasting by reprogramming large language models. arXiv 2023, arXiv:2310.01728. [Google Scholar]
  34. Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; Long, M. Timesnet: Temporal 2d-variation modeling for general time series analysis. arXiv 2022, arXiv:2210.02186. [Google Scholar]
  35. Lin, S.; Lin, W.; Hu, X.; Wu, W.; Mo, R.; Zhong, H. Cyclenet: Enhancing time series forecasting through modeling periodic patterns. Adv. Neural Inf. Process. Syst. 2024, 37, 106315–106345. [Google Scholar]
  36. Zhang, Y.; Yan, J. Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
  37. Yang, Y.; Zhu, Q.; Chen, J. Vcformer: Variable correlation transformer with inherent lagged correlation for multivariate time series forecasting. arXiv 2024, arXiv:2405.11470. [Google Scholar] [CrossRef]
  38. Woo, G.; Liu, C.; Kumar, A.; Xiong, C.; Savarese, S.; Sahoo, D. Unified training of universal time series forecasting transformers. arXiv 2024, arXiv:2402.02592. [Google Scholar] [CrossRef]
  39. Gao, S.; Koker, T.; Queen, O.; Hartvigsen, T.; Tsiligkaridis, T.; Zitnik, M. Units: A unified multi-task time series model. Adv. Neural Inf. Process. Syst. 2024, 37, 140589–140631. [Google Scholar]
  40. Liu, Y.; Wu, H.; Wang, J.; Long, M. Non-stationary transformers: Exploring the stationarity in time series forecasting. Adv. Neural Inf. Process. Syst. 2022, 35, 9881–9893. [Google Scholar]
  41. Zhu, J.; Wang, Q.; Tao, C.; Deng, H.; Zhao, L.; Li, H. AST-GCN: Attribute-augmented spatiotemporal graph convolutional network for traffic forecasting. IEEE Access 2021, 9, 35973–35983. [Google Scholar] [CrossRef]
  42. Song, C.; Lin, Y.; Guo, S.; Wan, H. Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 914–921. [Google Scholar]
  43. Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Zhang, C. Graph wavenet for deep spatial-temporal graph modeling. arXiv 2019, arXiv:1906.00121. [Google Scholar]
  44. Cai, W.; Liang, Y.; Liu, X.; Feng, J.; Wu, Y. Msgnet: Learning multi-scale inter-series correlations for multivariate time series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 11141–11149. [Google Scholar]
  45. Huang, Q.; Shen, L.; Zhang, R.; Ding, S.; Wang, B.; Zhou, Z.; Wang, Y. Crossgnn: Confronting noisy multivariate time series via cross interaction refinement. Adv. Neural Inf. Process. Syst. 2023, 36, 46885–46902. [Google Scholar]
  46. Bauer, F. Normalized graph Laplacians for directed graphs. Linear Algebra Appl. 2012, 436, 4193–4222. [Google Scholar] [CrossRef]
  47. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
  48. Das, A.; Kong, W.; Leach, A.; Mathur, S.; Sen, R.; Yu, R. Long-term forecasting with tide: Time-series dense encoder. arXiv 2023, arXiv:2304.08424. [Google Scholar]
  49. Liu, M.; Zeng, A.; Chen, M.; Xu, Z.; Lai, Q.; Ma, L.; Xu, Q. Scinet: Time series modeling and forecasting with sample convolution and interaction. Adv. Neural Inf. Process. Syst. 2022, 35, 5816–5828. [Google Scholar]
Figure 1. Overview of SpeQNet: query-enhanced spectral graph filtering for spatiotemporal forecasting. SpeQNet adopts an encoder-only architecture that integrates query-enhanced spatiotemporal representation learning, adaptive graph construction, and Chebyshev-based spectral graph filtering to model latent spatial dependencies and generate future forecasts.
Figure 2. Query-enhanced spatiotemporal representation in SpeQNet. Block (1) encodes temporal embeddings, while Block (2) injects global spatial context via learnable queries; the two embeddings are fused to form the spatiotemporal representation.
Figure 3. Adaptive graph learning and node feature extraction in SpeQNet. Block (3) learns a task-oriented adaptive adjacency matrix, and Block (4) initializes node features from the query-enhanced spatiotemporal representations for subsequent spectral graph filtering.
Figure 4. Spectral graph filtering block in SpeQNet. Given node features and the learned adaptive graph, Block (5) applies ChebNetII-based spectral filtering to operate in the graph frequency domain.
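As a rough illustration of the mechanism in Block (5), the sketch below applies a Chebyshev-polynomial spectral filter to node features over a symmetrically normalized graph Laplacian. This is a generic ChebNet-style filter with externally supplied coefficients, not the paper's exact ChebNetII block (which learns the coefficients via Chebyshev interpolation and adds channel-wise recalibration and nonlinear refinement on top).

```python
import numpy as np

def chebyshev_spectral_filter(X, A, coeffs):
    """Apply a Chebyshev-polynomial spectral graph filter.

    X: (N, F) node features; A: (N, N) adjacency; coeffs: (K+1,) filter weights.
    Illustrative sketch only -- not the paper's exact ChebNetII block.
    """
    N = A.shape[0]
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(N) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # Rescale the spectrum from [0, 2] to [-1, 1] (assuming lambda_max = 2)
    L_hat = L - np.eye(N)
    # Chebyshev recurrence: T_0 = X, T_1 = L_hat X, T_k = 2 L_hat T_{k-1} - T_{k-2}
    T_prev, T_curr = X, L_hat @ X
    out = coeffs[0] * T_prev + coeffs[1] * T_curr
    for k in range(2, len(coeffs)):
        T_prev, T_curr = T_curr, 2 * (L_hat @ T_curr) - T_prev
        out = out + coeffs[k] * T_curr
    return out
```

Setting all coefficients freely lets the filter realize low-pass, high-pass, or band-stop shapes over the Laplacian eigenvalues, which is the frequency-selective behavior Figures 4 and 5 refer to.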
Figure 5. Learned spectral filter responses from SpeQNet. Subfigures (a,d), (b,e), and (c,f), respectively, correspond to the PEMS07, Solar-Energy, and Weather datasets, where (a–c) visualize the filters learned at the first spectral filtering layer and (d–f) show those from the final spectral filtering layer. The solid blue curve denotes the learned spectral filter response as a function of the Laplacian eigenvalue λ, while the shaded region highlights the filter magnitude across the spectrum.
Figure 6. Sensitivity analysis of SpeQNet with respect to the Chebyshev polynomial order K on the Solar-Energy dataset.
Figure 7. Scaling behavior of SpeQNet with respect to the number of variates.
Figure 8. Learned vs. correlation graph comparison across domains. Each row corresponds to a dataset from a different domain: (a) PEMS07, (b) Electricity, (c) Solar-Energy, and (d) Weather. Columns show, from left to right: the learned adaptive graph, the correlation-based graph, their difference (learned minus correlation), and the node degree distribution of the learned graph.
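For reference, the correlation-based graph used as a comparison point in Figure 8 can be constructed by thresholding absolute Pearson correlations between variates. The sketch below is one common recipe; the threshold value is an arbitrary illustration, and the paper's exact construction may differ in thresholding or normalization.

```python
import numpy as np

def correlation_graph(series, threshold=0.5):
    """Build a simple correlation-based adjacency from a (T, N) multivariate series.

    Hypothetical baseline construction: connect variate pairs whose absolute
    Pearson correlation meets the threshold, with self-loops removed.
    """
    C = np.corrcoef(series.T)                    # (N, N) Pearson correlation matrix
    A = (np.abs(C) >= threshold).astype(float)   # binarize by threshold
    np.fill_diagonal(A, 0.0)                     # drop self-loops
    return A
```

A graph learned end-to-end for the forecasting objective, as in SpeQNet, can deviate substantially from such a fixed correlation graph, which is exactly the difference the third column of Figure 8 visualizes.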
Figure 9. Qualitative forecasting results on representative datasets. For each dataset, three variables are randomly selected and visualized from the same sample. Blue curves denote ground-truth observations, while red curves indicate SpeQNet predictions. The mean MAE for each variable over the prediction horizon is reported in the corresponding subplot title. Rows (a)–(d) correspond to the PEMS07, Electricity, Solar-Energy, and Weather datasets, respectively.
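The per-variable MAE values shown in the subplot titles of Figure 9 follow the standard definition: the absolute error averaged over the prediction horizon, computed separately for each variate. A minimal sketch (the (H, N) array layout is an assumption):

```python
import numpy as np

def per_variable_mae(y_true, y_pred):
    """Mean absolute error per variable over the forecast horizon.

    y_true, y_pred: (H, N) arrays with H horizon steps and N variates.
    Returns an (N,) vector of per-variate MAE values.
    """
    return np.mean(np.abs(y_true - y_pred), axis=0)
```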
Table 1. Dataset statistics.

| Task | Dataset | Variates | Frequency | Domain |
|---|---|---|---|---|
| Long-term Forecasting | ETTm1 | 7 | 15 min | Electricity |
| Long-term Forecasting | ETTm2 | 7 | 15 min | Electricity |
| Long-term Forecasting | ETTh1 | 7 | 1 h | Electricity |
| Long-term Forecasting | ETTh2 | 7 | 1 h | Electricity |
| Long-term Forecasting | Electricity | 321 | 1 h | Electricity |
| Long-term Forecasting | Traffic | 862 | 1 h | Transportation |
| Long-term Forecasting | Weather | 21 | 10 min | Weather |
| Long-term Forecasting | Solar-Energy | 137 | 10 min | Solar Power |
| Short-term Forecasting | PEMS03 | 358 | 5 min | Transportation |
| Short-term Forecasting | PEMS04 | 307 | 5 min | Transportation |
| Short-term Forecasting | PEMS07 | 883 | 5 min | Transportation |
| Short-term Forecasting | PEMS08 | 170 | 5 min | Transportation |
Table 2. Hyperparameter settings of SpeQNet across different datasets.

| Dataset | E | Blocks | Batch | LR |
|---|---|---|---|---|
| ETTh1 | 512 | 3 | 32 | 0.0001 |
| ETTh2 | 128 | 3 | 32 | 0.0001 |
| ETTm1 | 128 | 3 | 32 | 0.0001 |
| ETTm2 | 128 | 3 | 32 | 0.0001 |
| Electricity | 512 | 4 | 16 | 0.0005 |
| Solar-Energy | 512 | 3 | 32 | 0.0005 |
| Traffic | 512 | 5 | 16 | 0.001 |
| Weather | 512 | 4 | 64 | 0.0003 |
| PEMS03 | 512 | 5 | 32 | 0.001 |
| PEMS04 | 512 | 5 | 32 | 0.0005 |
| PEMS07 | 512 | 5 | 32 | 0.001 |
| PEMS08 | 512 | 5 | 32 | 0.0005 |
Table 3. Results for long-term multivariate time-series forecasting. All models use a fixed lookback length of T = 96. Baseline results are taken from original or authoritative subsequent publications, with the source of reported results cited in the table header. The row "# 1st" reports the number of times each method achieves the best average performance across all datasets, counted separately for MSE and MAE. In the original typeset table, best results are highlighted in red bold and second-best results in blue underlined. Each cell below reports MSE/MAE.

| Dataset | Horizon | SpeQNet | TimeXer (2024) [11] | CycleNet (2024) [35] | iTransformer (2024) [10] | MSGNet (2024) [44] | TimesNet (2023) [10] | PatchTST (2023) [10] | Crossformer (2023) [10] | DLinear (2023) [10] | SCINet (2022) [10] |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ETTh1 | 96 | 0.373/0.396 | 0.382/0.403 | 0.375/0.395 | 0.386/0.405 | 0.390/0.411 | 0.384/0.402 | 0.414/0.419 | 0.423/0.448 | 0.386/0.400 | 0.654/0.599 |
| ETTh1 | 192 | 0.428/0.423 | 0.429/0.435 | 0.436/0.428 | 0.441/0.436 | 0.443/0.442 | 0.436/0.429 | 0.460/0.445 | 0.471/0.474 | 0.437/0.432 | 0.719/0.631 |
| ETTh1 | 336 | 0.473/0.446 | 0.468/0.448 | 0.496/0.455 | 0.487/0.458 | 0.482/0.469 | 0.491/0.469 | 0.501/0.466 | 0.570/0.546 | 0.481/0.459 | 0.778/0.659 |
| ETTh1 | 720 | 0.509/0.483 | 0.469/0.461 | 0.520/0.484 | 0.503/0.491 | 0.496/0.488 | 0.521/0.500 | 0.500/0.488 | 0.653/0.621 | 0.519/0.516 | 0.836/0.699 |
| ETTh1 | Avg | 0.446/0.437 | 0.437/0.437 | 0.457/0.441 | 0.454/0.448 | 0.453/0.453 | 0.458/0.450 | 0.469/0.455 | 0.529/0.522 | 0.456/0.452 | 0.747/0.647 |
| ETTh2 | 96 | 0.263/0.329 | 0.286/0.338 | 0.298/0.344 | 0.297/0.349 | 0.329/0.371 | 0.340/0.374 | 0.302/0.348 | 0.745/0.584 | 0.333/0.387 | 0.707/0.621 |
| ETTh2 | 192 | 0.321/0.365 | 0.363/0.389 | 0.372/0.396 | 0.380/0.400 | 0.402/0.414 | 0.402/0.414 | 0.388/0.400 | 0.877/0.656 | 0.477/0.476 | 0.860/0.689 |
| ETTh2 | 336 | 0.365/0.396 | 0.414/0.423 | 0.431/0.439 | 0.428/0.432 | 0.440/0.445 | 0.452/0.452 | 0.426/0.433 | 1.043/0.731 | 0.594/0.541 | 1.000/0.744 |
| ETTh2 | 720 | 0.427/0.444 | 0.408/0.432 | 0.450/0.458 | 0.427/0.445 | 0.480/0.477 | 0.462/0.468 | 0.431/0.446 | 1.104/0.763 | 0.831/0.657 | 1.249/0.838 |
| ETTh2 | Avg | 0.344/0.384 | 0.368/0.396 | 0.388/0.409 | 0.383/0.407 | 0.413/0.427 | 0.414/0.427 | 0.387/0.407 | 0.942/0.684 | 0.559/0.515 | 0.954/0.723 |
| ETTm1 | 96 | 0.312/0.350 | 0.318/0.356 | 0.319/0.360 | 0.334/0.368 | 0.319/0.366 | 0.338/0.375 | 0.329/0.367 | 0.404/0.426 | 0.345/0.372 | 0.418/0.438 |
| ETTm1 | 192 | 0.358/0.373 | 0.362/0.383 | 0.360/0.381 | 0.377/0.391 | 0.377/0.397 | 0.374/0.387 | 0.367/0.385 | 0.450/0.451 | 0.380/0.389 | 0.439/0.450 |
| ETTm1 | 336 | 0.389/0.395 | 0.395/0.407 | 0.389/0.403 | 0.426/0.420 | 0.417/0.422 | 0.410/0.411 | 0.399/0.410 | 0.532/0.515 | 0.413/0.413 | 0.490/0.485 |
| ETTm1 | 720 | 0.457/0.434 | 0.452/0.441 | 0.447/0.441 | 0.491/0.459 | 0.487/0.463 | 0.478/0.450 | 0.454/0.439 | 0.666/0.589 | 0.474/0.453 | 0.595/0.550 |
| ETTm1 | Avg | 0.379/0.388 | 0.382/0.397 | 0.379/0.396 | 0.407/0.410 | 0.400/0.412 | 0.400/0.406 | 0.387/0.400 | 0.513/0.495 | 0.403/0.407 | 0.485/0.481 |
| ETTm2 | 96 | 0.175/0.256 | 0.171/0.256 | 0.163/0.246 | 0.180/0.264 | 0.182/0.266 | 0.187/0.267 | 0.175/0.259 | 0.287/0.366 | 0.193/0.292 | 0.286/0.377 |
| ETTm2 | 192 | 0.242/0.300 | 0.237/0.299 | 0.229/0.290 | 0.250/0.309 | 0.248/0.306 | 0.249/0.309 | 0.241/0.302 | 0.414/0.492 | 0.284/0.362 | 0.399/0.445 |
| ETTm2 | 336 | 0.298/0.336 | 0.296/0.338 | 0.284/0.327 | 0.311/0.348 | 0.312/0.346 | 0.321/0.351 | 0.305/0.343 | 0.597/0.542 | 0.369/0.427 | 0.637/0.591 |
| ETTm2 | 720 | 0.408/0.400 | 0.392/0.394 | 0.389/0.391 | 0.412/0.407 | 0.414/0.404 | 0.408/0.403 | 0.402/0.400 | 1.730/1.042 | 0.554/0.522 | 0.960/0.735 |
| ETTm2 | Avg | 0.281/0.323 | 0.274/0.322 | 0.266/0.314 | 0.288/0.332 | 0.289/0.330 | 0.291/0.333 | 0.281/0.326 | 0.757/0.611 | 0.350/0.401 | 0.571/0.537 |
| Electricity | 96 | 0.121/0.225 | 0.140/0.242 | 0.136/0.229 | 0.148/0.240 | 0.165/0.274 | 0.168/0.272 | 0.181/0.270 | 0.219/0.314 | 0.197/0.282 | 0.247/0.345 |
| Electricity | 192 | 0.141/0.242 | 0.157/0.256 | 0.152/0.244 | 0.162/0.253 | 0.185/0.292 | 0.184/0.289 | 0.188/0.274 | 0.231/0.322 | 0.196/0.285 | 0.257/0.255 |
| Electricity | 336 | 0.160/0.260 | 0.176/0.275 | 0.170/0.264 | 0.178/0.269 | 0.197/0.304 | 0.198/0.300 | 0.204/0.293 | 0.246/0.337 | 0.209/0.301 | 0.269/0.369 |
| Electricity | 720 | 0.196/0.290 | 0.211/0.306 | 0.212/0.299 | 0.225/0.317 | 0.231/0.332 | 0.220/0.320 | 0.246/0.324 | 1.730/1.042 | 0.245/0.333 | 0.299/0.390 |
| Electricity | Avg | 0.155/0.255 | 0.171/0.270 | 0.168/0.259 | 0.178/0.270 | 0.194/0.301 | 0.193/0.295 | 0.205/0.290 | 0.244/0.334 | 0.212/0.300 | 0.268/0.365 |
| Solar-Energy | 96 | 0.170/0.230 | 0.215/0.295 | 0.190/0.247 | 0.203/0.237 | 0.210/0.246 | 0.250/0.292 | 0.234/0.286 | 0.310/0.331 | 0.290/0.378 | 0.237/0.344 |
| Solar-Energy | 192 | 0.188/0.243 | 0.236/0.301 | 0.210/0.266 | 0.233/0.261 | 0.265/0.290 | 0.296/0.318 | 0.267/0.310 | 0.734/0.725 | 0.320/0.398 | 0.280/0.380 |
| Solar-Energy | 336 | 0.202/0.251 | 0.252/0.307 | 0.217/0.266 | 0.248/0.273 | 0.294/0.318 | 0.319/0.330 | 0.290/0.315 | 0.750/0.735 | 0.353/0.415 | 0.304/0.389 |
| Solar-Energy | 720 | 0.215/0.257 | 0.244/0.305 | 0.223/0.266 | 0.249/0.275 | 0.285/0.315 | 0.338/0.337 | 0.289/0.317 | 0.769/0.765 | 0.356/0.413 | 0.308/0.388 |
| Solar-Energy | Avg | 0.194/0.245 | 0.237/0.302 | 0.210/0.261 | 0.233/0.262 | 0.263/0.292 | 0.301/0.319 | 0.270/0.307 | 0.641/0.639 | 0.330/0.401 | 0.282/0.375 |
| Traffic | 96 | 0.426/0.249 | 0.428/0.271 | 0.458/0.296 | 0.395/0.268 | 0.608/0.349 | 0.593/0.321 | 0.462/0.290 | 0.522/0.290 | 0.650/0.396 | 0.788/0.499 |
| Traffic | 192 | 0.446/0.262 | 0.448/0.282 | 0.457/0.294 | 0.417/0.276 | 0.634/0.371 | 0.617/0.336 | 0.466/0.290 | 0.530/0.293 | 0.598/0.370 | 0.789/0.505 |
| Traffic | 336 | 0.469/0.277 | 0.473/0.289 | 0.470/0.299 | 0.433/0.283 | 0.669/0.388 | 0.629/0.336 | 0.482/0.300 | 0.558/0.305 | 0.605/0.373 | 0.797/0.508 |
| Traffic | 720 | 0.564/0.335 | 0.516/0.307 | 0.502/0.314 | 0.467/0.302 | 0.729/0.420 | 0.640/0.350 | 0.514/0.320 | 0.589/0.328 | 0.645/0.394 | 0.841/0.523 |
| Traffic | Avg | 0.476/0.281 | 0.466/0.287 | 0.472/0.301 | 0.428/0.282 | 0.660/0.382 | 0.620/0.336 | 0.481/0.300 | 0.550/0.304 | 0.625/0.383 | 0.804/0.509 |
| Weather | 96 | 0.158/0.199 | 0.157/0.205 | 0.158/0.203 | 0.174/0.214 | 0.163/0.212 | 0.172/0.220 | 0.177/0.210 | 0.158/0.230 | 0.196/0.255 | 0.221/0.306 |
| Weather | 192 | 0.207/0.251 | 0.204/0.247 | 0.207/0.247 | 0.221/0.254 | 0.211/0.254 | 0.219/0.261 | 0.225/0.250 | 0.206/0.277 | 0.237/0.296 | 0.261/0.340 |
| Weather | 336 | 0.262/0.290 | 0.261/0.290 | 0.262/0.289 | 0.278/0.296 | 0.273/0.299 | 0.280/0.306 | 0.278/0.290 | 0.272/0.335 | 0.283/0.335 | 0.309/0.378 |
| Weather | 720 | 0.347/0.343 | 0.340/0.341 | 0.344/0.344 | 0.358/0.349 | 0.351/0.348 | 0.365/0.359 | 0.354/0.340 | 0.398/0.418 | 0.345/0.381 | 0.377/0.427 |
| Weather | Avg | 0.243/0.271 | 0.241/0.271 | 0.243/0.271 | 0.258/0.278 | 0.249/0.278 | 0.259/0.287 | 0.259/0.273 | 0.259/0.315 | 0.265/0.317 | 0.292/0.363 |
| # 1st |  | 4/7 | 2/2 | 2/2 | 1/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
Table 4. Results for short-term multivariate time-series forecasting. All models use a fixed lookback length of T = 96. Baseline results are taken from original or authoritative subsequent publications, with the source of reported results cited in the table header. The row "# 1st" reports the number of times each method achieves the best average performance across all datasets, counted separately for MSE and MAE. In the original typeset table, best results are highlighted in red bold and second-best results in blue underlined. Each cell below reports MSE/MAE.

| Dataset | Horizon | SpeQNet | CycleNet (2024) [35] | iTransformer (2024) [10] | RLinear (2023) [10] | TiDE (2023) [10] | TimesNet (2023) [10] | PatchTST (2023) [10] | Crossformer (2023) [10] | DLinear (2023) [10] | SCINet (2022) [10] |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PEMS03 | 12 | 0.064/0.165 | 0.066/0.172 | 0.126/0.236 | 0.178/0.305 | 0.078/0.187 | 0.085/0.192 | 0.099/0.216 | 0.090/0.203 | 0.122/0.243 | 0.066/0.172 |
| PEMS03 | 24 | 0.078/0.182 | 0.089/0.201 | 0.093/0.201 | 0.246/0.334 | 0.257/0.371 | 0.118/0.223 | 0.142/0.259 | 0.121/0.240 | 0.201/0.317 | 0.085/0.198 |
| PEMS03 | 48 | 0.106/0.210 | 0.136/0.247 | 0.125/0.236 | 0.551/0.529 | 0.379/0.463 | 0.155/0.260 | 0.211/0.319 | 0.202/0.317 | 0.333/0.425 | 0.127/0.238 |
| PEMS03 | 96 | 0.131/0.235 | 0.182/0.282 | 0.164/0.275 | 1.057/0.787 | 0.490/0.539 | 0.269/0.370 | 0.228/0.317 | 0.262/0.367 | 0.457/0.515 | 0.178/0.287 |
| PEMS03 | Avg | 0.095/0.198 | 0.118/0.226 | 0.113/0.222 | 0.495/0.472 | 0.326/0.419 | 0.147/0.248 | 0.180/0.291 | 0.169/0.282 | 0.278/0.375 | 0.114/0.224 |
| PEMS04 | 12 | 0.069/0.173 | 0.078/0.186 | 0.078/0.183 | 0.138/0.252 | 0.219/0.340 | 0.087/0.195 | 0.105/0.224 | 0.098/0.218 | 0.148/0.272 | 0.073/0.177 |
| PEMS04 | 24 | 0.078/0.185 | 0.099/0.212 | 0.095/0.205 | 0.258/0.348 | 0.292/0.398 | 0.103/0.215 | 0.153/0.275 | 0.131/0.256 | 0.224/0.340 | 0.084/0.193 |
| PEMS04 | 48 | 0.097/0.205 | 0.133/0.248 | 0.120/0.233 | 0.572/0.544 | 0.409/0.478 | 0.136/0.250 | 0.229/0.339 | 0.205/0.326 | 0.355/0.437 | 0.099/0.211 |
| PEMS04 | 96 | 0.112/0.220 | 0.167/0.281 | 0.150/0.262 | 1.137/0.820 | 0.492/0.532 | 0.190/0.303 | 0.291/0.389 | 0.402/0.457 | 0.452/0.504 | 0.114/0.227 |
| PEMS04 | Avg | 0.089/0.196 | 0.119/0.232 | 0.111/0.221 | 0.526/0.491 | 0.353/0.437 | 0.129/0.241 | 0.195/0.307 | 0.209/0.314 | 0.295/0.388 | 0.093/0.202 |
| PEMS07 | 12 | 0.053/0.145 | 0.062/0.162 | 0.067/0.165 | 0.118/0.235 | 0.173/0.304 | 0.082/0.181 | 0.095/0.207 | 0.094/0.200 | 0.115/0.242 | 0.068/0.171 |
| PEMS07 | 24 | 0.062/0.154 | 0.086/0.192 | 0.088/0.190 | 0.242/0.341 | 0.271/0.383 | 0.101/0.204 | 0.150/0.262 | 0.139/0.247 | 0.210/0.329 | 0.119/0.225 |
| PEMS07 | 48 | 0.076/0.168 | 0.128/0.234 | 0.110/0.215 | 0.562/0.541 | 0.446/0.495 | 0.134/0.238 | 0.253/0.340 | 0.311/0.369 | 0.398/0.458 | 0.149/0.237 |
| PEMS07 | 96 | 0.100/0.191 | 0.176/0.268 | 0.139/0.245 | 1.096/0.795 | 0.628/0.577 | 0.181/0.279 | 0.346/0.404 | 0.396/0.442 | 0.594/0.553 | 0.141/0.234 |
| PEMS07 | Avg | 0.073/0.164 | 0.113/0.214 | 0.101/0.204 | 0.504/0.478 | 0.380/0.440 | 0.125/0.226 | 0.211/0.303 | 0.235/0.315 | 0.329/0.396 | 0.119/0.217 |
| PEMS08 | 12 | 0.070/0.168 | 0.082/0.185 | 0.079/0.182 | 0.133/0.247 | 0.227/0.343 | 0.112/0.212 | 0.168/0.232 | 0.165/0.214 | 0.154/0.276 | 0.087/0.184 |
| PEMS08 | 24 | 0.092/0.190 | 0.117/0.226 | 0.115/0.219 | 0.249/0.343 | 0.318/0.409 | 0.141/0.238 | 0.224/0.281 | 0.215/0.260 | 0.248/0.353 | 0.122/0.221 |
| PEMS08 | 48 | 0.141/0.230 | 0.169/0.268 | 0.186/0.235 | 0.569/0.544 | 0.497/0.510 | 0.198/0.283 | 0.321/0.354 | 0.315/0.355 | 0.440/0.470 | 0.189/0.270 |
| PEMS08 | 96 | 0.206/0.284 | 0.233/0.306 | 0.221/0.267 | 1.166/0.814 | 0.721/0.592 | 0.320/0.351 | 0.408/0.417 | 0.377/0.397 | 0.674/0.565 | 0.236/0.300 |
| PEMS08 | Avg | 0.127/0.218 | 0.150/0.246 | 0.150/0.226 | 0.529/0.487 | 0.441/0.464 | 0.193/0.271 | 0.280/0.321 | 0.268/0.307 | 0.379/0.416 | 0.159/0.244 |
| # 1st |  | 4/4 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
Table 5. Ablation study of SpeQNet. We evaluate the contribution of key architectural components by comparing the full model with four variants: w/o Query removes the spatiotemporal queries; Attn-Graph replaces adaptive graph learning with attention weights; Corr-Graph replaces adaptive graph learning with a fixed correlation-based graph; GCN replaces the spectral graph filtering block with a GCN. Experiments are conducted on the PEMS07, Solar-Energy, Electricity, and Weather datasets, spanning diverse domains and numbers of variates. Each cell below reports MSE/MAE.

| Dataset | Horizon | SpeQNet | w/o Query | Attn-Graph | Corr-Graph | GCN |
|---|---|---|---|---|---|---|
| Weather | 96 | 0.158/0.199 | 0.165/0.205 | 0.167/0.211 | 0.169/0.213 | 0.213/0.261 |
| Weather | 192 | 0.207/0.251 | 0.210/0.248 | 0.211/0.250 | 0.212/0.253 | 0.258/0.294 |
| Weather | 336 | 0.262/0.290 | 0.264/0.288 | 0.265/0.290 | 0.266/0.292 | 0.303/0.323 |
| Weather | 720 | 0.347/0.343 | 0.346/0.341 | 0.347/0.342 | 0.348/0.343 | 0.379/0.366 |
| Weather | Avg | 0.243/0.271 | 0.247/0.271 | 0.248/0.273 | 0.249/0.275 | 0.288/0.311 |
| Solar | 96 | 0.170/0.230 | 0.193/0.243 | 0.187/0.236 | 0.202/0.259 | 0.212/0.285 |
| Solar | 192 | 0.188/0.243 | 0.209/0.256 | 0.208/0.250 | 0.224/0.271 | 0.233/0.298 |
| Solar | 336 | 0.202/0.251 | 0.215/0.260 | 0.214/0.255 | 0.236/0.274 | 0.253/0.304 |
| Solar | 720 | 0.215/0.257 | 0.222/0.261 | 0.222/0.256 | 0.244/0.275 | 0.256/0.298 |
| Solar | Avg | 0.194/0.245 | 0.210/0.255 | 0.208/0.249 | 0.226/0.270 | 0.239/0.296 |
| Electricity | 96 | 0.121/0.225 | 0.129/0.229 | 0.126/0.229 | 0.127/0.232 | 0.185/0.287 |
| Electricity | 192 | 0.141/0.242 | 0.145/0.243 | 0.143/0.243 | 0.148/0.252 | 0.194/0.294 |
| Electricity | 336 | 0.160/0.260 | 0.162/0.259 | 0.162/0.261 | 0.176/0.278 | 0.212/0.309 |
| Electricity | 720 | 0.196/0.290 | 0.199/0.288 | 0.201/0.294 | 0.225/0.315 | 0.254/0.338 |
| Electricity | Avg | 0.155/0.255 | 0.159/0.255 | 0.158/0.257 | 0.169/0.269 | 0.211/0.307 |
| PEMS07 | 12 | 0.053/0.145 | 0.075/0.154 | 0.070/0.173 | 0.061/0.157 | 0.290/0.382 |
| PEMS07 | 24 | 0.062/0.154 | 0.084/0.165 | 0.081/0.185 | 0.070/0.168 | 0.297/0.387 |
| PEMS07 | 48 | 0.076/0.168 | 0.095/0.177 | 0.095/0.199 | 0.083/0.181 | 0.312/0.400 |
| PEMS07 | 96 | 0.100/0.191 | 0.107/0.188 | 0.114/0.214 | 0.097/0.193 | 0.322/0.404 |
| PEMS07 | Avg | 0.073/0.164 | 0.090/0.171 | 0.090/0.193 | 0.078/0.175 | 0.305/0.393 |

Note: Boldface is used to denote the best-performing results for each metric.
Table 6. Quantitative spectral response statistics across frequency bands. For each dataset, we report band-wise metrics for the first and last spectral filtering layers. Metrics include average gain (Gain), energy ratio (Energy, %), and gain variance (Var) computed over low-, mid-, and high-frequency bands of the learned spectral response. Frequency bands are defined over Laplacian eigenvalues λ ∈ [0, 2] as low ([0, 0.67]), mid ([0.67, 1.33]), and high ([1.33, 2.0]).

| Dataset | Layer | Low: Gain / Energy (%) / Var | Mid: Gain / Energy (%) / Var | High: Gain / Energy (%) / Var |
|---|---|---|---|---|
| PEMS07 | First | 0.82 / 29.4 / 0.10 | 0.37 / 7.7 / 0.06 | 1.11 / 62.4 / 0.32 |
| PEMS07 | Last | 1.06 / 32.9 / 0.07 | 0.14 / 2.8 / 0.09 | 1.29 / 64.0 / 0.59 |
| Solar | First | 0.88 / 25.5 / 0.13 | −0.05 / 1.40 / 0.05 | 1.37 / 73.0 / 0.63 |
| Solar | Last | 0.91 / 34.5 / 0.02 | 0.62 / 15.6 / 0.03 | 1.07 / 49.2 / 0.05 |
| Weather | First | 1.00 / 34.9 / 0.02 | 0.76 / 19.9 / 0.01 | 1.12 / 44.4 / 0.03 |
| Weather | Last | 1.05 / 38.0 / 0.04 | 0.64 / 13.9 / 0.03 | 1.17 / 47.3 / 0.06 |
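Band-wise summaries of this kind can be reproduced in spirit with a short routine over a sampled filter response. The exact estimator behind Table 6 (sampling grid, energy normalization) is not specified here, so the following is an assumed formulation: mean gain, share of squared-response energy, and gain variance per band.

```python
import numpy as np

def band_statistics(lams, h, bands=((0.0, 0.67), (0.67, 1.33), (1.33, 2.0))):
    """Band-wise stats of a spectral response h(lambda) sampled at points `lams`.

    Returns a list of (gain, energy_percent, variance) tuples, one per band.
    Assumed formulation -- the paper's exact estimator may differ.
    """
    total_energy = np.sum(h ** 2)
    stats = []
    for lo, hi in bands:
        # Half-open bands, except the last band which includes lambda = 2
        if hi < 2.0:
            m = (lams >= lo) & (lams < hi)
        else:
            m = (lams >= lo) & (lams <= hi)
        gain = float(np.mean(h[m]))                              # average gain
        energy = float(100.0 * np.sum(h[m] ** 2) / total_energy) # energy share (%)
        var = float(np.var(h[m]))                                # gain variance
        stats.append((gain, energy, var))
    return stats
```

Applied to a learned filter response, a low mid-band energy share together with elevated low- and high-band gains is what the paper describes as band-stop-like frequency shaping.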
Table 7. Computational efficiency comparison. We report model size, computational cost, runtime, and memory consumption for SpeQNet and representative Transformer-based baselines under identical hardware and software settings. Params (M) denotes the total number of trainable parameters in millions, GFLOPs denotes the number of giga floating-point operations required for a single forward pass, Train (ms) denotes the training time per iteration, Infer (ms) denotes the inference time per sample, and Memory (GB) denotes the peak GPU memory usage during training. Experiments are conducted on ETTm1 (7 variates) and Traffic (862 variates) with forecasting horizon H = 720.

Traffic (862 variates):

| Model | Params (M) | GFLOPs | Train (ms) | Infer (ms) | Memory (GB) |
|---|---|---|---|---|---|
| SpeQNet | 11.85 | 364.91 | 168.86 | 1.50 | 4.92 |
| iTransformer | 6.73 | 141.53 | 98.63 | 0.35 | 5.85 |
| PatchTST | 1.51 | 82.45 | 169.96 | 0.30 | 4.93 |

ETTm1 (7 variates):

| Model | Params (M) | GFLOPs | Train (ms) | Infer (ms) | Memory (GB) |
|---|---|---|---|---|---|
| SpeQNet | 0.59 | 0.13 | 61.41 | 0.45 | 0.04 |
| iTransformer | 0.30 | 0.07 | 12.95 | 0.09 | 0.16 |
| PatchTST | 1.51 | 1.34 | 16.52 | 0.14 | 0.13 |

Share and Cite

MDPI and ACS Style

Feng, Z.; Markov, K. SpeQNet: Query-Enhanced Spectral Graph Filtering for Spatiotemporal Forecasting. Appl. Sci. 2026, 16, 1176. https://doi.org/10.3390/app16031176

