Article

USTGCN: A Unified Spatio-Temporal Graph Convolutional Network for Stock-Ranking Prediction

1 College of Information Science and Technology, Jinan University, Guangzhou 510632, China
2 Big Data Decision Institute, Jinan University, Guangzhou 510632, China
3 School of Medicine, Jinan University, Guangzhou 510632, China
4 School of Management, Jinan University, Guangzhou 510632, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Electronics 2026, 15(6), 1317; https://doi.org/10.3390/electronics15061317
Submission received: 2 March 2026 / Revised: 17 March 2026 / Accepted: 20 March 2026 / Published: 21 March 2026

Abstract

Stock-ranking prediction is an important task in quantitative finance because it directly influences portfolio construction and alpha generation. Recent Graph Neural Network (GNN) models provide a promising way to describe inter-stock dependencies, but many existing methods still have difficulty balancing rapidly changing market interactions with relatively stable structural relationships. They are also easily affected by financial micro-structure noise. To address these issues, this paper proposes USTGCN, a Unified Spatio-Temporal Graph Convolutional Network for stock-ranking prediction. USTGCN adopts a dual-stream temporal encoder based on ALSTM and GRU to capture short-term dynamic patterns and longer-horizon structural information, respectively. We further introduce a rolling-window correlation smoothing strategy to build a more stable dynamic graph, and then integrate the dynamic and structural graph views through a shared fusion layer. Skip connections are used to preserve original temporal information during spatial aggregation. Experiments on the CSI100 and CSI300 benchmark datasets show that USTGCN achieves IC values of 0.141 and 0.154, respectively, and exhibits improved drawdown control during stressed market periods, indicating its practical value for quantitative trading.

1. Introduction

Stock-ranking prediction is a core task in quantitative finance because it supports data-driven portfolio construction and alpha discovery [1,2,3]. Rooted in the Efficient Market Hypothesis [4] and Modern Portfolio Theory [5], early studies often modeled financial assets independently and relied only on individual historical price series [6,7]. In real markets, however, stocks rarely evolve in isolation. Sector membership, supply-chain relations, and investor sentiment can all induce clear spatio-temporal dependencies among stocks [8,9,10]. Ignoring these links may therefore discard useful relational information.
This task remains difficult for two main reasons. First, traditional factor-based models usually rely on fixed-weight linear combinations of handcrafted alpha features, which adapt poorly to changing market regimes [11]. Second, financial time-series data are noisy and strongly non-stationary. Without suitable structural inductive biases, deep models can easily overfit transient market fluctuations [12]. Similar stability concerns have also been discussed in deep learning for other complex systems [13]. This motivates a framework that can capture persistent stock relations while remaining responsive to short-term changes.
The development of Graph Neural Networks (GNNs) [14,15] has provided a natural way to model stocks as nodes in a relational graph [16]. Spatio-temporal graph models have also shown strong performance in related domains [17,18], which motivates their use in financial prediction. Representative stock-oriented graph models include RSR [1], HIST [10], SDGNN [2], and DTSRN [3]. Among them, SDGNN jointly models static and dynamic graphs, but its static branch depends on stock ID embeddings, and its dynamic branch uses single-step correlations that are easily disturbed by temporary price noise. DTSRN avoids the ID-embedding issue, but discards static structural relations altogether. In addition, many existing approaches still rely on a single temporal encoder and use separate fusion mechanisms for different graph views, which increases architectural complexity and may weaken feature integration.
To address these limitations, we propose USTGCN, a Unified Spatio-Temporal Graph Convolutional Network for stock-ranking prediction. As shown in Figure 1, stock trajectories contain both rapidly changing co-movements and more persistent structural patterns. USTGCN models these temporal characteristics with complementary streams before graph construction and then fuses the resulting graph information through a unified fusion module. The main contribution is a task-oriented integration of temporal modeling, graph construction, and graph-view fusion for stock-ranking under noisy and evolving inter-stock relations.
Our primary contributions are summarized as follows:
  • We design a dual-stream temporal encoder that uses ALSTM to capture short-term volatility and GRU to summarize longer-term structural patterns.
  • We introduce a rolling-window correlation smoothing strategy to stabilize the dynamic graph and reduce the effect of transient micro-structure noise.
  • We build a shared fusion layer to aggregate dynamic and structural relations within one downstream prediction pipeline, while skip connections retain the original temporal features.

2. Related Work

2.1. Deep Temporal Models for Financial Time-Series

Early stock prediction studies relied heavily on classical econometric models such as ARIMA [19]. With the growth of deep learning, recurrent networks such as LSTM [20] and GRU [21] became common tools for modeling non-linear temporal dependence. These models were later improved by feature-fusion strategies [22], attention-based variants such as ALSTM [23], and more recent Transformer-based variants [24,25,26]. For a broader background on deep learning for financial time-series forecasting and recent graph-based stock forecasting literature, see [27,28]. Together, these studies form the temporal modeling background for this work.

2.2. Spatio-Temporal Graph Neural Networks

The success of GCNs [14] and GATs [29] also promoted the development of spatio-temporal GNNs, which first achieved strong results in applications such as traffic forecasting [17,30,31]. Models such as Graph WaveNet [18] and AGCRN [32] further showed that useful graph structures can be learned directly from data. In finance, treating stocks as interconnected nodes has improved predictive modeling [9,28]. Early graph-based approaches mainly relied on predefined relations, such as industry information [10] or explicit company links [16], while RSR [1] modeled relations through temporal response strength. However, purely static graphs are often not flexible enough for rapidly changing markets.

2.3. Dynamic Graph Learning and Temporal Fusion

Dynamic graph learning has therefore attracted increasing attention in evolving environments [33,34]. In finance, TRAN [35] introduces time-aware relational attention, while FinGAT [36] and HyperStockGAT [37] explore more expressive graph structures. Hybrid approaches have also attempted to combine static and dynamic views [38]. SDGNN [2] jointly learns both graphs but relies on noisy ID embeddings, whereas DTSRN [3] removes static relations altogether. Matsunaga et al. [9] further emphasized the usefulness of rolling windows in financial graph construction. These observations motivate the design choices adopted in USTGCN.

3. Preliminary

3.1. Problem Formulation

Let the investable stock universe be a set of $N$ stocks $S = \{s_1, s_2, \ldots, s_N\}$. At any given trading day $t$, each stock $s_i$ is associated with a historical multidimensional time series $X_i^t \in \mathbb{R}^{T \times F}$, where $T = 60$ denotes the lookback window and $F$ is the number of observed daily trading features (e.g., open, close, high, low, volume, VWAP).
The fundamental objective of stock-ranking prediction is to accurately estimate the relative order of stocks based on their future profitability, so as to facilitate algorithmic portfolio construction. Specifically, we define the target variable as the realized return of stock $i$ at time $t+1$:

$$y_i^{t+1} = \frac{\mathrm{Close}_i^{t+1} - \mathrm{Close}_i^{t}}{\mathrm{Close}_i^{t}}.$$
The core learning task is to optimize a neural mapping function $f$ that translates historical multidimensional data into predicted future returns:

$$\hat{y}^{t+1} = f(X^t).$$
Finally, the equities are sorted in descending order of $\hat{y}^{t+1}$, and portfolios are formed by taking long positions in the top-ranked assets.
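As a concrete illustration, the return definition and descending-order ranking above can be sketched in NumPy (all prices below are hypothetical toy values):

```python
import numpy as np

# Hypothetical close prices for N = 4 stocks at days t and t+1.
close_t  = np.array([10.0, 20.0, 5.0, 8.0])
close_t1 = np.array([11.0, 19.0, 5.5, 8.0])

# Realized next-day return: (Close^{t+1} - Close^t) / Close^t.
y_next = (close_t1 - close_t) / close_t   # [0.1, -0.05, 0.1, 0.0]

# Rank stocks in descending order of return and go long the top K.
order = np.argsort(-y_next)               # best-performing stocks first
top_k = order[:2]                         # long positions on stocks 0 and 2
```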

3.2. Graph Construction Paradigm

To mathematically represent inter-stock dependencies, we formalize the financial market as a set of nodes within two complementary graph topologies:
  • Dynamic Graph: Designed to encapsulate transient and highly reactive inter-stock correlations. It is dynamically generated via the cosine similarity of short-horizon latent representations extracted from recent temporal patterns.
  • Static Graph: Designed to capture relatively stable structural dependencies. It is derived from the cosine similarity of longer-horizon latent representations encoded via a separate recurrent network and is refreshed at each trading day, but typically evolves more smoothly than the dynamic graph.

4. Methodology

As illustrated in Figure 2, the operational pipeline of USTGCN comprises four synergistic modules: dual-stream temporal encoding, robust graph construction, unified spatio-temporal fusion, and ranking prediction.

4.1. Dual-Stream Temporal Encoding

Given the raw input tensor $X^t \in \mathbb{R}^{N \times T \times F}$ of observed trading features (rather than latent node embeddings), a monolithic sequential encoder inherently struggles to isolate high-frequency volatility from low-frequency structural trends. We therefore map the raw inputs into two learned latent representations with embedding dimension $d = 128$:
  • Dynamic Features ($H^{\mathrm{dyn}}$): To capture transient market shocks and short-term sequential patterns, we deploy an Attention-augmented LSTM (ALSTM). The attention mechanism allows the network to assign varying importance weights to informative trading days, and attention-based LSTM models have been used in stock forecasting to highlight important time steps [23,39], which makes this branch well suited to emphasize abrupt short-horizon fluctuations:

    $$H^{\mathrm{dyn}} = \mathrm{ALSTM}(X^t) \in \mathbb{R}^{N \times d}.$$
  • Static Features ($H^{\mathrm{stat}}$): Conversely, to extract robust, longer-horizon structural characteristics that are less sensitive to daily noise, we utilize a standard Gated Recurrent Unit (GRU). GRU-based recurrent modeling provides a compact gated summary over sequence histories [21], and GRU has also been adopted for relatively longer prediction windows in stock trading applications [40]. By extracting the final hidden state over the whole lookback window, we obtain a smoother historical summary for each asset:

    $$H^{\mathrm{stat}} = \mathrm{GRU}(X^t)[:, -1, :] \in \mathbb{R}^{N \times d}.$$
This ALSTM/GRU assignment is motivated by their different inductive biases. Attention-based LSTM models can emphasize informative time steps in stock sequences [39], whereas GRU-based modeling has also been used for relatively longer prediction windows in trading applications [40]. In our setting, ALSTM is therefore used for the dynamic stream, while GRU provides a comparatively smoother summary over the whole lookback window for the structural stream. Section 5 summarizes an encoder-assignment experiment showing that the default ALSTM/GRU pairing attains the strongest overall performance, whereas homogeneous encoder choices are consistently weaker, supporting the complementarity of the two streams.

4.2. Robust Graph Construction

The raw, instantaneous dynamic adjacency matrix at time $t$ is computed via pairwise cosine similarity of the dynamic features:

$$A_{ij}^{\mathrm{dyn},t} = \frac{(h_i^{\mathrm{dyn}})^\top h_j^{\mathrm{dyn}}}{\lVert h_i^{\mathrm{dyn}} \rVert \, \lVert h_j^{\mathrm{dyn}} \rVert}.$$
In high-frequency financial data, relying solely on $A^{\mathrm{dyn},t}$ introduces severe micro-structure noise, leading to erratic graph topologies. To enforce temporal consistency and stabilize the graph, we apply a rolling-window average over the preceding $W = 20$ days:

$$\bar{A}^{\mathrm{dyn},t} = \frac{1}{W} \sum_{\tau = t-W+1}^{t} A^{\mathrm{dyn},\tau}.$$
We adopt a simple moving average because it suppresses transient correlation spikes without introducing an additional decay hyperparameter. In parallel, the structural graph A stat , t is recomputed at each trading day from H stat . Because H stat summarizes longer-horizon behavior, A stat , t typically evolves more slowly than the dynamic graph and serves as a relatively stable relational anchor without relying on stock ID embeddings:
$$A_{ij}^{\mathrm{stat},t} = \frac{(h_i^{\mathrm{stat}})^\top h_j^{\mathrm{stat}}}{\lVert h_i^{\mathrm{stat}} \rVert \, \lVert h_j^{\mathrm{stat}} \rVert}.$$
During both training and inference, the two graph views are refreshed for each trading day using only information available up to day $t$; the distinction lies in temporal sensitivity rather than whether the graph is updated. To ensure numerical stability during message passing, both adjacency matrices undergo row-wise softmax normalization, yielding the stochastic matrices $\Lambda^{\mathrm{dyn},t}$ and $\Lambda^{\mathrm{stat},t}$.
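A minimal NumPy sketch of this graph-construction step follows (cosine-similarity adjacency, rolling-window smoothing, and row-wise softmax); the random embeddings below merely stand in for the encoder outputs:

```python
import numpy as np

def cosine_adjacency(h):
    """Pairwise cosine similarity of row embeddings h with shape (N, d)."""
    h_unit = h / np.clip(np.linalg.norm(h, axis=1, keepdims=True), 1e-12, None)
    return h_unit @ h_unit.T

def row_softmax(a):
    """Row-wise softmax normalization, yielding a row-stochastic matrix."""
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
W = 20  # smoothing window, matching the paper's setting

# One adjacency per trading day, built from that day's dynamic embeddings.
daily_adj = [cosine_adjacency(rng.normal(size=(5, 8))) for _ in range(W)]

# Rolling-window average over the preceding W days, then normalization.
a_smooth = np.mean(daily_adj, axis=0)
lam_dyn = row_softmax(a_smooth)   # each row sums to 1
```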

4.3. Unified Spatio-Temporal Fusion

USTGCN uses a unified fusion module to combine information from the dynamic and structural graph views. We first propagate the temporally specialized embeddings across their corresponding graph topologies:

$$\tilde{H}^{\mathrm{dyn}} = \Lambda^{\mathrm{dyn},t} H^{\mathrm{dyn}}, \qquad \tilde{H}^{\mathrm{stat}} = \Lambda^{\mathrm{stat},t} H^{\mathrm{stat}}.$$
The propagated dynamic and structural representations are then integrated within a shared fusion stage through a linear transformation matrix $W \in \mathbb{R}^{2d \times d}$:

$$H^{\mathrm{fused}} = [\tilde{H}^{\mathrm{dyn}} \,\Vert\, \tilde{H}^{\mathrm{stat}}] \, W,$$

where $\Vert$ denotes the matrix concatenation operation.
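In a NumPy sketch, the two propagation steps and the shared fusion layer reduce to the following (with placeholder uniform graphs and a random fusion matrix standing in for learned parameters):

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 5, 8
h_dyn  = rng.normal(size=(N, d))   # dynamic-stream embeddings
h_stat = rng.normal(size=(N, d))   # structural-stream embeddings

# Placeholder row-stochastic graphs (uniform attention over all stocks).
lam_dyn  = np.full((N, N), 1.0 / N)
lam_stat = np.full((N, N), 1.0 / N)

# One message-passing step per graph view.
h_dyn_prop  = lam_dyn @ h_dyn
h_stat_prop = lam_stat @ h_stat

# Shared fusion: concatenate along features, project with W in R^{2d x d}.
w_fuse = rng.normal(size=(2 * d, d))
h_fused = np.concatenate([h_dyn_prop, h_stat_prop], axis=1) @ w_fuse
print(h_fused.shape)  # (5, 8): one fused d-dimensional embedding per stock
```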

4.4. Ranking Prediction

Deep graph convolutions inevitably risk over-smoothing, potentially erasing idiosyncratic stock signatures critical for precise ranking. To explicitly preserve the fidelity of original temporal signals, we establish a dense skip connection encompassing both initial feature spaces:
$$H^{\mathrm{skip}} = [H^{\mathrm{dyn}} \,\Vert\, H^{\mathrm{stat}}].$$
The final alpha-score predictions are generated by routing the concatenated fused and skip tensors through a 4-layer Multi-Layer Perceptron (MLP):

$$\hat{y}^{t+1} = \mathrm{MLP}\big([H^{\mathrm{fused}} \,\Vert\, H^{\mathrm{skip}}]\big),$$

configured with a dimensionality-reduction cascade ($3d \to d \to d/2 \to d/4 \to 1$) using ReLU activations.
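With $d = 128$, the concatenated fused and skip tensors have width $3d = 384$, and the reduction cascade can be sketched as follows (the helper name and random placeholder weights are illustrative, not the authors' implementation):

```python
import numpy as np

def mlp_cascade(x, dims=(384, 128, 64, 32, 1), seed=0):
    """4-layer MLP implementing 3d -> d -> d/2 -> d/4 -> 1 with ReLU on the
    hidden layers; weights are random placeholders and biases are omitted."""
    rng = np.random.default_rng(seed)
    for i in range(len(dims) - 1):
        x = x @ rng.normal(scale=0.05, size=(dims[i], dims[i + 1]))
        if i < len(dims) - 2:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers only
    return x

scores = mlp_cascade(np.ones((5, 384)))   # one alpha score per stock
print(scores.shape)  # (5, 1)
```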
To optimize for the relative ordering of assets rather than absolute scalar regression, we deploy the scale-invariant RankIC loss:

$$\mathcal{L}_{\mathrm{RankIC}} = -\frac{\sum_i (\hat{y}_i - \bar{\hat{y}})(y_i - \bar{y})}{N \cdot \sigma_{\hat{y}} \cdot \sigma_{y}},$$

where $\bar{\hat{y}}$, $\bar{y}$ and $\sigma_{\hat{y}}$, $\sigma_{y}$ denote the sample means and standard deviations of the predicted alpha scores and the ground-truth returns, respectively. Minimizing this objective directly maximizes the Pearson correlation between predictions and true market outcomes in rank space.
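A NumPy version of this loss, i.e., the negative cross-sectional Pearson correlation so that gradient descent pushes the correlation upward, can be sketched as:

```python
import numpy as np

def rank_ic_loss(y_pred, y_true):
    """Negative cross-sectional Pearson correlation between predicted
    scores and realized returns; minimizing it maximizes the correlation."""
    yp = y_pred - y_pred.mean()
    yt = y_true - y_true.mean()
    n = y_pred.shape[0]
    return -np.sum(yp * yt) / (n * y_pred.std() * y_true.std())

y = np.array([0.03, -0.01, 0.02, 0.00])
print(rank_ic_loss(y, y))    # close to -1.0: perfectly aligned predictions
print(rank_ic_loss(-y, y))   # close to +1.0: perfectly inverted predictions
```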

5. Experiments

5.1. Datasets and Experimental Setup

We evaluate USTGCN on two Chinese A-share benchmark datasets, CSI100 and CSI300. Consistent with SDGNN and DTSRN, we use 2007–2014 for training, 2015–2016 for validation, and 2017–2020 for out-of-sample testing. Each stock is represented by six raw trading features, namely open, close, high, low, volume, and VWAP, over a lookback window of T = 60 .

5.2. Baseline Methods

We compare USTGCN with 11 representative baselines across three categories:
  • Traditional ML: MLP, SFM.
  • Deep Sequence Models: GRU, LSTM, ALSTM, Transformer, ALSTM+TRA.
  • Advanced Graph Models: GATs, HIST, SDGNN, DTSRN.

5.3. Evaluation Metrics

To rigorously quantify both the ranking accuracy and the practical trading viability of the models, we evaluate performance from two dimensions: predictive metrics and portfolio backtesting metrics.

5.3.1. Predictive Metrics

To assess the cross-sectional ranking capability of the models at each trading day t, we employ three standard predictive indicators. Let N denote the total number of valid stocks in the daily cross-section, while y i and y ^ i represent the true return and the model’s predicted alpha score for stock i, respectively (the time superscript t + 1 is omitted for brevity).
  • Information Coefficient (IC): the cross-sectional Pearson correlation coefficient between the predicted scores and the actual returns. A higher IC indicates stronger linear predictive power:

    $$\mathrm{IC} = \frac{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2} \, \sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}},$$

    where $\bar{\hat{y}}$ and $\bar{y}$ are the cross-sectional means of the predictions and true returns.
  • Rank Information Coefficient (Rank IC): Spearman's rank correlation coefficient, which measures the monotonic relationship by replacing continuous values with their ordinal ranks $r(\hat{y}_i)$ and $r(y_i)$:

    $$\mathrm{Rank\ IC} = \frac{\sum_{i=1}^{N} (r(\hat{y}_i) - \bar{r})(r(y_i) - \bar{r})}{\sqrt{\sum_{i=1}^{N} (r(\hat{y}_i) - \bar{r})^2} \, \sqrt{\sum_{i=1}^{N} (r(y_i) - \bar{r})^2}},$$

    where $\bar{r} = (N+1)/2$ is the mean of the ranks. Compared with the standard IC, Rank IC is significantly more robust to extreme financial outliers and directly evaluates the effectiveness of the portfolio sorting order.
  • Precision@N (P@N): the accuracy of the top algorithmic recommendations, defined as the proportion of the actual top-$N$ most profitable stocks that appear within the model's top-$N$ predictions:

    $$\mathrm{P@N} = \frac{|\mathrm{Top}_N(\hat{y}) \cap \mathrm{Top}_N(y)|}{N}.$$
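These three predictive metrics can be computed directly in NumPy; the cross-section below is a toy example, and a production implementation would also handle tied ranks:

```python
import numpy as np

def ordinal_ranks(x):
    """Ranks 1..N (ties broken by position; sufficient for this sketch)."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r

def ic(y_pred, y_true):
    """Cross-sectional Pearson correlation of predictions vs. returns."""
    return np.corrcoef(y_pred, y_true)[0, 1]

def rank_ic(y_pred, y_true):
    """Spearman correlation: Pearson correlation of the ordinal ranks."""
    return np.corrcoef(ordinal_ranks(y_pred), ordinal_ranks(y_true))[0, 1]

def precision_at_n(y_pred, y_true, n):
    """Overlap between predicted and actual top-n stock sets."""
    top_pred = set(np.argsort(-y_pred)[:n])
    top_true = set(np.argsort(-y_true)[:n])
    return len(top_pred & top_true) / n

y_true = np.array([0.05, 0.01, -0.02, 0.03])   # toy realized returns
y_pred = np.array([0.90, 0.20, 0.10, 0.50])    # toy predicted scores
# These predictions order the stocks perfectly, so Rank IC = 1 and P@2 = 1.
```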

5.3.2. Portfolio Backtesting Metrics

For algorithmic trading evaluation, we establish a daily rebalanced Long-Top-$K$ portfolio ($K = 30$). Let $R_{s,t}$ and $R_{b,t}$ denote the daily returns of the strategy and the benchmark index at day $t$, respectively. Assuming 252 trading days per year and a total of $D$ trading days in the backtest period, the quantitative metrics are defined as follows:
  • Annualized Return (Ann. Ret.): the geometric average of annualized wealth accumulation. Given the cumulative return $R_{\mathrm{cum}} = \prod_{t=1}^{D} (1 + R_{s,t}) - 1$, the annualized return is calculated as:

    $$\mathrm{Ann.\ Ret.} = (1 + R_{\mathrm{cum}})^{252/D} - 1.$$
  • Maximum Drawdown (Max DD): measures downside risk by computing the maximum observed loss from a historical peak to a trough of the portfolio's net asset value (NAV):

    $$\mathrm{Max\ DD} = \min_{t \in [1, D]} \frac{\mathrm{NAV}_t - \max_{\tau \le t} \mathrm{NAV}_\tau}{\max_{\tau \le t} \mathrm{NAV}_\tau},$$

    where $\mathrm{NAV}_t = \prod_{\tau=1}^{t} (1 + R_{s,\tau})$.
  • Sharpe Ratio (Sharpe): evaluates risk-adjusted performance. Setting the risk-free rate to zero for strict daily evaluation, it is computed by scaling the daily ratio by $\sqrt{252}$:

    $$\mathrm{Sharpe} = \frac{E[R_s]}{\sigma(R_s)} \times \sqrt{252},$$

    where $E[R_s]$ and $\sigma(R_s)$ are the sample mean and standard deviation of the daily strategy returns.
  • Information Ratio (IR): measures the active return of the strategy relative to its tracking error against the benchmark:

    $$\mathrm{IR} = \frac{E[R_s - R_b]}{\sigma(R_s - R_b)} \times \sqrt{252}.$$
  • Annualized Alpha ($\alpha$): gauges excess profitability independent of systematic market movements (beta). We estimate the daily alpha $\alpha_{\mathrm{daily}}$ via Ordinary Least Squares (OLS) regression, $R_{s,t} = \alpha_{\mathrm{daily}} + \beta R_{b,t} + \epsilon_t$, and annualize it as:

    $$\mathrm{Alpha} = \alpha_{\mathrm{daily}} \times 252.$$
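Under the definitions above, the return-based backtest metrics can be sketched in NumPy over a toy daily return series:

```python
import numpy as np

def backtest_metrics(r_s, periods=252):
    """Annualized return, maximum drawdown, and Sharpe ratio
    (risk-free rate set to zero) from daily strategy returns r_s."""
    nav = np.cumprod(1.0 + r_s)                      # net asset value path
    ann_ret = nav[-1] ** (periods / len(r_s)) - 1.0  # (1 + R_cum)^(252/D) - 1
    running_peak = np.maximum.accumulate(nav)
    max_dd = np.min((nav - running_peak) / running_peak)
    sharpe = r_s.mean() / r_s.std() * np.sqrt(periods)
    return ann_ret, max_dd, sharpe

r_s = np.array([0.01, -0.02, 0.015, 0.0, 0.005])  # toy daily returns
ann, dd, sharpe = backtest_metrics(r_s)           # dd = -0.02 here
```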

5.4. Implementation Details and Hyperparameter Configuration

The proposed USTGCN framework is implemented using PyTorch (v2.4.1+cu121). To ensure reproducibility and provide transparent structural details, the key hyperparameters governing the data inputs, model architecture, and optimization process are summarized in Table 1.
The uniform hidden size $d$ is set to 128 based on empirical sensitivity analysis. In the dynamic graph generation module, the window size is set to $W = 20$ based on the same analysis, which provides a favorable balance between noise suppression and responsiveness to market changes. During optimization, model parameters are updated with the Adam optimizer at a learning rate of $2 \times 10^{-4}$. To prevent overfitting on highly stochastic financial time-series data, training runs for at most 100 epochs, with early-stopping patience of 30 epochs based on the validation set. Furthermore, batch processing is executed on a daily cross-sectional basis to preserve the full cross-sectional structure.

5.5. Results and Quantitative Analysis

Predictive Performance: As detailed in Table 2, USTGCN achieves the best overall results across the reported evaluation metrics (IC, Rank IC, and Top-N Precisions). The performance gains are also somewhat larger on the CSI300 dataset, suggesting that the proposed design remains effective when the stock universe becomes larger and relation patterns become more heterogeneous.

5.6. Ablation Studies

To quantify the contribution of individual design choices, we conducted component ablations (Table 3) and an encoder-assignment study (Table 4). Removing the rolling window (USTGCN-RW) noticeably degrades IC metrics, indicating the interference caused by micro-structure noise. Replacing the unified fusion module (USTGCN-UC) or removing the dual-stream design (USTGCN-DS) also leads to clear performance drops, indicating the contribution of temporally specialized encoders and shared dual-graph fusion. The drop from USTGCN to USTGCN-UC is larger on CSI300 than on CSI100, especially in IC and Rank IC, which indicates that the shared fusion stage plays a larger role when the stock universe is larger and the cross-stock relation patterns are more heterogeneous. Removing residual skip connections (USTGCN-SK) causes a further decline, indicating that preserving stock-specific temporal signals is important for ranking quality.
Table 4 further examines the ALSTM/GRU allocation. The default configuration attains the best overall IC/Rank IC on both datasets. Swap-1 remains competitive on CSI300, which indicates that the two recurrent encoders can both contribute useful information. However, the homogeneous settings (GRU/GRU and ALSTM/ALSTM) consistently underperform, indicating that complementary encoder choices are preferable to identical dual branches.

5.7. Hyperparameter Sensitivity

To examine the sensitivity of USTGCN to several key hyperparameters, we monitored the variations of Test IC and Rank IC on both CSI100 and CSI300 while varying the hidden size, the learning rate, and the window size (Figure 3).
Impact of Hidden Size: Figure 3a shows that performance on both datasets peaks around d = 128 . A smaller hidden size (e.g., 64) appears to limit representation capacity, while larger sizes such as 256 or 512 are associated with weaker out-of-sample performance, which is consistent with overfitting to noisy market fluctuations.
Impact of Learning Rate: Figure 3b shows that performance is strongest around $2 \times 10^{-4}$. Lower learning rates lead to slower convergence, while a larger rate such as $10^{-3}$ is associated with noticeably weaker ranking performance.
Impact of Window Size: Figure 3c shows that very short windows retain more day-to-day noise, while overly long windows can oversmooth time-varying relations and reduce responsiveness to market changes. Both datasets achieve their best or near-best results around W = 20 , which is therefore adopted in all experiments. We use a simple moving average because it stabilizes correlation estimates without introducing an additional decay parameter; exploring exponentially weighted smoothing is left for future work. The similar hyperparameter preferences observed on CSI100 and CSI300 suggest that the model behavior is reasonably stable across the two datasets.

5.8. Practical Backtest Performance and Risk Management

The practical value of a financial model is also reflected in trading performance. As visualized in Figure 4 and Figure 5, USTGCN achieves competitive cumulative returns in a daily rebalanced Long-Top-30 portfolio scheme spanning from 2017 to 2020. The drawdown analyses in Figure 6 and Figure 7 also indicate comparatively stable risk behavior. During severe systemic market downturns (e.g., Q1 2020), USTGCN shows shallower drawdown trajectories than several baselines.
To assess quantitative trading viability, Table 5 reports five backtesting metrics. USTGCN performs competitively across these dimensions:
1. Alpha Generation: USTGCN yields Annualized Returns of 20.75% (CSI100) and 31.16% (CSI300). The corresponding Annualized Alpha values of 7.09% and 16.36% indicate that the model captures excess returns beyond passive market exposure.
2. Risk-Adjusted Performance: While achieving strong returns, USTGCN also obtains the highest Sharpe Ratios (1.117 and 1.396) and Information Ratios (0.632 and 1.538). The Maximum Drawdowns are −17.48% and −20.65%, indicating competitive risk-adjusted profitability.

6. Conclusions

In this work, we proposed USTGCN, a Unified Spatio-Temporal Graph Convolutional Network for stock-ranking prediction. The model combines a dual-stream temporal encoder, rolling-window dynamic graph smoothing, and a unified dual-graph fusion layer with skip connections to capture both short-term market variation and longer-term structural information. Experimental results on the CSI100 and CSI300 benchmarks show that USTGCN improves ranking performance and delivers competitive backtest results with favorable drawdown behavior. The results also show that a compact design tailored to stock-ranking can achieve strong performance under noisy and evolving inter-stock relations. In future work, we plan to incorporate broader external information, such as macroeconomic indicators and news signals, to further enrich graph construction.

Author Contributions

Conceptualization, W.Y., L.G. and X.Z.; methodology, W.Y. and L.G.; software, W.Y.; validation, W.Y., L.G. and H.C.; formal analysis, W.Y.; writing—original draft preparation, W.Y. and L.G.; writing—review and editing, X.Z., M.L. and Y.H.; supervision, X.Z., M.L. and Y.H.; funding acquisition, Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 72371116) and the Guangdong Engineering Technology Research Center for Big Data Precision Healthcare (Grant No. 603141789047).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data are available from Qlib (https://github.com/microsoft/qlib) (accessed on 15 January 2026). The preprocessed data and code used in this study are not publicly available due to confidentiality restrictions. Requests regarding research materials should be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Feng, F.; He, X.; Wang, X.; Luo, C.; Liu, Y.; Chua, T. Temporal Relational Ranking for Stock Prediction. ACM Trans. Inf. Syst. 2019, 37, 27:1–27:30. [Google Scholar] [CrossRef]
  2. He, Y.; Li, Q.; Wu, F.; Gao, J. Static-dynamic graph neural network for stock recommendation. In 34th International Conference on Scientific and Statistical Database Management; Association for Computing Machinery: New York, NY, USA, 2022; pp. 1–4. [Google Scholar] [CrossRef]
  3. Zhong, Y.; Chen, J.; Gao, J.; Wang, J.; Wan, Q. DTSRN: Dynamic Temporal Spatial Relation Network for Stock Ranking Recommendation. In International Conference on Neural Information Processing; Springer: Cham, Switzerland, 2023; pp. 346–359. [Google Scholar] [CrossRef]
  4. Fama, E.F. Efficient capital markets: A review of theory and empirical work. J. Financ. 1970, 25, 383–417. [Google Scholar] [CrossRef]
  5. Markowitz, H.M. Portfolio Selection. J. Financ. 1952, 7, 77–91. [Google Scholar] [CrossRef]
  6. Baek, Y.; Kim, H.Y. ModAugNet: A new forecasting framework for stock market index value with an overfitting prevention LSTM module and a prediction LSTM module. Expert Syst. Appl. 2018, 113, 457–480. [Google Scholar] [CrossRef]
  7. Patel, J.; Shah, S.; Thakkar, P.; Kotecha, K. Predicting stock market index using fusion of machine learning techniques. Expert Syst. Appl. 2015, 42, 2162–2172. [Google Scholar] [CrossRef]
  8. Bathla, G. Stock Price prediction using LSTM and SVR. In 2020 Sixth International Conference on Parallel, Distributed and Grid Computing (PDGC); IEEE: Piscataway, NJ, USA, 2020; pp. 211–214. [Google Scholar] [CrossRef]
  9. Matsunaga, D.; Suzumura, T.; Takahashi, T. Exploring Graph Neural Networks for Stock Market Predictions with Rolling Window Analysis. arXiv 2019, arXiv:1909.10660. [Google Scholar] [CrossRef]
  10. Xu, W.; Liu, W.; Wang, L.; Xia, Y.; Bian, J.; Yin, J.; Liu, T.Y. HIST: A graph-based framework for stock trend forecasting via mining concept-oriented shared information. arXiv 2021, arXiv:2110.13716. [Google Scholar] [CrossRef]
  11. Shi, H.; Song, W.; Zhang, X.; Shi, J.; Luo, C.; Ao, X.; Arian, H.; Seco, L.A. Alphaforge: A framework to mine and dynamically combine formulaic alpha factors. Proc. AAAI Conf. Artif. Intell. 2025, 39, 12524–12532. [Google Scholar] [CrossRef]
  12. Duan, Y.; Wang, W.; Li, J. FactorGCL: A Hypergraph-Based Factor Model with Temporal Residual Contrastive Learning for Stock Returns Prediction. Proc. AAAI Conf. Artif. Intell. 2025, 39, 173–181. [Google Scholar] [CrossRef]
  13. Noorizadegan, A.; Cavoretto, R.; Young, D.L.; Chen, C.S. Stable weight updating: A key to reliable PDE solutions using deep learning. Eng. Anal. Bound. Elem. 2024, 168, 105933. [Google Scholar] [CrossRef]
  14. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar] [CrossRef]
  15. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. AI Open 2020, 1, 57–81. [Google Scholar] [CrossRef]
  16. Chen, Y.; Wei, Z.; Huang, X. Incorporating corporation relationship via graph convolutional neural networks for stock price prediction. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management; Association for Computing Machinery: New York, NY, USA, 2018; pp. 1655–1658.
  17. Yu, B.; Yin, H.; Zhu, Z. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv 2017, arXiv:1709.04875.
  18. Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Zhang, C. Graph WaveNet for deep spatial-temporal graph modeling. arXiv 2019, arXiv:1906.00121.
  19. Ariyo, A.A.; Adewumi, A.O.; Ayo, C.K. Stock price prediction using the ARIMA model. In Proceedings of the 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation; IEEE: Piscataway, NJ, USA, 2014; pp. 106–112.
  20. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
  21. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555.
  22. Kim, T.; Kim, H.Y. Forecasting stock prices with a feature fusion LSTM-CNN model using different representations of the same data. PLoS ONE 2019, 14, e0212320.
  23. Feng, F.; Chen, H.; He, X.; Ding, J.; Sun, M.; Chua, T.-S. Enhancing stock movement prediction with adversarial training. arXiv 2018, arXiv:1810.09936.
  24. Nie, Y.; Nguyen, N.H.; Sinthong, P.; Kalagnanam, J. A time series is worth 64 words: Long-term forecasting with transformers. arXiv 2022, arXiv:2211.14730.
  25. Liu, Y.; Hu, T.; Zhang, H.; Wu, H.; Wang, S.; Ma, L.; Long, M. iTransformer: Inverted transformers are effective for time series forecasting. arXiv 2023, arXiv:2310.06625.
  26. Wen, Q.; Zhou, T.; Zhang, C.; Chen, W.; Ma, Z.; Yan, J.; Sun, L. Transformers in time series: A survey. arXiv 2022, arXiv:2202.07125.
  27. Sezer, O.B.; Gudelek, M.U.; Ozbayoglu, A.M. Financial time series forecasting with deep learning: A systematic literature review: 2005–2019. Appl. Soft Comput. 2020, 90, 106181.
  28. Patel, M.; Jariwala, K.; Chattopadhyay, C. A systematic review on graph neural network-based methods for stock market forecasting. ACM Comput. Surv. 2024, 57, 34.
  29. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph attention networks. arXiv 2018, arXiv:1710.10903.
  30. Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv 2017, arXiv:1707.01926.
  31. Jiang, W.; Luo, J. Graph neural network for traffic forecasting: A survey. Expert Syst. Appl. 2022, 207, 117921.
  32. Bai, L.; Yao, L.; Li, C.; Wang, X.; Wang, C. Adaptive graph convolutional recurrent network for traffic forecasting. Adv. Neural Inf. Process. Syst. 2020, 33, 17804–17815.
  33. Pareja, A.; Domeniconi, G.; Chen, J.; Ma, T.; Suzumura, T.; Kanezashi, H.; Kaler, T.; Schardl, T.; Leiserson, C. EvolveGCN: Evolving graph convolutional networks for dynamic graphs. Proc. AAAI Conf. Artif. Intell. 2020, 34, 5363–5370.
  34. Rossi, E.; Chamberlain, B.; Frasca, F.; Eynard, D.; Monti, F.; Bronstein, M. Temporal graph networks for deep learning on dynamic graphs. arXiv 2020, arXiv:2006.10637.
  35. Gao, J.; Ying, X.; Xu, C.; Wang, J.; Zhang, S.; Li, Z. Graph-based stock recommendation by time-aware relational attention network. ACM Trans. Knowl. Discov. Data 2022, 16, 4:1–4:21.
  36. Hsu, Y.-L.; Tsai, Y.-C.; Li, C.-T. FinGAT: Financial graph attention networks for recommending top-K profitable stocks. IEEE Trans. Knowl. Data Eng. 2023, 35, 469–481.
  37. Sawhney, R.; Agarwal, S.; Wadhwa, A.; Shah, R.R. Exploring the scale-free nature of stock markets: Hyperbolic graph learning for algorithmic trading. In Proceedings of the Web Conference 2021, ACM/IW3C2, Ljubljana, Slovenia, 19–23 April 2021; pp. 11–22.
  38. Sawhney, R.; Agarwal, S.; Wadhwa, A.; Shah, R.R. Deep attentive learning for stock movement prediction from social media text and company correlations. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP); Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 8415–8426.
  39. Yu, Y.; Kim, Y.J. Two-dimensional attention-based LSTM model for stock index prediction. J. Inf. Process. Syst. 2019, 15, 1231–1242.
  40. Touzani, Y.; Douzi, K. An LSTM and GRU based trading strategy adapted to the Moroccan market. J. Big Data 2021, 8, 126.
Figure 1. Illustration of spatio-temporal co-movements among representative stocks. The y-axis shows standardized stock prices, and the x-axis shows trading days. The alternating divergence and convergence across stocks suggest the coexistence of time-varying and relatively stable cross-stock dependencies.
Figure 2. Overall architecture of USTGCN. The GRU branch corresponds to the structural/static stream, while the ALSTM branch corresponds to the dynamic stream. The framework contains four stages: dual-stream temporal encoding, robust graph construction, unified spatio-temporal fusion, and ranking prediction.
Figure 3. Hyperparameter sensitivity analysis of USTGCN. The y-axis is the performance metric (Test IC and Rank IC); CSI100 curves are shown in blue and CSI300 in orange, with solid lines/markers for IC and dashed lines for Rank IC (see legends). (a) Hidden size d: performance peaks at d = 128 and weakens toward 512. (b) Learning rate: best results around 2 × 10⁻⁴, with a sharp drop at larger rates; learning-rate ticks use ×10⁻ⁿ notation. (c) Rolling-window length W: both markets peak near W = 20, consistent with Table 1.
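The rolling-window graph construction probed in panel (c) of Figure 3 can be illustrated with a minimal sketch: rather than building a graph from a single day's co-movement, pairwise correlations are computed over the last W days of returns, which damps transient micro-structure noise. The `threshold` parameter and the exact sparsification rule below are assumptions of this sketch, not the paper's specification.

```python
import numpy as np

def rolling_corr_graph(returns, window=20, threshold=0.3):
    """Build a correlation-based adjacency matrix from the last `window` days.

    returns: (days, stocks) array of daily returns.
    Averaging co-movement over a rolling window smooths transient noise;
    `threshold` zeroes out weak links to sparsify the graph (illustrative choice).
    """
    recent = returns[-window:]                  # (window, stocks)
    corr = np.corrcoef(recent, rowvar=False)    # (stocks, stocks) correlation matrix
    adj = np.where(np.abs(corr) >= threshold, corr, 0.0)
    np.fill_diagonal(adj, 0.0)                  # no self-loops
    return adj
```

A windowed correlation of this kind changes slowly from day to day, which is what makes the resulting dynamic graph more stable than a single-day estimate.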
Figure 4. Empirical cumulative returns on the CSI100 dataset (2017–2020). The curves illustrate the out-of-sample wealth accumulation of daily rebalanced, equally weighted portfolios consisting of the top 30 consistently ranked stocks. The USTGCN portfolio (solid red line) stays above several baselines and the passive market index (dashed black line) for much of the test period, indicating competitive alpha-generation capability.
Figure 5. Empirical cumulative returns on the broader CSI300 dataset (2017–2020). Evaluated under identical top-30 long-only conditions, USTGCN maintains competitive portfolio performance on the larger asset universe, indicating that the method remains effective beyond the smaller CSI100 setting.
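The backtest underlying Figures 4 and 5 — a daily rebalanced, equally weighted long-only portfolio of the top-30 ranked stocks — can be sketched as follows. The array shapes and the alignment of day-t scores to day-t holding returns are assumptions of this sketch, not the authors' backtest code.

```python
import numpy as np

def top_n_portfolio_returns(scores, next_returns, n=30):
    """Daily-rebalanced, equally weighted long-only top-n portfolio.

    scores:       (days, stocks) model scores used to rank stocks each day.
    next_returns: (days, stocks) returns realized over the holding day.
    Returns the (days,) series of daily portfolio returns.
    """
    out = []
    for s, r in zip(scores, next_returns):
        top = np.argsort(s)[::-1][:n]   # indices of the n highest-scored stocks
        out.append(r[top].mean())       # equal weighting across the selection
    return np.array(out)

def cumulative_wealth(daily_returns):
    """Wealth curve from compounding daily returns (initial wealth = 1)."""
    return np.cumprod(1.0 + daily_returns)
```

Plotting `cumulative_wealth` of the portfolio series against that of the index returns reproduces the kind of comparison shown in the figures.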
Figure 6. Dynamic drawdown analysis of the top-30 portfolios on the CSI100 dataset (2017–2020). The graph tracks the percentage decline from historical equity peaks. USTGCN (solid red line) shows comparatively smaller drawdowns during periods of severe market turbulence (e.g., the 2018 bear market and the early 2020 systemic shock), indicating better downside-risk control than several baselines.
Figure 7. Dynamic drawdown analysis of the top-30 portfolios on the CSI300 dataset (2017–2020). USTGCN maintains relatively shallow drawdown trajectories compared with several evaluated models, which is consistent with its competitive portfolio performance under systemic market stress.
Table 1. Summary of detailed hyperparameters, configurations, and their specific architectural purposes in USTGCN.

| Category | Hyperparameter | Value | Remark/Functional Purpose |
|---|---|---|---|
| Data and Input | Feature Dimension (F) | 6 | Raw daily trading indicators (e.g., OHLCV, VWAP). |
| | Temporal Lookback Window (T) | 60 | Length of the historical trading-day sequence. |
| | Batch Size | Daily Cross-Section | Preserves holistic market spatial topology per day. |
| Model Architecture | Dual-Stream Hidden Dim. (d) | 128 | Latent representation capacity for sequence encoders. |
| | Recurrent Layers | 2 | Depth of ALSTM and GRU sequential extractors. |
| | Graph Rolling Window (W) | 20 | Days aggregated to smooth transient market noise. |
| | Attention Reduction Factor (v) | 4 | Reduces parameter overhead in self-attention projection. |
| | Predictor MLP Cascade | 512 → 128 → 1 | Progressive feature-fusion mapping to final alpha scores. |
| | Dropout Rate | 0.0 | Network regularization threshold (empirically disabled). |
| Training and Opt. | Optimizer | Adam | Adaptive gradient descent algorithm. |
| | Learning Rate (η) | 2 × 10⁻⁴ | Optimal step size determined via sensitivity analysis. |
| | Maximum Epochs | 100 | Upper bound for training iterations. |
| | Early Stopping Patience | 30 | Epochs without validation gain before early termination. |
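For reference, the settings in Table 1 can be collected into a single configuration object; the key names below are illustrative, not taken from the authors' code.

```python
# Hyperparameters transcribed from Table 1; key names are illustrative.
USTGCN_CONFIG = {
    "feature_dim": 6,             # F: raw daily indicators (OHLCV, VWAP)
    "lookback_window": 60,        # T: historical trading days per sample
    "hidden_dim": 128,            # d: ALSTM/GRU latent size
    "recurrent_layers": 2,        # depth of both temporal encoders
    "graph_rolling_window": 20,   # W: days used to smooth the dynamic graph
    "attention_reduction": 4,     # v: reduction factor in attention projection
    "mlp_dims": (512, 128, 1),    # predictor cascade to final alpha score
    "dropout": 0.0,               # empirically disabled
    "optimizer": "Adam",
    "learning_rate": 2e-4,
    "max_epochs": 100,
    "early_stopping_patience": 30,
}
```

Keeping these values in one dictionary makes the sensitivity sweeps in Figure 3 (over `hidden_dim`, `learning_rate`, and `graph_rolling_window`) easy to script.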
Table 2. Predictive performance comparison on the CSI100 (left six metric columns) and CSI300 (right six metric columns) test sets. Precision@N (P@N) is given in %; higher is better (↑) for all metrics. Bold formatting indicates the superior result in each column.

| Model | IC | Rank IC | P@3 | P@5 | P@10 | P@30 | IC | Rank IC | P@3 | P@5 | P@10 | P@30 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MLP | 0.071 | 0.067 | 56.53 | 56.17 | 55.49 | 53.55 | 0.082 | 0.079 | 57.21 | 57.10 | 56.75 | 55.56 |
| SFM | 0.081 | 0.074 | 57.79 | 56.96 | 55.92 | 53.88 | 0.102 | 0.096 | 59.84 | 58.28 | 57.89 | 56.82 |
| GATs | 0.096 | 0.090 | 59.17 | 58.71 | 57.48 | 54.59 | 0.111 | 0.105 | 60.49 | 59.96 | 59.02 | 57.41 |
| LSTM | 0.097 | 0.091 | 60.12 | 59.49 | 59.04 | 54.77 | 0.104 | 0.098 | 59.51 | 59.27 | 58.40 | 56.98 |
| ALSTM | 0.102 | 0.097 | 60.79 | 59.76 | 58.13 | 55.00 | 0.115 | 0.109 | 59.51 | 59.33 | 58.92 | 57.47 |
| GRU | 0.103 | 0.097 | 59.97 | 58.99 | 58.37 | 55.09 | 0.113 | 0.108 | 59.95 | 59.28 | 58.59 | 57.43 |
| Transformer | 0.089 | 0.090 | 59.62 | 59.20 | 57.94 | 54.80 | 0.106 | 0.104 | 60.76 | 60.06 | 59.48 | 57.71 |
| ALSTM+TRA | 0.107 | 0.102 | 60.27 | 59.09 | 57.66 | 55.16 | 0.119 | 0.112 | 60.45 | 59.52 | 59.16 | 58.24 |
| HIST | 0.120 | 0.115 | 61.87 | 60.82 | 59.38 | 56.04 | 0.131 | 0.126 | 61.60 | 61.08 | 60.51 | 58.79 |
| SDGNN | 0.126 | 0.120 | 62.49 | 61.41 | 59.81 | 56.39 | 0.137 | 0.132 | 62.23 | 61.76 | 61.18 | 59.56 |
| DTSRN | 0.137 | 0.132 | 62.85 | 61.79 | 60.68 | 56.84 | 0.146 | 0.141 | 62.72 | 62.03 | 61.37 | 59.74 |
| USTGCN | **0.141** | **0.137** | **63.56** | **62.60** | **61.33** | **57.42** | **0.154** | **0.148** | **63.62** | **62.92** | **62.21** | **60.50** |
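The metrics in Table 2 can be computed from each day's cross-section of predicted scores and realized returns. The NumPy-only sketch below uses the standard definitions of IC (Pearson correlation) and Rank IC (Spearman correlation, i.e., Pearson on ranks); `precision_at_n` follows one common convention — the fraction of the top-N predicted stocks whose realized return beats the cross-sectional median — which may differ from the paper's exact definition.

```python
import numpy as np

def ic(pred, actual):
    """Daily IC: Pearson correlation between predicted scores and realized returns."""
    return float(np.corrcoef(pred, actual)[0, 1])

def rank_ic(pred, actual):
    """Daily Rank IC: Spearman correlation, computed as Pearson IC on ranks."""
    def ranks(x):
        order = np.argsort(x)
        r = np.empty(len(x), dtype=float)
        r[order] = np.arange(len(x))
        return r
    return ic(ranks(pred), ranks(actual))

def precision_at_n(pred, actual, n):
    """Fraction of the top-n predicted stocks whose realized return
    exceeds the cross-sectional median (one common convention)."""
    top = np.argsort(pred)[::-1][:n]
    return float(np.mean(actual[top] > np.median(actual)))
```

Averaging these daily values over the test period yields the figures reported in the table.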
Table 3. Ablation study evaluating the specific impact of USTGCN components on CSI100 (left six metric columns) and CSI300 (right six metric columns). Precision@N (P@N) metrics are scaled by 100.

| Variant | IC | Rank IC | P@3 | P@5 | P@10 | P@30 | IC | Rank IC | P@3 | P@5 | P@10 | P@30 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| USTGCN | 0.141 | 0.137 | 63.56 | 62.60 | 61.33 | 57.42 | 0.154 | 0.148 | 63.62 | 62.92 | 62.21 | 60.50 |
| USTGCN-RW | 0.139 | 0.135 | 64.17 | 62.58 | 61.23 | 57.22 | 0.149 | 0.144 | 63.96 | 62.55 | 61.48 | 60.48 |
| USTGCN-UC | 0.133 | 0.129 | 63.30 | 62.71 | 60.88 | 56.95 | 0.135 | 0.129 | 62.15 | 61.48 | 60.89 | 59.46 |
| USTGCN-DS | 0.126 | 0.120 | 63.88 | 62.23 | 60.26 | 56.30 | 0.138 | 0.132 | 62.01 | 61.27 | 61.24 | 59.54 |
| USTGCN-SK | 0.117 | 0.113 | 62.15 | 60.89 | 60.14 | 56.17 | 0.126 | 0.123 | 61.09 | 60.72 | 59.96 | 59.25 |
Table 4. Effect of encoder assignment in the dual-stream temporal module.

| Dataset | Method | Dynamic | Static | IC | Rank IC |
|---|---|---|---|---|---|
| CSI100 | USTGCN (Default) | ALSTM | GRU | 0.141 | 0.137 |
| | Swap-1 | GRU | ALSTM | 0.138 | 0.134 |
| | Swap-2 | GRU | GRU | 0.138 | 0.132 |
| | Swap-3 | ALSTM | ALSTM | 0.128 | 0.125 |
| CSI300 | USTGCN (Default) | ALSTM | GRU | 0.154 | 0.148 |
| | Swap-1 | GRU | ALSTM | 0.152 | 0.148 |
| | Swap-2 | GRU | GRU | 0.146 | 0.140 |
| | Swap-3 | ALSTM | ALSTM | 0.148 | 0.143 |
Table 5. Comprehensive quantitative backtest performance metrics (2017–2020) on CSI100 (left five metric columns) and CSI300 (right five metric columns). Ann. Ret., Max DD, and Alpha are in %; higher is better (↑) for all metrics. Bold formatting indicates the best result in each column.

| Model | Ann. Ret. | Sharpe | Max DD | IR | Alpha | Ann. Ret. | Sharpe | Max DD | IR | Alpha |
|---|---|---|---|---|---|---|---|---|---|---|
| MLP | 9.74 | 0.606 | −24.65 | −0.602 | −2.40 | 21.86 | 1.051 | −23.67 | 0.858 | 8.88 |
| SFM | 8.75 | 0.551 | −25.63 | −0.670 | −3.26 | 26.97 | 1.249 | −19.33 | 1.303 | 12.95 |
| GATs | 16.68 | 0.924 | −20.36 | 0.211 | 3.54 | 25.69 | 1.182 | −22.68 | 1.096 | 12.07 |
| LSTM | 15.43 | 0.870 | −25.24 | 0.075 | 2.46 | 24.89 | 1.094 | −26.66 | 1.009 | 10.83 |
| ALSTM | 16.37 | 0.905 | −25.85 | 0.178 | 3.24 | 25.72 | 1.197 | −21.04 | 1.142 | 12.08 |
| GRU | 18.10 | 0.974 | −20.81 | 0.360 | 4.63 | 27.09 | 1.227 | −24.02 | 1.215 | 13.06 |
| Transformer | 14.15 | 0.812 | −20.08 | −0.069 | 1.43 | 22.28 | 1.027 | −23.71 | 0.863 | 8.85 |
| ALSTM+TRA | 14.72 | 0.839 | −21.98 | −0.006 | 2.03 | 21.74 | 1.021 | −28.22 | 0.865 | 8.41 |
| HIST | 14.66 | 0.828 | −20.25 | −0.008 | 1.75 | 26.38 | 1.180 | −27.11 | 1.116 | 12.42 |
| SDGNN | 18.75 | 1.002 | −18.11 | 0.414 | 5.24 | 26.85 | 1.273 | **−19.27** | 1.230 | 13.30 |
| DTSRN | 15.40 | 0.852 | **−17.39** | 0.073 | 2.52 | 28.13 | 1.265 | −24.72 | 1.255 | 13.99 |
| USTGCN | **20.75** | **1.117** | −17.48 | **0.632** | **7.09** | **31.16** | **1.396** | −20.65 | **1.538** | **16.36** |
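The portfolio statistics in Table 5 follow standard definitions, sketched below from a daily return series. A 252-trading-day year, a zero risk-free rate, and the choice of benchmark for the information ratio are assumptions of this sketch; the alpha computation is omitted since it depends on the paper's unspecified benchmark-regression setup.

```python
import numpy as np

TRADING_DAYS = 252  # assumed trading days per year

def annualized_return(daily_returns):
    """Geometric annualized return from a series of daily returns."""
    cum = np.prod(1.0 + daily_returns)
    years = len(daily_returns) / TRADING_DAYS
    return float(cum ** (1.0 / years) - 1.0)

def sharpe_ratio(daily_returns, rf_annual=0.0):
    """Annualized Sharpe ratio; zero risk-free rate assumed by default."""
    excess = daily_returns - rf_annual / TRADING_DAYS
    return float(np.sqrt(TRADING_DAYS) * excess.mean() / excess.std(ddof=1))

def max_drawdown(daily_returns):
    """Maximum peak-to-trough decline of the compounded wealth curve (negative)."""
    wealth = np.cumprod(1.0 + daily_returns)
    peak = np.maximum.accumulate(wealth)
    return float((wealth / peak - 1.0).min())

def information_ratio(daily_returns, benchmark_returns):
    """Annualized information ratio versus a benchmark return series."""
    active = daily_returns - benchmark_returns
    return float(np.sqrt(TRADING_DAYS) * active.mean() / active.std(ddof=1))
```

`max_drawdown` applied to the top-30 portfolio series is exactly the quantity tracked over time in Figures 6 and 7.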
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Yao, W.; Gao, L.; Zhang, X.; Chen, H.; Liu, M.; Hu, Y. USTGCN: A Unified Spatio-Temporal Graph Convolutional Network for Stock-Ranking Prediction. Electronics 2026, 15, 1317. https://doi.org/10.3390/electronics15061317