Article

SGR-Net: A Synergistic Attention Network for Robust Stock Market Forecasting

by Rasmi Ranjan Khansama 1,2, Rojalina Priyadarshini 2, Surendra Kumar Nanda 2, Rabindra Kumar Barik 3 and Manob Jyoti Saikia 1,4,*
1 Biomedical Sensors & Systems Lab, University of Memphis, Memphis, TN 38152, USA
2 Department of Computer Science and Engineering, C.V. Raman Global University, Bhubaneswar 752054, Odisha, India
3 School of Computer Applications, KIIT Deemed to be University, Bhubaneswar 752054, Odisha, India
4 Electrical and Computer Engineering Department, University of Memphis, Memphis, TN 38152, USA
* Author to whom correspondence should be addressed.
Forecasting 2025, 7(3), 50; https://doi.org/10.3390/forecast7030050
Submission received: 23 July 2025 / Revised: 29 August 2025 / Accepted: 2 September 2025 / Published: 14 September 2025
(This article belongs to the Special Issue Feature Papers of Forecasting 2025)

Abstract

Owing to the high volatility, non-stationarity, and complexity of financial time-series data, stock market trend prediction remains a crucial but difficult endeavor. To address this, we present a novel Multi-Perspective Fused Attention model (SGR-Net) that amalgamates Random, Global, and Sparse Attention mechanisms to improve stock trend forecasting accuracy and generalization capability. The proposed Fused Attention model (SGR-Net) is trained on a rich feature space consisting of thirteen widely used technical indicators derived from raw stock index prices to effectively classify stock index trends as either uptrends or downtrends. Across nine global stock indices—DJUS, NYSE AMEX, BSE, DAX, NASDAQ, Nikkei, S&P 500, Shanghai Stock Exchange, and NIFTY 50—we evaluated the proposed model and compared it against baseline deep learning techniques, which include LSTM, GRU, Vanilla Attention, and Self-Attention. Experimental results across nine global stock index datasets show that the Fused Attention model produces the highest accuracy of 94.36% and AUC of 0.9888. Furthermore, even at lower epochs of training, i.e., 20 epochs, the proposed Fused Attention model produces faster convergence and better generalization, yielding an AUC of 0.9265, compared with 0.9179 for Self-Attention, on the DJUS index. The proposed model also demonstrates competitive training time and noteworthy performance on all nine stock indices. This is due to the incorporation of Sparse Attention, which lowers computation time to 57.62 s, only slightly more than the 54.22 s required for the Self-Attention model on the Nikkei 225 index. Additionally, the model incorporates Global Attention, which captures long-term dependencies in time-series data, and Random Attention, which addresses the problem of overfitting. Overall, this study presents a robust and reliable model that can help individuals, research communities, and investors identify profitable stocks across diverse global markets.

1. Introduction

Given the highly unpredictable and non-stationary character of financial data [1], forecasting stock market trends is a vital but difficult task. Although they have been extensively applied, traditional statistical models such as Autoregressive Integrated Moving Average (ARIMA) [2,3] and Generalized Autoregressive Conditional Heteroskedasticity (GARCH) frequently miss the intricate temporal correlations and nonlinear patterns found in stock price fluctuations. Deep learning models, especially recurrent neural networks (RNNs), including Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), have shown great potential in recent years in managing sequential financial data by learning long-term dependencies [4]. However, these models still struggle to effectively capture significant market changes and trends.
To raise forecasting accuracy, deep learning models have been coupled with attention mechanisms that improve feature selection and sequence learning [5,6,7]. Standard attention methods, including Vanilla Attention and Self-Attention, have proven successful in many natural language processing and time-series forecasting applications. In stock market prediction, however, they frequently fail to balance short-term volatility with long-term dependency, thus producing less-than-ideal results. To address these difficulties and improve predictive performance, we present a Fused Attention model that uses a linear fusion approach to combine Sparse Attention, Random Attention, and Global Attention.

1.1. Contributions

This study provides the following important contributions:
  • Propose a novel Multi-Perspective Fused Attention model: We developed a Multi-Perspective Fused Attention-based deep learning model (SGR-Net) that amalgamates the strengths of different attention mechanisms to efficiently capture uncertain temporal dependencies in stock time-series data, interdependency among technical indicators, and intricate patterns in financial time-series data.
  • Address key limitations of previous studies using attention-based models: Prior studies on stock market forecasting were based on individual attention approaches or standalone deep learning models. However, these models had difficulty focusing on time steps that have a significant influence on model predictions, were unable to capture long-term dependencies, and struggled to deal with noise in time-series sequences. In order to overcome these limitations, we adopted three complementary attention mechanisms: Sparse Attention, which focuses on impactful time steps while simultaneously reducing computation time; Global Attention, which captures long-term dependencies in sequences; and Random Attention, which mitigates noise and reduces overfitting. We thus produced a model that is both robust and generalizable.
  • Engineer a rich input feature space: The utilization of 13 technical indicators in this study enriched the feature space for model learning, thereby enabling the proposed Fused Attention model to learn intricate patterns and capture trends in stock indices efficiently.
  • Conduct extensive empirical analysis across nine global stock indices: We assessed the model’s performance on nine volatile global stock market indices, validating its performance and adaptability across a range of financial tasks.
  • Demonstrate superior performance and efficiency: We evaluated multiple global stock indices to showcase the noteworthiness of the proposed Fused Attention model, which not only maintains computational efficiency but also generalizes well across varied market conditions.

1.2. Organization

The remainder of this paper is organized as follows: Section 2 reviews related studies on stock market forecasting and trend prediction. Section 3 introduces the proposed Fused Attention model together with its components. Section 4 describes the dataset. Section 5 details the experimental setup. Section 6 analyzes the results of the baseline models and the proposed Fused Attention model on different stock indices. Section 9 concludes the study and outlines future plans.

2. Related Work

Two classic paradigms define stock market prediction approaches: technical analysis and fundamental analysis. Fundamental analysis mainly focuses on the qualitative and quantitative evaluation of unstructured textual sources, including financial disclosures, earnings reports, and macroeconomic indices such as Gross Domestic Product (GDP) growth or inflation rates, to find a stock’s intrinsic worth. In contrast, technical analysis uses the quantitative study of historical price charts, trading volumes, and statistical indicators, such as moving averages and the Relative Strength Index (RSI), to spot recurrent trends and patterns that project short-term price movements.
Various past studies are presented in Table 1.
In technical analysis, traders and financial analysts use various technical indicators, like moving averages, RSI, etc., to predict the upcoming trend of a stock index. All technical indicators are derived from historical raw stock index data. In the past, researchers have leveraged these technical indicators along with machine learning models to improve the prediction accuracy of the model. The author of [8] adopted backpropagation-based neural networks along with fundamental analysis (16 financial variables) and technical analysis (11 macroeconomic indicators) for stock market forecasting. The author concluded that models trained on 1 to 3 years of financial data outperformed the minimum standard (market average return), but their incorporation of macroeconomic predictors produced no statistically significant results. Recent studies [9,10,13,27,28] have used ensembles of these technical indicators with machine learning techniques to learn complex and uncertain stock patterns. The authors of [12] leveraged a support vector machine optimized with a genetic algorithm for prediction of stock trends. The results not only improved the accuracy of stock market trend prediction but also outperformed other baseline models. Also, researchers have applied various ensemble techniques, such as Random Forest and AdaBoost [11], to predict stock index trends. The author of [29] presented a new model to forecast the S&P 500 index by combining technical indicators (e.g., moving averages and volatility measures) with ESG sentiment indices produced from news data.
With the success of deep learning techniques in various tasks, such as image recognition and language modeling, researchers and industrialists have started exploring the application of deep learning to financial time-series data. The authors of [30,31] reviewed many deep learning models for stock market prediction. Also, the author of [14] proposed LSTNet, a hybrid framework that incorporates a convolutional neural network (CNN) for short-term variable dependencies, an RNN for long-term trends, and an autoregressive component for time-series forecasting. The study by [32], supported by adversarial training for market stochasticity, introduced MONEY, an ensemble framework that integrates a graph convolutional neural network (GCN) and hypergraph networks to describe pairwise industry linkages and group-level stock co-movement. Through improved long-term dependency learning and robustness, their technique outperformed state-of-the-art algorithms, especially in bear markets, by giving priority to graph processing over RNNs, contrary to past studies.
In 2017, the author of [33] introduced the Transformer architecture for sequence-to-sequence tasks, which has since become a state-of-the-art deep learning architecture for attention-centric applications. Initially designed for the natural language processing (NLP) domain, the Transformer uses Self-Attention to capture long-range dependencies and contextual interactions in sequential data (e.g., word tokens in sentences). Its success in NLP results from adaptive token interaction and parallelizable training, which go beyond the sequential restrictions of recurrent architectures. Transformers have since been applied to computer vision [15], where spatial attention mechanisms substitute spatial embeddings to capture global pixel correlations. More recently, these architectures have been adapted to time-series forecasting, where temporal attention enables the modeling of complex sequential dependencies and long-horizon trends, demonstrating their adaptability across domains [16].
Recent advances in attention-based time-series forecasting have introduced architectures such as Informer, Crossformer, and FEDformer, each of which provides significant improvements in efficiency and scalability but also exhibits limitations when applied to financial data. Informer [18] leverages ProbSparse Self-Attention to reduce computational overhead for long input sequences, but its reliance on sparsity in query–key interactions often overlooks the subtle but critical short-term fluctuations characteristic of stock markets. Crossformer [20] extends attention to model cross-dimensional dependencies in multivariate time series, a useful design for domains with stable inter-series correlations (e.g., sensor networks). However, in financial contexts, correlations between indicators are highly dynamic and regime-dependent, causing instability in learned cross-dimensional representations. FEDformer [19] integrates Fourier decomposition with attention to enhance long-horizon forecasting, but its decomposition assumes quasi-stationary seasonal and trend components, which fails to generalize under the strong non-stationarity and abrupt regime shifts observed in stock price movements.
While architectures such as Informer [18], Crossformer [20], and FEDformer [19] have advanced attention-based time-series forecasting by improving scalability and long-horizon prediction, their underlying assumptions limit their applicability in financial contexts. These limitations have motivated the development of specialized attention-based models tailored for stock price forecasting, such as dynamic feature fusion frameworks [26], spot-forward parity-enhanced Transformers [34], memory–attention networks with long-distance loss functions [35], and adversarially trained graph attention hybrids [32], each addressing the non-stationarity, regime dependence, and stochasticity of financial markets.
Overall, the earlier studies have mainly focused on using an individual attention model to model time-series data. However, our model consists of the linear fusion of Sparse Attention, to reduce computational overhead; Global Attention, for capturing long-term trends in time series; and Random Attention, for dealing with the overfitting of the model. Despite the existence of many studies on the application of attention models for various domains, there is a lack of studies utilizing the attention models for stock market trend prediction. In our work, we utilize a hybrid attention model with thirteen technical indicators extracted from raw stock price data for stock market trend prediction which inherits the key strengths of each individual attention model.

3. Proposed Model Architecture

The proposed Fused Attention model comprises the following main elements:
  • Input Layer: It processes several technical indicators derived from past raw stock market data as inputs.
  • LSTM Layer: It captures the sequential dependencies in the financial time-series data.
  • Sparse Attention Module: It focuses on important, significant time steps under a sparsity restriction.
  • Global Attention Module: It assigns dynamic priority values across all time steps.
  • Random Attention Module: It provides random weight assignment meant to enhance generalization.
  • Fusion Layer: It combines the three attention outputs into a single, improved feature representation.
  • Feedforward Network: The feedforward network classifies final stock market trends as up/down.
The proposed architecture is illustrated in Figure 1.

3.1. Input Layer

The input to the model consists of T time steps, where each time step contains 13 technical indicators derived from raw historical stock market data. The input data are defined as
$X = \{x_1, x_2, \ldots, x_T\}, \quad x_t \in \mathbb{R}^N$
where
  • $X$ is the input data with the shape (batch_size, seq_len, input_size).
  • Each $x_t$ represents a feature vector that includes 13 technical indicators at time step $t$.
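As a rough illustration of the input shape described above, the following sketch stacks overlapping windows of consecutive days into an array of shape (num_windows, seq_len, 13). The function name `make_windows`, the window length of 10, and the random stand-in data are assumptions made for this example; the window length used in the study is not stated here.

```python
import numpy as np

def make_windows(features, seq_len):
    """Stack overlapping windows of `seq_len` consecutive rows.

    features: array of shape (num_days, num_indicators)
    returns:  array of shape (num_days - seq_len + 1, seq_len, num_indicators)
    """
    features = np.asarray(features, dtype=float)
    num_days = features.shape[0]
    if seq_len > num_days:
        raise ValueError("seq_len must not exceed the number of days")
    return np.stack(
        [features[i:i + seq_len] for i in range(num_days - seq_len + 1)]
    )

# Hypothetical sizes: 100 trading days, 13 indicators, window length T = 10.
rng = np.random.default_rng(0)
indicators = rng.normal(size=(100, 13))  # stand-in for real indicator values
X = make_windows(indicators, seq_len=10)
print(X.shape)  # (91, 10, 13)
```

Each window would then be paired with the trend label of the day that follows its last row.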

3.2. LSTM Layer

The LSTM layer uses the following gating mechanism to update the hidden states $H_t$ in order to capture long-range dependencies:
  • Forget Gate (Filtering Old Information):
    $F_t = \sigma(\Theta_F Z_t + \Phi_F H_{t-1} + \Psi_F)$
  • Input Gate (Deciding What to Store):
    $I_t = \sigma(\Theta_I Z_t + \Phi_I H_{t-1} + \Psi_I)$
  • Candidate Memory Update (New-Information Processing):
    $\tilde{M}_t = \tanh(\Theta_M Z_t + \Phi_M H_{t-1} + \Psi_M)$
  • Memory Cell Update (Retaining Important Information):
    $M_t = F_t \odot M_{t-1} + I_t \odot \tilde{M}_t$
  • Output Gate (Deciding What to Reveal as Output):
    $O_t = \sigma(\Theta_O Z_t + \Phi_O H_{t-1} + \Psi_O)$
  • Final Hidden State Calculation:
    $H_t = O_t \odot \tanh(M_t)$
where
  • $Z_t$ denotes the input to the LSTM at time step $t$ (the feature vector $x_t$ of Section 3.1).
  • The forget, input, and output gates are denoted by $F_t$, $I_t$, and $O_t$.
  • The sigmoid activation function is denoted by $\sigma$, and $\odot$ denotes element-wise multiplication.
  • The cell state capturing memory across time steps is denoted by $M_t$.
  • The forget gate weights are denoted by $\Theta_F$ and $\Phi_F$.
  • The input gate weights are denoted by $\Theta_I$ and $\Phi_I$.
  • The memory update weights are denoted by $\Theta_M$ and $\Phi_M$.
  • The output gate weights are denoted by $\Theta_O$ and $\Phi_O$.
  • Biases are denoted by $\Psi$.
The attention mechanism then uses the LSTM outputs, a hidden state sequence
$H = [H_1, H_2, \ldots, H_T]$,
as input.
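The gate equations above can be traced step by step with a small NumPy sketch. This is only an illustration under assumed settings: the hidden size of 8, the sequence length of 10, the random initialization, and the dictionary keys such as `Theta_F` are choices made for the example and are not taken from the study's implementation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(z_t, h_prev, m_prev, p):
    """One LSTM update: returns the new hidden state and cell state."""
    f_t = sigmoid(p["Theta_F"] @ z_t + p["Phi_F"] @ h_prev + p["Psi_F"])      # forget gate
    i_t = sigmoid(p["Theta_I"] @ z_t + p["Phi_I"] @ h_prev + p["Psi_I"])      # input gate
    m_tilde = np.tanh(p["Theta_M"] @ z_t + p["Phi_M"] @ h_prev + p["Psi_M"])  # candidate memory
    m_t = f_t * m_prev + i_t * m_tilde                                        # memory cell update
    o_t = sigmoid(p["Theta_O"] @ z_t + p["Phi_O"] @ h_prev + p["Psi_O"])      # output gate
    h_t = o_t * np.tanh(m_t)                                                  # hidden state
    return h_t, m_t

def init_params(input_size, hidden_size, rng):
    """Small random weights and zero biases for the four gate blocks."""
    p = {}
    for g in "FIMO":
        p["Theta_" + g] = rng.normal(scale=0.1, size=(hidden_size, input_size))
        p["Phi_" + g] = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
        p["Psi_" + g] = np.zeros(hidden_size)
    return p

# Assumed toy sizes: T = 10 time steps, 13 indicators, hidden size 8.
rng = np.random.default_rng(0)
T, input_size, hidden_size = 10, 13, 8
params = init_params(input_size, hidden_size, rng)
Z = rng.normal(size=(T, input_size))

h, m = np.zeros(hidden_size), np.zeros(hidden_size)
states = []
for z_t in Z:
    h, m = lstm_step(z_t, h, m, params)
    states.append(h)
H = np.stack(states)  # hidden state sequence, one row per time step
print(H.shape)  # (10, 8)
```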

3.3. Attention Mechanisms

3.3.1. Sparse Attention

Sparse Attention selectively assigns weight to only a few key time steps, reducing noise and overfitting. The attention scores and the resulting context vector are calculated as
$\alpha_t = \frac{\exp(W_s h_t)}{\sum_{j=1}^{T} \exp(W_s h_j)}$
$C_s = \sum_{t=1}^{T} \alpha_t h_t$
where
  • $h_t$ is the LSTM hidden state at time step $t$.
  • $W_s$ is a learned parameter.
  • $C_s$ denotes the Sparse Attention context vector.
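A minimal sketch of this computation is given below. The softmax scoring follows the formula above, with the learned parameter treated as a vector so that each hidden state receives a scalar score (an assumption). The sparsity rule itself is not spelled out in the text, so the optional `top_k` masking, which keeps only the highest-scoring time steps before normalizing, is purely an assumed illustration of one way to enforce sparsity.

```python
import numpy as np

def softmax(scores):
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    e = np.exp(shifted)
    return e / e.sum()

def sparse_attention(H, w_s, top_k=None):
    """H: (T, hidden) hidden states, w_s: (hidden,) scoring vector.

    Returns the attention weights alpha (T,) and the context vector C_s (hidden,).
    """
    scores = H @ w_s
    if top_k is not None:
        # Assumed sparsity rule: keep the top_k largest scores, mask the rest.
        keep = np.argsort(scores)[-top_k:]
        masked = np.full_like(scores, -np.inf)
        masked[keep] = scores[keep]
        scores = masked
    alpha = softmax(scores)
    return alpha, alpha @ H

rng = np.random.default_rng(0)
H = rng.normal(size=(10, 8))   # toy hidden states: T = 10, hidden size 8
w_s = rng.normal(size=8)

alpha_dense, C_dense = sparse_attention(H, w_s)          # formula as written
alpha_top, C_top = sparse_attention(H, w_s, top_k=3)     # assumed top-k variant
print(np.count_nonzero(alpha_top))  # 3
```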

3.3.2. Global Attention

Global Attention assigns dynamic importance scores across all time steps to capture long-term dependencies. It is calculated as
$\beta_t = \frac{\exp(W_g h_t)}{\sum_{j=1}^{T} \exp(W_g h_j)}$
$C_g = \sum_{t=1}^{T} \beta_t h_t$
where
  • $W_g$ is a trainable parameter.
  • $C_g$ denotes the Global Attention context vector.
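For a whole mini-batch, the same weighting can be applied along the time axis. The sketch below is an assumed batched formulation with toy sizes; treating the trainable parameter as a single scoring vector `w_g` is an assumption of the example.

```python
import numpy as np

def global_attention(H, w_g):
    """H: (batch, T, hidden), w_g: (hidden,).

    Returns beta (batch, T), a softmax over all time steps, and the
    context vectors C_g (batch, hidden).
    """
    scores = H @ w_g                                      # (batch, T)
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    beta = weights / weights.sum(axis=1, keepdims=True)
    C_g = np.einsum("bt,bth->bh", beta, H)                # weighted sum over time
    return beta, C_g

rng = np.random.default_rng(0)
H_batch = rng.normal(size=(4, 10, 8))  # toy batch: 4 sequences, T = 10, hidden 8
w_g = rng.normal(size=8)
beta, C_g = global_attention(H_batch, w_g)
print(beta.shape, C_g.shape)  # (4, 10) (4, 8)
```

With an all-zero scoring vector every time step receives the weight 1/T, so the context vector reduces to the plain average of the hidden states.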

3.3.3. Random Attention

Random Attention assigns weights randomly to introduce stochasticity. It is computed as follows:
$\gamma_t = \frac{U(0,1)}{\sum_{j=1}^{T} U(0,1)}$
$C_r = \sum_{t=1}^{T} \gamma_t h_t$
where
  • $U(0,1)$ is a uniform random distribution.
  • $C_r$ denotes the Random Attention context vector.
This method prevents the model from overfitting to specific patterns, improving generalization and robustness of the model [36,37].
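A short sketch of the random weighting follows, assuming one independent uniform draw per time step that is then normalized to sum to one. The generator seed of 42 mirrors the seed reported for evaluation in the experimental setup; the function name and toy sizes are assumptions of the example.

```python
import numpy as np

def random_attention(H, rng):
    """H: (T, hidden). Draw one U(0, 1) sample per time step and normalize."""
    u = rng.uniform(0.0, 1.0, size=H.shape[0])
    gamma = u / u.sum()
    return gamma, gamma @ H

H = np.random.default_rng(0).normal(size=(10, 8))  # toy hidden states
gamma, C_r = random_attention(H, np.random.default_rng(42))
print(gamma.shape, C_r.shape)  # (10,) (8,)
```

Because the weights carry no learned parameters, fixing the generator seed is what makes repeated evaluations comparable.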

3.4. Fusion Layer

To utilize the complementary strengths of all three attention mechanisms (Sparse, Global, and Random Attention), their outputs are concatenated:
$F = [C_s, C_g, C_r]$
Since each context vector has a dimensionality of hidden_size, the concatenated vector $F$ has dimensions hidden_size × 3.
Then, a linear transformation is applied to reduce redundancy:
$F' = W_f F + b_f$
where
  • $W_f$ is a learnable weight matrix.
  • $b_f$ is a bias term.
The final fused representation $F'$ is then forwarded to the feedforward layers.

3.5. Feedforward Network

The last stage is the classification layer, which consists of a two-layer fully connected network:
  • First Layer:
    This layer consists of a fully connected layer.
    The activation function used in this layer is ReLU.
  • Second Layer:
    This is a fully connected layer.
    The activation function used in this layer is the sigmoid activation function, for predicting uptrends and downtrends.
    The final prediction $\hat{y}$ is calculated as
    $\hat{y} = \sigma(W_o F' + b_o)$
    for binary classification.
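The fusion and classification stages can be sketched together as follows. Several details here are assumptions rather than facts from the text: the fusion layer is assumed to project back to hidden_size, the first feedforward layer is given the hypothetical parameters `W_1` and `b_1`, and a 0.5 threshold is assumed for turning the sigmoid output into an up/down label.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fuse_and_classify(C_s, C_g, C_r, p):
    """Concatenate the three context vectors, fuse them, and classify."""
    F = np.concatenate([C_s, C_g, C_r])                      # (3 * hidden,)
    fused = p["W_f"] @ F + p["b_f"]                          # linear fusion, (hidden,)
    hidden = np.maximum(0.0, p["W_1"] @ fused + p["b_1"])    # first layer with ReLU
    y_hat = sigmoid(p["W_o"] @ hidden + p["b_o"])            # second layer with sigmoid
    return F, fused, float(y_hat)

# Toy setup with an assumed hidden size of 8 and random parameters.
rng = np.random.default_rng(0)
hidden_size = 8
params = {
    "W_f": rng.normal(scale=0.1, size=(hidden_size, 3 * hidden_size)),
    "b_f": np.zeros(hidden_size),
    "W_1": rng.normal(scale=0.1, size=(hidden_size, hidden_size)),
    "b_1": np.zeros(hidden_size),
    "W_o": rng.normal(scale=0.1, size=hidden_size),
    "b_o": 0.0,
}
C_s, C_g, C_r = (rng.normal(size=hidden_size) for _ in range(3))

F, fused, y_hat = fuse_and_classify(C_s, C_g, C_r, params)
label = int(y_hat >= 0.5)  # assumed threshold: 1 = uptrend, 0 = downtrend
print(F.shape, fused.shape)  # (24,) (8,)
```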

4. Dataset Description

To capture the volatility and dynamics of multiple markets, the dataset used in this study consists of historical stock market data from nine major global indices. The following indices were utilized to evaluate the model’s performance:
  • Bombay Stock Exchange, India’s BSE index.
  • Germany’s DAX index, Deutscher Aktienindex.
  • Dow Jones Industrial Average, USA (DJUS).
  • NASDAQ—USA’s Composite Index.
  • NIFTY 50: National Stock Exchange of India.
  • Nikkei 225 Tokyo Stock Exchange, Japan.
  • NYSE AMEX: NYSE American Composite Index, USA.
  • Standard and Poor’s 500, USA (S&P 500).
  • Shanghai stock index—China’s Shanghai Stock Exchange.

4.1. Preprocessing and Data Collection

The dataset comprises daily raw historical records of open, high, low, and closing prices for each index, extracted using the Yahoo Finance (yfinance) library and from Quandl. The dataset spans several years to ensure robust model evaluation and training.
Thirteen technical indicators were derived from the raw data to enhance predictive performance. These indicators capture trends, momentum, volatility, and market strength.

4.2. Feature Engineering

Each instance in the dataset consists of technical indicators derived from historical price data. The class variable represents the stock market trend (an uptrend or a downtrend) for the next day, while the input features consist of the following 13 indicators:
  • Simple Moving Average (ten-day SMA);
  • Ten-day Weighted Moving Average (WMA);
  • Stochastic %K (fourteen-day indicator);
  • Stochastic %D (three-day moving average of %K);
  • Five-day Disparity Index;
  • Ten-day Disparity Index;
  • Ten-day Oscillator Percentage (OSCP);
  • Ten-day Momentum;
  • Relative Strength Index (RSI; fourteen-day index);
  • Larry Williams %R (fourteen-day indicator);
  • Accumulation and Distribution Indicator (A/D);
  • Twenty-day Commodity Channel Index (CCI);
  • Moving Average Convergence Divergence (MACD: 12, 26, 9).
Target Variable (Class Variable): The class variable represents the price movement for the next day:
  • It takes a value of 1 (up) when the closing price of the next day surpasses that of the current day.
  • It takes a value of 0 (down) when the closing price of the next day is lower than that of the current day.
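To make the feature and label construction concrete, the sketch below computes two of the listed indicators and the next-day class variable from a list of closing prices. The formulas are the common textbook definitions and may differ in detail from those used in the study; in particular, the linear 1 to n weighting in `wma` and the mapping of an unchanged close to class 0 are assumptions.

```python
def sma(closes, n=10):
    """Simple moving average of the last n closes (None until n values exist)."""
    return [
        None if i + 1 < n else sum(closes[i + 1 - n:i + 1]) / n
        for i in range(len(closes))
    ]

def wma(closes, n=10):
    """Weighted moving average: the most recent close gets weight n, the oldest 1."""
    denom = n * (n + 1) / 2
    out = []
    for i in range(len(closes)):
        if i + 1 < n:
            out.append(None)
        else:
            window = closes[i + 1 - n:i + 1]
            out.append(sum((k + 1) * c for k, c in enumerate(window)) / denom)
    return out

def next_day_labels(closes):
    """1 if the next close is higher than today's, else 0 (ties map to 0 here)."""
    return [1 if closes[i + 1] > closes[i] else 0 for i in range(len(closes) - 1)]

closes = [float(c) for c in range(1, 13)]  # toy prices 1.0, 2.0, ..., 12.0
print(sma(closes)[9])   # 5.5
print(wma(closes)[9])   # 7.0
print(next_day_labels([3.0, 2.0, 2.0, 5.0]))  # [0, 0, 1]
```

The last trading day has no label because its next-day close is unknown, so it would be dropped when pairing features with targets.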

4.3. Dataset Details

The dataset was divided into training (70%) and testing (30%) sets with the time-based split method to preserve the order of the temporal sequence and to avoid information leakage. The details of the dataset and the distributions of the uptrend and downtrend classes are provided in Table 2.
The dataset was normalized to a standard range to improve the learning process of the model.
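The split and normalization steps can be sketched as below. The text only says the data were normalized to a standard range, so min-max scaling to [0, 1] is an assumption, as is fitting the scaling bounds on the training portion only so that no statistics from the test period leak into training.

```python
def time_split(rows, train_fraction=0.7):
    """Split chronologically ordered rows without shuffling."""
    cut = int(round(len(rows) * train_fraction))
    return rows[:cut], rows[cut:]

def fit_min_max(train_rows):
    """Per-column minimum and maximum, computed on the training rows only."""
    columns = list(zip(*train_rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def apply_min_max(rows, lows, highs):
    """Scale each column with the training bounds; constant columns map to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

# Toy data: 10 chronological rows with 2 features each.
rows = [[float(i), float(2 * i)] for i in range(10)]
train, test = time_split(rows)
lows, highs = fit_min_max(train)
train_scaled = apply_min_max(train, lows, highs)
test_scaled = apply_min_max(test, lows, highs)
print(len(train), len(test))  # 7 3
```

Test values can fall outside [0, 1] when later prices exceed the training range, which is expected with bounds fitted on the training period alone.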

5. Experimental Setup

The complete hyperparameter configuration used in our experiments for the training of all models is summarized in Table 3. These hyperparameters were selected on the basis of best practices in the prior literature and empirical adjustments.
The models were trained with an LSTM hidden size of 128 and using the Adam optimizer with a fixed learning rate of 0.001. A mini-batch size of 32 was used for both training and evaluation. Training was performed with varying epoch counts (10, 20, 30, 40, and 50 epochs) for baseline models, while the proposed Fused Attention model and ablation study were trained for up to 100 epochs to explore convergence behavior. To ensure reproducibility during different runs, we fixed the random seed to 42 for Random Attention during evaluation. For comparative baseline models, we report accuracy and AUC across varying epochs. In contrast, the proposed model, SGR-Net, is evaluated with a broader set of metrics, including accuracy, AUC, precision, recall, and F1-score. To further ensure robustness, 95% confidence intervals (CIs) were computed with the Wilcoxon test for accuracy and bootstrap estimates for the remaining metrics.
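As an illustration of the bootstrap estimates mentioned above, the following sketch computes a percentile bootstrap interval for the F1-score. The number of resamples (1000), the percentile method, and the toy labels are assumptions of the example; the exact bootstrap procedure used in the study is not described in this section.

```python
import random

def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def bootstrap_ci(y_true, y_pred, metric, n_resamples=1000, level=0.95, seed=42):
    """Percentile bootstrap interval: resample (label, prediction) pairs with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo_idx = max(0, int((1 - level) / 2 * n_resamples))
    hi_idx = min(n_resamples - 1, int((1 + level) / 2 * n_resamples) - 1)
    return stats[lo_idx], stats[hi_idx]

# Toy labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1] * 20
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1] * 20
point = f1_score(y_true, y_pred)
low, high = bootstrap_ci(y_true, y_pred, f1_score)
print(round(point, 4))
```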
All experiments were conducted on Google Colab equipped with an Intel(R) Xeon(R) CPU @ 2.20GHz and 13 GB of system RAM, without GPU acceleration.

6. Result Analysis and Discussion

The proposed model is tested on nine stock indices: DJUS, NYSE AMEX, BSE, DAX, NASDAQ, Nikkei 225, S&P 500, the Shanghai Stock Exchange, and NIFTY 50. The proposed Fused Attention model is executed for 100 epochs on all the stock indices, and the results are provided in Tables 4 to 12. The rationality and effectiveness of the proposed model are then measured by comparing its performance with different state-of-the-art models; these comparisons are presented in Tables 13 to 21 and Figures 2 to 19. The comparison is limited to 50 epochs because overfitting was observed beyond 50 epochs.

6.1. Performance of All Models on DJUS Stock Index

The performance of all models on the DJUS stock index is shown in Table 13 and illustrated in Figure 2 and Figure 3. On the DJUS stock index, the proposed Fused Attention model outperforms all baseline models, with the highest accuracy of 85.81% and the highest AUC of 0.9442 at 50 epochs. This suggests an improved capacity to learn intricate patterns in financial time-series data. Self-Attention performs competitively, especially at higher epochs, but at 50 epochs our model exceeds it by 0.0049 in AUC (0.9442 vs. 0.9393) and by 1.28 percentage points in accuracy (85.81% vs. 84.53%). With an AUC of 0.9265 at just 20 epochs, compared with 0.9179 for Self-Attention, Fused Attention also shows faster convergence and better generalization.
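Table 13. Performance of different models on DJUS stock index.

| Model | Epochs | Accuracy | AUC | Training Time (s) |
|---|---|---|---|---|
| LSTM | 10 | 0.7813 | 0.8869 | 1.4998 |
| LSTM | 20 | 0.8133 | 0.9020 | 2.3865 |
| LSTM | 30 | 0.8171 | 0.9077 | 4.2865 |
| LSTM | 40 | 0.8312 | 0.9130 | 4.8311 |
| LSTM | 50 | 0.8338 | 0.9164 | 6.6354 |
| GRU | 10 | 0.8031 | 0.8870 | 1.0240 |
| GRU | 20 | 0.8223 | 0.9019 | 2.2396 |
| GRU | 30 | 0.7673 | 0.9063 | 3.3364 |
| GRU | 40 | 0.8261 | 0.9127 | 4.0333 |
| GRU | 50 | 0.8286 | 0.9170 | 5.7167 |
| Vanilla Attention | 10 | 0.7954 | 0.8845 | 1.5839 |
| Vanilla Attention | 20 | 0.8159 | 0.9016 | 2.6829 |
| Vanilla Attention | 30 | 0.8197 | 0.9080 | 4.1972 |
| Vanilla Attention | 40 | 0.8286 | 0.9124 | 6.1048 |
| Vanilla Attention | 50 | 0.8312 | 0.9162 | 6.7852 |
| Self-Attention | 10 | 0.8261 | 0.9101 | 1.9021 |
| Self-Attention | 20 | 0.8312 | 0.9180 | 4.4426 |
| Self-Attention | 30 | 0.8350 | 0.9260 | 6.0507 |
| Self-Attention | 40 | 0.8529 | 0.9318 | 8.1773 |
| Self-Attention | 50 | 0.8453 | 0.9393 | 10.1723 |
| Fused Attention | 10 | 0.8299 | 0.9144 | 2.5265 |
| Fused Attention | 20 | 0.8333 | 0.9265 | 4.1947 |
| Fused Attention | 30 | 0.8461 | 0.9345 | 6.7247 |
| Fused Attention | 40 | 0.8581 | 0.9401 | 8.1700 |
| Fused Attention | 50 | 0.8581 | 0.9442 | 10.9043 |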
Figure 2. Accuracy vs. epochs for each model on DJUS stock index.
Figure 3. AUC vs. epochs for each model on DJUS stock index.

6.2. Performance of All Models on NYSE AMEX Stock Index

The performance of all models on the NYSE AMEX stock index is shown in Table 14 and illustrated in Figure 4 and Figure 5. The proposed model achieves the best AUC of 0.9824 and the highest accuracy of 94.28% on the NYSE AMEX stock index at 50 epochs, indicating an increased capacity to extract complex patterns and correlations in financial time series. Notably, at 30 epochs Fused Attention already achieves an AUC of 0.9718, above the best performance of several baseline models. Self-Attention obtains a competitive AUC of 0.9726 after 40 epochs, but our model shows better generalization.
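Table 14. Performance of different models on NYSE AMEX stock index.

| Model | Epochs | Accuracy | AUC | Training Time (s) |
|---|---|---|---|---|
| LSTM | 10 | 0.8752 | 0.9475 | 2.9539 |
| LSTM | 20 | 0.8822 | 0.9587 | 6.0131 |
| LSTM | 30 | 0.8970 | 0.9647 | 8.3584 |
| LSTM | 40 | 0.8990 | 0.9678 | 10.2014 |
| LSTM | 50 | 0.9060 | 0.9703 | 14.3584 |
| GRU | 10 | 0.8398 | 0.9462 | 2.2718 |
| GRU | 20 | 0.8610 | 0.9561 | 4.2328 |
| GRU | 30 | 0.8867 | 0.9632 | 7.2727 |
| GRU | 40 | 0.8958 | 0.9684 | 9.0038 |
| GRU | 50 | 0.9015 | 0.9696 | 10.0793 |
| Vanilla Attention | 10 | 0.8655 | 0.9479 | 3.1822 |
| Vanilla Attention | 20 | 0.8861 | 0.9593 | 5.4869 |
| Vanilla Attention | 30 | 0.9009 | 0.9648 | 9.4229 |
| Vanilla Attention | 40 | 0.8912 | 0.9679 | 13.1927 |
| Vanilla Attention | 50 | 0.8983 | 0.9702 | 18.9116 |
| Self-Attention | 10 | 0.8835 | 0.9627 | 4.4814 |
| Self-Attention | 20 | 0.8771 | 0.9656 | 9.0363 |
| Self-Attention | 30 | 0.8945 | 0.9719 | 12.1958 |
| Self-Attention | 40 | 0.8900 | 0.9726 | 16.4426 |
| Self-Attention | 50 | 0.8970 | 0.9683 | 21.4925 |
| Fused Attention | 10 | 0.8900 | 0.9665 | 5.2198 |
| Fused Attention | 20 | 0.8993 | 0.9697 | 8.7127 |
| Fused Attention | 30 | 0.9145 | 0.9718 | 13.9070 |
| Fused Attention | 40 | 0.9222 | 0.9728 | 17.9407 |
| Fused Attention | 50 | 0.9428 | 0.9824 | 22.6548 |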
Figure 4. Accuracy vs. epochs for each model on NYSE AMEX stock index.
Figure 5. AUC vs. epochs for each model on NYSE AMEX stock index.

6.3. Performance of All Models on BSE Stock Index

The performance of all models on the BSE stock index is shown in Table 15 and illustrated in Figure 6 and Figure 7. On the BSE stock data, Fused Attention generally beats the other models in terms of accuracy and AUC. At 50 epochs it achieves its maximum accuracy of 0.932439 and its best AUC score of 0.979257, which shows its capacity to capture market trends. Among the conventional recurrent models, LSTM and GRU show similar performance: LSTM peaks at 0.912195 accuracy and 0.969459 AUC, while GRU achieves 0.906098 accuracy and 0.969459 AUC at their best-performing epochs. With an accuracy of 0.908537 and an AUC of 0.969561, Vanilla Attention performs modestly but is a reasonable alternative to Fused Attention. Though efficient, Self-Attention trails somewhat behind, with an AUC of 0.969769 and an accuracy of 0.910976. Training times rise with the number of epochs across all models; at 50 epochs, Fused Attention takes the longest time (21.91 s), indicating a trade-off between computational expense and predictive accuracy. Overall, Fused Attention combines the highest accuracy with strong AUC performance and is the most successful model for BSE stock trend prediction.
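Table 15. Performance of different models on BSE stock index.

| Model | Epochs | Accuracy | AUC | Training Time (s) |
|---|---|---|---|---|
| LSTM | 10 | 0.8976 | 0.9645 | 3.3735 |
| LSTM | 20 | 0.9061 | 0.9689 | 6.8165 |
| LSTM | 30 | 0.9037 | 0.9692 | 12.2368 |
| LSTM | 40 | 0.9122 | 0.9694 | 12.4857 |
| LSTM | 50 | 0.8988 | 0.9695 | 15.2494 |
| GRU | 10 | 0.8854 | 0.9646 | 2.2977 |
| GRU | 20 | 0.9061 | 0.9689 | 5.4531 |
| GRU | 30 | 0.9061 | 0.9693 | 7.0210 |
| GRU | 40 | 0.8988 | 0.9694 | 10.3086 |
| GRU | 50 | 0.9024 | 0.9697 | 12.0394 |
| Vanilla Attention | 10 | 0.8976 | 0.9635 | 3.1332 |
| Vanilla Attention | 20 | 0.9073 | 0.9683 | 7.9487 |
| Vanilla Attention | 30 | 0.9000 | 0.9693 | 10.4125 |
| Vanilla Attention | 40 | 0.8976 | 0.9696 | 13.8738 |
| Vanilla Attention | 50 | 0.9085 | 0.9694 | 16.2311 |
| Self-Attention | 10 | 0.8683 | 0.9679 | 4.6417 |
| Self-Attention | 20 | 0.9110 | 0.9689 | 7.7108 |
| Self-Attention | 30 | 0.9000 | 0.9690 | 13.3624 |
| Self-Attention | 40 | 0.9061 | 0.9692 | 18.6159 |
| Self-Attention | 50 | 0.9012 | 0.9698 | 21.5084 |
| Fused Attention | 10 | 0.9098 | 0.9685 | 5.3015 |
| Fused Attention | 20 | 0.9161 | 0.9691 | 9.1033 |
| Fused Attention | 30 | 0.9085 | 0.9690 | 14.7980 |
| Fused Attention | 40 | 0.9254 | 0.9710 | 19.3200 |
| Fused Attention | 50 | 0.9324 | 0.9793 | 21.9127 |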
Figure 6. Accuracy vs. epochs for each model on BSE stock index.
Figure 7. AUC vs. epochs for each model on BSE stock index.

6.4. Performance of All Models on DAX Stock Index

The performance of all models on the DAX stock index is shown in Table 16 and illustrated in Figure 8 and Figure 9. The findings on the DAX index dataset show that Fused Attention beats all models in both accuracy (0.920278) and AUC (0.983956) at 50 epochs. Though computationally costly, Self-Attention also produces excellent results (0.909840 accuracy and 0.975661 AUC). With shorter training times, LSTM and GRU both perform competitively, with GRU being somewhat better in AUC (0.974304) after 50 epochs. Vanilla Attention trails behind, with lower accuracy than the other deep learning techniques. Offering the best trade-off between accuracy and predictive power, Fused Attention appears to be the best overall option for DAX index trend prediction.
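Table 16. Performance of different models on DAX stock index.

| Model | Epochs | Accuracy | AUC | Training Time (s) |
|---|---|---|---|---|
| LSTM | 10 | 0.8707 | 0.9516 | 5.1605 |
| LSTM | 20 | 0.8913 | 0.9628 | 11.9538 |
| LSTM | 30 | 0.8779 | 0.9677 | 17.2379 |
| LSTM | 40 | 0.8918 | 0.9718 | 21.3852 |
| LSTM | 50 | 0.9016 | 0.9739 | 27.1811 |
| GRU | 10 | 0.8748 | 0.9530 | 5.9193 |
| GRU | 20 | 0.8923 | 0.9626 | 9.1716 |
| GRU | 30 | 0.8964 | 0.9701 | 12.9187 |
| GRU | 40 | 0.8856 | 0.9717 | 17.0575 |
| GRU | 50 | 0.9011 | 0.9743 | 20.9216 |
| Vanilla Attention | 10 | 0.8485 | 0.9511 | 11.0104 |
| Vanilla Attention | 20 | 0.8671 | 0.9611 | 11.9999 |
| Vanilla Attention | 30 | 0.8985 | 0.9690 | 17.4971 |
| Vanilla Attention | 40 | 0.8944 | 0.9721 | 26.3364 |
| Vanilla Attention | 50 | 0.8980 | 0.9726 | 29.6048 |
| Self-Attention | 10 | 0.8887 | 0.9663 | 9.0568 |
| Self-Attention | 20 | 0.8980 | 0.9728 | 15.8787 |
| Self-Attention | 30 | 0.9109 | 0.9752 | 24.8836 |
| Self-Attention | 40 | 0.8805 | 0.9727 | 34.1670 |
| Self-Attention | 50 | 0.9098 | 0.9757 | 43.9295 |
| Fused Attention | 10 | 0.8891 | 0.9715 | 10.0830 |
| Fused Attention | 20 | 0.8936 | 0.9749 | 17.8837 |
| Fused Attention | 30 | 0.9126 | 0.9784 | 27.9773 |
| Fused Attention | 40 | 0.8980 | 0.9748 | 36.8110 |
| Fused Attention | 50 | 0.9208 | 0.9840 | 44.7582 |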
Figure 8. Accuracy vs. epochs for each model on DAX stock index.
Figure 9. AUC vs. epochs for each model on DAX stock index.

6.5. Performance of All Models on NASDAQ Stock Index

The performance of all models on the NASDAQ stock index is shown in Table 17 and illustrated in Figure 10 and Figure 11. The Fused Attention model outperforms all other models, with 93.64% accuracy and 0.9888 AUC at 50 epochs. Among the baseline models, LSTM and GRU are competitive: LSTM achieves 92.36% accuracy and 0.9790 AUC at 50 epochs, and GRU reaches 92.36% accuracy and 0.9787 AUC at 30 epochs. The Self-Attention model also performs well, peaking at 92.73% accuracy and 0.9796 AUC at 40 epochs, which indicates strong feature extraction ability. With 92.36% accuracy and 0.9793 AUC at 50 epochs, Vanilla Attention trails Self-Attention slightly. Although Fused Attention offers the best performance, its training time at 50 epochs (22.09 s) is the longest of all models. Overall, the results show that Fused Attention is the most suitable model for this task, since it improves trend prediction on NASDAQ data.
Table 17. Performance of different models on NASDAQ stock index.
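| Model | Epochs | Accuracy | AUC | Training Time (s) |
|---|---|---|---|---|
| LSTM | 10 | 0.9018 | 0.9709 | 5.9462 |
| | 20 | 0.9176 | 0.9776 | 6.9876 |
| | 30 | 0.9115 | 0.9783 | 9.7193 |
| | 40 | 0.9152 | 0.9790 | 12.0025 |
| | 50 | 0.9236 | 0.9790 | 16.2997 |
| GRU | 10 | 0.9091 | 0.9728 | 2.2077 |
| | 20 | 0.9212 | 0.9779 | 4.4367 |
| | 30 | 0.9236 | 0.9787 | 7.6702 |
| | 40 | 0.9224 | 0.9794 | 8.3064 |
| | 50 | 0.9176 | 0.9794 | 11.8980 |
| Vanilla Attention | 10 | 0.9030 | 0.9697 | 3.2347 |
| | 20 | 0.9079 | 0.9773 | 6.9445 |
| | 30 | 0.9152 | 0.9785 | 10.4709 |
| | 40 | 0.9030 | 0.9785 | 12.8610 |
| | 50 | 0.9236 | 0.9793 | 15.8315 |
| Self-Attention | 10 | 0.9127 | 0.9770 | 3.8595 |
| | 20 | 0.9164 | 0.9786 | 9.8097 |
| | 30 | 0.9188 | 0.9790 | 12.5427 |
| | 40 | 0.9273 | 0.9796 | 17.0108 |
| | 50 | 0.9212 | 0.9792 | 20.4992 |
| Fused Attention | 10 | 0.9127 | 0.9776 | 5.5162 |
| | 20 | 0.9145 | 0.9792 | 8.0000 |
| | 30 | 0.9219 | 0.9811 | 13.3201 |
| | 40 | 0.9297 | 0.9799 | 18.2401 |
| | 50 | 0.9364 | 0.9888 | 22.0903 |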
Figure 10. Accuracy vs. epochs for each model on NASDAQ stock index.
Figure 11. AUC vs. epochs for each model on NASDAQ stock index.

6.6. Performance of All Models on Nikkei 225 Stock Index

The performance of all models on the Nikkei 225 stock index is shown in Table 18 and illustrated in Figure 12 and Figure 13. All five models (LSTM, GRU, Vanilla Attention, Self-Attention, and Fused Attention) were assessed on the Nikkei 225 dataset at 10 to 50 epochs. With an accuracy of 94.36% and an AUC of 0.9876, Fused Attention at 50 epochs showed the best performance among all models. The Self-Attention and Vanilla Attention models were also competitive, with accuracy above 90% at 50 epochs. The AUC of the LSTM and GRU models generally improved as the number of epochs increased, but their final accuracy values (87.63% and 90.52%, respectively) stayed well below that of Fused Attention. The stronger performance of Fused Attention suggests that combining several attention mechanisms improves the capacity of the model to detect significant stock market patterns. This comes at the cost of a higher training time (57.62 s for 50 epochs) compared with LSTM (39.92 s) and GRU (27.63 s). Despite its higher computational cost, the Fused Attention model is the best option overall for Nikkei 225 trend prediction, since it achieves the highest accuracy and AUC.
Table 18. Performance of different models on Nikkei 225 stock index.
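| Model | Epochs | Accuracy | AUC | Training Time (s) |
|---|---|---|---|---|
| LSTM | 10 | 0.8827 | 0.9651 | 8.7356 |
| | 20 | 0.9011 | 0.9696 | 23.6938 |
| | 30 | 0.8983 | 0.9717 | 24.0915 |
| | 40 | 0.8924 | 0.9727 | 29.1918 |
| | 50 | 0.8763 | 0.9720 | 39.9231 |
| GRU | 10 | 0.8942 | 0.9647 | 6.1163 |
| | 20 | 0.8983 | 0.9695 | 11.1740 |
| | 30 | 0.9033 | 0.9715 | 19.4970 |
| | 40 | 0.9061 | 0.9727 | 21.4837 |
| | 50 | 0.9052 | 0.9725 | 27.6272 |
| Vanilla Attention | 10 | 0.8781 | 0.9643 | 7.7629 |
| | 20 | 0.8960 | 0.9695 | 15.4814 |
| | 30 | 0.9020 | 0.9718 | 22.8225 |
| | 40 | 0.8988 | 0.9725 | 33.2535 |
| | 50 | 0.9079 | 0.9726 | 41.2992 |
| Self-Attention | 10 | 0.9047 | 0.9704 | 11.1087 |
| | 20 | 0.8988 | 0.9721 | 22.3048 |
| | 30 | 0.9061 | 0.9727 | 32.5559 |
| | 40 | 0.8685 | 0.9723 | 42.5713 |
| | 50 | 0.9047 | 0.9726 | 54.2201 |
| Fused Attention | 10 | 0.9072 | 0.9775 | 11.6053 |
| | 20 | 0.8974 | 0.9711 | 23.6152 |
| | 30 | 0.9192 | 0.9824 | 34.8730 |
| | 40 | 0.9206 | 0.9834 | 45.7169 |
| | 50 | 0.9436 | 0.9876 | 57.6205 |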
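The training-time trade-off on the Nikkei 225 index can be made concrete with a short calculation. The sketch below is illustrative only: the 50-epoch times are transcribed from Table 18, while the dictionary and function names are hypothetical and not part of the original study.

```python
# Training times (s) at 50 epochs, transcribed from Table 18 (Nikkei 225).
TRAIN_TIME_50 = {
    "LSTM": 39.9231,
    "GRU": 27.6272,
    "Vanilla Attention": 41.2992,
    "Self-Attention": 54.2201,
    "Fused Attention": 57.6205,
}


def relative_overhead(times, model, reference):
    """Extra training time of `model` over `reference`, in percent."""
    return 100.0 * (times[model] - times[reference]) / times[reference]


# Compare Fused Attention against every other model.
for name in TRAIN_TIME_50:
    if name != "Fused Attention":
        pct = relative_overhead(TRAIN_TIME_50, "Fused Attention", name)
        print(f"Fused Attention vs. {name}: {pct:+.1f}%")
```

With these figures, Fused Attention needs roughly 6% more training time than Self-Attention and a little more than twice the training time of GRU.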
Figure 12. Accuracy vs. epochs for each model on Nikkei 225 stock index.
Figure 13. AUC vs. epochs for each model on Nikkei 225 stock index.

6.7. Performance of All Models on S&P 500 Stock Index

The performance of all models on the S&P 500 stock index is shown in Table 19 and illustrated in Figure 14 and Figure 15. On the S&P 500 data, the attention-based models generally outperform the recurrent models, LSTM and GRU. LSTM improves with more epochs, reaching 89.15% accuracy and an AUC of 0.9669 at 50 epochs, with a training time of 67.54 s. GRU performs similarly, with 88.66% accuracy and an AUC of 0.9667 at 50 epochs, and is somewhat faster, at 51.71 s. Vanilla Attention achieves higher accuracy than both (89.91%, with an AUC of 0.9669 at 50 epochs) but needs 72.25 s of training. Self-Attention peaks at 89.78% accuracy (20 epochs) and an AUC of 0.9677 (50 epochs), and its training time reaches 99.88 s at 50 epochs. The Fused Attention model has the longest training time (103.32 s) but produces the best results: 92.91% accuracy and an AUC of 0.9837 at 50 epochs. Although the conventional models improve steadily, the attention-based models, especially the proposed Fused Attention model, offer better predictive performance.
Table 19. Performance of different models on S&P 500 stock index.
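| Model | Epochs | Accuracy | AUC | Training Time (s) |
|---|---|---|---|---|
| LSTM | 10 | 0.8750 | 0.9584 | 14.2380 |
| | 20 | 0.8667 | 0.9644 | 28.8773 |
| | 30 | 0.8701 | 0.9649 | 41.3208 |
| | 40 | 0.8883 | 0.9670 | 54.5288 |
| | 50 | 0.8915 | 0.9669 | 67.5416 |
| GRU | 10 | 0.8862 | 0.9593 | 10.7352 |
| | 20 | 0.8784 | 0.9654 | 19.7192 |
| | 30 | 0.8706 | 0.9656 | 31.7420 |
| | 40 | 0.8886 | 0.9669 | 41.1791 |
| | 50 | 0.8866 | 0.9667 | 51.7076 |
| Vanilla Attention | 10 | 0.8857 | 0.9593 | 15.9441 |
| | 20 | 0.8939 | 0.9641 | 29.2153 |
| | 30 | 0.8986 | 0.9665 | 43.6378 |
| | 40 | 0.8978 | 0.9662 | 57.3505 |
| | 50 | 0.8991 | 0.9669 | 72.2488 |
| Self-Attention | 10 | 0.8818 | 0.9643 | 19.7511 |
| | 20 | 0.8978 | 0.9670 | 39.3778 |
| | 30 | 0.8891 | 0.9673 | 60.6932 |
| | 40 | 0.8969 | 0.9674 | 78.0388 |
| | 50 | 0.8703 | 0.9677 | 99.8812 |
| Fused Attention | 10 | 0.8966 | 0.9651 | 20.9063 |
| | 20 | 0.9030 | 0.9687 | 41.7439 |
| | 30 | 0.9130 | 0.9766 | 65.2541 |
| | 40 | 0.9122 | 0.9676 | 82.2037 |
| | 50 | 0.9291 | 0.9837 | 103.3167 |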
Figure 14. Accuracy vs. epochs for each model on S&P 500 stock index.
Figure 15. AUC vs. epochs for each model on S&P 500 stock index.

6.8. Performance of All Models on Shanghai Stock Index

The performance of all models on the Shanghai stock index is shown in Table 20 and illustrated in Figure 16 and Figure 17. On the Shanghai Stock Exchange dataset, the Fused Attention model at 50 epochs obtains the highest accuracy (87.30%) and AUC (0.9577), making it the best-performing model. Self-Attention follows closely at 50 epochs, with an accuracy of 87.19% and an AUC of 0.9571. Among the conventional models, LSTM and GRU improve consistently as the number of epochs increases: LSTM reaches 84.37% accuracy and 0.9333 AUC at 50 epochs, while GRU reaches 83.78% accuracy and 0.9343 AUC. Vanilla Attention performs poorly in the early epochs but reaches an accuracy of 84.07% and an AUC of 0.9306 at 50 epochs. In later epochs (30 and above), Self-Attention outperforms Vanilla Attention and the recurrent models, which illustrates the advantage of attention mechanisms. Fused Attention achieves the highest AUC overall, indicating better predictive ability than any other model.
Table 20. Performance of different models on Shanghai stock index.
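| Model | Epochs | Accuracy | AUC | Training Time (s) |
|---|---|---|---|---|
| LSTM | 10 | 0.8156 | 0.9028 | 2.5063 |
| | 20 | 0.8178 | 0.9152 | 4.9480 |
| | 30 | 0.8341 | 0.9214 | 6.6090 |
| | 40 | 0.8370 | 0.9261 | 9.1610 |
| | 50 | 0.8437 | 0.9333 | 11.5554 |
| GRU | 10 | 0.8170 | 0.9034 | 1.7150 |
| | 20 | 0.8200 | 0.9141 | 4.3073 |
| | 30 | 0.8296 | 0.9220 | 5.3555 |
| | 40 | 0.8356 | 0.9269 | 8.6228 |
| | 50 | 0.8378 | 0.9343 | 9.8929 |
| Vanilla Attention | 10 | 0.7911 | 0.9029 | 2.4022 |
| | 20 | 0.8215 | 0.9142 | 4.8537 |
| | 30 | 0.8252 | 0.9197 | 7.9445 |
| | 40 | 0.8207 | 0.9257 | 10.3482 |
| | 50 | 0.8407 | 0.9306 | 12.4405 |
| Self-Attention | 10 | 0.7970 | 0.9179 | 3.3445 |
| | 20 | 0.8207 | 0.9323 | 7.8554 |
| | 30 | 0.8556 | 0.9376 | 11.5627 |
| | 40 | 0.8615 | 0.9512 | 19.2510 |
| | 50 | 0.8719 | 0.9571 | 20.3122 |
| Fused Attention | 10 | 0.8400 | 0.9283 | 4.2346 |
| | 20 | 0.8333 | 0.9425 | 7.2577 |
| | 30 | 0.8615 | 0.9446 | 11.0987 |
| | 40 | 0.8696 | 0.9563 | 15.3568 |
| | 50 | 0.8730 | 0.9577 | 19.8575 |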
Figure 16. Accuracy vs. epochs for each model on Shanghai stock index.
Figure 17. AUC vs. epochs for each model on Shanghai stock index.

6.9. Performance of All Models on NIFTY 50 Stock Index

The performance of all models on the NIFTY 50 stock index is shown in Table 21 and illustrated in Figure 18 and Figure 19. The Fused Attention model performs best on NIFTY 50 trend prediction, reaching its highest accuracy of 0.8983 at 50 epochs and its highest AUC of 0.9685 at 40 epochs. Self-Attention is a close second, with 0.8948 accuracy and 0.9672 AUC at 50 epochs, although it needs a longer training time (16.23 s) than Fused Attention (9.92 s at 40 epochs and 11.87 s at 50 epochs). With more epochs, traditional models such as LSTM and GRU also improve; still, their top accuracy values are 0.8862 and 0.8845, respectively, and their AUC does not exceed 0.96. Vanilla Attention also improves with more epochs, although it stays behind Self-Attention and Fused Attention in both accuracy and AUC.
Table 21. Performance of different models on NIFTY 50 stock index.
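| Model | Epochs | Accuracy | AUC | Training Time (s) |
|---|---|---|---|---|
| LSTM | 10 | 0.8241 | 0.9169 | 2.0056 |
| | 20 | 0.8517 | 0.9463 | 3.3540 |
| | 30 | 0.8638 | 0.9531 | 6.6336 |
| | 40 | 0.8759 | 0.9572 | 6.3785 |
| | 50 | 0.8862 | 0.9589 | 8.3614 |
| GRU | 10 | 0.8328 | 0.9269 | 1.0515 |
| | 20 | 0.8655 | 0.9487 | 2.1409 |
| | 30 | 0.8655 | 0.9534 | 3.2570 |
| | 40 | 0.8724 | 0.9568 | 5.7098 |
| | 50 | 0.8845 | 0.9595 | 5.6378 |
| Vanilla Attention | 10 | 0.8293 | 0.9243 | 1.5489 |
| | 20 | 0.8483 | 0.9471 | 4.1955 |
| | 30 | 0.8707 | 0.9537 | 5.7672 |
| | 40 | 0.8707 | 0.9575 | 7.4089 |
| | 50 | 0.8741 | 0.9591 | 7.7112 |
| Self-Attention | 10 | 0.8569 | 0.9511 | 2.3682 |
| | 20 | 0.8828 | 0.9610 | 5.2289 |
| | 30 | 0.8603 | 0.9623 | 5.8652 |
| | 40 | 0.8879 | 0.9654 | 9.9230 |
| | 50 | 0.8948 | 0.9672 | 16.2260 |
| Fused Attention | 10 | 0.8552 | 0.9610 | 2.1109 |
| | 20 | 0.8897 | 0.9631 | 5.3348 |
| | 30 | 0.8734 | 0.9677 | 6.4097 |
| | 40 | 0.8979 | 0.9685 | 9.9200 |
| | 50 | 0.8983 | 0.9678 | 11.8721 |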
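Because the best accuracy and the best AUC of a model do not always occur at the same epoch count, it can help to read them off the table programmatically. The following is a minimal sketch using the Self-Attention and Fused Attention rows of Table 21; the data structure and helper names are assumptions made for illustration.

```python
# Rows transcribed from Table 21 (NIFTY 50):
# (epochs, accuracy, AUC, training time in seconds).
TABLE_21 = {
    "Self-Attention": [
        (10, 0.8569, 0.9511, 2.3682),
        (20, 0.8828, 0.9610, 5.2289),
        (30, 0.8603, 0.9623, 5.8652),
        (40, 0.8879, 0.9654, 9.9230),
        (50, 0.8948, 0.9672, 16.2260),
    ],
    "Fused Attention": [
        (10, 0.8552, 0.9610, 2.1109),
        (20, 0.8897, 0.9631, 5.3348),
        (30, 0.8734, 0.9677, 6.4097),
        (40, 0.8979, 0.9685, 9.9200),
        (50, 0.8983, 0.9678, 11.8721),
    ],
}

ACC, AUC = 1, 2  # column indices inside each row tuple


def best_by(rows, column):
    """Return (best value, epochs at which it occurs) for one column."""
    row = max(rows, key=lambda r: r[column])
    return row[column], row[0]


def summarize(table):
    """Best accuracy and best AUC (with their epoch counts) per model."""
    return {
        model: {"best_acc": best_by(rows, ACC), "best_auc": best_by(rows, AUC)}
        for model, rows in table.items()
    }


summary = summarize(TABLE_21)
for model, stats in summary.items():
    print(model, stats)
```

For Fused Attention, the highest accuracy is found at 50 epochs and the highest AUC at 40 epochs, so the two best values should be reported with their respective epoch counts.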
Figure 18. Accuracy vs. epochs for each model on NIFTY 50 stock index.
Figure 19. AUC vs. epochs for each model on NIFTY 50 stock index.

7. Comparative Analysis of Model Performance

Table 22 and Table 23 summarize the best classification accuracy and AUC values, respectively, achieved by the baseline models (LSTM, GRU, Vanilla Attention, and Self-Attention) and the proposed Fused Attention architecture (SGR-Net) across nine global stock indices. A clear pattern emerges from both metrics: while conventional recurrent models (LSTM and GRU) and attention-based variants provide competitive performance, the integration of Sparse, Global, and Random Attention modules in SGR-Net consistently delivers superior results.

7.1. Accuracy Analysis

The proposed model, SGR-Net, achieves the highest accuracy across all indices; for the indices discussed here, the gains over the strongest baseline range from about 0.1 to 4.6 percentage points. Notably, on the NYSE AMEX and Nikkei 225 indices, SGR-Net attains 0.9428 and 0.9436, respectively, surpassing the best Self-Attention baseline scores of 0.8970 and 0.9061. These absolute gains of about 4.6 and 3.8 percentage points demonstrate the effectiveness of the fusion strategy in markets with moderate-to-high volatility. On relatively stable markets, such as DJUS and Shanghai, improvements are more modest (SGR-Net: 0.8581 and 0.8730 vs. Self-Attention: 0.8529 and 0.8719), suggesting that Fused Attention is particularly advantageous in complex or noisy financial environments.
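The accuracy margins can be checked with a few lines of arithmetic. This sketch only uses the best-accuracy pairs quoted in this subsection; the variable and function names are hypothetical.

```python
# (SGR-Net best accuracy, Self-Attention best accuracy), as quoted in Section 7.1.
REPORTED_ACCURACY = {
    "NYSE AMEX": (0.9428, 0.8970),
    "Nikkei 225": (0.9436, 0.9061),
    "DJUS": (0.8581, 0.8529),
    "Shanghai": (0.8730, 0.8719),
}


def gain_in_points(sgr_net, baseline):
    """Absolute accuracy gain expressed in percentage points."""
    return round((sgr_net - baseline) * 100, 2)


gains = {
    index: gain_in_points(sgr, base)
    for index, (sgr, base) in REPORTED_ACCURACY.items()
}
for index, gain in gains.items():
    print(f"{index}: +{gain:.2f} percentage points")
```

Expressing the differences in percentage points avoids confusing an absolute gain (for example, 0.9428 minus 0.8970) with a relative improvement over the baseline.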

7.2. AUC Analysis

The AUC results reinforce the robustness of the Fused Attention design. SGR-Net achieves near-perfect separability, with AUC ≥ 0.98 on most developed-market indices (NYSE AMEX: 0.9824; NASDAQ: 0.9888; Nikkei 225: 0.9876; S&P 500: 0.9837). These values represent gains of 0.8–1.6 percentage points over Self-Attention, a margin that, although numerically small, is statistically meaningful given the difficulty of improving AUC beyond 0.97 in financial classification tasks. On emerging markets such as the Shanghai Stock Exchange, the advantage is marginal (0.9577 vs. 0.9571), again suggesting that Fused Attention is most beneficial when handling volatility and nonlinear dynamics.
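As a reminder of what "separability" means here, AUC equals the probability that a randomly chosen uptrend sample receives a higher score than a randomly chosen downtrend sample. The sketch below computes it by direct pairwise comparison on toy data; it is a generic illustration, not the evaluation code used in the study, and the example labels and scores are made up.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = uptrend, 0 = downtrend (hypothetical scores).
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.90, 0.80, 0.70, 0.30, 0.60, 0.65]
print(f"AUC = {auc_from_scores(toy_labels, toy_scores):.4f}")
```

An AUC close to 1 therefore means that almost every uptrend sample is scored above almost every downtrend sample, independently of the decision threshold.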

7.3. Critical Observations

The consistency of SGR-Net across diverse indices underscores its generalizability, with no single dataset showing underperformance relative to baselines. The largest improvements appear on markets with higher noise and liquidity variations (e.g., NYSE AMEX and Nikkei 225), which supports the inclusion of the stochastic Random Attention component as a means to mitigate overfitting and enhance adaptability. Although Self-Attention remains strong on stable markets (e.g., Shanghai and DAX), SGR-Net’s fusion provides an incremental but consistent edge, supporting the claim of robustness across regimes.
Overall, the experimental evidence shows that the Fused Attention mechanism is not only statistically superior but also practically relevant for real-world trading scenarios, where small improvements in predictive reliability can translate into significant financial gains.

8. Ablation Study

8.1. Individual Attention Ablation Study: Sparse, Global, and Random Attention

Table 24 presents the ablation study on the five datasets (DJUS, NYSE AMEX, BSE, DAX, and NASDAQ), highlighting the relative contribution of each attention component.
The Sparse-only configuration exhibited the weakest performance across most indices. For instance, on DJUS, accuracy dropped to 0.8352 (95% CI: 0.8150–0.8550) and AUC to 0.9250 (95% CI: 0.9185–0.9310). Similarly, on BSE and DAX, the accuracy scores of 0.9127 and 0.9048, respectively, were lower than those of SGR-Net (0.9324 and 0.9208) by roughly 1.6 to 2.0 percentage points. Precision, recall, and F1-scores also remained consistently weaker, suggesting that relying exclusively on Sparse Attention fails to capture the long-range dependencies that are critical in financial time series.
The Global-only variant performed better than the Sparse-only one, with accuracy scores in the range of 0.8361–0.9256 across datasets. For instance, on DAX, the accuracy was 0.9029 (95% CI: 0.8880–0.9160), with an AUC of 0.9744 (95% CI: 0.9698–0.9790), which is competitive but still below that of SGR-Net. Precision values, such as 0.8328 (95% CI: 0.8050–0.8570) on DJUS, indicate improved stability, yet recall scores dropped in several cases, showing that Global Attention alone overemphasizes dominant signals and under-represents minority trends.
Random-only Attention achieved moderate results, better than Sparse-only but weaker than Global-only in most indices. On NASDAQ, the accuracy was 0.9249 (95% CI: 0.9070–0.9400) with 0.9793 AUC (95% CI: 0.9732–0.9848), but the recall was relatively lower (0.9041, 95% CI: 0.8810–0.9250), indicating limited ability to consistently capture directional changes. Although the stochastic initialization occasionally matched Global Attention in precision, as observed in DJUS, with 0.8292 (95% CI: 0.8020–0.8540), the overall stability across epochs was weaker.
The full SGR-Net model consistently outperformed the ablation variants across all indices. For example, on NYSE AMEX, accuracy was 0.9428 (95% CI: 0.9300–0.9560) with 0.9824 AUC (95% CI: 0.9760–0.9880), far exceeding the ablation configurations by 5–8%. Similarly, on NASDAQ, SGR-Net reached 0.9364 (95% CI: 0.9200–0.9520) accuracy and 0.9888 (95% CI: 0.9820–0.9940) AUC. Precision, recall, and F1-score all maintained higher estimates with narrower CIs; notably, for DJUS the F1-score was 0.8900 (95% CI: 0.8640–0.9140). This indicates that the synergy of the Sparse, Global, and Random Attention components produces both more accurate and more stable predictions, validating the necessity of all three attention mechanisms.
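For reference, the precision, recall, and F1-score discussed above follow the standard binary-classification definitions. The sketch below computes them from predicted and true trend labels; the function name and the toy labels are illustrative assumptions, not data from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": correct / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy example: 1 = uptrend, 0 = downtrend (hypothetical labels).
demo_true = [1, 1, 1, 0, 0, 0, 1, 0]
demo_pred = [1, 1, 0, 0, 1, 0, 1, 0]
print(binary_metrics(demo_true, demo_pred))
```

A variant with high precision but low recall, as described for some single-attention configurations, predicts uptrends reliably when it does predict them but misses a larger share of the actual uptrends.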
Table 25 presents the ablation study on the remaining four datasets (Nikkei 225, S&P 500, Shanghai Stock Exchange, and NIFTY 50), further demonstrating the impact of individual and combined attention components.
The Sparse-only configuration consistently underperformed compared with the other variants. On the Nikkei 225 index, accuracy dropped to 0.9028 (95% CI: 0.8890–0.9150) with an AUC of 0.9665 (95% CI: 0.9590–0.9730). Similarly, on S&P 500 and NIFTY 50, it achieved only 0.8962 and 0.8892 accuracy, respectively, both below those of SGR-Net (0.9291 and 0.8983). Precision values, such as 0.8380 (95% CI: 0.8100–0.8640) on Shanghai, indicate difficulty in maintaining predictive stability, while recall appeared inflated, at 0.8720 (95% CI: 0.8480–0.8940), reflecting overemphasis on one class. Overall, Sparse Attention alone fails to generalize consistently across diverse indices.
The Global-only configuration performed slightly better than the Sparse-only configuration but still lagged behind the integrated model. For instance, on Nikkei 225, it showed an accuracy of 0.9056 (95% CI: 0.8930–0.9180) and an AUC of 0.9689 (95% CI: 0.9620–0.9750). On S&P 500, accuracy stagnated at 0.8940 (95% CI: 0.8810–0.9060), while on Shanghai, it remained weak, at 0.8605 (95% CI: 0.8400–0.8790). Although precision and recall values were more balanced, such as a recall of 0.9100 (95% CI: 0.8940–0.9240) on Nikkei 225, the model struggled with minority classes and produced wider confidence intervals, reflecting instability in directional forecasting.
Random-only Attention yielded slightly better recall on indices such as S&P 500 (0.8900, 95% CI: 0.8760–0.9030), but its accuracy gains were marginal, with 0.9040 (95% CI: 0.8910–0.9160) on the Nikkei 225 stock index and only 0.8571 (95% CI: 0.8360–0.8760) on Shanghai. Precision was stronger, such as 0.9045 (95% CI: 0.8880–0.9200) on the Nikkei 225 stock index, but the inconsistent recall on Shanghai (0.8480, 95% CI: 0.8200–0.8750) resulted in weaker F1-scores. While stochastic attention captures some robustness to noise, it alone lacks the structure to yield reliable improvements across markets.
The proposed model, SGR-Net, clearly outperformed all ablation configurations. On Nikkei 225, it achieved 0.9436 accuracy (95% CI: 0.9300–0.9550) and 0.9876 AUC (95% CI: 0.9820–0.9920), representing a substantial margin over the best single-attention-component baselines. On S&P 500, accuracy rose to 0.9291 (95% CI: 0.9190–0.9380) with an F1-score of 0.8891 (95% CI: 0.8790–0.8990), significantly surpassing the Random and Sparse variants. Shanghai, the most challenging dataset, still showed strong improvements, with 0.8730 accuracy (95% CI: 0.8530–0.8910) and 0.9577 AUC (95% CI: 0.9480–0.9660). NIFTY 50 similarly reached 0.8983 accuracy (95% CI: 0.8790–0.9160) and 0.9685 AUC (95% CI: 0.9590–0.9780). In all cases, precision, recall, and F1-score were consistently higher with narrower confidence intervals, confirming that the synergy of Sparse, Global, and Random Attention mechanisms drives both predictive strength and robustness across international stock indices.
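The 95% confidence intervals quoted in this section can be obtained in several ways, and this section does not state which procedure was used. One common option is the percentile bootstrap over test samples, sketched below under that assumption; the function name, resample count, and synthetic labels are all hypothetical.

```python
import random


def bootstrap_accuracy_ci(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Point accuracy plus a percentile-bootstrap (1 - alpha) interval.

    Assumption: test samples are resampled with replacement as independent
    units; this ignores temporal dependence between trading days.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    rng = random.Random(seed)
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    n = len(correct)
    stats = []
    for _ in range(n_resamples):
        resample = [correct[rng.randrange(n)] for _ in range(n)]
        stats.append(sum(resample) / n)
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(correct) / n, lower, upper


# Synthetic labels with 90% agreement (not data from the study).
boot_true = [1] * 100 + [0] * 100
boot_pred = [1] * 90 + [0] * 10 + [0] * 90 + [1] * 10
acc, lo, hi = bootstrap_accuracy_ci(boot_true, boot_pred, n_resamples=1000)
print(f"accuracy = {acc:.4f}, 95% CI: {lo:.4f}-{hi:.4f}")
```

Under this kind of procedure, a narrower interval reflects less variation of the metric across resampled test sets, which is the sense in which tighter intervals are read as greater stability.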

8.2. Remove-One Ablation Study

8.2.1. Analysis of Remove-One Ablation on DJUS, NYSE AMEX, BSE, DAX, and NASDAQ Stock Indices

Table 26 presents the remove-one ablation study across the DJUS, NYSE AMEX, BSE, DAX, and NASDAQ indices, highlighting the contribution of the Sparse, Global, and Random Attention components. Across all datasets, the complete SGR-Net consistently outperforms its reduced variants. For instance, on the NYSE AMEX index, SGR-Net achieves an accuracy of 0.9428 (95% CI: 0.9300–0.9560) and an AUC of 0.9824 (95% CI: 0.9760–0.9880), compared with 0.9073/0.9691 (accuracy/AUC) for the No-Sparse variant and 0.9134/0.9696 for the No-Random configuration. Similarly, on the BSE index, SGR-Net attains an accuracy of 0.9324 (95% CI: 0.9130–0.9490) and an AUC of 0.9793 (95% CI: 0.9690–0.9880), which are markedly higher than those of the ablation baselines (0.9073–0.9134 accuracy and 0.9691–0.9696 AUC). Gains are also observed in precision, recall, and F1-score, where SGR-Net maintains values above 0.92 across BSE, DAX, and NASDAQ, while the remove-one configurations drop to the 0.88–0.90 range.
The DAX index further illustrates the stability of the fused model, with SGR-Net yielding 0.9208 accuracy (95% CI: 0.9070–0.9350) and 0.9840 AUC (95% CI: 0.9780–0.9890), compared with 0.9062/0.9750 for No-Global and 0.9088/0.9753 for No-Random. On the NASDAQ index, SGR-Net achieves the strongest overall results, with 0.9364 accuracy (95% CI: 0.9200–0.9520) and 0.9888 AUC (95% CI: 0.9820–0.9940), outperforming the remove-one variants by approximately 0.03–0.04 in accuracy and 0.015–0.02 in AUC. Even on the more challenging DJUS index, where overall performance is relatively lower, SGR-Net secures 0.8581 accuracy (95% CI: 0.8370–0.8780) and 0.9442 AUC (95% CI: 0.9320–0.9570), showing consistent improvements over the reduced variants (0.8368–0.8478 accuracy, 0.9323–0.9439 AUC).
Overall, the results confirm that no single attention mechanism is sufficient: while Sparse, Global, and Random Attention individually contribute meaningful discriminative power, their integration in SGR-Net consistently enhances predictive performance. The improvements are not only evident in point estimates but also reflected in tighter confidence intervals, suggesting greater robustness and reliability of the Fused Attention framework across diverse financial markets.

8.2.2. Analysis of Remove-One Ablation on Nikkei 225, S&P 500, Shanghai, and NIFTY 50 Stock Indices

Table 27 presents the remove-one ablation study across the Nikkei 225, S&P 500, Shanghai Stock Exchange, and NIFTY 50 indices, highlighting the contribution of the Sparse, Global, and Random Attention components. Across the four indices, SGR-Net consistently outperformed the remove-one ablation variants (No-Sparse, No-Global, and No-Random). On the Nikkei 225 index, SGR-Net achieved the highest accuracy of 0.9436 (95% CI: 0.9300–0.9550) and an AUC of 0.9876 (95% CI: 0.9820–0.9920), surpassing the reduced models, which remained in the 0.9006–0.9075 accuracy and 0.9697–0.9737 AUC ranges. On the S&P 500 index, SGR-Net obtained an accuracy of 0.9291 (95% CI: 0.9190–0.9380) and an AUC of 0.9837 (95% CI: 0.9790–0.9880), outperforming the ablation configurations (accuracy 0.8971–0.9020, AUC 0.9671–0.9699).
For the Shanghai index, where overall accuracy scores were lower, SGR-Net still produced the best results with an accuracy of 0.8730 (95% CI: 0.8530–0.8910) and an AUC of 0.9577 (95% CI: 0.9480–0.9660), compared with 0.8511–0.8711 accuracy and 0.9446–0.9568 AUC for the remove-one models. Similarly, on the NIFTY 50 index, SGR-Net achieved 0.8983 (95% CI: 0.8790–0.9160) accuracy and 0.9685 (95% CI: 0.9590–0.9780) AUC, outperforming ablation configurations with accuracy scores of 0.8879–0.8948 and AUC values of 0.9643–0.9650.
In addition to accuracy and AUC, SGR-Net provided stronger precision, recall, and F1-scores across all indices, with both higher point estimates and narrower confidence intervals, while the ablation variants exhibited small drops in these metrics. These results confirm that the combined use of Sparse, Global, and Random Attention yields a measurable improvement over removing any single attention component.
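The two ablation protocols in this section differ only in which attention components are switched on. As an illustrative sketch (the variant names follow the text, while the function and the tuple representation are assumptions), the seven configurations can be enumerated as follows:

```python
COMPONENTS = ("Sparse", "Global", "Random")


def ablation_configs(components=COMPONENTS):
    """Map each variant name to the attention components it keeps."""
    configs = {}
    # Individual attention ablation (Section 8.1): one component at a time.
    for c in components:
        configs[f"{c}-only"] = (c,)
    # Remove-one ablation (Section 8.2): drop exactly one component.
    for c in components:
        configs[f"No-{c}"] = tuple(x for x in components if x != c)
    # Full model with all three components.
    configs["SGR-Net"] = tuple(components)
    return configs


configs = ablation_configs()
for name, active in configs.items():
    print(f"{name:12s} -> {', '.join(active)}")
```

Each configuration would then be trained and evaluated under the same protocol, so that any difference in the metrics can be attributed to the components that were kept or removed.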

9. Conclusions and Future Work

This study introduces a novel Fused Attention model (SGR-Net), which integrates Random, Global, and Sparse Attention mechanisms to enhance stock market trend prediction across multiple indices. Utilizing thirteen technical indicators, the proposed model demonstrates superior accuracy, AUC, and generalization capability compared with baseline models such as LSTM, GRU, Vanilla Attention, and Self-Attention. Specifically, the Fused Attention model achieves AUC improvements of 0.49% to 1.89% and accuracy gains of 1.89% to 6.53%, consistently outperforming other models across the datasets DJUS, NYSE AMEX, BSE, DAX, NASDAQ, Nikkei 225, S&P 500, Shanghai Stock Exchange, and NIFTY 50. Notably, the model converges faster at lower epoch counts, which partly offsets its longer training times.
The Fused Attention model effectively captures complex temporal patterns, cross-variable interdependencies, and nonlinear interactions in financial time-series data. While conventional models like LSTM and GRU provide stable performance, attention-based models, particularly the proposed Fused Attention model, demonstrate superior predictive power and interpretability.
In this study, Sparse Attention reduces computational overhead, Global Attention captures long-term dependencies, and Random Attention mitigates overfitting, thereby enhancing the model’s robustness across diverse market conditions.
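To illustrate why these three patterns play different roles, the sketch below builds boolean attention masks over a sequence of time steps: a local window (one common form of sparse attention), a few globally visible positions, and randomly sampled links, and then takes their union. This is a generic illustration under assumed definitions, not the SGR-Net implementation; the window size, global positions, number of random links, and all function names are hypothetical.

```python
import random


def window_mask(n, window):
    """Sparse (local) pattern: each step attends to neighbours within `window`."""
    return [[abs(i - j) <= window for j in range(n)] for i in range(n)]


def global_mask(n, global_positions):
    """Global pattern: chosen positions attend to, and are seen by, all steps."""
    g = set(global_positions)
    return [[(i in g) or (j in g) for j in range(n)] for i in range(n)]


def random_mask(n, k, seed=0):
    """Random pattern: each step attends to `k` randomly sampled steps."""
    rng = random.Random(seed)
    mask = []
    for _ in range(n):
        chosen = set(rng.sample(range(n), k))
        mask.append([j in chosen for j in range(n)])
    return mask


def fuse(*masks):
    """Union of several masks: a pair is allowed if any pattern allows it."""
    n = len(masks[0])
    return [[any(m[i][j] for m in masks) for j in range(n)] for i in range(n)]


def density(mask):
    """Fraction of query-key pairs that are allowed (1.0 = full attention)."""
    n = len(mask)
    return sum(sum(row) for row in mask) / (n * n)


n_steps = 16
fused = fuse(
    window_mask(n_steps, window=1),
    global_mask(n_steps, global_positions=[0]),
    random_mask(n_steps, k=2, seed=42),
)
print(f"fused mask density: {density(fused):.2f} (full attention = 1.00)")
```

In such a scheme, the fused mask covers far fewer query-key pairs than full self-attention, the global positions provide a path between distant time steps, and the random links vary the connectivity in a way that can act as a regularizer.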
In subsequent research, we plan to extend the applicability of the Fused Attention model to different forecasting tasks, such as electricity consumption prediction, FOREX trend prediction, and wind energy forecasting. Additionally, we aim to incorporate chaotic time-series modeling to further enhance accuracy and generalization. Furthermore, we will explore ways to optimize the computational efficiency of the model for real-time applications.

Author Contributions

Conceptualization, R.R.K., R.P., S.K.N., R.K.B. and M.J.S.; methodology, R.R.K. and R.P.; software, R.R.K., R.P., S.K.N. and R.K.B.; validation, R.P. and M.J.S.; formal analysis, R.R.K., S.K.N. and R.K.B.; investigation, M.J.S.; resources, R.P.; data curation, R.R.K., R.P., S.K.N. and R.K.B.; writing—original draft preparation, R.R.K., R.P., S.K.N. and R.K.B.; writing—review and editing, M.J.S.; visualization, R.R.K. and M.J.S.; supervision, M.J.S.; project administration, R.P. and M.J.S.; funding acquisition, M.J.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research study and the APC were funded by Biomedical Sensors & Systems Lab, University of Memphis, Memphis, TN 38152, USA.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used and analyzed in this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. González-Rivera, G.; Lee, T.H. Financial Forecasting, Nonlinear Time Series. In Encyclopedia of Complexity and Systems Science; Meyers, R.A., Ed.; Springer: New York, NY, USA, 2009; pp. 3475–3504.
  2. Pai, P.F.; Lin, C.S. A Hybrid ARIMA and Support Vector Machines Model in Stock Price Forecasting. Omega 2005, 33, 497–505.
  3. Wei, W.W.S. Forecasting with ARIMA Processes. In International Encyclopedia of Statistical Science; Lovric, M., Ed.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 534–536.
  4. Long, W.; Lu, Z.; Cui, L. Deep Learning-Based Feature Engineering for Stock Price Movement Prediction. Knowl.-Based Syst. 2019, 164, 163–173.
  5. Liu, G.; Wang, X. A Numerical-Based Attention Method for Stock Market Prediction with Dual Information. IEEE Access 2019, 7, 7357–7367.
  6. Chen, J.; Xie, L.; Lin, W.; Wu, Y.; Xu, H. Multi-Granularity Spatio-Temporal Correlation Networks for Stock Trend Prediction. IEEE Access 2024, 12, 67219–67232.
  7. Luo, A.; Zhong, L.; Wang, J.; Wang, Y.; Li, S.; Tai, W. Short-Term Stock Correlation Forecasting Based on CNN-BiLSTM Enhanced by Attention Mechanism. IEEE Access 2024, 12, 29617–29632.
  8. Lam, M. Neural network techniques for financial performance prediction: Integrating fundamental and technical analysis. Decis. Support Syst. 2004, 37, 567–581.
  9. Kara, Y.; Boyacioglu, M.A.; Kaan Baykan, Ö. Predicting Direction of Stock Price Index Movement Using Artificial Neural Networks and Support Vector Machines: The Sample of the Istanbul Stock Exchange. Expert Syst. Appl. 2011, 38, 5311–5319.
  10. Patel, J.; Shah, S.; Thakkar, P.; Kotecha, K. Predicting Stock and Stock Price Index Movement Using Trend Deterministic Data Preparation and Machine Learning Techniques. Expert Syst. Appl. 2015, 42, 259–268.
  11. Ballings, M.; Poel, D.V.D.; Hespeels, N.; Gryp, R. Evaluating Multiple Classifiers for Stock Price Direction Prediction. Expert Syst. Appl. 2015, 42, 7046–7056.
  12. Zhang, X.D.; Li, A.; Pan, R. Stock Trend Prediction Based on a New Status Box Method and AdaBoost Probabilistic Support Vector Machine. Appl. Soft Comput. 2016, 49, 385–398.
  13. Fischer, T.; Krauss, C. Deep Learning with Long Short-Term Memory Networks for Financial Market Predictions. Eur. J. Oper. Res. 2018, 270, 654–669.
  14. Lai, G.; Chang, W.C.; Yang, Y.; Liu, H. Modeling long- and short-term temporal patterns with deep neural networks. In Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018, Ann Arbor, MI, USA, 8–12 July 2018; pp. 95–104.
  15. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. In Proceedings of the ICLR 2021—9th International Conference on Learning Representations, Virtual, 3–7 May 2021.
  16. Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are Transformers Effective for Time Series Forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 11121–11128.
  17. Qiu, J.; Wang, B.; Zhou, C. Forecasting stock prices with long-short term memory neural network based on attention mechanism. PLoS ONE 2020, 15, e0227222.
  18. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 11106–11115.
  19. Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. FEDformer: Frequency Enhanced Decomposed Transformer for Long-Term Series Forecasting. In Proceedings of the 39th International Conference on Machine Learning (ICML 2022), Baltimore, MD, USA, 17–23 July 2022; Proceedings of Machine Learning Research. Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., Sabato, S., Eds.; Volume 162, pp. 27268–27286. Available online: https://proceedings.mlr.press/v162/zhou22g.html (accessed on 21 August 2025).
  20. Zhang, Y.; Yan, J. Crossformer: Transformer Utilizing Cross-Dimension Dependency for Multivariate Time Series Forecasting. In Proceedings of the International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
  21. Nie, Y.; Nguyen, N.H.; Sinthong, P.; Kalagnanam, J. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. In Proceedings of the 11th International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, 1–5 May 2023.
  22. Cheng, R.; Li, Q. Modeling the Momentum Spillover Effect for Stock Prediction via Attribute-Driven Graph Attention Networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021.
  23. Lu, W.; Li, J.; Wang, J.; Qin, L. A CNN-BiLSTM-AM Method for Stock Price Prediction. Neural Comput. Appl. 2021, 33, 4741–4753.
  24. Su, H.; Wang, X.; Qin, Y.; Chen, Q. Attention-based adaptive spatial–temporal hypergraph convolutional networks for stock price trend prediction. Expert Syst. Appl. 2024, 238, 121899.
  25. Zhang, Y.; Wu, R.; Dascalu, S.M.; Harris, F.C. Sparse transformer with local and seasonal adaptation for multivariate time series forecasting. Sci. Rep. 2024, 14, 15909.
  26. Dong, Y.; Hao, Y. A Stock Prediction Method Based on Multidimensional and Multilevel Feature Dynamic Fusion. Electronics 2024, 13, 4111.
  27. Bustos, O.; Pomares-Quimbaya, A. Stock Market Movement Forecast: A Systematic Review. Expert Syst. Appl. 2020, 156, 113464.
  28. Nabipour, M.; Nayyeri, P.; Jabani, H.; Shahab, S.; Mosavi, A. Predicting Stock Market Trends Using Machine Learning and Deep Learning Algorithms via Continuous and Binary Data: A Comparative Analysis. IEEE Access 2020, 8, 150199–150212.
  29. Lee, H.; Kim, J.H.; Jung, H.S. Deep-learning-based stock market prediction incorporating ESG sentiment and technical indicators. Sci. Rep. 2024, 14, 10262.
  30. Chong, E.; Han, C.; Park, F.C. Deep Learning Networks for Stock Market Analysis and Prediction: Methodology, Data Representations, and Case Studies. Expert Syst. Appl. 2017, 83, 187–205.
  31. Jiang, W. Applications of Deep Learning in Stock Market Prediction: Recent Progress. Expert Syst. Appl. 2021, 184, 115537. [Google Scholar] [CrossRef]
  32. Sun, Z.; Harit, A.; Cristea, A.I.; Wang, J.; Lio, P. MONEY: Ensemble learning for stock price movement prediction via a convolutional network with adversarial hypergraph model. AI Open 2023, 4, 165–174. [Google Scholar] [CrossRef]
  33. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
  34. Mao, W.; Liu, P.; Huang, J. SF-Transformer: A Mutual Information-Enhanced Transformer Model with Spot-Forward Parity for Forecasting Long-Term Chinese Stock Index Futures Prices. Entropy 2024, 26, 478. [Google Scholar] [CrossRef]
  35. Yang, S.; Ding, Y.; Xie, B.; Guo, Y.; Bai, X.; Qian, J.; Gao, Y.; Wang, W.; Ren, J. Advancing Financial Forecasts: A Deep Dive into Memory Attention and Long-Distance Loss in Stock Price Predictions. Appl. Sci. 2023, 13, 12160. [Google Scholar] [CrossRef]
  36. Peng, H.; Pappas, N.; Yogatama, D.; Schwartz, R.; Smith, N.A.; Kong, L. Random Feature Attention. arXiv 2021, arXiv:2103.02143. [Google Scholar] [CrossRef]
  37. Zheng, L.; Wang, C.; Kong, L. Linear Complexity Randomized Self-Attention Mechanism. arXiv 2022, arXiv:2204.04667. [Google Scholar] [CrossRef]
Figure 1. Architecture of the proposed model, SGR-Net, integrating Sparse, Global, and Random Attention mechanisms.
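To make the three attention patterns named in the Figure 1 caption concrete, the following is a minimal sketch of how sparse, global, and random connectivity can be expressed as boolean masks over a sequence and combined. Everything here is an illustrative assumption rather than the authors' formulation: "sparse" is taken to mean a local window, the function names and parameters (`window`, `global_positions`, `k`) are hypothetical, and the fusion is shown as a simple union of masks, whereas SGR-Net may fuse the outputs of separate attention branches instead.

```python
import random

def sparse_mask(n, window):
    """Assumed local pattern: position i may attend to j when |i - j| <= window."""
    return [[abs(i - j) <= window for j in range(n)] for i in range(n)]

def global_mask(n, global_positions):
    """Chosen positions attend to every position and are visible to every position."""
    g = set(global_positions)
    return [[(i in g) or (j in g) for j in range(n)] for i in range(n)]

def random_mask(n, k, seed=0):
    """Each position attends to k positions drawn uniformly without replacement."""
    rng = random.Random(seed)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in rng.sample(range(n), k):
            mask[i][j] = True
    return mask

def fuse(*masks):
    """Union of patterns (an assumption): a pair is allowed if any component allows it."""
    n = len(masks[0])
    return [[any(m[i][j] for m in masks) for j in range(n)] for i in range(n)]

n = 8  # toy sequence length
fused = fuse(sparse_mask(n, 1), global_mask(n, [0]), random_mask(n, 2))
for row in fused:
    print("".join("x" if allowed else "." for allowed in row))
```

Each printed row shows which time steps a given step may attend to. The local band keeps the cost low, the global position gives every step a path to the whole sequence, and the random links add connections that change with the seed.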
Table 1. Comparison of techniques in stock market prediction.
| Reference | Technique | Attention | Task |
|---|---|---|---|
| [8] | Backpropagation Neural Network | No | Time-Series Classification |
| [9] | Neural Network and Support Vector Machine | No | Time-Series Classification |
| [10] | Neural Network, Support Vector Machine, Random Forest, and Naïve Bayes | No | Time-Series Classification |
| [11] | Kernel Factory | No | Time-Series Classification |
| [12] | Genetic Algorithm and Support Vector Machine | No | Time-Series Classification |
| [13] | LSTM | No | Time-Series Classification |
| [14] | CNN and RNN | No | Time-Series Forecasting |
| [15] | Transformer | Yes | Image Classification |
| [16] | Transformer | Yes | Time-Series Forecasting |
| [17] | LSTM with Attention | Yes | Time-Series Forecasting |
| [18] | Informer | Yes | Time-Series Forecasting |
| [19] | FEDformer | Yes | Time-Series Forecasting |
| [20] | Crossformer | Yes | Time-Series Forecasting |
| [21] | PatchTST | Yes | Time-Series Forecasting |
| [22] | Graph Attention Network | Yes | Time-Series Forecasting |
| [23] | BiLSTM, Transformer | Yes | Time-Series Forecasting |
| [24] | Noise-Aware Attention | Yes | Time-Series Forecasting |
| [25] | DozerFormer | Yes | Time-Series Forecasting |
| [26] | Dynamic Feature Fusion Frameworks | Yes | Time-Series Forecasting |
| Our Model: SGR-Net | Fusion of Sparse, Global, and Random Attention | Yes | Time-Series Classification |
Table 2. Dataset details with distribution of uptrend and downtrend classes across indices.
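| Stock Index | Time Span | Training (70%): Up-Trends | Training (70%): Down-Trends | Training (70%): Total Instances | Testing (30%): Up-Trends | Testing (30%): Down-Trends | Testing (30%): Total Instances |
|---|---|---|---|---|---|---|---|
| DJUS Index | April 2005–July 2016 | 997 | 827 | 1824 | 438 | 344 | 782 |
| NYSE AMEX Index | January 1996–July 2016 | 1957 | 1668 | 3625 | 850 | 704 | 1554 |
| BSE Index | January 2005–December 2015 | 1020 | 892 | 1912 | 437 | 382 | 819 |
| DAX Index | January 1991–July 2016 | 2413 | 2114 | 4527 | 1044 | 897 | 1941 |
| NASDAQ Index | January 2005–December 2015 | 1058 | 865 | 1923 | 446 | 378 | 824 |
| Nikkei 225 Index | January 1987–July 2016 | 2609 | 2483 | 5092 | 1115 | 1067 | 2182 |
| S&P 500 Index | January 1962–July 2016 | 5118 | 4473 | 9591 | 2146 | 1964 | 4110 |
| Shanghai Stock Exchange Index | January 1998–July 2016 | 1679 | 1470 | 3149 | 685 | 665 | 1350 |
| NIFTY 50 Index | January 2008–December 2015 | 709 | 644 | 1353 | 300 | 279 | 579 |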
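The counts in Table 2 can be checked with a little arithmetic. The sketch below copies the up-trend and down-trend counts from the table and computes, for each index, the share of instances that fall in the training set and the share labelled as uptrends. The dictionary name `TABLE_2` and the helper `split_summary` are hypothetical names chosen for this illustration.

```python
# (train_up, train_down, test_up, test_down), copied from Table 2
TABLE_2 = {
    "DJUS": (997, 827, 438, 344),
    "NYSE AMEX": (1957, 1668, 850, 704),
    "BSE": (1020, 892, 437, 382),
    "DAX": (2413, 2114, 1044, 897),
    "NASDAQ": (1058, 865, 446, 378),
    "Nikkei 225": (2609, 2483, 1115, 1067),
    "S&P 500": (5118, 4473, 2146, 1964),
    "Shanghai Stock Exchange": (1679, 1470, 685, 665),
    "NIFTY 50": (709, 644, 300, 279),
}

def split_summary(train_up, train_down, test_up, test_down):
    """Return the training share and the overall uptrend share for one index."""
    train = train_up + train_down
    test = test_up + test_down
    total = train + test
    return {
        "train_share": train / total,
        "uptrend_share": (train_up + test_up) / total,
    }

summaries = {name: split_summary(*counts) for name, counts in TABLE_2.items()}
for name, s in summaries.items():
    print(f"{name:<24} train={s['train_share']:.4f} up={s['uptrend_share']:.4f}")
```

For every index the training share comes out within a tenth of a percentage point of the stated 70%, and uptrends make up a little over half of the instances, so the two classes are only mildly imbalanced.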
Table 3. Hyperparameter settings used in experiments.
| Hyperparameter | Value/Description |
|---|---|
| LSTM hidden units | 128 |
| Batch size | 32 |
| Learning rate | 0.001 (Adam optimizer) |
| Loss function | CrossEntropyLoss |
| Optimizer | Adam |
| Epochs | 10, 20, 30, 40, and 50 (all models); 10–100 in steps of 10 (SGR-Net) |
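One way to keep the settings in Table 3 together in code is a small immutable configuration object. The sketch below is only an illustration: the class name `ExperimentConfig` and its field names are hypothetical, and the optimizer and loss are stored as plain strings rather than tied to a specific deep learning framework, since the table does not name one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExperimentConfig:
    """Values copied from Table 3; field names are illustrative assumptions."""
    lstm_hidden_units: int = 128
    batch_size: int = 32
    learning_rate: float = 1e-3
    optimizer: str = "Adam"
    loss: str = "CrossEntropyLoss"
    # Epoch budgets evaluated for all models: 10, 20, 30, 40, 50
    shared_epoch_grid: tuple = tuple(range(10, 51, 10))
    # Extended epoch budgets evaluated for SGR-Net only: 10 to 100 in steps of 10
    sgr_net_epoch_grid: tuple = tuple(range(10, 101, 10))

config = ExperimentConfig()
print(config)
```

Freezing the dataclass prevents a setting from being changed by accident between runs, which helps when results from several epoch budgets are compared.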
Table 4. Performance of the proposed model on the DJUS stock index.
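| Model | Epochs | Accuracy | AUC | Training Time (s) |
|---|---|---|---|---|
| Fused Attention | 10 | 0.8299 | 0.9144 | 2.5265 |
| Fused Attention | 20 | 0.8333 | 0.9265 | 4.1947 |
| Fused Attention | 30 | 0.8461 | 0.9345 | 6.7247 |
| Fused Attention | 40 | 0.8581 | 0.9401 | 8.1700 |
| Fused Attention | 50 | 0.8581 | 0.9442 | 10.9043 |
| Fused Attention | 60 | 0.8632 | 0.9467 | 12.0609 |
| Fused Attention | 70 | 0.8542 | 0.9493 | 15.4306 |
| Fused Attention | 80 | 0.8721 | 0.9481 | 18.3172 |
| Fused Attention | 90 | 0.8555 | 0.9469 | 20.6902 |
| Fused Attention | 100 | 0.8299 | 0.9535 | 23.8549 |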
Table 5. Performance of the proposed model on the NYSE AMEX stock index.
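| Model | Epochs | Accuracy | AUC | Training Time (s) |
|---|---|---|---|---|
| Fused Attention | 10 | 0.8900 | 0.9665 | 5.2198 |
| Fused Attention | 20 | 0.8993 | 0.9697 | 8.7127 |
| Fused Attention | 30 | 0.9145 | 0.9718 | 13.9070 |
| Fused Attention | 40 | 0.9222 | 0.9728 | 17.9407 |
| Fused Attention | 50 | 0.9428 | 0.9824 | 22.6548 |
| Fused Attention | 60 | 0.9445 | 0.9717 | 30.4493 |
| Fused Attention | 70 | 0.9103 | 0.9690 | 43.5673 |
| Fused Attention | 80 | 0.9048 | 0.9719 | 54.3972 |
| Fused Attention | 90 | 0.9015 | 0.9713 | 72.3547 |
| Fused Attention | 100 | 0.8758 | 0.9674 | 87.3527 |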
Table 6. Performance of the proposed model on the BSE stock index.
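| Model | Epochs | Accuracy | AUC | Training Time (s) |
|---|---|---|---|---|
| Fused Attention | 10 | 0.9098 | 0.9685 | 5.3015 |
| Fused Attention | 20 | 0.9161 | 0.9691 | 9.1033 |
| Fused Attention | 30 | 0.9085 | 0.9690 | 14.7980 |
| Fused Attention | 40 | 0.9254 | 0.9710 | 19.3200 |
| Fused Attention | 50 | 0.9324 | 0.9793 | 21.9127 |
| Fused Attention | 60 | 0.9285 | 0.9697 | 25.6869 |
| Fused Attention | 70 | 0.9112 | 0.9704 | 29.3633 |
| Fused Attention | 80 | 0.9024 | 0.9698 | 33.9195 |
| Fused Attention | 90 | 0.8976 | 0.9707 | 36.6264 |
| Fused Attention | 100 | 0.9098 | 0.9716 | 42.2621 |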
Table 7. Performance of the proposed model on the DAX index.
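| Model | Epochs | Accuracy | AUC | Training Time (s) |
|---|---|---|---|---|
| Fused Attention | 10 | 0.8891 | 0.9715 | 10.0830 |
| Fused Attention | 20 | 0.8936 | 0.9749 | 17.8837 |
| Fused Attention | 30 | 0.9126 | 0.9784 | 27.9773 |
| Fused Attention | 40 | 0.8980 | 0.9748 | 36.8110 |
| Fused Attention | 50 | 0.9208 | 0.9840 | 44.7582 |
| Fused Attention | 60 | 0.9101 | 0.9748 | 55.8708 |
| Fused Attention | 70 | 0.9031 | 0.9756 | 65.7375 |
| Fused Attention | 80 | 0.8903 | 0.9756 | 73.8886 |
| Fused Attention | 90 | 0.9098 | 0.9771 | 83.1955 |
| Fused Attention | 100 | 0.8918 | 0.9763 | 91.1857 |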
Table 8. Performance of the proposed model on the NASDAQ index.
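| Model | Epochs | Accuracy | AUC | Training Time (s) |
|---|---|---|---|---|
| Fused Attention | 10 | 0.9127 | 0.9776 | 5.5162 |
| Fused Attention | 20 | 0.9145 | 0.9792 | 7.9996 |
| Fused Attention | 30 | 0.9219 | 0.9811 | 13.3201 |
| Fused Attention | 40 | 0.9297 | 0.9799 | 18.2401 |
| Fused Attention | 50 | 0.9364 | 0.9888 | 22.0903 |
| Fused Attention | 60 | 0.9355 | 0.9882 | 25.8913 |
| Fused Attention | 70 | 0.9188 | 0.9787 | 28.6653 |
| Fused Attention | 80 | 0.8715 | 0.9797 | 32.0084 |
| Fused Attention | 90 | 0.9164 | 0.9793 | 35.9114 |
| Fused Attention | 100 | 0.9176 | 0.9783 | 40.5604 |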
Table 9. Performance of the proposed model on the Nikkei 225 stock index.
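| Model | Epochs | Accuracy | AUC | Training Time (s) |
|---|---|---|---|---|
| Fused Attention | 10 | 0.9072 | 0.9775 | 11.8397 |
| Fused Attention | 20 | 0.8974 | 0.9711 | 19.1397 |
| Fused Attention | 30 | 0.9192 | 0.9824 | 25.6087 |
| Fused Attention | 40 | 0.9206 | 0.9834 | 34.4480 |
| Fused Attention | 50 | 0.9436 | 0.9876 | 41.7566 |
| Fused Attention | 60 | 0.9456 | 0.9829 | 49.7582 |
| Fused Attention | 70 | 0.8931 | 0.9718 | 57.6543 |
| Fused Attention | 80 | 0.8992 | 0.9753 | 66.6440 |
| Fused Attention | 90 | 0.9104 | 0.9837 | 75.8100 |
| Fused Attention | 100 | 0.9075 | 0.9748 | 85.2727 |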
Table 10. Performance of the proposed model on the S&P 500 stock index.
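| Model | Epochs | Accuracy | AUC | Training Time (s) |
|---|---|---|---|---|
| Fused Attention | 10 | 0.8966 | 0.9651 | 20.9063 |
| Fused Attention | 20 | 0.9030 | 0.9687 | 41.7439 |
| Fused Attention | 30 | 0.9130 | 0.9766 | 65.2541 |
| Fused Attention | 40 | 0.9122 | 0.9676 | 82.2037 |
| Fused Attention | 50 | 0.9291 | 0.9837 | 103.3167 |
| Fused Attention | 60 | 0.8922 | 0.9692 | 123.0970 |
| Fused Attention | 70 | 0.9022 | 0.9712 | 144.4847 |
| Fused Attention | 80 | 0.8959 | 0.9694 | 163.6611 |
| Fused Attention | 90 | 0.9015 | 0.9719 | 189.9799 |
| Fused Attention | 100 | 0.9032 | 0.9730 | 214.2635 |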
Table 11. Performance of the proposed model on Shanghai Stock Exchange data.
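| Model | Epochs | Accuracy | AUC | Training Time (s) |
|---|---|---|---|---|
| Fused Attention | 10 | 0.8400 | 0.9283 | 4.2346 |
| Fused Attention | 20 | 0.8333 | 0.9425 | 7.2577 |
| Fused Attention | 30 | 0.8615 | 0.9446 | 11.0987 |
| Fused Attention | 40 | 0.8696 | 0.9563 | 15.3568 |
| Fused Attention | 50 | 0.8730 | 0.9577 | 19.8575 |
| Fused Attention | 60 | 0.8733 | 0.9593 | 27.4802 |
| Fused Attention | 70 | 0.8748 | 0.9597 | 34.3601 |
| Fused Attention | 80 | 0.8748 | 0.9604 | 40.7298 |
| Fused Attention | 90 | 0.8459 | 0.9583 | 46.8047 |
| Fused Attention | 100 | 0.8281 | 0.9557 | 54.0861 |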
Table 12. Performance of the proposed model on NIFTY 50.
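| Model | Epochs | Accuracy | AUC | Training Time (s) |
|---|---|---|---|---|
| Fused Attention | 10 | 0.8552 | 0.9610 | 2.1109 |
| Fused Attention | 20 | 0.8897 | 0.9631 | 5.3348 |
| Fused Attention | 30 | 0.8734 | 0.9677 | 6.4097 |
| Fused Attention | 40 | 0.8979 | 0.9685 | 9.9200 |
| Fused Attention | 50 | 0.8983 | 0.9678 | 11.8721 |
| Fused Attention | 60 | 0.8690 | 0.9656 | 15.5928 |
| Fused Attention | 70 | 0.8828 | 0.9683 | 18.2711 |
| Fused Attention | 80 | 0.8914 | 0.9683 | 21.8305 |
| Fused Attention | 90 | 0.8724 | 0.9682 | 23.9187 |
| Fused Attention | 100 | 0.8948 | 0.9689 | 27.5492 |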
Table 22. Best performance (accuracy (best over epochs)) of all models across stock indices.
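| Model | DJUS | NYSE AMEX | BSE | DAX | NASDAQ | Nikkei 225 | S&P 500 | Shanghai Stock Exchange | NIFTY 50 |
|---|---|---|---|---|---|---|---|---|---|
| LSTM | 0.8338 | 0.9060 | 0.9122 | 0.9016 | 0.9236 | 0.9011 | 0.8915 | 0.8437 | 0.8862 |
| GRU | 0.8286 | 0.9015 | 0.9061 | 0.9011 | 0.9236 | 0.9061 | 0.8886 | 0.8378 | 0.8845 |
| Vanilla Attention | 0.8312 | 0.9009 | 0.9085 | 0.8985 | 0.9236 | 0.9079 | 0.8991 | 0.8407 | 0.8741 |
| Self-Attention | 0.8529 | 0.8970 | 0.9110 | 0.9109 | 0.9273 | 0.9061 | 0.8978 | 0.8719 | 0.8948 |
| Fused Attention (SGR-Net) | 0.8581 | 0.9428 | 0.9324 | 0.9208 | 0.9364 | 0.9436 | 0.9291 | 0.8730 | 0.8983 |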
Values are the best per model per dataset (across 10–50 epochs).
Table 23. Best performance (AUC (best over epochs)) of all models across stock indices.
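| Model | DJUS | NYSE AMEX | BSE | DAX | NASDAQ | Nikkei 225 | S&P 500 | Shanghai Stock Exchange | NIFTY 50 |
|---|---|---|---|---|---|---|---|---|---|
| LSTM | 0.9164 | 0.9703 | 0.9694 | 0.9739 | 0.9790 | 0.9727 | 0.9669 | 0.9333 | 0.9589 |
| GRU | 0.9170 | 0.9696 | 0.9697 | 0.9743 | 0.9794 | 0.9727 | 0.9669 | 0.9343 | 0.9595 |
| Vanilla Attention | 0.9162 | 0.9702 | 0.9696 | 0.9726 | 0.9793 | 0.9726 | 0.9669 | 0.9306 | 0.9591 |
| Self-Attention | 0.9393 | 0.9726 | 0.9698 | 0.9757 | 0.9796 | 0.9727 | 0.9677 | 0.9571 | 0.9672 |
| Fused Attention (SGR-Net) | 0.9442 | 0.9824 | 0.9793 | 0.9840 | 0.9888 | 0.9876 | 0.9837 | 0.9577 | 0.9685 |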
Values are the best per model per dataset (across 10–50 epochs).
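The "best over epochs" entries in Tables 22 and 23 can be reproduced from the per-epoch results. As a minimal sketch, the snippet below takes the DJUS rows of Table 4 and picks the maximum accuracy and maximum AUC among runs with at most a given number of epochs. The names `DJUS_RESULTS` and `best_over_epochs` are hypothetical, and taking the maximum of each metric independently is an assumption about how the summary values were selected.

```python
# epochs -> (accuracy, AUC), copied from Table 4 (DJUS, Fused Attention)
DJUS_RESULTS = {
    10: (0.8299, 0.9144), 20: (0.8333, 0.9265), 30: (0.8461, 0.9345),
    40: (0.8581, 0.9401), 50: (0.8581, 0.9442), 60: (0.8632, 0.9467),
    70: (0.8542, 0.9493), 80: (0.8721, 0.9481), 90: (0.8555, 0.9469),
    100: (0.8299, 0.9535),
}

def best_over_epochs(results, max_epochs):
    """Best accuracy and best AUC among runs trained for at most max_epochs."""
    eligible = [metrics for epochs, metrics in results.items() if epochs <= max_epochs]
    best_accuracy = max(acc for acc, _ in eligible)
    best_auc = max(auc for _, auc in eligible)
    return best_accuracy, best_auc

print(best_over_epochs(DJUS_RESULTS, 50))
print(best_over_epochs(DJUS_RESULTS, 100))
```

Restricting the search to 10–50 epochs gives 0.8581 and 0.9442, the DJUS entries for SGR-Net in Tables 22 and 23. Over the full 10–100 range of Table 4 the maxima are higher (0.8721 accuracy at 80 epochs and 0.9535 AUC at 100 epochs), so the cross-model comparison uses the range shared by all models.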
Table 24. Individual-attention-component ablation study on DJUS, NYSE AMEX, BSE, DAX, and NASDAQ stock indices: best-epoch configuration with accuracy, precision, recall, F1-score, and AUC. Values are reported up to four decimal places; numbers in brackets denote the 95% confidence intervals (CIs).
| Model Variant | Metric | DJUS | NYSE AMEX | BSE | DAX | NASDAQ |
|---|---|---|---|---|---|---|
| Only Sparse | Accuracy | 0.8352 [0.8150–0.8550] | 0.9072 [0.8920–0.9190] | 0.9127 [0.8940–0.9290] | 0.9048 [0.8890–0.9180] | 0.9244 [0.9060–0.9400] |
| Only Sparse | Precision | 0.8285 [0.8000–0.8540] | 0.8830 [0.8620–0.9020] | 0.8880 [0.8640–0.9100] | 0.8780 [0.8550–0.8980] | 0.8920 [0.8680–0.9140] |
| Only Sparse | Recall | 0.8421 [0.8150–0.8680] | 0.8895 [0.8690–0.9090] | 0.8975 [0.8750–0.9190] | 0.8910 [0.8700–0.9110] | 0.9035 [0.8800–0.9250] |
| Only Sparse | F1-score | 0.8351 [0.8100–0.8600] | 0.8862 [0.8680–0.9040] | 0.8927 [0.8730–0.9110] | 0.8842 [0.8660–0.9020] | 0.8973 [0.8770–0.9150] |
| Only Sparse | AUC | 0.9250 [0.9185–0.9310] | 0.9698 [0.9660–0.9735] | 0.9696 [0.9635–0.9755] | 0.9742 [0.9695–0.9790] | 0.9795 [0.9735–0.9850] |
| Only Global | Accuracy | 0.8361 [0.8170–0.8560] | 0.9084 [0.8950–0.9200] | 0.9129 [0.8950–0.9290] | 0.9029 [0.8880–0.9160] | 0.9256 [0.9080–0.9410] |
| Only Global | Precision | 0.8328 [0.8050–0.8570] | 0.8845 [0.8640–0.9040] | 0.8895 [0.8660–0.9120] | 0.8765 [0.8530–0.8990] | 0.8940 [0.8700–0.9160] |
| Only Global | Recall | 0.8442 [0.8180–0.8700] | 0.8880 [0.8680–0.9080] | 0.8988 [0.8770–0.9200] | 0.8887 [0.8680–0.9090] | 0.9052 [0.8820–0.9260] |
| Only Global | F1-score | 0.8383 [0.8140–0.8620] | 0.8862 [0.8680–0.9035] | 0.8940 [0.8750–0.9120] | 0.8825 [0.8650–0.9010] | 0.8989 [0.8790–0.9165] |
| Only Global | AUC | 0.9290 [0.9200–0.9380] | 0.9699 [0.9665–0.9738] | 0.9697 [0.9640–0.9752] | 0.9744 [0.9698–0.9790] | 0.9796 [0.9740–0.9852] |
| Only Random | Accuracy | 0.8349 [0.8150–0.8550] | 0.9067 [0.8920–0.9180] | 0.9126 [0.8940–0.9290] | 0.9053 [0.8890–0.9180] | 0.9249 [0.9070–0.9400] |
| Only Random | Precision | 0.8292 [0.8020–0.8540] | 0.8820 [0.8610–0.9010] | 0.8870 [0.8630–0.9100] | 0.8795 [0.8550–0.9000] | 0.8928 [0.8690–0.9150] |
| Only Random | Recall | 0.8410 [0.8140–0.8670] | 0.8865 [0.8660–0.9060] | 0.8979 [0.8750–0.9190] | 0.8905 [0.8700–0.9100] | 0.9041 [0.8810–0.9250] |
| Only Random | F1-score | 0.8344 [0.8100–0.8590] | 0.8842 [0.8660–0.9020] | 0.8923 [0.8730–0.9100] | 0.8848 [0.8670–0.9020] | 0.8981 [0.8780–0.9160] |
| Only Random | AUC | 0.9215 [0.9185–0.9300] | 0.9696 [0.9660–0.9732] | 0.9695 [0.9635–0.9750] | 0.9741 [0.9692–0.9788] | 0.9793 [0.9732–0.9848] |
| SGR-Net | Accuracy | 0.8581 [0.8370–0.8780] | 0.9428 [0.9300–0.9560] | 0.9324 [0.9130–0.9490] | 0.9208 [0.9070–0.9350] | 0.9364 [0.9200–0.9520] |
| SGR-Net | Precision | 0.9000 [0.8730–0.9250] | 0.9300 [0.9120–0.9460] | 0.9280 [0.9050–0.9490] | 0.9140 [0.8950–0.9310] | 0.9360 [0.9120–0.9560] |
| SGR-Net | Recall | 0.8800 [0.8520–0.9050] | 0.9360 [0.9180–0.9520] | 0.9260 [0.9030–0.9470] | 0.9180 [0.8980–0.9380] | 0.9380 [0.9140–0.9590] |
| SGR-Net | F1-score | 0.8900 [0.8640–0.9140] | 0.9330 [0.9160–0.9490] | 0.9270 [0.9070–0.9460] | 0.9160 [0.8980–0.9340] | 0.9370 [0.9150–0.9560] |
| SGR-Net | AUC | 0.9442 [0.9320–0.9570] | 0.9824 [0.9760–0.9880] | 0.9793 [0.9690–0.9880] | 0.9840 [0.9780–0.9890] | 0.9888 [0.9820–0.9940] |
Table 25. Individual-attention-component ablation study on Nikkei 225, S&P 500, Shanghai, and NIFTY 50 stock indices: best-epoch configuration with accuracy, precision, recall, F1-score, and AUC. Values are reported up to four decimal places; numbers in brackets denote the 95% confidence intervals (CIs).
| Model Variant | Metric | Nikkei 225 | S&P 500 | Shanghai | NIFTY 50 |
|---|---|---|---|---|---|
| Only Sparse | Accuracy | 0.9028 [0.8890–0.9150] | 0.8962 [0.8830–0.9070] | 0.8542 [0.8330–0.8730] | 0.8892 [0.8700–0.9070] |
| Only Sparse | Precision | 0.8960 [0.8800–0.9110] | 0.8880 [0.8740–0.9010] | 0.8380 [0.8100–0.8640] | 0.8780 [0.8560–0.8980] |
| Only Sparse | Recall | 0.9070 [0.8920–0.9210] | 0.9010 [0.8880–0.9140] | 0.8720 [0.8480–0.8940] | 0.9000 [0.8790–0.9200] |
| Only Sparse | F1-score | 0.9015 [0.8860–0.9160] | 0.8945 [0.8820–0.9060] | 0.8540 [0.8330–0.8730] | 0.8890 [0.8700–0.9070] |
| Only Sparse | AUC | 0.9665 [0.9590–0.9730] | 0.9639 [0.9570–0.9710] | 0.9410 [0.9320–0.9500] | 0.9617 [0.9540–0.9690] |
| Only Global | Accuracy | 0.9056 [0.8930–0.9180] | 0.8940 [0.8810–0.9060] | 0.8605 [0.8400–0.8790] | 0.8918 [0.8730–0.9100] |
| Only Global | Precision | 0.9000 [0.8840–0.9160] | 0.8850 [0.8710–0.8980] | 0.8460 [0.8200–0.8700] | 0.8800 [0.8580–0.9000] |
| Only Global | Recall | 0.9100 [0.8940–0.9240] | 0.8920 [0.8780–0.9050] | 0.8680 [0.8420–0.8920] | 0.8960 [0.8750–0.9150] |
| Only Global | F1-score | 0.9048 [0.8900–0.9180] | 0.8885 [0.8760–0.9010] | 0.8565 [0.8360–0.8760] | 0.8880 [0.8690–0.9060] |
| Only Global | AUC | 0.9689 [0.9620–0.9750] | 0.9652 [0.9580–0.9720] | 0.9428 [0.9340–0.9520] | 0.9629 [0.9550–0.9700] |
| Only Random | Accuracy | 0.9040 [0.8910–0.9160] | 0.8951 [0.8820–0.9070] | 0.8571 [0.8360–0.8760] | 0.8880 [0.8690–0.9060] |
| Only Random | Precision | 0.9045 [0.8880–0.9200] | 0.8830 [0.8680–0.8970] | 0.8600 [0.8350–0.8840] | 0.8760 [0.8540–0.8960] |
| Only Random | Recall | 0.9000 [0.8840–0.9160] | 0.8900 [0.8760–0.9030] | 0.8480 [0.8200–0.8750] | 0.8920 [0.8710–0.9110] |
| Only Random | F1-score | 0.9022 [0.8870–0.9160] | 0.8865 [0.8740–0.8990] | 0.8539 [0.8330–0.8740] | 0.8840 [0.8650–0.9020] |
| Only Random | AUC | 0.9678 [0.9610–0.9740] | 0.9647 [0.9570–0.9720] | 0.9402 [0.9310–0.9490] | 0.9609 [0.9530–0.9690] |
| SGR-Net | Accuracy | 0.9436 [0.9300–0.9550] | 0.9291 [0.9190–0.9380] | 0.8730 [0.8530–0.8910] | 0.8983 [0.8790–0.9160] |
| SGR-Net | Precision | 0.9085 [0.8910–0.9260] | 0.8956 [0.8820–0.9100] | 0.8400 [0.8110–0.8670] | 0.8712 [0.8510–0.8910] |
| SGR-Net | Recall | 0.8948 [0.8770–0.9130] | 0.8830 [0.8690–0.8960] | 0.9050 [0.8810–0.9270] | 0.9050 [0.8810–0.9270] |
| SGR-Net | F1-score | 0.9016 [0.8890–0.9150] | 0.8891 [0.8790–0.8990] | 0.8710 [0.8510–0.8910] | 0.8710 [0.8510–0.8910] |
| SGR-Net | AUC | 0.9876 [0.9820–0.9920] | 0.9837 [0.9790–0.9880] | 0.9577 [0.9480–0.9660] | 0.9685 [0.9590–0.9780] |
Table 26. Remove-one ablation study on DJUS, NYSE AMEX, BSE, DAX, and NASDAQ stock indices: best-epoch configuration with accuracy, precision, recall, F1-score, and AUC. Values are reported up to four decimal places; numbers in brackets denote the 95% confidence intervals (CIs).
| Model Variant | Metric | DJUS | NYSE AMEX | BSE | DAX | NASDAQ |
|---|---|---|---|---|---|---|
| No-Sparse (Global + Random) | Accuracy | 0.8368 [0.8150–0.8570] | 0.9073 [0.8930–0.9210] | 0.9073 [0.8890–0.9240] | 0.9088 [0.8950–0.9210] | 0.8971 [0.8780–0.9140] |
| No-Sparse (Global + Random) | Precision | 0.8200 [0.7900–0.8480] | 0.8900 [0.8700–0.9100] | 0.9000 [0.8750–0.9240] | 0.8800 [0.8600–0.9000] | 0.8850 [0.8600–0.9100] |
| No-Sparse (Global + Random) | Recall | 0.8600 [0.8300–0.8880] | 0.9000 [0.8800–0.9200] | 0.9150 [0.8920–0.9370] | 0.9200 [0.9000–0.9400] | 0.9050 [0.8800–0.9280] |
| No-Sparse (Global + Random) | F1-score | 0.8400 [0.8120–0.8660] | 0.8950 [0.8780–0.9110] | 0.9070 [0.8880–0.9250] | 0.9000 [0.8830–0.9160] | 0.8950 [0.8750–0.9130] |
| No-Sparse (Global + Random) | AUC | 0.9323 [0.9160–0.9460] | 0.9691 [0.9630–0.9750] | 0.9691 [0.9590–0.9780] | 0.9751 [0.9690–0.9800] | 0.9671 [0.9600–0.9740] |
| No-Global (Sparse + Random) | Accuracy | 0.8470 [0.8260–0.8660] | 0.9012 [0.8860–0.9150] | 0.9012 [0.8830–0.9180] | 0.9062 [0.8920–0.9190] | 0.8993 [0.8800–0.9160] |
| No-Global (Sparse + Random) | Precision | 0.8350 [0.8050–0.8640] | 0.8850 [0.8640–0.9040] | 0.8920 [0.8660–0.9160] | 0.8950 [0.8740–0.9140] | 0.8900 [0.8650–0.9120] |
| No-Global (Sparse + Random) | Recall | 0.8450 [0.8140–0.8730] | 0.8920 [0.8710–0.9110] | 0.9050 [0.8820–0.9280] | 0.9050 [0.8840–0.9250] | 0.9020 [0.8760–0.9260] |
| No-Global (Sparse + Random) | F1-score | 0.8400 [0.8120–0.8660] | 0.8880 [0.8710–0.9040] | 0.8980 [0.8790–0.9160] | 0.9000 [0.8830–0.9160] | 0.8960 [0.8760–0.9140] |
| No-Global (Sparse + Random) | AUC | 0.9408 [0.9250–0.9540] | 0.9692 [0.9630–0.9750] | 0.9692 [0.9590–0.9780] | 0.9750 [0.9690–0.9800] | 0.9671 [0.9600–0.9740] |
| No-Random (Sparse + Global) | Accuracy | 0.8478 [0.8270–0.8670] | 0.9134 [0.8990–0.9260] | 0.9134 [0.8950–0.9290] | 0.9088 [0.8950–0.9210] | 0.9020 [0.8840–0.9180] |
| No-Random (Sparse + Global) | Precision | 0.8450 [0.8160–0.8720] | 0.9020 [0.8800–0.9220] | 0.9050 [0.8800–0.9290] | 0.9000 [0.8800–0.9200] | 0.9030 [0.8780–0.9260] |
| No-Random (Sparse + Global) | Recall | 0.8350 [0.8040–0.8640] | 0.8980 [0.8770–0.9180] | 0.9100 [0.8860–0.9320] | 0.9050 [0.8840–0.9250] | 0.8920 [0.8660–0.9150] |
| No-Random (Sparse + Global) | F1-score | 0.8400 [0.8120–0.8660] | 0.9000 [0.8830–0.9150] | 0.9070 [0.8870–0.9250] | 0.9020 [0.8860–0.9160] | 0.8980 [0.8790–0.9150] |
| No-Random (Sparse + Global) | AUC | 0.9439 [0.9280–0.9560] | 0.9696 [0.9640–0.9750] | 0.9696 [0.9600–0.9780] | 0.9753 [0.9700–0.9800] | 0.9699 [0.9620–0.9760] |
| SGR-Net | Accuracy | 0.8581 [0.8370–0.8780] | 0.9428 [0.9300–0.9560] | 0.9324 [0.9130–0.9490] | 0.9208 [0.9070–0.9350] | 0.9364 [0.9200–0.9520] |
| SGR-Net | Precision | 0.9000 [0.8730–0.9250] | 0.9300 [0.9120–0.9460] | 0.9280 [0.9050–0.9490] | 0.9140 [0.8950–0.9310] | 0.9360 [0.9120–0.9560] |
| SGR-Net | Recall | 0.8800 [0.8520–0.9050] | 0.9360 [0.9180–0.9520] | 0.9260 [0.9030–0.9470] | 0.9180 [0.8980–0.9380] | 0.9380 [0.9140–0.9590] |
| SGR-Net | F1-score | 0.8900 [0.8640–0.9140] | 0.9330 [0.9160–0.9490] | 0.9270 [0.9070–0.9460] | 0.9160 [0.8980–0.9340] | 0.9370 [0.9150–0.9560] |
| SGR-Net | AUC | 0.9442 [0.9320–0.9570] | 0.9824 [0.9760–0.9880] | 0.9793 [0.9690–0.9880] | 0.9840 [0.9780–0.9890] | 0.9888 [0.9820–0.9940] |
Table 27. Remove-one ablation study on Nikkei 225, S&P 500, Shanghai, and NIFTY 50 stock indices: best-epoch configuration with accuracy, precision, recall, F1-score, and AUC. Values are reported up to four decimal places; numbers in brackets denote the 95% confidence intervals (CIs).
| Model Variant | Metric | Nikkei 225 | S&P 500 | Shanghai | NIFTY 50 |
|---|---|---|---|---|---|
| No-Sparse (Global + Random) | Accuracy | 0.9006 [0.8890–0.9120] | 0.8971 [0.8880–0.9080] | 0.8637 [0.8440–0.8820] | 0.8914 [0.8720–0.9090] |
| No-Sparse (Global + Random) | Precision | 0.8950 [0.8780–0.9110] | 0.8850 [0.8710–0.8990] | 0.8350 [0.8070–0.8610] | 0.8700 [0.8450–0.8930] |
| No-Sparse (Global + Random) | Recall | 0.8950 [0.8760–0.9130] | 0.9020 [0.8890–0.9150] | 0.9050 [0.8820–0.9260] | 0.9000 [0.8780–0.9200] |
| No-Sparse (Global + Random) | F1-score | 0.8950 [0.8800–0.9090] | 0.8930 [0.8820–0.9040] | 0.8680 [0.8460–0.8880] | 0.8850 [0.8670–0.9030] |
| No-Sparse (Global + Random) | AUC | 0.9697 [0.9640–0.9750] | 0.9671 [0.9620–0.9720] | 0.9457 [0.9360–0.9550] | 0.9667 [0.9590–0.9740] |
| No-Global (Sparse + Random) | Accuracy | 0.9066 [0.8950–0.9170] | 0.8993 [0.8900–0.9100] | 0.8511 [0.8310–0.8700] | 0.8879 [0.8680–0.9050] |
| No-Global (Sparse + Random) | Precision | 0.9020 [0.8850–0.9180] | 0.8800 [0.8660–0.8940] | 0.8660 [0.8390–0.8910] | 0.8650 [0.8400–0.8890] |
| No-Global (Sparse + Random) | Recall | 0.9080 [0.8900–0.9250] | 0.8920 [0.8770–0.9060] | 0.8700 [0.8460–0.8930] | 0.8800 [0.8560–0.9020] |
| No-Global (Sparse + Random) | F1-score | 0.9050 [0.8900–0.9190] | 0.8860 [0.8730–0.8980] | 0.8680 [0.8470–0.8870] | 0.8720 [0.8500–0.8920] |
| No-Global (Sparse + Random) | AUC | 0.9736 [0.9680–0.9790] | 0.9671 [0.9620–0.9720] | 0.9446 [0.9350–0.9530] | 0.9643 [0.9560–0.9720] |
| No-Random (Sparse + Global) | Accuracy | 0.9075 [0.8960–0.9180] | 0.9020 [0.8920–0.9120] | 0.8711 [0.8520–0.8890] | 0.8948 [0.8750–0.9130] |
| No-Random (Sparse + Global) | Precision | 0.9100 [0.8930–0.9260] | 0.8880 [0.8740–0.9010] | 0.8780 [0.8520–0.9020] | 0.8800 [0.8560–0.9030] |
| No-Random (Sparse + Global) | Recall | 0.9050 [0.8880–0.9210] | 0.8900 [0.8760–0.9030] | 0.8820 [0.8580–0.9050] | 0.8840 [0.8610–0.9070] |
| No-Random (Sparse + Global) | F1-score | 0.9070 [0.8920–0.9210] | 0.8890 [0.8760–0.9020] | 0.8800 [0.8600–0.9000] | 0.8820 [0.8600–0.9020] |
| No-Random (Sparse + Global) | AUC | 0.9737 [0.9690–0.9790] | 0.9699 [0.9630–0.9750] | 0.9568 [0.9490–0.9640] | 0.9650 [0.9570–0.9730] |
| SGR-Net | Accuracy | 0.9436 [0.9300–0.9550] | 0.9291 [0.9190–0.9380] | 0.8730 [0.8530–0.8910] | 0.8983 [0.8790–0.9160] |
| SGR-Net | Precision | 0.9200 [0.9020–0.9370] | 0.8960 [0.8820–0.9100] | 0.8720 [0.8500–0.8920] | 0.8850 [0.8640–0.9050] |
| SGR-Net | Recall | 0.9250 [0.9080–0.9420] | 0.8850 [0.8710–0.8980] | 0.9070 [0.8830–0.9280] | 0.9020 [0.8800–0.9220] |
| SGR-Net | F1-score | 0.9220 [0.9060–0.9380] | 0.8900 [0.8780–0.9020] | 0.8890 [0.8680–0.9090] | 0.8930 [0.8720–0.9120] |
| SGR-Net | AUC | 0.9876 [0.9820–0.9920] | 0.9837 [0.9790–0.9880] | 0.9577 [0.9480–0.9660] | 0.9685 [0.9590–0.9780] |
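The ablation tables report 95% confidence intervals, but the captions do not say how they were computed. One common choice is a percentile bootstrap over the test instances, sketched below for accuracy. This is an assumption for illustration, not the authors' procedure: the function name `bootstrap_accuracy_ci` is hypothetical, and the labels and predictions are synthetic, built only to match the DJUS test-set size from Table 2 (782 instances) with 671 correct predictions, which corresponds to an accuracy of about 0.8581.

```python
import random

def bootstrap_accuracy_ci(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy: resample instances with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    stats = []
    for _ in range(n_resamples):
        sample = rng.choices(correct, k=n)  # n draws with replacement
        stats.append(sum(sample) / n)
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(correct) / n, lower, upper

# Synthetic data (assumption): 438 uptrend and 344 downtrend labels, 111 errors.
y_true = [1] * 438 + [0] * 344
y_pred = list(y_true)
for i in range(111):  # flip the first 111 predictions to create the errors
    y_pred[i] = 1 - y_pred[i]

point, lower, upper = bootstrap_accuracy_ci(y_true, y_pred)
print(f"accuracy={point:.4f}, 95% CI=[{lower:.4f}, {upper:.4f}]")
```

The same resampling loop applies to precision, recall, F1-score, or AUC by replacing the per-resample statistic. Because the interval width depends on the test-set size, larger indices such as the S&P 500 would be expected to give narrower intervals than smaller ones such as NIFTY 50.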
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Khansama, R.R.; Priyadarshini, R.; Nanda, S.K.; Barik, R.K.; Saikia, M.J. SGR-Net: A Synergistic Attention Network for Robust Stock Market Forecasting. Forecasting 2025, 7, 50. https://doi.org/10.3390/forecast7030050