Article

A Stock Price Prediction Network That Integrates Multi-Scale Channel Attention Mechanism and Sparse Perturbation Greedy Optimization

Institute of Artificial Intelligence Applications Changsha, Central South University of Forestry and Technology, Changsha 410004, China
* Authors to whom correspondence should be addressed.
Algorithms 2026, 19(1), 67; https://doi.org/10.3390/a19010067
Submission received: 29 October 2025 / Revised: 20 December 2025 / Accepted: 4 January 2026 / Published: 12 January 2026
(This article belongs to the Section Algorithms for Multidisciplinary Applications)

Abstract

The stock market is of paramount importance to economic development. Given its high volatility, investors who accurately predict stock price fluctuations can effectively mitigate investment risks and achieve higher returns. Traditional time series models face limitations when dealing with long sequences and short-term volatility, often yielding unsatisfactory predictive outcomes. This paper proposes a novel algorithm, MSNet, which integrates a Multi-scale Channel Attention mechanism (MSCA) and Sparse Perturbation Greedy Optimization (SPGO) into an xLSTM framework. The MSCA enhances the model’s spatio-temporal information modeling capabilities, effectively preserving key price features within stock data. Meanwhile, SPGO improves the exploration of optimal solutions during training, thereby strengthening the model’s generalization stability against short-term market fluctuations. Experimental results demonstrate that MSNet achieves an MSE of 0.0093 and an MAE of 0.0152 on our proprietary dataset. This approach effectively extracts temporal features from complex stock market data, providing empirical insights and guidance for time series forecasting.

1. Introduction

The stock market, as a core component of financial markets, reflects the dynamic changes within these markets to a certain extent and serves as a crucial indicator for gauging a nation’s economic condition [1]. Predicting stock price movements has long been a focal point of attention within the financial sector and among investors, as accurate forecasting provides robust support for formulating investment strategies. Stock price forecasting typically relies on historical stock data, which exhibits characteristics such as nonlinearity, multi-variability, long memory, and high noise levels, thereby increasing the complexity of market prediction.
Traditional stock price forecasting methods are mainly based on historical stock data and build forecasting models through statistical and econometric techniques, such as the ARMA model [2,3]. However, these methods have serious limitations in today’s increasingly complex and diverse financial markets. Because stock prices are affected by numerous factors through complex mechanisms, such traditional models suffer from problems including insufficient information, regional heterogeneity, and model complexity. It is therefore difficult to obtain good prediction accuracy with these simple mathematical models [4].
With the continuous development of computing and artificial intelligence technology, methods based on deep learning began to emerge in the financial field. With their powerful feature extraction ability, they gradually surpassed traditional methods in various financial forecasting tasks and became a hot spot in stock forecasting research. In this field, the most commonly used deep learning architectures are the Convolutional Neural Network (CNN) [5,6], the Long Short-Term Memory network (LSTM) [7,8], and the Gated Recurrent Unit (GRU) [9,10]. Building on these networks, researchers have proposed various structural combinations and improvements to further raise prediction performance. Lu et al. [11] proposed a method called CNN-BiLSTM-AM to predict the next day’s stock closing price: the CNN extracts features from the input data, the BiLSTM uses the extracted features to predict the closing price, and the attention mechanism (AM) captures the influence of feature states at different past times on the closing price, improving prediction accuracy. Bai et al. [12] proposed a new modular prediction framework for the stock market, ModAugNet; to address the tendency of neural networks to overfit during prediction, two LSTM modules serve as the model’s overfitting-prevention module and prediction module. Zhu et al. [13] proposed MCI-GRU, a stock forecasting model based on a multi-head cross-attention mechanism and an improved GRU, to address the difficulty existing methods have in capturing unobservable latent market states. The GRU is enhanced by replacing the reset gate with an attention mechanism, and a multi-head cross-attention mechanism is designed to learn representations of the unobservable latent market state, thereby improving prediction accuracy.


However, although the above models are effective, they still face considerable limitations in modeling long-term dependence, which is crucial for accurate long-term prediction [14]. This restriction can reduce prediction accuracy, especially in long-horizon stock price forecasting. The recently proposed extended long short-term memory (xLSTM) architecture [15] improves on the traditional LSTM: a more flexible gating mechanism strengthens its ability to manage and maintain long-sequence information, and hierarchical memory units stabilize the storage of long-term dependencies. Compared with mainstream large language models, xLSTM shows considerable competitiveness in long-sequence and large-context processing, yet its potential in stock price prediction [16] remains largely unexplored. The GRU has limited ability to model complex nonlinear dynamics and, unlike models with an attention mechanism, cannot explicitly focus on the historical time points most important to the current prediction. The LSTM, as a recurrent neural network, can capture sequence dependencies through hidden states, but it remains vulnerable to vanishing or exploding gradients when dealing with long sequences. Given the urgent need for a deeper understanding of time-series characteristics and stronger models in stock forecasting, we choose xLSTM as the backbone network to remedy the key deficiencies of existing methods and further advance this field.
There are three main challenges in current stock price forecasting: (A) Long sequences and gradient problems: stock price series usually contain many time steps, requiring the model to manage a large number of data points. This greatly increases the computational resources and time required for training and inference. Moreover, using an overly long input sequence in stock market forecasting can degrade forecasting performance. (B) Market uncertainty and nonstationarity: stock prices are affected by multiple factors such as the macro-economy, policy changes, investor sentiment, and unexpected events, making them highly random and nonstationary and therefore difficult for conventional models to capture stably. (C) Difficulty modeling short-term fluctuations: high-frequency stock market data contains substantial noise and instantaneous fluctuations, and traditional models struggle to distinguish real trends from random disturbances, limiting prediction accuracy.
For the long-sequence problem in stock price series, Yadav et al. [17] experimentally quantified the influence of the state mode, the number of hidden layers, and hyperparameters on prediction performance, obtaining LSTM parameter combinations suited to the long, highly volatile, and nonlinear time series of the stock market. Muhammad et al. [18] proposed a new version of EMD that uses Akima spline interpolation instead of cubic spline interpolation, decomposing the noisy stock data into multiple components and then using the highly correlated sub-components to build the LSTM network. By decomposing the original long sequence into relatively simple, stable sub-sequences, the LSTM is not disturbed by strong noise or multi-scale fluctuations when learning long-term dependence.
For uncertainty and nonstationarity in the stock market, Eapen et al. [19] proposed a new deep learning model that combines a convolutional neural network with multiple pipelines of bidirectional long short-term memory units (BiLSTM). The BiLSTM exploits both forward and backward temporal dependencies to model sequence dynamics more comprehensively and better capture complex nonlinear relationships. Gülmez et al. [20] created an optimized deep LSTM network based on the artificial rabbits optimization algorithm (LSTM-ARO) to predict stock prices, in which the LSTM models complex time-series dependencies while ARO ensures the model operates under optimal hyperparameters, reducing the prediction error amplified by improper parameter selection. This combination significantly reduces the fluctuation of prediction results caused by uncertainty.
For the difficulty of accurately modeling short-term volatility in stock data, Gupta et al. [21] proposed StockNet, a GRU-based model composed of two modules: an injection module for preventing overfitting and an investigation module for stock index prediction. The injection module acts as a form of regularization, so that the model does not memorize accidental noise in the training data but focuses on generalizable patterns. Baek et al. [22] proposed a CNN-LSTM based on a genetic algorithm (GA [23]); the GA’s global search ability automatically finds a set of optimal parameters, so the model adapts to short-term changes across different market phases rather than performing well only on a specific sample. These studies provide valuable insights and directions for our stock price prediction model, and we hope to achieve further breakthroughs in stock market price prediction through deeper innovation and research.
The main contributions of this paper are as follows:
  • We have developed a stock price dataset covering four different industries, containing minute-level stock price data from 24 January 2024 to 18 October 2024, with 45,600 records per stock. This dataset provides a rich resource of stock trend characteristics across industries, helping to build a deeper understanding of market dynamics.
  • A new multi-scale channel attention (MSCA) module is proposed, which enhances the representation of temporal features through channel-time gating and grouped multi-scale separable convolution. First, the channel-time gating mechanism decouples the time dimension from the channel dimension, dynamically distinguishing high-confidence periods from low-confidence ones and important factors from secondary ones. Then, grouped multi-scale separable convolutions extract stock market features in parallel across multiple receptive-field scales, and attention weights are generated by fusing three convolutional branches and two gating paths, effectively retaining key price characteristics. This design significantly improves the model’s predictive ability in the face of an uncertain stock market.
  • A new Sparse Perturbation Greedy Optimization (SPGO) algorithm is proposed, which improves the traditional greedy search mechanism through sparse random perturbation and a differential evolution strategy. First, trial solutions are constructed for each individual in the population at every generation, and a greedy replacement criterion accelerates convergence. Then, a sparse random radius and continuous differential perturbations are introduced to maintain population diversity and alleviate premature convergence in strongly noisy environments. The optimizer allows the model to quickly fine-tune the output layer within short-term fluctuation ranges, enhancing the model’s generalization stability against short-term market fluctuations.
  • The xLSTM-based MSNet proposed in this paper achieves an MSE of 0.0093 and an MAE of 0.0152 on the self-built dataset and can effectively predict long stock price series. Overall, the method can accurately forecast a stock market with high uncertainty.

2. Datasets and Methodology

2.1. Data Acquisition and Preprocessing

The objective of this study is to forecast the closing prices of representative stocks across four distinct sectors: Guohua Wang’an (GHWA) in the software services sector, Shenzhou High-Speed Railway (SZGT) in the transport equipment sector, China Bao’an (ZGBA) in the electrical equipment sector, and Hualian Holdings (HLKG) in the property sector. The study utilizes minute-level data for each stock spanning from 0:30 on 24 January 2024 to 15:00 on 18 October 2024, comprising a total of 182,400 records. Time-series data commonly employed in stock price forecasting includes price data (such as opening and closing prices) and transaction data (such as price-to-earnings ratios and price-to-sales ratios). Price data directly reflects stock price fluctuations, while transaction data provides additional market insights through metrics like stock activity and corporate profitability. These data points are considered crucial references for predicting stock trends, encompassing multiple factors influencing share prices. In this study, we extracted six key factors for each stock: opening price (Open), high price (High), low price (Low), closing price (Close), trading volume (Vol), and trading value (Amount). The closing price was designated as the target variable for prediction, with the remaining five metrics incorporated as feature variables. The High and Low prices reveal the intraday price range, reflecting the upper limit of market demand and the lower limit of supply; the Close price embodies the market’s overall assessment of the stock for that day. These factors collectively influence investors’ expectations regarding upside potential, downside risk, and return levels. Trading Volume represents the number of shares traded within a specific timeframe, measuring market activity and stock liquidity; Trading Value, the product of volume and price, reflects capital flows within the market.
By integrating these five factors, closing prices can be predicted with greater accuracy. In all subsequent experiments, the dataset is split into training and test sets at a ratio of 9:1; the divided test data begins at 14:59:00 on 12 September 2024. The dataset can be downloaded at: https://pan.baidu.com/s/1aba-M2giwmxNsmaBBHHKCQ?pwd=kg48 (accessed on 3 January 2026).
As training deep neural network models relies on high-quality real-world data to extract effective features, preprocessing of the raw dataset is essential. This ensures the integrity and accuracy of input data, thereby enhancing the model’s training effectiveness and generalization capability. The data preprocessing undertaken in this study encompasses two key aspects: (A) Outlier handling: Identifying and removing outliers that deviate significantly from the normal range or show excessive divergence from data in preceding and subsequent periods, thereby reducing noise interference in model training. (B) Missing value imputation: For missing data points in the time series, imputation is performed using the mean of data from the preceding and subsequent periods to maintain data continuity and consistency.
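As a concrete illustration, the two preprocessing steps can be sketched as follows. The MAD-based outlier rule, the threshold k, and the column names are illustrative assumptions; the paper only specifies that strongly divergent points are removed and gaps are filled with the mean of the neighbouring periods.

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame, k: float = 10.0) -> pd.DataFrame:
    """(A) Flag values far from the column median (in MAD units) as
    missing, then (B) fill every gap with the mean of the neighbouring
    periods. The MAD rule and threshold k are illustrative assumptions."""
    out = df.copy()
    for col in out.columns:
        med = out[col].median()
        mad = (out[col] - med).abs().median() + 1e-12
        out.loc[(out[col] - med).abs() > k * mad, col] = np.nan
    filled = (out.ffill() + out.bfill()) / 2      # mean of neighbours
    # series endpoints have a single neighbour; fall back to it
    return filled.fillna(out.ffill()).fillna(out.bfill())

# toy closing-price series with one gap and one spike
df = pd.DataFrame({"Close": [10.0, 11.0, np.nan, 13.0, 1e6, 12.0]})
clean = preprocess(df)
```

The spike at 1e6 is flagged and both gaps are then filled with the average of their two neighbours.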

2.2. MSNet

Given the characteristics of stock market data, this paper proposes MSNet, an xLSTM-based stock price forecasting method, whose structural diagram is shown in Figure 1a. At the model’s input stage, we construct a sliding window over the stock market data to read time-series samples, each comprising historical market data from the most recent L days together with the five-dimensional features. After standardization, these inputs are fed into the model for training. Structurally, we employ a stacked MSNet architecture: each block incorporates sLSTM and mLSTM memory units alongside the MSCA attention mechanism, and training uses the SPGO optimization algorithm, where MSCA enhances the model’s spatio-temporal information modeling capabilities while SPGO improves the exploration of optimal solutions during training. This architecture effectively extracts both long-term and short-term dependencies within time-series data, maintaining information flow through residual connections. To bolster robustness and predictive capability, convolutional and linear projection layers are introduced between layers. The model relies solely on historical data during forecasting, thereby preventing information leakage.
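The sliding-window input construction described above can be sketched as follows; the window length L, the feature layout, and the next-step target are illustrative assumptions.

```python
import numpy as np

def make_windows(features: np.ndarray, target: np.ndarray, L: int):
    """Build (sample, window, feature) inputs and next-step targets.
    Each sample uses only the most recent L steps of history, so no
    future information leaks into the inputs."""
    X, y = [], []
    for t in range(L, len(target)):
        X.append(features[t - L:t])   # historical window [t-L, t)
        y.append(target[t])           # value to predict at step t
    return np.stack(X), np.array(y)

# toy data: 100 minutes, 5 feature columns (Open, High, Low, Vol, Amount)
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 5))
close = rng.normal(size=100)
X_win, y_win = make_windows(feats, close, L=20)   # shapes: (80, 20, 5), (80,)
```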

2.2.1. Multi-Scale Channel Attention (MSCA)

As mentioned in the introduction, the stock market is characterized by considerable uncertainty and nonstationarity. These properties can significantly impact model training, increasing its difficulty and reducing robustness. The attention mechanism enables the network to focus on crucial information within the stock market, dynamically assigning varying importance weights to historical time steps based on the current prediction target. In nonstationary markets, certain moments [24]—such as major news releases, earnings announcements, or policy shifts—exert far greater influence on current prices than others. Attention mechanisms can automatically identify and emphasize these critical event periods while disregarding noisy or irrelevant historical data, thereby enhancing the model’s adaptability to abrupt shifts and structural changes.
Attention mechanisms are extensively employed in neural networks [25,26,27,28]. Early SE modules [29] introduced attention at the channel dimension, enabling adaptive learning of channel weights and assigning corresponding importance to different features. This approach enhances both network performance and model robustness. Subsequently, the proposed CBAM [30] performs pooling along the channel dimension after channel compression, generating a two-dimensional spatial attention map to highlight the locations of critical regions within the data. The recently introduced Coordinate Attention (CA) [31] further refines this approach by modeling along both the height and width dimensions of the image, more effectively capturing global contextual information and long-range dependencies. Building upon these foundations, we propose the Multi-scale Channel Attention (MSCA) mechanism for the volatile stock market. This mechanism incorporates temporal gating, enabling the model to focus on locally more significant time periods while suppressing ineffective random fluctuations. This demonstrates greater adaptability in highly uncertain market environments. The MSCA workflow is as follows:
First, we perform preliminary processing on the input stock price data, drawing upon the key principles of CA. Consider the input time-series data X ∈ ℝ^{B×C×L}, where B denotes the batch size, C the number of channels, and L the length of the time series. Given that different channels may exhibit distinct response patterns and different time points may possess varying local structures, while the time-series data lacks width and height dimensions, we segment the data along the channel and time dimensions. This yields two feature maps: the feature along the time dimension, denoted x_t, and the feature along the channel dimension, denoted x_c. These two features and the subsequent operations are as follows:
$x_t = \mathrm{Avg}_C(x) = \frac{1}{C}\sum_{c=1}^{C} x(:,c,:) \in \mathbb{R}^{B \times 1 \times L}$
$x_c = \mathrm{Avg}_L(x) = \frac{1}{L}\sum_{t=1}^{L} x(:,:,t) \in \mathbb{R}^{B \times C \times 1}$
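Under the (B, C, L) layout described above, the two pooling operations reduce to simple axis-wise means; a minimal NumPy sketch:

```python
import numpy as np

# toy batch: B=2 samples, C=5 feature channels, L=8 time steps
x = np.arange(2 * 5 * 8, dtype=float).reshape(2, 5, 8)

# x_t: average over the channel axis C -> shape (B, 1, L)
x_t = x.mean(axis=1, keepdims=True)
# x_c: average over the time axis L -> shape (B, C, 1)
x_c = x.mean(axis=2, keepdims=True)
```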
Given that the input stock data comprises five dimensions—Open, High, Low, Volume, and Amount—the time axis is segmented into five intervals. Each segment undergoes a single 1D convolution within the depthwise separable convolution (DSConv) layer before being concatenated. The resulting output is normalized and passed through a sigmoid function to yield the temporal gate b. For the feature x_c along the channel, we divide it into five equal segments by length. Similarly, these segments are concatenated via DSConv, ultimately yielding the channel gate a, which can be expressed by the following formulas:
$\tilde{b} = \mathrm{Concat}_{i=1}^{n}\big(\mathrm{DSConv}_{k_i}(x_t^{(i)})\big) \in \mathbb{R}^{B \times 1 \times L}$
$b = \sigma(\mathrm{BN}(\tilde{b})) \in \mathbb{R}^{B \times 1 \times L}$
$\tilde{a} = \mathrm{reshape}^{-1}\Big(\mathrm{Concat}_{i=1}^{n}\big(\mathrm{DSConv}_{k_i}(\hat{x}_c^{(i)})\big)\Big) \in \mathbb{R}^{B \times C \times 1}$
$a = \sigma(\mathrm{BN}(\tilde{a}))$
$\sigma(x) = \frac{1}{1 + e^{-x}}$
The depthwise separable convolution proceeds as follows, taking the channel features as an example. For each group of width C_g = C/5, a per-channel (depthwise) convolution is first performed:
$Y_c^{(i)}(t) = \sum_{u=0}^{k_i - 1} w_c^{\mathrm{dw},(i)}[u] \, x_c^{(i)}(t \cdot s + u - p_i), \quad c = 1, \ldots, C_g$
followed by a pointwise convolution that mixes the channels:
$Z_o^{(i)}(t) = \sum_{c=1}^{C_g} W_{o,c}^{\mathrm{pw},(i)} \, Y_c^{(i)}(t), \quad o = 1, \ldots, C_g$
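A minimal NumPy sketch of the depthwise-then-pointwise computation in the two equations above; kernel size, padding, and group width are illustrative assumptions.

```python
import numpy as np

def ds_conv1d(x, w_dw, w_pw, stride=1, pad=0):
    """Depthwise 1D convolution (one kernel per channel) followed by a
    1x1 pointwise convolution, as in the two equations above.
    x: (C, L), w_dw: (C, k), w_pw: (C_out, C)."""
    C, L = x.shape
    k = w_dw.shape[1]
    xp = np.pad(x, ((0, 0), (pad, pad)))
    L_out = (L + 2 * pad - k) // stride + 1
    y = np.zeros((C, L_out))
    for c in range(C):                    # depthwise pass, per channel
        for t in range(L_out):
            y[c, t] = np.dot(w_dw[c], xp[c, t * stride:t * stride + k])
    return w_pw @ y                       # pointwise pass, mixes channels

rng = np.random.default_rng(1)
x_grp = rng.normal(size=(4, 10))          # one group: C_g=4 channels, L=10
z = ds_conv1d(x_grp, rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), pad=1)
```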
Finally, we employ an outer product gate configuration to combine the two gates into a B × C × L gated circuit, which can be represented as follows:
$G_{\mathrm{top}} = a \otimes b, \qquad (G_{\mathrm{top}})_{b,c,t} = a_{b,c,1} \, b_{b,1,t}$
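Because a has shape B×C×1 and b has shape B×1×L, NumPy broadcasting realizes this outer-product gate directly:

```python
import numpy as np

rng = np.random.default_rng(2)
B, C, L = 2, 5, 8
a = rng.uniform(size=(B, C, 1))   # channel gate
b = rng.uniform(size=(B, 1, L))   # temporal gate

# broadcasting gives (G_top)[i, c, t] = a[i, c, 0] * b[i, 0, t]
G_top = a * b
```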
Subsequently, for state transitions between certain and uncertain markets within the stock market, we employ channel attention to dynamically recalibrate feature weights. Within the attention mechanism, the Q (Query), K (Key), V (Value) design [32] demonstrates distinct advantages, functioning as a learnable, context-aware information exchange mechanism.
In the MSCA, we employ a depthwise separable convolution to obtain Proj(X):
$\mathrm{Proj}(X) = \mathrm{Concat}_{i=1}^{n} Z^{(i)} \in \mathbb{R}^{B \times C \times L}$
Three independent branches were ultimately obtained:
$Q = \mathrm{Proj}_Q(X), \quad K = \mathrm{Proj}_K(X), \quad V = \mathrm{Proj}_V(X)$
Fusion, normalization, and pooling yield:
$f = \frac{q + k + v}{3}, \quad \hat{f} = \mathrm{BN}(f), \quad G_{\mathrm{bot}} = \sigma(\mathrm{Avg}_L(\hat{f})) \in \mathbb{R}^{B \times C \times 1}$
Finally, through channel attention, the global importance of each channel is computed, and the channels are then weighted:
$g = \sigma(\mathrm{Conv1D}_{\mathrm{channel}}(u)) \in \mathbb{R}^{B \times C \times 1}$
Obtain the final output y:
$y = (G_{\mathrm{top}} \odot g)$
MSCA leverages the spatio-temporal information modeling capabilities of channel-time gating to significantly enhance the capture of critical patterns within financial time series. Channel-time gating decouples the temporal and channel dimensions, distinguishing high-confidence from low-confidence periods and primary factors from secondary factors. Meanwhile, grouped multi-scale separable convolutions extract and fuse stock market features across multiple kernel scales. Furthermore, the attention weights generated by two parallel gating paths and three parallel convolutional branches, combined with activation functions and residual connections, integrate with the original feature maps. This approach not only preserves critical stock price signals but also effectively suppresses noise fluctuations and irrelevant disturbances. This integrated strategy markedly enhances the stability and efficiency of identification and prediction within nonstationary, uncertain stock market environments, ensuring the model’s robustness and generalization capability under complex market conditions. The MSCA architecture is illustrated in Figure 1a, with experimental validation of the MSCA module presented in Section 3.3.2.

2.2.2. Sparse Perturbation Greedy Optimization (SPGO)

In the field of stock price forecasting, short-term fluctuations are challenging to model accurately. Such volatility often manifests as high-frequency noise, readily inducing numerous spurious local minima on the loss surface. An optimization algorithm that can escape these noise-induced pseudo-valleys makes the model more likely to converge toward a solution reflecting genuine market dynamics rather than one merely fitting transient disturbances. Selecting an efficient optimization algorithm, such as SGD [33] or Adam [34], is therefore crucial. Adam achieves convergence through adaptive learning rates, while SGD computes gradients on individual samples at each iteration and updates parameters with a fixed learning rate. However, SGD converges slowly, is sensitive to the learning rate, and is prone to getting stuck in local minima during short-term stock fluctuations; Adam may converge prematurely because its adaptive adjustment drives the learning rate too low in later training stages. The subsequent AdamW [35] combines Adam’s adaptive learning rate with decoupled weight decay, achieving faster convergence and superior generalization. Nevertheless, its exploration capability and update strategy remain somewhat inadequate.
As previously noted, gradient-based optimizers are widely employed in neural network training owing to their efficiency on differentiable objectives. Nevertheless, meta-heuristic methods demonstrate robustness in scenarios involving non-differentiable losses, high noise, or adversarial settings. Drawing upon the strengths of meta-heuristic optimization algorithms in information extraction and utilization, this paper proposes a novel hybrid meta-heuristic algorithm: Sparse Perturbation Greedy Optimization (SPGO). This method extends Alpha Evolution (AE) [36] by introducing adaptive guidance trajectories and a sampling strategy biased toward elite individuals. It employs two sets of operators to generate two candidate solutions, which are then combined by dimension-wise masked fusion. Model parameters are updated using a greedy replacement [37] mechanism synchronized with the global optimum. The specific workflow is as follows:
During algorithmic operation, the optimization parameter vector is denoted as θ R D , with a population size of N. The population matrix at generation t is represented as:
$X(t) = [x_1(t); \ldots; x_N(t)] \in \mathbb{R}^{N \times D}, \qquad f_i(t) = F(x_i(t))$
where F(⋅) denotes the objective being evaluated (e.g., the training loss), and no gradients are computed. The globally optimal individual is denoted g(t) = argmin_{x ∈ X(t)} F(x), with fitness f_g(t). If box constraints l ≤ θ ≤ u exist, with interval width Δ = u − l, the algorithm truncates each dimension after generating new individuals: x ← min(max(x, l), u).
Subsequently, the flattened initial parameter vector θ_0 is used to initialize the population, alongside the fitness vector f ∈ ℝ^N. The global optimum is set to g_best = θ_0 with fitness f(g_best) = +∞, and the two first-order guide trajectories P_a and P_b are initialized. This step aims to capture mean-reversion behavior within the market:
$X(0) = \theta_0 + \sigma \, \mathcal{N}(0, I)$
To generate the random tensors for this generation, controlled perturbations are applied. Randomness introduces diversity into the optimization process, simulating the uncertainty inherent in stock markets. The damping factor α progressively reduces the perturbation magnitude throughout the optimization, akin to moving from broad market exploration toward more precise refinement of the algorithmic hyperparameters. The random matrices are generated as R_1, R_2 ∈ [0,1]^{N×D} and S ∈ {0,1}^{N×D}. The progress coefficient p is then calculated as p = min(FEs, maxFEs)/max(1, maxFEs). Next, the annealing exploration factor is computed; it initially broadens the exploration scope to enhance diversity, then rapidly diminishes in later stages:
$\alpha = \exp\big(\ln(\max(e^{-12}, 1 - p)) \cdot (4p)^2\big)$
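A small sketch of this annealing factor, under our reading of the formula (with the floor taken as e^{-12}): it starts near 1 and decays rapidly as the progress coefficient p approaches 1.

```python
import math

def annealing_factor(p: float) -> float:
    """alpha = exp(ln(max(e^-12, 1 - p)) * (4p)^2): ~1 early in the run
    (wide exploration), shrinking rapidly as progress p -> 1."""
    return math.exp(math.log(max(math.exp(-12), 1.0 - p)) * (4.0 * p) ** 2)

alphas = [annealing_factor(p / 10) for p in range(11)]  # p = 0.0 .. 1.0
```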
Next, we construct a contrastive search direction that both guides progression toward superior regions and steers clear of substandard areas by distinguishing between advantageous and disadvantaged individuals. For the k-th target individual E = x_k, we sample R from the top 25% of superior individuals and W from the bottom 25% of inferior individuals, thereby constructing the anchor trajectory O_v:
With a 50% probability: Randomly sample A and update:
$P_a \leftarrow (1 - \mathrm{cab}) \, P_a + \mathrm{cab} \, A, \qquad O_v = P_a$
Otherwise: randomly select a small subset {B_b} from the top 50% and perform a softmax-weighted combination using their normalized fitness f_B:
$w = \mathrm{softmax}\!\left(-\frac{f_B - \min f_B}{\mathrm{std}(f_B) + \varepsilon}\right), \qquad P_b \leftarrow (1 - \mathrm{cab}) \, P_b + \mathrm{cab} \sum_b w_b B_b$
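A sketch of the softmax-weighted elite combination; the negative sign on the normalized fitness is our assumption for a minimization objective (lower loss receives a larger weight), and the elite vectors and cab value are illustrative.

```python
import numpy as np

def elite_weights(f_B: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Softmax over negated, normalized fitness values: lower loss ->
    larger weight (sign convention assumed for minimization)."""
    z = -(f_B - f_B.min()) / (f_B.std() + eps)
    e = np.exp(z - z.max())           # numerically stable softmax
    return e / e.sum()

f_B = np.array([0.10, 0.20, 0.50])   # losses of a sampled elite subset
B_mat = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # elite vectors
w = elite_weights(f_B)

cab = 0.3
P_b = np.zeros(2)
P_b = (1 - cab) * P_b + cab * (w @ B_mat)   # trajectory update
```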
At this point, we have obtained the sparse random radius term, which we utilize to control the exploration radius and the effective dimension. Subsequently, we derive the first candidate solution:
$x_a = O_v + \alpha \, r + \zeta \, (R + E - O_v - W)$
Subsequently, we drew upon the merits of differential algorithms to construct a second candidate solution for the optimization algorithm. For the target individual of the first candidate solution, we first uniformly sampled x p b e s t . Then, from the population, we selected two distinct individuals with indices r1 and r2 for differential evolution. These individuals must also differ from both the target individual currently being updated and the selected best individual. These were used to construct the differential vector, followed by the mutation operation:
$v = x_i + F \, (x_{pbest} - x_i) + F \, (x_{r1} - x_{r2})$
Finally, a binomial crossover assigns each dimension the value from v with a certain probability and otherwise keeps x_i, ensuring that at least one dimension originates from v. This yields the second candidate solution x_d. A dimension-by-dimension masked fusion then produces the final solution of optimization Algorithm 1:
$x_F = x_a + M \odot (x_d - x_a) = (1 - M) \odot x_a + M \odot x_d$
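The mutation, binomial crossover, and masked fusion steps can be sketched as follows; the scaling factor F, the crossover rate, and the mask distribution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
D = 6
x_i = rng.normal(size=D)                 # target individual
x_pbest = rng.normal(size=D)             # sampled elite individual
x_r1, x_r2 = rng.normal(size=D), rng.normal(size=D)
x_a = rng.normal(size=D)                 # first candidate (anchor step)
F_scale, CR = 0.5, 0.9                   # DE scale and crossover rate (assumed)

# mutation: v = x_i + F (x_pbest - x_i) + F (x_r1 - x_r2)
v = x_i + F_scale * (x_pbest - x_i) + F_scale * (x_r1 - x_r2)

# binomial crossover: keep v with prob. CR; force one dimension from v
cross = rng.uniform(size=D) < CR
cross[rng.integers(D)] = True
x_d = np.where(cross, v, x_i)

# dimension-wise masked fusion of the two candidates
M = rng.integers(0, 2, size=D)           # binary mask
x_F = (1 - M) * x_a + M * x_d
```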
Algorithm 1: Sparse Perturbation Greedy Optimization (SPGO)
  Input:
   θ: initial parameter vector to be optimized
   F(⋅): gradient-free objective function
   N: population size
   l, u: lower and upper bounds of box constraints
   σ: standard deviation for initial population perturbation
   cab: update coefficient for guiding trajectories
   F: scaling factor for differential evolution
   ζ: coefficient for contrastive search direction
  Pseudocode:
   Initialize X(0) = θ_0 + σ·N(0, I), trajectories P_a, P_b, global best g_best
   While t < T do:
    Calculate progress p and annealing factor α
    Rank population; identify top 25% (elite) and bottom 25% (inferior)
    For i = 1 to N do:
     Set current individual E = x_i; sample R from elite, W from inferior
     If rand() < 0.5 then:
      Update P_a with random A; set anchor O_v = P_a
     Else:
      Update P_b with softmax-weighted elites; set anchor O_v = P_b
     Compute candidate 1: x_a = O_v + α·r + ζ·(R + E − O_v − W)
     Select distinct indices r1, r2 and x_pbest from the population
     Mutation: v = x_i + F·(x_pbest − x_i) + F·(x_r1 − x_r2)
     Crossover: generate candidate 2 x_d via binomial crossover on v
     Generate mask M; x_F = x_a + M⊙(x_d − x_a) = (1 − M)⊙x_a + M⊙x_d
     Boundary handling: x_F = clip(x_F, l, u)
     Evaluate fitness: f_new = F(x_F)
     If f_new ≤ f(x_i) then:
      x_i ← x_F
      If f_new ≤ f(g_best) then: g_best ← x_F
    End For
    t ← t + 1
   Output: g_best
During each generation, we construct a trial solution x_F for each individual in the population and compute its objective value f_new. We then apply a greedy replacement criterion: an individual x_i is replaced by x_F, and its fitness updated, only if f_new ≤ f(x_i); otherwise it remains unchanged. While greedy replacement significantly accelerates convergence under a finite evaluation budget, it can be less robust against strong noise. Hence, this paper concurrently introduces a sparse random radius and continuous perturbation strategies from differential evolution to maintain the essential equilibrium between diversity preservation and the exploration-exploitation trade-off. This optimization algorithm enables rapid fine-tuning of the model's output layer within volatile, nonstationary short-term intervals while preserving the representation of medium-to-long-term statistical structure, enhancing responsiveness to short-term stock market fluctuations and improving generalization stability. Experimental validation of SPGO's efficacy is presented in Section 3.3.3.
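The greedy replacement criterion combined with sparse perturbation can be illustrated on a toy objective. This is a deliberately simplified sketch, not the full SPGO: it omits the population, guiding trajectories, and differential operators, and the sphere function merely stands in for a validation loss.

```python
import numpy as np

rng = np.random.default_rng(1)

def sphere(x):
    """Toy objective standing in for a gradient-free validation loss."""
    return float(np.sum(x ** 2))

def greedy_sparse_search(f, x0, sigma=0.1, n_iter=200, sparsity=0.3):
    """Greedy replacement with sparse perturbations.

    A trial point is accepted only if it does not worsen the objective,
    and each trial perturbs only a sparse random subset of dimensions.
    """
    x, fx = x0.copy(), f(x0)
    history = []
    for _ in range(n_iter):
        mask = rng.random(x.shape) < sparsity          # sparse subset
        trial = x + mask * rng.normal(0.0, sigma, size=x.shape)
        ft = f(trial)
        if ft <= fx:                                   # greedy criterion
            x, fx = trial, ft
        history.append(fx)
    return x, history

x_best, hist = greedy_sparse_search(sphere, rng.normal(size=8))
```

By construction the recorded best objective never increases, which is exactly the fast-convergence property of greedy replacement discussed above; the sparse mask is what keeps individual moves local.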

3. Results and Analysis

This section experimentally verifies that MSNet achieves efficient modeling of long sequences in stock prediction, robust adaptation to nonstationarity and uncertainty, and effective suppression of short-term noise interference, with overall prediction accuracy and stability surpassing baseline methods. The subsections cover: (1) the experimental environment and setup, including hardware and software configurations; (2) evaluation metrics; (3) performance evaluation of the individual modules; (4) ablation studies on MSNet to validate the proposed methodology; and (5) comparative analysis against other deep learning models, together with generalization experiments, to demonstrate the model's superiority and generalization capability.

3.1. Experimental Environment and Training Details

To prevent variations in the experimental environment from influencing results, all experiments in this paper were conducted under identical hardware and software conditions: an NVIDIA GeForce RTX 4090 graphics card and a 16-vCPU Intel® Xeon® Platinum 8481C processor. The Python, CUDA, and cuDNN versions do not affect the experimental results themselves, but they must be mutually compatible with the hardware and software stack. We implemented MSNet in PyTorch 1.11.0. The specific hardware and software specifications are detailed in Table 1.

3.2. Evaluation Metric

As predicting stock prices constitutes a regression task, we selected several metrics commonly employed in stock price forecasting to evaluate the performance of our proposed prediction model: the coefficient of determination (R2), mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE).
MAE averages the absolute deviations between predictions and actual values. It exhibits low sensitivity to outliers, thereby aiding in assessing model robustness, and a lower MAE value indicates superior prediction performance. The calculation formula is as follows:
MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|
MSE averages the squared errors, imposing a harsher penalty on large deviations; it is smooth and differentiable but highly sensitive to outliers. The metric assigns greater weight to substantial errors, which matters in stock forecasting, where significant discrepancies may directly compromise investment accuracy and viability. The calculation formula is as follows:
MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²
In stock price forecasting, R2 serves as a statistical measure of a predictive model’s ability to explain actual stock price movements. Its value typically ranges between 0 and 1, representing the proportion of stock price variance attributable to the model. The calculation formula is as follows:
R² = 1 − Σ_{i=1}^{n} (y_i − ŷ_i)² / Σ_{i=1}^{n} (y_i − ȳ)²
RMSE is calculated by taking the square root of the mean of the squared errors. The formula is as follows:
RMSE = √( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² )
Here, ȳ denotes the average price, ŷ_i the forecast price, and y_i the actual price. MSE and RMSE are particularly sensitive to large forecast errors within specific time periods; MAE measures the average absolute deviation between forecasts and actual prices, serving as a standard indicator of prediction accuracy; and R² quantifies the proportion of price variation accounted for by the model, facilitating cross-model comparison. Collectively, these metrics are widely applied in financial time series forecasting and are particularly suitable for evaluating stock prediction models.
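The four metrics above can be computed directly. A minimal NumPy sketch, with R² using the mean price ȳ as in the formula above (the function name is ours):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R^2 as defined above (R^2 uses the mean price)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                 # average absolute deviation
    mse = np.mean(err ** 2)                    # average squared error
    rmse = np.sqrt(mse)                        # square root of MSE
    ss_res = np.sum(err ** 2)                  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                 # explained-variance proportion
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}
```

For a perfect forecast the function returns MAE = MSE = RMSE = 0 and R² = 1, matching the interpretations given above.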

3.3. Module Effectiveness Experiments

3.3.1. Effectiveness of xLSTM

As previously outlined, we employ xLSTM as our baseline network to accurately model long sequences within stock price data. To demonstrate its efficacy, we use the GHWA dataset as an example, partitioning its 45,600 data points into three CSV files: Short (points 1 to 15,200), Mid (points 1 to 30,400), and Long (the complete range, points 1 to 45,600). We conducted experiments on these three subsets using both LSTM and xLSTM. For LSTM, the Adam optimizer was employed; each reported value is the average of three experimental runs. The experimental results are presented in Table 2.
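The three prefix subsets can be produced by simple slicing. A minimal sketch: the function name is our own, and the index boundaries follow the text.

```python
def prefix_splits(series, step=15200):
    """Split a sequence into the three nested prefixes described above.

    Short covers points 1-15,200, Mid points 1-30,400, and Long the
    complete range 1-45,600 (for the default step of 15,200).
    """
    return {
        "Short": series[: step],
        "Mid": series[: 2 * step],
        "Long": series[: 3 * step],
    }
```

Because the splits are nested prefixes, Short is a strict prefix of Mid, and Mid of Long, so the three experiments differ only in how much history the models must carry.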
Experimental results indicate that when the time series is relatively short, the predictive performance of LSTM and xLSTM differs little. As the sequence length increases, both models deteriorate, with LSTM showing the more pronounced decline. This is because LSTM processes strictly sequentially by time step and cannot be parallelized: within long sequences, information must be transmitted step by step, making training and inference inefficient and allowing errors to accumulate progressively. Conversely, xLSTM combines the memory units of sLSTM and mLSTM, preserving LSTM's ability to model local temporal patterns while handling global dependencies akin to Transformers. This dual capability yields more stable performance on long-sequence tasks.

3.3.2. Effectiveness of MSCA

In this paper, we employ MSCA to replace the LSTM module within the original xLSTM architecture. To validate the effectiveness of this attention mechanism, we conducted comparative experiments by integrating different attention mechanisms (SE, CBAM, TA [38], CA) at identical positions. The experimental results are presented in Table 3.
The two fully connected layers in SE are prone to saturation, gradient vanishing, and overfitting when confronted with the high noise distributions prevalent in stock markets. Whilst CBAM simultaneously models both channel and spatial attention, these are computed independently, and spatial attention relies solely on static convolutions, rendering it weak at modeling nonstationary time series. TA suffers from computational redundancy, inefficient extraction of fine-grained features, and a lack of multi-scale modeling capabilities. CA, whilst decomposing spatial attention into horizontal and vertical dimensions, employs statically constructed positional encoding. Changes in trading sessions or policy frameworks consequently degrade its predictive performance. Our proposed MSCA, however, employs a channel-time gating mechanism that separates primary and secondary factors for processing. Its weights evolve dynamically over time, unlike the aforementioned attention mechanisms, which primarily rely on global pooling to derive a static set of channel weights. Should market volatility arise, the gating mechanism promptly recalibrates the weight distribution, preventing confusion between historical and current market conditions.
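The distinction between a static, globally pooled channel weight and a per-time-step gate can be illustrated with a toy example. This is a simplified stand-in, not the actual MSCA module: the names and the single linear projection are hypothetical, and it shows only the idea that channel weights are recomputed at every step rather than derived once by global pooling.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def timewise_channel_gate(x, W):
    """Per-step channel gating for a (T, C) sequence.

    A static SE-style gate would pool over all T steps and produce one
    weight vector for the whole series; here a fresh channel distribution
    is computed at every step, so a volatility burst at step t reweights
    the channels at step t only.
    """
    gates = softmax(x @ W, axis=1)   # (T, C): one distribution per step
    return x * gates, gates
```

Each row of the gate matrix is a probability distribution over channels, so a sudden regime change alters the weighting only from the step where it appears, which mirrors the recalibration behavior described above.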

3.3.3. Effectiveness of SPGO

In this paper, we replace the Adam optimizer within the original xLSTM with SPGO. To validate the effectiveness of this optimization algorithm, comparative experiments were conducted using different optimizers (Adam, AdamW, RAdam [39], PSO [40]). The experimental results are presented in Table 4.
Experimental results indicate that AdamW outperforms conventional Adam in addressing short-term volatility within stock markets. However, it decouples only the weight-decay component, and the long memory of its β₂ parameter causes the second-moment estimate v_t to respond sluggishly to fresh short-term fluctuations. Adam, when confronted with price volatility, sees its effective step size abruptly diminish, leaving the model slow to adapt to genuinely novel market conditions that require rapid adjustment and thereby introducing lag. RAdam's variance rectification primarily addresses estimation bias during cold starts; the stock market's continuous nonstationarity renders this insufficient in the later stages of optimization. PSO's one-off hyperparameter adjustment within a past time window lacks robustness to subsequent short-term volatility. In contrast, our proposed SPGO applies a greedy replacement criterion at each iteration, enabling the algorithm to swiftly adopt superior trial solutions. This accelerates convergence and efficiently refines the output layer to capture short-term price shifts promptly.
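The claim that β₂'s long memory makes the second-moment estimate v_t slow to react can be checked numerically. The following sketch tracks Adam's bias-corrected second moment over a synthetic gradient stream (a calm regime followed by a volatility burst); the gradient values and stream lengths are our illustrative choices.

```python
import numpy as np

def second_moment_track(grads, beta2):
    """Bias-corrected Adam-style second-moment estimate over a gradient stream."""
    v, out = 0.0, []
    for t, g in enumerate(grads, start=1):
        v = beta2 * v + (1.0 - beta2) * g * g   # exponential moving average of g^2
        out.append(v / (1.0 - beta2 ** t))      # bias correction, as in Adam
    return np.array(out)

# calm regime (|g| = 0.1) followed by a sudden volatility burst (|g| = 2.0)
grads = np.concatenate([np.full(200, 0.1), np.full(10, 2.0)])
slow = second_moment_track(grads, beta2=0.999)  # Adam/AdamW default
fast = second_moment_track(grads, beta2=0.9)
```

With the default β₂ = 0.999, ten burst steps move v_t only a small fraction of the way toward the new gradient magnitude squared (4.0), whereas a shorter memory tracks the burst far more closely, which is the sluggishness discussed above.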

3.3.4. Model Stability Testing

To verify the stability of the model, we conducted two comparative experiments. The first used non-overlapping rolling validation: 182,400 data points were divided into 145,920 for training and 36,480 for validation, a window of 18,240 data points was slid across the validation range, and we report the mean ± standard deviation over all windows. The second was cross-domain testing: the data were divided into four domains, the model was trained on three and tested on the held-out fourth, and the procedure was repeated for each domain. The experimental results are shown in Table 5.
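Reporting the mean ± standard deviation over non-overlapping windows can be sketched as follows. The window length matches the text; the function itself is illustrative rather than the paper's evaluation code.

```python
import numpy as np

def rolling_report(errors, window=18240):
    """Mean and standard deviation across non-overlapping window averages.

    Per-step errors are cut into consecutive, non-overlapping windows
    (any incomplete trailing window is dropped); the per-window means
    are then summarized as mean +/- std.
    """
    n = (len(errors) // window) * window
    per_window = np.asarray(errors[:n], dtype=float).reshape(-1, window).mean(axis=1)
    return per_window.mean(), per_window.std()
```

Applied to the per-step MAE or RMSE of each model, this yields exactly the "mean ± standard deviation" entries of the stability table.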

3.3.5. Comparison Experiments with State-of-the-Art Methods

To further analyze the performance of MSNet, we conducted comparative experiments against several classical and state-of-the-art time series forecasting methods across four stock market datasets, using an identical testing environment and test sets. The experimental results are presented in Table 6, Table 7, Table 8 and Table 9. We also trained models with 10 different random seeds and report the median and worst-case test MSE, as shown in Table 10.
In algorithms based on recurrent neural networks, LSTMs are prone to sequential bottlenecks: because temporal information must pass through the gates step by step, errors tend to accumulate progressively over long sequences. BiLSTM employs a bidirectional LSTM architecture, concatenating (or summing) the hidden states from both directions at each time step, so that each output incorporates both past and future context; even so, each direction remains constrained by the memory capacity of a unidirectional LSTM. GRU, as a simplified variant of LSTM, offers advantages in computational efficiency and parameter scale, but its lack of a memory cell makes gradient propagation less stable. Among Transformer-based algorithms, the wavelet enhancement operation in FEDformer is better suited to data with stable cycles, whereas stock data is predominantly nonstationary, potentially biasing its performance. Informer relies on sparse attention, under which small but critical fluctuations may be overlooked. Experimental results demonstrate that across the four datasets, although MSNet slightly trails Informer and FEDformer on two metrics in two datasets, it leads on all other metrics. This confirms that incorporating MSCA and SPGO enhances the accuracy of stock price forecasting.

3.3.6. Discussion

To test the effectiveness of MSNet on general time-series data, we conducted model generalization experiments using the ETTh1 (Electricity Transformer Temperature) dataset provided by the State Grid Corporation of China, covering the periods 1 January 2016 to 1 January 2018 and 1 December 2015 to 1 December 2017. Under identical testing conditions and forecasting horizons, comparative and generalization experiments were conducted against several classical and state-of-the-art time series forecasting methods. The results are presented in Table 11. They demonstrate that our proposed MSNet outperforms other popular networks on this public dataset, confirming the applicability of our MSCA module and SPGO algorithm to sequential forecasting problems in general.

4. Conclusions

To achieve more precise stock price forecasting, this paper proposes the MSNet predictive model, which integrates the Multi-Scale Channel Attention (MSCA) mechanism with time-channel gating, the Sparse Perturbation Greedy Optimization (SPGO) algorithm, and the xLSTM architecture. The MSCA module employs a time-channel gating mechanism to decouple the temporal and channel dimensions, accurately distinguishing high-confidence periods and key factors while extracting and fusing stock market features across multiple scales. Concurrently, the SPGO algorithm accelerates convergence through greedy replacement while maintaining population diversity via sparse random radius and differential perturbation strategies, effectively balancing exploration and exploitation. This significantly enhances the model's robustness and predictive stability in nonstationary, high-noise stock market environments. Experimental results demonstrate that MSNet exhibits superior predictive performance compared to advanced deep learning models such as Informer and FEDformer, displaying stronger generalization and robustness across diverse market conditions.
Despite encouraging results from existing methods, limitations remain. Our model’s precise capture of nonlinear patterns provides empirical support for challenging the weak form of the efficient market hypothesis, offering stock practitioners a powerful tool for signal extraction. However, since reducing statistical prediction errors does not directly equate to investment gains, future work will focus on deepening architectural design and expanding application evaluations. First, at the architectural level, integrating diverse frequency domain transformations and multi-scale mechanisms will enhance the existing framework’s ability to capture volatility patterns and sudden anomalies. Second, to address external factors like macroeconomic policies, a multi-modal framework can be constructed. By extracting textual features from financial news and social media, market fluctuations can be predicted more accurately, rather than relying solely on historical price trends. Finally, by simulating real trading scenarios, we can evaluate the algorithm’s profit-generating capacity and quantify its effectiveness using risk-adjusted metrics, providing investors with more comprehensive decision-making references.

Author Contributions

Conceptualization, M.H.; Methodology, J.H.; Software, J.H.; Validation, J.H.; Formal analysis, F.W.; Data curation, F.W.; Writing—original draft, J.H.; Writing—review & editing, F.W.; Visualization, M.H.; Funding acquisition, M.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The download link of the dataset is: https://pan.baidu.com/s/1aba-M2giwmxNsmaBBHHKCQ?pwd=kg48 (accessed on 3 January 2026).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhu, W.; Dai, W.; Tang, C.; Zhou, G.; Liu, Z.; Zhao, Y. MSNet: A time series forecasting model for Chinese stock price prediction. Sci. Rep. 2024, 14, 18351. [Google Scholar] [CrossRef] [PubMed]
  2. Tang, H. Stock prices prediction based on ARMA model. In Proceedings of the 2021 International Conference on Computer, Blockchain and Financial Development (CBFD), Nanjing, China, 23–25 April 2021; IEEE: New York, NY, USA, 2021. [Google Scholar]
  3. Ariyo, A.A.; Adewumi, A.O.; Ayo, C.K. Stock price prediction using the ARIMA model. In Proceedings of the 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation, Cambridge, UK, 26–28 March 2014; IEEE: New York, NY, USA, 2014. [Google Scholar]
  4. Gao, Y.; Wang, R.; Zhou, E. Stock prediction based on optimized LSTM and GRU models. Sci. Program. 2021, 2021, 4055281. [Google Scholar] [CrossRef]
  5. Hoseinzade, E.; Haratizadeh, S. CNNpred: CNN-based stock market prediction using a diverse set of variables. Expert Syst. Appl. 2019, 129, 273–285. [Google Scholar] [CrossRef]
  6. Chen, Y.-C.; Huang, W.-C. Constructing a stock-price forecast CNN model with gold and crude oil indicators. Appl. Soft Comput. 2021, 112, 107760. [Google Scholar] [CrossRef]
  7. Chen, K.; Zhou, Y.; Dai, F. A LSTM-based method for stock returns prediction: A case study of China stock market. In Proceedings of the 2015 IEEE International Conference on Big Data (Big Data), Santa Clara, CA, USA, 29 October–1 November 2015; IEEE: New York, NY, USA, 2015. [Google Scholar]
  8. Roondiwala, M.; Patel, H.; Varma, S. Predicting stock prices using LSTM. Int. J. Sci. Res. 2017, 6, 1754–1756. [Google Scholar] [CrossRef]
  9. Chen, C.; Xue, L.; Xing, W. Research on improved GRU-based stock price prediction method. Appl. Sci. 2023, 13, 8813. [Google Scholar] [CrossRef]
  10. Qi, C.; Ren, J.; Su, J. GRU neural network based on CEEMDAN–wavelet for stock price prediction. Appl. Sci. 2023, 13, 7104. [Google Scholar] [CrossRef]
  11. Lu, W.; Li, J.; Wang, J.; Qin, L. A CNN-BiLSTM-AM method for stock price prediction. Neural Comput. Applic 2021, 33, 4741–4753. [Google Scholar] [CrossRef]
  12. Baek, Y.; Kim, H.Y. ModAugNet: A new forecasting framework for stock market index value with an overfitting prevention LSTM module and a prediction LSTM module. Expert Syst. Appl. 2018, 113, 457–480. [Google Scholar] [CrossRef]
  13. Zhu, P.; Li, Y.; Hu, Y.; Xiang, S.; Liu, Q.; Cheng, D.; Liang, Y. MCI-GRU: Stock Prediction Model Based on Multi-Head Cross-Attention and Improved GRU. arXiv 2024, arXiv:2410.20679. [Google Scholar] [CrossRef]
  14. Lu, W.; Li, J.; Li, Y.; Sun, A.; Wang, J. A CNN-LSTM-based model to forecast stock prices. Complexity 2020, 2020, 6622927. [Google Scholar] [CrossRef]
  15. Beck, M.; Pöppel, K.; Spanring, M.; Auer, A.; Prudnikova, O.; Kopp, M.; Klambauer, G.; Brandstetter, J.; Hochreiter, S. xlstm: Extended long short-term memory. Adv. Neural Inf. Process. Syst. 2024, 37, 107547–107603. [Google Scholar]
  16. Yuan, F.; Huang, X.; Jiang, H.; Jiang, Y.; Zuo, Z.; Wang, L.; Wang, Y.; Gu, S.; Peng, Y. An xLSTM–XGBoost Ensemble Model for Forecasting Non-Stationary and Highly Volatile Gasoline Price. Computers 2025, 14, 256. [Google Scholar] [CrossRef]
  17. Yadav, A.; Jha, C.K.; Sharan, A. Optimizing LSTM for time series prediction in Indian stock market. Procedia Comput. Sci. 2020, 167, 2091–2100. [Google Scholar] [CrossRef]
  18. Ali, M.; Khan, D.M.; Alshanbari, H.M.; El-Bagoury, A.A.-A.H. Prediction of complex stock market data using an improved hybrid EMD-LSTM model. Appl. Sci. 2023, 13, 1429. [Google Scholar] [CrossRef]
  19. Eapen, J.; Bein, D.; Verma, A. Novel deep learning model with CNN and bi-directional LSTM for improved stock market index prediction. In Proceedings of the 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 7–9 January 2019; IEEE: New York, NY, USA, 2019. [Google Scholar]
  20. Gülmez, B. Stock price prediction with optimized deep LSTM network with artificial rabbits optimization algorithm. Expert Syst. Appl. 2023, 227, 120346. [Google Scholar] [CrossRef]
  21. Gupta, U.; Bhattacharjee, V.; Bishnu, P.S. StockNet—GRU based stock index prediction. Expert Syst. Appl. 2022, 207, 117986. [Google Scholar] [CrossRef]
  22. Baek, H. A CNN-LSTM stock prediction model based on genetic algorithm optimization. Asia-Pac. Financ. Mark. 2024, 31, 205–220. [Google Scholar] [CrossRef]
  23. Lambora, A.; Gupta, K.; Chopra, K. Genetic algorithm-A literature review. In Proceedings of the 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), Faridabad, India, 14–16 February 2019; IEEE: New York, NY, USA, 2019. [Google Scholar]
  24. Stankevičienė, J.; Akelaitis, S. Impact of public announcements on stock prices: Relation between values of stock prices and the price changes in Lithuanian stock market. Procedia Soc. Behav. Sci. 2014, 156, 538–542. [Google Scholar] [CrossRef]
  25. Dai, W.; Zhu, W.; Zhou, G.; Liu, G.; Xu, J.; Zhou, H.; Hu, Y.; Liu, Z.; Li, J.; Li, L. AISOA-SSformer: An effective image segmentation method for rice leaf disease based on the transformer architecture. Plant Phenomics 2024, 6, 0218. [Google Scholar] [CrossRef]
  26. Huang, Z.; Wang, X.; Wei, Y.; Huang, L.; Shi, H.; Liu, W.; Huang, T.S. Ccnet: Criss-cross attention for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27–28 October 2019. [Google Scholar]
  27. Liu, R.; Wang, T.; Zhang, X.; Zhou, X. DA-Res2UNet: Explainable blood vessel segmentation from fundus images. Alex. Eng. J. 2023, 68, 539–549. [Google Scholar] [CrossRef]
  28. Pandian, J.A.; Kumar, V.D.; Geman, O.; Hnatiuc, M.; Arif, M.; Kanchanadevi, K. Plant disease detection using deep convolutional neural network. Appl. Sci. 2022, 12, 6982. [Google Scholar] [CrossRef]
  29. Cheng, D.; Meng, G.; Cheng, G.; Pan, C. SeNet: Structured edge network for sea-land segmentation. IEEE Geosci. Remote. Sens. Lett. 2017, 14, 247–251. [Google Scholar] [CrossRef]
  30. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
  31. Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. arXiv 2021, arXiv:2103.02907. [Google Scholar]
  32. Si, Y.; Xu, H.; Zhu, X.; Zhang, W.; Dong, Y.; Chen, Y.; Li, H. SCSA: Exploring the synergistic effects between spatial and channel attention. Neurocomputing 2025, 634, 129866. [Google Scholar] [CrossRef]
  33. Ruder, S. An overview of gradient descent optimization algorithms. arXiv 2016, arXiv:1609.04747. [Google Scholar]
  34. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  35. Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar]
  36. Gao, H.; Zhang, Q. Alpha evolution: An efficient evolutionary algorithm with evolution path adaptation and matrix generation. Eng. Appl. Artif. Intell. 2024, 137, 109202. [Google Scholar] [CrossRef]
  37. Reingold, E.M.; Tarjan, R.E. On a greedy heuristic for complete matching. SIAM J. Comput. 1981, 10, 676–681. [Google Scholar] [CrossRef]
  38. Misra, D.; Nalamada, T.; Arasanipalai, A.U.; Hou, Q. Rotate to attend: Convolutional triplet attention module. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Online, 5–9 January 2021. [Google Scholar]
  39. Scabini, L.; Zielinski, K.M.; Ribas, L.C.; Gonçalves, W.N.; De Baets, B.; Bruno, O.M. RADAM: Texture recognition through randomized aggregated encoding of deep activation maps. Pattern Recognit. 2023, 143, 109802. [Google Scholar] [CrossRef]
  40. Marini, F.; Walczak, B. Particle swarm optimization (PSO). A tutorial. Chemom. Intell. Lab. Syst. 2015, 149, 153–165. [Google Scholar] [CrossRef]
  41. Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 2016, 28, 2222–2232. [Google Scholar] [CrossRef]
  42. Huang, Z.; Xu, W.; Yu, K. Bidirectional LSTM-CRF models for sequence tagging. arXiv 2015, arXiv:1508.01991. [Google Scholar] [CrossRef]
  43. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555. [Google Scholar] [CrossRef]
  44. Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proceedings of the International Conference on Machine Learning PMLR, Baltimore, MD, USA, 17–23 July 2022. [Google Scholar]
  45. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021; Volume 35. [Google Scholar]
Figure 1. MSNet network architecture diagram.
Table 1. Experimental hardware and software parameters.
Hardware environment:
  CPU: 16 vCPU Intel(R) Xeon(R) Platinum 8481C
  GPU: NVIDIA GeForce RTX 4090
  RAM: 32 GB
  Video memory: 24 GB
Software environment:
  OS: Ubuntu 20.04
  CUDA Toolkit: V11.3
  cuDNN: V8.2.4
  Python: 3.8.10
  torch: 1.11.0
  torchvision: 0.14.1
Table 2. Model performance comparison.
Metric | LSTM (Short) | xLSTM (Short) | LSTM (Mid) | xLSTM (Mid) | LSTM (Long) | xLSTM (Long)
MSE | 0.0079 | 0.0065 | 0.0107 | 0.0092 | 0.0197 | 0.0136
MAE | 0.0154 | 0.0124 | 0.0195 | 0.0178 | 0.0169 | 0.0169
RMSE | 0.0168 | 0.0147 | 0.0214 | 0.0163 | 0.0214 | 0.0214
R2 | 0.9365 | 0.9483 | 0.8938 | 0.9137 | 0.8553 | 0.8741
Table 3. Attention contrast experiment.
Method | MSE | MAE | RMSE | R2
SE | 0.0183 | 0.0191 | 0.0237 | 0.8446
CBAM | 0.0149 | 0.0168 | 0.0212 | 0.8761
TA | 0.0195 | 0.0199 | 0.0223 | 0.8638
CA | 0.0143 | 0.0165 | 0.0208 | 0.8795
MSCA | 0.0127 | 0.0161 | 0.0191 | 0.8878
Table 4. Optimization algorithm comparison experiment.
Method | MSE | MAE | RMSE | R2
Adam | 0.0136 | 0.0169 | 0.0214 | 0.8741
AdamW | 0.0144 | 0.0173 | 0.0214 | 0.8734
RAdam | 0.0155 | 0.0181 | 0.0244 | 0.8621
PSO | 0.0167 | 0.0183 | 0.0225 | 0.8602
SPGO | 0.0124 | 0.0161 | 0.0193 | 0.8863
Table 5. Model stability test results.
Setting | Method | MAE | RMSE | R2
Rolling validation | xLSTM | 0.0275 ± 0.0031 | 0.0304 ± 0.0017 | 0.8462 ± 0.0026
Rolling validation | Ours | 0.0228 ± 0.0040 | 0.0243 ± 0.0022 | 0.8812 ± 0.0057
Cross-domain testing | xLSTM | 0.0281 ± 0.0014 | 0.0335 ± 0.0021 | 0.8449 ± 0.0041
Cross-domain testing | Ours | 0.0254 ± 0.0023 | 0.0266 ± 0.0019 | 0.8626 ± 0.0027
Table 6. Comparison results of several methods on the GHWA dataset.
Model | MSE | MAE | RMSE | R2
LSTM | 0.0159 | 0.0207 | 0.0244 | 0.8353
BiLSTM | 0.0175 | 0.0181 | 0.0229 | 0.8551
xLSTM | 0.0136 | 0.0169 | 0.0214 | 0.8741
GRU | 0.0361 | 0.0195 | 0.0247 | 0.8329
FEDformer | 0.0124 | 0.0162 | 0.0205 | 0.8836
Informer | 0.0145 | 0.0168 | 0.0213 | 0.8745
Ours | 0.0093 | 0.0152 | 0.00193 | 0.9067
Table 7. Comparison results of several methods on the SZGT dataset.
Model | MSE | MAE | RMSE | R2
LSTM | 0.0084 | 0.0612 | 0.0936 | 0.9036
BiLSTM | 0.0065 | 0.0533 | 0.0804 | 0.9255
xLSTM | 0.0049 | 0.0479 | 0.0703 | 0.9423
GRU | 0.0066 | 0.0586 | 0.0813 | 0.9238
FEDformer | 0.0042 | 0.0443 | 0.0645 | 0.9519
Informer | 0.0041 | 0.0434 | 0.0634 | 0.9536
Ours | 0.0031 | 0.0368 | 0.0557 | 0.9642
Table 8. Comparison results of several methods on the ZGBA dataset.
Model | MSE | MAE | RMSE | R2
LSTM | 0.0073 | 0.0185 | 0.0231 | 0.8533
BiLSTM | 0.0063 | 0.0183 | 0.0228 | 0.8551
xLSTM | 0.0049 | 0.0169 | 0.0221 | 0.8663
GRU | 0.0051 | 0.0172 | 0.0223 | 0.8625
FEDformer | 0.0048 | 0.0178 | 0.0221 | 0.8653
Informer | 0.0042 | 0.0167 | 0.0213 | 0.8757
Ours | 0.0045 | 0.0162 | 0.0206 | 0.8828
Table 9. Comparison results of several methods on the HLKG dataset.
Model | MSE | MAE | RMSE | R2
LSTM [41] | 0.0094 | 0.0232 | 0.0307 | 0.8943
BiLSTM [42] | 0.0082 | 0.0206 | 0.0285 | 0.9088
xLSTM [15] | 0.0075 | 0.0194 | 0.0273 | 0.9167
GRU [43] | 0.0077 | 0.0191 | 0.0277 | 0.9142
FEDformer [44] | 0.0069 | 0.0159 | 0.0263 | 0.9227
Informer [45] | 0.0071 | 0.0171 | 0.0265 | 0.9213
Ours | 0.0063 | 0.0163 | 0.0244 | 0.9331
Table 10. Random seed training results on four datasets.
Dataset | Median Test MSE | Worst-Case Test MSE
GHWA | 0.0091 | 0.0121
SZGT | 0.0035 | 0.0067
ZGBA | 0.0049 | 0.0074
HLKG | 0.0060 | 0.0082
Table 11. Comparison results of several methods on the ETTh1 dataset.
Model | MSE | MAE | RMSE | R2
LSTM | 0.0284 | 0.0471 | 0.0533 | 0.9348
BiLSTM | 0.0243 | 0.0405 | 0.0493 | 0.9442
xLSTM | 0.0222 | 0.0381 | 0.0471 | 0.9489
GRU | 0.0329 | 0.0424 | 0.0574 | 0.9244
FEDformer | 0.0261 | 0.0319 | 0.0511 | 0.9433
Informer | 0.0185 | 0.0356 | 0.0431 | 0.9547
Ours | 0.0159 | 0.0327 | 0.0339 | 0.9633