Article

EMAT: Enhanced Multi-Aspect Attention Transformer for Financial Time Series Forecasting

School of Computer and Information Engineering, Shanghai Polytechnic University, Shanghai 201209, China
* Author to whom correspondence should be addressed.
Entropy 2025, 27(10), 1029; https://doi.org/10.3390/e27101029
Submission received: 10 September 2025 / Revised: 28 September 2025 / Accepted: 29 September 2025 / Published: 1 October 2025
(This article belongs to the Special Issue Entropy, Artificial Intelligence and the Financial Markets)

Abstract

Financial time series prediction remains a challenging task due to the inherent non-stationarity, noise, and complex temporal dependencies present in market data. Traditional forecasting methods often fail to capture the multifaceted nature of financial markets, where temporal proximity, trend dynamics, and volatility patterns simultaneously influence price movements. To address these limitations, this paper proposes the Enhanced Multi-Aspect Attention Transformer (EMAT), a novel deep learning architecture specifically designed for stock market prediction. EMAT incorporates a Multi-Aspect Attention Mechanism that simultaneously captures temporal decay patterns, trend dynamics, and volatility regimes through specialized attention components. The model employs an encoder–decoder architecture with enhanced feed-forward networks utilizing SwiGLU activation, enabling superior modeling of complex non-linear relationships. Furthermore, we introduce a comprehensive multi-objective loss function that balances point-wise prediction accuracy with volatility consistency. Extensive experiments on multiple stock market datasets demonstrate that EMAT consistently outperforms a wide range of state-of-the-art baseline models, including various recurrent, hybrid, and Transformer architectures. Our ablation studies further validate the design, confirming that each component of the Multi-Aspect Attention Mechanism makes a critical and quantifiable contribution to the model’s predictive power. The proposed architecture’s ability to simultaneously model these distinct financial characteristics makes it a particularly effective and robust tool for financial forecasting, offering significant improvements in accuracy compared to existing approaches.

1. Introduction

Stock price forecasting represents one of the most critically important yet fundamentally challenging tasks in modern financial markets [1]. The accuracy of such predictions directly influences investment decision-making processes for individual investors through enhanced asset allocation strategies, while simultaneously serving as a cornerstone for corporate strategic planning and institutional risk management frameworks. The inherent complexity of this task stems from the highly non-linear and dynamic evolution of financial time series, which are shaped by a complex interplay of factors including historical price patterns, company-specific economic fundamentals, market sentiment dynamics, and broader macroeconomic conditions [2]. Financial markets exhibit intrinsic non-stationarity and extreme volatility, characteristics that render conventional forecasting approaches inadequate for capturing the full spectrum of market dynamics and pose fundamental challenges to predictive modeling.
To address these challenges, effective predictive models must demonstrate the capability to simultaneously capture short-term market fluctuations and long-term temporal dependencies inherent in financial time series data. The development of sophisticated architectures that can model complex temporal relationships while maintaining computational efficiency has thus become a central focus in advancing stock price prediction methodologies.
Early approaches to stock price modeling predominantly relied on classical statistical methods [3], including moving averages, ARMA, ARIMA, ARCH, and GARCH models. While these methods offer computational simplicity and interpretability, they fundamentally depend on idealized assumptions such as linearity and stationarity that are rarely satisfied in complex, real-world financial markets [4]. In practice, numerous dynamic uncertainties—including market noise, policy changes, and market manipulation—interact in complex ways, rendering traditional statistical models inadequate for extracting meaningful patterns from large-scale, high-dimensional market data.
Recognizing these limitations, researchers subsequently turned to machine learning algorithms [5], including K-Nearest Neighbors (KNN), decision trees, Support Vector Machines (SVM), random forests, and XGBoost. These approaches demonstrated improved performance by accommodating more flexible data distributions and have been successfully applied to various tasks including investment selection, feature extraction, and risk prediction [6]. However, traditional machine learning methods often rely heavily on handcrafted features and exhibit limited capacity to capture deep temporal patterns or complex interactions among multiple time series variables. Furthermore, they face significant challenges in scaling to and generalizing across complex, large-scale financial markets.
The advent of advanced computational resources has facilitated the development of deep learning approaches capable of extracting latent patterns from extensive historical financial data [7]. Models such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory networks (LSTMs) have gained prominence in financial forecasting applications. CNNs excel at capturing local spatial–temporal correlations and have been successfully applied to stock price direction prediction and multi-feature integration tasks [8]. RNNs inherently process sequential information but suffer from vanishing or exploding gradient problems when handling long sequences. LSTMs address these gradient issues through sophisticated gating mechanisms [9] and demonstrate superior capability in capturing long-term dependencies. Nevertheless, even LSTM-based methods can experience information decay over very long sequences due to their sequential processing nature, which limits their ability to fully capture evolving patterns in extended financial time series.
More recently, Transformer models—originally developed for natural language processing—have been increasingly adapted for time series forecasting applications [10,11,12]. Existing financial applications include enhancing locality awareness and integrating auxiliary data such as social media sentiment or textual news signals with price data. However, a critical limitation of many current Transformer-based approaches in finance is their primary focus on leveraging the architecture to analyze auxiliary inputs (e.g., social sentiment, news content) rather than fundamentally enhancing feature extraction from core numerical time series data [13]. This heavy reliance on external, often uncertain information sources can result in unstable performance and limited generalizability across different markets, particularly since the quality and relevance of such auxiliary data can vary significantly [14]. Consequently, many existing models emphasize data breadth through multi-modal inputs rather than achieving depth of understanding in the primary time series patterns. When core numerical feature extraction capabilities are inadequate, incorporating potentially noisy auxiliary data may not yield reliable performance improvements, suggesting that strengthening a model’s ability to extract comprehensive features directly from historical time series data provides a more robust foundation for financial forecasting.
A notable advancement toward addressing this limitation is the Series Decomposition Transformer with Period-correlation (SDTP) [15], which focuses on intrinsic patterns within price series by explicitly separating trend and seasonal components while leveraging inherent periodicity. SDTP demonstrated superior performance by concentrating on core numerical data rather than external information sources. However, even this promising direction reveals significant opportunities for further enhancement [16]. Current Transformer models, including SDTP, may not fully capture the complex interplay of temporal dependencies, dynamic market trends, and varying volatility regimes simultaneously. Additionally, existing approaches often operate on single series and may underutilize the multi-dimensional nature of financial time series characteristics. These limitations highlight the critical need for more sophisticated attention mechanisms that can explicitly model multiple aspects of financial data concurrently [17].
To address these fundamental limitations and provide a more robust foundation for financial time series forecasting, this paper proposes a novel deep learning architecture: the Enhanced Multi-Aspect Attention Transformer (EMAT). EMAT is specifically designed for stock price prediction by leveraging intrinsic temporal patterns in historical price data through sophisticated attention mechanisms that avoid dependence on uncertain external information sources. The core innovations of EMAT include (1) a Multi-Aspect Attention Mechanism that simultaneously incorporates temporal decay patterns, trend dynamics, and volatility awareness; (2) a comprehensive multi-objective loss function that balances point-wise prediction accuracy with volatility consistency; and (3) a flexible encoder–decoder architecture that adapts to different forecasting scenarios. This approach represents a significant advancement in adapting Transformer architectures for financial time series prediction by focusing on the temporal, trend, and volatility characteristics inherent in price movements, thereby providing enhanced robustness and generalizability compared to approaches that rely on external information sources. The key contributions of this paper are as follows:
1. We propose EMAT, a novel Transformer-based architecture that integrates a Multi-Aspect Attention Mechanism with an enhanced loss function to effectively capture complex temporal dependencies in financial time series data.
2. We introduce a comprehensive attention framework that simultaneously considers temporal proximity effects, trend dynamics, and volatility patterns, enabling the model to adapt to different market regimes and capture multiple dimensions of price behavior.
3. We design a multi-objective loss function that optimizes both point-wise prediction accuracy and volatility consistency, providing more robust predictions that align with practical financial forecasting requirements.
4. We conduct extensive experiments on multiple stock market datasets, demonstrating that EMAT consistently outperforms state-of-the-art time series forecasting methods in terms of prediction accuracy and stability across diverse market conditions.
The remainder of this paper is organized as follows. Section 2 reviews related work in financial time series forecasting. Section 3 formulates the problem and provides necessary background. Section 4 presents the proposed EMAT methodology. Section 5 reports experimental results and analysis. Finally, Section 6 concludes the paper and discusses future directions.

2. Related Work

In recent decades, forecasting stock market trends has attracted growing attention due to the critical importance of financial markets and their pervasive impact on economic activity [18]. However, accurately predicting stock prices remains a formidable challenge due to the inherent complexity, high volatility, and non-stationarity characteristics of financial markets [19].
Stock price prediction is typically formulated as a time series forecasting problem: given a historical sequence $(x_1, x_2, \ldots, x_T)$, the objective is to predict future values $(x_{T+1}, x_{T+2}, \ldots, x_{T+k})$. Methodologies for stock price forecasting can be broadly categorized into three main classes: classical statistical models, traditional machine learning models, and modern deep learning models, each representing an evolutionary advance in forecasting capabilities.

2.1. Statistical Models for Financial Time Series Forecasting

Classical statistical approaches form the foundation of time series analysis, employing mathematical models to generate empirical predictions from historical data. Representative methods include ARMA [20], ARIMA [21], ARCH [22], and GARCH models [23]. These models are valued for their interpretability and computational efficiency, making them widely adopted in financial applications.
However, statistical models rely on strong assumptions such as linearity and stationarity, which are frequently violated in real-world stock markets [24]. Financial markets involve numerous interacting variables, abrupt policy changes, and behavioral factors that create complex and noisy dynamics [25]. Consequently, these traditional approaches often struggle to capture the intricate correlations and evolving uncertainties present in large-scale financial systems.

2.2. Machine Learning Models for Financial Time Series Forecasting

Machine learning (ML) approaches have gained prominence in financial data analysis by moving beyond the rigid assumptions of classical statistical models. Notable examples include K-Nearest Neighbors (KNN) [26,27], decision trees [28,29], Support Vector Machines (SVM) [30,31], and random forests [32]. These models have been successfully applied to various financial tasks, including investment selection, rule induction from historical data, stock price movement forecasting, and risk prediction.
However, traditional ML models face significant limitations in financial applications. They typically require extensive feature engineering and may overlook important financial indicators due to their limited capacity for automatic feature extraction. While filter-based and voting-based feature selection methods have been proposed to identify relevant indicators for stock returns [33], these approaches still struggle to capture the full spectrum of market dynamics, particularly under complex and volatile market conditions [34].

2.3. Deep Learning Models for Financial Time Series Forecasting

With the rise of high-performance computing, deep learning has achieved remarkable success in many domains, offering novel solutions for stock prediction. CNNs have been effectively utilized for price direction prediction and integrating diverse stock-related information [35,36].
RNNs and their variants, particularly LSTM networks [9], are widely adopted for their ability to process sequential data and capture temporal dependencies. LSTMs address the vanishing gradient problems inherent in vanilla RNNs through gating mechanisms, making them suitable for longer-term forecasting [37]. However, LSTMs can still suffer from information loss over very long sequences due to their sequential processing nature, potentially limiting their ability to capture evolving patterns in extended time series [38]. Furthermore, alternative RNN paradigms such as Reservoir Computing, specifically Echo State Networks (ESN), have been proposed as a computationally efficient approach: ESNs have demonstrated forecasting performance comparable to fully trained recurrent networks while drastically reducing training complexity and computational resources [39].
Recently, Transformer-based models [40] have been increasingly applied to financial forecasting, leveraging their parallel processing capabilities and self-attention mechanisms. Early applications include variants that enhance locality with multi-scale Gaussian priors [41] and models integrating auxiliary data such as social media [42] and sentiment information [43]. However, many early Transformer-based approaches primarily focused on integrating external information rather than fundamentally enhancing feature extraction from core price series. This reliance on potentially noisy external data can lead to unstable performance and limited generalizability across diverse markets. A more promising direction emphasizes strengthening intrinsic feature extraction from historical price data. The Series Decomposition Transformer with Period-correlation (SDTP) model exemplifies this approach by explicitly separating trend and seasonal components while leveraging inherent periodicity [15]. By concentrating on core numerical patterns rather than uncertain external features, SDTP has demonstrated improved forecasting performance.
Despite these advances, significant challenges remain in tailoring Transformer architectures to financial time series. Current limitations include the need for more granular attention mechanisms that explicitly account for time decay, trend direction, and volatility patterns. Additionally, standard positional encodings may not adequately capture irregular time intervals common in financial markets. These observations highlight the need for specialized architectures capable of capturing the multifaceted nature of financial data, directly motivating our proposed EMAT model.

3. Problem Definition

Consider a multivariate financial time series $X = \{x_t\}_{t=1}^{T}$, where $x_t \in \mathbb{R}^d$ denotes the market state at time step $t$ and $d$ is the number of features. Given a fixed-length historical window $X_t = (x_{t-I+1}, x_{t-I+2}, \ldots, x_t) \in \mathbb{R}^{I \times d}$ with window size $I$, our goal is to learn a prediction function $f: \mathbb{R}^{I \times d} \to \mathbb{R}$ that estimates the next day’s closing price:
$$\hat{x}_{t+1} = f(X_t).$$
In adapting the Transformer architecture for this task, we define a “token” as the feature vector $x_t$ corresponding to a single time step. Thus, the input window $X_t$ is treated as a sequence of tokens, where each token encapsulates the complete market state at that point in time. This direct tokenization approach preserves the full granularity of the time series data. The prediction problem is formulated as a rolling-window regression, where the model is trained to minimize the error between the predicted value $\hat{x}_{t+1}$ and the actual value $x_{t+1}$ over the training period. After each forecast, the window shifts forward by one time step and the process repeats for the next prediction.
Formally, given a training set $D = \{(X_t, x_{t+1})\}_{t=I}^{T-1}$, we aim to find the optimal parameters $\theta^*$ of the prediction function $f_\theta$ that minimize the following objective:
$$\theta^* = \arg\min_{\theta} \sum_{t=I}^{T-1} \mathcal{L}\big(f_\theta(X_t), x_{t+1}\big)$$
where $\mathcal{L}$ is an appropriate loss function that measures the discrepancy between predictions and actual values. The main challenge lies in capturing both the short-term market dynamics and long-term temporal dependencies in $X_t$, while accounting for the non-stationary and noisy nature of financial time series.
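To make this formulation concrete, the following minimal NumPy sketch constructs the training pairs $(X_t, x_{t+1})$; the helper name and the assumption that the closing price occupies the first feature column are ours, for illustration only.

```python
import numpy as np

def build_rolling_windows(series: np.ndarray, window: int):
    """Build (X_t, x_{t+1}) training pairs from a (T, d) multivariate series.

    Hypothetical helper: assumes the closing price is feature column 0.
    """
    X, y = [], []
    for t in range(window, len(series)):
        X.append(series[t - window:t])  # lookback window of I = `window` steps
        y.append(series[t, 0])          # next step's closing price
    return np.stack(X), np.array(y)

# Example: 500 trading days, 6 features per day, 5-day lookback
data = np.random.rand(500, 6)
X, y = build_rolling_windows(data, window=5)
print(X.shape, y.shape)  # (495, 5, 6) (495,)
```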

3.1. Transformer Architecture Overview

The Transformer model, introduced by Vaswani et al. [40], has significantly advanced natural language processing and has been adapted for time series forecasting. The canonical Transformer architecture comprises an encoder and a decoder. The encoder transforms input sequences through successive layers: each layer applies input embeddings with positional encodings to capture sequential order, multi-head self-attention to capture dependencies across positions, position-wise feed-forward networks for feature transformation, and residual connections with layer normalization for stable training.
The decoder extends this structure with additional mechanisms: it employs masked self-attention to prevent the model from attending to future tokens during training, cross-attention to incorporate information from the encoder, and position-wise feed-forward networks, all coupled with residual connections and layer normalization. A key strength of Transformers is their ability to capture long-range dependencies by processing all input positions in parallel, thereby overcoming the sequential limitations of RNNs. This property is particularly valuable for financial time series, which often exhibit complex long-term temporal patterns. However, financial time series are characterized by multiple interacting dimensions that pose additional challenges for effective modeling.

3.2. Multi-Dimensional Characteristics of Financial Time Series

Financial time series exhibit complex multi-dimensional characteristics that are fundamental to accurate forecasting. Understanding and modeling these characteristics is crucial for developing effective prediction models. We identify three key dimensions that significantly influence price movements and must be simultaneously considered in any robust forecasting framework.
First, financial time series exhibit complex temporal dependencies where recent observations typically carry more predictive power than distant ones. This temporal decay effect can be modeled as
$$w_t = \exp(-\lambda \cdot |t - t_0|)$$
where $w_t$ represents the influence weight of the observation at time $t$, $t_0$ is the current time, and $\lambda$ controls the decay rate. Additionally, financial data often exhibit periodic patterns at multiple time scales, including daily trading patterns, weekly cycles, and seasonal variations, which can be represented as
$$x_t = \sum_{i=1}^{K} \alpha_i \sin(2\pi f_i t + \phi_i) + \epsilon_t$$
where $x_t$ represents the time series value at time $t$, $K$ is the number of periodic components, $f_i$ denotes the frequency of the $i$-th component, $\alpha_i$ and $\phi_i$ are the amplitude and phase, respectively, and $\epsilon_t$ represents the non-periodic residual component.
Second, financial markets exhibit distinct trend behaviors that influence future price movements. Trend patterns can be characterized by momentum effects, where
$$\text{Momentum}_t = \frac{P_t - P_{t-n}}{P_{t-n}}$$
where $P_t$ represents the price at time $t$ and $n$ is the lookback period. These trend dynamics manifest as bullish or bearish market regimes, with different predictive patterns emerging during trending versus sideways market conditions.
Third, financial markets experience varying volatility regimes that significantly impact prediction accuracy. Volatility clustering, where high-volatility periods tend to be followed by high-volatility periods, can be mathematically represented by the following:
$$\sigma_t^2 = \omega + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2$$
where $\sigma_t^2$ represents the conditional variance at time $t$, $\omega$ is the constant term, $\alpha$ and $\beta$ are parameters controlling the impact of past squared residuals and past variances, respectively, and $\epsilon_{t-1}$ is the previous period’s residual. Different volatility regimes require adaptive modeling approaches, as prediction patterns that work in low-volatility environments may fail during high-volatility periods.
These three characteristics are often modeled independently in traditional approaches, failing to capture their complex interactions. The challenge lies in developing unified frameworks that can simultaneously account for temporal dependencies, trend dynamics, and volatility patterns while adapting to their time-varying nature. This limitation motivates our development of a Multi-Aspect Attention Mechanism that explicitly integrates these three dimensions into a coherent modeling framework.
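For illustration, the sketch below computes simple empirical proxies for these three dimensions from a synthetic price series; the decay rate, lookback, and window values are placeholders for exposition, not settings used by EMAT.

```python
import numpy as np

rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 1, 250))  # synthetic daily closes

# 1. Temporal decay: w_t = exp(-lambda * |t - t0|); recent days weigh more
lam, t0 = 0.1, len(prices) - 1
decay_weights = np.exp(-lam * np.abs(np.arange(len(prices)) - t0))

# 2. Trend: n-day momentum, (P_t - P_{t-n}) / P_{t-n}
n = 10
momentum = (prices[n:] - prices[:-n]) / prices[:-n]

# 3. Volatility: rolling standard deviation of returns over a short window
w = 5
returns = np.diff(prices) / prices[:-1]
rolling_vol = np.array([returns[i - w + 1:i + 1].std()
                        for i in range(w - 1, len(returns))])
```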

3.3. Challenges in Financial Time Series Prediction

The multi-dimensional characteristics discussed above give rise to several fundamental challenges that distinguish financial time series prediction from conventional forecasting tasks. These challenges highlight the limitations of existing approaches and underscore the need for sophisticated modeling frameworks.
First, financial markets exhibit strong non-stationarity, where statistical properties such as mean, variance, and correlation structures evolve dynamically over time [44]. This characteristic renders traditional statistical models inadequate for capturing evolving market patterns. Second, financial data contains substantial noise arising from market microstructure effects, high-frequency trading activities, and information asymmetries, which can obscure underlying price signals and lead to prediction instability [45].
Third, price movements are influenced by factors operating across multiple time scales, from high-frequency fluctuations to long-term macroeconomic trends, necessitating models capable of capturing multi-resolution temporal patterns. Fourth, financial markets experience distinct regime changes corresponding to different economic conditions—such as bull versus bear markets and high- versus low-volatility periods—each requiring adaptive modeling approaches [46]. Fifth, financial markets exhibit time-varying multiscaling behavior [47,48,49,50,51], a well-established stylized fact of stock market dynamics [52,53,54]. This means the statistical properties of market fluctuations are dependent on the time scale of observation, and the nature of this scaling can shift dramatically between stable, efficient market periods and turbulent, crisis periods. This dynamic complexity, where the underlying “rules” of the market change across different time horizons, poses a significant challenge for predictive models that may assume a more stable data-generating process.
These challenges directly correspond to the three key dimensions identified earlier: capturing temporal proximity effects to address multi-scale dependencies, identifying and adapting to trend patterns to handle regime changes, and responding to volatility variations to manage non-stationarity and noise. Effective solutions must integrate these capabilities within a unified framework that ensures robustness across diverse market conditions while maintaining computational efficiency for practical applications.

4. Methodology

To effectively address the complex temporal patterns inherent in financial time series, we propose EMAT, a novel deep learning architecture. This section details its information processing mechanism, which transforms raw time series data into accurate predictions through a structured pipeline. The process begins by representing the input as a sequence of high-dimensional tokens enriched with positional encodings to preserve temporal order.
At the core of the EMAT encoder is our Multi-Aspect Attention Mechanism, which simultaneously disentangles market signals from three distinct financial perspectives: temporal decay, trend dynamics, and volatility regimes. These parallel insights are then synergistically fused through a sequential gating process to create a holistic market representation. This enriched representation is subsequently utilized by the decoder, via cross-attention, to generate the final forecast. The entire predictive process is guided by a comprehensive multi-objective loss function that balances point-wise accuracy with volatility consistency. The following subsections provide the technical details of these components, and the overall architecture is illustrated in Figure 1.

4.1. Multi-Aspect Attention Mechanism

Traditional multi-head attention mechanisms, while effective for many sequence modeling tasks, often fail to capture the complex temporal dynamics inherent in financial time series. To address this limitation, we propose a Multi-Aspect Attention Mechanism that enhances the standard attention framework by incorporating specialized components for temporal decay, trend analysis, and volatility awareness. The mechanism processes information through a base attention computation followed by three enhancement streams and sequential multiplicative gating for output refinement, as illustrated in Figure 2.
The mechanism begins by computing standard query (Q), key (K), and value (V) projections using linear transformations. These projections serve as inputs for the base attention computation and subsequent enhancement components.

4.1.1. Base Attention with Content-Aware Gating

The foundation component computes scaled dot-product attention scores, which are then refined through a content-aware gating mechanism. The gated attention scores $A_{\text{base}}$ are calculated as
$$A_{\text{base}} = \frac{QK^{\top}}{\sqrt{d_k}} \odot \text{Gate}_{\text{content}}([Q; K])$$
where $d_k$ is the key dimension, $\odot$ denotes element-wise multiplication, and $\text{Gate}_{\text{content}}$ is a neural network that processes concatenated query–key pairs to generate content-dependent attention modulation weights.
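A minimal PyTorch sketch of this content-aware gating is shown below; the single linear layer with a sigmoid output is our simplifying assumption, as the gate network’s internal architecture is not specified here.

```python
import torch
import torch.nn as nn

class ContentGatedScores(nn.Module):
    """Scaled dot-product scores modulated by a content-aware gate.

    A sketch of A_base; the gate design (one linear layer + sigmoid)
    is an illustrative assumption.
    """
    def __init__(self, d_k: int):
        super().__init__()
        self.d_k = d_k
        # Gate_content processes concatenated query-key pairs [Q; K]
        self.gate = nn.Sequential(nn.Linear(2 * d_k, 1), nn.Sigmoid())

    def forward(self, Q: torch.Tensor, K: torch.Tensor) -> torch.Tensor:
        # (batch, L, L) raw scaled dot-product scores
        scores = Q @ K.transpose(-2, -1) / self.d_k ** 0.5
        # Pairwise [Q_i ; K_j] concatenation -> one gate value per score
        L = Q.size(1)
        q = Q.unsqueeze(2).expand(-1, -1, L, -1)  # (batch, L, L, d_k)
        k = K.unsqueeze(1).expand(-1, L, -1, -1)  # (batch, L, L, d_k)
        g = self.gate(torch.cat([q, k], dim=-1)).squeeze(-1)
        return scores * g  # element-wise modulation
```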

4.1.2. Temporal Enhancement Component

This component captures temporal proximity effects by incorporating positional information and learnable decay patterns. The temporal enhancement is computed as
$$A_{\text{time}} = \frac{(Q W_q^{(t)})(K W_k^{(t)})^{\top}}{\sqrt{d_k}} \odot \text{Decay}_{\text{time}}$$
where $W_q^{(t)}$ and $W_k^{(t)}$ are temporal projection matrices, and the decay matrix is defined as
$$\text{Decay}_{\text{time}}(i, j) = \exp\big(-|i - j| \cdot \sigma(\gamma_t)\big)$$
with $\gamma_t$ being a learnable parameter and $\sigma(\cdot)$ the sigmoid function. Positional features are extracted through a dedicated neural network processing sequential position indices.
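This learnable decay matrix is the primitive that recurs, with different driving quantities, in the trend and volatility components below. A minimal PyTorch sketch, assuming a scalar learnable $\gamma_t$, follows.

```python
import torch
import torch.nn as nn

class TemporalDecay(nn.Module):
    """Decay_time(i, j) = exp(-|i - j| * sigmoid(gamma_t)).

    Sketch of the decay primitive; the trend and volatility variants
    replace |i - j| with price-change or volatility similarity.
    """
    def __init__(self):
        super().__init__()
        self.gamma_t = nn.Parameter(torch.zeros(1))  # learnable decay rate

    def forward(self, seq_len: int) -> torch.Tensor:
        idx = torch.arange(seq_len, dtype=torch.float32)
        dist = (idx[:, None] - idx[None, :]).abs()             # |i - j|
        return torch.exp(-dist * torch.sigmoid(self.gamma_t))  # (L, L)
```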

4.1.3. Trend Enhancement Component

To capture market momentum patterns, this component analyzes price movement characteristics derived from value sequences. Price changes are computed as
$$\Delta V_t = V_t - V_{t-1}$$
The computed price changes serve two critical functions in the trend-aware attention mechanism. First, they are processed through a dedicated feature extractor to generate trend-specific features:
$$\text{Features}_{\text{trend}} = f_{\text{trend}}\big(\text{mean}(\Delta V_t)\big)$$
where $f_{\text{trend}}$ represents a neural network that extracts meaningful trend patterns from the average price changes.
Second, the price changes are utilized to compute a trend-based decay matrix that modulates attention based on the similarity of price movement patterns between different time steps:
$$\text{Decay}_{\text{trend}}(i, j) = \exp\big(-\left|\text{mean}(\Delta V_i) - \text{mean}(\Delta V_j)\right| \cdot \sigma(\gamma_{tr})\big)$$
where $\gamma_{tr}$ is a learnable decay parameter.
The trend-aware attention component is then calculated as
$$A_{\text{trend}} = \frac{(Q W_q^{(tr)})(K W_k^{(tr)})^{\top}}{\sqrt{d_k}} \odot \text{Decay}_{\text{trend}}$$
where $W_q^{(tr)}$ and $W_k^{(tr)}$ are trend-specific projection matrices. The extracted trend features $\text{Features}_{\text{trend}}$ are subsequently used in the sequential gating process to further refine the attention output.

4.1.4. Volatility Enhancement Component

This component adapts attention patterns based on market volatility regimes. Volatility is computed using a sliding window approach:
$$\text{Vol}_i = \text{std}(V_{i-w+1:i})$$
where $w$ represents the window size (set to 5 in our implementation).
The computed volatility values serve two essential functions in the volatility-aware attention mechanism. First, they are processed through a dedicated feature extractor to generate volatility-specific features:
$$\text{Features}_{\text{vol}} = f_{\text{vol}}(\text{Vol}_i)$$
where $f_{\text{vol}}$ represents a neural network that extracts meaningful volatility regime patterns from the computed volatility values.
Second, the volatility information is utilized to compute a volatility-based decay matrix that modulates attention based on the similarity of volatility regimes between different time steps:
$$\text{Decay}_{\text{vol}}(i, j) = \exp\big(-|\text{Vol}_i - \text{Vol}_j| \cdot \sigma(\gamma_v)\big)$$
where $\gamma_v$ is a learnable decay parameter that controls the sensitivity to volatility differences.
The volatility-aware attention component is then calculated as
$$A_{\text{vol}} = \frac{(Q W_q^{(v)})(K W_k^{(v)})^{\top}}{\sqrt{d_k}} \odot \text{Decay}_{\text{vol}}$$
where $W_q^{(v)}$ and $W_k^{(v)}$ are volatility-specific projection matrices. The extracted volatility features $\text{Features}_{\text{vol}}$ are subsequently employed in the sequential gating process to further modulate the attention output based on market volatility characteristics.

4.1.5. Multi-Component Integration and Sequential Gating

The enhanced attention components are integrated with the base attention through learnable weight combinations:
$$A_{\text{combined}} = w_{\text{base}} A_{\text{base}} + w_t A_{\text{time}} + w_{tr} A_{\text{trend}} + w_v A_{\text{vol}}$$
where the weights are derived from learnable parameters: $w_t = \sigma(\alpha_t)$, $w_{tr} = \sigma(\alpha_{tr})$, $w_v = \sigma(\alpha_v)$, and $w_{\text{base}} = 1 - w_t - w_{tr} - w_v$.
The combined attention scores are applied to the value matrix through softmax normalization:
$$\text{Output}_{\text{attn}} = \text{softmax}(A_{\text{combined}})\, V$$
This intermediate output undergoes sequential refinement through multiplicative gating mechanisms:
$$\text{Output}_1 = \text{Output}_{\text{attn}} \odot \text{Gate}_{\text{time}}\big([\text{Output}_{\text{attn}}; \text{Features}_{\text{time}}]\big)$$
$$\text{Output}_2 = \text{Output}_1 \odot \text{Gate}_{\text{trend}}\big([\text{Output}_1; \text{Features}_{\text{trend}}]\big)$$
$$\text{Output}_{\text{final}} = \text{Output}_2 \odot \text{Gate}_{\text{vol}}\big([\text{Output}_2; \text{Features}_{\text{vol}}]\big)$$
where each gate processes concatenated features to generate element-wise modulation weights. This multi-aspect approach enables the model to develop a comprehensive understanding of financial time series patterns, incorporating temporal proximity, trend dynamics, and volatility regimes into a unified attention framework. The proposed EMAT architecture, through its integration of specialized attention mechanisms and enhanced loss functions, provides a robust foundation for accurate financial time series prediction.
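To make the integration step concrete, the following PyTorch sketch combines precomputed base, temporal, trend, and volatility attention maps and applies the three sequential gates. The upstream attention maps, the aspect feature tensors, and the single-layer sigmoid gates are illustrative assumptions, since the gates’ internal architecture is not prescribed above.

```python
import torch
import torch.nn as nn

class AspectCombiner(nn.Module):
    """Combine base/time/trend/vol attention maps, then gate sequentially.

    Structural sketch: attention maps and aspect features are assumed
    to be computed upstream by the four components.
    """
    def __init__(self, d_model: int, d_feat: int):
        super().__init__()
        # Learnable mixing logits: w_t = sigmoid(alpha_t), etc.
        self.alpha = nn.Parameter(torch.zeros(3))  # (time, trend, vol)
        self.gates = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model + d_feat, d_model), nn.Sigmoid())
            for _ in range(3)  # Gate_time, Gate_trend, Gate_vol
        ])

    def forward(self, A_base, A_time, A_trend, A_vol, V, feats):
        # feats: [Features_time, Features_trend, Features_vol],
        # each of shape (batch, L, d_feat)
        w = torch.sigmoid(self.alpha)          # w_t, w_tr, w_v
        w_base = 1.0 - w.sum()                 # w_base = 1 - w_t - w_tr - w_v
        A = (w_base * A_base + w[0] * A_time
             + w[1] * A_trend + w[2] * A_vol)
        out = torch.softmax(A, dim=-1) @ V     # (batch, L, d_model)
        # Sequential refinement: time gate, then trend gate, then vol gate
        for gate, f in zip(self.gates, feats):
            out = out * gate(torch.cat([out, f], dim=-1))
        return out
```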

4.2. Enhanced Loss Function

Financial time series prediction requires a comprehensive loss function that captures multiple aspects of prediction accuracy. Traditional loss functions often focus solely on point-wise accuracy, which may not adequately reflect the complex nature of financial forecasting. To address this limitation, we propose a multi-objective loss function that simultaneously considers point-wise accuracy and volatility prediction. This enhanced loss function is designed to better align with the practical needs of financial forecasting and improve the model’s ability to capture various aspects of market behavior.
The total loss function is formulated as a weighted combination of three components:
$$\mathcal{L}_{\text{total}} = \lambda_1 \mathcal{L}_{\text{MSE}} + \lambda_2 \mathcal{L}_{\text{MAE}} + \lambda_3 \mathcal{L}_{\text{volatility}}$$
where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are hyperparameters that determine the relative importance of each component. These weights are set based on the characteristics of the target financial market.
The first component, mean squared error (MSE), measures the average squared difference between predicted and actual values:
$$\mathcal{L}_{\text{MSE}} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
This component penalizes larger errors more heavily, making it particularly sensitive to outliers and extreme market movements. The MSE loss helps the model focus on reducing significant prediction errors, which is crucial for financial applications where large errors can have substantial implications.
The second component, mean absolute error (MAE), provides a more robust measure of prediction accuracy:
$$\mathcal{L}_{\text{MAE}} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
Unlike MSE, MAE is less sensitive to outliers and provides a more balanced view of the model’s performance. This component helps ensure that the model maintains reasonable prediction accuracy across all data points, not just those with extreme values.
The third component, volatility loss, measures the accuracy of volatility predictions:
$$\mathcal{L}_{\text{volatility}} = \frac{1}{m} \sum_{j=1}^{m} \big(\sigma_j^{\text{pred}} - \sigma_j^{\text{true}}\big)^2$$
where $\sigma_j^{\text{pred}}$ and $\sigma_j^{\text{true}}$ represent the predicted and actual volatility computed using sliding windows, respectively. The volatility is calculated as the standard deviation within a sliding window of size $w$:
$$\sigma_j = \text{std}\big(\text{unfold}(x, \text{size}{=}w, \text{step}{=}1)_j\big)$$
where $w$ denotes the window size, the unfold operation creates overlapping windows of size $w$ with unit step, and the standard deviation is computed for each window. In our implementation, we set $w = 5$ to capture short-term market fluctuations. This component enables the model to better capture market volatility patterns and adapt to different market regimes.
The implementation of this enhanced loss function involves several key steps. First, the input predictions and targets are flattened to ensure consistent dimensionality. The MSE and MAE components are computed using standard loss computation methods. For volatility calculation, we apply the unfold operation to create sliding windows of size w, then compute the standard deviation within each window for both predictions and targets. The volatility loss is then computed as the mean squared error between predicted and actual volatility sequences.
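A compact PyTorch sketch of these steps follows; the `unfold`-based volatility matches the equations above, while the $\lambda$ weights shown are placeholders, since the paper sets them per market.

```python
import torch
import torch.nn.functional as F

def emat_loss(pred, target, lambdas=(1.0, 1.0, 0.5), w=5):
    """Multi-objective loss: MSE + MAE + sliding-window volatility loss.

    Sketch of Section 4.2; the lambda values here are illustrative
    placeholders, not the per-market settings used in the paper.
    """
    pred, target = pred.flatten(), target.flatten()
    l_mse = F.mse_loss(pred, target)
    l_mae = F.l1_loss(pred, target)
    # Overlapping windows of size w with unit step, std per window
    vol_pred = pred.unfold(0, w, 1).std(dim=-1)
    vol_true = target.unfold(0, w, 1).std(dim=-1)
    l_vol = F.mse_loss(vol_pred, vol_true)
    return lambdas[0] * l_mse + lambdas[1] * l_mae + lambdas[2] * l_vol
```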
This enhanced loss function offers several advantages over traditional approaches. It balances point-wise accuracy with volatility prediction, considers both magnitude sensitivity through MSE and robustness through MAE, incorporates market volatility patterns through sliding window analysis, and provides a comprehensive evaluation metric that aligns with practical financial forecasting needs. By simultaneously optimizing multiple objectives, the model can better capture the complex nature of financial time series and provide more reliable predictions for practical applications.

4.3. Model Architecture

The EMAT model employs an encoder–decoder architecture tailored for financial time series forecasting. This structure enables the model to capture complex temporal dependencies by separating the roles of sequence representation and prediction generation. The model begins by processing the input sequence of tokens, where each token represents the feature vector of a single time step, as established in Section 3. An input embedding layer first transforms each token from its original feature dimension into a higher-dimensional vector representation. The encoder is composed of multiple layers, each containing a Multi-Aspect Attention Mechanism and an enhanced feed-forward network based on the SwiGLU activation. Specifically, each encoder layer adopts a pre-normalization residual formulation:
$$x' = x + \text{Dropout}\big(\text{MAAM}(\text{LayerNorm}(x))\big)$$
$$\text{EncoderLayer}(x) = x' + \text{Dropout}\big(\text{SwiGLU}(\text{LayerNorm}(x'))\big)$$
where $x'$ is the output of the first residual connection, MAAM denotes our Multi-Aspect Attention Mechanism, and SwiGLU is the enhanced feed-forward component. To preserve temporal information, positional encodings are incorporated as follows:
$$x_{\text{pos}} = x \sqrt{d_{\text{model}}} + \text{PE}$$
where $\text{PE}$ denotes sinusoidal positional embeddings, and $d_{\text{model}}$ is the embedding dimension.
The decoder utilizes both self-attention and cross-attention to integrate target-side and source-side information. Each decoder layer consists of three major components:
$$y_1 = x + \text{Dropout}\big(\text{SelfAttention}(\text{LayerNorm}(x))\big)$$
$$y_2 = y_1 + \text{Dropout}\big(\text{CrossAttention}(\text{LayerNorm}(y_1), \text{enc\_output})\big)$$
$$y_{\text{out}} = y_2 + \text{Dropout}\big(\text{SwiGLU}(\text{LayerNorm}(y_2))\big)$$
where $\text{enc\_output}$ represents the encoder’s output. Causal masking is applied in the decoder’s self-attention to maintain temporal integrity during training.
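As a structural illustration of this design, the PyTorch sketch below implements the pre-normalization residual encoder layer with a SwiGLU feed-forward block. The generic `attn` argument stands in for the Multi-Aspect Attention Mechanism, and the bias-free linear layers follow common SwiGLU practice rather than a detail stated above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: (SiLU(x W1) * (x W2)) W3."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff, bias=False)
        self.w2 = nn.Linear(d_model, d_ff, bias=False)
        self.w3 = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x):
        return self.w3(F.silu(self.w1(x)) * self.w2(x))

class EncoderLayer(nn.Module):
    """Pre-norm residual encoder layer as in Section 4.3.

    `attn` stands in for MAAM (sketched separately above); any module
    mapping (batch, L, d_model) -> (batch, L, d_model) works here.
    """
    def __init__(self, d_model: int, d_ff: int, attn: nn.Module, p: float = 0.1):
        super().__init__()
        self.attn, self.ffn = attn, SwiGLU(d_model, d_ff)
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.drop = nn.Dropout(p)

    def forward(self, x):
        x = x + self.drop(self.attn(self.norm1(x)))   # first residual
        return x + self.drop(self.ffn(self.norm2(x))) # second residual
```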

5. Experiments

To comprehensively evaluate the performance of the proposed EMAT model, we conduct extensive experiments following a systematic evaluation framework. Our experimental design encompasses several key components: dataset curation and preprocessing, evaluation metrics definition, baseline model comparison, detailed experimental setup, comprehensive performance analysis across diverse markets, ablation studies to assess individual component contributions, and parameter sensitivity analysis.
The evaluation framework follows a structured approach. First, we establish the experimental foundation through careful dataset selection and feature preprocessing, followed by the definition of appropriate evaluation metrics for financial time series prediction. We then benchmark EMAT against state-of-the-art baseline models using standardized experimental protocols. Performance analysis is conducted across two major market categories, namely Chinese and global market indices, ensuring comprehensive validation across diverse market conditions and geographic regions. Subsequently, ablation studies quantify the individual contributions of key architectural components, while parameter sensitivity analysis provides insights into model robustness and optimal configuration strategies.
This systematic approach ensures thorough validation of EMAT’s effectiveness while providing detailed insights into its behavior across different market conditions, time periods, and parameter configurations. The comprehensive evaluation demonstrates both the model’s superior predictive performance and its practical applicability for real-world financial forecasting scenarios.

5.1. Dataset and Preprocessing

For rigorous validation of the EMAT model, we curated a diverse collection of financial time series datasets spanning different geographic markets and market characteristics. The dataset selection strategy validates the model’s effectiveness across various market conditions, volatility regimes, and temporal patterns.
  • Chinese Market Indices: We selected three major Chinese stock market indices representing different market segments: the Shanghai Stock Exchange Composite Index (SSE Composite, 000001.SS), the Shenzhen Stock Exchange Component Index (SZSE Component, 399001.SZ), and the China Securities Index 300 (CSI 300, 000300.SS). These indices provide comprehensive coverage of the Chinese equity market and represent different market capitalizations and sector compositions.
  • Global Market Indices: To evaluate the model’s generalization capabilities across international markets, we include three major global indices: the Dow Jones Industrial Average (DJI), the S&P 500 Index, and the CAC 40 Index. These indices represent mature developed markets with distinct characteristics, allowing us to assess the model’s robustness across different economic environments and market structures.
Table 1 provides an overview of the selected datasets, including their categorization, trading symbols, and key market characteristics. Figure 3 illustrates the historical price trajectories for all selected indices, demonstrating the diverse volatility patterns and trend behaviors across Chinese and global financial markets.
To ensure effective model training and consistent performance across different market conditions, we apply a rolling Min–Max normalization technique specifically designed for financial time series. Unlike traditional global normalization approaches, this method employs a sliding window mechanism that adapts to local market conditions:
$$x_i^* = \frac{x_i - \min(x_{i-w:i-1})}{\max(x_{i-w:i-1}) - \min(x_{i-w:i-1})}$$
where $x_i$ represents the original price at time $i$, $w$ denotes the rolling window size, and $x_i^*$ is the normalized value. This rolling normalization approach offers several advantages: (1) it adapts to changing market regimes and volatility levels; (2) it preserves local temporal relationships within the data; (3) it reduces the impact of long-term trends on short-term pattern recognition. The normalization window size is empirically set to ensure optimal balance between local adaptivity and statistical stability. During evaluation, predictions are inverse-transformed to their original price scale using the corresponding rolling statistics for accurate performance assessment.
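A minimal NumPy sketch of this rolling normalization and the corresponding inverse transform is given below; the helper names and the small epsilon guarding flat windows are our own additions.

```python
import numpy as np

def rolling_minmax(x: np.ndarray, w: int):
    """Rolling Min-Max normalization using only the trailing window
    x_{i-w:i-1}, so no future information leaks into x_i*.

    Returns normalized values plus the per-step (min, max) statistics
    needed to inverse-transform predictions back to price scale.
    """
    norm = np.full(len(x), np.nan)
    stats = np.zeros((len(x), 2))
    for i in range(w, len(x)):
        lo, hi = x[i - w:i].min(), x[i - w:i].max()
        stats[i] = (lo, hi)
        norm[i] = (x[i] - lo) / (hi - lo + 1e-8)  # eps guards flat windows
    return norm, stats

def inverse_transform(norm_val, lo, hi):
    """Map a normalized prediction back to the original price scale."""
    return norm_val * (hi - lo + 1e-8) + lo
```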

5.2. Evaluation Metrics

To provide comprehensive and objective assessment of model performance, we employ four widely adopted evaluation metrics that capture different aspects of prediction accuracy in financial time series forecasting. These metrics collectively enable thorough comparison between EMAT and baseline methods while addressing the specific requirements of financial prediction tasks.
Given true values $y_i$ and predicted values $\hat{y}_i$ for $n$ test samples, the evaluation metrics are defined as follows:
  • Mean absolute error (MAE) measures the average magnitude of prediction errors:
    $$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
  • Root mean square error (RMSE) emphasizes larger errors through quadratic weighting:
    $$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
  • Mean absolute percentage error (MAPE) provides scale-independent relative error measurement:
    $$\text{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$
  • Coefficient of determination ($R^2$) quantifies the proportion of variance explained by the model:
    $$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
    where $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$ represents the mean of true values.
These complementary metrics address distinct evaluation perspectives essential for financial forecasting. MAE provides robust assessment of average prediction accuracy with reduced sensitivity to extreme errors, making it particularly suitable for evaluating consistent model performance. RMSE emphasizes larger deviations through quadratic weighting, thereby capturing the model’s capacity to avoid significant mispredictions that could have substantial financial implications. MAPE enables scale-independent comparison across different price levels and market conditions through relative error measurement. Finally, $R^2$ quantifies the model’s explanatory power and goodness-of-fit to underlying data patterns.
For MAE, RMSE, and MAPE, lower values indicate superior predictive performance, with optimal values approaching zero. For $R^2$, values approaching 1.0 signify better model fit, with $R^2 = 1.0$ representing perfect prediction. This multi-metric evaluation framework ensures comprehensive assessment of model capabilities across different error characteristics and provides robust validation of the proposed EMAT architecture.
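For reference, the four metrics can be computed directly, as in the following NumPy sketch.

```python
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Compute the four evaluation metrics of Section 5.2."""
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = 100.0 * np.mean(np.abs(err / y_true))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}
```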
We benchmark EMAT against the following baseline models:
  • LSTM [55]: LSTM networks are specialized RNNs designed to capture long-range dependencies in sequential data. They employ gating mechanisms (input, output, and forget gates) to regulate information flow, thereby mitigating vanishing gradient problems. While effective for time series modeling, their sequential processing nature can be computationally intensive and may suffer from information decay over extremely long sequences.
  • BiLSTM [56]: Bidirectional LSTMs enhance standard LSTMs by processing input sequences in both forward and backward directions, enabling the capture of contextual information from past and future states. This bidirectional approach improves prediction accuracy in financial forecasting, though often at increased computational cost.
  • GRU [57]: Gated Recurrent Units simplify LSTM architecture by combining forget and input gates into a single update gate and merging cell and hidden states. This design yields computationally efficient models with fewer parameters, making GRUs attractive alternatives for sequence modeling tasks.
  • CNN-LSTM [58]: This hybrid architecture combines CNNs with LSTMs, where CNN layers extract local patterns through one-dimensional convolutions, and LSTM layers model temporal dynamics of extracted features. This combination leverages both local feature extraction capabilities and sequential modeling strengths.
  • CNN-BiLSTM [59]: Building upon CNN-LSTM, this architecture replaces unidirectional LSTMs with BiLSTMs, allowing temporal modeling components to leverage bidirectional context from CNN-extracted features for enhanced pattern understanding.
  • CNN-BiLSTM-AM [59]: This model incorporates attention mechanisms into the CNN-BiLSTM framework. The attention layer dynamically assigns weights to hidden states across different time steps, enabling focus on influential historical patterns and improving prediction accuracy.
  • Transformer [60]: This architecture relies entirely on self-attention mechanisms, processing input points in parallel to model global dependencies regardless of sequential distance. While offering computational advantages and superior long-range interaction modeling, standard Transformers are domain-agnostic and may not be optimized for financial time series characteristics such as volatility and trend dynamics.

5.3. Experimental Setup

To comprehensively evaluate the effectiveness of EMAT, we conduct comparative experiments following a systematic protocol. All experiments were conducted on a standardized computing platform using the PyTorch 2.5 deep learning framework. To ensure reproducibility, we fixed random seeds across all experiments and maintained consistent computational environments.
For each dataset, we follow a systematic training protocol with models trained on designated training sets, the time ranges of which are detailed in Table 2. The optimal hyperparameters for EMAT were determined through multiple training sessions and empirical evaluation, as summarized in Table 3. All baseline models are configured with their respective optimal hyperparameters to ensure fair comparison.
Model performance is assessed on the designated test sets using the four evaluation metrics defined in Section 5.2. To ensure statistical reliability, we conduct multiple training runs with different random initializations and report average performance. The reported results represent performance on completely unseen test data, maintaining strict separation between training and testing phases to prevent data leakage. All models are trained for a fixed number of epochs to ensure a fair comparison.

5.4. Performance Comparison and Analysis

To validate the effectiveness and robustness of the EMAT model, we conducted a detailed performance comparison against baseline models across all datasets. The results are presented and analyzed by category in the following subsections.

5.4.1. Results on Chinese Market Indices

The experimental results for the three major Chinese market indices are presented in Table 4 and Table 5. EMAT demonstrates consistent superior performance across all evaluation metrics compared to competitive baselines, including recurrent architectures (LSTM, BiLSTM, GRU), hybrid models (CNN-LSTM, CNN-BiLSTM, CNN-BiLSTM-AM), and the Transformer.
On the SSE Composite Index, EMAT achieves a mean absolute error of 24.2440 and a root mean square error of 34.9370. These results represent improvements over the Transformer model, which attains 24.6510 in MAE and 35.5720 in RMSE, corresponding to relative error reductions of approximately 1.65% and 1.78%, respectively.
For the SZSE Component Index, EMAT yields the lowest errors among all methods, with an MAE of 111.3750 and an RMSE of 157.0250. On the CSI 300 Index, the model achieves an MAE of 33.7990 and an RMSE of 48.3970, demonstrating its ability to generalize across multiple markets. Furthermore, EMAT records a mean absolute percentage error of 0.8722% and an $R^2$ value of 0.9804, indicating excellent predictive accuracy.
These results highlight the model’s effectiveness in capturing temporal dependencies in financial time series. The high $R^2$ values across all indices—such as the 0.9591 observed for the SSE Composite—underscore the model’s strong explanatory power and its capability to reconstruct actual market movements with high fidelity.
Figure 4 provides a comparative visualization of performance across the three indices, while Figure 5 shows the close alignment between predicted and actual closing prices. Together, these results demonstrate the robustness and accuracy of EMAT in real-world stock market forecasting tasks.

5.4.2. Results on Global Market Indices

To evaluate the generalizability of the proposed EMAT model beyond the Chinese stock market, we conduct experiments on three representative global stock indices: DJIA, the S&P 500, and the CAC 40. The quantitative results, reported in Table 6 and Table 7, confirm EMAT’s superior performance across diverse market conditions and geographic regions.
EMAT demonstrates consistent superior performance across all global market indices compared to baseline methods. On the DJIA index, EMAT achieves a mean absolute error of 240.3570 and a root mean square error of 322.8700, outperforming the best baseline, Transformer, which yields 246.1640 in MAE and 329.0910 in RMSE. For the S&P 500, EMAT attains the lowest errors, with an MAE of 36.8150 and an RMSE of 48.7510. On the CAC 40 index, EMAT also delivers superior results, achieving an MAE of 55.3640 and an RMSE of 74.4850. These results indicate that EMAT offers improved predictive accuracy on a global scale.
In terms of relative percentage errors, EMAT achieves the lowest MAPE values across all three indices. For example, the MAPE on the S&P 500 is 0.8303%. In addition, the model produces the highest $R^2$ scores among all evaluated methods. Specifically, the $R^2$ value reaches 0.9923 for the DJIA, 0.9946 for the S&P 500, and 0.9840 for the CAC 40. These results indicate strong explanatory power and minimal variance in prediction errors.
Figure 6 provides a comparative visualization of the average performance across all evaluation metrics, and Figure 7 illustrates the close alignment between EMAT’s predicted values and the actual index prices. These results confirm EMAT’s robustness across diverse market structures and trading environments. The consistent performance improvements across both Chinese and global markets validate the generalizability of the proposed Multi-Aspect Attention Mechanism for financial time series forecasting.

5.5. Ablation Study

To validate the effectiveness and generalizability of the key components within our proposed model, we conducted a series of ablation experiments. The study was performed on two representative indices: the SSE Composite Index from the Chinese market and the S&P 500 from the global market. We systematically tested the impact of our core architectural choices by comparing the full EMAT model with several variants where individual components were removed. The results of this analysis are reported in Table 8.
The experiments clearly demonstrate the contribution of each specialized feature in our enhanced attention mechanism. The variant EMAT w/o Time, which lacks the time-aware component, shows a noticeable degradation in performance across all metrics for both indices. A similar decline is observed for the EMAT w/o Trend and EMAT w/o Volatility variants. For the SSE Composite Index, removing any single component increases the MAE, MAPE, and RMSE values while decreasing the $R^2$ score. This confirms that each architectural component provides a unique and valuable contribution to the prediction task.
The consistency of these findings across both the SSE Composite and S&P 500 datasets is particularly significant. It strongly indicates that each component of our model contributes effectively and synergistically to the final predictive power. Furthermore, the benefits of our Multi-Aspect Attention Mechanism are generalizable across different market structures. The full EMAT model consistently outperforms all ablated versions, confirming that the integration of time, trend, and volatility awareness is essential to achieving state-of-the-art performance.

5.6. Parameter Sensitivity Study

To assess the sensitivity of the EMAT model to its hyperparameters, we conducted an analysis on a key parameter: the input sequence length. The length of the historical data fed into the model directly impacts the final forecast results. The sequence must be long enough to capture relevant patterns but short enough to avoid introducing irrelevant historical noise. To analyze the effect of different lag lengths, we performed experiments on the SSE Composite and the S&P 500 indices, with the results presented in Table 9.
As shown in the table, a lag length of 5 days, which was used in our main experiments, yields the best performance for both markets. When the lag length is increased to 7 days and subsequently to 10 days, there is a consistent degradation in performance across all evaluation metrics for both indices. For example, the MAE for the S&P 500 increases from 36.8150 at a lag of 5 days to 37.7600 at a lag of 10 days. This trend suggests that for these specific markets, a shorter lookback period is more effective at capturing predictive signals. The optimal selection of the lag length is crucial for improving model performance, and these results validate our choice of a 5-day input sequence for the primary model configuration.

6. Conclusions

In this paper, we proposed EMAT, a novel deep learning architecture designed to address the significant challenges of stock market prediction. Our work confronts the limitations of standard models that are often domain-agnostic and fail to capture the unique, multifaceted characteristics of financial time series. The EMAT model introduces a specialized Multi-Aspect Attention Mechanism that simultaneously integrates temporal, trend, and volatility information, complemented by a multi-objective loss function to enhance predictive stability.
To empirically validate the effectiveness of the proposed EMAT architecture and its constituent mechanisms, we conducted extensive experiments on multiple stock market datasets. The EMAT model consistently outperformed a wide range of state-of-the-art baseline methods, including various recurrent, hybrid, and Transformer architectures, demonstrating significant improvements in key evaluation metrics. Furthermore, our ablation studies confirmed the critical contribution of each component within the enhanced attention framework. The removal of any single aspect, be it temporal, trend, or volatility awareness, resulted in a quantifiable degradation of performance, proving the synergistic effectiveness of our design.
Collectively, these findings underscore the practical value and theoretical significance of our approach: tailoring the Transformer architecture with domain-specific mechanisms yields a new level of performance in stock price forecasting, and the EMAT model provides a more robust and accurate tool for financial analysis. Future work could extend this architecture to other financial instruments, such as commodities or cryptocurrencies, and integrate additional market factors into the multi-aspect framework. A particularly promising direction is to incorporate the multiscaling characteristics of stock market time series into the Multi-Aspect Attention Mechanism to further enhance prediction accuracy.

Author Contributions

Conceptualization, Y.C.; Methodology, Y.C.; Software, Y.C. and X.C.; Validation, Y.C.; Formal analysis, Y.C.; Investigation, Y.C.; Resources, Y.C.; Data curation, Y.C.; Writing—original draft, Y.C.; Writing—review & editing, Y.C., W.S., H.L. and X.C.; Visualization, Y.C.; Supervision, Y.C.; Project administration, Y.C.; Funding acquisition, Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the AI-Driven Reform of Scientific Research Paradigms and Discipline Leapfrogging Initiative.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to ethical and privacy restrictions. For detailed information, please contact the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Schwartz, R.A. Efficient capital markets: A review of theory and empirical work: Discussion. J. Financ. 1970, 25, 421–423. [Google Scholar] [CrossRef]
  2. Zou, J.; Zhao, Q.; Jiao, Y.; Cao, H.; Liu, Y.; Yan, Q.; Abbasnejad, E.; Liu, L.; Shi, J.Q. Stock market prediction via deep learning techniques: A survey. arXiv 2022, arXiv:2212.12717. [Google Scholar]
  3. Box, G.E.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control; John Wiley & Sons: Hoboken, NJ, USA, 2015. [Google Scholar]
  4. Bao, W.; Cao, Y.; Yang, Y.; Che, H.; Huang, J.; Wen, S. Data-driven stock forecasting models based on neural networks: A review. Inf. Fusion 2024, 113, 102616. [Google Scholar] [CrossRef]
  5. Patel, J.; Shah, S.; Thakkar, P.; Kotecha, K. Predicting stock market index using fusion of machine learning techniques. Expert Syst. Appl. 2015, 42, 2162–2172. [Google Scholar] [CrossRef]
  6. Mintarya, L.N.; Halim, J.N.; Angie, C.; Achmad, S.; Kurniawan, A. Machine learning approaches in stock market prediction: A systematic literature review. Procedia Comput. Sci. 2023, 216, 96–102. [Google Scholar] [CrossRef]
  7. Zhang, C.; Sjarif, N.N.A.; Ibrahim, R. Deep learning models for price forecasting of financial time series: A review of recent advancements: 2020–2022. Wiley Interdiscip. Rev. Data Min. Knowl. Discovery 2024, 14, e1519. [Google Scholar] [CrossRef]
  8. Tsantekidis, A.; Passalis, N.; Tefas, A.; Kanniainen, J.; Gabbouj, M.; Iosifidis, A. Forecasting stock prices from the limit order book using convolutional neural networks. In Proceedings of the 2017 IEEE 19th Conference on Business Informatics (CBI), Thessaloniki, Greece, 24–27 July 2017; Volume 1, pp. 7–12. [Google Scholar]
  9. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  10. Gao, S.; Wang, Y.; Yang, X. StockFormer: Learning Hybrid Trading Machines with Predictive Coding. In Proceedings of the IJCAI, Macao, China, 19–25 August 2023; pp. 4766–4774. [Google Scholar]
  11. Zeng, Z.; Kaur, R.; Siddagangappa, S.; Rahimi, S.; Balch, T.; Veloso, M. Financial time series forecasting using CNN and transformer. arXiv 2023, arXiv:2304.04912. [Google Scholar] [CrossRef]
  12. Mozaffari, L.; Zhang, J. Predictive Modeling of Stock Prices Using Transformer Model. In Proceedings of the 2024 9th International Conference on Machine Learning Technologies, Oslo, Norway, 24–26 May 2024; pp. 41–48. [Google Scholar]
  13. Emami Gohari, H.; Dang, X.H.; Shah, S.Y.; Zerfos, P. Modality-aware Transformer for Financial Time series Forecasting. In Proceedings of the 5th ACM International Conference on AI in Finance, Brooklyn, NY, USA, 14–17 November 2024; pp. 677–685. [Google Scholar]
  14. Vishwakarma, V.K.; Bhosale, N.P. A survey of recent machine learning techniques for stock prediction methodologies. Neural Comput. Appl. 2025, 37, 1951–1972. [Google Scholar] [CrossRef]
  15. Tao, Z.; Wu, W.; Wang, J. Series decomposition Transformer with period-correlation for stock market index prediction. Expert Syst. Appl. 2024, 237, 121424. [Google Scholar] [CrossRef]
  16. Kabir, M.R.; Bhadra, D.; Ridoy, M.; Milanova, M. LSTM–Transformer-Based Robust Hybrid Deep Learning Model for Financial Time Series Forecasting. Sci 2025, 7, 7. [Google Scholar] [CrossRef]
  17. Lezmi, E.; Xu, J. Time Series Forecasting with Transformer Models and Application to Asset Management. SSRN Working Paper 4375798, 2023. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4375798 (accessed on 29 September 2025).
  18. Yang, S. Research on Stock Price Prediction Based on Machine Learning. In Proceedings of the 2024 International Conference on Artificial Intelligence and Communication (ICAIC 2024), Davos, Switzerland, 10–11 July 2024; Atlantis Press: Dordrecht, The Netherlands, 2024; pp. 693–698. [Google Scholar]
  19. Li, M. Unraveling Financial Markets: Deep Neural Network-Based Models for Stock Price Prediction. Adv. Econ. Manag. Political Sci. 2024, 82, 195–203. [Google Scholar] [CrossRef]
  20. Tang, H. Stock prices prediction based on ARMA model. In Proceedings of the 2021 International Conference on Computer, Blockchain and Financial Development (CBFD), Nanjing, China, 23–25 April 2021; pp. i–iv. [Google Scholar]
  21. Ariyo, A.A.; Adewumi, A.O.; Ayo, C.K. Stock price prediction using the ARIMA model. In Proceedings of the 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation, Cambridge, UK, 26–28 March 2014; pp. 106–112. [Google Scholar]
  22. Yıldırım, H.; Bekun, F.V. Predicting volatility of bitcoin returns with ARCH, GARCH and EGARCH models. Future Bus. J. 2023, 9, 75. [Google Scholar] [CrossRef]
  23. Arowolo, W. Predicting stock prices returns using GARCH model. Int. J. Eng. Sci. 2013, 2, 32–37. [Google Scholar]
  24. Malkiel, B.G. The efficient market hypothesis and its critics. J. Econ. Perspect. 2003, 17, 59–82. [Google Scholar] [CrossRef]
  25. Lo, A.W. The Adaptive Markets Hypothesis: Market Efficiency from an Evolutionary Perspective. J. Portf. Manag. 2004. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=602222 (accessed on 29 September 2025).
  26. Alkhatib, K.; Najadat, H.; Hmeidi, I.; Shatnawi, M.K.A. Stock price prediction using k-nearest neighbor (kNN) algorithm. Int. J. Bus. Humanit. Technol. 2013, 3, 32–44. [Google Scholar]
  27. Wang, Y.; Xie, Y.; Wu, Y.; Yang, Y. Improved KNN-based Stock Price Prediction. Acad. J. Comput. Inf. Sci. 2024, 7, 38–43. [Google Scholar] [CrossRef]
  28. Chang, T.S. A comparative study of artificial neural networks, and decision trees for digital game content stocks price prediction. Expert Syst. Appl. 2011, 38, 14846–14851. [Google Scholar] [CrossRef]
  29. Kamble, R.A. Short and long term stock trend prediction using decision tree. In Proceedings of the 2017 International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 15–16 June 2017; pp. 1371–1375. [Google Scholar]
  30. Fenghua, W.; Jihong, X.; Zhifang, H.; Xu, G. Stock price prediction based on SSA and SVM. Procedia Comput. Sci. 2014, 31, 625–631. [Google Scholar] [CrossRef]
  31. Kumari, N.; Kumar, R. ANN and SVM Machine Learning Algorithms in Stock Market Price Prediction. In Deep Learning Innovations for Securing Critical Infrastructures; IGI Global Scientific Publishing: Hershey, PA, USA, 2025; pp. 131–142. [Google Scholar]
  32. Meher, B.K.; Singh, M.; Birau, R.; Anand, A. Forecasting stock prices of fintech companies of India using random forest with high-frequency data. J. Open Innov. Technol. Mark. Complex. 2024, 10, 100180. [Google Scholar] [CrossRef]
  33. Toochaei, M.R.; Moeini, F. Evaluating the performance of ensemble classifiers in stock returns prediction using effective features. Expert Syst. Appl. 2023, 213, 119186. [Google Scholar] [CrossRef]
  34. Cao, L. Ai in finance: Challenges, techniques, and opportunities. ACM Comput. Surv. (CSUR) 2022, 55, 1–38. [Google Scholar]
  35. Liu, Q.; Tao, Z.; Tse, Y.; Wang, C. Stock market prediction with deep learning: The case of China. Financ. Res. Lett. 2022, 46, 102209. [Google Scholar] [CrossRef]
  36. Hoseinzade, E.; Haratizadeh, S. CNNPred: CNN-based stock market prediction using several data sources. arXiv 2018, arXiv:1810.08923. [Google Scholar] [CrossRef]
  37. Roondiwala, M.; Patel, H.; Varma, S. Predicting stock prices using LSTM. Int. J. Sci. Res. (IJSR) 2017, 6, 1754–1756. [Google Scholar] [CrossRef]
  38. Selvin, S.; Vinayakumar, R.; Gopalakrishnan, E.; Menon, V.K.; Soman, K. Stock price prediction using LSTM, RNN and CNN-sliding window model. In Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Manipal, Karnataka, India, 13–16 September 2017; pp. 1643–1647. [Google Scholar]
  39. Georgopoulos, S.P.; Tziatzios, P.; Stavrinides, S.G.; Antoniades, I.P.; Hanias, M.P. Reservoir computing vs. neural networks in financial forecasting. Int. J. Comput. Econ. Econom. 2023, 13, 1–22. [Google Scholar] [CrossRef]
  40. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Advances in Neural Information Processing Systems 30 (NIPS 2017). Available online: https://papers.nips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html (accessed on 29 September 2025).
  41. Ding, Q.; Wu, S.; Sun, H.; Guo, J.; Guo, J. Hierarchical Multi-Scale Gaussian Transformer for Stock Movement Prediction. In Proceedings of the IJCAI, Yokohama, Japan, 11–17 July 2020; pp. 4640–4646. [Google Scholar]
  42. Khan, W.; Ghazanfar, M.A.; Azam, M.A.; Karami, A.; Alyoubi, K.H.; Alfakeeh, A.S. Stock market prediction using machine learning classifiers and social media, news. J. Ambient. Intell. Humaniz. Comput. 2022, 13, 3433–3456. [Google Scholar] [CrossRef]
  43. Zhang, Q.; Qin, C.; Zhang, Y.; Bao, F.; Zhang, C.; Liu, P. Transformer-based attention network for stock movement prediction. Expert Syst. Appl. 2022, 202, 117239. [Google Scholar] [CrossRef]
  44. Cont, R. Empirical properties of asset returns: Stylized facts and statistical issues. Quant. Financ. 2001, 1, 223. [Google Scholar] [CrossRef]
  45. Hasbrouck, J. Empirical Market Microstructure: The Institutions, Economics, and Econometrics of Securities Trading; Oxford University Press: Oxford, UK, 2007. [Google Scholar]
  46. Hamilton, J.D. A new approach to the economic analysis of nonstationary time series and the business cycle. Econom. J. Econom. Soc. 1989, 57, 357–384. [Google Scholar] [CrossRef]
  47. Antoniades, I.P.; Brandi, G.; Magafas, L.; Di Matteo, T. The use of scaling properties to detect relevant changes in financial time series: A new visual warning tool. Phys. A Stat. Mech. Its Appl. 2021, 565, 125561. [Google Scholar] [CrossRef]
  48. Di Matteo, T. Multi-scaling in finance. Quant. Financ. 2007, 7, 21–36. [Google Scholar] [CrossRef]
  49. Di Matteo, T.; Aste, T.; Dacorogna, M.M. Long-term memories of developed and emerging markets: Using the scaling analysis to characterize their stage of development. J. Bank. Financ. 2005, 29, 827–851. [Google Scholar] [CrossRef]
  50. Di Matteo, T.; Aste, T.; Dacorogna, M.M. Scaling behaviors in differently developed markets. Phys. A Stat. Mech. Its Appl. 2003, 324, 183–188. [Google Scholar] [CrossRef]
  51. Mandelbrot, B. The variation of certain speculative prices. J. Bus. 1963, 36, 394. [Google Scholar] [CrossRef]
  52. Calvet, L.; Fisher, A. Multifractality in asset returns: Theory and evidence. Rev. Econ. Stat. 2002, 84, 381–406. [Google Scholar] [CrossRef]
  53. Bouchaud, J.P.; Potters, M.; Meyer, M. Apparent multifractality in financial time series. Eur. Phys. J. B-Condens. Matter Complex Syst. 2000, 13, 595–599. [Google Scholar] [CrossRef]
  54. Liu, R.; Di Matteo, T.; Lux, T. Multifractality and long-range dependence of asset returns: The scaling behavior of the Markov-switching multifractal model with lognormal volatility components. Adv. Complex Syst. 2008, 11, 669–684. [Google Scholar] [CrossRef]
  55. Ding, G.; Qin, L. Study on the prediction of stock price based on the associated network model of LSTM. Int. J. Mach. Learn. Cybern. 2020, 11, 1307–1317. [Google Scholar] [CrossRef]
  56. Siami-Namini, S.; Tavakoli, N.; Namin, A.S. The performance of LSTM and BiLSTM in forecasting time series. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 3285–3292. [Google Scholar]
  57. Minh, D.L.; Sadeghi-Niaraki, A.; Huy, H.D.; Min, K.; Moon, H. Deep learning approach for short-term stock trends prediction based on two-stream gated recurrent unit network. IEEE Access 2018, 6, 55392–55404. [Google Scholar] [CrossRef]
  58. Lu, W.; Li, J.; Li, Y.; Sun, A.; Wang, J. A CNN-LSTM-based model to forecast stock prices. Complexity 2020, 2020, 6622927. [Google Scholar] [CrossRef]
  59. Lu, W.; Li, J.; Wang, J.; Qin, L. A CNN-BiLSTM-AM method for stock price prediction. Neural Comput. Appl. 2021, 33, 4741–4753. [Google Scholar] [CrossRef]
  60. Wang, C.; Chen, Y.; Zhang, S.; Zhang, Q. Stock market index prediction using deep Transformer model. Expert Syst. Appl. 2022, 208, 118128. [Google Scholar] [CrossRef]
Figure 1. The architecture of the EMAT model.
Figure 2. Architecture of the Multi-Aspect Attention Mechanism, showing base attention computation enhanced by temporal, trend, and volatility components, followed by sequential gating refinement.
Figure 3. Historical closing price trends for all selected market indices. The left column displays Chinese market indices, while the right column displays global market indices. (a) SSE Composite; (b) SZSE Component; (c) CSI 300; (d) DJIA; (e) S&P 500; (f) CAC 40.
Figure 4. Bar chart visualization of average evaluation metrics across the three Chinese market indices. Each subplot corresponds to a different metric: (a) MAE, (b) RMSE, (c) MAPE, and (d) R².
Figure 5. Comparison between the predicted (red) and actual (blue) closing prices on the test sets for the three major Chinese market indices. The predictions from our EMAT model closely track the true price movements. (a) SSE Composite Index; (b) SZSE Component Index; (c) CSI 300 Index.
Figure 6. Bar chart visualization of evaluation metrics for the proposed EMAT model on the three major global market indices. Each subplot corresponds to a different metric: (a) MAE, (b) RMSE, (c) MAPE, and (d) R².
Figure 7. Comparison between the predicted (red) and actual (blue) closing prices on the test sets for the three major global market indices. (a) DJIA; (b) S&P 500; (c) CAC 40.
Table 1. Composition of experimental datasets.

Category | Name | Symbol | Key Characteristics
Chinese Market | SSE Composite Index | 000001.SH | Benchmark index for the Shanghai Stock Exchange
Chinese Market | SZSE Component Index | 399001.SZ | Major index for Shenzhen-listed stocks
Chinese Market | CSI 300 Index | 000300.SH | Tracks top 300 large-cap A-shares from SSE and SZSE
Global Markets | DJIA (Dow Jones) | DJI | 30 major U.S. industrial companies
Global Markets | S&P 500 Index | ^GSPC | Broad-based U.S. large-cap equity index
Global Markets | CAC 40 Index | ^FCHI | Tracks the 40 largest companies listed on Euronext Paris
Table 2. Training and testing time ranges for all datasets.

Index | Training Set Range | Test Set Range
SSE Composite | 1 January 2005–31 December 2021 | 1 January 2022–31 December 2024
SZSE Component | 1 January 2005–31 December 2021 | 1 January 2022–31 December 2024
CSI 300 | 1 January 2005–31 December 2021 | 1 January 2022–31 December 2024
DJIA | 1 January 2010–31 December 2021 | 1 January 2022–31 December 2024
S&P 500 | 1 January 2010–31 December 2021 | 1 January 2022–31 December 2024
CAC 40 | 1 January 2010–31 December 2021 | 1 January 2022–31 December 2024
Table 3. EMAT model configuration and hyperparameters.

Category | Parameter | Value
Training Setup | Epochs | 100
Training Setup | Batch Size | 64
Training Setup | Learning Rate | 0.001
Training Setup | Optimizer | AdamW
Model Architecture | Encoder Layers | 4
Model Architecture | Decoder Layers | 4
Model Architecture | Attention Heads | 8
Model Architecture | Model Dimension | 256
Model Architecture | Dropout Rate | 0.1
Loss Function | MSE Weight (λ1) | 0.5
Loss Function | MAE Weight (λ2) | 0.4
Loss Function | Volatility Weight (λ3) | 0.1
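The loss weights above combine point-wise and volatility-aware objectives. The following minimal sketch shows one way these weights can be composed; the rolling-standard-deviation form of the volatility-consistency term and the window size of 5 are illustrative assumptions, not the exact EMAT formulation.

```python
# Sketch of a multi-objective loss with the Table 3 weights:
# 0.5*MSE + 0.4*MAE + 0.1*volatility-consistency (assumed rolling-std form).
import torch
import torch.nn.functional as F

def rolling_std(x: torch.Tensor, window: int = 5) -> torch.Tensor:
    """Rolling standard deviation over the last (time) dimension."""
    return x.unfold(-1, window, 1).std(dim=-1)

def emat_loss(pred: torch.Tensor, target: torch.Tensor,
              l1: float = 0.5, l2: float = 0.4, l3: float = 0.1) -> torch.Tensor:
    point = l1 * F.mse_loss(pred, target) + l2 * F.l1_loss(pred, target)
    vol = F.mse_loss(rolling_std(pred), rolling_std(target))  # volatility consistency
    return point + l3 * vol

# Toy usage on (batch, horizon) sequences:
pred, target = torch.randn(8, 30), torch.randn(8, 30)
print(emat_loss(pred, target))
```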
Table 4. Comparison of MAE and MAPE on Chinese market indices.

Model | MAE (SSE Composite) | MAE (SZSE Component) | MAE (CSI 300) | MAPE % (SSE Composite) | MAPE % (SZSE Component) | MAPE % (CSI 300)
LSTM | 25.1222 | 116.8615 | 34.8901 | 0.7986 | 1.1004 | 0.9009
BiLSTM | 24.4306 | 114.9543 | 34.5329 | 0.7774 | 1.0821 | 0.8929
GRU | 25.6517 | 116.8241 | 35.0829 | 0.8159 | 1.1019 | 0.9076
CNN-LSTM | 25.4385 | 119.6550 | 35.5731 | 0.8086 | 1.1283 | 0.9167
CNN-BiLSTM | 25.0634 | 117.2090 | 34.6722 | 0.7968 | 1.1048 | 0.8945
CNN-BiLSTM-AM | 25.0359 | 113.7938 | 34.4744 | 0.7958 | 1.0703 | 0.8905
Transformer | 24.6510 | 112.6350 | 34.2920 | 0.7844 | 1.0581 | 0.8847
EMAT (Ours) | 24.2440 | 111.3750 | 33.7990 | 0.7705 | 1.0441 | 0.8722
Table 5. Comparison of RMSE and R² on Chinese market indices.

Model | RMSE (SSE Composite) | RMSE (SZSE Component) | RMSE (CSI 300) | R² (SSE Composite) | R² (SZSE Component) | R² (CSI 300)
LSTM | 36.9929 | 177.6863 | 51.8235 | 0.9526 | 0.9821 | 0.9764
BiLSTM | 36.1131 | 165.7673 | 49.6341 | 0.9548 | 0.9845 | 0.9784
GRU | 37.4966 | 176.0028 | 51.7454 | 0.9513 | 0.9825 | 0.9765
CNN-LSTM | 37.2077 | 182.5438 | 52.8534 | 0.9521 | 0.9811 | 0.9755
CNN-BiLSTM | 36.5473 | 179.5091 | 50.8212 | 0.9537 | 0.9817 | 0.9774
CNN-BiLSTM-AM | 36.4000 | 160.9999 | 49.4435 | 0.9541 | 0.9854 | 0.9785
Transformer | 35.5720 | 158.0600 | 48.7690 | 0.9576 | 0.9865 | 0.9801
EMAT (Ours) | 34.9370 | 157.0250 | 48.3970 | 0.9591 | 0.9866 | 0.9804
Table 6. Comparison of MAE and MAPE on global market indices.

Model | MAE (DJIA) | MAE (S&P 500) | MAE (CAC 40) | MAPE % (DJIA) | MAPE % (S&P 500) | MAPE % (CAC 40)
LSTM | 254.2116 | 39.4562 | 57.9234 | 0.7266 | 0.8904 | 0.8327
BiLSTM | 251.4519 | 39.6921 | 56.5162 | 0.7162 | 0.8946 | 0.8127
GRU | 254.6680 | 39.5423 | 58.4603 | 0.7284 | 0.8934 | 0.8426
CNN-LSTM | 254.9555 | 39.7881 | 57.2935 | 0.7293 | 0.8996 | 0.8248
CNN-BiLSTM | 254.4343 | 40.1760 | 57.3591 | 0.7291 | 0.9073 | 0.8258
CNN-BiLSTM-AM | 250.2073 | 39.8113 | 57.1363 | 0.7186 | 0.8941 | 0.8197
Transformer | 246.1640 | 37.9040 | 56.8270 | 0.7045 | 0.8562 | 0.8178
EMAT (Ours) | 240.3570 | 36.8150 | 55.3640 | 0.6862 | 0.8303 | 0.7961
Table 7. Comparison of RMSE and R² on global market indices.

Model | RMSE (DJIA) | RMSE (S&P 500) | RMSE (CAC 40) | R² (DJIA) | R² (S&P 500) | R² (CAC 40)
LSTM | 341.1555 | 51.9819 | 77.1051 | 0.9915 | 0.9939 | 0.9829
BiLSTM | 334.4265 | 52.1767 | 75.0275 | 0.9919 | 0.9939 | 0.9838
GRU | 343.9346 | 52.5293 | 79.2573 | 0.9914 | 0.9938 | 0.9819
CNN-LSTM | 342.3611 | 52.7512 | 77.5791 | 0.9915 | 0.9938 | 0.9827
CNN-BiLSTM | 342.5598 | 52.9557 | 78.1429 | 0.9915 | 0.9937 | 0.9824
CNN-BiLSTM-AM | 340.0012 | 51.9964 | 75.5148 | 0.9916 | 0.9939 | 0.9836
Transformer | 329.0910 | 50.2940 | 76.5620 | 0.9921 | 0.9943 | 0.9830
EMAT (Ours) | 322.8700 | 48.7510 | 74.4850 | 0.9923 | 0.9946 | 0.9840
Table 8. Ablation study results on the SSE Composite and S&P 500 datasets.

Model Variation | MAE (SSE) | MAPE % (SSE) | RMSE (SSE) | R² (SSE) | MAE (S&P 500) | MAPE % (S&P 500) | RMSE (S&P 500) | R² (S&P 500)
EMAT w/o Time | 24.4820 | 0.7776 | 35.1430 | 0.9586 | 37.2737 | 0.8405 | 49.2145 | 0.9945
EMAT w/o Trend | 24.3460 | 0.7734 | 35.1140 | 0.9587 | 37.1060 | 0.8369 | 49.1220 | 0.9945
EMAT w/o Volatility | 24.4300 | 0.7764 | 35.0540 | 0.9588 | 37.1630 | 0.8373 | 49.2560 | 0.9945
EMAT (Full Model) | 24.2440 | 0.7705 | 34.9370 | 0.9591 | 36.8150 | 0.8303 | 48.7510 | 0.9946
Table 9. Parameter sensitivity of the EMAT model on representative indices with different input sequence lengths.

Lag Length | MAE (SSE) | MAPE % (SSE) | RMSE (SSE) | R² (SSE) | MAE (S&P 500) | MAPE % (S&P 500) | RMSE (S&P 500) | R² (S&P 500)
Lag_10 | 24.5050 | 0.7793 | 35.1760 | 0.9572 | 37.7600 | 0.8509 | 50.1210 | 0.9944
Lag_7 | 24.4100 | 0.7763 | 34.8880 | 0.9586 | 37.6100 | 0.8478 | 49.7950 | 0.9944
Lag_5 (original) | 24.2440 | 0.7705 | 34.9370 | 0.9591 | 36.8150 | 0.8303 | 48.7510 | 0.9946