From Market Volatility to Predictive Insight: An Adaptive Transformer–RL Framework for Sentiment-Driven Financial Time-Series Forecasting

Song, Zhicong; Tsang, Harris Sik-Ho; Hsung, Richard Tai-Chiu; Zhu, Yulin; Lo, Wai-Lun

doi:10.3390/forecast7040055



by Zhicong Song, Harris Sik-Ho Tsang *, Richard Tai-Chiu Hsung, Yulin Zhu and Wai-Lun Lo

Department of Computer Science, Hong Kong Chu Hai College, Hong Kong 999077, China

* Author to whom correspondence should be addressed.

Forecasting 2025, 7(4), 55; https://doi.org/10.3390/forecast7040055

Submission received: 16 July 2025 / Revised: 26 September 2025 / Accepted: 30 September 2025 / Published: 2 October 2025


Abstract

Financial time-series prediction remains a significant challenge, driven by market volatility, nonlinear dynamic characteristics, and the complex interplay between quantitative indicators and investor sentiment. Traditional time-series models (e.g., ARIMA and GARCH) struggle to capture the nuanced sentiment in textual data, while static deep learning integration methods fail to adapt to market regime transitions (bull markets, bear markets, and consolidation). This study proposes a hybrid framework that integrates investor forum sentiment analysis with adaptive deep reinforcement learning (DRL) for dynamic model integration. By constructing a domain-specific financial sentiment dictionary (containing 16,673 entries) based on the sentiment analysis approach and word-embedding technique, we achieved up to 97.35% accuracy in forum title classification tasks. Historical price data and investor forum sentiment information were then fed into a Support Vector Regressor (SVR) and three Transformer variants (single-layer, multi-layer, and bidirectional variants) for predictions, with a Deep Q-Network (DQN) agent dynamically fusing the prediction results. Comprehensive experiments were conducted on diverse financial datasets, including China Unicom, the CSI 100 index, corn, and Amazon (AMZN). The experimental results demonstrate that our proposed approach, combining textual sentiment with adaptive DRL integration, significantly enhances prediction robustness in volatile markets, achieving the lowest RMSEs across diverse assets. It overcomes the limitations of static methods and multi-market generalization, outperforming both benchmark and state-of-the-art models.

Keywords:

sentiment analysis; Transformer; reinforcement learning; market price prediction; model ensembling

1. Introduction

Accurate financial time-series forecasting remains a cornerstone of global financial stability, with trillions of dollars in derivatives trading hinging on reliable predictions [1]. However, financial forecasting is inherently complex. Financial markets are volatile, exhibiting unpredictable swings (e.g., 20% daily swings in energy futures during geopolitical crises) driven by the interplay of economic indicators, geopolitical events, and investor sentiment. These fluctuations are often amplified during periods of crisis, leading to significant uncertainty and risk. Moreover, market dynamics are nonlinear, shifting between periods of growth (bull markets) and decline (bear markets), which makes it difficult to develop models that perform consistently across regimes. Integrating quantitative data (e.g., price and volume) with qualitative information (e.g., news sentiment and social media trends) is crucial but remains a significant challenge for many forecasting methods [2].

Traditional time-series models like ARIMA [3] and GARCH [4] struggle to reconcile these complexities and exhibit high prediction errors in volatile market environments.

With the successful development of deep learning [5] for natural language processing (NLP) [6] as well as time-series prediction [7], sequential models such as Recurrent Neural Networks (RNNs) [8], Long Short-Term Memory (LSTM) [9], Gated Recurrent Unit (GRU) [10], Bidirectional LSTM (BiLSTM) [11], and Transformers [12,13] have been developed. These models have demonstrated superior capabilities for capturing temporal dependencies in a wide range of sequential prediction tasks, including framewise phoneme classification and neural machine translation, and have even been adapted to other applications, such as bridge structural response prediction [14].

Building on these advancements, numerous deep learning approaches [15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37] have been proposed for financial market forecasting. For example, basic time-series deep learning models, such as LSTM [15,16] and BiLSTM [17], have been applied to financial market forecasting and generally outperform traditional baselines. To enhance forecasting accuracy, more advanced deep learning approaches that incorporate signal decomposition techniques or optimization algorithms, like empirical mode decomposition (EMD) [15] and the Grey Wolf Optimizer (GWO) [20], have been proposed. However, many of these models rely on static or single-model frameworks, which can restrict their adaptability during sudden market fluctuations.

Hybrid models, such as those in [24,25], combine different model architectures, such as convolutional neural networks (CNNs), LSTM, and Transformers, for more accurate stock market prediction. Yet, the trained hybrid models are frozen after training, which might hinder their prediction performance when market regimes shift, e.g., from a stable market to a volatile market.

In parallel, recent advances in semantic extraction in NLP have enabled the extraction of investor sentiment, which can benefit financial forecasting. For example, refs. [32,33] extracted sentiment from Twitter for crude oil price prediction and stock price prediction, respectively.

This paper focuses on the task of financial time-series forecasting, which includes predicting futures prices, stock prices, and market indices based on historical market data and sentiment signals. Formally, given a time window of past observations, $X_{t} = \{x_{t-k+1}, x_{t-k+2}, \dots, x_{t}\}$, where $k$ is the lookback period and $x_{t}$ represents historical prices and derived features at timestamp $t$, along with the extracted investor sentiment signals $S_{t}$, the goal is to learn a predictive function that jointly integrates price or index data and sentiment information. This function aims to accurately forecast the price or index value for the next timestamp, thus providing a robust method adaptable to diverse market conditions and capable of generalizing across multiple asset classes.
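The windowed formulation can be illustrated with a minimal sketch. It assumes each observation is a small feature row (here a made-up price and sentiment pair) and that the target is the next-step price; the helper name `make_windows` and the toy values are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple


def make_windows(
    features: Sequence[Sequence[float]],
    targets: Sequence[float],
    k: int,
) -> List[Tuple[List[List[float]], float]]:
    """Pair each window {x_{t-k+1}, ..., x_t} with the value at t + 1."""
    if k < 1:
        raise ValueError("k must be a positive lookback length")
    samples = []
    # t indexes the last observation inside the window; t + 1 is forecast.
    for t in range(k - 1, len(features) - 1):
        window = [list(row) for row in features[t - k + 1 : t + 1]]
        samples.append((window, targets[t + 1]))
    return samples


# Toy series: each row is (price, sentiment score); values are invented.
rows = [(10.0, 0.1), (10.5, 0.3), (10.2, -0.2), (10.8, 0.4), (11.0, 0.0)]
prices = [r[0] for r in rows]
samples = make_windows(rows, prices, k=3)
```

With five observations and a lookback of three, the sketch yields two training pairs, since the last observation can only serve as a target.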

In this paper, we address key limitations of existing financial time-series prediction methods, including (1) the insufficient incorporation of investor sentiment in a domain-specific manner; (2) the inability of static models to adapt to rapidly changing market regimes; and (3) the lack of multi-market generalization in previous approaches reliant on extensive, market-specific feature engineering. Our contributions include constructing a comprehensive financial sentiment dictionary tailored to investor forums using SnowNLP, leveraging a Support Vector Regressor (SVR) and heterogeneous Transformer architectures to capture diverse market signals, and implementing a Deep Q-Network (DQN)-based dynamic ensembling mechanism that adaptively integrates model predictions according to evolving market conditions. This integrated framework significantly enhances prediction robustness across different financial assets and market environments.

2. Literature Review

2.1. Traditional Financial Forecasting Models

Traditional time-series models [3,4] have been foundational in financial forecasting. Box and Jenkins’s ARIMA model [3] assumes linear relationships and stationary processes, which limits its ability to model nonlinear dynamics and abrupt market changes, as seen during the 2020 crash, when it underperformed deep learning models by 42% in volatility prediction. Bollerslev’s GARCH model [4] improves volatility modeling with conditional heteroskedasticity but struggles with regime shifts, such as transitions from bull to bear markets. Moreover, traditional linear regression and machine learning-based SVR were also investigated in [15,38] for forecasting. Yet, due to their relative simplicity, there are limitations in modeling nonlinear dynamics and abrupt market changes, which has motivated a shift toward more powerful deep learning approaches.

2.2. Basic Deep Learning Models

Numerous deep learning-based time-series models [15,16] have been proposed for stock market prediction tasks. In [15], LSTM [16] and Gated Recurrent Unit (GRU) [17] were evaluated for stock price prediction, and LSTM was found to outperform traditional models but still struggle in high-volatility markets. Nelson et al. [16] and Xu et al. [17] used LSTM and BiLSTM, respectively, for stock market movement prediction, achieving moderate prediction performance.

There are also deep learning-based CNN approaches [18,19] for financial market forecasting. For example, the Sample Convolution and Interaction Network (SCINet) [18] uses hierarchical convolutional blocks but showed higher prediction errors when markets transitioned from bull to bear, evidencing its failure to adapt to regime shifts. Chen et al. [19] proposed a GC-CNN model to capture inter-stock correlations, outperforming traditional CNNs for Chinese equities. Yet, single models might not fully capture the complexities of financial markets, prompting research into more advanced modeling strategies.

2.3. Deep Learning Models with Advanced Techniques

Several deep learning models incorporate advanced techniques [15,20,21,22,23], such as decomposition methods or optimization algorithms, to further improve their learning ability. To address the limited accuracy of SVR, LSTM, and GRU, Wang et al. [15] proposed the FIVMD-LSTM model, which integrates EMD to predict CSI 100. Similarly, Mahmoodzadeh et al. [20] combined GWO with LSTM (GWO-LSTM) for tunnel boring machine penetration rate forecasting, demonstrating improved stability, but with a relatively high RMSE on financial data. Lin et al. [21] integrated an advanced EMD variant, namely CEEMDAN, with LSTM for volatility forecasting, achieving a relatively small mean absolute error (MAE) on CSI 100 at the cost of extensive data decomposition. Li et al. [22] proposed an SSA-BiGRU model for time-series production forecasting, combining the sparrow search algorithm (SSA) with bidirectional GRU (BiGRU), but its high MAPE on financial data highlights the challenge of capturing nonlinear dependencies. Zhang et al. [23] developed a VMD-SE-GRU model for oil price forecasting, integrating variational mode decomposition with squeeze-and-excitation GRU, yet a high RMSE was still obtained.

However, because these remain single-model approaches, there is still room to improve forecasting accuracy, which motivates more advanced approaches such as hybrid models or adaptive strategies.

2.4. Hybrid Deep Learning Models

Several hybrid deep learning models have been proposed [24,25] that leverage the strengths of different model architectures, such as CNNs and LSTM, for efficient feature extraction from time-series data. Chen et al. [24] developed CNN-BiLSTM-ECA models, incorporating Bidirectional LSTMs (BiLSTMs) to process information from both past and future time steps and integrating an efficient channel attention (ECA) CNN to enhance the model’s ability to focus on the most relevant features. Kabir et al. [25] proposed an LSTM-mTrans-MLP hybrid model that integrates LSTM, a modified Transformer, and a multilayer perceptron (MLP), which reduces the RMSE for stock indices. Yet, we conjecture that static ensembling of such models, e.g., by arithmetic mean or weighted averaging, might lead to suboptimal performance when the market regime changes, for instance, during market crashes, because the weights are fixed after training, even when hybrid models are used. To address these robustness issues, statistical model selection techniques like the Model Confidence Set (MCS) [26] can be employed for dynamic model ensembling. The MCS method selects a subset of models with statistically indistinguishable predictive performance, effectively accounting for model uncertainty. The Sliding-Window Weighted Average (SWA) ensemble strategy was proposed in [27] for carbon price forecasting, wherein the weights of sub-models are assigned based on their performance over a sliding time window. Concurrent work [28] addresses the classification of financial cycles using leading, coincident, and lagging indicators. We believe there is a need for more resilient deep learning approaches that adapt to volatile markets or abrupt market changes.

2.5. Sentiment Analysis

Traditional financial forecasting methods often struggle with capturing distinct characteristics and relationships across markets, which reduces generalization of predictive models. Omoware et al. [29] used LSTM for Amazon and Google stock prediction, resulting in large MAEs. Despite both companies being in the tech sector, their unique business models, competitive environments, and risk profiles lead to different sensitivities to market events and, consequently, diverse stock price behaviors.

Tetlock [1] demonstrated that media sentiment significantly impacts stock markets. With the successful development of NLP in deep learning, sentiment analysis has emerged as a critical component of financial forecasting [30,31,32,33,34,35,36,37], although challenges remain. The generic lexicon, HowNet, was used in [30] for financial domain-specific terminology. Loughran and McDonald [31] developed a financial lexicon to address domain-specific nuances. Li et al. [32] used LSTM with attention to Twitter sentiment for crude oil price prediction with additional commodity-specific feature engineering to improve the trend accuracy. Xu and Cohen [33] proposed a deep generative model for stock price prediction from tweets, introducing recurrent latent variables, but their approach requires task-specific preprocessing. Gui et al. [34] proposed to use ERNIE, which is an improved BERT model, for investor sentiment extraction to investigate the relationship between economic policy uncertainty, investor sentiment, and the Growth Enterprise Market in China. Fu et al. [35] and Smatov et al. [36] proposed a BERT-LLA and CNN model to analyze sentiment for US stock price prediction. Kim et al. [37] proposed FinBERT for BERT-based sentiment extraction, together with price data fed into LSTM for S&P 500 Index forecasting. While incorporating sentiment analysis to capture investor sentiment enhances financial forecasting models by capturing company-specific factors and market nuances, most existing methods primarily focus on a single commodity or market. Consequently, they lack a comprehensive evaluation across multiple markets to robustly demonstrate their generalization and predictive performance across futures, indices, and stocks from various markets.

2.6. Limitations of Existing Research and Our Contributions

To summarize, existing studies reveal three critical gaps:

Conventional deep learning approaches fail in handling complexity: Basic deep learning approaches struggle with nonlinear, complex price movements. Even with advanced methods, such as signal decomposition methods and optimization algorithms [15,20,21,22,23], deep learning approaches still face challenges in modeling nonlinear, sentiment-driven price movements, limiting predictive accuracy.
Inability to adapt to different market regimes: Static ensembling approaches, such as arithmetic mean and weighted average, keep fixed weights after training and therefore cannot adjust in real time to sudden market volatility, which reduces their effectiveness in dynamic environments and makes them impractical for real-time financial forecasting.
Limited generalization in existing sentiment models: Existing sentiment-based methods [32,33,34,35,36,37] are tailored to specific financial market assets. For example, Twitter sentiment was extracted in [32] for crude oil price forecasting with commodity-specific features and in [33] for stock movement prediction with domain-specific preprocessing. Refs. [35,36,37] focused on sentiment analysis for US stock price prediction and the S&P 500 Index only. This restricts their applicability and limits generalization across diverse financial assets.

To address these gaps, a domain-specific financial sentiment dictionary was constructed using SnowNLP [39] for sentiment analysis in the financial domain. A word-embedding model, Word2Vec [40], is employed for dictionary expansion. This dictionary is utilized to analyze sentimental information from investor forums, aiding in price forecasting across different market regimes and assets, thereby overcoming the limitations of static models, even those using advanced techniques. The investor forum sentiment data are then combined with historical price data and fed into a Support Vector Regressor (SVR) [41] and three Transformer variants (single-layer, multi-layer, and bidirectional variants) [12] for predictions. A Deep Q-Network (DQN) agent [42,43] dynamically fuses the prediction results across diverse market conditions and asset types. It is noted that while the individual components, such as SnowNLP, SVR, Transformer, and DQN, are well-established, our theoretical contributions lie in the seamless integration of these advanced methods, specifically tailored and adapted for the unique challenges of financial prediction. Therefore, our proposed framework has four core contributions:

Domain-specific financial sentiment dictionary construction: By leveraging SnowNLP [39] and Word2Vec [40,44], we propose a financial sentiment dictionary specifically tailored for investor forum data, containing 16,673 entries. This dictionary achieves up to 97.35% classification accuracy by capturing domain-specific terminology and nuanced sentiment expressions unique to financial discussions. The resulting sentiment features are critical for modeling investor behavior and market sentiment, which cannot be captured by financial price data alone.
Heterogeneous model framework integration: As aforementioned, existing single-model approaches often fail to comprehensively capture both linear and nonlinear dynamics, limiting their predictive capacity. We designed a hybrid framework combining Support Vector Regression (SVR) [41] to effectively capture linear trends with three Transformer variants [12] to model complex nonlinear dependencies inherent in financial time series. Our heterogeneous integration ensures diverse market patterns and sentiment impacts are better represented, enhancing robustness and accuracy in forecasting across different assets.
DQN-driven dynamic ensembling strategy: Traditional ensembling or static-weighting approaches lack the flexibility to adjust to rapidly changing market regimes, leading to deteriorated performance during market shifts. Inspired by [45], which used deep reinforcement learning (DRL) for dynamic portfolio management, the Deep Q-Network (DQN) [42,43] enables adaptive ensembling by learning nonlinear, volatility-adaptive weights over multiple model predictions during training, reducing the average RMSE across different assets. Our DQN-driven strategy addresses this by continuously learning optimal weighting policies corresponding to evolving volatility, which forms our proposed DQN–Hybrid Transformer–SVR Ensemble Framework (DQN-HTS-EF), maintaining strong forecasting performance in dynamic financial environments.
Multi-market generalization evaluation: Finally, comprehensive experiments were performed to validate the performance of our DQN-HTS-EF across a diverse set of financial datasets, including the China United Network Communications (China Unicom) stock, the CSI 100 index, the Amazon (AMZN) stock, and corn futures. This multi-asset selection—covering RMB-denominated equities, USD-denominated tech stocks, and agricultural commodities—facilitates a rigorous assessment of the framework’s ability to generalize across different market regimes and various asset classes.

3. Materials and Methods

3.1. Framework Overview

Our proposed hybrid framework architecture operates in three interconnected stages, leveraging sentiment semantics, multi-model heterogeneity, and deep reinforcement learning (DRL) adaptability, as shown in Figure 1.

Stage 1: Domain-Specific Sentiment Dictionary Construction

A financial domain-specific dictionary is essential to capture specialized financial terminology. The data acquisition and dictionary construction methods are detailed in Section 3.2 and Section 3.3, respectively.

Taking a set of financial research reports as the primary data source, the raw text is preprocessed, followed by sentiment scoring via SnowNLP [39], which results in a foundational dictionary containing 6352 entries. The dictionary is then expanded using a financial-data-pretrained Word2Vec [40,44] (300-dimensional embeddings trained on financial corpora), yielding an expanded dictionary of 16,673 domain-specific terms.

Stage 2: Sentiment Feature Extraction for Forum Titles

The dictionary expanded with Word2Vec [40,44] is used to infer the sentiment polarity of forum titles (a proxy for retail investor sentiment), with daily sentiment scores calculated. These sentiment scores are aggregated and then merged with historical time-series market price data, forming an augmented feature set, i.e., historical price data with textual sentiment scores, which is a critical input for our subsequent prediction models. The sentiment score calculation is described in Section 3.4.

Stage 3: Dynamic Model Ensembling via DRL

The framework deploys four complementary prediction models for forecasting. A kernel-based Support Vector Regression (SVR) model [41] is used, which excels in capturing linear trends within stable markets by using an ϵ-insensitive tube to minimize prediction errors within a specified range. This SVR is complemented by three Transformer [12] architecture variants for modeling market trends with a higher degree of complexity: the Base-Transformer (a single-layer encoder that leverages self-attention for sequential price-feature vector processing), the Multi-Transformer (a 3-layer stacked encoder, which captures long-range temporal dependencies in volatile markets), and the Bi-Transformer (a bidirectional encoder that processes both forward and reversed sequences to enhance contextual feature extraction from multi-directional market signals).

Instead of using a static model ensembling approach, a more advanced Deep Q-Network (DQN) [42,43] is proposed to serve as the dynamic ensembler, treating the model predictions and trading volume as “states”, model selection as “actions”, and prediction error as “rewards” for deep reinforcement learning. This DQN adaptively weights the base model outputs in real time to optimize multi-market forecasting. This design leverages model heterogeneity to balance linear trend capture (via SVR) and nonlinear dependency modeling (via Transformers), while the DQN enables context-aware fusion tailored to evolving market conditions. The output of the DQN is the final predicted market value. The whole hybrid model architecture, including the DQN, is described in Section 3.5.
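The state, action, and reward framing described above can be sketched as follows. This is only an illustration under stated assumptions: the reward is taken to be the negative absolute error of the selected model, the Q-values are supplied by hand rather than by a trained network, and the function names (`make_state`, `reward`, `select_action`) are hypothetical. The actual DQN design is the one given in Section 3.5.

```python
import random
from typing import List, Sequence

MODELS = ["SVR", "Base-Transformer", "Multi-Transformer", "Bi-Transformer"]


def make_state(predictions: Sequence[float], volume_volatility: float) -> List[float]:
    """State: the four base-model predictions plus the volume feature."""
    return list(predictions) + [volume_volatility]


def reward(chosen_prediction: float, actual: float) -> float:
    """Assumed reward: smaller prediction error gives a larger reward."""
    return -abs(chosen_prediction - actual)


def select_action(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy choice over the base models."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Toy step: predictions, Q-values and the realised price are invented.
preds = [10.2, 10.6, 10.4, 10.5]
state = make_state(preds, volume_volatility=0.3)
q_values = [0.1, 0.2, 0.9, 0.4]  # would come from the Q-network given `state`
action = select_action(q_values, epsilon=0.0, rng=random.Random(0))
step_reward = reward(preds[action], actual=10.45)
```

With `epsilon=0.0` the choice is purely greedy, so the toy step selects the model with the largest Q-value; during training a nonzero epsilon would let the agent explore the other base models.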

3.2. Data Acquisition and Data Preprocessing

In financial market prediction, the quality and diversity of data are the key foundations for building accurate models. This study comprehensively collected two types of crucial data, namely, financial trading data and textual sentiment data, to fully capture market dynamics and investor sentiment.

3.2.1. Financial Trading Data

Four distinct financial trading datasets were extracted from diverse stock markets and countries to evaluate our advanced DQN–Hybrid Transformer–SVR Ensemble Framework (DQN-HTS-EF). The datasets included stock price data of China United Network Communications Group Co., Ltd. (China Unicom, Beijing, China), the CSI 100 index (one of the most researched RMB-denominated Chinese indices), Amazon (AMZN) stock prices (denominated in USD), and corn futures contracts traded on the Dalian Commodity Exchange (DCE). The China Unicom, CSI 100 index, and corn futures datasets were derived from the Digquant Financial Database [46], spanning from 1 January 2020 to 31 December 2024, while the Amazon (AMZN) dataset was obtained from Yahoo Finance [47], spanning from 25 May 2017 to 5 April 2023. This multi-source dataset design, which integrates RMB-denominated equities, USD-denominated tech stocks, and agricultural commodities for testing the model robustness, enables the evaluation of multi-market generalization.

3.2.2. Textual Sentiment Data

Web crawling was performed on the Eastmoney Stock Bar [48], a leading Chinese financial forum, to obtain textual data reflecting investor sentiment. Although the Eastmoney Stock Bar primarily features content in Chinese, this technical discussion platform not only focuses on the domestic financial market in China but also includes dedicated sections for overseas listed stocks, markets, and other global assets (such as the “US Stock Discussion” and “Global Asset Allocation Exchange” boards). These sections attract Chinese investors interested in international markets, who frequently share technical discussions about US stocks, including AMZN, which makes the forum a suitable source of relevant data. Consequently, most AMZN-related forum titles were originally in Chinese (for example, one raw forum title translated into English reads, “Amazon Q2 earnings exceed expectations, stock price may continue to rise”). The Jieba tokenizer [49], SnowNLP [39], and a 300-dimensional Word2Vec word-embedding model (sgns.financial.word) [44] are used for word tokenization, sentiment analysis, and vector formation in Chinese, respectively, as detailed in this and the next subsection. No additional translation was required during our data collection process.

In particular, forum title data related to the CSI 100 index, corn futures, China Unicom, and AMZN were scraped, allowing us to leverage the rich investor sentiment expressed in forum titles. Data preprocessing was then performed. We first used regular expressions (regex) to remove special characters from the forum titles. For instance, a raw forum title that translates into English as “Corn is going to skyrocket! #Futures Market @Investment Expert” would be transformed by regex into “Corn is going to skyrocket Futures Market Investment Expert” after removing the special characters “#”, “@”, and “!”. Subsequently, stopwords, i.e., common words that typically carry little semantic meaning, such as “the”, “and”, and “in” in English, were eliminated. If the title is “The price of Corn is rising”, it becomes “price Corn rising” after stopword removal. This step reduces noise in the data and keeps the focus on meaningful words related to sentiment and market views. After preprocessing, the Jieba tokenizer [49], a Chinese word segmentation tool, was used for word tokenization. For example, the sentence “The increase in demand causes the price of Corn to rise” would, after preprocessing, be segmented into “increase”, “demand”, “causes”, “price”, “Corn”, and “rise”. With this preprocessing and tokenization, each forum title is split into individual words, which is essential for the subsequent sentiment analysis because it enables the extraction of sentiment-bearing words and phrases.
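The preprocessing steps can be reproduced on the translated examples with a short sketch. It is an English stand-in under explicit assumptions: the regex pattern and the stopword list are illustrative choices, and a whitespace split replaces Jieba, which the study applies to the original Chinese titles.

```python
import re
from typing import List

# Illustrative English stopword list; the study itself works on Chinese text.
STOPWORDS = {"the", "of", "is", "and", "in", "to", "a"}


def strip_special_chars(title: str) -> str:
    """Drop every character that is neither a word character nor whitespace."""
    return re.sub(r"[^\w\s]", "", title)


def tokenize(title: str) -> List[str]:
    """Whitespace split standing in for Jieba segmentation of Chinese text."""
    return strip_special_chars(title).split()


def remove_stopwords(tokens: List[str]) -> List[str]:
    """Filter out tokens found in the stopword list, ignoring case."""
    return [tok for tok in tokens if tok.lower() not in STOPWORDS]


cleaned = strip_special_chars(
    "Corn is going to skyrocket! #Futures Market @Investment Expert"
)
short_title = remove_stopwords(tokenize("The price of Corn is rising"))
long_title = remove_stopwords(
    tokenize("The increase in demand causes the price of Corn to rise")
)
```

Because `\w` is Unicode-aware for Python strings, the same pattern keeps Chinese characters intact, so only the tokenizer would need to be swapped for a Chinese segmenter on the real data.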

3.3. Financial Domain-Specific Sentiment Dictionary Construction

3.3.1. Dictionary Construction Using Sentiment Analysis

The construction of the financial domain-specific sentiment dictionary employed a rigorous two-stage methodology. Initially, 100 financial research reports, with at least 6 pages each, spanning from 24 May 2023 to 31 May 2023, were randomly chosen and manually annotated for sentiment polarity to establish a high-quality training corpus. Using SnowNLP [39], a sentiment analysis tool for Chinese text, each sentence was assigned a polarity score within the continuous range of [0, 1]. Each sentence in the financial research reports, $S_{i}$, was classified as having positive sentiment if its polarity score $PS_{i}$ met or exceeded the upper threshold $TH_{up}$, and negative sentiment if its $PS_{i}$ was equal to or below the lower threshold $TH_{low}$, which can be expressed by

$$\mathrm{Sentiment}(S_{i}) = \begin{cases} \text{positive}, & \text{if } PS_{i} \geq TH_{up} \\ \text{negative}, & \text{if } PS_{i} \leq TH_{low} \\ \text{neutral}, & \text{otherwise} \end{cases}$$

(1)

Empirically, $TH_{up}$ and $TH_{low}$ were set to 0.7 and 0.3, respectively. Among the sentences with either positive or negative sentiment, lexical items with a minimum occurrence frequency of 3 were retained to filter out low-frequency noise, which yielded a foundational dictionary of 6352 entries, reflecting the nuanced financial domain-specific sentiment expressions prevalent in financial research reports.
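A minimal sketch of the thresholding in Equation (1) and the frequency filter is given below. The polarity scores are passed in as plain numbers, standing in for SnowNLP output, and the rule that a word takes the label it co-occurs with most often is an assumption, since the text does not state how words seen under both labels are resolved. The function names and toy sentences are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

TH_UP, TH_LOW = 0.7, 0.3  # thresholds reported in the text
MIN_FREQ = 3              # minimum occurrence frequency reported in the text


def classify(polarity: float) -> str:
    """Equation (1): map a polarity score in [0, 1] to a sentiment label."""
    if polarity >= TH_UP:
        return "positive"
    if polarity <= TH_LOW:
        return "negative"
    return "neutral"


def build_dictionary(scored: Iterable[Tuple[List[str], float]]) -> Dict[str, str]:
    """Keep words seen at least MIN_FREQ times in non-neutral sentences.

    Assumption: each word takes the label it co-occurs with most often.
    """
    counts: Dict[str, Counter] = {}
    for tokens, polarity in scored:
        label = classify(polarity)
        if label == "neutral":
            continue  # neutral sentences contribute no dictionary entries
        for tok in tokens:
            counts.setdefault(tok, Counter())[label] += 1
    return {
        word: c.most_common(1)[0][0]
        for word, c in counts.items()
        if sum(c.values()) >= MIN_FREQ
    }


# Toy tokenized sentences with invented polarity scores.
scored_sentences = [
    (["profit", "growth"], 0.9),
    (["profit", "surge"], 0.8),
    (["profit", "loss"], 0.75),
    (["loss", "default"], 0.1),
    (["loss", "risk"], 0.2),
    (["outlook"], 0.5),
]
foundational = build_dictionary(scored_sentences)
```

In the toy data only "profit" and "loss" reach the frequency threshold, so words such as "growth" or "risk" that appear once are dropped as low-frequency noise.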

3.3.2. Dictionary Expansion Using Word Embedding

To further enhance the dictionary’s semantic coverage and domain relevance, an unsupervised learning expansion phase was implemented using the financial-data-pretrained 300-dimensional Word2Vec word-embedding model, sgns.financial.word [44]. For each lexical entry in the initial dictionary, semantically proximate terms were identified by measuring the cosine similarity. The cosine similarity between words $w_{i}$ and $w_{j}$, $sim(w_{i}, w_{j})$, was calculated as follows:

$$sim(w_{i}, w_{j}) = \frac{w_{i} \cdot w_{j}}{\lVert w_{i} \rVert \, \lVert w_{j} \rVert}$$

(2)

where $w_{i}$ and $w_{j}$ denote the word vectors in the embedding space, $w_{i} \cdot w_{j}$ represents the dot product of two word vectors, and $\lVert w_{i} \rVert$ and $\lVert w_{j} \rVert$ are the L2 norms of the corresponding word vectors. When the cosine similarity is large, it means the semantic meanings of these two words are close to each other. Therefore, terms exhibiting cosine similarity larger than or equal to 0.8 to entries in the foundational dictionary were incorporated into the dictionary. Furthermore, sentiment labels were assigned using majority voting among nearest neighbors, which provides an additional layer of noise control for mitigating the noise propagation. Finally, this process expanded the dictionary by 10,321 entries, culminating in a comprehensive financial sentiment dictionary of 16,673 terms. This expansion enhances the dictionary’s coverage and relevance, allowing it to capture a broader spectrum of domain-specific terminology and nuanced sentiment expressions prevalent in financial discussions, ultimately improving financial market forecasting performance.
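The expansion step can be sketched with Equation (2) and a simple vote. The two-dimensional vectors below are invented for illustration (the study uses 300-dimensional sgns.financial.word embeddings), and treating every seed entry above the 0.8 threshold as a voter is an assumption about how the majority voting is carried out. The names `cosine` and `expand` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Equation (2): dot product divided by the product of the L2 norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand(
    seed: Dict[str, str],
    vectors: Dict[str, Sequence[float]],
    threshold: float = 0.8,
) -> Dict[str, str]:
    """Add words whose similarity to a seed entry reaches the threshold.

    Assumption: every seed entry at or above the threshold casts one vote,
    and the new word takes the majority label.
    """
    expanded = dict(seed)
    for word, vec in vectors.items():
        if word in seed:
            continue
        votes = Counter(
            label
            for s, label in seed.items()
            if s in vectors and cosine(vec, vectors[s]) >= threshold
        )
        if votes:
            expanded[word] = votes.most_common(1)[0][0]
    return expanded


# Toy embeddings; real vectors would be looked up in the pretrained model.
toy_vectors = {
    "profit": (1.0, 0.0),
    "gain": (0.9, 0.1),
    "loss": (0.0, 1.0),
    "earnings": (0.8, 0.3),
    "deficit": (0.1, 0.95),
    "weather": (-1.0, 0.2),
}
seed_dict = {"profit": "positive", "gain": "positive", "loss": "negative"}
expanded_dict = expand(seed_dict, toy_vectors)
```

Here "earnings" and "deficit" are close enough to seed entries to be added, while "weather" has no neighbor above the threshold and is left out.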

3.4. Market and Sentiment Data Fusion for Model Prediction

Market data are typically described as quantitative because they consist of numerical values, such as prices, volumes, and other measurable financial indicators, while sentiment data are considered qualitative because they capture subjective information like opinions, emotions, or attitudes, often derived from text, social media, or news. The core input representation of our forecasting framework integrates quantitative market dynamics and qualitative sentiment signals, with the feature fusion process structured in Figure 2.

For quantitative data, the daily closing price and daily trading volume volatility are included as they capture the price trends and intensity of trading activity over time. The closing price on day t is min–max normalized to the range [0, 1] as

p_{t}

. The trading volume volatility on day t is calculated using z-score standardization based on the k-day rolling mean and standard deviation, denoted by

v_{t}

, which is formulated as shown below:

v_{t} = \frac{V_{t} - μ_{k}}{σ_{k}}

(3)

where

V_{t}

,

μ_{k}

, and

σ_{k}

are the unnormalized trading volume, the mean value of the k-day trading volume, and the standard deviation of the k-day trading volume, respectively, with k set to 30 empirically.
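The two normalizations above can be sketched as follows. This is a minimal illustration rather than the authors' preprocessing code: the function names are hypothetical, and the choices that the rolling window ends at (and includes) day t, that the population standard deviation is used, and that the first k − 1 days are left undefined are assumptions, since the text does not specify them.

```python
import statistics


def min_max_normalize(prices):
    """Scale closing prices to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(prices), max(prices)
    if hi == lo:
        return [0.0 for _ in prices]
    return [(p - lo) / (hi - lo) for p in prices]


def rolling_volume_zscore(volumes, k=30):
    """Eq. (3): v_t = (V_t - mu_k) / sigma_k over the k days ending at day t.

    Assumptions: the window includes day t, sigma_k is the population
    standard deviation, and days without a full window yield None.
    """
    scores = []
    for t in range(len(volumes)):
        if t + 1 < k:
            scores.append(None)
            continue
        window = volumes[t + 1 - k : t + 1]
        mu = statistics.fmean(window)
        sigma = statistics.pstdev(window)
        scores.append(0.0 if sigma == 0 else (volumes[t] - mu) / sigma)
    return scores


normalized = min_max_normalize([10.0, 15.0, 20.0])
z_scores = rolling_volume_zscore([1.0, 2.0, 3.0], k=3)
```

With a chronological train–test split, the minimum and maximum would typically be estimated on the training portion only; the text does not state how this was handled.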

For the qualitative data, an investor sentiment score is derived from Eastmoney forum titles [48]. Specifically, the investor sentiment score on day t,

{s s}_{t}

, is computed as the normalized difference between positive and negative word counts, which is formulated as shown below:

{s s}_{t} = \frac{N_{t}^{p o s} - N_{t}^{n e g}}{N_{t}^{p o s} + N_{t}^{n e g} + 1}

(4)

where

N_{t}^{p o s}

and

N_{t}^{n e g}

are, respectively, the positive and negative word counts across forum titles on day t, validated by our domain-specific financial dictionary, as aforementioned in Section 3.3.
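Equation (4) can be illustrated with a short sketch. The function name and the assumption that titles are already tokenized into word lists are ours; real forum titles would first need word segmentation.

```python
def sentiment_score(tokenized_titles, positive_words, negative_words):
    """Eq. (4): (N_pos - N_neg) / (N_pos + N_neg + 1) for one trading day."""
    n_pos = 0
    n_neg = 0
    for tokens in tokenized_titles:
        for token in tokens:
            if token in positive_words:
                n_pos += 1
            elif token in negative_words:
                n_neg += 1
    return (n_pos - n_neg) / (n_pos + n_neg + 1)


# Toy word lists and titles, purely illustrative.
positive = {"surges", "bullish"}
negative = {"drop", "oversupply"}
titles = [["index", "surges", "bullish"], ["prices", "drop"]]
score = sentiment_score(titles, positive, negative)  # (2 - 1) / (2 + 1 + 1)
empty_day = sentiment_score([], positive, negative)  # 0 / 1
```

The "+ 1" in the denominator keeps the score defined on days with no matched words and bounds it strictly inside (−1, 1).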

The above heterogeneous data features are then concatenated into a unified temporal input feature vector

x_{t} = [p_{t}, v_{t}, {s s}_{t}]^{⊤}

, preserving the causal alignment, where

{s s}_{t}

(generated from forum titles on day t) is used to predict the price for the next day

p_{t + 1}

. To avoid look-ahead bias, the preprocessing protocol ensures temporal consistency by collecting forum titles at market close (15:00 CST) for the next-day market price prediction. A SHAP analysis [50] is performed to confirm that

{s s}_{t}

contributes to the prediction variance across multiple market assets, which is described in Section 4.5.
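The causal alignment between the day-t feature vector and the day-(t + 1) target can be sketched as below. The function name is hypothetical, and the sketch pairs single-day features with the next-day price only; any multi-day input window used by the Transformer models is not specified here and is therefore omitted.

```python
def build_supervised_pairs(p, v, ss):
    """Pair x_t = [p_t, v_t, ss_t] with the next-day target p_{t+1}.

    The last day has no next-day price and therefore yields no pair, so
    no feature is ever paired with information from its own future.
    """
    if not (len(p) == len(v) == len(ss)):
        raise ValueError("p, v, and ss must have the same length")
    return [([p[t], v[t], ss[t]], p[t + 1]) for t in range(len(p) - 1)]


pairs = build_supervised_pairs(
    p=[0.1, 0.2, 0.3],
    v=[0.0, 1.0, -1.0],
    ss=[0.5, -0.5, 0.0],
)
```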

3.5. Proposed DQN-HTS-EF Model Architectures

In the rapidly evolving landscape of market price forecasting, single-model approaches often fall short in addressing the complexities and variabilities inherent in market behavior. To enhance prediction accuracy and adaptability, we propose an adaptive multi-model fusion strategy that leverages the strengths of various forecasting models. This approach allows for a more comprehensive analysis of diverse market conditions, improving the model’s ability to generalize across different scenarios. Specifically, four models are used for market price forecasting: Support Vector Regression (SVR) [41], Base-Transformer, Multi-Transformer, and Bi-Transformer [12]. SVR is adept at capturing trends in relatively smooth, stable markets, while the Transformer variants excel in highly volatile environments with complex long- and short-term dependencies. At each time step, the DRL model, a Deep Q-Network (DQN) [42,43], dynamically selects one of the four models according to their respective strengths in handling different market conditions. By integrating the above four models with the DQN, the prediction errors can be effectively reduced, and the overall generalization capability is enhanced across diverse market scenarios.

3.5.1. SVR Model for Stable Market

Support Vector Regression (SVR) [41] is an extension of Support Vector Machines (SVMs) designed for regression tasks. While SVMs find a hyperplane that maximizes the margin between data points to minimize classification error, SVR aims to fit a function within a specified error tolerance, which is also known as an ϵ-insensitive tube, wherein deviations within this range are not penalized. This margin maximization principle helps SVR avoid overfitting and makes it robust to noisy datasets.

SVR effectively captures linear and moderately nonlinear relationships through kernel functions. In the context of financial market price forecasting, SVR is especially effective in relatively stable market conditions where price fluctuations are gradual and bounded. Our dataset exhibits daily price movements with noise that tend to remain within the ϵ-tube, allowing SVR to focus on modeling the underlying trend rather than overfitting transient noise. This makes SVR well-suited for capturing steady market price evolution in less volatile environments.

Hence, in our proposed framework, SVR is used alongside Transformer-based approaches. By integrating SVR with Transformer variants, our framework leverages SVR’s strength in capturing linear trends and stabilizing predictions, while deep learning models focus on capturing nonlinear patterns. We believe this hybrid approach balances bias and variance, improves overall robustness, and enhances prediction accuracy in the complex and noisy environment of financial markets.

3.5.2. Transformer Models for Moderate-to-Volatile Market

Transformer models [12] excel in managing long-distance dependencies within sequences, primarily due to their self-attention mechanism, which makes them particularly renowned for NLP and sequence modeling. This mechanism allows the model to process information at each position while simultaneously considering other relevant positions in the sequence, making them particularly suited for the complexities of market price forecasting.

In our market price prediction framework, we propose three Transformer models. The first one is the Base-Transformer model, which consists of a Transformer block followed by a fully connected layer. The last output of the last layer is used as our prediction. This architecture makes it ideal for scenarios where deep feature extraction is not necessary, e.g., a moderate market. The second one is the Multi-Transformer, which is a multi-layer Transformer consisting of a stack of multiple Transformer blocks, followed by a fully connected layer. This setup allows for deeper feature extraction, enhancing model expressiveness and handling the volatile market. Last but not least, the third one is the Bi-Transformer, which is a bidirectional Transformer incorporating both forward and reverse Transformer blocks to process the original and flipped market sequences separately. This bi-directional information is then fused and output through a fully connected layer, aiming to comprehensively capture sequence context dependencies and increase the diverse perspective of the financial market data so as to improve prediction accuracy.

3.5.3. DQN for Adaptive Prediction Selection

Following the training of the four forecasting models, we implement a Deep Q-Network (DQN) [42,43] as a reinforcement learning mechanism to dynamically integrate their predictions, where the state consists of the model predictions and trading volume, the action is the choice of model, and the reward is the negative prediction error. Formally, the state vector

s_{t}

at time step t is defined as

s_{t} = [{\hat{p}}_{t + 1}^{S V R}, {\hat{p}}_{t + 1}^{B a s e}, {\hat{p}}_{t + 1}^{M u l t i}, {\hat{p}}_{t + 1}^{B i}, v_{t}]

(5)

where

{\hat{p}}_{t + 1}^{S V R}

,

{\hat{p}}_{t + 1}^{B a s e}

,

{\hat{p}}_{t + 1}^{M u l t i}

, and

{\hat{p}}_{t + 1}^{B i}

denote the market price predictions of the SVR, Base-Transformer, Multi-Transformer, and Bi-Transformer, respectively, and

v_{t}

is the normalized volume based on the k-day rolling mean and standard deviation, as shown in (3).

The DQN processes

s_{t}

to generate a Q-value vector

Q_{t}

of size 4, i.e.,

[Q_{t}^{S V R}, Q_{t}^{B a s e}, Q_{t}^{M u l t i}, Q_{t}^{B i}]

, where

Q_{t}^{m o d e l}

represents the expected cumulative reward for selecting the model in state

s_{t}

. The final forecast,

{\hat{p}}_{t + 1}

, is the market prediction of the model with the highest Q-value:

{\hat{p}}_{t + 1} = {\hat{p}}_{t + 1}^{model}, \quad \text{where} \quad model = \arg \max (Q_{t}^{SVR}, Q_{t}^{Base}, Q_{t}^{Multi}, Q_{t}^{Bi})

(6)

The network parameters are updated using the reward signal

r_{t} = - M S E ({\hat{p}}_{t}, p_{t})

, where

p_{t}

is the ground-truth market price. The negative sign means that a lower mean-squared error (MSE) yields a higher reward. This enables the DQN to refine its selection policy via temporal difference learning, adapting to evolving market regimes (e.g., stable vs. volatile) by prioritizing models that demonstrate consistent forecasting accuracy. The negative MSE is chosen as the reward because financial market forecasting is commonly evaluated with error metrics such as MSE and RMSE, which measure the differences between actual and predicted values.
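The selection rule in Equation (6) and the reward signal can be sketched as follows. This is not the authors' implementation: the Q-values are supplied by hand instead of by a trained network, the function names are hypothetical, the single-step reward is taken as the negative squared error, and the discount factor `gamma` in the standard one-step Q-learning target is an assumed value that the text does not report.

```python
MODEL_NAMES = ("SVR", "Base", "Multi", "Bi")


def select_forecast(q_values, predictions):
    """Eq. (6): return the name and forecast of the model with the highest Q-value."""
    best = max(range(len(MODEL_NAMES)), key=lambda i: q_values[i])
    return MODEL_NAMES[best], predictions[best]


def reward(forecast, actual):
    """Negative squared error; for a single prediction the MSE reduces to this."""
    return -((forecast - actual) ** 2)


def td_target(r, next_q_values, gamma=0.9, terminal=False):
    """One-step Q-learning target r + gamma * max_a Q(s', a).

    gamma = 0.9 is an assumption made for illustration only.
    """
    if terminal:
        return r
    return r + gamma * max(next_q_values)


# Hand-picked Q-values and forecasts, purely illustrative.
name, forecast = select_forecast(
    q_values=[0.1, 0.7, 0.3, 0.2],
    predictions=[10.0, 10.5, 11.0, 9.8],
)
r = reward(forecast, actual=10.0)
target = td_target(r, next_q_values=[-0.1, -0.2, -0.3, -0.4])
```

Here the Base-Transformer has the highest Q-value, so its forecast is returned, and the reward becomes less negative as the forecast approaches the realized price.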

By integrating the DQN with four models, our proposed framework is able to adapt to diverse market conditions. In stable markets, the agent preferentially utilizes the SVR model, leveraging its ability to capture smooth, gradual price trends. Conversely, in volatile markets, the DQN favors the Transformer models, exploiting the large model capacity to model sentiment-driven, rapid price fluctuations. This adaptive model selection mechanism aligns with real-time market dynamics and eventually improves the forecasting performance.

4. Results

4.1. Experimental Datasets and Setup

4.1.1. Dataset Information

Table 1 provides our dataset summary, encompassing start dates, end dates, total observations, and structured training–test splits.

Specifically, the CSI 100, corn, and China Unicom datasets span from 01-01-2020 to 31-12-2024, with an 80:20 split, yielding 969 training points and 243 testing points, where the testing period (02-01-2024 to 31-12-2024) coincides with the Russia–Ukraine war. This standardization ensures cross-comparability across heterogeneous financial assets while preserving the temporal integrity of market signals for model construction.

For the AMZN dataset (25-05-2017 to 05-04-2023), a 70:30 split is employed (1034 training and 442 testing points), with a testing period (06-07-2021 to 05-04-2023) that overlaps the COVID-19 pandemic. Each split adheres to chronological ordering to prevent look-ahead bias and information leakage, with testing periods deliberately spanning significant global events to validate model robustness under challenging conditions.

Note that the AMZN data cover a different period from the CSI 100 index, corn futures, and China Unicom data. This is not due to cherry-picking or subjective choice but reflects the availability of AMZN-related textual data on the Eastmoney Stock Bar [48] at the time of collection. Although the sentiment dictionary was constructed using data from 2023, which is later than the historical AMZN data (2017–2023), the terms it contains are long-established and widely used financial terms (such as “earnings report,” “stock price,” and “revenue”). Moreover, the dictionary was built from financial research reports rather than from the Eastmoney Stock Bar forum titles [48] that serve as the AMZN sentiment data. We also ensured that the sentiment data did not incorporate any future information, which prevents look-ahead bias in our analysis. Finally, the training and testing periods were strictly separated with no overlap, so that no information leaked and the test data were truly unseen.

4.1.2. Experimental Setup

Our forecasting framework comprises one SVR, three Transformer variants, and one DQN. The SVR model uses a radial basis function kernel. Hyperparameters (C of [0.1, 1, 10] and γ of [0.01, 0.01]) are optimized via a grid search with three-fold cross-validation. The Base-Transformer (seven weight layers in total) is a Transformer block with two attention heads, a feed-forward dimensionality of 32, and a dropout [51] of zero. The output is derived from the last time-step feature using a linear layer to predict the target. The Multi-Transformer (13 weight layers in total) comprises two stacked Transformer blocks with two attention heads, a feed-forward dimensionality of 64, and a dropout of 0.1, allowing deeper temporal feature extraction. The Bi-Transformer (13 layers in total) comprises two parallel Transformer blocks processing forward and reversed input sequences. Their outputs are concatenated and passed through a linear layer. The parameters include two heads, a feed-forward dimensionality of 32, and a dropout of 0.1. All models use the Adam optimizer [52], with a learning rate of 0.001, a batch size of 32, and the MSE as the loss function. Early stopping with a patience of 10 epochs is applied to prevent overfitting, with a maximum of 200 training epochs.

For our proposed DQN, the agent uses ε-greedy exploration (initial ε = 0.15; decay rate = 0.99), Adam optimization (learning rate = 0.001), and experience replay, and is trained over 300 episodes with a batch size of 32 for policy refinement. Subsequently, a reinforcement learning agent refines this discrete selection into a continuous model weighting within a two-layer fully connected network, where the selection is determined by an argmax operation over the Q-value outputs, as shown in (6). The agent is trained to minimize the MSE over 500 epochs.
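The ε-greedy exploration described above can be sketched as follows, using the reported initial ε of 0.15 and decay rate of 0.99. Applying the decay once per episode is an assumption (the text does not say whether it is per step or per episode), and the function names are hypothetical.

```python
import random


def epsilon_schedule(initial=0.15, decay=0.99, episodes=300):
    """Exploration rate per episode, assuming multiplicative per-episode decay."""
    return [initial * decay ** e for e in range(episodes)]


def epsilon_greedy_action(q_values, epsilon, rng):
    """Pick a random model index with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


eps = epsilon_schedule()
rng = random.Random(0)
greedy = epsilon_greedy_action([0.1, 0.9, 0.2, 0.3], epsilon=0.0, rng=rng)
explored = epsilon_greedy_action([0.1, 0.9, 0.2, 0.3], epsilon=1.0, rng=rng)
```

Under this assumed schedule, the exploration rate falls from 0.15 to below 0.01 by the last of the 300 episodes, so late episodes are almost entirely greedy.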

4.2. Performance Validation of Sentiment Scoring

4.2.1. Evaluation Metrics and Results

To assess the domain-specific financial sentiment dictionary, we employed standard classification metrics: precision (P), recall (R), and F1-score (F1). Precision is defined as the ratio of correctly classified semantic-positive (semantic-negative) instances to all predicted semantic-positive (semantic-negative) instances. Recall is measured as the ratio of correctly classified semantic-positive (semantic-negative) instances to all actual semantic-positive (semantic-negative) instances. The F1-score is the harmonic mean of precision and recall, calculated as

F 1 = 2 \times \frac{P \times R}{P + R}

(7)

where

P = \frac{T P}{T P + F P}

and

R = \frac{T P}{T P + F N}

, with TP, FP, and FN denoting the numbers of true positives, false positives, and false negatives, respectively.
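The per-class metrics in Equation (7) can be computed as in the following minimal sketch. The function name, the label strings, and the convention of returning 0 when a denominator is zero are assumptions made for this example.

```python
def precision_recall_f1(y_true, y_pred, target_label):
    """Per-class precision, recall, and F1, treating target_label as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == target_label and p == target_label for t, p in pairs)
    fp = sum(t != target_label and p == target_label for t, p in pairs)
    fn = sum(t == target_label and p != target_label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy labels, purely illustrative.
y_true = ["pos", "pos", "neg", "neu", "pos"]
y_pred = ["pos", "neg", "neg", "pos", "pos"]
p_pos, r_pos, f1_pos = precision_recall_f1(y_true, y_pred, "pos")
p_neg, r_neg, f1_neg = precision_recall_f1(y_true, y_pred, "neg")
```

Calling the function once per class mirrors the separate F1-scores reported for the semantic-positive and semantic-negative classes.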

On a holdout validation dataset of 50,000 forum titles, our extended dictionary achieved up to 97.35% validation accuracy, with F1-scores of 0.94 (semantic-positive class) and 0.91 (semantic-negative class). With such high accuracy and F1-scores, we were confident in the dictionary’s ability to be effectively deployed for our subsequent sentiment scoring and multi-model fusion prediction.

4.2.2. Sentiment Scoring Comparison and Examples

We also evaluated our Word2Vec-expanded dictionary on 400 K preprocessed forum titles. Among 16,673 terms in the dictionary, 14,849 terms were captured in 400 K preprocessed forum titles, which was about 89% coverage, while the generic lexicon HowNet [30] could only obtain a lower coverage of about 62%. This result reflects its efficacy in financial domain-specific terminology.
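The coverage figure quoted above is the share of dictionary terms that occur at least once in the forum titles. A minimal sketch, assuming pre-tokenized titles and a hypothetical function name, is:

```python
def dictionary_coverage(dictionary_terms, tokenized_titles):
    """Fraction of dictionary terms that appear in at least one title."""
    seen = set()
    for tokens in tokenized_titles:
        seen.update(tokens)
    matched = set(dictionary_terms) & seen
    return len(matched) / len(dictionary_terms)


# Toy example: 2 of 4 dictionary terms occur in the titles.
toy_coverage = dictionary_coverage({"a", "b", "c", "d"}, [["a", "x"], ["b", "a"]])

# Arithmetic check of the counts reported in the text.
reported_coverage = 14849 / 16673
```

The reported counts give a ratio of roughly 0.8906, consistent with the stated coverage of about 89%.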

Some forum title examples are shown for illustration. For instance, semantic-positive titles like “CSI 100 surges 2% on policy stimulus, bullish signals persist” are correctly categorized via wordings like “surges” and “bullish”, while semantic-negative titles such as “Corn prices drop to 6-month low amid oversupply concerns” are identified through terms like “drop” and “oversupply”. These examples showcase the superior capability of our dictionary in capturing contextually relevant financial sentiment, underscoring its precision in identifying investor sentiment via industry-specific terminology.

4.2.3. Multi-Market Sentimental Scoring Validation

Table 2 validates the generalizability of the domain-specific financial sentiment dictionary across four kinds of financial market data, demonstrating its robust performance in capturing nuanced sentiment across diverse asset classes and market regimes. To better understand the impact of threshold selection on the performance of sentiment classification using SnowNLP, as shown in (2), besides showing the sentiment classification accuracy of empirical thresholds, i.e.,

{T H}_{u p}

= 0.7 and

{T H}_{l o w}

= 0.3, we also conducted the experiment using stricter thresholds, i.e.,

{T H}_{u p}

= 0.8 and

{T H}_{l o w}

= 0.2, as well as

{T H}_{u p}

= 0.9 and

{T H}_{l o w}

= 0.1. This comprehensive evaluation highlights how varying threshold levels influence the effectiveness of sentiment classification.

In the high-volatility CSI 100 index dataset (92,184 forum titles), the dictionary achieved the highest classification accuracy of 97.00% using

{T H}_{u p}

of 0.7 and

{T H}_{l o w}

of 0.3, correctly labeling 52.57% (48,459) of titles as positive (e.g., policy-driven rally signals) and 9.52% (8774) as negative (e.g., crash warnings), and the remaining 37.91% (34,951) as neutral. This highlights its effectiveness in discerning sentiment amid extreme market fluctuations.

For the relatively stable corn market (135,110 titles), the accuracy reached the highest of 97.35% using a

{T H}_{u p}

of 0.7 and a

{T H}_{l o w}

of 0.3, with 42.23% (57,055) positive, 15.75% (21,285) negative, and 42.02% (56,770) neutral labels, showcasing its reliability in capturing sentiment under low-volatility conditions. This accuracy pertains specifically to the corn futures forum titles, which focus on clear topics probably related to supply–demand and policies (e.g., “Corn supply-demand gap widens, price will rise”), wherein these titles generally have less semantic noise and more direct sentiment expression compared with others, which might contain noise, sarcasm, or even cross-asset discussions.

Similarly, the China Unicom dataset (153,023 titles) yielded the highest accuracy of 97.16% using a

{T H}_{u p}

of 0.7 and a

{T H}_{l o w}

of 0.3, classifying 53.91% (82,491) as positive and 11.12% (16,857) as negative, and 34.97% (53,675) as neutral, demonstrating its adaptability to corporate stock sentiment analysis.

Notably, the Amazon (AMZN) dataset (7382 titles) achieved the highest accuracy of 90.53% using a

{T H}_{u p}

of 0.7 and a

{T H}_{l o w}

of 0.3, with 77.43% (5716) positive, 6.18% (456) negative, and 16.39% (1210) neutral labels. Although this accuracy is lower than that obtained for the other assets, it still supports the multi-market generalization of our approach, despite a smaller sample size and a USD-denominated market context. Collectively, these results validate the dictionary’s ability to generalize across different assets (stocks, indices, and agricultural futures).

Thus, from the above experiment, we can conclude that the thresholds of

{T H}_{u p}

= 0.7 and

{T H}_{l o w}

= 0.3 provide a better balance between precision and coverage, resulting in higher overall accuracy across various financial assets for the purpose of multi-market generalization.

We also conducted experiments using different cosine similarity thresholds of 0.7, 0.8, and 0.9, as shown in Table 3, for dictionary expansion, which is mentioned in Section 3.3.2. Lower thresholds, such as 0.7, tended to include more loosely related terms, potentially diluting the dictionary’s domain relevance, while a higher threshold of 0.9 was overly restrictive, excluding useful related terms. Therefore, adopting the 0.8 cosine similarity threshold optimized both the quality and coverage of the financial sentiment dictionary.

4.2.4. Regression Analysis of the Sentiment Score—Closing Price Relationship

Table 4 presents the regression analysis across multiple datasets. Regression analysis is a statistical technique used to model the relationship between a dependent variable and one or more independent variables. Here, it details the regression coefficients for both the constant term and the sentiment score, along with their corresponding t-values.

In financial markets, investor sentiment is widely recognized as a key determinant of asset prices. Our study employed linear regression on datasets from AMZN, China Unicom, the stock index, and corn to explore the relationship between the sentiment score and closing price. The variation in regression coefficients among datasets was notable.

For the AMZN dataset, the regression coefficient for the constant was 98.443, and for the sentiment score, it was 26.526, with t-values of 59.057 and 9.910, respectively. A positive correlation between the sentiment score and closing price was therefore observed in the AMZN dataset, suggesting that in this market, as the sentiment score (reflecting investor sentiment) increases, the closing price also tends to rise.

In contrast, for the China Unicom dataset, the regression coefficient for the sentiment score was −0.488, with a t-value of −2.450, indicating an inverse relationship. Similarly, for corn, the regression coefficient for the sentiment score was −327.564, with a t-value of −4.703. The negative correlations in the China Unicom and corn datasets mean that an increase in the sentiment score is associated with a decrease in the closing price, potentially due to the unique market dynamics, investor expectations, or product-specific factors of these financial products. One possible interpretation is that retail investors exhibit emotional behaviors such as buying high and selling low, which can lead to losses. This phenomenon underscores the need for models like ours to better understand and potentially mitigate such market inefficiencies.

Furthermore, the R² values for the CSI 100 index, corn futures, AMZN stock, and China Unicom stock were measured with values of 0.9395, 0.9270, 0.9489, and 0.889, respectively. These high R² values indicate that the sentiment scores explain a very large proportion of the variability in price movements, with AMZN showing the strongest explanatory power.

Regarding significance, the sentiment score had a significant impact on closing prices in AMZN, China Unicom, and corn since the absolute t-values were large enough in these cases, which validates the influence of investor sentiment on financial product prices. However, in the CSI 100 stock index data, the regression coefficient for the constant was 6382.353, and for the sentiment score, it was 26.888, with t-values of 59.624 and 0.105, respectively. While the R² value was high at 0.9395, the sentiment score had an insignificant impact, as indicated by its low t-value of 0.105. This may reflect the fact that the CSI 100, as a broad market index representing the top 100 Chinese stocks by market capitalization and liquidity in China, is influenced by a diverse range of macroeconomic factors, such as overall economic trends, monetary policies, and geopolitical events, which play a dominant role in price formation, overshadowing the effect of investor sentiment. However, it is also important to note that sentiment correlated strongly with the CSI 100’s price movements. This finding supports the valuable role of sentiment analysis in modeling market behavior, but suggests more nuanced, stock-specific analyses are warranted to fully capture complex market dynamics for index data.

4.3. Model Performance and Comparison

4.3.1. Evaluation Metrics

To comprehensively validate the effectiveness and generalization capability of the proposed framework, this study adopted the standard evaluation paradigm in financial time-series forecasting. The framework was compared with traditional statistical methods, mainstream deep learning models, and the latest state-of-the-art (SOTA) approaches across diverse datasets (a stock index, agricultural futures, and individual stocks), and errors were quantified using the mean-squared error (MSE), root-mean-squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The four metrics are defined as follows:

MSE = \frac{1}{N} \sum_{i = 1}^{N} (p_{i} - {\hat{p}}_{i})^{2}

(8)

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} (p_{i} - {\hat{p}}_{i})^{2}}

(9)

MAE = \frac{1}{N} \sum_{i = 1}^{N} |p_{i} - {\hat{p}}_{i}|

(10)

MAPE = \frac{1}{N} \sum_{i = 1}^{N} \left| \frac{p_{i} - {\hat{p}}_{i}}{p_{i}} \right|

(11)

where

p_{i}

,

{\hat{p}}_{i}

, and N are the ground-truth real market price, predicted market price, and the number of samples, respectively. Comprehensive experiments were conducted to evaluate the model performance and robustness across diverse financial assets in order to compare the model under different volatility and complexity levels, highlighting the strengths and limitations of different approaches, with a particular focus on stable versus erratic futures market behaviors.
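For reference, Equations (8)–(11) can be computed as in the following minimal sketch; the function names are ours. Note that Equation (11) as written yields a fraction, whereas the MAPE values reported in the result tables are presumably expressed in percent (i.e., multiplied by 100).

```python
import math


def mse(actual, predicted):
    """Eq. (8): mean of squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Eq. (9): square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Eq. (10): mean of absolute errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Eq. (11): mean absolute percentage error as a fraction (x100 for percent)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Toy prices, purely illustrative.
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
metrics = {
    "MSE": mse(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "MAPE": mape(actual, predicted),
}
```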

4.3.2. Performance Evaluation on CSI 100 Index

Table 5 presents the comparison of our proposed DQN-HTS-EF’s performance with the latest state-of-the-art (SOTA) approaches and traditional deep learning models on the CSI 100 index dataset using the MAPE, RMSE, and MAE metrics. Compared with other existing prediction models [15,20,21,22,23,37], our proposed DQN-HTS-EF achieved superior results with the lowest MAPE of 2.027, RMSE of 148.7959, and MAE of 106.9365.

Particularly, our DQN-HTS-EF outperformed FinBERT [37] even though our DQN-HTS-EF uses a financial domain-specific expanded dictionary via Word2Vec [40,44], while FinBERT uses a Transformer-based financial BERT model. We believe that this performance difference is primarily due to FinBERT’s reliance on a relatively simple LSTM architecture for financial market forecasting, which lacks the adaptive capability to respond effectively to varying market regimes and abrupt changes. In contrast, our DQN-HTS-EF uses multi-model ensembling using the DQN for optimal model selection tailored to the current market regime or sudden shifts.

Compared with the statistical MCS approach [26], our proposed DQN-HTS-EF achieved a closely competitive performance on the CSI 100 index, with a slightly higher MAPE (2.027 vs. 1.798), a slightly higher RMSE (148.796 vs. 146.625), and a lower MAE (106.937 vs. 113.614). We therefore regard the two as tied on the CSI 100. However, MCS came close to our proposed DQN-HTS-EF only on the CSI 100, not on corn futures, China Unicom stock, or Amazon stock, as discussed in the following subsections. Given this tie with MCS and the superior performance relative to the other models, we consider that our proposed framework enhances prediction accuracy for financial forecasting by integrating multi-model fusion with deep reinforcement learning via the DQN.

4.3.3. Performance Evaluation on Corn Futures

Table 6 presents the comparison of our proposed DQN-HTS-EF’s performance with the latest SOTA approaches and traditional deep learning models on the corn futures dataset. Similarly, our proposed DQN-HTS-EF achieved a superior performance, with the lowest MAPE of 1.075, RMSE of 30.835, and MAE of 24.826, which significantly outperformed conventional deep learning models such as TCN, GRU, LSTM, and SCINet proposed in [18], the statistical MCS approach [26], and FinBERT [37]. It is noted that although MCS tied with our model on the CSI 100 index, its performance on corn futures was noticeably behind our model and even worse than that of FinBERT. This result demonstrates that our proposed framework improves the prediction accuracy for agricultural commodity series forecasting.

4.3.4. Performance Evaluation on China Unicom Stock

Moreover, Table 7 tabulates the performance evaluation of our DQN-HTS-EF on the China Unicom stock dataset. Compared with the existing approaches [9,11,19,24,25,26,37], including the hybrid model LSTM-mTrans-MLP [25], the statistical MCS model [26], and FinBERT [37], our DQN-HTS-EF achieved the lowest MSE of 0.012, RMSE of 0.108, and MAE of 0.075. Notably, it outperformed the second-best model, the static hybrid LSTM-mTrans-MLP [25], by 33.3% in the MSE for the China Unicom dataset, establishing a new benchmark for adaptive financial forecasting. Furthermore, the MCS model performed worse than LSTM-mTrans-MLP, FinBERT, and our proposed approach. This suggests that our model significantly improves prediction accuracy.

4.3.5. Performance Evaluation on Amazon Stock

Last but not least, Table 8 presents the performance comparison of our DQN-HTS-EF with the latest SOTA approaches on the Amazon stock dataset. Our DQN-HTS-EF outperformed the existing approaches [24,26,29,33,37,38], including the hybrid model CNN-BiLSTM [24], the statistical MCS model [26], and FinBERT [37], with the lowest MAE of 4.335, RMSE of 5.293, and MSE of 28.018, which indicates our model’s advantage on this dataset. Notably, MCS obtained substantially larger MAE, MSE, and RMSE values, performing worse than many approaches, including CNN-BiLSTM, FinBERT, and our proposed method.

To summarize, across all datasets, our DQN-HTS-EF consistently obtained low prediction errors, attributed to its dynamic ensemble strategy. The DQN agent’s real-time volatility adaptation via a reward function tied to MSE optimally balances the linear trend modeling of SVR with the nonlinear dependency capture of Transformers. Furthermore, the framework’s multi-market generalizability, ranging from corn futures to RMB-denominated indices and USD-tech stocks, highlights its potential for real-world financial applications.

4.4. Forecasting Performance Across Time

The predictions across time for the four assets are visualized in Figure 3, Figure 4, Figure 5 and Figure 6 to verify the adaptability of DQN-HTS-EF under different market structures. Each asset exhibits distinct dynamics, and the ensemble responds with a correspondingly different mix of models.

4.4.1. Prediction Across Time on CSI 100 Index

For the CSI 100 index (Figure 3), a volatility-dominated equity market sensitive to policy shifts and sentiment shocks (e.g., the October 2024 rally), our DQN-HTS-EF (purple) tightly tracked the actual price (black). We also counted how often each model was selected. To capture abrupt sentiment-driven fluctuations, our DQN agent dynamically increased the usage of the Base-Transformer (blue) to 42% of predictions, leveraging its attention mechanism. In contrast, SVR, as reported in [15], over-smoothed extreme volatility, with a higher RMSE of 469.6172. Meanwhile, FIVMD-LSTM [15] exhibited greater bias, with an RMSE of 154.6032, compared with our model’s RMSE of 148.7959, as shown in Table 5. This highlights the ensemble’s superior balance between stability and responsiveness.

4.4.2. Prediction Across Time on Corn Futures

In the corn futures market (Figure 4), characterized by stable supply–demand dynamics and minor trend-adjacent fluctuations, our DQN-HTS-EF (purple) outperformed all single models. The ensemble prioritized SVR (58% of predictions) for kernel-based smoothness to exploit linear trends, while Transformer insights (42% of predictions) were also integrated to refine local fluctuations. This dynamic multi-model fusion strategy using the DQN reduced bias by avoiding SVR’s underfitting on minor swings and reduced variance by keeping the Transformers from overfitting to noise, resulting in an RMSE of 30.835, which was less than half of SVR’s 74.215 and 44.3% lower than SCINet’s value [18] (Table 6), together with the lowest MAPE of 1.075.

4.4.3. Prediction Across Time on China Unicom Stock

The China Unicom stock (Figure 5) is a high-frequency tech stock that is sensitive to sector-specific news and sentiment spikes (e.g., the late-2024 rallies). Our DQN-HTS-EF adapted to this by allocating 42% of predictions to Transformers for sentiment-driven pattern capture while using SVR for regularization to avoid overfitting. This balance is reflected in the quantitative metrics, with an MAE of 0.075 achieved, which was a 4.7% reduction against the MAE of 0.110 of CNN-BiLSTM [24] (Table 7), demonstrating efficiency in tracking high-frequency fluctuations.

4.4.4. Prediction Across Time on Amazon Stock

Amazon’s stock (Figure 6) exhibits multi-year cycles and regime shifts (e.g., the 2022 downturn and 2023 recovery), which makes it a test of the cross-regime adaptability of our DQN-HTS-EF. During the 2022 bear market, our DQN-HTS-EF increased the usage of the Multi-Transformer (green, 38% of predictions) to capture long-term trend reversals. During the 2023 recovery, it prioritized the single-layer Base-Transformer (45% of predictions) for short-term momentum. This dynamic selection yielded an RMSE of 5.293, below the 5.336 of CNN-BiLSTM [24] and the values of the other models in Table 8. This underscores the generalizability of our proposed approach across complex market structures.

4.4.5. Statistical Validation

To rigorously verify the reliability of the DQN-HTS-EF model’s performance gains, statistical validation via paired t-tests was also conducted, comparing its root-mean-squared error (RMSE) against those of individual baseline models (SVR and the three Transformer variants) across all assets. The paired t-test statistic was calculated as

t = \frac{\bar{d} - \mu_{0}}{s_{d} / \sqrt{n}}

(12)

where $\bar{d}$ is the mean of the paired RMSE differences between models, $\mu_{0} = 0$ represents the null hypothesis of no difference, $s_{d}$ is the sample standard deviation of the paired differences, and $n$ is the number of prediction time steps. We confirmed that our ensemble’s improvements were highly significant (p < 0.001 for all assets), with t-values ranging from 8.23 to 15.67, substantially exceeding the critical value of $t_{0.0005,239} = 3.32$ and effectively ruling out random variation as the explanation for the performance improvements.
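As a minimal sketch of how the statistic in Equation (12) can be computed, the snippet below pairs the per-step errors of a baseline and of the ensemble. The function name `paired_t_statistic` and the two error lists are hypothetical illustrations, and pairing per-step errors is an assumption about how the differences were formed.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(errors_a, errors_b, mu0=0.0):
    """Paired t statistic, following the form of Equation (12)."""
    if len(errors_a) != len(errors_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    d_bar = mean(diffs)   # mean of the paired differences
    s_d = stdev(diffs)    # sample standard deviation (n - 1 denominator)
    return (d_bar - mu0) / (s_d / math.sqrt(n))


# Hypothetical per-step errors (illustrative only, not the paper's data).
baseline_err = [1.2, 0.9, 1.5, 1.1, 1.4, 1.0]
ensemble_err = [0.8, 0.7, 1.0, 0.9, 1.1, 0.6]
t_stat = paired_t_statistic(baseline_err, ensemble_err)
print(round(t_stat, 2))
```

A positive statistic here means the baseline's errors are larger on average. The critical value quoted in the text applies to 239 degrees of freedom, so this six-sample toy example would need its own critical value.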

Furthermore, it was found that DQN-HTS-EF systematically addresses the bias–variance trade-off, which is a central challenge in financial time-series forecasting. By dynamically selecting optimal models via a reward function maximizing prediction accuracy, the framework reduces bias and variance distinctly across assets:

Bias reduction: Bias is reduced through context-specific model selection. For example, it prioritizes SVR (approximately 58%) in stable, trend-dominated markets (e.g., corn) to capture smooth price movements, while emphasizing the Base-Transformer (42%) in volatile environments (e.g., CSI 100) that exhibit relatively abrupt changes.
Variance reduction: By not overly relying on a single model architecture, the approach lowers variance. For example, on the China Unicom stock, DQN-HTS-EF achieved an MSE of 0.012, which was 14.8% lower than SVR’s value of 0.0141 and 67.5% lower than the bidirectional Transformer’s value of 0.0369.

This versatile model selection strategy enhances robustness across heterogeneous financial assets. The visualizations in Figure 3, Figure 4, Figure 5 and Figure 6 confirm the model’s adaptability: the ensemble aligns with SVR in a stable market (Figure 4, corn futures) and prefers the Base-Transformer in a relatively volatile market (Figure 3, CSI 100 index). Extensive quantitative comparisons in Table 5, Table 6, Table 7 and Table 8 further support these findings, with improvements such as a 34.1% RMSE reduction on corn (30.835 vs. SVR’s 46.828) and a 30.0% reduction on AMZN (5.293 vs. 7.549). Together, these results validate the DQN-HTS-EF framework’s practical applicability and superior performance in diverse financial prediction tasks.
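The idea of selecting a model through a reward tied to prediction accuracy can be illustrated with a much simpler learner than a DQN. The sketch below uses tabular Q-learning as a stand-in; it is not the paper's agent. The two market regimes, the two candidate models, the error values, and all hyperparameters are hypothetical assumptions chosen only to show the mechanism.

```python
import random


def train_selector(errors, steps=4000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning over (regime, model) pairs.

    errors[s][a] is the absolute forecast error of model a in regime s.
    Regimes alternate deterministically, which is a toy assumption.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(errors), len(errors[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    state = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore a random model
        else:
            action = max(range(n_actions), key=lambda a: q[state][a])  # exploit
        reward = -errors[state][action]  # smaller error gives a larger reward
        next_state = (state + 1) % n_states
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


# Hypothetical errors: rows = [calm, volatile], columns = [SVR, Transformer].
toy_errors = [[0.1, 0.5], [0.6, 0.2]]
q_table = train_selector(toy_errors)
policy = [max(range(2), key=lambda a: row[a]) for row in q_table]
print(policy)
```

In this toy setting the learned policy picks the smoother model in the calm regime and the more flexible one in the volatile regime, which mirrors the qualitative behavior described above. A DQN replaces the table with a neural network so that continuous market features can serve as the state.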

Moreover, we conducted the Diebold–Mariano test using MAE to assess the statistical significance of the performance differences, particularly between DQN-HTS-EF and MCS [26]. The test statistics for the CSI 100 index, corn futures, China Unicom stock, and Amazon stock were −7.0252, −5.4365, −10.0518, and −4.0664, respectively. These results indicate that the improvements of DQN-HTS-EF over MCS were statistically significant. However, because DQN-HTS-EF had a slightly higher MAPE (2.027 vs. 1.798) and a slightly higher RMSE (148.796 vs. 146.625) on the CSI 100 index (Table 5), we conclude that DQN-HTS-EF ties with MCS on this index, while showing significant improvements on corn futures, China Unicom stock, and Amazon stock.
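For readers unfamiliar with the Diebold–Mariano statistic, the snippet below is a minimal sketch for one-step-ahead forecasts with absolute-error loss. It uses the plain sample variance of the loss differential and omits the autocovariance terms and small-sample corrections that a full implementation would include for longer horizons. The function name `diebold_mariano_mae` and all numbers are hypothetical.

```python
import math


def diebold_mariano_mae(actual, pred_a, pred_b):
    """DM statistic with absolute-error loss for one-step-ahead forecasts."""
    # Loss differential: negative values mean model A has the smaller error.
    d = [abs(y - a) - abs(y - b) for y, a, b in zip(actual, pred_a, pred_b)]
    n = len(d)
    d_bar = sum(d) / n
    var_d = sum((x - d_bar) ** 2 for x in d) / (n - 1)
    return d_bar / math.sqrt(var_d / n)


# Hypothetical series (illustrative only, not the paper's data).
actual = [10, 11, 12, 13, 14, 15]
pred_a = [10.1, 10.8, 12.1, 13.3, 13.9, 15.2]
pred_b = [10.5, 11.6, 12.4, 12.2, 14.7, 15.9]
dm = diebold_mariano_mae(actual, pred_a, pred_b)
print(round(dm, 2))
```

Under this sign convention, a strongly negative statistic indicates that the first model's absolute errors are systematically smaller than the second model's.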

4.5. Further Analysis

4.5.1. SHAP Analysis of Sentiment Scores and Model Predictions

To evaluate the contribution of sentiment scores $ss_{t}$, as shown in (4), SHAP analysis [50] was performed across different datasets. Table 9 shows the SHAP importance of $ss_{t}$, the ratio of positive to negative impacts, the dominant predictive model(s) for each dataset, and the SHAP importance of those dominant models. Corn futures showed the lowest impact (0.003), suggesting sentiment plays a minimal role in predicting its price movements. In contrast, AMZN stock exhibited a substantially higher SHAP importance (0.0367), indicating a much stronger influence of sentiment on its price prediction.
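As a rough illustration of how such summary figures can be derived once per-sample SHAP values are available, the sketch below computes a feature's importance as the mean absolute SHAP value and a positive-to-negative ratio from sign counts. Both definitions are assumptions about how the reported quantities were obtained, and the function name `shap_summary` and the values are hypothetical.

```python
def shap_summary(shap_values):
    """Summarise one feature's per-sample SHAP values.

    Returns (mean absolute SHAP value, ratio of positive to negative counts).
    """
    n = len(shap_values)
    importance = sum(abs(v) for v in shap_values) / n
    positives = sum(1 for v in shap_values if v > 0)
    negatives = sum(1 for v in shap_values if v < 0)
    ratio = positives / negatives if negatives else float("inf")
    return importance, ratio


# Hypothetical SHAP values for the sentiment score (not the paper's data).
sentiment_shap = [0.04, -0.02, 0.05, 0.03, -0.01, 0.03]
importance, ratio = shap_summary(sentiment_shap)
print(importance, ratio)
```

The per-sample SHAP values themselves would come from an explainer applied to the fitted model, which is outside the scope of this sketch.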

The dominant predictive models also differed across datasets. This suggests that the optimal predictive model may be asset-specific, reflecting the unique characteristics of each asset.

This SHAP analysis demonstrated that sentiment played a significant role in predicting the price movements. Furthermore, the choice of predictive model appeared to be asset-specific, highlighting the need for DQN-based hybrid modeling.

4.5.2. Effects of Different Training Data Portions

As presented in Table 1, we followed the standard practice in the literature for the train–test splitting ratio of different assets, using historical data for the training set and future data for the test set. This temporal separation ensured no overlap between training and test periods, effectively preventing information leakage. In this subsection, we further investigate the influence of the training data size on model performance by subsampling the original training set, reducing its share in steps of 10 percentage points while keeping the test set fixed, as shown in Table 10. The results demonstrate that increasing the training data generally led to reductions in the RMSE, MAE, and MAPE metrics. However, a slight increase in the MAPE for the CSI 100 index suggested mild overfitting. Overall, these findings imply that, for the assets examined, a larger training dataset can still yield more accurate predictions, underscoring the importance of the amount of training data during model development.
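The splitting scheme can be sketched as follows. The snippet keeps the test set fixed as the final portion of the series and shrinks the training window; it assumes that the reduced training sets drop the oldest observations, which the text does not specify. The function name `chronological_split` and the synthetic series are hypothetical.

```python
def chronological_split(series, train_frac, test_frac):
    """Split a time-ordered sequence into (train, test) with no overlap.

    The test set is always the final `test_frac` of the series. The training
    set is the `train_frac` of the series that immediately precedes it, so a
    smaller `train_frac` drops the oldest observations (an assumption).
    """
    n = len(series)
    n_test = round(n * test_frac)
    n_train = round(n * train_frac)
    if n_train + n_test > n:
        raise ValueError("train and test fractions exceed the series length")
    test_start = n - n_test
    return series[test_start - n_train:test_start], series[test_start:]


# Synthetic, time-ordered observations (illustrative only).
prices = list(range(1000))
for frac in (0.6, 0.7, 0.8):
    train, test = chronological_split(prices, frac, 0.2)
    print(frac, len(train), len(test), train[-1] < test[0])
```

Because every training index precedes every test index, no future information can leak into training regardless of the chosen training share.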

4.5.3. Comparison with Alternative Ensemble Strategies

To further validate the effectiveness of our proposed DQN-HTS-EF-driven dynamic ensembling strategy, we compared it against three conventional static ensemble methods: arithmetic mean (a simple average of all model predictions), weighted average (weights determined by the inverse MSE of individual models during training), and directional voting (predictions aligned with the majority direction of component models, converted to regression values via linear scaling).
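The three static baselines can be sketched for a single time step as below. The arithmetic mean and the inverse-MSE weighting follow the descriptions above. The directional vote is a simplified stand-in: it averages the predictions that agree with the majority direction rather than applying the linear scaling mentioned in the text, whose details are not given. All function names and numbers are hypothetical.

```python
def arithmetic_mean(preds):
    """Simple average of all model predictions."""
    return sum(preds) / len(preds)


def inverse_mse_weighted(preds, train_mse):
    """Weighted average with weights proportional to 1 / training MSE."""
    weights = [1.0 / m for m in train_mse]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, preds)) / total


def directional_vote(preds, last_price):
    """Majority direction, then the mean of the predictions that agree with it.

    Stand-in for the paper's conversion step, which is not specified here.
    """
    ups = [p for p in preds if p > last_price]
    downs = [p for p in preds if p <= last_price]
    majority = ups if len(ups) >= len(downs) else downs
    return sum(majority) / len(majority)


# Hypothetical one-step predictions from four component models.
preds = [101.0, 103.0, 99.0, 102.0]
train_mse = [0.5, 1.0, 2.0, 1.0]
print(arithmetic_mean(preds))
print(inverse_mse_weighted(preds, train_mse))
print(directional_vote(preds, last_price=100.0))
```

All three rules use fixed weights or a fixed voting scheme at every step, which is the property the dynamic DQN-driven selection is designed to relax.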

Table 11 presents the MSE results of these strategies across the four financial datasets, highlighting the superiority of the proposed DQN-HTS-EF’s adaptive model selection in handling market volatility and regime shifts. We can see that the proposed DQN-HTS-EF consistently achieved the lowest values across all datasets. For instance, in the CSI 100 stock index, its value was 0.0015, far lower than the values of 0.0087, 0.0020, and 0.0140 obtained by the arithmetic mean, weighted average, and directional voting, respectively. Similar advantages were also seen in corn futures, China Unicom stock, and AMZN stock.

4.5.4. Convergence Analysis

Figure 7a,b depict the MSE loss against training epochs for corn futures and the CSI 100 index, respectively, and show consistent convergence behavior. The gradual decrease in loss indicates that training was stable and ran for enough iterations, with no sign of divergence.

5. Conclusions

In this paper, we propose a hybrid framework that combines well-established components, namely a domain-specific financial sentiment dictionary, heterogeneous Transformer variants, and a Deep Q-Network (DQN)-driven dynamic ensembling strategy, in a way tailored to the challenges of financial time-series forecasting: market volatility, nonlinear regime shifts, and the integration of quantitative indicators with investor sentiment. The framework constructs a 16,673-term sentiment dictionary using SnowNLP and Word2Vec, achieving up to 97.35% accuracy in classifying forum titles, which effectively captures financial domain-specific terminology. By combining Support Vector Regression (SVR) for linear trend modeling with three Transformer architectures (a Base-Transformer, Multi-Transformer, and Bi-Transformer) for nonlinear dependency extraction, the framework balances stability and adaptability. The DQN agent dynamically fuses model predictions based on real-time volatility, reducing the RMSE by a large margin on average across diverse datasets.

Comprehensive experiments on the CSI 100 index, corn futures, China Unicom, and Amazon stock data demonstrated the framework’s superiority over state-of-the-art approaches. For instance, on the CSI 100 index, it achieved the lowest MAE of 106.9365. On corn futures, it reduced the RMSE to 30.835, a 40.5% improvement over MCS (51.845). On China Unicom, it lowered the MAE from CNN-BiLSTM’s 0.110 to 0.075, and on Amazon it improved on CNN-BiLSTM by 4.1% in the MAE and 0.8% in the RMSE. This demonstrates the multi-market generalization of our framework across RMB equities, USD tech stocks, and agricultural futures and validates its robustness during global events like the Russia–Ukraine war and the COVID-19 pandemic.

This research advances financial forecasting by bridging sentiment analysis with adaptive model integration, showing that DRL-driven ensembling enhances prediction robustness. Practically, it offers a scalable solution for real-time markets, requiring minimal pretraining compared with static ensembles. The framework’s dynamic weighting of SVR and Transformers optimizes performance in both stable and volatile regimes.

Future work will focus on expanding the corpus to include more linguistic variations and on exploring adaptive thresholding methods or more sophisticated sentiment models, possibly incorporating contextual or deep learning approaches, to better reflect sentiment subtleties in financial texts and thereby further enhance dictionary construction. In terms of prediction models, integrating event-driven features and exploring advanced RL algorithms such as Proximal Policy Optimization (PPO) is one possible direction for handling rare black swan events, for which only limited historical data exist, and for extending the framework to multi-time-scale intraday forecasting. Overall, this research establishes a robust paradigm for sentiment-driven financial prediction, combining NLP, deep learning, and DRL to enable adaptive decision making in dynamic markets.

Author Contributions

Conceptualization, Z.S. and H.S.-H.T.; data curation, Z.S.; formal analysis, Z.S. and H.S.-H.T.; funding acquisition, H.S.-H.T. and W.-L.L.; investigation, Z.S. and H.S.-H.T.; methodology, Z.S., H.S.-H.T. and R.T.-C.H.; project administration, H.S.-H.T., R.T.-C.H., Y.Z. and W.-L.L.; resources, Z.S.; software, Z.S.; supervision, H.S.-H.T. and R.T.-C.H.; validation, Z.S. and H.S.-H.T.; visualization, Z.S.; writing—original draft, Z.S., H.S.-H.T. and R.T.-C.H.; writing—review and editing, H.S.-H.T., R.T.-C.H., Y.Z. and W.-L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Hong Kong Chu Hai College.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations were used in this manuscript:

AMZN	Amazon
ARIMA	Autoregressive Integrated Moving Average
BERT	Bidirectional Encoder Representations from Transformers
BiGRU	Bidirectional Gated Recurrent Unit
BiLSTM	Bidirectional Long Short-Term Memory
CEEMDAN	Complete Ensemble Empirical Mode Decomposition with Adaptive Noise
CNN	Convolutional Neural Network
CSI	China Securities Index
CST	China Standard Time
DRL	Deep Reinforcement Learning
DQN	Deep Q-Network
DQN-HTS-EF	DQN–Hybrid Transformer–SVR Ensemble Framework
ECA	Efficient Channel Attention
EMD	Empirical Mode Decomposition
ERNIE	Enhanced Representation from Knowledge Integration
FinBERT	Financial Bidirectional Encoder Representations from Transformers
FIVMD	Fast Iterative Variational Mode Decomposition
FN	False Negative
FP	False Positive
GARCH	Generalized Autoregressive Conditional Heteroskedasticity
GC-CNN	Graph Convolutional Neural Network
GRU	Gated Recurrent Unit
GWO	Grey Wolf Optimizer
LSTM	Long Short-Term Memory
MA	Moving Average
MAE	Mean Absolute Error
MAPE	Mean Absolute Percentage Error
MCS	Model Confidence Set
MLP	Multilayer Perceptron
MSE	Mean-Squared Error
NMSE	Negative Mean-Squared Error
NLP	Natural Language Processing
PPO	Proximal Policy Optimization
regex	Regular Expression
RL	Reinforcement Learning
RMB	Renminbi Currency
RMSE	Root-Mean-Squared Error
RNN	Recurrent Neural Network
SHAP	Shapley Additive Explanations
SOTA	State-of-the-Art
SSA	Sparrow Search Algorithm
SSA-BiGRU	Sparrow Search Algorithm–Bidirectional Gated Recurrent Unit
SVM	Support Vector Machine
SVR	Support Vector Regression
SWA	Sliding-Window Weighted Average
TP	True Positive
USD	US Dollar
VMD-SE-GRU	Variational Mode Decomposition–Squeeze-and-Excitation–Gated Recurrent Unit

References

Tetlock, P.C. Giving Content to Investor Sentiment: The Role of Media in Stock Markets. J. Financ. 2007, 62, 1139–1168.
Lu, Y. Investor Sentiment and Stock Price Change: Evidence from CSI 500 Index. E-Commer. Lett. 2025, 14, 836–846.
Box, G.E.; Jenkins, G.M. Time Series Analysis: Forecasting and Control, 1st ed.; Holden-Day: San Francisco, CA, USA, 1970.
Bollerslev, T. Generalized Autoregressive Conditional Heteroskedasticity. J. Econom. 1986, 31, 307–327.
LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
Lauriola, I.; Lavelli, A.; Aiolli, F. An introduction to Deep Learning in Natural Language Processing: Models, techniques, and tools. Neurocomputing 2022, 470, 443–456.
Casolaro, A.; Capone, V.; Iannuzzo, G.; Camastra, F. Deep Learning for Time Series Forecasting: Advances and Open Problems. Information 2023, 14, 598.
Connor, J.; Martin, R.; Atlas, L. Recurrent Neural Networks and Robust Time Series Prediction. IEEE Trans. Neural Netw. 1994, 5, 240–254.
Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv 2014, arXiv:1412.3555.
Graves, A.; Schmidhuber, J. Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures. Neural Netw. 2005, 18, 602–610.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30, pp. 5998–6008.
Nayak, G.H.H.; Patra, M.R.; Swain, R.K. Transformer-Based Deep Learning Architecture for Time Series Forecasting. Softw. Impacts 2024, 22, 100716.
Li, Z.Q.; Li, D.S.; Sun, T.S. A Transformer-Based Bridge Structural Response Prediction Framework. Sensors 2022, 22, 3100.
Wang, J.; Liu, J.; Jiang, W. An Enhanced Interval-Valued Decomposition Integration Model for Stock Price Prediction Based on Comprehensive Feature Extraction and Optimized Deep Learning. Expert Syst. Appl. 2023, 243, 122891.
Nelson, D.M.Q.; Pereira, A.C.M.; de Oliveira, R.A. Stock Market’s Price Movement Prediction with LSTM Neural Networks. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; IEEE: New York, NY, USA, 2017; pp. 1419–1426.
Xu, Y.; Chhim, L.; Zheng, B.; Nojima, Y. Stacked Deep Learning Structure with Bidirectional Long-Short Term Memory for Stock Market Prediction. In Proceedings of the Neural Computing for Advanced Applications, Kyoto, Japan, 12–14 October 2020; Springer: Cham, Switzerland, 2020; pp. 33–44.
Wang, Y.; Liu, Q.; Hu, Y.; Liu, H. A Study of Futures Price Forecasting with a Focus on the Role of Different Economic Markets. Information 2024, 15, 817.
Chen, W.; Jiang, M.; Zhang, W.-G.; Chen, Z. A Novel Graph Convolutional Feature Based Convolutional Neural Network for Stock Trend Prediction. Inf. Sci. 2021, 556, 67–94.
Mahmoodzadeh, A.; Nejati, H.R.; Mohammadi, M.; Ibrahim, H.H.; Rashidi, S.; Rashid, T.A. Forecasting Tunnel Boring Machine Penetration Rate Using LSTM Deep Neural Network Optimized by Grey Wolf Optimization Algorithm. Expert Syst. Appl. 2022, 209, 118303.
Lin, Y.; Lin, Z.; Liao, Y.; Li, Y.; Xu, J.; Yan, Y. Forecasting the Realized Volatility of Stock Price Index: A Hybrid Model Integrating CEEMDAN and LSTM. Expert Syst. Appl. 2022, 206, 117736.
Li, X.; Ma, X.; Xiao, F.; Xiao, C.; Wang, F.; Zhang, S. Time-series production forecasting method based on the integration of Bidirectional Gated Recurrent Unit (Bi-GRU) network and Sparrow Search Algorithm (SSA). J. Pet. Sci. Eng. 2022, 208 Pt A, 109309.
Zhang, S.; Luo, J.; Wang, S.; Liu, F. Oil Price Forecasting: A Hybrid GRU Neural Network Based on Decomposition–Reconstruction Methods. Expert Syst. Appl. 2023, 218, 119617.
Chen, Y.; Fang, R.; Liang, T.; Sha, Z.; Li, S.; Yi, Y.; Zhou, W.; Song, H. Stock Price Forecast Based on CNN-BiLSTM-ECA Model. Sci. Program. 2021, 2021, 2446543.
Kabir, M.R.; Bhadra, D.; Ridoy, M.; Milanova, M. LSTM–Transformer-Based Robust Hybrid Deep Learning Model for Financial Time Series Forecasting. Sci 2025, 7, 7.
Hansen, P.R.; Lunde, A.; Nason, J.M. The Model Confidence Set. Econometrica 2011, 79, 453–497. Available online: https://www.jstor.org/stable/41057463 (accessed on 2 September 2025).
Zeng, L.; Hu, H.; Song, Q.; Zhang, B.; Lin, R.; Zhang, D. A drift-aware dynamic ensemble model with two-stage member selection for carbon price forecasting. J. Energy 2024, 313, 133699.
Ghosh, I.; Chaudhuri, T.D.; Isskandarani, L.; Abedin, M.Z. Predicting financial cycles with dynamic ensemble selection frameworks using leading, coincident and lagging indicators. Res. Int. Bus. Financ. 2025, 80, 103114.
Omoware, J.M.; Abiodun, O.J.; Wreford, A.I. Predicting Stock Series of Amazon and Google Using Long Short-Term Memory (LSTM). Asian Res. J. Curr. Sci. 2023, 5, 205–217. Available online: https://jofscience.com/index.php/ARJOCS/article/view/17 (accessed on 2 September 2025).
Hu, J.; Cen, Y.; Wu, C. Automatic Construction of Domain Sentiment Dictionary Based on Deep Learning: A Case Study of Financial Domain. Data Anal. Knowl. Discov. 2018, 2, 95–102.
Loughran, T.; McDonald, B. When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks. J. Financ. 2011, 66, 35–68.
Li, J.; Qian, S.; Li, L.; Guo, Y.; Wu, J.; Tang, L. A Novel Secondary Decomposition Method for Forecasting Crude Oil Price with Twitter Sentiment. Energy 2024, 290, 129954.
Xu, Y.; Cohen, S.B. Stock Movement Prediction from Tweets and Historical Prices. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, 15–20 July 2018; Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 1970–1979.
Gui, J.; Naktnasukanjn, N.; Yu, X.; Ramasamy, S.S. Research on the Impact of Economic Policy Uncertainty and Investor Sentiment on the Growth Enterprise Market Return in China—An Empirical Study Based on TVP-SV-VAR Model. Int. J. Financ. Stud. 2024, 12, 108.
Fu, K.; Zhang, Y. Incorporating Multi-Source Market Sentiment and Price Data for Stock Price Prediction. Mathematics 2024, 12, 1572.
Smatov, N.; Kalashnikov, R.; Kartbayev, A. Development of Context-Based Sentiment Classification for Intelligent Stock Market Prediction. Big Data Cogn. Comput. 2024, 8, 51.
Kim, J.; Kim, H.-S.; Choi, S.-Y. Forecasting the S&P 500 Index Using Mathematical-Based Sentiment Analysis and Deep Learning Models: A FinBERT Transformer Model and LSTM. Axioms 2023, 12, 835.
Umer, M.; Awais, M.; Muzammul, M. Stock Market Prediction Using Machine Learning (ML) Algorithms. ADCAIJ Adv. Distrib. Comput. Artif. Intell. J. 2019, 8, 97–116.
Sun, F.; Belatreche, A.; Coleman, S.; McGinnity, T.M. Pre-Processing Online Financial Text for Sentiment Classification: A Natural Language Processing Approach. In Proceedings of the IEEE Computational Intelligence for Financial Engineering and Economics 2014, London, UK, 21–23 April 2014; IEEE: New York, NY, USA, 2014.
Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. In Proceedings of the International Conference on Learning Representations, Scottsdale, AZ, USA, 2–4 May 2013; pp. 1–12.
Drucker, H.; Burges, C.J.C.; Kaufman, L.; Smola, A.; Vapnik, V. Support vector regression machines. Neural Inf. Process. Syst. 1996, 9, 155–161.
Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing Atari with Deep Reinforcement Learning. arXiv 2013, arXiv:1312.5602.
Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G. Human-Level Control through Deep Reinforcement Learning. Nature 2015, 518, 529–533.
Li, S.; Zhao, Z.; Hu, R.; Li, W.; Liu, T.; Du, X. Analogical Reasoning on Chinese Morphological and Semantic Relations. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2018, Melbourne, Australia, 15–20 July 2018; Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; Volume 2, pp. 138–143.
Liu, Y.; Mikriukov, D.; Tjahyadi, O.C.; Li, G.; Payne, T.R.; Yue, Y.; Siddique, K.; Man, K.L. Revolutionising Financial Portfolio Management: The Non-Stationary Transformer’s Fusion of Macroeconomic Indicators and Sentiment Analysis in a Deep Reinforcement Learning Framework. Appl. Sci. 2024, 14, 274.
Digquant Financial Database. Available online: https://digquant.com/ (accessed on 2 September 2025).
Yahoo Finance. Available online: https://finance.yahoo.com/ (accessed on 2 September 2025).
Eastmoney Stock Bar. Available online: https://guba.eastmoney.com/ (accessed on 19 June 2025).
Jieba Documentation. Available online: https://github.com/fxsjy/jieba (accessed on 19 June 2025).
Sen, D.; Deora, B.S.; Vaishnav, A. Explainable Deep Learning for Time Series Analysis: Integrating SHAP and LIME in LSTM-Based Models. J. Inf. Syst. Eng. Manag. 2025, 10, 412–423.
Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. Available online: https://dl.acm.org/doi/10.5555/2627435.2670313 (accessed on 2 September 2025).
Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the International Conference for Learning Representations, San Diego, CA, USA, 7–9 May 2015; pp. 1–15.

Figure 1. Overview of our proposed hybrid framework architecture.

Figure 2. Heterogeneous feature fusion pipeline for causal price forecasting.

Figure 3. Model prediction across time on CSI 100 index.

Figure 4. Model prediction across time on corn futures.

Figure 5. Model prediction across time on China Unicom stock.

Figure 6. Model prediction across time on Amazon stock.

Figure 7. MSE loss against epochs for Transformer variants and DQN. (a) Corn futures. (b) CSI 100 index.

Table 1. Dataset summary with the details of training and testing periods.

Assets	Related Works for Dataset	Start Date and End Date	Train–Test Split Ratio	Test Data Duration	Important Global Events During Test Period
CSI 100 stock index	[15]	01.01.2020–31.12.2024	80%:20% (969:243)	02.01.2024–31.12.2024	Russia–Ukraine war
Corn futures	[18]
China Unicom stock	[25]
AMZN stock	[25]	25.05.2017–05.04.2023	70%:30% (1034:442)	06.07.2021–05.04.2023	COVID-19 pandemic

Table 2. Sentimental scoring accuracy of our domain-specific financial sentiment dictionary using different SnowNLP thresholds.

Assets	Number of Forum Titles	Positive Labels, % (Count)	Negative Labels, % (Count)	SnowNLP Thresholds ($TH_{up}$, $TH_{low}$)	Overall Accuracy (%)
CSI 100 stock index	92,184	52.57 (48,459)	9.52 (8774)	(0.7, 0.3)	97.00
				(0.8, 0.2)	92.84
				(0.9, 0.1)	96.75
Corn futures	135,110	42.23 (57,055)	15.75 (21,285)	(0.7, 0.3)	97.35
				(0.8, 0.2)	95.94
				(0.9, 0.1)	97.28
China Unicom stock	153,023	53.91 (82,491)	11.12 (16,857)	(0.7, 0.3)	97.16
				(0.8, 0.2)	96.28
				(0.9, 0.1)	95.88
AMZN stock	7382	77.43 (5716)	6.18 (456)	(0.7, 0.3)	90.53
				(0.8, 0.2)	86.88
				(0.9, 0.1)	81.79

Table 3. Sentimental scoring accuracy of our domain-specific financial sentiment dictionary using different cosine similarity thresholds.

Cosine Similarity Threshold	CSI 100 Stock Index (%)	Corn Futures (%)	China Unicom Stock (%)	AMZN Stock (%)	Overall Accuracy (%)
0.7	96.07	97.23	96.21	82.33	92.96
0.8	97.00	97.35	97.16	90.53	95.51
0.9	93.62	96.93	95.68	83.95	92.55

Table 4. Regression analysis for the sentiment score.

Assets	Regression Coefficient (Constant)	Regression Coefficient (Sentiment Score)	t-Value (Constant)	t-Value (Sentiment Score)	R² Value
CSI 100 stock index	6382.353	26.888	59.624	0.105	0.9395
Corn futures	2631.048	−327.564	132.487	−4.703	0.9270
China Unicom stock	4.541	−0.488	53.235	−2.450	0.8889
AMZN stock	98.443	26.526	59.057	9.910	0.9489

Table 5. Model comparison on CSI 100 index dataset.

Model	MAPE	RMSE	MAE
SVR [15]	10.993	469.617	393.234
GRU [15]	7.887	415.654	353.276
LSTM [15]	7.054	382.569	313.854
FIVMD-LSTM [15]	2.772	154.603	116.563
GWO-LSTM [20]	6.083	326.457	265.384
CEEMDAN-LSTM [21]	5.249	296.312	233.445
SSA-BIGRU [22]	13.688	545.211	501.139
VMD-SE-GRU [23]	3.316	192.817	148.918
MCS [26]	1.798	146.625	113.614
FinBERT [37]	2.273	186.770	139.047
Proposed DQN-HTS-EF	2.027	148.796	106.937

Table 6. Model comparison on corn futures dataset.

Model	MAPE	RMSE	MAE
TCN [18]	2.532	85.720	70.128
GRU [18]	2.347	78.946	65.015
LSTM [18]	2.093	74.215	59.657
SCINet [18]	1.634	55.404	45.190
MCS [26]	1.588	51.845	41.808
FinBERT [37]	1.379	34.976	28.542
Proposed DQN-HTS-EF	1.075	30.835	24.826

Table 7. Model comparison on China Unicom stock dataset.

Model	MSE	RMSE	MAE
CNN [19]	0.037	0.193	0.134
LSTM [9]	0.036	0.189	0.128
BiLSTM [11]	0.035	0.189	0.132
CNN-LSTM [24]	0.030	0.174	0.110
CNN-BiLSTM [24]	0.029	0.170	0.110
BiLSTM-ECA [24]	0.039	0.198	0.142
CNN-LSTM-ECA [24]	0.032	0.180	0.127
CNN-BiLSTM-ECA [24]	0.028	0.167	0.103
LSTM-mTrans-MLP [25]	0.018	0.133	0.092
MCS [26]	0.029	0.170	0.143
FinBERT [37]	0.024	0.154	0.107
Proposed DQN-HTS-EF	0.012	0.108	0.075

Table 8. Model comparison on AMZN stock dataset.

Model	MAE	MSE	RMSE
Linear regression [38]	72.47	7231.59	85.04
Exponential smoothing [33]	16.62	363.83	19.074
LSTM [29]	14.97	418.97	20.468
CNN-BiLSTM [24]	4.518	28.478	5.336
MCS [26]	23.729	841.835	29.014
FinBERT [37]	6.420	66.731	8.169
Proposed DQN-HTS-EF	4.335	28.018	5.293

Table 9. SHAP analysis of sentiment scores and model predictions across datasets.

Assets	Sentiment Score SHAP Importance	Dominant Model (Highest SHAP Importance)	Dominant Model SHAP Importance
CSI 100 stock index	0.0378	Multi-Transformer	0.1687
Corn futures	0.003	Bi-Transformer	0.2289
China Unicom stock	0.0088	SVR	0.1186
AMZN stock	0.0367	Multi-Transformer and SVR (tied)	1.5589/1.5510

Table 10. Performance analysis using different training data portions.

Assets	Train–Test (60:20)			Train–Test (70:20)			Train–Test (80:20)
Assets	MAPE	RMSE	MAE	MAPE	RMSE	MAE	MAPE	RMSE	MAE
CSI 100 stock index	1.872	151.025	117.695	1.868	150.448	117.449	2.027	148.796	106.937
Corn futures	1.547	52.184	41.307	1.287	43.128	34.042	1.075	30.835	24.826
	MSE	RMSE	MAE	MSE	RMSE	MAE	MSE	RMSE	MAE
China Unicom stock	0.028	0.167	0.138	0.029	0.172	0.144	0.012	0.108	0.075
	Train–Test (50:30)			Train–Test (60:30)			Train–Test (70:30)
	MAE	MSE	RMSE	MAE	MSE	RMSE	MAE	MSE	RMSE
AMZN stock	29.787	1212.208	34.817	28.811	1164.353	34.123	4.335	28.018	5.293

Table 11. MSE comparison of ensemble strategies on diverse datasets.

Assets	Proposed DQN-HTS-EF	Arithmetic Mean	Weighted Average	Directional Voting
CSI 100 stock index	0.0015	0.0087	0.0020	0.0140
Corn futures	0.0008	0.0018	0.0011	0.0036
China Unicom stock	0.0117	0.0196	0.0146	0.0287
AMZN stock	0.0018	0.0025	0.0024	0.0033

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI