1. Introduction
Stock price forecasting remains a cornerstone of financial time series analysis, yet it poses significant challenges due to the intricate interplay of heterogeneous factors, including historical price movements, inter-stock relationships, and market sentiment derived from financial news. The inherent noise and non-stationarity in financial data further exacerbate these challenges, making accurate prediction difficult (Pilla & Mekonen, 2025; Qian et al., 2024). The complexity of financial markets, characterized by nonlinear dynamics and high volatility, demands models capable of capturing both temporal dependencies and structural relationships among stocks.
Traditional approaches, such as autoregressive models (e.g., ARIMA; Khashei et al., 2009) and volatility models (e.g., GARCH; H. Kim & Won, 2018), often rely on linear assumptions and univariate analyses, which limit their ability to capture the nonlinear and dynamic behaviors of financial markets. Moreover, prior studies (Vera Barberán, 2020) point out that these models tend to overlook critical external factors, such as macroeconomic indicators and economic news.
To address these limitations, recent research has explored deep learning techniques, particularly recurrent neural networks (RNNs) and their variants, such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs), to model temporal dependencies in stock price movements (Eapen et al., 2019; Jin et al., 2020; Moghar & Hamiche, 2020; Sherstinsky, 2020; Shi et al., 2024; Xu & Keselj, 2019). These models have demonstrated superior performance compared to traditional time series methods due to their ability to capture nonlinear relationships and long-range dependencies in sequential data. However, RNN-based approaches often treat stocks as independent entities, neglecting crucial inter-dependencies arising from industry affiliations, investor behavior, and macroeconomic linkages, even though these relationships are critical for accurate stock price forecasting in complex financial markets (Krishnan et al., 2024; Zabaleta et al., 2024).
More recently, Graph Neural Networks (GNNs) have emerged as a promising tool for modeling structural dependencies among stocks by representing market relationships as graphs (Chen et al., 2018; Shi et al., 2024). Early GNN-based approaches often constructed graphs based on static correlations or pre-defined relationships, limiting their adaptability to dynamic market conditions (Zheng et al., 2025). Subsequent studies have explored more sophisticated graph construction techniques, such as adaptive graph learning and attention mechanisms, to capture evolving inter-stock relationships (K. X. Li, 2025). However, GNNs are inherently constrained by small receptive fields and sensitivity to noise (Al-Omari & Al-Omari, 2025; Qian et al., 2024; Wang & Cai, 2020), which hinder their ability to capture complex, higher-order dependencies (Krieg et al., 2024) and lead to issues like oversmoothing (Kong et al., 2024; Wang et al., 2025). Oversmoothing, in particular, can diminish the distinctiveness of node features, making it difficult to differentiate between stocks and limiting the model’s predictive capacity. Furthermore, noise in the graph structure can propagate through the network, corrupting node representations and further degrading performance.
Additionally, sentiment analysis from financial news, while offering valuable early signals of market movements, often relies on fixed or arbitrary time windows for aggregation, introducing temporal bias and reducing predictive accuracy (Qian et al., 2024). The challenge lies in determining the optimal time frame for aggregating sentiment signals; too short a window may miss relevant information, while too long a window may dilute the signal with irrelevant noise (Qian et al., 2024). Moreover, fixed aggregation windows do not account for how the impact of news sentiment evolves over time and varies across stocks, which depends on market conditions and stock-specific factors. The existing literature also presents conflicting evidence regarding the impact of sentiment on stock prices, with some studies suggesting a positive correlation and others indicating a more complex, nuanced relationship. These inconsistencies highlight the need for more sophisticated methods for sentiment extraction and integration into forecasting models.
To address these challenges, we propose the Diffusion-Aware Sentiment Fusion Network (DASF-Net), a novel multimodal framework that synergistically integrates diffusion-based graph learning with sentiment-aware representations derived from pretrained language models. Unlike traditional regression models, DASF-Net applies diffusion processes to two complementary financial graphs: an industry graph encoding static sectoral relationships and a fundamental graph, recalibrated daily, that tracks non-stationary return co-movements. This dual-graph design captures both local and global dependencies among stocks.
The diffusion formulation mitigates key GNN limitations, such as oversmoothing and restricted receptive fields, by propagating information across larger graph neighborhoods while maintaining computational efficiency through sparsification techniques (Gasteiger et al., 2022). Concretely, DASF-Net addresses these deficiencies through two principal mechanisms: (a) heat-kernel diffusion over the complementary industry and fundamental graphs, which enlarges the receptive field while preserving node individuality; and (b) daily re-estimation of graph edges, which ensures regime awareness and immunity to stale correlations. Concurrently, DASF-Net extracts sentiment embeddings from financial news using FinBERT, a domain-specific language model tailored for financial text (Shobayo et al., 2024). To mitigate temporal bias, we systematically identify an optimal 3-day aggregation window for sentiment, calibrated to the empirically observed decay of news influence, ensuring that the model captures temporally relevant signals without diluting predictive power. These structural and sentiment modalities are fused via a multi-head attention (MHA) mechanism, which dynamically prioritizes relevant features based on market conditions, enhancing the model’s adaptability to volatile financial environments.
Our model leverages daily stock prices and news sentiment for forecasting over 1-day, 2-day, and 3-day horizons, as detailed in Section 4. The experiments, conducted on a dataset comprising 12 major S&P 500 stocks from 2020 to 2023, demonstrate that DASF-Net significantly outperforms state-of-the-art baselines, such as MGAR (Song et al., 2023) and Sentiment+LSTM (Jin et al., 2020), achieving up to a 91.6% relative reduction in Mean Squared Error (MSE). These results underscore the effectiveness of combining diffusion-based graph learning with optimized sentiment integration, providing a robust framework for financial forecasting. By explicitly addressing the limitations of prior work, such as the oversimplification of inter-stock relationships and the use of arbitrary sentiment windows, DASF-Net sets a strong benchmark for multimodal stock price prediction.
In summary, our work offers the following key contributions:
Diffusion-Based Graph Learning: We introduce diffusion-based graph learning over dual financial graphs (industry and fundamental) to capture higher-order stock dependencies, overcoming the limitations of traditional GNNs in terms of receptive field and noise sensitivity.
Optimized Sentiment Aggregation: We propose a systematic approach to identify an optimal 3-day time window for sentiment aggregation, minimizing temporal bias and enhancing predictive accuracy across multiple stock categories, in contrast to fixed-window approaches.
Adaptive Multimodal Fusion: We develop a multi-head attention mechanism to dynamically integrate structural and sentiment features, enabling adaptive weighting of modalities under varying market conditions and enhancing generalization and resilience compared to static fusion methods.
The remainder of this paper is organized as follows:
Section 2 reviews related work.
Section 3 details the proposed DASF-Net methodology.
Section 4 outlines the experimental setup.
Section 5 presents the results and analysis. Finally,
Section 6 concludes the paper with a summary and future research directions.
2. Related Work
This section reviews key research areas relevant to the DASF-Net framework, including statistical and deep learning models for stock price forecasting, graph-based methods for modeling inter-stock relationships, sentiment analysis in financial forecasting, and multimodal fusion techniques. We highlight the limitations of existing approaches and demonstrate how they motivate the design of DASF-Net, which integrates diffusion-based graph learning with optimized sentiment aggregation and adaptive fusion to address these gaps.
2.1. Statistical and Deep Learning Approaches for Financial Time Series Prediction
Early approaches to stock price forecasting relied on statistical time series models such as ARIMA (Box et al., 2015) and GARCH (Bollerslev, 1986), which model linear trends and volatility clustering. While computationally efficient, these models struggle to capture the nonlinear and dynamic behaviors inherent in financial markets, limiting their predictive accuracy in volatile conditions.
Machine learning methods, including Support Vector Machines (SVMs) (J. Cao et al., 2003), Random Forests (Liaw & Wiener, 2002), and Gradient Boosting Machines (Friedman, 2001), improve flexibility by leveraging hand-crafted features but often fail to explicitly model temporal dependencies, leading to suboptimal performance in long-term forecasting. Furthermore, these models typically operate in isolation, neglecting the complex inter-dependencies that exist among different stocks and sectors within the financial market.
Deep learning models have significantly advanced sequence modeling capabilities. Long Short-Term Memory (LSTM) networks (Hochreiter & Schmidhuber, 1997) and Bidirectional LSTMs (BiLSTMs) (Schuster & Paliwal, 1997) are able to capture temporal dependencies in stock prices, with notable improvements over traditional methods. Attention mechanisms have also been incorporated to enhance the ability of these models to focus on the most relevant time steps (Bahdanau et al., 2015; Vaswani et al., 2017). Hybrid architectures, such as CNN-BiLSTM with attention mechanisms (Livieris et al., 2020), further enhance feature extraction by combining convolutional and recurrent layers.
These models, however, are primarily unimodal, focusing solely on price data and neglecting critical external signals such as inter-stock relationships and market sentiment. This limitation restricts their ability to model the multifaceted dynamics of financial markets and capture the subtle nuances that drive stock price fluctuations. Unlike these unimodal approaches, DASF-Net integrates structural dependencies and sentiment signals through diffusion-based graph learning and optimized temporal aggregation, enabling a more comprehensive understanding of market behavior and leading to improved predictive performance.
2.2. Graph Neural Networks and Diffusion-Based Learning in Financial Modeling
The financial market is a complex system where the behavior of individual stocks is influenced by their relationships with other entities. These relationships can arise from various factors, including industry affiliations, supply chain linkages, and investor sentiment. Capturing these interdependencies is crucial for accurate stock price forecasting.
Graph Neural Networks (GNNs) have emerged as a powerful tool for modeling such relationships by representing the market as a graph, where nodes represent stocks and edges represent connections between them (Satishbhai Sonani et al., 2025). Early GNN-based approaches often constructed graphs based on static correlations or pre-defined relationships, limiting their adaptability to dynamic market conditions (Chauhan, 2025; Hu & Wang, 2025). For instance, conventional methods rely on Pearson correlation coefficients computed over a fixed period to determine edge weights, assuming that inter-stock relationships remain constant over time. These approaches lack adaptability to the shifting dynamics of real-world markets, where correlations can shift rapidly in response to economic events and investor behavior.
More recent studies have explored adaptive graph learning techniques to capture evolving inter-stock relationships (Cui et al., 2023; J. Kim et al., 2019; Sawhney et al., 2021). These methods typically employ attention mechanisms or learnable similarity metrics to dynamically adjust edge weights based on the current market state. For example, the MGAR framework utilizes a meta-graph structure to capture both local and global dependencies among stocks, adapting the graph structure over time based on market conditions (Song et al., 2023). However, even these adaptive approaches often suffer from limitations such as small receptive fields and sensitivity to noise, which hinder their ability to capture long-range structural patterns. Furthermore, GNNs are prone to oversmoothing, where repeated message passing can cause node representations to converge, diminishing their distinctiveness and reducing predictive accuracy.
Diffusion-based graph learning addresses these limitations by propagating information across larger graph neighborhoods while maintaining computational efficiency (Atwood & Towsley, 2016; Chang et al., 2020; Y. Li et al., 2018; Vignac et al., 2023). By simulating a diffusion process on the graph, these methods capture both local and global dependencies among stocks, overcoming the restricted receptive fields and noise sensitivity of conventional message passing. Additionally, sparsification techniques can be employed to reduce computational complexity and mitigate the effects of noise, resulting in more robust and accurate representations (You et al., 2024; S. Zhao et al., 2025). Motivated by this, DASF-Net leverages diffusion processes on two complementary financial graphs—an industry graph and a fundamental graph—to capture a richer set of inter-stock relationships.
2.3. Sentiment Analysis and Temporal Aggregation in Stock Prediction
Sentiment analysis from financial news and social media has become an increasingly important component of stock price forecasting (Araci, 2019; J. Kim et al., 2023; R. Zhang et al., 2023). The premise is that news events and opinions expressed online can influence investor behavior and, consequently, stock prices. Early sentiment analysis techniques relied on simple lexicon-based methods, which assign sentiment scores to text based on the presence of positive or negative keywords (Taboada et al., 2011; L. Zhang & Liu, 2023). However, these methods often fail to capture the nuances of financial language, leading to inaccurate sentiment assessments (Rizinski et al., 2024).
More recently, deep learning models, particularly transformer-based architectures such as BERT and its variants, have demonstrated superior performance in sentiment analysis tasks. FinBERT, a BERT model pretrained on financial text, has shown particularly strong performance in capturing sentiment in the financial domain (J. Kim et al., 2023). By leveraging large-scale pretraining and fine-tuning on financial datasets, FinBERT can accurately assess sentiment in news articles, social media posts, and other financial documents.
Despite the advances in sentiment analysis techniques, effectively integrating sentiment into stock price forecasting models remains a challenge (R. Gupta & Chen, 2020; Loughran & McDonald, 2020). One key issue is the determination of the optimal time window for aggregating sentiment signals (Smales, 2016; Xiao & Ihnaini, 2023). Too short a window may miss relevant information, while too long a window may dilute the signal with irrelevant noise. Existing studies often rely on fixed or arbitrary time windows, introducing temporal bias and reducing predictive accuracy (Wang et al., 2019). Moreover, the static nature of these aggregation windows fails to account for the time-varying impact of news sentiment on different stocks, which depends on market conditions and stock-specific factors (Smales, 2016).
In contrast to these fixed-window approaches, DASF-Net systematically identifies an optimal time window for sentiment aggregation, minimizing temporal bias and enhancing predictive accuracy. By empirically evaluating different window sizes, we determine the optimal aggregation period for sentiment signals, ensuring that the model captures temporally relevant information without diluting predictive power.
2.4. Multimodal Fusion Techniques for Integrating Heterogeneous Financial Data
Multimodal fusion is the process of combining information from multiple sources or modalities to improve the performance of a machine learning model (Lahat et al., 2015; F. Zhao et al., 2024). In the context of stock price forecasting, multimodal fusion involves integrating price data, inter-stock relationships, sentiment signals, and other relevant information to create a more comprehensive and accurate model (Wang, 2025; Zehtab-Salmasi et al., 2023).
Early multimodal fusion techniques relied on simple concatenation or averaging of features from different modalities (Baltrušaitis et al., 2018). However, these methods often fail to capture the complex interactions between modalities, constraining performance under real-world volatility. More recent approaches have explored attention mechanisms to dynamically weight the contribution of each modality based on the current market state. For example, attention-based fusion can allow the model to prioritize sentiment signals during periods of high market volatility or focus on inter-stock relationships during stable periods (He & Gu, 2021).
Another challenge in multimodal fusion is dealing with the heterogeneity of different modalities (Baltrušaitis et al., 2018; Gao et al., 2020). Price data are typically represented as time series, inter-stock relationships as graphs, and sentiment signals as text. To effectively combine these modalities, it is necessary to learn a shared representation space that captures the relevant information from each modality. Deep learning models, such as autoencoders and generative adversarial networks (GANs), have been used to learn such representations (Wang, 2021).
DASF-Net employs a multi-head attention (MHA) mechanism to dynamically integrate structural and sentiment features, enabling adaptive weighting of modalities under varying market conditions. This approach allows the model to prioritize the most relevant information from each modality, improving robustness and predictive accuracy compared to static fusion methods.
DASF-Net addresses limitations of prior models through three innovations. First, diffusion-based learning uses a heat kernel to propagate information across larger graph neighborhoods, mitigating oversmoothing in GNNs, as evidenced by a 12% reduction in feature similarity compared to Multi-GCGRU (Table 6, Section 5.3). Second, a 3-day sentiment aggregation window captures multi-day market trends, overcoming Sentiment-LSTM’s limited 1-day window, which misses sustained sentiment shifts (15% MSE improvement, Table 5, Section 5.2). Third, Multi-Head Attention (MHA) dynamically fuses structural and sentiment features, unlike LSTM+CNN’s static fusion, improving performance by 10% during volatile periods (Table 4, Section 5.1). These improvements are detailed in Table 1, which compares baseline models and their shortcomings with DASF-Net’s advancements.
3. Method
This section provides a detailed description of the problem formulation and the proposed Diffusion-Aware Sentiment Fusion Network (DASF-Net) framework, including mathematical formulations and implementation details to ensure clarity and reproducibility.
3.1. Problem Definition
Given a dataset containing $N$ stocks, we frame the stock price prediction task as a regression problem, where the goal is to estimate each stock’s future price at time $t+1$ based on its state at time $t$. Formally, the prediction for stock $i$ is expressed as:
$$\hat{y}_i^{t+1} = f(\mathbf{x}_i^t),$$
where $\mathbf{x}_i^t \in \mathbb{R}^M$ represents the input feature vector for stock $i$ at time $t$, and $\hat{y}_i^{t+1}$ is the predicted price at the next time step.
At the dataset level, the feature matrix $\mathbf{X}^t \in \mathbb{R}^{N \times M}$ and the corresponding label matrix $\mathbf{Y}^{t+1} \in \mathbb{R}^{N \times 1}$ capture stock attributes and their target future prices across all stocks, respectively. Here, $M$ denotes the dimensionality of each feature vector $\mathbf{x}_i^t$, and $y_i^{t+1}$ is the scalar target for stock $i$.
In this work, we enhance the input representation by incorporating two key components: (i) inter-stock relationships, captured via P-Reps, and (ii) sentiment-based features, reflecting market sentiment (positive, neutral, or negative) at time t.
3.2. Proposed Framework
The DASF-Net architecture (Figure 1) consists of five key components: (1) dual financial graph construction, (2) diffusion-based structural representation learning (Price-based Representation, P-Rep), (3) sentiment-aware representation extraction (Sentiment-Aware Representation, SA-Rep) with optimized temporal aggregation, (4) adaptive multimodal fusion via multi-head attention (MHA), and (5) temporal forecasting with LSTM.
Our framework integrates structured inter-stock dependencies with sentiment cues from financial news, adaptively learned for robust stock forecasting.
3.2.1. Dual Financial Graph Construction
To capture multifaceted inter-stock relationships, we construct two complementary graphs, each focusing on different aspects of the market structure:
Industry Graph (IG): This graph represents static, sector-based affiliations. An edge exists between two stocks if they operate within the same industry sector, reflecting inherent similarities in their business models and market exposures:
$$A^{IG}_{ij} = \begin{cases} 1, & \text{if stocks } i \text{ and } j \text{ belong to the same industry sector}, \\ 0, & \text{otherwise}, \end{cases}$$
where $A^{IG}_{ij}$ is the edge weight between stock $i$ and stock $j$ in the Industry Graph (IG). This binary value indicates the presence (1) or absence (0) of a connection based on industry sector membership.
Fundamental Graph (FG): In contrast to the static IG, the FG captures dynamic, return-based relationships. Edges in this graph reflect the similarity in historical return patterns between stocks, capturing how stocks move in relation to one another.
First, we calculate the return sequence $\mathbf{r}_i = (r_i^{t-L+1}, \dots, r_i^t)$ for each stock $i$ over a lookback period $L$. Each return $r_i^t$ is computed as:
$$r_i^t = \frac{P_i^t - P_i^{t-1}}{P_i^{t-1}},$$
where $P_i^t$ denotes the closing price of stock $i$ at time $t$.
Then, the edge weight $A^{FG}_{ij}$ between stocks $i$ and $j$ is determined by the absolute cosine similarity of their return sequences $\mathbf{r}_i$ and $\mathbf{r}_j$:
$$A^{FG}_{ij} = \left| \frac{\mathbf{r}_i \cdot \mathbf{r}_j}{\lVert \mathbf{r}_i \rVert \, \lVert \mathbf{r}_j \rVert} \right|.$$
This ensures that the edge weights reflect the degree of correlation in the stocks’ return behaviors, irrespective of the direction of the relationship (positive or negative).
Formally, each graph $G_g = (V, A_g)$, where $g \in \{IG, FG\}$, consists of a set of nodes $V$ (representing the stocks) and an adjacency matrix $A_g \in \mathbb{R}^{N \times N}$ representing the edge weights, where $N$ is the number of stocks. By integrating these dual graphs, DASF-Net effectively captures both inherent (industry-based) and emergent (return-based) relationships within the stock market, enabling a more comprehensive representation of inter-stock dependencies.
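To make the construction concrete, the following NumPy sketch builds both adjacency matrices from a sector lookup and a matrix of return sequences. The function and variable names, as well as the epsilon guard against zero-norm returns, are illustrative assumptions; only the two edge definitions above come from the text.

```python
import numpy as np

def build_industry_graph(tickers, sectors):
    """Binary IG adjacency: 1 if two distinct stocks share a sector, else 0."""
    n = len(tickers)
    A_ig = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and sectors[tickers[i]] == sectors[tickers[j]]:
                A_ig[i, j] = 1.0
    return A_ig

def build_fundamental_graph(returns):
    """FG adjacency: absolute cosine similarity between return sequences (N x L)."""
    norms = np.linalg.norm(returns, axis=1, keepdims=True) + 1e-12
    unit = returns / norms
    A_fg = np.abs(unit @ unit.T)
    np.fill_diagonal(A_fg, 0.0)  # no self-loops
    return A_fg
```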
3.2.2. Diffusion-Based Structural Representation Learning (P-Rep)
To capture multi-hop, dynamic, and non-local dependencies among stocks, we adopt a diffusion-based graph learning paradigm to encode inter-stock structural relationships. Unlike traditional GNN-based models that rely on localized message passing within fixed neighborhoods, our approach models node interactions via diffusion processes, allowing for more expressive and flexible information propagation across the graph.
Specifically, we define the structural embedding $\mathbf{e}_i^{g}$ for stock $i$ in a given graph $g$, where $g \in \{IG, FG\}$, using a diffusion process:
$$\mathbf{e}_i^{g} = \mathrm{Diffuse}\big(\mathbf{x}_i^{(0)}, A_g; K, \alpha\big).$$
Here, $\mathbf{x}_i^{(0)}$ denotes the initial feature vector of node $i$ (its return sequence over a lookback period $L$), and $A_g$ is the adjacency matrix of graph $g$. The function $\mathrm{Diffuse}(\cdot)$ simulates a diffusion process on the graph, starting from node $i$, for $K$ steps, with a diffusion rate $\alpha$. This process allows information to propagate beyond immediate neighbors, capturing deeper relational dependencies.
In our implementation, the diffusion process is defined as:
$$X^{(t+1)} = (1 - \alpha)\, \tilde{A}\, X^{(t)} + \alpha\, X^{(0)},$$
where $X^{(0)}$ is the initial feature matrix (return sequences), $\tilde{A}$ is the adjacency matrix including all edges of graph $g$, and $X^{(t)}$ is the feature matrix at diffusion step $t$. This iterative process aggregates information from increasingly distant neighbors, with the parameter $\alpha$ controlling the balance between local and global information. We then extract the structural embedding for each stock $i$ from the final diffused feature matrix $X^{(K)}$.
The resulting embeddings for IG and FG, denoted as $E^{IG}$ and $E^{FG}$, respectively, are calculated as:
$$E^{IG} = \mathrm{Diffuse}\big(R, A^{IG}; K, \alpha\big), \qquad E^{FG} = \mathrm{Diffuse}\big(R, A^{FG}; K, \alpha\big),$$
where $R$ is the matrix of return sequences for all stocks, and $A^{IG}$ and $A^{FG}$ are the adjacency matrices for the industry and fundamental graphs, respectively. Both $E^{IG}$ and $E^{FG}$ are treated as complementary P-Rep views, encoding market structure from distinct topological perspectives.
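The sketch below illustrates one plausible realization of this propagation, assuming the restart-style update above applied to a symmetrically normalized adjacency matrix; the heat-kernel variant ultimately used in the full model (Section 5.5), as well as the values of the diffusion rate and step count, are not prescribed here.

```python
import numpy as np

def diffuse(A, X0, alpha=0.15, steps=10):
    """Propagate node features X0 (N x d) over adjacency A (N x N)."""
    # Symmetric normalisation: A_norm = D^{-1/2} A D^{-1/2}.
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    A_norm = (A * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

    X = X0.copy()
    for _ in range(steps):
        # Blend neighborhood information with the original (local) signal.
        X = (1.0 - alpha) * (A_norm @ X) + alpha * X0
    return X  # row i is the structural embedding of stock i

# P-Rep views (R is the N x L matrix of return sequences):
# E_ig = diffuse(A_ig, R); E_fg = diffuse(A_fg, R)
```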
3.2.3. Sentiment-Aware Representation Extraction (SA-Rep) with Optimized Temporal Aggregation
To incorporate market sentiment, we process daily financial news associated with each stock. Let $\mathcal{N}_i^t = \{n_1, n_2, \dots, n_k\}$ be a set of $k$ news articles for stock $i$ on day $t$. We use FinBERT to extract sentiment embeddings from each article, obtaining a sentiment score $s_j$ for each article $n_j$:
$$s_j = \mathrm{FinBERT}(n_j), \quad j = 1, \dots, k.$$
The raw sentiment vector $\mathbf{s}_i^t = [s_1, s_2, \dots, s_k]$ represents the sentiment scores of all news articles related to stock $i$ on day $t$. This variable-length sequence is then compressed into a 5-dimensional feature vector using basic statistics:
$$\boldsymbol{\phi}_i^t = \mathrm{stats}(\mathbf{s}_i^t) \in \mathbb{R}^5,$$
where $\boldsymbol{\phi}_i^t$ is the vector of summary statistics of $\mathbf{s}_i^t$.
To capture temporal dynamics and optimize the aggregation window, we perform an empirical analysis to determine the optimal time window $W$ for sentiment aggregation. Through experiments on a validation set, we found that a 3-day window ($W = 3$) consistently yields the best performance across diverse stocks. The aggregated sentiment input for stock $i$ at time $t$ is then constructed as:
$$S_i^t = \big[\boldsymbol{\phi}_i^{t}; \boldsymbol{\phi}_i^{t-1}; \boldsymbol{\phi}_i^{t-2}\big] \in \mathbb{R}^{W \times 5}.$$
This matrix is flattened and passed through a fully connected layer to produce a fixed-length sentiment-aware embedding $\mathbf{e}_i^{SA}$:
$$\mathbf{e}_i^{SA} = \mathrm{FC}\big(\mathrm{flatten}(S_i^t)\big).$$
Collectively, we obtain a sentiment-aware representation matrix $E^{SA}$, which is later fused with the P-Reps from IG and FG using a Multi-Head Attention mechanism, as detailed in Section 3.2.4.
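A minimal sketch of this pipeline is shown below. It assumes the publicly available ProsusAI/finbert checkpoint from the Hugging Face hub, a signed per-article score derived from the predicted label and its confidence, mean/std/min/max/count as the five summary statistics, and a day-indexed news dictionary; the paper only specifies FinBERT scores and "basic statistics", so these concrete choices are illustrative.

```python
import numpy as np
from transformers import pipeline

# FinBERT sentiment classifier (labels: positive / negative / neutral).
finbert = pipeline("sentiment-analysis", model="ProsusAI/finbert")

def daily_sentiment_stats(articles):
    """Map a variable-length list of headlines to a 5-d statistics vector."""
    if not articles:
        return np.zeros(5)
    sign = {"positive": 1.0, "negative": -1.0, "neutral": 0.0}
    # Signed score: label sign weighted by the classifier's confidence.
    scores = np.array([sign[o["label"].lower()] * o["score"]
                       for o in finbert(articles, truncation=True)])
    return np.array([scores.mean(), scores.std(), scores.min(),
                     scores.max(), len(scores)])

def sa_rep_input(news_by_day, t, window=3):
    """Stack the last `window` days of statistics (the optimal 3-day window)."""
    return np.stack([daily_sentiment_stats(news_by_day.get(t - k, []))
                     for k in range(window)])
```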
3.2.4. Adaptive Feature Fusion via Multi-Head Attention
Since the model constructs three distinct types of embeddings—two graph-based structural representations (P-Rep from IG and FG) and one sentiment-aware representation (SA-Rep)—it is crucial to integrate them in a manner that captures their complementary contributions. These embeddings encode stock information from different perspectives: sectoral structure, behavioral correlation, and sentiment dynamics. Direct concatenation or simple pooling would fail to model the intricate dependencies and relevance between them.
To address this, we employ a Multi-Head Attention (MHA) mechanism (Figure 2), which allows the model to adaptively learn both the importance and interaction of each embedding stream. Unlike single-head attention, MHA employs multiple parallel attention heads, each focusing on different subspaces of the input features. This enhances the model’s expressiveness while maintaining computational efficiency.
Let $E = \big[E^{IG}; E^{FG}; E^{SA}\big]$ denote the concatenation of the three embedding matrices along the feature dimension, where each $E^{*} \in \mathbb{R}^{N \times d}$ corresponds to a modality-specific representation (industry, fundamental, sentiment), and $N$ is the number of stocks.
We then linearly project each modality-specific representation into query, key, and value spaces using learned parameter matrices:
$$Q^{*} = E^{*} W_Q^{*}, \qquad K^{*} = E^{*} W_K^{*}, \qquad V^{*} = E^{*} W_V^{*}, \qquad * \in \{IG, FG, SA\}.$$
Here, $W_Q^{*}, W_K^{*}, W_V^{*} \in \mathbb{R}^{d \times d_h}$ are the projection matrices for modality $*$ (IG, FG, SA), and $d_h$ is the dimensionality per head. For each attention head $h \in \{1, \dots, H\}$, we compute the scaled dot-product attention as:
$$\mathrm{head}_h = \mathrm{softmax}\!\left(\frac{Q_h K_h^{\top}}{\sqrt{d_h}}\right) V_h.$$
All attention heads are then concatenated and linearly transformed to obtain the fused representation:
$$Z = \mathrm{Concat}\big(\mathrm{head}_1, \dots, \mathrm{head}_H\big)\, W_O,$$
where $W_O$ is the output projection matrix.
This fusion layer enables the model to capture both intra-modality and inter-modality interactions dynamically. The attention weights reflect the relevance of each modality to the prediction task, allowing the model to suppress irrelevant signals while enhancing critical features. The resulting fused embedding is then passed to an LSTM layer for temporal modeling and prediction.
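For illustration, the PyTorch sketch below treats the three modality embeddings of each stock as a three-token sequence and fuses them with the library's built-in multi-head attention. Collapsing the per-modality projections into nn.MultiheadAttention, as well as the embedding size and head count, are simplifying assumptions rather than the exact parameterization above.

```python
import torch
import torch.nn as nn

class ModalityFusion(nn.Module):
    """Fuse IG, FG, and SA-Rep embeddings with multi-head self-attention."""
    def __init__(self, dim=64, heads=16):
        super().__init__()
        self.mha = nn.MultiheadAttention(embed_dim=dim, num_heads=heads,
                                         batch_first=True)
        self.proj = nn.Linear(3 * dim, dim)  # flatten the three attended views

    def forward(self, e_ig, e_fg, e_sa):
        # Each input: (N_stocks, dim). Stack as a 3-token sequence per stock.
        tokens = torch.stack([e_ig, e_fg, e_sa], dim=1)   # (N, 3, dim)
        fused, attn = self.mha(tokens, tokens, tokens)    # attention over modalities
        return self.proj(fused.flatten(1)), attn          # (N, dim), weights
```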
3.2.5. Temporal Forecasting
The Multi-Head Attention mechanism fuses features from the industry graph (IG), fundamental graph (FG), and sentiment embeddings (SA-Rep). The resulting unified representation encodes rich multi-modal information for each stock. An LSTM layer then models temporal dependencies and predicts stock prices.
While conventional feedforward neural networks are inadequate for this task due to their lack of memory, Recurrent Neural Networks (RNNs) (Sherstinsky, 2020) were designed to address this by maintaining hidden states across time steps. However, standard RNNs often struggle with vanishing gradients, limiting their ability to learn long-range dependencies. Long Short-Term Memory (LSTM) networks (Hochreiter & Schmidhuber, 1997) overcome these limitations by introducing gated memory units that selectively retain, update, or discard information over time. These gates allow the network to preserve relevant information from earlier time steps, making LSTMs particularly well suited for financial forecasting scenarios where market behavior can be influenced by events or trends occurring over extended periods.
In this study, we chose LSTM for the prediction module due to its proven effectiveness in capturing temporal dependencies in financial time series data. While more recent techniques like transformers or temporal convolutional networks have shown promise in other domains, LSTMs remain a robust and computationally efficient choice for sequence modeling, especially given the relatively short sequence lengths in our daily stock price data. Additionally, the upstream diffusion-based graph representations already capture complex spatial dependencies, making the LSTM a suitable complement for temporal modeling.
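A minimal forecasting head consistent with this description is sketched below; the hidden size and the assumption that the fused embeddings are arranged as a sequence over the lookback window are illustrative.

```python
import torch
import torch.nn as nn

class PriceForecaster(nn.Module):
    """LSTM over the sequence of fused embeddings, regressing the next price."""
    def __init__(self, in_dim=64, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, fused_seq):
        # fused_seq: (batch, T, in_dim) fused embeddings over the lookback window.
        out, _ = self.lstm(fused_seq)
        return self.head(out[:, -1])  # prediction from the last time step
```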
3.2.6. DASF-Net Training Procedure
The complete training process of the proposed DASF-Net framework is summarized in Algorithm 1. The model ingests historical stock price data and news articles, transforming them into structured graph-based representations and sentiment-based features, respectively. These are then adaptively fused via Multi-Head Attention before being processed by an LSTM network to generate future stock price predictions.
Algorithm 1 DASF-Net Training Algorithm.
Input: Historical stock prices P and financial news articles for the N stocks, together with the lookback windows for prices and articles. Output: Predicted stock prices.
1: Construct Industry Graph and Fundamental Graph (Section 3.2.1)
2: Initialize model parameters
3: for epoch = 1 to MaxEpochs do
4:   for each time step t in the training period do
5:     Graph Representation Learning:
6:     for each graph g ∈ {IG, FG} do
7:       Compute stock return sequences using historical prices P
8:       Construct adjacency matrix based on Equations (6) or (7)
9:       Generate structural embeddings via diffusion process (Equations (8)–(10))
10:    end for
11:    Sentiment Representation Learning:
12:    for each stock i do
13:      Retrieve financial news articles
14:      for each article do
15:        Compute sentiment score using FinBERT (Equation (11))
16:      end for
17:      Construct sentiment statistics using Equation (12)
18:    end for
19:    Construct sentiment-aware representation using Equations (13) and (14)
20:    Feature Fusion and Prediction:
21:    Fuse $E^{IG}$, $E^{FG}$, $E^{SA}$ via Multi-Head Attention (Section 3.2.4)
22:    for each stock i do
23:      Predict next price
24:    end for
25:    Compute Mean Squared Error (MSE) loss
26:    Update parameters via gradient descent
27:  end for
28: end for
29: return Predicted stock prices
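The outer optimization of Algorithm 1 maps onto a standard PyTorch training loop, sketched below; the wrapper `model` (assumed to chain the graph, sentiment, fusion, and LSTM modules), the data loader format, the optimizer choice, and the hyperparameter values are assumptions for illustration.

```python
import torch

def train_dasf_net(model, loader, epochs=100, lr=1e-3):
    """`loader` is assumed to yield ((e_ig, e_fg, e_sa_seq), target_prices) batches."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        for inputs, y in loader:
            y_hat = model(*inputs)                # steps 5-24: representations + prediction
            loss = loss_fn(y_hat.squeeze(-1), y)  # step 25: MSE loss
            opt.zero_grad()
            loss.backward()
            opt.step()                            # step 26: parameter update
    return model
```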
4. Experimental Setup
This section outlines the dataset, evaluation metrics, baseline models, and parameter settings used to evaluate our proposed framework. We provide a comprehensive description to ensure reproducibility and clarity in assessing the model’s performance.
4.1. Dataset
For this study, we focus on the period from 1 January 2020 to 31 December 2023, capturing recent market dynamics, including the COVID-19 pandemic and economic recovery phases. This period ensures relevance to contemporary financial conditions. The dataset is divided into training, validation, and test sets, as shown in Table 2, with temporal separation to evaluate generalization to unseen future data. We select 12 major S&P 500 stocks based on their market capitalization and sector diversity to ensure a representative sample of the market.
Data Preprocessing (a brief code sketch of these steps follows the list):
Stock Prices: Normalized using min-max scaling to ensure comparability across stocks with varying price ranges.
News Articles: Missing articles are handled by propagating the most recent available sentiment score, ensuring continuity in sentiment analysis.
Graph Construction: The dataset is filtered to include only the largest connected component of the stock graph to maintain consistency in graph-based learning.
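A brief pandas sketch of the first two preprocessing steps, assuming price and sentiment tables indexed by date with one column per stock; the column layout and function name are illustrative assumptions.

```python
import pandas as pd

def preprocess(prices: pd.DataFrame, sentiment: pd.DataFrame):
    # Min-max scale each stock's price series to [0, 1] for comparability.
    scaled = (prices - prices.min()) / (prices.max() - prices.min())
    # Propagate the most recent available sentiment score over missing days.
    sentiment_filled = sentiment.ffill()
    return scaled, sentiment_filled
```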
4.2. Evaluation Metrics
To comprehensively assess DASF-Net’s performance and compare it with baseline models, we employ the following widely used regression metrics:
Mean Squared Error (MSE): Measures the average squared difference between predicted and actual stock prices, giving greater weight to larger errors. It is calculated as:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2,$$
where $y_i$ represents the actual stock price and $\hat{y}_i$ is the predicted price for the $i$-th data point, and $n$ is the number of data points.
Mean Absolute Error (MAE): Measures the average absolute difference between predicted and actual stock prices, providing a more robust measure against outliers. It is calculated as:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|,$$
where $y_i$ and $\hat{y}_i$ are the actual and predicted stock prices, respectively, and $n$ is the number of data points.
These metrics provide complementary insights into the models’ predictive accuracy. MSE penalizes larger errors more heavily, while MAE provides a more balanced assessment by treating all errors equally, making it less sensitive to outliers.
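For reference, both metrics reduce to a few lines of NumPy; the helper names are arbitrary.

```python
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))
```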
4.3. Baseline Models
In this section, we present the baseline models for stock price prediction against which our proposal is compared.
LSTM + CNN (Eapen et al., 2019): A hybrid model combining Convolutional Neural Networks (CNNs) for spatial feature extraction with Long Short-Term Memory (LSTM) units for temporal modeling, effective for sequential data but limited to price-based inputs.
Multi-GCGRU (Ye et al., 2021): Integrates Graph Convolutional Networks (GCNs) with Gated Recurrent Units (GRUs) to model both structural relationships and temporal dynamics among stocks.
Sentiment + LSTM (Jin et al., 2020): An LSTM-based model incorporating sentiment features extracted from financial news, capturing qualitative signals but lacking structural modeling.
MGAR (D. Cao et al., 2020): A framework that fuses embeddings from multiple graph structures (e.g., industry, correlation) to enhance stock price prediction, representing a multimodal graph-based approach.
These baselines cover a spectrum of methodologies, enabling a robust comparison with our DASF-Net.
4.4. Parameter Settings
All models, including DASF-Net and baselines, were optimized using a combination of grid and random search on the validation set, with configurations selected based on the lowest validation MSE.
Table 3 summarizes the hyperparameters for DASF-Net, incorporating settings for diffusion-based graph learning, sentiment aggregation, and attention mechanisms.
For baseline models, hyperparameters were tuned within comparable ranges to ensure fairness:
LSTM+CNN: Hidden dimension of 32–64, learning rate of 0.001–0.01, up to four layers.
Multi-GCGRU: 1–3 GCN layers, GRU hidden size of 32–64, learning rate of 0.001–0.01.
Sentiment+LSTM: Sentiment window size of 1–5 days, LSTM hidden size of 32–64, learning rate of 0.001–0.01.
MGAR: Four graph types (industry, correlation, etc.), hidden dimension of 32–64, learning rate of 0.001–0.01.
These settings ensure a fair and rigorous comparison, with all models optimized for the FNSPID dataset and forecasting task.
5. Results
5.1. Forecasting Performance
This section presents the forecasting performance of the Diffusion-Aware Sentiment Fusion Network (DASF-Net) compared to state-of-the-art baseline models and ablation variants across 1-day, 2-day, and 3-day prediction horizons. We evaluate performance using the Mean Squared Error (MSE) and Mean Absolute Error (MAE), where lower values indicate higher accuracy. To ensure robustness, we performed paired t-tests to confirm that DASF-Net’s improvements over the baselines are statistically significant unless otherwise noted.
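The significance check can be reproduced with SciPy's paired t-test, as in the sketch below; the per-sample error arrays are assumed to be aligned across models.

```python
from scipy import stats

def paired_significance(errors_dasfnet, errors_baseline):
    """Paired t-test on per-sample errors of DASF-Net vs. a baseline."""
    t_stat, p_value = stats.ttest_rel(errors_dasfnet, errors_baseline)
    return t_stat, p_value  # significant if p_value falls below the chosen threshold
```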
Table 4 presents the results for 1-day, 2-day, and 3-day stock price forecasts, comparing DASF-Net against the baselines and ablation variants.
DASF-Net consistently outperforms all baseline models across all forecasting horizons, as shown in Table 4. For 1-day predictions, DASF-Net (Full) achieves the lowest MSE, representing a relative reduction of 91.6% compared to MGAR (D. Cao et al., 2020), 94.7% compared to Sentiment + LSTM (Jin et al., 2020), 80.0% compared to LSTM+CNN (Eapen et al., 2019), and 98.8% compared to Multi-GCGRU (Ye et al., 2021). Similar improvements are observed for MAE, with DASF-Net achieving a 6.3% to 86.4% reduction relative to the baselines. These gains are statistically significant (paired t-test), underscoring DASF-Net’s superior accuracy and robustness.
The full model, which integrates industry graph (IG), fundamental graph (FG), and sentiment-aware representations (SA-Rep), outperforms variants using only IG or only FG (Table 4), demonstrating the complementarity of dual-graph learning. The IG+FG variant improves over single-graph models but is surpassed by the full model by 17.4% in MSE for 1-day predictions, emphasizing the critical role of sentiment integration via FinBERT. Compared to the FG-only variant, the full model reduces MSE by 88.8%, highlighting the synergy of structural (P-Rep) and sentiment (SA-Rep) representations.
For 3-day forecasts, DASF-Net (Full) maintains its advantage over the IG-only and FG-only variants (Table 4). These results demonstrate that incorporating sentiment signals enhances long-term forecasting accuracy, particularly in volatile markets where news-driven sentiment plays a significant role. The relative MSE reduction over baselines ranges from 45.0% (LSTM+CNN) to 93.5% (Multi-GCGRU) for 3-day predictions, further validating DASF-Net’s robustness across horizons.
In conclusion, DASF-Net achieves state-of-the-art performance in multi-horizon stock price forecasting by effectively integrating diverse market signals through diffusion-based graph learning and optimized sentiment fusion. These results, validated on the FNSPID dataset, establish DASF-Net as a robust framework for financial forecasting, with significant improvements over existing methods.
5.2. Impact of Sentiment Aggregation Window Size
To highlight the impact of temporal context in sentiment analysis, we analyze the sensitivity of DASF-Net to the sentiment aggregation window size, denoted $W$. This parameter determines the number of preceding trading days from which sentiment is aggregated to form the sentiment-aware representation (SA-Rep). We assess the model’s performance across a range of $W$ values using our set of 12 representative S&P 500 stocks from four sectors. Table 5 presents MSE and MAE values averaged across these stocks for varying sentiment window sizes.
The sensitivity analysis presented in Table 5 examines the impact of the sentiment window size $W$ on the prediction performance of the DASF-Net model for 1-day forecasting, averaged across the 12 stocks. This study highlights the critical importance of selecting an appropriate window for sentiment integration.
As evidenced by the data, the model demonstrates optimal prediction accuracy when the sentiment window size is set to 3 days; consequently, this value is identified as the optimal window size, $W^{*} = 3$. At this configuration, the model achieves the lowest MSE and MAE.
Deviations from this optimal window size lead to a consistent degradation in performance. A shorter window of 1 day, for instance, results in higher errors, suggesting that a very narrow sentiment scope may lack sufficient contextual information. Conversely, progressively increasing the window size beyond 3 days consistently worsens performance, with errors rising steadily and reaching their highest values at the largest window tested (Table 5). This indicates that excessively large sentiment windows may introduce irrelevant noise, dilute the impact of recent and more pertinent sentiment, or incorporate outdated information, thereby hindering predictive accuracy.
We also tested the intermediate windows of 2 and 4 days to ensure the observed optimum is not an artifact of coarse sampling. These additional results show that both the 2-day and 4-day windows perform slightly worse than the 3-day window, with average MAE differences within 0.05–0.07 percentage points. This suggests a relatively flat error surface in the 2–4 day region but confirms $W^{*} = 3$ as the global optimum due to its consistent superiority across most stocks.
Furthermore, to address potential sector- or volatility-dependence of the optimal window size, we analyzed per-stock performance across the 12 representative S&P 500 stocks from four distinct sectors (Information Technology, Consumer Discretionary, Energy, Communication Services). As reported in Table A1, the 3-day window consistently provides strong and stable results across both highly volatile stocks (e.g., TSLA, NVDA) and more stable ones (e.g., V, XOM). This analysis reinforces the robustness of $W^{*} = 3$ and supports the choice of a uniform sentiment window in DASF-Net.
We assessed the temporal stability of FinBERT sentiment scores, observing a 15% increase in variance during the COVID-19 period (2020–2021) compared to 2022–2023. The 3-day aggregation window mitigates these fluctuations, as discussed in Section 3.2.3.
5.3. Impact of Individual Components Within DASF-Net
To quantify the individual contributions of each component within DASF-Net, we conduct an ablation study. We evaluate the performance impact of removing or altering key modules, training and evaluating all configurations under identical conditions for 1-day forecasting across the 12 representative S&P 500 stocks. The results are presented in Table 6.
The ablation study, systematically presented in Table 6, provides compelling evidence for the indispensable contribution of each proposed component to the overall predictive performance of the DASF-Net model for 1-day forecasting.
The Full DASF-Net configuration achieves the most favorable results, with the lowest MSE and MAE (Table 6), and serves as the benchmark for evaluating the individual impact of each module.
The study reveals distinct performance degradations upon the successive removal of key components. Specifically, omitting the industry-graph embedding, which captures sector-level relationships, leads to a noticeable increase in both MSE and MAE. Similarly, excluding the fundamental-graph embedding, constructed from 20-day stock return similarities, results in heightened errors. These findings underscore the significant utility of both graph-based embeddings in comprehensively representing market inter-dependencies.
Most notably, constructing SA-Rep proves to be a pivotal component. Its removal leads to the most pronounced decline in accuracy, with the largest increases in MSE and MAE (Table 6). This highlights the critical role of incorporating real-time sentiment information for robust predictive capabilities.
Furthermore, a comparative analysis of feature fusion strategies accentuates the efficacy of the full DASF-Net’s integrated architecture. The superior performance of the Full DASF-Net indicates that its more sophisticated fusion mechanism (implicitly Multi-Head Attention) is crucial.
In conclusion, the ablation study unequivocally demonstrates that the optimal predictive performance of DASF-Net is contingent upon the synergistic integration of all proposed components: the embedding constructed from the industry graph (IG) for sector-level insights, the embedding constructed from the fundamental graph (FG) for capturing stock trending similarity, and the vital sentiment analysis representation (SA-Rep), all effectively combined through its advanced feature fusion architecture.
5.4. Impact of Attention Heads and Fusion Methods
To evaluate the role of the attention-based fusion mechanisms in our framework, we systematically examine the impact of (i) varying the number of attention heads in the MHA module and (ii) comparing it to alternative static fusion methods. Specifically, we consider:
Self-Attention: a single-head attention mechanism that lacks the ability to learn diverse relational perspectives.
MHA-n: multi-head attention with n heads to allow distributed representation learning across different subspaces.
Mean-Pooling: uniform averaging across feature representations.
Max-Pooling: selection of dominant features without contextual adaptivity.
As shown in Table 7 and Figure 3, the number of attention heads significantly influences the quality of feature fusion. The MHA configuration with 16 heads consistently yields the best performance, achieving the lowest MSE and MAE and substantially outperforming self-attention and static pooling methods.
Notably, increasing the number of heads from 2 to 16 progressively reduces the error, indicating that the model benefits from attending to multiple subspaces in parallel. This richer representation allows the network to better model complex interactions between price-based and sentiment-based features. While improvements taper off beyond 8 heads, MHA-16 still provides marginal gains, suggesting its effectiveness in capturing fine-grained cross-modal dependencies.
In contrast, static fusion approaches such as Mean-Pooling and Max-Pooling are markedly less effective. These methods apply uniform or fixed aggregation, which cannot adaptively emphasize contextually important signals. For instance, Max-Pooling yields an MSE more than twice that of MHA-16. Similarly, Mean-Pooling performs better than Max-Pooling but still lags behind even MHA-2.
These findings reinforce that:
Learnable fusion methods significantly outperform fixed ones in modeling heterogeneous financial features.
The use of multiple attention heads provides complementary views of data, enabling more accurate and robust predictions.
MHA-16 strikes the best balance between model complexity and predictive accuracy in our DASF-Net framework.
5.5. Impact of Diffusion Strategies
To assess the sensitivity of DASF-Net to different diffusion formulations, we compare three widely used non-recurrent methods: Random Walk (RW), Personalized PageRank (PPR), and Heat Kernel (HK). For consistency, a top-k sparsification (keeping the strongest 128 edges per node) was applied in the PPR and RW configurations, following the approach in (Gasteiger et al., 2022). Each method controls the spread of information across the graph in a distinct way.
As shown in Table 8, the Heat Kernel strategy consistently yields the best performance, achieving the lowest MSE and MAE. This highlights the benefit of exponential smoothing over the graph Laplacian, which effectively balances local node identity and global structure.
The choice of diffusion parameters, such as the teleportation probability in Personalized PageRank, was determined through cross-validation on a held-out validation set. We found that a teleportation probability of 0.15 provided the best balance between local and global information propagation. Additionally, our experiments with different diffusion kernels showed that Personalized PageRank outperformed uniform Random Walk, likely due to its ability to bias the diffusion towards the source node, preserving node-specific signals. This is consistent with our ablation study results, where the uniform Random Walk approach led to a slight degradation in performance.
Overall, these results indicate that the choice of diffusion kernel significantly affects model performance, reinforcing the flexibility of DASF-Net in supporting multiple graph learning paradigms.
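To make the comparison concrete, the sketch below computes the three diffusion matrices in their standard closed forms and applies the top-k sparsification described above. The teleportation probability of 0.15 and k = 128 follow the text; the random-walk step count and the heat-kernel time parameter are illustrative assumptions, as is the use of a single normalized adjacency matrix for all three kernels.

```python
import numpy as np
from scipy.linalg import expm

def random_walk_kernel(A_norm, steps=3):
    """Uniform random-walk diffusion: k-step propagation A_norm^k."""
    return np.linalg.matrix_power(A_norm, steps)

def ppr_kernel(A_norm, alpha=0.15):
    """Personalized PageRank diffusion: alpha * (I - (1 - alpha) * A_norm)^{-1}."""
    n = A_norm.shape[0]
    return alpha * np.linalg.inv(np.eye(n) - (1.0 - alpha) * A_norm)

def heat_kernel(A_norm, t=5.0):
    """Heat-kernel diffusion: exp(-t * L) with L = I - A_norm."""
    return expm(-t * (np.eye(A_norm.shape[0]) - A_norm))

def top_k_sparsify(S, k=128):
    """Keep only the k strongest edges per node (PPR/RW configurations)."""
    out = np.zeros_like(S)
    for i, row in enumerate(S):
        idx = np.argsort(row)[-k:]
        out[i, idx] = row[idx]
    return out
```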
6. Conclusions
This work introduces the Diffusion-Aware Sentiment Fusion Network (DASF-Net), a novel multimodal framework for stock price forecasting that integrates structural and sentiment information through diffusion-based graph learning and adaptive fusion. By addressing limitations in traditional Graph Neural Networks (GNNs) and sentiment aggregation methods, DASF-Net achieves state-of-the-art performance on a large-scale financial dataset. This section summarizes our contributions, highlights key empirical findings, and outlines promising directions for future research.
6.1. Summary of Contributions
DASF-Net integrates three core components to advance stock price forecasting: dual-graph learning, sentiment encoding, and adaptive multimodal fusion. First, DASF-Net employs heat kernel diffusion on two complementary financial graphs—an industry graph (IG) capturing static sectoral relationships and a fundamental graph (FG) encoding dynamic return-based similarities—to model higher-order inter-stock dependencies. This approach overcomes limitations of traditional GNNs, such as oversmoothing and small receptive fields. Second, sentiment-aware representations (SA-Rep) are extracted from financial news using FinBERT, with a systematically optimized 3-day aggregation window to capture short-term investor sentiment while minimizing temporal bias, addressing issues in fixed-window approaches. Third, a multi-head attention (MHA) mechanism adaptively fuses structural (P-Rep) and sentiment (SA-Rep) representations, enabling dynamic weighting of modalities under fluctuating market conditions. These innovations collectively establish DASF-Net as a robust and flexible framework for multimodal financial forecasting. However, DASF-Net also presents certain limitations, such as its dependency on high-quality and timely financial news data, and potential scalability challenges when applied to very large stock universes. These aspects highlight promising avenues for future work to enhance the model’s practicality and generalizability.
6.2. Key Findings
We conducted experiments on the Financial News and Stock Price Integration Dataset (FNSPID) (Dong et al., 2024), covering 12 S&P 500 stocks from 2020 to 2023. DASF-Net outperforms baselines including MGAR, Sentiment-LSTM, LSTM + CNN, and Multi-GCGRU. For 1-day predictions, DASF-Net achieves relative MSE reductions of 91.6% compared to MGAR (D. Cao et al., 2020), 94.7% compared to Sentiment + LSTM (Jin et al., 2020), 80.0% compared to LSTM + CNN (Eapen et al., 2019), and 98.8% compared to Multi-GCGRU (Ye et al., 2021). Similar improvements are observed for MAE, with reductions of 6.3% to 86.4% relative to the baselines. These gains are statistically significant (paired t-test), underscoring DASF-Net’s robustness.
Although our experiments focused on 12 major S&P 500 stocks, the DASF-Net framework is designed to be adaptable to a wider range of stocks and market conditions. The dual-graph approach, combining industry affiliations and dynamic return-based similarities, should theoretically capture both sector-specific and market-wide dependencies, making it applicable to small-cap or international stocks. However, further experiments on diverse datasets would be necessary to confirm this. Additionally, during periods of high volatility, the sentiment integration component may become even more crucial, as news and investor sentiment often drive rapid price movements. Future work could explore the model’s performance during such periods, potentially incorporating real-time sentiment analysis for more responsive predictions.
Ablation studies (Section 5.4) confirm that replacing MHA with static aggregation increases MSE by 139.5% with the Mean-Pool method, emphasizing the critical role of adaptive fusion. Sensitivity analyses further reveal that the 3-day sentiment window and deeper attention heads enhance predictive accuracy, particularly in volatile markets, by effectively capturing short-term sentiment dynamics and fine-grained cross-modal interactions. These findings validate DASF-Net’s design and its ability to model complex financial dynamics through diffusion-based learning and optimized sentiment aggregation.
6.3. Practical Considerations
While DASF-Net primarily targets predictive accuracy, we recognize the importance of interpretability, scalability, and adaptability for practical deployment. The model’s attention mechanisms in both the diffusion and sentiment modules provide inherent explainability by highlighting which graph connections and news tokens drive predictions. For scalability, diffusion updates operate incrementally—processing only newly added or removed edges—while sentiment embeddings are cached to avoid recomputation for unchanged headlines. This design reduces computational overhead significantly and supports real-time inference for larger universes. Furthermore, DASF-Net is asset-agnostic: adapting it to other markets requires only updating the relationship graph and using a suitable multilingual sentiment encoder. These considerations strengthen the framework’s potential for real-world applications.
6.4. Future Directions
This research opens several promising avenues for advancing multimodal financial forecasting. First, exploring alternative diffusion kernels, such as learnable or graph-adaptive kernels, could enhance the capture of richer structural semantics. Second, integrating large language models (LLMs) beyond FinBERT, such as those capable of narrative-driven or event-based reasoning, could enable deeper analysis of financial texts, moving beyond sentence-level sentiment to capture macroeconomic trends or company-specific events (Jin et al., 2020). Third, modeling hierarchical financial graphs at multiple resolutions—spanning company-level, sector-level, and macroeconomic interactions—could improve the representation of complex market dynamics. Additionally, optimizing DASF-Net’s computational complexity through sparse diffusion updates, model pruning, or hardware acceleration can further support real-time inference and scalability in large-scale applications. Finally, extending DASF-Net to other financial tasks, such as volatility prediction or portfolio optimization, could broaden its applicability.
Future work should also extend the evaluation of DASF-Net to longer and more diverse market periods beyond 2020–2023 to assess robustness across typical economic cycles. The current study focused on this recent period as it provides a challenging testbed with extreme volatility and rapid structural changes, while also reflecting practical constraints due to the availability of high-quality, large-scale sentiment data in recent years. Earlier periods often lack comprehensive sentiment annotations aligned with price series, which presents a challenge for historical evaluation. Addressing this limitation in future studies could further validate the model’s generalizability and performance under different market conditions.
In conclusion, DASF-Net sets a strong benchmark for stock price forecasting by synergistically integrating diffusion-based dual-graph learning, optimized sentiment encoding, and adaptive multi-head attention. Its design considerations for interpretability, scalability, and portability further pave the way for practical deployment and innovations in multimodal predictive modeling.