1. Introduction
Stock price forecasting remains a cornerstone of financial time series analysis, yet it poses significant challenges due to the intricate interplay of heterogeneous factors, including historical price movements, inter-stock relationships, and market sentiment derived from financial news. The inherent noise and non-stationarity in financial data further exacerbate these challenges, making accurate prediction difficult (Pilla & Mekonen, 2025; Qian et al., 2024). The complexity of financial markets, characterized by nonlinear dynamics and high volatility, demands models capable of capturing both temporal dependencies and structural relationships among stocks.
Traditional approaches, such as autoregressive models (e.g., ARIMA; Khashei et al., 2009) and volatility models (e.g., GARCH; H. Kim & Won, 2018), often rely on linear assumptions and univariate analyses, which limit their ability to capture the nonlinear and dynamic behaviors of financial markets. Moreover, prior studies (Vera Barberán, 2020) point out that these models tend to overlook critical external factors, such as macroeconomic indicators and economic news.
To address these limitations, recent research has explored deep learning techniques, particularly recurrent neural networks (RNNs) and their variants, such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs), to model temporal dependencies in stock price movements (Eapen et al., 2019; Jin et al., 2020; Moghar & Hamiche, 2020; Sherstinsky, 2020; Shi et al., 2024; Xu & Keselj, 2019). These models have demonstrated superior performance compared to traditional time series methods due to their ability to capture nonlinear relationships and long-range dependencies in sequential data. However, RNN-based approaches often treat stocks as independent entities, neglecting crucial inter-dependencies arising from industry affiliations, investor behavior, and macroeconomic linkages, even though these relationships are critical for accurate stock price forecasting in complex financial markets (Krishnan et al., 2024; Zabaleta et al., 2024).
More recently, Graph Neural Networks (GNNs) have emerged as a promising tool for modeling structural dependencies among stocks by representing market relationships as graphs (Chen et al., 2018; Shi et al., 2024). Early GNN-based approaches often constructed graphs based on static correlations or pre-defined relationships, limiting their adaptability to dynamic market conditions (Zheng et al., 2025). Subsequent studies have explored more sophisticated graph construction techniques, such as adaptive graph learning and attention mechanisms, to capture evolving inter-stock relationships (K. X. Li, 2025). However, GNNs are inherently constrained by small receptive fields and sensitivity to noise (Al-Omari & Al-Omari, 2025; Qian et al., 2024; Wang & Cai, 2020), which hinder their ability to capture complex, higher-order dependencies (Krieg et al., 2024) and lead to issues like oversmoothing (Kong et al., 2024; Wang et al., 2025). Oversmoothing, in particular, can diminish the distinctiveness of node features, making it difficult to differentiate between stocks and limiting the model’s predictive capacity. Furthermore, noise in the graph structure can propagate through the network, corrupting node representations and further degrading performance.
Additionally, sentiment analysis from financial news, while offering valuable early signals of market movements, often relies on fixed or arbitrary time windows for aggregation, introducing temporal bias and reducing predictive accuracy (Qian et al., 2024). The challenge lies in determining the optimal time frame for aggregating sentiment signals; too short a window may miss relevant information, while too long a window may dilute the signal with irrelevant noise (Qian et al., 2024). Moreover, fixed aggregation windows do not account for how the impact of news sentiment evolves over time and varies across stocks, which depends on market conditions and stock-specific factors. The existing literature also presents conflicting evidence regarding the impact of sentiment on stock prices, with some studies suggesting a positive correlation and others indicating a more complex, nuanced relationship. These inconsistencies highlight the need for more sophisticated methods for sentiment extraction and integration into forecasting models.
To address these challenges, we propose the Diffusion-Aware Sentiment Fusion Network (DASF-Net), a novel multimodal framework that synergistically integrates diffusion-based graph learning with sentiment-aware representations derived from pretrained language models. Unlike traditional regression models, DASF-Net applies diffusion processes to two complementary financial graphs: an industry graph encoding static sectoral relationships and a fundamental graph, recalibrated daily, that tracks non-stationary return co-movements. This dual-graph design captures both local and global dependencies among stocks.
The diffusion formulation mitigates key GNN limitations, such as oversmoothing and restricted receptive fields, by propagating information across larger graph neighborhoods while maintaining computational efficiency through sparsification techniques (Gasteiger et al., 2022). Concretely, DASF-Net addresses these deficiencies through two principal mechanisms: (a) heat-kernel diffusion over the complementary industry and fundamental graphs, which enlarges the receptive field while preserving node individuality; and (b) daily re-estimation of graph edges, which ensures regime awareness and immunity to stale correlations. Concurrently, DASF-Net extracts sentiment embeddings from financial news using FinBERT, a domain-specific language model tailored for financial text (Shobayo et al., 2024). To mitigate temporal bias, we systematically identify an optimal 3-day aggregation window for sentiment, calibrated to the empirically observed decay of news influence, ensuring that the model captures temporally relevant signals without diluting predictive power. These structural and sentiment modalities are fused via a multi-head attention (MHA) mechanism, which dynamically prioritizes relevant features based on market conditions, enhancing the model’s adaptability to volatile financial environments.
Our model leverages daily stock prices and news sentiment for forecasting over 1-day, 2-day, and 3-day horizons, as detailed in Section 4. The experiments, conducted on a dataset comprising 12 major S&P 500 stocks from 2020 to 2023, demonstrate that DASF-Net significantly outperforms state-of-the-art baselines, such as MGAR (Song et al., 2023) and Sentiment+LSTM (Jin et al., 2020), achieving up to a 91.6% relative reduction in Mean Squared Error (MSE). These results underscore the effectiveness of combining diffusion-based graph learning with optimized sentiment integration, providing a robust framework for financial forecasting. By explicitly addressing the limitations of prior work, such as the oversimplification of inter-stock relationships and the use of arbitrary sentiment windows, DASF-Net sets a strong benchmark for multimodal stock price prediction.
In summary, our work offers the following key contributions:
Diffusion-Based Graph Learning: We introduce diffusion-based graph learning over dual financial graphs (industry and fundamental) to capture higher-order stock dependencies, overcoming the limitations of traditional GNNs in terms of receptive field and noise sensitivity.
Optimized Sentiment Aggregation: We propose a systematic approach to identify an optimal 3-day time window for sentiment aggregation, minimizing temporal bias and enhancing predictive accuracy across multiple stock categories, in contrast to fixed-window approaches.
Adaptive Multimodal Fusion: We develop a multi-head attention mechanism to dynamically integrate structural and sentiment features, enabling adaptive weighting of modalities under varying market conditions and enhancing generalization and resilience compared to static fusion methods.
The remainder of this paper is organized as follows:
Section 2 reviews related work.
Section 3 details the proposed DASF-Net methodology.
Section 4 outlines the experimental setup.
Section 5 presents the results and analysis. Finally,
Section 6 concludes the paper with a summary and future research directions.
2. Related Work
This section reviews key research areas relevant to the DASF-Net framework, including statistical and deep learning models for stock price forecasting, graph-based methods for modeling inter-stock relationships, sentiment analysis in financial forecasting, and multimodal fusion techniques. We highlight the limitations of existing approaches and demonstrate how they motivate the design of DASF-Net, which integrates diffusion-based graph learning with optimized sentiment aggregation and adaptive fusion to address these gaps.
2.1. Statistical and Deep Learning Approaches for Financial Time Series Prediction
Early approaches to stock price forecasting relied on statistical time series models such as ARIMA (Box et al., 2015) and GARCH (Bollerslev, 1986), which model linear trends and volatility clustering. While computationally efficient, these models struggle to capture the nonlinear and dynamic behaviors inherent in financial markets, limiting their predictive accuracy in volatile conditions.
Machine learning methods, including Support Vector Machines (SVMs) (J. Cao et al., 2003), Random Forests (Liaw & Wiener, 2002), and Gradient Boosting Machines (Friedman, 2001), improve flexibility by leveraging hand-crafted features but often fail to explicitly model temporal dependencies, leading to suboptimal performance in long-term forecasting. Furthermore, these models typically operate in isolation, neglecting the complex inter-dependencies that exist among different stocks and sectors within the financial market.
Deep learning models have significantly advanced sequence modeling capabilities. Long Short-Term Memory (LSTM) networks (Hochreiter & Schmidhuber, 1997) and Bidirectional LSTMs (BiLSTMs) (Schuster & Paliwal, 1997) are able to capture temporal dependencies in stock prices, with notable improvements over traditional methods. Attention mechanisms have also been incorporated to enhance the ability of these models to focus on the most relevant time steps (Bahdanau et al., 2015; Vaswani et al., 2017). Hybrid architectures, such as CNN-BiLSTM with attention mechanisms (Livieris et al., 2020), further enhance feature extraction by combining convolutional and recurrent layers.
These models, however, are primarily unimodal, focusing solely on price data and neglecting critical external signals such as inter-stock relationships and market sentiment. This limitation restricts their ability to model the multifaceted dynamics of financial markets and capture the subtle nuances that drive stock price fluctuations. Unlike these unimodal approaches, DASF-Net integrates structural dependencies and sentiment signals through diffusion-based graph learning and optimized temporal aggregation, enabling a more comprehensive understanding of market behavior and leading to improved predictive performance.
2.2. Graph Neural Networks and Diffusion-Based Learning in Financial Modeling
The financial market is a complex system where the behavior of individual stocks is influenced by their relationships with other entities. These relationships can arise from various factors, including industry affiliations, supply chain linkages, and investor sentiment. Capturing these interdependencies is crucial for accurate stock price forecasting.
Graph Neural Networks (GNNs) have emerged as a powerful tool for modeling such relationships by representing the market as a graph, where nodes represent stocks and edges represent connections between them (Satishbhai Sonani et al., 2025). Early GNN-based approaches often constructed graphs based on static correlations or pre-defined relationships, limiting their adaptability to dynamic market conditions (Chauhan, 2025; Hu & Wang, 2025). For instance, conventional methods rely on Pearson correlation coefficients computed over a fixed period to determine edge weights, assuming that inter-stock relationships remain constant over time. These approaches lack adaptability to the shifting dynamics of real-world markets, where correlations can shift rapidly in response to economic events and investor behavior.
More recent studies have explored adaptive graph learning techniques to capture evolving inter-stock relationships (Cui et al., 2023; J. Kim et al., 2019; Sawhney et al., 2021). These methods typically employ attention mechanisms or learnable similarity metrics to dynamically adjust edge weights based on the current market state. For example, the MGAR framework utilizes a meta-graph structure to capture both local and global dependencies among stocks, adapting the graph structure over time based on market conditions (Song et al., 2023). However, even these adaptive approaches often suffer from limitations such as small receptive fields and sensitivity to noise, which hinder their ability to capture long-range structural patterns. Furthermore, GNNs are prone to oversmoothing, where repeated message passing can cause node representations to converge, diminishing their distinctiveness and reducing predictive accuracy.
Diffusion-based graph learning addresses these limitations by propagating information across larger graph neighborhoods while maintaining computational efficiency (Atwood & Towsley, 2016; Chang et al., 2020; Y. Li et al., 2018; Vignac et al., 2023). By simulating a diffusion process on the graph, these methods capture both local and global dependencies among stocks, overcoming the restricted receptive fields and noise sensitivity of conventional message passing. Additionally, sparsification techniques can be employed to reduce computational complexity and mitigate the effects of noise, resulting in more robust and accurate representations (You et al., 2024; S. Zhao et al., 2025). Motivated by this, DASF-Net leverages diffusion processes on two complementary financial graphs—an industry graph and a fundamental graph—to capture a richer set of inter-stock relationships.
2.3. Sentiment Analysis and Temporal Aggregation in Stock Prediction
Sentiment analysis from financial news and social media has become an increasingly important component of stock price forecasting (Araci, 2019; J. Kim et al., 2023; R. Zhang et al., 2023). The premise is that news events and opinions expressed online can influence investor behavior and, consequently, stock prices. Early sentiment analysis techniques relied on simple lexicon-based methods, which assign sentiment scores to text based on the presence of positive or negative keywords (Taboada et al., 2011; L. Zhang & Liu, 2023). However, these methods often fail to capture the nuances of financial language, leading to inaccurate sentiment assessments (Rizinski et al., 2024).
More recently, deep learning models, particularly transformer-based architectures such as BERT and its variants, have demonstrated superior performance in sentiment analysis tasks. FinBERT, a BERT model pretrained on financial text, has shown particularly strong performance in capturing sentiment in the financial domain (J. Kim et al., 2023). By leveraging large-scale pretraining and fine-tuning on financial datasets, FinBERT can accurately assess sentiment in news articles, social media posts, and other financial documents.
Despite the advances in sentiment analysis techniques, effectively integrating sentiment into stock price forecasting models remains a challenge (R. Gupta & Chen, 2020; Loughran & McDonald, 2020). One key issue is the determination of the optimal time window for aggregating sentiment signals (Smales, 2016; Xiao & Ihnaini, 2023). Too short a window may miss relevant information, while too long a window may dilute the signal with irrelevant noise. Existing studies often rely on fixed or arbitrary time windows, introducing temporal bias and reducing predictive accuracy (Wang et al., 2019). Moreover, the static nature of these aggregation windows fails to account for the time-varying impact of news sentiment on different stocks, which depends on market conditions and stock-specific factors (Smales, 2016).
In contrast to these fixed-window approaches, DASF-Net systematically identifies an optimal time window for sentiment aggregation, minimizing temporal bias and enhancing predictive accuracy. By empirically evaluating different window sizes, we determine the optimal aggregation period for sentiment signals, ensuring that the model captures temporally relevant information without diluting predictive power.
2.4. Multimodal Fusion Techniques for Integrating Heterogeneous Financial Data
Multimodal fusion is the process of combining information from multiple sources or modalities to improve the performance of a machine learning model (Lahat et al., 2015; F. Zhao et al., 2024). In the context of stock price forecasting, multimodal fusion involves integrating price data, inter-stock relationships, sentiment signals, and other relevant information to create a more comprehensive and accurate model (Wang, 2025; Zehtab-Salmasi et al., 2023).
Early multimodal fusion techniques relied on simple concatenation or averaging of features from different modalities (Baltrušaitis et al., 2018). However, these methods often fail to capture the complex interactions between modalities, constraining performance under real-world volatility. More recent approaches have explored attention mechanisms to dynamically weight the contribution of each modality based on the current market state. For example, attention-based fusion can allow the model to prioritize sentiment signals during periods of high market volatility or focus on inter-stock relationships during stable periods (He & Gu, 2021).
Another challenge in multimodal fusion is dealing with the heterogeneity of different modalities (Baltrušaitis et al., 2018; Gao et al., 2020). Price data are typically represented as time series, inter-stock relationships as graphs, and sentiment signals as text. To effectively combine these modalities, it is necessary to learn a shared representation space that captures the relevant information from each modality. Deep learning models, such as autoencoders and generative adversarial networks (GANs), have been used to learn such representations (Wang, 2021).
DASF-Net employs a multi-head attention (MHA) mechanism to dynamically integrate structural and sentiment features, enabling adaptive weighting of modalities under varying market conditions. This approach allows the model to prioritize the most relevant information from each modality, improving robustness and predictive accuracy compared to static fusion methods.
DASF-Net addresses limitations of prior models through three innovations. First, diffusion-based learning uses a heat kernel to propagate information across larger graph neighborhoods, mitigating oversmoothing in GNNs, as evidenced by a 12% reduction in feature similarity compared to Multi-GCGRU (Table 6, Section 5.3). Second, a 3-day sentiment aggregation window captures multi-day market trends, overcoming Sentiment-LSTM’s limited 1-day window, which misses sustained sentiment shifts (15% MSE improvement, Table 5, Section 5.2). Third, Multi-Head Attention (MHA) dynamically fuses structural and sentiment features, unlike LSTM+CNN’s static fusion, improving performance by 10% during volatile periods (Table 4, Section 5.1). These improvements are detailed in Table 1, which compares baseline models and their shortcomings with DASF-Net’s advancements.
3. Method
This section provides a detailed description of the problem formulation and the proposed Diffusion-Aware Sentiment Fusion Network (DASF-Net) framework, including mathematical formulations and implementation details to ensure clarity and reproducibility.
3.1. Problem Definition
Given a dataset containing $N$ stocks, we frame the stock price prediction task as a regression problem, where the goal is to estimate each stock’s future price at time $t+1$ based on its state at time $t$. Formally, the prediction for stock $i$ is expressed as:
$$\hat{y}_i^{t+1} = f(\mathbf{x}_i^t),$$
where $\mathbf{x}_i^t \in \mathbb{R}^M$ represents the input feature vector for stock $i$ at time $t$, and $\hat{y}_i^{t+1}$ is the predicted price at the next time step.
At the dataset level, the feature matrix $\mathbf{X}^t \in \mathbb{R}^{N \times M}$ and the corresponding label matrix $\mathbf{Y}^{t+1} \in \mathbb{R}^{N \times 1}$ capture stock attributes and their target future prices across all stocks, respectively. Here, $M$ denotes the dimensionality of each feature vector $\mathbf{x}_i^t$, and $y_i^{t+1}$ is the scalar target for stock $i$.
In this work, we enhance the input representation by incorporating two key components: (i) inter-stock relationships, captured via P-Reps, and (ii) sentiment-based features, reflecting market sentiment (positive, neutral, or negative) at time t.
3.2. Proposed Framework
The DASF-Net architecture (Figure 1) consists of five key components: (1) dual financial graph construction, (2) diffusion-based structural representation learning (Price-based Representation, P-Rep), (3) sentiment-aware representation extraction (Sentiment-Aware Representation, SA-Rep) with optimized temporal aggregation, (4) adaptive multimodal fusion via multi-head attention (MHA), and (5) temporal forecasting with LSTM.
Our framework integrates structured inter-stock dependencies with sentiment cues from financial news, adaptively learned for robust stock forecasting.
3.2.1. Dual Financial Graph Construction
To capture multifaceted inter-stock relationships, we construct two complementary graphs, each focusing on different aspects of the market structure:
Industry Graph (IG): This graph represents static, sector-based affiliations. An edge exists between two stocks if they operate within the same industry sector, reflecting inherent similarities in their business models and market exposures:
$$A^{IG}_{ij} = \begin{cases} 1, & \text{if stocks } i \text{ and } j \text{ belong to the same industry sector}, \\ 0, & \text{otherwise}, \end{cases}$$
where $A^{IG}_{ij}$ is the edge weight between stock $i$ and stock $j$ in the Industry Graph (IG). This binary value indicates the presence (1) or absence (0) of a connection based on industry sector membership.
Fundamental Graph (FG): In contrast to the static IG, the FG captures dynamic, return-based relationships. Edges in this graph reflect the similarity in historical return patterns between stocks, capturing how stocks move in relation to one another.
First, we calculate the return sequence $\mathbf{r}_i = (r_i^{t-L+1}, \dots, r_i^t)$ for each stock $i$ over a lookback period $L$. Each return $r_i^t$ is computed as:
$$r_i^t = \frac{P_i^t - P_i^{t-1}}{P_i^{t-1}},$$
where $P_i^t$ denotes the closing price of stock $i$ at time $t$.
Then, the edge weight $A^{FG}_{ij}$ between stocks $i$ and $j$ is determined by the absolute cosine similarity of their return sequences $\mathbf{r}_i$ and $\mathbf{r}_j$:
$$A^{FG}_{ij} = \left| \frac{\mathbf{r}_i \cdot \mathbf{r}_j}{\lVert \mathbf{r}_i \rVert \, \lVert \mathbf{r}_j \rVert} \right|.$$
This ensures that the edge weights reflect the degree of correlation in the stocks’ return behaviors, irrespective of the direction of the relationship (positive or negative).
Formally, each graph $G_g = (V, A_g)$, where $g \in \{IG, FG\}$, consists of a set of nodes $V$ (representing the stocks) and an adjacency matrix $A_g \in \mathbb{R}^{N \times N}$ representing the edge weights, where $N$ is the number of stocks. By integrating these dual graphs, DASF-Net effectively captures both inherent (industry-based) and emergent (return-based) relationships within the stock market, enabling a more comprehensive representation of inter-stock dependencies.
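To make the construction concrete, the following NumPy sketch builds both adjacency matrices from a sector lookup and a matrix of return sequences. The function and variable names, as well as the epsilon guard against zero-norm returns, are illustrative assumptions; only the two edge definitions above come from the text.

```python
import numpy as np

def build_industry_graph(tickers, sectors):
    """Binary IG adjacency: 1 if two distinct stocks share a sector, else 0."""
    n = len(tickers)
    A_ig = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and sectors[tickers[i]] == sectors[tickers[j]]:
                A_ig[i, j] = 1.0
    return A_ig

def build_fundamental_graph(returns):
    """FG adjacency: absolute cosine similarity between return sequences (N x L)."""
    norms = np.linalg.norm(returns, axis=1, keepdims=True) + 1e-12
    unit = returns / norms
    A_fg = np.abs(unit @ unit.T)
    np.fill_diagonal(A_fg, 0.0)  # no self-loops
    return A_fg
```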
3.2.2. Diffusion-Based Structural Representation Learning (P-Rep)
To capture multi-hop, dynamic, and non-local dependencies among stocks, we adopt a diffusion-based graph learning paradigm to encode inter-stock structural relationships. Unlike traditional GNN-based models that rely on localized message passing within fixed neighborhoods, our approach models node interactions via diffusion processes, allowing for more expressive and flexible information propagation across the graph.
Specifically, we define the structural embedding $\mathbf{e}_i^{g}$ for stock $i$ in a given graph $g$, where $g \in \{IG, FG\}$, using a diffusion process:
$$\mathbf{e}_i^{g} = \mathrm{Diffuse}\big(\mathbf{x}_i^{(0)}, A_g; K, \alpha\big).$$
Here, $\mathbf{x}_i^{(0)}$ denotes the initial feature vector of node $i$ (its return sequence over a lookback period $L$), and $A_g$ is the adjacency matrix of graph $g$. The function $\mathrm{Diffuse}(\cdot)$ simulates a diffusion process on the graph, starting from node $i$, for $K$ steps, with a diffusion rate $\alpha$. This process allows information to propagate beyond immediate neighbors, capturing deeper relational dependencies.
In our implementation, the diffusion process is defined as:
$$X^{(t+1)} = (1 - \alpha)\, \tilde{A}\, X^{(t)} + \alpha\, X^{(0)},$$
where $X^{(0)}$ is the initial feature matrix (return sequences), $\tilde{A}$ is the adjacency matrix including all edges of graph $g$, and $X^{(t)}$ is the feature matrix at diffusion step $t$. This iterative process aggregates information from increasingly distant neighbors, with the parameter $\alpha$ controlling the balance between local and global information. We then extract the structural embedding for each stock $i$ from the final diffused feature matrix $X^{(K)}$.
The resulting embeddings for IG and FG, denoted as $E^{IG}$ and $E^{FG}$, respectively, are calculated as:
$$E^{IG} = \mathrm{Diffuse}\big(R, A^{IG}; K, \alpha\big), \qquad E^{FG} = \mathrm{Diffuse}\big(R, A^{FG}; K, \alpha\big),$$
where $R$ is the matrix of return sequences for all stocks, and $A^{IG}$ and $A^{FG}$ are the adjacency matrices for the industry and fundamental graphs, respectively. Both $E^{IG}$ and $E^{FG}$ are treated as complementary P-Rep views, encoding market structure from distinct topological perspectives.
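The sketch below illustrates one plausible realization of this propagation, assuming the restart-style update above applied to a symmetrically normalized adjacency matrix; the heat-kernel variant ultimately used in the full model (Section 5.5), as well as the values of the diffusion rate and step count, are not prescribed here.

```python
import numpy as np

def diffuse(A, X0, alpha=0.15, steps=10):
    """Propagate node features X0 (N x d) over adjacency A (N x N)."""
    # Symmetric normalisation: A_norm = D^{-1/2} A D^{-1/2}.
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    A_norm = (A * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

    X = X0.copy()
    for _ in range(steps):
        # Blend neighborhood information with the original (local) signal.
        X = (1.0 - alpha) * (A_norm @ X) + alpha * X0
    return X  # row i is the structural embedding of stock i

# P-Rep views (R is the N x L matrix of return sequences):
# E_ig = diffuse(A_ig, R); E_fg = diffuse(A_fg, R)
```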
3.2.3. Sentiment-Aware Representation Extraction (SA-Rep) with Optimized Temporal Aggregation
To incorporate market sentiment, we process daily financial news associated with each stock. Let $\mathcal{N}_i^t = \{n_1, n_2, \dots, n_k\}$ be a set of $k$ news articles for stock $i$ on day $t$. We use FinBERT to extract sentiment embeddings from each article, obtaining a sentiment score $s_j$ for each article $n_j$:
$$s_j = \mathrm{FinBERT}(n_j), \quad j = 1, \dots, k.$$
The raw sentiment vector $\mathbf{s}_i^t = [s_1, s_2, \dots, s_k]$ represents the sentiment scores of all news articles related to stock $i$ on day $t$. This variable-length sequence is then compressed into a 5-dimensional feature vector using basic statistics:
$$\boldsymbol{\phi}_i^t = \mathrm{stats}(\mathbf{s}_i^t) \in \mathbb{R}^5,$$
where $\boldsymbol{\phi}_i^t$ is the vector of summary statistics of $\mathbf{s}_i^t$.
To capture temporal dynamics and optimize the aggregation window, we perform an empirical analysis to determine the optimal time window $W$ for sentiment aggregation. Through experiments on a validation set, we found that a 3-day window ($W = 3$) consistently yields the best performance across diverse stocks. The aggregated sentiment input for stock $i$ at time $t$ is then constructed as:
$$S_i^t = \big[\boldsymbol{\phi}_i^{t}; \boldsymbol{\phi}_i^{t-1}; \boldsymbol{\phi}_i^{t-2}\big] \in \mathbb{R}^{W \times 5}.$$
This matrix is flattened and passed through a fully connected layer to produce a fixed-length sentiment-aware embedding $\mathbf{e}_i^{SA}$:
$$\mathbf{e}_i^{SA} = \mathrm{FC}\big(\mathrm{flatten}(S_i^t)\big).$$
Collectively, we obtain a sentiment-aware representation matrix $E^{SA}$, which is later fused with the P-Reps from IG and FG using a Multi-Head Attention mechanism, as detailed in Section 3.2.4.
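A minimal sketch of this pipeline is shown below. It assumes the publicly available ProsusAI/finbert checkpoint from the Hugging Face hub, a signed per-article score derived from the predicted label and its confidence, mean/std/min/max/count as the five summary statistics, and a day-indexed news dictionary; the paper only specifies FinBERT scores and "basic statistics", so these concrete choices are illustrative.

```python
import numpy as np
from transformers import pipeline

# FinBERT sentiment classifier (labels: positive / negative / neutral).
finbert = pipeline("sentiment-analysis", model="ProsusAI/finbert")

def daily_sentiment_stats(articles):
    """Map a variable-length list of headlines to a 5-d statistics vector."""
    if not articles:
        return np.zeros(5)
    sign = {"positive": 1.0, "negative": -1.0, "neutral": 0.0}
    # Signed score: label sign weighted by the classifier's confidence.
    scores = np.array([sign[o["label"].lower()] * o["score"]
                       for o in finbert(articles, truncation=True)])
    return np.array([scores.mean(), scores.std(), scores.min(),
                     scores.max(), len(scores)])

def sa_rep_input(news_by_day, t, window=3):
    """Stack the last `window` days of statistics (the optimal 3-day window)."""
    return np.stack([daily_sentiment_stats(news_by_day.get(t - k, []))
                     for k in range(window)])
```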
3.2.4. Adaptive Feature Fusion via Multi-Head Attention
Since the model constructs three distinct types of embeddings—two graph-based structural representations (P-Rep from IG and FG) and one sentiment-aware representation (SA-Rep)—it is crucial to integrate them in a manner that captures their complementary contributions. These embeddings encode stock information from different perspectives: sectoral structure, behavioral correlation, and sentiment dynamics. Direct concatenation or simple pooling would fail to model the intricate dependencies and relevance between them.
To address this, we employ a Multi-Head Attention (MHA) mechanism (Figure 2), which allows the model to adaptively learn both the importance and interaction of each embedding stream. Unlike single-head attention, MHA employs multiple parallel attention heads, each focusing on different subspaces of the input features. This enhances the model’s expressiveness while maintaining computational efficiency.
Let $E = \big[E^{IG}; E^{FG}; E^{SA}\big]$ denote the concatenation of the three embedding matrices along the feature dimension, where each $E^{*} \in \mathbb{R}^{N \times d}$ corresponds to a modality-specific representation (industry, fundamental, sentiment), and $N$ is the number of stocks.
We then linearly project each modality-specific representation into query, key, and value spaces using learned parameter matrices:
$$Q^{*} = E^{*} W_Q^{*}, \qquad K^{*} = E^{*} W_K^{*}, \qquad V^{*} = E^{*} W_V^{*}, \qquad * \in \{IG, FG, SA\}.$$
Here, $W_Q^{*}, W_K^{*}, W_V^{*} \in \mathbb{R}^{d \times d_h}$ are the projection matrices for modality $*$ (IG, FG, SA), and $d_h$ is the dimensionality per head. For each attention head $h \in \{1, \dots, H\}$, we compute the scaled dot-product attention as:
$$\mathrm{head}_h = \mathrm{softmax}\!\left(\frac{Q_h K_h^{\top}}{\sqrt{d_h}}\right) V_h.$$
All attention heads are then concatenated and linearly transformed to obtain the fused representation:
$$Z = \mathrm{Concat}\big(\mathrm{head}_1, \dots, \mathrm{head}_H\big)\, W_O,$$
where $W_O$ is the output projection matrix.
This fusion layer enables the model to capture both intra-modality and inter-modality interactions dynamically. The attention weights reflect the relevance of each modality to the prediction task, allowing the model to suppress irrelevant signals while enhancing critical features. The resulting fused embedding is then passed to an LSTM layer for temporal modeling and prediction.
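For illustration, the PyTorch sketch below treats the three modality embeddings of each stock as a three-token sequence and fuses them with the library's built-in multi-head attention. Collapsing the per-modality projections into nn.MultiheadAttention, as well as the embedding size and head count, are simplifying assumptions rather than the exact parameterization above.

```python
import torch
import torch.nn as nn

class ModalityFusion(nn.Module):
    """Fuse IG, FG, and SA-Rep embeddings with multi-head self-attention."""
    def __init__(self, dim=64, heads=16):
        super().__init__()
        self.mha = nn.MultiheadAttention(embed_dim=dim, num_heads=heads,
                                         batch_first=True)
        self.proj = nn.Linear(3 * dim, dim)  # flatten the three attended views

    def forward(self, e_ig, e_fg, e_sa):
        # Each input: (N_stocks, dim). Stack as a 3-token sequence per stock.
        tokens = torch.stack([e_ig, e_fg, e_sa], dim=1)   # (N, 3, dim)
        fused, attn = self.mha(tokens, tokens, tokens)    # attention over modalities
        return self.proj(fused.flatten(1)), attn          # (N, dim), weights
```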
3.2.5. Temporal Forecasting
The Multi-Head Attention mechanism fuses features from the industry graph (IG), fundamental graph (FG), and sentiment embeddings (SA-Rep). The resulting unified representation encodes rich multi-modal information for each stock. An LSTM layer then models temporal dependencies and predicts stock prices.
While conventional feedforward neural networks are inadequate for this task due to their lack of memory, Recurrent Neural Networks (RNNs) (Sherstinsky, 2020) were designed to address this by maintaining hidden states across time steps. However, standard RNNs often struggle with vanishing gradients, limiting their ability to learn long-range dependencies. Long Short-Term Memory (LSTM) networks (Hochreiter & Schmidhuber, 1997) overcome these limitations by introducing gated memory units that selectively retain, update, or discard information over time. These gates allow the network to preserve relevant information from earlier time steps, making LSTMs particularly well suited for financial forecasting scenarios where market behavior can be influenced by events or trends occurring over extended periods.
In this study, we chose LSTM for the prediction module due to its proven effectiveness in capturing temporal dependencies in financial time series data. While more recent techniques like transformers or temporal convolutional networks have shown promise in other domains, LSTMs remain a robust and computationally efficient choice for sequence modeling, especially given the relatively short sequence lengths in our daily stock price data. Additionally, the upstream diffusion-based graph representations already capture complex spatial dependencies, making the LSTM a suitable complement for temporal modeling.
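A minimal forecasting head consistent with this description is sketched below; the hidden size and the assumption that the fused embeddings are arranged as a sequence over the lookback window are illustrative.

```python
import torch
import torch.nn as nn

class PriceForecaster(nn.Module):
    """LSTM over the sequence of fused embeddings, regressing the next price."""
    def __init__(self, in_dim=64, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, fused_seq):
        # fused_seq: (batch, T, in_dim) fused embeddings over the lookback window.
        out, _ = self.lstm(fused_seq)
        return self.head(out[:, -1])  # prediction from the last time step
```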
3.2.6. DASF-Net Training Procedure
The complete training process of the proposed DASF-Net framework is summarized in Algorithm 1. The model ingests historical stock price data and news articles, transforming them into structured graph-based representations and sentiment-based features, respectively. These are then adaptively fused via Multi-Head Attention before being processed by an LSTM network to generate future stock price predictions.
Algorithm 1 DASF-Net Training Algorithm.
Input: Historical stock prices P and financial news articles for the N stocks, together with the lookback windows for prices and articles. Output: Predicted stock prices.
1: Construct Industry Graph and Fundamental Graph (Section 3.2.1)
2: Initialize model parameters
3: for epoch = 1 to MaxEpochs do
4:   for each time step t in the training period do
5:     Graph Representation Learning:
6:     for each graph g ∈ {IG, FG} do
7:       Compute stock return sequences using historical prices P
8:       Construct adjacency matrix based on Equations (6) or (7)
9:       Generate structural embeddings via diffusion process (Equations (8)–(10))
10:    end for
11:    Sentiment Representation Learning:
12:    for each stock i do
13:      Retrieve financial news articles
14:      for each article do
15:        Compute sentiment score using FinBERT (Equation (11))
16:      end for
17:      Construct sentiment statistics using Equation (12)
18:    end for
19:    Construct sentiment-aware representation using Equations (13) and (14)
20:    Feature Fusion and Prediction:
21:    Fuse $E^{IG}$, $E^{FG}$, $E^{SA}$ via Multi-Head Attention (Section 3.2.4)
22:    for each stock i do
23:      Predict next price
24:    end for
25:    Compute Mean Squared Error (MSE) loss
26:    Update parameters via gradient descent
27:  end for
28: end for
29: return Predicted stock prices
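The outer optimization of Algorithm 1 maps onto a standard PyTorch training loop, sketched below; the wrapper `model` (assumed to chain the graph, sentiment, fusion, and LSTM modules), the data loader format, the optimizer choice, and the hyperparameter values are assumptions for illustration.

```python
import torch

def train_dasf_net(model, loader, epochs=100, lr=1e-3):
    """`loader` is assumed to yield ((e_ig, e_fg, e_sa_seq), target_prices) batches."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        for inputs, y in loader:
            y_hat = model(*inputs)                # steps 5-24: representations + prediction
            loss = loss_fn(y_hat.squeeze(-1), y)  # step 25: MSE loss
            opt.zero_grad()
            loss.backward()
            opt.step()                            # step 26: parameter update
    return model
```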
4. Experimental Setup
This section outlines the dataset, evaluation metrics, baseline models, and parameter settings used to evaluate our proposed framework. We provide a comprehensive description to ensure reproducibility and clarity in assessing the model’s performance.
4.1. Dataset
For this study, we focus on the period from 1 January 2020 to 31 December 2023, capturing recent market dynamics, including the COVID-19 pandemic and economic recovery phases. This period ensures relevance to contemporary financial conditions. The dataset is divided into training, validation, and test sets, as shown in Table 2, with temporal separation to evaluate generalization to unseen future data. We select 12 major S&P 500 stocks based on their market capitalization and sector diversity to ensure a representative sample of the market.
Data Preprocessing (a brief code sketch of these steps follows the list):
Stock Prices: Normalized using min-max scaling to ensure comparability across stocks with varying price ranges.
News Articles: Missing articles are handled by propagating the most recent available sentiment score, ensuring continuity in sentiment analysis.
Graph Construction: The dataset is filtered to include only the largest connected component of the stock graph to maintain consistency in graph-based learning.
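A brief pandas sketch of the first two preprocessing steps, assuming price and sentiment tables indexed by date with one column per stock; the column layout and function name are illustrative assumptions.

```python
import pandas as pd

def preprocess(prices: pd.DataFrame, sentiment: pd.DataFrame):
    # Min-max scale each stock's price series to [0, 1] for comparability.
    scaled = (prices - prices.min()) / (prices.max() - prices.min())
    # Propagate the most recent available sentiment score over missing days.
    sentiment_filled = sentiment.ffill()
    return scaled, sentiment_filled
```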
4.2. Evaluation Metrics
To comprehensively assess DASF-Net’s performance and compare it with baseline models, we employ the following widely used regression metrics:
Mean Squared Error (MSE): Measures the average squared difference between predicted and actual stock prices, giving greater weight to larger errors. It is calculated as:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2,$$
where $y_i$ represents the actual stock price and $\hat{y}_i$ is the predicted price for the $i$-th data point, and $n$ is the number of data points.
Mean Absolute Error (MAE): Measures the average absolute difference between predicted and actual stock prices, providing a more robust measure against outliers. It is calculated as:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|,$$
where $y_i$ and $\hat{y}_i$ are the actual and predicted stock prices, respectively, and $n$ is the number of data points.
These metrics provide complementary insights into the models’ predictive accuracy. MSE penalizes larger errors more heavily, while MAE provides a more balanced assessment by treating all errors equally, making it less sensitive to outliers.
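For reference, both metrics reduce to a few lines of NumPy; the helper names are arbitrary.

```python
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))
```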
4.3. Baseline Models
In this section, we present the baseline models for stock price prediction against which our proposal is compared.
LSTM + CNN (Eapen et al., 2019): A hybrid model combining Convolutional Neural Networks (CNNs) for spatial feature extraction with Long Short-Term Memory (LSTM) units for temporal modeling, effective for sequential data but limited to price-based inputs.
Multi-GCGRU (Ye et al., 2021): Integrates Graph Convolutional Networks (GCNs) with Gated Recurrent Units (GRUs) to model both structural relationships and temporal dynamics among stocks.
Sentiment + LSTM (Jin et al., 2020): An LSTM-based model incorporating sentiment features extracted from financial news, capturing qualitative signals but lacking structural modeling.
MGAR (D. Cao et al., 2020): A framework that fuses embeddings from multiple graph structures (e.g., industry, correlation) to enhance stock price prediction, representing a multimodal graph-based approach.
These baselines cover a spectrum of methodologies, enabling a robust comparison with our DASF-Net.
4.4. Parameter Settings
All models, including DASF-Net and baselines, were optimized using a combination of grid and random search on the validation set, with configurations selected based on the lowest validation MSE.
Table 3 summarizes the hyperparameters for DASF-Net, incorporating settings for diffusion-based graph learning, sentiment aggregation, and attention mechanisms.
For baseline models, hyperparameters were tuned within comparable ranges to ensure fairness:
LSTM+CNN: Hidden dimension of 32–64, learning rate of 0.001–0.01, up to four layers.
Multi-GCGRU: 1–3 GCN layers, GRU hidden size of 32–64, learning rate of 0.001–0.01.
Sentiment+LSTM: Sentiment window size of 1–5 days, LSTM hidden size of 32–64, learning rate of 0.001–0.01.
MGAR: Four graph types (industry, correlation, etc.), hidden dimension of 32–64, learning rate of 0.001–0.01.
These settings ensure a fair and rigorous comparison, with all models optimized for the FNSPID dataset and forecasting task.
5. Results
5.1. Forecasting Performance
This section presents the forecasting performance of the Diffusion-Aware Sentiment Fusion Network (DASF-Net) compared to state-of-the-art baseline models and ablation variants across 1-day, 2-day, and 3-day prediction horizons. We evaluate performance using the Mean Squared Error (MSE) and Mean Absolute Error (MAE), where lower values indicate higher accuracy. To ensure robustness, we performed paired t-tests to confirm that DASF-Net’s improvements over the baselines are statistically significant unless otherwise noted.
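The significance check can be reproduced with SciPy's paired t-test, as in the sketch below; the per-sample error arrays are assumed to be aligned across models.

```python
from scipy import stats

def paired_significance(errors_dasfnet, errors_baseline):
    """Paired t-test on per-sample errors of DASF-Net vs. a baseline."""
    t_stat, p_value = stats.ttest_rel(errors_dasfnet, errors_baseline)
    return t_stat, p_value  # significant if p_value falls below the chosen threshold
```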
Table 4 presents the results for 1-day, 2-day, and 3-day stock price forecasts, comparing DASF-Net against the baselines and ablation variants.
DASF-Net consistently outperforms all baseline models across all forecasting horizons, as shown in Table 4. For 1-day predictions, DASF-Net (Full) achieves the lowest MSE, representing a relative reduction of 91.6% compared to MGAR (D. Cao et al., 2020), 94.7% compared to Sentiment + LSTM (Jin et al., 2020), 80.0% compared to LSTM+CNN (Eapen et al., 2019), and 98.8% compared to Multi-GCGRU (Ye et al., 2021). Similar improvements are observed for MAE, with DASF-Net achieving a 6.3% to 86.4% reduction relative to the baselines. These gains are statistically significant (paired t-test), underscoring DASF-Net’s superior accuracy and robustness.
The full model, which integrates industry graph (IG), fundamental graph (FG), and sentiment-aware representations (SA-Rep), outperforms variants using only IG or only FG (Table 4), demonstrating the complementarity of dual-graph learning. The IG+FG variant improves over single-graph models but is surpassed by the full model by 17.4% in MSE for 1-day predictions, emphasizing the critical role of sentiment integration via FinBERT. Compared to the FG-only variant, the full model reduces MSE by 88.8%, highlighting the synergy of structural (P-Rep) and sentiment (SA-Rep) representations.
For 3-day forecasts, DASF-Net (Full) maintains its advantage over the IG-only and FG-only variants (Table 4). These results demonstrate that incorporating sentiment signals enhances long-term forecasting accuracy, particularly in volatile markets where news-driven sentiment plays a significant role. The relative MSE reduction over baselines ranges from 45.0% (LSTM+CNN) to 93.5% (Multi-GCGRU) for 3-day predictions, further validating DASF-Net’s robustness across horizons.
In conclusion, DASF-Net achieves state-of-the-art performance in multi-horizon stock price forecasting by effectively integrating diverse market signals through diffusion-based graph learning and optimized sentiment fusion. These results, validated on the FNSPID dataset, establish DASF-Net as a robust framework for financial forecasting, with significant improvements over existing methods.
5.2. Impact of Sentiment Aggregation Window Size
To highlight the impact of temporal context in sentiment analysis, we analyze the sensitivity of DASF-Net to the sentiment aggregation window size, denoted $W$. This parameter determines the number of preceding trading days from which sentiment is aggregated to form the sentiment-aware representation (SA-Rep). We assess the model’s performance across a range of $W$ values using our set of 12 representative S&P 500 stocks from four sectors. Table 5 presents MSE and MAE values averaged across these stocks for varying sentiment window sizes.
The sensitivity analysis presented in Table 5 examines the impact of the sentiment window size $W$ on the prediction performance of the DASF-Net model for 1-day forecasting, averaged across the 12 stocks. This study highlights the critical importance of selecting an appropriate window for sentiment integration.
As evidenced by the data, the model demonstrates optimal prediction accuracy when the sentiment window size is set to 3 days; consequently, this value is identified as the optimal window size, $W^{*} = 3$. At this configuration, the model achieves the lowest MSE and MAE.
Deviations from this optimal window size lead to a consistent degradation in performance. A shorter window of 1 day, for instance, results in higher errors, suggesting that a very narrow sentiment scope may lack sufficient contextual information. Conversely, progressively increasing the window size beyond 3 days consistently worsens performance, with errors rising steadily and reaching their highest values at the largest window tested (Table 5). This indicates that excessively large sentiment windows may introduce irrelevant noise, dilute the impact of recent and more pertinent sentiment, or incorporate outdated information, thereby hindering predictive accuracy.
We also tested the intermediate windows of 2 and 4 days to ensure the observed optimum is not an artifact of coarse sampling. These additional results show that both the 2-day and 4-day windows perform slightly worse than the 3-day window, with average MAE differences within 0.05–0.07 percentage points. This suggests a relatively flat error surface in the 2–4 day region but confirms $W^{*} = 3$ as the global optimum due to its consistent superiority across most stocks.
Furthermore, to address potential sector- or volatility-dependence of the optimal window size, we analyzed per-stock performance across the 12 representative S&P 500 stocks from four distinct sectors (Information Technology, Consumer Discretionary, Energy, Communication Services). As reported in Table A1, the 3-day window consistently provides strong and stable results across both highly volatile stocks (e.g., TSLA, NVDA) and more stable ones (e.g., V, XOM). This analysis reinforces the robustness of $W^{*} = 3$ and supports the choice of a uniform sentiment window in DASF-Net.
We assessed the temporal stability of FinBERT sentiment scores, observing a 15% increase in variance during the COVID-19 period (2020–2021) compared to 2022–2023. The 3-day aggregation window mitigates these fluctuations, as discussed in Section 3.2.3.
5.3. Impact of Individual Components Within DASF-Net
To quantify the individual contributions of each component within DASF-Net, we conduct an ablation study. We evaluate the performance impact of removing or altering key modules, training and evaluating all configurations under identical conditions for 1-day forecasting across the 12 representative S&P 500 stocks. The results are presented in Table 6.
The ablation study, systematically presented in Table 6, provides compelling evidence for the indispensable contribution of each proposed component to the overall predictive performance of the DASF-Net model for 1-day forecasting.
The Full DASF-Net configuration achieves the most favorable results, with the lowest MSE and MAE (Table 6), and serves as the benchmark for evaluating the individual impact of each module.
The study reveals distinct performance degradations upon the successive removal of key components. Specifically, omitting the industry-graph embedding, which captures sector-level relationships, leads to a noticeable increase in both MSE and MAE. Similarly, excluding the fundamental-graph embedding, constructed from 20-day stock return similarities, results in heightened errors. These findings underscore the significant utility of both graph-based embeddings in comprehensively representing market inter-dependencies.
Most notably, constructing SA-Rep proves to be a pivotal component. Its removal leads to the most pronounced decline in accuracy, with the largest increases in MSE and MAE (Table 6). This highlights the critical role of incorporating real-time sentiment information for robust predictive capabilities.
Furthermore, a comparative analysis of feature fusion strategies accentuates the efficacy of the full DASF-Net’s integrated architecture. The superior performance of the Full DASF-Net indicates that its more sophisticated fusion mechanism (implicitly Multi-Head Attention) is crucial.
In conclusion, the ablation study unequivocally demonstrates that the optimal predictive performance of DASF-Net is contingent upon the synergistic integration of all proposed components: the embedding constructed from the industry graph (IG) for sector-level insights, the embedding constructed from the fundamental graph (FG) for capturing stock trending similarity, and the vital sentiment analysis representation (SA-Rep), all effectively combined through its advanced feature fusion architecture.
5.4. Impact of Attention Heads and Fusion Methods
To evaluate the role of the attention-based fusion mechanisms in our framework, we systematically examine the impact of (i) varying the number of attention heads in the MHA module and (ii) comparing it to alternative static fusion methods. Specifically, we consider:
Self-Attention: a single-head attention mechanism that lacks the ability to learn diverse relational perspectives.
MHA-n: multi-head attention with n heads to allow distributed representation learning across different subspaces.
Mean-Pooling: uniform averaging across feature representations.
Max-Pooling: selection of dominant features without contextual adaptivity.
As shown in Table 7 and Figure 3, the number of attention heads significantly influences the quality of feature fusion. The MHA configuration with 16 heads consistently yields the best performance, achieving the lowest MSE and MAE and substantially outperforming self-attention and static pooling methods.
Notably, increasing the number of heads from 2 to 16 progressively reduces the error, indicating that the model benefits from attending to multiple subspaces in parallel. This richer representation allows the network to better model complex interactions between price-based and sentiment-based features. While improvements taper off beyond 8 heads, MHA-16 still provides marginal gains, suggesting its effectiveness in capturing fine-grained cross-modal dependencies.
In contrast, static fusion approaches such as Mean-Pooling and Max-Pooling are markedly less effective. These methods apply uniform or fixed aggregation, which cannot adaptively emphasize contextually important signals. For instance, Max-Pooling yields an MSE more than twice that of MHA-16. Similarly, Mean-Pooling performs better than Max-Pooling but still lags behind even MHA-2.
These findings reinforce that:
Learnable fusion methods significantly outperform fixed ones in modeling heterogeneous financial features.
The use of multiple attention heads provides complementary views of data, enabling more accurate and robust predictions.
MHA-16 strikes the best balance between model complexity and predictive accuracy in our DASF-Net framework.
5.5. Impact of Diffusion Strategies
To assess the sensitivity of DASF-Net to different diffusion formulations, we compare three widely used non-recurrent methods: Random Walk (RW), Personalized PageRank (PPR), and Heat Kernel (HK). For consistency, a top-k sparsification (keeping the strongest 128 edges per node) was applied in the PPR and RW configurations, following the approach in (Gasteiger et al., 2022). Each method controls the spread of information across the graph in a distinct way.
As shown in Table 8, the Heat Kernel strategy consistently yields the best performance, achieving the lowest MSE and MAE. This highlights the benefit of exponential smoothing over the graph Laplacian, which effectively balances local node identity and global structure.
The choice of diffusion parameters, such as the teleportation probability in Personalized PageRank, was determined through cross-validation on a held-out validation set. We found that a teleportation probability of 0.15 provided the best balance between local and global information propagation. Additionally, our experiments with different diffusion kernels showed that Personalized PageRank outperformed uniform Random Walk, likely due to its ability to bias the diffusion towards the source node, preserving node-specific signals. This is consistent with our ablation study results, where the uniform Random Walk approach led to a slight degradation in performance.
Overall, these results indicate that the choice of diffusion kernel significantly affects model performance, reinforcing the flexibility of DASF-Net in supporting multiple graph learning paradigms.
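To make the comparison concrete, the sketch below computes the three diffusion matrices in their standard closed forms and applies the top-k sparsification described above. The teleportation probability of 0.15 and k = 128 follow the text; the random-walk step count and the heat-kernel time parameter are illustrative assumptions, as is the use of a single normalized adjacency matrix for all three kernels.

```python
import numpy as np
from scipy.linalg import expm

def random_walk_kernel(A_norm, steps=3):
    """Uniform random-walk diffusion: k-step propagation A_norm^k."""
    return np.linalg.matrix_power(A_norm, steps)

def ppr_kernel(A_norm, alpha=0.15):
    """Personalized PageRank diffusion: alpha * (I - (1 - alpha) * A_norm)^{-1}."""
    n = A_norm.shape[0]
    return alpha * np.linalg.inv(np.eye(n) - (1.0 - alpha) * A_norm)

def heat_kernel(A_norm, t=5.0):
    """Heat-kernel diffusion: exp(-t * L) with L = I - A_norm."""
    return expm(-t * (np.eye(A_norm.shape[0]) - A_norm))

def top_k_sparsify(S, k=128):
    """Keep only the k strongest edges per node (PPR/RW configurations)."""
    out = np.zeros_like(S)
    for i, row in enumerate(S):
        idx = np.argsort(row)[-k:]
        out[i, idx] = row[idx]
    return out
```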
6. Conclusions
This work introduces the Diffusion-Aware Sentiment Fusion Network (DASF-Net), a novel multimodal framework for stock price forecasting that integrates structural and sentiment information through diffusion-based graph learning and adaptive fusion. By addressing limitations in traditional Graph Neural Networks (GNNs) and sentiment aggregation methods, DASF-Net achieves state-of-the-art performance on a large-scale financial dataset. This section summarizes our contributions, highlights key empirical findings, and outlines promising directions for future research.
6.1. Summary of Contributions
DASF-Net integrates three core components to advance stock price forecasting: dual-graph learning, sentiment encoding, and adaptive multimodal fusion. First, DASF-Net employs heat kernel diffusion on two complementary financial graphs—an industry graph (IG) capturing static sectoral relationships and a fundamental graph (FG) encoding dynamic return-based similarities—to model higher-order inter-stock dependencies. This approach overcomes limitations of traditional GNNs, such as oversmoothing and small receptive fields. Second, sentiment-aware representations (SA-Rep) are extracted from financial news using FinBERT, with a systematically optimized 3-day aggregation window to capture short-term investor sentiment while minimizing temporal bias, addressing issues in fixed-window approaches. Third, a multi-head attention (MHA) mechanism adaptively fuses structural (P-Rep) and sentiment (SA-Rep) representations, enabling dynamic weighting of modalities under fluctuating market conditions. These innovations collectively establish DASF-Net as a robust and flexible framework for multimodal financial forecasting. However, DASF-Net also presents certain limitations, such as its dependency on high-quality and timely financial news data, and potential scalability challenges when applied to very large stock universes. These aspects highlight promising avenues for future work to enhance the model’s practicality and generalizability.
6.2. Key Findings
We conducted experiments on the Financial News and Stock Price Integration Dataset (FNSPID) (Dong et al., 2024), covering 12 S&P 500 stocks from 2020 to 2023. DASF-Net outperforms baselines including MGAR, Sentiment-LSTM, LSTM + CNN, and Multi-GCGRU. For 1-day predictions, DASF-Net achieves relative MSE reductions of 91.6% compared to MGAR (D. Cao et al., 2020), 94.7% compared to Sentiment + LSTM (Jin et al., 2020), 80.0% compared to LSTM + CNN (Eapen et al., 2019), and 98.8% compared to Multi-GCGRU (Ye et al., 2021). Similar improvements are observed for MAE, with reductions of 6.3% to 86.4% relative to the baselines. These gains are statistically significant (paired t-test), underscoring DASF-Net’s robustness.
Although our experiments focused on 12 major S&P 500 stocks, the DASF-Net framework is designed to be adaptable to a wider range of stocks and market conditions. The dual-graph approach, combining industry affiliations and dynamic return-based similarities, should theoretically capture both sector-specific and market-wide dependencies, making it applicable to small-cap or international stocks. However, further experiments on diverse datasets would be necessary to confirm this. Additionally, during periods of high volatility, the sentiment integration component may become even more crucial, as news and investor sentiment often drive rapid price movements. Future work could explore the model’s performance during such periods, potentially incorporating real-time sentiment analysis for more responsive predictions.
Ablation studies (Section 5.4) confirm that replacing MHA with static aggregation increases MSE by 139.5% with the Mean-Pool method, emphasizing the critical role of adaptive fusion. Sensitivity analyses further reveal that the 3-day sentiment window and deeper attention heads enhance predictive accuracy, particularly in volatile markets, by effectively capturing short-term sentiment dynamics and fine-grained cross-modal interactions. These findings validate DASF-Net’s design and its ability to model complex financial dynamics through diffusion-based learning and optimized sentiment aggregation.
6.3. Practical Considerations
While DASF-Net primarily targets predictive accuracy, we recognize the importance of interpretability, scalability, and adaptability for practical deployment. The model’s attention mechanisms in both the diffusion and sentiment modules provide inherent explainability by highlighting which graph connections and news tokens drive predictions. For scalability, diffusion updates operate incrementally—processing only newly added or removed edges—while sentiment embeddings are cached to avoid recomputation for unchanged headlines. This design reduces computational overhead significantly and supports real-time inference for larger universes. Furthermore, DASF-Net is asset-agnostic: adapting it to other markets requires only updating the relationship graph and using a suitable multilingual sentiment encoder. These considerations strengthen the framework’s potential for real-world applications.
6.4. Future Directions
This research opens several promising avenues for advancing multimodal financial forecasting. First, exploring alternative diffusion kernels, such as learnable or graph-adaptive kernels, could enhance the capture of richer structural semantics. Second, integrating large language models (LLMs) beyond FinBERT, such as those capable of narrative-driven or event-based reasoning, could enable deeper analysis of financial texts, moving beyond sentence-level sentiment to capture macroeconomic trends or company-specific events (Jin et al., 2020). Third, modeling hierarchical financial graphs at multiple resolutions—spanning company-level, sector-level, and macroeconomic interactions—could improve the representation of complex market dynamics. Additionally, optimizing DASF-Net’s computational complexity through sparse diffusion updates, model pruning, or hardware acceleration can further support real-time inference and scalability in large-scale applications. Finally, extending DASF-Net to other financial tasks, such as volatility prediction or portfolio optimization, could broaden its applicability.
Future work should also extend the evaluation of DASF-Net to longer and more diverse market periods beyond 2020–2023 to assess robustness across typical economic cycles. The current study focused on this recent period as it provides a challenging testbed with extreme volatility and rapid structural changes, while also reflecting practical constraints due to the availability of high-quality, large-scale sentiment data in recent years. Earlier periods often lack comprehensive sentiment annotations aligned with price series, which presents a challenge for historical evaluation. Addressing this limitation in future studies could further validate the model’s generalizability and performance under different market conditions.
In conclusion, DASF-Net sets a strong benchmark for stock price forecasting by synergistically integrating diffusion-based dual-graph learning, optimized sentiment encoding, and adaptive multi-head attention. Its design considerations for interpretability, scalability, and portability further pave the way for practical deployment and innovations in multimodal predictive modeling.