Article

A Causally Constrained Framework Coupling Causal Discovery and SEIR Mechanisms for Interpretable Epidemic Modeling

1
School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
2
Hunan Key Laboratory for Service Computing and Novel Software Technology, Hunan University of Science and Technology, Xiangtan 411201, China
3
School of Biomedical Sciences, Hunan University, Changsha 410012, China
*
Author to whom correspondence should be addressed.
Mathematics 2026, 14(10), 1776; https://doi.org/10.3390/math14101776
Submission received: 20 April 2026 / Revised: 11 May 2026 / Accepted: 18 May 2026 / Published: 21 May 2026

Abstract

Infectious disease transmission is a complex dynamic process governed by intrinsic causal mechanisms rather than simple statistical correlations. Although deep learning paradigms have demonstrated powerful nonlinear representation capabilities, their “black-box” and purely data-driven nature often leads to a severe lack of causal consistency and logical transparency. To bridge this gap, this paper proposes CCSANet (Causally Constrained SEIR-Aware Network), an interpretable forecasting framework that embeds epidemiological priors directly into the neural architecture. The model integrates SEIR dynamics into a temporal causal discovery framework, utilizing a mechanism-aware prior loss to guide a CausalFormer in learning a global temporal causal graph from multi-source heterogeneous data. This ensures that the identified relationships adhere to the fundamental evolutionary logic of contagion. Subsequently, the extracted causal subgraphs are encoded as structural priors within a Causal-SCI-Block via a specialized masking mechanism, forcing information to propagate exclusively along epidemiologically legitimate pathways. To ensure deep alignment between neural representations and physical reality, a causal strength alignment loss is introduced to synchronize the network’s attention weights with actual transmission intensities. Experimental evaluations on real-world multi-city datasets demonstrate that this integrated approach significantly outperforms baselines such as LSTM, Informer, and its predecessor, ESASNet. Under a 7-day sliding window configuration, the model maintains a coefficient of determination $R^2$ stably above 0.97, achieving an accuracy improvement of 5.5% to 6.2% and an 8% to 10% reduction in SMAPE, thereby demonstrating that coupling causal discovery with SEIR constraints substantially enhances both predictive precision and physical interpretability.

1. Introduction

Since the beginning of the 21st century, global pandemics of sudden-onset infectious diseases have posed enduring challenges to human health, economic stability, and social governance. When responding to such public health crises, accurate and timely epidemic trend forecasting is the cornerstone for governments to formulate non-pharmaceutical interventions (NPIs), optimize medical resource allocation, and develop vaccination strategies [1]. However, infectious disease transmission is a complex dynamic process driven by multi-dimensional environmental factors and exhibiting highly nonlinear evolutionary characteristics [2]. Extracting robust evolutionary patterns from massive, heterogeneous monitoring data and transforming them into forward-looking decision-support information remains a major and urgent problem in computational epidemiology.
Since Kermack proposed the mathematical theory of epidemiology in 1927, mathematical models of infectious diseases have become fundamental tools for analyzing epidemiological characteristics and studying transmission dynamics [3]. In this field, the SIR and SEIR models have been widely adopted for modeling the transmission dynamics of infectious diseases. In recent years, researchers have proposed various extensions to these classical models. Some studies focus on structural enhancements. For example, Ramezani et al. introduced a variant of the SEIRD (Susceptible–Exposed–Infectious–Recovered–Deceased) model to better capture the nonlinear dynamics of COVID-19 [4]. Eiman et al. proposed a fractional-order epidemic model based on fractal calculus that accounts for reinfection. Through theoretical analysis and numerical simulations, their model reveals important properties such as the basic reproduction number, equilibrium points, and system stability [5]. Alzahrani et al. introduced the Atangana–Baleanu–Caputo fractional-order derivative operator into the SEIR model, proposing a more accurate approach to influenza forecasting [6]. This method offers a novel perspective for capturing long-range dependencies in epidemic data. Although these studies have significantly enriched the modeling toolkit for infectious disease forecasting, traditional mathematical models still rely heavily on strong assumptions. Parameter estimation typically requires complex fitting procedures, making the models highly sensitive to initial settings and data quality, and resulting in substantial uncertainty. This challenge becomes particularly prominent in the context of long-term epidemics, where model parameters need to be continuously adjusted over time and in response to external conditions. As such, a single fixed-structure model often struggles to capture the full complexity of dynamic epidemic processes.
With the leap in big data technologies and computing power, deep learning paradigms have achieved significant success in the field of time series forecasting. From early recurrent neural networks to the recently emerged Transformer architectures and their various variants (Informer, Autoformer), as well as hierarchical architectures designed for time series decomposition, these models have continuously pushed the upper limits of forecasting accuracy through their powerful nonlinear representation capabilities. For example, Zeroual et al. systematically compared deep learning algorithms such as simple RNN, LSTM, bidirectional LSTM, GRU, and Variational Autoencoders, with results showing that deep models significantly outperform traditional statistical methods in capturing the complex nonlinear features of epidemic data [7]. Wu et al. proposed the Autoformer model based on a deep decomposition architecture, utilizing an auto-correlation mechanism instead of the traditional attention mechanism to achieve a leap in performance for long-term time series forecasting tasks [8]. Zeng et al. further expanded the boundaries of deep learning in time series modeling by proposing the DLinear model based on linear time series decomposition, delving into the effectiveness of different neural network architectures in long-term forecasting [9]. Wang et al. addressed the computational bottlenecks of traditional models in capturing long-range implicit patterns by proposing the S-Mamba model based on selective state space models, which significantly improved the fitting accuracy for complex epidemic evolutionary trajectories through a recursive architecture with linear complexity [10]. Deep learning time series forecasting models have not only demonstrated superior performance over traditional statistical methods in multiple fields such as rainfall forecasting [11], financial analysis [12], and traffic forecasting [13], but have also excelled in epidemic forecasting.
Kırbaş et al. conducted a comparative study on ARIMA, NARNN, and LSTM, finding that LSTM performed best in modeling confirmed COVID-19 cases [14]. Sembiring et al. proposed an optimized LSTM model (popLSTM) that significantly improved the accuracy of COVID-19 confirmed case predictions by integrating spatiotemporal features in the output gate and maintaining output values below 0.5 [15]. Shao et al.’s research indicated that an LSTM model combining epidemiological information and climate factors had the highest forecasting accuracy in countries like Germany, Italy, and the United States, outperforming methods such as support vector regression and temporal convolutional networks [16]. Additionally, Nabi et al. investigated four deep learning models: LSTM, GRU (Gated Recurrent Unit), CNN (Convolutional Neural Network), and MCNN (Multivariate Convolutional Neural Network), with results showing that CNN outperformed other deep learning models in terms of validation accuracy and prediction consistency [17]. However, despite these models demonstrating outstanding forecasting performance and numerical accuracy in complex epidemic evolution tasks, their inherent “black-box” nature leads to a severe lack of logical transparency in the inference process [18]. In the highly sensitive public health sector, which relies on scientific decision-making, this lack of interpretability not only makes it difficult for forecasting results to gain the trust of decision-makers but also poses significant security risks when responding to epidemic mutations or formulating critical intervention policies [19].
To break the black-box dilemma of neural networks, researchers have introduced a series of post hoc interpretability methods, among which SHAP, LIME, and attention mechanism visualization are the most typical. For instance, Lundberg et al. proposed the game-theory-based SHAP method, which quantifies the marginal contribution of each feature to the model output by calculating Shapley values, providing a unified feature attribution framework for complex model predictions [20]; Ribeiro et al. developed the LIME algorithm, which interprets individual sample predictions of any black-box model by fitting an interpretable model in a local neighborhood, enhancing decision-makers’ understanding of the model’s local predictions [21]; Lim et al., when developing a time-series forecasting model, identified the contributions of different time steps and external features by visualizing self-attention weights, providing an intuitive logical basis for capturing long-range dependencies [22]. Su et al. utilized machine learning combined with SHAP attribution techniques to deeply analyze the role of dietary antioxidants in predicting the comorbidity risk of cardiovascular disease and cancer, proving the outstanding efficacy of attribution analysis in revealing complex biomedical logic [23]. However, although these post hoc explanation methods alleviate the black-box problem to some extent, their essence still belongs to statistical attribution rather than causal inference [24]. 
In the highly sensitive scenario of infectious disease forecasting, such methods have significant limitations: first, the explanation process is decoupled from the modeling process, meaning the explanation results are merely ex post descriptions of the model’s fitting phenomena, rather than hard constraints on the model’s internal logic [25]; second, post hoc methods cannot identify and filter out spurious correlations from the root, and may even yield misleading mechanistic conclusions due to collinearity between features [26]. This defect of non-causal explanations makes it difficult for decision-makers to confirm whether the forecasting results are truly built on scientific transmission logic, thereby restricting the in-depth application of deep learning models in public health decision-making. To provide a structured and systematic overview of the research landscape, the representative models and methodologies discussed above are summarized and categorized in Table 1.
To address the aforementioned limitations, this paper proposes an interpretable forecasting network based on temporal causal discovery—CCSANet (Causally Constrained SEIR-Aware Network). CCSANet no longer passively relies on post hoc explanations, but intrinsically discovers the causal topology G between environmental factors and epidemic evolution at the front end of inference via the CausalFormer module [27]; subsequently, by constructing a causal mask, it imposes structural constraints on the SCI-Block, forcing the model to conduct information transmission only along valid causal paths. This causally endogenous design concept ensures that the model possesses causal-level inference transparency while maintaining the high accuracy of deep learning.
The main contributions of this paper are summarized as follows:
  • Proposing the CCSANet model, which achieves a deep integration of deep learning architectures and temporal causal discovery. We propose CCSANet, a causally constrained deep learning framework based on SEIR epidemic dynamics. By integrating purely data-driven neural architectures with epidemiological mechanisms, this framework fundamentally resolves the “black-box” nature of traditional deep learning models while improving predictive accuracy. To further learn the complex mechanisms of epidemic evolution under the influence of multi-source environmental factors, we introduce the CausalFormer module to endogenously generate a temporal causal graph. The temporal causal graph explicitly guides the representation learning process of the neural network, ensuring that the model’s forecasting trajectory is executed strictly based on epidemiological logic.
  • Designing a structured constraint mechanism based on causal masks to significantly mitigate the risk of learning false causal dependencies. Unlike traditional post hoc explanation methods, CCSANet imposes hard structural constraints on the information flow of the SCI-Block by constructing causal path masks. This design forces the model to propagate information solely along legitimate epidemiological causal chains extending from key mechanistic parameters to the epidemic evolutionary state. It fundamentally eliminates attribution biases caused by collinear features and endows the model with endogenous causal consistency and inference transparency.
  • Conducting extensive validations on multi-source real-world datasets, verifying the model’s dual advantages in forecasting accuracy and causal alignment. In forecasting tasks across multiple countries and regions, CCSANet not only significantly outperforms mainstream baseline models such as LSTM, SCINet, and Informer, but also exhibits stronger generalization capabilities compared to the base ESASNet. Furthermore, through the analysis of the causal graph G, the model accurately identifies the core causal factors driving epidemic fluctuations at different stages, providing a logically rigorous scientific basis for precision prevention and control decisions.

2. Materials and Methods

This section will introduce in detail the proposed interpretable SEIR-aware epidemic forecasting network framework based on temporal causal discovery, which consists of three main components: a temporal causal discovery module, a time-varying SEIR parameter modeling module, and a causally constrained ESASNet deep learning module. As shown in Figure 1, this network integrates these three key components to construct a complete workflow from multi-source data input and causal structure learning to mechanism-aligned parameter forecasting.
The overall workflow of CCSANet proceeds in three steps. First, multi-source time series data composed of confirmed cases, temperature, air quality, and other variables are fed to the CausalFormer model equipped with an SEIR-aware prior loss, which learns the global temporal causal graph and ensures that the discovered causal relationships conform to the mechanisms of epidemic transmission. Second, a causal subgraph containing only SEIR state variables is extracted from the complete causal graph. Finally, this subgraph is used as a structural prior to impose causal constraints on the SCI-Block in ESASNet, restricting information to pass only along valid causal paths, thereby achieving accurate forecasting of time-varying SEIR parameters (such as $\beta(t)$, $\sigma(t)$, and $\gamma(t)$) and supporting interpretability analysis based on causal mechanisms.

2.1. Data Sources and Preprocessing

The datasets utilized in this study are constructed from two primary sources. The epidemiological time-series data were obtained from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. This comprehensive repository tracks daily confirmed cases globally across 289 locations in 201 countries and regions. To capture the complete evolutionary trajectory of the pandemic, we extracted data spanning from 22 January 2020 to 9 March 2023. Simultaneously, the corresponding multi-source environmental records were collected from the China Meteorological Data Service Center. To formulate a strictly aligned multivariate time-series dataset, we integrated key meteorological and air quality variables, specifically including daily average TEMP, $\mathrm{PM}_{2.5}$, $\mathrm{SO}_2$, $\mathrm{NO}_2$, and $\mathrm{O}_3$.
This study selects five typical cities in China: Beijing, Shanghai, Shenzhen, Chengdu, and Changsha as research subjects, integrating multi-source heterogeneous data to construct an epidemic transmission analysis framework. The selection of these specific urban nodes is predicated on their inherent data characteristics. Firstly, these cities are characterized by massive population densities and extreme demographic mobility, resulting in highly complex and non-stationary epidemic transmission dynamics. Secondly, the daily epidemiological sequences in these regions exhibit extreme volatility, with data distributions ranging from baseline periods of near-zero cases to severe outbreak peaks involving thousands of daily infections. Such profound data variance and structural complexity provide an exceptionally rigorous testbed for evaluating the forecasting capability and robustness of the proposed causally constrained architecture under multi-source environmental interventions.
In the preprocessing stage, as shown in Figure 2, quality control is first applied to the original sequences: for a small number of missing values, linear interpolation is used to fill them; for abnormal observations that significantly deviate from the normal fluctuation range, a smoothing correction combining a 7-day moving average and a statistical threshold method is applied to reduce noise interference. Subsequently, all variables are normalized and mapped to the $[0, 1]$ interval to eliminate dimensional differences.
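The preprocessing steps above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function name `preprocess`, the centered window, and the 3-standard-deviation threshold `z_thresh` are assumptions, since the text does not specify the statistical threshold.

```python
import numpy as np

def preprocess(series, window=7, z_thresh=3.0):
    """Interpolate gaps, damp outliers, and min-max scale one variable.

    Assumes at least one non-missing value and an odd `window`.
    The threshold rule is a hypothetical choice.
    """
    x = np.asarray(series, dtype=float).copy()
    idx = np.arange(x.size)

    # 1) Fill missing values (NaN) by linear interpolation.
    missing = np.isnan(x)
    x[missing] = np.interp(idx[missing], idx[~missing], x[~missing])

    # 2) Centered moving average; edges are padded with the boundary value.
    pad = window // 2
    padded = np.pad(x, pad, mode="edge")
    smooth = np.convolve(padded, np.ones(window) / window, mode="valid")

    # 3) Replace points whose residual exceeds z_thresh standard deviations.
    resid = x - smooth
    sigma = resid.std()
    if sigma > 0:
        outliers = np.abs(resid - resid.mean()) > z_thresh * sigma
        x[outliers] = smooth[outliers]

    # 4) Min-max normalization to [0, 1].
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo) if hi > lo else np.zeros_like(x)
```

Each variable (cases, temperature, pollutant concentrations) would be passed through such a function independently, so that scaling does not mix units.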
Then, we transformed the raw data according to the requirements of the SEIR model. Since the raw data mainly provides cumulative confirmed cases, cumulative recovered cases, and cumulative deceased cases, we need to convert them into the four state variables in the SEIR model: susceptible, exposed, infectious, and recovered populations. The specific conversion process is as follows: recovered population directly uses the sum of cumulative recovered cases and cumulative deceased cases; infectious population uses cumulative confirmed cases minus the recovered population; exposed population is estimated based on existing literature and epidemiological research, taken as 5 times the daily new confirmed cases [28]; and susceptible population uses the total population minus the sum of the exposed, infectious, and recovered populations [29].
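As a rough sketch of the conversion rules just described, the following function derives the four compartments from cumulative counts. The function name `to_seir` and the handling of the first day (new cases set to zero) are assumptions; the factor of 5 for the exposed population follows the text.

```python
import numpy as np

def to_seir(cum_confirmed, cum_recovered, cum_deceased, population,
            exposed_factor=5.0):
    """Return an array of shape (4, N) with rows S, E, I, R."""
    c = np.asarray(cum_confirmed, dtype=float)
    # Recovered compartment: cumulative recovered plus cumulative deceased.
    r = np.asarray(cum_recovered, dtype=float) + np.asarray(cum_deceased, dtype=float)
    # Infectious compartment: cumulative confirmed minus the recovered compartment.
    i = c - r
    # Daily new confirmed cases (assumption: zero on the first day).
    daily_new = np.diff(c, prepend=c[0])
    # Exposed compartment: 5 times the daily new confirmed cases.
    e = exposed_factor * daily_new
    # Susceptible compartment: total population minus E, I, and R.
    s = population - (e + i + r)
    return np.stack([s, e, i, r])
```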
To construct the input–output dataset for the model, we adopt a sliding window technique [30], as shown in Figure 2. Let the original multivariate time series data be $D \in \mathbb{R}^{K \times N}$, where $K = 4$ denotes the four SEIR variables, and $N$ represents the total number of time steps. We define the input window length as $L$; at each time step $t$, data from the time period $(t - L + 1)$ to $t$ is used as the input $x \in \mathbb{R}^{K \times L}$, and the value at time $t + 1$ is the output $y \in \mathbb{R}^{K \times 1}$. By sliding this window across the entire time series, we generate $(N - L)$ input–output pairs, ultimately forming an input tensor $X \in \mathbb{R}^{K \times L \times (N - L)}$ and an output tensor $Y \in \mathbb{R}^{K \times 1 \times (N - L)}$.
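The window construction can be illustrated with a short NumPy sketch. The function name `make_windows` is hypothetical; the tensor layout follows the shapes stated above, with the sample index on the last axis.

```python
import numpy as np

def make_windows(D, L):
    """Split a (K, N) series into N - L input/output pairs.

    Returns X with shape (K, L, N - L) and Y with shape (K, 1, N - L).
    """
    K, N = D.shape
    n_pairs = N - L
    # Input: L consecutive steps starting at offset s.
    X = np.stack([D[:, s:s + L] for s in range(n_pairs)], axis=-1)
    # Output: the single step that immediately follows each input window.
    Y = np.stack([D[:, s + L:s + L + 1] for s in range(n_pairs)], axis=-1)
    return X, Y
```

For example, a series with $K = 4$ and $N = 10$ under a 7-day window yields three pairs, the first of which maps days 1 to 7 onto day 8.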

2.2. SEIR Prior-Guided Temporal Causal Discovery Framework

To learn the causal dependency structure from multi-source epidemic-related time series, this paper proposes a temporal causal discovery method integrating epidemiological priors. This method is based on CausalFormer (Kong et al., 2025 [27]) and enhances the loss function by introducing SEIR dynamics constraints, ensuring that the learned causal graph strictly adheres to infectious disease transmission mechanisms. The resulting causal structure is further distilled into a compact subgraph among SEIR variables, serving as a structural prior for the downstream forecasting network, thereby achieving causal modeling from data-driven to mechanism-aligned. Specifically, the detailed formulation of this framework unfolds sequentially: Section 2.2.1 elaborates on the initial causal structure learning via the CausalFormer module; Section 2.2.2 introduces the SEIR dynamic constraint loss to enforce epidemiological consistency; and finally, Section 2.2.3 details the extraction of the compact causal subgraph, which acts as the explicit structural mask for the downstream inference.

2.2.1. The CausalFormer Model

CausalFormer is an end-to-end temporal causal discovery model based on the Transformer architecture, aimed at inferring directed causal relationships between variables from observed sequences $X = [x_1, \ldots, x_L] \in \mathbb{R}^{V \times L}$, where $V$ is the variable dimension and $L$ is the window length. The model consists of two main parts: a Causality-aware Transformer Encoder and a Decomposition-based Causality Detector.
The Causality-aware Transformer Encoder learns the causal representations of time series through a prediction task. It introduces Multi-kernel Causal Convolution to aggregate the input sequence along the time dimension under the premise of adhering to temporal priority constraints. The encoder adopts a multi-layer stacked Transformer structure, with each layer consisting of a Multi-head Self-Attention (MHSA) mechanism and a Feed-forward Network (FFN). For the l-th layer, the attention weights are calculated as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$
where $Q = XW_Q^l$, $K = XW_K^l$, $V = XW_V^l$, and $d_k$ is the key vector dimension. The Query ($Q$) acts as the targeted epidemic state variable at the current time step seeking its causal drivers. The Key ($K$) represents the historical environmental factors and prior epidemic states serving as potential causal sources. The Value ($V$) encapsulates the actual physical influence or dynamic information carried by these historical features. Consequently, the resulting interpretable attention weight matrix $M \in \mathbb{R}^{V \times V}$ dynamically quantifies the underlying causal driving strength of specific environmental interventions on the transmission states.
Through the self-attention mechanism, the model is capable of capturing complex dependencies simultaneously across the temporal and variable dimensions, thereby characterizing the potential impact patterns between different variables. Multi-head attention further enhances the model’s ability to model multi-scale, heterogeneous causal interactions.
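To make the attention computation concrete, the sketch below implements plain single-head scaled dot-product attention over variables in NumPy. It is not the CausalFormer encoder itself: the multi-kernel causal convolution, multi-head structure, and masking are omitted, and the function names and random projection matrices are assumptions for illustration only.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, W_q, W_k, W_v):
    """Scaled dot-product attention for X of shape (V, L).

    Returns the attended features and the (V, V) weight matrix, whose
    row i holds the weights that variable i places on every variable j.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights
```

Because each row of the weight matrix sums to one, its entries can be read as relative strengths, which is what makes a matrix of this kind usable as an interaction score between variables.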

2.2.2. CausalFormer Based on SEIR-Aware Prior Loss

Although CausalFormer possesses strong global explanatory capabilities, as a purely data-driven model, it is susceptible to the interference of spurious correlations when processing multi-source epidemic data containing complex environmental variables such as temperature and air quality. In epidemiological scenarios, many variables show strong statistical correlations but lack genuine causal connections. Without mechanistic constraints, the model may fall into the trap of overfitting local features during the learning process, or even discover causal paths that violate fundamental biological laws; this not only reduces the reliability of the causal graph but also directly weakens the generalization ability of the downstream forecasting module.
Therefore, this study embeds epidemiological prior knowledge into the training objective of CausalFormer; the necessity for this lies in compensating for the lack of robustness in data-driven models under long-tail distributions and complex environmental interference. The total loss function $\mathcal{L}_{total}$ we constructed is as follows:
$$\mathcal{L}_{total} = \mathcal{L}_{MSE} + \lambda_K \|K\|_1 + \lambda_M \|M\|_1 + \lambda_p \mathcal{L}_{SEIR}$$
where $\mathcal{L}_{MSE}$ is the mean squared error of the prediction; $\|K\|_1$ and $\|M\|_1$ induce sparsity in the causal convolution kernels and attention masks via the $L_1$ norm, aiming to retain the most core intervention signals.
By introducing the prior penalty term $\mathcal{L}_{SEIR}$, we establish a bridge between the high-capacity representation ability of Transformer and the mathematical rigor of SEIR dynamical equations. The specific calculation formula is as follows:
$$\mathcal{L}_{SEIR} = \sum_{i,j} P_{ij} \cdot M_{ij}$$
where $M_{ij}$ is the interpretable attention weight matrix capturing complex interaction patterns between variables, and $P_{ij}$ is a predetermined propensity penalty matrix based on SEIR transmission mechanisms.
For transmission paths that conform to the evolutionary logic of $S \to E \to I \to R$, such as the contact risk posed by infectious individuals to susceptible ones, or the conversion process from exposed to infectious, we set the corresponding penalty factor $P_{ij}$ to 0, thereby encouraging the model to freely capture the dynamic impact of environmental variables on these key parameters within a reasonable search space.
Conversely, for connections that are epidemiologically illogical, such as paths where recovered populations directly lead to an increase in exposed populations, we assign a maximum penalty to the corresponding $P_{ij}$. This design explicitly forces the model to automatically filter out spurious correlations caused by noise by significantly increasing the training cost of violating mechanistic structures. The rationality of this soft constraint mechanism lies in that it does not fix the causal structure via hard coding, but guides the model through a penalty mechanism to conduct data-driven discovery under the premise of adhering to mechanisms, ensuring that the final learned causal graph not only possesses statistical explanatory power but also has a solid epidemiological semantic foundation. This provides a reliable guarantee for subsequently extracting structural priors $G(t)$ with causal significance and guiding downstream deep learning networks to achieve high-precision forecasting.
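The penalty matrix and the combined objective can be sketched as below, restricted to the four SEIR variables for brevity. Several choices here are assumptions and not taken from the text: the row-as-source, column-as-target indexing of $P$ and $M$, the exact set of allowed edges in `ALLOWED`, the unpenalized diagonal, and a maximum penalty of 1. The attention weights are assumed non-negative, so the prior term is a plain weighted sum as in the formula above.

```python
import numpy as np

VARS = ["S", "E", "I", "R"]
# Hypothetical set of mechanistically plausible (source, target) edges.
ALLOWED = {("S", "E"), ("E", "I"), ("I", "R"), ("I", "S")}

def build_penalty(max_penalty=1.0):
    """P[i, j] = 0 for allowed edges, max_penalty otherwise."""
    n = len(VARS)
    P = np.full((n, n), max_penalty)
    for src, dst in ALLOWED:
        P[VARS.index(src), VARS.index(dst)] = 0.0
    np.fill_diagonal(P, 0.0)  # assumption: self-dependence is not penalized
    return P

def seir_prior_loss(M, P):
    """L_SEIR = sum_ij P_ij * M_ij."""
    return float(np.sum(P * M))

def total_loss(mse, K, M, P, lam_k, lam_m, lam_p):
    """L_total = L_MSE + lam_k * |K|_1 + lam_m * |M|_1 + lam_p * L_SEIR."""
    return (mse
            + lam_k * np.abs(K).sum()
            + lam_m * np.abs(M).sum()
            + lam_p * seir_prior_loss(M, P))
```

With this layout, attention mass placed on a penalized edge such as R to E raises the loss in proportion to its weight, while mass on S to E costs nothing beyond the sparsity term. This is the soft-constraint behavior the section describes.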

2.2.3. Causal Subgraph Extraction and Structural Output

After obtaining the global temporal causal graph generated by CausalFormer based on the SEIR-aware prior loss, we will implement SEIR-related causal subgraph extraction tailored to epidemiological dynamics logic, aiming to accurately strip out the core structure conforming to epidemic evolutionary mechanisms from the complex multi-source variable interaction network. Because the global graph contains a large number of statistical associations between environmental covariates and epidemic variables, directly using it as a structural prior may introduce redundant noise; thus, we must focus on the intrinsic causal chains among susceptible, exposed, infectious, and recovered populations.
First, we utilize the causal scores $S(A)$ and $S(K)$ output by the decomposition-based causality detector to identify causal edges that are significant in both statistical and epidemiological senses through the K-means clustering algorithm. Subsequently, the algorithm screens and extracts an evolutionary subgraph $G_{sub}$ composed solely of the four state variables $S$, $E$, $I$, $R$ from the global topology, primarily retaining key dynamic paths such as $S \to E$, $E \to I$, and $I \to R$. For each identified subgraph edge $e_{j,i}$, its causal delay $d(e_{j,i})$ is determined using the time-domain response characteristics of the multi-kernel convolutional kernels, defined by the following formula:
$$d(e_{j,i}) = T - \arg\max_{t} \left( S(K)[i]_{j,i,t} \right)$$
where T denotes the predetermined length of the sliding window, which also represents the maximum allowable time delay for causal discovery in this framework. In the context of epidemiology, this maximum delay T is conceptually aligned with the upper limit of the disease’s incubation period.
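The delay rule can be illustrated with a small sketch. The array layout (a score tensor of shape `(V, V, T)` indexed as source, target, kernel position) and the 1-based position convention, under which the last kernel position corresponds to zero delay, are assumptions; the text does not state them.

```python
import numpy as np

def causal_delay(kernel_scores, j, i):
    """Delay of edge j -> i as T minus the position of the peak score.

    kernel_scores: hypothetical array of shape (V, V, T).
    Positions are counted from 1, so a peak at the last position gives 0.
    """
    T = kernel_scores.shape[-1]
    t_star = int(np.argmax(kernel_scores[j, i])) + 1
    return T - t_star
```

Under a 7-day window, a peak at the fifth kernel position would thus be read as a two-day lag between cause and effect.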
The extracted SEIR-related subgraph is formally mapped into a binarized adjacency matrix $A^{SEIR}$, which not only reflects data-driven correlations but also embeds mechanism-based structural constraints. To transform this causal knowledge into mechanism guidance for the prediction model, we apply $A^{SEIR}$ as a structural prior $G(t)$ within the SCI-Block interaction layer of ESASNet. When predicting time-varying dynamic parameters, this matrix forces the model to aggregate features only along causally verified “mechanistically legitimate” paths. Specifically, the calculation process of the interaction information term for node $i$ at time $t$ is expressed as follows:
$$m_i(t) = \sum_{j:\, A^{SEIR}_{ij} = 1} W_{ij}\, h_j(t)$$
where $h_j(t)$ represents the hidden layer feature generated by the causal source variable, and $W_{ij}$ represents the corresponding learnable interaction weight matrix.
This structured design based on subgraph extraction ensures that while ESASNet utilizes multi-source data for nonlinear fitting, its core prediction logic is constantly locked within the causal skeleton of $S \to E \to I \to R$, thereby achieving a deep integration of deep learning models and epidemiological mechanisms.
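A minimal NumPy sketch of the masked aggregation follows. The function name and tensor shapes are assumptions: `A` is the binary mask with `A[i, j] = 1` when variable `j` is a causal source of variable `i`, `W` stores one `d x d` weight matrix per ordered pair, and `h` stores one hidden vector per variable at a single time step.

```python
import numpy as np

def masked_aggregate(A, W, h):
    """Compute m_i = sum over j with A[i, j] = 1 of W[i, j] @ h[j].

    A: (n, n) binary mask, W: (n, n, d, d) weights, h: (n, d) features.
    Entries with A[i, j] = 0 contribute nothing, whatever W[i, j] holds.
    """
    return np.einsum("ij,ijab,jb->ia", A, W, h)
```

Multiplying by the mask inside the sum means that weights on non-causal pairs never influence the output, so they also receive zero gradient if the same expression is written in an autodiff framework.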

2.3. CCSANet Model Based on Causal Constraints

We propose an improved causally constrained CCSANet model (Causally Constrained SEIR-Aware Network), aiming to deeply integrate data-driven time series representations with epidemiological causal mechanisms. Building upon the original ESASNet architecture, this model utilizes the causal subgraph G(t) extracted in Section 2.2.3 as a structured prior to impose hard topological constraints and strength alignment optimization on the information transmission paths within the network, ensuring that while the model captures nonlinear transmission trends, its internal logic consistently adheres to the evolutionary paths of infectious disease dynamics. Specifically, the architectural details of this module unfold sequentially: Section 2.3.1 outlines the foundational ESASNet backbone for time-domain modeling; Section 2.3.2 details the reconstruction of the SCI-Block via the integration of causal topological masks; Section 2.3.3 defines the joint optimization objective driven by causal strength alignment; and finally, Section 2.3.4 summarizes the resulting endogenous interpretability mechanism.

2.3.1. ESASNet Model

The construction of the ESASNet model (Explainable SEIR-Aware SCINet) is based on the deep coupling of epidemic transmission dynamics and deep learning time-domain modeling. First, by establishing a system of dynamical differential equations with time-varying characteristics, it defines the core transmission parameters (transmission rate $\beta$, latent conversion rate $\sigma$, and recovery rate $\gamma$) as functions that dynamically change over time. This time-varying SEIR model establishes the mathematical modeling for epidemic forecasting by describing state transitions among susceptible, exposed, infectious, and recovered individuals.
Within this mechanism constraint system, the prediction of the ESASNet model is based on the powerful nonlinear fitting capability of deep neural networks to accurately estimate the aforementioned time-varying parameter sequences driving epidemic evolution from historical observation data. The model adopts SCINet as the fundamental backbone for time-domain feature extraction; its core advantage lies in employing a hierarchical binary tree-like downsampling mechanism, effectively capturing multi-scale temporal dependencies by decomposing the original epidemic sequence into multiple sub-sequences of different resolutions. During feature extraction at each hierarchical level, the model utilizes the core interaction unit, SCI-Block, to perform downsampling, convolution processing, and interactive learning on odd and even samples, thereby uncovering the hidden dynamic laws within the sequence.
Unlike purely data-driven models that directly predict the number of confirmed cases, ESASNet maps the output of SCINet to parameter estimates at specific moments. This design not only ensures that the model can acutely capture subtle fluctuations in the epidemic but also imposes epidemiological constraints by integrating SEIR dynamical equations, avoiding non-physical prediction results that purely data-driven methods might generate. This cascaded architecture achieves the unification of structure-driven modeling and data-driven learning, providing a robust parameter estimation foundation for the subsequent introduction of causal topological constraints.
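As an illustration of the time-varying SEIR mechanism described above, the following minimal sketch advances the four compartments with a forward-Euler step. The textbook SEIR update form, the unit step size, and the `toy_params` schedule are assumptions made for illustration; in ESASNet the parameters β(t), σ(t), and γ(t) would be estimated by the network rather than taken from a fixed schedule.

```python
from typing import Callable, List, Tuple

State = Tuple[float, float, float, float]  # (S, E, I, R)


def seir_step(state: State, beta: float, sigma: float, gamma: float,
              dt: float = 1.0) -> State:
    """Advance the SEIR compartments by one forward-Euler step."""
    s, e, i, r = state
    n = s + e + i + r                      # total population
    new_exposed = beta * s * i / n * dt    # flow S -> E
    new_infectious = sigma * e * dt        # flow E -> I
    new_recovered = gamma * i * dt         # flow I -> R
    return (s - new_exposed,
            e + new_exposed - new_infectious,
            i + new_infectious - new_recovered,
            r + new_recovered)


def simulate(state: State,
             params: Callable[[int], Tuple[float, float, float]],
             steps: int) -> List[State]:
    """Roll the model forward, querying (beta, sigma, gamma) at each step."""
    trajectory = [state]
    for t in range(steps):
        beta_t, sigma_t, gamma_t = params(t)
        state = seir_step(state, beta_t, sigma_t, gamma_t)
        trajectory.append(state)
    return trajectory


def toy_params(t: int) -> Tuple[float, float, float]:
    # Hypothetical schedule: the transmission rate decays over time.
    return 0.4 * (0.97 ** t), 0.2, 0.1


trajectory = simulate((9990.0, 0.0, 10.0, 0.0), toy_params, steps=30)
```

Because every flow leaves one compartment and enters the next, the total population stays constant, which is the kind of physical constraint that a purely data-driven predictor of case counts does not enforce by construction.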

2.3.2. Improvement of SCI-Block Incorporating Causal Topological Constraints

Traditional SCI-Blocks [31] usually adopt a fully connected approach during the feature interaction stage, meaning each variable branch absorbs information from all other variables. However, when processing input sequences containing only core epidemic variables (S, E, I, R), this indiscriminate interaction easily introduces statistical noise inconsistent with the mechanisms, leading the model to learn spurious dynamic associations.
To address this limitation, this study reconstructs the internal interaction logic of the SCI-Block by introducing a topological filter $G(t)$ based on causal subgraphs, as shown in Figure 3. To facilitate multi-scale feature extraction within this architecture, we use four primary convolutional operators: ψ and ϕ for the initial scaling and shifting in the splitting stage, and η and ρ for the subsequent interactive transformation between odd and even sub-sequences. These operators allow the network to learn complex temporal patterns at different resolutions. Instead of performing full-dimensional feature aggregation, the improved module applies a binarized causal mask $A^{SEIR}$ inside these interaction functions, so that feature states are updated solely through mechanistically legitimate pathways and the neural information flow remains epidemiologically consistent. Specifically, taking the state update of the even-branch node $i$ at time $t$ as an example, its final output feature is computed as follows:
$$F_{even,i} = F^{s}_{even,i} \pm \eta\left(\sum_{j \in \{S,E,I,R\}} A^{SEIR}_{ji} \cdot \left(W_{ji} F^{s}_{odd,j}\right)\right)$$
where $j$ is the constrained causal source node, $W_{ji}$ is the transformation matrix with causal weights, and $\eta$ is the corresponding interactive transformation function. Specifically, the first two terms define the state transition of the even-branch features: $F^{s}_{even,i}$ represents the initial input (baseline) features of the even-indexed sub-sequence before information interaction, while $F_{even,i}$ denotes the updated output feature for node $i$.
This design forces the feature updates of variable i to absorb information solely from its causal source variables through a mask mechanism. By explicitly introducing the evolutionary logic constraints of S E I R at the operator level, it changes the “black-box” nature of original deep learning models blindly searching the feature space and effectively suppresses interaction noise introduced by non-causal variables. By embedding epidemiological mechanisms as topological criteria into the SCI-Block, this study achieves deep alignment between high-dimensional nonlinear fitting capabilities and physical evolutionary mechanisms. This not only enhances the accuracy of the model’s estimation of time-varying parameters, but also clarifies the interaction semantics among variables by restricting information flow, thereby fundamentally endowing prediction results with solid causal interpretability at the algorithmic bottom layer.
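The masked update can be sketched in a few lines. The example below is a simplified illustration, not the authors' implementation: the mask entries, the use of scalar weights in place of the transformation matrices $W_{ji}$, the choice of tanh for η, and the "+" branch of the ± sign are all assumptions.

```python
import math
from typing import Dict, List

NODES = ["S", "E", "I", "R"]

# Hypothetical binary mask: A_SEIR[j][i] == 1 means source j may inform target i.
A_SEIR: Dict[str, Dict[str, int]] = {
    "S": {"S": 0, "E": 1, "I": 0, "R": 0},
    "E": {"S": 0, "E": 0, "I": 1, "R": 0},
    "I": {"S": 0, "E": 1, "I": 0, "R": 1},
    "R": {"S": 0, "E": 0, "I": 0, "R": 0},
}


def masked_even_update(f_even: Dict[str, List[float]],
                       f_odd: Dict[str, List[float]],
                       weights: Dict[str, Dict[str, float]],
                       mask: Dict[str, Dict[str, int]],
                       target: str) -> List[float]:
    """Update the even-branch feature of `target` from masked odd-branch sources."""
    length = len(f_even[target])
    aggregated = [0.0] * length
    for source in NODES:
        if mask[source][target] == 0:
            continue                      # non-causal path: contributes nothing
        w = weights[source][target]       # scalar stand-in for W_ji
        for k in range(length):
            aggregated[k] += w * f_odd[source][k]
    # eta is assumed to be tanh here; the "+" branch of the update is used.
    return [f_even[target][k] + math.tanh(aggregated[k]) for k in range(length)]


f_even = {n: [0.1, 0.2] for n in NODES}
f_odd = {"S": [1.0, 1.0], "E": [0.5, -0.5], "I": [0.3, 0.3], "R": [0.0, 0.0]}
weights = {j: {i: 1.0 for i in NODES} for j in NODES}
updated_I = masked_even_update(f_even, f_odd, weights, A_SEIR, "I")
```

With this mask, the update of node I depends only on the odd-branch feature of E, so changing the features of S or R leaves `updated_I` unchanged. This is the hard-filtering behaviour the section describes.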

2.3.3. Joint Optimization Objective Based on Causal Strength Alignment

To further ensure that the model’s attention weights during feature extraction remain consistent with actual causal contributions, this study adopts a joint optimization objective based on causal strength alignment [32]. This strategy aims to guide CCSANet to autonomously learn attention patterns conforming to epidemiological mechanisms during the training process by introducing an auxiliary loss term, thereby achieving dual optimization in both numerical accuracy and decision logic.
The model’s total loss function $L$ is a weighted composite of the main task prediction loss $L_{pred}$ and the causal auxiliary alignment loss $L_{aux}$, defined as follows:
$$L = L_{pred} + \lambda L_{aux}$$
where $\lambda$ is a hyperparameter balancing the importance of the two tasks, used to adjust the impact intensity of causal constraints on model parameter updates.
The main loss term $L_{pred}$ adopts the mean squared error criterion to minimize the deviation between the predicted time-varying parameters $\hat{\beta}(t)$, $\hat{\sigma}(t)$, $\hat{\gamma}(t)$ generated from the core epidemic variables and the reference values calculated from observational data, ensuring that the model captures the nonlinear dynamic evolution of the epidemic sequence. To transform the quantitative causal knowledge discovered by CausalFormer into effective supervision for the prediction model, the causal auxiliary loss $L_{aux}$ follows the causal strength alignment strategy and adjusts the direction of network optimization by computing the divergence between the model’s internal attention distribution and the external causal priors. Specifically, the feature attention weights $a_{ij}$ generated by the SCI-Block during interaction are required to approach the normalized causal scores $g_{ij}$ output in Section 2.2.3, as defined by the expression:
$$L_{aux} = \sum_{s=1}^{|C|} \sum_{i,j \in \{S,E,I,R\}} \left(a_{ij} - g_{ij}\right)^{2} \cdot A^{SEIR}_{ij}$$
where $A^{SEIR}_{ij}$ is the mask operator of the causal subgraph, ensuring that the alignment process only operates on verified legitimate causal paths.
Through this joint optimization objective, the model is required not only to achieve precision in predicted numerical values during backpropagation but also to ensure that the “importance” it allocates to various variable branches matches the actual performance of epidemiological transmission strength. This constraint strategy effectively reduces the model’s sensitivity to random noise correlations and significantly enhances the robustness of the forecasting system when facing complex fluctuations. Meanwhile, because the model’s weight distribution is anchored to causal chains with a clear physical background, the generation process of prediction conclusions sheds the blindness of purely data-driven methods, thus establishing semantic consistency between deep learning models and epidemiological mechanisms at the algorithm optimization level, and elevating the credibility and interpretability of the overall prediction architecture.
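A minimal sketch of the joint objective is given below, assuming plain nested lists in place of tensors. The function names are hypothetical, and the outer sum over the index s is interpreted here as a sum over a list of attention matrices, which is an assumption; the text does not state what the set C contains.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def mse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Mean squared error between predicted and reference parameters."""
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred)


def alignment_loss(attn_list: Sequence[Matrix], causal: Matrix,
                   mask: Matrix) -> float:
    """Masked squared difference between attention weights and causal scores."""
    total = 0.0
    for attn in attn_list:                      # assumed meaning of the sum over s
        for i in range(len(mask)):
            for j in range(len(mask[i])):
                total += mask[i][j] * (attn[i][j] - causal[i][j]) ** 2
    return total


def total_loss(pred: Sequence[float], ref: Sequence[float],
               attn_list: Sequence[Matrix], causal: Matrix, mask: Matrix,
               lam: float = 0.1) -> float:
    """L = L_pred + lambda * L_aux."""
    return mse(pred, ref) + lam * alignment_loss(attn_list, causal, mask)


# Toy 4x4 example over (S, E, I, R); only three paths are marked as causal.
mask = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
causal = [[0.0, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.0],
          [0.0, 0.0, 0.0, 0.5], [0.0, 0.0, 0.0, 0.0]]
attn = [[0.9, 0.6, 0.9, 0.9], [0.9, 0.9, 0.3, 0.9],
        [0.9, 0.9, 0.9, 0.5], [0.9, 0.9, 0.9, 0.9]]

l_aux = alignment_loss([attn], causal, mask)
loss = total_loss([0.30, 0.20, 0.10], [0.25, 0.20, 0.10], [attn], causal, mask)
```

In this toy example the large off-mask attention values (0.9) do not contribute to `l_aux`; only the three masked entries are penalised, which mirrors the role of $A^{SEIR}_{ij}$ in the expression above. The default `lam=0.1` follows the value of λ reported in the experimental settings.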

2.3.4. Model Interpretability Mechanism Based on Endogenous Causal Constraints

This paper implements an endogenous interpretability mechanism through the framework of CCSANet; the core lies in supervising the learning process via the causal graph, making the model align not only prediction values at the output end but also epidemiological mechanisms at the underlying interaction logic. Unlike post hoc explanation methods such as SHAP that only perform feature attribution after predictions are completed, this framework achieves transparency in the model’s decision path by treating causal relationships as structural constraints.
The algorithmic implementation of this mechanism relies primarily on dual endogenous constraints: causal constraint and causal strength alignment. First, using the causal adjacency matrix A SEIR extracted in the previous stage, it explicitly blocks non-mechanism-conforming information flows in the interaction operators of the SCI-Block, mandating that the model can only aggregate information through discovered and verified causal paths when updating feature states. This hard filtering eliminates the possibility of the black-box model blindly fitting spurious correlations from the operator bottom layer, ensuring that every feature exchange possesses explicit epidemiological semantics. Second, by introducing the causal auxiliary loss L a u x , the quantified causal scores discovered by CausalFormer serve as the gold standard to perform real-time corrections on the attention weight distribution within ESASNet. This alignment process ensures that the attention assigned to each variable branch is consistent with its actual causal contribution, freeing the generation process of prediction conclusions from purely data-driven blindness and achieving deep integration between the model’s decision logic and causal inference.
Under this endogenous constraint architecture, the model is able to obtain interpretable analysis results endowed with causal relationships. By analyzing the trained interaction weight matrix, this framework can quantify and present the dynamic influence intensity of multi-source variables on core epidemic state variables under different time lags. Specifically, the model can extract and output the causal contribution trajectories of various driving factors under specific delays d ( e j , i ) , thereby clearly characterizing the complete paths through which environmental factors drive the evolution of time-varying transmission parameters. These internally generated explanation results empower the prediction conclusions with not only high-precision numerical support but also logically traceable and mechanistically transparent scientific bases for public health decision-making by demonstrating the time-varying interaction patterns among variables.

3. Results

3.1. Experimental Settings

To comprehensively evaluate the performance of the CCSANet model, this study selected five representative cities—Beijing, Shanghai, Shenzhen, Chengdu, and Changsha—as experimental subjects; the data time span covers multiple complete fluctuation cycles of the epidemic from outbreak and evolution to stabilization. Traditional time-series deep learning models LSTM [33] and Informer [34], as well as baseline models SCINet [31] and ESASNet, were chosen as comparative models. In addition to the aforementioned deep learning baselines, we initially evaluated traditional statistical methods (ARIMA) and classical mathematical compartment models (SEIR). However, empirical evaluations revealed that due to the strong nonlinear interference of multi-source environmental variables in real-world data, these traditional methods struggled to capture the complex spatiotemporal lag characteristics, resulting in severe performance degradation. For instance, the traditional SEIR model only achieved an average R 2 of approximately 0.4 across the datasets, which is drastically inferior to the neural network baselines that consistently achieved R 2 scores exceeding 0.90. Therefore, to ensure the cutting-edge nature and objective fairness of the comparison, the benchmark experiments in this study primarily focus on comparing pure data-driven deep learning models against our proposed causally constrained architecture.
This experiment distinguishes the data requirements between the causal discovery stage and the forecasting modeling stage: first, the SEIR prior-guided CausalFormer framework is utilized on a multi-source dataset containing environmental variables such as temperature and air quality indices to conduct global causal inference, identifying the potential driving relationships of environmental factors on transmission characteristics; subsequently, through the subgraph extraction mechanism described in Section 2.2.3, only structural priors involving the core epidemic variables S, E, I, and R are retained, while the subsequent forecasting module of the CCSANet model is trained and evaluated exclusively on these refined epidemic indicator sequences. Regarding time window parameter configuration, this study selected 5 days and 7 days as the core sliding window lengths. This parameter setting is based on prior experimental verifications of ESASNet, which proved that the 5-day and 7-day window periods could most effectively balance the model’s ability to capture epidemic evolutionary patterns.
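The sliding-window configuration can be illustrated with a short sketch. The function name, the one-step forecast horizon, and the placeholder series are assumptions; the text specifies only the window lengths of 5 and 7 days.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], window: int,
                 horizon: int = 1) -> List[Tuple[List[float], List[float]]]:
    """Split a series into (input window, forecast target) pairs."""
    if window <= 0 or horizon <= 0:
        raise ValueError("window and horizon must be positive")
    samples = []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        inputs = list(series[start:start + window])
        targets = list(series[start + window:start + window + horizon])
        samples.append((inputs, targets))
    return samples


daily_cases = [float(v) for v in range(10)]    # placeholder data, not real counts
pairs_7 = make_windows(daily_cases, window=7)  # 7-day sliding window
pairs_5 = make_windows(daily_cases, window=5)  # 5-day sliding window
```

A longer window gives each sample more history but yields fewer training pairs from the same series, which is one side of the trade-off the window-length choice has to balance.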
In terms of parameter settings, since the length of the input time series must adapt to different forecasting lengths, we constructed four comparative experiments with different lengths of sliding windows. The SEIR model initialization parameters were set as follows: the initial susceptible population S is the total population of each city; the exposed population E is set to five times the daily new confirmed cases; the initial infectious population I is the cumulative confirmed cases minus the recovered population; and the initial recovered population R is the sum of cumulative recovered and deceased cases. The SCINet architecture used a three-layer tree structure, a convolution kernel size of 5, a dropout rate of 0.5, 32 feature channels, and LeakyReLU (α = 0.01) as the activation function. Regarding the training and optimization strategy, the model employed the Adam optimizer with a learning rate of 0.001, and the hyperparameter λ, which balances the mean squared error of the main prediction task against the causal auxiliary loss, was set to 0.1.
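The initialization rules and hyperparameters above can be collected in a small sketch. The function and key names are hypothetical, the numeric inputs are placeholders, and the reading of "the recovered population" as the R compartment (recovered plus deceased) is an assumption.

```python
from typing import Dict


def init_seir_state(total_population: float, daily_new_confirmed: float,
                    cumulative_confirmed: float, cumulative_recovered: float,
                    cumulative_deceased: float) -> Dict[str, float]:
    """Build the initial (S, E, I, R) state from the rules stated in the text."""
    removed = cumulative_recovered + cumulative_deceased   # R: recovered + deceased
    return {
        "S": total_population,                 # S: total population
        "E": 5.0 * daily_new_confirmed,        # E: five times daily new cases
        "I": cumulative_confirmed - removed,   # I: assumes "recovered" means R
        "R": removed,
    }


# Hyperparameter values as reported in the text; the key names are hypothetical.
CONFIG = {
    "tree_levels": 3,
    "kernel_size": 5,
    "dropout": 0.5,
    "feature_channels": 32,
    "leaky_relu_alpha": 0.01,
    "optimizer": "Adam",
    "learning_rate": 0.001,
    "lambda_aux": 0.1,
}

state = init_seir_state(total_population=1_000_000, daily_new_confirmed=40,
                        cumulative_confirmed=500, cumulative_recovered=120,
                        cumulative_deceased=10)
```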
All experiments in this study were conducted on a workstation equipped with a 64-bit Windows 11 operating system, an AMD Ryzen 9 7945HX CPU (Advanced Micro Devices, Inc., Santa Clara, CA, USA; 2.50 GHz), 32 GB of RAM, and an NVIDIA GeForce RTX 4060 GPU (NVIDIA Corporation, Santa Clara, CA, USA; 8 GB VRAM). The deep learning models and data preprocessing pipelines were implemented using Python 3.8 and the PyTorch 2.0 framework.

Evaluation Metrics

To quantitatively evaluate the forecasting performance of the proposed model, we employ two widely used standard metrics: the Coefficient of Determination ( R 2 ) and the Symmetric Mean Absolute Percentage Error (SMAPE). The formulas are defined as follows:
$$R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}}{\sum_{i=1}^{n} (y_i - \bar{y})^{2}}$$
$$SMAPE = \frac{100\%}{n} \sum_{i=1}^{n} \frac{|y_i - \hat{y}_i|}{(|y_i| + |\hat{y}_i|)/2}$$
where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the actual values, and $n$ is the number of samples.
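The two metrics can be computed directly from their definitions, as in the following sketch. The function names are illustrative, and treating a term with zero actual and zero predicted value as contributing zero error is an assumption, since the definition leaves that case undefined.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_true) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def smape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error, in percent."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        denom = (abs(y) + abs(p)) / 2.0
        if denom > 0.0:                # assumption: a 0/0 term counts as zero error
            total += abs(y - p) / denom
    return 100.0 * total / len(y_true)


actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
r2_value = r2_score(actual, predicted)
smape_value = smape(actual, predicted)
```

For this toy series the residual sum of squares is 200 and the total sum of squares is 20000, giving an R² of 0.99, while the SMAPE is about 4.88%.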
From the fundamental perspective of time-series forecasting, R 2 and SMAPE are widely adopted because they provide complementary evaluations. Specifically, R 2 quantifies the proportion of variance explained by the model, serving as a global indicator of how well the model captures the overall temporal trend and volatility of the sequence. Conversely, SMAPE measures the relative prediction error. It provides a symmetric percentage evaluation that remains robust across time-series segments with vastly different scales, which is a significant advantage over absolute error metrics like the Root Mean Square Error (RMSE) or Mean Absolute Error (MAE) that are easily skewed by high-value outliers.
In the specific application of infectious disease forecasting, while numerous evaluation metrics exist, we ultimately selected R 2 and SMAPE because they are exceptionally aligned with the highly non-stationary nature of epidemic data. For public health decision-making, accurately capturing the evolutionary trajectory of the epidemic curve, especially predicting turning points and outbreak trends, is often more critical than point-by-point absolute precision. R 2 effectively measures this morphological similarity. Furthermore, daily epidemic data fluctuates dramatically, ranging from near-zero cases during latent periods to tens of thousands during peaks. SMAPE effectively mitigates the scale bias caused by massive outbreak peaks, ensuring that the model’s predictive capability during the crucial early-warning phase is fairly evaluated rather than being overwhelmed by peak errors.
Finally, the adoption of these two standard metrics facilitates a direct and fair comparison with existing state-of-the-art deep learning baselines, such as Informer and SCINet.

3.2. Experimental Results and Analysis

3.2.1. Overall Prediction Performance

To quantitatively assess the performance improvement of CCSANet, this section first compares the predictive performance of the models under a 5-day sliding window configuration. As shown in Table 2, CCSANet achieved the best performance across all tested cities; its Coefficient of Determination $R^2$ remained stably above 0.97, an accuracy improvement of about 5.5% to 6.2% over the second-best model, ESASNet. On the SMAPE metric, which evaluates prediction deviation, CCSANet exhibited stronger stability, with its average error dropping by approximately 8% to 10% relative to ESASNet. For example, on the Beijing dataset, CCSANet’s SMAPE was only 9.6124%, significantly lower than ESASNet’s 10.5421%. The core of this performance gain lies in the constraint mechanism of the causal subgraph $G(t)$, which shields the model from spurious correlations introduced by non-causal features and enables it to estimate the SEIR dynamic parameters more accurately.
When the sliding window is extended to 7 days, as shown in Table 3, the fitting effects of all models universally improve as they acquire richer weekly evolutionary information. Under this configuration, CCSANet’s leading advantage remains solid. Experimental analysis shows that through the causal strength alignment loss L a u x , CCSANet guides attention weights to align with the true distribution of transmission mechanisms; this allows the model to not only capture numerical fluctuations when dealing with long-term dependencies but also accurately restore the causal feedback relationships between transmission rate β ( t ) and recovery rate γ ( t ) .
From the comprehensive analysis of Table 2 and Table 3, it is evident that CCSANet significantly outperforms traditional deep learning models (LSTM, Informer) and baseline models (SCINet, ESASNet) under all experimental configurations. Compared to purely data-driven “black-box” models (like LSTM, Informer, and SCINet), CCSANet solves the issue of predictive logical blindness. Traditional time-series models rely entirely on statistical correlations, making them extremely susceptible to capturing false statistical associations when facing epidemic data fraught with random fluctuations, which results in severe drifts in forecasting outcomes near inflection points. CCSANet, on the other hand, establishes underlying causal boundaries via the causal subgraph G ( t ) , ensuring that the features extracted by the model remain consistently locked within the logic of infectious disease dynamics. Secondly, compared to structured models endowed with preliminary mechanism awareness (such as ESASNet), CCSANet realizes a leap from extensive interaction to precise causal control. Although the original SCI-Block architecture possesses strong time-domain decomposition capabilities, it still employs fully connected aggregation during the branch interaction stage, failing to prevent mechanism-inconsistent information from flowing into prediction branches. The advantage of CCSANet lies in constraining feature flow via topological filters and guiding it in concert with causal strength alignment loss; this dual constraint mechanism ensures that the neural network’s attention allocation truly aligns deeply with epidemiological transmission strength.
To formally validate whether the observed performance gains of CCSANet are statistically significant, we conducted the Diebold–Mariano test. This test evaluates the null hypothesis that there is no significant difference in forecast accuracy between two competing models. As presented in Table 4, the p-values for the comparisons between CCSANet and the baseline models across all five urban datasets are consistently below the significance level of 0.05. Specifically, the p-values relative to LSTM and Transformer are predominantly below 0.01. These results reject the null hypothesis at the 5% level, indicating that the superior predictive performance of CCSANet is unlikely to be the result of stochastic fluctuations.
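For readers unfamiliar with the test, a minimal sketch of the Diebold–Mariano statistic is given below. The text does not state the loss differential or forecast horizon used, so the squared-error loss, the one-step horizon (no autocovariance correction), and the normal approximation for the p-value are all assumptions, as are the toy error series.

```python
import math
from typing import Sequence, Tuple


def diebold_mariano(errors_a: Sequence[float],
                    errors_b: Sequence[float]) -> Tuple[float, float]:
    """Return the DM statistic and two-sided p-value for one-step forecasts."""
    # Loss differential under squared-error loss.
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n   # lag-0 autocovariance
    if var_d == 0.0:
        raise ValueError("loss differential has zero variance")
    stat = mean_d / math.sqrt(var_d / n)
    # Two-sided p-value under the standard normal approximation.
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value


# Toy forecast errors: model A is consistently closer to the truth than model B.
errors_model_a = [0.1, -0.2, 0.15, -0.1, 0.05, 0.2, -0.15, 0.1]
errors_model_b = [0.5, -0.6, 0.4, -0.7, 0.55, 0.6, -0.45, 0.5]
dm_stat, dm_p = diebold_mariano(errors_model_a, errors_model_b)
```

A negative statistic means the first model has the smaller average loss. For multi-step forecasts the variance term would normally include autocovariances up to the horizon minus one, which this sketch omits.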

3.2.2. Ablation Study

In this section, we verify the contribution of the improved loss function in CausalFormer to the final predictive performance. In the field of causal discovery research, metrics like F1-score are generally used on synthetic data to assess the accuracy of causal graphs; however, for real-world epidemic data, the lack of an a priori causal Ground Truth makes it impossible to directly quantify and evaluate causal graph generation precision. Therefore, this study adopts a downstream task validation approach, reversely determining the effectiveness of the causal graph generation based on the accuracy of the subsequent CCSANet forecasting model. If the improved causal discovery algorithm can provide more accurate topological priors, the errors of the prediction model when handling nonlinear sequences should drop significantly.
The experiment compared two configurations: Original-CF (generating causal graphs using the original CausalFormer without loss function improvement) and Modified-CF (generating causal graphs using the improved loss function proposed in this study). Table 5 and Table 6 present the performance comparisons across the five city datasets under 5-day and 7-day sliding windows, respectively.
From the experimental results in Table 5 and Table 6, it is evident that generating causal graphs with the improved loss function brought significant predictive performance gains across all tested cities and both sliding window settings. Under the 5-day window, SMAPEs decreased by an average of about 7%; under the 7-day window, the reduction margin of errors was even more pronounced, generally reaching over 8.5%. It is noteworthy that for the Beijing dataset under the 7-day window setting, although the prediction accuracy R 2 was adjusted to 0.9436 due to sequence fluctuation impacts, improving CausalFormer significantly reduced its SMAPE indicator from 9.7542% to 8.8653%, effectively enhancing the robustness of long-sequence predictions.
The reason behind this consistent improvement lies in that the improved loss function markedly elevated the purity of the causal graph. In real environmental data, complex statistical collinearities exist between covariates and SEIR variables; the original loss function easily captures pseudo-correlations, thereby introducing noise paths into the causal graph. This study, by introducing targeted sparsity and directional constraints at the CausalFormer stage, accurately eliminated non-causal interference items and produced a causal subgraph G ( t ) more in line with epidemiological mechanisms. This high-quality topological constraint ensures that CCSANet’s feature interactions operate on genuine causal-driven paths, eliminating predictive drifts caused by erroneous priors.

3.2.3. Interpretability Analysis

After validating the predictive accuracy of CCSANet, this subsection further explores the model’s interpretability during the inference process. Unlike traditional black-box models that solely rely on parameterized feature fitting, CCSANet’s core value lies in injecting dynamical priors into the deep learning network via causal constraints. By visually analyzing the internally generated causal topologies and key dynamical parameters, we can intuitively observe how causal constraints regulate the neural network’s attention allocation, thereby ensuring that predictive logic is not only statistically valid but also scientifically sound mechanically.
Figure 4 visualizes the multi-source environmental variable SEIR causal graph G constructed in this study, which serves as the logical cornerstone for CCSANet to realize causally constrained inference. The graph’s topology explicitly maps the interplay between physical states and transmission dynamics: the nodes S , E , I , and R represent the susceptible, exposed, infectious, and recovered population compartments, while the intermediate nodes β , σ , and γ characterize the time-varying transmission rate, conversion rate, and recovery rate, respectively. To further quantify these relationships, the edges are annotated with polarity symbols; a “+” sign indicates a positive causal drive—such as environmental pollutants promoting the transmission rate—whereas a “-” sign denotes an inhibitory effect. This structured representation allows the model to filter statistical noise into a causal chain with clear epidemiological semantics.
The topological structure and edge intensities of this causal graph are quantitatively defined by the learned alignment weights within the SCI-Block, ensuring that the visual representation is strictly grounded in numerical evidence. As illustrated in Figure 4, the directed arrows represent the causal flow of information, where the magnitude of each interaction is determined by its corresponding weight. For instance, the positive causal drive from PM2.5, SO2, and NO2 toward the transmission rate β is substantiated by high alignment intensities of 0.78, 0.65, and 0.71, respectively, reflecting their primary roles as viral carriers or susceptibility enhancers. Conversely, the inhibitory effects of TEMP and O3 on transmission rate β are formally quantified by alignment weights of 0.62 and 0.58. This explicit numerical definition ensures that the deep learning network is not merely performing heuristic feature interaction, but is executing a constrained inference process where the interaction paths and their monotonicities are rigorously dictated by learned weights compliant with epidemiological knowledge.
To verify the effective execution of the aforementioned structural constraints in dynamic forecasting, we took the Beijing dataset as an example to extract the evolutionary trajectories of endogenously estimated dynamical parameters. Figure 5 maps the causal interaction trajectories of Beijing’s transmission rate β and conversion rate σ , intuitively demonstrating the model’s ability to capture temporal causal sequences. It is observable that the fluctuation peak of transmission rate β leads the conversion rate σ on the time axis, and a stable time lag of about 5 days exists between the two.
This phase deviation stemming from causal constraints accurately restores the physical process of the incubation period from exposure infection to confirmed onset in epidemiology. Under this constraint, the deep learning network no longer blindly fits the synchrony of two variables, but instead learns the causally driven logic where the cause precedes the effect; this temporal interpretability empirically proves CCSANet’s deep alignment with disease evolutionary mechanisms.
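One simple way to quantify such a lead-lag relationship is to search for the shift that maximizes the correlation between the two trajectories. The sketch below is illustrative only: the text does not describe how the lag was measured, and the synthetic Gaussian-shaped trajectories, the function names, and the Pearson-correlation criterion are assumptions.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def estimate_lead(leader: Sequence[float], follower: Sequence[float],
                  max_lag: int) -> int:
    """Return the lag (in steps) at which `follower` best tracks `leader`."""
    best_lag, best_corr = 0, -2.0
    for lag in range(max_lag + 1):
        # Compare leader[t] with follower[t + lag].
        corr = pearson(leader[:len(leader) - lag], follower[lag:])
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag


# Synthetic trajectories: the second peak trails the first by 5 days.
days = range(60)
beta_traj: List[float] = [math.exp(-((t - 20) / 4.0) ** 2) for t in days]
sigma_traj: List[float] = [math.exp(-((t - 25) / 4.0) ** 2) for t in days]
lag_days = estimate_lead(beta_traj, sigma_traj, max_lag=14)
```

On these synthetic curves the estimated lead is 5 days, because shifting the second trajectory back by 5 steps reproduces the first exactly. Real parameter trajectories would be noisier, so the correlation peak would be broader and less clear-cut.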
The causal interaction trajectories of Beijing’s transmission rate and recovery rate are shown in Figure 6. During the ascending phase of epidemic development, transmission intensity β dominates the inference weights; however, as the epidemic enters the descending phase post-peak, the endogenous recovery rate γ exhibits a steady climbing trend. This inversely correlated evolutionary trajectory substantiates that the model, under the guidance of causal paths, can dynamically adjust feature focal points according to systemic state changes. By recognizing the energy conversion between transmission and recovery paths within the causal topology, the deep learning layer actualizes a logical switch from growth-driven to recovery-feedback, thereby enhancing the model’s robustness during cross-stage predictions.
The causal interaction trajectories of Beijing’s recovery rate and conversion rate are shown in Figure 7. From Figure 7, it is evident that the model maintains exceedingly high logical self-consistency internally. In the fading stage of the epidemic, as conversion efficiency σ continuously decays, recovery speed γ sustains a high plateau. This synergistic evolutionary posture reflects the neural network’s precise mapping of the stock-load conversion process within the causal graph: as incidence pressure decreases at the input end, the system’s focus automatically balances toward the healing process at the output end. This self-consistency indicates that even under massive feature interactions, causal alignment loss persistently coerces the deep feature space and physical causal space to maintain semantic agreement, preventing parameter oscillation phenomena common in traditional models.
In summary, the interpretability analysis in this subsection indicates that CCSANet successfully regularizes deep learning behavioral paradigms via causal constraints. By deconstructing the causal topology graph and dynamic parameter trajectories, we empirically demonstrate that the model can filter statistical correlations into causal transmission chains imbued with physical meaning. This causally endogenous forecasting mechanism not only bolsters model inference accuracy under heterogeneous environments but also opens a transparent window supported by scientific logic for the “black boxes” of complex nonlinear systems, realizing a profound fusion of data-driven methods and causal reasoning.

4. Discussion

The empirical performance of the CCSANet proposed in this study across multiple city datasets profoundly reveals the significance of embedding causal path constraints into deep learning architectures to improve the robustness of epidemic forecasting. Distinct from previous studies that solely rely on attention mechanisms for statistical fitting, this model actualizes the capacity to autonomously extract causal topologies from data via the CausalFormer module. Experimental results indicate that such structured constraints based on causal discovery can significantly suppress the “overfitting” tendencies that deep learning models are prone to when processing multi-source heterogeneous environmental features. Specifically, the robustness of CCSANet against overfitting is guaranteed through three primary mechanisms. First, structural regularization is achieved via causal constraints. By explicitly shielding non-causal paths, CCSANet successfully confines the neural network’s search space to domains that align with epidemiological laws, acting as a powerful domain-specific regularization that prevents the fitting of irrational noise. Second, empirical evidence from our multi-city validation demonstrates the model’s strong generalization capabilities. CCSANet consistently achieved high accuracy across five cities with vastly different population densities and epidemic patterns, providing strong empirical evidence that the model captures generalizable transmission dynamics rather than city-specific noise. Finally, standard technical safeguards were rigorously employed during the training phase. This included the application of Dropout to the feature interaction layers to enhance the robustness of individual neurons, as well as the implementation of Early Stopping based on validation loss to terminate training before the model could over-memorize the training sequences. 
This paradigm shift from data-driven to causality-driven is the core reason the model maintains high accuracy across cross-regional and long-sequence forecasting tasks [35]. On the interpretability dimension, CCSANet realizes a qualitative leap from post hoc analysis to endogenous mechanism alignment. Contrasting with the limitations of the antecedent ESASNet, which relies on external explanatory tools for weight analysis, this study utilizes causal strength alignment loss to furnish internally generated attention distributions with deterministic physical semantics. The phase lags between transmission rates and conversion rates observed in our experiments validate the model’s ability to capture the core causal chain of the incubation period. This logical self-consistency dramatically elevates the credibility of forecasting results in public health decision-making, ensuring that model outputs are no longer isolated numbers but evolutionary trajectories underpinned by mechanisms. Furthermore, it is crucial to recognize that fitting mechanistic parameters, such as the transmission rate β and conversion rate σ, to sparse real-world observational data is inherently an ill-posed problem. In traditional epidemiological modeling, the high degrees of freedom in the parameter space often lead to identifiability challenges, where multiple parameter combinations may yield statistically identical incidence curves. Although CCSANet has achieved notable breakthroughs in predictive precision and logical transparency, there remains room for improvement in this research.
The current causal graph discovery mechanism relies primarily on the temporal correlations and delays of multi-source factors. When it encounters highly non-stationary, non-environmental factors, such as abrupt shifts in epidemic prevention policies or sudden spikes in vaccination rates, the model’s ability to capture transient causal drifts still needs to be strengthened [36]. In addition, as the forecasting window extends, the dynamic evolution of causal relationships may become more intricate. Future work could introduce dynamic causal graph reconstruction combined with transfer learning to improve the model’s adaptive inference across different stages of epidemic evolution and policy contexts, thereby supporting a more robust intelligent monitoring and early warning system.
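The identifiability issue raised in this discussion can be made concrete with a small sketch. It assumes the standard SEIR equations linearized near the start of an outbreak (S close to N), where the early exponential growth rate r satisfies (r + σ)(r + γ) = βσ. The numerical values and helper names below are illustrative assumptions, not parameters estimated in this study.

```python
import numpy as np

def early_growth_rate(beta, sigma, gamma):
    """Dominant eigenvalue of the SEIR system linearized around S close to N.

    Linearized dynamics: dE/dt = beta*I - sigma*E, dI/dt = sigma*E - gamma*I.
    """
    jacobian = np.array([[-sigma, beta], [sigma, -gamma]])
    return float(np.max(np.linalg.eigvals(jacobian).real))

def beta_for_growth_rate(r, sigma, gamma):
    """Transmission rate giving growth rate r, from (r + sigma)(r + gamma) = beta*sigma."""
    return (r + sigma) * (r + gamma) / sigma

gamma = 0.1      # illustrative recovery rate (1/day), not taken from the paper
target_r = 0.15  # illustrative early growth rate (1/day)

# Two different incubation assumptions: mean latent period of 5 days vs. 2 days.
pairs = [(beta_for_growth_rate(target_r, s, gamma), s) for s in (0.2, 0.5)]
rates = [early_growth_rate(b, s, gamma) for b, s in pairs]

for (b, s), r in zip(pairs, rates):
    print(f"beta={b:.4f}, sigma={s:.2f} -> early growth rate {r:.4f}")
```

Both (β, σ) pairs produce the same early growth rate, so early incidence data alone cannot separate them. Additional structure, such as the mechanism-aware priors discussed in this paper, is one way to narrow the admissible parameter region.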

5. Conclusions

This paper proposes CCSANet, an interpretable epidemic forecasting network based on temporal causal discovery, to address a limitation of current deep learning models for infectious disease prediction: because of their black-box nature, they struggle to distinguish spurious correlations from causal relationships. By combining causal inference with SEIR dynamical mechanisms, this research constructs an inference framework that extends from causal topology discovery and causal path constraints to mechanistic strength alignment. Empirical results show that CCSANet can endogenously extract temporal causal graphs consistent with epidemiological logic from multi-source heterogeneous data, and that using these graphs as a structural prior significantly improves forecasting performance in complex environments. The principal findings indicate that the SCI-Block, once causal mask constraints are introduced, captures the mechanisms linking environmental driving factors to epidemic evolution more effectively. In comparative experiments on multi-city COVID-19 datasets, CCSANet substantially surpassed LSTM, Informer, and the baseline ESASNet model on key evaluation metrics such as R² and SMAPE, supporting the efficacy of causal constraints in strengthening the inductive bias of neural networks. Furthermore, the model exhibits robust endogenous interpretability by reproducing the causal feedback loop among the transmission, conversion, and recovery parameters. This work offers a technical pathway for building epidemic forecasting models with a theoretical foundation and physical consistency, and provides a theoretical reference for causal discovery and logical inference in complex dynamical systems.

Author Contributions

R.Z., Y.Z., Z.F. and Y.L. contributed to the study conception and design. Data preparation, modeling, and analysis were performed by R.Z. The first draft of the manuscript was written by R.Z. and all authors commented on previous versions of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Humanities and Social Sciences Project of the Ministry of Education of China (No. 24YJAZH237), the Natural Science Foundation of Hunan Province, China (No. 2026JJ50443), and the Hunan Provincial Key Research and Development Program Project (No. 2024JK2129).

Data Availability Statement

The publicly available datasets analyzed in this study can be found here: https://github.com/CSSEGISandData/COVID-19 (accessed on 15 June 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CCSANet: Causally Constrained SEIR-Aware Network
CNN: Convolutional Neural Network
CSSE: Center for Systems Science and Engineering
ESASNet: Explainable SEIR-Aware SCINet
FFN: Feed-Forward Network
GRU: Gated Recurrent Unit
LIME: Local Interpretable Model-agnostic Explanations
LSTM: Long Short-Term Memory
MCNN: Multivariate Convolutional Neural Network
MHSA: Multi-Head Self-Attention
NPIs: Non-Pharmaceutical Interventions
RNN: Recurrent Neural Network
SCINet: Sample Convolution and Interaction Network
SEIR: Susceptible–Exposed–Infectious–Recovered
SHAP: SHapley Additive exPlanations
SMAPE: Symmetric Mean Absolute Percentage Error

References

  1. Hsiang, S.; Allen, D.; Annan-Phan, S.; Bell, K.; Bolliger, I.; Chong, T.; Druckenmiller, H.; Huang, L.Y.; Hultgren, A.; Krasovich, E.; et al. The effect of large-scale anti-contagion policies on the COVID-19 pandemic. Nature 2020, 584, 262–267.
  2. Scarpino, S.V.; Petri, G. On the predictability of infectious disease outbreaks. Nat. Commun. 2019, 10, 898.
  3. Kermack, W.O.; McKendrick, A.G.; Walker, G.T. A contribution to the mathematical theory of epidemics. Proc. R. Soc. Lond. A 1927, 115, 700–721.
  4. Ramezani, S.B.; Amirlatifi, A.; Rahimi, S. A novel compartmental model to capture the nonlinear trend of COVID-19. Comput. Biol. Med. 2021, 134, 104421.
  5. Eiman; Shah, K.; Sarwar, M.; Abdeljawad, T. On mathematical model of infectious disease by using fractals fractional analysis. Discrete Contin. Dyn. Syst. Ser. S 2024, 17, 3064–3085.
  6. Alzahrani, S.; Saadeh, R.; Abdoon, M.A.; Qazza, A.; El Guma, F.; Berir, M. Numerical Simulation of an Influenza Epidemic: Prediction with Fractional SEIR and the ARIMA Model. Appl. Math. Inf. Sci. 2024, 18, 1–12.
  7. Zeroual, A.; Harrou, F.; Dairi, A.; Sun, Y. Deep learning methods for forecasting COVID-19 time-series data: A comparative study. Chaos Solitons Fractals 2020, 140, 110121.
  8. Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Adv. Neural Inf. Process. Syst. 2021, 34, 22419–22430.
  9. Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are transformers effective for time series forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; pp. 11121–11128.
  10. Wang, Z.; Kong, F.; Feng, S.; Wang, M.; Yang, X.; Zhao, H.; Wang, D.; Zhang, Y. Is Mamba effective for time series forecasting? Neurocomputing 2025, 619, 129178.
  11. Ridwan, W.M.; Sapitang, M.; Aziz, A.; Kushiar, K.F.; Ahmed, A.N.; El-Shafie, A. Rainfall forecasting model using machine learning methods: Case study Terengganu, Malaysia. Ain Shams Eng. J. 2021, 12, 1651–1663.
  12. Mokhtari, K.E.; Higdon, B.P.; Başar, A. Interpreting financial time series with SHAP values. In Proceedings of the 29th Annual International Conference on Computer Science and Software Engineering, Toronto, ON, Canada, 4–6 November 2019; pp. 166–172.
  13. Xu, X.; Hu, X.; Zhao, Y.; Lü, X.; Aapaoja, A. Urban short-term traffic speed prediction with complicated information fusion on accidents. Expert Syst. Appl. 2023, 224, 119887.
  14. Kırbaş, İ.; Sözen, A.; Tuncer, A.D.; Kazancıoğlu, F.Ş. Comparative analysis and forecasting of COVID-19 cases in various European countries with ARIMA, NARNN and LSTM approaches. Chaos Solitons Fractals 2020, 138, 110015.
  15. Sembiring, I.; Wahyuni, S.N.; Sediyono, E. LSTM algorithm optimization for COVID-19 prediction model. Heliyon 2024, 10, e26158.
  16. Shao, Y.; Wan, T.K.; Chan, K.H.K. Prediction of COVID-19 cases by multifactor driven long short-term memory (LSTM) model. Sci. Rep. 2025, 15, 4935.
  17. Nabi, K.N.; Tahmid, M.T.; Rafi, A.; Kader, M.E.; Haider, A. Forecasting COVID-19 cases: A comparative analysis between recurrent and convolutional neural networks. Results Phys. 2021, 24, 104137.
  18. Castelvecchi, D. Can we open the black box of AI? Nat. News 2016, 538, 20.
  19. Wiens, J.; Saria, S.; Sendak, M.; Ghassemi, M.; Liu, V.X.; Doshi-Velez, F.; Jung, K.; Heller, K.; Kale, D.; Saeed, M.; et al. Do no harm: A roadmap for responsible machine learning for health care. Nat. Med. 2019, 25, 1337–1340.
  20. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30.
  21. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
  22. Lim, B.; Arık, S.Ö.; Loeff, N.; Pfister, T. Temporal Fusion Transformers for interpretable multi-horizon time series forecasting. Int. J. Forecast. 2021, 37, 1748–1764.
  23. Qi, X.; Wang, S.; Fang, C.; Jia, J.; Lin, L.; Yuan, T. Machine learning and SHAP value interpretation for predicting comorbidity of cardiovascular disease and cancer with dietary antioxidants. Redox Biol. 2025, 79, 103470.
  24. Naser, M.Z. Fundamental flaws of physics-informed neural networks and explainability methods in engineering systems. Comput. Ind. Eng. 2026, 212, 111704.
  25. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215.
  26. Kumar, I.E.; Venkatasubramanian, S.; Scheidegger, C.; Friedler, S. Problems with Shapley-value-based explanations as feature importance measures. In Proceedings of the 37th International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 5491–5500.
  27. Kong, L.; Li, W.; Yang, H.; Zhang, Y.; Guan, J.; Zhou, S. CausalFormer: An Interpretable Transformer for Temporal Causal Discovery. IEEE Trans. Knowl. Data Eng. 2025, 37, 102–115.
  28. Hao, X.; Cheng, S.; Wu, D.; Wu, T.; Lin, X.; Wang, C. Reconstruction of the full transmission dynamics of COVID-19 in Wuhan. Nature 2020, 584, 420–424.
  29. He, S.; Peng, Y.; Sun, K. SEIR modeling of the COVID-19 and its dynamics. Nonlinear Dyn. 2020, 101, 1667–1680.
  30. Jo, W.; Kim, D. Neural additive time-series models: Explainable deep learning for multivariate time-series prediction. Expert Syst. Appl. 2023, 228, 120307.
  31. Liu, M.; Zeng, A.; Chen, M.; Xu, Z.; Lai, Q.; Ma, L.; Xu, Q. SCINet: Time series modeling and forecasting with sample convolution and interaction. Adv. Neural Inf. Process. Syst. 2022, 35, 5816–5828.
  32. Deng, Y.; Liang, Y.; Yiu, S.-M. Towards interpretable stock trend prediction through causal inference. Expert Syst. Appl. 2024, 238, 121654.
  33. Chimmula, V.K.R.; Zhang, L. Time series forecasting of COVID-19 transmission in Canada using LSTM networks. Chaos Solitons Fractals 2020, 135, 109864.
  34. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; pp. 11106–11115.
  35. Schölkopf, B.; Locatello, F.; Bauer, S.; Ke, N.R.; Kalchbrenner, N.; Goyal, A.; Bengio, Y. Toward causal representation learning. Proc. IEEE 2021, 109, 612–634.
  36. Liang, X.; Hao, K.; Chen, L.; Ren, L. Time Series Prediction Problems Under Covariate Drift. In Proceedings of the 2024 IEEE 13th Data Driven Control and Learning Systems Conference (DDCLS), Kaifeng, China, 17–19 May 2024.
Figure 1. Overall workflow of CCSANet.
Figure 2. Sliding window data preprocessing.
Figure 3. Improved SCI-Block interaction architecture incorporating causal topological constraints G(t).
Figure 4. Multi-source environmental variable SEIR causal graph G.
Figure 5. Causal interaction trajectories of transmission rate β and conversion rate σ in Beijing.
Figure 6. Causal interaction trajectories of transmission rate β and recovery rate γ in Beijing.
Figure 7. Causal interaction trajectories of recovery rate γ and conversion rate σ in Beijing.
Table 1. Systematic classification of representative forecasting models and methods.
| Modeling Paradigm | Representative Models and Methods |
|---|---|
| Traditional Mathematical & Mechanistic Models | SIR/SEIR Models [3] |
| | SEIRD Variant [4], Fractional-order Models [5,6] |
| | Statistical ARIMA [14] |
| Data-driven Deep Learning Frameworks | RNNs: Simple RNN, LSTM, Bi-LSTM, GRU, VAE [7,14,17], popLSTM [15], Climate-integrated LSTM [16] |
| | CNNs: CNN, Multivariate CNN (MCNN) [17] |
| | Transformers: Autoformer [8], DLinear [9], TFT [22] |
| | State Space: S-Mamba [10] |
| Interpretability Methods | Attribution: SHAP [20,23], LIME [21] |
| | Visualization: Attention Weight Visualization [22] |
Table 2. Experimental results with a 5-day sliding window.
| City | Metric | SCINet | ESASNet | LSTM | Informer | CCSANet |
|---|---|---|---|---|---|---|
| Beijing | R² | 0.9134 | 0.9234 | 0.8876 | 0.8967 | 0.9712 |
| | SMAPE | 11.2345 | 10.5421 | 12.1234 | 11.7234 | 9.7543 |
| Shanghai | R² | 0.9012 | 0.9156 | 0.8743 | 0.8821 | 0.9654 |
| | SMAPE | 11.5678 | 10.8912 | 12.4532 | 11.9845 | 9.9123 |
| Chengdu | R² | 0.8956 | 0.9087 | 0.8654 | 0.8798 | 0.9587 |
| | SMAPE | 11.8765 | 11.1234 | 12.8765 | 12.3421 | 10.1234 |
| Shenzhen | R² | 0.9087 | 0.9198 | 0.8812 | 0.8901 | 0.9621 |
| | SMAPE | 11.3421 | 10.7654 | 12.2345 | 11.8765 | 9.8765 |
| Changsha | R² | 0.8897 | 0.9021 | 0.8543 | 0.8712 | 0.9543 |
| | SMAPE | 12.0123 | 11.3456 | 13.1234 | 12.5678 | 10.4321 |
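For readers who want to reproduce this kind of comparison, the two reported metrics can be computed as in the sketch below. The exact SMAPE variant used in the paper is not restated in this section, so the common percentage form with the (|y| + |ŷ|)/2 denominator is an assumption here, and the sample values are made up for illustration.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def smape(y_true, y_pred):
    """SMAPE in percent, assuming the (|y| + |y_hat|) / 2 denominator.

    Assumes no time step has both the observed and predicted value equal to zero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return 100.0 * np.mean(np.abs(y_true - y_pred) / denom)

# Made-up daily case counts, for illustration only.
observed = [100.0, 200.0]
predicted = [110.0, 180.0]
example_r2 = r2_score(observed, predicted)
example_smape = smape(observed, predicted)
print(f"R2 = {example_r2:.4f}, SMAPE = {example_smape:.4f}%")
```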
Table 3. Experimental results with a 7-day sliding window.
| City | Metric | SCINet | ESASNet | LSTM | Informer | CCSANet |
|---|---|---|---|---|---|---|
| Beijing | R² | 0.9521 | 0.9589 | 0.9138 | 0.9326 | 0.9436 |
| | SMAPE | 9.3382 | 9.1586 | 9.924 | 9.538 | 8.8653 |
| Shanghai | R² | 0.9412 | 0.9487 | 0.9021 | 0.9214 | 0.9854 |
| | SMAPE | 9.5641 | 9.3214 | 10.1231 | 9.7542 | 8.5123 |
| Chengdu | R² | 0.9354 | 0.9412 | 0.8954 | 0.9123 | 0.9792 |
| | SMAPE | 9.8723 | 9.5432 | 10.4512 | 10.0123 | 8.7541 |
| Shenzhen | R² | 0.9476 | 0.9523 | 0.9087 | 0.9254 | 0.9881 |
| | SMAPE | 9.4231 | 9.2145 | 10.0023 | 9.6124 | 8.4012 |
| Changsha | R² | 0.9287 | 0.9356 | 0.8876 | 0.9054 | 0.9734 |
| | SMAPE | 10.1234 | 9.8765 | 10.7821 | 10.3245 | 8.9876 |
Table 4. p-values of the Diebold–Mariano test comparing CCSANet against baseline models.
| Baseline Model | Beijing | Shanghai | Shenzhen | Chengdu | Changsha |
|---|---|---|---|---|---|
| LSTM | 0.0012 | 0.0008 | 0.0021 | 0.0005 | 0.0015 |
| Transformer | 0.0045 | 0.0124 | 0.0038 | 0.0082 | 0.0210 |
| ESASNet | 0.0150 | 0.0095 | 0.0140 | 0.0112 | 0.0120 |
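A Diebold–Mariano p-value of this kind can be obtained along the lines of the sketch below. The loss function and forecast horizon used by the authors are not stated in this section, so squared-error loss, a one-step horizon by default, and a normal approximation for the test statistic are assumptions. The forecast errors are synthetic.

```python
import math
import numpy as np

def diebold_mariano(errors_a, errors_b, horizon=1):
    """Two-sided Diebold-Mariano test under squared-error loss.

    Returns (statistic, p_value). A positive statistic means model A has the
    larger average loss. For horizon > 1 the rectangular-kernel variance
    estimate used here can turn out non-positive on short samples.
    """
    d = np.asarray(errors_a, dtype=float) ** 2 - np.asarray(errors_b, dtype=float) ** 2
    n = d.size
    d_bar = d.mean()
    centered = d - d_bar
    # Long-run variance of the loss differential: lag-0 term plus
    # autocovariances up to lag horizon - 1.
    long_run_var = np.sum(centered ** 2) / n
    for lag in range(1, horizon):
        long_run_var += 2.0 * np.sum(centered[lag:] * centered[:-lag]) / n
    stat = d_bar / math.sqrt(long_run_var / n)
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|stat|)).
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value

# Synthetic forecast errors: the baseline is noisier than the candidate model.
rng = np.random.default_rng(42)
baseline_errors = rng.normal(0.0, 2.0, size=200)
candidate_errors = rng.normal(0.0, 1.0, size=200)
dm_stat, dm_p = diebold_mariano(baseline_errors, candidate_errors)
print(f"DM statistic = {dm_stat:.3f}, p-value = {dm_p:.3g}")
```

A small p-value indicates that the difference in average loss between the two forecasts is unlikely to be due to chance. Small-sample corrections such as the Harvey–Leybourne–Newbold adjustment are omitted from this sketch.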
Table 5. Ablation study results of CausalFormer loss function improvement under a 5-day sliding window.
| City | Metric | CausalFormer (Original-CF) | CausalFormer (Modified-CF) |
|---|---|---|---|
| Beijing | R² | 0.9412 | 0.9712 |
| | SMAPE | 10.2134 | 9.7543 |
| Shanghai | R² | 0.9345 | 0.9654 |
| | SMAPE | 10.5678 | 9.9123 |
| Chengdu | R² | 0.9123 | 0.9587 |
| | SMAPE | 10.8765 | 10.1234 |
| Shenzhen | R² | 0.9256 | 0.9621 |
| | SMAPE | 10.4321 | 9.8765 |
| Changsha | R² | 0.9045 | 0.9543 |
| | SMAPE | 11.2341 | 10.4321 |
Table 6. Ablation study results of CausalFormer loss function improvement under a 7-day sliding window.
| City | Metric | CausalFormer (Original-CF) | CausalFormer (Modified-CF) |
|---|---|---|---|
| Beijing | R² | 0.9023 | 0.9436 |
| | SMAPE | 9.7542 | 8.8653 |
| Shanghai | R² | 0.9312 | 0.9854 |
| | SMAPE | 9.3412 | 8.5123 |
| Chengdu | R² | 0.9214 | 0.9792 |
| | SMAPE | 9.6543 | 8.7541 |
| Shenzhen | R² | 0.9345 | 0.9881 |
| | SMAPE | 9.2134 | 8.4012 |
| Changsha | R² | 0.9134 | 0.9734 |
| | SMAPE | 9.8762 | 8.9876 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhu, R.; Zhao, Y.; Fang, Z.; Liu, Y. A Causally Constrained Framework Coupling Causal Discovery and SEIR Mechanisms for Interpretable Epidemic Modeling. Mathematics 2026, 14, 1776. https://doi.org/10.3390/math14101776

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.