1. Introduction
Since the start of the 21st century, global pandemics of sudden-onset infectious diseases have posed enduring challenges to human health, economic stability, and social governance. In responding to such public health crises, accurate and timely epidemic trend forecasting is a cornerstone for governments to formulate non-pharmaceutical interventions (NPIs), optimize medical resource allocation, and develop vaccination strategies [1]. However, infectious disease transmission is a complex dynamic process driven by multi-dimensional environmental factors and exhibiting highly nonlinear evolutionary characteristics [2]. How to extract robust evolutionary patterns from massive, heterogeneous monitoring data and transform them into forward-looking decision-support information remains a major and urgent problem in computational epidemiology.
Since Kermack and McKendrick proposed the mathematical theory of epidemics in 1927, mathematical models of infectious diseases have become fundamental tools for analyzing epidemiological characteristics and studying transmission dynamics [3]. In this field, the SIR and SEIR models have been widely adopted for modeling the transmission dynamics of infectious diseases. In recent years, researchers have proposed various extensions to these classical models. Some studies focus on structural enhancements. For example, Ramezani et al. introduced a variant of the SEIRD (Susceptible–Exposed–Infectious–Recovered–Deceased) model to better capture the nonlinear dynamics of COVID-19 [4]. Eiman et al. proposed a fractional-order epidemic model based on fractal calculus that accounts for reinfection. Through theoretical analysis and numerical simulations, their model reveals important properties such as the basic reproduction number, equilibrium points, and system stability [5]. Alzahrani et al. introduced the Atangana–Baleanu–Caputo fractional-order derivative operator into the SEIR model, proposing a more accurate approach to influenza forecasting [6]. This method offers a novel perspective for capturing long-range dependencies in epidemic data. Although these studies have significantly enriched the modeling toolkit for infectious disease forecasting, traditional mathematical models still rely heavily on strong assumptions. Parameter estimation typically requires complex fitting procedures, making the models highly sensitive to initial settings and data quality and resulting in substantial uncertainty. This challenge becomes particularly prominent in long-term epidemics, where model parameters must be continuously adjusted over time and in response to external conditions. As such, a single fixed-structure model often struggles to capture the full complexity of dynamic epidemic processes.
With advances in big data technologies and computing power, deep learning has achieved significant success in time series forecasting. From early recurrent neural networks to the more recent Transformer architectures and their variants (Informer, Autoformer), as well as hierarchical architectures designed for time series decomposition, these models have continuously pushed the upper limits of forecasting accuracy through their powerful nonlinear representation capabilities. For example, Zeroual et al. systematically compared deep learning algorithms such as simple RNN, LSTM, bidirectional LSTM, GRU, and Variational Autoencoders, with results showing that deep models significantly outperform traditional statistical methods in capturing the complex nonlinear features of epidemic data [7]. Wu et al. proposed the Autoformer model based on a deep decomposition architecture, utilizing an auto-correlation mechanism instead of the traditional attention mechanism to substantially improve performance on long-term time series forecasting tasks [8]. Zeng et al. further expanded the boundaries of deep learning in time series modeling by proposing the DLinear model based on linear time series decomposition, examining the effectiveness of different neural network architectures in long-term forecasting [9]. Wang et al. addressed the computational bottlenecks of traditional models in capturing long-range implicit patterns by proposing the S-Mamba model based on selective state space models, which significantly improved the fitting accuracy for complex epidemic evolutionary trajectories through a recursive architecture with linear complexity [10]. Deep learning time series forecasting models have not only demonstrated superior performance over traditional statistical methods in fields such as rainfall forecasting [11], financial analysis [12], and traffic forecasting [13], but have also excelled in epidemic forecasting. Kırbaş et al. conducted a comparative study of ARIMA, NARNN, and LSTM, finding that LSTM performed best in modeling confirmed COVID-19 cases [14]. Sembiring et al. proposed an optimized LSTM model (popLSTM) that significantly improved the accuracy of COVID-19 confirmed case predictions by integrating spatiotemporal features in the output gate and maintaining output values below 0.5 [15]. Shao et al.’s research indicated that an LSTM model combining epidemiological information and climate factors had the highest forecasting accuracy in countries such as Germany, Italy, and the United States, outperforming methods such as support vector regression and temporal convolutional networks [16]. Additionally, Nabi et al. investigated four deep learning models: LSTM, GRU (Gated Recurrent Unit), CNN (Convolutional Neural Network), and MCNN (Multivariate Convolutional Neural Network), with results showing that CNN outperformed the other deep learning models in terms of validation accuracy and prediction consistency [17]. However, although these models demonstrate outstanding forecasting performance and numerical accuracy in complex epidemic evolution tasks, their inherent “black-box” nature leads to a severe lack of logical transparency in the inference process [18]. In the highly sensitive public health sector, which relies on scientific decision-making, this lack of interpretability not only makes it difficult for forecasting results to gain the trust of decision-makers but also poses significant security risks when responding to epidemic mutations or formulating critical intervention policies [19].
To break the black-box dilemma of neural networks, researchers have introduced a series of post hoc interpretability methods, among which SHAP, LIME, and attention mechanism visualization are the most typical. For instance, Lundberg et al. proposed the game-theory-based SHAP method, which quantifies the marginal contribution of each feature to the model output by calculating Shapley values, providing a unified feature attribution framework for complex model predictions [20]; Ribeiro et al. developed the LIME algorithm, which interprets individual sample predictions of any black-box model by fitting an interpretable model in a local neighborhood, enhancing decision-makers’ understanding of the model’s local predictions [21]; Lim et al., when developing a time-series forecasting model, identified the contributions of different time steps and external features by visualizing self-attention weights, providing an intuitive logical basis for capturing long-range dependencies [22]. Su et al. utilized machine learning combined with SHAP attribution techniques to analyze the role of dietary antioxidants in predicting the comorbidity risk of cardiovascular disease and cancer, demonstrating the efficacy of attribution analysis in revealing complex biomedical logic [23]. However, although these post hoc explanation methods alleviate the black-box problem to some extent, they remain statistical attribution rather than causal inference [24]. In the highly sensitive scenario of infectious disease forecasting, such methods have significant limitations: first, the explanation process is decoupled from the modeling process, meaning the explanation results are merely ex post descriptions of the model’s fitting phenomena rather than hard constraints on the model’s internal logic [25]; second, post hoc methods cannot identify and filter out spurious correlations at the root, and may even yield misleading mechanistic conclusions due to collinearity between features [26]. This defect of non-causal explanations makes it difficult for decision-makers to confirm whether the forecasting results are truly built on scientific transmission logic, thereby restricting the in-depth application of deep learning models in public health decision-making. To provide a structured and systematic overview of the research landscape, the representative models and methodologies discussed above are summarized and categorized in Table 1.
To address the aforementioned limitations, this paper proposes CCSANet (Causally Constrained SEIR-Aware Network), an interpretable forecasting network based on temporal causal discovery. Rather than passively relying on post hoc explanations, CCSANet intrinsically discovers the causal topology G between environmental factors and epidemic evolution at the front end of inference via the CausalFormer module [27]; subsequently, by constructing a causal mask, it imposes structural constraints on the SCI-Block, forcing the model to propagate information only along valid causal paths. This causally endogenous design ensures that the model possesses causal-level inference transparency while maintaining the high accuracy of deep learning.
The main contributions of this paper are summarized as follows:
Proposing the CCSANet model, which achieves a deep integration of deep learning architectures and temporal causal discovery. We propose CCSANet, a causally constrained deep learning framework based on SEIR epidemic dynamics. By integrating purely data-driven neural architectures with epidemiological mechanisms, this framework fundamentally resolves the “black-box” nature of traditional deep learning models while improving predictive accuracy. To further learn the complex mechanisms of epidemic evolution under the influence of multi-source environmental factors, we introduce the CausalFormer module to endogenously generate a temporal causal graph. The temporal causal graph explicitly guides the representation learning process of the neural network, ensuring that the model’s forecasting trajectory is executed strictly based on epidemiological logic.
Designing a structured constraint mechanism based on causal masks to significantly mitigate the risk of learning false causal dependencies. Unlike traditional post hoc explanation methods, CCSANet imposes hard structural constraints on the information flow of the SCI-Block by constructing causal path masks. This design forces the model to propagate information solely along legitimate epidemiological causal chains extending from key mechanistic parameters to the epidemic evolutionary state. It fundamentally eliminates attribution biases caused by collinear features and endows the model with endogenous causal consistency and inference transparency.
Conducting extensive validations on multi-source real-world datasets, verifying the model’s dual advantages in forecasting accuracy and causal alignment. In forecasting tasks across multiple countries and regions, CCSANet not only significantly outperforms mainstream baseline models such as LSTM, SCINet, and Informer, but also exhibits stronger generalization capabilities compared to the base ESASNet. Furthermore, through the analysis of the causal graph G, the model accurately identifies the core causal factors driving epidemic fluctuations at different stages, providing a logically rigorous scientific basis for precision prevention and control decisions.
2. Materials and Methods
This section details the proposed interpretable SEIR-aware epidemic forecasting network framework based on temporal causal discovery, which consists of three main components: a temporal causal discovery module, a time-varying SEIR parameter modeling module, and a causally constrained ESASNet deep learning module. As shown in Figure 1, this network integrates these three key components to construct a complete workflow from multi-source data input and causal structure learning to mechanism-aligned parameter forecasting.
The overall workflow of CCSANet is as follows. First, multi-source time series data composed of confirmed cases, temperature, air quality, and other variables are assembled. The global temporal causal graph is then learned by the CausalFormer model equipped with an SEIR-aware prior loss, ensuring that the discovered causal relationships conform to the mechanisms of epidemic transmission. Subsequently, a causal subgraph containing only SEIR state variables is extracted from the complete causal graph. Finally, this subgraph is used as a structural prior to impose causal constraints on the SCI-Block in ESASNet, restricting information to pass only along valid causal paths, thereby achieving accurate forecasting of time-varying SEIR parameters (such as $\beta$, $\sigma$, and $\gamma$) and supporting interpretability analysis based on causal mechanisms.
2.1. Data Sources and Preprocessing
The datasets utilized in this study are constructed from two primary sources. The epidemiological time-series data were obtained from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. This comprehensive repository tracks daily confirmed cases globally across 289 locations in 201 countries and regions. To capture the complete evolutionary trajectory of the pandemic, we extracted data spanning from 22 January 2020 to 9 March 2023. Simultaneously, the corresponding multi-source environmental records were collected from the China Meteorological Data Service Center. To formulate a strictly aligned multivariate time-series dataset, we integrated key meteorological and air quality variables, specifically the daily average temperature (TEMP) and four further meteorological and air quality indicators.
This study selects five typical cities in China: Beijing, Shanghai, Shenzhen, Chengdu, and Changsha as research subjects, integrating multi-source heterogeneous data to construct an epidemic transmission analysis framework. The selection of these specific urban nodes is predicated on their inherent data characteristics. Firstly, these cities are characterized by massive population densities and extreme demographic mobility, resulting in highly complex and non-stationary epidemic transmission dynamics. Secondly, the daily epidemiological sequences in these regions exhibit extreme volatility, with data distributions ranging from baseline periods of near-zero cases to severe outbreak peaks involving thousands of daily infections. Such profound data variance and structural complexity provide an exceptionally rigorous testbed for evaluating the forecasting capability and robustness of the proposed causally constrained architecture under multi-source environmental interventions.
In the preprocessing stage, as shown in Figure 2, quality control is first applied to the original sequences: for a small number of missing values, linear interpolation is used to fill them; for abnormal observations that significantly deviate from the normal fluctuation range, a smoothing correction combining a 7-day moving average and a statistical threshold method is applied to reduce noise interference. Subsequently, all variables are normalized and mapped to a common interval to eliminate dimensional differences.
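The three preprocessing steps can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the centered 7-day window, the residual-based threshold `z_thresh=3.0`, and the target range `[0, 1]` are all assumptions chosen for illustration, since the text does not specify the threshold rule or the normalization interval.

```python
from statistics import mean, pstdev

def interpolate_missing(series):
    """Fill None gaps by linear interpolation between the nearest observed values."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:            # leading gap: copy the first observed value
            out[i] = series[right]
        elif right is None:         # trailing gap: copy the last observed value
            out[i] = series[left]
        else:
            w = (i - left) / (right - left)
            out[i] = series[left] + w * (series[right] - series[left])
    return out

def smooth_outliers(series, window=7, z_thresh=3.0):
    """Replace points far from a centered moving average by that average.

    Assumed rule: a point is abnormal if its residual exceeds
    z_thresh times the standard deviation of all residuals.
    """
    half = window // 2
    ma = [mean(series[max(0, i - half): i + half + 1]) for i in range(len(series))]
    resid = [x - m for x, m in zip(series, ma)]
    sd = pstdev(resid)
    if sd == 0:
        return list(series)
    return [m if abs(r) > z_thresh * sd else x for x, m, r in zip(series, ma, resid)]

def min_max_scale(series, lo=0.0, hi=1.0):
    """Map a series linearly onto [lo, hi] (assumed target range)."""
    mn, mx = min(series), max(series)
    if mx == mn:
        return [lo for _ in series]
    return [lo + (x - mn) * (hi - lo) / (mx - mn) for x in series]
```

Applying the functions in this order (interpolate, smooth, scale) mirrors the order described in the text; each variable would be processed independently.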
Then, we transformed the raw data according to the requirements of the SEIR model. Since the raw data mainly provide cumulative confirmed cases, cumulative recovered cases, and cumulative deceased cases, we need to convert them into the four state variables in the SEIR model: susceptible, exposed, infectious, and recovered populations. The specific conversion process is as follows: the recovered population is taken as the sum of cumulative recovered cases and cumulative deceased cases; the infectious population is the cumulative confirmed cases minus the recovered population; the exposed population is estimated based on existing literature and epidemiological research, taken as 5 times the daily new confirmed cases [28]; and the susceptible population is the total population minus the sum of the exposed, infectious, and recovered populations [29].
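The conversion rules above can be written out as a short sketch. The function name `to_seir_states` and its argument names are hypothetical; the factor of 5 for the exposed population follows the text, while treating the first day's new cases as zero is an assumption made here because no earlier observation is available.

```python
def to_seir_states(cum_confirmed, cum_recovered, cum_deceased,
                   population, exposed_factor=5.0):
    """Convert cumulative case counts into daily (S, E, I, R) tuples."""
    states = []
    prev_confirmed = cum_confirmed[0]
    for t, (c, r, d) in enumerate(zip(cum_confirmed, cum_recovered, cum_deceased)):
        # Daily new confirmed cases; assumed 0 on the first day.
        new_cases = max(c - prev_confirmed, 0) if t > 0 else 0
        prev_confirmed = c
        R = r + d                       # recovered = cumulative recovered + deceased
        I = c - R                       # infectious = cumulative confirmed - recovered
        E = exposed_factor * new_cases  # exposed = 5 x daily new confirmed cases
        S = population - (E + I + R)    # susceptible = remainder of the population
        states.append((S, E, I, R))
    return states
```

For example, with 120 cumulative confirmed cases (20 of them new), 15 recovered, 2 deceased, and a population of 1000, the sketch yields R = 17, I = 103, E = 100, and S = 780.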
To construct the input–output dataset for the model, we adopt a sliding window technique [30], as shown in Figure 2. Let the original multivariate time series data be $X \in \mathbb{R}^{N \times D}$, where $D = 4$ denotes the four SEIR variables, and $N$ represents the total number of time steps. We define the input window length as $L$; at each time step $t$, data from the time period $[t - L + 1, t]$ is used as the input $X_t \in \mathbb{R}^{L \times D}$, and the value at time $t + 1$ is the output $Y_t \in \mathbb{R}^{D}$. By sliding this window across the entire time series, we generate $N - L$ input–output pairs, ultimately forming an input tensor $\mathcal{X} \in \mathbb{R}^{(N - L) \times L \times D}$ and an output tensor $\mathcal{Y} \in \mathbb{R}^{(N - L) \times D}$.
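The sliding window construction can be sketched as follows. This is a minimal illustration that assumes one-step-ahead targets (the value immediately after each window); the function name `make_windows` is hypothetical.

```python
def make_windows(series, window):
    """Split a multivariate series into (input window, next-step target) pairs.

    series: list of per-day feature vectors, e.g. [S, E, I, R]
    window: input window length
    Returns two lists with len(series) - window entries each.
    """
    if window <= 0 or window >= len(series):
        raise ValueError("window must be in [1, len(series) - 1]")
    inputs, targets = [], []
    for end in range(window, len(series)):
        inputs.append(series[end - window:end])  # the `window` most recent days
        targets.append(series[end])              # the day right after the window
    return inputs, targets
```

With a series of 6 time steps and a window of 3, the sketch yields 3 pairs, the first of which uses days 0 to 2 as input and day 3 as target.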
2.2. SEIR Prior-Guided Temporal Causal Discovery Framework
To learn the causal dependency structure from multi-source epidemic-related time series, this paper proposes a temporal causal discovery method integrating epidemiological priors. This method is based on CausalFormer (Kong et al., 2025 [27]) and enhances the loss function by introducing SEIR dynamics constraints, ensuring that the learned causal graph adheres to infectious disease transmission mechanisms. The resulting causal structure is further distilled into a compact subgraph among SEIR variables, serving as a structural prior for the downstream forecasting network, thereby moving causal modeling from data-driven to mechanism-aligned. Specifically, the detailed formulation of this framework unfolds sequentially: Section 2.2.1 elaborates on the initial causal structure learning via the CausalFormer module; Section 2.2.2 introduces the SEIR dynamic constraint loss to enforce epidemiological consistency; and finally, Section 2.2.3 details the extraction of the compact causal subgraph, which acts as the explicit structural mask for the downstream inference.
2.2.1. The CausalFormer Model
CausalFormer is an end-to-end temporal causal discovery model based on the Transformer architecture, aimed at inferring directed causal relationships between variables from observed sequences $X \in \mathbb{R}^{V \times L}$, where $V$ is the variable dimension and $L$ is the window length. The model consists of two main parts: a Causality-aware Transformer Encoder and a Decomposition-based Causality Detector.
The Causality-aware Transformer Encoder learns the causal representations of time series through a prediction task. It introduces Multi-kernel Causal Convolution to aggregate the input sequence along the time dimension while adhering to temporal priority constraints. The encoder adopts a multi-layer stacked Transformer structure, with each layer consisting of a Multi-head Self-Attention (MHSA) mechanism and a Feed-forward Network (FFN). For the $l$-th layer, the attention weights are calculated as follows:

$$A^{(l)} = \mathrm{softmax}\left(\frac{Q^{(l)} \left(K^{(l)}\right)^{\top}}{\sqrt{d_k}}\right)$$

where $Q^{(l)}$, $K^{(l)}$, and $V^{(l)}$ are the query, key, and value matrices of the $l$-th layer, and $d_k$ is the key vector dimension. The Query ($Q$) acts as the targeted epidemic state variable at the current time step seeking its causal drivers. The Key ($K$) represents the historical environmental factors and prior epidemic states serving as potential causal sources. The Value ($V$) encapsulates the actual physical influence or dynamic information carried by these historical features. Consequently, the resulting interpretable attention weight matrix $A^{(l)}$ dynamically quantifies the underlying causal driving strength of specific environmental interventions on the transmission states.
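As an illustration of how such attention weights arise, the following sketch computes standard scaled dot-product attention weights in pure Python. It shows only the generic softmax-normalized query-key similarity; the multi-kernel causal convolution and any CausalFormer-specific masking are omitted, and the function names are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(Q, K):
    """Scaled dot-product attention weights.

    Q: list of query vectors (one per target variable)
    K: list of key vectors (one per candidate source variable)
    Returns a matrix whose entry [i][j] is the weight that target i
    places on source j; each row sums to 1.
    """
    scale = math.sqrt(len(K[0]))  # sqrt of the key dimension d_k
    return [
        softmax([sum(q * k for q, k in zip(q_row, k_row)) / scale for k_row in K])
        for q_row in Q
    ]
```

Each row of the result is a probability distribution over candidate sources, which is what allows the weights to be read as relative driving strengths.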
Through the self-attention mechanism, the model is capable of capturing complex dependencies simultaneously across the temporal and variable dimensions, thereby characterizing the potential impact patterns between different variables. Multi-head attention further enhances the model’s ability to model multi-scale, heterogeneous causal interactions.
2.2.2. CausalFormer Based on SEIR-Aware Prior Loss
Although CausalFormer possesses strong global explanatory capabilities, as a purely data-driven model, it is susceptible to the interference of spurious correlations when processing multi-source epidemic data containing complex environmental variables such as temperature and air quality. In epidemiological scenarios, many variables show strong statistical correlations but lack genuine causal connections. Without mechanistic constraints, the model may fall into the trap of overfitting local features during the learning process, or even discover causal paths that violate fundamental biological laws; this not only reduces the reliability of the causal graph but also directly weakens the generalization ability of the downstream forecasting module.
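Therefore, this study embeds epidemiological prior knowledge into the training objective of CausalFormer; this is necessary to compensate for the lack of robustness of data-driven models under long-tail distributions and complex environmental interference. The total loss function $\mathcal{L}_{total}$ we constructed is as follows:

$$\mathcal{L}_{total} = \mathcal{L}_{pred} + \lambda_1 \mathcal{L}_{K} + \lambda_2 \mathcal{L}_{M} + \lambda_3 \mathcal{L}_{prior}$$

where $\mathcal{L}_{pred}$ is the mean squared error of the prediction; $\mathcal{L}_{K}$ and $\mathcal{L}_{M}$ induce sparsity in the causal convolution kernels and attention masks via the $\ell_1$ norm, aiming to retain the most essential intervention signals; $\mathcal{L}_{prior}$ is the SEIR prior penalty term; and $\lambda_1$, $\lambda_2$, and $\lambda_3$ are weighting coefficients.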
By introducing the prior penalty term $\mathcal{L}_{prior}$, we establish a bridge between the high-capacity representation ability of the Transformer and the mathematical rigor of the SEIR dynamical equations. The specific calculation formula is as follows:

$$\mathcal{L}_{prior} = \sum_{i,j} P_{ij} \left| A_{ij} \right|$$

where $A$ is the interpretable attention weight matrix capturing complex interaction patterns between variables, and $P$ is a predetermined propensity penalty matrix based on SEIR transmission mechanisms.
For transmission paths that conform to the evolutionary logic of $S \to E \to I \to R$, such as the contact risk posed by infectious individuals to susceptible ones, or the conversion process from exposed to infectious, we set the corresponding penalty factor to 0, thereby encouraging the model to freely capture the dynamic impact of environmental variables on these key parameters within a reasonable search space.
Conversely, for connections that are epidemiologically illogical, such as paths where recovered populations directly lead to an increase in exposed populations, we assign a maximum penalty to the corresponding entry of the penalty matrix. This design explicitly forces the model to filter out spurious correlations caused by noise by significantly increasing the training cost of violating mechanistic structures. The rationale for this soft constraint mechanism is that it does not fix the causal structure via hard coding, but guides the model through a penalty mechanism to conduct data-driven discovery while adhering to known mechanisms, ensuring that the final learned causal graph not only possesses statistical explanatory power but also has a solid epidemiological semantic foundation. This provides a reliable basis for subsequently extracting structural priors with causal significance and guiding downstream deep learning networks to achieve high-precision forecasting.
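The penalty mechanism can be sketched on the four SEIR variables alone. Everything concrete here is an assumption for illustration: the set `ALLOWED` of unpenalized (source, target) pairs, the zero penalty on self-dependence, the value `max_penalty=1.0`, and the weighted-sum form of the loss are not taken from the paper, which also includes environmental variables.

```python
VARS = ["S", "E", "I", "R"]

# Assumed mechanistically plausible (source, target) edges:
# the S -> E -> I -> R chain plus the contact effect of I on S.
ALLOWED = {("S", "E"), ("E", "I"), ("I", "R"), ("I", "S")}

def build_penalty(allowed=ALLOWED, max_penalty=1.0):
    """Penalty matrix P[src][dst]: 0 for allowed edges and self-loops, max otherwise."""
    return [
        [0.0 if (src == dst or (src, dst) in allowed) else max_penalty for dst in VARS]
        for src in VARS
    ]

def prior_loss(attn, penalty):
    """Sum of attention magnitudes weighted by the penalty matrix.

    attn[src][dst] is the attention weight from source to target variable.
    """
    return sum(
        p * abs(a)
        for a_row, p_row in zip(attn, penalty)
        for a, p in zip(a_row, p_row)
    )
```

Because allowed edges carry zero penalty, attention mass placed on them is free, while mass on an edge such as R to E raises the loss in proportion to its weight.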
2.2.3. Causal Subgraph Extraction and Structural Output
After obtaining the global temporal causal graph generated by CausalFormer based on the SEIR-aware prior loss, we will implement SEIR-related causal subgraph extraction tailored to epidemiological dynamics logic, aiming to accurately strip out the core structure conforming to epidemic evolutionary mechanisms from the complex multi-source variable interaction network. Because the global graph contains a large number of statistical associations between environmental covariates and epidemic variables, directly using it as a structural prior may introduce redundant noise; thus, we must focus on the intrinsic causal chains among susceptible, exposed, infectious, and recovered populations.
First, we use the two sets of causal scores output by the Decomposition-based Causality Detector to identify causal edges that are significant in both statistical and epidemiological senses through the K-means clustering algorithm. Subsequently, the algorithm screens and extracts an evolutionary subgraph composed solely of the four state variables S, E, I, R from the global topology, primarily retaining key dynamic paths such as $S \to E$, $E \to I$, and $I \to R$. For each identified subgraph edge, its causal delay is determined using the time-domain response characteristics of the multi-kernel convolutional kernels, defined by the following calculation formula:
where $T$ denotes the predetermined length of the sliding window, which also represents the maximum allowable time delay for causal discovery in this framework. In the context of epidemiology, this maximum delay $T$ is conceptually aligned with the upper limit of the disease’s incubation period.
The extracted SEIR-related subgraph is formally mapped into a binarized adjacency matrix $A_{sub} \in \{0, 1\}^{4 \times 4}$, which not only reflects data-driven correlations but also embeds mechanism-based structural constraints. To transform this causal knowledge into mechanism guidance for the prediction model, we apply $A_{sub}$ as a structural prior $M$ within the SCI-Block interaction layer of ESASNet. When predicting time-varying dynamic parameters, this matrix forces the model to aggregate features only along causally verified “mechanistically legitimate” paths. Specifically, the calculation process of the interaction information term for node $i$ at time $t$ is expressed as follows:

$$z_i^{(t)} = \sum_{j} M_{ij} \, W_{ij} \, h_j^{(t)}$$

where $h_j^{(t)}$ represents the hidden layer feature generated by the causal source variable $j$, and $W_{ij}$ represents the corresponding learnable interaction weight matrix.
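A minimal sketch of mask-constrained aggregation follows. It assumes that `mask[i][j] == 1` means the edge from source `j` to target `i` was retained in the causal subgraph, and that each retained edge has its own weight matrix; the function names and data layout are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def masked_interaction(hidden, weights, mask):
    """Aggregate source features into each target along masked edges only.

    hidden[j]     : feature vector of source variable j
    weights[i][j] : matrix mapping features of j into the space of i
    mask[i][j]    : 1 if j -> i is a retained causal edge, else 0
    """
    out = []
    for i in range(len(hidden)):
        acc = [0.0] * len(weights[i][0])  # output dimension = rows of W[i][0]
        for j, h_j in enumerate(hidden):
            if mask[i][j]:                # skip non-causal sources entirely
                contrib = matvec(weights[i][j], h_j)
                acc = [a + c for a, c in zip(acc, contrib)]
        out.append(acc)
    return out
```

With a mask encoding only the chain S to E, E to I, and I to R, the exposed node receives information solely from the susceptible node, and the susceptible node receives none.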
This structured design based on subgraph extraction ensures that while ESASNet utilizes multi-source data for nonlinear fitting, its core prediction logic remains locked within the causal skeleton of $S \to E \to I \to R$, thereby achieving a deep integration of deep learning models and epidemiological mechanisms.
2.3. CCSANet Model Based on Causal Constraints
We propose an improved causally constrained model, CCSANet (Causally Constrained SEIR-Aware Network), aiming to deeply integrate data-driven time series representations with epidemiological causal mechanisms. Building upon the original ESASNet architecture, this model utilizes the causal subgraph G(t) extracted in Section 2.2.3 as a structured prior to impose hard topological constraints and strength alignment optimization on the information transmission paths within the network, ensuring that while the model captures nonlinear transmission trends, its internal logic consistently adheres to the evolutionary paths of infectious disease dynamics. Specifically, the architectural details of this module unfold sequentially: Section 2.3.1 outlines the foundational ESASNet backbone for time-domain modeling; Section 2.3.2 details the reconstruction of the SCI-Block via the integration of causal topological masks; Section 2.3.3 defines the joint optimization objective driven by causal strength alignment; and finally, Section 2.3.4 summarizes the resulting endogenous interpretability mechanism.
2.3.1. ESASNet Model
The ESASNet model (Explainable SEIR-Aware SCINet) is built on the deep coupling of epidemic transmission dynamics and deep learning time-domain modeling. First, by establishing a system of dynamical differential equations with time-varying characteristics, it defines the core transmission parameters (transmission rate $\beta(t)$, latent conversion rate $\sigma(t)$, and recovery rate $\gamma(t)$) as functions that change dynamically over time. This time-varying SEIR model provides the mathematical basis for epidemic forecasting by describing state transitions among susceptible, exposed, infectious, and recovered individuals.
Within this mechanism constraint system, the prediction of the ESASNet model is based on the powerful nonlinear fitting capability of deep neural networks to accurately estimate the aforementioned time-varying parameter sequences driving epidemic evolution from historical observation data. The model adopts SCINet as the fundamental backbone for time-domain feature extraction; its core advantage lies in employing a hierarchical binary tree-like downsampling mechanism, effectively capturing multi-scale temporal dependencies by decomposing the original epidemic sequence into multiple sub-sequences of different resolutions. During feature extraction at each hierarchical level, the model utilizes the core interaction unit, SCI-Block, to perform downsampling, convolution processing, and interactive learning on odd and even samples, thereby uncovering the hidden dynamic laws within the sequence.
Unlike purely data-driven models that directly predict the number of confirmed cases, ESASNet maps the output of SCINet to parameter estimates at specific moments. This design not only ensures that the model can acutely capture subtle fluctuations in the epidemic but also imposes epidemiological constraints by integrating SEIR dynamical equations, avoiding non-physical prediction results that purely data-driven methods might generate. This cascaded architecture achieves the unification of structure-driven modeling and data-driven learning, providing a robust parameter estimation foundation for the subsequent introduction of causal topological constraints.
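To make the role of the predicted parameters concrete, the sketch below advances a time-varying SEIR system with one forward Euler step per day. It assumes the standard textbook SEIR equations with transmission rate `beta`, latent conversion rate `sigma`, and recovery rate `gamma`; the paper's exact equations and integration scheme are not shown in this section, so this is an illustration only.

```python
def seir_step(state, beta, sigma, gamma, dt=1.0):
    """One forward Euler step of the standard SEIR equations."""
    S, E, I, R = state
    N = S + E + I + R
    new_exposed = beta * S * I / N * dt  # S -> E through contact with I
    new_infectious = sigma * E * dt      # E -> I after the latent period
    new_recovered = gamma * I * dt       # I -> R
    return (
        S - new_exposed,
        E + new_exposed - new_infectious,
        I + new_infectious - new_recovered,
        R + new_recovered,
    )

def rollout(state, params):
    """Roll the system forward using one (beta, sigma, gamma) triple per step."""
    trajectory = [state]
    for beta, sigma, gamma in params:
        state = seir_step(state, beta, sigma, gamma)
        trajectory.append(state)
    return trajectory
```

Because every flow leaving one compartment enters another, the total population is conserved at each step, which is one way the dynamical equations rule out non-physical forecasts.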
2.3.2. Improvement of SCI-Block Incorporating Causal Topological Constraints
Traditional SCI-Blocks [31] usually adopt a fully connected approach during the feature interaction stage, meaning each variable branch absorbs information from all other variables. However, when processing input sequences containing only core epidemic variables (S, E, I, R), this indiscriminate interaction easily introduces statistical noise inconsistent with the mechanisms, leading the model to learn spurious dynamic associations.
To address this limitation, this study reconstructed the internal interaction logic of the SCI-Block, introducing a topological filter $M$ based on causal subgraphs, as shown in Figure 3. To facilitate multi-scale feature extraction within this architecture, we utilize four primary convolutional operators: $\phi$ and $\psi$ for the initial scaling and shifting in the splitting stage, and $\rho$ and $\eta$ for the subsequent interactive transformation between odd and even sub-sequences. These operators allow the network to learn complex temporal patterns at different resolutions. By incorporating the causal mask $M$ into these interaction functions, the module is forced to update feature states solely through mechanistically legitimate pathways, thereby ensuring that the neural information flow remains epidemiologically consistent. The improved module no longer performs full-dimensional feature aggregation but uses a binarized mask $M$ to enforce causal path filtering. Specifically, taking the state update of the even-branch node $i$ at time $t$ as an example, the calculation process for its final output feature is expressed as follows:

$$F'_{even,i}(t) = F_{even,i}(t) + \sum_{j} M_{ij} \, W_{ij} \, \eta\left(F_{odd,j}(t)\right)$$

where $j$ is the constrained causal source node, $W_{ij}$ is the transformation matrix with causal weights, and $\eta$ is the corresponding interactive transformation function. Specifically, the first two terms define the state transition of the even-branch features: $F_{even,i}(t)$ represents the initial input (baseline) features of the even-indexed sub-sequence before information interaction, while $F'_{even,i}(t)$ denotes the updated output feature for node $i$.
This design forces the feature updates of variable i to absorb information solely from its causal source variables through a mask mechanism. By explicitly introducing the evolutionary logic constraints of $S \to E \to I \to R$ at the operator level, it removes the “black-box” behavior of the original deep learning models, which search the feature space blindly, and effectively suppresses interaction noise introduced by non-causal variables. By embedding epidemiological mechanisms as topological criteria into the SCI-Block, this study aligns high-dimensional nonlinear fitting capabilities with physical evolutionary mechanisms. This not only enhances the accuracy of the model’s estimation of time-varying parameters, but also clarifies the interaction semantics among variables by restricting information flow, thereby endowing prediction results with causal interpretability at the lowest algorithmic level.
2.3.3. Joint Optimization Objective Based on Causal Strength Alignment
To further ensure that the model’s attention weights during feature extraction remain consistent with actual causal contributions, this study adopts a joint optimization objective based on causal strength alignment [32]. This strategy introduces an auxiliary loss term that guides CCSANet to learn attention patterns conforming to epidemiological mechanisms during training, thereby optimizing both numerical accuracy and decision logic.
The model’s total loss function L is a weighted composite of the main task prediction loss and the causal auxiliary alignment loss. A hyperparameter balances the importance of the two tasks and adjusts how strongly the causal constraints affect model parameter updates.
The main loss term adopts the mean squared error criterion to minimize the deviation between the predicted values of the time-varying parameters generated by the core epidemic variables and the reference values calculated from observational data, so that the model captures the nonlinear dynamics of the epidemic sequence. To turn the quantitative causal knowledge discovered by CausalFormer into effective supervision for the prediction model, the causal auxiliary loss follows the causal strength alignment strategy: it steers network optimization by computing the divergence between the model’s internal attention distribution and external causal priors. Specifically, the feature attention weights generated by the SCI-Block during interaction are required to approach the normalized causal scores output in Section 2.2.3. A mask operator for the causal subgraph ensures that the alignment only operates on verified, legitimate causal paths.
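A minimal sketch of such a joint objective follows. Several choices here are assumptions rather than the paper's definitions: the divergence is taken to be a squared difference averaged over the admitted causal paths, the two terms are combined additively with a weight `lam`, and the function and argument names are hypothetical.

```python
import numpy as np

def joint_loss(pred, target, attention, causal_scores, mask, lam=0.1):
    """Illustrative joint objective: prediction MSE plus masked alignment term.

    pred, target  : predicted and reference time-varying parameters (same shape)
    attention     : (n_vars, n_vars) attention weights from the interaction block
    causal_scores : (n_vars, n_vars) normalized causal scores used as priors
    mask          : (n_vars, n_vars); nonzero entries mark verified causal paths
    lam           : balancing hyperparameter (assumed to scale the alignment term)
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    # Main task: mean squared error on the predicted parameters.
    pred_loss = np.mean((pred - target) ** 2)
    # Alignment: squared difference, restricted to admitted causal paths (assumed form).
    binary_mask = (np.asarray(mask) != 0).astype(float)
    n_paths = max(binary_mask.sum(), 1.0)
    diff = np.asarray(attention, dtype=float) - np.asarray(causal_scores, dtype=float)
    align_loss = np.sum(binary_mask * diff ** 2) / n_paths
    return pred_loss + lam * align_loss, pred_loss, align_loss
```

Because the mask multiplies the squared difference, attention entries on non-causal paths contribute nothing to the alignment term, regardless of how far they are from the prior scores.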
Under this joint optimization objective, backpropagation must not only make the predicted values accurate but also make the “importance” the model allocates to each variable branch match the actual epidemiological transmission strength. This constraint reduces the model’s sensitivity to spurious correlations arising from random noise and improves the robustness of the forecasting system under complex fluctuations. Moreover, because the model’s weight distribution is anchored to causal chains with a clear physical background, predictions no longer depend on purely data-driven fitting. This establishes semantic consistency between the deep learning model and epidemiological mechanisms at the optimization level and improves the credibility and interpretability of the overall prediction architecture.
2.3.4. Model Interpretability Mechanism Based on Endogenous Causal Constraints
This paper implements an endogenous interpretability mechanism within the CCSANet framework. Its core is to supervise the learning process with the causal graph, so that the model aligns not only its predicted values at the output but also its underlying interaction logic with epidemiological mechanisms. Unlike post hoc explanation methods such as SHAP, which perform feature attribution only after predictions are made, this framework makes the model’s decision path transparent by treating causal relationships as structural constraints.
The algorithmic implementation of this mechanism relies on two endogenous constraints: the causal constraint and causal strength alignment. First, the causal adjacency matrix extracted in the previous stage explicitly blocks information flows that do not conform to the mechanism in the interaction operators of the SCI-Block, so that the model can aggregate information only through discovered and verified causal paths when updating feature states. This hard filtering prevents the black-box model from fitting spurious correlations at the operator level and ensures that every feature exchange has explicit epidemiological semantics. Second, through the causal auxiliary loss, the quantified causal scores discovered by CausalFormer serve as the gold standard for continually correcting the attention weight distribution within ESASNet during training. This alignment keeps the attention assigned to each variable branch consistent with its actual causal contribution, so that predictions are not the product of purely data-driven fitting and the model’s decision logic is integrated with causal inference.
Under this endogenous constraint architecture, the model yields interpretable analysis results grounded in causal relationships. By analyzing the trained interaction weight matrix, the framework can quantify and present the dynamic influence intensity of multi-source variables on core epidemic state variables at different time lags. Specifically, the model can extract and output the causal contribution trajectories of the driving factors at specific delays, thereby characterizing the complete paths through which environmental factors drive the evolution of time-varying transmission parameters. By exposing the time-varying interaction patterns among variables, these internally generated explanations give the prediction conclusions both high-precision numerical support and a logically traceable, mechanistically transparent scientific basis for public health decision-making.
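One way such lag-wise contributions might be read out of trained weights is sketched below. The tensor layout (lag, target, source), the use of absolute weights, and the normalization to shares are assumptions made for illustration; the text does not specify how the trajectories are computed, and `contribution_trajectories` is a hypothetical helper.

```python
import numpy as np

def contribution_trajectories(lag_weights, mask, target):
    """Illustrative lag-wise contribution shares for one target variable.

    lag_weights : (n_lags, n_vars, n_vars) trained interaction weights, indexed as
                  [lag, target, source] (assumed layout)
    mask        : (n_vars, n_vars); nonzero entries mark verified causal paths
    target      : index of the core epidemic state variable of interest
    Returns an (n_lags, n_vars) array of non-negative shares that sum to 1
    when at least one admitted path carries nonzero weight.
    """
    # Magnitude of each source's weight on the chosen target, per lag.
    magnitudes = np.abs(np.asarray(lag_weights, dtype=float)[:, target, :])
    # Keep only sources admitted by the causal mask for this target.
    magnitudes = magnitudes * (np.asarray(mask)[target] != 0)
    total = magnitudes.sum()
    return magnitudes / total if total > 0 else magnitudes
```

Reading one column of the result across lags gives a single driving factor's contribution as a function of delay, which corresponds to the trajectory view described above.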