Enhanced Causal Discovery for Autocorrelated Time Series via Adaptive Momentary Conditional Independence

Gao, Minglong; Zhou, Yingchun

doi:10.3390/math14071129



by Minglong Gao and Yingchun Zhou *

Key Laboratory of Advanced Theory and Application in Statistics and Data Science—MOE, School of Statistics, East China Normal University, Shanghai 200062, China

* Author to whom correspondence should be addressed.

Mathematics 2026, 14(7), 1129; https://doi.org/10.3390/math14071129

Submission received: 18 February 2026 / Revised: 19 March 2026 / Accepted: 23 March 2026 / Published: 27 March 2026

(This article belongs to the Section D1: Probability and Statistics)


Abstract

Discovering causal relationships from time series data is essential for understanding complex dynamical systems across a range of domains. However, strong autocorrelation often limits the detection power of existing algorithms and increases the risk of false positives. To address these challenges, the Adaptive Momentary Conditional Independence (aMCI) method is introduced to mitigate the masking effects of autocorrelation and maintain control over false discovery rates. The aMCI method adaptively modifies the conditioning set to reduce the impact of autocorrelation on the accuracy of causal discovery. In addition, a multi-phase algorithm, the Enhanced Causal Discovery via aMCI (ECD-aMCI) algorithm, is proposed to robustly learn the causal graph by effectively applying the aMCI framework. The algorithm is designed to be hyperparameter-insensitive, order-independent, and provably consistent under oracle conditions. Extensive evaluations on simulated and benchmark datasets demonstrate that the proposed algorithm substantially improves the accuracy of causal discovery from time series, especially in the presence of strong autocorrelation.

Keywords:

causal structure learning; conditional independence test; time series; window causal graph

MSC:

68T30; 62H22

1. Introduction

Multivariate time series arise in many domains, such as earth science, neuroscience, and economics. Discovering causal relationships within these multivariate time series, encompassing both contemporaneous and lagged effects, is crucial for understanding the underlying mechanisms driving these systems. Accurate causal structures enable researchers to better comprehend system dynamics, build more precise prediction models, and estimate causal effects more reliably within the potential outcome framework [1,2].

While significant advances have been made in causal discovery methods for time series data, a persistent challenge remains: effectively handling autocorrelation [3,4]. Autocorrelation, the correlation of a variable with its past values, is a common characteristic of time series data that can significantly impact the performance of causal discovery algorithms. In particular, strong autocorrelation can obscure true causal relationships, leading to both false negatives (missed causal links) and false positives (spurious causal links) [5].

Existing approaches for causal discovery from time series data broadly fall into four categories: structural causal model (SCM)-based, Granger causality (GC)-based, score-based (SB), and constraint-based (CB) algorithms. SCM-based algorithms such as VAR-LiNGAM [6] learn windowed causal graphs with explicit time lag information but are restricted by their linearity assumption. More recent SCM methods such as Time Series Models with Independent Noise (TiMINo) [7], the Noise-Based/Constraint-Based approach (NBCB) [8], and Nonlinear Causal Discovery via HM-NICA (NCDH) [9] can capture nonlinear dynamics but often learn summary graphs that do not preserve time lag information. Additionally, NCDH does not model contemporaneous relationships. GC-based algorithms, rooted in predictive modeling, test whether past values of one variable improve predictions of another. Modern variants like ACD [10] and CR-VAE [11] employ neural architectures to capture nonlinear dynamics. However, these methods also produce summary graphs without explicit time lag information.

Score-based algorithms like NOTEARS [12] and its time series extensions DYNOTEARS [13] and NTS-NOTEARS [14] formulate directed acyclic graph (DAG) learning as continuous optimization problems. While these methods have shown promising results, they may require careful hyperparameter tuning, which can present challenges in practical applications. Constraint-based algorithms such as PCMCI [5] and PCMCI+ [15] utilize conditional independence tests to learn causal graphs from time series. PCMCI+ extends PCMCI by incorporating contemporaneous link discovery while effectively controlling false positive rates. However, its detection power for lagged links tends to decrease as autocorrelation increases. Bagged-PCMCI+ [16] addresses uncertainty estimation through bagging techniques, though at the cost of substantially increased computational requirements. Finally, Table 1 summarizes the characteristics of existing causal discovery methods for time series data.

Overall, existing time series causal discovery algorithms struggle to simultaneously address three main challenges: performance degradation as autocorrelation strengthens, sensitivity to hyperparameter selection, and the inability to simultaneously learn both lagged and contemporaneous causal relationships. To overcome these limitations, a novel method called Adaptive Momentary Conditional Independence (aMCI) is proposed. Conditional independence testing forms the backbone of decision-making in the constraint-based causal discovery algorithms, and thus enhancing its efficiency can substantially influence overall performance. The aMCI achieves this goal by dynamically modifying the conditioning set according to the causal structure of variables. Different strategies are applied to lagged and contemporaneous links, with a structured decision procedure guiding the choice of an appropriate conditioning set for each conditional independence test. Building on this foundation, the paper introduces the Enhanced Causal Discovery via aMCI (ECD-aMCI) algorithm specifically designed to fully leverage the capabilities of aMCI. Conventional causal discovery algorithms typically start from a fully connected graph, which can limit aMCI’s effectiveness in the initial stages. To overcome this limitation, ECD-aMCI employs a progressive refinement strategy: first establishing initial lagged parent estimates, then refining these estimates using aMCI, before finally discovering the complete causal skeleton. This sequential approach allows aMCI to operate with increasingly accurate structural information, substantially improving detection power for true causal links in autocorrelated systems while maintaining robust control over false positive rates.

The contributions of this paper are summarized as follows:

We investigate in depth the mechanism behind the marked performance decline of causal discovery algorithms on autocorrelated data, revealing that conditioning on historical non-confounder nodes substantially reduces test power.
We develop aMCI, a novel methodology that strategically modifies conditioning sets in time series data to effectively overcome the masking effects of autocorrelation.
We propose ECD-aMCI, a multi-phase algorithm designed to fully leverage the capabilities of aMCI through the progressive refinement of causal structures. The proposed algorithm provides a hyperparameter-insensitive, order-independent, and provably consistent framework to learn both lagged and contemporaneous links.
We offer a novel perspective for constraint-based algorithms, emphasizing that theoretically equivalent design choices under ideal conditions may yield significantly different results in practice, thereby necessitating a preference for choices more robustly suited to the underlying data characteristics.

The remainder of this paper is organized as follows: Section 2 introduces the fundamental concepts of d-separation in causal graphs and the PCMCI algorithm. Section 3 describes the aMCI method and the ECD-aMCI algorithm in detail. Section 4 provides a comprehensive evaluation of the ECD-aMCI algorithm through both simulated and benchmark datasets. Section 5 and Section 6 conclude with a discussion of our contributions.

2. Preliminaries

2.1. Notations

Table 2 summarizes the key mathematical notations used throughout the paper.

2.2. d-Separation and Completed Partially Directed Acyclic Graph

A directed acyclic graph (DAG) [17] encodes a set of conditional independence relations through the criterion of d-separation. Given three disjoint subsets of nodes X, Y, and Z in a DAG $\mathcal{G}$, we say that X is d-separated from Y given Z (denoted $X \perp\!\!\!\perp Y \mid Z$) if all paths from any node in X to any node in Y are blocked by Z. A path is blocked if at least one of its segments is blocked according to the following rules: (1) a non-collider path segment $A \to B \to C$, $A \leftarrow B \to C$, or $A \leftarrow B \leftarrow C$ is blocked if the middle node $B \in Z$; (2) a collider path segment $A \to B \leftarrow C$ is blocked unless $B \in Z$ or some descendant of B is in Z. D-separation provides a graphical criterion to determine whether two variables are conditionally independent given a set of other variables. It forms the theoretical basis for constraint-based causal discovery algorithms.
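The blocking rules above can be checked mechanically. The minimal Python sketch below does not enumerate paths; it uses the equivalent moralization criterion, under which X and Y are d-separated by Z exactly when they are disconnected in the moral graph of the ancestral set of X ∪ Y ∪ Z after the nodes in Z are deleted. The graph encoding (a dictionary from each node to its set of parents) and the function names are illustrative assumptions, not part of the method described in this paper.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Set[Hashable]]  # node -> set of its parents


def ancestral_set(parents: Graph, nodes: Iterable[Hashable]) -> Set[Hashable]:
    """Return the given nodes together with all of their ancestors."""
    result = set(nodes)
    stack = list(result)
    while stack:
        v = stack.pop()
        for p in parents.get(v, set()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def d_separated(parents: Graph, xs, ys, zs) -> bool:
    """Check X _||_ Y | Z via the moralized ancestral graph criterion."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    anc = ancestral_set(parents, xs | ys | zs)
    # Undirected moral graph: link each node to its parents and "marry" co-parents.
    adj = {v: set() for v in anc}
    for v in anc:
        pas = parents.get(v, set())  # parents of an ancestor are ancestors too
        for p in pas:
            adj[v].add(p)
            adj[p].add(v)
        for a, b in combinations(pas, 2):
            adj[a].add(b)
            adj[b].add(a)
    # Delete Z and search for any undirected path from X to Y.
    seen = set(xs)
    stack = list(xs)
    while stack:
        v = stack.pop()
        if v in ys:
            return False
        for w in adj[v]:
            if w not in zs and w not in seen:
                seen.add(w)
                stack.append(w)
    return True
```

Nodes can be any hashable labels; in a window graph, tuples such as `(i, tau)` standing for $X_{i,t-\tau}$ work directly. For a chain $A \to B \to C$, conditioning on B separates A and C, while for a collider $A \to B \leftarrow C$ conditioning on B (or a descendant of B) connects them.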

A Completed Partially Directed Acyclic Graph (CPDAG) [18] represents a Markov equivalence class of DAGs by displaying directed edges where causal direction is consistent across all equivalent graphs, and undirected edges where direction cannot be uniquely determined from observational data alone.

2.3. The PCMCI Algorithm

PCMCI is a constraint-based causal discovery algorithm specifically designed for high-dimensional time series data. It extends the PC algorithm by addressing temporal dependencies and controlling for autocorrelation. The PCMCI algorithm consists of two main steps: (1) PC₁ step: a preliminary skeleton is constructed by iteratively removing links based on a set of conditional independence tests, using a gradually increasing conditioning set. (2) MCI step: the resulting links are then subjected to a stricter conditional independence test, called the Momentary Conditional Independence (MCI) test, which conditions on the parents of both the target and the source variables to better control false positives.

The Momentary Conditional Independence (MCI) test is designed to evaluate the conditional independence between two time series variables $X_{i,t-\tau}$ and $X_{j,t}$ while accounting for temporal dependencies. Specifically, it tests

$$X_{i,t-\tau} \perp\!\!\!\perp X_{j,t} \mid P(X_{i,t-\tau}) \cup P(X_{j,t}),$$

where $P(\cdot)$ denotes the estimated parent set of a variable. This choice of conditioning set, which includes the parents of both the source and the target, helps to mitigate the effects of autocorrelation and confounding from nearby variables.
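The MCI test needs a concrete conditional independence test underneath. For linear Gaussian data a common choice is partial correlation; Tigramite, the Python package accompanying PCMCI, provides one as its ParCorr test. The sketch below is an independent, simplified version written under our own assumptions: it regresses both variables on the conditioning set by ordinary least squares, correlates the residuals, and converts the result into a two-sided p-value with the Fisher z-transform and a normal approximation. The function name `parcorr_test` is hypothetical.

```python
import math
from typing import Optional, Tuple

import numpy as np


def parcorr_test(x: np.ndarray, y: np.ndarray, z: Optional[np.ndarray] = None) -> Tuple[float, float]:
    """Partial-correlation test of x _||_ y | z (illustrative sketch).

    x, y: 1-D arrays of length n; z: (n,) or (n, k) array of conditioning
    variables, or None for an unconditional test.
    Returns (p_value, statistic), the statistic being the partial correlation.
    """
    n = len(x)
    design = np.ones((n, 1))  # intercept column
    if z is not None and np.size(z) > 0:
        design = np.column_stack([design, z])
    # Residualize x and y on the conditioning set by least squares.
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    r = float(np.dot(rx, ry) / math.sqrt(np.dot(rx, rx) * np.dot(ry, ry)))
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite
    k = design.shape[1] - 1  # number of conditioning variables
    z_score = math.atanh(r) * math.sqrt(n - k - 3)  # Fisher z-transform
    p_value = math.erfc(abs(z_score) / math.sqrt(2.0))  # two-sided normal p-value
    return p_value, r
```

An MCI-style call would pass the columns of the estimated source and target parents as `z`; a large p-value (above the significance level) is then read as evidence of conditional independence.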

3. Method

This section describes the proposed causal discovery algorithm for time series in four parts. Section 3.1 presents the intuition behind aMCI. Section 3.2 introduces the Adaptive Momentary Conditional Independence (aMCI) method, which improves detection power by refining the conditioning sets used in tests. Section 3.3 presents a multi-phase algorithm that effectively leverages the aMCI method. Finally, theoretical properties of the algorithm are provided in Section 3.4.

3.1. Intuitive Insights of aMCI

For an intuitive understanding of aMCI, consider a bivariate time series $\{X_t\}_{t \in \mathbb{N}_+}$, where $X_t = (X_{1,t}, X_{2,t})'$. This time series is generated according to the following structural causal model:

$$\begin{aligned} X_{1,t} &= a X_{1,t-1} + \varepsilon_{1,t}, \\ X_{2,t} &= a X_{2,t-1} + c X_{1,t-1} + \varepsilon_{2,t}, \end{aligned} \tag{1}$$

where $a$ represents the strength of autocorrelation, $c$ represents the strength of the causal link $X_{1,t-1} \to X_{2,t}$, and $\varepsilon_{1,t}, \varepsilon_{2,t}$ are independent noise terms. The corresponding full causal graph is depicted in Figure 1A.

According to the MCI test defined in Section 2.3, determining whether the link $X_{1,t-1} \to X_{2,t}$ exists involves testing whether $X_{1,t-1} \perp\!\!\!\perp X_{2,t} \mid B^{-}(X_{1,t-1}) \cup B^{-}(X_{2,t})$ holds, where $B^{-}(X_{1,t-1}) = \{X_{1,t-2}\}$ and $B^{-}(X_{2,t}) = \{X_{2,t-1}\}$ in model (1). From the d-separation perspective, conditioning on $X_{1,t-2}$ does not change whether $X_{1,t-1}$ and $X_{2,t}$ are conditionally dependent. Nevertheless, simulations reveal that replacing $B^{-}(X_{1,t-1})$ with $B^{-}_{ad}(X_{1,t-1}) = \{X_{1,t-3}\}$ (i.e., substituting $X_{1,t-2}$ with $X_{1,t-3}$) significantly improves the detection power for the link $X_{1,t-1} \to X_{2,t}$.

This phenomenon can be explained through Equations (2) and (3), both of which follow directly from model (1). They differ only in that $X_{1,t-2}$ in Equation (2) is expanded as $a X_{1,t-3} + \varepsilon_{1,t-2}$ in Equation (3):

$$\begin{aligned} X_{1,t-1} &= \underbrace{a X_{1,t-2}}_{\text{conditioning}} + \underbrace{\varepsilon_{1,t-1}}_{\text{randomness}}, \\ X_{2,t} &= \underbrace{a X_{2,t-1} + c a X_{1,t-2}}_{\text{conditioning}} + \underbrace{c \varepsilon_{1,t-1} + \varepsilon_{2,t}}_{\text{randomness}}, \end{aligned} \tag{2}$$

$$\begin{aligned} X_{1,t-1} &= \underbrace{a^{2} X_{1,t-3}}_{\text{conditioning}} + \underbrace{a \varepsilon_{1,t-2} + \varepsilon_{1,t-1}}_{\text{randomness}}, \\ X_{2,t} &= \underbrace{a X_{2,t-1} + c a^{2} X_{1,t-3}}_{\text{conditioning}} + \underbrace{c a \varepsilon_{1,t-2} + c \varepsilon_{1,t-1} + \varepsilon_{2,t}}_{\text{randomness}}. \end{aligned} \tag{3}$$

In Equation (2), the randomness of $X_{1,t-1} \mid X_{1,t-2}, X_{2,t-1}$ depends on the error $\varepsilon_{1,t-1}$, while the randomness of $X_{2,t} \mid X_{1,t-2}, X_{2,t-1}$ depends on $c \varepsilon_{1,t-1} + \varepsilon_{2,t}$. Thus, the conditional independence test $X_{1,t-1} \perp\!\!\!\perp X_{2,t} \mid X_{1,t-2}, X_{2,t-1}$ relies on detecting the correlation between $\varepsilon_{1,t-1}$ and $c \varepsilon_{1,t-1} + \varepsilon_{2,t}$. In contrast, in Equation (3), the randomness of $X_{1,t-1} \mid X_{1,t-3}, X_{2,t-1}$ depends on $a \varepsilon_{1,t-2} + \varepsilon_{1,t-1}$, while the randomness of $X_{2,t} \mid X_{1,t-3}, X_{2,t-1}$ depends on $c a \varepsilon_{1,t-2} + c \varepsilon_{1,t-1} + \varepsilon_{2,t}$. Therefore, the conditional independence test $X_{1,t-1} \perp\!\!\!\perp X_{2,t} \mid X_{1,t-3}, X_{2,t-1}$ relies on detecting the correlation between $a \varepsilon_{1,t-2} + \varepsilon_{1,t-1}$ and $c a \varepsilon_{1,t-2} + c \varepsilon_{1,t-1} + \varepsilon_{2,t}$. Notably, in both equations the randomness of $X_{2,t}$ equals $c$ times the randomness of $X_{1,t-1}$ plus $\varepsilon_{2,t}$; in Equation (3), however, this shared component contains both $\varepsilon_{1,t-2}$ and $\varepsilon_{1,t-1}$ and therefore carries more variance than the single shared term $\varepsilon_{1,t-1}$ in Equation (2) (namely $(1+a^{2})\sigma_1^{2}$ instead of $\sigma_1^{2}$ when $\varepsilon_{1,t}$ has constant variance $\sigma_1^{2}$). A larger shared component relative to the independent noise $\varepsilon_{2,t}$ produces a stronger correlation, thereby enhancing the statistical power for detecting the causal link. This analytical process is visually illustrated in Figure 1C,D.
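The argument can be made quantitative. Because $X_{2,t-1}$ belongs to both conditioning sets, the residual of $X_{2,t}$ is exactly $c$ times the residual of $X_{1,t-1}$ plus $\varepsilon_{2,t}$; with unit noise variances the partial correlation is therefore $cs/\sqrt{1 + c^{2}s^{2}}$, where $s^{2}$ is the residual variance of $X_{1,t-1}$ given the conditioning set. This equals $c/\sqrt{1+c^{2}}$ for the standard set ($s^{2} = 1$) and is larger for the adaptive set, where $s^{2}$ is somewhat below $1 + a^{2}$ because $X_{2,t-1}$ absorbs a small part of $\varepsilon_{1,t-2}$. The Python sketch below checks this by simulating model (1); the values $a = 0.9$ and $c = 0.3$, the unit-variance Gaussian noise, and the helper names are illustrative assumptions, not the settings of the paper's experiments.

```python
import numpy as np


def simulate_model1(n, a, c, seed=0, burn_in=500):
    """Simulate model (1): X1_t = a X1_{t-1} + e1_t, X2_t = a X2_{t-1} + c X1_{t-1} + e2_t."""
    rng = np.random.default_rng(seed)
    total = n + burn_in
    e = rng.standard_normal((total, 2))  # unit-variance Gaussian noise (assumption)
    x = np.zeros((total, 2))
    for t in range(1, total):
        x[t, 0] = a * x[t - 1, 0] + e[t, 0]
        x[t, 1] = a * x[t - 1, 1] + c * x[t - 1, 0] + e[t, 1]
    return x[burn_in:]  # drop the transient from the zero initial state


def partial_corr(x, y, z):
    """Correlation of the OLS residuals of x and y on z (with intercept)."""
    design = np.column_stack([np.ones(len(x)), z])
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])


a, c = 0.9, 0.3  # illustrative: strong autocorrelation, weak causal effect
x = simulate_model1(20000, a, c)
t = np.arange(3, len(x))  # time indices with three lags available
x1_lag1, x2_now, x2_lag1 = x[t - 1, 0], x[t, 1], x[t - 1, 1]
# Standard set {X1_{t-2}, X2_{t-1}} versus adaptive set {X1_{t-3}, X2_{t-1}}.
r_std = partial_corr(x1_lag1, x2_now, np.column_stack([x[t - 2, 0], x2_lag1]))
r_ad = partial_corr(x1_lag1, x2_now, np.column_stack([x[t - 3, 0], x2_lag1]))
print(f"standard: {r_std:.3f}  adaptive: {r_ad:.3f}  theory (standard): {c / np.sqrt(1 + c**2):.3f}")
```

At a fixed sample size, a larger partial correlation translates into a smaller p-value in a partial-correlation test, which is the power gain described above.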

However, this substitution is not always beneficial. In Figure 1B, $X_{1,t-2}$ acts as a confounder for both $X_{1,t-1}$ and $X_{2,t}$. Failing to condition on this confounder could lead to a false positive detection of the link $X_{1,t-1} \to X_{2,t}$. This highlights the need for an adaptive method that adjusts the conditioning set based on the specific causal structure, conditioning on $X_{1,t-2}$ when it is a confounder (Figure 1B) but on an alternative variable (e.g., $X_{1,t-3}$) when it is not (Figure 1A).

3.2. Adaptive Momentary Conditional Independence

As established in Section 3.1, existing tests such as MCI, while effective in controlling false positives, may exhibit reduced detection power when applied to time series with strong autocorrelation. The core issue is that conditioning on a highly correlated immediate predecessor of the source variable (e.g., conditioning on $X_{i,t-2}$ when testing $X_{i,t-1} \perp\!\!\!\perp X_{j,t} \mid X_{i,t-2}$) can obscure the signal of the direct link, even though it does not block the path according to d-separation. To address this challenge, this paper proposes the aMCI method, which dynamically modifies the conditioning set based on the temporal structure of the variables it contains. The key innovation of aMCI lies in its strategic handling of immediate predecessors: it replaces them with earlier variables when they are likely to obscure causal signals rather than control for confounding.

Using the time series data $X$ and a chosen conditional independence test, the operation of aMCI can be formally represented as the mapping

$$\mathrm{aMCI}: \big(X_{i,t-\tau}, X_{j,t}, S, \hat{B}^{-}(X_{i,t-\tau}), \hat{B}^{-}(X_{j,t})\big) \mapsto (p\text{-value}, I, S_{ad}),$$

where $X_{i,t-\tau}$ and $X_{j,t}$ are the specific variables under investigation for conditional independence (representing a potential cause and effect, respectively, with lag $\tau \geq 0$); $S$ is the initial set of conditioning variables provided to aMCI; $\hat{B}^{-}(\cdot)$ represents the estimated set of lagged parents of a given variable; $p$-value is the p-value of the resulting test and $I$ is the associated test statistic value; and $S_{ad}$ is the final conditioning set, determined adaptively by aMCI from $S$. The aMCI handles different scenarios based on the temporal relationship between $X_{i,t-\tau}$ and $X_{j,t}$.

Case 1 (Lagged Links, $\tau > 0$): The aMCI checks whether $X_{i,t-\tau-k}$ appears in both parent sets $\hat{B}^{-}(X_{i,t-\tau})$ and $\hat{B}^{-}(X_{j,t})$, where $X_{i,t-\tau-k} \in \hat{B}^{-}(X_{i,t-\tau})$ and $k \in \{1, \dots, \tau_{\max}\}$. If so, the original conditioning set is maintained. Otherwise, $X_{i,t-\tau-k}$ is substituted with $X_{i,t-\tau-k-1}$ in the conditioning set to avoid obscuring causal signals.
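The Case 1 rule can be sketched in Python as follows. The sketch mirrors lines 2 to 8 of Algorithm 1, which act on the immediate predecessor ($k = 1$). Encoding $X_{v,t-\ell}$ as the tuple `(v, ell)` with lags measured from the target time $t$, and excluding the tested source from its own conditioning set (as in the example of Section 3.1), are our assumptions; the function name is hypothetical.

```python
from typing import Set, Tuple

Node = Tuple[int, int]  # (variable index v, lag ell) encodes X_{v, t - ell}


def case1_conditioning_set(source: Node, s: Set[Node], b_source: Set[Node], b_target: Set[Node]) -> Set[Node]:
    """Adaptive conditioning set for a lagged link X_{i,t-tau} -> X_{j,t} (tau > 0).

    b_source and b_target are the estimated lagged parents of X_{i,t-tau} and
    X_{j,t}, both expressed relative to the target time t.
    """
    i, tau = source
    if tau <= 0:
        raise ValueError("Case 1 only covers lagged links (tau > 0)")
    predecessor = (i, tau + 1)  # X_{i, t - tau - 1}
    earlier = (i, tau + 2)      # X_{i, t - tau - 2}
    b_source_ad = set(b_source)
    # Keep the predecessor only if it is a confounder (a parent of source and target);
    # otherwise substitute it with the next earlier lag.
    if predecessor in b_source_ad and predecessor not in b_target:
        b_source_ad.discard(predecessor)
        b_source_ad.add(earlier)
    # The tested source itself is never part of its own conditioning set.
    return (set(s) | b_source_ad | set(b_target)) - {source}
```

For model (1) with source `(1, 1)`, source parents `{(1, 2)}` and target parents `{(2, 1), (1, 1)}`, the sketch returns `{(1, 3), (2, 1)}`, i.e. the adaptive set $\{X_{1,t-3}, X_{2,t-1}\}$ of Section 3.1; if `(1, 2)` were also a parent of the target (the confounded case of Figure 1B), the standard set would be kept.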

A signal-to-noise ratio (SNR)-based explanation is provided for why testing $X_{i,t-\tau} \perp\!\!\!\perp X_{j,t} \mid X_{i,t-\tau-k-1}$ can be preferable to testing $X_{i,t-\tau} \perp\!\!\!\perp X_{j,t} \mid X_{i,t-\tau-k}$ when $X_{i,t-\tau-k}$ is not a confounder and the true causal structure is

$$X_{i,t-\tau-k} \to X_{i,t-\tau} \to X_{j,t}.$$

Let the information contained in $X_{j,t}$ be denoted as $I(X_{j,t})$, and the information flow from $X_{i,t-\tau}$ to $X_{j,t}$ be denoted as $I(X_{i,t-\tau} \to X_{j,t})$. In testing for independence between $X_{i,t-\tau}$ and $X_{j,t}$, $I(X_{i,t-\tau} \to X_{j,t})$ represents the useful signal, while the remaining information in $I(X_{j,t})$ is treated as noise. A stronger signal implies a more detectable causal effect and reduces the risk of false negatives.

When conditioning on $X_{i,t-\tau-k}$, the signal in $X_{j,t}$ is correspondingly reduced and becomes $I(X_{i,t-\tau} \to X_{j,t}) - I(X_{i,t-\tau-k} \to X_{i,t-\tau} \to X_{j,t})$. The corresponding SNR is

$$\mathrm{SNR}_{X_{j,t}} = \frac{I(X_{i,t-\tau} \to X_{j,t}) - I(X_{i,t-\tau-k} \to X_{i,t-\tau} \to X_{j,t})}{I(X_{j,t}) - \big[I(X_{i,t-\tau} \to X_{j,t}) - I(X_{i,t-\tau-k} \to X_{i,t-\tau} \to X_{j,t})\big]}.$$

Since increasing $I(X_{i,t-\tau-k} \to X_{i,t-\tau} \to X_{j,t})$ shrinks the numerator and enlarges the denominator, $\mathrm{SNR}_{X_{j,t}}$ is a decreasing function of this quantity. When replacing $X_{i,t-\tau-k}$ with $X_{i,t-\tau-k-1}$, we have $I(X_{i,t-\tau-k-1} \to X_{i,t-\tau} \to X_{j,t}) \leq I(X_{i,t-\tau-k} \to X_{i,t-\tau} \to X_{j,t})$, assuming that $X_{i,t-\tau-k-1}$ transmits less information through the causal chain (Assumption 4 in Section 3.4). This reduction therefore improves $\mathrm{SNR}_{X_{j,t}}$. As a result, the conditioning set used in aMCI is more likely to lead to correct decisions and fewer false negatives.
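The monotonicity step can also be verified by differentiation. Writing $u = I(X_{i,t-\tau} \to X_{j,t})$, $v = I(X_{i,t-\tau-k} \to X_{i,t-\tau} \to X_{j,t})$, and $I_{\mathrm{tot}} = I(X_{j,t})$ as shorthand (our notation), and assuming the denominator is positive:

```latex
\mathrm{SNR}(v) = \frac{u - v}{I_{\mathrm{tot}} - u + v},
\qquad
\frac{\partial\,\mathrm{SNR}}{\partial v}
  = \frac{-(I_{\mathrm{tot}} - u + v) - (u - v)}{(I_{\mathrm{tot}} - u + v)^{2}}
  = -\frac{I_{\mathrm{tot}}}{(I_{\mathrm{tot}} - u + v)^{2}} < 0 .
```

Hence any reduction of $v$, such as moving the conditioning variable one lag further back, increases the SNR whenever $I_{\mathrm{tot}} > 0$.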

Case 2 (Contemporaneous Links, $\tau = 0$): The aMCI employs a more nuanced strategy depending on whether $X_{i,t-k}$ and $X_{j,t-k}$ are confounders of $(X_{i,t}, X_{j,t})$, where $X_{i,t-k} \in \hat{B}^{-}(X_{i,t})$ or $X_{j,t-k} \in \hat{B}^{-}(X_{j,t})$, $k \in \{1, \dots, \tau_{\max}\}$. (1) If both are confounders, the standard conditioning set is maintained. (2) If only one is a confounder, the non-confounder immediate predecessor is replaced with its earlier lag. (3) If neither is a confounder, three possible conditioning sets are evaluated (substituting either immediate predecessor, or using the standard set), and the one yielding the smallest p-value is selected.

It is important to observe that the handling of lagged links follows directly from the intuition provided in Section 3.1, whereas the handling of contemporaneous links requires a more sophisticated approach. This complexity arises because aMCI adjusts the conditioning set of the source variable, yet for a contemporaneous pair $(X_{i,t}, X_{j,t})$ temporal precedence cannot be used to determine which variable serves as the source. Consequently, Case 2 distinguishes three scenarios based on the principle that confounders must be retained in the conditioning set. Specifically, in Case 2 (3), by considering the test results from three different conditioning sets and selecting the one that yields the smallest p-value, the method enables the detection of links that would otherwise be masked by autocorrelation. The pseudocode of aMCI is provided in Algorithm 1. To facilitate a better understanding of the aMCI method, Figure 2 and Figure 3 illustrate how aMCI changes the conditioning set for a particular pair of variables.
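Case 2 can be sketched in the same style. Following lines 11 to 33 of Algorithm 1, the sketch builds one candidate conditioning set when at least one immediate predecessor is a confounder and three candidates otherwise, then keeps the result with the smallest p-value. The tuple encoding, the function names, and the reading of "confounder" as a lagged node that is an estimated parent of both $X_{i,t}$ and $X_{j,t}$ are assumptions made for illustration; `ci_test` stands for any test returning a p-value and a statistic.

```python
from typing import Callable, Set, Tuple

Node = Tuple[int, int]  # (variable index v, lag ell) encodes X_{v, t - ell}
CITest = Callable[[Node, Node, Set[Node]], Tuple[float, float]]  # -> (p_value, statistic)


def substitute_predecessor(parents: Set[Node], v: int) -> Set[Node]:
    """Replace X_{v,t-1} by X_{v,t-2} in a lagged parent set, if present."""
    out = set(parents)
    if (v, 1) in out:
        out.discard((v, 1))
        out.add((v, 2))
    return out


def amci_contemporaneous(i: int, j: int, s: Set[Node], b_i: Set[Node], b_j: Set[Node], ci_test: CITest):
    """Case 2 of aMCI (tau = 0); returns (p_value, statistic, S_ad)."""
    # Assumed definition: a lagged node confounds (X_{i,t}, X_{j,t}) if it is
    # an estimated parent of both.
    conf_i = (i, 1) in b_i and (i, 1) in b_j
    conf_j = (j, 1) in b_i and (j, 1) in b_j
    if conf_i and conf_j:  # (1) keep the standard set
        candidates = [s | b_i | b_j]
    elif conf_i:           # (2) only X_{i,t-1} confounds: adapt X_{j,t-1}
        candidates = [s | b_i | substitute_predecessor(b_j, j)]
    elif conf_j:           # (2) only X_{j,t-1} confounds: adapt X_{i,t-1}
        candidates = [s | substitute_predecessor(b_i, i) | b_j]
    else:                  # (3) evaluate all three sets
        candidates = [
            s | substitute_predecessor(b_i, i) | b_j,
            s | b_i | substitute_predecessor(b_j, j),
            s | b_i | b_j,
        ]
    results = [(*ci_test((i, 0), (j, 0), cond), cond) for cond in candidates]
    return min(results, key=lambda r: r[0])  # smallest p-value wins
```

Selecting the minimum over three tests is what lets Case 2 (3) recover a link that only one of the candidate sets reveals; the price is that three tests are run for such pairs instead of one.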

The primary challenge in implementing such an adaptive method arises because the true causal structure, particularly the role of immediate predecessors, is typically unknown and must be inferred. Conventional constraint-based discovery algorithms learn the structure iteratively, typically beginning with a fully connected graph structure. It is difficult to apply such adaptive logic effectively in the early stages when the necessary structural information is still uncertain. This challenge motivates the multi-phase algorithm detailed in Section 3.3.

Algorithm 1 The aMCI method

Require: Data $X$, a d-dimensional time series of length T; conditioning variable set $S$; conditional independence test $C I (X_{i, t - τ}, X_{j, t}, S)$, which returns a p-value and test statistic value I; estimated lagged parent sets ${\hat{B}}^{-} (X_{i, t - τ})$ and ${\hat{B}}^{-} (X_{j, t})$; significance level $α$

1:: if $τ > 0$ then
2:: if $X_{i, t - τ - 1} \in {\hat{B}}^{-} (X_{i, t - τ}) \cap {\hat{B}}^{-} (X_{j, t})$ then
3:: $(p - value, I)$ ← $C I (X_{i, t - τ}, X_{j, t}, S \cup {\hat{B}}^{-} (X_{i, t - τ}) \cup {\hat{B}}^{-} (X_{j, t}))$
4:: $S_{a d} \leftarrow S \cup {\hat{B}}^{-} (X_{i, t - τ}) \cup {\hat{B}}^{-} (X_{j, t})$
5:: else
6:: ${\hat{B}}_{a d}^{-} (X_{i, t - τ}) \leftarrow$ Substitute $X_{i, t - τ - 1}$ in ${\hat{B}}^{-} (X_{i, t - τ})$ with $X_{i, t - τ - 2}$
7:: $(p - value, I) \leftarrow C I (X_{i, t - τ}, X_{j, t}, S \cup {\hat{B}}_{a d}^{-} (X_{i, t - τ}) \cup {\hat{B}}^{-} (X_{j, t}))$
8:: $S_{a d} \leftarrow S \cup {\hat{B}}_{a d}^{-} (X_{i, t - τ}) \cup {\hat{B}}^{-} (X_{j, t})$
9:: end if
10:: else if $τ = 0$ then
11:: if both $X_{i, t - 1}$ and $X_{j, t - 1}$ are confounders of $(X_{i, t}, X_{j, t})$ then
12:: $(p - value, I)$ ← $C I (X_{i, t}, X_{j, t}, S \cup {\hat{B}}^{-} (X_{i, t}) \cup {\hat{B}}^{-} (X_{j, t}))$
13:: $S_{a d} \leftarrow S \cup {\hat{B}}^{-} (X_{i, t}) \cup {\hat{B}}^{-} (X_{j, t})$
14:: else if $X_{i, t - 1}$ or $X_{j, t - 1}$ is a confounder of $(X_{i, t}, X_{j, t})$ then
15:: if $X_{i, t - 1}$ is a confounder of $(X_{i, t}, X_{j, t})$ then
16:: ${\hat{B}}_{a d}^{-} (X_{j, t}) \leftarrow$ Substitute $X_{j, t - 1}$ in ${\hat{B}}^{-} (X_{j, t})$ with $X_{j, t - 2}$
17:: $(p - value, I)$ ← $C I (X_{i, t}, X_{j, t}, S \cup {\hat{B}}^{-} (X_{i, t}) \cup {\hat{B}}_{a d}^{-} (X_{j, t}))$
18:: $S_{a d} \leftarrow S \cup {\hat{B}}^{-} (X_{i, t}) \cup {\hat{B}}_{a d}^{-} (X_{j, t})$
19:: else if $X_{j, t - 1}$ is a confounder of $(X_{i, t}, X_{j, t})$ then
20:: ${\hat{B}}_{a d}^{-} (X_{i, t}) \leftarrow$ Substitute $X_{i, t - 1}$ in ${\hat{B}}^{-} (X_{i, t})$ with $X_{i, t - 2}$
21:: $(p - value, I)$ ← $C I (X_{i, t}, X_{j, t}, S \cup {\hat{B}}_{a d}^{-} (X_{i, t}) \cup {\hat{B}}^{-} (X_{j, t}))$
22:: $S_{a d} \leftarrow S \cup {\hat{B}}_{a d}^{-} (X_{i, t}) \cup {\hat{B}}^{-} (X_{j, t})$
23:: end if
24:: else if neither $X_{i, t - 1}$ nor $X_{j, t - 1}$ is a confounder of $(X_{i, t}, X_{j, t})$ then
25:: ${\hat{B}}_{a d}^{-} (X_{i, t}) \leftarrow$ Substitute $X_{i, t - 1}$ in ${\hat{B}}^{-} (X_{i, t})$ with $X_{i, t - 2}$
26:: $(p {- value}_{1}, I_{1})$ ← $C I (X_{i, t}, X_{j, t}, S \cup {\hat{B}}_{a d}^{-} (X_{i, t}) \cup {\hat{B}}^{-} (X_{j, t}))$
27:: $S_{a d, 1} \leftarrow S \cup {\hat{B}}_{a d}^{-} (X_{i, t}) \cup {\hat{B}}^{-} (X_{j, t})$
28:: ${\hat{B}}_{a d}^{-} (X_{j, t}) \leftarrow$ Substitute $X_{j, t - 1}$ in ${\hat{B}}^{-} (X_{j, t})$ with $X_{j, t - 2}$
29:: $(p {- value}_{2}, I_{2})$ ← $C I (X_{i, t}, X_{j, t}, S \cup {\hat{B}}^{-} (X_{i, t}) \cup {\hat{B}}_{a d}^{-} (X_{j, t}))$
30:: $S_{a d, 2} \leftarrow S \cup {\hat{B}}^{-} (X_{i, t}) \cup {\hat{B}}_{a d}^{-} (X_{j, t})$
31:: $(p {- value}_{3}, I_{3})$ ← $C I (X_{i, t}, X_{j, t}, S \cup {\hat{B}}^{-} (X_{i, t}) \cup {\hat{B}}^{-} (X_{j, t}))$
32:: $S_{a d, 3} \leftarrow S \cup {\hat{B}}^{-} (X_{i, t}) \cup {\hat{B}}^{-} (X_{j, t})$
33:: $(p - value, I, S_{a d}) \leftarrow (p {- value}_{k^{*}}, I_{k^{*}}, S_{a d, k^{*}})$, where $k^{*} = \arg\min_{k \in \{1, 2, 3\}} p {- value}_{k}$
34:: end if
35:: end if
36:: return $(p - value, I, S_{a d})$ for arbitrary $(X_{i, t - τ}, X_{j, t}, S)$

3.3. Enhanced Causal Discovery Algorithm

Traditional causal discovery algorithms typically start from a fully connected graph and progressively remove edges through conditional independence tests. However, this procedure limits the effectiveness of aMCI in the initial stages: in a fully connected graph, every immediate predecessor is adjacent to both variables of a tested pair and must therefore be treated as a confounder. To fully leverage the capabilities of aMCI, this paper proposes the ECD-aMCI algorithm, which operates in three sequential phases.

The core innovation of the ECD-aMCI algorithm lies in its progressive refinement approach, which enables more accurate estimation of causal structures in time series data with autocorrelation. The algorithm consists of three phases: (1) initial estimation of lagged parent sets, (2) refinement of these estimated sets using aMCI, and (3) discovery of the complete causal structure including both lagged and contemporaneous links.

Phase 1: PC₁-Based Initial Estimation. In the first phase, a simplified PC algorithm is employed to obtain initial estimates of the lagged parent sets. This phase considers only lagged relationships without accounting for contemporaneous effects, which may lead to false positives due to indirect causal effects mediated by contemporaneous variables. Starting with a fully connected lagged structure (up to maximum lag $\tau_{\max}$), the algorithm iteratively tests the conditional independence between each lagged variable $X_{i,t-\tau}$ and the target variable $X_{j,t}$, given progressively larger conditioning sets $S$. The minimum test statistic value of each link is stored, and links found to be conditionally independent given any tested conditioning set are removed. The procedure stops once conditioning sets of size one have been tested, or earlier if too few candidate parents remain to form a conditioning set of the next size. The result is an initial estimate of the lagged parent set $\hat{B}^{-}(X_{j,t})$ for each variable $X_{j,t}$. The details of Phase 1 are presented in Algorithm 2.
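For readers who want to trace the bookkeeping of Algorithm 2, a compact Python sketch follows. It abstracts the conditional independence test as a callable `ci_test(x, j, S)` returning `(p_value, statistic)`, uses 0-based variable indices, and assumes the statistic is non-negative (for example, an absolute partial correlation) so that sorting by its minimum ranks candidates by strength. These choices and all names are illustrative.

```python
from typing import Callable, Dict, List, Tuple

Node = Tuple[int, int]  # (variable index i, lag tau) encodes X_{i, t - tau}
CITest = Callable[[Node, int, List[Node]], Tuple[float, float]]


def phase1_pc1(d: int, tau_max: int, alpha: float, ci_test: CITest) -> Dict[int, List[Node]]:
    """Sketch of Algorithm 2: PC1-style initial estimate of lagged parents.

    ci_test(x, j, S) tests X_x _||_ X_{j,t} | S and returns (p_value, statistic).
    """
    parents: Dict[int, List[Node]] = {}
    for j in range(d):
        # Fully connected lagged structure up to tau_max.
        cand = [(i, tau) for tau in range(1, tau_max + 1) for i in range(d)]
        stat_min = {x: float("inf") for x in cand}
        p = 0
        # |B \ {x}| = len(cand) - 1 for every candidate x, so one length check suffices.
        while p <= 1 and len(cand) - 1 >= p:
            removed = set()
            for x in cand:
                cond = [y for y in cand if y != x][:p]  # first p other candidates
                p_value, stat = ci_test(x, j, cond)
                stat_min[x] = min(stat, stat_min[x])
                if p_value > alpha:
                    removed.add(x)  # mark now, remove after the sweep
            cand = [x for x in cand if x not in removed]
            # Strongest candidates (largest minimum statistic) first.
            cand.sort(key=lambda x: stat_min[x], reverse=True)
            p += 1
        parents[j] = cand
    return parents
```

With a deterministic oracle that reports dependence exactly for the true parents, the sketch returns the true lagged parents, which is a convenient sanity check before plugging in a statistical test.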

Phase 2: Lagged Parent Sets Refinement via aMCI. Since the initial estimates may include false positives due to contemporaneous mediators, the second phase refines these estimates using aMCI. Starting with the initial estimates from Phase 1, the algorithm tests the conditional independence of adjacent pairs $(X_{i,t-\tau}, X_{j,t})$ for $\tau > 0$ using aMCI. The conditioning sets $S$ are chosen from the contemporaneous adjacency set $\hat{A}(X_{j,t})$, and the algorithm progressively increases the size of these conditioning sets up to one. After each iteration, the lagged parent sets are updated based on the current graph structure, which enables more accurate aMCI tests in subsequent iterations. The details of Phase 2 are presented in Algorithm 3.
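The Python sketch below mirrors the loop structure of Algorithm 3 under simplifying assumptions. Two interpretive choices are ours: lagged links are initialized from the Phase 1 parent sets (following the prose above, whereas line 1 of Algorithm 3 as printed initializes $\hat{G}$ fully connected), and testing of a link stops at the first separating set found. `amci` is any callable with the input and output signature of the aMCI mapping; all names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, Set, Tuple

Node = Tuple[int, int]  # (variable index, lag) encodes X_{var, t - lag}
AMCI = Callable[[Node, int, Set[Node], Set[Node], Set[Node]], Tuple[float, float, Set[Node]]]


def shift(parents: Set[Node], tau: int) -> Set[Node]:
    """Express the lagged parents of X_{i,t-tau} relative to time t."""
    return {(k, lag + tau) for k, lag in parents}


def phase2_refine(d: int, alpha: float, parents: Dict[int, Set[Node]], amci: AMCI):
    """Sketch of Algorithm 3: refine lagged parent sets with aMCI.

    parents[j] holds the Phase 1 estimate of the lagged parents of X_{j,t}.
    amci(x, j, S, B(x), B(X_{j,t})) returns (p_value, statistic, S_ad).
    """
    # Lagged links (i, tau, j), initialized from the Phase 1 estimates (an interpretation).
    links = {(i, tau, j) for j in range(d) for (i, tau) in parents[j]}
    # Contemporaneous adjacency sets are not pruned in Phase 2.
    adj = {j: {(k, 0) for k in range(d) if k != j} for j in range(d)}
    sepsets: Dict[Tuple[int, int, int], Set[Node]] = {}
    current = {j: set(parents[j]) for j in range(d)}
    p = 0
    # For lagged pairs only |A(X_{j,t})| matters; it has the same size d - 1 for every j.
    while p <= 1 and any(len(adj[j]) >= p for (_, _, j) in links):
        for i, tau, j in sorted(links):  # iterate over a copy; links may shrink
            for s in combinations(sorted(adj[j]), p):
                p_value, _, s_ad = amci((i, tau), j, set(s), shift(current[i], tau), current[j])
                if p_value > alpha:
                    links.discard((i, tau, j))
                    sepsets[(i, tau, j)] = s_ad  # store separating set
                    break  # link removed; no need to test further subsets
        # Update lagged parent sets from the current graph (line 14 of Algorithm 3).
        current = {j: {(i, tau) for (i, tau, jj) in links if jj == j} for j in range(d)}
        p += 1
    return current, sepsets
```

Because the parent sets passed to `amci` are refreshed only between conditioning-set sizes, every test at a given size sees the same structural information.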

Phase 3: Complete Skeleton Discovery. The final phase builds on the refined lagged parent sets from Phase 2 to discover the complete causal skeleton, including both lagged and contemporaneous relationships. This phase ensures robust results regardless of the variable order by not immediately updating $\hat{B}^{-}(X_{j,t})$ and $\hat{A}(X_{j,t})$ based on aMCI test results. Unlike conventional constraint-based algorithms, this phase tests conditional independence for lagged links first and then for contemporaneous links, which minimizes the influence of not-yet-removed edges on the effectiveness of aMCI. For lagged relationships, the algorithm tests and potentially removes links $X_{i,t-\tau} \to X_{j,t}$. For contemporaneous relationships, it tests and potentially removes links $X_{i,t} \leftrightarrow X_{j,t}$. After each iteration, both the lagged parent sets and the contemporaneous adjacency sets are updated based on the current graph structure. The details of Phase 3 are presented in Algorithm 4.
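To illustrate why deferring updates makes Phase 3 order-independent, the sketch below runs a single conditioning-set size $p$: lagged links are tested before contemporaneous ones, but all tests read parent and adjacency sets frozen at the start of the level, and removals are applied only afterwards. This is a simplified reading of the Phase 3 description above rather than a transcription of Algorithm 4; in particular, conditioning sets are drawn only from $\hat{A}(X_{j,t})$, and `amci` is an arbitrary callable with the aMCI signature. All names are hypothetical.

```python
import random
from itertools import combinations
from typing import Callable, Dict, Set, Tuple

Node = Tuple[int, int]       # (variable index, lag) encodes X_{var, t - lag}
Link = Tuple[int, int, int]  # (i, tau, j): X_{i,t-tau} -> X_{j,t}; tau = 0 is contemporaneous
AMCI = Callable[[Node, int, Set[Node], Set[Node], Set[Node]], Tuple[float, float, Set[Node]]]


def snapshot(links: Set[Link], d: int):
    """Lagged parent sets and contemporaneous adjacency sets implied by `links`."""
    parents: Dict[int, Set[Node]] = {j: set() for j in range(d)}
    adj: Dict[int, Set[Node]] = {j: set() for j in range(d)}
    for i, tau, j in links:
        if tau > 0:
            parents[j].add((i, tau))
        else:
            adj[i].add((j, 0))
            adj[j].add((i, 0))
    return parents, adj


def phase3_level(links: Set[Link], d: int, p: int, alpha: float, amci: AMCI, order_seed: int = 0) -> Set[Link]:
    """Run one conditioning-set size p with deferred (order-independent) updates."""
    parents, adj = snapshot(links, d)  # frozen for the whole level
    lagged = [link for link in links if link[1] > 0]
    contemporaneous = [link for link in links if link[1] == 0]
    rng = random.Random(order_seed)    # arbitrary processing order within each group
    rng.shuffle(lagged)
    rng.shuffle(contemporaneous)
    removed: Set[Link] = set()
    for i, tau, j in lagged + contemporaneous:  # lagged links are tested first
        source = (i, tau)
        b_source = {(k, lag + tau) for k, lag in parents[i]}  # parents of X_{i,t-tau}, relative to t
        for s in combinations(sorted(adj[j] - {source}), p):
            p_value, _, _ = amci(source, j, set(s), b_source, parents[j])
            if p_value > alpha:
                removed.add((i, tau, j))
                break
    return links - removed  # updates are applied only after the whole level
```

Since every test in a level sees the same frozen sets, shuffling the processing order (via `order_seed`) cannot change which links are removed, even when a test's outcome depends on the current parent sets.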

Algorithm 2 Phase 1: PC₁-Based Initial Estimation of Lagged Parent Sets

Require: Time series dataset $X := \{X_{t} \mid t \in \{1, \dots, T\}\}$, maximum lag $τ_{\max}$, significance level $α$, conditional independence test $C I (X_{i, t - τ}, X_{j, t}, S)$, which returns a p-value and test statistic value I.

1:: for all j in ${1, \dots, d}$ do
2:: Initialize lagged parent set ${\hat{B}}^{-} (X_{j, t}) \leftarrow (X_{t - 1}, \dots, X_{t - τ_{\max}})$, where $X_{t - τ}$ collects all $d$ variables at lag $τ$, and minimum test statistic values $I_{m i n} (X_{i, t - τ}, X_{j, t}) \leftarrow \infty$ for all $X_{i, t - τ} \in {\hat{B}}^{-} (X_{j, t})$.
3:: Let $p \leftarrow 0$ .
4:: while $p \leq 1$ and any $X_{i, t - τ} \in {\hat{B}}^{-} (X_{j, t})$ satisfies $| {\hat{B}}^{-} (X_{j, t}) ∖ {X_{i, t - τ}} | \geq p$ do
5:: for all $X_{i, t - τ}$ in ${\hat{B}}^{-} (X_{j, t})$ satisfying $| {\hat{B}}^{-} (X_{j, t}) ∖ {X_{i, t - τ}} | \geq p$ do
6:: $S \leftarrow$ first $p$ variables in ${\hat{B}}^{-} (X_{j, t}) ∖ \{X_{i, t - τ}\}$ ▹ Select conditioning set of size p
7:: $(p - value, I) \leftarrow C I (X_{i, t - τ}, X_{j, t}, S)$
8:: $I_{m i n} (X_{i, t - τ}, X_{j, t}) = min (I, I_{m i n} (X_{i, t - τ}, X_{j, t}))$
9:: if $p - value > α$ then
10:: Mark $X_{i, t - τ}$ for removal from ${\hat{B}}^{-} (X_{j, t})$ .
11:: end if
12:: end for
13:: Remove non-significant entries from ${\hat{B}}^{-} (X_{j, t})$ and sort remaining entries in ${\hat{B}}^{-} (X_{j, t})$ by $I_{m i n} (X_{i, t - τ}, X_{j, t})$ from largest to smallest.
14:: Let $p \leftarrow p + 1$ .
15:: end while
16:: end for
17:: return ${\hat{B}}^{-} (X_{j, t})$ for all $j \in {1, \dots, d}$ ▹ The estimated lagged parent sets

Algorithm 3 Phase 2: Refined Lagged Parent Skeleton via aMCI

Require: Time series dataset $X$, maximum lag $τ_{\max}$, significance level $α$, aMCI criterion $a M C I (X_{i, t - τ}, X_{j, t}, S, {\hat{B}}^{-} (X_{i, t - τ}), {\hat{B}}^{-} (X_{j, t}))$, which returns a p-value, test statistic value I, and adaptive conditioning set $S_{a d}$, and estimated lagged parent sets ${\hat{B}}^{-} (X_{j, t})$ for all j in $\{1, \dots, d\}$

1:: Initialize graph $\hat{G}$ with fully connected lagged links (up to $τ_{\max}$ ) and contemporaneous links
2:: Initialize contemporaneous adjacency sets $\hat{A} (X_{j, t}) \leftarrow {X_{k, t} ∣ k \in {1, \dots, d}, k \neq j}$ for all $j \in {1, \dots, d}$ .
3:: $p \leftarrow 0$
4:: while $p \leq 1$ and any adjacent pair $(X_{i, t - τ}, X_{j, t})$ for $0 \leq τ \leq τ_{\max}$ in $\hat{G}$ satisfies $| \hat{A} (X_{i, t - τ}) ∖ {X_{j, t}} | \geq p$ or $| \hat{A} (X_{j, t}) ∖ {X_{i, t - τ}} | \geq p$ do
5:: for all adjacent pairs $(X_{i, t - τ}, X_{j, t})$ for $0 < τ \leq τ_{\max}$ satisfying the condition in Line 5 do
6:: for all possible subsets $S \subseteq \hat{A} (X_{j, t})$ with $| S | = p$ do
7:: $(p - value, I, S_{a d}) \leftarrow a M C I (X_{i, t - τ}, X_{j, t}, S, {\hat{B}}^{-} (X_{i, t - τ}), {\hat{B}}^{-} (X_{j, t}))$
8:: if $p - value > α$ then
9:: Delete link $X_{i, t - τ} \to X_{j, t}$ in $\hat{G}$ for $τ > 0$ from $\hat{G}$
10:: Store $S_{a d}$ as seperating set of $(X_{i, t - τ}, X_{j, t})$
11:: end if
12:: end for
13:: end for
14:: Update ${\hat{B}}^{-} (X_{j, t})$ for j in ${1, \dots, d}$ based on estimated $\hat{G}$
15:: $p \leftarrow p + 1$
16:: end while
17:: return ${\hat{B}}^{-} (X_{j, t})$ for all $j \in {1, \dots, d}$

Algorithm 4 Phase 3: Complete Skeleton Discovery

Require: Time series dataset $X$, maximum lag $τ_{\max}$, confidence level $α$, aMCI method $aMCI (X_{i, t - τ}, X_{j, t}, S, {\hat{B}}^{-} (X_{i, t - τ}), {\hat{B}}^{-} (X_{j, t}))$, which returns the p-value, the test statistic value $I$ and the adaptive conditioning set $S_{ad}$, and estimated lagged parent sets ${\hat{B}}^{-} (X_{j, t})$ for all $j \in \{1, \dots, d\}$

1:: Initialize graph $\hat{G}$ with fully connected lagged links (up to $τ_{\max}$ ) and contemporaneous links
2:: Initialize contemporaneous adjacency sets $\hat{A} (X_{j, t}) \leftarrow \{X_{k, t} \mid k \in \{1, \dots, d\}, k \neq j\}$ for all $j \in \{1, \dots, d\}$ .
3:: Initialize $I_{m i n} (X_{i, t - τ}, X_{j, t}) \leftarrow \infty$ for all links in $\hat{G}$
4:: $p \leftarrow 0$
5:: while any adjacent pair $(X_{i, t - τ}, X_{j, t})$ for $0 \leq τ \leq τ_{\max}$ in $\hat{G}$ satisfies $| \hat{A} (X_{i, t - τ}) \setminus \{X_{j, t}\} | \geq p$ or $| \hat{A} (X_{j, t}) \setminus \{X_{i, t - τ}\} | \geq p$ do
6:: for all adjacent pairs $(X_{i, t - τ}, X_{j, t})$ for $0 < τ \leq τ_{\max}$ satisfying the condition in Line 5 do
7:: for all possible subsets $S \subseteq \hat{A} (X_{j, t})$ with $| S | = p$ do
8:: $(\text{p-value}, I, S_{ad}) \leftarrow aMCI (X_{i, t - τ}, X_{j, t}, S, {\hat{B}}^{-} (X_{i, t - τ}), {\hat{B}}^{-} (X_{j, t}))$
9:: if $\text{p-value} > α$ then
10:: Delete link $X_{i, t - τ} \to X_{j, t}$ ($τ > 0$) from $\hat{G}$
11:: Store $S_{ad}$ as the separating set of $(X_{i, t - τ}, X_{j, t})$
12:: end if
13:: end for
14:: end for
15:: Update ${\hat{B}}^{-} (X_{j, t})$ for $j \in \{1, \dots, d\}$ based on estimated $\hat{G}$
16:: for all adjacent pairs $(X_{i, t}, X_{j, t})$ satisfying the condition in Line 5 do
17:: for all possible subsets $S \subseteq \hat{A} (X_{j, t}) \setminus \{X_{i, t}\}$ with $| S | = p$ do
18:: $(\text{p-value}, I, S_{ad}) \leftarrow aMCI (X_{i, t}, X_{j, t}, S, {\hat{B}}^{-} (X_{i, t}), {\hat{B}}^{-} (X_{j, t}))$
19:: if $\text{p-value} > α$ then
20:: Delete link $X_{i, t} \leftrightarrow X_{j, t}$ from $\hat{G}$
21:: Store $S_{ad}$ as the separating set of $(X_{i, t}, X_{j, t})$
22:: end if
23:: end for
24:: end for
25:: Update $\hat{A} (X_{j, t})$ for $j \in \{1, \dots, d\}$ based on estimated $\hat{G}$
26:: $p \leftarrow p + 1$
27:: end while
28:: return graph $\hat{G}$ and the separating sets of all non-adjacent pairs
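The level-wise structure of Algorithm 4 can be sketched as follows. This is a simplified illustration, not the authors' code: the aMCI criterion is not reproduced here, so amci is a hypothetical caller-supplied callable; lagged links are encoded as (i, τ, j) triples and contemporaneous links as unordered pairs; and, as in the PC algorithm, testing of a link stops at its first separating set, which the pseudocode leaves implicit. Algorithm 3 corresponds roughly to the lagged loop alone, restricted to $p \leq 1$.

```python
from itertools import combinations


def phase3_skeleton(n_vars, tau_max, amci, alpha=0.01):
    """Sketch of the Phase 3 skeleton search (Algorithm 4).

    amci(x, j, S, parents) is a caller-supplied stand-in for the aMCI
    criterion: x = (i, tau) denotes X_{i,t-tau}, j denotes X_{j,t}, S is a
    tuple of contemporaneous variable indices, and parents maps each j to
    its current lagged parents {(i, tau)}. It returns (p_value, s_ad).
    """
    lagged = {(i, tau, j) for i in range(n_vars) for j in range(n_vars)
              for tau in range(1, tau_max + 1)}
    contemp = {frozenset(pair) for pair in combinations(range(n_vars), 2)}

    def lagged_parents():
        return {j: {(i, tau) for i, tau, jj in lagged if jj == j} for j in range(n_vars)}

    def contemp_adjacency():
        return {j: {k for k in range(n_vars) if frozenset((j, k)) in contemp}
                for j in range(n_vars)}

    parents, adj = lagged_parents(), contemp_adjacency()
    sepsets = {}
    p = 0
    # Line 5: continue while some adjacent pair has >= p candidate conditioning variables.
    while (any(len(adj[i]) >= p or len(adj[j]) >= p for i, _, j in lagged)
           or any(len(adj[a] - {b}) >= p or len(adj[b] - {a}) >= p
                  for a, b in map(tuple, contemp))):
        # Lines 6-14: lagged links, conditioning on contemporaneous neighbours of X_{j,t}.
        for i, tau, j in sorted(lagged):
            for S in combinations(sorted(adj[j]), p):
                p_value, s_ad = amci((i, tau), j, S, parents)
                if p_value > alpha:
                    lagged.discard((i, tau, j))
                    sepsets[(i, tau, j)] = s_ad
                    break  # stop at the first separating set, as in PC
        parents = lagged_parents()  # Line 15
        # Lines 16-24: contemporaneous pairs, tested from both endpoints.
        ordered = sorted((a, b) for pair in contemp for a in pair for b in pair if a != b)
        for i, j in ordered:
            if frozenset((i, j)) not in contemp:
                continue  # already removed when tested from the other endpoint
            for S in combinations(sorted(adj[j] - {i}), p):
                p_value, s_ad = amci((i, 0), j, S, parents)
                if p_value > alpha:
                    contemp.discard(frozenset((i, j)))
                    sepsets[frozenset((i, j))] = s_ad
                    break
        adj = contemp_adjacency()  # Line 25: adjacencies change only between levels
        p += 1
    return lagged, contemp, sepsets
```

Because the lagged parent sets are refreshed only after the lagged loop (Line 15) and the contemporaneous adjacency sets only after a full level (Line 25), the conditioning sets used within a level do not depend on the order in which links are visited, in the spirit of PC-stable [22]. The stored separating sets can still depend on visiting order when a pair is separable from both endpoints.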

By progressively refining estimates and leveraging the capabilities of aMCI, the three phases outlined above enable more accurate identification of causal adjacencies in time series. To facilitate a better understanding of the algorithmic procedure, Figure 4 presents a flowchart of the multi-phase ECD-aMCI algorithm.

The subsequent orientation rules in the ECD-aMCI algorithm follow Meek’s rules [19] and build upon the relevant work of Runge [15]. The complete pseudocode of the orientation rules is provided in Algorithms A1 and A2 in Appendix A.

3.4. Theoretical Properties

This section provides a theoretical analysis of the ECD-aMCI algorithm. First, necessary assumptions for the analysis are introduced. Then, two key theoretical properties of ECD-aMCI are established: consistency and order independence.

The theoretical analysis relies on several standard assumptions in time series causal discovery: (1) causal sufficiency, (2) causal Markov condition, (3) faithfulness, (4) temporal priority, and (5) stationarity.

Assumption 1

(Causal Sufficiency). All common causes of any pair of observed variables in the system are also observed.

Assumption 2

(Causal Markov Condition). Each variable in the causal graph is conditionally independent of its non-descendants given its direct parents. Formally, if $G$ is a directed acyclic graph (DAG) representing the causal structure and $P$ is the joint probability distribution of the variables, then $P$ factorizes according to $G$ as

$$P (X_{1}, X_{2}, \dots, X_{d}) = \prod_{i = 1}^{d} P \big(X_{i} \mid \mathcal{P} (X_{i})\big),$$

where $\mathcal{P} (X_{i})$ denotes the parent set of $X_{i}$ in $G$.

Assumption 3

(Faithfulness). All conditional independence relationships in the probability distribution are entailed by the causal graph structure via the d-separation criterion. That is, $X \perp\!\!\!\perp Y \mid \mathbf{Z}$ if and only if $X$ and $Y$ are d-separated by $\mathbf{Z}$ in the causal graph. This assumption rules out exact parameter cancellations that could create conditional independencies not implied by the causal structure.

Assumption 4

(Temporal Priority). A cause either precedes its effect in time or occurs simultaneously with it; it never follows its effect. Furthermore, the autoregressive influence of a process on its own future values progressively diminishes as the time lag increases.

Assumption 5

(Stationarity). The causal structure among the variables and the joint distribution of the time series do not change over time. This assumption allows a window causal graph to be learned from the time series, in which each edge represents a causal link that holds at every time point.

These assumptions are commonly adopted in the causal discovery literature [5,15,20,21].

Theorem 1

(Consistency). Under Assumptions (1)–(5), if the conditional independence tests are oracle, ECD-aMCI returns the correct CPDAG, that is, $\hat{G} = G_{CPDAG}$, where $G_{CPDAG}$ denotes the CPDAG of the time series graph $G$.

Theorem 1 states that, given oracle conditional independence information, ECD-aMCI correctly identifies all links that are identifiable from conditional independence information. The proof first establishes that the algorithm recovers the correct skeleton and then verifies that the orientation rules recover the correct directions. Hence, under ideal conditions, ECD-aMCI enjoys the same theoretical consistency guarantees as other constraint-based causal discovery algorithms.

Since Colombo and Maathuis [22] proposed the PC-stable algorithm, order independence has been recognized as an essential property for causal discovery algorithms. The proposed algorithm avoids the influence of variable ordering in all edge removal processes based on conditional independence tests and in the orientation phase, naturally leading to Theorem 2.

Theorem 2

(Order Independence). Under Assumptions (1)–(5), the outcome of ECD-aMCI is independent of the order of the variables.

Detailed proofs of Theorems 1 and 2 are provided in Appendix B.

4. Evaluation

This section presents a comprehensive evaluation of the ECD-aMCI algorithm. First, several baseline algorithms for comparison are introduced, followed by the evaluation metrics used to assess performance. The data generation process for simulations is then described in detail. Results from both simulated and benchmark datasets are analyzed to demonstrate the effectiveness of ECD-aMCI. Finally, hyperparameter settings for all algorithms are discussed.

4.1. Baselines

The goal of the proposed algorithm is to learn causal graphs from observational time series that include both lagged (Window) and contemporaneous links. As shown in Table 3, algorithms that share this objective include PCMCI+, Bagged-PCMCI+, DYNOTEARS, NTS-NOTEARS, and VAR-LiNGAM. Among these, PCMCI+, Bagged-PCMCI+, and NTS-NOTEARS are selected as baselines. DYNOTEARS and VAR-LiNGAM are not included because PCMCI+ has been shown to outperform VAR-LiNGAM in the original PCMCI+ paper, and NTS-NOTEARS has been shown to outperform DYNOTEARS in comparative studies. CR-VAE and ACD are also excluded, because they are designed to learn summary causal graphs, whereas ECD-aMCI learns time-lagged window causal graphs.

The constraint-based methods (PCMCI+, Bagged-PCMCI+, and ECD-aMCI) can be compared directly, since all of them use the ParCorr test [23] in linear settings and the GPDC test [24] in nonlinear settings. In contrast, NTS-NOTEARS is a continuous optimization-based algorithm whose weight-threshold hyperparameter $W_{thresh}$ has no conventional default comparable to the confidence level $α$ of constraint-based algorithms (e.g., $α = 0.01$). The weight threshold $W_{thresh}$ is therefore tuned by grid search to maximize the $F_{1}$-score. PCMCI+ and Bagged-PCMCI+ were implemented using the pcmci module of the tigramite package (Python 3.8). NTS-NOTEARS was implemented using the NTS_NOTEARS function from the NTS-NOTEARS Python package.

4.2. Evaluation Metrics

To assess the performance of causal discovery methods thoroughly, four metrics are employed, each capturing a different aspect of the algorithms. The $F_{1}$-score, which balances precision and recall, is computed separately for lagged cross-adjacencies ($i \neq j$) and for all adjacencies; higher values indicate better performance. The Structural Hamming Distance (SHD) quantifies the overall structural difference between the learned and true causal graphs: it counts the edge additions, deletions, and reversals needed to transform one graph into the other, so lower values indicate better performance. Average runtimes were measured on an Intel Xeon Platinum 8260L CPU at 2.30 GHz. To remove the influence of marginal-variance patterns that might affect the results, all evaluations were performed on standardized data [25].
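A possible way to compute the three accuracy metrics is sketched below. The graph encoding is an assumption made for illustration: a window graph is a dictionary keyed by (i, j, τ), with τ > 0 for lagged links $X_{i, t - τ} \to X_{j, t}$ and (min(i, j), max(i, j), 0) for contemporaneous pairs, and each value is an edge mark ("->", "<-" or "--") relative to the key order. Counting an undirected-versus-directed mismatch as one SHD unit, like a reversal, is also an assumed convention, as is counting autodependency links towards the all-adjacency score.

```python
def adjacency_f1(true_adj, est_adj):
    """F1-score of an estimated adjacency set against the true one."""
    tp = len(true_adj & est_adj)
    fp = len(est_adj - true_adj)
    fn = len(true_adj - est_adj)
    denom = 2 * tp + fp + fn  # 2TP / (2TP + FP + FN) equals the harmonic mean of P and R
    return 2 * tp / denom if denom else 1.0


def lagged_cross(adj):
    """Keep only lagged cross-adjacencies X_{i,t-tau} -> X_{j,t} with i != j and tau > 0."""
    return {(i, j, tau) for i, j, tau in adj if tau > 0 and i != j}


def shd(true_graph, est_graph):
    """Edge additions, deletions and orientation changes between two graphs.

    A key present in only one graph is an addition or deletion; a key present
    in both with different marks is an orientation error. Each costs one unit.
    """
    keys = set(true_graph) | set(est_graph)
    return sum(1 for k in keys if true_graph.get(k) != est_graph.get(k))
```

For example, a true graph {(0, 1, 1): "->", (1, 2, 0): "->", (0, 0, 1): "->"} and an estimate {(0, 1, 1): "->", (1, 2, 0): "<-", (2, 0, 2): "->"} give an all-adjacency and a lagged-cross $F_{1}$-score of 2/3 each and an SHD of 3 (one reversal, one deletion, one addition).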

4.3. Simulated Data Generation

Following Runge et al. [5] and Runge [15], we generate autocorrelated time series with both contemporaneous and lagged causal dependencies. The data is generated from an additive model where each variable is influenced by its own past values, the past and contemporaneous values of its causal parents, and random noise. Specifically, we use the following model:

$$X_{j, t} = \sum_{τ = 1}^{τ_{\max}} δ_{j, j, τ}\, a_{j} f_{j} (X_{j, t - τ}) + \sum_{i = 1, i \neq j}^{d} \sum_{τ = 0}^{τ_{\max}} δ_{i, j, τ}\, c_{i, j, τ} f_{i} (X_{i, t - τ}) + ε_{j, t}, \tag{4}$$

where $X_{j, t}$ represents the value of variable $j$ at time $t$, $j \in \{1, \dots, d\}$, and $d$ is the dimension of the time series. The autocorrelation coefficients $a_{j}$ are drawn uniformly from $[\max (0, a - 0.2), a]$, where $a$ controls the autocorrelation strength. The noise terms $ε_{j, t}$ are i.i.d. standard Gaussian. The coefficient $c_{i, j, τ}$, which represents the strength of the link $X_{i, t - τ} \to X_{j, t}$, is drawn uniformly from $[0.15, 0.25]$ to ensure the stationarity of the time series. The binary coefficient $δ_{i, j, τ}$ indicates whether the link $X_{i, t - τ} \to X_{j, t}$ exists. For each model, $2.5 \cdot d$ cross-links between variables are selected at random; 40% of them are contemporaneous ($τ = 0$), and the remaining links have time lags $τ$ drawn uniformly from $\{1, \dots, τ_{true}\}$, where $τ_{true}$ denotes the true maximum time lag.

Under the framework of model (4), we consider two main scenarios: linear and nonlinear causal relationships.

Linear Setting: The functions are set to $f_{i} (x) = x$ and $τ_{true} = 5$. The autocorrelation strength $a$ varies from 0.5 to 0.9, the dimension of the time series $d$ from 10 to 40, and the sample size $T$ from 250 to 1000. For each combination of $(a, d, T)$, 300 simulated datasets are generated. Each dataset has its own randomly sampled parameters ($a_{j}$, $c_{i, j, τ}$, $ε_{j, t}$) and its own ground-truth causal graph (i.e., different $δ_{i, j, τ}$).

Nonlinear Setting: The functions are set to $f_{i} (x) = \tanh (x)$ and $τ_{true} = 3$. The autocorrelation strength $a$ varies from 0.5 to 0.9, the dimension of the time series $d$ from 10 to 40, and the sample size $T$ from 250 to 1000. For each combination of $(a, d, T)$, 50 simulated datasets are generated.
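A minimal simulation sketch of the data-generating process is given below, covering both the linear ($f(x) = x$) and nonlinear ($f(x) = \tanh(x)$) settings. Several details are assumptions not fixed by the text: the autoregressive term acts at lag 1 only, contemporaneous links are oriented from lower to higher variable index so that they remain acyclic, duplicate links are redrawn, and the first 100 samples are discarded as burn-in. The function name simulate is hypothetical.

```python
import numpy as np


def simulate(d, T, a, tau_true, f=lambda v: v, burn_in=100, seed=0):
    """Sketch of the additive model: returns (data of shape (T, d), list of cross-links)."""
    assert d >= 3, "need at least 3 variables to place 40% contemporaneous links"
    rng = np.random.default_rng(seed)
    # Autocorrelation a_j ~ U[max(0, a - 0.2), a]; assumed to act at lag 1 only.
    auto = rng.uniform(max(0.0, a - 0.2), a, size=d)
    n_links = int(2.5 * d)
    n_contemp = round(0.4 * n_links)
    links, keys = [], set()
    while len(links) < n_links:
        i, j = (int(v) for v in rng.choice(d, size=2, replace=False))
        if len(links) < n_contemp:
            tau = 0
            i, j = min(i, j), max(i, j)  # i < j keeps contemporaneous links acyclic
        else:
            tau = int(rng.integers(1, tau_true + 1))  # uniform on {1, ..., tau_true}
        if (i, j, tau) in keys:
            continue  # redraw duplicate links
        keys.add((i, j, tau))
        links.append((i, j, tau, float(rng.uniform(0.15, 0.25))))  # c_{i,j,tau}

    total = T + burn_in
    x = np.zeros((total, d))
    eps = rng.standard_normal((total, d))  # i.i.d. standard Gaussian noise
    for t in range(tau_true, total):
        for j in range(d):  # increasing j: contemporaneous parents (i < j) are ready
            value = auto[j] * f(x[t - 1, j]) + eps[t, j]
            for i, jj, tau, c in links:
                if jj == j:
                    value += c * f(x[t - tau, i])
            x[t, j] = value
    return x[burn_in:], links
```

For example, simulate(d=10, T=500, a=0.9, tau_true=5) draws one dataset in the linear setting, and passing f=np.tanh with tau_true=3 corresponds to the nonlinear setting.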

4.4. Results

4.4.1. Linear Setting

As shown in Table 4, ECD-aMCI achieves the highest average F1-score_lagged and F1-score_all across all parameter settings, and its relatively small standard deviations indicate good stability under varying experimental conditions. PCMCI+ and NTS-NOTEARS come close under certain parameter settings but are slightly less competitive overall. Bagged-PCMCI+ shows comparatively weaker results across all metrics, particularly in SHD and computation time, where its higher averages and standard deviations indicate lower consistency and efficiency. For SHD, ECD-aMCI yields the lowest structural error, with an average notably below those of the other methods and well-controlled standard deviations, indicating more accurate and consistent structure recovery; PCMCI+ ranks second, while Bagged-PCMCI+ and NTS-NOTEARS exhibit higher SHD values. Regarding computation time, PCMCI+ is the most efficient method across all parameter settings. ECD-aMCI runs slightly longer than PCMCI+ but remains within an acceptable range, and its small standard deviation indicates stable runtimes. Bagged-PCMCI+ requires the longest computation time of all methods.

Table 5 and Table 6 further validate these trends. Under different time lengths T and variable dimensions d, ECD-aMCI consistently maintains a lead in F1-score and SHD metrics, exhibiting good robustness and adaptability. PCMCI+ ranks second in most cases and possesses a clear advantage in running time. Bagged-PCMCI+ and NTS-NOTEARS show a relative decrease in performance, especially under high-dimensional or long-time-series settings, where their standard deviations also increase, reflecting certain limitations in scalability.

As shown in Figure 5, ECD-aMCI outperforms the baseline algorithms on all accuracy metrics (F1-scores and SHD) in every parameter setting. In the first column, as the autocorrelation strength increases from 0.5 to 0.9, ECD-aMCI maintains superior F1-scores for both lagged and all adjacencies while keeping the SHD lower than that of competing methods, and its advantage grows as autocorrelation strengthens. The second column shows that ECD-aMCI performs better even with limited sample sizes. The third column shows that ECD-aMCI scales well with increasing dimensionality, maintaining high F1-scores and low SHD as the dimension increases from 10 to 40.

Regarding computational efficiency, the runtime comparison in the bottom row shows that ECD-aMCI achieves this superior performance with reasonable computational cost. For visual clarity, error bars are not shown in the figures.

4.4.2. Nonlinear Setting

Under nonlinear settings, we further evaluated the performance of each method under varying parameter conditions. Table 7, Table 8 and Table 9 present the comparison results of F1-score (lagged and all edges), SHD, and running time across different methods when the parameter a, time length T, and variable dimension d vary.

As shown in Table 7, ECD-aMCI achieves the highest average F1-score_lagged across all parameter settings, ranging from 0.737 to 0.816, notably better than PCMCI+ and NTS-NOTEARS. Its standard deviation is also relatively small (0.077 to 0.107), indicating good stability under nonlinear conditions. NTS-NOTEARS performs close to ECD-aMCI under certain parameters, but its slightly larger standard deviation suggests lower stability. For F1-score_all, ECD-aMCI achieves the highest average values except when $a = 0.5$, with standard deviations generally around 0.035. NTS-NOTEARS slightly outperforms ECD-aMCI when $a = 0.5$, but ECD-aMCI retains the overall advantage. For SHD, ECD-aMCI yields the lowest structural error under most parameters, with averages ranging from 20.280 to 21.120 and standard deviations between 2.5 and 3.3, demonstrating reliable structure recovery. When $a$ is small, the SHD of NTS-NOTEARS is slightly lower than that of ECD-aMCI, but its standard deviation is larger (up to 4.679), indicating limited stability. In terms of running time, NTS-NOTEARS is extremely efficient, with average running times of only 6.08 to 10.27 s. ECD-aMCI is substantially slower, averaging between 3600 and 4700 s with a larger standard deviation, which reflects the computational burden of nonlinear conditional independence tests.

Table 8 compares performance under varying time lengths $T$. For the F1-score metrics, ECD-aMCI achieves the highest average values when $T \geq 500$, with smaller standard deviations, demonstrating good stability and adaptability. NTS-NOTEARS performs best on short time series with $T = 250$, but ECD-aMCI overtakes it as $T$ increases. PCMCI+ performs comparatively worse across all values of $T$. For SHD, ECD-aMCI achieves the lowest structural error when $T \geq 500$, with well-controlled standard deviations. NTS-NOTEARS yields a slightly lower SHD than ECD-aMCI only when $T = 250$; as $T$ increases, its SHD decreases only slowly and its standard deviation remains relatively large.

Table 9 compares performance under varying variable dimensions $d$. For the F1-score metrics, ECD-aMCI consistently achieves the highest average values for $d$ from 10 to 30. At $d = 40$, its performance is comparable to that of PCMCI+ and NTS-NOTEARS, but its generally smaller standard deviations indicate a clear advantage in stability. The F1-scores of PCMCI+ and NTS-NOTEARS decline noticeably as $d$ increases, with relatively larger standard deviations. For SHD, ECD-aMCI yields the lowest structural error for $d$ from 10 to 30. At $d = 40$, its SHD is slightly higher than that of NTS-NOTEARS, but its standard deviation remains well-controlled, whereas the larger standard deviation of NTS-NOTEARS (9.104) suggests limited stability.

As shown in Figure 6, ECD-aMCI consistently outperforms the baseline algorithms across most evaluation metrics. In the first column, as the autocorrelation strength increases from 0.5 to 0.9, ECD-aMCI maintains superior F1-scores, with improvements of approximately 5–10% over competing algorithms, while keeping the SHD consistently lower, particularly under high autocorrelation conditions. Although the performance advantage is modest at lower sample sizes, the second column reveals that the ECD-aMCI algorithm becomes increasingly superior as the sample size grows. The third column illustrates that ECD-aMCI exhibits stable performance with increasing dimensionality, maintaining relatively consistent F1-scores even as the number of variables increases.

The bottom row illustrates that constraint-based algorithms (including ECD-aMCI) require longer computational time than optimization-based algorithms in nonlinear settings, primarily due to the computational intensity of nonparametric conditional independence tests. Future research could explore more efficient conditional independence tests to reduce computational runtime. These simulation results demonstrate that the ECD-aMCI algorithm offers substantial advantages over competing algorithms, particularly in settings with high autocorrelation and sufficient sample sizes. Note that Bagged-PCMCI+ is not included in the nonlinear setting comparisons due to its prohibitively high computational and memory requirements.

4.5. Benchmark Data

The functional Magnetic Resonance Imaging (fMRI) benchmark from NetSim contains realistic simulated time series for modeling brain networks [26]. This nonlinear benchmark has been widely used to evaluate time series causal discovery algorithms because its diverse underlying networks closely mimic the challenges of neuroimaging analysis. As shown in Table 10, ECD-aMCI achieves higher F1-scores and lower SHD on this dataset than all baseline methods, confirming its effectiveness and practical value for neuroscientific applications. Bagged-PCMCI+ is excluded from the fMRI analysis because of its prohibitively high computational cost on nonlinear data.

Figure 7 illustrates the causal graphs learned on an fMRI dataset ($d = 5$) by different causal discovery algorithms. As shown in Figure 7, the window causal graph learned by ECD-aMCI is very close to the ground truth: its skeleton differs in only one place, an additional edge $X_{4, t} \to X_{5, t}$. In contrast, the window causal graph learned by PCMCI+ omits the edges $X_{3, t} \to X_{2, t}$ and $X_{1, t} \to X_{5, t}$ while incorrectly adding the edge $X_{3, t} \to X_{1, t}$. The omission of these two contemporaneous edges is primarily caused by conditioning on past nodes that are not confounders. The aMCI method avoids this issue and therefore recovers a more accurate window causal graph.

In addition to the fMRI data, we conducted supplementary real-world validation on telecommunication network alarm data. In telecommunication networks, a single fault can trigger a cascade of various alarm types across multiple connected devices. Understanding the causal relationships among these alarms is crucial for intelligent alarm management and fault root cause analysis, as manually handling such alarm floods can quickly overwhelm operators. The experimental results in Table 11 demonstrate that ECD-aMCI remains highly competitive in this practical application, further validating its effectiveness beyond the fMRI domain.

4.6. Hyperparameters Analysis

ECD-aMCI has two hyperparameters. The first is the confidence level $α$ for hypothesis testing, which should be set as low as possible while maintaining sufficient detection power. In all experiments presented in this paper, $α = 0.01$ is used for all constraint-based methods (ECD-aMCI, PCMCI+, and Bagged-PCMCI+). The second hyperparameter is the maximum time lag $τ_{\max}$, which is set to the true maximum time lag in all experiments. In practical applications, $τ_{\max}$ can first be set to a slightly larger value chosen from the estimated autocorrelation coefficients and then iteratively reduced based on the estimated causal graph. Simulation results in Table 12 indicate that the proposed method is robust to the choice of $τ_{\max}$. NTS-NOTEARS has six hyperparameters, of which the weight threshold $W_{thresh}$ is sensitive to the strength of causal links. Therefore, $W_{thresh}$ is determined by maximizing the F1-score, while the other hyperparameters are set to the optimal values reported in the original paper [14]. See Appendix C.1 and Appendix C.2 for detailed hyperparameter settings for all methods.

The core idea for selecting $τ_{\max}$ is to first determine a relatively large initial value based on the autocorrelation coefficients, and then run the ECD-aMCI algorithm with this initial value to estimate the window causal graph, from which the final $τ_{\max}$ is determined. Specifically, we calculate the lagged autocorrelations of each variable for a gradually increasing $τ_{candidate}$ and take the first $τ_{candidate}$ at which the lagged autocorrelations of all variables fall below 0.05 as the initial choice for running ECD-aMCI. Finally, the maximum lag among the lagged edges in the window graph learned by ECD-aMCI is adopted as the definitive $τ_{\max}$.
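The $τ_{\max}$ selection procedure can be sketched as follows. The use of the absolute autocorrelation, the upper search limit max_candidate, and the encoding of learned lagged edges as (i, j, τ) triples are assumptions; the intermediate ECD-aMCI run is not included, so final_tau_max simply reads the lagged edges of whatever window graph that run returns. All function names are hypothetical.

```python
import numpy as np


def autocorr(series, lag):
    """Sample autocorrelation of a 1-D series at a positive lag."""
    centered = series - series.mean()
    return float(centered[:-lag] @ centered[lag:] / (centered @ centered))


def initial_tau_max(data, threshold=0.05, max_candidate=50):
    """Smallest tau_candidate at which |autocorrelation| < threshold for every variable."""
    for tau in range(1, max_candidate + 1):
        if all(abs(autocorr(data[:, j], tau)) < threshold for j in range(data.shape[1])):
            return tau
    return max_candidate  # fallback when no candidate qualifies (assumed behaviour)


def final_tau_max(lagged_edges):
    """Largest lag among learned lagged edges (i, j, tau); 0 if there are none."""
    return max((tau for _, _, tau in lagged_edges), default=0)
```

In a full pipeline, one would call initial_tau_max on the standardized data, run ECD-aMCI with that value, and pass the resulting lagged edges to final_tau_max.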

Next, we discuss how the choice of CI test affects the stability of the aMCI framework and its robustness to complex data distributions. Theorem 1 shows that the framework is theoretically guaranteed to learn the correct causal graph when the CI tests are perfectly accurate. Consequently, the primary criterion for selecting a CI test in practice is statistical power. For example, in strictly linear scenarios partial correlation is the natural choice, whereas on complex real-world data GPDC is preferred; CMIKNN typically requires a larger sample size.

Furthermore, we supplemented our experiments with a comparison of ECD-aMCI using various CI tests in a linear setting and on fMRI data. As shown in Table 13, ECD-aMCI performs best with partial correlation in the linear setting, in line with theoretical expectations. On the fMRI data, it performs best with the nonparametric conditional independence test GPDC, but its performance with partial correlation remains competitive, indicating good robustness across distributions.

5. Discussion

Previous time series causal discovery methods have noted the impact of autocorrelation on causal graph learning and have introduced several improvements. For instance, some studies proposed momentary conditional independence [15], while others utilized deep learning architectures tailored for time series to compute scores [14]. However, these methods still suffer from significant performance degradation when autocorrelation becomes strong. To address these key challenges in causal discovery for autocorrelated time series, this paper introduces the aMCI method and the ECD-aMCI algorithm. The proposed algorithm dynamically adapts conditioning sets to mitigate the masking effects of strong autocorrelation while maintaining control over false discovery rates. Furthermore, our theoretical analysis establishes its consistency and order independence properties.

Extensive evaluations on both simulated and benchmark datasets demonstrate significant improvements in detection power for causal links, especially lagged links. In both linear and nonlinear scenarios, the performance of all methods declines to varying degrees as the autocorrelation coefficient $a$ increases. However, ECD-aMCI consistently maintains the highest F1-score. Even in the most challenging setting, $a = 0.9$, its F1-score remains close to 0.8, substantially higher than the values reported in previous studies [5,15]. These simulation results indicate that, although strong autoregressive dependence is detrimental to causal structure identification, ECD-aMCI effectively mitigates this issue.

Despite these contributions, we discuss the limitations of the ECD-aMCI algorithm and future research directions from three main perspectives: regular sampling, the causal sufficiency assumption, and the stationarity assumption. The limitations are as follows:

The proposed algorithm is currently limited to regularly sampled time series and cannot handle irregular sampling intervals. If causal graphs need to be learned from irregularly sampled or event-driven time series data [27], frameworks based on stochastic processes would be more appropriate [28,29].
Addressing latent confounders remains an important direction for future work. Ignoring latent confounding factors may lead to incorrect causal conclusions. Some existing causal discovery algorithms account for unobserved confounders; for example, tsFCI [30] and SVAR-FCI [31] extract information regarding ancestral relationships among observed variables to learn a partial ancestral graph. Since ECD-aMCI and tsFCI are both constraint-based algorithms, future research could consider combining the insights of the aMCI method with the frameworks of tsFCI and SVAR-FCI to handle the effects of latent confounding.
In many practical scenarios (e.g., financial data with trends, climatic data with regime shifts or structural breaks), time series may not be stationary [32]. Therefore, extending ECD-aMCI to non-stationary data is an important direction. Algorithms designed for non-stationary time series are typically built upon foundations established for stationary cases [32,33,34]. For instance, the core strategy of the SPACETIME algorithm is to identify change points in the distribution and assume stationarity between them [34]. Consequently, the insights of ECD-aMCI can potentially transfer to algorithms designed for non-stationary time series.

6. Conclusions

This paper addresses a critical bottleneck in time series causal discovery: the significant performance degradation of constraint-based algorithms in the presence of strong autocorrelation. To overcome the masking effects of autocorrelation, we proposed the aMCI method, which dynamically adjusts the conditioning set to preserve causal signals while rigorously controlling false discovery rates. Building upon this method, we introduced ECD-aMCI, a multi-phase algorithm designed to progressively refine causal structures and fully leverage the aMCI framework for both lagged and contemporaneous links.

Theoretically, ECD-aMCI provides a hyperparameter-insensitive and order-independent framework that is provably consistent under oracle conditions. Empirically, extensive evaluations show that ECD-aMCI achieves the highest $F_{1}$-score$_{lagged}$ and $F_{1}$-score$_{all}$ and the lowest SHD in almost all of the linear and nonlinear simulation settings. Furthermore, validation on the NetSim fMRI benchmark and on real-world telecommunication network alarm data confirms its practical effectiveness and robustness in complex, high-autocorrelation environments.

While ECD-aMCI substantially advances the accuracy of causal discovery from time series, several avenues for future research remain. Currently, the proposed algorithm relies on the assumptions of regular sampling, causal sufficiency, and stationarity. Future work could explore integrating the adaptive insights of the aMCI method with stochastic process frameworks to handle irregularly sampled data, or combining it with latent variable discovery frameworks (such as tsFCI) to account for unmeasured confounders.

Author Contributions

Conceptualization, M.G.; methodology, M.G. and Y.Z.; software, M.G.; validation, M.G.; formal analysis, M.G.; investigation, M.G. and Y.Z.; resources, Y.Z.; data curation, M.G.; writing—original draft preparation, M.G.; writing—review and editing, M.G. and Y.Z.; visualization, M.G.; supervision, Y.Z.; project administration, Y.Z.; funding acquisition, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

Shanghai Natural Science Foundation (No. 24ZR1420400), Shanghai Science and Technology Innovation Action Plan Computational Biology Key Project (No. 23JS1400500, No. 23JS1400800) and Fundamental and Interdisciplinary Disciplines Breakthrough Plan of the Ministry of Education of China (No. JYB2025XDXM904).

Data Availability Statement

The dataset NetSim used for this study is available at: https://www.fmrib.ox.ac.uk/datasets/netsim/ (accessed on 17 February 2026), and telecommunication network alarm data is available at: https://competition.huaweicloud.com/information/1000041487/dataset (accessed on 17 February 2026).

Acknowledgments

During the preparation of this manuscript/study, the authors used Gemini 3 Pro for the purposes of text polishing and language refinement. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

aMCI	Adaptive momentary conditional independence
SCM	Structural causal model
GC	Granger causality
SB	Score-based
CB	Constraint-based
DAG	Directed acyclic graph
SHD	Structural Hamming distance

Appendix A. Pseudocodes of the Orientation Rules

This appendix provides the complete pseudocode for the orientation rules.

Algorithm A1 Detailed collider phase with conservative rules

Require: $\hat{G}$ and separating sets from Algorithm 4, time series dataset $X$, significance level $\alpha$, conditional independence test $CI(X, Y, Z)$, and $\hat{B}^{-}(X_{j,t})$ for all $j \in \{1, \dots, d\}$.

1: for all unshielded triples $X_{i,t-\tau} \to X_{k,t} \circ\!-\!\circ X_{j,t}$ ($\tau > 0$) or $X_{i,t} \circ\!-\!\circ X_{k,t} \circ\!-\!\circ X_{j,t}$ ($\tau = 0$) in $\hat{G}$ where $(X_{i,t-\tau}, X_{j,t})$ are not adjacent do
2: Define the contemporaneous adjacencies $\hat{A}(X_{j,t}) \leftarrow \{X_{l,t} \in X_{t} \setminus \{X_{j,t}\} : X_{l,t} \circ\!-\!\circ X_{j,t} \text{ in } \hat{G}\}$
3: for all $S \subseteq \hat{A}(X_{j,t}) \setminus \{X_{i,t-\tau}\}$ and, if $\tau = 0$, for all $S \subseteq \hat{A}(X_{i,t}) \setminus \{X_{j,t}\}$ do
4: $(p\text{-value}, I, S) \leftarrow CI\big(X_{i,t-\tau}, X_{j,t}, (S \cup \hat{B}^{-}(X_{j,t}) \cup \hat{B}^{-}(X_{i,t-\tau})) \setminus \{X_{i,t-\tau}\}\big)$
5: Store each subset $S$ with p-value $> \alpha$ as a separating subset
6: end for
7: if no separating subsets are found then
8: Mark the triple as ambiguous
9: else
10: Compute the fraction $n_{k}$ of separating subsets that contain $X_{k,t}$; orient the triple as a collider if $n_{k} = 0$, leave it unoriented if $n_{k} = 1$, and mark it as ambiguous if $0 < n_{k} < 1$
11: end if
12: Mark links in $\hat{G}$ with conflicting orientations as $\times\!-\!\times$
13: end for
14: return $\hat{G}$, separating sets, ambiguous triples, conflicting links
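The conservative decision in Lines 7–10 can be expressed as a small function. The sketch below is our own illustration, not the authors' code: the function name `classify_unshielded_triple` and the representation of separating subsets as Python frozensets are assumptions. It takes the separating subsets collected in Lines 3–6 and applies the rule based on the fraction $n_k$.

```python
from typing import FrozenSet, Hashable, Sequence


def classify_unshielded_triple(
    middle: Hashable,
    separating_subsets: Sequence[FrozenSet[Hashable]],
) -> str:
    """Conservative collider decision for an unshielded triple X_i - X_k - X_j.

    `middle` is X_k; `separating_subsets` holds every subset S whose CI test
    returned a p-value above alpha (Lines 3-6 of Algorithm A1).
    """
    # Line 8: no separating subset at all, so the triple cannot be decided.
    if not separating_subsets:
        return "ambiguous"
    # Line 10: n_k is the fraction of separating subsets that contain X_k.
    n_k = sum(middle in s for s in separating_subsets) / len(separating_subsets)
    if n_k == 0:
        return "collider"
    if n_k == 1:
        return "non-collider"  # left unoriented here; rule R1 may orient it later
    return "ambiguous"
```

For example, `classify_unshielded_triple("X3", [frozenset(), frozenset({"X4"})])` returns `"collider"`, while adding `frozenset({"X3"})` to the list gives $n_k = 1/3$ and the result `"ambiguous"`.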

Algorithm A2 Detailed rule orientation phase

Require: $\hat{G}$ , ambiguous triples, conflicting links

1: while any unambiguous triples suitable for rules R1–R3 remain do
2: Apply rule R1 (orient unshielded triples that are not colliders):
3: for all unambiguous triples $X_{i,t-\tau} \to X_{k,t} \circ\!-\!\circ X_{j,t}$ where $(X_{i,t-\tau}, X_{j,t})$ are not adjacent do
4: Orient as $X_{i,t-\tau} \to X_{k,t} \to X_{j,t}$
5: end for
6: Mark links with conflicting orientations as $\times\!-\!\times$
7: Apply rule R2 (avoid cycles):
8: for all unambiguous triples $X_{i,t} \to X_{k,t} \to X_{j,t}$ with $X_{i,t} \circ\!-\!\circ X_{j,t}$ do
9: Orient as $X_{i,t} \to X_{j,t}$
10: end for
11: Mark links with conflicting orientations as $\times\!-\!\times$
12: Apply rule R3 (orient unshielded triples that are not colliders and avoid cycles):
13: for all pairs of unambiguous triples $X_{i,t} \circ\!-\!\circ X_{k,t} \to X_{j,t}$ and $X_{i,t} \circ\!-\!\circ X_{l,t} \to X_{j,t}$ where $(X_{k,t}, X_{l,t})$ are not adjacent and $X_{i,t} \circ\!-\!\circ X_{j,t}$ do
14: Orient as $X_{i,t} \to X_{j,t}$
15: end for
16: Mark links with conflicting orientations as $\times\!-\!\times$
17: end while
18: return $\hat{G}$, conflicting links
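To make the conflict-marking logic concrete, the following sketch applies R1–R3 on a small graph representation that we assume for illustration only: nodes are strings (a lagged node can simply be labelled, e.g., `"Y_lag"`), directed links are ordered pairs, undirected links are frozensets, and ambiguous triples are passed as `(frozenset({end1, end2}), middle)` pairs. Each rule computes its proposals from a frozen snapshot of the graph, and a link proposed in both directions is marked $\times - \times$ instead of being oriented. The function name `orient_with_rules` is hypothetical; this is not the authors' implementation.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set, Tuple

Edge = Tuple[str, str]                # (a, b) encodes a -> b
Triple = Tuple[FrozenSet[str], str]   # ({end1, end2}, middle)


def orient_with_rules(nodes: Iterable[str], directed: Set[Edge],
                      undirected: Set[FrozenSet[str]],
                      ambiguous: Set[Triple] = frozenset()):
    """Apply R1-R3 rule by rule; each rule reads a frozen snapshot of the graph."""
    nodes = list(nodes)
    directed, undirected = set(directed), set(undirected)
    conflicts: Set[FrozenSet[str]] = set()  # links marked x-x

    def adjacent(a: str, b: str) -> bool:
        e = frozenset((a, b))
        return (a, b) in directed or (b, a) in directed or e in undirected or e in conflicts

    def unambiguous(end1: str, middle: str, end2: str) -> bool:
        return (frozenset((end1, end2)), middle) not in ambiguous

    def r1() -> Set[Edge]:
        # i -> k o-o j with i, j non-adjacent: orient k -> j.
        return {(k, j) for (i, k) in directed for j in nodes
                if j != i and frozenset((k, j)) in undirected
                and not adjacent(i, j) and unambiguous(i, k, j)}

    def r2() -> Set[Edge]:
        # i -> k -> j with i o-o j: orient i -> j to avoid a cycle.
        return {(i, j) for (i, k) in directed for (k2, j) in directed
                if k2 == k and frozenset((i, j)) in undirected and unambiguous(i, k, j)}

    def r3() -> Set[Edge]:
        # i o-o k -> j and i o-o l -> j, k and l non-adjacent, i o-o j: orient i -> j.
        out = set()
        for e in undirected:
            for i, j in (tuple(e), tuple(e)[::-1]):
                ks = [k for k in nodes if frozenset((i, k)) in undirected and (k, j) in directed]
                for k, l in combinations(ks, 2):
                    if not adjacent(k, l) and unambiguous(k, i, l):
                        out.add((i, j))
        return out

    def apply(proposals: Set[Edge]) -> bool:
        for a, b in proposals:
            if (b, a) in proposals:
                conflicts.add(frozenset((a, b)))  # both directions proposed: x-x
            else:
                directed.add((a, b))
            undirected.discard(frozenset((a, b)))
        return bool(proposals)

    changed = True
    while changed:  # every non-empty round removes an undirected link, so this terminates
        changed = False
        for rule in (r1, r2, r3):
            changed = apply(rule()) or changed
    return directed, undirected, conflicts
```

For example, starting from `Y_lag -> A`, `A o-o B`, `B o-o C` with `Y_lag` not adjacent to `B`, R1 orients `A -> B` and, in the next pass, `B -> C`. If two lagged parents push the same link in opposite directions, the link ends up in `conflicts` rather than being oriented, so the result does not depend on which proposal is processed first.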

Appendix B. Proofs of Theoretical Properties

Proof of Theorem 1.

The proof of consistency comprises three primary steps: first, demonstrating that Algorithms 2 and 3 return an estimated lagged parent set $\hat{B}^{-}(X_{j,t})$ that is a superset of the true lagged parent set $B^{-}(X_{j,t})$; second, establishing that Algorithm 4 accurately recovers the skeleton of the causal graph; and finally, establishing that all unshielded triples that are colliders are correctly identified.

Step 1: Superset Property of Estimated Lagged Parents. Since Algorithm 2 is adopted from Algorithm 1 in Runge [15], the estimated lagged parent set $\hat{B}^{-}(X_{j,t})$ satisfies $B^{-}(X_{j,t}) \subseteq \hat{B}^{-}(X_{j,t})$ by Lemma S1 in Runge [15]. Given oracle conditional independence tests, we assert that the outcome of the aMCI method is equivalent to that of MCI. Without loss of generality, it suffices to prove that if $X_{i,t-\tau-1}$ is not a confounder, then replacing $X_{i,t-\tau-1}$ with $X_{i,t-\tau-2}$ does not affect the conditional independence conclusion; that is, testing $X_{i,t-\tau} \perp\!\!\!\perp X_{j,t} \mid X_{i,t-\tau-2}$ instead of $X_{i,t-\tau} \perp\!\!\!\perp X_{j,t} \mid X_{i,t-\tau-1}$ yields the same result. Under the causal sufficiency and faithfulness assumptions, $X \perp\!\!\!\perp Y \mid Z$ if and only if $X$ is d-separated from $Y$ given $Z$. Under the temporal priority assumption, $X_{i,t-\tau-1}$ cannot be a collider. Therefore, if $X_{i,t-\tau-1}$ is also not a confounder, then by the definition of d-separation in Section 2, replacing $X_{i,t-\tau-1}$ with $X_{i,t-\tau-2}$ does not affect the d-separation between $X_{i,t-\tau}$ and $X_{j,t}$. That is, $X_{i,t-\tau} \perp\!\!\!\perp X_{j,t} \mid X_{i,t-\tau-1}$ if and only if $X_{i,t-\tau} \perp\!\!\!\perp X_{j,t} \mid X_{i,t-\tau-2}$. Since the two conditional independence statements are equivalent and the tests are assumed to be oracles, the outcome of the aMCI method is equivalent to that of MCI. Consequently, Algorithm 3 does not update the originally estimated lagged parents at $p = 0$. In the $p = 1$ phase, Algorithm 3 removes only those parents of a contemporaneous parent of $X_{j,t}$ that are not themselves direct parents of $X_{j,t}$. Therefore, the refined estimated lagged parent set remains a superset of the true lagged parent set $B^{-}(X_{j,t})$.
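Although the two tests agree under oracle conditions, the substitution matters in finite samples. This can be illustrated in a toy linear model that we assume for illustration only (it is not model (1) of the main text): $X_{1,t} = a X_{1,t-1} + \varepsilon_{1,t}$ and $X_{2,t} = b X_{1,t-1} + \varepsilon_{2,t}$ with independent unit-variance Gaussian noise. For the link $X_{1,t-1} \to X_{2,t}$, $X_{1,t-2}$ is a parent of $X_{1,t-1}$ but not a confounder. Conditioning on $X_{1,t-2}$ leaves only $\varepsilon_{1,t-1}$ as cause-side variation, giving partial correlation $b/\sqrt{b^2+1}$, whereas conditioning on $X_{1,t-3}$ retains $a\varepsilon_{1,t-2} + \varepsilon_{1,t-1}$, giving $b\sqrt{1+a^2}/\sqrt{b^2(1+a^2)+1}$, which is larger in absolute value whenever $a \neq 0$ and $b \neq 0$. Both vanish when $b = 0$, consistent with the oracle equivalence above. The sketch below (NumPy only; `simulate` and `partial_corr` are our own helper names) checks both values by simulation.

```python
import numpy as np


def partial_corr(x: np.ndarray, y: np.ndarray, z: np.ndarray) -> float:
    """Partial correlation of x and y given one conditioning series z (OLS residuals)."""
    design = np.column_stack([np.ones_like(z), z])
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])


def simulate(a: float, b: float, T: int, seed: int = 0):
    """Assumed toy model: X1_t = a X1_{t-1} + e1_t and X2_t = b X1_{t-1} + e2_t."""
    rng = np.random.default_rng(seed)
    e1, e2 = rng.standard_normal((2, T))
    x1 = np.zeros(T)
    for t in range(1, T):
        x1[t] = a * x1[t - 1] + e1[t]
    x2 = np.empty(T)
    x2[0] = e2[0]
    x2[1:] = b * x1[:-1] + e2[1:]
    return x1[1000:], x2[1000:]  # drop a burn-in period


a, b = 0.9, 0.3
x1, x2 = simulate(a, b, T=200_000)
x, y = x1[2:-1], x2[3:]                  # X1_{t-1} and X2_t
rho_mci = partial_corr(x, y, x1[1:-2])   # condition on X1_{t-2}
rho_amci = partial_corr(x, y, x1[:-3])   # condition on X1_{t-3} instead

# Population values implied by the toy model with unit-variance noise.
theory_mci = b / np.sqrt(b**2 + 1)
theory_amci = b * np.sqrt(1 + a**2) / np.sqrt(b**2 * (1 + a**2) + 1)
print(f"given X1_(t-2): {rho_mci:.3f} (theory {theory_mci:.3f})")
print(f"given X1_(t-3): {rho_amci:.3f} (theory {theory_amci:.3f})")
```

Under strong autocorrelation ($a = 0.9$) the effect size grows noticeably when the conditioning lag is pushed back, which is the intuition behind the causal signal enhancement in Figure 1D. The substitution is only legitimate when $X_{i,t-\tau-1}$ is not a confounder, as required above.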

Step 2: Correctness of the Skeleton Discovery. In this step, we prove $\hat{G}^{*} = G^{*}$ under Assumptions 1–5, where $G^{*}$ represents the skeleton of the causal graph. To establish this equality, it suffices to demonstrate that for arbitrary $X_{i,t-\tau}, X_{j,t}$ the following statements hold: (1) $X_{i,t-\tau} \star\!-\!\star X_{j,t} \notin \hat{G}^{*} \Rightarrow X_{i,t-\tau} \star\!-\!\star X_{j,t} \notin G^{*}$ and (2) $X_{i,t-\tau} \star\!-\!\star X_{j,t} \notin G^{*} \Rightarrow X_{i,t-\tau} \star\!-\!\star X_{j,t} \notin \hat{G}^{*}$, where $\star\!-\!\star$ denotes both types of skeleton links (directed links $\to$ and undirected links $\circ\!-\!\circ$) for simplicity.

(1): Algorithm 2 eliminates a link $X_{i,t-\tau} \star\!-\!\star X_{j,t}$ from $\hat{G}^{*}$ if and only if $X_{i,t-\tau} \perp\!\!\!\perp X_{j,t} \mid S_{ad}$ for some subset $S \subseteq \hat{A}(X_{j,t})$ during the iterative conditional independence tests, where $S_{ad}$ is the third output of the aMCI method and $\hat{A}(X_{j,t})$ denotes the estimated contemporaneous adjacencies. By faithfulness, this conditional independence implies that $X_{i,t-\tau}$ and $X_{j,t}$ are d-separated by $S_{ad}$ in the true causal graph; thus, $X_{i,t-\tau} \star\!-\!\star X_{j,t} \notin G^{*}$.
(2): According to the conclusion of Step 1, $(\hat{B}^{-}(X_{i,t-\tau}) \cup \hat{B}^{-}(X_{j,t})) \setminus \{X_{i,t-\tau}\}$ contains no descendants of $X_{j,t}$. Thus, the causal Markov condition yields $\big(X_{i,t-\tau}, (\hat{B}^{-}(X_{i,t-\tau}) \cup \hat{B}^{-}(X_{j,t})) \setminus \{X_{i,t-\tau}\}\big) \perp\!\!\!\perp X_{j,t} \mid P(X_{j,t})$. Applying the weak union property of conditional independence, we derive $X_{i,t-\tau} \perp\!\!\!\perp X_{j,t} \mid P(X_{j,t}) \cup \big((\hat{B}^{-}(X_{i,t-\tau}) \cup \hat{B}^{-}(X_{j,t})) \setminus \{X_{i,t-\tau}\}\big)$. Note that $X_{i,t-\tau} \notin P(X_{j,t})$ because $X_{i,t-\tau} \star\!-\!\star X_{j,t} \notin G^{*}$. For the case $\tau = 0$, we assume that $X_{i,t}$ is not a descendant of $X_{j,t}$ (the alternative case is covered by exchanging $X_{i,t}$ and $X_{j,t}$).
Now it suffices to prove that the conditioning set $P(X_{j,t}) \cup \big((\hat{B}^{-}(X_{i,t-\tau}) \cup \hat{B}^{-}(X_{j,t})) \setminus \{X_{i,t-\tau}\}\big)$ must be tested in Algorithm 4. Algorithm 4 systematically tests $X_{i,t-\tau} \perp\!\!\!\perp X_{j,t} \mid (S \cup \hat{B}^{-}(X_{i,t-\tau}) \cup \hat{B}^{-}(X_{j,t})) \setminus \{X_{i,t-\tau}\}$ across all subsets $S \subseteq \hat{A}(X_{j,t})$. The contrapositive of Step 2 (1) confirms that the estimated contemporaneous adjacencies always include the true contemporaneous adjacencies, i.e., $A(X_{j,t}) \subseteq \hat{A}(X_{j,t})$. Furthermore, Step 1 confirms that $\hat{B}^{-}(X_{j,t})$ contains all lagged parents of $X_{j,t}$, i.e., $B^{-}(X_{j,t}) \subseteq \hat{B}^{-}(X_{j,t})$. Consequently, taking $S$ to be the set of contemporaneous parents of $X_{j,t}$, which lies in $A(X_{j,t}) \subseteq \hat{A}(X_{j,t})$, gives $(S \cup \hat{B}^{-}(X_{i,t-\tau}) \cup \hat{B}^{-}(X_{j,t})) \setminus \{X_{i,t-\tau}\} = P(X_{j,t}) \cup \big((\hat{B}^{-}(X_{i,t-\tau}) \cup \hat{B}^{-}(X_{j,t})) \setminus \{X_{i,t-\tau}\}\big)$, so this set is tested during the iteration. Algorithm 4 therefore detects $X_{i,t-\tau} \perp\!\!\!\perp X_{j,t} \mid P(X_{j,t}) \cup \big((\hat{B}^{-}(X_{i,t-\tau}) \cup \hat{B}^{-}(X_{j,t})) \setminus \{X_{i,t-\tau}\}\big)$ and subsequently removes $X_{i,t-\tau} \star\!-\!\star X_{j,t}$ from $\hat{G}^{*}$.
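The Markov-plus-weak-union argument in (2) can be checked numerically on a small linear-Gaussian example that we assume purely for illustration. In the hypothetical DAG below, $X_i$ and $X_j$ are non-adjacent but dependent through a common parent $W$, $P(X_j) = \{W, U\}$, and $Z$ plays the role of an element of $\hat{B}^{-}(X_{i,t-\tau})$ (a non-descendant of $X_j$). The population partial correlation, computed from the SEM-implied covariance via the precision-matrix formula, vanishes given $P(X_j)$ and stays zero when $Z$ is added, while the marginal and $Z$-only versions do not. Function and variable names are our own.

```python
import numpy as np


def implied_covariance(B: np.ndarray, noise_var: np.ndarray) -> np.ndarray:
    """Covariance of the linear SEM X = B X + e, where B[target, source] is an edge weight."""
    A = np.linalg.inv(np.eye(B.shape[0]) - B)
    return A @ np.diag(noise_var) @ A.T


def partial_corr_from_cov(cov: np.ndarray, i: int, j: int, cond: list) -> float:
    """Population partial correlation of variables i and j given `cond`."""
    idx = [i, j] + list(cond)
    prec = np.linalg.inv(cov[np.ix_(idx, idx)])
    return float(-prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1]))


# Hypothetical DAG: Z -> X_i, W -> X_i, W -> X_j, Z -> U, U -> X_j (X_i, X_j non-adjacent).
Z, W, XI, U, XJ = range(5)
B = np.zeros((5, 5))
B[XI, Z], B[XI, W], B[XJ, W], B[U, Z], B[XJ, U] = 0.8, 0.7, 0.6, 0.4, 0.5
cov = implied_covariance(B, np.ones(5))

marginal = partial_corr_from_cov(cov, XI, XJ, [])
lagged_only = partial_corr_from_cov(cov, XI, XJ, [Z])        # path through W still open
parents_only = partial_corr_from_cov(cov, XI, XJ, [W, U])    # P(X_j): Markov condition
parents_plus = partial_corr_from_cov(cov, XI, XJ, [W, U, Z]) # add a non-descendant: weak union
print(marginal, lagged_only, parents_only, parents_plus)
```

The last two values are zero up to floating-point error, mirroring the claim that enlarging $P(X_{j,t})$ by lagged non-descendants preserves the independence that Algorithm 4 needs to detect.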

Step 3: The proof that all unshielded triples that are colliders are correctly identified involves two aspects: (1) if unshielded triples are oriented as colliders in Algorithm A1, then these triples are truly colliders (establishing the correctness of the collider orientation phase); and (2) all unshielded triples that are colliders are correctly identified. Based on the established correctness of the skeleton discovery in Step 2 and the reliability of oracle conditional independence tests, the triples that are oriented as colliders in Line 10 of Algorithm A1 are correct. Therefore, it remains to show that all unshielded triples that are colliders are correctly identified.

Consider a generic triple $X_{i,t_{i}} \star\!-\!\star X_{k,t_{k}} \star\!-\!\star X_{j,t_{j}}$; by stationarity, we can fix $t_{j} = t$. Time order constraints and stationarity allow us to reduce the analysis to two specific cases: $X_{i,t-\tau} \to X_{k,t} \circ\!-\!\circ X_{j,t}$ (for $\tau > 0$) or $X_{i,t} \circ\!-\!\circ X_{k,t} \circ\!-\!\circ X_{j,t}$ (for $\tau = 0$) in $G$, where $X_{i,t-\tau}$ and $X_{j,t}$ are not adjacent. Since $X_{k,t}$ is contemporaneous with $X_{j,t}$, only the contemporaneous components of separating sets are relevant for the collider orientation phase. Given the correctness of the skeleton discovery and the fact that $(X_{i,t-\tau}, X_{j,t})$ are not adjacent, there exists a subset $S$ such that $X_{i,t-\tau} \perp\!\!\!\perp X_{j,t} \mid (S \cup \hat{B}^{-}(X_{i,t-\tau}) \cup \hat{B}^{-}(X_{j,t})) \setminus \{X_{i,t-\tau}\}$, so Algorithm A1 finds at least one separating subset. Furthermore, if the triple is a collider, then by the faithfulness assumption and the definition of d-separation, $X_{k,t}$ cannot belong to any set $S$ for which $X_{i,t-\tau} \perp\!\!\!\perp X_{j,t} \mid (S \cup \hat{B}^{-}(X_{i,t-\tau}) \cup \hat{B}^{-}(X_{j,t})) \setminus \{X_{i,t-\tau}\}$ holds, because conditioning on a collider opens the path $X_{i,t-\tau} \to X_{k,t} \leftarrow X_{j,t}$. Hence $n_{k} = 0$, and Line 10 of Algorithm A1 orients the triple as a collider. This implies that all unshielded triples that are colliders are correctly identified by Algorithm A1. □
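The key fact used in Step 3, that the middle node of a collider never belongs to a separating set, can be checked mechanically with a standard d-separation test. The sketch below implements the generic ancestral moral graph criterion; it is a textbook procedure, not part of ECD-aMCI, and the parent-map representation and node labels such as `X1_t-1` are our own assumptions.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Iterable, Set


def ancestral_closure(parents: Dict[str, Set[str]], nodes: Iterable[str]) -> Set[str]:
    """Return the given nodes together with all of their ancestors."""
    closure, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in closure:
            closure.add(v)
            stack.extend(parents.get(v, ()))
    return closure


def d_separated(parents: Dict[str, Set[str]], x: str, y: str, z: Set[str]) -> bool:
    """True iff x and y are d-separated by z in the DAG given as a parent map.

    Keep only the ancestral closure of {x, y} | z, link each node to its parents,
    marry every pair of co-parents, drop directions, and check whether every
    path from x to y passes through z.
    """
    z = set(z)
    keep = ancestral_closure(parents, {x, y} | z)
    nbrs: Dict[str, Set[str]] = {v: set() for v in keep}
    for v in keep:
        pa = set(parents.get(v, ()))
        for p in pa:                      # undirected version of each edge p -> v
            nbrs[v].add(p)
            nbrs[p].add(v)
        for p, q in combinations(pa, 2):  # marry co-parents
            nbrs[p].add(q)
            nbrs[q].add(p)
    seen, queue = {x}, deque([x])
    while queue:                          # breadth-first search that avoids z
        v = queue.popleft()
        if v == y:
            return False
        for w in nbrs[v] - z - seen:
            seen.add(w)
            queue.append(w)
    return True


collider = {"X3_t": {"X1_t-1", "X2_t"}}   # X1_{t-1} -> X3_t <- X2_t
chain = {"X3_t": {"X1_t-1"}, "X2_t": {"X3_t"}}  # X1_{t-1} -> X3_t -> X2_t
```

In the collider graph, conditioning on `X3_t` turns a d-separated pair into a d-connected one, so `X3_t` cannot appear in any separating set, which is exactly the situation that yields $n_k = 0$ in Algorithm A1; in the chain, the opposite holds.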

Proof of Theorem 2.

The proof of the order-independence consists of three main steps: first demonstrating that the aMCI method is order-independent, then proving that the estimation of refined lagged parent sets and complete skeleton is order-independent, and finally establishing the order-independence of the orientation process.

Step 1: Order-independence of the aMCI method. The aMCI method, by definition, operates as a mapping that takes the inputs $(X_{i,t-\tau}, X_{j,t}, S, \hat{B}^{-}(X_{i,t-\tau}), \hat{B}^{-}(X_{j,t}))$ and produces outcomes determined solely by these inputs. Since the outcomes are completely determined by the input values rather than by the order in which variables are processed, the application of the aMCI method preserves the order-independence property of any algorithm that incorporates it.

Step 2: Order-independence of parent set estimation and skeleton discovery. The processes described in Algorithms 2–4 maintain order-independence by adopting the idea of PC-stable. In Algorithms 2–4, edge removals are executed only after completing each iteration over conditioning sets of cardinality p. As established in Step 1, the integration of the aMCI method does not compromise this property, thereby ensuring that the estimation of refined lagged parent sets and the discovery of the complete skeleton remain order-independent.
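The deferred-removal idea can be sketched as follows. This is a simplified illustration under assumptions of ours: only contemporaneous links are considered, the conditioning set is $S$ together with both estimated lagged parent sets (as in Algorithm 4), the adaptive replacement performed by aMCI is not modeled, separating sets are omitted, and `indep` is a hypothetical oracle. Because removals are applied only after all pairs have been tested at cardinality $p$, the returned skeleton does not depend on the order of `nodes`.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Set

# Hypothetical oracle: returns True when i and j are independent given `cond`.
CIOracle = Callable[[str, str, FrozenSet[str]], bool]


def stable_contemporaneous_skeleton(
    nodes: List[str], lagged: Dict[str, Set[str]], indep: CIOracle
) -> Dict[str, Set[str]]:
    """PC-stable-style skeleton search over contemporaneous links.

    Removals found at cardinality p are applied only after every pair has been
    tested at that cardinality, so adjacency sets are frozen within a level.
    """
    adj = {v: set(nodes) - {v} for v in nodes}
    p = 0
    while any(len(adj[j]) - 1 >= p for j in nodes):
        to_remove: Set[FrozenSet[str]] = set()
        for j in nodes:
            for i in sorted(adj[j]):
                for S in combinations(sorted(adj[j] - {i}), p):
                    # Conditioning set: S plus both estimated lagged parent sets.
                    cond = frozenset(set(S) | lagged.get(i, set()) | lagged.get(j, set())) - {i, j}
                    if indep(i, j, cond):
                        to_remove.add(frozenset((i, j)))
                        break
        for e in to_remove:  # deferred removal: the PC-stable idea
            a, b = tuple(e)
            adj[a].discard(b)
            adj[b].discard(a)
        p += 1
    return adj


def chain_oracle(i: str, j: str, cond: FrozenSet[str]) -> bool:
    # A -> B -> C: the only independence is A and C given any set containing B.
    return {i, j} == {"A", "C"} and "B" in cond
```

Running `stable_contemporaneous_skeleton` with `["A", "B", "C"]` or `["C", "B", "A"]` returns the same skeleton `A - B - C`. Separating sets, by contrast, can still depend on processing order, which is why the orientation phases need the conservative marking discussed in Step 3.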

Step 3: Order-independence of orientation phases. The orientation process maintains order-independence through careful management of potential conflicts. Following the methodology established in [15], the collider identification phase (Algorithm A1) and the rule-based orientation phase (Algorithm A2) preserve order-independence by implementing two key strategies: (1) ambiguity marking for triples with inconsistent separating sets, and (2) consistent marking of conflicting link orientations using the notation $\times\!-\!\times$. □

Appendix C. Detailed Hyperparameter Settings

This section presents all hyperparameters required for the experiments in linear and nonlinear settings.

Appendix C.1. Hyperparameters of Linear Settings

All hyperparameters required for the experiments in Figure 5 are provided below.

ECD-aMCI, PCMCI+: significance level $\alpha = 0.01$, maximum time lag $\tau_{\max} = 5$, conditional independence test $CI = \text{ParCorr}$.
Bagged-PCMCI+: significance level $\alpha = 0.01$, maximum time lag $\tau_{\max} = 5$, conditional independence test $CI = \text{ParCorr}$, number of bootstrap samples $B_{\text{bagged}} = 50$.
NTS-NOTEARS: $\lambda_{1} = 0.01$ for $T \in \{250, 500\}$, $\lambda_{1} = 0.001$ for $T \in \{750, 1250\}$, $\lambda_{2} = 0.05$, $K = 5$, $m = d$, number of hidden layers = 1.

Appendix C.2. Hyperparameters of Nonlinear Settings

All hyperparameters required for the experiments in Figure 6 are provided below.

ECD-aMCI, PCMCI+: significance level $\alpha = 0.01$, maximum time lag $\tau_{\max} = 3$, conditional independence test $CI = \text{GPDC}$.
NTS-NOTEARS: $\lambda_{1} = 0.01$ for $T \in \{250, 500\}$, $\lambda_{1} = 0.001$ for $T \in \{750, 1250\}$, $\lambda_{2} = 0.05$, $K = 3$, $m = d$, number of hidden layers = 1.

References

Miao, W.; Shi, X.; Li, Y.; Tchetgen, E.J.T. A confounding bridge approach for double negative control inference on causal effects. Stat. Theory Relat. Fields 2024, 8, 262–273.
Xu, T.; Zhao, J. Relaxed doubly robust estimation in causal inference. Stat. Theory Relat. Fields 2024, 8, 69–79.
Nogueira, A.R.; Pugnana, A.; Ruggieri, S.; Pedreschi, D.; Gama, J. Methods and tools for causal discovery and causal inference. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2022, 12, e1449.
Gong, C.; Zhang, C.; Yao, D.; Bi, J.; Li, W.; Xu, Y. Causal discovery from temporal data: An overview and new perspectives. ACM Comput. Surv. 2024, 57, 1–38.
Runge, J.; Nowack, P.; Kretschmer, M.; Flaxman, S.; Sejdinovic, D. Detecting and quantifying causal associations in large nonlinear time series datasets. Sci. Adv. 2019, 5, eaau4996.
Hyvärinen, A.; Zhang, K.; Shimizu, S.; Hoyer, P.O. Estimation of a structural vector autoregression model using non-Gaussianity. J. Mach. Learn. Res. 2010, 11, 1709–1731.
Peters, J.; Janzing, D.; Schölkopf, B. Causal inference on time series using restricted structural equation models. Adv. Neural Inf. Process. Syst. 2013, 26, 154–162.
Assaad, C.K.; Devijver, E.; Gaussier, E.; Ait-Bachir, A. A Mixed Noise and Constraint-Based Approach to Causal Inference in Time Series. In Proceedings of the Machine Learning and Knowledge Discovery in Databases (Research Track); Springer: Cham, Switzerland, 2021; pp. 453–468.
Wu, T.; Wu, X.; Wang, X.; Liu, S.; Chen, H. Nonlinear Causal Discovery in Time Series. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management; Association for Computing Machinery: New York, NY, USA, 2022; CIKM ’22; pp. 4575–4579.
Löwe, S.; Madras, D.; Zemel, R.; Welling, M. Amortized Causal Discovery: Learning to Infer Causal Graphs from Time-Series Data. In Proceedings of the First Conference on Causal Learning and Reasoning; PMLR; JMLR: Cambridge, MA, USA, 2022; Volume 177, pp. 509–525.
Li, H.; Yu, S.; Principe, J. Causal Recurrent Variational Autoencoder for Medical Time Series Generation. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Press: Washington, DC, USA, 2023; Volume 37, pp. 8562–8570.
Zheng, X.; Aragam, B.; Ravikumar, P.K.; Xing, E.P. DAGs with NO TEARS: Continuous optimization for structure learning. Adv. Neural Inf. Process. Syst. 2018, 31, 9492–9503.
Pamfil, R.; Sriwattanaworachai, N.; Desai, S.; Pilgerstorfer, P.; Georgatzis, K.; Beaumont, P.; Aragam, B. DYNOTEARS: Structure learning from time-series data. In Proceedings of the International Conference on Artificial Intelligence and Statistics; PMLR; JMLR: Cambridge, MA, USA, 2020; pp. 1595–1605.
Sun, X.; Schulte, O.; Liu, G.; Poupart, P. NTS-NOTEARS: Learning Nonparametric DBNs With Prior Knowledge. arXiv 2023, arXiv:2109.04286.
Runge, J. Discovering contemporaneous and lagged causal relations in autocorrelated nonlinear time series datasets. In Proceedings of the Conference on Uncertainty in Artificial Intelligence; PMLR; JMLR: Cambridge, MA, USA, 2020; pp. 1388–1397.
Debeire, K.; Gerhardus, A.; Runge, J.; Eyring, V. Bootstrap aggregation and confidence measures to improve time series causal discovery. In Proceedings of the Causal Learning and Reasoning; PMLR; JMLR: Cambridge, MA, USA, 2024; pp. 979–1007.
Pearl, J. Causality; Cambridge University Press: Cambridge, UK, 2009.
Spirtes, P.; Glymour, C.; Scheines, R.; Kauffman, S.; Aimale, V.; Wimberly, F. Constructing Bayesian Network Models of Gene Expression Networks from Microarray Data. In Proceedings of the Atlantic Symposium on Computational Biology, Genome Information Systems & Technology; Association for Intelligent Machinery: Durham, NC, USA, 2002.
Meek, C. Causal inference and causal explanation with background knowledge. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, San Francisco, CA, USA, 18–20 August 1995; UAI’95. pp. 403–410.
Spirtes, P.; Glymour, C.; Scheines, R. Causation, Prediction, and Search; MIT Press: Cambridge, MA, USA, 2001.
Biswas, R.; Shlizerman, E. Statistical perspective on functional and causal neural connectomics: The Time-Aware PC algorithm. PLoS Comput. Biol. 2022, 18, e1010653.
Colombo, D.; Maathuis, M.H. Order-independent constraint-based causal structure learning. J. Mach. Learn. Res. 2014, 15, 3741–3782.
Hotelling, H. New Light on the Correlation Coefficient and its Transforms. J. R. Stat. Soc. Ser. B (Methodol.) 1953, 15, 193–225.
Runge, J. Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information. In Proceedings of the International Conference on Artificial Intelligence and Statistics; PMLR; JMLR: Cambridge, MA, USA, 2018; pp. 938–947.
Reisach, A.; Seiler, C.; Weichwald, S. Beware of the simulated DAG! Causal discovery benchmarks may be easy to game. Adv. Neural Inf. Process. Syst. 2021, 34, 27772–27784.
Smith, S.M.; Miller, K.L.; Salimi-Khorshidi, G.; Webster, M.; Beckmann, C.F.; Nichols, T.E.; Ramsey, J.D.; Woolrich, M.W. Network modelling methods for FMRI. Neuroimage 2011, 54, 875–891.
Daley, D.J.; Vere-Jones, D. An Introduction to the Theory of Point Processes: Volume I: Elementary Theory and Methods; Springer: New York, NY, USA, 2003.
Truccolo, W.; Eden, U.T.; Fellows, M.R.; Donoghue, J.P.; Brown, E.N. A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. J. Neurophysiol. 2005, 93, 1074–1089.
Eichler, M.; Dahlhaus, R.; Dueck, J. Graphical modeling for multivariate Hawkes processes with nonparametric link functions. J. Time Ser. Anal. 2017, 38, 225–242.
Entner, D.; Hoyer, P. On causal discovery from time series data using FCI. In Proceedings of the 5th European Workshop on Probabilistic Graphical Models, Helsinki, Finland, 13–15 September 2010; Myllymäki, P., Roos, T., Jaakkola, T., Eds.; HIIT: Helsinki, Finland, 2010; pp. 121–128.
Malinsky, D.; Spirtes, P. Causal Structure Learning from Multivariate Time Series in Settings with Unmeasured Confounding. In Proceedings of the 2018 ACM SIGKDD Workshop on Causal Discovery; Le, T.D., Zhang, K., Kıcıman, E., Hyvärinen, A., Liu, L., Eds.; PMLR; JMLR: Cambridge, MA, USA, 2018; Volume 92, pp. 23–47.
Huang, B.; Zhang, K.; Gong, M.; Glymour, C. Causal discovery and forecasting in nonstationary environments with state-space models. In Proceedings of the International Conference on Machine Learning; PMLR; JMLR: Cambridge, MA, USA, 2019; pp. 2901–2910.
Zhang, K.; Huang, B.; Zhang, J.; Glymour, C.; Schölkopf, B. Causal discovery from nonstationary/heterogeneous data: Skeleton estimation and orientation determination. In Proceedings of the IJCAI: Proceedings of the Conference; AAAI Press: Washington, DC, USA, 2017; Volume 2017, p. 1347.
Mameche, S.; Cornanguer, L.; Ninad, U.; Vreeken, J. SPACETIME: Causal Discovery from Non-Stationary Time Series. Proc. AAAI Conf. Artif. Intell. 2025, 39, 19405–19413.

Figure 1. (A) Causal graph corresponding to model (1). (B) Alternative causal graph with $X_{1,t-2}$ as a confounder. (C) Causal signal loss when conditioning on $X_{1,t-2}$ and $X_{2,t-1}$ under causal graph (A). (D) Causal signal enhancement when conditioning on $X_{1,t-3}$ and $X_{2,t-1}$ under causal graph (A).

Figure 2. The difference in conditioning set selection between the MCI and aMCI methods for lagged links. Light blue nodes represent the pair of variables under consideration, and light green nodes represent the variables in the conditioning set.

Figure 3. The difference in conditioning set selection between the MCI and aMCI methods for instantaneous links. Light blue nodes represent the pair of variables under consideration, and light green nodes represent the variables in the conditioning set.

Figure 4. Flowchart of the multi-phase ECD-aMCI algorithm. The red solid arrows represent falsely identified causal edges, while the red dashed arrows denote missed causal edges.

Figure 5. Mean metrics over 300 datasets for each linear setting.

Figure 6. Mean metrics over 50 datasets for each nonlinear setting.

Figure 7. The results of learning a causal graph on an fMRI dataset ($d = 5$) using different causal discovery algorithms.

Table 1. Summary and comparison of causal discovery methods for time series data.

Category	Method	Graph Type	Nonlinear	Instantaneous
SCM-based	VAR-LiNGAM	Window	No	Yes
SCM-based	TiMINo	Summary	Yes	No
SCM-based	NBCB	Summary	Yes	Yes
SCM-based	NCDH	Summary	Yes	No
GC-based	ACD	Summary	Yes	No
GC-based	CR-VAE	Summary	Yes	Yes
Score-based	DYNOTEARS	Window	No	Yes
Score-based	NTS-NOTEARS	Window	Yes	Yes
Constraint-based	PCMCI	Window	Yes	No
Constraint-based	PCMCI+	Window	Yes	Yes
Constraint-based	Bagged-PCMCI+	Window	Yes	Yes

Table 2. Summary of mathematical notations.

Notation	Description
$P (X_{j, t})$	True parent set of $X_{j, t}$ in $G$
${\hat{B}}^{-} (X_{j, t})$	Estimated lagged parent set of $X_{j, t}$
$B^{-} (X_{j, t})$	True lagged parent set of $X_{j, t}$
$\hat{A} (X_{j, t})$	Estimated contemporaneous adjacency set of $X_{j, t}$
$A (X_{j, t})$	True contemporaneous adjacency set of $X_{j, t}$
$\mathrm{aMCI}(\cdot)$	Adaptive Momentary Conditional Independence method
$\perp\!\!\!\perp$	Conditional independence relation
$\star\!-\!\star$	Generic link (directed $\to$ or undirected $\circ\!-\!\circ$)
$\times\!-\!\times$	Conflicting link orientation
d-separation	Graph-theoretic conditional independence criterion

Table 3. Summary of causal discovery algorithms for time series.

Category	Method	Window	Contemporaneous
CB	PCMCI+	Y	Y
CB	Bagged-PCMCI+	Y	Y
SB	NTS-NOTEARS	Y	Y
SB	DYNOTEARS	Y	Y
SCM	VAR-LiNGAM	Y	Y
SCM	TiMINo	N	Y
SCM	NBCB	N	Y
SCM	NCDH	N	N
GC	ACD	N	N
GC	CR-VAE	N	Y

Table 4. Performance comparison of methods under linear setting (varying a).

Metric	Method	a = 0.9	a = 0.8	a = 0.7	a = 0.6	a = 0.5
F1-score_lagged	ECD-aMCI	0.775 ± 0.106	0.848 ± 0.077	0.873 ± 0.072	0.884 ± 0.067	0.886 ± 0.064
	PCMCI+	0.519 ± 0.156	0.695 ± 0.118	0.807 ± 0.086	0.848 ± 0.073	0.867 ± 0.066
	Bagged-PCMCI+	0.456 ± 0.148	0.586 ± 0.121	0.672 ± 0.101	0.679 ± 0.098	0.671 ± 0.091
	NTS-NOTEARS	0.240 ± 0.150	0.493 ± 0.194	0.708 ± 0.097	0.799 ± 0.074	0.837 ± 0.071
F1-score_all	ECD-aMCI	0.875 ± 0.054	0.917 ± 0.037	0.928 ± 0.035	0.932 ± 0.034	0.931 ± 0.033
	PCMCI+	0.785 ± 0.066	0.857 ± 0.046	0.900 ± 0.037	0.916 ± 0.034	0.922 ± 0.033
	Bagged-PCMCI+	0.724 ± 0.069	0.779 ± 0.054	0.810 ± 0.048	0.808 ± 0.050	0.798 ± 0.050
	NTS-NOTEARS	0.355 ± 0.122	0.564 ± 0.160	0.753 ± 0.059	0.850 ± 0.043	0.883 ± 0.042
SHD	ECD-aMCI	11.603 ± 4.102	8.903 ± 3.169	8.643 ± 3.107	9.410 ± 3.121	10.547 ± 3.126
	PCMCI+	16.673 ± 4.559	12.570 ± 3.393	10.560 ± 3.039	10.543 ± 2.958	11.207 ± 2.997
	Bagged-PCMCI+	22.203 ± 4.877	18.997 ± 4.030	17.947 ± 4.004	18.977 ± 4.247	20.717 ± 4.186
	NTS-NOTEARS	67.903 ± 16.787	38.937 ± 10.174	22.957 ± 5.424	14.623 ± 4.004	11.793 ± 3.583
Runtime (s)	ECD-aMCI	35.68 ± 21.64	21.15 ± 12.93	18.80 ± 13.11	12.30 ± 4.22	11.53 ± 3.60
	PCMCI+	20.92 ± 13.96	10.80 ± 5.61	9.71 ± 6.63	6.66 ± 2.28	4.34 ± 0.97
	Bagged-PCMCI+	439.70 ± 120.31	359.30 ± 89.56	339.73 ± 89.41	330.18 ± 75.78	314.70 ± 74.63
	NTS-NOTEARS	13.76 ± 9.98	15.59 ± 11.42	13.13 ± 5.02	14.15 ± 4.80	14.43 ± 5.52

Note: Bold values in all tables indicate the best performance in their respective columns.

Table 5. Performance comparison of methods under linear setting (varying T).

Metric	Method	T = 250	T = 500	T = 750	T = 1000
F1-score_lagged	ECD-aMCI	0.592 ± 0.134	0.848 ± 0.077	0.920 ± 0.055	0.945 ± 0.046
	PCMCI+	0.459 ± 0.135	0.695 ± 0.118	0.796 ± 0.106	0.844 ± 0.080
	Bagged-PCMCI+	0.374 ± 0.131	0.586 ± 0.121	0.705 ± 0.113	0.760 ± 0.093
	NTS-NOTEARS	0.426 ± 0.191	0.493 ± 0.194	0.531 ± 0.158	0.537 ± 0.164
F1-score_all	ECD-aMCI	0.781 ± 0.058	0.917 ± 0.037	0.954 ± 0.027	0.967 ± 0.024
	PCMCI+	0.709 ± 0.060	0.857 ± 0.046	0.909 ± 0.043	0.932 ± 0.032
	Bagged-PCMCI+	0.647 ± 0.062	0.779 ± 0.054	0.844 ± 0.051	0.874 ± 0.043
	NTS-NOTEARS	0.522 ± 0.173	0.564 ± 0.160	0.607 ± 0.135	0.610 ± 0.142
SHD	ECD-aMCI	19.317 ± 3.854	8.903 ± 3.169	5.267 ± 2.631	3.783 ± 2.330
	PCMCI+	23.360 ± 3.556	12.570 ± 3.393	8.123 ± 3.421	5.857 ± 2.586
	Bagged-PCMCI+	28.003 ± 4.234	18.997 ± 4.030	14.283 ± 4.167	11.780 ± 3.603
	NTS-NOTEARS	39.940 ± 7.682	38.937 ± 10.174	39.063 ± 7.767	37.303 ± 6.749
Runtime (s)	ECD-aMCI	9.790 ± 2.842	21.154 ± 12.929	16.803 ± 5.779	17.980 ± 7.912
	PCMCI+	3.375 ± 0.887	10.797 ± 5.614	8.548 ± 2.449	10.487 ± 3.330
	Bagged-PCMCI+	235.378 ± 51.948	359.303 ± 89.555	512.583 ± 113.982	589.065 ± 137.459
	NTS-NOTEARS	10.321 ± 3.378	15.589 ± 11.417	22.436 ± 15.596	18.564 ± 4.539

Table 6. Performance comparison of methods under linear setting (varying d).

Metric	Method	d = 10	d = 20	d = 30	d = 40
F1-score_lagged	ECD-aMCI	0.945 ± 0.046	0.935 ± 0.031	0.917 ± 0.028	0.897 ± 0.027
	PCMCI+	0.844 ± 0.080	0.833 ± 0.101	0.807 ± 0.108	0.785 ± 0.115
	Bagged-PCMCI+	0.760 ± 0.093	0.739 ± 0.095	0.700 ± 0.095	0.669 ± 0.097
	NTS-NOTEARS	0.537 ± 0.164	0.486 ± 0.176	0.423 ± 0.189	0.388 ± 0.198
F1-score_all	ECD-aMCI	0.967 ± 0.024	0.957 ± 0.018	0.944 ± 0.016	0.931 ± 0.018
	PCMCI+	0.932 ± 0.032	0.924 ± 0.042	0.909 ± 0.047	0.897 ± 0.050
	Bagged-PCMCI+	0.874 ± 0.043	0.839 ± 0.044	0.805 ± 0.042	0.776 ± 0.043
	NTS-NOTEARS	0.610 ± 0.142	0.563 ± 0.161	0.510 ± 0.184	0.480 ± 0.189
SHD	ECD-aMCI	3.783 ± 2.330	9.077 ± 3.445	15.963 ± 4.585	26.157 ± 7.028
	PCMCI+	5.857 ± 2.586	12.957 ± 6.062	21.823 ± 10.238	32.750 ± 14.379
	Bagged-PCMCI+	11.780 ± 3.603	29.827 ± 6.990	53.683 ± 9.811	82.400 ± 12.798
	NTS-NOTEARS	37.303 ± 6.749	79.283 ± 11.669	126.643 ± 13.792	170.520 ± 17.492
Runtime (s)	ECD-aMCI	17.980 ± 7.912	68.724 ± 18.949	160.456 ± 66.461	337.322 ± 183.383
	PCMCI+	10.487 ± 3.330	38.660 ± 13.031	85.480 ± 32.818	176.178 ± 102.563
	Bagged-PCMCI+	589.065 ± 137.459	2571.851 ± 699.392	6732.931 ± 2098.731	13,243.336 ± 3738.661
	NTS-NOTEARS	18.564 ± 4.539	99.162 ± 53.911	384.579 ± 188.730	795.130 ± 407.888

Table 7. Performance comparison of methods under nonlinear setting (varying a).

Metric	Method	a = 0.9	a = 0.8	a = 0.7	a = 0.6	a = 0.5
F1-score_lagged	ECD-aMCI	0.737 ± 0.101	0.752 ± 0.107	0.778 ± 0.093	0.794 ± 0.079	0.816 ± 0.077
	PCMCI+	0.685 ± 0.117	0.713 ± 0.103	0.738 ± 0.101	0.763 ± 0.097	0.784 ± 0.092
	NTS-NOTEARS	0.718 ± 0.103	0.745 ± 0.091	0.766 ± 0.084	0.789 ± 0.078	0.809 ± 0.081
F1-score_all	ECD-aMCI	0.850 ± 0.037	0.857 ± 0.039	0.868 ± 0.036	0.870 ± 0.032	0.869 ± 0.035
	PCMCI+	0.806 ± 0.045	0.825 ± 0.040	0.837 ± 0.038	0.847 ± 0.034	0.849 ± 0.035
	NTS-NOTEARS	0.812 ± 0.046	0.834 ± 0.041	0.854 ± 0.041	0.863 ± 0.039	0.872 ± 0.040
SHD	ECD-aMCI	20.920 ± 3.123	20.600 ± 3.280	20.280 ± 2.892	20.640 ± 2.544	21.120 ± 2.903
	PCMCI+	24.320 ± 3.552	23.300 ± 3.132	22.760 ± 3.178	22.300 ± 2.744	22.440 ± 2.815
	NTS-NOTEARS	24.540 ± 4.679	22.400 ± 3.863	20.560 ± 4.253	19.400 ± 3.950	18.660 ± 4.246
Runtime (s)	ECD-aMCI	4423.80 ± 696.92	4467.35 ± 876.32	4733.17 ± 823.10	3692.25 ± 836.55	4046.83 ± 1067.61
	PCMCI+	1649.92 ± 388.47	1811.52 ± 414.25	1951.26 ± 465.31	1550.20 ± 356.56	1515.32 ± 424.96
	NTS-NOTEARS	8.52 ± 1.05	8.40 ± 1.41	9.06 ± 1.26	6.08 ± 0.51	10.27 ± 2.06

Table 8. Performance comparison of methods under nonlinear setting (varying T).

Metric	Method	T = 250	T = 500	T = 750	T = 1000
F1-score_lagged	ECD-aMCI	0.425 ± 0.141	0.752 ± 0.107	0.890 ± 0.063	0.944 ± 0.045
	PCMCI+	0.387 ± 0.137	0.713 ± 0.103	0.839 ± 0.083	0.899 ± 0.046
	NTS-NOTEARS	0.566 ± 0.094	0.745 ± 0.091	0.824 ± 0.074	0.841 ± 0.060
F1-score_all	ECD-aMCI	0.660 ± 0.052	0.857 ± 0.039	0.938 ± 0.027	0.966 ± 0.022
	PCMCI+	0.610 ± 0.051	0.825 ± 0.040	0.911 ± 0.030	0.948 ± 0.024
	NTS-NOTEARS	0.716 ± 0.051	0.834 ± 0.041	0.883 ± 0.034	0.900 ± 0.031
SHD	ECD-aMCI	34.440 ± 3.226	20.600 ± 3.280	13.240 ± 3.734	9.460 ± 3.390
	PCMCI+	37.060 ± 2.760	23.300 ± 3.132	15.420 ± 3.909	11.220 ± 3.324
	NTS-NOTEARS	32.060 ± 4.688	22.400 ± 3.863	17.900 ± 4.239	16.200 ± 3.980
Runtime (s)	ECD-aMCI	238.83 ± 31.69	4467.35 ± 876.32	10,259.70 ± 2007.56	23,890.22 ± 3613.27
	PCMCI+	53.76 ± 9.85	1811.52 ± 414.25	4985.80 ± 1353.46	14,973.02 ± 3799.10
	NTS-NOTEARS	5.60 ± 0.35	8.40 ± 1.41	7.31 ± 0.75	8.28 ± 0.74

Table 9. Performance comparison of methods under nonlinear setting (varying d).

Metric	Method	d = 10	d = 20	d = 30	d = 40
F1-score_lagged	ECD-aMCI	0.752 ± 0.107	0.782 ± 0.057	0.755 ± 0.043	0.733 ± 0.047
	PCMCI+	0.713 ± 0.103	0.757 ± 0.068	0.731 ± 0.051	0.730 ± 0.048
	NTS-NOTEARS	0.745 ± 0.091	0.741 ± 0.072	0.727 ± 0.046	0.729 ± 0.050
F1-score_all	ECD-aMCI	0.857 ± 0.039	0.860 ± 0.026	0.852 ± 0.021	0.832 ± 0.024
	PCMCI+	0.825 ± 0.040	0.834 ± 0.030	0.827 ± 0.021	0.816 ± 0.023
	NTS-NOTEARS	0.834 ± 0.041	0.818 ± 0.036	0.818 ± 0.020	0.813 ± 0.026
SHD	ECD-aMCI	20.600 ± 3.280	42.400 ± 5.223	66.040 ± 6.579	95.880 ± 9.024
	PCMCI+	23.300 ± 3.132	46.500 ± 5.442	71.980 ± 5.941	100.840 ± 7.630
	NTS-NOTEARS	22.400 ± 3.863	45.240 ± 6.445	66.900 ± 6.655	92.540 ± 9.104
Runtime (s)	ECD-aMCI	4467.35 ± 876.32	13,977.40 ± 3313.75	26,279.66 ± 4543.04	43,190.11 ± 8954.29
	PCMCI+	1811.52 ± 414.25	3949.59 ± 1480.62	7322.64 ± 1707.84	9831.77 ± 2665.12
	NTS-NOTEARS	8.40 ± 1.41	29.66 ± 3.47	192.25 ± 84.32	473.98 ± 173.88

Table 10. Performance comparison on 50 fMRI datasets with 200 observations each.

Method	F1-score_all (d = 5)	SHD (d = 5)	F1-score_all (d = 10)	SHD (d = 10)
ECD-aMCI	0.908 ± 0.075	5.940 ± 1.737	0.843 ± 0.048	17.840 ± 2.129
NTS-NOTEARS	0.672 ± 0.056	11.040 ± 2.218	0.646 ± 0.039	28.060 ± 3.706
PCMCI+	0.871 ± 0.081	6.560 ± 1.651	0.809 ± 0.059	19.280 ± 2.307

Table 11. Evaluation on historical alarm data.

Method	F1-score_all	SHD
ECD-aMCI	0.5872	103
PCMCI+	0.5524	109
Bagged-PCMCI+	0.5000	120
NTS-NOTEARS	0.3956	110
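The tables above report F1-score_all, F1-score_lagged and the structural Hamming distance (SHD). Their exact definitions are given in Section 4.2 and are not reproduced in this excerpt. The Python sketch below therefore illustrates only one common convention, under stated assumptions. A link (i, j, tau) is read as X_i(t - tau) -> X_j(t). F1-score_lagged keeps only links with tau > 0, while F1-score_all also counts contemporaneous links (tau = 0). SHD counts missing links, extra links and reversed contemporaneous links, each once. The helper names `f1_score` and `shd` are hypothetical.

```python
from typing import Set, Tuple

# Assumed encoding: (i, j, tau) means X_i(t - tau) -> X_j(t); tau = 0 is contemporaneous.
Link = Tuple[int, int, int]


def f1_score(true: Set[Link], est: Set[Link], lagged_only: bool = False) -> float:
    """F1 over directed links; lagged_only=True mimics an F1-score_lagged variant."""
    if lagged_only:
        true = {link for link in true if link[2] > 0}
        est = {link for link in est if link[2] > 0}
    tp = len(true & est)
    if tp == 0:  # also covers an empty estimate, avoiding division by zero
        return 0.0
    precision = tp / len(est)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)


def shd(true: Set[Link], est: Set[Link]) -> int:
    """Missing + extra links, plus one per reversed contemporaneous link."""

    def skeleton(link: Link) -> Link:
        i, j, tau = link
        # Contemporaneous links are keyed by their unordered pair; lagged links
        # keep their direction, which time already fixes.
        return (min(i, j), max(i, j), 0) if tau == 0 else link

    true_sk = {skeleton(link): link for link in true}
    est_sk = {skeleton(link): link for link in est}
    missing = len(true_sk.keys() - est_sk.keys())
    extra = len(est_sk.keys() - true_sk.keys())
    reversed_links = sum(
        1 for key in true_sk.keys() & est_sk.keys() if true_sk[key] != est_sk[key]
    )
    return missing + extra + reversed_links


if __name__ == "__main__":
    true_graph = {(0, 0, 1), (0, 1, 1), (1, 2, 0)}
    est_graph = {(0, 0, 1), (2, 1, 0), (2, 0, 2)}
    print(f1_score(true_graph, est_graph))
    print(f1_score(true_graph, est_graph, lagged_only=True))
    print(shd(true_graph, est_graph))
```

Treating a reversed contemporaneous link as a single error follows the usual SHD convention of counting one edge flip as one operation. Lagged links cannot be reversed, because time fixes their direction.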

Table 12. Robustness of the ECD-aMCI to the hyperparameter $τ_{\max}$.

Metric	$τ_{\max} = 5$	$τ_{\max} = 6$	$τ_{\max} = 7$	$τ_{\max} = 8$	$τ_{\max} = 9$	$τ_{\max} = 10$
F1-score_all	0.917 ± 0.037	0.910 ± 0.038	0.905 ± 0.040	0.902 ± 0.042	0.900 ± 0.041	0.897 ± 0.042
SHD	8.903 ± 3.169	9.280 ± 3.289	9.657 ± 3.451	9.953 ± 3.492	10.173 ± 3.600	10.423 ± 3.568

Table 13. Performance comparison of conditional independence tests (Linear vs. fMRI).

Data Type	CI Test	F1-score_all	SHD
Linear	ParCorr	0.6670 ± 0.0669	25.6000 ± 4.4045
	GPDC	0.6011 ± 0.1270	41.9800 ± 54.4935
	CMIKNN	0.2350 ± 0.1059	43.2200 ± 4.2061
fMRI	ParCorr	0.9075 ± 0.0746	5.9400 ± 1.7368
	GPDC	0.9303 ± 0.0503	5.3200 ± 1.5549
	CMIKNN	0.7391 ± 0.1197	8.8200 ± 1.7052
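Table 13 compares three conditional independence (CI) tests. ParCorr, GPDC and CMIKNN are the names of CI tests in the tigramite package. There, ParCorr regresses the conditioning set out of both variables by ordinary least squares and tests the Pearson correlation of the residuals. The Python sketch below is a minimal static illustration of why the conditioning step matters. It uses no time lags and a single conditioning variable, and it substitutes the closed-form partial-correlation formula for the OLS residual step. A common driver Z makes X and Y strongly correlated, yet their partial correlation given Z is close to zero. The functions `pearson`, `partial_corr` and `simulate_common_driver` are hypothetical helpers, not tigramite's API, and no p-value is computed.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Partial correlation of x and y given ONE conditioning series z.

    For a single z this equals the Pearson correlation between the OLS
    residuals of x on z and of y on z.
    """
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate_common_driver(n=2000, seed=0):
    """Toy data: Z drives X and Y, so X and Y are dependent but
    conditionally independent given Z."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + 0.5 * rng.gauss(0, 1) for zi in z]
    y = [zi + 0.5 * rng.gauss(0, 1) for zi in z]
    return x, y, z


if __name__ == "__main__":
    x, y, z = simulate_common_driver()
    print(f"corr(X, Y)     = {pearson(x, y):.3f}")
    print(f"corr(X, Y | Z) = {partial_corr(x, y, z):.3f}")
```

For a single conditioning variable the closed-form expression and the residual formulation agree. With a multivariate conditioning set, as in PCMCI-style tests that condition on many lagged parents, the residual formulation is the one that generalizes.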

MDPI and ACS Style

Gao, M.; Zhou, Y. Enhanced Causal Discovery for Autocorrelated Time Series via Adaptive Momentary Conditional Independence. Mathematics 2026, 14, 1129. https://doi.org/10.3390/math14071129
