Causalized Convergent Cross Mapping and Its Implementation in Causality Analysis

Rooted in dynamic systems theory, convergent cross mapping (CCM) has attracted increased attention recently due to its capability in detecting linear and nonlinear causal coupling in both random and deterministic settings. One limitation with CCM is that it uses both past and future values to predict the current value, which is inconsistent with the widely accepted definition of causality, where it is assumed that the future values of one process cannot influence the past of another. To overcome this obstacle, in our previous research, we introduced the concept of causalized convergent cross mapping (cCCM), where future values are no longer used to predict the current value. In this paper, we focus on the implementation of cCCM in causality analysis. More specifically, we demonstrate the effectiveness of cCCM in identifying both linear and nonlinear causal coupling in various settings through a large number of examples, including Gaussian random variables with additive noise, sinusoidal waveforms, autoregressive models, stochastic processes with a dominant spectral component embedded in noise, deterministic chaotic maps, and systems with memory, as well as experimental fMRI data. In particular, we analyze the impact of shadow manifold construction on the performance of cCCM and provide detailed guidelines on how to configure the key parameters of cCCM in different applications. Overall, our analysis indicates that cCCM is a promising and easy-to-implement tool for causality analysis in a wide spectrum of applications.


Introduction
Causality analysis aims to find the relationship between causes and effects by exploring the directional influence of one variable on the other, and it has been a central topic in science, economy, climate, and many other fields [1][2][3][4][5][6][7][8][9].Compared with correlation, which reflects the mutual dependence between two variables, causality analysis may provide additional information since two time series with low correlation may have strong unidirectional or bi-directional causal coupling between them.Some representative examples can be found in [9].
The first practical causal analysis framework is Granger Causality (GC), which was proposed by Granger in 1969 [10].GC is a statistical approach that relies on a multi-step linear prediction model and aims to determine whether the values of one time series are useful in predicting the future values of the other.As a well-known technique, the validity and computational simplicity of GC have been widely recognized [11][12][13][14].At the same time, it has also been noticed that when there exists instantaneous and/or strong nonlinear interactions between two regions, GC analysis may lead to invalid results [9,15].Moreover, GC may not be able to detect the causation in deterministic settings [10,16].
In 1990, directed information (DI)-the first causality detection tool based on information theory-was proposed by Massey [17] when studying discrete memoryless communication channels with feedback.DI measures the directed information flowing from one sequence to the other.As an information-theoretic framework, a major advantage of DI is that it is a universal method that does not rely on any model assumptions of the signals and is not limited by linearity or separability [18,19].In refs.[9,18], the performance of DI in causality analysis was demonstrated using both simulated data and experimental fMRI data.It was found that DI is capable of detecting both linear and non-linear causal relationships.However, it was also noticed that the direct evaluation of DI relies heavily on probability estimation and tends to be sensitive to data length as well as the step size used in the quantization process [9].
In 2012, convergent cross mapping (CCM), a new causality model based on state space reconstruction was proposed by Sugihara et al. [16], and it was demonstrated that CCM could serve as an effective tool in addressing non-separable systems and identifying weakly coupled variables under deterministic settings, which may not be covered by GC.Since then, CCM has attracted considerable attention from the research community in many different fields [20][21][22][23][24][25][26][27][28].
Recall that causality aims to determine whether the current and past values of one time series are useful in predicting the future values of another in addition to its own past values.In CCM, however, both the past and future values are utilized to reconstruct the current value [9].As a result, the causality defined by CCM is inconsistent with the original, widely accepted definition of causality where the key assumption is that the future values of one process cannot influence the past of the other.
Motivated by this observation, in [9], we introduced the concept of causalized convergent cross mapping (cCCM).More specifically, if only the current and historical values of X and the past values of Y are used to predict the current value Y(t), and vice versa, then CCM is converted to causalized CCM.We further proved the approximate equivalence of DI and cCCM under stationary ergodic Gaussian random processes [9].
This study is a continued work of our previous research [9] and is focused on the implementation perspective of cCCM in causality detection.More specifically, in this study, we aimed to further investigate the effectiveness of cCCM in identifying both linear and nonlinear causal coupling in various settings through a large number of examples, including Gaussian random variables with additive noise, sinusoidal waveforms, autoregressive models, stochastic processes with a dominant spectral component embedded in noise, deterministic chaotic maps, and systems with memory, as well as experimental functional Magnetic Resonance Imaging (fMRI) data.In particular, we analyze the impact of shadow manifold construction on the performance of cCCM and provide detailed guidelines on how to configure the key parameters of cCCM (especially the shadow manifold dimension and time lag) in different applications.Moreover, we examine the noise effect in cCCM and show that, in general, reliable causality detection can be achieved when the signal-to-noise ratio (SNR) is larger than 15 dB.Overall, our analysis indicates that CCM is a promising and easy-to-implement tool for causality analysis in a wide spectrum of applications.
The rest of the paper is organized as follows.In Section 2, we briefly revisit the original CCM, the causalized CCM (cCCM), the conditional equivalence between cCCM and DI, and the extension of bivariate cCCM to multivariate cCCM.In Section 3, we present the major results of the study, where we discuss the impact of noise on the performance of cCCM, evaluate the effectiveness of cCCM in causality analysis through numerous examples, provide detailed guidelines on the configuration of cCCM, and compare the performances of bivariate and multivariate cCCM and GC through both simulation examples and ex-perimental fMRI data.Finally, we present the conclusions drawn from this research and provide related discussions in Section 4.

A Revisit of Causalized Convergent Cross Mapping
In this section, we first briefly revisit convergent cross mapping (CCM) [16] and introduce the concept of causalized CCM (cCCM) [9].We then present the conditional equivalence of cCCM and the directed information framework [9] and introduce the extension of bivariate cCCM to multivariate cCCM.Convergent cross mapping (CCM).The CCM algorithm is based on state-space reconstruction.Consider two dynamically coupled variables X and Y that share the same attractor manifold M. Let ] be the time series consisting of samples of X and Y, respectively.The CCM causality analysis framework can be summarized as follows: • Step 1: Construct the shadow manifolds with respect to X n and Y n .
• Step 2: For each vector x t , find its E + where where d denotes the Euclidean distance between two vectors.Please note that for every i, ω i is a function of t.The cross mapping from Y to X can be defined in a similar way.As n increases, it is expected that Xt |M y and Ŷt |M x would converge to X t and Y t , respectively.• Step 4: The cross mapping correlations are defined as where ρ denotes the Pearson correlation.
then we say that the causal effect of X on Y is stronger than that in the reverse.

Geometric illustration of convergent cross mapping.
Here, we provide the geometric illustration of convergent cross mapping from the shadow manifold M x to the shadow manifold M y under both strong and weak causal coupling.Figure 1a corresponds to the situation when there is a strong causal relationship from X to Y, and Figure 1b illustrates the case when there is only a weak causation.For illustration purpose, the dimension of the shadow manifold was chosen to be E = 2, the neighborhood of x t is represented using a the simplex consisting of three nearest neighbors, and the neighborhood of y t is represented in the same way.Causalized convergent cross mapping (cCCM).Note that in CCM, both the past and future values are used in data reconstruction, which is inconsistent with the original definition of causality where it is assumed that the future values of one process cannot impact the past of another.For this reason, we propose the concept of causalized convergent cross mapping (cCCM).
More specifically, in CCM, if we limit the search of all the nearest neighbors in M x to t i < t, i.e., we only use the current and previous values of X and the past values of Y to predict the current value Y t , operating in the same way for the other direction, and then we obtained causalized CCM.That being said, Step 2 in cCCM now becomes Then, we follow Steps 3-5 above, and denote the corresponding causalized cross mapping correlation, or the cCCM causation, as ρ cCCM .Conditional equivalence between cCCM and directed information.As an informationtheoretic causality model, directed information (DI) measures the information flow from one time series to the other.DI plays a central role in causality analysis for two reasons.First, it is a universal method that does not have any modeling constraints on the sequences to be evaluated [29,30].Second, DI serves as the pivot that links existing causality models GC [10,18], transfer entropy (TE) [9,31,32], and dynamic causal modeling (DCM) [33,34] through conditional equivalence between them.
In [9], we proved the conditional equivalence between cCCM and DI under Gaussian variables and used DI as a bridge to connect cCCM to other representative tools of causality analysis.More specifically, we showed that if (i) X and Y are dynamically coupled, zero-mean Gaussian random variables and their joint distribution is bivariate Gaussian, and (ii) X n ,Y n are stationary ergodic random processes; then, when n is sufficiently large, where Īn (X → Y) denotes the average DI from X to Y, measured in bits per sample.The conditional equivalence of DI and cCCM under Gaussian random variables was demonstrated in [9] using experimental fMRI data.This result also connects cCCM to other representative causality analysis frameworks in the family-GC, TE (Transfer Entropy, 2000 [31]), and DCM (Dynamic causal modelling, 2003 [33])-through the conditional equivalence between them under Gaussian random variables [9,12].
It is worth pointing out that the simulation-based analysis in [9] suggested that cCCM is often more robust in causality detection than DI.This is mainly because the DI calculation is based on probability estimation, which is sensitive to the step size used in the quantization process [35].cCCM, on the other hand, gets around this obstacle through geometric cross mapping between the corresponding shadow manifolds, at the cost of a higher computational complexity.More specifically, cCCM relies on a K-nearest neighbor search and has a computational complexity of O(n 2 ) in the sequence length n, but the computational complexity of DI is only O(n).Extension of bivariate cCCM to multivariate cCCM.Bivariate cCCM can be extended to multivariate conditional cCCM [9] based on a multivariate KNN search, which takes a similar approach as in the multivariate KNN predictability approaches [36][37][38][39].
Let Ω = {X 1 , • • • , X L } denote the set of dynamically coupled random variables that share the same attractor manifold.As shown in [9], the multivariate conditional cCCM from X j → X i with respect to Ω ∖ {X i , X j } (i.e., all the remaining random variables in Ω) is defined using the causality ratio as cCCM(X j → , where e n i |Ω ∖ {X j } denotes the estimation error vector based on Ω ∖ {X j }, and e n i |Ω is the estimation error vector based on the whole Ω.The definition can be adjusted by modifying Ω to reflect the conditional cCCM with respect to either an individual random variable or a group of random variables.

The Impact of Estimation Error in cCCM
Note that CCM and cCCM are based on data reconstruction, and the reconstructed data converge to the true data as the data length goes to infinity when there exists causal coupling between the random variables under consideration.Here, we consider the impact of estimation error in cCCM.
As an example, we consider where t i < t, and When there exists estimation error, we can model Ŷ as where n e denotes the estimation error, which is independent of Y.In assuming n e is of zero-mean and variance σ 2 e , it can be shown that (please refer to the Supplementary file of [9]) where σ 2 Y denotes the variance of Y.This result implies that the cCCM value ρ cCCM (X → Y) ≈ ρ(Y, Ŷ) decreases as the estimation error power increases.
In the following, using the noise-free case as the benchmark, we examine the noise effect on cCCM through simulation examples, including Gaussian random variable and its signed and squared versions (Examples 1 and 2, respectively), as well as sinusoidal waveforms (Example 3).As shown in Table 1, when we increase the SNR from 0 dB to 20 dB, the cCCM value of the noisy signal gradually converges to the noise-free result.More specifically, our results suggest that reliable causality detection can be achieved when the SNR is larger than 15 dB.The performance of cCCM is not only affected by noise but also closely related to the selection of E and τ.For Examples 1 and 2 in Table 1, we chose E = 5 and τ = 1.For Example 3, we used E = 5 and τ = 5.Here, a larger τ is used mainly because X(t) and Y(t) are significantly over-sampled in Example 3.More discussion on the choice of shadow manifold parameters can be found in Section 3.2.

Causality Detection Using cCCM and the Choice of Shadow Manifold Parameters
In this section, we illustrate the performance of cCCM (together with CCM) in causality detection through simulation examples, including autoregressive models, stochastic processes with a dominant spectral component embedded in noise, deterministic chaotic maps, and systems with memory.As will be seen, these examples show that CCM and cCCM are sensitive to changes in coupling strength.It can be observed that CCM tends to result in larger causation values than cCCM; this is expected since CCM uses both the past and future values of X to predict the current value of Y (and vice versa), while cCCM only uses both the past values of X to predict the current value of Y (and vice versa).
We will also discuss the choice of key parameters-the dimension of shadow manifold E and the time lag τ-in the cCCM algorithm and the impact of these parameters on the detection of causal relationships.According to Takens' theorem [40] and Whitney's embedding theorem [41,42], the "magic number" is E = 2d + 1, and often less [16], where d is the dimension of the attractor M shared by X and Y. Another parameter, the time lag τ, is generally chosen as τ = 1.When the signal is over-sampled, τ > 1 can also be used.
It should be noted that for an accurate assessment of the causation, the sampling rate should always be chosen to be larger than the Nyquist rate.Otherwise, the causal relationship identified by cCCM may be invalid since the under-sampled sequences cannot capture the total information in the original signals.

Example 4:
Let X and Y be random processes given by where n 1 , n 2 ∼ N(0, 0.05 2 ), n 1 and n 2 are independent, t = [0, 1, 2, • • • , 2047], and X(0) = Y(0) = 1.5.We chose E = 5 and τ = 1, and then the cCCM and CCM values between these two time series are The convergence of CCM and cCCM with respect to the data length is shown in Figure 2.

Example 5:
Let X and Y be random processes given by where n 1 , n 2 ∼ N(0, 0.05 2 ), n 1 and n 2 are independent, t = [0, 1, • • • , 2047], and X(0) = Y(0) = 1.5.Then, the cCCM and CCM values between these two time series are The convergence of CCM and cCCM with respect to the data length is shown in Figure 2.

Example 6:
Let X and Y be two stochastic processes given by where n 1 , n 2 are independent AWGN noise with SNR = 10 dB, and t = 0 : 0.005 : 2 (here, 0.005 is the step size).Then, the cCCM and CCM values between these two time series are The convergence of CCM and cCCM with respect to the data length is shown in Figure 3.

Example 7:
Let X and Y be two stochastic processes given by where n 1 and n 2 are independent AWGN noise with SNR= 10dB, and t = 0 : 0.005 : 2.Then, the cCCM and CCM values between these two time series are The convergence of CCM and cCCM with respect to the data length is shown in Figure 3.We selected τ = 5 in Examples 6 and 7 to reduce the impact of noise; please refer to Section 3.2.5 for more details.

Example 8:
Let X and Y be two stochastic processes given by where t = [0, 1, • • • , 2047], X(0) = 0.7, and Y(0) = 0.1.Then, the cCCM and CCM values between these two time series are The convergence of CCM and cCCM with respect to the data length is shown in Figure 4.

Example 9:
Let X and Y be two stochastic processes given by where t = [0, 2047], X(0) = 0.7, and Y(0) = 0.1.Then, the cCCM and CCM values between these two time series are The convergence of CCM and cCCM with respect to the data length is shown in Figure 4.

Examples on Systems with Memory
In this subsection, we examined the causal relationship in systems with memory (Examples 10-14) using CCM and cCCM under different choices of E and τ.Example 10: Consider a system with memory Here, (i) the MATLAB command "randn(1024,1)" returns an 1024-by-1 matrix of normally distributed random numbers; (ii) t = [0, 1, • • • , 1023] and X(t) = 0 while t < 0. The results corresponding to different E or τ values are displayed in Table 2. Values for E and τ cCCM CCM We can see that the causation from X → Y cannot be fully captured when E = and τ = 1.

Example 11:
Consider where t = [0, 1, • • • , 1023], and X(t) = 0 while t < 0.Then, for different E or τ values, the results are displayed in Table 3.Values for E and τ cCCM CCM We can see that the causation from X → Y cannot be fully captured when E = and τ = 1, 2.

Example 12:
Consider a system with different dominant delays from Example 11: where t = [0, 1, • • • , 1023], and X(t) = 0 while t < 0.Then, for different E or τ values, the results are displayed in Table 4. Values for E and τ cCCM CCM We can see that the causation from X → Y cannot be fully captured when E = 3 and τ = 1.

Example 13:
Consider where t = [0, 1, • • • , 1023], and X(t) = 0 while t < 0.Then, for different E or τ values, the results are displayed in Table 5.Values for E and τ cCCM CCM From this example, we can see the following: (i) when E = 5, τ = 1, we have and the causation corresponding to item 0.6X(t − 5) cannot be captured; (ii) when E = 3 and τ = 2, we have x(t) = [X(t), X(t − 2), X(t − 4)], and the causation corresponding to items 0.8X(t − 1) and 0.6X(t − 5) cannot be captured; and (iii) when E = 6 and τ = 1, we have x(t) = [X(t), X(t − 1), • • • , X(t − 5)], and the causation corresponding to all the items can be captured.Now, if we consider the time-delayed causality, in which X(t) remains the same and Y 1 (t) = Y(t + 1), then this is equivalent to considering the causality from X 1 (t) = X(t − 1) to Y(t).In this case, as shown in Table 6, even when E = 5 and τ = 1, we have , and the causation corresponding to all the items can be captured.where t = [0, 1, • • • , 1023], and X(t) = 0 while t < 0.Then, for different E or τ values, the results are displayed in Table 7.
In this example, both E = 5, τ = 1 and E = 3, τ = 2 can only capture the causation corresponding to 0.8X(t − 4), and E = 6 and τ = 1 can capture the overall causation accurately.Now, if we consider the time-delayed causality, in which X(t) remains the same and Y 3 (t) = Y(t + 3), then this is equivalent to considering the causality from X 3 (t) = X(t − 3) to Y(t).In this case, as shown in Table 8, E = 5 and τ = 1 work even better than E = 6 and τ = 1 since E = 5 leads to a manifold with a lower dimension and, hence, a higher nearest neighbor density.Values for E and τ cCCM CCM  From Examples 10-14, it can be seen that in systems with memory, the selection of the shadow manifold dimension E and the signal lag τ largely rely on the positions of the dominant delays in the channel impulse response.
It can be seen that in systems with memory, for the accurate evaluation of CCM and cCCM causality, the following conditions need to be satisfied: , where d d,max denotes the largest dominant delay.(b) For each t, the shadow manifold constructing vector ] should contain all the samples corresponding to the dominant delays.
It is also observed that if the conditions above are not satisfied, time-delayed cCCM from X(t − τ) to Y(t) might still capture the causation accurately if the instantaneous information exchange between X(t) and Y(t) is not significant.More specifically, if we consider a linear time-invariant (LTI) system Y(t) = X(t) * h(t) = L ∑ l=0 h(l)X(t − l), where h(t) denotes the channel impulse response, when h(0) is negligibly small, we say that there is no significant instantaneous information exchange between X(t) and Y(t).
In the following two examples, we compare the performance between cCCM and Granger causality (GC) for systems with memory.

Example 15:
Consider a system with memory: X = randn(1024, 1), ( 9) where t = [0, 1, • • • , 1023], and X(t) = 0 while t < 0. Here, we assume that n(t) ∼ N(0, σ 2 ) is independent of X.We then compare the performances of GC and CCM under different noise powers, and the results are shown in Table 9.As can be seen, as long as the signal-to-noise ratio (SNR) is not too small, cCCM can capture the strong bidirectional causality between X(t) and Y(t), but GC cannot.This is mainly because cCCM takes the instantaneous information exchange between X(t) and Y(t) into consideration, but GC does not.That is, when there exists instantaneous information exchange between X(t) and Y(t), GC may fail to capture the causal coupling between X(t) and Y(t).
It is also observed that the ρ cCCM value decreases as the noise power increases, which is consistent with our analysis in Section 3.1.When σ 2 = 4 and SNR = −7.74dB, both cCCM and GC can no longer deliver valid results due to the strong noise effect.
Recall that the most commonly used method in Granger causality [10][11][12] analysis is to compare the following two prediction errors e i and ẽi : And the Granger causality is defined to be the log-likelihood ratio where e = [e 1 , e 2 , . . ., e n ] T , ẽ = [ ẽ1 , ẽ2 , . . ., ẽn ] T , and |cov(•)| stands for the determinant of the covariance matrix.
Our results in Table 9 and the definition of GC suggest that the small fluctuations in the GC values as the noise variance increases from 10 −6 to 4 are more likely to reflect the impact of the noise rather than the detection of the causality.

Example 16:
Consider where t = [0, 1, • • • , 1023], X(t) = 0 while t < 0, and n(t) ∼ N(0, 10 −6 ) is an independently generated Gaussian noise.Then, the Granger causality between X and Y is and the causality detected by cCCM is In this example, there is no instantaneous information exchange between X(t) and Y(t), and both GC and cCCM detect the strong unidirectional causality from X to Y and deliver consistent results.

Additional Examples on the Selection of the Dimension of the Shadow Manifold E and Time Lag τ
In this subsection, we illustrate the impact of E and τ on the performance of cCCM through some additional simulation examples, including single-tone time series embedded in noise (Example 13) and Gaussian stochastic process (Example 14).As in Example 3, it was found that a large E • τ value may help enhance the performance of cCCM under noise.However, it is also noticed that if E is too large, cCCM may no longer deliver valid results, as the excessively high dimension of the shadow manifold significantly reduces the density of the nearest neighbors, leading to inaccurate state-space reconstruction and causality evaluation.

Example 17:
This is a revisit of Example 3, with additional discussions on the selection of E and τ and different sampling instants.Consider the following noisy single-tone time series: where t = 0 : 0.01π : 2π, and n 1 (t) and n 2 (t) are independent AWGN noises with SNRs varying in 0, 5, 10, 15, and 20 dB, or equal to 0 for all t in the noise-free case.By changing the values for E and τ, we are able to observe different noise effects.The simulation results for E = 5, τ = 1 and E = 5, τ = 5 are shown in Table 10 below.As can be seen, as we increase the length of the data span E • τ, the noise effect is reduced.In particular, compared with E = 5 and τ = 1, when we choose E = 5 and τ = 5, a much better noise immunity is achieved, since E • τ is sufficiently long.
Note that increasing τ leads to the downsampling of the time series, and increasing E expands the dimension of the shadow manifolds.An over-increase in E or τ might downgrade the performance of cCCM.From Example 14, it can be seen that if E is much larger than 2d + 1, cCCM may deliver inaccurate results.

Example 18:
Consider In this example, there is a strong unidirectional causality from X to Y, but very weak causation in the inverse direction.Choose τ = 1.From Figure 5, we can see that as E increases, the cCCM value keeps on decreasing and reduces to 0.2 when E = 50, which no longer reflects the strong unidirectional causality from X to Y.

Examples of the Impact of Sampling Frequency on cCCM
In this subsection, we show that for the accurate assessment of causation, signals under consideration should be sampled with a sampling frequency higher than the Nyquist rate.

Example 19:
Consider where t = 0 : 0.005 : 4, and k = 150, 200, 400.From Figure 6, it can be seen that if the sampling frequency is higher than the Nyquist rate, then strong bidirectional causal coupling can be detected between X and Y. On the other hand, if the sampling frequency is lower than the Nyquist rate, then the resulted cCCM value is no longer valid.

Examples on Data Repetition in Causality Analysis
The following examples illustrate that even if X and Y are two independent signals that are not causally coupled, a causal pattern can be enforced in the concatenated time series through data repetition.

Example 20:
Let X = randn(1000, 1) and Y = randn(1000, 1) be two independent normally distributed time series.We have Then, we have As can be seen, data concatenation results in strong causality that does not exist in the original X and Y.

Example 21:
Let X 0 =randn(1024,1), Y 0 =randn(1024,1), and Z =randn(1024,1) be independent and normally distributed Gaussian random variables.Consider Then, the bivariate cCCM between X and Y is which provides a delusion that there exists strong bidirectional causality between X and Y. On the other hand, the multivariate cCCM between X and Y conditioning on Z is cCCM(X → Y|Z) = 0.0298, cCCM(Y → X|Z) = 0.0306, which accurately reflect the independent relationship between X and Y. From this example, it can be seen that conditional cCCM can help inspect the dependence among the random variables under consideration and may deliver more accurate results in the causality evaluation.

Application of cCCM for Brain Causality Analysis Using Experimental fMRI Data
In this study, we applied both bivariate and multivariate cCCM for a causality analysis of the brain network using experimental fMRI data and compared the results with those of GC [10,43].
We considered an fMRI dataset where fourteen right-handed healthy college students (7 males and 7 females, 23.4 ± 4.2 years of age) from Michigan State University volunteered to participate in a task-driven fMRI-based study.For each subject, fMRI datasets were collected on a visual stimulation condition with a scene-object fMRI paradigm, where each volume of images was acquired 192 times (8 min) while a subject was presented with 12 blocks of visual stimulation after an initial 10 s resting period.In a predefined randomized order, the scenery pictures were presented in six blocks, and the object pictures were presented in another six blocks.All pictures were unique.In each block, ten pictures were presented continuously for 25 s (2.5 s for each picture), followed with a 15 s baseline condition (a white screen with a black fixation cross at the center).The subject needed to press their right index finger once when the screen was switched from the baseline to the picture condition.More details on fMRI data acquisition and preprocessing can be found in [44].

Results for Bivariate and Multivariate cCCM
Note that the total length of the fMRI BOLD time series under visual stimulation condition was n = 192, with the sampling period being 2.5 s.In the literature, it was reported that increasing the sampling rate of the fMRI signal can improve the robustness of the causality analysis [45].Here, we first interpolate the fMRI sequence by a factor of 2 using the spline interpolation command in MATLAB and then conducted causality analysis for all the possible unidirectional regional pairs using both bivariate and multivariate cCCM.
The results corresponding to multivariate conditional cCCM with respect to individual brain regions are shown in Figure 8.As can be seen, RV1 has the most significant impact on the conditional causality from LV1 → LSMC, LV1 → LPW M, and LV1 → RFW M. This implies that RV1 has the highest inter-region dependence with LV1.For the same reason, LV1 has the most significant impact on the conditional causality from RV1 → LSMC.That is, multivariate conditional cCCM with respect to individual regions can detect unidirectional causality and also reflect the impact of interdependence between the ROIs on the conditional causality.The results indicate that multivariate cCCM (with respect to individual regions) can detect unidirectional causality and also reflect the impact of interdependence between the ROIs on the conditional causality.More specifically, it can be seen that due to the dependence between the brain regions, multivariate conditional CCM values are much smaller than the bivariate cCCM values.In particular, RV1 has the most significant impact on the conditional causality from LV1 → LSMC, LV1 → LPW M, and LV1 → RFW M. This implies that RV1 has the highest dependence with LV1.For the same reason, LV1 has the the most significant impact on the conditional causality from RV1 → LSMC.

Results for Bivariate and Multivariate Granger Causality (GC)
For comparison purposes, we analyzed the brain network causality using both bivariate and multivariate GC [12] with the same fMRI dataset.
From Figures 7 and 9, it can be seen that bivariate GC delivers similar results as cCCM except for the causal coupling from LV1 → LSMC.More specifically, cCCM shows that there exists unidirectional causality from LV1 → LSMC, while GC shows that there exists bidirectional causality between LV1 and LSMC but no significant unidirectional causality.In ref. [9], the DI-based causality analysis also verified the presence of unidirectional causality from LV1 → LSMC for the same dataset, which is consistent with the results of cCCM.These results suggest that for this fMRI dataset, cCCM tends to deliver a more accurate causality evaluation than GC.The results of GC are largely consistent with those of bivariate cCCM except from LV1 → LSMC.This may be because (i) cCCM takes the instantaneous information exchange between X(t) and Y(t) into consideration, but GC does not; and (ii) cCCM can capture both linear and nonlinear causal causal coupling, and GC may have difficulty in detecting nonlinear causality.That is, when there exists instantaneous information exchange and/or a nonlinear causal relationship between X(t) and Y(t), GC may fail to capture the underlying causal coupling accurately.
A natural question arises: how should we explain the difference between cCCM and GC for the causality analysis here?Since based on the central limit theorem, fMRI signals can be modeled as Gaussian random variables for which cCCM and GC are conditionally equivalent.The underlying argument is that the equivalence between cCCM and GC under Gaussian random variables is subject to two conditions: (i) both X(t) and Y(t) follow the linear auto-regression model; and (ii) there is no significant instantaneous information exchange between X(t) and Y(t).More specifically, cCCM takes the instantaneous information exchange between X(t) and Y(t) into consideration, but GC does not.For this reason, when there exists instantaneous information exchange between X(t) and Y(t), GC may fail to capture the causal coupling between X(t) and Y(t), but cCCM succeeds.This is demonstrated through simulations in Example 15.In addition, cCCM can capture both linear and nonlinear causal causal coupling, but GC may have difficulty in detecting nonlinear causality.For these reasons, cCCM might be a more robust causality analysis tool than GC.
In the multivariate case, the theoretical relationship between cCCM and GC is not clear yet.In comparing Figures 8 and 10, it can be seen that the results of multivariate cCCM and GC are largely consistent for LV1 → LSMC and LV1 → RFW M.However, they deliver very different results for the conditional causality from RV1 → LSMC and LV1 → LPW M with respect to other individual regions.In particular, for these two region pairs, the results of multivariate cCCM with respect to other individual regions are consistent with their bivariate counterparts and also reflect the impact of inter-region dependence on the conditional causality.However, the corresponding results of multivariate GC with respect to other individual regions vary significantly with the region under consideration, and 50% or more are no longer consistent with those of the bivariate GC.Further theoretical analysis is needed on the theoretical relationship between conditional GC and multivariate cCCM, as well as the relationship between DI and the recent minimum entropy framework [46] in both bivariate and multivariate scenarios.

Conclusions and Discussion
In this paper, we revisited the definition of original CCM, identified the gap between CCM and the traditional definition of causality, presented causalized CCM (cCCM), and discussed the conditional equivalence of cCCM and directed information and the extension of bivariate cCCM to multivariate cCCM.We then evaluated the effectiveness of cCCM in the detection of causality through a large number of examples including Gaussian random variables with additive noise, sinusoidal waveforms, autoregressive models, stochastic processes with a dominant spectral component embedded in noise, deterministic chaotic maps, and systems with memory, as well as experimental fMRI data.We also examined the impact of noise on the performance of cCCM, and our results suggest that, in general, reliable results can be achieved when SNR > 15 dB.In particular, we provided detailed discussions on the choice of the dimension of the shadow manifolds E and the time lag τ and the impact of these parameters on the detection of causal relationships using cCCM.Finally, we applied both bivariate and multivariate cCCM for the causality analysis of the brain network using experimental fMRI data and compared the results with those of GC.
Based on the conditional equivalence of cCCM and DI [9], we can see that cCCM provides an alternative way to evaluate the directed information transfer between stationary ergodic Gaussian random variables.Compared with DI, which relies heavily on probability estimation and tends to be sensitive to data length and quantization step size, cCCM, on the other hand, gets around this problem through geometric cross mapping between the manifolds involved.
However, the advantage of cross-mapping-based causality detection techniques comes with prices.The major limitation with CCM and cCCM is that they are based on the K-nearest neighbor (KNN) search algorithm and hence have a high computation complexity O(n 2 ), where n is the data length.The convergence speeds of CCM and cCCM also vary with the signals under applications and need to be taken into consideration in causality analysis, especially in dynamic systems where the causal relationships are time-variant.It is worthy to point out that when combined with the sliding window approach [47,48], cCCM can be used to evaluate time-varying causality in dynamic networks such as brain networks [49].
Overall, both our theoretical [9] and numerical analysis demonstrated that cCCM is a promising and easy-to-implement tool for causality detection in a wide spectrum of applications.In this paper, we showed that appropriate choices of E, τ, and the sampling frequency are critical for cCCM-based causality analysis and provided detailed guidelines on the configuration of cCCM.We wish that this paper can serve as a helpful reference on the implementation of cCCM for causality detection in different applications.

Figure 1 .
Figure 1.Geometric illustration of the cross mapping from M x to M y .(a) When strong causation exists from X to Y, the nearest neighbors of x t are mapped to the nearest neighbors of y t .(b) When there is only weak causation from X to Y, the nearest neighbors of x t are no longer mapped to the nearest neighbors of y t .

Figure 2 .
Figure 2. Performance of cCCM and CCM versus the data length for Examples 4 and 5.

Figure 3 .
Figure 3. Performance of cCCM and CCM versus the data length for Examples 6 and 7.

Figure 4 .
Figure 4. Performance of cCCM and CCM versus the data length for Examples 8 and 9.

Figure 6 .
Figure 6.Impact of sampling frequency on cCCM convergence speed: an illustration using sinusoidal waveforms with different frequencies.(a) f 0 = 75 Hz; (b) f 0 = 100 Hz; (c) f 0 = 200 Hz.Here, f 0 denotes the frequency of the corresponding sinusoidal waveform.The sampling time sequence was chosen as t = 0 : 0.005 : 4; that is, sampling frequency f s = 200 Hz.As can be seen, cCCM works well when the sampling rate is above the Nyquist rate, as shown in (a) but may or may not deliver meaningful results when the sampling frequency is below or equal to the Nyquist rate, as shown in (b,c).

Figure 7 .Figure 8 .
Figure 7. FMRI-based causality analysis using bivariate cCCM.Unidirectional causality was detected in the brain network under a visual simulation condition with a scene-object fMRI paradigm.

Figure 9 .
Figure 9. FMRI-based causality analysis using bivariate GC.The results of GC are largely consistent with those of bivariate cCCM except from LV1 → LSMC.This may be because (i) cCCM takes the instantaneous information exchange between X(t) and Y(t) into consideration, but GC does not; and (ii) cCCM can capture both linear and nonlinear causal causal coupling, and GC may have difficulty in detecting nonlinear causality.That is, when there exists instantaneous information exchange and/or a nonlinear causal relationship between X(t) and Y(t), GC may fail to capture the underlying causal coupling accurately.

Figure 10 .
Figure 10.FMRI-based causality analysis using multivariate conditional GC with respect to individual regions.(a) LV1 → LSMC, (b) RV1 → LSMC, (c) LV1 → LPWM, (d) LV1 → RFWM.It can be seen that the results of multivariate cCCM and GC are largely consistent for (a,d).However, for (b,c), multivariate cCCM and GC deliver very different results.In particular, for the conditional causality from RV1 → LSMC and LV1 → LPWM, the results of multivariate cCCM with respect to other individual regions are consistent with their bivariate counterparts and also reflect the impact of inter-region dependence on the conditional causality.However, the corresponding results of multivariate GC with respect to other individual regions vary significantly with the region under consideration, and 50% or more are no longer consistent with those of the bivariate GC.

• Step 2 for cCCM: For
each vector x t , find its E + 1 nearest neighbors in M x with an index smaller than t and denote the time indices (from closest to farthest) of the E + 1 nearest neighbors of x t by t 1 , ...t E+1 .Note that for i = 1, 2, • • • , E + 1, we now have t i < t.

Table 4 .
The results of Example 12.

Table 5 .
The results of Example 13.

Table 6 .
The results of time-delayed causality analysis in Example 13.

Table 7 .
The results of Example 14.

Table 8 .
The results of time-delayed causality analysis in Example 14.

Table 10 .
Performance of cCCM under additive white Guassian noise with different E and τ values.