Causal Discovery of Stochastic Dynamical Systems: A Markov Chain Approach
Abstract
1. Introduction
2. Preliminaries
2.1. Basic Definitions
3. Markov Property
4. Markov Chains
5. Inference
5.1. Preconditions and Preprocessing
5.2. Testing
5.3. Implementation Details
- Do preprocessing (see Section 5.1);
- Find the lag at which the two time series are most dependent (for example, simple or partial cross-correlation can be used for this);
- Make the time series stationary (remove trend, seasonality, etc.);
- Find the memory of the processes (e.g., with partial auto-correlation);
- Apply time-delay embedding to convert a higher-order Markov chain into a first-order Markov chain;
- Based on the remaining rows, check which of the dependence measures need to be tested for significance (i.e., being greater than zero), and filter out additional rows;
- The remaining row represents the causal case that is consistent with the data.
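The time-delay embedding step in the list above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the helper name `delay_embed` is ours, and the memory `m` is assumed to have been estimated beforehand (e.g., from the partial auto-correlation function):

```python
import numpy as np

def delay_embed(x, m):
    """Stack m consecutive values so that an m-th order Markov chain
    over x becomes a first-order chain over the embedded vectors."""
    x = np.asarray(x)
    n = len(x) - m + 1
    # row t holds (x[t], x[t+1], ..., x[t+m-1])
    return np.column_stack([x[i:i + n] for i in range(m)])

# a chain with memory 2 becomes first order over consecutive pairs
z = delay_embed([1, 2, 3, 4, 5], m=2)
# z is [[1, 2], [2, 3], [3, 4], [4, 5]]
```

Each row of the result plays the role of one "state" of the first-order chain, so the tests below can be applied unchanged.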
- How do we even test the Markov property? Assuming a discrete-valued time series, if we select a single point in time and consider it to be the "present", then (according to the Markov property) the previous data point is the past and the subsequent data point is the future. We can loop through the time series, look for every occurrence of the same value that we currently have as the present, and collect the past and future values around those points in time. In this way we have practically found all possible conditional state transitions, given that specific value of the condition. To test the Markov property, we need to run an independence test between these past and future values. In principle, a number of independence tests could be used; the choice is up to the researcher. In this particular implementation, we proceed with contingency tables (which have their own difficulties, see Items 3–4 below). Calculating a p-value is not trivial either (see Item 5 below), but even if we manage to do that, the Markov property is only verified locally, because we analyzed a single value of the condition;
- To extend the testing procedure from a local property to a global one, we take all possible values of the condition and calculate a local p-value for each of them. The idea is that, strictly speaking, if the Markov property holds for all possible values of the condition, then we can say that the Markov property holds for the process in general. In this way, we run multiple hypothesis tests, leading to a multiple comparisons problem with dependent tests (since all the tests run on the same data)—in such cases, the harmonic mean of the p-values is a good proxy for the actual, global p-value (see [25]);
- We also need to handle continuous time series. An independence test for discrete variables requires some preparation before it can be applied to continuous data. First, if we select a specific point in time and consider it to be the present (i.e., we want to condition on it), we cannot collect all the related pasts and futures, because we will never find exactly the same value twice. To circumvent this issue, we find the k nearest neighbors (specifically, their indexes) of each "present" and consider them to be roughly equal; of course, k should be small, preferably vanishing compared to the sample size (for example, the square root of the sample size);
- We also need to discretize continuous time series. The most common way to discretize the data is to apply binning, but due to the time-delay embedding, we work with higher dimensional vectors for X and Y (and J, of course). If we binned along each dimension, the amount of data would rapidly become insufficient for hypothesis testing, which would be especially problematic for smaller sample sizes. Instead of binning the embedded vectors along each dimension, we project the vectors onto a one-dimensional subspace (practically, collapsing a vector into a scalar with a linear transformation). Using random projections is one way to achieve this, which can be followed by the actual binning. However, using a single linear transformation for the whole time series may not work well; therefore, local random projections have to be repeated, i.e., different normal vectors have to be used, and the resulting p-values must then be aggregated.
- In principle, it is up to the researcher to decide which independence test to use. As mentioned above, we have chosen contingency tables in our examples. The problem with these is that our sample points are in general not independent; therefore, the analytical (or asymptotic) distribution of the test statistic is likely unavailable. To circumvent such issues, bootstrapping-like techniques are commonly used to numerically approximate the cumulative distribution function of the test statistic. We apply the local permutation technique (proposed by Runge [24]) to approximate the CDF of the test statistic under the null hypothesis, which is then used to calculate the global p-value.
- Another technical issue is that filling in a contingency table based on a local neighborhood can easily lead to zeros appearing in the low-probability cells of the table. For χ²-like tests, this immediately leads to a p-value of 0, and even a single local p-value of 0 badly affects the global p-value obtained by taking the harmonic mean. To avoid division by 0, we apply Yates' correction to the contingency table, which prevents strictly 0 local p-values.
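For a discrete-valued series, the local tests and their harmonic-mean aggregation described in Items 1–2 can be sketched as follows. This is a simplified illustration, not the authors' exact implementation: it uses SciPy's `chi2_contingency` with the `"mod-log-likelihood"` statistic, and adds 0.5 to every cell as a simple continuity adjustment standing in for Yates' correction, so that zero cells cannot force a p-value of exactly 0:

```python
import numpy as np
from scipy.stats import chi2_contingency, hmean

def local_p_value(pasts, futures):
    """Independence test between the past and future values observed
    around one fixed value of the 'present' (the condition)."""
    past_vals = np.unique(pasts)
    fut_vals = np.unique(futures)
    if len(past_vals) < 2 or len(fut_vals) < 2:
        return 1.0  # degenerate table carries no evidence of dependence
    table = np.zeros((len(past_vals), len(fut_vals)))
    for p, f in zip(pasts, futures):
        table[np.searchsorted(past_vals, p), np.searchsorted(fut_vals, f)] += 1
    # continuity adjustment: keep every cell positive so the local
    # p-value cannot be strictly 0
    table += 0.5
    _, p_value, _, _ = chi2_contingency(table, correction=False,
                                        lambda_="mod-log-likelihood")
    return p_value

def markov_p_value(x):
    """Global p-value of the Markov property for a discrete series x:
    one local test per value of the condition, aggregated by the
    harmonic mean of the local p-values."""
    x = np.asarray(x)
    p_values = []
    for present in np.unique(x[1:-1]):
        idx = np.where(x[1:-1] == present)[0] + 1  # interior occurrences
        if len(idx) > 1:
            p_values.append(local_p_value(x[idx - 1], x[idx + 1]))
    return hmean(p_values)

# demo on an i.i.d. (hence trivially Markov) discrete series
rng = np.random.default_rng(0)
p = markov_p_value(rng.integers(0, 3, size=500))
```

Note that this sketch omits the local permutation step; on strongly dependent samples, the p-values should instead be calibrated against a permutation-based null distribution as described above.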
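For the continuous case, the two preparation steps above (k nearest neighbours around each "present", and one-dimensional random projection of the embedded vectors before binning) can be sketched roughly as follows; the helper names and parameter values are illustrative, not taken from the paper:

```python
import numpy as np

def knn_indexes(z, t, k):
    """Indexes of the k nearest neighbours of the embedded 'present' z[t],
    treated as approximately equal repetitions of the same condition."""
    d = np.linalg.norm(z - z[t], axis=1)
    return np.argsort(d)[:k]  # includes t itself at distance 0

def project_and_bin(v, n_bins, rng):
    """Project embedded vectors onto one random direction, then bin the
    resulting scalars, so the contingency table stays low dimensional."""
    w = rng.normal(size=v.shape[1])  # random normal vector
    s = v @ w                        # one-dimensional projection
    edges = np.quantile(s, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(s, edges)     # bin labels 0 .. n_bins - 1

rng = np.random.default_rng(1)
z = rng.normal(size=(100, 3))        # e.g., delay-embedded vectors, m = 3
nbrs = knn_indexes(z, t=10, k=int(np.sqrt(len(z))))  # k ~ sqrt(sample size)
labels = project_and_bin(z, n_bins=4, rng=rng)
```

As discussed above, a single projection may be unlucky, so in practice `project_and_bin` would be called repeatedly with fresh random directions and the resulting p-values aggregated.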
Algorithm 1: Procedure for finding the p-value of the Markov property for the two time series separately, and for their joint time series, given lags 0, 1, and −1.
Require: x: sample of the first time series (preprocessed)
Require: y: sample of the second time series (preprocessed)
Require: m: integer, memory; the maximum of the largest significant lags of the two chains
Require: r: number of times the local permutations should be repeated to bootstrap the test statistic
Require: k: size of the local neighborhoods in which the permutations are performed; it shall be smaller than the neighborhood size used for the dependence testing (for details, refer to [24])
[pseudocode body not recoverable from extraction]
Algorithm 2: The function responsible for testing the Markovness of a time series (via conditional independence testing).
Require: the time series in which the Markov property is to be tested
Require: the list of unique values upon which the conditioning is to be carried out (the possible values of the "present")
[pseudocode body not recoverable from extraction]
* In our implementation, the independence test is the modified log-likelihood test.
6. Examples
6.1. Simulated Data
6.2. Real World Examples
6.2.1. Weather Data
6.2.2. Plasma Fluctuation Data
7. Discussion
8. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Lloyd, G.E.R. Magic, Reason and Experience; Cambridge University Press: Cambridge, UK, 1979. [Google Scholar]
- Pearl, J. Causality, 2nd ed.; Cambridge University Press: Cambridge, UK, 2009. [Google Scholar] [CrossRef]
- Wiener, N. The theory of prediction. In Modern Mathematics for Engineers; Beckenbach, E., Ed.; McGraw-Hill: New York, NY, USA, 1956. [Google Scholar]
- Granger, C.W.J. Investigating Causal Relations by Econometric Models and Cross-spectral Methods. Econometrica 1969, 37, 424–438. [Google Scholar] [CrossRef]
- Maziarz, M. A review of the Granger-causality fallacy. J. Philos. Econ. 2015, 8, 6. [Google Scholar] [CrossRef]
- Balasis, G.; Donner, R.V.; Potirakis, S.M.; Runge, J.; Papadimitriou, C.; Daglis, I.A.; Eftaxias, K.; Kurths, J. Statistical Mechanics and Information-Theoretic Perspectives on Complexity in the Earth System. Entropy 2013, 15, 4844–4888. [Google Scholar] [CrossRef]
- Runge, J.; Bathiany, S.; Bollt, E.; Camps-Valls, G.; Coumou, D.; Deyle, E.; Glymour, C.; Kretschmer, M.; Mahecha, M.; Muñoz, J.; et al. Inferring causation from time series in Earth system sciences. Nat. Commun. 2019, 10, 2553. [Google Scholar] [CrossRef] [PubMed]
- Sugihara, G.; May, R.; Ye, H.; Hao Hsieh, C.; Deyle, E.; Fogarty, M.; Munch, S. Detecting Causality in Complex Ecosystems. Science 2012, 338, 496–500. [Google Scholar] [CrossRef]
- Takens, F. Detecting Strange Attractors in Turbulence. In Dynamical Systems and Turbulence, Warwick 1980; Lecture Notes in Mathematics; Rand, D., Young, L.S., Eds.; Springer: Berlin/Heidelberg, Germany, 1981; Volume 898, Chapter 21; pp. 366–381. [Google Scholar] [CrossRef]
- Stark, J. Delay Embeddings for Forced Systems. I. Deterministic Forcing. J. Nonlinear Sci. 1999, 9, 255–332. [Google Scholar] [CrossRef]
- Stark, J.; Broomhead, D.; Davies, M.; Huke, J. Delay Embeddings for Forced Systems. II. Stochastic Forcing. J. Nonlinear Sci. 2003, 13, 519–577. [Google Scholar] [CrossRef]
- Spirtes, P.; Glymour, C. An Algorithm for Fast Recovery of Sparse Causal Graphs. Soc. Sci. Comput. Rev. 1991, 9, 62–72. [Google Scholar] [CrossRef]
- Malinsky, D.; Spirtes, P. Causal Structure Learning from Multivariate Time Series in Settings with Unmeasured Confounding. In Proceedings of the 2018 ACM SIGKDD Workshop on Causal Discovery, London, UK, 20 August 2018; Volume 92, pp. 23–47. [Google Scholar]
- Benko, Z.; Zlatniczki, A.; Stippinger, M.; Fabó, D.; Solyom, A.; Eross, L.; Telcs, A.; Somogyvari, Z. Complete Inference of Causal Relations between Dynamical Systems. arXiv 2018, arXiv:1808.10806. [Google Scholar]
- Lasota, A.; Mackey, M. Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics; Applied Mathematical Sciences, Springer: New York, NY, USA, 2013. [Google Scholar]
- Norris, J. Markov Chains; Cambridge Series in Statistical and Probabilistic Mathematics; Cambridge University Press: Cambridge, UK, 1998. [Google Scholar]
- Sun, J.; Taylor, D.; Bollt, E.M. Causal Network Inference by Optimal Causation Entropy. SIAM J. Appl. Dyn. Syst. 2015, 14, 73–106. [Google Scholar] [CrossRef]
- Li, C.; Fan, X. On nonparametric conditional independence tests for continuous variables. WIREs Comput. Stat. 2020, 12, e1489. [Google Scholar] [CrossRef]
- Lundborg, A.R.; Shah, R.D.; Peters, J. Conditional Independence Testing in Hilbert Spaces with Applications to Functional Data Analysis. arXiv preprint 2021. [Google Scholar] [CrossRef]
- Guyon, I.; Janzing, D.; Schölkopf, B. Causality: Objectives and Assessment. In Proceedings of the Workshop on Causality: Objectives and Assessment at NIPS 2008, Whistler, BC, Canada, 12 December 2008; Volume 6, pp. 1–42. [Google Scholar]
- Lin, Z.; Han, F. On boosting the power of Chatterjee’s rank correlation. Biometrika 2022, asac048. [Google Scholar] [CrossRef]
- Azadkia, M.; Chatterjee, S.; Bayati, M.; Taylor, J. A Nonparametric Measure of Conditional Dependence; Stanford University: Stanford, CA, USA, 2020. [Google Scholar]
- Rényi, A. On measures of dependence. Acta Math. Hung. 1959, 10, 441–451. [Google Scholar] [CrossRef]
- Runge, J. Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information. In Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, Lanzarote, Canary Islands, 9–11 April 2018; Volume 84, pp. 938–947. [Google Scholar]
- Wilson, D.J. The harmonic mean p-value for combining dependent tests. Proc. Natl. Acad. Sci. USA 2019, 116, 1195–1200. [Google Scholar] [CrossRef]
- Mooij, J.M.; Peters, J.; Janzing, D.; Zscheischler, J.; Schölkopf, B. Distinguishing cause from effect using observational data: Methods and benchmarks. J. Mach. Learn. Res. 2016, 17, 1103–1204. [Google Scholar]
- Hron, M.; Adámek, J.; Cavalier, J.; Dejarnac, R.; Ficker, O.; Grover, O.; Horáček, J.; Komm, M.; Macúšová, E.; Matveeva, E.; et al. Overview of the COMPASS results. Nucl. Fusion 2022, 62, 042021. [Google Scholar] [CrossRef]
- Anda, G.; Bencze, A.; Berta, M.; Dunai, D.; Hacek, P.; Krbec, J.; Réfy, D.; Krizsanóczi, T.; Bató, S.; Ilkei, T.; et al. Lithium beam diagnostic system on the COMPASS tokamak. Fusion Eng. Des. 2016, 108, 1–6. [Google Scholar] [CrossRef]
- Berta, M.; Anda, G.; Bencze, A.; Dunai, D.; Háček, P.; Hron, M.; Kovácsik, A.; Krbec, J.; Pánek, R.; Réfy, D.; et al. Li-BES detection system for plasma turbulence measurements on the COMPASS tokamak. Fusion Eng. Des. 2015, 96–97, 795–798. [Google Scholar] [CrossRef]
- Bencze, A.; Berta, M.; Buzás, A.; Hacek, P.; Krbec, J.; Szutyányi, M.; the COMPASS Team. Characterization of edge and scrape-off layer fluctuations using the fast Li-BES system on COMPASS. Plasma Phys. Control. Fusion 2019, 61, 085014. [Google Scholar] [CrossRef]
- Rudakov, D.L.; Boedo, J.A.; Moyer, R.A.; Krasheninnikov, S.; Leonard, A.W.; Mahdavi, M.A.; McKee, G.R.; Porter, G.D.; Stangeby, P.C.; Watkins, J.G.; et al. Fluctuation-driven transport in the DIII-D boundary. Plasma Phys. Control. Fusion 2002, 44, 717. [Google Scholar] [CrossRef]
- Vowels, M.J.; Camgoz, N.C.; Bowden, R. D’ya Like DAGs? A Survey on Structure Learning and Causal Discovery. ACM Comput. Surv. 2022, 55, 1–36. [Google Scholar] [CrossRef]
- Mastakouri, A.A.; Schölkopf, B.; Janzing, D. Necessary and sufficient conditions for causal feature selection in time series with latent common causes. arXiv 2020, arXiv:2005.08543. [Google Scholar]
- Zhang, J. On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artif. Intell. 2008, 172, 1873–1896. [Google Scholar] [CrossRef]
- Zhang, J. Causal Reasoning with Ancestral Graphs. J. Mach. Learn. Res. 2008, 9, 1437–1474. [Google Scholar]
- Lin, H.; Zhang, J. On Learning Causal Structures from Non-Experimental Data without Any Faithfulness Assumption. In Proceedings of the 31st International Conference on Algorithmic Learning Theory, San Diego, CA, USA, 9–11 February 2020; Volume 117, pp. 554–582. [Google Scholar]
- Liu, T.; Ungar, L.; Kording, K. Quantifying causality in data science with quasi-experiments. Nat. Comput. Sci. 2021, 1, 24–32. [Google Scholar] [CrossRef]
- Hirata, Y.; Aihara, K. Identifying hidden common causes from bivariate time series: A method using recurrence plots. Phys. Rev. E 2010, 81, 016203. [Google Scholar] [CrossRef]
- Hirata, Y.; Amigó, J.M.; Matsuzaka, Y.; Yokota, R.; Mushiake, H.; Aihara, K. Detecting Causality by Combined Use of Multiple Methods: Climate and Brain Examples. PLoS ONE 2016, 11, e0158572. [Google Scholar] [CrossRef]
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Stippinger, M.; Bencze, A.; Zlatniczki, Á.; Somogyvári, Z.; Telcs, A. Causal Discovery of Stochastic Dynamical Systems: A Markov Chain Approach. Mathematics 2023, 11, 852. https://doi.org/10.3390/math11040852