# Causal Geometry


## Abstract


## 1. Introduction

## 2. Effective Information in Continuous Systems

**x**. Note that this is distinct from $p(y \mid x)$ in that the $do$ operator allows us to distinguish the correlation introduced by the causal relation $x \to y$ from one due to a common cause $a \to \{x, y\}$.
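The distinction between conditioning and intervening can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a hypothetical linear Gaussian model with a hidden common cause $a \to \{x, y\}$ and a direct link $x \to y$ of strength `c`, and compares the least-squares slope of $y$ on $x$ in observational data with the slope obtained when $x$ is set externally, as under $do(x)$. All names and noise levels are illustrative assumptions.

```python
import numpy as np

def simulate(n, causal_strength, intervene, rng):
    """Sample (x, y) from a toy model with a common cause a -> {x, y} and a link x -> y."""
    a = rng.normal(size=n)                    # hidden common cause
    if intervene:
        x = rng.normal(size=n)                # do(x): x is set externally, cutting a -> x
    else:
        x = a + 0.5 * rng.normal(size=n)      # observational: x inherits variation from a
    y = causal_strength * x + a + 0.5 * rng.normal(size=n)
    return x, y

def slope(x, y):
    """Least-squares slope of y on x, a simple summary of how y depends on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

rng = np.random.default_rng(0)
c = 0.3                                       # assumed strength of the causal link x -> y
obs_slope = slope(*simulate(200_000, c, intervene=False, rng=rng))
do_slope = slope(*simulate(200_000, c, intervene=True, rng=rng))
print(f"observational slope:  {obs_slope:.3f}")
print(f"interventional slope: {do_slope:.3f}")
```

In this assumed model the observational slope converges to $c + 1/(1 + 0.25)$, because the common cause contributes its own correlation, whereas the interventional slope converges to $c$ alone.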

#### 2.1. Toy Example: Dimmer Switch

## 3. Causal Geometry

#### 3.1. Construction

#### 3.2. Relation To Sloppiness

## 4. Two-Dimensional Example

**x**, having some uniform error tolerance $\delta$, map to normal distributions over parameters $\mathbf{\theta}$ as $x \to q(\mathbf{\theta} \mid do(x)) = \mathcal{N}_{\theta}(A x,\, A A^{T} \delta^{2})$, giving the Bayesian inverse probability $\tilde{q}(do(x) \mid \mathbf{\theta}) = \mathcal{N}_{x}(A^{-1}\mathbf{\theta},\, \delta^{2})$, and hence the intervention metric $h_{\mu\nu} = \sum_{i} (A^{-1})_{i\mu} (A^{-1})_{i\nu} / \delta^{2}$. The effect space **y** is constructed by measuring the population size at several time-points, spaced out at intervals $\Delta t$, such that the components of **y** are given by $y_{n} = y(n\,\Delta t) = \mathrm{e}^{-n\,\Delta t\,\theta_{1}} + \mathrm{e}^{-n\,\Delta t\,\theta_{2}}$, with $n \in \{1, 2, \dots, N\}$ and error $\epsilon$ on each measurement (the initial conditions are thus always $y(0) = 2$). Thus, we have $\mathbf{\theta} \to p(y \mid do(\mathbf{\theta})) = \mathcal{N}_{y}(\{y_{n}\}, \epsilon^{2})$ and effect metric $g_{\mu\nu} = \sum_{n} \partial_{\mu} y_{n}\, \partial_{\nu} y_{n} / \epsilon^{2}$. Figure 2 shows these mappings with $N = 2$ for visual clarity, and we use $N = 3$ for the $EI_{g}$ calculations below, but all the qualitative behaviors remain the same for larger $N$. Figure 3 shows the resulting geometric $EI_{g}$ (blue curves), computed via Equation (7) for varying values of the error tolerances $\epsilon$ and $\delta$.
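As a rough numerical companion to the two metrics defined above, the following sketch evaluates $g_{\mu\nu}$ and $h_{\mu\nu}$ directly from their stated formulas. The function names, the value of $\Delta t$, the matrix $A$, and the parameter point are illustrative assumptions rather than values from the paper, and the sketch stops short of $EI_{g}$ itself.

```python
import numpy as np

def effect_metric(theta, n_points, dt, eps):
    """g_{mu nu} = sum_n d_mu y_n d_nu y_n / eps^2 for y_n = exp(-n dt th1) + exp(-n dt th2)."""
    theta = np.asarray(theta, dtype=float)
    n = np.arange(1, n_points + 1)[:, None]            # measurement index n = 1..N, shape (N, 1)
    # d y_n / d theta_mu = -n dt exp(-n dt theta_mu); rows index n, columns index mu
    jac = -n * dt * np.exp(-n * dt * theta[None, :])   # shape (N, 2)
    return jac.T @ jac / eps**2

def intervention_metric(A, delta):
    """h_{mu nu} = sum_i (A^-1)_{i mu} (A^-1)_{i nu} / delta^2."""
    A_inv = np.linalg.inv(A)
    return A_inv.T @ A_inv / delta**2

# Hypothetical values chosen only for illustration
theta = np.array([1.0, 3.0])
A = np.array([[1.0, 0.2],
              [0.0, 1.0]])
g = effect_metric(theta, n_points=3, dt=0.5, eps=1e-2)
h = intervention_metric(A, delta=1e-2)
print("effect metric g:\n", g)
print("intervention metric h:\n", h)
```

One property worth noting: on the diagonal $\theta_{1} = \theta_{2}$ the two columns of the Jacobian coincide, so $g$ becomes singular there, which is one way to see that the two decay rates cannot be told apart from measurements at that point.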

## 5. Discussion

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A. Example Illustrating the Do-Operator

## Appendix B. Deriving Geometric EI

#### Appendix B.1. One-Dimensional Case

#### Appendix B.2. Multi-Dimensional Case

**g** and **h**). Here, we assume that both functions $F(\theta)$ and $f(\theta)$ are invertible, which means that the intervention and effect spaces $\mathcal{X}$ and $\mathcal{Y}$ both have the same dimension as the parameter space $\Theta$: $d_{I} = d_{E} = d$. This allows us to view the map $\theta^{\mu} \to f^{i}(\theta)$ as a change of coordinates, with a square Jacobian matrix $\partial_{\mu} f^{i}$, whose determinant in the first line of Equation (A10) is thus well-defined and may be usefully expressed as $\det(\partial_{\mu} f^{i}) = \sqrt{\frac{\det g_{\mu\nu}}{\det E_{ij}}}$. Note also that to get the above result, we once again assume the distributions $\tilde{q}(do(x) \mid \mathbf{\theta})$ and $p(y \mid do(\mathbf{\theta}))$ to be nearly deterministic, meaning here that the matrices $\Delta$ and **E** must be large, though the precise form of the assumption is messy here.
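The determinant identity can be checked numerically. The sketch below assumes that the effect metric has the form $g_{\mu\nu} = \partial_{\mu} f^{i}\, E_{ij}\, \partial_{\nu} f^{j}$, which is consistent with the two-dimensional example in Section 4 (where $E_{ij} = \delta_{ij}/\epsilon^{2}$) but is an assumption here, since the defining equation is not reproduced in this section. The random matrices are arbitrary stand-ins, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
J = rng.normal(size=(d, d))        # stand-in for the square Jacobian d_mu f^i (rows i, columns mu)
M = rng.normal(size=(d, d))
E = M @ M.T + d * np.eye(d)        # arbitrary symmetric positive-definite stand-in for E_ij
g = J.T @ E @ J                    # assumed form g_{mu nu} = d_mu f^i E_ij d_nu f^j

lhs = abs(np.linalg.det(J))        # the square root fixes the determinant only up to sign
rhs = np.sqrt(np.linalg.det(g) / np.linalg.det(E))
print(lhs, rhs)
```

Under this assumption $\det g = (\det \partial_{\mu} f^{i})^{2} \det E$, so the two printed values should agree up to floating-point error; the identity determines the Jacobian determinant up to its sign, which corresponds to the orientation of the coordinate change.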


**Figure 1.** Illustrating continuous Effective Information (EI) on a simple toy system. (**a**) shows the system construction: a dimmer switch with a particular “dimmer profile” $f(\theta)$. We can intervene on it by setting the switch $\theta \in (0,1)$ up to error tolerance $\delta$, while effects are similarly measured with error $\epsilon$; (**b**) shows that for uniform errors $\epsilon = \delta = 0.03$, out of the family of dimmer profiles parametrized by $a$ (left), the linear profile gives the “best control,” i.e., has the highest EI (dark blue: numerical EI calculation; light blue: approximation in Equation (4)); (**c**) illustrates how, for two other dimmer profiles (left), increasing error tolerances $\epsilon = \delta$ influence the EI (right, calculated numerically). The profile in red represents a discrete binary switch, which emerges if we restrict the interventions on the blue dimmer profile to only use “ends of run.” Crucially, such coarse-graining allows for improved control of the light (higher EI) when errors are sufficiently large.

**Figure 2.** An illustration of the causal geometry construction in Equation (6). The parameter space $\Theta$ of our model gets two distinct geometric structures: the effect metric $g_{\mu\nu}(\mathbf{\theta})$ and the intervention metric $h_{\mu\nu}(\mathbf{\theta})$. Here, a model is seen as a map that associates with each set of parameters $\mathbf{\theta}$ some distribution of possible measured effects **y** (right). As parameters $\mathbf{\theta}$ may involve arbitrary abstractions and thus need not be directly controllable, we similarly associate them with practically doable interventions **x** (left). This way, our system description in terms of $\mathbf{\theta}$ “mediates” between the interventions and resulting effects in the causal model.

**Figure 3.** Causal emergence from increasing errors for the toy model in Section 4. In all panels, the blue line shows the $EI_{g}$ for the full 2D model, while the red line shows it for the 1D submanifold A shown in Figure 2 (solid red line). In (**a**), we vary the effect error $\epsilon$ at fixed intervention error $\delta = 10^{-2}$; (**b**) varies the intervention error $\delta$ at fixed effect error $\epsilon = 10^{-2}$; and (**c**) varies both together, $\delta = \epsilon$. In each case, we see a crossover where, with no change in system behavior, the coarse-grained 1D model becomes causally more informative when our intervention or effect errors become large.

**Figure 4.** The optimal model choice depends on both the effects we choose to measure and the intervention capabilities we have. Horizontally, we vary the time-scale $\Delta t$ on which we measure the bacterial population dynamics in our toy model (Section 4): the top row shows how this changes the shape of our effect manifold. (**a**) shows the results when our intervention capabilities are nearly in direct correspondence with the parameters $\mathbf{\theta}$. Here, the $EI_{g}$ plot shows that varying $\Delta t$ takes us through three regimes: submanifold A is the optimal model at early times, the full 2D model is optimal at intermediate times, and submanifold B is most informative at late times. (**b**) shows that this entire picture changes for a different set of intervention capabilities, illustrating that the appropriate model choice depends as much on the interventions as on the effects.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Chvykov, P.; Hoel, E. Causal Geometry. *Entropy* **2021**, *23*, 24.
https://doi.org/10.3390/e23010024
