Article

The Causal Interaction between Complex Subsystems

by X. San Liang 1,2,3
1 Department of Atmospheric & Oceanic Sciences, Institute of Atmospheric Sciences, Fudan University, Shanghai 200438, China
2 IRDR ICoE on Risk Interconnectivity and Governance on Weather/Climate Extremes Impact and Public Health, Fudan University, Shanghai 200438, China
3 Shanghai Qi Zhi Institute (Andrew C. Yao Institute for Artificial Intelligence), Shanghai 200232, China
Entropy 2022, 24(1), 3; https://doi.org/10.3390/e24010003
Submission received: 29 November 2021 / Revised: 16 December 2021 / Accepted: 16 December 2021 / Published: 21 December 2021
(This article belongs to the Special Issue Information Geometry, Complexity Measures and Data Analysis)

Abstract

Information flow provides a natural measure for the causal interaction between dynamical events. This study extends our previous rigorous formalism of componentwise information flow to the bulk information flow between two complex subsystems of a large-dimensional parental system. Analytical formulas have been obtained in a closed form. Under a Gaussian assumption, their maximum likelihood estimators have also been obtained. These formulas have been validated using different subsystems with preset relations, and they yield causalities just as expected. On the contrary, the commonly used proxies for the characterization of subsystems, such as averages and principal components, generally do not work correctly. This study can help diagnose the emergence of patterns in complex systems and is expected to have applications in many real world problems in different disciplines such as climate science, fluid dynamics, neuroscience, financial economics, etc.

1. Introduction

When investigating the properties of a complex system, it is often necessary to study the interaction between one subsystem and another subsystem, which themselves also form complex systems, usually with a large number of components involved. In climate science, for example, there is much interest in understanding how one sector of the system collaborates with another sector to cause climate change (see [1] and the references therein); in neuroscience, it is important to investigate the effective connectivity from one brain region to another, each with millions of neurons involved (e.g., [2,3]), and the interaction between structures (e.g., [4,5,6]; see more references in a recent review [7]). This naturally raises a question: How can we study the interaction between two subsystems in a large parental system?
An immediate answer that comes to mind might be to study the componentwise interactions, assessing the causalities between the respective components using, for instance, the classical causal inference approaches (e.g., [8,9,10]). This is generally infeasible when the dimensionality is large: for two subsystems each with, say, 1000 components, one ends up with a million causal relations, far too many to interpret even though every detail is available. In such a case the details themselves are not the goal; they would still have to be synthesized into a big, interpretable picture of the phenomenon. On the other hand, in many situations this is not necessary; one needs only a "bulk" description of the subsystems and their interactions. Examples are the Reynolds equations for turbulence (e.g., [11]) and the thermodynamic description of molecular motions (e.g., [12]). In some fields (e.g., climate science, neuroscience, geography, etc.), a common practice is simply to take the respective averages and to study the interactions between these proxies, i.e., the mean properties. A more sophisticated approach is to extract the respective principal components (PCs) (e.g., [13,14,15]) and analyze the interactions between them. As we will examine in this study, however, these approaches may not work satisfactorily; their validity needs to be carefully checked before they are put into application.
During the past 16 years, it has gradually been realized that causality in terms of information flow (IF) is a real physical notion that can be rigorously derived from first principles (see [16]). When two processes interact, IF provides not only the direction but also the strength of the interaction. Thus far, the formalism of the IF between two components has been well established (see [16,17,18,19,20], among others), and extending it to subspaces with many components appears promising. A pioneering effort is [21], where the authors show that the heuristic argument in [17] applies equally to subsystems in the case with only one-way causality. A recent study on the role of individual nodes in a complex network [22] may be viewed as another effort. (Causality analyses between subspaces with the classical approaches are rare; a few examples are [23,24].) However, a rigorous formalism for more generic problems (e.g., with mutual causality involved) has yet to be established. This motivates the objective of this study: to investigate the interactions between two complex subsystems within a large parental system through the "bulk" information flow between them.
The rest of the paper is organized as follows. In Section 2, we first present the setting of the problem and then derive the IF formulas. Maximum likelihood estimators of these formulas are given in Section 3, followed by a validation in Section 4. Finally, Section 5 summarizes the study.

2. Information Flow between Two Subspaces of a Complex System

Consider an n-dimensional dynamical system
$$
\mathbf{A}:\qquad
\begin{aligned}
\frac{dx_1}{dt} &= F_1(x_1, x_2, \dots, x_n; t) + \sum_{k=1}^{m} b_{1k}(x_1, x_2, \dots, x_n; t)\,\dot w_k,\\
&\;\;\vdots\\
\frac{dx_r}{dt} &= F_r(x_1, x_2, \dots, x_n; t) + \sum_{k=1}^{m} b_{rk}(x_1, x_2, \dots, x_n; t)\,\dot w_k,
\end{aligned}
\tag{1}
$$
$$
\mathbf{B}:\qquad
\begin{aligned}
\frac{dx_{r+1}}{dt} &= F_{r+1}(x_1, x_2, \dots, x_n; t) + \sum_{k=1}^{m} b_{r+1,k}(x_1, x_2, \dots, x_n; t)\,\dot w_k,\\
&\;\;\vdots\\
\frac{dx_s}{dt} &= F_s(x_1, x_2, \dots, x_n; t) + \sum_{k=1}^{m} b_{sk}(x_1, x_2, \dots, x_n; t)\,\dot w_k,
\end{aligned}
\tag{2}
$$
$$
\begin{aligned}
\frac{dx_{s+1}}{dt} &= F_{s+1}(x_1, x_2, \dots, x_n; t) + \sum_{k=1}^{m} b_{s+1,k}(x_1, x_2, \dots, x_n; t)\,\dot w_k,\\
&\;\;\vdots\\
\frac{dx_n}{dt} &= F_n(x_1, x_2, \dots, x_n; t) + \sum_{k=1}^{m} b_{nk}(x_1, x_2, \dots, x_n; t)\,\dot w_k,
\end{aligned}
\tag{3}
$$
where $\mathbf{x} \in \mathbb{R}^n$ denotes the vector of state variables $(x_1, x_2, \dots, x_n)$, $\mathbf{F} = (F_1, \dots, F_n)$ are differentiable functions of $\mathbf{x}$ and time $t$, $\mathbf{w}$ is a vector of $m$ independent standard Wiener processes, and $\mathsf{B} = (b_{ij})$ is an $n \times m$ matrix of stochastic perturbation amplitudes. Here we follow the convention in physics of not distinguishing a random variable from its deterministic counterpart. From the components $(x_1, \dots, x_n)$, we separate out two sets, $(x_1, \dots, x_r)$ and $(x_{r+1}, \dots, x_s)$, denoted $\mathbf{x}_{1r}$ and $\mathbf{x}_{r+1,\dots,s}$, respectively; the remaining components $(x_{s+1}, \dots, x_n)$ are denoted $\mathbf{x}_{s+1,\dots,n}$. The subsystems formed by the first two sets are henceforth referred to as A and B, and the following is a derivation of the information flow between them. Note that, for convenience, A and B are here placed adjacent to each other; if they are not, the equations can always be rearranged to make them so.
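Before proceeding with the derivation, it may help to fix ideas numerically. The following is a minimal NumPy sketch (ours, not part of the formalism, with an arbitrary four-dimensional example in which A comprises the first two components, B the last two, and the remainder is empty): it integrates a system of the form of Equations (1)–(3) with the Euler–Maruyama scheme, which suffices to produce sample trajectories of the subsystems.

```python
import numpy as np

def euler_maruyama(F, B, x0, dt, n_steps, rng=None):
    """Integrate dx/dt = F(x, t) + B(x, t) w_dot by the Euler-Maruyama scheme.

    F : callable (x, t) -> length-n drift vector
    B : callable (x, t) -> (n, m) matrix of noise amplitudes b_ij
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty((n_steps + 1, x0.size))
    x[0] = x0
    for k in range(n_steps):
        t = k * dt
        Bk = B(x[k], t)
        dw = rng.normal(0.0, np.sqrt(dt), size=Bk.shape[1])  # Wiener increments
        x[k + 1] = x[k] + F(x[k], t) * dt + Bk @ dw
    return x

# Illustrative 4-D linear example: A = (x1, x2), B = (x3, x4), additive noise.
A_mat = np.array([[-1.0,  0.0,  0.0,  0.0],
                  [ 0.5, -1.0,  0.0,  0.0],
                  [ 0.6,  0.0, -1.0,  0.3],
                  [ 0.0,  0.0,  0.4, -1.0]])
F = lambda x, t: A_mat @ x
B = lambda x, t: 0.5 * np.eye(4)
traj = euler_maruyama(F, B, np.zeros(4), dt=0.01, n_steps=50_000)
xA, xB = traj[:, :2], traj[:, 2:]     # components of the two subsystems
```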
Associated with Equations (1)–(3) there is a Fokker–Planck equation governing the evolution of the joint probability density function (pdf) ρ of x :
$$
\frac{\partial\rho}{\partial t} + \frac{\partial(\rho F_1)}{\partial x_1} + \frac{\partial(\rho F_2)}{\partial x_2} + \cdots + \frac{\partial(\rho F_n)}{\partial x_n}
= \frac12 \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{\partial^2 (g_{ij}\,\rho)}{\partial x_i \partial x_j},
\tag{4}
$$
where $g_{ij} = \sum_{k=1}^{m} b_{ik} b_{jk}$, $i, j = 1, \dots, n$. Without much loss of generality, $\rho$ is assumed to be compactly supported on $\mathbb{R}^n$. The joint pdfs of $\mathbf{x}_{1r}$ and $\mathbf{x}_{r+1,\dots,s}$ are, respectively,
$$
\rho_{1r} = \int_{\mathbb{R}^{n-r}} \rho(\mathbf{x})\, dx_{r+1}\cdots dx_n \equiv \int_{\mathbb{R}^{n-r}} \rho(\mathbf{x})\, d\mathbf{x}_{r+1,\dots,n},
\qquad
\rho_{r+1,\dots,s} = \int_{\mathbb{R}^{n-s+r}} \rho(\mathbf{x})\, dx_1\cdots dx_r\, dx_{s+1}\cdots dx_n \equiv \int_{\mathbb{R}^{n-s+r}} \rho(\mathbf{x})\, d\mathbf{x}_{1,\dots,r,\,s+1,\dots,n}.
\tag{5}
$$
With respect to them, the joint entropies are then
$$
H_A = -\int_{\mathbb{R}^{r}} \rho_{1r}\,\log\rho_{1r}\; d\mathbf{x}_{1r},
\qquad
H_B = -\int_{\mathbb{R}^{s-r}} \rho_{r+1,\dots,s}\,\log\rho_{r+1,\dots,s}\; d\mathbf{x}_{r+1,\dots,s}.
\tag{6}
$$
To derive the evolution of ρ 1 r , integrate out ( x r + 1 , , x n ) in Equation (4). This yields, by using the assumption of compactness for ρ ,
$$
\frac{\partial\rho_{1r}}{\partial t} + \sum_{i=1}^{r}\frac{\partial}{\partial x_i}\int_{\mathbb{R}^{n-r}} \rho F_i\; d\mathbf{x}_{r+1,\dots,n}
= \frac12\sum_{i=1}^{r}\sum_{j=1}^{r}\int_{\mathbb{R}^{n-r}} \frac{\partial^2(g_{ij}\,\rho)}{\partial x_i\partial x_j}\; d\mathbf{x}_{r+1,\dots,n}.
\tag{7}
$$
Similarly,
$$
\frac{\partial\rho_{r+1,\dots,s}}{\partial t} + \sum_{i=r+1}^{s}\frac{\partial}{\partial x_i}\int_{\mathbb{R}^{n-s+r}} \rho F_i\; d\mathbf{x}_{1,\dots,r,\,s+1,\dots,n}
= \frac12\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\int_{\mathbb{R}^{n-s+r}} \frac{\partial^2(g_{ij}\,\rho)}{\partial x_i\partial x_j}\; d\mathbf{x}_{1,\dots,r,\,s+1,\dots,n}.
\tag{8}
$$
Multiplication of Equation (7) by $-(1 + \log\rho_{1r})$, followed by an integration with respect to $\mathbf{x}_{1r}$ over $\mathbb{R}^r$, yields
$$
\frac{dH_A}{dt} - \sum_{i=1}^{r}\int_{\mathbb{R}^{r}} (1+\log\rho_{1r})\cdot\frac{\partial}{\partial x_i}\!\left[\int_{\mathbb{R}^{n-r}} \rho F_i\, d\mathbf{x}_{r+1,\dots,n}\right] d\mathbf{x}_{1r}
= -\frac12\int_{\mathbb{R}^{r}} (1+\log\rho_{1r})\cdot\sum_{i=1}^{r}\sum_{j=1}^{r}\int_{\mathbb{R}^{n-r}} \frac{\partial^2(g_{ij}\,\rho)}{\partial x_i\partial x_j}\, d\mathbf{x}_{r+1,\dots,n}\, d\mathbf{x}_{1r}.
$$
Note that in the second term of the left hand side, the part within the summation is, by integration by parts,
$$
\int_{\mathbb{R}^{r}} (\log\rho_{1r})\cdot\frac{\partial}{\partial x_i}\!\left[\int_{\mathbb{R}^{n-r}} \rho F_i\, d\mathbf{x}_{r+1,\dots,n}\right] d\mathbf{x}_{1r}
= -\int_{\mathbb{R}^{r}}\int_{\mathbb{R}^{n-r}} \rho F_i\,\frac{\partial\log\rho_{1r}}{\partial x_i}\, d\mathbf{x}_{r+1,\dots,n}\, d\mathbf{x}_{1r}
= -\int_{\mathbb{R}^{n}} \rho F_i\,\frac{\partial\log\rho_{1r}}{\partial x_i}\, d\mathbf{x}
= -E\!\left[F_i\,\frac{\partial\log\rho_{1r}}{\partial x_i}\right].
$$
In the derivation, the compactness assumption has been used (variables vanish at the boundaries). By the same approach, the right hand side becomes
$$
-\frac12\int_{\mathbb{R}^{r}} \log\rho_{1r}\cdot\sum_{i=1}^{r}\sum_{j=1}^{r}\int_{\mathbb{R}^{n-r}} \frac{\partial^2(g_{ij}\,\rho)}{\partial x_i\partial x_j}\, d\mathbf{x}_{r+1,\dots,n}\, d\mathbf{x}_{1r}
= -\frac12\sum_{i=1}^{r}\sum_{j=1}^{r}\int_{\mathbb{R}^{n}} \log\rho_{1r}\cdot\frac{\partial^2(g_{ij}\,\rho)}{\partial x_i\partial x_j}\, d\mathbf{x}
= -\frac12\sum_{i=1}^{r}\sum_{j=1}^{r}\int_{\mathbb{R}^{n}} g_{ij}\,\rho\,\frac{\partial^2\log\rho_{1r}}{\partial x_i\partial x_j}\, d\mathbf{x}
= -\frac12\sum_{i=1}^{r}\sum_{j=1}^{r} E\!\left[g_{ij}\,\frac{\partial^2\log\rho_{1r}}{\partial x_i\partial x_j}\right].
$$
Hence,
$$
\frac{dH_A}{dt} = -\sum_{i=1}^{r} E\!\left[F_i\,\frac{\partial\log\rho_{1r}}{\partial x_i}\right] - \frac12\sum_{i=1}^{r}\sum_{j=1}^{r} E\!\left[g_{ij}\,\frac{\partial^2\log\rho_{1r}}{\partial x_i\partial x_j}\right].
$$
Likewise, we have
$$
\frac{dH_B}{dt} = -\sum_{i=r+1}^{s} E\!\left[F_i\,\frac{\partial\log\rho_{r+1,\dots,s}}{\partial x_i}\right] - \frac12\sum_{i=r+1}^{s}\sum_{j=r+1}^{s} E\!\left[g_{ij}\,\frac{\partial^2\log\rho_{r+1,\dots,s}}{\partial x_i\partial x_j}\right].
$$
Now consider the impact of the subsystem A on its peer B, written $T_{A\to B}$. Following Liang (2016) [16], this is associated with the evolution of the joint entropy of the latter:
$$
\frac{dH_B}{dt} = \frac{dH_{B\backslash A}}{dt} + T_{A\to B},
$$
where $H_{B\backslash A}$ signifies the entropy evolution with the influence of A excluded, which is found by instantaneously freezing $\mathbf{x}_{1r} = (x_1, \dots, x_r)$ as parameters. To do this, examine, on an infinitesimal interval $[t, t+\Delta t]$, a system modified from the original Equations (1)–(3) by removing the $r$ equations for $x_1, x_2, \dots, x_r$ from the equation set:
$$
\begin{aligned}
\frac{dx_{r+1}}{dt} &= F_{r+1}(x_1, x_2, \dots, x_n; t) + \sum_{k=1}^{m} b_{r+1,k}(x_1, x_2, \dots, x_n; t)\,\dot w_k,\\
&\;\;\vdots\\
\frac{dx_s}{dt} &= F_s(x_1, x_2, \dots, x_n; t) + \sum_{k=1}^{m} b_{sk}(x_1, x_2, \dots, x_n; t)\,\dot w_k,\\
\frac{dx_{s+1}}{dt} &= F_{s+1}(x_1, x_2, \dots, x_n; t) + \sum_{k=1}^{m} b_{s+1,k}(x_1, x_2, \dots, x_n; t)\,\dot w_k,\\
&\;\;\vdots\\
\frac{dx_n}{dt} &= F_n(x_1, x_2, \dots, x_n; t) + \sum_{k=1}^{m} b_{nk}(x_1, x_2, \dots, x_n; t)\,\dot w_k.
\end{aligned}
$$
Notice that the $F_i$'s and $b_{ik}$'s still depend on $(x_1, \dots, x_r) = \mathbf{x}_{1r}$, which, however, appear in the modified system as parameters. Following [16], one may construct a mapping $\Phi: \mathbb{R}^{n-r} \to \mathbb{R}^{n-r}$, $\mathbf{x}_{\backslash A}(t) \mapsto \mathbf{x}_{\backslash A}(t+\Delta t)$, where $\mathbf{x}_{\backslash A}$ means $\mathbf{x}$ but with $\mathbf{x}_{1r}$ appearing as parameters, and study the Frobenius–Perron operator (see, for example, [25]) of the modified system. An alternative approach is given by Liang in [18], which we henceforth follow. Observe that on the interval $[t, t+\Delta t]$, corresponding to the modified dynamical system, there is also a Fokker–Planck equation:
$$
\frac{\partial\rho_{\backslash A}}{\partial t} + \sum_{i=r+1}^{n} \frac{\partial (F_i\,\rho_{\backslash A})}{\partial x_i}
= \frac12\sum_{i=r+1}^{n}\sum_{j=r+1}^{n} \frac{\partial^2 (g_{ij}\,\rho_{\backslash A})}{\partial x_i\partial x_j},
\qquad \rho_{\backslash A} = \rho_{r+1,\dots,n}\ \ \text{at time } t.
$$
Here $g_{ij} = \sum_{k=1}^{m} b_{ik} b_{jk}$, and $\rho_{\backslash A}$ denotes the joint pdf of $(x_{r+1}, \dots, x_n)$ evolved with $\mathbf{x}_{1r}$ frozen as parameters. Note the difference between $\rho_{\backslash A}$ and $\rho_{r+1,\dots,n}$: the former has $\mathbf{x}_{1r}$ as parameters, while the latter has no dependence on $\mathbf{x}_{1r}$. They are, however, equal at time $t$.
Integration of the above Fokker–Planck equation with respect to $d\mathbf{x}_{s+1,\dots,n}$ gives the evolution of the pdf of subsystem B with A frozen as parameters, written $\rho_{B\backslash A}$:
$$
\frac{\partial\rho_{B\backslash A}}{\partial t} + \sum_{i=r+1}^{s}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{\backslash A})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}
= \frac12\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{\backslash A})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n},
\qquad \rho_{B\backslash A} = \rho_{r+1,\dots,s}\ \ \text{at time } t.
\tag{16}
$$
Dividing Equation (16) by $\rho_{B\backslash A}$ and writing $\mathbf{x}_{r+1,\dots,s}$ simply as $\mathbf{x}_B$, we obtain
$$
\frac{\partial\log\rho_{B\backslash A}}{\partial t} + \sum_{i=r+1}^{s}\frac{1}{\rho_{B\backslash A}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{\backslash A})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}
= \frac{1}{2\rho_{B\backslash A}}\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{\backslash A})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}.
$$
Discretizing in time and noticing that $\rho_{B\backslash A}(t) = \rho_{r+1,\dots,s}(t)$, we have (in the following, unless otherwise indicated, variables without explicitly specified arguments are evaluated at time step $t$)
$$
\log\rho_{B\backslash A}(\mathbf{x}_B;\, t+\Delta t) = \log\rho_{r+1,\dots,s}(\mathbf{x}_B;\, t)
- \Delta t\sum_{i=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{r+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}
+ \frac{\Delta t}{2}\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{r+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n} + o(\Delta t).
$$
To arrive at $dH_{B\backslash A}/dt$, we need to find $\log\rho_{B\backslash A}(\mathbf{x}_B(t+\Delta t);\, t+\Delta t)$. Using the Euler–Bernstein approximation,
$$
\mathbf{x}_B(t+\Delta t) = \mathbf{x}_B(t) + \mathbf{F}_B\,\Delta t + \mathsf{B}_B\,\Delta\mathbf{w},
$$
where, in the same spirit as the notation $\mathbf{x}_B$,
$$
\mathbf{F}_B = (F_{r+1}, \dots, F_s)^T,\qquad
\mathsf{B}_B = \begin{pmatrix} b_{r+1,1} & \cdots & b_{r+1,m}\\ \vdots & & \vdots\\ b_{s1} & \cdots & b_{sm} \end{pmatrix},\qquad
\Delta\mathbf{w} = (\Delta w_1, \dots, \Delta w_m)^T,
$$
and $\Delta w_k \sim N(0, \Delta t)$, we have
$$
\begin{aligned}
\log\rho_{B\backslash A}&(\mathbf{x}_B(t+\Delta t);\, t+\Delta t)\\
=\ & \log\rho_{r+1,\dots,s}(\mathbf{x}_B(t) + \mathbf{F}_B\Delta t + \mathsf{B}_B\Delta\mathbf{w};\, t)
- \Delta t\sum_{i=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{r+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\\
&+ \frac{\Delta t}{2}\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{r+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n} + o(\Delta t)\\
=\ & \log\rho_{r+1,\dots,s}(\mathbf{x}_B(t)) + \sum_{i=r+1}^{s}\frac{\partial\log\rho_{r+1,\dots,s}}{\partial x_i}\Big(F_i\Delta t + \sum_{k=1}^{m} b_{ik}\Delta w_k\Big)\\
&+ \frac12\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\frac{\partial^2\log\rho_{r+1,\dots,s}}{\partial x_i\partial x_j}\Big(F_i\Delta t + \sum_{k=1}^{m} b_{ik}\Delta w_k\Big)\Big(F_j\Delta t + \sum_{l=1}^{m} b_{jl}\Delta w_l\Big)\\
&- \Delta t\sum_{i=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{r+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}
+ \frac{\Delta t}{2}\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{r+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n} + o(\Delta t).
\end{aligned}
$$
Now take the mathematical expectation on both sides; the left hand side then yields $-H_{B\backslash A}(t+\Delta t)$. By Corollary III.1 of [16], and noting that $E\,\Delta w_k = 0$, $E\,\Delta w_k^2 = \Delta t$, and that $\Delta\mathbf{w}$ is independent of $\mathbf{x}_B$, we have
$$
\begin{aligned}
H_{B\backslash A}(t+\Delta t) =\ & H_B(t) - \Delta t\, E\!\left[\sum_{i=r+1}^{s} F_i\,\frac{\partial\log\rho_{r+1,\dots,s}}{\partial x_i}\right]
- \frac{\Delta t}{2}\, E\!\left[\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\sum_{k=1}^{m}\sum_{l=1}^{m} b_{ik} b_{jl}\,\delta_{kl}\,\frac{\partial^2\log\rho_{r+1,\dots,s}}{\partial x_i\partial x_j}\right]\\
&+ \Delta t\, E\!\left[\sum_{i=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{r+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\right]
- \frac{\Delta t}{2}\, E\!\left[\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{r+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}\right] + o(\Delta t)\\
=\ & H_B(t) - \Delta t\, E\!\left[\sum_{i=r+1}^{s} F_i\,\frac{\partial\log\rho_{r+1,\dots,s}}{\partial x_i}\right]
- \frac{\Delta t}{2}\, E\!\left[\sum_{i=r+1}^{s}\sum_{j=r+1}^{s} g_{ij}\,\frac{\partial^2\log\rho_{r+1,\dots,s}}{\partial x_i\partial x_j}\right]\\
&+ \Delta t\, E\!\left[\sum_{i=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{r+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\right]
- \frac{\Delta t}{2}\, E\!\left[\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{r+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}\right] + o(\Delta t).
\end{aligned}
$$
Thus,
$$
\begin{aligned}
\frac{dH_{B\backslash A}}{dt} &= \lim_{\Delta t\to 0}\frac{H_{B\backslash A}(t+\Delta t) - H_B(t)}{\Delta t}\\
&= -E\!\left[\sum_{i=r+1}^{s}\left( F_i\,\frac{\partial\log\rho_{r+1,\dots,s}}{\partial x_i} - \frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{r+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\right)\right]\\
&\quad - \frac12\, E\!\left[\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\left( g_{ij}\,\frac{\partial^2\log\rho_{r+1,\dots,s}}{\partial x_i\partial x_j} + \frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{r+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}\right)\right].
\end{aligned}
$$
Hence, the information flow from x 1 r to x r + 1 , , s is
$$
\begin{aligned}
T_{A\to B} &= \frac{dH_B}{dt} - \frac{dH_{B\backslash A}}{dt}\\
&= -E\!\left[\sum_{i=r+1}^{s} F_i\,\frac{\partial\log\rho_{r+1,\dots,s}}{\partial x_i}\right] - \frac12\, E\!\left[\sum_{i=r+1}^{s}\sum_{j=r+1}^{s} g_{ij}\,\frac{\partial^2\log\rho_{r+1,\dots,s}}{\partial x_i\partial x_j}\right]\\
&\quad + E\!\left[\sum_{i=r+1}^{s}\left( F_i\,\frac{\partial\log\rho_{r+1,\dots,s}}{\partial x_i} - \frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{r+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\right)\right]\\
&\quad + \frac12\, E\!\left[\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\left( g_{ij}\,\frac{\partial^2\log\rho_{r+1,\dots,s}}{\partial x_i\partial x_j} + \frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{r+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}\right)\right]\\
&= -E\!\left[\sum_{i=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{r+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\right]
+ \frac12\, E\!\left[\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{r+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}\right].
\end{aligned}
$$
Likewise, we can obtain the information flow from subsystem B to subsystem A. These are summarized in the following theorem.
Theorem 1.
For the dynamical system of Equations (1)–(3), if the probability density function (pdf) of $\mathbf{x}$ is compactly supported, then the information flow from $\mathbf{x}_{1r}$ to $\mathbf{x}_{r+1,\dots,s}$ and that from $\mathbf{x}_{r+1,\dots,s}$ to $\mathbf{x}_{1r}$ are (in nats per unit time), respectively,
$$
T_{A\to B} = -E\!\left[\sum_{i=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{r+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\right]
+ \frac12\, E\!\left[\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{r+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}\right],
\tag{19}
$$
$$
T_{B\to A} = -E\!\left[\sum_{i=1}^{r}\frac{1}{\rho_{1r}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{1,\dots,r,\,s+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\right]
+ \frac12\, E\!\left[\sum_{i=1}^{r}\sum_{j=1}^{r}\frac{1}{\rho_{1r}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{1,\dots,r,\,s+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}\right],
\tag{20}
$$
where $g_{ij} = \sum_{k=1}^{m} b_{ik} b_{jk}$, and $E$ signifies mathematical expectation.
When r = 1 , s = n = 2 , (20) reduces to
$$
T_{B\to A} = -E\!\left[\frac{1}{\rho_1}\frac{\partial(F_1\,\rho_1)}{\partial x_1}\right] + \frac12\, E\!\left[\frac{1}{\rho_1}\frac{\partial^2(g_{11}\,\rho_1)}{\partial x_1^2}\right],
$$
which is precisely Equation (15) of [18]; the same applies to Equation (19). These formulas are hence verified against the established two-dimensional results.
The following theorem forms the basis for causal inference.
Theorem 2.
If the evolution of subsystem A (resp. B) does not depend on $\mathbf{x}_{r+1,\dots,s}$ (resp. $\mathbf{x}_{1r}$), then $T_{B\to A} = 0$ (resp. $T_{A\to B} = 0$).
Proof. 
We only check the formula for $T_{B\to A}$. In (20), the deterministic part is (apart from the sign)
$$
E\!\left[\sum_{i=1}^{r}\frac{1}{\rho_{1r}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{1,\dots,r,\,s+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\right]
= \sum_{i=1}^{r}\int_{\mathbb{R}^{r}}\int_{\mathbb{R}^{s-r}} \rho_{1,\dots,s}\left(\frac{1}{\rho_{1r}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{1,\dots,r,\,s+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\right) d\mathbf{x}_{r+1,\dots,s}\, d\mathbf{x}_{1r}.
$$
Now, $F_i$ is independent of $\mathbf{x}_{r+1,\dots,s}$, and so is $\rho_{1,\dots,r,\,s+1,\dots,n}$. Thus, $\rho_{1,\dots,s}$ may be integrated within the parentheses directly with respect to $d\mathbf{x}_{r+1,\dots,s}$, yielding
$$
\frac{\int_{\mathbb{R}^{s-r}} \rho_{1,\dots,s}\, d\mathbf{x}_{r+1,\dots,s}}{\rho_{1r}} = \frac{\rho_{1r}}{\rho_{1r}} = 1.
$$
By the compactness of $\rho$, the whole deterministic part hence vanishes. Likewise, it can be shown that the stochastic part vanishes as well. □
This theorem allows us to identify causality with information flow: if $T_{B\to A} = 0$, then B is not causal to A, and vice versa; the same holds for $T_{A\to B}$.

3. Information Flow between Linear Subsystems and Its Estimation

Linear systems provide the simplest framework and are usually taken as the first step toward a more generic setting. Simple as they are, it has been demonstrated in practice that linear results often provide a good approximation to an otherwise much more complicated problem. It is hence of interest to examine this special case.
Let
$$
F_i = f_i + \sum_{j=1}^{n} a_{ij}\, x_j,
$$
where f i and a i j are constants. Additionally, suppose that b i j are constants—that is to say, the noises are additive. Then, g i j are also constants. Thus, in Equation (20),
$$
\begin{aligned}
E\!\left[\frac{1}{\rho_{1r}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{1,\dots,r,\,s+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}\right]
&= g_{ij}\int_{\mathbb{R}^{s}} \rho(\mathbf{x}_{1,\dots,s})\,\frac{1}{\rho_{1r}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2\rho_{1,\dots,r,\,s+1,\dots,n}}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}\; d\mathbf{x}_{1,\dots,s}\\
&= g_{ij}\int_{\mathbb{R}^{r}}\int_{\mathbb{R}^{s-r}} \frac{\rho_{1,\dots,s}}{\rho_{1r}}\,\frac{\partial^2\rho_{1r}}{\partial x_i\partial x_j}\, d\mathbf{x}_{r+1,\dots,s}\, d\mathbf{x}_{1r}
= g_{ij}\int_{\mathbb{R}^{r}} 1\cdot\frac{\partial^2\rho_{1r}}{\partial x_i\partial x_j}\, d\mathbf{x}_{1r} = 0.
\end{aligned}
$$
The same holds in Equation (19). Thus, the stochastic parts in both Equations (19) and (20) vanish.
Since a linear system initialized with a Gaussian distribution remains Gaussian for all time, we may write the joint pdf of $\mathbf{x}$ as
$$
\rho(x_1, \dots, x_n) = \frac{1}{\sqrt{(2\pi)^n\det\boldsymbol\Sigma}}\; e^{-\frac12(\mathbf{x}-\boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)},
$$
where $\boldsymbol\Sigma = (\sigma_{ij})_{n\times n}$ is the population covariance matrix of $\mathbf{x}$. By the properties of the Gaussian distribution, it is easy to show that
$$
\rho_{r+1,\dots,s}(x_{r+1}, \dots, x_s) = \frac{1}{\sqrt{(2\pi)^{s-r}\det\boldsymbol\Sigma_B}}\; e^{-\frac12(\mathbf{x}_B-\boldsymbol\mu_B)^T\boldsymbol\Sigma_B^{-1}(\mathbf{x}_B-\boldsymbol\mu_B)},
$$
where $\mathbf{x}_B = (x_{r+1}, \dots, x_s)$, $\boldsymbol\mu_B = (\mu_{r+1}, \dots, \mu_s)$ is the vector of the means of $\mathbf{x}_B$, and $\boldsymbol\Sigma_B$ is the covariance matrix of $\mathbf{x}_B$. For easy correspondence, we augment $\mathbf{x}_B$, $\boldsymbol\mu_B$, and $\boldsymbol\Sigma_B$ so that their entries carry the same indices as their counterparts in $\mathbf{x}$, $\boldsymbol\mu$, and $\boldsymbol\Sigma$. Separate $F_i$ into two parts:
$$
F_i = \Big[\, f_i + \sum_{j=s+1}^{n} a_{ij}\, x_j \Big] + \Big[\, \sum_{j=1}^{r} a_{ij}\, x_j + \sum_{j=r+1}^{s} a_{ij}\, x_j \Big] \equiv F_i' + F_i'',
$$
where $F_i'$ and $F_i''$ correspond to the respective parts in the two square brackets. $F_i'$ involves neither $\mathbf{x}_{1r}$ nor $\mathbf{x}_{r+1,\dots,s}$; by the same argument as in the proof of Theorem 2, it makes no contribution to the causality from A to B. We therefore only need to consider $F_i''$ in evaluating $T_{A\to B}$; that is to say,
$$
\begin{aligned}
T_{A\to B} &= -E\!\left[\sum_{i=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\frac{\partial}{\partial x_i}\int_{\mathbb{R}^{n-s}} F_i\,\rho_{r+1,\dots,n}\, d\mathbf{x}_{s+1,\dots,n}\right]
= -E\!\left[\sum_{i=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\frac{\partial}{\partial x_i}\int_{\mathbb{R}^{n-s}} F_i''\,\rho_{r+1,\dots,n}\, d\mathbf{x}_{s+1,\dots,n}\right]\\
&= -E\!\left[\sum_{i=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\frac{\partial(F_i''\,\rho_{r+1,\dots,s})}{\partial x_i}\right]
= -\sum_{i=r+1}^{s}\left[ E\!\left(F_i''\,\frac{\partial\log\rho_{r+1,\dots,s}}{\partial x_i}\right) + E\!\left(\frac{\partial F_i''}{\partial x_i}\right)\right].
\end{aligned}
$$
The second term in the bracket is $a_{ii}$. For the first term, note that
$$
F_i''\,\frac{\partial\log\rho_{r+1,\dots,s}}{\partial x_i}
= \Big(\sum_{j=1}^{s} a_{ij}\, x_j\Big)\cdot\frac{\partial}{\partial x_i}\left[-\frac12(\mathbf{x}_B-\boldsymbol\mu_B)^T\boldsymbol\Sigma_B^{-1}(\mathbf{x}_B-\boldsymbol\mu_B)\right]
= -\Big(\sum_{j=1}^{s} a_{ij}\, x_j\Big)\cdot\sum_{j=r+1}^{s}\frac{\sigma^{ij}+\sigma^{ji}}{2}\,(x_j-\mu_j).
$$
Here, $\sigma^{ij}$ is the $(i,j)$th entry of the matrix
$$
\begin{pmatrix} \mathbf{I} & 0 & 0\\ 0 & \boldsymbol\Sigma_B^{-1} & 0\\ 0 & 0 & \mathbf{I}\end{pmatrix}.
$$
Since only $1 \le i, j \le s$ are in question here, this is equal to the $(i,j)$th entry of the matrix
$$
\begin{pmatrix} \mathbf{I}_{r\times r} & 0_{r\times(s-r)}\\ 0_{(s-r)\times r} & \boldsymbol\Sigma_B^{-1}\end{pmatrix}.
$$
As $\boldsymbol\Sigma_B$ is symmetric, so is $\boldsymbol\Sigma_B^{-1}$, and hence $(\sigma^{ij}+\sigma^{ji})/2 = \sigma^{ij}$. Thus,
$$
E\!\left[F_i''\,\frac{\partial\log\rho_{r+1,\dots,s}}{\partial x_i}\right]
= -E\!\left[\Big(\sum_{j=1}^{s} a_{ij}\, x_j\Big)\cdot\sum_{j=r+1}^{s}\sigma^{ij}\,(x_j-\mu_j)\right]
= -E\!\left[\Big(\sum_{k=1}^{s} a_{ik}\,(x_k-\mu_k)\Big)\cdot\sum_{j=r+1}^{s}\sigma^{ij}\,(x_j-\mu_j)\right]
= -\sum_{k=1}^{s}\sum_{j=r+1}^{s} a_{ik}\,\sigma^{ij}\, E\big[(x_k-\mu_k)(x_j-\mu_j)\big]
= -\sum_{k=1}^{s}\sum_{j=r+1}^{s} a_{ik}\,\sigma^{ij}\,\sigma_{kj}.
$$
Substituting back, we obtain a very simplified result for T A B . Likewise, T B A can also be obtained, as shown in the following.
Theorem 3.
In Equations (1)–(3), suppose the $b_{ij}$ are constants, and
$$
F_i = f_i + \sum_{j=1}^{n} a_{ij}\, x_j,
$$
where f i and a i j are also constants. Furthermore, suppose that initially x has a Gaussian distribution; then,
$$
T_{A\to B} = \sum_{i=r+1}^{s}\left[\sum_{j=r+1}^{s}\sigma^{ij}\sum_{k=1}^{s} a_{ik}\,\sigma_{kj} - a_{ii}\right],
$$
where $\sigma^{ij}$ is the $(i,j)$th entry of $\begin{pmatrix}\mathbf{I}_{r\times r} & 0\\ 0 & \boldsymbol\Sigma_B^{-1}\end{pmatrix}$, and
$$
T_{B\to A} = \sum_{i=1}^{r}\left[\sum_{j=1}^{r}\sigma^{ij}\sum_{k=1}^{s} a_{ik}\,\sigma_{kj} - a_{ii}\right],
$$
where $\sigma^{ij}$ is the $(i,j)$th entry of $\begin{pmatrix}\boldsymbol\Sigma_A^{-1} & 0\\ 0 & \mathbf{I}_{(s-r)\times(s-r)}\end{pmatrix}$.
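For concreteness, the following is a minimal NumPy sketch of Theorem 3 (ours, not part of the paper; the example numbers at the end are arbitrary). It evaluates $T_{A\to B}$ and $T_{B\to A}$ once the drift matrix $a = (a_{ij})$ and the covariance matrix $\boldsymbol\Sigma$ are given, and, in the bivariate case, reproduces the familiar rate $a_{12}\sigma_{12}/\sigma_{11}$.

```python
import numpy as np

def bulk_flow(a, Sigma, idx_A, idx_B):
    """Bulk information flows of Theorem 3.

    a     : (n, n) drift matrix, F_i = f_i + sum_j a_ij x_j
    Sigma : (n, n) covariance matrix of the state vector
    idx_A : indices of subsystem A;  idx_B : indices of subsystem B
    Returns (T_{A->B}, T_{B->A}).
    """
    idx_AB = list(idx_A) + list(idx_B)              # the first s components

    def one_way(idx_tgt):
        # sigma^{ij}: inverse of the covariance block of the target subsystem
        Sinv = np.linalg.inv(Sigma[np.ix_(idx_tgt, idx_tgt)])
        T = 0.0
        for ii, i in enumerate(idx_tgt):
            acc = 0.0
            for jj, j in enumerate(idx_tgt):
                acc += Sinv[ii, jj] * sum(a[i, k] * Sigma[k, j] for k in idx_AB)
            T += acc - a[i, i]
        return T

    return one_way(idx_B), one_way(idx_A)

# Bivariate sanity check (arbitrary numbers): with A = {x1}, B = {x2},
# T_{B->A} should equal a12 * sigma12 / sigma11.
a = np.array([[-1.0, 0.6], [0.0, -1.0]])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
T_AB, T_BA = bulk_flow(a, Sigma, idx_A=[0], idx_B=[1])
print(T_BA, a[0, 1] * Sigma[0, 1] / Sigma[0, 0])    # both print 0.18
```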
Given a system such as Equations (1)–(3), we can thus evaluate in a precise sense the information flows between the subsystems. Now suppose that, instead of the dynamical system, what we have are just $n$ time series of $K$ steps each, $K \gg n$: $\{x_1(k)\}$, $\{x_2(k)\}$, $\dots$, $\{x_n(k)\}$. We can estimate the system from the series and then apply the information flow formulas to fulfill the task. Assume a linear model as above, and assume $m = 1$. Following Liang (2014) [19], the maximum likelihood estimator (MLE) of $a_{ij}$ equals the least-squares solution of the following over-determined problem:
$$
\begin{pmatrix}
1 & x_1(1) & x_2(1) & \cdots & x_n(1)\\
1 & x_1(2) & x_2(2) & \cdots & x_n(2)\\
1 & x_1(3) & x_2(3) & \cdots & x_n(3)\\
\vdots & \vdots & \vdots & & \vdots\\
1 & x_1(K) & x_2(K) & \cdots & x_n(K)
\end{pmatrix}
\begin{pmatrix} f_i\\ a_{i1}\\ a_{i2}\\ \vdots\\ a_{in}\end{pmatrix}
=
\begin{pmatrix} \dot x_i(1)\\ \dot x_i(2)\\ \dot x_i(3)\\ \vdots\\ \dot x_i(K)\end{pmatrix},
$$
where $\dot x_i(k) = \big(x_i(k+1) - x_i(k)\big)/\Delta t$ ($\Delta t$ being the time stepsize), for $i = 1, 2, \dots, n$, $k = 1, \dots, K$. Use an overbar to denote the time mean over the $K$ steps. The above equation then becomes
$$
\begin{pmatrix}
1 & \bar x_1 & \bar x_2 & \cdots & \bar x_n\\
0 & x_1(2)-\bar x_1 & x_2(2)-\bar x_2 & \cdots & x_n(2)-\bar x_n\\
0 & x_1(3)-\bar x_1 & x_2(3)-\bar x_2 & \cdots & x_n(3)-\bar x_n\\
\vdots & \vdots & \vdots & & \vdots\\
0 & x_1(K)-\bar x_1 & x_2(K)-\bar x_2 & \cdots & x_n(K)-\bar x_n
\end{pmatrix}
\begin{pmatrix} f_i\\ a_{i1}\\ a_{i2}\\ \vdots\\ a_{in}\end{pmatrix}
=
\begin{pmatrix} \bar{\dot x}_i\\ \dot x_i(2)-\bar{\dot x}_i\\ \dot x_i(3)-\bar{\dot x}_i\\ \vdots\\ \dot x_i(K)-\bar{\dot x}_i\end{pmatrix}.
$$
Denote by R the matrix
$$
R = \begin{pmatrix}
x_1(2)-\bar x_1 & x_2(2)-\bar x_2 & \cdots & x_n(2)-\bar x_n\\
\vdots & \vdots & & \vdots\\
x_1(K)-\bar x_1 & x_2(K)-\bar x_2 & \cdots & x_n(K)-\bar x_n
\end{pmatrix},
$$
by $\mathbf{q}$ the vector $\big(\dot x_i(2)-\bar{\dot x}_i,\ \dots,\ \dot x_i(K)-\bar{\dot x}_i\big)^T$, and by $\mathbf{a}_i$ the column vector $(a_{i1}, \dots, a_{in})^T$. Then $R\,\mathbf{a}_i = \mathbf{q}$. The least-squares solution $\hat{\mathbf{a}}_i$ of $\mathbf{a}_i$ solves
$$
R^T R\, \hat{\mathbf{a}}_i = R^T \mathbf{q}.
$$
Note that $R^T R$ is $K\,C$, where $C = (c_{ij})$ is the sample covariance matrix. Thus,
$$
\begin{pmatrix} \hat a_{i1}\\ \hat a_{i2}\\ \vdots\\ \hat a_{in}\end{pmatrix}
= C^{-1}\begin{pmatrix} c_{1,d_i}\\ c_{2,d_i}\\ \vdots\\ c_{n,d_i}\end{pmatrix},
$$
where $c_{j,d_i}$ is the sample covariance between the series $\{x_j(k)\}$ and $\{(x_i(k+1)-x_i(k))/\Delta t\}$.
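As a concrete illustration of this estimation step, here is a minimal NumPy sketch (function and variable names are ours, and end-point conventions are simplified) that forms the sample covariance matrix $C$, the covariances $c_{j,d_i}$ with the finite-difference series, and solves for all the $\hat a_{ij}$ at once:

```python
import numpy as np

def estimate_drift(X, dt):
    """MLE of the a_ij in the linear model dx/dt = f + a x + noise.

    X  : (K, n) array holding the n time series at K steps
    dt : sampling interval
    Returns (a_hat, C), with a_hat[i, j] ~ a_ij and C the sample covariance.
    """
    K, n = X.shape
    dX = (X[1:] - X[:-1]) / dt            # finite-difference (tendency) series
    Xc = X[:-1] - X[:-1].mean(axis=0)     # centered states
    dXc = dX - dX.mean(axis=0)            # centered tendencies
    C = Xc.T @ Xc / (K - 1)               # sample covariances c_ij
    C_d = Xc.T @ dXc / (K - 1)            # column i holds the c_{j, d_i}
    a_hat = np.linalg.solve(C, C_d).T     # row i is C^{-1} (c_{1,d_i}, ..., c_{n,d_i})^T
    return a_hat, C
```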
Thus, finally, the mle of T A B is
$$
\hat T_{A\to B} = \sum_{i=r+1}^{s}\left[\sum_{j=r+1}^{s} c^{ij}\sum_{k=1}^{s}\hat a_{ik}\, c_{kj} - \hat a_{ii}\right],
\tag{28}
$$
where $c^{ij}$ is the $(i,j)$th entry of $\tilde C^{-1}$, and
$$
\tilde C = \begin{pmatrix}
\mathbf{I}_{r\times r} & 0_{r\times(s-r)}\\
0_{(s-r)\times r} & \begin{pmatrix} c_{r+1,r+1} & \cdots & c_{r+1,s}\\ \vdots & & \vdots\\ c_{s,r+1} & \cdots & c_{s,s}\end{pmatrix}
\end{pmatrix}.
$$
Likewise,
$$
\hat T_{B\to A} = \sum_{i=1}^{r}\left[\sum_{j=1}^{r} c^{ij}\sum_{k=1}^{s}\hat a_{ik}\, c_{kj} - \hat a_{ii}\right].
\tag{30}
$$
Here,
$$
\tilde{\tilde C} = \begin{pmatrix}
\begin{pmatrix} c_{11} & \cdots & c_{1r}\\ \vdots & & \vdots\\ c_{r1} & \cdots & c_{rr}\end{pmatrix} & 0_{r\times(s-r)}\\
0_{(s-r)\times r} & \mathbf{I}_{(s-r)\times(s-r)}
\end{pmatrix},
$$
and $c^{ij}$ is the $(i,j)$th entry of $\tilde{\tilde C}^{-1}$.
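Putting the two preceding sketches together, Equations (28) and (30) can be evaluated directly from data: the estimator is simply Theorem 3 with the MLE drift $\hat a$ and the sample covariance $C$ substituted for $a$ and $\boldsymbol\Sigma$. A minimal wrapper (ours, with 0-based indexing in which A occupies components $0,\dots,r-1$ and B components $r,\dots,s-1$) might read:

```python
def bulk_flow_mle(X, dt, r, s):
    """Estimators (28) and (30) of the bulk information flows from time series."""
    a_hat, C = estimate_drift(X, dt)   # sketch given after the a_hat formula above
    # Theorem 3 routine (sketch given after Theorem 3) with the estimated quantities
    return bulk_flow(a_hat, C, idx_A=range(r), idx_B=range(r, s))
```

The first returned value estimates $T_{A\to B}$ and the second $T_{B\to A}$.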
When $n = 2$ and $r = 1$, and hence $s = 2$, $\tilde{\tilde C} = \begin{pmatrix} c_{11} & 0\\ 0 & 1\end{pmatrix}$, so $c^{11} = c_{11}^{-1}$. Equation (30) thus becomes
$$
\hat T_{B\to A} = c^{11}\sum_{k=1}^{2}\hat a_{1k}\, c_{k1} - \hat a_{11}
= \frac{1}{c_{11}}\big(\hat a_{11} c_{11} + \hat a_{12} c_{21}\big) - \hat a_{11}
= \frac{c_{11} c_{12} c_{2,d_1} - c_{12}^2\, c_{1,d_1}}{c_{11}^2\, c_{22} - c_{11}\, c_{12}^2},
$$
recovering the well-known Equation (10) in [19].

4. Validation

4.1. One-Way Causal Relation

To see if the above formalism works, consider the vector autoregressive (VAR) process:
$$
\mathbf{X}:\quad
\begin{aligned}
x_1(n+1) &= 0.5\, x_1(n) + 0.5\, x_2(n) + 0.2\, x_3(n) + e_{x_1}(n+1),\\
x_2(n+1) &= 0\, x_1(n) - 0.2\, x_2(n) - 0.6\, x_3(n) + e_{x_2}(n+1),\\
x_3(n+1) &= 0.2\, x_1(n) + 0.4\, x_2(n) - 0.2\, x_3(n) + \varepsilon_3\, y_3(n) + e_{x_3}(n+1),
\end{aligned}
$$
$$
\mathbf{Y}:\quad
\begin{aligned}
y_1(n+1) &= 0.2\, y_1(n) - 0.5\, y_2(n) + 0\, y_3(n) - \varepsilon_1\, x_1(n) + e_{y_1}(n+1),\\
y_2(n+1) &= 0.5\, y_1(n) - 0.6\, y_2(n) + 0.4\, y_3(n) + e_{y_2}(n+1),\\
y_3(n+1) &= 0.1\, y_1(n) - 0.4\, y_2(n) - 0.5\, y_3(n) + e_{y_3}(n+1),
\end{aligned}
$$
where e x i , e y i N ( 0 , 1 ) , i = 1 , 2 , 3 , are independent. As schematized in Figure 1, ( x 1 , x 2 , x 3 ) and ( y 1 , y 2 , y 3 ) form two subsystems, written as X and Y, respectively. They are coupled only through the first and third components; more specifically, x 1 drives y 1 , and Y feeds back to X through coupling y 3 with x 3 . The strength of the coupling is determined by the parameters ε 1 and ε 3 . In this subsection, ε 3 = 0 , so the causality is one-way, i.e., from X to Y without feedback.
Initialized with random numbers, we iterate the process for 20,000 steps and discard the first 10,000 steps, forming six time series of length 10,000. Using the algorithm by Liang (e.g., [16,18,19,20]), the information flows between $x_1$ and $y_1$ can be obtained rather accurately. As shown in Figure 2a, the information flow/causality from X to Y increases with $\varepsilon_1$, and there is no causality the other way around, just as expected. Since no other coupling exists, one can expect the bulk information flows to bear a similar trend. Using Equations (28) and (30), the estimated bulk flows do indeed behave this way, as shown in Figure 2b. This demonstrates the success of the above formalism.
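For readers who wish to reproduce this type of experiment, the following sketch (ours, not the author's original script) generates the above VAR process with $\varepsilon_3 = 0$ and applies the estimators (28) and (30) through the bulk_flow_mle sketch of Section 3:

```python
import numpy as np

rng = np.random.default_rng(0)
eps1, eps3 = 0.5, 0.0                  # one-way coupling: X -> Y only

# Transition matrix of the 6-D VAR, state z = (x1, x2, x3, y1, y2, y3).
M = np.array([
    [ 0.5,   0.5,  0.2,  0.0,  0.0,  0.0 ],
    [ 0.0,  -0.2, -0.6,  0.0,  0.0,  0.0 ],
    [ 0.2,   0.4, -0.2,  0.0,  0.0,  eps3],
    [-eps1,  0.0,  0.0,  0.2, -0.5,  0.0 ],
    [ 0.0,   0.0,  0.0,  0.5, -0.6,  0.4 ],
    [ 0.0,   0.0,  0.0,  0.1, -0.4, -0.5 ],
])

N = 20_000
z = np.zeros((N, 6))
for n in range(N - 1):
    z[n + 1] = M @ z[n] + rng.standard_normal(6)
z = z[10_000:]                         # discard the spin-up half

# Bulk flows between X (columns 0-2) and Y (columns 3-5), Eqs. (28) and (30).
T_XY, T_YX = bulk_flow_mle(z, dt=1.0, r=3, s=6)
print(T_XY, T_YX)                      # |T_XY| grows with eps1; T_YX stays near 0
```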
Since in practice averages and principal components (PCs) are widely used to characterize the variations of complex subsystems, we also compute the information flows between $\bar x = \frac13(x_1+x_2+x_3)$ and $\bar y = \frac13(y_1+y_2+y_3)$, and those between the first PCs of $(x_1, x_2, x_3)$ and $(y_1, y_2, y_3)$. The results are plotted in Figure 2c,d, respectively. As can be seen, the principal component analysis (PCA) method works just fine in this case, whereas the averaging method yields an incorrect result.
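The proxy-based computations can be sketched in the same spirit (our illustration, continuing from the previous code block); the bivariate flows are estimated with the formula of [19] recovered at the end of Section 3:

```python
def bivariate_flow(target, source, dt=1.0):
    """Liang's bivariate MLE (Equation (10) of [19]) of the flow source -> target."""
    d_tgt = (target[1:] - target[:-1]) / dt
    C = np.cov(np.vstack([target[:-1], source[:-1], d_tgt]))
    c11, c12, c22 = C[0, 0], C[0, 1], C[1, 1]
    c1d1, c2d1 = C[0, 2], C[1, 2]
    return (c11 * c12 * c2d1 - c12**2 * c1d1) / (c11**2 * c22 - c11 * c12**2)

def first_pc(W):
    """Leading principal component time series of the multivariate series W."""
    Wc = W - W.mean(axis=0)
    _, _, Vt = np.linalg.svd(Wc, full_matrices=False)
    return Wc @ Vt[0]

xbar, ybar = z[:, :3].mean(axis=1), z[:, 3:].mean(axis=1)      # mean proxies
pcx, pcy = first_pc(z[:, :3]), first_pc(z[:, 3:])              # first-PC proxies
print(bivariate_flow(ybar, xbar), bivariate_flow(xbar, ybar))  # xbar->ybar, ybar->xbar
print(bivariate_flow(pcy, pcx), bivariate_flow(pcx, pcy))      # PCx->PCy, PCy->PCx
```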
The incorrect inference based on averaging is within expectation. In a network with complex causal relations, for example one with a causality from $y_2$ to $y_1$, averaging $y_1$ with $y_2$ amounts to mixing $y_1$ with information about its own future state; that future state is related to the contemporaneous state of $x_1$, and a spurious causality to $x_1$ hence results. The PCA here functions satisfactorily, perhaps because, in selecting the most coherent structure, it discards most of the influences from other (implicit) time steps. However, the relative success of PCA may not be robust, as evidenced in the following mutually causal case.

4.2. Mutually Causal Relation

If both coupling parameters, $\varepsilon_1$ and $\varepsilon_3$, are turned on, the resulting causal relations form a distribution over the $(\varepsilon_1, \varepsilon_3)$ plane. Figure 3 shows the componentwise information flows $T_{x_1\to y_1}$ (bottom) and $T_{y_3\to x_3}$ (top) on this plane. The other two flows, i.e., their counterparts $T_{y_1\to x_1}$ and $T_{x_3\to y_3}$, are by computation essentially zero. As argued in the preceding subsection, the bulk information flows should follow the same general pattern, albeit perhaps in a coarser and milder fashion, since they are properties of the subsystems as wholes. This is indeed the case: shown in Figure 4 are the bulk information flows between X and Y computed using Equations (28) and (30).
Again, as usual, we try the averages and first PCs as proxies for estimating the causal interaction between X and Y. Figure 5 shows the distributions of the information flows between x ¯ and y ¯ . The resulting patterns are totally different from what Figure 3 displays; obviously, these patterns are incorrect.
One may expect that the PCA method should yield more reasonable causal patterns. We have computed the first PCs for ( x 1 , x 2 , x 3 ) and ( y 1 , y 2 , y 3 ) , respectively, and estimated the information flows using the algorithm by Liang [20]. The resulting distributions, however, are no better than those with the averaged series (Figure 6). That is to say, this seemingly more sophisticated approach does not yield the right interaction between the complex subsystems, either.

5. Summary

Information flow provides a natural measure of the causal interaction between dynamical events. In this study, the information flows between two complex subsystems of a large dimensional system are studied, and analytical formulas have been obtained in a closed form. For easy reference, the major results are summarized hereafter.
For an n-dimensional system
$$
\frac{d\mathbf{x}}{dt} = \mathbf{F}(\mathbf{x}, t) + \mathsf{B}(\mathbf{x}, t)\,\dot{\mathbf{w}},
$$
if the probability density function (pdf) of $\mathbf{x}$ is compactly supported, then the information flow from subsystem A, made up of $\mathbf{x}_{1r}$, to subsystem B, made up of $\mathbf{x}_{r+1,\dots,s}$ ($1 \le r < s \le n$), and that from B to A are, respectively (in nats per unit time),
$$
T_{A\to B} = -E\!\left[\sum_{i=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{r+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\right]
+ \frac12\, E\!\left[\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{r+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}\right],
$$
$$
T_{B\to A} = -E\!\left[\sum_{i=1}^{r}\frac{1}{\rho_{1r}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{1,\dots,r,\,s+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\right]
+ \frac12\, E\!\left[\sum_{i=1}^{r}\sum_{j=1}^{r}\frac{1}{\rho_{1r}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{1,\dots,r,\,s+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}\right],
$$
where $g_{ij} = \sum_{k=1}^{m} b_{ik} b_{jk}$, and $E$ signifies mathematical expectation. Given $n$ stationary time series, $T_{A\to B}$ and $T_{B\to A}$ can be estimated; the maximum likelihood estimators under a Gaussian assumption are given in Equations (28) and (30).
We have constructed a VAR process to validate the formalism. The system has a dimension of 6, with two subsystems, denoted X and Y, each of dimension 3. X drives Y via a coupling at one component, and Y feeds back to X via another. The detailed, componentwise causal relations can easily be found using our previous algorithms, such as that in [20]. The bulk information flow is expected, in general, to follow a similar trend, though in a coarser and milder structure, since what is displayed is an overall property. The above formalism does yield such a result. In contrast, the commonly used proxies for subsystems, such as averages and principal components (PCs), generally do not work. In particular, the averaged series yield wrong results in both cases considered in this study; the PC series do not work for the mutually causal case either, though they give a satisfactory characterization in the case with one-way causality.
The results of this study are applicable to many real-world problems. As explained in the Introduction, they should be of particular use in fields such as climate science, neuroscience, financial economics, and fluid mechanics. For example, they can help clarify the role of greenhouse gas emissions in bridging the climate system and the socioeconomic system (see the review in [26]). Likewise, the interaction between the Earth system and public health [27] can also be studied. In short, the formalism is expected to play a role in the frontier field of complexity, namely multiplex networks, or networks of networks (see the references in [28,29,30]). We are therefore working on these applications.

Funding

This research was funded by the Shanghai International Science and Technology Partnership Project (grant number: 21230780200), the National Science Foundation of China (grant number: 41975064), and the 2015 Jiangsu Program for Innovation Research and Entrepreneurship Groups.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Intergovernmental Panel on Climate Change (IPCC). The Sixth Assessment Report, Climate Change 2021: The Physical Science Basis. Available online: https://www.ipcc.ch/report/ar6/wg1/#FullReport (accessed on 15 November 2021).
  2. Friston, K.J.; Harrison, L.; Penny, W. Dynamic causal modeling. Neuroimage 2003, 19, 1273–1302.
  3. Li, B.; Daunizeau, J.; Stephan, K.E.; Penny, W.; Hu, D.; Friston, K. Generalised filtering and stochastic DCM for fMRI. Neuroimage 2011, 58, 442–457.
  4. Friston, K.J.; Ungerleider, L.G.; Jezzard, P.; Turner, R. Characterizing modulatory interactions between V1 and V2 in human cortex with fMRI. Hum. Brain Mapp. 1995, 2, 211–224.
  5. Friston, K.J.; Kahan, J.; Razi, A.; Stephan, K.E.; Sporns, O. On nodes and modes in resting state fMRI. NeuroImage 2014, 99, 533–547.
  6. Qiu, P.; Jiang, J.; Liu, Z.; Cai, Y.; Huang, T.; Wang, Y.; Liu, Q.; Nie, Y.; Liu, F.; Cheng, J.; et al. BMAL1 knockout macaque monkeys display reduced sleep and psychiatric disorders. Natl. Sci. Rev. 2019, 6, 87–100.
  7. Wang, X.-J.; Hu, H.; Huang, C.; Kennedy, H.; Li, C.T.; Logothetis, N.; Lu, Z.-L.; Luo, Q.; Poo, M.-M.; Tsao, D.; et al. Computational neuroscience: A frontier of the 21st century. Natl. Sci. Rev. 2020, 7, 1418–1422.
  8. Granger, C. Investigating causal relations by econometric models and cross-spectral methods. Econometrica 1969, 37, 424.
  9. Pearl, J. Causality: Models, Reasoning, and Inference, 2nd ed.; Cambridge University Press: New York, NY, USA, 2009.
  10. Imbens, G.W.; Rubin, D.B. Causal Inference for Statistics, Social, and Biomedical Sciences; Cambridge University Press: Cambridge, UK, 2015.
  11. Batchelor, G.K. The Theory of Homogeneous Turbulence; Cambridge University Press: Cambridge, UK, 1953; 197p.
  12. Landau, L.D.; Lifshitz, E.M. Statistical Physics, 2nd Revised and Enlarged ed.; Pergamon Press: Oxford, UK, 1969.
  13. Preisendorfer, R. Principal Component Analysis in Meteorology and Oceanography; Elsevier: Amsterdam, The Netherlands, 1988; 418p.
  14. Friston, K.J.; Frith, C.D.; Liddle, P.F.; Frackowiak, R.S. Functional connectivity: The principal-component analysis of large (PET) data sets. J. Cereb. Blood Flow Metab. 1993, 13, 5–14.
  15. Friston, K.; Phillips, J.; Chawla, D.; Buchel, C. Nonlinear PCA: Characterizing interactions between modes of brain activity. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2000, 355, 135–146.
  16. Liang, X.S. Information flow and causality as rigorous notions ab initio. Phys. Rev. E 2016, 94, 052201.
  17. Liang, X.S.; Kleeman, R. Information transfer between dynamical system components. Phys. Rev. Lett. 2005, 95, 244101.
  18. Liang, X.S. Information flow within stochastic systems. Phys. Rev. E 2008, 78, 031113.
  19. Liang, X.S. Unraveling the cause-effect relation between time series. Phys. Rev. E 2014, 90, 052150.
  20. Liang, X.S. Normalized multivariate time series causality analysis and causal graph reconstruction. Entropy 2021, 23, 679.
  21. Majda, A.J.; Harlim, J. Information flow between subspaces of complex dynamical systems. Proc. Natl. Acad. Sci. USA 2007, 104, 9558–9563.
  22. Liang, X.S. Measuring the importance of individual units in producing the collective behavior of a complex network. Chaos 2021, 31, 093123.
  23. Al-Sadoon, M.M. Testing subspace Granger causality. Econom. Stat. 2019, 9, 42–61.
  24. Triacca, U. Granger causality between vectors of time series: A puzzling property. Stat. Probab. Lett. 2018, 142, 39–43.
  25. Lasota, A.; Mackey, M.C. Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics; Springer: New York, NY, USA, 1994.
  26. Tachiiri, K.; Su, X.; Matsumoto, K. Identifying the key processes and sectors in the interaction between climate and socio-economic systems: A review toward integrating Earth-human systems. Prog. Earth Planet. Sci. 2021, 8, 24.
  27. Balbus, J.; Crimmins, A.; Gamble, J.L.; Easterling, D.R.; Kunkel, K.E.; Saha, S.; Sarofim, M.C. Introduction: Climate Change and Human Health. In The Impacts of Climate Change on Human Health in the United States: A Scientific Assessment; U.S. Global Change Research Program: Washington, DC, USA, 2016.
  28. D'Agostino, G.; Scala, A. Networks of Networks: The Last Frontier of Complexity; Springer: New York, NY, USA, 2014.
  29. Kenett, D.Y.; Perc, M.; Boccaletti, S. Networks of networks—An introduction. Chaos Solitons Fractals 2015, 80, 1–6.
  30. DeFord, D.R.; Pauls, S.D. Spectral clustering methods for multiplex networks. Phys. A Stat. Mech. Its Appl. 2019, 533, 121949.
Figure 1. The preset coupling between the subsystems X and Y.
Figure 2. The absolute information flows between subspaces X and Y as functions of the coupling coefficient $\varepsilon_1$ ($\varepsilon_3 = 0$). (a) The componentwise information flows between $x_1$ and $y_1$; (b) the bulk information flows between subsystems X and Y computed with Equations (28) and (30); (c) the information flows between $\bar x$ and $\bar y$; (d) the information flows between the first principal components of $(x_1, x_2, x_3)$ and $(y_1, y_2, y_3)$, respectively (units: nats per time step).
Figure 3. The absolute information flow from $y_3$ to $x_3$ and that from $x_1$ to $y_1$ as functions of $\varepsilon_1$ and $\varepsilon_3$. The units are in nats per time step.
Figure 4. The absolute bulk information flow from subsystem Y to subsystem X, and that from X to Y. The abscissa and ordinate are the coupling coefficients $\varepsilon_1$ and $\varepsilon_3$, respectively.
Figure 5. As Figure 4, but for the information flows between the mean series $\bar x = \frac13(x_1+x_2+x_3)$ and $\bar y = \frac13(y_1+y_2+y_3)$.
Figure 6. As Figure 4, but for the information flows between the first principal component of $(x_1, x_2, x_3)$ and that of $(y_1, y_2, y_3)$.
