Article

The Causal Interaction between Complex Subsystems

by X. San Liang 1,2,3
1 Department of Atmospheric & Oceanic Sciences, Institute of Atmospheric Sciences, Fudan University, Shanghai 200438, China
2 IRDR ICoE on Risk Interconnectivity and Governance on Weather/Climate Extremes Impact and Public Health, Fudan University, Shanghai 200438, China
3 Shanghai Qi Zhi Institute (Andrew C. Yao Institute for Artificial Intelligence), Shanghai 200232, China
Entropy 2022, 24(1), 3; https://doi.org/10.3390/e24010003
Submission received: 29 November 2021 / Revised: 16 December 2021 / Accepted: 16 December 2021 / Published: 21 December 2021
(This article belongs to the Special Issue Information Geometry, Complexity Measures and Data Analysis)

Abstract

Information flow provides a natural measure for the causal interaction between dynamical events. This study extends our previous rigorous formalism of componentwise information flow to the bulk information flow between two complex subsystems of a large-dimensional parental system. Analytical formulas have been obtained in a closed form. Under a Gaussian assumption, their maximum likelihood estimators have also been obtained. These formulas have been validated using different subsystems with preset relations, and they yield causalities just as expected. On the contrary, the commonly used proxies for the characterization of subsystems, such as averages and principal components, generally do not work correctly. This study can help diagnose the emergence of patterns in complex systems and is expected to have applications in many real world problems in different disciplines such as climate science, fluid dynamics, neuroscience, financial economics, etc.

1. Introduction

When investigating the properties of a complex system, it is often necessary to study the interaction between one subsystem and another subsystem, which themselves also form complex systems, usually with a large number of components involved. In climate science, for example, there is much interest in understanding how one sector of the system collaborates with another sector to cause climate change (see [1] and the references therein); in neuroscience, it is important to investigate the effective connectivity from one brain region to another, each with millions of neurons involved (e.g., [2,3]), and the interaction between structures (e.g., [4,5,6]; see more references in a recent review [7]). This naturally raises a question: How can we study the interaction between two subsystems in a large parental system?
An immediate answer that comes to mind might be to study the componentwise interactions, assessing the causalities between the respective components using, for instance, the classical causal inference approaches (e.g., [8,9,10]). This is generally infeasible when the dimensionality is large: for two subsystems each with, say, 1000 components, one ends up with a million causal relations, far too many to interpret even though every detail is available. In such a case the details themselves are not the goal; they would still have to be synthesized into a big, interpretable picture of the phenomenon. On the other hand, in many situations this is not necessary; one needs only a "bulk" description of the subsystems and their interactions. Examples are the Reynolds equations for turbulence (e.g., [11]) and the thermodynamic description of molecular motions (e.g., [12]). In some fields (e.g., climate science, neuroscience, geography, etc.), a common practice is simply to take the respective averages and to study the interactions between these proxies, i.e., the mean properties. A more sophisticated approach is to extract the respective principal components (PCs) (e.g., [13,14,15]) and analyze the interactions between them. As we will examine in this study, however, these approaches may not work satisfactorily; their validity needs to be carefully checked before they are put into application.
During the past 16 years, it has gradually been realized that causality in terms of information flow (IF) is a real physical notion that can be rigorously derived from first principles (see [16]). When two processes interact, IF provides not only the direction but also the strength of the interaction. Thus far, the formalism of the IF between two components has been well established (see [16,17,18,19,20], among others), and extending it to subspaces with many components appears promising. A pioneering effort is [21], where the authors show that the heuristic argument in [17] applies equally to subsystems in the case with only one-way causality. A recent study on the role of individual nodes in a complex network [22] may be viewed as another effort. (Causality analyses between subspaces with the classical approaches are rare; a few examples are [23,24].) However, a rigorous formalism for more generic problems (e.g., with mutual causality involved) has yet to be established. This motivates the objective of this study: to investigate the interactions between two complex subsystems within a large parental system through the "bulk" information flow between them.
The rest of the paper is organized as follows. In Section 2, we first present the setting of the problem and then derive the IF formulas. Maximum likelihood estimators of these formulas are given in Section 3, followed by a validation in Section 4. Finally, Section 5 summarizes the study.

2. Information Flow between Two Subspaces of a Complex System

Consider an n-dimensional dynamical system
$$
\mathbf{A}:\qquad
\begin{aligned}
\frac{dx_1}{dt} &= F_1(x_1, x_2, \dots, x_n; t) + \sum_{k=1}^{m} b_{1k}(x_1, x_2, \dots, x_n; t)\,\dot w_k,\\
&\;\;\vdots\\
\frac{dx_r}{dt} &= F_r(x_1, x_2, \dots, x_n; t) + \sum_{k=1}^{m} b_{rk}(x_1, x_2, \dots, x_n; t)\,\dot w_k,
\end{aligned}
\tag{1}
$$
$$
\mathbf{B}:\qquad
\begin{aligned}
\frac{dx_{r+1}}{dt} &= F_{r+1}(x_1, x_2, \dots, x_n; t) + \sum_{k=1}^{m} b_{r+1,k}(x_1, x_2, \dots, x_n; t)\,\dot w_k,\\
&\;\;\vdots\\
\frac{dx_s}{dt} &= F_s(x_1, x_2, \dots, x_n; t) + \sum_{k=1}^{m} b_{sk}(x_1, x_2, \dots, x_n; t)\,\dot w_k,
\end{aligned}
\tag{2}
$$
$$
\begin{aligned}
\frac{dx_{s+1}}{dt} &= F_{s+1}(x_1, x_2, \dots, x_n; t) + \sum_{k=1}^{m} b_{s+1,k}(x_1, x_2, \dots, x_n; t)\,\dot w_k,\\
&\;\;\vdots\\
\frac{dx_n}{dt} &= F_n(x_1, x_2, \dots, x_n; t) + \sum_{k=1}^{m} b_{nk}(x_1, x_2, \dots, x_n; t)\,\dot w_k,
\end{aligned}
\tag{3}
$$
where $\mathbf{x} \in \mathbb{R}^n$ denotes the vector of state variables $(x_1, x_2, \dots, x_n)$, $\mathbf{F} = (F_1, \dots, F_n)$ are differentiable functions of $\mathbf{x}$ and time $t$, $\mathbf{w}$ is a vector of $m$ independent standard Wiener processes, and $\mathsf{B} = (b_{ij})$ is an $n \times m$ matrix of stochastic perturbation amplitudes. Here we follow the convention in physics of not distinguishing a random variable from its deterministic counterpart. From the components $(x_1, \dots, x_n)$, we separate out two sets, $(x_1, \dots, x_r)$ and $(x_{r+1}, \dots, x_s)$, denoted $\mathbf{x}_{1r}$ and $\mathbf{x}_{r+1,\dots,s}$, respectively; the remaining components $(x_{s+1}, \dots, x_n)$ are denoted $\mathbf{x}_{s+1,\dots,n}$. The subsystems formed by the first two sets are henceforth referred to as A and B, and the following is a derivation of the information flow between them. Note that, for convenience, A and B are here placed adjacent to each other; if they are not, the equations can always be rearranged to make them so.
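Before proceeding with the derivation, it may help to fix ideas numerically. The following is a minimal NumPy sketch (ours, not part of the formalism, with an arbitrary four-dimensional example in which A comprises the first two components, B the last two, and the remainder is empty): it integrates a system of the form of Equations (1)–(3) with the Euler–Maruyama scheme, which suffices to produce sample trajectories of the subsystems.

```python
import numpy as np

def euler_maruyama(F, B, x0, dt, n_steps, rng=None):
    """Integrate dx/dt = F(x, t) + B(x, t) w_dot by the Euler-Maruyama scheme.

    F : callable (x, t) -> length-n drift vector
    B : callable (x, t) -> (n, m) matrix of noise amplitudes b_ij
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty((n_steps + 1, x0.size))
    x[0] = x0
    for k in range(n_steps):
        t = k * dt
        Bk = B(x[k], t)
        dw = rng.normal(0.0, np.sqrt(dt), size=Bk.shape[1])  # Wiener increments
        x[k + 1] = x[k] + F(x[k], t) * dt + Bk @ dw
    return x

# Illustrative 4-D linear example: A = (x1, x2), B = (x3, x4), additive noise.
A_mat = np.array([[-1.0,  0.0,  0.0,  0.0],
                  [ 0.5, -1.0,  0.0,  0.0],
                  [ 0.6,  0.0, -1.0,  0.3],
                  [ 0.0,  0.0,  0.4, -1.0]])
F = lambda x, t: A_mat @ x
B = lambda x, t: 0.5 * np.eye(4)
traj = euler_maruyama(F, B, np.zeros(4), dt=0.01, n_steps=50_000)
xA, xB = traj[:, :2], traj[:, 2:]     # components of the two subsystems
```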
Associated with Equations (1)–(3) there is a Fokker–Planck equation governing the evolution of the joint probability density function (pdf) ρ of x :
$$
\frac{\partial\rho}{\partial t} + \frac{\partial(\rho F_1)}{\partial x_1} + \frac{\partial(\rho F_2)}{\partial x_2} + \cdots + \frac{\partial(\rho F_n)}{\partial x_n}
= \frac12 \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{\partial^2 (g_{ij}\,\rho)}{\partial x_i \partial x_j},
\tag{4}
$$
where $g_{ij} = \sum_{k=1}^{m} b_{ik} b_{jk}$, $i, j = 1, \dots, n$. Without much loss of generality, $\rho$ is assumed to be compactly supported on $\mathbb{R}^n$. The joint pdfs of $\mathbf{x}_{1r}$ and $\mathbf{x}_{r+1,\dots,s}$ are, respectively,
$$
\rho_{1r} = \int_{\mathbb{R}^{n-r}} \rho(\mathbf{x})\, dx_{r+1}\cdots dx_n \equiv \int_{\mathbb{R}^{n-r}} \rho(\mathbf{x})\, d\mathbf{x}_{r+1,\dots,n},
\qquad
\rho_{r+1,\dots,s} = \int_{\mathbb{R}^{n-s+r}} \rho(\mathbf{x})\, dx_1\cdots dx_r\, dx_{s+1}\cdots dx_n \equiv \int_{\mathbb{R}^{n-s+r}} \rho(\mathbf{x})\, d\mathbf{x}_{1,\dots,r,\,s+1,\dots,n}.
\tag{5}
$$
With respect to them, the joint entropies are then
$$
H_A = -\int_{\mathbb{R}^{r}} \rho_{1r}\,\log\rho_{1r}\; d\mathbf{x}_{1r},
\qquad
H_B = -\int_{\mathbb{R}^{s-r}} \rho_{r+1,\dots,s}\,\log\rho_{r+1,\dots,s}\; d\mathbf{x}_{r+1,\dots,s}.
\tag{6}
$$
To derive the evolution of ρ 1 r , integrate out ( x r + 1 , , x n ) in Equation (4). This yields, by using the assumption of compactness for ρ ,
$$
\frac{\partial\rho_{1r}}{\partial t} + \sum_{i=1}^{r}\frac{\partial}{\partial x_i}\int_{\mathbb{R}^{n-r}} \rho F_i\; d\mathbf{x}_{r+1,\dots,n}
= \frac12\sum_{i=1}^{r}\sum_{j=1}^{r}\int_{\mathbb{R}^{n-r}} \frac{\partial^2(g_{ij}\,\rho)}{\partial x_i\partial x_j}\; d\mathbf{x}_{r+1,\dots,n}.
\tag{7}
$$
Similarly,
$$
\frac{\partial\rho_{r+1,\dots,s}}{\partial t} + \sum_{i=r+1}^{s}\frac{\partial}{\partial x_i}\int_{\mathbb{R}^{n-s+r}} \rho F_i\; d\mathbf{x}_{1,\dots,r,\,s+1,\dots,n}
= \frac12\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\int_{\mathbb{R}^{n-s+r}} \frac{\partial^2(g_{ij}\,\rho)}{\partial x_i\partial x_j}\; d\mathbf{x}_{1,\dots,r,\,s+1,\dots,n}.
\tag{8}
$$
Multiplication of Equation (7) by $-(1 + \log\rho_{1r})$, followed by an integration with respect to $\mathbf{x}_{1r}$ over $\mathbb{R}^r$, yields
$$
\frac{dH_A}{dt} - \sum_{i=1}^{r}\int_{\mathbb{R}^{r}} (1+\log\rho_{1r})\cdot\frac{\partial}{\partial x_i}\!\left[\int_{\mathbb{R}^{n-r}} \rho F_i\, d\mathbf{x}_{r+1,\dots,n}\right] d\mathbf{x}_{1r}
= -\frac12\int_{\mathbb{R}^{r}} (1+\log\rho_{1r})\cdot\sum_{i=1}^{r}\sum_{j=1}^{r}\int_{\mathbb{R}^{n-r}} \frac{\partial^2(g_{ij}\,\rho)}{\partial x_i\partial x_j}\, d\mathbf{x}_{r+1,\dots,n}\, d\mathbf{x}_{1r}.
$$
Note that in the second term of the left hand side, the part within the summation is, by integration by parts,
$$
\int_{\mathbb{R}^{r}} (\log\rho_{1r})\cdot\frac{\partial}{\partial x_i}\!\left[\int_{\mathbb{R}^{n-r}} \rho F_i\, d\mathbf{x}_{r+1,\dots,n}\right] d\mathbf{x}_{1r}
= -\int_{\mathbb{R}^{r}}\int_{\mathbb{R}^{n-r}} \rho F_i\,\frac{\partial\log\rho_{1r}}{\partial x_i}\, d\mathbf{x}_{r+1,\dots,n}\, d\mathbf{x}_{1r}
= -\int_{\mathbb{R}^{n}} \rho F_i\,\frac{\partial\log\rho_{1r}}{\partial x_i}\, d\mathbf{x}
= -E\!\left[F_i\,\frac{\partial\log\rho_{1r}}{\partial x_i}\right].
$$
In the derivation, the compactness assumption has been used (variables vanish at the boundaries). By the same approach, the right hand side becomes
$$
-\frac12\int_{\mathbb{R}^{r}} \log\rho_{1r}\cdot\sum_{i=1}^{r}\sum_{j=1}^{r}\int_{\mathbb{R}^{n-r}} \frac{\partial^2(g_{ij}\,\rho)}{\partial x_i\partial x_j}\, d\mathbf{x}_{r+1,\dots,n}\, d\mathbf{x}_{1r}
= -\frac12\sum_{i=1}^{r}\sum_{j=1}^{r}\int_{\mathbb{R}^{n}} \log\rho_{1r}\cdot\frac{\partial^2(g_{ij}\,\rho)}{\partial x_i\partial x_j}\, d\mathbf{x}
= -\frac12\sum_{i=1}^{r}\sum_{j=1}^{r}\int_{\mathbb{R}^{n}} g_{ij}\,\rho\,\frac{\partial^2\log\rho_{1r}}{\partial x_i\partial x_j}\, d\mathbf{x}
= -\frac12\sum_{i=1}^{r}\sum_{j=1}^{r} E\!\left[g_{ij}\,\frac{\partial^2\log\rho_{1r}}{\partial x_i\partial x_j}\right].
$$
Hence,
$$
\frac{dH_A}{dt} = -\sum_{i=1}^{r} E\!\left[F_i\,\frac{\partial\log\rho_{1r}}{\partial x_i}\right] - \frac12\sum_{i=1}^{r}\sum_{j=1}^{r} E\!\left[g_{ij}\,\frac{\partial^2\log\rho_{1r}}{\partial x_i\partial x_j}\right].
$$
Likewise, we have
$$
\frac{dH_B}{dt} = -\sum_{i=r+1}^{s} E\!\left[F_i\,\frac{\partial\log\rho_{r+1,\dots,s}}{\partial x_i}\right] - \frac12\sum_{i=r+1}^{s}\sum_{j=r+1}^{s} E\!\left[g_{ij}\,\frac{\partial^2\log\rho_{r+1,\dots,s}}{\partial x_i\partial x_j}\right].
$$
Now consider the impact of the subsystem A on its peer B, written $T_{A\to B}$. Following Liang (2016) [16], this is associated with the evolution of the joint entropy of the latter:
$$
\frac{dH_B}{dt} = \frac{dH_{B\backslash A}}{dt} + T_{A\to B},
$$
where $H_{B\backslash A}$ signifies the entropy evolution with the influence of A excluded, which is found by instantaneously freezing $\mathbf{x}_{1r} = (x_1, \dots, x_r)$ as parameters. To do this, examine, on an infinitesimal interval $[t, t+\Delta t]$, a system modified from the original Equations (1)–(3) by removing the $r$ equations for $x_1, x_2, \dots, x_r$ from the equation set:
$$
\begin{aligned}
\frac{dx_{r+1}}{dt} &= F_{r+1}(x_1, x_2, \dots, x_n; t) + \sum_{k=1}^{m} b_{r+1,k}(x_1, x_2, \dots, x_n; t)\,\dot w_k,\\
&\;\;\vdots\\
\frac{dx_s}{dt} &= F_s(x_1, x_2, \dots, x_n; t) + \sum_{k=1}^{m} b_{sk}(x_1, x_2, \dots, x_n; t)\,\dot w_k,\\
\frac{dx_{s+1}}{dt} &= F_{s+1}(x_1, x_2, \dots, x_n; t) + \sum_{k=1}^{m} b_{s+1,k}(x_1, x_2, \dots, x_n; t)\,\dot w_k,\\
&\;\;\vdots\\
\frac{dx_n}{dt} &= F_n(x_1, x_2, \dots, x_n; t) + \sum_{k=1}^{m} b_{nk}(x_1, x_2, \dots, x_n; t)\,\dot w_k.
\end{aligned}
$$
Notice that the $F_i$'s and $b_{ik}$'s still depend on $(x_1, \dots, x_r) = \mathbf{x}_{1r}$, which, however, appear in the modified system as parameters. Following [16], one may construct a mapping $\Phi: \mathbb{R}^{n-r} \to \mathbb{R}^{n-r}$, $\mathbf{x}_{\backslash A}(t) \mapsto \mathbf{x}_{\backslash A}(t+\Delta t)$, where $\mathbf{x}_{\backslash A}$ means $\mathbf{x}$ but with $\mathbf{x}_{1r}$ appearing as parameters, and study the Frobenius–Perron operator (see, for example, [25]) of the modified system. An alternative approach is given by Liang in [18], which we henceforth follow. Observe that on the interval $[t, t+\Delta t]$, corresponding to the modified dynamical system, there is also a Fokker–Planck equation:
$$
\frac{\partial\rho_{\backslash A}}{\partial t} + \sum_{i=r+1}^{n} \frac{\partial (F_i\,\rho_{\backslash A})}{\partial x_i}
= \frac12\sum_{i=r+1}^{n}\sum_{j=r+1}^{n} \frac{\partial^2 (g_{ij}\,\rho_{\backslash A})}{\partial x_i\partial x_j},
\qquad \rho_{\backslash A} = \rho_{r+1,\dots,n}\ \ \text{at time } t.
$$
Here $g_{ij} = \sum_{k=1}^{m} b_{ik} b_{jk}$, and $\rho_{\backslash A}$ denotes the joint pdf of $(x_{r+1}, \dots, x_n)$ evolved with $\mathbf{x}_{1r}$ frozen as parameters. Note the difference between $\rho_{\backslash A}$ and $\rho_{r+1,\dots,n}$: the former has $\mathbf{x}_{1r}$ as parameters, while the latter has no dependence on $\mathbf{x}_{1r}$. They are, however, equal at time $t$.
Integration of the above Fokker–Planck equation with respect to $d\mathbf{x}_{s+1,\dots,n}$ gives the evolution of the pdf of subsystem B with A frozen as parameters, written $\rho_{B\backslash A}$:
$$
\frac{\partial\rho_{B\backslash A}}{\partial t} + \sum_{i=r+1}^{s}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{\backslash A})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}
= \frac12\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{\backslash A})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n},
\qquad \rho_{B\backslash A} = \rho_{r+1,\dots,s}\ \ \text{at time } t.
\tag{16}
$$
Dividing Equation (16) by $\rho_{B\backslash A}$ and writing $\mathbf{x}_{r+1,\dots,s}$ simply as $\mathbf{x}_B$, we obtain
$$
\frac{\partial\log\rho_{B\backslash A}}{\partial t} + \sum_{i=r+1}^{s}\frac{1}{\rho_{B\backslash A}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{\backslash A})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}
= \frac{1}{2\rho_{B\backslash A}}\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{\backslash A})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}.
$$
Discretizing in time and noticing that $\rho_{B\backslash A}(t) = \rho_{r+1,\dots,s}(t)$, we have (in the following, unless otherwise indicated, variables without explicitly specified arguments are evaluated at time step $t$)
$$
\log\rho_{B\backslash A}(\mathbf{x}_B;\, t+\Delta t) = \log\rho_{r+1,\dots,s}(\mathbf{x}_B;\, t)
- \Delta t\sum_{i=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{r+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}
+ \frac{\Delta t}{2}\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{r+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n} + o(\Delta t).
$$
To arrive at $dH_{B\backslash A}/dt$, we need to find $\log\rho_{B\backslash A}(\mathbf{x}_B(t+\Delta t);\, t+\Delta t)$. Using the Euler–Bernstein approximation,
$$
\mathbf{x}_B(t+\Delta t) = \mathbf{x}_B(t) + \mathbf{F}_B\,\Delta t + \mathsf{B}_B\,\Delta\mathbf{w},
$$
where, in the same spirit as the notation $\mathbf{x}_B$,
$$
\mathbf{F}_B = (F_{r+1}, \dots, F_s)^T,\qquad
\mathsf{B}_B = \begin{pmatrix} b_{r+1,1} & \cdots & b_{r+1,m}\\ \vdots & & \vdots\\ b_{s1} & \cdots & b_{sm} \end{pmatrix},\qquad
\Delta\mathbf{w} = (\Delta w_1, \dots, \Delta w_m)^T,
$$
and $\Delta w_k \sim N(0, \Delta t)$, we have
$$
\begin{aligned}
\log\rho_{B\backslash A}&(\mathbf{x}_B(t+\Delta t);\, t+\Delta t)\\
=\ & \log\rho_{r+1,\dots,s}(\mathbf{x}_B(t) + \mathbf{F}_B\Delta t + \mathsf{B}_B\Delta\mathbf{w};\, t)
- \Delta t\sum_{i=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{r+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\\
&+ \frac{\Delta t}{2}\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{r+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n} + o(\Delta t)\\
=\ & \log\rho_{r+1,\dots,s}(\mathbf{x}_B(t)) + \sum_{i=r+1}^{s}\frac{\partial\log\rho_{r+1,\dots,s}}{\partial x_i}\Big(F_i\Delta t + \sum_{k=1}^{m} b_{ik}\Delta w_k\Big)\\
&+ \frac12\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\frac{\partial^2\log\rho_{r+1,\dots,s}}{\partial x_i\partial x_j}\Big(F_i\Delta t + \sum_{k=1}^{m} b_{ik}\Delta w_k\Big)\Big(F_j\Delta t + \sum_{l=1}^{m} b_{jl}\Delta w_l\Big)\\
&- \Delta t\sum_{i=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{r+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}
+ \frac{\Delta t}{2}\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{r+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n} + o(\Delta t).
\end{aligned}
$$
Now take the mathematical expectation on both sides; the left hand side then yields $-H_{B\backslash A}(t+\Delta t)$. By Corollary III.1 of [16], and noting that $E\,\Delta w_k = 0$, $E\,\Delta w_k^2 = \Delta t$, and that $\Delta\mathbf{w}$ is independent of $\mathbf{x}_B$, we have
$$
\begin{aligned}
H_{B\backslash A}(t+\Delta t) =\ & H_B(t) - \Delta t\, E\!\left[\sum_{i=r+1}^{s} F_i\,\frac{\partial\log\rho_{r+1,\dots,s}}{\partial x_i}\right]
- \frac{\Delta t}{2}\, E\!\left[\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\sum_{k=1}^{m}\sum_{l=1}^{m} b_{ik} b_{jl}\,\delta_{kl}\,\frac{\partial^2\log\rho_{r+1,\dots,s}}{\partial x_i\partial x_j}\right]\\
&+ \Delta t\, E\!\left[\sum_{i=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{r+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\right]
- \frac{\Delta t}{2}\, E\!\left[\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{r+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}\right] + o(\Delta t)\\
=\ & H_B(t) - \Delta t\, E\!\left[\sum_{i=r+1}^{s} F_i\,\frac{\partial\log\rho_{r+1,\dots,s}}{\partial x_i}\right]
- \frac{\Delta t}{2}\, E\!\left[\sum_{i=r+1}^{s}\sum_{j=r+1}^{s} g_{ij}\,\frac{\partial^2\log\rho_{r+1,\dots,s}}{\partial x_i\partial x_j}\right]\\
&+ \Delta t\, E\!\left[\sum_{i=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{r+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\right]
- \frac{\Delta t}{2}\, E\!\left[\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{r+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}\right] + o(\Delta t).
\end{aligned}
$$
Thus,
$$
\begin{aligned}
\frac{dH_{B\backslash A}}{dt} &= \lim_{\Delta t\to 0}\frac{H_{B\backslash A}(t+\Delta t) - H_B(t)}{\Delta t}\\
&= -E\!\left[\sum_{i=r+1}^{s}\left( F_i\,\frac{\partial\log\rho_{r+1,\dots,s}}{\partial x_i} - \frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{r+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\right)\right]\\
&\quad - \frac12\, E\!\left[\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\left( g_{ij}\,\frac{\partial^2\log\rho_{r+1,\dots,s}}{\partial x_i\partial x_j} + \frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{r+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}\right)\right].
\end{aligned}
$$
Hence, the information flow from x 1 r to x r + 1 , , s is
$$
\begin{aligned}
T_{A\to B} &= \frac{dH_B}{dt} - \frac{dH_{B\backslash A}}{dt}\\
&= -E\!\left[\sum_{i=r+1}^{s} F_i\,\frac{\partial\log\rho_{r+1,\dots,s}}{\partial x_i}\right] - \frac12\, E\!\left[\sum_{i=r+1}^{s}\sum_{j=r+1}^{s} g_{ij}\,\frac{\partial^2\log\rho_{r+1,\dots,s}}{\partial x_i\partial x_j}\right]\\
&\quad + E\!\left[\sum_{i=r+1}^{s}\left( F_i\,\frac{\partial\log\rho_{r+1,\dots,s}}{\partial x_i} - \frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{r+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\right)\right]\\
&\quad + \frac12\, E\!\left[\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\left( g_{ij}\,\frac{\partial^2\log\rho_{r+1,\dots,s}}{\partial x_i\partial x_j} + \frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{r+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}\right)\right]\\
&= -E\!\left[\sum_{i=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{r+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\right]
+ \frac12\, E\!\left[\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{r+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}\right].
\end{aligned}
$$
Likewise, we can obtain the information flow from subsystem B to subsystem A. These are summarized in the following theorem.
Theorem 1.
For the dynamical system of Equations (1)–(3), if the probability density function (pdf) of $\mathbf{x}$ is compactly supported, then the information flow from $\mathbf{x}_{1r}$ to $\mathbf{x}_{r+1,\dots,s}$ and that from $\mathbf{x}_{r+1,\dots,s}$ to $\mathbf{x}_{1r}$ are (in nats per unit time), respectively,
$$
T_{A\to B} = -E\!\left[\sum_{i=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{r+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\right]
+ \frac12\, E\!\left[\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{r+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}\right],
\tag{19}
$$
$$
T_{B\to A} = -E\!\left[\sum_{i=1}^{r}\frac{1}{\rho_{1r}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{1,\dots,r,\,s+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\right]
+ \frac12\, E\!\left[\sum_{i=1}^{r}\sum_{j=1}^{r}\frac{1}{\rho_{1r}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{1,\dots,r,\,s+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}\right],
\tag{20}
$$
where $g_{ij} = \sum_{k=1}^{m} b_{ik} b_{jk}$, and $E$ signifies mathematical expectation.
When r = 1 , s = n = 2 , (20) reduces to
$$
T_{B\to A} = -E\!\left[\frac{1}{\rho_1}\frac{\partial(F_1\,\rho_1)}{\partial x_1}\right] + \frac12\, E\!\left[\frac{1}{\rho_1}\frac{\partial^2(g_{11}\,\rho_1)}{\partial x_1^2}\right],
$$
which is precisely Equation (15) of [18]; the same applies to Equation (19). These formulas are hence verified against the established two-dimensional results.
The following theorem forms the basis for causal inference.
Theorem 2.
If the evolution of subsystem A (resp. B) does not depend on $\mathbf{x}_{r+1,\dots,s}$ (resp. $\mathbf{x}_{1r}$), then $T_{B\to A} = 0$ (resp. $T_{A\to B} = 0$).
Proof. 
We only check the formula for $T_{B\to A}$. In (20), the deterministic part is (apart from the sign)
$$
E\!\left[\sum_{i=1}^{r}\frac{1}{\rho_{1r}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{1,\dots,r,\,s+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\right]
= \sum_{i=1}^{r}\int_{\mathbb{R}^{r}}\int_{\mathbb{R}^{s-r}} \rho_{1,\dots,s}\left(\frac{1}{\rho_{1r}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{1,\dots,r,\,s+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\right) d\mathbf{x}_{r+1,\dots,s}\, d\mathbf{x}_{1r}.
$$
Now, $F_i$ is independent of $\mathbf{x}_{r+1,\dots,s}$, and so is $\rho_{1,\dots,r,\,s+1,\dots,n}$. Thus, $\rho_{1,\dots,s}$ may be integrated within the parentheses directly with respect to $d\mathbf{x}_{r+1,\dots,s}$, yielding
$$
\frac{\int_{\mathbb{R}^{s-r}} \rho_{1,\dots,s}\, d\mathbf{x}_{r+1,\dots,s}}{\rho_{1r}} = \frac{\rho_{1r}}{\rho_{1r}} = 1.
$$
By the compactness of $\rho$, the whole deterministic part hence vanishes. Likewise, it can be shown that the stochastic part vanishes as well. □
This theorem allows us to identify causality with information flow: if $T_{B\to A} = 0$, then B is not causal to A, and vice versa; the same holds for $T_{A\to B}$.

3. Information Flow between Linear Subsystems and Its Estimation

Linear systems provide the simplest framework and are usually taken as the first step toward a more generic setting. Simple as they are, it has been demonstrated in practice that linear results often provide a good approximation to an otherwise much more complicated problem. It is hence of interest to examine this special case.
Let
$$
F_i = f_i + \sum_{j=1}^{n} a_{ij}\, x_j,
$$
where f i and a i j are constants. Additionally, suppose that b i j are constants—that is to say, the noises are additive. Then, g i j are also constants. Thus, in Equation (20),
$$
\begin{aligned}
E\!\left[\frac{1}{\rho_{1r}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{1,\dots,r,\,s+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}\right]
&= g_{ij}\int_{\mathbb{R}^{s}} \rho(\mathbf{x}_{1,\dots,s})\,\frac{1}{\rho_{1r}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2\rho_{1,\dots,r,\,s+1,\dots,n}}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}\; d\mathbf{x}_{1,\dots,s}\\
&= g_{ij}\int_{\mathbb{R}^{r}}\int_{\mathbb{R}^{s-r}} \frac{\rho_{1,\dots,s}}{\rho_{1r}}\,\frac{\partial^2\rho_{1r}}{\partial x_i\partial x_j}\, d\mathbf{x}_{r+1,\dots,s}\, d\mathbf{x}_{1r}
= g_{ij}\int_{\mathbb{R}^{r}} 1\cdot\frac{\partial^2\rho_{1r}}{\partial x_i\partial x_j}\, d\mathbf{x}_{1r} = 0.
\end{aligned}
$$
The same holds in Equation (19). Thus, the stochastic parts in both Equations (19) and (20) vanish.
Since a linear system initialized with a Gaussian distribution remains Gaussian for all time, we may write the joint pdf of $\mathbf{x}$ as
$$
\rho(x_1, \dots, x_n) = \frac{1}{\sqrt{(2\pi)^n\det\boldsymbol\Sigma}}\; e^{-\frac12(\mathbf{x}-\boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)},
$$
where $\boldsymbol\Sigma = (\sigma_{ij})_{n\times n}$ is the population covariance matrix of $\mathbf{x}$. By the properties of the Gaussian distribution, it is easy to show that
$$
\rho_{r+1,\dots,s}(x_{r+1}, \dots, x_s) = \frac{1}{\sqrt{(2\pi)^{s-r}\det\boldsymbol\Sigma_B}}\; e^{-\frac12(\mathbf{x}_B-\boldsymbol\mu_B)^T\boldsymbol\Sigma_B^{-1}(\mathbf{x}_B-\boldsymbol\mu_B)},
$$
where $\mathbf{x}_B = (x_{r+1}, \dots, x_s)$, $\boldsymbol\mu_B = (\mu_{r+1}, \dots, \mu_s)$ is the vector of the means of $\mathbf{x}_B$, and $\boldsymbol\Sigma_B$ is the covariance matrix of $\mathbf{x}_B$. For easy correspondence, we augment $\mathbf{x}_B$, $\boldsymbol\mu_B$, and $\boldsymbol\Sigma_B$ so that their entries carry the same indices as their counterparts in $\mathbf{x}$, $\boldsymbol\mu$, and $\boldsymbol\Sigma$. Separate $F_i$ into two parts:
$$
F_i = \Big[\, f_i + \sum_{j=s+1}^{n} a_{ij}\, x_j \Big] + \Big[\, \sum_{j=1}^{r} a_{ij}\, x_j + \sum_{j=r+1}^{s} a_{ij}\, x_j \Big] \equiv F_i' + F_i'',
$$
where $F_i'$ and $F_i''$ correspond to the respective parts in the two square brackets. $F_i'$ involves neither $\mathbf{x}_{1r}$ nor $\mathbf{x}_{r+1,\dots,s}$; by the same argument as in the proof of Theorem 2, it makes no contribution to the causality from A to B. We therefore only need to consider $F_i''$ in evaluating $T_{A\to B}$; that is to say,
$$
\begin{aligned}
T_{A\to B} &= -E\!\left[\sum_{i=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\frac{\partial}{\partial x_i}\int_{\mathbb{R}^{n-s}} F_i\,\rho_{r+1,\dots,n}\, d\mathbf{x}_{s+1,\dots,n}\right]
= -E\!\left[\sum_{i=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\frac{\partial}{\partial x_i}\int_{\mathbb{R}^{n-s}} F_i''\,\rho_{r+1,\dots,n}\, d\mathbf{x}_{s+1,\dots,n}\right]\\
&= -E\!\left[\sum_{i=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\frac{\partial(F_i''\,\rho_{r+1,\dots,s})}{\partial x_i}\right]
= -\sum_{i=r+1}^{s}\left[ E\!\left(F_i''\,\frac{\partial\log\rho_{r+1,\dots,s}}{\partial x_i}\right) + E\!\left(\frac{\partial F_i''}{\partial x_i}\right)\right].
\end{aligned}
$$
The second term in the bracket is $a_{ii}$. For the first term, note that
$$
F_i''\,\frac{\partial\log\rho_{r+1,\dots,s}}{\partial x_i}
= \Big(\sum_{j=1}^{s} a_{ij}\, x_j\Big)\cdot\frac{\partial}{\partial x_i}\left[-\frac12(\mathbf{x}_B-\boldsymbol\mu_B)^T\boldsymbol\Sigma_B^{-1}(\mathbf{x}_B-\boldsymbol\mu_B)\right]
= -\Big(\sum_{j=1}^{s} a_{ij}\, x_j\Big)\cdot\sum_{j=r+1}^{s}\frac{\sigma^{ij}+\sigma^{ji}}{2}\,(x_j-\mu_j).
$$
Here, $\sigma^{ij}$ is the $(i,j)$th entry of the matrix
$$
\begin{pmatrix} \mathbf{I} & 0 & 0\\ 0 & \boldsymbol\Sigma_B^{-1} & 0\\ 0 & 0 & \mathbf{I}\end{pmatrix}.
$$
Since only $1 \le i, j \le s$ are in question here, this is equal to the $(i,j)$th entry of the matrix
$$
\begin{pmatrix} \mathbf{I}_{r\times r} & 0_{r\times(s-r)}\\ 0_{(s-r)\times r} & \boldsymbol\Sigma_B^{-1}\end{pmatrix}.
$$
As $\boldsymbol\Sigma_B$ is symmetric, so is $\boldsymbol\Sigma_B^{-1}$, and hence $(\sigma^{ij}+\sigma^{ji})/2 = \sigma^{ij}$. Thus,
$$
E\!\left[F_i''\,\frac{\partial\log\rho_{r+1,\dots,s}}{\partial x_i}\right]
= -E\!\left[\Big(\sum_{j=1}^{s} a_{ij}\, x_j\Big)\cdot\sum_{j=r+1}^{s}\sigma^{ij}\,(x_j-\mu_j)\right]
= -E\!\left[\Big(\sum_{k=1}^{s} a_{ik}\,(x_k-\mu_k)\Big)\cdot\sum_{j=r+1}^{s}\sigma^{ij}\,(x_j-\mu_j)\right]
= -\sum_{k=1}^{s}\sum_{j=r+1}^{s} a_{ik}\,\sigma^{ij}\, E\big[(x_k-\mu_k)(x_j-\mu_j)\big]
= -\sum_{k=1}^{s}\sum_{j=r+1}^{s} a_{ik}\,\sigma^{ij}\,\sigma_{kj}.
$$
Substituting back, we obtain a very simplified result for T A B . Likewise, T B A can also be obtained, as shown in the following.
Theorem 3.
In Equations (1)–(3), suppose the $b_{ij}$ are constants, and
$$
F_i = f_i + \sum_{j=1}^{n} a_{ij}\, x_j,
$$
where f i and a i j are also constants. Furthermore, suppose that initially x has a Gaussian distribution; then,
$$
T_{A\to B} = \sum_{i=r+1}^{s}\left[\sum_{j=r+1}^{s}\sigma^{ij}\sum_{k=1}^{s} a_{ik}\,\sigma_{kj} - a_{ii}\right],
$$
where $\sigma^{ij}$ is the $(i,j)$th entry of $\begin{pmatrix}\mathbf{I}_{r\times r} & 0\\ 0 & \boldsymbol\Sigma_B^{-1}\end{pmatrix}$, and
$$
T_{B\to A} = \sum_{i=1}^{r}\left[\sum_{j=1}^{r}\sigma^{ij}\sum_{k=1}^{s} a_{ik}\,\sigma_{kj} - a_{ii}\right],
$$
where $\sigma^{ij}$ is the $(i,j)$th entry of $\begin{pmatrix}\boldsymbol\Sigma_A^{-1} & 0\\ 0 & \mathbf{I}_{(s-r)\times(s-r)}\end{pmatrix}$.
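For concreteness, the following is a minimal NumPy sketch of Theorem 3 (ours, not part of the paper; the example numbers at the end are arbitrary). It evaluates $T_{A\to B}$ and $T_{B\to A}$ once the drift matrix $a = (a_{ij})$ and the covariance matrix $\boldsymbol\Sigma$ are given, and, in the bivariate case, reproduces the familiar rate $a_{12}\sigma_{12}/\sigma_{11}$.

```python
import numpy as np

def bulk_flow(a, Sigma, idx_A, idx_B):
    """Bulk information flows of Theorem 3.

    a     : (n, n) drift matrix, F_i = f_i + sum_j a_ij x_j
    Sigma : (n, n) covariance matrix of the state vector
    idx_A : indices of subsystem A;  idx_B : indices of subsystem B
    Returns (T_{A->B}, T_{B->A}).
    """
    idx_AB = list(idx_A) + list(idx_B)              # the first s components

    def one_way(idx_tgt):
        # sigma^{ij}: inverse of the covariance block of the target subsystem
        Sinv = np.linalg.inv(Sigma[np.ix_(idx_tgt, idx_tgt)])
        T = 0.0
        for ii, i in enumerate(idx_tgt):
            acc = 0.0
            for jj, j in enumerate(idx_tgt):
                acc += Sinv[ii, jj] * sum(a[i, k] * Sigma[k, j] for k in idx_AB)
            T += acc - a[i, i]
        return T

    return one_way(idx_B), one_way(idx_A)

# Bivariate sanity check (arbitrary numbers): with A = {x1}, B = {x2},
# T_{B->A} should equal a12 * sigma12 / sigma11.
a = np.array([[-1.0, 0.6], [0.0, -1.0]])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
T_AB, T_BA = bulk_flow(a, Sigma, idx_A=[0], idx_B=[1])
print(T_BA, a[0, 1] * Sigma[0, 1] / Sigma[0, 0])    # both print 0.18
```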
Given a system such as Equations (1)–(3), we can thus evaluate in a precise sense the information flows between the subsystems. Now suppose that, instead of the dynamical system, what we have are just $n$ time series of $K$ steps each, $K \gg n$: $\{x_1(k)\}$, $\{x_2(k)\}$, $\dots$, $\{x_n(k)\}$. We can estimate the system from the series and then apply the information flow formulas to fulfill the task. Assume a linear model as above, and assume $m = 1$. Following Liang (2014) [19], the maximum likelihood estimator (MLE) of $a_{ij}$ equals the least-squares solution of the following over-determined problem:
$$
\begin{pmatrix}
1 & x_1(1) & x_2(1) & \cdots & x_n(1)\\
1 & x_1(2) & x_2(2) & \cdots & x_n(2)\\
1 & x_1(3) & x_2(3) & \cdots & x_n(3)\\
\vdots & \vdots & \vdots & & \vdots\\
1 & x_1(K) & x_2(K) & \cdots & x_n(K)
\end{pmatrix}
\begin{pmatrix} f_i\\ a_{i1}\\ a_{i2}\\ \vdots\\ a_{in}\end{pmatrix}
=
\begin{pmatrix} \dot x_i(1)\\ \dot x_i(2)\\ \dot x_i(3)\\ \vdots\\ \dot x_i(K)\end{pmatrix},
$$
where $\dot x_i(k) = \big(x_i(k+1) - x_i(k)\big)/\Delta t$ ($\Delta t$ being the time stepsize), for $i = 1, 2, \dots, n$, $k = 1, \dots, K$. Use an overbar to denote the time mean over the $K$ steps. The above equation then becomes
$$
\begin{pmatrix}
1 & \bar x_1 & \bar x_2 & \cdots & \bar x_n\\
0 & x_1(2)-\bar x_1 & x_2(2)-\bar x_2 & \cdots & x_n(2)-\bar x_n\\
0 & x_1(3)-\bar x_1 & x_2(3)-\bar x_2 & \cdots & x_n(3)-\bar x_n\\
\vdots & \vdots & \vdots & & \vdots\\
0 & x_1(K)-\bar x_1 & x_2(K)-\bar x_2 & \cdots & x_n(K)-\bar x_n
\end{pmatrix}
\begin{pmatrix} f_i\\ a_{i1}\\ a_{i2}\\ \vdots\\ a_{in}\end{pmatrix}
=
\begin{pmatrix} \bar{\dot x}_i\\ \dot x_i(2)-\bar{\dot x}_i\\ \dot x_i(3)-\bar{\dot x}_i\\ \vdots\\ \dot x_i(K)-\bar{\dot x}_i\end{pmatrix}.
$$
Denote by R the matrix
$$
R = \begin{pmatrix}
x_1(2)-\bar x_1 & x_2(2)-\bar x_2 & \cdots & x_n(2)-\bar x_n\\
\vdots & \vdots & & \vdots\\
x_1(K)-\bar x_1 & x_2(K)-\bar x_2 & \cdots & x_n(K)-\bar x_n
\end{pmatrix},
$$
by $\mathbf{q}$ the vector $\big(\dot x_i(2)-\bar{\dot x}_i,\ \dots,\ \dot x_i(K)-\bar{\dot x}_i\big)^T$, and by $\mathbf{a}_i$ the column vector $(a_{i1}, \dots, a_{in})^T$. Then $R\,\mathbf{a}_i = \mathbf{q}$. The least-squares solution $\hat{\mathbf{a}}_i$ of $\mathbf{a}_i$ solves
$$
R^T R\, \hat{\mathbf{a}}_i = R^T \mathbf{q}.
$$
Note that $R^T R$ is $K\,C$, where $C = (c_{ij})$ is the sample covariance matrix. Thus,
$$
\begin{pmatrix} \hat a_{i1}\\ \hat a_{i2}\\ \vdots\\ \hat a_{in}\end{pmatrix}
= C^{-1}\begin{pmatrix} c_{1,d_i}\\ c_{2,d_i}\\ \vdots\\ c_{n,d_i}\end{pmatrix},
$$
where $c_{j,d_i}$ is the sample covariance between the series $\{x_j(k)\}$ and $\{(x_i(k+1)-x_i(k))/\Delta t\}$.
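As a concrete illustration of this estimation step, here is a minimal NumPy sketch (function and variable names are ours, and end-point conventions are simplified) that forms the sample covariance matrix $C$, the covariances $c_{j,d_i}$ with the finite-difference series, and solves for all the $\hat a_{ij}$ at once:

```python
import numpy as np

def estimate_drift(X, dt):
    """MLE of the a_ij in the linear model dx/dt = f + a x + noise.

    X  : (K, n) array holding the n time series at K steps
    dt : sampling interval
    Returns (a_hat, C), with a_hat[i, j] ~ a_ij and C the sample covariance.
    """
    K, n = X.shape
    dX = (X[1:] - X[:-1]) / dt            # finite-difference (tendency) series
    Xc = X[:-1] - X[:-1].mean(axis=0)     # centered states
    dXc = dX - dX.mean(axis=0)            # centered tendencies
    C = Xc.T @ Xc / (K - 1)               # sample covariances c_ij
    C_d = Xc.T @ dXc / (K - 1)            # column i holds the c_{j, d_i}
    a_hat = np.linalg.solve(C, C_d).T     # row i is C^{-1} (c_{1,d_i}, ..., c_{n,d_i})^T
    return a_hat, C
```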
Thus, finally, the mle of T A B is
$$
\hat T_{A\to B} = \sum_{i=r+1}^{s}\left[\sum_{j=r+1}^{s} c^{ij}\sum_{k=1}^{s}\hat a_{ik}\, c_{kj} - \hat a_{ii}\right],
\tag{28}
$$
where $c^{ij}$ is the $(i,j)$th entry of $\tilde C^{-1}$, and
$$
\tilde C = \begin{pmatrix}
\mathbf{I}_{r\times r} & 0_{r\times(s-r)}\\
0_{(s-r)\times r} & \begin{pmatrix} c_{r+1,r+1} & \cdots & c_{r+1,s}\\ \vdots & & \vdots\\ c_{s,r+1} & \cdots & c_{s,s}\end{pmatrix}
\end{pmatrix}.
$$
Likewise,
$$
\hat T_{B\to A} = \sum_{i=1}^{r}\left[\sum_{j=1}^{r} c^{ij}\sum_{k=1}^{s}\hat a_{ik}\, c_{kj} - \hat a_{ii}\right].
\tag{30}
$$
Here,
$$
\tilde{\tilde C} = \begin{pmatrix}
\begin{pmatrix} c_{11} & \cdots & c_{1r}\\ \vdots & & \vdots\\ c_{r1} & \cdots & c_{rr}\end{pmatrix} & 0_{r\times(s-r)}\\
0_{(s-r)\times r} & \mathbf{I}_{(s-r)\times(s-r)}
\end{pmatrix},
$$
and $c^{ij}$ is the $(i,j)$th entry of $\tilde{\tilde C}^{-1}$.
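Putting the two preceding sketches together, Equations (28) and (30) can be evaluated directly from data: the estimator is simply Theorem 3 with the MLE drift $\hat a$ and the sample covariance $C$ substituted for $a$ and $\boldsymbol\Sigma$. A minimal wrapper (ours, with 0-based indexing in which A occupies components $0,\dots,r-1$ and B components $r,\dots,s-1$) might read:

```python
def bulk_flow_mle(X, dt, r, s):
    """Estimators (28) and (30) of the bulk information flows from time series."""
    a_hat, C = estimate_drift(X, dt)   # sketch given after the a_hat formula above
    # Theorem 3 routine (sketch given after Theorem 3) with the estimated quantities
    return bulk_flow(a_hat, C, idx_A=range(r), idx_B=range(r, s))
```

The first returned value estimates $T_{A\to B}$ and the second $T_{B\to A}$.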
When $n = 2$ and $r = 1$, and hence $s = 2$, $\tilde{\tilde C} = \begin{pmatrix} c_{11} & 0\\ 0 & 1\end{pmatrix}$, so $c^{11} = c_{11}^{-1}$. Equation (30) thus becomes
$$
\hat T_{B\to A} = c^{11}\sum_{k=1}^{2}\hat a_{1k}\, c_{k1} - \hat a_{11}
= \frac{1}{c_{11}}\big(\hat a_{11} c_{11} + \hat a_{12} c_{21}\big) - \hat a_{11}
= \frac{c_{11} c_{12} c_{2,d_1} - c_{12}^2\, c_{1,d_1}}{c_{11}^2\, c_{22} - c_{11}\, c_{12}^2},
$$
recovering the well-known Equation (10) in [19].

4. Validation

4.1. One-Way Causal Relation

To see if the above formalism works, consider the vector autoregressive (VAR) process:
$$
\mathbf{X}:\quad
\begin{aligned}
x_1(n+1) &= 0.5\, x_1(n) + 0.5\, x_2(n) + 0.2\, x_3(n) + e_{x_1}(n+1),\\
x_2(n+1) &= 0\, x_1(n) - 0.2\, x_2(n) - 0.6\, x_3(n) + e_{x_2}(n+1),\\
x_3(n+1) &= 0.2\, x_1(n) + 0.4\, x_2(n) - 0.2\, x_3(n) + \varepsilon_3\, y_3(n) + e_{x_3}(n+1),
\end{aligned}
$$
$$
\mathbf{Y}:\quad
\begin{aligned}
y_1(n+1) &= 0.2\, y_1(n) - 0.5\, y_2(n) + 0\, y_3(n) - \varepsilon_1\, x_1(n) + e_{y_1}(n+1),\\
y_2(n+1) &= 0.5\, y_1(n) - 0.6\, y_2(n) + 0.4\, y_3(n) + e_{y_2}(n+1),\\
y_3(n+1) &= 0.1\, y_1(n) - 0.4\, y_2(n) - 0.5\, y_3(n) + e_{y_3}(n+1),
\end{aligned}
$$
where e x i , e y i N ( 0 , 1 ) , i = 1 , 2 , 3 , are independent. As schematized in Figure 1, ( x 1 , x 2 , x 3 ) and ( y 1 , y 2 , y 3 ) form two subsystems, written as X and Y, respectively. They are coupled only through the first and third components; more specifically, x 1 drives y 1 , and Y feeds back to X through coupling y 3 with x 3 . The strength of the coupling is determined by the parameters ε 1 and ε 3 . In this subsection, ε 3 = 0 , so the causality is one-way, i.e., from X to Y without feedback.
Initialized with random numbers, we iterate the process for 20,000 steps and discard the first 10,000 steps, forming six time series of length 10,000. Using the algorithm by Liang (e.g., [16,18,19,20]), the information flows between $x_1$ and $y_1$ can be obtained rather accurately. As shown in Figure 2a, the information flow/causality from X to Y increases with $\varepsilon_1$, and there is no causality the other way around, just as expected. Since no other coupling exists, one can expect the bulk information flows to bear a similar trend. Using Equations (28) and (30), the estimated bulk flows do indeed behave this way, as shown in Figure 2b. This demonstrates the success of the above formalism.
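For readers who wish to reproduce this type of experiment, the following sketch (ours, not the author's original script) generates the above VAR process with $\varepsilon_3 = 0$ and applies the estimators (28) and (30) through the bulk_flow_mle sketch of Section 3:

```python
import numpy as np

rng = np.random.default_rng(0)
eps1, eps3 = 0.5, 0.0                  # one-way coupling: X -> Y only

# Transition matrix of the 6-D VAR, state z = (x1, x2, x3, y1, y2, y3).
M = np.array([
    [ 0.5,   0.5,  0.2,  0.0,  0.0,  0.0 ],
    [ 0.0,  -0.2, -0.6,  0.0,  0.0,  0.0 ],
    [ 0.2,   0.4, -0.2,  0.0,  0.0,  eps3],
    [-eps1,  0.0,  0.0,  0.2, -0.5,  0.0 ],
    [ 0.0,   0.0,  0.0,  0.5, -0.6,  0.4 ],
    [ 0.0,   0.0,  0.0,  0.1, -0.4, -0.5 ],
])

N = 20_000
z = np.zeros((N, 6))
for n in range(N - 1):
    z[n + 1] = M @ z[n] + rng.standard_normal(6)
z = z[10_000:]                         # discard the spin-up half

# Bulk flows between X (columns 0-2) and Y (columns 3-5), Eqs. (28) and (30).
T_XY, T_YX = bulk_flow_mle(z, dt=1.0, r=3, s=6)
print(T_XY, T_YX)                      # |T_XY| grows with eps1; T_YX stays near 0
```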
Since in practice averages and principal components (PCs) are widely used to characterize the variations of complex subsystems, we also compute the information flows between $\bar x = \frac13(x_1+x_2+x_3)$ and $\bar y = \frac13(y_1+y_2+y_3)$, and those between the first PCs of $(x_1, x_2, x_3)$ and $(y_1, y_2, y_3)$. The results are plotted in Figure 2c,d, respectively. As can be seen, the principal component analysis (PCA) method works just fine in this case, whereas the averaging method yields an incorrect result.
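The proxy-based computations can be sketched in the same spirit (our illustration, continuing from the previous code block); the bivariate flows are estimated with the formula of [19] recovered at the end of Section 3:

```python
def bivariate_flow(target, source, dt=1.0):
    """Liang's bivariate MLE (Equation (10) of [19]) of the flow source -> target."""
    d_tgt = (target[1:] - target[:-1]) / dt
    C = np.cov(np.vstack([target[:-1], source[:-1], d_tgt]))
    c11, c12, c22 = C[0, 0], C[0, 1], C[1, 1]
    c1d1, c2d1 = C[0, 2], C[1, 2]
    return (c11 * c12 * c2d1 - c12**2 * c1d1) / (c11**2 * c22 - c11 * c12**2)

def first_pc(W):
    """Leading principal component time series of the multivariate series W."""
    Wc = W - W.mean(axis=0)
    _, _, Vt = np.linalg.svd(Wc, full_matrices=False)
    return Wc @ Vt[0]

xbar, ybar = z[:, :3].mean(axis=1), z[:, 3:].mean(axis=1)      # mean proxies
pcx, pcy = first_pc(z[:, :3]), first_pc(z[:, 3:])              # first-PC proxies
print(bivariate_flow(ybar, xbar), bivariate_flow(xbar, ybar))  # xbar->ybar, ybar->xbar
print(bivariate_flow(pcy, pcx), bivariate_flow(pcx, pcy))      # PCx->PCy, PCy->PCx
```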
The incorrect inference based on averaging is within expectation. In a network with complex causal relations, for example one with a causality from $y_2$ to $y_1$, averaging $y_1$ with $y_2$ amounts to mixing $y_1$ with information about its own future state; that future state is related to the contemporaneous state of $x_1$, and a spurious causality to $x_1$ hence results. The PCA here functions satisfactorily, perhaps because, in selecting the most coherent structure, it discards most of the influences from other (implicit) time steps. However, the relative success of PCA may not be robust, as evidenced in the following mutually causal case.

4.2. Mutually Causal Relation

If both coupling parameters, $\varepsilon_1$ and $\varepsilon_3$, are turned on, the resulting causal relations form a distribution over the $(\varepsilon_1, \varepsilon_3)$ plane. Figure 3 shows the componentwise information flows $T_{x_1\to y_1}$ (bottom) and $T_{y_3\to x_3}$ (top) on this plane. The other two flows, i.e., their counterparts $T_{y_1\to x_1}$ and $T_{x_3\to y_3}$, are by computation essentially zero. As argued in the preceding subsection, the bulk information flows should follow the same general pattern, albeit perhaps in a coarser and milder fashion, since they are properties of the subsystems as wholes. This is indeed the case: shown in Figure 4 are the bulk information flows between X and Y computed using Equations (28) and (30).
Again, as usual, we try the averages and first PCs as proxies for estimating the causal interaction between X and Y. Figure 5 shows the distributions of the information flows between x ¯ and y ¯ . The resulting patterns are totally different from what Figure 3 displays; obviously, these patterns are incorrect.
One may expect that the PCA method should yield more reasonable causal patterns. We have computed the first PCs for ( x 1 , x 2 , x 3 ) and ( y 1 , y 2 , y 3 ) , respectively, and estimated the information flows using the algorithm by Liang [20]. The resulting distributions, however, are no better than those with the averaged series (Figure 6). That is to say, this seemingly more sophisticated approach does not yield the right interaction between the complex subsystems, either.

5. Summary

Information flow provides a natural measure of the causal interaction between dynamical events. In this study, the information flows between two complex subsystems of a large dimensional system are studied, and analytical formulas have been obtained in a closed form. For easy reference, the major results are summarized hereafter.
For an n-dimensional system
$$
\frac{d\mathbf{x}}{dt} = \mathbf{F}(\mathbf{x}, t) + \mathsf{B}(\mathbf{x}, t)\,\dot{\mathbf{w}},
$$
if the probability density function (pdf) of $\mathbf{x}$ is compactly supported, then the information flow from subsystem A, made up of $\mathbf{x}_{1r}$, to subsystem B, made up of $\mathbf{x}_{r+1,\dots,s}$ ($1 \le r < s \le n$), and that from B to A are, respectively (in nats per unit time),
$$
T_{A\to B} = -E\!\left[\sum_{i=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{r+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\right]
+ \frac12\, E\!\left[\sum_{i=r+1}^{s}\sum_{j=r+1}^{s}\frac{1}{\rho_{r+1,\dots,s}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{r+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}\right],
$$
$$
T_{B\to A} = -E\!\left[\sum_{i=1}^{r}\frac{1}{\rho_{1r}}\int_{\mathbb{R}^{n-s}} \frac{\partial(F_i\,\rho_{1,\dots,r,\,s+1,\dots,n})}{\partial x_i}\, d\mathbf{x}_{s+1,\dots,n}\right]
+ \frac12\, E\!\left[\sum_{i=1}^{r}\sum_{j=1}^{r}\frac{1}{\rho_{1r}}\int_{\mathbb{R}^{n-s}} \frac{\partial^2(g_{ij}\,\rho_{1,\dots,r,\,s+1,\dots,n})}{\partial x_i\partial x_j}\, d\mathbf{x}_{s+1,\dots,n}\right],
$$
where $g_{ij} = \sum_{k=1}^{m} b_{ik} b_{jk}$, and $E$ signifies mathematical expectation. Given $n$ stationary time series, $T_{A\to B}$ and $T_{B\to A}$ can be estimated; the maximum likelihood estimators under a Gaussian assumption are given in Equations (28) and (30).
We have constructed a VAR process to validate the formalism. The system has a dimension of 6, with two subsystems, denoted X and Y, each of dimension 3. X drives Y via a coupling at one component, and Y feeds back to X via another. The detailed, componentwise causal relations can easily be found using our previous algorithms, such as that in [20]. The bulk information flow is expected, in general, to follow a similar trend, though in a coarser and milder structure, since what is displayed is an overall property. The above formalism does yield such a result. In contrast, the commonly used proxies for subsystems, such as averages and principal components (PCs), generally do not work. In particular, the averaged series yield wrong results in both cases considered in this study; the PC series do not work for the mutually causal case either, though they give a satisfactory characterization in the case with one-way causality.
The results of this study are applicable to many real-world problems. As explained in the Introduction, they should be of particular use in fields such as climate science, neuroscience, financial economics, and fluid mechanics. For example, they can help clarify the role of greenhouse gas emissions in bridging the climate system and the socioeconomic system (see the review in [26]). Likewise, the interaction between the Earth system and public health [27] can also be studied. In short, the formalism is expected to play a role in the frontier field of complexity, namely multiplex networks, or networks of networks (see the references in [28,29,30]). We are therefore working on these applications.

Funding

This research was funded by the Shanghai International Science and Technology Partnership Project (grant number: 21230780200), the National Science Foundation of China (grant number: 41975064), and the 2015 Jiangsu Program for Innovation Research and Entrepreneurship Groups.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Intergovernmental Panel on Climate Change (IPCC). The Sixth Assessment Report, Climate Change 2021: The Physical Science Basis. Available online: https://www.ipcc.ch/report/ar6/wg1/#FullReport (accessed on 15 November 2021).
  2. Friston, K.J.; Harrison, L.; Penny, W. Dynamic causal modeling. Neuroimage 2003, 19, 1273–1302.
  3. Li, B.; Daunizeau, J.; Stephan, K.E.; Penny, W.; Hu, D.; Friston, K. Generalised filtering and stochastic DCM for fMRI. Neuroimage 2011, 58, 442–457.
  4. Friston, K.J.; Ungerleider, L.G.; Jezzard, P.; Turner, R. Characterizing modulatory interactions between V1 and V2 in human cortex with fMRI. Hum. Brain Mapp. 1995, 2, 211–224.
  5. Friston, K.J.; Kahan, J.; Razi, A.; Stephan, K.E.; Sporns, O. On nodes and modes in resting state fMRI. NeuroImage 2014, 99, 533–547.
  6. Qiu, P.; Jiang, J.; Liu, Z.; Cai, Y.; Huang, T.; Wang, Y.; Liu, Q.; Nie, Y.; Liu, F.; Cheng, J.; et al. BMAL1 knockout macaque monkeys display reduced sleep and psychiatric disorders. Natl. Sci. Rev. 2019, 6, 87–100.
  7. Wang, X.-J.; Hu, H.; Huang, C.; Kennedy, H.; Li, C.T.; Logothetis, N.; Lu, Z.-L.; Luo, Q.; Poo, M.-M.; Tsao, D.; et al. Computational neuroscience: A frontier of the 21st century. Natl. Sci. Rev. 2020, 7, 1418–1422.
  8. Granger, C. Investigating causal relations by econometric models and cross-spectral methods. Econometrica 1969, 37, 424.
  9. Pearl, J. Causality: Models, Reasoning, and Inference, 2nd ed.; Cambridge University Press: New York, NY, USA, 2009.
  10. Imbens, G.W.; Rubin, D.B. Causal Inference for Statistics, Social, and Biomedical Sciences; Cambridge University Press: Cambridge, UK, 2015.
  11. Batchelor, G.K. The Theory of Homogeneous Turbulence; Cambridge University Press: Cambridge, UK, 1953; 197p.
  12. Landau, L.D.; Lifshitz, E.M. Statistical Physics, 2nd Revised and Enlarged ed.; Pergamon Press: Oxford, UK, 1969.
  13. Preisendorfer, R. Principal Component Analysis in Meteorology and Oceanography; Elsevier: Amsterdam, The Netherlands, 1988; 418p.
  14. Friston, K.J.; Frith, C.D.; Liddle, P.F.; Frackowiak, R.S. Functional connectivity: The principal-component analysis of large (PET) data sets. J. Cereb. Blood Flow Metab. 1993, 13, 5–14.
  15. Friston, K.; Phillips, J.; Chawla, D.; Buchel, C. Nonlinear PCA: Characterizing interactions between modes of brain activity. Philos. Trans. R. Soc. Lond. B Biol. Sci. 2000, 355, 135–146.
  16. Liang, X.S. Information flow and causality as rigorous notions ab initio. Phys. Rev. E 2016, 94, 052201.
  17. Liang, X.S.; Kleeman, R. Information transfer between dynamical system components. Phys. Rev. Lett. 2005, 95, 244101.
  18. Liang, X.S. Information flow within stochastic systems. Phys. Rev. E 2008, 78, 031113.
  19. Liang, X.S. Unraveling the cause-effect relation between time series. Phys. Rev. E 2014, 90, 052150.
  20. Liang, X.S. Normalized multivariate time series causality analysis and causal graph reconstruction. Entropy 2021, 23, 679.
  21. Majda, A.J.; Harlim, J. Information flow between subspaces of complex dynamical systems. Proc. Natl. Acad. Sci. USA 2007, 104, 9558–9563.
  22. Liang, X.S. Measuring the importance of individual units in producing the collective behavior of a complex network. Chaos 2021, 31, 093123.
  23. Al-Sadoon, M.M. Testing subspace Granger causality. Econom. Stat. 2019, 9, 42–61.
  24. Triacca, U. Granger causality between vectors of time series: A puzzling property. Stat. Probab. Lett. 2018, 142, 39–43.
  25. Lasota, A.; Mackey, M.C. Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics; Springer: New York, NY, USA, 1994.
  26. Tachiiri, K.; Su, X.; Matsumoto, K. Identifying the key processes and sectors in the interaction between climate and socio-economic systems: A review toward integrating Earth-human systems. Prog. Earth Planet. Sci. 2021, 8, 24.
  27. Balbus, J.; Crimmins, A.; Gamble, J.L.; Easterling, D.R.; Kunkel, K.E.; Saha, S.; Sarofim, M.C. Introduction: Climate Change and Human Health. In The Impacts of Climate Change on Human Health in the United States: A Scientific Assessment; U.S. Global Change Research Program: Washington, DC, USA, 2016.
  28. D'Agostino, G.; Scala, A. Networks of Networks: The Last Frontier of Complexity; Springer: New York, NY, USA, 2014.
  29. Kenett, D.Y.; Perc, M.; Boccaletti, S. Networks of networks—An introduction. Chaos Solitons Fractals 2015, 80, 1–6.
  30. DeFord, D.R.; Pauls, S.D. Spectral clustering methods for multiplex networks. Phys. A Stat. Mech. Its Appl. 2019, 533, 121949.
Figure 1. The preset coupling between the subsystems X and Y.
Figure 2. The absolute information flows between subspaces X and Y as functions of the coupling coefficient $\varepsilon_1$ ($\varepsilon_3 = 0$). (a) The componentwise information flows between $x_1$ and $y_1$; (b) the bulk information flows between subsystems X and Y computed with Equations (28) and (30); (c) the information flows between $\bar x$ and $\bar y$; (d) the information flows between the first principal components of $(x_1, x_2, x_3)$ and $(y_1, y_2, y_3)$, respectively (units: nats per time step).
Figure 3. The absolute information flow from $y_3$ to $x_3$ and that from $x_1$ to $y_1$ as functions of $\varepsilon_1$ and $\varepsilon_3$. The units are in nats per time step.
Figure 4. The absolute bulk information flow from subsystem Y to subsystem X, and that from X to Y. The abscissa and ordinate are the coupling coefficients $\varepsilon_1$ and $\varepsilon_3$, respectively.
Figure 5. As Figure 4, but for the information flows between the mean series $\bar x = \frac13(x_1+x_2+x_3)$ and $\bar y = \frac13(y_1+y_2+y_3)$.
Figure 6. As Figure 4, but for the information flows between the first principal component of $(x_1, x_2, x_3)$ and that of $(y_1, y_2, y_3)$.
