Appendix A. Finer Details of Numerical Estimation Techniques
Recall from Equation (7) that the squared information rate is
\[
\Gamma^2(t) = \int \frac{1}{p(x,t)} \left[ \frac{\partial p(x,t)}{\partial t} \right]^2 dx ,
\]
where the partial time derivative and the integral can be numerically approximated using discretization, i.e.,
\[
\frac{\partial p(x,t)}{\partial t} \approx \frac{p(x, t+\Delta t) - p(x, t)}{\Delta t}
\qquad \text{and} \qquad
\int f(x)\, dx \approx \sum_i f(x_i)\, \Delta x ,
\]
respectively. For brevity, and when no ambiguity arises, the summation index $i$ is often omitted and replaced by $x$ itself, as $\sum_x f(x)\, \Delta x$, where the symbol $x$ serves both as the index of summation (e.g., the $x$-th interval of width $\Delta x$) and as the actual value $x$ in $f(x)$.
A common technique to improve the approximation of an integral by a finite summation is the trapezoidal rule, which we abbreviate as $\sum^{T}$ to indicate that the summation follows the trapezoidal rule, imposing a weight/factor of $1/2$ on the first and last summation terms (corresponding to the lower and upper bounds of the integral). Similarly, we use $\sum\!\sum^{T}$ to denote a 2D trapezoidal approximation of a double integral, where different weights ($1/2$ or $1/4$) are applied to the “corner”/boundary terms of the summation. Meanwhile, to distinguish regular summation from the trapezoidal approximation, we use the plain notation $\sum$ to signify a regular summation as a more naive approximation of the integral.
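For concreteness, the three summation conventions can be sketched in Python/NumPy as follows (the function names are ours, introduced only for illustration):

```python
import numpy as np

def naive_sum(f, dx):
    # Regular/naive summation: sum_i f_i * dx
    return np.sum(f) * dx

def trapz_1d(f, dx):
    # 1D trapezoidal rule: weight 1/2 on the first and last terms
    w = np.ones_like(f)
    w[0] = w[-1] = 0.5
    return np.sum(w * f) * dx

def trapz_2d(f, dx, dy):
    # 2D trapezoidal rule: weight 1/2 on boundary rows/columns,
    # so the four corner terms accumulate a weight of 1/4
    w = np.ones(f.shape)
    w[0, :] *= 0.5; w[-1, :] *= 0.5
    w[:, 0] *= 0.5; w[:, -1] *= 0.5
    return np.sum(w * f) * dx * dy
```

For instance, applying `trapz_1d` to samples of $x^2$ on $[0, 1]$ approximates the exact integral $1/3$ with an $O(\Delta x^2)$ error.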
The PDF $p(x,t)$ is numerically estimated using a histogram with Rice's rule applied, i.e., the number of bins is $2 n^{1/3}$ (with uniform bin width $\Delta x$), which is rounded towards zero to avoid overestimating the number of bins needed. For the joint PDF, since the bins are distributed in a 2D plane, the number of bins in each dimension is rounded to $\sqrt{2 n^{1/3}}$ (and similarly, for a 3D joint probability as in the transfer entropy calculation, the number of bins in each dimension is rounded to $(2 n^{1/3})^{1/3}$). Combining all of the above, the information rate's square will be approximated by
\[
\Gamma^2(t) \approx \frac{4}{(\Delta t)^2} \sum_x^{T} \left[ \sqrt{p(x, t+\Delta t)} - \sqrt{p(x, t)} \right]^2 \Delta x ,
\]
where the bin width $\Delta x$ can be moved into the square roots and multiplied with the PDF to get the probability (mass) $P_x = p(x,t)\, \Delta x$ of finding a data sample in the $x$-th bin, which is estimated as $P_x \approx n_x / n$, i.e., the number of data samples inside that bin divided by the number of all data samples (using the relevant histogram functions in MATLAB or Python). The trapezoidal rule imposes a $1/2$ factor on the first and last terms of the summation, corresponding to the first and last bins.
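The estimator above can be sketched in Python/NumPy as follows (a minimal sketch assuming the square-root form of the discretized $\Gamma^2$ given above; `rice_bins` and `info_rate_sq` are our illustrative names, not functions from any library):

```python
import numpy as np

def rice_bins(n):
    # Rice's rule, rounded towards zero: floor(2 * n^(1/3))
    return int(2 * n ** (1 / 3))

def info_rate_sq(x_t0, x_t1, dt):
    """Estimate Gamma^2 ~ (4/dt^2) * sum_x [sqrt(P_x(t+dt)) - sqrt(P_x(t))]^2,
    where P_x is the probability mass in the x-th bin and the first/last
    bins carry the trapezoidal weight 1/2."""
    n = len(x_t0)
    nb = rice_bins(n)
    lo = min(x_t0.min(), x_t1.min())
    hi = max(x_t0.max(), x_t1.max())
    edges = np.linspace(lo, hi, nb + 1)         # shared bins for both times
    p0 = np.histogram(x_t0, bins=edges)[0] / n  # probability mass per bin
    p1 = np.histogram(x_t1, bins=edges)[0] / n
    w = np.ones(nb)
    w[0] = w[-1] = 0.5                          # trapezoidal end weights
    return 4.0 / dt ** 2 * np.sum(w * (np.sqrt(p1) - np.sqrt(p0)) ** 2)
```

Note that both snapshots are binned on the same edges, so an unchanged distribution gives exactly zero.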
For the causal information rate $\Gamma_c$, the term involving the joint PDF can be estimated in the same way, by replacing the 1D histogram with a 2D histogram in which the number of bins in each of the two dimensions is rounded to $\sqrt{2 n^{1/3}}$, while the term involving the marginal PDF can be estimated as above using regular or trapezoidal summation. However, here the number of bins for the marginal PDF must not be chosen as $2 n^{1/3}$ following the 1D Rice's rule; this is critical to avoid insensible or inconsistent estimation of $\Gamma_c$, for the reason explained below.
Consider a quantity of the form
\[
A = \iint f(x, y)\, p(x, y)\, dx\, dy \;-\; \int g(x)\, p(x)\, dx ,
\]
with $f$ and $g$ denoting the relevant integrands. Theoretically, and by the definition of the marginal PDF, $p(x) = \int p(x, y)\, dy$; since $g(x)$ does not depend on $y$, it can be moved across the integral over $y$ to combine the two integrals into one as follows:
\[
A = \iint \big\{ f(x, y) - g(x) \big\}\, p(x, y)\, dx\, dy ,
\]
and the corresponding numerical approximations of the integrals should be combined as
\[
A \approx \sum_x \sum_y \big\{ f(x, y) - g(x) \big\}\, p(x, y)\, \Delta x\, \Delta y ,
\]
where the sum over $x$ is performed on the same bins for both of the two terms inside the large braces above. On the other hand, if one numerically approximates the two terms separately, as
\[
A \approx \sum_x \sum_y f(x, y)\, p(x, y)\, \Delta x\, \Delta y \;-\; \sum_x g(x)\, p(x)\, \Delta x ,
\]
then the sum over $x$ in the second term should still be performed on the same bins of $x$ as in the first term, whose joint PDFs are estimated by 2D histograms (i.e., using the square-root number of bins $\sqrt{2 n^{1/3}}$ of Rice's rule, instead of following the 1D Rice's rule without the square root), even though this second summation is written as a separate and “independent” term from the first double-summation term. Writing the definition as two separate integrals might give the misimpression that one can estimate the marginal PDF $p(x)$ separately, using a Rice's-rule binning with $2 n^{1/3}$ bins, while estimating the joint PDF $p(x, y)$ with the square root of Rice's number of bins, $\sqrt{2 n^{1/3}}$. Using different bins for $x$ in the two terms makes it invalid to combine the two summations into one summation over the same $x$-bins (and hence invalid to combine the two integrals into one integral by pulling out the same marginal $p(x)$).
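The binning requirement can be illustrated with a toy discretized computation (the arrays `f`, `g`, and `p_xy` below are arbitrary stand-ins for the actual integrands and PDF, not quantities from this work): when both terms share the same $x$-bins, the separate and combined summations agree.

```python
import numpy as np

rng = np.random.default_rng(1)
nx, ny = 14, 14              # e.g. the square-root Rice's-rule bin count
dx = dy = 0.1

f = rng.random((nx, ny))     # stand-in for f(x, y)
g = rng.random(nx)           # stand-in for g(x)
p_xy = rng.random((nx, ny))  # stand-in for the joint PDF on the shared grid
p_x = p_xy.sum(axis=1) * dy  # marginal on the SAME x-bins (naive sum over y)

# Two terms summed separately ...
A_sep = np.sum(f * p_xy) * dx * dy - np.sum(g * p_x) * dx
# ... equal the combined double summation, because the x-bins coincide
A_comb = np.sum((f - g[:, None]) * p_xy) * dx * dy

assert np.isclose(A_sep, A_comb)
```

If `g` and `p_x` were instead tabulated on a different (finer) set of $x$-bins, the subtraction inside `A_comb` would be undefined and the two summations could no longer be combined.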
Using $2 n^{1/3}$ bins for the marginal PDF will overestimate the value of the marginal-PDF term. For example, if there are 1 million samples/data points to estimate the PDFs, then $2 n^{1/3} = 200$ bins for the 1D distribution and $\sqrt{2 n^{1/3}} \approx 14$ bins per dimension for the 2D joint distribution. Calculating the marginal-PDF term using 200 bins results in a much larger value than calculating it using 14 bins, which in turn produces negative values in the computed causal information rate $\Gamma_c$. When using the same 14 bins of the 2D histogram (for estimating the 2D joint PDF) to also estimate the 1D marginal PDF, all the unreasonable negative values disappear, except for some isolated negative values that remain. These are related to estimating the two terms using 1D and 2D trapezoidal rules for the summations approximating the integrals: if one uses a 1D trapezoidal summation for the marginal-PDF term while blindly and inconsistently using a 2D trapezoidal summation for the joint-PDF term, this will also produce some negative values in computing $\Gamma_c$, because the 2D trapezoidal sum underestimates the joint-PDF term as compared to the 1D trapezoidal-sum-estimated marginal-PDF term.
To resolve this inconsistent mixing of 1D and 2D trapezoidal rules, there are two possible methods:
1. Using the 2D trapezoidal rule for both terms. In other words, when calculating the marginal-PDF term, instead of estimating the marginal PDFs directly by 1D histograms (using the relevant functions in MATLAB or Python), one first estimates the corresponding joint PDFs by 2D histograms and then integrates over the second variable by trapezoidal summation. This reduces the value of the estimated marginal-PDF term, and the integrals over both variables are then both estimated by trapezoidal summation.
2. Using the 1D trapezoidal rule over $x$ for both terms, with the integral over $y$ estimated by regular (naive) summation. In this approach, the marginal PDF satisfies $p(x) = \sum_y p(x, y)\, \Delta y$, where the equality holds exactly for the regular or naive summation over $y$. This is because the histogram estimation in MATLAB and Python is performed by counting the occurrences of data samples inside each bin: the probability (mass) is estimated as $P_x = n_x / n$, and the density as $p(x) = n_x / (n\, \Delta x)$, where $\Delta x$ is the width of the $x$-th bin (for a 2D histogram, this is replaced by the bin area $\Delta x\, \Delta y$). Summing over $y$ therefore aggregates the 2D bins, combining samples whose $x$-values/coordinates fall in the same $x$-bin (but whose $y$-values/coordinates fall in different $y$-bins). In other words, it is always true that $n_x = \sum_y n_{xy}$, where $n_x$ is the number of samples inside the $x$-th 1D bin and $n_{xy}$ is the number of samples inside the $(x, y)$-th 2D bin; hence, for the estimated probability (mass), $P_x = \sum_y P_{xy}$, and for the estimated PDFs, $p(x) = \sum_y p(x, y)\, \Delta y$. This is why the relation holds exactly for marginal and joint PDFs numerically estimated by histograms, consistent with the theoretical relation $p(x) = \int p(x, y)\, dy$ between marginal and joint PDFs. This has been numerically verified using the relevant 1D and 2D histogram functions in MATLAB and Python: (naively) summing the estimated joint PDF over $y$ yields exactly the same marginal as the one estimated directly by the 1D histogram function. So in this approach, the integral over $y$ is estimated by naive summation, while the integral over $x$ is estimated by trapezoidal summation.
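The exactness of this marginalization for histogram estimates can be checked numerically. Below is a minimal NumPy sketch (the text states only that the relevant MATLAB or Python histogram functions are used; the variable names and sample data are ours):

```python
import numpy as np

# Verify that naively summing a 2D-histogram joint PDF over y reproduces
# the 1D-histogram marginal PDF, provided both use the same x-bin edges.
rng = np.random.default_rng(42)
x = rng.normal(size=10_000)
y = 0.5 * x + rng.normal(size=10_000)

nb = 14  # illustrative per-dimension bin count (cf. sqrt(2 n^(1/3)))
x_edges = np.linspace(x.min(), x.max(), nb + 1)
y_edges = np.linspace(y.min(), y.max(), nb + 1)

# density=True returns counts / (n * bin area), i.e. an estimated PDF
p_xy, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges], density=True)
p_x, _ = np.histogram(x, bins=x_edges, density=True)

dy = y_edges[1] - y_edges[0]
p_x_from_joint = p_xy.sum(axis=1) * dy  # naive summation over y

assert np.allclose(p_x_from_joint, p_x)
```

A trapezoidal sum over $y$ (with its $1/2$ end weights) would not reproduce `p_x` exactly, which is the inconsistency discussed next.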
The 1st approach violates the relation between the joint and marginal PDFs, because, as explained under the 2nd approach above, when using MATLAB's and Python's 1D and 2D histogram functions one always gets $p(x) = \sum_y p(x, y)\, \Delta y$ exactly for the naive summation, but not for the trapezoidal summation over $y$ (which the 1st approach uses), due to the weights/factors ($\neq 1$) imposed on the “corner”/boundary/first/last summation terms. However, the 2nd approach puts different importance or weights on the summation over $y$ as compared to that over $x$, which might also be problematic, because the original definition is a double integral over $x$ and $y$ without different weights/factors imposed by different summation methods.
To resolve this, we use the regular or naive summations over both $x$ and $y$, which avoids the issues of both the 1st and 2nd approaches; we find that the numerical differences between the 1st and 2nd approaches and our adopted naive summations are negligible. Moreover, because in this work we perform empirical statistics on the estimated causal information rates and illustrate the qualitative features of their empirical probability distributions, we use simple naive summations over both $x$ and $y$ when estimating the joint-PDF and marginal-PDF terms in the causal information rate.