A straightforward approach to the estimation of entropy combinations would be to add separate estimates of each of the multi-dimensional entropies appearing in the combination. Popular estimators of differential entropy include plug-in estimators, as well as fixed and adaptive histogram or partition methods. However, other non-parametric techniques, such as kernel and nearest-neighbor estimators, have been shown to be considerably more data efficient [20,21]. An asymptotically unbiased estimator based on nearest-neighbor statistics is due to Kozachenko and Leonenko (KL) [22]. For N realizations x[1], x[2], …, x[N] of a d-dimensional random vector X, the KL estimator takes the form:

$$\widehat{H}(X) = -\psi(k) + \psi(N) + \log v_{d} + \frac{d}{N}\sum_{i=1}^{N} \log \epsilon(i)$$

where ψ is the digamma function, v_{d} is the volume of the d-dimensional unit ball and ϵ(i) is the distance from x[i] to its k-th nearest neighbor in the set {x[j]}_{∀j≠i}. The KL estimator is based on the assumption that the density of the distribution of random vectors is constant within an ϵ-ball. The bias of the final entropy estimate depends on the validity of this assumption and, thus, on the values of ϵ(n). Since the size of the ϵ-balls depends directly on the dimensionality of the random vector, the biases of the estimates of the differential entropies in Equation (1) will, in general, not cancel, leading to a poor estimator of the entropy combination. This problem can be partially overcome by noticing that Equation (2) holds for any value of k, so that we do not need to keep k fixed. We can therefore vary the value of k at each data point, so that the radii of the corresponding ϵ-balls are approximately the same in the joint and marginal spaces. This idea was originally proposed in [23] for estimating mutual information and was used in [16] to estimate PMI; we generalize it here to the following estimator of entropy combinations:

where F(k) = ψ(k) − ψ(N) and ${\langle \cdots \rangle}_{n}=\frac{1}{N}{\displaystyle {\sum}_{n=1}^{N}(\cdots )}$ denotes averaging with respect to the time index. The term k_{i}(n) accounts for the number of neighbors of the n-th realization of the marginal vector ${V}_{{\mathcal{L}}_{i}}$ located at a distance strictly less than ϵ(n), where ϵ(n) denotes the radius of the ϵ-ball in the joint space. Note that the point itself is included when counting neighbors in the marginal spaces (k_{i}(n)), but not when selecting ϵ(n) from the k-th nearest neighbor in the full joint space. Furthermore, note that estimator Equation (3) corresponds to extending “Algorithm 1” in [23] to entropy combinations. Extensions to conditional mutual information and conditional transfer entropy using “Algorithm 2” in [23] have been discussed recently [12].
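To make the neighbor-counting scheme concrete, the following sketch implements the “Algorithm 1” estimator of [23] for mutual information, the simplest instance of the entropy-combination estimator: ϵ(n) is taken from the k-th nearest neighbor in the joint space, marginal neighbors are counted strictly within that radius (including the point itself), and F(k) = ψ(k) − ψ(N) is averaged over time indices. Function and variable names are ours, for illustration only; a brute-force distance computation is used for clarity rather than a tree-based search.

```python
import numpy as np
from scipy.special import digamma


def ksg_mutual_information(x, y, k=4):
    """Sketch of "Algorithm 1" MI estimation as an entropy combination:
    F(k) - <F(k_x(n)) + F(k_y(n))>_n, with F(k) = psi(k) - psi(N) and
    max-norm (Chebyshev) distances."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    N = len(x)
    # Pairwise max-norm distances in each marginal space and in the joint space.
    dx = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=-1)
    dy = np.max(np.abs(y[:, None, :] - y[None, :, :]), axis=-1)
    dz = np.maximum(dx, dy)
    total = 0.0
    for i in range(N):
        # eps(i): distance to the k-th nearest neighbor in the joint space
        # (index 0 of the sorted row is the point itself, at distance 0).
        eps = np.sort(dz[i])[k]
        # Marginal neighbor counts within a ball of radius eps; strict
        # inequality, with the point itself included in the count.
        kx = int(np.sum(dx[i] < eps))
        ky = int(np.sum(dy[i] < eps))
        total += digamma(k) - digamma(kx) - digamma(ky) + digamma(N)
    return total / N
```

Note that the marginal counts vary from point to point, which is exactly the variable-k idea above: the radius, not the neighbor count, is held fixed across the joint and marginal spaces.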

This naive time-adaptive estimator is not useful in practice, due to its large variance, which stems from the fact that a single data point is used to produce the estimate at each time instant. More importantly, the neighbor searches in this estimator run across the full time series and, thus, ignore possible non-stationary changes.

However, let us consider the case of an ensemble of r′ repeated measurements (trials) from the dynamics of V. Let us also denote by {v^{(r)}[n]}_{r} the measured dynamics for those trials (r = 1, 2, …, r′). Similarly, we denote by ${\{{v}_{i}^{(r)}[n]\}}_{r}$ the measured dynamics for the marginal vector ${V}_{{\mathcal{L}}_{i}}$. A straightforward approach for integrating the information from different trials is to average together the estimates obtained from individual trials:

$$\widehat{C}(\{{V}_{{\mathcal{L}}_{1}},\dots ,{V}_{{\mathcal{L}}_{p}}\},n) = \frac{1}{r'}\sum_{r=1}^{r'}{\widehat{C}}^{(r)}(\{{V}_{{\mathcal{L}}_{1}},\dots ,{V}_{{\mathcal{L}}_{p}}\},n)$$

where ${\widehat{C}}^{(r)}(\{{V}_{{\mathcal{L}}_{1}},\dots ,{V}_{{\mathcal{L}}_{p}}\},n)$ is the estimate obtained from the r-th trial. However, this approach makes poor use of the available data and will typically produce useless estimates, as will be shown in the experimental section of this text.
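The trial-averaging scheme is simple enough to state in a few lines of code. The sketch below is purely illustrative: `single_trial_estimator` is a hypothetical stand-in for any estimator applied to one trial in isolation (e.g., a single-trial entropy-combination estimator), and is not a function from the cited references.

```python
import numpy as np


def trial_averaged_estimate(trials, single_trial_estimator):
    """Combine multi-trial data the naive way: run the estimator on each
    trial separately, then average the resulting scalar estimates.
    `trials` is a sequence of per-trial data arrays; `single_trial_estimator`
    is a hypothetical callable mapping one trial to a scalar."""
    return float(np.mean([single_trial_estimator(trial) for trial in trials]))
```

Each per-trial estimate sees only 1/r′ of the data, which is why this combination inherits the large variance of the single-trial estimates rather than reducing it.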

A more effective procedure takes into account the multi-trial nature of our data by searching for neighbors across ensemble members, rather than within each individual trial. This nearest ensemble neighbors [24] approach is illustrated in Figure 1 and leads to the following ensemble estimator of entropy combinations:

where the counts of marginal neighbors $\{{k}_{i}^{(r)}(n)\}_{r=1,\dots ,r';\, i=1,\dots ,p}$ are computed using overlapping time windows of size 2σ, as shown in Figure 1. For rapidly changing dynamics, small values of σ might be needed to increase the temporal resolution and thereby track more volatile non-stationarities. On the other hand, larger values of σ lead to lower estimator variance and are useful when non-stationarities develop over slow temporal scales.
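The core of the ensemble approach is the windowed cross-trial neighbor search, which the following sketch illustrates for a single query point. All names are ours, and the function only returns the ϵ radius (the k-th nearest cross-trial distance within the window); a full ensemble estimator would reuse this radius to count marginal neighbors in the same window, as described above.

```python
import numpy as np


def ensemble_kth_distance(trials, r, n, k, sigma):
    """For the sample at time n in trial r, gather candidate neighbors
    from *all* trials, restricted to the temporal window [n - sigma,
    n + sigma] (width 2*sigma), and return the max-norm distance to the
    k-th nearest candidate. `trials` has shape (n_trials, n_samples, dim)."""
    n_trials, n_samples, _ = trials.shape
    lo, hi = max(0, n - sigma), min(n_samples, n + sigma + 1)
    query = trials[r, n]
    dists = []
    for rr in range(n_trials):
        for t in range(lo, hi):
            if rr == r and t == n:
                continue  # exclude the query point itself
            dists.append(float(np.max(np.abs(trials[rr, t] - query))))
    return sorted(dists)[k - 1]  # distance to the k-th nearest neighbor
```

Because candidates are drawn only from a 2σ-wide window around time n, the search respects non-stationarity: samples from distant time instants, whose statistics may differ, never enter the neighborhood.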