Article

Information Theory for Non-Stationary Processes with Stationary Increments

by Carlos Granero-Belinchón 1,2, Stéphane G. Roux 1 and Nicolas B. Garnier 1,*
1 Univ Lyon, Ens de Lyon, Univ Claude Bernard, CNRS, Laboratoire de Physique, F-69342 Lyon, France
2 ONERA-DOTA, University of Toulouse, FR-31055 Toulouse, France
* Author to whom correspondence should be addressed.
Entropy 2019, 21(12), 1223; https://doi.org/10.3390/e21121223
Submission received: 26 November 2019 / Revised: 4 December 2019 / Accepted: 12 December 2019 / Published: 15 December 2019
(This article belongs to the Special Issue Information Theoretic Measures and Their Applications)

Abstract:
We describe how to analyze the wide class of non-stationary processes with stationary centered increments using Shannon information theory. To do so, we use a practical viewpoint and define ersatz quantities from time-averaged probability distributions. These ersatz versions of entropy, mutual information, and entropy rate can be estimated when only a single realization of the process is available. We abundantly illustrate our approach by analyzing Gaussian and non-Gaussian self-similar signals, as well as multi-fractal signals. Using Gaussian signals allows us to check that our approach is robust in the sense that all quantities behave as expected from analytical derivations. Using the stationarity (independence of the integration time) of the ersatz entropy rate, we show that this quantity is not only able to finely probe the self-similarity of the process, but also offers a new way to quantify its multi-fractality.

1. Introduction

Many real-world processes, like global weather data, water reservoir levels, biological or medical signals, economic time series, etc., are intrinsically non-stationary [1,2,3,4,5]: their probability density function (PDF) deforms as time evolves. Analyzing such processes requires a stationarity hypothesis in order to apply classical analyses, like, e.g., two-point correlation assessment [6]. The stationarity hypothesis can be either strict or weak: while strict stationarity requires all moments of the process—and hence its PDF—to be time-independent, weak stationarity is achieved when the first moment and the covariance function are time-independent and the variance is finite at all times [7]. Even the weaker hypothesis is often very restrictive and not realistic over long time periods. When the signal has a drift or a linear trend, another approach is to focus on its time-increments or time-derivatives. Indeed, assuming that the increments or the time derivative are stationary is then a more realistic hypothesis. For real-world processes, the stationarity of the increments, or even the stationarity of the signal itself, is often argued to be valid when considering small chunks of data spanning a short enough time range [8,9,10], so that slow evolutions of higher order moments can be neglected. The present article focuses on non-stationary processes with increments that are stationary and centered—this hypothesis ensures that the processes do not have any trend or drift.
Shannon information theory provides a very general framework to study stationary processes [11,12], and some attempts to analyze non-stationary processes have been reported [13,14,15]. Contrary to most classical approaches, like, e.g., linear response theory in statistical physics or solid state physics, this framework is not restricted to the study of two-point correlations and linear relationships, and it allows one to quantify higher-order dependencies [16] and nonlinear dynamics [12]. Information theory can be straightforwardly applied to any non-stationary time process X = {x_t}_{t ∈ ℝ}: by carefully studying how the probability density and the dependencies of the process evolve in time, a time-evolving Shannon entropy H_t(X) can be defined. The drawback of this approach is that it requires the knowledge of many realizations of the time-evolution of the process, as it relies on having enough statistics over the realizations [15].
Unfortunately, obtaining enough data is very difficult in real-world systems: in the best case scenario a few realizations can be recorded experimentally, and usually only a single realization is accessible. In this paper, we develop a methodology that can be applied to a single realization, in order to analyze a non-stationary signal with stationary centered increments. We describe a time-averaged framework that gathers all available data points in a time window representing a single realization, whether it is the full experimental time duration or just a fraction of it [13].
The present paper is organized as follows. In Section 2, we present the general framework of information theory for a non-stationary signal, and our new framework that exploits time averages. We then put a particular emphasis on self-similar processes. In Section 3, we report a benchmarking of our framework in the special case of Gaussian self-similar signals, a model situation where analytical results can be obtained. In Section 4, we explore the case of non-Gaussian self-similar processes. Finally, in Section 5, we drop the hypothesis of self-similarity and apply our framework to a multifractal process.

2. Information Theory for Non-Stationary Processes

2.1. Non-Stationary Processes with Stationary Increments

In this article, we consider non-stationary processes with stationary increments. Such a process can be written as a motion M = {m_t}_{t ∈ ℝ} obtained by integrating a stationary noise W = {w_t}_{t ∈ ℝ}:
m_t = m_0 + \int_0^t w_{t'} \, dt' ,   (1)
where m 0 and w 0 are the values at time t = 0 , both of which can be set to 0 without loss of generality.
Nowadays, signals are recorded and stored on digital media, which amounts to considering in practice a set of data sampled at discrete times t_k with k ∈ ℕ^+. We further assume that the signals are equi-sampled, i.e., dt is constant, and we choose dt = 1. So we consider in this article discrete-time processes, and we express them as motions M = {m_t}_{t ∈ ℕ^+} obtained by integrating a stationary noise W = {w_t}_{t ∈ ℕ^+} according to:
m_t = m_0 + \sum_{k=1}^{t} w_k , \qquad t > 0 ,   (2)
where again m 0 = w 0 = 0 . Equation (2) can also be replaced by
m_t = m_{t-1} + w_t , \qquad t > 0 .   (3)
If the noise W is not centered, i.e., has a statistical mean E(W) = β ≠ 0, we introduce the centered noise w'_t = w_t − β. The equations for the motion M then read:
m_t = m_0 + \beta t + \sum_{k=1}^{t} w'_k , \qquad t > 0 ,   (4)
    = m_{t-1} + \beta + w'_t , \qquad t > 0 .   (5)
The process M can be interpreted as a motion built on the stationary centered noise W' together with an additive deterministic drift, which is the linear trend βt.
In this article, we study motions without trend, so we impose that the noise W is centered, i.e., that its statistical mean E(W) = β = 0. Besides the simple centering of the increments, W → W − E(W), any detrending method can be applied to M, e.g., using moving averages. As a consequence, the motion M is centered: its statistical mean is E{m_t} = m_0 = 0 at all times t > 0. Nevertheless, its variance, and all its higher order moments, may depend on time: the motion M is a non-stationary process with stationary increments. Typical examples of such processes are the Brownian motion and the fractional Brownian motion [17], both of which have a variance that evolves with time.
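As a simple illustration of this construction, the following minimal Python sketch builds a motion from a centered stationary noise by cumulative summation, as in Equations (2) and (3). The white Gaussian choice for the noise is only an example (it yields an ordinary Brownian motion), and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 2**16
w = rng.standard_normal(T)                   # a stationary noise (here white and Gaussian)
w = w - w.mean()                             # center the increments: no drift, no linear trend
m = np.concatenate(([0.0], np.cumsum(w)))    # m_0 = 0 and m_t = m_{t-1} + w_t

# m is non-stationary (its variance grows with t), but its increments
# m_t - m_{t-1} = w_t are stationary and centered by construction.
```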

2.2. General Framework

For a generic non-stationary process X = {x_t}_{t ∈ ℝ}, the probability density function (PDF) p_{x_t}(x_t) changes with time. The information theory framework can be applied to each random variable x_t, i.e., at each time t. To do so, the PDF of x_t needs to be estimated at each time t, which in practice requires many realizations to be available [15].
To analyze the temporal dynamics of a random process at a given time t, we consider the m-dimensional vector obtained with the Takens time-embedding procedure [18]:
x_t^{(m,\tau)} = \left( x_t, x_{t-\tau}, \ldots, x_{t-(m-1)\tau} \right) .   (6)
The embedding dimension m controls the order of the statistics that are considered, and the delay τ defines a time scale. We define below some information theory quantities that are functionals of the m-point joint distribution p_{x_t^{(m,τ)}}(x_t^{(m,τ)}), in order to characterize linear and non-linear temporal dynamics.
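As an illustration, a minimal Python sketch of this time-embedding is given below; the function name and conventions are ours, not part of any published toolbox.

```python
import numpy as np

def takens_embedding(x, m, tau):
    """Return all embedded vectors x_t^(m,tau) = (x_t, x_{t-tau}, ..., x_{t-(m-1)tau}).

    One row per admissible time t, ordered by increasing t."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau          # number of admissible times t
    if n <= 0:
        raise ValueError("time series too short for this (m, tau)")
    # column j holds x_{t - j*tau}, for j = 0, ..., m-1
    return np.column_stack([x[(m - 1 - j) * tau : (m - 1 - j) * tau + n]
                            for j in range(m)])
```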

2.2.1. Shannon Entropy

The entropy of x t ( m , τ ) is:
H(x_t^{(m,\tau)}) = - \int_{\mathbb{R}^m} p_{x_t^{(m,\tau)}}(x) \, \log\!\left( p_{x_t^{(m,\tau)}}(x) \right) dx .   (7)
This quantity depends on time t, as well as on embedding parameter m and delay τ . We further note it H t ( m , τ ) ( X ) = H ( x t ( m , τ ) ) , where the index t indicates the time and the parameters ( m , τ ) are indicated as upper indices. It measures the amount of information characterizing the m-dimensional PDF of the process X at time t sampled at scale τ . When m = 1 , the entropy does not depend on τ and does not probe the dynamics of the process; we then note it H t ( X ) , dropping the ( m = 1 , τ ) upper indices. However, for embedding dimension m > 1 the entropy depends on the linear and non-linear dynamics of the process. Indeed, the entropy involves arbitrarily high order moments of the joint PDF p x t ( m , τ ) ( x t ( m , τ ) ) . As usual, the entropy does not depend on the first moment of the distribution.
Using the time-increments of size τ, δ_τ x_t ≡ x_t − x_{t−τ}, it can be shown (see Appendix A) that the amount of information measured by H_t^{(m,τ)}(X) is the same as the amount of information in the vector x̃_t^{(m,τ)} ≡ (x_t, δ_τ x_t, δ_τ x_{t−τ}, …, δ_τ x_{t−(m−2)τ}), i.e.,
H_t^{(m,\tau)}(X) = H(\tilde{x}_t^{(m,\tau)}) = H\!\left( x_t, \delta_\tau x_t, \delta_\tau x_{t-\tau}, \ldots, \delta_\tau x_{t-(m-2)\tau} \right) .   (8)
For processes with stationary increments, the marginal distribution of x_t may be strongly time-dependent, but the marginal distribution of any increment is time-independent. Equation (8) thus suggests that the time-dependence of H_t^{(m,τ)}(X) originates mainly from x_t, the first component of the rewritten embedded vector x̃_t^{(m,τ)}. Nevertheless, it should be observed that although the m − 1 increments, considered by themselves, have a stationary dependence structure, the covariance of x_t with any of the increments is a priori non-stationary.

2.2.2. Mutual Information and Auto-Mutual Information

The mutual information MI measures the amount of information shared by two processes. For two non-stationary time-embedded vectors x_{t_1}^{(m,τ)} and y_{t_2}^{(n,τ)}, it is defined as:
MI\!\left( x_{t_1}^{(m,\tau)}, y_{t_2}^{(n,\tau)} \right) = H_{t_1}(x_{t_1}^{(m,\tau)}) + H_{t_2}(y_{t_2}^{(n,\tau)}) - H\!\left( x_{t_1}^{(m,\tau)}, y_{t_2}^{(n,\tau)} \right) .   (9)
In the following, we use auto-mutual information I t ( m , n , τ ) ( X ) to measure, for a single process X, the shared information between two successive time-embedded vectors of dimension m and n [19]:
I_t^{(m,n,\tau)}(X) = MI\!\left( x_t^{(n,\tau)}, x_{t-n\tau}^{(m,\tau)} \right) .   (10)
Auto-mutual information defined in (10) probes the dynamics of the process X t at time t by measuring the dependencies between two consecutive chunks of m and n points sampled every τ .

2.2.3. Entropy Rate

The entropy rate, or entropy gain [20], of order m at time t measures the increase of the Shannon entropy when the embedding dimension is increased from m to m + 1. It is defined as the variation of the Shannon entropy between x_{t−τ}^{(m,τ)} and x_t^{(m+1,τ)}, two successive time-embedded versions of the process X:
h_t^{(m,\tau)}(X) = H_t^{(m+1,\tau)}(X) - H_{t-\tau}^{(m,\tau)}(X)   (11)
               = H_t(X) - I_t^{(m,1,\tau)}(X) .   (12)
Within the general framework, the entropy, mutual information, and entropy rate are well defined at any time t for a non-stationary process. Although this framework can formally be used to analyze non-stationary processes at any time t, in practice it is often impossible to assess statistics at a fixed time t, as the number of available realizations from real-world datasets may be very small. To overcome this issue, we propose in the next section another framework that considers averages over a finite and possibly large time window, which represents for example the duration of an experimental measurement.

2.3. Practical Time-Averaged Framework

We now focus on non-stationary processes with stationary increments. We develop in this section a pragmatic approach which can be applied when a single time trace of a non-stationary signal is available.
We first present a very formal perspective that defines a time-averaged PDF of a non-stationary process. We then propose a practical approach which uses a very simple estimation of such a time-average PDF. We finally use this practical approach to define all the information quantities that we are interested in.

2.3.1. Time-Averaged Framework

Using a formal perspective, we consider the global statistics of the dataset, when forgetting its time dynamics, and we formally consider the time-averaged probability density function in the time window [ t 0 , t 0 + T ] :
\bar{p}_{T,t_0,x}^{\,(m,\tau)}(x) = \frac{1}{T} \int_{t_0}^{t_0+T} p_{x_t^{(m,\tau)}}(x) \, dt .   (13)
Because of the time-average, this probability density function does not depend on a single time t but on the starting time t 0 and the duration T of the time window.
In the case of a stationary process, the PDF p x t ( m , τ ) ( x ) is independent of t, so the PDF p ¯ T , t 0 , x ( m , τ ) ( x ) is independent of t 0 and T.
In the case of a non-stationary process with stationary centered increments, the dependence on t 0 only appears on the mean of the time-averaged PDF p ¯ T , t 0 , x ( m , τ ) ( x ) . As a consequence, since the Shannon entropy does not depend on the mean, none of the information theoretic quantities depends on t 0 .
In the case of a non-stationary process with stationary but non-centered increments, there is a drift: the first moment of p x t ( m , τ ) ( x ) evolves linearly with time. When integrated in time in Equation (13), this induces a deformation of the time-averaged PDF p ¯ T , t 0 , x ( m , τ ) ( x ) , which affects a priori moments of any order. As a consequence, the Shannon entropy is then expected to depend on t 0 .
In the following, we focus on non-stationary processes with stationary centered increments, described in Section 2.1.

2.3.2. Practical Framework

In practice, given a time series of length T, we propose to very roughly approximate the PDF p ¯ T , t 0 , x ( m , τ ) defined in (13) with the normalized histogram p ¯ ^ T , t 0 of all data points x t ( m , τ ) , t [ t 0 , t 0 + T ] available in the time window. This is a very strong assumption, as p ¯ T , t 0 , x ( m , τ ) is a priori very different from any p x t ( m , τ ) , and a priori very different from the histogram p ¯ ^ T , t 0 constructed after cumulating all the available data in the interval. This pragmatic approach comes down to treating the set of available data points in the time interval exactly in the same way as if it was a set of data points originating from a stationary, albeit unknown, process and then estimate its PDF.
In the following, we drop the hat in the notations, and consider only the ersatz probabilities p ¯ ^ in place of the time-averaged probabilities p ¯ . As we discuss later in Section 6, if several experimental realizations are available, it is of course possible to use them to enhance the estimation of the time-averaged PDF.
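In practice, this amounts to pooling all the embedded vectors found in the window into a single sample, as in the following sketch (which reuses the takens_embedding helper sketched in Section 2.2); the ersatz PDF is then simply the empirical distribution of this pooled sample.

```python
import numpy as np

def pooled_sample(x, t0, T, m, tau):
    """All embedded vectors x_t^(m,tau) with t in [t0, t0+T], from one realization,
    treated as if they were drawn from a single stationary process."""
    window = np.asarray(x, dtype=float)[t0 : t0 + T]
    return takens_embedding(window, m, tau)    # shape: (T - (m-1)*tau, m)
```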

2.3.3. Information Theory Quantities in the Practical Framework

Given a time series of length T, and considering the ersatz PDFs p ¯ T , t 0 , x ( m , τ ) , we define H ¯ T ( m , τ ) ( X ) the time-averaged Shannon entropy, I ¯ T ( m , n , τ ) ( X ) the time-averaged auto-mutual information, and h ¯ T ( m , τ ) ( X ) the time-averaged entropy rate, as described below.

Ersatz Shannon Entropy

We define the ersatz entropy of the time-embedded signal as the entropy of the time-averaged PDF p ¯ T , t 0 , x ( m , τ ) :
\bar{H}_T^{(m,\tau)}(X) = - \int_{\mathbb{R}^m} \bar{p}_{T,t_0,x}^{\,(m,\tau)}(x) \, \log\!\left( \bar{p}_{T,t_0,x}^{\,(m,\tau)}(x) \right) dx .   (14)
H ¯ T ( m , τ ) ( X ) gives the amount of information of the set of values of the signal x t ( m , τ ) in the time interval [ t 0 , t 0 + T ] and hence it can be interpreted as the total information characterizing the temporal trajectory { x t ( m , τ ) , t [ t 0 , t 0 + T ] } of the process. If the process has stationary centered increments, the total amount of information in the trajectory depends only on its length T, and not on its starting time t 0 . In that sense, the ersatz entropy H ¯ T ( m , τ ) is not stationary.
Using the rewriting (8), we argue that this dependence on T originates from x_t—the first component of the vector x̃_t^{(m,τ)}—which has a time-dependent marginal distribution. Because the m − 1 other components of x̃_t^{(m,τ)} are increments, they have by hypothesis a stationary dependence structure. So increasing the embedding dimension does not impact the dependence of the ersatz entropy on the window size T, but only its dependence on the increment size τ.

Auto-Mutual Information

We define the ersatz auto-mutual information as:
\bar{I}_T^{(m,n,\tau)}(X) = \bar{H}_T^{(m,\tau)}(X) + \bar{H}_T^{(n,\tau)}(X) - \bar{H}_T^{(m+n,\tau)}(X) .   (15)

Entropy Rate

We define the ersatz entropy rate over a time interval of size T as:
\bar{h}_T^{(m,\tau)}(X) = \bar{H}_T^{(m+1,\tau)}(X) - \bar{H}_T^{(m,\tau)}(X)   (16)
                      = \bar{H}_T(X) - \bar{I}_T^{(m,1,\tau)}(X) .   (17)
From (16), we may expect a cancellation of the main dependence on T, which is the same for H̄_T^{(m+1,τ)} and for H̄_T^{(m,τ)}. As a consequence, the ersatz entropy rate h̄_T^{(m,τ)} should be stationary, in the sense that it should not depend on the length T of the time interval that is considered.
If the available samples span a very large time window, one may consider using multiple non-overlapping time windows of size T starting at various times. Because the increments are stationary and have zero mean, and hence the ersatz quantities do not depend on t_0, it is possible to average the different estimations of the ersatz quantities obtained in each window. It is also possible to use all the non-overlapping windows to populate the histogram and thus enhance the estimation of the time-averaged PDF. Each of these two operations increases the statistics and hence improves the estimation. A sketch of the first option is given below.
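The following minimal sketch shows the estimation of the ersatz entropy rate of Equation (16) and its averaging over non-overlapping windows. It assumes a generic entropy estimator entropy_knn(sample, k) (for instance, the nearest-neighbors estimator sketched in Section 3.1.2) and the takens_embedding helper of Section 2.2; all names are illustrative.

```python
import numpy as np

def ersatz_entropy_rate(x, m, tau, k=5):
    """Ersatz entropy rate of Equation (16), estimated on one window of data."""
    H_m1 = entropy_knn(takens_embedding(x, m + 1, tau), k)
    H_m  = entropy_knn(takens_embedding(x, m, tau), k)
    return H_m1 - H_m

def windowed_entropy_rate(x, T, m, tau, k=5):
    """Average of the ersatz entropy rate over non-overlapping windows of size T."""
    x = np.asarray(x, dtype=float)
    starts = range(0, len(x) - T + 1, T)
    return np.mean([ersatz_entropy_rate(x[s:s + T], m, tau, k) for s in starts])
```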

2.4. Self-Similar Processes

In this section, we focus on the special case of self-similar processes, i.e., signals which exhibit monofractal scale invariance [21]. Such processes have been used as a satisfying first approximation to model or describe a wide variety of phenomena, such as ionic transport [22], fluid turbulence [23], climate [24], river flows [25], cloud structure [26], or earthquakes [27], as well as neural signals [28], stock markets [29,30], texture patterns [31], or internet traffic [32]. A process X_t is monofractal scale-invariant if there exists a real number H such that, for all a ∈ ℝ+*, the probability density functions of x_{at} and a^H x_t are equivalent. H is called the Hurst exponent. If H < 0, the process is stationary and called a fractional noise. If 0 ≤ H < 1, the process is non-stationary with stationary increments. The case H = 1/2 corresponds to the traditional Brownian motion.
Assuming x_{t=0} = 0, the scale invariance property can be expressed as [33]:
p_{x_{at}}(x) = \frac{1}{a^{H}} \, p_{x_t}\!\left( \frac{x}{a^{H}} \right) .   (18)
The scale invariance property of a process X_t transfers to its increments, as well as to any of its time-embedded versions:
p_{x_{at}^{(m,a\tau)}}(x) = \frac{1}{a^{mH}} \, p_{x_t^{(m,\tau)}}\!\left( \frac{x}{a^{H}} \right) .   (19)
This relation allows us to express the non-stationary PDF of x_t^{(m,τ)} at any time t as a function of the PDF at unit-time (t = 1). This is done by using the factor a = 1/t in Equation (19), i.e., by rescaling each coordinate of the embedded vector by the factor t^H.
Using Equation (13), it is straightforward to see that the scale invariant property of the form (19) is also valid for the time-averaged PDF p ¯ T , t 0 , x ( m , τ ) ( x ) .
Because of its definition (2) as a cumulative sum of a noise, a motion can be seen as accumulating the correlations between successive points of the noise. When performing a time-embedding, the particular case τ = 1 is interesting: considering the relation (8), we may expect that the information contained in the time-embedded motion m_t^{(m,τ=1)} is closely related to the information contained in the time-embedded noise w_t^{(m,τ=1)}. This is no longer the case when τ ≥ 2.
Fractional Brownian Motion
The fractional Brownian motion (fBm) was proposed by Mandelbrot and Van Ness [17] and quickly became a benchmark for self-similarity and long-range dependence. The fBm is the only Gaussian self-similar process with stationary increments. It is characterized by its Hurst exponent, H .
The fBm is a motion, obtained by integrating according to (2) a fractional Gaussian noise (fGn), defined as a centered Gaussian process with the correlation structure
c_{\mathrm{fGn}}(\tau) = \frac{\sigma_1^2}{2} \left[ (\tau - 1)^{2H} - 2\tau^{2H} + (\tau + 1)^{2H} \right] .   (20)
The fGn is a stationary noise with standard deviation σ_1. It is scale-invariant with exponent H − 1.
The non-stationary covariance structure of the fBm B reads
\mathbb{E}\{ B_t B_{t-\tau} \} = \frac{\sigma_1^2}{2} \left[ t^{2H} + (t-\tau)^{2H} - \tau^{2H} \right] ,   (21)
where τ < t .
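For reference, the two covariance structures above can be written as the following small Python helpers (our own naming, with σ_1 = 1 by default):

```python
import numpy as np

def c_fgn(tau, H, sigma1=1.0):
    """Stationary covariance of the fGn, Equation (20), for integer lags tau."""
    tau = np.abs(np.asarray(tau, dtype=float))
    return 0.5 * sigma1**2 * (np.abs(tau - 1)**(2*H) - 2*tau**(2*H) + (tau + 1)**(2*H))

def cov_fbm(t, s, H, sigma1=1.0):
    """Non-stationary covariance E{B_t B_s} of the fBm, Equation (21) with s = t - tau."""
    return 0.5 * sigma1**2 * (t**(2*H) + s**(2*H) - np.abs(t - s)**(2*H))
```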

2.4.1. General Framework

We show below how the theoretical information quantities depend on time t and delay τ. We start from the relation (8) between the entropy of the time-embedded vector and the entropy of the increments, and we normalize each component of the vector x̃_t^{(m,τ)} by its standard deviation. The standard deviation σ_t of the motion x_t evolves with time as σ_t = σ_1 t^H, while the standard deviation σ_τ of the increments x_t − x_{t−τ} is independent of t, thanks to the stationarity of the increments, and evolves with the size of the increment as σ_τ = σ_1 τ^H. So we have:
H_t^{(m,\tau)}(X) = H\!\left( \frac{x_t}{\sigma_t}, \frac{\delta_\tau x_t}{\sigma_\tau}, \ldots, \frac{\delta_\tau x_{t-(m-2)\tau}}{\sigma_\tau} \right) + \ln \sigma_t + (m-1) \ln \sigma_\tau   (22)
                 = H\!\left( \frac{x_t}{t^{H}}, \frac{\delta_\tau x_t}{\tau^{H}}, \ldots, \frac{\delta_\tau x_{t-(m-2)\tau}}{\tau^{H}} \right) + H \ln t + (m-1) H \ln \tau + m \ln \sigma_1 .   (23)
We then use the scaling law (19) for a = 1 / t to relate the joint probability at a given time t to the joint probability at unit-time t = 1 , which leads to:
H\!\left( \frac{x_t}{t^{H}}, \frac{\delta_\tau x_t}{\tau^{H}}, \ldots, \frac{\delta_\tau x_{t-(m-2)\tau}}{\tau^{H}} \right) = H\!\left( x_1, \delta_{\tau/t} x_1, \ldots, \delta_{\tau/t} x_{1-(m-2)\tau/t} \right) - m \ln \sigma_1   (24)
 = H\!\left( \tilde{x}_{t=1}^{(m,\tau/t)} \right) - m \ln \sigma_1 .   (25)
Using (8) again at time t = 1, we have H(x̃_{t=1}^{(m,τ/t)}) = H_{t=1}^{(m,τ/t)}(X), so we can express the time-dependent Shannon entropy (7) for self-similar processes as:
H_t^{(m,\tau)}(X) = H_{t=1}^{(m,\tau/t)}(X) + H \ln t + (m-1) H \ln \tau .   (26)
The entropy rate can be rewritten with (11) and (26) as:
h_t^{(m,\tau)}(X) = h_{t=1}^{(m,\tau/t)}(X) + H \ln \tau ,   (27)
where h 1 ( m , τ / t ) ( X ) is the entropy rate at time t = 1 , using the rescaled time delay τ / t .
Although the two quantities H_{t=1}^{(m,τ/t)}(X) and h_{t=1}^{(m,τ/t)}(X) are considered at a fixed time t = 1, they still depend on t via the delay τ/t. Because τ/t is small as soon as t ≫ τ, we expect the dependence of the entropy H_t^{(m,τ)}(X) on time t to lie mainly in the term H ln t, and the entropy rate to be almost time-independent.
Fractional Brownian Motion
The PDF p_{B_t} of the fBm is Gaussian at any time t, so we can express its Shannon entropy and entropy rate at time t by using Equation (26) and the expression of the Shannon entropy of a multivariate Gaussian process [34]. We obtain the following approximate expressions:
H_t^{(m,\tau)}(B) \simeq m H_1^{\mathrm{FBM}} + H \ln t + (m-1) H \ln \tau ,   (28)
h_t^{(m,\tau)}(B) \simeq H_1^{\mathrm{FBM}} + H \ln \tau ,   (29)
where H_1^{\mathrm{FBM}} \equiv \frac{1}{2} \ln( 2\pi e \sigma_1^2 ) is the entropy of the fBm at unit-time. These formulae are exact for m = 1, 2, but for m ≥ 3, constant terms as well as corrections in τ/t have been omitted for clarity.
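As a check, the m = 1 case of Equation (28) follows directly from the entropy of a one-dimensional Gaussian variable of variance σ_1² t^{2H} (Equation (21) with τ = 0):

H_t^{(1,\tau)}(B) \;=\; \frac{1}{2}\ln\!\left(2\pi e\,\sigma_1^2\,t^{2H}\right) \;=\; \underbrace{\frac{1}{2}\ln\!\left(2\pi e\,\sigma_1^2\right)}_{H_1^{\mathrm{FBM}}} \;+\; H\ln t ,

which is Equation (28) for m = 1, the term (m − 1) H ln τ vanishing.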

2.4.2. Practical Time-Averaged Framework

For a generic self-similar process, we are not able to derive analytical results in the practical time-averaged framework. Nevertheless, the behaviors expected for a generic non-stationary process with stationary increments hold: (i) the ersatz entropy H̄_T^{(m,τ)} is not stationary, in the sense that it depends on the length T of the time interval; (ii) the ersatz entropy rate h̄_T^{(m,τ)} is stationary.
Fractional Brownian Motion
The ersatz entropy of the fBm over a time window of size T ≫ τ can be expressed by averaging its covariance structure over a time window of size T. We obtain [35]:
\bar{H}_T^{(1,\tau)}(B) = H_1^{\mathrm{FBM}} + H \ln T .   (30)
The entropy of the fBm thus increases linearly with the logarithm of the window size T: the larger the time window, the more information there is in the trajectory.
The auto-mutual information of the fBm can be derived in the same way, using (15), for T ≫ τ:
\bar{I}_T^{(1,1,\tau)}(B) = - H \ln \frac{\tau}{T} + C\!\left( \frac{\tau}{T} \right) ,   (31)
where C(τ/T) is a correction in τ/T that reads
C\!\left( \frac{\tau}{T} \right) = - \frac{1}{2} \ln \frac{ \left( \frac{\tau}{2T} + 1 \right)^{2H+1} }{ \left( \frac{\tau}{T} + 1 \right)^{2H+1} }   (32)
 = \frac{2H+1}{4} \, \frac{\tau}{T} + O\!\left( \left( \frac{\tau}{T} \right)^{2} \right) .   (33)
The ersatz mutual information depends logarithmically on the scale τ and the window size T. The larger the window-size T or the smaller the scale τ , the stronger the dependencies.
The ersatz entropy rate of order m = 1 is obtained by combining (30) and (31) according to (17):
\bar{h}_T^{(1,\tau)}(B) = H_1^{\mathrm{FBM}} + H \ln \tau - C\!\left( \frac{\tau}{T} \right) ,   (34)
which is independent of T up to corrections in τ / T , while being linear in ln ( τ ) with a constant slope H . The correction C ( τ / T ) in Equation (34) is positive, see Equation (33).
Comparing (28) with (30) shows that for the fBm, the dependence of the ersatz entropy on T is exactly the same as the dependence of the entropy on t. Comparing (29) with (34) shows that the entropy rate and the ersatz entropy rate do not depend on t or T, up to corrective terms that are negligible if the scale τ is not too large. We also see explicitly that both quantities evolve with the scale τ as H ln τ, again up to corrections of order τ/t and τ/T.
The example of the fBm suggests that for a scale-invariant process, the evolution of any information theory quantity with the scale τ is the same within the practical time-averaged framework as within the general framework. We push this analysis further in the next sections, by exploring whether this property holds when the process is non-Gaussian.

3. Benchmarking the Practical Framework with the fBm

We focus in this section on the fractional Brownian motion, for which analytical expressions were derived in the previous sections. We use the fBm not only to benchmark our estimators of information theory quantities, but also to illustrate the use of the practical framework and the expected behavior of the ersatz quantities when used on a self-similar process of Hurst exponent H .

3.1. Characterization of the Estimates

3.1.1. Data

To obtain an fBm, we integrate a fractional Gaussian noise (fGn). We use the circulant matrix method [36] to impose the correlation structure (20) of the fGn. Then, we center and normalize the noise such that its standard deviation, σ_fGn, is equal to one. We then take the cumulative sum to obtain the fBm. Throughout this article, H = 0.7 for all the processes used to illustrate our results, but we have checked that our conclusions hold for any other value 0 < H < 1. A sketch of this synthesis is given below.
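The following minimal Python sketch illustrates this synthesis; it uses the standard circulant embedding recipe (Fourier diagonalization of the covariance) and is only an illustration of the procedure under our assumptions, not the exact implementation of [36].

```python
import numpy as np

def synth_fbm(N, H, seed=0):
    """Synthesize one fGn of length N with covariance (20) (sigma_1 = 1) by
    circulant embedding, then integrate it into an fBm."""
    rng = np.random.default_rng(seed)
    k = np.arange(N + 1, dtype=float)
    r = 0.5 * ((k + 1)**(2*H) - 2*k**(2*H) + np.abs(k - 1)**(2*H))
    row = np.concatenate([r, r[-2:0:-1]])             # first row of the circulant matrix (size 2N)
    lam = np.clip(np.fft.fft(row).real, 0.0, None)    # eigenvalues (clip tiny negative round-off)
    M = len(row)
    z = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    fgn = np.fft.fft(np.sqrt(lam / M) * z)[:N].real
    fgn = (fgn - fgn.mean()) / fgn.std()              # center and normalize, as in the text
    return fgn, np.cumsum(fgn)                        # the noise and the motion (fBm)

fgn, fbm = synth_fbm(2**16, H=0.7)
```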

3.1.2. Procedure

We estimate the Shannon entropy H̄_T^{(m,τ)} with our own implementation of the k-nearest-neighbors estimator of Kozachenko and Leonenko [37]. We estimate the auto-mutual information Ī_T^{(m,p,τ)} with the algorithm provided by Kraskov, Stögbauer, and Grassberger [38]. This estimator is also based on a nearest-neighbors search and provides—amongst several good properties—a built-in cancellation of the bias difference originating from each of the two arguments. In the following, we denote by k the number of neighbors, which is the only parameter of the estimators. The entropy rate h̄_T^{(m,τ)} is then computed using Equation (17).
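A minimal sketch of such a Kozachenko–Leonenko estimator is given below (natural logarithm, hence entropies in nats, and max-norm distances); it illustrates the principle but is not our exact implementation.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def entropy_knn(sample, k=5):
    """Kozachenko-Leonenko entropy estimate from a sample of shape (n_points, dim)."""
    x = np.asarray(sample, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    tree = cKDTree(x)
    # distance from each point to its k-th neighbor (excluding itself), max-norm
    eps = tree.query(x, k=k + 1, p=np.inf)[0][:, -1]
    # the volume of the unit ball in the max-norm is 2^d, hence the d*log(2) term
    return digamma(n) - digamma(k) + d * np.log(2.0) + d * np.mean(np.log(eps))
```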
We generate for each motion a set of 100 independent realizations of fixed size T with a Hurst exponent H = 0.7 . We compute averages of the estimates on the realizations and use the standard deviation as error bars in the different graphs.
In Section 3.1.3 and Section 3.1.4, we characterize respectively the bias and standard deviation (std) over realizations of our estimators of entropy, auto-mutual information and entropy rate.

3.1.3. Convergence/Bias

We detail here how the ersatz entropy rate evolves with T and k. We report in Figure 1a our results for all possible values of the couples (log_2(T), k) ∈ [9, …, 17] × [4, …, 18], while τ is set to 1 here. According to Equation (34), the ersatz entropy rate of the fBm converges for large T to the value H_1^{fBm} (horizontal black line in Figure 1a), thanks to the vanishing of the correction term C(τ/T), according to (33).
Figure 1a can be interpreted as describing the behavior of the bias of the estimator. This bias vanishes non-monotonically as a function of k/T^{1/(m+1)}: when k/T^{1/(m+1)} is reduced, the bias is first positive, then decreases towards negative values, and finally converges to zero. This behavior was previously reported for the k-nn mutual information estimator applied to stationary processes [16,38,39], and we confirm that it holds for the fBm.
We observed the same convergence for a large range of scales τ > 1: the ersatz entropy rate then converges to H_1^{fBm} + H ln τ for large T, with the same behavior of the bias.

3.1.4. Standard Deviation of the Estimates

We present in Figure 2a the evolution with T of the standard deviation of the ersatz entropy, mutual information, and entropy rate, for τ = 1. The standard deviation of both the entropy and the mutual information is large, and does not decrease when T—and hence the number of samples—increases. On the contrary, the standard deviation of the entropy rate is much smaller and decreases when T increases. We attribute this feature to the dependence of the quantities on the observation time T, see Equations (30) and (31) for the fBm. While H̄_T and Ī_T increase as ln T, this is not the case for h̄_T, which is independent of T (up to small corrections, negligible for small τ). Although it is difficult to explain why the standard deviations of the entropy and of the mutual information remain constant when T increases, this seems to result from a balance between the non-stationarity (in ln T) and the increased statistics. On the contrary, for the entropy rate, which is stationary, the decrease of the std is as expected.
As a conclusion, both the bias and the standard deviation of the ersatz entropy rate increase when k increases or T decreases and can be made arbitrarily small by increasing the window size T. In the remainder of this article, we choose k = 5 and when studying the behavior of information theoretic quantities on the scale τ , we set T = 2 16 .

3.2. Dependence on Times T and τ

In this section, we present a detailed numerical study of the ersatz entropy, auto-mutual information, and entropy rate of the fBm with H = 0.7. In particular, we present a quantitative comparison with the analytical expressions (28) and (29) in the general framework, as well as with the analytical expressions (30), (31) and (34) in the practical framework for the fBm. These comparisons allow us, first, to validate the analytical expressions obtained for the fBm in the practical framework and, second, to show that the information theoretic quantities in the practical framework evolve with T and τ exactly as their counterparts in the general framework evolve with t and τ. To compare analytical and numerical results, we vary the window size T, the scale τ, and the embedding dimension m.

3.2.1. Entropy and Auto-Mutual Information

Dependence on T

The left column of Figure 3 shows the ersatz Shannon entropy H̄_T^{(m,τ)} (Figure 3a) and auto-mutual information Ī_T^{(m,1,τ)} (Figure 3c) at a given scale τ = 1, as functions of ln T. The evolution of these two quantities for m = 1 is very close to H ln T, which is represented by a continuous black line. This is in agreement with Equations (30) and (31). For m > 1, we obtain in the practical framework the behaviors predicted in the general framework, replacing t by T in the equations. We observe that the auto-mutual information does not depend on the embedding dimension m, while the entropy does, with an offset that seems to depend linearly on m. The dependence of the entropy and the auto-mutual information on the time window T is the signature of the non-stationarity of the signal.

Dependence on τ

The right column of Figure 3 shows the ersatz Shannon entropy and auto-mutual information for a fixed window size T = 2^16 when varying the scale parameter τ. The ersatz Shannon entropy behaves as (m − 1) H ln(τ), see Figure 3b, in agreement with Equation (26) or Equation (28). The ersatz auto-mutual information behaves as −H ln(τ) for any embedding m, see Figure 3d, in agreement with Equation (31), thus suggesting this formula is valid for any embedding dimension.

3.2.2. Stationarity of the Entropy Rate

Figure 4a shows that the ersatz entropy rate h̄_T^{(m=1,τ)} with embedding dimension m = 1 is almost constant when T is varied. For embedding dimensions m > 1, there is a small variation, of about 15%, much smaller than the 200% variation observed for either the entropy or the auto-mutual information (Figure 3a,c) on the same range of T. This small dependence on T can be due to the correction in Equation (34), which may depend on m. We argue that it is mostly due to the bias, which increases with the embedding dimension. Indeed, we observe that the entropy rate seems to converge for larger T to the same value close to H_1^{fBm} for all m. As a larger T corresponds to a larger sampling of the statistics, the bias is reduced, as reported in Figure 1. Moreover, for m = 1, Equation (34) predicts a positive correction that vanishes when T is large. On the contrary, we observe a convergence to a value lower than H_1^{fBm}, which hints that the bias is negative and larger than the theoretical correction. This suggests that the form of Equation (34) remains valid for embedding dimensions m > 1.

3.2.3. Entropy Rate Dependence on Scale τ

Figure 4b shows that for a fixed window size T = 2^16, the ersatz entropy rate is proportional to H ln(τ). We have added a black line defined by the linear function H_1^{FBM} + H ln τ, as suggested by Equation (34) without the corrective term. This black line perfectly describes the evolution of the entropy rate with the scale τ, which is independent of the embedding dimension m.
To observe the finer evolution of the entropy rate with the scale τ, we subtract the main contribution H ln τ from the entropy rate and we plot h̄_T^{(m,τ)} − H ln τ for different embedding dimensions in Figure 5. We observe a slight increase, which is larger for larger embedding dimensions. For m = 1, the correction term can be evaluated from Equation (32): it is at most 2·10⁻³ and does not account for the evolution reported here. This evolution is probably due to the bias, which increases when the number of points—proportional to T/τ—decreases and when the embedding dimension m increases.
For a scale-invariant self-similar process, the standard deviation σ_τ of the increments of size τ behaves as σ_τ = σ_1 τ^H. Subtracting H ln τ amounts to subtracting ln σ_τ: for each scale τ, this corresponds to normalizing the down-sampled data (taking one point every τ points) by the standard deviation σ_τ of the increments of size τ. When the Hurst exponent is a priori unknown, σ_τ can be estimated directly from the data and used to compute the main contribution ln(σ_τ), as sketched below. Thus, the fine evolution of the entropy rate with τ can be used as a tool to probe deviations from the self-similarity assumption, which is interesting for multifractal signals.
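A sketch of this normalization (reusing the ersatz_entropy_rate helper sketched in Section 2.3.3) reads:

```python
import numpy as np

def normalized_entropy_rate(x, m, tau, k=5):
    """Ersatz entropy rate minus ln(sigma_tau), with sigma_tau estimated from the data."""
    x = np.asarray(x, dtype=float)
    sigma_tau = np.std(x[tau:] - x[:-tau])     # std of the increments of size tau
    return ersatz_entropy_rate(x, m, tau, k) - np.log(sigma_tau)
```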

4. Application of the Practical Framework to Non-Gaussian Self-Similar Processes

In this section, we turn to non-Gaussian processes and describe the results obtained when the time-averaged framework is applied to this larger class of processes.

4.1. Procedure

We construct two different motions, in the very same way as we did for the fBm. We integrate two log-normal noises synthesized with the same log-normal marginal distribution and with the same correlation function (20) as the fGn, but with different dependence structures. To generate these noises, we use the methodology proposed in [36] and obtain the log-normal marginal by applying two different transformations to the cumulative distribution function F_Z of a Gaussian white noise Z: the Hermitian transformation of rank 1, f_1(z) = F^{-1}(F_Z(z)), and the even-Hermitian transformation of rank 2, f_2(z) = F^{-1}(2(F_Z(|z|) − 1/2)), where F is the cumulative distribution function of the targeted log-normal distribution. This synthesis is performed with the toolbox provided at www.hermir.org. Once the two log-normal noises have been generated, they are integrated using Equation (2) to obtain two non-stationary scale-invariant processes with non-Gaussian statistics.
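The two marginal transformations can be sketched as follows (scipy notation, with an arbitrary log-normal shape parameter s as an assumption for illustration). Note that the full synthesis of [36] also adjusts the correlation of the underlying Gaussian noise so that the transformed noise reaches the target covariance (20), a step not shown here.

```python
import numpy as np
from scipy.stats import norm, lognorm

def hermite_rank1(z, s=0.5):
    """f_1(z) = F^{-1}(F_Z(z)), with F the CDF of the target log-normal distribution."""
    return lognorm.ppf(norm.cdf(z), s)

def hermite_rank2(z, s=0.5):
    """f_2(z) = F^{-1}(2 (F_Z(|z|) - 1/2)): an even transformation of the Gaussian noise."""
    return lognorm.ppf(2.0 * (norm.cdf(np.abs(z)) - 0.5), s)
```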
The dependence structures of the two log-normal noises were previously studied in detail [16]: while the correlation function is the same for the two noises—and identical to the targeted one of the fBm given by (20)—the complete dependence structure was shown to be different.
To study these two non-stationary and non-Gaussian motions, we use again realizations of T = 2 16 points, k = 5 neighbors, and we focus on the case where embedding dimension m = 1 and Hurst exponent H = 0.7 .

4.2. Bias and Standard Deviation

We report in Figure 1b,c the evolution of the ersatz entropy rate of the Hermitian and the even-Hermitian log-normal processes as a function of k/T^{1/2}. We observe exactly the same behavior as for the fBm: the entropy rate converges to H_1^{ln}, the entropy of the log-normal process at unit-time (horizontal blue/red line in Figure 1b,c). If X is a log-normal process of mean μ and standard deviation σ, then the process log X is Gaussian with mean μ' = \log\!\left( \mu^2 / \sqrt{\mu^2 + \sigma^2} \right) and standard deviation σ' = \sqrt{ \log( 1 + \sigma^2/\mu^2 ) }, and the entropy of X can be expressed as H_1^{ln} = \frac{1}{2} \log( 2\pi e \sigma'^2 ) + μ' [16]. Figure 1b,c thus gives an estimation of the bias of our estimator, which appears to be the same as for the fBm.
We report in Figure 2b,c the behavior of the standard deviation of the estimators. Again, exactly as for the fBm, the standard deviation is large for the ersatz entropy and the ersatz auto-mutual information, while it is much smaller for the ersatz entropy rate.
Again, both the bias and the standard deviation of the entropy rate increase when k increases or T decreases and can be made arbitrarily small by increasing T. These results do not depend on the marginal distribution: they have been obtained not only for the fBm with Gaussian statistics, but also for two motions built on log-normal noises.

4.3. Dependence on Times T and τ

The evolution of h̄_T^{(1,τ)} with the time window size T for the two motions is presented in Figure 6a. As was the case for the fBm, h̄_T^{(1,τ)} depends only weakly on T, and seems to converge for larger T to the value H_1^{ln}, up to a small corrective term.
The evolution of h ¯ T ( 1 , τ ) with the time scale τ is presented in Figure 6b. In the same way as for the fBm, we again observe a large increase, almost proportional to ln τ . Because this strong tendency originates from the increase of the standard deviation σ τ of the increments of size τ when τ increases, we again normalize the entropy rate by subtracting ln σ τ = H ln τ . Results are presented in Figure 7, together with results for the fBm with m = 1 for comparison.
The normalized ersatz entropy rate of the motion built from the even-Hermitian log-normal noise appears almost independent of τ. This behavior is identical to the one observed for the fBm, but the remaining constant value is different (H_1^{fBm} or H_1^{ln}). The ersatz entropy rate of the fBm (in black) and of the even-Hermitian motion (in red) both behave exactly as H ln τ, which is the expected behavior for a self-similar process, see Equation (27). On the contrary, the motion built with the Hermitian transformation of rank 1 exhibits an additional variation with τ: the normalized entropy rate h̄_T^{(m,τ)} − ln(σ_τ) evolves from the value H_1^{ln} at τ = 1 (expected for a motion built with a log-normal noise, and obtained for the even-Hermitian process at any τ) up to the value H_1^{fBm} (expected for a Gaussian process, and obtained for the fBm at any τ).
As a conclusion, one can estimate the Hurst exponent of a perfectly self-similar process as the slope of a linear fit in ln τ of the ersatz entropy rate. This is a valid approach for the fBm and for the motion built from the noise constructed with the even-Hermitian transformation, because the ersatz entropy rate then behaves linearly in ln τ. On the contrary, the motion built using the Hermitian transformation of rank 1 does not appear to be perfectly self-similar. This can indeed be verified by plotting the normalized PDFs (setting the standard deviation to unity) of the increments m_t − m_{t−τ} of the motions for various values of τ. As can be seen in Figure 8, the PDFs of the increments of the “standard log-normal motion” vary with the scale τ, while those of the “even-Hermitian motion” remain identical. For τ = 1, the increments are nothing but the log-normal noises, which are log-normal, as prescribed. For large τ, the increments of the “even-Hermitian motion” remain log-normal, while the increments of the “standard log-normal motion” deform and seem to become more Gaussian. The ersatz entropy rate captures this fine evolution perfectly.

5. Application of the Practical Framework to a Multifractal Process

We now explore the proposed time-averaged framework on the multifractal random walk, to illustrate how it performs on a multifractal process. The multifractal random walk (MRW) [40,41] is a popular multiplicative cascade process widely used to model systems that exhibit multifractal properties [42]. Like the fBm, the MRW is a motion obtained by integrating—again with Equation (2)—a stationary noise W^{MRW} = {w_t^{MRW}}_{t ∈ ℝ} such that
w_t^{\mathrm{MRW}} = w_t^{\mathrm{fGn}} \, e^{\omega_t} ,   (35)
where W^{fGn} = {w_t^{fGn}}_{t ∈ ℝ} is a fGn with parameter H_{fGn}, and Ω = {ω_t}_{t ∈ ℝ} is a Gaussian random process, independent of W^{fGn}, with the correlation function
c_\omega(\tau) = c_2 \log\!\left( \frac{L}{|\tau| + 1} \right) \quad \text{if } |\tau| < L , \qquad c_\omega(\tau) = 0 \quad \text{otherwise},   (36)
where L is the integral scale, set here to L = T .
The MRW is a scale-invariant process: the power spectrum of its time-derivative W^{MRW} behaves as a power law with an exponent −2(H_{fGn} − c_2) + 1, which is the exponent that would be obtained for a fGn with parameter H_{fGn} − c_2. Any moment of order q of the increments of size τ behaves as a power law of τ with an exponent ζ(q). Contrary to the fBm, the MRW is not exactly self-similar and exhibits intermittency: ζ(q) = H_{fGn} q − (c_2/2) q² is not a linear function of q, as would be expected for a self-similar process. As a consequence, the shape of the PDF of the increments depends on the scale.
We choose the parameter H_{fGn} such that the power spectrum of the noise W^{MRW} is identical to that of the fBm used in the former sections, i.e., H_{fGn} = 0.7 + c_2. We set the parameter c_2 = 0.025, a value widely used to model the intermittency of the Eulerian turbulent velocity field [23]. A sketch of this synthesis is given below.
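The following minimal Python sketch illustrates the construction; it reuses the circulant-embedding helper synth_fbm of Section 3.1.1 for the fGn, generates ω with the logarithmic covariance (36) by the same circulant recipe, and deliberately leaves aside the exact mean and normalization conventions of [40,41].

```python
import numpy as np

def synth_mrw(N, H_fgn=0.7 + 0.025, c2=0.025, seed=1):
    """Synthesize one MRW of length N by integrating w^MRW = w^fGn * exp(omega), Eq. (35)."""
    rng = np.random.default_rng(seed)
    fgn, _ = synth_fbm(N, H_fgn, seed=seed + 1)        # fGn, drawn independently of omega
    L = N                                              # integral scale, set to T as in the text
    lag = np.arange(N, dtype=float)
    c_omega = np.where(lag < L, c2 * np.log(L / (lag + 1.0)), 0.0)   # Equation (36)
    row = np.concatenate([c_omega, c_omega[-1:0:-1]])  # circulant embedding of omega
    lam = np.clip(np.fft.fft(row).real, 0.0, None)     # clipping makes the embedding approximate
    M = len(row)
    z = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    omega = np.fft.fft(np.sqrt(lam / M) * z)[:N].real
    omega -= omega.mean()          # the precise mean convention of [40] is omitted in this sketch
    return np.cumsum(fgn * np.exp(omega))

mrw = synth_mrw(2**16)
```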
Figure 9 compares the evolution of the PDF of the increments of the fBm and the MRW. As expected, no change is observed for the fBm, while the PDF of the MRW has wider tails for smaller τ . The fBm is perfectly self-similar, while the MRW exhibits intermittency [43]: the PDF of its increments is deformed when the scale τ of the increments is varied, although no analytical expression of the PDF is available.
We apply our practical framework and plot in Figure 10a the evolution with T of the ersatz entropy rate of the MRW. Again, the entropy rate seems to be independent of T. We nevertheless observe a small tendency to increase towards the value H_1^{MRW}, the entropy of the MRW at unit-time. Here, because there is no analytical expression of the PDF, H_1^{MRW} cannot be derived analytically and we estimate its value numerically.
The dependence on τ is plotted in Figure 10b. We again observe a strong linear evolution of the ersatz entropy rate in H ln τ. After subtracting this strong tendency (Figure 11), we still observe an evolution with τ, but this evolution appears much weaker than for the Hermitian log-normal motion (blue curve in Figure 7). Indeed, the deformation of the PDFs of the increments when varying τ is much slower for the MRW (Figure 9b) than for the Hermitian log-normal motion (Figure 8a).

6. Discussion and Conclusions

We proposed a new framework in information theory to analyze a non-stationary process by considering it as resulting from a gedanken stationary process and estimating the PDF by cumulating all available samples in a time interval of size T. This framework hence considers a PDF obtained by time-averaging over a time window [t_0, t_0 + T], and then proceeds to compute the associated information theory quantities. In particular, the ersatz entropy H̄_T(X) that is then defined can be interpreted as the amount of information characterizing the complete trajectory {X_t, t ∈ [t_0, t_0 + T]} of the process X. If we assume that the increments of X are stationary and centered, then H̄_T(X) and all the other ersatz information theoretic quantities depend only on the duration T, and not on the initial time t_0.
We illustrated our approach by focusing first on a model system: the fractional Brownian motion. We derived in this context the analytical expressions of the ersatz entropy, ersatz auto-mutual information, and ersatz entropy rate, which allowed a pedagogical description of our new information theory quantities. We also reported how the ersatz quantities behave when the time-interval size T and the embedding time scale τ are varied: we obtained analytical expressions for embedding dimension m = 1, and confirmed them numerically for m ≥ 1. Besides the fBm, we reported numerical observations for various self-similar or multifractal processes. The ersatz entropy H̄_T^{(m,τ)} always diverges logarithmically in T, while the ersatz entropy rate h̄_T^{(m,τ)} is always almost independent of T. Examining how the ersatz entropy rate h̄_T^{(m,τ)} depends on the scale τ provides a fine exploration of either the self-similarity or the multifractality of the process.
This exploration of the multifractality of a non-stationary process with stationary increments using the ersatz entropy rate h̄_T^{(m,τ)}(M) gives a viewpoint very similar to the one obtained when analyzing the increments of the process with the regular Shannon entropy, as reported in [43]. We are currently investigating how to relate the two approaches quantitatively.
In the same vein, the ersatz entropy rate allowed us to discriminate between two different non-stationary processes and to capture fine differences in their self-similarity properties (Figure 7), in close relation to a method using the entropy rate of the increments of the signal, as exposed in [16]. A possible connection is also under investigation.
Throughout this article, we have estimated the ersatz quantities of a process on a single trajectory [t_0, t_0 + T] of this process; this situation corresponds to the worst case scenario, where only a single realization of the process is known. If enough experimental data are available, one can improve the estimation of the ersatz quantities in two ways. First, if the same experiment has been conducted multiple times, and thus multiple realizations are available over the time interval [t_0, t_0 + T], one can use all these independent realizations to enhance the estimation of the time-averaged PDF. Second, if a single but long enough realization of size T' is available, one can split it into multiple time intervals [kT, (k+1)T], k ∈ [0, …, T'/T], and then use these intervals as independent realizations, as in the first case. This latter situation is made possible by the assumption that the increments of the signal are not only stationary, but also centered.

Author Contributions

C.G.-B., S.G.R. and N.B.G.: investigation, methodology, article writing.

Funding

This work was supported by the LABEX iMUST (ANR-10-LABX-0064) of Université de Lyon, within the program “Investissements d’Avenir” (ANR-11-IDEX-0007) operated by the French National Research Agency (ANR).

Acknowledgments

The authors wish to thank L. Chevillard for stimulating discussions.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Entropy of a Time-Embedded Signal

The time-embedded vector x_t^{(m,τ)} (Equation (6)) can be mapped onto the vector x̃_t^{(m,τ)} ≡ (x_t, δ_τ x_t, δ_τ x_{t−τ}, …, δ_τ x_{t−(m−2)τ}) by the linear transformation Q_m:
x_t^{(m,\tau)} \mapsto \tilde{x}_t^{(m,\tau)} = Q_m \, x_t^{(m,\tau)} ,   (A1)
where Q m is the band matrix defined as:
Q_m \equiv \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ 1 & -1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & -1 & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & -1 & 0 \\ 0 & 0 & \cdots & 0 & 1 & -1 \end{pmatrix} ,   (A2)
the determinant of which satisfies | det ( Q m ) | = 1 . As a consequence, H ( x ˜ t ( m , τ ) ) = H ( x t ( m , τ ) ) , which proves (8).

References

  1. Andreas, E.; Geiger, C.; Treviño, G.; Claffey, K. Identifying nonstationarity in turbulence series. Bound. Layer Meteorol. 2008, 127, 37–56. [Google Scholar] [CrossRef]
  2. Nerini, D.; Besic, N.; Sideris, I.; Germann, U.; Foresti, L. A non-stationary stochastic ensemble generator for radar rainfall fields based on the short-space Fourier transform. Hydrol. Earth Syst. Sci. 2017, 21, 2777–2797. [Google Scholar] [CrossRef] [Green Version]
  3. Boashash, B.; Azemi, G.; O’Toole, J. Time-frequency processing of nonstationary signals. IEEE Signal Process. Mag. 2013, 30, 108–119. [Google Scholar] [CrossRef]
  4. Couts, D.; Grether, D.; Nerlove, M. Forecasting non-stationary economic time series. Manag. Sci. 1966, 18, 1–151. [Google Scholar] [CrossRef] [Green Version]
  5. Young, P. Time-variable parameter and trend estimation in non-stationary economic time series. J. Forecast. 1994, 13, 179–210. [Google Scholar] [CrossRef]
  6. Yang, K.; Shahabi, C. On the stationarity of multivariate time series for correlation-based data analysis. In Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM’05), Houston, TX, USA, 27–30 November 2005. [Google Scholar]
  7. Dębowski, L. On processes with summable partial autocorrelations. Stat. Probab. Lett. 2007, 77, 752–759. [Google Scholar] [CrossRef] [Green Version]
  8. Yaglom, A. Correlation theory of processes with random stationary nth increments. Mat. Sb. 1955, 37, 141–196. [Google Scholar]
  9. Ibe, O. 11-Levy processes. In Markov Processes for Stochastic Modeling, 2nd ed.; Elsevier: London, UK, 2013; pp. 329–347. [Google Scholar]
  10. Frisch, U. Turbulence: The Legacy of A.N. Kolmogorov; Cambridge University Press: Cambridge, UK, 1995. [Google Scholar]
  11. Shannon, C. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, XXVII, 388–427. [Google Scholar]
  12. Kantz, H.; Schreiber, T. Nonlinear Time Series Analysis; Cambridge University Press: Cambridge, UK, 2003. [Google Scholar]
  13. Vu, V.Q.; Yu, B.; Kass, R.E. Information in the Non-Stationary Case. Neural Comput. 2009, 21, 688–703. [Google Scholar] [CrossRef]
  14. Ray, A.; Chowdhury, A.R. On the characterization of non-stationary chaotic systems: Autonomous and non-autonomous cases. Phys. A 2010, 389, 5077–5083. [Google Scholar] [CrossRef]
  15. Gómez Herrero, G.; Wu, W.; Rutanen, K.; Soriano, M.C.; Pipa, G.; Vicente, R. Assessing coupling dynamics from an ensemble of time series. Entropy 2015, 17, 1958–1970. [Google Scholar] [CrossRef] [Green Version]
  16. Granero-Belinchón, C.; Roux, S.; Abry, P.; Garnier, N.B. Probing high-order dependencies with information theory. IEEE Trans. Signal Process. 2019, 67, 3796–3805. [Google Scholar] [CrossRef]
  17. Mandelbrot, B.; Van Ness, J. Fractional Brownian motions, fractional noises and applications. SIAM Rev. 1968, 10, 422–437. [Google Scholar] [CrossRef]
  18. Takens, F. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence, Warwick: Proceedings of a Symposium Held at the University of Warwick 1979/80; Springer: Berlin/Heidelberg, Germany, 1981; pp. 366–381. [Google Scholar]
  19. Granero-Belinchon, C.; Roux, S.; Abry, P.; Doret, M.; Garnier, N. Information Theory to Probe Intrapartum Fetal Heart Rate Dynamics. Entropy 2017, 19, 640. [Google Scholar] [CrossRef] [Green Version]
  20. Crutchfield, J.; Feldman, D. Regularities unseen, randomness observed: The entropy convergence hierarchy. Chaos 2003, 15, 25–54. [Google Scholar] [CrossRef]
  21. Mandelbrot, B. The Fractal Geometry of Nature; W.H. Freeman and Co.: San Francisco, CA, USA, 1982. [Google Scholar]
  22. Mauritz, K. Dielectric relaxation studies of ion motions in electrolyte-containing perfluorosulfonate ionomers: 4. long-range ion transport. Macromolecules 1989, 22, 4483–4488. [Google Scholar] [CrossRef]
  23. Chevillard, L.; Castaing, B.; Arneodo, A.; Lévêque, E.; Pinton, J.; Roux, S. A phenomenological theory of Eulerian and Lagrangian velocity fluctuations in turbulent flows. C. R. Phys. 2012, 13, 899–928. [Google Scholar] [CrossRef] [Green Version]
  24. Kavvas, M.; Govindaraju, R.; Lall, U. Introduction to the focus issue: physics of scaling and self-similarity in hydrologic dynamics, hydrodynamics and climate. Chaos 2015, 25, 075201. [Google Scholar] [CrossRef]
  25. Rigon, R.; Rodriguez-Iturbe, I.; Maritan, A.; Giacometti, A.; Tarboton, D.; Rinaldo, A. On Hack’s law. Water Resour. Res. 1996, 32, 3367–3374. [Google Scholar] [CrossRef]
  26. Gotoh, K.; Fujii, Y. A fractal dimensional analysis on the cloud shape parameters of cumulus over land. J. Appl. Meteorol. 1998, 37, 1283–1292. [Google Scholar] [CrossRef]
  27. Console, R.; Lombardi, A.; Murru, M.; Rhoades, D. Bath’s law and the self-similarity of earthquakes. J. Geophys. Res. Solid Earth 2003, 108, 2128. [Google Scholar] [CrossRef] [Green Version]
  28. Ivanov, P.C.; Ma, Q.D.Y.; Bartsch, R.P.; Hausdorff, J.M.; Amaral, L.A.N.; Schulte-Frohlinde, V.; Stanley, H.E.; Yoneyama, M. Levels of complexity in scale-invariant neural signals. Phys. Rev. E 2009, 79, 041920. [Google Scholar] [CrossRef] [PubMed]
  29. Drozdz, S.; Ruf, F.; Speth, J.; Wojcik, M. Imprints of log-periodic self-similarity in the stock market. Eur. Phys. J. B Condens. Matter Complex Syst. 1999, 10, 589–593. [Google Scholar] [CrossRef] [Green Version]
  30. Cont, R.; Potters, M.; Bouchaud, J.P. Scaling in stock market data: stable laws and beyond. In Scale Invariance and Beyond; Springer: Berlin/Heidelberg, Germany, 1997; Volume 7, pp. 75–85. [Google Scholar]
  31. Uhl, A.; Wimmer, G. A systematic evaluation of the scale invariance of texture recognition methods. Pattern Anal. Appl. 2015, 18, 945–969. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  32. Chakraborty, D.; Ashir, A.; Suganuma, T.; Mansfield-Keeni, G.; Roy, T.; Shiratori, N. Self-similar and fractal nature of internet traffic. Netw. Manag. 2004, 14, 119–129. [Google Scholar] [CrossRef]
  33. Flandrin, P. Wavelet analysis and synthesis of fractional Brownian motion. IEEE Trans. Inf. Theory 1992, 38, 910–917. [Google Scholar] [CrossRef]
  34. Zografos, K.; Nadarajah, S. Expressions for Rényi and Shannon entropies for multivariate distributions. Stat. Probab. Lett. 2005, 71, 71–84. [Google Scholar] [CrossRef]
  35. Granero-Belinchon, C.; Roux, S.G.; Garnier, N.B. Scaling of information in turbulence. EPL 2016, 115, 58003. [Google Scholar] [CrossRef] [Green Version]
  36. Helgason, H.; Pipiras, V.; Abry, P. Synthesis of multivariate stationary series with prescribed marginal distributions and covariance using circulant matrix embedding. Signal Process. 2011, 91, 1741–1758. [Google Scholar] [CrossRef]
  37. Kozachenko, L.; Leonenko, N. Sample estimate of entropy of a random vector. Probl. Inf. Transm. 1987, 23, 95–100. [Google Scholar]
  38. Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E 2004, 69, 066138. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  39. Gao, W.; Oh, S.; Viswanath, P. Demystifying Fixed k-Nearest Neighbor Information Estimators. IEEE Trans. Inf. Theory 2018, 64, 5629–5661. [Google Scholar] [CrossRef] [Green Version]
  40. Bacry, E.; Delour, J.; Muzy, J.F. Multifractal random walk. Phys. Rev. E 2001, 64, 026103. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  41. Bacry, E.; Muzy, J. Multifractal stationary random measures and multifractal random walk with log-infinitely divisible scaling laws. Phys. Rev. E 2002, 66, 056121. [Google Scholar]
  42. Delour, J.; Muzy, J.; Arnéodo, A. Intermittency of 1D velocity spatial profiles in turbulence: A magnitude cumulant analysis. Eur. Phys. J. B 2001, 23, 243–248. [Google Scholar] [CrossRef]
  43. Granero-Belinchón, C.; Roux, S.G.; Garnier, N.B. Kullback-Leibler divergence measure of intermittency: Application to turbulence. Phys. Rev. E 2018, 97, 013107. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Dependence of $\bar{h}_T(1,\tau)$ (for $\tau=1$) on $k/T^{1/2}$ for the fractional Brownian motion (fBm) (a, in black) and for the Hermitian (b, in blue) and even-Hermitian (c, in red) log-normal processes.
Figure 2. Standard deviations of $\bar{H}_T(1,\tau)$ (triangles), $\bar{I}_T(1,1,\tau)$ (circles), and $\bar{h}_T(1,\tau)$ (stars), for $\tau=1$, as functions of $T$, for the fBm (a, in black), the Hermitian (b, in blue), and the even-Hermitian (c, in red) log-normal processes.
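The estimator behaviour summarised in Figures 1 and 2 can be explored qualitatively with a few lines of code. The sketch below is our own toy experiment, not the authors' protocol: it applies a standard Kozachenko–Leonenko k-nearest-neighbour entropy estimator (the helper name kl_entropy_1d, the choice of i.i.d. Gaussian samples, and the particular values of $k$ and $T$ are all our assumptions) and reports how the bias and the realization-to-realization spread of the estimate depend on the neighbour number $k$ and the window size $T$.

```python
# Minimal sketch: bias and spread of a Kozachenko-Leonenko (k-NN) entropy
# estimate as functions of the neighbour number k and the window size T.
# This is NOT the paper's estimator configuration; it only illustrates the
# kind of k/T dependence summarised in Figures 1 and 2.
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def kl_entropy_1d(x, k):
    """Kozachenko-Leonenko entropy estimate (in nats) for a 1-d sample."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    n = len(x)
    # distance to the k-th nearest neighbour (column 0 is the point itself)
    dist, _ = cKDTree(x).query(x, k=k + 1)
    eps = dist[:, k]
    return digamma(n) - digamma(k) + np.log(2.0) + np.mean(np.log(eps))

rng = np.random.default_rng(0)
h_true = 0.5 * np.log(2 * np.pi * np.e)   # entropy of N(0,1), about 1.4189 nats

for T in (2**10, 2**12, 2**14):
    for k in (1, 5, 20):
        estimates = [kl_entropy_1d(rng.standard_normal(T), k) for _ in range(20)]
        bias = np.mean(estimates) - h_true
        print(f"T={T:6d}  k={k:3d}  bias={bias:+.4f}  std={np.std(estimates):.4f}")
```

In this toy setting the bias grows with $k$ at fixed $T$ and shrinks as $T$ increases, while the spread decreases with $T$, which is the qualitative behaviour the two figures document for the ersatz quantities.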
Figure 3. (a) Entropy $\bar{H}_T(m,\tau)$ and (c) auto-mutual information $\bar{I}_T(m,1,\tau)$ of the fBm as functions of the logarithm of the window size $\ln(T)$ for a fixed scale $\tau=1$. (b) Entropy and (d) auto-mutual information as functions of the logarithm of the scale of analysis $\ln(\tau)$ for a fixed $T=2^{16}$. Each symbol corresponds to a different embedding dimension $m$. In (a,c) the black line has a slope $H=0.7$, while in (d) its slope is $-H=-0.7$.
Figure 4. Ersatz entropy rate $\bar{h}_T(m,\tau)$ of a fBm with $H=0.7$. (a) As a function of the window size $T$ for fixed $\tau=1$. (b) As a function of the scale $\tau$ for $T=2^{16}$. Each symbol corresponds to a different embedding dimension $m$. The horizontal black line in (a) indicates the theoretical value $H_1^{\mathrm{fBm}}$. The black line in (b) represents the linear function $H_1^{\mathrm{fBm}} + H\ln\tau$ with $H=0.7$.
Figure 5. Ersatz entropy rate $\bar{h}_T(m,\tau) - \ln(\sigma_\tau)$ of the fBm as a function of $\ln(\tau)$ for fixed $T=2^{16}$ and varying embedding dimension $m$. The thick horizontal black line represents the constant value $H_1^{\mathrm{fBm}}$.
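The constancy displayed in Figure 5 can be traced back to the behaviour of differential entropy under rescaling. The short derivation below is our reconstruction of the argument, assuming the increments $\delta_\tau x = x(t+\tau) - x(t)$ are self-similar with a single exponent $H$, so that $\delta_\tau x$ has the same law as $\tau^{H}\,\delta_1 x$. Writing $H_1(\tau)$ for the differential entropy of one increment at scale $\tau$, and using the scaling property $H(aX) = H(X) + \ln|a|$ together with $\sigma_\tau = \tau^{H}\sigma_1$,
\[
H_1(\tau) \;=\; H_1(1) + H\ln\tau \;=\; H_1(1) + \ln\frac{\sigma_\tau}{\sigma_1},
\]
so $H_1(\tau) - \ln\sigma_\tau = H_1(1) - \ln\sigma_1$ does not depend on $\tau$. To the extent that the ersatz entropy rate $\bar{h}_T(m,\tau)$ behaves like the entropy of a single increment at scale $\tau$ (as Figure 4b suggests), $\bar{h}_T(m,\tau) - \ln\sigma_\tau$ is scale-independent for a monofractal signal, which is what Figure 5 shows; deviations from constancy, as for the MRW in Figure 11, then quantify multifractality.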
Figure 6. $\bar{h}_T(1,\tau)$ for a motion built from a Hermitian (blue) or even-Hermitian (red) log-normal noise, as a function of (a) the time window size $T$ or (b) the time scale $\tau$. Results for the fBm (from Figure 4 with $m=1$) are reported in black for comparison. $T=2^{16}$ and $k=5$. The horizontal lines in (a) indicate the entropy $H_1$ of the noise (in black for the fBm, and in red and blue for a log-normal process).
Figure 7. Ersatz entropy rate $\bar{h}_T(m{=}1,\tau) - \ln(\sigma_\tau)$ for motions built on Hermitian (blue) or even-Hermitian (red) log-normal noise, together with results for the fBm (black), as a function of $\ln\tau$. $T=2^{16}$ and $k=5$. The horizontal straight lines indicate the theoretical values of the entropy of the processes.
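As a concrete illustration of the scale collapse used in Figures 5 and 7, the sketch below (again our own toy example, not the authors' code) applies the same Kozachenko–Leonenko estimator to the increments of an ordinary Brownian motion, used here as a stand-in for the fBm with $H=1/2$, at dyadic scales $\tau=2^j$, and checks that the estimated entropy minus $\ln\sigma_\tau$ stays close to the Gaussian constant $\tfrac{1}{2}\ln(2\pi e)$.

```python
# Minimal sketch (not the paper's code): the entropy of Brownian increments
# at scale tau, minus ln(sigma_tau), should not depend on tau.
# For an ordinary Brownian motion (H = 1/2) the constant is 0.5*ln(2*pi*e).
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def kl_entropy_1d(x, k=5):
    """Kozachenko-Leonenko k-NN entropy estimate (in nats) for a 1-d sample."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    n = len(x)
    dist, _ = cKDTree(x).query(x, k=k + 1)        # column 0 is the point itself
    return digamma(n) - digamma(k) + np.log(2.0) + np.mean(np.log(dist[:, k]))

rng = np.random.default_rng(1)
motion = np.cumsum(rng.standard_normal(2**16))    # Brownian motion, H = 1/2

print(f"expected constant: {0.5 * np.log(2 * np.pi * np.e):.4f} nats")
for j in range(7):                                # dyadic scales tau = 2^j
    tau = 2**j
    incr = motion[tau:] - motion[:-tau]           # stationary increments at scale tau
    h_tau = kl_entropy_1d(incr[::tau], k=5)       # subsample: non-overlapping increments
    print(f"tau = {tau:3d}   H(tau) - ln(sigma_tau) = {h_tau - np.log(np.std(incr)):.4f}")
```

The subsampling step keeps only non-overlapping increments, so that the strong correlations between overlapping increments do not bias the nearest-neighbour estimate; the printed values should all be close to 1.4189 nats.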
Figure 8. Probability density function (PDF) of the increments of the (a) Hermitian and (b) even-Hermitian log-normal motions of size $\tau=2^j$, from $j=0$ (bottom) up to $j=6$ (top). Curves have been arbitrarily shifted along the Y-axis for clarity.
Figure 9. PDF of the increments of (a) the fBm and (b) a multifractal random walk (MRW) of size $\tau=2^j$, from $j=0$ (bottom) up to $j=6$ (top). Curves have been arbitrarily shifted along the Y-axis for clarity.
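Increment PDFs of the kind shown in Figures 8 and 9 are obtained with a simple histogram loop. The sketch below is a minimal version under our own choices (a plain Brownian motion as input signal, 100 histogram bins, vertical shifts by factors of ten): increments at dyadic scales $\tau=2^j$ are normalized, histogrammed, and stacked in semi-log scale so that a change of shape across scales can be seen by eye.

```python
# Minimal sketch of increment-PDF plots at dyadic scales, in the style of
# Figures 8 and 9, using a plain Brownian motion as a stand-in signal.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
motion = np.cumsum(rng.standard_normal(2**16))

fig, ax = plt.subplots()
for j in range(7):                                   # tau = 2^0 ... 2^6
    tau = 2**j
    incr = motion[tau:] - motion[:-tau]
    incr = incr / np.std(incr)                       # normalise to unit variance
    pdf, edges = np.histogram(incr, bins=100, density=True)
    centers = 0.5 * (edges[1:] + edges[:-1])
    # shift each curve vertically (here by a factor 10**j) for readability
    ax.semilogy(centers, pdf * 10**j, label=rf"$\tau = 2^{{{j}}}$")
ax.set_xlabel(r"$\delta_\tau x / \sigma_\tau$")
ax.set_ylabel("shifted PDF")
ax.legend()
plt.show()
```

For a monofractal Gaussian signal the normalized curves keep the same parabolic shape at all scales, whereas for a multifractal signal such as the MRW the tails fatten as $\tau$ decreases, which is the deformation Figure 9b illustrates.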
Figure 10. Ersatz entropy rate $\bar{h}_T(m{=}1,\tau)$ of a MRW with $H=0.7$. (a) As a function of the window size $T$ for fixed $\tau=1$. (b) As a function of the scale $\tau$ for $T=2^{16}$. The horizontal line in (a) indicates the numerical value $H_1^{\mathrm{MRW}}$ of the noise. The straight line in (b) has a slope $H=0.7$.
Figure 11. Ersatz entropy rate $\bar{h}_T(m{=}1,\tau) - \ln(\sigma_\tau)$ of the MRW as a function of $\ln(\tau)$ for fixed $T=2^{16}$.
