Information theory for non-stationary processes with stationary increments

We describe how to analyze the wide class of non-stationary processes with stationary centered increments using Shannon information theory. To do so, we adopt a practical viewpoint and define ersatz quantities from time-averaged probability distributions. These ersatz versions of entropy, mutual information and entropy rate can be estimated when only a single realization of the process is available. We illustrate our approach extensively by analyzing Gaussian and non-Gaussian self-similar signals, as well as multi-fractal signals. Using Gaussian signals allows us to check that our approach is robust, in the sense that all quantities behave as expected from analytical derivations. Using the stationarity (independence of the integration time) of the ersatz entropy rate, we show that this quantity is not only able to finely probe the self-similarity of the process, but also offers a new way to quantify the multi-fractality.


Introduction
Many real-world processes, such as global weather data, water reservoir levels, biological or medical signals, and economic time series, are intrinsically non-stationary [1][2][3][4][5]: their probability density function (PDF) deforms as time evolves. Analyzing such processes requires a stationarity hypothesis in order to apply classical analyses such as two-point correlation assessment [6]. The stationarity hypothesis can be either strict or weak: strict stationarity requires all moments of the process, and hence its PDF, to be time-independent, whereas weak stationarity is achieved when the first moment and the covariance function are time-independent and the variance is finite at all times [7]. Even the weaker hypothesis is often very restrictive and not realistic over long time periods. When the signal has a drift or a linear trend, another approach is to focus on its time-increments or time-derivatives. Indeed, assuming that the increments or the time-derivative are stationary is then a more realistic hypothesis. For real-world processes, the stationarity of the increments, or even the stationarity of the signal, is often argued to be valid when considering small chunks of data spanning a short enough time range [8][9][10], so that slow evolutions of higher-order moments can be neglected. The present article focuses on non-stationary processes with increments that are stationary and centered; this hypothesis ensures that the processes do not have any trend or drift.
Shannon information theory provides a very general framework to study stationary processes [11,12], and some attempts to analyze non-stationary processes have been reported [13][14][15]. Contrary to most classical approaches, such as linear response theory in statistical physics or solid state physics, this framework is not restricted to the study of two-point correlations and linear relationships, and it allows one to quantify higher-order dependences [16] and nonlinear dynamics [12]. Information theory can be straightforwardly applied to any non-stationary time-process $X = \{x_t\}_{t\in\mathbb{R}}$: by carefully studying how the probability density and dependences of the process evolve in time, a time-evolving Shannon entropy $H_t(X)$ can be defined. The drawback of this approach is that it requires the knowledge of many realizations of the time-evolution of the process, as it relies on having enough statistics over the realizations [15].
Unfortunately, obtaining enough data is very difficult in real-world systems where, in the best-case scenario, a few realizations can be recorded experimentally, and usually only a single realization is accessible. In this paper, we develop a methodology that can be applied to a single realization, in order to analyze a non-stationary signal with stationary centered increments. We describe a time-averaged framework that gathers all available data points in a time window representing a single realization, whether it is the full experimental time duration, or just a fraction of it [13].
The present paper is organized as follows. In section 2, we present the general framework of information theory for a non-stationary signal, and our new framework that exploits time averages. We then place particular emphasis on self-similar processes. In section 3, we report a benchmarking of our framework in the special case of Gaussian self-similar signals, a model situation where analytical developments are possible. In section 4, we explore the case of non-Gaussian self-similar processes. Finally, in section 5, we drop the hypothesis of self-similarity and we apply our framework to a multifractal process.

Non-stationary processes with stationary increments
In this article, we consider non-stationary processes with stationary increments. Such a process can be written as a motion $M = \{m_t\}_{t\in\mathbb{R}}$ obtained by integrating a stationary noise $W = \{w_t\}_{t\in\mathbb{R}}$:
$$m_t = m_0 + \int_0^t w_s \, ds, \tag{1}$$
where $m_0$ and $w_0$ are the values at time $t = 0$, both of which can be set to 0 without loss of generality. Nowadays signals are recorded and stored on digital media, which amounts in practice to considering a set of data sampled at discrete times $t_k$ where $k \in \mathbb{N}^+$. We further assume that the signals are equi-sampled, i.e., $dt$ is constant, and we choose $dt = 1$. So we consider in this article discrete-time processes and we express them as motions $M = \{m_t\}_{t\in\mathbb{N}^+}$ obtained by integrating a stationary noise $W = \{w_t\}_{t\in\mathbb{N}^+}$ according to:
$$m_t = m_0 + \sum_{k=1}^{t} w_k, \tag{2}$$
where again $m_0 = w_0 = 0$. Eq.(2) can also be replaced by
$$m_t = m_{t-1} + w_t. \tag{3}$$
If the noise W is not centered, i.e., has a statistical mean $E(W) = \beta \neq 0$, we introduce the centered noise $w'_t = w_t - \beta$. The equations for the motion M read:
$$m_t = m_0 + \sum_{k=1}^{t} w'_k + \beta t, \tag{4}$$
$$m_t = m_{t-1} + w'_t + \beta. \tag{5}$$
The process M can be interpreted as a motion built on the stationary centered noise $W'$ together with an additive deterministic drift, which is the linear trend $\beta t$.
In this article, we study motions without trend, so we impose that the noise W is centered, i.e., that its statistical mean is $E(W) = \beta = 0$. Besides the simple centering of the increments $W \to W - E(W)$, any detrending method can be applied to M, e.g., using moving averages. As a consequence, the motion M is centered: its statistical mean is $E\{m_t\} = m_0 = 0$ at all times $t > 0$. Nevertheless, its variance, and all its higher-order moments, may depend on time: the motion M is a non-stationary process with stationary increments. Typical examples of such processes are Brownian motion and fractional Brownian motion [17], both of which have a variance that evolves with time.
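As a minimal sketch of the discrete-time construction of eq.(2), the following Python snippet centers a noise and integrates it with a cumulative sum. The function name `motion_from_noise` and the example drift value are hypothetical, and the sample mean is used as a stand-in for the statistical mean E(W), which is an assumption: subtracting the sample mean forces the last point of the motion back to zero.

```python
import random
from itertools import accumulate


def motion_from_noise(noise):
    """Integrate a noise into a motion after removing its sample mean.

    Returns [m_1, ..., m_T] with m_t = sum_{k=1..t} (w_k - mean(w)) and m_0 = 0.
    The sample mean is an illustrative stand-in for the statistical mean E(W).
    """
    mean = sum(noise) / len(noise)
    return list(accumulate(w - mean for w in noise))


# Hypothetical example: Gaussian white noise with a drift beta = 0.3.
rng = random.Random(0)
noise = [rng.gauss(0.3, 1.0) for _ in range(1000)]
motion = motion_from_noise(noise)
```

With a white noise as input, the result is a discrete Brownian-like motion without linear trend.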

General framework
For a generic non-stationary process $X_t = \{x_t\}_{t\in\mathbb{R}}$, the probability density function (PDF) $p_{x_t}(x_t)$ changes with time. The information theory framework can be applied to each random variable $x_t$, i.e., at each time t. To do so, the PDF of $x_t$ needs to be estimated at each time t, which in practice requires having many realizations available [15].
To analyze the temporal dynamics of a random process at a given time t, we consider the m-dimensional vector obtained with the Takens time-embedding procedure [18]:
$$x_t^{(m,\tau)} = \left(x_t, x_{t-\tau}, \ldots, x_{t-(m-1)\tau}\right). \tag{6}$$
The embedding dimension m controls the order of the statistics that are considered, and the delay τ defines a time scale. We define below some information theory quantities that are functionals of the m-point joint distributions $p_{x_t^{(m,\tau)}}$, in order to characterize linear and non-linear temporal dynamics.
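The time-embedding step can be sketched as follows for a sampled signal. The function name `time_embed` is hypothetical, and the ordering convention (most recent sample first, then samples taken backward in time every τ points) is an assumption chosen to match the later remark that $x_t$ is the first component of the embedded vector.

```python
def time_embed(x, m, tau):
    """Return all embedded vectors (x[t], x[t - tau], ..., x[t - (m - 1) * tau]).

    Assumed convention: the first component is the most recent sample and the
    m - 1 others are taken backward in time, every tau samples.
    """
    if m < 1 or tau < 1:
        raise ValueError("m and tau must be positive integers")
    first = (m - 1) * tau  # earliest time index with a full history
    return [tuple(x[t - j * tau] for j in range(m)) for t in range(first, len(x))]


vectors = time_embed([0, 1, 2, 3, 4], m=2, tau=2)
```

A series of length N yields N - (m - 1)τ embedded vectors, so large embedding dimensions or delays reduce the number of points available for estimation.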

Shannon entropy
The entropy of $x_t^{(m,\tau)}$ is:
$$H\left(x_t^{(m,\tau)}\right) = -\int_{\mathbb{R}^m} p_{x_t^{(m,\tau)}}(x) \ln p_{x_t^{(m,\tau)}}(x) \, dx. \tag{7}$$
This quantity depends on time t, as well as on the embedding dimension m and the delay τ. We further note it $H_t^{(m,\tau)}(X)$, where the index t indicates the time and the parameters (m, τ) are indicated as upper indices. It measures the amount of information characterizing the m-dimensional PDF of the process X at time t sampled at scale τ. When m = 1, the entropy does not depend on τ and does not probe the dynamics of the process; we then note it $H_t(X)$, dropping the (m = 1, τ) upper indices. However, for embedding dimension m > 1 the entropy depends on the linear and non-linear dynamics of the process. Indeed, the entropy involves arbitrarily high-order moments of the joint PDF $p_{x_t^{(m,\tau)}}$. As usual, the entropy does not depend on the first moment of the distribution.
Using the time-increments of size τ, $\delta_\tau x_t \equiv x_t - x_{t-\tau}$, it can be shown (see appendix A) that the amount of information measured by $H_t^{(m,\tau)}(X)$ is the same as the amount of information in the vector
$$\tilde{x}_t^{(m,\tau)} = \left(x_t, \delta_\tau x_t, \delta_\tau x_{t-\tau}, \ldots, \delta_\tau x_{t-(m-2)\tau}\right). \tag{8}$$
For processes with stationary increments, the marginal distribution of $x_t$ may be strongly time-dependent, but the marginal distribution of any increment is time-independent. Eq.(8) thus suggests that the time-dependence of $H_t^{(m,\tau)}(X)$ originates mainly from $x_t$, the first component of the rewritten embedded vector $\tilde{x}_t^{(m,\tau)}$. Nevertheless, it should be observed that although the m − 1 increments, considered by themselves, have a stationary dependence structure, the covariance of $x_t$ with any of the increments is a priori non-stationary.
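The rewriting of the embedded vector in terms of increments is a linear change of variables whose matrix is triangular with diagonal entries ±1, so its Jacobian has unit absolute value and the differential entropy is unchanged. The sketch below illustrates the map and its inverse; the function names are hypothetical and the ordering of the components follows the assumed convention (most recent sample first).

```python
def to_increment_vector(embedded):
    """Map (x_t, x_{t-tau}, ..., x_{t-(m-1)tau}) to (x_t, increments of size tau)."""
    head = embedded[0]
    increments = [embedded[j] - embedded[j + 1] for j in range(len(embedded) - 1)]
    return (head, *increments)


def from_increment_vector(rewritten):
    """Inverse map: rebuild the embedded vector from x_t and the increments."""
    values = [rewritten[0]]
    for inc in rewritten[1:]:
        values.append(values[-1] - inc)  # step one delay further back in time
    return tuple(values)


rewritten = to_increment_vector((5, 3, 2))
```

Because the map is invertible, no information is lost or created: only the first component carries the non-stationary marginal.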

Mutual information and auto-mutual information
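The mutual information MI measures the amount of information shared by two processes. For two non-stationary time-embedded vectors $x_{t_1}^{(m,\tau)}$ and $y_{t_2}^{(n,\tau)}$, it is defined as:
$$I\left(x_{t_1}^{(m,\tau)}, y_{t_2}^{(n,\tau)}\right) = H\left(x_{t_1}^{(m,\tau)}\right) + H\left(y_{t_2}^{(n,\tau)}\right) - H\left(x_{t_1}^{(m,\tau)}, y_{t_2}^{(n,\tau)}\right). \tag{9}$$
In the following, we use the auto-mutual information $I_t^{(m,n,\tau)}(X)$ to measure, for a single process X, the shared information between two successive time-embedded vectors of dimension m and n [19]:
$$I_t^{(m,n,\tau)}(X) = I\left(x_t^{(m,\tau)}, x_{t+n\tau}^{(n,\tau)}\right). \tag{10}$$
The auto-mutual information defined in (10) probes the dynamics of the process $X_t$ at time t by measuring the dependencies between two consecutive chunks of m and n points sampled every τ.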

Entropy rate
The entropy rate, or entropy gain [20], of order m at time t measures the increase of Shannon entropy when the embedding dimension is increased from m to m + 1. It is defined as the variation of Shannon entropy between $x_t^{(m,\tau)}$ and $x_{t+\tau}^{(m+1,\tau)}$:
$$h_t^{(m,\tau)}(X) = H\left(x_{t+\tau}^{(m+1,\tau)}\right) - H\left(x_t^{(m,\tau)}\right) \tag{11}$$
$$\phantom{h_t^{(m,\tau)}(X)} = H_{t+\tau}(X) - I_t^{(m,1,\tau)}(X). \tag{12}$$
Within the general framework, the entropy, mutual information and entropy rate are well defined at any time t for a non-stationary process. Although this framework can formally be used to analyze non-stationary processes at any time t, in practice it is often impossible to assess statistics at a fixed time t, as the number of available realizations from real-world datasets may be very small. To overcome this issue, we propose in the next section another framework that considers averages over a finite and possibly large time window, which represents for example the duration of an experimental measurement.

Practical time-averaged framework
We now focus on non-stationary processes with stationary increments. We develop in this section a pragmatic approach which can be applied when a single time trace of a non-stationary signal is available.
We first present a very formal perspective that defines a time-averaged PDF of a non-stationary process. We then propose a practical approach which uses a very simple estimation of such a time-average PDF. We finally use this practical approach to define all the information quantities that we are interested in.

Time-averaged framework
Using a formal perspective, we consider the global statistics of the dataset, forgetting its time dynamics, and we formally consider the time-averaged probability density function in the time window $[t_0, t_0 + T]$:
$$\bar{p}_{T,t_0,x^{(m,\tau)}}(x) = \frac{1}{T} \int_{t_0}^{t_0+T} p_{x_t^{(m,\tau)}}(x) \, dt. \tag{13}$$
Because of the time average, this probability density function does not depend on a single time t but on the starting time $t_0$ and the duration T of the time window.
In the case of a stationary process, the PDF $\bar{p}_{T,t_0,x^{(m,\tau)}}$ is independent of $t_0$ and T.
In the case of a non-stationary process with stationary centered increments, the dependence on $t_0$ only appears in the mean of the time-averaged PDF $\bar{p}_{T,t_0,x^{(m,\tau)}}(x)$. As a consequence, since the Shannon entropy does not depend on the mean, none of the information theoretic quantities depends on $t_0$.
In the case of a non-stationary process with stationary but non-centered increments, there is a drift: the first moment of $p_{x_t^{(m,\tau)}}(x)$ evolves linearly with time. When integrated in time in eq.(13), this induces a deformation of the time-averaged PDF $\bar{p}_{T,t_0,x^{(m,\tau)}}(x)$, which a priori affects moments of any order. As a consequence, the Shannon entropy is then expected to depend on $t_0$.
In the following, we focus on non-stationary processes with stationary centered increments, described in section 2.1.

Practical framework
In practice, given a time series of length T, we propose to very roughly approximate the time-averaged PDF $\bar{p}_{T,t_0,x^{(m,\tau)}}$ by the normalized histogram $\hat{\bar{p}}_{T,t_0}$ of all the data available in the time window. This is a very strong assumption, as $\bar{p}_{T,t_0,x^{(m,\tau)}}$ is a priori very different from any $p_{x_t^{(m,\tau)}}$, and a priori very different from the histogram $\hat{\bar{p}}_{T,t_0}$ constructed after cumulating all the available data in the interval. This pragmatic approach comes down to treating the set of available data points in the time interval exactly as if it were a set of data points originating from a stationary, albeit unknown, process, and then estimating its PDF.
In the following, we drop the hat in the notation and write $\bar{p}$ for the ersatz probabilities that stand in for the time-averaged probabilities. As we discuss later in section 6, if several experimental realizations are available, it is of course possible to use them to enhance the estimation of the time-averaged PDF.
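To make the pragmatic approach concrete, the sketch below pools all samples of a window into a single normalized histogram, ignoring their time order, and computes a plug-in entropy from it. This is only an illustration for scalar samples (m = 1): the function names and the fixed number of equal-width bins are assumptions, and the estimates reported in this article rely on nearest-neighbor estimators rather than on histograms.

```python
import math


def pooled_histogram(samples, n_bins):
    """Normalized histogram of all samples in a window, ignoring their time order.

    Returns (density, width): bin densities integrating to one, and the bin width.
    """
    lo, hi = min(samples), max(samples)
    if hi == lo:
        raise ValueError("samples must not all be equal")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in samples:
        index = min(int((s - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[index] += 1
    density = [c / (len(samples) * width) for c in counts]
    return density, width


def histogram_entropy(density, width):
    """Plug-in differential entropy (in nats) of a histogram density."""
    return -sum(p * width * math.log(p) for p in density if p > 0)


# Hypothetical example: evenly spread samples on [0, 1], whose entropy is close to 0.
samples = [i / 1000 for i in range(1001)]
density, width = pooled_histogram(samples, n_bins=10)
entropy = histogram_entropy(density, width)
```

The same pooling applies unchanged to a window of a non-stationary motion: the data are simply treated as if they came from a stationary process.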

Information theory quantities in the practical framework
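Given a time series of length T, and considering the ersatz PDFs $\bar{p}_{T,t_0,x^{(m,\tau)}}$, we define the ersatz entropy $\bar{H}_T^{(m,\tau)}$, the ersatz auto-mutual information $\bar{I}_T^{(m,n,\tau)}$ and the ersatz entropy rate $\bar{h}_T^{(m,\tau)}$ as follows.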

Ersatz Shannon entropy
We define the ersatz entropy of the time-embedded signal as the entropy of the time-averaged PDF:
$$\bar{H}_T^{(m,\tau)}(X) = -\int_{\mathbb{R}^m} \bar{p}_{T,t_0,x^{(m,\tau)}}(x) \ln \bar{p}_{T,t_0,x^{(m,\tau)}}(x) \, dx. \tag{14}$$
It can be interpreted as the total amount of information in the trajectory $\{x_t^{(m,\tau)}, t \in [t_0, t_0 + T]\}$ of the process. If the process has stationary centered increments, the total amount of information in the trajectory depends only on its length T, and not on its starting time $t_0$. In that sense, the ersatz entropy $\bar{H}_T^{(m,\tau)}$ is not stationary. Using the rewriting (8), we argue that this dependence on T originates from $x_t$, the first component of the vector $\tilde{x}_t^{(m,\tau)}$, which has a time-dependent marginal distribution. Because the m − 1 other components of $\tilde{x}_t^{(m,\tau)}$ are increments, they have by hypothesis a stationary dependence structure. So increasing the embedding dimension does not impact the dependence of the ersatz entropy on the window size T, but only its dependence on the increment size τ.

Auto-mutual information
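We define the ersatz auto-mutual information as:
$$\bar{I}_T^{(m,n,\tau)}(X) = \bar{H}_T^{(m,\tau)}(X) + \bar{H}_T^{(n,\tau)}(X) - \bar{H}_T^{(m+n,\tau)}(X). \tag{15}$$

Entropy rate
We define the ersatz entropy rate over a time interval of size T as:
$$\bar{h}_T^{(m,\tau)}(X) = \bar{H}_T^{(m+1,\tau)}(X) - \bar{H}_T^{(m,\tau)}(X) \tag{16}$$
$$\phantom{\bar{h}_T^{(m,\tau)}(X)} = \bar{H}_T^{(1,\tau)}(X) - \bar{I}_T^{(m,1,\tau)}(X). \tag{17}$$
From (16), we may expect a cancellation of the main dependence on T, which is the same for $\bar{H}_T^{(m+1,\tau)}$ and $\bar{H}_T^{(m,\tau)}$. As a consequence, the ersatz entropy rate $\bar{h}_T^{(m,\tau)}$ should be stationary, in the sense that it should not depend on the length T of the time interval that is considered.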
If the available samples span a very large time window, one may consider using multiple non-overlapping time windows of size T starting at various times. Because of the stationarity and zero-mean of the increments and hence the independence of the ersatz quantities on t 0 , it is possible to average the different estimations of the ersatz quantities obtained in each window. It is also possible to use all the non-overlapping windows to populate the histogram and thus enhance the estimation of the time-averaged PDF. Each of these two operations will increase the statistics and hence improve the estimation.
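The first of the two operations described above, averaging estimates over non-overlapping windows, can be sketched as follows. The names `non_overlapping_windows` and `average_over_windows` are hypothetical, and `estimator` is a placeholder for any function that maps a list of samples to an ersatz quantity; none of them come from the source.

```python
def non_overlapping_windows(x, window_size):
    """Split x into consecutive windows of length window_size, dropping the remainder."""
    n_windows = len(x) // window_size
    return [x[i * window_size:(i + 1) * window_size] for i in range(n_windows)]


def average_over_windows(x, window_size, estimator):
    """Average an estimator (any function of a list of samples) over all windows."""
    windows = non_overlapping_windows(x, window_size)
    if not windows:
        raise ValueError("window_size exceeds the length of the series")
    return sum(estimator(w) for w in windows) / len(windows)


# Hypothetical example with a trivial estimator (the maximum of each window).
windows = non_overlapping_windows(list(range(7)), 3)
mean_of_maxima = average_over_windows(list(range(6)), 3, max)
```

Averaging is legitimate here only because the ersatz quantities do not depend on the starting time of each window when the increments are stationary and centered.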

Self-similar processes
In this section, we focus on the special case of self-similar processes, i.e., signals which exhibit monofractal scale invariance [21]. Such processes have been used as a satisfactory first approximation to model or describe a wide variety of phenomena, such as ionic transport [22], fluid turbulence [23], climate [24], river flows [25], cloud structure [26] or earthquakes [27], as well as neural signals [28], stock markets [29,30], texture patterns [31] or internet traffic [32]. A process $X_t$ is monofractal scale-invariant if there exists a real number H such that for all $a \in \mathbb{R}_+^*$, the probability density functions of $x_{at}$ and $a^H x_t$ are identical. H is called the Hurst exponent. If H < 0, the process is stationary and called a fractional noise. If 0 ≤ H < 1, the process is non-stationary with stationary increments. The case H = 1/2 corresponds to the traditional Brownian motion.
Assuming $x_{t=0} = 0$, the scale invariance property can be expressed as [33]:
$$p_{x_{at}}(x) = a^{-H} \, p_{x_t}\left(a^{-H} x\right). \tag{18}$$
The scale invariance property of a process $X_t$ transfers to its increments, as well as to any of its time-embedded versions:
$$p_{x_{at}^{(m,a\tau)}}(x) = a^{-mH} \, p_{x_t^{(m,\tau)}}\left(a^{-H} x\right). \tag{19}$$
This relation allows one to express the non-stationary PDF of $x_t^{(m,\tau)}$ at any time t as a function of the PDF at unit-time (t = 1). This is done by using the factor a = 1/t in eq.(19), i.e., by rescaling each coordinate of the embedded vector by the factor $t^H$.
Using eq.(13), it is straightforward to see that the scale invariance property of the form (19) is also valid for the time-averaged PDF $\bar{p}_{T,t_0,x^{(m,\tau)}}(x)$.
Because of its definition (2) as a cumulative sum of a noise, a motion can be seen as accumulating the correlations between successive points of the noise. When performing a time-embedding, the particular case τ = 1 is interesting: considering the relation (8), we may expect that the information contained in the time-embedded motion $m_t^{(m,\tau=1)}$ is closely related to the information contained in the time-embedded noise $w_t^{(m,\tau=1)}$. This is no longer the case when τ ≥ 2.

Fractional Brownian motion
The Fractional Brownian motion (fBm) was proposed by Mandelbrot and Van Ness [17] and quickly became a benchmark for self-similarity and long-range dependence. The fBm is the only Gaussian self-similar process with stationary increments. It is characterized by its Hurst exponent, H.
The fBm is a motion, obtained by integrating according to (2) a fractional Gaussian noise (fGn), defined as a centered Gaussian process with the correlation structure
$$r(\tau) \equiv E\{w_t w_{t+\tau}\} = \frac{\sigma_1^2}{2} \left( |\tau + 1|^{2H} - 2|\tau|^{2H} + |\tau - 1|^{2H} \right). \tag{20}$$
The fGn is a stationary noise with standard deviation $\sigma_1$. It is scale-invariant with a Hurst exponent H − 1.
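The non-stationary covariance structure of the fBm B reads
$$E\{B_t B_{t-\tau}\} = \frac{\sigma_1^2}{2} \left( t^{2H} + (t - \tau)^{2H} - \tau^{2H} \right), \tag{21}$$
where τ < t.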
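The two covariance structures can be checked against each other numerically. The sketch below encodes the standard closed forms for the fGn autocovariance and the fBm covariance (stated here as textbook expressions, with hypothetical function names) and verifies that the covariance of two unit increments of the fBm reproduces the fGn autocovariance.

```python
def fgn_autocovariance(lag, hurst, sigma1=1.0):
    """Autocovariance of a fractional Gaussian noise at an integer lag."""
    k = abs(lag)
    two_h = 2.0 * hurst
    return 0.5 * sigma1 ** 2 * ((k + 1) ** two_h - 2 * k ** two_h + abs(k - 1) ** two_h)


def fbm_covariance(t, s, hurst, sigma1=1.0):
    """Covariance E[B_t B_s] of a fractional Brownian motion with B_0 = 0."""
    two_h = 2.0 * hurst
    return 0.5 * sigma1 ** 2 * (t ** two_h + s ** two_h - abs(t - s) ** two_h)


def increment_covariance(t, s, hurst):
    """Covariance of the unit increments B_t - B_{t-1} and B_s - B_{s-1}."""
    return (fbm_covariance(t, s, hurst) - fbm_covariance(t, s - 1, hurst)
            - fbm_covariance(t - 1, s, hurst) + fbm_covariance(t - 1, s - 1, hurst))


lag_two = increment_covariance(5, 3, hurst=0.7)
```

The variance of the fBm, obtained with t = s, grows as $t^{2H}$, whereas the increment covariance depends only on the lag: this is the non-stationarity with stationary increments discussed above.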

General framework
We show below how the theoretical information quantities depend on time t and delay τ. We start from the relation (8) between the entropy of the time-embedded vector and the entropy of the increments, and we normalize each component of the vector $\tilde{x}_t^{(m,\tau)}$ by its standard deviation. The standard deviation $\sigma_t$ of the motion $x_t$ evolves with time as $\sigma_t = \sigma_1 t^H$, while the standard deviation $\sigma_\tau$ of the increments $x_t - x_{t-\tau}$ is independent of t, thanks to the stationarity of the increments, and evolves with the size of the increment as $\sigma_\tau = \sigma_1 \tau^H$. So we have: We then use the scaling law (19) for a = 1/t to relate the joint probability at a given time t to the joint probability at unit-time t = 1, which leads to: (X), so we can express the time-dependent Shannon entropy (7) for self-similar processes as: The entropy rate can be rewritten with (11) and (26) as: where $h_1^{(m,\tau/t)}(X)$ is the entropy rate at time t = 1, using the rescaled time delay τ/t.

Although the two quantities $H_{t=1}^{(m,\tau/t)}(X)$ and $h_{t=1}^{(m,\tau/t)}(X)$ are considered at a fixed time t = 1, they still depend on t via the delay τ/t. Because τ/t is small as soon as $t \gg \tau$, we expect that the dependence of the entropy $H_t^{(m,\tau)}(X)$ on time t is mainly in $H \ln t$, and that the entropy rate is almost time-independent.

Fractional Brownian motion
The PDF $p_{B_t}$ of the fBm is Gaussian at any time t, so we can express its Shannon entropy and entropy rate at time t by using eq.(26) and the expression of the Shannon entropy of a Gaussian multivariate process [34]. We obtain the following approximated expressions: where $H_1^{\mathrm{fBm}} \equiv \frac{1}{2} \ln\left(2\pi e \sigma_1^2\right)$ is the entropy of the fBm at unit-time. These formulae are exact for m = 1, 2, but for m ≥ 3, constant terms as well as corrections in τ/t have been omitted for clarity.
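For the simplest case m = 1, the time dependence of the fBm entropy follows directly from the Gaussian entropy formula and the scaling $\sigma_t = \sigma_1 t^H$ of the standard deviation. The sketch below is restricted to that case (the function names are hypothetical) and checks that the entropy at time t exceeds the unit-time entropy by exactly $H \ln t$.

```python
import math


def gaussian_entropy(sigma):
    """Differential entropy (in nats) of a scalar Gaussian of standard deviation sigma."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def fbm_entropy_at_time(t, hurst, sigma1=1.0):
    """Entropy of the fBm at time t for m = 1, using sigma_t = sigma1 * t**hurst."""
    return gaussian_entropy(sigma1 * t ** hurst)


unit_time_entropy = fbm_entropy_at_time(1.0, hurst=0.7)
# Entropy gain between t = 1 and t = e**2: hurst * ln(e**2) = 2 * hurst.
gain = fbm_entropy_at_time(math.e ** 2, hurst=0.7) - unit_time_entropy
```

Higher embedding dimensions require the determinant of the full covariance matrix and are not covered by this sketch.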

Practical time-averaged framework
For a generic self-similar process, we are not able to derive any analytical results in the practical time-averaged framework. Nevertheless, the behaviors expected for a generic non-stationary process with stationary increments still hold: i) the ersatz entropy $\bar{H}$

Fractional Brownian motion
The ersatz entropy of the fBm over a time window of size $T \gg \tau$ can be expressed by averaging its covariance structure on a time window of size T. We obtain [35]: The entropy of the fBm thus increases linearly with the logarithm of the window size T. The larger the time window, the more information there is in the trajectory. The auto-mutual information of the fBm can be derived in the same way using (15) for $T \gg \tau$: where $C(\tau/T)$ is a correction in τ/T that reads The ersatz mutual information depends logarithmically on the scale τ and the window size T. The larger the window size T or the smaller the scale τ, the stronger the dependencies.
The ersatz entropy rate of order m = 1 is obtained by combining (30) and (31) according to (17): which is independent of T up to corrections in τ/T, while being linear in ln(τ) with a constant slope H. The correction $-C(\tau/T)$ in eq.(34) is positive, see eq.(33). Comparing (28) with (30) shows that for the fBm, the dependence of the ersatz entropy on T is exactly the same as the dependence of the entropy on t. Comparing (29) with (34) shows that the entropy rate and the ersatz entropy rate do not depend on t or T, up to corrective terms that are negligible if the scale τ is not too large. We also see explicitly that both quantities evolve with the scale τ as $H \ln \tau$, again up to corrections of order τ/t and τ/T.
The example of the fBm suggests that, for a scale-invariant process, the evolution of any information theory quantity with the scale τ is the same within the practical time-averaged framework and within the general framework. We push this analysis further in the next sections, by exploring whether this property holds when the process is non-Gaussian.

Benchmarking the practical framework with the fBm
We focus in this section on the fractional Brownian motion, for which analytical expressions were derived in the previous sections. We use the fBm not only to benchmark our estimators of information theory quantities, but also to illustrate the use of the practical framework and the expected behavior of the ersatz quantities when used on a self-similar process of Hurst exponent H.

Data
To obtain an fBm, we integrate a fractional Gaussian noise (fGn). We use the circulant matrix method [36] to impose the correlation structure (20) of the fGn. Then, we center and normalize the noise such that its standard deviation, $\sigma_{\mathrm{fGn}}$, is equal to one. We then take the cumulative sum to obtain the fBm. Throughout this article, H = 0.7 for all the processes used to illustrate our results, but we have checked that they hold for any other value 0 < H < 1.
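The synthesis steps above (impose the fGn correlation, center, normalize, integrate) can be sketched in plain Python. Note that this sketch does not use the circulant matrix method of [36]: it swaps in a Cholesky factorization of the Toeplitz covariance matrix, which needs no FFT but costs O(N^3) operations and is therefore only practical for short series. All function names and the seed are hypothetical.

```python
import math
import random
from itertools import accumulate


def fgn_autocovariance(lag, hurst):
    """Autocovariance of a unit-variance fractional Gaussian noise."""
    k = abs(lag)
    return 0.5 * ((k + 1) ** (2 * hurst) - 2 * k ** (2 * hurst) + abs(k - 1) ** (2 * hurst))


def cholesky(matrix):
    """Lower-triangular factor L such that L L^T equals a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def synthesize_fbm(n, hurst, seed=0):
    """Return (fgn, fbm): a centered, unit-std fGn of length n and its cumulative sum."""
    rng = random.Random(seed)
    cov = [[fgn_autocovariance(i - j, hurst) for j in range(n)] for i in range(n)]
    lower = cholesky(cov)
    white = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Color the white noise: w = L z has covariance L L^T.
    noise = [sum(lower[i][k] * white[k] for k in range(i + 1)) for i in range(n)]
    mean = sum(noise) / n
    std = math.sqrt(sum((w - mean) ** 2 for w in noise) / n)
    fgn = [(w - mean) / std for w in noise]  # center and normalize, as in the text
    return fgn, list(accumulate(fgn))


fgn, fbm = synthesize_fbm(n=128, hurst=0.7)
```

For the window sizes used in this article (up to $2^{17}$ points), an FFT-based method such as the circulant embedding is the appropriate choice; the Cholesky route is shown only because it is short and easy to verify.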

Procedure
We estimate the Shannon entropy $\bar{H}_T^{(m,\tau)}$ with our own implementation of the k-nearest neighbors estimate from Kozachenko and Leonenko [37]. We estimate the auto-mutual information $\bar{I}_T^{(m,p,\tau)}$ with the algorithm provided by Kraskov, Stögbauer and Grassberger [38]. This estimator is also based on a nearest neighbors search and it provides, amongst several good properties, a built-in cancellation of the bias difference originating from each of the two arguments. In the following, we note k the number of neighbors, which is the only parameter of the estimators. The entropy rate $\bar{h}_T^{(m,\tau)}$ is then computed using eq.(17).
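A minimal sketch of a Kozachenko-Leonenko type estimate is given below for scalar samples only (m = 1). It is not the implementation used in this article: the function names are hypothetical, the neighbor search exploits the fact that in one dimension the k nearest neighbors of a point lie within k positions of it in the sorted data, and the digamma function is evaluated only at integer arguments through a harmonic sum. The estimate reads $\psi(N) - \psi(k) + \ln 2 + \frac{1}{N}\sum_i \ln \epsilon_i$, where $\epsilon_i$ is the distance from sample i to its k-th nearest neighbor.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma function at a positive integer n: -gamma + sum_{j<n} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def kl_entropy_1d(samples, k=5):
    """k-nearest-neighbor entropy estimate (in nats) for scalar samples.

    Assumes all samples are distinct (a zero distance would make the log diverge).
    """
    xs = sorted(samples)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of samples")
    log_sum = 0.0
    for i, x in enumerate(xs):
        lo, hi = max(0, i - k), min(n, i + k + 1)
        # The k nearest neighbors are among the k sorted points on either side.
        dists = sorted(abs(xs[j] - x) for j in range(lo, hi) if j != i)
        log_sum += math.log(dists[k - 1])
    return digamma_int(n) - digamma_int(k) + math.log(2.0) + log_sum / n


rng = random.Random(1)
samples = [rng.gauss(0.0, 1.0) for _ in range(4000)]
estimate = kl_entropy_1d(samples, k=5)
exact = 0.5 * math.log(2.0 * math.pi * math.e)  # entropy of a unit Gaussian, in nats
```

For embedded vectors with m > 1, the same formula applies with a multi-dimensional neighbor search and the appropriate unit-ball volume, which this sketch does not implement.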
We generate for each motion a set of 100 independent realizations of fixed size T with a Hurst exponent H = 0.7. We compute averages of the estimates on the realizations and use the standard deviation as error bars in the different graphs.
In subsections 3.1.3 and 3.1.4, we characterize respectively the bias and standard deviation (std) over realizations of our estimators of entropy, auto-mutual information and entropy rate.

Convergence / bias
We detail here how the ersatz entropy rate evolves with T and k. We report in Fig. 1a our results for all possible values of the couples $(\log_2 T, k) \in [9, \ldots, 17] \times [4, \ldots, 18]$, with τ set to 1. According to eq.(34), the ersatz entropy rate of the fBm converges for large T to the value $H_1^{\mathrm{fBm}}$ (horizontal black line in Fig. 1a) thanks to the vanishing of the correction term $C(\tau/T)$, according to (33). Fig. 1a can be interpreted as describing the behavior of the bias of the estimator. This bias vanishes non-monotonically with $k/T^{1/(m+1)}$. When $k/T^{1/(m+1)}$ is reduced, the bias is first positive, then decreases toward negative values, and finally converges to zero. This behavior was previously reported for the k-nn mutual information estimator applied to stationary processes [16,38,39], and we confirm that it is valid for the fBm. We observed the same convergence for a large range of scales τ > 1: the ersatz entropy rate then converges to $H_1^{\mathrm{fBm}} + H \ln \tau$ for large T, with the same behavior of the bias.

Standard deviation of the estimates
We present in Fig. 2a the evolution of the standard deviation of the ersatz entropy, mutual information and entropy rate with T for τ = 1. The standard deviation of both the entropy and the mutual information is large, and does not decrease when T (and hence the number of samples) increases. On the contrary, the standard deviation of the entropy rate is much smaller and decreases when T increases. We attribute this feature to the dependence of the quantities on the observation time T, see eqs.(30) and (31) for the fBm. While $\bar{H}_T$ and $\bar{I}_T$ increase as ln T, this is not the case for $\bar{h}_T$, which is independent of T (up to small corrections, negligible for small τ). Although it is difficult to explain why the standard deviations of the entropy and of the mutual information remain constant when T increases, it seems that this results from a balance between the non-stationarity (in ln T) and the increased statistics. On the contrary, for the entropy rate, which is stationary, the standard deviation decreases as expected.
As a conclusion, both the bias and the standard deviation of the ersatz entropy rate increase when k increases or T decreases, and can be made arbitrarily small by increasing the window size T. In the remainder of this article, we choose k = 5 and, when studying the dependence of information theoretic quantities on the scale τ, we set $T = 2^{16}$.

Dependence on times T and τ
In this section, we present a detailed numerical study of the ersatz entropy, auto-mutual information and entropy rate of the fBm with H = 0.7. In particular, we present a quantitative comparison with the analytical expressions (28,29) in the general framework, as well as with analytical expressions (30,31,34) in the practical framework for the fBm. These comparisons allow: first, to validate the analytical expressions obtained for fBm in the practical framework, and second to show that the information theoretic quantities in the practical framework evolve in T and τ exactly as their counterparts evolve in the general framework in t and τ. To compare analytical and numerical results, we vary the window size T, the scale τ and the embedding dimension m.

Entropy and auto-mutual information
Dependence on T
The left column of Fig. 3 shows the ersatz Shannon entropy $\bar{H}_T^{(m,\tau)}$ (Fig. 3a) and auto-mutual information $\bar{I}_T^{(m,1,\tau)}$ (Fig. 3c) at a given scale τ = 1, as a function of ln T. The evolution of these two quantities for m = 1 is very close to $H \ln T$, which is represented by a continuous black line. This is in agreement with eq.(30) and eq.(31). For m > 1, we obtain in the practical framework the behaviors predicted in the general framework, replacing t by T in the equations. We observe that the auto-mutual information does not depend on the embedding dimension m, while the entropy does, with an offset that seems to depend linearly on m. The dependence of the entropy and the auto-mutual information on the time window T is the signature of the non-stationarity of the signal.

Dependence on τ
The right column of Fig. 3 shows the ersatz Shannon entropy and auto-mutual information for a fixed window size $T = 2^{16}$ when varying the scale parameter τ. The ersatz Shannon entropy behaves as $(m - 1) H \ln \tau$, see Fig. 3b, in agreement with eq.(26) or eq.(28). The ersatz auto-mutual information behaves as $-H \ln \tau$ for any embedding dimension m, see Fig. 3d, in agreement with eq.(31), thus suggesting that this formula is valid for any embedding dimension.
Fig. 4a shows that the ersatz entropy rate $\bar{h}_T^{(m=1,\tau)}$ with embedding dimension m = 1 is almost constant when T is varied. For embedding dimensions m > 1, there is a small variation, of about 15%, much smaller than the 200% variation observed for either the entropy or the auto-mutual information (Fig. 3a,c) over the same range of T. This small dependence on T can be due to the correction in eq.(34), which may depend on m. We argue that it is mostly due to the bias, which increases with the embedding dimension. Indeed, we observe that the entropy rate seems to converge for larger T to the same value, close to $H_1^{\mathrm{fBm}}$, and that the bias is negative and larger than the theoretical correction. This suggests that the form of eq.(34) is still valid for embedding dimensions m > 1.
Fig. 4b shows that for a fixed window size $T = 2^{16}$ the ersatz entropy rate increases as $H \ln \tau$. We have added a black line defined by the linear function $H_1^{\mathrm{fBm}} + H \ln \tau$, as suggested by eq.(34) without the corrective term. This black line describes very well the evolution of the entropy rate with the scale τ, which is independent of the embedding dimension m.

Entropy rate dependence on scale τ
To observe the finer evolution of the entropy rate with the scale τ, we subtract the main contribution $H \ln \tau$ from the entropy rate and we plot $\bar{h}_T^{(m,\tau)} - H \ln \tau$ for different embedding dimensions in Fig. 5. We observe a slight increase, which is larger for larger embedding dimensions. For m = 1, the correction term can be evaluated from eq.(32); it is at most $2 \times 10^{-3}$ and does not account for the evolution reported here. This evolution is probably due to the bias, which increases when the number of points (which is proportional to T/τ) decreases and when the embedding dimension m increases.
For a scale-invariant self-similar process, the standard deviation $\sigma_\tau$ of the increments of size τ behaves as $\sigma_\tau = \sigma_1 \tau^H$. Subtracting $H \ln \tau$ amounts to subtracting $\ln \sigma_\tau$: for each scale τ, this corresponds to normalizing the down-sampled data (taking one point every τ points) by the standard deviation $\sigma_\tau$ of the increments of size τ. When the Hurst exponent is a priori unknown, $\sigma_\tau$ can be computed and used to evaluate the main contribution $\ln \sigma_\tau$; the fine evolution of the entropy rate with τ can thus be used as a tool to probe deviations from the self-similarity assumption, which is interesting for multifractal signals.
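The normalization described above only requires the standard deviation of the increments at each scale. The sketch below computes $\sigma_\tau$, estimates the Hurst exponent as the least-squares slope of $\ln \sigma_\tau$ against $\ln \tau$, and subtracts $\ln \sigma_\tau$ from a given entropy rate value. The function names are hypothetical, and the example uses an ordinary random walk (H = 1/2) rather than the fBm of this section.

```python
import math
import random
from itertools import accumulate


def increment_std(x, tau):
    """Standard deviation of the increments x[t] - x[t - tau]."""
    inc = [x[t] - x[t - tau] for t in range(tau, len(x))]
    mean = sum(inc) / len(inc)
    return math.sqrt(sum((d - mean) ** 2 for d in inc) / len(inc))


def hurst_from_increments(x, taus):
    """Least-squares slope of ln(sigma_tau) against ln(tau)."""
    xs = [math.log(tau) for tau in taus]
    ys = [math.log(increment_std(x, tau)) for tau in taus]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)


def normalized_entropy_rate(entropy_rate, x, tau):
    """Subtract ln(sigma_tau) from an entropy rate estimated at scale tau."""
    return entropy_rate - math.log(increment_std(x, tau))


# Hypothetical example: a Brownian-like random walk, for which H = 1/2.
rng = random.Random(2)
walk = list(accumulate(rng.gauss(0.0, 1.0) for _ in range(20000)))
hurst = hurst_from_increments(walk, taus=[1, 2, 4, 8, 16])
```

For a self-similar process, the normalized entropy rate is expected to be flat in τ; any residual trend then reflects estimator bias or a departure from self-similarity.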

Application of the practical framework to non-Gaussian self-similar processes
In this section, we turn to non-Gaussian processes and describe the results obtained when the time-averaged framework is extended to this larger class of processes.

Procedure
We construct two different motions, in the very same way as we did for the fBm. We integrate two log-normal noises synthesized with the same log-normal marginal distribution and with the same correlation function (20) as the fGn, but with different dependence structures. To generate these noises, we use the methodology proposed in [36] to obtain the log-normal marginal by applying two different transformations to the cumulative distribution function $F_Z$ of a Gaussian white noise Z: the Hermitian transformation of rank 1, $f_1(z) = F^{-1}(F_Z(z))$, and the even-Hermitian transformation of rank 2: where F is the cumulative distribution function of the targeted log-normal distribution. This synthesis is performed with the toolbox provided at www.hermir.org. Once the two log-normal noises have been generated, they are integrated using eq.(2) to obtain two non-stationary scale-invariant processes with non-Gaussian statistics.
The dependence structures of the two log-normal noises were previously studied in detail [16]: while the correlation function is the same for the two noises, and identical to the targeted one of the fGn given by (20), the complete dependence structure was shown to be different.
To study these two non-stationary and non-Gaussian motions, we again use realizations of $T = 2^{16}$ points and k = 5 neighbors, and we focus on the case of embedding dimension m = 1 and Hurst exponent H = 0.7.

Bias and standard deviation
We report in Fig. 1b and 1c the evolution of the ersatz entropy rate of the Hermitian and the even-Hermitian log-normal processes as a function of $k/T^{1/2}$. We observe exactly the same behavior as for the fBm: the entropy rate converges to $H_1^{\mathrm{ln}}$, the entropy of the log-normal process at unit time (see footnote 1), indicated by the horizontal blue/red line in Fig. 1b,c. This gives an estimate of the bias of our estimator, which appears to be the same as for the fBm.
We report in Fig. 2b and Fig. 2c the behavior of the standard deviation of the estimators. Again, exactly as for the fBm, the standard deviation is large for the ersatz entropy and the ersatz auto-mutual information, while it is much smaller for the ersatz entropy rate.
Again, both the bias and the standard deviation of the entropy rate increase when k increases or T decreases and can be made arbitrarily small by increasing T. These results do not depend on the marginal distribution: they have been obtained not only for the fBm with Gaussian statistics, but also for two motions built on log-normal noises.
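For reference, the entropy of a log-normal variable has a standard closed form, which is convenient when checking the limiting value $H_1^{\mathrm{ln}}$. The sketch below is an illustration only: the function names are hypothetical, and it uses the textbook relations between the mean and standard deviation of X and those of log X.

```python
import math

def lognormal_log_params(mean, std):
    """Mean and standard deviation of log X, given those of a log-normal X."""
    var_log = math.log(1.0 + (std / mean) ** 2)
    mu_log = math.log(mean) - 0.5 * var_log  # = log(mean^2 / sqrt(mean^2 + std^2))
    return mu_log, math.sqrt(var_log)

def lognormal_entropy(mean, std):
    """Differential entropy (in nats) of a log-normal variable."""
    mu_log, sigma_log = lognormal_log_params(mean, std)
    return mu_log + 0.5 * math.log(2.0 * math.pi * math.e * sigma_log ** 2)

# Example: X = exp(Z) with Z standard Gaussian, so log X has mean 0 and std 1.
mean_x = math.exp(0.5)
std_x = math.sqrt((math.e - 1.0) * math.e)
print(lognormal_log_params(mean_x, std_x))
print(lognormal_entropy(mean_x, std_x))
```

Because differential entropy is invariant under translation, centering the log-normal noise does not change this value.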

Dependence on times T and τ
The evolution of $\overline{h}^{(1,\tau)}_T$ with the time window size T for the two motions is presented in Fig. 6a. As was the case for the fBm, $\overline{h}^{(1,\tau)}_T$ depends only weakly on T, and seems to converge for larger T to the value $H_1^{\mathrm{ln}}$, up to a small corrective term.

Footnote 1: if X is a log-normal process of mean µ and standard deviation σ, then the process log X is Gaussian with mean $\tilde{\mu} = \log\left(\mu^2/\sqrt{\mu^2+\sigma^2}\right)$ and standard deviation $\tilde{\sigma} = \sqrt{\log\left(1+\sigma^2/\mu^2\right)}$, and the entropy of X can be expressed as [16]: $H(X) = \tilde{\mu} + \frac{1}{2}\ln\left(2\pi e\,\tilde{\sigma}^2\right)$.

The evolution of $\overline{h}^{(1,\tau)}_T$ with the time scale τ is presented in Fig. 6b. In the same way as for the fBm, we again observe a large increase, almost proportional to ln τ. Because this strong tendency originates from the increase of the standard deviation $\sigma_\tau$ of the increments of size τ when τ increases, we again normalize the entropy rate by subtracting $\ln \sigma_\tau = H \ln \tau$. Results are presented in Fig. 7, together with results for the fBm with m = 1 for comparison.
The normalized ersatz entropy rate of the motion built from the even-Hermitian log-normal noise appears almost independent of τ. This behavior is identical to the one observed for the fBm, but the remaining constant value is different ($H_1^{\mathrm{fBm}}$ for the fBm and $H_1^{\mathrm{ln}}$ for the even-Hermitian motion). The ersatz entropy rates of the fBm (in black) and of the even-Hermitian motion (in red) both behave exactly as H ln τ, which is the expected behavior for a self-similar process, see eq. (27). On the contrary, the motion built with the Hermitian transformation of rank 1 exhibits an additional variation in τ: the normalized entropy rate $\overline{h}^{(m,\tau)}_T - \ln(\sigma_\tau)$ evolves from the value $H_1^{\mathrm{ln}}$ at τ = 1 (expected for a motion built with a log-normal noise, and obtained for the even-Hermitian process at any τ) up to the value $H_1^{\mathrm{fBm}}$ (expected for a Gaussian process, and obtained for the fBm at any τ).
In conclusion, one can estimate the Hurst exponent of a perfectly self-similar process as the slope of the linear fit in ln τ of the ersatz entropy rate. This is a valid approach for the fBm and for the motion built from the noise constructed with the even-Hermitian transformation, because the ersatz entropy rate then behaves linearly in ln τ. On the contrary, the motion built using a Hermitian transformation of rank 1 does not appear to be perfectly self-similar. This can be verified by plotting the normalized PDFs (setting the standard deviation to unity) of the increments $m_t - m_{t-\tau}$ of the motions for various values of τ. As can be seen in Fig. 8, the PDFs of the increments of the "standard log-normal process" vary with the scale τ, while those of the "even-Hermitian motion" remain identical. For τ = 1, the increments are nothing but the noises themselves, which are log-normal, as prescribed. For large τ, the increments of the "even-Hermitian motion" remain log-normal, while the increments of the "standard log-normal motion" deform and seem to become more Gaussian. The ersatz entropy rate captures this fine evolution.
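The deformation of the normalized increment PDFs with τ can also be checked with a crude scalar indicator such as the skewness, which vanishes for a Gaussian PDF. The sketch below is a simplified illustration under stated assumptions: the motion is built from an uncorrelated log-normal noise (not the correlated noises of this section), the log-standard deviation 0.5 is arbitrary, and skewness stands in for the full PDF comparison of Fig. 8.

```python
import math
import random

def skewness_of_increments(x, tau):
    """Skewness of the increments x[t] - x[t - tau]."""
    inc = [x[t] - x[t - tau] for t in range(tau, len(x))]
    n = len(inc)
    mean = sum(inc) / n
    var = sum((v - mean) ** 2 for v in inc) / n
    third = sum((v - mean) ** 3 for v in inc) / n
    return third / var ** 1.5

# Assumption: white log-normal noise, centered and integrated by cumulative sum.
rng = random.Random(2)
noise = [rng.lognormvariate(0.0, 0.5) for _ in range(20000)]
mean_noise = sum(noise) / len(noise)
motion, s = [], 0.0
for v in noise:
    s += v - mean_noise
    motion.append(s)

skew_small = skewness_of_increments(motion, 1)   # increments = the noise itself
skew_large = skewness_of_increments(motion, 64)  # sums of 64 noise samples
print(skew_small, skew_large)
```

In this toy case the skewness decreases with τ, so the increments are not identically distributed across scales once normalized, which is the signature of a motion that is not perfectly self-similar.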

Application of the practical framework to a multifractal process
We now explore the proposed time-averaged framework on the multifractal random walk, to illustrate how it performs on a multifractal process. The multifractal random walk (MRW) [40,41] is a popular multiplicative cascade process widely used to model systems that exhibit multifractal properties [42]. Like the fBm, the MRW is a motion obtained by integrating, again with eq. (2), a stationary noise $W^{\mathrm{MRW}} = \{w^{\mathrm{MRW}}_t\}_{t\in\mathbb{R}}$. This noise is built from $W^{\mathrm{fGn}} = \{w^{\mathrm{fGn}}_t\}_{t\in\mathbb{R}}$, a fGn with parameter $H_{\mathrm{fGn}}$, and from $\Omega = \{\omega_t\}_{t\in\mathbb{R}}$, a Gaussian random process independent of $W^{\mathrm{fGn}}$ whose correlation function depends on the parameter $c_2$ and on the integral scale L, set here to L = T. The MRW is a scale-invariant process: the power spectrum of its time-derivative $W^{\mathrm{MRW}}$ behaves as a power law with an exponent $2(H_{\mathrm{fGn}} - c_2) + 1$, which is the exponent that would be obtained for a fGn with parameter $H_{\mathrm{fGn}} - c_2$. Any moment of order q of the increments of size τ behaves as a power law of τ with the exponent ζ(q). Contrary to the fBm, the MRW is not exactly self-similar and exhibits intermittency: $\zeta(q) = H_{\mathrm{fGn}}\, q - \frac{c_2}{2} q^2$ is not a linear function of q, contrary to what is expected for a self-similar process. As a consequence, the shape of the PDF of the increments depends on the scale.
We choose the parameter $H_{\mathrm{fGn}}$ such that the power spectrum of the noise $W^{\mathrm{MRW}}$ is identical to the one of the fBm used in the former sections, i.e., $H_{\mathrm{fGn}} = 0.7 + c_2$. We set the parameter $c_2 = 0.025$, a value widely used to model the intermittency of Eulerian turbulent velocity fields [23]. Figure 9 compares the evolution of the PDF of the increments of the fBm and the MRW. As expected, no change is observed for the fBm, while the PDF of the MRW has wider tails for smaller τ. The fBm is perfectly self-similar, while the MRW exhibits intermittency [43]: the PDF of its increments is deformed when the scale τ of the increments is varied, although no analytical expression of the PDF is available.
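The effect of $c_2$ on the scaling exponents can be made concrete with a short numerical check of $\zeta(q) = H_{\mathrm{fGn}}\, q - \frac{c_2}{2} q^2$ for the parameter values chosen above. The function name `zeta` is introduced for this example only.

```python
def zeta(q, hurst_fgn, c2):
    """Scaling exponent of the moment of order q of the MRW increments."""
    return hurst_fgn * q - 0.5 * c2 * q ** 2

C2 = 0.025
H_FGN = 0.7 + C2  # chosen so that the second-order exponent matches H = 0.7

for q in (1, 2, 3, 4):
    # For a self-similar process, zeta(q) / q would be constant in q.
    print(q, zeta(q, H_FGN, C2), zeta(q, H_FGN, C2) / q)
```

With these values, ζ(2) = 2 × 0.725 − 0.0125 × 4 = 1.4, the same second-order exponent as a fBm with H = 0.7, while ζ(q)/q decreases with q, which is the intermittency correction.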
We apply our practical framework and plot in Fig. 10a the evolution of the ersatz entropy rate of the MRW with T. Again, the entropy rate seems to be independent of T. We nevertheless observe a small tendency to increase towards the value $H_1^{\mathrm{MRW}}$, the entropy of the MRW at unit time. Here, because there is no analytical expression of the PDF, $H_1^{\mathrm{MRW}}$ cannot be derived analytically and we estimate its value numerically.
The dependence on τ is plotted in Fig. 10b. We again observe a strong linear evolution of the ersatz entropy rate in H ln τ. After subtracting this strong tendency (Fig. 11), we still observe an evolution with τ, but this evolution appears much weaker than for the Hermitian log-normal motion (blue curve in Fig. 7). Indeed, the deformation of the PDFs of the increments when varying τ is much slower for the MRW (Fig. 9b) than for the Hermitian log-normal motion (Fig. 8a).

Discussion and Conclusions
We proposed a new framework in information theory to analyze a non-stationary process by considering it as resulting from a gedanken stationary process and estimating the PDF by accumulating all available samples in a time interval of size T. This framework hence considers a PDF obtained by time-averaging over a time window $[t_0, t_0 + T]$, and then proceeds to compute the associated information theory quantities. In particular, the ersatz entropy $\overline{H}_T(X)$ defined in this way can be interpreted as the amount of information characterizing the complete trajectory $\{X_t, t \in [t_0, t_0 + T]\}$ of the process X. If we assume that the increments of X are stationary and centered, then $\overline{H}_T(X)$ and all other ersatz information-theoretic quantities depend only on the duration T and not on the initial time $t_0$.
We illustrated our approach by focusing first on a model system: the fractional Brownian motion. We derived in this context the analytical expressions of the ersatz entropy, ersatz auto-mutual information and ersatz entropy rate, which allowed a pedagogical description of our new information theory quantities. We also reported how the ersatz quantities behave when the time-interval size T and the embedding time scale τ are varied: we obtained analytical expressions for embedding dimension m = 1, and confirmed them numerically for m ≥ 1. Besides the fBm, we reported numerical observations for various self-similar or multifractal processes. The ersatz entropy $\overline{H}^{(m,\tau)}_T$ always diverges logarithmically in T, while the ersatz entropy rate $\overline{h}^{(m,\tau)}_T$ is always almost independent of T. Examining how the ersatz entropy rate $\overline{h}^{(m,\tau)}_T$ depends on the scale τ provides a fine exploration of either the self-similarity or the multifractality of the process.
This exploration of the multifractality of a non-stationary process with stationary increments using the ersatz entropy rate $\overline{h}^{(m,\tau)}_T(M)$ gives a viewpoint very similar to the one obtained when analyzing the increments of the process with the regular Shannon entropy, as reported in [43]. We are currently investigating how to relate the two approaches quantitatively.
In the same vein, the ersatz entropy rate allowed us to discriminate two different non-stationary processes and to reveal fine differences in their self-similarity properties (Fig. 7), in close relation to a method using the entropy rate of the increments of the signal, as presented in [16]. A possible connection is also under investigation.
Throughout this article, we have estimated the ersatz quantities of a process on a single trajectory $[t_0; t_0 + T]$ of this process; this situation corresponds to the worst-case scenario where only a single realization of the process is known. If enough experimental data are available, one can improve the estimation of the ersatz quantities in two ways. First, if the same experiment has been conducted multiple times, and thus multiple realizations are available over the time interval $[t_0; t_0 + T]$, one can use all these independent realizations to enhance the estimation of the time-averaged PDF. Second, if a single but long enough realization of size $T' \gg T$ is available, one can split it into multiple time intervals $[kT; (k + 1)T]$, $k \in [0..T'/T]$, and then use these intervals as independent realizations as in the first case. This latter situation is made possible by the assumption that the increments of the signal are not only stationary, but also centered.
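The second strategy, splitting one long realization into windows of size T, can be sketched as follows. This is a minimal illustration with hypothetical names; the optional `subtract_start` flag, which refers each window to its first sample so that all windows start from the same value, is an assumption of this example and not a step prescribed in the text.

```python
def split_into_windows(x, window, subtract_start=False):
    """Split a long trajectory into non-overlapping windows of equal size.

    Trailing samples that do not fill a complete window are discarded.
    """
    n_windows = len(x) // window
    segments = [x[k * window:(k + 1) * window] for k in range(n_windows)]
    if subtract_start:
        # Assumption: express each window relative to its first sample.
        segments = [[v - seg[0] for v in seg] for seg in segments]
    return segments

# Example: a trajectory of 10 samples split into windows of 4 samples.
trajectory = [0.0, 0.5, 0.2, 0.9, 1.4, 1.1, 1.8, 2.0, 2.6, 2.3]
windows = split_into_windows(trajectory, 4)
print(len(windows), windows[1])
```

Each returned window can then be treated as one realization over an interval of duration T when building the time-averaged PDF.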