Entropy Production in Stochastics

While the modern definition of entropy is genuinely probabilistic, in entropy production the classical thermodynamic definition, as in heat transfer, is typically used. Here we explore the concept of entropy production within stochastics and, particularly, two forms of entropy production in logarithmic time, unconditionally (EPLT) or conditionally on the past and present having been observed (CEPLT). We study the theoretical properties of both forms, in general and in application to a broad set of stochastic processes. A main question investigated, related to model identification and fitting from data, is how to estimate the entropy production from a time series. It turns out that there is a link of the EPLT with the climacogram, and of the CEPLT with two additional tools introduced here, namely the differenced climacogram and the climacospectrum. In particular, EPLT and CEPLT are related to slopes of log-log plots of these tools, with the asymptotic slopes at the tails being most important as they justify the emergence of scaling laws of second-order characteristics of stochastic processes. As a real-world application, we use an extraordinary long time series of turbulent velocity and show how a parsimonious stochastic model can be identified and fitted using the tools developed.


Introduction
Entropy was first recognized as a probabilistic concept in 1887 by Boltzmann [1], who established a relationship of entropy with probabilities of statistical mechanical system states, thus explaining the Second Law of Thermodynamics as the tendency of the system to run toward more probable states.In 1948 Shannon [2] used an essentially similar, albeit more general, definition of entropy as a probabilistic concept, a measure of information or, equivalently, uncertainty.In 1957 Jaynes [3] introduced the principle of maximum entropy thus equipping the entropy concept with a powerful tool for logical inference.
A decade later, probabilistic entropy and the principle of maximum entropy were used in geophysical sciences and particularly hydrology, initially for parameter estimation of models [4] and probability distributions [5].Detailed reviews on the use of entropy in applications in hydrology, and water and environmental engineering have been provided by Singh ([6,7] and more recently [8][9][10][11]).Most applications of probabilistic entropy are static, disregarding time.In studying systems out of the equilibrium, time should necessarily be involved (e.g., [12]) and the paths of the systems should be inferred.The more recently developed, within non-equilibrium thermodynamics, extremum principles of entropy production attempt to predict the most likely path of system evolution [13][14][15].Central among the latter are Prigogine's minimum entropy production principle [16] and Ziegler's maximum entropy production principle [17,18].
Entropy production in thermodynamic systems is typically defined in terms of the derivative of entropy with respect to time, while the entropy flux in open systems is also considered.Niven and Ozawa [19] provide a general definition of entropy production along with a brief review of applications of extremum principles in geophysics and particularly hydrology.It can be seen that in

•
It is dimensionless as d(ln t) = dt/t and thus, given that entropy is a dimensionless quantity per se (see next section), the EPLT remains a dimensionless quantity.

•
It is consistent with the notion of the 'arrow of time'.Usually entropy per se is related to the time asymmetry and the Second Law is regarded as the origin of the arrow.However, it may be simpler to think that it is the action of observation that creates time asymmetry.In this respect, we can set t = 0 for the observation time, i.e., the present, while t < 0 denotes the observable past and t > 0 denotes the unknown future.The fact that the past is observable, but the future not, generates the asymmetry.It is clarified that when the future becomes observable, it is no longer future; rather it has become present or past.Once the future (t > 0) is treated separately from the past, it is legitimate to differentiate with respect to the logarithm of time.

•
It makes extremization of entropy production easier, particularly for asymptotic times t → 0 and t → ∞, as it avoids infinite or zero values of entropy production [22]; indeed, the asymptotic values of EPLT are always bounded (see Section 2.4).

•
It provides the means to study or even explain the scaling behaviour often observed in geophysical processes (see Sections 2.2 and 2.4).
Here we advance the study of the EPLT concept, with particular emphasis on the conditional entropy production (CEPLT), when the past and present have been observed (Section 2.2).The theoretical properties of EPLT and CEPLT are studied in a general setting and in application to a broad set of specific types of stochastic processes (Sections 3.1 and 3.2).A main question studied is how the entropy production can be estimated from time series, i.e., realizations of stochastic processes.It turns out that there is a strong link of the EPLT with the climacogram, with the slope of a log-log plot of the latter representing the EPLT.Further, we introduce two new tools, additional to the climacogram and based on it, the differenced climacogram and the climacospectrum, where the latter has properties similar to those of the power spectrum.The slope of a log-log plot of either of the two additional tools can be used as an estimator of CEPLT.Of particular importance are the asymptotic slopes for time scale tending to zero or infinity (Section 3.3).To illustrate the theoretical and empirical properties of EPLT and CEPLT using real-world data and to test the applicability of the specific models proposed, a long time series of very fine resolution is needed in order to allow viewing the real-world behaviour at the far-left and the far-right tails.Fine resolution measurements of turbulence can provide an ideal test series and indeed a time series of length 36 × 10 6 of turbulent velocities is utilized for illustration and testing (Section 3.4).
To avoid bothering the reader with derivations and theoretical details, most of them have been put in a series of annexed sections organized as Appendix A.1.In addition, the details of the specific stochastic models and their behaviour have been grouped in Appendix A.2.While the two appendices are made in a stand-alone form separate from the body of the article, they are perhaps the most essential part of the study.

General Context
The evolution of a system state x over time t is usually represented as a trajectory x(t), but such a description is often inefficient.Instead, the evolution can be represented as a stochastic process x(t), which is a collection of (infinitely many) random variables x indexed by t.A realization (sample) x(t) of x(t) is a trajectory; if it is known at certain points t i , i = 1, 2, . . ., the realization is called a time series.
A random variable x is an abstract mathematical entity associated with a probability density (or mass) function f (x); the realizations x of x belong to a set of possible numerical values.Notice the different notation of random variables and hence stochastic processes (underlined, according to the Dutch notation [23]) from regular ones.
Most natural processes evolve in continuous time but they are observed in discrete time, typically by averaging.Accordingly, the stochastic processes devised to represent the natural processes should evolve in continuous time and be converted into discrete time, as illustrated in Figure 1.

General Context
The evolution of a system state x over time t is usually represented as a trajectory x(t), but such a description is often inefficient.Instead, the evolution can be represented as a stochastic process x(t), which is a collection of (infinitely many) random variables x indexed by t.A realization (sample) x(t) of x(t) is a trajectory; if it is known at certain points t i , i = 1, 2, …, the realization is called a time series.
A random variable x is an abstract mathematical entity associated with a probability density (or mass) function f(x); the realizations x of x belong to a set of possible numerical values.Notice the different notation of random variables and hence stochastic processes (underlined, according to the Dutch notation [23]) from regular ones.
Most natural processes evolve in continuous time but they are observed in discrete time, typically by averaging.Accordingly, the stochastic processes devised to represent the natural processes should evolve in continuous time and be converted into discrete time, as illustrated in Figure 1.Explanatory sketch for a stochastic process in continuous time and in discrete time.Note that the graphs display a realization of the process (it is impossible to display the process as such) while the notation is for the process per se (adapted from [24]).
While a stochastic process denotes, by conception, change (process = a series of changes), there should be some properties that are unchanged in time.This implies the concept of stationarity [25], which is central in stochastics.According to Kolmogorov's definition [26], a stochastic process x(t) is stationary if the distributions of ( ), ( ), … , ( ) and ( + ), ( + ), … , ( + ) coincide for any , , , … , , .By negation, in a nonstationary process the probability density ( ), ( ), … , ( ) for some (or all) τ is not equal to ( + ), ( + ), … , ( + ) , which means that the mathematical expression of the latter should explicitly contain the time shift τ, or else that it should be a deterministic function of τ.
Stationarity is closely related to ergodicity, which in turn is a prerequisite to make inference from data, that is, induction.Within the stochastics domain ( [20]; [27] (p.427)) ergodicity is defined in the following manner: A stochastic process x(t) is ergodic if the time average of any function ( ) , as time tends to infinity, equals the true (ensemble) expectation E ( ) This allows the estimation (i.e., approximate calculation) of the true but unknown quantity E ( ) (e.g., the true average E ( ) ) from the available data, i.e., from the sample mean).Without stationarity there is no ergodicity and without ergodicity inference from data is impossible.More details about stationarity and ergodicity, and their importance, are provided by Koutsoyiannis and Montanari [25], along with highlights of the misconceptions and abuses of these concepts which abound in the literature.
For the remaining part of this article, unless otherwise stated, the processes are assumed to be stationary and ergodic, noting that nonstationary processes should be converted to stationary before Figure 1.Explanatory sketch for a stochastic process in continuous time and in discrete time.Note that the graphs display a realization of the process (it is impossible to display the process as such) while the notation is for the process per se (adapted from [24]).
While a stochastic process denotes, by conception, change (process = a series of changes), there should be some properties that are unchanged in time.This implies the concept of stationarity [25], which is central in stochastics.According to Kolmogorov's definition [26], a stochastic process x(t) is stationary if the distributions of (x(t 1 ), x(t 1 ), . . . ,x(t n )) and (x(t 1 + τ), x(t 1 + τ), . . ., x(t n + τ)) coincide for any n, t 1 , t 2 , . . ., t n , τ.By negation, in a nonstationary process the probability density f (x(t 1 ), x(t 1 ), . . . ,x(t n )) for some (or all) τ is not equal to f (x(t 1 + τ), x(t 1 + τ), . . ., x(t n + τ)), which means that the mathematical expression of the latter should explicitly contain the time shift τ, or else that it should be a deterministic function of τ.
Stationarity is closely related to ergodicity, which in turn is a prerequisite to make inference from data, that is, induction.Within the stochastics domain ( [20,27] (p.427)) ergodicity is defined in the following manner: A stochastic process x(t) is ergodic if the time average of any function g(x(t)), as time tends to infinity, equals the true (ensemble) expectation E[g(x(t))].This allows the estimation (i.e., approximate calculation) of the true but unknown quantity E[g(x(t))] (e.g., the true average E[x(t)]) from the available data, i.e., from the sample mean).Without stationarity there is no ergodicity and without ergodicity inference from data is impossible.More details about stationarity and ergodicity, and their importance, are provided by Koutsoyiannis and Montanari [25], along with highlights of the misconceptions and abuses of these concepts which abound in the literature.
For the remaining part of this article, unless otherwise stated, the processes are assumed to be stationary and ergodic, noting that nonstationary processes should be converted to stationary before their study.For example, the cumulative process X(t) in Figure 1 is nonstationary, but by differentiating it in time we obtain the stationary process x(t).
Without loss of generality we assume that the mean of the process x(t) is zero (E[x(t)] = 0) and we denote its variance as: Let: be the variances of the cumulative and the averaged process, respectively (note that Γ(0) = 0 and γ(0) = γ 0 ).While Γ(t), the variance of the nonstationary process X(t), is a function of t, γ(k) is the variance of the stationary process x τ ] and thus it is not a function of time τ.We call Γ(t) and γ(k), as functions of time t and time scale k, the cumulative climacogram and the climacogram of the process, respectively.
The climacogram is the second central moment of the process, as a function of time scale, and thus it is a second-order characteristic of the process.Other customary second order characteristics of a stationary process are the autocovariance function c(h), where h denotes time lag, the power spectrum s(w), where w denotes frequency, and the structure function v(h) (also known, for a stationary process, as the semivariogram or variogram).The latter has a direct analogy based on the climacogram, the climacostructure function ξ(k).Definitions and notation for these second order characteristics are contained in Table 1 for the continuous time domain and in Table 2 for the discrete time case.All of these functions are transformations of one another, as shown in Table 3.
The climacogram, like the autocovariance function, is a positive definite function (see proof in Appendix A.1.2) but of the time scale k, rather than the time lag h.It is not as popular as the other tools but it has several good properties due to its simplicity, close relationship to entropy (see below), and more stable behaviour, which is an advantage in model identification and fitting from data.In particular, when estimated from data, the climacogram behaves better than all other tools, which involve high bias and statistical variation [24,28].
Variance, instantaneous Cumulative climacogram Power spectrum (or spectral density) Frequency, continuous w = 1/k Units of inverse time (11) Table 2. Definition of main characteristics and notations for a stochastic process in discrete time and their relationship with those in continuous time; see Figure 1 for clarification.

Name of Quantity or Characteristic Symbol and Definition Remarks Equation
Stochastic process, discrete time Characteristic variance Var[ x Time scale, discrete κ := k/D Dimensionless quantity (16) Autocovariance function c Structure function v Power spectrum s Frequency, discrete ω := wD = 1/κ Dimensionless quantity (20) Note: In time-related quantities, Latin letters denote dimensional quantities and Greek letters dimensionless ones.The Latin i, j, l may also be used as integers to denote quantities τ, η, κ, depending on the context.
Table 3. Relationships between several characteristics of a process in continuous and discrete time.

Characteristics
Direct Relationship Inverse Relationship Equation (κD) 2 where, in recursive mode, The climacogram involves bias too, but this can be determined analytically and included in the estimation.Specifically, assuming that we have n observations of the averaged process x (k) i , at scale k = κD, so that T = nk is the observation period (rounded off to an integer multiple of k), the standard statistical estimator of variance has expected value: where θ is the bias correction coefficient, given by [22,24]: It becomes clear from the above equations, which are valid for any process, that direct estimation of the variance γ(k) for any time scale k, let alone that of the instantaneous process γ 0 , is not possible merely from the data.We need to know the ratio γ(T)/γ(k) and thus we should assume a stochastic model which evidently influences the estimation of γ(k).Once the model is assumed and its parameters estimated based on the data, we can expand our calculations to estimate the variance for any time scale, including that of the instantaneous scale γ 0 .

Entropy and Entropy Production
Historically entropy was introduced in thermodynamics but its rigorous definition was given within probability theory.Thermodynamic and probabilistic entropy are regarded by many as different concepts having in common only their name.However, here we embrace the view that they are essentially the same thing, as has articulated elsewhere [29,30].According to the latter interpretation, the mathematical description of thermodynamic systems could be produced by the probabilistic definition of entropy.As an example indicating how powerful this interpretation could be, the law of phase change transition of water (Clausius-Clapeyron equation) has been produced in [30], with impressive agreement with reality, by maximizing, in a formal probabilistic frame, the entropy, i.e., the total uncertainty about the state of a single molecule of water.
In this frame, entropy (often called Boltzmann-Gibbs-Shannon entropy to give credit to its pioneers and distinguish it from other forms of generalized entropies) is a dimensionless measure of uncertainty defined as follows:

•
For a discrete random variable x with probability mass function P j := P{x = x j }, j = 1, . . ., w: • For a continuous random variable z with probability density function f (z): where m(x) is the density of a background measure (usually Lebesgue, i.e., m(x) = 1, with dimensions [x −1 ]).Entropy acquires its importance from the principle of maximum entropy [3], which postulates that the entropy of a random variable should be at maximum, under some conditions, formulated as constraints, which incorporate the information that is given about this variable.Its physical counterpart, the tendency of entropy to become maximal (Second Law of thermodynamics) is the driving force of natural change.
The definition of the entropy of a stochastic process directly follows that of a random variable.Thus, the entropy of the cumulative process X(t) with probability density function f (X; t) is a dimensionless quantity defined as: In a stochastic process, which by definition involves time, it is natural to consider the change of entropy in time.Intuitively, for the stochastic process X(t) we can define entropy production as the time derivative, i.e., Φ [X(t)] := dΦ[X(t)]/dt.This has dimensions of inverse time.Koutsoyiannis [22] defined the entropy production in logarithmic time (EPLT) as the derivative of entropy in logarithmic time: and showed that it has some advantages over Φ'[x(t)] as described above.
Assuming that the background density is constant, m(X) ≡ m, and using simple constraints related to the preservation of the mean and variance, the principle of maximum entropy yields that the probability distribution of X will be Gaussian.For a Gaussian process the entropy depends on its variance Γ(t) and is: Hence: Because of the simplicity of the Gaussian case, we are basing further analyses on that assumption, noting that for large times, due to the Central Limit Theorem, the cumulative process X(t) will tend to Gaussian, even if for the smallest scales it deviates from Gaussian.
A question arises whether φ(t) is a totally abstract concept useful for theoretical analyses only or it can be related to data, and visualized or estimated from them.The reply is fortunate if, in addition, we consider another concept, the log-log derivative (LLD) of a function f (x), formally expressed by: This is visualized as the (local) slope on the popular log-log plot (plot of log f (x) vs. log x).It is then seen that ( 37) can be written as: In other words, EPLT is visualized and estimated by the slope of a log-log plot of the climacogram.The quantity φ(t) represents the entropy production when nothing is known (observed) about the process realization.More interesting is the conditional EPLT (CEPLT, φ C (t)) when the past (t < 0) and the present (t = 0) are observed.In this case, instead of the unconditional variance Γ(t) we should use a variance Γ C (t) conditional on the past and present.It has been shown [22,24] that for a Markov process (see definition of the Markov process in Section 3.1 and its details in Appendix A.2.1), the conditional variance is: We note that the difference γ(k) − γ(2k) is always nonnegative (this is shown in Appendix A.1.3,Equation (A21)) and thus it can indeed represent a (conditional) variance.
For a non-Markov process the following adaptation is necessary: where ε is a constant, ensuring that as k → ∞, γ C (k)/γ(k) → 1 ; the rationale of this limit is that the information about the past and present will tend not to affect the far distant future.As shown in the Appendix A.1.4,its value is: It is easily verified that for a Markov process in which γ # (∞) = −1 (see Appendix A.2.1), ε = 2, while for a Hurst-Kolmogorov process (original or filtered; see details in [24], in next session and in Appendix A.2) for which γ where H is the Hurst parameter.
Due to its mathematical form, we will refer to γ C (k) as the differenced climacogram.The inverse transformation, i.e., that giving the climacogram γ(k) once the differenced climacogram γ C (k) is known, can be easily deduced by iterative evaluations also observing that γ(∞) = γ C (∞) = 0 (this is a necessary condition in order for a stochastic process to be ergodic [27] (p.429)) and takes both the following forms: whereas for numerical evaluations it should be kept in mind that for large time scales k, γ(k) ≈ γ C (k).
Note that (43) entails: and this is valid for any k.The importance of the differenced climacogram stems from the fact that it is directly related to the CEPLT, through its obvious relationship (similar to (39)): As the differenced climacogram can be easily visualized from data, because it only needs calculation of variances, it directly provides a means for estimation and visualization of the CEPLT in terms of the slope in a log-log plot of the differenced climacogram.
In Appendix A.1.5 it is shown that the asymptotic properties of γ C (k) are: In other words, we need both the climacogram γ(k) and the climacostructure function ξ(k) to express the asymptotic behaviour of γ C (k).Once this is known, the asymptotic behaviour of CEPLT is also known from (45).
As shown in Appendix A.1.6, the differenced climacogram for persistent processes (see their definition in next subsection) is not integrable in (0, ∞).However, by slightly changing it multiplying with k, we can obtain an additional tool which is integrable and has some additional good properties.Specifically, we introduce the following tool, which, as we will see next, resembles the power spectrum and thus we refer to it as the climacospectrum: The climacospectrum is also written in an alternative manner in terms of frequency w = 1/k: The inverse transformation, i.e., that giving the climacogram γ(k) once the climacospectrum ζ(k) is known, can be easily deduced with the same method as before, and takes the form: A very good approximation in which the sum is replaced by an integral, is: This holds for all k and its details are given in Appendix A.1.7.For small k, it is easy to see that the multiplier of the integral I(k) is 1.It follows that for small time scales k or large frequencies w = 1/k the climacospectrum, expressed in terms of frequency, has an area I(k) on the left of w equal to the variance γ(1/w).As shown in Appendix A.1.7, the entire area under the curve ζ(w) (i.e., I(0)) is precisely equal to the variance γ(0) of the instantaneous process, a property also shared by the power spectrum s(w).This is not the only connection with the power spectrum.The climacospectrum has also the same asymptotic behaviour with the power spectrum, i.e., (see Appendix A.1.9).This property holds almost always, with the exception of the cases where Again the connection of the climacospectrum with the CEPLT emerges through log-log derivatives.Specifically, combining (41), ( 45) and (47) By virtue of (51) and with the exceptions mentioned, the asymptotic CEPLT can also be inferred from the slopes of the power spectrum and, as the climacospectrum slopes do not differ substantially from those of the power spectrum even for intermediate time scales, the power spectrum slopes could also be used for a qualitative estimation if the CEPLT.
Obviously, however, the climacospectrum is more advantageous than the power spectrum in this respect, because the connection with conditional entropy production is more precise and without exceptions at all.In addition, intuitively the variance, on which the definition of the climacospectrum is based, is more closely related to uncertainty, and hence entropy of the process, than the power spectrum and the autocovariance.Other advantages are the easy calculation only using the concept of variance, without any need to perform laborious transformations (such as Fourier) and the fact that, like the climacogram, it is not affected by discretization (while autocovariance and power spectrum are).The empirical climacogram and climacospectrum are easily determined from data using nothing more than the standard statistical estimator of variance and they have a smooth shape, much smoother than those of the empirical autocovariance and power spectrum, thus enabling better model identification and fitting.
When the objective is the model fitting, we should be aware that any of the statistical tools entail estimation bias, which should be accounted for in the fitting.Again the climacogram and the climacospectrum are preferable than other tools as the bias is explicitly calculated from the assumed model (Equations ( 30) and ( 31)).In particular, the bias of the climacospectrum is usually very small (see the relevant graphs in Appendix A.2) because of its definition as a difference of two variances, in which the biases tend to cancel out.In comparison with another version of climacogram-based spectrum [24], the climacospectrum introduced here has advantages.Namely, these are its simpler expression, as it is a linear transformation (difference) of the climacogram, and the absence of the variance γ 0 ≡ γ(0) from the definition, which makes possible the calculation of the empirical climacospectrum prior to specifying the model (note that γ 0 is not known before a model is specified and its parameters are estimated).

Scaling
In the previous subsection we have shown that the EPLT and the CEPLT are related to LLDs or slopes of log-log plots of second order tools such as climacogram, climacospectrum, power spectrum, etc.With a few exceptions, these slopes are nonzero asymptotically, hence entailing asymptotic scaling or asymptotic power laws with the LLDs being the scaling exponents.It is intuitive to expect that an emerging asymptotic scaling law would provide a good approximation of the true law for a range of scales.If the scaling law was appropriate for the entire range of scales, then we would have a simple scaling law.Such simple scaling sounds attractive from a mathematical point of view, but, for reasons that will be explained in the next subsection, it turns out to be impossible to make it appropriate for physical processes.
It is thus physically more realistic to expect two different types of asymptotic scaling laws, one in each of the ends of the continuum of scales.We call these two behaviours local scaling or fractal behaviour, when k → 0 or w → ∞, and global scaling, persistence or Hurst-Kolmogorov behaviour, when k → ∞ or w → 0. The respective scaling exponents are (see Appendix A.1.8):

•
For local scaling: Here, the emergence of scaling has been related to maximum entropy considerations, and this may provide the theoretical background in modelling complex natural processes by such scaling laws.However, as shown in Appendix A.1.8,scaling laws are a mathematical necessity and could be constructed for virtually any continuous function defined in (0, ∞).In other words, there is no magic in power laws, except that they are, logically and mathematically, a necessity.(No assumption of criticality, self-organization, fractal or multi-fractal generating mechanisms, is necessary to justify their emergence).Here we have focused on second order characteristics of a stochastic process.If we also considered higher order (raw) moments, it is natural to expect that some other (different) power laws would emerge (or could be constructed using the procedure described in Appendix A.1.8),and then one would speak about multifractals everywhere, etc.

Bounds of Scaling and Entropy Production
It is not well known that the asymptotic scaling exponents cannot take on any arbitrary values but they should be constrained in rather narrow ranges.Instead, misconceptions and incorrect applications resulting in inconsistent values abound in the literature.One of the most typical examples is the asymptotic scaling of the power spectrum for w → 0, for which often steep slopes are reported for low frequencies (down to 0), based on data analyses.Usually these reports are accompanied with a claim that the process with −s # (0) > 1 is nonstationary, going further to characterize the power spectrum as a tool to identify whether a process is stationary or nonstationary (with values −s # (0) below or above 1 suggesting stationarity and nonstationarity, respectively).As thoroughly studied elsewhere [31], this entire line of thought is theoretically inconsistent and flawed, while such results often reported are usually artefacts due to insufficient data or inadequate estimation algorithms.In fact, it has been shown [31] that a process with −s # (0) > 1 is nonergodic.Inference from data is only possible when the process is ergodic and thus, claiming that −s # (0) > 1 based on data is self-contradictory.Furthermore, claiming nonstationarity using the power spectrum as a function of frequency is also self-contradictory as in a nonstationary process both the autocovariance and the spectral density, i.e., the Fourier transform of the autocovariance, are functions of two variables, one being related to "absolute" time (see e.g., [32]).
An example of the conditions leading to misinterpreting a stationary process as nonstationary is discussed in Section 3.3 and Appendix A.2.2.We must clarify though that steep slopes (-s # (w) > 1) are mathematically and physically possible for medium and large w (see, e.g., Figure A7) and indeed they are quite frequent in geophysical and other processes.In this respect, the estimation from data of steep slopes is possible and not problematic, if we are conscious that such slopes are for intermediate or large frequencies (actually, the estimation algorithms are bandpass filtered with the bounds of the filter determined by the resolution and length of the series).The problem arises when such slopes are interpreted as asymptotic ones and particularly when they are used for inference resulting in self-contradicting conclusions.
Because of the equality of slopes of power spectrum and climacospectrum, the ergodicity limitation holds also for the slope of the climacospectrum, i.e., ζ # (∞) = − ζ # (0) < 1.On the other hand, too steep negative asymptotic slopes of the climacospectrum are also impossible.Indeed, (37) and ( 45)), Γ C (k) < 0. This means that the variance of the cumulative process would be a decreasing function of Entropy 2017, 19, 581 11 of 50 time, which is absurd.This holds both for the global case (k → ∞, in which the conditional variance Γ C (∞) equals the unconditional Γ(∞)) and the local case (k → 0, for the conditional variance Γ C (0)).
However, in the local case there is a more severe limitation imposed by physical reasoning.As demonstrated in Appendix A.1.10, the case ζ # (0) = −s # (∞) < 1 would entail infinite variance.Infinite variance would require infinite energy to emerge, which is physically inconsistent.Therefore, the physical lower limit for ζ # (0) = −s # (∞) is 1.Interestingly, the inconsistency if this constraint is violated stems from the behaviour at high frequencies (see Appendix A.1.10),even though the power spectrum at high frequencies in such cases tends to zero rather diverging to infinity.Therefore, the related "catastrophe" is of "ultraviolet" type, while the infinite value of the power spectrum at low frequencies, which was highlighted in several texts as "infrared catastrophe" [33] is not actually a problem at all, nor does it impose any limitation on the scaling law additional to the limitation for ergodicity discussed above.
A final-and quite severe-limitation is an upper bound of the local scaling exponent, which is 3 for ζ # (0) = −s # (∞).Proof is provided in Appendix A.1.11.The problem if this limitation is violated is that the resulting autocovariance function is not positive definite or, equivalently, that the resulting power spectrum is not always (for any frequency w) positive but takes on negative values for some w.Likewise, the Fourier transform of the climacogram takes on negative values for some w.
Because of the relationship of the entropy production with the scaling exponents, the bounds on the spectrum slopes translate directly into bounds of the CEPLT, i.e., 1 ≤ ϕ C (0) ≤ 2 and 0 ≤ ϕ C (∞) ≤ 1.These define the "green square" of admissible values of ϕ C in Figure 2, which is also depicted in terms of admissible values of slopes ζ # and s # (noting that, as discussed above, s # can, by exception, take on values out of the square when ϕ C (0) = 2 or ϕ C (∞) = 0.The reasons why a process out of the square would be impossible or inconsistent, as discussed above, are also marked in Figure 2.
Entropy 2017, 19, 581 11 of 49 resulting power spectrum is not always (for any frequency w) positive but takes on negative values for some w.Likewise, the Fourier transform of the climacogram takes on negative values for some w.
Because of the relationship of the entropy production with the scaling exponents, the bounds on the spectrum slopes translate directly into bounds of the CEPLT, i.e., 1 ≤ (0) ≤ 2 and 0 ≤ (∞) ≤ 1.These define the "green square" of admissible values of φ C in Figure 2, which is also depicted in terms of admissible values of slopes ζ # and s # (noting that, as discussed above, s # can, by exception, take on values out of the square when (0) = 2 or (∞) = 0).The reasons why a process out of the square would be impossible or inconsistent, as discussed above, are also marked in Figure 2. Large values of (0) indicate a smooth process and small ones a rough process.Also, large values of (∞) indicate a persistent process and small ones an antipersistent process.The centre of the square, with coordinates (0) = 3/2, (∞) = 1/2 represents a neutral process.As will be seen in next section, neutrality corresponds to a Markov process, which is neither rough nor smooth, and neither antipersistent nor persistent.Even though sometimes it is said to reflect short-term persistence, we must have in mind that a discretized Markov process at time scale k tends to be uncorrelated in time as k increases and therefore we may not have it in mind as persistent.In other words, here persistence is used as synonymous to long-term persistence, while high autocorrelations at Large values of ϕ C (0) indicate a smooth process and small ones a rough process.Also, large values of ϕ C (∞) indicate a persistent process and small ones an antipersistent process.The centre of the square, with coordinates ϕ C (0) = 3/2, ϕ C (∞) = 1/2 represents a neutral process.As will be seen in next section, neutrality corresponds to a Markov process, which is neither rough nor smooth, and neither antipersistent nor persistent.Even though sometimes it is said to reflect short-term persistence, we must have in mind that a discretized Markov process at time scale k tends to be uncorrelated in time as k increases and therefore we may not have it in mind as persistent.In other words, here persistence is used as synonymous to long-term persistence, while high autocorrelations at small lags or scales, which die off at large lags or scales, are better described with the term smoothness.Both smoothness and persistence reflect high entropy production, locally and globally, respectively.
A useful observation in Figure 2 is that the entire "green square" lies below the equality line, which means that the same scaling exponent is not possible for both local and global behaviour, or else, it is impossible to have a physically realistic simple scaling process.There is one exception, the upper-left corner of the "green square", which corresponds to the so-called "pink noise" or "1/f noise" and will be discussed further in Section 3.3.
Since we have pointed out several inconsistent or contradictory results that are reported in the literature, it may be useful to mention another case, which is frequently met in geostatistics.The so-called intrinsic models are common in geostatistics but susceptible to inconsistent use, even though the literature presents them as nonstationary to avoid some of the emerging problems.Such models are defined in terms of their structure function (variogram), v(h), which tends to infinity as lag h tends to infinity (e.g., [34]).An example is v I (h) = a h b with b > 0, in which we note that its formulation with respect to a single scalar argument, h, does not reveal the nonstationarity.Thus, it may be treated as if it was stationary.In that case, from (24) we have γ 0 = v I (∞) = ∞ and thus the intrinsic process violates the constraint for physical consistency.Moreover, since v I (h) = γ 0 − c(h) < ∞ for h < ∞, it turns out that c(h) = ∞ for any h and, hence, (21) entails that γ(k) = ∞ for any k.In other words, the infinity of v I (∞) is transferred to the entire autocorrelation function and the entire climacogram.In particular, the property γ(k) = ∞ means that the process is non-ergodic (in ergodic processes γ(k) should tend to zero as k tends to infinity; see [27] (p.429)).Thus, in addition to being physically inconsistent such a treatment of the process is mathematically inconsistent and logically self-contradictory.A consistent way of treatment is to identify the intrinsic process with the (nonstationary) cumulative process X(t) [35], derive the stationary process x(t) and treat the latter further regularly.In that case the variogram is no longer related to the structure function of x(t) but to its climacogram; namely, it is identical to the cumulative climacogram v Another peculiarity in geostatisical analyses is the so-called "nugget effect", which is also problematic or enigmatic [36].Namely, this is a discontinuity of the structure function at the origin.An example is v E (h) = a (h +c) b , h > 0, with c, b > 0, while v E (0) = 0. Investigation shows that the "nugget effect" does not necessarily create inconsistency.It is obviously associated with an infinite derivative of the structure function at the origin, i.e., v (0) = ∞.However, the LLD, v # (h) = v (h)h/v(h) can be finite at the limit as h → 0 (because of the multiplication by h).Therefore, the resulting ϕ C (0) = 1 + v # (0)/2 does not necessarily lie out of the "green square".Figure 3 illustrates both the "intrinsic" case and the "nugget effect" and provides hints how to avoid both, adopting much better modelling alternatives.The above example of v E (h) was used with parameters indicated in the figure caption.It is easy to find that a FHK-C model which is both persistent (H = 0.75 > 1 /2 ) and rough (M = 0.2 < 1 /2 ) can replace the v E (h) and be used further.
Conversely, we can imagine that in the model identification and fitting of v E (h), persistence was regarded as an "intrinsic" characteristic and roughness was interpreted as "nugget effect".Note that because M < 1 /2, v (h) = −c (h) = ∞.Thus, a rough process seems like if it exhibited the "nuggest effect".However, if instead of the standard plot of the variogram, a logarithmic plot was made (also shown in Figure 3 with respect to the upper horizontal axis), then the "nugget" would disappear.
at the limit as h → 0 (because of the multiplication by h).Therefore, the resulting (0) = 1 + # (0)/2 does not necessarily lie out of the "green square".Figure 3 illustrates both the "intrinsic" case and the "nugget effect" and provides hints how to avoid both, adopting much better modelling alternatives.The above example of v E (h) was used with parameters indicated in the figure caption.It is easy to find that a FHK-C model which is both persistent (H = 0.75 > ½) and rough (M = 0.2 < ½) can replace the v E (h) and be used further.Conversely, we can imagine that in the model identification and fitting of v E (h), persistence was regarded as an "intrinsic" characteristic and roughness was interpreted as "nugget effect".Note that because M < ½, (ℎ) = − (ℎ) = ∞.Thus, a rough process seems like if it exhibited the "nuggest effect".However, if instead of the standard plot of the variogram, a logarithmic plot was made (also shown in Figure 3 with respect to the upper horizontal axis), then the "nugget" would disappear.

Revisiting Earlier Results
In an earlier study (already mentioned, [22]) on subjects related to the present one, it was suggested that extremization of EPLT in a continuous time representation could determine the entire dependence structure of the process of interest based on simple constraints.The specific premises for EPLT extremization were:

Revisiting Earlier Results
In an earlier study (already mentioned, [22]) on subjects related to the present one, it was suggested that extremization of EPLT in a continuous time representation could determine the entire dependence structure of the process of interest based on simple constraints.The specific premises for EPLT extremization were: the present and past state of a process, the future entropy is reduced, whereas as t → ∞ conditional and unconditional entropies should tend to be equal, which, however, cannot happen if the entropy production is consistently lower in the conditional than in the unconditional case; (e) extremization of entropy production at asymptotic time scales, i.e., t → 0 and t → ∞.
These premises after systematic analyses resulted in two processes extremizing entropy production:

•
A Markov process: maximizes entropy production for small times (t → 0) but minimizes it for large times (t → ∞).
In these definitions α and λ are scale parameters with dimensions of [t] and [x 2 ], respectively.The parameter H, known as the Hurst parameter, determines the global properties of the process with the notable property: Here we revisit these results as well as the premises, in light of the theoretical analyses of the previous section and incorporating empirical experience based on various data sets.These allow the following notes: 1.
The minimal entropy production of the HK process at small times may not be plausible for complex processes; rather we should generally expect that complex processes maximize entropy production at both small and large times (asymptotically as t → 0 and ∞).

2.
The HK process is characterized by infinite variance (γ(0) = ∞), which, as discussed in the previous section, makes it physically inconsistent.

3.
The premise φ C (t) ≥ φ(t) should certainly hold asymptotically (for t → 0 and t → ∞) but not necessarily at intermediate times.A specific example is the process shown in Figure 4h which includes a periodic component where the curve of φ C (t) intersects that of φ(t) (the latter is not shown in the figure), even though asymptotically the inequality holds.It can thus be conjectured that if a process incorporates a deterministic component (e.g., periodic) while it is treated stochastically without separating the deterministic component, the inequality for EPLT and CEPLT could be violated at intermediate times.

4.
While the Markov process represents the maximal EPLT as t → 0 (φ(0) = 1), this property is shared by many other processes (see next subsection).Furthermore, in terms of CEPLT, while the Markov process corresponds to φ C (0) = 3/2, there are processes which have higher CEPLT than the Markov, up to φ C (0) = 2.These are smoother than the Markov process and have structure function with v # (0) = ξ # (0) > 1, where the value 1 corresponds to the Markov process.
Based on these remarks, the Markov process remains a valid candidate of physically consistent processes maximizing entropy production, if only the local asymptotic behaviour is of interest.However, additional constraints to the above premises (a)-(e) (from [22]) are needed to confirm its appropriateness, both on small scales (to enable determination of φ C (0)) and large scales (to enable determination of φ C (∞) = φ(∞)).
On the contrary, the HK process should not be regarded as a candidate of general validity, even though it is valid for large scales.Indeed, it is still quite useful as it results in high entropy production at large scales, a property that cannot hold for processes with exponential decrease of autocovariance such as the Markov process.We can thus regard the HK process as a useful mathematical construct which does not appear in nature but can be filtered appropriately to make a physically consistent process.This is the same with white noise (WN), which again is not physically consistent as it involves infinite variance in continuous time (see the specific location and the relationship of the WN and HK processes in the diagram of Figure 2).
It is well known that if a WN process y(t) is the input to a moving average filter x(t) =

2T
t+T t−T y(a)da, then it produces a physically consistent process x(t) with finite variance and autocovariance linearly varying in the interval [0, 2T] ( [27] (p.325)).Also, if y(t) is the input to a linear system corresponding to the linear differential equation x (t) + αx(t) = y(t), then the output x(t) is a Markov process ( [27] (p.326)), physically consistent.
Likewise if in these two cases the input y(t) is an HK process, then it is easy to see that the filtered output is a physically realistic process with finite variance γ(0), practically unaffected climacogram γ(k) at large scales, with γ # (∞) = 2H − 2 (as in the original HK process) but highly modified climacogram at small scales, thus having a valid structure (and climacostructure) function with v # (0) = ξ # (0) = 2H.

Specific Processes
Following the theoretical analyses of Section 2 and the remarks of the Section 3.1, we describe here a number of stochastic processes specified through any of their second order functions.All of these processes respect the limitations discussed in Section 2.4 and thus they are consistent mathematically and physically.Some of them correspond to high entropy production, but we also discuss processes with low entropy production.The complete details of these processes, along with graphs illustrating their properties and results of their application, are contained in Appendix A.2. , with CEPLT close to the absolute maximum for large scales (H = 0.99) and close to the absolute minimum for small scales (M = 0.01); (e) GP with n = 1 (blackbody spectrum) with CEPLT equal to the absolute minimum (0) for large scales and to the absolute maximum (2) for small scales; (f) Approximation of (e) by FHK (H = 0.01, M = 0.99, also using the same initial sequence of random numbers as in (e) to visualize similarity); (g) AE with n = 2; (h) average of a FHK with H = M = 0.8 and a harmonic oscillation with T = 5α = 50.In all cases the discretization time scale is D = 1, the characteristic time scale a = 10, and the characteristic variance scale λ is chosen so that for time scale D, γ(D) = 1.The mean is 0 in all cases and the marginal distribution is normal.The FHK is implemented using the Cauchy-type climacogram.with CEPLT close to the absolute maximum for large scales (H = 0.99) and close to the absolute minimum for small scales (M = 0.01); (e) GP with n = 1 (blackbody spectrum) with CEPLT equal to the absolute minimum (0) for large scales and to the absolute maximum (2) for small scales; (f) Approximation of (e) by FHK (H = 0.01, M = 0.99, also using the same initial sequence of random numbers as in (e) to visualize similarity); (g) AE with n = 2; (h) average of a FHK with H = M = 0.8 and a harmonic oscillation with T = 5α = 50.In all cases the discretization time scale is D = 1, the characteristic time scale a = 10, and the characteristic variance scale λ is chosen so that for time scale D, γ(D) = 1.The mean is 0 in all cases and the marginal distribution is normal.The FHK is implemented using the Cauchy-type climacogram.
We start with a general Filtered Hurst-Kolmogorov (FHK) process (which in some cases, e.g., [24], has also been called the Hybrid Hurst-Kolmogorov-HHK-process).In Section 3.1 we briefly discussed two types of linear filters to the HK process, whose asymptotic properties are determined based on a single parameter, H.Here we generalize this result also making the asymptotic behaviour on the left independent from that on the right (cf. [37]) through introducing another parameter denoted as M (in honour of Mandelbrot, following [31]) which is called the smoothness (or fractal) parameter.We avoid specifying the linear filter needed to convert the HK process into the types of the FHK process given as this is not necessary (and in some cases it would be too involved).Rather we specify these types in terms of a convenient expression of the climacogram.The first case is defined through a generalized Cauchy-type climacogram (FHK-C) (see also [24]): Both M and H are dimensionless parameters, and M and 1 − H vary in the interval (0, 1] with M < 1 /2 or > 1 /2 indicating a rough or a smooth process, respectively, and with H < 1 /2 or > 1 /2 indicating an antipersistent or a persistent process, respectively. The second case is defined through a Dagum-type (FHK-D) climacogram: and, despite its different expression, its behaviour, as well as the meaning and ranges of parameters are the same as in the Cauchy-type climacogram.For M = 1 − H both result in precisely the same special case: The third case is obtained by summing a Cauchy-type climacogram with M = 1 and a Dagum-type climacogram with H = 0.The climacogram of the thus produced model (FHK-CD) is: This is a convenient form, in which the first additive term determines merely the persistence of the process and the second one the smoothness of the process.In addition, it is more flexible and richer than its constituents, as it contains two couples of scale parameters; however, if parsimony is sought, then it can take the same number of parameters as each of the constituents by setting α 1 = α 2 and λ 1 = λ 2 (note that, for dimensional consistency, one λ and one α are minimal parameter requirements).
Even in its most parsimonious form, the FHK in any of the above three variants can cover the entire admissible range, i.e., the entire "green square" in Figure 2. The different patterns in time series generated by different M and H (specifically for the Cauchy-type climacogram) are illustrated in some plots of Figure 4 also in comparison with other models, first of which is the most customary Markov model (Figure 4a).The Markov model is good as a benchmark for comparisons because, as already discussed in Section 2, it is fully neutral (neither rough nor smooth as φ C (0) = 3/2, and neither antipersistent nor persistent as φ C (∞) = 1/2).
Each of the panels shows the first fifty terms of time series produced by each of the model implementations at time scales k = 1 and 20.In addition, each panel contains a "stamp" of the specific model represented by the plot of CEPLT, φ C (k).Additional plots for all second order tools of each of the models are given in Appendix A.2.The time series for these models were generated quite easily using the generic model proposed in [24]; their length is 1024 and this is also the length of the series of coefficients a used for the generating symmetric moving average (SMA) scheme.All calculations are of algebraic type (based on the equations of Section 2 and Appendix A.2) and were performed in Excel spreadsheets without difficulties.
In Figure 4b the CEPLT is close to the absolute maximum both for small and large scales (H = M = 0.97 so as to obtain φ C (0) = 1.97 ≈ 2 and φ C (∞) = 0.97 ≈ 1); notable is the very smooth shape at scale 1 and the large departures from the mean (which is 0) at scale 20.On the contrary, in Figure 4c the CEPLT is close to the absolute minimum for all scales (H = M = 0.05, so as to obtain φ C (0) = 1.05 ≈ 1 and φ C (∞) = 0.05 ≈ 0-for better visualization it was preferred not to use values of H and M < 0.05).Furthermore, in Figure 4d the CEPLT is close to the absolute maximum for large scales (H = φ C (∞) = 0.99 ≈ 1) and close to the absolute minimum for small scales (M = 0.01 resulting in φ C (0) = 1.01 ≈ 1).Finally, in Figure 4f the conditions are opposite to those in 4d, i.e., the CEPLT is equal to the absolute minimum for large scales (H = φ C (∞) = 0.01 ≈ 0) and to the absolute maximum (2) for small scales (M = 0.99 resulting in φ C (0) = 1.99 ≈ 2).
The particular case of Figure 4d is close to what is usually called "pink noise" or "1/ f noise", as the power spectrum has almost constant slope −1 for the entire frequency domain (which is the same in the climacospectrum).This means that using the FHK model we can theoretically represent and practically produce even "pink noise" in a consistent stationary setting without linking it to a nonstationary process [38,39], which actually involves several theoretical inconsistencies [31].Indeed, as can be seen more thoroughly in the detailed graphs of Figure A5 in Appendix A.2.2, which are for the same application with that of Figure 4d, the small change of slope of from 0.99 to 1.01 is not actually visible, especially in view of the very rough shape of the empirical periodogram, which certainly cannot support differentiation between 0.99 and 1.The FHK model can be used also in other ways to produce "pink noise", that is, by selecting a very large (small) parameter α so as to expel from our field of vision the asymptotic behaviour on large (small) scales.And we can imagine that in several cases of empirical explorations using observations of natural processes, the observation resolution and length, compared to characteristic scale(s) of the process, are such as to hide the asymptotic behaviour of the process.
We can use this as a trick to obtain virtually constant power spectrum slopes much steeper than −1.This is illustrated Figure A7 of Appendix A.2.2 where the FHK was used with H = M = 0.75 and α = 100.These yield theoretical slopes ζ # (∞) = s # (∞) = −(1 + 2M) = −2.5 and ζ # (0) = s # (0) = −(2H − 1) = −0.75.However, the large α does not allow viewing the asymptotic behaviour at low frequencies or large scales and the slope −2.5 dominates everywhere.Actually the empirical periodogram estimates an even steeper constant slope, s # = −2.6.This should not mislead us to conclude that the process is nonstationary because the slope is steeper than −1 (as happens in many studies in the literature).Likewise, in the case that the time series represented observations of a natural process, such a result must not be misinterpreted as evidence that natural processes can lead to slopes constant for all scales and steeper than −1.Clearly here, the process is stationary, the slope −2.5 refers to large frequencies or small scales but, because of the large characteristic scale and the limited observation period (1024 in this example), we cannot see what happens at larger scales.If we saw, then the slope would be −0.75, in accord with the theory.
As discussed, the Markov process is neutral but, by modifying its power spectrum, we can obtain processes which are smooth or antipersistent.Two types of modifications are studied in Appendix A.2, introduced in terms of their power spectrum, i.e., and: where n = 0, 1, 2 . . ., and the value n = 0 results precisely in a Markov process.For n > 0, (60) corresponds to a smooth process with φ C (0) = 2 and (61) corresponds to an antipersistent process with φ C (∞) = 0.The resulting autocovariance function retains the characteristic exponential form of the Markov process but is multiplied by a polynomial function of lag (see Appendix A.2). Therefore, we call these two processes the smooth exponential (SE) and the antipersistent exponential (AE), respectively.In each of the cases, a most extreme value of CEPLT is achieved (maximal for small scales in SE and minimal for large scales in AE). Figure 4g, shows an example of AE for n = 2; the hole in the function φ C (k) for moderate k is a characteristic of this process.An interesting process that is simultaneously extremely smooth (φ C (0) = 2) and extremely antipersistent (φ C (∞) = 0) is defined by the following power spectrum: where c n is a normalizing constant making the area of the power spectrum equal to γ 0 = λ.This process, which we call generalized Planck (GP) is obtained by a generalization of Planck's law of black-body radiation, whereas it is identical to this law if n = 1.The detailed equations of this process are also given in Appendix A.2.A characteristic plot of time series generated from this process is shown in Figure 4e, and is almost indistinguishable from Figure 4f which, as already discussed, was produced as an approximation by FHK.
A final case examined in the Appendix A.2 is a harmonic oscillation, which is very easily modelled as a deterministic process but here it is treated as a stochastic process.Of course stochastic treatment is not advisable in this case, but often such periodic behaviours appear as components of stochastic processes.Obviously the second-order characteristics of such processes are affected by periodic components and therefore we need to know which equations should be superimposed in those of the pure stochastic process (see also [40,41]).As an example of such a case, Figure 4h shows the behaviour of a process defined as the average of a FHK with H = M = 0.8 and a harmonic oscillation.It is difficult to identify the presence of the oscillation from visual inspection of the time series, but the detailed graph of any of the second order characteristics, plotted in Appendix A.2., captures the periodic behaviour.

Comparison of Asymptotic Properties
Most of the models examined in the previous subsection can only reproduce specific values of asymptotic properties.An exception is the FHK model which is the most powerful as it can perform in the entire admissible domain of asymptotic properties.A visual comparison of all models examined, it terms of the asymptotic values of the different second order characteristics is made in Figure 5, while Figure 6 provides an envelope for all models, with classification of the ranges in terms of smoothness and persistence.
Further comparative information about which model can perform for the most extremal cases is provided in Table 4.These figures and the table can be useful for the model selection based on data.
Table 4. Models that can perform in each of the specified extremal or neutral cases of smoothness and persistence.

Maximal Roughness
Note: the symbol '~' indicates that the model can perform in close approximation but not precisely.
Entropy 2017, 19,581 in the entire admissible domain of asymptotic properties.A visual comparison of all models examined, it terms of the asymptotic values of the different second order characteristics is made in Figure 5, while Figure 6 provides an envelope for all models, with classification of the ranges in terms of smoothness and persistence.
Figure 5. Ranges of asymptotic values for each of the most important second order functions of the examined processes.Wide light blue lines/boxes represent global properties (k, h → ∞, w → 0) while narrow dark red lines/boxes represent local properties (k, h → 0, w → ∞).Markov: Markov process; FHK: filtered Hurst-Kolmogorov process in any of the three forms; SE: smooth process with generalized exponential-type autocovariance; AE: antipersistent process with generalized exponential-type autocovariance; GP: generalized Planck.Note that the Markov process is identical with the SE and PE for n = 0.
-  Further comparative information about which model can perform for the most extremal cases is provided in Table 4.These figures and the table can be useful for the model selection based on data.Note: the symbol '~' indicates that the model can perform in close approximation but not precisely.

Real World Case Study
Empirical testing of the theoretical analyses presented here needs fine resolution of measurements and long time series, in order to reliably determine the asymptotic properties from data.Geophysical time series usually do not satisfy either of these two conditions.However, laboratory measurements of turbulent velocity can meet the conditions.Indeed, in laboratory experiments at sampling intervals of μs, very large samples can be formed which can enable viewing the asymptotic behaviour.Here we use grid data of nearly isotropic turbulence from the Corrsin Wind Tunnel at a high-Reynolds-number [42], which were made available on the Internet by the Johns Hopkins University.This dataset consists of 40 time series with n = 36 × 10 6 data points of wind velocity along the flow direction and an equal number of time series of cross-stream velocity, all measured at a sampling time interval D = 25 μs by X-wire probes placed downstream of the grid.Here we use part of the data, namely the series of wind velocity along the flow direction at the first of the probes (first column of files in http://pages.jh.edu/~cmeneve1/datasets/Activegrid/M20/H1/).More data and analyses have been used by Dimitriadis and Koutsoyiannis [43] at a somewhat different context.
Characteristic plots of the turbulent velocity time series are shown in the Figure 7a for various time scales.The large persistence of turbulence is evident from the large variability at the largest time scales.Impressively, an FHK-CD model with five parameters describes well the data at the entire range of available time scales which spans almost seven orders of magnitude.

Local properties
Smooth -special case 1

Global properties
Persistent Antipersistent Antipersistentspecial case 2 Special case 1: Envelopes of ranges of asymptotic values for each of the most important second order functions for all examined processes.

Real World Case Study
Empirical testing of the theoretical analyses presented here needs fine resolution of measurements and long time series, in order to reliably determine the asymptotic properties from data.Geophysical time series usually do not satisfy either of these two conditions.However, laboratory measurements of turbulent velocity can meet the conditions.Indeed, in laboratory experiments at sampling intervals of µs, very large samples can be formed which can enable viewing the asymptotic behaviour.Here we use grid data of nearly isotropic turbulence from the Corrsin Wind Tunnel at a high-Reynolds-number [42], which were made available on the Internet by the Johns Hopkins University.This dataset consists of 40 time series with n = 36 × 10 6 data points of wind velocity along the flow direction and an equal number of time series of cross-stream velocity, all measured at a sampling time interval D = 25 µs by X-wire probes placed downstream of the grid.Here we use part of the data, namely the series of wind velocity along the flow direction at the first of the probes (first column of files in http://pages.jh.edu/~cmeneve1/datasets/Activegrid/M20/H1/).More data and analyses have been used by Dimitriadis and Koutsoyiannis [43] at a somewhat different context.
Characteristic plots of the turbulent velocity time series are shown in the Figure 7a for various time scales.The large persistence of turbulence is evident from the large variability at the largest time scales.Impressively, an FHK-CD model with five parameters describes well the data at the entire range of available time scales which spans almost seven orders of magnitude.

Discussion and Conclusions
The question whether the probabilistic (information) entropy incorporates, as a special case, thermodynamic entropy or the two are different concepts, is still debated [44,45].If we adopt the former proposition of the dilemma, as we have done here, the requirement emerges that entropy production should be formulated in a stochastic context, while current attempts mostly use the classical thermodynamic entropy definition as in heat transfer.This is not an easy task and certainly requires exploration of a broad spectrum of alternatives for a definition of the concept of entropy production.Exploration of extremum principles for entropy production in a stochastic framework is also crucial.The two versions proposed here, entropy production in logarithmic time, unconditionally (EPLT) or conditionally on the past and present having been observed (CEPLT), may have some usefulness and potential as explanatory and modelling tools.The findings of this study include the following:

•
Definition of entropy production with respect to logarithmic time has some attractive characteristics and is consistent with the notion of the time arrow.

•
The asymptotic values of entropy production for time zero and infinity seem to be crucial for its extremization and define important features of a process.

•
In particular, the asymptotic value of CEPLT for zero time defines the roughness (fractality) of the stochastic process while the common asymptotic value of EPLT and CEPLT for infinite time defines the (long-term) persistence of the process and equals the Hurst parameter.

•
Both ELPT and CEPLT are finite and constrained within specific theoretically justified bounds.
The scaling laws of all second order characteristics of stochastic processes are a result of these finite values and are also subject to bounds (with few exceptions fully described).The proofs of existence of these bounds offer the means to fight common misunderstanding, misuse, false reporting and inconsistent estimation of characteristics of geophysical processes, as well as to develop consistent estimation algorithms.

•
The climacogram and its derivative tools introduced here, namely, the differenced climacogram and the climacospectrum are very powerful and support the estimation of entropy production, in either of its forms, from data series.

•
The broad family of models for stochastic processes investigated here covers the entire admissible range of asymptotic behaviours and offers the means for modelling any type of processes, either using a single model or a combination of more than one model.

•
The real-world application presented with the extraordinary long time series of turbulent velocity shows how a parsimonious stochastic model can be identified and fitted using the tools developed.
Measurements of diverse turbulent phenomena can potentially provide equally long time series with fine resolution, shaping a broader empirical basis for future work on identifying natural behaviours, specifically on larger spatial and temporal scales, and further testing the adequacy of the models.Characteristics of order higher than second, along with the effects of non-Gaussianity on entropy production, are also interesting to examine in light of the developments for second-order ones.A recent study [43] has already provided interesting results with respect to high-order moments.Connections of microscale turbulence and macroscale atmospheric phenomena are also a broad field for future studies.Further applications, also extending similar recent works in hydrology, geophysics and ecosystems [30,40,41], would enrich our knowledge on natural behaviours.

2.
We find x −c f (x) .If β 2 = 0 then we replace f (x) with ln|f (x)| (which preserves the property lim x → ∞ f (x) = ∞); if necessary, we make iterations so that eventually lim

3.
Given the properties in point 2, there exists a unique b, 0 < b < ∞ satisfying: so that for any b = b: The constant b defines an asymptotic power law with exponent b (cf.Hausdorff dimension; the case b = 0 signifies an improper scaling).
To identify the second asymptotic law as x → 0, we define f (x) f (1/x) and we proceed in the same manner to construct a function f (x) satisfying the conditions in point 2 above.Proceeding likewise, we determine the unique a for which | lim x→∞ x −a f (x)| < ∞.This determines an asymptotic power law with exponent a for f (x) as x → 0.
We remark that the two power laws refer to the same function f (x) but, due to the replacement steps of the procedure, may eventually correspond to different functions, say, f a (x) and f b (x) for the asymptotic behaviours as x → 0 (local behaviour) and x → ∞ (global behaviour), respectively.However, it is easy to construct a single function that combines both, making a final replacement, e.g., f (x) = f a (x)f b (x), but many different f (x) can actually be constructed.Thus, as well as any object has a dimension, any continuous function entails asymptotic power laws; generally not one but two, which in the special case of simple scaling can be identical.Now, coming to the asymptotic values of f # (x) for x → 0 and ∞, symbolically f # (0) and f # (∞), these are: To prove the former case, we proceed as follows: This implies that lim x→∞ f # (x)/b = 1 and thus f # (∞) = b.The latter case ( f # (0) = a) can be proved in the same manner.
A useful property is that if lim Thus, to have asymptotic (for x → 0 and ∞) values of f # different for 0 and ∞, the asymptotic values of f should be either 0 or ∞.These properties are useful and are utilized in other proofs herein.
Appendix A.1.9.Asymptotic Scaling of the Different Second Order Tools We assume that the asymptotic behaviour of the climacogram-based second order tools is given as: Entropy 2017, 19, 581 29 of 50 and we will determine that of the autocovariance-based tools.Note that the equality of the different function limits in (A50) has been proved in Appendix A.1.5.We will fist examine the asymptotic properties of the autocovariance function.Initially we note that, because γ(0) = c(0) = γ 0 (finite), it follows that γ # (0) = c # (0) = 0. Now, since b = γ # (∞) < 0, it follows that: lim where c is finite.We use the relationship between γ(k) and c(h) (Equation ( 21)) to find: Because both γ(h) and h b → 0 as h → ∞, applying l'Hôpital's rule we find: Next we will examine the asymptotic properties of the structure function.Initially we note that, because ξ(∞) = v(∞) = γ 0 (finite), it follows that ξ # (∞) = v # (∞) = 0. Now, since a = ξ # (0) > 0, it follows that: lim where d is finite.We use the relationship between ξ(k) and v(h) (Equation ( 26)) to find: Because both ξ(h) and h a → 0 as h → 0, applying l'Hôpital's rule we find: If a > 1, then both γ (h) and h a−1 will also → 0 as h → 0. In this case, again applying l'Hôpital's rule: If a < 1, then γ (h) and h a−1 will both → ∞ as h → 0. Thus, again applying l'Hôpital's rule, we will conclude with (A58) again.Thus, in both cases: Since a > 0 the rightmost expression cannot be zero unless d = 0. Therefore, the limit lim Entropy If a = 1 precisely, then h a−1 = 1 (it does not tend to 0 or ∞), while lim h→0 hξ (h) = 0. Thus, from (A56) we get lim h→0 h −1 v(h) = 3d (finite), which again means that v # (0) = ξ # (0) = a.Also, if a = 2 precisely, then h a−2 = 1 and from (A56) we get lim h→0 h −1 v(h) = 6d (finite).In conclusion, in all cases it holds that v # (∞) = ξ # (∞) = 0 and v # (0) = ξ # (0) = a, which means that v(k) has the same asymptotic behaviour with ξ(k) in both ends.We can thus rewrite (A50) as: Now coming to the asymptotic behaviour of the power spectrum, according to Gneiting and Schlather [37] (who also cite [46-48]), the power spectrum is "typically" associated with the asymptotic relationship s(w) ~w−a−1 as w → ∞ and s(w) ~w−b−1 as w → 0, where a (>0) and b (<0) are the LLDs of the structure function, v # (0) and autocorrelation function c # (∞), respectively, same to the above discourse.By inspection, it is seen that the slopes −(a + 1) and −(b + 1) are precisely the log-log slopes of the climacospectrum expressed in terms of frequency, ζ(w), or equivalently −ζ # (0) and −ζ # (∞), respectively.It can be conjectured that, since this is the "typical" behaviour, the non-typical cases correspond to integral values of a and b.In plain language, the above results can be expressed as follows: 1.
The asymptotic behaviour of the autocovariance function is the same with that of the climacogram.

2.
The asymptotic behaviour of the structure function is the same with that of the climacostructure function.

3.
The asymptotic behaviour of the power spectrum is the same with that of the climacospectrum.
Exceptions to the rule 1 may occur when b = −1 or b = −2 and this was indeed verified by thorough analytical work using the series of models examined in Appendix A.2. Exception to rule 2 can also appear and indeed they were found when b = −2 or a = 2.The entire list of exceptions found in this analytical work follows, also mentioning the specific models in which they appeared:

Proof of Physical Inconsistency of too Mild Slopes at High Frequencies in Power Spectrum
Let us assume that for small scales k < or high frequencies w > 1/ , with however small, the log-log derivative is s # (w) ≈ β, or else s(w) ~wβ where β is a constant satisfying β > −1.As the area of the power spectrum equals the variance of the continuous time process, we will have:              Appendix A.2.7.Generalized Planck Model (GP) The generalized Planck (GP) model is obtained by a generalization of Planck's law of black-body radiation, introducing an integral parameter n = 0, 1, 2, …, where the value n = 1 corresponds to Planck's law precisely.The process, whose detailed equations are given in Table A7,

Figure 1 .
Figure 1.Explanatory sketch for a stochastic process in continuous time and in discrete time.Note that the graphs display a realization of the process (it is impossible to display the process as such) while the notation is for the process per se (adapted from[24]).
Time unit = discretization time step D Time step or length of time window of averaging Time, discrete τ := t/D Dimensionless quantity, integer

Figure 2 .
Figure 2. Bounds of asymptotic values of CEPLT, (0) and (∞), and corresponding bounds of the log-log slopes of power spectrum and climacospectrum The "green square" represents the admissible region (note that s # can, by exception, take on values out of the square when (0) = 2 or (∞) = 0).The reasons why a process out of the square would be impossible or inconsistent are also marked.The lines (0) = 3/2 and (∞) = 1/2 represent neutrality and support the classification of stochastic processes into the indicated four categories (smaller squares within the "green square").

Figure 2 .
Figure 2. Bounds of asymptotic values of CEPLT, ϕ C (0) and ϕ C (∞), and corresponding bounds of the log-log slopes of power spectrum and climacospectrum The "green square" represents the admissible region (note that s # can, by exception, take on values out of the square when ϕ C (0) = 2 or ϕ C (∞) = 0).The reasons why a process out of the square would be impossible or inconsistent are also marked.The lines ϕ C = 3/2 and ϕ C (∞) = 1/2 represent neutrality and support the classification of stochastic processes into the indicated four categories (smaller squares within the "green square").
(a) Lebesgue background measure; (b) constrained mean µ and variance γ(1) at a specified (observation) time scale; (c) constrained lag-one autocorrelation ρ at the specified time scale; (d) an inequality constraint φ C (t) ≥ φ(t) to ensure physical realism as, naturally, by observing

Figure 4 .
Figure 4.The first fifty terms at time scales k = 1 and 20 of time series produced by various models, along with "stamps" of the models (green lines plotted with respect to the secondary axes) represented by the CEPLT, φ C (k).The different models are (a) Markov; (b) FHK, with CEPLT close to the absolute maximum (H = M = 0.97); (c) FHK, with CEPLT close to the absolute minimum (H = M = 0.05); (d) FHK, with CEPLT close to the absolute maximum for large scales (H = 0.99) and close to the absolute minimum for small scales (M = 0.01); (e) GP with n = 1 (blackbody spectrum) with CEPLT equal to the absolute minimum (0) for large scales and to the absolute maximum (2) for small scales;(f) Approximation of (e) by FHK (H = 0.01, M = 0.99, also using the same initial sequence of random numbers as in (e) to visualize similarity); (g) AE with n = 2; (h) average of a FHK with H = M = 0.8 and a harmonic oscillation with T = 5α = 50.In all cases the discretization time scale is D = 1, the characteristic time scale a = 10, and the characteristic variance scale λ is chosen so that for time scale D, γ(D) = 1.The mean is 0 in all cases and the marginal distribution is normal.The FHK is implemented using the Cauchy-type climacogram.

Figure 4 .
Figure 4.The first fifty terms at time scales k = 1 and 20 of time series produced by various models, along with "stamps" of the models (green lines plotted with respect to the secondary axes) represented by the CEPLT, φ C (k).The different models are (a) Markov; (b) FHK, with CEPLT close to the absolute maximum (H = M = 0.97); (c) FHK, with CEPLT close to the absolute minimum (H = M = 0.05); (d) FHK,with CEPLT close to the absolute maximum for large scales (H = 0.99) and close to the absolute minimum for small scales (M = 0.01); (e) GP with n = 1 (blackbody spectrum) with CEPLT equal to the absolute minimum (0) for large scales and to the absolute maximum (2) for small scales; (f) Approximation of (e) by FHK (H = 0.01, M = 0.99, also using the same initial sequence of random numbers as in (e) to visualize similarity); (g) AE with n = 2; (h) average of a FHK with H = M = 0.8 and a harmonic oscillation with T = 5α = 50.In all cases the discretization time scale is D = 1, the characteristic time scale a = 10, and the characteristic variance scale λ is chosen so that for time scale D, γ(D) = 1.The mean is 0 in all cases and the marginal distribution is normal.The FHK is implemented using the Cauchy-type climacogram.

Figure 5 .
Figure5.Ranges of asymptotic values for each of the most important second order functions of the examined processes.Wide light blue lines/boxes represent global properties (k, h → ∞, w → 0) while narrow dark red lines/boxes represent local properties (k, h → 0, w → ∞).Markov: Markov process; FHK: filtered Hurst-Kolmogorov process in any of the three forms; SE: smooth process with generalized exponential-type autocovariance; AE: antipersistent process with generalized exponential-type autocovariance; GP: generalized Planck.Note that the Markov process is identical with the SE and PE for n = 0.

Figure 6 .
Figure 6.Envelopes of ranges of asymptotic values for each of the most important second order functions for all examined processes.

Figure 7 .
Figure 7. Characteristic graphs of the turbulent velocity time series: (a) plot of the first fifty terms of the time series at time scales κ = 1, 20, 400 and 8000, along with the "stamp" φ C (k) of the fitted model (green line plotted with respect to the secondary axes); (b) empirical and fitted theoretical climacogram; (c) empirical and fitted theoretical climacospectrum.

Figure 7 .
Figure 7. Characteristic graphs of the turbulent velocity time series: (a) plot of the first fifty terms of the time series at time scales κ = 1, 20, 400 and 8000, along with the "stamp" φ C (k) of the fitted model (green line plotted with respect to the secondary axes); (b) empirical and fitted theoretical climacogram; (c) empirical and fitted theoretical climacospectrum.

Figure A2 .
Figure A2.Characteristic graphs of a time series of length 1024 generated by the Markov model with α = 10 and λ = 1.03 (so that γ(1) = 1): (a) plot of the first fifty terms of the at time scales κ = 1 and 20, along with the "stamp" φC(k) of the fitted model (green line plotted with respect to the secondary axes); (b) empirical and theoretical climacogram and autocovariance function; (c) empirical and fitted theoretical climacospectrum and power spectrum.

Figure A2 .
Figure A2.Characteristic graphs of a time series of length 1024 generated by the Markov model with α = 10 and λ = 1.03 (so that γ(1) = 1): (a) plot of the first fifty terms of the at time scales κ = 1 and 20, along with the "stamp" φ C (k) of the fitted model (green line plotted with respect to the secondary axes); (b) empirical and theoretical climacogram and autocovariance function; (c) empirical and fitted theoretical climacospectrum and power spectrum.

Figure A3 .
Figure A3.Characteristic graphs as in Figure A2 but for a time series generated by the FHK-C model with γ(1) = 1, α = 10 and H = M = 0.97.

Figure A7 .
Figure A7.Characteristic graphs as in Figure A2 but for a time series generated by the FHK-C model with γ(1) = 1, α =100 and H = M = 0.75.

Figure A7 .
Figure A7.Characteristic graphs as in Figure A2 but for a time series generated by the FHK-C model with γ(1) = 1, α = 100 and H = M = 0.75.

Figure A8 .
Figure A8.Characteristic graphs as in Figure A2 but for a time series generated by the AE model with γ(1) = 1, α = 10 and n = 2.

Figure A9 .
Figure A9.Characteristic graphs as in Figure A2 but for a time series generated by the GP model with γ(1) = 1, α = 10 and n = 1.

Figure A10 .Figure A10 .
Figure A10.Characteristic graphs as in Figure A2 but for a time series generated by the FHK-C model with λ = 1, α = 10, and H = M = 0.8 in which a harmonic oscillation with T = 5α = 50 is superimposed.

Table 1 .
Definition of main characteristics and notations for a stochastic process in continuous time; see Figure1for clarification.

Table 4 .
Models that can perform in each of the specified extremal or neutral cases of smoothness and persistence.