On Rényi Permutation Entropy

Among the various modifications of permutation entropy, defined as the Shannon entropy of the ordinal pattern distribution underlying a system, a variant based on Rényi entropies has been considered in a few papers. This paper discusses the relatively new concept of Rényi permutation entropies in dependence on the non-negative real parameter q, which parameterizes the family of Rényi entropies and provides the Shannon entropy for q = 1. Their relationship to Kolmogorov–Sinai entropy and, for q = 2, to the recently introduced symbolic correlation integral are touched upon.


Paper Background and Motivation
Since Bandt and Pompe [1] introduced the concept of permutation entropy (PE), it has been applied in different fields from biomedicine to econophysics (e.g., Zanin et al. [2], and Amigó et al. [3]) and developed in various directions. One relatively new variant of permutation entropy is based on Rényi entropies instead of the originally used Shannon entropy and is called Rényi permutation entropy (RPE). Roughly speaking, RPE quantifies the complexity of the distribution of ordinal patterns of some length n underlying a dynamical system, where ordinal patterns describe the ups and downs in the dynamics. As Rényi entropies depend on a parameter q ∈ [0, ∞[, there are also different choices of RPE depending on q.
The central aim of the paper is to discuss the asymptotics of RPE for increasing pattern length. This is motivated by the striking fact that, under certain assumptions, asymptotic PE is equal to Kolmogorov-Sinai entropy, which was first observed by Bandt et al. [4]. This paper shows that the situation for q ≠ 1 is more complicated than that for q = 1.
The paper is organized as follows. First, a short overview of early applications of the RPE is given. Section 2 provides the main definitions. The concepts of RPE are introduced in empirical and model-based settings. Moreover, RPE is discussed for some special q, including q = ∞ as a limit case. Section 3 is devoted to the asymptotics of RPE and PE. With Corollary 1, the section contains the main new result of the paper, relating RPE to Kolmogorov-Sinai entropy for q ∈ [0, 1] and measures of maximal entropy. Its proof and a class of discriminating examples for q > 1 (see Example A1) are given in Appendix B.

First Applications of RPE
To the best of our knowledge, the concept of RPE was first considered in the literature in 2015. In a study of monitoring the depth of anaesthesia by EEG, Liang et al. [5] systematically compared 12 entropy measures, with RPE among them. They reported that RPE had the best performance in distinguishing different anaesthesia states. Mammone et al. [6] discussed RPE in the context of absence epilepsy EEG. Their results suggested improved abilities in classifying ictal and interictal EEG by using RPE (with suitable parameters) instead of PE. Zunino et al. [7] introduced permutation min entropy, which is the limit of the Rényi entropy as the defining parameter q approaches ∞, as a tool for finding temporal correlations in a time series.
Moreover, Rivero et al. [8] combined an enhanced Bayesian approach and RPE for predicting long-term time series. Following the results of Liang et al., Park et al. [9] used RPE for comparing anaesthetics given during a cesarean section, with results similar to those for other entropy measures. Different variants of RPE, from weighting to multiscaling, have been applied to complex stock-market data (Zhou and Shang [10], and Chen et al. [11]). Some remarks on RPE can also be found in [12].

General Entropy Concept
Given a finite index set I consisting of n elements, (p_i)_{i∈I} ∈ R^n is called a stochastic vector if p_i ≥ 0 for all i ∈ I and ∑_{i∈I} p_i = 1. The Rényi entropy RE((p_i)_{i∈I}, q) of a stochastic vector (p_i)_{i∈I} for q ∈ [0, ∞[ is defined by

RE((p_i)_{i∈I}, q) = (1/(1 − q)) log ∑_{i∈I} p_i^q for q ≠ 1, and
RE((p_i)_{i∈I}, 1) = −∑_{i∈I} p_i log p_i.

The Rényi entropy of a fixed stochastic vector monotonically decreases and is continuous with respect to q. It generalizes the Shannon entropy, which is obtained in the standard case q = 1. The larger q is, the more the role of the largest entries of the stochastic vector is emphasized; the smaller q is, the more equal the roles of all positive entries in the entropy formula are.
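For illustration, the definition above can be implemented directly. The following is a minimal Python sketch (the function name renyi_entropy is ours, not from the paper), treating q = 1 as the Shannon limit:

```python
import math

def renyi_entropy(p, q):
    """Rényi entropy RE(p, q) of a stochastic vector p for q >= 0,
    with the Shannon entropy as the q = 1 case (natural logarithm)."""
    p = [x for x in p if x > 0]       # conventions 0^q = 0 and 0 log 0 = 0
    if abs(q - 1.0) < 1e-12:          # Shannon case, the limit for q -> 1
        return -sum(x * math.log(x) for x in p)
    return math.log(sum(x ** q for x in p)) / (1.0 - q)
```

As expected from the definition, a uniform vector of length n yields log n for every q, and the entropy is non-increasing in q.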
On the basis of the concept of Rényi entropies, we want to give precise definitions of RPE regarding both the empirical and the modelling viewpoint.

Empirical RPE
For n ∈ N, we denote the set of permutations of {0, 1, . . . , n − 1} by S_n. A vector (v_0, v_1, . . . , v_{n−1}) ∈ R^n has ordinal pattern π = (π_0, π_1, . . . , π_{n−1}) ∈ S_n if v_{π_0} ≥ v_{π_1} ≥ . . . ≥ v_{π_{n−1}} and π_{l−1} > π_l in the case that v_{π_{l−1}} = v_{π_l}. The latter requirement realises the uniqueness of ordinal patterns.

Definition 1. The empirical Rényi permutation entropy for q ∈ [0, ∞[ and n ∈ N of a time series (x_t)_{t=0}^{N−1} is defined by

ePE((x_t)_{t=0}^{N−1}, q, n) = (1/(n − 1)) RE((p_π)_{π∈S_n}, q),

with p_π = #{t ∈ {0, 1, . . . , N − n} | (x_t, x_{t+1}, . . . , x_{t+n−1}) has ordinal pattern π}/(N − n + 1) being the relative frequency of the ordinal pattern π in the time series, and 0 log 0 and 0^0 being defined by 0.
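Definition 1 can be sketched in Python as follows (helper names are ours; ties are sidestepped by assuming pairwise distinct values, and the 1/(n − 1) normalization follows the definition above):

```python
import math
from collections import Counter

def ordinal_pattern(window):
    """Ordinal pattern of a window, encoded as the tuple of indices that
    sorts the window in decreasing order (values assumed pairwise distinct)."""
    return tuple(sorted(range(len(window)), key=lambda i: -window[i]))

def empirical_rpe(xs, q, n):
    """Empirical Rényi permutation entropy of order n, normalized by n - 1."""
    counts = Counter(ordinal_pattern(xs[t:t + n]) for t in range(len(xs) - n + 1))
    total = sum(counts.values())
    p = [c / total for c in counts.values()]  # relative pattern frequencies
    if abs(q - 1.0) < 1e-12:                  # Shannon case
        h = -sum(x * math.log(x) for x in p)
    else:
        h = math.log(sum(x ** q for x in p)) / (1.0 - q)
    return h / (n - 1)
```

A strictly monotone time series produces a single ordinal pattern and hence entropy 0 for every q.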

RPE
On the model side, we consider a measure-preserving dynamical system (Ω, A, µ, T), defined as a probability space (Ω, A, µ) equipped with an A-A-measurable map T : Ω → Ω satisfying µ(T^{−1}(A)) = µ(A) for all A ∈ A. T and the system (Ω, A, µ, T) are called measure-preserving. Generally, the dynamics of T can be related to ordinal patterns via a real-valued random variable X on Ω by assigning to ω ∈ Ω the ordinal pattern of (X(ω), X(T(ω)), . . . , X(T^{n−1}(ω))). Here, X is interpreted as an observable modelling a measuring process. If X (or, more generally, a collection of random variables) has certain separation properties, the ordinal patterns obtained via X (or via all random variables) contain much information on the given system. In the following, however, we usually assume that Ω is a subset of R, and the ordinal pattern Π(x) assigned to x ∈ Ω is that taken from (x, T(x), . . . , T^{n−1}(x)) (this is equivalent to considering X being the identity map). For π ∈ S_n, let P_π = {x ∈ Ω | Π(x) = π}, and let OP(n) = {P_π | π ∈ S_n}. The Rényi permutation entropy of T for q ∈ [0, ∞[ and n ∈ N is then defined by PE(T, q, n) = (1/(n − 1)) RE((µ(P_π))_{π∈S_n}, q).

Estimation
Given an orbit of some x ∈ Ω, it is natural to estimate µ(P_π) for π ∈ S_n and PE(T, q, n) by p_π and ePE((x_t)_{t=0}^{N−1}, q, n), respectively. In the case that T is ergodic, by Birkhoff's ergodic theorem, the corresponding estimators are asymptotically consistent. This particularly means that lim_{N→∞} ePE((T^t(x))_{t=0}^{N−1}, q, n) = PE(T, q, n) for µ-almost all x ∈ Ω.
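The orbit-based estimation can be illustrated numerically. The following Python sketch uses the logistic map T(x) = 4x(1 − x) as an example of an ergodic interval map (the choice of map, the orbit length, and the helper names are ours) and estimates the model-based RPE for q = 2 and n = 3 from a finite orbit:

```python
import math
from collections import Counter

def ordinal_pattern(window):
    # indices sorting the window in decreasing order (distinct values assumed)
    return tuple(sorted(range(len(window)), key=lambda i: -window[i]))

def empirical_rpe(xs, q, n):
    # empirical Rényi permutation entropy, normalized by n - 1
    counts = Counter(ordinal_pattern(xs[t:t + n]) for t in range(len(xs) - n + 1))
    total = sum(counts.values())
    p = [c / total for c in counts.values()]
    h = (-sum(x * math.log(x) for x in p) if q == 1
         else math.log(sum(x ** q for x in p)) / (1.0 - q))
    return h / (n - 1)

# Orbit of the logistic map T(x) = 4x(1 - x), which is ergodic with
# respect to an absolutely continuous invariant measure.
x, orbit = 0.3, []
for _ in range(20000):
    orbit.append(x)
    x = 4.0 * x * (1.0 - x)

estimate = empirical_rpe(orbit, 2, 3)  # estimates PE(T, 2, 3)
```

The estimate is strictly positive and, since not all 3! patterns occur with equal probability, stays below the maximal value log(3!)/2.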

RPE for Special Parameters q
In the following, we discuss the RPE for some special parameters q, touching the general concept of Rényi entropies.

q = 0: The Rényi entropy for q = 0 is the well-known Hartley entropy, and the RPE of a measure-preserving dynamical system is no more than the logarithm of the number of ordinal patterns appearing with positive probability. The Rényi entropy for q = 0 is also called max entropy since it is maximal among the Rényi entropies.

q = 1: This case, providing the standard (Shannon) permutation entropy, has been discussed in various papers from both a theoretical and an application viewpoint. We particularly refer to the literature mentioned in several parts of this paper.

q = 2: The Rényi entropy for q = 2, also called quadratic entropy or collision entropy, is used in different fields. It is obviously related to the Simpson index ∑_{i=1}^n p_i² given for a stochastic vector (p_i)_{i=1}^n and used as a diversity measure in ecology (see [13]). Given a measure-preserving dynamical system (Ω, A, µ, T), we look at the RPE for q = 2.
By Fubini's theorem, it holds that

∑_{π∈S_n} µ(P_π)² = ∑_{π∈S_n} ∫∫ 1_{P_π}(x) 1_{P_π}(y) dµ(x) dµ(y) = µ²({(x, y) ∈ Ω × Ω | Π(x) = Π(y)}).

Here, 1_A stands for the indicator function of a set A, assigning a point the value 1 if it belongs to A and the value 0 otherwise, and µ² denotes the product measure of µ with itself. So, PE(T, 2, n) is related to the probability that the ordinal patterns of length n of two independently (with respect to µ) chosen points coincide.
A natural estimation of ∑_{π∈S_n} µ(P_π)² based on a finite orbit (x_t)_{t=0}^{N−1} of some x ∈ Ω is given by

(2 / ((N − n + 1)(N − n))) ∑_{0≤s<t≤N−n} 1_{π_s = π_t},   (1)

where π_t denotes the ordinal pattern of (x_t, x_{t+1}, . . . , x_{t+n−1}), providing the relative frequency of pairs in the orbit with coinciding (completely defined) ordinal patterns. This qualifies the RPE for q = 2 as a recurrence measure. Quantity (1) was introduced by Caballero et al. [14] as the symbolic correlation integral in the context of stochastic processes and was studied mainly in the i.i.d. case.

q = ∞: It is well known that the Rényi entropy of a stochastic vector (p_i)_{i∈I} for q → ∞ converges to the value −log(max_{i∈I} p_i). This fact can be used to reconstruct a stochastic vector, up to permuting its components, from its Rényi entropies for an unbounded sequence (q_n)_{n∈N} (see Appendix A). The quantity −log(max_{i∈I} p_i) is called the min entropy of (p_i)_{i∈I}. Applications of min entropy in the permutation entropy context can be found in Zunino et al. [7]. In the following, we further assume that q < ∞.
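The pair-counting idea behind quantity (1) can be sketched as follows in Python (helper names and the test data are ours). For i.i.d. data, all n! ordinal patterns of length n are equally likely, so the estimate should be close to 1/n!:

```python
import math
import random
from collections import Counter

def ordinal_pattern(window):
    # indices sorting the window in decreasing order (distinct values assumed)
    return tuple(sorted(range(len(window)), key=lambda i: -window[i]))

def symbolic_correlation(xs, n):
    """Relative frequency of pairs of windows whose ordinal patterns of
    length n coincide, i.e. an estimate of sum over pi of mu(P_pi)^2."""
    pats = [ordinal_pattern(xs[t:t + n]) for t in range(len(xs) - n + 1)]
    m = len(pats)
    counts = Counter(pats)
    coinciding = sum(c * (c - 1) for c in counts.values())  # ordered pairs s != t
    return coinciding / (m * (m - 1))

random.seed(1)
xs = [random.random() for _ in range(3000)]
S = symbolic_correlation(xs, 3)
# For i.i.d. data, all 3! = 6 patterns are equally likely, so S should be
# close to 1/6, and -log(S) approximates the Rényi entropy for q = 2.
```

Counting coinciding pairs via the pattern counts c(c − 1) avoids the quadratic loop over all pairs of windows.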

Asymptotics of RPE and PE
As already mentioned, there is a strong relationship between Kolmogorov-Sinai entropy and PE. The result of Takens and Verbitskiy [15] that, for q > 1, Kolmogorov-Sinai entropy can be expressed by a limit on the basis of Rényi entropies instead of Shannon entropies suggests the question of whether PE can similarly be replaced by RPE in that relationship. This question addresses the asymptotics of RPE, which is thus at the centre of this section.

Kolmogorov-Sinai Entropy via Rényi Entropies
Definitions and statements of this subsection go back to Takens and Verbitskiy [15]. Many considerations of this paper are related to partitions of Ω. We generally assume that, in a context where a σ-algebra on Ω is specified, such partitions are contained in it.
Let (Ω, A, µ, T) now be a measure-preserving dynamical system, and consider a finite partition P = {P_i}_{i∈I} of Ω. For n ∈ N and multi-indices i = (i_0, i_1, . . . , i_{n−1}) ∈ I^n, define the sets

P_i = ∩_{l=0}^{n−1} T^{−l}(P_{i_l}),   (2)

forming the partition P^{(n)} = {P_i | i ∈ I^n}. The generalized entropy rate of T with respect to P and q ∈ [0, ∞[ is defined by

h(T, P, q) = lim inf_{n→∞} (1/n) H(P^{(n)}, q),   (3)

with H(Q, q) = RE((µ(Q))_{Q∈Q}, q) for a finite partition Q of Ω. Generalized Kolmogorov-Sinai entropy for q ∈ [0, ∞[ is defined as the supremum of generalized entropy rates taken over all finite partitions:

h(T, q) = sup_P h(T, P, q).

(Standard) Kolmogorov-Sinai entropy is given by h(T) = h(T, 1). In the case of q = 1, the limit inferior in (3) can be replaced by a limit, and, for Ω being an interval and A being the Borel σ-algebra, Kolmogorov-Sinai entropy is already determined by finite interval partitions, defined as finite partitions consisting of intervals (e.g., [16]).

The following theorem of Takens and Verbitskiy [15] was originally proved for invertible systems; however, it also holds true for noninvertible systems (see Verbitskiy [17]). The assumption of ergodicity can be relaxed (see Takens and Verbitskiy [18]); however, we do not go into the technical details.

Theorem 1. Let (Ω, A, µ) be a standard probability space and T : Ω → Ω an aperiodic and ergodic measure-preserving map. Then, h(T, q) = h(T) for all q > 1.

Here, T is called aperiodic if the set of periodic points has measure zero with respect to µ. The property of a probability space being standard is a relatively technical one; however, it is not very restrictive since it is satisfied for the most common probability spaces (e.g., Walters [16]).
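For a concrete example, consider the doubling map T(x) = 2x mod 1 with Lebesgue measure and the partition P = {[0, 1/2), [1/2, 1)}. Here the refined partition P^(n) consists of the dyadic intervals of length 2^(−n), so the generalized entropy rate can be computed exactly; a Python sketch (the setup and names are ours):

```python
import math

# Generalized entropy rate (1/n) * H(P^(n), q) for the doubling map
# T(x) = 2x mod 1 with Lebesgue measure and P = {[0, 1/2), [1/2, 1)}.
# P^(n) consists of the 2^n dyadic intervals of length 2^(-n), so the
# cylinder masses are known exactly and no simulation is needed.
def entropy_rate(n, q):
    masses = [2.0 ** -n] * (2 ** n)  # µ(P_i) for every i in {0, 1}^n
    if q == 1:                        # Shannon case
        h = -sum(m * math.log(m) for m in masses)
    else:
        h = math.log(sum(m ** q for m in masses)) / (1.0 - q)
    return h / n
```

Since the cylinder masses are uniform, the rate equals log 2 for every q in this example, in line with h(T, q) = h(T) for this map.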

Kolmogorov-Sinai Entropy and RPE
In order to discuss the relationship between the RPE and the Kolmogorov-Sinai entropy of a measure-preserving dynamical system (Ω, A, µ, T), we define the lower and upper Rényi permutation entropies \underline{PE}(T, q) and \overline{PE}(T, q) for q ∈ [0, ∞[ by

\underline{PE}(T, q) = lim inf_{n→∞} (1/n) H(OP(n), q) and \overline{PE}(T, q) = lim sup_{n→∞} (1/n) H(OP(n), q),

writing PE(T, q) if both coincide and PE(T) = PE(T, 1). The result of Bandt et al. [4] that, for piecewise continuous and monotone interval maps, the permutation and Kolmogorov-Sinai entropy coincide is the motivation for the following discussion. Here, we state the more general version of the result proved in Gutjahr and Keller [19], but afterwards return to the case of piecewise monotone interval maps. In the following, we call a subset of Ω ⊂ R an interval if it is the intersection of an interval of R with Ω or a one-point set.

Theorem 2 ([19]). Let (Ω, B, µ, T) be a measure-preserving dynamical system, with Ω ⊆ R being compact and B being the Borel σ-algebra on Ω. If there exists a finite partition M, or a countable partition M with H(M) < ∞, of Ω into intervals such that T is monotone on each of the intervals, then PE(T) = h(T).
Theorem 2 covers interval maps, since a noncompact Ω can be replaced by a compactification without substantially changing the structure of the given system.

q > 1: In light of the statement of Takens and Verbitskiy [15] mentioned above, it is a natural question whether also PE(T, q) = h(T, q) (= h(T)). The general answer is no. Examples with PE(T, q) < h(T, q) covering all q > 1 are given by Example A1 in Appendix B.2.

q < 1: We also look at the case q < 1 in the class of maps considered by Bandt et al. in [4]. For this, let Ω be an interval, B the Borel σ-algebra on it, and M a finite partition of Ω into intervals on each of which T is monotone and continuous. For such a map, it was shown in [4] that

lim_{n→∞} (1/n) log #{P_π ∈ OP(n) | P_π ≠ ∅} = h_top(T)   (4)

holds true, where h_top(T) denotes the topological entropy of T. Using the fact that the Rényi entropy monotonically decreases in q, together with Theorem 2, this implies that, for all q ∈ [0, 1], both the lower and the upper RPE lie between h(T) and h_top(T).

The quantity lim sup_{n→∞} (1/n) log #{P_π ∈ OP(n) | P_π ≠ ∅} could be considered a topological version of permutation entropy. This is justified by (4) for T as defined above (4) and by the following: if T is continuous on all of Ω, Misiurewicz and Szlenk showed that lim_{n→∞} (1/n) log #{M ∈ M^{(n)} | M ≠ ∅} is equal to the topological entropy of T [20]. For the definition of topological entropy and for the following, see, e.g., [16].
By the variational principle, the topological entropy of a map T on a compact Hausdorff space is equal to the supremum of the Kolmogorov-Sinai entropies over all Borel measures for which T is measure-preserving (e.g., [16]). Often, the topological entropy is attained by the Kolmogorov-Sinai entropy of such a measure. Generally, given a continuous map T on a metric space, a corresponding invariant Borel measure is said to be of maximal entropy if its Kolmogorov-Sinai entropy coincides with the topological entropy of T.
On the basis of the discussion above, we show the following statement (see Appendix B.1).

Corollary 1. Let (Ω, B, µ, T) be a measure-preserving dynamical system, with Ω ⊆ R being an interval and B being the Borel σ-algebra on Ω. Suppose that T is continuous, that there exists a finite partition of Ω into intervals such that T is monotone on each of those intervals, and that µ is a measure of maximal entropy. Then, PE(T, q) = h(T) for all q ∈ [0, 1].

Conclusions
In this paper, we looked more closely at the recently introduced and used Rényi variant of permutation entropy, depending on a parameter q ∈ [0, ∞[, which is called Rényi permutation entropy (RPE) here. After giving a summary of first applications of RPE and discussing RPE for some special parameters q, we mainly focused on the asymptotics of RPE for ordinal pattern length going to ∞. This was motivated by the fact that the usual permutation entropy (PE) often asymptotically coincides with Kolmogorov-Sinai entropy, and that, for q > 1, Kolmogorov-Sinai entropy can be defined by Rényi entropies instead of Shannon entropies. This paper showed that, for q > 1, the asymptotics of RPE can be different from that of PE, meaning that, for long ordinal patterns, the nature of RPE is not the same as that of PE. On the other hand, it is interesting that, for continuous piecewise monotone interval maps with a measure of maximal entropy and q ≤ 1, the asymptotics of RPE and PE are the same. These results indicate that the behaviour of general RPE is more specific than that of PE, although even the asymptotics of PE is not completely understood. Further work towards a better understanding of RPE for large pattern lengths is necessary.
The content of this paper is more or less purely mathematical, but, in a certain sense, it justifies the application of RPE in dynamical systems and time series besides PE. Some of the applications mentioned at the beginning of the paper underline the benefit of using RPE. There is, however, the further interesting point that special values of q address special features; for example, q = 2 is related to recurrence. The symbolic correlation integral related to the case q = 2 is a U-statistic, which is helpful in the statistical analysis of the corresponding entropy. Work on utilizing this fact for testing for asymmetry in temporal data is in progress.
Author Contributions: K.K. and T.G. designed and wrote the paper, and T.G. provided all statements and proofs given in Appendix B. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Reconstruction of Stochastic Vectors from Rényi Entropies
Let (p_i)_{i=1}^m be a stochastic vector, and let (q_n)_{n∈N} be an unbounded sequence of positive numbers such that RE((p_i)_{i=1}^m, q_n) is known for all n ∈ N. Assume that p_1 ≤ p_2 ≤ . . . ≤ p_m. The reconstruction we give is inductive.
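The first step of the reconstruction, recovering the largest component p_m as the limit of exp(−RE(·, q)) for q → ∞, can be illustrated numerically; a Python sketch (the example vector is our choice):

```python
import math

def renyi(p, q):
    # Rényi entropy for q != 1 (natural logarithm)
    return math.log(sum(x ** q for x in p if x > 0)) / (1.0 - q)

p = [0.2, 0.3, 0.5]  # example stochastic vector with p_1 <= p_2 <= p_3
# As q grows, RE(p, q) tends to the min entropy -log(max p), so the
# largest component can be recovered as the limit of exp(-RE(p, q)):
largest = math.exp(-renyi(p, 200.0))
```

Already for q = 200, the recovered value agrees with max p = 0.5 to within about 1 percent, and the monotone decrease of RE in q is visible numerically.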

Appendix B. Proofs
Some of the following statements are only given on a level of generality required here. In some cases, for more general statements and further details, we refer to Gutjahr [21]. First, if a finite partition P of Ω is finer than another one, Q, then H(P, q) ≥ H(Q, q) for all q ∈ [0, ∞[. This is well known and easily follows from the concavity and convexity of x ↦ x^q for q < 1 and q > 1, respectively, and from the convexity of x ↦ x log x (corresponding to q = 1) on ]0, ∞[. Given two finite partitions P, Q of Ω, let P ∨ Q = {P ∩ Q ≠ ∅ | P ∈ P, Q ∈ Q} be the largest common refinement of P and Q. Further, for a partition P of Ω and some Q ∈ A, let ∆(P | Q) := {P ∈ P | µ(P ∩ Q) > 0}, and let #∆(P | Q) denote the number of elements of ∆(P | Q).
The proof of Corollary 1 is built on the two following lemmata.

Using the above inequality provides the desired estimate.
The following lemma was proved in Gutjahr and Keller [19] (see Lemma A1). We present it here in a slightly weakened form since we do not need the full generality.
Lemma A2. Let (Ω, B, µ, T) be a measure-preserving dynamical system, with Ω ⊆ R being an interval and B being the Borel σ-algebra on Ω. Further, let P be a finite interval partition of Ω. Then, for all P_π ∈ OP(n):

As a consequence of the two lemmata, we obtain an upper bound on the RPE that is important for the following.

Proof. Let P = {P_i}_{i∈I} be a finite partition of Ω. Then, Lemma A1 provides

H(OP(n), q) ≤ H(OP(n) ∨ P^{(n)}, q) ≤ H(P^{(n)}, q) + max_{i∈I^n} log #∆(OP(n) | P_i)

for all q > 1 and n ∈ N. Dividing both sides by n and taking the limit superior as n → ∞ finishes the proof.

We can now finalise the proof of Corollary 1. Given the assumptions of the corollary, by Proposition 1 and the monotonicity of the Rényi entropy with respect to q, we have
for such cylinder sets. The shift σ is µ_p-preserving and ergodic.
Proof. Take k ∈ N and q ∈ R. Notice that the following holds true for all q ≠ 1.
In order to obtain a map with finitely many monotone parts, identify the Bernoulli shift σ on {0, 1, . . . , N − 1}^∞ with the interval map T on Ω = [0, 1[ defined by T(ω) = N · ω mod 1 for all ω ∈ Ω. This is possible since the correspondence holds true. Here, the first equality follows from Corollary A1, the last one from Theorem 1, and the first inequality from Lemma A6.