Generalized Ordinal Patterns and the KS-Entropy

Ordinal patterns classifying real vectors according to the order relations between their components are an interesting basic concept for determining the complexity of a measure-preserving dynamical system. In particular, as shown by C. Bandt, G. Keller and B. Pompe, the permutation entropy based on the probability distributions of such patterns is equal to Kolmogorov–Sinai entropy in simple one-dimensional systems. The general reason for this is that, roughly speaking, the system of ordinal patterns obtained for a real-valued “measuring arrangement” has high potential for separating orbits. Starting from a slightly different approach of A. Antoniouk, K. Keller and S. Maksymenko, we discuss the generalizations of ordinal patterns providing enough separation to determine the Kolmogorov–Sinai entropy. For defining these generalized ordinal patterns, the idea is to substitute the basic binary relation ≤ on the real numbers by another binary relation. Generalizing the former results of I. Stolz and K. Keller, we establish conditions that the binary relation and the dynamical system have to fulfill so that the obtained generalized ordinal patterns can be used for estimating the Kolmogorov–Sinai entropy.


Introduction
In 2002, Bandt and Pompe introduced so-called permutation entropy [1]. This entropy has been established in non-linear dynamical system theory and time series analysis, including applications in many fields from biomedicine to econophysics (compare with Zanin et al. [2]). It is a crucial point that permutation entropy is theoretically justified by asymptotic results relating it to Kolmogorov-Sinai entropy (KS entropy, also called metric entropy) which is the central complexity measure for dynamical systems. The important relationship of permutation entropy and KS entropy was first observed and mathematically founded for piece-wise monotone dynamical systems by Bandt et al. [3].
The (empirical) concept of permutation entropy is based upon analyzing the distribution of ordinal patterns in a time series or the underlying system. In this paper, we concentrate on a measure-preserving dynamical system (Ω, A , µ, T), i.e., a probability space (Ω, A , µ) equipped with a measurable map T : Ω → Ω satisfying µ(T −1 (A)) = µ(A) for all A ∈ A .
Given a random variable X : Ω → R, in this paper, an ordinal pattern of length n ∈ N with respect to X is considered as a subset of the state space Ω. It is indexed by a permutation π = (π_0, π_1, ..., π_{n−1}) of {0, 1, ..., n − 1} and defined by:
$$P_\pi := \{\omega \in \Omega \mid X(T^{\pi_0}(\omega)) \le X(T^{\pi_1}(\omega)) \le \dots \le X(T^{\pi_{n-1}}(\omega))\}. \qquad (1)$$
In the rest of this section, we assume that X preserves enough information about the given system in a certain sense. This is particularly the case if Ω is contained in R and X is the identity map. A precise general description of the assumption is given when presenting the results of this paper. It was shown in [4] that, under not too restrictive further conditions, the probability distributions on the partitions OP(n) formed by the ordinal patterns of length n, n ∈ N, can be used for determining the KS entropy of the given system. The reason is that, roughly speaking, under these conditions, OP(n) is able to separate the orbits of the system as n → ∞.
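As a small illustration of definition (1), the following Python sketch computes the permutation that indexes the ordinal pattern containing a given point ω. The function names, the choice of the logistic map as T, and the tie-breaking rule (ties are resolved by time index) are assumptions made for this example only and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple


def orbit_values(x: Callable[[float], float],
                 t: Callable[[float], float],
                 omega: float, n: int) -> List[float]:
    """Return [X(omega), X(T(omega)), ..., X(T^(n-1)(omega))]."""
    values = []
    for _ in range(n):
        values.append(x(omega))
        omega = t(omega)
    return values


def ordinal_pattern(values: Sequence[float]) -> Tuple[int, ...]:
    """Return a permutation pi with values[pi[0]] <= ... <= values[pi[-1]].

    sorted() is stable, so equal values are ordered by time index
    (a convention assumed for this sketch).
    """
    return tuple(sorted(range(len(values)), key=lambda i: values[i]))


# Hypothetical example system: logistic map with X the identity map.
measurements = orbit_values(lambda w: w, lambda w: 4 * w * (1 - w), 0.2, 4)
pattern = ordinal_pattern(measurements)
```

Here `pattern` lists the time indices in order of increasing measured value, so ω lies in the set P_π with π equal to `pattern`.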
In order to address the problem that this paper is concerned with, we give a description of ordinal patterns that differs slightly from the above. One can determine to which ordinal pattern P_π of length n a point ω belongs if, for all (s, t) in:
$$E_n := \{(s,t) \in \mathbb{N}_0^2 \mid 0 \le s < t \le n-1\}, \qquad (2)$$
one knows whether X(T^s(ω)) ≤ X(T^t(ω)) holds true or not. In other words, there exists a set A ⊆ E_n such that:
$$P_\pi = \bigcap_{(s,t)\in A} \{\omega \in \Omega \mid (X(T^s(\omega)), X(T^t(\omega))) \in R\} \;\cap \bigcap_{(s,t)\in E_n\setminus A} \{\omega \in \Omega \mid (X(T^s(\omega)), X(T^t(\omega))) \in \mathbb{R}^2 \setminus R\}, \qquad (3)$$
where:
$$R := \{(x,y) \in \mathbb{R}^2 \mid x > y\}. \qquad (4)$$
The above set contains all the points ω ∈ Ω that satisfy X(T^s(ω)) > X(T^t(ω)) for (s, t) ∈ A and X(T^s(ω)) ≤ X(T^t(ω)) for (s, t) ∈ E_n \ A. Note that, given some arbitrary A ⊆ E_n, the set on the right-hand side of (3) can be empty. In the case that it is non-empty, it coincides with some ordinal pattern P_π of length n. While Equation (3) might be a bit more abstract than (1), it shows a way to generalize the concept of ordinal patterns by replacing the set R in (4) with an arbitrary Borel subset R of ℝ², and thereby also to investigate why ordinal patterns are so successful.
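The pairwise description can be sketched in Python as follows. A generalized ordinal pattern is represented by the set A of time pairs (s, t) whose measured values lie in the relation. The names `time_pairs`, `generalized_pattern`, `greater` and `left_half` are hypothetical, and the form of E_n used here (all pairs with 0 ≤ s < t ≤ n − 1) is an assumption of this sketch.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple

Relation = Callable[[float, float], bool]


def time_pairs(n: int) -> List[Tuple[int, int]]:
    """All pairs (s, t) with 0 <= s < t <= n - 1 (assumed form of E_n)."""
    return list(combinations(range(n), 2))


def generalized_pattern(values: Sequence[float], relation: Relation,
                        pairs: Optional[List[Tuple[int, int]]] = None
                        ) -> FrozenSet[Tuple[int, int]]:
    """Return the set A of pairs (s, t) with (values[s], values[t]) in R."""
    if pairs is None:
        pairs = time_pairs(len(values))
    return frozenset((s, t) for s, t in pairs if relation(values[s], values[t]))


def greater(x: float, y: float) -> bool:
    """Relation x > y, which recovers the basic ordinal patterns."""
    return x > y


def left_half(x: float, y: float) -> bool:
    """A relation that ignores y: a cut at x <= 1/2."""
    return x <= 0.5


values = [0.2, 0.64, 0.9216, 0.28901376]
basic = generalized_pattern(values, greater)
cut = generalized_pattern(values, left_half)
```

Two points lie in the same generalized ordinal pattern exactly when they produce the same set A, so the set itself serves as the symbol.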

Definition 1.
We call a non-empty Borel subset R of R 2 a discriminating relation.
The figures given in this paper show different discriminating relations R. In each case, only the part of R contained in [0, 1[ 2 is presented. Note that in the case that X maps Ω into [0, 1[, the restriction of R itself to this part would not change anything. Figure 1a illustrates R as given in (4), again only on [0, 1[ 2 . In the case of such an R, note that tan(π(X − 1/2)) mapping [0, 1[ into [−∞, ∞[ would not make a difference to a given X for our considerations, since order relations and associated partitions are preserved.
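The remark on rescaling can be checked numerically. The sketch below, with hypothetical names, applies the strictly increasing map x ↦ tan(π(x − 1/2)) to a few values in ]0, 1[ and compares the resulting ordinal patterns. The sample values are arbitrary and chosen only for illustration.

```python
import math
from typing import Sequence, Tuple


def ordinal_pattern(values: Sequence[float]) -> Tuple[int, ...]:
    """Indices of the values in increasing order (ties by time index)."""
    return tuple(sorted(range(len(values)), key=lambda i: values[i]))


def stretch(x: float) -> float:
    """Strictly increasing map from ]0, 1[ onto the real line."""
    return math.tan(math.pi * (x - 0.5))


values = [0.2, 0.64, 0.9216, 0.28901376]
stretched = [stretch(v) for v in values]
same_pattern = ordinal_pattern(values) == ordinal_pattern(stretched)
```

Since the map is strictly increasing, all order relations between the values survive, and `same_pattern` is true.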
Given some discriminating relation, generalized ordinal patterns of length n with respect to X are given as the non-empty sets defined by the right-hand side of (3) for some A ⊆ E n . Obviously, they also form a partition of Ω. The question that arises is what a discriminating relation R should look like so that those generalized ordinal patterns inherit the nice properties of the original ordinal patterns. More precisely, we ask the following question:

Main Question.
Under what conditions on a discriminating relation R do the partitions given by the generalized ordinal patterns determine the KS entropy of a dynamical system?
Why is this determination of entropy, which is precisely described by formula (10) in Theorem 1, interesting? To answer this question, interpret X as an observable, ω as the initial state of the given system and X(ω), X(T(ω)), X(T 2 (ω)), X(T 3 (ω)), . . . as the measured values at times 0, 1, 2, 3, . . . . Determining (generalized) ordinal patterns on the basis of those values is a symbolization, where a symbol obtained is the (generalized) ordinal pattern containing ω. Generally, symbolization means a coarse-graining of the state space underlying a system, where each point is assigned one of finitely many given symbols. Instead of considering the precise development of the system, one is interested in the change of symbols in the course of time, which justifies the name symbolic dynamics for the method. Note that a symbolization is equivalent to partitioning the state space into classes of states (with the same symbol).
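The symbolization step can be sketched for a finite series of measured values. In the following Python example, each sliding window of length n is replaced by its ordinal pattern, and the Shannon entropy of the empirical symbol distribution is computed. The names `symbolize` and `empirical_entropy`, the sliding-window scheme and the use of the natural logarithm are assumptions of this sketch. No normalization is applied.

```python
from collections import Counter
from math import log
from typing import Hashable, List, Sequence, Tuple


def ordinal_pattern(window: Sequence[float]) -> Tuple[int, ...]:
    """Indices of the window values in increasing order (ties by index)."""
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))


def symbolize(series: Sequence[float], n: int) -> List[Tuple[int, ...]]:
    """Replace every window of n consecutive values by its ordinal pattern."""
    return [ordinal_pattern(series[i:i + n])
            for i in range(len(series) - n + 1)]


def empirical_entropy(symbols: Sequence[Hashable]) -> float:
    """Shannon entropy (natural logarithm) of the empirical distribution."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * log(c / total) for c in counts.values())


monotone = symbolize([1.0, 2.0, 3.0, 4.0, 5.0], 2)
alternating = symbolize([0.0, 1.0, 0.0, 1.0, 0.0], 2)
```

A monotone series produces a single symbol and hence entropy 0, while the alternating series produces two equally frequent symbols.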
The reason for obtaining the full entropy from the (generalized) ordinal patterns is, roughly speaking, that the symbol system obtained has high potential for separating orbits. Such kinds of successful symbolizations are important, for example, in big data analysis, see, e.g., Smith et al. [5].
The above question was first considered in [6], where the authors basically showed that sets of the form: lead to generalized ordinal patterns that, under some conditions, can be used to determine the entropy if g : R → R is measurable and one-to-one. Such an R is shown in Figure 1b and will be discussed in Section 4 as well as another R illustrated in Figure 1.
In this paper, we consider general sets R ⊆ R 2 that cannot necessarily be described by functions and inequalities and establish some conditions under which the entropy can be determined using those sets. As in [6], the discussion also includes a generalization of the sets E n given by (2) and is conducted in a multidimensional framework. In particular, the results give insights as to why the basic ordinal approach and generalizations are working.
It is instructive to discuss the partition of R 2 into R and R 2 \ R from the viewpoint of symbolic dynamics. In contrast to classical symbolization approaches with symbolizing only in the range of single "measurements" x, the symbolization of pairs (x, y) via the partition {R, R 2 \ R} also regards some kind of link between x and y if R lies "diagonal" in a certain sense. We will discuss this constellation, which explains the success of ordinal patterns in a wider context, more precisely in Section 5.
A completely different constellation is given for the sets R shown in Figure 2. Here, R is obtained as a half-plane from a "horizontal" division of R 2 . If, for example, Ω = [0, 1[, A is the Borel σ-algebra and µ the Lebesgue measure on Ω, and if T is the tent map on Ω and X is the identity map, then the location of the horizontal cut is crucial.
Figure 2. "Non-diagonal" discriminating relations.
On the one hand, R = {(x, y) ∈ R 2 | x ≤ 2/3} (Figure 2b) does not discriminate enough to obtain the KS entropy of the given system, and on the other hand, there is enough discrimination by R = {(x, y) ∈ R 2 | x ≤ 1/2} (Figure 2a) due to the fact that {[0, 1/2[, [1/2, 1[} is a generating partition for T. In the situation considered, there is no additional information given by the measurements (x, y) relative to measurement x, hence R provides nothing more than a classical symbolization. For a detailed discussion of these facts, see [6].
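The difference between the two cuts can be made concrete by an exact computation. The following Python sketch assumes the tent map in the form T(x) = min(2x, 2 − 2x) and computes the Lebesgue measure of every cylinder set of the two-cell partition given by a cut point, using rational arithmetic. The difference of consecutive block entropies then serves as an estimate of the entropy rate of that partition. All function names are hypothetical, and endpoints of intervals are ignored because they have measure 0.

```python
from fractions import Fraction
from itertools import product
from math import log


def tent_preimage(intervals):
    """Preimage of a union of intervals under T(x) = min(2x, 2 - 2x)."""
    out = []
    for a, b in intervals:
        out.append((a / 2, b / 2))          # branch x < 1/2, where T(x) = 2x
        out.append((1 - b / 2, 1 - a / 2))  # branch x > 1/2, where T(x) = 2 - 2x
    return out


def restrict(intervals, lo, hi):
    """Intersect a union of intervals with the interval (lo, hi)."""
    out = []
    for a, b in intervals:
        a2, b2 = max(a, lo), min(b, hi)
        if a2 < b2:
            out.append((a2, b2))
    return out


def cylinder_measure(word, cut):
    """Lebesgue measure of the points whose orbit visits the cells in word."""
    cells = {0: (Fraction(0), cut), 1: (cut, Fraction(1))}
    current = [(Fraction(0), Fraction(1))]
    for symbol in reversed(word):
        current = restrict(tent_preimage(current), *cells[symbol])
    return sum(b - a for a, b in current)


def block_entropy(n, cut):
    """Shannon entropy (natural logarithm) of the cylinders of length n."""
    total = 0.0
    for word in product((0, 1), repeat=n):
        m = cylinder_measure(word, cut)
        if m > 0:
            total -= float(m) * log(float(m))
    return total


def entropy_rate_estimate(n, cut):
    """Block entropy difference H_n - H_(n-1)."""
    return block_entropy(n, cut) - block_entropy(n - 1, cut)


rate_half = entropy_rate_estimate(6, Fraction(1, 2))
rate_two_thirds = entropy_rate_estimate(6, Fraction(2, 3))
```

In this sketch, `rate_half` evaluates to log 2, the KS entropy of the tent map with respect to the Lebesgue measure, whereas `rate_two_thirds` evaluates to (2/3) log 2, which is strictly smaller.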
The rest of this paper is organized as follows. Section 2 provides the notions and concepts necessary for formulating the main statement of this paper in Section 3. This statement is rather abstract and general and has to be considered in relation to some special cases discussed in Section 4, which make our ideas and findings transparent. Section 5 is devoted to the proof of our main statement.

Preliminaries
Throughout this paper, (Ω, A , µ, T) will be a measure-preserving dynamical system.

Some Notions
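We will write B = B(R) or B(R d ) for the Borel σ-algebra on R or R d ; d ∈ N, respectively. Given a random variable X : Ω → R, by µ X we denote the push-forward measure of µ with regard to X, i.e., µ_X(A) = µ(X^{−1}(A)) for all A ∈ B, and by µ_X^2 the product measure of µ_X with itself on B(R 2 ). Given R ∈ B(R 2 ), we consider the function f^R_X : ℝ → [0, 1] defined by:
$$f_X^R(x) := \mu_X(\{y \in \mathbb{R} \mid (x,y) \in R\}).$$
If it is clear from the context which set R ∈ B(R 2 ) is considered, we simply write f X instead of f R X . The function f X can be represented as the integral:
$$f_X(x) = \int_{\mathbb{R}} \mathbf{1}_R(x,y)\, d\mu_X(y).$$
Since 1 R is integrable with regard to µ 2 X , f X is measurable and integrable by Fubini's theorem.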
The complement R 2 \ R of a set R ⊆ R 2 will be denoted by R c . The notation ∂R will be used for the boundary of a set R, i.e., the closure of R without its interior.

Entropy
The Shannon entropy of a finite partition P ⊂ A of Ω is defined as:
$$H(\mathcal{P}) := -\sum_{P \in \mathcal{P}} \mu(P) \log \mu(P),$$
with the convention 0 log 0 := 0. The refinement of two partitions P, Q ⊂ A of Ω is given by:
$$\mathcal{P} \vee \mathcal{Q} := \{P \cap Q \mid P \in \mathcal{P},\ Q \in \mathcal{Q}\}.$$
For a finite collection of partitions P i ⊂ A , i ∈ {1, 2, . . . , n} of Ω, one analogously defines:
$$\bigvee_{i=1}^{n} \mathcal{P}_i := \mathcal{P}_1 \vee \mathcal{P}_2 \vee \dots \vee \mathcal{P}_n.$$
The entropy rate of a finite partition P ⊂ A of Ω is defined as:
$$h(T, \mathcal{P}) := \lim_{n \to \infty} \frac{1}{n} H\Big(\bigvee_{t=0}^{n-1} T^{-t}(\mathcal{P})\Big),$$
where T −t (P ) = {T −t (P) | P ∈ P }. For the existence of the limit in the formula, see, e.g., [7]. We are interested in determining the Kolmogorov-Sinai entropy of a system, which is defined as:
$$h(T) := \sup_{\mathcal{P}} h(T, \mathcal{P}),$$
where the supremum is taken over all finite partitions of Ω in A . Note that the Kolmogorov-Sinai entropy serves as the central complexity measure for dynamical systems and can be considered as a reference for other complexity measures, including in data analysis. Roughly speaking, it measures the mean information obtained by each iteration step. Since the Kolmogorov-Sinai entropy is the supremum of the entropy rates of all finite partitions, its determination and its estimation in a practical context are not easy, and it is of some interest to find natural finite partitions for which the entropy rate is near the Kolmogorov-Sinai entropy. This is also a motivation for considering ordinal patterns and their generalization in this paper.
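For a finite probability space, the Shannon entropy and the refinement of partitions can be computed directly. The following Python sketch uses hypothetical names, represents a partition as a list of frozensets, drops empty intersections in the refinement, and uses the natural logarithm. These are conventions chosen for this example.

```python
from math import log
from typing import Dict, FrozenSet, List

Partition = List[FrozenSet[int]]


def shannon_entropy(partition: Partition, mu: Dict[int, float]) -> float:
    """H(P) = -sum of mu(P) * log(mu(P)) over the blocks of positive measure."""
    total = 0.0
    for block in partition:
        p = sum(mu[w] for w in block)
        if p > 0:
            total -= p * log(p)
    return total


def refine(p: Partition, q: Partition) -> Partition:
    """Common refinement: all non-empty intersections of blocks of p and q."""
    return [a & b for a in p for b in q if a & b]


# Uniform measure on a four-point space (hypothetical example).
mu = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
p = [frozenset({0, 1}), frozenset({2, 3})]
q = [frozenset({0, 2}), frozenset({1, 3})]
h_p = shannon_entropy(p, mu)
h_join = shannon_entropy(refine(p, q), mu)
```

In this example the refinement consists of the four singletons, so `h_p` equals log 2 and `h_join` equals log 4.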

σ-Algebras
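Given a family of sets A i ∈ A , i ∈ I, by σ(A i | i ∈ I) we denote the smallest σ-algebra containing all sets A i . Analogously, for a family of partitions P i ⊂ A , i ∈ I of Ω, we define σ(P i | i ∈ I) as the smallest σ-algebra containing all partitions P i as subsets. Given a family of random variables X i : Ω → R, i ∈ I, we define σ(X i | i ∈ I) as the smallest σ-algebra containing all preimages of Borel sets under the X i . When comparing two σ-algebras A 1 , A 2 ⊆ A , we ignore sets with measure 0, i.e., we write:
$$\mathcal{A}_1 \overset{\mu}{\subseteq} \mathcal{A}_2 \quad \text{if for every } A \in \mathcal{A}_1 \text{ there exists } B \in \mathcal{A}_2 \text{ with } \mu(A \,\triangle\, B) = 0,$$
and we write that A 1 and A 2 are equal up to sets of measure 0 if both inclusions hold in this sense.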

The Main Statement
Recall that a measure-preserving dynamical system (Ω, A , µ, T) is called ergodic if µ(A) ∈ {0, 1} holds for every A ∈ A with T −1 (A) = A, meaning that the system does not divide into proper parts.
Referring to Section 1, we give some preparation for stating our main result. Recall that for defining (generalized) ordinal patterns it was essential to know whether (X(T s (ω)), X(T t (ω))) ∈ R or (X(T s (ω)), X(T t (ω))) ∈ R 2 \ R for ω ∈ Ω and a random variable X on Ω, where "time" pairs (s, t) were taken from the sets E n (see (2)). In order to also allow reducing the number of necessary "comparisons", we relax the definition of the sets E n , leading to the following concept.
We call a sequence (E n ) n∈N of sets E 1 ⊆ E 2 ⊆ E 3 ⊆ . . . ⊆ N 2 0 a timing, if each E n contains finitely many elements and if there exists a sequence (a n ) n∈N ∈ N N 0 with: Formula (7) roughly says that nearly each "temporal" distance is available for "comparisons". It guarantees that enough time-pairs are considered so that no information contained in the "full" generalized ordinal patterns is lost in the "thinned out" generalized ordinal patterns.

Remark 1.
In the first paper on generalized ordinal patterns ([6]), a timing (E n ) n∈N was defined differently: the authors of that paper called a sequence of finite sets E 1 ⊆ E 2 ⊆ E 3 ⊆ . . . ⊆ N 2 0 a timing, if there exists an increasing sequence (a n ) n∈N such that for all n ∈ N: for all s ∈ {a 0 , a 1 , . . . , a n }, there exists a t ∈ {a 0 , a 1 , . . . , a n } with s ≠ t and (s, t) ∈ E n , . . . , d} hold true. Note that the last condition does not only depend on the timing (E n ) n∈N but also on T and X = (X 1 , X 2 , . . . , X d ). Instead of those three conditions, we simply require that almost all differences can be found in the timing.
Given a random vector (X 1 , X 2 , . . . , X d ) and R ∈ B(R 2 ), we define the partition: . . , d}, which is equal to: Then, for E n as given in (2) the partition (s,t)∈E k (T s , T t ) −1 (R i ) is no more than the partition of generalized ordinal patterns with respect to X i defined in Section 1 and can be considered as the partition of generalized ordinal patterns with respect to (X 1 , X 2 , . . . , X d ).
The proof of the following main theorem of the paper is given in Section 5.
Theorem 1. Let (Ω, A , µ, T) be an ergodic measure-preserving dynamical system, X = (X 1 , X 2 , . . . , X d ) : Ω → R d be a random vector, R ∈ B(R 2 ) be a discriminating relation and (E n ) n∈N be a timing. Assume that the following conditions are valid: There exists a countable set C ⊆ R 2 with:
$$\mu_{X_i}^2(\partial R \setminus C) = 0 \quad \text{for all } i \in \{1, 2, \dots, d\}. \qquad (8)$$
There exists a random variable Y : Ω → R with: Then: holds true.
At first glance, conditions (8) and (9), which are sufficient for (10), look very special. The considerations in the following section will, however, elucidate their role and show that they are relatively general. Roughly speaking, (8) says that the distribution of pairs of "independent measurements" with respect to X i is discrete on the boundary of R. Condition (9) is an orbit separation condition based on the involved "measurements" and the functions f R X i . In general: holds true because all functions involved in (9) are A -measurable. Therefore, (9) is equivalent to The inclusion of the random variable Y provides some further separation and allows the above inclusion to hold true for a wider class of dynamical systems than, for example, the ones considered in [6]. In the case that Y is constant, it can also be omitted. In theory, Y should be chosen to take different values on those sets on which f R X i • X i • T t takes the same values for i ∈ {1, 2, . . . , d} and t ∈ N 0 . In practice, the fact that such a random variable Y exists is sufficient and Y does not need to be explicitly specified. An example is given in Section 4.5.

Special Cases
In the following, we discuss some special situations where the assumptions of Theorem 1, i.e., (8) and (9), are satisfied. Lemma 2 provides an easy-to-check condition under which (8) holds true. It is more difficult to see when condition (9) is satisfied. Roughly speaking, this condition is fulfilled if X = (X 1 , X 2 , . . . , X d ) together with Y can uniquely describe the outcomes of the whole dynamical system and applying f X i to the results of X is, in some sense, "reversible" for all i ∈ {1, 2, . . . , d}. In other words, X = (X 1 , X 2 , . . . , X d ) together with Y preserves the information of the system and there is no information loss for the symbolization. The first means that: which obviously follows from To describe the range of outcomes of a random variable X on a probability space (Ω, A , µ), we will use its cumulative distribution function F X : R → [0, 1] defined by:
$$F_X(x) := \mu(\{\omega \in \Omega \mid X(\omega) \le x\}) = \mu_X(]-\infty, x]).$$
When applying the cumulative distribution function F X to the outcomes X of a system, we do not lose any essential information about the system, according to the following lemma. This lemma is a simple modification of Lemma A.3 in [6].
Lemma 1. Let (Ω, A , µ) be a probability space, X : Ω → R be a random variable and g : R → R be a B − B measurable function which maps Borel sets to Borel sets and satisfies the following property:
Condition (11) in the above lemma is a slightly weaker condition on g than injectivity. If g is injective, then g −1 (g(] − ∞, x])) =] − ∞, x] will hold true for all x ∈ R and condition (11) will be satisfied. More generally, condition (11) can still be true if g is not necessarily injective but if all sets on which g is not injective, which are given by for all x ∈ R, have measure 0. For example, this is true if g is equal to the cumulative distribution function.

On the Boundary of R
Condition (8) in Theorem 1, which states that the boundary of R apart from countably many points has measure 0, holds true for all "simple" sets R. In the following lemma, we specify what we mean by "simple".
Lemma 2. Let (Ω, A , µ) be a probability space, (X 1 , X 2 , . . . , X d ) be a random vector and R ∈ B(R 2 ). If, for all i ∈ {1, 2, . . . , d}:

Remark 2. The patterns visualized in
In the following three subsections, Y is assumed to be constant, and hence can be omitted.

Basic Ordinal Patterns
If: Figure 1a), then f R X i is just the distribution function of µ X i , i.e., f R X i = F X i . Since ∂R ∩ {x} × R = {(x, x)} is finite for all x ∈ R, (8) holds true by Lemma 2. According to Lemma 1, one has: for all t ∈ N 0 and i = 1, 2, . . . , n. Therefore: is equivalent to: By Theorem 1, for ergodic systems, condition (14) implies (10). A more general statement, which also includes a large class of non-ergodic systems, was shown in [4]. Condition (14) is, for example, satisfied if Ω ∈ B(R d ) and X i is the projection on the i-th coordinate for all i ∈ {1, 2, . . . , d}, or if Ω is a compact Hausdorff space and X = (X 1 , X 2 , . . . , X d ) is injective and continuous. One can also use Takens' theorem to argue that the set of maps X : Ω → R d that satisfy (14) is large in a certain topological sense. For both, see Keller [8].

Patterns Defined by "Injective" Functions
Let X = (X 1 , X 2 , . . . , X d ) be a random vector and consider now: for a B − B measurable function g : R → R (see Figure 1b). Since ∂R ∩ ({x} × R) = {(x, g(x))} is finite for all x ∈ R, (8) holds true by Lemma 2. Moreover, one easily sees that holds true for all i ∈ {1, 2, . . . , d}. This directly yields: for all t ∈ N 0 and i ∈ {1, 2, . . . , d}. Now, suppose that: holds true. This would then imply (16). However, the above equation only holds true µ-almost surely (see (13)). This can be a problem when applying the function g because there could exist sets B ∈ B with µ X i (B) = 0 but µ X i (g(B)) > 0. Additionally, we therefore need to require that µ X i (g −1 (B)) = 0 implies µ X i (B) = 0 for all B ∈ B. Theorem 1 then provides the following statement:
Corollary 1. Let (Ω, A , µ, T) be an ergodic measure-preserving dynamical system, X = (X 1 , X 2 , . . . , X d ) : Ω → R d be a random vector and (E n ) n∈N be a timing. Let further g : R → R be a B − B measurable function which maps Borel sets to Borel sets, is injective on X i (Ω) and satisfies µ X i (g −1 (B)) = 0 ⇒ µ X i (B) = 0 for all B ∈ B and i ∈ {1, 2, . . . , d}. Let Then, (14) implies (10). Moreover, (10) holds true if Ω ∈ B(R d ) and X i is the projection on the i-th coordinate for all i ∈ {1, 2, . . . , d} or if Ω is a compact Hausdorff space and X = (X 1 , X 2 , . . . , X d ) is injective and continuous.
Note that the statements in Corollary 1, in principle, were shown in [6]. The case of basic ordinal patterns is included by g(x) = x for all x ∈ R.

Patterns Defined by "Surjective" Functions
Swapping coordinates in (15) yields the set: Figure 1c) with (8) following from Lemma 2 and with:
(ii) If Ω ∈ B(R d ) and X i is the projection on the i-th coordinate for all i ∈ {1, 2, . . . , d} or if Ω is a compact Hausdorff space and X = (X 1 , X 2 , . . . , X d ) is injective and continuous, if further µ(U) > 0 for every non-empty open set U ⊆ Ω, g is continuous and X i (Ω) ⊆ g(X i (Ω)), then (10) is valid in each of the following two cases: (1) For each i ∈ {1, 2, . . . , d} and all x 1 , Ω is connected.
(ii): Given the assumptions of (ii), we have to show that F g•X i is injective on X i (Ω) for all i ∈ {1, 2, . . . , d}. If Ω is connected, then (1) is obviously satisfied. We can thus start from (1). Take ) contains a non-empty open set. This implies that F g•X i (x 1 ) < F g•X i (x 2 ) because every non-empty open set was assumed to have a strictly positive measure.
Notice that, unlike in (15), it is not necessary that g is one-to-one.

Piecewise Patterns
The previous subsection illustrates that (9) is fulfilled if, roughly speaking, (X 1 , X 2 , . . . , X d ) preserves all information and if f R X i is a µ X i almost surely invertible function for all i ∈ {1, 2, . . . , d}. The finite-valued random variable Y in (9) can be used to weaken the condition of invertibility in the sense that only piecewise invertibility is needed where the different pieces are induced by the random variable Y.

A Remark on the Work of Amigó et al.
Consider the discriminating relation: Figure 3 for k ∈ N. Assume for simplicity that the dynamical system is defined on Ω = [0, 1[ and that X is the identity map id. It is easy to see that: . Therefore, (9) in Theorem 1 holds true if P k is a generating partition.
which was introduced by Amigó et al. [9]. They used finite-valued random variables to quantize the dynamical system into k parts and considered the ordinal patterns of the quantized systems, while we directly apply the quantization to the discriminating relation. Both approaches only differ in their notation. They showed in their paper that the limit in (18) is equal to the Kolmogorov-Sinai entropy.

Proof of the Main Statement
We first recall some definitions and statements related to partitions and the conditional entropy. For two partitions P, Q ⊂ A of Ω, the conditional entropy is defined as:
$$H(\mathcal{P} \mid \mathcal{Q}) := H(\mathcal{P} \vee \mathcal{Q}) - H(\mathcal{Q}).$$
Roughly speaking, the conditional entropy H(P |Q) describes how much uncertainty is left in the outcomes described by the sets given in P if one already has information about the outcomes described by the sets given in Q. For example, if P = Q, then H(P |Q) = 0. However, if P and Q are independent, meaning that µ(P ∩ Q) = µ(P) · µ(Q) for all P ∈ P and Q ∈ Q, then H(P |Q) = H(P ).
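The two extreme cases of the conditional entropy can be verified on a small example. The Python sketch below assumes the chain-rule form H(P|Q) = H(P ∨ Q) − H(Q), a uniform measure on four points, and hypothetical function names. It checks that conditioning on the partition itself leaves no uncertainty, and that conditioning on an independent partition does not reduce the entropy.

```python
from math import log
from typing import Dict, FrozenSet, List

Partition = List[FrozenSet[int]]


def shannon_entropy(partition: Partition, mu: Dict[int, float]) -> float:
    """H(P) with the natural logarithm, skipping blocks of measure 0."""
    total = 0.0
    for block in partition:
        p = sum(mu[w] for w in block)
        if p > 0:
            total -= p * log(p)
    return total


def refine(p: Partition, q: Partition) -> Partition:
    """Common refinement built from the non-empty intersections."""
    return [a & b for a in p for b in q if a & b]


def conditional_entropy(p: Partition, q: Partition,
                        mu: Dict[int, float]) -> float:
    """H(P|Q) computed as H(P v Q) - H(Q) (form assumed in this sketch)."""
    return shannon_entropy(refine(p, q), mu) - shannon_entropy(q, mu)


mu = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
p = [frozenset({0, 1}), frozenset({2, 3})]
q = [frozenset({0, 2}), frozenset({1, 3})]  # independent of p under mu
h_given_self = conditional_entropy(p, p, mu)
h_given_independent = conditional_entropy(p, q, mu)
```

Here `h_given_self` is 0, and `h_given_independent` coincides with the entropy of `p`, which is log 2.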
Without explicitly referencing them, we will use the following properties of the conditional entropy: See, for example, [7] for proofs. A sequence of partitions (P i ) i∈N in A of Ω is said to be generating (the σ-algebra A ), if σ(P i ) ⊆ σ(P i+1 ) for all i ∈ N and: σ(P i | i ∈ N) = µ A holds true. As a consequence of this property: holds true for all partitions P ⊂ A of Ω. Using the properties of the conditional entropy implies: h(T) = lim n→∞ h(T, P n ).
For N ∈ N: will denote the partition of [0, 1] into 2^N equally sized intervals. We start the proof of Theorem 1 with two basic lemmata.
Lemma 3. Let (Ω, A , µ, T) be a measure-preserving dynamical system, X = (X 1 , X 2 , . . . , X d ) be a random vector and Y be a random variable satisfying (9). Then, there exists some constant c ∈ R with: for all m ∈ N.
Proof. Fix m ∈ N. Set: Since Y was assumed to attain only a finite number of different values, M is a finite partition of Ω. Because the Borel σ-algebra of [0, 1] is generated by the partitions U N and due to (9), we have: Thus, for any ε > 0 and any finite partition P ⊂ A of Ω, there exists an N ε ∈ N and a t ε ∈ N with: for any ε > 0, which implies:

Lemma 4. Let
(Ω, A , µ) be a probability space, X : Ω → R be a random variable and A, B ∈ B(R 2 ). Then: holds true.
Proof. For all x ∈ R: holds true. Analogously, one can show: This implies: and, by Fubini's theorem: Therefore, in particular, the above lemma implies that, if (R j ) j∈N is a sequence of sets in B(R 2 ) with lim j→∞ µ 2 X (R j Δ R) = 0, then f R j X converges to f R X in L 1 for j → ∞. Given R ⊆ R 2 and a random variable X : Ω → R, consider the function f R X,n : We want to show that f R X,n (x, ω) converges to f R X (x) for all x ∈ R and µ-almost all ω ∈ Ω. If f R X,n (x, ω) is monotone in x for all ω ∈ Ω and n ∈ N, this can be shown relatively easily using the pointwise ergodic theorem and the monotonicity of the considered functions. Monotonicity is guaranteed if x 1 ≤ x 2 implies: For example, if R = {(x, y) ∈ R 2 | x > y}, the above implication holds true. For this special case, a proof of the statement in Lemma 5 can be found in [4].
However, we are interested in general sets R ∈ B(R 2 ) and therefore cannot rely on monotonicity. The statement thus has to be proved differently.
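The convergence of time averages can be observed numerically in a simple case. The Python sketch below assumes that the time average has the form (1/n) Σ 1_R(x, X(T^t(ω))) over t = 0, ..., n − 1, which is an assumption about the function f R X,n made for this illustration. As a hypothetical example system, it uses the rotation T(ω) = ω + α mod 1 with the golden-ratio angle α, which is ergodic with respect to the Lebesgue measure on [0, 1[, together with X the identity map and the relation x > y. In this setting the limit should be the Lebesgue measure of [0, x[, i.e., x itself.

```python
import math
from typing import Callable, List, Sequence


def rotation_orbit(omega: float, alpha: float, n: int) -> List[float]:
    """Return omega, T(omega), ..., T^(n-1)(omega) for T = rotation by alpha."""
    orbit = []
    for _ in range(n):
        orbit.append(omega)
        omega = (omega + alpha) % 1.0
    return orbit


def time_average(relation: Callable[[float, float], bool],
                 x: float, orbit: Sequence[float]) -> float:
    """Fraction of orbit points y with (x, y) in the relation."""
    return sum(1 for y in orbit if relation(x, y)) / len(orbit)


alpha = (math.sqrt(5.0) - 1.0) / 2.0  # irrational rotation angle (assumed)
orbit = rotation_orbit(0.1, alpha, 20000)
estimate = time_average(lambda x, y: x > y, 0.3, orbit)
```

With these choices, `estimate` is close to 0.3, in line with the pointwise ergodic theorem.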
holds true for all x ∈ R with {x} × R ∩ A i = ∅ and: Consider the function φ n : Ω → [0, 1] with: Then: for all ω ∈ Ω 0 . It is easy to see that: holds true for all n ∈ N. This implies (see for instance [11], Theorem 13.4 (i)): Therefore k t=1 id, T N t −1 (R i ) is a sequence of partitions generating σ f R X i • X i . By (19), this implies: for all N ∈ N. Set: n k := max 1≤n≤k a N n .
Notice that: holds true for all k ∈ N. Consequently: = 0 for all N ∈ N.
We can now finalize the proof of Theorem 1.
Proof of Theorem 1. Let N ∈ N and m ∈ N. Set: and: for all i ∈ {1, 2, . . . , d} and k ∈ N. According to Lemma 1, there exists a sequence (n k ) k∈N ⊆ N N 0 with: for all i ∈ {1, 2, . . . , d}. We have: for all k, m, N ∈ N. This implies: Using Lemma 3, we can conclude that there exists a constant c ∈ R with: which is equivalent to: On the other hand: which, together with (31), finishes the proof.

Conclusions
We discussed a special "two-dimensional" approach to symbolic dynamics, introduced in [6], that differs from many usual approaches. From the practical viewpoint, the difference can be illustrated as follows: given the time-dependent measurements of a real-valued quantity, a symbolization is not conducted for the measurements themselves as in usual approaches, but for pairs of measurements at two different times. This means that to each pair of possible measured values, a symbol from a finite symbol set is assigned. Here, we only considered two symbols, which lead to a partitioning of the two-dimensional real space R 2 into a set R and its complement R 2 \ R. In usual approaches, partitions of R are considered. (Advantages of the "two-dimensional" approach are described in [6].)
The set R, called a discriminating relation, was considered as a basic building block for constructing partitions of the state space of a given dynamical system, having time-dependent measurements of finitely many quantities in mind. In addition to the discriminating relation, the second central concept was that of a timing, which roughly describes which pairs of times are included in the symbolization process and guarantees that there are not too few such pairs. The central question of the paper was under which conditions on a discriminating relation R the partitions constructed from R determine the KS entropy of a measure-preserving dynamical system. With Theorem 1, we gave a relatively general statement partially answering this question. Some specifications of the theorem in Section 4 illustrate the nature of "successful" discriminating relations.
Although the statement of Theorem 1 appears relatively natural when looking at the proofs a little closer, we do not expect that all cases where the KS entropy can be determined from a discriminating relation are covered by the statement; however, we have no counterexample. The main tool used in the proofs of the results is the pointwise ergodic theorem. It allows one to establish a connection between the generalized ordinal patterns and the shape of the discriminating relation.
The results of this paper, being on a rather abstract level, give some insights into why the idea of ordinal patterns works well, as reported in several applied papers, and show that the advantageous features extend beyond the original ordinal approach. Given the many choices for a discriminating relation, practical purposes such as classification require methods and criteria for finding good discriminating relations adapted to given data and problems. This is an important challenge for further research related to the given approach to symbolic dynamics. A further aspect is to discuss the approach for partitioning R 2 into more than two pieces.