Generalized Ordinal Patterns and the KS-Entropy

Tim Gutjahr; Karsten Keller

doi:10.3390/e23081097

Institute of Mathematics, University of Lübeck, D-23562 Lübeck, Germany

Author to whom correspondence should be addressed.

Entropy 2021, 23(8), 1097; https://doi.org/10.3390/e23081097

This article belongs to the Special Issue Entropy Measures for Data Analysis II: Theory, Algorithms and Applications

Abstract

Ordinal patterns classifying real vectors according to the order relations between their components are an interesting basic concept for determining the complexity of a measure-preserving dynamical system. In particular, as shown by C. Bandt, G. Keller and B. Pompe, the permutation entropy based on the probability distributions of such patterns is equal to Kolmogorov–Sinai entropy in simple one-dimensional systems. The general reason for this is that, roughly speaking, the system of ordinal patterns obtained for a real-valued “measuring arrangement” has high potential for separating orbits. Starting from a slightly different approach of A. Antoniouk, K. Keller and S. Maksymenko, we discuss the generalizations of ordinal patterns providing enough separation to determine the Kolmogorov–Sinai entropy. For defining these generalized ordinal patterns, the idea is to substitute the basic binary relation ≤ on the real numbers by another binary relation. Generalizing the former results of I. Stolz and K. Keller, we establish conditions that the binary relation and the dynamical system have to fulfill so that the obtained generalized ordinal patterns can be used for estimating the Kolmogorov–Sinai entropy.

Keywords:

ordinal patterns; measure-preserving dynamical system; Kolmogorov–Sinai entropy; permutation entropy; ergodic theory

1. Introduction

In 2002, Bandt and Pompe introduced so-called permutation entropy []. This entropy has been established in non-linear dynamical system theory and time series analysis, including applications in many fields from biomedicine to econophysics (compare with Zanin et al. []). It is a crucial point that permutation entropy is theoretically justified by asymptotic results relating it to Kolmogorov–Sinai entropy (KS entropy, also called metric entropy) which is the central complexity measure for dynamical systems. The important relationship of permutation entropy and KS entropy was first observed and mathematically founded for piece-wise monotone dynamical systems by Bandt et al. [].

The (empirical) concept of permutation entropy is based upon analyzing the distribution of ordinal patterns in a time series or the underlying system. In this paper, we concentrate on a measure-preserving dynamical system

(Ω, A, μ, T)

, i.e., a probability space

(Ω, A, μ)

equipped with a measurable map

T : Ω \to Ω

satisfying

μ (T^{- 1} (A)) = μ (A)

for all

A \in A

.

Given a random variable

X : Ω \to R

, in this paper, an ordinal pattern of length

n \in N

with respect to X is considered as a subset of the state space

Ω

. It is indicated by a permutation

π = (π_{0}, π_{1}, \dots, π_{n - 1})

of

{0, 1, \dots, n - 1}

and defined by

P_{π} : = {ω \in Ω ∣ X (T^{π_{0}} (ω)) \leq X (T^{π_{1}} (ω)) \leq \dots \leq X (T^{π_{n - 1}} (ω))} .

(1)

(Usually, ordinal patterns are defined in the range of X, i.e., for the vectors

(X (T^{0} (ω)), X (T^{1} (ω)), \dots, X (T^{n - 1} (ω)))

). The collection of ordinal patterns:

OP (n) : = {P_{π} ∣ π is a permutation of length n}

is a partition of

Ω

.
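As a minimal illustration of (1), the following Python sketch assigns a finite orbit segment of observed values to a permutation π. The function name `ordinal_pattern` and the tie-breaking rule (equal values are ordered by their time index) are assumptions made here for illustration and are not taken from the paper.

```python
from typing import Sequence, Tuple


def ordinal_pattern(values: Sequence[float]) -> Tuple[int, ...]:
    """Return a permutation pi with values[pi[0]] <= values[pi[1]] <= ...

    Ties are broken by the time index (an illustrative assumption), so every
    finite orbit segment is assigned exactly one permutation.
    """
    # sorted() is stable, so equal values keep their original time order.
    return tuple(sorted(range(len(values)), key=lambda t: values[t]))


# Observed values X(w), X(T(w)), X(T^2(w)) of a hypothetical orbit segment.
segment = [0.3, 0.9, 0.1]
pattern = ordinal_pattern(segment)
```

Here `pattern` lists the time indices in order of increasing observed value, so two orbit segments receive the same symbol exactly when their values are ordered in the same way under this tie-breaking rule.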

In the rest of this section, we assume that X preserves enough information about the given system in a certain sense. This is particularly the case if

Ω

is contained in

R

and X is the identity map. A precise general description of the assumption is given when presenting the results of this paper. It was shown in [] that, under not too restrictive further conditions, the probability distribution on the partitions

OP (n)

for

n \in N

can be used for determining the KS entropy of the given system. The reason is that, roughly speaking, under these conditions,

OP (n)

is able to separate the orbits of the system if

n \to \infty

.

In order to address the problem that this paper is concerned with, we give a description of ordinal patterns that differs slightly from the one above. One can determine to which ordinal pattern

P_{π}

of length n a point

ω

belongs if, for all

(s, t)

in:

E_{n} = {(s, t) \in N_{0}^{2} ∣ 0 \leq s < t \leq n - 1},

(2)

one knows whether

X (T^{s} (ω)) \leq X (T^{t} (ω))

holds true or not. In other words, there exists a set

A \subseteq E_{n}

such that:

P_{π} = ⋂_{(s, t) \in A} {ω \in Ω ∣ (X (T^{s} (ω)), X (T^{t} (ω))) \in R} \cap ⋂_{(s, t) \in E_{n} \ A} {ω \in Ω ∣ (X (T^{s} (ω)), X (T^{t} (ω))) \in R^{2} \ R},

(3)

where:

R : = {(x, y) \in R^{2} ∣ x > y} .

(4)

The above set contains all the points

ω \in Ω

that satisfy

X (T^{s} (ω)) > X (T^{t} (ω))

for

(s, t) \in A

and

X (T^{s} (ω)) \leq X (T^{t} (ω))

for

(s, t) \in E_{n} \ A

. Note that, given some arbitrary

A \subseteq E_{n}

, the set on the right hand side of (3) can be empty. In the case that it is non-empty, it coincides with some ordinal pattern

P_{π}

of length n.
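The description via (3) can be sketched in code as follows: a point is encoded by the set A of time pairs for which the relation holds. This is a hedged illustration only; the names `greater` and `pattern_code` are hypothetical, and the relation is passed as a Boolean function so that other choices of R can be substituted.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Sequence, Tuple

Relation = Callable[[float, float], bool]


def greater(x: float, y: float) -> bool:
    """Indicator of the set R in (4): (x, y) lies in R iff x > y."""
    return x > y


def pattern_code(values: Sequence[float],
                 relation: Relation = greater) -> FrozenSet[Tuple[int, int]]:
    """Return the set A of pairs (s, t), s < t, with (values[s], values[t]) in R."""
    n = len(values)
    # combinations(range(n), 2) enumerates exactly the pairs in E_n from (2).
    return frozenset((s, t) for s, t in combinations(range(n), 2)
                     if relation(values[s], values[t]))


code = pattern_code([0.3, 0.9, 0.1])
```

Two orbit segments with the same order structure produce the same set A, which is the sense in which A identifies the pattern.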

While Equation (3) might be a bit more abstract than (1), it shows a way to generalize the concept of ordinal patterns: replace the set R in (4) by some arbitrary Borel subset R of

R^{2}

. This generalization also helps to investigate why ordinal patterns are so successful.

Definition 1.

We call a non-empty Borel subset R of

R^{2}

a discriminating relation.

The figures given in this paper show different discriminating relations R. In each case, only the part of R contained in

{[0, 1 [}^{2}

is presented. Note that in the case that X maps

Ω

into

[0, 1 [

, the restriction of R itself to this part would not change anything. Figure 1a illustrates R as given in (4), again only on

{[0, 1 [}^{2}

. In the case of such an R, note that replacing a given X by

tan (π (X - 1 / 2))

, where the transformation maps

[0, 1 [

into

[- \infty, \infty [

, would not make a difference for our considerations, since order relations and associated partitions are preserved.

Figure 1. This figure illustrates some special discriminating relations R (striped areas) considered in Section 4. Only the part of R contained in [0, 1[² is shown (compare the corresponding remarks in Section 4).

Given some discriminating relation, generalized ordinal patterns of length n with respect to X are given as the non-empty sets defined by the right hand side of (3) for some

A \subseteq E_{n}

. Obviously, they also form a partition of

Ω

. The question that arises is what a discriminating relation R should look like, such that those generalized ordinal patterns exhibit the same nice properties as the original ordinal patterns. More precisely, we ask the following question:

Main Question.

Under what conditions on a discriminating relation R do the partitions given by the generalized ordinal patterns determine the KS entropy of a dynamical system?

Why is this determination of entropy, which is precisely described by formula (10) in Theorem 1, interesting? To answer this question, interpret X as an observable,

ω

as the initial state of the given system and

X (ω), X (T (ω)), X (T^{2} (ω)), X (T^{3} (ω)), \dots

as the measured values at times

0, 1, 2, 3, \dots

. Determining (generalized) ordinal patterns on the basis of those values is a symbolization, where a symbol obtained is the (generalized) ordinal pattern containing

ω

. Generally, symbolization means a coarse-graining of the state space underlying a system, where each point is assigned one of finitely many given symbols. Instead of considering the precise development of the system, one is interested in the change of symbols in the course of time, justifying the naming of the method symbolic dynamics. Note that a symbolization is equivalent to partitioning the state space into classes of states (with the same symbol).

The reason for obtaining the full entropy from the (generalized) ordinal patterns is, roughly speaking, that the symbol system obtained has high potential for separating orbits. Such kinds of successful symbolizations are important, for example, in big data analysis, see, e.g., Smith et al. [].

The above question was first considered in [], where the authors basically showed that sets of the form:

R = {(x, y) \in R^{2} ∣ g (x) \geq y}

lead to generalized ordinal patterns that, under some conditions, can be used to determine the entropy if

g : R \to R

is measurable and one-to-one. Such an R is shown in Figure 1b and will be discussed in Section 4 as well as another R illustrated in Figure 1.

In this paper, we consider general sets

R \subseteq R^{2}

that cannot necessarily be described by functions and inequalities and establish some conditions under which the entropy can be determined using those sets. As in [], the discussion also includes a generalization of the sets

E_{n}

given by (2) and is conducted in a multidimensional framework. In particular, the results give insights as to why the basic ordinal approach and generalizations are working.

It is instructive to discuss the partition of

R^{2}

into R and

R^{2} \ R

from the viewpoint of symbolic dynamics. In contrast to classical symbolization approaches, which symbolize only in the range of single “measurements” x, the symbolization of pairs

(x, y)

via the partition

{R, R^{2} \ R}

also regards some kind of link between x and y if R lies “diagonal” in a certain sense. We will discuss this constellation, which explains the success of ordinal patterns in a wider context, more precisely in Section 5.

A completely different constellation is given for the sets R shown in Figure 2. Here, R is obtained as a half-plane from a “horizontal” division of

R^{2}

. If, for example,

Ω = [0, 1 [

,

A

is the Borel

σ

-algebra and

μ

the Lebesgue measure on

Ω

, and if T is the tent map on

Ω

, meaning that:

T (ω) = \begin{cases} 2 ω & \text{if } 0 \leq ω < \frac{1}{2}, \\ 2 - 2 ω & \text{if } \frac{1}{2} \leq ω < 1, \end{cases}

(5)

and X is the identity map, then the location of the horizontal cut matters substantially.

Figure 2. “Non-diagonal” discriminating relations.

On the one hand,

R = {(x, y) \in R^{2} ∣ x \leq 2 / 3}

(Figure 2b) does not discriminate enough to obtain the KS entropy of the given system, and on the other hand, there is enough discrimination by

R = {(x, y) \in R^{2} ∣ x \leq 1 / 2}

(Figure 2a) due to the fact that

{[0, \frac{1}{2} [, [\frac{1}{2}, 1 [}

is a generating partition for T. In the situation considered, there is no additional information given by the measurements

(x, y)

relative to measurement x, hence R provides nothing more than a classical symbolization. For a detailed discussion of these facts, see [].
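The symbolization induced by such a cut can be sketched as follows. The code iterates the tent map (5) and records, at each time step, whether the current state lies on the side of the cut belonging to R. It only illustrates the mechanism and does not demonstrate the entropy statement itself. The function names are hypothetical, and exact rational arithmetic via `fractions.Fraction` is a choice made here because repeated doubling in binary floating point quickly destroys tent-map orbits.

```python
from fractions import Fraction
from typing import List


def tent(w: Fraction) -> Fraction:
    """Tent map (5) on [0, 1[."""
    return 2 * w if w < Fraction(1, 2) else 2 - 2 * w


def symbols(w: Fraction, cut: Fraction, length: int) -> List[int]:
    """Symbol 1 if the current state is <= cut (the pair lies in R), else 0."""
    out = []
    for _ in range(length):
        out.append(1 if w <= cut else 0)
        w = tent(w)
    return out


start = Fraction(3, 5)  # hypothetical initial state
half_cut = symbols(start, Fraction(1, 2), 5)
two_thirds_cut = symbols(start, Fraction(2, 3), 5)
```

For this initial state the two cuts already assign different symbol sequences, since 3/5 lies above 1/2 but below 2/3.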

The rest of this paper is organized as follows. Section 2 provides the notions and concepts necessary for formulating the main statement of this paper in Section 3. This statement is rather abstract and general and has to be considered in relation to some special cases, which are discussed in Section 4 and make our ideas and findings transparent. Section 5 is devoted to the proof of our main statement.

2. Preliminaries

Throughout this paper,

(Ω, A, μ, T)

will be a measure-preserving dynamical system.

2.1. Some Notions

We will write

B = B (R)

or

B (R^{d})

for the Borel

σ

-algebra on

R

or

R^{d}; d \in N

, respectively. Given a random variable

X : Ω \to R

, by

μ_{X}

we denote the push-forward measure of

μ

with regard to X, i.e.,

μ_{X} (A) : = μ (X^{- 1} (A))

for all

A \in B (R)

. The measure

μ_{X} \times μ_{X} = μ_{X}^{2}

is the product measure, i.e.,

μ_{X}^{2} (A \times B) = μ_{X} (A) μ_{X} (B)

for all

A, B \in B (R)

.

For some Borel set

R \in B (R^{2})

, we define the function

f_{X}^{R} : R \to [0, 1]

by

f_{X}^{R} (x) : = μ ({ω \in Ω ∣ (x, X (ω)) \in R}) .

(6)

If it is clear from the context which set

R \in B (R^{2})

is considered, we simply write

f_{X}

instead of

f_{X}^{R}

. The function

f_{X}

can be represented as the integral:

f_{X} (x) = \int 1_{R} (x, y) d μ_{X} (y) .

Since

1_{R}

is integrable with regard to

μ_{X}^{2}

,

f_{X}

is integrable and therefore, also measurable by Fubini’s Theorem.
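For intuition, the function in (6) can be approximated from data by replacing μ with the empirical distribution of a sample of observed values. The sketch below makes this assumption explicit; the name `empirical_f` and the sample values are hypothetical.

```python
from typing import Callable, Sequence


def empirical_f(x: float, samples: Sequence[float],
                relation: Callable[[float, float], bool]) -> float:
    """Fraction of samples y with (x, y) in R: an empirical stand-in for f_X^R(x)."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(1 for y in samples if relation(x, y)) / len(samples)


samples = [0.1, 0.4, 0.4, 0.8]  # hypothetical observed values of X
at_least = empirical_f(0.4, samples, lambda x, y: x >= y)
strictly = empirical_f(0.4, samples, lambda x, y: x > y)
```

With the relation x ≥ y, the estimate is the empirical distribution function of the sample evaluated at x.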

The complement

R^{2} \ R

of a set

R \subseteq R^{2}

will be denoted by

R^{c}

. The notation

\partial R

will be used for the boundary of a set R, i.e., the closure of R without its interior.

2.2. Entropy

The Shannon entropy of a finite partition

P \subset A

of

Ω

is defined as

H (P) : = - \sum_{P \in P} μ (P) log (μ (P)) .

The refinement of two partitions

P, Q \subset A

of

Ω

is given by

P \lor Q : = {P \cap Q \neq \emptyset ∣ P \in P, Q \in Q} .

For a finite collection of partitions

P_{i} \subset A, i \in {1, 2, \dots, n}

of

Ω

, one analogously defines:

⋁_{i = 1}^{n} P_{i} : = \{⋂_{i = 1}^{n} P_{i} \neq \emptyset ∣ P_{i} \in P_{i} for all i \in {1, 2, \dots, n}\} .

The entropy rate of a finite partition

P \subset A

of

Ω

is defined as

h (T, P) : = lim_{n \to \infty} \frac{1}{n} H (⋁_{t = 0}^{n - 1} T^{- t} (P)),

where

T^{- t} (P) = {T^{- t} (P) ∣ P \in P}

. For the existence of the limit in the formula, see, e.g., []. We are interested in determining the Kolmogorov–Sinai entropy of a system, which is defined as

h (T) : = sup_{P} h (T, P),

where the supremum is taken over all finite partitions of

Ω

in

A

.
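The quantities H and h(T, P) can be approximated from a single symbolized orbit by counting words of length n. The following sketch is a rough empirical estimate under the assumption that word frequencies along the observed orbit approximate the measures of the corresponding partition cells; it uses a fixed n rather than the limit, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def shannon_entropy(probabilities: Sequence[float]) -> float:
    """H = -sum p log p (natural logarithm), skipping zero-probability cells."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def block_entropy_rate(symbols: Sequence[Hashable], n: int) -> float:
    """Empirical (1/n) * H of the distribution of length-n words in a symbol sequence."""
    if n < 1 or n > len(symbols):
        raise ValueError("need 1 <= n <= len(symbols)")
    # Each length-n word stands for one cell of the refined partition.
    words = Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))
    total = sum(words.values())
    return shannon_entropy([count / total for count in words.values()]) / n
```

For short sequences or large n the estimate is strongly biased, because most words are then observed at most once.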

Note that the Kolmogorov–Sinai entropy serves as the central complexity measure for dynamical systems and can be considered as a reference for other complexity measures, including in data analysis. Roughly speaking, it measures the mean information obtained by each iteration step. Since the Kolmogorov–Sinai entropy is the supremum of the entropy rates of all finite partitions, its determination and its estimation in a practical context are not easy, and it is of some interest to find natural finite partitions for which the entropy rate is near the Kolmogorov–Sinai entropy. This is also a motivation for considering ordinal patterns and their generalizations in this paper.

2.3. σ-Algebras

Given a family of sets

A_{i} \in A, i \in I

, by

σ (A_{i} ∣ i \in I)

we denote the smallest

σ

-algebra containing all sets

A_{i}

. Analogously, for a family of partitions

P_{i} \subset A, i \in I

of

Ω

, we define

σ (P_{i} ∣ i \in I)

as the smallest

σ

-algebra containing all partitions

P_{i}

as subsets. Given d-dimensional random vectors

(Y_{i, 1}, Y_{i, 2}, \dots, Y_{i, d}) : Ω \to R^{d}, i \in I

, we define:

σ ((Y_{i, 1}, Y_{i, 2}, \dots, Y_{i, d}) ∣ i \in I) : = σ ({(Y_{i, 1}, Y_{i, 2}, \dots, Y_{i, d})}^{- 1} (B) ∣ i \in I, B \in B (R^{d}))

as the smallest

σ

-algebra containing all preimages of Borel sets. When comparing two

σ

-algebras

A_{1}, A_{2} \subseteq A

, we ignore sets with measure 0, i.e., we write:

A_{1} \subseteq_{μ} A_{2}

if for all

A_{1} \in A_{1}

there exists an

A_{2} \in A_{2}

with

μ ((A_{1} \ A_{2}) \cup (A_{2} \ A_{1})) = 0

. In this sense,

A_{1} =_{μ} A_{2}

means that

A_{1} \subseteq_{μ} A_{2}

and

A_{2} \subseteq_{μ} A_{1}

.

3. The Main Statement

Recall that a measure-preserving dynamical system

(Ω, A, μ, T)

is ergodic if for each

B \in A

with

T^{- 1} (B) = B

it holds that

μ (B) \in {0, 1}

, meaning that the system does not divide into proper parts.

Referring to Section 1, we give some preparation for stating our main result. Recall that for defining (generalized) ordinal patterns it was basic to know whether

(X (T^{s} (ω)), X (T^{t} (ω))) \in R

or

(X (T^{s} (ω)), X (T^{t} (ω))) \in R^{2} \ R

for

ω \in Ω

and a random variable X on

Ω

, where “time” pairs

(s, t)

were taken from the sets

E_{n}

(see (2)). In order to also allow reducing the number of necessary “comparisons”, we relax the definition of the sets

E_{n}

leading to the following concept.

Definition 2.

We call a sequence

{(E_{n})}_{n \in N}

with

E_{1} \subseteq E_{2} \subseteq E_{3} \subseteq \dots \subseteq N_{0}^{2}

a timing if each

E_{n}

contains finitely many elements and if there exists a sequence

{(a_{n})}_{n \in N} \in N_{0}^{N}

with:

\underset{N \to \infty}{lim sup} \frac{# \{n \in {1, 2, \dots, N} ∣ (a_{n}, a_{n} + n) \in ⋃_{k = 1}^{\infty} E_{k}\}}{N} = 1 .

(7)

Formula (7) roughly says that nearly each “temporal” distance is available for “comparisons”. It guarantees that enough time-pairs are considered to not have any loss of information contained in the “thinned out” generalized ordinal patterns relative to the “full” generalized ordinal patterns.
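For a finite collection of time pairs, the ratio inside (7) can be computed for a fixed N by checking which temporal distances occur. The sketch below does this; it cannot evaluate the lim sup itself, and the names `full_timing` and `covered_fraction` as well as the "star" example are illustrative assumptions.

```python
from typing import Iterable, Set, Tuple


def full_timing(n: int) -> Set[Tuple[int, int]]:
    """E_n from (2): all pairs (s, t) with 0 <= s < t <= n - 1."""
    return {(s, t) for s in range(n) for t in range(s + 1, n)}


def covered_fraction(pairs: Iterable[Tuple[int, int]], upper: int) -> float:
    """Fraction of distances n in {1, ..., upper} realised as t - s by some pair (s, t)."""
    if upper < 1:
        raise ValueError("upper must be at least 1")
    distances = {t - s for s, t in pairs if t > s}
    return sum(1 for n in range(1, upper + 1) if n in distances) / upper


# A thinned-out collection: only comparisons with time 0 (a_n = 0 for all n).
star = {(0, t) for t in range(1, 6)}
```

Both `full_timing(6)` and the much smaller `star` realise every distance from 1 to 5, which illustrates how a timing can reduce the number of comparisons while still satisfying a condition of the form (7).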

Remark 1.

In the first paper on generalized ordinal patterns ([]), a timing

{(E_{n})}_{n \in N}

was defined differently: the authors of that paper called a sequence of finite sets

E_{1} \subseteq E_{2} \subseteq E_{3} \subseteq \dots \subseteq N_{0}^{2}

a timing if there exists an increasing sequence

{(a_{n})}_{n \in N}

such that for all

n \in N

:

(i): $E_{n} \subseteq {a_{0}, a_{1}, \dots, a_{n}}^{2}$ ,
(ii): for all $s \in {a_{0}, a_{1}, \dots, a_{n}}$ , there exists a $t \in {a_{0}, a_{1}, \dots, a_{n}}$ with $s \neq t$ and $(s, t) \in E_{n}$ ,
(iii): ${(i d, T^{n})}^{- 1} (R_{i}) \subseteq_{μ} σ (⋁_{(s, t) \in E_{k}} {(T^{s}, T^{t})}^{- 1} (R_{i}) ∣ k \in N)$ for all $i \in {1, 2, \dots, d}$

hold true. Note that the last condition does not only depend on the timing

{(E_{n})}_{n \in N}

but also on T and

X = (X_{1}, X_{2}, \dots, X_{d})

. Instead of those three conditions, we simply require that almost all differences can be found in the timing.

Given a random vector

(X_{1}, X_{2}, \dots, X_{d})

and

R \in B (R^{2})

, we define the partition:

R_{i} : = {(X_{i}, X_{i})}^{- 1} ({R, R^{c}}) = {{(X_{i}, X_{i})}^{- 1} (R), {(X_{i}, X_{i})}^{- 1} (R^{c})}

for all

i \in {1, 2, \dots, d}

, which is equal to:

R_{i} = \{ {(ω_{1}, ω_{2}) \in Ω^{2} ∣ (X_{i} (ω_{1}), X_{i} (ω_{2})) \in R}, {(ω_{1}, ω_{2}) \in Ω^{2} ∣ (X_{i} (ω_{1}), X_{i} (ω_{2})) \notin R} \} .

Then, for

E_{k}

as given in (2), the partition

⋁_{(s, t) \in E_{k}} {(T^{s}, T^{t})}^{- 1} (R_{i})

is nothing other than the partition of generalized ordinal patterns with respect to

X_{i}

defined in Section 1 and

⋁_{i = 1}^{d} ⋁_{(s, t) \in E_{k}} {(T^{s}, T^{t})}^{- 1} (R_{i})

can be considered as the partition of generalized ordinal patterns with respect to

(X_{1}, X_{2}, \dots, X_{d})

.

The proof of the following main theorem of the paper is given in Section 5.

Theorem 1.

Let

(Ω, A, μ, T)

be an ergodic measure-preserving dynamical system,

X = (X_{1}, X_{2}, \dots, X_{d}) : Ω \to R^{d}

be a random vector,

R \in B (R^{2})

be a discriminating relation and

{(E_{n})}_{n \in N}

be a timing. Assume that the following conditions are valid:

There exists a countable set C \subseteq R^{2} with: μ_{X_{i}}^{2} (\partial R \ C) = 0 for all i \in {1, 2, \dots, d} .

(8)

There exists a random variable Y : Ω \to R with: # Y (Ω) < \infty and A =_{μ} σ ((f_{X_{i}}^{R} \circ X_{i} \circ T^{t}, Y) ∣ t \in N_{0}, i \in {1, 2, \dots, d}) .

(9)

Then:

h (T) = lim_{k \to \infty} h (T, ⋁_{i = 1}^{d} \underset{(s, t) \in E_{k}}{⋁} {(T^{s}, T^{t})}^{- 1} (R_{i}))

(10)

holds true.

At first glance, conditions (8) and (9), which are sufficient for (10), look very special. The considerations in the following section will, however, elucidate their role and show that they are relatively general. Roughly speaking, (8) says that the distribution of pairs of “independent measurements” with respect to

X_{i}

is discrete on the boundary of R. Condition (9) is an orbit separation condition based on the involved “measurements” and the functions

f_{X_{i}}^{R}

. In general:

A \supseteq_{μ} σ ((f_{X_{i}}^{R} \circ X_{i} \circ T^{t}, Y) ∣ t \in N_{0}, i \in {1, 2, \dots, d})

holds true because all functions involved in (9) are

A

-measurable. Therefore, (9) is equivalent to:

A \subseteq_{μ} σ ((f_{X_{i}}^{R} \circ X_{i} \circ T^{t}, Y) ∣ t \in N_{0}, i \in {1, 2, \dots, d}) .

The inclusion of the random variable Y provides some further separation and allows the above inclusion to hold true for a wider class of dynamical systems than, for example, the ones considered in []. In the case that Y is constant, it can also be omitted. In theory, Y should be chosen to take different values on those sets on which

f_{X_{i}}^{R} \circ X_{i} \circ T^{t}

takes the same values for

i \in {1, 2, \dots, d}

and

t \in N_{0}

. In practice, the fact that such a random variable Y exists is sufficient and Y does not need to be explicitly specified. An example is given in Section 4.5.

4. Special Cases

In the following, we discuss some special situations where the assumptions of Theorem 1, i.e., (8) and (9), are satisfied. Lemma 2 provides an easy-to-check condition under which (8) holds true. It is more difficult to see when condition (9) is satisfied. Roughly speaking, this condition is fulfilled if

X = (X_{1}, X_{2}, \dots, X_{d})

together with Y can uniquely describe the outcomes of the whole dynamical system and applying

f_{X_{i}}

to the results of

X

is, in some sense, “reversible” for all

i \in {1, 2, \dots, d}

. In other words,

X = (X_{1}, X_{2}, \dots, X_{d})

together with Y preserve the information of the system and there is no information loss for the symbolization. The first means that:

A =_{μ} σ ((X_{i} \circ T^{t}, Y) ∣ t \in N_{0}, i \in {1, 2, \dots, d}),

which obviously follows from

A =_{μ} σ ((f_{X_{i}}^{R} \circ X_{i} \circ T^{t}, Y) ∣ t \in N_{0}, i \in {1, 2, \dots, d}) .

To describe the range of outcomes of a random variable X on a probability space

(Ω, A, μ)

, we will use its cumulative distribution function

F_{X} : R \to [0, 1]

defined by

F_{X} (x) : = μ_{X} (] - \infty, x]) .

When applying the cumulative distribution function

F_{X}

to the outcomes X of a system, we do not lose any essential information about the system, according to the following lemma. This lemma is a simple modification of Lemma A.3 in [].

Lemma 1.

Let

(Ω, A, μ)

be a probability space,

X : Ω \to R

be a random variable and

g : R \to R

be a

B - B

measurable function which maps Borel sets to Borel sets and satisfies the following property:

For all x \in R it holds μ (X^{- 1} (g^{- 1} (g (] - \infty, x])) \ ] - \infty, x])) = 0 .

(11)

Then,

σ (g \circ X) =_{μ} σ (X)

. In particular,

σ (g \circ X) =_{μ} σ (X)

if

g = F_{X}

or g is injective on

X (Ω)

.

Condition (11) in the above Lemma is a slightly weaker condition on g than injectivity. If g is injective, then

g^{- 1} (g (] - \infty, x])) =] - \infty, x]

will hold true for all

x \in R

and condition (11) will be satisfied. More generally, condition (11) can still be true if g is not injective, provided that all sets on which g fails to be injective, which are given by

X^{- 1} (g^{- 1} (g (] - \infty, x])) \ ] - \infty, x])

for all

x \in R

, have measure 0. For example, this is true if g is equal to the cumulative distribution function.

4.1. On the Boundary of R

The condition (8) in Theorem 1, that the boundary of R apart from countably many points has measure 0, holds true for all “simple” sets R. In the following lemma, we specify what we mean by “simple”.

Lemma 2.

Let

(Ω, A, μ)

be a probability space,

(X_{1}, X_{2}, \dots, X_{d})

be a random vector and

R \in B (R^{2})

. If, for all

i \in {1, 2, \dots, d}

:

\partial R \cap ({x} \times R) is countable for μ_{X_{i}} -almost all x \in R, or \partial R \cap (R \times {y}) is countable for μ_{X_{i}} -almost all y \in R,

then R satisfies (8), i.e., there exists a countable set

C \subseteq R^{2}

with:

μ_{X_{i}}^{2} (\partial R \ C) = 0 for all i \in {1, 2, \dots, d} .

Proof.

Consider the sets:

A_{i} : = {x \in R ∣ μ_{X_{i}} ({x}) > 0}

for

i \in {1, 2, \dots, d}

, which, obviously, are countable. Set:

C : = ⋃_{i = 1}^{d} A_{i} \times A_{i} .

Let

i \in {1, 2, \dots, d}

. If

{y \in R ∣ (x, y) \in \partial R}

is countable for

μ_{X_{i}}

-almost all

x \in R

, Fubini’s theorem implies:

\begin{aligned} μ_{X_{i}}^{2} (\partial R \ C) & = \int \int 1_{\partial R \ C} (x, y) d μ_{X_{i}} (y) d μ_{X_{i}} (x) \\ & = \int μ_{X_{i}} ({y \in R ∣ (x, y) \in \partial R \ C}) d μ_{X_{i}} (x) \\ & = \int \sum_{y \in R : (x, y) \in \partial R \ C} μ_{X_{i}} ({y}) d μ_{X_{i}} (x) \\ & = \int 0 d μ_{X_{i}} (x) = 0 . \end{aligned}

Analogously, one can show the same if

{x \in R ∣ (x, y) \in \partial R}

is countable for

μ_{X_{i}}

-almost all

y \in R

. □

Remark 2.

The patterns visualized in Figure 1 could also be defined on the whole real axis instead of a bounded interval by, for example, applying the transformation

φ (x) = tan (π (x - 1 / 2))

. Then,

\tilde{R} = (φ \times φ) (R)

is a pattern defined on

R^{2}

if R is defined on

{[0, 1 [}^{2}

.

In the following three subsections, Y is assumed to be constant, and hence can be omitted.

4.2. Basic Ordinal Patterns

If:

R = {(x, y) \in R^{2} ∣ x \geq y}

(see Figure 1a), then

f_{X_{i}}^{R}

is just the distribution function of

μ_{X_{i}}

, i.e.,

f_{X_{i}}^{R} = F_{X_{i}}

. Since

\partial R \cap ({x} \times R) = {(x, x)}

is finite for all

x \in R

, (8) holds true by Lemma 2. According to Lemma 1, one has:

σ (X_{i} \circ T^{t}) \subseteq_{μ} σ (F_{X_{i}} \circ X_{i} \circ T^{t})

(12)

for all

t \in N_{0}

and

i = 1, 2, \dots, d

. Therefore:

\begin{aligned} A & =_{μ} σ ((f_{X_{i}}^{R} \circ X_{i} \circ T^{t}) ∣ t \in N_{0}, i \in {1, 2, \dots, d}) \\ & = σ ((F_{X_{i}} \circ X_{i} \circ T^{t}) ∣ t \in N_{0}, i \in {1, 2, \dots, d}) \end{aligned}

(13)

is equivalent to:

A =_{μ} σ ((X_{i} \circ T^{t}) ∣ t \in N_{0}, i \in {1, 2, \dots, d}) .

(14)

By Theorem 1, for ergodic systems, condition (14) implies (10). A more general statement, which also covers a large class of non-ergodic systems, was shown in []. Condition (14) is, for example, satisfied if

Ω \in B (R^{d})

and

X_{i}

is the projection on the i-th coordinate for all

i \in {1, 2, \dots, d}

, or if

Ω

is a compact Hausdorff space and

X = (X_{1}, X_{2}, \dots, X_{d})

is injective and continuous. One can also use Takens’ theorem to argue that the set of maps

X : Ω \to R^{d}

that satisfy (14) is large in a certain topological sense. For both, see Keller [].

4.3. Patterns Defined by “Injective” Functions

Let

X = (X_{1}, X_{2}, \dots, X_{d})

be a random vector and consider now:

R = {(x, y) \in R^{2} ∣ g (x) \geq y}

(15)

for a

B - B

measurable function

g : R \to R

(see Figure 1b). Since

\partial R \cap ({x} \times R) = {(x, g (x))}

is finite for all

x \in R

, (8) holds true by Lemma 2. Moreover, one easily sees that

f_{X_{i}}^{R} = F_{X_{i}} \circ g

.
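The identity can be checked directly from definition (6). The following short derivation is a sketch added for convenience; it uses only the definition of R in (15), of the push-forward measure and of the cumulative distribution function:

```latex
\begin{aligned}
f_{X_{i}}^{R}(x) &= \mu(\{\omega \in \Omega \mid (x, X_{i}(\omega)) \in R\}) \\
&= \mu(\{\omega \in \Omega \mid X_{i}(\omega) \leq g(x)\}) \\
&= \mu_{X_{i}}(]-\infty, g(x)]) = F_{X_{i}}(g(x)).
\end{aligned}
```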

Now, suppose that:

σ (F_{X_{i}}) \subseteq σ (F_{X_{i}} \circ g)

(16)

holds true for all

i \in {1, 2, \dots, d}

. This directly yields:

σ (F_{X_{i}} \circ X_{i} \circ T^{t}) \subseteq σ (F_{X_{i}} \circ g \circ X_{i} \circ T^{t})

for all

i \in {1, 2, \dots, d}

and

t \in N_{0}

. Remember that

σ (X_{i} \circ T^{t}) \subseteq_{μ} σ (F_{X_{i}} \circ X_{i} \circ T^{t})

holds true according to Lemma 1. Thus, (14) and (16) imply (10). When considering basic ordinal patterns in Section 4.2, we stated some conditions under which (14) holds true. It remains to consider when (16) is satisfied.

Assume that g maps Borel sets to Borel sets and is injective. This implies:

σ (g \circ X_{i} \circ T^{t}) = σ (X_{i} \circ T^{t})

for all

t \in N_{0}

and

i \in {1, 2, \dots, d}

. Now, suppose that:

A = σ (F_{X_{i}} \circ X_{i} \circ T^{t} ∣ t \in N_{0}, i \in {1, 2, \dots, d}) .

holds true. This would then imply (16). However, the above equation only holds true

μ

-almost surely (see (13)). This can be a problem when applying the function g because there could exist sets

B \in B

with

μ_{X_{i}} (B) = 0

but

μ_{X_{i}} (g (B)) > 0

. Additionally, we therefore need to require that

μ_{X_{i}} (g^{- 1} (B)) = 0

implies

μ_{X_{i}} (B) = 0

for all

B \in B

.

Theorem 1 then provides the following statement:

Corollary 1.

Let

(Ω, A, μ, T)

be an ergodic measure-preserving dynamical system,

X = (X_{1}, X_{2}, \dots, X_{d}) : Ω \to R^{d}

be a random vector and

{(E_{n})}_{n \in N}

be a timing. Let further

g : R \to R

be a

B - B

measurable function which maps Borel sets to Borel sets, is injective on

X_{i} (Ω)

and satisfies

μ_{X_{i}} (g^{- 1} (B)) = 0 \Rightarrow μ_{X_{i}} (B) = 0

for all

B \in B

and

i \in {1, 2, \dots, d}

. Let

R = {(x, y) \in R^{2} ∣ g (x) \geq y}

.

Then, (14) implies (10). Moreover, (10) holds true if

Ω \in B (R^{d})

and

X_{i}

is the projection on the i-th coordinate for all

i \in {1, 2, \dots, d}

or if Ω is a compact Hausdorff space and

X = (X_{1}, X_{2}, \dots, X_{d})

is injective and continuous.

Note that the statements in Corollary 1, in principle, were shown in []. The case of basic ordinal patterns is included by

g (x) = x

for all

x \in R

.

4.4. Patterns Defined by “Surjective” Functions

Swapping coordinates in (15) yields the set:

R = {(x, y) \in R^{2} ∣ x < g (y)}

(see Figure 1c) with (8) following from Lemma 2 and with:

\begin{aligned} f_{X_{i}}^{R} (x) & = μ ({ω \in Ω ∣ (x, X_{i} (ω)) \in R}) = μ ({ω \in Ω ∣ x < g (X_{i} (ω))}) \\ & = μ ({ω \in Ω ∣ g (X_{i} (ω)) \in ] x, \infty [}) = 1 - F_{g \circ X_{i}} (x) . \end{aligned}

Corollary 2.

Let

(Ω, A, μ, T)

be an ergodic measure-preserving dynamical system,

X = (X_{1}, X_{2}, \dots, X_{d}) : Ω \to R^{d}

be a random vector and

{(E_{n})}_{n \in N}

be a timing. Let further

g : R \to R

be a

B - B

measurable function and let

R = {(x, y) \in R^{2} ∣ x < g (y)}

. Then, the following holds:

(i)

If

F_{g \circ X_{i}}

is injective on

X_{i} (Ω)

for

i \in {1, 2, \dots, d}

, (14) implies (10).

(ii)

If

Ω \in B (R^{d})

and

X_{i}

is the projection on the i-th coordinate for all

i \in {1, 2, \dots, d}

or if Ω is a compact Hausdorff space and

X = (X_{1}, X_{2}, \dots, X_{d})

is injective and continuous, if further

μ (U) > 0

for every non-empty open set

U \subseteq Ω

, g is continuous and

X_{i} (Ω) \subseteq g (X_{i} (Ω))

, then (10) is valid in each of the following two cases:

(1): For each $i \in {1, 2, \dots, d}$ and all $x_{1}, x_{2} \in X_{i} (Ω)$ with $x_{1} < x_{2}$ , there exists some $y \in X_{i} (Ω)$ with $x_{1} < y < x_{2}$ ,
(2): Ω is connected.

Proof.

(i): If the above assumptions are satisfied and

F_{g \circ X_{i}}

is injective on

X_{i} (Ω)

for all

i \in {1, 2, \dots, d}

, then by Lemma 1 it holds that

σ (X_{i} \circ T^{t}) \subseteq_{μ} σ (F_{g \circ X_{i}} \circ X_{i} \circ T^{t})

for all

t \in N_{0}

and

i \in {1, 2, \dots, d}

. This implies (9), hence, by Theorem 1 the statement (10).

(ii): Given the assumptions of (ii), we have to show that

F_{g \circ X_{i}}

is injective on

X_{i} (Ω)

for all

i \in {1, 2, \dots, d}

. If

Ω

is connected, then (1) is obviously satisfied. We can thus start from (1). Take

x_{1}, x_{2} \in X_{i} (Ω)

with

x_{1} < x_{2}

. Then,

g^{- 1} (] x_{1}, x_{2} [)

is non-empty and because

g \circ X_{i}

is continuous,

X_{i}^{- 1} (g^{- 1} (] x_{1}, x_{2}]))

contains a non-empty open set. This implies that

F_{g \circ X_{i}} (x_{1}) < F_{g \circ X_{i}} (x_{2})

because every non-empty open set was assumed to have strictly positive measure. □

Notice that, unlike in (15), it is not necessary that g is one-to-one.

4.5. Piecewise Patterns

The previous subsection illustrates that (9) is fulfilled if, roughly speaking,

(X_{1}, X_{2}, \dots, X_{d})

preserves all information and if

f_{X_{i}}^{R}

is a

μ_{X_{i}}

almost surely invertible function for all

i \in {1, 2, \dots, d}

. The finite-valued random variable Y in (9) can be used to weaken the condition of invertibility in the sense that only piecewise invertibility is needed where the different pieces are induced by the random variable Y.

For

Ω = [0, 1 [

and an absolutely continuous measure

μ

, one could, for example, consider:

R_{circles} = {(x, y) \in Ω^{2} ∣ {∥ (k x mod 1, k y mod 1) - (0.5, 0.5) ∥}_{2} \leq 0.5}

(17)

for any

k \in N

, as shown for

k = 5

in Figure 1d. The set R satisfies condition (9) with

Y (ω) = i

for

ω \in [(i - 1) / (2 k), i / (2 k) [

and

i \in {1, 2, \dots, 2 k}

. The set R is a pattern with

k^{2}

circles of diameter

1 / k

distributed in

{[0, 1]}^{2}

on a square grid.
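Membership in the set from (17) and the labelling variable Y can be sketched numerically as follows. This is a minimal illustration under the assumption that floating-point arithmetic is accurate enough at the tested points; the function names `in_r_circles` and `label_y` are hypothetical and do not appear in the original construction.

```python
import math

def in_r_circles(x: float, y: float, k: int) -> bool:
    """Test whether (x, y) lies in R_circles from (17), for x, y in [0, 1[."""
    # Reduce both coordinates to their position inside one grid cell.
    u = (k * x) % 1.0
    v = (k * y) % 1.0
    # In rescaled units each cell holds a disc of radius 0.5 centred at (0.5, 0.5).
    return math.hypot(u - 0.5, v - 0.5) <= 0.5

def label_y(omega: float, k: int) -> int:
    """Return i in {1, ..., 2k} with omega in [(i-1)/(2k), i/(2k)[."""
    return math.floor(2 * k * omega) + 1
```

For k = 5, the point (0.1, 0.1) is the centre of a circle and belongs to the set, whereas (0.01, 0.01) lies in a corner of its grid cell and does not.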

4.6. A Remark on the Work of Amigó et al.

Consider the discriminating relation:

R_{k} = {(x, y) \in R^{2} ∣ ⌊ k \cdot x ⌋ \geq ⌊ k \cdot y ⌋}

shown in Figure 3 for

k \in N

. Assume for simplicity that the dynamical system is defined on

Ω = [0, 1 [

and that X is the identity map

i d

. It is easy to see that:

σ (⋁_{t = 0}^{n} T^{- t} (P_{k}) ∣ n \in N) =_{μ} σ (f_{i d}^{R_{k}} \circ T^{t} ∣ t \in N_{0})

holds true, where

P_{k} : = {[(i - 1) / k, i / k [}_{i = 1}^{k}

. Therefore, (9) in Theorem 1 holds true if

P_{k}

is a generating partition.

Figure 3.

R_{k} : = {(x, y) \in R^{2} ∣ ⌊ k \cdot x ⌋ \geq ⌊ k \cdot y ⌋}

for

k = 8

(left side) and

k = 16

(right side), only shown in

{[0, 1]}^{2}

.

Additionally, one could consider the quantity:

lim_{k \to \infty} \underset{n \to \infty}{lim inf} \frac{1}{n} H (⋁_{s = 0}^{n - 1} ⋁_{t = s + 1}^{n - 1} {(T^{s}, T^{t})}^{- 1} ({R_{k}, R^{2} \ R_{k}}))

(18)

which was introduced by Amigó et al. []. They used finite-valued random variables to quantize the dynamical system into k parts and considered the ordinal patterns of the quantized systems, while we directly apply the quantization to the discriminating relation. The two approaches differ only in notation. They showed in their paper that the limit in (18) is equal to the Kolmogorov–Sinai entropy.
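The equivalence of the two viewpoints can be made concrete with a small sketch. It uses the floor variant of the relation stated in the caption of Figure 3 and values in [0, 1[; the helper names `in_r_k`, `pattern` and `quantize` are hypothetical and only serve the illustration.

```python
import math
from itertools import combinations

def in_r_k(x: float, y: float, k: int) -> bool:
    """Floor variant of R_k: compare the indices of the cells of P_k."""
    return math.floor(k * x) >= math.floor(k * y)

def pattern(segment, k):
    """Bits 1[(x_s, x_t) in R_k] for all pairs s < t of an orbit segment."""
    return tuple(int(in_r_k(segment[s], segment[t], k))
                 for s, t in combinations(range(len(segment)), 2))

def quantize(segment, k):
    """Index of the cell of P_k containing each value of the segment."""
    return [math.floor(k * x) for x in segment]
```

Two orbit segments with the same quantized values, such as [0.10, 0.62, 0.30] and [0.05, 0.60, 0.26] for k = 8, produce the same pattern. The pattern with respect to this relation is therefore a function of the quantized system alone.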

5. Proof of the Main Statement

We first recall some definitions and statements related to partitions and the conditional entropy. For two partitions

P, Q \subset A

of

Ω

, the conditional entropy is defined as

H (P | Q) : = H (P \lor Q) - H (Q) .

Roughly speaking, the conditional entropy

H (P | Q)

describes how much uncertainty is left in the outcomes described by the sets given in

P

if one already has information about the outcomes described by the sets given in

Q

. For example, if

P = Q

, then

H (P | Q) = 0

. However, if

P

and

Q

are independent, meaning that

μ (P \cap Q) = μ (P) \cdot μ (Q)

for all

P \in P

and

Q \in Q

, then

H (P | Q) = H (P)

.
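On a finite probability space, the definition of the conditional entropy can be evaluated directly. The following sketch assumes partitions given as lists of frozensets, a measure given as a dictionary of point masses, and the natural logarithm; these representations and the function names are choices made for the illustration only.

```python
import math
from itertools import product

def entropy(masses):
    """Shannon entropy (natural logarithm) of a list of cell masses."""
    return -sum(p * math.log(p) for p in masses if p > 0)

def join(part_p, part_q):
    """Common refinement P v Q: all non-empty pairwise intersections."""
    return [a & b for a, b in product(part_p, part_q) if a & b]

def cond_entropy(part_p, part_q, mu):
    """H(P | Q) := H(P v Q) - H(Q) for the point masses in mu."""
    def mass(cell):
        return sum(mu[w] for w in cell)
    return (entropy([mass(c) for c in join(part_p, part_q)])
            - entropy([mass(c) for c in part_q]))

# Uniform measure on four points with two independent two-cell partitions.
mu = {w: 0.25 for w in range(4)}
P = [frozenset({0, 1}), frozenset({2, 3})]
Q = [frozenset({0, 2}), frozenset({1, 3})]
```

Here `cond_entropy(P, P, mu)` vanishes, while `cond_entropy(P, Q, mu)` equals log 2, which is the entropy H(P) of the partition P itself.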

Without explicitly referencing them, we will use the following properties of the conditional entropy:

(i): $H (T^{- 1} (P) | T^{- 1} (Q)) = H (P | Q)$ ,
(ii): $H (⋁_{i = 1}^{n} P_{i} | Q) \leq \sum_{i = 1}^{n} H (P_{i} | Q)$ ,
(iii): $H (P | Q_{1} \lor Q_{2}) \leq H (P | Q_{1})$ .

See, for example, [] for proofs.

A sequence of partitions

{(P_{i})}_{i \in N}

in

A

of

Ω

is said to be generating (the

σ

-algebra

A

), if

σ (P_{i}) \subseteq σ (P_{i + 1})

for all

i \in N

and:

σ (P_{i} ∣ i \in N) =_{μ} A

holds true. As a consequence of this property:

lim_{n \to \infty} H (P ∣ P_{n}) = 0

(19)

holds true for all partitions

P \subset A

of

Ω

. Together with the properties of the conditional entropy, this implies:

h (T) = lim_{n \to \infty} h (T, P_{n}) .

For

N \in N

:

U_{N} = \{[(i - 1) / 2^{N}, i / 2^{N}[∣ i \in {1, 2, \dots, 2^{N} - 1}\} \cup \{[(2^{N} - 1) / 2^{N}, 1]\}

will denote the partition of

[0, 1]

in

2^{N}

equally sized intervals.
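The cell of this dyadic partition containing a given point can be computed directly. The sketch below is only an illustration; the function name `dyadic_cell` is hypothetical, and the right endpoint 1 is assigned to the last cell because that cell is closed on the right.

```python
import math

def dyadic_cell(x: float, n: int) -> int:
    """Index i in {1, ..., 2**n} of the cell of U_N (with N = n) containing x."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    # The last cell is closed on the right, so x = 1 belongs to cell 2**n.
    return min(math.floor(x * 2**n) + 1, 2**n)
```

Each cell of level N + 1 lies inside the cell of level N with index ⌈i / 2⌉, so the partitions are nested, as required of a generating sequence.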

We start the proof of Theorem 1 with two basic lemmata.

Lemma 3.

Let

(Ω, A, μ, T)

be a measure-preserving dynamical system,

X = (X_{1}, X_{2}, \dots, X_{d})

be a random vector and Y be a random variable satisfying (9). Then, there exists some constant

c \in R

with:

h (T^{m}) \leq lim_{N \to \infty} h (T^{m}, ⋁_{i = 1}^{d} ⋁_{t = 0}^{m - 1} T^{- t} (X_{i}^{- 1} (f_{X_{i}}^{- 1} (U_{N})))) + c .

for all

m \in N

.

Proof.

Fix

m \in N

. Set:

M : = \{Y^{- 1} (y) ∣ y \in Y (Ω)\}

Since Y was assumed to attain only a finite number of different values,

M

is a finite partition of

Ω

. Because the Borel

σ

-algebra of

[0, 1]

is generated by the partitions

U_{N}

and due to (9), we have:

\begin{matrix} A & =_{μ} σ ((f_{X_{i}} \circ X_{i} \circ T^{t}, Y) ∣ t \in N_{0}, i \in {1, 2, \dots, d}) \\ = σ (T^{- t} (X_{i}^{- 1} (f_{X_{i}}^{- 1} (U_{N}))) \lor M ∣ N \in N, t \in N_{0}, i \in {1, 2, \dots, d}) . \end{matrix}

Thus, for any

ε > 0

and any finite partition

P \subset A

of

Ω

, there exists an

N_{ε} \in N

and a

t_{ε} \in N

with:

\begin{matrix} h (T^{m}, P) & \leq h (T^{m}, ⋁_{i = 1}^{d} ⋁_{t = 0}^{t_{ε} - 1} T^{- t} (X_{i}^{- 1} (f_{X_{i}}^{- 1} (U_{N}))) \lor M) + ε \\ \leq h (T^{m}, ⋁_{i = 1}^{d} ⋁_{t = 0}^{t_{ε} - 1} T^{- t} (X_{i}^{- 1} (f_{X_{i}}^{- 1} (U_{N})))) + h (T^{m}, M) + ε \\ \leq h (T^{m}, ⋁_{i = 1}^{d} ⋁_{t = 0}^{t_{ε} - 1} T^{- t} (X_{i}^{- 1} (f_{X_{i}}^{- 1} (U_{N})))) + H (M) + ε \\ \leq h (T^{m}, ⋁_{i = 1}^{d} ⋁_{t = 0}^{m - 1} T^{- t} (X_{i}^{- 1} (f_{X_{i}}^{- 1} (U_{N})))) + H (M) + ε \end{matrix}

for all

N \geq N_{ε}

. Hence:

h (T^{m}, P) \leq lim_{N \to \infty} h (T^{m}, ⋁_{i = 1}^{d} ⋁_{t = 0}^{m - 1} T^{- t} (X_{i}^{- 1} (f_{X_{i}}^{- 1} (U_{N})))) + H (M) + ε

for any

ε > 0

, which implies:

\begin{matrix} h (T^{m}) & = sup_{P} h (T^{m}, P) \\ \leq lim_{N \to \infty} h (T^{m}, ⋁_{i = 1}^{d} ⋁_{t = 0}^{m - 1} T^{- t} (X_{i}^{- 1} (f_{X_{i}}^{- 1} (U_{N})))) + H (M) . \end{matrix}

□

Lemma 4.

Let

(Ω, A, μ)

be a probability space,

X : Ω \to R

be a random variable and

A, B \in B (R^{2})

. Then:

\int |f_{X}^{A} - f_{X}^{B}| d μ_{X} \leq μ_{X}^{2} (A ▵ B)

holds true.

Proof.

For all

x \in R

:

\begin{matrix} μ ({ω \in Ω ∣ (x, X (ω)) \in A Δ B}) \\ = & μ ({ω \in Ω ∣ (x, X (ω)) \in A \ B}) + μ ({ω \in Ω ∣ (x, X (ω)) \in B \ A}) \\ \geq & μ ({ω \in Ω ∣ (x, X (ω)) \in A \ B}) \\ \geq & μ ({ω \in Ω ∣ (x, X (ω)) \in A}) - μ ({ω \in Ω ∣ (x, X (ω)) \in B}) \\ = & f_{X}^{A} (x) - f_{X}^{B} (x) \end{matrix}

holds true. Analogously, one can show:

μ ({ω \in Ω ∣ (x, X (ω)) \in A Δ B}) \geq f_{X}^{B} (x) - f_{X}^{A} (x) .

This implies:

μ ({ω \in Ω ∣ (x, X (ω)) \in A Δ B}) \geq |f_{X}^{A} (x) - f_{X}^{B} (x)|

and, by Fubini’s theorem:

\begin{matrix} \int |f_{X}^{A} - f_{X}^{B}| d μ_{X} (x) \\ \leq \int μ ({ω \in Ω ∣ (x, X (ω)) \in A Δ B}) d μ_{X} (x) \\ = \int \int 1_{{ω \in Ω ∣ (x, X (ω)) \in A Δ B}} (ω^{'}) d μ (ω^{'}) d μ_{X} (x) \\ = \int \int 1_{{y \in R ∣ (x, y) \in A Δ B}} (y^{'}) d μ_{X} (y^{'}) d μ_{X} (x) \\ = \int_{R^{2}} 1_{A Δ B} (x, y^{'}) d μ_{X}^{2} (x, y^{'}) \\ = μ_{X}^{2} (A Δ B) . \end{matrix}

□
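The inequality of Lemma 4 can be checked by hand for a discrete random variable. The following sketch assumes that the law of X is given as a dictionary mapping values to probabilities and that A and B are finite sets of pairs; the names `f` and `lemma4_sides` are hypothetical.

```python
from itertools import product

def f(rel, x, law):
    """f_X^rel(x) = mu({omega : (x, X(omega)) in rel}) for a discrete law of X."""
    return sum(p for y, p in law.items() if (x, y) in rel)

def lemma4_sides(rel_a, rel_b, law):
    """Return (integral of |f^A - f^B| d mu_X, mu_X^2 of the symmetric difference)."""
    lhs = sum(p * abs(f(rel_a, x, law) - f(rel_b, x, law))
              for x, p in law.items())
    sym_diff = rel_a ^ rel_b
    rhs = sum(law[x] * law[y]
              for x, y in product(law, repeat=2) if (x, y) in sym_diff)
    return lhs, rhs

law = {0: 0.2, 1: 0.3, 2: 0.5}
A = {(0, 1), (1, 0)}
B = {(0, 2), (1, 0)}
lhs, rhs = lemma4_sides(A, B, law)
```

In this example the left-hand side is 0.04 and the right-hand side is 0.16, so the inequality is strict.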

In particular, the above lemma implies that, if

{(R_{j})}_{j \in N}

is a sequence of sets in

B (R^{2})

with

{lim}_{j \to \infty} μ_{X}^{2} (R_{j} ▵ R) = 0

, then

f_{X}^{R_{j}}

converges to

f_{X}^{R}

in

L^{1}

for

j \to \infty

.

Given

R \subseteq R^{2}

and a random variable

X : Ω \to R

, consider the function

f_{X, n}^{R} : R \times Ω \to [0, 1]

with:

f_{X, n}^{R} (x, ω) : = \frac{1}{n} # {t \in {1, 2, \dots, n} ∣ (x, X (T^{t} (ω))) \in R} .

We want to show that

f_{X, n}^{R} (x, ω)

converges to

f_{X}^{R} (x)

for all

x \in R

and

μ

-almost all

ω \in Ω

. If

f_{X, n}^{R} (x, ω)

is monotone in x for all

ω \in Ω

and

n \in N

, this can be shown relatively easily using the pointwise ergodic theorem and the monotonicity of the considered functions. Monotonicity is guaranteed, if

x_{1} \leq x_{2}

implies:

{y \in R ∣ (x_{1}, y) \in R} \subseteq {y \in R ∣ (x_{2}, y) \in R} .

For example, if

R = {(x, y) \in R^{2} ∣ x > y}

, the above implication holds true. For this special case, a proof of the statement in Lemma 5 can be found in [].
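For this monotone special case, the convergence can be observed numerically. The sketch below assumes, purely for illustration, the irrational rotation T(ω) = ω + α mod 1 on [0, 1[ with the Lebesgue measure and X = id; neither this system nor the function names are taken from the paper. In that setting, the limit function for R = {(x, y) ∣ x > y} is f(x) = x.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2  # irrational rotation angle (assumed example)

def empirical_f(rel, x, omega, n):
    """f_{X,n}^R(x, omega) for X = id and the rotation T(w) = w + ALPHA mod 1."""
    hits = 0
    for t in range(1, n + 1):
        orbit_point = (omega + t * ALPHA) % 1.0  # T^t(omega)
        hits += rel(x, orbit_point)
    return hits / n

def greater(x, y):
    """Indicator of the relation R = {(x, y) : x > y}."""
    return x > y
```

Calling `empirical_f(greater, 0.3, 0.0, 20000)` yields a value close to 0.3, and for fixed ω and n the empirical function is non-decreasing in x, which is the monotonicity described above.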

However, we are interested in general sets

R \in B (R^{2})

and therefore cannot rely on monotonicity. We thus have to prove the statement differently.

Lemma 5.

Let

(Ω, A, μ, T)

be an ergodic measure-preserving dynamical system,

X = (X_{1}, X_{2}, \dots, X_{d})

be a random vector and

R \in B (R^{2})

satisfy (8). Then, for all

i \in {1, 2, \dots, d}

, there exist sets

\tilde{Ω} \in A

and

B \in B (R)

with

μ (\tilde{Ω}) = μ_{X_{i}} (B) = 1

satisfying:

lim_{n \to \infty} f_{X_{i}, n}^{R} (x, ω) = f_{X_{i}}^{R} (x)

for all

ω \in \tilde{Ω}

and

x \in B

.

Proof.

Fix

i \in {1, 2, \dots, d}

. According to (8), there exists a countable set

C = {(a_{k}, b_{k}) ∣ k \in N}

with:

μ_{X_{i}}^{2} (\partial R \ C) = 0 .

By the pointwise ergodic theorem (see, e.g., []), for all

j, k \in N

, there exists

Ω_{j, k}^{*} \in A

with

μ (Ω_{j, k}^{*}) = 1

and:

lim_{n \to \infty} f_{X_{i}, n}^{{(a_{k}, b_{k})}} (a_{j}, ω) = f_{X_{i}}^{{(a_{k}, b_{k})}} (a_{j})

for all

ω \in Ω_{j, k}^{*}

. It is easy to see that:

f_{X_{i}, n}^{{(a_{k}, b_{k})}} (x, ω) = 0 = f_{X_{i}}^{{(a_{k}, b_{k})}} (x)

holds true for all

n \in N

and

ω \in Ω

if

x \neq a_{k}

. Hence:

lim_{n \to \infty} f_{X_{i}, n}^{{(a_{k}, b_{k})}} (x, ω) = f_{X_{i}}^{{(a_{k}, b_{k})}} (x)

(20)

for all

x \in R

and

ω \in ⋂_{j = 1}^{\infty} Ω_{j, k}^{*}

. Using Fatou’s lemma and the fact that C is countable implies:

\begin{matrix} \underset{n \to \infty}{lim inf} f_{X_{i}, n}^{R \cap C} (x, ω) & = \underset{n \to \infty}{lim inf} \sum_{\begin{matrix} k \in N : \\ (a_{k}, b_{k}) \in R \end{matrix}} f_{X_{i}, n}^{{(a_{k}, b_{k})}} (x, ω) \\ \geq \sum_{\begin{matrix} k \in N : \\ (a_{k}, b_{k}) \in R \end{matrix}} \underset{n \to \infty}{lim inf} f_{X_{i}, n}^{{(a_{k}, b_{k})}} (x, ω) \\ \overset{(20)}{=} \sum_{\begin{matrix} k \in N : \\ (a_{k}, b_{k}) \in R \end{matrix}} f_{X_{i}}^{{(a_{k}, b_{k})}} (x) = f_{X_{i}}^{R \cap C} (x) \end{matrix}

(21)

for all

x \in R

and

ω \in ⋂_{j = 1}^{\infty} ⋂_{k = 1}^{\infty} Ω_{j, k}^{*}

. We will use this fact later.

Since

R \ \partial R

is open, there exists a countable collection of pairwise disjoint rectangles

A_{j} \subseteq R^{2}

with:

R \ \partial R = ⋃_{j = 1}^{\infty} A_{j} .

(22)

Take

(x_{j}, y_{j}) \in A_{j}

for all

j \in N

. Using the pointwise ergodic theorem, for all

j \in N

there exists a set

Ω_{j} \in A

with

μ (Ω_{j}) = 1

and:

\begin{matrix} lim_{n \to \infty} f_{X_{i}, n}^{A_{j}} (x_{j}, ω) = f_{X_{i}}^{A_{j}} (x_{j}) \end{matrix}

(23)

for all

ω \in Ω_{j}

. Because

A_{j}

is a rectangle, for all

ω \in Ω

:

\begin{matrix} f_{X_{i}, n}^{A_{j}} (x, ω) & = f_{X_{i}, n}^{A_{j}} (x_{j}, ω) \\ and f_{X_{i}}^{A_{j}} (x) & = f_{X_{i}}^{A_{j}} (x_{j}) \end{matrix}

holds true for all

x \in R

with

{x} \times R \cap A_{j} \neq \emptyset

and:

\begin{matrix} f_{X_{i}, n}^{A_{j}} (x, ω) = f_{X_{i}}^{A_{j}} (x) = 0 \end{matrix}

holds true for all

x \in R

with

{x} \times R \cap A_{j} = \emptyset

. Together with (23), this implies:

lim_{n \to \infty} f_{X_{i}, n}^{A_{j} \ C} (x, ω) = f_{X_{i}}^{A_{j} \ C} (x)

(24)

for all

x \in R

and

ω \in Ω_{j}

.

Set

R_{J} : = ⋃_{j = 1}^{J} A_{j}

. Lemma 4 provides:

\begin{matrix} lim_{J \to \infty} \int |f_{X_{i}}^{R_{J} \ C} (x) - f_{X_{i}}^{R \ C} (x)| d μ_{X_{i}} (x) \\ \leq lim_{J \to \infty} μ_{X_{i}}^{2} ((R_{J} \ C) ▵ (R \ C)) \\ = lim_{J \to \infty} μ_{X_{i}}^{2} ((R \ C) \ (R_{J} \ C)) \\ = lim_{J \to \infty} μ_{X_{i}}^{2} (R \ (R_{J} \cup C)) \\ = lim_{J \to \infty} μ_{X_{i}}^{2} (((R \cap \partial R) \cup (R \ \partial R)) \ (R_{J} \cup C)) \\ = lim_{J \to \infty} μ_{X_{i}}^{2} ((R \cap \partial R) \ (R_{J} \cup C)) + μ_{X_{i}}^{2} ((R \ \partial R) \ (R_{J} \cup C)) \\ \leq lim_{J \to \infty} μ_{X_{i}}^{2} (\partial R \ C) + μ_{X_{i}}^{2} ((R \ \partial R) \ R_{J}) \\ \overset{(8)}{=} lim_{J \to \infty} μ_{X_{i}}^{2} ((R \ \partial R) \ R_{J}) \\ \overset{(22)}{=} μ_{X_{i}}^{2} ((R \ \partial R) \ (R \ \partial R)) = 0 . \end{matrix}

Therefore, there exists a set

B_{1}

with

μ_{X_{i}} (B_{1}) = 1

and a sequence

{(J_{k})}_{k \in N}

with:

lim_{k \to \infty} f_{X_{i}}^{R_{J_{k}} \ C} (x) = f_{X_{i}}^{R \ C} (x)

(25)

for all

x \in B_{1}

. Thus:

\begin{matrix} \underset{n \to \infty}{lim inf} f_{X_{i}, n}^{R} (x, ω) & \geq \underset{n \to \infty}{lim inf} f_{X_{i}, n}^{R \cap C} (x, ω) + \underset{n \to \infty}{lim inf} f_{X_{i}, n}^{R \ C} (x, ω) \\ \overset{(21)}{\geq} f_{X_{i}}^{R \cap C} (x) + \underset{n \to \infty}{lim inf} f_{X_{i}, n}^{R_{J_{k}} \ C} (x, ω) \\ \geq f_{X_{i}}^{R \cap C} (x) + lim_{k \to \infty} \underset{n \to \infty}{lim inf} f_{X_{i}, n}^{R_{J_{k}} \ C} (x, ω) \\ = f_{X_{i}}^{R \cap C} (x) + lim_{k \to \infty} \underset{n \to \infty}{lim inf} \sum_{j = 1}^{J_{k}} f_{X_{i}, n}^{A_{j} \ C} (x, ω) \\ \geq f_{X_{i}}^{R \cap C} (x) + lim_{k \to \infty} \sum_{j = 1}^{J_{k}} \underset{n \to \infty}{lim inf} f_{X_{i}, n}^{A_{j} \ C} (x, ω) \\ \overset{(24)}{=} f_{X_{i}}^{R \cap C} (x) + lim_{k \to \infty} \sum_{j = 1}^{J_{k}} f_{X_{i}}^{A_{j} \ C} (x) \\ = f_{X_{i}}^{R \cap C} (x) + lim_{k \to \infty} f_{X_{i}}^{R_{J_{k}} \ C} (x) \\ \overset{(25)}{=} f_{X_{i}}^{R \cap C} (x) + f_{X_{i}}^{R \ C} (x) \\ = f_{X_{i}}^{R} (x) \end{matrix}

for all

x \in B_{1}

and

ω \in {\tilde{Ω}}_{1} : = ⋂_{j = 1}^{\infty} ⋂_{k = 1}^{\infty} Ω_{j} \cap Ω_{j, k}^{*}

. Because

R^{c} \ \partial R

is open as well, one can analogously show that there exist sets

{\tilde{Ω}}_{2} \in A

and

B_{2} \in B (R)

with

μ ({\tilde{Ω}}_{2}) = μ_{X_{i}} (B_{2}) = 1

and:

\underset{n \to \infty}{lim inf} f_{X_{i}, n}^{R^{c}} (x, ω) \geq f_{X_{i}}^{R^{c}} (x)

for all

x \in B_{2}

and

ω \in {\tilde{Ω}}_{2}

. This implies:

\begin{matrix} \underset{n \to \infty}{lim sup} f_{X_{i}, n}^{R} (x, ω) = 1 - \underset{n \to \infty}{lim inf} f_{X_{i}, n}^{R^{c}} (x, ω) \leq 1 - f_{X_{i}}^{R^{c}} (x) = f_{X_{i}}^{R} (x) . \end{matrix}

Hence:

f_{X_{i}}^{R} (x) \leq \underset{n \to \infty}{lim inf} f_{X_{i}, n}^{R} (x, ω) \leq \underset{n \to \infty}{lim sup} f_{X_{i}, n}^{R} (x, ω) \leq f_{X_{i}}^{R} (x)

(26)

for all

x \in B : = B_{1} \cap B_{2}

and

ω \in \tilde{Ω} : = {\tilde{Ω}}_{1} \cap {\tilde{Ω}}_{2}

. □

Given a random vector

(X_{1}, X_{2}, \dots, X_{d})

and

R \in B (R^{2})

, we define the partition:

R_{i} : = {(X_{i}, X_{i})}^{- 1} ({R, R^{c}})

for all

i \in {1, 2, \dots, d}

, which is equal to:

\begin{matrix} R_{i} = { & {(ω_{1}, ω_{2}) \in Ω^{2} ∣ (X_{i} (ω_{1}), X_{i} (ω_{2})) \in R}, \\ {(ω_{1}, ω_{2}) \in Ω^{2} ∣ (X_{i} (ω_{1}), X_{i} (ω_{2})) \notin R}} . \end{matrix}

Lemma 1.

Let

(Ω, A, μ, T)

be an ergodic measure-preserving dynamical system,

X = (X_{1}, X_{2}, \dots, X_{d}) : Ω \to R^{d}

be a random vector,

{(E_{n})}_{n \in N}

be a timing and

R \in B (R^{2})

satisfying (8). Then, there exists a sequence

{(n_{k})}_{k \in N} \in N_{0}^{N}

with:

lim_{k \to \infty} H (T^{- n_{k}} ({(f_{X_{i}} \circ X_{i})}^{- 1} (U_{N})) |⋁_{v = 0}^{n_{k}} T^{- v} (\underset{(s, t) \in E_{k}}{⋁} {(T^{s}, T^{t})}^{- 1} (R_{i}))) = 0

for all

N \in N

and

i \in {1, 2, \dots, d}

.

Proof.

Because

{(E_{n})}_{n \in N}

is a timing, there exists a sequence

{(a_{n})}_{n \in N} \in N_{0}^{N}

with:

\underset{N \to \infty}{lim sup} \frac{# \{n \in {1, 2, \dots, N} ∣ (a_{n}, a_{n} + n) \in ⋃_{k = 1}^{\infty} E_{k}\}}{N} = 1 .

So one can find a strictly increasing sequence

{(N_{n})}_{n \in N} \in N^{N}

with:

(a_{N_{n}}, a_{N_{n}} + N_{n}) \in ⋃_{k = 1}^{\infty} E_{k}

for all

n \in N

and:

\underset{n \to \infty}{lim sup} \frac{n}{N_{n}} = 1 .

(27)

Now, fix

i \in {1, 2, \dots, d}

. According to Lemma 5, there exist sets

\tilde{Ω} \in A

and

B \in B (R)

with

μ (\tilde{Ω}) = μ_{X_{i}} (B) = 1

satisfying:

lim_{n \to \infty} f_{X_{i}, n}^{R} (x, ω) = f_{X_{i}}^{R} (x)

for all

ω \in \tilde{Ω}

and

x \in B

. Set

Ω_{0} : = \tilde{Ω} \cap X_{i}^{- 1} (B) .

Consider the function

ϕ_{n} : Ω \to [0, 1]

with:

ϕ_{n} (ω) : = \frac{1}{N_{n}} # {t \in {N_{1}, N_{2}, \dots, N_{n}} ∣ (X_{i} (ω), X_{i} (T^{t} (ω))) \in R} .

Then:

\begin{matrix} f_{X_{i}}^{R} (X_{i} (ω)) \\ = & lim_{n \to \infty} f_{X_{i}, n}^{R} (X_{i} (ω), ω) \\ = & \underset{n \to \infty}{lim sup} f_{X_{i}, N_{n}}^{R} (X_{i} (ω), ω) \\ \geq & \underset{n \to \infty}{lim sup} ϕ_{n} (ω) \\ = & \underset{n \to \infty}{lim sup} [f_{X_{i}, N_{n}}^{R} (X_{i} (ω), ω) \\ - \frac{1}{N_{n}} # {t \in {1, 2, \dots, N_{n}} \ {N_{1}, N_{2}, \dots, N_{n}} ∣ (X_{i} (ω), X_{i} (T^{t} (ω))) \in R}] \\ \geq & \underset{n \to \infty}{lim sup} [f_{X_{i}, N_{n}}^{R} (X_{i} (ω), ω) - \frac{1}{N_{n}} # ({1, 2, \dots, N_{n}} \ {N_{1}, N_{2}, \dots, N_{n}})] \\ = & \underset{n \to \infty}{lim sup} [f_{X_{i}, N_{n}}^{R} (X_{i} (ω), ω) - \frac{N_{n} - n}{N_{n}}] \\ = & lim_{n \to \infty} f_{X_{i}, N_{n}}^{R} (X_{i} (ω), ω) - 1 + \underset{n \to \infty}{lim sup} \frac{n}{N_{n}} \\ \overset{(27)}{=} & f_{X_{i}}^{R} (X_{i} (ω)) \end{matrix}

(28)

for all

ω \in Ω_{0}

.

It is easy to see that:

σ (ϕ_{n}) \subseteq σ (⋁_{t = 1}^{k} {(i d, T^{N_{t}})}^{- 1} (R_{i}) ∣ k \in N)

holds true for all

n \in N

. This implies (see for instance [], Theorem 13.4 (i)):

σ (f_{X_{i}}^{R} \circ X_{i}) \overset{(28)}{=_{μ}} σ (\underset{n \to \infty}{lim sup} ϕ_{n}) \subseteq σ (⋁_{t = 1}^{k} {(i d, T^{N_{t}})}^{- 1} (R_{i}) ∣ k \in N) .

Therefore

⋁_{t = 1}^{k} {(i d, T^{N_{t}})}^{- 1} (R_{i})

is a sequence of partitions generating

σ (f_{X_{i}}^{R} \circ X_{i})

. By (19), this implies:

lim_{k \to \infty} H ({(f_{X_{i}} \circ X_{i})}^{- 1} (U_{N}) |⋁_{t = 1}^{k} {(i d, T^{N_{t}})}^{- 1} (R_{i})) = 0

(29)

for all

N \in N

. Set:

n_{k} : = max_{1 \leq n \leq k} a_{N_{n}} .

Notice that:

σ (T^{- n_{k}} (⋁_{t = 1}^{k} {(i d, T^{N_{t}})}^{- 1} (R_{i}))) \subseteq σ (⋁_{v = 0}^{n_{k}} T^{- v} (\underset{(s, t) \in E_{k}}{⋁} {(T^{s}, T^{t})}^{- 1} (R_{i})))

holds true for all

k \in N

. Consequently:

\begin{matrix} lim_{k \to \infty} H (T^{- n_{k}} ({(f_{X_{i}} \circ X_{i})}^{- 1} (U_{N})) |⋁_{v = 0}^{n_{k}} T^{- v} (\underset{(s, t) \in E_{k}}{⋁} {(T^{s}, T^{t})}^{- 1} (R_{i}))) \\ \leq lim_{k \to \infty} H (T^{- n_{k}} ({(f_{X_{i}} \circ X_{i})}^{- 1} (U_{N})) |T^{- n_{k}} (⋁_{t = 1}^{k} {(i d, T^{N_{t}})}^{- 1} (R_{i}))) \\ = lim_{k \to \infty} H ({(f_{X_{i}} \circ X_{i})}^{- 1} (U_{N}) |⋁_{t = 1}^{k} {(i d, T^{N_{t}})}^{- 1} (R_{i})) \overset{(29)}{=} 0 \end{matrix}

for all

N \in N

. □

We can now finalize the proof of Theorem 1.

Proof of Theorem 1.

Let

N \in N

and

m \in N

. Set:

P_{N}^{i} : = X_{i}^{- 1} (f_{X_{i}}^{- 1} (U_{N}))

and:

Q_{k}^{i} : = \underset{(s, t) \in E_{k}}{⋁} {(T^{s}, T^{t})}^{- 1} (R_{i})

for all

i \in {1, 2, \dots, d}

and

k \in N

. According to Lemma 1, there exists a sequence

{(n_{k})}_{k \in N} \in N_{0}^{N}

with:

lim_{k \to \infty} H (T^{- n_{k}} (P_{N}^{i}) |⋁_{v = 0}^{n_{k}} T^{- v} (Q_{k}^{i})) = 0

(30)

for all

i \in {1, 2, \dots, d}

. We have:

\begin{matrix} lim_{n \to \infty} \frac{1}{n} H (⋁_{i = 1}^{d} ⋁_{u = 0}^{n m - 1} T^{- u} (P_{N}^{i}) |⋁_{i = 1}^{d} ⋁_{u = 0}^{n m - 1} T^{- u} (Q_{k}^{i})) \\ \leq & \sum_{i = 1}^{d} lim_{n \to \infty} \frac{1}{n} H (⋁_{u = 0}^{n m - 1} T^{- u} (P_{N}^{i}) |⋁_{u = 0}^{n m - 1} T^{- u} (Q_{k}^{i})) \\ = & \sum_{i = 1}^{d} lim_{n \to \infty} \frac{1}{n} H (⋁_{u = 0}^{n m - 1 - n_{k}} T^{- u} (T^{- n_{k}} (P_{N}^{i})) |⋁_{u = 0}^{n m - 1 - n_{k}} T^{- u} (⋁_{v = 0}^{n_{k}} T^{- v} (Q_{k}^{i}))) \\ \leq & \sum_{i = 1}^{d} lim_{n \to \infty} \frac{1}{n} \sum_{u = 0}^{n m - 1 - n_{k}} H (T^{- u} (T^{- n_{k}} (P_{N}^{i})) |⋁_{v = 0}^{n_{k}} T^{- v} (Q_{k}^{i})) \\ = & \sum_{i = 1}^{d} lim_{n \to \infty} \frac{n m - 1 - n_{k}}{n} H (T^{- n_{k}} (P_{N}^{i}) |⋁_{v = 0}^{n_{k}} T^{- v} (Q_{k}^{i})) \\ \leq & \sum_{i = 1}^{d} m \cdot H (T^{- n_{k}} (P_{N}^{i}) |⋁_{v = 0}^{n_{k}} T^{- v} (Q_{k}^{i})) \end{matrix}

for all

k, m, N \in N

. This implies:

\begin{matrix} h (T^{m}, ⋁_{i = 1}^{d} ⋁_{u = 0}^{m - 1} T^{- u} (P_{N}^{i})) - lim_{k \to \infty} h (T^{m}, ⋁_{i = 1}^{d} ⋁_{u = 0}^{m - 1} T^{- u} (Q_{k}^{i})) \\ \leq lim_{k \to \infty} lim_{n \to \infty} \frac{1}{n} H (⋁_{i = 1}^{d} ⋁_{u = 0}^{n m - 1} T^{- u} (P_{N}^{i}) |⋁_{i = 1}^{d} ⋁_{u = 0}^{n m - 1} T^{- u} (Q_{k}^{i})) \\ \leq lim_{k \to \infty} \sum_{i = 1}^{d} m \cdot H (T^{- n_{k}} (P_{N}^{i}) |⋁_{v = 0}^{n_{k}} T^{- v} (Q_{k}^{i})) \overset{(30)}{=} 0 . \end{matrix}

Using Lemma 3, we can conclude that there exists a constant

c \in R

with:

\begin{matrix} h (T^{m}) & \leq lim_{k \to \infty} h (T^{m}, ⋁_{i = 1}^{d} ⋁_{u = 0}^{m - 1} T^{- u} (\underset{(s, t) \in E_{k}}{⋁} {(T^{s}, T^{t})}^{- 1} (R_{i}))) + c \\ = lim_{k \to \infty} m \cdot h (T, ⋁_{i = 1}^{d} \underset{(s, t) \in E_{k}}{⋁} {(T^{s}, T^{t})}^{- 1} (R_{i})) + c \end{matrix}

for all

m \in N

. Thus:

\begin{matrix} h (T) - lim_{k \to \infty} h (T, ⋁_{i = 1}^{d} \underset{(s, t) \in E_{k}}{⋁} {(T^{s}, T^{t})}^{- 1} (R_{i})) \\ = lim_{m \to \infty} \frac{1}{m} \cdot h (T^{m}) - lim_{k \to \infty} h (T, ⋁_{i = 1}^{d} \underset{(s, t) \in E_{k}}{⋁} {(T^{s}, T^{t})}^{- 1} (R_{i})) \\ \leq lim_{m \to \infty} \frac{1}{m} \cdot c = 0, \end{matrix}

which is equivalent to:

h (T) \leq lim_{k \to \infty} h (T, ⋁_{i = 1}^{d} \underset{(s, t) \in E_{k}}{⋁} {(T^{s}, T^{t})}^{- 1} (R_{i})) .

(31)

On the other hand:

\begin{matrix} h (T) = sup_{P} h (T, P) \geq lim_{k \to \infty} h (T, ⋁_{i = 1}^{d} \underset{(s, t) \in E_{k}}{⋁} {(T^{s}, T^{t})}^{- 1} (R_{i})), \end{matrix}

which, together with (31), finishes the proof. □

6. Conclusions

We discussed a special “two-dimensional” approach to symbolic dynamics, introduced in [], which differs from many usual approaches. From the practical viewpoint, the difference can be illustrated as follows: given time-dependent measurements of a real-valued quantity, the symbolization is not applied to the measurements themselves, as in usual approaches, but to pairs of measurements at two different times. This means that a symbol from a finite symbol set is assigned to each pair of possible measured values. Here, we only considered two symbols, which leads to a partitioning of the two-dimensional real space

R^{2}

into a set R and its complement

R^{2} \ R

. In usual approaches, partitions of

R

are considered. (Advantages of the “two-dimensional” approach are described in []).

The set R, called a discriminating relation, was considered as a basic building block for constructing partitions of the state space of a given dynamical system, having time-dependent measurements of finitely many quantities in mind. In addition to the discriminating relation, the second central concept was that of a timing, which roughly describes which pairs of times are included in the symbolization process and guarantees that there are not too few such pairs. The central question of the paper was under which conditions on a discriminating relation R the partitions constructed from R determine the KS entropy of a measure-preserving dynamical system. With Theorem 1, we gave a relatively general statement partially answering this question. Some specifications of the theorem in Section 4 illustrate the nature of “successful” discriminating relations.

Although the statement of Theorem 1 appears relatively natural when looking at the proofs a little closer, we do not expect that all cases in which the KS entropy can be determined from a discriminating relation are covered by the statement; however, we have no counterexample. The main tool used in the proofs is the pointwise ergodic theorem. It allows one to establish a connection between the generalized ordinal patterns and the shape of the discriminating relation.

The results of this paper, being on a rather abstract level, give some insight into why the idea of ordinal patterns works well, as reported in several applied papers, and they extract the advantageous features in a setting more general than the original ordinal approach. Since there are many choices for a discriminating relation, practical purposes such as classification require methods and criteria for finding good discriminating relations adapted to the given data and problems. This is an important challenge for further research related to the given approach to symbolic dynamics. A further aspect is to discuss the approach for partitioning

R^{2}

into more than two pieces.

Author Contributions

T.G. and K.K. designed and wrote the paper and T.G. provided the proofs. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

Bandt, C.; Pompe, B. Permutation Entropy: A Natural Complexity Measure for Time Series. Phys. Rev. Lett. 2002, 88, 174102.
Zanin, M.; Zunino, L.; Rosso, O.A.; Papo, D. Permutation Entropy and Its Main Biomedical and Econophysics Applications: A Review. Entropy 2012, 14, 1553–1577.
Bandt, C.; Keller, G.; Pompe, B. Entropy of interval maps via permutations. Nonlinearity 2002, 15, 1595–1602.
Antoniouk, A.; Keller, K.; Maksymenko, S. Kolmogorov-Sinai entropy via separation properties of order-generated σ-algebras. Discret. Contin. Dyn. Syst. A 2014, 34, 1793–1809.
Smith, G.; Goulding, J.; Barrack, D. Towards Optimal Symbolization for Time Series Comparisons. In Proceedings of the 2013 IEEE 13th International Conference on Data Mining Workshops, Dallas, TX, USA, 7–10 December 2013; pp. 646–653.
Stolz, I.; Keller, K. A General Symbolic Approach to Kolmogorov-Sinai Entropy. Entropy 2017, 19, 675.
Walters, P. An introduction to ergodic theory. In Graduate Texts in Mathematics; Springer: New York, NY, USA, 1982; Volume 79.
Keller, K. Permutations and the Kolmogorov-Sinai entropy. Discret. Contin. Dyn. Syst. 2012, 32, 891–900.
Amigó, J.M.; Kennel, M.B.; Kocarev, L. The permutation entropy rate equals the metric entropy rate for ergodic information sources and ergodic dynamical systems. Phys. D 2005, 210, 77–95.
Cornfeld, I.P.; Fomin, S.V.; Sinai, Y.G. Ergodic Theory, 1st ed.; Springer: New York, NY, USA, 1982.
Billingsley, P. Probability and Measure, 2nd ed.; John Wiley and Sons: Hoboken, NJ, USA, 1986.

Figure 1. This figure illustrates some special discriminating relations R (striped areas) considered in Section 4. Only the part of R contained in [0, 1[² is shown (compare the corresponding remarks in Section 4).

Figure 2. “Non-diagonal” discriminating relations.


Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
