# Permutation Entropy: New Ideas and Challenges

Institute of Mathematics, University of Lübeck, Lübeck D-23562, Germany
\* Author to whom correspondence should be addressed.
Entropy 2017, 19(3), 134; https://doi.org/10.3390/e19030134
Received: 17 February 2017 / Revised: 17 March 2017 / Accepted: 17 March 2017 / Published: 21 March 2017

## Abstract

Over recent years, some new variants of Permutation entropy have been introduced and applied to EEG analysis, including a conditional variant and variants using some additional metric information or being based on entropies that are different from the Shannon entropy. In some situations, it is not completely clear what kind of information the new measures and their algorithmic implementations provide. We discuss the new developments and illustrate them for EEG data.

## 1. Introduction

The concept of Permutation entropy introduced by Bandt and Pompe [1] in 2002 has been applied to data analysis in various disciplines (compare, e.g., the collection [2] and the papers [3,4]). The Permutation entropy of a time series is the Shannon entropy of the distribution of ordinal patterns in the time series (see also [5]). Such ordinal patterns, describing order types of vectors, are coded by permutations. Denoting the set of permutations of $\{0, 1, \ldots, d\}$ for $d \in \mathbb{N}$ by $\Pi_d$, we say that a vector $(v_0, v_1, \ldots, v_d) \in \mathbb{R}^{d+1}$ has ordinal pattern $\pi = (r_0, r_1, \ldots, r_d) \in \Pi_d$ if
$v_{r_0} \ge v_{r_1} \ge \ldots \ge v_{r_{d-1}} \ge v_{r_d}$
and
$r_{l-1} > r_l \quad \text{if} \quad v_{r_{l-1}} = v_{r_l}.$
Definition 1.
The empirical Permutation entropy (ePE) of order $d \in \mathbb{N}$ and of delay $\tau \in \mathbb{N}$ of a time series $(x_t)_{t=0}^{N-1}$ with $N \in \mathbb{N}$ is given by
$\mathrm{ePE}\big(d, \tau, (x_t)_{t=0}^{N-1}\big) = -\frac{1}{d} \sum_{\pi \in \Pi_d} p_\pi^\tau \ln p_\pi^\tau, \qquad (1)$
where
$p_\pi^\tau = \frac{\#\{t \in \{d\tau, d\tau+1, \ldots, N-1\} \mid (x_t, x_{t-\tau}, \ldots, x_{t-d\tau}) \text{ has ordinal pattern } \pi\}}{N - d\tau} \qquad (2)$
is the relative frequency of ordinal pattern $\pi$ in the time series, and $0 \ln 0$ is defined to be 0 (compare [6]). The vectors $(x_t, x_{t-\tau}, \ldots, x_{t-d\tau})$ related to $t$, $d$ and $\tau$ are called delay vectors.
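As an illustration, Definition 1 can be implemented in a few lines. The following Python sketch (our own illustration; the function and variable names are not from the paper) extracts ordinal patterns with the stated tie-breaking rule and computes the ePE:

```python
import math
from collections import Counter

def ordinal_pattern(v):
    """Ordinal pattern (r_0, ..., r_d) of a vector (v_0, ..., v_d): indices sorted
    by descending value; on ties the larger index comes first, matching the rule
    r_{l-1} > r_l if v_{r_{l-1}} = v_{r_l}."""
    return tuple(sorted(range(len(v)), key=lambda i: (v[i], i), reverse=True))

def epe(x, d, tau):
    """Empirical Permutation entropy of order d and delay tau (Definition 1)."""
    n = len(x)
    counts = Counter(
        ordinal_pattern([x[t - i * tau] for i in range(d + 1)])  # delay vector
        for t in range(d * tau, n)
    )
    m = n - d * tau  # number of delay vectors
    return -sum(c / m * math.log(c / m) for c in counts.values()) / d
```

For a strictly monotone time series only one ordinal pattern occurs, so the ePE is zero; the more evenly the patterns are distributed, the larger the value.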
In contrast to some other presentations of Permutation entropy in the literature, the Shannon entropy obtained from the ordinal pattern distribution is normalized here by dividing by $d$. This is justified by an interesting result of Bandt et al. [7], which roughly states that for time series from a special class of systems, for $\tau = 1$ and $d \to \infty$, (1) approaches a non-negative real number; how large the corresponding class is, however, remains open to debate. Normalizing in this way allows one to compare Permutation entropies of different orders. (Some further discussion on this point follows in this paper.) Another usual normalization divides the Shannon entropy by its maximal possible value $\ln((d+1)!)$.
In order to be more flexible in data analysis, various modifications of Permutation entropy have been developed over the last years. One class of such modifications is based on adding some metric information of the corresponding delay vectors to the ordinal patterns derived. Entropy measures of that type are the fine-grained Permutation entropy proposed by Liu and Wang [8], and the weighted Permutation entropy and the robust Permutation entropy introduced by Fadlallah et al. [9] and Keller et al. [10], respectively. Bian et al. [11] have adapted Permutation entropy to the situation of delay vectors with equal components, which is relevant when the number of possible values in a time series is small. Other variants consider Tsallis or Renyi entropies instead of the Shannon entropy, which goes back to the work of Zunino et al. [12] and Liang et al. [13], or integrate information from different scales. The latter was done by Li et al. [14], Ouyang et al. [15] and Azami and Escudero [16] on the basis of averaging the original data on different scales, and by Zunino et al. [17] and Zunino and Ribeiro [18] by considering a whole spectrum of delays $\tau$.
Unakafov and Keller [19] have proposed the Conditional entropy of (successive) ordinal patterns, which has been shown to perform better than the Permutation entropy itself in many cases.
In this paper, we discuss some of the new (“non-weighting”) developments in Permutation entropy. We first give some theoretical background in order to justify and to motivate conditional variants of Permutation entropy. Then we take a closer look at Renyi and Tsallis modifications of Permutation entropy. The last part of the paper is aimed at the classification of electroencephalography (EEG) data from the viewpoint of epileptic activity. Here, we point out that ordinal time series analysis combined with other methods and automatic learning could be promising for data analysis. We rely on EEG data reported in [6,20] and use algorithms developed in Unakafova and Keller [21].

## 2. Some Theoretical Background

#### 2.1. The Kolmogorov–Sinai Entropy

In order to achieve a better understanding of Permutation entropy, we take a modeling approach. For this, we choose a discrete dynamical system defined on a probability space $(\Omega, \mathcal{B}, P)$. The elements $\omega \in \Omega$ are considered as the (hidden) states of the system, the elements of the σ-algebra $\mathcal{B}$ as the events one is interested in, and the probability measure $P$ quantifies the chances of such events taking place. The dynamics of the system are given by a map $T: \Omega \to \Omega$ that is $\mathcal{B}$-$\mathcal{B}$-measurable, i.e., satisfies $T^{-1}(B) \in \mathcal{B}$ for all $B \in \mathcal{B}$. The map $T$ is assumed to be $P$-invariant, meaning that $P(T^{-1}(B)) = P(B)$ for all $B \in \mathcal{B}$, in order to guarantee that the distribution of the states of the system does not change under the dynamics.
For simplicity, during the whole paper, we assume that $( Ω , B , P )$ and T are fixed; specifications are given directly where they are needed.
In dynamical systems, the Kolmogorov–Sinai entropy, based on entropy rates of finite partitions, is a well-established theoretical complexity measure. For explaining this concept, consider a finite partition $\mathcal{C} = \{C_1, C_2, \ldots, C_l\} \subset \mathcal{B}$ and note that the (Shannon) entropy of such a partition is defined by
$H(\mathcal{C}) = -\sum_{i=1}^{l} P(C_i) \ln P(C_i).$
For $\mathcal{A} = \{1, 2, \ldots, l\}$, the set $\mathcal{A}^k$ of words $a_1 a_2 \ldots a_k$ of length $k$ over $\mathcal{A}$ provides a partition $\mathcal{C}^k$ of $\Omega$ into pieces
$C_{a_1 a_2 \ldots a_k} := \{\omega \in \Omega \mid (\omega, T(\omega), \ldots, T^{\circ k-1}(\omega)) \in C_{a_1} \times C_{a_2} \times \ldots \times C_{a_k}\}.$
(In this paper, $T^{\circ l}$ denotes the $l$-fold iterate of $T$, where $T^{\circ 0}$ is the identity map.) The distribution of word probabilities $P(C_{a_1 a_2 \ldots a_k})$ contains some information on the complexity of the system and, as the word length $k$ approaches ∞, provides the entropy rate
$\mathrm{EntroRate}(\mathcal{C}) = \lim_{k \to \infty} \frac{1}{k} H(\mathcal{C}^k) = \lim_{k \to \infty} \big(H(\mathcal{C}^k) - H(\mathcal{C}^{k-1})\big) \qquad (3)$
of $\mathcal{C}$. The latter can be interpreted as the mean information per symbol. Here, let $H(\mathcal{C}^0) := 0$. Note that both sequences $\big(\frac{1}{k} H(\mathcal{C}^k)\big)_{k=1}^{\infty}$ and $\big(H(\mathcal{C}^k) - H(\mathcal{C}^{k-1})\big)_{k=1}^{\infty}$ are monotonically non-increasing.
In order to have a complexity measure that does not depend on a fixed partition and measures the overall complexity, the Kolmogorov–Sinai entropy (KS entropy) is defined by
$\mathrm{KS} = \sup_{\mathcal{C} \text{ finite partition}} \mathrm{EntroRate}(\mathcal{C}).$
For more information, in particular concerning formula (3), see for example Walters [22].
By its nature, the KS entropy is not easy to determine or to obtain from time-dependent measurements of a system. One important point is to find finite partitions supporting as much information on the dynamics as possible. If there is no feasible so-called generating partition containing all information (for a definition, see Walters [22]), this is not easy. The approach of Bandt and Pompe builds up appropriate partitions based on the ordinal structure of a system. Let us explain the idea in a general context.

#### 2.2. Observables and Ordinal Partitioning

In our modeling, an observed time series $x_0, x_1, \ldots, x_{N-1}$ is considered as the sequence of “outcoming” values $X(\omega), X(T(\omega)), \ldots, X(T^{\circ N-1}(\omega))$ for some $\omega \in \Omega$, where $X$ is a real-valued random variable on $(\Omega, \mathcal{B}, P)$. Here, $X$ is interpreted as an observable establishing the outreading process.
Since it is no additional effort to consider more than one observable, let $X = (X_1, X_2, \ldots, X_n): \Omega \to \mathbb{R}^n$ be a random vector with observables $X_1, X_2, \ldots, X_n$ as components. Originally, Bandt and Pompe [1] discussed the case that the measured values coincide with the states of a one-dimensional system, which in our language means that $\Omega \subset \mathbb{R}$, $n = 1$, and that there is only one observable, which coincides with the identity map.
For each order $d \in \mathbb{N}$, we are interested in the partition
$\mathcal{C}_X(d) = \{C(\pi_1, \pi_2, \ldots, \pi_n) \mid \pi_i \text{ ordinal pattern for } i = 1, 2, \ldots, n\}$
with
$C(\pi_1, \pi_2, \ldots, \pi_n) = \{\omega \in \Omega \mid (X_i(T^{\circ d}(\omega)), \ldots, X_i(T(\omega)), X_i(\omega)) \text{ has ordinal pattern } \pi_i \text{ for } i = 1, 2, \ldots, n\},$
called the ordinal partition of order $d$. Here, ordinal patterns are of order $d$ and the delay is one.

#### 2.3. No Information Loss

For the rest of this section, assume that the measurements by the observables do not lose information on the modeled system, roughly meaning that the random variables $X_1, X_1 \circ T, X_1 \circ T^{\circ 2}, \ldots, X_2, X_2 \circ T, X_2 \circ T^{\circ 2}, \ldots, X_n, X_n \circ T, X_n \circ T^{\circ 2}, \ldots$ separate the states of $\Omega$ and, more precisely, that for each event $C$ in the σ-algebra generated by these random variables, there exists some $B \in \mathcal{B}$ with $P(C \,\Delta\, B) = 0$. Note that already for one observable, the separation property described is mostly satisfied in a certain sense (see Takens [23], Gutman [24]).
Moreover, assume that $T$ is ergodic, meaning that $T^{-1}(B) = B$ implies $P(B) \in \{0, 1\}$ for all $B \in \mathcal{B}$. Ergodicity says that the given system does not separate into proper subsystems and, by Birkhoff’s ergodic theorem (see [22]), allows one to obtain properties of the whole system on the basis of single orbits $\omega, T(\omega), T^{\circ 2}(\omega), \ldots$
Under these assumptions, it holds that
$\mathrm{KS} = \lim_{d \to \infty} \mathrm{EntroRate}(\mathcal{C}_X(d)) = \sup_{d \in \mathbb{N}} \mathrm{EntroRate}(\mathcal{C}_X(d)). \qquad (4)$
This statement, shown in Antoniouk et al. [25] and generalized in [26], can be interpreted as follows: if there is no loss of information caused by the measuring process, the whole information of the process can be obtained by only considering ordinal relations between the measurements for each observable. Formula (4), to be read together with formula (3), is illustrated by Figure 1 and Figure 2, where the directions of arrows indicate the direction of convergence.

#### 2.4. Conditional Entropy of Ordinal Patterns

In order to motivate a complexity measure based on considering successive ordinal patterns, we deduce some useful inequalities from the above statement. First, fix some $d \in \mathbb{N}$ and note that, since $\big(H(\mathcal{C}_X(d)^k) - H(\mathcal{C}_X(d)^{k-1})\big)_{k=1}^{\infty}$ is monotonically non-increasing, it holds for all $k \in \mathbb{N}$ that
$H(\mathcal{C}_X(d)^k) - H(\mathcal{C}_X(d)^{k-1}) \le \frac{1}{k} \sum_{i=1}^{k} \big(H(\mathcal{C}_X(d)^i) - H(\mathcal{C}_X(d)^{i-1})\big) = \frac{1}{k} H(\mathcal{C}_X(d)^k).$
Because $\mathrm{EntroRate}(\mathcal{C}_X(d))$ converges to the KS entropy for $d \to \infty$ in a monotonically non-decreasing way, the following is valid for each sequence $(k_d)_{d=1}^{\infty}$ of natural numbers:
$\mathrm{KS} \le \liminf_{d \to \infty} \big(H(\mathcal{C}_X(d)^{k_d}) - H(\mathcal{C}_X(d)^{k_d - 1})\big) \le \liminf_{d \to \infty} \frac{1}{k_d} H(\mathcal{C}_X(d)^{k_d}).$
Note that the faster the sequence $(k_d)_{d=1}^{\infty}$ increases, the nearer are the terms in the inequality, and good choices of the sequence provide equality of all terms. In particular, it holds that
$\mathrm{KS} \le \liminf_{d \to \infty} \big(H(\mathcal{C}_X(d)^2) - H(\mathcal{C}_X(d))\big),$
being the background for the following definition (compare Unakafov and Keller [19]):
Definition 2.
$H(\mathcal{C}_X(d)^2) - H(\mathcal{C}_X(d))$ is called the Conditional entropy of ordinal patterns of order $d$.
Indeed, the concept given is a conditional entropy; however, since $\mathcal{C}_X(d)^2$ is a finer partition than $\mathcal{C}_X(d)$, it reduces to a difference of entropies.

#### 2.5. Permutation Entropy

The description of KS entropy given by formula (4) includes a double limit, where the inner limit is non-increasing and the outer one is non-decreasing. Bandt et al. [7] (see also [1]) have proposed the concept of Permutation entropy, which needs only one limit and was the starting point for the use of ordinal pattern methods. Here, the concept is given in our general context.
Definition 3.
For $d \in \mathbb{N}$, the quantity $\mathrm{PE}_X(d) = \frac{1}{d} H(\mathcal{C}_X(d))$ is called the Permutation entropy of order $d$ with respect to $X$. Moreover, by the Permutation entropy with respect to $X$, we understand $\mathrm{PE}_X = \limsup_{d \to \infty} \mathrm{PE}_X(d)$.
The definition of Permutation entropy given here is justified by the result of Bandt et al. [7] that if $T$ is a piecewise monotone interval map, then it holds that $\mathrm{PE}_{\mathrm{id}} = \mathrm{KS}$. Here, $\Omega$ is an interval and $X = \mathrm{id}$, where $\mathrm{id}$ denotes the identity on $\Omega$. We do not want to say more about this result, but mention that the general equality of KS entropy and Permutation entropy is an open problem; however, as shown in [25] (under the assumptions above), it holds that
$\mathrm{KS} \le \mathrm{PE}_X.$
Assuming that the increments of entropy of successive ordinal partitions are well-behaved in the sense that
$\lim_{d \to \infty} \big(H(\mathcal{C}_X(d+1)) - H(\mathcal{C}_X(d))\big)$
exists, by the Stolz–Cesàro theorem one obtains the inequality
$\mathrm{KS} \le \liminf_{d \to \infty} \big(H(\mathcal{C}_X(d)^2) - H(\mathcal{C}_X(d))\big) \le \lim_{d \to \infty} \big(H(\mathcal{C}_X(d+1)) - H(\mathcal{C}_X(d))\big) = \lim_{d \to \infty} \big(H(\mathcal{C}_X(d)) - H(\mathcal{C}_X(d-1))\big) = \lim_{d \to \infty} \frac{1}{d} \sum_{i=1}^{d} \big(H(\mathcal{C}_X(i)) - H(\mathcal{C}_X(i-1))\big) = \lim_{d \to \infty} \frac{1}{d} H(\mathcal{C}_X(d)) = \mathrm{PE}_X$
(compare [19]). This inequality sheds some light on the relationship between the KS entropy, the Permutation entropy and the various conditional entropy differences considered (see Figure 2).
Summarizing, all quantities considered are related to the entropies $H(\mathcal{C}_X(d)^k)$. For simplicity, we now switch to only one observable $X$. The generalization is simple but not necessary for the following. In the restricted case, for $d, k \in \mathbb{N}$, it holds that
$H(\mathcal{C}_X(d)^k) = -\sum_{(\pi_l)_{l=1}^{k} \in \Pi_d^k} P\big((\pi_l)_{l=1}^{k}\big) \ln P\big((\pi_l)_{l=1}^{k}\big),$
where
$P\big((\pi_l)_{l=1}^{k}\big) = P\big(\{\omega \in \Omega \mid (X(T^{\circ l+d-1}(\omega)), \ldots, X(T^{\circ l}(\omega)), X(T^{\circ l-1}(\omega))) \text{ has ordinal pattern } \pi_l \text{ for } l = 1, 2, \ldots, k\}\big).$

#### 2.6. The Practical Viewpoint

Although the general relation of KS entropy and Permutation entropy is not completely understood, the asymptotics of entropies taken from ordinal pattern distributions are relatively well understood by the above considerations. This allows a good interpretation of what they measure. The practical problem, however, is that the asymptotics can be very slow and thus cause problems in the estimation of the probabilities of the elements of $\mathcal{C}_X(d)^k$ from orbits of a system when $d$ or $k$ is high. Here, for $d, k \in \mathbb{N}$, a naive estimator of $P\big((\pi_l)_{l=1}^{k}\big)$ is
$p\big((\pi_l)_{l=1}^{k}\big) = \frac{\#\{t \in \{d+k-1, \ldots, N-1\} \mid (x_{t-i}, \ldots, x_{t-i-d}) \text{ has ordinal pattern } \pi_{k-i} \text{ for } i = 0, \ldots, k-1\}}{N - d - k + 1}$
and a naive estimator of $H(\mathcal{C}_X(d)^k)$ is
$h(d, k) = -\sum_{(\pi_l)_{l=1}^{k} \in \Pi_d^k} p\big((\pi_l)_{l=1}^{k}\big) \ln p\big((\pi_l)_{l=1}^{k}\big).$
The problem is that one needs too many measurements for a reliable estimation if $d$ and $k$ are large. This is demonstrated for the logistic map with “maximal chaos” in the one-dimensional case. Here, $\Omega = [0, 1]$, $T$ is defined by $T(\omega) = 4\omega(1 - \omega)$ for $\omega \in [0, 1]$, $P$ is the Lebesgue measure, and $X$ is the identity map. Note that this map and the maps of the whole logistic family are often used for testing complexity measures. (For more information, see [27,28].)
Figure 3 shows $h(d, k) - h(d, k-1)$ and $\frac{h(d, 1)}{d}$ for different $d, k \in \mathbb{N}$ in dependence on orbit lengths of $T$ between $10^2$ and $10^5$ and $10^6$, respectively. The horizontal black line indicates the KS entropy of $T$, which is $\ln 2$. The graphic on the top provides curves of $h(d, k) - h(d, k-1)$ for $k = 2$ and $d = 2, 3, \ldots, 7$, which are estimates of the Conditional entropy of ordinal patterns of the corresponding orders $d$. The approximation of the KS entropy is rather good for $d = 7$, but a long orbit is needed. In general, the higher $d$ is, the longer a stabilization of the estimate takes. A similar situation is demonstrated in the graphic in the middle, now with fixed $d = 7$ but increasing $k$. In the graphic on the bottom, three of the curves from the upper graphic are drawn again as a contrast to curves showing $\frac{h(d, 1)}{d}$ for $d = 5, 6, 7$. The latter can be considered as estimates of the corresponding Permutation entropies; the estimates, however, are poor since $d$ is not large enough. Note that, according to the result of Bandt et al., estimates for very high $d$ must be near the KS entropy. The results illustrate that using the Conditional entropy of ordinal patterns is a good compromise. Good performance of the Conditional entropy of ordinal patterns for the logistic family is reported in Unakafov and Keller [19].
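A reduced version of this experiment is easy to reproduce. The Python sketch below (our own illustration, using one fixed starting point and a moderate orbit length rather than the full parameter ranges of Figure 3) iterates $T(\omega) = 4\omega(1-\omega)$, estimates $h(d, 2) - h(d, 1)$ and $h(d, 1)/d$, and compares the former with $\ln 2$:

```python
import math
from collections import Counter

def ordinal_pattern(v):
    # indices in descending order of value; ties broken toward the larger index
    return tuple(sorted(range(len(v)), key=lambda i: (v[i], i), reverse=True))

def h(pats, k):
    """Naive estimator h(d, k): entropy of words of k successive ordinal patterns."""
    counts = Counter(tuple(pats[t - k + 1:t + 1]) for t in range(k - 1, len(pats)))
    m = len(pats) - k + 1
    return -sum(c / m * math.log(c / m) for c in counts.values())

# orbit of the logistic map T(w) = 4w(1 - w) on [0, 1], starting at w = 0.1
w, orbit = 0.1, []
for _ in range(100_000):
    w = 4.0 * w * (1.0 - w)
    orbit.append(w)

d = 4  # order; the delay is 1
pats = [ordinal_pattern([orbit[t - i] for i in range(d + 1)])
        for t in range(d, len(orbit))]
ce_est = h(pats, 2) - h(pats, 1)   # estimate of the Conditional entropy
pe_est = h(pats, 1) / d            # estimate of the normalized Permutation entropy
print(f"conditional entropy estimate: {ce_est:.3f}, KS entropy: {math.log(2):.3f}")
```

For sufficiently long orbits, the conditional-entropy estimate should come out near $\ln 2 \approx 0.693$; the exact values depend on the starting point, the orbit length and $d$.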

## 3. Generalizations Based on the Families of Renyi and Tsallis Entropies

#### 3.1. The Concept

It is natural to generalize Permutation entropy to Tsallis and Renyi entropy variants, which was first done by Zunino et al. [12]. Liang et al. [13] discuss the performance of a large list of complexity measures, among them the classical as well as the Tsallis and Renyi Permutation entropies, in tracking changes of the EEG during different anesthesia states. They report that the class of Permutation entropies shows good performance in some features relative to the other measures, with the best results for the Renyi variant. Let us have a closer look at the new concepts.
Definition 4.
For some given positive $\alpha \ne 1$, the empirical Renyi Permutation entropy (eRPE) and the empirical Tsallis Permutation entropy (eTPE) of order $d \in \mathbb{N}$ and of delay $\tau \in \mathbb{N}$ of a time series $(x_t)_{t=0}^{N-1}$ with $N \in \mathbb{N}$ are defined by
$\mathrm{eRPE}\big(\alpha, d, \tau, (x_t)_{t=0}^{N-1}\big) = \frac{1}{d(1-\alpha)} \ln \sum_{\pi \in \Pi_d} (p_\pi^\tau)^\alpha$
and
$\mathrm{eTPE}\big(\alpha, d, \tau, (x_t)_{t=0}^{N-1}\big) = \frac{1}{d(\alpha-1)} \Big(1 - \sum_{\pi \in \Pi_d} (p_\pi^\tau)^\alpha\Big),$
respectively, with $p_\pi^\tau$ as given in (2). (We include the factor $\frac{1}{d}$ in the entropy formulas only for reasons of comparability with the classical Permutation entropy.)
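For concreteness, here is a small Python sketch of the two quantities (our own illustrative implementation, not from the paper; it assumes the standard Renyi and Tsallis entropy forms with the extra factor $1/d$):

```python
import math
from collections import Counter

def ordinal_pattern(v):
    # indices in descending order of value; ties broken toward the larger index
    return tuple(sorted(range(len(v)), key=lambda i: (v[i], i), reverse=True))

def pattern_probs(x, d, tau):
    """Relative frequencies p_pi^tau of the ordinal patterns, as in (2)."""
    counts = Counter(ordinal_pattern([x[t - i * tau] for i in range(d + 1)])
                     for t in range(d * tau, len(x)))
    m = len(x) - d * tau
    return [c / m for c in counts.values()]

def erpe(x, d, tau, alpha):
    """Empirical Renyi Permutation entropy for alpha > 0, alpha != 1."""
    s = sum(p ** alpha for p in pattern_probs(x, d, tau))
    return math.log(s) / (d * (1.0 - alpha))

def etpe(x, d, tau, alpha):
    """Empirical Tsallis Permutation entropy for alpha > 0, alpha != 1."""
    s = sum(p ** alpha for p in pattern_probs(x, d, tau))
    return (1.0 - s) / (d * (alpha - 1.0))
```

Both quantities converge to the ePE for $\alpha \to 1$, which can be checked numerically by evaluating them for $\alpha$ slightly below and above 1.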

#### 3.2. Some Properties

As, in general, the Renyi and the Tsallis entropy of a distribution converge to the Shannon entropy for $\alpha \to 1$, the eRPE and the eTPE converge to the ePE as well. The two concepts can principally be used in data analysis to further emphasize the role of small ordinal pattern probabilities if $\alpha < 1$, or of large ones if $\alpha > 1$ (compare the graphs of the functions $x \mapsto x \ln x$ and $x \mapsto x^\alpha$ for different $\alpha$ and $x \in [0, 1]$). The consequences of this weighting become obvious for the eRPE when considering the limits for $\alpha \to 0$ and $\alpha \to \infty$. One easily sees that
$\lim_{\alpha \to \infty} \mathrm{eRPE}\big(\alpha, d, \tau, (x_t)_{t=0}^{N-1}\big) = -\frac{\ln\big(\max_{\pi \in \Pi_d} p_\pi^\tau\big)}{d}, \qquad \lim_{\alpha \to 0} \mathrm{eRPE}\big(\alpha, d, \tau, (x_t)_{t=0}^{N-1}\big) = \frac{\ln\big(\#\{\pi \in \Pi_d \mid p_\pi^\tau \ne 0\}\big)}{d},$
meaning that, for large $\alpha$, the eRPE mainly measures the largest relative ordinal pattern frequency (on a logarithmic scale), with low entropy for a high relative frequency; for small $\alpha$, the eRPE mainly measures the number of occurring ordinal patterns (on a logarithmic scale). Since
$\mathrm{eTPE}\big(\alpha, d, \tau, (x_t)_{t=0}^{N-1}\big) = \frac{e^{d(1-\alpha)\,\mathrm{eRPE}(\alpha, d, \tau, (x_t)_{t=0}^{N-1})} - 1}{d(1-\alpha)},$
the eTPE is only a monotone functional of the eRPE for fixed $\alpha$; despite a different scale, it has properties similar to those of the eRPE.
For $\alpha = 2$, the eRPE has a nice interpretation. Having $N_\pi = (N - d\tau)\, p_\pi^\tau$ occurrences of ordinal pattern $\pi$ in the time series, there are $\frac{N_\pi(N_\pi - 1)}{2}$ pairs of different times providing ordinal pattern $\pi$. Since, all in all, we have $\frac{(N - d\tau)(N - d\tau - 1)}{2}$ different time pairs, the quantity
$\sum_{\pi \in \Pi_d} \frac{N_\pi(N_\pi - 1)}{(N - d\tau)(N - d\tau - 1)} = \frac{\sum_{\pi \in \Pi_d} N_\pi^2 - (N - d\tau)}{(N - d\tau)^2 - (N - d\tau)} \approx \sum_{\pi \in \Pi_d} (p_\pi^\tau)^2$
can be interpreted as the degree of recurrence in the time series. This quantity is, in fact, the symbolic correlation integral recently introduced by Caballero et al. [29].
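The approximation above is easy to check numerically; the following Python sketch (our own, with hypothetical helper names) computes both the exact pair-counting quantity and the approximating sum of squared pattern frequencies:

```python
from collections import Counter

def ordinal_pattern(v):
    # indices in descending order of value; ties broken toward the larger index
    return tuple(sorted(range(len(v)), key=lambda i: (v[i], i), reverse=True))

def recurrence_degree(x, d, tau):
    """Exact fraction of time pairs sharing an ordinal pattern, together with the
    symbolic-correlation-integral approximation: the sum over pi of (p_pi^tau)^2."""
    counts = Counter(ordinal_pattern([x[t - i * tau] for i in range(d + 1)])
                     for t in range(d * tau, len(x)))
    m = len(x) - d * tau
    exact = sum(c * (c - 1) for c in counts.values()) / (m * (m - 1))
    approx = sum((c / m) ** 2 for c in counts.values())
    return exact, approx
```

For a strictly monotone series both quantities equal 1 (every time pair shares the single occurring pattern); for irregular series the two values differ only by a term of order $1/(N - d\tau)$.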

#### 3.3. Demonstration

We demonstrate the performance of the eRPE and eTPE for different $\alpha$ on the basis of EEG data discussed in [6]. For this, we consider two parts, each of a 19-channel scalp EEG of a boy with lesions predominantly in the left temporal lobe caused by a connatal toxoplasmosis. The data were sampled at a rate of 256 Hz, meaning that 256 measurements were obtained each second. The first data part (data set 1) was taken from an EEG derived at an age of eight years, and the second one (data set 2) was derived at an age of 11 years, four months after the implantation of a vagus stimulator. Note that epileptic activity was significant before vagus stimulation and was the reason for the implantation. (For some more details on the data, see [6].)
Figure 4 shows the ePE, the eRPE for $\alpha = 0.5, 2$ and the eTPE for $\alpha = 2$ for the two data sets in dependence on a shifted time window, where $d = 3$ and $\tau = 4$. Each graphic represents the 19 channels for a fixed entropy by 19 entropy curves. Among the channels are T3 and P3, which are interesting in the following. The curve related to a fixed channel contains pairs $(t, h_t)$, where $h_t$ is the entropy of the segment of the related time series ending at time $t$ and containing $2 \times 256 + 3 \times 4$ successive measurements. (Each segment provides $512 = 2 \times 256$ ordinal patterns representing a time segment of two seconds.) The times $t$ are chosen from a time segment of 100 seconds, where the beginning time is set to 0 for simplicity. We have added a thick black curve representing the entropy of the whole brain instead of the single channels. Here, the relative frequencies used for the entropy determination were obtained by counting ordinal patterns in the considered time segment pooled over all channels.
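The windowing scheme just described can be sketched as follows (our own Python illustration; as in the text, a window holds `window_patterns + d*tau` samples and yields `window_patterns` ordinal patterns):

```python
import math
from collections import Counter

def ordinal_pattern(v):
    # indices in descending order of value; ties broken toward the larger index
    return tuple(sorted(range(len(v)), key=lambda i: (v[i], i), reverse=True))

def epe(x, d, tau):
    """Empirical Permutation entropy of order d and delay tau."""
    counts = Counter(ordinal_pattern([x[t - i * tau] for i in range(d + 1)])
                     for t in range(d * tau, len(x)))
    m = len(x) - d * tau
    return -sum(c / m * math.log(c / m) for c in counts.values()) / d

def sliding_epe(x, d, tau, window_patterns, step):
    """Pairs (t, h_t): h_t is the ePE of the segment ending at sample t that
    contains window_patterns + d*tau successive measurements."""
    w = window_patterns + d * tau
    return [(t, epe(x[t - w:t], d, tau)) for t in range(w, len(x) + 1, step)]
```

With `d = 3`, `tau = 4`, `window_patterns = 512` and a step of 256 samples, this mimics the two-second windows used above for the 256 Hz recordings.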
The two EEG parts reflect changes of the brain dynamics. Whereas in the first EEG part the entropies of P3 and, partially, of T3 are low relative to the other channels, the entropies of both channels are relatively higher in the second EEG part. More information, with additional data sets derived directly before and after the vagus stimulator implantation, is available from [6], where only the ePE was considered. Here, we want to note that P3 and T3 are from the left part of the brain with the lesions, and P3 seems to mainly reflect some kind of irregular behavior related to them. The most interesting point is that, before the vagus stimulator implantation, P3 and partially T3 are conspicuous both in phases with and without epileptic activity. For orientation, the graphics given in Figure 4 for data set 1 and in Figure 5 show a part from around 30 to 90 seconds with low entropies for pooled ordinal patterns (thick black curve) related to a generalized epileptic seizure, meaning epileptic activity in the whole brain.
Figure 4 suggests that a visual inspection of the data using the eRPE and eTPE instead of the ePE does not gain further insights when $\alpha$ is chosen close to 1. Here, it is important to note that, for the parameters considered, all ordinal patterns are attained at many times (with some significant frequency). Our guess is supported by Table 1. For each of the channels Fp2, T3 and P3 and given $\alpha$, the relative frequency of concordant pairs of the observations (ePE, eRPE) at time $s$ and (ePE, eRPE) at time $t$ among all pairs $(s, t)$ with $s < t$ is shown. Here, $(s, t)$ is said to be concordant if the difference between the ePE at times $s$ and $t$ and the difference between the eRPE at times $s$ and $t$ have the same sign.
The results particularly reflect the fact that for channel Fp2, providing measurements from the front of the brain, the ordinal patterns are more equally distributed than for T3, and that for P3 the distribution of ordinal patterns is farthest from equidistribution. For contrast, we also consider $\alpha = 250$. Figure 5, related to data set 1, indicates that extreme choices of $\alpha$ could be useful in order to analyze and visualize changes in the brain dynamics more forcefully. The upper graphic of the eRPE for $\alpha = 0.01$ shows that, at the beginning of an epileptic seizure, the number of ordinal patterns abruptly decreases for nearly all channels and, after some increase, stays at a relatively low level until the end of the seizure. For $\alpha = 35$, it is interesting to look at the whole ensemble of the entropies.
Here, the eRPE indicates much more variability of the largest ordinal pattern frequencies of the channels in the seizure-free parts than in the seizure epochs. The very low entropy at the beginning of the seizure traces back to mainly only increasing or decreasing ordinal patterns. Here, the special kind of scaling given by the eTPE allows one to emphasize the special situation at the beginning of a seizure and can therefore be interesting for an automatic detection of epileptic seizures, whereby the correct tuning of $\alpha$ is important.

## 4. Classification on the Base of Different Entropies

Classification is an important issue in the analysis of EEG data, for which in many cases entropy measures can be exploited. Here, it is often unclear which of these measures are best and how much they ‘overlap’. In view of the discussion in Section 2, note that, by the asymptotic statements considered, ordinal pattern based measures can behave very similarly for high orders, but for low orders their performance can be rather different. By also including different delays in the analysis, one obtains much flexibility on the one hand, but the problem of a high number of parameter combinations arises on the other hand.
Here, we want to discuss the classification of EEG data using the ePE, empirical Conditional entropies of ordinal patterns (eCE, see below) and, additionally, the Approximate entropy (ApEn) (see Pincus [30]) and the Sample entropy (SampEn) (see Richman and Moorman [31]), extending work in Keller et al. [10]. For the definition and usage of ApEn and SampEn, in particular for the parameter choice (“$\mathrm{ApEn}(2, 0.2\sigma, x)$”, “$\mathrm{SampEn}(2, 0.2\sigma, x)$”), we directly refer to [10]. The eCE, motivated and defined for dynamical systems in Section 2, is given by the following definition (compare also [10]):
Definition 5.
Given a time series $(x_t)_{t=0}^{N-1}$ with $N \in \mathbb{N}$, the quantity
$\mathrm{eCE}\big(d, \tau, (x_t)_{t=0}^{N-1}\big) = \sum_{\pi \in \Pi_d} p_\pi^\tau \ln p_\pi^\tau - \sum_{\pi_1, \pi_2 \in \Pi_d} p_{\pi_1, \pi_2}^\tau \ln p_{\pi_1, \pi_2}^\tau$
is called the empirical Conditional entropy of ordinal patterns (eCE) of order $d \in \mathbb{N}$ and of delay $\tau \in \mathbb{N}$, where $p_\pi^\tau$ is defined by (2) and $p_{\pi_1, \pi_2}^\tau$ by
$p_{\pi_1, \pi_2}^\tau = \frac{\#\{t \in \{d\tau, d\tau+1, \ldots, N-2\} \mid (x_t, x_{t-\tau}, \ldots, x_{t-d\tau}) \text{ and } (x_{t+1}, x_{t+1-\tau}, \ldots, x_{t+1-d\tau}) \text{ have ordinal patterns } \pi_1 \text{ and } \pi_2\}}{N - d\tau - 1}.$
Note that for each time series $(x_t)_{t=0}^{N-1}$ and each $d \in \mathbb{N}$, it holds that $\mathrm{eCE}\big(d, 1, (x_t)_{t=0}^{N-1}\big) = h(d, 2) - h(d, 1)$. Thus, $\mathrm{eCE}\big(d, 1, (x_t)_{t=0}^{N-1}\big)$ is an estimate of the Conditional entropy of ordinal patterns defined in Section 2.
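Definition 5 translates directly into code. The sketch below (our own implementation, not the algorithms of [21]) counts single patterns and successive pattern pairs with the normalizations given above:

```python
import math
from collections import Counter

def ordinal_pattern(v):
    # indices in descending order of value; ties broken toward the larger index
    return tuple(sorted(range(len(v)), key=lambda i: (v[i], i), reverse=True))

def ece(x, d, tau):
    """Empirical Conditional entropy of ordinal patterns (Definition 5)."""
    n = len(x)
    pat = {t: ordinal_pattern([x[t - i * tau] for i in range(d + 1)])
           for t in range(d * tau, n)}
    singles = Counter(pat[t] for t in range(d * tau, n))                  # p_pi
    pairs = Counter((pat[t], pat[t + 1]) for t in range(d * tau, n - 1))  # p_pi1,pi2
    m1, m2 = n - d * tau, n - d * tau - 1
    h1 = -sum(c / m1 * math.log(c / m1) for c in singles.values())
    h2 = -sum(c / m2 * math.log(c / m2) for c in pairs.values())
    return h2 - h1   # = sum p ln p - sum p12 ln p12
```

For a strictly monotone time series the eCE is zero, since a single pattern and a single pattern pair occur; irregular series yield positive values.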

#### 4.1. The Data

The considered EEG data are recordings from the Bonn EEG Database [20], each of a length of $23.6$ s recorded at a sampling rate of $173.61$ Hz. The data consist of five groups, each of 100 recordings:
• group A: surface EEGs recorded from healthy subjects with open eyes,
• group B: surface EEGs recorded from healthy subjects with closed eyes,
• group C: intracranial EEGs recorded from subjects with epilepsy during a seizure-free period from within the epileptogenic zone,
• group D: intracranial EEGs recorded from subjects with epilepsy during a seizure-free period from the hippocampal formation of the opposite hemisphere of the brain,
• group E: intracranial EEGs recorded from subjects with epilepsy during a seizure period.
In contrast to [10], where the groups A and B and the groups C and D were pooled, each of the groups is considered separately in the following.

#### 4.2. Visualization and Classification for Delay One

In order to give a visual impression of how the considered entropies separate the data, we provide three figures. For the ordinal pattern based measures, we have chosen order $d = 5$ and delay $\tau = 1$. Figure 6 shows the values of the four considered entropies for all data sets. The values obtained are drawn from left to right, starting with those from group A and ending with those from group E. For a better presentation of the distribution of all entropies, we have added boxplots (see Figure 7). Figure 6 shows that none of the considered entropies separates the groups well; however, one can see different kinds of separation properties for the ordinal pattern based entropies and the two other entropies. A better separation is seen in Figure 8, where one entropy measure is plotted versus another one, in four different combinations. Here, the discrimination between E, the union of A and B, and the union of C and D is rather good, confirming the results in [10], but both A, B and C, D are strongly overlapping. The general separation seems to be slightly better using three entropies, which is illustrated by Figure 9. Here, however, a two-dimensional representation is chosen by plotting the second principal component versus the first one, both obtained by principal component analysis from the three entropy variables.
In order to obtain more objective and reliable results, we have tested a classification of the data on the basis of the chosen entropies and of Random Forests (see Breiman [32]), a popular and efficient machine learning method. For this purpose, the data were randomly divided into a learning group consisting of 80% of the data sets and a testing group consisting of the remaining 20%. The obtained accuracy of the classification, i.e., the relative frequency of correctly classified data sets, was averaged over 1000 testing trials for each entropy combination considered. The results of this procedure, summarized in Table 2, Table 3 and Table 4, show that including an additional entropy results in a higher accuracy, but that the way in which entropies are combined is crucial for the results. Note that combining all four entropies provides an accuracy of only 70.6%.
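The evaluation protocol (random 80/20 split, accuracy averaged over repeated trials) can be sketched in a few lines. The Python stand-in below uses a simple nearest-centroid classifier on synthetic two-dimensional "entropy" features instead of Random Forests on the real Bonn features, so its numbers are not comparable to Table 2, Table 3 and Table 4; it only illustrates the procedure:

```python
import random

def fit_centroids(X, y):
    """Mean feature vector per class (a stand-in for the Random Forest fit)."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict(centroids, features):
    return min(centroids, key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(centroids[c], features)))

def mean_accuracy(X, y, trials=200, test_frac=0.2, seed=0):
    """Average test accuracy over repeated random learning/testing splits."""
    rng = random.Random(seed)
    n_test = max(1, int(len(X) * test_frac))
    total = 0.0
    for _ in range(trials):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        cents = fit_centroids([X[i] for i in train], [y[i] for i in train])
        total += sum(predict(cents, X[i]) == y[i] for i in test) / n_test
    return total / trials

# synthetic stand-in: two "entropy" features per recording, three groups
rng = random.Random(1)
X, y = [], []
for label, (m1, m2) in enumerate([(0.4, 0.5), (0.6, 0.7), (0.8, 0.6)]):
    for _ in range(60):
        X.append([m1 + rng.gauss(0, 0.05), m2 + rng.gauss(0, 0.05)])
        y.append(label)
acc = mean_accuracy(X, y)
print(f"mean accuracy over 200 trials: {acc:.3f}")
```

With real data, the classifier and the feature vectors (the entropy values per recording) would be swapped in; the split-and-average loop stays the same.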

#### 4.3. Other Delays

Clearly, the accuracy of the classification using the ePE and eCE depends on the choice of the parameters of the entropies. Whereas the choice of higher orders does not make sense for statistical reasons, as already mentioned in Section 2, testing different delays $\tau$ is useful since the delay contains important scale information. Note that, for both the ePE and the eCE, the classification accuracy varies by more than 14% over the delays considered. Considering only the delays $\tau = 1, 2, \ldots, 9$ for $d = 5$, the maximal accuracies for the ePE and eCE are 45.01% and 48.0%, attained for $\tau = 6$ and $\tau = 5$, respectively. Note that both results are better than those for the SampEn (see Table 2) and that $\tau = 1$ provides the worst results. Combining two ePEs and eCEs for delays in $\{1, 2, \ldots, 9\}$, one reaches an accuracy of 61.79% (for delays 1 and 2) and 62.28% (for delays 1 and 9), respectively.

## 5. Résumé

Throughout this paper, we have discussed ordinal pattern based complexity measures both from the viewpoint of their theoretical foundation and their application in EEG analysis, centered upon Permutation entropy and its conditional variants. We have pointed out that, as in many situations in (model-based) data analysis, one must give attention to the discrepancy of theoretical asymptotics and statistical requirements, here in view of estimating KS entropy. In the case of moderately but not extremely long data sets, the concept of Conditional entropy of ordinal patterns, as discussed, could be a compromise. It has been shown to have better performance than the classical Permutation entropy in many situations.
A good way of further investigating the performance of ordinal pattern based measures is extensive testing of these measures for data classification. In this direction, the results of this paper for a restricted parameter choice are already promising; however, systematic studies are required and planned. For this purpose, based on the considerations in Section 3 and Section 4, the authors also propose including Renyi and Tsallis variants of Permutation entropy (with an extreme parameter choice), ordinal pattern based disequilibrium measures as considered by Zunino et al. [17] and Zunino and Ribeiro [18], and classical concepts such as Approximate entropy and Sample entropy. The latter are interesting in combination with ordinal complexity measures since they possibly address other features. The most important challenge, however, is dealing with the great number of entropy measures and parameters, which, in the opinion of the authors, can be faced by using machine learning ideas in a sophisticated way.

## Acknowledgments

We would like to thank Heinz Lauffer from the Clinic and Outpatient Clinic for Pediatrics of the University Greifswald for providing the EEG data discussed in Section 3. Moreover, the first author thanks Manuel Ruiz Marín from the Technical University of Cartagena and his colleagues for some pleasant days in southern Spain discussing their concept of the Symbolic correlation integral.

## Author Contributions

Karsten Keller wrote the paper. Teresa Mangold contributed to Section 3 with the data analysis and most of the results. Inga Stolz contributed to Section 2.6, mainly with Figure 3 and the computations behind it. Jenna Werner provided all of the data analysis and material presented in Section 4. All authors have read and approved the final manuscript.

## Conflicts of Interest

The authors declare no conflict of interest.

## References

1. Bandt, C.; Pompe, B. Permutation entropy—A natural complexity measure for time series. Phys. Rev. Lett. 2002, 88, 174102. [Google Scholar] [CrossRef] [PubMed]
2. Amigó, J.M.; Keller, K.; Kurths, J. (Eds.) Recent progress in symbolic dynamics and permutation complexity. Ten years of permutation entropy. Eur. Phys. J. Spec. Top. 2013, 222, 247–257.
3. Zanin, M.; Zunino, L.; Rosso, O.A.; Papo, D. Permutation entropy and its main biomedical and econophysics applications: A review. Entropy 2012, 14, 1553–1577. [Google Scholar] [CrossRef]
4. Amigó, J.M.; Keller, K.; Unakafova, V.A. Ordinal symbolic analysis and its application to biomedical recordings. Philos. Trans. R. Soc. A 2015, 373, 20140091. [Google Scholar] [CrossRef] [PubMed]
5. Amigó, J.M. Permutation Complexity in Dynamical Systems; Springer: Berlin-Heidelberg, Germany, 2010. [Google Scholar]
6. Keller, K.; Lauffer, H. Symbolic analysis of high-dimensional time series. Int. J. Bifurc. Chaos 2003, 13, 2657–2668. [Google Scholar] [CrossRef]
7. Bandt, C.; Keller, G.; Pompe, B. Entropy of interval maps via permutations. Nonlinearity 2002, 15, 1595–1602. [Google Scholar] [CrossRef]
8. Liu, X.-F.; Wang, Y. Fine-grained permutation entropy as a measure of natural complexity for time series. Chin. Phys. B 2009, 18, 2690–2695. [Google Scholar]
9. Fadlallah, B.; Chen, B.; Keil, A.; Príncipe, J. Weighted-permutation entropy: A complexity measure for time series incorporating amplitude information. Phys. Rev. E 2013, 87, 022911. [Google Scholar] [CrossRef] [PubMed]
10. Keller, K.; Unakafov, A.M.; Unakafova, V.A. Ordinal Patterns, Entropy, and EEG. Entropy 2014, 16, 6212–6239. [Google Scholar] [CrossRef]
11. Bian, C.; Qin, C.; Ma, Q.D.Y.; Shen, Q. Modified permutation-entropy analysis of heartbeat dynamics. Phys. Rev. E 2012, 85, 021906. [Google Scholar] [CrossRef] [PubMed]
12. Zunino, L.; Perez, D.G.; Kowalski, A.; Martín, M.T.; Garavaglia, M.; Plastino, A.; Rosso, O.A. Brownian motion, fractional Gaussian noise, and Tsallis permutation entropy. Physica A 2008, 387, 6057–6068. [Google Scholar] [CrossRef]
13. Liang, Z.; Wang, Y.; Sun, X.; Li, D.; Voss, L.J.; Sleigh, J.W.; Hagihira, S.; Li, X. EEG entropy measures in anesthesia. Front. Comput. Neurosci. 2015, 9, 00016. [Google Scholar] [CrossRef] [PubMed]
14. Li, D.; Li, X.; Liang, Z.; Voss, L.J.; Sleigh, J.W. Multiscale permutation entropy analysis of EEG recordings during sevoflurane anesthesia. J. Neural Eng. 2010, 7, 046010. [Google Scholar] [CrossRef] [PubMed]
15. Ouyang, G.; Li, J.; Liu, X.; Li, X. Dynamic characteristics of absence EEG recordings with multiscale permutation entropy analysis. Epilepsy Res. 2013, 104, 246–252. [Google Scholar] [CrossRef] [PubMed]
16. Azami, H.; Escudero, J. Improved multiscale permutation entropy for biomedical signal analysis: Interpretation and application to electroencephalogram recordings. Biomed. Signal Process. 2016, 23, 28–41. [Google Scholar] [CrossRef]
17. Zunino, L.; Soriano, M.C.; Rosso, O.A. Distinguishing chaotic and stochastic dynamics from time series by using a multiscale symbolic approach. Phys. Rev. E 2012, 86, 046210. [Google Scholar] [CrossRef] [PubMed]
18. Zunino, L.; Ribeiro, H.V. Discriminating image textures with the multiscale two-dimensional complexity-entropy causality plane. Chaos Solitons Fract. 2016, 91, 679–688. [Google Scholar] [CrossRef]
19. Unakafov, A.M.; Keller, K. Conditional entropy of ordinal patterns. Physica D 2013, 269, 94–102. [Google Scholar] [CrossRef]
20. Andrzejak, R.G.; Lehnertz, K.; Rieke, C.; Mormann, F.; David, P.; Elger, C.E. Indications of nonlinear deterministic and finite dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state. Phys. Rev. E 2001, 64, 061907. [Google Scholar] [CrossRef] [PubMed]
21. Unakafova, V.A.; Keller, K. Efficiently Measuring Complexity on the Basis of Real-World Data. Entropy 2013, 15, 4392–4415. [Google Scholar] [CrossRef]
22. Walters, P. An Introduction to Ergodic Theory; Springer: New York, NY, USA, 1982. [Google Scholar]
23. Takens, F. Detecting strange attractors in turbulence. In Lecture Notes in Mathematics; Dynamical Systems and Turbulence; Rand, D.A., Young, L.S., Eds.; Springer: New York, NY, USA, 1981; Volume 898, pp. 366–381. [Google Scholar]
24. Gutman, Y. Takens’ embedding theorem with a continuous observable. arXiv 2016. [Google Scholar]
25. Antoniouk, A.; Keller, K.; Maksymenko, S. Kolmogorov-Sinai entropy via separation properties of order-generated σ-algebras. Discrete Contin. Dyn. Syst. A 2014, 34, 1793–1809. [Google Scholar]
26. Keller, K.; Maksymenko, S.; Stolz, I. Entropy determination based on the ordinal structure of a dynamical system. Discrete Contin. Dyn. Syst. B 2015, 20, 3507–3524. [Google Scholar] [CrossRef]
27. Sprott, J.C. Chaos and Time-Series Analysis; Oxford University Press: Oxford, UK, 2003. [Google Scholar]
28. Young, L.-S. Mathematical theory of Lyapunov exponents. J. Phys. A Math. Theor. 2013, 46, 1–17. [Google Scholar] [CrossRef]
29. Caballero, M.V.; Mariano, M.; Ruiz, M. Draft: Symbolic Correlation Integral. Getting Rid of the Proximity Parameter. Available online: http://data.leo-univ-orleans.fr/media/seminars/175/WP_208.pdf (accessed on 14 February 2017).
30. Pincus, S.M. Approximate entropy as a measure of system complexity. Proc. Natl. Acad. Sci. USA 1991, 88, 2297–2301. [Google Scholar] [CrossRef] [PubMed]
31. Richman, J.S.; Moorman, J.R. Physiological time-series analysis using approximate entropy and sample entropy. Am. J. Physiol. Heart Circ. Physiol. 2000, 278, 2039–2049. [Google Scholar]
32. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
Figure 1. Approximation of Kolmogorov–Sinai (KS) entropy: the direct way.
Figure 2. Approximation of KS entropy: the conditional way.
Figure 3. Different estimates of the KS entropy of $T ( ω ) = 4 ω ( 1 − ω ) ; ω ∈ [ 0 , 1 ]$ for different orbit lengths. On the top: $h ( d , 2 ) − h ( d , 1 )$ for $d = 2 , 3 , … , 7$; in the middle: $h ( 7 , k ) − h ( 7 , k − 1 )$ for $k = 2 , 3 , … , 10$; on the bottom: $h ( d , 2 ) − h ( d , 1 )$ versus $h ( d , 1 ) d$ for $d = 5 , 6 , 7$.
Figure 4. Comparison of empirical Renyi Permutation entropy (eRPE) for $α = 0.5$, 2, empirical Tsallis Permutation entropy (eTPE) for $α = 2$ and ePE, computed from EEG recordings before stimulator implantation (data set 1) and after stimulator implantation (data set 2) for 19 channels using a shifted time window, order $d = 3$ and delay $τ = 4$. The channels T3 (green line) and P3 (red line) are highlighted in particular, and the entropy over all channels is shown by a thick black line. The sampling rate is 256 Hz.
Figure 5. eRPE for $α = 0.01$, 35 and eTPE for $α = 35$ computed from data set 1 for 19 channels using a shifted time window, $d = 3$ and $τ = 4$ (cf. Figure 4). T3: green line, P3: red line.
Figure 6. Entropy of the EEG data sorted by groups for four different entropy measures.
Figure 7. Boxplots for four different entropy measures sorted by groups.
Figure 8. One entropy versus another entropy for four entropy combinations.
Figure 9. Second principal component versus the first one obtained from principal component analysis on three entropy variables.
Table 1. Concordance of the sign of entropy differences of ePE and eRPE for given α.
| α | 0.5 | 0.8 | 0.9 | 1.1 | 1.2 | 1.5 | 2 | 250 |
|-----|--------|--------|--------|--------|--------|--------|--------|--------|
| Fp2 | 98.69% | 99.52% | 99.77% | 99.78% | 99.58% | 99.05% | 98.38% | 94.85% |
| T3  | 95.41% | 98.39% | 99.22% | 99.26% | 98.59% | 96.92% | 95.20% | 89.31% |
| P3  | 93.18% | 97.71% | 98.93% | 99.05% | 98.21% | 96.30% | 94.33% | 83.66% |
Table 2. Results of the classification on the basis of one entropy.
| Entropy | Classification Accuracy (in %) |
|---------|--------------------------------|
| ApEn    | 31.0 |
| SampEn  | 37.8 |
| ePE     | 32.0 |
| eCE     | 30.0 |
Table 3. Results of the classification on the basis of two entropies.
| Entropy        | Classification Accuracy (in %) |
|----------------|--------------------------------|
| ApEn & SampEn  | 51.0 |
| ApEn & ePE     | 58.0 |
| ApEn & eCE     | 61.8 |
| SampEn & ePE   | 64.0 |
| SampEn & eCE   | 64.6 |
| ePE & eCE      | 48.2 |
Table 4. Results of the classification on the basis of three entropies.
| Entropy              | Classification Accuracy (in %) |
|----------------------|--------------------------------|
| ApEn & SampEn & ePE  | 67.4 |
| ApEn & SampEn & eCE  | 66.8 |
| ApEn & ePE & eCE     | 65.4 |
| SampEn & ePE & eCE   | 71.8 |
