Change-Point Detection Using the Conditional Entropy of Ordinal Patterns

This paper is devoted to change-point detection using only the ordinal structure of a time series. A statistic based on the conditional entropy of ordinal patterns characterizing the local up and down in a time series is introduced and investigated. The statistic requires only minimal a priori information on given data and shows good performance in numerical experiments. By the nature of ordinal patterns, the proposed method does not detect pure level changes but changes in the intrinsic pattern structure of a time series and so it could be interesting in combination with other methods.


Introduction
Most real-world time series are non-stationary, that is, some of their properties change over time. A model for some non-stationary time series is provided by a piecewise stationary stochastic process: its properties are locally constant except for certain time-points called change-points, where some properties change abruptly [8]. Detecting change-points is a classical problem and there are many methods for tackling it [8,9,10,12,27]. However, most of the existing methods have a common drawback: they require certain a priori information about the time series. It is necessary either to know a family of stochastic processes providing a model for the time series (see for instance [13], where AR processes are considered) or at least to know which characteristics (mean, standard deviation, etc.) of the time series reflect the change (see [10,34]). In real-world applications such information is often unavailable [11].
Here we suggest a new method for change-point detection that requires minimal a priori knowledge: we only assume that the changes affect the evolution rule linking the past of the process with its future (a formal description of the considered processes is provided by Definition 4). A natural example of such a change is an alteration of the distribution of increments.
Our method is based on ordinal pattern analysis, a promising approach to real-valued time series analysis [1,6,7,22,23,33,47]. In ordinal pattern analysis one considers order relations between values of a time series instead of the values themselves. These order relations are coded by ordinal patterns; specifically, an ordinal pattern of order $d \in \mathbb{N}$ describes order relations between $(d + 1)$ successive points of a time series. The main step of ordinal pattern analysis is the transformation of an original time series into a sequence of ordinal patterns. A result of this transformation is demonstrated in Figure 1 for order $d = 1$.
For detecting a change-point $t^* \in \mathbb{N}$ in a time series $x = (x(t))_{t=0}^{L}$ with values in $\mathbb{R}$, one generally considers $x$ as a realization of a stochastic process $X$ and computes for $x$ a statistic $S(t; x)$ that should reach its maximum at $t = t^*$. Here we suggest a statistic on the basis of the conditional entropy of ordinal patterns introduced in [46].
Let us provide an 'obvious' example only to motivate our approach and to illustrate its idea.

Example 1. Consider a time series $(x(t))_{t=0}^{L}$ whose central part is shown in Figure 1. The time series is periodic before and after $L/2$, but at $L/2$ a change occurs (marked by a vertical line): the "oscillations" become faster. Figure 1 also presents the ordinal patterns $\pi(t)$ of order $d = 1$ at times $t$ underlying the time series. Note that there are only two ordinal patterns of order 1: the increasing (coded by 0) and the decreasing (coded by 1). Both ordinal patterns occur with the same frequency before and after the change-point.
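The coding used in Example 1 can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the helper names `patterns_order1` and `pair_counts` and the two toy series are assumptions made here, and ties between neighbouring values are coded as increases.

```python
from collections import Counter

def patterns_order1(x):
    """Code each step of x as 0 (increase) or 1 (decrease)."""
    # Assumption of this sketch: equal neighbours are coded as 0.
    return [0 if x[t] >= x[t - 1] else 1 for t in range(1, len(x))]

def pair_counts(patterns):
    """Count how often pattern j directly follows pattern i."""
    return Counter(zip(patterns, patterns[1:]))

slow = patterns_order1([0, 1, 2, 1] * 3)  # slow oscillation: 0, 0, 1, 1, ...
fast = patterns_order1([0, 2] * 6)        # fast oscillation: 0, 1, 0, 1, ...
print(pair_counts(slow))  # all four successions of patterns occur
print(pair_counts(fast))  # only the successions 0 -> 1 and 1 -> 0 occur
```

In both toy segments increases and decreases occur in roughly equal shares, so the pattern frequencies alone hardly separate the segments, whereas the counts of successive pairs do. This is the kind of information that a statistic based on conditional entropy can exploit.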
To detect change-points we use a test statistic $\mathrm{CEofOP}(\theta L)$ for $d = 1$, defined for $\theta \in (0, 1)$ with $\theta L \in \mathbb{N}$. According to the properties of conditional entropy (see Section 3 for details), $\mathrm{CEofOP}(\theta L)$ attains its maximum when $\theta L$ coincides with a change-point. Figure 2 demonstrates this for the time series from Figure 1.
For simplicity and in view of real applications, in Example 1 we define ordinal patterns and the CEofOP statistic directly for a concrete time series. However, for theoretical considerations it is necessary to define the CEofOP statistic for stochastic processes. For this we refer to Section 3.
To illustrate the applicability of the CEofOP statistic, let us discuss a real-world data example.
Example 2. Here we consider EEG recording 14 from the sleep EEG dataset kindly provided by Vasil Kolev (details and other results for this dataset are provided in [45, Subsection 5.3.2]). We employ the following procedure for an automatic discrimination between sleep stages from the EEG time series: first we split the time series into pseudo-stationary intervals by finding change-points with the CEofOP statistic (change-points are detected in each EEG channel separately), then we cluster all the obtained intervals. Figure 3 illustrates the outcome of the proposed discrimination for a single EEG channel in comparison with the manual scoring by an expert; the automated identification of a sleep type (waking, REM, light sleep, deep sleep) is correct for 79.6% of 30-second epochs. Note that the borders of the segments (that is, the detected change-points) in most cases correspond to the changes of sleep stage.
The CEofOP statistic was first introduced in [23], where we employed it as a component of a method for sleep EEG discrimination. However, no theoretical details of the method for change-point detection were provided there. This paper aims to fill this gap and provides a justification of the CEofOP statistic.
This paper is organized as follows. In Section 2 we provide a brief introduction to ordinal pattern analysis. In particular, we define the conditional entropy of ordinal patterns and discuss its properties. In Section 3 we investigate the properties of the CEofOP statistic. We also suggest there a method for detecting multiple change-points via the CEofOP statistic. Section 4 is devoted to a comparison of this method with two other (classical and ordinal-patterns-based) methods for change-point detection by performing experiments on realizations of piecewise stationary stochastic processes. In Section 5 we summarize the results and state open problems. Finally, in supplementary Section 6 we investigate the asymptotic properties of the CEofOP statistic.

Preliminaries
The central objects of the following are stochastic processes $X = (X(t))_{t=n}^{m}$ on a probability space $(\Omega, \mathcal{A}, P)$ with values in $\mathbb{R}$. Here $n \in \mathbb{N}_0$ and $m \in \mathbb{N} \cup \{\infty\}$, allowing both finite and infinite lengths of processes. We consider only univariate stochastic processes to keep notation simple, though there are no principal restrictions on the dimension of a process. $X = (X(t))_{t=n}^{m}$ is stationary if for all $t_1, t_2, \ldots, t_k, s$ with $t_1, t_2, \ldots, t_k, t_1 + s, t_2 + s, \ldots, t_k + s \in \{n, n + 1, \ldots, m\}$ the distributions of $(X(t_i))_{i=1}^{k}$ and $(X(t_i + s))_{i=1}^{k}$ coincide. Throughout this paper we discuss detection of change-points in a piecewise stationary stochastic process. Simply speaking, a piecewise stationary stochastic process is obtained by "gluing" several pieces of stationary stochastic processes (for a formal definition of piecewise stationarity see, for instance, [42, Section 3.1]).
In this section we recall the basic facts from ordinal pattern analysis (Subsection 2.1), present the idea of ordinal-patterns-based change-point detection (Subsection 2.2), and define the conditional entropy of ordinal patterns (Subsection 2.3).

Ordinal patterns
Let us recall the definition of an ordinal pattern [22,23,47]. For $d \in \mathbb{N}$ denote the set of permutations of $\{0, 1, \ldots, d\}$ by $S_d$.
Definition 1. We say that a real vector $(x_0, x_1, \ldots, x_d)$ has ordinal pattern
As one can see, there are $(d + 1)!$ different ordinal patterns of order $d$.
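Definition 2. Given a stochastic process $X = (X(t))_{t=0}^{L}$ for $L \in \mathbb{N} \cup \{\infty\}$, the sequence $\Pi_d = (\Pi(t))_{t=d}^{L}$, where $\Pi(t)$ is the ordinal pattern of $(X(t-d), X(t-d+1), \ldots, X(t))$, is called the abstract sequence of ordinal patterns of order $d \in \mathbb{N}$ of the process $X$. Similarly, given a realization $x = (x(t))_{t=0}^{L}$ of $X$, the sequence of ordinal patterns of order $d$ of $x$ is defined as $\pi_{d,L} = (\pi(t))_{t=d}^{L}$, where $\pi(t)$ is the ordinal pattern of $(x(t-d), x(t-d+1), \ldots, x(t))$. For simplicity, we say that $L$ in the case $L \in \mathbb{N}$ is the length of the sequence $\pi_{d,L}$ though, in fact, it consists of $(L - d + 1)$ elements.
Definition 3. A stochastic process $X = (X(t))_{t=0}^{L}$ for $L \in \mathbb{N} \cup \{\infty\}$ is said to be ordinal-$d$-stationary if for all $i \in S_d$ the probability $P(\Pi(t) = i)$ does not depend on $t$ for $d \le t \le L$. In this case we call
$$p_i = P(\Pi(t) = i) \tag{1}$$
the probability of the ordinal pattern $i \in S_d$ in $X$.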
The idea of ordinal pattern analysis is to consider the sequence of ordinal patterns and the distribution of ordinal patterns obtained from it instead of the original time series. Though this implies the loss of all metric information, it often allows one to extract relevant information from a time series, in particular when it comes from a complex system. For example, ordinal pattern analysis provides estimators of the Kolmogorov-Sinai entropy [5,21,46] of dynamical systems, measures of time series complexity [6,23,32], measures of coupling between time series [20,33] and estimators of parameters of stochastic processes [7,40]; see also [1,2] for a review of applications to real-world time series. Methods of ordinal pattern analysis are invariant with respect to strictly monotone distortions of time series [22], do not need information about the range of measurements, and are computationally simple [47]. This qualifies them for application in the case that not much is known about the system behind a time series, possibly as a first exploration step.
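The transformation of a time series into a sequence of ordinal patterns can be sketched as follows. The encoding is an assumption of this sketch, chosen as one common convention: a window is mapped to the permutation of its indices sorted by decreasing value, with the larger index first in case of ties. The function names are hypothetical.

```python
def ordinal_pattern(window):
    """Return a permutation of {0, ..., d} describing the order of the window.

    Assumed convention: indices sorted by decreasing value; on ties the
    larger index comes first.
    """
    d = len(window) - 1
    return tuple(sorted(range(d + 1), key=lambda r: (-window[r], -r)))

def pattern_sequence(x, d):
    """Patterns of order d for the windows (x[t-d], ..., x[t]), t = d, ..., L."""
    return [ordinal_pattern(x[t - d:t + 1]) for t in range(d, len(x))]

x = [0.2, 1.5, 0.7, 0.7, 3.1]        # L = 4
print(pattern_sequence(x, 2))         # L - d + 1 = 3 patterns
```

Whatever encoding is used, only the order relations inside each window enter the result, which is why a series of length $L + 1$ yields $L - d + 1$ patterns taken from at most $(d + 1)!$ distinct values.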
For a discussion of the properties of a sequence of ordinal patterns we refer to [4,7,16,39,40]. For the following we need two results stated below.
The next result is a direct consequence of [46, Lemma 4] and generalizes the main result of [30].
Theorem 2. If $X$ is a Markov chain with values in a two-element set (contained in $\mathbb{R}$), then the corresponding abstract sequence $\Pi_d$ of ordinal patterns also forms a Markov chain for every order $d \in \mathbb{N}$.
In relation to this theorem, note that in general one does not need values in $\mathbb{R}$ but in a totally ordered set. It is shown in [30] and [45, Subsection 3.3.3] that if $X$ is a Markov chain in a finite set of more than two elements, $\Pi_d$ does not generally form a Markov chain.
Probability distributions of ordinal patterns are known only for some special cases of stochastic processes [7,16,39]. In general one estimates probabilities of ordinal patterns by their empirical probabilities. Consider a sequence $\pi_{d,L}$ of ordinal patterns. For any $t \in \{d + 1, d + 2, \ldots, L\}$ the frequency of occurrence of an ordinal pattern $i \in S_d$ among the first $(t - d)$ ordinal patterns of the sequence is given by
$$n_i(t) = \#\{l \in \{d, d + 1, \ldots, t - 1\} \mid \pi(l) = i\}. \tag{2}$$
A natural estimator of the probability of an ordinal pattern $i$ in the ordinal-$d$-stationary case is provided by its relative frequency in the sequence $\pi_{d,L}$:
$$\hat{p}_i = \frac{n_i(L)}{L - d}.$$
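As a small illustration of estimating pattern probabilities by relative frequencies, the sketch below counts patterns among the first $t - d$ elements of a pattern sequence and normalizes the counts. The function names and the storage convention (`patterns[k]` holds $\pi(d + k)$) are assumptions made here.

```python
from collections import Counter

def count_patterns(patterns, t, d):
    """Occurrences of each pattern among the first t - d patterns.

    Assumption: patterns[k] holds pi(d + k), so the first t - d entries
    are pi(d), ..., pi(t - 1).
    """
    return Counter(patterns[:t - d])

def relative_frequencies(patterns, L, d):
    """Relative frequency of each pattern among pi(d), ..., pi(L - 1)."""
    counts = count_patterns(patterns, L, d)
    return {i: n / (L - d) for i, n in counts.items()}

pi = [0, 0, 1, 1, 0, 1]                     # pi(1), ..., pi(6) for d = 1, L = 6
print(relative_frequencies(pi, L=6, d=1))   # {0: 0.6, 1: 0.4}
```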

Detecting change-points in the sequence of ordinal patterns
Sequences of ordinal patterns are invariant to certain changes in the original stochastic process $X$, such as shifts (adding a constant to the process) [1, Subsection 3.4.3] and scaling (multiplying the process by a positive constant) [22]. However, in many cases changes in the original process $X$ also affect the corresponding abstract sequences of ordinal patterns and the distributions of ordinal patterns. On the one hand, this impedes the application of ordinal pattern analysis to non-stationary time series, since most ordinal-patterns-based quantities require ordinal-$d$-stationarity of a time series [1,6,33] and may be unreliable when this condition fails. On the other hand, one can detect change-points in the original process by detecting changes in the sequence of ordinal patterns. First ideas of using ordinal patterns for detecting change-points were formulated in [4,35,38,41,49]. The advantage of the ordinal-patterns-based methods is that they require less information than most of the existing methods for change-point detection: it is assumed neither that the stochastic process is from a specific family nor that the change affects a specific characteristic of the process. Instead, we further consider change-points with the following property.
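The invariance to shifts, positive scaling and, more generally, strictly increasing transformations can be checked numerically. The sketch below uses a hypothetical `pattern_sequence` helper with an assumed encoding (indices sorted by decreasing value, larger index first on ties); any encoding based solely on order relations would behave the same way.

```python
import math

def pattern_sequence(x, d):
    """Ordinal patterns of order d under an assumed rank-based encoding."""
    return [
        tuple(sorted(range(d + 1), key=lambda r: (-x[t - d + r], -r)))
        for t in range(d, len(x))
    ]

x = [0.3, -1.2, 2.5, 0.7, 0.7, -0.4, 1.9]
base = pattern_sequence(x, 2)

shifted = pattern_sequence([v + 10.0 for v in x], 2)     # adding a constant
scaled = pattern_sequence([3.0 * v for v in x], 2)       # positive scaling
warped = pattern_sequence([math.exp(v) for v in x], 2)   # increasing distortion
print(base == shifted == scaled == warped)
```

A pure level change therefore leaves the pattern sequence almost untouched (only windows that straddle the jump can differ), which is why such changes are not the target of ordinal methods.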
Definition 4. Let $(X(t))_{t=0}^{L}$ with $L \in \mathbb{N} \cup \{\infty\}$ be a piecewise stationary stochastic process with a change-point $t^* \in \mathbb{N}$. We say that $t^*$ is an ordinal change-point if there exist some $m, n \in \mathbb{N}$ with $m < t^* < n \le L$ and some $d \in \mathbb{N}$ such that $(X(t))_{t=m}^{t^*}$ and $(X(t))_{t=t^*+1}^{n}$ are ordinal-$d$-stationary but $(X(t))_{t=m}^{n}$ is not.
This approach seems to be natural for many stochastic processes and real-world time series.
Remark. Note that a change-point where a change in mean occurs need not be ordinal, since the mean is irrelevant for the distribution of ordinal patterns [1, Subsection 3.4.3]. However, there are many methods that effectively detect changes in mean; the method proposed here is intended for use in a more complex case, when there is no classical method or it is not clear which of them to apply.
Example 3. A first-order piecewise stationary autoregressive (AR) process with change-points $t^*_1, t^*_2, \ldots, t^*_{N_{st}-1}$ is defined as
$$\mathrm{AR}(t) = \phi_k\,\mathrm{AR}(t - 1) + \epsilon(t) \quad \text{for all } t \in \{t^*_{k-1} + 1, \ldots, t^*_k\},\ k = 1, 2, \ldots, N_{st},$$
where $\phi_1, \phi_2, \ldots, \phi_{N_{st}} \in [0, 1)$ are the parameters of the autoregressive model, $t^*_0 := 0$ and $t^*_{N_{st}} := L$, with $\epsilon$ being the standard white Gaussian noise, and $\mathrm{AR}(0) := \epsilon(0)$.
AR processes are often used for the investigation of methods for change-point detection (see, for instance, [38,41]), since they provide models for a wide range of real-world time series. Figure 4a illustrates a realization of a 'two-piece' AR process with a change-point at $L/2$. By [7, Proposition 5.3] the distributions of ordinal patterns of order $d \ge 2$ reflect change-points for piecewise stationary AR processes. Figure 4c illustrates this for the realization from Figure 4a: the empirical probability distributions of ordinal patterns of order $d = 2$ before and after the change-point $L/2$ differ significantly.
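A realization of such a piecewise stationary AR process can be generated with a short simulation. The sketch below assumes a first-order recursion with one coefficient per stationary segment and standard Gaussian noise; the function name, argument layout and seed handling are choices made here for illustration only.

```python
import random

def simulate_piecewise_ar(phis, change_points, L, seed=0):
    """Simulate x(0), ..., x(L) with x(t) = phi_k * x(t-1) + eps(t).

    phis[k] is used for t in {t*_k + 1, ..., t*_{k+1}}, where the interior
    change-points are given in `change_points` (assumed layout).
    """
    rng = random.Random(seed)
    bounds = list(change_points) + [L]
    x = [rng.gauss(0.0, 1.0)]            # x(0) := eps(0)
    k = 0
    for t in range(1, L + 1):
        if t > bounds[k]:                # entered the next stationary segment
            k += 1
        x.append(phis[k] * x[-1] + rng.gauss(0.0, 1.0))
    return x

# A 'two-piece' realization with a change-point at L / 2.
x = simulate_piecewise_ar([0.1, 0.4], [500], 1000)
print(len(x))   # 1001 values: x(0), ..., x(1000)
```

Since the noise has zero mean in both segments, the expected value of the process does not change at the change-point; only the dependence between successive values does.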
Example 4. A classical example of a non-linear system is provided by the logistic map on the unit interval:
$$x(t) = r\,x(t - 1)\,(1 - x(t - 1)), \tag{3}$$
with $t \in \mathbb{N}$, for $x(0) \in [0, 1]$ and $r \in [1, 4]$. The behaviour of this map varies significantly for different values of $r$; we are especially interested in $r \in [3.57, 4]$ with chaotic behaviour. In the latter situation there exists an invariant ergodic measure absolutely continuous with respect to the Lebesgue measure [29,44]; therefore (3) defines a stationary stochastic process $\mathrm{NL}_0$:
$$\mathrm{NL}_0(t) = r\,\mathrm{NL}_0(t - 1)\,(1 - \mathrm{NL}_0(t - 1)),$$
with $\mathrm{NL}_0(0) \in [0, 1]$ being a uniformly distributed random number. Note that for almost all $r \in [3.57, 4]$ (more generally, for almost all $r \in [0, 4]$), the map is either chaotic or hyperbolic, the latter roughly meaning that an attractive periodic orbit dominates it. This is a deep result in one-dimensional dynamics (see [29] for details). In the hyperbolic case, after some transient behaviour, numerically one only sees some periodic orbit, which has long periods in the interval $r \in [3.57, 4]$. From the practical viewpoint, i.e. when considering short orbits, the dynamics for that interval can be considered as chaotic, since already small changes of $r$ provide chaotic behaviour also in the theoretical sense.
Let us include some observational noise by adding standard white Gaussian noise $\epsilon$ to an orbit:
$$\mathrm{NL}(t) = \mathrm{NL}_0(t) + \sigma\,\epsilon(t),$$
where $\sigma > 0$ is the level of noise.
Orbits of logistic maps, particularly with observational noise, are often used as a tool for studying and illustrating methods of non-linear time series analysis (see [15,28]). This makes a piecewise stationary noisy logistic (NL) process with change-points $t^*_1, t^*_2, \ldots, t^*_{N_{st}-1}$ a natural object for study. It is defined as
$$\mathrm{NL}(t) = \mathrm{NL}_0(t) + \sigma_k\,\epsilon(t), \quad \mathrm{NL}_0(t) = r_k\,\mathrm{NL}_0(t - 1)\,(1 - \mathrm{NL}_0(t - 1)) \quad \text{for all } t \in \{t^*_{k-1} + 1, \ldots, t^*_k\},\ k = 1, 2, \ldots, N_{st},$$
where $r_1, \ldots, r_{N_{st}} \in [3.57, 4]$ are the values of the control parameter, $\sigma_1, \ldots, \sigma_{N_{st}} > 0$ are the levels of noise, and $t^*_0 := 0$, $t^*_{N_{st}} := L$.
Figure 4b shows a realization of a 'two-piece' NL process with a change-point at $L/2$; as one can see in Figure 4d, the empirical distributions of ordinal patterns of order $d = 2$ before and after the change-point do not coincide. In general, the distributions of ordinal patterns of order $d \ge 1$ reflect change-points for the NL processes (which can be easily checked). The NL and AR processes considered have rather different distributions of ordinal patterns, which is the reason for using them for the empirical investigation of change-point detection methods in Section 4.
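A piecewise stationary noisy logistic process can be simulated in the same spirit. The sketch below assumes that the noise-free orbit is iterated with the control parameter of the current segment and that independent Gaussian noise scaled by the segment's noise level is added to each observation; all names are hypothetical.

```python
import random

def simulate_piecewise_nl(rs, sigmas, change_points, L, seed=0):
    """Noisy logistic orbit of length L + 1 with segment-wise (r_k, sigma_k)."""
    rng = random.Random(seed)
    bounds = list(change_points) + [L]
    z = rng.random()                        # noise-free initial value in [0, 1)
    out = [z + sigmas[0] * rng.gauss(0.0, 1.0)]
    k = 0
    for t in range(1, L + 1):
        if t > bounds[k]:                   # switch to the next segment
            k += 1
        z = rs[k] * z * (1.0 - z)           # logistic map step
        out.append(z + sigmas[k] * rng.gauss(0.0, 1.0))   # observational noise
    return out

# A 'two-piece' realization with a change of r at L / 2.
x = simulate_piecewise_nl([3.95, 3.98], [0.2, 0.2], [500], 1000)
print(len(x))
```

Note that the noise is purely observational here: it is added to the output but never fed back into the iteration of the map.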
We now consider the classical problem of detecting a change-point $t^*$ on the basis of a realization $x$ of a stochastic process $X$ having at most one change-point, that is, either $N_{st} = 1$ or $N_{st} = 2$ holds (compare [12]). To solve this problem one estimates a tentative change-point $\hat{t}^*$ as the time-point that maximizes a test statistic $S(t; x)$. Then the value of $S(\hat{t}^*; x)$ is compared to a given threshold in order to decide whether $\hat{t}^*$ is a change-point.
Therefore, the position of a change-point can be estimated by an ordinal-patterns-based statistic $S(t; \pi_{d,L})$ that, roughly speaking, measures dissimilarity between the distributions of ordinal patterns for $(\pi(k))_{k=d}^{t}$ and for $(\pi(k))_{k=t+d}^{L}$.
A method for detecting one change-point can be extended to an arbitrary number of change-points using binary segmentation [48]: one applies a single change-point detection procedure to the realization $x$; if a change-point is detected, then it splits $x$ into two segments, in each of which one looks for a change-point. This procedure is repeated iteratively for the obtained segments until all of them either do not contain change-points or are too short.
The key problem in this paper is the selection of an appropriate ordinal-patterns-based test statistic $S(t; \pi_{d,L})$ for detecting changes. We suggest an appropriate test statistic in Section 3, but first we introduce in Subsection 2.3 the conditional entropy of ordinal patterns, which is the cornerstone of this statistic.
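Binary segmentation itself does not depend on the statistic used. The sketch below implements the recursion for an arbitrary single-change detector passed in as a function; the detector `toy_mean_shift` is only a hypothetical stand-in (difference of segment means with a fixed threshold) used to make the example runnable, not a method from this paper.

```python
def binary_segmentation(x, detect_single, min_len):
    """Apply a single-change detector recursively to the obtained segments."""
    found = []
    stack = [(0, len(x))]
    while stack:
        lo, hi = stack.pop()
        if hi - lo < 2 * min_len:         # segment too short to be split
            continue
        t = detect_single(x[lo:hi])       # index inside the segment, or None
        if t is None:
            continue
        found.append(lo + t)
        stack.append((lo, lo + t))
        stack.append((lo + t, hi))
    return sorted(found)

def toy_mean_shift(segment, threshold=1.0):
    """Hypothetical detector: best split by difference of means, or None."""
    best_t, best = None, threshold
    for t in range(1, len(segment)):
        left = sum(segment[:t]) / t
        right = sum(segment[t:]) / (len(segment) - t)
        if abs(left - right) > best:
            best_t, best = t, abs(left - right)
    return best_t

x = [0] * 20 + [5] * 20 + [0] * 20
print(binary_segmentation(x, toy_mean_shift, min_len=5))   # [20, 40]
```

Using an explicit stack instead of recursion keeps the sketch short and avoids deep call chains for long series; the order in which segments are visited does not affect the result.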

Conditional entropy of ordinal patterns
Let us call a process $X = (X(t))_{t=0}^{L}$ for $L \in \mathbb{N} \cup \{\infty\}$ ordinal-$d^+$-stationary if for all $i, j \in S_d$ the probability of pairs of ordinal patterns $p_{i,j} = P(\Pi(t) = i, \Pi(t + 1) = j)$ does not depend on $t$ for $d \le t \le L - 1$ (compare with Definition 3). Obviously, ordinal-$(d + 1)$-stationarity implies ordinal-$d^+$-stationarity.
For an ordinal-$d^+$-stationary stochastic process, consider the probability of an ordinal pattern $j \in S_d$ to occur after an ordinal pattern $i \in S_d$; similarly to (1), it is given by:
$$p_{j|i} = P(\Pi(t + 1) = j \mid \Pi(t) = i) = \frac{p_{i,j}}{p_i} \quad \text{for } p_i > 0.$$
Definition 5. The conditional entropy of ordinal patterns of order $d \in \mathbb{N}$ of an ordinal-$d^+$-stationary stochastic process $X$ is defined by:
$$\mathrm{CE}(X, d) = -\sum_{i \in S_d} \sum_{j \in S_d} p_{i,j} \ln p_{j|i},$$
where terms with $p_{i,j} = 0$ are taken to be zero.
For brevity, we refer to $\mathrm{CE}(X, d)$ as the "conditional entropy" when no confusion can arise. The conditional entropy characterizes the mean diversity of successors $j \in S_d$ of a given ordinal pattern $i \in S_d$. This quantity often provides a good practical estimate of the Kolmogorov-Sinai entropy for dynamical systems; for a discussion of this and other theoretical properties of conditional entropy we refer to [46].
One can estimate the conditional entropy from a time series by using the empirical conditional entropy of ordinal patterns [23]. Consider a sequence $\pi_{d,L}$ of ordinal patterns of order $d \in \mathbb{N}$ with length $L \in \mathbb{N}$. Similarly to (2), the frequency of occurrence of an ordinal patterns pair $i, j \in S_d$ is given by
$$n_{i,j}(t) = \#\{l \in \{d, d + 1, \ldots, t - 1\} \mid \pi(l) = i,\ \pi(l + 1) = j\} \tag{5}$$
for $t \in \{d + 1, d + 2, \ldots, L\}$. The empirical conditional entropy of ordinal patterns for $\pi_{d,L}$ is defined by
$$\mathrm{eCE}(\pi_{d,L}) = -\frac{1}{L - d} \sum_{i \in S_d} \sum_{j \in S_d} n_{i,j}(L) \ln \frac{n_{i,j}(L)}{n_i(L)}, \tag{6}$$
where terms with $n_{i,j}(L) = 0$ are taken to be zero.
As a direct consequence of Lemma 1, the empirical conditional entropy approaches the conditional entropy under certain assumptions; namely, the following holds.
Corollary 3. For the sequence $\pi_{d,\infty}$ of ordinal patterns of order $d \in \mathbb{N}$ of a realization of an ergodic stochastic process $X = (X(t))_{t \in \mathbb{N}_0}$ with associated stationary increment process $(X(t) - X(t - 1))_{t \in \mathbb{N}}$, it holds almost surely that
$$\lim_{L \to \infty} \mathrm{eCE}(\pi_{d,L}) = \mathrm{CE}(X, d).$$
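The empirical conditional entropy of a pattern sequence can be computed from pair counts alone. The sketch below assumes the usual plug-in form: the sum of $-n_{i,j} \ln(n_{i,j}/n_i)$ over all observed pairs of successive patterns, divided by the number of pairs, with natural logarithms. The function name is hypothetical and the patterns may be any hashable codes.

```python
from collections import Counter
from math import log

def empirical_conditional_entropy(patterns):
    """Plug-in conditional entropy of the successor given the current pattern."""
    n_pairs = len(patterns) - 1
    if n_pairs <= 0:
        return 0.0
    pairs = Counter(zip(patterns, patterns[1:]))   # counts of (i, j)
    firsts = Counter(patterns[:-1])                # counts of i with a successor
    total = sum(n_ij * log(n_ij / firsts[i]) for (i, _), n_ij in pairs.items())
    return -total / n_pairs

periodic = [0, 1] * 10                 # every successor is determined
mixed = [0, 0, 1, 1] * 5 + [0]         # each pattern has two equally likely successors
print(empirical_conditional_entropy(periodic))   # no diversity of successors
print(empirical_conditional_entropy(mixed))      # ln 2
```

Unobserved pairs simply do not appear in the counter, which implements the convention that terms with zero count contribute nothing.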

Statistic for change-point detection based on the conditional entropy of ordinal patterns
We suggest using the following statistic for detecting ordinal change-points of a process on the basis of a sequence $\pi_{d,L}$ of ordinal patterns for $d, L \in \mathbb{N}$ of a realization of the process:
for all $t \in \mathbb{N}$ with $d < t < L - d$. The intuition behind this statistic comes from the concavity of conditional entropy (not only for ordinal patterns but in general, see [19, Subsection 2.1.3]). It holds
Therefore, if the probabilities of ordinal patterns change at some point $t^*$, but do not change before and after $t^*$, then $\mathrm{CEofOP}(t; \pi_{d,L})$ tends to attain its maximum at $t = t^*$. If the probabilities do not change at all, then for $L$ being sufficiently large, (9) tends to hold with equality (see Corollary 6 in Section 6).
Let $n_i(t)$ and $n_{i,j}(t)$ be the frequencies of occurrence of an ordinal pattern $i \in S_d$ and of an ordinal patterns pair $i, j \in S_d$ (given by (2) and (5), respectively). Set $m_i(t) = n_i(L) - n_i(t + d)$ and $m_{i,j}(t) = n_{i,j}(L) - n_{i,j}(t + d)$; then using (6) we rewrite (8) in a straightforward form:
This statistic was first introduced and applied to the segmentation of sleep EEG time series in [23].
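The idea of the statistic can be sketched in code under explicit assumptions. The sketch takes the statistic to be the length-weighted empirical conditional entropy of the whole pattern sequence minus the length-weighted entropies of the part before $t$ and of the part starting $d$ patterns later; the exact weights and index offsets used here are assumptions of this illustration, as are all function names.

```python
from collections import Counter
from math import log

def ece(patterns):
    """Plug-in conditional entropy of successive patterns (natural log)."""
    n_pairs = len(patterns) - 1
    if n_pairs <= 0:
        return 0.0
    pairs = Counter(zip(patterns, patterns[1:]))
    firsts = Counter(patterns[:-1])
    return -sum(n * log(n / firsts[i]) for (i, _), n in pairs.items()) / n_pairs

def weighted(patterns):
    """Entropy weighted by the number of successive pairs (assumed weighting)."""
    return max(len(patterns) - 1, 0) * ece(patterns)

def ceofop(patterns, t, d):
    """Entropy of the whole sequence minus entropies of the two parts."""
    left, right = patterns[:t], patterns[t + d:]   # d patterns around t are skipped
    return weighted(patterns) - weighted(left) - weighted(right)

def estimate_change(patterns, d, t_min):
    """Candidate position maximizing the statistic, keeping t_min patterns per side."""
    candidates = range(t_min, len(patterns) - d - t_min + 1)
    return max(candidates, key=lambda t: ceofop(patterns, t, d))

# Patterns of order 1: slow oscillation, then fast oscillation from index 200.
patterns = [0, 0, 1, 1] * 50 + [0, 1] * 100
print(estimate_change(patterns, d=1, t_min=4))
```

In this toy sequence both patterns are equally frequent before and after index 200, so only the succession structure changes; the maximizer of the statistic nevertheless lands within a few positions of the change.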
In the rest of this section, we present a motivating example (Subsection 3.1), show the relation of the CEofOP statistic to the likelihood ratio statistic for a piecewise stationary Markov chain (Subsection 3.2) and formulate an algorithm for detecting multiple change-points by means of the CEofOP statistic in Subsection 3.3.

Motivating example
We demonstrate a "non-linear" nature of the CEofOP statistic by means of Example 5 concerning transition from a time series to its surrogate.Although being in a sense tailor-made, this example shows that CEofOP discerns changes that cannot be detected by conventional "linear" methods.
Remark. The question whether a time series is linear or non-linear often arises in data analysis. For instance, linearity should be verified before using such powerful methods as Fourier analysis. For this one usually employs a procedure known as surrogate data testing [36,37,43]. It utilises the fact that a linear time series is statistically indistinguishable from any time series sharing some of its properties (for instance, second moments and amplitude spectrum). Therefore one can generate surrogates that have certain properties of the original time series without preserving other properties, irrelevant for a linear system. If such surrogates are significantly different from the original series, then non-linearity is assumed.
Example 5. Consider a time series consisting of a realisation of a noisy logistic process $\mathrm{NL}_{r,\sigma}$ of length $L/2$ without changes, glued with its surrogate of the same length (to generate surrogates we use the iterative AAFT algorithm suggested by [37] and implemented by [17]). This compound time series has a change-point at $t^* = L/2$, which conventional methods may fail to detect since the surrogate has the same autocorrelation function as the original process (for instance, this is the case for the Brodsky-Darkhovsky method considered further in Section 4). However, the distributions of ordinal patterns for the original time series and its surrogate are generally significantly different. Therefore the CEofOP statistic detects the change-point, which is illustrated by Figure 5.
Remark. Although the idea that ordinal structure is a relevant indicator of time series linearity/non-linearity is not new [1,6], to our knowledge it has not been rigorously proved that the distribution of ordinal patterns is altered by surrogates. This is clearly beyond the scope of this paper and will be discussed elsewhere as a separate study; here it is sufficient for us to provide empirical evidence for this.

CEofOP statistic for a sequence of ordinal patterns forming a Markov chain
In this subsection we show that there is a connection between the CEofOP statistic and the classical likelihood ratio statistic. Though taking place only in a particular case, this connection reveals the nature of the CEofOP statistic.
First we set up the necessary notation. Consider a sequence $\pi_{d,L}$ of ordinal patterns for which the transition probabilities of ordinal patterns may change at some $t \in \{d, d + 1, \ldots, L\}$. The basic statistic for testing whether there is a change in the transition probabilities is the likelihood ratio statistic [8, Subsection 2.2.3]:
where $\mathrm{Lkl}(H \mid \pi_{d,L})$ is the likelihood of the hypothesis $H$ given a sequence $\pi_{d,L}$ of ordinal patterns, and the hypotheses are given by
where $p_{j|i}(t)$, $q_{j|i}(t)$ are transition probabilities of ordinal patterns before and after $t$, respectively.
Proof. First we estimate the probabilities and the transition probabilities before ($p$) and after ($q$) the change [3, Section 2]:
Then, as one can see from [3, Section 3.2], we have
Assume that the first ordinal pattern $\pi(d)$ is fixed in order to simplify the computations. Then $p_{\pi(d)}(L) = p_{\pi(d)}(t)$ and it holds:
Since $\sum_{j \in S_d} n_{i,j}(t) = n_i(t)$, one finally obtains:
where $T_{\min}$ is a minimal length of a sequence of ordinal patterns that is sufficient for a reliable estimation of empirical conditional entropy.

Change-point detection via the CEofOP statistic
Remark. From the representation (8) it follows that for a reasonable computation of the CEofOP statistic, a reliable estimation of eCE before and after the assumed change-point is required. To satisfy this requirement the stationary parts of a process are assumed to be sufficiently long. We take $T_{\min} = (d + 1)!\,(d + 1)$, which is equal to the number of all possible pairs of successive ordinal patterns of order $d$, since each of the $(d + 1)!$ patterns admits $d + 1$ successors (see [23] for details). Consequently, the length $L$ of a time series should satisfy
Note that this does not impose serious limitations on the suggested method, since condition (12) is not too restrictive for $d \le 3$. However, it implies using either $d = 2$ or $d = 3$, since $d = 1$ does not provide effective change-point detection (see Examples 6 and 3), while $d > 3$ in most applications demands too large sample sizes.
Then, in order to check whether $\hat{t}^*$ is an actual change-point, we test between the hypotheses:
This test is performed by comparing $\mathrm{CEofOP}(\hat{t}^*; \pi_{d,L})$ to a threshold $h$: if the value of CEofOP is above the threshold, one rejects $H_0$ in favour of $H_A$; otherwise $H_0$ is accepted. The choice of the threshold is ambiguous: the higher $h$, the higher the probability of false acceptance of the hypothesis $H_0$; conversely, the lower $h$, the higher the probability of false rejection of $H_0$ in favour of $H_A$ (false alarm).
As is usually done, we consider the threshold $h$ as a function of the desired probability $\alpha$ of false alarm; for computing the threshold $h(\alpha)$ we use block bootstrapping from the sequence $\pi_{d,L}$ of ordinal patterns (bootstrapping is often used in change-point detection for computing a threshold, see [14,25] for a theoretical discussion and [24,31] for applications of bootstrapping with detailed and clear explanations). Algorithm 1 describes the detection of at most one change-point via the CEofOP statistic.
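To get a feeling for how quickly the minimal segment length grows with the order, the value $T_{\min} = (d + 1)!\,(d + 1)$ can be tabulated. The helper name `t_min` is introduced here only for this illustration.

```python
from math import factorial

def t_min(d):
    """Minimal segment length (d + 1)! * (d + 1) for patterns of order d."""
    return factorial(d + 1) * (d + 1)

for d in range(1, 5):
    print(d, t_min(d))   # 1 4, 2 18, 3 96, 4 600
```

The jump from 96 at $d = 3$ to 600 at $d = 4$ makes it plain why higher orders quickly demand long stationary segments.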
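A threshold of this kind can be sketched with a generic moving-block bootstrap. The code below is an assumption-laden illustration rather than Algorithm 1: blocks of consecutive patterns are resampled with replacement, which preserves short-range dependence while destroying any global change, and the threshold is taken as an empirical quantile of the statistic over the resampled sequences. The statistic `max_share_gap` is a hypothetical placeholder for the maximum of a change-point statistic over all candidate positions.

```python
import random

def bootstrap_threshold(patterns, statistic, alpha, block_len, n_boot=100, seed=0):
    """Empirical (1 - alpha) quantile of `statistic` over block-resampled sequences."""
    rng = random.Random(seed)
    n = len(patterns)
    starts = range(n - block_len + 1)
    values = []
    for _ in range(n_boot):
        resampled = []
        while len(resampled) < n:
            s = rng.choice(starts)                    # random block start
            resampled.extend(patterns[s:s + block_len])
        values.append(statistic(resampled[:n]))
    values.sort()
    return values[min(n_boot - 1, int((1 - alpha) * n_boot))]

def max_share_gap(p):
    """Placeholder statistic: largest gap between shares of 1s left and right of t."""
    n = len(p)
    return max(abs(sum(p[:t]) / t - sum(p[t:]) / (n - t)) for t in range(5, n - 4))

rng = random.Random(1)
stationary = [rng.randint(0, 1) for _ in range(120)]
h = bootstrap_threshold(stationary, max_share_gap, alpha=0.05, block_len=10)
print(h)
```

A smaller nominal false-alarm probability selects a higher quantile and hence a threshold that is at least as large, which mirrors the trade-off between false alarms and missed changes.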
To detect multiple change-points we use an algorithm that consists of two steps:
Step 1: preliminary estimation of boundaries of the stationary segments with a threshold $h(2\alpha)$ computed for doubled nominal probability of false alarm (that is, with a higher risk of detecting false change-points).
Step 2: verification of the boundaries and exclusion of false change-points: a change-point is searched for in the merging of every two adjacent intervals.
Details of these two steps are displayed in Algorithm 2. While Step 1 is the usual binary segmentation procedure as suggested in [48], Step 2 improves the obtained solution following the idea suggested in [11].

Experiments
In this section we empirically investigate the performance of the method for change-point detection via the CEofOP statistic. We apply it to the noisy logistic processes and to autoregressive processes (see Subsection 2.2) and compare the performance of change-point detection by the suggested method and by the following methods.
• The ordinal-patterns-based method for detecting change-points via the CMMD statistic [38,41]: a time series is split into windows of equal length $W$;
• Two versions of the classical Brodsky-Darkhovsky method [11]: the Brodsky-Darkhovsky method can be used for detecting changes in various characteristics of a time series $x = (x(t))_{t=1}^{L}$, but the characteristic of interest should be selected in advance. In this paper we consider detecting changes in mean, which is just the basic characteristic, and in the correlation function $\mathrm{corr}(x(t), x(t + 1))$, which reflects relations between the future and the past of a time series and seems to be a natural choice for detecting ordinal change-points. Changes in mean are detected by the generalized version of the Kolmogorov-Smirnov statistic [11]:
where the parameter $\delta \in [0, 1]$ regulates properties of the statistic; $\delta = 0$ is basically used (see [11] for details). Changes in the correlation function are detected by the following statistic:
$$\mathrm{BD}_{\mathrm{corr}}(t; x, \delta) = \mathrm{BD}_{\mathrm{exp}}\big(t; (y(t))_{t=1}^{L-1}, \delta\big) \quad \text{with } y(t) = x(t)\,x(t + 1).$$
Remark. Note that we consider the statistic $\mathrm{BD}_{\mathrm{exp}}$, which is intended to detect changes in mean, though ordinal-patterns-based statistics do not detect these changes. This is motivated by the fact that changes in the noisy logistic processes are on the one hand changes in mean, and on the other hand ordinal changes in the sense of Definition 4. Therefore, they can be detected both by $\mathrm{BD}_{\mathrm{exp}}$ and by ordinal-patterns-based statistics. In general, by the nature of ordinal time series analysis, changes in mean and in the ordinal structure are in some sense complementary.
We carry out experiments for order $d = 3$ of ordinal patterns (lower orders may provide worse results because of reduced sensitivity, while higher orders are applicable only to rather long time series due to condition (12)). For the CMMD statistic we take the window size $W = 256$. (There are no special reasons for this choice except the fact that $W = 256$ is sufficient for estimating probabilities of ordinal patterns of order $d = 3$ inside the windows, since $256 > 120 = 5(d + 1)!$ [1, Section 9.3]. Results of the experiments remain almost the same for $200 \le W \le 1000$.)
In Subsection 4.1 we study how well the statistics for change-point detection estimate the position of a single change-point. Since we expect that the performance of the statistics for change-point detection may strongly depend on the length of a realization, we check this in Subsection 4.2. Finally, we investigate the performance of various statistics for detecting multiple change-points in Subsection 4.3.
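For orientation, a mean-change statistic of this family can be sketched as the absolute difference between the sample means before and after $t$, multiplied by the weight $(t(L - t)/L^2)^\delta$. This particular form is an assumption of the sketch (a commonly quoted version of the Brodsky-Darkhovsky family), and the function names and zero-based indexing are choices made here; the correlation variant simply applies the same statistic to the products of neighbouring values.

```python
def bd_exp(x, t, delta=0.0):
    """Weighted absolute difference of the means of x[:t] and x[t:] (assumed form)."""
    n = len(x)
    weight = (t * (n - t) / n ** 2) ** delta
    return weight * abs(sum(x[:t]) / t - sum(x[t:]) / (n - t))

def bd_corr(x, t, delta=0.0):
    """The same statistic applied to y(t) = x(t) * x(t + 1)."""
    y = [a * b for a, b in zip(x, x[1:])]
    return bd_exp(y, t, delta)

x = [0.0] * 30 + [1.0] * 30                     # pure level change at t = 30
t_hat = max(range(1, len(x)), key=lambda t: bd_exp(x, t))
print(t_hat)                                    # 30
```

With $\delta = 0$ the weight equals one for every $t$; larger $\delta$ down-weights candidate positions close to the ends of the series.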

Estimation of the position of a single change-point
Consider $N = 10000$ realizations $x_j = (x_j(t))_{t=0}^{L}$ with $j = 1, \ldots, N$ for each of the processes listed in Table 1.
To measure the overall accuracy of change-point detection via some statistic $S$ as applied to the process $X$ we use three quantities. Let us first determine the error of the change-point estimation provided by the statistic $S$ for the $j$-th realization of a process $X$:
$$\mathrm{err}_j(S, X) = \hat{t}^*(S; x_j) - t^*,$$
where $t^*$ is the actual position of the change-point and $\hat{t}^*(S; x_j)$ is its estimate obtained by using $S$. Then the fraction of satisfactorily estimated change-points sE (averaged over $N$ realizations) is defined by:
$$\mathrm{sE}(S, X) = \frac{\#\{j \in \{1, \ldots, N\} : |\mathrm{err}_j(S, X)| \le \mathrm{MaxErr}\}}{N},$$
where MaxErr is the maximal satisfactory error; we take $\mathrm{MaxErr} = W = 256$. The bias and the root mean squared error are respectively given by
$$\mathrm{bias}(S, X) = \frac{1}{N} \sum_{j=1}^{N} \mathrm{err}_j(S, X), \qquad \mathrm{RMSE}(S, X) = \sqrt{\frac{1}{N} \sum_{j=1}^{N} \big(\mathrm{err}_j(S, X)\big)^2}.$$
The larger sE is and the closer to zero the bias and the RMSE are, the more accurate the estimation of a change-point is.
Results of the experiments are presented in Tables 2 and 3 for NL and AR processes, respectively. For every process the best values of performance measures are shown in bold. Let us summarize: for the considered processes the CEofOP statistic estimates a change-point more accurately than the CMMD statistic. For the NL processes the CEofOP statistic has almost the same performance as the Brodsky-Darkhovsky method; for the AR processes the performance of the classical method is better, though CEofOP has lower bias. In contrast to the ordinal-patterns-based methods, the Brodsky-Darkhovsky method is unreliable when there is a lack of a priori information about the time series. For instance, changes in NL processes only slightly influence the correlation function and $\mathrm{BD}_{\mathrm{corr}}$ does not provide a good indication of changes (cf. the performance of $\mathrm{BD}_{\mathrm{corr}}$ and CEofOP in Table 2). Meanwhile, changes in the AR processes do not influence the expected value (see Example 3), which does not allow detecting them using $\mathrm{BD}_{\mathrm{exp}}$ (see Table 3). Therefore we do not consider the $\mathrm{BD}_{\mathrm{exp}}$ statistic in further experiments.
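The three performance measures are straightforward to compute from a list of estimated positions. The sketch below assumes that the error is the signed difference between the estimate and the true position; the function name and the toy numbers are made up for illustration and are not results from the experiments.

```python
from math import sqrt

def performance(estimates, true_cp, max_err):
    """Return (sE, bias, RMSE) for estimated change-point positions."""
    errs = [e - true_cp for e in estimates]               # signed errors
    n = len(errs)
    s_e = sum(abs(e) <= max_err for e in errs) / n        # share of satisfactory estimates
    bias = sum(errs) / n
    rmse = sqrt(sum(e * e for e in errs) / n)
    return s_e, bias, rmse

# Toy estimates for a change-point at t = 100 with MaxErr = 256.
print(performance([100, 90, 110, 400], true_cp=100, max_err=256))
```

A single gross outlier (here the estimate 400) barely affects sE but dominates both the bias and the RMSE, which is why the three measures are reported together.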

Estimating position of a single change-point for different lengths of time series
Here we study how the accuracy of change-point estimation for the three considered statistics depends on the length L of a time series. We take N = 50000 realizations of NL, 3.95 → 3.98, σ = 0.2 and AR, 0.1 → 0.4 for realization lengths L = 24W, 28W, ..., 120W. Again, we consider a single change at a random time $t^* \in \{L/4 - W, L/4 - W + 1, \dots, L/4 + W\}$. Results of the experiment are presented in Figure 6. Summarizing, the performance of the CEofOP statistic strongly depends on the length of the time series but is generally better than that of the CMMD statistic. In comparison with the classical Brodsky-Darkhovsky method, CEofOP has better performance for NL processes (see Figures 6a, b), and lower bias for AR processes (see Figure 6d).
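The experimental setup for the AR case can be sketched as follows. This is only an illustration under stated assumptions: the process is taken to be a first-order autoregression X(t) = φ X(t−1) + ε(t) with standard Gaussian noise (its exact definition is given in Example 3 and is not repeated here), the new coefficient is assumed to apply from t* + 1 on, and the function name `simulate_piecewise_ar1` is hypothetical.

```python
import random

def simulate_piecewise_ar1(length, phi_before, phi_after, t_star, seed=0):
    """Simulate x(0), ..., x(length) of an AR(1) process whose coefficient
    switches from phi_before to phi_after after time t_star.

    Assumptions: standard Gaussian innovations; the initial value is a
    single Gaussian draw rather than a draw from the stationary law.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    for t in range(1, length + 1):
        phi = phi_before if t <= t_star else phi_after
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

W = 256
lengths = [k * W for k in range(24, 121, 4)]  # 24W, 28W, ..., 120W

rng = random.Random(1)
L = lengths[0]
# Change-point drawn uniformly from {L/4 - W, ..., L/4 + W} (both ends included).
t_star = rng.randint(L // 4 - W, L // 4 + W)
x = simulate_piecewise_ar1(L, 0.1, 0.4, t_star)
```

Fixing the seeds makes a run reproducible; in an actual experiment each of the N realizations would use its own random stream.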

Detecting multiple change-points
Here we investigate how well the considered statistics detect multiple change-points. Methods for change-point detection via the CEofOP and the CMMD statistics are implemented according to Subsection 3.3 and [45, Subsection 4.5.1], respectively. The Brodsky-Darkhovsky method is implemented according to [11] with only one exception: to compute a threshold for it we use bootstrapping, which in our case provided better results than the technique described in [11]. A nominal probability of a false alarm α = 0.05 has been taken for all methods (in the case of the CMMD statistic we have used the equivalent value 0.001, see [45, Subsection 4.3.2]).
We consider here two processes, AR (0. For both processes we generate N = 10000 realizations $x^j$ with j = 1, ..., N. We use unequal lengths of stationary segments to study methods for change-point detection in more realistic conditions. As we apply change-point detection via a statistic S to a realization $x^j$, we obtain estimates of the number $\hat{N}_{st}(S; x^j)$ of stationary segments and of the change-point positions $\hat{t}^*_l(S; x^j)$ for $l = 1, 2, \dots, \hat{N}_{st}(S; x^j) - 1$. Since the number of estimated change-points may be different from the actual number of changes, we suppose that the estimate for $t^*_k$ is provided by the nearest $\hat{t}^*_l(S; x^j)$. Therefore the error of estimation of the k-th change-point provided by S is given by

$$\mathrm{err}^j_k(S, X) = \min_{l = 1, 2, \dots, \hat{N}_{st}(S; x^j) - 1} \left| \hat{t}^*_l(S; x^j) - t^*_k \right|.$$
To assess the overall accuracy of change-point detection, we compute two quantities. The fraction $\mathrm{sE}_k$ of satisfactory estimates of a change-point $t^*_k$, k = 1, 2, 3, is given by

$$\mathrm{sE}_k(S, X) = \frac{\#\{j \in \{1, \dots, N\} : \mathrm{err}^j_k(S, X) \le \mathrm{MaxErr}\}}{N},$$

where MaxErr is the maximal satisfactory error; we take MaxErr = W = 256. The second quantity is the average number of false change-points per realization. Results of the experiment are presented in Tables 4 and 5; the best values are shown in bold.
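A possible way to compute these two quantities is sketched below in Python. The nearest-estimate error follows the description above; the rule for counting false change-points is an assumption made for this illustration (an estimate is counted as false if it lies farther than MaxErr from every actual change-point) and may differ from the definition used in the experiments. All names and numbers are hypothetical.

```python
def nearest_errors(estimated, actual):
    """Error for each actual change-point: distance to the nearest estimate
    (infinite if no change-point was estimated at all)."""
    return [min((abs(e - a) for e in estimated), default=float("inf"))
            for a in actual]

def multi_cp_metrics(all_estimates, actual, max_err=256):
    """Return ([sE_1, ..., sE_K], average number of false change-points)."""
    n = len(all_estimates)
    hits = [0] * len(actual)
    false_total = 0
    for estimated in all_estimates:
        for k, err in enumerate(nearest_errors(estimated, actual)):
            hits[k] += err <= max_err
        # ASSUMPTION: an estimate is "false" if it is farther than max_err
        # from every actual change-point.
        false_total += sum(all(abs(e - a) > max_err for a in actual)
                           for e in estimated)
    return [h / n for h in hits], false_total / n

# Three hypothetical realizations with actual changes at 1000, 3000 and 6000.
actual = [1000, 3000, 6000]
s_e_k, f_cp = multi_cp_metrics([[1010, 2900, 8000], [950], []], actual)
```

In this toy input the first change is found in two of three runs, the second in one, the third never, and one estimate (8000) is counted as false.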
In summary, since distributions of ordinal patterns for NL and AR processes have different properties, results for them differ significantly. The CEofOP statistic provides good results for the NL processes. However, for the AR processes its performance is much worse: only the most prominent change is detected rather well. Weak results for the two other change-points are caused by the fact that the CEofOP statistic is rather sensitive to the lengths of stationary segments (we have already seen this in Subsection 4.2), and in this case they are not very long.

Conclusions and open points
In this paper we have introduced a method for change-point detection via the CEofOP statistic and have tested it for time series coming from two classes of models having quite different distributions, namely piecewise stationary noisy logistic and autoregressive processes.
The empirical investigations suggest that the proposed method provides better detection of ordinal change-points than the ordinal-patterns-based method introduced in [38,41]. For the two model classes considered, the performance of our method is comparable to that of the classical Brodsky-Darkhovsky method, but in contrast to it, ordinal-patterns-based methods require less a priori knowledge about the time series. This can be especially useful when considering non-linear models, where the autocorrelation function does not describe distributions completely. The point here is that, with the exception of the mean, much of the distribution is captured by its ordinal structure. So (together with methods finding changes in mean) the CEofOP statistic can be used at least for a first exploration step.
Although numerical experiments and tests on real-world data cannot replace rigorous theoretical studies, the results of the current study show the potential of change-point detection via the CEofOP statistic. However, there are some open points listed below.
1. A method for computing a threshold h for the CEofOP statistic without using bootstrapping is of interest, since the bootstrapping procedure is rather time-consuming. One possible solution is to utilize Theorem 5 (Section 6) and to precompute thresholds using the values of $\Delta^d_{\gamma,\theta}(P, Q)$. However, this approach requires further investigation.
2. The binary segmentation procedure [48] is not the only possible method for detecting multiple change-points. In [26,27] an alternative approach is suggested: the number of stationary segments $N_{st}$ is estimated by optimizing a contrast function, then the positions of the change-points are adjusted. Likewise, one can consider a method for multiple change-point detection based on maximizing a generalization of the CEofOP statistic over the estimates of the change-points. Further investigation in this direction could be of interest.
3. As we have seen in Subsection 4.2, the CEofOP statistic requires rather large sample sizes to provide reliable change-point detection. This is due to the necessity of estimating the empirical conditional entropy (see Subsection 3.3). In order to reduce the required sample size, one may consider more effective estimates of the conditional entropy, for instance, the Grassberger estimate (see [18] and [45, Subsection 3.4.1]). However, elaboration of this idea is beyond the scope of this paper.

4. In this paper only one-dimensional time series are considered, though there is no limitation in principle for applying ordinal-patterns-based methods to multivariate data (see [21]). Discussion of using ordinal-patterns-based methods for detecting change-points in multivariate data (for instance, in multichannel EEG) is therefore of interest.
5. We have considered here only the "off-line" detection of changes, which is used when the acquisition of a time series is completed. Meanwhile, in many applications it is necessary to detect change-points "on-line", based on a small number of observations after the change [8]. Development of on-line versions of ordinal-patterns-based methods for change-point detection may be an interesting direction for future work.

Asymptotic behavior of the CEofOP statistic
Here we consider the values of CEofOP for the case when segments of a stochastic process before and after the change-point have infinite length.
Let us first introduce some notation. Given an ordinal-$d^+$-stationary stochastic process X for d ∈ N, the distribution of pairs of ordinal patterns is denoted by $P = (p_{i,j})_{i,j \in S_d}$, with $p_{i,j} = \mathbb{P}(\Pi(t) = i, \Pi(t+1) = j) = p_{j|i}\, p_i$ for all $i, j \in S_d$. One easily sees the following: The conditional entropy of ordinal patterns is represented as CE(X, d) = H(P), where

$$H(P) = -\sum_{i,j \in S_d} p_{i,j} \ln p_{i,j} + \sum_{i \in S_d} p_i \ln p_i.$$

Here recall that $p_i = \sum_{j \in S_d} p_{i,j}$.
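As an illustration, the conditional entropy of a pair distribution can be evaluated with a few lines of Python. This is a sketch under assumptions: P is passed as a nested list with P[i][j] = p_{i,j}, the natural logarithm is used, and 0 ln 0 is taken as 0. The helper `pair_distribution`, which counts successive pairs in a pattern sequence, is hypothetical and need not coincide with the empirical estimate used elsewhere in the paper.

```python
import math

def conditional_entropy(P):
    """H(P) = -sum_ij p_ij ln p_ij + sum_i p_i ln p_i, with 0 ln 0 := 0."""
    joint = sum(p * math.log(p) for row in P for p in row if p > 0)
    marginal = sum(s * math.log(s) for s in map(sum, P) if s > 0)
    return marginal - joint

def pair_distribution(patterns, n_symbols):
    """Relative frequencies of successive pairs (pi(t), pi(t+1)).

    Assumption: patterns are coded as integers 0, ..., n_symbols - 1.
    """
    counts = [[0] * n_symbols for _ in range(n_symbols)]
    for a, b in zip(patterns, patterns[1:]):
        counts[a][b] += 1
    total = len(patterns) - 1
    return [[c / total for c in row] for row in counts]

h_uniform = conditional_entropy([[0.25, 0.25], [0.25, 0.25]])
h_alternating = conditional_entropy(pair_distribution([0, 1] * 50, 2))
h_blocks = conditional_entropy(pair_distribution([0, 0, 1, 1] * 50, 2))
```

For independent, equally likely patterns the value is ln 2, whereas a strictly alternating sequence gives 0 because each pattern determines its successor.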
Then, for all θ ∈ (0, 1) it holds lim L→∞ CEofOP ⌊θL⌋; Π L,γ = L ∆ d γ,θ (P, Q) P-almost surely. By definition, (13) is a stochastic process of length L + 1 with a potential ordinal change-point t* = ⌊θL⌋, i.e. the position of t* relative to L is essentially the same for all L, and the statistics considered stabilize for increasing L. In particular, (13) can be interpreted as a part of a stochastic process including exactly one ordinal change-point. We omit the proof of Theorem 5 since it is a simple computation.
Due to the properties of the conditional entropy, it holds Values of $\Delta^d_{\gamma,\theta}(P, Q)$ can be computed for a piecewise stationary stochastic process with known probabilities of ordinal patterns before and after the change-point². In [7] the authors compute probabilities of ordinal patterns of orders d = 2 (Proposition 5.3) and d = 3 (Theorem 5.5) for stationary Gaussian processes (in particular, for autoregressive processes). Here we use these results to provide Example 6 illustrating Theorem 5.
Example 6. Consider an autoregressive process AR$((\phi_1, \phi_2), t^*)$ with a single change-point $t^* = L/2$ for L ∈ N. Using the results from [7] we compute distributions $P_{\phi_1}$, $P_{\phi_2}$ of ordinal pattern pairs for orders d = 1, 2 and on this basis we calculate the values of $\Delta^d_{0.5,0.5}(P_{\phi_1}, P_{\phi_2})$ for different values of $\phi_1$ and $\phi_2$. The results are presented in Tables 6 and 7.

The following result is a simple consequence of Theorem 5.

Corollary 6. Let $X = (X_t)_{t \in \mathbb{N}_0}$ be an ergodic $d^+$-ordinal-stationary stochastic process on a probability space (Ω, A, P). For L ∈ N let $\Pi^{d,L}$ be the abstract sequence of ordinal patterns of order d of $(X_0, X_1, \dots, X_L)$. Then for any θ ∈ (0, 1) it holds $\lim_{L \to \infty} \mathrm{CEofOP}(\lfloor \theta L \rfloor; \Pi^{d,L}) = 0$ P-almost surely.

Figure 1: A part of a piecewise stationary time series with a change-point at t = L/2 (marked by a vertical line) and corresponding ordinal patterns of order d = 1 (below the plot)

Figure 2: Statistic CEofOP(θL) for the sequence of ordinal patterns of order 1 for the time series from Figure 1

Figure 3: Hypnogram (bold curve) and the results of ordinal-patterns-based discrimination of sleep EEG. Here the ordinate axis represents the results of the expert classification: W stands for waking, stages S1, S2 and S3, S4 indicate light and deep sleep, respectively, REM stands for REM sleep and Error for unclassified samples. Results of ordinal-patterns-based discrimination are represented by the background colour: white indicates epochs classified as waking state, light gray as light sleep, gray as deep sleep, dark gray as REM, and red indicates unclassified segments.

We illustrate Definition 4 by two examples. Piecewise stationary autoregressive processes considered in Example 3 are classical and provide models for linear time series. Since many real-world time series are non-linear, we further introduce in Example 4 a process originating from non-linear dynamical systems. These two types of processes are used throughout the paper for empirical investigation of change-point detection methods.

Example 3. A first order piecewise stationary autoregressive (AR) process with change-points $t^*_1, t^*_2, \dots, t^*_{N_{st}-1}$ is defined as

Figure 4: Upper row: parts of realizations of AR (a) and NL (b) processes with change-points marked by vertical lines, L = 20000. Lower row: empirical probability distributions of ordinal patterns of order d = 2 before and after the change-point for the realizations of AR (c) and NL (d) processes.

an estimate of the change-point $t^*$ is given by $\hat{t}^* = \arg\max_{t = d, d+1, \dots, L} S(t; \pi^{d,L})$.

Figure 5: Statistic CEofOP(θL) for a time series, obtained by "gluing" a realization of a noisy logistic stochastic process NL 4, 0.2 with its surrogate at t * = 2000

Proposition 4. If an abstract sequence $\Pi^d$ of ordinal patterns of order d ∈ N forms a Markov chain with at most one change-point, then for a sequence $\pi^{d,L} = (\pi(k))_{k=d}^{L}$ of ordinal patterns being a realization of $\Pi^d$ of length L ∈ N, it holds $\mathrm{LR}(t; \pi^{d,L}) = 2\,\mathrm{CEofOP}(t; \pi^{d,L}) + 2d \cdot \mathrm{eCE}(\pi^{d,L})$.
Consider a sequence $\pi^{d,L}$ of ordinal patterns of order d ∈ N with length L ∈ N, corresponding to a realization of some piecewise stationary stochastic process. To detect a single change-point via the CEofOP statistic we first estimate its possible position by

$$\hat{t}^* = \arg\max_{t = T_{\min} + d, \dots, L - T_{\min}} S(t; \pi^{d,L}),$$

and then test between the hypotheses
H_0: parts $\pi(d), \pi(d+1), \dots, \pi(\hat{t}^*)$ and $\pi(\hat{t}^* + d), \dots, \pi(L)$ of the sequence $\pi^{d,L}$ come from the same distribution;
H_A: parts $\pi(d), \pi(d+1), \dots, \pi(\hat{t}^*)$ and $\pi(\hat{t}^* + d), \dots, \pi(L)$ of the sequence $\pi^{d,L}$ come from different distributions.
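The estimation step (the arg max over admissible positions) can be sketched in Python as below. The statistic is passed in as a callable, so the sketch does not depend on a particular choice of S; the `toy_statistic` used in the example is a deliberately simple placeholder and is not the CEofOP statistic. The indexing convention (a list `pi` whose entries pi[d], ..., pi[L] are the patterns) and all function names are assumptions of this illustration.

```python
def estimate_change_point(statistic, pi, d, t_min):
    """Return the arg max of statistic(t, pi) over t = t_min + d, ..., L - t_min.

    Assumption: pi is a list with pi[k] = pattern at time k for k = d, ..., L;
    entries with index below d are unused placeholders.
    """
    L = len(pi) - 1
    candidates = range(t_min + d, L - t_min + 1)
    return max(candidates, key=lambda t: statistic(t, pi))

def toy_statistic(t, pi, d=1):
    """Placeholder statistic (NOT CEofOP): absolute difference between the
    mean pattern code in pi(d..t) and in pi(t+d..L)."""
    left, right = pi[d:t + 1], pi[t + d:]
    return abs(sum(left) / len(left) - sum(right) / len(right))

# Order-1 patterns coded as 0/1: fifty zeros followed by fifty ones.
pi = [None] + [0] * 50 + [1] * 50
t_hat = estimate_change_point(toy_statistic, pi, d=1, t_min=10)
```

Restricting the candidates by `t_min` keeps both parts long enough for the statistic to be evaluated; the subsequent hypothesis test is not part of this sketch.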

Algorithm 1: Detecting at most one change-point
Input: sequence $\pi = (\pi(k))_{k=t_{\mathrm{start}}}^{t_{\mathrm{end}}}$ of ordinal patterns of order d, nominal probability α of false alarm
Output: estimate of a change-point $\hat{t}^*$ if a change-point is detected, otherwise return 0.
1: function DetectSingleCP(π, α)
2:

Algorithm 2: Detecting multiple change-points
Input: sequence $\pi = (\pi(k))_{k=d}^{L}$ of ordinal patterns of order d, nominal probability α of false alarm.
Output: estimates of the number $\hat{N}_{st}$ of stationary segments and of their boundaries t
1: function DetectAllCP(π, α)
2: end function

lengths W ∈ N, empirical probabilities of ordinal patterns are estimated in every window. If there is an ordinal change-point in the time series, then the empirical probabilities of ordinal patterns should be approximately constant before the change-point and after the change-point, but they change at the window with the change-point. To detect this change the authors have introduced the CMMD statistic¹. In the original papers [38,41] the authors do not estimate change-points, but only the corresponding window numbers; for the algorithm of change-point estimation by means of the CMMD statistic we refer to [45, Subsection 4.5.1].
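Returning to Algorithm 2: since its body is not reproduced here, the following Python sketch only illustrates the general idea of binary segmentation, i.e. applying a single-change detector recursively to the two parts of a split segment. The detector is a hypothetical callable that returns None when no change is detected (Algorithm 1 returns 0 in that case), and the convention that the right part starts at the estimate plus one is an assumption. The `toy_detector` is a stand-in and has nothing to do with the CEofOP-based test.

```python
def detect_all_cp(seq, t_start, t_end, detect_single_cp, alpha=0.05):
    """Binary segmentation: recursively split [t_start, t_end] at detected
    change-points and return the sorted list of estimates."""
    t_hat = detect_single_cp(seq, t_start, t_end, alpha)
    if t_hat is None:  # no change detected in this segment
        return []
    left = detect_all_cp(seq, t_start, t_hat, detect_single_cp, alpha)
    # ASSUMPTION: the right part starts directly after the estimate.
    right = detect_all_cp(seq, t_hat + 1, t_end, detect_single_cp, alpha)
    return left + [t_hat] + right

def toy_detector(seq, t_start, t_end, alpha):
    """Stand-in detector: position of the largest jump between neighbours,
    or None if the segment is constant (alpha is ignored here)."""
    best_t, best_jump = None, 0
    for t in range(t_start, t_end):
        jump = abs(seq[t + 1] - seq[t])
        if jump > best_jump:
            best_t, best_jump = t, jump
    return best_t

seq = [0] * 5 + [3] * 5 + [1] * 5
change_points = detect_all_cp(seq, 0, len(seq) - 1, toy_detector)
n_segments = len(change_points) + 1
```

The recursion stops as soon as the detector reports no change in a segment, so the number of estimated stationary segments is the number of estimates plus one.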

Figure 7 shows how fast this convergence is. Note that the CEofOP statistic for orders d = 1, 2 allows distinguishing between change and no change in the considered processes for $L \ge 20 \cdot 10^3$. For $L = 10^5$ the values of the CEofOP statistic for order d = 2 are already very close to the theoretical values, whereas for d = 1 this length does not seem sufficient.

Table 1: Processes used for investigation of the change-point detection. A single change occurs at a random time $t^*$ uniformly distributed in $\{L/4 - W, L/4 - W + 1, \dots, L/4 + W\}$. For all processes, length L = 80W of sequences of ordinal patterns is taken.

Table 2: Performance of different statistics for estimating change-point in NL processes

Table 3: Performance of different statistics for estimating change-point in AR processes

Table 5: Performance of change-point detection methods for the process with