Random Walk Null Models for Time Series Data

Permutation entropy has become a standard tool for time series analysis that exploits the temporal properties of these data sets. Many current applications use an approach based on Shannon entropy, which implicitly assumes an underlying uniform distribution of patterns. In this paper, we analyze random walk null models for time series and determine the corresponding permutation distributions. These new techniques allow us to explicitly describe the behavior of real world data in terms of more complex generative processes. Additionally, building on recent results of Martinez, we define a validation measure that allows us to determine when a random walk is an appropriate model for a time series. We demonstrate the usefulness of our methods using empirical data drawn from a variety of fields.


Introduction
In the past fifteen years, measures of entropy defined in terms of the distribution of ordinal patterns have become an important tool in the analysis of time series. These methods effectively make use of the temporal structure of this type of data in ways that are both computationally efficient and simple to implement. In addition, permutation entropy is invariant under rescaling of the data, i.e. under non-linear monotonic transformations, adding to its wide applicability [2,7]. These techniques have found application in many fields including economics [11,19,29,35,40,41], medicine [27,28,30,33,34], and physics [37,39], among others. Three recent surveys [23,36,38] provide a comprehensive overview of developments in the field and related applications.
The features of permutation entropy mentioned above make it particularly well-suited for long time series such as those collected from EEG or ECG machines [27,30]. Extensions of permutation entropy, such as a spectrogram-like visualization built from patterns of the delayed values x_t, x_{t+d}, x_{t+2d} for various delays d ≥ 1, can highlight even subtler changes in behavior, including in periodic data. This method was used to characterize sleep stages from EEG data and matched the expert annotations almost exactly [8]. Data sets of similar scale are becoming increasingly available in the current big data paradigm, and permutation methods are well positioned to contribute to comprehensive and meaningful analyses.
Another well-motivated application of permutation entropy appears in the context of economic markets. According to economic theory, an efficient market is one in which price histories cannot predict future behavior, so the market is described by a random walk [16,17]. The proximity of a particular market to the random walk model therefore serves as a proxy for market efficiency. Observed market inefficiencies can be caused by communication barriers, unfair competition, momentum, and calendar-year effects such as the release or announcement of new product lines, among others. As a result, quantifying inefficiency over time and comparing relative inefficiency between markets is an important, longstanding question in finance [16].
To distinguish developed and emerging markets, the authors of [41] apply permutation entropy to the changes in stock prices (returns) to measure the independence of these steps. Other economics researchers have used similar methods to evaluate market volatility directly [29]. The approach presented in this paper is motivated in part by these recent applications, in which permutation entropy and other related measures have grown in popularity. Indeed, most recent applications of permutation methods to time series use permutation entropy, which is computed from the full distribution of patterns that occur, rather than measurements defined by the strictly binary forbidden/allowed distinction.
1.2. Permutation Entropy. Currently, the most commonly used metric on pattern distributions in time series is the permutation entropy, originally described in [10]. For a time series X = {X_i} and a fixed integer n, this measure is the normalized Shannon entropy of the distribution of ordinal patterns of length n that occur in X [10]:

PE_n(X) = -\frac{1}{\log(n!)} \sum_{\pi \in S_n} p_\pi \log(p_\pi),

where p_π represents the proportion of patterns of length n with shape π, and the logarithm here, and throughout this paper, is taken base 2. Table 1 gives the permutation entropy and the number of forbidden patterns in several different types of data sets for small values of n. The data are fully described in Section 1.5 and contain both empirical and simulated time series. Of particular interest is the fact that missing patterns appear in all data sets for n = 6, even those that are guaranteed (cf. Propositions 1 and 2) to asymptotically contain all patterns. Additionally, notice that the permutation entropy values are quite large for many of the noisy and random data sets.
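To make the definition concrete, the following sketch (our own Python illustration, not code from the paper) computes PE_n(X) directly from the counts of ordinal patterns in each window of length n:

```python
import math
from collections import Counter

def ordinal_pattern(window):
    """Ordinal pattern of a window of distinct values: position i receives
    the rank of window[i], so (2.1, 4.0, 1.3) has pattern (2, 3, 1)."""
    order = sorted(range(len(window)), key=lambda i: window[i])
    pattern = [0] * len(window)
    for rank, idx in enumerate(order, start=1):
        pattern[idx] = rank
    return tuple(pattern)

def permutation_entropy(series, n):
    """Normalized permutation entropy PE_n with base-2 logarithms."""
    counts = Counter(ordinal_pattern(series[t:t + n])
                     for t in range(len(series) - n + 1))
    total = sum(counts.values())
    H = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return H / math.log2(math.factorial(n))
```

A strictly monotone series contains a single pattern and so has PE_n = 0, while white noise approaches the maximal value 1.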

Table 1. Number of forbidden patterns and permutation entropy for n = 4, 5, 6 in each data set.

When a time series is defined by iterating a piecewise monotone interval map f, the permutation entropy of the time series coincides with the Kolmogorov-Sinai entropy of f [25,24]. Thus, just as the number of forbidden patterns is a permutation analog of the topological entropy of f, the permutation entropy is an analog of the Kolmogorov-Sinai entropy of f [9]. However, most time series data that we encounter are not assumed to be derived from an iterated function, even with a noisy model.
For a time series whose values are drawn independently from a given distribution, each pattern of length n asymptotically appears with the same relative frequency; see Proposition 1. Such a time series is considered to be of maximal entropy and has expected permutation entropy equal to 1 as the number of time steps goes to infinity. This motivates a recently introduced, alternative interpretation of permutation entropy as the Kullback-Leibler divergence (KL divergence) of the empirical distribution from that of white noise (see [8,23] for some exposition about this perspective). The KL divergence of the distribution of patterns in Z from those in Y is defined by

D_{KL}(Z \| Y) = \sum_{\pi \in S_n} P_Z(\pi) \log \frac{P_Z(\pi)}{P_Y(\pi)}.

The relationship between permutation entropy and the KL divergence of the distribution of patterns in the time series from the uniform distribution, U, is

PE_n(X) = 1 - \frac{1}{\log(n!)} D_{KL}(X \| U).

The formulation of permutation entropy in terms of the KL divergence from the expected behavior of white noise motivates our approach in this paper, since many types of time series, particularly those arising in financial contexts, exhibit characteristic pattern distributions that are highly non-uniform. Our purpose here is to quantitatively explain this difference and provide null models that more closely approximate the distributions seen in actual data.
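The relationship between permutation entropy and the KL divergence from the uniform distribution can be checked numerically; the sketch below (our own illustration, using base-2 logarithms as in the paper) computes both quantities from an empirical pattern distribution and confirms that PE_n = 1 − D_KL / log(n!):

```python
import math
from collections import Counter

def pattern_distribution(series, n):
    """Empirical proportions p_pi of the ordinal patterns of length n."""
    def pat(w):
        order = sorted(range(n), key=lambda i: w[i])
        out = [0] * n
        for rank, idx in enumerate(order, start=1):
            out[idx] = rank
        return tuple(out)
    counts = Counter(pat(series[t:t + n]) for t in range(len(series) - n + 1))
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}

def pe_and_kl(series, n):
    """Return (PE_n, D_KL(p || U)); only observed patterns contribute."""
    dist = pattern_distribution(series, n)
    H = -sum(p * math.log2(p) for p in dist.values())
    # KL divergence to the uniform distribution over all n! patterns
    kl = sum(p * math.log2(p * math.factorial(n)) for p in dist.values())
    return H / math.log2(math.factorial(n)), kl
```

The identity holds exactly, since D_KL(p ‖ U) = log(n!) − H(p).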
1.3. Notation and Terminology. For consistency, we fix the notation that we will use throughout this paper. Given an ordered list of values x_1, x_2, \ldots, x_n with x_i ≠ x_j for all i ≠ j, we define the associated permutation st(x_1, x_2, \ldots, x_n) = π ∈ S_n such that x_{\pi^{-1}(1)} < x_{\pi^{-1}(2)} < \cdots < x_{\pi^{-1}(n)}. This is also called the ordinal pattern of x_1, x_2, \ldots, x_n. Given a time series X = {x_1, x_2, \ldots, x_N}, we represent the ordinal pattern of length n beginning at time t by st(X, n, t).
In this paper we are concerned with the distribution over patterns rather than the specific time of occurrence of any individual pattern since, as described above, the distribution of patterns in a time series X contains important information about the underlying dynamics. For a fixed time series X and permutation π ∈ S_n, we denote the empirical proportion of occurrences of the pattern π in X by

p_\pi := \frac{|\{i : st(X_i, X_{i+1}, \ldots, X_{i+n-1}) = \pi\}|}{N - n + 1}.
Similarly, for a sequence of independent random variables {Z_i}_{i=1}^{n}, we define the expected proportion of occurrences of π ∈ S_n by

P_Z(\pi) = P(st(Z_1, Z_2, \ldots, Z_n) = \pi) = P(Z_{\pi^{-1}(1)} < Z_{\pi^{-1}(2)} < \cdots < Z_{\pi^{-1}(n)}),

noting that by independence the starting point does not change the probability. Thus, for a long time series X_t whose values are determined by drawing a value at random according to Z_t, we expect p_π ≈ P_Z(π). Additionally, by Proposition 1, we note that if the {Z_i}_{i=1}^{n} are independent and identically distributed continuous random variables, then P_Z(π) = 1/n! for all π ∈ S_n. Thus, the distribution of patterns in white noise (i.e. a randomly generated time series) is approximately uniform and converges to the uniform distribution as the length of the time series goes to infinity.
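As a quick sanity check of this convergence (our own simulation, not from the paper), one can estimate the pattern proportions of i.i.d. noise and compare them to 1/n!:

```python
import random
from collections import Counter

def pattern_proportions(series, n):
    """Empirical proportions of each ordinal pattern of length n."""
    def pat(w):
        order = sorted(range(n), key=lambda i: w[i])
        out = [0] * n
        for rank, idx in enumerate(order, start=1):
            out[idx] = rank
        return tuple(out)
    counts = Counter(pat(series[t:t + n]) for t in range(len(series) - n + 1))
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}

rng = random.Random(0)
noise = [rng.random() for _ in range(60000)]
props = pattern_proportions(noise, 3)
# every one of the 3! = 6 patterns should be close to 1/6
assert all(abs(p - 1 / 6) < 0.02 for p in props.values())
```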
We primarily focus on the distribution of patterns in random walks, i.e. sequences Z = {Z_i} whose steps {Y_i} are independent and identically distributed continuous random variables, with Z_i = \sum_{j=1}^{i-1} Y_j. Since the probabilities P_Z(π) for π ∈ S_n only involve the first n random variables, it will be enough to consider finite random walks. If there are no requirements on the distributions of the steps {Y_i}, we say that Z is an arbitrary random walk, while if the steps {Y_i} are symmetric random variables, we say that Z is a symmetric random walk. In this paper, we focus on the properties of two particular random walk null models based on standard step distributions. When the steps {Y_i} are normally distributed, we refer to this as a random walk with normal steps, with parameters µ and σ. When the steps {Y_i} are uniformly distributed on the interval [b − 1, b], with 0 < b < 1, we refer to this as a random walk with uniform steps; the parameter specifying the distribution is P(Y_i > 0) = b. Due to the scale invariance of the permutation measures, it suffices to consider an interval of unit length. Since the Y_i are identically distributed, we will sometimes drop the subscript when referring to their distributions.
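Both null models are straightforward to simulate; a minimal sketch (ours, with a seeded generator for reproducibility):

```python
import random

def uniform_walk(N, b, seed=0):
    """Random walk whose steps are uniform on [b - 1, b], so P(Y > 0) = b."""
    rng = random.Random(seed)
    walk, z = [], 0.0
    for _ in range(N):
        walk.append(z)
        z += rng.uniform(b - 1, b)
    return walk

def normal_walk(N, mu, sigma, seed=0):
    """Random walk whose steps are drawn from N(mu, sigma^2)."""
    rng = random.Random(seed)
    walk, z = [], 0.0
    for _ in range(N):
        walk.append(z)
        z += rng.gauss(mu, sigma)
    return walk
```

With b = 0.65, for example, the uniform walk drifts upward on average, since each step has mean b − 1/2 > 0.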
1.4. Contributions. Our purpose in this paper is to describe the distributions of ordinal patterns of random walk null models for time series data in order to derive a corresponding KL measure generalizing permutation entropy. These models are motivated by the KL divergence formulation of permutation entropy described in Section 1.2 and by domain-specific hypotheses about the random behavior of time series data. In the next section we describe the theoretical properties of these models, including the expected pattern distributions, which allow us to define a KL divergence from empirical data to the derived values. Next, we describe a metric, based on recent work of Martinez and Elizalde [31], that measures how well a given distribution matches any random walk model. We conclude by applying the new methods to a wide variety of data sets to demonstrate their advantages and applicability.
1.5. The Data. Throughout this paper we use several example data sets to evaluate our methods and compare to traditional approaches. Unless otherwise specified, these time series have N = 2000 data points. The data include synthetic random values as well as empirical data from economics, ecology, and medicine. Below we describe the key features of the data and the abbreviations that we use throughout the paper. Plots of the time series are displayed in Appendix A.
• (HEART): Instantaneous heart rate measurements taken at .5 second intervals collected at MIT [18].
In all cases, random values are generated using Mathematica's [20] pseudo-random number generator, and all historical market closing values are provided by Morningstar through Mathematica. In the final section, we use the daily closing prices of the S&P 500, Apple (AAPL), Amazon (AMZN), Bank of America (BAC), General Electric (GE), Coca-Cola (KO), and United Parcel Service of America (UPS) for trading days from January 1, 2002 to January 1, 2017 (N = 3777). Finally, for a longitudinal test, we use daily closing prices of the S&P 500 from January 1, 1958 until January 1, 2017 (N = 14348).

Distributions of Patterns in Random Walks
In this section, we establish some of the important properties of the distribution of patterns for random walk null models. For the uniform and symmetric normal random walk models, we give the distribution of patterns of length n = 3, 4 in Table 2 and show how these values can be computed for larger n in Proposition 5.
2.1. Comparison to Forbidden Patterns and Permutation Entropy. We begin by showing that any data model whose values are i.i.d. random variables gives rise to the uniform distribution over permutations.
Proposition 1. If Z_1, Z_2, \ldots, Z_n are independent and identically distributed continuous random variables, then for any π ∈ S_n, we have P_Z(π) = 1/n!.

Proof. Let π ∈ S_n. Since the Z_i are i.i.d., their joint distribution is exchangeable, and so transposing any pair of variables does not change the probability of the event Z_{\pi^{-1}(1)} < Z_{\pi^{-1}(2)} < \cdots < Z_{\pi^{-1}(n)}. It follows that all n! permutations of a fixed length occur with the same probability, and so P_Z(π) = 1/n!.

In particular, Proposition 1 implies that for any time series generated by such an i.i.d. random process, for each π ∈ S_n, we expect the relative frequency of π to approach 1/n! as N → ∞. The next results describe how the behavior of random walk models differs from the forbidden pattern and permutation entropy measures by showing that in such a model there are no forbidden patterns (Proposition 2) and that the distribution of patterns is never uniform (Propositions 3 and 4).

Proposition 2.
If Z is a normal or uniform random walk such that P(Y > 0) ∉ {0, 1}, then for any π ∈ S_n, we have P_Z(π) > 0.

Proof. Let b = P(Y > 0) and consider the case b ≤ 1/2; the other case follows from a similar argument. Define a collection of pairwise disjoint intervals {I_j}_{j=1}^{n} ⊂ (0, b). We claim that for all positive integers i and all pairs 1 ≤ j, k ≤ n, we have P(Z_{i+1} ∈ I_k | Z_i ∈ I_j) > 0. Suppose that j < k; the other case follows from a parallel argument. If Z_i ∈ I_j, then the event Z_{i+1} ∈ I_k requires the step Y_i to lie in an interval of positive length contained in the support of Y, and any such interval has positive probability under either the normal or the uniform step distribution. Hence, P(Z_{i+1} ∈ I_k | Z_i ∈ I_j) > 0.
In the case that Y_i is uniform on [b − 1, b], the same argument applies, since every subinterval of positive length in the support again has positive probability. We now write P_Z(π) as a product of such transition probabilities. Notice that the events {Z_i ∈ I_k | Z_{i-1} ∈ I_j} and {Z_{i'} ∈ I_{k'} | Z_{i'-1} ∈ I_{j'}} are independent, since they only depend on Y_{i-1} and Y_{i'-1}, respectively, which are themselves independent for i ≠ i'. Multiplying these positive transition probabilities, we conclude that P_Z(π) > 0.
In particular, Proposition 2 implies that in a time series Z = {Z_i}_{i=1}^{∞} defined by a normal or uniform random walk model, we expect every pattern to eventually appear. Thus, as mentioned previously, for the random walks in Table 1, the patterns that do not appear in the time series X are not forbidden, but merely "missing." Note that this is another example of the divergence of random walk models from the traditional methods that were motivated by one-dimensional dynamical systems.
This result mirrors recent work on permutons [26], where a similar result is obtained in the limiting case. For our purposes, this is enough to show that the distribution of patterns in a random walk can never match the uniform distribution derived from i.i.d. random data.
Proposition 4. For any random walk Z, the distribution of patterns of length n (for n ≥ 3) is not the uniform distribution.
Proof. Let b = P(Y > 0). We have that P_Z(12⋯n) = P(Y_1 > 0, \ldots, Y_{n-1} > 0) = b^{n-1} and, similarly, P_Z(n(n−1)⋯21) = (1 − b)^{n-1}. If the patterns of length n had the uniform distribution, then P_Z(π) = 1/n! for each π ∈ S_n, so b^{n-1} = (1 − b)^{n-1} and hence b = 1/2. In such a case, we would obtain 1/2^{n-1} = 1/n!, which is impossible when n ≥ 3.

This result implies that for data sets derived from a random walk, the distribution of ordinal patterns must differ from the uniform distribution enforced by random data. Figure 1 shows examples of the characteristic shapes of distributions that arise from random walks. The symmetry apparent in the first two distributions of patterns is not coincidental: it arises from the symmetry in the definition of the random walks, which we discuss more fully in Section 3.

Proposition 5. For a uniform or normal random walk Z, the value P_Z(π) for π ∈ S_n can be interpreted as the volume of a region in an (n−1)-dimensional region bounded by certain hyperplanes through the origin (see Figure 2).
Proof. Let b = P(Y > 0). We graphically represent the joint distribution of {Y_1, Y_2, \ldots, Y_{n-1}} as an (n−1)-dimensional region. For uniform steps, the probability density function is uniform on the (n−1)-dimensional cube [b − 1, b]^{n-1}, which is partitioned into regions according to the relative order of (Z_1, Z_2, \ldots, Z_n). For example, consider P_Z(π) = P(Z_{\pi^{-1}(1)} < Z_{\pi^{-1}(2)} < \cdots < Z_{\pi^{-1}(n)}). If π^{-1}(i+1) > π^{-1}(i), the inequality Z_{\pi^{-1}(i)} < Z_{\pi^{-1}(i+1)} becomes Y_{\pi^{-1}(i)} + \cdots + Y_{\pi^{-1}(i+1)-1} > 0. Similarly, if π^{-1}(i+1) < π^{-1}(i), the inequality becomes Y_{\pi^{-1}(i+1)} + \cdots + Y_{\pi^{-1}(i)-1} < 0. It follows that the region in the hypercube where Z_{\pi^{-1}(1)} < Z_{\pi^{-1}(2)} < \cdots < Z_{\pi^{-1}(n)} is bounded by certain hyperplanes of the form a_1 x_1 + a_2 x_2 + \cdots + a_{n-1} x_{n-1} = 0, with a_1, a_2, \ldots, a_{n-1} ∈ {0, 1}, and so in all cases P_Z(π) can be interpreted as the volume of a region in an (n−1)-dimensional cube bounded by hyperplanes through the origin.

For the random walk with uniform steps and P(Y > 0) ≥ 1/2, the probabilities of each of the ordinal patterns of length 3 and length 4 are given in Table 2 in Appendix A. For a random walk with normal steps and µ = 0, the probabilities of each of the ordinal patterns of length 3 and length 4 are computed using Proposition 5, the spherical symmetry of the multivariate normal distribution, and the areas of spherical triangles. In particular, when µ = 0, the spherical symmetry of sums of normally distributed random variables tells us that the distribution of patterns is independent of the variance; this is not the case when µ ≠ 0.
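As an illustration of these computations (a Monte Carlo sketch of ours, not the paper's exact-volume calculation), for a normal random walk with µ = 0 the spherical-symmetry argument gives P_Z(123) = P_Z(321) = 1/4 and P_Z(π) = 1/8 for each of the four remaining patterns of length 3; a simulation reproduces these values:

```python
import random
from collections import Counter

def pattern(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return tuple(out)

rng = random.Random(0)
counts = Counter()
trials = 200_000
for _ in range(trials):
    y1, y2 = rng.gauss(0, 1), rng.gauss(0, 1)
    z = (0.0, y1, y1 + y2)          # Z_1, Z_2, Z_3 for one short walk
    counts[pattern(z)] += 1

est = {p: c / trials for p, c in counts.items()}
# the two monotone patterns each occupy a quarter of the probability,
# the other four an eighth each
assert abs(est[(1, 2, 3)] - 0.25) < 0.01
assert abs(est[(3, 2, 1)] - 0.25) < 0.01
assert abs(est[(1, 3, 2)] - 0.125) < 0.01
```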
This result allows us to determine the expected behavior of the distribution of ordinal patterns under the assumption that the data was generated by a particular random walk null model. Thus, we can compute the KL divergence between the expected value and empirical data to measure the portion of the behavior explained by the random walk model.

2.3. Examples. We conclude this section with two examples highlighting the differences between our models and the i.i.d. model that underlies permutation entropy. These demonstrate that for some data sets, the distributions derived from a random walk model match empirical data quite closely compared to the uniform distribution.
We begin by constructing a time series of length 2000 from a uniform random walk (U-DRIFT RW) by fixing b = P(Y > 0) = 0.65 and comparing the distribution of patterns of length 4 to the values derived from Proposition 5 as well as to the uniform values of 1/24. Figure 3 displays these results: the observed distributions are plotted in blue (on both graphs), while the gray bars represent the expected random walk distribution (left) and uniform distribution (right).
As expected, the observed values match the null model distribution much more closely than the uniform distribution. Note that the expected and observed values on the left do not match exactly because the empirical time series has finite length. This is a common feature of time series data that is observed throughout this paper.

Figure 3. The distribution of patterns of length n = 4 in U-DRIFT RW, listed in lexicographical order, compared to (a) the true distribution of patterns in the uniform random walk with P(Y > 0) = 0.65 (see Table 2) and (b) the distribution of patterns in white noise.
We next consider a similar analysis for economic market data, using the closing prices of the S&P 500 over a seven-year period (SP500). For this example, we need to estimate an underlying step distribution. To do this, we calculate the sequence of steps {X_{t+1} − X_t} (the stock returns) and find the best-fit normal curve, in this case obtaining parameters (µ, σ) = (0.702, 14.945). The null model for SP500 is the distribution of patterns for the normal random walk with these parameters. Using a simulated normal random walk, we approximate the distribution of patterns for a fixed n.
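The fitting step can be sketched as follows (our own illustration; `fit_normal_steps` simply takes the sample mean and population standard deviation of the observed returns, which is the maximum-likelihood normal fit):

```python
import random
import statistics

def fit_normal_steps(series):
    """Fit (mu, sigma) of a normal distribution to the steps X_{t+1} - X_t."""
    steps = [b - a for a, b in zip(series, series[1:])]
    return statistics.mean(steps), statistics.pstdev(steps)

def simulate_null_model(mu, sigma, N, seed=0):
    """Simulate a normal random walk with the fitted parameters."""
    rng = random.Random(seed)
    walk, z = [], 0.0
    for _ in range(N):
        walk.append(z)
        z += rng.gauss(mu, sigma)
    return walk
```

The pattern distribution of the simulated walk then serves as the null distribution against which the data are compared.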
This null model is shown in Figure 4 for n = 4 and n = 5. Note that this data displays a very similar shape to those in Figures 1 and 3 and is highly non-uniform. This reinforces our conclusion that random walk null models describe the behavior of such time series more effectively than permutation entropy.

Figure 4. The distribution of patterns, listed in lexicographical order, for the normal random walk null model for SP500 of length (left) n = 4 and (right) n = 5. Note that the distributions are far from uniform, as is characteristic of random walk data.
This example was computed with respect to a particular null model; however, there are many options for selecting the distribution of the steps Y. A discussion of the possible inferential processes for selecting Y given a particular data set is beyond the scope of this paper. However, for the purposes of comparing to permutation entropy, we consider several different choices of Y and compare their performance to the uniform distribution. These results are summarized in Figure 5 below.
We compare the distributions derived from the actual SP500 data to three random walk null models: (a) the normally distributed model described above with (µ, σ) = (0.702, 14.945), (b) a uniform model with P(Y > 0) = p_{12} = 0.5441, and (c) a uniform model fitting the stock returns with P(Y > 0) = 0.5279. The error between the expected values and the empirical values is shown for each permutation in Figure 5. Notice that each of the random walk models significantly outperforms the uniform model underlying permutation entropy on almost all permutations. The sum of squared errors for the uniform model is 0.0213, while for the random walk models it is (a) 0.0018, (b) 0.0027, and (c) 0.0031. Although there is some variance among the random walk models, they each convincingly outperform the uniform distribution.

Figure 5. Comparison of null model distributions for the SP500 data to the uniform distribution. The difference |p_π − P_Z(π)| is plotted for each of the four null models.

Equality in Any Random Walk
Although the distributions of permutations under random walk null models are not uniform, they are still constrained in some ways by the structure of the models, particularly the assumption of i.i.d. steps. This is reflected in the characteristic distribution shapes displayed in the figures above. The possible behaviors of these models were recently considered in [31], giving a classification of permutations that must occur with the same probability in any random walk model. Here we use related results to characterize distributions in terms of their proximity to the random walk constraints.
The existence of nontrivial equivalence classes of permutations that appear with the same frequency in any random walk is an important distinguishing characteristic of patterns in this context. To illustrate the symmetries underlying this feature of permutations in a random walk, we present two results describing constraints that must occur in this setting.

Proposition 6. If Z is a random walk with symmetric steps, then P_Z(π) = P_Z(π^c), where π^c(i) := (n + 1) − π(i) is the complement of π.
Proof. Since Z_j = \sum_{i=1}^{j-1} Y_i is a sum of symmetric random variables, Z_j is itself symmetric. Thus, for any 1 ≤ j, k ≤ n, we have P(Z_j > Z_k) = P(Z_j < Z_k). It follows that P_Z(π) = P_Z(π^c).

The symmetry condition in Proposition 6 is quite strong. In particular, it will not apply to real-world data containing drift or expected long-term gain. In contrast, Proposition 7 holds for any random walk, regardless of the underlying distribution of steps.
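Proposition 6 is easy to observe numerically; the following sketch (ours) estimates pattern frequencies in a long driftless normal walk and checks that complementary patterns such as 123 and 321 occur with approximately equal frequency:

```python
import random
from collections import Counter

def pattern(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return tuple(out)

def complement(p):
    """pi^c(i) = (n + 1) - pi(i)."""
    n = len(p)
    return tuple(n + 1 - v for v in p)

rng = random.Random(0)
walk, z = [], 0.0
for _ in range(100_000):
    walk.append(z)
    z += rng.gauss(0, 1)            # symmetric steps, no drift

n = 3
counts = Counter(pattern(walk[t:t + n]) for t in range(len(walk) - n + 1))
total = sum(counts.values())
freq = {p: c / total for p, c in counts.items()}
for p in list(freq):
    assert abs(freq[p] - freq[complement(p)]) < 0.02
```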
In particular, Proposition 7 explains why the probabilities of certain permutations, such as 1243 and 2134, are equal in each of the distributions considered in Table 2. In [31], Proposition 7 is extended to give a complete characterization of the classes of patterns that appear with the same frequency, regardless of distribution.
In particular, the patterns that are listed in the same line in Table 2 occur with the same probability in any random walk, regardless of the distribution associated to the steps. The full decomposition into equivalence classes is presented in Table 2 in Appendix B. This explicit decomposition had not been previously computed for n = 4, 5.
Next we use this structure to define a simple test for determining whether a random walk may be an appropriate choice of model based on these equivalence classes. For each equivalence class Λ_i ⊂ S_n of permutations occurring with the same probability in any random walk, define µ_i = \frac{1}{|\Lambda_i|} \sum_{\pi \in \Lambda_i} p_\pi. We let g_n(X) be the total variation from the mean across the equivalence classes:

g_n(X) = \sum_i \sum_{\pi \in \Lambda_i} |p_\pi - \mu_i|.

Thus, g_n(X) measures the amount of the distribution of permutations that remains unexplained by any random walk model. Figure 6 demonstrates how the value of g_n(X) evolves for a normal random walk and a sequence of i.i.d. randomly generated data points (RAND) for n = 4 and n = 5. As predicted above, the values of g_n(X) go to zero as N → ∞, but this requires a large number of data points, echoing our comment in Section 2.3 about finite-length time series. This is further supported by the observation that the random walk and the i.i.d. model appear to converge at the same rate, suggesting that the discrepancy is caused by the finite number of time steps.
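A sketch of this test (our own implementation), taking the partition into equivalence classes as input; for n = 3, Propositions 6 and 7 together with the classification in [31] give the classes {123}, {321}, {132, 213}, {231, 312}, which we use as an example:

```python
from collections import Counter

def pattern(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return tuple(out)

def g(series, classes):
    """Total variation of the empirical pattern proportions from their
    class means, summed over the given equivalence classes."""
    n = len(next(iter(classes[0])))
    counts = Counter(pattern(series[t:t + n])
                     for t in range(len(series) - n + 1))
    total = sum(counts.values())
    p = {pi: counts.get(pi, 0) / total for cls in classes for pi in cls}
    result = 0.0
    for cls in classes:
        mu = sum(p[pi] for pi in cls) / len(cls)
        result += sum(abs(p[pi] - mu) for pi in cls)
    return result

# equivalence classes of S_3 for arbitrary random walks (per [31])
CLASSES_3 = [[(1, 2, 3)], [(3, 2, 1)],
             [(1, 3, 2), (2, 1, 3)], [(2, 3, 1), (3, 1, 2)]]
```

For data generated by a genuine random walk, g should be small and shrink as N grows; a model violating the class structure produces a persistently large value.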
As an example of a model that does not respect these classes, consider a sequence of random variables Z = {Z_i}_{i=1}^{∞} with Z_i = \sum_{j=1}^{i-1} Y_j, where the steps Y_j are not identically distributed. With a suitable choice of step distributions, the pattern 1243 occurs with a relatively high frequency while the pattern 2134 is forbidden, leading to a large value of g_n(X) in expectation. Notice that Z is not a random walk because the steps Y_j are not i.i.d., and hence this sequence does not contradict any of our previous propositions.
In Section 5, we calculate ε_{12⋯(n−1)n} and ε_{n(n−1)⋯21} for several stocks and discover the effects of market momentum in the data.

KL Divergence Method
As described in Section 1.2, it is natural to interpret the permutation entropy of a time series as a measure of the divergence of the distribution of ordinal patterns from the uniform distribution as in white noise. Here, we compute the KL divergence to the distribution of patterns determined by a random walk model as a more nuanced measure of complexity. This is particularly relevant for data that is expected to be generated from a random walk process, such as stock closing prices. We also consider some periodic weather and heart rate data whose behavior lies in between these extremes.
This measure more accurately reflects the underlying process that generates our data. This is important as it allows us to more accurately explain the behavior of the time series. Additionally, observed deviations from the model are more meaningful in this setting since the random walk is chosen as a purposeful null model, rather than occurring as an artifact, as in the case of permutation entropy.
In the remainder of this paper, we construct null models by sampling from the distribution of observed steps in the data, as described below. This approach has two advantages: first, we need not artificially select a particular inferential framework; second, it allows us to control for variance by generating many samples and comparing them to the observed data. Differences between the models and the empirical time series are then attributable to correlation between the steps.
To determine how the behavior of these data sets deviates from a random walk, we compute the relative frequency p_π of each of the patterns π of length four in the daily closing values X. Next, we construct a random walk model Z of length MN (for some large integer M), whose steps are drawn at random from the distribution of steps {X_2 − X_1, X_3 − X_2, \ldots, X_N − X_{N-1}} in the original time series; we refer to Z as the random walk associated with X. For each time series, we determine the deviation from the model by computing

D_{KL_n}(X) = \sum_{\pi \in S_n} p_\pi \log \frac{p_\pi}{q_\pi},

where p_π is the relative frequency of π in X and q_π is the relative frequency of π in Z. In order to directly compare our results to permutation entropy, we computed 1 − PE_n and D_{KL_n}(X) for each of the data sets RAND, HEART, MEX, NYC, SP500, GE, and NORM RW. The results are displayed in Figure 7: the permutation entropy is plotted on the left and the random walk KL divergence on the right.

Figure 7. On the left, we compute PE_n for the time series for n = 4 (in blue) and n = 5 (in orange). On the right, we compute D_{KL_n} for n = 4 and the data of length N = 2000 (blue). We generate 400 random walks X̃ of length N = 2000 and compute D_{KL_n}(X̃) for each. The mean and errors are plotted in gray.

As shown in Figure 7(a), the changes in heart rate are more correlated than the steps in the other time series investigated here. However, when considering the KL divergence method, the random data (RAND) is the time series furthest from a random walk. This supports our view that the KL divergence method is frequently a better measure of deviation from a random walk than the permutation entropy of steps. The weather data sets are an interesting example where the structure is periodic and hence neither uniformly random nor a fixed random walk. Thus, we see moderate performance under both measures. However, notice that PE_n only slightly distinguishes the temperature data from a simulated random walk, while the D_{KL_n} measure clearly separates them.

To add context to the value of the KL divergence, we simulated 400 random walks X̃ associated with X of length N and calculated D_{KL_n}(X̃) for each. Using these simulations, we calculate the mean and standard deviation of the KL divergence of the simulated random walks against the model. These are plotted in Figure 7(b). Notice that the stock data is much better approximated by the random walk of its steps than any of the other time series.
Finally, in order to determine how the length of the time series affects D_{KL}, we simulate a mean-zero uniform random walk X̃ of length N and compare it to the expected distribution of patterns in the random walk. The results mirror those of Figure 6. For X̃ of length N = 1000, D_{KL_4}(X̃) ≈ 0.10 and D_{KL_5}(X̃) ≈ 0.11. For X̃ of length N = 5000, D_{KL_4}(X̃) ≈ 0.07, where it remains for larger N, while D_{KL_5}(X̃) ≈ 0.01, falling to 0.007 when N = 10,000. This is expected behavior, as the value goes to zero in expectation in the limit.
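The divergence computation above can be sketched as follows (our implementation; we add one pseudo-count to every pattern in the simulated model so that q_π > 0, a smoothing choice not specified in the paper):

```python
import math
import random
from collections import Counter

def pattern(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return tuple(out)

def pattern_counts(series, n):
    return Counter(pattern(series[t:t + n])
                   for t in range(len(series) - n + 1))

def kl_to_associated_walk(series, n, factor=20, seed=0):
    """D_KL from the pattern distribution of `series` to that of the random
    walk built by resampling its own steps (`factor` times longer)."""
    rng = random.Random(seed)
    steps = [b - a for a, b in zip(series, series[1:])]
    walk, z = [], 0.0
    for _ in range(factor * len(series)):
        walk.append(z)
        z += rng.choice(steps)
    p = pattern_counts(series, n)
    q = pattern_counts(walk, n)
    p_total = sum(p.values())
    q_total = sum(q.values()) + math.factorial(n)  # +1 pseudo-count each
    kl = 0.0
    for pi, c in p.items():
        p_pi = c / p_total
        q_pi = (q.get(pi, 0) + 1) / q_total
        kl += p_pi * math.log2(p_pi / q_pi)
    return kl
```

A series that is itself a random walk should yield a value near zero, while strongly correlated steps drive the divergence up.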
Expanding on our remarks from the previous section, permutation entropy has frequently been used to study financial time series. For instance, the permutation entropy and the number of forbidden patterns for both closing values and returns were suggested as methods for distinguishing developed and emerging markets, with the aim of using these measures to quantify stock market inefficiency [41]. In that analysis, the permutation entropy of returns was correlated with being a developed or an emerging market, with emerging markets having smaller permutation entropy (i.e. more correlation). We plot these values for our data sets below and perform a more direct comparison in the following section.

Figure 8. We compute PE_n for the time series of steps X_t − X_{t-1} for n = 4 (in blue) and n = 5 (in orange). This can be used as a measure of step independence and was presented in [29] as a measure of volatility in developing economic markets.
A careful analysis of this method demonstrates some key features that lead us to prefer the explicit random walk model. First, we note that this measure assigns very low values to all of the stock data. As we will see in the next section, this property limits the amount of information that can be extracted. Secondly, we note that the measure does not clearly distinguish the periodic weather data from the random walks. Finally, the permutation entropy of the steps assigns a relatively high value to i.i.d. randomly drawn data points because the differences between the random variables are not independent.

Figure 9. Values of ε_π for π = 1234 in blue, and π = 4321 in red. Larger values of ε_π correspond to markets containing longer increasing (resp. decreasing) runs than predicted by the associated random walk model.

Inefficiency in Financial Markets
In this final section, we analyze the stock market data more closely, using the KL method and the measure of momentum introduced above. Economic heuristics suggest that the most appropriate model of the stock market is a random walk; see, for example, [16]. Moreover, since a market whose prices are modeled by a random walk is considered efficient, the divergence of a market from a random walk serves as a measure of inefficiency [16]. Developing meaningful measures of market inefficiency is an important and well-studied problem in finance. Applying our method from Section 4 to a variety of stocks, we argue that a measure of inefficiency based on the KL divergence from a random walk null model is preferable to the permutation entropy of returns.
First, using the measures ε_12...(n−1)n := p_12...(n−1)n − (p_12)^(n−1) and ε_n(n−1)...21 := p_n(n−1)...21 − (p_21)^(n−1) developed in Section 3, we capture the momentum phenomenon observed in financial markets. Indeed, ε_12...(n−1)n > 0 suggests the presence of upward momentum and ε_n(n−1)...21 > 0 the presence of downward momentum. As depicted in Figure 9, for each of the stocks considered, the values of ε_π for π = 1234 and π = 4321 are positive, suggesting the presence of both upward and downward momentum in these markets. Both of these results accord with economic data reported by the NBER [21,22]. This suggests that a random walk may not capture all of the information about the stock behavior, since momentum is a measure of correlation between the steps, which we have assumed to be i.i.d. Nevertheless, we conclude with two examples demonstrating the advantages of the random walk divergence over permutation entropy. For each of the stocks under consideration, we form 400 random walks X̃ associated to X of length N = 3777 (the length of X). Then, to determine the significance of D_KL4(X), we compute D_KL4(X̃) for each.
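These excess-run statistics are straightforward to estimate from data. The sketch below is our own illustration (the names `epsilon_up` and `epsilon_down` are ours), applied to a simulated walk, where both values should be near zero since i.i.d. steps carry no genuine momentum:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def epsilon_up(x, n=4):
    """eps_{12...n} = p_{12...n} - (p_{12})^(n-1): excess of increasing patterns."""
    steps = np.diff(x)
    p12 = np.mean(steps > 0)                  # probability of a single up-step
    runs = sliding_window_view(steps, n - 1)  # windows of n-1 consecutive steps
    p_up = np.mean(np.all(runs > 0, axis=1))  # frequency of the pattern 12...n
    return p_up - p12 ** (n - 1)

def epsilon_down(x, n=4):
    """eps_{n(n-1)...21} = p_{n...1} - (p_{21})^(n-1): excess of decreasing patterns."""
    steps = np.diff(x)
    p21 = np.mean(steps < 0)
    runs = sliding_window_view(steps, n - 1)
    return np.mean(np.all(runs < 0, axis=1)) - p21 ** (n - 1)

rng = np.random.default_rng(2)
walk = np.cumsum(rng.normal(size=10000))  # i.i.d. steps: no momentum
print(epsilon_up(walk), epsilon_down(walk))
```

For the stock series of Figure 9, the analogous quantities come out positive, indicating runs longer than the i.i.d.-step null predicts.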
The results of this experiment are presented in Figure 10. As depicted there, Apple stock (AAPL) was furthest from a random walk, perhaps a result of calendar-year phenomena associated with the release of new products. On the other hand, large industrial stocks such as General Electric, Coca-Cola, and United Parcel Service (resp. GE, KO, and UPS) adhere more closely to the random walk model and are considered more efficient markets in this analysis.

Figure 10. On the left, we compute PE_n for the time series of steps for n = 4 (in blue) and n = 5 (in orange). On the right, we compute D_KLn for n = 4 and the data of length N = 2000 (blue). We generate 400 random walks X̃ (associated with X) of length N = 2000 and compute D_KLn(X̃) for each. The mean and errors are plotted in gray.
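The significance computation can be sketched as follows. This is our own illustrative code on synthetic data (actual closing prices would replace `x`), using 100 surrogate walks rather than 400 to keep the example fast:

```python
import numpy as np
from collections import Counter
from itertools import permutations

def pattern_dist(x, n=4):
    """Empirical order-n ordinal pattern distribution as a vector over all n! patterns."""
    counts = Counter(tuple(np.argsort(x[i:i + n])) for i in range(len(x) - n + 1))
    total = sum(counts.values())
    return np.array([counts[p] / total for p in permutations(range(n))])

def d_kl(p, q, eps=1e-12):
    """KL divergence D(p || q) between two pattern distributions."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], eps))))

rng = np.random.default_rng(3)
x = np.cumsum(rng.standard_t(df=3, size=3777))  # stand-in for a stock price series
steps = np.diff(x)

# Null distribution: a long random walk whose steps are resampled from the data.
q = pattern_dist(np.cumsum(rng.choice(steps, size=100000)))
observed = d_kl(pattern_dist(x), q)

# Surrogate walks associated to x: same length, independently resampled steps.
surrogate_vals = [d_kl(pattern_dist(np.cumsum(rng.choice(steps, size=len(steps)))), q)
                  for _ in range(100)]
p_value = float(np.mean([v >= observed for v in surrogate_vals]))
print(observed, p_value)
```

A small p-value would indicate that the observed pattern distribution deviates from its associated random walk by more than chance allows; for the i.i.d.-step series used here, the observed value should look typical among the surrogates.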
As a final application of these methods, we use historical S&P500 closing prices from January 1958 until January 2017 and plot our measure of inefficiency, D_KL4, over time, comparing it to the permutation entropy of the steps. For each year from 1960 until 2014, we compute D_KL4 for the S&P500 over the five-year range surrounding that year (i.e., from January 1 of two years prior to December 31 of two years after, N ≈ 1258); see Figure 11.
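The rolling-window construction can be sketched as follows. This is our own code on synthetic data (at roughly 252 trading days per year, a five-year window here has N = 1260, close to the N ≈ 1258 above):

```python
import numpy as np
from collections import Counter
from itertools import permutations

def d_kl_patterns(x, ref, n=4, eps=1e-12):
    """KL divergence between the order-n pattern distributions of x and ref."""
    def dist(series):
        c = Counter(tuple(np.argsort(series[i:i + n])) for i in range(len(series) - n + 1))
        t = sum(c.values())
        return {p: c[p] / t for p in permutations(range(n))}
    p, q = dist(x), dist(ref)
    return sum(p[k] * np.log(p[k] / max(q[k], eps)) for k in p if p[k] > 0)

rng = np.random.default_rng(4)
prices = np.cumsum(rng.normal(size=252 * 12))  # stand-in for ~12 years of daily closes
window = 5 * 252                               # five-year window

values = []
for start in range(0, len(prices) - window + 1, 252):  # slide one year at a time
    segment = prices[start:start + window]
    # Null model: a long walk with steps resampled from this window.
    null_walk = np.cumsum(rng.choice(np.diff(segment), size=50000))
    values.append(d_kl_patterns(segment, null_walk))
print(np.round(values, 3))
```

Plotting `values` against the center year of each window gives a curve of the kind shown in Figure 11.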
The general trends depicted in the plot of D_KL4 resonate with the evolution of technology and the economic events of that period, while the permutation entropy of the steps is less informative. In particular, we can see the decline in inefficiency as a result of computerized trading, as well as the stock market crash of 1989, the 2000 technology bubble, and the 2008 financial crisis causing an increase in variability and distance from the model. The results presented here are similar to those in [19] for the Shanghai and Shenzhen Stock Exchanges.

Figure 11. Computation of D_KL4 (orange) and the permutation entropy of the steps (blue) on historical S&P500 daily closing prices during each 5-year window surrounding the year on the x-axis. Both of these metrics can be treated as a proxy for inefficiency, but D_KL4 provides significantly more information.

Conclusion
In order to account for the observed behavior of the distribution of ordinal patterns in time series from economics and other fields, we have introduced a measure of complexity based on random walk null models. Since much of the structure of the ordinal patterns appearing in these financial time series is explained by the underlying process of a random walk, this measure is better suited for such time series than previous methods based on permutation entropy. We have provided theoretical and numerical results on the distribution of patterns in the context of random walk models, along with a set of tools for analyzing the complexity of data modeled by time series. Additionally, we have applied our methods to examples from several different domains in order to validate their usefulness. Not all time series data plausibly arise from random walk processes, but for those that do, the methods presented in this paper provide a principled approach to studying their complexity and inefficiency.

Appendix

Here we give the expected distributions of ordinal patterns for the uniform and normal random walk models as determined by Proposition 5. Recall that for the normal distribution the values do not depend on the variance when the mean is zero. For the uniform distribution, the µ = 0 case is equivalent to setting b = 1/2.

Table 2. The values of P_Z(π) for the normal distribution with µ = 0 and in the uniform case for P(Y > 0) = b, where 1/2 ≤ b ≤ 1.
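While Table 2 itself is not reproduced here, the flavor of these null-model values can be checked by simulation. For a zero-mean normal random walk, a direct symmetry computation (ours, not read from Table 2) gives order-3 pattern probabilities of 1/4 for each monotone pattern and 1/8 for each of the four remaining patterns; the Monte Carlo sketch below is consistent with this:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(5)
walk = np.cumsum(rng.normal(size=200000))  # normal steps, mu = 0

# Tally order-3 ordinal patterns; (0, 1, 2) encodes the increasing pattern 123, etc.
counts = Counter(tuple(np.argsort(walk[i:i + 3])) for i in range(len(walk) - 2))
total = sum(counts.values())
for pattern in sorted(counts):
    print(pattern, round(counts[pattern] / total, 3))
```

The monotone patterns 123 and 321 each require two same-sign steps (probability 1/4), while each mixed pattern additionally splits its sign configuration in half by symmetry of the step magnitudes (probability 1/8).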