Recently, the permutation-information theoretic approach to time series analysis proposed by Bandt and Pompe [1] has become popular in various fields [2]. It has been shown that the permutation method is easy to implement relative to other traditional methods and is robust in the presence of noise [3]. However, if we turn to its theoretical side, few results are known for the permutation analogues of information-theoretic measures, except for the entropy rate.
There are two approaches to introducing permutations into dynamical systems theory [8]. The first approach was introduced by Bandt et al. [9]. Given a one-dimensional interval map, they considered permutations induced by iterations of the map. Each point in the interval is classified into one of the $n!$ permutations according to the permutation defined by $n - 1$ iterations of the map starting from that point. Then, the Shannon entropy of this partition (called the standard partition) of the interval is taken and normalized by $n$. The quantity obtained in the limit $n \to \infty$ is called the permutation entropy if it exists. It was proven that the permutation entropy is equal to the Kolmogorov-Sinai entropy for any piecewise monotone interval map [9]. This approach based on the standard partitions was extended by Keller et al.
The second approach is taken by Amigó et al. In this approach, given a measure-preserving map on a probability space, an arbitrary finite partition of the space is first taken. This gives rise to a finite-alphabet stationary stochastic process. An arbitrary ordering is introduced on the alphabet, and the permutations of words of finite length can be naturally defined (see Section 2 below). It is proven that the Shannon entropy of the occurrence of the permutations of a fixed length, normalized by the length, converges as the length of the permutations tends to infinity. The quantity obtained is called the permutation entropy rate (also called the metric permutation entropy) and is shown to be equal to the entropy rate of the process. By taking the limit over finer partitions of the measurable space, the permutation entropy rate of the measure-preserving map is defined if the limit exists. Amigó [16] proved that it exists and is equal to the Kolmogorov-Sinai entropy.
In this paper, we restrict our attention to finite-alphabet stationary stochastic processes. Thus, we follow the second approach, namely, an ordering on the alphabet is introduced arbitrarily. For quantities other than the entropy rate, three results for finite-alphabet stationary ergodic Markov processes have been shown in our previous work: the equality between the excess entropy and the permutation excess entropy [17], the equality between the mutual information expression of the excess entropy and its permutation analogue [18] and the equality between the transfer entropy rate and the symbolic transfer entropy rate [19]. Whether these equalities for the permutation entropies can be extended to general finite-alphabet stationary ergodic stochastic processes is still unknown. However, for the modified permutation entropies, defined by the partition of the set of words based on both permutations and equalities between occurrences of symbols (a partition finer than the one obtained from permutations alone), the corresponding equalities hold for general finite-alphabet stationary ergodic stochastic processes [20].
The purpose of this paper is to generalize our previous results on the permutation entropies for finite-alphabet stationary ergodic Markov processes to output processes of finite-state finite-alphabet hidden Markov models with ergodic internal processes. Upon this generalization, the somewhat ad hoc proofs in our previous work for multivariate stationary ergodic Markov processes become straightforward. The key property of hidden Markov models (HMMs), which we will use repeatedly, is the following: a marginal process of the output process of an HMM with an ergodic internal process is, again, the output process of an HMM with an ergodic internal process obtained from the original one. In general, this property does not hold for multivariate stationary ergodic Markov processes. The generalization also gives us easy access to quantities that have not been considered theoretically in the permutation approach. In this paper, we shall treat the following quantities: excess entropy [21], transfer entropy [22], momentary information transfer [24] and directed information [25]. As far as the authors are aware, the equality between the momentary information transfer and its permutation analogue, and that for the directed information, have not been discussed anywhere. These equalities could be proven directly, with some discussion beyond that in [17], for finite-alphabet multivariate stationary ergodic Markov processes. However, they can be proven straightforwardly, as in [17], within the realm of HMMs with ergodic internal processes, once we show Lemma 3 below.
This paper is organized as follows: In Section 2, we briefly review our previous result on the duality between words and permutations to make this paper as self-contained as possible. In Section 3, we prove a lemma about finite-state finite-alphabet hidden Markov models. In Section 4, we show equalities between various information-theoretic complexity and coupling measures and their permutation analogues that hold for output processes of finite-state finite-alphabet hidden Markov models with ergodic internal processes. In Section 5, we discuss how our results are related to the previous work in the literature.
2. The Duality between Words and Permutations
In this section, we summarize the results from our previous work [17] that will be used in this paper.
Let $A_n = \{1, 2, \ldots, n\}$ be a finite set consisting of the natural numbers from one to $n$, called an alphabet. In this paper, $A_n$ is considered as a totally ordered set ordered by the usual “less-than-or-equal-to” relationship. When we emphasize the total order, we call $A_n$ an ordered alphabet.

Note that the results in this paper hold for every total order on $A_n$. This is because the probability of the occurrence of the permutations in a given stationary stochastic process over $A_n$ with an arbitrary total order is just a re-indexing of that with the “less-than-or-equal-to” total order.
The set of all permutations of length $L$ is denoted by $S_L$. Namely, $S_L$ is the set of all bijections $\pi$ on the set $\{1, 2, \ldots, L\}$. For convenience, we sometimes denote a permutation $\pi$ of length $L$ by the string $\pi(1)\pi(2)\cdots\pi(L)$. The number of descents, places $i$ with $\pi(i) > \pi(i+1)$, of $\pi \in S_L$ is denoted by $d(\pi)$. For example, if $\pi \in S_5$ is given by $\pi = 23145$, then $d(\pi) = 1$.
Let $A_n^L$ be the $L$-fold product of $A_n$. A word of length $L$ is an element of $A_n^L$. It is denoted by $x_1^L = x_1 x_2 \cdots x_L$. We say that the permutation type of a word $x_1^L$ is $\pi \in S_L$ if we have $x_{\pi(1)} \le x_{\pi(2)} \le \cdots \le x_{\pi(L)}$ and $\pi(i) < \pi(i+1)$ when $x_{\pi(i)} = x_{\pi(i+1)}$ for $1 \le i \le L - 1$. Namely, the permutation type of $x_1^L$ is the permutation of indices defined by re-ordering the symbols $x_1, \ldots, x_L$ in increasing order. For example, the permutation type of $x_1^5 = 31233$ is $\pi = 23145$, because $x_2 \le x_3 \le x_1 \le x_4 \le x_5$.
Let $\phi: A_n^L \to S_L$ be a map sending each word $x_1^L$ to its permutation type $\pi$. For example, the map $\phi: A_2^3 \to S_3$ is given by:

$$ \phi(111) = \phi(112) = \phi(122) = \phi(222) = 123, \quad \phi(121) = 132, \quad \phi(211) = 231, \quad \phi(212) = 213, \quad \phi(221) = 312 $$

This example illustrates the following two properties of the map $\phi$. First, $\phi\left(A_n^L\right)$ can be a proper subset of $S_L$; here, $321$ is not the permutation type of any word in $A_2^3$. As one can see from Theorem 1 below, $\phi\left(A_n^L\right)$ is a proper subset of $S_L$, if and only if $n < L$. Second, two different words can have the same permutation type.
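To make the tie-breaking rule concrete, the following Python snippet (a minimal sketch of ours, not code from [17]) computes permutation types and reproduces the example map $\phi: A_2^3 \to S_3$ above:

```python
from itertools import product

def permutation_type(word):
    """Permutation type of a word: positions (1-based) sorted by value,
    with ties broken by position, as in the definition above."""
    return tuple(sorted(range(1, len(word) + 1),
                        key=lambda i: (word[i - 1], i)))

# Reproduce the example phi: A_2^3 -> S_3 over the alphabet {1, 2}.
for w in product((1, 2), repeat=3):
    print(w, "->", permutation_type(w))
```

Running it confirms that $321$ is missing from the image and that, for example, $111$ and $112$ share the permutation type $123$.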
We define another map, $\psi: \phi\left(A_n^L\right) \to A_n^L$, by the following procedure:

(i) Given a permutation $\pi \in \phi\left(A_n^L\right)$, we decompose the sequence $\pi(1)\pi(2)\cdots\pi(L)$ of length $L$ into maximal ascending subsequences. A subsequence $\pi(i)\pi(i+1)\cdots\pi(j)$ of a sequence $\pi(1)\pi(2)\cdots\pi(L)$ of length $L$ is called a maximal ascending subsequence if it is ascending, namely, $\pi(i) < \pi(i+1) < \cdots < \pi(j)$, and neither $\pi(i-1)\pi(i)\cdots\pi(j)$ nor $\pi(i)\pi(i+1)\cdots\pi(j)\pi(j+1)$ is ascending;

(ii) if $\pi(1)\pi(2)\cdots\pi(L) = \pi(i_1)\cdots\pi(j_1)\,\pi(i_2)\cdots\pi(j_2)\,\cdots\,\pi(i_k)\cdots\pi(j_k)$ is a decomposition of $\pi(1)\pi(2)\cdots\pi(L)$ into $k$ maximal ascending subsequences, then a word $x_1^L$ is defined by:

$$ x_{\pi(i_m)} = x_{\pi(i_m + 1)} = \cdots = x_{\pi(j_m)} = m, \quad m = 1, 2, \ldots, k $$

Note that $k = d(\pi) + 1 \le n$, because $\pi$ is the permutation type of some word $y_1^L \in A_n^L$. Thus, we have $x_1^L \in A_n^L$, and $\psi$ is well-defined as a map from $\phi\left(A_n^L\right)$ to $A_n^L$.
By construction, we have $\phi(\psi(\pi)) = \pi$ for all $\pi \in \phi\left(A_n^L\right)$. To illustrate the construction of $\psi$, let us consider the word $x_1^5 = 31233$. The permutation type of $31233$ is $\pi = 23145$. The decomposition of $23145$ into maximal ascending subsequences is $23, 145$. We obtain $\psi(\pi) = 21122$ by putting $x_2 = x_3 = 1$ and $x_1 = x_4 = x_5 = 2$. In particular, $\psi\left(\phi\left(x_1^L\right)\right) = x_1^L$ holds, if and only if $x_1^L \in \psi\left(\phi\left(A_n^L\right)\right)$.
Theorem 1. Let us put:

$$ \bar{A}_n^L = \psi\left(\phi\left(A_n^L\right)\right), \qquad \bar{S}_{n,L} = \left\{ \pi \in S_L \mid d(\pi) \le n - 1 \right\} $$

Then, $\phi$ restricted on $\bar{A}_n^L$ is a map into $\bar{S}_{n,L}$ and $\psi$ restricted on $\bar{S}_{n,L}$ is a map into $\bar{A}_n^L$. They form a pair of mutually inverse maps. Furthermore, we have:

(i) $\phi\left(A_n^L\right) = \bar{S}_{n,L}$;

(ii) $\left| \phi^{-1}(\pi) \right| = \binom{L + n - 1 - d(\pi)}{L}$ for any $\pi \in \bar{S}_{n,L}$.
The theorem is a recasting of statements in Lemma 5 and Theorem 9 in [17].
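For small $n$ and $L$, the duality can be verified by brute force. The following Python sketch (our own illustration; the function names are not from [17]) implements $\phi$ and $\psi$ and checks the statements of Theorem 1 for $n = 3$ and $L = 4$:

```python
from itertools import product, permutations
from collections import Counter
from math import comb

def permutation_type(word):
    """Permutation type: positions sorted by value, ties broken by position."""
    return tuple(sorted(range(1, len(word) + 1),
                        key=lambda i: (word[i - 1], i)))

def canonical_word(pi):
    """The word psi(pi): the position pi(j) receives the symbol m when pi(j)
    lies in the m-th maximal ascending run of the string pi(1)...pi(L)."""
    word = [0] * len(pi)
    run = 1
    for j, p in enumerate(pi):
        if j > 0 and p < pi[j - 1]:   # a descent starts a new ascending run
            run += 1
        word[p - 1] = run
    return tuple(word)

def descents(pi):
    return sum(pi[i] > pi[i + 1] for i in range(len(pi) - 1))

n, L = 3, 4
counts = Counter(permutation_type(w)
                 for w in product(range(1, n + 1), repeat=L))

# phi and psi are mutually inverse on their images.
assert all(permutation_type(canonical_word(pi)) == pi for pi in counts)
# The image of phi is exactly the set of permutations with d(pi) <= n - 1.
assert set(counts) == {pi for pi in permutations(range(1, L + 1))
                       if descents(pi) <= n - 1}
# Each pi has exactly binom(L + n - 1 - d(pi), L) pre-images under phi.
assert all(c == comb(L + n - 1 - descents(pi), L) for pi, c in counts.items())
print("duality checks passed for n = 3, L = 4")
```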
Let $\mathbf{X} = \{X_t\}_{t=1}^{\infty}$ be a finite-alphabet stationary stochastic process, where each stochastic variable $X_t$ takes its value in $A_n$. By the assumed stationarity, the probability of the occurrence of any word $x_1^L \in A_n^L$ is time-shift invariant: $\Pr\left\{X_{t+1}^{t+L} = x_1^L\right\} = \Pr\left\{X_1^L = x_1^L\right\}$ for all $t \ge 0$. Hence, it makes sense to define it without referring to the time to start. We denote the probability of the occurrence of a word $x_1^L$ by $p\left(x_1^L\right)$. The probability of the occurrence of a permutation $\pi \in S_L$ is given by:

$$ p(\pi) = \sum_{x_1^L \in \phi^{-1}(\pi)} p\left(x_1^L\right) $$
In the following, for a finite-alphabet stationary stochastic process $\mathbf{X}$ over the alphabet $A_n$, we work with the probabilities $p\left(x_1^L\right)$ and $p(\pi)$ defined above. The symbol $\lfloor a \rfloor$ denotes the largest integer not greater than a real number $a$.
Lemma 2. Let $\mathbf{X}$ be a finite-alphabet stationary stochastic process over $A_n$ and $\epsilon$ a positive real number. If $p\left(x_1^L\right) \le \epsilon$ for all $x_1^L \in A_n^L$, then we have $p(\pi) \le (L+1)^{n-1} \epsilon$ for all $\pi \in S_L$.

The claim follows from Theorem 1 (ii), because each $\pi \in S_L$ has at most $\binom{L+n-1}{L} \le (L+1)^{n-1}$ pre-images under $\phi$. See Lemma 12 in [17] for the complete proof.
3. A Result on Finite-State Finite-Alphabet Hidden Markov Models
In this paper, we use the parametric description of hidden Markov models as given in [27].
A finite-state finite-alphabet hidden Markov model (in short, HMM) [27] is a quadruple $\left(\Sigma, A, \{T^{(a)}\}_{a \in A}, \mu\right)$, where $\Sigma$ and $A$ are finite sets, called the state set and the alphabet, respectively, $\{T^{(a)}\}_{a \in A}$ is a family of $m \times m$ matrices indexed by the elements of $A$, where $m = |\Sigma|$ is the size of the state set $\Sigma$, and $\mu$ is a probability distribution on the set $\Sigma$. The following conditions must be satisfied:

(i) $T^{(a)}_{ij} \ge 0$ for any $a \in A$ and $i, j \in \Sigma$;

(ii) $\sum_{a \in A} \sum_{j \in \Sigma} T^{(a)}_{ij} = 1$ for any $i \in \Sigma$;

(iii) $\sum_{i \in \Sigma} \mu_i = 1$ and $\sum_{i \in \Sigma} \mu_i T_{ij} = \mu_j$ for any $j \in \Sigma$, where $T = \sum_{a \in A} T^{(a)}$.

Any probability distribution satisfying condition (iii) is called a stationary distribution. The matrix $T = \sum_{a \in A} T^{(a)}$ is called a state transition matrix. The triple $(\Sigma, T, \mu)$ defines the underlying Markov chain. Note that condition (iii) is equivalent to condition (iii'): $\mu T = \mu$ and $\sum_{i \in \Sigma} \mu_i = 1$, where $\mu$ is regarded as a row vector.
Two finite-alphabet stationary processes are induced by an HMM $\left(\Sigma, A, \{T^{(a)}\}_{a \in A}, \mu\right)$. One is solely determined by the underlying Markov chain. It is called an internal process and is denoted by $\mathbf{S} = \{S_t\}_{t=1}^{\infty}$. The alphabet for $\mathbf{S}$ is $\Sigma$. The joint probability distributions that characterize $\mathbf{S}$ are given by:

$$ \Pr\left\{S_1 = s_1, \ldots, S_L = s_L\right\} = \mu_{s_1} T_{s_1 s_2} \cdots T_{s_{L-1} s_L} $$

The other process $\mathbf{X} = \{X_t\}_{t=1}^{\infty}$, with the alphabet $A$, is defined by the following joint probability distributions:

$$ \Pr\left\{X_1 = a_1, \ldots, X_L = a_L\right\} = \sum_{i, j \in \Sigma} \mu_i \left( T^{(a_1)} T^{(a_2)} \cdots T^{(a_L)} \right)_{ij} $$

and is called an output process. The stationarity of the probability distribution $\mu$ ensures that of both the internal and output processes.
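For concreteness, the following Python sketch evaluates these joint probabilities for a small two-state HMM; the matrices and all numerical values are our own illustrative choices (a minimal sketch, not code from [27]):

```python
from itertools import product
import numpy as np

# A toy HMM with two states and alphabet {1, 2}; the numbers are
# illustrative only, chosen so that T = T1 + T2 is stochastic.
T1 = np.array([[0.3, 0.1],
               [0.2, 0.2]])
T2 = np.array([[0.1, 0.5],
               [0.1, 0.5]])
T = T1 + T2
assert np.allclose(T.sum(axis=1), 1.0)          # condition (ii)

# Stationary distribution mu: left Perron eigenvector of T (condition (iii)).
eigvals, eigvecs = np.linalg.eig(T.T)
mu = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
mu = mu / mu.sum()
assert np.allclose(mu @ T, mu)

def word_probability(word, matrices, mu):
    """p(a_1 ... a_L) = <mu, T^(a_1) ... T^(a_L) 1> for the output process."""
    v = mu
    for a in word:
        v = v @ matrices[a]
    return v.sum()

matrices = {1: T1, 2: T2}
print(word_probability((1, 2, 2), matrices, mu))

# Stationarity: the probabilities of all words of a fixed length sum to one.
assert np.isclose(sum(word_probability(w, matrices, mu)
                      for w in product((1, 2), repeat=3)), 1.0)
```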
Symbols $a \in A$, such that $T^{(a)} = 0$, occur in the output process with a probability of zero. Hence, we obtain an equivalent output process, even if we remove these symbols from $A$. Thus, we can assume that $T^{(a)} \ne 0$ for any $a \in A$ without loss of generality.
The internal process $\mathbf{S}$ of an HMM is called ergodic if the state transition matrix $T$ is irreducible [28]: for any $i, j \in \Sigma$, there exists $k > 0$, such that $\left(T^k\right)_{ij} > 0$. If the internal process $\mathbf{S}$ is ergodic, then the stationary distribution $\mu$ is uniquely determined by the state transition matrix $T$ via condition (iii). Every finite-alphabet finite-order multivariate stationary ergodic Markov process can be described as an HMM with an ergodic internal process.
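To illustrate the last statement in the simplest first-order, univariate case, a Markov chain with an irreducible transition matrix $T$ can be presented as an HMM that emits the state it moves into, by putting $T^{(a)}_{ij} = T_{ij}$ if $j = a$ and $T^{(a)}_{ij} = 0$ otherwise. A minimal sketch of this standard construction (the numbers are illustrative):

```python
import numpy as np

def markov_chain_as_hmm(T):
    """Embed a Markov chain as an HMM emitting the state it moves into:
    T^(a)_{ij} = T_{ij} if j == a, and 0 otherwise."""
    matrices = {}
    for a in range(T.shape[0]):
        Ta = np.zeros_like(T)
        Ta[:, a] = T[:, a]
        matrices[a] = Ta
    return matrices

# An irreducible two-state chain.
T = np.array([[0.9, 0.1],
              [0.4, 0.6]])
matrices = markov_chain_as_hmm(T)
assert np.allclose(sum(matrices.values()), T)   # the T^(a) sum back to T

# The output process reproduces the chain: p(01) = mu_0 * T_{01}.
eigvals, eigvecs = np.linalg.eig(T.T)
mu = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
mu = mu / mu.sum()
p01 = mu @ matrices[0] @ matrices[1] @ np.ones(2)
assert np.isclose(p01, mu[0] * T[0, 1])
```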
Lemma 3. Let $\mathbf{X}$ be the output process of an HMM $\left(\Sigma, A_n, \{T^{(a)}\}_{a \in A_n}, \mu\right)$, where $A_n$ is an ordered alphabet. If the internal process $\mathbf{S}$ of the HMM is ergodic, then for every $a \in A_n$, there exist $C > 0$ and $0 < \lambda < 1$, such that $p\left(a^L\right) \le C \lambda^L$ for all $L \ge 1$, where $a^L$ denotes the constant word $aa \cdots a$ of length $L$.
Proof. For $a \in A_n$ and $L \ge 1$, let us put $a^L = aa \cdots a$ ($L$ symbols). Fix an arbitrary $a \in A_n$. We have:

$$ p\left(a^L\right) = \sum_{i, j \in \Sigma} \mu_i \left( \left(T^{(a)}\right)^L \right)_{ij} = \left\langle \mu, \left(T^{(a)}\right)^L \mathbf{1} \right\rangle $$

where $\langle \cdot, \cdot \rangle$ is the usual inner product of the $m$-dimensional Euclidean space $\mathbb{R}^m$ and $\mathbf{1} = (1, 1, \ldots, 1)$. The spectral radius $\rho\left(T^{(a)}\right)$ of the matrix $T^{(a)}$ is less than one. Indeed, this follows immediately from the Perron-Frobenius theorem for non-negative irreducible matrices: $T$ is a non-negative irreducible matrix with $\rho(T) = 1$ by the assumption. Since $0 \le T^{(a)} \le T$ and $T^{(a)} \ne T$, applying Theorem 1.5 (e) in [29] implies that $\rho\left(T^{(a)}\right) < \rho(T) = 1$. By Lemma 5.6.10 in [30], for any $\epsilon > 0$, there exists a matrix norm $\| \cdot \|$, such that $\left\| T^{(a)} \right\| \le \rho\left(T^{(a)}\right) + \epsilon$. It follows that for any $\lambda$ with $\rho\left(T^{(a)}\right) < \lambda < 1$, there exists $C' > 0$, such that $\left\| \left(T^{(a)}\right)^L \right\|_2 \le C' \lambda^L$ for all $L \ge 1$, where $\| \cdot \|_2$ is the Euclidean norm. Since we have $\rho\left(T^{(a)}\right) < 1$, we can choose $\lambda$, so that $\rho\left(T^{(a)}\right) < \lambda < 1$. If we put $C = C' \|\mu\|_2 \|\mathbf{1}\|_2$, then we obtain $p\left(a^L\right) \le C \lambda^L$ by the Cauchy-Schwarz inequality, as desired.
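The argument can be checked numerically. The following sketch reuses the illustrative two-state HMM from above and shows that $\rho\left(T^{(a)}\right) < 1$ for each symbol and that $p\left(a^L\right)$ decays exponentially (a sanity check, not a proof):

```python
import numpy as np

T1 = np.array([[0.3, 0.1],
               [0.2, 0.2]])
T2 = np.array([[0.1, 0.5],
               [0.1, 0.5]])
T = T1 + T2

# Stationary distribution of the underlying chain.
eigvals, eigvecs = np.linalg.eig(T.T)
mu = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
mu = mu / mu.sum()

# rho(T^(a)) < 1 for each symbol, as the Perron-Frobenius argument predicts.
print("rho(T1) =", max(abs(np.linalg.eigvals(T1))))   # 0.4
print("rho(T2) =", max(abs(np.linalg.eigvals(T2))))   # 0.6

# p(a^L) = <mu, (T^(a))^L 1> decays exponentially in L.
for L in (1, 5, 10, 20):
    p_run = mu @ np.linalg.matrix_power(T1, L) @ np.ones(2)
    print(f"p(1^{L}) = {p_run:.3e}")
```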
5. Discussion

In this section, we discuss how our theoretical results in this paper are related to the previous work in the literature.
Being confronted with real-world time series data, we cannot take the limit of a large length of words. Hence, we have to estimate information rates from words of finite length. In such a situation, one permutation method could have some advantages over the other permutation methods. As a matter of fact, the transfer entropy on rank vectors (TERV) was originally proposed as an improved analogue of the symbolic transfer entropy (STE) [43]. However, it has been unclear whether they coincide in the limit of a large length of permutations. In this paper, we provide a partial answer to this question: the two permutation analogues of the transfer entropy rate, the rate of STE and the rate of TERV, are both equal to the transfer entropy rate for bivariate processes generated by HMMs with ergodic internal processes.
The Granger causality graph [46] is a model of the causal dependence structure in multivariate stationary stochastic processes. Given a multivariate stationary stochastic process, the nodes of its Granger causality graph are the components of the process. There are two types of edges: one is directed, and the other is undirected. The absence of a directed edge from one node to another indicates the lack of a Granger cause from the former to the latter relative to the other remaining processes. Similarly, the absence of an undirected edge between two nodes indicates the lack of an instantaneous cause between them relative to the other remaining processes. Amblard and Michel [40] proposed that the Granger causality graph can be constructed based on directed information theory: let $\mathbf{X} = \left(\mathbf{X}_1, \ldots, \mathbf{X}_N\right)$ be a multivariate finite-alphabet stationary stochastic process and $G = (V, E_d, E_u)$ the Granger causality graph of the process, where $V = \{1, \ldots, N\}$ is the set of nodes, $E_d$ is the set of directed edges and $E_u$ is the set of undirected edges. Their proposal is that:
(i) for any $i, j \in V$ with $i \ne j$, $(i, j) \notin E_d$, if and only if the causal conditional transfer entropy rate from $\mathbf{X}_i$ to $\mathbf{X}_j$, given the remaining component processes, vanishes;

(ii) for any $i, j \in V$ with $i \ne j$, $\{i, j\} \notin E_u$, if and only if the causal conditional instantaneous information exchange rate between $\mathbf{X}_i$ and $\mathbf{X}_j$, given the remaining component processes, vanishes.
Thus, in the Granger causality graph construction proposed in [40], the causal conditional transfer entropy rate captures the Granger cause from one process to another process relative to the other remaining processes. On the other hand, the causal conditional instantaneous information exchange rate captures the instantaneous cause between two processes relative to the other remaining processes.
Now, let us consider the case when $\mathbf{X}$ is an output process of an HMM with an ergodic internal process. Then, from the results of Section 4.4, we have:

(i) for any $i, j \in V$ with $i \ne j$, $(i, j) \notin E_d$, if and only if the symbolic causal conditional transfer entropy rate from $\mathbf{X}_i$ to $\mathbf{X}_j$, given the remaining component processes, vanishes;

(ii) for any $i, j \in V$ with $i \ne j$, $\{i, j\} \notin E_u$, if and only if the symbolic causal conditional instantaneous information exchange rate between $\mathbf{X}_i$ and $\mathbf{X}_j$, given the remaining component processes, vanishes.
Thus, the Granger causality graphs in the sense of [40] for multivariate processes generated by HMMs with ergodic internal processes can be captured in the language of permutation entropy, via the symbolic causal conditional transfer entropy rate and the symbolic causal conditional instantaneous information exchange rate. This statement opens up the possibility of applying the permutation approach to the problem of assessing the causal dependence structure of multivariate stationary stochastic processes. However, of course, the details of the practical implementation should be an issue of further study.
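To indicate what such an implementation might look like in the simplest bivariate, unconditioned case, the following Python sketch estimates a finite-length symbolic transfer entropy by a naive plug-in method. The function names, the pattern length $m = 3$, the synthetic driving data and the absence of any significance test are all our own illustrative assumptions, not a procedure from [40]:

```python
import numpy as np
from collections import Counter
from math import log

def ordinal_patterns(x, m):
    """Sequence of ordinal patterns of sliding windows of length m,
    with ties broken by position, as in Section 2."""
    return [tuple(sorted(range(m), key=lambda i: (x[t + i], i)))
            for t in range(len(x) - m + 1)]

def symbolic_te(x, y, m=3):
    """Naive plug-in estimate of a finite-length symbolic transfer entropy
    from y to x: I(next pattern of x; current pattern of y | current
    pattern of x). Bias correction and conditioning on further processes
    are omitted in this sketch."""
    sx, sy = ordinal_patterns(x, m), ordinal_patterns(y, m)
    triples = list(zip(sx[1:], sx[:-1], sy[:-1]))
    n = len(triples)
    p_abc = Counter(triples)
    p_ab = Counter((a, b) for a, b, _ in triples)
    p_bc = Counter((b, c) for _, b, c in triples)
    p_b = Counter(b for _, b, _ in triples)
    return sum((k / n) * log(k * p_b[b] / (p_ab[a, b] * p_bc[b, c]))
               for (a, b, c), k in p_abc.items())

rng = np.random.default_rng(0)
y = rng.integers(1, 4, size=5000)   # i.i.d. symbols from {1, 2, 3}
x = np.roll(y, 1)                   # x copies y with a one-step delay
print(symbolic_te(x, y))            # clearly positive: y drives x
print(symbolic_te(y, x))            # much smaller: no feedback
```

In a full Granger-graph construction, such estimates would be computed for every ordered pair of component processes, conditioned on all remaining ones, and an edge would be drawn only where the estimate is judged significantly different from zero.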
Real-world time series data are often multivariate. However, it seems that univariate analysis is still the mainstream in the field of ordinal pattern analysis (see, for example, the papers in [47]). We hope that this work stimulates the multivariate analysis of real-world time series data.