This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Using a recently discovered method for producing random symbol sequences with prescribed transition counts, we present an exact null hypothesis significance test (NHST) for mutual information between two random variables, the null hypothesis being that the mutual information is zero (

Mutual information is an information theoretic measure of dependency between two random variables [

Zero dependence occurs if and only if

In this article we are interested in the case that the marginal and joint probabilities are not known beforehand, but are approximated from data, so that estimates of

The problem of determining significance of dependency can be formulated as a chi-squared test [

To introduce the need for a significance test, suppose the random variables

In

The most probable value of mutual information is 0.3 bits/roll, which, if we did not know better, might seem significant considering that the total uncertainty in one die roll is $\log_2 6 \approx 2.585$ bits.
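The positive bias described above can be reproduced with a minimal sketch. It assumes the plug-in estimator, in which empirical frequencies stand in for the unknown probabilities; the function name `mutual_information`, the random seed, and the use of NumPy are illustrative choices and are not taken from the article.

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of the mutual information (in bits) of two symbol sequences."""
    # Map arbitrary symbols to integer codes 0..K-1.
    xi = np.unique(np.asarray(x), return_inverse=True)[1].ravel()
    yi = np.unique(np.asarray(y), return_inverse=True)[1].ravel()
    # Build the contingency table of joint counts.
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x as a column
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y as a row
    nz = pxy > 0                         # skip empty cells (0 log 0 = 0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px * py)[nz])))

# Two independent fair dice rolled 75 times each (seed chosen arbitrarily).
rng = np.random.default_rng(0)
x = rng.integers(1, 7, size=75)
y = rng.integers(1, 7, size=75)
print(f"estimated mutual information: {mutual_information(x, y):.3f} bits/roll")
print(f"uncertainty of one fair die:  {np.log2(6):.3f} bits")
```

Although the dice are independent, the estimate is typically well above zero for a sample of this size, which is why a significance test is needed.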

The true significance of

The logic we are describing is that of a null hypothesis significance test (NHST) for mutual information, the null hypothesis being that the mutual information is zero. The probability of obtaining the measured

To be clear, the

To perform an NHST we need to know the distribution of the test statistic given the null hypothesis. In general, this distribution is not known

In the case of dice, these conditions can be met exactly by randomly permuting the elements of

Also shown in

An equivalent way to compute exact _{ij}_{n}_{n}

In this context, counting tables is equivalent to counting sequences with fixed marginals, neither of which is remotely practical except for very small data sets. For the case of 75 rolls of a fair 6-sided die, the number of permutation surrogates is on the order of $10^{53}$. In contrast to Fisher’s exact test, the permutation test requires only a uniform sampling from the set of sequences with fixed marginals, rather than a full enumeration. The exact
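A Monte Carlo version of the permutation test can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the helper names, the number of surrogates, and the `(count + 1) / (n + 1)` convention for the p-value are choices made here, and the plug-in estimator is assumed for the mutual information.

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of mutual information in bits (assumed estimator)."""
    xi = np.unique(np.asarray(x), return_inverse=True)[1].ravel()
    yi = np.unique(np.asarray(y), return_inverse=True)[1].ravel()
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)
    pxy = joint / joint.sum()
    prod = pxy.sum(axis=1, keepdims=True) * pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / prod[nz])))

def permutation_p_value(x, y, n_surrogates=1000, seed=None):
    """Estimate P(I_surrogate >= I_observed) by shuffling x uniformly at random."""
    rng = np.random.default_rng(seed)
    observed = mutual_information(x, y)
    exceed = 0
    for _ in range(n_surrogates):
        # A random permutation keeps the single-symbol counts of x fixed.
        if mutual_information(rng.permutation(x), y) >= observed:
            exceed += 1
    # Counting the observed sequence itself keeps the estimate above zero.
    return (exceed + 1) / (n_surrogates + 1)

rng = np.random.default_rng(1)
x = rng.integers(1, 7, size=75)
y_independent = rng.integers(1, 7, size=75)
y_dependent = x.copy()  # extreme case: y is a copy of x
p_independent = permutation_p_value(x, y_independent, n_surrogates=200, seed=2)
p_dependent = permutation_p_value(x, y_dependent, n_surrogates=200, seed=2)
print(p_independent, p_dependent)
```

Only `x` is shuffled; shuffling `y` as well would sample the same null distribution and add nothing.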

Permutation surrogates preserve single symbol frequencies but not multiple symbol (or

where _{ij}

We use simulation to discover the true distribution for

To create an exact test, the surrogates must be constrained so that not only single-symbol counts but also the counts of consecutive symbol pairs are preserved. Preserving both sets of counts makes the empirical transition probabilities of each surrogate sequence identical to those of the observed sequence.
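The distinction between the two kinds of counts can be made concrete with a small sketch. The sequences and helper names below are invented for illustration; the second sequence is simply one particular rearrangement of the first.

```python
from collections import Counter

def symbol_counts(seq):
    """Counts of single symbols."""
    return Counter(seq)

def pair_counts(seq):
    """Counts of consecutive symbol pairs (transitions)."""
    return Counter(zip(seq, seq[1:]))

# A strongly persistent sequence and one rearrangement of its symbols.
observed = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
rearranged = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

print(symbol_counts(observed) == symbol_counts(rearranged))  # single counts agree
print(pair_counts(observed))
print(pair_counts(rearranged))  # pair counts do not
```

A plain permutation surrogate is free to produce the second sequence from the first, so it cannot serve as a null model for data with first-order memory.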

To be more general, let ^{k}_{n}_{n}_{−1},…, _{n}_{−}_{k}^{k}^{+1} denote a (^{k}^{k}^{k}

Knowing that our Markov dice are order one, we generate the correct null hypothesis distribution from surrogates of order one (

The algorithm described in the Appendix can be simply modified to enumerate every sequence of a given Markov order and given marginals. The exact

Our algorithm enables the investigator to produce surrogates of a given order but introduces another issue: finding the Markov order of the data. To illustrate, let us take the _{n}_{+1} = _{n}_{n}

In

For the 250-sample logistic map data, the null distribution estimate improves up to order two and then degrades gradually thereafter, based on the root mean square error between the estimated and actual distributions.

What is needed is a method for selecting the optimal order. Fortunately, this is the context in which the order-preserving surrogates were originally developed [

Note that because entropy is reduced by the presence of higher order structure, the

The results of the significance tests for orders

Using the standard significance level (

Using this methodology to select the Markov orders, we repeated the exact NHST

In summary, we have described an exact significance test for

As a final comment, we wish to point out that this exact test is not sufficient for

We now present the procedure for producing random symbol sequences with prescribed word counts (following [

Let be the set of sequences that have the word transition count matrix

where _{i}_{vu}

As an example, consider the following sequence of twelve binary observations:

The sequence x has

From

and _{10} = 4/5. Substituting into

The cardinality of the set (

From Whittle’s formula we can construct a sequence with a prescribed transition count. Let the sequence y = {_{1} … _{N}_{1} = _{N}_{2} are the set {_{y}_{1}_{w}_{wv}

Once $y_2$ is chosen, the same procedure selects $y_3$, and so on until $y_{N-1}$ is reached.

Returning to our example, we have $y_1 = 0$, $y_N = y_{12} = 1$, and the possible choices of $y_2$ lead to the following number of remaining sequences:

Therefore $y_2 = 0$ is chosen with probability 20/80 = 1/4 and $y_2 = 1$ with probability 3/4. By weighting our choice at each step using Whittle’s formula, we guarantee that invalid sequences are not selected and that all valid sequences are selected with uniform probability.
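The weighted selection can be sketched in a few lines. Several things here are assumptions rather than material from the article: Whittle's formula is taken in its usual form (a ratio of factorials of row sums to factorials of entries, multiplied by a cofactor of the matrix with entries $\delta_{ij} - F_{ij}/F_{i\cdot}$), the function names are invented, and the transition count matrix `F` was chosen to be consistent with the totals quoted in the example (80 sequences in all, 20 of them with second symbol 0), since the matrix itself is not shown here.

```python
import math
import numpy as np

def whittle_count(F, u, v):
    """Number of sequences that start at u, end at v, and have transition counts F."""
    F = np.asarray(F, dtype=int)
    rows = F.sum(axis=1)
    # F*_ij = delta_ij - F_ij / F_i. ; rows with F_i. = 0 stay identity rows.
    Fstar = np.eye(len(F))
    for i in range(len(F)):
        if rows[i] > 0:
            Fstar[i] -= F[i] / rows[i]
    # (v, u) cofactor: delete row v and column u, then apply the sign.
    minor = np.delete(np.delete(Fstar, v, axis=0), u, axis=1)
    cofactor = (-1) ** (u + v) * np.linalg.det(minor)
    num = math.prod(math.factorial(int(r)) for r in rows)
    den = math.prod(math.factorial(int(f)) for f in F.ravel())
    return int(round(float((num // den) * cofactor)))

def sample_sequence(F, u, v, rng):
    """Draw one sequence uniformly from all sequences with counts F, from u to v."""
    F = np.array(F, dtype=int)
    seq, cur = [u], u
    for _ in range(int(F.sum())):
        weights = np.zeros(len(F))
        for w in range(len(F)):
            if F[cur, w] > 0:
                F[cur, w] -= 1                       # tentatively use cur -> w
                weights[w] = whittle_count(F, w, v)  # completions still possible
                F[cur, w] += 1
        nxt = int(rng.choice(len(F), p=weights / weights.sum()))
        F[cur, nxt] -= 1
        seq.append(nxt)
        cur = nxt
    return seq

F = [[1, 4], [3, 3]]  # assumed counts, consistent with the quoted totals
print(whittle_count(F, 0, 1))                 # all sequences from 0 to 1
print(whittle_count([[0, 4], [3, 3]], 0, 1))  # those with second symbol 0
print(whittle_count([[1, 3], [3, 3]], 1, 1))  # those with second symbol 1
print(sample_sequence(F, 0, 1, np.random.default_rng(0)))
```

This sketch uses floating-point determinants and is intended only for short sequences; for long sequences the factorial terms become too large for this approach and would call for logarithms or exact rational arithmetic.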

If a complete list of all valid sequences is desired, then modify the algorithm to follow every path that has a non-zero probability.
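For very small examples the complete list can be produced with a short sketch. Note that this is a swapped-in technique: plain backtracking over the remaining transition counts, which abandons dead ends only when it reaches them, rather than pruning with Whittle's formula as the text describes. The function name and the count matrix (chosen to be consistent with the totals quoted in the example) are assumptions.

```python
def enumerate_sequences(F, start, end):
    """Yield every sequence from start to end that uses transition a->b exactly F[a][b] times."""
    remaining = [list(row) for row in F]  # working copy of the counts
    total = sum(sum(row) for row in remaining)
    seq = [start]

    def walk(cur, left):
        if left == 0:
            if cur == end:
                yield tuple(seq)
            return
        for nxt in range(len(remaining)):
            if remaining[cur][nxt] > 0:
                remaining[cur][nxt] -= 1   # use one cur -> nxt transition
                seq.append(nxt)
                yield from walk(nxt, left - 1)
                seq.pop()                  # backtrack
                remaining[cur][nxt] += 1

    yield from walk(start, total)

F = [[1, 4], [3, 3]]  # assumed counts, consistent with the quoted totals
all_sequences = list(enumerate_sequences(F, 0, 1))
print(len(all_sequences))
print(sum(1 for s in all_sequences if s[1] == 0))
```

The number of sequences found, and the number whose second symbol is 0, can be compared against the totals of 80 and 20 given in the worked example.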

Both authors contributed to the initial motivation of the problem, to research and calculation, and to the writing. Both authors read and approved the final manuscript.

The authors declare no conflict of interest.

^{2}from contingency tables and the calculation of

Mutual information between a pair of independent dice rolled 75 times. Distribution computed from

Mutual information between a pair of independent Markov dice rolled 150 times. Distribution computed from

A typical trajectory of the logistic map,

Distribution of

Markov order tests for a logistic map, _{k}_{+1}(_{k}_{+1}(_{k}_{+1}(