# Uncovering Discrete Non-Linear Dependence with Information Theory


## Abstract


## 1. Introduction

… $\chi^{2}$ test to test whether a Markov process is of a given order against a larger order. In particular, with it we can test the hypothesis that a process is of order $r$ against order $r + 1$, until the test rejects the null hypothesis, and then choose the last $r$ as the optimal order. Statistical tests for the optimal order of a Markov process from an information-theoretic perspective are currently missing, and in this paper we fill that gap.

## 2. Kullback-Leibler Divergence Distribution

… $s_{j}$ to state $s_{i}$ as

… $s_{i}$ is conditioned on the history of the previous $k$ transitions $s_{i_k} \to \cdots \to s_{i_1}$. If the history of the previous transitions is known from the context or is irrelevant, we denote the previous $k$ transitions with $S_{i} = (s_{i_k} \to \cdots \to s_{i_1}) \in \mathcal{S}^{k}$, and the resulting transition probabilities of the $k$-th order Markov process with

… $l_{i}$ is the number of possible transitions from state $S_{i}$, $i = 1, \dots, M$. Denoting with $H(W)$ the entropy rate of the Markov process [15] corresponding to $W$

… the number of times the system resided in state $s_{i}$, and let $s_{i_1}, \dots, s_{i_{l_i}}$ denote the possible states to transition to. If the transitions are drawn from a Markov process with transition probability matrix $W = [\mathbb{P}(s_{i} \to s_{i_j})]$, we estimate the probability of transitioning from state $s_{i}$ to states $s_{i_1}, \dots, s_{i_{l_i}}$ with the following expression

… $U_{k}(0, 1)$, $k = 1, \dots, k_{i}$, are uniformly distributed random variables on $[0, 1]$ and $1_{A}$ denotes the indicator function of the set $A$. We note that $\mathbb{E}[\mathbb{Q}(s_{i} \to s_{i_j})] = \mathbb{P}(s_{i} \to s_{i_j})$ and $\sigma_{i_j}^{2} = \text{Var}(\mathbb{Q}(s_{i} \to s_{i_j})) = \mathbb{P}(s_{i} \to s_{i_j})(1 - \mathbb{P}(s_{i} \to s_{i_j}))$.
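As a rough illustration of this construction, the sketch below draws $k_i$ uniform random numbers on $[0, 1]$ and counts how many fall into each of a set of disjoint intervals $A_j$ whose lengths equal the true transition probabilities. The way the intervals are laid out, the function name `estimate_transition_probs` and the example probabilities are assumptions made for illustration only; the relative frequencies it returns are one plausible reading of the estimator $\mathbb{Q}$, not a verified transcription of it.

```python
import random


def estimate_transition_probs(p, k_i, rng):
    """Estimate the outgoing transition probabilities of a single state.

    p   : true probabilities P(s_i -> s_{i_j}), assumed to sum to 1
    k_i : number of uniform draws (visits to the state s_i)
    rng : a random.Random instance, passed in for reproducibility
    """
    # Upper boundaries of the intervals A_j; A_j has length p[j] (assumed layout).
    bounds = []
    running = 0.0
    for p_j in p:
        running += p_j
        bounds.append(running)

    counts = [0] * len(p)
    for _ in range(k_i):
        u = rng.random()
        # Evaluate the indicator 1_{A_j}(u): locate the interval containing u.
        for j, upper in enumerate(bounds):
            if u < upper:
                counts[j] += 1
                break
        else:
            # Rounding can leave the last boundary slightly below 1.
            counts[-1] += 1

    # Relative frequencies play the role of the estimated probabilities.
    return [c / k_i for c in counts]


true_p = [0.2, 0.5, 0.3]
q_hat = estimate_transition_probs(true_p, 20000, random.Random(0))
```

With a large number of draws the entries of `q_hat` should lie close to `true_p`, in line with the unbiasedness statement $\mathbb{E}[\mathbb{Q}] = \mathbb{P}$ above.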

… $l_{i} - 1$ converge to a normal distribution

… $p_{r} \in [0, 1]$, $r = 1, \dots, l_{i}$, $\sum_{r=1}^{l_{i}} p_{r} = 1$, and we denote with $f_{\{p\}} : \mathcal{D}_{f_{\{p\}}} \to \mathbb{R}$ the function that reflects the cross entropy between $\mathbb{P}$ and $\mathbb{Q}$

… $f_{\{p\}}$ such that Expression (11) is well defined. Using a Taylor expansion up to second order for the function $f_{\{p\}}$, we obtain the following approximation


… $\chi^{2}$ distribution, the sample mean of $n$ independent, identically distributed $\chi^{2}$ random variables with one degree of freedom is gamma distributed with shape $\frac{n}{2}$ and scale $\frac{2}{n}$ (see [16]); therefore we obtain that the distribution of the Kullback-Leibler divergence is approximately

$$\mathcal{D}_{KL}(W \parallel \widehat{W}) \sim \sum_{i=1}^{M} \Gamma\left(\frac{M_{i}(i-1)}{2}, \frac{1}{N}\right)$$

… $M_{i}$ is the number of states with $i$ possible transitions

… $M_{i}$ the number of states with $i$ possible transitions, $i = 1, \dots, M$.
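The gamma law for the sample mean of $\chi^{2}$ variables used in this section can be checked numerically. The following sketch, which is only a sanity check and not part of the derivation, compares the first two moments of simulated sample means with those of a gamma distribution with shape $n/2$ and scale $2/n$ (mean $1$, variance $2/n$). The function names, the choice $n = 10$ and the number of repetitions are illustrative assumptions.

```python
import random
import statistics


def chi2_sample_mean(n, rng):
    """Mean of n independent chi-squared(1) variables (squared standard normals)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) / n


def compare_with_gamma(n, reps=20000, seed=0):
    """Empirical mean and variance of the chi-squared sample mean and of Gamma(n/2, 2/n)."""
    rng = random.Random(seed)
    means = [chi2_sample_mean(n, rng) for _ in range(reps)]
    # random.gammavariate(alpha, beta): alpha is the shape, beta is the scale.
    gammas = [rng.gammavariate(n / 2, 2 / n) for _ in range(reps)]
    return {
        "chi2_mean": statistics.fmean(means),
        "chi2_var": statistics.variance(means),
        "gamma_mean": statistics.fmean(gammas),
        "gamma_var": statistics.variance(gammas),
    }


moments = compare_with_gamma(10)
```

Both pairs of moments should be close to the theoretical values $1$ and $2/n = 0.2$; agreement of two moments is of course weaker evidence than a full comparison of the densities.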

## 3. Information Memory Loss

… $\chi^{2}$ test [10]. AIC has had a fundamental impact on statistical model evaluation problems and has been applied to the problem of estimating the order of autoregressive processes and Markov processes. The estimator was derived as an asymptotic estimate of the Kullback-Leibler divergence. AIC is currently the most used and most successful estimator of the optimal order of a Markov process, as it performs better than the BIC estimator for samples of relatively small size. Neither AIC nor BIC provides a test of a model in the sense of testing a null hypothesis, i.e., they can tell nothing about the quality of the model in an absolute sense; order selection is heuristic, as the order estimator is usually defined as the value that minimizes the information criterion. The $\chi^{2}$ test can evaluate whether a Markov process is of a given order against a larger order. In particular, with it we can test the hypothesis that a process is of order $r$ against order $r + 1$, until the test rejects the null hypothesis, and then choose the last $r$ as the optimal order. We note that, as the $\chi^{2}$ test relies on the asymptotic distribution of the $\chi^{2}$ statistic, it does not take into account the sample size. On the other hand, the Iml is an information-theoretic measure that determines the optimal order of a Markov process in an absolute sense, i.e., it provides a statistical test, and it takes into account both the sample size and the number of estimated parameters.

… $\chi^{2}$ test to determine the optimal order on the toy example. The $\chi^{2}$ test correctly identifies that the Markov process is of fifth order. On the other hand, AIC and BIC do not correctly identify the optimal order of the Markov process in all evaluated cases. For instance, both methods claim that the second order Markov process is the best approximation of the seventh order Markov process, which is false, as we have prescribed the Markov process to be of fifth order. We note that the mistake might be due to the heuristic order selection criteria of AIC and BIC.
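To make the heuristic nature of AIC and BIC order selection concrete, the sketch below fits Markov chains of increasing order to a symbol sequence by counting transitions and picks the order that minimizes each criterion. It uses the textbook definitions $\text{AIC} = -2 \ln L + 2p$ and $\text{BIC} = -2 \ln L + p \ln n$ with $p = |\mathcal{S}|^{k}(|\mathcal{S}| - 1)$ free parameters for order $k$; the function names and the toy sequence are assumptions for illustration, and neither the Iml nor the $\chi^{2}$ test is implemented here.

```python
import math
from collections import Counter


def max_log_likelihood(seq, order, skip):
    """Maximised log-likelihood of an order-`order` chain on seq[skip:].

    The first `skip` symbols only serve as history, so that all orders
    are scored on the same observations.
    """
    context_counts = Counter()
    transition_counts = Counter()
    for t in range(skip, len(seq)):
        context = tuple(seq[t - order:t])  # empty tuple when order == 0
        context_counts[context] += 1
        transition_counts[(context, seq[t])] += 1
    return sum(
        count * math.log(count / context_counts[context])
        for (context, _), count in transition_counts.items()
    )


def select_order(seq, max_order):
    """Return the orders minimising AIC and BIC among 0..max_order."""
    n_states = len(set(seq))
    n_obs = len(seq) - max_order
    aic, bic = {}, {}
    for k in range(max_order + 1):
        log_lik = max_log_likelihood(seq, k, max_order)
        n_params = n_states ** k * (n_states - 1)
        aic[k] = -2.0 * log_lik + 2.0 * n_params
        bic[k] = -2.0 * log_lik + n_params * math.log(n_obs)
    return min(aic, key=aic.get), min(bic, key=bic.get)


# A strictly alternating sequence is fully described by a first order chain.
alternating = [0, 1] * 200
aic_order, bic_order = select_order(alternating, max_order=3)
```

Both criteria return a single minimizing order and no measure of significance, which is the limitation discussed above.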

## 4. Information Codependence Structure

… $s_{i} \in \mathcal{S}_{i}$ and let $W_{i}$, $i = 1, \dots, m$, denote the accompanying transition probability matrices. We construct the states $s \in \mathcal{S}_{1} \times \cdots \times \mathcal{S}_{m}$ of the joint Markov process in the following manner

… of the $i$-th Markov process, since independence implies that the transitions in the $i$-th Markov process do not depend on the states of the other Markov processes.

… $W_{1} \underline{\oplus} \cdots \underline{\oplus} W_{m}$ and set the transition probabilities to

… $W_{1} \underline{\oplus} \cdots \underline{\oplus} W_{m}$ models the transitions of a joint Markov process assuming there is no co-dependence structure among the Markov processes $i = 1, \dots, m$.
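For two first order processes, a structureless joint matrix of this kind can be sketched as follows. The sketch assumes that, under independence, the probability of a joint transition is the product of the individual transition probabilities, which with lexicographically ordered joint states is the Kronecker product of $W_{1}$ and $W_{2}$. This product form, the function name `independent_joint_matrix` and the example matrices are assumptions for illustration and may differ in detail from the construction used in the text.

```python
from itertools import product


def independent_joint_matrix(w1, w2):
    """Joint transition matrix of two independent first order Markov processes.

    w1, w2 : row-stochastic matrices given as lists of lists
    Returns the list of joint states (a, b) and the joint matrix, where
    P((a, b) -> (c, d)) = w1[a][c] * w2[b][d]  (assumed product form).
    """
    states = list(product(range(len(w1)), range(len(w2))))
    joint = [
        [w1[a][c] * w2[b][d] for (c, d) in states]
        for (a, b) in states
    ]
    return states, joint


w1 = [[0.9, 0.1], [0.2, 0.8]]
w2 = [[0.5, 0.5], [0.3, 0.7]]
states, structureless = independent_joint_matrix(w1, w2)
```

Each row of `structureless` sums to one, so it is again a transition probability matrix, and any co-dependence between the two processes shows up as a deviation of the estimated joint matrix from it.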

… $W_{1} \underline{\oplus} W_{2}$ and compute the Kullback-Leibler divergence $\mathcal{D}_{KL}(W \parallel W_{1} \underline{\oplus} W_{2})$. We repeat this procedure 10,000 times and obtain the average Kullback-Leibler divergence, finally computing the Ics.

… $W_{1} \underline{\oplus} W_{2}$ as described above, compute the Kullback-Leibler divergence $\mathcal{D}_{KL}(W \parallel W_{1} \underline{\oplus} W_{2})$ and obtain the Ics.
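The divergence step can be sketched as below, assuming the Kullback-Leibler divergence rate between two Markov sources on the same state space takes the usual form $\sum_{i} \pi_{i} \sum_{j} W_{ij} \ln (W_{ij} / V_{ij})$, with $\pi$ the stationary distribution of $W$ (cf. [17]). The power iteration used for $\pi$ presumes an ergodic chain, and the function names and the toy matrices are hypothetical; the conversion of the divergence into the Ics is not shown.

```python
import math


def stationary_distribution(w, iterations=1000):
    """Approximate the stationary distribution of w by power iteration.

    Assumes the chain is ergodic so that the iteration converges.
    """
    n = len(w)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * w[i][j] for i in range(n)) for j in range(n)]
    return pi


def kl_divergence_rate(w, v):
    """Divergence rate sum_i pi_i sum_j w_ij ln(w_ij / v_ij), in nats."""
    pi = stationary_distribution(w)
    total = 0.0
    for i, row in enumerate(w):
        for j, w_ij in enumerate(row):
            weight = pi[i] * w_ij
            if weight == 0.0:
                continue  # convention: 0 * ln(0 / x) = 0
            if v[i][j] == 0.0:
                return math.inf  # w allows a transition that v forbids
            total += weight * math.log(w_ij / v[i][j])
    return total


# Toy joint process on states (0,0), (0,1), (1,0), (1,1): the two
# components tend to move to the same value, regardless of the current state.
joint = [[0.4, 0.1, 0.1, 0.4] for _ in range(4)]
# Independent counterpart built from two uniform binary chains.
independent = [[0.25, 0.25, 0.25, 0.25] for _ in range(4)]
divergence = kl_divergence_rate(joint, independent)
```

In this toy case the divergence is strictly positive, whereas comparing a matrix with itself gives zero; larger values indicate a stronger departure from the structureless model.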

## 5. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

1. Markov, A.A. Rasprostranenie zakona bol’shih chisel na velichiny zavisgaschie drug ot druga. Izvestiya Fiziko-matematicheskogo obschhestra pri Kazanskom universitete, 2-ya seriya **1906**, 15, 135–156.
2. Levin, D.A.; Peres, Y.; Wilmer, E.L. Markov Chains and Mixing Times; American Mathematical Society: Providence, RI, USA, 2008.
3. Metropolis, N.; Rosenbluth, A.W.; Rosenbluth, M.N.; Teller, A.H.; Teller, E. Equations of State Calculations by Fast Computing Machines. J. Chem. Phys. **1953**, 21, 1087–1092.
4. Rabiner, L.R. First Hand: The Hidden Markov Model. Available online: http://ethw.org/First-Hand:The_Hidden_Markov_Model (accessed on 22 April 2015).
5. Hamilton, J. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica **1989**, 57, 357–384.
6. Tong, H. Determination of the order of Markov Chain by Akaike's Information Criterion. J. Appl. Probab. **1975**, 12, 488–497.
7. Katz, R.W. An optimal selection of regression variables. Biometrika **1981**, 68, 45–54.
8. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control **1974**, 19, 716–723.
9. Schwarz, G. Estimating the dimension of a model. Ann. Stat. **1978**, 6, 461–464.
10. Anderson, T.W.; Goodman, L.A. Statistical Inference about Markov chains. Ann. Math. Stat. **1957**, 28, 89–110.
11. Bartlett, M.S. The frequency goodness of fit test for probability chains. Proc. Camb. Philos. Soc. **1951**, 47, 86–95.
12. Ching, W.; Fung, E.; Ng, M. A multivariate Markov Chain model for categorical data sequences and its applications in demand predictions. IMA J. Manag. Math. **2002**, 13, 187–199.
13. Ching, W.; Fung, E.; Ng, M.; Akutsu, T. On Construction of Stochastic Genetic Networks Based on Gene Expression Sequences. Int. J. Neural Syst. **2005**, 15, 297–310.
14. Siu, T.; Ching, W.; Ng, M.; Fung, E. On multivariate credibility approach for portfolio credit risk measurement. Quant. Finance **2005**, 5, 543–556.
15. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: New York, NY, USA, 1991.
16. Feller, W. An Introduction to Probability Theory and Its Applications; Volume I; Wiley: New York, NY, USA, 1968.
17. Rached, Z.; Alajaji, F.; Campbell, L.L. The Kullback-Leibler divergence rate between Markov Sources. IEEE Trans. Inf. Theory **2003**, 50, 917–921.
18. Cont, R. Long range dependence in financial markets. In Fractals in Engineering; Lévy-Véhel, J., Lutton, E., Eds.; Springer: London, UK, 2005; pp. 159–179.
19. Min, Q. Nonlinear Predictability of Stock Returns Using Financial and Economic Variables. J. Bus. Econ. Stat. **1999**, 17, 419–429.
20. Epps, T.W. Comovements in Stock Prices in the Very Short Run. J. Am. Stat. Assoc. **1979**, 74, 291–298.
21. Münnix, M.C.; Schäfer, R.; Guhr, T. Impact of the tick-size on financial returns and correlations. Physica A **2014**, 389, 4828–4843.
22. Shapira, Y.; Berman, Y.; Ben-Jacob, E. Modelling the short term herding behavior of stock markets. New J. Phys. **2014**, 16, 053040.
23. Glattfelder, J.B.; Dupuis, A.; Olsen, R.B. Patterns in high-frequency FX data: discovery of 12 empirical scaling laws. Quant. Finance **2011**, 11, 599–614.

**Figure 1.** Graph of densities for the Kullback-Leibler divergence obtained through Monte Carlo simulations and the approximation (Equation (16)), for 2-by-2, 3-by-3, 4-by-4 and 5-by-5 transition probability matrices. The number of transitions $N$ is set to 1000, 2500, 5000 and 10,000, corresponding to the red, blue, green and yellow curves, while the black curves draw the approximate density of the distribution $\mathcal{D}_{KL}(W \parallel \widehat{W}) \sim \sum_{i=1}^{M} \Gamma\left(\frac{M_{i}(i-1)}{2}, \frac{1}{N}\right)$.

**Figure 2.** The AUD/JPY exchange rate (black curve) during the unfolding of the 2008 financial crisis, the Iml (blue curve) and the Iml smoothed with a 20 point moving average (orange curve), where $W$ is estimated as a fourth order Markov process and $\widehat{W}$ is estimated as a first order Markov process.

**Figure 3.** The upper graph shows the Kullback-Leibler divergence, while the lower graph shows the Ics, between the joint Markov process modeled by $W$ and its independent counterpart modeled by the structureless transition probability matrix $W_{1} \underline{\oplus} W_{2}$, as a function of the correlation coefficient $\rho$ linking the two time series together.

**Figure 4.** The figure graphs the EUR/USD (**blue**) and USD/CHF (**orange**) exchange rates during the unfolding of the 2010–2011 Euro Crisis, the hourly Ics (**green curve**) and the Ics smoothed with a 20 point moving average (**red curve**), where all processes are modeled as third order Markov processes.

**Table 1.** The table presents the Iml, with the Kullback-Leibler divergence in brackets, for all possible combinations of higher and lower order Markov processes. Rows give the order of the higher order Markov process $W$; columns give the order of the lower order Markov process $\widehat{W}$.

| $W$ (higher order) / $\widehat{W}$ (lower order) | 1st | 2nd | 3rd | 4th | 5th | 6th |
|---|---|---|---|---|---|---|
| 2nd | 0.99 (0.02) | – | – | – | – | – |
| 3rd | 1.00 (0.07) | 1.00 (0.28) | – | – | – | – |
| 4th | 1.00 (0.11) | 1.00 (0.31) | 1.00 (0.30) | – | – | – |
| 5th | 1.00 (0.17) | 1.00 (0.34) | 1.00 (0.32) | 1.00 (0.14) | – | – |
| 6th | 1.00 (0.16) | 1.00 (0.32) | 1.00 (0.30) | 1.00 (0.14) | $3.22 \times 10^{-6}$ (0.02) | – |
| 7th | 0.99 (0.14) | 1.00 (0.29) | 1.00 (0.27) | 0.98 (0.13) | $7.87 \times 10^{-22}$ (0.02) | $4.91 \times 10^{-51}$ (0.01) |

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Golub, A.; Chliamovitch, G.; Dupuis, A.; Chopard, B. Uncovering Discrete Non-Linear Dependence with Information Theory. *Entropy* **2015**, *17*, 2606-2623.
https://doi.org/10.3390/e17052606
