# Maximum Entropy Rate Reconstruction of Markov Dynamics


## Abstract


## 1. Introduction

## 2. Theory

The dynamics is specified by a transition matrix **W**, whose entry W(x, y) denotes the probability of switching from state x to state y in one time step. The stationary distribution **p** satisfies **p** = **pW**. **W** and **p** are therefore not independent parameters of the process, since the latter has to be the eigenvector of the former associated with the unit eigenvalue. Following [13], we will actually impose detailed balance in order to guarantee stationarity, that is, we impose p(x)W(x, y) = p(y)W(y, x).
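As a concrete check, stationarity and detailed balance can be verified numerically for a reversible two-state chain (a minimal sketch; the matrix entries are arbitrary illustrative values):

```python
import itertools

def stationary_distribution(W, n_iter=10_000):
    """Iterate p <- pW until convergence (power iteration associated
    with the unit eigenvalue); W is a row-stochastic nested list."""
    n = len(W)
    p = [1.0 / n] * n
    for _ in range(n_iter):
        p = [sum(p[x] * W[x][y] for x in range(n)) for y in range(n)]
    return p

# An arbitrary two-state transition matrix (rows sum to 1).
W = [[0.9, 0.1],
     [0.3, 0.7]]
p = stationary_distribution(W)

# Detailed balance: p(x)W(x, y) = p(y)W(y, x) for all pairs of states.
for x, y in itertools.product(range(2), repeat=2):
    assert abs(p[x] * W[x][y] - p[y] * W[y][x]) < 1e-9
```

Any two-state chain is reversible at stationarity, so the detailed balance check necessarily passes here; for three or more states it becomes a genuine restriction.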

#### 2.1. Unconstrained Case

One maximizes the entropy rate with respect to the coefficients of **W**, subject to the detailed balance condition and to the normalization constraint ${\sum}_{y}W(x,y)=1$, the latter being enforced through a Lagrange multiplier Λ. The resulting transition coefficients are constant, the multiplier being fixed so that normalization is satisfied. Moreover, the eigenvector of **W** with unit eigenvalue is the vector with all elements equal to 1/N, so that detailed balance is enforced. The unconstrained maximum entropy rate process is therefore given by the uniform matrix **W**, such that W(x, y) = 1/N for all x, y, as expected.
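This can be checked directly: the entropy rate $h=-{\sum}_{x}p(x){\sum}_{y}W(x,y)\ln W(x,y)$ equals ln N for the uniform matrix and falls below it for any other stochastic matrix (a sketch with an illustrative comparison matrix):

```python
import math

def stationary(W, n_iter=10_000):
    """Stationary distribution by power iteration p <- pW."""
    n = len(W)
    p = [1.0 / n] * n
    for _ in range(n_iter):
        p = [sum(p[x] * W[x][y] for x in range(n)) for y in range(n)]
    return p

def entropy_rate(W):
    """h = -sum_x p(x) sum_y W(x,y) ln W(x,y), p stationary for W."""
    p = stationary(W)
    return -sum(p[x] * sum(Wxy * math.log(Wxy) for Wxy in W[x] if Wxy > 0)
                for x in range(len(W)))

N = 3
uniform = [[1.0 / N] * N for _ in range(N)]
# An arbitrary non-uniform (doubly stochastic) matrix for comparison.
other = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]

assert abs(entropy_rate(uniform) - math.log(N)) < 1e-9
assert entropy_rate(other) < math.log(N)  # strictly below ln N
```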

#### 2.2. Constraints

## 3. Solving the Equations

#### 3.1. Two-State Case

In the two-state case, with states labeled −1 and +1, the maximum entropy matrix **W**_{ME} can be rewritten in terms of a single parameter, the autocorrelation constraint being the only non-trivial one since x² = 1, and hence σ² = 1, for any **W**. It is interesting to note how the consideration of one non-trivial constraint squeezes the space of independent transition coefficients onto a one-dimensional submanifold, while enforcing multiple (consistent) constraints would result in a MaxEnt submanifold of larger dimensionality.
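The one-dimensional submanifold can be made explicit under the assumption that the MaxEnt solution is the symmetric (diagonal) matrix W(−, −) = W(+, +) = (1 + A)/2, consistent with the diagonal structure referred to later; this parametrization is a sketch, not the exact expression of the lost Equation:

```python
# States are x ∈ {−1, +1}. Assuming the diagonal (symmetric) form
# W(−,−) = W(+,+) = q, the stationary distribution is uniform and the
# one-step autocorrelation is A = <x_t x_{t+1}> = q·(+1) + (1−q)·(−1),
# so q = (1 + A)/2: a one-parameter family indexed by A.
def maxent_two_state(A):
    """Hypothetical diagonal MaxEnt matrix for autocorrelation A."""
    q = (1.0 + A) / 2.0
    return [[q, 1.0 - q], [1.0 - q, q]]

def autocorrelation(W):
    # With p uniform, A = sum_{x,y} p(x)·x·y·W(x,y), x, y ∈ {−1, +1}.
    states = [-1.0, 1.0]
    return sum(0.5 * x * y * W[i][j]
               for i, x in enumerate(states)
               for j, y in enumerate(states))

for A in (-0.5, 0.0, 0.36):
    W = maxent_two_state(A)
    assert abs(autocorrelation(W) - A) < 1e-12
    assert all(abs(sum(row) - 1.0) < 1e-12 for row in W)
```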

#### 3.2. Three-State Case

In the three-state case, the equations determining the maximum entropy matrix **W**_{ME} no longer admit a closed-form solution, so the matrix has to be computed numerically, as described next.

#### 3.3. Algorithm

The maximum entropy matrix **W**_{ME} of Equation (45) needs to be numerically estimated. To this end, we implemented a version of the well-known and widely used generalized iterative scaling (GIS) algorithm. Specifically, the algorithm starts from an initial solution that is iteratively adjusted to fit the constraints. Setting initial values α = 1.0 and β = 1.0, we iterate over k to satisfy the constraint on the normalization of p. Once a solution for k is reached for a given pair (α, β), these are updated as indicated in the pseudo-code below, and the process is iterated until α and β have converged towards values satisfying the constraints on σ² and A given by Equations (19) and (20), respectively. The procedure is summarized in Algorithm 1 below. We refer to [16] for a discussion of the convergence of the GIS and an overview of other related algorithms.
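Since the closed form of Equation (45) is not reproduced above, the sketch below illustrates the GIS principle itself on a generic moment-matching problem rather than Algorithm 1 exactly; the states, feature, and target value are illustrative, while the update λ ← λ + C⁻¹ ln(target/model) is the standard GIS step:

```python
import math

def gis(states, features, targets, n_iter=3000):
    """Generalized iterative scaling for p(x) ∝ exp(sum_i λ_i f_i(x))
    subject to E_p[f_i] = targets[i]; features must be non-negative."""
    C = max(sum(f(x) for f in features) for x in states)  # GIS constant
    lam = [0.0] * len(features)
    for _ in range(n_iter):
        weights = [math.exp(sum(l * f(x) for l, f in zip(lam, features)))
                   for x in states]
        Z = sum(weights)
        p = [w / Z for w in weights]
        model = [sum(pi * f(x) for pi, x in zip(p, states))
                 for f in features]
        # Multiplicative correction toward the target expectations.
        lam = [l + math.log(t / m) / C
               for l, t, m in zip(lam, targets, model)]
    return p

# Three states {−1, 0, +1} with a variance-like constraint <x²> = 0.5;
# the MaxEnt solution is p = (0.25, 0.5, 0.25).
p = gis([-1, 0, 1], [lambda x: x * x], [0.5])
assert all(abs(a - b) < 1e-6 for a, b in zip(p, [0.25, 0.5, 0.25]))
```

Textbook GIS assumes the features sum to the same constant C for every state (often arranged via a slack feature); here the damped update still contracts toward the fixed point, which suffices for the illustration.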

## 4. MaxEnt Estimations for Time Series

#### 4.1. Stationary Processes

We now assess the quality of the reconstruction of **W** when we only have short samples at our disposal. We detail the calculations for the coefficient W(−, −), the other three being similar. By the central limit theorem, the autocorrelation $A^{(n)}$ measured from a sample of size n is distributed normally according to $\mathcal{N}(A,{n}^{-1})$. Figure 1 shows that this estimate turns out to be quite good, even for short samples, in particular when A stays small. Here, a time series is generated from a known transition matrix and then sampled in order to reconstruct the matrix using both the MaxEnt method and histogram sampling.
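The normal approximation $A^{(n)}\sim\mathcal{N}(A,{n}^{-1})$ can be checked by simulation (a sketch; the matrix values, sample size, and number of trials are illustrative):

```python
import math
import random

random.seed(0)

def simulate(W, n):
    """Sample a length-n path of the ±1 chain with transition matrix W
    (index 0 ↔ state −1, index 1 ↔ state +1), started stationary."""
    states = [-1, 1]
    i = random.randrange(2)  # uniform = stationary for symmetric W
    path = [states[i]]
    for _ in range(n - 1):
        i = i if random.random() < W[i][i] else 1 - i
        path.append(states[i])
    return path

# Symmetric matrix with autocorrelation A = 2q − 1 = 0.2.
q = 0.6
W = [[q, 1 - q], [1 - q, q]]
n, trials = 200, 2000

estimates = []
for _ in range(trials):
    x = simulate(W, n + 1)
    estimates.append(sum(x[t] * x[t + 1] for t in range(n)) / n)

mean = sum(estimates) / trials
std = math.sqrt(sum((a - mean) ** 2 for a in estimates) / trials)

assert abs(mean - 0.2) < 0.02               # centered on the true A
assert abs(std - 1 / math.sqrt(n)) < 0.02   # deviation ≈ n^{-1/2}
```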

The sampling estimate involves p(−), the stationary probability of being in state −1, which, in the current setting, is given by $p(-)=\frac{1-W(+,+)}{2-W(-,-)-W(+,+)}$. Following the same steps as previously, one obtains the mean and deviation of the sampled absolute error on W(−, −).
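The closed form for p(−) follows from solving **p** = **pW** for a general two-state matrix; it can be verified against a power iteration (a minimal sketch with arbitrary test matrices):

```python
def p_minus(W):
    """Closed-form stationary probability of state −1:
    p(−) = (1 − W(+,+)) / (2 − W(−,−) − W(+,+))."""
    return (1.0 - W[1][1]) / (2.0 - W[0][0] - W[1][1])

def p_minus_iterated(W, n_iter=10_000):
    """Same quantity obtained by iterating p <- pW."""
    p = [0.5, 0.5]
    for _ in range(n_iter):
        p = [p[0] * W[0][0] + p[1] * W[1][0],
             p[0] * W[0][1] + p[1] * W[1][1]]
    return p[0]

for W in ([[0.9, 0.1], [0.3, 0.7]], [[0.4, 0.6], [0.5, 0.5]]):
    assert abs(p_minus(W) - p_minus_iterated(W)) < 1e-9
```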

One thus obtains an error measure ${\mathrm{\Delta}}_{ij}^{(n)}$ for each coefficient of the **W** matrix considered. While a conservative option is to choose the minimum over all coefficients, we shall rather tolerate a poor estimation of one of the coefficients as long as the corresponding transitions occur scarcely and, therefore, define ${\mathrm{\Delta}}_{\mathbf{W}}^{(n)}$ as the sum of all ${\mathrm{\Delta}}_{ij}^{(n)}$'s weighted by the stationary distribution, namely ${\mathrm{\Delta}}_{\mathbf{W}}^{(n)}={\sum}_{i,j}p(i){\mathrm{\Delta}}_{ij}^{(n)}$. From our experiments, the precise definition of ${\mathrm{\Delta}}_{\mathbf{W}}^{(n)}$ does not qualitatively alter the forthcoming results (see Figure 2). We now let ${n}_{c}(\mathbf{W})$ denote the value of n above which ${\mathrm{\Delta}}_{\mathbf{W}}^{(n)}$ becomes negative. In other words, a non-negative ${n}_{c}$ means that for historical samples shorter than ${n}_{c}$, the MaxEnt method gives better results when estimating the transition matrix underlying the observed process.

The crossover ${n}_{c}(\mathbf{W})$ is found numerically from Equation (55) and plotted in Figure 2 over the space of 2 × 2 stochastic matrices parametrized by (W(−, −), W(+, +)). Note that ${n}_{c}$ is large close to the diagonal, but decays when one moves away from it, which means that a matrix that is “compatible” with the structure of Equation (50) is better estimated using MaxEnt.

Let M(n) denote the set of matrices such that ${n}_{c}(\mathbf{W})\geq n$, and µ(n) the relative size of M(n) compared to M(0) (the space of all 2 × 2 stochastic matrices); the relevance of the MaxEnt approach for a given state space then depends critically on the function µ(n). In the two-state case, one can read from Figure 2 that M(50) is concentrated in a neighborhood of the diagonal, so that µ(50) ≈ 0.15. This means that for samples of a size smaller than n = 50, the MaxEnt estimate is better than the frequency sampling estimate for about 15% of all possible processes. One should, however, note that processes to which one might want to apply the method are unlikely to be scattered randomly over [0, 1]², but will rather be processes having a large entropy, that is, low predictability. This tends to focus our interest on the central area of [0, 1]² and increases the effective µ(n).

For three-state processes, we group processes into cumulated entropy rate quintiles defined by the conditions ${h}_{i}<h<{h}_{\mathrm{max}}$, where ${h}_{\mathrm{max}}=\ln 3$ and ${h}_{i}$ is specified in Figure 3; this figure shows the effectiveness of our approach by highlighting that processes having a large entropy rate are the most suited to it. We display there the cases where two constraints are enforced (blue curves) and where the constraint on the variance of the process is relaxed (red curves). We observe that, for short samples, going from one to two constraints results in a loss of performance or, at best, a marginal gain, as estimation errors of the constraints tend to accumulate. However, when the sampling window is long enough to allow for an accurate estimation of all constraints, adding constraints results in a spectacular improvement of the MaxEnt method.

#### 4.2. Non-Stationary Processes

The same reasoning extends to non-stationary processes: if the coefficients ${W}_{ij}(t)$ evolve within M(τ), where τ is the typical time scale on which the parameters of the dynamics change, then MaxEnt provides a quicker estimation of the instantaneous dynamics than sampling does.

As illustrated in Figure 4, the MaxEnt estimate (blue) closely follows the true coefficient ${W}_{--}(t)$ (red). In particular, it avoids the large deviations shown by the sampling estimate (yellow).
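A sketch of this comparison is given below; the drifting process, window length, and the symmetric MaxEnt inversion W(−, −) ≈ (1 + Â)/2 are illustrative assumptions, not the exact process of Equation (56):

```python
import math
import random

random.seed(1)

T, window = 4000, 50
states = [-1, 1]

def q_true(t):
    """Slowly drifting diagonal coefficient W_{--}(t) = W_{++}(t)."""
    return 0.5 + 0.2 * math.sin(2 * math.pi * t / T)

# Simulate the non-stationary ±1 chain (symmetric at every instant).
i, path = 0, []
for t in range(T):
    path.append(states[i])
    i = i if random.random() < q_true(t) else 1 - i

err_me, err_freq = [], []
for t in range(window, T):
    x = path[t - window:t + 1]
    # MaxEnt: invert the windowed autocorrelation, W(−,−) ≈ (1 + Â)/2.
    A_hat = sum(x[k] * x[k + 1] for k in range(window)) / window
    w_me = min(max((1 + A_hat) / 2, 0.0), 1.0)
    # Sampling: conditional frequency of −1 → −1 transitions.
    stays = sum(1 for k in range(window) if x[k] == -1 and x[k + 1] == -1)
    visits = sum(1 for k in range(window) if x[k] == -1)
    w_freq = stays / visits if visits else 0.5
    err_me.append(abs(w_me - q_true(t)))
    err_freq.append(abs(w_freq - q_true(t)))

mean_me = sum(err_me) / len(err_me)
mean_freq = sum(err_freq) / len(err_freq)
# MaxEnt typically tracks the drift more closely within short windows.
assert mean_me < mean_freq + 0.01
```

The frequency estimate only uses the roughly window/2 visits to state −1, whereas the MaxEnt inversion exploits all transitions in the window, which is where its advantage comes from.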

## 5. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

- Jaynes, E.T. Information Theory and Statistical Mechanics. Phys. Rev.
**1957**, 106, 620–630. [Google Scholar] - Jaynes, E.T. Information Theory and Statistical Mechanics. II. Phys. Rev.
**1957**, 108, 171–190. [Google Scholar] - Schneidman, E.; Still, S.; Berry, M.J.; Bialek, W. Network Information and Connected Correlations. Phys. Rev. Lett.
**2003**, 91. [Google Scholar] [CrossRef] - Stephens, G.J.; Bialek, W. Statistical Mechanics of Letters in Words. Phys. Rev. E
**2010**, 81. [Google Scholar] [CrossRef] - Mora, T.; Bialek, W. Are Biological Systems Poised at Criticality? J. Stat. Phys.
**2011**, 144, 268–302. [Google Scholar] - Bialek, W.; Cavagna, A.; Giardina, I.; Mora, T.; Silvestri, E.; Viale, M.; Walczak, A. Statistical Mechanics for Natural Flocks of Birds. Proc. Natl. Acad. Sci. USA
**2012**, 109, 4786–4791. [Google Scholar] - Stephens, G.J.; Mora, T.; Tkacik, G.; Bialek, W. Statistical Thermodynamics of Natural Images. Phys. Rev. Lett.
**2013**, 110. [Google Scholar] [CrossRef] - McCallum, A.; Freitag, D.; Pereira, F. Maximum Entropy Markov Models for Information Extraction and Segmentation. Proceedings of the Seventeenth International Conference on Machine Learning, Stanford, CA, USA, 29 June–2 July 2000.
- Marre, O.; El Boustani, S.; Fregnac, Y.; Destexhe, A. Prediction of Spatiotemporal Patterns of Neural Activity from Pairwise Correlations. Phys. Rev. Lett.
**2009**, 102, 138101. [Google Scholar] - Biondi, F.; Legay, A.; Nielsen, B.; Wasowski, A. Maximizing Entropy over Markov Processes. Proceedings of the 7th International Conference, Language and Automata Theory and Applications 2013, Bilbao, Spain, 2–5 April 2013; pp. 128–140.
- Cavagna, A.; Giardina, I.; Ginelli, F.; Mora, T.; Piovani, D.; Tavarone, R.; Walczak, A. Dynamical Maximum Entropy Approach to Flocking. Phys. Rev. E
**2014**, 89, 042707. [Google Scholar] - Fiedor, P. Maximum Entropy Production Principle for Stock Returns. arXiv
**2014**, arXiv:1408.3728. [Google Scholar] - Van der Straeten, E. Maximum Entropy Estimation of Transition Probabilities of Reversible Markov Chains. Entropy
**2009**, 11, 867–887. [Google Scholar] - Chliamovitch, G.; Dupuis, A.; Golub, A.; Chopard, B. Improving Predictability of Time Series Using Maximum Entropy Methods. Europhys. Lett.
**2015**, 110. [Google Scholar] [CrossRef] - Cover, T.; Thomas, J. Elements of Information Theory; Wiley: New York, NY, USA, 2006. [Google Scholar]
- Malouf, R. A Comparison of Algorithms for Maximum Entropy Parameter Estimation. Proceedings of CoNLL-2002, Taipei, Taiwan, 31 August–1 September 2002; pp. 49–55.
- Brockwell, P.J.; Davis, R.A. Time Series: Theory and Methods; Springer: Berlin, Germany, 1991. [Google Scholar]
- Leone, F.C.; Nelson, L.S.; Nottingham, R.B. The Folded Normal Distribution. Technometrics
**1961**, 3, 543–550. [Google Scholar] - Brandimarte, P. Handbook in Monte Carlo Simulation: Applications in Financial Engineering, Risk Management, and Economics; Wiley: New York, NY, USA, 2014. [Google Scholar]

**Figure 1.** Comparison between the empirical mean and standard deviation of the data (bars) and the estimates of Equations (51)–(54) derived from the central limit assumption (continuous lines: mean; dashed lines: standard deviation), for transition matrices with autocorrelation A = −0.03 (top) and A = 0.36 (bottom).

**Figure 2.** ${n}_{c}(\mathbf{W})$ plotted over the space of two-state stochastic matrices parametrized by (W(−, −), W(+, +)), for ${\mathrm{\Delta}}_{\mathbf{W}}^{(n)}$ chosen as (**a**) the weighted sum of individual coefficients and (**b**) the minimum over coefficients.

**Figure 3.** Success rate of the MaxEnt estimate as a function of the sampling window, for two- and three-state processes involving one or two constraints. Cumulated quintiles of the entropy rate are displayed separately for three-state processes. One thousand processes are picked in each quintile.

**Figure 4.** A realization of the process of Equation (56). The true coefficient ${W}_{--}(t)$ (red) is compared with its MaxEnt (blue) and sampling (yellow) estimates.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Chliamovitch, G.; Dupuis, A.; Chopard, B.
Maximum Entropy Rate Reconstruction of Markov Dynamics. *Entropy* **2015**, *17*, 3738-3751.
https://doi.org/10.3390/e17063738
