# Entropies from Markov Models as Complexity Measures of Embedded Attractors


## Abstract


## 1. Introduction

## 2. Attractor Reconstruction

**Theorem 1** (Time-delay embedding theorem). Given a dynamic system with a d-dimensional solution space and an evolving solution **x**(k) with dynamics **f**(**x**(k)), let s_{k} be some observation **h**(**x**(k)) taken at instant k. Let us also define the lag vector (with dimension d_{e} and common time lag τ):

$$\mathbf{y}_k = \left( s_k, s_{k-\tau}, \dots, s_{k-(d_e-1)\tau} \right)$$

Then the time series **y**_{k} generated by the dynamics contains all of the information of the space of solution vectors **x**(k). The mapping between them is smooth and invertible. This property is referred to as diffeomorphism, and this kind of mapping is referred to as an embedding. Thus, the study of the time series **y**_{k} is also the study of the solutions of the underlying dynamical system **f**(**x**(k)) via a particular coordinate system given by the observation s_{k}.
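As an illustration, the lag-vector construction can be sketched in a few lines (a minimal sketch; the function name and array layout are illustrative, and forward lags are used, which is equivalent to backward lags up to re-indexing):

```python
import numpy as np

def delay_embed(s, d_e, tau):
    """Stack the lag vectors y_k = (s_k, s_{k+tau}, ..., s_{k+(d_e-1)tau}) as rows."""
    s = np.asarray(s, dtype=float)
    n = len(s) - (d_e - 1) * tau  # number of complete lag vectors
    if n <= 0:
        raise ValueError("signal too short for the chosen d_e and tau")
    return np.stack([s[i * tau:i * tau + n] for i in range(d_e)], axis=1)
```

Each row of the returned matrix is one reconstructed state of the attractor.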

## 3. Entropy Measures

The Shannon entropy of a discrete random variable **X** that is drawn according to the probability mass function p(x) is defined as H(**X**) = −∑_{x} p(x) log p(x). For a collection of random variables **X** = {X_{i}}, i = 1, 2, …, n (i.e., a stochastic process), the process can be characterized by a joint probability mass function P(X_{1} = x_{1}, …, X_{n} = x_{n}) = p(x_{1}, x_{2}, …, x_{n}). Under the assumption of the existence of the limit, the rate at which the joint entropy grows with n is defined by [17] (Chapter 4):

$$H(\mathbf{X}) = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots, X_n)$$

This entropy rate is closely related to the Kolmogorov–Sinai entropy (H_{KS}). In order to approximate it, let the state space be partitioned into hypercubes of content ε^{d} and the state of the system measured at intervals of time δ. Moreover, let p(q_{1}, …, q_{n}) denote the joint probability that the state of the system is in the hypercube **q**_{1} at t = δ, **q**_{2} at t = 2δ, and so on. The H_{KS} is defined as [20]:

$$H_{KS} = -\lim_{\delta \to 0}\, \lim_{\varepsilon \to 0}\, \lim_{n \to \infty} \frac{1}{n\delta} \sum_{q_1, \dots, q_n} p(q_1, \dots, q_n) \ln p(q_1, \dots, q_n)$$

H_{KS} estimates the generation of information by computing the probabilities of nearby points that remain close to the signal trajectory after some time. Numerically, only entropies of finite order n can be computed.

The approximate entropy (ApEn) was proposed by Pincus [5] as a computable approximation of H_{KS}. ApEn is a measure of the average conditional information generated by diverging points of the trajectory [5,21]. Given a signal $s=\left\{{s}_{1},{s}_{2},\dots ,{s}_{{n}_{s}}\right\}$, where s_{k} is an observation as those defined in Section 2 and n_{s} is the length of the signal, ApEn can be defined as a function of the correlation sum given by:

$$C_k^{d_e}(r) = \frac{1}{n_s - (d_e - 1)\tau} \sum_{j} \Theta\left( r - \Vert \mathbf{y}_k - \mathbf{y}_j \Vert \right)$$

where Θ(·) is the Heaviside step function, r is a tolerance threshold, the distance is computed between the lag vectors **y**_{k} and **y**_{j}, and the norm $\Vert \cdot \Vert $ is defined in any consistent metric space. For a fixed d_{e} and r, ApEn is given by:

$$\mathrm{ApEn}(d_e, r) = \Phi^{d_e}(r) - \Phi^{d_e + 1}(r), \qquad \Phi^{d_e}(r) = \frac{1}{n_s - (d_e - 1)\tau} \sum_{k} \ln C_k^{d_e}(r)$$

ApEn measures the logarithmic likelihood that sequences of patterns that are close for d_{e} points remain similar, within a tolerance r, at the next point. Therefore, a low value of ApEn reflects a high degree of regularity.
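A direct O(n_s²) sketch of this construction, assuming τ = 1 and the Chebyshev (maximum) norm, with self-matches included as in Pincus' original formulation (function and variable names are illustrative):

```python
import numpy as np

def apen(s, d_e, r):
    """Approximate entropy: ApEn = Phi^{d_e}(r) - Phi^{d_e+1}(r)."""
    s = np.asarray(s, dtype=float)

    def phi(m):
        n = len(s) - m + 1
        # lag vectors with tau = 1, stacked as rows
        y = np.stack([s[i:i + n] for i in range(m)], axis=1)
        # pairwise Chebyshev distances between all lag vectors
        d = np.max(np.abs(y[:, None, :] - y[None, :, :]), axis=2)
        c = np.mean(d <= r, axis=1)  # correlation sum per vector (self-match kept)
        return np.mean(np.log(c))

    return phi(d_e) - phi(d_e + 1)
```

Because self-matches are kept, every correlation sum is strictly positive and the logarithm is always defined.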

## 4. HMM-Based Entropy Measures

A Markov chain (MC) is a stochastic process that evolves through a set of states at discrete time instants (t_{0} < t_{1} < t_{2} < ⋯). The values of the stochastic process change with known probabilities, called transition probabilities. The particularity of this stochastic process is that the probability of change to another state depends only on the current state of the process; this is known as the Markov condition. If such probabilities do not change with time and the steady-state distribution is also constant, the MC is stationary; additionally, if every state of the MC can be reached from any other state (which is called irreducible) and all of the states are aperiodic and non-null recurrent, the MC is considered ergodic [23]. Let $\mathit{X}=\left\{{X}_{{t}_{i}}\right\}$ be a stationary MC, where ${X}_{{t}_{i}}$ is the state at the time t_{i}, which takes values in the finite alphabet $\mathcal{X}$. The MC is completely determined by the set {**π**, **K**} where:

- **π** = {π_{i}}, i = 1, 2, …, m is the stationary distribution, where m is the number of states in the MC and π_{i} = P(X_{t} = i) as t → ∞, being the probability of ending at the i-th state independent of the initial state.
- **K** = {K_{ij}}, 1 ≤ i, j ≤ m is the transition kernel of the MC, where K_{ij} = P(X_{t+1} = j | X_{t} = i) is the probability of reaching the j-th state at time t + 1, coming from the i-th state at time t.

Now, let **Z** = {Z_{t}} denote a noisy version of **X** = {X_{t}} corrupted by discrete memoryless perturbations, which takes values in the finite alphabet $\mathcal{Z}$. **C** will denote the channel transition matrix, i.e., the $|\mathcal{X}|\times |\mathcal{Z}|$ matrix with entries **C**(x, z) = P(Z_{t} = z | X_{t} = x).

The process **Z** is then a hidden Markov process (HMP), since the states of the Markov process cannot be identified from its output (the states are "hidden"). The distribution and entropy rate H(**Z**) of an HMP are completely determined by the pair {**K**, **C**}; however, the explicit form of H(**Z**) as a function of such a pair is unknown [25]. On the other hand, let us consider the joint entropy H(**X**, **Z**), which, according to Equation (5), can be expressed as:

$$H(\mathbf{X}, \mathbf{Z}) = H(\mathbf{X}) + H(\mathbf{Z} \mid \mathbf{X})$$

where H(**X**) is the entropy of the Markov process (Equation (15)) and H(**Z**|**X**) is the conditional entropy of the stochastic process **Z** given the Markov process **X**. Therefore, in the same way as in Equation (15) and taking into account that the noise distributions in each state of the Markov process are conditionally independent of each other, it is possible to establish an entropy measure of the HMP as the entropy of the Markov process plus the entropy generated by the noise in each state of the process. The conditional entropy rate of **Z** given **X** can be expressed as [27]:

$$H(\mathbf{Z} \mid \mathbf{X}) = \sum_{x \in \mathcal{X}} \pi_x\, H(\mathbf{Z} \mid X_t = x)$$
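The two terms of the decomposition can be checked numerically on a toy chain and channel (a sketch; function names are illustrative and natural logarithms are assumed):

```python
import numpy as np

def mc_entropy_rate(pi, K):
    """Entropy rate of a stationary MC: H = -sum_i pi_i sum_j K_ij log K_ij."""
    K = np.asarray(K, dtype=float)
    with np.errstate(divide="ignore"):
        logK = np.where(K > 0, np.log(K), 0.0)  # 0 log 0 treated as 0
    return -np.sum(np.asarray(pi) * np.sum(K * logK, axis=1))

def conditional_entropy_rate(pi, C):
    """H(Z|X) = sum_x pi_x H(Z | X_t = x) for a memoryless channel C(x, z)."""
    C = np.asarray(C, dtype=float)
    with np.errstate(divide="ignore"):
        logC = np.where(C > 0, np.log(C), 0.0)
    return -np.sum(np.asarray(pi) * np.sum(C * logC, axis=1))
```

For a noiseless channel (identity **C**), the conditional term vanishes and the joint entropy reduces to the MC entropy, as expected from the decomposition above.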

#### 4.1. Estimation of the HMM-Based Entropies

In practice, the HMP can be modeled by means of a discrete hidden Markov model (HMM), defined by the set {**π**, **K**, **B**}, where **π** and **K** remain as described before (Section 4), and **B** is defined as follows [28]:

- **B** = {B_{ij}}, i = 1, 2, …, m, j = 1, 2, …, b is the probability distribution of the observation symbols, being B_{ij} = P(Z_{t} = υ_{j} | X_{t} = i), where Z_{t} is the output at time t, υ_{j} are the different symbols that can be associated with the output and b is the total number of symbols.

The symbols υ_{j} constitute a codebook Ψ = {υ_{j}}, which has typically been found by means of unsupervised clustering methods, such as linear vector quantization or k-means [30], that require a prior setting of the number of clusters. However, for attractor modeling, the estimation of the principal curve (PC) of the attractor is more interesting, since it can be considered as the "profile" trajectory, and its points can be used to construct Ψ. There are different methods to estimate PCs; in this work, the subspace-constrained mean shift (SCMS) method proposed in [15] was employed, because it provides very stable estimations and good convergence properties [31]. SCMS is based on the well-known mean shift (MS) algorithm [32], which is a procedure to find the nearest stationary point of the underlying density function and is able to detect the modes of the density. SCMS is a generalization of MS that iteratively tries to find modes of a pdf in a local subspace. It can be used with parametric or non-parametric pdf estimators that are, at least, twice continuously differentiable. Formally, let f be a pdf on **R**^{D}; the d-dimensional principal surface is the collection of all points **x** ∈ **R**^{D} where the gradient ∇f(**x**) is orthogonal to exactly D − d eigenvectors of the Hessian of f, and the eigenvalues corresponding to these eigenvectors are negative [31]. For the PC (which is the interest of this paper), the one-dimensional principal surface is the collection of all points **x** ∈ **R**^{D} at which the gradient of the pdf is an eigenvector of the Hessian of the pdf, and the remaining eigenvectors of the Hessian have negative eigenvalues; i.e., a principal curve is a ridge of the pdf, and every point on the principal curve is a local maximum of the pdf in the affine subspace orthogonal to the curve [31]. SCMS uses a simple modified MS algorithm to iteratively find the set of points according to the previous PC definition [15]. Figure 1 depicts a reconstructed attractor obtained from 200 ms of a normal speech recording and its principal curve obtained with the SCMS algorithm.

Each point of the embedded attractor can then be assigned to its closest codebook entry (i.e., to a state of the MC); therefore, it is very easy to estimate the kernel transition matrix **K**, only by counting the transitions between states and converting them into probabilities. Moreover, taking into account the quasi-periodicity of the processes under study, some kind of oscillatory behavior could be expected for the Markov chains estimated from the reconstructed attractors. Thus, the final state of the process depends on the length of the recorded time series, and it will not tend to some states notably more than others. Therefore, the stationary distribution **π** can be assumed to be uniform. From Equation (15), the Markov chain entropy H_{MC} can be defined as:

$$H_{MC} = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{m} K_{ij} \log K_{ij}$$
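The transition-counting step can be sketched as follows (a minimal sketch; state labels are assumed to be integers 0, …, m − 1, and **π** is taken as uniform as argued in the text):

```python
import numpy as np

def transition_matrix(states, m):
    """Estimate the kernel K by counting transitions between consecutive states."""
    K = np.zeros((m, m))
    for i, j in zip(states[:-1], states[1:]):
        K[i, j] += 1
    rows = K.sum(axis=1, keepdims=True)
    return np.divide(K, rows, out=np.zeros_like(K), where=rows > 0)

def h_mc(K):
    """Markov chain entropy under a uniform stationary distribution pi_i = 1/m."""
    with np.errstate(divide="ignore"):
        logK = np.where(K > 0, np.log(K), 0.0)  # 0 log 0 treated as 0
    return -np.mean(np.sum(K * logK, axis=1))
```

A perfectly periodic state sequence gives a deterministic kernel and hence H_{MC} = 0, matching the behavior reported for perfect sinusoids in Section 5.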

H_{MC} is also part of the computation of the HMP entropy (H_{HMP}) in (17) and, of course, of the joint probability in (16). In this work, the conditional entropy of H_{HMP} for each state i was estimated using two different non-parametric entropy estimators (Shannon's and Renyi's entropies, given by Equations (19) and (20), respectively [33]), where n_{w} is the number of points associated with the current MC state and σ is the window size of the kernel function κ (similar to r in the ApEn-based entropies). The order of Renyi's entropy estimator, α, was in all cases set to two. Since the probability density functions of every state are independent of each other, the whole H_{HMP} can be defined, using the Shannon and Renyi estimators respectively, as the entropy of the Markov chain plus the weighted average of H(**Z** | X_{t} = x) over all possible values x that X_{t} may take.

Finally, H_{RSE} was estimated as the mean of the recurrence-state entropies over all of the MC states. Formally, let η_{i}(j) be the number of times that a whole loop around the state i took j steps; normalizing these counts into a loop-length probability distribution per state, the Shannon and Renyi estimators of H_{RSE} are, respectively, the averages over all states of the Shannon and Renyi entropies of these distributions.
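The Shannon version of this construction can be sketched as follows, assuming the state sequence is available; the loop-length histogram η_i(j) is built from the gaps between consecutive visits to each state (names are illustrative):

```python
import numpy as np

def h_rse(states, m):
    """Mean over states of the Shannon entropy of the loop-length distribution."""
    entropies = []
    for i in range(m):
        visits = np.flatnonzero(np.asarray(states) == i)
        if len(visits) < 2:
            continue  # no complete loop around this state
        loops, counts = np.unique(np.diff(visits), return_counts=True)
        p = counts / counts.sum()  # normalized eta_i(j)
        entropies.append(-np.sum(p * np.log(p)))
    return float(np.mean(entropies)) if entropies else 0.0
```

For a perfectly periodic sequence every loop has the same length, so H_{RSE} = 0, in agreement with the sinusoid results of Table 1.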

## 5. Experiments and Results

As with the ApEn-based entropies, H_{HMP} depends on the threshold r.

#### 5.1. Results

#### 5.1.1. Synthetic Sequences

Figure 2 shows the values of H_{HMP} (using Shannon and Renyi's entropy estimators) along with those obtained with FuzzyEn, ApEn and SampEn for three periodic sinusoids of different frequencies f_{1} = 10 Hz, f_{2} = 50 Hz and f_{3} = 100 Hz, as well as two different signal lengths of N = 50 and 500 points. This experiment has been reproduced as described in [12], assuming a sampling frequency f_{s} = 1000 Hz, a time delay τ = 1 and an embedding dimension d_{e} = 2. In view of the results depicted in Figure 2, it is possible to observe that both H_{HMP} estimations provide very similar results. Furthermore, the values obtained for H_{HMP} are quite similar for both lengths evaluated, being smaller for the larger N value. Since H_{HMP} is based on a non-parametric estimation using a Gaussian kernel, and in contrast to the ApEn-based measures, it shows a smooth behavior and does not change abruptly with respect to small changes in r. The original definition of H_{HMP}, Equations (19) and (20), used σ instead of r; although they stand for different concepts (a window size of the kernel function and a neighborhood threshold, respectively), during the experiments with H_{HMP} the reader must interpret them as equal. On the other hand, and also in contrast to the ApEn-based measures, H_{HMP} also presents good consistency, because there is no crossing among the plots corresponding to different frequencies. These properties are desirable for any entropy estimator.

A comparison between FuzzyEn and H_{HMP} in Figure 2 shows that for the first one, the larger the frequency of the sinusoid, the larger the entropy values obtained, whereas for the second one, the larger the frequency, the smaller the entropy. The explanation given to this phenomenon for FuzzyEn in [12] is that high frequency sinusoids appear to be more complex, which is slightly inaccurate: such an empirical result is an undesirable effect of the methods used for estimation, due to the windowing effect and the need to fix a sampling frequency (f_{s}) for comparing sinusoids of different fundamental frequencies (f_{a}), and it does not match the expected theoretical results, since from an information point of view, the entropy should be zero for any discrete periodic sinusoid of infinite length [39]. This phenomenon is discussed in the Appendix.

The ApEn-based measures implicitly assume that the lag vectors **y**_{k} in the correlation sum (Equation (10)) match the vectors defined by Takens' theorem perfectly, which establishes that a proper embedding dimension and time delay must be found in order to achieve good results. For many practical purposes, the most important embedding parameter is the product d_{e}τ of the embedding dimension and the time delay, rather than the embedding dimension and time delay alone [1] (Chapter 3, Section 3.3). The reason is that d_{e}τ is the time span represented by an embedding vector. However, usually, each of the embedding parameters is estimated separately. The most widely employed method for the estimation of d_{e} is the false nearest neighbors method [41], whereas τ is commonly set at the first minimum of the auto-mutual information or the first zero crossing of the autocorrelation function. Regarding Figure 3a, it is possible to observe that for f = 10 Hz and τ = 1, the attractor furls around the main diagonal. However, a proper τ estimation (in this case, τ = 4) recovers the elliptical shape of the embedded attractor, and therefore, a proper selection of the time delay should help to compensate for the effect of the sampling frequency.
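The autocorrelation-based choice of τ (first zero crossing) can be sketched as follows (a minimal sketch; the function name is illustrative):

```python
import numpy as np

def tau_first_zero_autocorr(s):
    """Pick the time delay as the first zero crossing of the autocorrelation."""
    s = np.asarray(s, dtype=float) - np.mean(s)
    ac = np.correlate(s, s, mode="full")[len(s) - 1:]  # lags 0, 1, 2, ...
    ac = ac / ac[0]
    sign_change = np.flatnonzero(ac[:-1] * ac[1:] <= 0)  # first sign change
    return int(sign_change[0]) + 1 if len(sign_change) else len(s) - 1
```

For a sinusoid, this rule returns roughly a quarter of the discrete period, which spreads consecutive lag-vector coordinates apart and unfurls the attractor from the main diagonal.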

Regarding H_{HMP}, since the entropy is estimated depending on the states of a Markov chain, for f = 10 Hz, each state of the MC will contain several points of the attractor, whilst for f = 100 Hz, the distance among consecutive points increases (Figure 3a), so each state will contain only copies of the same point (from different periods), resulting in an entropy tending to zero as f_{a} increases. It is worth emphasizing that a proper attractor reconstruction plays an important role in the reliability of the measures that can be estimated from it [1] (Chapter 3).

Unlike the H_{HMP}- and ApEn-based entropies, the other two proposed measures, H_{MC} and H_{RSE}, do not depend on the parameter r; therefore, they were only analyzed with regard to the frequency, f_{a}, and signal length, N. Table 1 shows the entropy values obtained for the aforementioned synthetic sinusoids. For those entropy measures depending on r, this parameter was a priori set to 0.15. As expected, the Markov-based entropy measures tend to zero, because for perfect sinusoids, there is no unpredictability with respect to the state transitions (H_{MC}) or recurrences (H_{RSE}); in other words, for an embedded attractor from a perfect sinusoid, the conditional entropy of one point given the previous one (Markov condition) is zero. The length of the signal affects the values obtained for all of the measures, mainly because the more periods, the larger the number of points of the attractor falling in the same positions, implying that the random process becomes more predictable; therefore, its entropy decreases. This is an important property of the proposed measures: they characterize the attractor from a dynamical point of view, instead of in a static way, as the ApEn-based entropies do.

The MIX(ρ) process is defined as MIX(ρ)_{j} = (1 − Z_{j})X_{j} + Z_{j}Y_{j}, where X_{j} = $\sqrt{2}$ sin(2πj/12) is a deterministic sinusoid; Y_{j} are i.i.d. uniform random variables on the interval $\left[-\sqrt{3},\sqrt{3}\right]$; and Z_{j} are i.i.d. Bernoulli random variables with parameter ρ. The larger ρ is, the more irregular the process becomes. Figure 4 depicts the values of the H_{HMP}, FuzzyEn, ApEn and SampEn statistics measuring the complexity of three different MIX processes. It is possible to observe that H_{HMP} has a consistency similar to FuzzyEn with respect to the complexity values assigned to the processes under analysis, but H_{HMP} has less dependence on the parameter r, because it takes values in a narrower interval. Additionally, comparing Figures 4b and 4g against 4c and 4h, the results are more consistent for short series. For H_{MC} and H_{RSE}, Table 2 shows the results obtained for the Shannon and Renyi estimators. It is possible to observe that these measures also present good consistency regarding the complexity of the MIX processes. However, for lower ρ values, there is a stronger dependence on the length of the signal.
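The MIX(ρ) generator can be sketched directly from its definition (a sketch following Pincus' construction; the RNG seeding is illustrative):

```python
import numpy as np

def mix(rho, n, seed=None):
    """MIX(rho): with probability rho, replace a sinusoid sample by uniform noise."""
    rng = np.random.default_rng(seed)
    j = np.arange(1, n + 1)
    x = np.sqrt(2) * np.sin(2 * np.pi * j / 12)      # deterministic part X_j
    y = rng.uniform(-np.sqrt(3), np.sqrt(3), n)      # i.i.d. uniform noise Y_j
    z = rng.random(n) < rho                          # i.i.d. Bernoulli(rho) Z_j
    return np.where(z, y, x)
```

Setting ρ = 0 recovers the pure sinusoid, while ρ = 1 yields pure uniform noise, so ρ interpolates between full regularity and full irregularity.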

A similar analysis was carried out adding noise to the sequences for H_{MC}, ApEn, SampEn and FuzzyEn; H_{HMP} and H_{RSE} showed good consistency for signal-to-noise relationships lower than 0.2.

#### 5.1.2. Real-Life Physiological Signals

For the EEG signals (Figure 5), H_{HMP} performed similarly to SampEn; however, H_{MC} and H_{RSE} are clearly able to differentiate between the S and E groups. This indicates that these measures could be used together to improve the performance of an automatic recognition system, since they provide complementary discriminant information.

For the voice signals (Figure 6), H_{MC} and H_{HMP} performed very well at separating both classes, whilst H_{RSE} behaves similarly to SampEn. It is clear from Figure 6 that normal samples can be grouped with less dispersion than the pathological ones. Lastly, the entropy values provided by FuzzyEn let the two groups overlap completely.

For the apnea database (Figure 7), H_{RSE} presents the best performance among the measures evaluated on the heart rate variability (HRV) sequences. The performance of the measures was also evaluated using the ECG recordings directly; nevertheless, the ApEn-based features showed a better behavior applied to the HRV sequences, whilst the Markov-based features presented a more stable behavior.

The discriminant capability of each measure was quantified by means of the Fisher index (FI), defined in terms of S_{B} and S_{W}, the between-cluster scatter matrix and the within-cluster scatter matrix, respectively, which are given by:

$$S_B = \sum_{i=1}^{N_c} n_i (\mu_i - \mu)(\mu_i - \mu)^{T}, \qquad S_W = \sum_{i=1}^{N_c} \sum_{\mathbf{x} \in C_i} (\mathbf{x} - \mu_i)(\mathbf{x} - \mu_i)^{T}$$

where C_{i} denotes the group or class i, N_{c} is the number of classes, µ_{i} is the mean of the i-th class, n_{i} is the number of samples from class i and µ is the mean of all samples. Since FI has no maximum bound, for the sake of comparison, it is easier to normalize the FIs obtained for the different measures with respect to the maximum one in a single experiment. Therefore, the best measure will obtain an FI of one, and the FI obtained for the other measures represents how good (or bad) the measure is with respect to the best in the particular experiment. The one-way ANOVA test is included for the sake of comparison with other works in the state-of-the-art.

In terms of the FI, for the EEG signals, H_{RSE} is better than FuzzyEn, H_{HMP} performs worse than SampEn and H_{MC} is by far the best measure to identify the different pathophysiological states. On the other hand, for the voice signals, H_{HMP} performs similarly to ApEn. The FI obtained by FuzzyEn is quite low, which is in concordance with Figure 6. In this case, H_{MC} is again the best feature, indicating that the regularity of the voice signals is a very important aspect for differentiating between normal and pathological subjects. Finally, for the apnea database, the recurrence entropy H_{RSE} performs better than all of the other measures for both the ECG recordings and their derived HRV signals. As was pointed out above, the Markov-based measures show similar FI values for the ECG and their derived HRV signals, whilst SampEn and FuzzyEn showed a very low performance when they were evaluated on the original ECG recordings.

## 6. Discussion and Conclusions

The experiments with synthetic signals showed that the lower the fundamental frequency f_{a} of the sinusoid, the bigger the number of cycles needed to obtain small entropy values (i.e., for a given N, the lower the f_{a}, the higher the uncertainty, and vice versa). In view of the previous discussion, it is not possible to conclude that a higher f_{a} implies more complexity in terms of entropy (i.e., there is no causality relationship, although there exists an obvious correlation for some of the entropy measurements found in the literature, specifically for FuzzyEn and permutation entropy [42]). In contrast, the higher the f_{a}, the smaller the entropy should be for a given f_{s} and N, because the number of samples falling in one period of a high frequency sinusoid is lower, so the amplitude values are more frequently repeated, leading to fewer, more distant points in the attractor.

Since both the signal length N and the frequency f_{a} affect the entropy values, we could speak about a causality relationship (to be verified, and out of the scope of this paper) with respect to both the length of the signal and the frequency of the sinusoid (not necessarily linear). Despite this, the literature reports causality relationships identifying higher frequency with higher entropy. This is also shown for other complexity measurements [43]. In any case, as demonstrated in [42], the relationship is not linear.

Although the previous analysis was carried out for fixed f_{s} and f_{a}, these conclusions are valid for a range of τ values. The bigger the quotient f_{s}/f_{a}, the larger the range of τ that could be used to satisfy the aforementioned causality relationship. If τ takes very large values, the relationship between f_{a} and entropy might not be monotonically decreasing or increasing. Thus, the higher the τ value, the larger the f_{s} needed to satisfy the aforementioned causality relationship.

## Acknowledgments

## Appendix

This appendix analyzes the opposite behaviors of FuzzyEn and H_{HMP} for sinusoids of different frequencies. Note that for the first one, the larger the frequency of the sinusoid, the larger the entropy values obtained, whereas for the second one, the larger the frequency, the smaller the entropy.

A discrete sinusoid x(k) = A cos(Ω_{a}k) is the sampled version of a continuous one x(t) = A cos(2πf_{a}t) (satisfying the constraint that ensures a periodic discrete sequence, Ω_{a}/2π = l/N_{0}, being N_{0} the discrete fundamental period of the sequence and l ∈ ℤ^{+}). For a fixed sampling frequency f_{s} and number of samples, N (i.e., an equal time window), increasing f_{a} implies fewer points in the attractor; therefore, the space among consecutive points increases, and due to the intrinsic method used by FuzzyEn to estimate the "complexity", the entropy value is higher. This is shown in Figure 3a, in which the embedded attractors reconstructed from three perfect sinusoids are depicted for the same frequencies used in Figure 2 with a default value for the time delay (τ = 1) (as suggested in [5]). The analysis requires pointing out that a periodic discrete sinusoid must always have a d_{e}-dimensional ellipsoidal attractor; so from a complexity point of view, every sinusoid should be considered equivalent, no matter what f_{a} is, but the foci of the ellipsoidal attractor might vary depending on f_{a} and f_{s}. Therefore, an eventual comparison of the results obtained using sinusoids for any complexity measurement derived from their attractors requires ensuring the same f_{s} for all of the sinusoids under analysis. Obviously, the problem that arises when fixing f_{s} is that the number of points in the attractor is smaller for higher frequencies (Figure 3). Note that two segments of discrete sinusoids of the same length (i.e., the same number of samples, N) generated with different frequencies f_{1} and f_{2} are identical if they are sampled with different frequencies f_{s1} and f_{s2} (matching the Nyquist theorem) and satisfying the relationship f_{s1}/f_{1} = f_{s2}/f_{2}. Thus, their attractors would also be identical, leading to the same results for each complexity measurement that could be derived from their corresponding attractors.
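The last claim is easy to verify numerically: sinusoids with an equal ratio f_s/f_a produce sample-by-sample identical sequences, and hence identical embedded attractors (a minimal sketch; the function name is illustrative):

```python
import numpy as np

def sampled_sine(f_a, f_s, n):
    """n samples of a sinusoid of analog frequency f_a sampled at f_s."""
    k = np.arange(n)
    return np.sin(2 * np.pi * f_a * k / f_s)
```

For example, f_a = 10 Hz at f_s = 1000 Hz and f_a = 100 Hz at f_s = 10,000 Hz share the ratio f_s/f_a = 100 and therefore yield the same discrete sequence.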

In view of the above, it is not possible to conclude that a higher f_{a} implies more complexity in terms of entropy (i.e., there is no causality relationship, although there exists an obvious correlation for some of the entropy measurements found in the literature, specifically for FuzzyEn and permutation entropy [42]). This is an undesirable effect of the methods used for estimation and does not match the expected theoretical results, since from an information point of view, the entropy should be zero for any discrete periodic sinusoid of infinite length [39]. In contrast, the higher the f_{a}, the smaller the entropy should be for a given f_{s} and N, because the number of samples falling in one period of a high frequency sinusoid is smaller, so the amplitude values are more frequently repeated, leading to fewer, more distant points in the attractor.

## Author Contributions

## Conflicts of Interest

## References

1. Kantz, H.; Schreiber, T. Nonlinear Time Series Analysis, 2nd ed.; Cambridge University Press: Cambridge, UK, 2004.
2. Gao, J.; Hu, J.; Tung, W.W. Entropy measures for biological signal analyses. Nonlinear Dyn. 2012, 68, 431–444.
3. Yan, R.; Gao, R. Approximate entropy as a diagnostic tool for machine health monitoring. Mech. Syst. Signal Process. 2007, 21, 824–839.
4. Balasis, G.; Donner, R.V.; Potirakis, S.M.; Runge, J.; Papadimitriou, C.; Daglis, I.A.; Eftaxias, K.; Kurths, J. Statistical Mechanics and Information-Theoretic Perspectives on Complexity in the Earth System. Entropy 2013, 15, 4844–4888.
5. Pincus, S.M. Approximate entropy as a measure of system complexity. Proc. Natl. Acad. Sci. USA 1991, 88, 2297–2301.
6. Abarbanel, H.D. Analysis of Observed Chaotic Data; Springer: New York, NY, USA, 1996.
7. Milnor, J. On the concept of attractor. Commun. Math. Phys. 1985, 99, 177–195.
8. Giovanni, A.; Ouaknine, M.; Triglia, J.M. Determination of Largest Lyapunov Exponents of Vocal Signal: Application to Unilateral Laryngeal Paralysis. J. Voice 1999, 13, 341–354.
9. Serletis, A.; Shahmordi, A.; Serletis, D. Effect of noise on estimation of Lyapunov exponents from a time series. Chaos Solitons Fractals 2007, 32, 883–887.
10. Arias-Londoño, J.; Godino-Llorente, J.; Sáenz-Lechón, N.; Osma-Ruiz, V.; Castellanos-Domínguez, G. Automatic detection of pathological voices using complexity measurements, noise parameters and cepstral coefficients. IEEE Trans. Biomed. Eng. 2011, 58, 370–379.
11. Richman, J.S.; Moorman, J.R. Physiological time-series analysis using approximate entropy and sample entropy. Am. J. Physiol. Heart Circ. Physiol. 2000, 278, H2039–H2049.
12. Chen, W.; Zhuang, J.; Yu, W.; Wang, Z. Measuring complexity using FuzzyEn, ApEn and SampEn. Med. Eng. Phys. 2009, 31, 61–68.
13. Xu, L.S.; Wang, K.Q.; Wang, L. Gaussian kernel approximate entropy algorithm for analyzing irregularity of time series. In Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, Guangzhou, China, 18–21 August 2005; pp. 5605–5608.
14. Cappé, O. Inference in Hidden Markov Models; Springer: New York, NY, USA, 2007.
15. Ozertem, U.; Erdogmus, D. Locally Defined Principal Curves and Surfaces. J. Mach. Learn. Res. 2011, 12, 241–274.
16. Hastie, T.; Stuetzle, W. Principal curves. J. Am. Stat. Assoc. 1989, 84, 502–516.
17. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley: Hoboken, NJ, USA, 2006.
18. Takens, F. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence; Lecture Notes in Mathematics, Volume 898; Springer: Berlin, Germany, 1981; pp. 366–381.
19. Alligood, K.T.; Sauer, T.D.; Yorke, J.A. Chaos: An Introduction to Dynamical Systems; Springer: New York, NY, USA, 1996.
20. Costa, M.; Goldberger, A.; Peng, C.K. Multiscale entropy analysis of biological signals. Phys. Rev. E 2005, 71, 021906.
21. Rezek, I.A.; Roberts, S.J. Stochastic complexity measures for physiological signal analysis. IEEE Trans. Biomed. Eng. 1998, 45, 1186–1191.
22. Woodcock, D.; Nabney, I.T. A New Measure Based on the Renyi Entropy Rate Using Gaussian Kernels; Technical Report; Aston University: Birmingham, UK, 2006.
23. Murphy, K.P. Machine Learning: A Probabilistic Perspective; MIT Press: Cambridge, MA, USA, 2012; Chapter 17.
24. Fraser, A.M. Hidden Markov Models and Dynamical Systems; SIAM: Philadelphia, PA, USA, 2008.
25. Ephraim, Y.; Merhav, N. Hidden Markov Processes. IEEE Trans. Inf. Theory 2002, 48, 1518–1569.
26. Ragwitz, M.; Kantz, H. Markov models from data by simple nonlinear time series predictors in delay embedding spaces. Phys. Rev. E 2002, 65, 1–12.
27. Sheng, Y. The Theory of Trackability and Robustness for Process Detection. Ph.D. Thesis, Dartmouth College, Hanover, NH, USA, 2008. Available online: http://www.ists.dartmouth.edu/library/206.pdf (accessed on 2 June 2015).
28. Rabiner, L.R. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 1989, 77, 257–286.
29. Massachusetts Eye and Ear Infirmary. Voice Disorders Database, Version 1.03 [CD-ROM]; Kay Elemetrics Corp.: Lincoln Park, NJ, USA, 1994.
30. Duda, R.O.; Hart, P.E.; Stork, D.G. Pattern Classification, 2nd ed.; Wiley: New York, NY, USA, 2000.
31. Ghassabeh, Y.; Linder, T.; Takahara, G. On some convergence properties of the subspace constrained mean shift. Pattern Recognit. 2013, 46, 3140–3147.
32. Comaniciu, D.; Meer, P. Mean Shift: A Robust Approach Toward Feature Space Analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 603–619.
33. Erdogmus, D.; Hild, K.E.; Principe, J.C.; Lazaro, M.; Santamaria, I. Adaptive Blind Deconvolution of Linear Channels Using Renyi's Entropy with Parzen Window Estimation. IEEE Trans. Signal Process. 2004, 52, 1489–1498.
34. Rényi, A. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 20 June–30 July 1960; pp. 547–561. Available online: http://projecteuclid.org/euclid.bsmsp/1200512181 (accessed on 2 June 2015).
35. Andrzejak, R.; Lehnertz, K.; Rieke, C.; Mormann, F.; David, P.; Elger, C. Indications of nonlinear deterministic and finite dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state. Phys. Rev. E 2001, 64, 061907.
36. Parsa, V.; Jamieson, D. Identification of pathological voices using glottal noise measures. J. Speech Lang. Hear. Res. 2000, 43, 469–485.
37. Goldberger, A.; Amaral, L.; Glass, L.; Hausdorff, J.; Ivanov, P.; Mark, R.; Mietus, J.; Moody, G.; Peng, C.K.; Stanley, H. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation 2000, 101, e215–e220.
38. Penzel, T.; Moody, G.; Mark, R.; Goldberger, A.; Peter, J.H. The Apnea-ECG Database. In Proceedings of Computers in Cardiology, Cambridge, MA, USA, 24–27 September 2000; pp. 255–258.
39. Viertiö-Oja, H.; Maja, V.; Särkelä, M.; Talja, P.; Tenkanen, N.; Tolvanen-Laakso, H.; Paloheimo, M.; Vakkuri, A.; Yli-Hankala, A.; Meriläinen, P. Description of the Entropy™ algorithm as applied in the Datex-Ohmeda S/5™ Entropy Module. Acta Anaesthesiol. Scand. 2004, 48, 154–161.
40. Kaffashi, F.; Foglyano, R.; Wilson, C.G.; Loparo, K.A. The effect of time delay on Approximate & Sample Entropy calculation. Physica D 2008, 237, 3069–3074.
41. Cao, L. Practical method for determining the minimum embedding dimension of a scalar time series. Physica D 1997, 110, 43–50.
42. Morabito, F.C.; Labate, D.; La-Foresta, F.; Bramanti, A.; Morabito, G.; Palamara, I. Multivariate Multi-Scale Permutation Entropy for Complexity Analysis of Alzheimer's Disease EEG. Entropy 2012, 14, 1186–1202.
43. Aboy, M.; Hornero, R.; Abásolo, D.; Álvarez, D. Interpretation of the Lempel-Ziv Complexity Measure in the Context of Biomedical Signal Analysis. IEEE Trans. Biomed. Eng. 2006, 53, 2282–2288.

**Figure 1.** Two-dimensional embedded attractor extracted from 200 ms of the speech signal HB1NAL.NSP (taken from [29]) and its principal curve obtained using the subspace-constrained mean shift (SCMS) method. The recording corresponds to the sustained phonation of the vowel /ah/.

**Figure 2.** Values of the hidden Markov process (HMP) entropy (H_{HMP}), fuzzy entropy (FuzzyEn), approximate entropy (ApEn) and sample entropy (SampEn) for periodic sinusoids of different frequencies and signal lengths with respect to the parameter r.

**Figure 3.** Two-dimensional embedded attractors for perfect sinusoids with frequencies f_{1} = 10 Hz, f_{2} = 50 Hz and f_{3} = 100 Hz:

**(a)** time delays set to the default value τ = 1; note that, depending on the fundamental and sampling frequencies, perfect sinusoids may even appear in the state space as a diagonal line, indicating that the ellipsoid has collapsed;

**(b)** time delays set using a criterion based on the auto-mutual information; note that when a more appropriate τ is set for each sinusoidal signal, the embedded attractor recovers its ellipsoidal shape.
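The lag-selection rule mentioned in the caption (taking τ at the first minimum of the auto-mutual information) and the construction of the lag vectors can be sketched as follows. This is a minimal illustration under our own choices (a histogram estimator with 16 bins; the function names `auto_mutual_information`, `first_minimum_lag` and `delay_embed` are hypothetical), not the exact procedure used in the paper.

```python
import numpy as np

def auto_mutual_information(x, tau, bins=16):
    """Histogram estimate (in nats) of I(s_k ; s_{k+tau})."""
    a, b = x[:-tau], x[tau:]
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = pxy / pxy.sum()                      # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)        # marginal of s_k
    py = pxy.sum(axis=0, keepdims=True)        # marginal of s_{k+tau}
    nz = pxy > 0                               # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def first_minimum_lag(x, max_tau=50):
    """First local minimum of the auto-mutual information over tau."""
    ami = [auto_mutual_information(x, t) for t in range(1, max_tau + 1)]
    for t in range(1, len(ami) - 1):
        if ami[t] < ami[t - 1] and ami[t] < ami[t + 1]:
            return t + 1                       # lags are 1-based
    return int(np.argmin(ami)) + 1             # fall back to global minimum

def delay_embed(x, dim=2, tau=1):
    """Lag vectors y_k = (s_k, s_{k+tau}, ..., s_{k+(dim-1)tau})."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])
```

For a pure sinusoid the auto-mutual information is high at small lags (consecutive samples nearly repeat) and lowest near a quarter period, where the two-dimensional attractor opens into a circle instead of a collapsed diagonal.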

**Figure 4.** Values of H_{HMP}, FuzzyEn, ApEn and SampEn obtained when measuring the complexity of MIX(0.3), MIX(0.5) and MIX(0.7). The abscissa represents the r values, displayed logarithmically in order to make the experiment comparable to the one proposed in [12].
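The MIX(ρ) family, in the construction usually attributed to Pincus (and presumably the one followed in [12]), replaces each sample of a unit-variance sinusoid, independently with probability ρ, by a sample of uniform noise with the same variance. A minimal sketch, where the function name and seeding are our own choices:

```python
import numpy as np

def mix(p, n, seed=0):
    """MIX(p): sinusoid samples randomly replaced by uniform noise."""
    rng = np.random.default_rng(seed)
    j = np.arange(1, n + 1)
    x = np.sqrt(2) * np.sin(2 * np.pi * j / 12)       # deterministic part, variance 1
    y = rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)  # uniform noise, variance 1
    z = rng.random(n) < p                             # Bernoulli(p) replacement mask
    return np.where(z, y, x)
```

As ρ grows from 0.3 to 0.7 the process moves from mostly periodic to mostly stochastic, which is the transition the entropy curves in Figure 4 track.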

**Figure 5.** Box plots for entropy measures estimated from EEG signals of three different groups of people: (H) healthy, (E) epileptic subjects during a seizure-free interval and (S) epileptic subjects during a seizure.

**Figure 6.** Box plots for entropy measures estimated from voice signals of two different groups of people: (N) normal, (P) pathological.

**Figure 7.** Box plots for entropy measures estimated from HRV signals of two different groups of people: (N) normal, (A) apnea.

**Table 1.** Values of entropy measures for perfect sinusoids of different frequencies (f_{i}) and signal lengths (N). MC, Markov chain; RSE, recurrence-state entropy.

| f_{i} | N | H_{MC} S* | H_{MC} R** | H_{HMP} S | H_{HMP} R | H_{RSE} S | H_{RSE} R | ApEn | SampEn | FuzzyEn |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 50 | 1.21 | 1.07 | 1.38 | 1.37 | 0.75 | 0.67 | 0.18 | 0.28 | 0.48 |
| 10 | 500 | 1.01 | 0.76 | 0.66 | 0.65 | 0.96 | 0.70 | 0.24 | 0.19 | 0.21 |
| 50 | 50 | 0.30 | 0.28 | 0.33 | 0.33 | 0.26 | 0.23 | 0.18 | 0.31 | 0.72 |
| 50 | 500 | 0.12 | 0.10 | 0.14 | 0.14 | 0.11 | 0.09 | 0.14 | 0.19 | 0.65 |
| 100 | 50 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.04 | 0.02 | 0.94 |
| 100 | 500 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.82 |

^{*} Shannon estimator; ^{**} Renyi estimator.
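For reference, the template-matching computation behind the ApEn/SampEn columns can be sketched as a minimal SampEn estimator (Chebyshev distance, tolerance r expressed as a fraction of the signal's standard deviation). This is an illustrative sketch only, not the implementation used to generate the table; in particular, it uses one fewer template of length m + 1 than some published variants.

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.15):
    """SampEn = -ln(A/B): A and B count pairs of templates of length
    m+1 and m within Chebyshev distance r, self-matches excluded."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)
    n = len(x)

    def count_matches(mm):
        # all overlapping templates of length mm
        tpl = np.array([x[i:i + mm] for i in range(n - mm)])
        total = 0
        for i in range(len(tpl)):
            d = np.max(np.abs(tpl - tpl[i]), axis=1)  # Chebyshev distance
            total += int(np.sum(d <= r)) - 1          # drop the self-match
        return total

    a, b = count_matches(m + 1), count_matches(m)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf
```

A near-periodic signal yields many repeated templates and hence a low SampEn, while noise yields few length-(m + 1) matches and a high value, consistent with the trends in Table 1.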

**Table 2.** Values of Markovian entropies measuring the complexity of MIX(0.3), MIX(0.5) and MIX(0.7). For H_{HMP}, the values were estimated using r = 0.15 × std(signal).

| ρ | N | H_{MC} S | H_{MC} R | H_{HMP} S | H_{HMP} R | H_{RSE} S | H_{RSE} R |
|---|---|---|---|---|---|---|---|
| 0.3 | 50 | 0.64 | 0.59 | 0.63 | 0.61 | 0.56 | 0.56 |
| 0.3 | 100 | 0.96 | 0.88 | 0.59 | 0.56 | 0.94 | 0.89 |
| 0.5 | 50 | 0.93 | 0.91 | 0.64 | 0.62 | 0.57 | 0.57 |
| 0.5 | 100 | 1.31 | 1.26 | 0.80 | 0.78 | 1.09 | 1.08 |
| 0.7 | 50 | 0.83 | 0.80 | 0.83 | 0.82 | 0.90 | 0.62 |
| 0.7 | 100 | 1.22 | 1.18 | 0.81 | 0.79 | 1.16 | 1.14 |

**Table 3.** Entropy values obtained from the logistic system for different values of R and of the noise level (NL). The parameter r was set to 0.25.

| R | NL | H_{MC} S | H_{MC} R | H_{HMP} S | H_{HMP} R | H_{RSE} S | H_{RSE} R | ApEn | SampEn | FuzzyEn |
|---|---|---|---|---|---|---|---|---|---|---|
| 3.5 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.28 |
| 3.5 | 0.1 | 0.85 | 0.74 | 0.23 | 0.22 | 1.90 | 1.67 | 0.08 | 0.07 | 0.48 |
| 3.5 | 0.2 | 1.40 | 1.17 | 0.85 | 0.83 | 2.56 | 2.07 | 0.61 | 0.56 | 0.82 |
| 3.5 | 0.3 | 1.44 | 1.28 | 1.33 | 1.31 | 2.23 | 2.37 | 0.90 | 0.88 | 1.03 |
| 3.7 | 0.0 | 0.84 | 0.74 | 0.43 | 0.42 | 3.41 | 2.68 | 0.38 | 0.38 | 0.77 |
| 3.7 | 0.1 | 1.10 | 0.95 | 0.67 | 0.66 | 3.38 | 2.73 | 0.49 | 0.49 | 0.89 |
| 3.7 | 0.2 | 1.58 | 1.43 | 1.54 | 1.51 | 2.83 | 2.98 | 0.86 | 0.81 | 1.10 |
| 3.7 | 0.3 | 1.77 | 1.63 | 2.65 | 2.02 | 2.74 | 3.11 | 1.10 | 1.13 | 1.29 |
| 3.8 | 0.0 | 1.10 | 0.98 | 0.40 | 0.39 | 3.63 | 3.28 | 0.47 | 0.47 | 0.99 |
| 3.8 | 0.1 | 1.22 | 1.03 | 0.72 | 0.70 | 3.75 | 3.52 | 0.58 | 0.59 | 1.14 |
| 3.8 | 0.2 | 1.78 | 1.65 | 1.53 | 1.49 | 3.37 | 3.25 | 0.92 | 0.90 | 1.31 |
| 3.8 | 0.3 | 1.83 | 1.68 | 2.14 | 2.10 | 2.42 | 2.36 | 1.19 | 1.24 | 1.55 |
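The series behind Table 3 can be reproduced in outline: iterate the logistic map x_{k+1} = R x_k (1 − x_k), discard a transient, and contaminate the observations with noise. The noise model below (zero-mean uniform noise scaled by NL times the clean signal's standard deviation), the initial condition and the burn-in length are our assumptions for illustration, not necessarily the paper's exact setup.

```python
import numpy as np

def logistic_series(R, n, noise_level=0.0, x0=0.4, burn_in=500, seed=0):
    """Logistic-map observations with optional additive noise (assumed model)."""
    rng = np.random.default_rng(seed)
    out = np.empty(n)
    x = x0
    for k in range(burn_in + n):
        x = R * x * (1.0 - x)          # logistic map iteration
        if k >= burn_in:
            out[k - burn_in] = x       # keep post-transient samples
    # assumed observational noise: uniform, scaled by NL * std of the clean series
    return out + noise_level * np.std(out) * rng.uniform(-1.0, 1.0, size=n)
```

At R = 3.5 the clean map settles onto a period-4 orbit, which matches the near-zero entropies in the table's first row, while R = 3.7 and R = 3.8 lie in the chaotic regime; increasing NL raises every measure.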

**Table 4.** Performance of the entropy measures in discriminating the different classes in the EEG, voice pathology and apnea (ECG and heart rate variability (HRV)) datasets, according to the Fisher class separability index (FI) and the one-way ANOVA test.

| Measures | EEG FI | EEG ANOVA | Voice FI | Voice ANOVA | ECG FI | ECG ANOVA | HRV FI | HRV ANOVA |
|---|---|---|---|---|---|---|---|---|
| ApEn | 0.48 | p < 0.001 | 0.69 | p < 0.001 | 0.64 | p < 0.001 | 0.69 | p < 0.001 |
| SampEn | 0.48 | p < 0.001 | 0.34 | p < 0.001 | 0.08 | p < 0.001 | 0.83 | p < 0.001 |
| FuzzyEn | 0.80 | p < 0.001 | 0.09 | p < 0.001 | 0.03 | p < 0.001 | 0.72 | p < 0.001 |
| H_{MCr} | 1.00 | p < 0.001 | 1.00 | p < 0.001 | 0.37 | p < 0.001 | 0.58 | p < 0.001 |
| H_{HMPr} | 0.24 | p < 0.001 | 0.57 | p < 0.001 | 0.16 | p < 0.001 | 0.01 | p > 0.05 |
| H_{RSEr} | 0.83 | p < 0.001 | 0.20 | p < 0.001 | 1.00 | p < 0.001 | 1.00 | p < 0.001 |

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Arias-Londoño, J.D.; Godino-Llorente, J.I.
Entropies from Markov Models as Complexity Measures of Embedded Attractors. *Entropy* **2015**, *17*, 3595-3620.
https://doi.org/10.3390/e17063595
