# Some Convex Functions Based Measures of Independence and Their Application to Strange Attractor Reconstruction


## Abstract


## 1. Introduction

## 2. Quality Factor (QF) of Quasientropy (QE)

#### 2.1. Quasientropy

Denote by q_r(·) the cumulative distribution function (CDF) of a variable r, i.e., q_r(u) = Prob(r ≤ u), where Prob(A) denotes the probability that event A occurs, and denote by p_r(·) the probability density function (PDF) of r. Without loss of generality, consider two continuous variables r_1 and r_2. In the past one or two decades, the study of copulas has become a blooming field of statistical research [20]. A copula is the joint CDF of variables transformed by their respective CDFs. The rationale of the study of copulas is that, to study the relation between two or more variables, we should nullify the effect of their marginal distributions and concentrate on their joint distribution. Based on this principle, we transform r_1 and r_2 by their respective CDFs as follows:

z_1 ≡ q_{r_1}(r_1),  z_2 ≡ q_{r_2}(r_2).    (1)

Then (z_1, z_2) ∈ [0, 1] × [0, 1], and we have [8,9]:

**Lemma 1.** r_1 and r_2 are independent if and only if (z_1, z_2) is uniformly distributed in [0, 1] × [0, 1].

Lemma 1 allows us to measure the independence of r_1 and r_2 by measuring the uniformity of the distribution of (z_1, z_2) in [0, 1] × [0, 1]. To this end, let us partition the region [0, 1] × [0, 1] into an l × l uniform grid, and denote by ξ(i, j) the (i, j)th square in the grid. Let p(i, j) be the probability that (z_1, z_2) belongs to ξ(i, j), i.e., p(i, j) = Prob((z_1, z_2) ∈ ξ(i, j)). Then, for a convex function f, the quasientropy (QE) of r_1 and r_2 is defined as [9]:

β(r_1, r_2) = Σ_{i=1}^{l} Σ_{j=1}^{l} f(p(i, j)).

By Jensen's inequality,

β(r_1, r_2) ≥ l^2 f(1/l^2),    (5)

with equality if and only if p(i, j) = 1/l^2 for all i and j. When f(u) = (u − u^q)/(1 − q), up to a minus sign, QE becomes the Tsallis entropy [6] of p(i, j).

If r_1 and r_2 are independent, then (z_1, z_2) is uniform in [0, 1] × [0, 1] and, thus, p(i, j) is uniform in {1, …, l} × {1, …, l}. Then the equality in (5) holds and β(r_1, r_2) reaches its minimum l^2 f(1/l^2). Conversely, if r_1 and r_2 are not independent, then there exists an l_0 such that for any l > l_0, β(r_1, r_2) cannot reach its minimum [9]. Thus, for a large enough l, a minimal β(r_1, r_2) implies independent r_1 and r_2.
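As a concrete illustration, QE can be estimated from samples by rank-transforming each variable (an empirical version of (1)) and histogramming the transformed pairs on an l × l grid. The following is a minimal Python sketch under these assumptions; the function name and the default choice f(u) = u log u are ours, not the paper's code:

```python
import numpy as np

def quasientropy(r1, r2, l=10, f=None):
    """Estimate the QE beta(r1, r2) = sum_{i,j} f(p(i, j)).

    p(i, j) is estimated as the fraction of points (z1, z2) falling in
    the (i, j)th square of an l x l grid over [0, 1] x [0, 1], where
    z is the empirical CDF (rank) transform of r.
    """
    if f is None:
        # Default f(u) = u log u (with f(0) = 0): QE is minus the Shannon entropy.
        f = lambda u: u * np.log(np.where(u > 0, u, 1.0))
    r1, r2 = np.asarray(r1), np.asarray(r2)
    n = len(r1)
    # Empirical CDF transform: ranks scaled into (0, 1].
    z1 = (np.argsort(np.argsort(r1)) + 1) / n
    z2 = (np.argsort(np.argsort(r2)) + 1) / n
    # Grid cell indices in 0..l-1.
    i = np.minimum((z1 * l).astype(int), l - 1)
    j = np.minimum((z2 * l).astype(int), l - 1)
    p = np.zeros((l, l))
    np.add.at(p, (i, j), 1)
    p /= n
    return float(f(p).sum())
```

For independent samples the estimate approaches the minimum l^2 f(1/l^2), which equals −2 ln l for f(u) = u log u, while dependent samples yield a larger value.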

#### 2.2. Quality Factor (QF) of QE

Denote by Q_β(f) the quality factor (QF) of the QE β(r_1, r_2), derived from (6) and (5). The greater the QF, the more sensitive QE is to a change in the uniformity of the probability distribution around its minimum and, thus, the sharper the shape of the minimum of QE. In short, when Q_β is large (small), QE is sharp (blunt) around its minimum.

## 3. QF of Grid Occupancy (GO)

Given N observations of (r_1, r_2), denoted (r_1(t), r_2(t)), t = 1, …, N, where N > 1, call a square ξ(i, j) of the l × l grid occupied if at least one observed point of (z_1, z_2) falls in it, and define the grid occupancy (GO) as:

α(r_1, r_2) = −(number of occupied squares)/l^2.    (13)

Figure 2a and Figure 2b plot two independent variables (r_1, r_2) and two dependent variables (r_1^#, r_2^#), respectively. Figure 2c and Figure 2d plot 500 observed points of (z_1, z_2) and (z_1^#, z_2^#), respectively, where z_1 ≡ q_{r_1}(r_1), z_2 ≡ q_{r_2}(r_2), z_1^# ≡ q_{r_1^#}(r_1^#), and z_2^# ≡ q_{r_2^#}(r_2^#). It is easy to see that the points in Figure 2c are uniformly distributed, whereas those in Figure 2d are not. Partition the region [0, 1] × [0, 1] in Figure 2c and Figure 2d into a grid of l × l, say 10 × 10, same-sized squares. A uniform distribution is the most efficient way to occupy the maximum number of squares, and this is confirmed in Figure 2e and Figure 2f, where the situations of Figure 2c and Figure 2d are shown, respectively (the occupied squares are shaded). The GO defined in (13) is exactly minus the ratio of occupied squares. As Figure 2e and Figure 2f clearly show, α(r_1, r_2) = −0.99 < α(r_1^#, r_2^#) = −0.89. Therefore, r_1 and r_2 are more independent than r_1^# and r_2^#.
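A minimal sketch of GO estimation under the same rank-transform convention (the function name and defaults are ours, not the authors' implementation):

```python
import numpy as np

def grid_occupancy(r1, r2, l=10):
    """GO alpha(r1, r2): minus the fraction of occupied squares in an
    l x l grid over the rank-transformed (empirical copula) points."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    n = len(r1)
    # Ranks 0..n-1 mapped to cell indices 0..l-1 by integer arithmetic.
    i = np.argsort(np.argsort(r1)) * l // n
    j = np.argsort(np.argsort(r2)) * l // n
    occupied = np.zeros((l, l), dtype=bool)
    occupied[i, j] = True
    return -float(occupied.sum()) / l**2
```

Marking occupancy through a boolean grid keeps the cost at one vectorized pass after two sorts; identical variables occupy only the l diagonal squares, giving α = −1/l.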

**Figure 2.** Illustration of grid occupancy (GO). (a) r_1 and r_2 are independent. (b) r_1^# and r_2^# are dependent. (c) (z_1, z_2) transformed from (r_1, r_2) is uniform. (d) (z_1^#, z_2^#) transformed from (r_1^#, r_2^#) is not uniform. (e) α(r_1, r_2) = −0.99. (f) α(r_1^#, r_2^#) = −0.89.

Since Q_α is not easy to analyze, we study it using numerical methods, as shown in Figure 3. We vary l from 2 to 1,000. For each l, we find the N, denoted N_max, that maximizes Q_α to Q_α,max. From Figure 3, we can find:

## 4. QF of Generalized Mutual Information (GMI)

#### 4.1. QF of GMI

The GMI of r_1 and r_2 with a convex function f is defined as [9]:

γ(r_1, r_2) = ∫_0^1 ∫_0^1 f(p_{z1z2}(u, v)) du dv,    (19)

where z_1 and z_2 are as defined in (1) and p_{z1z2}(u, v) is the joint PDF (copula density) of (z_1, z_2). In deriving the QF Q_γ of GMI, we have utilized the following facts [9]:

#### 4.2. Existence of GMI

f(p_{z1z2}(u, v)) being continuous on [0, 1] × [0, 1] is a sufficient condition for the existence of GMI. Sometimes, however, f(p_{z1z2}(u, v)) might not be continuous, so let us investigate the existence of GMI in more depth. As already done in (22), the sum in (21) can be treated as the QE with a convex function. Thus, (5) and (6) can be applied to this QE, which yields the bounds in (24). The lower bound is reached if and only if r_1 and r_2 are independent [5]. The upper bound is reached when, for example, r_1 ≡ r_2, which is one of the most dependent cases. When f(u) = u log u, (24) gives the lower and upper bounds of the Shannon MI of the l-level uniformly quantized versions of z_1 and z_2, denoted I_l(z_1, z_2); in particular, 0 ≤ I_l(z_1, z_2) ≤ log l. When z_1 ≡ z_2 ≡ z, I_l(z, z) = log l, which reaches the upper bound shown in (26). The MI of continuous variables is the limit of the MI of their quantized versions as the number of quantization levels goes to infinity [9]. Therefore, when r_1 ≡ r_2 ≡ r and thus z_1 ≡ z_2 ≡ z, the MI diverges: p_{zz}(u, v) is infinite along the diagonal u = v, which causes the integration in (19) to diverge. Compared with the Shannon MI, there are more stable forms of GMI. For example, when f(u) = exp(−u), (25) becomes:
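The relation I_l(z, z) = log l can be checked numerically with a small plug-in estimator of the quantized MI (a sketch under our own naming; values in nats rather than bits):

```python
import numpy as np

def quantized_mi(z1, z2, l=10):
    """Shannon MI (in nats) of the l-level uniformly quantized versions
    of z1, z2 in [0, 1], via the plug-in joint histogram estimate."""
    z1, z2 = np.asarray(z1), np.asarray(z2)
    i = np.minimum((z1 * l).astype(int), l - 1)
    j = np.minimum((z2 * l).astype(int), l - 1)
    pij = np.zeros((l, l))
    np.add.at(pij, (i, j), 1)
    pij /= len(z1)
    # Product of marginals, then sum p log(p / (p_i p_j)) over occupied cells.
    prod = np.outer(pij.sum(axis=1), pij.sum(axis=0))
    mask = pij > 0
    return float(np.sum(pij[mask] * np.log(pij[mask] / prod[mask])))
```

With z_1 ≡ z_2 the joint mass sits on the l diagonal cells and the estimate equals log l exactly, the upper bound; independent samples give a value near the lower bound 0.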

## 5. Orders of QFs with Respect to l

Table 1 lists the Q_β's and Q_γ's of some convex functions and their orders with respect to l as l → ∞. The function −sin πu, which appears in the last line, is not convex on [0, ∞), so it cannot be used in GMI, and Q_γ does not exist. However, it is convex on [0, 1], so it can be used in QE, and Q_β can be calculated. Note the Q_β and Q_γ of −u^a (0 < a < 1) and the Q_γ of a^u (a > 0, a ≠ 1): their orders are all O(l^2), which is higher than O(l^2/ln l), the order of the minus entropy and the MI. For the case of GO, (16) already shows that it can have a QF of O(l^2). This means that, when l is large enough, these measures have sharper minima than the minus entropy and the MI.

**Table 1.** Q_β's and Q_γ's of some convex functions and their orders with respect to l as l → ∞.

| f(u) | Q_β(f(u)) | Q_γ(f(u)) |
| --- | --- | --- |
| −u^a (0 < a < 1) | O(l^2) | O(l^2) |
| u log u | O(l^2/ln l) | O(l^2/ln l) |
| u^a (a > 1) | O(l^{3−a}) | O(l^{3−a}) |
| a^u (a > 0, a ≠ 1) | O(l^2) | O(l^2) |
| −sin πu | O(1) | does not exist |

## 6. Numerical Experiments

Adding 2 log_2 l = 2 log_2 100 = 13.29 bits to the ordinate of the minus-entropy plot approximately leads to the plot of MI shown in Figure 5i. The QFs of Figure 5e and Figure 5f are both O(l), so the shapes of their minima are basically identical. The order O(l) reaches the theoretical lower bound proposed in [9]. However, we can see that the neighborhoods of the minima are too flat and the positions of the minima are not easy to locate. This is because e(t) and e(t + τ) cannot be completely independent. Therefore, in general we should choose convex functions whose orders of QFs are higher than O(l). Figure 5e is in fact the variance between p(i, j) and the uniform probability distribution. Figure 5e shows that this form of variance, though easily conceived of, is obviously not a good measure of independence. The shape of the QE plot of f(u) = u^2 is identical to Figure 5e. According to Table 1, under the precondition a > 1, the smaller a is, the higher the order of the QF of u^a. Thus, the nearer a approaches 1, the better the effect of QE. For instance, Figure 5d and Figure 5e show that the effect of u^{1.001} is better than that of u^2. Tsallis proved that the limit of his entropy is the Shannon entropy when the parameter q in his entropy, which corresponds to the a in f(u) = u^a here, approaches 1 [6]. Figure 5c and Figure 5d are good manifestations of that statement: the shapes of the two curves look very close. What is more, that statement is corroborated by the orders of the QFs: the order of the QF of u^a with a > 1, O(l^{3−a}), is lower than, but approaches, the order of the QF of u log u, O(l^2/ln l), as a approaches 1. In Figure 5g we have used the sine function −sin πu. This can be traced back to Kapur, who used trigonometric functions to create entropies [7]. The QF of −sin πu is of constant order O(1). Therefore, the neighborhoods of the minima in Figure 5g are even flatter than those in Figure 5e and Figure 5f, and the positions of the minima can hardly be located. Finally, we can see that the minima in Figure 5a, Figure 5b, and Figure 5h, whose order is O(l^2), are sharper, and their positions are more definite, than those in Figure 5c and Figure 5i, whose order is O(l^2/ln l). Namely, these measures outperform the minus entropy and the MI in the prominence of their minima. Of course, the minima in Figure 5a–d, Figure 5h, and Figure 5i are all distinct enough, and their positions coincide. The first (leftmost) minimum, τ = 47, is a good choice of the delay for reconstruction. To verify this point, Figure 6a, Figure 6b, and Figure 6c show the delay portraits using the first (leftmost) minimum, a non-minimum, and the seventh minimum, respectively. Compared with Figure 4, it is clear that Figure 6a reproduces well the folding structure of the original Rössler attractor.

**Figure 5.**GO, QE, and GMI of delay-coordinate variables of Rössler chaotic system versus delay τ. All plots are calculated using 65,536 sample points. l = 100 is used for GO and QE. Delay is measured in sampling numbers. The convex functions f (u)’s used in (b)–(i) and the orders of QFs of (a)–(i) are labeled in their captions.

**Figure 7.** GO and auto-covariance function of reconstructed variables of the Lorenz attractor versus τ. N = 50,000 sample points are used in both. l = 200 in GO.

**Figure 8.**Comparison of reconstruction effects of the Lorenz attractor. The reconstruction delays of (b) and (c) are the first minima of GO and auto-covariance function in Figure 7, respectively.

## 7. Conclusions

## Acknowledgements

## References

1. Shannon, C.E. A mathematical theory of communication. *Bell Syst. Tech. J.* **1948**, *27*, 379–423, 623–656.
2. Zemansky, M.W. *Heat and Thermodynamics*; McGraw-Hill: New York, NY, USA, 1968.
3. Renyi, A. On measures of entropy and information. In *Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability*, Berkeley, CA, USA, 20 June–30 July 1960; University of California Press: Berkeley, CA, USA, 1961; Volume 1, pp. 547–561.
4. Havrda, J.; Charvat, F. Quantification method of classification processes. *Kybernetika* **1967**, *1*, 30–34.
5. Csiszar, I. Information-type measures of difference of probability distributions and indirect observations. *Stud. Sci. Math. Hung.* **1967**, *2*, 299–318.
6. Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. *J. Stat. Phys.* **1988**, *52*, 479–487.
7. Kapur, J.N. *Measures of Information and Their Applications*; John Wiley & Sons: New York, NY, USA, 1994.
8. Chen, Y. A novel grid occupancy criterion for independent component analysis. *IEICE Trans. Fund. Electron. Comm. Comput. Sci.* **2009**, *E92-A*, 1874–1882.
9. Chen, Y. Blind separation using convex functions. *IEEE Trans. Signal Process.* **2005**, *53*, 2027–2035.
10. Kapur, J.N. *Maximum-Entropy Models in Science and Engineering*; John Wiley & Sons: New York, NY, USA, 1989.
11. Xu, J.; Liu, Z.; Liu, R.; Yang, Q. Information transmission in human cerebral cortex. *Physica D* **1997**, *106*, 363–374.
12. Kolarczyk, B. Representing entropy with dispersion sets. *Entropy* **2010**, *12*, 420–433.
13. Takata, Y.; Tagashira, H.; Hyono, A.; Ohshima, H. Effect of counterion and configurational entropy on the surface tension of aqueous solutions of ionic surfactant and electrolyte mixtures. *Entropy* **2010**, *12*, 983–995.
14. Zupanovic, P.; Kuic, D.; Losic, Z.B.; Petrov, D.; Juretic, D.; Brumen, M. The maximum entropy production principle and linear irreversible processes. *Entropy* **2010**, *12*, 996–1005.
15. Van Dijck, G.; Van Hulle, M.M. Increasing and decreasing returns and losses in mutual information feature subset selection. *Entropy* **2010**, *12*, 2144–2170.
16. Takens, F. Detecting strange attractors in turbulence. In *Dynamical Systems and Turbulence, Warwick 1980*; Lecture Notes in Mathematics; Springer-Verlag: Berlin, Germany, 1981; Volume 898, pp. 366–381.
17. Fraser, A.M.; Swinney, H.L. Independent coordinates for strange attractors from mutual information. *Phys. Rev. A* **1986**, *33*, 1134–1140.
18. Gretton, A.; Bousquet, O.; Smola, A.; Scholkopf, B. Measuring statistical dependence with Hilbert-Schmidt norms. In *Proceedings of the 16th International Conference on Algorithmic Learning Theory*, Singapore, October 2005; pp. 63–77.
19. Szekely, G.J.; Rizzo, M.L.; Bakirov, N.K. Measuring and testing dependence by correlation of distances. *Ann. Stat.* **2007**, *35*, 2769–2794.
20. Nelsen, R.B. *An Introduction to Copulas*; Lecture Notes in Statistics 139; Springer-Verlag: New York, NY, USA, 1999.
21. Chen, Y. On Theory and Methods for Blind Information Extraction. Ph.D. Dissertation, Southeast University, Nanjing, China, 2001.
22. Rössler, O.E. An equation for continuous chaos. *Phys. Lett. A* **1976**, *57*, 397–398.
23. Longstaff, M.G.; Heath, R.A. A nonlinear analysis of the temporal characteristics of handwriting. *Hum. Movement Sci.* **1999**, *18*, 485–524.
24. Lorenz, E.N. Deterministic nonperiodic flow. *J. Atmos. Sci.* **1963**, *20*, 130–141.

## Appendix: Recursive Algorithm for Computing GMI and Related Issues

## A. Recursive Algorithm for Computing GMI

The recursive algorithm adapts l to the fluctuation of p_{z1z2} in different local regions. A larger l should be taken in more fluctuant regions to avoid the estimated γ being too small, whereas a smaller l should be taken in rather flat regions to avoid the estimated γ being too large due to the limited sample size. Suppose l = 2^m, m = 0, 1, 2, …, and denote by γ_m the estimate of γ obtained on the 2^m × 2^m grid. As Figure 9 shows, each square in the 2^m × 2^m grid consists of four squares in the 2^{m+1} × 2^{m+1} grid. Let us denote a certain square in the 2^m × 2^m grid by S, and denote the four squares in the 2^{m+1} × 2^{m+1} grid that constitute S by S_1, S_2, S_3, and S_4. If (z_1, z_2) is uniformly distributed over S, then:

Component of γ_m on S = Component of γ_{m+1} on S_1, S_2, S_3, and S_4.

Thus, the following recursive algorithm for computing γ(r_1, r_2) is obtained.

**Figure 9.** Each square in the 2^m × 2^m grid is composed of four squares in the 2^{m+1} × 2^{m+1} grid.

Let ξ_m(i, j) denote the square in the ith column and the jth row of the 2^m × 2^m uniform grid over [0, 1] × [0, 1]. If p_{z1z2} is uniform in ξ_m(i, j), then the component of γ on ξ_m(i, j) can be computed directly at this level. If p_{z1z2} is not uniform in ξ_m(i, j), then we further divide ξ_m(i, j) into its four constituent squares in the 2^{m+1} × 2^{m+1} grid and repeat the test on each of them.
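A minimal sketch of such a recursive estimator follows (not the authors' exact algorithm: the uniformity check here is a plain χ² test on the four sub-square counts at the 5% level, and the local density is estimated as the sample fraction divided by the square's area):

```python
import numpy as np

# 5% critical value of the chi-square distribution with 3 degrees of freedom,
# used to test whether points in a square are uniform over its four quadrants.
CHI2_3DF_5PCT = 7.815

def gmi(z1, z2, f, max_depth=8):
    """Recursive estimate of gamma = integral of f(density) over [0, 1]^2.

    z1, z2: CDF-transformed samples in [0, 1]. If the points in a square
    pass the quadrant-uniformity test (or the recursion bottoms out), the
    square contributes area * f(local density); otherwise it is split
    into four sub-squares and the procedure recurses.
    """
    n_total = len(z1)

    def rec(z1, z2, x0, y0, side, depth):
        n = len(z1)
        area = side * side
        dens = n / (n_total * area)  # estimated copula density on this square
        if n < 8 or depth >= max_depth:
            return area * f(dens)
        mx, my = x0 + side / 2, y0 + side / 2
        quads = [(z1 < mx) & (z2 < my), (z1 >= mx) & (z2 < my),
                 (z1 < mx) & (z2 >= my), (z1 >= mx) & (z2 >= my)]
        counts = np.array([q.sum() for q in quads])
        chi2 = ((counts - n / 4) ** 2 / (n / 4)).sum()
        if chi2 <= CHI2_3DF_5PCT:  # accept uniformity at the 5% level
            return area * f(dens)
        corners = [(x0, y0), (mx, y0), (x0, my), (mx, my)]
        return sum(rec(z1[q], z2[q], cx, cy, side / 2, depth + 1)
                   for q, (cx, cy) in zip(quads, corners))

    return rec(np.asarray(z1), np.asarray(z2), 0.0, 0.0, 1.0, 0)
```

For independent (uniform) inputs the recursion typically stops immediately and the output is near f(1); strongly dependent inputs drive deep splits along the diagonal and a much larger value for convex f such as u log u.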

## B. Uniformity Test

A key step of the recursive algorithm is to test whether p_{z1z2} is uniform in ξ_m(i, j). This can be done using the χ² test proposed in [17]. However, the 20% confidence level in [17] was chosen arbitrarily. Experiments show that the uniformity test with the 20% confidence level may be too stringent and cause the GMI estimate to be too large. To solve this problem, we here propose a simple method for choosing the confidence level. We mix two independent variables, s_1 and s_2, with identical distributions by a 2 × 2 rotation matrix with rotation angle θ to get two variables, r_1 and r_2. The MI of r_1 and r_2 is then a function of θ, I(θ). According to the χ² distribution table, we choose five confidence levels, 20, 10, 5, 2, and 1%, to estimate I(θ). If s_1 and s_2 have some standard distributions, then standard values of I(θ) are easy to obtain. Examining in turn the two cases where s_1 and s_2 are both uniform variables and both Laplacian variables, we find that the 5% confidence level produces the minimum estimation error in both cases. Comparing the estimation results using the 20% and 5% confidence levels with the standard curves, we can see that the results with the 20% confidence level are obviously too large, whereas those with the 5% confidence level are quite accurate, as shown in Figure 10 and Figure 11. Therefore, 5% is a better choice for the confidence level.
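The calibration setup can be sketched as follows (a simple histogram MI estimator stands in for the recursive GMI estimator here; all names are ours). Because the rotation mixes the sources, the estimated MI should grow monotonically as θ moves from 0 to π/4:

```python
import numpy as np

def rotate_mix(s1, s2, theta):
    """Mix two independent sources by a 2x2 rotation matrix with angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return c * s1 - s * s2, s * s1 + c * s2

def hist_mi(r1, r2, l=10):
    """Plug-in histogram estimate of the Shannon MI (nats) of r1 and r2,
    computed on the rank-transformed pair so the marginals are uniform."""
    n = len(r1)
    i = np.argsort(np.argsort(r1)) * l // n
    j = np.argsort(np.argsort(r2)) * l // n
    pij = np.zeros((l, l))
    np.add.at(pij, (i, j), 1)
    pij /= n
    pi = pij.sum(axis=1, keepdims=True)
    pj = pij.sum(axis=0, keepdims=True)
    mask = pij > 0
    return float(np.sum(pij[mask] * np.log((pij / (pi * pj))[mask])))
```

Sweeping θ over [0, π/4] and comparing the estimated curve against the standard I(θ) is exactly the calibration used to pick the confidence level.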

**Figure 10.**MI of rotational mixtures of two independent identical uniform variables versus rotation angle.

**Figure 11.**MI of rotational mixtures of two independent identical Laplacian variables versus rotation angle.

## C. Practical Implementation

In practice, we need not explicitly estimate the CDF q_r and compute z(t) = q_r(r(t)). Indeed, when computing GO and QE, we may directly map the sample points of (r_1, r_2) to the indices of the squares that the corresponding sample points of (z_1, z_2) belong to [8,9]. When computing GMI, a sample point of (r_1, r_2) is first mapped to an ordered pair (i, j); then the square that (i, j) belongs to is determined according to the present partitioned grid. These operations (mainly sorting) do not involve floating-point operations and can achieve high efficiency. The p(i, j) (respectively, p_m(i, j)) in QE (respectively, GMI) is estimated by the ratio of the number of sample points in the square ξ(i, j) (respectively, ξ_m(i, j)) to the total number of sample points.
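In Python terms, the rank-to-index mapping can be sketched as follows (integer arithmetic only after the sorts, in line with the remark above; the helper names are ours):

```python
import numpy as np

def grid_indices(r, l):
    """Map samples of r directly to grid indices 0..l-1 via ranks, skipping
    explicit CDF estimation: the sample of rank k (0-based) among N samples
    lands in column floor(k * l / N)."""
    ranks = np.argsort(np.argsort(r))  # 0..N-1
    return ranks * l // len(r)

def cell_probs(r1, r2, l):
    """Estimate p(i, j) as the fraction of sample points in each square."""
    i, j = grid_indices(r1, l), grid_indices(r2, l)
    p = np.zeros((l, l))
    np.add.at(p, (i, j), 1)
    return p / len(r1)
```

A side effect of the rank mapping is that the marginal cell counts are exactly balanced (each row and column of the grid receives N/l points when l divides N), which removes marginal-estimation noise from the measures.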

## D. Output of the Recursive Algorithm of GMI

If r_1 and r_2 are independent, (z_1, z_2) will be uniform in [0, 1] × [0, 1]. The recursive algorithm for computing GMI will then terminate at the m = 0 hierarchy and produce the minimal output f(1). The maximal output is generated in the most dependent case, e.g., r_1 ≡ r_2. Assume that we use N = 2^k sample points of two identical variables r_1 ≡ r_2 to compute GMI. The corresponding sample points of (z_1, z_2) will line up along the diagonal. The χ² uniformity tests, whether using the 20, 10, or 5% confidence level, all lead to the same final partition, which divides the sample points of (z_1, z_2) into groups of four points per square. For example, Figure 12 shows the case of N = 16 sample points. In this case, the computation of GMI proceeds by summing the contributions of the squares in the final partition, and the same computation extends to general N = 2^k sample points, e.g., when f(u) = a^u (a > 1) is used.

© 2011 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Citation

Chen, Y.; Aihara, K. Some Convex Functions Based Measures of Independence and Their Application to Strange Attractor Reconstruction. *Entropy* **2011**, *13*, 820–840. https://doi.org/10.3390/e13040820