Reply published on 27 May 2021, see Sensors 2021, 21(11), 3729.
Comment

The Limits of Pairwise Correlation to Model the Joint Entropy. Comment on Nguyen Thi Thanh et al. Entropy Correlation and Its Impacts on Data Aggregation in a Wireless Sensor Network. Sensors 2018, 18, 3118

1 Information and Communication Technologies, Electronics and Applied Mathematics (ICTEAM), Université Catholique de Louvain, B-1348 Louvain-la-Neuve, Belgium
2 Department of Computing, Imperial College London, London SW7 2AZ, UK
3 Data Science Institute, Imperial College London, London SW7 2AZ, UK
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Sensors 2021, 21(11), 3700; https://doi.org/10.3390/s21113700
Submission received: 17 February 2021 / Revised: 12 May 2021 / Accepted: 24 May 2021 / Published: 26 May 2021
Information theory is a unifying mathematical theory to measure information content, which is key for research in cryptography, statistical physics, and quantum computing [1,2,3]. A central quantity of information theory is the entropy, a metric quantifying the amount of information encoded in a signal [4]. In “Entropy Correlation and Its Impacts on Data Aggregation in a Wireless Sensor Network”, Nga et al. propose a general entropy correlation model to study the dependence patterns between multiple spatio-temporal signals [5]. They derive lower and upper bounds on the overall information entropy from only marginal and pairwise entropies, and use these bounds to study the impact of correlation on data aggregation, compression, and clustering of signals. Attempting to replicate these findings, however, we show that these bounds are incorrect, over- and underestimating the actual association patterns depending on the data. Deriving constraints and bounds on joint entropies remains a computationally difficult task and an active field of research [1,6], and new inequalities are regularly found [7,8,9,10,11]. More work is likely needed to develop a simple and general entropy correlation model for spatio-temporal signals.
Nga et al. study a system of $m$ random variables $X_1, X_2, \ldots, X_m$. They propose a normalized measure of correlation between two variables $Y$ and $Z$, defined as:
$$\rho(Y, Z) = 2 - \frac{2\,H(Y, Z)}{H(Y) + H(Z)} \qquad (1)$$
with $H$ the Shannon entropy [4]. The authors further denote by $\rho_{\min} = \min_{i \neq j} \rho(X_i, X_j)$ and $\rho_{\max} = \max_{i \neq j} \rho(X_i, X_j)$ the minimum and maximum correlation between pairs of variables, and by $H_{\min} = \min_i H(X_i)$ and $H_{\max} = \max_i H(X_i)$ the minimum and maximum individual entropies.
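As an illustration only (not part of the published Comment), the short Python sketch below computes the Shannon entropy and the normalized correlation of Equation (1) from a joint probability table; the helper names `entropy` and `rho` are ours.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a probability table p (any shape)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]                      # 0 * log(0) is taken as 0
    return float(-np.sum(p * np.log2(p)))

def rho(p_yz):
    """Normalized correlation of Equation (1) for a joint table p_yz[y, z]."""
    h_y = entropy(p_yz.sum(axis=1))   # marginal H(Y)
    h_z = entropy(p_yz.sum(axis=0))   # marginal H(Z)
    h_yz = entropy(p_yz)              # joint H(Y, Z)
    return 2.0 - 2.0 * h_yz / (h_y + h_z)

# Independent fair bits: H(Y) = H(Z) = 1, H(Y, Z) = 2, so rho = 0.
print(rho(np.full((2, 2), 0.25)))                # -> 0.0
# Identical bits: H(Y, Z) = 1, so rho = 1.
print(rho(np.array([[0.5, 0.0], [0.0, 0.5]])))   # -> 1.0
```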
The general entropy correlation model proposed by the authors relies on two claims, both of which are incorrect:
Claim 1.
In Equation (13) and Section 2.2.2, Nga et al. claim that higher-order correlations are bounded by pairwise correlations:
$$\forall\, (i, j, k), \quad \rho_{\min} \le \rho(X_{ij}, X_k) \le \rho_{\max}$$
Claim 2.
In Equations (16) and (20), Nga et al. use Claim 1 to prove that, for any subset of $m$ variables, its joint entropy $H_m$ is bounded by:
$$l_m H_{\min} \le H_m \le k_m H_{\max}$$
with $l_m = \frac{2 - \rho_{\max}}{2}\,(l_{m-1} + 1)$, $k_m = \frac{2 - \rho_{\min}}{2}\,(k_{m-1} + 1)$, and $l_1 = k_1 = 1$.
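The following minimal sketch (ours, assuming the recurrences exactly as written above) evaluates the coefficients $l_m$ and $k_m$; it reproduces the values $l_3 = 3$ and $k_3 = 15/8$ used in the propositions below.

```python
def bound_coefficients(rho_min, rho_max, m):
    """Coefficients l_m and k_m of Claim 2, from l_1 = k_1 = 1 and the stated recurrences."""
    l, k = 1.0, 1.0
    for _ in range(m - 1):
        l = (2.0 - rho_max) / 2.0 * (l + 1.0)
        k = (2.0 - rho_min) / 2.0 * (k + 1.0)
    return l, k

# With rho_min = 1/2 (Proposition 1), k_3 = 15/8; with rho_max = 0 (Proposition 2), l_3 = 3.
print(bound_coefficients(0.5, 0.0, 3))   # -> (3.0, 1.875)
```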
We propose two examples for $m = 3$, demonstrating that all four inequalities are incorrect. In our first example, we obtain $\rho_{\min} > \rho(X_{ij}, X_k)$, which contradicts the lower bound of Claim 1, and $H_3 > k_3 H_{\max}$, which contradicts the upper bound of Claim 2.
Proposition 1.
Consider four i.i.d. discrete random variables $Y_1, Y_2, Y_3, Z$ uniformly distributed over $\{0, 1\}$. For the random variables $X_i = (Y_i, Z)$, $i = 1, 2, 3$, we have $\rho_{\min} = 1/2$, $\rho(X_{ij}, X_k) = 2/5$ for any permutation $(i, j, k)$ of $(1, 2, 3)$, $k_3 = 15/8$, $H_3 = 4$, and $H_{\max} = 2$.
Proof. 
As $Y_1, Y_2, Y_3, Z$ are independent, we have $H(X_i) = H(Y_i) + H(Z) = 2$ for $i = 1, 2, 3$ and $H(X_{ij}) = H(Y_i) + H(Y_j) + H(Z) = 3$ for $i \neq j$. Using Equation (1), we have $\rho(X_i, X_j) = 1/2$ for $i \neq j$, hence $\rho_{\min} = 1/2$, $k_2 = 2 - \rho_{\min} = 3/2$, and $k_3 = (k_2 + 1)\,k_2/2 = 15/8$. For any permutation $(i, j, k)$ of $(1, 2, 3)$, we have $H_3 = H(X_{ijk}) = H(Y_i) + H(Y_j) + H(Y_k) + H(Z) = 4$, hence $\rho(X_{ij}, X_k) = 2/5$. □
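Proposition 1 can also be verified numerically. The sketch below (ours, purely illustrative) enumerates the 16 equiprobable outcomes of $(Y_1, Y_2, Y_3, Z)$ and recomputes the entropies and correlations from the resulting distribution.

```python
import itertools
from collections import Counter
from math import log2

def H(counts, total):
    """Shannon entropy (bits) of a distribution given as a Counter of outcomes."""
    return -sum(c / total * log2(c / total) for c in counts.values())

def rho(h_y, h_z, h_yz):
    """Normalized correlation of Equation (1), computed directly from entropies."""
    return 2 - 2 * h_yz / (h_y + h_z)

# All 16 equiprobable outcomes of the independent bits (Y1, Y2, Y3, Z).
outcomes = list(itertools.product((0, 1), repeat=4))
N = len(outcomes)
X = [[(o[i], o[3]) for o in outcomes] for i in range(3)]    # X_i = (Y_i, Z)

h_1 = H(Counter(X[0]), N)                                   # H(X_1) = 2
h_12 = H(Counter(zip(X[0], X[1])), N)                       # H(X_12) = 3
h_123 = H(Counter(zip(X[0], X[1], X[2])), N)                # H_3 = 4
print(rho(h_1, h_1, h_12))      # rho(X_1, X_2) = rho_min = 0.5
print(rho(h_12, h_1, h_123))    # rho(X_12, X_3) = 0.4 < rho_min, contradicting Claim 1
print(h_123, ">", 15 / 8 * 2)   # H_3 = 4 > k_3 * H_max = 3.75, contradicting Claim 2
```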
In our second example, we obtain $\rho_{\max} < \rho(X_{ij}, X_k)$, which contradicts the upper bound of Claim 1, and $H_3 < l_3 H_{\min}$, which contradicts the lower bound of Claim 2.
Proposition 2.
Consider three discrete random variables $X_1, X_2, X_3$ uniformly distributed over $\{0, 1\}$ that are pairwise independent and satisfy $X_1 \oplus X_2 \oplus X_3 = 0$, where $\oplus$ denotes the xor operation. We have $\rho_{\max} = 0$, $\rho(X_{ij}, X_k) = 2/3$ for any permutation $(i, j, k)$ of $(1, 2, 3)$, $l_3 = 3$, $H_3 = 2$, and $H_{\min} = 1$.
Proof. 
We have $H(X_i) = 1$ for $i = 1, 2, 3$ and, as the variables are pairwise independent, $H(X_{ij}) = H(X_i) + H(X_j) = 2$ for $i \neq j$. Using Equation (1), we have $\rho(X_i, X_j) = 0$ for $i \neq j$, hence $\rho_{\max} = 0$, $l_2 = 2 - \rho_{\max} = 2$, and $l_3 = (l_2 + 1)\,l_2/2 = 3$. For any permutation $(i, j, k)$ of $(1, 2, 3)$, we have $H_3 = H(X_{ijk}) = 2$, hence $\rho(X_{ij}, X_k) = 2/3$. □
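A similar check (again ours and illustrative only) for Proposition 2 uses the four equiprobable outcomes allowed by the parity constraint:

```python
from collections import Counter
from math import log2

def H(counts, total):
    """Shannon entropy (bits) of a distribution given as a Counter of outcomes."""
    return -sum(c / total * log2(c / total) for c in counts.values())

# The four equiprobable outcomes of (X1, X2, X3) satisfying X1 xor X2 xor X3 = 0.
outcomes = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]
N = len(outcomes)

h_1 = H(Counter(o[0] for o in outcomes), N)            # H(X_1) = 1
h_12 = H(Counter((o[0], o[1]) for o in outcomes), N)   # H(X_12) = 2
h_123 = H(Counter(outcomes), N)                        # H_3 = 2
print(2 - 2 * h_12 / (h_1 + h_1))     # rho(X_1, X_2) = rho_max = 0
print(2 - 2 * h_123 / (h_12 + h_1))   # rho(X_12, X_3) = 2/3 > rho_max, contradicting Claim 1
print(h_123, "<", 3 * 1)              # H_3 = 2 < l_3 * H_min = 3, contradicting Claim 2
```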
Overall, the two new inequalities derived by Nga et al. for the joint entropy $H_m$ do not appear to be correct starting at $m = 3$. The errors in the model stem from the assumption made in Claim 1 that pairwise and higher-order associations share the same minimum and maximum. The authors validate their method on a very specific dataset with $\rho_{\min} = 0.6$, $H_{\min} = 2.16$, and $H_{\max} = 2.55$, yet our examples show that different association structures yield widely different joint entropies. Bounding the joint entropy allows the authors to study the impact of correlation on data aggregation, compression, and clustering of signals. Although different bounds could potentially offer similar results, the broader conclusions of this article may not hold in practice.
Finally, deriving constraints and bounds on joint entropies is a computationally difficult task and an active field of research [1,6,7,8,9,10,11]. Bounding the joint entropy $H_m$ calls for both theoretical derivations and numerical estimations, building on research on entropic vectors. The entropic vector of the random variables $X_1, X_2, \ldots, X_m$ is the vector of the entropies of all $2^m - 1$ non-empty subsets of these variables. The set of all entropic vectors is a convex cone, for which a polyhedral outer-approximation is known (Theorem 1, [12]). For instance, we derive below, in Proposition 3, lower and upper bounds for $H_3$ that are tight because Equations (2) and (3) completely describe the entropic cone for three variables (Theorem 2, [12]); this suggests an alternative approach that could also yield lower and upper bounds for $m > 3$. These bounds rely on the following inequalities (Theorem 2.34, [6]):
$$H(X_I) \le H(X_J) \qquad (2)$$
which is valid for any subsets $I \subseteq J \subseteq \{1, \ldots, m\}$, and
$$H(X_I) + H(X_J) \ge H(X_{I \cup J}) + H(X_{I \cap J}) \qquad (3)$$
which is valid for any subsets $I, J \subseteq \{1, \ldots, m\}$.
Proposition 3.
For any three random variables $X_1, X_2, X_3$, the following inequalities hold:
$$\max\!\big(H(X_{12}), H(X_{23}), H(X_{31})\big) \le H_3 \le \min\!\big(H(X_{31}) + H(X_{12}) - H(X_1),\; H(X_{12}) + H(X_{23}) - H(X_2),\; H(X_{23}) + H(X_{31}) - H(X_3)\big).$$
Proof. 
For any permutation $(i, j, k)$ of $(1, 2, 3)$, Equation (2) with $I = \{i, j\}$ and $J = \{i, j, k\}$ gives $H(X_{ij}) \le H(X_{ijk}) = H_3$, and Equation (3) with $I = \{i, j\}$ and $J = \{j, k\}$ gives $H(X_{ij}) + H(X_{jk}) \ge H(X_{ijk}) + H(X_j)$, which implies $H_3 = H(X_{ijk}) \le H(X_{ij}) + H(X_{jk}) - H(X_j)$. □
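Proposition 3 can be checked numerically for arbitrary joint distributions. The sketch below (ours; the ternary alphabets and the random seed are arbitrary choices) draws a random joint distribution of three variables and verifies that $H_3$ falls between the two bounds.

```python
import numpy as np

def H(p, axes):
    """Shannon entropy (bits) of the marginal of the joint table p over the given axes."""
    drop = tuple(a for a in range(p.ndim) if a not in axes)
    q = p.sum(axis=drop)
    q = q[q > 0]
    return float(-(q * np.log2(q)).sum())

rng = np.random.default_rng(0)
p = rng.random((3, 3, 3))
p /= p.sum()                                  # random joint distribution of (X1, X2, X3)

h1, h2, h3 = H(p, (0,)), H(p, (1,)), H(p, (2,))
h12, h23, h31 = H(p, (0, 1)), H(p, (1, 2)), H(p, (0, 2))
h123 = H(p, (0, 1, 2))

lower = max(h12, h23, h31)
upper = min(h31 + h12 - h1, h12 + h23 - h2, h23 + h31 - h3)
print(lower <= h123 <= upper)                 # True: the bounds of Proposition 3 hold
```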
Similar bounds can be obtained for $m > 3$ using Equations (2) and (3), but their tightness is not guaranteed, as the entropic cone is not completely described by these inequalities for $m > 3$ (Theorem 6, [13]). This gap could be reduced numerically by iteratively producing linear cuts that refine the polyhedral outer-approximation of the entropic cone given by Equations (2) and (3) [14]. Taken together, our findings suggest that theoretical derivations ($m \le 3$) and numerical approximations ($m > 3$) on the entropic cone might provide future research directions towards a robust general entropy correlation model.
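To make this concrete, one possible sketch (ours; the function `shannon_bounds` is a hypothetical helper written with SciPy's `linprog`, not the method of [14], and for $m > 3$ the resulting bounds are in general not tight) formulates Equations (2) and (3) as a linear program over the $2^m - 1$ subset entropies and optimizes $H_m$ given measured single and pairwise entropies.

```python
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

def shannon_bounds(m, fixed):
    """Lower/upper bounds on H_m over the polyhedral outer approximation (2)-(3).

    `fixed` maps frozensets of variable indices (1-based) to known entropies,
    e.g. all singletons and pairs. Returns (lower, upper) bounds on H_m.
    """
    subsets = [frozenset(c) for r in range(1, m + 1)
               for c in combinations(range(1, m + 1), r)]
    idx = {s: i for i, s in enumerate(subsets)}
    n = len(subsets)
    vec = lambda s: np.eye(n)[idx[frozenset(s)]] if s else np.zeros(n)

    A_ub, b_ub = [], []                          # rows a such that a @ h <= 0
    for s in subsets:                            # Equation (2): H(X_S) <= H(X_{S + i})
        for i in range(1, m + 1):
            if i not in s:
                A_ub.append(vec(s) - vec(s | {i}))
                b_ub.append(0.0)
    for s, t in combinations(subsets, 2):        # Equation (3): submodularity
        A_ub.append(vec(s | t) + vec(s & t) - vec(s) - vec(t))
        b_ub.append(0.0)

    A_eq = [vec(s) for s in fixed]               # pin the measured entropies
    b_eq = list(fixed.values())
    c = vec(range(1, m + 1))                     # objective: H_m

    out = []
    for sign in (+1, -1):                        # minimize, then maximize H_m
        res = linprog(sign * c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
        out.append(sign * res.fun)
    return tuple(out)

# Entropies from Proposition 1: H(X_i) = 2 and H(X_ij) = 3 (the true H_3 is 4).
fixed = {frozenset({i}): 2.0 for i in (1, 2, 3)}
fixed.update({frozenset(pair): 3.0 for pair in combinations((1, 2, 3), 2)})
print(shannon_bounds(3, fixed))                  # -> (3.0, 4.0), matching Proposition 3
```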

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yeung, R.W. The Science of Information. In Information Theory and Network Coding; Yeung, R.W., Ed.; Springer: Boston, MA, USA, 2008; pp. 1–4. [Google Scholar]
  2. Lesne, A. Shannon entropy: A rigorous notion at the crossroads between probability, information theory, dynamical systems and statistical physics. Math. Struct. Comput. Sci. 2014, 24, e240311. [Google Scholar] [CrossRef] [Green Version]
  3. Vedral, V. The role of relative entropy in quantum information theory. Rev. Mod. Phys. 2002, 74, 197–234. [Google Scholar] [CrossRef] [Green Version]
  4. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef] [Green Version]
  5. Nguyen Thi Thanh, N.; Nguyen Kim, K.; Ngo Hong, S.; Ngo Lam, T. Entropy correlation and its impacts on data aggregation in a wireless sensor network. Sensors 2018, 18, 3118. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  6. Yeung, R.W. A First Course in Information Theory; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
  7. Matus, F. Infinitely Many Information Inequalities. In Proceedings of the 2007 IEEE International Symposium on Information Theory, Nice, France, 24–29 June 2007; pp. 41–44. [Google Scholar]
  8. Zhang, Z.; Yang, J. On a new non-Shannon-type information inequality. In Proceedings of the IEEE International Symposium on Information Theory, Lausanne, Switzerland, 30 June–5 July 2002; p. 235. [Google Scholar]
  9. Makarychev, K.; Makarychev, Y.; Romashchenko, A.; Vereshchagin, N. A new class of non-Shannon-type inequalities for entropies. Commun. Inf. Syst. 2002, 2, 147–166. [Google Scholar] [CrossRef] [Green Version]
  10. Matúš, F. Conditional Independences among Four Random Variables III: Final Conclusion. Comb. Probab. Comput. 1999, 8, 269–276. [Google Scholar] [CrossRef]
  11. Dougherty, R.; Freiling, C.; Zeger, K. Six New Non-Shannon Information Inequalities. In Proceedings of the 2006 IEEE International Symposium on Information Theory, Seattle, WA, USA, 9–14 July 2006; pp. 233–236. [Google Scholar]
  12. Zhang, Z.; Yeung, R.W. A non-Shannon-type conditional inequality of information quantities. IEEE Trans. Inf. Theory 1997, 43, 1982–1986. [Google Scholar] [CrossRef]
  13. Zhang, Z.; Yeung, R.W. On characterization of entropy function via information inequalities. IEEE Trans. Inf. Theory 1998, 44, 1440–1452. [Google Scholar] [CrossRef] [Green Version]
  14. Legat, B.; Jungers, R.M. Parallel optimization on the Entropic Cone. In Proceedings of the 37th Symposium on Information Theory in the Benelux, Louvain-la-Neuve, Belgium, 19–20 May 2016; pp. 206–211. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

