The Limits of Pairwise Correlation to Model the Joint Entropy. Comment on Nguyen Thi Thanh et al. Entropy Correlation and Its Impacts on Data Aggregation in a Wireless Sensor Network. Sensors 2018, 18, 3118

Information theory is a unifying mathematical theory to measure information content, which is key for research in cryptography, statistical physics, and quantum computing [1–3]. A central quantity in information theory is the entropy, a metric quantifying the amount of information encoded in a signal [4]. In "Entropy Correlation and Its Impacts on Data Aggregation in a Wireless Sensor Network", Nga et al. propose a general entropy correlation model to study the dependence patterns between multiple spatio-temporal signals [5]. They derive lower and upper bounds on the overall joint entropy from the marginal and pairwise entropies only, and use these bounds to study the impact of correlation on data aggregation, compression, and clustering of signals. Replicating these findings, however, we show that these bounds are incorrect, over- or underestimating the actual association patterns depending on the data. Deriving constraints and bounds on joint entropies remains a computationally difficult task and an active field of research [1,6], and new inequalities are regularly found [7–11]. More work is likely to be needed to develop a simple and general entropy correlation model for spatio-temporal signals.
Nga et al. study a system of m random variables X_1, X_2, ..., X_m. They propose a normalized measure of correlation ρ(Y, Z) between two variables Y and Z, defined in [5] in terms of the Shannon entropy H [4]. The authors further denote by ρ_min = min_{i≠j} ρ(X_i, X_j) and ρ_max = max_{i≠j} ρ(X_i, X_j) the minimum and maximum correlation between pairs of variables, and by H_min = min_i H(X_i) and H_max = max_i H(X_i) the minimum and maximum individual entropies.
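These quantities are straightforward to compute once a joint distribution is available. Below is a minimal Python sketch, on a hypothetical three-variable example of our own (X_2 a copy of a fair bit X_1, X_3 an independent fair bit), computing the individual and pairwise entropies from which H_min, H_max, and the pairwise correlations ρ of [5] are derived:

```python
import itertools
from math import log2

def shannon_entropy(pmf):
    """Shannon entropy (in bits) of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def marginal(joint, axes):
    """Marginalize a joint pmf over m variables onto the given axes."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[a] for a in axes)
        out[key] = out.get(key, 0.0) + p
    return out

# Hypothetical example (0-based indices): X2 is a copy of the fair bit X1,
# X3 is an independent fair bit.
joint = {(a, a, c): 0.25 for a in (0, 1) for c in (0, 1)}

m = 3
individual = [shannon_entropy(marginal(joint, (i,))) for i in range(m)]
H_min, H_max = min(individual), max(individual)
pairwise = {(i, j): shannon_entropy(marginal(joint, (i, j)))
            for i, j in itertools.combinations(range(m), 2)}
print(H_min, H_max)   # 1.0 1.0
print(pairwise)       # {(0, 1): 1.0, (0, 2): 2.0, (1, 2): 2.0}
```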
The general entropy correlation model proposed by the authors relies on two claims, both incorrect. Claim 1 states that the correlation between an aggregated pair of variables X_ij = (X_i, X_j) and a third variable X_k remains within the pairwise extremes, i.e., ρ_min ≤ ρ(X_ij, X_k) ≤ ρ_max. Claim 2 states that the joint entropy H_m of the m variables satisfies l_m H_min ≤ H_m ≤ k_m H_max, with coefficients l_m and k_m defined in [5]. We propose two examples for m = 3, demonstrating that all four inequalities are incorrect. In our first example, we obtain ρ_min > ρ(X_ij, X_k), which contradicts the lower bound of Claim 1, and H_3 > k_3 H_max, which contradicts the upper bound of Claim 2. In our second example, we obtain ρ_max < ρ(X_ij, X_k), which contradicts the upper bound of Claim 1, and H_3 < l_3 H_min, which contradicts the lower bound of Claim 2.
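The phenomenon underlying these counterexamples can be illustrated with standard entropies alone. In the Python sketch below (a hypothetical illustration, distinct from our two examples), three independent fair bits and the classic XOR triple have identical marginal and pairwise entropies, yet different joint entropies, so no bound on H_3 computed from marginal and pairwise quantities alone can be tight for both:

```python
from math import log2
from itertools import product

def shannon_entropy(pmf):
    """Shannon entropy (in bits) of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def marginal(joint, axes):
    """Marginalize a joint pmf over m variables onto the given axes."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[a] for a in axes)
        out[key] = out.get(key, 0.0) + p
    return out

# Distribution A: three independent fair bits.
indep = {x: 1 / 8 for x in product((0, 1), repeat=3)}
# Distribution B: X1, X2 independent fair bits, X3 = X1 XOR X2.
xor = {(a, b, a ^ b): 1 / 4 for a, b in product((0, 1), repeat=2)}

for joint in (indep, xor):
    marginals = [shannon_entropy(marginal(joint, (i,))) for i in range(3)]
    pairs = [shannon_entropy(marginal(joint, ax)) for ax in ((0, 1), (0, 2), (1, 2))]
    print(marginals, pairs, shannon_entropy(joint))
# Both distributions have marginals [1.0, 1.0, 1.0] and pairwise entropies
# [2.0, 2.0, 2.0], yet H_3 = 3.0 for the first and H_3 = 2.0 for the second.
```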
Overall, the two new inequalities derived by Nga et al. for the joint entropy H_m do not appear to be correct starting at m = 3. The errors in the model stem from the assumption, made in Claim 1, that pairwise and higher-order associations share the same minimum and maximum. The authors validate their method on a very specific dataset, with ρ_min = 0.6, H_min = 2.16, and H_max = 2.55, yet our examples show that different association structures yield widely different joint entropies. Bounding the joint entropy is what allows the authors to study the impact of correlation on data aggregation, compression, and clustering of signals. Although different bounds could potentially offer similar results, the broader conclusions of this article may not hold in practice.
Finally, deriving constraints and bounds on joint entropies is a computationally difficult task and an active field of research [1,6–11]. Theoretical derivations and numerical estimations both have to be used to bound the joint entropy H_m, building upon research on entropic vectors. The entropic vector of the random variables X_1, X_2, ..., X_m is the vector of the entropies of all 2^m − 1 non-empty subsets of these variables. The closure of the set of all entropic vectors is a convex cone, for which a polyhedral outer-approximation is known (Theorem 1, [12]). For instance, we derive below tight lower and upper bounds for H_3 in Proposition 3 (the tightness is a consequence of the fact that Equations (2) and (3) completely describe the entropic cone for m = 3; Theorem 2, [12]), suggesting an alternative approach that could lead to lower and upper bounds for m > 3 as well. These bounds rely on the following inequalities (Theorem 2.34, [6]): the monotonicity inequality

H(X_I) ≤ H(X_J),    (2)

valid for any subsets I ⊆ J ⊆ {1, ..., m}, and the submodularity inequality

H(X_I) + H(X_J) ≥ H(X_{I∪J}) + H(X_{I∩J}),    (3)

valid for any subsets I, J ⊆ {1, ..., m}, where X_I denotes the collection of variables (X_i) for i ∈ I.
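Equations (2) and (3), the monotonicity and submodularity of entropy, can be checked numerically for any given distribution. A minimal Python sketch verifying both inequalities over all subset pairs of a randomly drawn joint distribution on three binary variables:

```python
import random
from math import log2
from itertools import product, chain, combinations

def shannon_entropy(pmf):
    """Shannon entropy (in bits) of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def marginal(joint, axes):
    """Marginalize a joint pmf over m variables onto the given axes."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[a] for a in axes)
        out[key] = out.get(key, 0.0) + p
    return out

# Random joint distribution over m binary variables.
random.seed(0)
m = 3
outcomes = list(product((0, 1), repeat=m))
weights = [random.random() for _ in outcomes]
total = sum(weights)
joint = {x: w / total for x, w in zip(outcomes, weights)}

def H_subset(S):
    """Entropy of the variables indexed by the set S (0 for the empty set)."""
    return shannon_entropy(marginal(joint, tuple(sorted(S)))) if S else 0.0

def subsets(ground):
    return chain.from_iterable(combinations(ground, r) for r in range(len(ground) + 1))

eps = 1e-9
for I0 in subsets(range(m)):
    I = set(I0)
    for J0 in subsets(range(m)):
        J = set(J0)
        if I <= J:
            assert H_subset(I) <= H_subset(J) + eps                              # Eq. (2)
        assert H_subset(I) + H_subset(J) + eps >= H_subset(I | J) + H_subset(I & J)  # Eq. (3)
print("Equations (2) and (3) hold for all subset pairs")
```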

Proposition 3. For any three random variables X_1, X_2, X_3, the following inequalities hold:

max_{(i,j,k)} H(X_ij) ≤ H_3 ≤ min_{(i,j,k)} [H(X_ij) + H(X_jk) − H(X_j)],

where (i, j, k) ranges over the permutations of (1, 2, 3).

Proof. For any permutation (i, j, k) of (1, 2, 3), Equation (2) with I = {i, j} and J = {i, j, k} gives H(X_ij) ≤ H(X_ijk) = H_3, and Equation (3) with I = {i, j} and J = {j, k} gives H(X_ij) + H(X_jk) ≥ H(X_ijk) + H(X_j), which implies H_3 ≤ H(X_ij) + H(X_jk) − H(X_j). Taking the maximum of the lower bounds and the minimum of the upper bounds over all permutations yields the result. □

Similar bounds can be obtained for m > 3 using Equations (2) and (3), but their tightness is not guaranteed, as the entropic cone is no longer completely described by these inequalities for m > 3 (Theorem 6, [13]). This gap could be reduced numerically by iteratively producing linear cuts to refine the polyhedral outer-approximation of the entropic cone given by Equations (2) and (3) [14]. Taken together, our findings suggest that theoretical derivations (m ≤ 3) and numerical approximations (m > 3) on the entropic cone might provide future research directions towards a robust general entropy correlation model.
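As a sanity check, the bounds of Proposition 3 can be evaluated numerically. A minimal Python sketch on a hypothetical three-variable distribution (X_2 a copy of the fair bit X_1, X_3 an independent fair bit; indices are 0-based in the code):

```python
from math import log2
from itertools import permutations

def shannon_entropy(pmf):
    """Shannon entropy (in bits) of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def marginal(joint, axes):
    """Marginalize a joint pmf over m variables onto the given axes."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[a] for a in axes)
        out[key] = out.get(key, 0.0) + p
    return out

# Hypothetical example: X2 is a copy of the fair bit X1, X3 an independent fair bit.
joint = {(a, a, c): 0.25 for a in (0, 1) for c in (0, 1)}

def H_of(*axes):
    return shannon_entropy(marginal(joint, axes))

# Bounds of Proposition 3: maximum pairwise entropy below, minimum over
# permutations (i, j, k) of H(X_ij) + H(X_jk) - H(X_j) above.
lower = max(H_of(i, j) for i, j in ((0, 1), (0, 2), (1, 2)))
upper = min(H_of(i, j) + H_of(j, k) - H_of(j) for i, j, k in permutations(range(3)))
H3 = shannon_entropy(joint)
print(lower, H3, upper)   # 2.0 2.0 2.0
assert lower - 1e-9 <= H3 <= upper + 1e-9
```

On this example the lower and upper bounds coincide, so the joint entropy is pinned down exactly by the marginal and pairwise entropies, consistent with the tightness guaranteed by Theorem 2 of [12] for m = 3.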