
Open Access Article

Quantifying Data Dependencies with Rényi Mutual Information and Minimum Spanning Trees

1 Centrum voor Wiskunde & Informatica, 1090 GB Amsterdam, The Netherlands
2 Korteweg-de Vries Institute for Mathematics, University of Amsterdam, 1090 GE Amsterdam, The Netherlands
* Author to whom correspondence should be addressed.
Entropy 2019, 21(2), 100; https://doi.org/10.3390/e21020100
Received: 30 October 2018 / Revised: 4 January 2019 / Accepted: 18 January 2019 / Published: 22 January 2019
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

In this study, we present a novel method for quantifying dependencies in multivariate datasets, based on estimating the Rényi mutual information by minimum spanning trees (MSTs). The extent to which random variables are dependent is an important question, e.g., for uncertainty quantification and sensitivity analysis. The latter is closely related to the question of how strongly the output of, e.g., a computer simulation depends on the individual random input variables. To estimate the Rényi mutual information from data, we use a method due to Hero et al. that relies on computing MSTs of the data and uses the length of the MST in an estimator for the entropy. To reduce the computational cost of constructing the exact MST for large datasets, we explore methods to compute approximations to the exact MST, and find the multilevel approach introduced recently by Zhong et al. (2015) to be the most accurate. Because the MST computation does not require knowledge (or estimation) of the distributions, our methodology is well-suited for situations where only data are available. Furthermore, we show that, in the case where only the ranking of several dependencies is required rather than their exact value, it is not necessary to compute the Rényi divergence, but only an estimator derived from it. The main contributions of this paper are the introduction of this quantifier of dependency, as well as the novel combination of approximate MST methods with MST-based estimation of the Rényi mutual information. We applied our proposed method to an artificial test case based on the Ishigami function, as well as to a real-world test case involving an El Niño dataset.
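The entropy estimator attributed to Hero et al. can be illustrated with a short sketch. For n points in d >= 2 dimensions and an exponent γ in (0, d), let L_γ be the sum of |e|^γ over the edges e of the Euclidean MST and α = (d - γ)/d; the Rényi entropy estimate is then H_α ≈ [log(L_γ / n^α) - log β] / (1 - α), where β depends on γ and d but not on the distribution. The Python below is a minimal sketch under several assumptions the abstract does not specify: it builds the exact MST with Prim's algorithm in O(n²) (not the approximate multilevel method of Zhong et al.), approximates log β from uniform samples on [0, 1]^d (whose Rényi entropy is zero), and estimates the Rényi mutual information of a pair of variables as minus the Rényi entropy of their rank-transformed (empirical copula) sample. All function names are hypothetical, and this is not the authors' implementation.

```python
import math
import random


def mst_length(points, gamma=1.0):
    """Sum of |e|**gamma over the edges of the Euclidean MST (Prim's algorithm, O(n^2)).

    For gamma > 0, raising edge lengths to the power gamma preserves their order,
    so the tree of minimal total length also minimizes the sum of |e|**gamma.
    """
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known edge joining each point to the tree
    best[0] = 0.0          # start the tree at point 0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u] ** gamma  # the start point adds 0.0
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return total


def log_normalized_length(points, gamma):
    """log(L_gamma / n**alpha) with alpha = (d - gamma) / d."""
    n, d = len(points), len(points[0])
    alpha = (d - gamma) / d
    return math.log(mst_length(points, gamma) / n ** alpha)


def calibrate_log_beta(n, d, gamma=1.0, trials=5, seed=0):
    """Assumption: approximate log(beta) by averaging over uniform samples on [0, 1]^d,
    whose Renyi entropy is 0, drawn at the same sample size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
        total += log_normalized_length(sample, gamma)
    return total / trials


def renyi_entropy(points, gamma, log_beta):
    """MST-based estimate of H_alpha, alpha = (d - gamma) / d (Hero et al.-style formula)."""
    d = len(points[0])
    alpha = (d - gamma) / d
    return (log_normalized_length(points, gamma) - log_beta) / (1.0 - alpha)


def copula(xs, ys):
    """Map each coordinate to (rank + 1) / (n + 1): an empirical copula sample in (0, 1)^2."""
    n = len(xs)

    def scaled_ranks(values):
        r = [0.0] * n
        for k, i in enumerate(sorted(range(n), key=lambda j: values[j])):
            r[i] = (k + 1) / (n + 1)
        return r

    return list(zip(scaled_ranks(xs), scaled_ranks(ys)))


def renyi_mi(xs, ys, gamma=1.0, log_beta=None):
    """Assumption: Renyi MI estimated as minus the Renyi entropy of the empirical copula."""
    pts = copula(xs, ys)
    if log_beta is None:
        log_beta = calibrate_log_beta(len(pts), 2, gamma)
    return -renyi_entropy(pts, gamma, log_beta)


if __name__ == "__main__":
    # Hypothetical test setup: Ishigami function with the common choice a = 7, b = 0.1
    # and inputs uniform on [-pi, pi]; not necessarily the configuration used in the paper.
    rng = random.Random(1)
    n = 300
    X = [[rng.uniform(-math.pi, math.pi) for _ in range(n)] for _ in range(3)]
    y = [math.sin(x1) + 7 * math.sin(x2) ** 2 + 0.1 * x3 ** 4 * math.sin(x1)
         for x1, x2, x3 in zip(*X)]
    log_beta = calibrate_log_beta(n, 2)
    for i, xi in enumerate(X, start=1):
        length = mst_length(copula(xi, y))
        print(f"x{i}: copula MST length {length:.3f}, "
              f"Renyi MI estimate {renyi_mi(xi, y, log_beta=log_beta):.3f}")
```

When several inputs are compared against the same output, n, d, γ and log β are shared, so in this sketch the mutual information estimate is a decreasing function of the copula MST length alone; ranking the inputs therefore only needs the lengths. This is one way to read the ranking remark in the abstract, not a statement of the authors' exact procedure.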
Keywords: Rényi entropy; dependent data; large datasets; minimum spanning trees
Figures

Figure 1

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

Cite This Article

MDPI and ACS Style

Eggels, A.; Crommelin, D. Quantifying Data Dependencies with Rényi Mutual Information and Minimum Spanning Trees. Entropy 2019, 21, 100.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Entropy EISSN 1099-4300 Published by MDPI AG, Basel, Switzerland