Special Issue "Estimating Information-Theoretic Quantities from Data"

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory".

Deadline for manuscript submissions: closed (31 January 2013)

Special Issue Editor

Guest Editor
Dr. Ilya Nemenman

Theoretical Biophysics Laboratory, Department of Physics and Department of Biology, Emory University, Atlanta GA 30322, USA
Interests: information processing in biological systems; coarse-grained modeling in biology

Special Issue Information

Dear Colleagues,

Information-theoretic methods have become a workhorse of interdisciplinary research in computational molecular biology, computational neuroscience, ecology, social communications, and other fields. They are used for inference of interaction networks (such as protein networks or neural wiring diagrams), for understanding communication within these networks, and for building dynamical models of their input-output behavior. They are further used to quantify the diversity and stability of ecological niches, to characterize social interactions among individuals, and to develop assays for diseases and other abnormalities. One of the key problems slowing wider acceptance of these methods is the difficulty of reliably estimating entropy and other information-theoretic quantities from empirical data. The field has made remarkable progress in this direction in recent years, and this Special Issue will explore that progress. We welcome contributions presenting methodological and algorithmic advances, applications to specialized data-driven research problems in the various fields of science, and theoretical investigations of the limits of our ability to solve the formidable problem of entropy and information estimation.

Dr. Ilya Nemenman
Guest Editor

Submission

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. Papers will be published continuously (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are refereed through a peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed Open Access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1400 CHF (Swiss Francs).

Keywords

  • entropy estimation
  • information estimation
  • maximum entropy models
  • information statistics
  • mutual information
  • kernel methods
  • string matching
  • complexity
  • Bayesian methods
  • estimators
  • Shannon entropy
  • Rényi entropy
  • small data sets

Published Papers (12 papers)


Research

Jump to: Review

Open Access Article Non–Parametric Estimation of Mutual Information through the Entropy of the Linkage
Entropy 2013, 15(12), 5154-5177; doi:10.3390/e15125154
Received: 25 September 2013 / Revised: 30 October 2013 / Accepted: 11 November 2013 / Published: 26 November 2013
Cited by 1
Abstract
A new, non-parametric and binless estimator for the mutual information of a d-dimensional random vector is proposed. First, an equation is deduced that links the mutual information to the entropy of a suitable random vector with uniformly distributed components. When d = 2, this equation reduces to the well-known connection between the mutual information and the entropy of the copula function associated with the original random variables. Hence, the problem of estimating the mutual information of the original random vector is reduced to estimating the entropy of a random vector obtained through a multidimensional transformation. The estimator we propose is a two-step method: first estimate the transformation and obtain the transformed sample, then estimate its entropy. The properties of the new estimator are discussed through simulation examples, and its performance is compared to that of the best estimators in the literature. The precision of the estimator converges to values of the same order of magnitude as those of the best estimator tested. However, the new estimator remains unbiased even for larger dimensions and smaller sample sizes, while the other tested estimators show a bias in these cases. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)
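The copula identity underlying this estimator is easy to demonstrate numerically. The sketch below is illustrative only, not the authors' binless method: it rank-transforms each margin to (approximately) uniform, then uses a simple histogram on the copula sample to estimate the negative copula entropy, which for d = 2 equals the mutual information. All function names are ours.

```python
import random, math

def rank_transform(xs):
    # Empirical CDF transform: map each sample to its normalized rank,
    # producing approximately uniform margins (the copula sample).
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    u = [0.0] * len(xs)
    for r, i in enumerate(order):
        u[i] = (r + 0.5) / len(xs)
    return u

def copula_mi(x, y, bins=10):
    # Histogram estimate of the negative copula entropy on [0,1]^2,
    # i.e., the KL divergence of the copula density from uniform,
    # which for two variables equals I(X;Y) in nats.
    u, v = rank_transform(x), rank_transform(y)
    n = len(u)
    counts = {}
    for a, b in zip(u, v):
        key = (min(int(a * bins), bins - 1), min(int(b * bins), bins - 1))
        counts[key] = counts.get(key, 0) + 1
    mi = 0.0
    for c in counts.values():
        p = c / n           # empirical mass of the cell
        q = 1.0 / bins ** 2 # mass of the cell under the uniform copula
        mi += p * math.log(p / q)
    return mi

random.seed(0)
x = [random.gauss(0, 1) for _ in range(5000)]
y_dep = [xi + random.gauss(0, 1) for xi in x]      # dependent on x
y_ind = [random.gauss(0, 1) for _ in range(5000)]  # independent of x
```

Note the contrast with the paper: the binning step here introduces exactly the discretization the authors avoid, but it makes the two-step structure (transform, then entropy estimate) explicit.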
Open Access Article Estimating Functions of Distributions Defined over Spaces of Unknown Size
Entropy 2013, 15(11), 4668-4699; doi:10.3390/e15114668
Received: 3 August 2013 / Revised: 11 September 2013 / Accepted: 17 October 2013 / Published: 31 October 2013
Cited by 4
Abstract
We consider Bayesian estimation of information-theoretic quantities from data, using a Dirichlet prior. Acknowledging the uncertainty of the event space size m and the Dirichlet prior’s concentration parameter c, we treat both as random variables set by a hyperprior. We show that the associated hyperprior, P(c, m), obeys a simple “Irrelevance of Unseen Variables” (IUV) desideratum iff P(c, m) = P(c)P(m). Thus, requiring IUV greatly reduces the number of degrees of freedom of the hyperprior. Some information-theoretic quantities can be expressed multiple ways, in terms of different event spaces, e.g., mutual information. With all hyperpriors (implicitly) used in earlier work, different choices of this event space lead to different posterior expected values of these information-theoretic quantities. We show that there is no such dependence on the choice of event space for a hyperprior that obeys IUV. We also derive a result that allows us to exploit IUV to greatly simplify calculations, like the posterior expected mutual information or posterior expected multi-information. We also use computer experiments to favorably compare an IUV-based estimator of entropy to three alternative methods in common use. We end by discussing how seemingly innocuous changes to the formalization of an estimation problem can substantially affect the resultant estimates of posterior expectations. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)
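The basic object in this line of work, the posterior over distributions induced by a Dirichlet prior, can be sketched with a small Monte Carlo estimator. The code below fixes both the event-space size m and the concentration c, whereas the paper's point is precisely to place a hyperprior P(c, m) over them; names and defaults are ours.

```python
import random, math

def posterior_mean_entropy(counts, c=1.0, draws=2000, seed=0):
    """Monte Carlo posterior mean of Shannon entropy (nats) under a
    symmetric Dirichlet(c) prior on a fixed event space of size
    len(counts). Illustrative only: the paper additionally averages
    over a hyperprior on c and the space size m."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        # Draw p ~ Dirichlet(counts + c) via normalized Gamma variates.
        g = [rng.gammavariate(n + c, 1.0) for n in counts]
        s = sum(g)
        p = [gi / s for gi in g]
        total += -sum(pi * math.log(pi) for pi in p if pi > 0)
    return total / draws
```

Averaging the entropy over posterior draws, rather than plugging in a point estimate of p, is what distinguishes the Bayesian estimators compared in this paper from naive frequency-based ones.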
Open Access Article The Measurement of Information Transmitted by a Neural Population: Promises and Challenges
Entropy 2013, 15(9), 3507-3527; doi:10.3390/e15093507
Received: 10 May 2013 / Revised: 19 August 2013 / Accepted: 27 August 2013 / Published: 3 September 2013
Abstract
All brain functions require the coordinated activity of many neurons, and therefore there is considerable interest in estimating the amount of information that the discharge of a neural population transmits to its targets. In the past, such estimates had presented a significant challenge for populations of more than a few neurons, but we have recently described a novel method for providing such estimates for populations of essentially arbitrary size. Here, we explore the influence of some important aspects of the neuronal population discharge on such estimates. In particular, we investigate the roles of mean firing rate and of the degree and nature of correlations among neurons. The results provide constraints on the applicability of our new method and should help neuroscientists determine whether such an application is appropriate for their data. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)
Open Access Article Estimation Bias in Maximum Entropy Models
Entropy 2013, 15(8), 3109-3129; doi:10.3390/e15083109
Received: 25 June 2013 / Revised: 25 July 2013 / Accepted: 29 July 2013 / Published: 2 August 2013
Abstract
Maximum entropy models have become popular statistical models in neuroscience and other areas in biology and can be useful tools for obtaining estimates of mutual information in biological systems. However, maximum entropy models fit to small data sets can be subject to sampling bias; i.e., the true entropy of the data can be severely underestimated. Here, we study the sampling properties of estimates of the entropy obtained from maximum entropy models. We focus on pairwise binary models, which are used extensively to model neural population activity. We show that if the data is well described by a pairwise model, the bias is equal to the number of parameters divided by twice the number of observations. If, however, the higher order correlations in the data deviate from those predicted by the model, the bias can be larger. Using a phenomenological model of neural population recordings, we find that this additional bias is highest for small firing probabilities, strong correlations and large population sizes—for the parameters we tested, a factor of about four higher. We derive guidelines for how long a neurophysiological experiment needs to be in order to ensure that the bias is less than a specified criterion. Finally, we show how a modified plug-in estimate of the entropy can be used for bias correction. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)
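The abstract's leading-order result, that for a well-fitting pairwise binary model the bias equals the number of parameters divided by twice the number of observations, can be turned into a rough experiment-length guideline. A minimal sketch under that stated assumption (function names are ours; the pairwise model over n binary units has n field plus n(n-1)/2 coupling parameters):

```python
def pairwise_model_bias(n_neurons, n_samples):
    # Leading-order sampling bias of the entropy estimate, assuming the
    # data are well described by a pairwise binary maximum entropy model:
    # bias ~ (#parameters) / (2 * #observations).
    n_params = n_neurons + n_neurons * (n_neurons - 1) // 2
    return n_params / (2 * n_samples)

def samples_needed(n_neurons, max_bias):
    # Invert the bias formula: minimum number of observations needed to
    # keep the leading-order bias below a specified criterion.
    n_params = n_neurons + n_neurons * (n_neurons - 1) // 2
    return n_params / (2 * max_bias)
```

As the abstract warns, this is a lower bound on the required data: when higher-order correlations deviate from the pairwise model's predictions, the true bias can be several times larger.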
Open Access Article Efficient Approximation of the Conditional Relative Entropy with Applications to Discriminative Learning of Bayesian Network Classifiers
Entropy 2013, 15(7), 2716-2735; doi:10.3390/e15072716
Received: 8 June 2013 / Revised: 3 July 2013 / Accepted: 3 July 2013 / Published: 12 July 2013
Cited by 2
Abstract
We propose a minimum variance unbiased approximation to the conditional relative entropy of the distribution induced by the observed frequency estimates, for multi-classification tasks. This approximation extends a decomposable scoring criterion, named approximate conditional log-likelihood (aCLL), primarily used for discriminative learning of augmented Bayesian network classifiers. Our contribution is twofold: (i) it addresses multi-classification tasks and not only binary-classification ones; and (ii) it covers broader stochastic assumptions than a uniform distribution over the parameters. Specifically, we consider a Dirichlet distribution over the parameters, which was experimentally shown to yield a very good approximation to CLL. In addition, for Bayesian network classifiers, a closed-form equation is found for the parameters that maximize the scoring criterion. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)
Open Access Article Bootstrap Methods for the Empirical Study of Decision-Making and Information Flows in Social Systems
Entropy 2013, 15(6), 2246-2276; doi:10.3390/e15062246
Received: 15 March 2013 / Revised: 21 May 2013 / Accepted: 30 May 2013 / Published: 5 June 2013
Cited by 4
Abstract
We characterize the statistical bootstrap for the estimation of information-theoretic quantities from data, with particular reference to its use in the study of large-scale social phenomena. Our methods allow one to preserve, approximately, the underlying axiomatic relationships of information theory—in particular, consistency under arbitrary coarse-graining—that motivate use of these quantities in the first place, while providing reliability comparable to the state of the art for Bayesian estimators. We show how information-theoretic quantities allow for rigorous empirical study of the decision-making capacities of rational agents, and the time-asymmetric flows of information in distributed systems. We provide illustrative examples by reference to ongoing collaborative work on the semantic structure of the British Criminal Court system and the conflict dynamics of the contemporary Afghanistan insurgency. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)
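A minimal version of a bootstrap pipeline for an information-theoretic statistic might look as follows: resample the data with replacement, recompute a plug-in mutual information each time, and read confidence limits off the empirical quantiles. This is a generic percentile-bootstrap sketch, not the authors' coarse-graining-consistent construction; all names are ours.

```python
import random, math

def plugin_mi(pairs):
    # Plug-in (maximum likelihood) mutual information of (x, y) pairs, in nats.
    n = len(pairs)
    pj, px, py = {}, {}, {}
    for x, y in pairs:
        pj[(x, y)] = pj.get((x, y), 0) + 1
        px[x] = px.get(x, 0) + 1
        py[y] = py.get(y, 0) + 1
    return sum(c / n * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pj.items())

def bootstrap_ci(pairs, stat=plugin_mi, reps=500, alpha=0.05, seed=1):
    # Percentile bootstrap: resample pairs with replacement, recompute the
    # statistic, and return the empirical (alpha/2, 1 - alpha/2) quantiles.
    rng = random.Random(seed)
    n = len(pairs)
    stats = sorted(stat([pairs[rng.randrange(n)] for _ in range(n)])
                   for _ in range(reps))
    lo = stats[int((alpha / 2) * reps)]
    hi = stats[int((1 - alpha / 2) * reps) - 1]
    return lo, hi
```

The paper's contribution is precisely what this sketch lacks: correcting such naive intervals so that the bootstrapped quantities still respect information-theoretic identities under coarse-graining.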
Open Access Article Bias Adjustment for a Nonparametric Entropy Estimator
Entropy 2013, 15(6), 1999-2011; doi:10.3390/e15061999
Received: 20 March 2013 / Revised: 8 May 2013 / Accepted: 17 May 2013 / Published: 23 May 2013
Cited by 2
Abstract
Zhang in 2012 introduced a nonparametric estimator of Shannon’s entropy, whose bias decays exponentially fast when the alphabet is finite. We propose a methodology to estimate the bias of this estimator. We then use it to construct a new estimator of entropy. Simulation results suggest that this bias adjusted estimator has a significantly lower bias than many other commonly used estimators. We consider both the case when the alphabet is finite and when it is countably infinite. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)
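The paper's estimator (Zhang's, with the proposed bias adjustment) is not reproduced here, but the general idea of correcting a plug-in entropy estimate can be illustrated with the classic Miller-Madow first-order correction, shown only for context; it is one of the "commonly used estimators" this kind of work is benchmarked against.

```python
import math

def plugin_entropy(counts):
    # Maximum likelihood ("plug-in") entropy in nats; biased downward,
    # increasingly so for small samples over large alphabets.
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)

def miller_madow(counts):
    # Classic first-order bias correction: add (K_observed - 1) / (2n),
    # where K_observed is the number of symbols actually seen.
    n = sum(counts)
    k = sum(1 for c in counts if c > 0)
    return plugin_entropy(counts) + (k - 1) / (2 * n)
```

Such corrections shrink the bias only polynomially in the sample size; the appeal of the estimator studied in this paper is that its bias decays exponentially fast on finite alphabets.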
Open Access Article Bayesian and Quasi-Bayesian Estimators for Mutual Information from Discrete Data
Entropy 2013, 15(5), 1738-1755; doi:10.3390/e15051738
Received: 16 February 2013 / Revised: 24 April 2013 / Accepted: 2 May 2013 / Published: 10 May 2013
Cited by 3
Abstract
Mutual information (MI) quantifies the statistical dependency between a pair of random variables and plays a central role in the analysis of engineering and biological systems. Estimation of MI is difficult due to its dependence on an entire joint distribution, which is difficult to estimate from samples. Here we discuss several regularized estimators for MI that employ priors based on the Dirichlet distribution. First, we discuss three “quasi-Bayesian” estimators that result from linear combinations of Bayesian estimates for conditional and marginal entropies. We show that these estimators are not in fact Bayesian: they do not arise from a well-defined posterior distribution and may even be negative. Second, we show that a fully Bayesian MI estimator proposed by Hutter (2002), which relies on a fixed Dirichlet prior, exhibits strong prior dependence and has a large bias for small datasets. Third, we formulate a novel Bayesian estimator using a mixture-of-Dirichlets prior, with mixing weights designed to produce an approximately flat prior over MI. We examine the performance of these estimators with a variety of simulated datasets and show that, surprisingly, the quasi-Bayesian estimators generally outperform our Bayesian estimator. We discuss outstanding challenges for MI estimation and suggest promising avenues for future research. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)
Open Access Article An Estimate of Mutual Information that Permits Closed-Form Optimisation
Entropy 2013, 15(5), 1690-1704; doi:10.3390/e15051690
Received: 1 February 2013 / Revised: 18 April 2013 / Accepted: 28 April 2013 / Published: 8 May 2013
Cited by 2
Abstract
We introduce a new estimate of mutual information between a dataset and a target variable that can be maximised analytically and has broad applicability in the field of machine learning and statistical pattern recognition. This estimate has previously been employed implicitly as an approximation to quadratic mutual information. In this paper we will study the properties of these estimates of mutual information in more detail, and provide a derivation from a perspective of pairwise interactions. From this perspective, we will show a connection between our proposed estimate and Laplacian eigenmaps, which so far has not been shown to be related to mutual information. Compared with other popular measures of mutual information, which can only be maximised through an iterative process, ours can be maximised much more efficiently and reliably via closed-form eigendecomposition. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)
Open Access Article A Novel Nonparametric Distance Estimator for Densities with Error Bounds
Entropy 2013, 15(5), 1609-1623; doi:10.3390/e15051609
Received: 19 December 2012 / Revised: 25 April 2013 / Accepted: 28 April 2013 / Published: 6 May 2013
Cited by 1
Abstract
The use of a metric to assess the distance between probability densities is an important practical problem. In this work, a particular metric induced by an α-divergence is studied. The Hellinger metric can be interpreted as a particular case within the framework of generalized Tsallis divergences and entropies. The nonparametric Parzen density estimator emerges as a natural candidate to estimate the underlying probability density function, since it can accommodate data from different groups, or experiments with distinct instrumental precisions, i.e., non-independent and identically distributed (non-i.i.d.) data. However, the information-theoretic metric derived from the nonparametric Parzen density estimator displays infinite variance, limiting the direct use of resampling estimators. Based on measure theory, we present a change of measure that yields a finite-variance density and thereby allows the use of resampling estimators. To counteract the poor scaling with dimension, we propose a new nonparametric two-stage robust resampling estimator of Hellinger metric error bounds for heteroscedastic data. The approach yields very promising results, allowing the use of different covariances for different clusters, which impacts the distance evaluation. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)
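In its simplest discrete form, the Hellinger metric studied here is straightforward to compute: a toy sketch on probability vectors, not the paper's Parzen-based resampling estimator for continuous densities.

```python
import math

def hellinger(p, q):
    # Discrete Hellinger distance: H(p, q) = sqrt(1 - BC(p, q)), where
    # BC is the Bhattacharyya coefficient sum_i sqrt(p_i * q_i).
    # H is a true metric bounded in [0, 1].
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))  # clamp tiny negative round-off
```

The boundedness in [0, 1] is part of what makes this metric attractive compared with unbounded divergences; the difficulty the paper addresses is estimating it reliably from finite, non-i.i.d. samples of continuous data.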
Open Access Article Minimum Mutual Information and Non-Gaussianity through the Maximum Entropy Method: Estimation from Finite Samples
Entropy 2013, 15(3), 721-752; doi:10.3390/e15030721
Received: 8 November 2012 / Revised: 15 February 2013 / Accepted: 19 February 2013 / Published: 25 February 2013
Cited by 5
Abstract
The Minimum Mutual Information (MinMI) Principle provides the least committed, maximum-joint-entropy (ME) inferential law that is compatible with prescribed marginal distributions and empirical cross constraints. Here, we estimate MI bounds (the MinMI values) generated by constraining sets T_cr comprising m_cr linear and/or nonlinear joint expectations, computed from samples of N iid outcomes. Marginals (and their entropy) are imposed by single morphisms of the original random variables. N-asymptotic formulas are given for the distribution of the cross expectations' estimation errors and for the MinMI estimation bias, variance and distribution. A growing T_cr leads to an increasing MinMI, converging eventually to the total MI. Under N-sized samples, the MinMI increment relative to two nested sets T_cr1 ⊂ T_cr2 (with numbers of constraints m_cr1 < m_cr2) is the test difference δH = H_max,1,N − H_max,2,N ≥ 0 between the two respective estimated MEs. Asymptotically, δH follows a scaled chi-squared distribution, (1/2N) χ²(m_cr2 − m_cr1), whose upper quantiles determine whether the constraints in T_cr2 \ T_cr1 explain significant extra MI. As an example, we set the marginals to be normally distributed (Gaussian) and build a sequence of MI bounds associated with successive nonlinear correlations due to joint non-Gaussianity. Noting that in real-world situations available sample sizes can be rather low, the relationship between MinMI bias, probability density over-fitting and outliers is demonstrated for under-sampled data. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)

Review

Jump to: Research

Open Access Review Machine Learning with Squared-Loss Mutual Information
Entropy 2013, 15(1), 80-112; doi:10.3390/e15010080
Received: 29 October 2012 / Revised: 7 December 2012 / Accepted: 21 December 2012 / Published: 27 December 2012
Cited by 10
Abstract
Mutual information (MI) is useful for detecting statistical independence between random variables, and it has been successfully applied to solving various machine learning problems. Recently, an alternative to MI called squared-loss MI (SMI) was introduced. While ordinary MI is the Kullback–Leibler divergence from the joint distribution to the product of the marginal distributions, SMI is its Pearson divergence variant. Because both divergences belong to the f-divergence family, they share similar theoretical properties. However, a notable advantage of SMI is that it can be approximated from data in a computationally more efficient and numerically more stable way than ordinary MI. In this article, we review recent developments in SMI approximation based on direct density-ratio estimation and SMI-based machine learning techniques such as independence testing, dimensionality reduction, canonical dependency analysis, independent component analysis, object matching, clustering, and causal inference. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)
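For a known discrete joint distribution, SMI has a simple closed form that makes the contrast with KL-based MI concrete: it is half the Pearson chi-squared divergence between the joint and the product of the marginals. Illustrative code only; in the applications the review covers, the density ratio is estimated directly from data rather than from a known table.

```python
def smi(joint):
    # Squared-loss MI of a discrete joint distribution, given as a list of
    # rows (x indexes rows, y indexes columns, entries sum to 1):
    #   SMI = (1/2) * E_{p(x)p(y)}[(r(x, y) - 1)^2],
    # with density ratio r(x, y) = p(x, y) / (p(x) p(y)).
    px = [sum(row) for row in joint]          # marginal over x
    py = [sum(col) for col in zip(*joint)]    # marginal over y
    s = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            q = px[i] * py[j]                 # product-of-marginals mass
            if q > 0:
                s += 0.5 * q * (pxy / q - 1.0) ** 2
    return s
```

SMI is zero iff the variables are independent, like ordinary MI, but because it is a quadratic functional of the density ratio it admits the analytic, least-squares-style estimators the review surveys.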

Journal Contact

MDPI AG
Entropy Editorial Office
St. Alban-Anlage 66, 4052 Basel, Switzerland
entropy@mdpi.com
Tel.: +41 61 683 77 34
Fax: +41 61 302 89 18