Estimating Information-Theoretic Quantities from Data

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: closed (31 January 2013) | Viewed by 98761

Special Issue Editor

Dr. Ilya Nemenman
Theoretical Biophysics Laboratory, Department of Physics and Department of Biology, Emory University, Atlanta, GA 30322, USA
Interests: information processing in biological systems; coarse-grained modeling in biology

Special Issue Information

Dear Colleagues,

Information-theoretic methods have become a workhorse of interdisciplinary research in computational molecular biology, computational neuroscience, ecology, social communications, and other fields. They are used for inference of interaction networks (such as protein networks or neural wiring diagrams), for understanding communication within these networks, and for building dynamical models of their input-output behavior. They are further used to quantify the diversity and stability of ecological niches, to characterize social interactions among individuals, and to develop assays for diseases and other abnormalities. One of the key problems slowing wider acceptance of these methods is the difficulty of reliably estimating entropy and other information-theoretic quantities from empirical data. The field has made remarkable progress in this direction in recent years, and this Special Issue will showcase that progress. We welcome contributions that explore methodological and algorithmic advances, applications to specialized data-driven research problems in the various fields of science, and theoretical investigations of the limits of our ability to solve the formidable problem of entropy and information estimation.

Dr. Ilya Nemenman
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • entropy estimation
  • information estimation
  • maximum entropy models
  • information statistics
  • mutual information
  • kernel methods
  • string matching
  • complexity
  • Bayesian methods
  • estimators
  • Shannon entropy
  • Rényi entropy
  • small data sets

Published Papers (12 papers)

Research

Article
Non–Parametric Estimation of Mutual Information through the Entropy of the Linkage
by Maria Teresa Giraudo, Laura Sacerdote and Roberta Sirovich
Entropy 2013, 15(12), 5154-5177; https://doi.org/10.3390/e15125154 - 26 Nov 2013
Cited by 12 | Viewed by 6886
Abstract
A new, non-parametric and binless estimator for the mutual information of a d-dimensional random vector is proposed. First of all, an equation that links the mutual information to the entropy of a suitable random vector with uniformly distributed components is deduced. When d = 2 this equation reduces to the well-known connection between mutual information and entropy of the copula function associated with the original random variables. Hence, the problem of estimating the mutual information of the original random vector is reduced to the estimation of the entropy of a random vector obtained through a multidimensional transformation. The estimator we propose is a two-step method: first estimate the transformation and obtain the transformed sample, then estimate its entropy. The properties of the new estimator are discussed through simulation examples and its performance is compared to that of the best estimators in the literature. The precision of the estimator converges to values of the same order of magnitude as that of the best estimator tested. However, the new estimator is unbiased even for larger dimensions and smaller sample sizes, while the other tested estimators show a bias in these cases. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)
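
The two-step construction described above can be illustrated in a few lines for the d = 2 case, where the mutual information equals minus the differential entropy of the copula sample. The sketch below is not the authors' linkage estimator: it assumes an empirical rank transform for the first step and a Kozachenko-Leonenko k-nearest-neighbour entropy estimate for the second, and all function names are illustrative.

```python
# Minimal sketch (d = 2): MI(X, Y) ≈ -H(copula sample).
# Assumes continuous marginals (no ties); not the paper's linkage estimator.
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gamma

def rank_transform(x):
    """Map each column to (0, 1) via empirical CDF ranks (step 1)."""
    n = x.shape[0]
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    return ranks / (n + 1.0)

def kl_entropy(z, k=3):
    """Kozachenko-Leonenko kNN estimate of differential entropy (step 2), in nats."""
    n, d = z.shape
    dist, _ = cKDTree(z).query(z, k=k + 1)      # first neighbour is the point itself
    eps = dist[:, -1]                           # distance to the k-th true neighbour
    log_ball = (d / 2.0) * np.log(np.pi) - np.log(gamma(d / 2.0 + 1.0))
    return digamma(n) - digamma(k) + log_ball + d * np.mean(np.log(eps))

def mutual_information_copula(x, y, k=3):
    u = rank_transform(np.column_stack([x, y]))
    return -kl_entropy(u, k=k)                  # MI = -h(copula) when d = 2

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = 0.8 * x + 0.6 * rng.normal(size=2000)       # true MI = -0.5*log(1 - 0.8**2) ≈ 0.51 nats
print(mutual_information_copula(x, y))
```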

Article
Estimating Functions of Distributions Defined over Spaces of Unknown Size
by David H. Wolpert and Simon DeDeo
Entropy 2013, 15(11), 4668-4699; https://doi.org/10.3390/e15114668 - 31 Oct 2013
Cited by 10 | Viewed by 7276
Abstract
We consider Bayesian estimation of information-theoretic quantities from data, using a Dirichlet prior. Acknowledging the uncertainty of the event space size m and the Dirichlet prior’s concentration parameter c, we treat both as random variables set by a hyperprior. We show that the associated hyperprior, P(c, m), obeys a simple “Irrelevance of Unseen Variables” (IUV) desideratum iff P(c, m) = P(c)P(m). Thus, requiring IUV greatly reduces the number of degrees of freedom of the hyperprior. Some information-theoretic quantities can be expressed multiple ways, in terms of different event spaces, e.g., mutual information. With all hyperpriors (implicitly) used in earlier work, different choices of this event space lead to different posterior expected values of these information-theoretic quantities. We show that there is no such dependence on the choice of event space for a hyperprior that obeys IUV. We also derive a result that allows us to exploit IUV to greatly simplify calculations, like the posterior expected mutual information or posterior expected multi-information. We also use computer experiments to favorably compare an IUV-based estimator of entropy to three alternative methods in common use. We end by discussing how seemingly innocuous changes to the formalization of an estimation problem can substantially affect the resultant estimates of posterior expectations. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)
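
For readers who want to experiment with the Dirichlet machinery discussed above, the snippet below computes the posterior expected entropy for one fixed concentration parameter c and event-space size m (the standard Wolpert-Wolf building block). It is only a sketch of one ingredient: the paper's IUV-based estimator additionally averages over a hyperprior P(c)P(m), which is not implemented here.

```python
# Posterior mean entropy (nats) under a symmetric Dirichlet(c) prior on an m-bin
# distribution, given observed counts n_i.  One building block only; the full IUV
# estimator also integrates over a hyperprior P(c)P(m).
import numpy as np
from scipy.special import digamma

def dirichlet_posterior_entropy(counts, c, m):
    counts = np.asarray(counts, dtype=float)
    n_total = counts.sum()
    a = counts + c                        # posterior Dirichlet parameters, observed bins
    a_total = n_total + c * m             # normalisation includes the unseen bins
    h = digamma(a_total + 1.0) - np.sum((a / a_total) * digamma(a + 1.0))
    n_unseen = m - len(counts)            # bins never observed: pseudo-count c each
    h -= n_unseen * (c / a_total) * digamma(c + 1.0)
    return h

print(dirichlet_posterior_entropy([10, 5, 1, 0], c=0.5, m=100))
```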

Article
The Measurement of Information Transmitted by a Neural Population: Promises and Challenges
by Marshall Crumiller, Bruce Knight and Ehud Kaplan
Entropy 2013, 15(9), 3507-3527; https://doi.org/10.3390/e15093507 - 03 Sep 2013
Cited by 11 | Viewed by 5710
Abstract
All brain functions require the coordinated activity of many neurons, and therefore there is considerable interest in estimating the amount of information that the discharge of a neural population transmits to its targets. In the past, such estimates had presented a significant challenge for populations of more than a few neurons, but we have recently described a novel method for providing such estimates for populations of essentially arbitrary size. Here, we explore the influence of some important aspects of the neuronal population discharge on such estimates. In particular, we investigate the roles of mean firing rate and of the degree and nature of correlations among neurons. The results provide constraints on the applicability of our new method and should help neuroscientists determine whether such an application is appropriate for their data. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)

Article
Estimation Bias in Maximum Entropy Models
by Jakob H. Macke, Iain Murray and Peter E. Latham
Entropy 2013, 15(8), 3109-3129; https://doi.org/10.3390/e15083109 - 02 Aug 2013
Cited by 5 | Viewed by 7742
Abstract
Maximum entropy models have become popular statistical models in neuroscience and other areas in biology and can be useful tools for obtaining estimates of mutual information in biological systems. However, maximum entropy models fit to small data sets can be subject to sampling bias; i.e., the true entropy of the data can be severely underestimated. Here, we study the sampling properties of estimates of the entropy obtained from maximum entropy models. We focus on pairwise binary models, which are used extensively to model neural population activity. We show that if the data is well described by a pairwise model, the bias is equal to the number of parameters divided by twice the number of observations. If, however, the higher order correlations in the data deviate from those predicted by the model, the bias can be larger. Using a phenomenological model of neural population recordings, we find that this additional bias is highest for small firing probabilities, strong correlations and large population sizes—for the parameters we tested, a factor of about four higher. We derive guidelines for how long a neurophysiological experiment needs to be in order to ensure that the bias is less than a specified criterion. Finally, we show how a modified plug-in estimate of the entropy can be used for bias correction. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)
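
The headline result above (bias approximately equal to the number of parameters divided by twice the number of observations when the pairwise model holds) translates directly into a sample-size guideline. The short sketch below applies it to a pairwise binary model of n_neurons cells; it deliberately ignores the additional bias that arises when higher-order correlations deviate from the model, and the variable names are illustrative.

```python
# Rule-of-thumb bias of the maximum entropy (pairwise binary) entropy estimate,
# assuming the data are well described by the pairwise model (leading-order result above).
import numpy as np

def n_pairwise_params(n_neurons):
    return n_neurons + n_neurons * (n_neurons - 1) // 2      # fields + pairwise couplings

def pairwise_model_bias(n_neurons, n_observations):
    return n_pairwise_params(n_neurons) / (2.0 * n_observations)   # entropy underestimate, nats

def min_observations(n_neurons, max_bias_nats):
    """Smallest sample size keeping the leading-order bias below a criterion."""
    return int(np.ceil(n_pairwise_params(n_neurons) / (2.0 * max_bias_nats)))

print(pairwise_model_bias(n_neurons=40, n_observations=10_000))   # 0.041 nats
print(min_observations(n_neurons=40, max_bias_nats=0.01))         # 41,000 observations
```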

Article
Efficient Approximation of the Conditional Relative Entropy with Applications to Discriminative Learning of Bayesian Network Classifiers
by Alexandra M. Carvalho, Pedro Adão and Paulo Mateus
Entropy 2013, 15(7), 2716-2735; https://doi.org/10.3390/e15072716 - 12 Jul 2013
Cited by 13 | Viewed by 5678
Abstract
We propose a minimum variance unbiased approximation to the conditional relative entropy of the distribution induced by the observed frequency estimates, for multi-classification tasks. This approximation is an extension of a decomposable scoring criterion, named approximate conditional log-likelihood (aCLL), primarily used for discriminative learning of augmented Bayesian network classifiers. Our contribution is twofold: (i) it addresses multi-classification tasks and not only binary-classification ones; and (ii) it covers broader stochastic assumptions than a uniform distribution over the parameters. Specifically, we consider a Dirichlet distribution over the parameters, which is experimentally shown to be a very good approximation to the conditional log-likelihood (CLL). In addition, for Bayesian network classifiers, a closed-form equation is found for the parameters that maximize the scoring criterion. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)

Article
Bootstrap Methods for the Empirical Study of Decision-Making and Information Flows in Social Systems
by Simon DeDeo, Robert X. D. Hawkins, Sara Klingenstein and Tim Hitchcock
Entropy 2013, 15(6), 2246-2276; https://doi.org/10.3390/e15062246 - 05 Jun 2013
Cited by 33 | Viewed by 18244
Abstract
We characterize the statistical bootstrap for the estimation of information-theoretic quantities from data, with particular reference to its use in the study of large-scale social phenomena. Our methods allow one to preserve, approximately, the underlying axiomatic relationships of information theory—in particular, consistency under arbitrary coarse-graining—that motivate use of these quantities in the first place, while providing reliability comparable to the state of the art for Bayesian estimators. We show how information-theoretic quantities allow for rigorous empirical study of the decision-making capacities of rational agents, and the time-asymmetric flows of information in distributed systems. We provide illustrative examples by reference to ongoing collaborative work on the semantic structure of the British Criminal Court system and the conflict dynamics of the contemporary Afghanistan insurgency. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)
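
As a point of reference for the approach characterized above, the sketch below shows the basic bootstrap recipe for a plug-in entropy estimate: resample the observations with replacement, recompute the estimator, and read off a bias correction and a percentile interval. It is a generic illustration under those assumptions, not the coarse-graining-consistent construction developed in the paper.

```python
# Generic bootstrap bias correction and percentile interval for the plug-in entropy (nats).
import numpy as np

def plugin_entropy(samples):
    _, counts = np.unique(samples, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def bootstrap_entropy(samples, n_boot=1000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples)
    h_hat = plugin_entropy(samples)
    h_boot = np.array([plugin_entropy(rng.choice(samples, size=samples.size, replace=True))
                       for _ in range(n_boot)])
    bias = h_boot.mean() - h_hat               # bootstrap estimate of the estimator's bias
    lo, hi = np.quantile(h_boot, [alpha / 2, 1 - alpha / 2])
    return h_hat - bias, (lo, hi)              # bias-corrected estimate, percentile interval

data = np.random.default_rng(1).integers(0, 20, size=500)   # toy categorical sample
print(bootstrap_entropy(data))
```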

Article
Bias Adjustment for a Nonparametric Entropy Estimator
by Zhiyi Zhang and Michael Grabchak
Entropy 2013, 15(6), 1999-2011; https://doi.org/10.3390/e15061999 - 23 May 2013
Cited by 10 | Viewed by 7914
Abstract
Zhang in 2012 introduced a nonparametric estimator of Shannon’s entropy, whose bias decays exponentially fast when the alphabet is finite. We propose a methodology to estimate the bias of this estimator. We then use it to construct a new estimator of entropy. Simulation results suggest that this bias adjusted estimator has a significantly lower bias than many other commonly used estimators. We consider both the case when the alphabet is finite and when it is countably infinite. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)
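
For context, the classical first-order bias adjustment to the plug-in entropy is the Miller-Madow correction shown below. It is emphatically not Zhang's estimator or the bias-adjustment methodology studied in this paper, just the simplest baseline against which such estimators are usually compared.

```python
# Miller-Madow correction: add (K_observed - 1) / (2N) to the plug-in entropy (nats).
# A classical baseline only; not the Zhang (2012) estimator analysed in the paper.
import numpy as np

def miller_madow_entropy(samples):
    _, counts = np.unique(samples, return_counts=True)
    n = counts.sum()
    p = counts / n
    h_plugin = -np.sum(p * np.log(p))
    k_observed = np.count_nonzero(counts)
    return h_plugin + (k_observed - 1) / (2.0 * n)

data = np.random.default_rng(2).integers(0, 50, size=200)
print(miller_madow_entropy(data))
```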

Article
Bayesian and Quasi-Bayesian Estimators for Mutual Information from Discrete Data
by Evan Archer, Il Memming Park and Jonathan W. Pillow
Entropy 2013, 15(5), 1738-1755; https://doi.org/10.3390/e15051738 - 10 May 2013
Cited by 38 | Viewed by 10656
Abstract
Mutual information (MI) quantifies the statistical dependency between a pair of random variables, and plays a central role in the analysis of engineering and biological systems. Estimation of MI is difficult due to its dependence on an entire joint distribution, which is hard to estimate from samples. Here we discuss several regularized estimators for MI that employ priors based on the Dirichlet distribution. First, we discuss three “quasi-Bayesian” estimators that result from linear combinations of Bayesian estimates for conditional and marginal entropies. We show that these estimators are not in fact Bayesian: they do not arise from a well-defined posterior distribution and may even be negative. Second, we show that a fully Bayesian MI estimator proposed by Hutter (2002), which relies on a fixed Dirichlet prior, exhibits strong prior dependence and has large bias for small datasets. Third, we formulate a novel Bayesian estimator using a mixture-of-Dirichlets prior, with mixing weights designed to produce an approximately flat prior over MI. We examine the performance of these estimators with a variety of simulated datasets and show that, surprisingly, quasi-Bayesian estimators generally outperform our Bayesian estimator. We discuss outstanding challenges for MI estimation and suggest promising avenues for future research. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)
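
The simplest of the quasi-Bayesian constructions discussed above plugs Dirichlet posterior-mean entropies into the identity I(X;Y) = H(X) + H(Y) - H(X,Y). The sketch below, which assumes one fixed symmetric Dirichlet prior per space and uses hypothetical helper names, shows why such a combination is not itself Bayesian and can come out negative; the paper's mixture-of-Dirichlets estimator is not implemented here.

```python
# Quasi-Bayesian MI: combine Dirichlet posterior-mean entropies of X, Y and (X, Y).
# One fixed symmetric Dirichlet prior per space; not the paper's mixture-of-Dirichlets prior.
import numpy as np
from scipy.special import digamma

def dirichlet_mean_entropy(counts, c=1.0):
    a = np.asarray(counts, dtype=float).ravel() + c
    a_tot = a.sum()
    return digamma(a_tot + 1.0) - np.sum((a / a_tot) * digamma(a + 1.0))

def quasi_bayes_mi(joint_counts, c=1.0):
    joint = np.asarray(joint_counts, dtype=float)
    hx = dirichlet_mean_entropy(joint.sum(axis=1), c)
    hy = dirichlet_mean_entropy(joint.sum(axis=0), c)
    hxy = dirichlet_mean_entropy(joint, c)
    return hx + hy - hxy           # can be negative: not the posterior mean of MI

joint_counts = np.array([[8, 2, 0],
                         [1, 7, 2],
                         [0, 3, 9]])
print(quasi_bayes_mi(joint_counts))
```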

Article
An Estimate of Mutual Information that Permits Closed-Form Optimisation
by Raymond Liu and Duncan F. Gillies
Entropy 2013, 15(5), 1690-1704; https://doi.org/10.3390/e15051690 - 08 May 2013
Cited by 4 | Viewed by 5073
Abstract
We introduce a new estimate of mutual information between a dataset and a target variable that can be maximised analytically and has broad applicability in the field of machine learning and statistical pattern recognition. This estimate has previously been employed implicitly as an approximation to quadratic mutual information. In this paper we will study the properties of these estimates of mutual information in more detail, and provide a derivation from a perspective of pairwise interactions. From this perspective, we will show a connection between our proposed estimate and Laplacian eigenmaps, which so far has not been shown to be related to mutual information. Compared with other popular measures of mutual information, which can only be maximised through an iterative process, ours can be maximised much more efficiently and reliably via closed-form eigendecomposition. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)

Article
A Novel Nonparametric Distance Estimator for Densities with Error Bounds
by Alexandre R.F. Carvalho, João Manuel R. S. Tavares and Jose C. Principe
Entropy 2013, 15(5), 1609-1623; https://doi.org/10.3390/e15051609 - 06 May 2013
Cited by 5 | Viewed by 6068
Abstract
The use of a metric to assess distance between probability densities is an important practical problem. In this work, a particular metric induced by an α-divergence is studied. The Hellinger metric can be interpreted as a particular case within the framework of generalized Tsallis divergences and entropies. The nonparametric Parzen's density estimator emerges as a natural candidate to estimate the underlying probability density function, since it may account for data from different groups, or experiments with distinct instrumental precisions, i.e., non-independent and identically distributed (non-i.i.d.) data. However, the information-theoretic metric derived from the nonparametric Parzen's density estimator displays infinite variance, limiting the direct use of resampling estimators. Based on measure theory, we present a change of measure to build a finite-variance density allowing the use of resampling estimators. In order to counteract the poor scaling with dimension, we propose a new nonparametric two-stage robust resampling estimator of Hellinger's metric error bounds for heteroscedastic data. The approach presents very promising results, allowing the use of different covariances for different clusters, with impact on the distance evaluation. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)
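
To make the quantity being estimated concrete, the snippet below computes a naive plug-in Hellinger distance between two one-dimensional samples from Parzen (Gaussian-kernel) density estimates on a grid. It illustrates only the target metric under those simplifying assumptions; the finite-variance change of measure and the two-stage robust resampling estimator proposed in the paper are not reproduced here.

```python
# Naive plug-in Hellinger distance between two 1-D samples via Parzen (KDE) densities.
# Illustrates the target quantity only; not the paper's change-of-measure estimator.
import numpy as np
from scipy.stats import gaussian_kde

def hellinger_kde(sample_p, sample_q, grid_size=2001):
    p_kde, q_kde = gaussian_kde(sample_p), gaussian_kde(sample_q)
    lo = min(sample_p.min(), sample_q.min()) - 3.0
    hi = max(sample_p.max(), sample_q.max()) + 3.0
    x = np.linspace(lo, hi, grid_size)
    bc = np.trapz(np.sqrt(p_kde(x) * q_kde(x)), x)     # Bhattacharyya coefficient
    return np.sqrt(max(0.0, 1.0 - bc))                 # H(p, q) = sqrt(1 - BC)

rng = np.random.default_rng(3)
print(hellinger_kde(rng.normal(0.0, 1.0, 500), rng.normal(1.0, 1.5, 500)))
```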

Article
Minimum Mutual Information and Non-Gaussianity through the Maximum Entropy Method: Estimation from Finite Samples
by Carlos A. L. Pires and Rui A. P. Perdigão
Entropy 2013, 15(3), 721-752; https://doi.org/10.3390/e15030721 - 25 Feb 2013
Cited by 8 | Viewed by 6534
Abstract
The Minimum Mutual Information (MinMI) Principle provides the least committed, maximum-joint-entropy (ME) inferential law that is compatible with prescribed marginal distributions and empirical cross constraints. Here, we estimate MI bounds (the MinMI values) generated by constraining sets Tcr comprising mcr linear and/or nonlinear joint expectations, computed from samples of N i.i.d. outcomes. Marginals (and their entropy) are imposed by single morphisms of the original random variables. N-asymptotic formulas are given for the distribution of the cross expectations' estimation errors and for the MinMI estimation bias, its variance and its distribution. A growing Tcr leads to an increasing MinMI, converging eventually to the total MI. Under N-sized samples, the MinMI increment relative to two nested sets Tcr1 ⊆ Tcr2 (with numbers of constraints mcr1 < mcr2) is the test difference δH = Hmax,1,N − Hmax,2,N ≥ 0 between the two respective estimated MEs. Asymptotically, δH follows a chi-squared distribution, (1/(2N)) χ²(mcr2 − mcr1), whose upper quantiles determine whether the constraints in Tcr2 \ Tcr1 explain significant extra MI. As an example, we have set the marginals to be normally distributed (Gaussian) and have built a sequence of MI bounds, associated with successive nonlinear correlations due to joint non-Gaussianity. Noting that in real-world situations available sample sizes can be rather low, the relationship between MinMI bias, probability density over-fitting and outliers is illustrated for under-sampled data. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)
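
The asymptotic result quoted above turns directly into a significance test for whether the extra constraints in Tcr2 add mutual information: compare the observed entropy difference δH with the upper quantile of (1/(2N)) χ² with mcr2 − mcr1 degrees of freedom. A small sketch, assuming δH has already been computed from the two maximum-entropy fits and using illustrative argument names:

```python
# Test whether the extra constraints in Tcr2 explain significant additional MI:
# under the null, delta_h ~ (1 / (2N)) * chi2(m_cr2 - m_cr1)   (N-asymptotic result).
from scipy.stats import chi2

def extra_mi_significant(delta_h, n_samples, m_cr1, m_cr2, alpha=0.05):
    dof = m_cr2 - m_cr1
    threshold = chi2.ppf(1.0 - alpha, dof) / (2.0 * n_samples)
    return delta_h > threshold, threshold

# delta_h would come from the two maximum-entropy fits (hypothetical numbers here)
print(extra_mi_significant(delta_h=0.004, n_samples=1000, m_cr1=2, m_cr2=5))
```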

Review

Review
Machine Learning with Squared-Loss Mutual Information
by Masashi Sugiyama
Entropy 2013, 15(1), 80-112; https://doi.org/10.3390/e15010080 - 27 Dec 2012
Cited by 30 | Viewed by 10276
Abstract
Mutual information (MI) is useful for detecting statistical independence between random variables, and it has been successfully applied to solving various machine learning problems. Recently, an alternative to MI called squared-loss MI (SMI) was introduced. While ordinary MI is the Kullback–Leibler divergence from the joint distribution to the product of the marginal distributions, SMI is its Pearson divergence variant. Because both divergences belong to the f-divergence family, they share similar theoretical properties. However, a notable advantage of SMI is that it can be approximated from data in a computationally more efficient and numerically more stable way than ordinary MI. In this article, we review recent developments in SMI approximation based on direct density-ratio estimation and SMI-based machine learning techniques such as independence testing, dimensionality reduction, canonical dependency analysis, independent component analysis, object matching, clustering, and causal inference. Full article
(This article belongs to the Special Issue Estimating Information-Theoretic Quantities from Data)
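
For a quick feel of the quantity under review, the snippet below evaluates the plug-in squared-loss mutual information for a discrete joint distribution, SMI = (1/2) Σ p(x)p(y) (p(x,y)/(p(x)p(y)) − 1)². This is only a naive illustration of the definition; the review itself focuses on direct density-ratio estimation (e.g., least-squares MI) for continuous data, which is not implemented here.

```python
# Plug-in squared-loss mutual information (SMI) for a discrete joint distribution:
# SMI = 0.5 * sum_{x,y} p(x) p(y) * (p(x,y) / (p(x) p(y)) - 1)^2
# A naive illustration; the review covers direct density-ratio estimators instead.
import numpy as np

def smi_discrete(joint_counts):
    pxy = np.asarray(joint_counts, dtype=float)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    ratio = pxy / (px * py)                     # density ratio r(x, y)
    return 0.5 * np.sum(px * py * (ratio - 1.0) ** 2)

counts = np.array([[30, 5, 5],
                   [5, 30, 5],
                   [5, 5, 30]])
print(smi_discrete(counts))                     # 0 iff X and Y are independent
```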