Divergence Measures: Mathematical Foundations and Applications in Information-Theoretic and Statistical Problems

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: closed (1 September 2021) | Viewed by 33803

Printed Edition Available!
A printed edition of this Special Issue is available here.

Special Issue Editor


Prof. Igal Sason
Guest Editor
1. Andrew & Erna Viterbi Faculty of Electrical and Computer Engineering, Technion—Israel Institute of Technology, Haifa 3200003, Israel
2. Faculty of Mathematics, Technion—Israel Institute of Technology, Haifa 3200003, Israel
Interests: information theory; coding theory; probability theory; combinatorics

Special Issue Information

Dear Colleagues,

Information theory, probability theory, statistical learning theory, statistical signal processing, and other related disciplines greatly benefit from non-negative measures of dissimilarity (i.e., divergence measures) between pairs of probability measures defined on the same measurable space. Exploring the mathematical foundations of divergence measures (e.g., Bregman divergences, Rényi divergences, and f-divergences) and their potential applications to new information-theoretic and statistical problems is of interest, and many results in these areas involve such generalized divergence measures.
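For reference, the divergence families mentioned above admit the following standard definitions (the notation is generic and may differ slightly from that used in the individual papers of this issue). For a convex function f:(0,\infty)\to\mathbb{R} with f(1)=0, the f-divergence between probability measures P \ll Q is

\[
D_f(P\|Q) \triangleq \int f\!\left(\frac{\mathrm{d}P}{\mathrm{d}Q}\right)\mathrm{d}Q,
\]

recovering the relative entropy for f(t)=t\log t, the total variation distance for f(t)=\tfrac12|t-1|, and the \chi^2 divergence for f(t)=(t-1)^2. The Rényi divergence of order \alpha\in(0,1)\cup(1,\infty) is

\[
D_\alpha(P\|Q) \triangleq \frac{1}{\alpha-1}\,\log \int \left(\frac{\mathrm{d}P}{\mathrm{d}Q}\right)^{\alpha}\mathrm{d}Q,
\]

which tends to the relative entropy as \alpha\to 1, while the Bregman divergence generated by a differentiable convex function F is B_F(x,y)=F(x)-F(y)-\langle \nabla F(y),\, x-y\rangle.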

This Special Issue encourages research and survey papers on the mathematical properties and applications of divergence measures from an information-theoretic perspective.

Prof. Igal Sason
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • divergence measures
  • Bregman divergences
  • f-divergences
  • Rényi divergence
  • relative entropy (Kullback–Leibler divergence)
  • information projections
  • strong data-processing inequalities
  • hypothesis testing
  • guessing
  • coding theorems based on divergence measures
  • concentration-of-measure inequalities

Published Papers (9 papers)

Editorial

5 pages, 206 KiB  
Editorial
Divergence Measures: Mathematical Foundations and Applications in Information-Theoretic and Statistical Problems
by Igal Sason
Entropy 2022, 24(5), 712; https://doi.org/10.3390/e24050712 - 16 May 2022
Cited by 4 | Viewed by 2402
Abstract
Data science, information theory, probability theory, statistical learning, statistical signal processing, and other related disciplines greatly benefit from non-negative measures of dissimilarity between pairs of probability measures [...] Full article

Research

26 pages, 1548 KiB  
Article
Discriminant Analysis under f-Divergence Measures
by Anmol Dwivedi, Sihui Wang and Ali Tajer
Entropy 2022, 24(2), 188; https://doi.org/10.3390/e24020188 - 27 Jan 2022
Cited by 2 | Viewed by 2385
Abstract
In statistical inference, the information-theoretic performance limits can often be expressed in terms of a statistical divergence between the underlying statistical models (e.g., in binary hypothesis testing, the error probability is related to the total variation distance between the statistical models). As the data dimension grows, computing the statistics involved in decision-making and the attendant performance limits (divergence measures) face complexity and stability challenges. Dimensionality reduction addresses these challenges at the expense of compromising the performance (the divergence reduces by the data-processing inequality). This paper considers linear dimensionality reduction such that the divergence between the models is maximally preserved. Specifically, this paper focuses on Gaussian models where we investigate discriminant analysis under five f-divergence measures (Kullback–Leibler, symmetrized Kullback–Leibler, Hellinger, total variation, and χ2). We characterize the optimal design of the linear transformation of the data onto a lower-dimensional subspace for zero-mean Gaussian models and employ numerical algorithms to find the design for general Gaussian models with non-zero means. There are two key observations for zero-mean Gaussian models. First, projections are not necessarily along the largest modes of the covariance matrix of the data, and, in some situations, they can even be along the smallest modes. Secondly, under specific regimes, the optimal design of subspace projection is identical under all the f-divergence measures considered, rendering a degree of universality to the design, independent of the inference problem of interest. Full article
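As a concrete, hypothetical illustration of the quantity being optimized (not the authors' design algorithm), the sketch below computes the Kullback–Leibler divergence between two zero-mean Gaussian models before and after a linear projection, so the data-processing reduction can be observed numerically; the function names and covariances are assumptions made for this example.

import numpy as np

def kl_zero_mean_gaussians(S0, S1):
    # D( N(0, S0) || N(0, S1) ) in nats, for positive-definite covariances.
    d = S0.shape[0]
    S1_inv = np.linalg.inv(S1)
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) - d + logdet1 - logdet0)

def projected_kl(A, S0, S1):
    # Divergence after the linear reduction x -> A x (A is k x d).
    return kl_zero_mean_gaussians(A @ S0 @ A.T, A @ S1 @ A.T)

# Toy example with assumed covariances: by the data-processing inequality,
# the projected divergence can never exceed the full-dimensional one.
rng = np.random.default_rng(0)
d, k = 5, 2
M0 = rng.standard_normal((d, d))
M1 = rng.standard_normal((d, d))
S0 = M0 @ M0.T + np.eye(d)
S1 = M1 @ M1.T + np.eye(d)
A = rng.standard_normal((k, d))
print(kl_zero_mean_gaussians(S0, S1), projected_kl(A, S0, S1))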

52 pages, 769 KiB  
Article
Error Exponents and α-Mutual Information
by Sergio Verdú
Entropy 2021, 23(2), 199; https://doi.org/10.3390/e23020199 - 5 Feb 2021
Cited by 6 | Viewed by 3371
Abstract
Over the last six decades, the representation of error exponent functions for data transmission through noisy channels at rates below capacity has seen three distinct approaches: (1) Through Gallager’s E0 functions (with and without cost constraints); (2) large deviations form, in terms of conditional relative entropy and mutual information; (3) through the α-mutual information and the Augustin–Csiszár mutual information of order α derived from the Rényi divergence. While a fairly complete picture has emerged in the absence of cost constraints, there have remained gaps in the interrelationships between the three approaches in the general case of cost-constrained encoding. Furthermore, no systematic approach has been proposed to solve the attendant optimization problems by exploiting the specific structure of the information functions. This paper closes those gaps and proposes a simple method to maximize Augustin–Csiszár mutual information of order α under cost constraints by means of the maximization of the α-mutual information subject to an exponential average constraint. Full article
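For orientation, the two order-α dependence measures mentioned in the abstract are commonly defined as follows (standard formulations; conventions can vary):

\[
I_\alpha(X;Y) = \min_{Q_Y} D_\alpha\!\left(P_{XY} \,\|\, P_X \times Q_Y\right),
\qquad
I_\alpha^{\mathrm{c}}(X;Y) = \min_{Q_Y} \sum_x P_X(x)\, D_\alpha\!\left(P_{Y|X=x} \,\|\, Q_Y\right),
\]

where the first is the α-mutual information (Sibson's proposal) and the second is the Augustin–Csiszár mutual information of order α; both reduce to the mutual information I(X;Y) as α → 1.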

15 pages, 297 KiB  
Article
Minimum Divergence Estimators, Maximum Likelihood and the Generalized Bootstrap
by Michel Broniatowski
Entropy 2021, 23(2), 185; https://doi.org/10.3390/e23020185 - 31 Jan 2021
Cited by 4 | Viewed by 1644
Abstract
This paper shows that the most commonly used minimum divergence estimators are MLEs under suitably generalized bootstrap sampling schemes. Optimality, in the sense of Bahadur, of the associated tests of fit under such sampling is also considered. Full article
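As background (a generic formulation, not the paper's specific bootstrap construction): given an i.i.d. sample X_1,\dots,X_n with empirical measure P_n and a parametric family \{P_\theta\}_{\theta\in\Theta}, a typical minimum divergence estimator has the form

\[
\hat{\theta}_n \in \arg\min_{\theta\in\Theta} D\!\left(P_n \,\|\, P_\theta\right),
\]

and in the finite-alphabet case the choice of the relative entropy recovers the MLE, since D(P_n\|P_\theta) = -H(P_n) - \tfrac{1}{n}\sum_{i=1}^{n}\log P_\theta(X_i).
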
20 pages, 328 KiB  
Article
Strongly Convex Divergences
by James Melbourne
Entropy 2020, 22(11), 1327; https://doi.org/10.3390/e22111327 - 21 Nov 2020
Cited by 5 | Viewed by 1941
Abstract
We consider a sub-class of the f-divergences satisfying a stronger convexity property, which we refer to as strongly convex, or κ-convex divergences. We derive new and old relationships, based on convexity arguments, between popular f-divergences. Full article
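In the usual sense of strong convexity (the paper should be consulted for its precise definition), a function f is κ-convex on an interval if f - \tfrac{\kappa}{2}(\cdot)^2 is convex there, i.e.,

\[
f\bigl(\lambda x + (1-\lambda) y\bigr) \le \lambda f(x) + (1-\lambda) f(y) - \frac{\kappa}{2}\,\lambda(1-\lambda)(x-y)^2
\]

for all x, y in the interval and λ ∈ [0,1]; when f is twice differentiable this is equivalent to f'' ≥ κ. For example, the generator f(t) = t\log t of the relative entropy satisfies f''(t) = 1/t \ge 1/b on (0, b].
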
26 pages, 394 KiB  
Article
A Two-Moment Inequality with Applications to Rényi Entropy and Mutual Information
by Galen Reeves
Entropy 2020, 22(11), 1244; https://doi.org/10.3390/e22111244 - 1 Nov 2020
Cited by 2 | Viewed by 1967
Abstract
This paper explores some applications of a two-moment inequality for the integral of the rth power of a function, where 0<r<1. The first contribution is an upper bound on the Rényi entropy of a random vector in terms of the two different moments. When one of the moments is the zeroth moment, these bounds recover previous results based on maximum entropy distributions under a single moment constraint. More generally, evaluation of the bound with two carefully chosen nonzero moments can lead to significant improvements with a modest increase in complexity. The second contribution is a method for upper bounding mutual information in terms of certain integrals with respect to the variance of the conditional density. The bounds have a number of useful properties arising from the connection with variance decompositions. Full article
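For reference, the (differential) Rényi entropy of order r of a random vector X in \mathbb{R}^n with density f is

\[
h_r(X) = \frac{1}{1-r}\,\log \int_{\mathbb{R}^n} f(x)^r\,\mathrm{d}x ,
\]

so, for r ∈ (0,1), any upper bound on \int f^r in terms of moments of X translates directly into an upper bound on h_r(X), since the prefactor 1/(1-r) is positive and the logarithm is increasing.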

36 pages, 522 KiB  
Article
On Relations Between the Relative Entropy and χ²-Divergence, Generalizations and Applications
by Tomohiro Nishiyama and Igal Sason
Entropy 2020, 22(5), 563; https://doi.org/10.3390/e22050563 - 18 May 2020
Cited by 12 | Viewed by 4226
Abstract
The relative entropy and the chi-squared divergence are fundamental divergence measures in information theory and statistics. This paper is focused on a study of integral relations between the two divergences, the implications of these relations, their information-theoretic applications, and some generalizations pertaining to the rich class of f-divergences. Applications that are studied in this paper refer to lossless compression, the method of types and large deviations, strong data–processing inequalities, bounds on contraction coefficients and maximal correlation, and the convergence rate to stationarity of a type of discrete-time Markov chains. Full article
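One integral relation of this kind, restated here informally (see the paper for the precise conditions), expresses the relative entropy as a weighted integral of χ² divergences along the segment joining the two measures:

\[
D(P\|Q) = \int_0^1 \frac{1}{\lambda}\,\chi^2\bigl(P \,\big\|\, \lambda Q + (1-\lambda)P\bigr)\,\mathrm{d}\lambda ,
\]

which follows from applying the identity \log x = \int_0^1 \frac{x-1}{1+s(x-1)}\,\mathrm{d}s to the likelihood ratio.
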
39 pages, 1011 KiB  
Article
Conditional Rényi Divergences and Horse Betting
by Cédric Bleuler, Amos Lapidoth and Christoph Pfister
Entropy 2020, 22(3), 316; https://doi.org/10.3390/e22030316 - 11 Mar 2020
Cited by 6 | Viewed by 2886
Abstract
Motivated by a horse betting problem, a new conditional Rényi divergence is introduced. It is compared with the conditional Rényi divergences that appear in the definitions of the dependence measures by Csiszár and Sibson, and the properties of all three are studied with emphasis on their behavior under data processing. In the same way that Csiszár’s and Sibson’s conditional divergence lead to the respective dependence measures, so does the new conditional divergence lead to the Lapidoth–Pfister mutual information. Moreover, the new conditional divergence is also related to the Arimoto–Rényi conditional entropy and to Arimoto’s measure of dependence. In the second part of the paper, the horse betting problem is analyzed where, instead of Kelly’s expected log-wealth criterion, a more general family of power-mean utility functions is considered. The key role in the analysis is played by the Rényi divergence, and in the setting where the gambler has access to side information, the new conditional Rényi divergence is key. The setting with side information also provides another operational meaning to the Lapidoth–Pfister mutual information. Finally, a universal strategy for independent and identically distributed races is presented that—without knowing the winning probabilities or the parameter of the utility function—asymptotically maximizes the gambler’s utility function. Full article
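As background on the classical setting that the paper generalizes (standard Kelly gambling, stated here for orientation): a gambler who bets the fraction b(x) of her wealth on horse x, with odds o(x) and win probabilities p(x), and who maximizes the expected log-wealth, attains the doubling rate

\[
W(b,p) = \sum_x p(x)\,\log\bigl(b(x)\,o(x)\bigr),
\]

which is maximized by proportional betting b^*(x) = p(x); when side information Y is available, the increase in the optimal doubling rate equals the mutual information I(X;Y). The paper replaces the log-wealth criterion by a family of power-mean utilities, for which the Rényi divergence and the new conditional Rényi divergence play the corresponding roles.
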
24 pages, 1604 KiB  
Article
On a Generalization of the Jensen–Shannon Divergence and the Jensen–Shannon Centroid
by Frank Nielsen
Entropy 2020, 22(2), 221; https://doi.org/10.3390/e22020221 - 16 Feb 2020
Cited by 58 | Viewed by 10678
Abstract
The Jensen–Shannon divergence is a renowned bounded symmetrization of the Kullback–Leibler divergence which does not require probability densities to have matching supports. In this paper, we introduce a vector-skew generalization of the scalar α-Jensen–Bregman divergences and derive thereof the vector-skew α-Jensen–Shannon divergences. We prove that the vector-skew α-Jensen–Shannon divergences are f-divergences and study the properties of these novel divergences. Finally, we report an iterative algorithm to numerically compute the Jensen–Shannon-type centroids for a set of probability densities belonging to a mixture family: This includes the case of the Jensen–Shannon centroid of a set of categorical distributions or normalized histograms. Full article
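A minimal numerical sketch (illustrative only; the vector-skew construction in the paper is more general) of the classical Jensen–Shannon divergence and a simple scalar-skew variant for categorical distributions, with hypothetical inputs:

import numpy as np

def kl(p, q):
    # Relative entropy D(p || q) in nats for categorical distributions.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def skew_js(p, q, alpha=0.5):
    # Scalar-skew Jensen-Shannon-type divergence with mixture
    # m = (1 - alpha) p + alpha q; alpha = 0.5 gives the classical JSD.
    m = (1.0 - alpha) * p + alpha * q
    return (1.0 - alpha) * kl(p, m) + alpha * kl(q, m)

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.1, 0.8])
print(skew_js(p, q))        # classical Jensen-Shannon divergence
print(skew_js(p, q, 0.25))  # a skewed variant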