Search Results (12)

Search Parameters:
Keywords = Jensen–Bregman divergence

8 pages, 296 KiB  
Communication
Equivalence of Informations Characterizes Bregman Divergences
by Philip S. Chodrow
Entropy 2025, 27(7), 766; https://doi.org/10.3390/e27070766 - 19 Jul 2025
Viewed by 240
Abstract
Bregman divergences form a class of distance-like comparison functions which plays fundamental roles in optimization, statistics, and information theory. One important property of Bregman divergences is that they generate agreement between two useful formulations of information content (in the sense of variability or non-uniformity) in weighted collections of vectors. The first of these is the Jensen gap information, which measures the difference between the mean value of a strictly convex function evaluated on a weighted set of vectors and the value of that function evaluated at the centroid of that collection. The second of these is the divergence information, which measures the mean divergence of the vectors in the collection from their centroid. In this brief note, we prove that the agreement between Jensen gap and divergence informations in fact characterizes the class of Bregman divergences; they are the only divergences that generate this agreement for arbitrary weighted sets of data vectors. Full article
(This article belongs to the Section Information Theory, Probability and Statistics)
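
The identity behind this abstract can be written out in a few lines; a sketch in standard Bregman-divergence notation (the symbols below are not taken from the paper itself):

```latex
% Bregman divergence generated by a strictly convex, differentiable function \phi:
D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y),\, x - y \rangle .

% For weights w_i \ge 0 with \sum_i w_i = 1 and centroid \bar{x} = \sum_i w_i x_i,
% the divergence information equals the Jensen gap information because the
% inner-product terms cancel at the centroid:
\sum_i w_i\, D_\phi(x_i, \bar{x})
  = \sum_i w_i\, \phi(x_i) - \phi(\bar{x})
    - \Big\langle \nabla\phi(\bar{x}),\, \sum_i w_i (x_i - \bar{x}) \Big\rangle
  = \sum_i w_i\, \phi(x_i) - \phi(\bar{x}),
\qquad \text{since } \sum_i w_i (x_i - \bar{x}) = 0 .
```

The paper's contribution is the converse direction: only Bregman divergences produce this agreement for arbitrary weighted collections of vectors.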

16 pages, 656 KiB  
Article
Divergences Induced by the Cumulant and Partition Functions of Exponential Families and Their Deformations Induced by Comparative Convexity
by Frank Nielsen
Entropy 2024, 26(3), 193; https://doi.org/10.3390/e26030193 - 23 Feb 2024
Cited by 1 | Viewed by 2049
Abstract
Exponential families are statistical models that serve as workhorses in statistics, information theory, and machine learning, among other fields. An exponential family can be normalized either subtractively by its cumulant (or free energy) function or, equivalently, divisively by its partition function. Both the cumulant and the partition function are strictly convex and smooth, and each induces a corresponding pair of Bregman and Jensen divergences. It is well known that skewed Bhattacharyya distances between the probability densities of an exponential family amount to skewed Jensen divergences induced by the cumulant function between their corresponding natural parameters, and that in the limit cases the sided Kullback–Leibler divergences amount to reverse-sided Bregman divergences. In this work, we first show that the α-divergences between non-normalized densities of an exponential family amount to scaled α-skewed Jensen divergences induced by the partition function. We then show how comparative convexity with respect to a pair of quasi-arithmetical means allows both convex functions and their arguments to be deformed, thereby defining dually flat spaces with corresponding divergences when ordinary convexity is preserved. Full article
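
As a compact reminder of the classical identities this abstract starts from (standard exponential-family notation, not quoted from the paper):

```latex
% Exponential family with natural parameter \theta, cumulant F and partition Z = e^{F}:
p_\theta(x) = \exp\!\big( \langle \theta, t(x) \rangle - F(\theta) \big)\, h(x),
\qquad F(\theta) = \log Z(\theta).

% Skewed Bhattacharyya distance = skewed Jensen divergence of the cumulant:
-\log \int p_{\theta_1}^{\alpha}\, p_{\theta_2}^{1-\alpha} \,\mathrm{d}\mu
  = \alpha F(\theta_1) + (1-\alpha) F(\theta_2) - F\big(\alpha\theta_1 + (1-\alpha)\theta_2\big)
  =: J_{F,\alpha}(\theta_1 : \theta_2).

% Sided Kullback-Leibler divergence = reverse-sided Bregman divergence of the cumulant:
\mathrm{KL}(p_{\theta_1} \,\|\, p_{\theta_2}) = B_F(\theta_2 : \theta_1)
  = F(\theta_2) - F(\theta_1) - \langle \theta_2 - \theta_1,\, \nabla F(\theta_1) \rangle .
```

The paper's first result concerns the analogous α-skewed Jensen divergences induced by the partition function Z itself rather than by its logarithm F.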

35 pages, 988 KiB  
Article
Revisiting Chernoff Information with Likelihood Ratio Exponential Families
by Frank Nielsen
Entropy 2022, 24(10), 1400; https://doi.org/10.3390/e24101400 - 1 Oct 2022
Cited by 13 | Viewed by 5991
Abstract
The Chernoff information between two probability measures is a statistical divergence measuring their deviation, defined as their maximally skewed Bhattacharyya distance. Although the Chernoff information was originally introduced for bounding the Bayes error in statistical hypothesis testing, the divergence has found many other uses owing to its empirical robustness, in applications ranging from information fusion to quantum information. From the viewpoint of information theory, the Chernoff information can also be interpreted as a minmax symmetrization of the Kullback–Leibler divergence. In this paper, we first revisit the Chernoff information between two densities of a measurable Lebesgue space by considering the exponential families induced by their geometric mixtures: the so-called likelihood ratio exponential families. Second, we show how to (i) solve exactly for the Chernoff information between any two univariate Gaussian distributions, or obtain a closed-form formula using symbolic computing, (ii) report a closed-form formula for the Chernoff information of centered Gaussians with scaled covariance matrices, and (iii) use a fast numerical scheme to approximate the Chernoff information between any two multivariate Gaussian distributions. Full article
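
For reference, the defining optimization and the minmax/Kullback-Leibler interpretation mentioned in the abstract can be stated as follows (a standard formulation; the geometric-mixture notation below is not taken from the paper):

```latex
% Chernoff information as the maximally skewed Bhattacharyya distance:
C(p, q) = \max_{\alpha \in (0,1)} \Big( -\log \int p^{\alpha}(x)\, q^{1-\alpha}(x)\, \mathrm{d}\mu(x) \Big).

% At the optimal skewing parameter \alpha^{*}, the normalized geometric mixture
% m_{\alpha}(x) \propto p^{\alpha}(x)\, q^{1-\alpha}(x) sits at equal Kullback-Leibler
% distance from both densities, and this common value is the Chernoff information:
C(p, q) = \mathrm{KL}(m_{\alpha^{*}} \,\|\, p) = \mathrm{KL}(m_{\alpha^{*}} \,\|\, q).
```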

21 pages, 1068 KiB  
Article
Statistical Divergences between Densities of Truncated Exponential Families with Nested Supports: Duo Bregman and Duo Jensen Divergences
by Frank Nielsen
Entropy 2022, 24(3), 421; https://doi.org/10.3390/e24030421 - 17 Mar 2022
Cited by 11 | Viewed by 6312
Abstract
By calculating the Kullback–Leibler divergence between two probability measures belonging to different exponential families dominated by the same measure, we obtain a formula that generalizes the ordinary Fenchel–Young divergence. Inspired by this formula, we define the duo Fenchel–Young divergence and report a majorization condition on its pair of strictly convex generators, which guarantees that this divergence is always non-negative. The duo Fenchel–Young divergence is also equivalent to a duo Bregman divergence. We show how to use these duo divergences by calculating the Kullback–Leibler divergence between densities of truncated exponential families with nested supports, and report a formula for the Kullback–Leibler divergence between truncated normal distributions. Finally, we prove that the skewed Bhattacharyya distances between truncated exponential families amount to equivalent skewed duo Jensen divergences. Full article
(This article belongs to the Special Issue Information and Divergence Measures)
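
A sketch of the two central objects named in the abstract, written in notation of the editor's choosing and to be checked against the paper itself:

```latex
% Duo Fenchel-Young divergence for a pair of strictly convex generators F_1, F_2:
Y_{F_1, F_2}(\theta, \eta) = F_1(\theta) + F_2^{*}(\eta) - \langle \theta, \eta \rangle .

% By the Fenchel-Young inequality, F_2^{*}(\eta) \ge \langle \theta, \eta \rangle - F_2(\theta),
% so the majorization condition F_1 \ge F_2 guarantees Y_{F_1, F_2} \ge 0.

% Choosing \eta = \nabla F_2(\theta') gives the equivalent duo Bregman divergence:
B_{F_1, F_2}(\theta : \theta') = F_1(\theta) - F_2(\theta')
  - \langle \theta - \theta',\, \nabla F_2(\theta') \rangle .
```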

28 pages, 1106 KiB  
Article
On a Variational Definition for the Jensen-Shannon Symmetrization of Distances Based on the Information Radius
by Frank Nielsen
Entropy 2021, 23(4), 464; https://doi.org/10.3390/e23040464 - 14 Apr 2021
Cited by 29 | Viewed by 10550
Abstract
We generalize the Jensen-Shannon divergence and the Jensen-Shannon diversity index by considering a variational definition with respect to a generic mean, thereby extending the notion of Sibson's information radius. The variational definition applies to an arbitrary distance and yields a new way to define a Jensen-Shannon symmetrization of distances. When the variational optimization is further constrained to prescribed families of probability measures, we obtain relative Jensen-Shannon divergences and their equivalent Jensen-Shannon symmetrizations of distances, which generalize the concept of information projections. Finally, we touch upon applications of these variational Jensen-Shannon divergences and diversity indices to clustering and quantization tasks for probability measures, including statistical mixtures. Full article
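
The starting point that the abstract generalizes can be written compactly; a sketch in notation not taken from the paper:

```latex
% Sibson's information radius (Jensen-Shannon diversity index) for weights w_i:
R(p_1, \dots, p_n) = \min_{q} \sum_i w_i\, \mathrm{KL}(p_i \,\|\, q)
                   = \sum_i w_i\, \mathrm{KL}\Big( p_i \,\Big\|\, \sum_j w_j\, p_j \Big),
% the minimizer being the arithmetic mixture \bar{p} = \sum_j w_j p_j.

% The variational Jensen-Shannon symmetrization of an arbitrary distance D replaces KL by D:
\mathrm{JS}_{D}(p, q) = \min_{c} \tfrac{1}{2}\big( D(p, c) + D(q, c) \big).
```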

34 pages, 1942 KiB  
Article
On Voronoi Diagrams on the Information-Geometric Cauchy Manifolds
by Frank Nielsen
Entropy 2020, 22(7), 713; https://doi.org/10.3390/e22070713 - 28 Jun 2020
Cited by 13 | Viewed by 6435
Abstract
We study the Voronoi diagrams of a finite set of Cauchy distributions and their dual complexes from the viewpoint of information geometry by considering the Fisher-Rao distance, the Kullback-Leibler divergence, the chi-square divergence, and a flat divergence derived from Tsallis entropy related to the conformal flattening of the Fisher-Rao geometry. We prove that the Voronoi diagrams of the Fisher-Rao distance, the chi-square divergence, and the Kullback-Leibler divergence all coincide with a hyperbolic Voronoi diagram on the corresponding Cauchy location-scale parameters, and that the dual Cauchy hyperbolic Delaunay complexes are Fisher orthogonal to the Cauchy hyperbolic Voronoi diagrams. The dual Voronoi diagrams with respect to the dual flat divergences amount to dual Bregman Voronoi diagrams, and their dual complexes are regular triangulations. The primal Bregman Voronoi diagram is the Euclidean Voronoi diagram, and the dual Bregman Voronoi diagram coincides with the Cauchy hyperbolic Voronoi diagram. In addition, we prove that the square root of the Kullback-Leibler divergence between Cauchy distributions yields a metric distance which is Hilbertian for the Cauchy scale families. Full article
(This article belongs to the Special Issue Information Geometry III)
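
For context, the Cauchy location-scale density underlying these diagrams, together with the closed-form Kullback-Leibler divergence reported in related work on Cauchy distributions (quoted here from memory and worth double-checking against the paper), are:

```latex
% Cauchy location-scale density with location l and scale s > 0:
p_{l, s}(x) = \frac{s}{\pi\,\big( s^2 + (x - l)^2 \big)} .

% Closed-form Kullback-Leibler divergence between two Cauchy densities
% (notably symmetric in its arguments):
\mathrm{KL}(p_{l_1, s_1} \,\|\, p_{l_2, s_2})
  = \log \frac{(s_1 + s_2)^2 + (l_1 - l_2)^2}{4\, s_1 s_2} .
```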

24 pages, 1604 KiB  
Article
On a Generalization of the Jensen–Shannon Divergence and the Jensen–Shannon Centroid
by Frank Nielsen
Entropy 2020, 22(2), 221; https://doi.org/10.3390/e22020221 - 16 Feb 2020
Cited by 103 | Viewed by 15193
Abstract
The Jensen–Shannon divergence is a renowned bounded symmetrization of the Kullback–Leibler divergence which does not require probability densities to have matching supports. In this paper, we introduce a vector-skew generalization of the scalar α-Jensen–Bregman divergences and from it derive the vector-skew α-Jensen–Shannon divergences. We prove that the vector-skew α-Jensen–Shannon divergences are f-divergences and study the properties of these novel divergences. Finally, we report an iterative algorithm to numerically compute the Jensen–Shannon-type centroids for a set of probability densities belonging to a mixture family; this includes the case of the Jensen–Shannon centroid of a set of categorical distributions or normalized histograms. Full article
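
A sketch of the vector-skew construction as the editor understands it from this line of work (the exact weight and skew conventions should be checked against the paper):

```latex
% Vector-skew Jensen-Shannon divergence for a skew vector (\alpha_1, \dots, \alpha_k) in [0,1]^k
% and weights w_i \ge 0 summing to one, with \bar{\alpha} = \sum_i w_i \alpha_i:
\mathrm{JS}^{\alpha, w}(p : q)
  = \sum_{i=1}^{k} w_i\, \mathrm{KL}\big( (1 - \alpha_i)\, p + \alpha_i\, q
        \,\big\|\, (1 - \bar{\alpha})\, p + \bar{\alpha}\, q \big).

% Choosing k = 2, \alpha = (0, 1) and w = (1/2, 1/2) recovers the ordinary
% Jensen-Shannon divergence \tfrac12 \mathrm{KL}(p \,\|\, \tfrac{p+q}{2}) + \tfrac12 \mathrm{KL}(q \,\|\, \tfrac{p+q}{2}).
```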

23 pages, 417 KiB  
Article
On the Jensen–Shannon Symmetrization of Distances Relying on Abstract Means
by Frank Nielsen
Entropy 2019, 21(5), 485; https://doi.org/10.3390/e21050485 - 11 May 2019
Cited by 153 | Viewed by 17591
Abstract
The Jensen–Shannon divergence is a renowned bounded symmetrization of the unbounded Kullback–Leibler divergence which measures the total Kullback–Leibler divergence to the average mixture distribution. However, the Jensen–Shannon divergence between Gaussian distributions is not available in closed form. To bypass this problem, we present a generalization of the Jensen–Shannon (JS) divergence using abstract means which yields closed-form expressions when the mean is chosen according to the parametric family of distributions. More generally, we define the JS-symmetrizations of any distance using parameter mixtures derived from abstract means. In particular, we first show that the geometric mean is well-suited for exponential families, and report two closed-form formulas for (i) the geometric Jensen–Shannon divergence between probability densities of the same exponential family; and (ii) the geometric JS-symmetrization of the reverse Kullback–Leibler divergence between probability densities of the same exponential family. As a second illustrative example, we show that the harmonic mean is well-suited for the scale Cauchy distributions, and report a closed-form formula for the harmonic Jensen–Shannon divergence between scale Cauchy distributions. Applications to clustering with respect to these novel Jensen–Shannon divergences are touched upon. Full article
(This article belongs to the Section Information Theory, Probability and Statistics)
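
One way to see why the geometric mean yields closed forms for exponential families, sketched in standard notation rather than quoted from the paper:

```latex
% The normalized geometric mixture of two members of the same exponential family
% with cumulant F stays in the family, at the midpoint of the natural parameters:
G_{1/2}(p_{\theta_1}, p_{\theta_2}) = p_{\bar{\theta}},
\qquad \bar{\theta} = \tfrac{1}{2}(\theta_1 + \theta_2).

% Hence the geometric Jensen-Shannon divergence reduces to Bregman divergences of F:
\mathrm{JS}^{G}(p_{\theta_1} : p_{\theta_2})
  = \tfrac{1}{2}\,\mathrm{KL}(p_{\theta_1} \,\|\, p_{\bar{\theta}})
  + \tfrac{1}{2}\,\mathrm{KL}(p_{\theta_2} \,\|\, p_{\bar{\theta}})
  = \tfrac{1}{2}\, B_F(\bar{\theta} : \theta_1) + \tfrac{1}{2}\, B_F(\bar{\theta} : \theta_2).
```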

16 pages, 274 KiB  
Article
Information Geometric Approach on Most Informative Boolean Function Conjecture
by Albert No
Entropy 2018, 20(9), 688; https://doi.org/10.3390/e20090688 - 10 Sep 2018
Cited by 2 | Viewed by 3401
Abstract
Let $X^n$ be a memoryless uniform Bernoulli source and $Y^n$ its output through a binary symmetric channel. Courtade and Kumar conjectured that the Boolean function $f \colon \{0,1\}^n \to \{0,1\}$ that maximizes the mutual information $I(f(X^n); Y^n)$ is a dictator function, i.e., $f(x^n) = x_i$ for some $i$. We propose a clustering problem that is equivalent to the above problem and emphasize the information-geometric aspects of this equivalent formulation. Moreover, we define a normalized geometric mean of measures and derive interesting properties of it. We also show that the conjecture is true when the arithmetic and geometric means coincide on a specific set of measures. Full article
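
To make the objective concrete, here is a small brute-force evaluation of $I(f(X^n); Y^n)$ in Python (the helper names and parameter values are illustrative and not from the paper):

```python
import itertools
import math

def mutual_information_f(f, n, p):
    """I(f(X^n); Y^n) for X^n uniform on {0,1}^n and Y^n the output of a
    memoryless binary symmetric channel with crossover probability p."""
    joint = {}  # (f(x), y) -> probability
    for x in itertools.product((0, 1), repeat=n):
        for y in itertools.product((0, 1), repeat=n):
            flips = sum(xi != yi for xi, yi in zip(x, y))
            pr = (0.5 ** n) * (p ** flips) * ((1 - p) ** (n - flips))
            key = (f(x), y)
            joint[key] = joint.get(key, 0.0) + pr
    pf, py = {}, {}
    for (fx, y), pr in joint.items():
        pf[fx] = pf.get(fx, 0.0) + pr
        py[y] = py.get(y, 0.0) + pr
    return sum(pr * math.log2(pr / (pf[fx] * py[y]))
               for (fx, y), pr in joint.items() if pr > 0)

n, p = 3, 0.1
dictator = lambda x: x[0]                  # f(x^n) = x_1
majority = lambda x: int(sum(x) >= 2)
print(mutual_information_f(dictator, n, p))  # 1 - h(0.1) ~ 0.531
print(mutual_information_f(majority, n, p))  # expected to be smaller, per the conjecture
```
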
15 pages, 1133 KiB  
Article
Information Geometry for Radar Target Detection with Total Jensen–Bregman Divergence
by Xiaoqiang Hua, Haiyan Fan, Yongqiang Cheng, Hongqiang Wang and Yuliang Qin
Entropy 2018, 20(4), 256; https://doi.org/10.3390/e20040256 - 6 Apr 2018
Cited by 25 | Viewed by 3665
Abstract
This paper proposes a radar target detection algorithm based on information geometry. In particular, the correlation of the sample data is modeled as a Hermitian positive-definite (HPD) matrix. Moreover, a class of total Jensen–Bregman divergences, including the total Jensen square loss, the total Jensen log-determinant divergence, and the total Jensen von Neumann divergence, is proposed as the distance-like function on the space of HPD matrices. On the basis of these divergences, definitions of their corresponding median matrices are given. Finally, a target detection decision rule is formed by comparing the total Jensen–Bregman divergence between the median of the reference cells and the matrix of the cell under test against a given threshold. Performance analysis on both simulated and real radar data confirms the superiority of the proposed detection method over conventional and existing counterparts. Full article
(This article belongs to the Section Information Theory, Probability and Statistics)
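
The decision rule in the final step can be sketched in a few lines of Python. This is a loose illustration only: it substitutes the plain Jensen-Bregman log-det (JBLD) divergence for the paper's total Jensen-Bregman divergences and an arithmetic mean for the divergence-based median matrix, and all names and values below are illustrative:

```python
import numpy as np

def jbld(A, B):
    """Jensen-Bregman log-det divergence between positive-definite matrices:
    log det((A + B) / 2) - (1/2) log det(A B)."""
    _, ld_mid = np.linalg.slogdet((A + B) / 2.0)
    _, ld_a = np.linalg.slogdet(A)
    _, ld_b = np.linalg.slogdet(B)
    return ld_mid - 0.5 * (ld_a + ld_b)

def detect(reference_cells, cell_under_test, threshold):
    """Toy detector in the spirit of the abstract's decision rule: declare a
    target when the divergence from a central matrix of the reference cells to
    the cell under test exceeds a threshold. The arithmetic mean stands in for
    the divergence-based median matrix used in the paper."""
    center = sum(reference_cells) / len(reference_cells)
    return jbld(center, cell_under_test) > threshold

# Example with random SPD matrices as stand-ins for clutter covariance estimates.
rng = np.random.default_rng(0)
def random_spd(d):
    M = rng.normal(size=(d, d))
    return M @ M.T + d * np.eye(d)

refs = [random_spd(4) for _ in range(8)]
print(detect(refs, random_spd(4), threshold=1.0))
```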

70 pages, 732 KiB  
Article
Paradigms of Cognition
by Flemming Topsøe
Entropy 2017, 19(4), 143; https://doi.org/10.3390/e19040143 - 27 Mar 2017
Cited by 1 | Viewed by 4900
Abstract
An abstract, quantitative theory which connects elements of information, key ingredients in the cognitive process, is developed. Seemingly unrelated results are thereby unified. As an indication of this, consider results in classical probabilistic information theory involving information projections and so-called Pythagorean inequalities. These have a certain resemblance to classical results in geometry bearing Pythagoras' name. By appealing to the abstract theory presented here, one obtains a common point of reference for these results. In fact, the new theory provides a general framework for the treatment of a multitude of global optimization problems across a range of disciplines such as geometry, statistics, and statistical physics. Several applications are given; among them, an “explanation” of Tsallis entropy is suggested. For this, as well as for the general development of the underlying abstract theory, emphasis is placed on interpretations and associated philosophical considerations. Technically, game theory is the key tool. Full article
(This article belongs to the Special Issue Selected Papers from MaxEnt 2016)

47 pages, 759 KiB  
Review
Log-Determinant Divergences Revisited: Alpha-Beta and Gamma Log-Det Divergences
by Andrzej Cichocki, Sergio Cruces and Shun-ichi Amari
Entropy 2015, 17(5), 2988-3034; https://doi.org/10.3390/e17052988 - 8 May 2015
Cited by 50 | Viewed by 9236
Abstract
This work reviews and extends a family of log-determinant (log-det) divergences for symmetric positive definite (SPD) matrices and discusses their fundamental properties. We show how to use parameterized Alpha-Beta (AB) and Gamma log-det divergences to generate many well-known divergences; in particular, we consider Stein's loss, the S-divergence (also called the Jensen-Bregman LogDet (JBLD) divergence), the Logdet Zero (Bhattacharyya) divergence, the Affine Invariant Riemannian Metric (AIRM), and other divergences. Moreover, we establish links and correspondences between log-det divergences and visualise them on an alpha-beta plane for various sets of parameters. We use this unifying framework to interpret and extend existing similarity measures for semidefinite covariance matrices in finite-dimensional Reproducing Kernel Hilbert Spaces (RKHS). This paper also shows how the Alpha-Beta family of log-det divergences relates to the divergences of multivariate and multilinear normal distributions. Closed-form formulas are derived for Gamma divergences of two multivariate Gaussian densities; the special cases of the Kullback-Leibler, Bhattacharyya, Rényi, and Cauchy-Schwarz divergences are discussed. Symmetrized versions of log-det divergences are also considered and briefly reviewed. Finally, a class of divergences is extended to multiway divergences for separable covariance (or precision) matrices. Full article
(This article belongs to the Section Information Theory, Probability and Statistics)
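
For reference, three of the special cases named in the abstract have the following standard forms for n × n symmetric positive definite matrices A and B (standard textbook definitions, not quoted from the paper):

```latex
% Stein's loss:
D_{\mathrm{Stein}}(A, B) = \mathrm{tr}(A B^{-1}) - \log\det(A B^{-1}) - n .

% S-divergence, also known as the Jensen-Bregman LogDet (JBLD) divergence:
D_{S}(A, B) = \log\det\!\Big( \frac{A + B}{2} \Big) - \frac{1}{2}\, \log\det(A B) .

% Affine Invariant Riemannian Metric (AIRM):
d_{\mathrm{AIRM}}(A, B) = \big\| \log\!\big( A^{-1/2} B A^{-1/2} \big) \big\|_{F} .
```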