Search Results (53)

Search Parameters:
Keywords = Bregman divergence

19 pages, 299 KB  
Article
Uniqueness of the Canonical Reciprocal Cost
by Jonathan Washburn and Milan Zlatanović
Mathematics 2026, 14(6), 935; https://doi.org/10.3390/math14060935 - 10 Mar 2026
Cited by 1 | Viewed by 330
Abstract
We study a rigidity problem for functions F: ℝ_{>0} → ℝ_{≥0} that penalize deviation of a positive ratio from equilibrium x = 1. Assuming (i) a d’Alembert-type composition law on ℝ_{>0}, and (ii) a single quadratic calibration at the identity (in logarithmic coordinates), we prove that F is uniquely determined. The composition law implies the normalization F(1) = 0. The unique solution is called the canonical reciprocal cost, namely the difference between the arithmetic and geometric means of x and its reciprocal. Our proof uses the logarithmic coordinates H(t) = F(e^t) + 1, where the composition law becomes d’Alembert’s functional equation on ℝ. The calibration provides the minimal regularity needed to invoke the classical classification of continuous solutions and fixes the remaining scaling freedom, selecting the hyperbolic-cosine branch. We also establish the necessity of each assumption: without calibration the composition law admits a continuous one-parameter family; without the composition law the calibration does not determine the global form; and without regularity the composition law admits pathological non-measurable solutions. Finally, we establish a stability estimate for approximate solutions under bounded defect and characterize some properties of the canonical cost.
(This article belongs to the Section C: Mathematical Analysis)
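As a quick illustration, here is a minimal Python check, based only on the explicit forms stated in the abstract, that the canonical reciprocal cost satisfies the normalization, d’Alembert’s composition law in logarithmic coordinates, and the quadratic calibration at the identity:

```python
import math

def F(x: float) -> float:
    """Canonical reciprocal cost: arithmetic minus geometric mean of x and 1/x."""
    return (x + 1.0 / x) / 2.0 - math.sqrt(x * (1.0 / x))  # the geometric mean is always 1

def H(t: float) -> float:
    """Logarithmic coordinates: H(t) = F(e^t) + 1 = cosh(t)."""
    return F(math.exp(t)) + 1.0

# Normalization at equilibrium: F(1) = 0.
assert abs(F(1.0)) < 1e-12

# d'Alembert's functional equation: H(s+t) + H(s-t) = 2 H(s) H(t).
for s, t in [(0.3, 0.7), (1.2, -0.5), (2.0, 2.0)]:
    assert abs(H(s + t) + H(s - t) - 2.0 * H(s) * H(t)) < 1e-9

# Quadratic calibration at the identity: H(t) ~ 1 + t^2/2 near t = 0.
assert abs(H(1e-4) - (1.0 + 0.5e-8)) < 1e-12
```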
36 pages, 952 KB  
Article
On Minimum Bregman Divergence Inference
by Soumik Purkayastha and Ayanendranath Basu
Mathematics 2026, 14(4), 670; https://doi.org/10.3390/math14040670 - 13 Feb 2026
Viewed by 338
Abstract
The density power divergence (DPD) is a well-studied member of the Bregman divergence family and forms the basis of widely used minimum divergence estimators that balance efficiency and robustness. In this paper, we introduce and study a new sub-class of Bregman divergences, termed the exponentially weighted divergence (EWD), designed to generate competitive and practically interpretable inference procedures. The EWD is constructed so that its associated weight function remains bounded within the interval [0, 1], which facilitates a transparent interpretation of robustness through controlled downweighting of low-density observations and avoids excessive influence from high-density points. We develop minimum EWD estimators (MEWDEs) within a general framework accommodating independent but non-homogeneous data, thereby extending classical minimum divergence theory beyond the i.i.d. setting. Under standard regularity conditions, we establish Fisher consistency and asymptotic normality, and we analyze robustness properties through influence function calculations. The EWD framework is further extended to parametric hypothesis testing, for which we derive the asymptotic null distribution of a Bregman divergence-based test statistic. Extensive simulation studies and real-data applications demonstrate that the proposed estimators perform comparably to, and often more robustly than, existing DPD-based procedures, particularly under moderate to heavy contamination, while retaining high efficiency under clean data. Overall, the EWD provides a tractable and interpretable alternative within the Bregman divergence class for robust parametric estimation and testing.
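The EWD itself is specific to the paper, but the DPD baseline it is benchmarked against is standard. Below is a minimal sketch of minimum DPD estimation for a contaminated normal location model, using the textbook DPD objective (the integral term is exact for the normal family); the grid search and all parameter choices are illustrative, not the authors' implementation:

```python
import numpy as np

def dpd_objective(mu: float, x: np.ndarray, alpha: float, sigma: float = 1.0) -> float:
    """Density power divergence objective for a normal location model.

    For N(mu, sigma^2): integral of f^(1+alpha) = (2*pi*sigma^2)^(-alpha/2) / sqrt(1+alpha),
    independent of mu; the data term downweights observations by f^alpha.
    """
    f = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    integral = (2 * np.pi * sigma**2) ** (-alpha / 2) / np.sqrt(1 + alpha)
    return integral - (1 + alpha) / alpha * np.mean(f**alpha)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 95), np.full(5, 10.0)])  # 5% gross outliers

grid = np.linspace(-2, 12, 2001)
for alpha in (0.1, 0.5, 1.0):
    mdpde = grid[np.argmin([dpd_objective(m, x, alpha) for m in grid])]
    print(f"alpha={alpha}: MDPDE of mu = {mdpde:.3f}")
print(f"sample mean (non-robust alpha -> 0 limit) = {x.mean():.3f}")
```

Larger alpha downweights the outliers more aggressively, which is exactly the efficiency/robustness trade-off the EWD weight function is designed to control.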
12 pages, 570 KB  
Article
Generalized Legendre Transforms Have Roots in Information Geometry
by Frank Nielsen
Entropy 2026, 28(1), 44; https://doi.org/10.3390/e28010044 - 30 Dec 2025
Viewed by 796
Abstract
Artstein-Avidan and Milman [Annals of Mathematics (2009), 169:661–674] characterized invertible reverse-ordering transforms in the space of lower semi-continuous, extended real-valued convex functions as affine deformations of the ordinary Legendre transform. In this work, we first prove that all those generalized Legendre transforms of functions correspond to the ordinary Legendre transform of dually corresponding affine-deformed functions. In short, generalized convex conjugates are ordinary convex conjugates of dually affine-deformed functions. Second, we explain how these generalized Legendre transforms can be derived from the dual Hessian structures of information geometry.
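For orientation, here is a small numerical sketch of the ordinary Legendre transform that these generalized transforms deform affinely; the discrete sup-over-a-grid approximation is our own illustrative device:

```python
import numpy as np

def conjugate(f_vals: np.ndarray, xs: np.ndarray, ys: np.ndarray) -> np.ndarray:
    """Discrete Legendre transform f*(y) = sup_x [ x*y - f(x) ] over a grid of x."""
    return np.max(ys[:, None] * xs[None, :] - f_vals[None, :], axis=1)

xs = np.linspace(-5, 5, 2001)

f = 0.5 * xs**2                        # f(x) = x^2/2 is self-conjugate
ys = np.linspace(-3, 3, 7)
assert np.allclose(conjugate(f, xs, ys), 0.5 * ys**2, atol=1e-3)

g = np.exp(xs)                         # conjugate of exp(x) is y*log(y) - y for y > 0
ys_pos = np.array([0.5, 1.0, 2.0])
assert np.allclose(conjugate(g, xs, ys_pos), ys_pos * np.log(ys_pos) - ys_pos, atol=1e-3)
```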
25 pages, 17533 KB  
Article
Mirror Descent and Exponentiated Gradient Algorithms Using Trace-Form Entropies
by Andrzej Cichocki, Toshihisa Tanaka, Frank Nielsen and Sergio Cruces
Entropy 2025, 27(12), 1243; https://doi.org/10.3390/e27121243 - 8 Dec 2025
Cited by 1 | Viewed by 1616
Abstract
This paper introduces a broad class of Mirror Descent (MD) and Generalized Exponentiated Gradient (GEG) algorithms derived from trace-form entropies defined via deformed logarithms. Leveraging these generalized entropies yields MD and GEG algorithms with improved convergence behavior, robustness against vanishing and exploding gradients, and inherent adaptability to non-Euclidean geometries through mirror maps. We establish deep connections between these methods and Amari’s natural gradient, revealing a unified geometric foundation for additive, multiplicative, and natural gradient updates. Focusing on the Tsallis, Kaniadakis, Sharma–Taneja–Mittal, and Kaniadakis–Lissia–Scarfone entropy families, we show that each entropy induces a distinct Riemannian metric on the parameter space, leading to GEG algorithms that preserve the natural statistical geometry. The tunable parameters of deformed logarithms enable adaptive geometric selection, providing enhanced robustness and convergence over classical Euclidean optimization. Overall, our framework unifies key first-order MD optimization methods under a single information-geometric perspective based on generalized Bregman divergences, where the choice of entropy determines the underlying metric and dual geometric structure.
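The following sketch shows the classical exponentiated gradient update (mirror descent with the negative Shannon entropy as mirror map, i.e., the KL Bregman geometry), the special case that the paper's GEG algorithms generalize by swapping exp for a deformed exponential; the toy objective and step size are illustrative:

```python
import numpy as np

def eg_step(w: np.ndarray, grad: np.ndarray, eta: float) -> np.ndarray:
    """Classical exponentiated gradient: mirror descent on the probability simplex
    with mirror map = negative Shannon entropy (induced Bregman divergence = KL)."""
    w_new = w * np.exp(-eta * grad)
    return w_new / w_new.sum()          # renormalize back onto the simplex

# Minimize f(w) = ||w - target||^2 over the probability simplex.
target = np.array([0.7, 0.2, 0.1])
w = np.full(3, 1.0 / 3.0)
for _ in range(500):
    w = eg_step(w, 2.0 * (w - target), eta=0.5)
print(w)   # approaches the target while staying on the simplex throughout
```

The paper's GEG updates replace exp above with deformed exponentials (Tsallis, Kaniadakis, and related families), which changes the induced metric and hence the geometry of the update.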
22 pages, 370 KB  
Article
Tight Bounds Between the Jensen–Shannon Divergence and the Minmax Divergence
by Arseniy Akopyan, Herbert Edelsbrunner, Žiga Virk and Hubert Wagner
Entropy 2025, 27(8), 854; https://doi.org/10.3390/e27080854 - 11 Aug 2025
Viewed by 2165
Abstract
Motivated by questions arising at the intersection of information theory and geometry, we compare two dissimilarity measures between finite categorical distributions. One is the well-known Jensen–Shannon divergence, which is easy to compute and whose square root is a proper metric. The other is what we call the minmax divergence, which is harder to compute. Just like the Jensen–Shannon divergence, it arises naturally from the Kullback–Leibler divergence. The main contribution of this paper is a proof showing that the minmax divergence can be tightly approximated by the Jensen–Shannon divergence. The bounds suggest that the square root of the minmax divergence is a metric, and we prove that this is indeed true in the one-dimensional case. The general case remains open. Finally, we consider analogous questions in the context of another Bregman divergence and the corresponding Burbea–Rao (Jensen–Bregman) divergence.
(This article belongs to the Section Information Theory, Probability and Statistics)
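A small sketch of the two quantities being compared, assuming the minmax divergence is the radius of the smallest enclosing "KL ball" (our reading of the abstract; the exact definition and the sidedness of the KL terms are given in the paper):

```python
import numpy as np

def kl(p: np.ndarray, q: np.ndarray) -> float:
    """Kullback-Leibler divergence between finite categorical distributions."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jsd(p: np.ndarray, q: np.ndarray) -> float:
    """Jensen-Shannon divergence: mean KL to the midpoint mixture."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.6, 0.3, 0.1])
q = np.array([0.2, 0.5, 0.3])

# Brute-force minmax quantity: min over centers c of max(KL(p||c), KL(q||c)).
grid = [np.array([a, b, 1 - a - b])
        for a in np.arange(0.01, 0.99, 0.01)
        for b in np.arange(0.01, 0.99 - a, 0.01)]
minmax = min(max(kl(p, c), kl(q, c)) for c in grid)
print(jsd(p, q), minmax)   # max >= mean, so minmax >= JSD; the paper proves tight two-sided bounds
```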
21 pages, 3816 KB  
Article
A K-Means Clustering Algorithm with Total Bregman Divergence for Point Cloud Denoising
by Xiaomin Duan, Anqi Mu, Xinyu Zhao and Yuqi Wu
Symmetry 2025, 17(8), 1186; https://doi.org/10.3390/sym17081186 - 24 Jul 2025
Cited by 1 | Viewed by 1028
Abstract
Point cloud denoising is essential for improving 3D data quality, yet traditional K-means methods relying on Euclidean distance struggle with non-uniform noise. This paper proposes a K-means algorithm leveraging Total Bregman Divergence (TBD) to better model geometric structures on manifolds, enhancing robustness against noise. Specifically, TBDs—Total Logarithm, Exponential, and Inverse Divergences—are defined on symmetric positive-definite matrices, each tailored to capture distinct local geometries. Theoretical analysis demonstrates the bounded sensitivity of TBD-induced means to outliers via influence functions, while anisotropy indices quantify structural variations. Numerical experiments validate the method’s superiority over Euclidean-based approaches, showing effective noise separation and improved stability. This work bridges geometric insights with practical clustering, offering a robust framework for point cloud preprocessing in vision and robotics applications.
(This article belongs to the Section Mathematics)
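The paper defines its TBDs on symmetric positive-definite matrices; as a simpler illustration of the same construction, here is the total Bregman divergence for the squared-Euclidean generator on vectors, together with the assignment step of a TBD K-means iteration (all data and names are illustrative):

```python
import numpy as np

def bregman_sq(x: np.ndarray, y: np.ndarray) -> float:
    """Bregman divergence of f(u) = ||u||^2, i.e., the squared Euclidean distance."""
    return float(np.sum((x - y) ** 2))

def total_bregman_sq(x: np.ndarray, y: np.ndarray) -> float:
    """Total Bregman divergence: the Bregman divergence normalized by
    sqrt(1 + ||grad f(y)||^2); here grad f(y) = 2y. The normalization measures
    an orthogonal rather than vertical gap, which underlies the robustness claims."""
    return bregman_sq(x, y) / np.sqrt(1.0 + np.sum((2.0 * y) ** 2))

centers = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
pts = [np.array([0.2, -0.1]), np.array([2.8, 3.1]), np.array([0.1, 0.3])]
labels = [min(range(2), key=lambda k: total_bregman_sq(p, centers[k])) for p in pts]
print(labels)   # assignment step of one TBD K-means iteration
```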
8 pages, 296 KB  
Communication
Equivalence of Informations Characterizes Bregman Divergences
by Philip S. Chodrow
Entropy 2025, 27(7), 766; https://doi.org/10.3390/e27070766 - 19 Jul 2025
Cited by 1 | Viewed by 1207
Abstract
Bregman divergences form a class of distance-like comparison functions which plays fundamental roles in optimization, statistics, and information theory. One important property of Bregman divergences is that they generate agreement between two useful formulations of information content (in the sense of variability or non-uniformity) in weighted collections of vectors. The first of these is the Jensen gap information, which measures the difference between the mean value of a strictly convex function evaluated on a weighted set of vectors and the value of that function evaluated at the centroid of that collection. The second of these is the divergence information, which measures the mean divergence of the vectors in the collection from their centroid. In this brief note, we prove that the agreement between Jensen gap and divergence informations in fact characterizes the class of Bregman divergences; they are the only divergences that generate this agreement for arbitrary weighted sets of data vectors.
(This article belongs to the Section Information Theory, Probability and Statistics)
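The agreement at the heart of this note is easy to verify numerically. A minimal check that the Jensen gap information equals the divergence information, using the negative Shannon entropy as the strictly convex generator:

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """Bregman divergence B_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

phi = lambda v: float(np.sum(v * np.log(v)))   # negative Shannon entropy
grad_phi = lambda v: np.log(v) + 1.0           # its gradient

xs = [np.array([0.6, 0.4]), np.array([0.2, 0.8]), np.array([0.5, 0.5])]
ws = np.array([0.5, 0.3, 0.2])                 # weights summing to 1
centroid = sum(w * x for w, x in zip(ws, xs))

jensen_gap = sum(w * phi(x) for w, x in zip(ws, xs)) - phi(centroid)
divergence_info = sum(w * bregman(phi, grad_phi, x, centroid) for w, x in zip(ws, xs))
assert np.isclose(jensen_gap, divergence_info)  # the agreement the paper proves is characteristic
```

The equality holds for any Bregman divergence because the first-order terms cancel at the centroid; the paper's contribution is the converse, that only Bregman divergences produce it for arbitrary weighted collections.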
24 pages, 2044 KB  
Article
Bregman–Hausdorff Divergence: Strengthening the Connections Between Computational Geometry and Machine Learning
by Tuyen Pham, Hana Dal Poz Kouřimská and Hubert Wagner
Mach. Learn. Knowl. Extr. 2025, 7(2), 48; https://doi.org/10.3390/make7020048 - 26 May 2025
Cited by 2 | Viewed by 2460
Abstract
The purpose of this paper is twofold. On the technical side, we propose an extension of the Hausdorff distance from metric spaces to spaces equipped with asymmetric distance measures. Specifically, we focus on extending it to the family of Bregman divergences, which includes the popular Kullback–Leibler divergence (also known as relative entropy). The resulting dissimilarity measure is called a Bregman–Hausdorff divergence and compares two collections of vectors—without assuming any pairing or alignment between their elements. We propose new algorithms for computing Bregman–Hausdorff divergences based on a recently developed Kd-tree data structure for nearest neighbor search with respect to Bregman divergences. The algorithms are surprisingly efficient even for large inputs with hundreds of dimensions. As a benchmark, we use the new divergence to compare two collections of probabilistic predictions produced by different machine learning models trained using the relative entropy loss. In addition to introducing this technical concept, we provide a survey that outlines the basics of Bregman geometry and motivates the Kullback–Leibler divergence using concepts from information theory. We also describe computational geometric algorithms that have been extended to this geometry, focusing on algorithms relevant for machine learning.
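A brute-force sketch of one plausible directed variant, assuming a max-min construction over KL (the paper's exact definition and choice of sidedness may differ, and it replaces the inner linear scan with Bregman Kd-tree nearest-neighbor search):

```python
import numpy as np

def kl(p: np.ndarray, q: np.ndarray) -> float:
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def directed_bregman_hausdorff(A, B) -> float:
    """Worst case over A of the best KL match in B; brute force is O(|A|*|B|)."""
    return max(min(kl(a, b) for b in B) for a in A)

A = [np.array([0.7, 0.2, 0.1]), np.array([0.3, 0.3, 0.4])]
B = [np.array([0.6, 0.3, 0.1]), np.array([0.2, 0.4, 0.4]), np.array([0.1, 0.8, 0.1])]
print(directed_bregman_hausdorff(A, B), directed_bregman_hausdorff(B, A))  # asymmetric, like KL itself
```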
24 pages, 534 KB  
Article
Anomaly Detection in High-Dimensional Time Series Data with Scaled Bregman Divergence
by Yunge Wang, Lingling Zhang, Tong Si, Graham Bishop and Haijun Gong
Algorithms 2025, 18(2), 62; https://doi.org/10.3390/a18020062 - 24 Jan 2025
Cited by 8 | Viewed by 3719
Abstract
The purpose of anomaly detection is to identify special data points or patterns that significantly deviate from the expected or typical behavior of the majority of the data, and it has a wide range of applications across various domains. Most existing statistical and machine learning-based anomaly detection algorithms face challenges when applied to high-dimensional data. For instance, the unconstrained least-squares importance fitting (uLSIF) method, a state-of-the-art anomaly detection approach, encounters the unboundedness problem under certain conditions. In this study, we propose a scaled Bregman divergence-based anomaly detection algorithm using both least absolute deviation and least-squares loss for parameter learning. This new algorithm effectively addresses the unboundedness problem, making it particularly suitable for high-dimensional data. The proposed technique was evaluated on both synthetic and real-world high-dimensional time series datasets, demonstrating its effectiveness in detecting anomalies. Its performance was also compared to other density ratio estimation-based anomaly detection methods.
(This article belongs to the Special Issue Machine Learning Models and Algorithms for Image Processing)
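For concreteness, the Stummer–Vajda form of the scaled Bregman divergence for discrete distributions; scaling by the mixture keeps the density ratios bounded, which is the property the unboundedness fix turns on (the paper's estimator learns ratios from data rather than evaluating known densities, so this is background, not the authors' algorithm):

```python
import numpy as np

def scaled_bregman(p, q, m, f, fprime) -> float:
    """Scaled Bregman divergence (Stummer-Vajda form) between discrete
    distributions p and q, scaled by the measure m:
    sum_i m_i * [ f(p_i/m_i) - f(q_i/m_i) - f'(q_i/m_i) * (p_i/m_i - q_i/m_i) ]."""
    u, v = p / m, q / m
    return float(np.sum(m * (f(u) - f(v) - fprime(v) * (u - v))))

p = np.array([0.6, 0.3, 0.1])
q = np.array([0.2, 0.5, 0.3])
m = 0.5 * (p + q)                 # mixture scaling: p/m and q/m stay in [0, 2]

f = lambda t: t * np.log(t)       # generator of KL-type divergences
fprime = lambda t: np.log(t) + 1.0
print(scaled_bregman(p, q, m, f, fprime))
```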
11 pages, 441 KB  
Article
Symplectic Bregman Divergences
by Frank Nielsen
Entropy 2024, 26(12), 1101; https://doi.org/10.3390/e26121101 - 16 Dec 2024
Cited by 1 | Viewed by 1767
Abstract
We present a generalization of Bregman divergences in finite-dimensional symplectic vector spaces that we term symplectic Bregman divergences. Symplectic Bregman divergences are derived from a symplectic generalization of the Fenchel–Young inequality which relies on the notion of symplectic subdifferentials. The symplectic Fenchel–Young inequality is obtained using the symplectic Fenchel transform, which is defined with respect to the symplectic form. Since symplectic forms can be built generically from pairings of dual systems, we obtain a generalization of Bregman divergences to dual systems via equivalent symplectic Bregman divergences. In particular, when the symplectic form is derived from an inner product, we show that the corresponding symplectic Bregman divergences amount to ordinary Bregman divergences with respect to composite inner products. Some potential applications of symplectic divergences in geometric mechanics, information geometry, and learning dynamics in machine learning are touched upon.
(This article belongs to the Special Issue Information Geometry for Data Analysis)
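As background for the symplectic generalization, a minimal Euclidean check that the Fenchel–Young gap reproduces the ordinary Bregman divergence, here for the self-conjugate generator F(u) = ||u||²/2 (the simplest case; the paper replaces the inner product pairing with a symplectic form):

```python
import numpy as np

def fenchel_young_gap(x: np.ndarray, y: np.ndarray) -> float:
    """For F(u) = ||u||^2/2 (self-conjugate, grad F = identity), the Fenchel-Young
    inequality F(x) + F*(y*) >= <x, y*> holds for all x and y*; evaluating the gap
    at y* = grad F(y) yields exactly the Bregman divergence B_F(x, y)."""
    F = lambda u: 0.5 * float(np.dot(u, u))
    y_star = y                      # grad F(y) = y for this generator
    return F(x) + F(y_star) - float(np.dot(x, y_star))   # F* = F here

x = np.array([1.0, -2.0]); y = np.array([0.5, 0.5])
assert np.isclose(fenchel_young_gap(x, y), 0.5 * np.sum((x - y) ** 2))
```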
30 pages, 1927 KB  
Article
Fast Proxy Centers for the Jeffreys Centroid: The Jeffreys–Fisher–Rao Center and the Gauss–Bregman Inductive Center
by Frank Nielsen
Entropy 2024, 26(12), 1008; https://doi.org/10.3390/e26121008 - 22 Nov 2024
Cited by 1 | Viewed by 1561
Abstract
The symmetric Kullback–Leibler centroid, also called the Jeffreys centroid, of a set of mutually absolutely continuous probability distributions on a measure space provides a notion of centrality which has proven useful in many tasks, including information retrieval, information fusion, and clustering. However, the Jeffreys centroid is not available in closed form for sets of categorical or multivariate normal distributions, two widely used statistical models, and thus needs to be approximated numerically in practice. In this paper, we first propose the new Jeffreys–Fisher–Rao center defined as the Fisher–Rao midpoint of the sided Kullback–Leibler centroids as a plug-in replacement of the Jeffreys centroid. This Jeffreys–Fisher–Rao center admits a generic formula for uni-parameter exponential family distributions and a closed-form formula for categorical and multivariate normal distributions; it matches exactly the Jeffreys centroid for same-mean normal distributions and is experimentally observed in practice to be close to the Jeffreys centroid. Second, we define a new type of inductive center generalizing the principle of the Gauss arithmetic–geometric double sequence mean for pairs of densities of any given exponential family. This new Gauss–Bregman center is shown experimentally to approximate very well the Jeffreys centroid and is suggested to be used as a replacement for the Jeffreys centroid when the Jeffreys–Fisher–Rao center is not available in closed form. Furthermore, this inductive center always converges and matches the Jeffreys centroid for sets of same-mean normal distributions. We report on our experiments, which first demonstrate how well the closed-form formula of the Jeffreys–Fisher–Rao center for categorical distributions approximates the costly numerical Jeffreys centroid, which relies on the Lambert W function, and second show the fast convergence of the Gauss–Bregman double sequences, which can approximate closely the Jeffreys centroid when truncated after the first few iterations. Finally, we conclude this work by reinterpreting these fast proxy Jeffreys–Fisher–Rao and Gauss–Bregman centers of Jeffreys centroids under the lens of dually flat spaces in information geometry.
(This article belongs to the Special Issue Information Theory in Emerging Machine Learning Techniques)
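The Gauss–Bregman center generalizes the arithmetic–geometric double-sequence principle; the classical scalar version it builds on takes only a few lines:

```python
import math

def inductive_agm(a: float, b: float, tol: float = 1e-15) -> float:
    """Gauss's arithmetic-geometric double sequence: iterate the arithmetic and
    geometric means until the two sequences merge at their common limit. The
    paper's Gauss-Bregman center replaces these means with Bregman-type means."""
    while abs(a - b) > tol:
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return a

print(inductive_agm(1.0, 9.0))   # quadratic convergence: the sequences merge in a handful of steps
```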
23 pages, 7837 KB  
Article
Understanding Higher-Order Interactions in Information Space
by Herbert Edelsbrunner, Katharina Ölsböck and Hubert Wagner
Entropy 2024, 26(8), 637; https://doi.org/10.3390/e26080637 - 27 Jul 2024
Cited by 5 | Viewed by 2760
Abstract
Methods used in topological data analysis naturally capture higher-order interactions in point cloud data embedded in a metric space. This methodology was recently extended to data living in an information space, by which we mean a space measured with an information theoretical distance. One such setting is a finite collection of discrete probability distributions embedded in the probability simplex measured with the relative entropy (Kullback–Leibler divergence). More generally, one can work with a Bregman divergence parameterized by a different notion of entropy. While theoretical algorithms exist for this setup, there is a paucity of implementations for exploring and comparing geometric-topological properties of various information spaces. The interest of this work is therefore twofold. First, we propose the first robust algorithms and software for geometric and topological data analysis in information space. Perhaps surprisingly, despite working with Bregman divergences, our design reuses robust libraries for the Euclidean case. Second, using the new software, we take the first steps towards understanding the geometric-topological structure of these spaces. In particular, we compare them with the more familiar spaces equipped with the Euclidean and Fisher metrics.
19 pages, 4945 KB  
Article
Multivariate Time Series Change-Point Detection with a Novel Pearson-like Scaled Bregman Divergence
by Tong Si, Yunge Wang, Lingling Zhang, Evan Richmond, Tae-Hyuk Ahn and Haijun Gong
Stats 2024, 7(2), 462-480; https://doi.org/10.3390/stats7020028 - 13 May 2024
Cited by 11 | Viewed by 7975
Abstract
Change-point detection (CPD) is a challenging problem that has a number of applications across various real-world domains. The primary objective of CPD is to identify specific time points where the underlying system undergoes transitions between different states, each characterized by its distinct data distribution. Precise identification of change points in time series omics data can provide insights into the dynamic and temporal characteristics inherent to complex biological systems. Many change-point detection methods have traditionally focused on the direct estimation of data distributions. However, these approaches become unrealistic in high-dimensional data analysis. Density ratio methods have emerged as promising approaches for change-point detection since estimating density ratios is easier than directly estimating individual densities. Nevertheless, the divergence measures used in these methods may suffer from numerical instability during computation. Additionally, the most popular choice, the α-relative Pearson divergence, does not measure the dissimilarity between the two data distributions themselves but between one distribution and a mixture of the two. To overcome the limitations of existing density ratio-based methods, we propose a novel approach called the Pearson-like scaled-Bregman divergence-based (PLsBD) density ratio estimation method for change-point detection. Our theoretical studies derive an analytical expression for the Pearson-like scaled Bregman divergence using a mixture measure. We integrate the PLsBD with a kernel regression model and apply a random sampling strategy to identify change points in both synthetic data and real-world high-dimensional genomics data of Drosophila. Our PLsBD method demonstrates superior performance compared to many other change-point detection methods.
(This article belongs to the Section Statistical Methods)
16 pages, 656 KB  
Article
Divergences Induced by the Cumulant and Partition Functions of Exponential Families and Their Deformations Induced by Comparative Convexity
by Frank Nielsen
Entropy 2024, 26(3), 193; https://doi.org/10.3390/e26030193 - 23 Feb 2024
Cited by 2 | Viewed by 3031
Abstract
Exponential families are statistical models which are the workhorses in statistics, information theory, and machine learning, among others. An exponential family can either be normalized subtractively by its cumulant or free energy function, or equivalently normalized divisively by its partition function. Both the cumulant and partition functions are strictly convex and smooth functions inducing corresponding pairs of Bregman and Jensen divergences. It is well known that skewed Bhattacharyya distances between the probability densities of an exponential family amount to skewed Jensen divergences induced by the cumulant function between their corresponding natural parameters, and that in limit cases the sided Kullback–Leibler divergences amount to reverse-sided Bregman divergences. In this work, we first show that the α-divergences between non-normalized densities of an exponential family amount to scaled α-skewed Jensen divergences induced by the partition function. We then show how comparative convexity with respect to a pair of quasi-arithmetical means allows both convex functions and their arguments to be deformed, thereby defining dually flat spaces with corresponding divergences when ordinary convexity is preserved.
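The reverse-sided correspondence mentioned in the abstract is easy to check numerically for the Bernoulli family, whose cumulant function is ψ(θ) = log(1 + e^θ):

```python
import math

def psi(theta: float) -> float:
    """Cumulant (log-partition) function of the Bernoulli family."""
    return math.log1p(math.exp(theta))

def bregman_psi(t1: float, t2: float) -> float:
    """Bregman divergence induced by the cumulant: B_psi(t1, t2)."""
    grad = 1.0 / (1.0 + math.exp(-t2))   # psi'(t2) = sigmoid(t2), the mean parameter
    return psi(t1) - psi(t2) - grad * (t1 - t2)

def kl_bernoulli(p1: float, p2: float) -> float:
    return p1 * math.log(p1 / p2) + (1 - p1) * math.log((1 - p1) / (1 - p2))

p1, p2 = 0.3, 0.8
theta = lambda p: math.log(p / (1 - p))  # natural parameter of Bernoulli(p)
# The sided KL equals the *reverse-sided* Bregman divergence on natural parameters.
assert math.isclose(kl_bernoulli(p1, p2), bregman_psi(theta(p2), theta(p1)))
```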
17 pages, 775 KB  
Article
Block-Active ADMM to Minimize NMF with Bregman Divergences
by Xinyao Li and Akhilesh Tyagi
Sensors 2023, 23(16), 7229; https://doi.org/10.3390/s23167229 - 17 Aug 2023
Cited by 7 | Viewed by 2206
Abstract
Over the last ten years, there has been significant interest in employing nonnegative matrix factorization (NMF) to reduce dimensionality and enable more efficient clustering analysis in machine learning. This technique has been applied in various image processing applications within the fields of computer vision and sensor-based systems. Many algorithms exist to solve the NMF problem. Among these, the alternating direction method of multipliers (ADMM) and its variants are among the most popular methods used in practice. In this paper, we propose a block-active ADMM method to minimize the NMF problem with general Bregman divergences. The subproblems in the ADMM are solved iteratively by a block-coordinate-descent-type (BCD-type) method. In particular, each block is chosen directly based on the stationary condition. As a result, we are able to use far fewer auxiliary variables, and the proposed algorithm converges faster than previously proposed algorithms. From the theoretical point of view, the proposed algorithm is proved to converge to a stationary point sublinearly. We also conduct a series of numerical experiments to demonstrate the superiority of the proposed algorithm.
(This article belongs to the Special Issue Feature Papers in Physical Sensors 2023)
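A baseline sketch of NMF under the Frobenius norm (the squared-Euclidean member of the Bregman family) via the classical Lee–Seung multiplicative updates, for contrast with the paper's block-active ADMM; dimensions, rank, and iteration counts are illustrative:

```python
import numpy as np

def nmf_multiplicative(V: np.ndarray, r: int, iters: int = 500, seed: int = 0):
    """Baseline NMF minimizing ||V - WH||_F^2 with Lee-Seung multiplicative updates,
    under which the objective is non-increasing; the paper's block-active ADMM
    targets the same factorization for general Bregman divergences."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)); H = rng.random((r, n))
    eps = 1e-12                                  # guards against division by zero
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.random.default_rng(1).random((20, 15))    # nonnegative data matrix
W, H = nmf_multiplicative(V, r=4)
print(np.linalg.norm(V - W @ H))                 # reconstruction error after fitting
```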